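[Machine Learning] Deep Learning Recommender Systems (28): The X Recommendation Algorithm's listwiseRescoring (Same-Refresh Diversity Downweighting) Mechanism Explained

The Listwise Rescoring Mechanism in the X Recommendation Algorithm, Explained

Preface

Listwise rescoring is an important score-correction mechanism in the X recommendation system. Unlike traditional pointwise or pairwise scoring, it takes the context of the whole candidate list into account: it adjusts each candidate's score according to the candidate's position within its group, which promotes diversity in the recommended results.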

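1. Overview of Listwise Rescoring

1.1 What is listwise rescoring?

Listwise rescoring is a position-based score adjustment mechanism with three steps:

Group: partition the candidate tweets by a key (for example author or served type)
Sort: within each group, sort by score in descending order
Rescore: apply a rescoring factor determined by the candidate's position (index) within its group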

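1.2 Why is listwise rescoring needed?

Problem: if candidates are ranked by model score alone, the feed may show:

Several tweets from the same author back to back
Too many tweets of the same type clustered together
Too little content diversity

Solution: listwise rescoring uses position-based decay to lower the scores of candidates that sit further down within the same group, which helps to:

Promote author diversity
Promote content-type diversity
Improve the user experience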

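1.3 Core concept

Candidate list
↓
Group by key (for example author ID)
↓
Sort each group by score
↓
Apply a rescoring factor based on position
↓
Update the scores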

2. The Listwise Rescoring Framework

2.1 Core trait

trait ListwiseRescoringProvider[C <: CandidateWithFeatures[TweetCandidate], K] {

  /**
   * Defines the grouping key,
   * for example author ID or served type.
   */
  def groupByKey(candidate: C): Option[K]

  /**
   * Computes the rescoring factor from the candidate's position within its group.
   * @param index the candidate's position within its group (0-based, score descending)
   */
  def candidateRescoringFactor(query: PipelineQuery, candidate: C, index: Int): Double

  /**
   * Applies listwise rescoring.
   * Returns: Map[tweet ID, rescoring factor]
   */
  def apply(query: PipelineQuery, candidates: Seq[C]): Map[Long, Double] = {
    candidates
      .groupBy(groupByKey) // group by key
      .flatMap {
        case (Some(_), groupedCandidates) =>
          // sort within the group by score, descending
          val sortedCandidates = groupedCandidates
            .sortBy(_.features.getOrElse(ScoreFeature, None).getOrElse(0.0))(Ordering.Double.reverse)
          // compute the rescoring factor from the position
          sortedCandidates.zipWithIndex.map {
            case (candidate, index) =>
              candidate.candidate.id -> candidateRescoringFactor(query, candidate, index)
          }
        // candidates without a key receive no factor
        case _ => Map.empty
      }
  }
}

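2.2 Workflow

1. Input: the list of candidate tweets
   ↓
2. Group by groupByKey, for example by author ID
   {
     author1: [tweet1, tweet2, tweet3],
     author2: [tweet4, tweet5],
     ...
   }
   ↓
3. Sort each group by score, descending
   {
     author1: [tweet1(10.0), tweet2(9.0), tweet3(8.0)],
     author2: [tweet4(9.5), tweet5(8.5)],
     ...
   }
   ↓
4. Compute the rescoring factor from the position
   (illustrative numbers only: decayFactor = 0.8 with no floor, so factor(index) = 0.8^index)
   {
     author1: {
       tweet1: factor(0) = 1.0,   // 1st, no decay
       tweet2: factor(1) = 0.8,   // 2nd, decayed by 20%
       tweet3: factor(2) = 0.64,  // 3rd, decayed by 36%
     },
     author2: {
       tweet4: factor(0) = 1.0,
       tweet5: factor(1) = 0.8,
     }
   }
   ↓
5. Output: Map[tweet ID, rescoring factor]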
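The five steps above can be sketched on plain Scala collections. This is a minimal sketch rather than the home-mixer implementation: `Candidate`, `rescoringFactors`, and `factorAt` are hypothetical names, and the factor `0.8^index` is assumed only because it reproduces the illustrative values 1.0, 0.8, and 0.64 from the workflow.

```scala
object ListwiseWorkflowSketch {
  // Hypothetical stand-in for a scored candidate (not the real CandidateWithFeatures).
  final case class Candidate(id: Long, authorId: Option[Long], score: Double)

  /** Groups by key, sorts each group by score descending, and maps tweet ID -> factor. */
  def rescoringFactors[K](
      candidates: Seq[Candidate],
      groupByKey: Candidate => Option[K],
      factorAt: Int => Double
  ): Map[Long, Double] =
    candidates
      .groupBy(groupByKey)
      .toSeq
      // Candidates without a key are skipped, mirroring the `case _ => Map.empty` branch.
      .collect { case (Some(_), group) => group.sortBy(c => -c.score).zipWithIndex }
      .flatten
      .map { case (candidate, index) => candidate.id -> factorAt(index) }
      .toMap

  // The example from the workflow: author 1 has three tweets, author 2 has two.
  val candidates: Seq[Candidate] = Seq(
    Candidate(1L, Some(1L), 10.0),
    Candidate(2L, Some(1L), 9.0),
    Candidate(3L, Some(1L), 8.0),
    Candidate(4L, Some(2L), 9.5),
    Candidate(5L, Some(2L), 8.5)
  )

  // Assumed factor: pure exponential decay 0.8^index (no floor).
  val factors: Map[Long, Double] =
    rescoringFactors[Long](candidates, _.authorId, index => math.pow(0.8, index.toDouble))
}
```

Passing the key and the factor as functions loosely mirrors the two abstract methods of the trait, so the same helper could group by served type instead of author.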

2.3 Use in HeuristicScorer

val rescorers = Seq(
  // ... other rescoring rules
  RescoreListwise(AuthorBasedListwiseRescoringProvider(query, candidates)),
  RescoreListwise(ContentExplorationListwiseRescoringProvider(query, candidates)),
  // ... more listwise rescorers
)

// Implementation of RescoreListwise
case class RescoreListwise(listwiseRescoringMap: Map[Long, Double])
    extends RescoringFactorProvider {

  // the factor applies only to candidates present in the map
  override def selector(...): Boolean =
    listwiseRescoringMap.contains(candidate.candidate.id)

  override def factor(...): Double =
    listwiseRescoringMap(candidate.candidate.id)
}

How the factor is used:

finalScore = currentScore × listwiseRescoringFactor
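As a rough illustration of that multiplication, the sketch below applies a factor map to a score map. `applyFactors` and both maps are hypothetical; the assumption that a tweet missing from the factor map keeps its score is inferred from the `selector` check shown above, not from the full HeuristicScorer source.

```scala
object ApplyListwiseFactorSketch {
  /** Multiplies each score by its listwise factor; IDs missing from the map keep their score. */
  def applyFactors(scores: Map[Long, Double], factors: Map[Long, Double]): Map[Long, Double] =
    scores.map { case (tweetId, score) =>
      // Mirrors selector/factor: the factor applies only when the map contains the ID.
      tweetId -> score * factors.getOrElse(tweetId, 1.0)
    }

  // Hypothetical scores and factors for three tweets; tweet 3 has no listwise factor.
  val rescored: Map[Long, Double] =
    applyFactors(
      scores = Map(1L -> 10.0, 2L -> 9.0, 3L -> 5.0),
      factors = Map(1L -> 1.0, 2L -> 0.84)
    )
}
```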

3. Listwise Rescoring Providers in Detail

3.1 Author-based rescoring (AuthorBasedListwiseRescoringProvider)

3.1.1 Purpose

Promote author diversity and keep several tweets from the same author from appearing back to back.

3.1.2 Grouping key

override def groupByKey(candidate: CandidateWithFeatures[TweetCandidate]): Option[Long] =
  candidate.features.getOrElse(AuthorIdFeature, None)

Grouped by: author ID

3.1.3 Rescoring factor

override def candidateRescoringFactor(
  query: PipelineQuery,
  candidate: CandidateWithFeatures[TweetCandidate],
  index: Int
): Double = {
  // users who follow few accounts get a separate set of parameters
  val isSmallFollowGraph =
    query.features.get.getOrElse(SGSFollowedUsersFeature, Seq.empty).size <= MinFollowed

  val decayFactor =
    if (isSmallFollowGraph) {
      query.params(SmallFollowGraphAuthorDiversityDecayFactor)
    } else {
      query.params(AuthorDiversityDecayFactor)
    }

  val floor =
    if (isSmallFollowGraph) {
      query.params(SmallFollowGraphAuthorDiversityFloor)
    } else {
      query.params(AuthorDiversityFloor)
    }

  authorDiversityBasedRescorer(index, decayFactor, floor)
}

Formula:

factor = (1 - floor) × decayFactor^index + floor

where:
- index: the candidate's position within its group (0-based)
- decayFactor: the decay factor (typically 0.5-0.9)
- floor: the minimum factor (typically 0.1-0.5)
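The formula translates directly into a small function. The body of `authorDiversityBasedRescorer` is not shown in this article, so `decayWithFloor` below is a hypothetical name for a sketch of the formula only, and the parameter values 0.8 and 0.2 are assumed example values.

```scala
object DecayWithFloorSketch {
  /** factor = (1 - floor) * decayFactor^index + floor */
  def decayWithFloor(index: Int, decayFactor: Double, floor: Double): Double = {
    require(index >= 0, "index is a 0-based position")
    (1.0 - floor) * math.pow(decayFactor, index.toDouble) + floor
  }

  // Prints the factor for the first four positions with assumed example parameters.
  def main(args: Array[String]): Unit =
    (0 to 3).foreach { index =>
      println(f"index=$index factor=${decayWithFloor(index, 0.8, 0.2)}%.4f")
    }
}
```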

3.1.4 Worked example

Assume decayFactor = 0.8 and floor = 0.2:

Position	Calculation	Factor
0 (1st)	`(1-0.2) × 0.8^0 + 0.2 = 0.8 × 1 + 0.2`	1.0
1 (2nd)	`(1-0.2) × 0.8^1 + 0.2 = 0.8 × 0.8 + 0.2`	0.84
2 (3rd)	`(1-0.2) × 0.8^2 + 0.2 = 0.8 × 0.64 + 0.2`	0.712
3 (4th)	`(1-0.2) × 0.8^3 + 0.2 = 0.8 × 0.512 + 0.2`	0.6096 ≈ 0.61
…	…	…
∞	`(1-0.2) × 0 + 0.2`	0.2 (floor)

Effect:

1st tweet from the same author: no decay (factor = 1.0)
2nd tweet from the same author: decayed by 16% (factor = 0.84)
3rd tweet from the same author: decayed by 28.8% (factor = 0.712)
4th tweet from the same author: decayed by about 39% (factor ≈ 0.61)
Later tweets: the factor keeps approaching the floor (0.2) but never drops below it

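3.1.5 Special handling for small follow graphs

If the user follows 50 or fewer accounts (the size <= MinFollowed check in the code above), a separate pair of parameters is used:

SmallFollowGraphAuthorDiversityDecayFactor
SmallFollowGraphAuthorDiversityFloor

Reason: users with a small follow graph may be more willing to see several tweets from the same author.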
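The parameter switch can be sketched as a plain function. `DiversityParams`, `selectParams`, and the numeric values are hypothetical and chosen only for illustration; the threshold 50 is the value stated in this section.

```scala
object SmallFollowGraphSketch {
  // Hypothetical holder for the two tunables used by the decay formula.
  final case class DiversityParams(decayFactor: Double, floor: Double)

  /** Picks the small-follow-graph parameters when the user follows at most minFollowed accounts. */
  def selectParams(
      followedCount: Int,
      minFollowed: Int,
      regular: DiversityParams,
      smallGraph: DiversityParams
  ): DiversityParams =
    if (followedCount <= minFollowed) smallGraph else regular

  // Assumed example values, not the production settings.
  val regular: DiversityParams = DiversityParams(decayFactor = 0.5, floor = 0.25)
  val smallGraph: DiversityParams = DiversityParams(decayFactor = 0.8, floor = 0.5)
}
```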

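3.2 Impressed-author rescoring (ImpressedAuthorDecayRescoringProvider)

3.2.1 Purpose

Take into account the authors the user has already seen (impressed authors) and lower the scores of their tweets further.

3.2.2 Core mechanism

def apply(...): Map[Long, Double] = {
  // 1. Count how often each author has already been impressed
  val authorFreq = calculateAuthorImpressionFrequencies(query)

  // 2. Group and sort
  candidates
    .groupBy(groupByKey)
    .flatMap {
      case (Some(authorId), groupedCandidates) =>
        val sortedCandidates = groupedCandidates.sortBy(...)
        sortedCandidates.zipWithIndex.map {
          case (candidate, index) =>
            // 3. Effective index = position within the group + impression count
            val effectiveIndex = index + authorFreq.getOrElse(authorId, 0)

            // 4. Use different decay parameters for in-network and out-of-network candidates
            val isInNetworkCandidate = candidate.features.getOrElse(InNetworkFeature, true)
            val decayFactor = if (isInNetworkCandidate) inNetworkDecayFactor else outNetworkDecayFactor
            val floor = if (isInNetworkCandidate) inNetworkFloor else outNetworkFloor

            // keyed by tweet ID so the result matches Map[Long, Double], as in the trait's apply
            candidate.candidate.id -> authorDiversityBasedRescorer(effectiveIndex, decayFactor, floor)
        }
      // candidates without an author ID receive no factor, as in the trait's apply
      case _ => Map.empty
    }
}

3.2.3 Computing impression frequencies

private def calculateAuthorImpressionFrequencies(query: PipelineQuery): Map[Long, Int] = {
  // tweets the user has already been shown
  val impressedTweetIds =
    query.features.map(_.getOrElse(ImpressedTweets, Seq.empty)).getOrElse(Seq.empty).toSet

  // author ID -> tweets served from that author
  val servedAuthorMap =
    query.features.map(_.get(ServedAuthorIdsFeature)).getOrElse(Map.empty)

  servedAuthorMap
    .map {
      case (authorId, tweetIds) =>
        val impressedCount = tweetIds.count(impressedTweetIds.contains)
        authorId -> impressedCount
    }
    .filter(_._2 > 0) // keep only authors with at least one impressed tweet
}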
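The counting step can be tried in isolation on plain collections. In this sketch the two inputs are passed in directly instead of being read from query features; `authorImpressionFrequencies` and the sample IDs are hypothetical.

```scala
object ImpressionFrequencySketch {
  /** Counts, per author, how many served tweets the user has actually been shown. */
  def authorImpressionFrequencies(
      impressedTweetIds: Set[Long],
      servedAuthorMap: Map[Long, Seq[Long]]
  ): Map[Long, Int] =
    servedAuthorMap
      .map { case (authorId, tweetIds) =>
        authorId -> tweetIds.count(impressedTweetIds.contains)
      }
      // Authors with no impressed tweet are dropped, so lookups fall back to 0.
      .filter { case (_, impressedCount) => impressedCount > 0 }

  // Hypothetical data: author 100 had three tweets served, two of which were seen.
  val frequencies: Map[Long, Int] =
    authorImpressionFrequencies(
      impressedTweetIds = Set(1L, 2L, 9L),
      servedAuthorMap = Map(100L -> Seq(1L, 2L, 3L), 200L -> Seq(4L))
    )
}
```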

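3.2.4 Worked example

Assume:

The candidate is the 1st tweet from author A within its group (index = 0)
The user has already seen 2 tweets from author A (authorFreq = 2)
decayFactor = 0.8 and floor = 0.2

Calculation:

effectiveIndex = 0 + 2 = 2
factor = (1 - 0.2) × 0.8^2 + 0.2 = 0.8 × 0.64 + 0.2 = 0.712

Effect: even though this is author A's 1st tweet within the group, the user has already seen two tweets from that author, so the effective index becomes 2 and the score is decayed by 28.8%.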
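The same arithmetic as a sketch: `impressedAuthorFactor` and `decayWithFloor` are hypothetical names, and the decay formula is the one given in section 3.1.3.

```scala
object ImpressedAuthorDecaySketch {
  /** factor = (1 - floor) * decayFactor^index + floor */
  def decayWithFloor(index: Int, decayFactor: Double, floor: Double): Double =
    (1.0 - floor) * math.pow(decayFactor, index.toDouble) + floor

  /** Shifts the in-group position by the number of tweets already seen from the author. */
  def impressedAuthorFactor(
      indexInGroup: Int,
      authorId: Long,
      authorFreq: Map[Long, Int],
      decayFactor: Double,
      floor: Double
  ): Double = {
    val effectiveIndex = indexInGroup + authorFreq.getOrElse(authorId, 0)
    decayWithFloor(effectiveIndex, decayFactor, floor)
  }

  // Author 7 (hypothetical ID for "author A") has already been seen twice.
  val seenAuthorFactor: Double = impressedAuthorFactor(0, 7L, Map(7L -> 2), 0.8, 0.2)
  // An author with no impressions keeps the plain in-group index.
  val unseenAuthorFactor: Double = impressedAuthorFactor(0, 8L, Map(7L -> 2), 0.8, 0.2)
}
```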

3.3 Content-exploration rescoring (ContentExplorationListwiseRescoringProvider)

3.3.1 Purpose

Limit the number of content-exploration tweets and avoid over-exploration.

3.3.2 Grouping key

override def groupByKey(candidate: CandidateWithFeatures[TweetCandidate]): Option[ServedType] =
  Some(candidate.features.get(ServedTypeFeature))

Grouped by: served type (ServedType)

3.3.3 Rescoring factor

override def candidateRescoringFactor(
  query: PipelineQuery,
  candidate: CandidateWithFeatures[TweetCandidate],
  index: Int
): Double = {
  if (query.params(EnableContentExplorationCandidateMaxCountParam)) {
    val servedType = candidate.features.get(ServedTypeFeature)
    if (servedType == ServedType.ForYouUserInterestSummary ||
      servedType == ServedType.ForYouContentExploration ||
      servedType == ServedType.ForYouContentExplorationTier2 ||
      servedType == ServedType.ForYouContentExplorationDeepRetrievalI2i ||
      servedType == ServedType.ForYouContentExplorationTier2DeepRetrievalI2i) {
      0.0001 // almost fully decayed
    } else {
      1.0
    }
  } else {
    1.0
  }
}

Logic:

If the tweet has a content-exploration served type: factor = 0.0001 (almost fully decayed)
Otherwise: factor = 1.0 (no decay)

Note: this implementation does not decay by position. The index argument is ignored, and when the parameter is enabled every content-exploration tweet receives the same factor.
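A minimal sketch of this flat rule, with served types modeled as plain strings for illustration (the real code compares `ServedType` enum values) and `explorationFactor` as a hypothetical name:

```scala
object ContentExplorationSketch {
  // Served types treated as content exploration, modeled as strings for illustration only.
  val explorationTypes: Set[String] = Set(
    "ForYouUserInterestSummary",
    "ForYouContentExploration",
    "ForYouContentExplorationTier2",
    "ForYouContentExplorationDeepRetrievalI2i",
    "ForYouContentExplorationTier2DeepRetrievalI2i"
  )

  /** Position-independent factor: the index plays no role in the result. */
  def explorationFactor(enabled: Boolean, servedType: String): Double =
    if (enabled && explorationTypes.contains(servedType)) 0.0001 else 1.0

  // With a score of 10.0, an exploration tweet drops to about 0.001 when the rule is enabled.
  val rescoredExploration: Double = 10.0 * explorationFactor(enabled = true, "ForYouContentExploration")
  val rescoredInNetwork: Double = 10.0 * explorationFactor(enabled = true, "ForYouInNetwork")
}
```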

3.4 Deep-retrieval rescoring (DeepRetrievalListwiseRescoringProvider)

3.4.1 Purpose

Limit the number of deep-retrieval tweets.

3.4.2 Rescoring factor

override def candidateRescoringFactor(
  query: PipelineQuery,
  candidate: CandidateWithFeatures[TweetCandidate],
  index: Int
): Double = {
  if (query.params(EnableDeepRetrievalMaxCountParam)) {
    val servedType = candidate.features.get(ServedTypeFeature)
    val maxCount = query.params(DeepRetrievalMaxCountParam)
    if (servedType == ServedType.ForYouContentExplorationDeepRetrievalI2i && index >= maxCount) {
      0.0001 // almost fully decayed once the maximum count is exceeded
    } else {
      1.0
    }
  } else {
    1.0
  }
}

Logic:

If the tweet is a deep-retrieval tweet and its position is >= maxCount: factor = 0.0001
Otherwise: factor = 1.0

Example:

maxCount = 3
Positions 0, 1, 2: factor = 1.0
Positions 3, 4, …: factor = 0.0001
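The cap behaves like a step function over the in-group position. In this sketch `deepRetrievalFactor` is a hypothetical name and the served-type check is reduced to a boolean flag:

```scala
object DeepRetrievalCapSketch {
  /** Keeps the first maxCount deep-retrieval candidates and nearly zeroes the rest. */
  def deepRetrievalFactor(
      enabled: Boolean,
      isDeepRetrieval: Boolean,
      index: Int,
      maxCount: Int
  ): Double =
    if (enabled && isDeepRetrieval && index >= maxCount) 0.0001 else 1.0

  // Factors for positions 0 to 4 with maxCount = 3, as in the example above.
  val factorsByPosition: Seq[Double] =
    (0 until 5).map(index => deepRetrievalFactor(enabled = true, isDeepRetrieval = true, index, maxCount = 3))
}
```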

3.5 Candidate-source diversity rescoring (CandidateSourceDiversityListwiseRescoringProvider)

3.5.1 Purpose

Promote candidate-source diversity and keep tweets from the same source from clustering together.

3.5.2 Grouping key

override def groupByKey(
  candidate: CandidateWithFeatures[TweetCandidate]
): Option[(ServedType, Option[SourceSignal])] = {
  val servedType = candidate.features.get(ServedTypeFeature)
  val sourceSignalOpt = candidate.features.getOrElse(SourceSignalFeature, None)
  Some((servedType, sourceSignalOpt))
}

Grouped by: (served type, source signal)

3.5.3 Rescoring factor

override def candidateRescoringFactor(
  query: PipelineQuery,
  candidate: CandidateWithFeatures[TweetCandidate],
  index: Int
): Double = {
  candidate.features.get(ServedTypeFeature) match {
    case ServedType.ForYouInNetwork => 1.0 // in-network tweets are not decayed
    case _ =>
      if (query.params(EnableCandidateSourceDiversityDecay)) {
        val decayFactor = query.params(CandidateSourceDiversityDecayFactor)
        val floor = query.params(CandidateSourceDiversityFloor)
        candidateSourceDiversityRescorer(index, decayFactor, floor)
      } else {
        1.0
      }
  }
}

Formula:

factor = (1 - floor) × decayFactor^index + floor

Special handling:

In-network tweets (ForYouInNetwork): no decay (factor = 1.0)
All other types: decayed by position, provided EnableCandidateSourceDiversityDecay is on
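Because the key is a pair, two candidates share a group only when both the served type and the source signal match. The sketch below illustrates this with strings in place of `ServedType` and `SourceSignal`; `Candidate`, `sourceDiversityFactors`, and the name "OtherSource" are hypothetical, and the decay is assumed to be always enabled.

```scala
object SourceDiversitySketch {
  // Hypothetical candidate with string stand-ins for ServedType and SourceSignal.
  final case class Candidate(id: Long, servedType: String, sourceSignal: Option[String], score: Double)

  def decayWithFloor(index: Int, decayFactor: Double, floor: Double): Double =
    (1.0 - floor) * math.pow(decayFactor, index.toDouble) + floor

  /** Groups by (servedType, sourceSignal) and decays by position, except for in-network tweets. */
  def sourceDiversityFactors(
      candidates: Seq[Candidate],
      decayFactor: Double,
      floor: Double
  ): Map[Long, Double] =
    candidates
      .groupBy(c => (c.servedType, c.sourceSignal))
      .values
      .flatMap { group =>
        group.sortBy(c => -c.score).zipWithIndex.map { case (c, index) =>
          val factor =
            if (c.servedType == "ForYouInNetwork") 1.0
            else decayWithFloor(index, decayFactor, floor)
          c.id -> factor
        }
      }
      .toMap

  val factors: Map[Long, Double] = sourceDiversityFactors(
    Seq(
      Candidate(1L, "ForYouInNetwork", None, 10.0),
      Candidate(2L, "ForYouInNetwork", None, 9.0),
      Candidate(3L, "OtherSource", Some("a"), 8.0),
      Candidate(4L, "OtherSource", Some("a"), 7.0),
      Candidate(5L, "OtherSource", Some("b"), 6.0) // different signal, so a separate group
    ),
    decayFactor = 0.8,
    floor = 0.2
  )
}
```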

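3.6 Other listwise rescoring providers

3.6.1 EvergreenDeepRetrievalListwiseRescoringProvider

Similar to DeepRetrievalListwiseRescoringProvider, but targets the evergreen deep-retrieval served type.

3.6.2 EvergreenDeepRetrievalCrossBorderListwiseRescoringProvider

Targets the cross-border evergreen deep-retrieval served type.

3.6.3 ImpressedMediaClusterBasedListwiseRescoringProvider

Rescoring based on impressed media clusters: lowers the scores of content from media clusters the user has already seen.

3.6.4 ImpressedImageClusterBasedListwiseRescoringProvider

Rescoring based on impressed image clusters: lowers the scores of content from image clusters the user has already seen.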

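4. The Mathematics Behind Listwise Rescoring

4.1 Exponential decay formula

Most listwise rescorers use exponential decay toward a floor:

factor(index) = (1 - floor) × decayFactor^index + floor

where:
- index: the candidate's position within its group (0-based)
- decayFactor: the decay factor (0 < decayFactor < 1)
- floor: the minimum factor (0 < floor < 1)

4.2 Properties of the formula

index = 0: factor = (1 - floor) × 1 + floor = 1.0 (no decay)
index → ∞: factor → floor (converges to the minimum)
Monotonically decreasing: the further down the position, the smaller the factor
Smooth decay: at each step the gap between the factor and the floor shrinks by the constant ratio decayFactor, so the factor never jumps and never falls below the floor, unlike a linear decay that would eventually reach zero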
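These properties follow from the formula in a few lines. Writing d for decayFactor and f(i) for factor(index), and assuming 0 < d < 1 and 0 < floor < 1 as stated above:

```latex
\begin{aligned}
f(i) &= (1 - \text{floor})\, d^{\,i} + \text{floor} \\
f(0) &= (1 - \text{floor}) \cdot 1 + \text{floor} = 1 \\
f(i+1) - f(i) &= (1 - \text{floor})\, d^{\,i}\,(d - 1) < 0 \\
\frac{f(i+1) - \text{floor}}{f(i) - \text{floor}} &= d \\
\lim_{i \to \infty} f(i) &= \text{floor}
\end{aligned}
```

The third line gives monotonic decrease, the fourth shows the constant-ratio shrinkage of the gap above the floor, and the last is the convergence to the floor.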

4.3 Effect of the parameters

Parameter change	Effect	Example
`decayFactor` larger	Slower decay	0.9 vs 0.5
`decayFactor` smaller	Faster decay	0.5 vs 0.9
`floor` larger	Higher minimum	0.5 vs 0.1
`floor` smaller	Lower minimum	0.1 vs 0.5
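One way to make "slower" and "faster" concrete is to ask at which position the factor first drops below a given threshold. The helper `firstIndexBelow` and the threshold 0.5 are hypothetical choices for illustration, not quantities from the original system.

```scala
object ParameterImpactSketch {
  def decayWithFloor(index: Int, decayFactor: Double, floor: Double): Double =
    (1.0 - floor) * math.pow(decayFactor, index.toDouble) + floor

  /** Smallest 0-based position whose factor is strictly below the threshold. */
  def firstIndexBelow(threshold: Double, decayFactor: Double, floor: Double): Int = {
    require(decayFactor > 0.0 && decayFactor < 1.0, "decayFactor must be in (0, 1)")
    require(threshold > floor, "the factor never drops below the floor")
    Iterator.from(0).find(index => decayWithFloor(index, decayFactor, floor) < threshold).get
  }

  // Same floor, different decay factors: the faster decay crosses the threshold much earlier.
  val fastDecay: Int = firstIndexBelow(threshold = 0.5, decayFactor = 0.5, floor = 0.2)
  val slowDecay: Int = firstIndexBelow(threshold = 0.5, decayFactor = 0.9, floor = 0.2)
}
```

The two `require` checks guarantee that the search terminates, because the factor converges to the floor, which lies below the threshold.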

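4.4 Example decay curve

Assume decayFactor = 0.8 and floor = 0.2:

[Figure: factor (vertical axis) against index (horizontal axis). The curve starts at 1.0 for index 0 and flattens out as it approaches the floor of 0.2, without ever dropping below it.]

index	factor
0	1.0
1	0.84
2	0.712
3	0.6096
4	0.52768
5	0.462144
…	→ 0.2 (floor)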

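5. The Complete Listwise Rescoring Flow

5.1 Position in the recommendation pipeline

Candidate generation
↓
Feature extraction
↓
Model scoring (multi-model fusion)
↓
Heuristic rescoring
├─ Pointwise rescoring (RescoreOutOfNetwork, RescoreReplies, ...)
└─ Listwise rescoring (AuthorBased, ContentExploration, ...) ⭐
↓
Phoenix rescoring (optional)
↓
Diversity rescoring (optional)
↓
Final ranking

5.2 Execution order of listwise rescoring

In HeuristicScorer, the listwise rescorers are listed after the pointwise ones:

val rescorers = Seq(
  RescoreOutOfNetwork,          // pointwise
  RescoreReplies,               // pointwise
  RescoreMTLNormalization(...), // pointwise
  RescoreListwise(AuthorBasedListwiseRescoringProvider(...)),        // listwise
  RescoreListwise(ContentExplorationListwiseRescoringProvider(...)), // listwise
  // ... more listwise rescorers
)

val scaleFactor = rescorers.map(_(query, candidate)).product

Note: all rescoring factors are multiplied together, so the order in which the rescorers are listed does not change the resulting scale factor.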
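The multiplication can be sketched with rescorers reduced to functions from a tweet ID to a factor. This is a simplification under stated assumptions: `Rescorer`, `scaleFactor`, the tweet ID 42, and the factor values are hypothetical, and the real rescorers take a query and a candidate.

```scala
object ScaleFactorSketch {
  // Hypothetical simplification: a rescorer maps a tweet ID to a multiplicative factor.
  type Rescorer = Long => Double

  /** Multiplies the factors of all rescorers for one tweet. */
  def scaleFactor(rescorers: Seq[Rescorer], tweetId: Long): Double =
    rescorers.map(rescorer => rescorer(tweetId)).product

  // Pointwise rules look only at the candidate itself.
  val pointwise: Seq[Rescorer] = Seq(_ => 0.8, _ => 1.0, _ => 0.95)

  // Listwise rules depend on the candidate's position, here precomputed per tweet ID.
  val listwise: Seq[Rescorer] = Seq(
    tweetId => if (tweetId == 42L) 0.84 else 1.0,
    _ => 0.9
  )

  val combined: Double = scaleFactor(pointwise ++ listwise, 42L)
  // Reversing the list gives the same product, up to floating-point rounding.
  val combinedReversed: Double = scaleFactor((pointwise ++ listwise).reverse, 42L)
}
```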

5.3 Worked calculation

Consider a single tweet:

Step 1: multi-model fusion

WeightedModelScore = 10.0

Step 2: pointwise rescoring

RescoreOutOfNetwork = 0.8
RescoreReplies = 1.0
RescoreMTLNormalization = 0.95
Score = 10.0 × 0.8 × 1.0 × 0.95 = 7.6

Step 3: listwise rescoring (author-based)

Assume:

This is the 2nd tweet from author A within its group (index = 1)
decayFactor = 0.8 and floor = 0.2
factor = (1 - 0.2) × 0.8^1 + 0.2 = 0.84

Score = 7.6 × 0.84 = 6.384

Step 4: other listwise rescorers

ContentExploration = 1.0
CandidateSourceDiversity = 0.9
Score = 6.384 × 1.0 × 0.9 = 5.7456 ≈ 5.746

Final score: 5.746

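6. Strengths of Listwise Rescoring

6.1 Promotes diversity

Author diversity: keeps the same author from appearing back to back
Content-type diversity: keeps one type from dominating
Source diversity: mixes content from different candidate sources

6.2 Context-aware

List context: considers the whole candidate list
User history: considers content the user has already been shown
Position-aware: adjusts scores according to position within the group

6.3 Flexible and configurable

Tunable parameters: the decay factor and the floor are configurable
Conditional enabling: individual rules can be switched on or off by parameter
Flexible grouping: candidates can be grouped by different keys

6.4 Smooth decay

Exponential decay: gentler than a hard cap
Bounded below: with floor > 0 the factor never decays all the way to 0
Predictable: the decay curve follows directly from the two parameters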

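7. Limitations of Listwise Rescoring

7.1 Computational cost

Grouping and sorting: every group has to be sorted, which costs O(n log n) in the worst case for n candidates
Multiple passes: each provider traverses the candidate list again
Memory overhead: the grouped results have to be stored

7.2 Parameter tuning

Parameter sensitivity: the decay factor and the floor need careful tuning
Scenario dependence: different scenarios may need different parameters
A/B testing: finding good parameters takes a large number of A/B tests

7.3 Potential problems

Over-decay: high-quality content may be pushed down too far
Position bias: candidates further down in a group may be treated unfairly
Imbalance between groups: group sizes can differ widely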

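8. Best Practices

8.1 Parameter settings

Decay factor (decayFactor)
- Range: 0.5 - 0.9
- Smaller values: faster decay, stronger diversity
- Larger values: slower decay, stronger relevance
Minimum value (floor)
- Range: 0.1 - 0.5
- Smaller values: allow more diversity
- Larger values: guarantee a minimum level of relevance

8.2 Grouping strategy

Choose a suitable key: pick the grouping key that matches the business goal
Avoid over-partitioning: if groups are too fine-grained, most candidates sit at index 0 and the decay has little effect
Consider group size: make sure each group contains enough candidates

8.3 Metrics to monitor

Diversity metrics: author diversity, type diversity
Relevance metrics: CTR, engagement
User-experience metrics: satisfaction, retention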

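9. Summary

9.1 Key points

Listwise rescoring: adjusts scores using the context of the whole candidate list
Group and sort: group by key, then sort each group by score
Position-based decay: apply exponential decay according to position
Diversity: promotes diversity in the recommended results

9.2 Key formulas

factor(index) = (1 - floor) × decayFactor^index + floor
final score = original score × factor

9.3 Main providers

AuthorBasedListwiseRescoringProvider: author diversity
ImpressedAuthorDecayRescoringProvider: decay for impressed authors
ContentExplorationListwiseRescoringProvider: content-exploration limit
DeepRetrievalListwiseRescoringProvider: deep-retrieval limit
CandidateSourceDiversityListwiseRescoringProvider: candidate-source diversity

9.4 Design philosophy

Diversity first: promote diversity while preserving relevance
Position-aware: adjust scores according to position
Smooth decay: use exponential decay rather than hard caps
Flexible configuration: parameters are configurable and easy to tune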

Reference files:

home-mixer/server/src/main/scala/com/twitter/home_mixer/product/scored_tweets/scorer/ListwiseRescoringProvider.scala
home-mixer/server/src/main/scala/com/twitter/home_mixer/product/scored_tweets/scorer/AuthorBasedListwiseRescoringProvider.scala
home-mixer/server/src/main/scala/com/twitter/home_mixer/product/scored_tweets/scorer/ContentExplorationListwiseRescoringProvider.scala
home-mixer/server/src/main/scala/com/twitter/home_mixer/product/scored_tweets/scorer/ImpressedAuthorDecayRescoringProvider.scala