Article Preview
LG - Machine Learning; CV - Computer Vision; CL - Computation and Language; RO - Robotics

1. [LG] Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF
2. [LG] Understanding Transformer Reasoning Capabilities via Graph Algorithms
3. [LG] Robust Preference Optimization through Reward Model Distillation
4. [CL] Nearest Neighbor Speculative Decoding for LLM Generation and Attribution
5. [LG] Zipper: A Multi-Tower Decoder Architecture for Fusing Modalities

Summary: value-incentivized preference optimization; understanding Transformer reasoning capabilities via graph algorithms; robust preference optimization through reward model distillation; nearest neighbor speculative decoding for LLM generation and attribution; a multi-tower decoder architecture for fusing modalities

1. [LG] Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF
S Cen, J Mei, K Goshvadi, H Dai… [Google & CMU]

Key points:
Proposes Value-Incentivized Preference Optimization (VPO), a unified approach to online and offline RLHF (a rough illustrative sketch follows below).
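The preview cuts off before stating any details of the objective, so the following is only a minimal, hypothetical sketch of what a preference-optimization loss of this flavor could look like: it assumes a DPO-style implicit reward r = beta * log(pi_theta / pi_ref) and adds a signed "value-style" regularizer (optimistic for online, pessimistic for offline RLHF). The function name vpo_style_loss, the form of the regularizer, and the hyperparameters alpha and beta are all illustrative assumptions, not the paper's actual VPO objective.

import torch
import torch.nn.functional as F

def vpo_style_loss(logp_chosen, logp_rejected,
                   ref_logp_chosen, ref_logp_rejected,
                   beta=0.1, alpha=0.01, online=True):
    """Hypothetical sketch: Bradley-Terry preference loss plus a
    signed value-style regularizer. Not the paper's actual objective,
    which is not given in this truncated preview."""
    # Implicit rewards under the DPO parametrization:
    # r(x, y) = beta * log(pi_theta(y|x) / pi_ref(y|x))
    r_chosen = beta * (logp_chosen - ref_logp_chosen)
    r_rejected = beta * (logp_rejected - ref_logp_rejected)

    # Standard Bradley-Terry negative log-likelihood on the preference pair
    nll = -F.logsigmoid(r_chosen - r_rejected).mean()

    # Assumed signed regularizer: reward the chosen response's implicit
    # reward when online (optimism), penalize it when offline (pessimism)
    sign = -1.0 if online else 1.0
    reg = sign * alpha * r_chosen.mean()

    return nll + reg

if __name__ == "__main__":
    # Toy usage with fake per-response log-probabilities
    lp_c = torch.tensor([-4.0, -3.5])
    lp_r = torch.tensor([-3.8, -4.2])
    ref_c = torch.tensor([-4.1, -3.6])
    ref_r = torch.tensor([-3.9, -4.0])
    print(vpo_style_loss(lp_c, lp_r, ref_c, ref_r, online=True))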
………………………………