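LG - Machine Learning
CV - Computer Vision
CL - Computation and Language
AS - Audio and Speech
RO - Robotics

1. [CL] Mistral-C2F: Coarse to Fine Actor for Analytical and Reasoning Enhancement in RLHF and Effective-Merged LLMs
2. [CL] It Takes Two: On the Seamlessness between Reward and Policy Model in RLHF
3. [LG] Flow Map Matching
4. [LG] Beyond Model Collapse: Scaling Up with Synthesized Data Requires Reinforcement
5. [LG] Scaling Value Iteration Networks to 5000 Layers for Extreme Long-Term Planning

Summary: a coarse-to-fine actor that enhances analytical and reasoning ability in RLHF and effectively merged LLMs; the seamlessness between the reward model and the policy model in RLHF; flow map matching; using reinforcement feedback to prevent the model collapse caused by iterative training on large-scale synthesized data; scaling value iteration networks to 5000 layers for extreme long-term planning.

1. [CL] Mistral-C2F: Coarse to Fine Actor for Analytical and Reasoning Enhancement in RLHF and Effective-Merged LLMs
C Zheng, K Sun, X Zhou [Bytedance]
Mistral-C2F: for enhancing RLHF and effectively merged LLMs in analytical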
………………………………