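LG - Machine Learning
CV - Computer Vision
CL - Computation and Language
RO - Robotics

1. [LG] Deconstructing What Makes a Good Optimizer for Language Models
2. [LG] Towards a theory of learning dynamics in deep state space models
3. [LG] Vision language models are blind
4. [IR] BM25S: Orders of magnitude faster lexical search via eager sparse scoring
5. [CL] Variational Best-of-N Alignment

Summary: deconstructing what makes a good optimizer for language models; toward a theory of learning dynamics in deep state space models; the limits of the visual abilities of vision language models; making BM25 orders of magnitude faster through eager sparse scoring; variational Best-of-N (vBoN) alignment.

1. [LG] Deconstructing What Makes a Good Optimizer for Language Models
R Zhao, D Morwani, D Brandfonbrener, N Vyas, S Kakade [Harvard University]

Key points:
The authors ran comprehensive experiments comparing several optimization algorithms, including SGD, Adam, Lion, Adafactor, and Signum, across different model scales, hyperparameters, and model architectures for training autoregressive language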
………………………………