Article preview
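LG - Machine Learning
CV - Computer Vision
CL - Computation and Language

1. [LG] Time Matters: Scaling Laws for Any Budget
2. [LG] All Random Features Representations are Equivalent
3. [LG] Infinite Width Models That Work: Why Feature Learning Doesn't Matter as Much as You Think
4. [LG] Efficient World Models with Context-Aware Tokenization
5. [CL] Efficacy of Language Model Self-Play in Non-Zero-Sum Games

Summary: scaling laws for any budget; all random features representations are equivalent; infinite-width models that work; efficient world models with context-aware tokenization; the efficacy of language model self-play in non-zero-sum games.

1. [LG] Time Matters: Scaling Laws for Any Budget
I Inbar, L Sernau [Google DeepMind]

Key points:
The final quality of a language model is constrained by its parameter count and the amount of training data.
FLOPs-based estimates of training time are poor; a more accurate proxy is based on the volume of memory copies.
With a simple calculation, we can estimate, from hyperparameters alone, Transfo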
………………………………