2024-07-18 05:39
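By analyzing gradient dynamics during LLM pretraining, the paper finds that low-rank structure emerges non-uniformly across layers, and it proposes WeLore: adaptive layer-wise low-rank compression combined with an efficient joint fine-tuning strategy that backpropagates only through the LRCs (low-rank components).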
[LG] "From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients" A Jaiswal, L Yin, Z Zhang, S Liu... [University of Texas at Austin University of S
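As a rough illustration of what "adaptive layer-wise low-rank compression" could look like, here is a minimal NumPy sketch. It is not the paper's algorithm: the function names `pick_rank` and `compress`, and the rule of keeping a fixed fraction (`energy`) of the squared singular-value spectrum, are assumptions made for this example. The point is only that the same threshold yields a different rank for each layer, depending on how quickly that layer's singular values decay.

```python
import numpy as np


def pick_rank(singular_values, energy=0.99):
    """Smallest rank whose singular values hold `energy` of the squared spectrum.

    Hypothetical selection rule for illustration; not the criterion from the paper.
    """
    power = singular_values ** 2
    cumulative = np.cumsum(power) / power.sum()
    # First index where the cumulative share reaches the threshold, as a 1-based rank.
    rank = int(np.searchsorted(cumulative, energy)) + 1
    # Guard against round-off pushing the index past the end.
    return min(rank, len(singular_values))


def compress(weight, energy=0.99):
    """Factor `weight` (m x n) into A (m x r) and B (r x n) via truncated SVD."""
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    r = pick_rank(s, energy)
    # Fold the singular values into the left factor so that weight ~= A @ B.
    return u[:, :r] * s[:r], vt[:r, :]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two toy "layers": one exactly rank 4, one full-rank Gaussian.
    layers = {
        "low_rank_layer": rng.standard_normal((64, 4)) @ rng.standard_normal((4, 64)),
        "dense_layer": rng.standard_normal((64, 64)),
    }
    for name, w in layers.items():
        a, b = compress(w)
        print(name, "-> rank", a.shape[1])
```

With this rule, a layer whose spectrum decays quickly is stored with far fewer parameters than one whose spectrum is flat. In a fine-tuning setup along the lines the post describes, only the factors of the strongly low-rank layers would receive gradients; that part is not shown here.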