26 Results
Workshop
|
A Multi-Power Law for Loss Curve Prediction Across Learning Rate Schedules Kairong Luo · Haodong Wen · Shengding Hu · Zhenbo Sun · Zhiyuan Liu · Maosong Sun · Kaifeng Lyu · Wenguang Chen |
Poster
|
Wed 16:30 |
D-CPT Law: Domain-specific Continual Pre-Training Scaling Law for Large Language Models Haoran Que · Jiaheng Liu · Ge Zhang · Chenchen Zhang · Xingwei Qu · Yinghao Ma · Feiyu Duan · Zhiqi Bai · Jiakai Wang · Yuanxing Zhang · Xu Tan · Jie Fu · Jiamang Wang · Lin Qu · Wenbo Su · Bo Zheng |