

Poster

How does Architecture Influence the Base Capabilities of Pre-trained Language Models? A Case Study Based on FFN-Wider and MoE Transformers

Xin Lu ⋅ Yanyan Zhao ⋅ Bing Qin ⋅ Liangyu Huo ⋅ Qing Yang ⋅ Dongliang Xu
2024 Poster

