For a complete list of my publications, please visit my Google Scholar profile.
2025
- Group-Level Data Selection for Efficient Pretraining
Zichun Yu, Fei Peng, Jie Lei, Arnold Overwijk, Wen-tau Yih, Chenyan Xiong
NeurIPS 2025
Paper Code
- Montessori-Instruct: Generate Influential Training Data Tailored for Student Learning
Xiaochuan Li, Zichun Yu, Chenyan Xiong
ICLR 2025
Paper Code
2024
- MATES: Model-Aware Data Selection for Efficient Pretraining with Data Influence Models
Zichun Yu, Spandan Das, Chenyan Xiong
NeurIPS 2024
Paper Code
2023
- An In-depth Look at Gemini’s Language Abilities
Syeda Nahida Akter*, Zichun Yu*, Aashiq Muhamed*, Tianyue Ou*, Alex Bäuerle, Ángel Alexander Cabrera, Krish Dholakia, Chenyan Xiong, Graham Neubig
Preprint
Paper Code
- Augmentation-Adapted Retriever Improves Generalization of Language Models as Generic Plug-In
Zichun Yu, Chenyan Xiong, Shi Yu, Zhiyuan Liu
ACL 2023
Paper Code
2022