For a complete list of my publications, please visit my Google Scholar profile.

2025

  • Group-Level Data Selection for Efficient Pretraining
    Zichun Yu, Fei Peng, Jie Lei, Arnold Overwijk, Wen-tau Yih, Chenyan Xiong
    NeurIPS 2025
    Paper Code

  • Montessori-Instruct: Generate Influential Training Data Tailored for Student Learning
    Xiaochuan Li, Zichun Yu, Chenyan Xiong
    ICLR 2025
    Paper Code

2024

  • MATES: Model-Aware Data Selection for Efficient Pretraining with Data Influence Models
    Zichun Yu, Spandan Das, Chenyan Xiong
    NeurIPS 2024
    Paper Code

2023

  • An In-depth Look at Gemini’s Language Abilities
    Syeda Nahida Akter*, Zichun Yu*, Aashiq Muhamed*, Tianyue Ou*, Alex Bäuerle, Ángel Alexander Cabrera, Krish Dholakia, Chenyan Xiong, Graham Neubig
    Preprint
    Paper Code

  • Augmentation-Adapted Retriever Improves Generalization of Language Models as Generic Plug-In
    Zichun Yu, Chenyan Xiong, Shi Yu, Zhiyuan Liu
    ACL 2023
    Paper Code

2022

  • Automatic Label Sequence Generation for Prompting Sequence-to-sequence Models
    Zichun Yu, Tianyu Gao, Zhengyan Zhang, Yankai Lin, Zhiyuan Liu, Maosong Sun, Jie Zhou
    COLING 2022
    Paper Code