Welcome to Yichuan Mo’s Homepage! 👏

I am a fourth-year PhD candidate at the ZERO Lab in the School of Intelligence Science and Technology at Peking University, advised by Prof. Yisen Wang. I received my bachelor's degree from Shanghai Jiao Tong University in 2022, where I was co-advised by Prof. Shilin Wang and Prof. Junchi Yan.

My research focuses on the safety alignment of deep learning models, particularly Large Language Models and diffusion-based image/text generation models, and my work has received over 1k citations. More broadly, I am interested in exploring next-generation language model paradigms, including model architectures, post-training, and decoding strategies. Recently, I have been studying language diffusion models and investigating their potential integration with conventional auto-regressive pipelines.

I am an optimistic and kind person who enjoys finding happiness in everyday life. I am particularly passionate about sports, especially badminton, swimming, and running. My long-term aspiration is to build safe Artificial General Intelligence (AGI) systems that can continuously and reliably benefit all humanity. In the face of rapidly evolving AI technologies, I stay humble, learn from fellow researchers, and enjoy embracing new challenges.

I expect to graduate in June 2027 and am currently seeking job opportunities starting in summer 2027, including roles in large language model training and quant research as well as academic positions. Feel free to contact me at mo666666@stu.pku.edu.cn!

🎓 Education

Peking University
Sep. 2022 – Present
Ph.D. Candidate
School of Intelligence Science and Technology
Shanghai Jiao Tong University
Sep. 2018 – Jun. 2022
B.Eng.
School of Computer Science

💯 Academic Performance

Undergraduate

GPA: 90.93/100 (or 3.94/4.3), Rank: 2/128 (Top 1.6%)
Courses: 55.81% above A, 24.42% above A+

Graduate

GPA: 3.88/4.0
Courses: 71.4% above A

🏆 Selected Awards

Undergraduate

2019.10 Merit Student of Shanghai Jiao Tong University
2019.12 National Scholarship (Top 1% in CE Dept.)
2020.12 National Scholarship (Top 1% in CE Dept.)
2021.12 Weichai Power Scholarship
2022.05 Outstanding Dormitory
2022.06 Outstanding Graduate of Shanghai (Top 3%)
2022.06 Outstanding Undergraduate Thesis, Shanghai Jiao Tong University (Top 1%, 1 out of 128 students in CE Dept.)

Graduate

2022.12 Sunshine Dormitory
2023.09 Xiaomi Scholarship (First-Class)
2023.09 Merit Student of Peking University
2024.09 Yuehua Luo Scholarship
2024.09 Outstanding Research Award in Peking University
2025.04 Nomination for Academic Star (Five graduate students in AI Dept.)
2025.11 Taotian Scholarship (Eight graduate students in AI Dept.)
2026.04 Optiver AI PhD Scholarship (Six PhD students in China)

📝 Papers

(* Equal Contribution and # Student First Author)

Accepted

TrustLDM: Benchmarking Trustworthiness in Language Diffusion Model
Yichuan Mo^*, Yukun Jiang^*, Yanbo Shi^*, Mingjie Li^*, Michael Backes, Yang Zhang, and Yisen Wang
ICLR 2026 Trustworthy Workshop (First benchmark for evaluating trustworthiness of language diffusion models)
[PDF]
Decoding Large Language Diffusion Models with Foreseeing Movement
Yichuan Mo^*, Quan Chen^*, Mingjie Li, Zeming Wei, and Yisen Wang
ICLR 2026 DeLTa Workshop
[PDF]
Jailbreak and Guard Aligned Language Models with Only Few In-Context Demonstrations
Zeming Wei, Yifei Wang, Li Ang, Yichuan Mo, and Yisen Wang
TPAMI 2026 (Adopted at scale by Anthropic)
[PDF] [Code] [Anthropic Blog]
On the Adversarial Transferability of Generalized “Skip Connections”
Yisen Wang, Yichuan Mo^#, Dongxian Wu, Mingjie Li, Xingjun Ma, and Zhouchen Lin
TPAMI 2026 (Journal extension of SGM, original paper cited 400+ times on Google Scholar)
[PDF] [Code]
Fight Back Against Jailbreaking via Prompt Adversarial Tuning
Yichuan Mo^*, Yuji Wang^*, Zeming Wei, and Yisen Wang
NeurIPS 2024
[PDF] [Code]
TERD: A Unified Framework for Safeguarding Diffusion Models Against Backdoors
Yichuan Mo, Hui Huang, Mingjie Li, Ang Li, and Yisen Wang
ICML 2024 (First backdoor input detection method for diffusion models)
[PDF] [Code]
PID: Prompt-Independent Data Protection Against Latent Diffusion Models
Ang Li, Yichuan Mo, Mingjie Li, and Yisen Wang
ICML 2024
[PDF] [Code]
When Adversarial Training Meets Vision Transformers: Recipes from Training to Architecture
Yichuan Mo, Dongxian Wu, Yifei Wang, Yiwen Guo, and Yisen Wang
NeurIPS 2022 (Spotlight, Top 5%) (First work to improve adversarial robustness of ViTs)
[PDF] [Code] [Comment]
Improving Generative Adversarial Networks via Adversarial Learning in Latent Space
Yang Li, Yichuan Mo, Liangliang Shi, Junchi Yan, Xiaolu Zhang, and Jun Zhou
NeurIPS 2022 (Spotlight, Top 5%)
[PDF] [Code]
DICE: Domain-attack Invariant Causal Learning for Improved Data Privacy Protection and Adversarial Robustness
Qibing Ren, Yiting Chen, Yichuan Mo, Qitian Wu, and Junchi Yan
SIGKDD 2022
[PDF] [Code]
Multi-Task Learning Improves Synthetic Speech Detection
Yichuan Mo and Shilin Wang
ICASSP 2022
[PDF] [Code]

Preprint

SelfCAD: Protecting Your Efficient Reasoning Capabilities via Self-Cautious Insertion
Taiye Chen, Mingjie Li, Yichuan Mo, Shuo Feng, and Yisen Wang
Preprint 2026
[PDF]
Generalist++: A Meta-learning Framework for Mitigating Trade-off in Adversarial Training
Yisen Wang, Yichuan Mo^#, Hongjun Wang, Junyi Li, and Zhouchen Lin
arXiv 2025
[PDF]
Are Smarter LLMs Safer? Exploring Safety-Reasoning Trade-offs in Prompting and Fine-Tuning
Ang Li, Yichuan Mo, Mingjie Li, Yifei Wang, and Yisen Wang
arXiv 2025 (First to reveal the safety–reasoning capability trade-off)
[PDF]

🤖 Open-source Models

Baichuan-M3: Modeling Clinical Inquiry for Reliable Medical Decision-Making
World's Top Medical AI Model (Jan. 2026)
[PDF] [github] [huggingface] [blog]
Baichuan-M2: Scaling Medical Capability with Large Verifier System
World's Top Open-Source Medical AI Model (Aug. 2025)
[PDF] [github] [huggingface] [blog]

🛠️ Academic Service

Reviewer: NeurIPS 2023/2024/2025; ICLR 2024/2025/2026; ICML 2024/2025/2026; CVPR 2025/2026; ICCV 2025; IJCAI 2024; AAAI 2025/2026; AISTATS 2025; ECCV 2026
Top Reviewer of NeurIPS 2023 (Top 10.49%)
Top Reviewer of NeurIPS 2024 (Top 8.60%)
Top Reviewer of NeurIPS 2025 (Top 8.02%)
Notable Reviewer of ICLR 2025 (Top 3%)

🔗 Links

(Alphabetical Order)

My Best Friends: Haoyu Geng, Zetian Jiang, Danning Lao, Yang Li, Chang Liu, Han Lu, Xudong Lu, Qibing Ren, Yuji Wang, Nianzu Yang
My Best Labmates: Jingyi Cui, Tianqi Du, Lizhe Fang, Xiaojun Guo, Mingjie Li, Ykang Li, Yifei Wang, Zeming Wei, Pengyun Yue, Qi Zhang, Yige Zhang
