Publications

You can also find my articles on my Google Scholar profile.

2026

  1. SelfCAD: Protecting Your Efficient Reasoning Capabilities via Self-Cautious Insertion

    Preprint 2026

  2. TrustLDM: Benchmarking Trustworthiness in Language Diffusion Model

    ICLR 2026 Trustworthy Workshop (First benchmark for evaluating trustworthiness of language diffusion models)

  3. Decoding Large Language Diffusion Models with Foreseeing Movement

    ICLR 2026 DeLTa Workshop

  4. Jailbreak and Guard Aligned Language Models with Only Few In-Context Demonstrations

    TPAMI 2026 (Adopted at scale by Anthropic)

  5. On the Adversarial Transferability of Generalized “Skip Connections”

    TPAMI 2026 (Journal extension of SGM; the original paper has been cited 400+ times on Google Scholar)

2025

  1. Generalist++: A Meta-learning Framework for Mitigating Trade-off in Adversarial Training

    arXiv 2025

  2. Are Smarter LLMs Safer? Exploring Safety-Reasoning Trade-offs in Prompting and Fine-Tuning

    arXiv 2025 (First to reveal the safety–reasoning capability trade-off)

2024

  1. Fight Back Against Jailbreaking via Prompt Adversarial Tuning

    NeurIPS 2024

  2. TERD: A Unified Framework for Safeguarding Diffusion Models Against Backdoors

    ICML 2024 (First backdoor input detection method for diffusion models)

  3. PID: Prompt-Independent Data Protection Against Latent Diffusion Models

    ICML 2024

2022

  1. When Adversarial Training Meets Vision Transformers: Recipes from Training to Architecture

    NeurIPS 2022 (Spotlight, Top 5%) (First work to improve adversarial robustness of ViTs)

  2. Improving Generative Adversarial Networks via Adversarial Learning in Latent Space

    NeurIPS 2022 (Spotlight, Top 5%)

  3. DICE: Domain-attack Invariant Causal Learning for Improved Data Privacy Protection and Adversarial Robustness

    SIGKDD 2022

  4. Multi-Task Learning Improves Synthetic Speech Detection

    ICASSP 2022