Andy J Yang

杨加锋

Selected publications (see also my Google Scholar page):

Length generalization bounds for transformers
Andy Yang, Pascal Bergsträßer, Georg Zetzsche, David Chiang, Anthony W. Lin.
arXiv preprint.
Probability Distributions Computed by Autoregressive Transformers
Andy Yang, Anej Svete, Jiaoda Li, Anthony Widjaja Lin, Jonathan Rawski, Ryan Cotterell, David Chiang.
In Proc. ICLR. 2026.
The Transformer Cookbook
Andy Yang, Christopher Watson, Anton Xue, Satwik Bhattamishra, Jose Llarena, William Merrill, Emile Dos Santos Ferreira, Anej Svete, and David Chiang.
Transactions on Machine Learning Research, January 2026.
Knee-deep in C-RASP: a transformer depth hierarchy
Andy Yang, Michaël Cadilhac, and David Chiang.
In Proc. NeurIPS 38. 2025.
Simulating hard attention using soft attention
Andy Yang, Lena Strobl, David Chiang, and Dana Angluin.
Transactions of the Association for Computational Linguistics, 2025.
A Formal Framework for Understanding Length Generalization in Transformers
Xinting Huang, Andy Yang, Satwik Bhattamishra, Yash Sarrof, Andreas Krebs, Hattie Zhou, Preetum Nakkiran, Michael Hahn.
In Proc. ICLR. 2025.
Masked hard-attention transformers recognize exactly the star-free languages
Andy Yang, David Chiang, and Dana Angluin.
In Proc. NeurIPS 37, 10202–10235. 2024.