Nemotron-Flash: Towards Latency-Optimal Hybrid Small Language Models
Yonggan Fu*, Xin Dong*, Shizhe Diao, Matthijs Van keirsbilck, Hanrong Ye, Wonmin Byeon , Yashaswi Karnati, Lucas Liebenwein, Hannah Zhang, Nikolaus Binder, Maksim Khadkevich, Alexander Keller, Jan Kautz, Yingyan (Celine) Lin, Pavlo Molchanov | NeruIPS 2025