News
- We recently introduced Nemotron Nano V2, a 9B hybrid model that delivers competitive or superior accuracy on reasoning benchmarks while achieving up to 6× higher inference throughput in reasoning tasks (e.g., 8k input and 16k output tokens). The model builds on our Mamba-based Hybrid LLM work.
- Our token-efficient long video model for multimodal LLMs (STORM) is on arXiv. It achieves more than a 5% improvement on MLVU and LongVideoBench over the state of the art while reducing computation costs by up to 8× and decoding latency by 2.4-2.9×. Check the project page for more details!
- We introduce efficient 2D parallel sequence modeling for image classification and generation. The paper has been accepted to CVPR 2025!
- Another Mamba-Transformer Hybrid LLM is on arXiv! Check the blog post. The paper has been accepted as a Spotlight at ICLR 2025!
- We’ve released a new 8B Mamba-based Hybrid LLM! The checkpoints and code are released as part of NVIDIA’s Megatron-LM project. I gave a talk at KAIST.
Research Interests
- Recurrent Neural Networks (RNNs), State-Space Models (SSMs), Linear RNNs
- Sequence Learning, Spatio-Temporal Learning
Selected Projects
- Co-authored with many colleagues at NVIDIA (incl. W. Byeon), “NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model”, arXiv, 2025
- Co-authored with many colleagues at NVIDIA (incl. W. Byeon), “Nemotron-H: A Family of Accurate and Efficient Hybrid Mamba-Transformer Models”, arXiv, 2025
- J Jiang, X Li, Z Liu, M Li, G Chen, Z Li, D Huang, G Liu, Z Yu, K Keutzer, S Ahn, J Kautz, H Yin, Y Lu, S Han, W Byeon, “Token-Efficient Long Video Understanding for Multimodal LLMs”, arXiv, 2025
- H Wang, W Byeon, J Xu, J Gu, KC Cheung, X Wang, K Han, J Kautz, S Liu, “Parallel Sequence Modeling via Generalized Spatial Propagation Network”, CVPR, 2025
- X Dong, Y Fu, S Diao, W Byeon, Z Chen, A S Mahabaleshwarkar, S Liu, M Keirsbilck, M Chen, Y Suhara, Y Lin, J Kautz, P Molchanov, “Hymba: A Hybrid-head Architecture for Small Language Models”, ICLR, 2025
- R Waleffe, W Byeon, D Riach, B Norick, V Korthikanti, T Dao, A Gu, A Hatamizadeh, S Singh, D Narayanan, G Kulshreshtha, V Singh, J Casper, J Kautz, M Shoeybi, B Catanzaro, “An Empirical Study of Mamba-based Language Models”, arXiv, 2024