NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model

Co-authored with many colleagues at NVIDIA (incl. Wonmin Byeon) | arXiv

[arxiv]  [blog]  [Hugging Face] 

Abstract

We introduce Nemotron-Nano-9B-v2, a hybrid Mamba-Transformer language model designed to increase throughput for reasoning workloads while achieving state-of-the-art accuracy compared to similarly-sized models. Nemotron-Nano-9B-v2 builds on the Nemotron-H architecture, in which the majority of the self-attention layers in the common Transformer architecture are replaced with Mamba-2 layers, to achieve improved inference speed when generating the long thinking traces needed for reasoning. We create Nemotron-Nano-9B-v2 by first pre-training a 12-billion-parameter model (Nemotron-Nano-12B-v2-Base) on 20 trillion tokens using an FP8 training recipe. After aligning Nemotron-Nano-12B-v2-Base, we employ the Minitron strategy to compress and distill the model with the goal of enabling inference on up to 128k tokens on a single NVIDIA A10G GPU (22GiB of memory, bfloat16 precision). Compared to existing similarly-sized models (e.g., Qwen3-8B), we show that Nemotron-Nano-9B-v2 achieves on-par or better accuracy on reasoning benchmarks while achieving up to 6x higher inference throughput in reasoning settings like 8k input and 16k output tokens. We are releasing Nemotron-Nano-9B-v2, Nemotron-Nano12B-v2-Base, and Nemotron-Nano-9B-v2-Base checkpoints along with the majority of our pre- and post-training datasets on Hugging Face.

@misc{nvidia2025nvidianemotronnano2,
      title={NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model}, 
      author={NVIDIA and : and Aarti Basant and Abhijit Khairnar and Abhijit Paithankar and Abhinav Khattar and Adithya Renduchintala and Aditya Malte and Akhiad Bercovich and Akshay Hazare and Alejandra Rico and Aleksander Ficek and Alex Kondratenko and Alex Shaposhnikov and Alexander Bukharin and Ali Taghibakhshi and Amelia Barton and Ameya Sunil Mahabaleshwarkar and Amy Shen and Andrew Tao and Ann Guan and Anna Shors and Anubhav Mandarwal and Arham Mehta and Arun Venkatesan and Ashton Sharabiani and Ashwath Aithal and Ashwin Poojaryå and Ayush Dattagupta and Balaram Buddharaju and Banghua Zhu and Barnaby Simkin and Bilal Kartal and Bita Darvish Rouhani and Bobby Chen and Boris Ginsburg and Brandon Norick and Brian Yu and Bryan Catanzaro and Charles Wang and Charlie Truong and Chetan Mungekar and Chintan Patel and Chris Alexiuk and Christian Munley and Christopher Parisien and Dan Su and Daniel Afrimi and Daniel Korzekwa and Daniel Rohrer and Daria Gitman and David Mosallanezhad and Deepak Narayanan and Dima Rekesh and Dina Yared and Dmytro Pykhtar and Dong Ahn and Duncan Riach and Eileen Long and Elliott Ning and Eric Chung and Erick Galinkin and Evelina Bakhturina and Gargi Prasad and Gerald Shen and Haifeng Qian and Haim Elisha and Harsh Sharma and Hayley Ross and Helen Ngo and Herman Sahota and Hexin Wang and Hoo Chang Shin and Hua Huang and Iain Cunningham and Igor Gitman and Ivan Moshkov and Jaehun Jung and Jan Kautz and Jane Polak Scowcroft and Jared Casper and Jian Zhang and Jiaqi Zeng and Jimmy Zhang and Jinze Xue and Jocelyn Huang and Joey Conway and John Kamalu and Jonathan Cohen and Joseph Jennings and Julien Veron Vialard and Junkeun Yi and Jupinder Parmar and Kari Briski and Katherine Cheung and Katherine Luna and Keith Wyss and Keshav Santhanam and Kezhi Kong and Krzysztof Pawelec and Kumar Anik and Kunlun Li and Kushan Ahmadian and Lawrence McAfee and Laya Sleiman and Leon Derczynski and Luis Vega and Maer Rodrigues de Melo and Makesh Narsimhan Sreedhar and Marcin Chochowski and Mark Cai and Markus Kliegl and Marta Stepniewska-Dziubinska and Matvei Novikov and Mehrzad Samadi and Meredith Price and Meriem Boubdir and Michael Boone and Michael Evans and Michal Bien and Michal Zawalski and Miguel Martinez and Mike Chrzanowski and Mohammad Shoeybi and Mostofa Patwary and Namit Dhameja and Nave Assaf and Negar Habibi and Nidhi Bhatia and Nikki Pope and Nima Tajbakhsh and Nirmal Kumar Juluru and Oleg Rybakov and Oleksii Hrinchuk and Oleksii Kuchaiev and Oluwatobi Olabiyi and Pablo Ribalta and Padmavathy Subramanian and Parth Chadha and Pavlo Molchanov and Peter Dykas and Peter Jin and Piotr Bialecki and Piotr Januszewski and Pradeep Thalasta and Prashant Gaikwad and Prasoon Varshney and Pritam Gundecha and Przemek Tredak and Rabeeh Karimi Mahabadi and Rajen Patel and Ran El-Yaniv and Ranjit Rajan and Ria Cheruvu and Rima Shahbazyan and Ritika Borkar and Ritu Gala and Roger Waleffe and Ruoxi Zhang and Russell J. Hewett and Ryan Prenger and Sahil Jain and Samuel Kriman and Sanjeev Satheesh and Saori Kaji and Sarah Yurick and Saurav Muralidharan and Sean Narenthiran and Seonmyeong Bak and Sepehr Sameni and Seungju Han and Shanmugam Ramasamy and Shaona Ghosh and Sharath Turuvekere Sreenivas and Shelby Thomas and Shizhe Diao and Shreya Gopal and Shrimai Prabhumoye and Shubham Toshniwal and Shuoyang Ding and Siddharth Singh and Siddhartha Jain and Somshubra Majumdar and Soumye Singhal and Stefania Alborghetti and Syeda Nahida Akter and Terry Kong and Tim Moon and Tomasz Hliwiak and Tomer Asida and Tony Wang and Tugrul Konuk and Twinkle Vashishth and Tyler Poon and Udi Karpas and Vahid Noroozi and Venkat Srinivasan and Vijay Korthikanti and Vikram Fugro and Vineeth Kalluru and Vitaly Kurin and Vitaly Lavrukhin and Wasi Uddin Ahmad and Wei Du and Wonmin Byeon and Ximing Lu and Xin Dong and Yashaswi Karnati and Yejin Choi and Yian Zhang and Ying Lin and Yonggan Fu and Yoshi Suhara and Zhen Dong and Zhiyu Li and Zhongbo Zhu and Zijia Chen},
      year={2025},
      eprint={2508.14444},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2508.14444}, 
}