News
-
Our paper Stateful Token Reduction for Long-Video Hybrid VLMs is now on arXiv. We propose stateful token reduction for long-video VLMs, enabling 3.8–4.2× faster prefilling (time-to-first-token, TTFT) with near-baseline accuracy while using only 25% of the visual tokens. With light finetuning under reduction, accuracy improves further and can even surpass the baseline. Check out the paper!
-
Check out how the AI community is discussing STORM! From technical podcasts to detailed audio summaries, these resources offer a variety of ways to learn about our latest research on efficient video LLMs. You can find the full list of videos here.
-
1 paper accepted to CVPR 2026.
-
3 papers accepted to NeurIPS 2025.
-
We recently introduced Nemotron Nano V2, a 9B hybrid model that delivers competitive or superior accuracy on reasoning benchmarks while achieving up to 6× higher inference throughput on reasoning workloads (e.g., 8k input and 16k output tokens). The model builds on our Mamba-based Hybrid LLM work.
-
Our token-efficient long-video model for multimodal LLMs (STORM) is on arXiv. It achieves more than 5% improvement on MLVU and LongVideoBench over the state of the art while reducing computation cost by up to 8× and decoding latency by 2.4–2.9×. Check the project page for more details!
Research Interests
-
Recurrent Neural Networks (RNNs), State-Space Models (SSMs), Linear RNNs
-
Sequence Learning, Spatio-Temporal Learning
Selected Projects
-
J Jiang*, A S Deshmukh, K Chumachenko, K Sapra, Z Yu, G Liu, A Tao, P Molchanov, J Kautz, W Byeon*, “Stateful Token Reduction for Long-Video Hybrid VLMs”, arXiv, 2026
-
Co-authored with many colleagues at NVIDIA (incl. W. Byeon), “NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model”, arXiv, 2025
-
Co-authored with many colleagues at NVIDIA (incl. W. Byeon), “Nemotron-H: A Family of Accurate and Efficient Hybrid Mamba-Transformer Models”, arXiv, 2025
-
J Jiang, X Li, Z Liu, M Li, G Chen, Z Li, D Huang, G Liu, Z Yu, K Keutzer, S Ahn, J Kautz, H Yin, Y Lu, S Han, W Byeon, “Token-Efficient Long Video Understanding for Multimodal LLMs”, arXiv, 2025