News
- We are hiring interns! If you are a PhD student working on SSMs, Mamba, or alternative architectures for LLMs or multimodal LLMs, please email me your CV.
- Another Mamba-Transformer hybrid LLM is on arXiv! Check out the blog post.
- We have released a new 8B Mamba-based hybrid LLM! The checkpoints and code are also released as part of NVIDIA’s Megatron-LM project. I gave a talk at KAIST.
- 1 paper accepted to ECCV 2024.
- 1 paper accepted to CVPR 2024.
- 1 paper accepted to NeurIPS 2023.
Research Interests
- Recurrent Neural Networks (RNNs), State-Space Models (SSMs), Linear RNNs
- Sequence Learning, Spatio-Temporal Learning
- Predictive Learning, Few-shot Learning, Lifelong Learning
Selected Projects
- X Dong, Y Fu, S Diao, W Byeon, Z Chen, A S Mahabaleshwarkar, S Liu, M Keirsbilck, M Chen, Y Suhara, Y Lin, J Kautz, P Molchanov, “Hymba: A Hybrid-head Architecture for Small Language Models”, arXiv, 2024
- R Waleffe, W Byeon, D Riach, B Norick, V Korthikanti, T Dao, A Gu, A Hatamizadeh, S Singh, D Narayanan, G Kulshreshtha, V Singh, J Casper, J Kautz, M Shoeybi, B Catanzaro, “An Empirical Study of Mamba-based Language Models”, arXiv, 2024
- J T.H. Smith, S De Mello, J Kautz, S W. Linderman, W Byeon, “Convolutional State Space Models for Long-Range Spatiotemporal Modeling”, NeurIPS, 2023