About Me
Introduction
Hello! I’m Sicong Leng, a 3rd-year Ph.D. student at Nanyang Technological University. I am currently in the Alibaba-NTU Talent Programme, jointly supervised by Prof. Lu Shijian (Visual-Intelligence Lab/S-Lab) and Dr. Bing Lidong (Alibaba DAMO Academy).
I specialize in Deep Learning with a focus on Multimodality and Embodied AI research, especially Vision+Language and Vision+Language+Action. Feel free to reach out to me for collaborations, questions, or just to chat!
News
- [25.09] We present MMR1 with open-source data, code, and models! Check out our paper here.
- [25.09] 2 papers accepted by NeurIPS 2025! One was accepted as an oral paper! Congratulations to the co-authors!
- [25.08] RynnVLA-001 is released! Check out our paper and code here.
- [25.06] Inf-CLIP has been selected as a CVPR 2025 Highlight!
- [25.03] MMR1 has been released! Check out our code here.
- [25.03] 2 papers accepted by CVPR 2025! Congratulations to the co-authors!
- [25.01] VideoLLaMA 3 has been released! Check out our paper and code here.
- [24.10] Inf-CLIP has been released! Check out our project here.
- [24.10] CMM has been released! Check out our project here.
- [24.09] 1 paper accepted by NeurIPS 2024! Congratulations to the co-authors!
- [24.06] VideoLLaMA 2 has been released! Check out our paper and code here.
- [24.04] VCD has been selected as a CVPR 2024 Highlight!
- [24.03] 3 papers accepted by CVPR 2024! Congratulations to the co-authors!
- [23.11] VCD has been released! Check out our paper and code here.
- [23.08] We presented our work at an NVIDIA internal technical sharing session!
- [23.08] We presented our work at the AAAI 2023 Summer Symposium Series!
- [23.07] Tell2Design has received the Area Chair Award and Best Paper Nomination at ACL 2023!
- [23.06] Our paper Tell2Design has been accepted by ACL 2023 as a long oral paper!
Awards
- NeurIPS 2025 Oral
- CVPR 2025 Highlight
- CVPR 2024 Highlight
- ACL 2023 Area Chair Award
- ACL 2023 Best Paper Nomination
- ACL 2023 Oral
Selected Publications
- MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources [paper] [code] [data] [model]
- Sicong Leng*, Jing Wang*, Jiaxi Li*, Hao Zhang*, Zhiqiang Hu, Boqiang Zhang, Yuming Jiang, Hang Zhang, Xin Li, Lidong Bing, Deli Zhao, Wei Lu, Yu Rong, Aixin Sun, Shijian Lu
- ArXiv 2025
- VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding [paper] [code]
- Boqiang Zhang*, Kehan Li*, Zesen Cheng*, Zhiqiang Hu*, Yuqian Yuan*, Guanzheng Chen*, Sicong Leng*, Yuming Jiang*, Hang Zhang*, Xin Li*, Peng Jin, Wenqi Zhang, Fan Wang, Lidong Bing, Deli Zhao
- ArXiv 2025
- Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss [paper] [code]
- Zesen Cheng, Hang Zhang, Kehan Li, Sicong Leng, Zhiqiang Hu, Fei Wu, Deli Zhao, Xin Li, Lidong Bing
- CVPR 2025 $\color{red}{\text{(Highlight)}}$
- The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio [paper] [project] [code]
- Sicong Leng*, Yun Xing*, Zesen Cheng*, Yang Zhou, Hang Zhang, Xin Li, Deli Zhao, Shijian Lu, Chunyan Miao, Lidong Bing
- NeurIPS 2025
- VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs [paper] [code]
- Zesen Cheng*, Sicong Leng*, Hang Zhang*, Yifei Xin*, Xin Li*, Guanzheng Chen, Yongxin Zhu, Wenqi Zhang, Ziyang Luo, Deli Zhao, Lidong Bing
- ArXiv 2024
- Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding [paper] [code]
- Sicong Leng*, Hang Zhang*, Guanzheng Chen, Xin Li, Shijian Lu, Chunyan Miao, Lidong Bing
- CVPR 2024 $\color{red}{\text{(Highlight)}}$
- Tell2Design: A Dataset for Language-Guided Floor Plan Generation [paper] [code]
- Sicong Leng*, Yang Zhou*, Mohammed Haroon Dupty, Wee Sun Lee, Sam Conrad Joyce, Wei Lu
- ACL 2023 $\color{red}{\text{(Area Chair Award) (Best Paper Nomination)}}$
Please refer to Google Scholar for the full list of publications.
Service
- Reviewer:
- 2025: TPAMI, TMM, NAACL, EMNLP
- 2024: EMNLP, WACV
- 2023: EMNLP, CoNLL, NeurIPS, ACL
- Program Committee:
- EMNLP 2023 Industry Track
Work experience
- Aug 2021 - Aug 2023: Research Assistant
- StatNLP Lab, Singapore University of Technology and Design
- Research on NLP and Multi-modal Learning
- Supervisor: Prof. Lu Wei
Website last updated on 26th September 2025.