About Me

Introduction

Hello! I’m Sicong Leng, a second-year Ph.D. student at Nanyang Technological University. I am part of the Alibaba-NTU Talent Programme and jointly supervised by Prof. Lu Shijian (Visual Intelligence Lab / S-Lab) and Dr. Bing Lidong (Alibaba DAMO Academy).

I specialize in deep learning with a focus on multimodal research, especially vision and language. Feel free to reach out for collaborations, questions, or just to chat!

News

  • [24.10] Inf-CLIP has been released! Check out our project here.
  • [24.10] CMM has been released! Check out our project here.
  • [24.09] 1 paper accepted by NeurIPS 2024! Congratulations to the co-authors!
  • [24.06] VideoLLaMA 2 has been released! Check out our paper and code here.
  • [24.04] VCD has been selected as a CVPR 2024 Highlight!
  • [24.03] 3 papers accepted by CVPR 2024! Congratulations to the co-authors!
  • [23.11] VCD has been released! Check out our paper and code here.
  • [23.08] We presented our work at the Nvidia Internal Technical Sharing!
  • [23.08] We presented our work at the AAAI 2023 Summer Symposium Series!
  • [23.07] Tell2Design has received the Area Chair Award and a Best Paper Nomination at ACL 2023!
  • [23.06] Our paper Tell2Design has been accepted by ACL 2023 as a long oral paper!

Awards

  • ACL 2023 Area Chair Award
  • ACL 2023 Best Paper Nomination

Selected Publications

  • Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss [paper] [code]
    • Zesen Cheng, Hang Zhang, Kehan Li, Sicong Leng, Zhiqiang Hu, Fei Wu, Deli Zhao, Xin Li, Lidong Bing
    • ArXiv 2024
  • The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio [paper] [project] [code]
    • Sicong Leng*, Yun Xing*, Zesen Cheng*, Yang Zhou, Hang Zhang, Xin Li, Deli Zhao, Shijian Lu, Chunyan Miao, Lidong Bing
    • ArXiv 2024
  • VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs [paper] [code]
    • Zesen Cheng*, Sicong Leng*, Hang Zhang*, Yifei Xin*, Xin Li*, Guanzheng Chen, Yongxin Zhu, Wenqi Zhang, Ziyang Luo, Deli Zhao, Lidong Bing
    • ArXiv 2024
  • Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding [paper] [code]
    • Sicong Leng*, Hang Zhang*, Guanzheng Chen, Xin Li, Shijian Lu, Chunyan Miao, Lidong Bing
    • CVPR 2024 $\color{red}{\text{(Highlight)}}$
  • Tell2Design: A Dataset for Language-Guided Floor Plan Generation [paper] [code]
    • Sicong Leng*, Yang Zhou*, Mohammed Haroon Dupty, Wee Sun Lee, Sam Conrad Joyce, Wei Lu
    • ACL 2023 $\color{red}{\text{(Area Chair Award) (Best Paper Nomination)}}$

Please refer to Google Scholar for the full list of publications.

Service

  • Reviewer:
    • 2025: NAACL
    • 2024: EMNLP, WACV
    • 2023: EMNLP, CoNLL, NeurIPS, ACL
  • Program Committee:
    • EMNLP 2023 Industry Track

Work Experience

  • Aug 2021 - Aug 2023: Research Assistant
    • StatNLP Lab, Singapore University of Technology and Design
    • Research on NLP and Multi-modal Learning
    • Supervisor: Professor Lu Wei

Website last updated on 25th October 2024.