NOW
Research Intern @ Tencent Hunyuan
Streaming joint audio-video generation.
OPEN TO
Ph.D. Fall 2027
Multimodal generation, agentic systems, real-time generative agents.

Shihao CHENG (程世豪)

M.S. student in Communication and Information Systems, Wuhan University

Hi! I am Shihao Cheng (程世豪), an M.S. student at Wuhan University advised by Prof. Zhigang Tu (expected graduation: June 2027). I received my B.S. degree in Information Engineering from Harbin Institute of Technology in 2024, ranking 1st out of 29 in my class.

I am currently a Research Intern at Tencent Hunyuan, working on streaming joint audio-video generation. My research centers on multimodal generation and agentic systems, with a long-term interest in building real-time multimodal generative agents that produce coherent, controllable, and interactive content in response to evolving user intent.

My work has been published in top-tier venues including CVPR 2026 (Highlight), ECCV 2026, and T-CSVT.

I am looking for Ph.D. positions (Fall 2027) and RA opportunities in multimodal AI. I welcome discussions about potential fit with your research group.

News

Jun 2026 🎉 Unison and InteractiveAvatar were accepted by ECCV 2026!
Apr 2026 🎬 Started as a Research Intern at Tencent Hunyuan, working on agentic streaming video generation and world models.
Mar 2026 🎉 GeoMMAgent was accepted by CVPR 2026 as a Highlight!
Jun 2025 🎉 OwlSight was accepted by IEEE T-CSVT (IF 11.1).
Sep 2024 🎓 Started M.S. studies at Wuhan University, advised by Prof. Zhigang Tu.

Selected Publications

  1. ECCV 2026
    Shihao Cheng, Jiaxu Zhang, Quanyue Song, and 6 more authors
    First Author
    In European Conference on Computer Vision (ECCV), 2026
  2. ECCV 2026
    InteractiveAvatar: Real-Time Streaming Video Generation for Consistent and Intent-Aware Avatars
    Shihao Cheng and others
    In European Conference on Computer Vision (ECCV), 2026
  3. CVPR 2026
    Aoran Xiao, Shihao Cheng, Yonghao Xu, and 3 more authors
    Co-First Author
    In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Highlight, 2026
  4. T-CSVT 2025
    Shihao Cheng, Jinlu Zhang, Yue Liu, and 2 more authors
    First Author
    IEEE Transactions on Circuits and Systems for Video Technology, 2025

Experience

2026.04 – Present, Shenzhen
Research Intern, Tencent Hunyuan · Streaming joint audio-video generation.
Real-time multimodal generative systems that align speech, sound, and motion under streaming, online conditions.

Education

2024.09 – 2027.06 (expected), Wuhan
M.S. in Communication and Information Systems, Wuhan University.
Research advisor: Prof. Zhigang Tu.
2020.09 – 2024.06, Harbin
B.S. in Information Engineering, Harbin Institute of Technology.
Avg. 92.3 / 100, ranked 1 / 29 in class.

Awards & Honors

  • 2024: Provincial Outstanding Graduate (Top 1%).
  • 2023: National Scholarship (highest honor for undergraduate students in China, 8,000 RMB, Top 1%).