Curriculum Vitae of Shihao Cheng (程世豪)
Contact Information
| Name | Shihao Cheng |
| Professional Title | M.S. Student, Wuhan University |
| Email | shihaocheng@whu.edu.cn |
| Phone | +86 19819662573 |
| Location | Faculty of Information Science, Wuhan University, Wuhan, Hubei 430072 |
Professional Summary
M.S. student at LIESMARS, Wuhan University. Research focuses on multimodal understanding and generation, including audio-video joint generation, streaming video world models, and dark-video understanding. Publications at CVPR (Highlight), ECCV, and IEEE T-CSVT.
Experience
-
2026 - Present Remote / Beijing, China
Research Intern
Tencent Hunyuan
Researching agentic video world models that integrate reasoning and generation.
- Streaming audio-visual generation with semantic-temporal alignment.
- Hierarchical World State Memory for long-range consistency.
Education
-
2024 - 2027 Wuhan, Hubei, China
M.S.
Wuhan University
Communication and Information Systems
- Advised by Prof. Zhigang Tu at LIESMARS.
- Research focus: multimodal understanding and generation.
-
2020 - 2024 Harbin, Heilongjiang, China
B.S.
Harbin Institute of Technology
Information Engineering
- Class Rank: 1 / 29.
- National Scholarship, 2023 (¥8,000, Top 1%).
- Provincial Outstanding Graduate, 2024 (Top 1%).
Awards
-
2023 National Scholarship
Ministry of Education of the People's Republic of China
¥8,000. Top 1%.
-
2024 Provincial Outstanding Graduate
Heilongjiang Province
Top 1%.
Publications
-
2026 Unison: Harmonizing Motion, Speech, and Sound for Human-Centric Audio-Video Generation
ECCV 2026
Addressed interference between speech and sound effects (SFX), as well as motion-audio desynchronization, in human-centric audio-video generation.
-
2026 InteractiveAvatar: Real-Time Streaming Video Generation for Consistent and Intent-Aware Avatars
ECCV 2026
Addressed intent-aware interaction understanding and long-term semantic drift in real-time, infinite-length streaming video generation.
-
2026 GeoMMBench and GeoMMAgent: Toward Expert-Level Multimodal Intelligence in Geoscience and Remote Sensing
CVPR 2026 (Highlight)
Built the first expert-level multimodal benchmark for geoscience and remote sensing, together with a multi-agent framework for interdisciplinary, multi-sensor remote-sensing scenarios.
-
2025 OwlSight: A Robust Illumination Adaptation Framework for Dark Video Human Action Recognition
IEEE T-CSVT, 2025 (IF 11.1)
Addressed inefficient use of brightness information when jointly performing low-light enhancement and human action recognition.