cv

Curriculum Vitae of Shihao Cheng (程世豪).

Contact Information

Name Shihao Cheng
Professional Title M.S. Student, Wuhan University
Email shihaocheng@whu.edu.cn
Phone +86 19819662573
Location Faculty of Information Science, Wuhan University, Wuhan, Hubei 430072

Professional Summary

M.S. student at LIESMARS, Wuhan University. Research interests center on multimodal understanding and generation, with work on audio-video joint generation, streaming video world models, and dark-video understanding. Publications at CVPR (Highlight), ECCV, and IEEE T-CSVT.

Experience

  • 2026 - Present

    Remote / Beijing, China

    Research Intern
    Tencent Hunyuan
    Research on agentic video world modeling that synergizes reasoning and generation.
    • Streaming audio-visual generation with semantic-temporal alignment.
    • Hierarchical World State Memory for long-range consistency.

Education

  • 2024 - 2027

    Wuhan, Hubei, China

    M.S.
    Wuhan University
    Communication and Information Systems
    • Advised by Prof. Zhigang Tu at LIESMARS.
    • Research focus on multimodal understanding and generation.
  • 2020 - 2024

    Harbin, Heilongjiang, China

    B.S.
    Harbin Institute of Technology
    Information Engineering
    • Class Rank: 1 / 29.
    • National Scholarship, 2023 (¥8,000 RMB, Top 1%).
    • Provincial Outstanding Graduate, 2024 (Top 1%).

Awards

  • 2023
    National Scholarship
    Ministry of Education of the People's Republic of China

    ¥8,000 RMB. Top 1%.

  • 2024
    Provincial Outstanding Graduate
    Heilongjiang Province

    Top 1%.

Publications

  • 2026
    Unison: Harmonizing Motion, Speech, and Sound for Human-Centric Audio-Video Generation
    ECCV 2026

    Resolved speech-SFX interference and motion-audio desynchronization in audio-video generation.

  • 2026
    InteractiveAvatar: Real-Time Streaming Video Generation for Consistent and Intent-Aware Avatars
    ECCV 2026

    Addressed interaction understanding and long-term semantic drift in infinite-length video generation.

  • 2026
    GeoMMBench and GeoMMAgent: Toward Expert-Level Multimodal Intelligence in Geoscience and Remote Sensing
    CVPR 2026 (Highlight)

    Built the first expert-level multimodal benchmark and a multi-agent framework for interdisciplinary, multi-sensor remote-sensing scenarios.

  • 2025
    OwlSight: A Robust Illumination Adaptation Framework for Dark Video Human Action Recognition
    IEEE T-CSVT, 2025 (IF 11.1)

    Addressed the inefficient use of brightness information in joint low-light enhancement and action recognition.

Skills

Research (Advanced): Multimodal Generation, Audio-Video Generation, Streaming Video, World Models, MLLMs, Video Understanding
Programming (Advanced): Python, PyTorch, CUDA, Deep Learning Frameworks

Languages

Chinese: Native speaker
English: Fluent (research reading & writing)

Interests

Research Topics: Audio-Video Joint Generation, Streaming Long-Video Generation, World Models, Multimodal Alignment, Video World Modeling