About Me

I'm currently a second-year PhD student at Dartmouth College, advised by Prof. Yu-Wing Tai. My current research interests lie in multimodal generative models, efficient diffusion models, and multimodal agentic systems.

I received my B.S. in Data Science and Technology from the Hong Kong University of Science and Technology. During my undergraduate studies, I was fortunate to work with Prof. Xiaomeng Li, Prof. Yu-Wing Tai, and Prof. Chi-Keung Tang, and I also had the chance to be an exchange student at EPFL.

I'm also a music producer and a photographer.

Multimodal generative models. I study how to leverage large language models for generation tasks across text, image, video, and 3D modalities.
Efficient diffusion models. I develop efficient methods for high-quality content generation using diffusion-based approaches.
Multimodal agentic systems. I explore how to build autonomous agents that reason and act across multiple modalities.

Selected Publications

AtlasVid: Efficient Ultra-High-Resolution Long Video Generation via Decoupled Global-Local Modeling
Submitted to NeurIPS 2026

Ziyang Mai*, Yuyao Zhang*, Yu-Wing Tai (*equal contribution)

Proposed a framework that decouples global and local modeling for ultra-high-resolution long video generation. It uses temporally scaled RoPE to build a global semantic proxy and hierarchical locality-preserving attention to synthesize high-resolution details, achieving a 60.9× speedup over native 4K generators with resolution-agnostic training.

UltraGen: Efficient Ultra-High-Resolution Image Generation with Hierarchical Local Attention
Submitted to ECCV 2026

Yuyao Zhang*, Yu-Wing Tai

Developed an efficient, scalable ultra-high-resolution (8K) image generation framework. It eliminates the need for high-resolution training data, achieving a >10× inference speedup and significantly lower memory usage compared to FLUX baselines.

HierEdit: Region-Aware Hierarchical Diffusion for Efficient High-Resolution Editing
CVPR 2026

Yuyao Zhang*, Alexander Huang-Menders, Yu-Wing Tai

Developed an efficient, scalable ultra-high-resolution (4K) image editing framework that requires no specialized high-resolution training data. It achieves >5× faster inference and enables ultra-high-resolution edits on which other models fail.

LayerCraft: Enhancing Text-to-Image Generation with CoT Reasoning and Layered Object Integration
NeurIPS 2025
Guarini Graduate Student Travel Award

Yuyao Zhang*, Jinghao Li, Yu-Wing Tai

Proposed an automated framework that uses LLMs for structured text-to-image generation and editing. Introduced ChainArchitect for CoT-based 3D-aware layout generation and an Object-Integration Network (OIN) for seamless and efficient subject-driven inpainting.

Self-prompting Large Vision Models for Few-Shot Medical Image Segmentation
MICCAI 2023 (DART)

Qi Wu*, Yuyao Zhang*, Marawan Elbatel (*equal contribution)

Proposed a self-prompting method for few-shot medical image segmentation using SAM. Demonstrated superior performance over several state-of-the-art medical segmentation baselines.

Education

Dartmouth College
Sep. 2024 – Present

PhD Student in Computer Science · Hanover, NH, USA

Advisor: Prof. Yu-Wing Tai

Hong Kong University of Science and Technology
Sep. 2020 – Jun. 2024

B.Sc. in Data Science and Technology, First Class Honors (Top 10%) · Hong Kong, China

Advisors: Prof. Xiaomeng Li, Prof. Yu-Wing Tai, Prof. Chi-Keung Tang

École polytechnique fédérale de Lausanne (EPFL)
Sep. 2022 – Feb. 2023

Regular Term Exchange · Lausanne, Switzerland

Experience & Teaching

Reviewer

CVPR 2026, ECCV 2024/2026, NeurIPS 2026, IJCV, TPAMI

Teaching Assistant · Dartmouth College
Sep. 2024 – Present

COSC 89/189 (Generative AI), ENGS 106 (Machine Learning), COSC 83/183 (Computer Vision)

Teaching Assistant · HKUST
Sep. 2023 – Dec. 2023

COMP 3311 Database Management Systems

Research Assistant · HKUST
Jun. 2023 – May 2024

Advisors: Prof. Chi-Keung Tang & Prof. Yu-Wing Tai