Biography: I am currently a PhD candidate at Caltech in Computing + Mathematical Sciences advised by Yisong Yue. Before Caltech, I received my Bachelor’s degree in Computer Science from Shanghai Jiao Tong University in 2020.
My research interests primarily lie in advancing generative modeling techniques and exploring their applications to inverse problems. Currently, I particularly focus on diffusion models and their role in solving inverse problems.
I am supported by the PIMCO Data Science Graduate Fellowship.
I enjoy photography in my spare time. You can find my photography portfolio on this website.
Plug-and-play diffusion prior methods have emerged as a promising research direction for solving inverse problems. However, current studies primarily focus on natural image restoration, leaving the performance of these algorithms in scientific inverse problems largely unexplored. To address this gap, we introduce \textsc{InverseBench}, a unified framework that evaluates diffusion models across five distinct scientific inverse problems. These problems present unique structural challenges that differ from existing benchmarks, arising from critical scientific applications such as black hole imaging, seismology, optical tomography, medical imaging, and fluid dynamics. With \textsc{InverseBench}, we benchmark 15 inverse problem algorithms that use plug-and-play diffusion prior methods against strong, domain-specific baselines, offering valuable new insights into the strengths and weaknesses of existing algorithms. We open-source the datasets, pre-trained models, and the codebase to facilitate future research and development.
@inproceedings{zheng2025inversebench,
  title={InverseBench: Benchmarking Plug-and-Play Diffusion Models for Scientific Inverse Problems},
  author={Zheng, Hongkai and Chu, Wenda and Zhang, Bingliang and Wu, Zihui and Wang, Austin and Feng, Berthy and Zou, Caifeng and Sun, Yu and Kovachki, Nikola Borislavov and Ross, Zachary E and Bouman, Katherine and Yue, Yisong},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025},
  url={https://openreview.net/forum?id=U3PBITXNG6},
}
When solving inverse problems, one increasingly popular approach is to use pre-trained diffusion models as plug-and-play priors. This framework can accommodate different forward models without re-training while preserving the generative capability of diffusion models. Despite their success in many imaging inverse problems, most existing methods rely on privileged information such as derivatives, pseudo-inverses, or full knowledge of the forward model. This reliance poses a substantial limitation that restricts their use in a wide range of problems where such information is unavailable, as in many scientific applications. We propose Ensemble Kalman Diffusion Guidance (EnKG), a derivative-free approach that can solve inverse problems by only accessing forward model evaluations and a pre-trained diffusion model prior. We study the empirical effectiveness of EnKG across various inverse problems, including scientific settings such as inferring fluid flows and astronomical objects, which are highly non-linear inverse problems that often only permit black-box access to the forward model.
@misc{zheng2024ensemblekalmandiffusionguidance,
  title={Ensemble Kalman Diffusion Guidance: A Derivative-free Method for Inverse Problems},
  author={Zheng, Hongkai and Chu, Wenda and Wang, Austin and Kovachki, Nikola and Baptista, Ricardo and Yue, Yisong},
  year={2024},
  eprint={2409.20175},
  archiveprefix={arXiv},
  primaryclass={cs.LG},
  url={https://arxiv.org/abs/2409.20175},
}
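The core idea of derivative-free guidance can be illustrated with a classic ensemble Kalman update: an ensemble of candidate solutions is nudged toward the observations using only forward-model evaluations and empirical covariances, with no gradients of the forward model. The sketch below is a minimal illustration of that update step under simplifying assumptions (Gaussian observation noise, a single update), not the EnKG algorithm from the paper; `enkf_update` and its arguments are hypothetical names.

```python
import numpy as np

def enkf_update(particles, forward, y_obs, noise_std, step=1.0):
    """One derivative-free Kalman-style update: move an ensemble of
    candidate solutions toward the observations using only forward-model
    evaluations. Illustrative sketch, not the paper's EnKG algorithm."""
    X = np.asarray(particles)               # (N, d) ensemble of candidates
    G = np.stack([forward(x) for x in X])   # (N, m) black-box forward evals
    dX, dG = X - X.mean(0), G - G.mean(0)
    N = X.shape[0]
    # empirical cross-covariance (state vs. data) and data covariance
    C_xg = dX.T @ dG / (N - 1)                                   # (d, m)
    C_gg = dG.T @ dG / (N - 1) + noise_std**2 * np.eye(G.shape[1])
    K = C_xg @ np.linalg.inv(C_gg)          # Kalman-like gain, no gradients
    return X + step * (y_obs - G) @ K.T     # pull particles toward the data
```

With an identity forward model the update contracts the ensemble toward the observation, which is the behavior a diffusion sampler can exploit as a guidance signal at each denoising step.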
We propose an efficient approach to train large diffusion models with masked transformers. While masked transformers have been extensively explored for representation learning, their application to generative learning is less explored in the vision domain. Our work is the first to exploit masked training to reduce the training cost of diffusion models significantly. Specifically, we randomly mask out a high proportion (\emph{e.g.}, 50%) of patches in diffused input images during training. For masked training, we introduce an asymmetric encoder-decoder architecture consisting of a transformer encoder that operates only on unmasked patches and a lightweight transformer decoder on full patches. To promote a long-range understanding of full patches, we add an auxiliary task of reconstructing masked patches to the denoising score matching objective that learns the score of unmasked patches. Experiments on ImageNet 256×256 show that our approach achieves the same performance as the state-of-the-art Diffusion Transformer (DiT) model, using only 31% of its original training time. Thus, our method allows for efficient training of diffusion models without sacrificing the generative performance.
@article{zheng2024fast,
  title={Fast Training of Diffusion Models with Masked Transformers},
  author={Zheng, Hongkai and Nie, Weili and Vahdat, Arash and Anandkumar, Anima},
  journal={Transactions on Machine Learning Research},
  issn={2835-8856},
  year={2024},
  url={https://openreview.net/forum?id=vTBjBtGioE},
}
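The cost savings come from the encoder seeing only the kept patch tokens. The snippet below is a minimal sketch of just the random-masking step on patchified tokens (in the spirit of MAE-style masking), assuming the image has already been split into `(B, N, D)` tokens; `mask_patches` is a hypothetical helper, not the paper's code.

```python
import torch

def mask_patches(x, mask_ratio=0.5):
    """Randomly drop a fraction of patch tokens.
    x: (B, N, D) patchified (diffused) image tokens.
    Returns the kept tokens, a binary mask (1 = masked, 0 = kept),
    and the indices of the kept tokens. Sketch of the masking step only."""
    B, N, D = x.shape
    n_keep = int(N * (1 - mask_ratio))
    noise = torch.rand(B, N)              # random score per token
    ids_shuffle = noise.argsort(dim=1)    # random permutation per sample
    ids_keep = ids_shuffle[:, :n_keep]    # tokens the encoder will see
    x_keep = torch.gather(x, 1, ids_keep.unsqueeze(-1).expand(-1, -1, D))
    mask = torch.ones(B, N)
    mask.scatter_(1, ids_keep, 0.0)       # mark kept positions as unmasked
    return x_keep, mask, ids_keep
```

At 50% masking the encoder processes half the tokens, which is where the bulk of the training-time reduction comes from; the lightweight decoder then operates on the full token sequence.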
Diffusion models have found widespread adoption in various areas. However, their sampling process is slow because it requires hundreds to thousands of network evaluations to emulate a continuous process defined by differential equations. In this work, we use neural operators, an efficient method to solve the probability flow differential equations, to accelerate the sampling process of diffusion models. Compared to other fast sampling methods that have a sequential nature, we are the first to propose a parallel decoding method that generates images with only one model forward pass. We propose diffusion model sampling with neural operator (DSNO) that maps the initial condition, i.e., Gaussian distribution, to the continuous-time solution trajectory of the reverse diffusion process. To model the temporal correlations along the trajectory, we introduce temporal convolution layers that are parameterized in the Fourier space into the given diffusion model backbone. We show our method achieves state-of-the-art FID of 3.78 for CIFAR-10 and 7.83 for ImageNet-64 in the one-model-evaluation setting.
@inproceedings{zheng2023fast,
  title={Fast Sampling of Diffusion Models via Operator Learning},
  author={Zheng, Hongkai and Nie, Weili and Vahdat, Arash and Azizzadenesheli, Kamyar and Anandkumar, Anima},
  booktitle={International Conference on Machine Learning},
  pages={42390--42402},
  year={2023},
  organization={PMLR},
  eprint={2211.13449},
  url={https://proceedings.mlr.press/v202/zheng23d.html},
}
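The temporal layers described above parameterize a convolution along the trajectory (time) axis in Fourier space, in the style of Fourier neural operators. The toy layer below illustrates that idea under simplifying assumptions: it keeps only the lowest `modes` frequencies along the time axis and mixes channels with learned complex weights. It is a sketch of the operator-learning building block, not the DSNO architecture.

```python
import torch

class TemporalFourierConv(torch.nn.Module):
    """Toy temporal convolution parameterized in Fourier space:
    truncate to the lowest `modes` frequencies along the time axis
    and mix channels per retained mode with learned complex weights."""
    def __init__(self, channels, modes):
        super().__init__()
        self.modes = modes
        scale = 1.0 / channels
        self.weight = torch.nn.Parameter(
            scale * torch.randn(channels, channels, modes, dtype=torch.cfloat))

    def forward(self, x):                  # x: (B, C, T) features over time
        x_ft = torch.fft.rfft(x, dim=-1)   # frequency content along time
        out_ft = torch.zeros_like(x_ft)
        m = min(self.modes, x_ft.shape[-1])
        # channel mixing per retained frequency mode
        out_ft[..., :m] = torch.einsum(
            "bct,cdt->bdt", x_ft[..., :m], self.weight[..., :m])
        return torch.fft.irfft(out_ft, n=x.shape[-1], dim=-1)
```

Because the layer acts on the whole time axis at once, all points on the solution trajectory are produced in parallel from a single forward pass, which is the property that enables one-evaluation sampling.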