publications
2025
- InverseBench: Benchmarking Plug-and-Play Diffusion Models for Scientific Inverse Problems. Hongkai Zheng*, Wenda Chu*, Bingliang Zhang*, Zihui Wu*, Austin Wang, Berthy Feng, Caifeng Zou, Yu Sun, Nikola Borislavov Kovachki, Zachary E Ross, Katherine Bouman, and Yisong Yue. In The Thirteenth International Conference on Learning Representations, Spotlight (top 5.1%), 2025
Plug-and-play diffusion prior methods have emerged as a promising research direction for solving inverse problems. However, current studies primarily focus on natural image restoration, leaving the performance of these algorithms in scientific inverse problems largely unexplored. To address this gap, we introduce InverseBench, a unified framework that evaluates diffusion models across five distinct scientific inverse problems. These problems present unique structural challenges that differ from existing benchmarks, arising from critical scientific applications such as black hole imaging, seismology, optical tomography, medical imaging, and fluid dynamics. With InverseBench, we benchmark 15 inverse problem algorithms that use plug-and-play diffusion prior methods against strong, domain-specific baselines, offering valuable new insights into the strengths and weaknesses of existing algorithms. We open-source the datasets, pre-trained models, and the codebase to facilitate future research and development.
@inproceedings{zheng2025inversebench,
  title     = {InverseBench: Benchmarking Plug-and-Play Diffusion Models for Scientific Inverse Problems},
  author    = {Zheng, Hongkai and Chu, Wenda and Zhang, Bingliang and Wu, Zihui and Wang, Austin and Feng, Berthy and Zou, Caifeng and Sun, Yu and Kovachki, Nikola Borislavov and Ross, Zachary E and Bouman, Katherine and Yue, Yisong},
  booktitle = {The Thirteenth International Conference on Learning Representations},
  year      = {2025},
  url       = {https://openreview.net/forum?id=U3PBITXNG6},
}
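For readers outside the area, the plug-and-play setup the abstract refers to can be summarized as follows (the notation is illustrative, not taken from the paper): an observation y is produced by a known forward operator acting on an unknown signal x, and a pre-trained diffusion model supplies the prior score while each guidance algorithm approximates the likelihood term during reverse diffusion,

\[
y = \mathcal{A}(x) + n, \qquad n \sim \mathcal{N}(0, \sigma^2 I),
\]
\[
\nabla_{x_t} \log p_t(x_t \mid y) = \nabla_{x_t} \log p_t(x_t) + \nabla_{x_t} \log p_t(y \mid x_t).
\]

The first term on the right is the diffusion prior; many plug-and-play guidance methods differ mainly in how they approximate the intractable second term.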
2024
- Ensemble Kalman Diffusion Guidance: A Derivative-free Method for Inverse Problems. Hongkai Zheng, Wenda Chu, Austin Wang, Nikola Kovachki, Ricardo Baptista, and Yisong Yue. arXiv preprint arXiv:2409.20175, 2024
When solving inverse problems, one increasingly popular approach is to use pre-trained diffusion models as plug-and-play priors. This framework can accommodate different forward models without re-training while preserving the generative capability of diffusion models. Despite their success in many imaging inverse problems, most existing methods rely on privileged information such as derivatives, pseudo-inverses, or full knowledge of the forward model. This reliance poses a substantial limitation that restricts their use in a wide range of problems where such information is unavailable, such as in many scientific applications. We propose Ensemble Kalman Diffusion Guidance (EnKG), a derivative-free approach that can solve inverse problems by only accessing forward model evaluations and a pre-trained diffusion model prior. We study the empirical effectiveness of EnKG across various inverse problems, including scientific settings such as inferring fluid flows and astronomical objects, which are highly non-linear inverse problems that often only permit black-box access to the forward model.
@misc{zheng2024ensemblekalmandiffusionguidance,
  title         = {Ensemble Kalman Diffusion Guidance: A Derivative-free Method for Inverse Problems},
  author        = {Zheng, Hongkai and Chu, Wenda and Wang, Austin and Kovachki, Nikola and Baptista, Ricardo and Yue, Yisong},
  year          = {2024},
  eprint        = {2409.20175},
  archiveprefix = {arXiv},
  primaryclass  = {cs.LG},
  url           = {https://arxiv.org/abs/2409.20175},
}
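As context for the abstract above: EnKG builds on ensemble Kalman methodology, which needs only forward-model evaluations. The numpy sketch below shows a generic ensemble Kalman inversion update for intuition; the function name and the perturbed-observation variant are my choices, and the paper's EnKG interleaves ensemble-style updates with the reverse diffusion process rather than running this plain loop.

```python
import numpy as np

def eki_update(X, forward, y, Gamma, rng):
    """One generic ensemble Kalman inversion step (illustrative, not the paper's EnKG).

    X       : (J, d) ensemble of parameter samples
    forward : black-box forward model mapping (d,) -> (m,); only evaluations are needed
    y       : (m,) observed data
    Gamma   : (m, m) observation-noise covariance
    """
    J = X.shape[0]
    G = np.stack([forward(x) for x in X])        # (J, m) forward evaluations, no derivatives
    Xc = X - X.mean(axis=0)                      # centered parameters
    Gc = G - G.mean(axis=0)                      # centered predictions
    C_xg = Xc.T @ Gc / J                         # (d, m) parameter-prediction cross-covariance
    C_gg = Gc.T @ Gc / J                         # (m, m) prediction covariance
    # Perturbed observations keep the ensemble spread (a standard EKI device).
    Y = y + rng.multivariate_normal(np.zeros(y.shape[0]), Gamma, size=J)
    K = C_xg @ np.linalg.inv(C_gg + Gamma)       # Kalman-style gain
    return X + (Y - G) @ K.T                     # ensemble nudged toward the data, derivative-free
```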
- Fast Training of Diffusion Models with Masked Transformers. Hongkai Zheng*, Weili Nie*, Arash Vahdat, and Anima Anandkumar. Transactions on Machine Learning Research, 2024
We propose an efficient approach to train large diffusion models with masked transformers. While masked transformers have been extensively explored for representation learning, their application to generative learning is less explored in the vision domain. Our work is the first to exploit masked training to reduce the training cost of diffusion models significantly. Specifically, we randomly mask out a high proportion (e.g., 50%) of patches in diffused input images during training. For masked training, we introduce an asymmetric encoder-decoder architecture consisting of a transformer encoder that operates only on unmasked patches and a lightweight transformer decoder on full patches. To promote a long-range understanding of full patches, we add an auxiliary task of reconstructing masked patches to the denoising score matching objective that learns the score of unmasked patches. Experiments on ImageNet 256×256 show that our approach achieves the same performance as the state-of-the-art Diffusion Transformer (DiT) model, using only 31% of its original training time. Thus, our method allows for efficient training of diffusion models without sacrificing the generative performance.
@article{zheng2024fast,
  title   = {Fast Training of Diffusion Models with Masked Transformers},
  author  = {Zheng, Hongkai and Nie, Weili and Vahdat, Arash and Anandkumar, Anima},
  journal = {Transactions on Machine Learning Research},
  issn    = {2835-8856},
  year    = {2024},
  url     = {https://openreview.net/forum?id=vTBjBtGioE},
}
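Because the abstract describes the training recipe only in words, here is a hedged PyTorch sketch of the core pattern it mentions: random patch masking with a heavy encoder on visible patches and a lightweight decoder on the full set. Module sizes, names, and the gather/scatter bookkeeping are my simplifications; the paper's backbone is a DiT variant with additional pieces (time and class conditioning, the auxiliary masked-patch reconstruction loss) not shown here.

```python
import torch
import torch.nn as nn

class MaskedDiffusionBackbone(nn.Module):
    """Toy asymmetric encoder-decoder for masked diffusion training (illustrative only)."""

    def __init__(self, num_patches=256, dim=384, mask_ratio=0.5):
        super().__init__()
        self.mask_ratio = mask_ratio
        self.pos = nn.Parameter(torch.zeros(1, num_patches, dim))
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))
        enc = nn.TransformerEncoderLayer(dim, nhead=6, batch_first=True)
        dec = nn.TransformerEncoderLayer(dim, nhead=6, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc, num_layers=8)   # heavy, sees visible patches only
        self.decoder = nn.TransformerEncoder(dec, num_layers=2)   # lightweight, sees all positions

    def forward(self, tokens):                   # tokens: (B, N, dim) patchified diffused image
        B, N, D = tokens.shape
        n_keep = int(N * (1 - self.mask_ratio))
        ids = torch.rand(B, N, device=tokens.device).argsort(dim=1)     # random patch permutation
        keep, masked = ids[:, :n_keep], ids[:, n_keep:]
        x = tokens + self.pos
        vis = torch.gather(x, 1, keep.unsqueeze(-1).expand(-1, -1, D))  # drop masked patches
        vis = self.encoder(vis)
        # Reassemble the full sequence with mask tokens before the light decoder.
        full = self.mask_token.repeat(B, N, 1)
        full = full.scatter(1, keep.unsqueeze(-1).expand(-1, -1, D), vis)
        out = self.decoder(full + self.pos)
        # A score-matching loss would use `keep`; an auxiliary reconstruction loss would use `masked`.
        return out, keep, masked
```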
- Physics-informed neural operator for learning partial differential equations. Zongyi Li*, Hongkai Zheng*, Nikola Kovachki, David Jin, Haoxuan Chen, Burigede Liu, Kamyar Azizzadenesheli, and Anima Anandkumar. ACM/IMS Journal of Data Science, 2024
@article{li2024physics,
  title     = {Physics-informed neural operator for learning partial differential equations},
  author    = {Li, Zongyi and Zheng, Hongkai and Kovachki, Nikola and Jin, David and Chen, Haoxuan and Liu, Burigede and Azizzadenesheli, Kamyar and Anandkumar, Anima},
  journal   = {ACM/IMS Journal of Data Science},
  volume    = {1},
  number    = {3},
  pages     = {1--27},
  year      = {2024},
  publisher = {ACM New York, NY},
  url       = {https://dl.acm.org/doi/full/10.1145/3648506},
}
2023
- Fast sampling of diffusion models via operator learning. Hongkai Zheng, Weili Nie, Arash Vahdat, Kamyar Azizzadenesheli, and Anima Anandkumar. In International Conference on Machine Learning, 2023
Diffusion models have found widespread adoption in various areas. However, their sampling process is slow because it requires hundreds to thousands of network evaluations to emulate a continuous process defined by differential equations. In this work, we use neural operators, an efficient method to solve the probability flow differential equations, to accelerate the sampling process of diffusion models. Compared to other fast sampling methods that have a sequential nature, we are the first to propose a parallel decoding method that generates images with only one model forward pass. We propose diffusion model sampling with neural operator (DSNO) that maps the initial condition, i.e., Gaussian distribution, to the continuous-time solution trajectory of the reverse diffusion process. To model the temporal correlations along the trajectory, we introduce temporal convolution layers that are parameterized in the Fourier space into the given diffusion model backbone. We show our method achieves state-of-the-art FID of 3.78 for CIFAR-10 and 7.83 for ImageNet-64 in the one-model-evaluation setting.
@inproceedings{zheng2023fast,
  eprint       = {2211.13449},
  title        = {Fast sampling of diffusion models via operator learning},
  author       = {Zheng, Hongkai and Nie, Weili and Vahdat, Arash and Azizzadenesheli, Kamyar and Anandkumar, Anima},
  booktitle    = {International conference on machine learning},
  pages        = {42390--42402},
  year         = {2023},
  organization = {PMLR},
  url          = {https://proceedings.mlr.press/v202/zheng23d.html},
}
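The "temporal convolution layers parameterized in Fourier space" mentioned in the abstract are, as I understand them, FNO-style spectral convolutions applied along the trajectory/time axis. Below is a minimal, hedged PyTorch sketch of such a layer; the class name, shapes, and mode count are my choices, and the actual DSNO layer sits inside the diffusion backbone with different details.

```python
import torch
import torch.nn as nn

class TemporalFourierConv(nn.Module):
    """1-D convolution over the time axis, parameterized in Fourier space (illustrative)."""

    def __init__(self, channels, n_modes):
        super().__init__()
        self.n_modes = n_modes
        scale = 1.0 / channels
        # Complex weights acting on the lowest Fourier modes of the time axis.
        self.weight = nn.Parameter(
            scale * torch.randn(channels, channels, n_modes, dtype=torch.cfloat))

    def forward(self, x):                                   # x: (batch, channels, time)
        x_ft = torch.fft.rfft(x, dim=-1)                    # transform along time
        out_ft = torch.zeros_like(x_ft)
        m = min(self.n_modes, x_ft.shape[-1])
        # Mix channels mode by mode: (b, i, m) x (i, o, m) -> (b, o, m)
        out_ft[..., :m] = torch.einsum("bim,iom->bom", x_ft[..., :m], self.weight[..., :m])
        return torch.fft.irfft(out_ft, n=x.shape[-1], dim=-1)

# Example: mix features across 8 trajectory time points using the 4 lowest modes.
layer = TemporalFourierConv(channels=64, n_modes=4)
out = layer(torch.randn(2, 64, 8))                          # -> (2, 64, 8)
```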
2022
- Langevin Monte Carlo for Contextual Bandits. Pan Xu, Hongkai Zheng, Eric V Mazumdar, Kamyar Azizzadenesheli, and Animashree Anandkumar. In International Conference on Machine Learning, 2022
We study the efficiency of Thompson sampling for contextual bandits. Existing Thompson sampling-based algorithms need to construct a Laplace approximation (i.e., a Gaussian distribution) of the posterior distribution, which is inefficient to sample in high dimensional applications for general covariance matrices. Moreover, the Gaussian approximation may not be a good surrogate for the posterior distribution for general reward generating functions. We propose an efficient posterior sampling algorithm, viz., Langevin Monte Carlo Thompson Sampling (LMC-TS), that uses Markov Chain Monte Carlo (MCMC) methods to directly sample from the posterior distribution in contextual bandits. Our method is computationally efficient since it only needs to perform noisy gradient descent updates without constructing the Laplace approximation of the posterior distribution. We prove that the proposed algorithm achieves the same sublinear regret bound as the best Thompson sampling algorithms for a special case of contextual bandits, viz., linear contextual bandits. We conduct experiments on both synthetic data and real-world datasets on different contextual bandit models, which demonstrate that directly sampling from the posterior is both computationally efficient and competitive in performance.
@inproceedings{xu2022langevin,
  title        = {Langevin Monte Carlo for Contextual Bandits},
  author       = {Xu, Pan and Zheng, Hongkai and Mazumdar, Eric V and Azizzadenesheli, Kamyar and Anandkumar, Animashree},
  booktitle    = {International Conference on Machine Learning},
  pages        = {24830--24850},
  year         = {2022},
  organization = {PMLR},
  url          = {https://proceedings.mlr.press/v162/xu22p.html},
}
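To make the "noisy gradient descent updates" concrete, here is a hedged numpy sketch of Langevin Monte Carlo posterior sampling inside a Thompson-sampling round for a linear contextual bandit. Function names, the Gaussian likelihood, and all step-size choices are mine for illustration; the paper's LMC-TS specifies these quantities (including an inverse temperature) precisely.

```python
import numpy as np

def lmc_posterior_sample(X, r, theta, n_steps=50, eta=0.01, lam=1.0, rng=None):
    """Approximate posterior sample via unadjusted Langevin dynamics (illustrative).

    Uses a Gaussian likelihood with design matrix X (t, d), rewards r (t,),
    and an L2 prior with weight lam; each step is a noisy gradient update.
    """
    rng = rng or np.random.default_rng()
    for _ in range(n_steps):
        grad = X.T @ (X @ theta - r) + lam * theta           # gradient of the neg. log posterior
        theta = theta - eta * grad + np.sqrt(2.0 * eta) * rng.standard_normal(theta.shape)
    return theta

def thompson_round(arms, X_hist, r_hist, theta, rng):
    """One round: sample parameters with LMC, then play the greedy arm under the sample."""
    theta = lmc_posterior_sample(X_hist, r_hist, theta, rng=rng)
    chosen = int(np.argmax(arms @ theta))                    # arms: (K, d) feature vectors
    return chosen, theta
```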
2020
- Implicit competitive regularization in GANs. Florian Schaefer*, Hongkai Zheng*, and Animashree Anandkumar. In International Conference on Machine Learning, 2020
The success of GANs is usually attributed to properties of the divergence obtained by an optimal discriminator. In this work we show that this approach has a fundamental flaw: if we do not impose regularity of the discriminator, it can exploit visually imperceptible errors of the generator to always achieve the maximal generator loss. In practice, gradient penalties are used to regularize the discriminator. However, this needs a metric on the space of images that captures visual similarity. Such a metric is not known, which explains the limited success of gradient penalties in stabilizing GANs. Instead, we argue that the implicit competitive regularization (ICR) arising from the simultaneous optimization of generator and discriminator enables GAN performance. We show that opponent-aware modelling of generator and discriminator, as present in competitive gradient descent (CGD), can significantly strengthen ICR and thus stabilize GAN training without explicit regularization. In our experiments, we use an existing implementation of WGAN-GP and show that by training it with CGD without any explicit regularization, we can improve the inception score (IS) on CIFAR10, without any hyperparameter tuning.
@inproceedings{schaefer2020implicit,
  title        = {Implicit competitive regularization in GANs},
  author       = {Schaefer, Florian and Zheng, Hongkai and Anandkumar, Animashree},
  booktitle    = {International Conference on Machine Learning},
  pages        = {8533--8544},
  year         = {2020},
  organization = {PMLR},
  url          = {https://proceedings.mlr.press/v119/schaefer20a.html},
}
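For reference, competitive gradient descent (CGD), which the abstract relies on, updates both players with the Nash equilibrium of a local bilinear approximation of the game. As I recall the zero-sum form for \(\min_x \max_y f(x, y)\) with step size \(\eta\) (signs and preconditioners should be checked against the CGD paper), the updates are

\[
\Delta x = -\eta \left(I + \eta^{2} D^{2}_{xy} f \, D^{2}_{yx} f\right)^{-1}\left(\nabla_{x} f + \eta \, D^{2}_{xy} f \, \nabla_{y} f\right),
\]
\[
\Delta y = \eta \left(I + \eta^{2} D^{2}_{yx} f \, D^{2}_{xy} f\right)^{-1}\left(\nabla_{y} f - \eta \, D^{2}_{yx} f \, \nabla_{x} f\right),
\]

with the mixed second-derivative terms applied as Hessian-vector products and the linear solves carried out iteratively, so no full Hessian is ever formed. The extra terms make each player anticipate the other's response, which is the opponent-aware modelling the abstract refers to.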