
Illustration of the five benchmark problems in InverseBench. \( G \) denotes the forward model that produces observations from the source, and \( G^{\dagger} \) denotes the inverse map. In the linear inverse scattering problem (left two panels), the observation is the data recorded at the receivers, and the unknown source we aim to infer is the permittivity map of the object. The bottom panel shows efficiency and accuracy plots for the benchmarked algorithms. Because the problems have different characteristics, the efficiency and accuracy trade-off of each algorithm varies across tasks. In these plots, a larger point radius indicates more interaction with the forward model \(G\), measured by the number of forward-model evaluations.
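As a point of reference, the generic setup implied by this figure can be written as follows (a minimal formulation assuming additive Gaussian noise; some tasks in the benchmark, such as black hole imaging, have non-additive noise):

\[
y = G(x) + \epsilon, \qquad \epsilon \sim \mathcal{N}(0, \sigma^2 I),
\]

where \(x\) is the unknown source and \(y\) the observation. Plug-and-play diffusion prior methods aim to sample from the posterior \( p(x \mid y) \propto p(y \mid x)\, p(x) \), with the prior \( p(x) \) represented by a pre-trained diffusion model, rather than applying an explicit inverse map \( G^{\dagger} \).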

Abstract

Plug-and-play diffusion models have emerged as a promising research direction for solving inverse problems. However, current studies primarily focus on natural image restoration, leaving the performance of these algorithms in scientific inverse problems largely unexplored. To address this gap, we introduce InverseBench, a framework that evaluates diffusion models across five distinct scientific inverse problems. These problems present unique structural challenges that differ from existing benchmarks, arising from critical scientific applications such as optical tomography, medical imaging, black hole imaging, seismology, and fluid dynamics. With InverseBench, we benchmark 14 inverse problem algorithms that use plug-and-play diffusion models against strong, domain-specific baselines, offering valuable new insights into the strengths and weaknesses of existing algorithms. We open-source the codebase at https://github.com/devzhk/InverseBench with the datasets and pre-trained models to facilitate future research and development.

Methods & Inverse Problems

The following list groups the 14 methods benchmarked in InverseBench by category. The methods differ in what they require from the forward model: a singular value decomposition (SVD), a pseudo-inverse, linearity, or gradient access. A minimal sketch of a gradient-guidance step, representative of the general-guidance family, follows the list.

Linear guidance: DDRM, DDNM, ΠGDM
General guidance: DPS, LGD, DPG, SCG, EnKG
Variable-splitting: DiffPIR, PnP-DM, DAPS
Variational Bayes: RED-diff
Sequential Monte Carlo: FPS, MCGDiff
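For concreteness, here is a minimal sketch of a DPS-style guidance correction of the kind used by the general-guidance methods (an illustration only, not the InverseBench implementation; the `denoiser` signature and the `zeta` step size are placeholder assumptions):

```python
import torch

def dps_guidance(x_t, t, y, denoiser, forward_model, zeta=1.0):
    """One data-fidelity correction in the style of diffusion posterior sampling.

    x_t:           current noisy diffusion state
    y:             observed measurements
    denoiser:      pre-trained network mapping (x_t, t) to an estimate of x_0
                   (hypothetical signature)
    forward_model: differentiable forward model G
    zeta:          guidance step size
    """
    x_t = x_t.detach().requires_grad_(True)
    x0_hat = denoiser(x_t, t)                      # posterior-mean estimate of x_0
    misfit = torch.linalg.vector_norm(y - forward_model(x0_hat))
    grad = torch.autograd.grad(misfit, x_t)[0]     # gradient of the measurement misfit w.r.t. x_t
    return x_t.detach() - zeta * grad              # nudge the sample toward data consistency
```

Some general-guidance methods (e.g., EnKG) instead rely only on forward-model evaluations, which is why gradient access is listed as a separate requirement.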

The inverse problems in InverseBench differ along several axes: whether the forward model is linear, whether an SVD of the forward model can be computed, whether the problem operates in the complex domain, whether the forward model has a closed form, whether gradients of the forward model are accessible, and the type of measurement noise. The noise types are:

Linear inverse scattering: Gaussian
Compressed sensing MRI: real-world
Black hole imaging: non-additive
Full waveform inversion: noise-free
Navier-Stokes equation: Gaussian

A sketch of one such closed-form, differentiable forward model (subsampled MRI) follows the list.
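As an illustration of what "linear, complex-domain, closed-form, differentiable" means in practice, here is a simplified single-coil compressed sensing MRI forward operator (a sketch under assumed conventions; the actual benchmark operator may include coil sensitivities, measurement noise, and different normalization):

```python
import torch

def mri_forward(x, mask):
    """Simplified subsampled single-coil MRI forward model y = M * F(x).

    x:    complex-valued image, shape (H, W)
    mask: binary k-space sampling mask, shape (H, W)
    """
    kspace = torch.fft.fftshift(torch.fft.fft2(x, norm="ortho"))  # closed-form linear operator
    return mask * kspace  # complex-valued, differentiable via autograd
```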

Qualitative Visualization

Visual examples for each problem. In the linear inverse scattering task, PnP diffusion prior methods generate sharper images than the traditional baseline FISTA. For black hole imaging, the baselines are blurrier than plug-and-play diffusion prior methods. Full waveform inversion: Adam\(^*\) and LBFGS\(^*\) are initialized from a Gaussian-blurred ground truth; with random or constant initialization, these optimization methods simply fail. Navier-Stokes equation: PnP diffusion prior methods capture richer flow features than the conventional baseline.
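For context on the FISTA baseline mentioned above, here is a generic FISTA sketch for an \(\ell_1\)-regularized linear least-squares problem (an illustration only; the paper's baseline uses its own operator and regularization, and `A`, `lam`, and `n_iter` are placeholders):

```python
import torch

def fista(A, y, lam=1e-2, n_iter=200):
    """Generic FISTA for min_x 0.5*||A x - y||^2 + lam*||x||_1."""
    L = torch.linalg.matrix_norm(A, ord=2) ** 2     # Lipschitz constant of the smooth part
    x = torch.zeros(A.shape[1])
    z, t = x.clone(), 1.0
    for _ in range(n_iter):
        grad = A.T @ (A @ z - y)                    # gradient of the data-fidelity term
        u = z - grad / L                            # gradient step
        x_new = torch.sign(u) * torch.clamp(u.abs() - lam / L, min=0.0)  # soft-thresholding
        t_new = (1.0 + (1.0 + 4.0 * t * t) ** 0.5) / 2.0
        z = x_new + ((t - 1.0) / t_new) * (x_new - x)  # Nesterov-style extrapolation
        x, t = x_new, t_new
    return x
```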



Experimental Results

Failures of PnPDP methods

Illustration of the failures of PnPDP methods (DAPS as an example) on full waveform inversion. With a small learning rate, DAPS is numerically stable but does not solve the inverse problem effectively. With a slightly larger learning rate, DAPS produces a noisy velocity map that breaks the stability condition of the PDE solver, resulting in a complete failure.
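For intuition, explicit time-stepping wave-equation solvers of the kind used in full waveform inversion are typically subject to a CFL-type stability constraint (stated here in generic form; the exact constant depends on the discretization):

\[
\Delta t \;\le\; C\,\frac{\Delta x}{v_{\max}},
\]

so a noisy velocity map with an inflated maximum velocity \(v_{\max}\) (or non-physical values) can violate this constraint for the solver's fixed \(\Delta t\) and cause the simulation to diverge.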

Comparison to traditional baselines

Relative performance of plug-and-play diffusion prior methods compared with traditional baselines under different levels of measurement sparsity on different tasks. Metrics are averaged over multiple PnPDP methods. In general, the performance gap widens as the measurements become sparser.

PnPDP methods on out-of-distribution test samples


PnPDP methods on out-of-distribution test samples. (a) Black hole imaging on digit inputs; (b) linear inverse scattering on sources containing 9 cells, while the prior model is trained on images with 1 to 6 cells.

Acknowledgments

This research is funded in part by NSF CPS Grant 1918655, NSF Award 2048237, NSF Award 2034306, and the Amazon AI4Science Discovery Award. H.Z. is supported by the PIMCO Fellowship and the Amazon AI4Science Fellowship. Z.W. is supported by the Amazon AI4Science Fellowship. B.Z. and W.C. are supported by the Kortschak Scholars Fellowship. B.F. is supported by the Pritzker Award and an NSF Graduate Research Fellowship. Z.E.R. and C.Z. are supported by a Packard Fellowship from the David and Lucile Packard Foundation. We thank Ben Prather, Abhishek Joshi, Vedant Dhruv, C.K. Chan, and Charles Gammie for the synthetic black hole GRMHD image dataset used here, generated under NSF grant AST 20-34306.

BibTeX

@inproceedings{zheng2025inversebench,
  title={InverseBench: Benchmarking Plug-and-Play Diffusion Models for Scientific Inverse Problems},
  author={Hongkai Zheng and Wenda Chu and Bingliang Zhang and Zihui Wu and Austin Wang and Berthy Feng and Caifeng Zou and Yu Sun and Nikola Borislavov Kovachki and Zachary E Ross and Katherine Bouman and Yisong Yue},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025},
  url={https://openreview.net/forum?id=U3PBITXNG6}
}