AVIS: Accelerating Video Inverse Problem Solvers with Autoregressive Diffusion Models
Code (coming soon)
Overview
Diffusion models provide powerful priors for zero-shot video inverse problems, but their real-time deployment is hindered by two inefficiencies: high initial latency, because the whole video is restored at once, and low throughput, because enforcing measurement consistency in pixel space requires multiple VAE passes.
To overcome these limitations, we propose the Autoregressive Video Inverse problem Solver (AVIS).
AVIS and AVIS Flash
AVIS: autoregressive (AR) backbone + measurement-consistent initialization + guidance for every chunk.
AVIS Flash: same backbone and initialization + guidance only for the first chunk.
The AVIS framework leverages autoregressive video diffusion models to restore videos in a streaming manner, naturally eliminating latency bottlenecks.
Additionally, AVIS initializes reverse diffusion with a measurement-consistent estimate, reducing the required sampling steps.
While AVIS enforces measurement consistency for every video chunk, we further introduce a highly accelerated variant that enforces measurement consistency solely on the first video chunk, dubbed AVIS Flash.
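The difference between the two variants comes down to which chunks receive measurement-consistency guidance. A minimal sketch of that schedule follows. The function name `guidance_schedule` and the variant strings are hypothetical, chosen for illustration and not taken from the AVIS code.

```python
def guidance_schedule(num_chunks, variant):
    """Return one boolean per chunk: True if guidance is applied to that chunk.

    Hypothetical helper for illustration; not part of any released AVIS code.
    """
    if num_chunks < 0:
        raise ValueError("num_chunks must be non-negative")
    if variant == "avis":
        # AVIS: enforce measurement consistency on every chunk.
        return [True] * num_chunks
    if variant == "avis_flash":
        # AVIS Flash: enforce measurement consistency on the first chunk only.
        return [i == 0 for i in range(num_chunks)]
    raise ValueError(f"unknown variant: {variant!r}")


print(guidance_schedule(4, "avis"))
print(guidance_schedule(4, "avis_flash"))
```

In both variants every chunk still starts from the measurement-consistent initialization; only the guidance step differs.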
Baseline Comparisons
| Metric | VISION-XL | LVTINO | AVIS | AVIS Flash |
| --- | --- | --- | --- | --- |
| Latency (s) ↓ | 167 | 114 | 4 | 4 |
| Time (s) ↓ | 167 | 114 | 68.5 | 13.7 |
| FPS (frames/s) ↑ | 0.49 | 0.71 | 1.18 | 5.91 |
Efficiency comparison on an 81-frame video at 480×854 resolution on a single RTX 4090 GPU.
AVIS and AVIS Flash improve on both baselines on every efficiency metric: latency drops from 114 s or more to 4 s, and AVIS Flash reaches 5.91 frames/s.
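The FPS figures are consistent with dividing the 81 frames by the total time. The short check below reproduces them and also derives the speedup in total time relative to LVTINO, the faster of the two baselines. The speedup values are computed here from the reported times and do not appear on the page.

```python
FRAMES = 81  # frames in the benchmark clip (from the caption)

# Total restoration time in seconds, as reported above.
total_time_s = {
    "VISION-XL": 167.0,
    "LVTINO": 114.0,
    "AVIS": 68.5,
    "AVIS Flash": 13.7,
}

# Throughput is the number of frames divided by the total time.
fps = {name: round(FRAMES / t, 2) for name, t in total_time_s.items()}

# Speedup in total time relative to LVTINO.
speedup_vs_lvtino = {
    name: round(total_time_s["LVTINO"] / t, 2) for name, t in total_time_s.items()
}

print(fps)
print(speedup_vs_lvtino)
```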
Temporal Average
To view in full resolution, please download the videos.
Three video comparisons for temporal averaging. Panels: Measurement, VISION-XL, LVTINO, AVIS, AVIS Flash.
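The page does not define the temporal-average operator. One plausible reading, sketched below, is that each measured frame is the mean of a run of consecutive clean frames. The function name `temporal_average`, the window size, and the use of non-overlapping windows are all assumptions made for illustration.

```python
def temporal_average(frames, window):
    """Average each run of `window` consecutive frames into one frame.

    `frames` is a list of equally sized frames, each a flat list of pixel values.
    Non-overlapping windows are an assumption made for this sketch.
    """
    if window <= 0 or len(frames) % window != 0:
        raise ValueError("window must be positive and divide the number of frames")
    averaged = []
    for start in range(0, len(frames), window):
        group = frames[start:start + window]
        # Average pixel-by-pixel across the frames in this window.
        averaged.append([sum(pixels) / window for pixels in zip(*group)])
    return averaged


# Four tiny two-pixel "frames" averaged in pairs.
video = [[0.0, 2.0], [2.0, 4.0], [4.0, 6.0], [6.0, 8.0]]
print(temporal_average(video, window=2))  # [[1.0, 3.0], [5.0, 7.0]]
```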
Spatio-Temporal Average
To view in full resolution, please download the videos.
Three video comparisons for spatio-temporal averaging. Panels: Measurement, VISION-XL, LVTINO, AVIS, AVIS Flash.
Inpainting
To view in full resolution, please download the videos.
Three video comparisons for inpainting. Panels: Measurement, VISION-XL, LVTINO, AVIS, AVIS Flash.
Gaussian Deblur
To view in full resolution, please download the videos.
Three video comparisons for Gaussian deblurring. Panels: Measurement, VISION-XL, LVTINO, AVIS, AVIS Flash.
Super Resolution
To view in full resolution, please download the videos.
Three video comparisons for super resolution. Panels: Measurement, VISION-XL, LVTINO, AVIS, AVIS Flash.
Autoregressive Propagation
Autoregressive propagation alone preserves temporal context, but gradually drifts from the desired restoration when later chunks are generated from pure noise (middle row).
AVIS mitigates this drift by initializing reverse diffusion from a measurement-consistent estimate, keeping each chunk closer to the target restoration trajectory (bottom row).
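The contrast between starting from pure noise and starting from a measurement-consistent estimate can be illustrated with a toy sketch. It assumes a variance-preserving forward process, in which the starting sample is the estimate scaled by sqrt(alpha_bar) plus Gaussian noise scaled by sqrt(1 - alpha_bar). The page does not state which noising rule AVIS uses, so this form, the function name `init_from_estimate`, and the parameter `alpha_bar` are assumptions.

```python
import math
import random


def init_from_estimate(estimate, alpha_bar, rng):
    """Noise `estimate` to the level set by `alpha_bar` (0 = pure noise, 1 = no noise).

    Assumes a variance-preserving forward process; this is an illustrative
    choice, not a confirmed detail of AVIS.
    """
    if not 0.0 <= alpha_bar <= 1.0:
        raise ValueError("alpha_bar must lie in [0, 1]")
    signal_scale = math.sqrt(alpha_bar)
    noise_scale = math.sqrt(1.0 - alpha_bar)
    return [signal_scale * x + noise_scale * rng.gauss(0.0, 1.0) for x in estimate]


estimate = [0.5, -1.0, 0.25]  # stand-in for a measurement-consistent estimate

# alpha_bar = 0 discards the estimate entirely: the pure-noise start.
print(init_from_estimate(estimate, 0.0, random.Random(0)))

# An intermediate alpha_bar keeps part of the estimate in the starting sample.
print(init_from_estimate(estimate, 0.6, random.Random(0)))
```

Under this assumption, a larger `alpha_bar` means the sampler starts closer to the estimate and has less noise left to remove, which matches the stated goal of reducing the number of sampling steps.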
Long Video Restoration
Stable restoration of a 1-minute, 960-frame video by periodically re-injecting measurement consistency to suppress error accumulation over time.
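A 960-frame, 1-minute video corresponds to 16 frames per second. The sketch below shows one way a periodic re-injection schedule could be laid out over such a video. The chunk length and the re-injection period are not given on the page, so `frames_per_chunk=12` and `period=20` are hypothetical values, as is the function name `reinjection_chunks`.

```python
def reinjection_chunks(num_frames, frames_per_chunk, period):
    """Return the indices of chunks at which measurement consistency is re-injected.

    Chunk 0 is always included, then every `period`-th chunk after it.
    All parameter values used below are hypothetical.
    """
    if num_frames <= 0 or frames_per_chunk <= 0 or period <= 0:
        raise ValueError("all arguments must be positive")
    # Ceiling division: a trailing partial chunk still counts as a chunk.
    num_chunks = -(-num_frames // frames_per_chunk)
    return [i for i in range(num_chunks) if i % period == 0]


print(960 / 60)  # frame rate of the 1-minute clip: 16.0
print(reinjection_chunks(960, frames_per_chunk=12, period=20))  # [0, 20, 40, 60]
```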