AVIS

Overview

Diffusion models provide powerful priors for zero-shot video inverse problems, but their real-time deployment is hindered by two inefficiencies: high initial latency caused by holistic video restoration, and low throughput resulting from multiple VAE passes to enforce measurement consistency in pixel space. To overcome these limitations, we propose the Autoregressive Video Inverse problem Solver (AVIS).

AVIS and AVIS Flash

AVIS: autoregressive (AR) backbone + measurement-consistent initialization + guidance for every chunk.
AVIS Flash: same backbone and initialization + guidance only for the first chunk.

The AVIS framework leverages autoregressive video diffusion models to restore videos chunk by chunk in a streaming manner, so the first restored frames are available without waiting for the whole video to be processed. Additionally, AVIS initializes reverse diffusion with a measurement-consistent estimate, reducing the required sampling steps. While AVIS enforces measurement consistency for every video chunk, we further introduce a highly accelerated variant, dubbed AVIS Flash, that enforces measurement consistency solely on the first video chunk.
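The control flow described above can be sketched in a few lines. This is only a toy illustration, not the AVIS implementation: `toy_denoise`, `restore_stream`, the masking operator, and all parameter values are hypothetical stand-ins, and the real method runs a video diffusion model rather than a simple blend. The sketch only shows where initialization and guidance sit in the per-chunk loop and how the two variants differ.

```python
import numpy as np

def toy_denoise(x, context):
    # Stand-in for one reverse-diffusion step of an autoregressive video model:
    # it only pulls the chunk toward the previous chunk to mimic temporal context.
    if context is None:
        return x
    return 0.8 * x + 0.2 * context

def restore_stream(measurements, mask, variant="avis", steps=4, step_size=1.0):
    """Restore chunks one at a time; returns (restored chunks, per-chunk guidance flags)."""
    restored, guided = [], []
    context = None
    for i, y in enumerate(measurements):
        # AVIS guides every chunk; AVIS Flash guides only the first one.
        use_guidance = (variant == "avis") or (i == 0)
        x = y.copy()  # measurement-consistent start: mask * x equals y
        for _ in range(steps):
            x = toy_denoise(x, context)
            if use_guidance:
                # Gradient step on 0.5 * ||mask * x - y||^2
                x = x - step_size * mask * (mask * x - y)
        restored.append(x)  # each chunk is available as soon as it is finished
        guided.append(use_guidance)
        context = x
    return restored, guided

rng = np.random.default_rng(0)
mask = (rng.random((2, 4, 4)) > 0.5).astype(float)         # 1 = observed pixel
chunks = [mask * rng.random((2, 4, 4)) for _ in range(3)]  # three masked chunks

_, flags_avis = restore_stream(chunks, mask, variant="avis")
_, flags_flash = restore_stream(chunks, mask, variant="flash")
print(flags_avis)   # [True, True, True]
print(flags_flash)  # [True, False, False]
```

In this toy setting the guidance step is what keeps later chunks tied to their measurements; skipping it after the first chunk, as AVIS Flash does, trades that per-chunk correction for speed.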

Baseline Comparisons

Metric	VISION-XL	LVTINO	AVIS	AVIS Flash
Latency (s) ↓	167	114	4	4
Time (s) ↓	167	114	68.5	13.7
FPS (frames/s) ↑	0.49	0.71	1.18	5.91

Efficiency comparison on an 81-frame video at 480x854 resolution, measured on a single RTX 4090 GPU.
Our proposed frameworks (AVIS and AVIS Flash) improve on both baselines across all efficiency metrics: latency drops from over 100 s to 4 s, and AVIS Flash reaches 5.91 frames/s. Bold: best, underline: second-best.
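As a quick consistency check, the FPS row can be reproduced from the Time row, assuming that "Time" is the total wall-clock time for all 81 frames (the page does not define it explicitly, but the numbers agree with that reading). The variable names below are illustrative only.

```python
frames = 81
# Total time in seconds, copied from the table above.
total_time_s = {"VISION-XL": 167, "LVTINO": 114, "AVIS": 68.5, "AVIS Flash": 13.7}

# Throughput = number of frames / total time, rounded as in the table.
fps = {name: round(frames / t, 2) for name, t in total_time_s.items()}
print(fps)
# {'VISION-XL': 0.49, 'LVTINO': 0.71, 'AVIS': 1.18, 'AVIS Flash': 5.91}
```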

Temporal Average

To view in full resolution, please download the videos.

[Video comparison, three examples. Panels per example: Measurement, VISION-XL, LVTINO, AVIS, AVIS Flash.]

Spatio-Temporal Average

To view in full resolution, please download the videos.

[Video comparison, three examples. Panels per example: Measurement, VISION-XL, LVTINO, AVIS, AVIS Flash.]

Inpainting

To view in full resolution, please download the videos.

[Video comparison, three examples. Panels per example: Measurement, VISION-XL, LVTINO, AVIS, AVIS Flash.]

Gaussian Deblur

To view in full resolution, please download the videos.

[Video comparison, three examples. Panels per example: Measurement, VISION-XL, LVTINO, AVIS, AVIS Flash.]

Super Resolution

To view in full resolution, please download the videos.

[Video comparison, three examples. Panels per example: Measurement, VISION-XL, LVTINO, AVIS, AVIS Flash.]

Autoregressive Propagation

Autoregressive propagation alone preserves temporal context, but gradually drifts from the desired restoration when later chunks are generated from pure noise (middle row).
AVIS mitigates this drift by initializing reverse diffusion from a measurement-consistent estimate, keeping each chunk closer to the target restoration trajectory (bottom row).
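To make the contrast between the two starting points concrete, the sketch below noises an estimate to an intermediate level instead of sampling pure noise. This is an assumption for illustration: the page does not state how AVIS builds its initial state, and the standard forward-diffusion formula, the function `init_from_estimate`, and the value of `alpha_bar_t` are all hypothetical choices here.

```python
import numpy as np

def init_from_estimate(x_est, alpha_bar_t, rng):
    """Noise an estimate to an intermediate level: sqrt(a) * x + sqrt(1 - a) * eps."""
    eps = rng.standard_normal(x_est.shape)
    return np.sqrt(alpha_bar_t) * x_est + np.sqrt(1.0 - alpha_bar_t) * eps

rng = np.random.default_rng(0)
x_est = np.ones((2, 4, 4))  # hypothetical measurement-consistent estimate of one chunk

# Pure-noise start: carries no information about the measurement.
x_from_noise = rng.standard_normal(x_est.shape)

# Estimate-based start: partly noised, so the sampler begins near the estimate
# and can run fewer reverse steps.
x_from_estimate = init_from_estimate(x_est, alpha_bar_t=0.5, rng=rng)
```

With `alpha_bar_t` close to 1 the start stays almost identical to the estimate, and with `alpha_bar_t` close to 0 it approaches the pure-noise case.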

AVIS: Accelerating Video Inverse Problem Solvers with Autoregressive Diffusion Models
