Yingying Deng, Xiangyu He, Changwang Mei, Peisong Wang, Fan Tang
Though Rectified Flows (ReFlows) with distillation offer a promising way for fast sampling, their fast inversion, which transforms images back to structured noise for recovery and subsequent editing, remains unsolved. This paper introduces FireFlow, a simple yet effective zero-shot approach that inherits the startling capacity of ReFlow-based models (such as FLUX) in generation while extending their capabilities to accurate inversion and editing in $8$ steps. We first demonstrate that a carefully designed numerical solver is pivotal for ReFlow inversion, enabling accurate inversion and reconstruction with the precision of a second-order solver while maintaining the practical efficiency of a first-order Euler method. This solver achieves a $3\times$ runtime speedup compared to state-of-the-art ReFlow inversion and editing techniques, while delivering smaller reconstruction errors and superior editing results in a training-free mode. The code is available at https://github.com/HolmesShuan/FireFlow.
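To make the trade-off between solver order and cost concrete, the following toy sketch compares a first-order Euler solver with a standard second-order midpoint solver on a one-dimensional ODE. It is not FireFlow's solver: the velocity field `velocity`, the functions `euler` and `midpoint`, and the step counts are all assumptions chosen for illustration. It only shows the baseline gap that the abstract refers to, namely that a textbook second-order method is more accurate but needs two velocity evaluations per step where Euler needs one.

```python
import math


def velocity(x, t):
    # Hypothetical toy velocity field dx/dt = -x, standing in for a learned
    # ReFlow velocity network. Exact solution: x(t) = x0 * exp(-t).
    return -x


def euler(x0, t0, t1, steps):
    """First-order Euler: one velocity evaluation per step."""
    h = (t1 - t0) / steps
    x, t, nfe = x0, t0, 0
    for _ in range(steps):
        x = x + h * velocity(x, t)
        nfe += 1
        t += h
    return x, nfe


def midpoint(x0, t0, t1, steps):
    """Standard second-order midpoint: two velocity evaluations per step."""
    h = (t1 - t0) / steps
    x, t, nfe = x0, t0, 0
    for _ in range(steps):
        x_mid = x + 0.5 * h * velocity(x, t)       # half step to the midpoint
        x = x + h * velocity(x_mid, t + 0.5 * h)   # full step with midpoint slope
        nfe += 2
        t += h
    return x, nfe


if __name__ == "__main__":
    exact = math.exp(-1.0)
    for steps in (8, 16):
        x_e, nfe_e = euler(1.0, 0.0, 1.0, steps)
        x_m, nfe_m = midpoint(1.0, 0.0, 1.0, steps)
        print(f"steps={steps:2d}  euler err={abs(x_e - exact):.2e} (NFE={nfe_e})  "
              f"midpoint err={abs(x_m - exact):.2e} (NFE={nfe_m})")
```

On this toy problem, halving the step size roughly halves the Euler error and roughly quarters the midpoint error, at twice the number of function evaluations (NFE) for the midpoint rule. The abstract's claim is that a suitably designed solver can approach the second-order accuracy while keeping a cost close to Euler's.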
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Image Generation | PIE-Bench | Background LPIPS | 123.6 | FireFlow |
| Image Generation | PIE-Bench | Background PSNR | 23.03 | FireFlow |
| Image Generation | PIE-Bench | CLIPSIM | 26.02 | FireFlow |
| Image Generation | PIE-Bench | Structure Distance | 27.1 | FireFlow |
| Image Generation | PIE-Bench | Background LPIPS | 239.4 | FireFlow (Add Q) |
| Image Generation | PIE-Bench | Background PSNR | 16.49 | FireFlow (Add Q) |
| Image Generation | PIE-Bench | CLIPSIM | 27.33 | FireFlow (Add Q) |
| Image Generation | PIE-Bench | Structure Distance | 70.9 | FireFlow (Add Q) |