Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Controlling Vision-Language Models for Multi-Task Image Restoration

Ziwei Luo, Fredrik K. Gustafsson, Zheng Zhao, Jens Sjölund, Thomas B. Schön

Published: 2023-10-02
Tasks: Image Denoising, Shadow Removal, Rain Removal, Image Reconstruction, Image Dehazing, Image Inpainting, Image Restoration, Unified Image Restoration, Single Image Deraining, Language Modelling, Low-Light Image Enhancement
Paper · PDF · Code (official)

Abstract

Vision-language models such as CLIP have shown great impact on diverse downstream tasks for zero-shot or label-free predictions. However, when it comes to low-level vision such as image restoration, their performance deteriorates dramatically due to corrupted inputs. In this paper, we present a degradation-aware vision-language model (DA-CLIP) to better transfer pretrained vision-language models to low-level vision tasks as a multi-task framework for image restoration. More specifically, DA-CLIP trains an additional controller that adapts the fixed CLIP image encoder to predict high-quality feature embeddings. By integrating the embedding into an image restoration network via cross-attention, we are able to pilot the model to learn a high-fidelity image reconstruction. The controller itself also outputs a degradation feature that matches the real corruption of the input, yielding a natural classifier for different degradation types. In addition, we construct a mixed degradation dataset with synthetic captions for DA-CLIP training. Our approach advances state-of-the-art performance on both *degradation-specific* and *unified* image restoration tasks, showing a promising direction of prompting image restoration with large-scale pretrained vision-language models. Our code is available at https://github.com/Algolzw/daclip-uir.
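The abstract describes two roles for the controller: its content embedding conditions the restoration network through cross-attention, and its degradation embedding acts as a natural classifier over degradation types. The sketch below is a minimal, hypothetical illustration of those two mechanisms in numpy; it is not the official DA-CLIP implementation, and all function names, shapes, and the nearest-prototype classifier are illustrative assumptions (the official code is at the repository linked above).

```python
# Hypothetical sketch of DA-CLIP-style conditioning (NOT the official code).
# (1) image features attend to a controller-provided content embedding via
#     cross-attention with a residual connection;
# (2) a degradation embedding is matched against per-degradation prototype
#     vectors by cosine similarity, giving a simple degradation classifier.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, context, d_k=8, seed=0):
    """Inject `context` (controller embedding) into `queries` (image features)."""
    rng = np.random.default_rng(seed)  # stand-in for learned weights
    d_q, d_c = queries.shape[-1], context.shape[-1]
    Wq = rng.standard_normal((d_q, d_k)) / np.sqrt(d_q)
    Wk = rng.standard_normal((d_c, d_k)) / np.sqrt(d_c)
    Wv = rng.standard_normal((d_c, d_q)) / np.sqrt(d_c)
    q, k, v = queries @ Wq, context @ Wk, context @ Wv
    attn = softmax(q @ k.T / np.sqrt(d_k))
    return queries + attn @ v  # residual injection into the restoration branch

def classify_degradation(deg_embedding, prototypes):
    """Nearest-prototype degradation classification via cosine similarity."""
    sims = prototypes @ deg_embedding / (
        np.linalg.norm(prototypes, axis=1) * np.linalg.norm(deg_embedding))
    return int(np.argmax(sims))

# Toy usage: 16 feature tokens of dim 32, one content embedding of dim 64.
feats = np.random.default_rng(1).standard_normal((16, 32))
content_emb = np.random.default_rng(2).standard_normal((1, 64))
out = cross_attention(feats, content_emb)  # same shape as feats
```

In the paper's framework the attention weights are learned and the degradation feature is trained contrastively against text embeddings of degradation types; the prototype matching here is only a stand-in for that classification behaviour.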

Results

Task              | Dataset   | Metric       | Value | Model
Image Enhancement | LOL       | Average PSNR | 23.77 | DA-CLIP
Image Enhancement | LOL       | LPIPS        | 0.083 | DA-CLIP
Image Enhancement | LOL       | SSIM         | 0.83  | DA-CLIP
Rain Removal      | Rain100H  | PSNR         | 33.91 | DA-CLIP
Rain Removal      | Rain100H  | SSIM         | 0.926 | DA-CLIP
Image Dehazing    | RESIDE-6K | PSNR         | 30.16 | DA-CLIP
Image Dehazing    | RESIDE-6K | SSIM         | 0.936 | DA-CLIP

Related Papers

Visual-Language Model Knowledge Distillation Method for Image Quality Assessment (2025-07-21)
Making Language Model a Hierarchical Classifier and Generator (2025-07-17)
VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning (2025-07-17)
The Generative Energy Arena (GEA): Incorporating Energy Awareness in Large Language Model (LLM) Human Evaluations (2025-07-17)
Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities (2025-07-17)
Unsupervised Part Discovery via Descriptor-Based Masked Image Restoration with Optimized Constraints (2025-07-16)
Assay2Mol: large language model-based drug design using BioAssay context (2025-07-16)
Describe Anything Model for Visual Question Answering on Text-rich Images (2025-07-16)