Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Context-Aware Robust Fine-Tuning

Xiaofeng Mao, Yuefeng Chen, Xiaojun Jia, Rong Zhang, Hui Xue, Zhao Li

2022-11-29 · Domain Generalization
Paper · PDF

Abstract

Contrastive Language-Image Pre-trained (CLIP) models have the zero-shot ability to classify an image as "[CLASS]" by measuring the similarity between the image and the prompt sentence "a [CONTEXT] of [CLASS]". Thanks to exhaustive text cues in "[CONTEXT]", the CLIP model is aware of different contexts, e.g. background, style, and viewpoint, and exhibits unprecedented robustness against a wide range of distribution shifts. However, recent works find that further fine-tuning of CLIP models improves accuracy but sacrifices robustness on downstream tasks. We conduct an empirical investigation showing that fine-tuning corrupts the context-aware ability of pre-trained CLIP features. To solve this problem, we propose Context-Aware Robust Fine-tuning (CAR-FT). CAR-FT regularizes the model during fine-tuning to capture the context information. Specifically, we use zero-shot prompt weights to obtain the context distribution contained in the image. By minimizing the Kullback-Leibler Divergence (KLD) between the context distributions induced by the original and fine-tuned CLIP models, CAR-FT lets downstream tasks inherit the context-aware ability of CLIP and achieves both higher In-Distribution (ID) and Out-Of-Distribution (OOD) accuracy. Experimental results show that CAR-FT achieves superior robustness on five OOD test datasets of ImageNet while also bringing accuracy gains on nine downstream tasks. Additionally, CAR-FT surpasses previous Domain Generalization (DG) methods and reaches 78.5% average accuracy on the DomainBed benchmark, setting a new state of the art.
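The abstract describes CAR-FT's regularizer only at a high level. As a hedged illustration (not the authors' code; all names here are hypothetical), the sketch below computes a "context distribution" by softmaxing an image feature's similarities against fixed zero-shot context-prompt embeddings, then measures the KL divergence between the distributions induced by the original and fine-tuned encoders — the quantity CAR-FT minimizes during fine-tuning:

```python
import numpy as np

def softmax(x, temperature=0.01):
    # Numerically stable softmax over similarity scores.
    z = x / temperature
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def context_distribution(image_feat, context_prompt_embeds, temperature=0.01):
    # Cosine similarity between one image feature and each context prompt
    # embedding (e.g. "a photo of ...", "a sketch of ..."), turned into a
    # probability distribution over contexts.
    img = image_feat / np.linalg.norm(image_feat)
    prompts = context_prompt_embeds / np.linalg.norm(
        context_prompt_embeds, axis=1, keepdims=True
    )
    return softmax(prompts @ img, temperature)

def car_ft_kld(orig_feat, finetuned_feat, context_prompt_embeds, eps=1e-12):
    # KL( p_orig || p_finetuned ): zero when the fine-tuned model induces the
    # same context distribution as the pre-trained one, positive otherwise.
    p = context_distribution(orig_feat, context_prompt_embeds)
    q = context_distribution(finetuned_feat, context_prompt_embeds)
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))
```

In training, this scalar would be added to the task loss as a regularization term, so gradients pull the fine-tuned encoder's context distribution back toward the frozen pre-trained one.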

Results

Task                  | Dataset         | Metric               | Value | Model
--------------------- | --------------- | -------------------- | ----- | -----------------------------
Domain Adaptation     | PACS            | Average Accuracy     | 96.8  | CAR-FT (CLIP, ViT-B/16)
Domain Adaptation     | ImageNet-R      | Top-1 Error Rate     | 10.3  | CAR-FT (CLIP, ViT-L/14@336px)
Domain Adaptation     | Office-Home     | Average Accuracy     | 85.7  | CAR-FT (CLIP, ViT-B/16)
Domain Adaptation     | ImageNet-A      | Top-1 Accuracy (%)   | 81.5  | CAR-FT (CLIP, ViT-L/14@336px)
Domain Adaptation     | DomainNet       | Average Accuracy     | 62.5  | CAR-FT (CLIP, ViT-B/16)
Domain Adaptation     | VLCS            | Average Accuracy     | 85.5  | CAR-FT (CLIP, ViT-B/16)
Domain Adaptation     | ImageNet-Sketch | Top-1 Accuracy (%)   | 65.5  | CAR-FT (CLIP, ViT-L/14@336px)
Domain Adaptation     | TerraIncognita  | Average Accuracy     | 61.9  | CAR-FT (CLIP, ViT-B/16)
Domain Generalization | PACS            | Average Accuracy     | 96.8  | CAR-FT (CLIP, ViT-B/16)
Domain Generalization | ImageNet-R      | Top-1 Error Rate     | 10.3  | CAR-FT (CLIP, ViT-L/14@336px)
Domain Generalization | Office-Home     | Average Accuracy     | 85.7  | CAR-FT (CLIP, ViT-B/16)
Domain Generalization | ImageNet-A      | Top-1 Accuracy (%)   | 81.5  | CAR-FT (CLIP, ViT-L/14@336px)
Domain Generalization | DomainNet       | Average Accuracy     | 62.5  | CAR-FT (CLIP, ViT-B/16)
Domain Generalization | VLCS            | Average Accuracy     | 85.5  | CAR-FT (CLIP, ViT-B/16)
Domain Generalization | ImageNet-Sketch | Top-1 Accuracy (%)   | 65.5  | CAR-FT (CLIP, ViT-L/14@336px)
Domain Generalization | TerraIncognita  | Average Accuracy     | 61.9  | CAR-FT (CLIP, ViT-B/16)

Related Papers

- Simulate, Refocus and Ensemble: An Attention-Refocusing Scheme for Domain Generalization (2025-07-17)
- GLAD: Generalizable Tuning for Vision-Language Models (2025-07-17)
- MoTM: Towards a Foundation Model for Time Series Imputation based on Continuous Modeling (2025-07-17)
- InstructFLIP: Exploring Unified Vision-Language Model for Face Anti-spoofing (2025-07-16)
- From Physics to Foundation Models: A Review of AI-Driven Quantitative Remote Sensing Inversion (2025-07-11)
- Feed-Forward SceneDINO for Unsupervised Semantic Scene Completion (2025-07-08)
- Prompt-Free Conditional Diffusion for Multi-object Image Augmentation (2025-07-08)
- Integrated Structural Prompt Learning for Vision-Language Models (2025-07-08)