TasksSotADatasetsPapersMethodsSubmitAbout
Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Explore

Notable BenchmarksAll SotADatasetsPapersMethods

Community

Submit ResultsAbout

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Papers/Meta-DMoE: Adapting to Domain Shift by Meta-Distillation f...

Meta-DMoE: Adapting to Domain Shift by Meta-Distillation from Mixture-of-Experts

Tao Zhong, Zhixiang Chi, Li Gu, Yang Wang, Yuanhao Yu, Jin Tang

2022-10-08Meta-LearningDomain GeneralizationTransfer LearningTest-time AdaptationKnowledge Distillation
PaperPDFCode(official)

Abstract

In this paper, we tackle the problem of domain shift. Most existing methods perform training on multiple source domains using a single model, and the same trained model is used on all unseen target domains. Such solutions are sub-optimal as each target domain exhibits its own specialty, which is not adapted. Furthermore, expecting single-model training to learn extensive knowledge from multiple source domains is counterintuitive. The model is more biased toward learning only domain-invariant features and may result in negative knowledge transfer. In this work, we propose a novel framework for unsupervised test-time adaptation, which is formulated as a knowledge distillation process to address domain shift. Specifically, we incorporate Mixture-of-Experts (MoE) as teachers, where each expert is separately trained on different source domains to maximize their specialty. Given a test-time target domain, a small set of unlabeled data is sampled to query the knowledge from MoE. As the source domains are correlated to the target domains, a transformer-based aggregator then combines the domain knowledge by examining the interconnection among them. The output is treated as a supervision signal to adapt a student prediction network toward the target domain. We further employ meta-learning to enforce the aggregator to distill positive knowledge and the student network to achieve fast adaptation. Extensive experiments demonstrate that the proposed method outperforms the state-of-the-art and validates the effectiveness of each proposed component. Our code is available at https://github.com/n3il666/Meta-DMoE.

Results

TaskDatasetMetricValueModel
Domain AdaptationPACSAverage Accuracy86.9Meta-DMoE (ResNet-50)
Domain AdaptationDomainNetAverage Accuracy44.2Meta-DMoE (ResNet-50)
Domain GeneralizationPACSAverage Accuracy86.9Meta-DMoE (ResNet-50)
Domain GeneralizationDomainNetAverage Accuracy44.2Meta-DMoE (ResNet-50)

Related Papers

Visual-Language Model Knowledge Distillation Method for Image Quality Assessment2025-07-21RaMen: Multi-Strategy Multi-Modal Learning for Bundle Construction2025-07-18Simulate, Refocus and Ensemble: An Attention-Refocusing Scheme for Domain Generalization2025-07-17GLAD: Generalizable Tuning for Vision-Language Models2025-07-17MoTM: Towards a Foundation Model for Time Series Imputation based on Continuous Modeling2025-07-17Disentangling coincident cell events using deep transfer learning and compressive sensing2025-07-17Uncertainty-Aware Cross-Modal Knowledge Distillation with Prototype Learning for Multimodal Brain-Computer Interfaces2025-07-17Are encoders able to learn landmarkers for warm-starting of Hyperparameter Optimization?2025-07-16