Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


MoCo v3

Computer Vision · Introduced 2021 · 10 papers
Source Paper

Description

MoCo v3 aims to stabilize the training of self-supervised Vision Transformers (ViTs). It is an incremental improvement over MoCo v1/v2. Two crops of each image are taken under random data augmentation and encoded by two encoders, $f_q$ and $f_k$, with output vectors $q$ and $k$. The vector $q$ behaves like a "query", and the goal of learning is to retrieve the corresponding "key". The objective is to minimize a contrastive loss function of the following form:

$$\mathcal{L}_q = -\log \frac{\exp\left(q \cdot k^{+} / \tau\right)}{\exp\left(q \cdot k^{+} / \tau\right) + \sum_{k^{-}} \exp\left(q \cdot k^{-} / \tau\right)}$$
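The loss above is the InfoNCE objective: a softmax cross-entropy over similarities, where the positive key $k^{+}$ should score highest. A minimal NumPy sketch for a single query (the function name, shapes, and temperature value are illustrative, not taken from the MoCo v3 codebase):

```python
import numpy as np

def info_nce_loss(q, keys, pos_idx, tau=0.2):
    """Contrastive (InfoNCE) loss for one query.

    q       : (d,) L2-normalized query vector
    keys    : (N, d) L2-normalized key vectors; exactly one is the positive k+
    pos_idx : index of the positive key in `keys`
    tau     : temperature (illustrative value)
    """
    logits = keys @ q / tau          # q·k / tau for every key
    logits -= logits.max()           # stabilize the softmax numerically
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[pos_idx])   # -log softmax probability of the positive
```

The loss is small when $q$ is most similar to the positive key and large when a negative scores higher, which is what drives the query encoder to "retrieve" the right key.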

This approach trains the Transformer in the contrastive/Siamese paradigm. The encoder $f_q$ consists of a backbone (e.g., a ResNet or a ViT), a projection head, and an extra prediction head. The encoder $f_k$ has the backbone and projection head but not the prediction head. $f_k$ is updated by the moving average of $f_q$, excluding the prediction head.
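The moving-average update of $f_k$ can be sketched as an exponential moving average over parameters; the dict-of-arrays representation and the momentum value `m=0.99` are illustrative assumptions (MoCo-style methods use a momentum close to 1):

```python
import numpy as np

def momentum_update(params_q, params_k, m=0.99):
    """EMA update: theta_k <- m * theta_k + (1 - m) * theta_q.

    params_q, params_k : dicts mapping parameter names to numpy arrays.
    Only names present in params_k are updated, mirroring the fact that
    f_k has no prediction head (those f_q parameters are simply skipped).
    """
    for name, theta_k in params_k.items():
        params_k[name] = m * theta_k + (1.0 - m) * params_q[name]
    return params_k
```

Because no gradients flow into $f_k$, this update is the only way its weights change; it keeps the key encoder a slowly moving, stable target for the contrastive loss.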

Papers Using This Method

- Task-Specific Knowledge Distillation from the Vision Foundation Model for Enhanced Medical Image Segmentation (2025-03-10)
- Enhancing Contrastive Learning Inspired by the Philosophy of "The Blind Men and the Elephant" (2024-12-21)
- SRA: A Novel Method to Improve Feature Embedding in Self-supervised Learning for Histopathological Images (2024-10-23)
- Improving Visual Prompt Tuning for Self-supervised Vision Transformers (2023-06-08)
- Internet Explorer: Targeted Representation Learning on the Open Web (2023-02-27)
- Pretraining the Vision Transformer using self-supervised methods for vision based Deep Reinforcement Learning (2022-09-22)
- Fast-MoCo: Boost Momentum-based Contrastive Learning with Combinatorial Patches (2022-07-17)
- OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework (2022-02-07)
- Self-Supervised Learning with Swin Transformers (2021-05-10)
- An Empirical Study of Training Self-Supervised Vision Transformers (2021-04-05)