Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Semantic-visual Guided Transformer for Few-shot Class-incremental Learning

Wenhao Qiu, Sichao Fu, Jingyi Zhang, Chengxiang Lei, Qinmu Peng

2023-03-27 · Few-Shot Class-Incremental Learning · Representation Learning · Class-Incremental Learning · Incremental Learning

Paper · PDF

Abstract

Few-shot class-incremental learning (FSCIL) has recently attracted extensive attention in various areas. Existing FSCIL methods depend heavily on the robustness of a feature backbone pre-trained on the base classes. In recent years, Transformer variants have made significant progress in feature representation learning across many fields; nevertheless, the Transformer has not yet realized the same potential in FSCIL scenarios. In this paper, we develop a semantic-visual guided Transformer (SV-T) to enhance the feature-extraction capacity of the pre-trained feature backbone on incremental classes. Specifically, we first use the visual (image) labels provided by the base classes to supervise the optimization of the Transformer. Then, a text encoder is introduced to automatically generate a corresponding semantic (text) label for each image from the base classes. Finally, the constructed semantic labels are applied to the Transformer to guide its parameter updates. Our SV-T takes full advantage of the richer supervision available from the base classes and further improves the training robustness of the feature backbone. More importantly, SV-T is an independent method that can be applied directly to existing FSCIL architectures to acquire embeddings for various incremental classes. Extensive experiments on three benchmarks, two FSCIL architectures, and two Transformer variants show that the proposed SV-T obtains a significant improvement over existing state-of-the-art FSCIL methods.
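The abstract describes two supervision signals over the same backbone: a visual term driven by base-class image labels and a semantic term driven by text-encoder embeddings of generated text labels. A minimal sketch of how such a combined objective could look, assuming a cosine-distance alignment term and a trade-off weight `lam` (both hypothetical choices, not the authors' implementation):

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy(logits_batch, labels):
    """Visual supervision: classification loss against base-class image labels."""
    total = 0.0
    for logits, y in zip(logits_batch, labels):
        total += -math.log(softmax(logits)[y] + 1e-12)
    return total / len(labels)

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)

def semantic_alignment(img_embs, txt_embs):
    """Semantic supervision: pull each image feature toward the text-encoder
    embedding of its generated text label (mean cosine distance here)."""
    return sum(cosine_distance(a, b) for a, b in zip(img_embs, txt_embs)) / len(img_embs)

def sv_t_loss(logits_batch, labels, img_embs, txt_embs, lam=0.5):
    # Combined objective: visual (label) term plus weighted semantic (text) term.
    # `lam` is a hypothetical trade-off weight, not taken from the paper.
    return cross_entropy(logits_batch, labels) + lam * semantic_alignment(img_embs, txt_embs)
```

When the image features already coincide with their text embeddings, the semantic term vanishes and only the visual classification loss remains.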

Results

| Task                        | Dataset       | Metric           | Value | Model |
|-----------------------------|---------------|------------------|-------|-------|
| Continual Learning          | CUB-200-2011  | Average Accuracy | 78.65 | SV-T  |
| Continual Learning          | CUB-200-2011  | Last Accuracy    | 76.17 | SV-T  |
| Continual Learning          | CIFAR-100     | Average Accuracy | 76.84 | SV-T  |
| Continual Learning          | CIFAR-100     | Last Accuracy    | 69.75 | SV-T  |
| Continual Learning          | mini-Imagenet | Average Accuracy | 85.07 | SV-T  |
| Continual Learning          | mini-Imagenet | Last Accuracy    | 81.65 | SV-T  |
| Class Incremental Learning  | CUB-200-2011  | Average Accuracy | 78.65 | SV-T  |
| Class Incremental Learning  | CUB-200-2011  | Last Accuracy    | 76.17 | SV-T  |
| Class Incremental Learning  | CIFAR-100     | Average Accuracy | 76.84 | SV-T  |
| Class Incremental Learning  | CIFAR-100     | Last Accuracy    | 69.75 | SV-T  |
| Class Incremental Learning  | mini-Imagenet | Average Accuracy | 85.07 | SV-T  |
| Class Incremental Learning  | mini-Imagenet | Last Accuracy    | 81.65 | SV-T  |
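The two metrics in the table are the standard FSCIL summary statistics: Average Accuracy is the mean top-1 accuracy over all incremental sessions, while Last Accuracy is the accuracy after the final session. A small sketch (the example session values are illustrative, not the paper's):

```python
def average_accuracy(session_accs):
    """Mean of per-session top-1 accuracies across all incremental sessions."""
    return sum(session_accs) / len(session_accs)

def last_accuracy(session_accs):
    """Top-1 accuracy after the final incremental session."""
    return session_accs[-1]

# Hypothetical per-session accuracies for a 3-session run:
accs = [90.0, 85.0, 80.0]
print(average_accuracy(accs))  # 85.0
print(last_accuracy(accs))     # 80.0
```

Average Accuracy rewards stable performance across the whole incremental sequence, whereas Last Accuracy captures how much the model retains once every session has been learned.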

Related Papers

- Touch in the Wild: Learning Fine-Grained Manipulation with a Portable Visuo-Tactile Gripper (2025-07-20)
- Spectral Bellman Method: Unifying Representation and Exploration in RL (2025-07-17)
- Boosting Team Modeling through Tempo-Relational Representation Learning (2025-07-17)
- Similarity-Guided Diffusion for Contrastive Sequential Recommendation (2025-07-16)
- Are encoders able to learn landmarkers for warm-starting of Hyperparameter Optimization? (2025-07-16)
- Language-Guided Contrastive Audio-Visual Masked Autoencoder with Automatically Generated Audio-Visual-Text Triplets from Videos (2025-07-16)
- A Mixed-Primitive-based Gaussian Splatting Method for Surface Reconstruction (2025-07-15)
- Dual Dimensions Geometric Representation Learning Based Document Dewarping (2025-07-11)