Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Enhance the Visual Representation via Discrete Adversarial Training

Xiaofeng Mao, Yuefeng Chen, Ranjie Duan, Yao Zhu, Gege Qi, Shaokai Ye, Xiaodan Li, Rong Zhang, Hui Xue

2022-09-16 · Image Classification · Self-Supervised Learning · Domain Generalization · Object Detection
Paper · PDF · Code (official)

Abstract

Adversarial Training (AT), commonly accepted as one of the most effective defenses against adversarial examples, can substantially harm standard performance and thus has limited usefulness in industrial-scale production and applications. Surprisingly, the opposite holds in Natural Language Processing (NLP) tasks, where AT can even benefit generalization. We observe that the merit of AT in NLP tasks may derive from the discrete and symbolic input space. To borrow this advantage from NLP-style AT, we propose Discrete Adversarial Training (DAT). DAT leverages VQGAN to reform image data into discrete, text-like inputs, i.e., visual words. It then minimizes the maximal risk on such discrete images under symbolic adversarial perturbations. We further give an explanation from a distributional perspective to demonstrate the effectiveness of DAT. As a plug-and-play technique for enhancing visual representations, DAT achieves significant improvements on multiple tasks, including image classification, object detection, and self-supervised learning. Notably, a model pre-trained with Masked Auto-Encoding (MAE) and fine-tuned with DAT, without extra data, reaches 31.40 mCE on ImageNet-C and 32.77% top-1 accuracy on Stylized-ImageNet, setting a new state of the art. The code is available at https://github.com/alibaba/easyrobust.
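The min-max structure described above can be sketched in a few lines. This is a toy illustration only: a tiny vector quantizer stands in for VQGAN, a linear head stands in for the vision model, and the greedy per-token inner attack is an assumption of this sketch, not the paper's actual optimization.

```python
import numpy as np

# Toy sketch of the DAT min-max idea. The codebook, linear head, and
# greedy token-swap attack are illustrative assumptions, not the
# paper's implementation.
rng = np.random.default_rng(0)
K, D, T = 8, 4, 5                      # codebook size, embed dim, tokens/image
codebook = rng.normal(size=(K, D))

def quantize(feats):
    """Map continuous features (T, D) to nearest-codebook indices ("visual words")."""
    d = ((feats[:, None, :] - codebook[None]) ** 2).sum(-1)
    return d.argmin(axis=1)

def loss_and_grad(w, tokens, y):
    """Logistic loss and gradient of a linear head on the mean token embedding."""
    x = codebook[tokens].mean(axis=0)
    p = 1.0 / (1.0 + np.exp(-(x @ w)))
    eps = 1e-12
    loss = -(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    return loss, (p - y) * x

def discrete_attack(w, tokens, y):
    """Inner maximization: greedily swap each visual word for the codebook
    entry that most increases the loss -- a symbolic, not pixel, perturbation."""
    adv = tokens.copy()
    for t in range(len(adv)):
        best, best_loss = adv[t], -np.inf
        for k in range(K):
            cand = adv.copy()
            cand[t] = k
            l, _ = loss_and_grad(w, cand, y)
            if l > best_loss:
                best, best_loss = k, l
        adv[t] = best
    return adv

# Outer minimization: SGD steps on the adversarially re-tokenized input.
w = rng.normal(size=D)
feats, y = rng.normal(size=(T, D)), 1
for _ in range(50):
    adv = discrete_attack(w, quantize(feats), y)
    w -= 0.5 * loss_and_grad(w, adv, y)[1]
```

Because the current token is always among the candidates at each position, the greedy search never decreases the loss, so the inner step is a valid (if weak) maximizer for the outer minimization to train against.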

Results

Task | Dataset | Metric | Value | Model
Domain Adaptation | Stylized-ImageNet | Top-1 Accuracy (%) | 32.77 | MAE+DAT (ViT-H)
Domain Adaptation | ImageNet-R | Top-1 Error Rate (%) | 34.39 | MAE+DAT (ViT-H)
Domain Adaptation | ImageNet-A | Top-1 Accuracy (%) | 68.92 | MAE+DAT (ViT-H)
Domain Adaptation | ImageNet-C | mean Corruption Error (mCE) | 31.40 | MAE+DAT (ViT-H)
Domain Adaptation | ImageNet-Sketch | Top-1 Accuracy (%) | 50.03 | MAE+DAT (ViT-H)
Domain Generalization | Stylized-ImageNet | Top-1 Accuracy (%) | 32.77 | MAE+DAT (ViT-H)
Domain Generalization | ImageNet-R | Top-1 Error Rate (%) | 34.39 | MAE+DAT (ViT-H)
Domain Generalization | ImageNet-A | Top-1 Accuracy (%) | 68.92 | MAE+DAT (ViT-H)
Domain Generalization | ImageNet-C | mean Corruption Error (mCE) | 31.40 | MAE+DAT (ViT-H)
Domain Generalization | ImageNet-Sketch | Top-1 Accuracy (%) | 50.03 | MAE+DAT (ViT-H)

Related Papers

Automatic Classification and Segmentation of Tunnel Cracks Based on Deep Learning and Visual Explanations (2025-07-18)
Adversarial attacks to image classification systems using evolutionary algorithms (2025-07-17)
Efficient Adaptation of Pre-trained Vision Transformer underpinned by Approximately Orthogonal Fine-Tuning Strategy (2025-07-17)
Federated Learning for Commercial Image Sources (2025-07-17)
MUPAX: Multidimensional Problem Agnostic eXplainable AI (2025-07-17)
A Semi-Supervised Learning Method for the Identification of Bad Exposures in Large Imaging Surveys (2025-07-17)
Simulate, Refocus and Ensemble: An Attention-Refocusing Scheme for Domain Generalization (2025-07-17)
GLAD: Generalizable Tuning for Vision-Language Models (2025-07-17)