Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Deep Residual Learning for Image Recognition

Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun

2015-12-10 | CVPR 2016 6

Tasks: Pedestrian Trajectory Prediction, Retinal OCT Disease Classification, Image Classification, Multi-Label Image Classification, Face Anti-Spoofing, Domain Generalization, Semantic Segmentation, Medical Image Classification, Person Re-Identification, Classification, Dynamic Facial Expression Recognition, Out-of-Distribution Generalization, Object Detection, Image-to-Image Translation
Paper | PDF | Code (official and community implementations)

Abstract

Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers, 8x deeper than VGG nets but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers. The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions, where we also won 1st place on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.
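
The residual reformulation described in the abstract can be illustrated with a small sketch. It is a simplified illustration, not the authors' implementation: it uses tiny fully connected layers in plain Python instead of convolutions, and it omits biases, batch normalization, and the nonlinearity applied after the addition. The names `linear`, `residual_block`, and `plain_block` are hypothetical and chosen only for this example.

```python
from typing import List

Vector = List[float]
Matrix = List[Vector]


def relu(v: Vector) -> Vector:
    """Element-wise max(0, a)."""
    return [max(0.0, a) for a in v]


def linear(weights: Matrix, v: Vector) -> Vector:
    """Multiply a weight matrix (given as a list of rows) by a vector."""
    return [sum(w * a for w, a in zip(row, v)) for row in weights]


def plain_block(x: Vector, w1: Matrix, w2: Matrix) -> Vector:
    """Unreferenced mapping: the layers must produce the full output."""
    return linear(w2, relu(linear(w1, x)))


def residual_block(x: Vector, w1: Matrix, w2: Matrix) -> Vector:
    """Residual mapping: the layers learn F(x) and the block returns F(x) + x."""
    residual = plain_block(x, w1, w2)
    # Shortcut connection: add the block input back onto the learned residual.
    return [r + a for r, a in zip(residual, x)]


if __name__ == "__main__":
    x = [1.5, -2.0]
    zeros = [[0.0, 0.0], [0.0, 0.0]]
    print(plain_block(x, zeros, zeros))     # zero weights erase the input
    print(residual_block(x, zeros, zeros))  # zero weights pass the input through
```

With all weights at zero, the residual block reduces to an identity mapping while the plain block outputs zeros. This is one way to see why stacking extra residual blocks should not make a network harder to optimize than its shallower counterpart.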

Results

Task | Dataset | Metric | Value | Model
Autonomous Vehicles | UAV-Human | Backpack | 63.5 | ResNet
Autonomous Vehicles | UAV-Human | Gender | 74.7 | ResNet
Autonomous Vehicles | UAV-Human | Hat | 65.2 | ResNet
Autonomous Vehicles | UAV-Human | LCC | 49.7 | ResNet
Autonomous Vehicles | UAV-Human | LCS | 69.3 | ResNet
Autonomous Vehicles | UAV-Human | UCC | 44.4 | ResNet
Autonomous Vehicles | UAV-Human | UCS | 68.9 | ResNet
Image-to-Image Translation | GTAV-to-Cityscapes Labels | mIoU | 41.7 | ResNet101 65.1
Image-to-Image Translation | Syn2Real-C | Accuracy | 52.4 | No Adaptation
Domain Adaptation | Office-31 | Average Accuracy | 76.1 | ResNet-50
Domain Adaptation | Office-Home | Accuracy | 59.9 | ResNet-50 [cite:CVPR16DRL]
Domain Adaptation | ImageNet-R | Top-1 Error Rate | 63.9 | ResNet-50
Domain Adaptation | ImageNet-A | Top-1 accuracy % | 4.2 | ResNet-50 (300 Epochs)
Domain Adaptation | VizWiz-Classification | Accuracy - All Images | 47.5 | ResNet-152
Domain Adaptation | VizWiz-Classification | Accuracy - Clean Images | 51.3 | ResNet-152
Domain Adaptation | VizWiz-Classification | Accuracy - Corrupted Images | 43.3 | ResNet-152
Domain Adaptation | VizWiz-Classification | Accuracy - All Images | 46.3 | ResNet-101
Domain Adaptation | VizWiz-Classification | Accuracy - Clean Images | 50.1 | ResNet-101
Domain Adaptation | VizWiz-Classification | Accuracy - Corrupted Images | 40.5 | ResNet-101
Domain Adaptation | VizWiz-Classification | Accuracy - All Images | 42.9 | ResNet-50
Domain Adaptation | VizWiz-Classification | Accuracy - Clean Images | 47.7 | ResNet-50
Domain Adaptation | VizWiz-Classification | Accuracy - Corrupted Images | 37.1 | ResNet-50
Image Generation | GTAV-to-Cityscapes Labels | mIoU | 41.7 | ResNet101 65.1
Image Generation | Syn2Real-C | Accuracy | 52.4 | No Adaptation
Person Re-Identification | SYSU-30k | Rank-1 | 20.1 | ResNet-50 (generalization)
Crowds | UCF-QNRF | MAE | 190 | Resnet101
Speaker Verification | VoxCeleb2 | EER | 100 | ResNet-50
Semantic Segmentation | Cityscapes val | mIoU | 75.7 | Dilated-ResNet (Dilated-ResNet-101)
Semantic Segmentation | DADA-seg | mIoU | 23.6 | ResNet-101
Semantic Segmentation | DADA-seg | mIoU | 18.96 | ResNet-50
Multi-Label Image Classification | VizWiz-Classification | Accuracy | 47.5 | ResNet151
Pedestrian Attribute Recognition | UAV-Human | Backpack | 63.5 | ResNet
Pedestrian Attribute Recognition | UAV-Human | Gender | 74.7 | ResNet
Pedestrian Attribute Recognition | UAV-Human | Hat | 65.2 | ResNet
Pedestrian Attribute Recognition | UAV-Human | LCC | 49.7 | ResNet
Pedestrian Attribute Recognition | UAV-Human | LCS | 69.3 | ResNet
Pedestrian Attribute Recognition | UAV-Human | UCC | 44.4 | ResNet
Pedestrian Attribute Recognition | UAV-Human | UCS | 68.9 | ResNet
Object Detection | COCO minival | AP50 | 64.3 | Cascade Mask R-CNN (ResNet-50)
Object Detection | COCO minival | AP75 | 50.5 | Cascade Mask R-CNN (ResNet-50)
Object Detection | COCO minival | box AP | 46.3 | Cascade Mask R-CNN (ResNet-50)
Object Detection | COCO minival | AP50 | 63 | GFL (ResNet-50)
Object Detection | COCO minival | AP75 | 48.3 | GFL (ResNet-50)
Object Detection | COCO minival | box AP | 44.5 | GFL (ResNet-50)
Object Detection | COCO minival | AP50 | 61.9 | ATSS (ResNet-50)
Object Detection | COCO minival | AP75 | 47 | ATSS (ResNet-50)
Object Detection | COCO minival | box AP | 43.5 | ATSS (ResNet-50)
Image Classification | GasHisSDB | Accuracy | 98.56 | ResNet-50
Image Classification | GasHisSDB | F1-Score | 99.24 | ResNet-50
Image Classification | GasHisSDB | Precision | 99.94 | ResNet-50
Image Classification | GasHisSDB | Accuracy | 98.47 | ResNet-18
Image Classification | GasHisSDB | F1-Score | 99.19 | ResNet-18
Image Classification | GasHisSDB | Precision | 99.94 | ResNet-18
Image Classification | OmniBenchmark | Average Top-1 Accuracy | 37.4 | ResNet-101
Image Classification | OmniBenchmark | Average Top-1 Accuracy | 34.3 | ResNet-50
Image Classification | cifar100 | 1:1 Accuracy | 45.98 | shreynet
Image Classification | ImageNet | GFLOPs | 11.3 | ResNet-152
Image Classification | ImageNet | GFLOPs | 7.6 | ResNet-101
Image Classification | ImageNet | GFLOPs | 3.8 | ResNet-50
Image Classification | VizWiz-Classification | Accuracy | 47.5 | ResNet151
3D | COCO minival | AP50 | 64.3 | Cascade Mask R-CNN (ResNet-50)
3D | COCO minival | AP75 | 50.5 | Cascade Mask R-CNN (ResNet-50)
3D | COCO minival | box AP | 46.3 | Cascade Mask R-CNN (ResNet-50)
3D | COCO minival | AP50 | 63 | GFL (ResNet-50)
3D | COCO minival | AP75 | 48.3 | GFL (ResNet-50)
3D | COCO minival | box AP | 44.5 | GFL (ResNet-50)
3D | COCO minival | AP50 | 61.9 | ATSS (ResNet-50)
3D | COCO minival | AP75 | 47 | ATSS (ResNet-50)
3D | COCO minival | box AP | 43.5 | ATSS (ResNet-50)
Breast Tumour Classification | PCam | AUC | 0.948 | ResNet-50 (e)
Breast Tumour Classification | PCam | AUC | 0.942 | ResNet-34 (e)
Unsupervised Domain Adaptation | Office-Home | Accuracy | 59.9 | ResNet-50 [cite:CVPR16DRL]
2D Classification | COCO minival | AP50 | 64.3 | Cascade Mask R-CNN (ResNet-50)
2D Classification | COCO minival | AP75 | 50.5 | Cascade Mask R-CNN (ResNet-50)
2D Classification | COCO minival | box AP | 46.3 | Cascade Mask R-CNN (ResNet-50)
2D Classification | COCO minival | AP50 | 63 | GFL (ResNet-50)
2D Classification | COCO minival | AP75 | 48.3 | GFL (ResNet-50)
2D Classification | COCO minival | box AP | 44.5 | GFL (ResNet-50)
2D Classification | COCO minival | AP50 | 61.9 | ATSS (ResNet-50)
2D Classification | COCO minival | AP75 | 47 | ATSS (ResNet-50)
2D Classification | COCO minival | box AP | 43.5 | ATSS (ResNet-50)
Classification | XImageNet-12 | Robustness Score | 0.8985 | ResNet 50
Classification | NCT-CRC-HE-100K | Accuracy (%) | 94.72 | ResNet-50
Classification | NCT-CRC-HE-100K | F1-Score | 97.09 | ResNet-50
Classification | NCT-CRC-HE-100K | Precision | 100 | ResNet-50
Classification | NCT-CRC-HE-100K | Specificity | 99.34 | ResNet-50
Classification | NCT-CRC-HE-100K | Accuracy (%) | 92.66 | ResNet-18
Classification | NCT-CRC-HE-100K | F1-Score | 95.23 | ResNet-18
Classification | NCT-CRC-HE-100K | Precision | 99.9 | ResNet-18
Classification | NCT-CRC-HE-100K | Specificity | 99.08 | ResNet-18
2D Object Detection | COCO minival | AP50 | 64.3 | Cascade Mask R-CNN (ResNet-50)
2D Object Detection | COCO minival | AP75 | 50.5 | Cascade Mask R-CNN (ResNet-50)
2D Object Detection | COCO minival | box AP | 46.3 | Cascade Mask R-CNN (ResNet-50)
2D Object Detection | COCO minival | AP50 | 63 | GFL (ResNet-50)
2D Object Detection | COCO minival | AP75 | 48.3 | GFL (ResNet-50)
2D Object Detection | COCO minival | box AP | 44.5 | GFL (ResNet-50)
2D Object Detection | COCO minival | AP50 | 61.9 | ATSS (ResNet-50)
2D Object Detection | COCO minival | AP75 | 47 | ATSS (ResNet-50)
2D Object Detection | COCO minival | box AP | 43.5 | ATSS (ResNet-50)
Medical Image Classification | NCT-CRC-HE-100K | Accuracy (%) | 94.72 | ResNet-50
Medical Image Classification | NCT-CRC-HE-100K | F1-Score | 97.09 | ResNet-50
Medical Image Classification | NCT-CRC-HE-100K | Precision | 100 | ResNet-50
Medical Image Classification | NCT-CRC-HE-100K | Specificity | 99.34 | ResNet-50
Medical Image Classification | NCT-CRC-HE-100K | Accuracy (%) | 92.66 | ResNet-18
Medical Image Classification | NCT-CRC-HE-100K | F1-Score | 95.23 | ResNet-18
Medical Image Classification | NCT-CRC-HE-100K | Precision | 99.9 | ResNet-18
Medical Image Classification | NCT-CRC-HE-100K | Specificity | 99.08 | ResNet-18
Domain Generalization | ImageNet-R | Top-1 Error Rate | 63.9 | ResNet-50
Domain Generalization | ImageNet-A | Top-1 accuracy % | 4.2 | ResNet-50 (300 Epochs)
Domain Generalization | VizWiz-Classification | Accuracy - All Images | 47.5 | ResNet-152
Domain Generalization | VizWiz-Classification | Accuracy - Clean Images | 51.3 | ResNet-152
Domain Generalization | VizWiz-Classification | Accuracy - Corrupted Images | 43.3 | ResNet-152
Domain Generalization | VizWiz-Classification | Accuracy - All Images | 46.3 | ResNet-101
Domain Generalization | VizWiz-Classification | Accuracy - Clean Images | 50.1 | ResNet-101
Domain Generalization | VizWiz-Classification | Accuracy - Corrupted Images | 40.5 | ResNet-101
Domain Generalization | VizWiz-Classification | Accuracy - All Images | 42.9 | ResNet-50
Domain Generalization | VizWiz-Classification | Accuracy - Clean Images | 47.7 | ResNet-50
Domain Generalization | VizWiz-Classification | Accuracy - Corrupted Images | 37.1 | ResNet-50
10-shot image generation | Cityscapes val | mIoU | 75.7 | Dilated-ResNet (Dilated-ResNet-101)
10-shot image generation | DADA-seg | mIoU | 23.6 | ResNet-101
10-shot image generation | DADA-seg | mIoU | 18.96 | ResNet-50
16k | COCO minival | AP50 | 64.3 | Cascade Mask R-CNN (ResNet-50)
16k | COCO minival | AP75 | 50.5 | Cascade Mask R-CNN (ResNet-50)
16k | COCO minival | box AP | 46.3 | Cascade Mask R-CNN (ResNet-50)
16k | COCO minival | AP50 | 63 | GFL (ResNet-50)
16k | COCO minival | AP75 | 48.3 | GFL (ResNet-50)
16k | COCO minival | box AP | 44.5 | GFL (ResNet-50)
16k | COCO minival | AP50 | 61.9 | ATSS (ResNet-50)
16k | COCO minival | AP75 | 47 | ATSS (ResNet-50)
16k | COCO minival | box AP | 43.5 | ATSS (ResNet-50)
1 Image, 2*2 Stitching | GTAV-to-Cityscapes Labels | mIoU | 41.7 | ResNet101 65.1
1 Image, 2*2 Stitching | Syn2Real-C | Accuracy | 52.4 | No Adaptation

Related Papers

SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction (2025-07-21)
Automatic Classification and Segmentation of Tunnel Cracks Based on Deep Learning and Visual Explanations (2025-07-18)
Adversarial attacks to image classification systems using evolutionary algorithms (2025-07-17)
Efficient Adaptation of Pre-trained Vision Transformer underpinned by Approximately Orthogonal Fine-Tuning Strategy (2025-07-17)
Federated Learning for Commercial Image Sources (2025-07-17)
MUPAX: Multidimensional Problem Agnostic eXplainable AI (2025-07-17)
Simulate, Refocus and Ensemble: An Attention-Refocusing Scheme for Domain Generalization (2025-07-17)
GLAD: Generalizable Tuning for Vision-Language Models (2025-07-17)