Alejandro Newell, Zhiao Huang, Jia Deng
We introduce associative embedding, a novel method for supervising convolutional neural networks for the task of detection and grouping. A number of computer vision problems can be framed in this manner, including multi-person pose estimation, instance segmentation, and multi-object tracking. Usually the grouping of detections is achieved with multi-stage pipelines; instead, we propose an approach that teaches a network to simultaneously output detections and group assignments. This technique can be easily integrated into any state-of-the-art network architecture that produces pixel-wise predictions. We show how to apply this method to both multi-person pose estimation and instance segmentation and report state-of-the-art performance for multi-person pose on the MPII and MS-COCO datasets.
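To make the idea of producing detections together with group assignments more concrete, the following is a toy sketch of how detections could be grouped once each one carries a tag. It is not the authors' implementation. It assumes each detection comes with a single scalar tag, and the function name `group_by_tag`, the greedy matching strategy, and the `max_tag_dist` threshold are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Tuple

# One detection: (joint_type, x, y, tag). The scalar tag stands in for the
# embedding value a network would predict at the detected pixel (assumption).
Detection = Tuple[str, float, float, float]
Group = Dict[str, Tuple[float, float]]


def group_by_tag(detections: List[Detection], max_tag_dist: float = 0.5) -> List[Group]:
    """Greedily assign each detection to the group whose mean tag is closest.

    max_tag_dist is a hypothetical threshold: a detection whose tag is at
    least this far from every candidate group's mean tag starts a new group.
    """
    groups: List[Group] = []             # joint_type -> (x, y), one dict per group
    group_tags: List[List[float]] = []   # tags collected so far, one list per group
    for joint, x, y, tag in detections:
        best, best_dist = None, max_tag_dist
        for i, tags in enumerate(group_tags):
            if joint in groups[i]:
                continue  # a group holds at most one joint of each type
            dist = abs(tag - sum(tags) / len(tags))
            if dist < best_dist:
                best, best_dist = i, dist
        if best is None:
            groups.append({joint: (x, y)})
            group_tags.append([tag])
        else:
            groups[best][joint] = (x, y)
            group_tags[best].append(tag)
    return groups


if __name__ == "__main__":
    # Two people: tags near 0.1 belong to one, tags near 2.0 to the other.
    dets = [
        ("head", 10, 5, 0.10),
        ("head", 50, 6, 2.00),
        ("neck", 11, 9, 0.15),
        ("neck", 49, 10, 1.90),
    ]
    for person in group_by_tag(dets):
        print(person)
```

In this sketch, only the relative distance between tags matters, not their absolute values: detections with similar tags end up in the same group, so no separate grouping stage is needed after detection.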
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Pose Estimation | OCHuman | Test AP | 32.8 | Associative Embedding+ |
| Pose Estimation | OCHuman | Validation AP | 40 | Associative Embedding+ |
| Pose Estimation | OCHuman | Test AP | 29.5 | Associative Embedding |
| Pose Estimation | OCHuman | Validation AP | 32.1 | Associative Embedding |
| Pose Estimation | COCO test-dev | AP50 | 86.8 | AE |
| Pose Estimation | COCO test-dev | AP75 | 72.3 | AE |
| Pose Estimation | COCO test-dev | APL | 72.6 | AE |
| Pose Estimation | COCO test-dev | APM | 60.6 | AE |
| Pose Estimation | COCO test-dev | AR | 70.2 | AE |
| Pose Estimation | COCO test-dev | AR50 | 89.5 | AE |
| Pose Estimation | COCO test-dev | AR75 | 76 | AE |
| Pose Estimation | COCO test-dev | ARL | 78.1 | AE |
| Pose Estimation | COCO test-dev | ARM | 64.6 | AE |
| Pose Estimation | COCO (Common Objects in Context) | Test AP | 62.8 | Pose-AE |
| Pose Estimation | COCO (Common Objects in Context) | AP | 0.655 | Associative Embedding |
| 3D | OCHuman | Test AP | 32.8 | Associative Embedding+ |
| 3D | OCHuman | Validation AP | 40 | Associative Embedding+ |
| 3D | OCHuman | Test AP | 29.5 | Associative Embedding |
| 3D | OCHuman | Validation AP | 32.1 | Associative Embedding |
| 3D | COCO test-dev | AP50 | 86.8 | AE |
| 3D | COCO test-dev | AP75 | 72.3 | AE |
| 3D | COCO test-dev | APL | 72.6 | AE |
| 3D | COCO test-dev | APM | 60.6 | AE |
| 3D | COCO test-dev | AR | 70.2 | AE |
| 3D | COCO test-dev | AR50 | 89.5 | AE |
| 3D | COCO test-dev | AR75 | 76 | AE |
| 3D | COCO test-dev | ARL | 78.1 | AE |
| 3D | COCO test-dev | ARM | 64.6 | AE |
| 3D | COCO (Common Objects in Context) | Test AP | 62.8 | Pose-AE |
| 3D | COCO (Common Objects in Context) | AP | 0.655 | Associative Embedding |
| 2D Human Pose Estimation | COCO-WholeBody | WB | 27.4 | AE |
| 2D Human Pose Estimation | COCO-WholeBody | body | 40.5 | AE |
| 2D Human Pose Estimation | COCO-WholeBody | face | 47.7 | AE |
| 2D Human Pose Estimation | COCO-WholeBody | foot | 7.7 | AE |
| 2D Human Pose Estimation | COCO-WholeBody | hand | 34.1 | AE |
| 2D Human Pose Estimation | OCHuman | Test AP | 32.8 | Associative Embedding+ |
| 2D Human Pose Estimation | OCHuman | Validation AP | 40 | Associative Embedding+ |
| 2D Human Pose Estimation | OCHuman | Test AP | 29.5 | Associative Embedding |
| 2D Human Pose Estimation | OCHuman | Validation AP | 32.1 | Associative Embedding |
| Multi-Person Pose Estimation | COCO (Common Objects in Context) | AP | 0.655 | Associative Embedding |
| 1 Image, 2*2 Stitchi | OCHuman | Test AP | 32.8 | Associative Embedding+ |
| 1 Image, 2*2 Stitchi | OCHuman | Validation AP | 40 | Associative Embedding+ |
| 1 Image, 2*2 Stitchi | OCHuman | Test AP | 29.5 | Associative Embedding |
| 1 Image, 2*2 Stitchi | OCHuman | Validation AP | 32.1 | Associative Embedding |
| 1 Image, 2*2 Stitchi | COCO test-dev | AP50 | 86.8 | AE |
| 1 Image, 2*2 Stitchi | COCO test-dev | AP75 | 72.3 | AE |
| 1 Image, 2*2 Stitchi | COCO test-dev | APL | 72.6 | AE |
| 1 Image, 2*2 Stitchi | COCO test-dev | APM | 60.6 | AE |
| 1 Image, 2*2 Stitchi | COCO test-dev | AR | 70.2 | AE |
| 1 Image, 2*2 Stitchi | COCO test-dev | AR50 | 89.5 | AE |
| 1 Image, 2*2 Stitchi | COCO test-dev | AR75 | 76 | AE |
| 1 Image, 2*2 Stitchi | COCO test-dev | ARL | 78.1 | AE |
| 1 Image, 2*2 Stitchi | COCO test-dev | ARM | 64.6 | AE |
| 1 Image, 2*2 Stitchi | COCO (Common Objects in Context) | Test AP | 62.8 | Pose-AE |
| 1 Image, 2*2 Stitchi | COCO (Common Objects in Context) | AP | 0.655 | Associative Embedding |