Juhong Min, Jongmin Lee, Jean Ponce, Minsu Cho
Establishing visual correspondences under large intra-class variations requires analyzing images at different levels, from features linked to semantics and context to local patterns, while being invariant to instance-specific details. To tackle these challenges, we represent images by "hyperpixels" that leverage a small number of relevant features selected among early to late layers of a convolutional neural network. Taking advantage of the condensed features of hyperpixels, we develop an effective real-time matching algorithm based on Hough geometric voting. The proposed method, hyperpixel flow, sets a new state of the art on three standard benchmarks as well as a new dataset, SPair-71k, which contains a significantly larger number of image pairs than existing datasets, with more accurate and richer annotations for in-depth analysis.
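
To make the idea of a hyperpixel concrete, the following is a minimal sketch of one plausible reading: feature maps from a few selected layers are brought to a common resolution and stacked along the channel axis. The abstract does not say how the selected features are combined, so the nearest-neighbor upsampling, the channel concatenation, and the names `upsample_nearest`, `build_hyperpixels`, `layer_maps`, and `selected` are all assumptions made for illustration, not the authors' implementation.

```python
def upsample_nearest(fmap, out_h, out_w):
    """Nearest-neighbor resize of a feature map stored as C x H x W nested lists."""
    in_h, in_w = len(fmap[0]), len(fmap[0][0])
    return [
        [[ch[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
         for y in range(out_h)]
        for ch in fmap
    ]


def build_hyperpixels(layer_maps, selected):
    """Stack the selected layers' features at the largest selected resolution.

    layer_maps: list of per-layer feature maps (each C x H x W nested lists).
    selected:   indices of the layers to keep (hypothetical selection result).
    Returns a single C_total x H x W nested list.
    """
    chosen = [layer_maps[i] for i in selected]
    out_h = max(len(m[0]) for m in chosen)
    out_w = max(len(m[0][0]) for m in chosen)
    stacked = []
    for m in chosen:
        # Assumption: coarser maps are upsampled, then concatenated channel-wise.
        stacked.extend(upsample_nearest(m, out_h, out_w))
    return stacked


# Toy example: an early 2x2 map with one channel and a late 1x1 map with two channels.
early = [[[1, 2], [3, 4]]]
middle = [[[5]]]
late = [[[7]], [[8]]]
hyper = build_hyperpixels([early, middle, late], selected=[0, 2])
```

In this toy example each spatial position of `hyper` carries three values, one from the early layer and two from the late layer.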
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Image Matching | SPair-71k | PCK | 28.2 | HPF |
| Image Matching | PF-PASCAL | PCK | 88.3 | HPF |
| Image Matching | PF-WILLOW | PCK | 76.3 | HPF |
| Image Matching | Caltech-101 | IoU | 63 | HPF |
| Image Matching | Caltech-101 | LT-ACC | 87 | HPF |
| Semantic correspondence | SPair-71k | PCK | 28.2 | HPF |
| Semantic correspondence | PF-PASCAL | PCK | 88.3 | HPF |
| Semantic correspondence | PF-WILLOW | PCK | 76.3 | HPF |
| Semantic correspondence | Caltech-101 | IoU | 63 | HPF |
| Semantic correspondence | Caltech-101 | LT-ACC | 87 | HPF |
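
The PCK (percentage of correct keypoints) values in the table count a transferred keypoint as correct when it lands within a distance threshold of its ground-truth position. The sketch below illustrates that computation under stated assumptions: the table does not give the threshold or the reference size, so the tolerance `alpha`, the reference length `ref_size` (for example, the longer side of an image or bounding box), and the function name `pck` are illustrative choices rather than the benchmarks' exact protocol.

```python
import math


def pck(pred, gt, ref_size, alpha=0.1):
    """Fraction of predicted keypoints within alpha * ref_size of the ground truth.

    pred, gt: equal-length lists of (x, y) keypoints.
    ref_size: reference length in pixels (assumed here; benchmarks differ).
    alpha:    tolerance as a fraction of ref_size (assumed default).
    """
    if len(pred) != len(gt):
        raise ValueError("pred and gt must contain the same number of keypoints")
    if not gt:
        return 0.0
    threshold = alpha * ref_size
    correct = sum(
        1
        for (px, py), (gx, gy) in zip(pred, gt)
        if math.hypot(px - gx, py - gy) <= threshold
    )
    return correct / len(gt)


# Toy example: three keypoints, threshold of 10 pixels.
pred = [(10, 10), (50, 50), (90, 20)]
gt = [(12, 11), (70, 50), (90, 29)]
score = pck(pred, gt, ref_size=100, alpha=0.1)
```

Here the first and third predictions fall within the 10-pixel threshold and the second does not, so `score` is two thirds; multiplying by 100 gives a percentage of the kind reported in the table.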