Papers With Code

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


CATs: Cost Aggregation Transformers for Visual Correspondence

Seokju Cho, Sunghwan Hong, Sangryul Jeon, Yunsung Lee, Kwanghoon Sohn, Seungryong Kim

2021-06-04 · NeurIPS 2021 · Semantic correspondence
Paper · PDF · Code

Abstract

We propose a novel cost aggregation network, called Cost Aggregation Transformers (CATs), to find dense correspondences between semantically similar images under the additional challenges posed by large intra-class appearance and geometric variations. Cost aggregation is a critical step in matching tasks, as matching accuracy depends on the quality of its output. Compared to hand-crafted or CNN-based cost aggregation methods, which either lack robustness to severe deformations or inherit the limited receptive fields of CNNs and thus fail to discriminate incorrect matches, CATs explore global consensus among the initial correlation maps through architectural designs that fully leverage the self-attention mechanism. Specifically, we include appearance affinity modeling to aid the cost aggregation process by disambiguating the noisy initial correlation maps, and we propose multi-level aggregation to efficiently capture different semantics from hierarchical feature representations. We then combine these with a swapping self-attention technique and residual connections, not only to enforce consistent matching but also to ease the learning process; we find that together these yield a clear performance boost. We conduct experiments demonstrating the effectiveness of the proposed model over the latest methods and provide extensive ablation studies. Code and trained models are available at https://github.com/SunghwanHong/CATs.
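To make the abstract's ingredients concrete, here is a minimal, illustrative sketch of the aggregation recipe it describes: self-attention over rows of a correlation map augmented with appearance features (affinity modeling), a transpose so the same attention is applied along the other image's axis (the swapping self-attention technique), and residual connections. All function names, shapes, and the use of identity projections are our assumptions for brevity; the actual CATs model uses learned multi-head transformer layers and multi-level feature maps — see the linked repository for the authors' implementation.

```python
import numpy as np

def self_attention(x):
    # Single-head self-attention with identity Q/K/V projections
    # (illustrative stand-in for a learned transformer layer).
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)
    return w @ x

def cats_aggregate(corr, feat_s, feat_t):
    """Hypothetical sketch of CATs-style cost aggregation.

    corr:   (hw_s, hw_t) initial correlation map (often noisy)
    feat_s: (hw_s, d) source appearance features
    feat_t: (hw_t, d) target appearance features
    """
    # Appearance affinity modeling: concatenate appearance features to
    # each cost row so attention can use them to disambiguate matches.
    x = np.concatenate([corr, feat_s], axis=-1)
    agg = self_attention(x)[:, : corr.shape[1]]
    corr = corr + agg  # residual connection

    # Swapping self-attention: transpose the cost so the attention now
    # runs along the other image's spatial axis, enforcing consistency.
    x_t = np.concatenate([corr.T, feat_t], axis=-1)
    agg_t = self_attention(x_t)[:, : corr.shape[0]]
    corr = corr + agg_t.T  # residual connection again
    return corr
```

The transpose-and-reattend step is the key design choice: attending over both axes of the cost volume lets every candidate match see the global consensus in both images, which a CNN with a limited receptive field cannot.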

Results

Task                    | Dataset   | Metric | Value | Model
Image Matching          | SPair-71k | PCK    | 49.9  | CATs
Image Matching          | PF-PASCAL | PCK    | 92.6  | CATs
Image Matching          | PF-WILLOW | PCK    | 79.2  | CATs
Semantic correspondence | SPair-71k | PCK    | 49.9  | CATs
Semantic correspondence | PF-PASCAL | PCK    | 92.6  | CATs
Semantic correspondence | PF-WILLOW | PCK    | 79.2  | CATs
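The metric in the table above, PCK (Percentage of Correct Keypoints), counts a predicted keypoint as correct when it lands within a threshold distance of the ground truth. A minimal sketch, assuming a threshold of alpha times the larger image side; note that the exact normalizer varies by benchmark (e.g. SPair-71k commonly normalizes by the object bounding box rather than the image):

```python
import numpy as np

def pck(pred_kps, gt_kps, img_size, alpha=0.1):
    """Fraction of predicted keypoints within alpha * max(H, W)
    of their ground-truth locations."""
    thresh = alpha * max(img_size)
    dists = np.linalg.norm(np.asarray(pred_kps, float)
                           - np.asarray(gt_kps, float), axis=1)
    return float((dists <= thresh).mean())
```

For example, with a 100x100 image and alpha=0.1, the threshold is 10 pixels, so a prediction 2 px off counts as correct while one 40 px off does not.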

Related Papers

RL from Physical Feedback: Aligning Large Motion Models with Humanoid Control (2025-06-15)
Jamais Vu: Exposing the Generalization Gap in Supervised Semantic Correspondence (2025-06-09)
Do It Yourself: Learning Semantic Correspondence from Pseudo-Labels (2025-06-05)
MotionRAG-Diff: A Retrieval-Augmented Diffusion Framework for Long-Term Music-to-Dance Generation (2025-06-03)
Cora: Correspondence-aware image editing using few step diffusion (2025-05-29)
Semantic Correspondence: Unified Benchmarking and a Strong Baseline (2025-05-23)
TC-MGC: Text-Conditioned Multi-Grained Contrastive Learning for Text-Video Retrieval (2025-04-07)
SemAlign3D: Semantic Correspondence between RGB-Images through Aligning 3D Object-Class Representations (2025-03-28)