Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

BATMAN: Bilateral Attention Transformer in Motion-Appearance Neighboring Space for Video Object Segmentation

Ye Yu, Jialin Yuan, Gaurav Mittal, Li Fuxin, Mei Chen

2022-08-01
Tasks: Visual Object Tracking, Semi-Supervised Video Object Segmentation, Optical Flow Estimation, Segmentation, Semantic Segmentation, Video Object Segmentation, Video Semantic Segmentation, Video Understanding

Abstract

Video Object Segmentation (VOS) is fundamental to video understanding. Transformer-based methods show significant performance improvement on semi-supervised VOS. However, existing work faces challenges segmenting visually similar objects in close proximity to each other. In this paper, we propose a novel Bilateral Attention Transformer in Motion-Appearance Neighboring space (BATMAN) for semi-supervised VOS. It captures object motion in the video via a novel optical flow calibration module that fuses the segmentation mask with optical flow estimation to improve within-object optical flow smoothness and reduce noise at object boundaries. This calibrated optical flow is then employed in our novel bilateral attention, which computes the correspondence between the query and reference frames in the neighboring bilateral space considering both motion and appearance. Extensive experiments validate the effectiveness of the BATMAN architecture by outperforming all existing state-of-the-art methods on all four popular VOS benchmarks: YouTube-VOS 2019 (85.0%), YouTube-VOS 2018 (85.3%), DAVIS 2017 val/test-dev (86.2%/82.2%), and DAVIS 2016 (92.5%).
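To make the idea of attention restricted to a motion-appearance neighborhood more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the function name `neighborhood_masked_attention`, the hard radius thresholds `spatial_radius` and `flow_radius`, and the fallback for queries with no neighbors are all assumptions made for illustration. The sketch only shows the general pattern in which a query token attends to reference tokens that are close both in image position and in optical flow, with appearance entering through the query-key dot product.

```python
import math


def neighborhood_masked_attention(queries, keys, values, q_pos, k_pos,
                                  q_flow, k_flow,
                                  spatial_radius=1.5, flow_radius=1.0):
    """Softmax attention limited to keys near each query in position and flow.

    All arguments are lists of equal-length float lists. The thresholds are
    hypothetical parameters, not values taken from the paper.
    """
    scale = 1.0 / math.sqrt(len(queries[0]))
    outputs, all_weights = [], []
    for qi, q in enumerate(queries):
        # A key is a neighbor only if it is close in BOTH position and flow.
        allowed = [
            math.dist(q_pos[qi], k_pos[kj]) <= spatial_radius
            and math.dist(q_flow[qi], k_flow[kj]) <= flow_radius
            for kj in range(len(keys))
        ]
        if not any(allowed):
            # Assumed fallback: attend to every key rather than to none.
            allowed = [True] * len(keys)
        # Appearance similarity: scaled dot product of query and key features.
        scores = [scale * sum(a * b for a, b in zip(q, key)) for key in keys]
        top = max(s for s, ok in zip(scores, allowed) if ok)
        exps = [math.exp(s - top) if ok else 0.0
                for s, ok in zip(scores, allowed)]
        total = sum(exps)
        weights = [e / total for e in exps]
        dim = len(values[0])
        outputs.append([sum(w * val[d] for w, val in zip(weights, values))
                        for d in range(dim)])
        all_weights.append(weights)
    return outputs, all_weights


# Toy example: three reference tokens with identical appearance features.
# Key 1 is nearby but moves in the opposite direction; key 2 moves with the
# query but is far away. Only key 0 passes both tests.
out, weights = neighborhood_masked_attention(
    queries=[[1.0, 0.0]],
    keys=[[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]],
    values=[[10.0], [20.0], [30.0]],
    q_pos=[[0.0, 0.0]],
    k_pos=[[0.0, 1.0], [1.0, 0.0], [5.0, 5.0]],
    q_flow=[[1.0, 0.0]],
    k_flow=[[1.0, 0.0], [-1.0, 0.0], [1.0, 0.0]],
)
```

In the toy example the three reference tokens look identical, so appearance alone cannot separate them; the position and flow tests are what leave a single candidate. This mirrors the failure case the abstract describes, visually similar objects that are close together, although the paper's actual bilateral-space construction may differ from the hard thresholds used here.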

Results

Task | Dataset | Metric | Value | Model
Video | YouTube-VOS 2019 | F-Measure (Seen) | 89.3 | BATMAN
Video | YouTube-VOS 2019 | F-Measure (Unseen) | 87.2 | BATMAN
Video | YouTube-VOS 2019 | Jaccard (Seen) | 84.5 | BATMAN
Video | YouTube-VOS 2019 | Jaccard (Unseen) | 79 | BATMAN
Video | YouTube-VOS 2019 | Mean Jaccard & F-Measure | 85 | BATMAN
Video | YouTube-VOS 2019 | Mean Jaccard & F-Measure | 84.1 | AOT
Video | YouTube-VOS 2019 | F-Measure (Seen) | 85.4 | STCN
Video | YouTube-VOS 2019 | F-Measure (Seen) | 85.1 | CFBI
Video | YouTube-VOS 2019 | F-Measure (Unseen) | 83 | CFBI
Video | YouTube-VOS 2019 | Jaccard (Seen) | 80.6 | CFBI
Video | YouTube-VOS 2019 | Jaccard (Unseen) | 75.2 | CFBI
Video | YouTube-VOS 2019 | Mean Jaccard & F-Measure | 81 | CFBI
Video | DAVIS 2016 | F-Score | 94.2 | BATMAN (val)
Video | DAVIS 2016 | J&F | 92.5 | BATMAN (val)
Video | DAVIS 2016 | Jaccard (Mean) | 90.7 | BATMAN (val)
Video | DAVIS 2016 | F-Score | 92.5 | STCN (val)
Video | DAVIS 2016 | J&F | 91.6 | STCN (val)
Video | DAVIS 2016 | Jaccard (Mean) | 90.8 | STCN (val)
Video | DAVIS 2016 | F-Score | 92.1 | AOT (val)
Video | DAVIS 2016 | J&F | 91.1 | AOT (val)
Video | DAVIS 2016 | Jaccard (Mean) | 90.1 | AOT (val)
Video | DAVIS 2016 | F-Score | 91.4 | LCM (val)
Video | DAVIS 2016 | J&F | 90.7 | LCM (val)
Video | DAVIS 2016 | Jaccard (Mean) | 89.9 | LCM (val)
Video | DAVIS 2016 | F-Score | 94 | RPCMVOS (val)
Video | DAVIS 2016 | J&F | 90.6 | RPCMVOS (val)
Video | DAVIS 2016 | Jaccard (Mean) | 87.1 | RPCMVOS (val)
Video | DAVIS 2016 | F-Score | 91.5 | KMN (val)
Video | DAVIS 2016 | J&F | 90.5 | KMN (val)
Video | DAVIS 2016 | Jaccard (Mean) | 89.5 | KMN (val)
Video | DAVIS 2016 | F-Score | 91.2 | TransVOS (val)
Video | DAVIS 2016 | J&F | 90.5 | TransVOS (val)
Video | DAVIS 2016 | Jaccard (Mean) | 89.8 | TransVOS (val)
Video | DAVIS 2016 | F-Score | 91.1 | CFBI+ (val)
Video | DAVIS 2016 | J&F | 89.9 | CFBI+ (val)
Video | DAVIS 2016 | Jaccard (Mean) | 88.7 | CFBI+ (val)
Video | DAVIS 2016 | F-Score | 90.5 | CFBI (val)
Video | DAVIS 2016 | J&F | 89.4 | CFBI (val)
Video | DAVIS 2016 | Jaccard (Mean) | 88.3 | CFBI (val)
Video | DAVIS 2016 | F-Score | 88.7 | RMN (val)
Video | DAVIS 2016 | J&F | 88.8 | RMN (val)
Video | DAVIS 2016 | Jaccard (Mean) | 88.9 | RMN (val)
Video | DAVIS 2016 | F-Score | 89.9 | STM (val)
Video | DAVIS 2016 | Jaccard (Mean) | 88.7 | STM (val)
Video | DAVIS 2017 (test-dev) | F-measure | 86.1 | BATMAN
Video | DAVIS 2017 (test-dev) | Jaccard | 78.4 | BATMAN
Video | DAVIS 2017 (test-dev) | Mean Jaccard & F-Measure | 82.2 | BATMAN
Video | DAVIS 2017 (test-dev) | F-measure | 81.8 | LCM
Video | DAVIS 2017 (test-dev) | Jaccard | 74.4 | LCM
Video | DAVIS 2017 (test-dev) | Mean Jaccard & F-Measure | 78.1 | LCM
Video | DAVIS 2017 (test-dev) | F-measure | 80.9 | TransVOS
Video | DAVIS 2017 (test-dev) | Jaccard | 73 | TransVOS
Video | DAVIS 2017 (test-dev) | Mean Jaccard & F-Measure | 76.9 | TransVOS
Video | DAVIS 2017 (test-dev) | F-measure | 79.6 | STCN
Video | DAVIS 2017 (test-dev) | Jaccard | 72.7 | STCN
Video | DAVIS 2017 (test-dev) | Mean Jaccard & F-Measure | 76.1 | STCN
Video | DAVIS 2017 (test-dev) | F-measure | 78.1 | RMN
Video | DAVIS 2017 (test-dev) | Jaccard | 71.9 | RMN
Video | DAVIS 2017 (test-dev) | Jaccard | 71.6 | CFBI+
Video | DAVIS 2017 (test-dev) | Mean Jaccard & F-Measure | 75.6 | CFBI+
Video | DAVIS 2017 (test-dev) | F-measure | 78.7 | CFBI
Video | DAVIS 2017 (test-dev) | Jaccard | 71.4 | CFBI
Video | DAVIS 2017 (test-dev) | Mean Jaccard & F-Measure | 75 | CFBI
Video | YouTube-VOS 2018 | F-Measure (Seen) | 88.5 | AOT
Video | YouTube-VOS 2018 | F-Measure (Unseen) | 86.1 | AOT
Video | YouTube-VOS 2018 | Jaccard (Seen) | 83.7 | AOT
Video | YouTube-VOS 2018 | Jaccard (Unseen) | 78.1 | AOT
Video | YouTube-VOS 2018 | Mean Jaccard & F-Measure | 84.1 | AOT
Video | YouTube-VOS 2018 | F-Measure (Seen) | 86.5 | STCN
Video | YouTube-VOS 2018 | F-Measure (Unseen) | 85.7 | STCN
Video | YouTube-VOS 2018 | Jaccard (Seen) | 81.9 | STCN
Video | YouTube-VOS 2018 | Jaccard (Unseen) | 77.9 | STCN
Video | YouTube-VOS 2018 | Mean Jaccard & F-Measure | 83 | STCN
Video | YouTube-VOS 2018 | Jaccard (Seen) | 82.2 | LCM
Video | YouTube-VOS 2018 | Mean Jaccard & F-Measure | 82 | LCM
Video | YouTube-VOS 2018 | F-Measure (Seen) | 86.7 | TransVOS
Video | YouTube-VOS 2018 | F-Measure (Unseen) | 83.4 | TransVOS
Video | YouTube-VOS 2018 | Jaccard (Seen) | 82 | TransVOS
Video | YouTube-VOS 2018 | Jaccard (Unseen) | 75 | TransVOS
Video | YouTube-VOS 2018 | Mean Jaccard & F-Measure | 81.8 | TransVOS
Video | YouTube-VOS 2018 | Jaccard (Seen) | 81.2 | SST
Video | YouTube-VOS 2018 | Jaccard (Unseen) | 76 | SST
Video | YouTube-VOS 2018 | Mean Jaccard & F-Measure | 81.7 | SST
Video | YouTube-VOS 2018 | F-Measure (Seen) | 84.9 | LWL
Video | YouTube-VOS 2018 | F-Measure (Unseen) | 84.4 | LWL
Video | YouTube-VOS 2018 | Jaccard (Seen) | 80.4 | LWL
Video | YouTube-VOS 2018 | Jaccard (Unseen) | 76.4 | LWL
Video | YouTube-VOS 2018 | Mean Jaccard & F-Measure | 81.5 | LWL
Video | YouTube-VOS 2018 | Jaccard (Unseen) | 75.3 | KMN
Video | YouTube-VOS 2018 | F-Measure (Seen) | 84.2 | STM
Video | YouTube-VOS 2018 | F-Measure (Unseen) | 80.9 | STM
Video | YouTube-VOS 2018 | Jaccard (Seen) | 79.7 | STM
Video | YouTube-VOS 2018 | Jaccard (Unseen) | 72.8 | STM
Video | YouTube-VOS 2018 | Mean Jaccard & F-Measure | 79.4 | STM
Video | YouTube-VOS 2018 | F-Measure (Seen) | 85.7 | RMN
Video | YouTube-VOS 2018 | F-Measure (Unseen) | 82.4 | RMN
Video | YouTube-VOS 2018 | Jaccard (Seen) | 82.1 | RMN
Video | YouTube-VOS 2018 | Jaccard (Unseen) | 75.7 | RMN
Video | DAVIS 2017 (val) | F-measure | 89.3 | BATMAN
Video | DAVIS 2017 (val) | Mean Jaccard & F-Measure | 86.2 | BATMAN
Video | DAVIS 2017 (val) | Jaccard | 82.2 | STCN
Video | DAVIS 2017 (val) | Mean Jaccard & F-Measure | 85.4 | STCN
Video | DAVIS 2017 (val) | F-measure | 87.5 | AOT
Video | DAVIS 2017 (val) | Jaccard | 82.3 | AOT
Video | DAVIS 2017 (val) | Mean Jaccard & F-Measure | 84.9 | AOT
Video | DAVIS 2017 (val) | F-measure | 86.4 | TransVOS
Video | DAVIS 2017 (val) | Jaccard | 81.4 | TransVOS
Video | DAVIS 2017 (val) | Mean Jaccard & F-Measure | 83.9 | TransVOS
Video | DAVIS 2017 (val) | F-measure | 86 | RMN
Video | DAVIS 2017 (val) | Mean Jaccard & F-Measure | 83.5 | RMN
Video | DAVIS 2017 (val) | F-measure | 85.1 | SST
Video | DAVIS 2017 (val) | Jaccard | 79.9 | SST
Video | DAVIS 2017 (val) | Mean Jaccard & F-Measure | 82.5 | SST
Video | DAVIS 2017 (val) | F-measure | 84.5 | CFBI
Video | DAVIS 2017 (val) | Jaccard | 79.3 | CFBI
Video | DAVIS 2017 (val) | F-measure | 84.1 | LWL
Video | DAVIS 2017 (val) | Jaccard | 79.1 | LWL
Video | DAVIS 2017 (val) | Mean Jaccard & F-Measure | 81.6 | LWL
Video | DAVIS 2017 (val) | F-measure | 86.5 | LCM
Video | DAVIS 2017 (val) | Jaccard | 80.5 | LCM
Video | YouTube-VOS 2018 | Jaccard (Unseen) | 75.3 | KMN
Video | YouTube-VOS 2018 | F-Measure (Unseen) | 83.4 | CFBI
Object Tracking | YouTube-VOS 2018 | Jaccard (Unseen) | 75.7 | RMN
Object Tracking | YouTube-VOS 2018 | Jaccard (Unseen) | 75.3 | KMN
Object Tracking | YouTube-VOS 2018 | F-Measure (Seen) | 86.7 | TransVOS
Object Tracking | YouTube-VOS 2018 | F-Measure (Unseen) | 83.4 | TransVOS
Object Tracking | YouTube-VOS 2018 | F-Measure (Unseen) | 83.4 | CFBI
Video Object Segmentation | YouTube-VOS 2019 | F-Measure (Seen) | 89.3 | BATMAN
Video Object Segmentation | YouTube-VOS 2019 | F-Measure (Unseen) | 87.2 | BATMAN
Video Object Segmentation | YouTube-VOS 2019 | Jaccard (Seen) | 84.5 | BATMAN
Video Object Segmentation | YouTube-VOS 2019 | Jaccard (Unseen) | 79 | BATMAN
Video Object Segmentation | YouTube-VOS 2019 | Mean Jaccard & F-Measure | 85 | BATMAN
Video Object Segmentation | YouTube-VOS 2019 | Mean Jaccard & F-Measure | 84.1 | AOT
Video Object Segmentation | YouTube-VOS 2019 | F-Measure (Seen) | 85.4 | STCN
Video Object Segmentation | YouTube-VOS 2019 | F-Measure (Seen) | 85.1 | CFBI
Video Object Segmentation | YouTube-VOS 2019 | F-Measure (Unseen) | 83 | CFBI
Video Object Segmentation | YouTube-VOS 2019 | Jaccard (Seen) | 80.6 | CFBI
Video Object Segmentation | YouTube-VOS 2019 | Jaccard (Unseen) | 75.2 | CFBI
Video Object Segmentation | YouTube-VOS 2019 | Mean Jaccard & F-Measure | 81 | CFBI
Video Object Segmentation | DAVIS 2016 | F-Score | 94.2 | BATMAN (val)
Video Object Segmentation | DAVIS 2016 | J&F | 92.5 | BATMAN (val)
Video Object Segmentation | DAVIS 2016 | Jaccard (Mean) | 90.7 | BATMAN (val)
Video Object Segmentation | DAVIS 2016 | F-Score | 92.5 | STCN (val)
Video Object Segmentation | DAVIS 2016 | J&F | 91.6 | STCN (val)
Video Object Segmentation | DAVIS 2016 | Jaccard (Mean) | 90.8 | STCN (val)
Video Object Segmentation | DAVIS 2016 | F-Score | 92.1 | AOT (val)
Video Object Segmentation | DAVIS 2016 | J&F | 91.1 | AOT (val)
Video Object Segmentation | DAVIS 2016 | Jaccard (Mean) | 90.1 | AOT (val)
Video Object Segmentation | DAVIS 2016 | F-Score | 91.4 | LCM (val)
Video Object Segmentation | DAVIS 2016 | J&F | 90.7 | LCM (val)
Video Object Segmentation | DAVIS 2016 | Jaccard (Mean) | 89.9 | LCM (val)
Video Object Segmentation | DAVIS 2016 | F-Score | 94 | RPCMVOS (val)
Video Object Segmentation | DAVIS 2016 | J&F | 90.6 | RPCMVOS (val)
Video Object Segmentation | DAVIS 2016 | Jaccard (Mean) | 87.1 | RPCMVOS (val)
Video Object Segmentation | DAVIS 2016 | F-Score | 91.5 | KMN (val)
Video Object Segmentation | DAVIS 2016 | J&F | 90.5 | KMN (val)
Video Object Segmentation | DAVIS 2016 | Jaccard (Mean) | 89.5 | KMN (val)
Video Object Segmentation | DAVIS 2016 | F-Score | 91.2 | TransVOS (val)
Video Object Segmentation | DAVIS 2016 | J&F | 90.5 | TransVOS (val)
Video Object Segmentation | DAVIS 2016 | Jaccard (Mean) | 89.8 | TransVOS (val)
Video Object Segmentation | DAVIS 2016 | F-Score | 91.1 | CFBI+ (val)
Video Object Segmentation | DAVIS 2016 | J&F | 89.9 | CFBI+ (val)
Video Object Segmentation | DAVIS 2016 | Jaccard (Mean) | 88.7 | CFBI+ (val)
Video Object Segmentation | DAVIS 2016 | F-Score | 90.5 | CFBI (val)
Video Object Segmentation | DAVIS 2016 | J&F | 89.4 | CFBI (val)
Video Object Segmentation | DAVIS 2016 | Jaccard (Mean) | 88.3 | CFBI (val)
Video Object Segmentation | DAVIS 2016 | F-Score | 88.7 | RMN (val)
Video Object Segmentation | DAVIS 2016 | J&F | 88.8 | RMN (val)
Video Object Segmentation | DAVIS 2016 | Jaccard (Mean) | 88.9 | RMN (val)
Video Object Segmentation | DAVIS 2016 | F-Score | 89.9 | STM (val)
Video Object Segmentation | DAVIS 2016 | Jaccard (Mean) | 88.7 | STM (val)
Video Object Segmentation | DAVIS 2017 (test-dev) | F-measure | 86.1 | BATMAN
Video Object Segmentation | DAVIS 2017 (test-dev) | Jaccard | 78.4 | BATMAN
Video Object Segmentation | DAVIS 2017 (test-dev) | Mean Jaccard & F-Measure | 82.2 | BATMAN
Video Object Segmentation | DAVIS 2017 (test-dev) | F-measure | 81.8 | LCM
Video Object Segmentation | DAVIS 2017 (test-dev) | Jaccard | 74.4 | LCM
Video Object Segmentation | DAVIS 2017 (test-dev) | Mean Jaccard & F-Measure | 78.1 | LCM
Video Object Segmentation | DAVIS 2017 (test-dev) | F-measure | 80.9 | TransVOS
Video Object Segmentation | DAVIS 2017 (test-dev) | Jaccard | 73 | TransVOS
Video Object Segmentation | DAVIS 2017 (test-dev) | Mean Jaccard & F-Measure | 76.9 | TransVOS
Video Object Segmentation | DAVIS 2017 (test-dev) | F-measure | 79.6 | STCN
Video Object Segmentation | DAVIS 2017 (test-dev) | Jaccard | 72.7 | STCN
Video Object Segmentation | DAVIS 2017 (test-dev) | Mean Jaccard & F-Measure | 76.1 | STCN
Video Object Segmentation | DAVIS 2017 (test-dev) | F-measure | 78.1 | RMN
Video Object Segmentation | DAVIS 2017 (test-dev) | Jaccard | 71.9 | RMN
Video Object Segmentation | DAVIS 2017 (test-dev) | Jaccard | 71.6 | CFBI+
Video Object Segmentation | DAVIS 2017 (test-dev) | Mean Jaccard & F-Measure | 75.6 | CFBI+
Video Object Segmentation | DAVIS 2017 (test-dev) | F-measure | 78.7 | CFBI
Video Object Segmentation | DAVIS 2017 (test-dev) | Jaccard | 71.4 | CFBI
Video Object Segmentation | DAVIS 2017 (test-dev) | Mean Jaccard & F-Measure | 75 | CFBI
Video Object Segmentation | YouTube-VOS 2018 | F-Measure (Seen) | 88.5 | AOT
Video Object Segmentation | YouTube-VOS 2018 | F-Measure (Unseen) | 86.1 | AOT
Video Object Segmentation | YouTube-VOS 2018 | Jaccard (Seen) | 83.7 | AOT
Video Object Segmentation | YouTube-VOS 2018 | Jaccard (Unseen) | 78.1 | AOT
Video Object Segmentation | YouTube-VOS 2018 | Mean Jaccard & F-Measure | 84.1 | AOT
Video Object Segmentation | YouTube-VOS 2018 | F-Measure (Seen) | 86.5 | STCN
Video Object Segmentation | YouTube-VOS 2018 | F-Measure (Unseen) | 85.7 | STCN
Video Object Segmentation | YouTube-VOS 2018 | Jaccard (Seen) | 81.9 | STCN
Video Object Segmentation | YouTube-VOS 2018 | Jaccard (Unseen) | 77.9 | STCN
Video Object Segmentation | YouTube-VOS 2018 | Mean Jaccard & F-Measure | 83 | STCN
Video Object Segmentation | YouTube-VOS 2018 | Jaccard (Seen) | 82.2 | LCM
Video Object Segmentation | YouTube-VOS 2018 | Mean Jaccard & F-Measure | 82 | LCM
Video Object Segmentation | YouTube-VOS 2018 | F-Measure (Seen) | 86.7 | TransVOS
Video Object Segmentation | YouTube-VOS 2018 | F-Measure (Unseen) | 83.4 | TransVOS
Video Object Segmentation | YouTube-VOS 2018 | Jaccard (Seen) | 82 | TransVOS
Video Object Segmentation | YouTube-VOS 2018 | Jaccard (Unseen) | 75 | TransVOS
Video Object Segmentation | YouTube-VOS 2018 | Mean Jaccard & F-Measure | 81.8 | TransVOS
Video Object Segmentation | YouTube-VOS 2018 | Jaccard (Seen) | 81.2 | SST
Video Object Segmentation | YouTube-VOS 2018 | Jaccard (Unseen) | 76 | SST
Video Object Segmentation | YouTube-VOS 2018 | Mean Jaccard & F-Measure | 81.7 | SST
Video Object Segmentation | YouTube-VOS 2018 | F-Measure (Seen) | 84.9 | LWL
Video Object Segmentation | YouTube-VOS 2018 | F-Measure (Unseen) | 84.4 | LWL
Video Object Segmentation | YouTube-VOS 2018 | Jaccard (Seen) | 80.4 | LWL
Video Object Segmentation | YouTube-VOS 2018 | Jaccard (Unseen) | 76.4 | LWL
Video Object Segmentation | YouTube-VOS 2018 | Mean Jaccard & F-Measure | 81.5 | LWL
Video Object Segmentation | YouTube-VOS 2018 | Jaccard (Unseen) | 75.3 | KMN
Video Object Segmentation | YouTube-VOS 2018 | F-Measure (Seen) | 84.2 | STM
Video Object Segmentation | YouTube-VOS 2018 | F-Measure (Unseen) | 80.9 | STM
Video Object Segmentation | YouTube-VOS 2018 | Jaccard (Seen) | 79.7 | STM
Video Object Segmentation | YouTube-VOS 2018 | Jaccard (Unseen) | 72.8 | STM
Video Object Segmentation | YouTube-VOS 2018 | Mean Jaccard & F-Measure | 79.4 | STM
Video Object Segmentation | YouTube-VOS 2018 | F-Measure (Seen) | 85.7 | RMN
Video Object Segmentation | YouTube-VOS 2018 | F-Measure (Unseen) | 82.4 | RMN
Video Object Segmentation | YouTube-VOS 2018 | Jaccard (Seen) | 82.1 | RMN
Video Object Segmentation | YouTube-VOS 2018 | Jaccard (Unseen) | 75.7 | RMN
Video Object Segmentation | DAVIS 2017 (val) | F-measure | 89.3 | BATMAN
Video Object Segmentation | DAVIS 2017 (val) | Mean Jaccard & F-Measure | 86.2 | BATMAN
Video Object Segmentation | DAVIS 2017 (val) | Jaccard | 82.2 | STCN
Video Object Segmentation | DAVIS 2017 (val) | Mean Jaccard & F-Measure | 85.4 | STCN
Video Object Segmentation | DAVIS 2017 (val) | F-measure | 87.5 | AOT
Video Object Segmentation | DAVIS 2017 (val) | Jaccard | 82.3 | AOT
Video Object Segmentation | DAVIS 2017 (val) | Mean Jaccard & F-Measure | 84.9 | AOT
Video Object Segmentation | DAVIS 2017 (val) | F-measure | 86.4 | TransVOS
Video Object Segmentation | DAVIS 2017 (val) | Jaccard | 81.4 | TransVOS
Video Object Segmentation | DAVIS 2017 (val) | Mean Jaccard & F-Measure | 83.9 | TransVOS
Video Object Segmentation | DAVIS 2017 (val) | F-measure | 86 | RMN
Video Object Segmentation | DAVIS 2017 (val) | Mean Jaccard & F-Measure | 83.5 | RMN
Video Object Segmentation | DAVIS 2017 (val) | F-measure | 85.1 | SST
Video Object Segmentation | DAVIS 2017 (val) | Jaccard | 79.9 | SST
Video Object Segmentation | DAVIS 2017 (val) | Mean Jaccard & F-Measure | 82.5 | SST
Video Object Segmentation | DAVIS 2017 (val) | F-measure | 84.5 | CFBI
Video Object Segmentation | DAVIS 2017 (val) | Jaccard | 79.3 | CFBI
Video Object Segmentation | DAVIS 2017 (val) | F-measure | 84.1 | LWL
Video Object Segmentation | DAVIS 2017 (val) | Jaccard | 79.1 | LWL
Video Object Segmentation | DAVIS 2017 (val) | Mean Jaccard & F-Measure | 81.6 | LWL
Video Object Segmentation | DAVIS 2017 (val) | F-measure | 86.5 | LCM
Video Object Segmentation | DAVIS 2017 (val) | Jaccard | 80.5 | LCM
Video Object Segmentation | YouTube-VOS 2018 | Jaccard (Unseen) | 75.3 | KMN
Video Object Segmentation | YouTube-VOS 2018 | F-Measure (Unseen) | 83.4 | CFBI
Semi-Supervised Video Object Segmentation | YouTube-VOS 2018 | Jaccard (Unseen) | 75.3 | KMN
Semi-Supervised Video Object Segmentation | YouTube-VOS 2018 | F-Measure (Unseen) | 83.4 | CFBI
Visual Object Tracking | YouTube-VOS 2018 | Jaccard (Unseen) | 75.7 | RMN
Visual Object Tracking | YouTube-VOS 2018 | Jaccard (Unseen) | 75.3 | KMN
Visual Object Tracking | YouTube-VOS 2018 | F-Measure (Seen) | 86.7 | TransVOS
Visual Object Tracking | YouTube-VOS 2018 | F-Measure (Unseen) | 83.4 | TransVOS
Visual Object Tracking | YouTube-VOS 2018 | F-Measure (Unseen) | 83.4 | CFBI
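
The aggregate rows above can be cross-checked against their components. On DAVIS, J&F is the mean of the Jaccard and F scores; on YouTube-VOS, the overall score averages the four seen/unseen Jaccard and F-Measure values. The short sketch below recomputes the BATMAN aggregates from the component values listed in the table. The helper name `mean_score` and the 0.1 tolerance (chosen to absorb one-decimal rounding of both the components and the aggregate) are assumptions for illustration. DAVIS 2017 (val) is left out because its Jaccard component is not listed.

```python
def mean_score(components):
    """Arithmetic mean of a list of metric values."""
    return sum(components) / len(components)


# (component values, reported aggregate) for the BATMAN rows in the table.
reported = {
    "YouTube-VOS 2019": ([84.5, 79.0, 89.3, 87.2], 85.0),
    "DAVIS 2016 (val)": ([90.7, 94.2], 92.5),
    "DAVIS 2017 (test-dev)": ([78.4, 86.1], 82.2),
}

# Absolute gap between the recomputed mean and the reported aggregate.
gaps = {
    name: abs(mean_score(components) - aggregate)
    for name, (components, aggregate) in reported.items()
}

# Every gap should sit within what one-decimal rounding can explain.
consistent = all(gap <= 0.1 for gap in gaps.values())

for name, gap in gaps.items():
    print(f"{name}: gap to reported aggregate = {gap:.2f}")
```

The small gaps on the two DAVIS rows are expected, since the published aggregates are presumably computed from unrounded component scores.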

Related Papers

SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction (2025-07-21)
Channel-wise Motion Features for Efficient Motion Segmentation (2025-07-17)
Deep Learning-Based Fetal Lung Segmentation from Diffusion-weighted MRI Images and Lung Maturity Evaluation for Fetal Growth Restriction (2025-07-17)
DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model (2025-07-17)
From Variability To Accuracy: Conditional Bernoulli Diffusion Models with Consensus-Driven Correction for Thin Structure Segmentation (2025-07-17)
Unleashing Vision Foundation Models for Coronary Artery Segmentation: Parallel ViT-CNN Encoding and Variational Fusion (2025-07-17)
SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation (2025-07-17)
Unified Medical Image Segmentation with State Space Modeling Snake (2025-07-17)