Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

AssembleNet++: Assembling Modality Representations via Attention Connections

Michael S. Ryoo, AJ Piergiovanni, Juhana Kangaspunta, Anelia Angelova

2020-08-18 | Action Classification | Activity Recognition

Abstract

We create a family of powerful video models which are able to: (i) learn interactions between semantic object information and raw appearance and motion features, and (ii) deploy attention in order to better learn the importance of features at each convolutional block of the network. A new network component named peer-attention is introduced, which dynamically learns the attention weights using another block or input modality. Even without pre-training, our models outperform previous work on standard public activity recognition datasets with continuous videos, establishing a new state of the art. We also confirm that our findings, namely having neural connections from the object modality and the use of peer-attention, are generally applicable to different existing architectures, improving their performance. We name our model explicitly as AssembleNet++. The code will be available at: https://sites.google.com/corp/view/assemblenet/
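The abstract describes peer-attention only at a high level: attention weights for one block are computed from another block or input modality. The following is a minimal sketch of that idea, not the authors' implementation. It assumes channel-wise gating: the peer features are global-average-pooled, projected to the target's channel count, and passed through a sigmoid. The function name `peer_attention`, the `weight` and `bias` parameters, and all tensor shapes are hypothetical choices made for illustration.

```python
import numpy as np

def peer_attention(target, peer, weight, bias):
    """Scale the channels of `target` with attention computed from `peer`.

    target: (T, H, W, C) features of the block being modulated
    peer:   (T', H', W', C_peer) features from another block or modality
    weight: (C_peer, C) projection, assumed to be learned (hypothetical)
    bias:   (C,) bias, assumed to be learned (hypothetical)
    """
    # Global average pool over time and space: one value per peer channel.
    pooled = peer.mean(axis=(0, 1, 2))          # shape (C_peer,)
    # Project to the target's channel count.
    logits = pooled @ weight + bias             # shape (C,)
    # Sigmoid squashes each channel weight into (0, 1).
    attn = 1.0 / (1.0 + np.exp(-logits))
    # Broadcasting applies one weight per channel of the target.
    return target * attn, attn

# Illustrative shapes only; real blocks would come from a video network.
rng = np.random.default_rng(0)
target = rng.normal(size=(4, 8, 8, 16))   # e.g. an appearance block
peer = rng.normal(size=(4, 8, 8, 32))     # e.g. an object-modality block
weight = rng.normal(size=(32, 16)) * 0.1
bias = np.zeros(16)

out, attn = peer_attention(target, peer, weight, bias)
```

Because the peer features are pooled before projection, the peer block may have a different temporal or spatial resolution than the target; only the channel counts need to match the projection.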

Results

Task | Dataset | Metric | Value | Model
Video | Charades | MAP | 59.8 | AssembleNet++ 50
Video | Charades | MAP | 54.98 | AssembleNet++ 50 without object
Video | Toyota Smarthome dataset | CS | 63.6 | AssembleNet++

Related Papers

ZKP-FedEval: Verifiable and Privacy-Preserving Federated Evaluation using Zero-Knowledge Proofs (2025-07-15)
SEZ-HARN: Self-Explainable Zero-shot Human Activity Recognition Network (2025-06-25)
Efficient Retail Video Annotation: A Robust Key Frame Generation Approach for Product and Customer Interaction Analysis (2025-06-17)
DeSPITE: Exploring Contrastive Deep Skeleton-Pointcloud-IMU-Text Embeddings for Advanced Point Cloud Human Activity Understanding (2025-06-16)
MORIC: CSI Delay-Doppler Decomposition for Robust Wi-Fi-based Human Activity Recognition (2025-06-15)
AgentSense: Virtual Sensor Data Generation Using LLM Agents in Simulated Home Environments (2025-06-13)
ScalableHD: Scalable and High-Throughput Hyperdimensional Computing Inference on Multi-Core CPUs (2025-06-10)
SurgBench: A Unified Large-Scale Benchmark for Surgical Video Analysis (2025-06-09)