Description
A (2+1)D convolution is a type of convolution used in convolutional neural networks for action recognition, where the input is a spatiotemporal volume (a stack of video frames). Applying a full 3D convolution over the entire volume can be computationally expensive and prone to overfitting. A (2+1)D convolution instead factorizes the computation into two successive convolutions: a spatial 2D convolution followed by a temporal 1D convolution.
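The factorization described above can be illustrated with a minimal pure-Python sketch. It makes several simplifying assumptions that are not part of the description: a single channel, no padding, stride 1, and no nonlinearity between the two steps. The helper names (`conv2d_valid`, `conv1d_time_valid`, `conv2plus1d`) are hypothetical and chosen only for this illustration.

```python
def conv2d_valid(frame, kernel):
    """Valid 2D cross-correlation of one H x W frame with a kh x kw kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(frame) - kh + 1
    out_w = len(frame[0]) - kw + 1
    return [
        [
            sum(frame[y + i][x + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for x in range(out_w)
        ]
        for y in range(out_h)
    ]


def conv1d_time_valid(volume, kernel_t):
    """Valid 1D cross-correlation along the time axis of a T x H x W volume."""
    kt = len(kernel_t)
    out_t = len(volume) - kt + 1
    h, w = len(volume[0]), len(volume[0][0])
    return [
        [
            [sum(volume[t + k][y][x] * kernel_t[k] for k in range(kt))
             for x in range(w)]
            for y in range(h)
        ]
        for t in range(out_t)
    ]


def conv2plus1d(volume, kernel_2d, kernel_t):
    """(2+1)D convolution sketch: spatial 2D step, then temporal 1D step."""
    # Step 1: apply the same 2D kernel to every frame independently.
    spatial = [conv2d_valid(frame, kernel_2d) for frame in volume]
    # Assumption: no activation is applied between the two steps here.
    # Step 2: mix information across frames at each spatial position.
    return conv1d_time_valid(spatial, kernel_t)


# Toy clip: 4 frames of 4 x 4 pixels, single channel.
clip = [[[t * 16 + y * 4 + x for x in range(4)] for y in range(4)]
        for t in range(4)]
box_2d = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]  # 3 x 3 spatial kernel
box_t = [1, 1, 1]                           # length-3 temporal kernel

out = conv2plus1d(clip, box_2d, box_t)
print(len(out), len(out[0]), len(out[0][0]))  # output T, H, W
```

In this toy setting the two kernels hold 9 + 3 = 12 weights, compared with 27 for a dense 3 x 3 x 3 kernel. Without a nonlinearity in between, the result matches a 3D convolution whose kernel is the outer product of the 2D and 1D kernels. Multi-channel layers in practice would typically be built from a deep learning framework's 2D/3D convolution primitives rather than explicit loops.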
Papers Using This Method
Multi-Microphone and Multi-Modal Emotion Recognition in Reverberant Environment (2024-09-14)
Use of a Multiscale Vision Transformer to predict Nursing Activities Score from Low Resolution Thermal Videos in an Intensive Care Unit (2024-05-30)
Temporal Contrastive Learning with Curriculum (2022-09-02)
Motion-Focused Contrastive Learning of Video Representations (2022-01-11)
ByteTrack: Multi-Object Tracking by Associating Every Detection Box (2021-10-13)
Self-Supervised Video Representation Learning with Meta-Contrastive Network (2021-08-19)
Spatiotemporal Contrastive Learning of Facial Expressions in Videos (2021-08-06)
Self-supervised Video Representation Learning with Cross-Stream Prototypical Contrasting (2021-06-18)
You Only Learn One Representation: Unified Network for Multiple Tasks (2021-05-10)
The 3TConv: An Intrinsic Approach to Explainable 3D CNNs (2021-01-01)
Self-supervised Video Representation Learning by Uncovering Spatio-temporal Statistics (2020-08-31)
Late Temporal Modeling in 3D CNN Architectures with BERT for Action Recognition (2020-08-03)
RT3D: Achieving Real-Time Execution of 3D Convolutional Neural Networks on Mobile Devices (2020-07-20)
Neural Graph Collaborative Filtering (2019-05-20)
Spatiotemporal CNNs for Pornography Detection in Videos (2018-10-24)
Multi-Fiber Networks for Video Recognition (2018-07-30)
A Closer Look at Spatiotemporal Convolutions for Action Recognition (2017-11-30)