Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


CrossMoCo: Multi-modal Momentum Contrastive Learning for Point Cloud

Sneha Paul, Zachary Patterson, Nizar Bouguila

2023-06-08 · 20th Conference on Robots and Vision (CRV) 2023

Tasks: 3D Point Cloud Linear Classification · Few-Shot Learning · Self-Supervised Learning · Contrastive Learning · Few-Shot 3D Point Cloud Classification · 3D Object Classification · 3D Point Cloud Classification

Paper · PDF · Code

Abstract

A point cloud is 3D geometric data that lacks a specific structure and is permutation-invariant. Applications of point clouds have recently gained significant attention in the field of vision tasks. However, most existing works on point clouds rely on supervised learning over large labelled datasets, which are costly and laborious to collect. To this end, unsupervised learning, for example, self-supervised learning, has shown promising performance in various 2D computer vision tasks and holds potential for 3D computer vision applications. In this study, we introduce a novel self-supervised method called CrossMoCo, which learns representations of unlabelled point cloud data in a multi-modal setup that also utilizes 2D rendered images of the point clouds. CrossMoCo outperforms existing methods for multi-modal self-supervised learning on point clouds by introducing two new concepts: momentum contrastive learning with more negative samples, and multiple-view intra-modal contrastive learning. The first component learns from an online encoder and a momentum encoder with a large number of negative samples, which provides consistent learning signals. The second component enforces consistency between different views of samples from the same modality, thereby improving the multi-modal representation. We conduct extensive studies on two popular benchmark datasets (ModelNet40 and ScanObjectNN) for linear classification and few-shot learning tasks. Our results demonstrate that CrossMoCo achieves superior performance over existing methods for both tasks on both datasets, with up to 4.36% improvement on linear classification and up to 9.2% on few-shot tasks. Our code is available at https://github.com/snehaputul/CrossMoCo.
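The abstract's first component — an online encoder paired with a momentum encoder and a large pool of negative samples — follows the general MoCo recipe. Below is a minimal NumPy sketch of those two pieces: an exponential-moving-average (EMA) update of the momentum encoder's parameters, and an InfoNCE-style contrastive loss over one positive key and a queue of negatives. Function names and shapes are illustrative assumptions, not taken from the paper's released code.

```python
import numpy as np

def momentum_update(online_params, momentum_params, m=0.999):
    # MoCo-style EMA: the momentum encoder slowly tracks the online encoder.
    return [m * p_m + (1.0 - m) * p_o
            for p_o, p_m in zip(online_params, momentum_params)]

def info_nce_loss(q, k_pos, queue, temperature=0.07):
    # q:     (d,)   query embedding from the online encoder
    # k_pos: (d,)   positive key from the momentum encoder
    # queue: (K, d) queued negative keys from past batches
    q = q / np.linalg.norm(q)
    k_pos = k_pos / np.linalg.norm(k_pos)
    queue = queue / np.linalg.norm(queue, axis=1, keepdims=True)
    l_pos = q @ k_pos            # similarity to the positive key
    l_neg = queue @ q            # similarities to K negative keys
    logits = np.concatenate([[l_pos], l_neg]) / temperature
    logits -= logits.max()       # numerical stability
    # Cross-entropy with the positive key as the "correct class".
    return -np.log(np.exp(logits[0]) / np.exp(logits).sum())
```

In the full method this loss would be applied both across modalities (point cloud vs. rendered image) and, per the second component, between different views within the same modality; the sketch shows only the shared contrastive core.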

Related Papers

- GLAD: Generalizable Tuning for Vision-Language Models (2025-07-17)
- A Semi-Supervised Learning Method for the Identification of Bad Exposures in Large Imaging Surveys (2025-07-17)
- SemCSE: Semantic Contrastive Sentence Embeddings Using LLM-Generated Summaries For Scientific Abstracts (2025-07-17)
- HapticCap: A Multimodal Dataset and Task for Understanding User Experience of Vibration Haptic Signals (2025-07-17)
- Overview of the TalentCLEF 2025: Skill and Job Title Intelligence for Human Capital Management (2025-07-17)
- SGCL: Unifying Self-Supervised and Supervised Learning for Graph Recommendation (2025-07-17)
- Similarity-Guided Diffusion for Contrastive Sequential Recommendation (2025-07-16)
- LLM-Driven Dual-Level Multi-Interest Modeling for Recommendation (2025-07-15)