Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Papers

575,626 papers

GenFlow: Interactive Modular System for Image Generation

Duc-Hung Nguyen, Huu-Phuc Huynh, Minh-Triet Tran, Trung-Nghia Le

2025-06-26 | Image Generation
Paper
CA-I2P: Channel-Adaptive Registration Network with Global Optimal Selection

Zhixin Cheng, Jiacheng Deng, Xinjun Li, Xiaotian Yin, Bohao Liao et al.

2025-06-26 | Image to Point Cloud Registration, Point Cloud Registration
Paper
CoPa-SG: Dense Scene Graphs with Parametric and Proto-Relations

Julian Lorenz, Mrunmai Phatak, Robin Schön, Katja Ludwig, Nico Hörmann et al.

2025-06-26 | Scene Graph Generation, Scene Understanding, Graph Generation
Paper
ShotBench: Expert-Level Cinematic Understanding in Vision-Language Models

Hongbo Liu, Jingwen He, Yi Jin, Dian Zheng, Yuhao Dong et al.

2025-06-26 | Spatial Reasoning, Video Generation
Paper
PanSt3R: Multi-view Consistent Panoptic Segmentation

Lojze Zust, Yohann Cabon, Juliette Marrie, Leonid Antsfeld, Boris Chidlovskii et al.

2025-06-26 | 3D geometry, Panoptic Segmentation, 2D Panoptic Segmentation, +3
Paper
Holistic Surgical Phase Recognition with Hierarchical Input Dependent State Space Models

Haoyang Wu, Tsun-Hsuan Wang, Mathias Lechner, Ramin Hasani, Jennifer A. Eckhoff et al.

2025-06-26 | Surgical phase recognition
Paper
LLaVA-Pose: Enhancing Human Pose and Action Understanding via Keypoint-Integrated Instruction Tuning

Dewen Zhang, Tahir Hussain, Wangpeng An, Hayaru Shouno

2025-06-26 | Instruction Following, Action Understanding
Paper | Code
DrishtiKon: Multi-Granular Visual Grounding for Text-Rich Document Images

Badri Vishal Kasuba, Parag Chaudhuri, Ganesh Ramakrishnan

2025-06-26 | Question Answering, Visual Grounding, document understanding, +3
Paper | Code
Continual Self-Supervised Learning with Masked Autoencoders in Remote Sensing

Lars Möllenbrok, Behnood Rasti, Begüm Demir

2025-06-26 | Continual Learning, Self-Supervised Learning, Knowledge Distillation
Paper
HieraSurg: Hierarchy-Aware Diffusion Model for Surgical Video Generation

Diego Biagini, Nassir Navab, Azade Farshad

2025-06-26 | Panoptic Segmentation, Segmentation, Video Generation
Paper
HumanOmniV2: From Understanding to Omni-Modal Reasoning with Context

Qize Yang, Shimin Yao, Weixuan Chen, Shenghao Fu, Detao Bai et al.

2025-06-26 | Reinforcement Learning, Multimodal Reasoning, Large Language Model
Paper | Code
WordCon: Word-level Typography Control in Scene Text Rendering

Wenda Shi, Yiren Song, Zihan Rao, Dengming Zhang, Jiaming Liu et al.

2025-06-26 | Disentanglement, parameter-efficient fine-tuning
Paper
Video Virtual Try-on with Conditional Diffusion Transformer Inpainter

Cheng Zou, Senlin Cheng, Bolei Xu, Dandan Zheng, Xiaobo Li et al.

2025-06-26 | Virtual Try-on, Video Inpainting, Video Generation
Paper
DuET: Dual Incremental Object Detection via Exemplar-Free Task Arithmetic

Munish Monga, Vishal Chudasama, Pankaj Wasnik, Biplab Banerjee

2025-06-26 | Autonomous Driving, Incremental Learning, object-detection, +1
Paper
Temporal Rate Reduction Clustering for Human Motion Segmentation

Xianghan Meng, Zhengyu Tong, Zhiyuan Huang, Chun-Guang Li

2025-06-26 | Motion Segmentation, Clustering
Paper
DiMPLe -- Disentangled Multi-Modal Prompt Learning: Enhancing Out-Of-Distribution Alignment with Invariant and Spurious Feature Separation

Umaima Rahman, Mohammad Yaqub, Dwarikanath Mahapatra

2025-06-26 | Contrastive Learning
Paper
ReME: A Data-Centric Framework for Training-Free Open-Vocabulary Segmentation

Xiwei Xuan, Ziquan Deng, Kwan-Liu Ma

2025-06-26 | Open Vocabulary Semantic Segmentation, Scene Understanding, Semantic Segmentation, +2
Paper | Code
BitMark for Infinity: Watermarking Bitwise Autoregressive Image Generative Models

Louis Kerner, Michel Meintz, Bihe Zhao, Franziska Boenisch, Adam Dziedzic et al.

2025-06-26 | Image Generation
Paper
GroundFlow: A Plug-in Module for Temporal Reasoning on 3D Point Cloud Sequential Grounding

Zijun Lin, Shuting He, Cheston Tan, Bihan Wen

2025-06-26 | Visual Grounding, Large Language Model, 3D visual grounding
Paper
Task-Aware KV Compression For Cost-Effective Long Video Understanding

Minghao Qin, Yan Shu, Peitian Zhang, Kun Lun, Huaying Yuan et al.

2025-06-26 | Video Understanding
Paper | Code