Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Papers

575,626 papers

Ontology-based knowledge representation for bone disease diagnosis: a foundation for safe and sustainable medical artificial intelligence systems

Loan Dao, Ngoc Quoc Ly

2025-06-05 | Question Answering, Multimodal Deep Learning, Diagnostic, +2
Paper
MMRefine: Unveiling the Obstacles to Robust Refinement in Multimodal Large Language Models

Gio Paik, Geewook Kim, Jinbae Im

2025-06-05
Paper | Code
ViCocktail: Automated Multi-Modal Data Collection for Vietnamese Audio-Visual Speech Recognition

Thai-Binh Nguyen, Thi Van Nguyen, Quoc Truong Do, Chi Mai Luong

2025-06-05 | Speech Recognition, speech-recognition, Audio-Visual Speech Recognition, +1
Paper
Exploring bidirectional bounds for minimax-training of Energy-based models

Cong Geng, Jia Wang, Li Chen, Zhiyong Gao, Jes Frellsen et al.

2025-06-05 | Density Estimation
Paper
Scaling Laws for Robust Comparison of Open Foundation Language-Vision Models and Datasets

Marianna Nezhurina, Tomer Porian, Giovanni Pucceti, Tommie Kerssies, Romain Beaumont et al.

2025-06-05
Paper | Code
StatsMerging: Statistics-Guided Model Merging via Task-Specific Teacher Distillation

Ranjith Merugu, Bryan Bo Cao, Shubham Jain

2025-06-05 | Knowledge Distillation
Paper | Code
Contrastive Flow Matching

George Stoica, Vivek Ramanujan, Xiang Fan, Ali Farhadi, Ranjay Krishna et al.

2025-06-05
Paper | Code
VideoMathQA: Benchmarking Mathematical Reasoning via Multimodal Understanding in Videos

Hanoona Rasheed, Abdelrahman Shaker, Anqi Tang, Muhammad Maaz, Ming-Hsuan Yang et al.

2025-06-05 | Mathematical Reasoning, Benchmarking
Paper
FreeTimeGS: Free Gaussian Primitives at Anytime and Anywhere for Dynamic Scene Reconstruction

Yifan Wang, Peishan Yang, Zhen Xu, Jiaming Sun, Zhanhua Zhang et al.

2025-06-05
Paper
Neural Inverse Rendering from Propagating Light

Anagh Malik, Benjamin Attal, Andrew Xie, Matthew O'Toole, David B. Lindell et al.

2025-06-05 | CVPR 2025 1 | Inverse Rendering, 3D Reconstruction
Paper
SparseMM: Head Sparsity Emerges from Visual Concept Responses in MLLMs

Jiahui Wang, Zuyan Liu, Yongming Rao, Jiwen Lu

2025-06-05
Paper | Code
ContentV: Efficient Training of Video Generation Models with Limited Compute

Wenfeng Lin, Renjie Chen, Boyuan Liu, Shiyue Yan, Ruoyu Feng et al.

2025-06-05 | Image Generation, Video Generation
Paper
Refer to Anything with Vision-Language Prompts

Shengcao Cao, Zijun Wei, Jason Kuen, Kangning Liu, Lingzhi Zhang et al.

2025-06-05 | Benchmarking, Referring Expression, Generalized Referring Expression Segmentation, +4
Paper
Direct Numerical Layout Generation for 3D Indoor Scene Synthesis via Spatial Reasoning

Xingjian Ran, Yixuan Li, Linning Xu, Mulin Yu, Bo Dai et al.

2025-06-05 | Spatial Reasoning, Indoor Scene Synthesis
Paper
Defurnishing with X-Ray Vision: Joint Removal of Furniture from Panoramas and Mesh

Alan Dolhasz, Chen Ma, Dave Gausebeck, Kevin Chen, Gregor Miller et al.

2025-06-05
Paper
VideoMolmo: Spatio-Temporal Grounding Meets Pointing

Ghazi Shazan Ahmad, Ahmed Heakl, Hanan Gani, Abdelrahman Shaker, Zhiqiang Shen et al.

2025-06-05 | Referring Video Object Segmentation, Autonomous Driving, Semantic Segmentation, +5
Paper | Code
Unleashing Hour-Scale Video Training for Long Video-Language Understanding

Jingyang Lin, Jialian Wu, Ximeng Sun, Ze Wang, Jiang Liu et al.

2025-06-05 | Instruction Following, Language Modelling
Paper
MINT-CoT: Enabling Interleaved Visual Tokens in Mathematical Chain-of-Thought Reasoning

Xinyan Chen, Renrui Zhang, Dongzhi Jiang, Aojun Zhou, Shilin Yan et al.

2025-06-05 | Mathematical Reasoning, Math, Visual Reasoning
Paper | Code
Revisiting Depth Representations for Feed-Forward 3D Gaussian Splatting

Duochao Shi, Weijie Wang, Donny Y. Chen, Zeyu Zhang, Jia-Wang Bian et al.

2025-06-05 | Novel View Synthesis, 3DGS
Paper
Does Your 3D Encoder Really Work? When Pretrain-SFT from 2D VLMs Meets 3D VLMs

Haoyuan Li, Yanpeng Zhou, Yufei Gao, Tao Tang, Jianhua Han et al.

2025-06-05 | Question Answering, Visual Grounding, cross-modal alignment, +2
Paper