Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Papers

575,626 papers

Grounded Reinforcement Learning for Visual Reasoning

Gabriel Sarch, Snigdha Saha, Naitik Khandelwal, Ayush Jain, Michael J. Tarr et al.

2025-05-29 | Spatial Reasoning, Reinforcement Learning, Visual Reasoning, +1
Paper | Code
ImmunoDiff: A Diffusion Model for Immunotherapy Response Prediction in Lung Cancer

Moinak Bhattacharya, Judy Huang, Amna F. Sher, Gagandeep Singh, Chao Chen et al.

2025-05-29 | Anatomy, Survival Prediction
Paper
OpenUni: A Simple Baseline for Unified Multimodal Understanding and Generation

Size Wu, Zhonghua Wu, Zerui Gong, Qingyi Tao, Sheng Jin et al.

2025-05-29
Paper | Code
D-AR: Diffusion via Autoregressive Models

Ziteng Gao, Mike Zheng Shou

2025-05-29 | Denoising
Paper | Code
VideoREPA: Learning Physics for Video Generation through Relational Alignment with Foundation Models

Xiangdong Zhang, Jiaqi Liao, Shaofeng Zhang, Fanqing Meng, Xiangpeng Wan et al.

2025-05-29 | Self-Supervised Learning, Video Understanding, Video Generation
Paper | Code
Radiant Triangle Soup with Soft Connectivity Forces for 3D Reconstruction and Novel View Synthesis

Nathaniel Burgdorfer, Philippos Mordohai

2025-05-29 | Novel View Synthesis, 3D Reconstruction
Paper
Comparing the Effects of Persistence Barcodes Aggregation and Feature Concatenation on Medical Imaging

Dashti A. Ali, Richard K. G. Do, William R. Jarnagin, Aras T. Asaad, Amber L. Simpson et al.

2025-05-29 | Feature Engineering, Topological Data Analysis, Medical Image Analysis
Paper | Code
A Comprehensive Evaluation of Multi-Modal Large Language Models for Endoscopy Analysis

Shengyuan Liu, Boyun Zheng, WenTing Chen, Zhihao Peng, Zhenfei Yin et al.

2025-05-29 | Diagnostic, Visual Question Answering (VQA)
Paper
Bridging Classical and Modern Computer Vision: PerceptiveNet for Tree Crown Semantic Segmentation

Georgios Voulgaris

2025-05-29 | Segmentation, Semantic Segmentation
Paper
DeepChest: Dynamic Gradient-Free Task Weighting for Effective Multi-Task Learning in Chest X-ray Classification

Youssef Mohamed, Noran Mohamed, Khaled Abouhashad, Feilong Tang, Sara Atito et al.

2025-05-29 | Representation Learning, Multi-Task Learning, Diagnostic
Paper | Code
Jigsaw-R1: A Study of Rule-based Visual Reinforcement Learning with Jigsaw Puzzles

Zifu Wang, Junyi Zhu, Bo Tang, Zhiyu Li, Feiyu Xiong et al.

2025-05-29 | Reinforcement Learning
Paper | Code
PCA for Enhanced Cross-Dataset Generalizability in Breast Ultrasound Tumor Segmentation

Christian Schmidt, Heinrich Martin Overhoff

2025-05-29 | Tumor Segmentation, Style Transfer, Segmentation, +4
Paper
Uni-MuMER: Unified Multi-Task Fine-Tuning of Vision-Language Model for Handwritten Mathematical Expression Recognition

Yu Li, Jin Jiang, Jianhua Zhu, Shuai Peng, Baole Wei et al.

2025-05-29 | Spatial Reasoning, Handwritten Mathematical Expression Recognition, Language Modelling, +1
Paper | Code
Qwen Look Again: Guiding Vision-Language Reasoning Models to Re-attention Visual Information

Xu Chu, Xinrong Chen, Guanyu Wang, Zhijie Tan, Kui Huang et al.

2025-05-29 | Hallucination
Paper | Code
Position Paper: Metadata Enrichment Model: Integrating Neural Networks and Semantic Knowledge Graphs for Cultural Heritage Applications

Jan Ignatowicz, Krzysztof Kutt, Grzegorz J. Nalepa

2025-05-29 | Knowledge Graphs
Paper
Hallo4: High-Fidelity Dynamic Portrait Animation via Direct Preference Optimization and Temporal Motion Modulation

Jiahao Cui, Yan Chen, Mingwang Xu, Hanlin Shang, Yuxuan Chen et al.

2025-05-29 | Video Alignment
Paper | Code
CLIP-AE: CLIP-assisted Cross-view Audio-Visual Enhancement for Unsupervised Temporal Action Localization

Rui Xia, Dan Jiang, Quan Zhang, Ke Zhang, Chun Yuan et al.

2025-05-29 | Action Localization, Information Retrieval, Temporal Action Localization
Paper
OmniEarth-Bench: Towards Holistic Evaluation of Earth's Six Spheres and Cross-Spheres Interactions with Multimodal Observational Earth Data

Fengxiang Wang, Mingshuo Chen, Xuming He, Yifan Zhang, Feng Liu et al.

2025-05-29
Paper
VAU-R1: Advancing Video Anomaly Understanding via Reinforcement Fine-Tuning

Liyun Zhu, Qixiang Chen, Xi Shen, Xiaodong Cun

2025-05-29 | Question Answering, Descriptive, Anomaly Detection, +1
Paper | Code
R2I-Bench: Benchmarking Reasoning-Driven Text-to-Image Generation

Kaijie Chen, Zihao Lin, Zhiyang Xu, Ying Shen, Yuguang Yao et al.

2025-05-29 | Text-to-Image Generation, Benchmarking, Text to Image Generation, +1
Paper