Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Papers

575,626 papers

TRAP: Targeted Redirecting of Agentic Preferences

Hangoo Kang, Jehyeok Yeon, Gagandeep Singh

2025-05-29 | Decision Making
Paper
Autoformalization in the Era of Large Language Models: A Survey

Ke Weng, Lun Du, Sirui Li, Wangyue Lu, Haozhe Sun et al.

2025-05-29 | Automated Theorem Proving
Paper | Code
EVOREFUSE: Evolutionary Prompt Optimization for Evaluation and Mitigation of LLM Over-Refusal to Pseudo-Malicious Instructions

Xiaorui Wu, Xiaofeng Mao, Fei Li, Xin Zhang, Xiaolu Zhang et al.

2025-05-29 | Safety Alignment
Paper
GAM-Agent: Game-Theoretic and Uncertainty-Aware Collaboration for Complex Visual Reasoning

Jusheng Zhang, Yijia Fan, Wenjun Lin, Ruiqi Chen, Haoyi Jiang et al.

2025-05-29 | Multimodal Reasoning, Visual Reasoning
Paper
MathArena: Evaluating LLMs on Uncontaminated Math Competitions

Mislav Balunović, Jasper Dekoninck, Ivo Petrov, Nikola Jovanović, Martin Vechev et al.

2025-05-29 | Mathematical Reasoning, Math, Memorization
Paper | Code
Conceptual Framework Toward Embodied Collective Adaptive Intelligence

Fan Wang, Shaoshan Liu

2025-05-29 | Navigate
Paper
Darwin Godel Machine: Open-Ended Evolution of Self-Improving Agents

Jenny Zhang, Shengran Hu, Cong Lu, Robert Lange, Jeff Clune et al.

2025-05-29 | Meta-Learning
Paper | Code
Foundation Molecular Grammar: Multi-Modal Foundation Models Induce Interpretable Molecular Graph Languages

Michael Sun, Weize Yuan, Gang Liu, Wojciech Matusik, Jie Chen et al.

2025-05-29
Paper | Code
Socratic-PRMBench: Benchmarking Process Reward Models with Systematic Reasoning Patterns

Xiang Li, Haiyang Yu, Xinghua Zhang, Ziyang Huang, Shizhu He et al.

2025-05-29 | Benchmarking
Paper
Infi-MMR: Curriculum-based Unlocking Multimodal Reasoning via Phased Reinforcement Learning in Multimodal Small Language Models

Zeyu Liu, Yuhang Liu, Guanghao Zhu, Congkai Xie, Zhen Li et al.

2025-05-29 | Math, Multimodal Reasoning, Logical Reasoning
Paper
DeepTheorem: Advancing LLM Reasoning for Theorem Proving Through Natural Language and Reinforcement Learning

Ziyin Zhang, Jiahao Xu, Zhiwei He, Tian Liang, Qiuzhi Liu et al.

2025-05-29 | Mathematical Reasoning, Automated Theorem Proving
Paper | Code
ATLAS: Learning to Optimally Memorize the Context at Test Time

Ali Behrouz, Zeman Li, Praneeth Kacham, Majid Daliri, Yuan Deng et al.

2025-05-29 | Long-Context Understanding, Common Sense Reasoning, Language Modelling
Paper
Bounded Rationality for LLMs: Satisficing Alignment at Inference-Time

Mohamad Chehade, Soumya Suvra Ghosal, Souradip Chakraborty, Avinash Reddy, Dinesh Manocha et al.

2025-05-29
Paper
Label-Guided In-Context Learning for Named Entity Recognition

Fan Bai, Hamid Hassanzadeh, Ardavan Saeedi, Mark Dredze

2025-05-29 | named-entity-recognition, Named Entity Recognition, Semantic Similarity, +3
Paper | Code
Don't Take the Premise for Granted: Evaluating the Premise Critique Ability of Large Language Models

Jinzhe Li, Gengxu Li, Yi Chang, Yuan Wu

2025-05-29
Paper | Code
SenWiCh: Sense-Annotation of Low-Resource Languages for WiC using Hybrid Methods

Roksana Goworek, Harpal Karlcut, Muhammad Shezad, Nijaguna Darshana, Abhishek Mane et al.

2025-05-29 | Multilingual NLP, Cross-Lingual Transfer
Paper
SocialMaze: A Benchmark for Evaluating Social Reasoning in Large Language Models

Zixiang Xu, Yanbo Wang, Yue Huang, Jiayi Ye, Haomin Zhuang et al.

2025-05-29
Paper | Code
Can LLMs Reason Abstractly Over Math Word Problems Without CoT? Disentangling Abstract Formulation From Arithmetic Computation

Ziling Cheng, Meng Cao, Leila Pishdad, Yanshuai Cao, Jackie Chi Kit Cheung et al.

2025-05-29 | Math, GSM8K
Paper
Child-Directed Language Does Not Consistently Boost Syntax Learning in Language Models

Francesca Padovani, Jaap Jumelet, Yevgen Matusevych, Arianna Bisazza

2025-05-29
Paper | Code
Automatic classification of stop realisation with wav2vec2.0

James Tanner, Morgan Sonderegger, Jane Stuart-Smith, Jeff Mielke, Tyler Kendall et al.

2025-05-29 | Classification
Paper | Code