Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Papers

575,626 papers

ToolHaystack: Stress-Testing Tool-Augmented Language Models in Realistic Long-Term Interactions

Beong-woo Kwak, Minju Kim, Dongha Lim, Hyungjoo Chae, Dongjin Kang et al.

2025-05-29
Links: Paper, Code

ARC: Argument Representation and Coverage Analysis for Zero-Shot Long Document Summarization with Instruction Following LLMs

Mohamed Elaraby, Diane Litman

2025-05-29
Tasks: Instruction Following, Abstractive Text Summarization, Document Summarization
Links: Paper

GeNRe: A French Gender-Neutral Rewriting System Using Collective Nouns

Enzo Doyen, Amalia Todirascu

2025-05-29
Links: Paper, Code

AutoSchemaKG: Autonomous Knowledge Graph Construction through Dynamic Schema Induction from Web-Scale Corpora

Jiaxin Bai, Wei Fan, Qi Hu, Qing Zong, Chunyang Li et al.

2025-05-29
Tasks: Knowledge Graphs, graph construction
Links: Paper, Code

Characterizing the Expressivity of Transformer Language Models

Jiaoda Li, Ryan Cotterell

2025-05-29
Links: Paper

Table-R1: Inference-Time Scaling for Table Reasoning

Zheyuan Yang, Lyuhao Chen, Arman Cohan, Yilun Zhao

2025-05-29
Tasks: Fact Verification
Links: Paper, Code

Understanding Refusal in Language Models with Sparse Autoencoders

Wei Jie Yeo, Nirmalendu Prakash, Clement Neo, Roy Ka-Wei Lee, Erik Cambria et al.

2025-05-29
Links: Paper, Code

Translation in the Wild

Yuri Balashov

2025-05-29
Tasks: Machine Translation, Translation
Links: Paper

Probability-Consistent Preference Optimization for Enhanced LLM Reasoning

Yunqiao Yang, Houxing Ren, Zimu Lu, Ke Wang, Weikang Shi et al.

2025-05-29
Tasks: Mathematical Reasoning
Links: Paper, Code

CLaC at SemEval-2025 Task 6: A Multi-Architecture Approach for Corporate Environmental Promise Verification

Nawar Turk, Eeham Khan, Leila Kosseim

2025-05-29
Links: Paper

Revisiting Overthinking in Long Chain-of-Thought from the Perspective of Self-Doubt

Keqin Peng, Liang Ding, Yuanxin Ouyang, Meng Fang, DaCheng Tao et al.

2025-05-29
Tasks: Mathematical Reasoning
Links: Paper

Evaluating the performance and fragility of large language models on the self-assessment for neurological surgeons

Krithik Vishwanath, Anton Alyakin, Mrigayu Ghosh, Jin Vivian Lee, Daniel Alexander Alber et al.

2025-05-29
Links: Paper

UAQFact: Evaluating Factual Knowledge Utilization of LLMs on Unanswerable Questions

Chuanyuan Tan, Wenbiao Shao, Hao Xiong, Tong Zhu, Zhenhua Liu et al.

2025-05-29
Links: Paper, Code

The Warmup Dilemma: How Learning Rate Strategies Impact Speech-to-Text Model Convergence

Marco Gaido, Sara Papi, Luisa Bentivogli, Alessio Brutti, Mauro Cettolo et al.

2025-05-29
Tasks: Speech-to-Text
Links: Paper, Code

From Parameters to Prompts: Understanding and Mitigating the Factuality Gap between Fine-Tuned LLMs

Xuan Gong, Hanbo Huang, Shiyu Liang

2025-05-29
Tasks: Knowledge Graphs, Few-Shot Learning
Links: Paper

Adaptive Jailbreaking Strategies Based on the Semantic Understanding Capabilities of Large Language Models

Mingyu Yu, Wei Wang, Yanjie Wei, Sujuan Qin

2025-05-29
Links: Paper

Threading the Needle: Reweaving Chain-of-Thought Reasoning to Explain Human Label Variation

Beiduo Chen, Yang Janet Liu, Anna Korhonen, Barbara Plank

2025-05-29
Links: Paper, Code

Discriminative Policy Optimization for Token-Level Reward Models

Hongzhan Chen, Tao Yang, Shiping Gao, Ruijun Chen, Xiaojun Quan et al.

2025-05-29
Tasks: Mathematical Reasoning, Math, Text Generation, +2 more
Links: Paper, Code

Neither Stochastic Parroting nor AGI: LLMs Solve Tasks through Context-Directed Extrapolation from Training Data Priors

Harish Tayyar Madabushi, Melissa Torgbi, Claire Bonial

2025-05-29
Links: Paper

Proximalized Preference Optimization for Diverse Feedback Types: A Decomposed Perspective on DPO

Kaiyang Guo, Yinchuan Li, Zhitang Chen

2025-05-29
Links: Paper