Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Papers

575,626 papers

Does It Make Sense to Speak of Introspection in Large Language Models?

Iulia M. Comsa, Murray Shanahan

2025-06-05
Paper
Debatable Intelligence: Benchmarking LLM Judges via Debate Speech Evaluation

Noy Sternlicht, Ariel Gera, Roy Bar-Haim, Tom Hope, Noam Slonim et al.

2025-06-05 | Benchmarking
Paper | Code
TALL -- A Trainable Architecture for Enhancing LLM Performance in Low-Resource Languages

Moshe Ofer, Orel Zamler, Amos Azaria

2025-06-05 | Translation
Paper
Automatic Robustness Stress Testing of LLMs as Mathematical Problem Solvers

Yutao Hou, Zeguan Xiao, Fei Yu, Yihan Jiang, Xuetao Wei et al.

2025-06-05 | Math, GSM8K, MMLU
Paper
Controlling Summarization Length Through EOS Token Weighting

Zeno Belligoli, Emmanouil Stergiadis, Eran Fainman, Ilya Gusev

2025-06-05 | Text Generation
Paper
SCOP: Evaluating the Comprehension Process of Large Language Models from a Cognitive View

Yongjie Xiao, Hongru Liang, Peixin Qin, Yao Zhang, Wenqiang Lei et al.

2025-06-05 | Reading Comprehension
Paper
From Struggle (06-2024) to Mastery (02-2025) LLMs Conquer Advanced Algorithm Exams and Pave the Way for Editorial Generation

Adrian Marius Dumitran, Theodor-Pierre Moroianu, Vasile Paul Alexe

2025-06-05
Paper
ConECT Dataset: Overcoming Data Scarcity in Context-Aware E-Commerce MT

Mikołaj Pokrywka, Wojciech Kusa, Mieszko Rutkowski, Mikołaj Koszowski

2025-06-05 | Machine Translation, NMT, Translation, +1
Paper
Simulating LLM-to-LLM Tutoring for Multilingual Math Feedback

Junior Cedric Tonga, KV Aditya Srivatsa, Kaushal Kumar Maurya, Fajri Koto, Ekaterina Kochmar et al.

2025-06-05 | Math
Paper
A Practitioner's Guide to Building ASR Models for Low-Resource Languages: A Case Study on Scottish Gaelic

Ondřej Klejch, William Lamb, Peter Bell

2025-06-05
Paper
ICPC-Eval: Probing the Frontiers of LLM Reasoning with Competitive Programming Contests

Shiyi Xu, Yiwen Hu, Yingqian Min, Zhipeng Chen, Wayne Xin Zhao et al.

2025-06-05 | Code Generation
Paper | Code
Evaluating the Effectiveness of Linguistic Knowledge in Pretrained Language Models: A Case Study of Universal Dependencies

Wenxi Li

2025-06-05 | Paraphrase Identification
Paper
Prompting LLMs: Length Control for Isometric Machine Translation

Dávid Javorský, Ondřej Bojar, François Yvon

2025-06-05 | Machine Translation, de-en, Translation
Paper
Multiple-Choice Question Generation Using Large Language Models: Methodology and Educator Insights

Giorgio Biancini, Alessio Ferrato, Carla Limongelli

2025-06-05 | Question Answering, Question Generation, Multiple-choice
Paper
MockConf: A Student Interpretation Dataset: Analysis, Word- and Span-level Alignment and Baselines

Dávid Javorský, Ondřej Bojar, François Yvon

2025-06-05
Paper | Code
Joint Evaluation of Answer and Reasoning Consistency for Hallucination Detection in Large Reasoning Models

Changyue Wang, Weihang Su, Qingyao Ai, Yiqun Liu

2025-06-05 | Hallucination, Diagnostic
Paper | Code
A Reasoning-Based Approach to Cryptic Crossword Clue Solving

Martin Andrews, Sam Witteveen

2025-06-05
Paper | Code
Evaluating Vision-Language and Large Language Models for Automated Student Assessment in Indonesian Classrooms

Nurul Aisyah, Muhammad Dehan Al Kautsar, Arif Hidayat, Raqib Chowdhury, Fajri Koto et al.

2025-06-05 | Multiple-choice
Paper
Design of intelligent proofreading system for English translation based on CNN and BERT

Feijun Liu, Huifeng Wang, Kun Wang, Yizhen Wang

2025-06-05 | Machine Translation, Benchmarking, Translation
Paper
Fine-Grained Interpretation of Political Opinions in Large Language Models

Jingyu Hu, Mengyue Yang, Mengnan Du, Weiru Liu

2025-06-05
Paper