Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

On the Difficulty of Evaluating Baselines: A Study on Recommender Systems

Steffen Rendle, Li Zhang, Yehuda Koren

2019-05-04 | Collaborative Filtering | Recommendation Systems

Abstract

Numerical evaluations with comparisons to baselines play a central role when judging research in recommender systems. In this paper, we show that running baselines properly is difficult. We demonstrate this issue on two extensively studied datasets. First, we show that results for baselines that have been used in numerous publications over the past five years for the MovieLens 10M benchmark are suboptimal. With a careful setup of a vanilla matrix factorization baseline, we are not only able to improve upon the reported results for this baseline but even outperform the reported results of any newly proposed method. Secondly, we recap the tremendous effort that was required by the community to obtain high-quality results for simple methods on the Netflix Prize. Our results indicate that empirical findings in research papers are questionable unless they were obtained on standardized benchmarks where baselines have been tuned extensively by the research community.

Results

Task                    Dataset        Metric  Value   Model
Recommendation Systems  MovieLens 10M  RMSE    0.7485  Bayesian timeSVD++ flipped
Recommendation Systems  MovieLens 10M  RMSE    0.7523  Bayesian timeSVD++
Recommendation Systems  MovieLens 10M  RMSE    0.7563  Bayesian SVD++
Recommendation Systems  MovieLens 10M  RMSE    0.772   SGD MF
Recommendation Systems  MovieLens 10M  RMSE    0.823   U-RBM
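
For readers unfamiliar with the metric and the simplest model family in the table, the following is a minimal sketch of RMSE and of a matrix factorization model trained with stochastic gradient descent. It is not the authors' setup: the bias terms, the hyperparameters (`dim`, `lr`, `reg`, `epochs`), the function names, and the toy data are all assumptions chosen for illustration, and the sketch makes no attempt to reproduce any value reported above.

```python
import math
import random


def rmse(predictions, targets):
    """Root mean squared error between two equal-length sequences."""
    if len(predictions) != len(targets) or len(targets) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_error = sum((p - t) ** 2 for p, t in zip(predictions, targets))
    return math.sqrt(squared_error / len(targets))


def train_sgd_mf(ratings, n_users, n_items, dim=8, lr=0.02, reg=0.02,
                 epochs=200, seed=0):
    """Fit r(u, i) ~ mu + b_u + b_i + <p_u, q_i> with plain SGD.

    `ratings` is a list of (user, item, rating) triples. All hyperparameter
    defaults are illustrative assumptions, not tuned values.
    Returns a function predict(user, item).
    """
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean rating
    user_bias = [0.0] * n_users
    item_bias = [0.0] * n_items
    # Small random initialization of the latent factors.
    P = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(n_users)]
    Q = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(n_items)]

    def predict(u, i):
        dot = sum(a * b for a, b in zip(P[u], Q[i]))
        return mu + user_bias[u] + item_bias[i] + dot

    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)  # visit the ratings in a new random order each epoch
        for u, i, r in data:
            err = r - predict(u, i)
            # Gradient step on the L2-regularized squared error.
            user_bias[u] += lr * (err - reg * user_bias[u])
            item_bias[i] += lr * (err - reg * item_bias[i])
            for f in range(dim):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return predict


def toy_ratings():
    """Hypothetical 4x4 data: two user groups with opposite tastes."""
    ratings = []
    for u in range(4):
        for i in range(4):
            same_group = (u < 2) == (i < 2)
            ratings.append((u, i, 5.0 if same_group else 1.0))
    return ratings
```

Calling `predict = train_sgd_mf(toy_ratings(), n_users=4, n_items=4)` and comparing `rmse` of the model's predictions against `rmse` of a constant global-mean predictor shows the factor model fitting the toy training data more closely. Even in a sketch this small, the result depends on the learning rate, regularization, initialization scale, and number of epochs, which is the kind of sensitivity the abstract points to when it says that running baselines properly is difficult.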

Related Papers

IP2: Entity-Guided Interest Probing for Personalized News Recommendation (2025-07-18)
A Reproducibility Study of Product-side Fairness in Bundle Recommendation (2025-07-18)
SGCL: Unifying Self-Supervised and Supervised Learning for Graph Recommendation (2025-07-17)
Similarity-Guided Diffusion for Contrastive Sequential Recommendation (2025-07-16)
Looking for Fairness in Recommender Systems (2025-07-16)
Journalism-Guided Agentic In-Context Learning for News Stance Detection (2025-07-15)
LLM-Stackelberg Games: Conjectural Reasoning Equilibria and Their Applications to Spearphishing (2025-07-12)
NLGCL: Naturally Existing Neighbor Layers Graph Contrastive Learning for Recommendation (2025-07-10)