Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Towards Amortized Ranking-Critical Training for Collaborative Filtering

Sam Lobel, Chunyuan Li, Jianfeng Gao, Lawrence Carin

2019-06-10 · Learning-To-Rank · Reinforcement Learning · Collaborative Filtering · Recommendation Systems
Paper · PDF · Code (official)

Abstract

Collaborative filtering is widely used in modern recommender systems. Recent research shows that variational autoencoders (VAEs) yield state-of-the-art performance by integrating flexible representations from deep neural networks into latent variable models, mitigating limitations of traditional linear factor models. VAEs are typically trained by maximizing the likelihood (MLE) of users interacting with ground-truth items. While simple and often effective, MLE-based training does not directly maximize the recommendation-quality metrics one typically cares about, such as top-N ranking. In this paper we investigate new methods for training collaborative filtering models based on actor-critic reinforcement learning, to directly optimize the non-differentiable quality metrics of interest. Specifically, we train a critic network to approximate ranking-based metrics, and then update the actor network (represented here by a VAE) to directly optimize against the learned metrics. In contrast to traditional learning-to-rank methods that require re-running the optimization procedure for each new list, our critic-based method amortizes the scoring process with a neural network, and can directly provide (approximate) ranking scores for new lists. Empirically, we show that the proposed methods outperform several state-of-the-art baselines, including recently proposed deep learning approaches, on three large-scale real-world datasets. The code to reproduce the experimental results and figures is on GitHub: https://github.com/samlobel/RaCT_CF
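The two-phase idea in the abstract can be illustrated in miniature. The sketch below is not the paper's implementation (which uses a VAE actor and a neural critic); it is a toy, with all names and hyperparameters my own assumptions: a linear critic regresses nDCG@3 from sampled score vectors, and a raw score vector playing the actor's role then ascends the critic's gradient, so the non-differentiable ranking metric is optimized only through its learned, differentiable surrogate.

```python
import numpy as np

rng = np.random.default_rng(0)

def ndcg_at_k(scores, relevant_mask, k=3):
    # The true, non-differentiable ranking metric the critic learns to mimic.
    order = np.argsort(-scores)
    discounts = np.log2(np.arange(2, k + 2))
    dcg = np.sum(relevant_mask[order][:k] / discounts)
    idcg = np.sum(np.sort(relevant_mask)[::-1][:k] / discounts)
    return dcg / idcg if idcg > 0 else 0.0

n_items = 10
relevant = (rng.random(n_items) < 0.3).astype(float)
if relevant.sum() == 0:
    relevant[0] = 1.0  # ensure at least one relevant item

# Critic: a linear model mapping a score vector to a predicted nDCG.
w, b, lr = np.zeros(n_items), 0.0, 0.01

# Phase 1: train the critic by regression on (score list, true metric) pairs.
losses = []
for _ in range(2000):
    s = rng.standard_normal(n_items)
    err = (s @ w + b) - ndcg_at_k(s, relevant)
    losses.append(err ** 2)
    w -= lr * err * s
    b -= lr * err

# Phase 2: update the "actor" (here just a score vector) by ascending the
# critic; for a linear critic, d(critic)/d(scores) is simply w.
actor_scores = rng.standard_normal(n_items)
before = ndcg_at_k(actor_scores, relevant)
for _ in range(200):
    actor_scores += 1.0 * w
after = ndcg_at_k(actor_scores, relevant)
print(before, after)
```

Because the critic amortizes metric evaluation into a single forward pass, scoring a new list never requires re-running a ranking procedure, which is the contrast with classical learning-to-rank the abstract draws.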

Results

| Task                   | Dataset             | Metric    | Value | Model |
|------------------------|---------------------|-----------|-------|-------|
| Recommendation Systems | MovieLens 20M       | Recall@20 | 0.403 | RaCT  |
| Recommendation Systems | MovieLens 20M       | Recall@50 | 0.543 | RaCT  |
| Recommendation Systems | MovieLens 20M       | nDCG@100  | 0.434 | RaCT  |
| Recommendation Systems | Million Song Dataset | Recall@20 | 0.268 | RaCT  |
| Recommendation Systems | Million Song Dataset | Recall@50 | 0.364 | RaCT  |
| Recommendation Systems | Million Song Dataset | nDCG@100  | 0.319 | RaCT  |
| Recommendation Systems | Netflix             | Recall@20 | 0.357 | RaCT  |
| Recommendation Systems | Netflix             | Recall@50 | 0.450 | RaCT  |
| Recommendation Systems | Netflix             | nDCG@100  | 0.392 | RaCT  |
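The Recall@K and nDCG@K columns above follow standard definitions; a minimal sketch, assuming the common VAE-CF evaluation convention of normalizing recall by min(K, number of held-out relevant items):

```python
import math

def recall_at_k(ranked_items, relevant, k):
    # Fraction of held-out relevant items recovered in the top-k,
    # normalized by min(k, |relevant|).
    hits = sum(1 for item in ranked_items[:k] if item in relevant)
    return hits / min(k, len(relevant))

def ndcg_at_k(ranked_items, relevant, k):
    # Binary-relevance normalized discounted cumulative gain.
    dcg = sum(1.0 / math.log2(i + 2)
              for i, item in enumerate(ranked_items[:k]) if item in relevant)
    idcg = sum(1.0 / math.log2(i + 2) for i in range(min(k, len(relevant))))
    return dcg / idcg if idcg > 0 else 0.0

ranked = [3, 1, 7, 5, 9]   # model's ranking, best first
relevant = {1, 5}          # held-out ground-truth items
print(round(recall_at_k(ranked, relevant, 3), 3))  # 1 of 2 relevant in top-3 -> 0.5
print(round(ndcg_at_k(ranked, relevant, 5), 3))    # -> 0.651
```

In the table, these per-user scores are averaged over all test users.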

Related Papers

- CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning (2025-07-18)
- IP2: Entity-Guided Interest Probing for Personalized News Recommendation (2025-07-18)
- A Reproducibility Study of Product-side Fairness in Bundle Recommendation (2025-07-18)
- VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning (2025-07-17)
- Spectral Bellman Method: Unifying Representation and Exploration in RL (2025-07-17)
- Aligning Humans and Robots via Reinforcement Learning from Implicit Human Feedback (2025-07-17)
- VAR-MATH: Probing True Mathematical Reasoning in Large Language Models via Symbolic Multi-Instance Benchmarks (2025-07-17)
- QuestA: Expanding Reasoning Capacity in LLMs via Question Augmentation (2025-07-17)