Papers With Code 2

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

JoLT: Joint Probabilistic Predictions on Tabular Data Using LLMs

Aliaksandra Shysheya, John Bronskill, James Requeima, Shoaib Ahmed Siddiqui, Javier Gonzalez, David Duvenaud, Richard E. Turner

2025-02-17 · Tasks: Imputation, Tabular Classification
Links: Paper · PDF · Code (official) · Code

Abstract

We introduce a simple method for probabilistic predictions on tabular data based on Large Language Models (LLMs) called JoLT (Joint LLM Process for Tabular data). JoLT uses the in-context learning capabilities of LLMs to define joint distributions over tabular data conditioned on user-specified side information about the problem, exploiting the vast repository of latent problem-relevant knowledge encoded in LLMs. JoLT defines joint distributions for multiple target variables with potentially heterogeneous data types without any data conversion, data preprocessing, special handling of missing data, or model training, making it accessible and efficient for practitioners. Our experiments show that JoLT outperforms competitive methods on low-shot single-target and multi-target tabular classification and regression tasks. Furthermore, we show that JoLT can automatically handle missing data and perform data imputation by leveraging textual side information. We argue that due to its simplicity and generality, JoLT is an effective approach for a wide variety of real prediction problems.
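The abstract does not spell out implementation details, so the following is only a minimal Python sketch of the general idea under our own assumptions, not the authors' code. We assume each row is serialized as "column: value" text, that missing values are simply left out of the serialization (one plausible reading of "no special handling of missing data", not a confirmed procedure), and that a joint distribution over several targets is obtained with the chain rule, scoring each target conditioned on the prompt and on the targets before it. `LogProbFn` is a hypothetical interface for an LLM returning log p(continuation | context); `toy_logprob` is a crude frequency heuristic standing in for a real model so the code runs offline. All function names and the toy data are hypothetical.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence

# A table row: column name -> value, with None marking a missing entry.
Row = Dict[str, Optional[object]]
# Hypothetical LLM interface: returns log p(continuation | context).
LogProbFn = Callable[[str, str], float]


def serialize(row: Row, columns: Sequence[str]) -> str:
    """Render a row as 'column: value' text, skipping missing values (assumption)."""
    return ", ".join(f"{c}: {row[c]}" for c in columns if row.get(c) is not None)


def build_prompt(side_info: str, train: List[Row], query: Row,
                 features: Sequence[str], targets: Sequence[str]) -> str:
    """Side information, then in-context examples, then the query's features."""
    lines = [side_info]
    lines += [serialize(r, list(features) + list(targets)) for r in train]
    lines.append(serialize(query, features))
    return "\n".join(lines)


def joint_logprob(logprob: LogProbFn, prompt: str,
                  targets: Sequence[str], values: Sequence[object]) -> float:
    """Chain rule: log p(y_1..y_K | prompt) = sum_k log p(y_k | prompt, y_<k)."""
    if len(targets) != len(values):
        raise ValueError("targets and values must have the same length")
    context, total = prompt, 0.0
    for name, value in zip(targets, values):
        piece = f", {name}: {value}"
        total += logprob(context, piece)  # condition on all earlier targets
        context += piece                  # append this target before the next one
    return total


def predict_class(logprob: LogProbFn, prompt: str, target: str,
                  labels: Sequence[str]) -> str:
    """Pick the label whose serialized continuation scores highest."""
    scores = {lab: joint_logprob(logprob, prompt, [target], [lab]) for lab in labels}
    return max(scores, key=lambda lab: scores[lab])


def toy_logprob(context: str, continuation: str) -> float:
    """Stand-in for a real LLM so the sketch runs offline: an add-one-smoothed
    frequency of the continuation text in the context. Not a normalized distribution."""
    hits = context.count(continuation.lstrip(", "))
    n_lines = context.count("\n") + 1
    return math.log((hits + 1) / (n_lines + 1))


# Made-up toy data, for illustration only.
side_info = "Customer records from a telecom provider; churn is yes/no."
train = [
    {"tenure": 2, "plan": "basic", "churn": "yes"},
    {"tenure": 40, "plan": "premium", "churn": "no"},
    {"tenure": 3, "plan": None, "churn": "yes"},  # missing plan is simply omitted
]
query = {"tenure": 5, "plan": "basic"}
prompt = build_prompt(side_info, train, query, ["tenure", "plan"], ["churn"])
print(prompt)
print(predict_class(toy_logprob, prompt, "churn", ["yes", "no"]))
```

Replacing `toy_logprob` with a function backed by a real LLM's token log-probabilities is what would make this meaningful. The chain-rule loop in `joint_logprob` is the part that lets several targets, of any data type that can be written as text, share a single joint score.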

Related Papers

Missing value imputation with adversarial random forests -- MissARF (2025-07-21)
MoTM: Towards a Foundation Model for Time Series Imputation based on Continuous Modeling (2025-07-17)
BMFM-DNA: A SNP-aware DNA foundation model to capture variant effects (2025-06-26)
Leveraging AI Graders for Missing Score Imputation to Achieve Accurate Ability Estimation in Constructed-Response Tests (2025-06-25)
DIM-SUM: Dynamic IMputation for Smart Utility Management (2025-06-24)
Trustworthy Prediction with Gaussian Process Knowledge Scores (2025-06-23)
LSCD: Lomb-Scargle Conditioned Diffusion for Time series Imputation (2025-06-20)
Covariance Decomposition for Distance Based Species Tree Estimation (2025-06-19)