Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

GAIN: Missing Data Imputation using Generative Adversarial Nets

Jinsung Yoon, James Jordon, Mihaela van der Schaar

Published: 2018-06-07
Venue: ICML 2018
Tasks: Imputation, Multivariate Time Series Imputation

Abstract

We propose a novel method for imputing missing data by adapting the well-known Generative Adversarial Nets (GAN) framework. Accordingly, we call our method Generative Adversarial Imputation Nets (GAIN). The generator (G) observes some components of a real data vector, imputes the missing components conditioned on what is actually observed, and outputs a completed vector. The discriminator (D) then takes a completed vector and attempts to determine which components were actually observed and which were imputed. To ensure that D forces G to learn the desired distribution, we provide D with some additional information in the form of a hint vector. The hint reveals to D partial information about the missingness of the original sample, which is used by D to focus its attention on the imputation quality of particular components. This hint ensures that G does in fact learn to generate according to the true data distribution. We tested our method on various datasets and found that GAIN significantly outperforms state-of-the-art imputation methods.
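The data flow described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names, the noise range, the `hint_rate` parameter, the 0.5 placeholder for unrevealed hint entries, and the column-mean stand-in for a trained generator are all assumptions made here for the example. No training loop is shown.

```python
import numpy as np

def column_mean_generator(x_tilde, mask):
    # Hypothetical stand-in for a trained generator network:
    # proposes the observed column mean for every component.
    observed_count = np.maximum(mask.sum(axis=0), 1.0)
    col_mean = (x_tilde * mask).sum(axis=0) / observed_count
    return np.broadcast_to(col_mean, x_tilde.shape)

def gain_forward_sketch(x, mask, generator, hint_rate=0.9, rng=None):
    """One forward pass through the pipeline the abstract describes.

    x    : data matrix, NaN where a value is missing
    mask : 1.0 where a component is observed, 0.0 where it is missing
    """
    rng = np.random.default_rng() if rng is None else rng

    # Fill the missing slots with small random noise (assumed range).
    noise = rng.uniform(0.0, 0.01, size=x.shape)
    x_tilde = mask * np.nan_to_num(x) + (1 - mask) * noise

    # G observes the available components and proposes values everywhere.
    g_out = generator(x_tilde, mask)

    # Completed vector: observed values are kept, missing ones come from G.
    x_hat = mask * x_tilde + (1 - mask) * g_out

    # Hint (assumed construction): reveal the true mask entry for a random
    # subset of components and put an uninformative 0.5 everywhere else.
    reveal = (rng.uniform(size=x.shape) < hint_rate).astype(float)
    hint = reveal * mask + 0.5 * (1 - reveal)

    # D would receive (x_hat, hint) and predict, per component, whether the
    # value was observed; its training target is `mask`.
    return x_hat, hint

x = np.array([[1.0, np.nan], [3.0, 4.0], [np.nan, 8.0]])
mask = (~np.isnan(x)).astype(float)
x_hat, hint = gain_forward_sketch(x, mask, column_mean_generator)
```

In a real setup the generator and discriminator would be neural networks trained adversarially; the stand-in here only makes the tensor shapes and masking arithmetic concrete.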

Results

Task                 Dataset                  Metric             Value  Model
Imputation           KDD CUP Challenge 2018   MSE (10% missing)  0.378  GAIN
Feature Engineering  KDD CUP Challenge 2018   MSE (10% missing)  0.378  GAIN
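The page does not say how the "MSE (10% missing)" score was computed. One common convention, assumed here purely for illustration, is to hide a fraction of known values at random, impute them, and average the squared error over the hidden entries only. The helper names and the completely-at-random masking below are hypothetical and not taken from the benchmark.

```python
import numpy as np

def random_mask(shape, missing_rate=0.1, rng=None):
    # 1.0 marks an entry that stays observed, 0.0 an entry that is hidden.
    rng = np.random.default_rng() if rng is None else rng
    return (rng.uniform(size=shape) >= missing_rate).astype(float)

def masked_mse(x_true, x_imputed, mask):
    # Score only the hidden entries; observed entries are copied through
    # by the imputer and would otherwise dilute the error.
    hidden = mask == 0
    if not hidden.any():
        raise ValueError("mask hides no entries, nothing to score")
    diff = x_true[hidden] - x_imputed[hidden]
    return float(np.mean(diff ** 2))
```

Whether features are normalized before scoring changes the scale of the result, so values computed this way are comparable only under the same preprocessing.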

Related Papers

Missing value imputation with adversarial random forests -- MissARF (2025-07-21)
MoTM: Towards a Foundation Model for Time Series Imputation based on Continuous Modeling (2025-07-17)
BMFM-DNA: A SNP-aware DNA foundation model to capture variant effects (2025-06-26)
Leveraging AI Graders for Missing Score Imputation to Achieve Accurate Ability Estimation in Constructed-Response Tests (2025-06-25)
DIM-SUM: Dynamic IMputation for Smart Utility Management (2025-06-24)
Trustworthy Prediction with Gaussian Process Knowledge Scores (2025-06-23)
LSCD: Lomb-Scargle Conditioned Diffusion for Time series Imputation (2025-06-20)
Covariance Decomposition for Distance Based Species Tree Estimation (2025-06-19)