Papers With Code 2



Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Estimating Missing Data in Temporal Data Streams Using Multi-directional Recurrent Neural Networks

Jinsung Yoon, William R. Zame, Mihaela van der Schaar

2017-11-23 | Tasks: Multivariate Time Series Imputation, Matrix Completion

Abstract

Missing data is a ubiquitous problem. It is especially challenging in medical settings because many streams of measurements are collected at different, and often irregular, times. Accurate estimation of those missing measurements is critical for many reasons, including diagnosis, prognosis and treatment. Existing methods address this estimation problem either by interpolating within data streams or by imputing across data streams (both of which ignore important information), or by ignoring the temporal aspect of the data and imposing strong assumptions about the nature of the data-generating process and/or the pattern of missing data (both of which are especially problematic for medical data). We propose a new approach, based on a novel deep learning architecture that we call a Multi-directional Recurrent Neural Network (M-RNN), which both interpolates within data streams and imputes across data streams. We demonstrate the power of our approach by applying it to five real-world medical datasets. We show that it provides dramatically improved estimation of missing measurements compared with 11 state-of-the-art benchmarks (including spline and cubic interpolation, MICE, MissForest, matrix completion and several RNN methods); typical improvements in root mean square error (RMSE) are between 35% and 50%. Additional experiments based on the same five datasets demonstrate that the improvements provided by our method are extremely robust.
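The abstract's central idea is a two-block design: an interpolation block that works along time within each stream, and an imputation block that works across streams at each time step. The NumPy sketch below illustrates that structure under stated assumptions: simple tanh RNN cells, a (value, mask) input per time step, evenly spaced time steps (the irregular timing the abstract mentions is ignored), and random untrained weights. All function names (`tanh_rnn`, `interpolate_stream`, `init_params`, `mrnn_estimate`, `mrnn_impute`) are hypothetical; this is not the authors' implementation, and the paper's exact equations, input features and training loss may differ.

```python
import numpy as np


def tanh_rnn(inputs, W_in, W_h, b):
    """Simple tanh RNN over inputs of shape (T, F); returns hidden states of shape (T, H)."""
    h = np.zeros(W_h.shape[0])
    states = []
    for x_t in inputs:
        h = np.tanh(x_t @ W_in + h @ W_h + b)
        states.append(h)
    return np.array(states)


def interpolate_stream(x, m, p):
    """Within-stream interpolation for one stream x (T,) with 0/1 mask m (T,).

    The estimate at time t combines the forward state through t-1 and the
    backward state from t+1, so it never sees x[t] itself.
    """
    feats = np.stack([np.where(m == 1, x, 0.0), m], axis=1)  # (T, 2): value (0 if missing), mask
    fwd = tanh_rnn(feats, *p["fwd"])                # fwd[t] summarizes steps 0..t
    bwd = tanh_rnn(feats[::-1], *p["bwd"])[::-1]    # bwd[t] summarizes steps t..T-1
    pad = np.zeros((1, fwd.shape[1]))
    fwd_prev = np.vstack([pad, fwd[:-1]])           # state through t-1
    bwd_next = np.vstack([bwd[1:], pad])            # state from t+1
    return np.concatenate([fwd_prev, bwd_next], axis=1) @ p["V"] + p["c"]


def init_params(D, hidden=4, seed=0):
    """Random, untrained weights (hypothetical); a real model would learn them."""
    rng = np.random.default_rng(seed)

    def rnn():
        return (rng.normal(0, 0.3, (2, hidden)), rng.normal(0, 0.3, (hidden, hidden)), np.zeros(hidden))

    streams = [{"fwd": rnn(), "bwd": rnn(), "V": rng.normal(0, 0.3, 2 * hidden), "c": 0.0}
               for _ in range(D)]
    return {"streams": streams, "U": rng.normal(0, 0.3, (D, D)),
            "W": rng.normal(0, 0.3, (D, D)), "b": np.zeros(D)}


def mrnn_estimate(X, M, params):
    """Model estimate for every entry of X (T, D); M is a 0/1 mask of observed entries."""
    D = X.shape[1]
    x_obs = np.where(M == 1, X, 0.0)                # missing entries (e.g. NaN) become 0
    # 1) Interpolation block: one bidirectional RNN per stream, run along time.
    x_tilde = np.column_stack(
        [interpolate_stream(X[:, d], M[:, d], params["streams"][d]) for d in range(D)]
    )
    # 2) Imputation block: mix streams at each time step. Zeroing the diagonal of W
    #    means the estimate for stream d never uses its own observed value.
    W_off = params["W"] * (1.0 - np.eye(D))
    return x_tilde @ params["U"] + x_obs @ W_off + params["b"]


def mrnn_impute(X, M, params):
    """Keep observed entries and fill missing ones with the model estimate."""
    return np.where(M == 1, X, mrnn_estimate(X, M, params))
```

Because neither block lets the estimate for entry (t, d) depend on the observed value at (t, d), a model of this shape could be trained to reconstruct observed measurements without simply copying them.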

Results

Task | Dataset | Metric | Value | Model
Imputation | UCI localization data | MAE (10% missing) | 0.248 | M-RNN
Imputation | PhysioNet Challenge 2012 | MAE (10% of data as GT) | 0.451 | M-RNN
Imputation | Beijing Multi-Site Air-Quality Dataset | MAE (PM2.5) | 14.24 | M-RNN
Feature Engineering | UCI localization data | MAE (10% missing) | 0.248 | M-RNN
Feature Engineering | PhysioNet Challenge 2012 | MAE (10% of data as GT) | 0.451 | M-RNN
Feature Engineering | Beijing Multi-Site Air-Quality Dataset | MAE (PM2.5) | 14.24 | M-RNN
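Metric names such as "MAE (10% missing)" and "MAE (10% of data as GT)" suggest a masked-entry protocol: a fraction of the observed values is hidden, the data are imputed, and the error is scored only on the hidden entries. The sketch below illustrates that protocol under the assumption of a uniform random hold-out; the helper names are hypothetical, the per-stream mean imputer merely stands in for M-RNN or any baseline, and the exact protocol behind these leaderboard numbers may differ. It also computes RMSE, the metric in which the abstract reports its improvements, and the relative improvement of one method over another.

```python
import numpy as np


def hold_out(M, frac, rng):
    """Hide a random `frac` of the observed entries; return (train_mask, test_mask)."""
    observed = np.flatnonzero(M == 1)
    n_test = int(round(frac * observed.size))
    test = np.zeros(M.size, dtype=bool)
    test[rng.choice(observed, size=n_test, replace=False)] = True
    test = test.reshape(M.shape)
    return np.where(test, 0, M), test


def column_mean_impute(X, M):
    """Stand-in imputer: fill each stream's gaps with that stream's observed mean."""
    means = np.nanmean(np.where(M == 1, X, np.nan), axis=0)
    return np.where(M == 1, X, means)


def masked_errors(X_true, X_hat, test_mask):
    """MAE and RMSE computed only on the held-out entries."""
    err = X_hat[test_mask] - X_true[test_mask]
    return np.mean(np.abs(err)), np.sqrt(np.mean(err ** 2))


def relative_improvement(base, new):
    """Fractional error reduction of `new` relative to `base` (0.4 means 40% lower)."""
    return (base - new) / base


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))      # toy fully observed data: 50 time steps, 4 streams
M = np.ones_like(X)
train_mask, test_mask = hold_out(M, 0.10, rng)
X_hat = column_mean_impute(X, train_mask)
mae, rmse = masked_errors(X, X_hat, test_mask)
print(f"MAE={mae:.3f}  RMSE={rmse:.3f}")
```

Swapping `column_mean_impute` for another imputer and comparing the two RMSE values with `relative_improvement` gives the kind of percentage reduction the abstract describes.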

Related Papers

New Hardness Results for Low-Rank Matrix Completion (2025-06-23)
Contrastive Matrix Completion with Denoising and Augmented Graph Views for Robust Recommendation (2025-06-12)
N²: A Unified Python Package and Test Bench for Nearest Neighbor-Based Matrix Completion (2025-06-04)
Covariate-Adjusted Deep Causal Learning for Heterogeneous Panel Data Models (2025-05-26)
Optimal Transport with Heterogeneously Missing Data (2025-05-22)
RGNMR: A Gauss-Newton method for robust matrix completion with theoretical guarantees (2025-05-19)
Adaptively-weighted Nearest Neighbors for Matrix Completion (2025-05-14)
Euclidean Distance Matrix Completion via Asymmetric Projected Gradient Descent (2025-04-28)