Autopopulus: A Novel Framework for Autoencoder Imputation on Large Clinical Datasets.

Document Type


Publication Date


Publication Title

Annu Int Conf IEEE Eng Med Biol Soc


Spokane; Washington; Datasets as Topic; Disease Progression; Electronic Health Records; Humans; Renal Insufficiency, Chronic; Research Design; Software; Uncertainty


The adoption of electronic health records (EHRs) has made patient data increasingly accessible, precipitating the development of various clinical decision support systems and data-driven models to help physicians. However, missing data are common in EHR-derived datasets and can introduce significant uncertainty or even invalidate the use of a predictive model. Machine learning (ML)-based imputation methods have shown promise in various domains for estimating missing values and reducing uncertainty to the point that a predictive model can be employed. We introduce Autopopulus, a novel framework that enables the design and evaluation of various autoencoder architectures for efficient imputation on large datasets. Autopopulus implements existing autoencoder methods as well as a new technique that outputs a range of estimated values (rather than point estimates), and demonstrates a workflow that helps users make an informed decision on an appropriate imputation method. To further illustrate Autopopulus' utility, we use it to identify not only which imputation methods impute most accurately on a large clinical dataset, but also which methods enable downstream predictive models to achieve the best performance in predicting chronic kidney disease (CKD) progression.
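
To make the idea of autoencoder imputation concrete, here is a minimal sketch of one generic approach: a single-hidden-layer autoencoder trained with a loss computed only on observed entries. Its reconstructions then fill in the missing entries. This is not the Autopopulus implementation or API. The function name `autoencoder_impute`, the network size, and all hyperparameters are hypothetical choices made for illustration, and the sketch produces point estimates only (it does not attempt the range-of-values technique described above). It assumes missing values are encoded as NaN and that every column has at least one observed value.

```python
import numpy as np


def autoencoder_impute(X, hidden=2, epochs=1000, lr=0.1, seed=0):
    """Impute NaN entries of X with a tiny autoencoder (illustrative only)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    observed = ~np.isnan(X)

    # Standardise each column using observed values only.
    mu = np.nanmean(X, axis=0)
    sigma = np.nanstd(X, axis=0)
    sigma[sigma == 0] = 1.0

    # Missing inputs are set to 0, i.e. the column mean after scaling.
    Z = np.where(observed, (X - mu) / sigma, 0.0)

    n_features = Z.shape[1]
    W1 = rng.normal(0.0, 0.1, (n_features, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.1, (hidden, n_features))
    b2 = np.zeros(n_features)
    n_obs = observed.sum()

    for _ in range(epochs):
        # Forward pass: encode through a bottleneck, then decode.
        H = np.tanh(Z @ W1 + b1)
        out = H @ W2 + b2

        # Mean squared error over observed entries only.
        grad_out = 2.0 * (out - Z) * observed / n_obs

        # Backward pass (plain gradient descent).
        grad_W2 = H.T @ grad_out
        grad_b2 = grad_out.sum(axis=0)
        grad_pre = (grad_out @ W2.T) * (1.0 - H ** 2)
        grad_W1 = Z.T @ grad_pre
        grad_b1 = grad_pre.sum(axis=0)

        W1 -= lr * grad_W1
        b1 -= lr * grad_b1
        W2 -= lr * grad_W2
        b2 -= lr * grad_b2

    # Reconstruct, undo the scaling, and fill only the missing entries.
    recon = (np.tanh(Z @ W1 + b1) @ W2 + b2) * sigma + mu
    return np.where(observed, X, recon)


# Usage on synthetic, correlated data with about 20% of values removed.
rng = np.random.default_rng(1)
base = rng.normal(size=(50, 1))
X_full = np.hstack([base, 2 * base, -base, base + 1])
X_missing = X_full.copy()
X_missing[rng.random(X_full.shape) < 0.2] = np.nan
X_imputed = autoencoder_impute(X_missing)
```

Because the hidden layer is narrower than the input, the network cannot simply copy its input and has to rely on correlations between features, which is what allows it to estimate a missing value from the values observed in the same row. Observed values are passed through unchanged.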

Clinical Institute

Kidney & Diabetes


Health Information Technology