Zhengping Che, Yu Cheng, Shuangfei Zhai, Zhaonan Sun, Yan Liu

The rapid growth of Electronic Health Records (EHRs), together with the accompanying opportunities in Data-Driven Healthcare (DDH), has attracted widespread interest and attention. Recent progress in the design and application of deep learning methods has shown promising results and is driving major changes in healthcare academia and industry, but most of these methods rely on large amounts of labeled data. In this work, we propose a general deep learning framework that boosts risk prediction performance with limited EHR data. Our model uses a modified generative adversarial network, namely ehrGAN, which provides plausible labeled EHR data by mimicking real patient records, to augment the training dataset in a semi-supervised learning manner. We use this generative model together with a convolutional neural network (CNN) based prediction model to improve onset prediction performance. Experiments on two real healthcare datasets demonstrate that our proposed framework produces realistic data samples and, with the generated data, achieves significant improvements on classification tasks over several state-of-the-art baselines.