South Carolina Statewide Electronic Health Records and its Modeling Cases
Presented by Jiajia Zhang
Department of Epidemiology & Biostatistics
University of South Carolina
Electronic health records (EHRs) are promising but challenging resources for investigating and monitoring disease progression. Because of their complexity, we propose several ways to handle longitudinal or unstructured EHR data to improve the dynamic prediction of survival outcomes.
Case 1: Motivated by improving the prediction of HIV suppression status using EHR data, we propose a functional multivariable logistic regression model that accounts for longitudinal binary and continuous processes simultaneously. Specifically, the longitudinal measurements of both binary and continuous variables are modelled by functional principal component analysis (FPCA), and the corresponding functional principal component scores are used to build a logistic regression model for prediction. The longitudinal binary data are linked to underlying Gaussian processes. Estimation uses penalized splines for both the longitudinal continuous and binary data. Group lasso is used to select longitudinal processes, and multivariate FPCA is proposed to revise the functional principal component scores to account for their correlation. The method is evaluated via comprehensive simulation studies and then applied to predict viral suppression using EHR data for people living with HIV in South Carolina.
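The scores-then-regression idea in Case 1 can be illustrated with a minimal sketch. It makes several simplifying assumptions that are not part of the proposed method: a single continuous process, simulated curves observed on a common dense grid (real EHR measurements are sparse and irregular), a plain eigendecomposition of the sample covariance in place of penalized splines, and no binary process, group lasso, or multivariate FPCA. The names `fpca_scores` and `fit_logistic` are hypothetical helpers written for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated stand-in data (NOT EHR data): n subjects, one continuous marker
# observed on a common dense grid of m time points in [0, 1].
n, m = 200, 30
t = np.linspace(0.0, 1.0, m)
dt = t[1] - t[0]
true_basis = np.vstack([np.sqrt(2.0) * np.sin(np.pi * t),
                        np.sqrt(2.0) * np.cos(np.pi * t)])
xi = rng.normal(size=(n, 2)) * np.array([2.0, 1.0])   # true subject scores
X = xi @ true_basis + rng.normal(scale=0.3, size=(n, m))

# Binary outcome driven by the first true score (an arbitrary choice).
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-1.5 * xi[:, 0])))


def fpca_scores(X, dt, n_components):
    """FPCA on a dense regular grid via the sample covariance."""
    Xc = X - X.mean(axis=0)                       # centre the curves
    cov = Xc.T @ Xc / (X.shape[0] - 1)            # m x m sample covariance
    evals, evecs = np.linalg.eigh(cov * dt)       # discretised covariance operator
    order = np.argsort(evals)[::-1][:n_components]
    phi = evecs[:, order] / np.sqrt(dt)           # eigenfunctions, integral of phi^2 = 1
    scores = Xc @ phi * dt                        # Riemann-sum projection
    return scores, phi, evals[order]


def fit_logistic(Z, y, ridge=1e-3, n_iter=25):
    """Logistic regression by Newton's method with a small ridge penalty
    (applied to all coefficients, intercept included, for simplicity)."""
    A = np.column_stack([np.ones(len(Z)), Z])
    beta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        eta = np.clip(A @ beta, -30.0, 30.0)
        p = 1.0 / (1.0 + np.exp(-eta))
        grad = A.T @ (y - p) - ridge * beta
        hess = (A * (p * (1.0 - p))[:, None]).T @ A + ridge * np.eye(A.shape[1])
        beta = beta + np.linalg.solve(hess, grad)
    return beta


scores, phi, evals = fpca_scores(X, dt, n_components=2)
beta = fit_logistic(scores, y)

# In-sample accuracy of the fitted classifier, for illustration only.
A = np.column_stack([np.ones(n), scores])
acc = np.mean((A @ beta > 0).astype(int) == y)
print(f"in-sample accuracy: {acc:.2f}")
```

With several longitudinal processes, each would contribute its own block of scores, which is where a group-wise penalty such as group lasso could select whole processes at once.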
Case 2: Comorbidity indices such as the Charlson Comorbidity Index (CCI) provide a single score but overlook the impact of comorbidity history, timing, and severity on mortality. We propose dynamically predicting the risk of all-cause mortality using a landmark large language model that can decipher the chronological comorbidity history and account for severity. The data included 45,353 patients hospitalized at Prisma Hospital in South Carolina. The two-step landmark large language model first extracts longitudinal features from each patient's concatenated history of comorbidity descriptions using BERT (Bidirectional Encoder Representations from Transformers), then fits a binary classification model to predict all-cause mortality.
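As a rough illustration of the landmarking step in Case 2, the sketch below builds the inputs that a text encoder and a binary classifier would consume at a given landmark time. Everything in it is an assumption made for illustration: the `Patient` record, the `landmark_rows` helper, the `" [SEP] "` separator, the time unit (days), the prediction horizon, and the choice to drop patients censored inside the horizon. The BERT embedding and the classifier themselves are not shown.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Patient:
    # (days since first admission, comorbidity description); hypothetical layout
    history: List[Tuple[float, str]]
    end_time: float   # day of death, or of last follow-up if no death observed
    died: bool        # True if end_time is a death


def landmark_rows(patients: List[Patient], landmark: float,
                  horizon: float) -> List[Tuple[str, int]]:
    """Build (text, label) pairs for one landmark time.

    text  : comorbidity descriptions recorded up to the landmark, in
            chronological order, joined with " [SEP] "
    label : 1 if the patient died within (landmark, landmark + horizon]
    """
    rows = []
    for p in patients:
        if p.end_time <= landmark:
            continue  # no longer at risk at the landmark
        died_in_window = p.died and p.end_time <= landmark + horizon
        if not died_in_window and p.end_time < landmark + horizon:
            continue  # censored inside the window: outcome unknown, dropped here
        past = sorted((e for e in p.history if e[0] <= landmark),
                      key=lambda e: e[0])
        text = " [SEP] ".join(desc for _, desc in past)
        rows.append((text, int(died_in_window)))
    return rows


# Made-up example records, not real patient data.
patients = [
    Patient([(0, "type 2 diabetes without complications"),
             (200, "chronic kidney disease, stage 3")], end_time=300, died=True),
    Patient([(10, "hypertension")], end_time=900, died=False),
    Patient([(5, "congestive heart failure")], end_time=100, died=True),
    Patient([(20, "chronic obstructive pulmonary disease")], end_time=250, died=False),
]

rows_180 = landmark_rows(patients, landmark=180, horizon=365)
rows_250 = landmark_rows(patients, landmark=250, horizon=365)
```

Repeating the construction at several landmark times is what makes the prediction dynamic: the same patient contributes a longer text as more history accumulates. Dropping patients censored inside the window is a shortcut that can bias estimates; a full analysis would handle censoring explicitly.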
The results provide evidence that EHRs and dynamic prediction have the potential to assist clinicians in understanding patients’ disease progression based on their historical information at different times.
A seminar tea will be held at 2:45 p.m. in University Office Plaza, Room 240. All are welcome.