TY  - JOUR
A1  - Vaid, Akhil
A1  - Chan, Lili
A1  - Chaudhary, Kumardeep
A1  - Jaladanki, Suraj K.
A1  - Paranjpe, Ishan
A1  - Russak, Adam J.
A1  - Kia, Arash
A1  - Timsina, Prem
A1  - Levin, Matthew A.
A1  - He, John Cijiang
A1  - Böttinger, Erwin
A1  - Charney, Alexander W.
A1  - Fayad, Zahi A.
A1  - Coca, Steven G.
A1  - Glicksberg, Benjamin S.
A1  - Nadkarni, Girish N.
T1  - Predictive approaches for acute dialysis requirement and death in COVID-19
JF  - Clinical journal of the American Society of Nephrology : CJASN
N2  - Background and objectives
AKI treated with dialysis initiation is a common complication of coronavirus disease 2019 (COVID-19) among hospitalized patients. However, dialysis supplies and personnel are often limited. 

Design, setting, participants, & measurements
Using data from adult patients hospitalized with COVID-19 from five hospitals from the Mount Sinai Health System who were admitted between March 10 and December 26, 2020, we developed and validated several models (logistic regression, Least Absolute Shrinkage and Selection Operator (LASSO), random forest, and eXtreme Gradient Boosting [XGBoost; with and without imputation]) for predicting treatment with dialysis or death at various time horizons (1, 3, 5, and 7 days) after hospital admission. Patients admitted to the Mount Sinai Hospital were used for internal validation, whereas the other hospitals formed part of the external validation cohort. Features included demographics, comorbidities, and laboratory and vital signs within 12 hours of hospital admission.

Results
A total of 6093 patients (2442 in training and 3651 in external validation) were included in the final cohort. Of the different modeling approaches used, XGBoost without imputation had the highest area under the receiver operating characteristic (AUROC) curve on internal validation (range of 0.93-0.98) and area under the precision-recall curve (AUPRC; range of 0.78-0.82) for all time points. XGBoost without imputation also had the highest test parameters on external validation (AUROC range of 0.85-0.87, and AUPRC range of 0.27-0.54) across all time windows. XGBoost without imputation outperformed all models with higher precision and recall (mean difference in AUROC of 0.04; mean difference in AUPRC of 0.15). Features of creatinine, BUN, and red cell distribution width were major drivers of the model's prediction.

Conclusions
An XGBoost model without imputation for prediction of a composite outcome of either death or dialysis in patients positive for COVID-19 had the best performance, as compared with standard and other machine learning models.
KW  - COVID-19
KW  - dialysis
KW  - machine learning
KW  - prediction
KW  - AKI
Y1  - 2021
U6  - https://doi.org/10.2215/CJN.17311120
SN  - 1555-9041
SN  - 1555-905X
VL  - 16
IS  - 8
SP  - 1158
EP  - 1168
PB  - American Society of Nephrology
CY  - Washington
ER  - 
TY  - JOUR
A1  - Dellepiane, Sergio
A1  - Vaid, Akhil
A1  - Jaladanki, Suraj K.
A1  - Coca, Steven
A1  - Fayad, Zahi A.
A1  - Charney, Alexander W.
A1  - Böttinger, Erwin
A1  - He, John Cijiang
A1  - Glicksberg, Benjamin S.
A1  - Chan, Lili
A1  - Nadkarni, Girish
T1  - Acute kidney injury in patients hospitalized with COVID-19 in New York City
BT  - Temporal Trends From March 2020 to April 2021
JF  - Kidney medicine
Y1  - 2021
U6  - https://doi.org/10.1016/j.xkme.2021.06.008
SN  - 2590-0595
VL  - 3
IS  - 5
SP  - 877
EP  - 879
PB  - Elsevier
CY  - Amsterdam
ER  - 
TY  - GEN
A1  - Dellepiane, Sergio
A1  - Vaid, Akhil
A1  - Jaladanki, Suraj K.
A1  - Coca, Steven
A1  - Fayad, Zahi A.
A1  - Charney, Alexander W.
A1  - Böttinger, Erwin
A1  - He, John Cijiang
A1  - Glicksberg, Benjamin S.
A1  - Chan, Lili
A1  - Nadkarni, Girish
T1  - Acute kidney injury in patients hospitalized with COVID-19 in New York City
BT  - Temporal Trends From March 2020 to April 2021
T2  - Zweitveröffentlichungen der Universität Potsdam : Reihe der Digital Engineering Fakultät
T3  - Zweitveröffentlichungen der Universität Potsdam : Reihe der Digital Engineering Fakultät - 21 
Y1  - 2021
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-585415
SN  - 2590-0595
IS  - 5
ER  - 
TY  - JOUR
A1  - Datta, Suparno
A1  - Sachs, Jan Philipp
A1  - Freitas da Cruz, Harry
A1  - Martensen, Tom
A1  - Bode, Philipp
A1  - Morassi Sasso, Ariane
A1  - Glicksberg, Benjamin S.
A1  - Böttinger, Erwin
T1  - FIBER
BT  - enabling flexible retrieval of electronic health records data for clinical predictive modeling
JF  - JAMIA open
N2  - Objectives: 
The development of clinical predictive models hinges upon the availability of comprehensive clinical data. Tapping into such resources requires considerable effort from clinicians, data scientists, and engineers. Specifically, these efforts are focused on data extraction and preprocessing steps required prior to modeling, including complex database queries. A handful of software libraries exist that can reduce this complexity by building upon data standards. However, a gap remains concerning electronic health records (EHRs) stored in star schema clinical data warehouses, an approach often adopted in practice. In this article, we introduce the FlexIBle EHR Retrieval (FIBER) tool: a Python library built on top of a star schema (i2b2) clinical data warehouse that enables flexible generation of modeling-ready cohorts as data frames. 

Materials and Methods: 
FIBER was developed on top of a large-scale star schema EHR database which contains data from 8 million patients and over 120 million encounters. To illustrate FIBER's capabilities, we present its application by building a heart surgery patient cohort with subsequent prediction of acute kidney injury (AKI) with various machine learning models. 

Results:
Using FIBER, we were able to build the heart surgery cohort (n = 12 061), identify the patients that developed AKI (n = 1005), and automatically extract relevant features (n = 774). Finally, we trained machine learning models that achieved area under the curve values of up to 0.77 for this exemplary use case.

Conclusion: 
FIBER is an open-source Python library developed for extracting information from star schema clinical data warehouses and reduces time-to-modeling, helping to streamline the clinical modeling process.
KW  - databases, factual
KW  - electronic health records
KW  - information storage and retrieval
KW  - workflow
KW  - software/instrumentation
Y1  - 2021
U6  - https://doi.org/10.1093/jamiaopen/ooab048
SN  - 2574-2531
VL  - 4
IS  - 3
PB  - Oxford Univ. Press
CY  - Oxford
ER  - 
TY  - JOUR
A1  - De Freitas, Jessica K.
A1  - Johnson, Kipp W.
A1  - Golden, Eddye
A1  - Nadkarni, Girish N.
A1  - Dudley, Joel T.
A1  - Böttinger, Erwin
A1  - Glicksberg, Benjamin S.
A1  - Miotto, Riccardo
T1  - Phe2vec
BT  - Automated disease phenotyping based on unsupervised embeddings from electronic health records
JF  - Patterns
N2  - Robust phenotyping of patients from electronic health records (EHRs) at scale is a challenge in clinical informatics. Here, we introduce Phe2vec, an automated framework for disease phenotyping from EHRs based on unsupervised learning and assess its effectiveness against standard rule-based algorithms from Phenotype KnowledgeBase (PheKB). Phe2vec is based on pre-computing embeddings of medical concepts and patients' clinical history. Disease phenotypes are then derived from a seed concept and its neighbors in the embedding space. Patients are linked to a disease if their embedded representation is close to the disease phenotype. Comparing Phe2vec and PheKB cohorts head-to-head using chart review, Phe2vec performed on par or better in nine out of ten diseases. Differently from other approaches, it can scale to any condition and was validated against widely adopted expert-based standards. Phe2vec aims to optimize clinical informatics research by augmenting current frameworks to characterize patients by condition and derive reliable disease cohorts.
Y1  - 2021
U6  - https://doi.org/10.1016/j.patter.2021.100337
SN  - 2666-3899
VL  - 2
IS  - 9
PB  - Elsevier
CY  - Amsterdam
ER  -