Refine
Has Fulltext
- no (23) (remove)
Year of publication
Document Type
- Article (20)
- Conference Proceeding (2)
- Review (1)
Is part of the Bibliography
- yes (23)
Keywords
- prediction (23) (remove)
Institute
- Department Psychologie (4)
- Institut für Biochemie und Biologie (4)
- Department Linguistik (2)
- Hasso-Plattner-Institut für Digital Engineering gGmbH (2)
- Institut für Mathematik (2)
- Department für Inklusionspädagogik (1)
- Fachgruppe Betriebswirtschaftslehre (1)
- Hochschulambulanz (1)
- Institut für Ernährungswissenschaft (1)
- Institut für Geowissenschaften (1)
Background:
COVID-19 has infected millions of people worldwide and is responsible for several hundred thousand fatalities. The COVID-19 pandemic has necessitated thoughtful resource allocation and early identification of high-risk patients. However, effective methods to meet these needs are lacking.
Objective:
The aims of this study were to analyze the electronic health records (EHRs) of patients who tested positive for COVID-19 and were admitted to hospitals in the Mount Sinai Health System in New York City; to develop machine learning models for making predictions about the hospital course of the patients over clinically meaningful time horizons based on patient characteristics at admission; and to assess the performance of these models at multiple hospitals and time points.
Methods:
We used Extreme Gradient Boosting (XGBoost) and baseline comparator models to predict in-hospital mortality and critical events at time windows of 3, 5, 7, and 10 days from admission. Our study population included harmonized EHR data from five hospitals in New York City for 4098 COVID-19-positive patients admitted from March 15 to May 22, 2020. The models were first trained on patients from a single hospital (n=1514) before or on May 1, externally validated on patients from four other hospitals (n=2201) before or on May 1, and prospectively validated on all patients after May 1 (n=383). Finally, we established model interpretability to identify and rank variables that drive model predictions.
Results:
Upon cross-validation, the XGBoost classifier outperformed baseline models, with an area under the receiver operating characteristic curve (AUC-ROC) for mortality of 0.89 at 3 days, 0.85 at 5 and 7 days, and 0.84 at 10 days. XGBoost also performed well for critical event prediction, with an AUC-ROC of 0.80 at 3 days, 0.79 at 5 days, 0.80 at 7 days, and 0.81 at 10 days. In external validation, XGBoost achieved an AUC-ROC of 0.88 at 3 days, 0.86 at 5 days, 0.86 at 7 days, and 0.84 at 10 days for mortality prediction. Similarly, the unimputed XGBoost model achieved an AUC-ROC of 0.78 at 3 days, 0.79 at 5 days, 0.80 at 7 days, and 0.81 at 10 days. Trends in performance on prospective validation sets were similar. At 7 days, acute kidney injury on admission, elevated LDH, tachypnea, and hyperglycemia were the strongest drivers of critical event prediction, while higher age, anion gap, and C-reactive protein were the strongest drivers of mortality prediction.
Conclusions:
We externally and prospectively trained and validated machine learning models for mortality and critical events for patients with COVID-19 at different time horizons. These models identified at-risk patients and uncovered underlying relationships that predicted outcomes.
We investigate spatio-temporal properties of earthquake patterns in the San Jacinto fault zone (SJFZ), California, between Cajon Pass and the Superstition Hill Fault, using a long record of simulated seismicity constrained by available seismological and geological data. The model provides an effective realization of a large segmented strike-slip fault zone in a 3D elastic half-space, with heterogeneous distribution of static friction chosen to represent several clear step-overs at the surface. The simulated synthetic catalog reproduces well the basic statistical features of the instrumental seismicity recorded at the SJFZ area since 1981. The model also produces events larger than those included in the short instrumental record, consistent with paleo-earthquakes documented at sites along the SJFZ for the last 1,400 years. The general agreement between the synthetic and observed data allows us to address with the long-simulated seismicity questions related to large earthquakes and expected seismic hazard. The interaction between m a parts per thousand yen 7 events on different sections of the SJFZ is found to be close to random. The hazard associated with m a parts per thousand yen 7 events on the SJFZ increases significantly if the long record of simulated seismicity is taken into account. The model simulations indicate that the recent increased number of observed intermediate SJFZ earthquakes is a robust statistical feature heralding the occurrence of m a parts per thousand yen 7 earthquakes. The hypocenters of the m a parts per thousand yen 5 events in the simulation results move progressively towards the hypocenter of the upcoming m a parts per thousand yen 7 earthquake.
Based on an analysis of continuous monitoring of farm animal behavior in the region of the 2016 M6.6 Norcia earthquake in Italy, Wikelski et al., 2020; (Seismol Res Lett, 89, 2020, 1238) conclude that animal activity can be anticipated with subsequent seismic activity and that this finding might help to design a "short-term earthquake forecasting method." We show that this result is based on an incomplete analysis and misleading interpretations. Applying state-of-the-art methods of statistics, we demonstrate that the proposed anticipatory patterns cannot be distinguished from random patterns, and consequently, the observed anomalies in animal activity do not have any forecasting power.