TY - THES A1 - Rogge-Solti, Andreas T1 - Probabilistic Estimation of Unobserved Process Events T1 - Probabilistische Abschätzung Unbeobachteter Prozessereignisse N2 - Organizations try to gain competitive advantages, and to increase customer satisfaction. To ensure the quality and efficiency of their business processes, they perform business process management. An important part of process management that happens on the daily operational level is process controlling. A prerequisite of controlling is process monitoring, i.e., keeping track of the performed activities in running process instances. Only by process monitoring can business analysts detect delays and react to deviations from the expected or guaranteed performance of a process instance. To enable monitoring, process events need to be collected from the process environment. When a business process is orchestrated by a process execution engine, monitoring is available for all orchestrated process activities. Many business processes, however, do not lend themselves to automatic orchestration, e.g., because of required freedom of action. This situation is often encountered in hospitals, where most business processes are manually enacted. Hence, in practice it is often inefficient or infeasible to document and monitor every process activity. Additionally, manual process execution and documentation is prone to errors, e.g., documentation of activities can be forgotten. Thus, organizations face the challenge of process events that occur, but are not observed by the monitoring environment. These unobserved process events can serve as basis for operational process decisions, even without exact knowledge of when they happened or when they will happen. An exemplary decision is whether to invest more resources to manage timely completion of a case, anticipating that the process end event will occur too late. This thesis offers means to reason about unobserved process events in a probabilistic way. We address decisive questions of process managers (e.g., "when will the case be finished?", or "when did we perform the activity that we forgot to document?") in this thesis. As main contribution, we introduce an advanced probabilistic model to business process management that is based on a stochastic variant of Petri nets. We present a holistic approach to use the model effectively along the business process lifecycle. Therefore, we provide techniques to discover such models from historical observations, to predict the termination time of processes, and to ensure quality by missing data management. We propose mechanisms to optimize configuration for monitoring and prediction, i.e., to offer guidance in selecting important activities to monitor. An implementation is provided as a proof of concept. For evaluation, we compare the accuracy of the approach with that of state-of-the-art approaches using real process data of a hospital. Additionally, we show its more general applicability in other domains by applying the approach on process data from logistics and finance. N2 - Unternehmen versuchen Wettbewerbsvorteile zu gewinnen und die Kundenzufriedenheit zu erhöhen. Um die Qualität und die Effizienz ihrer Prozesse zu gewährleisten, wenden Unternehmen Geschäftsprozessmanagement an. Hierbei spielt die Prozesskontrolle im täglichen Betrieb eine wichtige Rolle. Prozesskontrolle wird durch Prozessmonitoring ermöglicht, d.h. durch die Überwachung des Prozessfortschritts laufender Prozessinstanzen. So können Verzögerungen entdeckt und es kann entsprechend reagiert werden, um Prozesse wie erwartet und termingerecht beenden zu können. Um Prozessmonitoring zu ermöglichen, müssen prozessrelevante Ereignisse aus der Prozessumgebung gesammelt und ausgewertet werden. Sofern eine Prozessausführungsengine die Orchestrierung von Geschäftsprozessen übernimmt, kann jede Prozessaktivität überwacht werden. Aber viele Geschäftsprozesse eignen sich nicht für automatisierte Orchestrierung, da sie z.B. besonders viel Handlungsfreiheit erfordern. Dies ist in Krankenhäusern der Fall, in denen Geschäftsprozesse oft manuell durchgeführt werden. Daher ist es meist umständlich oder unmöglich, jeden Prozessfortschritt zu erfassen. Zudem ist händische Prozessausführung und -dokumentation fehleranfällig, so wird z.B. manchmal vergessen zu dokumentieren. Eine Herausforderung für Unternehmen ist, dass manche Prozessereignisse nicht im Prozessmonitoring erfasst werden. Solch unbeobachtete Prozessereignisse können jedoch als Entscheidungsgrundlage dienen, selbst wenn kein exaktes Wissen über den Zeitpunkt ihres Auftretens vorliegt. Zum Beispiel ist bei der Prozesskontrolle zu entscheiden, ob zusätzliche Ressourcen eingesetzt werden sollen, wenn eine Verspätung angenommen wird. Diese Arbeit stellt einen probabilistischen Ansatz für den Umgang mit unbeobachteten Prozessereignissen vor. Dabei werden entscheidende Fragen von Prozessmanagern beantwortet (z.B. "Wann werden wir den Fall beenden?", oder "Wann wurde die Aktivität ausgeführt, die nicht dokumentiert wurde?"). Der Hauptbeitrag der Arbeit ist die Einführung eines erweiterten probabilistischen Modells ins Geschäftsprozessmanagement, das auf stochastischen Petri Netzen basiert. Dabei wird ein ganzheitlicher Ansatz zur Unterstützung der einzelnen Phasen des Geschäftsprozesslebenszyklus verfolgt. Es werden Techniken zum Lernen des probabilistischen Modells, zum Vorhersagen des Zeitpunkts des Prozessendes, zum Qualitätsmanagement von Dokumentationen durch Erkennung fehlender Einträge, und zur Optimierung von Monitoringkonfigurationen bereitgestellt. Letztere dient zur Auswahl von relevanten Stellen im Prozess, die beobachtet werden sollten. Diese Techniken wurden in einer quelloffenen prototypischen Anwendung implementiert. Zur Evaluierung wird der Ansatz mit existierenden Alternativen an echten Prozessdaten eines Krankenhauses gemessen. Die generelle Anwendbarkeit in weiteren Domänen wird examplarisch an Prozessdaten aus der Logistik und dem Finanzwesen gezeigt. KW - Geschäftsprozessmanagement KW - stochastische Petri Netze KW - Bayessche Netze KW - Probabilistische Modelle KW - Vorhersage KW - Fehlende Daten KW - Process Mining KW - business process management KW - stochastic Petri nets KW - Bayesian networks KW - probabilistic models KW - prediction KW - missing data KW - process mining Y1 - 2014 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus-70426 ER - TY - JOUR A1 - Vaid, Akhil A1 - Somani, Sulaiman A1 - Russak, Adam J. A1 - De Freitas, Jessica K. A1 - Chaudhry, Fayzan F. A1 - Paranjpe, Ishan A1 - Johnson, Kipp W. A1 - Lee, Samuel J. A1 - Miotto, Riccardo A1 - Richter, Felix A1 - Zhao, Shan A1 - Beckmann, Noam D. A1 - Naik, Nidhi A1 - Kia, Arash A1 - Timsina, Prem A1 - Lala, Anuradha A1 - Paranjpe, Manish A1 - Golden, Eddye A1 - Danieletto, Matteo A1 - Singh, Manbir A1 - Meyer, Dara A1 - O'Reilly, Paul F. A1 - Huckins, Laura A1 - Kovatch, Patricia A1 - Finkelstein, Joseph A1 - Freeman, Robert M. A1 - Argulian, Edgar A1 - Kasarskis, Andrew A1 - Percha, Bethany A1 - Aberg, Judith A. A1 - Bagiella, Emilia A1 - Horowitz, Carol R. A1 - Murphy, Barbara A1 - Nestler, Eric J. A1 - Schadt, Eric E. A1 - Cho, Judy H. A1 - Cordon-Cardo, Carlos A1 - Fuster, Valentin A1 - Charney, Dennis S. A1 - Reich, David L. A1 - Böttinger, Erwin A1 - Levin, Matthew A. A1 - Narula, Jagat A1 - Fayad, Zahi A. A1 - Just, Allan C. A1 - Charney, Alexander W. A1 - Nadkarni, Girish N. A1 - Glicksberg, Benjamin S. T1 - Machine learning to predict mortality and critical events in a cohort of patients with COVID-19 in New York City: model development and validation JF - Journal of medical internet research : international scientific journal for medical research, information and communication on the internet ; JMIR N2 - Background: COVID-19 has infected millions of people worldwide and is responsible for several hundred thousand fatalities. The COVID-19 pandemic has necessitated thoughtful resource allocation and early identification of high-risk patients. However, effective methods to meet these needs are lacking. Objective: The aims of this study were to analyze the electronic health records (EHRs) of patients who tested positive for COVID-19 and were admitted to hospitals in the Mount Sinai Health System in New York City; to develop machine learning models for making predictions about the hospital course of the patients over clinically meaningful time horizons based on patient characteristics at admission; and to assess the performance of these models at multiple hospitals and time points. Methods: We used Extreme Gradient Boosting (XGBoost) and baseline comparator models to predict in-hospital mortality and critical events at time windows of 3, 5, 7, and 10 days from admission. Our study population included harmonized EHR data from five hospitals in New York City for 4098 COVID-19-positive patients admitted from March 15 to May 22, 2020. The models were first trained on patients from a single hospital (n=1514) before or on May 1, externally validated on patients from four other hospitals (n=2201) before or on May 1, and prospectively validated on all patients after May 1 (n=383). Finally, we established model interpretability to identify and rank variables that drive model predictions. Results: Upon cross-validation, the XGBoost classifier outperformed baseline models, with an area under the receiver operating characteristic curve (AUC-ROC) for mortality of 0.89 at 3 days, 0.85 at 5 and 7 days, and 0.84 at 10 days. XGBoost also performed well for critical event prediction, with an AUC-ROC of 0.80 at 3 days, 0.79 at 5 days, 0.80 at 7 days, and 0.81 at 10 days. In external validation, XGBoost achieved an AUC-ROC of 0.88 at 3 days, 0.86 at 5 days, 0.86 at 7 days, and 0.84 at 10 days for mortality prediction. Similarly, the unimputed XGBoost model achieved an AUC-ROC of 0.78 at 3 days, 0.79 at 5 days, 0.80 at 7 days, and 0.81 at 10 days. Trends in performance on prospective validation sets were similar. At 7 days, acute kidney injury on admission, elevated LDH, tachypnea, and hyperglycemia were the strongest drivers of critical event prediction, while higher age, anion gap, and C-reactive protein were the strongest drivers of mortality prediction. Conclusions: We externally and prospectively trained and validated machine learning models for mortality and critical events for patients with COVID-19 at different time horizons. These models identified at-risk patients and uncovered underlying relationships that predicted outcomes. KW - machine learning KW - COVID-19 KW - electronic health record KW - TRIPOD KW - clinical KW - informatics KW - prediction KW - mortality KW - EHR KW - cohort KW - hospital KW - performance Y1 - 2020 U6 - https://doi.org/10.2196/24018 SN - 1439-4456 SN - 1438-8871 VL - 22 IS - 11 PB - Healthcare World CY - Richmond, Va. ER - TY - JOUR A1 - Vaid, Akhil A1 - Chan, Lili A1 - Chaudhary, Kumardeep A1 - Jaladanki, Suraj K. A1 - Paranjpe, Ishan A1 - Russak, Adam J. A1 - Kia, Arash A1 - Timsina, Prem A1 - Levin, Matthew A. A1 - He, John Cijiang A1 - Böttinger, Erwin A1 - Charney, Alexander W. A1 - Fayad, Zahi A. A1 - Coca, Steven G. A1 - Glicksberg, Benjamin S. A1 - Nadkarni, Girish N. T1 - Predictive approaches for acute dialysis requirement and death in COVID-19 JF - Clinical journal of the American Society of Nephrology : CJASN N2 - Background and objectives AKI treated with dialysis initiation is a common complication of coronavirus disease 2019 (COVID-19) among hospitalized patients. However, dialysis supplies and personnel are often limited. Design, setting, participants, & measurements Using data from adult patients hospitalized with COVID-19 from five hospitals from theMount Sinai Health System who were admitted between March 10 and December 26, 2020, we developed and validated several models (logistic regression, Least Absolute Shrinkage and Selection Operator (LASSO), random forest, and eXtreme GradientBoosting [XGBoost; with and without imputation]) for predicting treatment with dialysis or death at various time horizons (1, 3, 5, and 7 days) after hospital admission. Patients admitted to theMount Sinai Hospital were used for internal validation, whereas the other hospitals formed part of the external validation cohort. Features included demographics, comorbidities, and laboratory and vital signs within 12 hours of hospital admission. Results A total of 6093 patients (2442 in training and 3651 in external validation) were included in the final cohort. Of the different modeling approaches used, XGBoost without imputation had the highest area under the receiver operating characteristic (AUROC) curve on internal validation (range of 0.93-0.98) and area under the precisionrecall curve (AUPRC; range of 0.78-0.82) for all time points. XGBoost without imputation also had the highest test parameters on external validation (AUROC range of 0.85-0.87, and AUPRC range of 0.27-0.54) across all time windows. XGBoost without imputation outperformed all models with higher precision and recall (mean difference in AUROC of 0.04; mean difference in AUPRC of 0.15). Features of creatinine, BUN, and red cell distribution width were major drivers of the model's prediction. Conclusions An XGBoost model without imputation for prediction of a composite outcome of either death or dialysis in patients positive for COVID-19 had the best performance, as compared with standard and other machine learning models. KW - COVID-19 KW - dialysis KW - machine learning KW - prediction KW - AKI Y1 - 2021 U6 - https://doi.org/10.2215/CJN.17311120 SN - 1555-9041 SN - 1555-905X VL - 16 IS - 8 SP - 1158 EP - 1168 PB - American Society of Nephrology CY - Washington ER -