Objective:
Hypertension has long been recognized as one of the most important predisposing factors for cardiovascular diseases and mortality.
In recent years, machine learning methods have shown potential in diagnostic and predictive approaches in chronic diseases.
Electronic health records (EHRs) have emerged as a reliable source of longitudinal data. The aim of this study is to predict the onset of hypertension using modern deep learning (DL) architectures, specifically long short-term memory (LSTM) networks, and longitudinal EHRs.
Materials and Methods:
We compare this approach with the best-performing models reported in previous work, particularly XGBoost, applied to aggregated features.
Our work is based on data from 233 895 adult patients from a large health system in the United States. We divided our population into 2 distinct longitudinal datasets based on the diagnosis date.
To ensure generalization to unseen data, we trained our models on the first dataset (dataset A "train and validation") using cross-validation, and then applied the models to a second dataset (dataset B "test") to assess their performance.
We also experimented with 2 different time-windows before the onset of hypertension and evaluated the impact on model performance.
Results:
With the LSTM network, we were able to achieve an area under the receiver operating characteristic curve value of 0.98 in the "train and validation" dataset A and 0.94 in the "test" dataset B for a prediction time window of 1 year. Lipid disorders, type 2 diabetes, and renal disorders are found to be associated with incident hypertension.
Conclusion:
These findings show that DL models based on temporal EHR data can improve the identification of patients at high risk of hypertension and corresponding driving factors. In the long term, this work may support identifying individuals who are at high risk for developing hypertension and facilitate earlier intervention to prevent the future development of hypertension.
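As a hedged illustration of the kind of model described above, the following minimal NumPy sketch runs a single LSTM cell over a sequence of per-visit feature vectors and maps the final hidden state to a risk probability. The weights, dimensions and the `predict_risk` helper are hypothetical; the study's actual architecture, features and training procedure are not specified here.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_forward(x_seq, W, U, b):
    """Run a single-layer LSTM over a sequence of visit feature vectors.
    x_seq: (T, d_in); W: (4h, d_in); U: (4h, h); b: (4h,)."""
    hidden = U.shape[1]
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    for x in x_seq:
        z = W @ x + U @ h + b        # all four gate pre-activations at once
        i, f, g, o = np.split(z, 4)  # input, forget, candidate, output
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        g = np.tanh(g)
        c = f * c + i * g            # update cell state
        h = o * np.tanh(c)           # update hidden state
    return h                         # final state summarises the visit history

def predict_risk(x_seq, lstm_params, w_out, b_out):
    """Map the final hidden state to a (hypothetical) risk probability."""
    h = lstm_forward(x_seq, *lstm_params)
    return sigmoid(w_out @ h + b_out)
```

In practice such a model would be trained with a framework like PyTorch or TensorFlow; the sketch only shows the forward pass over a longitudinal record.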
We present the extension of the Kalmag model, proposed as a candidate for IGRF-13, to the twentieth century.
The dataset underlying its derivation has been complemented by new measurements from satellites, ground-based observatories, and land, marine and airborne surveys.
Like its predecessor, this version is derived from a combination of a Kalman filter and a smoothing algorithm, providing mean models and associated uncertainties. These quantities permit a precise assessment of where the mean solutions can be considered reliable and where they cannot.
The temporal resolution of the core field and the secular variation was set to 0.1 year over the 122 years the model spans.
Nevertheless, a posteriori sampled ensembles show that this resolution is effectively achieved only at a limited number of spatial scales and during certain time periods.
Unsurprisingly, the highest accuracy in both space and time of the core field and the secular variation is achieved during the CHAMP and Swarm eras. In this version of Kalmag, a particular effort was made to resolve the small-scale lithospheric field.
Under specific statistical assumptions, the latter was modeled up to spherical harmonic degree and order 1000, and signal from both satellite and survey measurements contributed to its development.
External and induced fields were jointly estimated with the rest of the model. We show that their large scales could be accurately extracted from direct measurements whenever the latter exhibit a sufficiently high temporal coverage.
Temporally resolving these fields down to 3 hours during the CHAMP and Swarm missions gave us access to the link between induced and magnetospheric fields. In particular, the period dependence of the induced signal on the driving one could be directly observed.
The model is available through various physical and statistical quantities on a dedicated website at https://ionocovar.agnld.uni-potsdam.de/Kalmag/.
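The mean-plus-uncertainty estimates described above come from a Kalman filter combined with a smoothing algorithm. As a rough sketch of that machinery (not the Kalmag implementation, which operates on spherical-harmonic coefficients), here is a scalar Kalman filter followed by a Rauch-Tung-Striebel smoother; the transition coefficient `a`, noise variances `q` and `r`, and priors are illustrative assumptions.

```python
import numpy as np

def kalman_rts(y, a, q, r, m0=0.0, p0=1.0):
    """Scalar Kalman filter plus Rauch-Tung-Striebel smoother.
    y: observations; a: state transition; q: process noise variance;
    r: observation noise variance. Returns smoothed means and variances
    (the 'mean model' and its associated uncertainty)."""
    n = len(y)
    m_f = np.empty(n); p_f = np.empty(n)   # filtered mean / variance
    m_p = np.empty(n); p_p = np.empty(n)   # predicted mean / variance
    m, p = m0, p0
    for t in range(n):
        m_p[t], p_p[t] = a * m, a * a * p + q   # predict
        k = p_p[t] / (p_p[t] + r)               # Kalman gain (H = 1)
        m = m_p[t] + k * (y[t] - m_p[t])        # update with observation
        p = (1.0 - k) * p_p[t]
        m_f[t], p_f[t] = m, p
    m_s = m_f.copy(); p_s = p_f.copy()          # backward smoothing pass
    for t in range(n - 2, -1, -1):
        g = p_f[t] * a / p_p[t + 1]
        m_s[t] = m_f[t] + g * (m_s[t + 1] - m_p[t + 1])
        p_s[t] = p_f[t] + g * g * (p_s[t + 1] - p_p[t + 1])
    return m_s, p_s
```

The smoothed variances `p_s` play the role of the per-location uncertainties that indicate where a mean solution is reliable.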
Abstract
In recent years, feedforward neural networks (NNs) have been successfully applied to reconstruct global plasmasphere dynamics in the equatorial plane. These neural network‐based models capture the large‐scale dynamics of the plasmasphere, such as plume formation and erosion of the plasmasphere on the nightside. However, their performance depends strongly on the availability of training data. When the data coverage is limited or non‐existent, as occurs during geomagnetic storms, the performance of NNs significantly decreases, as networks inherently cannot learn from the limited number of examples. This limitation can be overcome by employing physics‐based modeling during strong geomagnetic storms. Physics‐based models show a stable performance during periods of disturbed geomagnetic activity if they are correctly initialized and configured. In this study, we illustrate how to combine the neural network‐ and physics‐based models of the plasmasphere in an optimal way by using data assimilation. The proposed approach utilizes advantages of both neural network‐ and physics‐based modeling and produces global plasma density reconstructions for both quiet and disturbed geomagnetic activity, including extreme geomagnetic storms. We validate the models quantitatively by comparing their output to the in‐situ density measurements from RBSP‐A for an 18‐month out‐of‐sample period from June 30, 2016 to January 01, 2018 and computing performance metrics. To validate the global density reconstructions qualitatively, we compare them to the IMAGE EUV images of the He+ particle distribution in the Earth's plasmasphere for a number of events in the past, including the Halloween storm in 2003.
Simple Summary: Gliomas are heterogeneous types of cancer, and therapy should therefore be personalized and targeted toward specific pathways. We developed a methodology that corrected strong batch effects in The Cancer Genome Atlas datasets and estimated glioma grade-specific co-enrichment mechanisms using machine learning. Our findings generated hypotheses about annotations, e.g., pathways, that should be considered as therapeutic targets. Gliomas develop and grow in the brain and central nervous system. Examining glioma grading processes is valuable for addressing therapeutic challenges. One of the most extensive repositories storing transcriptomics data for gliomas is The Cancer Genome Atlas (TCGA). However, such big cohorts should be processed with caution and evaluated thoroughly, as they can contain batch and other effects. Furthermore, biological mechanisms of cancer involve interactions among biomarkers. Thus, we applied an interpretable machine learning approach to discover such relationships. This type of transparent learning provides not only good predictability but also reveals co-predictive mechanisms among features. In this study, we corrected the strong and confounded batch effect in the TCGA glioma data. We further used the corrected datasets to perform a comprehensive machine learning analysis applied to single-sample gene set enrichment scores using collections from the Molecular Signature Database. Furthermore, using rule-based classifiers, we displayed networks of co-enrichment related to glioma grades. Moreover, we validated our results using external glioma cohorts. We believe that utilizing the corrected glioma cohorts from TCGA may improve the application and validation of future studies. Finally, the co-enrichment and survival analyses provide detailed explanations for glioma progression and should consequently support targeted treatment.
Shams et al. report that glioma patients' motor status is predicted accurately by diffusion MRI metrics along the corticospinal tract using a support vector machine method, reaching an area under the curve of 77%. They show that these metrics are more effective than demographic and clinical variables.
Along-tract statistics enable white matter characterization using various diffusion MRI metrics. These diffusion models reveal detailed insights into white matter microstructural changes with development, pathology and function. Here, we aim to assess the clinical utility of diffusion MRI metrics along the corticospinal tract, investigating whether motor glioma patients can be classified with respect to their motor status. We retrospectively included 116 brain tumour patients suffering from either left or right supratentorial, unilateral World Health Organization Grade II, III and IV gliomas, with a mean age of 53.51 ± 16.32 years. Around 37% of patients presented with preoperative motor function deficits according to the Medical Research Council scale. At group-level comparison, the highest non-overlapping diffusion MRI differences were detected in the superior portion of the tracts' profiles: fractional anisotropy and fibre density decrease, while apparent diffusion coefficient, axial diffusivity and radial diffusivity increase. To predict motor deficits, we developed a method based on a support vector machine using histogram-based features of diffusion MRI tract profiles (e.g. mean, standard deviation, kurtosis and skewness), following a recursive feature elimination method. Our model achieved high performance (74% sensitivity, 75% specificity, 74% overall accuracy and 77% area under the curve). We found that apparent diffusion coefficient, fractional anisotropy and radial diffusivity contributed more than other features to the model. Incorporating patient demographics and clinical features such as age, tumour World Health Organization grade, tumour location, gender and resting motor threshold did not affect the model's performance, revealing that these features were not as effective as the microstructural measures. These results shed light on the potential patterns of tumour-related microstructural white matter changes in the prediction of functional deficits.
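The histogram-based features that feed the support vector machine above can be sketched as follows. This is a hedged NumPy illustration of the named summary statistics (mean, standard deviation, skewness, excess kurtosis) computed from a tract profile; the study's exact feature pipeline, SVM configuration and recursive feature elimination are not reproduced here, and `tract_histogram_features` is a hypothetical helper name.

```python
import numpy as np

def tract_histogram_features(profile):
    """Summarise a diffusion-metric profile sampled along a tract into
    histogram-based features: mean, standard deviation, skewness and
    excess kurtosis."""
    x = np.asarray(profile, dtype=float)
    mu, sd = x.mean(), x.std()
    z = (x - mu) / sd                 # standardised samples
    skew = np.mean(z ** 3)            # third standardised moment
    kurt = np.mean(z ** 4) - 3.0      # fourth moment, excess form
    return np.array([mu, sd, skew, kurt])
```

Each tract and metric would contribute one such feature vector, which could then be passed to, e.g., scikit-learn's `SVC` with recursive feature elimination.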
Psychology and nutritional science research has highlighted the impact of negative emotions and cognitive load on calorie consumption behaviour using subjective questionnaires. Isolated studies in other domains objectively assess cognitive load without considering its effects on eating behaviour. This study aims to explore the potential for developing an integrated eating behaviour assistant system that incorporates cognitive load factors. Two experimental sessions were conducted using custom-developed experimentation software to induce different stimuli. During these sessions, we collected 30 h of physiological, food consumption, and affective state questionnaire data to automatically detect cognitive load and analyse its effect on food choice. Utilising grid search optimisation and leave-one-subject-out cross-validation, a support vector machine model achieved a mean classification accuracy of 85.12% for the two cognitive load tasks using eight relevant features. Statistical analysis was performed on the calorie consumption and questionnaire data. Furthermore, 75% of the subjects with higher negative affect significantly increased consumption of specific foods after high-cognitive-load tasks. These findings offer insights into the intricate relationship between cognitive load, affective states, and food choice, paving the way for an eating behaviour assistant system to manage food choices during cognitive load. Future research should enhance system capabilities and explore real-world applications.
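The leave-one-subject-out cross-validation used above can be sketched with a small generator that holds out all samples of one subject per fold, so the classifier is never evaluated on a subject it has seen during training. This is a generic illustration (equivalent in spirit to scikit-learn's `LeaveOneGroupOut`), not the study's actual code.

```python
import numpy as np

def leave_one_subject_out(subject_ids):
    """Yield (train_idx, test_idx) pairs where each fold holds out all
    samples belonging to one subject, preventing subject-specific
    information from leaking into the test fold."""
    ids = np.asarray(subject_ids)
    for s in np.unique(ids):
        test = np.where(ids == s)[0]
        train = np.where(ids != s)[0]
        yield train, test
```

A grid search over SVM hyperparameters would then be run inside each fold, and the per-fold accuracies averaged to obtain the reported mean accuracy.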
The ZuCo benchmark on cross-subject reading task classification with EEG and eye-tracking data
(2023)
We present a new machine learning benchmark for reading task classification with the goal of advancing EEG and eye-tracking research at the intersection between computational language processing and cognitive neuroscience. The benchmark task consists of a cross-subject classification to distinguish between two reading paradigms: normal reading and task-specific reading. The data for the benchmark is based on the Zurich Cognitive Language Processing Corpus (ZuCo 2.0), which provides simultaneous eye-tracking and EEG signals from natural reading of English sentences. The training dataset is publicly available, and we present a newly recorded hidden testset. We provide multiple solid baseline methods for this task and discuss future improvements. We release our code and provide an easy-to-use interface to evaluate new approaches with an accompanying public leaderboard.
Leaf area index (LAI) is a key variable in understanding and modeling crop-environment interactions.
With the advent of increasingly higher spatial resolution satellites and sensors mounted on remotely piloted aircrafts (RPAs), the use of remote sensing in precision agriculture is becoming more common.
Since the availability of methods to retrieve LAI from image data has also drastically expanded, it is necessary to test as many methods as possible simultaneously to understand the advantages and disadvantages of each approach.
Ground-based LAI data from three years of barley experiments were related to remote sensing information using vegetation indices (VI), machine learning (ML) and radiative transfer models (RTM), to assess the relative accuracy and efficacy of these methods.
The Optimized Soil-Adjusted Vegetation Index and a modified version of the Weighted Difference Vegetation Index performed slightly better than any other retrieval method. However, all methods yielded coefficients of determination of around 0.7 to 0.9.
The best performing machine learning algorithms achieved higher accuracies when four Sentinel-2 bands instead of 12 were used.
Also, the good performance of the VIs and the satisfactory performance of the 4-band RTM strongly support the synergistic use of satellites and RPAs in precision agriculture. One of the methods used, Sen2-Agri, an open-source ML-RTM-based operational system, was also able to retrieve LAI accurately, although it is restricted to Sentinel-2 and Landsat data.
This study shows the benefits of testing simultaneously a broad range of retrieval methods to monitor crops for precision agriculture.
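The best-performing index mentioned above, the Optimized Soil-Adjusted Vegetation Index (Rondeaux et al., 1996), is straightforward to compute from red and near-infrared surface reflectances; a minimal sketch, assuming reflectances scaled to [0, 1]:

```python
import numpy as np

def osavi(nir, red):
    """Optimized Soil-Adjusted Vegetation Index:
    OSAVI = (NIR - Red) / (NIR + Red + 0.16).
    The 0.16 term reduces soil-background sensitivity at low canopy cover."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    return (nir - red) / (nir + red + 0.16)
```

An empirical LAI retrieval would then regress ground-measured LAI against such index values; the regression form used in the study is not reproduced here.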
In the context of persistent images of self-perpetuating technologies, we discuss the interplay of digital technologies and organisational dynamics against the backdrop of systems theory. Building on the case of an international corporation that, during an agile reorganisation, introduced an AI-based personnel management platform, we show how technical systems produce a form of algorithmic contingency that subsequently leads to the emergence of formal and informal interaction systems. Using the concept of datafication, we explain how these interactions act as barriers to the self-perpetuation of data-based decision-making, making it possible to take further decision factors into consideration and to complement the output of the platform. The research was carried out within the scope of the research project ‘Organisational Implications of Digitalisation: The Development of (Post-)Bureaucratic Organisational Structures in the Context of Digital Transformation’ funded by the German Research Foundation (DFG).
Nowadays, production planning and control must cope with mass customization, increased fluctuations in demand, and high competitive pressure. Despite prevailing market risks, planning accuracy and increased adaptability in the event of disruptions or failures must be ensured, while simultaneously optimizing key process indicators. To manage this complex task, neural networks that can process large quantities of high-dimensional data in real time have been widely adopted in recent years. Although these are already extensively deployed in production systems, a systematic review of applications and implemented agent embeddings and architectures has not yet been conducted. The main contribution of this paper is to provide researchers and practitioners with an overview of applications and applied embeddings and to motivate further research in neural agent-based production. Findings indicate that neural agents are not only deployed in diverse applications, but are also increasingly implemented in multi-agent environments or in combination with conventional methods, improving performance relative to benchmarks and reducing dependence on human experience. This implies not only a more sophisticated focus on distributed production resources, but also a broadening of the perspective from a local to a global scale. Nevertheless, future research must further increase scalability and reproducibility to guarantee a simplified transfer of results to reality.