TY - JOUR A1 - Fernandez-Palomino, Carlos Antonio A1 - Hattermann, Fred A1 - Krysanova, Valentina A1 - Lobanova, Anastasia A1 - Vega-Jacome, Fiorella A1 - Lavado, Waldo A1 - Santini, William A1 - Aybar, Cesar A1 - Bronstert, Axel T1 - A novel high-resolution gridded precipitation dataset for Peruvian and Ecuadorian watersheds BT - development and hydrological evaluation JF - Journal of hydrometeorology N2 - A novel approach for estimating precipitation patterns is developed here and applied to generate a new hydrologically corrected daily precipitation dataset, called RAIN4PE (Rain for Peru and Ecuador), at 0.1° spatial resolution for the period 1981-2015 covering Peru and Ecuador. It is based on the application of 1) the random forest method to merge multisource precipitation estimates (gauge, satellite, and reanalysis) with terrain elevation, and 2) observed and modeled streamflow data to first detect biases and second further adjust gridded precipitation by inversely applying the simulated results of the ecohydrological model SWAT (Soil and Water Assessment Tool). Hydrological results using RAIN4PE as input for the Peruvian and Ecuadorian catchments were compared against the ones when feeding other uncorrected (CHIRP and ERA5) and gauge-corrected (CHIRPS, MSWEP, and PISCO) precipitation datasets into the model. For that, SWAT was calibrated and validated at 72 river sections for each dataset using a range of performance metrics, including hydrograph goodness of fit and flow duration curve signatures. Results showed that gauge-corrected precipitation datasets outperformed uncorrected ones for streamflow simulation. However, CHIRPS, MSWEP, and PISCO showed limitations for streamflow simulation in several catchments draining into the Pacific Ocean and the Amazon River. RAIN4PE provided the best overall performance for streamflow simulation, including flow variability (low, high, and peak flows) and water budget closure.
The overall good performance of RAIN4PE as input for hydrological modeling provides a valuable criterion of its applicability for robust countrywide hydrometeorological applications, including hydroclimatic extremes such as droughts and floods. Significance Statement: We developed a novel precipitation dataset RAIN4PE for Peru and Ecuador by merging multisource precipitation data (satellite, reanalysis, and ground-based precipitation) with terrain elevation using the random forest method. Furthermore, RAIN4PE was hydrologically corrected using streamflow data in watersheds with precipitation underestimation through reverse hydrology. The results of a comprehensive hydrological evaluation showed that RAIN4PE outperformed state-of-the-art precipitation datasets such as CHIRP, ERA5, CHIRPS, MSWEP, and PISCO in terms of daily and monthly streamflow simulations, including extremely low and high flows in almost all Peruvian and Ecuadorian catchments. This underlines the suitability of RAIN4PE for hydrometeorological applications in this region. Furthermore, our approach for the generation of RAIN4PE can be used in other data-scarce regions. KW - Amazon region KW - Complex terrain KW - South America KW - Streamflow KW - Precipitation KW - Hydrology KW - Water budget / balance KW - Inverse methods KW - Mountain meteorology KW - Machine learning Y1 - 2022 U6 - https://doi.org/10.1175/JHM-D-20-0285.1 SN - 1525-755X SN - 1525-7541 VL - 23 IS - 3 SP - 309 EP - 336 PB - American Meteorological Soc.
CY - Boston ER - TY - JOUR A1 - Chen, Junchao A1 - Lange, Thomas A1 - Andjelkovic, Marko A1 - Simevski, Aleksandar A1 - Lu, Li A1 - Krstić, Miloš T1 - Solar particle event and single event upset prediction from SRAM-based monitor and supervised machine learning JF - IEEE transactions on emerging topics in computing / IEEE Computer Society, Institute of Electrical and Electronics Engineers N2 - The intensity of cosmic radiation may differ over five orders of magnitude within a few hours or days during the Solar Particle Events (SPEs), thus increasing for several orders of magnitude the probability of Single Event Upsets (SEUs) in space-borne electronic systems. Therefore, it is vital to enable the early detection of the SEU rate changes in order to ensure timely activation of dynamic radiation hardening measures. In this paper, an embedded approach for the prediction of SPEs and SRAM SEU rate is presented. The proposed solution combines the real-time SRAM-based SEU monitor, the offline-trained machine learning model and online learning algorithm for the prediction. With respect to the state-of-the-art, our solution brings the following benefits: (1) Use of existing on-chip data storage SRAM as a particle detector, thus minimizing the hardware and power overhead, (2) Prediction of SRAM SEU rate one hour in advance, with the fine-grained hourly tracking of SEU variations during SPEs as well as under normal conditions, (3) Online optimization of the prediction model for enhancing the prediction accuracy during run-time, (4) Negligible cost of hardware accelerator design for the implementation of selected machine learning model and online learning algorithm. The proposed design is intended for a highly dependable and self-adaptive multiprocessing system employed in space applications, allowing to trigger the radiation mitigation mechanisms before the onset of high radiation levels. 
KW - Machine learning KW - Single event upsets KW - Random access memory KW - monitoring KW - machine learning algorithms KW - predictive models KW - space missions KW - solar particle event KW - single event upset KW - machine learning KW - online learning KW - hardware accelerator KW - reliability KW - self-adaptive multiprocessing system Y1 - 2022 U6 - https://doi.org/10.1109/TETC.2022.3147376 SN - 2168-6750 VL - 10 IS - 2 SP - 564 EP - 580 PB - Institute of Electrical and Electronics Engineers CY - [New York, NY] ER - TY - JOUR A1 - Gautam, Khem Raj A1 - Zhang, Guoqiang A1 - Landwehr, Niels A1 - Adolphs, Julian T1 - Machine learning for improvement of thermal conditions inside a hybrid ventilated animal building JF - Computers and electronics in agriculture : COMPAG online ; an international journal N2 - In buildings with hybrid ventilation, natural ventilation opening positions (windows), mechanical ventilation rates, heating, and cooling are manipulated to maintain desired thermal conditions. The indoor temperature is regulated solely by ventilation (natural and mechanical) when the external conditions are favorable to save external heating and cooling energy. The ventilation parameters are determined by a rule-based control scheme, which is not optimal. This study proposes a methodology to enable real-time optimum control of ventilation parameters. We developed offline prediction models to estimate future thermal conditions from the data collected from building in operation. The developed offline model is then used to find the optimal controllable ventilation parameters in real-time to minimize the setpoint deviation in the building. With the proposed methodology, the experimental building's setpoint deviation improved for 87% of time, on average, by 0.53 °C compared to the current deviations.
KW - Animal building KW - Natural ventilation KW - Automatically controlled windows KW - Machine learning KW - Optimization Y1 - 2021 U6 - https://doi.org/10.1016/j.compag.2021.106259 SN - 0168-1699 SN - 1872-7107 VL - 187 PB - Elsevier Science CY - Amsterdam [u.a.] ER - TY - JOUR A1 - Lischeid, Gunnar A1 - Webber, Heidi A1 - Sommer, Michael A1 - Nendel, Claas A1 - Ewert, Frank T1 - Machine learning in crop yield modelling BT - A powerful tool, but no surrogate for science JF - Agricultural and forest meteorology N2 - Provisioning a sufficient stable source of food requires sound knowledge about current and upcoming threats to agricultural production. To that end machine learning approaches were used to identify the prevailing climatic and soil hydrological drivers of spatial and temporal yield variability of four crops, comprising 40 years yield data each from 351 counties in Germany. Effects of progress in agricultural management and breeding were subtracted from the data prior the machine learning modelling by fitting smooth non-linear trends to the 95th percentiles of observed yield data. An extensive feature selection approach was followed then to identify the most relevant predictors out of a large set of candidate predictors, comprising various soil and meteorological data. Particular emphasis was placed on studying the uniqueness of identified key predictors. Random Forest and Support Vector Machine models yielded similar although not identical results, capturing between 50% and 70% of the spatial and temporal variance of silage maize, winter barley, winter rapeseed and winter wheat yield. Equally good performance could be achieved with different sets of predictors. Thus identification of the most reliable models could not be based on the outcome of the model study only but required expert's judgement. Relationships between drivers and response often exhibited optimum curves, especially for summer air temperature and precipitation. 
In contrast, soil moisture clearly proved less relevant compared to meteorological drivers. In view of the expected climate change both excess precipitation and the excess heat effect deserve more attention in breeding as well as in crop modelling. KW - Crop modelling KW - Machine learning KW - Random forests KW - Support vector machine KW - Feature selection KW - Equivocality Y1 - 2021 U6 - https://doi.org/10.1016/j.agrformet.2021.108698 SN - 0168-1923 SN - 1873-2240 VL - 312 PB - Elsevier CY - Amsterdam ER - TY - JOUR A1 - Fournier, Bertrand A1 - Steiner, Magdalena A1 - Brochet, Xavier A1 - Degrune, Florine A1 - Mammeri, Jibril A1 - Carvalho, Diogo Leite A1 - Siliceo, Sara Leal A1 - Bacher, Sven A1 - Peña-Reyes, Carlos Andrés A1 - Heger, Thierry Jean T1 - Toward the use of protists as bioindicators of multiple stresses in agricultural soils BT - a case study in vineyard ecosystems JF - Ecological indicators : integrating monitoring, assessment and management N2 - Management of agricultural soil quality requires fast and cost-efficient methods to identify multiple stressors that can affect soil organisms and associated ecological processes. Here, we propose to use soil protists which have a great yet poorly explored potential for bioindication. They are ubiquitous, highly diverse, and respond to various stresses to agricultural soils caused by frequent management or environmental changes. We test an approach that combines metabarcoding data and machine learning algorithms to identify potential stressors of soil protist community composition and diversity. We measured 17 key variables that reflect various potential stresses on soil protists across 132 plots in 28 Swiss vineyards over 2 years. We identified the taxa showing strong responses to the selected soil variables (potential bioindicator taxa) and tested for their predictive power.
Changes in protist taxa occurrence and, to a lesser extent, diversity metrics exhibited great predictive power for the considered soil variables. Soil copper concentration, moisture, pH, and basal respiration were the best predicted soil variables, suggesting that protists are particularly responsive to stresses caused by these variables. The most responsive taxa were found within the clades Rhizaria and Alveolata. Our results also reveal that a majority of the potential bioindicators identified in this study can be used across years, in different regions and across different grape varieties. Altogether, soil protist metabarcoding data combined with machine learning can help identifying specific abiotic stresses on microbial communities caused by agricultural management. Such an approach provides complementary information to existing soil monitoring tools that can help manage the impact of agricultural practices on soil biodiversity and quality. KW - Biomonitoring KW - Machine learning KW - Predictive model KW - Soil function KW - Soil quality KW - Microbial ecology Y1 - 2022 U6 - https://doi.org/10.1016/j.ecolind.2022.108955 SN - 1470-160X SN - 1872-7034 VL - 139 PB - Elsevier CY - Amsterdam ER - TY - JOUR A1 - Baerenzung, Julien A1 - Holschneider, Matthias A1 - Wicht, Johannes A1 - Lesur, Vincent A1 - Sanchez, Sabrina T1 - The Kalmag model as a candidate for IGRF-13 JF - Earth, planets and space N2 - We present a new model of the geomagnetic field spanning the last 20 years and called Kalmag. Deriving from the assimilation of CHAMP and Swarm vector field measurements, it separates the different contributions to the observable field through parameterized prior covariance matrices. To make the inverse problem numerically feasible, it has been sequentialized in time through the combination of a Kalman filter and a smoothing algorithm. The model provides reliable estimates of past, present and future mean fields and associated uncertainties.
The version presented here is an update of our IGRF candidates; the amount of assimilated data has been doubled and the considered time window has been extended from [2000.5, 2019.74] to [2000.5, 2020.33]. KW - Geomagnetic field KW - Secular variation KW - Assimilation KW - Kalman filter KW - Machine learning Y1 - 2020 U6 - https://doi.org/10.1186/s40623-020-01295-y SN - 1880-5981 VL - 72 IS - 1 PB - Springer CY - New York ER - TY - JOUR A1 - Panzer, Marcel A1 - Bender, Benedict T1 - Deep reinforcement learning in production systems BT - a systematic literature review JF - International Journal of Production Research N2 - Shortening product development cycles and fully customizable products pose major challenges for production systems. These not only have to cope with an increased product diversity but also enable high throughputs and provide a high adaptability and robustness to process variations and unforeseen incidents. To overcome these challenges, deep Reinforcement Learning (RL) has been increasingly applied for the optimization of production systems. Unlike other machine learning methods, deep RL operates on recently collected sensor-data in direct interaction with its environment and enables real-time responses to system changes. Although deep RL is already being deployed in production systems, a systematic review of the results has not yet been established. The main contribution of this paper is to provide researchers and practitioners an overview of applications and to motivate further implementations and research of deep RL supported production systems. Findings reveal that deep RL is applied in a variety of production domains, contributing to data-driven and flexible processes. In most applications, conventional methods were outperformed and implementation efforts or dependence on human experience were reduced. 
Nevertheless, future research must focus more on transferring the findings to real-world systems to analyze safety aspects and demonstrate reliability under prevailing conditions. KW - Machine learning KW - reinforcement learning KW - production control KW - production planning KW - manufacturing processes KW - systematic literature review Y1 - 2021 U6 - https://doi.org/10.1080/00207543.2021.1973138 SN - 1366-588X SN - 0020-7543 VL - 60 IS - 13 PB - Taylor & Francis CY - London ER - TY - JOUR A1 - Sapegin, Andrey A1 - Jaeger, David A1 - Cheng, Feng A1 - Meinel, Christoph T1 - Towards a system for complex analysis of security events in large-scale networks JF - Computers & security : the international journal devoted to the study of the technical and managerial aspects of computer security N2 - After almost two decades of development, modern Security Information and Event Management (SIEM) systems still face issues with normalisation of heterogeneous data sources, high number of false positive alerts and long analysis times, especially in large-scale networks with high volumes of security events. In this paper, we present our own prototype of SIEM system, which is capable of dealing with these issues. For efficient data processing, our system employs in-memory data storage (SAP HANA) and our own technologies from the previous work, such as the Object Log Format (OLF) and high-speed event normalisation. We analyse normalised data using a combination of three different approaches for security analysis: misuse detection, query-based analytics, and anomaly detection. Compared to the previous work, we have significantly improved our unsupervised anomaly detection algorithms. Most importantly, we have developed a novel hybrid outlier detection algorithm that returns ranked clusters of anomalies. It lets an operator of a SIEM system to concentrate on the several top-ranked anomalies, instead of digging through an unsorted bundle of suspicious events.
We propose to use anomaly detection in a combination with signatures and queries, applied on the same data, rather than as a full replacement for misuse detection. In this case, the majority of attacks will be captured with misuse detection, whereas anomaly detection will highlight previously unknown behaviour or attacks. We also propose that only the most suspicious event clusters need to be checked by an operator, whereas other anomalies, including false positive alerts, do not need to be explicitly checked if they have a lower ranking. We have proved our concepts and algorithms on a dataset of 160 million events from a network segment of a big multinational company and suggest that our approach and methods are highly relevant for modern SIEM systems. KW - Intrusion detection KW - SAP HANA KW - In-memory KW - Security KW - Machine learning KW - Anomaly detection KW - Outlier detection Y1 - 2017 U6 - https://doi.org/10.1016/j.cose.2017.02.001 SN - 0167-4048 SN - 1872-6208 VL - 67 SP - 16 EP - 34 PB - Elsevier Science CY - Oxford ER - TY - JOUR A1 - Haupt, Johannes A1 - Bender, Benedict A1 - Fabian, Benjamin A1 - Lessmann, Stefan T1 - Robust identification of email tracking BT - a machine learning approach JF - European Journal of Operational Research N2 - Email tracking allows email senders to collect fine-grained behavior and location data on email recipients, who are uniquely identifiable via their email address. Such tracking invades user privacy in that email tracking techniques gather data without user consent or awareness. Striving to increase privacy in email communication, this paper develops a detection engine to be the core of a selective tracking blocking mechanism in the form of three contributions. First, a large collection of email newsletters is analyzed to show the wide usage of tracking over different countries, industries and time. 
Second, we propose a set of features geared towards the identification of tracking images under real-world conditions. Novel features are devised to be computationally feasible and efficient, generalizable and resilient towards changes in tracking infrastructure. Third, we test the predictive power of these features in a benchmarking experiment using a selection of state-of-the-art classifiers to clarify the effectiveness of model-based tracking identification. We evaluate the expected accuracy of the approach on out-of-sample data, over increasing periods of time, and when faced with unknown senders. KW - Analytics KW - Data privacy KW - Email tracking KW - Machine learning Y1 - 2018 U6 - https://doi.org/10.1016/j.ejor.2018.05.018 SN - 0377-2217 SN - 1872-6860 VL - 271 IS - 1 SP - 341 EP - 356 PB - Elsevier CY - Amsterdam ER - TY - JOUR A1 - Prasse, Paul A1 - Knaebel, Rene A1 - Machlica, Lukas A1 - Pevny, Tomas A1 - Scheffer, Tobias T1 - Joint detection of malicious domains and infected clients JF - Machine learning N2 - Detection of malware-infected computers and detection of malicious web domains based on their encrypted HTTPS traffic are challenging problems, because only addresses, timestamps, and data volumes are observable. The detection problems are coupled, because infected clients tend to interact with malicious domains. Traffic data can be collected at a large scale, and antivirus tools can be used to identify infected clients in retrospect. Domains, by contrast, have to be labeled individually after forensic analysis. We explore transfer learning based on sluice networks; this allows the detection models to bootstrap each other. In a large-scale experimental study, we find that the model outperforms known reference models and detects previously unknown malware, previously unknown malware families, and previously unknown malicious domains.
KW - Machine learning KW - Neural networks KW - Computer security KW - Traffic data KW - Https traffic Y1 - 2019 U6 - https://doi.org/10.1007/s10994-019-05789-z SN - 0885-6125 SN - 1573-0565 VL - 108 IS - 8-9 SP - 1353 EP - 1368 PB - Springer CY - Dordrecht ER -