TY  - JOUR
A1  - Andres, Maximilian
A1  - Bruttel, Lisa
A1  - Friedrichsen, Jana
T1  - How communication makes the difference between a cartel and tacit collusion
BT  - a machine learning approach
JF  - European economic review
N2  - This paper sheds new light on the role of communication for cartel formation. Using machine learning to evaluate free-form chat communication among firms in a laboratory experiment, we identify typical communication patterns for both explicit cartel formation and indirect attempts to collude tacitly. We document that firms are less likely to communicate explicitly about price fixing and more likely to use indirect messages when sanctioning institutions are present. This effect of sanctions on communication reinforces the direct cartel-deterring effect of sanctions as collusion is more difficult to reach and sustain without an explicit agreement. Indirect messages have no, or even a negative, effect on prices.
KW  - cartel
KW  - collusion
KW  - communication
KW  - machine learning
KW  - experiment
Y1  - 2023
U6  - https://doi.org/10.1016/j.euroecorev.2022.104331
SN  - 0014-2921
SN  - 1873-572X
VL  - 152
SP  - 1
EP  - 18
PB  - Elsevier
CY  - Amsterdam
ER  - 
TY  - JOUR
A1  - Ayzel, Georgy
T1  - Deep neural networks in hydrology
BT  - the new generation of universal and efficient models
JF  - Vestnik of Saint Petersburg University. Earth Sciences
N2  - For around a decade, deep learning, the sub-field of machine learning that refers to artificial neural networks comprising many computational layers, has been changing the landscape of statistical model development in many research areas, such as image classification, machine translation, and speech recognition. Geoscientific disciplines in general, and hydrology in particular, have not stood aside from this movement. Recently, modern deep learning techniques and methods have been gaining popularity for solving a wide range of hydrological problems: modeling and forecasting of river runoff, regionalization of hydrological model parameters, assessment of available water resources, and identification of the main drivers of recent changes in water balance components. This growing popularity of deep neural networks is primarily due to their high universality and efficiency. These qualities, together with the rapidly growing amount of accumulated environmental information and the increasing availability of computing facilities and resources, allow us to speak of deep neural networks as a new generation of mathematical models designed, if not to replace existing solutions, then to significantly enrich the field of geophysical process modeling. This paper provides a brief overview of the current state of the development and application of deep neural networks in hydrology. The study also provides a qualitative long-term forecast of the development of deep learning technology for hydrological modeling challenges, based on the Gartner Hype Cycle, which describes the life cycle of modern technologies in general terms.
KW  - deep neural networks
KW  - deep learning
KW  - machine learning
KW  - hydrology
KW  - modeling
Y1  - 2021
U6  - https://doi.org/10.21638/spbu07.2021.101
SN  - 2541-9668
SN  - 2587-585X
VL  - 66
IS  - 1
SP  - 5
EP  - 18
PB  - Univ. Press
CY  - St. Petersburg
ER  - 
TY  - JOUR
A1  - Ayzel, Georgy
A1  - Izhitskiy, Alexander
T1  - Climate Change Impact Assessment on Freshwater Inflow into the Small Aral Sea
JF  - Water
N2  - During the last few decades, the rapid separation of the Small Aral Sea from the isolated basin has tremendously changed its hydrological and ecological conditions. In the present study, we developed and validated a hybrid model for the Syr Darya River basin based on a combination of state-of-the-art hydrological and machine learning models. The climate change impact on freshwater inflow into the Small Aral Sea for the projection period 2007-2099 has been quantified using the developed hybrid model and bias-corrected, downscaled meteorological projections simulated by four General Circulation Models (GCMs) for each of three Representative Concentration Pathway (RCP) scenarios. The developed hybrid model reliably simulates freshwater inflow for the historical period, with a Nash-Sutcliffe efficiency of 0.72 and a Kling-Gupta efficiency of 0.77. The results of the climate change impact assessment showed that the freshwater inflow projections produced by different GCMs are misleading, as they provide contradictory results for the projection period. However, we found that relative runoff changes are expected to be more pronounced under more aggressive RCP scenarios. The simulated projections of freshwater inflow provide a basis for further assessment of climate change impacts on the hydrological and ecological conditions of the Small Aral Sea in the 21st century.
KW  - Small Aral Sea
KW  - hydrology
KW  - climate change
KW  - modeling
KW  - machine learning
Y1  - 2019
U6  - https://doi.org/10.3390/w11112377
SN  - 2073-4441
VL  - 11
IS  - 11
PB  - MDPI
CY  - Basel
ER  - 
TY  - JOUR
A1  - Baumgart, Lene
A1  - Boos, Pauline
A1  - Eckstein, Bernd
T1  - Datafication and algorithmic contingency
BT  - how agile organisations deal with technical systems
JF  - Work organisation, labour & globalisation
N2  - In the context of persistent images of self-perpetuated technologies, we discuss the interplay of digital technologies and organisational dynamics against the backdrop of systems theory. Building on the case of an international corporation that introduced an AI-based personnel management platform during an agile reorganisation, we show how technical systems produce a form of algorithmic contingency that subsequently leads to the emergence of formal and informal interaction systems. Using the concept of datafication, we explain how these interactions act as barriers to the self-perpetuation of data-based decision-making, making it possible to take further decision factors into consideration and to complement the output of the platform. The research was carried out within the scope of the research project ‘Organisational Implications of Digitalisation: The Development of (Post-)Bureaucratic Organisational Structures in the Context of Digital Transformation’ funded by the German Research Foundation (DFG).
KW  - digitalisation
KW  - datafication
KW  - organisation
KW  - agile
KW  - technical system
KW  - systems theory
KW  - interaction
KW  - algorithmic contingency
KW  - machine learning
KW  - platform
Y1  - 2023
U6  - https://doi.org/10.13169/workorgalaboglob.17.1.0061
SN  - 1745-641X
SN  - 1745-6428
VL  - 17
IS  - 1
SP  - 61
EP  - 73
PB  - Pluto Journals
CY  - London
ER  - 
TY  - JOUR
A1  - Bornhorst, Julia
A1  - Nustede, Eike Jannik
A1  - Fudickar, Sebastian
T1  - Mass Surveillance of C. elegans: Smartphone-Based DIY Microscope and Machine-Learning-Based Approach for Worm Detection
JF  - Sensors
N2  - The nematode Caenorhabditis elegans (C. elegans) is often used as an alternative animal model due to several advantages, such as morphological changes that can be observed directly under a microscope. Limitations of the model include the need for expensive and cumbersome microscopes and restrictions on the comprehensive use of C. elegans in toxicological trials. Given that C. elegans can generally be detected in microscope images via machine learning, and that smartphone-based microscopes are available, this article investigates the suitability of smartphone-based microscopy for detecting C. elegans in a complete Petri dish. To this end, the article introduces a smartphone-based microscope (including optics, lighting, and housing) for monitoring C. elegans, together with a Support Vector Machine trained on Histogram of Oriented Gradients (HOG) features for the automatic detection of C. elegans. The evaluation showed a classification sensitivity of 0.90 and a specificity of 0.85, confirming the general practicability of the chosen approach.
KW  - Caenorhabditis elegans
KW  - machine learning
KW  - smartphone
KW  - microscope
KW  - SVM
KW  - HOG
Y1  - 2019
U6  - https://doi.org/10.3390/s19061468
SN  - 1424-8220
VL  - 19
IS  - 6
PB  - MDPI
CY  - Basel
ER  - 
TY  - JOUR
A1  - Brandes, Stefanie
A1  - Sicks, Florian
A1  - Berger, Anne
T1  - Behaviour classification on giraffes (Giraffa camelopardalis) using machine learning algorithms on triaxial acceleration data of two commonly used GPS devices and its possible application for their management and conservation
JF  - Sensors
N2  - Averting today's loss of biodiversity and ecosystem services can be achieved through conservation efforts, especially of keystone species. Giraffes (Giraffa camelopardalis) play an important role in sustaining Africa's ecosystems, but have been listed as 'vulnerable' on the IUCN Red List since 2016. Monitoring an animal's behavior in the wild helps to develop and assess its conservation management. One mechanism for remotely tracking wildlife behavior is to attach accelerometers to animals to record their body movement. We tested two different commercially available high-resolution accelerometers, e-obs and Africa Wildlife Tracking (AWT), attached to the top of the heads of three captive giraffes, and analyzed the accuracy of automatic behavior classifications, focusing on the Random Forests algorithm. For both accelerometers, behaviors with less variety in head and neck movements could be predicted better (e.g., feeding above eye level, mean prediction accuracy e-obs/AWT: 97.6%/99.7%; drinking: 96.7%/97.0%) than those with a greater variety of body postures (such as standing: 90.7-91.0%/75.2-76.7%; rumination: 89.6-91.6%/53.5-86.5%). Nonetheless, both devices come with limitations, and the AWT in particular needs technological adaptations before being applied to animals in the wild. Nevertheless, judging by the prediction results, both are promising accelerometers for the behavioral classification of giraffes. Therefore, when applied to free-ranging animals in combination with GPS tracking, these devices can contribute greatly to the conservation of giraffes.
KW  - giraffe
KW  - triaxial acceleration
KW  - machine learning
KW  - random forests
KW  - behavior classification
KW  - giraffe conservation
Y1  - 2021
U6  - https://doi.org/10.3390/s21062229
SN  - 1424-8220
VL  - 21
IS  - 6
PB  - MDPI
CY  - Basel
ER  - 
TY  - JOUR
A1  - Ceulemans, Ruben
A1  - Guill, Christian
A1  - Gaedke, Ursula
T1  - Top predators govern multitrophic diversity effects in tritrophic food webs
JF  - Ecology : a publication of the Ecological Society of America
N2  - It is well known that functional diversity strongly affects ecosystem functioning. However, even in rather simple model communities consisting of only two or, at best, three trophic levels, the relationship between multitrophic functional diversity and ecosystem functioning appears difficult to generalize because of its high contextuality. In this study, we considered several differently structured tritrophic food webs in which the amount of functional diversity was varied independently on each trophic level. To achieve generalizable results, largely independent of parametrization, we examined the outcomes of 128,000 parameter combinations sampled from ecologically plausible intervals, each tested for 200 randomly sampled initial conditions. We analyzed our data by training a random forest model. This method enables the identification of complex patterns in the data through partial dependence graphs, and the comparison of the relative influence of model parameters, including the degree of diversity, on food-web properties. We found that bottom-up and top-down effects cascade simultaneously throughout the food web, intimately linking the effects of functional diversity at any trophic level to the amount of diversity at other trophic levels, which may explain the difficulty in unifying results from previous studies. Strikingly, only with high diversity throughout the whole food web do different interactions synergize to ensure efficient exploitation of the available nutrients and efficient biomass transfer to higher trophic levels, ultimately leading to high biomass and production at the top level. The temporal variation of biomass showed a more complex pattern with increasing multitrophic diversity: while the system initially became less variable, the temporal variation eventually rose again because of increasingly complex dynamical patterns.
Importantly, top predator diversity and food-web parameters affecting the top trophic level were of highest importance to determine the biomass and temporal variability of any trophic level. Overall, our study reveals that the mechanisms by which diversity influences ecosystem functioning are affected by every part of the food web, hampering the extrapolation of insights from simple monotrophic or bitrophic systems to complex natural food webs.
KW  - food-web efficiency
KW  - functional diversity
KW  - machine learning
KW  - nutrient exploitation
KW  - production
KW  - random forest
KW  - temporal variability
KW  - top predator
KW  - trait diversity
Y1  - 2021
U6  - https://doi.org/10.1002/ecy.3379
SN  - 0012-9658
SN  - 1939-9170
VL  - 102
IS  - 7
PB  - Wiley
CY  - Hoboken
ER  - 
TY  - JOUR
A1  - Chen, Junchao
A1  - Lange, Thomas
A1  - Andjelkovic, Marko
A1  - Simevski, Aleksandar
A1  - Lu, Li
A1  - Krstić, Miloš
T1  - Solar particle event and single event upset prediction from SRAM-based monitor and supervised machine learning
JF  - IEEE transactions on emerging topics in computing / IEEE Computer Society, Institute of Electrical and Electronics Engineers
N2  - The intensity of cosmic radiation may vary over five orders of magnitude within a few hours or days during Solar Particle Events (SPEs), thus increasing the probability of Single Event Upsets (SEUs) in space-borne electronic systems by several orders of magnitude. Therefore, it is vital to detect changes in the SEU rate early in order to ensure timely activation of dynamic radiation hardening measures. In this paper, an embedded approach for predicting SPEs and the SRAM SEU rate is presented. The proposed solution combines a real-time SRAM-based SEU monitor, an offline-trained machine learning model, and an online learning algorithm for the prediction. Compared with the state of the art, our solution brings the following benefits: (1) use of the existing on-chip data storage SRAM as a particle detector, thus minimizing the hardware and power overhead; (2) prediction of the SRAM SEU rate one hour in advance, with fine-grained hourly tracking of SEU variations during SPEs as well as under normal conditions; (3) online optimization of the prediction model to enhance prediction accuracy at run time; (4) negligible cost of the hardware accelerator design implementing the selected machine learning model and online learning algorithm. The proposed design is intended for a highly dependable and self-adaptive multiprocessing system employed in space applications, allowing the radiation mitigation mechanisms to be triggered before the onset of high radiation levels.
KW  - Machine learning
KW  - Single event upsets
KW  - Random access memory
KW  - monitoring
KW  - machine learning algorithms
KW  - predictive models
KW  - space missions
KW  - solar particle event
KW  - single event upset
KW  - online learning
KW  - hardware accelerator
KW  - reliability
KW  - self-adaptive multiprocessing system
Y1  - 2022
U6  - https://doi.org/10.1109/TETC.2022.3147376
SN  - 2168-6750
VL  - 10
IS  - 2
SP  - 564
EP  - 580
PB  - Institute of Electrical and Electronics Engineers
CY  - [New York, NY]
ER  - 
TY  - JOUR
A1  - Cope, Justin L.
A1  - Baukmann, Hannes A.
A1  - Klinger, Jörn E.
A1  - Ravarani, Charles N. J.
A1  - Böttinger, Erwin
A1  - Konigorski, Stefan
A1  - Schmidt, Marco F.
T1  - Interaction-based feature selection algorithm outperforms polygenic risk score in predicting Parkinson’s Disease status
JF  - Frontiers in genetics
N2  - Polygenic risk scores (PRS) aggregating results from genome-wide association studies are the state of the art in predicting susceptibility to complex traits or diseases, yet their predictive performance is limited for various reasons, not least their failure to incorporate the effects of gene-gene interactions. Novel machine learning algorithms that use large amounts of data promise to find gene-gene interactions in order to build models with better predictive performance than PRS. Here, we present a data preprocessing step that uses data mining of contextual information to reduce the number of features, enabling machine learning algorithms to identify gene-gene interactions. We applied our approach to the Parkinson's Progression Markers Initiative (PPMI) dataset, an observational clinical study of 471 genotyped subjects (368 cases and 152 controls). With an AUC of 0.85 (95% CI = [0.72; 0.96]), the interaction-based prediction model outperforms the PRS (AUC of 0.58 (95% CI = [0.42; 0.81])). Furthermore, feature importance analysis of the model provided insights into the mechanism of Parkinson's disease. For instance, the model revealed an interaction between the previously described drug target candidate genes TMEM175 and GAPDHP25. These results demonstrate that interaction-based machine learning models can improve genetic prediction models and might provide an answer to the missing heritability problem.
KW  - epistasis
KW  - machine learning
KW  - feature selection
KW  - Parkinson's disease
KW  - PPMI (Parkinson's Progression Markers Initiative)
Y1  - 2021
U6  - https://doi.org/10.3389/fgene.2021.744557
SN  - 1664-8021
VL  - 12
PB  - Frontiers Media
CY  - Lausanne
ER  - 
TY  - JOUR
A1  - Döllner, Jürgen Roland Friedrich
T1  - Geospatial artificial intelligence
BT  - potentials of machine learning for 3D point clouds and geospatial digital twins
JF  - Journal of photogrammetry, remote sensing and geoinformation science : PFG : Photogrammetrie, Fernerkundung, Geoinformation
N2  - Artificial intelligence (AI) is fundamentally changing the way IT solutions are implemented and operated across all application domains, including the geospatial domain. This contribution outlines AI-based techniques for 3D point clouds and geospatial digital twins as generic components of geospatial AI. First, we briefly reflect on the term "AI" and outline the technology developments needed to apply AI to IT solutions, seen from a software engineering perspective. Next, we characterize 3D point clouds as a key category of geodata and describe their role in creating the basis for geospatial digital twins; we explain the feasibility of machine learning (ML) and deep learning (DL) approaches for 3D point clouds. In particular, we argue that 3D point clouds can be seen as a corpus with properties similar to those of natural language corpora, and we formulate a "Naturalness Hypothesis" for 3D point clouds. In the main part, we introduce a workflow for interpreting 3D point clouds based on ML/DL approaches that derive domain-specific and application-specific semantics for 3D point clouds without having to create explicit spatial 3D models or explicit rule sets. Finally, examples show how ML/DL enables us to efficiently build and maintain base data for geospatial digital twins such as virtual 3D city models, indoor models, or building information models.
KW  - geospatial artificial intelligence
KW  - machine learning
KW  - deep learning
KW  - 3D
KW  - point clouds
KW  - geospatial digital twins
KW  - 3D city models
Y1  - 2020
U6  - https://doi.org/10.1007/s41064-020-00102-3
SN  - 2512-2789
SN  - 2512-2819
VL  - 88
IS  - 1
SP  - 15
EP  - 24
PB  - Springer International Publishing
CY  - Cham
ER  - 
TY  - JOUR
A1  - Ebers, Martin
A1  - Hoch, Veronica R. S.
A1  - Rosenkranz, Frank
A1  - Ruschemeier, Hannah
A1  - Steinrötter, Björn
T1  - The European Commission’s proposal for an Artificial Intelligence Act
BT  - a critical assessment by members of the Robotics and AI Law Society (RAILS)
JF  - J : multidisciplinary scientific journal
N2  - On 21 April 2021, the European Commission presented its long-awaited proposal for a Regulation “laying down harmonized rules on Artificial Intelligence”, the so-called “Artificial Intelligence Act” (AIA). This article takes a critical look at the proposed regulation. After an introduction (1), the paper analyzes the unclear preemptive effect of the AIA and EU competences (2), the scope of application (3), the prohibited uses of Artificial Intelligence (AI) (4), the provisions on high-risk AI systems (5), the obligations of providers and users (6), the requirements for AI systems with limited risks (7), the enforcement system (8), the relationship of the AIA with the existing legal framework (9), and the regulatory gaps (10). The last section draws some final conclusions (11).
KW  - artificial intelligence
KW  - machine learning
KW  - European Union
KW  - regulation
KW  - harmonization
KW  - Artificial Intelligence Act
Y1  - 2021
U6  - https://doi.org/10.3390/j4040043
SN  - 2571-8800
VL  - 4
IS  - 4
SP  - 589
EP  - 603
PB  - MDPI
CY  - Basel
ER  - 
TY  - JOUR
A1  - Frommhold, Martin
A1  - Heim, Arend
A1  - Barabanov, Mikhail
A1  - Maier, Franziska
A1  - Mühle, Ralf-Udo
A1  - Smirenski, Sergei M.
A1  - Heim, Wieland
T1  - Breeding habitat and nest-site selection by an obligatory "nest-cleptoparasite", the Amur Falcon Falco amurensis
JF  - Ecology and evolution
N2  - The selection of a nest site is crucial for the successful reproduction of birds. Animals that re-use or occupy nest sites constructed by other species often have limited choice. Little is known about the criteria by which nest-stealing species choose suitable nesting sites and habitats. Here, we analyze breeding-site selection of an obligatory "nest-cleptoparasite", the Amur Falcon Falco amurensis. We collected data on nest sites at Muraviovka Park in the Russian Far East, where the species breeds exclusively in nests of the Eurasian Magpie Pica pica. We sampled 117 Eurasian Magpie nests, 38 of which were occupied by Amur Falcons. Nest-specific variables were assessed, and a recently developed habitat classification map was used to derive landscape metrics. We found that Amur Falcons chose a wide range of nesting sites but significantly preferred nests with a domed roof. Breeding pairs of Eurasian Hobby Falco subbuteo and Eurasian Magpie were often found breeding near the nest at about the same distance as neighboring Amur Falcon pairs. Additionally, the occurrence of the species was positively associated with bare soil cover, forest cover, and shrub patches within their home range, and negatively with the distance to wetlands. Areas of wetlands and fallow land might be used for foraging, since Amur Falcons depend mostly on an insect diet. We also found that rarely burned habitats were preferred. Overall, the effect of landscape variables on the choice of actual nest sites appeared to be rather small. We used different classification methods to predict the probability of occurrence, of which the Random forest method showed the highest accuracy. The areas determined as suitable habitat showed a high concordance with the actual nest locations. We conclude that Amur Falcons prefer to occupy newly built (domed) nests to ensure high nest quality, as well as nests surrounded by available feeding habitats.
KW  - cleptoparasitism
KW  - fire
KW  - habitat use
KW  - machine learning
KW  - magpie
KW  - nest-site selection
KW  - random forest
Y1  - 2019
U6  - https://doi.org/10.1002/ece3.5878
SN  - 2045-7758
VL  - 9
IS  - 24
SP  - 14430
EP  - 14441
PB  - Wiley
CY  - Hoboken
ER  - 
TY  - JOUR
A1  - Garbulowski, Mateusz
A1  - Smolinska, Karolina
A1  - Çabuk, Uğur
A1  - Yones, Sara A.
A1  - Celli, Ludovica
A1  - Yaz, Esma Nur
A1  - Barrenas, Fredrik
A1  - Diamanti, Klev
A1  - Wadelius, Claes
A1  - Komorowski, Jan
T1  - Machine learning-based analysis of glioma grades reveals co-enrichment
JF  - Cancers
N2  - Simple Summary: Gliomas are heterogeneous types of cancer; therefore, therapy should be personalized and targeted toward specific pathways. We developed a methodology that corrected strong batch effects in The Cancer Genome Atlas datasets and estimated glioma grade-specific co-enrichment mechanisms using machine learning. Our findings generate hypotheses for annotations, e.g., pathways, that should be considered as therapeutic targets. Abstract: Gliomas develop and grow in the brain and central nervous system. Examining glioma grading processes is valuable for addressing therapeutic challenges. One of the most extensive repositories storing transcriptomics data for gliomas is The Cancer Genome Atlas (TCGA). However, such big cohorts should be processed with caution and evaluated thoroughly, as they can contain batch and other effects. Furthermore, the biological mechanisms of cancer involve interactions among biomarkers. Thus, we applied an interpretable machine learning approach to discover such relationships. This type of transparent learning not only provides good predictability but also reveals co-predictive mechanisms among features. In this study, we corrected the strong and confounded batch effect in the TCGA glioma data. We further used the corrected datasets to perform a comprehensive machine learning analysis of single-sample gene set enrichment scores using collections from the Molecular Signature Database. Furthermore, using rule-based classifiers, we displayed networks of co-enrichment related to glioma grades. Moreover, we validated our results using external glioma cohorts. We believe that utilizing corrected glioma cohorts from TCGA may improve the application and validation of future studies. Finally, the co-enrichment and survival analyses provided detailed explanations for glioma progression and should consequently support targeted treatment.
KW  - glioma
KW  - machine learning
KW  - batch effect
KW  - TCGA
KW  - co-enrichment
KW  - rough sets
Y1  - 2022
U6  - https://doi.org/10.3390/cancers14041014
SN  - 2072-6694
VL  - 14
IS  - 4
PB  - MDPI
CY  - Basel
ER  - 
TY  - JOUR
A1  - Ghafarian, Fatemeh
A1  - Wieland, Ralf
A1  - Lüttschwager, Dietmar
A1  - Nendel, Claas
T1  - Application of extreme gradient boosting and Shapley Additive explanations to predict temperature regimes inside forests from standard open-field meteorological data
JF  - Environmental modelling & software with environment data news
N2  - Forest microclimate can buffer biotic responses to summer heat waves, which are expected to become more extreme under climate warming. Prediction of forest microclimate is limited because meteorological observation standards seldom include conditions inside forests. We use eXtreme Gradient Boosting, a machine learning technique, to predict the microclimate of forest sites in Brandenburg, Germany, using seasonal data comprising weather features. The analysis was complemented by SHapley Additive exPlanations (SHAP) to show the interaction effects of variables and individualised feature attributions. We evaluate model performance in comparison to artificial neural networks, random forest, support vector machine, and multi-linear regression. After feature selection, an ensemble approach was applied to combine individual models for each forest and improve robustness over any single prediction model. The resulting model can be applied to translate climate change scenarios into temperatures inside forests to assess temperature-related ecosystem services provided by forests.
KW  - cooling effect
KW  - machine learning
KW  - ensemble method
KW  - ecosystem services
Y1  - 2022
U6  - https://doi.org/10.1016/j.envsoft.2022.105466
SN  - 1364-8152
SN  - 1873-6726
VL  - 156
PB  - Elsevier
CY  - Oxford
ER  - 
TY  - JOUR
A1  - Hampf, Anna
A1  - Nendel, Claas
A1  - Strey, Simone
A1  - Strey, Robert
T1  - Biotic yield losses in the Southern Amazon, Brazil
BT  - making use of smartphone-assisted plant disease diagnosis data
JF  - Frontiers in plant science : FPLS
N2  - Pathogens and animal pests (P&A) are a major threat to global food security, as they directly affect the quantity and quality of food. The Southern Amazon, Brazil's largest domestic region for soybean, maize and cotton production, is particularly vulnerable to P&A outbreaks due to its (sub)tropical climate and intensive farming systems. However, little is known about the spatial distribution of P&A and the related yield losses. Machine learning approaches for the automated recognition of plant diseases can help to close this research gap. The main objectives of this study are to (1) evaluate the performance of Convolutional Neural Networks (ConvNets) in classifying P&A, (2) map the spatial distribution of P&A in the Southern Amazon, and (3) quantify perceived yield and economic losses for the main soybean and maize P&A. The objectives were addressed using data collected with the smartphone application Plantix. At the core of the app is the automated recognition of plant diseases via ConvNets. Data on expected yield losses were gathered through a short survey included in an "expert" version of the application, which was distributed among agronomists. Between 2016 and 2020, Plantix users collected approximately 78,000 georeferenced P&A images in the Southern Amazon. The study results indicate a high performance of the trained ConvNets in classifying 420 different crop-disease combinations. Spatial distribution maps and expert-based yield loss estimates indicate that maize rust, bacterial stalk rot and the fall armyworm are among the most severe maize P&A, whereas soybean is mainly affected by P&A such as anthracnose, downy mildew, frogeye leaf spot, stink bugs and brown spot. Perceived soybean and maize yield losses amount to 12% and 16%, respectively, resulting in annual yield losses of approximately 3.75 million tonnes for each crop and economic losses of US$2 billion for both crops together.
The high level of accuracy of the trained ConvNets, paired with widespread use through a citizen-science approach, yields a data source that will shed new light on yield loss estimates, e.g., for the analysis of yield gaps and the development of measures to minimise them.
KW  - plant pathology
KW  - animal pests
KW  - pathogens
KW  - machine learning
KW  - digital
KW  - image processing
KW  - disease diagnosis
KW  - crowdsourcing
KW  - crop losses
Y1  - 2021
U6  - https://doi.org/10.3389/fpls.2021.621168
SN  - 1664-462X
VL  - 12
PB  - Frontiers Media
CY  - Lausanne
ER  - 
TY  - JOUR
A1  - Hecker, Pascal
A1  - Steckhan, Nico
A1  - Eyben, Florian
A1  - Schuller, Björn Wolfgang
A1  - Arnrich, Bert
T1  - Voice Analysis for Neurological Disorder Recognition – A Systematic Review and Perspective on Emerging Trends
JF  - Frontiers in Digital Health
N2  - Quantifying neurological disorders from voice is a rapidly growing field of research and holds promise for unobtrusive and large-scale disorder monitoring. The data recording setup and data analysis pipelines are both crucial aspects to effectively obtain relevant information from participants. Therefore, we performed a systematic review to provide a high-level overview of practices across various neurological disorders and highlight emerging trends. PRISMA-based literature searches were conducted through PubMed, Web of Science, and IEEE Xplore to identify publications in which original (i.e., newly recorded) datasets were collected. Disorders of interest were psychiatric as well as neurodegenerative disorders, such as bipolar disorder, depression, and stress, as well as amyotrophic lateral sclerosis, Alzheimer's, and Parkinson's disease, and speech impairments (aphasia, dysarthria, and dysphonia). Of the 43 retrieved studies, Parkinson's disease is represented most prominently with 19 discovered datasets. Free speech and read speech tasks are most commonly used across disorders. Besides popular feature extraction toolkits, many studies utilise custom-built feature sets. Correlations of acoustic features with psychiatric and neurodegenerative disorders are presented. In terms of analysis, statistical analysis for significance of individual features is commonly used, as well as predictive modeling approaches, especially with support vector machines and a small number of artificial neural networks. An emerging trend and recommendation for future studies is to collect data in everyday life to facilitate longitudinal data collection and to capture the behavior of participants more naturally. Another emerging trend is to record additional modalities to voice, which can potentially increase analytical performance.
KW  - neurological disorders
KW  - voice
KW  - speech
KW  - everyday life
KW  - multiple modalities
KW  - machine learning
KW  - disorder recognition
Y1  - 2022
U6  - https://doi.org/10.3389/fdgth.2022.842301
SN  - 2673-253X
PB  - Frontiers Media SA
CY  - Lausanne, Switzerland
ER  - 
TY  - JOUR
A1  - Hollenstein, Nora
A1  - Tröndle, Marius
A1  - Plomecka, Martyna
A1  - Kiegeland, Samuel
A1  - Özyurt, Yilmazcan
A1  - Jäger, Lena Ann
A1  - Langer, Nicolas
T1  - The ZuCo benchmark on cross-subject reading task classification with EEG and eye-tracking data
JF  - Frontiers in psychology
N2  - We present a new machine learning benchmark for reading task classification with the goal of advancing EEG and eye-tracking research at the intersection between computational language processing and cognitive neuroscience. The benchmark task consists of a cross-subject classification to distinguish between two reading paradigms: normal reading and task-specific reading. The data for the benchmark is based on the Zurich Cognitive Language Processing Corpus (ZuCo 2.0), which provides simultaneous eye-tracking and EEG signals from natural reading of English sentences. The training dataset is publicly available, and we present a newly recorded hidden test set. We provide multiple solid baseline methods for this task and discuss future improvements. We release our code and provide an easy-to-use interface to evaluate new approaches with an accompanying public leaderboard.
KW  - reading task classification
KW  - eye-tracking
KW  - EEG
KW  - machine learning
KW  - reading research
KW  - cross-subject evaluation
Y1  - 2023
U6  - https://doi.org/10.3389/fpsyg.2022.1028824
SN  - 1664-1078
VL  - 13
PB  - Frontiers Media
CY  - Lausanne
ER  - 
TY  - JOUR
A1  - Kappattanavar, Arpita Mallikarjuna
A1  - Hecker, Pascal
A1  - Moontaha, Sidratul
A1  - Steckhan, Nico
A1  - Arnrich, Bert
T1  - Food choices after cognitive load
BT  - an affective computing approach
JF  - Sensors
N2  - Psychology and nutritional science research has highlighted the impact of negative emotions and cognitive load on calorie consumption behaviour using subjective questionnaires. Isolated studies in other domains objectively assess cognitive load without considering its effects on eating behaviour. This study aims to explore the potential for developing an integrated eating behaviour assistant system that incorporates cognitive load factors. Two experimental sessions were conducted using custom-developed experimentation software to induce different stimuli. During these sessions, we collected 30 h of physiological data, food consumption data, and affective-state questionnaire data to automatically detect cognitive load and analyse its effect on food choice. Utilising grid search optimisation and leave-one-subject-out cross-validation, a support vector machine model achieved a mean classification accuracy of 85.12% for the two cognitive load tasks using eight relevant features. Statistical analysis was performed on calorie consumption and questionnaire data. Furthermore, 75% of the subjects with higher negative affect significantly increased consumption of specific foods after high-cognitive-load tasks. These findings offer insights into the intricate relationship between cognitive load, affective states, and food choice, paving the way for an eating behaviour assistant system to manage food choices during cognitive load. Future research should enhance system capabilities and explore real-world applications.
KW  - cognitive load
KW  - eating behaviour
KW  - machine learning
KW  - physiological signals
KW  - photoplethysmography
KW  - electrodermal activity
KW  - sensors
Y1  - 2023
U6  - https://doi.org/10.3390/s23146597
SN  - 1424-8220
VL  - 23
IS  - 14
PB  - MDPI
CY  - Basel
ER  - 
TY  - JOUR
A1  - Kibrik, Andrej A.
A1  - Khudyakova, Mariya V.
A1  - Dobrov, Grigory B.
A1  - Linnik, Anastasia
A1  - Zalmanov, Dmitrij A.
T1  - Referential Choice
BT  - Predictability and Its Limits
JF  - Frontiers in psychology
N2  - We report a study of referential choice in discourse production, understood as the choice between various types of referential devices, such as pronouns and full noun phrases. Our goal is to predict referential choice, and to explore to what extent such prediction is possible. Our approach to referential choice includes a cognitively informed theoretical component, corpus analysis, machine learning methods and experimentation with human participants. Machine learning algorithms make use of 25 factors, including referent’s properties (such as animacy and protagonism), the distance between a referential expression and its antecedent, the antecedent’s syntactic role, and so on. Having found the predictions of our algorithm to coincide with the original almost 90% of the time, we hypothesized that fully accurate prediction is not possible because, in many situations, more than one referential option is available. This hypothesis was supported by an experimental study, in which participants answered questions about either the original text in the corpus, or about a text modified in accordance with the algorithm’s prediction. Proportions of correct answers to these questions, as well as participants’ rating of the questions’ difficulty, suggested that divergences between the algorithm’s prediction and the original referential device in the corpus occur overwhelmingly in situations where the referential choice is not categorical.
KW  - referential choice
KW  - non-categoricity
KW  - machine learning
KW  - cross-methodological approach
KW  - discourse production
Y1  - 2016
U6  - https://doi.org/10.3389/fpsyg.2016.01429
SN  - 1664-1078
VL  - 7
PB  - Frontiers Research Foundation
CY  - Lausanne
ER  - 