Refine
Year of publication
- 2019 (2) (remove)
Document Type
- Article (1)
- Doctoral Thesis (1)
Language
- English (2)
Is part of the Bibliography
- yes (2)
Keywords
- machine learning (2) (remove)
Institute
- Institut für Biochemie und Biologie (2) (remove)
Since half a century, cytometry has been a major scientific discipline in the field of cytomics - the study of system’s biology at single cell level. It enables the investigation of physiological processes, functional characteristics and rare events with proteins by analysing multiple parameters on an individual cell basis. In the last decade, mass cytometry has been established which increased the parallel measurement to up to 50 proteins. This has shifted the analysis strategy from conventional consecutive manual gates towards multi-dimensional data processing. Novel algorithms have been developed to tackle these high-dimensional protein combinations in the data. They are mainly based on clustering or non-linear dimension reduction techniques, or both, often combined with an upstream downsampling procedure. However, these tools have obstacles either in comprehensible interpretability, reproducibility, computational complexity or in comparability between samples and groups.
To address this bottleneck, a reproducible, semi-automated cytometric data mining workflow PRI (pattern recognition of immune cells) is proposed which combines three main steps: i) data preparation and storage; ii) bin-based combinatorial variable engineering of three protein markers, the so called triploTs, and subsequent sectioning of these triploTs in four parts; and iii) deployment of a data-driven supervised learning algorithm, the cross-validated elastic-net regularized logistic regression, with these triploT sections as input variables. As a result, the selected variables from the models are ranked by their prevalence, which potentially have discriminative value. The purpose is to significantly facilitate the identification of meaningful subpopulations, which are most distinguish between two groups. The proposed workflow PRI is exemplified by a recently published public mass cytometry data set. The authors found a T cell subpopulation which is discriminative between effective and ineffective treatment of breast carcinomas in mice. With PRI, that subpopulation was not only validated, but was further narrowed down as a particular Th1 cell population. Moreover, additional insights of combinatorial protein expressions are revealed in a traceable manner. An essential element in the workflow is the reproducible variable engineering. These variables serve as basis for a clearly interpretable visualization, for a structured variable exploration and as input layers in neural network constructs.
PRI facilitates the determination of marker levels in a semi-continuous manner. Jointly with the combinatorial display, it allows a straightforward observation of correlating patterns, and thus, the dominant expressed markers and cell hierarchies. Furthermore, it enables the identification and complex characterization of discriminating subpopulations due to its reproducible and pseudo-multi-parametric pattern presentation. This endorses its applicability as a tool for unbiased investigations on cell subsets within multi-dimensional cytometric data sets.
The selection of a nest site is crucial for successful reproduction of birds. Animals which re-use or occupy nest sites constructed by other species often have limited choice. Little is known about the criteria of nest-stealing species to choose suitable nesting sites and habitats. Here, we analyze breeding-site selection of an obligatory "nest-cleptoparasite", the Amur Falcon Falco amurensis. We collected data on nest sites at Muraviovka Park in the Russian Far East, where the species breeds exclusively in nests of the Eurasian Magpie Pica pica. We sampled 117 Eurasian Magpie nests, 38 of which were occupied by Amur Falcons. Nest-specific variables were assessed, and a recently developed habitat classification map was used to derive landscape metrics. We found that Amur Falcons chose a wide range of nesting sites, but significantly preferred nests with a domed roof. Breeding pairs of Eurasian Hobby Falco subbuteo and Eurasian Magpie were often found to breed near the nest in about the same distance as neighboring Amur Falcon pairs. Additionally, the occurrence of the species was positively associated with bare soil cover, forest cover, and shrub patches within their home range and negatively with the distance to wetlands. Areas of wetlands and fallow land might be used for foraging since Amur Falcons mostly depend on an insect diet. Additionally, we found that rarely burned habitats were preferred. Overall, the effect of landscape variables on the choice of actual nest sites appeared to be rather small. We used different classification methods to predict the probability of occurrence, of which the Random forest method showed the highest accuracy. The areas determined as suitable habitat showed a high concordance with the actual nest locations. We conclude that Amur Falcons prefer to occupy newly built (domed) nests to ensure high nest quality, as well as nests surrounded by available feeding habitats.