J. Computer Applications
Gene expression data is analyzed to identify biomarkers, e.g. relevant genes, which serve diagnostic, predictive, or prognostic purposes. Traditional approaches for biomarker detection select distinctive features from the data based exclusively on the signals therein, and face multiple shortcomings with regard to overfitting, biomarker robustness, and actual biological relevance. Prior knowledge approaches are expected to address these issues by incorporating prior biological knowledge, e.g. on gene-disease associations, into the actual analysis. However, prior knowledge approaches are currently not widely applied in practice because they are often use-case specific and seldom applicable in a different scope. This leads to a lack of comparability of prior knowledge approaches, which in turn makes it currently impossible to assess their effectiveness in a broader context.
Our work addresses the aforementioned issues with three contributions. Our first contribution provides formal definitions for both prior knowledge and the flexible integration thereof into the feature selection process. Central to these concepts is the automatic retrieval of prior knowledge from online knowledge bases, which allows for streamlining the retrieval process and agreeing on a uniform definition for prior knowledge. We subsequently describe novel and generalized prior knowledge approaches that are flexible regarding the used prior knowledge and applicable to varying use case domains. Our second contribution is the benchmarking platform Comprior. Comprior applies the aforementioned concepts in practice and allows for flexibly setting up comprehensive benchmarking studies for examining the performance of existing and novel prior knowledge approaches. It streamlines the retrieval of prior knowledge and allows for combining it with prior knowledge approaches. Comprior demonstrates the practical applicability of our concepts and further fosters the overall development and comparability of prior knowledge approaches. Our third contribution is a comprehensive case study on the effectiveness of prior knowledge approaches. For that, we used Comprior and tested a broad range of both traditional and prior knowledge approaches in combination with multiple knowledge bases on data sets from multiple disease domains. Ultimately, our case study constitutes a thorough assessment of a) the suitability of selected knowledge bases for integration, b) the impact of prior knowledge being applied at different integration levels, and c) the improvements in terms of classification performance, biological relevance, and overall robustness.
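To make the idea of integrating prior knowledge into feature selection concrete, the following minimal Python sketch blends a data-driven univariate statistic with a per-gene prior relevance score. The weighting scheme and all names are illustrative assumptions, not Comprior's actual interface.

```python
# Hypothetical sketch of prior-knowledge-assisted feature ranking; the
# weighting scheme and names are illustrative, not Comprior's actual API.
import numpy as np
from sklearn.feature_selection import f_classif

def rank_features(X, y, prior_scores, alpha=0.5):
    """Blend a data-driven statistic with prior-knowledge relevance.

    X            : (samples x genes) expression matrix
    y            : class labels, e.g. disease vs. control
    prior_scores : per-gene relevance in [0, 1], e.g. derived from a
                   gene-disease association knowledge base
    alpha        : weight of the data-driven component
    """
    f_stat, _ = f_classif(X, y)               # univariate signal strength
    f_norm = f_stat / (f_stat.max() + 1e-12)  # scale to [0, 1]
    combined = alpha * f_norm + (1 - alpha) * prior_scores
    return np.argsort(combined)[::-1]         # best-ranked genes first

# Example with random data: 100 samples, 500 genes
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 500))
y = rng.integers(0, 2, size=100)
prior = rng.uniform(size=500)
print(rank_features(X, y, prior)[:10])
```

Setting alpha to 1 recovers a purely data-driven filter, so the blending parameter directly expresses the integration level at which prior knowledge enters the selection.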
In summary, our contributions demonstrate that generalized concepts for prior knowledge and a streamlined retrieval process improve the applicability of prior knowledge approaches. Results from our case study show that the integration of prior knowledge positively affects biomarker results, particularly regarding their robustness. Our findings provide the first in-depth insights on the effectiveness of prior knowledge approaches and build a valuable foundation for future research.
Learning analytics at scale
(2021)
Digital technologies are paving the way for innovative educational approaches. The learning format of Massive Open Online Courses (MOOCs) provides a highly accessible path to lifelong learning while being more affordable and flexible than face-to-face courses. Thousands of learners can enroll in courses mostly without admission restrictions, but this also raises challenges. Individual supervision by teachers is barely feasible, and learning persistence and success depend on students' self-regulatory skills. Here, technology provides the means for support. The use of data for decision-making is already transforming many fields, whereas in education it is still a young research discipline. Learning Analytics (LA) is defined as the measurement, collection, analysis, and reporting of data about learners and their learning contexts with the purpose of understanding and improving learning and learning environments. The vast amount of data that MOOCs produce on the learning behavior and success of thousands of students provides the opportunity to study human learning and to develop approaches that address the demands of learners and teachers.
The overall purpose of this dissertation is to investigate the implementation of LA at the scale of MOOCs and to explore how data-driven technology can support learning and teaching in this context. To this end, several research prototypes have been iteratively developed for the HPI MOOC Platform, where they were tested and evaluated in an authentic real-world learning environment. Most of the results can be applied on a conceptual level to other MOOC platforms as well. The research contribution of this thesis thus extends beyond theoretical considerations and provides practical insights. In total, four system components were developed and extended:
(1) The Learning Analytics Architecture: A technical infrastructure to collect, process, and analyze event-driven learning data based on schema-agnostic pipelining in a service-oriented MOOC platform. (2) The Learning Analytics Dashboard for Learners: A tool for data-driven support of self-regulated learning, in particular to enable learners to evaluate and plan their learning activities, progress, and success by themselves. (3) Personalized Learning Objectives: A set of features that connects learners' success to their personal intentions, based on selected learning objectives, in order to offer guidance and align the provided data-driven insights with their learning progress. (4) The Learning Analytics Dashboard for Teachers: A tool that supports teachers with data-driven insights, enabling them to monitor their courses with thousands of learners, identify potential issues, and take informed action.
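As an illustration of the schema-agnostic pipelining idea behind component (1), the sketch below treats each learning event as an arbitrary JSON payload and aggregates over whatever verbs occur. The field names are assumptions, not the HPI MOOC Platform's actual event schema.

```python
# Illustrative sketch of schema-agnostic event handling: events arrive as
# arbitrary JSON payloads, and an analytics job aggregates over the verbs it
# finds. Field names are assumptions, not the platform's actual schema.
import json
from collections import Counter

raw_events = [
    json.dumps({"verb": "VIDEO_PLAY", "user": "u1", "resource": "v42"}),
    json.dumps({"verb": "QUIZ_SUBMIT", "user": "u1", "resource": "q7"}),
    json.dumps({"verb": "VIDEO_PLAY", "user": "u2", "resource": "v42"}),
]

def activity_counts(events):
    """Count event verbs without assuming a fixed schema."""
    counts = Counter()
    for raw in events:
        event = json.loads(raw)
        counts[event.get("verb", "UNKNOWN")] += 1
    return counts

print(activity_counts(raw_events))  # Counter({'VIDEO_PLAY': 2, 'QUIZ_SUBMIT': 1})
```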
For all aspects examined in this dissertation, related research is presented, development processes and implementation concepts are explained, and evaluations are conducted in case studies. Among other findings, the usage of the learner dashboard in combination with personalized learning objectives was associated with an improvement in certification rates from 11.62% to 12.63%. Furthermore, it was observed that the teacher dashboard is a key tool and an integral part of teaching in MOOCs. In addition to the results and contributions, general limitations of the work are discussed, which altogether provide a solid foundation for practical implications and future research.
Continental rift systems open up unique possibilities to study the geodynamic system of our planet: geodynamic localization processes are imprinted in the morphology of the rift by governing the time-dependent activity of faults and the topographic evolution of the rift, or by controlling whether a rift is symmetric or asymmetric. Since lithospheric necking localizes strain towards the rift centre, deformation structures of previous rift phases are often well preserved, and passive margins, the end product of continental rifting, retain key information about the tectonic history from rift inception to continental rupture.
Current understanding of continental rift evolution is based on combining observations from active rifts with data collected at rifted margins. Connecting these isolated data sets is often accomplished in a conceptual way and leaves room for subjective interpretation. Geodynamic forward models, however, have the potential to link individual data sets in a quantitative manner, using additional constraints from rock mechanics and rheology, which allows us to transcend previous conceptual models of rift evolution. By quantifying geodynamic processes within continental rifts, numerical modelling provides key insights into tectonic processes that also operate in other plate boundary settings, such as mid-ocean ridges, collisional mountain chains, or subduction zones.
In this thesis, I combine numerical, plate-tectonic, analytical, and analogue modelling approaches, with numerical thermomechanical modelling as the primary tool. This method has advanced rapidly during the last two decades owing to dedicated software development and the availability of massively parallel computer facilities. Nevertheless, only recently has the geodynamic modelling community been able to capture 3D lithospheric-scale rift dynamics from the onset of extension to final continental rupture.
The first chapter of this thesis provides a broad introduction to continental rifting, a summary of the applied rift modelling methods, and a short overview of previous studies. The following chapters, which constitute the main part of this thesis, feature studies on plate boundary dynamics in two and three dimensions, followed by global-scale analyses (Fig. 1).
Chapter II focuses on 2D geodynamic modelling of rifted margin formation. It highlights the formation of wide areas of hyperextended crustal slivers via rift migration as a key process that affected many rifted margins worldwide. This chapter also contains a study of rift velocity evolution, showing that rift strength loss and extension velocity are linked through a dynamic feedback. This process results in abrupt accelerations of the involved plates during rifting, illustrating for the first time that rift dynamics play a role in changing global-scale plate motions. Since rift velocity affects key processes like faulting, melting, and lower crustal flow, this study also implies that the slow-fast velocity evolution should be imprinted in rifted margin structures.
Chapter III relies on 3D Cartesian rift models to investigate various aspects of rift obliquity. Oblique rifting occurs if the extension direction is not orthogonal to the rift trend. Using 3D lithospheric-scale models from rift initialisation to breakup, I could isolate a characteristic evolution of dominant fault orientations. Further work in Chapter III addresses the impact of rift obliquity on the strength of the rift system. We illustrate that oblique rifting is mechanically preferred over orthogonal rifting, because brittle yielding requires a lower tectonic force. This mechanism elucidates rift competition during South Atlantic rifting, where the more oblique Equatorial Atlantic Rift proceeded to breakup while the simultaneously active but less oblique West African rift system became a failed rift. Finally, this chapter also investigates the impact of a previous rift phase on current tectonic activity in the linkage area between the Kenyan and Ethiopian rifts. We show that the along-strike changes in rift style are not caused by changes in crustal rheology. Instead, the rift linkage pattern in this area can be explained when accounting for the thinned crust and lithosphere of a Mesozoic rift event.
Chapter IV investigates rifting from the global perspective. A first study extends the oblique rift topic of the previous chapter to the global scale by investigating the frequency of oblique rifting during the last 230 million years. We find that approximately 70% of all ocean-forming rift segments involved an oblique component of extension, with obliquities exceeding 20°. This highlights the relevance of 3D approaches in modelling, surveying, and interpretation of many rifted margins. In a final study, we propose a link between continental rift activity, diffuse CO2 degassing, and Mesozoic/Cenozoic climate changes. We used recent CO2 flux measurements in continental rifts to estimate worldwide rift-related CO2 release, based on the global extent of rifts through time. The first-order correlation with paleo-atmospheric CO2 proxy data suggests that rifts constitute a major element of the global carbon cycle.
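The scaling logic behind that estimate can be pictured with a back-of-the-envelope sketch: a measured CO2 flux per kilometre of active rift is multiplied by the total rift length at a given time. All numbers below are placeholders, not the study's values.

```python
# Back-of-the-envelope sketch: worldwide rift-related CO2 release is the
# measured flux per kilometre of active rift times the total rift length at a
# given time. All numbers are placeholders, not the study's values.
flux_per_km = 1.0e6  # tonnes CO2 per km of rift per year (illustrative)

for rift_length_km in (5_000, 20_000, 40_000):  # rift extent through time
    release = flux_per_km * rift_length_km
    print(f"{rift_length_km:>6} km of rift -> {release:.1e} t CO2 / yr")
```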
Human actuation
(2018)
Ever since the conception of the virtual reality headset in 1968, many researchers have argued that the next step in virtual reality is to allow users to not only see and hear, but also feel virtual worlds. One approach is to use mechanical equipment to provide haptic feedback, e.g., robotic arms, exoskeletons, and motion platforms. However, the size and weight of such mechanical equipment tend to be proportional to the target's size and weight, i.e., providing human-scale haptic feedback requires human-scale equipment, often restricting such systems to arcades and lab environments.
The key idea behind this dissertation is to bypass mechanical equipment by instead leveraging human muscle power. We thus create software systems that orchestrate humans in doing such mechanical labor—this is what we call human actuation. A potential benefit of such systems is that humans are more generic, flexible, and versatile than machines. This brings a wide range of haptic feedback to modern virtual reality systems.
We start with a proof-of-concept system, Haptic Turk, which focuses on delivering motion experiences just like a motion platform. All Haptic Turk setups consist of a user who is supported by one or more human actuators. The user enjoys an interactive motion simulation, such as a hang glider experience, but the motion is generated by those human actuators, who manually lift, tilt, and push the user's limbs or torso. To get the timing and force right, the system generates timed motion instructions in a format familiar from rhythm games.
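Such instructions can be pictured as a timed cue list. The following sketch is a hypothetical rendering of rhythm-game-style cues, not Haptic Turk's actual instruction protocol.

```python
# Hypothetical rendering of rhythm-game-style motion cues; the format is
# illustrative, not Haptic Turk's actual instruction protocol.
import time
from dataclasses import dataclass

@dataclass
class MotionCue:
    at: float      # seconds from simulation start
    actuator: int  # which human actuator performs the move
    action: str    # e.g. "lift", "tilt", "push"
    limb: str      # body part to actuate
    force: str     # coarse force level shown to the actuator

script = [
    MotionCue(0.0, 1, "lift", "torso", "gentle"),
    MotionCue(1.5, 2, "tilt", "legs", "medium"),
    MotionCue(3.0, 1, "push", "shoulders", "strong"),
]

start = time.monotonic()
for cue in script:  # display each cue exactly when it is due
    delay = cue.at - (time.monotonic() - start)
    if delay > 0:
        time.sleep(delay)
    print(f"actuator {cue.actuator}: {cue.action} {cue.limb} ({cue.force})")
```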
Next, we extend the concept of human actuation from 3-DoF to 6-DoF virtual reality, where users have the freedom to walk around. TurkDeck tackles this problem by orchestrating a group of human actuators to reconfigure a set of passive props on the fly while the user is progressing in the virtual environment. TurkDeck schedules human actuators by their distances from the user, and instructs them to reconfigure the props to the right place at the right time using laser projection and voice output.
Our studies of Haptic Turk and TurkDeck showed that human actuators enjoyed the experience, but not as much as users did. To eliminate the need for dedicated human actuators, Mutual Turk makes everyone a user by exchanging mechanical actuation between two or more users. Mutual Turk's main functionality is to orchestrate the users so as to actuate props at just the right moment and with just the right force to produce the correct feedback in each other's experience.
Finally, we eliminate the need for another user altogether, making human actuation applicable to single-user experiences. iTurk makes the user constantly reconfigure and animate otherwise passive props. This allows iTurk to provide virtual worlds with constantly varying or even animated haptic effects, even though the only animate entity present in the system is the user. Our demo experience features one example each of iTurk's two main types of props, i.e., reconfigurable props (the foldable board from TurkDeck) and animated props (the pendulum).
We conclude this dissertation by summarizing the findings of our explorations and pointing out future directions. We discuss the development of human actuation compared to traditional machine actuation, the possibility of combining human and machine actuators, and interaction models that involve more human actuators.
Background: Consumption of whole grains, coffee, and red meat has consistently been related to the risk of developing type 2 diabetes in prospective cohort studies, but the potentially underlying biological mechanisms are not well understood. Metabolomics profiles were shown to be sensitive to these dietary exposures, and at the same time to be informative with respect to the risk of type 2 diabetes. Moreover, graphical network models were demonstrated to reflect the biological processes underlying high-dimensional metabolomics profiles.
Aim: The aim of this study was to infer hypotheses on the biological mechanisms that link consumption of whole-grain bread, coffee, and red meat, respectively, to the risk of developing type 2 diabetes. More specifically, the aim was to consider network models of amino acid and lipid profiles as potential mediators of these risk relations.
Study population: Analyses were conducted in the prospective EPIC-Potsdam cohort (n = 27,548), applying a nested case-cohort design (n = 2731, including 692 incident diabetes cases). Habitual diet was assessed with validated semiquantitative food-frequency questionnaires. Concentrations of 126 metabolites (acylcarnitines, phosphatidylcholines, sphingomyelins, amino acids) were determined in baseline serum samples. Incident type 2 diabetes cases were assessed and validated in an active follow-up procedure. The median follow-up time was 6.6 years.
Analytical design: The methodological approach was conceptually based on counterfactual causal inference theory. Observations on the network-encoded conditional independence structure restricted the space of possible causal explanations of the observed metabolomics-data patterns. Given basic directionality assumptions (diet affects metabolism; metabolism affects future diabetes incidence), adjustment for a subset of direct neighbours was sufficient to consistently estimate network-independent direct effects. Further model specification, however, was limited due to missing directionality information on the links between metabolites. Therefore, a multi-model approach was applied to infer the bounds of possible direct effects. All metabolite-exposure links and metabolite-outcome links, respectively, were classified into one of three categories: direct effect, ambiguous (some models indicated an effect, others did not), and no effect.
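The three-category scheme can be pictured as a simple vote over the adjustment models; the decision logic below is an illustrative simplification.

```python
# Illustrative simplification of the three-category scheme: a link is a
# "direct effect" only if every adjustment model indicates an effect,
# "ambiguous" if the models disagree, and "no effect" otherwise.
def classify_link(effect_flags):
    """effect_flags: one boolean per adjustment model (True = effect found)."""
    if all(effect_flags):
        return "direct effect"
    if any(effect_flags):
        return "ambiguous"
    return "no effect"

print(classify_link([True, True, True]))     # direct effect
print(classify_link([True, False, True]))    # ambiguous
print(classify_link([False, False, False]))  # no effect
```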
Cross-sectional and longitudinal relations were evaluated in multivariable-adjusted linear regression and Cox proportional hazard regression models, respectively. Models were comprehensively adjusted for age, sex, body mass index, prevalence of hypertension, dietary and lifestyle factors, and medication.
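For the longitudinal relations, the following is a hedged sketch of the type of Cox model involved, using the lifelines library with invented data and only a single covariate; the actual analysis was comprehensively adjusted as described above.

```python
# Hedged sketch of a Cox proportional hazards model with the lifelines
# library; the data are invented and only one covariate is included, whereas
# the actual analysis was comprehensively adjusted as described above.
import pandas as pd
from lifelines import CoxPHFitter

df = pd.DataFrame({
    "followup_years": [6.1, 5.4, 6.6, 4.9, 6.2, 3.8, 6.6, 5.1],
    "diabetes":       [0, 1, 0, 1, 0, 1, 0, 1],  # event indicator
    "metabolite":     [0.8, 1.9, 1.8, 2.2, 1.1, 1.0, 0.9, 1.7],
})

cph = CoxPHFitter()
cph.fit(df, duration_col="followup_years", event_col="diabetes")
print(cph.summary[["coef", "exp(coef)"]])  # log hazard ratio and hazard ratio
```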
Results: Consumption of whole-grain bread was related to lower levels of several lipid metabolites with saturated and monounsaturated fatty acids. Coffee was related to lower aromatic and branched-chain amino acids, and had potential effects on the fatty acid profile within lipid classes. Red meat was linked to lower glycine levels and was related to higher circulating concentrations of branched-chain amino acids. In addition, potential marked effects of red meat consumption on the fatty acid composition within the investigated lipid classes were identified.
Moreover, potential beneficial and adverse direct effects of metabolites on type 2 diabetes risk were detected. Aromatic amino acids and lipid metabolites with even-chain saturated (C14-C18) and with specific polyunsaturated fatty acids had adverse effects on type 2 diabetes risk. Glycine, glutamine, and lipid metabolites with monounsaturated fatty acids and with other species of polyunsaturated fatty acids were classified as having direct beneficial effects on type 2 diabetes risk.
Potential mediators of the diet-diabetes links were identified by graphically overlaying this information in network models. Mediation analyses revealed that effects on lipid metabolites could potentially explain about one fourth of the whole-grain bread effect on type 2 diabetes risk; and that effects of coffee and red meat consumption on amino acid and lipid profiles could potentially explain about two thirds of the altered type 2 diabetes risk linked to these dietary exposures.
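The mediation bookkeeping itself is simple arithmetic: the proportion of a total exposure effect explained by the metabolite mediators is the indirect effect divided by the total effect. The toy numbers below merely illustrate how an estimate like "about one fourth" arises; they are not the EPIC-Potsdam estimates.

```python
# Toy mediation bookkeeping: the proportion of the total exposure effect
# explained by the metabolite mediators is indirect / total. The numbers are
# invented to show how an estimate like "about one fourth" arises.
total_effect = -0.20   # e.g. total effect of the exposure on diabetes risk
direct_effect = -0.15  # effect remaining after adjusting for metabolites
indirect_effect = total_effect - direct_effect

print(round(indirect_effect / total_effect, 2))  # 0.25 -> about one fourth
```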
Conclusion: An algorithm was developed that is capable of integrating single external variables (continuous exposures, survival time) and high-dimensional metabolomics data in a joint graphical model. Application to the EPIC-Potsdam cohort study revealed that the observed conditional independence patterns were consistent with the a priori mediation hypothesis: early effects on lipid and amino acid metabolism had the potential to explain large parts of the link between three of the most widely discussed diabetes-related dietary exposures and the risk of developing type 2 diabetes.
Computer security deals with the detection and mitigation of threats to computer networks, data, and computing hardware. This thesis addresses two such problems: email spam campaign detection and malware detection.
Email spam campaigns can easily be generated using popular dissemination tools by specifying simple grammars that serve as message templates. A grammar is disseminated to the nodes of a botnet, and the nodes create messages by instantiating the grammar at random. Email spam campaigns can encompass huge data volumes and therefore pose a threat to the stability of the infrastructure of email service providers, which have to store them. Malware (software that serves a malicious purpose) affects web servers, client computers via active content, and client computers through executable files. Without the help of malware detection systems, it would be easy for malware creators to collect sensitive information or to infiltrate computers.
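How such a template grammar yields a campaign can be sketched in a few lines; the grammar below is invented for illustration and not taken from any real dissemination tool.

```python
# Toy sketch of campaign generation: each botnet node instantiates a simple
# template grammar at random. The grammar is invented for illustration.
import random

grammar = {
    "GREETING": ["Dear customer", "Hello friend", "Hi"],
    "OFFER":    ["cheap meds", "an exclusive deal", "a free gift"],
    "LINK":     ["shop123.example", "shop789.example"],
}
template = "GREETING, click for OFFER at LINK!"

def instantiate(template, grammar):
    message = template
    for slot, choices in grammar.items():
        message = message.replace(slot, random.choice(choices))
    return message

for _ in range(3):  # three syntactically varied messages, one per node
    print(instantiate(template, grammar))
```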
The detection of threats (such as email spam messages, phishing messages, or malware) is an adversarial and therefore intrinsically difficult problem. Threats vary greatly and evolve over time. Detecting threats with manually designed rules is therefore difficult and requires constant engineering effort. Machine learning is a research area that revolves around the analysis of data and the discovery of patterns that describe aspects of the data. Discriminative learning methods extract prediction models from data that are optimized to predict a target attribute as accurately as possible. Machine learning methods hold the promise of automatically identifying patterns that robustly and accurately detect threats. This thesis focuses on the design and analysis of discriminative learning methods for the two computer security problems under investigation: email campaign detection and malware detection.
The first part of this thesis addresses email campaign detection. We focus on regular expressions as a syntactic framework because regular expressions are intuitively comprehensible to security engineers and administrators, and they can be applied as a detection mechanism in an extremely efficient manner. In this setting, a prediction model is provided with exemplary messages from an email spam campaign. The prediction model has to generate a regular expression that reveals the syntactic pattern underlying the entire campaign, and that a security engineer finds comprehensible and feels confident enough to use for blacklisting further messages at the email server. We model this problem as a two-stage learning problem with structured input and output spaces, which can be solved using standard cutting-plane methods. To this end, we develop an appropriate loss function and derive a decoder for the resulting optimization problem.
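To convey the underlying intuition (though not the structured-prediction model itself), the following toy sketch generalizes a set of campaign messages into one regular expression by keeping constant tokens literal and replacing variable slots; the naive position-wise token alignment is an illustrative assumption.

```python
# Toy illustration of the goal: generalize campaign messages into one regular
# expression by keeping constant tokens literal and replacing variable slots.
# The naive position-wise alignment stands in for the learned model.
import re

def generalize(messages):
    token_rows = [m.split() for m in messages]
    assert len({len(row) for row in token_rows}) == 1, "same template length"
    parts = []
    for column in zip(*token_rows):
        if len(set(column)) == 1:  # constant token: keep it literal
            parts.append(re.escape(column[0]))
        else:                      # variable slot: generalize it
            parts.append(r"\S+")
    return r"\s+".join(parts)

campaign = [
    "Buy cheap V1agra now at shop123.example",
    "Buy cheap V1agra now at shop789.example",
]
pattern = generalize(campaign)
print(pattern)
print(all(re.fullmatch(pattern, m) for m in campaign))  # True
```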
The second part of this thesis deals with the problem of predicting whether a given JavaScript or PHP file is malicious or benign. Recent malware analysis techniques use static features, dynamic features, or both. In fully dynamic analysis, the software or script is executed and observed for malicious behavior in a sandbox environment. By contrast, static analysis is based on features that can be extracted directly from the program file. In order to bypass static detection mechanisms, code obfuscation techniques are used to spread a malicious program file in many different syntactic variants. Deobfuscating the code before applying a static classifier overcomes the problem of obfuscated malicious code, but increases the computational cost of malware detection by an order of magnitude. In this thesis, we present a cascaded architecture in which a classifier first performs a static analysis of the original code, and, based on the outcome of this first classification step, the code may be deobfuscated and classified again. We explore several types of features, including token n-grams, orthogonal sparse bigrams, subroutine hashings, and syntax-tree features, and study the robustness of detection methods and feature types against the evolution of malware over time. The developed tool scans very large file collections quickly and accurately.
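The cascade logic can be sketched as follows; the classifier and deobfuscation step below are crude placeholders for the thesis's components, shown only to illustrate when the expensive second stage is triggered.

```python
# Schematic sketch of the cascade: a cheap static pass on the raw file, with
# the costly deobfuscation step taken only for unconfident verdicts. The
# classifier and deobfuscator are crude placeholders, not the thesis's tools.
def looks_malicious(code):
    """Toy static classifier: score in [0, 1] from suspicious substrings."""
    hits = sum(token in code for token in ("eval(", "base64_decode", "\\x"))
    return min(1.0, hits / 2)

def deobfuscate(code):
    """Placeholder for the (expensive) deobfuscation step."""
    return code.replace("ev" + "al", "eval")  # pretend we unpacked the code

def cascaded_classify(code, threshold=0.75):
    score = looks_malicious(code)
    if score >= threshold:
        return True   # confidently malicious, stop early
    if score <= 1 - threshold:
        return False  # confidently benign, stop early
    return looks_malicious(deobfuscate(code)) >= 0.5  # second, costlier pass

print(cascaded_classify("echo 'hello world';"))             # False
print(cascaded_classify("eval(base64_decode($payload));"))  # True
```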
Each model is evaluated on real-world data and compared to reference methods. Our approach of inferring regular expressions to filter emails belonging to an email spam campaign leads to models with a high true-positive rate at a very low false-positive rate, an order of magnitude lower than that of a commercial content-based filter. The presented system, REx-SVM, is being used by a commercial email service provider and complements content-based and IP-address-based filtering.
Our cascaded malware detection system is evaluated on a high-quality data set of almost 400,000 conspicuous PHP files and a collection of more than 100,000 JavaScript files. From our case study, we conclude that the system can quickly and accurately process large data collections at a low false-positive rate.
The relationship between climate and forest productivity is an intensively studied subject in forest science. This thesis is embedded within the general framework of future forest growth under climate change and its implications for the ongoing forest conversion. My objective is to investigate future forest productivity at different spatial scales (from a single specific forest stand to aggregated information across Germany), with a focus on oak-pine forests in the federal state of Brandenburg. The overarching question is: how are oak-pine forests affected by climate change, as described by a variety of climate scenarios? I answer this question with a model-based analysis of tree growth processes and responses to different climate scenarios, with emphasis on drought events. In addition, a method is developed that incorporates climate change uncertainty into forest management planning.
As a first 'screening' of climate change impacts on forest productivity, I calculated the change in net primary production on the basis of a large set of climate scenarios for different tree species and the total area of Germany. Temperature increases of up to 3 K have positive effects on the net primary production of all selected tree species. However, in water-limited regions this positive net primary production trend depends on the length of drought periods, which results in a larger uncertainty regarding future forest productivity. One of the regions with the highest uncertainty in net primary production development is the federal state of Brandenburg.
To enhance the understanding and capability of model-based analyses of tree growth sensitivity to drought stress, two water uptake approaches are contrasted in pure pine and mixed oak-pine stands. The first approach consists of an empirical function for root water uptake. The second approach is more mechanistic and calculates the differences in soil water potential along a soil-plant-atmosphere continuum, with the total root resistance assumed at low, medium, and high levels. For validation purposes, three data sets on different growth-relevant time scales are used. Results show that, except for the mechanistic water uptake approach with high total root resistance, all transpiration outputs exceeded observed values. On the other hand, high transpiration led to a better match with observed soil water content. The strongest correlation between simulated and observed annual tree-ring width occurred with the mechanistic water uptake approach and high total root resistance. The findings highlight the importance of severe drought as a main reason for small diameter increments, best captured by the mechanistic water uptake approach with high root resistance. However, if all aspects of the data sets are considered, no approach can be judged superior to the other. I conclude that the uncertainty of future productivity of water-limited forest ecosystems under changing environmental conditions is linked to simulated root water uptake.
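The mechanistic approach follows an Ohm's-law analogy: water flux equals the soil-to-leaf water potential difference divided by the total resistance along the continuum. The sketch below illustrates how the assumed low, medium, and high root resistance levels modulate uptake; all values are invented.

```python
# Ohm's-law analogy for mechanistic root water uptake: flux equals the
# soil-to-leaf water potential difference divided by the total resistance
# along the soil-plant-atmosphere continuum. All values are invented.
def root_water_uptake(psi_soil, psi_leaf, r_total):
    """Water flux (arbitrary units) through the continuum."""
    return (psi_soil - psi_leaf) / r_total

psi_soil, psi_leaf = -0.3, -1.8   # water potentials in MPa
for r_total in (1.0, 2.0, 4.0):   # low, medium, high total root resistance
    flux = root_water_uptake(psi_soil, psi_leaf, r_total)
    print(f"resistance {r_total}: flux {flux:.2f}")
```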
Finally, my study addressed the impacts of climate change combined with management scenarios on an oak-pine forest to evaluate growth, biomass, and the amount of harvested timber. The pine and oak trees are 104 and 9 years old, respectively. Three management scenarios with different thinning intensities and different climate scenarios are used to simulate the performance of management strategies that explicitly account for the risks associated with achieving three predefined objectives (maximum carbon storage, maximum harvested timber, intermediate). I found that in most cases there is no general management strategy that fits all objectives best. The analysis of variance in the growth-related model outputs showed an increase of climate-induced uncertainty with increasing climate warming. Interestingly, this increase in uncertainty is much larger from 2 to 3 K than from 0 to 2 K.