High-throughput RNA sequencing produces large gene expression datasets whose analysis leads to a better understanding of diseases like cancer. The nature of RNA-Seq data poses challenges to its analysis: high dimensionality, noise, and the complexity of the underlying biological processes. Researchers apply traditional machine learning approaches, e.g., hierarchical clustering, to analyze this data. Until the results are validated, the analysis relies solely on the provided data and ignores the biological context. However, gene expression data follows particular patterns that reflect the underlying biological processes. In our research, we aim to integrate the available biological knowledge earlier in the analysis process. We want to adapt state-of-the-art data mining algorithms to consider the biological context in their computations and deliver meaningful results for researchers.
Detection of malware-infected computers and detection of malicious web domains based on their encrypted HTTPS traffic are challenging problems, because only addresses, timestamps, and data volumes are observable. The detection problems are coupled, because infected clients tend to interact with malicious domains. Traffic data can be collected at a large scale, and antivirus tools can be used to identify infected clients in retrospect. Domains, by contrast, have to be labeled individually after forensic analysis. We explore transfer learning based on sluice networks; this allows the detection models to bootstrap each other. In a large-scale experimental study, we find that the model outperforms known reference models and detects previously unknown malware, previously unknown malware families, and previously unknown malicious domains.
The Kp index is a measure of the midlatitude global geomagnetic activity and represents short-term magnetic variations driven by solar wind plasma and the interplanetary magnetic field. The Kp index is one of the most widely used indicators for space weather alerts and serves as input to various models, such as those of the thermosphere and the radiation belts. It is therefore crucial to predict the Kp index accurately. Previous work in this area has mostly employed artificial neural networks to nowcast Kp, basing their inferences on the recent history of Kp and on solar wind measurements at L1. In this study, we systematically test how different machine learning techniques perform on the task of nowcasting and forecasting Kp for prediction horizons of up to 12 hr. Additionally, we investigate different methods of machine learning and information theory for selecting the optimal inputs to a predictive model. We illustrate how these methods can be applied to select the most important inputs to a predictive model of Kp and to significantly reduce input dimensionality. We compare our best performing models, based on a reduced set of optimal inputs, with the existing models of Kp, using different test intervals, and show how this selection can affect model performance.
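As an illustration of information-theoretic input selection in general, the following minimal sketch ranks candidate inputs by a histogram estimate of their mutual information with a target series. It is not the method used in the study: the binning scheme, the function names (`discretize`, `mutual_information`, `rank_inputs`), and the synthetic series `driver`, `unrelated`, and `target` are all assumptions made for this example, and no real Kp or solar wind data is involved.

```python
import math
import random
from collections import Counter


def discretize(values, bins=8):
    """Map continuous values to equal-width bin indices in [0, bins - 1]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 for constant inputs
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(x, y, bins=8):
    """Histogram estimate of the mutual information I(X; Y) in nats."""
    xs, ys = discretize(x, bins), discretize(y, bins)
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    # Sum of p(a, b) * log(p(a, b) / (p(a) * p(b))) over occupied bins.
    return sum(
        (c / n) * math.log(c * n / (px[a] * py[b]))
        for (a, b), c in joint.items()
    )


def rank_inputs(candidates, target, bins=8):
    """Return (name, score) pairs sorted by decreasing mutual information."""
    scores = {
        name: mutual_information(column, target, bins)
        for name, column in candidates.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Synthetic demonstration data (hypothetical, not real measurements):
# `target` depends on `driver` but not on `unrelated`.
random.seed(0)
driver = [random.gauss(0.0, 1.0) for _ in range(2000)]
unrelated = [random.gauss(0.0, 1.0) for _ in range(2000)]
target = [2.0 * d + random.gauss(0.0, 0.3) for d in driver]

ranking = rank_inputs({"driver": driver, "unrelated": unrelated}, target)
for name, score in ranking:
    print(f"{name}: {score:.3f} nats")
```

Inputs with scores near zero carry little information about the target and are candidates for removal, which is one simple way to reduce input dimensionality. Histogram estimates are biased for small samples, so the bin count here is a tunable assumption.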