TY  - INPR
A1  - Prasse, Paul
A1  - Gruben, Gerrit
A1  - Machlika, Lukas
A1  - Pevny, Tomas
A1  - Sofka, Michal
A1  - Scheffer, Tobias
T1  - Malware Detection by HTTPS Traffic Analysis
N2  - In order to evade detection by network-traffic analysis, a growing proportion of malware uses the encrypted HTTPS protocol. We explore the problem of detecting malware on client computers based on HTTPS traffic analysis. In this setting, malware has to be detected based on the host IP address, ports, timestamp, and data volume information of TCP/IP packets that are sent and received by all the applications on the client. We develop a scalable protocol that allows us to collect network flows of known malicious and benign applications as training data and derive a malware-detection method based on neural networks and sequence classification. We study the method's ability to detect known and new, unknown malware in a large-scale empirical study.
KW  - machine learning
KW  - computer security
Y1  - 2017
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-100942
ER  - 
TY  - JOUR
A1  - Kibrik, Andrej A.
A1  - Khudyakova, Mariya V.
A1  - Dobrov, Grigory B.
A1  - Linnik, Anastasia
A1  - Zalmanov, Dmitrij A.
T1  - Referential Choice
BT  - Predictability and Its Limits
JF  - Frontiers in psychology
N2  - We report a study of referential choice in discourse production, understood as the choice between various types of referential devices, such as pronouns and full noun phrases. Our goal is to predict referential choice, and to explore to what extent such prediction is possible. Our approach to referential choice includes a cognitively informed theoretical component, corpus analysis, machine learning methods and experimentation with human participants. Machine learning algorithms make use of 25 factors, including referent’s properties (such as animacy and protagonism), the distance between a referential expression and its antecedent, the antecedent’s syntactic role, and so on. Having found the predictions of our algorithm to coincide with the original almost 90% of the time, we hypothesized that fully accurate prediction is not possible because, in many situations, more than one referential option is available. This hypothesis was supported by an experimental study, in which participants answered questions about either the original text in the corpus, or about a text modified in accordance with the algorithm’s prediction. Proportions of correct answers to these questions, as well as participants’ rating of the questions’ difficulty, suggested that divergences between the algorithm’s prediction and the original referential device in the corpus occur overwhelmingly in situations where the referential choice is not categorical.
KW  - referential choice
KW  - non-categoricity
KW  - machine learning
KW  - cross-methodological approach
KW  - discourse production
Y1  - 2016
U6  - https://doi.org/10.3389/fpsyg.2016.01429
SN  - 1664-1078
VL  - 7
PB  - Frontiers Research Foundation
CY  - Lausanne
ER  - 
TY  - GEN
A1  - Kibrik, Andrej A.
A1  - Khudyakova, Mariya V.
A1  - Dobrov, Grigory B.
A1  - Linnik, Anastasia
A1  - Zalmanov, Dmitrij A.
T1  - Referential Choice
BT  - Predictability and Its Limits
N2  - We report a study of referential choice in discourse production, understood as the choice between various types of referential devices, such as pronouns and full noun phrases. Our goal is to predict referential choice, and to explore to what extent such prediction is possible. Our approach to referential choice includes a cognitively informed theoretical component, corpus analysis, machine learning methods and experimentation with human participants. Machine learning algorithms make use of 25 factors, including referent’s properties (such as animacy and protagonism), the distance between a referential expression and its antecedent, the antecedent’s syntactic role, and so on. Having found the predictions of our algorithm to coincide with the original almost 90% of the time, we hypothesized that fully accurate prediction is not possible because, in many situations, more than one referential option is available. This hypothesis was supported by an experimental study, in which participants answered questions about either the original text in the corpus, or about a text modified in accordance with the algorithm’s prediction. Proportions of correct answers to these questions, as well as participants’ rating of the questions’ difficulty, suggested that divergences between the algorithm’s prediction and the original referential device in the corpus occur overwhelmingly in situations where the referential choice is not categorical.
T3  - Zweitveröffentlichungen der Universität Potsdam : Humanwissenschaftliche Reihe - 306 
KW  - cross-methodological approach
KW  - discourse production
KW  - machine learning
KW  - non-categoricity
KW  - referential choice
Y1  - 2016
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-100313
ER  - 
TY  - JOUR
A1  - Kibrik, Andrej A.
A1  - Khudyakova, Mariya V.
A1  - Dobrov, Grigory B.
A1  - Linnik, Anastasia
A1  - Zalmanov, Dmitrij A.
T1  - Referential Choice: Predictability and Its Limits
JF  - Frontiers in psychology
N2  - We report a study of referential choice in discourse production, understood as the choice between various types of referential devices, such as pronouns and full noun phrases. Our goal is to predict referential choice, and to explore to what extent such prediction is possible. Our approach to referential choice includes a cognitively informed theoretical component, corpus analysis, machine learning methods and experimentation with human participants. Machine learning algorithms make use of 25 factors, including referent’s properties (such as animacy and protagonism), the distance between a referential expression and its antecedent, the antecedent’s syntactic role, and so on. Having found the predictions of our algorithm to coincide with the original almost 90% of the time, we hypothesized that fully accurate prediction is not possible because, in many situations, more than one referential option is available. This hypothesis was supported by an experimental study, in which participants answered questions about either the original text in the corpus, or about a text modified in accordance with the algorithm’s prediction. Proportions of correct answers to these questions, as well as participants’ rating of the questions’ difficulty, suggested that divergences between the algorithm’s prediction and the original referential device in the corpus occur overwhelmingly in situations where the referential choice is not categorical.
KW  - referential choice
KW  - non-categoricity
KW  - machine learning
KW  - cross-methodological approach
KW  - discourse production
Y1  - 2016
U6  - https://doi.org/10.3389/fpsyg.2016.01429
SN  - 1664-1078
VL  - 7
SP  - 9939
EP  - 9947
PB  - Frontiers Research Foundation
CY  - Lausanne
ER  - 
TY  - GEN
A1  - Hollstein, André
A1  - Segl, Karl
A1  - Guanter, Luis
A1  - Brell, Maximilian
A1  - Enesco, Marta
T1  - Ready-to-Use methods for the detection of clouds, cirrus, snow, shadow, water and clear sky pixels in Sentinel-2 MSI images
T2  - remote sensing
N2  - Classification of clouds, cirrus, snow, shadows and clear sky areas is a crucial step in the pre-processing of optical remote sensing images and is a valuable input for their atmospheric correction. The Multi-Spectral Imager on board the Sentinel-2 satellites of the Copernicus program offers optimized bands for this task and delivers unprecedented amounts of data regarding spatial sampling, global coverage, spectral coverage, and repetition rate. Efficient algorithms are needed to process, or possibly reprocess, these large amounts of data. Techniques based on top-of-atmosphere reflectance spectra for single pixels, without exploitation of external data or spatial context, offer the largest potential for parallel data processing and highly optimized processing throughput. Such algorithms can be seen as a baseline for possible trade-offs in processing performance when the application of more sophisticated methods is discussed. We present several ready-to-use classification algorithms which are all based on a publicly available database of manually classified Sentinel-2A images. These algorithms are based on commonly used and newly developed machine learning techniques which drastically reduce the amount of time needed to update the algorithms when new images are added to the database. Several ready-to-use decision trees are presented which correctly label about 91% of the spectra within a validation dataset. While decision trees are simple to implement and easy to understand, they offer only limited classification skill. The share of correctly labeled spectra improves to 98% when the presented algorithm based on the classical Bayesian method is applied. This method has only recently been used for this task and shows excellent performance concerning classification skill and processing performance. A comparison of the presented algorithms with other commonly used techniques such as random forests, stochastic gradient descent, or support vector machines is also given. In particular, random forests and support vector machines show classification skill similar to that of the classical Bayesian method.
T3  - Zweitveröffentlichungen der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe - 455 
KW  - Sentinel-2 MSI
KW  - cloud detection
KW  - snow detection
KW  - cirrus detection
KW  - shadow detection
KW  - Bayesian classification
KW  - machine learning
KW  - decision trees
Y1  - 2018
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-407938
ER  - 
TY  - CHAP
ED  - Meinel, Christoph
ED  - Polze, Andreas
ED  - Oswald, Gerhard
ED  - Strotmann, Rolf
ED  - Seibold, Ulrich
ED  - Schulzki, Bernhard
T1  - HPI Future SOC Lab
BT  - Proceedings 2016
N2  - The “HPI Future SOC Lab” is a cooperation of the Hasso Plattner Institute (HPI) and industrial partners. Its mission is to enable and promote exchange and interaction between the research community and the industrial partners.
  The HPI Future SOC Lab provides researchers with free-of-charge access to a complete infrastructure of state-of-the-art hardware and software. This infrastructure includes components that might be too expensive for an ordinary research environment, such as servers with up to 64 cores and 2 TB main memory. The offerings address researchers particularly from, but not limited to, the areas of computer science and business information systems. Main areas of research include cloud computing, parallelization, and In-Memory technologies.
  This technical report presents results of research projects executed in 2016. Selected projects presented their results on April 5th and November 3rd 2016 at the Future SOC Lab Day events.
N2  - The Future SOC Lab at HPI is a cooperation between the Hasso Plattner Institute and various industrial partners. Its mission is to enable and promote exchange between the research community and industry.
  The lab provides interested researchers with an infrastructure of the latest hardware and software, free of charge, for research purposes. This includes technologies, some of them not yet available on the market, that would normally be unaffordable in an ordinary university setting, e.g. servers with up to 64 cores and 2 TB of main memory. These offerings are aimed in particular at researchers in the fields of computer science and business information systems. Some of the focus areas are cloud computing, parallelization, and In-Memory technologies.
  This technical report presents the results of the research projects of 2016. Selected projects presented their results on April 5, 2016 and November 3, 2016 at the Future SOC Lab Day events.
KW  - Future SOC Lab
KW  - research projects
KW  - multicore architectures
KW  - In-Memory technology
KW  - cloud computing
KW  - machine learning
KW  - artificial intelligence
KW  - Forschungsprojekte
KW  - Multicore Architekturen
KW  - In-Memory Technologie
KW  - Cloud Computing
KW  - maschinelles Lernen
KW  - künstliche Intelligenz
Y1  - 2016
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-406787
ER  -