TY  - GEN
A1  - Fichte, Johannes Klaus
A1  - Truszczynski, Miroslaw
A1  - Woltran, Stefan
T1  - Dual-normal logic programs
BT  - the forgotten class
T2  - Postprints der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe
N2  - Disjunctive Answer Set Programming is a powerful declarative programming paradigm with complexity beyond NP. Identifying classes of programs for which the consistency problem is in NP is of interest from the theoretical standpoint and can potentially lead to improvements in the design of answer set programming solvers. One of such classes consists of dual-normal programs, where the number of positive body atoms in proper rules is at most one. Unlike other classes of programs, dual-normal programs have received little attention so far. In this paper we study this class. We relate dual-normal programs to propositional theories and to normal programs by presenting several inter-translations. With the translation from dual-normal to normal programs at hand, we introduce the novel class of body-cycle free programs, which are in many respects dual to head-cycle free programs. We establish the expressive power of dual-normal programs in terms of SE- and UE-models, and compare them to normal programs. We also discuss the complexity of deciding whether dual-normal programs are strongly and uniformly equivalent.
T3  - Zweitveröffentlichungen der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe - 585 
KW  - answer set programming
KW  - classes of logic programs
KW  - strong and uniform equivalence
KW  - propositional satisfiability
Y1  - 2019
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-414490
SN  - 1866-8372
IS  - 585
ER  - 
TY  - GEN
A1  - Arvidsson, Samuel Janne
A1  - Kwasniewski, Miroslaw
A1  - Riaño- Pachón, Diego Mauricio
A1  - Mueller-Roeber, Bernd
T1  - QuantPrime
BT  - a flexible tool for reliable high-throughput primer design for quantitative PCR
T2  - Postprints der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe
N2  - Background

Medium- to large-scale expression profiling using quantitative polymerase chain reaction (qPCR) assays are becoming increasingly important in genomics research. A major bottleneck in experiment preparation is the design of specific primer pairs, where researchers have to make several informed choices, often outside their area of expertise. Using currently available primer design tools, several interactive decisions have to be made, resulting in lengthy design processes with varying qualities of the assays.
Results

Here we present QuantPrime, an intuitive and user-friendly, fully automated tool for primer pair design in small- to large-scale qPCR analyses. QuantPrime can be used online through the internet http://www.quantprime.de/ or on a local computer after download; it offers design and specificity checking with highly customizable parameters and is ready to use with many publicly available transcriptomes of important higher eukaryotic model organisms and plant crops (currently 295 species in total), while benefiting from exon-intron border and alternative splice variant information in available genome annotations. Experimental results with the model plant Arabidopsis thaliana, the crop Hordeum vulgare and the model green alga Chlamydomonas reinhardtii show success rates of designed primer pairs exceeding 96%.
Conclusion

QuantPrime constitutes a flexible, fully automated web application for reliable primer design for use in larger qPCR experiments, as proven by experimental data. The flexible framework is also open for simple use in other quantification applications, such as hydrolyzation probe design for qPCR and oligonucleotide probe design for quantitative in situ hybridization. Future suggestions made by users can be easily implemented, thus allowing QuantPrime to be developed into a broad-range platform for the design of RNA expression assays.
T3  - Zweitveröffentlichungen der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe - 943 
KW  - prime pair
KW  - genome annotation
KW  - specific prime pair
KW  - primer pair design
KW  - quantification protocol
Y1  - 2020
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-431531
SN  - 1866-8372
IS  - 943
ER  - 
TY  - INPR
A1  - Prasse, Paul
A1  - Gruben, Gerrit
A1  - Machlika, Lukas
A1  - Pevny, Tomas
A1  - Sofka, Michal
A1  - Scheffer, Tobias
T1  - Malware Detection by HTTPS Traffic Analysis
N2  - In order to evade detection by network-traffic analysis, a growing proportion of malware uses the encrypted HTTPS protocol. We explore the problem of detecting  malware on client computers based on HTTPS traffic analysis. In this setting, malware has to be detected based on the host IP address, ports, timestamp,  and data volume information of TCP/IP packets that are sent and received by all the applications on the client. We develop a scalable protocol that allows us to collect network flows of known malicious and benign applications as training data and derive a malware-detection method based on a neural networks and sequence classification. We study the method's ability to detect known and new, unknown malware in a large-scale empirical study.
KW  - machine learning
KW  - computer security
Y1  - 2017
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-100942
ER  - 
TY  - THES
A1  - Tinnefeld, Christian
T1  - Building a columnar database on shared main memory-based storage
BT  - database operator placement in a shared main memory-based storage system that supports data access and code execution
N2  - In the field of disk-based parallel database management systems exists a great variety of solutions based on a shared-storage or a shared-nothing architecture. In contrast, main memory-based parallel database management systems are dominated solely by the shared-nothing approach as it preserves the in-memory performance advantage by processing data locally on each server. We argue that this unilateral development is going to cease due to the combination of the following three trends: a) Nowadays network technology features remote direct memory access (RDMA) and narrows the performance gap between accessing main memory inside a server and of a remote server to and even below a single order of magnitude. b) Modern storage systems scale gracefully, are elastic, and provide high-availability. c) A modern storage system such as Stanford's RAMCloud even keeps all data resident in main memory. Exploiting these characteristics in the context of a main-memory parallel database management system is desirable. The advent of RDMA-enabled network technology makes the creation of a parallel main memory DBMS based on a shared-storage approach feasible.

This thesis describes building a columnar database on shared main memory-based storage. The thesis discusses the resulting architecture (Part I), the implications on query processing (Part II), and presents an evaluation of the resulting solution in terms of performance, high-availability, and elasticity (Part III).

In our architecture, we use Stanford's RAMCloud as shared-storage, and the self-designed and developed in-memory AnalyticsDB as relational query processor on top. AnalyticsDB encapsulates data access and operator execution via an interface which allows seamless switching between local and remote main memory, while RAMCloud provides not only storage capacity, but also processing power. Combining both aspects allows pushing-down the execution of database operators into the storage system. We describe how the columnar data processed by AnalyticsDB is mapped to RAMCloud's key-value data model and how the performance advantages of columnar data storage can be preserved.

The combination of fast network technology and the possibility to execute database operators in the storage system opens the discussion for site selection. We construct a system model that allows the estimation of operator execution costs in terms of network transfer, data processed in memory, and wall time. This can be used for database operators that work on one relation at a time - such as a scan or materialize operation - to discuss the site selection problem (data pull vs. operator push). Since a database query translates to the execution of several database operators, it is possible that the optimal site selection varies per operator. For the execution of a database operator that works on two (or more) relations at a time, such as a join, the system model is enriched by additional factors such as the chosen algorithm (e.g. Grace- vs. Distributed Block Nested Loop Join vs. Cyclo-Join), the data partitioning of the respective relations, and their overlapping as well as the allowed resource allocation.

We present an evaluation on a cluster with 60 nodes where all nodes are connected via RDMA-enabled network equipment. We show that query processing performance is about 2.4x slower if everything is done via the data pull operator execution strategy (i.e. RAMCloud is being used only for data access) and about 27% slower if operator execution is also supported inside RAMCloud (in comparison to operating only on main memory inside a server without any network communication at all). The fast-crash recovery feature of RAMCloud can be leveraged to provide high-availability, e.g. a server crash during query execution only delays the query response for about one second. Our solution is elastic in a way that it can adapt to changing workloads a) within seconds, b) without interruption of the ongoing query processing, and c) without manual intervention.
N2  - Diese Arbeit beschreibt die Erstellung einer spalten-orientierten Datenbank auf einem geteilten, Hauptspeicher-basierenden Speichersystem. Motiviert wird diese Arbeit durch drei Faktoren. Erstens ist moderne Netzwerktechnologie mit “Remote Direct Memory Access” (RDMA) ausgestattet. Dies reduziert den Unterschied hinsichtlich Latenz und Durchsatz zwischen dem Speicherzugriff innerhalb eines Rechners und auf einen entfernten Rechner auf eine Größenordnung. Zweitens skalieren moderne Speichersysteme, sind elastisch und hochverfügbar. Drittens hält ein modernes Speichersystem wie Stanford's RAMCloud alle Daten im Hauptspeicher vor. Diese Eigenschaften im Kontext einer spalten-orientierten Datenbank zu nutzen ist erstrebenswert. Die Arbeit ist in drei Teile untergliedert. Der erste Teile beschreibt die Architektur einer spalten-orientierten Datenbank auf einem geteilten, Hauptspeicher-basierenden Speichersystem. Hierbei werden die im Rahmen dieser Arbeit entworfene und entwickelte Datenbank AnalyticsDB sowie Stanford's RAMCloud verwendet. Die Architektur beschreibt wie Datenzugriff und Operatorausführung gekapselt werden um nahtlos zwischen lokalem und entfernten Hauptspeicher wechseln zu können. Weiterhin wird die Ablage der nach einem relationalen Schema formatierten Daten von AnalyticsDB in RAMCloud behandelt, welches mit einem Schlüssel-Wertpaar Datenmodell operiert. Der zweite Teil fokussiert auf die Implikationen bei der Abarbeitung von Datenbankanfragen. Hier steht die Diskussion im Vordergrund wo (entweder in AnalyticsDB oder in RAMCloud) und mit welcher Parametrisierung einzelne Datenbankoperationen ausgeführt werden. Dafür werden passende Kostenmodelle vorgestellt, welche die Abbildung von Datenbankoperationen ermöglichen, die auf einer oder mehreren Relationen arbeiten. Der dritte Teil der Arbeit präsentiert eine Evaluierung auf einem Verbund von 60 Rechnern hinsichtlich der Leistungsfähigkeit, der Hochverfügbarkeit und der Elastizität vom System.
T2  - Die Erstellung einer spaltenorientierten Datenbank auf einem verteilten, Hauptspeicher-basierenden Speichersystem
KW  - computer science
KW  - database technology
KW  - main memory computing
KW  - cloud computing
KW  - verteilte Datenbanken
KW  - Hauptspeicher Technologie
KW  - virtualisierte IT-Infrastruktur
Y1  - 2014
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-72063
ER  - 
TY  - GEN
A1  - Margaria, Tiziana
A1  - Kubczak, Christian
A1  - Steffen, Bernhard
T1  - Bio-jETI
BT  - a service integration, design, and provisioning platform for orchestrated bioinformatics processes
T2  - Postprints der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe
N2  - Background: With Bio-jETI, we introduce a service platform for interdisciplinary work on biological application domains and illustrate its use in a concrete application concerning statistical data processing in R and xcms for an LC/MS analysis of FAAH gene knockout.

Methods: Bio-jETI uses the jABC environment for service-oriented modeling and design as a graphical process modeling tool and the jETI service integration technology for remote tool execution.

Conclusions: As a service definition and provisioning platform, Bio-jETI has the potential to become a core technology in interdisciplinary service orchestration and technology transfer. Domain experts, like biologists not trained in computer science, directly define complex service orchestrations as process models and use efficient and complex bioinformatics tools in a simple and intuitive way.
T3  - Zweitveröffentlichungen der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe - 822 
KW  - fatty acid amide hydrolase
KW  - composite service
KW  - service orchestration
KW  - rest service
KW  - electronic tool integration
Y1  - 2020
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-428868
IS  - 822
ER  - 
TY  - GEN
A1  - Dworschak, Steve
A1  - Grell, Susanne
A1  - Nikiforova, Victoria J.
A1  - Schaub, Torsten H.
A1  - Selbig, Joachim
T1  - Modeling biological networks by action languages via answer set programming
T2  - Postprints der Universität Potsdam : Mathematisch Naturwissenschaftliche Reihe
N2  - We describe an approach to modeling biological networks by action languages via answer set programming. To this end, we propose an action language for modeling biological networks, building on previous work by Baral et al. We introduce its syntax and semantics along with a translation into answer set programming, an efficient Boolean Constraint Programming Paradigm. Finally, we describe one of its applications, namely, the sulfur starvation response-pathway of the model plant Arabidopsis thaliana and sketch the functionality of our system and its usage.
T3  - Zweitveröffentlichungen der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe - 843 
KW  - biological network model
KW  - action language
KW  - answer set programming
Y1  - 2020
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-429846
SN  - 1866-8372
IS  - 843
ER  - 
TY  - GEN
A1  - Repsilber, Dirk
A1  - Kern, Sabine
A1  - Telaar, Anna
A1  - Walzl, Gerhard
A1  - Black, Gillian F.
A1  - Selbig, Joachim
A1  - Parida, Shreemanta K.
A1  - Kaufmann, Stefan H. E.
A1  - Jacobsen, Marc
T1  - Biomarker discovery in heterogeneous tissue samples
BT  - taking the in-silico deconfounding approach
T2  - Postprints der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe
N2  - Background: For heterogeneous tissues, such as blood, measurements of gene expression are confounded by relative proportions of cell types involved. Conclusions have to rely on estimation of gene expression signals for homogeneous cell populations, e.g. by applying micro-dissection, fluorescence activated cell sorting, or in-silico deconfounding. We studied feasibility and validity of a non-negative matrix decomposition algorithm using experimental gene expression data for blood and sorted cells from the same donor samples. Our objective was to optimize the algorithm regarding detection of differentially expressed genes and to enable its use for classification in the difficult scenario of reversely regulated genes. This would be of importance for the identification of candidate biomarkers in heterogeneous tissues.

Results: Experimental data and simulation studies involving noise parameters estimated from these data revealed that for valid detection of differential gene expression, quantile normalization and use of non-log data are optimal. We demonstrate the feasibility of predicting proportions of constituting cell types from gene expression data of single samples, as a prerequisite for a deconfounding-based classification approach. Classification cross-validation errors with and without using deconfounding results are reported as well as sample-size dependencies. Implementation of the algorithm, simulation and analysis scripts are available.

Conclusions: The deconfounding algorithm without decorrelation using quantile normalization on non-log data is proposed for biomarkers that are difficult to detect, and for cases where confounding by varying proportions of cell types is the suspected reason. In this case, a deconfounding ranking approach can be used as a powerful alternative to, or complement of, other statistical learning approaches to define candidate biomarkers for molecular diagnosis and prediction in biomedicine, in realistically noisy conditions and with moderate sample sizes.
T3  - Zweitveröffentlichungen der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe - 854 
KW  - differential gene expression
KW  - quantile normalization
KW  - heterogeneous tissue
KW  - gene expression matrix
KW  - homogeneous cell population
KW  - selection
KW  - microdissection
Y1  - 2020
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-429343
SN  - 1866-8372
IS  - 854
ER  - 
TY  - GEN
A1  - Margaria, Tiziana
A1  - Steffen, Bernhard
A1  - Kubczak, Christian
T1  - Evolution support in heterogeneous service-oriented landscapes
T2  - Postprints der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe
N2  - We present an approach that provides automatic or semi-automatic support for evolution and change management in heterogeneous legacy landscapes where (1) legacy heterogeneous, possibly distributed platforms are integrated in a service oriented fashion, (2) the coordination of functionality is provided at the service level, through orchestration, (3) compliance and correctness are provided through policies and business rules, (4) evolution and correctness-by-design are supported by the eXtreme Model Driven Development paradigm (XMDD) offered by the jABC (Margaria and Steffen in Annu. Rev. Commun. 57, 2004)—the model-driven service oriented development platform we use here for integration, design, evolution, and governance. The artifacts are here semantically enriched, so that automatic synthesis plugins can field the vision of Enterprise Physics: knowledge driven business process development for the end user.

We demonstrate this vision along a concrete case study that became over the past three years a benchmark for Semantic Web Service discovery and mediation. We enhance the Mediation Scenario of the Semantic Web Service Challenge along the 2 central evolution paradigms that occur in practice: (a) Platform migration: platform substitution of a legacy system by an ERP system and (b) Backend extension: extension of the legacy Customer Relationship Management (CRM) and Order Management System (OMS) backends via an additional ERP layer.
T3  - Zweitveröffentlichungen der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe - 918 
KW  - evolving systems
KW  - semantic web services
KW  - service mediation
KW  - web services
KW  - SOA
Y1  - 2020
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-432405
SN  - 1866-8372
IS  - 918
ER  - 
TY  - GEN
A1  - Larhlimi, Abdelhalim
A1  - David, Laszlo
A1  - Selbig, Joachim
A1  - Bockmayr, Alexander
T1  - F2C2
BT  - a fast tool for the computation of flux coupling in genome-scale metabolic networks
T2  - Postprints der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe
N2  - Background: Flux coupling analysis (FCA) has become a useful tool in the constraint-based analysis of genome-scale metabolic networks. FCA allows detecting dependencies between reaction fluxes of metabolic networks at steady-state. On the one hand, this can help in the curation of reconstructed metabolic networks by verifying whether the coupling between reactions is in agreement with the experimental findings. On the other hand, FCA can aid in defining intervention strategies to knock out target reactions.

Results: We present a new method F2C2 for FCA, which is orders of magnitude faster than previous approaches. As a consequence, FCA of genome-scale metabolic networks can now be performed in a routine manner.

Conclusions: We propose F2C2 as a fast tool for the computation of flux coupling in genome-scale metabolic networks. F2C2 is freely available for non-commercial use at https://sourceforge.net/projects/f2c2/files/.
T3  - Zweitveröffentlichungen der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe - 921 
KW  - balance analysis
KW  - reconstruction
KW  - pathways
KW  - models
KW  - metabolic network
KW  - couple reaction
KW  - reversible reaction
KW  - linear programming problem
KW  - coupling relationship
Y1  - 2020
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-432431
SN  - 1866-8372
IS  - 921
ER  - 
TY  - GEN
A1  - Wallenta, Daniel
T1  - A Lefschetz fixed point formula for elliptic quasicomplexes
T2  - Postprints der Universität Potsdam : Mathematisch Naturwissenschaftliche Reihe
N2  - In a recent paper, the Lefschetz number for endomorphisms (modulo trace class operators) of sequences of trace class curvature was introduced. We show that this is a well defined, canonical extension of the classical Lefschetz number and establish the homotopy invariance of this number. Moreover, we apply the results to show that the Lefschetz fixed point formula holds for geometric quasiendomorphisms of elliptic quasicomplexes.
T3  - Zweitveröffentlichungen der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe - 885 
KW  - elliptic complexes
KW  - Fredholm complexes
KW  - Lefschetz number
Y1  - 2020
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-435471
SN  - 1866-8372
IS  - 885
SP  - 577
EP  - 587
ER  -