TY - GEN A1 - Wallenta, Daniel T1 - A Lefschetz fixed point formula for elliptic quasicomplexes T2 - Postprints der Universität Potsdam : Mathematisch Naturwissenschaftliche Reihe N2 - In a recent paper, the Lefschetz number for endomorphisms (modulo trace class operators) of sequences of trace class curvature was introduced. We show that this is a well defined, canonical extension of the classical Lefschetz number and establish the homotopy invariance of this number. Moreover, we apply the results to show that the Lefschetz fixed point formula holds for geometric quasiendomorphisms of elliptic quasicomplexes. T3 - Zweitveröffentlichungen der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe - 885 KW - elliptic complexes KW - Fredholm complexes KW - Lefschetz number Y1 - 2020 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-435471 SN - 1866-8372 IS - 885 SP - 577 EP - 587 ER - TY - GEN A1 - Gebser, Martin A1 - Harrison, Amelia A1 - Kaminski, Roland A1 - Lifschitz, Vladimir A1 - Schaub, Torsten H. T1 - Abstract gringo T2 - Postprints der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe N2 - This paper defines the syntax and semantics of the input language of the ASP grounder gringo. The definition covers several constructs that were not discussed in earlier work on the semantics of that language, including intervals, pools, division of integers, aggregates with non-numeric values, and lparse-style aggregate expressions. The definition is abstract in the sense that it disregards some details related to representing programs by strings of ASCII characters. It serves as a specification for gringo from Version 4.5 on. T3 - Zweitveröffentlichungen der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe - 592 KW - nested expressions Y1 - 2019 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-414751 SN - 1866-8372 IS - 592 ER - TY - GEN A1 - Banbara, Mutsunori A1 - Soh, Takehide A1 - Tamura, Naoyuki A1 - Inoue, Katsumi A1 - Schaub, Torsten H. T1 - Answer set programming as a modeling language for course timetabling T2 - Postprints der Universität Potsdam : Mathematisch Naturwissenschaftliche Reihe N2 - The course timetabling problem can be generally defined as the task of assigning a number of lectures to a limited set of timeslots and rooms, subject to a given set of hard and soft constraints. The modeling language for course timetabling is required to be expressive enough to specify a wide variety of soft constraints and objective functions. Furthermore, the resulting encoding is required to be extensible for capturing new constraints and for switching them between hard and soft, and to be flexible enough to deal with different formulations. In this paper, we propose to make effective use of ASP as a modeling language for course timetabling. We show that our ASP-based approach can naturally satisfy the above requirements, through an ASP encoding of the curriculum-based course timetabling problem proposed in the third track of the second international timetabling competition (ITC-2007). Our encoding is compact and human-readable, since each constraint is individually expressed by either one or two rules. Each hard constraint is expressed by using integrity constraints and aggregates of ASP. Each soft constraint S is expressed by rules in which the head is the form of penalty (S, V, C), and a violation V and its penalty cost C are detected and calculated respectively in the body. We carried out experiments on four different benchmark sets with five different formulations. We succeeded either in improving the bounds or producing the same bounds for many combinations of problem instances and formulations, compared with the previous best known bounds. T3 - Zweitveröffentlichungen der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe - 594 KW - answer set programming KW - educational timetabling KW - course timetabling Y1 - 2019 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-415469 SN - 1866-8372 IS - 594 SP - 783 EP - 798 ER - TY - GEN A1 - Ostrowski, Max A1 - Schaub, Torsten H. T1 - ASP modulo CSP BT - the clingcon system T2 - Postprints der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe N2 - We present the hybrid ASP solver clingcon, combining the simple modeling language and the high performance Boolean solving capacities of Answer Set Programming (ASP) with techniques for using non-Boolean constraints from the area of Constraint Programming (CP). The new clingcon system features an extended syntax supporting global constraints and optimize statements for constraint variables. The major technical innovation improves the interaction between ASP and CP solver through elaborated learning techniques based on irreducible inconsistent sets. A broad empirical evaluation shows that these techniques yield a performance improvement of an order of magnitude. T3 - Zweitveröffentlichungen der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe - 579 KW - answer set KW - constraints KW - logic KW - SMT Y1 - 2019 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-413908 SN - 1866-8372 IS - 579 ER - TY - GEN A1 - Hoos, Holger A1 - Kaminski, Roland A1 - Lindauer, Marius A1 - Schaub, Torsten H. T1 - aspeed BT - solver scheduling via answer set programming T2 - Postprints der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe N2 - Although Boolean Constraint Technology has made tremendous progress over the last decade, the efficacy of state-of-the-art solvers is known to vary considerably across different types of problem instances, and is known to depend strongly on algorithm parameters. This problem was addressed by means of a simple, yet effective approach using handmade, uniform, and unordered schedules of multiple solvers in ppfolio, which showed very impressive performance in the 2011 Satisfiability Testing (SAT) Competition. Inspired by this, we take advantage of the modeling and solving capacities of Answer Set Programming (ASP) to automatically determine more refined, that is, nonuniform and ordered solver schedules from the existing benchmarking data. We begin by formulating the determination of such schedules as multi-criteria optimization problems and provide corresponding ASP encodings. The resulting encodings are easily customizable for different settings, and the computation of optimum schedules can mostly be done in the blink of an eye, even when dealing with large runtime data sets stemming from many solvers on hundreds to thousands of instances. Also, the fact that our approach can be customized easily enabled us to swiftly adapt it to generate parallel schedules for multi-processor machines. T3 - Zweitveröffentlichungen der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe - 588 KW - algorithm schedules KW - answer set programming KW - portfolio-based solving Y1 - 2019 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-414743 SN - 1866-8372 IS - 588 ER - TY - GEN A1 - Durzinsky, Markus A1 - Marwan, Wolfgang A1 - Ostrowski, Max A1 - Schaub, Torsten H. A1 - Wagler, Annegret T1 - Automatic network reconstruction using ASP T2 - Postprints der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe N2 - Building biological models by inferring functional dependencies from experimental data is an important issue in Molecular Biology. To relieve the biologist from this traditionally manual process, various approaches have been proposed to increase the degree of automation. However, available approaches often yield a single model only, rely on specific assumptions, and/or use dedicated, heuristic algorithms that are intolerant to changing circumstances or requirements in the view of the rapid progress made in Biotechnology. Our aim is to provide a declarative solution to the problem by appeal to Answer Set Programming (ASP) overcoming these difficulties. We build upon an existing approach to Automatic Network Reconstruction proposed by part of the authors. This approach has firm mathematical foundations and is well suited for ASP due to its combinatorial flavor providing a characterization of all models explaining a set of experiments. The usage of ASP has several benefits over the existing heuristic algorithms. First, it is declarative and thus transparent for biological experts. Second, it is elaboration tolerant and thus allows for an easy exploration and incorporation of biological constraints. Third, it allows for exploring the entire space of possible models. Finally, our approach offers an excellent performance, matching existing, special-purpose systems. T3 - Zweitveröffentlichungen der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe - 560 KW - regulatory networks KW - biological networks KW - answer Y1 - 2019 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-412419 SN - 1866-8372 IS - 560 ER - TY - GEN A1 - Margaria, Tiziana A1 - Kubczak, Christian A1 - Steffen, Bernhard T1 - Bio-jETI BT - a service integration, design, and provisioning platform for orchestrated bioinformatics processes T2 - Postprints der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe N2 - Background: With Bio-jETI, we introduce a service platform for interdisciplinary work on biological application domains and illustrate its use in a concrete application concerning statistical data processing in R and xcms for an LC/MS analysis of FAAH gene knockout. Methods: Bio-jETI uses the jABC environment for service-oriented modeling and design as a graphical process modeling tool and the jETI service integration technology for remote tool execution. Conclusions: As a service definition and provisioning platform, Bio-jETI has the potential to become a core technology in interdisciplinary service orchestration and technology transfer. Domain experts, like biologists not trained in computer science, directly define complex service orchestrations as process models and use efficient and complex bioinformatics tools in a simple and intuitive way. T3 - Zweitveröffentlichungen der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe - 822 KW - fatty acid amide hydrolase KW - composite service KW - service orchestration KW - rest service KW - electronic tool integration Y1 - 2020 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-428868 IS - 822 ER - TY - GEN A1 - Repsilber, Dirk A1 - Kern, Sabine A1 - Telaar, Anna A1 - Walzl, Gerhard A1 - Black, Gillian F. A1 - Selbig, Joachim A1 - Parida, Shreemanta K. A1 - Kaufmann, Stefan H. E. A1 - Jacobsen, Marc T1 - Biomarker discovery in heterogeneous tissue samples BT - taking the in-silico deconfounding approach T2 - Postprints der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe N2 - Background: For heterogeneous tissues, such as blood, measurements of gene expression are confounded by relative proportions of cell types involved. Conclusions have to rely on estimation of gene expression signals for homogeneous cell populations, e.g. by applying micro-dissection, fluorescence activated cell sorting, or in-silico deconfounding. We studied feasibility and validity of a non-negative matrix decomposition algorithm using experimental gene expression data for blood and sorted cells from the same donor samples. Our objective was to optimize the algorithm regarding detection of differentially expressed genes and to enable its use for classification in the difficult scenario of reversely regulated genes. This would be of importance for the identification of candidate biomarkers in heterogeneous tissues. Results: Experimental data and simulation studies involving noise parameters estimated from these data revealed that for valid detection of differential gene expression, quantile normalization and use of non-log data are optimal. We demonstrate the feasibility of predicting proportions of constituting cell types from gene expression data of single samples, as a prerequisite for a deconfounding-based classification approach. Classification cross-validation errors with and without using deconfounding results are reported as well as sample-size dependencies. Implementation of the algorithm, simulation and analysis scripts are available. Conclusions: The deconfounding algorithm without decorrelation using quantile normalization on non-log data is proposed for biomarkers that are difficult to detect, and for cases where confounding by varying proportions of cell types is the suspected reason. In this case, a deconfounding ranking approach can be used as a powerful alternative to, or complement of, other statistical learning approaches to define candidate biomarkers for molecular diagnosis and prediction in biomedicine, in realistically noisy conditions and with moderate sample sizes. T3 - Zweitveröffentlichungen der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe - 854 KW - differential gene expression KW - quantile normalization KW - heterogeneous tissue KW - gene expression matrix KW - homogeneous cell population KW - selection KW - microdissection Y1 - 2020 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-429343 SN - 1866-8372 IS - 854 ER - TY - THES A1 - Tinnefeld, Christian T1 - Building a columnar database on shared main memory-based storage BT - database operator placement in a shared main memory-based storage system that supports data access and code execution N2 - In the field of disk-based parallel database management systems exists a great variety of solutions based on a shared-storage or a shared-nothing architecture. In contrast, main memory-based parallel database management systems are dominated solely by the shared-nothing approach as it preserves the in-memory performance advantage by processing data locally on each server. We argue that this unilateral development is going to cease due to the combination of the following three trends: a) Nowadays network technology features remote direct memory access (RDMA) and narrows the performance gap between accessing main memory inside a server and of a remote server to and even below a single order of magnitude. b) Modern storage systems scale gracefully, are elastic, and provide high-availability. c) A modern storage system such as Stanford's RAMCloud even keeps all data resident in main memory. Exploiting these characteristics in the context of a main-memory parallel database management system is desirable. The advent of RDMA-enabled network technology makes the creation of a parallel main memory DBMS based on a shared-storage approach feasible. This thesis describes building a columnar database on shared main memory-based storage. The thesis discusses the resulting architecture (Part I), the implications on query processing (Part II), and presents an evaluation of the resulting solution in terms of performance, high-availability, and elasticity (Part III). In our architecture, we use Stanford's RAMCloud as shared-storage, and the self-designed and developed in-memory AnalyticsDB as relational query processor on top. AnalyticsDB encapsulates data access and operator execution via an interface which allows seamless switching between local and remote main memory, while RAMCloud provides not only storage capacity, but also processing power. Combining both aspects allows pushing-down the execution of database operators into the storage system. We describe how the columnar data processed by AnalyticsDB is mapped to RAMCloud's key-value data model and how the performance advantages of columnar data storage can be preserved. The combination of fast network technology and the possibility to execute database operators in the storage system opens the discussion for site selection. We construct a system model that allows the estimation of operator execution costs in terms of network transfer, data processed in memory, and wall time. This can be used for database operators that work on one relation at a time - such as a scan or materialize operation - to discuss the site selection problem (data pull vs. operator push). Since a database query translates to the execution of several database operators, it is possible that the optimal site selection varies per operator. For the execution of a database operator that works on two (or more) relations at a time, such as a join, the system model is enriched by additional factors such as the chosen algorithm (e.g. Grace- vs. Distributed Block Nested Loop Join vs. Cyclo-Join), the data partitioning of the respective relations, and their overlapping as well as the allowed resource allocation. We present an evaluation on a cluster with 60 nodes where all nodes are connected via RDMA-enabled network equipment. We show that query processing performance is about 2.4x slower if everything is done via the data pull operator execution strategy (i.e. RAMCloud is being used only for data access) and about 27% slower if operator execution is also supported inside RAMCloud (in comparison to operating only on main memory inside a server without any network communication at all). The fast-crash recovery feature of RAMCloud can be leveraged to provide high-availability, e.g. a server crash during query execution only delays the query response for about one second. Our solution is elastic in a way that it can adapt to changing workloads a) within seconds, b) without interruption of the ongoing query processing, and c) without manual intervention. N2 - Diese Arbeit beschreibt die Erstellung einer spalten-orientierten Datenbank auf einem geteilten, Hauptspeicher-basierenden Speichersystem. Motiviert wird diese Arbeit durch drei Faktoren. Erstens ist moderne Netzwerktechnologie mit “Remote Direct Memory Access” (RDMA) ausgestattet. Dies reduziert den Unterschied hinsichtlich Latenz und Durchsatz zwischen dem Speicherzugriff innerhalb eines Rechners und auf einen entfernten Rechner auf eine Größenordnung. Zweitens skalieren moderne Speichersysteme, sind elastisch und hochverfügbar. Drittens hält ein modernes Speichersystem wie Stanford's RAMCloud alle Daten im Hauptspeicher vor. Diese Eigenschaften im Kontext einer spalten-orientierten Datenbank zu nutzen ist erstrebenswert. Die Arbeit ist in drei Teile untergliedert. Der erste Teile beschreibt die Architektur einer spalten-orientierten Datenbank auf einem geteilten, Hauptspeicher-basierenden Speichersystem. Hierbei werden die im Rahmen dieser Arbeit entworfene und entwickelte Datenbank AnalyticsDB sowie Stanford's RAMCloud verwendet. Die Architektur beschreibt wie Datenzugriff und Operatorausführung gekapselt werden um nahtlos zwischen lokalem und entfernten Hauptspeicher wechseln zu können. Weiterhin wird die Ablage der nach einem relationalen Schema formatierten Daten von AnalyticsDB in RAMCloud behandelt, welches mit einem Schlüssel-Wertpaar Datenmodell operiert. Der zweite Teil fokussiert auf die Implikationen bei der Abarbeitung von Datenbankanfragen. Hier steht die Diskussion im Vordergrund wo (entweder in AnalyticsDB oder in RAMCloud) und mit welcher Parametrisierung einzelne Datenbankoperationen ausgeführt werden. Dafür werden passende Kostenmodelle vorgestellt, welche die Abbildung von Datenbankoperationen ermöglichen, die auf einer oder mehreren Relationen arbeiten. Der dritte Teil der Arbeit präsentiert eine Evaluierung auf einem Verbund von 60 Rechnern hinsichtlich der Leistungsfähigkeit, der Hochverfügbarkeit und der Elastizität vom System. T2 - Die Erstellung einer spaltenorientierten Datenbank auf einem verteilten, Hauptspeicher-basierenden Speichersystem KW - computer science KW - database technology KW - main memory computing KW - cloud computing KW - verteilte Datenbanken KW - Hauptspeicher Technologie KW - virtualisierte IT-Infrastruktur Y1 - 2014 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-72063 ER - TY - GEN A1 - Hoos, Holger A1 - Lindauer, Marius A1 - Schaub, Torsten H. T1 - claspfolio 2 BT - advances in algorithm selection for answer set programming T2 - Postprints der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe N2 - Building on the award-winning, portfolio-based ASP solver claspfolio, we present claspfolio 2, a modular and open solver architecture that integrates several different portfolio-based algorithm selection approaches and techniques. The claspfolio 2 solver framework supports various feature generators, solver selection approaches, solver portfolios, as well as solver-schedule-based pre-solving techniques. The default configuration of claspfolio 2 relies on a light-weight version of the ASP solver clasp to generate static and dynamic instance features. The flexible open design of claspfolio 2 is a distinguishing factor even beyond ASP. As such, it provides a unique framework for comparing and combining existing portfolio-based algorithm selection approaches and techniques in a single, unified framework. Taking advantage of this, we conducted an extensive experimental study to assess the impact of different feature sets, selection approaches and base solver portfolios. In addition to gaining substantial insights into the utility of the various approaches and techniques, we identified a default configuration of claspfolio 2 that achieves substantial performance gains not only over clasp's default configuration and the earlier version of claspfolio, but also over manually tuned configurations of clasp. T3 - Zweitveröffentlichungen der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe - 606 KW - solver KW - sat Y1 - 2019 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-416129 SN - 1866-8372 IS - 606 ER -