TY - JOUR A1 - Peng, Junjie A1 - Liu, Danxu A1 - Wang, Yingtao A1 - Zeng, Ying A1 - Cheng, Feng A1 - Zhang, Wenqiang T1 - Weight-based strategy for an I/O-intensive application at a cloud data center JF - Concurrency and computation : practice & experience N2 - Applications with different characteristics in the cloud may have different resource preferences. However, traditional resource allocation and scheduling strategies rarely take into account the characteristics of applications. Considering that an I/O-intensive application is a typical type of application and that frequent I/O accesses, especially random disk accesses to small files, may lead to an inefficient use of resources and reduce the quality of service (QoS) of applications, a weight allocation strategy is proposed based on the available resources that a physical server can provide as well as the characteristics of the applications. Using the weight obtained, a resource allocation and scheduling strategy is presented based on the specific application characteristics in the data center. Extensive experiments show that the strategy is correct and can guarantee a high concurrency of I/O per second (IOPS) in a cloud data center with high QoS. Additionally, the strategy can efficiently improve the utilization of the disk and resources of the data center without affecting the service quality of applications. KW - IOPS KW - process scheduling KW - random I/O KW - small files KW - weight Y1 - 2018 U6 - https://doi.org/10.1002/cpe.4648 SN - 1532-0626 SN - 1532-0634 VL - 30 IS - 19 PB - Wiley CY - Hoboken ER - TY - JOUR A1 - Schaub, Torsten H. A1 - Woltran, Stefan T1 - Answer set programming unleashed! JF - Künstliche Intelligenz N2 - Answer Set Programming faces an increasing popularity for problem solving in various domains. While its modeling language allows us to express many complex problems in an easy way, its solving technology enables their effective resolution. 
In what follows, we detail some of the key factors of its success. Answer Set Programming [ASP; Brewka et al. Commun ACM 54(12):92–103, (2011)] is seeing a rapid proliferation in academia and industry due to its easy and flexible way to model and solve knowledge-intense combinatorial (optimization) problems. To this end, ASP offers a high-level modeling language paired with high-performance solving technology. As a result, ASP systems provide out-of-the-box, general-purpose search engines that allow for enumerating (optimal) solutions. They are represented as answer sets, each being a set of atoms representing a solution. The declarative approach of ASP allows a user to concentrate on a problem’s specification rather than the computational means to solve it. This makes ASP a prime candidate for rapid prototyping and an attractive tool for teaching key AI techniques since complex problems can be expressed in a succinct and elaboration tolerant way. This is eased by the tuning of ASP’s modeling language to knowledge representation and reasoning (KRR). The resulting impact is nicely reflected by a growing range of successful applications of ASP [Erdem et al. AI Mag 37(3):53–68, 2016; Falkner et al. Industrial applications of answer set programming. Künstliche Intelligenz (2018)] Y1 - 2018 U6 - https://doi.org/10.1007/s13218-018-0550-z SN - 0933-1875 SN - 1610-1987 VL - 32 IS - 2-3 SP - 105 EP - 108 PB - Springer CY - Heidelberg ER - TY - GEN A1 - Schaub, Torsten H. A1 - Woltran, Stefan T1 - Special issue on answer set programming T2 - Künstliche Intelligenz Y1 - 2018 U6 - https://doi.org/10.1007/s13218-018-0554-8 SN - 0933-1875 SN - 1610-1987 VL - 32 IS - 2-3 SP - 101 EP - 103 PB - Springer CY - Heidelberg ER - TY - JOUR A1 - Schäfer, Robin A1 - Stede, Manfred T1 - Argument mining on twitter BT - a survey JF - Information technology : it ; Methoden und innovative Anwendungen der Informatik und Informationstechnik ; Organ der Fachbereiche 3 und 4 der GI e.V. 
und des Fachbereichs 6 der ITG N2 - In the last decade, the field of argument mining has grown notably. However, only relatively few studies have investigated argumentation in social media and specifically on Twitter. Here, we provide the, to our knowledge, first critical in-depth survey of the state of the art in tweet-based argument mining. We discuss approaches to modelling the structure of arguments in the context of tweet corpus annotation, and we review current progress in the task of detecting argument components and their relations in tweets. We also survey the intersection of argument mining and stance detection, before we conclude with an outlook. KW - Argument Mining KW - Twitter KW - Stance Detection Y1 - 2021 U6 - https://doi.org/10.1515/itit-2020-0053 SN - 1611-2776 SN - 2196-7032 VL - 63 IS - 1 SP - 45 EP - 58 PB - De Gruyter CY - Berlin ER - TY - JOUR A1 - Ayzel, Georgy A1 - Heistermann, Maik T1 - The effect of calibration data length on the performance of a conceptual hydrological model versus LSTM and GRU BT - a case study for six basins from the CAMELS dataset JF - Computers & geosciences : an international journal devoted to the publication of papers on all aspects of geocomputation and to the distribution of computer programs and test data sets ; an official journal of the International Association for Mathematical Geology N2 - We systematically explore the effect of calibration data length on the performance of a conceptual hydrological model, GR4H, in comparison to two Artificial Neural Network (ANN) architectures: Long Short-Term Memory Networks (LSTM) and Gated Recurrent Units (GRU), which have just recently been introduced to the field of hydrology. We implemented a case study for six river basins across the contiguous United States, with 25 years of meteorological and discharge data. 
Nine years were reserved for independent validation; two years were used as a warm-up period, one year for each of the calibration and validation periods, respectively; from the remaining 14 years, we sampled increasing amounts of data for model calibration, and found pronounced differences in model performance. While GR4H required less data to converge, LSTM and GRU caught up at a remarkable rate, considering their number of parameters. Also, LSTM and GRU exhibited higher calibration instability in comparison to GR4H. These findings confirm the potential of modern deep-learning architectures in rainfall-runoff modelling, but also highlight the noticeable differences between them in regard to the effect of calibration data length. KW - Artificial neural networks KW - Calibration KW - Deep learning KW - Rainfall-runoff modelling Y1 - 2021 U6 - https://doi.org/10.1016/j.cageo.2021.104708 SN - 0098-3004 SN - 1873-7803 VL - 149 PB - Elsevier CY - Amsterdam ER - TY - JOUR A1 - Kossmann, Jan A1 - Halfpap, Stefan A1 - Jankrift, Marcel A1 - Schlosser, Rainer T1 - Magic mirror in my hand, which is the best in the land? BT - an experimental evaluation of index selection algorithms JF - Proceedings of the VLDB Endowment N2 - Indexes are essential for the efficient processing of database workloads. Proposed solutions for the relevant and challenging index selection problem range from metadata-based simple heuristics, over sophisticated multi-step algorithms, to approaches that yield optimal results. The main challenges are (i) to accurately determine the effect of an index on the workload cost while considering the interaction of indexes and (ii) a large number of possible combinations resulting from workloads containing many queries and massive schemata with possibly thousands of attributes.
In this work, we describe and analyze eight index selection algorithms that are based on different concepts and compare them along different dimensions, such as solution quality, runtime, multi-column support, solution granularity, and complexity. In particular, we analyze the solutions of the algorithms for the challenging analytical Join Order, TPC-H, and TPC-DS benchmarks. Afterward, we assess strengths and weaknesses, infer insights for index selection in general and each approach individually, before we give recommendations on when to use which approach. Y1 - 2020 U6 - https://doi.org/10.14778/3407790.3407832 SN - 2150-8097 VL - 13 IS - 11 SP - 2382 EP - 2395 PB - Association for Computing Machinery CY - New York ER - TY - JOUR A1 - Kaya, Adem A1 - Freitag, Melina A. T1 - Conditioning analysis for discrete Helmholtz problems JF - Computers and mathematics with applications : an international journal N2 - In this paper, we examine conditioning of the discretization of the Helmholtz problem. Although the discrete Helmholtz problem has been studied from different perspectives, to the best of our knowledge, there is no conditioning analysis for it. We aim to fill this gap in the literature. We propose a novel method in 1D to observe the near-zero eigenvalues of a symmetric indefinite matrix. Standard classification of ill-conditioning based on the matrix condition number is not true for the discrete Helmholtz problem. We relate the ill-conditioning of the discretization of the Helmholtz problem with the condition number of the matrix. We carry out analytical conditioning analysis in 1D and extend our observations to 2D with numerical observations. We examine several discretizations. We find different regions in which the condition number of the problem shows different characteristics. We also explain the general behavior of the solutions in these regions. 
KW - Helmholtz problem KW - Condition number KW - Ill-conditioning KW - Indefinite matrices Y1 - 2022 U6 - https://doi.org/10.1016/j.camwa.2022.05.016 SN - 0898-1221 SN - 1873-7668 VL - 118 SP - 171 EP - 182 PB - Elsevier Science CY - Amsterdam ER - TY - JOUR A1 - Mattis, Toni A1 - Beckmann, Tom A1 - Rein, Patrick A1 - Hirschfeld, Robert T1 - First-class concepts BT - Reified architectural knowledge beyond dominant decompositions JF - Journal of object technology : JOT / ETH Zürich, Department of Computer Science N2 - Ideally, programs are partitioned into independently maintainable and understandable modules. As a system grows, its architecture gradually loses the capability to accommodate new concepts in a modular way. While refactoring is expensive and not always possible, and the programming language might lack dedicated primary language constructs to express certain cross-cutting concerns, programmers are still able to explain and delineate convoluted concepts through secondary means: code comments, use of whitespace and arrangement of code, documentation, or communicating tacit knowledge.
Secondary constructs are easy to change and provide high flexibility in communicating cross-cutting concerns and other concepts among programmers. However, such secondary constructs usually have no reified representation that can be explored and manipulated as first-class entities through the programming environment.
In this exploratory work, we discuss novel ways to express a wide range of concepts, including cross-cutting concerns, patterns, and lifecycle artifacts independently of the dominant decomposition imposed by an existing architecture. We propose the representation of concepts as first-class objects inside the programming environment that retain the capability to change as easily as code comments. We explore new tools that allow programmers to view, navigate, and change programs based on conceptual perspectives. In a small case study, we demonstrate how such views can be created and how the programming experience changes from draining programmers' attention by stretching it across multiple modules toward focusing it on cohesively presented concepts. Our designs are geared toward facilitating multiple secondary perspectives on a system to co-exist in symbiosis with the original architecture, hence making it easier to explore, understand, and explain complex contexts and narratives that are hard or impossible to express using primary modularity constructs. KW - software engineering KW - modularity KW - exploratory programming KW - program comprehension KW - remodularization KW - architecture recovery Y1 - 2022 U6 - https://doi.org/10.5381/jot.2022.21.2.a6 SN - 1660-1769 VL - 21 IS - 2 SP - 1 EP - 15 PB - ETH Zürich, Department of Computer Science CY - Zürich ER - TY - JOUR A1 - Koumarelas, Ioannis A1 - Jiang, Lan A1 - Naumann, Felix T1 - Data preparation for duplicate detection JF - Journal of data and information quality : (JDIQ) N2 - Data errors represent a major issue in most application workflows. Before any important task can take place, a certain data quality has to be guaranteed by eliminating a number of different errors that may appear in data. Typically, most of these errors are fixed with data preparation methods, such as whitespace removal. 
However, the particular error of duplicate records, where multiple records refer to the same entity, is usually eliminated independently with specialized techniques. Our work is the first to bring these two areas together by applying data preparation operations under a systematic approach prior to performing duplicate detection.
Our process workflow can be summarized as follows: It begins with the user providing as input a sample of the gold standard, the actual dataset, and optionally some constraints to domain-specific data preparations, such as address normalization. The preparation selection operates in two consecutive phases. First, to vastly reduce the search space of ineffective data preparations, decisions are made based on the improvement or worsening of pair similarities. Second, using the remaining data preparations an iterative leave-one-out classification process removes preparations one by one and determines the redundant preparations based on the achieved area under the precision-recall curve (AUC-PR). Using this workflow, we manage to improve the results of duplicate detection up to 19% in AUC-PR. KW - data preparation KW - data wrangling KW - record linkage KW - duplicate detection KW - similarity measures Y1 - 2020 U6 - https://doi.org/10.1145/3377878 SN - 1936-1955 SN - 1936-1963 VL - 12 IS - 3 PB - Association for Computing Machinery CY - New York ER - TY - JOUR A1 - Kossmann, Jan A1 - Schlosser, Rainer T1 - Self-driving database systems BT - a conceptual approach JF - Distributed and parallel databases N2 - Challenges for self-driving database systems, which tune their physical design and configuration autonomously, are manifold: Such systems have to anticipate future workloads, find robust configurations efficiently, and incorporate knowledge gained by previous actions into later decisions. We present a component-based framework for self-driving database systems that enables database integration and development of self-managing functionality with low overhead by relying on separation of concerns. By keeping the components of the framework reusable and exchangeable, experiments are simplified, which promotes further research in that area. 
Moreover, to optimize multiple mutually dependent features, e.g., index selection and compression configurations, we propose a linear programming (LP) based algorithm to derive an efficient tuning order automatically. Afterwards, we demonstrate the applicability and scalability of our approach with reproducible examples. KW - database systems KW - self-driving KW - recursive tuning KW - workload prediction KW - robustness Y1 - 2020 U6 - https://doi.org/10.1007/s10619-020-07288-w SN - 0926-8782 SN - 1573-7578 VL - 38 IS - 4 SP - 795 EP - 817 PB - Springer CY - Dordrecht ER -