TY  - JOUR
A1  - Peng, Junjie
A1  - Liu, Danxu
A1  - Wang, Yingtao
A1  - Zeng, Ying
A1  - Cheng, Feng
A1  - Zhang, Wenqiang
T1  - Weight-based strategy for an I/O-intensive application at a cloud data center
JF  - Concurrency and computation : practice & experience
N2  - Applications with different characteristics in the cloud may have different resources preferences. However, traditional resource allocation and scheduling strategies rarely take into account the characteristics of applications. Considering that an I/O-intensive application is a typical type of application and that frequent I/O accesses, especially small files randomly accessing the disk, may lead to an inefficient use of resources and reduce the quality of service (QoS) of applications, a weight allocation strategy is proposed based on the available resources that a physical server can provide as well as the characteristics of the applications. Using the weight obtained, a resource allocation and scheduling strategy is presented based on the specific application characteristics in the data center. Extensive experiments show that the strategy is correct and can guarantee a high concurrency of I/O per second (IOPS) in a cloud data center with high QoS. Additionally, the strategy can efficiently improve the utilization of the disk and resources of the data center without affecting the service quality of applications.
KW  - IOPS
KW  - process scheduling
KW  - random I/O
KW  - small files
KW  - weight
Y1  - 2018
U6  - https://doi.org/10.1002/cpe.4648
SN  - 1532-0626
SN  - 1532-0634
VL  - 30
IS  - 19
PB  - Wiley
CY  - Hoboken
ER  - 
TY  - JOUR
A1  - Schaub, Torsten H.
A1  - Woltran, Stefan
T1  - Answer set programming unleashed!
JF  - Künstliche Intelligenz
N2  - Answer Set Programming faces an increasing popularity for problem solving in various domains. While its modeling language allows us to express many complex problems in an easy way, its solving technology enables their effective resolution. In what follows, we detail some of the key factors of its success. Answer Set Programming [ASP; Brewka et al. Commun ACM 54(12):92–103, (2011)] is seeing a rapid proliferation in academia and industry due to its easy and flexible way to model and solve knowledge-intense combinatorial (optimization) problems. To this end, ASP offers a high-level modeling language paired with high-performance solving technology. As a result, ASP systems provide out-of-the-box, general-purpose search engines that allow for enumerating (optimal) solutions. They are represented as answer sets, each being a set of atoms representing a solution. The declarative approach of ASP allows a user to concentrate on a problem’s specification rather than the computational means to solve it. This makes ASP a prime candidate for rapid prototyping and an attractive tool for teaching key AI techniques since complex problems can be expressed in a succinct and elaboration tolerant way. This is eased by the tuning of ASP’s modeling language to knowledge representation and reasoning (KRR). The resulting impact is nicely reflected by a growing range of successful applications of ASP [Erdem et al. AI Mag 37(3):53–68, 2016; Falkner et al. Industrial applications of answer set programming. Künstliche Intelligenz (2018)]
Y1  - 2018
U6  - https://doi.org/10.1007/s13218-018-0550-z
SN  - 0933-1875
SN  - 1610-1987
VL  - 32
IS  - 2-3
SP  - 105
EP  - 108
PB  - Springer
CY  - Heidelberg
ER  - 
TY  - GEN
A1  - Schaub, Torsten H.
A1  - Woltran, Stefan
T1  - Special issue on answer set programming
T2  - Künstliche Intelligenz
Y1  - 2018
U6  - https://doi.org/10.1007/s13218-018-0554-8
SN  - 0933-1875
SN  - 1610-1987
VL  - 32
IS  - 2-3
SP  - 101
EP  - 103
PB  - Springer
CY  - Heidelberg
ER  - 
TY  - JOUR
A1  - Schäfer, Robin
A1  - Stede, Manfred
T1  - Argument mining on twitter
BT  - a survey
JF  - Information technology : it ; Methoden und innovative Anwendungen der Informatik und Informationstechnik ; Organ der Fachbereiche 3 und 4 der GI e.V. und des Fachbereichs 6 der ITG
N2  - In the last decade, the field of argument mining has grown notably. However, only relatively few studies have investigated argumentation in social media and specifically on Twitter. Here, we provide the, to our knowledge, first critical in-depth survey of the state of the art in tweet-based argument mining. We discuss approaches to modelling the structure of arguments in the context of tweet corpus annotation, and we review current progress in the task of detecting argument components and their relations in tweets. We also survey the intersection of argument mining and stance detection, before we conclude with an outlook.
KW  - Argument Mining
KW  - Twitter
KW  - Stance Detection
Y1  - 2021
U6  - https://doi.org/10.1515/itit-2020-0053
SN  - 1611-2776
SN  - 2196-7032
VL  - 63
IS  - 1
SP  - 45
EP  - 58
PB  - De Gruyter
CY  - Berlin
ER  - 
TY  - JOUR
A1  - Ayzel, Georgy
A1  - Heistermann, Maik
T1  - The effect of calibration data length on the performance of a conceptual hydrological model versus LSTM and GRU
BT  - a case study for six basins from the CAMELS dataset
JF  - Computers & geosciences : an international journal devoted to the publication of papers on all aspects of geocomputation and to the distribution of computer programs and test data sets ; an official journal of the International Association for Mathematical Geology
N2  - We systematically explore the effect of calibration data length on the performance of a conceptual hydrological model, GR4H, in comparison to two Artificial Neural Network (ANN) architectures: Long Short-Term Memory Networks (LSTM) and Gated Recurrent Units (GRU), which have just recently been introduced to the field of hydrology. We implemented a case study for six river basins across the contiguous United States, with 25 years of meteorological and discharge data. Nine years were reserved for independent validation; two years were used as a warm-up period, one year for each of the calibration and validation periods, respectively; from the remaining 14 years, we sampled increasing amounts of data for model calibration, and found pronounced differences in model performance. While GR4H required less data to converge, LSTM and GRU caught up at a remarkable rate, considering their number of parameters. Also, LSTM and GRU exhibited the higher calibration instability in comparison to GR4H. These findings confirm the potential of modern deep-learning architectures in rainfall runoff modelling, but also highlight the noticeable differences between them in regard to the effect of calibration data length.
KW  - Artificial neural networks
KW  - Calibration
KW  - Deep learning
KW  - Rainfall-runoff modelling
Y1  - 2021
U6  - https://doi.org/10.1016/j.cageo.2021.104708
SN  - 0098-3004
SN  - 1873-7803
VL  - 149
PB  - Elsevier
CY  - Amsterdam
ER  - 
TY  - JOUR
A1  - Kossmann, Jan
A1  - Halfpap, Stefan
A1  - Jankrift, Marcel
A1  - Schlosser, Rainer
T1  - Magic mirror in my hand, which is the best in the land?
BT  - an experimental evaluation of index selection algorithms
JF  - Proceedings of the VLDB Endowment
N2  - Indexes are essential for the efficient processing of database workloads. Proposed solutions for the relevant and challenging index selection problem range from metadata-based simple heuristics, over sophisticated multi-step algorithms, to approaches that yield optimal results. The main challenges are (i) to accurately determine the effect of an index on the workload cost while considering the interaction of indexes and (ii) a large number of possible combinations resulting from workloads containing many queries and massive schemata with possibly thousands of attributes. In this work, we describe and analyze eight index selection algorithms that are based on different concepts and compare them along different dimensions, such as solution quality, runtime, multi-column support, solution granularity, and complexity. In particular, we analyze the solutions of the algorithms for the challenging analytical Join Order, TPC-H, and TPC-DS benchmarks. Afterward, we assess strengths and weaknesses, infer insights for index selection in general and each approach individually, before we give recommendations on when to use which approach.
Y1  - 2020
U6  - https://doi.org/10.14778/3407790.3407832
SN  - 2150-8097
VL  - 13
IS  - 11
SP  - 2382
EP  - 2395
PB  - Association for Computing Machinery
CY  - New York
ER  - 
TY  - JOUR
A1  - Kaya, Adem
A1  - Freitag, Melina A.
T1  - Conditioning analysis for discrete Helmholtz problems
JF  - Computers and mathematics with applications : an international journal
N2  - In this paper, we examine conditioning of the discretization of the Helmholtz problem. Although the discrete Helmholtz problem has been studied from different perspectives, to the best of our knowledge, there is no conditioning analysis for it. We aim to fill this gap in the literature. We propose a novel method in 1D to observe the near-zero eigenvalues of a symmetric indefinite matrix. Standard classification of ill-conditioning based on the matrix condition number is not true for the discrete Helmholtz problem. We relate the ill-conditioning of the discretization of the Helmholtz problem with the condition number of the matrix. We carry out analytical conditioning analysis in 1D and extend our observations to 2D with numerical observations. We examine several discretizations. We find different regions in which the condition number of the problem shows different characteristics. We also explain the general behavior of the solutions in these regions.
KW  - Helmholtz problem
KW  - Condition number
KW  - Ill-conditioning
KW  - Indefinite matrices
Y1  - 2022
U6  - https://doi.org/10.1016/j.camwa.2022.05.016
SN  - 0898-1221
SN  - 1873-7668
VL  - 118
SP  - 171
EP  - 182
PB  - Elsevier Science
CY  - Amsterdam
ER  - 
TY  - JOUR
A1  - Mattis, Toni
A1  - Beckmann, Tom
A1  - Rein, Patrick
A1  - Hirschfeld, Robert
T1  - First-class concepts
BT  - Reified architectural knowledge beyond dominant decompositions
JF  - Journal of object technology : JOT / ETH Zürich, Department of Computer Science
N2  - Ideally, programs are partitioned into independently maintainable and understandable modules. As a system grows, its architecture gradually loses the capability to accommodate new concepts in a modular way. While refactoring is expensive and not always possible, and the programming language might lack dedicated primary language constructs to express certain cross-cutting concerns, programmers are still able to explain and delineate convoluted concepts through secondary means: code comments, use of whitespace and arrangement of code, documentation, or communicating tacit knowledge. Secondary constructs are easy to change and provide high flexibility in communicating cross-cutting concerns and other concepts among programmers. However, such secondary constructs usually have no reified representation that can be explored and manipulated as first-class entities through the programming environment. In this exploratory work, we discuss novel ways to express a wide range of concepts, including cross-cutting concerns, patterns, and lifecycle artifacts independently of the dominant decomposition imposed by an existing architecture. We propose the representation of concepts as first-class objects inside the programming environment that retain the capability to change as easily as code comments. We explore new tools that allow programmers to view, navigate, and change programs based on conceptual perspectives. In a small case study, we demonstrate how such views can be created and how the programming experience changes from draining programmers' attention by stretching it across multiple modules toward focusing it on cohesively presented concepts. Our designs are geared toward facilitating multiple secondary perspectives on a system to co-exist in symbiosis with the original architecture, hence making it easier to explore, understand, and explain complex contexts and narratives that are hard or impossible to express using primary modularity constructs.
KW  - software engineering
KW  - modularity
KW  - exploratory programming
KW  - program comprehension
KW  - remodularization
KW  - architecture recovery
Y1  - 2022
U6  - https://doi.org/10.5381/jot.2022.21.2.a6
SN  - 1660-1769
VL  - 21
IS  - 2
SP  - 1
EP  - 15
PB  - ETH Zürich, Department of Computer Science
CY  - Zürich
ER  - 
TY  - JOUR
A1  - Koumarelas, Ioannis
A1  - Jiang, Lan
A1  - Naumann, Felix
T1  - Data preparation for duplicate detection
JF  - Journal of data and information quality : (JDIQ)
N2  - Data errors represent a major issue in most application workflows. Before any important task can take place, a certain data quality has to be guaranteed by eliminating a number of different errors that may appear in data. Typically, most of these errors are fixed with data preparation methods, such as whitespace removal. However, the particular error of duplicate records, where multiple records refer to the same entity, is usually eliminated independently with specialized techniques. Our work is the first to bring these two areas together by applying data preparation operations under a systematic approach prior to performing duplicate detection. Our process workflow can be summarized as follows: It begins with the user providing as input a sample of the gold standard, the actual dataset, and optionally some constraints to domain-specific data preparations, such as address normalization. The preparation selection operates in two consecutive phases. First, to vastly reduce the search space of ineffective data preparations, decisions are made based on the improvement or worsening of pair similarities. Second, using the remaining data preparations an iterative leave-one-out classification process removes preparations one by one and determines the redundant preparations based on the achieved area under the precision-recall curve (AUC-PR). Using this workflow, we manage to improve the results of duplicate detection up to 19% in AUC-PR.
KW  - data preparation
KW  - data wrangling
KW  - record linkage
KW  - duplicate detection
KW  - similarity measures
Y1  - 2020
U6  - https://doi.org/10.1145/3377878
SN  - 1936-1955
SN  - 1936-1963
VL  - 12
IS  - 3
PB  - Association for Computing Machinery
CY  - New York
ER  - 
TY  - JOUR
A1  - Kossmann, Jan
A1  - Schlosser, Rainer
T1  - Self-driving database systems
BT  - a conceptual approach
JF  - Distributed and parallel databases
N2  - Challenges for self-driving database systems, which tune their physical design and configuration autonomously, are manifold: Such systems have to anticipate future workloads, find robust configurations efficiently, and incorporate knowledge gained by previous actions into later decisions. We present a component-based framework for self-driving database systems that enables database integration and development of self-managing functionality with low overhead by relying on separation of concerns. By keeping the components of the framework reusable and exchangeable, experiments are simplified, which promotes further research in that area. Moreover, to optimize multiple mutually dependent features, e.g., index selection and compression configurations, we propose a linear programming (LP) based algorithm to derive an efficient tuning order automatically. Afterwards, we demonstrate the applicability and scalability of our approach with reproducible examples.
KW  - database systems
KW  - self-driving
KW  - recursive tuning
KW  - workload prediction
KW  - robustness
Y1  - 2020
U6  - https://doi.org/10.1007/s10619-020-07288-w
SN  - 0926-8782
SN  - 1573-7578
VL  - 38
IS  - 4
SP  - 795
EP  - 817
PB  - Springer
CY  - Dordrecht
ER  -