publish.UP Search

Adaptive windows for duplicate detection (2012)

Draisbach, Uwe ; Naumann, Felix ; Szott, Sascha ; Wonneberg, Oliver

Duplicate detection is the task of identifying all groups of records within a data set that represent the same real-world entity, respectively. This task is difficult, because (i) representations might differ slightly, so some similarity measure must be defined to compare pairs of records and (ii) data sets might have a high volume making a pair-wise comparison of all records infeasible. To tackle the second problem, many algorithms have been suggested that partition the data set and compare all record pairs only within each partition. One well-known such approach is the Sorted Neighborhood Method (SNM), which sorts the data according to some key and then advances a window over the data comparing only records that appear within the same window. We propose several variations of SNM that have in common a varying window size and advancement. The general intuition of such adaptive windows is that there might be regions of high similarity suggesting a larger window size and regions of lower similarity suggesting a smaller window size. We propose and thoroughly evaluate several adaption strategies, some of which are provably better than the original SNM in terms of efficiency (same results with fewer comparisons).

An alert correlation platform for memory-supported techniques (2012)

Roschke, Sebastian ; Cheng, Feng ; Meinel, Christoph

Intrusion Detection Systems (IDS) have been widely deployed in practice for detecting malicious behavior on network communication and hosts. False-positive alerts are a popular problem for most IDS approaches. The solution to address this problem is to enhance the detection process by correlation and clustering of alerts. To meet the practical requirements, this process needs to be finished fast, which is a challenging task as the amount of alerts in large-scale IDS deployments is significantly high. We identifytextitdata storage and processing algorithms to be the most important factors influencing the performance of clustering and correlation. We propose and implement a highly efficient alert correlation platform. For storage, a column-based database, an In-Memory alert storage, and memory-based index tables lead to significant improvements of the performance. For processing, algorithms are designed and implemented which are optimized for In-Memory databases, e.g. an attack graph-based correlation algorithm. The platform can be distributed over multiple processing units to share memory and processing power. A standardized interface is designed to provide a unified view of result reports for end users. The efficiency of the platform is tested by practical experiments with several alert storage approaches, multiple algorithms, as well as a local and a distributed deployment.

Behaviour equivalence and compatibility of business process models with complex correspondences (2012)

Weidlich, Matthias ; Dijkman, Remco ; Weske, Mathias

Once multiple models of a business process are created for different purposes or to capture different variants, verification of behaviour equivalence or compatibility is needed. Equivalence verification ensures that two business process models specify the same behaviour. Since different process models are likely to differ with respect to their assumed level of abstraction and the actions that they take into account, equivalence notions have to cope with correspondences between sets of actions and actions that exist in one process but not in the other. In this paper, we present notions of equivalence and compatibility that can handle these problems. In essence, we present a notion of equivalence that works on correspondences between sets of actions rather than single actions. We then integrate our equivalence notion with work on behaviour inheritance that copes with actions that exist in one process but not in the other, leading to notions of behaviour compatibility. Compatibility notions verify that two models have the same behaviour with respect to the actions that they have in common. As such, our contribution is a collection of behaviour equivalence and compatibility notions that are applicable in more general settings than existing ones.

Charging and billing in modern communications networks a comprehensive survey of the state of the art and future requirements (2012)

Kühne, Ralph ; Huitema, George ; Carle, George

In mobile telecommunication networks the trend for an increasing heterogeneity of access networks, the convergence with fixed networks as well as with the Internet are apparent. The resulting future converged network with an expected wide variety of services and a possibly stiff competition between the different market participants as well as legal issues will bring about requirements for charging systems that demand for more flexibility, scalability and efficiency than is available in today's systems. This article surveys recent developments in charging and billing architectures comprising both standardisation work as well as research projects. The second main contribution of this article is a comparison of key features of these developments thus giving a list of essential charging and billing ingredients for tomorrow's communication and service environments.

Concepts for cartography-oriented visualization of virtual 3D city models (2012)

Semmo, Amir ; Hildebrandt, Dieter ; Trapp, Matthias ; Döllner, Jürgen Roland Friedrich

Virtual 3D city models serve as an effective medium with manifold applications in geoinformation systems and services. To date, most 3D city models are visualized using photorealistic graphics. But an effective communication of geoinformation significantly depends on how important information is designed and cognitively processed in the given application context. One possibility to visually emphasize important information is based on non-photorealistic rendering, which comprehends artistic depiction styles and is characterized by its expressiveness and communication aspects. However, a direct application of non-photorealistic rendering techniques primarily results in monotonic visualization that lacks cartographic design aspects. In this work, we present concepts for cartography-oriented visualization of virtual 3D city models. These are based on coupling non-photorealistic rendering techniques and semantics-based information for a user, context, and media-dependent representation of thematic information. This work highlights challenges for cartography-oriented visualization of 3D geovirtual environments, presents stylization techniques and discusses their applications and ideas for a standardized visualization. In particular, the presented concepts enable a real-time and dynamic visualization of thematic geoinformation.

Covering or complete? : Discovering conditional inclusion dependencies (2012)

Bauckmann, Jana ; Abedjan, Ziawasch ; Leser, Ulf ; Müller, Heiko ; Naumann, Felix

Data dependencies, or integrity constraints, are used to improve the quality of a database schema, to optimize queries, and to ensure consistency in a database. In the last years conditional dependencies have been introduced to analyze and improve data quality. In short, a conditional dependency is a dependency with a limited scope defined by conditions over one or more attributes. Only the matching part of the instance must adhere to the dependency. In this paper we focus on conditional inclusion dependencies (CINDs). We generalize the definition of CINDs, distinguishing covering and completeness conditions. We present a new use case for such CINDs showing their value for solving complex data quality tasks. Further, we define quality measures for conditions inspired by precision and recall. We propose efficient algorithms that identify covering and completeness conditions conforming to given quality thresholds. Our algorithms choose not only the condition values but also the condition attributes automatically. Finally, we show that our approach efficiently provides meaningful and helpful results for our use case.

Cyber-physical systems with dynamic structure : towards modeling and verification of inductive invariants (2012)

Becker, Basil ; Giese, Holger

Cyber-physical systems achieve sophisticated system behavior exploring the tight interconnection of physical coupling present in classical engineering systems and information technology based coupling. A particular challenging case are systems where these cyber-physical systems are formed ad hoc according to the specific local topology, the available networking capabilities, and the goals and constraints of the subsystems captured by the information processing part. In this paper we present a formalism that permits to model the sketched class of cyber-physical systems. The ad hoc formation of tightly coupled subsystems of arbitrary size are specified using a UML-based graph transformation system approach. Differential equations are employed to define the resulting tightly coupled behavior. Together, both form hybrid graph transformation systems where the graph transformation rules define the discrete steps where the topology or modes may change, while the differential equations capture the continuous behavior in between such discrete changes. In addition, we demonstrate that automated analysis techniques known for timed graph transformation systems for inductive invariants can be extended to also cover the hybrid case for an expressive case of hybrid models where the formed tightly coupled subsystems are restricted to smaller local networks.

Explicit use-case representation in object-oriented programming languages (2012)

Hirschfeld, Robert ; Perscheid, Michael ; Haupt, Michael

Use-cases are considered an integral part of most contemporary development processes since they describe a software system's expected behavior from the perspective of its prospective users. However, the presence of and traceability to use-cases is increasingly lost in later more code-centric development activities. Use-cases, being well-encapsulated at the level of requirements descriptions, eventually lead to crosscutting concerns in system design and source code. Tracing which parts of the system contribute to which use-cases is therefore hard and so limits understandability. In this paper, we propose an approach to making use-cases first-class entities in both the programming language and the runtime environment. Having use-cases present in the code and the running system will allow developers, maintainers, and operators to easily associate their units of work with what matters to the users. We suggest the combination of use-cases, acceptance tests, and dynamic analysis to automatically associate source code with use-cases. We present UseCasePy, an implementation of our approach to use-case-centered development in Python, and its application to the Django Web framework.

Generalized aggregate quality of service computation for composite services (2012)

Yang, Yong ; Dumas, Marlon ; Garcia-Banuelos, Luciano ; Polyvyanyy, Artem ; Zhang, Liang

This article addresses the problem of estimating the Quality of Service (QoS) of a composite service given the QoS of the services participating in the composition. Previous solutions to this problem impose restrictions on the topology of the orchestration models, limiting their applicability to well-structured orchestration models for example. This article lifts these restrictions by proposing a method for aggregate QoS computation that deals with more general types of unstructured orchestration models. The applicability and scalability of the proposed method are validated using a collection of models from industrial practice.

In-Memory Data Management (2012)

Plattner, Hasso ; Zeier, Alexander

Nach 50 Jahren erfolgreicher Entwicklunghat die Business-IT einen neuenWendepunkt erreicht. Hier zeigen die Autoren erstmalig, wieIn-Memory Computing dieUnternehmensprozesse künftig verändern wird. Bisher wurden Unternehmensdaten aus Performance-Gründen auf verschiedene Datenbanken verteilt: Analytische Datenresidieren in Data Warehouses und werden regelmäßig mithilfe transaktionaler Systeme synchronisiert. Diese Aufspaltung macht flexibles Echtzeit-Reporting aktueller Daten unmöglich. Doch dank leistungsfähigerMulti-Core-CPUs, großer Hauptspeicher, Cloud Computing und immerbesserer mobiler Endgeräte lassen die Unternehmen dieses restriktive Modell zunehmend hinter sich. Die Autoren stellen Techniken vor, die eine analytische und transaktionale Verarbeitung in Echtzeit erlauben und so dem Geschäftsleben neue Wege bahnen.

Refine

Has Fulltext

Author

Year of publication

Document Type

Language

Is part of the Bibliography

Keywords

Institute

30 search hits