Understanding cryptic schemata in large extract-transform-load systems (2012)

Extract-Transform-Load (ETL) tools are used for the creation, maintenance, and evolution of data warehouses, data marts, and operational data stores. ETL workflows populate those systems with data from various data sources by specifying and executing a DAG of transformations. Over time, hundreds of individual workflows evolve as new sources and new requirements are integrated into the system. The maintenance and evolution of large-scale ETL systems require much time and manual effort. A key problem is to understand the meaning of unfamiliar attribute labels in source and target databases and ETL transformations. Hard-to-understand attribute labels cause frustration and increase the time needed to develop and understand ETL workflows. We present a schema decryption technique to support ETL developers in understanding cryptic schemata of sources, targets, and ETL transformations. For a given ETL system, our recommender-like approach leverages the large number of mapped attribute labels in existing ETL workflows to produce good and meaningful decryptions. In this way we are able to decrypt attribute labels consisting of a number of unfamiliar few-letter abbreviations, such as UNP_PEN_INT, which we can decrypt to UNPAID_PENALTY_INTEREST. We evaluate our schema decryption approach on three real-world repositories of ETL workflows and show that our approach is able to suggest high-quality decryptions for cryptic attribute labels in a given schema.
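
The idea of learning decryptions from already mapped attribute labels can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the token-wise alignment, the subsequence test in `is_abbreviation`, the majority vote, and the sample mappings are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

def is_abbreviation(short, full):
    """Heuristic (assumed): same first letter, shorter, and a subsequence of `full`."""
    if not short or not full or short[0] != full[0] or len(short) >= len(full):
        return False
    remaining = iter(full)
    return all(ch in remaining for ch in short)

def learn_expansions(mapped_labels):
    """Count, per cryptic token, which readable tokens it was mapped to."""
    votes = defaultdict(Counter)
    for cryptic, readable in mapped_labels:
        c_tokens = cryptic.split("_")
        r_tokens = readable.split("_")
        if len(c_tokens) != len(r_tokens):
            continue  # this sketch only aligns labels token by token
        for c, r in zip(c_tokens, r_tokens):
            if is_abbreviation(c, r):
                votes[c][r] += 1
    return votes

def decrypt(label, votes):
    """Replace each known token by its most frequent expansion; keep unknown tokens."""
    out = []
    for tok in label.split("_"):
        if votes.get(tok):
            out.append(votes[tok].most_common(1)[0][0])
        else:
            out.append(tok)
    return "_".join(out)

# Hypothetical mappings, standing in for labels mapped in existing ETL workflows.
mappings = [
    ("UNP_AMT", "UNPAID_AMOUNT"),
    ("PEN_RATE", "PENALTY_RATE"),
    ("INT_RATE", "INTEREST_RATE"),
    ("INT_AMT", "INTEREST_AMOUNT"),
]
votes = learn_expansions(mappings)
print(decrypt("UNP_PEN_INT", votes))
```

Tokens that never appear in a mapping are left unchanged, so a partially decrypted label still shows which abbreviations remain unknown.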

Secure neighbor discovery: Review, challenges, perspectives, and recommendations (2012)

AlSa'deh, Ahmad ; Meinel, Christoph

Secure Neighbor Discovery is designed as a countermeasure to Neighbor Discovery Protocol threats. The authors discuss Secure Neighbor Discovery implementation and deployment challenges and review proposals to optimize it.

The JCop language specification : Version 1.0, April 2012 (2012)

Appeltauer, Malte ; Hirschfeld, Robert

Program behavior that relies on contextual information, such as physical location or network accessibility, is common in today's applications, yet its representation is not sufficiently supported by programming languages. With context-oriented programming (COP), such context-dependent behavioral variations can be explicitly modularized and dynamically activated. In general, COP could be used to manage any context-specific behavior. However, its contemporary realizations limit the control of dynamic adaptation. This, in turn, limits the interaction of COP's adaptation mechanisms with widely used architectures, such as event-based, mobile, and distributed programming. The JCop programming language extends Java with language constructs for context-oriented programming and additionally provides a domain-specific aspect language for declarative control over runtime adaptations. As a result, these redesigned implementations are more concise and better modularized than their counterparts using plain COP. JCop's main features have been described in our previous publications. However, a complete language specification has not been presented so far. This report presents the entire JCop language including the syntax and semantics of its new language constructs.

Covering or complete? : Discovering conditional inclusion dependencies (2012)

Bauckmann, Jana ; Abedjan, Ziawasch ; Leser, Ulf ; Müller, Heiko ; Naumann, Felix

Data dependencies, or integrity constraints, are used to improve the quality of a database schema, to optimize queries, and to ensure consistency in a database. In recent years, conditional dependencies have been introduced to analyze and improve data quality. In short, a conditional dependency is a dependency with a limited scope defined by conditions over one or more attributes. Only the matching part of the instance must adhere to the dependency. In this paper we focus on conditional inclusion dependencies (CINDs). We generalize the definition of CINDs, distinguishing covering and completeness conditions. We present a new use case for such CINDs showing their value for solving complex data quality tasks. Further, we define quality measures for conditions inspired by precision and recall. We propose efficient algorithms that identify covering and completeness conditions conforming to given quality thresholds. Our algorithms choose not only the condition values but also the condition attributes automatically. Finally, we show that our approach efficiently provides meaningful and helpful results for our use case.

Cyber-physical systems with dynamic structure : towards modeling and verification of inductive invariants (2012)

Becker, Basil ; Giese, Holger

Cyber-physical systems achieve sophisticated system behavior exploring the tight interconnection of physical coupling present in classical engineering systems and information technology based coupling. A particularly challenging case is that of systems where these cyber-physical systems are formed ad hoc according to the specific local topology, the available networking capabilities, and the goals and constraints of the subsystems captured by the information processing part. In this paper we present a formalism for modeling the sketched class of cyber-physical systems. The ad hoc formation of tightly coupled subsystems of arbitrary size is specified using a UML-based graph transformation system approach. Differential equations are employed to define the resulting tightly coupled behavior. Together, both form hybrid graph transformation systems where the graph transformation rules define the discrete steps where the topology or modes may change, while the differential equations capture the continuous behavior in between such discrete changes. In addition, we demonstrate that automated analysis techniques known for timed graph transformation systems for inductive invariants can be extended to also cover the hybrid case for an expressive class of hybrid models where the formed tightly coupled subsystems are restricted to smaller local networks.

Adaptive windows for duplicate detection (2012)

Draisbach, Uwe ; Naumann, Felix ; Szott, Sascha ; Wonneberg, Oliver

Duplicate detection is the task of identifying all groups of records within a data set that represent the same real-world entity. This task is difficult, because (i) representations might differ slightly, so some similarity measure must be defined to compare pairs of records, and (ii) data sets might have a high volume, making a pair-wise comparison of all records infeasible. To tackle the second problem, many algorithms have been suggested that partition the data set and compare all record pairs only within each partition. One well-known approach of this kind is the Sorted Neighborhood Method (SNM), which sorts the data according to some key and then advances a window over the data, comparing only records that appear within the same window. We propose several variations of SNM that have in common a varying window size and advancement. The general intuition of such adaptive windows is that there might be regions of high similarity suggesting a larger window size and regions of lower similarity suggesting a smaller window size. We propose and thoroughly evaluate several adaptation strategies, some of which are provably better than the original SNM in terms of efficiency (same results with fewer comparisons).
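
The windowing idea can be sketched in a few lines of Python. The fixed-window function follows the SNM description above; the adaptive variant is only one hypothetical growth heuristic (extend the window by one record whenever a duplicate is found) and is not one of the strategies evaluated by the authors. The similarity measure, the threshold, and the sample names are likewise assumptions.

```python
from difflib import SequenceMatcher

def similar(a, b, threshold=0.8):
    """Assumed similarity measure: difflib ratio above a fixed threshold."""
    return SequenceMatcher(None, a, b).ratio() >= threshold

def snm(records, key, window=3):
    """Fixed-window SNM: compare each record with the next window-1 records."""
    ordered = sorted(records, key=key)
    pairs = set()
    for i, rec in enumerate(ordered):
        for j in range(i + 1, min(i + window, len(ordered))):
            if similar(rec, ordered[j]):
                pairs.add((rec, ordered[j]))
    return pairs

def adaptive_snm(records, key, start=2):
    """Hypothetical adaptive variant: grow the window while duplicates keep appearing."""
    ordered = sorted(records, key=key)
    pairs = set()
    for i, rec in enumerate(ordered):
        end = min(i + start, len(ordered))
        j = i + 1
        while j < end:
            if similar(rec, ordered[j]):
                pairs.add((rec, ordered[j]))
                end = min(end + 1, len(ordered))  # region of high similarity: look further
            j += 1
    return pairs

# Made-up sample data; the sorting key is simply the record itself.
names = ["john smith", "jon smith", "john smyth",
         "mary jones", "marie jones", "zoe alder"]
fixed = snm(names, key=lambda r: r, window=2)
adaptive = adaptive_snm(names, key=lambda r: r, start=2)
print(sorted(adaptive - fixed))
```

With a window of two, the fixed method only compares neighbors in sort order, whereas the adaptive variant starts with the same window and widens it inside the cluster of similar names.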

M-Adhesive Transformation Systems with Nested Application Conditions Part 2: Embedding, Critical Pairs and Local Confluence (2012)

Ehrig, Hartmut ; Golas, Ulrike ; Habel, Annegret ; Lambers, Leen ; Orejas, Fernando

Graph transformation systems have been studied extensively and applied to several areas of computer science like formal language theory, the modeling of databases, concurrent or distributed systems, and visual, logical, and functional programming. In most kinds of applications it is necessary to have the possibility of restricting the applicability of rules. This is usually done by means of application conditions. In this paper, we continue the work of extending the fundamental theory of graph transformation to the case where rules may use arbitrary (nested) application conditions. More precisely, we generalize the Embedding theorem, and we study how local confluence can be checked in this context. In particular, we define a new notion of critical pair which allows us to formulate and prove a Local Confluence Theorem for the general case of rules with nested application conditions. All our results are presented, not for a specific class of graphs, but for any arbitrary M-adhesive category, which means that our results apply to most kinds of graphical structures. We demonstrate our theory on the modeling of an elevator control by a typed graph transformation system with positive and negative application conditions.

Industrial case study on the integration of SysML and AUTOSAR with triple graph grammars (2012)

Giese, Holger ; Hildebrandt, Stephan ; Neumann, Stefan ; Wätzoldt, Sebastian

During the overall development of complex engineering systems different modeling notations are employed. For example, in the domain of automotive systems, system engineering models are employed quite early to capture the requirements and basic structuring of the entire system, while software engineering models are used later on to describe the concrete software architecture. Each model helps in addressing the specific design issue with appropriate notations and at a suitable level of abstraction. However, when we step forward from the system design to the software design, the engineers have to ensure that all decisions captured in the system design model are correctly transferred to the software engineering model. Even worse, when changes occur later on in either model, today the consistency has to be reestablished in a cumbersome manual step. In this report, we present in an extended version of [Holger Giese, Stefan Neumann, and Stephan Hildebrandt. Model Synchronization at Work: Keeping SysML and AUTOSAR Models Consistent. In Gregor Engels, Claus Lewerentz, Wilhelm Schäfer, Andy Schürr, and B. Westfechtel, editors, Graph Transformations and Model Driven Engineering - Essays Dedicated to Manfred Nagl on the Occasion of his 65th Birthday, volume 5765 of Lecture Notes in Computer Science, pages 555–579. Springer Berlin / Heidelberg, 2010.] how model synchronization and consistency rules can be applied to automate this task and ensure that the different models are kept consistent. We also introduce a general approach for model synchronization. Besides synchronization, the approach consists of tool adapters as well as consistency rules covering the overlap between the synchronized parts of a model and the rest.
We present the model synchronization algorithm based on triple graph grammars in detail and further exemplify the general approach by means of a model synchronization solution between system engineering models in SysML and software engineering models in AUTOSAR which has been developed for an industrial partner. In the appendix as extension to [19] the meta-models and all TGG rules for the SysML to AUTOSAR model synchronization are documented.

Multi-scale representations of virtual 3D city models (2012)

Glander, Tassilo

Virtual 3D city and landscape models are the main subject investigated in this thesis. They digitally represent urban space and have many applications in different domains, e.g., simulation, cadastral management, and city planning. Visualization is an elementary component of these applications. Photo-realistic visualization with an increasingly high degree of detail leads to fundamental problems for comprehensible visualization. A large number of highly detailed and textured objects within a virtual 3D city model may create visual noise and overload the users with information. Objects are subject to perspective foreshortening and may be occluded or not displayed in a meaningful way, as they are too small. In this thesis we present abstraction techniques that automatically process virtual 3D city and landscape models to derive abstracted representations. These have a reduced degree of detail, while essential characteristics are preserved. After introducing definitions for model, scale, and multi-scale representations, we discuss the fundamentals of map generalization as well as techniques for 3D generalization. The first presented technique is a cell-based generalization of virtual 3D city models. It creates abstract representations that have a highly reduced level of detail while maintaining essential structures, e.g., the infrastructure network, landmark buildings, and free spaces. The technique automatically partitions the input virtual 3D city model into cells based on the infrastructure network. The single building models contained in each cell are aggregated to abstracted cell blocks. Using weighted infrastructure elements, cell blocks can be computed on different hierarchical levels, storing the hierarchy relation between the cell blocks. Furthermore, we identify initial landmark buildings within a cell by comparing the properties of individual buildings with the aggregated properties of the cell. 
For each block, the identified landmark building models are subtracted using Boolean operations and integrated in a photo-realistic way. Finally, for the interactive 3D visualization we discuss the creation of the virtual 3D geometry and its appearance styling through colors, labeling, and transparency. We demonstrate the technique with example data sets. Additionally, we discuss applications of generalization lenses and transitions between abstract representations. The second technique is a real-time rendering technique for geometric enhancement of landmark objects within a virtual 3D city model. Depending on the virtual camera distance, landmark objects are scaled to ensure their visibility within a specific distance interval while deforming their environment. First, in a preprocessing step a landmark hierarchy is computed, which is then used to derive distance intervals for the interactive rendering. At runtime, using the virtual camera distance, a scaling factor is computed and applied to each landmark. The scaling factor is interpolated smoothly at the interval boundaries using cubic Bézier splines. Non-landmark geometry that is near landmark objects is deformed with respect to a limited number of landmarks. We demonstrate the technique by applying it to a highly detailed virtual 3D city model and a generalized 3D city model. In addition we discuss an adaptation of the technique for non-linear projections and mobile devices. The third technique is a real-time rendering technique to create abstract 3D isocontour visualization of virtual 3D terrain models. The virtual 3D terrain model is visualized as a layered or stepped relief. The technique works without preprocessing and, as it is implemented using programmable graphics hardware, can be integrated with minimal changes into common terrain rendering techniques. Consequently, the computation is done in the rendering pipeline for each vertex, primitive, i.e., triangle, and fragment.
For each vertex, the height is quantized to the nearest isovalue. For each triangle, the vertex configuration with respect to their isovalues is determined first. Using the configuration, the triangle is then subdivided. The subdivision forms a partial step geometry aligned with the triangle. For each fragment, the surface appearance is determined, e.g., depending on the surface texture, shading, and height-color-mapping. Flexible usage of the technique is demonstrated with applications from focus+context visualization, out-of-core terrain rendering, and information visualization. This thesis presents components for the creation of abstract representations of virtual 3D city and landscape models. Re-using visual language from cartography, the techniques enable users to build on their experience with maps when interpreting these representations. Simultaneously, characteristics of 3D geovirtual environments are taken into account by addressing and discussing, e.g., continuous scale, interaction, and perspective.
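
The per-vertex and per-triangle steps of the isocontour technique can be illustrated on the CPU with a small Python sketch. The thesis implements them on programmable graphics hardware; the function names, the uniform isovalue spacing, and the `base` offset below are assumptions, and the actual subdivision into step geometry is not reproduced here.

```python
import math

def quantize_height(height, interval, base=0.0):
    """Snap a height to the nearest isovalue base + k * interval (ties round up)."""
    steps = math.floor((height - base) / interval + 0.5)
    return base + steps * interval

def triangle_isovalues(heights, interval, base=0.0):
    """Return the sorted distinct isovalues of a triangle's three vertex heights.

    One isovalue means the triangle lies on a single step; more than one
    means it crosses a step boundary and would need to be subdivided.
    """
    return sorted({quantize_height(h, interval, base) for h in heights})

# Hypothetical vertex heights in meters, with isovalues every 5 m.
print(quantize_height(12.4, 5.0))
print(triangle_isovalues((11.0, 12.0, 9.0), 5.0))
print(triangle_isovalues((11.0, 14.0, 21.0), 5.0))
```

`math.floor(x + 0.5)` is used instead of the built-in `round` so that ties are always resolved upward rather than to the nearest even step.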

MDE settings in SAP : a descriptive field study (2012)

Hebig, Regina ; Giese, Holger

MDE techniques are increasingly used in practice. However, there is currently a lack of detailed reports about how different MDE techniques are integrated into development and combined with each other. To learn more about such MDE settings, we performed a descriptive and exploratory field study with SAP, a company that operates worldwide, has around 50,000 employees, and builds enterprise software applications. This technical report describes the insights we gained during this study. For example, we identified that MDE settings are subject to evolution. Finally, this report outlines directions for future research to provide practical advice for the application of MDE settings.

Explicit use-case representation in object-oriented programming languages (2012)

Hirschfeld, Robert ; Perscheid, Michael ; Haupt, Michael

Use-cases are considered an integral part of most contemporary development processes since they describe a software system's expected behavior from the perspective of its prospective users. However, the presence of and traceability to use-cases is increasingly lost in later more code-centric development activities. Use-cases, being well-encapsulated at the level of requirements descriptions, eventually lead to crosscutting concerns in system design and source code. Tracing which parts of the system contribute to which use-cases is therefore hard and so limits understandability. In this paper, we propose an approach to making use-cases first-class entities in both the programming language and the runtime environment. Having use-cases present in the code and the running system will allow developers, maintainers, and operators to easily associate their units of work with what matters to the users. We suggest the combination of use-cases, acceptance tests, and dynamic analysis to automatically associate source code with use-cases. We present UseCasePy, an implementation of our approach to use-case-centered development in Python, and its application to the Django Web framework.

Quantitative modeling and analysis of service-oriented real-time systems using interval probabilistic timed automata (2012)

Krause, Christian ; Giese, Holger

One of the key challenges in service-oriented systems engineering is the prediction and assurance of non-functional properties, such as the reliability and the availability of composite interorganizational services. Such systems are often characterized by a variety of inherent uncertainties, which must be addressed in the modeling and the analysis approach. The different relevant types of uncertainties can be categorized into (1) epistemic uncertainties due to incomplete knowledge and (2) randomization as explicitly used in protocols or as a result of physical processes. In this report, we study a probabilistic timed model which allows us to quantitatively reason about non-functional properties for a restricted class of service-oriented real-time systems using formal methods. To properly motivate the choice of approach, we devise a requirements catalogue for the modeling and the analysis of probabilistic real-time systems with uncertainties and provide evidence that the uncertainties of type (1) and (2) in the targeted systems have a major impact on the used models and require distinguished analysis approaches. The formal model we use in this report is Interval Probabilistic Timed Automata (IPTA). Based on the outlined requirements, we give evidence that this model provides both enough expressiveness for a realistic and modular specification of the targeted class of systems, and suitable formal methods for analyzing properties, such as safety and reliability properties, in a quantitative manner. As technical means for the quantitative analysis, we build on probabilistic model checking, specifically on probabilistic time-bounded reachability analysis and computation of expected reachability rewards and costs. To carry out the quantitative analysis using probabilistic model checking, we developed an extension of the Prism tool for modeling and analyzing IPTA.
Our extension of Prism introduces a means for modeling probabilistic uncertainty in the form of probability intervals, as required for IPTA. For analyzing IPTA, our Prism extension moreover adds support for probabilistic reachability checking and computation of expected rewards and costs. We discuss the performance of our extended version of Prism and compare the interval-based IPTA approach to models with fixed probabilities.

Charging and billing in modern communications networks: a comprehensive survey of the state of the art and future requirements (2012)

Kühne, Ralph ; Huitema, George ; Carle, George

In mobile telecommunication networks, the trends toward increasing heterogeneity of access networks and convergence with fixed networks as well as with the Internet are apparent. The resulting future converged network, with an expected wide variety of services and possibly stiff competition between the different market participants, as well as legal issues, will bring about requirements for charging systems that demand more flexibility, scalability, and efficiency than are available in today's systems. This article surveys recent developments in charging and billing architectures, comprising both standardisation work and research projects. The second main contribution of this article is a comparison of key features of these developments, thus giving a list of essential charging and billing ingredients for tomorrow's communication and service environments.

Lazy graph transformation (2012)

Orejas, Fernando ; Lambers, Leen

Applying an attributed graph transformation rule to a given object graph always implies some kind of constraint solving. In many cases, the given constraints are almost trivial to solve. For instance, this is the case when a rule describes a transformation G ⇒ H, where the attributes of H are obtained by some simple computation from the attributes of G. However, there are many other cases where the constraints to solve may not be so trivial and, moreover, may have several answers. This is the case, for instance, when the transformation process includes some kind of searching. In the current approaches to attributed graph transformation these constraints must be completely solved when defining the matching of the given transformation rule. This kind of early binding is well-known from other areas of Computer Science to be inadequate. For instance, the solution chosen for the constraints associated with a given transformation step may not be fully adequate, meaning that later, in the search for a better solution, we may need to backtrack this transformation step. In this paper, based on our previous work on the use of symbolic graphs to deal with different aspects related to attributed graphs, including attributed graph transformation, we present a new approach that, based on the new notion of narrowing graph transformation rule, allows us to delay constraint solving when doing attributed graph transformation, in a way that resembles lazy computation. For this reason, we have called this new kind of transformation lazy. Moreover, we show that the approach is sound and complete with respect to standard attributed graph transformation. A running example, where a graph transformation system describes some basic operations of a travel agency, shows the practical interest of the approach.

In-Memory Data Management (2012)

Plattner, Hasso ; Zeier, Alexander

After 50 years of successful development, business IT has reached a new turning point. Here the authors show for the first time how in-memory computing will change business processes in the future. Until now, enterprise data has been distributed across different databases for performance reasons: analytical data resides in data warehouses and is regularly synchronized with transactional systems. This split makes flexible real-time reporting on current data impossible. But thanks to powerful multi-core CPUs, large main memories, cloud computing, and ever-better mobile devices, companies are increasingly leaving this restrictive model behind. The authors present techniques that allow analytical and transactional processing in real time and thus open up new avenues for business.

Structuring process models (2012)

Polyvyanyy, Artem

One can fairly adopt the ideas of Donald E. Knuth to conclude that process modeling is both a science and an art. Process modeling does have an aesthetic sense. Similar to composing an opera or writing a novel, process modeling is carried out by humans who undergo creative practices when engineering a process model. Therefore, the very same process can be modeled in myriad ways. Once modeled, processes can be analyzed by employing scientific methods. Usually, process models are formalized as directed graphs, with nodes representing tasks and decisions, and directed arcs describing temporal constraints between the nodes. Common process definition languages, such as Business Process Model and Notation (BPMN) and Event-driven Process Chain (EPC), allow process analysts to define models with arbitrarily complex topologies. The absence of structural constraints supports creativity and productivity, as there is no need to force ideas into a limited number of available structural patterns. Nevertheless, it is often preferable that models follow certain structural rules. A well-known structural property of process models is (well-)structuredness. A process model is (well-)structured if and only if every node with multiple outgoing arcs (a split) has a corresponding node with multiple incoming arcs (a join), and vice versa, such that the set of nodes between the split and the join induces a single-entry-single-exit (SESE) region; otherwise the process model is unstructured. The motivations for well-structured process models are manifold: (i) Well-structured process models are easier to lay out for visual representation as their formalizations are planar graphs. (ii) Well-structured process models are easier for humans to comprehend. (iii) Well-structured process models tend to have fewer errors than unstructured ones, and it is less probable to introduce new errors when modifying a well-structured process model.
(iv) Well-structured process models are better suited for analysis, as many existing formal techniques are applicable only to well-structured process models. (v) Well-structured process models are better suited for efficient execution and optimization, e.g., when discovering independent regions of a process model that can be executed concurrently. Consequently, there are process modeling languages that encourage well-structured modeling, e.g., Business Process Execution Language (BPEL) and ADEPT. However, well-structured process modeling implies some limitations: (i) There exist processes that cannot be formalized as well-structured process models. (ii) There exist processes that, when formalized as well-structured process models, require a considerable duplication of modeling constructs. Rather than expecting well-structured modeling from the start, we advocate for the absence of structural constraints when modeling. Afterwards, automated methods can suggest, upon request and whenever possible, alternative formalizations that are "better" structured, preferably well-structured. In this thesis, we study the problem of automatically transforming process models into equivalent well-structured models. The developed transformations are performed under a strong notion of behavioral equivalence which preserves concurrency. The findings are implemented in a tool, which is publicly available.

Structuring acyclic process models (2012)

Polyvyanyy, Artem ; Garcia-Banuelos, Luciano ; Dumas, Marlon

This article studies the problem of transforming a process model with an arbitrary topology into an equivalent well-structured process model. While this problem has received significant attention, there is still no full characterization of the class of unstructured process models that can be transformed into well-structured ones, nor an automated method for structuring any process model that belongs to this class. This article fills this gap in the context of acyclic process models. The article defines a necessary and sufficient condition for an unstructured acyclic process model to have an equivalent well-structured process model under fully concurrent bisimulation, as well as a complete structuring method. The method has been implemented as a tool that takes process models captured in the BPMN and EPC notations as input. The article also reports on an empirical evaluation of the structuring method using a repository of process models from commercial practice.

Supporting object-oriented programming of semantic-web software (2012)

Quasthoff, Matthias ; Meinel, Christoph

This paper presents the state of the art in the development of Semantic-Web-enabled software using object-oriented programming languages. Object triple mapping (OTM) is a frequently used method to simplify the development of such software. A case study that is based on interviews with developers of OTM frameworks is presented at the core of this paper. Following the results of the case study, the formalization of OTM is kept separate from optional but desirable extensions of OTM with regard to metadata, schema matching, and integration into the Semantic-Web infrastructure. The material that is presented is expected to not only explain the development of Semantic-Web software by the usage of OTM, but also explain what properties of Semantic-Web software made developers come up with OTM. Understanding the latter will be essential to get nonexpert software developers to use Semantic-Web technologies in their software.

IPv6 Deployment and Spam Challenges (2012)

Rafiee, Hosnieh ; von Loewis, Martin ; Meinel, Christoph

Spam has posed a serious problem for users of email since its infancy. Today, automated strategies are required to deal with the massive amount of spam traffic. IPv4 networks offer a variety of solutions to reduce spam, but IPv6 networks' large address space and use of temporary addresses - both of which are particularly vulnerable to spam attacks - make dealing with spam and the use of automated approaches much more difficult. IPv6 thus poses a unique security issue for ISPs because it is more difficult for them to differentiate between good IP addresses and those that are known to originate spam messages.

An alert correlation platform for memory-supported techniques (2012)

Roschke, Sebastian ; Cheng, Feng ; Meinel, Christoph

Intrusion Detection Systems (IDS) have been widely deployed in practice for detecting malicious behavior on network communication and hosts. False-positive alerts are a common problem for most IDS approaches. The solution to address this problem is to enhance the detection process by correlation and clustering of alerts. To meet the practical requirements, this process needs to be finished fast, which is a challenging task as the number of alerts in large-scale IDS deployments is significantly high. We identify data storage and processing algorithms to be the most important factors influencing the performance of clustering and correlation. We propose and implement a highly efficient alert correlation platform. For storage, a column-based database, an In-Memory alert storage, and memory-based index tables lead to significant improvements of the performance. For processing, algorithms are designed and implemented which are optimized for In-Memory databases, e.g. an attack graph-based correlation algorithm. The platform can be distributed over multiple processing units to share memory and processing power. A standardized interface is designed to provide a unified view of result reports for end users. The efficiency of the platform is tested by practical experiments with several alert storage approaches, multiple algorithms, as well as a local and a distributed deployment.


30 search hits