Refine
Year of publication
- 2012 (30) (remove)
Document Type
- Article (15)
- Monograph/Edited Volume (11)
- Doctoral Thesis (3)
- Preprint (1)
Is part of the Bibliography
- yes (30)
Keywords
- AUTOSAR (2)
- Data Integration (2)
- Datenintegration (2)
- Java (2)
- Model Synchronisation (2)
- Model Transformation (2)
- Modellierung (2)
- Structuring (2)
- SysML (2)
- Tripel-Graph-Grammatik (2)
Institute
- Hasso-Plattner-Institut für Digital Engineering gGmbH (30) (remove)
Duplicate detection is the task of identifying all groups of records within a data set that represent the same real-world entity, respectively. This task is difficult, because (i) representations might differ slightly, so some similarity measure must be defined to compare pairs of records and (ii) data sets might have a high volume making a pair-wise comparison of all records infeasible. To tackle the second problem, many algorithms have been suggested that partition the data set and compare all record pairs only within each partition. One well-known such approach is the Sorted Neighborhood Method (SNM), which sorts the data according to some key and then advances a window over the data comparing only records that appear within the same window. We propose several variations of SNM that have in common a varying window size and advancement. The general intuition of such adaptive windows is that there might be regions of high similarity suggesting a larger window size and regions of lower similarity suggesting a smaller window size. We propose and thoroughly evaluate several adaption strategies, some of which are provably better than the original SNM in terms of efficiency (same results with fewer comparisons).
In-Memory Data Management
(2012)
Nach 50 Jahren erfolgreicher Entwicklunghat die Business-IT einen neuenWendepunkt erreicht. Hier zeigen die Autoren erstmalig, wieIn-Memory Computing dieUnternehmensprozesse künftig verändern wird. Bisher wurden Unternehmensdaten aus Performance-Gründen auf verschiedene Datenbanken verteilt: Analytische Datenresidieren in Data Warehouses und werden regelmäßig mithilfe transaktionaler Systeme synchronisiert. Diese Aufspaltung macht flexibles Echtzeit-Reporting aktueller Daten unmöglich. Doch dank leistungsfähigerMulti-Core-CPUs, großer Hauptspeicher, Cloud Computing und immerbesserer mobiler Endgeräte lassen die Unternehmen dieses restriktive Modell zunehmend hinter sich. Die Autoren stellen Techniken vor, die eine analytische und transaktionale Verarbeitung in Echtzeit erlauben und so dem Geschäftsleben neue Wege bahnen.
One of the key challenges in service-oriented systems engineering is the prediction and assurance of non-functional properties, such as the reliability and the availability of composite interorganizational services. Such systems are often characterized by a variety of inherent uncertainties, which must be addressed in the modeling and the analysis approach. The different relevant types of uncertainties can be categorized into (1) epistemic uncertainties due to incomplete knowledge and (2) randomization as explicitly used in protocols or as a result of physical processes. In this report, we study a probabilistic timed model which allows us to quantitatively reason about nonfunctional properties for a restricted class of service-oriented real-time systems using formal methods. To properly motivate the choice for the used approach, we devise a requirements catalogue for the modeling and the analysis of probabilistic real-time systems with uncertainties and provide evidence that the uncertainties of type (1) and (2) in the targeted systems have a major impact on the used models and require distinguished analysis approaches. The formal model we use in this report are Interval Probabilistic Timed Automata (IPTA). Based on the outlined requirements, we give evidence that this model provides both enough expressiveness for a realistic and modular specifiation of the targeted class of systems, and suitable formal methods for analyzing properties, such as safety and reliability properties in a quantitative manner. As technical means for the quantitative analysis, we build on probabilistic model checking, specifically on probabilistic time-bounded reachability analysis and computation of expected reachability rewards and costs. To carry out the quantitative analysis using probabilistic model checking, we developed an extension of the Prism tool for modeling and analyzing IPTA. Our extension of Prism introduces a means for modeling probabilistic uncertainty in the form of probability intervals, as required for IPTA. For analyzing IPTA, our Prism extension moreover adds support for probabilistic reachability checking and computation of expected rewards and costs. We discuss the performance of our extended version of Prism and compare the interval-based IPTA approach to models with fixed probabilities.
During the overall development of complex engineering systems different modeling notations are employed. For example, in the domain of automotive systems system engineering models are employed quite early to capture the requirements and basic structuring of the entire system, while software engineering models are used later on to describe the concrete software architecture. Each model helps in addressing the specific design issue with appropriate notations and at a suitable level of abstraction. However, when we step forward from system design to the software design, the engineers have to ensure that all decisions captured in the system design model are correctly transferred to the software engineering model. Even worse, when changes occur later on in either model, today the consistency has to be reestablished in a cumbersome manual step. In this report, we present in an extended version of [Holger Giese, Stefan Neumann, and Stephan Hildebrandt. Model Synchronization at Work: Keeping SysML and AUTOSAR Models Consistent. In Gregor Engels, Claus Lewerentz, Wilhelm Schäfer, Andy Schürr, and B. Westfechtel, editors, Graph Transformations and Model Driven Enginering - Essays Dedicated to Manfred Nagl on the Occasion of his 65th Birthday, volume 5765 of Lecture Notes in Computer Science, pages 555–579. Springer Berlin / Heidelberg, 2010.] how model synchronization and consistency rules can be applied to automate this task and ensure that the different models are kept consistent. We also introduce a general approach for model synchronization. Besides synchronization, the approach consists of tool adapters as well as consistency rules covering the overlap between the synchronized parts of a model and the rest. We present the model synchronization algorithm based on triple graph grammars in detail and further exemplify the general approach by means of a model synchronization solution between system engineering models in SysML and software engineering models in AUTOSAR which has been developed for an industrial partner. In the appendix as extension to [19] the meta-models and all TGG rules for the SysML to AUTOSAR model synchronization are documented.
Virtual 3D city models serve as an effective medium with manifold applications in geoinformation systems and services. To date, most 3D city models are visualized using photorealistic graphics. But an effective communication of geoinformation significantly depends on how important information is designed and cognitively processed in the given application context. One possibility to visually emphasize important information is based on non-photorealistic rendering, which comprehends artistic depiction styles and is characterized by its expressiveness and communication aspects. However, a direct application of non-photorealistic rendering techniques primarily results in monotonic visualization that lacks cartographic design aspects. In this work, we present concepts for cartography-oriented visualization of virtual 3D city models. These are based on coupling non-photorealistic rendering techniques and semantics-based information for a user, context, and media-dependent representation of thematic information. This work highlights challenges for cartography-oriented visualization of 3D geovirtual environments, presents stylization techniques and discusses their applications and ideas for a standardized visualization. In particular, the presented concepts enable a real-time and dynamic visualization of thematic geoinformation.
Virtual 3D city models play an important role in the communication of complex geospatial information in a growing number of applications, such as urban planning, navigation, tourist information, and disaster management. In general, homogeneous graphic styles are used for visualization. For instance, photorealism is suitable for detailed presentations, and non-photorealism or abstract stylization is used to facilitate guidance of a viewer's gaze to prioritized information. However, to adapt visualization to different contexts and contents and to support saliency-guided visualization based on user interaction or dynamically changing thematic information, a combination of different graphic styles is necessary. Design and implementation of such combined graphic styles pose a number of challenges, specifically from the perspective of real-time 3D visualization. In this paper, the authors present a concept and an implementation of a system that enables different presentation styles, their seamless integration within a single view, and parametrized transitions between them, which are defined according to tasks, camera view, and image resolution. The paper outlines potential usage scenarios and application fields together with a performance evaluation of the implementation.
Use-cases are considered an integral part of most contemporary development processes since they describe a software system's expected behavior from the perspective of its prospective users. However, the presence of and traceability to use-cases is increasingly lost in later more code-centric development activities. Use-cases, being well-encapsulated at the level of requirements descriptions, eventually lead to crosscutting concerns in system design and source code. Tracing which parts of the system contribute to which use-cases is therefore hard and so limits understandability.
In this paper, we propose an approach to making use-cases first-class entities in both the programming language and the runtime environment. Having use-cases present in the code and the running system will allow developers, maintainers, and operators to easily associate their units of work with what matters to the users. We suggest the combination of use-cases, acceptance tests, and dynamic analysis to automatically associate source code with use-cases. We present UseCasePy, an implementation of our approach to use-case-centered development in Python, and its application to the Django Web framework.
Process-aware information systems typically involve various kinds of process stakeholders. That, in turn, leads to multiple process models that capture a common process from different perspectives and at different levels of abstraction. In order to guarantee a certain degree of uniformity, the consistency of such related process models is evaluated using formal criteria. However, it is unclear how modelling experts assess the consistency between process models, and which kind of notion they perceive to be appropriate.
In this paper, we focus on control flow aspects and investigate the adequacy of consistency notions. In particular, we report findings from an online experiment, which allows us to compare in how far trace equivalence and two notions based on behavioural profiles approximate expert perceptions on consistency. Analysing 69 expert statements from process analysts, we conclude that trace equivalence is not suited to be applied as a consistency notion, whereas the notions based on behavioural profiles approximate the perceived consistency of our subjects significantly. Therefore, our contribution is an empirically founded answer to the correlation of behaviour consistency notions and the consistency perception by experts in the field of business process modelling.
This article addresses the problem of estimating the Quality of Service (QoS) of a composite service given the QoS of the services participating in the composition. Previous solutions to this problem impose restrictions on the topology of the orchestration models, limiting their applicability to well-structured orchestration models for example. This article lifts these restrictions by proposing a method for aggregate QoS computation that deals with more general types of unstructured orchestration models. The applicability and scalability of the proposed method are validated using a collection of models from industrial practice.