Refine
Has Fulltext
- yes (17) (remove)
Year of publication
- 2020 (17) (remove)
Document Type
- Postprint (8)
- Doctoral Thesis (7)
- Monograph/Edited Volume (2)
Language
- English (17) (remove)
Is part of the Bibliography
- yes (17)
Keywords
Institute
- Hasso-Plattner-Institut für Digital Engineering GmbH (17) (remove)
RHEEMix in the data jungle
(2020)
Data analytics are moving beyond the limits of a single platform. In this paper, we present the cost-based optimizer of Rheem, an open-source cross-platform system that copes with these new requirements. The optimizer allocates the subtasks of data analytic tasks to the most suitable platforms. Our main contributions are: (i) a mechanism based on graph transformations to explore alternative execution strategies; (ii) a novel graph-based approach to determine efficient data movement plans among subtasks and platforms; and (iii) an efficient plan enumeration algorithm, based on a novel enumeration algebra. We extensively evaluate our optimizer under diverse real tasks. We show that our optimizer can perform tasks more than one order of magnitude faster when using multiple platforms than when using a single platform.
Rapid decline of glomerular filtration rate estimated from creatinine (eGFRcrea) is associated with severe clinical endpoints. In contrast to cross-sectionally assessed eGFRcrea, the genetic basis for rapid eGFRcrea decline is largely unknown. To help define this, we meta-analyzed 42 genome-wide association studies from the Chronic Kidney Diseases Genetics Consortium and United Kingdom Biobank to identify genetic loci for rapid eGFRcrea decline. Two definitions of eGFRcrea decline were used: 3 mL/min/1.73m(2)/year or more ("Rapid3"; encompassing 34,874 cases, 107,090 controls) and eGFRcrea decline 25% or more and eGFRcrea under 60 mL/min/1.73m(2) at follow-up among those with eGFRcrea 60 mL/min/1.73m(2) or more at baseline ("CKDi25"; encompassing 19,901 cases, 175,244 controls). Seven independent variants were identified across six loci for Rapid3 and/or CKDi25: consisting of five variants at four loci with genome-wide significance (near UMOD-PDILT (2), PRKAG2, WDR72, OR2S2) and two variants among 265 known eGFRcrea variants (near GATM, LARP4B). All these loci were novel for Rapid3 and/or CKDi25 and our bioinformatic follow-up prioritized variants and genes underneath these loci. The OR2S2 locus is novel for any eGFRcrea trait including interesting candidates. For the five genome-wide significant lead variants, we found supporting effects for annual change in blood urea nitrogen or cystatin-based eGFR, but not for GATM or (LARP4B). Individuals at high compared to those at low genetic risk (8-14 vs. 0-5 adverse alleles) had a 1.20-fold increased risk of acute kidney injury (95% confidence interval 1.08-1.33). Thus, our identified loci for rapid kidney function decline may help prioritize therapeutic targets and identify mechanisms and individuals at risk for sustained deterioration of kidney function.
The analysis of behavioral models is of high importance for cyber-physical systems, as the systems often encompass complex behavior based on e.g. concurrent components with mutual exclusion or probabilistic failures on demand. The rule-based formalism of probabilistic timed graph transformation systems is a suitable choice when the models representing states of the system can be understood as graphs and timed and probabilistic behavior is important. However, model checking PTGTSs is limited to systems with rather small state spaces.
We present an approach for the analysis of large scale systems modeled as probabilistic timed graph transformation systems by systematically decomposing their state spaces into manageable fragments. To obtain qualitative and quantitative analysis results for a large scale system, we verify that results obtained for its fragments serve as overapproximations for the corresponding results of the large scale system. Hence, our approach allows for the detection of violations of qualitative and quantitative safety properties for the large scale system under analysis. We consider a running example in which we model shuttles driving on tracks of a large scale topology and for which we verify that shuttles never collide and are unlikely to execute emergency brakes. In our evaluation, we apply an implementation of our approach to the running example.
The demand for peer-to-peer ridesharing services increased over the last years rapidly. To cost-efficiently dispatch orders and communicate accurate pick-up times is challenging as the current location of each available driver is not exactly known since observed locations can be outdated for several seconds. The developed trajectory visualization tool enables transportation network companies to analyze dispatch processes and determine the causes of unexpected delays. As dispatching algorithms are based on the accuracy of arrival time predictions, we account for factors like noise, sample rate, technical and economic limitations as well as the duration of the entire process as they have an impact on the accuracy of spatio-temporal data. To improve dispatching strategies, we propose a prediction approach that provides a probability distribution for a driver’s future locations based on patterns observed in past trajectories. We demonstrate the capabilities of our prediction results to ( i) avoid critical delays, (ii) to estimate waiting times with higher confidence, and (iii) to enable risk considerations in dispatching strategies.
A treemap is a visualization that has been specifically designed to facilitate the exploration of tree-structured data and, more general, hierarchically structured data. The family of visualization techniques that use a visual metaphor for parent-child relationships based “on the property of containment” (Johnson, 1993) is commonly referred to as treemaps. However, as the number of variations of treemaps grows, it becomes increasingly important to distinguish clearly between techniques and their specific characteristics. This paper proposes to discern between Space-filling Treemap TS, Containment Treemap TC, Implicit Edge Representation Tree TIE, and Mapped Tree TMT for classification of hierarchy visualization techniques and highlights their respective properties. This taxonomy is created as a hyponymy, i.e., its classes have an is-a relationship to one another: TS TC TIE TMT. With this proposal, we intend to stimulate a discussion on a more unambiguous classification of treemaps and, furthermore, broaden what is understood by the concept of treemap itself.
With the spread of smart phones capable of taking high-resolution photos and the development of high-speed mobile data infrastructure, digital visual media is becoming one of the most important forms of modern communication. With this development, however, also comes a devaluation of images as a media form with the focus becoming the frequency at which visual content is generated instead of the quality of the content. In this work, an interactive system using image-abstraction techniques and an eye tracking sensor is presented, which allows users to experience diverting and dynamic artworks that react to their eye movement. The underlying modular architecture enables a variety of different interaction techniques that share common design principles, making the interface as intuitive as possible. The resulting experience allows users to experience a game-like interaction in which they aim for a reward, the artwork, while being held under constraints, e.g., not blinking. The co nscious eye movements that are required by some interaction techniques hint an interesting, possible future extension for this work into the field of relaxation exercises and concentration training.
Distance Education or e-Learning platform should be able to provide a virtual laboratory to let the participants have hands-on exercise experiences in practicing their skill remotely. Especially in Cybersecurity e-Learning where the participants need to be able to attack or defend the IT System. To have a hands-on exercise, the virtual laboratory environment must be similar to the real operational environment, where an attack or a victim is represented by a node in a virtual laboratory environment. A node is usually represented by a Virtual Machine (VM). Scalability has become a primary issue in the virtual laboratory for cybersecurity e-Learning because a VM needs a significant and fix allocation of resources. Available resources limit the number of simultaneous users. Scalability can be increased by increasing the efficiency of using available resources and by providing more resources. Increasing scalability means increasing the number of simultaneous users.
In this thesis, we propose two approaches to increase the efficiency of using the available resources. The first approach in increasing efficiency is by replacing virtual machines (VMs) with containers whenever it is possible. The second approach is sharing the load with the user-on-premise machine, where the user-on-premise machine represents one of the nodes in a virtual laboratory scenario. We also propose two approaches in providing more resources. One way to provide more resources is by using public cloud services. Another way to provide more resources is by gathering resources from the crowd, which is referred to as Crowdresourcing Virtual Laboratory (CRVL).
In CRVL, the crowd can contribute their unused resources in the form of a VM, a bare metal system, an account in a public cloud, a private cloud and an isolated group of VMs, but in this thesis, we focus on a VM. The contributor must give the credential of the VM admin or root user to the CRVL system. We propose an architecture and methods to integrate or dis-integrate VMs from the CRVL system automatically. A Team placement algorithm must also be investigated to optimize the usage of resources and at the same time giving the best service to the user. Because the CRVL system does not manage the contributor host machine, the CRVL system must be able to make sure that the VM integration will not harm their system and that the training material will be stored securely in the contributor sides, so that no one is able to take the training material away without permission. We are investigating ways to handle this kind of threats.
We propose three approaches to strengthen the VM from a malicious host admin. To verify the integrity of a VM before integration to the CRVL system, we propose a remote verification method without using any additional hardware such as the Trusted Platform Module chip. As the owner of the host machine, the host admins could have access to the VM's data via Random Access Memory (RAM) by doing live memory dumping, Spectre and Meltdown attacks. To make it harder for the malicious host admin in getting the sensitive data from RAM, we propose a method that continually moves sensitive data in RAM. We also propose a method to monitor the host machine by installing an agent on it. The agent monitors the hypervisor configurations and the host admin activities.
To evaluate our approaches, we conduct extensive experiments with different settings. The use case in our approach is Tele-Lab, a Virtual Laboratory platform for Cyber Security e-Learning. We use this platform as a basis for designing and developing our approaches. The results show that our approaches are practical and provides enhanced security.
Economic impact of clinical decision support interventions based on electronic health records
(2020)
Background
Unnecessary healthcare utilization, non-adherence to current clinical guidelines, or insufficient personalized care are perpetual challenges and remain potential major cost-drivers for healthcare systems around the world. Implementing decision support systems into clinical care is promised to improve quality of care and thereby yield substantial effects on reducing healthcare expenditure. In this article, we evaluate the economic impact of clinical decision support (CDS) interventions based on electronic health records (EHR).
Methods
We searched for studies published after 2014 using MEDLINE, CENTRAL, WEB OF SCIENCE, EBSCO, and TUFTS CEA registry databases that encompass an economic evaluation or consider cost outcome measures of EHR based CDS interventions. Thereupon, we identified best practice application areas and categorized the investigated interventions according to an existing taxonomy of front-end CDS tools.
Results and discussion
Twenty-seven studies are investigated in this review. Of those, twenty-two studies indicate a reduction of healthcare expenditure after implementing an EHR based CDS system, especially towards prevalent application areas, such as unnecessary laboratory testing, duplicate order entry, efficient transfusion practice, or reduction of antibiotic prescriptions. On the contrary, order facilitators and undiscovered malfunctions revealed to be threats and could lead to new cost drivers in healthcare. While high upfront and maintenance costs of CDS systems are a worldwide implementation barrier, most studies do not consider implementation cost. Finally, four included economic evaluation studies report mixed monetary outcome results and thus highlight the importance of further high-quality economic evaluations for these CDS systems.
Conclusion
Current research studies lack consideration of comparative cost-outcome metrics as well as detailed cost components in their analyses. Nonetheless, the positive economic impact of EHR based CDS interventions is highly promising, especially with regard to reducing waste in healthcare.
Comment sections of online news platforms are an essential space to express opinions and discuss political topics. However, the misuse by spammers, haters, and trolls raises doubts about whether the benefits justify the costs of the time-consuming content moderation. As a consequence, many platforms limited or even shut down comment sections completely. In this thesis, we present deep learning approaches for comment classification, recommendation, and prediction to foster respectful and engaging online discussions. The main focus is on two kinds of comments: toxic comments, which make readers leave a discussion, and engaging comments, which make readers join a discussion. First, we discourage and remove toxic comments, e.g., insults or threats. To this end, we present a semi-automatic comment moderation process, which is based on fine-grained text classification models and supports moderators. Our experiments demonstrate that data augmentation, transfer learning, and ensemble learning allow training robust classifiers even on small datasets. To establish trust in the machine-learned models, we reveal which input features are decisive for their output with attribution-based explanation methods. Second, we encourage and highlight engaging comments, e.g., serious questions or factual statements. We automatically identify the most engaging comments, so that readers need not scroll through thousands of comments to find them. The model training process builds on upvotes and replies as a measure of reader engagement. We also identify comments that address the article authors or are otherwise relevant to them to support interactions between journalists and their readership. Taking into account the readers' interests, we further provide personalized recommendations of discussions that align with their favored topics or involve frequent co-commenters. Our models outperform multiple baselines and recent related work in experiments on comment datasets from different platforms.
Successfully completing any data science project demands careful consideration across its whole process. Although the focus is often put on later phases of the process, in practice, experts spend more time in earlier phases, preparing data, to make them consistent with the systems' requirements or to improve their models' accuracies. Duplicate detection is typically applied during the data cleaning phase, which is dedicated to removing data inconsistencies and improving the overall quality and usability of data. While data cleaning involves a plethora of approaches to perform specific operations, such as schema alignment and data normalization, the task of detecting and removing duplicate records is particularly challenging. Duplicates arise when multiple records representing the same entities exist in a database. Due to numerous reasons, spanning from simple typographical errors to different schemas and formats of integrated databases. Keeping a database free of duplicates is crucial for most use-cases, as their existence causes false negatives and false positives when matching queries against it. These two data quality issues have negative implications for tasks, such as hotel booking, where users may erroneously select a wrong hotel, or parcel delivery, where a parcel can get delivered to the wrong address. Identifying the variety of possible data issues to eliminate duplicates demands sophisticated approaches.
While research in duplicate detection is well-established and covers different aspects of both efficiency and effectiveness, our work in this thesis focuses on the latter. We propose novel approaches to improve data quality before duplicate detection takes place and apply the latter in datasets even when prior labeling is not available. Our experiments show that improving data quality upfront can increase duplicate classification results by up to 19%. To this end, we propose two novel pipelines that select and apply generic as well as address-specific data preparation steps with the purpose of maximizing the success of duplicate detection. Generic data preparation, such as the removal of special characters, can be applied to any relation with alphanumeric attributes. When applied, data preparation steps are selected only for attributes where there are positive effects on pair similarities, which indirectly affect classification, or on classification directly. Our work on addresses is twofold; first, we consider more domain-specific approaches to improve the quality of values, and, second, we experiment with known and modified versions of similarity measures to select the most appropriate per address attribute, e.g., city or country.
To facilitate duplicate detection in applications where gold standard annotations are not available and obtaining them is not possible or too expensive, we propose MDedup. MDedup is a novel, rule-based, and fully automatic duplicate detection approach that is based on matching dependencies. These dependencies can be used to detect duplicates and can be discovered using state-of-the-art algorithms efficiently and without any prior labeling. MDedup uses two pipelines to first train on datasets with known labels, learning to identify useful matching dependencies, and then be applied on unseen datasets, regardless of any existing gold standard. Finally, our work is accompanied by open source code to enable repeatability of our research results and application of our approaches to other datasets.
Gait analysis is an important tool for the early detection of neurological diseases and for the assessment of risk of falling in elderly people. The availability of low-cost camera hardware on the market today and recent advances in Machine Learning enable a wide range of clinical and health-related applications, such as patient monitoring or exercise recognition at home. In this study, we evaluated the motion tracking performance of the latest generation of the Microsoft Kinect camera, Azure Kinect, compared to its predecessor Kinect v2 in terms of treadmill walking using a gold standard Vicon multi-camera motion capturing system and the 39 marker Plug-in Gait model. Five young and healthy subjects walked on a treadmill at three different velocities while data were recorded simultaneously with all three camera systems. An easy-to-administer camera calibration method developed here was used to spatially align the 3D skeleton data from both Kinect cameras and the Vicon system. With this calibration, the spatial agreement of joint positions between the two Kinect cameras and the reference system was evaluated. In addition, we compared the accuracy of certain spatio-temporal gait parameters, i.e., step length, step time, step width, and stride time calculated from the Kinect data, with the gold standard system. Our results showed that the improved hardware and the motion tracking algorithm of the Azure Kinect camera led to a significantly higher accuracy of the spatial gait parameters than the predecessor Kinect v2, while no significant differences were found between the temporal parameters. Furthermore, we explain in detail how this experimental setup could be used to continuously monitor the progress during gait rehabilitation in older people.
How We Found Our IMU
(2020)
Inertial measurement units (IMUs) are commonly used for localization or movement tracking in pervasive healthcare-related studies, and gait analysis is one of the most often studied topics using IMUs. The increasing variety of commercially available IMU devices offers convenience by combining the sensor modalities and simplifies the data collection procedures. However, selecting the most suitable IMU device for a certain use case is increasingly challenging. In this study, guidelines for IMU selection are proposed. In particular, seven IMUs were compared in terms of their specifications, data collection procedures, and raw data quality. Data collected from the IMUs were then analyzed by a gait analysis algorithm. The difference in accuracy of the calculated gait parameters between the IMUs could be used to retrace the issues in raw data, such as acceleration range or sensor calibration. Based on our algorithm, we were able to identify the best-suited IMUs for our needs. This study provides an overview of how to select the IMUs based on the area of study with concrete examples, and gives insights into the features of seven commercial IMUs using real data.
Single-column data profiling
(2020)
The research area of data profiling consists of a large set of methods and processes to examine a given dataset and determine metadata about it. Typically, different data profiling tasks address different kinds of metadata, comprising either various statistics about individual columns (Single-column Analysis) or relationships among them (Dependency Discovery). Among the basic statistics about a column are data type, header, the number of unique values (the column's cardinality), maximum and minimum values, the number of null values, and the value distribution. Dependencies involve, for instance, functional dependencies (FDs), inclusion dependencies (INDs), and their approximate versions.
Data profiling has a wide range of conventional use cases, namely data exploration, cleansing, and integration. The produced metadata is also useful for database management and schema reverse engineering. Data profiling has also more novel use cases, such as big data analytics. The generated metadata describes the structure of the data at hand, how to import it, what it is about, and how much of it there is. Thus, data profiling can be considered as an important preparatory task for many data analysis and mining scenarios to assess which data might be useful and to reveal and understand a new dataset's characteristics.
In this thesis, the main focus is on the single-column analysis class of data profiling tasks. We study the impact and the extraction of three of the most important metadata about a column, namely the cardinality, the header, and the number of null values.
First, we present a detailed experimental study of twelve cardinality estimation algorithms. We classify the algorithms and analyze their efficiency, scaling far beyond the original experiments and testing theoretical guarantees. Our results highlight their trade-offs and point out the possibility to create a parallel or a distributed version of these algorithms to cope with the growing size of modern datasets.
Then, we present a fully automated, multi-phase system to discover human-understandable, representative, and consistent headers for a target table in cases where headers are missing, meaningless, or unrepresentative for the column values. Our evaluation on Wikipedia tables shows that 60% of the automatically discovered schemata are exact and complete. Considering more schema candidates, top-5 for example, increases this percentage to 72%.
Finally, we formally and experimentally show the ghost and fake FDs phenomenon caused by FD discovery over datasets with missing values. We propose two efficient scores, probabilistic and likelihood-based, for estimating the genuineness of a discovered FD. Our extensive set of experiments on real-world and semi-synthetic datasets show the effectiveness and efficiency of these scores.
Lifelong learning plays an increasingly important role in many societies. Technology is changing faster than ever and what has been important to learn today, may be obsolete tomorrow. The role of informal programs is becoming increasingly important. Particularly, Massive Open Online Courses have become popular among learners and instructors. In 2008, a group of Canadian education enthusiasts started the first Massive Open Online Courses or MOOCs to prove their cognitive theory of Connectivism. Around 2012, a variety of American start-ups redefined the concept of MOOCs. Instead of following the connectivist doctrine they returned to a more traditional approach. They focussed on video lecturing and combined this with a course forum that allowed the participants to discuss with each other and the teaching team. While this new version of the concept was enormously successful in terms of massiveness—hundreds of thousands of participants from all over the world joined the first of these courses—many educators criticized the re-lapse to the cognitivist model. In the early days, the evolving platforms often did not have more features than a video player, simple multiple-choice quizzes, and the course forum. It soon became a major interest of research to allow the scaling of more modern approaches of learning and teaching for the massiveness of these courses. Hands-on exercises, alternative forms of assessment, collaboration, and teamwork are some of the topics on the agenda. The insights provided by cognitive and pedagogical theories, however, do not necessarily always run in sync with the needs and the preferences of the majority of participants. While the former promote action-learning, hands-on-learning, competence-based-learning, project-based-learning, team-based-learning as the holy grail, many of the latter often rather prefer a more laid-back style of learning, sometimes referred to as edutainment. Obviously, given the large numbers of participants in these courses, there is not just one type of learners. Participants are not a homogeneous mass but a potpourri of individuals with a wildly heterogeneous mix of backgrounds, previous knowledge, familial and professional circumstances, countries of origin, gender, age, and so on. For the majority of participants, a full-time job and/or a family often just does not leave enough room for more time intensive tasks, such as practical exercises or teamwork. Others, however, particularly enjoy these hands-on or collaborative aspects of MOOCs. Furthermore, many subjects particularly require these possibilities and simply cannot be taught or learned in courses that lack collaborative or hands-on features. In this context, the thesis discusses how team assignments have been implemented on the HPI MOOC platform. During the recent years, several experiments have been conducted and a great amount of experience has been gained by employing team assignments in courses in areas, such as Object-Oriented Programming, Design Thinking, and Business Innovation on various instances of this platform: openHPI, openSAP, and mooc.house
The “HPI Future SOC Lab” is a cooperation of the Hasso Plattner Institute (HPI) and industry partners. Its mission is to enable and promote exchange and interaction between the research community and the industry partners.
The HPI Future SOC Lab provides researchers with free of charge access to a complete infrastructure of state of the art hard and software. This infrastructure includes components, which might be too expensive for an ordinary research environment, such as servers with up to 64 cores and 2 TB main memory. The offerings address researchers particularly from but not limited to the areas of computer science and business information systems. Main areas of research include cloud computing, parallelization, and In-Memory technologies.
This technical report presents results of research projects executed in 2017. Selected projects have presented their results on April 25th and November 15th 2017 at the Future SOC Lab Day events.
This work presents a new design for programming environments that promote the exploration of domain-specific software artifacts and the construction of graphical tools for such program comprehension tasks. In complex software projects, tool building is essential because domain- or task-specific tools can support decision making by representing concerns concisely with low cognitive effort. In contrast, generic tools can only support anticipated scenarios, which usually align with programming language concepts or well-known project domains.
However, the creation and modification of interactive tools is expensive because the glue that connects data to graphics is hard to find, change, and test. Even if valuable data is available in a common format and even if promising visualizations could be populated, programmers have to invest many resources to make changes in the programming environment. Consequently, only ideas of predictably high value will be implemented. In the non-graphical, command-line world, the situation looks different and inspiring: programmers can easily build their own tools as shell scripts by configuring and combining filter programs to process data.
We propose a new perspective on graphical tools and provide a concept to build and modify such tools with a focus on high quality, low effort, and continuous adaptability. That is, (1) we propose an object-oriented, data-driven, declarative scripting language that reduces the amount of and governs the effects of glue code for view-model specifications, and (2) we propose a scalable UI-design language that promotes short feedback loops in an interactive, graphical environment such as Morphic known from Self or Squeak/Smalltalk systems.
We implemented our concept as a tool building environment, which we call VIVIDE, on top of Squeak/Smalltalk and Morphic. We replaced existing code browsing and debugging tools to iterate within our solution more quickly. In several case studies with undergraduate and graduate students, we observed that VIVIDE can be applied to many domains such as live language development, source-code versioning, modular code browsing, and multi-language debugging. Then, we designed a controlled experiment to measure the effect on the time to build tools. Several pilot runs showed that training is crucial and, presumably, takes days or weeks, which implies a need for further research.
As a result, programmers as users can directly work with tangible representations of their software artifacts in the VIVIDE environment. Tool builders can write domain-specific scripts to populate views to approach comprehension tasks from different angles. Our novel perspective on graphical tools can inspire the creation of new trade-offs in modularity for both data providers and view designers.
With rising complexity of today's software and hardware systems and the hypothesized increase in autonomous, intelligent, and self-* systems, developing correct systems remains an important challenge. Testing, although an important part of the development and maintainance process, cannot usually establish the definite correctness of a software or hardware system - especially when systems have arbitrarily large or infinite state spaces or an infinite number of initial states. This is where formal verification comes in: given a representation of the system in question in a formal framework, verification approaches and tools can be used to establish the system's adherence to its similarly formalized specification, and to complement testing.
One such formal framework is the field of graphs and graph transformation systems. Both are powerful formalisms with well-established foundations and ongoing research that can be used to describe complex hardware or software systems with varying degrees of abstraction. Since their inception in the 1970s, graph transformation systems have continuously evolved; related research spans extensions of expressive power, graph algorithms, and their implementation, application scenarios, or verification approaches, to name just a few topics.
This thesis focuses on a verification approach for graph transformation systems called k-inductive invariant checking, which is an extension of previous work on 1-inductive invariant checking. Instead of exhaustively computing a system's state space, which is a common approach in model checking, 1-inductive invariant checking symbolically analyzes graph transformation rules - i.e. system behavior - in order to draw conclusions with respect to the validity of graph constraints in the system's state space. The approach is based on an inductive argument: if a system's initial state satisfies a graph constraint and if all rules preserve that constraint's validity, we can conclude the constraint's validity in the system's entire state space - without having to compute it.
However, inductive invariant checking also comes with a specific drawback: the locality of graph transformation rules leads to a lack of context information during the symbolic analysis of potential rule applications. This thesis argues that this lack of context can be partly addressed by using k-induction instead of 1-induction. A k-inductive invariant is a graph constraint whose validity in a path of k-1 rule applications implies its validity after any subsequent rule application - as opposed to a 1-inductive invariant where only one rule application is taken into account. Considering a path of transformations then accumulates more context of the graph rules' applications.
As such, this thesis extends existing research and implementation on 1-inductive invariant checking for graph transformation systems to k-induction. In addition, it proposes a technique to perform the base case of the inductive argument in a symbolic fashion, which allows verification of systems with an infinite set of initial states. Both k-inductive invariant checking and its base case are described in formal terms. Based on that, this thesis formulates theorems and constructions to apply this general verification approach for typed graph transformation systems and nested graph constraints - and to formally prove the approach's correctness.
Since unrestricted graph constraints may lead to non-termination or impracticably high execution times given a hypothetical implementation, this thesis also presents a restricted verification approach, which limits the form of graph transformation systems and graph constraints. It is formalized, proven correct, and its procedures terminate by construction. This restricted approach has been implemented in an automated tool and has been evaluated with respect to its applicability to test cases, its performance, and its degree of completeness.