000 Informatik, Informationswissenschaft, allgemeine Werke
Refine
Year of publication
- 2022 (33)
Document Type
- Article (19)
- Doctoral Thesis (9)
- Conference Proceeding (2)
- Master's Thesis (1)
- Report (1)
- Review (1)
Is part of the Bibliography
- yes (33)
Keywords
- Answer set programming (2)
- social media (2)
- 3D Druck (1)
- 3D printing (1)
- AI Lab (1)
- Abschlussbericht (1)
- Access Control (1)
- Analytical models (1)
- Aufzählungsalgorithmen (1)
- Benchmark (1)
Institute
- Hasso-Plattner-Institut für Digital Engineering GmbH (12)
- Hasso-Plattner-Institut für Digital Engineering gGmbH (8)
- Institut für Informatik und Computational Science (7)
- Fachgruppe Volkswirtschaftslehre (2)
- Wirtschafts- und Sozialwissenschaftliche Fakultät (2)
- Fachgruppe Betriebswirtschaftslehre (1)
- Fachgruppe Politik- & Verwaltungswissenschaft (1)
The article explores whether and to what extent expert recommendations affect decision-making within the Security Council and its North Korea and Iran sanctions regimes. The article first develops a rationalist theoretical argument to show why making many second-stage decisions, such as determining lists of items under export restrictions, subjects Security Council members to repeating coordination situations. Expert recommendations may provide focal point solutions to coordination problems, even when interests diverge and preferences remain stable. Empirically, the article first explores whether expert recommendations affected decision-making on commodity sanctions imposed on North Korea. Council members heavily relied on recommended export trigger lists as focal points, solving a divisive conflict among great powers. Second, the article explores whether expert recommendations affected the designation of sanctions violators in the Iran sanctions regime. Council members designated individuals and entities following expert recommendations as focal points, despite conflicting interests among great powers. The article concludes that expert recommendations are an additional means of influence in Security Council decision-making and seem relevant for second-stage decision-making among great powers in other international organisations.
N-of-1 trials are the gold standard study design to evaluate individual treatment effects and derive personalized treatment strategies. Digital tools have the potential to initiate a new era of N-of-1 trials in terms of scale and scope, but fully functional platforms are not yet available.
Here, we present the open source StudyU platform, which includes the StudyU Designer and StudyU app.
With the StudyU Designer, scientists are given a collaborative web application to digitally specify, publish, and conduct N-of-1 trials.
The StudyU app is a smartphone app with innovative user-centric elements for participants to partake in trials published through the StudyU Designer to assess the effects of different interventions on their health.
Thereby, the StudyU platform allows clinicians and researchers worldwide to easily design and conduct digital N-of-1 trials in a safe manner.
We envision that StudyU can change the landscape of personalized treatments for both patients and healthy individuals, democratize and personalize evidence generation for self-optimization and medicine, and be integrated into clinical practice.
Omics and male infertility
(2022)
Male infertility is a multifaceted disorder affecting approximately 50% of male partners in infertile couples.
Over the years, male infertility has been diagnosed mainly through semen analysis, hormone evaluations, medical records, and physical examinations, which are fundamental but insufficient, as 30% of male infertility cases remain idiopathic. This diagnostic gap needs to be addressed with more sophisticated, results-driven technologies and techniques.
Genetic alterations have been linked with male infertility, thereby unveiling the practicality of investigating this disorder from the "omics" perspective.
Omics aims at analyzing the structure and functions of the whole set of constituents underlying a given biological function at different levels, including the molecular gene level (genomics), transcript level (transcriptomics), protein level (proteomics), and metabolite level (metabolomics). In the current study, an overview of the four branches of omics and their roles in male infertility is briefly given, and the potential usefulness of assessing transcriptomic data to understand this pathology is elucidated.
After assessing the publicly obtainable transcriptomic data for datasets on male infertility, a total of 1385 datasets were retrieved, of which 10 datasets met the inclusion criteria and were used for further analysis.
These datasets were classified into groups according to the disease or cause of male infertility.
The groups include non-obstructive azoospermia (NOA), obstructive azoospermia (OA), non-obstructive and obstructive azoospermia (NOA and OA), spermatogenic dysfunction, sperm dysfunction, and Y chromosome microdeletion.
Findings revealed that 8 genes (LDHC, PDHA2, TNP1, TNP2, ODF1, ODF2, SPINK2, PCDHB3) were commonly differentially expressed between all disease groups.
Likewise, 56 genes were common between NOA versus NOA and OA (ADAD1, BANF2, BCL2L14, C12orf50, C20orf173, C22orf23, C6orf99, C9orf131, C9orf24, CABS1, CAPZA3, CCDC187, CCDC54, CDKN3, CEP170, CFAP206, CRISP2, CT83, CXorf65, FAM209A, FAM71F1, FAM81B, GALNTL5, GTSF1, H1FNT, HEMGN, HMGB4, KIF2B, LDHC, LOC441601, LYZL2, ODF1, ODF2, PCDHB3, PDHA2, PGK2, PIH1D2, PLCZ1, PROCA1, RIMBP3, ROPN1L, SHCBP1L, SMCP, SPATA16, SPATA19, SPINK2, TEX33, TKTL2, TMCO2, TMCO5A, TNP1, TNP2, TSPAN16, TSSK1B, TTLL2, UBQLN3).
These genes, particularly the above-mentioned 8 genes, are involved in diverse biological processes such as germ cell development, spermatid development, spermatid differentiation, regulation of proteolysis, spermatogenesis and metabolic processes.
Owing to the stage-specific expression of these genes, any aberrant expression can ultimately lead to male infertility.
Therefore, currently available data on all branches of omics relating to male fertility can be used to identify biomarkers for diagnosing male infertility, which can potentially help in unravelling some idiopathic cases.
Modern data analysis tasks often involve control flow statements, such as the iterations in PageRank and K-means. To achieve scalability, developers usually implement these tasks in distributed dataflow systems, such as Spark and Flink. Designers of such systems have to choose between providing imperative or functional control flow constructs to users. Imperative constructs are easier to use, but functional constructs are easier to compile to an efficient dataflow job. We propose Mitos, a system where control flow is both easy to use and efficient. Mitos relies on an intermediate representation based on the static single assignment form. This allows us to abstract away from specific control flow constructs and treat any imperative control flow uniformly both when building the dataflow job and when coordinating the distributed execution.
During a crisis event, social media enables two-way communication and many-to-many information broadcasting: browsing others’ posts, publishing one’s own content, and public commenting. These records can deliver valuable insights for approaching problematic situations effectively. Our study explores how social media communication can be analyzed to better understand the responses to health crises. Results based on nearly 800,000 tweets indicate that the coping and regulation foci framework holds good explanatory power, with four clusters salient in public reactions: 1) “Understanding” (problem-promotion); 2) “Action planning” (problem-prevention); 3) “Hope” (emotion-promotion); and 4) “Reassurance” (emotion-prevention). Second, the inter-temporal analysis shows high volatility of topic proportions and a shift from self-centered to community-centered topics over the course of the event. The insights are beneficial for research on crisis management and for practitioners interested in large-scale monitoring of their audience for well-informed decision-making.
Deep metric learning employs deep neural networks to embed instances into a metric space such that distances between instances of the same class are small and distances between instances from different classes are large. In most existing deep metric learning techniques, the embedding of an instance is given by a feature vector produced by a deep neural network and Euclidean distance or cosine similarity defines distances between these vectors. This paper studies deep distributional embeddings of sequences, where the embedding of a sequence is given by the distribution of learned deep features across the sequence. The motivation for this is to better capture statistical information about the distribution of patterns within the sequence in the embedding. When embeddings are distributions rather than vectors, measuring distances between embeddings involves comparing their respective distributions. The paper therefore proposes a distance metric based on Wasserstein distances between the distributions and a corresponding loss function for metric learning, which leads to a novel end-to-end trainable embedding model. We empirically observe that distributional embeddings outperform standard vector embeddings and that training with the proposed Wasserstein metric outperforms training with other distance functions.
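To make the idea concrete, here is a hedged sketch (not the paper's model) that compares the deep-feature distributions of two sequences with per-dimension 1D Wasserstein distances via SciPy; the feature arrays and the per-dimension averaging are assumptions of this illustration.

```python
# Illustrative sketch only: compares two sequences' feature distributions
# with an average of per-dimension 1D Wasserstein distances. The paper's
# actual embedding model and loss may differ.
import numpy as np
from scipy.stats import wasserstein_distance

def distributional_distance(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    """feats_a, feats_b: (sequence_length, feature_dim) arrays of deep
    features extracted across each sequence. Returns the mean of the
    1D Wasserstein distances computed independently per feature dimension."""
    assert feats_a.shape[1] == feats_b.shape[1]
    return float(np.mean([
        wasserstein_distance(feats_a[:, d], feats_b[:, d])
        for d in range(feats_a.shape[1])
    ]))

# Toy usage: two sequences whose feature statistics differ
rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, size=(50, 8))   # features of sequence A
b = rng.normal(0.5, 1.2, size=(60, 8))   # features of sequence B
print(distributional_distance(a, b))
```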
Answer set planning
(2022)
Answer Set Planning refers to the use of Answer Set Programming (ASP) to compute plans, that is, solutions to planning problems, that transform a given state of the world to another state. The development of efficient and scalable answer set solvers has provided a significant boost to the development of ASP-based planning systems. This paper surveys the progress made during the last two and a half decades in the area of answer set planning, from its foundations to its use in challenging planning domains. The survey explores the advantages and disadvantages of answer set planning. It also discusses typical applications of answer set planning and presents a set of challenges for future research.
Fast style transfer methods have recently gained popularity in art-related applications as they make a generalized real-time stylization of images practicable. However, they are mostly limited to one-shot stylizations concerning the interactive adjustment of style elements. In particular, the expressive control over stroke sizes or stroke orientations remains an open challenge. To this end, we propose a novel stroke-adjustable fast style transfer network that enables simultaneous control over the stroke size and intensity, and allows a wider range of expressive editing than current approaches by utilizing the scale-variance of convolutional neural networks. Furthermore, we introduce a network-agnostic approach for style-element editing by applying reversible input transformations that can adjust strokes in the stylized output. In this way, stroke orientations can be adjusted and warping-based effects can be applied to stylistic elements, such as swirls or waves. To demonstrate the real-world applicability of our approach, we present StyleTune, a mobile app for interactive editing of neural style transfers at multiple levels of control. Our app allows stroke adjustments on a global and local level. It furthermore implements an on-device patch-based upsampling step that enables users to achieve results with high output fidelity and resolutions of more than 20 megapixels. Our approach allows users to art-direct their creations and achieve results that are not possible with current style transfer applications.
Computational drug sensitivity models have the potential to improve therapeutic outcomes by identifying targeted drug components that are likely to achieve the highest efficacy for a cancer cell line at hand at a therapeutic dose. State-of-the-art drug sensitivity models use regression techniques to predict the inhibitory concentration of a drug for a tumor cell line. This regression objective is not directly aligned with the principal goal of drug sensitivity models: we argue that drug sensitivity modeling should be seen as a ranking problem with an optimization criterion that quantifies a drug's inhibitory capacity for the cancer cell line at hand relative to its toxicity for healthy cells. We derive an extension to the well-established drug sensitivity regression model PaccMann that employs a ranking loss and focuses on the ratio of inhibitory concentration and therapeutic dosage range. We find that the ranking extension significantly enhances the model's capability to identify the most effective anticancer drugs for unseen tumor cell profiles based on in-vitro data.
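As a generic illustration of phrasing sensitivity prediction as ranking, the following is a minimal pairwise hinge ranking loss over predicted drug scores; it is a sketch, not the actual PaccMann extension or its loss function.

```python
# Minimal sketch of a pairwise ranking (hinge) loss over predicted drug
# scores, one way to phrase drug sensitivity as a ranking problem.
import numpy as np

def pairwise_ranking_loss(scores: np.ndarray, labels: np.ndarray,
                          margin: float = 1.0) -> float:
    """scores: model outputs per drug for one cell line (higher = predicted
    more effective). labels: ground-truth effectiveness values (e.g., a
    ratio of inhibitory concentration to therapeutic dose range).
    Penalizes every pair ordered incorrectly by less than the margin."""
    loss, pairs = 0.0, 0
    for i in range(len(scores)):
        for j in range(len(scores)):
            if labels[i] > labels[j]:          # drug i truly better than j
                loss += max(0.0, margin - (scores[i] - scores[j]))
                pairs += 1
    return loss / max(pairs, 1)

print(pairwise_ranking_loss(np.array([2.0, 0.5, 1.0]),
                            np.array([3.0, 1.0, 2.0])))
```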
One of the first and easiest-to-use techniques for proving run time bounds for evolutionary algorithms is the so-called method of fitness levels by Wegener. It uses a partition of the search space into a sequence of levels which are traversed by the algorithm in increasing order, possibly skipping levels. An easy, but often strong upper bound for the run time can then be derived by adding the reciprocals of the probabilities to leave the levels (or upper bounds on these reciprocals). Unfortunately, a similarly effective method for proving lower bounds has not yet been established. The strongest such method, proposed by Sudholt (2013), requires a careful choice of the viscosity parameters γ_{i,j}, 0 ≤ i < j ≤ n. In this paper we present two new variants of the method, one for upper and one for lower bounds. Besides the level leaving probabilities, they only rely on the probabilities that levels are visited at all. We show that these can be computed or estimated without greater difficulties and apply our method to reprove the following known results in an easy and natural way. (i) The precise run time of the (1+1) EA on LEADINGONES. (ii) A lower bound for the run time of the (1+1) EA on ONEMAX, tight apart from an O(n) term. (iii) A lower bound for the run time of the (1+1) EA on long k-paths (which differs slightly from the previous result due to a small error in the latter). We also prove a tighter lower bound for the run time of the (1+1) EA on jump functions by showing that, regardless of the jump size, the algorithm can avoid jumping over the valley of low fitness only with probability O(2^(-n)).
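For concreteness, here is a minimal sketch of the classic fitness-level upper bound applied to the (1+1) EA on LEADINGONES; the mutation rate 1/n and the level-leaving probability below are the textbook choices, assumed for this illustration rather than taken from the paper.

```python
# Sketch of Wegener's fitness-level upper bound for the (1+1) EA with
# standard bit mutation (rate 1/n) on LEADINGONES. From level i (fitness i),
# the level is left by flipping bit i+1 while keeping the first i bits,
# which happens with probability p_i = (1/n) * (1 - 1/n)^i. The method
# bounds the expected run time by the sum of the reciprocals 1/p_i.
def fitness_level_upper_bound(n: int) -> float:
    total = 0.0
    for i in range(n):                      # levels 0 .. n-1
        p_leave = (1 / n) * (1 - 1 / n) ** i
        total += 1 / p_leave                # expected waiting time on level i
    return total

for n in (10, 100, 1000):
    print(n, fitness_level_upper_bound(n))  # grows like Theta(n^2)
```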
A standard approach to accelerating shortest path algorithms on networks is the bidirectional search, which explores the graph from the start and the destination simultaneously. In practice this strategy performs particularly well on scale-free real-world networks. Such networks typically have a heterogeneous degree distribution (e.g., a power-law distribution) and high clustering (i.e., vertices with a common neighbor are likely to be connected themselves). These two properties can be obtained by assuming an underlying hyperbolic geometry. To explain the observed behavior of the bidirectional search, we analyze its running time on hyperbolic random graphs and prove that it is Õ(n^(2−1/α) + n^(1/(2α)) + δ_max) with high probability, where α ∈ (1/2, 1) controls the power-law exponent of the degree distribution, and δ_max is the maximum degree. This bound is sublinear, improving the obvious worst-case linear bound. Although our analysis depends on the underlying geometry, the algorithm itself is oblivious to it.
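As an illustration of the technique itself (independent of the hyperbolic-geometry analysis), here is a minimal bidirectional BFS for unweighted graphs; expanding the smaller frontier first is a common heuristic and an assumption of this sketch.

```python
# Minimal bidirectional BFS sketch: grow BFS layers from both endpoints,
# always expanding the currently smaller frontier, and stop once no
# undiscovered path can beat the best meeting found so far.
from collections import deque

def bidirectional_bfs(adj, s, t):
    """adj: dict vertex -> list of neighbors (undirected graph).
    Returns the hop distance between s and t, or None if disconnected."""
    if s == t:
        return 0
    dist = ({s: 0}, {t: 0})            # distances from s and from t
    queues = (deque([s]), deque([t]))
    depth = [0, 0]                     # fully settled BFS depth per side
    best = float("inf")                # shortest s-t path witnessed so far
    while queues[0] and queues[1]:
        # every path not yet witnessed is longer than depth[0]+depth[1]+1
        if best <= depth[0] + depth[1] + 2:
            return best
        side = 0 if len(queues[0]) <= len(queues[1]) else 1
        q, here, other = queues[side], dist[side], dist[1 - side]
        for _ in range(len(q)):        # expand one full BFS layer
            u = q.popleft()
            for v in adj[u]:
                if v in other:         # frontiers meet: record path length
                    best = min(best, here[u] + 1 + other[v])
                if v not in here:
                    here[v] = here[u] + 1
                    q.append(v)
        depth[side] += 1
    return best if best < float("inf") else None

graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(bidirectional_bfs(graph, 0, 3))  # -> 3
```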
This study quantifies the distributional effects of the minimum wage introduced in Germany in 2015. Using detailed Socio-Economic Panel survey data, we assess changes in the hourly wages, working hours, and monthly wages of employees who were entitled to be paid the minimum wage. We employ a difference-in-differences analysis, exploiting regional variation in the “bite” of the minimum wage. At the bottom of the hourly wage distribution, we document wage growth of 9% in the short term and 21% in the medium term. At the same time, we find a reduction in working hours, such that the increase in hourly wages does not lead to a proportionate increase in monthly wages. We conclude that working hours adjustments play an important role in the distributional effects of minimum wages.
We present fully polynomial time approximation schemes for a broad class of Holant problems with complex edge weights, which we call Holant polynomials. We transform these problems into partition functions of abstract combinatorial structures known as polymers in statistical physics. Our method involves establishing zero-free regions for the partition functions of polymer models and using the most significant terms of the cluster expansion to approximate them. Results of our technique include new approximation and sampling algorithms for a diverse class of Holant polynomials in the low-temperature regime (i.e. small external field) and approximation algorithms for general Holant problems with small signature weights. Additionally, we give randomised approximation and sampling algorithms with faster running times for more restrictive classes. Finally, we improve the known zero-free regions for a perfect matching polynomial.
We present a method employing Answer Set Programming in combination with Approximate Model Counting for fast and accurate calculation of error propagation probabilities in digital circuits. By an efficient problem encoding, we achieve an input data format similar to a Verilog netlist, so that extensive preprocessing is avoided. By a tight interconnection of our application with the underlying solver, we avoid iterating over fault sites and reduce calls to the solver. Several circuits were analyzed with varying numbers of considered cycles and different degrees of approximation. Our experiments show that approximation can reduce the runtime by a factor of 91 while the error compared to the exact result stays below 1%.
The engineering of digital twins and their user interaction parts with explicated processes, namely process-aware digital twin cockpits (PADTCs), is challenging due to the complexity of the systems and the need for information from different disciplines within the engineering process. Therefore, it is interesting to investigate how to facilitate their engineering by using already existing data, namely event logs, and reducing the number of manual steps for their engineering. Current research lacks systematic, automated approaches to derive process-aware digital twin cockpits, even though some helpful techniques already exist in the areas of process mining and software engineering. Within this paper, we present a low-code development approach that reduces the amount of hand-written code needed and uses process mining techniques to derive PADTCs. We describe what models could be derived from event log data, which generative steps are needed for the engineering of PADTCs, and how process mining could be incorporated into the resulting application. This process is evaluated using the MIMIC III dataset for the creation of a PADTC prototype for an automated hospital transportation system. This approach can be used for early prototyping of PADTCs, as it needs no hand-written code in the first place, but it still allows for the iterative evolution of the application. This empowers domain experts to create their own PADTC prototypes.
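As a minimal illustration of deriving a process model from an event log (one building block such an approach could use), the following computes a directly-follows graph; the log schema and column order are hypothetical.

```python
# Sketch: deriving a directly-follows graph (a basic process-mining model)
# from an event log, as a first step toward process-aware views.
from collections import Counter

def directly_follows(event_log):
    """event_log: list of (case_id, timestamp, activity) tuples.
    Returns a Counter of (activity_a, activity_b) -> frequency with which
    b directly follows a within the same case."""
    by_case = {}
    for case_id, ts, activity in sorted(event_log, key=lambda e: (e[0], e[1])):
        by_case.setdefault(case_id, []).append(activity)
    dfg = Counter()
    for trace in by_case.values():
        dfg.update(zip(trace, trace[1:]))   # consecutive activity pairs
    return dfg

log = [(1, 1, "register"), (1, 2, "transport"), (1, 3, "arrive"),
       (2, 1, "register"), (2, 2, "arrive")]
print(directly_follows(log))
```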
In order to achieve their business goals, organizations heavily rely on the operational excellence of their business processes. In traditional scenarios, business processes are usually well-structured, clearly specifying when and how certain tasks have to be executed. Flexible and knowledge-intensive processes are gathering momentum, where a knowledge worker drives the execution of a process case and determines the exact process path at runtime. In the case of an exception, the knowledge worker decides on an appropriate handling. While there is initial work on exception handling in well-structured business processes, exceptions in case management have not been sufficiently researched. This paper proposes an exception handling framework for stage-oriented case management languages, namely Guard Stage Milestone Model, Case Management Model and Notation, and Fragment-based Case Management. The effectiveness of the framework is evaluated with two real-world use cases showing that it covers all relevant exceptions and proposed handling strategies.
In discrete manufacturing, the knowledge about causal relationships makes it possible to avoid unforeseen production downtimes by identifying their root causes. Learning causal structures from real-world settings remains challenging due to high-dimensional data, a mix of discrete and continuous variables, and requirements for preprocessing log data under the causal perspective. In our work, we address these challenges by proposing a process for causal reasoning based on raw machine log data from production monitoring. Within this process, we define a set of transformation rules to extract independent and identically distributed observations. Further, we incorporate a variable selection step to handle high dimensionality and a discretization step to include continuous variables. We enrich a commonly used causal structure learning algorithm with domain-related orientation rules, which provides a basis for causal reasoning. We demonstrate the process on a real-world dataset from a globally operating precision mechanical engineering company. The dataset contains over 40 million log entries from production monitoring of a single machine. In this context, we determine the causal structures embedded in operational processes. Further, we examine causal effects to support machine operators in avoiding unforeseen production stops, i.e., by keeping machine operators from drawing false conclusions about factors impacting unforeseen production stops based on experience alone.
Knowledge graphs are structured repositories of knowledge that store facts about the general world or a particular domain in terms of entities and their relationships. Owing to the heterogeneity of use cases that are served by them, there arises a need for the automated construction of domain-specific knowledge graphs from texts. While there have been many research efforts towards open information extraction for automated knowledge graph construction, these techniques do not perform well in domain-specific settings. Furthermore, regardless of whether they are constructed automatically from specific texts or based on real-world facts that are constantly evolving, all knowledge graphs inherently suffer from incompleteness as well as errors in the information they hold.
This thesis investigates the challenges encountered during knowledge graph construction and proposes techniques for their curation (a.k.a. refinement), including the correction of semantic ambiguities and the completion of missing facts. Firstly, we leverage existing approaches for the automatic construction of a knowledge graph in the art domain with open information extraction techniques and analyse their limitations. In particular, we focus on the challenging task of named entity recognition for artwork titles and show empirical evidence of performance improvement with our proposed solution for the generation of annotated training data.
Towards the curation of existing knowledge graphs, we identify the issue of polysemous relations that represent different semantics based on the context. Having concrete semantics for relations is important for downstream applications (e.g. question answering) that are supported by knowledge graphs. Therefore, we define the novel task of finding fine-grained relation semantics in knowledge graphs and propose FineGReS, a data-driven technique that discovers potential sub-relations with fine-grained meaning from existing polysemous relations. We leverage knowledge representation learning methods that generate low-dimensional vectors (or embeddings) for knowledge graphs to capture their semantics and structure. The efficacy and utility of the proposed technique are demonstrated by comparing it with several baselines on the entity classification use case.
Further, we explore the semantic representations in knowledge graph embedding models. In the past decade, these models have shown state-of-the-art results for the task of link prediction in the context of knowledge graph completion. In view of the popularity and widespread application of the embedding techniques not only for link prediction but also for different semantic tasks, this thesis presents a critical analysis of the embeddings by quantitatively measuring their semantic capabilities. We investigate and discuss the reasons for the shortcomings of embeddings in terms of the characteristics of the underlying knowledge graph datasets and the training techniques used by popular models.
Following up on this, we propose ReasonKGE, a novel method for generating semantically enriched knowledge graph embeddings by taking into account the semantics of the facts that are encapsulated by an ontology accompanying the knowledge graph. With a targeted, reasoning-based method for generating negative samples during the training of the models, ReasonKGE is able to not only enhance the link prediction performance, but also reduce the number of semantically inconsistent predictions made by the resultant embeddings, thus improving the quality of knowledge graphs.
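For context, here is a toy TransE-style scoring and negative-sampling sketch of the kind of embedding model discussed; ReasonKGE's ontology-guided negative sampling is not reproduced, and all names and dimensions below are illustrative assumptions.

```python
# Illustrative TransE-style scoring: a triple (head, relation, tail) is
# plausible when head + relation is close to tail in the embedding space.
# The random corruption below is the standard negative-sampling baseline;
# ReasonKGE instead targets corruptions an ontology proves inconsistent.
import numpy as np

rng = np.random.default_rng(0)
n_entities, n_relations, dim = 100, 10, 16
E = rng.normal(size=(n_entities, dim))   # entity embeddings
R = rng.normal(size=(n_relations, dim))  # relation embeddings

def score(h: int, r: int, t: int) -> float:
    """Negative L2 distance; higher means more plausible."""
    return -float(np.linalg.norm(E[h] + R[r] - E[t]))

def corrupt(h: int, r: int, t: int):
    """Standard negative sampling: replace head or tail at random."""
    if rng.random() < 0.5:
        return int(rng.integers(n_entities)), r, t
    return h, r, int(rng.integers(n_entities))

pos = (3, 2, 7)
neg = corrupt(*pos)
margin_loss = max(0.0, 1.0 + score(*neg) - score(*pos))  # hinge, margin 1
print(margin_loss)
```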
Drinking is different!
(2022)
Locus of control (LOC) measures how much an individual believes in the causal relationship between her own actions and her life’s outcomes. While earlier literature has shown that an increasing internal LOC is associated with increased health-conscious behavior in domains such as smoking, exercise or diets, we find that drinking seems to be different. Using very informative German panel data, we extend and generalize previous findings and find a significant positive association between having an internal LOC and the probability of occasional and regular drinking for men and women. An increase in an individual’s LOC by one standard deviation increases the probability of occasional or regular drinking on average by 3.4% for men and 6.9% for women. Using a decomposition method, we show that roughly a quarter of this association can be explained by differences in the social activities between internal and external individuals.
Evaluating creativity of verbal responses or texts is a challenging task due to psychometric issues associated with subjective ratings and the peculiarities of textual data. We explore an approach to objectively assess the creativity of responses in a sentence generation task to 1) better understand what language-related aspects are valued by human raters and 2) further advance the developments toward automating creativity evaluations. Over the course of two prior studies, participants generated 989 four-word sentences based on a four-letter prompt with the instruction to be creative. We developed an algorithm that scores each sentence on eight different metrics including 1) general word infrequency, 2) word combination infrequency, 3) context-specific word uniqueness, 4) syntax uniqueness, 5) rhyme, 6) phonetic similarity, and similarity of 7) sequence spelling and 8) semantic meaning to the cue. The text metrics were then used to explain the averaged creativity ratings of eight human raters. We found six metrics to be significantly correlated with the human ratings, explaining a total of 16% of their variance. We conclude that the creative impression of sentences is partly driven by different aspects of novelty in word choice and syntax, as well as rhythm and sound, which are amenable to objective assessment.
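As a sketch of what such an objective text metric can look like, the following computes a general word-infrequency score; the frequency table is a toy stand-in for the study's reference corpus, and the scoring details are assumptions of this example.

```python
# Sketch of one described metric, general word infrequency: score a
# sentence by the mean negative log frequency of its words in a
# reference corpus (rarer words -> higher score).
import math

word_freq = {"the": 0.05, "cat": 0.001, "sat": 0.0008, "obelisk": 0.000001}

def word_infrequency(sentence: str, floor: float = 1e-7) -> float:
    words = sentence.lower().split()
    return sum(-math.log(word_freq.get(w, floor)) for w in words) / len(words)

print(word_infrequency("the cat sat obelisk"))
```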
Answer Set Programming (ASP) is a paradigm for modeling and solving problems for knowledge representation and reasoning. There are plenty of results dedicated to studying the hardness of (fragments of) ASP. So far, these studies resulted in characterizations in terms of computational complexity as well as in fine-grained insights presented in the form of dichotomy-style results, lower bounds when translating to other formalisms like propositional satisfiability (SAT), and even detailed parameterized complexity landscapes. A generic parameter in parameterized complexity originating from graph theory is the so-called treewidth, which in a sense captures structural density of a program. Recently, there was an increase in the number of treewidth-based solvers related to SAT. While there are translations from (normal) ASP to SAT, no reduction that preserves treewidth or at least keeps track of the treewidth increase is known. In this paper we propose a novel reduction from normal ASP to SAT that is aware of the treewidth, and guarantees that a slight increase of treewidth is indeed sufficient. Further, we show a new result establishing that, when considering treewidth, already the fragment of normal ASP is slightly harder than SAT (under reasonable assumptions in computational complexity). This also confirms that our reduction probably cannot be significantly improved and that the slight increase of treewidth is unavoidable. Finally, we present an empirical study of our novel reduction from normal ASP to SAT, where we compare treewidth upper bounds that are obtained via known decomposition heuristics. Overall, our reduction works better with these heuristics than existing translations.
Data stream processing systems (DSPSs) are a key enabler to integrate continuously generated data, such as sensor measurements, into enterprise applications. DSPSs allow the steady analysis of information from data streams, e.g., to monitor manufacturing processes and enable fast reactions to anomalous behavior. Moreover, DSPSs continuously filter, sample, and aggregate incoming streams of data, which reduces the data size, and thus data storage costs.
The growing volumes of generated data have increased the demand for high-performance DSPSs, leading to a higher interest in these systems and to the development of new DSPSs. While having more DSPSs is favorable for users as it allows choosing the system that satisfies their requirements the most, it also introduces the challenge of identifying the most suitable DSPS regarding current needs as well as future demands. Having a solution to this challenge is important because replacements of DSPSs require the costly re-writing of applications if no abstraction layer is used for application development. However, quantifying performance differences between DSPSs is a difficult task. Existing benchmarks fail to integrate all core functionalities of DSPSs and lack tool support, which hinders objective result comparisons. Moreover, no current benchmark covers the combination of streaming data with existing structured business data, which is particularly relevant for companies.
This thesis proposes a performance benchmark for enterprise stream processing called ESPBench. With enterprise stream processing, we refer to the combination of streaming and structured business data. Our benchmark design represents real-world scenarios and allows for an objective result comparison as well as scaling of data. The defined benchmark query set covers all core functionalities of DSPSs. The benchmark toolkit automates the entire benchmark process and provides important features, such as query result validation and a configurable data ingestion rate.
To validate ESPBench and to ease the use of the benchmark, we propose an example implementation of the ESPBench queries leveraging the Apache Beam software development kit (SDK). The Apache Beam SDK is an abstraction layer designed for developing stream processing applications that is applied in academic as well as enterprise contexts. It makes it possible to run the defined applications on any of the supported DSPSs. The performance impact of Apache Beam is studied in this dissertation as well. The results show that there is a significant influence that differs among DSPSs and stream processing applications. For validating ESPBench, we use the example implementation of the ESPBench queries developed using the Apache Beam SDK. We benchmark the implemented queries executed on three modern DSPSs: Apache Flink, Apache Spark Streaming, and Hazelcast Jet. The results of the study prove the functioning of ESPBench and its toolkit. ESPBench is capable of quantifying performance characteristics of DSPSs and of unveiling differences among systems.
The benchmark proposed in this thesis covers all requirements to be applied in enterprise stream processing settings, and thus represents an improvement over the current state-of-the-art.
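To illustrate the abstraction-layer idea, here is a minimal Apache Beam (Python SDK) pipeline; the toy data and aggregation are assumptions of this sketch, not an ESPBench query or its actual implementation.

```python
# Minimal Apache Beam sketch illustrating the abstraction-layer idea: the
# same pipeline definition can run on different supported engines (chosen
# via pipeline options); without options it uses the local DirectRunner.
import apache_beam as beam

with beam.Pipeline() as pipeline:
    (pipeline
     | "ReadSensorValues" >> beam.Create([("machine1", 3.0), ("machine2", 5.5),
                                          ("machine1", 4.0)])
     | "SumPerMachine"    >> beam.CombinePerKey(sum)
     | "Print"            >> beam.Map(print))
```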
Text collections, such as corpora of books, research articles, news, or business documents are an important resource for knowledge discovery. Exploring large document collections by hand is a cumbersome but necessary task to gain new insights and find relevant information. Our digitised society allows us to utilise algorithms to support the information seeking process, for example with the help of retrieval or recommender systems. However, these systems only provide selective views of the data and require some prior knowledge to issue meaningful queries and assess a system’s response. The advancements of machine learning allow us to reduce this gap and better assist the information seeking process. For example, instead of sighting countless business documents by hand, journalists and investigators can employ natural language processing techniques, such as named entity recognition. Although this greatly improves the capabilities of a data exploration platform, the wealth of information is still overwhelming. An overview of the entirety of a dataset in the form of a two-dimensional map-like visualisation may help to circumvent this issue. Such overviews enable novel interaction paradigms for users, which are similar to the exploration of digital geographical maps. In particular, they can provide valuable context by indicating how a piece of information fits into the bigger picture.
This thesis proposes algorithms that appropriately pre-process heterogeneous documents and compute the layout for datasets of all kinds. Traditionally, given high-dimensional semantic representations of the data, so-called dimensionality reduction algorithms are used to compute a layout of the data on a two-dimensional canvas. In this thesis, we focus on text corpora and go beyond only projecting the inherent semantic structure itself. Therefore, we propose three dimensionality reduction approaches that incorporate additional information into the layout process: (1) a multi-objective dimensionality reduction algorithm to jointly visualise semantic information with inherent network information derived from the underlying data; (2) a comparison of initialisation strategies for different dimensionality reduction algorithms to generate a series of layouts for corpora that grow and evolve over time; and (3) an algorithm that updates existing layouts by incorporating user feedback provided by pointwise drag-and-drop edits. This thesis also contains system prototypes to demonstrate the proposed technologies, including pre-processing and layout of the data and presentation in interactive user interfaces.
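As a baseline illustration of the layout step (not the thesis' extended algorithms), documents can be embedded and projected onto a 2D canvas, for example with TF-IDF and t-SNE from scikit-learn; the toy documents and parameter choices are assumptions of this sketch.

```python
# Sketch of the baseline document-map pipeline: embed documents with TF-IDF
# and project them to 2D with t-SNE. The thesis' contributions (network
# information, stable initialisation over time, feedback-driven updates)
# are not reproduced here.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.manifold import TSNE

docs = ["stock markets fell sharply", "the striker scored twice",
        "bond yields and markets", "the match ended in a draw"]
X = TfidfVectorizer().fit_transform(docs)
coords = TSNE(n_components=2, perplexity=2.0,
              init="random", random_state=0).fit_transform(X.toarray())
for doc, (x, y) in zip(docs, coords):
    print(f"({x:7.2f}, {y:7.2f})  {doc}")
```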
The availability of commercial 3D printers and matching 3D design software has allowed a wide range of users to create physical prototypes – as long as these objects are not larger than hand size. However, when attempting to create larger, "human-scale" objects, such as furniture, not only are these machines too small, but also the commonly used 3D design software is not equipped to design with forces in mind — since forces increase disproportionately with scale.
In this thesis, we present a series of end-to-end fabrication software systems that support users in creating human-scale objects. They achieve this by providing three main functions that regular "small-scale" 3D printing software does not offer: (1) subdivision of the object into small printable components combined with ready-made objects, (2) editing based on predefined elements sturdy enough for larger scale, i.e., trusses, and (3) functionality for analyzing, detecting, and fixing structural weaknesses. The presented software systems also assist the fabrication process based on either 3D printing or steel welding technology.
The presented systems focus on three levels of engineering challenges: (1) fabricating static load-bearing objects, (2) creating mechanisms that involve motion, such as kinematic installations, and finally (3) designing mechanisms with dynamic repetitive movement where power and energy play an important role.
We demonstrate and verify the versatility of our systems by building and testing human-scale prototypes, ranging from furniture pieces and pavilions to animatronic installations and playground equipment. We have also shared our systems with schools, fablabs, and fabrication enthusiasts, who have successfully created human-scale objects that can withstand human-scale forces.
This study aims to compare online vs. offline flirting and dating behavior using the example of the location-based real-time dating (LBRTD) app Tinder, a popular dating platform. We focus on users' self-reported characteristics such as self-esteem, social desirability, and state social anxiety, their adjustment behavior on Tinder, and the perceived data privacy of the app. Data was gathered using a survey approach, with Tinder users reporting their behavior in offline and online settings. The comparison between offline and online behavior was made using Response Surface Analysis. The results suggest that the different conditions of the natural and digital worlds do not influence the individual's behavior and emotional perception. The results are analyzed and discussed with respect to gender, age, motivation to use the app, and the user's relationship status.
This doctoral project aims to improve the reliability of data storage in, and to increase the storage density of, newly developed memories (emerging memories) with multi-level memory cells. To this end, codes for the detection of unidirectional errors are analyzed, modified, and newly developed so that they can be applied within the new memories. The focus lies on so-called Berger codes and m-out-of-n codes. Since multi-level memory cells no longer operate in binary but with multiple levels, previously used codes can no longer be applied, or must be adapted accordingly. Based on Berger codes and m-out-of-n codes, this work derives new codes that are able to protect data in multi-valued systems as well.
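For illustration, here is a sketch of the classic binary Berger code on which the thesis builds; the multi-level generalizations developed in the thesis are not shown.

```python
# Sketch of a classic (binary) Berger code: the check symbol is the count
# of zeros in the information bits. Any unidirectional error (only 0->1
# flips, or only 1->0 flips) moves data and check in opposite directions
# and is therefore detected.
def berger_encode(bits: str) -> str:
    k = len(bits)
    check_len = max(1, k.bit_length())           # bits needed to count zeros
    zeros = bits.count("0")
    return bits + format(zeros, f"0{check_len}b")

def berger_check(word: str, k: int) -> bool:
    bits, check = word[:k], word[k:]
    return bits.count("0") == int(check, 2)

cw = berger_encode("10110")                      # 2 zeros -> check "010"
print(cw, berger_check(cw, 5))                   # valid codeword
corrupted = "11110" + cw[5:]                     # a 0->1 flip in the data
print(corrupted, berger_check(corrupted, 5))     # detected -> False
```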
This paper aims to contribute to exploring the design possibilities of robots for use in human-robot interaction. In an experiment, we investigate the influence of the human's personality and the robot's design, especially its humanization, on its acceptance. We use the Almere model, the Big 5 personality traits, and anthropomorphic gestalt variants to build the foundation for our investigation. The assumption that an anthropomorphized robot variant would, in principle, be preferred to the standard variant when a natural choice is enforced could not be confirmed in our experiment. This allows for the interpretation that anthropomorphism does not necessarily lead to intentional perception and, consequently, does not guarantee that it can automatically generate acceptance.
Data profiling is the extraction of metadata from relational databases. An important class of metadata are multi-column dependencies. They come associated with two computational tasks. The detection problem is to decide whether a dependency of a given type and size holds in a database. The discovery problem instead asks to enumerate all valid dependencies of that type. We investigate the two problems for three types of dependencies: unique column combinations (UCCs), functional dependencies (FDs), and inclusion dependencies (INDs).
We first treat the parameterized complexity of the detection variants. We prove that the detection of UCCs and FDs, respectively, is W[2]-complete when parameterized by the size of the dependency. The detection of INDs is shown to be one of the first natural W[3]-complete problems. We further settle the enumeration complexity of the three discovery problems by presenting parsimonious equivalences with well-known enumeration problems. Namely, the discovery of UCCs is equivalent to the famous transversal hypergraph problem of enumerating the hitting sets of a hypergraph. The discovery of FDs is equivalent to the simultaneous enumeration of the hitting sets of multiple input hypergraphs. Finally, the discovery of INDs is shown to be equivalent to enumerating the satisfying assignments of antimonotone, 3-normalized Boolean formulas.
In the remainder of the thesis, we design and analyze discovery algorithms for unique column combinations. Since this is as hard as the general transversal hypergraph problem, it is an open question whether the UCCs of a database can be computed in output-polynomial time in the worst case. For the analysis, we therefore focus on instances that are structurally close to databases in practice, most notably, inputs that have small solutions. The equivalence between UCCs and hitting sets transfers the computational hardness, but also allows us to apply ideas from hypergraph theory to data profiling. We devise a discovery algorithm that runs in polynomial space on arbitrary inputs and achieves polynomial delay whenever the maximum size of any minimal UCC is bounded. Central to our approach is the extension problem for minimal hitting sets, that is, to decide for
a set of vertices whether they are contained in any minimal solution. We prove that this is yet another problem that is complete for the complexity class W[3], when parameterized by the size of the set that is to be extended. We also give several conditional lower bounds under popular hardness conjectures such as the Strong Exponential Time Hypothesis (SETH). The lower bounds suggest that the running time of our algorithm for the extension problem is close to optimal.
We further conduct an empirical analysis of our discovery algorithm on real-world databases to confirm that the hitting set perspective on data profiling has merits also in practice. We show that the resulting enumeration times undercut their theoretical worst-case bounds on practical data, and that the memory consumption of our method is much smaller than that of previous solutions. During the analysis we make two observations about the connection between databases and their corresponding hypergraphs. On the one hand, the hypergraph representations containing all relevant information are usually significantly smaller than the original inputs. On the other hand, obtaining those hypergraphs is the actual bottleneck of any practical application. The latter often takes much longer than enumerating the solutions, which is in stark contrast to the fact that the preprocessing is guaranteed to be polynomial while the enumeration may take exponential time.
To make the first observation rigorous, we introduce a maximum-entropy model for non-uniform random hypergraphs and prove that their expected number of minimal hyperedges undergoes a phase transition with respect to the total number of edges. The result also explains why larger databases may have smaller hypergraphs. Motivated by the second observation, we present a new kind of UCC discovery algorithm called Hitting Set Enumeration with Partial Information and Validation (HPIValid). It utilizes the fast enumeration times in practice in order to speed up the computation of the corresponding hypergraph. This way, we sidestep the bottleneck while maintaining the advantages of the hitting set perspective. An exhaustive empirical evaluation shows that HPIValid outperforms the current state of the art in UCC discovery. It is capable of processing databases that were previously out of reach for data profiling.
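As a minimal illustration of the hitting-set view of UCC discovery used throughout the thesis (toy data; not the HPIValid algorithm), a column combination is a UCC exactly when it hits the difference set of every record pair.

```python
# Sketch of the UCC / hitting-set connection: a column combination is a
# unique column combination iff it intersects ("hits") the difference set
# of every record pair, i.e., the attributes on which the pair disagrees.
from itertools import combinations

table = [("anna", "berlin", 1990),
         ("anna", "potsdam", 1990),
         ("ben",  "berlin", 1985)]

def difference_sets(rows):
    return [{c for c in range(len(r1)) if r1[c] != r2[c]}
            for r1, r2 in combinations(rows, 2)]

def is_ucc(cols, rows):
    """cols is a UCC iff every record pair differs in at least one of cols."""
    return all(diff & set(cols) for diff in difference_sets(rows))

print(is_ucc({0}, table))      # False: the two 'anna' rows collide
print(is_ucc({0, 1}, table))   # True: name + city identifies every row
print(difference_sets(table))  # the hypergraph whose hitting sets are UCCs
```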
Lecturers in teacher education constantly face the situation that they present innovative methods of modern school teaching to students in a traditional, receptive manner. In Germany, there are roughly 40 universities that train computer science teachers. However, there are only few concepts that address the connection between the educational sciences and computer science with its didactics, and no concepts that pursue constructivist teaching in computer science.
This master's thesis therefore aims to address this gap and to develop a model for constructivist university teaching based on the module "Didaktik der Informatik I" at the University of Potsdam. An existing constructivist teaching model is to be transferred to computer science didactics, incorporating elements that connect the educational sciences, the subject sciences, and subject didactics. This can provide a basis for planning modules in computer science didactics, but it can also serve as inspiration for transferring existing innovative teaching concepts to other subject didactics.
To create such a constructivist teaching-learning model, the relationship between the educational sciences, the subject sciences, and subject didactics is first explained, and the necessity of interconnecting them is highlighted. This is followed by a presentation of relevant learning theories and innovative learning concepts that have already been developed. Subsequently, the requirements that the Standing Conference of the Ministers of Education and Cultural Affairs (Kultusministerkonferenz) places on teacher training are discussed, as well as how this training is currently carried out for computer science at the University of Potsdam. From all these findings, requirements for a constructivist teaching model are derived. Taking into account the stipulations of the study regulations for the computer science teaching degree, a model for constructivist computer science didactics is then presented.
Further research could investigate to what extent motivation and performance change compared to the original module, and whether the new module concept can further strengthen competencies in lesson planning and lesson design.
With the fast rise of cloud computing adoption in the past few years, more companies are migrating their confidential files from their private data centers to the cloud to support their digital transformation process. Enterprise file synchronization and share (EFSS) is one of the solutions offered for enterprises to store their files in the cloud with secure and easy file sharing and collaboration between their employees. However, the rapidly increasing number of cyberattacks on the cloud puts a company's files at risk of being stolen or leaked to the public. It is then the responsibility of the EFSS system to ensure that the company's confidential files are accessible only to authorized employees.
CloudRAID is a secure personal cloud storage research collaboration project that provides data availability and confidentiality in the cloud. It combines erasure and cryptographic techniques to securely store files as multiple encrypted file chunks in various cloud service providers (CSPs). However, several aspects of CloudRAID's concept are unsuitable for secure and scalable enterprise cloud storage solutions, particularly its key management system, location-based access control, multi-cloud storage management, and cloud file access monitoring.
This Ph.D. thesis focuses on CloudRAID for Business (CfB) as it resolves four main challenges of CloudRAID's concept for a secure and scalable EFSS system. First, the key management system is implemented using the attribute-based encryption scheme to provide secure and scalable intra-company and inter-company file-sharing functionalities. Second, an Internet-based location file access control functionality is introduced to ensure files can only be accessed at pre-determined trusted locations. Third, a unified multi-cloud storage resource management framework is utilized to securely manage cloud storage resources available in various CSPs for authorized CfB stakeholders. Lastly, a multi-cloud storage monitoring system is introduced to monitor the activities of files in the cloud using the cloud storage log files generated by multiple CSPs.
In summary, this thesis helps the CfB system provide holistic security for a company's confidential files at the cloud, system, and file levels, ensuring that only the authorized company and its employees can access them.
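As a toy illustration of the encrypt-then-split storage idea (not the CfB implementation, which uses attribute-based encryption and proper erasure codes), the following sketch uses the Python cryptography package and a single XOR parity chunk.

```python
# Sketch: encrypt a file, cut the ciphertext into equally sized chunks for
# different cloud providers, and add an XOR parity chunk so that any single
# lost chunk can be reconstructed from the remaining ones.
from functools import reduce
from cryptography.fernet import Fernet

def split_with_parity(ciphertext: bytes, n: int):
    size = -(-len(ciphertext) // n)                 # ceil division
    chunks = [ciphertext[i * size:(i + 1) * size].ljust(size, b"\0")
              for i in range(n)]
    parity = reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), chunks)
    return chunks, parity

key = Fernet.generate_key()
ciphertext = Fernet(key).encrypt(b"confidential quarterly report")
chunks, parity = split_with_parity(ciphertext, n=3)

# simulate losing chunk 1 and restoring it from the other chunks + parity
restored = reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)),
                  [chunks[0], chunks[2], parity])
assert restored == chunks[1]
print(Fernet(key).decrypt(b"".join(chunks).rstrip(b"\0")))
```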
Boolean Satisfiability (SAT) is one of the problems at the core of theoretical computer science. It was the first problem proven to be NP-complete by Cook and, independently, by Levin. Nowadays it is conjectured that SAT cannot be solved in sub-exponential time. Thus, it is generally assumed that SAT and its restricted version k-SAT are hard to solve. However, state-of-the-art SAT solvers can solve even huge practical instances of these problems in a reasonable amount of time.
Why is SAT hard in theory, but easy in practice? One approach to answering this question is investigating the average runtime of SAT. In order to analyze this average runtime, the random k-SAT model was introduced. The model generates all k-SAT instances with n variables and m clauses with uniform probability. Researching random k-SAT led to a multitude of insights and tools for analyzing random structures in general. One major observation was the emergence of the so-called satisfiability threshold: a phase transition point in the number of clauses at which the generated formulas go from asymptotically almost surely satisfiable to asymptotically almost surely unsatisfiable. Additionally, instances around the threshold seem to be particularly hard to solve.
In this thesis we analyze a more general model of random k-SAT that we call non-uniform random k-SAT. In contrast to the classical model each of the n Boolean variables now has a distinct probability of being drawn. For each of the m clauses we draw k variables according to the variable distribution and choose their signs uniformly at random. Non-uniform random k-SAT gives us more control over the distribution of Boolean variables in the resulting formulas. This allows us to tailor distributions to the ones observed in practice. Notably, non-uniform random k-SAT contains the previously proposed models random k-SAT, power-law random k-SAT and geometric random k-SAT as special cases.
We analyze the satisfiability threshold in non-uniform random k-SAT depending on the variable probability distribution. Our goal is to derive conditions on this distribution under which an equivalent of the satisfiability threshold conjecture holds. We start with the arguably simpler case of non-uniform random 2-SAT. For this model we show under which conditions a threshold exists, whether it is sharp or coarse, and what the leading constant of the threshold function is. These are exactly the three ingredients one needs in order to prove or disprove the satisfiability threshold conjecture. For non-uniform random k-SAT with k ≥ 3 we only prove sufficient conditions under which a threshold exists. We also show some properties of the variable probabilities under which the threshold is sharp in this case. These are the first results on the threshold behavior of non-uniform random k-SAT.
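For illustration, here is a small sampler for the non-uniform random k-SAT model as described above; drawing the k variables without replacement and the power-law-like weights are assumptions of this sketch.

```python
# Sketch of the non-uniform random k-SAT model: each clause draws k distinct
# variables according to a given probability distribution and assigns signs
# uniformly at random.
import numpy as np

def sample_formula(probs, m: int, k: int, seed: int = 0):
    """probs: probability of each of the n variables being drawn.
    Returns m clauses as lists of signed 1-based literals (DIMACS-style)."""
    rng = np.random.default_rng(seed)
    n = len(probs)
    clauses = []
    for _ in range(m):
        vars_ = rng.choice(n, size=k, replace=False, p=probs)
        signs = rng.choice([-1, 1], size=k)
        clauses.append([int(s) * (int(v) + 1) for v, s in zip(vars_, signs)])
    return clauses

# power-law-like variable distribution (a special case mentioned above)
n = 10
weights = np.array([1 / (i + 1) for i in range(n)])
print(sample_formula(weights / weights.sum(), m=5, k=3))
```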
Individuals have an intrinsic need to express themselves to other humans within a given community by sharing their experiences, thoughts, actions, and opinions. As a means, they mostly prefer to use modern online social media platforms such as Twitter, Facebook, personal blogs, and Reddit. Users of these social networks interact by drafting their own status updates, publishing photos, and giving likes, leaving a considerable amount of data behind them to be analyzed. Researchers recently started exploring the shared social media data to understand online users better and predict their Big Five personality traits: agreeableness, conscientiousness, extraversion, neuroticism, and openness to experience. This thesis intends to investigate the possible relationship between users’ Big Five personality traits and the published information on their social media profiles. Facebook public data such as linguistic status updates, meta-data of likes objects, profile pictures, emotions, or reactions records were adopted to address the proposed research questions. Several machine learning prediction models were constructed with various experiments to utilize the engineered features correlated with the Big Five personality traits. The final predictive performances improved the prediction accuracy compared to state-of-the-art approaches, and the models were evaluated based on established benchmarks in the domain. The research experiments were implemented with ethical and privacy considerations in mind. Furthermore, the research aims to raise awareness about privacy among social media users and show what third parties can reveal about users’ private traits from what they share and how they act on different social networking platforms.
In the second part of the thesis, the variation in personality development is studied within a cross-platform environment comprising Facebook and Twitter. The personality profiles constructed on these social platforms are compared to evaluate the effect of the platform used on a user’s personality development. Likewise, personality continuity and stability analyses are performed using samples from the two social media platforms. The implemented experiments are based on ten-year longitudinal samples, aiming to understand users’ long-term personality development and to further unlock the potential of cooperation between psychologists and data scientists.