004 Datenverarbeitung; Informatik
Refine
Has Fulltext
- yes (410) (remove)
Year of publication
Document Type
- Monograph/Edited Volume (121)
- Doctoral Thesis (115)
- Article (86)
- Postprint (50)
- Conference Proceeding (30)
- Master's Thesis (3)
- Preprint (3)
- Bachelor Thesis (1)
- Habilitation Thesis (1)
Language
- English (410) (remove)
Keywords
- machine learning (14)
- answer set programming (10)
- Cloud Computing (9)
- Hasso-Plattner-Institut (9)
- cloud computing (9)
- Forschungskolleg (8)
- Forschungsprojekte (8)
- Future SOC Lab (8)
- Hasso Plattner Institute (8)
- In-Memory Technologie (8)
Institute
- Hasso-Plattner-Institut für Digital Engineering gGmbH (136)
- Institut für Informatik und Computational Science (108)
- Hasso-Plattner-Institut für Digital Engineering GmbH (97)
- Extern (46)
- Mathematisch-Naturwissenschaftliche Fakultät (24)
- Wirtschaftswissenschaften (8)
- Digital Engineering Fakultät (7)
- Institut für Physik und Astronomie (4)
- Fachgruppe Betriebswirtschaftslehre (2)
- Institut für Umweltwissenschaften und Geographie (2)
“Broadcast your gender.”
(2022)
Social media platforms provide a large array of behavioral data relevant to social scientific research. However, key information such as sociodemographic characteristics of agents are often missing. This paper aims to compare four methods of classifying social attributes from text. Specifically, we are interested in estimating the gender of German social media creators. By using the example of a random sample of 200 YouTube channels, we compare several classification methods, namely (1) a survey among university staff, (2) a name dictionary method with the World Gender Name Dictionary as a reference list, (3) an algorithmic approach using the website gender-api.com, and (4) a Multinomial Naïve Bayes (MNB) machine learning technique. These different methods identify gender attributes based on YouTube channel names and descriptions in German but are adaptable to other languages. Our contribution will evaluate the share of identifiable channels, accuracy and meaningfulness of classification, as well as limits and benefits of each approach. We aim to address methodological challenges connected to classifying gender attributes for YouTube channels as well as related to reinforcing stereotypes and ethical implications.
We introduce a simple approach extending the input language of Answer Set Programming (ASP) systems by multi-valued propositions. Our approach is implemented as a (prototypical) preprocessor translating logic programs with multi-valued propositions into logic programs with Boolean propositions only. Our translation is modular and heavily benefits from the expressive input language of ASP. The resulting approach, along with its implementation, allows for solving interesting constraint satisfaction problems in ASP, showing a good performance.
The exponential expanding of the numbers of web sites and Internet users makes WWW the most important global information resource. From information publishing and electronic commerce to entertainment and social networking, the Web allows an inexpensive and efficient access to the services provided by individuals and institutions. The basic units for distributing these services are the web sites scattered throughout the world. However, the extreme fragility of web services and content, the high competence between similar services supplied by different sites, and the wide geographic distributions of the web users drive the urgent requirement from the web managers to track and understand the usage interest of their web customers. This thesis, "X-tracking the Usage Interest on Web Sites", aims to fulfill this requirement. "X" stands two meanings: one is that the usage interest differs from various web sites, and the other is that usage interest is depicted from multi aspects: internal and external, structural and conceptual, objective and subjective. "Tracking" shows that our concentration is on locating and measuring the differences and changes among usage patterns. This thesis presents the methodologies on discovering usage interest on three kinds of web sites: the public information portal site, e-learning site that provides kinds of streaming lectures and social site that supplies the public discussions on IT issues. On different sites, we concentrate on different issues related with mining usage interest. The educational information portal sites were the first implementation scenarios on discovering usage patterns and optimizing the organization of web services. In such cases, the usage patterns are modeled as frequent page sets, navigation paths, navigation structures or graphs. However, a necessary requirement is to rebuild the individual behaviors from usage history. We give a systematic study on how to rebuild individual behaviors. Besides, this thesis shows a new strategy on building content clusters based on pair browsing retrieved from usage logs. The difference between such clusters and the original web structure displays the distance between the destinations from usage side and the expectations from design side. Moreover, we study the problem on tracking the changes of usage patterns in their life cycles. The changes are described from internal side integrating conceptual and structure features, and from external side for the physical features; and described from local side measuring the difference between two time spans, and global side showing the change tendency along the life cycle. A platform, Web-Cares, is developed to discover the usage interest, to measure the difference between usage interest and site expectation and to track the changes of usage patterns. E-learning site provides the teaching materials such as slides, recorded lecture videos and exercise sheets. We focus on discovering the learning interest on streaming lectures, such as real medias, mp4 and flash clips. Compared to the information portal site, the usage on streaming lectures encapsulates the variables such as viewing time and actions during learning processes. The learning interest is discovered in the form of answering 6 questions, which covers finding the relations between pieces of lectures and the preference among different forms of lectures. We prefer on detecting the changes of learning interest on the same course from different semesters. The differences on the content and structure between two courses leverage the changes on the learning interest. We give an algorithm on measuring the difference on learning interest integrated with similarity comparison between courses. A search engine, TASK-Moniminer, is created to help the teacher query the learning interest on their streaming lectures on tele-TASK site. Social site acts as an online community attracting web users to discuss the common topics and share their interesting information. Compared to the public information portal site and e-learning web site, the rich interactions among users and web content bring the wider range of content quality, on the other hand, provide more possibilities to express and model usage interest. We propose a framework on finding and recommending high reputation articles in a social site. We observed that the reputation is classified into global and local categories; the quality of the articles having high reputation is related with the content features. Based on these observations, our framework is implemented firstly by finding the articles having global or local reputation, and secondly clustering articles based on their content relations, and then the articles are selected and recommended from each cluster based on their reputation ranks.
Diversity is a term that is broadly used and challenging for informatics research, development and education. Diversity concerns may relate to unequal participation, knowledge and methodology, curricula, institutional planning etc. For a lot of these areas, measures, guidelines and best practices on diversity awareness exist. A systemic, sustainable impact of diversity measures on informatics is still largely missing. In this paper I explore what working with diversity and gender concepts in informatics entails, what the main challenges are and provide thoughts for improvement. The paper includes definitions of diversity and intersectionality, reflections on the disciplinary basis of informatics and practical implications of integrating diversity in informatics research and development. In the final part, two concepts from the social sciences and the humanities, the notion of “third space”/hybridity and the notion of “feminist ethics of care”, serve as a lens to foster more sustainable ways of working with diversity in informatics.
Knowledge-intensive business processes are flexible and data-driven. Therefore, traditional process modeling languages do not meet their requirements: These languages focus on highly structured processes in which data plays a minor role. As a result, process-oriented information systems fail to assist knowledge workers on executing their processes. We propose a novel case management approach that combines flexible activity-centric processes with data models, and we provide a joint semantics using colored Petri nets. The approach is suited to model, verify, and enact knowledge-intensive processes and can aid the development of information systems that support knowledge work.
Knowledge-intensive processes are human-centered, multi-variant, and data-driven. Typical domains include healthcare, insurances, and law. The processes cannot be fully modeled, since the underlying knowledge is too vast and changes too quickly. Thus, models for knowledge-intensive processes are necessarily underspecified. In fact, a case emerges gradually as knowledge workers make informed decisions. Knowledge work imposes special requirements on modeling and managing respective processes. They include flexibility during design and execution, ad-hoc adaption to unforeseen situations, and the integration of behavior and data. However, the predominantly used process modeling languages (e.g., BPMN) are unsuited for this task.
Therefore, novel modeling languages have been proposed. Many of them focus on activities' data requirements and declarative constraints rather than imperative control flow. Fragment-Based Case Management, for example, combines activity-centric imperative process fragments with declarative data requirements. At runtime, fragments can be combined dynamically, and new ones can be added. Yet, no integrated semantics for flexible activity-centric process models and data models exists.
In this thesis, Wickr, a novel case modeling approach extending fragment-based Case Management, is presented. It supports batch processing of data, sharing data among cases, and a full-fledged data model with associations and multiplicity constraints. We develop a translational semantics for Wickr targeting (colored) Petri nets. The semantics assert that a case adheres to the constraints in both the process fragments and the data models. Among other things, multiplicity constraints must not be violated. Furthermore, the semantics are extended to multiple cases that operate on shared data. Wickr shows that the data structure may reflect process behavior and vice versa. Based on its semantics, prototypes for executing and verifying case models showcase the feasibility of Wickr. Its applicability to knowledge-intensive and to data-centric processes is evaluated using well-known requirements from related work.
Clustering in education is important in identifying groups of objects in order to find linked patterns of correlations in educational datasets. As such, MOOCs provide a rich source of educational datasets which enable a wide selection of options to carry out clustering and an opportunity for cohort analyses. In this experience paper, five research studies on clustering in MOOCs are reviewed, drawing out several reasonings, methods, and students’ clusters that reflect certain kinds of learning behaviours. The collection of the varied clusters shows that each study identifies and defines clusters according to distinctive engagement patterns. Implications and a summary are provided at the end of the paper.
First come, first served: Critical choices between alternative actions are often made based on events external to an organization, and reacting promptly to their occurrence can be a major advantage over the competition. In Business Process Management (BPM), such deferred choices can be expressed in process models, and they are an important aspect of process engines. Blockchain-based process execution approaches are no exception to this, but are severely limited by the inherent properties of the platform: The isolated environment prevents direct access to external entities and data, and the non-continual runtime based entirely on atomic transactions impedes the monitoring and detection of events. In this paper we provide an in-depth examination of the semantics of deferred choice, and transfer them to environments such as the blockchain. We introduce and compare several oracle architectures able to satisfy certain requirements, and show that they can be implemented using state-of-the-art blockchain technology.
A survey has been carried out in the Computer Science (CS) department at the University of Baghdad to investigate the attitudes of CS students in a female dominant environment, showing the differences between male and female students in different academic years. We also compare the attitudes of the freshman students of two different cultures (University of Baghdad, Iraq, and the University of Potsdam).
The term Linked Data refers to connected information sources comprising structured data about a wide range of topics and for a multitude of applications. In recent years, the conceptional and technical foundations of Linked Data have been formalized and refined. To this end, well-known technologies have been established, such as the Resource Description Framework (RDF) as a Linked Data model or the SPARQL Protocol and RDF Query Language (SPARQL) for retrieving this information. Whereas most research has been conducted in the area of generating and publishing Linked Data, this thesis presents novel approaches for improved management. In particular, we illustrate new methods for analyzing and processing SPARQL queries. Here, we present two algorithms suitable for identifying structural relationships between these queries. Both algorithms are applied to a large number of real-world requests to evaluate the performance of the approaches and the quality of their results. Based on this, we introduce different strategies enabling optimized access of Linked Data sources. We demonstrate how the presented approach facilitates effective utilization of SPARQL endpoints by prefetching results relevant for multiple subsequent requests. Furthermore, we contribute a set of metrics for determining technical characteristics of such knowledge bases. To this end, we devise practical heuristics and validate them through thorough analysis of real-world data sources. We discuss the findings and evaluate their impact on utilizing the endpoints. Moreover, we detail the adoption of a scalable infrastructure for improving Linked Data discovery and consumption. As we outline in an exemplary use case, this platform is eligible both for processing and provisioning the corresponding information.
In this paper we report on our experiments in teaching computer science concepts with a mix of tangible and abstract object manipulations. The goal we set ourselves was to let pupils discover the challenges one has to meet to automatically manipulate formatted text. We worked with a group of 25 secondary school pupils (9-10th grade), and they were actually able to “invent” the concept of mark-up language. From this experiment we distilled a set of activities which will be replicated in other classes (6th grade) under the guidance of maths teachers.