004 Datenverarbeitung; Informatik
Refine
Year of publication
- 2022 (48) (remove)
Document Type
- Article (20)
- Doctoral Thesis (12)
- Monograph/Edited Volume (10)
- Postprint (4)
- Conference Proceeding (2)
Language
- English (48) (remove)
Is part of the Bibliography
- yes (48)
Keywords
- machine learning (5)
- Business Process Management (2)
- Data profiling (2)
- YouTube (2)
- authorship attribution (2)
- gender (2)
- maschinelles Lernen (2)
- social media (2)
- text based classification methods (2)
- Activity-oriented Optimization (1)
Institute
- Hasso-Plattner-Institut für Digital Engineering GmbH (23)
- Wirtschaftswissenschaften (7)
- Digital Engineering Fakultät (4)
- Hasso-Plattner-Institut für Digital Engineering gGmbH (4)
- Institut für Informatik und Computational Science (4)
- Fachgruppe Betriebswirtschaftslehre (3)
- Sozialwissenschaften (2)
- Fachgruppe Soziologie (1)
- Institut für Biochemie und Biologie (1)
- Institut für Mathematik (1)
CovRadar
(2022)
The ongoing pandemic caused by SARS-CoV-2 emphasizes the importance of genomic surveillance to understand the evolution of the virus, to monitor the viral population, and plan epidemiological responses. Detailed analysis, easy visualization and intuitive filtering of the latest viral sequences are powerful for this purpose. We present CovRadar, a tool for genomic surveillance of the SARS-CoV-2 Spike protein. CovRadar consists of an analytical pipeline and a web application that enable the analysis and visualization of hundreds of thousand sequences. First, CovRadar extracts the regions of interest using local alignment, then builds a multiple sequence alignment, infers variants and consensus and finally presents the results in an interactive app, making accessing and reporting simple, flexible and fast.
COMMIT
(2022)
Composition and functions of microbial communities affect important traits in diverse hosts, from crops to humans. Yet, mechanistic understanding of how metabolism of individual microbes is affected by the community composition and metabolite leakage is lacking. Here, we first show that the consensus of automatically generated metabolic reconstructions improves the quality of the draft reconstructions, measured by comparison to reference models. We then devise an approach for gap filling, termed COMMIT, that considers metabolites for secretion based on their permeability and the composition of the community. By applying COMMIT with two soil communities from the Arabidopsis thaliana culture collection, we could significantly reduce the gap-filling solution in comparison to filling gaps in individual reconstructions without affecting the genomic support. Inspection of the metabolic interactions in the soil communities allows us to identify microbes with community roles of helpers and beneficiaries. Therefore, COMMIT offers a versatile fully automated solution for large-scale modelling of microbial communities for diverse biotechnological applications. <br /> Author summaryMicrobial communities are important in ecology, human health, and crop productivity. However, detailed information on the interactions within natural microbial communities is hampered by the community size, lack of detailed information on the biochemistry of single organisms, and the complexity of interactions between community members. Metabolic models are comprised of biochemical reaction networks based on the genome annotation, and can provide mechanistic insights into community functions. Previous analyses of microbial community models have been performed with high-quality reference models or models generated using a single reconstruction pipeline. However, these models do not contain information on the composition of the community that determines the metabolites exchanged between the community members. In addition, the quality of metabolic models is affected by the reconstruction approach used, with direct consequences on the inferred interactions between community members. Here, we use fully automated consensus reconstructions from four approaches to arrive at functional models with improved genomic support while considering the community composition. We applied our pipeline to two soil communities from the Arabidopsis thaliana culture collection, providing only genome sequences. Finally, we show that the obtained models have 90% genomic support and demonstrate that the derived interactions are corroborated by independent computational predictions.
ReadBouncer
(2022)
Motivation:
Nanopore sequencers allow targeted sequencing of interesting nucleotide sequences by rejecting other sequences from individual pores. This feature facilitates the enrichment of low-abundant sequences by depleting overrepresented ones in-silico. Existing tools for adaptive sampling either apply signal alignment, which cannot handle human-sized reference sequences, or apply read mapping in sequence space relying on fast graphical processing units (GPU) base callers for real-time read rejection. Using nanopore long-read mapping tools is also not optimal when mapping shorter reads as usually analyzed in adaptive sampling applications.
Results:
Here, we present a new approach for nanopore adaptive sampling that combines fast CPU and GPU base calling with read classification based on Interleaved Bloom Filters. ReadBouncer improves the potential enrichment of low abundance sequences by its high read classification sensitivity and specificity, outperforming existing tools in the field. It robustly removes even reads belonging to large reference sequences while running on commodity hardware without GPUs, making adaptive sampling accessible for in-field researchers. Readbouncer also provides a user-friendly interface and installer files for end-users without a bioinformatics background.
Teaching and learning as well as administrative processes are still experiencing intensive changes with the rise of artificial intelligence (AI) technologies and its diverse application opportunities in the context of higher education. Therewith, the scientific interest in the topic in general, but also specific focal points rose as well. However, there is no structured overview on AI in teaching and administration processes in higher education institutions that allows to identify major research topics and trends, and concretizing peculiarities and develops recommendations for further action. To overcome this gap, this study seeks to systematize the current scientific discourse on AI in teaching and administration in higher education institutions. This study identified an (1) imbalance in research on AI in educational and administrative contexts, (2) an imbalance in disciplines and lack of interdisciplinary research, (3) inequalities in cross-national research activities, as well as (4) neglected research topics and paths. In this way, a comparative analysis between AI usage in administration and teaching and learning processes, a systematization of the state of research, an identification of research gaps as well as further research path on AI in higher education institutions are contributed to research.
Teaching and learning as well as administrative processes are still experiencing intensive changes with the rise of artificial intelligence (AI) technologies and its diverse application opportunities in the context of higher education. Therewith, the scientific interest in the topic in general, but also specific focal points rose as well. However, there is no structured overview on AI in teaching and administration processes in higher education institutions that allows to identify major research topics and trends, and concretizing peculiarities and develops recommendations for further action. To overcome this gap, this study seeks to systematize the current scientific discourse on AI in teaching and administration in higher education institutions. This study identified an (1) imbalance in research on AI in educational and administrative contexts, (2) an imbalance in disciplines and lack of interdisciplinary research, (3) inequalities in cross-national research activities, as well as (4) neglected research topics and paths. In this way, a comparative analysis between AI usage in administration and teaching and learning processes, a systematization of the state of research, an identification of research gaps as well as further research path on AI in higher education institutions are contributed to research.
The use of neural networks is considered as the state of the art in the field of image classification. A large number of different networks are available for this purpose, which, appropriately trained, permit a high level of classification accuracy. Typically, these networks are applied to uncompressed image data, since a corresponding training was also carried out using image data of similar high quality. However, if image data contains image errors, the classification accuracy deteriorates drastically. This applies in particular to coding artifacts which occur due to image and video compression. Typical application scenarios for video compression are narrowband transmission channels for which video coding is required but a subsequent classification is to be carried out on the receiver side. In this paper we present a special H.264/Advanced Video Codec (AVC) based video codec that allows certain regions of a picture to be coded with near constant picture quality in order to allow a reliable classification using neural networks, whereas the remaining image will be coded using constant bit rate. We have combined this feature with the ability to run with lowest latency properties, which is usually also required in remote control applications scenarios. The codec has been implemented as a fully hardwired High Definition video capable hardware architecture which is suitable for Field Programmable Gate Arrays.
“Broadcast your gender.”
(2022)
Social media platforms provide a large array of behavioral data relevant to social scientific research. However, key information such as sociodemographic characteristics of agents are often missing. This paper aims to compare four methods of classifying social attributes from text. Specifically, we are interested in estimating the gender of German social media creators. By using the example of a random sample of 200 YouTube channels, we compare several classification methods, namely (1) a survey among university staff, (2) a name dictionary method with the World Gender Name Dictionary as a reference list, (3) an algorithmic approach using the website gender-api.com, and (4) a Multinomial Naïve Bayes (MNB) machine learning technique. These different methods identify gender attributes based on YouTube channel names and descriptions in German but are adaptable to other languages. Our contribution will evaluate the share of identifiable channels, accuracy and meaningfulness of classification, as well as limits and benefits of each approach. We aim to address methodological challenges connected to classifying gender attributes for YouTube channels as well as related to reinforcing stereotypes and ethical implications.
“Broadcast your gender.”
(2022)
Social media platforms provide a large array of behavioral data relevant to social scientific research. However, key information such as sociodemographic characteristics of agents are often missing. This paper aims to compare four methods of classifying social attributes from text. Specifically, we are interested in estimating the gender of German social media creators. By using the example of a random sample of 200 YouTube channels, we compare several classification methods, namely (1) a survey among university staff, (2) a name dictionary method with the World Gender Name Dictionary as a reference list, (3) an algorithmic approach using the website gender-api.com, and (4) a Multinomial Naïve Bayes (MNB) machine learning technique. These different methods identify gender attributes based on YouTube channel names and descriptions in German but are adaptable to other languages. Our contribution will evaluate the share of identifiable channels, accuracy and meaningfulness of classification, as well as limits and benefits of each approach. We aim to address methodological challenges connected to classifying gender attributes for YouTube channels as well as related to reinforcing stereotypes and ethical implications.
The analysis of behavioral models such as Graph Transformation Systems (GTSs) is of central importance in model-driven engineering. However, GTSs often result in intractably large or even infinite state spaces and may be equipped with multiple or even infinitely many start graphs. To mitigate these problems, static analysis techniques based on finite symbolic representations of sets of states or paths thereof have been devised. We focus on the technique of k-induction for establishing invariants specified using graph conditions. To this end, k-induction generates symbolic paths backwards from a symbolic state representing a violation of a candidate invariant to gather information on how that violation could have been reached possibly obtaining contradictions to assumed invariants. However, GTSs where multiple agents regularly perform actions independently from each other cannot be analyzed using this technique as of now as the independence among backward steps may prevent the gathering of relevant knowledge altogether.
In this paper, we extend k-induction to GTSs with multiple agents thereby supporting a wide range of additional GTSs. As a running example, we consider an unbounded number of shuttles driving on a large-scale track topology, which adjust their velocity to speed limits to avoid derailing. As central contribution, we develop pruning techniques based on causality and independence among backward steps and verify that k-induction remains sound under this adaptation as well as terminates in cases where it did not terminate before.