Hasso-Plattner-Institut für Digital Engineering GmbH
The transversal hypergraph problem asks to enumerate the minimal hitting sets of a hypergraph. If the solutions have bounded size, Eiter and Gottlob [SICOMP'95] gave an algorithm running in output-polynomial time, but whose space requirement also scales with the output. We improve this to polynomial delay and space. Central to our approach is the extension problem, deciding for a set X of vertices whether it is contained in any minimal hitting set. We show that this is one of the first natural problems to be W[3]-complete. We give an algorithm for the extension problem running in time O(m^(|X|+1) n) and prove a SETH lower bound showing that this is close to optimal. We apply our enumeration method to the discovery problem of minimal unique column combinations from data profiling. Our empirical evaluation suggests that the algorithm outperforms its worst-case guarantees on hypergraphs stemming from real-world databases.
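To make the objects concrete, here is a small brute-force sketch in Python of the extension problem (is a vertex set X contained in some minimal hitting set?). It is purely illustrative and exponential by design, consistent with the W[3]-completeness of the problem; it is not the paper's algorithm, and all names are made up.

```python
from itertools import combinations

def hits_all(candidate, edges):
    """True iff the candidate vertex set intersects every hyperedge."""
    return all(candidate & e for e in edges)

def is_minimal_hitting_set(candidate, edges):
    """A hitting set is minimal iff removing any single vertex breaks it."""
    return hits_all(candidate, edges) and all(
        not hits_all(candidate - {v}, edges) for v in candidate
    )

def extends_to_minimal(X, vertices, edges):
    """Extension problem: is X a subset of some minimal hitting set?
    Brute force over all vertex subsets containing X."""
    rest = [v for v in vertices if v not in X]
    for r in range(len(rest) + 1):
        for extra in combinations(rest, r):
            if is_minimal_hitting_set(set(X) | set(extra), edges):
                return True
    return False

edges = [{1, 2}, {2, 3}, {3, 4}]
print(extends_to_minimal({1}, {1, 2, 3, 4}, edges))  # True: {1, 3} is minimal
```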
Circular economy
(2021)
In a circular economy, the use of recycled resources in production is a key performance indicator for management. Yet, academic studies are still unable to inform managers on appropriate recycling and pricing policies. We develop an optimal control model integrating a firm's recycling rate, in which both virgin and recycled resources can be used in the production process. Our model accounts for the influence of recycling on both the supply and demand sides. The positive effect of a firm's use of recycled resources diminishes over time but may increase through investments. Using general formulations for demand and cost, we analytically examine joint dynamic pricing and recycling investment policies to determine their optimal interplay over time. We provide numerical experiments to assess the existence of a steady state and to conduct sensitivity analyses with respect to various model parameters. The analysis shows how to dynamically adapt jointly optimized controls to reach sustainability in the production process. Our results pave the way to sounder sustainable practices for firms operating within a circular economy.
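The abstract does not spell out the model, but a generic joint pricing and recycling-investment control problem of the kind described might be sketched as follows; all symbols (price p, recycling effect r, investment u, investment cost weight k, decay rate δ, discount rate ρ) are hypothetical, not the paper's notation:

```latex
\begin{align*}
\max_{p(t),\,u(t)} \;& \int_0^{\infty} e^{-\rho t}
  \Big[\, p(t)\,D\big(p(t), r(t)\big) - C\big(r(t)\big)
  - \tfrac{k}{2}\,u(t)^2 \Big]\,dt \\
\text{s.t.}\;& \dot{r}(t) = u(t) - \delta\, r(t), \qquad r(0)=r_0 .
\end{align*}
```

Here D and C stand in for the general demand and cost formulations mentioned in the abstract; a steady state corresponds to ṙ = 0.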
Design thinking is a well-established practical and educational approach to fostering high-level creativity and innovation, which has been refined since the 1950s with the participation of experts like Joy Paul Guilford and Abraham Maslow. Through real-world projects, trainees learn to optimize their creative outcomes by developing and practicing creative cognition and metacognition. This paper provides a holistic perspective on creativity, enabling the formulation of a comprehensive theoretical framework of creative metacognition. It focuses on the design thinking approach to creativity and explores the role of metacognition in four areas of creativity expertise: Products, Processes, People, and Places. The analysis includes task-outcome relationships (product metacognition), the monitoring of strategy effectiveness (process metacognition), an understanding of individual or group strengths and weaknesses (people metacognition), and an examination of the mutual impact between environments and creativity (place metacognition). It also reviews measures taken in design thinking education, including a distribution of cognition and metacognition, to support students in their development of creative mastery. On these grounds, we propose extended methods for measuring creative metacognition with the goal of enhancing comprehensive assessments of the phenomenon. Proposed methodological advancements include accuracy sub-scales, experimental tasks where examinees explore problem and solution spaces, combinations of naturalistic observations with capability testing, as well as physiological assessments as indirect measures of creative metacognition.
CrashNet
(2021)
Destructive car crash tests are an elaborate, time-consuming, and expensive necessity of the automotive development process. Today, finite element method (FEM) simulations are used to reduce costs by simulating car crashes computationally. We propose CrashNet, an encoder-decoder deep neural network architecture that reduces costs further and models specific outcomes of car crashes very accurately. We achieve this by formulating car crash events as time series prediction enriched with a set of scalar features. Traditional sequence-to-sequence models are usually composed of convolutional neural network (CNN) and CNN transpose layers. We propose to concatenate those with an MLP capable of learning how to inject the given scalars into the output time series. In addition, we replace the CNN transpose with 2D CNN transpose layers in order to force the model to process the hidden state of the set of scalars as one time series. The proposed CrashNet model can be trained efficiently and is able to process scalars and time series as input in order to infer the results of crash tests. CrashNet produces results faster and at a lower cost compared to destructive tests and FEM simulations. Moreover, it represents a novel approach in the car safety management domain.
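A minimal PyTorch sketch of the architectural idea described — a 1D-CNN encoder for the input time series, an MLP that embeds the scalar crash parameters and injects them into the hidden state, and a 2D transposed-convolution decoder that processes the result as one time series. Layer sizes and shapes are invented for illustration and do not reproduce the actual CrashNet.

```python
import torch
import torch.nn as nn

class CrashNetSketch(nn.Module):
    """Illustrative encoder-decoder in the spirit of the abstract."""
    def __init__(self, in_channels=1, n_scalars=4, hidden=32, out_len=64):
        super().__init__()
        self.encoder = nn.Sequential(  # 1D-CNN encoder for the time series
            nn.Conv1d(in_channels, hidden, kernel_size=5, stride=2, padding=2),
            nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=5, stride=2, padding=2),
            nn.ReLU(),
        )
        self.scalar_mlp = nn.Sequential(  # embeds the scalar crash features
            nn.Linear(n_scalars, hidden), nn.ReLU(), nn.Linear(hidden, hidden)
        )
        # 2D transposed convolutions upsample the (channels x time) map,
        # forcing the injected scalar state to be treated as part of it.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(1, 8, kernel_size=(3, 4), stride=(1, 2), padding=(1, 1)),
            nn.ReLU(),
            nn.ConvTranspose2d(8, 1, kernel_size=(3, 4), stride=(1, 2), padding=(1, 1)),
        )
        self.head = nn.LazyLinear(out_len)  # project to the target series length

    def forward(self, series, scalars):
        h = self.encoder(series)                    # (B, hidden, T')
        s = self.scalar_mlp(scalars).unsqueeze(-1)  # (B, hidden, 1)
        z = torch.cat([h, s], dim=-1)               # inject scalars along time
        z = z.unsqueeze(1)                          # (B, 1, hidden, T'+1)
        return self.head(self.decoder(z).flatten(1))

x = torch.randn(2, 1, 100)   # batch of input time series
c = torch.randn(2, 4)        # scalar crash-test parameters
print(CrashNetSketch()(x, c).shape)  # torch.Size([2, 64])
```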
In liquid-chromatography-tandem-mass-spectrometry-based proteomics, information about the presence and stoichiometry of protein modifications is not readily available. To overcome this problem, we developed multiFLEX-LF, a computational tool that builds upon FLEXIQuant, which detects modified peptide precursors and quantifies their modification extent by monitoring the differences between observed and expected intensities of the unmodified precursors. multiFLEX-LF relies on robust linear regression to calculate the modification extent of a given precursor relative to a within-study reference. multiFLEX-LF can analyze entire label-free discovery proteomics data sets in a precursor-centric manner without preselecting a protein of interest. To analyze modification dynamics and coregulated modifications, we hierarchically clustered the precursors of all proteins based on their computed relative modification scores. We applied multiFLEX-LF to a data-independent-acquisition-based data set acquired using the anaphase-promoting complex/cyclosome (APC/C) isolated at various time points during mitosis. The clustering of the precursors allows for identifying varying modification dynamics and ordering the modification events. Overall, multiFLEX-LF enables the fast identification of potentially differentially modified peptide precursors and the quantification of their differential modification extent in large data sets using a personal computer. Additionally, multiFLEX-LF can drive the large-scale investigation of the modification dynamics of peptide precursors in time-series and case-control studies. multiFLEX-LF is available at https://gitlab.com/SteenOmicsLab/multiflex-lf.
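The core computation described — a robust linear fit of observed against reference intensities, with the modification extent read off the deviation — can be sketched as follows. The scoring is only in the spirit of FLEXIQuant-style analysis; the numbers and the exact score definition are invented for illustration.

```python
import numpy as np
from sklearn.linear_model import HuberRegressor

# Toy data: intensities of one precursor across samples versus a
# within-study reference (e.g., presumably unmodified precursors).
reference = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
observed  = np.array([1.1, 2.0, 1.4, 4.1, 5.2])  # sample 3 looks modified

# A robust fit down-weights outliers, so modified samples barely
# influence the estimated expected intensities.
model = HuberRegressor().fit(reference.reshape(-1, 1), observed)
expected = model.predict(reference.reshape(-1, 1))

# One possible relative-modification score: the fraction of expected
# intensity that is missing, clipped at zero.
score = np.clip(1.0 - observed / expected, 0.0, None)
print(score.round(2))  # the modified sample stands out
```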
CovRadar
(2022)
The ongoing pandemic caused by SARS-CoV-2 emphasizes the importance of genomic surveillance to understand the evolution of the virus, to monitor the viral population, and to plan epidemiological responses. Detailed analysis, easy visualization and intuitive filtering of the latest viral sequences are powerful tools for this purpose. We present CovRadar, a tool for genomic surveillance of the SARS-CoV-2 Spike protein. CovRadar consists of an analytical pipeline and a web application that enable the analysis and visualization of hundreds of thousands of sequences. First, CovRadar extracts the regions of interest using local alignment, then builds a multiple sequence alignment, infers variants and a consensus, and finally presents the results in an interactive app, making access and reporting simple, flexible and fast.
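The first pipeline step, extracting a region of interest by local alignment, could look roughly like this Biopython sketch; the sequences and scoring parameters are toy values, not CovRadar's actual configuration.

```python
from Bio import Align

aligner = Align.PairwiseAligner()
aligner.mode = "local"          # Smith-Waterman-style local alignment
aligner.match_score = 2
aligner.mismatch_score = -1
aligner.open_gap_score = -2
aligner.extend_gap_score = -0.5

genome = "ACGTTTGCCAGTACGTAGCTAGCTTGACC"   # toy genome sequence
probe  = "AGTACGTAGCTAGC"                  # toy region-of-interest probe

best = aligner.align(genome, probe)[0]
print(best.score)
print(best)  # shows where the region of interest sits in the toy genome
```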
Omics and male infertility
(2022)
Male infertility is a multifaceted disorder affecting approximately 50% of male partners in infertile couples.
Over the years, male infertility has been diagnosed mainly through semen analysis, hormone evaluations, medical records and physical examinations, which are of course fundamental yet insufficient, because 30% of male infertility cases remain idiopathic. This unresolved state needs to be addressed with more sophisticated and results-driven technologies and techniques.
Genetic alterations have been linked with male infertility, thereby unveiling the practicality of investigating this disorder from the "omics" perspective.
Omics aims at analyzing the structure and functions of a whole constituent of a given biological function at different levels, including the molecular gene level (genomics), transcript level (transcriptomics), protein level (proteomics) and metabolite level (metabolomics). In the current study, an overview of the four branches of omics and their roles in male infertility is briefly discussed; the potential usefulness of assessing transcriptomic data to understand this pathology is also elucidated.
After assessing the publicly available transcriptomic datasets on male infertility, a total of 1385 datasets were retrieved, of which 10 met the inclusion criteria and were used for further analysis.
These datasets were classified into groups according to the disease or cause of male infertility.
The groups include non-obstructive azoospermia (NOA), obstructive azoospermia (OA), non-obstructive and obstructive azoospermia (NOA and OA), spermatogenic dysfunction, sperm dysfunction, and Y chromosome microdeletion.
Findings revealed that 8 genes (LDHC, PDHA2, TNP1, TNP2, ODF1, ODF2, SPINK2, PCDHB3) were commonly differentially expressed between all disease groups.
Likewise, 56 genes were common between NOA versus NOA and OA (ADAD1, BANF2, BCL2L14, C12orf50, C20orf173, C22orf23, C6orf99, C9orf131, C9orf24, CABS1, CAPZA3, CCDC187, CCDC54, CDKN3, CEP170, CFAP206, CRISP2, CT83, CXorf65, FAM209A, FAM71F1, FAM81B, GALNTL5, GTSF1, H1FNT, HEMGN, HMGB4, KIF2B, LDHC, LOC441601, LYZL2, ODF1, ODF2, PCDHB3, PDHA2, PGK2, PIH1D2, PLCZ1, PROCA1, RIMBP3, ROPN1L, SHCBP1L, SMCP, SPATA16, SPATA19, SPINK2, TEX33, TKTL2, TMCO2, TMCO5A, TNP1, TNP2, TSPAN16, TSSK1B, TTLL2, UBQLN3).
These genes, particularly the above-mentioned 8 genes, are involved in diverse biological processes such as germ cell development, spermatid development, spermatid differentiation, regulation of proteolysis, spermatogenesis and metabolic processes.
Owing to the stage-specific expression of these genes, any mal-expression can ultimately lead to male infertility.
Therefore, currently available data on all branches of omics relating to male fertility can be used to identify biomarkers for diagnosing male infertility, which can potentially help in unravelling some idiopathic cases.
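The final intersection step — finding genes differentially expressed in every disease group — reduces to a set intersection. A toy sketch, using invented per-group subsets that each contain the eight genes named above:

```python
# Intersect per-group lists of differentially expressed genes to find
# genes shared by all disease groups. Gene symbols are from the abstract;
# the group memberships below are shortened stand-ins, not the real data.
groups = {
    "NOA": {"LDHC", "PDHA2", "TNP1", "TNP2", "ODF1", "ODF2",
            "SPINK2", "PCDHB3", "PGK2"},
    "OA":  {"LDHC", "PDHA2", "TNP1", "TNP2", "ODF1", "ODF2",
            "SPINK2", "PCDHB3", "CRISP2"},
    "sperm_dysfunction": {"LDHC", "PDHA2", "TNP1", "TNP2", "ODF1",
                          "ODF2", "SPINK2", "PCDHB3"},
}
common = set.intersection(*groups.values())
print(sorted(common))  # the eight genes shared by every group
```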
Peer assessment in MOOCs
(2021)
We report on a systematic review of the landscape of peer assessment in massive open online courses (MOOCs), covering papers from 2014 to 2020 in 20 leading education technology publication venues across four databases of education technology-related papers. We address three research issues: the evolution of peer assessment in MOOCs during the period 2014 to 2020, the methods used in MOOCs to assess peers, and the challenges of and future directions for MOOC peer assessment. We provide summary statistics and a review of methods across the corpus and highlight three directions for improving the use of peer assessment in MOOCs: focusing on scaling learning through peer evaluations, scaling and optimizing team submissions in team peer assessments, and embedding a social process for peer assessment.
A path in an edge-colored graph is rainbow if no two of its edges share a color, and the graph is rainbow-connected if there is a rainbow path between each pair of its vertices. The minimum number of colors needed to rainbow-connect a graph G is the rainbow connection number of G, denoted by rc(G). A simple way to rainbow-connect a graph G is to color the edges of a spanning tree with distinct colors and then re-use any of these colors to color the remaining edges of G. This proves that rc(G) ≤ |V(G)| − 1. We ask whether there is a stronger connection between tree-like structures and rainbow coloring than is implied by this trivial argument. For instance, is it possible to find an upper bound of t(G) − 1 for rc(G), where t(G) is the number of vertices in the largest induced tree of G? The answer turns out to be negative, as there are counter-examples showing that even c · t(G) is not an upper bound for rc(G) for any given constant c. In this work we show that if we consider the forest number f(G), the number of vertices in a maximum induced forest of G, instead of t(G), then surprisingly we do get an upper bound. More specifically, we prove that rc(G) ≤ f(G) + 2. Our result indicates a stronger connection between rainbow connection and tree-like structures than was suggested by the simple spanning-tree-based upper bound.
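The trivial spanning-tree argument from the abstract is easy to make executable. A short networkx sketch (illustrative only) that produces a rainbow coloring with at most |V(G)| − 1 colors:

```python
import networkx as nx

def spanning_tree_rainbow_coloring(G):
    """The trivial upper bound rc(G) <= |V(G)| - 1: give each of the
    n - 1 spanning-tree edges its own color, then re-use color 0 for
    every non-tree edge. Any two vertices are joined by their tree
    path, which is rainbow since tree edges have distinct colors."""
    T = nx.minimum_spanning_tree(G)
    coloring = {}
    for color, edge in enumerate(T.edges()):
        coloring[frozenset(edge)] = color
    for edge in G.edges():
        coloring.setdefault(frozenset(edge), 0)  # re-use an existing color
    return coloring

G = nx.petersen_graph()
colors = spanning_tree_rainbow_coloring(G)
print(len(set(colors.values())))  # 9 = |V(G)| - 1 colors suffice
```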
Graphs play an important role in many areas of Computer Science. In particular, our work is motivated by model-driven software development and by graph databases. For this reason, it is very important to have the means to express and to reason about the properties that a given graph may satisfy. With this aim, in this paper we present a visual logic that allows us to describe graph properties, including navigational properties, i.e., properties about the paths in a graph. The logic is equipped with a deductive tableau method that we have proved to be sound and complete.
Industry 4.0 is transforming how businesses innovate and, as a result, companies are spearheading the movement towards 'Digital Transformation'. While some scholars advocate the use of design thinking to identify new innovative behaviours, cognition experts emphasise the importance of top managers in supporting employees to develop these behaviours. However, there is a dearth of research in this domain and companies are struggling to implement the required behaviours. To address this gap, this study aims to identify and prioritise behavioural strategies conducive to design thinking to inform the creation of a managerial mental model. We identify 20 behavioural strategies from interviews with 45 practitioners and educators and combine them with the concepts of 'paradigm-mindset-mental model' from cognition theory. The paper contributes to the body of knowledge by identifying and prioritising specific behavioural strategies to form a novel set of survival conditions aligned to the new industrial paradigm of Industry 4.0.
As resources are valuable assets, organizations have to decide which resources to allocate to business process tasks such that the process is executed not only effectively but also efficiently. Traditional role-based resource allocation leads to effective process executions, since each task is performed by a resource that has the required skills and competencies. However, the resulting allocations are typically not as efficient as they could be, since optimization techniques have yet to find their way into traditional business process management scenarios. Operations research, on the other hand, provides a rich set of analytical methods for supporting problem-specific decisions on resource allocation. This paper provides a novel framework for creating transparency on existing tasks and resources, supporting individualized allocations for each activity in a process, and integrating problem-specific analytical methods from the operations research domain. To validate the framework, the paper reports on the design and prototypical implementation of a software architecture that extends a traditional process engine with a dedicated resource management component. This component allows us to define specific resource allocation problems at design time and facilitates optimized resource allocation at run time. The framework is evaluated using a real-world parcel delivery process. The evaluation shows that the quality of the allocation results increases significantly when a technique from operations research is used instead of the traditionally applied rule-based approach.
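As an illustration of plugging an operations-research method into resource allocation, the following PuLP sketch solves a toy task-assignment problem. Tasks, resources, costs, and capacity limits are all invented and are not taken from the paper's parcel delivery case.

```python
import pulp

# Toy instance: assign each process task to exactly one resource,
# minimizing total execution cost (all numbers are invented).
tasks = ["pick", "load", "deliver"]
resources = ["alice", "bob"]
cost = {("pick", "alice"): 3, ("pick", "bob"): 4,
        ("load", "alice"): 2, ("load", "bob"): 2,
        ("deliver", "alice"): 6, ("deliver", "bob"): 5}

prob = pulp.LpProblem("resource_allocation", pulp.LpMinimize)
x = pulp.LpVariable.dicts("assign", cost.keys(), cat="Binary")
prob += pulp.lpSum(cost[k] * x[k] for k in cost)
for t in tasks:      # every task gets exactly one resource
    prob += pulp.lpSum(x[(t, r)] for r in resources) == 1
for r in resources:  # capacity: at most two tasks per resource
    prob += pulp.lpSum(x[(t, r)] for t in tasks) <= 2

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print([k for k in cost if x[k].value() == 1])  # optimized assignment
```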
In the field of Business Process Management (BPM), modeling business processes and related data is a critical issue, since process activities need to manage data stored in databases. The connection between processes and data is usually handled at the implementation level, even though modeling both processes and data at the conceptual level should help designers improve business process models and identify requirements for implementation. Especially in data- and decision-intensive contexts, business process activities need to access data stored both in databases and data warehouses. In this paper, we complete our approach for defining a novel conceptual view that bridges process activities and data. The proposed approach allows the designer to model the connection between business processes and database models and to define the operations to perform, providing interesting insights on the overall connected perspective and hints for identifying activities that are crucial for decision support.
Background:
More patient data are needed to improve research on rare liver diseases. Mobile health apps enable an exhaustive data collection. Therefore, the European Reference Network on Hepatological diseases (ERN RARE-LIVER) intends to implement an app for patients with rare liver diseases communicating with a patient registry, but little is known about which features patients and their healthcare providers regard as being useful.
Aims:
This study aimed to investigate how an app for rare liver diseases would be accepted, and to find out which features are considered useful.
Methods:
An anonymous survey was conducted on adult patients with rare liver diseases at a single academic, tertiary care outpatient-service. Additionally, medical experts of the ERN working group on autoimmune hepatitis were invited to participate in an online survey.
Results:
In total, the responses from 100 patients with autoimmune (n = 90) or other rare (n = 10) liver diseases and 32 experts were analyzed. Patients were willing to use a disease-specific app (80%) and expected some benefit to their health (78%), but responses differed significantly between younger and older patients (93% vs. 62%, p < 0.001; 88% vs. 64%, p < 0.01). Comparing patients' and experts' feedback, patients more often expected a simplified healthcare pathway (e.g. 89% vs. 59% (p < 0.001) wanted access to one's own medical records), while healthcare providers saw the benefit mainly in improving compliance and treatment outcome (e.g. 93% vs. 31% (p < 0.001) and 70% vs. 21% (p < 0.001) expected the app to reduce mistakes in taking medication and improve quality of life, respectively).
Process mining techniques are valuable to gain insights into and help improve (work) processes. Many of these techniques focus on the sequential order in which activities are performed. Few of these techniques consider the statistical relations within processes. In particular, existing techniques do not allow insights into how responses to an event (action) result in desired or undesired outcomes (effects). We propose and formalize the ARE miner, a novel technique that allows us to analyze and understand these action-response-effect patterns. We take a statistical approach to uncover potential dependency relations in these patterns. The goal of this research is to generate processes that are: (1) appropriately represented, and (2) effectively filtered to show meaningful relations. We evaluate the ARE miner in two ways. First, we use an artificial data set to demonstrate the effectiveness of the ARE miner compared to two traditional process-oriented approaches. Second, we apply the ARE miner to a real-world data set from a Dutch healthcare institution. We show that the ARE miner generates comprehensible representations that lead to informative insights into statistical relations between actions, responses, and effects.
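One simple statistical building block for this kind of action-response-effect analysis is a test of dependence between the chosen response and the observed effect. A sketch with an invented contingency table — this is not the ARE miner's actual procedure, only the underlying idea:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Toy contingency table for one action: rows are the chosen response,
# columns the observed effect (counts invented for illustration).
#                  effect: resolved  escalated
table = np.array([[40, 10],   # response: "call patient"
                  [15, 35]])  # response: "send letter"

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2={chi2:.1f}, p={p:.4f}")  # small p: response and effect depend
```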
Despite advances in machine learning-based clinical prediction models, only a few such models are actually deployed in clinical contexts. Among other reasons, this is due to a lack of validation studies. In this paper, we present and discuss the validation results of a machine learning model for the prediction of acute kidney injury in cardiac surgery patients, initially developed on the MIMIC-III dataset, when applied to an external cohort of an American research hospital. To help account for the performance differences observed, we utilized interpretability methods based on feature importance, which allowed experts to scrutinize model behavior both at the global and local level, making it possible to gain further insights into why it did not behave as expected on the validation cohort. The knowledge gleaned upon derivation can be potentially useful to assist model updates during validation for more generalizable and simpler models. We argue that interpretability methods should be considered by practitioners as a further tool to help explain performance differences and inform model updates in validation studies.
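A common feature-importance technique usable for this kind of scrutiny is permutation importance. The following sketch applies it to synthetic stand-in data; the model, features, and cohorts are placeholders, not the paper's AKI model.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Stand-in for a clinical model and an external validation cohort.
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_dev, X_val, y_dev, y_val = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_dev, y_dev)

# Global importance measured on the validation cohort: large shifts
# versus the development data can hint at why performance differs.
result = permutation_importance(model, X_val, y_val,
                                n_repeats=10, random_state=0)
for i in result.importances_mean.argsort()[::-1][:3]:
    print(f"feature {i}: {result.importances_mean[i]:.3f}")
```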
Anomaly detection in process mining aims to recognize outlying or unexpected behavior in event logs for purposes such as the removal of noise and identification of conformance violations. Existing techniques for this task are primarily frequency-based, arguing that behavior is anomalous because it is uncommon. However, such techniques ignore the semantics of recorded events and, therefore, do not take the meaning of potential anomalies into consideration. In this work, we overcome this caveat and focus on the detection of anomalies from a semantic perspective, arguing that anomalies can be recognized when process behavior does not make sense. To achieve this, we propose an approach that exploits the natural language associated with events. Our key idea is to detect anomalous process behavior by identifying semantically inconsistent execution patterns. To detect such patterns, we first automatically extract business objects and actions from the textual labels of events. We then compare these against a process-independent knowledge base. By populating this knowledge base with patterns from various kinds of resources, our approach can be used in a range of contexts and domains. We demonstrate the capability of our approach to successfully detect semantic execution anomalies through an evaluation based on a set of real-world and synthetic event logs and show the complementary nature of semantics-based anomaly detection to existing frequency-based techniques.
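The key idea — extract (action, business object) pairs from event labels and flag executions that contradict a knowledge base of admissible patterns — can be caricatured in a few lines. Both the label parser and the knowledge base below are toy stand-ins for the paper's components.

```python
# Knowledge base: "create invoice" must happen before "pay invoice".
knowledge_base = {("create", "invoice"): {"before": {("pay", "invoice")}}}

def parse_label(label):
    """Naive extraction of (action, business object) from an event label."""
    action, _, obj = label.lower().partition(" ")
    return action, obj

def semantic_anomalies(trace):
    seen, anomalies = set(), []
    for label in trace:
        action, obj = parse_label(label)
        for (a, o), rule in knowledge_base.items():
            # Seeing a dependent event before its prerequisite is
            # semantically inconsistent, regardless of frequency.
            if (action, obj) in rule["before"] and (a, o) not in seen:
                anomalies.append(label)
        seen.add((action, obj))
    return anomalies

print(semantic_anomalies(["Pay invoice", "Create invoice"]))  # ['Pay invoice']
```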
Open edX is an incredible platform to deliver MOOCs and SPOCs, designed to be robust and support hundreds of thousands of students at the same time. Nevertheless, it lacks a lot of the fine-grained functionality needed to handle students individually in an on-campus course. This short session will present the ongoing project undertaken by the 6 public universities of the Region of Madrid plus the Universitat Politècnica de València, in the framework of a national initiative called UniDigital, funded by the Ministry of Universities of Spain within the Plan de Recuperación, Transformación y Resiliencia of the European Union. This project, led by three of these Spanish universities (UC3M, UPV, UAM), is investing more than half a million euros with the purpose of bringing the Open edX platform closer to the functionalities required for an LMS to support on-campus teaching. The aim of the project is to coordinate what is going to be developed with the Open edX development community, so these developments are incorporated into the core of the Open edX platform in its next releases. Features like a complete redesign of platform analytics to make them real-time, the creation of dashboards based on these analytics, the integration of a system for customized automatic feedback, improvement of exams and tasks and the extension of grading capabilities, improvements in the graphical interfaces for both students and teachers, the extension of the emailing capabilities, redesign of the file management system, integration of H5P content, the integration of a tool to create mind maps, the creation of a system to detect students at risk, or the integration of an advanced voice assistant and a gamification mobile app, among others, are part of the functionalities to be developed. The idea is to transform a first-class MOOC platform into the next on-campus LMS.
“How can a course structure be redesigned based on empirical data to enhance learning effectiveness through a student-centered approach using objective criteria?” was the research question we asked. “Digital Twins for Virtual Commissioning of Production Machines” is a course using several innovative concepts, including an in-depth practical part with online experiments, called virtual labs. The teaching-learning concept is continuously evaluated. Card sorting is a popular method for designing information architectures (IA), “a practice of effectively organizing, structuring, and labeling the content of a website or application into a structure that enables efficient navigation” [11]. In the presented higher education context, a so-called hybrid card sort was used, in which each participant had to sort 70 cards into seven predefined categories or create new categories themselves. Twelve out of 28 students voluntarily participated in the process, and short interviews were conducted after the activity. The analysis of the category mapping yields a quantitative measure of the (dis-)similarity of the keywords in specific categories using hierarchical cluster analysis (HCA). The learning designer could then interpret the results to make decisions about the number, labeling and order of sections in the course.
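The analysis step described — hierarchical clustering of card (dis-)similarities — can be sketched with SciPy; the dissimilarity matrix below is invented, not the study's data.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

# Toy dissimilarity between 4 cards: e.g., the fraction of participants
# who placed the pair in different categories (numbers invented).
D = np.array([[0.0, 0.2, 0.9, 0.8],
              [0.2, 0.0, 0.8, 0.9],
              [0.9, 0.8, 0.0, 0.1],
              [0.8, 0.9, 0.1, 0.0]])

Z = linkage(squareform(D), method="average")      # agglomerative HCA
print(fcluster(Z, t=2, criterion="maxclust"))     # two clusters: [1 1 2 2]
```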
With the growing number of online learning resources, it becomes increasingly difficult and overwhelming to keep track of the latest developments and to find orientation in the plethora of offers. AI-driven services to recommend standalone learning resources or even complete learning paths are discussed as a possible solution for this challenge. To function properly, such services require a well-defined set of metadata provided by the learning resource. During the last few years, the so-called MOOChub metadata format has been established as a de-facto standard by a group of MOOC providers in German-speaking countries. This format, which is based on schema.org, already delivers a quite comprehensive set of metadata. So far, this set has been sufficient to list, display, sort, filter, and search for courses on several MOOC and open educational resources (OER) aggregators. AI recommendation services and further automated integration, beyond a plain listing, have special requirements, however. To optimize the format for proper support of such systems, several extensions and modifications have to be applied. We herein report on a set of suggested changes to prepare the format for this task.
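For illustration, a course record in the spirit of such a schema.org-based format might look like the following Python dict. The fields used here are from schema.org's Course vocabulary; they are not necessarily the exact MOOChub profile, and all values are invented.

```python
import json

course = {
    "@context": "https://schema.org",
    "@type": "Course",
    "name": "Introduction to Data Profiling",      # invented example course
    "description": "A six-week introductory MOOC.",
    "inLanguage": "en",
    "provider": {"@type": "Organization", "name": "Example MOOC Platform"},
    "url": "https://example.org/courses/data-profiling",
}
print(json.dumps(course, indent=2))
```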
The integration of MOOCs into Moroccan Higher Education (MHE) took place in 2013 through different partnerships and projects at national and international levels. As elsewhere, the Covid-19 crisis played an important role in accelerating distance education in MHE. However, based on our experience as both university professors and specialists in educational engineering, the digital transition has not yet been effectively executed. Thus, in this article, we present retrospective feedback on MOOCs in Morocco, focusing on the policies adopted by the government to better support the digital transition in general and MOOCs in particular. We seek to establish an optimal scenario for the promotion of MOOCs, one that emphasizes the policies to be considered and recalls the importance of a delicate articulation across four levels: environmental, institutional, organizational and individual. We conclude with recommendations inspired by the Moroccan academic context that focus on the major role that MOOCs play for university students and on maintaining lifelong learning.
“Financial Analysis” is an online course designed for professionals consisting of three MOOCs, offering a professionally and institutionally recognized certificate in finance. The course is open but not free of charge and attracts mostly professionals from the banking industry. The primary objective of this study is to identify indicators that can predict learners at high risk of failure. To achieve this, we analyzed data from a previous run of the course, which had 875 enrolled learners and took place during Fall 2021. We utilized correspondence analysis to examine demographic and behavioral variables.
The initial results indicate that demographic factors have a minor impact on the risk of failure in comparison to learners’ behaviors on the course platform. Two primary profiles were identified: (1) successful learners who utilized all the documents offered and spent between one to two hours per week, and (2) unsuccessful learners who used less than half of the proposed documents and spent less than one hour per week. Between these groups, at-risk students were identified as those who used more than half of the proposed documents and spent more than two hours per week. The goal is to identify those in group 1 who may be at risk of failing and those in group 2 who may succeed in the current MOOC, and to implement strategies to assist all learners in achieving success.
This paper presents a new design for MOOCs for professional development of skills needed to meet the UN Sustainable Development Goals – the CoMOOC or Co-designed Massive Open Online Collaboration. The CoMOOC model is based on co-design with multiple stakeholders, including end-users within the professional communities the CoMOOC aims to reach. This paper shows how the CoMOOC model could help the tertiary sector deliver on the UN Sustainable Development Goals (UNSDGs) – including but not limited to SDG 4 Education – by providing a more effective vehicle for professional development at the scale that the UNSDGs require. Interviews with professionals using MOOCs, and design-based research with professionals, have informed the development of the CoMOOC model. This research shows that open, online, collaborative learning experiences are highly effective for building professional community knowledge. Moreover, it shows that the collaborative learning design at the heart of the CoMOOC model is feasible cross-platform. Research with teachers working in crisis contexts in Lebanon, many of whom were refugees, will be presented to show how this form of large-scale, co-designed, online learning can support professionals even in the most challenging contexts, such as mass displacement, where expertise is urgently required.
xMOOCs
(2023)
The World Health Organization designed OpenWHO.org to provide an inclusive and accessible online environment to equip learners across the globe with critical, up-to-date information so that they can effectively protect themselves in health emergencies. The platform thus focuses on the eXtended Massive Open Online Course (xMOOC) modality – content-focused and expert-driven, one-to-many modelled, and self-paced for scalable learning. In this paper, we describe how OpenWHO utilized xMOOCs to reach mass audiences during the COVID-19 pandemic; the paper specifically examines the accessibility, language inclusivity and adaptability of hosted xMOOCs. As of February 2023, OpenWHO had 7.5 million enrolments across 200 xMOOCs on health emergency, epidemic, pandemic and other public health topics available across 65 languages, including 46 courses targeted at the COVID-19 pandemic. Our results suggest that the xMOOC modality allowed OpenWHO to expand learning during the pandemic to previously underrepresented groups, including women, participants ages 70 and older, and learners younger than age 20. The OpenWHO use case shows that xMOOCs should be considered when there is a need for massive knowledge transfer in health emergency situations, yet the approach should be context-specific according to the type of health emergency, targeted population and region. Our evidence also supports previous calls to put intervention elements that contribute to removing barriers to access at the core of learning and health information dissemination. Equity must be the fundamental principle and organizing criterion for public health work.
How to reuse inclusive STEM MOOCs in blended settings to engage young girls to scientific careers
(2023)
The FOSTWOM project (2019–2022), funded by ERASMUS+, gave METID (Politecnico di Milano) and the MOOC Técnico (Instituto Superior Técnico, University of Lisbon), together with other partners, the opportunity to support the design and creation of gender-inclusive MOOCs. Among other project outputs, we designed a toolkit and a framework that enabled the production of two MOOCs for undergraduate and graduate students in Science, Technology, Engineering and Maths (STEM) and used them as academic content free of gender stereotypes about intellectual ability. In this short paper, the authors aim to (1) briefly share the main outputs of the project and (2) tell the story of how the FOSTWOM approach, together with a motivational strategy, the Heroine's Learning Journey, proved to be effective in the context of rural and marginal areas in Brazil, with young girls as a specific target audience.
Challenges and proposals for introducing digital certificates in higher education infrastructures
(2023)
Questions about the recognition of MOOCs within and outside higher education were already being raised in the early 2010s. Today, recognition decisions are still made more or less on a case-by-case basis. However, digital certification approaches are now emerging that could automate recognition processes. The technical development of the required machine-readable documents and infrastructures is already well advanced in some cases. The DigiCerts consortium has developed a solution based on a collective blockchain. There are ongoing and open discussions regarding the particular technology, but the institutional implementation of digital certificates raises further questions. A number of workshops have been held at the Institute for Interactive Systems at Technische Hochschule Lübeck, which have identified the need for new responsibilities for issuing certificates. It has also become clear that all members of higher education institutions need to develop skills in the use of digital certificates.
To implement OERs at higher education institutions sustainably, not just technical infrastructure is required, but also well-trained staff. The University of Graz is in charge of an OER training program for university staff as part of the collaborative project Open Education Austria Advanced (OEAA), with the aim of ensuring long-term growth of competences in the use and creation of OERs. The program consists of a MOOC and a guided blended learning format, which was evaluated to find out which accompanying teaching and learning concepts can best facilitate targeted competence development. The evaluation of the program shows that learning videos, self-study assignments and synchronous sessions are most useful for the learning process. The results indicate that the creation of OERs is a complex process that can be undergone more effectively in the guided program.
Loss of expertise in the fields of Nuclear and Radio-Chemistry (NRC) is problematic at a scientific and social level. This has been addressed by developing a MOOC to let science students discover all the benefits of NRC to society and to improve their awareness of this discipline. The MOOC “Essential Radiochemistry for Society” covers current societal challenges related to health, clean and sustainable energy, and the safety and quality of food and agriculture.
NRC teachers belonging to CINCH network were invited to use the MOOC in their teaching, according to various usage models: on the basis of these different experiences, some usage patterns were designed, describing context characteristics (number and age of students, course), activities’ scheduling and organization, results and students’ feedback, with the aim of encouraging the use of MOOCs in university teaching, as an opportunity for both lecturers and students. These models were the basis of a “toolkit for teachers”. By experiencing digital teaching resources created by different lecturers, CINCH teachers took a first meaningful step towards understanding the worth of Open Educational Resources (OER) and the importance of their creation, adoption and sharing for knowledge progress. In this paper, the entire path from MOOC concept to MOOC different usage models, to awareness-raising regarding OER is traced in conceptual stages.
Innovat MOOC
(2023)
The COVID-19 pandemic has revealed the importance for university teachers to have adequate pedagogical and technological competences to cope with the various possible educational scenarios (face-to-face, online, hybrid, etc.), making use of appropriate active learning methodologies and supporting technologies to foster a more effective learning environment. In this context, the InnovaT project has been an important initiative to support the development of pedagogical and technological competences of university teachers in Latin America through several trainings aiming to promote teacher innovation. These trainings combined synchronous online training through webinars and workshops with asynchronous online training through the MOOC “Innovative Teaching in Higher Education.” This MOOC was released twice. The first run took place right during the lockdown of 2020, when Latin American teachers needed urgent training to move to emergency remote teaching overnight. The second run took place in 2022 with the return to face-to-face teaching and the implementation of hybrid educational models. This article shares the results of the design of the MOOC considering the constraints derived from the lockdowns applied in each country, the lessons learned from the delivery of such a MOOC to Latin American university teachers, and the results of the two runs of the MOOC.
As Thailand moves towards becoming an innovation-driven economy, the need for human capital development has become crucial. Work-based skill MOOCs, offered on Thai MOOC, a national digital learning platform launched by the Thailand Cyber University Project of the Ministry of Higher Education, Science, Research and Innovation, provide an effective way to overcome this challenge. This paper discusses the challenges faced in designing instruction for work-based skill MOOCs that can serve as a foundation model for many more to come. The instructional design of work-based skill courses in Thai MOOC involves four simple steps: course selection, learning from accredited providers, completion of course requirements, and certification of acquired skills. The development of such courses is ongoing at the higher education, vocational, and pre-university levels. The instructional design of work-based skill courses should focus on developing currently demanded professional competencies and skills, increasing the efficiency of work in the organization, and fostering creativity and happiness in life, meeting the human resources needs of industries in the 4.0 economy era in Thailand. This paper presents these design challenges and suggests effective ways to design instruction to enhance workforce development in Thailand.
ReadBouncer
(2022)
Motivation:
Nanopore sequencers allow targeted sequencing of interesting nucleotide sequences by rejecting other sequences from individual pores. This feature facilitates the enrichment of low-abundance sequences by depleting overrepresented ones in silico. Existing tools for adaptive sampling either apply signal alignment, which cannot handle human-sized reference sequences, or apply read mapping in sequence space, relying on fast graphics processing unit (GPU) base callers for real-time read rejection. Using nanopore long-read mapping tools is also not optimal when mapping the shorter reads usually analyzed in adaptive sampling applications.
Results:
Here, we present a new approach for nanopore adaptive sampling that combines fast CPU and GPU base calling with read classification based on Interleaved Bloom Filters. ReadBouncer improves the potential enrichment of low-abundance sequences through its high read classification sensitivity and specificity, outperforming existing tools in the field. It robustly removes even reads belonging to large reference sequences while running on commodity hardware without GPUs, making adaptive sampling accessible for in-field researchers. ReadBouncer also provides a user-friendly interface and installer files for end-users without a bioinformatics background.
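The classification idea — testing a read's k-mers for membership in a reference filter — can be illustrated with a plain (non-interleaved) Bloom filter. This is a didactic simplification, not ReadBouncer's implementation; all parameters are toy values.

```python
import hashlib

class BloomFilter:
    """Tiny Bloom filter over DNA k-mers. Real adaptive-sampling tools
    use interleaved Bloom filters over many reference bins; this shows
    only the core probabilistic membership test."""
    def __init__(self, size=2**16, n_hashes=3):
        self.size, self.n_hashes, self.bits = size, n_hashes, bytearray(size)

    def _positions(self, item):
        for i in range(self.n_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        # May give false positives, never false negatives.
        return all(self.bits[pos] for pos in self._positions(item))

def kmers(seq, k=5):
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

reference = BloomFilter()
for km in kmers("ACGTACGTTGCA"):   # index a toy reference sequence
    reference.add(km)

read = "CGTACGT"
hits = sum(km in reference for km in kmers(read))
print(f"{hits}/{len(kmers(read))} k-mers match -> keep or reject the read")
```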
This research paper aims to introduce a novel practitioner-oriented and research-based taxonomy of video genres. This taxonomy can serve as a scaffolding strategy to support educators throughout the entire educational system in creating videos for pedagogical purposes. A taxonomy of video genres is essential, as videos are highly valued resources among learners. Although the use of videos in education has been extensively researched and well-documented in systematic research reviews, gaps remain in the literature. Predominantly, researchers employ sophisticated quantitative methods and similar approaches to measure the performance of videos. This trend has led to the emergence of a strong learning analytics research tradition with its embedded literature, including analyses of the performance of videos in online courses such as Massive Open Online Courses (MOOCs). Surprisingly, this same literature says little about approaches to designing and creating educational videos, which applies to both video-based learning and online courses. This results in a knowledge gap and highlights the need for pedagogical tools and strategies for video making, such as frameworks, guidelines, and taxonomies, which can serve as scaffolding strategies. In contrast, there appear to be very few frameworks available for designing and creating videos for pedagogical purposes, apart from a handful of well-known ones. In this regard, this research paper proposes a novel taxonomy of video genres that educators can utilize when creating videos intended for use in either video-based learning environments or online courses. To create this taxonomy, a large number of videos from online courses were collected and analyzed using a mixed-methods research design.
From MOOC to “2M-POC”
(2023)
IFP School has been developing and producing MOOCs since 2014. After the COVID-19 crisis, the demand from our industrial and international partners for continuous training of their employees increased drastically, in an energy transition and sustainable mobility environment that is in constant and rapid evolution. It was therefore time for a new format of digital learning tools to train a large number of employees efficiently and rapidly. To address this new demand, in an increasingly digital learning environment, we have completely changed our initial MOOC model and propose an innovative SPOC business model mixing synchronous and asynchronous modules. This paper describes the work done to transform our MOOCs into a hybrid SPOC model. We changed the format itself from a standard MOOC model of several weeks to small modules of one week on average, better adapted to our clients' demands. We carefully engineered the exchanges between learners and the social aspect throughout the SPOC duration. We propose a multimodal approach combining asynchronous activities, such as online modules and exercises, with synchronous activities, such as webinars with experts and after-work sessions. Additionally, this new format increases the reuse of the MOOC resources by our professors in our own master programs.
With all these actions, we were able to reach a completion rate between 80 and 96% of total enrolled learners, compared to the 15 to 28% recorded with our original MOOC format. This holds for small groups (50–100 learners) in SPOCs as well as for large groups (more than 2500 learners) in a Massive and Multimodal Private Online Course (“2M-POC”). Today a MOOC is not a simple assembly of videos, text, discussion forums and validation exercises, but a complete multimodal learning path including social learning, personal follow-up, and synchronous and asynchronous modules. We conclude that the original MOOC format is not at all suitable for offering efficient training to companies, and we must re-engineer the learning path into a hybrid, multimodal SPOC training compatible with a cost-effective business model.
In 2020, the project “iMooX – The MOOC Platform as a Service for all Austrian Universities” was launched. It is co-financed by the Austrian Ministry of Education, Science and Research. After half of the funding period, the project management wants to assess and share results and outcomes but also address (potential) additional “impacts” of the MOOC platform. Building upon work on OER impact assessment, this contribution describes in detail how the specific iMooX.at approach of impact measurement was developed. Literature review, stakeholder analysis, and problem-based interviews were the base for developing a questionnaire addressing the defined key stakeholder “MOOC creators”. The article also presents the survey results in English for the first time but focuses more on the development, strengths, and weaknesses of the selected methods. The article is seen as a contribution to the further development of impact assessment for MOOC platforms.
Thai MOOC academy
(2023)
Thai MOOC Academy is a national digital learning platform that has been serving as a mechanism for promoting lifelong learning in Thailand since 2017. It has recently undergone significant improvements and upgrades, including the implementation of a credit bank system and a learner e-portfolio system interconnected with the platform. Thai MOOC Academy is introducing a national credit bank system for accreditation and management, which allows for the transfer of expected learning outcomes and educational qualifications between formal, non-formal, and informal education. The credit bank system has five distinct features: issuing forgery-proof certificates, recording learning results, transferring external credits within the same wallet, accumulating learning results, and creating a QR code for verification purposes. The paper discusses the features and future potential of Thai MOOC Academy as it is extended towards a sandbox for the national credit bank system in Thailand.
The MOOChub is a joint web-based catalog of all relevant German and Austrian MOOC platforms that lists well over 750 Massive Open Online Courses (MOOCs). Automatically building such a catalog requires that all partners describe and publicly offer the metadata of their courses in the same way. The paper at hand presents the genesis of the idea to establish a common metadata standard and the story of its subsequent development. The result of this effort is, first, an openly licensed de-facto standard, based on existing, commonly used standards, and second, a first prototypical platform using this standard: the MOOChub, which lists all courses of the involved partners. This catalog is searchable and provides a comprehensive overview of essentially all MOOCs offered by German and Austrian MOOC platforms. Finally, the upcoming developments to further optimize the catalog and the metadata standard are reported.
Digital technologies have enabled a variety of learning offers that opened new challenges in terms of recognition of formal, informal and non-formal learning, such as MOOCs.
This paper focuses on how providing relevant data to describe a MOOC helps to increase the transparency of information and, ultimately, the flexibility of European higher education.
The EU-funded project ECCOE took up these challenges and developed a solution by identifying the most relevant descriptors of a learning opportunity with a view to supporting a European system for micro-credentials. Descriptors indicate the specific properties of a learning opportunity according to European standards. They can provide a recognition framework also for small volumes of learning (micro-credentials) to support the integration of non-formal learning (MOOCs) into formal learning (e.g. institutional university courses) and to tackle skills shortage, upskilling and reskilling by acquiring relevant competencies. The focus on learning outcomes can facilitate the recognition of skills and competences of students and enhance both virtual and physical mobility and employability.
This paper presents two contexts where ECCOE descriptors have been adopted: the Politecnico di Milano MOOC platform (Polimi Open Knowledge – POK), which is using these descriptors as the standard information to document the features of its learning opportunities, and the EU-funded Uforest project on urban forestry, which developed a blended training program for students of partner universities whose MOOCs used the ECCOE descriptors.
Practice with ECCOE descriptors shows how they can be used not only to detail MOOC features, but also as a compass to design the learning offer. In addition, some rules of thumb can be derived and applied when using specific descriptors.
Founded in 2013, OpenClassrooms is a French online learning company that offers both paid courses and free MOOCs on a wide range of topics, including computer science and education. In 2021, in partnership with the EDA research unit, OpenClassrooms shared a database to address the problem of how to increase persistence in their paid courses, which consist of a series of MOOCs and human mentoring. Our statistical analysis aims to identify reasons for dropouts that are due to course design rather than demographic predictors or external factors. We aim to identify at-risk students, i.e. those who are on the verge of dropping out at a specific moment. To achieve this, we use learning analytics to characterize student behavior. We conducted data analysis on a sample of data related to the “Web Designers” and “Instructional Design” courses. By visualizing the student flow and constructing speed and acceleration predictors, we can identify which parts of the course need to be calibrated and when particular attention should be paid to these at-risk students.
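Speed and acceleration predictors of the kind mentioned can be derived from completion timestamps by simple differencing. A pandas sketch on an invented learner log (not the OpenClassrooms data):

```python
import pandas as pd

# Toy activity log for one learner: completion timestamps of course items.
log = pd.DataFrame({
    "item": [1, 2, 3, 4, 5],
    "completed_at": pd.to_datetime([
        "2021-09-01", "2021-09-03", "2021-09-04", "2021-09-10", "2021-09-20",
    ]),
})

# Speed: items per day between consecutive completions;
# acceleration: the change in that speed. A falling speed with negative
# acceleration is one plausible at-risk signal.
days = log["completed_at"].diff().dt.days
log["speed"] = 1 / days
log["acceleration"] = log["speed"].diff()
print(log[["item", "speed", "acceleration"]])
```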
The main aim of this article is to explore how learning analytics and synchronous collaboration could improve course completion and learner outcomes in MOOCs, which traditionally have been delivered asynchronously. Based on our experience with developing BigBlueButton, a virtual classroom platform that provides educators with live analytics, this paper explores three scenarios with business focused MOOCs to improve outcomes and strengthen learned skills.
Modularization describes the transformation of MOOCs from a comprehensive academic course format into smaller, more manageable learning offerings. It can be seen as one of the prerequisites for the successful implementation of MOOC-based micro-credentials in professional education and training. This short paper reports on the development and application of a modularization framework for Open Online Courses. Using the example of eGov-Campus, a German MOOC provider for the public sector linked to both academia and formal professional development, the structural specifications for modularized MOOC offerings and a methodology for course transformation as well as associated challenges in technology, organization and educational design are outlined. Following on from this, future prospects are discussed under the headings of individualization, certification and integration.
This work explores the use of different generative AI tools in the design of MOOC courses. The authors employed a variety of AI-based tools, including natural language processing tools (e.g. ChatGPT) and multimedia content authoring tools (e.g. DALL-E 2, Midjourney, Tome.ai), to assist in the course design process. The aim was to address the unique challenges of MOOC course design, which include creating engaging and effective content, designing interactive learning activities, and assessing student learning outcomes. The authors identified positive results with the incorporation of AI-based tools, which significantly improved the quality and effectiveness of MOOC course design. The tools proved particularly effective in analyzing and categorizing course content, identifying key learning objectives, and designing interactive learning activities that engaged students and facilitated learning. Moreover, the use of AI-based tools streamlined the course design process, significantly reducing the time required to design and prepare the courses. In conclusion, the integration of generative AI tools into the MOOC course design process holds great potential for improving the quality and efficiency of these courses. Researchers and course designers should consider incorporating generative AI tools into their design process to enhance their course offerings and facilitate student learning outcomes while reducing the time and effort required for course development.
The massive growth of MOOCs in 2011 laid the groundwork for the achievement of SDG 4. With the various benefits of MOOCs, there is also anticipation that online education should focus on more interactivity and global collaboration. In this context, the Global MOOC and Online Education Alliance (GMA) established a diverse group of 17 world-leading universities and three online education platforms from across 14 countries on all six continents in 2020. Through nearly three years of exploration, GMA has gained experience and achieved progress in fostering global cooperation in higher education. First, in joint teaching, GMA has promoted in-depth cooperation between members inside and outside the alliance. Examples include promoting the exchange of high-quality MOOCs, encouraging the creation of Global Hybrid Classroom, and launching Global Hybrid Classroom Certificate Programs. Second, in capacity building and knowledge sharing, GMA has launched Online Education Dialogues and the Global MOOC and Online Education Conference, inviting global experts to share best practices and attracting more than 10 million viewers around the world. Moreover, GMA is collaborating with international organizations to support teachers’ professional growth, create an online learning community, and serve as a resource for further development. Third, in public advocacy, GMA has launched the SDG Hackathon and Global Massive Open Online Challenge (GMOOC) and attracted global learners to acquire knowledge and incubate their innovative ideas within a cross-cultural community to solve real-world problems that all humans face and jointly create a better future. Based on past experiences and challenges, GMA will explore more diverse cooperation models with more partners utilizing advanced technology, provide more support for digital transformation in higher education, and further promote global cooperation towards building a human community with a shared future.
This research paper provides an overview of the current state of MOOCs (massive open online courses) and universities in Austria, focusing on the national MOOC platform iMooX.at. The study begins by presenting the results of an analysis of the performance agreements of 22 Austrian public universities for the period 2022–2024, with a specific focus on the mention of MOOC activities and iMooX. The authors find that 12 of 22 (55 %) Austrian public universities use at least one of these terms, indicating a growing interest in MOOCs and online learning. Additionally, the authors analyze internal documentation data to share insights into how many universities in Austria have produced and/or used a MOOC on the iMooX platform since its launch in 2014. These findings provide a valuable measure of the current usage and monitoring of MOOCs and iMooX among Austrian higher education institutions. Overall, this research contributes to a better understanding of the current state of MOOCs and their integration within Austrian higher education.
Based on the performance requirements of modern spatio-temporal data mining applications, in-memory database systems are often used to store and process the data. To efficiently utilize the scarce DRAM capacities, modern database systems support various tuning possibilities to reduce the memory footprint (e.g., data compression) or increase performance (e.g., additional indexes). However, the selection of cost- and performance-balancing configurations is challenging due to the vast number of possible setups consisting of mutually dependent individual decisions. In this paper, we introduce a novel approach to jointly optimize the compression, sorting, indexing, and tiering configuration for spatio-temporal workloads. Further, we consider horizontal data partitioning, which enables the independent application of different tuning options at a fine-grained level. We propose different linear programming (LP) models addressing cost dependencies at different levels of accuracy to compute optimized tuning configurations for a given workload and memory budget. To yield maintainable and robust configurations, we extend our LP-based approach to incorporate reconfiguration costs as well as a worst-case optimization for potential workload scenarios. Finally, we demonstrate on a real-world dataset that our models make it possible to significantly reduce the memory footprint at equal performance, or to increase performance at equal memory size, compared to existing tuning heuristics.
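As a rough illustration of what such an LP can look like, the following sketch (not the paper's actual model; all option names, costs, and the budget are made up) assigns one tuning option per horizontal partition so that total workload cost is minimized under a memory budget, using the PuLP package:

```python
import pulp

# Illustrative inputs: per-partition workload cost and memory footprint of
# each tuning option. All names and numbers are invented for this sketch.
partitions = ["p0", "p1", "p2"]
options = ["plain", "compressed", "indexed"]
cost = {"plain":      {"p0": 9,  "p1": 7,  "p2": 8},
        "compressed": {"p0": 12, "p1": 10, "p2": 11},
        "indexed":    {"p0": 4,  "p1": 3,  "p2": 5}}
memory = {"plain":      {"p0": 6, "p1": 5, "p2": 6},
          "compressed": {"p0": 2, "p1": 2, "p2": 2},
          "indexed":    {"p0": 9, "p1": 8, "p2": 9}}
budget = 16

model = pulp.LpProblem("tuning", pulp.LpMinimize)
x = {(o, p): pulp.LpVariable(f"x_{o}_{p}", cat="Binary")
     for o in options for p in partitions}

# Objective: total workload cost of the chosen configuration.
model += pulp.lpSum(cost[o][p] * x[o, p] for o in options for p in partitions)

# Each partition is assigned exactly one tuning option.
for p in partitions:
    model += pulp.lpSum(x[o, p] for o in options) == 1

# The overall configuration must fit into the memory budget.
model += pulp.lpSum(memory[o][p] * x[o, p]
                    for o in options for p in partitions) <= budget

model.solve(pulp.PULP_CBC_CMD(msg=False))
print({p: o for (o, p), var in x.items() if var.value() == 1})
```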
This short paper proposes a novel learning design that facilitates cooperative learning in which students do not conduct traditional group work in an asynchronous online education setting. The learning design will be explored in a Small Private Online Course (SPOC) among teachers and school managers at a teacher education institution. Such an approach is made possible by applying specific criteria commonly used to define collaborative learning. Collaboration can be defined, among other things, as a structured way of working among students that includes elements of co-laboring. The cooperative learning design involves adapting various traditional collaborative learning approaches for use in an online learning environment. A critical component of this learning design is that students work on a self-defined case project related to their professional practices. Through an iterative process, students receive ongoing feedback and formative assessments from instructors and fellow students at specific points, meaning that co-construction of knowledge and learning takes place as the SPOC progresses. This learning design can contribute to better learning experiences and outcomes for students, and be a valuable contribution to current research discussions on learning design in Massive Open Online Courses (MOOCs).
This qualitative study explores the impact of Personalized Learning Experience (PLE) courses at a higher education institution from the perspective of undergraduate students. The PLE program requires students to take at least one of their elective courses in the form of MOOCs during their undergraduate studies. Drawing on interviews with six students across different faculties, the study identified four key themes that encapsulate the effects of PLE courses: (1) certificate-driven learning with a focus on occupational skill enhancement, (2) diverse course offerings that enhance personal and academic development, (3) learning flexibility, and (4) student satisfaction. The findings suggest that PLE courses offered through MOOC platforms allow students to broaden their academic horizons, gain valuable skills, and tailor their education to better align with their interests and goals. Furthermore, this study highlights the potential benefits of incorporating PLE courses in higher education institutions, emphasizing their role in promoting a more dynamic and student-centered learning environment.
In an effort to describe and produce different formats of video instruction, the research community in technology-enhanced learning, and MOOC scholars in particular, has focused on the general style of video production: whether it is a digitally scripted “talk-and-chalk” or a “talking head” version of a learning unit. Since these production styles comprise various sub-elements, this paper deconstructs the inherited elements of video production in the context of educational live-streams. Analyzing over 700 videos from both synchronous and asynchronous modalities of large video-based platforms (YouTube and Twitch), we identified 92 features in eight categories of video production. These include commonly analyzed features such as the use of a green screen and a visible instructor, but also less studied features such as social media connections and changing camera perspectives depending on the topic being covered. Overall, the results enable an analysis of common video production styles and provide a toolbox for categorizing new formats – independent of their final (a)synchronous use in MOOCs. Keywords: video production, MOOC video styles, live-streaming.
This paper investigates private university students’ language learning activities on MOOC platforms and their attitudes toward them. The study explores the development of MOOC use in Chinese private universities, with a focus on two modes: online and blended. We conducted empirical studies with students learning French and Japanese as a second foreign language, using questionnaires (N = 387) and interviews (N = 20) at a private university in Wuhan. Our results revealed that the majority of students used the MOOC platform more than twice a week and focused on the MOOC videos, materials, and assignments. However, we also found that students showed less interest in online communication (forums). Those who worked in the blended learning mode, especially students learning Japanese, had a more positive attitude toward MOOCs than other students.
Recent findings suggest a role of oxytocin in the tendency to spontaneously mimic the emotional facial expressions of others. Oxytocin-related increases in facial mimicry, however, seem to depend on contextual factors. Given previous literature showing that people preferentially mimic emotional expressions of individuals associated with high (vs. low) rewards, we examined whether the reward value of the mimicked agent is one factor influencing oxytocin effects on facial mimicry. To test this hypothesis, 60 male adults received 24 IU of either intranasal oxytocin or placebo in a double-blind, between-subject experiment. Next, the value of male neutral faces was manipulated using an associative learning task with monetary rewards. After the reward associations were learned, participants watched videos of the same faces displaying happy and angry expressions. Facial reactions to the emotional expressions were measured with electromyography. We found that participants judged the face identities associated with high reward values as more pleasant than those associated with low reward values. However, happy expressions of low-rewarding faces were mimicked more spontaneously than those of high-rewarding faces. Contrary to our expectations, we did not find a significant direct effect of intranasal oxytocin on facial mimicry, nor on the reward-driven modulation of mimicry. Our results support the notion that mimicry is a complex process that depends on contextual factors, but they fail to provide conclusive evidence of a role of oxytocin in the modulation of facial mimicry.
Somatosensory input generated by one's actions (i.e., self-initiated body movements) is generally attenuated. Conversely, externally caused somatosensory input is enhanced, for example, during active touch and the haptic exploration of objects. Here, we used functional magnetic resonance imaging (fMRI) to ask how the brain accomplishes this delicate weighting of self-generated versus externally caused somatosensory components. Finger movements were either self-generated by our participants or induced by functional electrical stimulation (FES) of the same muscles. During half of the trials, electrotactile impulses were administered when the (actively or passively) moving finger reached a predefined flexion threshold. fMRI revealed an interaction effect in the contralateral posterior insular cortex (pIC), which responded more strongly to touch during self-generated than during FES-induced movements. A network analysis via dynamic causal modeling revealed that connectivity from the secondary somatosensory cortex via the pIC to the supplementary motor area was generally attenuated during self-generated relative to FES-induced movements, yet specifically enhanced by touch received during self-generated, but not FES-induced, movements. Together, these results suggest a crucial role of the parietal operculum and the posterior insula in differentiating self-generated from externally caused somatosensory information received from one's moving limb.
“One video fit for all”
(2023)
Online learning in mathematics has always been challenging, especially for mathematics in STEM education. This paper presents how to make “one fit for all” lecture videos for mathematics in STEM education. In general, we believe that there is no such thing as a “one fit for all” video. The curriculum requires a high level of prior knowledge in mathematics from high school for a good understanding, and the variation in prior knowledge levels among STEM education students is often high. This creates challenges for both online teaching and on-campus teaching. This article presents experiments and research on a video format in which students get a real-time feeling and which fits their needs regarding their existing prior knowledge. They can ask questions and receive answers during the video without feeling that they must jump between different sources, which helps to reduce unnecessary distractions. The fundamental video format presented here is that of dynamic branching videos, which has so far received little attention in educational research. One reason might be that this field is quite new to higher education, and the platforms available so far place relatively high demands on teachers’ video editing skills. The videos were implemented for engineering students taking the Linear Algebra course at the Norwegian University of Science and Technology in spring 2023. Feedback from the students gathered via anonymous surveys so far (N = 21) is very positive. Given its high suitability for online teaching, this video format might lead the trend of online learning in the future. The design and implementation of dynamic videos for mathematics in higher education was presented for the first time at the EMOOCs conference 2023.
Rigorous runtime analysis is a major approach towards understanding evolutionary computing techniques, and in this area linear pseudo-Boolean objective functions play a central role. Adding a linear constraint makes the problem equivalent to the NP-hard Knapsack problem, certain classes of which have been studied in recent works. In this article, we present a dynamic model of optimizing linear functions under uniform constraints. Starting from an optimal solution with respect to a given constraint bound, we investigate the runtimes that different evolutionary algorithms need to recompute an optimal solution when the constraint bound changes by a certain amount. The classical (1+1) EA and several population-based algorithms are designed for that purpose and shown to recompute optimal solutions efficiently. Furthermore, a variant of the (1+(λ,λ)) GA for the dynamic optimization problem is studied, whose performance is better when the change of the constraint bound is small.
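To illustrate the kind of dynamic setting studied here, the following minimal sketch (an illustrative toy, not the algorithms analyzed in the article) runs a (1+1) EA on a linear function under a uniform (cardinality) constraint and re-optimizes from the old optimum after the bound changes:

```python
import random

def fitness(x, w, B):
    """Linear function w·x under the uniform constraint sum(x) <= B."""
    excess = sum(x) - B
    if excess > 0:
        return -excess                      # infeasible: reduce violation first
    return sum(wi * xi for wi, xi in zip(w, x))

def one_plus_one_ea(x, w, B, steps=20000):
    n = len(x)
    fx = fitness(x, w, B)
    for _ in range(steps):
        # Standard bit-flip mutation: flip each bit with probability 1/n.
        y = [1 - xi if random.random() < 1.0 / n else xi for xi in x]
        fy = fitness(y, w, B)
        if fy >= fx:                        # accept if not worse
            x, fx = y, fy
    return x

n = 20
w = [random.randint(1, 10) for _ in range(n)]
x = one_plus_one_ea([0] * n, w, B=10)   # optimize under the initial bound
x = one_plus_one_ea(x, w, B=8)          # recompute after the bound decreases
print(sum(x), fitness(x, w, 8))
```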
Academia-industry collaborations are beneficial when both sides bring strengths to the partnership and the collaboration outcome is of mutual benefit. These types of collaboration projects are seen as a low-risk learning opportunity for both parties. In this paper, we discuss government initiatives that can change the business landscape and academia-industry collaborations that can provide upskilling opportunities to fill emerging business needs. In light of Japan’s push for next-level modernization, a Japanese software company took a positive stance towards building new capabilities beyond what it had been offering its customers. Consequently, an academic research group is laying out infrastructure for learning analytics research. An existing learning analytics dashboard was modularized to allow the research group to focus on natural language processing experiments while the software company explores a development framework suitable for data visualization techniques and artificial intelligence development. The results of this endeavor demonstrate that companies working with academia can creatively explore collaborations outside typical university-supported avenues.
The TU Delft Extension School for Continuing Education develops and delivers MOOCs, programs and other online courses for lifelong learners and professionals worldwide focused on Science, Engineering & Design. At the beginning of 2022, we started a project to examine whether creating an online course had any impact on TU Delft campus education. Through a survey, we collected feedback from 68 TU Delft lecturers involved in developing and offering online courses and programs for lifelong learners and professionals. The lecturers reported on the impact of developing an online course on a personal and curricular level. The results showed that the developed online materials, and the acquired skills and experiences from creating online courses, were beneficial for campus education, especially during the transition to remote emergency teaching in the COVID-19 lockdown periods. In this short paper, we will describe the responses in detail and map the benefits and challenges experienced by lecturers when implementing their online course materials and newly acquired educational skills on campus. Finally, we will explore future possibilities to extend the reported, already relevant, impact of MOOCs and of other online courses on campus education.
ganon
(2020)
Motivation:
The exponential growth of assembled genome sequences greatly benefits metagenomics studies. However, currently available methods struggle to manage the increasing number of sequences and their frequent updates. Indexing the current RefSeq can take days and hundreds of gigabytes of memory on large servers. Few methods have addressed these issues thus far, and even though many can theoretically handle large amounts of references, time and memory requirements are prohibitive in practice. As a result, many studies that require sequence classification often use outdated and almost never truly up-to-date indices.
Results:
Motivated by those limitations, we created ganon, a k-mer-based read classification tool that uses Interleaved Bloom Filters in conjunction with a taxonomic clustering and a k-mer counting/filtering scheme. Ganon provides an efficient method for indexing references and keeping the indices updated. It requires <55 min to index the complete RefSeq of bacteria, archaea, fungi, and viruses, and it can further keep these indices up-to-date in a fraction of the time necessary to create them. Ganon makes it possible to query against very large reference sets, and it therefore classifies significantly more reads and identifies more species than similar methods. When classifying a high-complexity CAMI challenge dataset against complete genomes from RefSeq, ganon shows strongly increased precision with equal or better sensitivity compared with state-of-the-art tools. With the same dataset against the complete RefSeq, ganon improved the F1-score by 65% at the genus level. It supports taxonomy- and assembly-level classification, multiple indices, and hierarchical classification.
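The core building block, probabilistic k-mer membership via Bloom filters, can be illustrated in a few lines of Python. This sketch is a plain Bloom filter, not ganon's interleaved variant, and all parameters and sequences are illustrative:

```python
import hashlib

K, M, HASHES = 31, 1 << 20, 3  # k-mer length, filter size in bits, hash count

def kmers(seq, k=K):
    return (seq[i:i + k] for i in range(len(seq) - k + 1))

def positions(kmer):
    # Derive HASHES bit positions from salted SHA-256 digests.
    for i in range(HASHES):
        digest = hashlib.sha256(f"{i}:{kmer}".encode()).digest()
        yield int.from_bytes(digest[:8], "big") % M

bloom = bytearray(M // 8)

def insert(kmer):
    for pos in positions(kmer):
        bloom[pos // 8] |= 1 << (pos % 8)

def contains(kmer):
    return all(bloom[pos // 8] & (1 << (pos % 8)) for pos in positions(kmer))

reference = "ACGT" * 50                      # illustrative reference sequence
for km in kmers(reference):
    insert(km)

read = "ACGT" * 10                           # illustrative read
matches = sum(contains(km) for km in kmers(read))
print(f"{matches} of {len(read) - K + 1} k-mers match the reference")
```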
An independency (cliquy) tree of an n-vertex graph G is a spanning tree of G in which the set of leaves induces an independent set (clique). We study the problems of minimizing or maximizing the number of leaves of such trees, and fully characterize their parameterized complexity. We show that all four variants of deciding if an independency/cliquy tree with at least/most l leaves exists parameterized by l are either Para-NP- or W[1]-hard. We prove that minimizing the number of leaves of a cliquy tree parameterized by the number of internal vertices is Para-NP-hard too. However, we show that minimizing the number of leaves of an independency tree parameterized by the number k of internal vertices has an O*(4^k)-time algorithm and a 2k-vertex kernel. Moreover, we prove that maximizing the number of leaves of an independency/cliquy tree parameterized by the number k of internal vertices both have an O*(18^k)-time algorithm and an O(k · 2^k)-vertex kernel, but no polynomial kernel unless the polynomial hierarchy collapses to the third level. Finally, we present an O(3^n · f(n))-time algorithm to find a spanning tree where the leaf set has a property that can be decided in f(n) time and has minimum or maximum size.
We present fully polynomial time approximation schemes for a broad class of Holant problems with complex edge weights, which we call Holant polynomials. We transform these problems into partition functions of abstract combinatorial structures known as polymers in statistical physics. Our method involves establishing zero-free regions for the partition functions of polymer models and using the most significant terms of the cluster expansion to approximate them. Results of our technique include new approximation and sampling algorithms for a diverse class of Holant polynomials in the low-temperature regime (i.e., small external field) and approximation algorithms for general Holant problems with small signature weights. Additionally, we give randomised approximation and sampling algorithms with faster running times for more restrictive classes. Finally, we improve the known zero-free regions for the perfect matching polynomial.
Patent document collections are an immense source of knowledge for research and innovation communities worldwide. The rapid growth of the number of patent documents poses an enormous challenge for retrieving and analyzing information from this source in an effective manner. Based on deep learning methods for natural language processing, novel approaches have been developed in the field of patent analysis. The goal of these approaches is to reduce costs by automating tasks that previously only domain experts could solve. In this article, we provide a comprehensive survey of the application of deep learning for patent analysis. We summarize the state-of-the-art techniques and describe how they are applied to various tasks in the patent domain. In a detailed discussion, we categorize 40 papers based on the dataset, the representation, and the deep learning architecture that were used, as well as the patent analysis task that was targeted. With our survey, we aim to foster future research at the intersection of patent analysis and deep learning and we conclude by listing promising paths for future work.
In discrete manufacturing, knowledge about causal relationships makes it possible to avoid unforeseen production downtimes by identifying their root causes. Learning causal structures from real-world settings remains challenging due to high-dimensional data, a mix of discrete and continuous variables, and the need to preprocess raw log data from a causal perspective. In our work, we address these challenges by proposing a process for causal reasoning based on raw machine log data from production monitoring. Within this process, we define a set of transformation rules to extract independent and identically distributed observations. Further, we incorporate a variable selection step to handle high dimensionality and a discretization step to include continuous variables. We enrich a commonly used causal structure learning algorithm with domain-related orientation rules, which provides a basis for causal reasoning. We demonstrate the process on a real-world dataset from a globally operating precision mechanical engineering company, containing over 40 million log entries from the production monitoring of a single machine. In this context, we determine the causal structures embedded in operational processes. Further, we examine causal effects to support machine operators in avoiding unforeseen production stops, i.e., by preventing them from drawing false conclusions about the impacting factors of such stops based on experience alone.
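Two of the preprocessing steps named above, discretization of continuous variables and statistical dependence checks, can be sketched as follows. This toy omits the transformation rules and the actual causal structure learning algorithm, and the data are simulated:

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

# Simulated log-derived observations: a continuous sensor reading and a
# binary flag for unforeseen production stops (both illustrative).
rng = np.random.default_rng(1)
log = pd.DataFrame({
    "spindle_temp": rng.normal(60, 5, 500),
    "stop": rng.binomial(1, 0.1, 500),
})

# Discretization step: bin the continuous variable into three categories.
log["temp_bin"] = pd.cut(log["spindle_temp"], bins=3,
                         labels=["low", "mid", "high"])

# Pairwise independence check between the discretized variable and the flag.
table = pd.crosstab(log["temp_bin"], log["stop"])
chi2, p, dof, _ = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, p = {p:.3f}")   # large p: no dependence detected
```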
RHEEMix in the data jungle
(2020)
Data analytics is moving beyond the limits of a single platform. In this paper, we present the cost-based optimizer of Rheem, an open-source cross-platform system that copes with these new requirements. The optimizer allocates the subtasks of data analytic tasks to the most suitable platforms. Our main contributions are: (i) a mechanism based on graph transformations to explore alternative execution strategies; (ii) a novel graph-based approach to determine efficient data movement plans among subtasks and platforms; and (iii) an efficient plan enumeration algorithm based on a novel enumeration algebra. We extensively evaluate our optimizer on diverse real-world tasks and show that, by using multiple platforms, it can perform tasks more than one order of magnitude faster than with a single platform.
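The flavor of cost-based cross-platform optimization can be conveyed by a deliberately tiny sketch: brute-forcing the platform assignment of each operator in a linear pipeline while charging for data movement between platforms. Rheem's enumeration algebra and graph transformations are far more general; every name and number below is invented for illustration:

```python
from itertools import product

# Toy cost model: per-operator execution costs on each platform, plus a flat
# cost whenever consecutive operators run on different platforms.
operators = ["parse", "join", "aggregate"]
platforms = ["Spark", "Flink", "JavaStreams"]
run_cost = {"parse":     {"Spark": 5, "Flink": 6, "JavaStreams": 2},
            "join":      {"Spark": 3, "Flink": 4, "JavaStreams": 9},
            "aggregate": {"Spark": 4, "Flink": 3, "JavaStreams": 7}}
move_cost = 2

def plan_cost(assignment):
    total = sum(run_cost[op][pl] for op, pl in zip(operators, assignment))
    # Charge data movement between consecutive operators on different platforms.
    total += sum(move_cost for a, b in zip(assignment, assignment[1:]) if a != b)
    return total

best = min(product(platforms, repeat=len(operators)), key=plan_cost)
print(best, plan_cost(best))
```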
While supporting the execution of business processes, information systems record event logs. Conformance checking relies on these logs to analyze whether the recorded behavior of a process conforms to the behavior of a normative specification. A key assumption of existing conformance checking techniques, however, is that all events are associated with timestamps that allow one to infer a total order of events per process instance. Unfortunately, this assumption is often violated in practice: due to synchronization issues, manual event recordings, or data corruption, events are only partially ordered. In this paper, we put forward the problem of partial order resolution of event logs to close this gap. It refers to the construction of a probability distribution over all possible total orders of the events of an instance. To cope with the order uncertainty in real-world data, we present several estimators for this task, incorporating different notions of behavioral abstraction. Moreover, to reduce the runtime of conformance checking based on partial order resolution, we introduce an approximation method that comes with a bounded error in terms of accuracy. Our experiments with real-world and synthetic data reveal that our approach improves accuracy over the state-of-the-art considerably.
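A naive baseline for partial order resolution, assigning uniform probability to every total order consistent with the known ordering constraints, can be sketched as follows (the paper's estimators weight orders by behavioral abstractions instead; the events and constraints below are illustrative):

```python
from itertools import permutations

# Events of one process instance and the known "happens-before" pairs.
events = ["register", "check_credit", "notify", "archive"]
before = {("register", "check_credit"), ("register", "notify"),
          ("check_credit", "archive"), ("notify", "archive")}

def respects(order):
    pos = {e: i for i, e in enumerate(order)}
    return all(pos[a] < pos[b] for a, b in before)

# All linear extensions of the partial order, with uniform probabilities.
extensions = [order for order in permutations(events) if respects(order)]
p = 1 / len(extensions)
for order in extensions:
    print(order, f"probability {p:.2f}")
```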
This paper discusses the fitting of linear state space models to given multivariate time series in the presence of constraints imposed on the four main parameter matrices of these models. Constraints arise partly from the assumption that the models have a block-diagonal structure, with each block corresponding to an ARMA process, which allows the reconstruction of independent source components from linear mixtures, and partly from the need to keep models identifiable. The first stage of parameter fitting is performed by the expectation maximisation (EM) algorithm. Due to the identifiability constraint, a subset of the diagonal elements of the dynamical noise covariance matrix needs to be constrained to fixed values (usually unity). For this kind of constraint, no closed-form update rules have been available so far. We present new update rules for this situation, both for updating the dynamical noise covariance matrix directly and for updating a matrix square root of this matrix. The practical applicability of the proposed algorithm is demonstrated by a low-dimensional simulation example. The behaviour of the EM algorithm, as observed in this example, illustrates the well-known fact that in practical applications the EM algorithm should be combined with a different algorithm for numerical optimisation, such as a quasi-Newton algorithm.
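For reference, a linear state space model of this kind is commonly written as follows, where A (block-diagonal), C, Q, and R are the four parameter matrices and the identifiability constraint fixes selected diagonal entries of the dynamical noise covariance Q; the notation here is a standard form, not necessarily the paper's:

```latex
% Standard linear-Gaussian state space model (illustrative notation):
% A: state transition (block-diagonal), C: observation/mixing matrix,
% Q: dynamical noise covariance, R: observation noise covariance.
\begin{aligned}
x_{t+1} &= A\,x_t + w_t, & w_t &\sim \mathcal{N}(0, Q),\\
y_t     &= C\,x_t + v_t, & v_t &\sim \mathcal{N}(0, R).
\end{aligned}
```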
Many participants in Massive Open Online Courses are full-time employees seeking greater flexibility in their time commitment and the available learning paths. We recently addressed these requirements by splitting up our 6-week courses into three 2-week modules followed by a separate exam. Modularizing courses offers many advantages: shorter modules are more sustainable and can be combined, reused, and incorporated into learning paths more easily. Time flexibility for learners is also improved, as exams can now be offered multiple times per year, while the learning content is available independently. In this article, we answer the question of what impact this modularization has on key learning metrics, such as course completion rates, learning success, and no-show rates. Furthermore, we investigate the influence of longer breaks between modules on these metrics. According to our analysis, course modules facilitate more selective learning behaviors that encourage learners to focus on the topics they are most interested in. At the same time, participation in overarching exams across all modules seems to be less appealing than an integrated exam in a 6-week course. While breaks between the modules increase the distinctive appearance of individual modules, a break before the final exam further reduces initial interest in the exams. We further reveal that participation in self-paced courses as a preparation for the final exam is unlikely to attract new learners to the course offerings, even though learners' performance is comparable to instructor-paced courses. The results of our long-term study on course modularization provide a solid foundation for future research and enable educators to make informed decisions about the design of their courses.
StudyMe
(2022)
N-of-1 trials are multi-crossover self-experiments that allow individuals to systematically evaluate the effect of interventions on their personal health goals. Although several tools for N-of-1 trials exist, there is a gap in supporting non-experts in conducting their own user-centric trials. In this study, we present StudyMe, an open-source mobile application that is freely available from https://play.google.com/store/apps/details?id=health.studyu.me and offers users flexibility and guidance in configuring every component of their trials. We also present the research that informed the development of StudyMe, focusing on trial creation. Through an initial survey with 272 participants, we learned that individuals are interested in a variety of personal health aspects and have unique ideas on how to improve them. In an iterative, user-centered development process with intermediate user tests, we developed StudyMe, which features an educational component that communicates N-of-1 trial concepts. A final empirical evaluation of StudyMe showed that all participants were able to create their own trials successfully and that the app achieved a very good usability rating. Our findings suggest that StudyMe provides a significant step towards enabling individuals to apply a systematic, science-oriented approach to personalize health-related interventions and behavior modifications in their everyday lives.
Here we present an exome-wide rare genetic variant association study for 30 blood biomarkers in 191,971 individuals in the UK Biobank. We compare gene-based association tests for separate functional variant categories to increase interpretability and identify 193 significant gene-biomarker associations. Genes associated with biomarkers were ~4.5-fold enriched for conferring Mendelian disorders. In addition to performing weighted gene-based variant collapsing tests, we design and apply variant-category-specific kernel-based tests that integrate quantitative functional variant effect predictions for missense variants, splicing and the binding of RNA-binding proteins. For these tests, we present a computationally efficient combination of the likelihood-ratio and score tests that found 36% more associations than the score test alone while also controlling the type-1 error. Kernel-based tests identified 13% more associations than their gene-based collapsing counterparts and had advantages in the presence of gain-of-function missense variants. We introduce local collapsing by amino acid position for missense variants and use it to interpret associations and identify potential novel gain-of-function variants in PIEZO1. Our results show the benefits of investigating different functional mechanisms when performing rare-variant association tests, and demonstrate pervasive rare-variant contribution to biomarker variability.
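A minimal example of the gene-based collapsing idea, ignoring the weighting and kernel machinery of the study, is a simple burden test on simulated data: carriers of any rare variant in a gene are collapsed into a 0/1 indicator, which is then tested against a quantitative biomarker.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 2000
# Simulated genotypes: 12 rare variants (carrier frequency 0.5%) in one gene.
genotypes = rng.binomial(1, 0.005, size=(n, 12))
# Collapse: does the individual carry any rare variant in the gene?
burden = (genotypes.sum(axis=1) > 0).astype(float)
# Simulated biomarker with a true carrier effect of 0.8 standard deviations.
biomarker = 0.8 * burden + rng.normal(size=n)

result = stats.linregress(burden, biomarker)
print(f"beta = {result.slope:.2f}, p = {result.pvalue:.2e}")
```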
Piloting a Survey-Based Assessment of Transparency and Trustworthiness with Three Medical AI Tools
(2022)
Artificial intelligence (AI) offers the potential to support healthcare delivery, but poorly trained or validated algorithms bear risks of harm. Ethical guidelines state transparency about model development and validation as a requirement for trustworthy AI. Abundant guidance exists on providing transparency through reporting, but poorly reported medical AI tools are common. To close this transparency gap, we developed and piloted a framework to quantify the transparency of medical AI tools with three use cases. Our framework comprises a survey to report on the intended use, training and validation data and processes, ethical considerations, and deployment recommendations. The transparency of each response was scored with 0, 0.5, or 1 to reflect whether the requested information was not, partially, or fully provided. Additionally, we assessed on an analogous three-point scale whether the provided responses fulfilled the transparency requirement for a set of trustworthiness criteria from ethical guidelines. The degree of transparency and trustworthiness was calculated on a scale from 0% to 100%. Our assessment of three medical AI use cases pinpointed reporting gaps and resulted in transparency scores of 67% for two use cases and 59% for the third. We report anecdotal evidence that business constraints and limited information from external datasets were major obstacles to providing transparency for the three use cases. The observed transparency gaps also lowered the degree of trustworthiness, indicating compliance gaps with ethical guidelines. All three pilot use cases faced challenges in providing transparency about medical AI tools, but more studies are needed to investigate these challenges in the wider medical AI sector. Applying this framework for an external assessment of transparency may be infeasible if business constraints prevent the disclosure of information. New strategies may be necessary to enable audits of medical AI tools while preserving business secrets.
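The scoring scheme itself is simple enough to state as code; the following sketch mirrors the description above, with illustrative item scores:

```python
# Each survey item is scored 0 (not provided), 0.5 (partially provided), or
# 1 (fully provided); the transparency degree is the achieved share of the
# maximum score, expressed as a percentage.
def transparency_degree(item_scores):
    assert all(s in (0, 0.5, 1) for s in item_scores)
    return 100 * sum(item_scores) / len(item_scores)

# Illustrative responses for a hypothetical use case:
print(f"{transparency_degree([1, 1, 0.5, 0, 1, 0.5]):.0f}%")  # -> 67%
```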
Privacy regulations and the physical distribution of heterogeneous data are often primary concerns for the development of deep learning models in a medical context. This paper evaluates the feasibility of differentially private federated learning for chest X-ray classification as a defense against data privacy attacks. To the best of our knowledge, we are the first to directly compare the impact of differentially private training on two different neural network architectures, DenseNet121 and ResNet50. Extending the federated learning environments previously analyzed in terms of privacy, we simulated a heterogeneous and imbalanced federated setting by distributing images from the public CheXpert and Mendeley chest X-ray datasets unevenly among 36 clients. Both non-private baseline models achieved an area under the receiver operating characteristic curve (AUC) of 0.94 on the binary classification task of detecting the presence of a medical finding. We demonstrate that both model architectures are vulnerable to privacy violation by applying image reconstruction attacks to local model updates from individual clients. The attack was particularly successful during later training stages. To mitigate the risk of a privacy breach, we integrated Rényi differential privacy with a Gaussian noise mechanism into local model training. We evaluate model performance and attack vulnerability for privacy budgets ε ∈ {1, 3, 6, 10}. The DenseNet121 achieved the best utility-privacy trade-off with an AUC of 0.94 for ε = 6. Model performance deteriorated slightly for individual clients compared to the non-private baseline. The ResNet50 only reached an AUC of 0.76 in the same privacy setting. Its performance was inferior to that of the DenseNet121 for all considered privacy constraints, suggesting that the DenseNet121 architecture is more robust to differentially private training.
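The privacy mechanism can be illustrated at the level of a single local model update: clip the update to bound its sensitivity, then add Gaussian noise. This simplified, update-level sketch is for intuition only; the study integrates the mechanism into local gradient-level training instead, and all parameter values are illustrative:

```python
import numpy as np

def privatize_update(update, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """Gaussian mechanism on a flat parameter-update vector (illustrative)."""
    rng = rng or np.random.default_rng()
    norm = np.linalg.norm(update)
    # Clip to bound the L2 sensitivity of the released update.
    clipped = update * min(1.0, clip_norm / max(norm, 1e-12))
    # Add calibrated Gaussian noise; the noise multiplier governs the budget.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=update.shape)
    return clipped + noise

local_update = np.random.default_rng(0).normal(size=10)  # stand-in update
print(privatize_update(local_update))
```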
Quantifying neurological disorders from voice is a rapidly growing field of research and holds promise for unobtrusive and large-scale disorder monitoring. The data recording setup and data analysis pipelines are both crucial aspects to effectively obtain relevant information from participants. Therefore, we performed a systematic review to provide a high-level overview of practices across various neurological disorders and highlight emerging trends. PRISMA-based literature searches were conducted through PubMed, Web of Science, and IEEE Xplore to identify publications in which original (i.e., newly recorded) datasets were collected. Disorders of interest were psychiatric as well as neurodegenerative disorders, such as bipolar disorder, depression, and stress, as well as amyotrophic lateral sclerosis, Alzheimer's, and Parkinson's disease, and speech impairments (aphasia, dysarthria, and dysphonia). Of the 43 retrieved studies, Parkinson's disease is represented most prominently with 19 discovered datasets. Free speech and read speech tasks are most commonly used across disorders. Besides popular feature extraction toolkits, many studies utilise custom-built feature sets. Correlations of acoustic features with psychiatric and neurodegenerative disorders are presented. In terms of analysis, statistical analysis for significance of individual features is commonly used, as well as predictive modeling approaches, especially with support vector machines and a small number of artificial neural networks. An emerging trend and recommendation for future studies is to collect data in everyday life to facilitate longitudinal data collection and to capture the behavior of participants more naturally. Another emerging trend is to record additional modalities to voice, which can potentially increase analytical performance.
Multiplicative Up-Drift
(2020)
Drift analysis aims at translating the expected progress of an evolutionary algorithm (or, more generally, a random process) into a probabilistic guarantee on its run time (hitting time). So far, drift arguments have been successfully employed in the rigorous analysis of evolutionary algorithms, however, only for the situation that the progress is constant or becomes weaker when approaching the target. Motivated by questions like how fast fit individuals take over a population, we analyze random processes exhibiting a (1+delta)-multiplicative growth in expectation. We prove a drift theorem translating this expected progress into a hitting time. This drift theorem gives a simple and insightful proof of the level-based theorem first proposed by Lehre (2011). Our version of this theorem has, for the first time, the best-possible near-linear dependence on 1/delta (the previous results had an at least near-quadratic dependence), and it only requires a population size near-linear in delta (this was super-quadratic in previous results). These improvements immediately lead to stronger run time guarantees for a number of applications. We also discuss the case of large delta and show stronger results for this setting.
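For contrast with this up-drift setting, the classical multiplicative drift theorem (Doerr, Johannsen, and Winzen, 2012) covers the case where progress is proportional to the remaining distance to the target; it is stated here as background, not as a result of the article:

```latex
% Multiplicative drift theorem (classical form): if a non-negative process
% (X_t) satisfies, for all t and all s >= s_min,
%     E[X_t - X_{t+1} | X_t = s] >= \delta s,
% then the expected hitting time T of values below s_min obeys
\mathbb{E}[T] \;\le\; \frac{1 + \ln(X_0 / s_{\min})}{\delta}.
```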
The UK Biobank is a prospective study of 502,543 individuals, combining extensive phenotypic and genotypic data with streamlined access for researchers around the world (1). Here we describe the release of exome-sequence data for the first 49,960 study participants, revealing approximately 4 million coding variants (of which around 98.6% have a frequency of less than 1%). The data include 198,269 autosomal predicted loss-of-function (LOF) variants, a more than 14-fold increase compared to the imputed sequence. Nearly all genes (more than 97%) had at least one carrier with a LOF variant, and most genes (more than 69%) had at least ten carriers with a LOF variant. We illustrate the power of characterizing LOF variants in this population through association analyses across 1,730 phenotypes. In addition to replicating established associations, we found novel LOF variants with large effects on disease traits, including PIEZO1 on varicose veins, COL6A1 on corneal resistance, MEPE on bone density, and IQGAP2 and GMPR on blood cell traits. We further demonstrate the value of exome sequencing by surveying the prevalence of pathogenic variants of clinical importance, and show that 2% of this population has a medically actionable variant. Furthermore, we characterize the penetrance of cancer in carriers of pathogenic BRCA1 and BRCA2 variants. Exome sequences from the first 49,960 participants highlight the promise of genome sequencing in large population-based studies and are now accessible to the scientific community.
Recurrent generative adversarial network for learning imbalanced medical image semantic segmentation
(2020)
We propose a new recurrent generative adversarial architecture named RNN-GAN to mitigate the data imbalance problem in medical image semantic segmentation, where the number of pixels belonging to the object of interest is significantly lower than the number of pixels belonging to the background. A model trained with imbalanced data tends to be biased towards healthy data, which is not desired in clinical applications, and the outputs predicted by such networks have high precision but low recall. To mitigate the impact of imbalanced training data, we train RNN-GAN with the proposed complementary segmentation masks in addition to ordinary segmentation masks. The RNN-GAN consists of two components: a generator and a discriminator. The generator is trained on sequences of medical images to learn the corresponding segmentation label map plus the proposed complementary label, both at the pixel level, while the discriminator is trained to distinguish whether a segmentation image comes from the ground truth or from the generator network. Both the generator and the discriminator incorporate bidirectional LSTM units to enhance temporal consistency and to capture inter- and intra-slice representations of the features. We show evidence that the proposed framework is applicable to different types of medical images of varied sizes. In our experiments on the ACDC-2017, HVSMR-2016, and LiTS-2017 benchmarks we find consistently improved results, demonstrating the efficacy of our approach.
Primary keys (PKs) and foreign keys (FKs) are important elements of relational schemata in various applications, such as query optimization and data integration. However, in many cases, these constraints are unknown or not documented. Detecting them manually is time-consuming and even infeasible in large-scale datasets. We study the problem of discovering primary keys and foreign keys automatically and propose an algorithm to detect both, namely Holistic Primary Key and Foreign Key Detection (HoPF). PKs and FKs are subsets of the sets of unique column combinations (UCCs) and inclusion dependencies (INDs), respectively, for which efficient discovery algorithms are known. Using score functions, our approach is able to effectively extract the true PKs and FKs from the vast sets of valid UCCs and INDs. Several pruning rules are employed to speed up the procedure. We evaluate precision and recall on three benchmarks and two real-world datasets. The results show that our method is able to retrieve on average 88% of all primary keys, and 91% of all foreign keys. We compare the performance of HoPF with two baseline approaches that both assume the existence of primary keys.
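The two ingredients HoPF builds on can be demonstrated on toy tables: unique column combinations as PK candidates and inclusion dependencies as FK candidates. This sketch omits HoPF's score functions and pruning rules, and the tables and helper functions are illustrative:

```python
from itertools import combinations
import pandas as pd

# Illustrative tables: customers (PK: cid) referenced by orders (FK: cid).
customers = pd.DataFrame({"cid": [1, 2, 3], "name": ["Ann", "Bob", "Eve"]})
orders = pd.DataFrame({"oid": [10, 11], "cid": [1, 3]})

def unique_column_combinations(df, max_size=2):
    # PK candidates: column sets whose values contain no duplicates.
    for size in range(1, max_size + 1):
        for combo in combinations(df.columns, size):
            if not df.duplicated(subset=list(combo)).any():
                yield combo

def inclusion_dependencies(child, parent):
    # FK candidates: child columns whose values all appear in a parent column.
    for c in child.columns:
        for p in parent.columns:
            if set(child[c]) <= set(parent[p]):
                yield (c, p)

print(list(unique_column_combinations(customers)))      # PK candidates
print(list(inclusion_dependencies(orders, customers)))  # FK candidates
```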
Comment sections of online news platforms are an essential space to express opinions and discuss political topics. In contrast to other online posts, news discussions are related to particular news articles, comments refer to each other, and individual conversations emerge. However, the misuse by spammers, haters, and trolls makes costly content moderation necessary. Sentiment analysis can not only support moderation but also help to understand the dynamics of online discussions. A subtask of content moderation is the identification of toxic comments. To this end, we describe the concept of toxicity and characterize its subclasses. Further, we present various deep learning approaches, including datasets and architectures, tailored to sentiment analysis in online discussions. One way to make these approaches more comprehensible and trustworthy is fine-grained instead of binary comment classification. On the downside, more classes require more training data. Therefore, we propose to augment training data by using transfer learning. We discuss real-world applications, such as semi-automated comment moderation and troll detection. Finally, we outline future challenges and current limitations in light of most recent research publications.
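As a hedged sketch of the transfer-learning idea for fine-grained comment classification, one can attach a multi-class head to a pretrained language model and fine-tune it on labeled comments. The model name and label set below are illustrative, not taken from the article, and the classification head is randomly initialized until fine-tuning:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Illustrative fine-grained toxicity labels (not from the article).
labels = ["neutral", "insult", "profanity", "threat", "identity_attack"]

# Transfer learning: reuse a pretrained encoder, add a 5-class head.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(labels))

batch = tokenizer(["You are all morons!"], return_tensors="pt",
                  truncation=True, padding=True)
with torch.no_grad():
    logits = model(**batch).logits   # head is untrained: fine-tune on
print(labels[int(logits.argmax())])  # labeled comments before real use
```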
Rapid decline of glomerular filtration rate estimated from creatinine (eGFRcrea) is associated with severe clinical endpoints. In contrast to cross-sectionally assessed eGFRcrea, the genetic basis for rapid eGFRcrea decline is largely unknown. To help define this, we meta-analyzed 42 genome-wide association studies from the Chronic Kidney Diseases Genetics Consortium and United Kingdom Biobank to identify genetic loci for rapid eGFRcrea decline. Two definitions of eGFRcrea decline were used: 3 mL/min/1.73m(2)/year or more ("Rapid3"; encompassing 34,874 cases, 107,090 controls) and eGFRcrea decline of 25% or more with eGFRcrea under 60 mL/min/1.73m(2) at follow-up among those with eGFRcrea of 60 mL/min/1.73m(2) or more at baseline ("CKDi25"; encompassing 19,901 cases, 175,244 controls). Seven independent variants were identified across six loci for Rapid3 and/or CKDi25: five variants at four loci with genome-wide significance (near UMOD-PDILT (2 variants), PRKAG2, WDR72, and OR2S2) and two variants among 265 known eGFRcrea variants (near GATM and LARP4B). All these loci were novel for Rapid3 and/or CKDi25, and our bioinformatic follow-up prioritized variants and genes underneath these loci. The OR2S2 locus is novel for any eGFRcrea trait and includes interesting candidates. For the five genome-wide significant lead variants, we found supporting effects for annual change in blood urea nitrogen or cystatin-based eGFR, but not for GATM or LARP4B. Individuals at high compared to those at low genetic risk (8-14 vs. 0-5 adverse alleles) had a 1.20-fold increased risk of acute kidney injury (95% confidence interval 1.08-1.33). Thus, our identified loci for rapid kidney function decline may help prioritize therapeutic targets and identify mechanisms and individuals at risk for sustained deterioration of kidney function.
The 2020 European Bioinformatics Community for Mass Spectrometry (EuBIC-MS) Developers’ meeting was held from January 13th to January 17th 2020 in Nyborg, Denmark. Among the participants were scientists as well as developers working in the field of computational mass spectrometry (MS) and proteomics. The 4-day program was split between introductory keynote lectures and parallel hackathon sessions. During the latter, the participants developed bioinformatics tools and resources addressing outstanding needs in the community. The hackathons allowed less experienced participants to learn from more advanced computational MS experts, and to actively contribute to highly relevant research projects. We successfully produced several new tools that will be useful to the proteomics community by improving data analysis as well as facilitating future research. All keynote recordings are available on https://doi.org/10.5281/zenodo.3890181.
Social networking sites (SNS) are a rich source of latent information about individual characteristics. Crawling and analyzing this content provides a new approach for enterprises to personalize services and put forward product recommendations. In the past few years, commercial brands have made a gradual appearance on social media platforms for advertisement, customer support, and public relations purposes, and by now such a presence has become a necessity throughout all branches. This online identity can be represented as a brand personality that reflects how a brand is perceived by its customers. We exploited recent research in text analysis and personality detection to build an automatic brand personality prediction model on top of features from the Five-Factor Model and Linguistic Inquiry and Word Count (LIWC) extracted from publicly available benchmarks. Predictive evaluation on brands' accounts reveals that the Facebook platform provides a slight advantage over the Twitter platform in offering more self-disclosure for users to express their emotions, especially their demographic and psychological traits. Results also confirm the wider perspective that the same social media account carries quite similar and comparable personality scores over different social media platforms. For evaluating our prediction results on actual brands' accounts, we crawled the Facebook API and Twitter API respectively for 100k posts from the most valuable brands' pages in the USA; we visualize exemplars of comparison results and present suggestions for future directions.
Viper
(2021)
Key-value stores (KVSs) have found wide application in modern software systems. For persistence, their data resides in slow secondary storage, which requires KVSs to employ various techniques to increase their read and write performance from and to the underlying medium. Emerging persistent memory (PMem) technologies offer data persistence at close-to-DRAM speed, making them a promising alternative to classical disk-based storage. However, simply replacing existing storage with PMem as a drop-in does not yield good results, as block-based access behaves differently in PMem than on disk and ignores PMem's byte addressability, layout, and unique performance characteristics. In this paper, we propose three PMem-specific access patterns and implement them in a hybrid PMem-DRAM KVS called Viper. We employ a DRAM-based hash index and a PMem-aware storage layout to utilize the random-write speed of DRAM and the efficient sequential-write performance of PMem. Our evaluation shows that Viper significantly outperforms existing KVSs for core KVS operations while providing full data persistence. Moreover, Viper outperforms existing PMem-only, hybrid, and disk-based KVSs by 4-18x for write workloads, while matching or surpassing their get performance.
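Viper's split design, a volatile hash index over a persistent value log, can be caricatured in a few lines. A regular file stands in for PMem here, so none of PMem's actual access characteristics are reproduced, and all names are illustrative:

```python
import os

class TinyKVS:
    """Toy hybrid KVS: volatile dict index over an append-only value log."""

    def __init__(self, path="kvs.log"):
        self.index = {}                 # volatile hash index: key -> log offset
        self.log = open(path, "a+b")    # persistent, append-only value log

    def put(self, key, value):
        self.log.seek(0, os.SEEK_END)
        offset = self.log.tell()
        data = value.encode()
        # Length-prefixed record; appends exploit sequential-write speed.
        self.log.write(len(data).to_bytes(4, "little") + data)
        self.log.flush()                # PMem would persist via cache-line flushes
        self.index[key] = offset        # random-access update stays in DRAM

    def get(self, key):
        offset = self.index[key]
        self.log.seek(offset)
        length = int.from_bytes(self.log.read(4), "little")
        return self.log.read(length).decode()

kvs = TinyKVS()
kvs.put("user:1", "alice")
print(kvs.get("user:1"))
```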
Embedded real-time systems generate state sequences where time elapses between state changes. Ensuring that such systems adhere to a provided specification of admissible or desired behavior is essential. Formal model-based testing is often a suitable cost-effective approach. We introduce an extended version of the formalism of symbolic graphs, which encompasses types as well as attributes, for representing states of dynamic systems. Relying on this extension of symbolic graphs, we present a novel formalism of timed graph transformation systems (TGTSs) that supports the model-based development of dynamic real-time systems at an abstract level where possible state changes and delays are specified by graph transformation rules. We then introduce an extended form of the metric temporal graph logic (MTGL) with increased expressiveness to improve the applicability of MTGL for the specification of timed graph sequences generated by a TGTS. Based on the metric temporal operators of MTGL and its built-in graph binding mechanics, we express properties on the structure and attributes of graphs as well as on the occurrence of graphs over time that are related by their inner structure. We provide formal support for checking whether a single generated timed graph sequence adheres to a provided MTGL specification. Relying on this logical foundation, we develop a testing framework for TGTSs that are specified using MTGL. Lastly, we apply this testing framework to a running example by using our prototypical implementation in the tool AutoGraph.
In 1997, Henry Lieberman stated that debugging is the dirty little secret of computer science. Since then, several promising debugging technologies have been developed, such as back-in-time debuggers and automatic fault localization methods. However, the last study about the state of the art in debugging is more than 15 years old, and so it is not clear whether these new approaches have been adopted in practice. For that reason, we investigate the current state of debugging in a comprehensive study. First, we review the available literature and learn about current approaches and study results. Second, we observe several professional developers while debugging and interview them about their experiences. Third, we create a questionnaire that serves as the basis for a larger online debugging survey. Based on these results, we present new insights into debugging practice that help to suggest new directions for future research.
Which event happened first?
(2021)
First come, first served: critical choices between alternative actions are often made based on events external to an organization, and reacting promptly to their occurrence can be a major advantage over the competition. In Business Process Management (BPM), such deferred choices can be expressed in process models, and they are an important aspect of process engines. Blockchain-based process execution approaches are no exception to this, but they are severely limited by the inherent properties of the platform: the isolated environment prevents direct access to external entities and data, and the non-continual runtime, based entirely on atomic transactions, impedes the monitoring and detection of events. In this paper, we provide an in-depth examination of the semantics of deferred choice and transfer them to environments such as the blockchain. We introduce and compare several oracle architectures able to satisfy certain requirements, and show that they can be implemented using state-of-the-art blockchain technology.
Comprior
(2021)
Background
Reproducible benchmarking is important for assessing the effectiveness of novel feature selection approaches applied on gene expression data, especially for prior knowledge approaches that incorporate biological information from online knowledge bases. However, no full-fledged benchmarking system exists that is extensible, provides built-in feature selection approaches, and offers a comprehensive result assessment encompassing classification performance, robustness, and biological relevance. Moreover, the particular needs of prior knowledge feature selection approaches, i.e., uniform access to knowledge bases, are not addressed. As a consequence, prior knowledge approaches are not evaluated against each other, leaving open questions regarding their effectiveness.
Results
We present the Comprior benchmark tool, which facilitates the rapid development and effortless benchmarking of feature selection approaches, with a special focus on prior knowledge approaches. Comprior is extensible by custom approaches, offers built-in standard feature selection approaches, enables uniform access to multiple knowledge bases, and provides a customizable evaluation infrastructure to compare multiple feature selection approaches regarding their classification performance, robustness, runtime, and biological relevance.
Conclusion
Comprior allows reproducible benchmarking, especially of prior knowledge approaches, which facilitates their applicability and for the first time enables a comprehensive assessment of their effectiveness.
In this paper, we analyze stochastic dynamic pricing and advertising differential games in special oligopoly markets with constant price and advertising elasticity. We consider the sale of perishable as well as durable goods and include adoption effects in the demand. Based on a unique stochastic feedback Nash equilibrium, we derive closed-form solution formulas for the value functions and the optimal feedback policies of all competing firms. Efficient simulation techniques are used to evaluate optimally controlled sales processes over time. This way, the evolution of optimal controls as well as the firms’ profit distributions are analyzed. Moreover, we are able to compare feedback solutions of the stochastic model with its deterministic counterpart. We show that the market power of the competing firms is exactly the same as in the deterministic version of the model. Further, we discover two fundamental effects that determine the relation between both models. First, the volatility in demand results in a decline of expected profits compared to the deterministic model. Second, we find that saturation effects in demand have an opposite character. We show that the second effect can be strong enough to either exactly balance or even overcompensate the first one. As a result, we are able to identify cases in which feedback solutions of the deterministic model provide useful approximations of solutions of the stochastic model.
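As an illustration of demand with constant elasticities, a standard isoelastic form reads as follows; this is a textbook specification given for orientation, and the paper's exact demand model may differ:

```latex
% Isoelastic demand with constant price elasticity \epsilon and constant
% advertising elasticity \beta (illustrative textbook form):
d(p, a) = k \, p^{-\epsilon} \, a^{\beta}, \qquad \epsilon > 1,\quad 0 < \beta < 1.
```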
Making the domain tangible
(2017)
Programmers collaborate continuously with domain experts to explore the problem space and to shape a solution that fits the users’ needs. In doing so, all parties develop a shared vocabulary, which is above all a list of named concepts and their relationships to each other. Nowadays, many programmers favor object-oriented programming because it allows them to directly represent real-world concepts and interactions from the vocabulary as code. However, when existing domain data is not yet represented as objects, it becomes a challenge to initially bring existing domain data into object-oriented systems and to keep the source code readable. While source code might be comprehensible to programmers, domain experts can struggle, given their non-programming background. We present a new approach to provide a mapping of existing data sources into the object-oriented programming environment. We support keeping the code of the domain model compact and readable while adding implicit means to access external information as internal domain objects. This should encourage programmers to explore different ways to build the software system quickly. Eventually, our approach fosters communication with the domain experts, especially at the beginning of a project. When the details in the problem space are not yet clear, the source code provides a valuable, tangible communication artifact.
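A minimal sketch of such a mapping, in Python for illustration (the original work targets a different programming environment, and all names below are invented): an external record is wrapped so that its fields read like ordinary attributes, keeping the domain model compact while domain logic stays in plain, readable methods.

```python
class DomainObject:
    """Wraps an external record so its fields read as object attributes."""

    def __init__(self, record):
        self._record = record            # e.g., a row fetched from a CSV or DB

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails: expose external
        # fields implicitly as internal domain attributes.
        try:
            return self._record[name]
        except KeyError:
            raise AttributeError(name)

class Patient(DomainObject):
    def is_adult(self):                  # domain logic stays compact and readable
        return self.age >= 18

row = {"name": "Ada", "age": 36}         # stand-in for external domain data
patient = Patient(row)
print(patient.name, patient.is_adult())
```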
Design thinking is acknowledged as a thriving innovation practice plus something more, something along the lines of a deep understanding of innovation processes. At the same time, quite how and why design thinking works, in scientific terms, appeared at first to be an open question. Over recent years, empirical research has achieved great progress in illuminating the principles that make design thinking successful. Lately, the community began to explore an additional approach. Rather than setting up novel studies, investigations into the history of design thinking hold the promise of adding systematically to our comprehension of basic principles. This chapter makes a start at revisiting design thinking history with the aim of explicating scientific understandings that inform design thinking practices today. It offers a summary of creative thinking theories that were brought to Stanford Engineering in the 1950s by John E. Arnold.
In rural/remote areas, resource constrained smart micro-grid (RCSMG) architectures can provide a cost-effective power supply alternative in cases where connectivity to the national power grid is impeded by factors such as load shedding. RCSMG architectures can be designed to handle communications over a distributed lossy network in order to minimise operation costs. However, due to the unreliable nature of lossy networks, communicated data can be distorted by noise additions that alter the veracity of the data. In this chapter, we consider cases in which an adversary who is internal to the RCSMG deliberately distorts communicated data to gain an unfair advantage over the RCSMG’s users. The adversary’s goal is to mask malicious data manipulations as distortions caused by additive noise due to communication channel unreliability. Distinguishing malicious data distortions from benign distortions is important in ensuring the trustworthiness of the RCSMG. Perturbation-based data anonymisation algorithms can be used to alter transmitted data so that adversarial manipulation of the data reveals no information that the adversary can take advantage of. However, because existing perturbation anonymisation algorithms operate by using additive noise to anonymise data, using these algorithms in the RCSMG context is challenging, since distinguishing benign noise additions from malicious noise additions is a difficult problem. In this chapter, we present a brief survey of privacy violations due to inferences drawn from observed power consumption patterns in RCSMGs, and propose a method of mitigating these risks. The lesson here is that while RCSMGs give users more control over power management and distribution, good anonymisation is essential to protecting personal information on RCSMGs.
In this chapter, we provide a framework to specify how cheating attacks can be conducted successfully on power auctioning schemes in resource-constrained smart micro-grids. This is an important problem because such cheating attacks can destabilise the micro-grid and, in the worst case, cause it to break down. We consider three aspects of modelling cheating attacks on power auctioning schemes. First, we specify exactly how, in spite of the resource-constrained character of the micro-grid, cheating can be conducted successfully. Second, we consider how mitigations can be modelled to prevent cheating, and third, we discuss methods of maintaining grid stability and reliability even in the presence of cheating attacks. We use an Automated-Cheating-Attack (ACA) conception to build a taxonomy of cheating attacks based on the idea of adversarial acquisition of surplus energy. Adversarial acquisition of surplus energy allows malicious users to pay less for access to more power than the quota allowed for the price paid. The impact on honest users is the lack of an adequate supply of energy to meet power demand requests. We conclude with a discussion of the performance overhead of provoking, detecting, and mitigating such attacks efficiently.
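As a toy illustration of adversarial surplus-energy acquisition, and not the chapter’s actual ACA model, the following sketch flags users whose metered consumption exceeds the quota covered by their auction payments.

    # Naive cheating indicator (illustrative only): compare metered
    # consumption against the quota paid for in the auction.
    def flag_surplus_acquisition(quota_kwh, metered_kwh, tolerance=0.05):
        suspects = []
        for user, quota in quota_kwh.items():
            if metered_kwh.get(user, 0.0) > quota * (1.0 + tolerance):
                suspects.append(user)
        return suspects

    print(flag_surplus_acquisition({"u1": 2.0, "u2": 1.0},
                                   {"u1": 2.05, "u2": 1.4}))  # ['u2']

A real detector must also account for measurement noise and lossy communication, which is precisely what makes the problem hard in this setting.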
Resource-constrained smart micro-grid architectures describe a class of smart micro-grid architectures that handle communications operations over a lossy network and depend on a distributed collection of power generation and storage units. Disadvantaged communities with no or only intermittent access to national power networks can benefit from such a micro-grid model by using low-cost communication devices to coordinate power generation, consumption, and storage. Furthermore, this solution is both cost-effective and environmentally friendly. One model for such micro-grids is for users to agree on a power sharing scheme in which individual generator owners sell excess unused power to users wanting access to power. Since the micro-grid relies on distributed renewable energy generation sources, which are variable and only partly predictable, coordinating micro-grid operations with distributed algorithms is a necessity for grid stability. Grid stability is crucial for retaining user trust in the dependability of the micro-grid and user participation in the power sharing scheme, because user withdrawals can cause the grid to break down, which is undesirable. In this chapter, we present a distributed architecture for fair power distribution and billing on micro-grids. The architecture is designed to operate efficiently over a lossy communication network, which is an advantage for disadvantaged communities. We build on the architecture to discuss grid coordination, notably how tasks such as metering, power resource allocation, forecasting, and scheduling can be handled. All four tasks are managed by a feedback control loop that monitors the performance and behaviour of the micro-grid and, based on historical data, makes decisions to ensure the smooth operation of the grid. Finally, since lossy networks are undependable, differentiating system failures from adversarial manipulations is an important consideration for grid stability. We therefore provide a characterisation of potential adversarial models and discuss possible mitigation measures.
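The feedback control loop mentioned above can be sketched as follows; the moving-average forecast and the two-action policy are simplifying assumptions made for illustration, not the chapter’s actual coordination algorithms.

    # Sketch of a feedback control loop for grid coordination.
    from collections import deque

    class GridController:
        def __init__(self, window=24):
            self.history = deque(maxlen=window)  # recent total demand (kWh)

        def step(self, metered_demand, available_supply):
            self.history.append(metered_demand)
            forecast = sum(self.history) / len(self.history)  # naive forecast
            if available_supply < forecast:
                return "curtail"   # schedule less and request more generation
            return "allocate"      # demand can be met; allocate as requested

    controller = GridController()
    print(controller.step(metered_demand=10.0, available_supply=8.0))  # curtail

Metering feeds the history, forecasting derives the expected demand, and allocation and scheduling act on the controller’s decision, mirroring the four tasks named above.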
Power Systems
(2018)
Studies indicate that reliable access to power is an important enabler of economic growth. To this end, modern energy management systems have seen a shift from reliance on time-consuming manual procedures to highly automated management, with current energy provisioning systems being run as cyber-physical systems. Operating energy grids as cyber-physical systems offers the advantage of increased reliability and dependability, but also raises issues of security and privacy. In this chapter, we provide an overview of the contents of this book, showing the interrelation between the topics of the chapters in terms of smart energy provisioning. We begin by discussing the concept of smart grids in general, then narrow our focus to smart micro-grids in particular. Lossy networks provide an interesting framework for implementing smart micro-grids in remote/rural areas, where deploying standard smart grids is economically and structurally infeasible. To this end, we consider an architectural design for a smart micro-grid suited to devices with low processing capability. We model malicious behaviour and propose mitigation measures based on properties that distinguish normal from malicious behaviour.
Confidence Counts
(2021)
The increasing reliance on online learning in higher education has been further expedited by the ongoing Covid-19 pandemic. Students need to be supported as they adapt to this new learning environment. Research has established that learners with positive online learning self-efficacy beliefs are more likely to persevere and achieve their higher education goals when learning online. In this paper, we explore how MOOC design can contribute to the four sources of self-efficacy beliefs posited by Bandura [4]. Specifically, drawing on learner reflections, we explore whether design elements of the MOOC The Digital Edge: Essentials for the Online Learner provided participants with the necessary mastery experiences, vicarious experiences, verbal persuasion, and affective regulation opportunities to evaluate and develop their online learning self-efficacy beliefs. Findings from a content analysis of discussion forum posts show that learners referenced three of the four information sources when reflecting on their experience of the MOOC. This paper illustrates the potential of MOOCs as a pedagogical tool for enhancing online learning self-efficacy among students.
The relevance of identity data leaks on the Internet is more present than ever. Almost every week we read in the news about the leakage of databases with more than a million users. Smaller but no less dangerous leaks happen even multiple times a day. The public availability of such leaked data is a major threat to the victims, but it also creates the opportunity to learn not only about the security of service providers but also about the behavior of users when choosing passwords. Our goal is to analyze this data and generate knowledge that can be used to increase both security awareness and security itself. This paper presents a novel approach to the processing and analysis of the vast majority of bigger and smaller leaks. We evolved from a semi-manual to a fully automated process that requires a minimum of human interaction. Our contribution is the concept and a prototype implementation of a leak-processing workflow that includes the extraction of digital identities from structured and unstructured leak files, the identification of hash routines, and a quality control step to ensure leak authenticity. By making use of parallel and distributed programming, we are able to make leaks available for analysis and notification almost immediately after they have been published. Based on the data collected, this paper reveals how easy it is for criminals to collect large numbers of passwords that are stored in plain text or only weakly hashed. We publish these results and hope to increase not only the security awareness of Internet users but also security on a technical level on the service-provider side.
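One step of such a workflow, identifying the hash routine of leaked credential strings, can be approximated with simple format heuristics; the patterns below follow common conventions and only illustrate the idea, they are not the prototype’s actual detection logic.

    # Guess the hash routine of a leaked string by length and alphabet.
    import re

    HASH_PATTERNS = [
        ("MD5",     re.compile(r"^[0-9a-f]{32}$", re.I)),
        ("SHA-1",   re.compile(r"^[0-9a-f]{40}$", re.I)),
        ("SHA-256", re.compile(r"^[0-9a-f]{64}$", re.I)),
        ("bcrypt",  re.compile(r"^\$2[aby]\$\d{2}\$[./0-9A-Za-z]{53}$")),
    ]

    def guess_hash_routine(value):
        for name, pattern in HASH_PATTERNS:
            if pattern.match(value):
                return name
        return "plain text or unknown"

    print(guess_hash_routine("5f4dcc3b5aa765d61d8327deb882cf99"))  # MD5

Ambiguities remain (many unrelated formats also look like 64 hex digits), which is why the workflow pairs such heuristics with a quality control step.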
Functional dependencies (FDs) play an important role in maintaining data quality. They can be used to enforce data consistency and to guide repairs over a database. In this work, we investigate the problem of missing values and its impact on FD discovery. With existing FD discovery algorithms, some genuine FDs cannot be detected because of missing values, while some non-genuine FDs may be discovered that only hold because missing values are interpreted under a certain NULL semantics. We define a notion of genuineness and propose algorithms to compute the genuineness score of a discovered FD. This score can be used to identify the genuine FDs among the set of all valid dependencies that hold on the data. We evaluate the quality of our method over various real-world and semi-synthetic datasets with extensive experiments. The results show that our method performs well for relatively large FD sets and is able to accurately capture genuine FDs.
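The following sketch shows why NULL semantics matter for FD discovery; the genuineness score itself is the paper’s contribution and is not reproduced here.

    # Whether city -> zip holds below depends on the NULL semantics chosen.
    rows = [
        {"city": "Berlin", "zip": "10115"},
        {"city": None,     "zip": "10115"},
        {"city": None,     "zip": "14482"},
    ]

    def fd_holds(rows, lhs, rhs, null_equals_null=True):
        seen = {}
        for row in rows:
            key, value = row[lhs], row[rhs]
            if key is None and not null_equals_null:
                continue  # NULLs never agree, so they cannot violate the FD
            if key in seen and seen[key] != value:
                return False
            seen[key] = value
        return True

    print(fd_holds(rows, "city", "zip", null_equals_null=True))   # False
    print(fd_holds(rows, "city", "zip", null_equals_null=False))  # True

Under "NULL equals NULL" the two NULL cities disagree on zip and the FD is rejected; under "NULL is unknown" the same FD is accepted, so a discovered dependency may owe its validity entirely to the chosen semantics.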
We present a system-level synthesis approach for heterogeneous multi-processor systems on chip, based on Answer Set Programming (ASP). Starting with a high-level description of an application, its timing constraints, and the physical constraints of the target device, our goal is to produce an optimal computing infrastructure made of heterogeneous processors, peripherals, memories, and communication components. The optimization aims at maximizing speed while minimizing chip area. In addition, a scheduler must be produced that fulfills the real-time requirements of the application. Even though our approach also works for application-specific integrated circuits, we have chosen FPGAs as target devices in this work because their reconfiguration capabilities make it possible to explore several design alternatives. This paper addresses the bottleneck of problem representation size by providing a direct and compact ASP encoding for automatic synthesis that is semantically equivalent to previously established ILP and ASP models. We describe a use case in which designers specify their applications in C/C++, from which optimal systems can be derived. We demonstrate the superiority of our approach over existing heuristics and exact methods with synthesis results on a set of realistic case studies.
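To give a flavour of ASP-based synthesis, here is a deliberately tiny task-to-processor mapping written for the clingo Python API; the tasks, processors, and area figures are invented and stand in for the much richer encoding of the paper.

    # Toy ASP encoding: map tasks to processors while minimizing used area.
    import clingo

    PROGRAM = """
    task(t1;t2;t3).  proc(p1;p2).
    area(p1,3).  area(p2,2).

    % each task runs on exactly one processor
    1 { map(T,P) : proc(P) } 1 :- task(T).

    % minimize the total area of processors that are actually used
    used(P) :- map(_,P).
    #minimize { A,P : used(P), area(P,A) }.
    """

    ctl = clingo.Control()
    ctl.add("base", [], PROGRAM)
    ctl.ground([("base", [])])
    ctl.solve(on_model=lambda model: print(model.symbols(shown=True)))

The real encoding additionally covers timing constraints, communication components, and scheduling, but the pattern of declarative constraints plus an optimization directive is the same.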
Verbal focus shifts
(2018)
Previous studies on design behaviour indicate that focus shifts positively influence ideational productivity. In this study, we take a closer look at how these focus shifts manifest on the verbal level. We describe a mutually influencing relationship between mental focus shifts and verbally low-coherent statements. In a case study based on the DTRS11 dataset, we identify 297 low-coherent statements via a combined topic modelling and manual approach. We introduce a categorization of the different instances of low-coherent statements. The results indicate that designers tend to shift topics within an existing design issue instead of completely disrupting it.
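A simplified version of the automated part of that identification could compare the topic distributions of consecutive utterances; the cosine measure and the threshold below are assumptions made for illustration, since the study combined topic modelling with manual coding.

    # Flag utterances whose topic distribution shifts away from the prior one.
    import numpy as np

    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    def flag_low_coherence(topic_dists, threshold=0.5):
        # topic_dists: one topic-probability vector per utterance, in order
        return [i for i in range(1, len(topic_dists))
                if cosine(topic_dists[i - 1], topic_dists[i]) < threshold]

    dists = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]])
    print(flag_low_coherence(dists))  # [2]: the third utterance shifts topic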
The demand for peer-to-peer ridesharing services has increased rapidly over the last years. Dispatching orders cost-efficiently and communicating accurate pick-up times is challenging, as the current location of each available driver is not exactly known: observed locations can be outdated by several seconds. The developed trajectory visualization tool enables transportation network companies to analyze dispatch processes and determine the causes of unexpected delays. As dispatching algorithms rely on the accuracy of arrival time predictions, we account for factors such as noise, sample rate, technical and economic limitations, and the duration of the entire process, all of which affect the accuracy of spatio-temporal data. To improve dispatching strategies, we propose a prediction approach that provides a probability distribution for a driver’s future locations based on patterns observed in past trajectories. We demonstrate the capabilities of our prediction results to (i) avoid critical delays, (ii) estimate waiting times with higher confidence, and (iii) enable risk considerations in dispatching strategies.
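The prediction idea can be sketched as follows: discretise trajectories into grid cells and estimate, from historical transitions, a distribution over the cells a driver may occupy after a fixed horizon. The grid-cell discretisation and the Markov-style counting are assumptions made for this illustration.

    # Empirical distribution over a driver's future cells from past trajectories.
    from collections import Counter, defaultdict

    def build_transition_model(trajectories, horizon=3):
        # trajectories: lists of grid-cell ids sampled at a fixed rate
        model = defaultdict(Counter)
        for cells in trajectories:
            for i in range(len(cells) - horizon):
                model[cells[i]][cells[i + horizon]] += 1
        return model

    def predict_distribution(model, current_cell):
        counts = model[current_cell]
        total = sum(counts.values())
        return {cell: n / total for cell, n in counts.items()} if total else {}

    history = [["a", "b", "c", "d"], ["a", "b", "c", "c"], ["a", "b", "b", "d"]]
    model = build_transition_model(history, horizon=3)
    print(predict_distribution(model, "a"))  # {'d': 0.67, 'c': 0.33} (approx.)

A dispatcher can then reason over the whole distribution, for example by only promising pick-up times whose probability of being met exceeds a risk threshold.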
A treemap is a visualization that has been specifically designed to facilitate the exploration of tree-structured data and, more generally, hierarchically structured data. The family of visualization techniques that use a visual metaphor for parent-child relationships based "on the property of containment" (Johnson, 1993) is commonly referred to as treemaps. However, as the number of variations of treemaps grows, it becomes increasingly important to distinguish clearly between techniques and their specific characteristics. This paper proposes to discern between Space-filling Treemap T_S, Containment Treemap T_C, Implicit Edge Representation Tree T_IE, and Mapped Tree T_MT for the classification of hierarchy visualization techniques and highlights their respective properties. This taxonomy is created as a hyponymy, i.e., its classes have an is-a relationship to one another: T_S ⊂ T_C ⊂ T_IE ⊂ T_MT. With this proposal, we intend to stimulate a discussion on a more unambiguous classification of treemaps and, furthermore, to broaden what is understood by the concept of treemap itself.
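The is-a chain of the taxonomy can be mirrored directly as a class hierarchy; the encoding below only illustrates the hyponymy, and the one-line characterizations are paraphrases, not definitions from the paper.

    # The taxonomy as an is-a hierarchy: T_S ⊂ T_C ⊂ T_IE ⊂ T_MT.
    class MappedTree:                        # T_MT: maps hierarchy data to visuals
        pass

    class ImplicitEdgeTree(MappedTree):      # T_IE: no explicit edge marks
        pass

    class ContainmentTreemap(ImplicitEdgeTree):     # T_C: children drawn inside parents
        pass

    class SpaceFillingTreemap(ContainmentTreemap):  # T_S: fills the available space
        pass

    assert issubclass(SpaceFillingTreemap, MappedTree)  # the is-a chain holds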
With the spread of smartphones capable of taking high-resolution photos and the development of high-speed mobile data infrastructure, digital visual media is becoming one of the most important forms of modern communication. With this development, however, also comes a devaluation of images as a media form, with the focus shifting to the frequency at which visual content is generated instead of the quality of the content. In this work, an interactive system using image-abstraction techniques and an eye-tracking sensor is presented, which allows users to experience diverting and dynamic artworks that react to their eye movement. The underlying modular architecture enables a variety of different interaction techniques that share common design principles, making the interface as intuitive as possible. The result is a game-like interaction in which users aim for a reward, the artwork, while being held under constraints, e.g., not blinking. The conscious eye movements required by some interaction techniques hint at an interesting possible future extension of this work into the field of relaxation exercises and concentration training.
Ubiquitous business processes are the new generation of processes that pervade the physical space and interact with their environments using a minimum of human involvement. Although they are now widely deployed in industry, their deployment is still ad hoc: they are implemented after an arbitrary modeling phase or no modeling phase at all. The absence of a solid modeling phase backing up the implementation generates many loopholes that are stressed in the literature. Here, we tackle the issue of modeling ubiquitous business processes. We propose patterns to represent recent ubiquitous computing features. These patterns are the outcome of an analysis we conducted in the field of human-computer interaction to examine how the features are actually deployed. The patterns' understandability, ease of use, usefulness, and completeness are examined via a user experiment. The results indicate that these four indexes are on the positive track. Hence, the patterns may serve as the backbone of ubiquitous business process modeling in industrial applications.
TRIPOD
(2021)
Inertial measurement units (IMUs) enable easy-to-operate and low-cost data recording for gait analysis. When combined with treadmill walking, a large number of steps can be collected in a controlled environment without the need for a dedicated gait analysis laboratory. In order to evaluate existing and novel IMU-based gait analysis algorithms for treadmill walking, a reference dataset is needed that includes IMU data as well as reliable ground-truth measurements for multiple participants and walking speeds. This article provides a reference dataset consisting of 15 healthy young adults who walked on a treadmill at three different speeds. Data were acquired using seven IMUs placed on the lower body, two different reference systems (Zebris FDMT-HQ and OptoGait), and two RGB cameras. Additionally, in order to validate an existing IMU-based gait analysis algorithm using the dataset, an adaptable modular data analysis pipeline was built. Our results show agreement between the pressure-sensitive Zebris and the photoelectric OptoGait system (r = 0.99), demonstrating the quality of our reference data. As a use case, the performance of an algorithm originally designed for overground walking was tested on treadmill data using the data pipeline. The accuracy of stride length and stride time estimations was comparable to that reported in other studies with overground data, indicating that the algorithm is equally applicable to treadmill data. The Python source code of the data pipeline is publicly available, and the dataset will be provided by the authors upon request, enabling future evaluations of IMU gait analysis algorithms without the need to record new data.
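As an example of the kind of agreement check reported above, the following sketch correlates stride lengths from two reference systems; the numbers are invented, while the paper reports r = 0.99 between Zebris and OptoGait.

    # Pearson correlation between two reference systems (toy data).
    import numpy as np

    zebris   = np.array([1.21, 1.35, 1.18, 1.42, 1.30])  # stride length (m)
    optogait = np.array([1.20, 1.36, 1.17, 1.43, 1.29])

    r = np.corrcoef(zebris, optogait)[0, 1]
    print(f"Pearson r = {r:.2f}")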
Thematic maps are a common tool for visualizing semantic data with a spatial reference. Combining thematic data with a geometric representation of its natural reference frame helps viewers gain an overview and perceive patterns with respect to location; however, as the amount of data for visualization continues to increase, problems such as information overload and visual clutter impede perception, requiring data aggregation and level-of-detail visualization techniques. While existing aggregation techniques for thematic data operate in a 2D reference frame (i.e., a map), we present two aggregation techniques for 3D spatial and spatiotemporal data mapped onto virtual city models that hierarchically aggregate thematic data in real time during rendering to support on-the-fly and on-demand level-of-detail generation. An object-based technique performs aggregation based on scene-specific objects and their hierarchy to facilitate per-object analysis, while the scene-based technique aggregates data solely based on spatial locations, thus supporting visual analysis of data with arbitrary reference geometry. Both techniques can apply different aggregation functions (mean, minimum, and maximum) to ordinal, interval, and ratio-scaled data and can easily be extended with additional functions. Our implementation utilizes the programmable graphics pipeline and requires suitably encoded data, i.e., textures or vertex attributes. We demonstrate the application of both techniques using real-world datasets, including solar potential analyses and the propagation of pressure waves in a virtual city model.
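Object-based hierarchical aggregation reduces, in essence, to a bottom-up reduction over the scene hierarchy; the sketch below replaces the paper’s GPU implementation with plain recursion for clarity, and the node structure is invented.

    # Bottom-up aggregation of thematic values over a scene hierarchy.
    from statistics import mean

    def aggregate(node, func=mean):
        # node: {'value': float} for leaves, {'children': [...]} otherwise
        if "children" in node:
            node["value"] = func(aggregate(child, func)
                                 for child in node["children"])
        return node["value"]

    district = {"children": [
        {"value": 12.0},   # building A, e.g., solar potential
        {"value": 18.0},   # building B
        {"children": [{"value": 6.0}, {"value": 10.0}]},  # block of two
    ]}
    print(aggregate(district))  # mean of 12, 18, and the block's mean 8

Swapping func for min or max yields the other aggregation functions named above, and further functions can be slotted in the same way.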