Toxic comment detection in online discussions (2020)

Comment sections of online news platforms are an essential space to express opinions and discuss political topics. In contrast to other online posts, news discussions are related to particular news articles, comments refer to each other, and individual conversations emerge. However, the misuse by spammers, haters, and trolls makes costly content moderation necessary. Sentiment analysis can not only support moderation but also help to understand the dynamics of online discussions. A subtask of content moderation is the identification of toxic comments. To this end, we describe the concept of toxicity and characterize its subclasses. Further, we present various deep learning approaches, including datasets and architectures, tailored to sentiment analysis in online discussions. One way to make these approaches more comprehensible and trustworthy is fine-grained instead of binary comment classification. On the downside, more classes require more training data. Therefore, we propose to augment training data by using transfer learning. We discuss real-world applications, such as semi-automated comment moderation and troll detection. Finally, we outline future challenges and current limitations in light of the most recent research publications.

The influence of reward on facial mimicry (2020)

Trilla, Irene ; Drimalla, Hanna ; Bajbouj, Malek ; Dziobek, Isabel

Recent findings suggest a role of oxytocin in the tendency to spontaneously mimic the emotional facial expressions of others. Oxytocin-related increases in facial mimicry, however, seem to be dependent on contextual factors. Given previous literature showing that people preferentially mimic emotional expressions of individuals associated with high (vs. low) rewards, we examined whether the reward value of the mimicked agent is one factor influencing the oxytocin effects on facial mimicry. To test this hypothesis, 60 male adults received 24 IU of either intranasal oxytocin or placebo in a double-blind, between-subject experiment. Next, the value of male neutral faces was manipulated using an associative learning task with monetary rewards. After the reward associations were learned, participants watched videos of the same faces displaying happy and angry expressions. Facial reactions to the emotional expressions were measured with electromyography. We found that participants judged the face identities associated with high reward values as more pleasant than those associated with low reward values. However, happy expressions by low-rewarding faces were more spontaneously mimicked than those by high-rewarding faces. Contrary to our expectations, we did not find a significant direct effect of intranasal oxytocin on facial mimicry, nor on the reward-driven modulation of mimicry. Our results support the notion that mimicry is a complex process that depends on contextual factors, but failed to provide conclusive evidence of a role of oxytocin in the modulation of facial mimicry.

RHEEMix in the data jungle (2020)

Kruse, Sebastian ; Kaoudi, Zoi ; Contreras-Rojas, Bertty ; Chawla, Sanjay ; Naumann, Felix ; Quiane-Ruiz, Jorge-Arnulfo

Data analytics are moving beyond the limits of a single platform. In this paper, we present the cost-based optimizer of Rheem, an open-source cross-platform system that copes with these new requirements. The optimizer allocates the subtasks of data analytic tasks to the most suitable platforms. Our main contributions are: (i) a mechanism based on graph transformations to explore alternative execution strategies; (ii) a novel graph-based approach to determine efficient data movement plans among subtasks and platforms; and (iii) an efficient plan enumeration algorithm, based on a novel enumeration algebra. We extensively evaluate our optimizer under diverse real tasks. We show that our optimizer can perform tasks more than one order of magnitude faster when using multiple platforms than when using a single platform.
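The idea of allocating subtasks to platforms while accounting for data movement can be illustrated with a minimal sketch. It assumes a linear pipeline of subtasks, made-up per-platform costs, and a single flat cost for moving data between platforms. It enumerates every assignment by brute force, which is a stand-in for illustration only and not the enumeration algebra or the graph-based data movement planning described in the abstract. The platform names `local` and `cluster` are hypothetical.

```python
from itertools import product

def cheapest_plan(op_costs, move_cost):
    """Pick the cheapest platform assignment for a linear pipeline.

    op_costs:  one dict per subtask, mapping platform name -> execution cost
               (hypothetical numbers, not measured values).
    move_cost: flat cost charged whenever two consecutive subtasks
               run on different platforms (an assumption of this sketch).
    """
    best_plan, best_cost = None, float("inf")
    # Brute force: try every combination of one platform per subtask.
    for plan in product(*(sorted(costs) for costs in op_costs)):
        cost = sum(costs[p] for costs, p in zip(op_costs, plan))
        cost += sum(move_cost for a, b in zip(plan, plan[1:]) if a != b)
        if cost < best_cost:
            best_plan, best_cost = plan, cost
    return best_plan, best_cost

# Hypothetical three-step task: the middle step is expensive on "local".
costs = [
    {"local": 2, "cluster": 10},
    {"local": 50, "cluster": 8},
    {"local": 3, "cluster": 9},
]
print(cheapest_plan(costs, move_cost=4))
print(cheapest_plan(costs, move_cost=20))
```

With cheap data movement the sketch mixes platforms; with expensive data movement it keeps the whole task on one platform, which is the trade-off a cost-based cross-platform optimizer has to weigh.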

Recurrent generative adversarial network for learning imbalanced medical image semantic segmentation (2020)

Rezaei, Mina ; Yang, Haojin ; Meinel, Christoph

We propose a new recurrent generative adversarial architecture named RNN-GAN to mitigate the imbalanced data problem in medical image semantic segmentation, where the number of pixels belonging to the desired object is significantly lower than the number belonging to the background. A model trained with imbalanced data tends to be biased towards healthy data, which is not desired in clinical applications, and the outputs predicted by these networks have high precision and low recall. To mitigate the impact of imbalanced training data, we train RNN-GAN with the proposed complementary segmentation mask in addition to ordinary segmentation masks. The RNN-GAN consists of two components: a generator and a discriminator. The generator is trained on the sequence of medical images to learn the corresponding segmentation label map plus the proposed complementary label, both at a pixel level, while the discriminator is trained to distinguish whether a segmentation image comes from the ground truth or from the generator network. Both the generator and the discriminator incorporate bidirectional LSTM units to enhance temporal consistency and to obtain inter- and intra-slice representations of the features. We show evidence that the proposed framework is applicable to different types of medical images of varied sizes. In our experiments on the ACDC-2017, HVSMR-2016, and LiTS-2017 benchmarks we find consistently improved results, demonstrating the efficacy of our approach.

Proceedings of the EuBIC-MS 2020 Developers’ Meeting (2020)

The 2020 European Bioinformatics Community for Mass Spectrometry (EuBIC-MS) Developers’ meeting was held from January 13th to January 17th 2020 in Nyborg, Denmark. Among the participants were scientists as well as developers working in the field of computational mass spectrometry (MS) and proteomics. The 4-day program was split between introductory keynote lectures and parallel hackathon sessions. During the latter, the participants developed bioinformatics tools and resources addressing outstanding needs in the community. The hackathons allowed less experienced participants to learn from more advanced computational MS experts, and to actively contribute to highly relevant research projects. We successfully produced several new tools that will be useful to the proteomics community by improving data analysis as well as facilitating future research. All keynote recordings are available on https://doi.org/10.5281/zenodo.3890181.

Predicting location probabilities of drivers to improve dispatch decisions of transportation network companies based on trajectory data (2020)

Richly, Keven ; Brauer, Janos ; Schlosser, Rainer

The demand for peer-to-peer ridesharing services has increased rapidly over the last years. Dispatching orders cost-efficiently and communicating accurate pick-up times is challenging because the current location of each available driver is not exactly known: observed locations can be outdated by several seconds. The developed trajectory visualization tool enables transportation network companies to analyze dispatch processes and determine the causes of unexpected delays. As dispatching algorithms are based on the accuracy of arrival time predictions, we account for factors like noise, sample rate, technical and economic limitations as well as the duration of the entire process, as they have an impact on the accuracy of spatio-temporal data. To improve dispatching strategies, we propose a prediction approach that provides a probability distribution for a driver’s future locations based on patterns observed in past trajectories. We demonstrate the capabilities of our prediction results to (i) avoid critical delays, (ii) estimate waiting times with higher confidence, and (iii) enable risk considerations in dispatching strategies.
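A probability distribution over a driver's next location can be sketched with simple transition frequencies. The sketch below assumes trajectories have already been mapped to discrete cells (for example road segments or grid cells) and estimates the distribution of the next cell from the current one. This first-order frequency model is an assumption chosen for illustration; the abstract does not specify the prediction method at this level of detail, and the cell labels are hypothetical.

```python
from collections import Counter, defaultdict

def fit_transitions(trajectories):
    """Count how often each cell is followed by each other cell."""
    counts = defaultdict(Counter)
    for trajectory in trajectories:
        for here, nxt in zip(trajectory, trajectory[1:]):
            counts[here][nxt] += 1
    return counts

def next_location_distribution(counts, cell):
    """Relative frequencies of the cells observed directly after `cell`."""
    followers = counts.get(cell, Counter())
    total = sum(followers.values())
    if total == 0:
        return {}  # no past trajectory passed through this cell
    return {nxt: n / total for nxt, n in followers.items()}

# Hypothetical past trajectories over cells labeled A-D.
past = [["A", "B", "C"], ["A", "B", "D"], ["A", "C"]]
model = fit_transitions(past)
print(next_location_distribution(model, "A"))
print(next_location_distribution(model, "B"))
```

A dispatcher could use such a distribution to weigh candidate drivers by where they are likely to be, rather than by their last, possibly outdated, observed position.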

Partial order resolution of event logs for process conformance checking (2020)

van der Aa, Han ; Leopold, Henrik ; Weidlich, Matthias

While supporting the execution of business processes, information systems record event logs. Conformance checking relies on these logs to analyze whether the recorded behavior of a process conforms to the behavior of a normative specification. A key assumption of existing conformance checking techniques, however, is that all events are associated with timestamps that allow inferring a total order of events per process instance. Unfortunately, this assumption is often violated in practice. Due to synchronization issues, manual event recordings, or data corruption, events are only partially ordered. In this paper, we put forward the problem of partial order resolution of event logs to close this gap. It refers to the construction of a probability distribution over all possible total orders of events of an instance. To cope with the order uncertainty in real-world data, we present several estimators for this task, incorporating different notions of behavioral abstraction. Moreover, to reduce the runtime of conformance checking based on partial order resolution, we introduce an approximation method that comes with a bounded error in terms of accuracy. Our experiments with real-world and synthetic data reveal that our approach improves accuracy over the state-of-the-art considerably.
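To make "a probability distribution over all possible total orders" concrete, the following minimal sketch enumerates every total order that is consistent with a given partial order and assigns each the same probability. The uniform weighting is an assumption used here as a naive baseline; the estimators mentioned in the abstract rely on behavioral abstractions that are not reproduced here. The event names and helper functions are hypothetical.

```python
from fractions import Fraction
from itertools import permutations

def total_orders(events, before):
    """All orderings of `events` that respect the pairs in `before`.

    before: set of (a, b) pairs meaning event a is known to precede b.
    """
    orders = []
    for perm in permutations(events):
        position = {event: i for i, event in enumerate(perm)}
        if all(position[a] < position[b] for a, b in before):
            orders.append(perm)
    return orders

def uniform_resolution(events, before):
    """Naive baseline: every consistent total order is equally likely."""
    orders = total_orders(events, before)
    if not orders:
        raise ValueError("constraints are contradictory (cyclic)")
    probability = Fraction(1, len(orders))
    return {order: probability for order in orders}

# Events b and c carry the same timestamp, so their order is unknown.
events = ["a", "b", "c", "d"]
known = {("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")}
for order, p in uniform_resolution(events, known).items():
    print(order, p)
```

Enumerating permutations grows factorially with the number of events, which hints at why an approximation with a bounded error is useful in practice.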

Multiplicative Up-Drift (2020)

Doerr, Benjamin ; Kötzing, Timo

Drift analysis aims at translating the expected progress of an evolutionary algorithm (or more generally, a random process) into a probabilistic guarantee on its run time (hitting time). So far, drift arguments have been successfully employed in the rigorous analysis of evolutionary algorithms; however, only for the situation that the progress is constant or becomes weaker when approaching the target. Motivated by questions like how fast fit individuals take over a population, we analyze random processes exhibiting a (1+delta)-multiplicative growth in expectation. We prove a drift theorem translating this expected progress into a hitting time. This drift theorem gives a simple and insightful proof of the level-based theorem first proposed by Lehre (2011). Our version of this theorem has, for the first time, the best-possible near-linear dependence on 1/delta (the previous results had an at least near-quadratic dependence), and it only requires a population size near-linear in delta (this was super-quadratic in previous results). These improvements immediately lead to stronger run time guarantees for a number of applications. We also discuss the case of large delta and show stronger results for this setting.
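The intuition behind a dependence on 1/delta can be seen in a deterministic idealization: if a quantity grew by exactly a factor of (1+delta) in every step, the number of steps to reach a target would scale roughly like 1/delta for small delta. The sketch below only illustrates this idealized case. It replaces the random process by its expected behavior, which is an assumption of the sketch and not the drift theorem itself; the function name is hypothetical.

```python
def steps_to_reach(target, delta, start=1.0):
    """Steps needed when a value grows by exactly (1 + delta) per step."""
    if delta <= 0:
        raise ValueError("delta must be positive")
    value, steps = start, 0
    while value < target:
        value *= 1.0 + delta
        steps += 1
    return steps

# Halving delta roughly doubles the number of steps.
for delta in (1.0, 0.1, 0.05):
    print(delta, steps_to_reach(1000, delta))
```

A real random process only grows by this factor in expectation and may also shrink, which is why a rigorous drift theorem is needed to turn the expectation into a hitting time bound.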

Meta-analysis uncovers genome-wide significant variants for rapid kidney function decline (2020)

Gorski, Mathias ; Jung, Bettina ; Li, Yong ; Matias-Garcia, Pamela R. ; Wuttke, Matthias ; Coassin, Stefan ; Thio, Chris H. L. ; Kleber, Marcus E. ; Winkler, Thomas W. ; Wanner, Veronika ; Chai, Jin-Fang ; Chu, Audrey Y. ; Cocca, Massimiliano ; Feitosa, Mary F. ; Ghasemi, Sahar ; Hoppmann, Anselm ; Horn, Katrin ; Li, Man ; Nutile, Teresa ; Scholz, Markus ; Sieber, Karsten B. ; Teumer, Alexander ; Tin, Adrienne ; Wang, Judy ; Tayo, Bamidele O. ; Ahluwalia, Tarunveer S. ; Almgren, Peter ; Bakker, Stephan J. L. ; Banas, Bernhard ; Bansal, Nisha ; Biggs, Mary L. ; Boerwinkle, Eric ; Böttinger, Erwin ; Brenner, Hermann ; Carroll, Robert J. ; Chalmers, John ; Chee, Miao-Li ; Chee, Miao-Ling ; Cheng, Ching-Yu ; Coresh, Josef ; de Borst, Martin H. ; Degenhardt, Frauke ; Eckardt, Kai-Uwe ; Endlich, Karlhans ; Franke, Andre ; Freitag-Wolf, Sandra ; Gampawar, Piyush ; Gansevoort, Ron T. ; Ghanbari, Mohsen ; Gieger, Christian ; Hamet, Pavel ; Ho, Kevin ; Hofer, Edith ; Holleczek, Bernd ; Foo, Valencia Hui Xian ; Hutri-Kahonen, Nina ; Hwang, Shih-Jen ; Ikram, M. Arfan ; Josyula, Navya Shilpa ; Kahonen, Mika ; Khor, Chiea-Chuen ; Koenig, Wolfgang ; Kramer, Holly ; Kraemer, Bernhard K. ; Kuehnel, Brigitte ; Lange, Leslie A. ; Lehtimaki, Terho ; Lieb, Wolfgang ; Loos, Ruth J. F. ; Lukas, Mary Ann ; Lyytikainen, Leo-Pekka ; Meisinger, Christa ; Meitinger, Thomas ; Melander, Olle ; Milaneschi, Yuri ; Mishra, Pashupati P. ; Mononen, Nina ; Mychaleckyj, Josyf C. ; Nadkarni, Girish N. ; Nauck, Matthias ; Nikus, Kjell ; Ning, Boting ; Nolte, Ilja M. ; O'Donoghue, Michelle L. ; Orho-Melander, Marju ; Pendergrass, Sarah A. ; Penninx, Brenda W. J. H. ; Preuss, Michael H. ; Psaty, Bruce M. ; Raffield, Laura M. ; Raitakari, Olli T. ; Rettig, Rainer ; Rheinberger, Myriam ; Rice, Kenneth M. ; Rosenkranz, Alexander R. ; Rossing, Peter ; Rotter, Jerome ; Sabanayagam, Charumathi ; Schmidt, Helena ; Schmidt, Reinhold ; Schoettker, Ben ; Schulz, Christina-Alexandra ; Sedaghat, Sanaz ; Shaffer, Christian M. ; Strauch, Konstantin ; Szymczak, Silke ; Taylor, Kent D. ; Tremblay, Johanne ; Chaker, Layal ; van der Harst, Pim ; van der Most, Peter J. ; Verweij, Niek ; Voelker, Uwe ; Waldenberger, Melanie ; Wallentin, Lars ; Waterworth, Dawn M. ; White, Harvey D. ; Wilson, James G. ; Wong, Tien-Yin ; Woodward, Mark ; Yang, Qiong ; Yasuda, Masayuki ; Yerges-Armstrong, Laura M. ; Zhang, Yan ; Snieder, Harold ; Wanner, Christoph ; Boger, Carsten A. ; Kottgen, Anna ; Kronenberg, Florian ; Pattaro, Cristian ; Heid, Iris M.

Rapid decline of glomerular filtration rate estimated from creatinine (eGFRcrea) is associated with severe clinical endpoints. In contrast to cross-sectionally assessed eGFRcrea, the genetic basis for rapid eGFRcrea decline is largely unknown. To help define this, we meta-analyzed 42 genome-wide association studies from the Chronic Kidney Diseases Genetics Consortium and United Kingdom Biobank to identify genetic loci for rapid eGFRcrea decline. Two definitions of eGFRcrea decline were used: 3 mL/min/1.73 m²/year or more ("Rapid3"; encompassing 34,874 cases, 107,090 controls) and eGFRcrea decline 25% or more and eGFRcrea under 60 mL/min/1.73 m² at follow-up among those with eGFRcrea 60 mL/min/1.73 m² or more at baseline ("CKDi25"; encompassing 19,901 cases, 175,244 controls). Seven independent variants were identified across six loci for Rapid3 and/or CKDi25: consisting of five variants at four loci with genome-wide significance (near UMOD-PDILT (2), PRKAG2, WDR72, OR2S2) and two variants among 265 known eGFRcrea variants (near GATM, LARP4B). All these loci were novel for Rapid3 and/or CKDi25 and our bioinformatic follow-up prioritized variants and genes underneath these loci. The OR2S2 locus is novel for any eGFRcrea trait including interesting candidates. For the five genome-wide significant lead variants, we found supporting effects for annual change in blood urea nitrogen or cystatin-based eGFR, but not for GATM or LARP4B. Individuals at high compared to those at low genetic risk (8-14 vs. 0-5 adverse alleles) had a 1.20-fold increased risk of acute kidney injury (95% confidence interval 1.08-1.33). Thus, our identified loci for rapid kidney function decline may help prioritize therapeutic targets and identify mechanisms and individuals at risk for sustained deterioration of kidney function.
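The two case definitions can be written down as a short sketch. It encodes only the thresholds stated in the abstract; how controls were defined and how annual decline was estimated from repeated measurements are not given there, so the two-measurement calculation and the function names below are assumptions for illustration.

```python
def is_rapid3(egfr_baseline, egfr_followup, years):
    """Rapid3 case: eGFRcrea decline of 3 mL/min/1.73 m² per year or more.

    Assumes a simple two-point estimate of the annual decline.
    """
    if years <= 0:
        raise ValueError("follow-up time must be positive")
    annual_decline = (egfr_baseline - egfr_followup) / years
    return annual_decline >= 3.0

def is_ckdi25(egfr_baseline, egfr_followup):
    """CKDi25 case: decline of 25% or more and eGFRcrea under 60 at follow-up.

    Only defined for baseline eGFRcrea of 60 or more; returns None otherwise.
    """
    if egfr_baseline < 60:
        return None  # outside the population the definition applies to
    relative_decline = (egfr_baseline - egfr_followup) / egfr_baseline
    return relative_decline >= 0.25 and egfr_followup < 60

print(is_rapid3(90, 75, years=5))   # 3 mL/min/1.73 m² per year
print(is_ckdi25(80, 55))            # 31% decline, ends below 60
print(is_ckdi25(100, 70))           # 30% decline, but still 60 or more
```

The third call shows why the two conditions in CKDi25 are combined: a large relative decline alone does not qualify if kidney function at follow-up is still 60 or more.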

IMU-Based Movement Trajectory Heatmaps for Human Activity Recognition (2020)

Konak, Orhan ; Wegner, Pit ; Arnrich, Bert

Recent trends in ubiquitous computing have led to a proliferation of studies that focus on human activity recognition (HAR) utilizing inertial sensor data that consist of acceleration, orientation and angular velocity. However, the performances of such approaches are limited by the amount of annotated training data, especially in fields where annotating data is highly time-consuming and requires specialized professionals, such as in healthcare. In image classification, this limitation has been mitigated by powerful oversampling techniques such as data augmentation. Using this technique, this work evaluates to what extent transforming inertial sensor data into movement trajectories and into 2D heatmap images can be advantageous for HAR when data are scarce. A convolutional long short-term memory (ConvLSTM) network that incorporates spatiotemporal correlations was used to classify the heatmap images. Evaluation was carried out on Deep Inertial Poser (DIP), a known dataset composed of inertial sensor data. The results obtained suggest that for datasets with large numbers of subjects, using state-of-the-art methods remains the best alternative. However, a performance advantage was achieved for small datasets, which is usually the case in healthcare. Moreover, movement trajectories provide a visual representation of human activities, which can help researchers to better interpret and analyze motion patterns.
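The step of turning a movement trajectory into a 2D heatmap image can be sketched as a simple binning of positions into a grid. The sketch assumes the trajectory is already available as a list of 2D points; deriving those points from acceleration, orientation and angular velocity is not covered here, and the grid size and function name are hypothetical choices.

```python
def trajectory_heatmap(points, bins=4):
    """Count how many trajectory points fall into each cell of a bins x bins grid."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)

    def to_bin(value, low, high):
        if high == low:
            return 0  # degenerate axis: everything falls into the first bin
        index = int((value - low) / (high - low) * bins)
        return min(index, bins - 1)  # the maximum value belongs to the last bin

    grid = [[0] * bins for _ in range(bins)]
    for x, y in points:
        grid[to_bin(y, y_min, y_max)][to_bin(x, x_min, x_max)] += 1
    return grid

# Hypothetical trajectory that lingers at the origin, then moves diagonally.
path = [(0.0, 0.0), (0.0, 0.0), (0.5, 0.5), (1.0, 1.0)]
for row in trajectory_heatmap(path, bins=2):
    print(row)
```

The resulting grid of counts can be treated like a grayscale image, which is what makes image-style classifiers and augmentation techniques applicable to inertial data.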
