Refine
Has Fulltext
- yes (3) (remove)
Document Type
- Doctoral Thesis (3) (remove)
Language
- English (3)
Is part of the Bibliography
- yes (3) (remove)
Keywords
- social media (3) (remove)
Institute
The immense popularity of online communication services in the last decade has not only upended our lives (with news spreading like wildfire on the Web, presidents announcing their decisions on Twitter, and the outcome of political elections being determined on Facebook) but also dramatically increased the amount of data exchanged on these platforms. Therefore, if we wish to understand the needs of modern society better and want to protect it from new threats, we urgently need more robust, higher-quality natural language processing (NLP) applications that can recognize such necessities and menaces automatically, by analyzing uncensored texts. Unfortunately, most NLP programs today have been created for standard language, as we know it from newspapers, or, in the best case, adapted to the specifics of English social media.
This thesis reduces the existing deficit by entering the new frontier of German online communication and addressing one of its most prolific forms—users’ conversations on Twitter. In particular, it explores the ways and means by how people express their opinions on this service, examines current approaches to automatic mining of these feelings, and proposes novel methods, which outperform state-of-the-art techniques. For this purpose, I introduce a new corpus of German tweets that have been manually annotated with sentiments, their targets and holders, as well as lexical polarity items and their contextual modifiers. Using these data, I explore four major areas of sentiment research: (i) generation of sentiment lexicons, (ii) fine-grained opinion mining, (iii) message-level polarity classification, and (iv) discourse-aware sentiment analysis. In the first task, I compare three popular groups of lexicon generation methods: dictionary-, corpus-, and word-embedding–based ones, finding that dictionary-based systems generally yield better polarity lists than the last two groups. Apart from this, I propose a linear projection algorithm, whose results surpass many existing automatically-generated lexicons. Afterwords, in the second task, I examine two common approaches to automatic prediction of sentiment spans, their sources, and targets: conditional random fields (CRFs) and recurrent neural networks, obtaining higher scores with the former model and improving these results even further by redefining the structure of CRF graphs. When dealing with message-level polarity classification, I juxtapose three major sentiment paradigms: lexicon-, machine-learning–, and deep-learning–based systems, and try to unite the first and last of these method groups by introducing a bidirectional neural network with lexicon-based attention. Finally, in order to make the new classifier aware of microblogs' discourse structure, I let it separately analyze the elementary discourse units of each tweet and infer the overall polarity of a message from the scores of its EDUs with the help of two new approaches: latent-marginalized CRFs and Recursive Dirichlet Process.
Comment sections of online news platforms are an essential space to express opinions and discuss political topics. However, the misuse by spammers, haters, and trolls raises doubts about whether the benefits justify the costs of the time-consuming content moderation. As a consequence, many platforms limited or even shut down comment sections completely. In this thesis, we present deep learning approaches for comment classification, recommendation, and prediction to foster respectful and engaging online discussions. The main focus is on two kinds of comments: toxic comments, which make readers leave a discussion, and engaging comments, which make readers join a discussion. First, we discourage and remove toxic comments, e.g., insults or threats. To this end, we present a semi-automatic comment moderation process, which is based on fine-grained text classification models and supports moderators. Our experiments demonstrate that data augmentation, transfer learning, and ensemble learning allow training robust classifiers even on small datasets. To establish trust in the machine-learned models, we reveal which input features are decisive for their output with attribution-based explanation methods. Second, we encourage and highlight engaging comments, e.g., serious questions or factual statements. We automatically identify the most engaging comments, so that readers need not scroll through thousands of comments to find them. The model training process builds on upvotes and replies as a measure of reader engagement. We also identify comments that address the article authors or are otherwise relevant to them to support interactions between journalists and their readership. Taking into account the readers' interests, we further provide personalized recommendations of discussions that align with their favored topics or involve frequent co-commenters. Our models outperform multiple baselines and recent related work in experiments on comment datasets from different platforms.
Individuals have an intrinsic need to express themselves to other humans within a given community by sharing their experiences, thoughts, actions, and opinions. As a means, they mostly prefer to use modern online social media platforms such as Twitter, Facebook, personal blogs, and Reddit. Users of these social networks interact by drafting their own statuses updates, publishing photos, and giving likes leaving a considerable amount of data behind them to be analyzed. Researchers recently started exploring the shared social media data to understand online users better and predict their Big five personality traits: agreeableness, conscientiousness, extraversion, neuroticism, and openness to experience. This thesis intends to investigate the possible relationship between users’ Big five personality traits and the published information on their social media profiles. Facebook public data such as linguistic status updates, meta-data of likes objects, profile pictures, emotions, or reactions records were adopted to address the proposed research questions. Several machine learning predictions models were constructed with various experiments to utilize the engineered features correlated with the Big 5 Personality traits. The final predictive performances improved the prediction accuracy compared to state-of-the-art approaches, and the models were evaluated based on established benchmarks in the domain. The research experiments were implemented while ethical and privacy points were concerned. Furthermore, the research aims to raise awareness about privacy between social media users and show what third parties can reveal about users’ private traits from what they share and act on different social networking platforms.
In the second part of the thesis, the variation in personality development is studied within a cross-platform environment such as Facebook and Twitter platforms. The constructed personality profiles in these social platforms are compared to evaluate the effect of the used platforms on one user’s personality development. Likewise, personality continuity and stability analysis are performed using two social media platforms samples. The implemented experiments are based on ten-year longitudinal samples aiming to understand users’ long-term personality development and further unlock the potential of cooperation between psychologists and data scientists.