publish.UP 020 Library and information sciences

PyFin-sentiment (2023)

Responding to the poor performance of generic automated sentiment analysis solutions on domain-specific texts, we collect a dataset of 10,000 tweets discussing the topics of finance and investing. We manually assign each tweet its market sentiment, i.e., the investor’s anticipation of a stock’s future return. Using this data, we show that all existing sentiment models trained on adjacent domains struggle with accurate market sentiment analysis due to the task’s specialized vocabulary. Consequently, we design, train, and deploy our own sentiment model. It outperforms all previous models (VADER, NTUSD-Fin, FinBERT, TwitterRoBERTa) when evaluated on Twitter posts. On posts from a different platform, our model performs on par with BERT-based large language models. We achieve this result at a fraction of the training and inference costs due to the model’s simple design. We publish the artifact as a Python library to facilitate its use by future researchers and practitioners.

Collective response to the health crisis among German Twitter users (2022)

Abramova, Olga ; Batzel, Katharina ; Modesti, Daniela

We used structural topic modeling to analyze over 800,000 German tweets about COVID-19 to answer two questions: What patterns emerge in tweets as a response to a health crisis? And how do the topics discussed change over time? The study leans on the goals associated with the health information seeking (GAINS) model, discerning whether a post aims at tackling and eliminating the problem (i.e., problem-focused) or managing the emotions (i.e., emotion-focused), and whether it strives to maximize positive outcomes (promotion focus) or to minimize negative outcomes (prevention focus). The findings indicate four clusters salient in public reactions: 1) “Understanding” (problem-promotion); 2) “Action planning” (problem-prevention); 3) “Hope” (emotion-promotion) and 4) “Reassurance” (emotion-prevention). Public communication is volatile over time, and a shift is evidenced from self-centered to community-centered topics within 4.5 weeks. Our study illustrates the potential of social media text mining to quickly and efficiently extract public opinions and reactions. Monitoring fears and trending topics enables policymakers to rapidly respond to deviant behavior, like resistive attitudes toward containment measures or deteriorating physical health. Healthcare workers can use the insights to provide mental health services for battling anxiety or extensive loneliness from staying home.

Lernwelt Hochschule 2030 gestalten (2022)

Günther, Oliver

Data preparation for duplicate detection (2020)

Koumarelas, Ioannis ; Jiang, Lan ; Naumann, Felix

Data errors represent a major issue in most application workflows. Before any important task can take place, a certain data quality has to be guaranteed by eliminating a number of different errors that may appear in data. Typically, most of these errors are fixed with data preparation methods, such as whitespace removal. However, the particular error of duplicate records, where multiple records refer to the same entity, is usually eliminated independently with specialized techniques. Our work is the first to bring these two areas together by applying data preparation operations under a systematic approach prior to performing duplicate detection.

Our process workflow can be summarized as follows: It begins with the user providing as input a sample of the gold standard, the actual dataset, and optionally some constraints to domain-specific data preparations, such as address normalization. The preparation selection operates in two consecutive phases. First, to vastly reduce the search space of ineffective data preparations, decisions are made based on the improvement or worsening of pair similarities. Second, using the remaining data preparations, an iterative leave-one-out classification process removes preparations one by one and determines the redundant preparations based on the achieved area under the precision-recall curve (AUC-PR). Using this workflow, we manage to improve the results of duplicate detection by up to 19% in AUC-PR.
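The second phase described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy record pairs, the use of `difflib.SequenceMatcher` as the pair similarity, and average precision as the AUC-PR estimate are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def average_precision(scores, labels):
    """Average precision, a step-wise estimate of the area under the PR curve."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, is_duplicate) in enumerate(ranked, start=1):
        if is_duplicate:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / hits if hits else 0.0


def score(preps, pairs, labels):
    """Apply all preparations to both records of each pair, then rate the ranking."""
    def prepare(text):
        for prep in preps:
            text = prep(text)
        return text

    sims = [SequenceMatcher(None, prepare(a), prepare(b)).ratio() for a, b in pairs]
    return average_precision(sims, labels)


def select_preparations(preps, pairs, labels):
    """Leave-one-out selection: drop a preparation whenever its removal does not hurt."""
    selected = list(preps)
    best = score(selected, pairs, labels)
    while selected:
        # Score every candidate set that leaves exactly one preparation out.
        trials = [
            (score(selected[:i] + selected[i + 1:], pairs, labels), i)
            for i in range(len(selected))
        ]
        trial_score, index = max(trials)
        if trial_score < best:
            break  # every remaining preparation contributes to the score
        selected.pop(index)  # redundant: removing it keeps or improves the score
        best = trial_score
    return selected, best


# Hypothetical toy data: labeled record pairs (True means duplicate).
pairs = [
    ("  Anna Schmidt", "anna schmidt"),
    ("Bob  Lee", "bob lee"),
    ("Anna Schmidt", "Carl Jung"),
    ("Bob Lee", "Dana Fox"),
]
labels = [True, True, False, False]


def identity(text):
    return text  # a preparation with no effect, so it is always redundant


def collapse_whitespace(text):
    return " ".join(text.split())


all_preps = [str.lower, collapse_whitespace, identity]
selected, final_score = select_preparations(all_preps, pairs, labels)
```

In this sketch a preparation counts as redundant when the score without it is at least as high as the score with it, so a no-op such as `identity` is always discarded. The first phase (pruning by changes in pair similarity) is not modeled here.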

IT-Organisation in Hochschulen und ihren Bibliotheken (2022)

Kostädt, Peter

The great importance of information technology for the academic disciplines and the central infrastructure facilities of universities is beyond question today. This article provides a historical overview of the introduction and further development of IT at German universities from the 1950s to the present, focusing on libraries and computing centers. It shows that the various phases of technological development have led to heterogeneous IT organizational structures at universities. For 20 years, the DFG and HRK have therefore recommended clarifying responsibilities within an IT governance framework and implementing a CIO model. As various studies show, however, implementation across the German university landscape has so far succeeded only in part. The challenge at many universities remains to free the IT organization from its reactive role and turn it into an active driver of digital transformation.

Knowledge transfer for entity resolution with siamese neural networks (2021)

Loster, Michael ; Koumarelas, Ioannis ; Naumann, Felix

The integration of multiple data sources is a common problem in a large variety of applications. Traditionally, handcrafted similarity measures are used to discover, merge, and integrate multiple representations of the same entity (duplicates) into a large homogeneous collection of data. Often, these similarity measures do not cope well with the heterogeneity of the underlying dataset. In addition, domain experts are needed to manually design and configure such measures, which is both time-consuming and requires extensive domain expertise.

We propose a deep Siamese neural network capable of learning a similarity measure that is tailored to the characteristics of a particular dataset. With the properties of deep learning methods, we are able to eliminate the manual feature engineering process and thus considerably reduce the effort required for model construction. In addition, we show that it is possible to transfer knowledge acquired during the deduplication of one dataset to another, and thus significantly reduce the amount of data required to train a similarity measure. We evaluate our method on multiple datasets and compare our approach to state-of-the-art deduplication methods. Our approach outperforms competitors by up to +26 percent F-measure, depending on task and dataset. In addition, we show that knowledge transfer is not only feasible, but in our experiments led to an improvement in F-measure of up to +4.7 percent.

Nicht im Trüben fischen (2021)

Hoyer, Melanie ; Stadler, Heike

Es darf gekocht werden (2019)

Thomas, Linda

An eight-member delegation from the University of Potsdam (UP) visited Tel Aviv University (TAU) in Israel from 18 to 21 November 2018. The cooperation between the two institutions was intensified through a "staff exchange", which allowed UP staff to meet their counterparts at TAU and exchange ideas. This report is based on conversations with three of the five library directors and two further colleagues from acquisitions and cataloging. Various topics in librarianship were discussed, along with the perspective on and current status of each at TAU and UP. The report covers the topics of library system, Open Access, and the library as a space.

43 search hits