TY - GEN
A1 - Seewann, Lena
A1 - Verwiebe, Roland
A1 - Buder, Claudia
A1 - Fritsch, Nina-Sophie
T1 - “Broadcast your gender.”
BT - A comparison of four text-based classification methods of German YouTube channels
T2 - Zweitveröffentlichungen der Universität Potsdam : Wirtschafts- und Sozialwissenschaftliche Reihe
N2 - Social media platforms provide a large array of behavioral data relevant to social scientific research. However, key information such as the sociodemographic characteristics of agents is often missing. This paper aims to compare four methods of classifying social attributes from text. Specifically, we are interested in estimating the gender of German social media creators. Using a random sample of 200 YouTube channels, we compare four classification methods: (1) a survey among university staff, (2) a name dictionary method with the World Gender Name Dictionary as a reference list, (3) an algorithmic approach using the website gender-api.com, and (4) a Multinomial Naïve Bayes (MNB) machine learning technique. These methods identify gender attributes based on German-language YouTube channel names and descriptions but are adaptable to other languages. Our contribution evaluates the share of identifiable channels, the accuracy and meaningfulness of classification, and the limits and benefits of each approach. We aim to address methodological challenges connected to classifying gender attributes for YouTube channels, as well as challenges related to reinforcing stereotypes and to ethical implications.
T3 - Zweitveröffentlichungen der Universität Potsdam : Wirtschafts- und Sozialwissenschaftliche Reihe - 152
KW - text based classification methods
KW - gender
KW - YouTube
KW - machine learning
KW - authorship attribution
Y1 - 2022
U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-566287
SN - 1867-5808
IS - 152
ER - 

TY - JOUR
A1 - Seewann, Lena
A1 - Verwiebe, Roland
A1 - Buder, Claudia
A1 - Fritsch, Nina-Sophie
T1 - “Broadcast your gender.”
BT - A comparison of four text-based classification methods of German YouTube channels
JF - Frontiers in Big Data
N2 - Social media platforms provide a large array of behavioral data relevant to social scientific research. However, key information such as the sociodemographic characteristics of agents is often missing. This paper aims to compare four methods of classifying social attributes from text. Specifically, we are interested in estimating the gender of German social media creators. Using a random sample of 200 YouTube channels, we compare four classification methods: (1) a survey among university staff, (2) a name dictionary method with the World Gender Name Dictionary as a reference list, (3) an algorithmic approach using the website gender-api.com, and (4) a Multinomial Naïve Bayes (MNB) machine learning technique. These methods identify gender attributes based on German-language YouTube channel names and descriptions but are adaptable to other languages. Our contribution evaluates the share of identifiable channels, the accuracy and meaningfulness of classification, and the limits and benefits of each approach. We aim to address methodological challenges connected to classifying gender attributes for YouTube channels, as well as challenges related to reinforcing stereotypes and to ethical implications.
KW - text based classification methods
KW - gender
KW - YouTube
KW - machine learning
KW - authorship attribution
Y1 - 2022
U6 - https://doi.org/10.3389/fdata.2022.908636
SN - 2624-909X
IS - 5
PB - Frontiers
CY - Lausanne, Switzerland
ER - 

TY - JOUR
A1 - Verwiebe, Roland
A1 - Bobzien, Licia
A1 - Fritsch, Nina-Sophie
A1 - Buder, Claudia
T1 - Social inequality and digitization in modern societies
BT - a systematic literature review on the role of ethnicity, gender, and age
JF - SocArXiv : open archive of the social sciences
N2 - The digitization process has triggered a profound transformation of modern societies. It encompasses a broad spectrum of technical, social, political, cultural and economic developments related to the mass use of computer- and internet-based technologies. It is now becoming increasingly clear that digitization is also changing existing structures of social inequality and that new structures of digital inequality are emerging, as a growing number of recent individual studies show. In this paper, we systematize this new research in an empirically supported literature review. To do so, we use the PRISMA model for literature reviews and focus on three central dimensions of inequality (ethnicity, gender, and age) and their relevance within the discourse on digitization and inequality. The empirical basis consists of journal articles published between 2000 and 2020 and listed in the Web of Science, supplemented by a Google Scholar search through which we attempt to include important monographs and contributions to edited volumes in our analyses. Our text corpus thus comprises a total of 281 articles. Empirically, our literature review shows that unequal access to digital resources largely reproduces existing structures of inequality; in some cases, however, studies report a reduction in social inequalities as a result of the digitization process.
KW - age
KW - digitization
KW - ethnicity
KW - gender social inequality
KW - social inequality
Y1 - 2023
U6 - https://doi.org/10.31235/osf.io/k2zwh
PB - Center for Open Science
CY - [Charlottesville, VA]
ER - 