
“Broadcast your gender.”

  • Social media platforms provide a large array of behavioral data relevant to social scientific research. However, key information, such as the sociodemographic characteristics of agents, is often missing. This paper aims to compare four methods of classifying social attributes from text. Specifically, we are interested in estimating the gender of German social media creators. Using a random sample of 200 YouTube channels as an example, we compare four classification methods, namely (1) a survey among university staff, (2) a name dictionary method with the World Gender Name Dictionary as a reference list, (3) an algorithmic approach using the website gender-api.com, and (4) a Multinomial Naïve Bayes (MNB) machine learning technique. These methods identify gender attributes based on YouTube channel names and descriptions in German but are adaptable to other languages.
Our contribution evaluates the share of identifiable channels, the accuracy and meaningfulness of the classification, and the limits and benefits of each approach. We aim to address methodological challenges connected to classifying gender attributes for YouTube channels, as well as challenges related to reinforcing stereotypes and to ethical implications.
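The abstract names methods (2) and (4) without describing their mechanics. The following is a minimal sketch, not the authors' implementation, assuming simple lowercase word tokenization and additive (Laplace) smoothing. `dictionary_gender` looks up channel-name tokens in a first-name list; the two-entry `name_dict` is a hypothetical stand-in for the World Gender Name Dictionary, whose actual format is not shown here. `MultinomialNB` is a from-scratch Multinomial Naïve Bayes classifier over bag-of-words counts. The training texts and labels are invented for illustration and are not data from the study.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and extract word tokens; in Python 3, \w also matches umlauts and ß.
    return re.findall(r"\w+", text.lower())


def dictionary_gender(channel_name, name_dict):
    """Sketch of a name dictionary lookup: return the gender of the first token
    found in a first-name list, or None if no token matches (unclassified)."""
    for token in tokenize(channel_name):
        if token in name_dict:
            return name_dict[token]
    return None


class MultinomialNB:
    """Sketch of Multinomial Naive Bayes over bag-of-words counts,
    with additive (Laplace) smoothing controlled by alpha."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.word_counts = {c: Counter() for c in self.classes}
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        doc_counts = Counter(labels)
        n_docs = len(labels)
        # log P(class): relative frequency of each class among training documents
        self.log_prior = {c: math.log(doc_counts[c] / n_docs) for c in self.classes}
        # Total token count per class, used in the likelihood denominator
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def _log_likelihood(self, token, c):
        # log P(token | class) with additive smoothing over the training vocabulary
        numerator = self.word_counts[c][token] + self.alpha
        denominator = self.totals[c] + self.alpha * len(self.vocab)
        return math.log(numerator / denominator)

    def predict(self, doc):
        # Words never seen during training are ignored.
        tokens = [t for t in tokenize(doc) if t in self.vocab]
        scores = {
            c: self.log_prior[c] + sum(self._log_likelihood(t, c) for t in tokens)
            for c in self.classes
        }
        return max(scores, key=scores.get)


# Hypothetical toy data (not from the study): channel descriptions with labels.
train_docs = [
    "Hi, ich bin Lisa, Youtuberin aus Berlin",
    "Ich bin Musikerin und Youtuberin",
    "Hi, ich bin Max, Youtuber aus Köln",
    "Ich bin Musiker und Youtuber",
]
train_labels = ["female", "female", "male", "male"]
model = MultinomialNB(alpha=1.0).fit(train_docs, train_labels)

# Hypothetical stand-in for a first-name reference list.
name_dict = {"lisa": "female", "max": "male"}
```

In this toy setup, the German feminine suffix "-in" (Youtuberin, Musikerin) is what separates the two classes, so `model.predict("Neue Videos von einer Youtuberin")` returns `"female"`. The dictionary lookup returns `None` for a name such as "Technik Kanal" that contains no listed first name, which illustrates why the share of identifiable channels can differ between methods.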

Metadata
Author details: Lena Seewann, Roland Verwiebe, Claudia Buder, Nina-Sophie Fritsch
DOI: https://doi.org/10.3389/fdata.2022.908636
ISSN: 2624-909X
Title of parent work: Frontiers in Big Data
Subtitle: A comparison of four text-based classification methods of German YouTube channels
Publisher: Frontiers
Place of publishing: Lausanne, Switzerland
Further contributing person(s): Dimitri Prandner, Heinz Leitgöb, Robert Moosbrugger
Publication type: Article
Language: English
Date of first publication: 2022/09/14
Publication year: 2022
Release date: 2022/11/09
Tag: YouTube; authorship attribution; gender; machine learning; text based classification methods
Issue: 5
Number of pages: 16
Organizational units: Faculty of Economics and Social Sciences (Wirtschafts- und Sozialwissenschaftliche Fakultät) / Social Sciences
DDC classification: 0 Computer science, information & general works / 00 Computer science, knowledge & systems / 004 Data processing & computer science
Peer review: Refereed
Grantor: Publikationsfonds der Universität Potsdam (University of Potsdam publication fund)
Publishing method: Open Access / Gold Open Access
License: CC BY 4.0 (Creative Commons Attribution 4.0 International)
External remark: Secondary publication in the series Postprints der Universität Potsdam: Wirtschafts- und Sozialwissenschaftliche Reihe; 152