Fast Automated Processing and Evaluation of Identity Leaks

Jaeger, David; Graupner, Hendrik; Pelchen, Chris; Cheng, Feng; Meinel, Christoph

doi:10.1007/s10766-016-0478-6

Identity data leaks on the Internet are more relevant than ever. Almost every week, the news reports leaked databases containing more than a million users. Smaller, but no less dangerous, leaks occur even multiple times a day. The public availability of such leaked data is a major threat to the victims, but it also creates an opportunity to learn not only about the security of service providers but also about how users choose passwords. Our goal is to analyze this data and generate knowledge that can be used to increase both security awareness and security itself. This paper presents a novel approach to processing and analyzing the vast majority of large and small leaks. We evolved from a semi-manual process to a fully automated one that requires minimal human interaction.
Our contribution is the concept and a prototype implementation of a leak processing workflow that includes the extraction of digital identities from structured and unstructured leak files, the identification of hash routines, and a quality control step to ensure leak authenticity. By making use of parallel and distributed programming, we are able to make leaks available for analysis and notification almost immediately after they have been published. Based on the data collected, this paper reveals how easy it is for criminals to collect large numbers of passwords that are stored in plain text or only weakly hashed. We publish these results in the hope of increasing not only the security awareness of Internet users but also security at the technical level on the service provider side.
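The abstract names two parsing steps: extracting identities from leak lines and identifying the hash routine behind each stored secret. The following is a minimal Python sketch of both steps under stated assumptions. It is not the authors' implementation. The `email<separator>secret` line format, the separator set, the length-to-hash table and all function names (`parse_line`, `guess_hash_routines`, `process_leak`) are hypothetical. Length-based guessing only narrows the candidates. For example, a 32-character hex string may be MD5, MD4 or NTLM, so a real pipeline would need further evidence to decide.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Hypothetical heuristic: map the length of a hex digest to hash routines
# that commonly produce digests of that length (illustrative assumption only).
HEX_LENGTH_CANDIDATES = {
    32: ["MD5", "MD4", "NTLM"],
    40: ["SHA-1"],
    64: ["SHA-256"],
    128: ["SHA-512"],
}

HEX_RE = re.compile(r"^[0-9a-fA-F]+$")
# Hypothetical line format "email<sep>secret" with sep in : ; , |
# Real leak files are far messier than this.
LINE_RE = re.compile(
    r"^\s*([^\s@:;,|]+@[^\s@:;,|]+\.[A-Za-z]{2,})\s*[:;,|]\s*(\S+)\s*$"
)


def guess_hash_routines(secret: str) -> list[str]:
    """Return candidate hash routines for a secret, or ["plaintext?"] if none match."""
    # bcrypt is recognized by its "$2a$" / "$2b$" / "$2y$" prefix only.
    if secret.startswith(("$2a$", "$2b$", "$2y$")):
        return ["bcrypt"]
    if HEX_RE.match(secret) and len(secret) in HEX_LENGTH_CANDIDATES:
        return HEX_LENGTH_CANDIDATES[len(secret)]
    return ["plaintext?"]


def parse_line(line: str):
    """Extract (email, secret, candidates) from one leak line, or None if it does not match."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    email, secret = match.group(1).lower(), match.group(2)
    return email, secret, guess_hash_routines(secret)


def process_leak(lines, workers: int = 4):
    """Parse leak lines with a worker pool, preserving input order and dropping non-matches."""
    # ThreadPoolExecutor keeps the sketch runnable anywhere. ProcessPoolExecutor
    # offers the same map() interface and gives multi-core parallelism in CPython.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(parse_line, lines, chunksize=1024)
        return [r for r in results if r is not None]


if __name__ == "__main__":
    sample = [
        "alice@example.com:5f4dcc3b5aa765d61d8327deb882cf99",
        "bob@example.org;hunter2",
        "# comment line from the leak header",
        "carol@example.net|$2b$12$abcdefghijklmnopqrstuv",  # truncated sample, not a real bcrypt hash
    ]
    for email, secret, candidates in process_leak(sample):
        print(email, candidates)
```

Keeping extraction and classification as pure per-line functions is what makes the work easy to distribute. Each line, or each chunk of lines, can be handed to any worker, whether a thread, a process, or a separate machine, without shared state.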

Author details:	David Jaeger, Hendrik Graupner, Chris Pelchen, Feng Cheng, Christoph Meinel
DOI:	https://doi.org/10.1007/s10766-016-0478-6
ISSN:	0885-7458
ISSN:	1573-7640
Title of parent work (English):	International journal of parallel programming
Publisher:	Springer
Place of publishing:	New York
Publication type:	Article
Language:	English
Date of first publication:	2018/12/05
Publication year:	2018
Release date:	2021/12/17
Tag:	Automated parsing; Data breach; Identity leak; Parallel processing
Volume:	46
Issue:	2
Number of pages:	30
First page:	441
Last Page:	470
Organizational units:	Digital Engineering Fakultät / Hasso-Plattner-Institut für Digital Engineering GmbH
DDC classification:	0 Computer science, information science, general works / 00 Computer science, knowledge, systems
Peer review:	Refereed
