Distributed detection of sequential anomalies in univariate time series
The automated detection of sequential anomalies in time series is an essential task for many applications, such as the monitoring of technical systems, fraud detection in high-frequency trading, or the early detection of disease symptoms. All these applications require the detection to find all sequential anomalies as fast as possible on potentially very large time series. In other words, the detection needs to be effective, efficient and scalable w.r.t. the input size. Series2Graph is an effective solution based on graph embeddings that are robust against re-occurring anomalies; it can discover sequential anomalies of arbitrary length and works without training data. Yet, Series2Graph is not scalable due to its single-threaded approach; in particular, it cannot process arbitrarily large sequences due to the memory constraints of a single machine.
In this paper, we propose our distributed anomaly detection system, DADS for short, which is an efficient and scalable adaptation of Series2Graph. Based on the actor programming model, DADS distributes the input time sequence, intermediate state and the computation to all processors of a cluster in a way that minimizes communication costs and synchronization barriers. Our evaluation shows that DADS is orders of magnitude faster than Series2Graph (S2G), scales almost linearly with the number of processors in the cluster and can process much larger input sequences due to its scale-out property.
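To make the idea of distributing an input sequence across processors more concrete, the following is a minimal sketch of one common way to partition a univariate series for subsequence-based processing. It is not taken from the paper and does not claim to be how DADS splits its input; the function name `partition_with_overlap` and the parameters `n`, `workers` and `window` are hypothetical. The assumption is that each worker must see every subsequence of a fixed length `window` in full, so neighbouring index ranges overlap by `window - 1` points.

```python
from typing import List, Tuple


def partition_with_overlap(n: int, workers: int, window: int) -> List[Tuple[int, int]]:
    """Split the index range [0, n) into half-open ranges [start, end).

    Hypothetical helper (not from the paper): consecutive ranges overlap by
    window - 1 points, so every subsequence of length `window` lies fully
    inside exactly one range.
    """
    if window < 1 or workers < 1 or n < window:
        raise ValueError("need window >= 1, workers >= 1 and n >= window")

    total = n - window + 1          # number of subsequence start positions
    workers = min(workers, total)   # never create an empty range
    base, extra = divmod(total, workers)

    ranges = []
    start = 0
    for w in range(workers):
        # The first `extra` workers take one additional start position.
        count = base + (1 if w < extra else 0)
        ranges.append((start, start + count + window - 1))
        start += count
    return ranges


if __name__ == "__main__":
    for lo, hi in partition_with_overlap(n=10, workers=3, window=3):
        print(lo, hi)
```

Under these assumptions, only the `window - 1` boundary points of each range are duplicated across workers, which keeps the data exchanged between neighbours small relative to the range size when `n` is much larger than `workers * window`.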
Author details: | Johannes Schneider, Phillip Wenig, Thorsten Papenbrock |
---|---|
DOI: | https://doi.org/10.1007/s00778-021-00657-6 |
ISSN: | 1066-8888 |
ISSN: | 0949-877X |
Title of parent work (English): | The VLDB journal : the international journal on very large data bases |
Publisher: | Springer |
Place of publishing: | Berlin |
Publication type: | Article |
Language: | English |
Date of first publication: | 2021/03/25 |
Publication year: | 2021 |
Release date: | 2022/11/04 |
Tag: | Actor model; Data mining; Distributed programming; Sequential anomaly; Time series |
Volume: | 30 |
Issue: | 4 |
Number of pages: | 24 |
First page: | 579 |
Last Page: | 602 |
Funding institution: | German government as part of the LuFo VI call I program (Luftfahrtforschungsprogramm) [20D1915] |
Organizational units: | An-Institute / Hasso-Plattner-Institut für Digital Engineering gGmbH |
DDC classification: | 0 Computer science, information and general works / 00 Computer science, knowledge and systems / 004 Data processing; computer science |
Peer review: | Peer-reviewed |
Publishing method: | Open Access / Hybrid Open-Access |
License: | CC BY - Attribution 4.0 International |