TY - JOUR A1 - Long, Xiang A1 - de Melo, Gerard A1 - He, Dongliang A1 - Li, Fu A1 - Chi, Zhizhen A1 - Wen, Shilei A1 - Gan, Chuang T1 - Purely attention based local feature integration for video classification JF - IEEE Transactions on Pattern Analysis and Machine Intelligence N2 - Recently, substantial research effort has focused on how to apply CNNs or RNNs to better capture temporal patterns in videos, so as to improve the accuracy of video classification. In this paper, we investigate the potential of a purely attention based local feature integration. Accounting for the characteristics of such features in video classification, we first propose Basic Attention Clusters (BAC), which concatenates the output of multiple attention units applied in parallel, and introduce a shifting operation to capture more diverse signals. Experiments show that BAC can achieve excellent results on multiple datasets. However, BAC treats all feature channels as an indivisible whole, which is suboptimal for achieving a finer-grained local feature integration over the channel dimension. Additionally, it treats the entire local feature sequence as an unordered set, thus ignoring the sequential relationships. To improve over BAC, we further propose the channel pyramid attention schema by splitting features into sub-features at multiple scales for coarse-to-fine sub-feature interaction modeling, and propose the temporal pyramid attention schema by dividing the feature sequences into ordered sub-sequences of multiple lengths to account for the sequential order. Our final model pyramidxpyramid attention clusters (PPAC) combines both channel pyramid attention and temporal pyramid attention to focus on the most important sub-features, while also preserving the temporal information of the video. We demonstrate the effectiveness of PPAC on seven real-world video classification datasets. Our model achieves competitive results across all of these, showing that our proposed framework can consistently outperform the existing local feature integration methods across a range of different scenarios. KW - Feature extraction KW - Convolution KW - Computational modeling KW - Plugs KW - Three-dimensional displays KW - Task analysis KW - Two dimensional displays KW - Video classification KW - action recognition KW - attention mechanism KW - computer vision KW - Algorithms KW - Neural Networks KW - Computer Y1 - 2020 U6 - https://doi.org/10.1109/TPAMI.2020.3029554 SN - 0162-8828 SN - 1939-3539 SN - 2160-9292 VL - 44 IS - 4 SP - 2140 EP - 2154 PB - Inst. of Electr. and Electronics Engineers CY - Los Alamitos ER - TY - THES A1 - Shaabani, Nuhad T1 - On discovering and incrementally updating inclusion dependencies N2 - In today's world, many applications produce large amounts of data at an enormous rate. Analyzing such datasets for metadata is indispensable for effectively understanding, storing, querying, manipulating, and mining them. Metadata summarizes technical properties of a dataset which rang from basic statistics to complex structures describing data dependencies. One type of dependencies is inclusion dependency (IND), which expresses subset-relationships between attributes of datasets. Therefore, inclusion dependencies are important for many data management applications in terms of data integration, query optimization, schema redesign, or integrity checking. So, the discovery of inclusion dependencies in unknown or legacy datasets is at the core of any data profiling effort. For exhaustively detecting all INDs in large datasets, we developed S-indd++, a new algorithm that eliminates the shortcomings of existing IND-detection algorithms and significantly outperforms them. S-indd++ is based on a novel concept for the attribute clustering for efficiently deriving INDs. Inferring INDs from our attribute clustering eliminates all redundant operations caused by other algorithms. S-indd++ is also based on a novel partitioning strategy that enables discording a large number of candidates in early phases of the discovering process. Moreover, S-indd++ does not require to fit a partition into the main memory--this is a highly appreciable property in the face of ever-growing datasets. S-indd++ reduces up to 50% of the runtime of the state-of-the-art approach. None of the approach for discovering INDs is appropriate for the application on dynamic datasets; they can not update the INDs after an update of the dataset without reprocessing it entirely. To this end, we developed the first approach for incrementally updating INDs in frequently changing datasets. We achieved that by reducing the problem of incrementally updating INDs to the incrementally updating the attribute clustering from which all INDs are efficiently derivable. We realized the update of the clusters by designing new operations to be applied to the clusters after every data update. The incremental update of INDs reduces the time of the complete rediscovery by up to 99.999%. All existing algorithms for discovering n-ary INDs are based on the principle of candidate generation--they generate candidates and test their validity in the given data instance. The major disadvantage of this technique is the exponentially growing number of database accesses in terms of SQL queries required for validation. We devised Mind2, the first approach for discovering n-ary INDs without candidate generation. Mind2 is based on a new mathematical framework developed in this thesis for computing the maximum INDs from which all other n-ary INDs are derivable. The experiments showed that Mind2 is significantly more scalable and effective than hypergraph-based algorithms. N2 - Viele Anwendungen produzieren mit schnellem Tempo große Datenmengen. Die Profilierung solcher Datenmengen nach ihren Metadaten ist unabdingbar für ihre effektive Verwaltung und ihre Analyse. Metadaten fassen technische Eigenschaften einer Datenmenge zusammen, welche von einfachen Statistiken bis komplexe und Datenabhängigkeiten beschreibende Strukturen umfassen. Eine Form solcher Abhängigkeiten sind Inklusionsabhängigkeiten (INDs), die Teilmengenbeziehungen zwischen Attributen der Datenmengen ausdrücken. Dies macht INDs wichtig für viele Anwendungen wie Datenintegration, Anfragenoptimierung, Schemaentwurf und Integritätsprüfung. Somit ist die Entdeckung von INDs in unbekannten Datenmengen eine zentrale Aufgabe der Datenprofilierung. Ich entwickelte einen neuen Algorithmus namens S-indd++ für die IND-Entdeckung in großen Datenmengen. S-indd++ beseitigt die Defizite existierender Algorithmen für die IND-Entdeckung und somit ist er performanter. S-indd++ berechnet INDs sehr effizient basierend auf einem neuen Clustering der Attribute. S-indd++ wendet auch eine neue Partitionierungsmethode an, die das Verwerfen einer großen Anzahl von Kandidaten in früheren Phasen des Entdeckungsprozesses ermöglicht. Außerdem setzt S-indd++ nicht voraus, dass eine Datenpartition komplett in den Hauptspeicher passen muss. S-indd++ reduziert die Laufzeit der IND-Entdeckung um bis 50 %. Keiner der IND-Entdeckungsalgorithmen ist geeignet für die Anwendung auf dynamischen Daten. Zu diesem Zweck entwickelte ich das erste Verfahren für das inkrementelle Update von INDs in häufig geänderten Daten. Ich erreichte dies bei der Reduzierung des Problems des inkrementellen Updates von INDs auf dem inkrementellen Update des Attribute-Clustering, von dem INDs effizient ableitbar sind. Ich realisierte das Update der Cluster beim Entwurf von neuen Operationen, die auf den Clustern nach jedem Update der Daten angewendet werden. Das inkrementelle Update von INDs reduziert die Zeit der statischen IND-Entdeckung um bis 99,999 %. Alle vorhandenen Algorithmen für die n-ary-IND-Entdeckung basieren auf dem Prinzip der Kandidatengenerierung. Der Hauptnachteil dieser Methode ist die exponentiell wachsende Anzahl der SQL-Anfragen, die für die Validierung der Kandidaten nötig sind. Zu diesem Zweck entwickelte ich Mind2, den ersten Algorithmus für n-ary-IND-Entdeckung ohne Kandidatengenerierung. Mind2 basiert auf einem neuen mathematischen Framework für die Berechnung der maximalen INDs, von denen alle anderen n-ary-INDs ableitbar sind. Die Experimente zeigten, dass Mind2 wesentlich skalierbarer und leistungsfähiger ist als die auf Hypergraphen basierenden Algorithmen. T2 - Beitrag zur Entdeckung und inkrementellen Aktualisierung von Inklusionsabhängigkeiten KW - Inclusion Dependency KW - Data Profiling KW - Data Mining KW - Algorithms KW - Inclusion Dependency Discovery KW - Incrementally Inclusion Dependencies Discovery KW - Metadata Discovery KW - S-indd++ KW - Mind2 KW - Change Data Capture KW - Incremental Discovery KW - Big Data KW - Data Integration KW - Foreign Keys KW - Dynamic Data KW - Foreign Keys Discovery KW - Data Profiling KW - Data Mining KW - Algorithmen KW - Inklusionsabhängigkeiten KW - Inklusionsabhängigkeiten Entdeckung KW - Datenintegration KW - Metadaten Entdeckung Y1 - 2020 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-471862 ER -