• Treffer 5 von 8
Zurück zur Trefferliste

Purely attention based local feature integration for video classification

  • Recently, substantial research effort has focused on how to apply CNNs or RNNs to better capture temporal patterns in videos, so as to improve the accuracy of video classification. In this paper, we investigate the potential of a purely attention based local feature integration. Accounting for the characteristics of such features in video classification, we first propose Basic Attention Clusters (BAC), which concatenates the output of multiple attention units applied in parallel, and introduce a shifting operation to capture more diverse signals. Experiments show that BAC can achieve excellent results on multiple datasets. However, BAC treats all feature channels as an indivisible whole, which is suboptimal for achieving a finer-grained local feature integration over the channel dimension. Additionally, it treats the entire local feature sequence as an unordered set, thus ignoring the sequential relationships. To improve over BAC, we further propose the channel pyramid attention schema by splitting features into sub-features atRecently, substantial research effort has focused on how to apply CNNs or RNNs to better capture temporal patterns in videos, so as to improve the accuracy of video classification. In this paper, we investigate the potential of a purely attention based local feature integration. Accounting for the characteristics of such features in video classification, we first propose Basic Attention Clusters (BAC), which concatenates the output of multiple attention units applied in parallel, and introduce a shifting operation to capture more diverse signals. Experiments show that BAC can achieve excellent results on multiple datasets. However, BAC treats all feature channels as an indivisible whole, which is suboptimal for achieving a finer-grained local feature integration over the channel dimension. Additionally, it treats the entire local feature sequence as an unordered set, thus ignoring the sequential relationships. To improve over BAC, we further propose the channel pyramid attention schema by splitting features into sub-features at multiple scales for coarse-to-fine sub-feature interaction modeling, and propose the temporal pyramid attention schema by dividing the feature sequences into ordered sub-sequences of multiple lengths to account for the sequential order. Our final model pyramidxpyramid attention clusters (PPAC) combines both channel pyramid attention and temporal pyramid attention to focus on the most important sub-features, while also preserving the temporal information of the video. We demonstrate the effectiveness of PPAC on seven real-world video classification datasets. Our model achieves competitive results across all of these, showing that our proposed framework can consistently outperform the existing local feature integration methods across a range of different scenarios.zeige mehrzeige weniger

Metadaten exportieren

Weitere Dienste

Suche bei Google Scholar Statistik - Anzahl der Zugriffe auf das Dokument
Metadaten
Verfasserangaben:Xiang LongORCiD, Gerard de MeloORCiDGND, Dongliang He, Fu Li, Zhizhen Chi, Shilei Wen, Chuang GanORCiD
DOI:https://doi.org/10.1109/TPAMI.2020.3029554
ISSN:0162-8828
ISSN:1939-3539
ISSN:2160-9292
Pubmed ID:https://pubmed.ncbi.nlm.nih.gov/33026984
Titel des übergeordneten Werks (Englisch):IEEE Transactions on Pattern Analysis and Machine Intelligence
Verlag:Inst. of Electr. and Electronics Engineers
Verlagsort:Los Alamitos
Publikationstyp:Wissenschaftlicher Artikel
Sprache:Englisch
Datum der Erstveröffentlichung:07.10.2020
Erscheinungsjahr:2020
Datum der Freischaltung:05.01.2024
Freies Schlagwort / Tag:Algorithms; Computational modeling; Computer; Convolution; Feature extraction; Neural Networks; Plugs; Task analysis; Three-dimensional displays; Two dimensional displays; Video classification; action recognition; attention mechanism; computer vision
Band:44
Ausgabe:4
Seitenanzahl:15
Erste Seite:2140
Letzte Seite:2154
Organisationseinheiten:An-Institute / Hasso-Plattner-Institut für Digital Engineering gGmbH
DDC-Klassifikation:0 Informatik, Informationswissenschaft, allgemeine Werke / 00 Informatik, Wissen, Systeme / 000 Informatik, Informationswissenschaft, allgemeine Werke
Peer Review:Referiert
Verstanden ✔
Diese Webseite verwendet technisch erforderliche Session-Cookies. Durch die weitere Nutzung der Webseite stimmen Sie diesem zu. Unsere Datenschutzerklärung finden Sie hier.