TY  - JOUR
A1  - Wang, Cheng
A1  - Yang, Haojin
A1  - Meinel, Christoph
T1  - A deep semantic framework for multimodal representation learning
JF  - Multimedia Tools and Applications : an international journal
N2  - Multimodal representation learning has gained increasing importance in various real-world multimedia applications. Most previous approaches explored inter-modal correlation by learning a common or intermediate space in a conventional way, e.g. Canonical Correlation Analysis (CCA), but neglected fusing multiple modalities at a higher semantic level. In this paper, inspired by the success of deep networks in multimedia computing, we propose a novel unified deep neural framework for multimodal representation learning. To capture high-level semantic correlations across modalities, we adopt deep learning features as the image representation and topic features as the text representation. For joint model learning, a 5-layer neural network is designed, with supervised pre-training enforced on the first 3 layers for intra-modal regularization. Extensive experiments on the benchmark Wikipedia and MIR Flickr 25K datasets show that our approach achieves state-of-the-art results compared to both shallow and deep models in multimodal and cross-modal retrieval.
KW  - Multimodal representation
KW  - Deep neural networks
KW  - Semantic feature
KW  - Cross-modal retrieval
Y1  - 2016
U6  - https://doi.org/10.1007/s11042-016-3380-8
SN  - 1380-7501
SN  - 1573-7721
VL  - 75
SP  - 9255
EP  - 9276
PB  - Springer
CY  - Dordrecht
ER  - 

TY  - THES
A1  - Wang, Cheng
T1  - Deep Learning of Multimodal Representations
Y1  - 2016
ER  - 