Publisher: IEEE
Types: Unknown
Machine learning algorithms for the analysis of time-series often depend on the assumption that the utilised data are temporally aligned. Any temporal discrepancies arising in the data are certain to lead to ill-generalisable models, which in turn fail to correctly capture the properties of the task at hand. The temporal alignment of time-series is thus a crucial challenge manifesting in a multitude of applications. Nevertheless, the vast majority of algorithms oriented towards the temporal alignment of time-series are applied directly on the observation space, or utilise simple linear projections. Thus, they fail to capture complex, hierarchical non-linear representations which may prove beneficial for temporal alignment, particularly when dealing with multi-modal data (e.g., aligning visual and acoustic information). To this end, we present Deep Canonical Time Warping (DCTW), a method which automatically learns complex non-linear representations of multiple time-series, generated such that they are (i) highly correlated, and (ii) temporally aligned. By means of experiments on four real datasets, we show that the representations learnt via the proposed DCTW significantly outperform state-of-the-art methods in temporal alignment, elegantly handling scenarios with highly heterogeneous features, such as the temporal alignment of acoustic and visual features.
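
For context, the following is a minimal sketch of classical dynamic time warping, one standard example of alignment performed directly on the observation space, which is the setting the abstract contrasts DCTW with. It is not the DCTW method itself: the function name `dtw_path`, the squared Euclidean frame cost, and the toy sequences are assumptions made purely for illustration.

```python
import numpy as np


def dtw_path(x, y):
    """Align two sequences with classical dynamic time warping.

    x: array of shape (Tx,) or (Tx, d); y: array of shape (Ty,) or (Ty, d).
    Returns (total_cost, path), where path is a list of (i, j) frame pairs.
    The squared Euclidean frame cost is an illustrative assumption.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    if y.ndim == 1:
        y = y[:, None]
    tx, ty = len(x), len(y)

    # Pairwise squared Euclidean distances between all frames.
    dist = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=2)

    # acc[i, j] holds the minimal cost of aligning x[:i] with y[:j].
    acc = np.full((tx + 1, ty + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, tx + 1):
        for j in range(1, ty + 1):
            acc[i, j] = dist[i - 1, j - 1] + min(
                acc[i - 1, j - 1],  # advance both sequences
                acc[i - 1, j],      # advance x only
                acc[i, j - 1],      # advance y only
            )

    # Backtrack from the end to recover the warping path.
    i, j = tx, ty
    path = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        moves = [
            (acc[i - 1, j - 1], i - 1, j - 1),
            (acc[i - 1, j], i - 1, j),
            (acc[i, j - 1], i, j - 1),
        ]
        _, i, j = min(moves, key=lambda m: m[0])
    path.reverse()
    return acc[tx, ty], path


# Toy example: y is a time-stretched copy of x.
cost, path = dtw_path([0, 1, 2, 3], [0, 0, 1, 2, 2, 3])
print(cost, path)
```

Because the frame cost here is computed on the raw observations, both sequences must live in the same feature space. That restriction is what makes this kind of baseline awkward for heterogeneous modalities such as audio and video, the case the abstract highlights.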
