Luo, G.; Yang, S.; Tian, G.; Yuan, C.; Hu, W.; Maybank, S. J. (2014)
Publisher: IEEE Computer Society
Languages: English
Types: Article
Subjects: csis

Classified by OpenAIRE into

ACM Ref: Computing Methodologies: Image Processing and Computer Vision
In this paper, we address the problem of human action recognition by combining global temporal dynamics with local visual spatio-temporal appearance features. In the global temporal dimension, we propose to model the motion dynamics with robust linear dynamical systems (LDSs) and use the model parameters as motion descriptors. Since LDSs live in a non-Euclidean space and the descriptors are therefore not in vector form, we propose a shift-invariant distance based on subspace angles to measure the similarity between LDSs. In the local visual dimension, we construct curved spatio-temporal cuboids along the trajectories of densely sampled feature points and describe them using histograms of oriented gradients (HOG). The distance between motion sequences is computed with the Chi-squared histogram distance in the bag-of-words framework. Finally, we perform classification using the maximum margin distance learning method, combining the global dynamic distances and the local visual distances. We evaluate our approach for action recognition on five short-clip data sets, namely Weizmann, KTH, UCF sports, Hollywood2 and UCF50, as well as three long continuous data sets, namely VIRAT, ADL and CRIM13, and show competitive results compared with current state-of-the-art methods.
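
To make the two distance channels in the abstract concrete, here is a minimal NumPy sketch. It assumes the standard SVD-based suboptimal LDS identification and a finite-horizon, Martin-style subspace-angle distance between the resulting models; the paper's robust identification procedure and its shift-invariant distance differ in detail, and the dense-trajectory HOG extraction and max-margin distance combination are not reproduced here. All function names are illustrative.

```python
import numpy as np

def fit_lds(Y, n):
    """Fit an LDS x_{t+1} = A x_t, y_t = C x_t to an observation
    sequence Y (d x T) via the standard SVD-based suboptimal
    identification. Returns the model parameters (A, C)."""
    U, S, Vt = np.linalg.svd(Y, full_matrices=False)
    C = U[:, :n]                       # observation matrix (d x n)
    X = np.diag(S[:n]) @ Vt[:n, :]     # estimated state sequence (n x T)
    # least-squares estimate of the state transition matrix A
    A = X[:, 1:] @ np.linalg.pinv(X[:, :-1])
    return A, C

def subspace_angle_distance(A1, C1, A2, C2, m=10):
    """Martin-style distance between two LDSs from the principal angles
    between their extended observability subspaces, truncated at horizon
    m (a common finite-horizon approximation; the paper's shift-invariant
    variant is different)."""
    def obs(A, C):
        blocks = [C]
        for _ in range(m - 1):
            blocks.append(blocks[-1] @ A)   # stacks C, CA, ..., CA^{m-1}
        return np.vstack(blocks)
    Q1, _ = np.linalg.qr(obs(A1, C1))
    Q2, _ = np.linalg.qr(obs(A2, C2))
    # singular values of Q1^T Q2 are the cosines of the principal angles
    cos_theta = np.clip(np.linalg.svd(Q1.T @ Q2, compute_uv=False), 1e-8, 1.0)
    return -2.0 * np.sum(np.log(cos_theta))

def chi2_distance(h1, h2, eps=1e-10):
    """Chi-squared distance between two normalized bag-of-words HOG
    histograms, used for the local visual channel."""
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

if __name__ == "__main__":
    # toy usage: two random 30-dim feature sequences over 100 frames
    rng = np.random.default_rng(0)
    A1, C1 = fit_lds(rng.standard_normal((30, 100)), n=5)
    A2, C2 = fit_lds(rng.standard_normal((30, 100)), n=5)
    print("global dynamic distance:", subspace_angle_distance(A1, C1, A2, C2))
```

In a full pipeline of this kind, the global dynamic distances and the local Chi-squared distances would feed a learned weighted combination (the paper uses maximum margin distance learning) before nearest-neighbor or SVM-style classification.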
