Acar, Esra; Hopfgartner, Frank; Albayrak, Sahin (2015)
Languages: English
Types: Other
Subjects: ZA
When designing a video affective content analysis algorithm, one of the most important steps is the selection of discriminative features for the effective representation of video segments. The majority of existing affective content analysis methods either use low-level audio-visual features or generate handcrafted higher-level representations based on these low-level features. We propose in this work to use deep learning methods, in particular convolutional neural networks (CNNs), in order to automatically learn and extract mid-level representations from raw data. To this end, we exploit the audio and visual modalities of videos by employing Mel-Frequency Cepstral Coefficients (MFCC) and color values in the HSV color space. We also incorporate dense trajectory based motion features in order to further enhance the performance of the analysis. By means of multi-class support vector machines (SVMs) and fusion mechanisms, music video clips are classified into one of four affective categories representing the four quadrants of the Valence-Arousal (VA) space. Results obtained on a subset of the DEAP dataset show (1) that higher-level representations perform better than low-level features, and (2) that incorporating motion information leads to a notable performance gain, independently of the chosen representation.
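The classification stage described above (per-modality multi-class SVMs whose outputs are fused to pick one of the four VA quadrants) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the feature matrices stand in for the learned CNN mid-level representations (here random placeholders), the dataset size and feature dimensions are arbitrary, and probability-averaging is just one of several possible late-fusion schemes.

```python
import numpy as np
from sklearn.svm import SVC

# Placeholder mid-level representations; in the paper these would be
# CNN activations learned from MFCC (audio) and HSV (visual) inputs.
rng = np.random.default_rng(0)
n_clips = 200
audio_feats = rng.normal(size=(n_clips, 64))
visual_feats = rng.normal(size=(n_clips, 64))
# Labels 0-3: the four quadrants of the Valence-Arousal space.
labels = rng.integers(0, 4, size=n_clips)

# One multi-class SVM per modality (scikit-learn uses a
# one-vs-one scheme internally for multi-class SVC).
audio_svm = SVC(probability=True, random_state=0).fit(audio_feats, labels)
visual_svm = SVC(probability=True, random_state=0).fit(visual_feats, labels)

def late_fusion_predict(audio_x, visual_x):
    """Average the per-quadrant probabilities of both modalities,
    then return the most likely VA quadrant for each clip."""
    probs = (audio_svm.predict_proba(audio_x)
             + visual_svm.predict_proba(visual_x)) / 2.0
    return probs.argmax(axis=1)

preds = late_fusion_predict(audio_feats, visual_feats)
```

`preds` holds one quadrant label (0–3) per clip; swapping the probability average for a weighted sum, or concatenating the feature vectors before a single SVM (early fusion), are the usual alternatives to this late-fusion variant.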