Acar, Esra; Hopfgartner, Frank; Albayrak, Sahin (2014)
Publisher: Springer Verlag
Languages: English
Types: Other
Given the ever-growing amount of available multimedia data, automatically annotating multimedia content with the feeling(s) it is expected to evoke in users is a challenging problem. To address this problem, the emerging research field of video affective analysis aims to exploit human emotions. No dominant feature representation has yet emerged in this field, so choosing discriminative features that represent video segments effectively is a key issue in designing video affective content analysis algorithms. Most existing affective content analysis methods either use low-level audio-visual features or generate hand-crafted higher-level representations based on these low-level features. In this work, we propose to use deep learning methods, in particular convolutional neural networks (CNNs), to learn mid-level representations from automatically extracted low-level features. We exploit the audio and visual modalities of videos by employing Mel-Frequency Cepstral Coefficients (MFCC) and color values in the RGB space to build higher-level audio and visual representations. We use the learned representations for the affective classification of music video clips, choosing multi-class support vector machines (SVMs) to classify video clips into four affective categories that represent the four quadrants of the Valence-Arousal (VA) space. Results on a subset of the DEAP dataset (76 music video clips) show that using higher-level representations instead of low-level features yields a significant improvement in video affective content analysis.
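Two pieces of the pipeline described above can be sketched with the standard library alone: turning valence/arousal ratings into one of the four VA quadrant labels, and a single convolutional layer that turns a variable-length sequence of low-level frame vectors (such as per-frame MFCCs) into a fixed-length mid-level vector. This is a minimal illustration under stated assumptions, not the authors' architecture: ratings are assumed to lie on a 1 to 9 scale split at a hypothetical midpoint of 5.0, the quadrant names (`HVHA` and so on) are hypothetical, and the filters are hand-set stand-ins for weights a CNN would learn. Training and the SVM stage are not shown.

```python
from typing import List, Sequence

# Assumption: ratings on a 1-9 scale, split at this illustrative midpoint.
MIDPOINT = 5.0


def va_quadrant(valence: float, arousal: float, midpoint: float = MIDPOINT) -> str:
    """Map a (valence, arousal) pair to one of four VA-space quadrants.

    The labels (High/Low Valence, High/Low Arousal) are hypothetical names.
    """
    high_v = valence >= midpoint
    high_a = arousal >= midpoint
    if high_v and high_a:
        return "HVHA"
    if high_a:
        return "LVHA"
    if not high_v:
        return "LVLA"
    return "HVLA"


def conv1d_relu_maxpool(
    frames: Sequence[Sequence[float]],
    filters: Sequence[Sequence[Sequence[float]]],
) -> List[float]:
    """Forward pass of one 1-D convolutional layer over time.

    frames:  one low-level feature vector per time step (e.g. MFCCs).
    filters: each filter is a list of `width` weight vectors, one per
             consecutive frame it spans (no bias term, for brevity).
    Applies ReLU, then global max pooling over time, so the output has one
    value per filter regardless of how many frames the clip has.
    """
    features = []
    for kernel in filters:
        width = len(kernel)  # number of consecutive frames the filter spans
        best = 0.0  # ReLU outputs are >= 0, so 0.0 is the pooling floor
        for t in range(len(frames) - width + 1):
            activation = sum(
                w * x
                for k in range(width)
                for w, x in zip(kernel[k], frames[t + k])
            )
            # max over t of max(0, a_t) == max(0, max over t of a_t)
            best = max(best, activation)
        features.append(best)
    return features


if __name__ == "__main__":
    frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 frames, 2 coefficients
    filters = [
        [[1.0, 0.0], [0.0, 1.0]],  # spans 2 frames
        [[-1.0, -1.0]],            # spans 1 frame
    ]
    print(conv1d_relu_maxpool(frames, filters))  # [2.0, 0.0]
    print(va_quadrant(7.2, 6.1))                 # HVHA
```

In a full pipeline, one such vector per modality (audio from MFCCs, visual from RGB values) would be passed to a multi-class SVM, for example scikit-learn's `SVC`, trained on the quadrant labels. Global max pooling is used here only because it is one simple way to obtain a fixed-length vector from clips of differing length.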