Giannakopoulos, Theodoros (2015)
Publisher: Public Library of Science
Journal: PLoS ONE
Languages: English
Types: Article
Subjects: Q, R, python, Research Article, audio analysis, Science, Medicine, open-source library
Audio information plays an important role in the growing volume of digital content available today, creating a need for methodologies that automatically analyze such content: audio event recognition for home automation and surveillance systems, speech recognition, music information retrieval, multimodal analysis (e.g. audio-visual analysis of online videos for content-based recommendation), etc. This paper presents pyAudioAnalysis, an open-source Python library that provides a wide range of audio analysis procedures, including feature extraction, classification of audio signals, supervised and unsupervised segmentation, and content visualization. pyAudioAnalysis is licensed under the Apache License and is available on GitHub (https://github.com/tyiannak/pyAudioAnalysis/). Here we present the theoretical background behind the implemented methodologies, along with evaluation metrics for some of the methods. pyAudioAnalysis has already been used in several audio analysis research applications: smart-home functionalities through audio event detection, speech emotion recognition, depression classification based on audio-visual features, music segmentation, multimodal content-based movie recommendation, and health applications (e.g. monitoring eating habits). Feedback from these applications has led to practical enhancements of the library.
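To give a rough sense of what "feature extraction" means in this setting, the sketch below splits a signal into short overlapping frames and computes two simple per-frame descriptors, energy and zero-crossing rate. It is a generic, standard-library illustration and not the pyAudioAnalysis API: the function names, the frame and hop sizes, and the choice of these two features are all assumptions made for the example.

```python
import math


def frame_signal(signal, frame_len, hop_len):
    """Split a sequence of samples into overlapping fixed-length frames.

    Hypothetical helper for illustration; trailing samples that do not
    fill a whole frame are dropped.
    """
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop_len):
        frames.append(signal[start:start + frame_len])
    return frames


def short_term_energy(frame):
    """Mean of the squared sample values in one frame."""
    return sum(s * s for s in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def extract_features(signal, frame_len, hop_len):
    """Return one (energy, zero-crossing rate) pair per frame."""
    return [
        (short_term_energy(f), zero_crossing_rate(f))
        for f in frame_signal(signal, frame_len, hop_len)
    ]


# Synthetic input (assumed for the example): one second of a 440 Hz
# sine wave sampled at 8 kHz, analyzed in 50 ms frames with a 25 ms hop.
fs = 8000
tone = [math.sin(2 * math.pi * 440 * n / fs) for n in range(fs)]
features = extract_features(tone, frame_len=400, hop_len=200)

for energy, zcr in features[:3]:
    print(f"energy={energy:.3f}  zcr={zcr:.3f}")
```

Each frame yields a small feature vector; sequences of such vectors are the kind of input that classification and segmentation methods typically operate on.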

Funded by projects

  • EC | RADIO
