Benetos, E.; Dixon, S.; Giannoulis, D.; Kirchhoff, H.; Klapuri, A. (2012)
Publisher: FEUP Edições
Languages: English
Types: Unknown
Subjects: M1, QA75
Automatic music transcription is considered by many to be the Holy Grail in the field of music signal analysis. However, the performance of transcription systems is still significantly below that of a human expert, and accuracies reported in recent years seem to have reached a limit, although the field is still very active. In this paper we analyse limitations of current methods and identify promising directions for future research. Current transcription methods use general-purpose models which are unable to capture the rich diversity found in music signals. In order to overcome the limited performance of transcription systems, algorithms have to be tailored to specific use cases. Semi-automatic approaches are another way of achieving a more reliable transcription. Also, the wealth of musical scores and corresponding audio data now available is a rich potential source of training data, via forced alignment of audio to scores, but large-scale utilisation of such data has yet to be attempted. Other promising approaches include the integration of information across different methods and musical aspects.
Funded by projects

  • EC | MIRES
