Sigtia, Siddharth; Benetos, Emmanouil; Boulanger-Lewandowski, Nicolas; Weyde, Tillman; Garcez, Artur S. d'Avila; Dixon, Simon (2014)
Languages: English
Types: Unknown
Subjects: RC0321, QA75, M, TA, Computer Science - Learning

Classified by OpenAIRE into

arxiv: Computer Science::Sound
We investigate the problem of incorporating higher-level symbolic score-like information into Automatic Music Transcription (AMT) systems to improve their performance. We use recurrent neural networks (RNNs) and their variants as music language models (MLMs) and present a generative architecture for combining these models with predictions from a frame-level acoustic classifier. We also compare different neural network architectures for acoustic modeling. The proposed model computes a distribution over possible output sequences given the acoustic input signal, and we present an algorithm for performing a global search for good candidate transcriptions. The performance of the proposed model is evaluated on piano music from the MAPS dataset, and we observe that the proposed model consistently outperforms existing transcription methods.
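The idea of combining frame-level acoustic predictions with a language model and then searching over whole output sequences can be illustrated with a small beam search. The sketch below is not the authors' model or algorithm: `toy_language_model` is a hypothetical stand-in for an RNN music language model, the scoring rule (adding acoustic and language-model log-probabilities per frame) is a simplifying assumption, and enumerating every note combination only works for a handful of pitches.

```python
import math
from itertools import product


def frame_log_prob(notes, probs, eps=1e-9):
    """Log-probability of a binary note vector under independent per-pitch probabilities."""
    return sum(
        math.log(max(p, eps)) if on else math.log(max(1.0 - p, eps))
        for on, p in zip(notes, probs)
    )


def toy_language_model(history, n_pitches):
    """Hypothetical stand-in for an RNN language model (assumption, not from the paper).

    Returns per-pitch probabilities for the next frame: notes that were on in the
    previous frame are likely to stay on, and notes that were off to stay off.
    """
    if not history:
        return [0.5] * n_pitches  # no context yet: uninformative prior
    return [0.8 if on else 0.2 for on in history[-1]]


def beam_search(acoustic_probs, lm_probs_fn, beam_width=4):
    """Search for a high-scoring note sequence, keeping `beam_width` partial hypotheses."""
    n_pitches = len(acoustic_probs[0])
    # Every on/off combination for one frame; feasible only for very small n_pitches.
    candidates = list(product((0, 1), repeat=n_pitches))
    beam = [(0.0, [])]  # (cumulative log-score, sequence of frames so far)
    for probs in acoustic_probs:
        expanded = []
        for score, seq in beam:
            prior = lm_probs_fn(seq, n_pitches)
            for cand in candidates:
                # Assumed scoring rule: acoustic evidence plus language-model prior.
                new_score = score + frame_log_prob(cand, probs) + frame_log_prob(cand, prior)
                expanded.append((new_score, seq + [cand]))
        expanded.sort(key=lambda item: item[0], reverse=True)
        beam = expanded[:beam_width]
    return beam[0]


# Two pitches, three frames; the acoustic classifier is unsure about pitch 0 in frame 2.
acoustic = [[0.9, 0.1], [0.4, 0.1], [0.9, 0.1]]
score, transcription = beam_search(acoustic, toy_language_model, beam_width=4)
print(transcription)
```

In this toy input, thresholding the acoustic probabilities at 0.5 frame by frame would switch pitch 0 off in the middle frame, whereas the combined search keeps the note sounding across all three frames because the language-model term favors continuity.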