Improvements on Deep Bottleneck Network based I-Vector Representation for Spoken Language Identification
Song, Yan; Cui, Ruilian; McLoughlin, Ian Vince; Dai, Li-Rong(2016)
Recently, the i-vector representation based on deep bottleneck networks (DBN) pre-trained for automatic speech recognition has received significant interest for both speaker verification (SV) and language identification (LID). In particular, a recent unified DBN based i-vector framework, referred to as DBN-pGMM i-vector, has performed well.\ud In this paper, we replace the pGMM with a phonetic mixture of factor analyzers (pMFA), and propose a new DBN-pMFA i-vector. The DBN-pMFA i-vector includes the following improvements: (i) a pMFA model is derived from the DBN, which can jointly perform feature dimension reduction and de-correlation in a single linear transformation, (ii) a shifted DBF, termed SDBF, is proposed to exploit the temporal contextual information, (iii) a senone selection scheme is proposed to improve the i-vector extraction efficiently.\ud We evaluate the proposed DBN-pMFA i-vector on the most confused six languages selected from NIST LRE 2009. The experimental results demonstrate that DBN-pMFA can consistently outperform the previous DBN based framework. The computational complexity can be significantly reduced by applying a simple senone selection scheme.
Kent Academic Repository (http://kar.kent.ac.uk/55056/1/Odyssey2016_song_6.pdf)