We experiment with a new method for creating synthetic models of rare and unseen triphones to supplement limited automatic speech recognition (ASR) training data. A trajectory model characterises seen transitions at the spectral level, and these models are then used to create features for unseen or rare triphones. We find that a fairly restricted model (piece-wise linear, with three line segments per channel of a diphone transition) represents the training data quite accurately. We report initial results from creating additional triphones for a single-speaker data set, finding small but significant gains, especially when adding further samples of rare (rather than unseen) triphones.
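To make the idea of a piece-wise linear trajectory with three line segments concrete, the sketch below fits such a curve to a single feature channel. It is only an illustration under stated assumptions, not the authors' procedure: the function name `fit_three_segments`, the requirement that the segments join continuously, and the brute-force search over frame-aligned breakpoints are all hypothetical choices made here.

```python
import itertools
import numpy as np


def fit_three_segments(y):
    """Fit a continuous piece-wise linear curve with three segments to y.

    Assumption (not from the source): breakpoints fall on frame indices and
    are found by exhaustive search; each candidate is scored by squared error.
    Returns (error, (b1, b2), fitted_values).
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    if n < 4:
        raise ValueError("need at least 4 frames for two interior breakpoints")
    t = np.arange(n, dtype=float)
    best = None
    # Try every ordered pair of interior breakpoints.
    for b1, b2 in itertools.combinations(range(1, n - 1), 2):
        # Basis: intercept, slope, and one hinge term per breakpoint.
        # The hinge terms change the slope at b1 and b2 while keeping
        # the curve continuous.
        X = np.column_stack([
            np.ones(n),
            t,
            np.maximum(t - b1, 0.0),
            np.maximum(t - b2, 0.0),
        ])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        fitted = X @ coef
        err = float(np.sum((y - fitted) ** 2))
        if best is None or err < best[0]:
            best = (err, (b1, b2), fitted)
    return best


# Toy channel: flat, then rising, then flat again (kinks at frames 3 and 7).
channel = np.array([0, 0, 0, 0, 1, 2, 3, 4, 4, 4, 4, 4], dtype=float)
error, breakpoints, fitted = fit_three_segments(channel)
print(breakpoints, round(error, 6))
```

In this sketch each spectral channel would be fitted independently, which matches the "per channel" wording above; how the fitted trajectories are then turned into features for rare or unseen triphones is not specified in this record and is left out.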
Reference:
Badenhorst, J. and Davel, M.H. 2015. Synthetic triphones from trajectory-based feature distributions. In: Pattern Recognition Association of South Africa and Robotics and Mechatronics International Conference (PRASA-RobTech), Port Elizabeth, South Africa, 25-26 November 2015. IEEE. http://hdl.handle.net/10204/8737