We improve on a piece-wise linear model of the trajectories of Mel Frequency Cepstral Coefficients, which are commonly used as features in Automatic Speech Recognition. For this purpose, we have created a very clean single-speaker corpus, which is ideal for the investigation of contextual effects on cepstral trajectories. We show that modelling improvements, such as continuity constraints on parameter values and more flexible transition models, systematically improve the robustness of our trajectory models. However, the parameter estimates remain unexpectedly variable within triphone contexts, suggesting interesting challenges for further exploration.
Reference:
Badenhorst, J, Davel, MH and Barnard, E. 2012. Improved transition models for cepstral trajectories. 23rd Annual Symposium of the Pattern Recognition Association of South Africa (PRASA), Pretoria, South Africa, 29-30 November 2012. http://hdl.handle.net/10204/6466