Govender, Avashna
De Wet, Febe
2017-06-07
2016-12
Govender, A. and De Wet, F. 2016. Objective measures to improve the selection of training speakers in HMM-based child speech synthesis. 2016 Pattern Recognition Association of South Africa and Robotics and Mechatronics International Conference, 30 November - 2 December 2016, Stellenbosch, South Africa, p. 25-30. DOI: 10.1109/RoboMech.2016.7813193
978-1-5090-3335-5
http://ieeexplore.ieee.org/document/7813193/
http://hdl.handle.net/10204/9179
Building synthetic child voices is considered a difficult task due to the challenges associated with data collection. As a result, speaker adaptation in conjunction with Hidden Markov Model (HMM)-based synthesis has become prevalent in this domain because the approach caters for limited amounts of data. An initial average voice model is trained using data from multiple speakers and adapted to resemble a specific target child speaker. Due to the scarcity of child speech data, initial models used in this approach are mostly trained with adult speech data. However, selection of appropriate training speakers from large corpora is not a trivial task because there is no means, other than conducting exhaustive subjective listening tests, to determine which training speakers will yield the best quality synthetic child voice. Therefore, there is a need to find an objective measure that can be used to easily identify a small set of training speakers that will yield the best quality output. In this paper we investigate whether a relationship exists between objective and subjective voice evaluation measures with regard to the selection of training speakers for an average voice model used in speaker-adaptive HMM child speech synthesis.
Results indicate that, if training speakers that are closer to the target speaker are used to train initial models, better quality child voices are generated.
en
Synthetic child voices
Hidden Markov Model
Objective measures to improve the selection of training speakers in HMM-based child speech synthesis
Conference Presentation
Govender, A., & De Wet, F. (2016). Objective measures to improve the selection of training speakers in HMM-based child speech synthesis. IEEE. http://hdl.handle.net/10204/9179
Govender, Avashna, and Febe De Wet. "Objective measures to improve the selection of training speakers in HMM-based child speech synthesis." (2016): http://hdl.handle.net/10204/9179
Govender A, De Wet F, Objective measures to improve the selection of training speakers in HMM-based child speech synthesis; IEEE; 2016. http://hdl.handle.net/10204/9179
TY - Conference Presentation
AU - Govender, Avashna
AU - De Wet, Febe
DA - 2016-12
DB - ResearchSpace
DP - CSIR
KW - Synthetic child voices
KW - Hidden Markov Model
LK - https://researchspace.csir.co.za
PY - 2016
SM - 978-1-5090-3335-5
T1 - Objective measures to improve the selection of training speakers in HMM-based child speech synthesis
TI - Objective measures to improve the selection of training speakers in HMM-based child speech synthesis
UR - http://hdl.handle.net/10204/9179
ER -