Title: Unsupervised acoustic model training: comparing South African English and isiZulu
Authors: Kleynhans, N.; De Wet, Febe; Barnard, E.
Type: Conference Presentation
Date: 2015-11
Conference: Pattern Recognition Association of South Africa and Robotics and Mechatronics International Conference (PRASA-RobMech), Port Elizabeth, South Africa, 25-26 November 2015
Publisher link: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7359512&tag=1
Handle: http://hdl.handle.net/10204/8629
Copyright: 2015 by IEEE
Language: en

Abstract: Large amounts of untranscribed audio data are generated every day. These audio resources can be used to develop robust acoustic models for a variety of speech-based systems. Manually transcribing this data is resource-intensive, requiring funding, time and expertise. Lightly-supervised training techniques, however, provide a means to rapidly transcribe audio, reducing the initial resource investment needed to begin the modelling process. Our findings suggest that the lightly-supervised training technique works well for English, but for an agglutinative language such as isiZulu it fails to achieve comparable performance. Additionally, phone-based performance is significantly worse than that of an approach using word-based language models. These results indicate that lightly-supervised training techniques depend strongly on large or well-matched text resources.

Keywords: Untranscribed audio data; isiZulu; Word-based language models; Lightly-supervised training; Unsupervised training; Automatic transcription generation; Audio harvesting; English language

Citation: Kleynhans, N., De Wet, F., & Barnard, E. (2015). Unsupervised acoustic model training: comparing South African English and isiZulu. In: Pattern Recognition Association of South Africa and Robotics and Mechatronics International Conference (PRASA-RobMech), Port Elizabeth, South Africa, 25-26 November 2015. IEEE. http://hdl.handle.net/10204/8629