The NCHLT speech corpus of the South African languages

dc.contributor.author Barnard, E
dc.contributor.author Davel, MH
dc.contributor.author Van Heerden, C
dc.contributor.author De Wet, Febe
dc.contributor.author Badenhorst, J
dc.date.accessioned 2014-07-30T09:25:09Z
dc.date.available 2014-07-30T09:25:09Z
dc.date.issued 2014-05
dc.identifier.citation Barnard, E, Davel, M.H, Van Heerden, C, De Wet, F and Badenhorst, J. 2014. The NCHLT speech corpus of the South African languages. In: 4th International Workshop on Spoken Language Technologies for Under-Resourced Languages, St Petersburg, Russia, 14-16 May 2014 en_US
dc.identifier.isbn 978-5-8088-0908-6
dc.identifier.uri http://mica.edu.vn/sltu2014/proceedings/28.pdf
dc.identifier.uri http://hdl.handle.net/10204/7549
dc.description 4th International Workshop on Spoken Language Technologies for Under-Resourced Languages, St Petersburg, Russia, 14-16 May 2014 en_US
dc.description.abstract The NCHLT speech corpus contains wide-band speech from approximately 200 speakers per language, in each of the eleven official languages of South Africa. We describe the design and development processes that were undertaken in order to develop the corpus, and report on associated materials such as orthographic transcriptions and pronunciation dictionaries that were released as part of the corpus. In order to benchmark speech recognition performance on the corpus, we have also developed both phone-recognition and word-recognition systems for all eleven languages; we find that high accuracies can be achieved for these speaker-independent but vocabulary-dependent recognition tasks in all languages. en_US
dc.language.iso en en_US
dc.relation.ispartofseries Workflow;13145
dc.subject Automatic Speech Recognition en_US
dc.subject ASR en_US
dc.subject Text-to-speech en_US
dc.subject TTS en_US
dc.subject South African languages en_US
dc.subject Spoken language technologies en_US
dc.subject Under-resourced languages en_US
dc.title The NCHLT speech corpus of the South African languages en_US
dc.type Conference Presentation en_US
dc.identifier.apacitation Barnard, E., Davel, M., Van Heerden, C., De Wet, F., & Badenhorst, J. (2014). The NCHLT speech corpus of the South African languages. http://hdl.handle.net/10204/7549 en_ZA
dc.identifier.chicagocitation Barnard, E, MH Davel, C Van Heerden, Febe De Wet, and J Badenhorst. "The NCHLT speech corpus of the South African languages." (2014): http://hdl.handle.net/10204/7549 en_ZA
dc.identifier.vancouvercitation Barnard E, Davel M, Van Heerden C, De Wet F, Badenhorst J. The NCHLT speech corpus of the South African languages; 2014. http://hdl.handle.net/10204/7549 en_ZA
dc.identifier.ris TY - Conference Presentation AU - Barnard, E AU - Davel, MH AU - Van Heerden, C AU - De Wet, Febe AU - Badenhorst, J AB - The NCHLT speech corpus contains wide-band speech from approximately 200 speakers per language, in each of the eleven official languages of South Africa. We describe the design and development processes that were undertaken in order to develop the corpus, and report on associated materials such as orthographic transcriptions and pronunciation dictionaries that were released as part of the corpus. In order to benchmark speech recognition performance on the corpus, we have also developed both phone-recognition and word-recognition systems for all eleven languages; we find that high accuracies can be achieved for these speaker-independent but vocabulary-dependent recognition tasks in all languages. DA - 2014-05 DB - ResearchSpace DP - CSIR KW - Automatic Speech Recognition KW - ASR KW - Text-to-speech KW - TTS KW - South African languages KW - Spoken language technologies KW - Under-resourced languages LK - https://researchspace.csir.co.za PY - 2014 SM - 978-5-8088-0908-6 T1 - The NCHLT speech corpus of the South African languages TI - The NCHLT speech corpus of the South African languages UR - http://hdl.handle.net/10204/7549 ER - en_ZA

