ResearchSpace

Aligning Audio Samples from the South African Parliament with Hansard Transcriptions

dc.contributor.author Kleynhans, N
dc.contributor.author De Wet, Febe
dc.date.accessioned 2015-03-18T12:10:16Z
dc.date.available 2015-03-18T12:10:16Z
dc.date.issued 2014-11
dc.identifier.citation Kleynhans, N and De Wet, F. 2014. Aligning Audio Samples from the South African Parliament with Hansard Transcriptions. Proceedings of the 2014 PRASA, RobMech and AfLaT International Joint Symposium, Cape Town, South Africa, 27-28 November 2014, pp 122-127 en_US
dc.identifier.isbn 978-0-620-62617-0
dc.identifier.uri http://hdl.handle.net/10204/7961
dc.description Proceedings of the 2014 PRASA, RobMech and AfLaT International Joint Symposium, Cape Town, South Africa, 27-28 November 2014 en_US
dc.description.abstract Most of the developing world can still be classified as under-resourced in terms of its languages. Harvesting suitable and relatively easily accessible spoken resources can drastically improve the situation. One such resource is parliamentary sessions, which in general are publicly available and are most often manually transcribed. In this investigation we present an automatic harvesting procedure which makes use of the “islands of certainty” principle to segment long utterances into shorter, more manageable chunks, and a garbage model to improve alignment by absorbing superfluous speech. The final harvesting approach was used to harvest 50 hours of South African Parliament audio data from a total of 105 hours of raw audio data, at a GOP score of 1.94. The word alignment accuracy, measured on two parliamentary sessions, showed that over 96% of the words are within 1.07 seconds of the true position in the audio stream. en_US
dc.language.iso en en_US
dc.publisher Pattern Recognition Association of South Africa en_US
dc.relation.ispartofseries Workflow;14052
dc.subject Audio sample alignment en_US
dc.subject Hansard transcriptions en_US
dc.subject South African Parliament audio data en_US
dc.subject National Centre for Human Language Technology en_US
dc.title Aligning Audio Samples from the South African Parliament with Hansard Transcriptions en_US
dc.type Conference Presentation en_US
dc.identifier.apacitation Kleynhans, N., & De Wet, F. (2014). Aligning Audio Samples from the South African Parliament with Hansard Transcriptions. Pattern Recognition Association of South Africa. http://hdl.handle.net/10204/7961 en_ZA
dc.identifier.chicagocitation Kleynhans, N, and Febe De Wet. "Aligning Audio Samples from the South African Parliament with Hansard Transcriptions." (2014): http://hdl.handle.net/10204/7961 en_ZA
dc.identifier.vancouvercitation Kleynhans N, De Wet F. Aligning Audio Samples from the South African Parliament with Hansard Transcriptions; Pattern Recognition Association of South Africa; 2014. http://hdl.handle.net/10204/7961 en_ZA
dc.identifier.ris TY - Conference Presentation AU - Kleynhans, N AU - De Wet, Febe AB - Most of the developing world can still be classified as under-resourced in terms of its languages. Harvesting suitable and relatively easily accessible spoken resources can drastically improve the situation. One such resource is parliamentary sessions, which in general are publicly available and are most often manually transcribed. In this investigation we present an automatic harvesting procedure which makes use of the “islands of certainty” principle to segment long utterances into shorter, more manageable chunks, and a garbage model to improve alignment by absorbing superfluous speech. The final harvesting approach was used to harvest 50 hours of South African Parliament audio data from a total of 105 hours of raw audio data, at a GOP score of 1.94. The word alignment accuracy, measured on two parliamentary sessions, showed that over 96% of the words are within 1.07 seconds of the true position in the audio stream. DA - 2014-11 DB - ResearchSpace DP - CSIR KW - Audio sample alignment KW - Hansard transcriptions KW - South African Parliament audio data KW - National Centre for Human Language Technology LK - https://researchspace.csir.co.za PY - 2014 SM - 978-0-620-62617-0 T1 - Aligning Audio Samples from the South African Parliament with Hansard Transcriptions TI - Aligning Audio Samples from the South African Parliament with Hansard Transcriptions UR - http://hdl.handle.net/10204/7961 ER - en_ZA

