dc.contributor.author |
Kleynhans, N
|
|
dc.contributor.author |
De Wet, Febe
|
|
dc.date.accessioned |
2015-03-18T12:10:16Z |
|
dc.date.available |
2015-03-18T12:10:16Z |
|
dc.date.issued |
2014-11 |
|
dc.identifier.citation |
Kleynhans, N and De Wet, F. 2014. Aligning Audio Samples from the South African Parliament with Hansard Transcriptions. Proceedings of the 2014 PRASA, RobMech and AfLaT International Joint Symposium, Cape Town, South Africa, 27-28 November 2014, pp 122-127 |
en_US |
dc.identifier.isbn |
978-0-620-62617-0 |
|
dc.identifier.uri |
http://hdl.handle.net/10204/7961
|
|
dc.description |
Proceedings of the 2014 PRASA, RobMech and AfLaT International Joint Symposium, Cape Town, South Africa, 27-28 November 2014 |
en_US |
dc.description.abstract |
Most of the developing world can still be classified as under-resourced in terms of their languages. Harvesting suitable and relatively easily accessible spoken resources can drastically improve the situation. One such resource is parliamentary sessions, which in general are publicly available and are most often manually transcribed. In this investigation we present an automatic harvesting procedure which makes use of the “islands of certainty” principle to segment long utterances into more manageable shorter chunks and a garbage model to improve alignment by absorbing superfluous speech. The final harvesting approach was used to harvest 50 hours of South African Parliament audio data from a total of 105 hours of raw audio data, at a GOP score of 1.94. The word alignment accuracy, performed on two parliamentary sessions, showed that over 96% of the words are within 1.07 seconds of the true position in the audio stream. |
en_US |
dc.language.iso |
en |
en_US |
dc.publisher |
Pattern Recognition Association of South Africa |
en_US |
dc.relation.ispartofseries |
Workflow;14052 |
|
dc.subject |
Audio sample alignment |
en_US |
dc.subject |
Hansard transcriptions |
en_US |
dc.subject |
South African Parliament audio data |
en_US |
dc.subject |
National Centre for Human Language Technology |
en_US |
dc.title |
Aligning Audio Samples from the South African Parliament with Hansard Transcriptions |
en_US |
dc.type |
Conference Presentation |
en_US |
dc.identifier.apacitation |
Kleynhans, N., & De Wet, F. (2014). Aligning Audio Samples from the South African Parliament with Hansard Transcriptions. Pattern Recognition Association of South Africa. http://hdl.handle.net/10204/7961 |
en_ZA |
dc.identifier.chicagocitation |
Kleynhans, N, and Febe De Wet. "Aligning Audio Samples from the South African Parliament with Hansard Transcriptions." (2014): http://hdl.handle.net/10204/7961 |
en_ZA |
dc.identifier.vancouvercitation |
Kleynhans N, De Wet F. Aligning Audio Samples from the South African Parliament with Hansard Transcriptions. Pattern Recognition Association of South Africa; 2014. http://hdl.handle.net/10204/7961 |
en_ZA |
dc.identifier.ris |
TY - Conference Presentation
AU - Kleynhans, N
AU - De Wet, Febe
AB - Most of the developing world can still be classified as under-resourced in terms of their languages. Harvesting suitable and relatively easily accessible spoken resources can drastically improve the situation. One such resource is parliamentary sessions, which in general are publicly available and are most often manually transcribed. In this investigation we present an automatic harvesting procedure which makes use of the “islands of certainty” principle to segment long utterances into more manageable shorter chunks and a garbage model to improve alignment by absorbing superfluous speech. The final harvesting approach was used to harvest 50 hours of South African Parliament audio data from a total of 105 hours of raw audio data, at a GOP score of 1.94. The word alignment accuracy, performed on two parliamentary sessions, showed that over 96% of the words are within 1.07 seconds of the true position in the audio stream.
DA - 2014-11
DB - ResearchSpace
DP - CSIR
KW - Audio sample alignment
KW - Hansard transcriptions
KW - South African Parliament audio data
KW - National Centre for Human Language Technology
LK - https://researchspace.csir.co.za
PY - 2014
SM - 978-0-620-62617-0
T1 - Aligning Audio Samples from the South African Parliament with Hansard Transcriptions
TI - Aligning Audio Samples from the South African Parliament with Hansard Transcriptions
UR - http://hdl.handle.net/10204/7961
ER -
|
en_ZA |