dc.contributor.author |
De Vries, NJ
|
|
dc.contributor.author |
Davel, MH
|
|
dc.contributor.author |
Badenhorst, J
|
|
dc.contributor.author |
Basson, WD
|
|
dc.contributor.author |
De Wet, Febe
|
|
dc.contributor.author |
Barnard, E
|
|
dc.contributor.author |
De Waal, A
|
|
dc.date.accessioned |
2014-01-24T10:14:54Z |
|
dc.date.available |
2014-01-24T10:14:54Z |
|
dc.date.issued |
2014-01 |
|
dc.identifier.citation |
De Vries, N.J., Davel, M.H., Badenhorst, J., Basson, W.D., De Wet, F., Barnard, E. and De Waal, A. 2013. A smartphone-based ASR data collection tool for under-resourced languages. Speech Communication, vol. 56, pp. 119-131 |
en_US |
dc.identifier.issn |
0167-6393 |
|
dc.identifier.uri |
http://ac.els-cdn.com/S0167639313000915/1-s2.0-S0167639313000915-main.pdf?_tid=a94337ca-8425-11e3-a98c-00000aab0f6c&acdnat=1390478484_e5cbae971fe2966b364e5b8c4b3bfc57
|
|
dc.identifier.uri |
http://hdl.handle.net/10204/7179
|
|
dc.description |
Copyright: 2013 Elsevier. This is an ABSTRACT ONLY. The definitive version is published in Speech Communication, vol. 56, pp. 119-131 |
en_US |
dc.description.abstract |
Acoustic data collection for automatic speech recognition (ASR) purposes is a particularly challenging task when working with under-resourced languages, many of which are found in the developing world. We provide a brief overview of related data collection strategies, highlighting some of the salient issues pertaining to collecting ASR data for under-resourced languages. We then describe the development of a smartphone-based data collection tool, Woefzela, which is designed to function in a developing-world context. Specifically, this tool is designed to function without any Internet connectivity, while remaining portable and allowing for the collection of multiple sessions in parallel; it also simplifies data collection by providing process support to the various role players, and performs on-device quality control to maximise the use of recording opportunities. The use of the tool is demonstrated as part of a South African data collection project, during which almost 800 hours of ASR data was collected, often in remote, rural areas, and subsequently used to successfully build acoustic models for eleven languages. The on-device quality control mechanism (referred to as QC-on-the-go) is an interesting aspect of the Woefzela tool and we discuss this functionality in more detail. We experiment with different uses of quality control information, and evaluate the impact of these on ASR accuracy. Woefzela was developed for the Android Operating System and is freely available for use on Android smartphones. |
en_US |
dc.language.iso |
en |
en_US |
dc.publisher |
Elsevier |
en_US |
dc.relation.ispartofseries |
Workflow;11636 |
|
dc.subject |
Automatic speech recognition |
en_US |
dc.subject |
ASR |
en_US |
dc.subject |
ASR data collection |
en_US |
dc.subject |
Smartphones |
en_US |
dc.subject |
Woefzela |
en_US |
dc.subject |
Speech resources |
en_US |
dc.subject |
Speech data collection |
en_US |
dc.subject |
Broadband speech corpora |
en_US |
dc.subject |
On-device quality control |
en_US |
dc.subject |
QC-on-the-go |
en_US |
dc.subject |
Android |
en_US |
dc.subject |
Under-resourced languages |
en_US |
dc.title |
A smartphone-based ASR data collection tool for under-resourced languages |
en_US |
dc.type |
Article |
en_US |
dc.identifier.apacitation |
De Vries, N., Davel, M., Badenhorst, J., Basson, W., De Wet, F., Barnard, E., & De Waal, A. (2014). A smartphone-based ASR data collection tool for under-resourced languages. http://hdl.handle.net/10204/7179 |
en_ZA |
dc.identifier.chicagocitation |
De Vries, NJ, MH Davel, J Badenhorst, WD Basson, Febe De Wet, E Barnard, and A De Waal "A smartphone-based ASR data collection tool for under-resourced languages." (2014) http://hdl.handle.net/10204/7179 |
en_ZA |
dc.identifier.vancouvercitation |
De Vries N, Davel M, Badenhorst J, Basson W, De Wet F, Barnard E, et al. A smartphone-based ASR data collection tool for under-resourced languages. 2014; http://hdl.handle.net/10204/7179. |
en_ZA |
dc.identifier.ris |
TY - Article
AU - De Vries, NJ
AU - Davel, MH
AU - Badenhorst, J
AU - Basson, WD
AU - De Wet, Febe
AU - Barnard, E
AU - De Waal, A
AB - Acoustic data collection for automatic speech recognition (ASR) purposes is a particularly challenging task when working with under-resourced languages, many of which are found in the developing world. We provide a brief overview of related data collection strategies, highlighting some of the salient issues pertaining to collecting ASR data for under-resourced languages. We then describe the development of a smartphone-based data collection tool, Woefzela, which is designed to function in a developing-world context. Specifically, this tool is designed to function without any Internet connectivity, while remaining portable and allowing for the collection of multiple sessions in parallel; it also simplifies data collection by providing process support to the various role players, and performs on-device quality control to maximise the use of recording opportunities. The use of the tool is demonstrated as part of a South African data collection project, during which almost 800 hours of ASR data was collected, often in remote, rural areas, and subsequently used to successfully build acoustic models for eleven languages. The on-device quality control mechanism (referred to as QC-on-the-go) is an interesting aspect of the Woefzela tool and we discuss this functionality in more detail. We experiment with different uses of quality control information, and evaluate the impact of these on ASR accuracy. Woefzela was developed for the Android Operating System and is freely available for use on Android smartphones.
DA - 2014-01
DB - ResearchSpace
DP - CSIR
KW - Automatic speech recognition
KW - ASR
KW - ASR data collection
KW - Smartphones
KW - Woefzela
KW - Speech resources
KW - Speech data collection
KW - Broadband speech corpora
KW - On-device quality control
KW - QC-on-the-go
KW - Android
KW - Under-resourced languages
LK - https://researchspace.csir.co.za
PY - 2014
SM - 0167-6393
T1 - A smartphone-based ASR data collection tool for under-resourced languages
TI - A smartphone-based ASR data collection tool for under-resourced languages
UR - http://hdl.handle.net/10204/7179
ER -
|
en_ZA |