ResearchSpace

A smartphone-based ASR data collection tool for under-resourced languages

dc.contributor.author De Vries, NJ
dc.contributor.author Davel, MH
dc.contributor.author Badenhorst, J
dc.contributor.author Basson, WD
dc.contributor.author De Wet, Febe
dc.contributor.author Barnard, E
dc.contributor.author De Waal, A
dc.date.accessioned 2014-01-24T10:14:54Z
dc.date.available 2014-01-24T10:14:54Z
dc.date.issued 2014-01
dc.identifier.citation De Vries, N.J, Davel, M.H, Badenhorst, J, Basson, W.D, De Wet, F, Barnard, E and De Waal, A. 2013. A smartphone-based ASR data collection tool for under-resourced languages. Speech Communication, vol. 56, pp 119-131 en_US
dc.identifier.issn 0167-6393
dc.identifier.uri http://ac.els-cdn.com/S0167639313000915/1-s2.0-S0167639313000915-main.pdf?_tid=a94337ca-8425-11e3-a98c-00000aab0f6c&acdnat=1390478484_e5cbae971fe2966b364e5b8c4b3bfc57
dc.identifier.uri http://hdl.handle.net/10204/7179
dc.description Copyright: 2013 Elsevier. This is an ABSTRACT ONLY. The definitive version is published in Speech Communication, vol. 56, pp 119-131 en_US
dc.description.abstract Acoustic data collection for automatic speech recognition (ASR) purposes is a particularly challenging task when working with under-resourced languages, many of which are found in the developing world. We provide a brief overview of related data collection strategies, highlighting some of the salient issues pertaining to collecting ASR data for under-resourced languages. We then describe the development of a smartphone-based data collection tool, Woefzela, which is designed to function in a developing world context. Specifically, this tool is designed to function without any Internet connectivity, while remaining portable and allowing for the collection of multiple sessions in parallel; it also simplifies the data collection process by providing process support to various role players during the data collection process, and performs on-device quality control in order to maximise the use of recording opportunities. The use of the tool is demonstrated as part of a South African data collection project, during which almost 800 hours of ASR data was collected, often in remote, rural areas, and subsequently used to successfully build acoustic models for eleven languages. The on-device quality control mechanism (referred to as QC-on-the-go) is an interesting aspect of the Woefzela tool and we discuss this functionality in more detail. We experiment with different uses of quality control information, and evaluate the impact of these on ASR accuracy. Woefzela was developed for the Android Operating System and is freely available for use on Android smartphones. en_US
dc.language.iso en en_US
dc.publisher Elsevier en_US
dc.relation.ispartofseries Workflow;11636
dc.subject Automatic speech recognition en_US
dc.subject ASR en_US
dc.subject ASR data collection en_US
dc.subject Smartphones en_US
dc.subject Woefzela en_US
dc.subject Speech resources en_US
dc.subject Speech data collection en_US
dc.subject Broadband speech corpora en_US
dc.subject On-device quality control en_US
dc.subject QC-on-the-go en_US
dc.subject Android en_US
dc.subject Under-resourced languages en_US
dc.title A smartphone-based ASR data collection tool for under-resourced languages en_US
dc.type Article en_US
dc.identifier.apacitation De Vries, N., Davel, M., Badenhorst, J., Basson, W., De Wet, F., Barnard, E., & De Waal, A. (2014). A smartphone-based ASR data collection tool for under-resourced languages. http://hdl.handle.net/10204/7179 en_ZA
dc.identifier.chicagocitation De Vries, NJ, MH Davel, J Badenhorst, WD Basson, Febe De Wet, E Barnard, and A De Waal "A smartphone-based ASR data collection tool for under-resourced languages." (2014) http://hdl.handle.net/10204/7179 en_ZA
dc.identifier.vancouvercitation De Vries N, Davel M, Badenhorst J, Basson W, De Wet F, Barnard E, et al. A smartphone-based ASR data collection tool for under-resourced languages. 2014; http://hdl.handle.net/10204/7179. en_ZA
dc.identifier.ris TY - Article
AU - De Vries, NJ
AU - Davel, MH
AU - Badenhorst, J
AU - Basson, WD
AU - De Wet, Febe
AU - Barnard, E
AU - De Waal, A
AB - Acoustic data collection for automatic speech recognition (ASR) purposes is a particularly challenging task when working with under-resourced languages, many of which are found in the developing world. We provide a brief overview of related data collection strategies, highlighting some of the salient issues pertaining to collecting ASR data for under-resourced languages. We then describe the development of a smartphone-based data collection tool, Woefzela, which is designed to function in a developing world context. Specifically, this tool is designed to function without any Internet connectivity, while remaining portable and allowing for the collection of multiple sessions in parallel; it also simplifies the data collection process by providing process support to various role players during the data collection process, and performs on-device quality control in order to maximise the use of recording opportunities. The use of the tool is demonstrated as part of a South African data collection project, during which almost 800 hours of ASR data was collected, often in remote, rural areas, and subsequently used to successfully build acoustic models for eleven languages. The on-device quality control mechanism (referred to as QC-on-the-go) is an interesting aspect of the Woefzela tool and we discuss this functionality in more detail. We experiment with different uses of quality control information, and evaluate the impact of these on ASR accuracy. Woefzela was developed for the Android Operating System and is freely available for use on Android smartphones.
DA - 2014-01
DB - ResearchSpace
DP - CSIR
KW - Automatic speech recognition
KW - ASR
KW - ASR data collection
KW - Smartphones
KW - Woefzela
KW - Speech resources
KW - Speech data collection
KW - Broadband speech corpora
KW - On-device quality control
KW - QC-on-the-go
KW - Android
KW - Under-resourced languages
LK - https://researchspace.csir.co.za
PY - 2014
SM - 0167-6393
T1 - A smartphone-based ASR data collection tool for under-resourced languages
TI - A smartphone-based ASR data collection tool for under-resourced languages
UR - http://hdl.handle.net/10204/7179
ER - en_ZA

