Building transcribed speech corpora for under-resourced languages plays a pivotal role in developing speech technologies for such languages. The authors have developed an open-source tool for devices running the Android operating system to facilitate the efficient collection of speech data for Automatic Speech Recognition system development. The tool was designed for use in typical developing-world conditions; they present the relevant design choices and analyse the effectiveness of this tool by means of a case study. In particular, they introduce a novel semi-real-time quality monitoring system, which increases the efficiency of the data collection process.
Reference:
De Vries, NJ, Badenhorst, J, Davel, MH, Barnard, E and De Waal, A. 2011. Woefzela - An open-source platform for ASR data collection in the developing world. INTERSPEECH 2011, Florence, Italy, 27-31 August 2011. http://hdl.handle.net/10204/5149