Speech data collection in an under-resourced language within a multilingual context

Molapo, B; Barnard, E; De Wet, Febe


http://hdl.handle.net/10204/7621

Abstract:

In this paper, we present an end-to-end solution for developing an automatic speech recognition (ASR) system for typical under-resourced languages, where the target language is likely to be influenced by one or more embedded foreign languages. We first describe the collection and processing of a text corpus crawled from the World Wide Web using the Rapid Language Adaptation Toolkit. In particular, we highlight the challenges that arise when foreign languages are embedded within the matrix language. We then discuss our speech data collection efforts in under-resourced environments. Finally, we report on a strategy called transliteration that helps improve the recognition results of our grapheme-based ASR system in the presence of embedded-language words.
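The abstract does not give the transliteration rules, so the following is only a rough illustration of the general idea, under stated assumptions. In a grapheme-based system each word is "pronounced" as its sequence of letters, so an embedded foreign word can be respelled in the matrix-language orthography before its lexicon entry is generated. The rule table `TRANSLIT_RULES`, the functions `transliterate` and `build_lexicon`, the example words, and the assumption that embedded words are already flagged are all hypothetical and are not taken from the paper.

```python
# Hypothetical substitution rules (NOT from the paper): each pair maps a
# grapheme sequence in the embedded language to a spelling in the
# matrix-language orthography. Rules are applied in list order.
TRANSLIT_RULES = [
    ("ph", "f"),
    ("c", "k"),
    ("y", "i"),
]


def transliterate(word: str, rules=TRANSLIT_RULES) -> str:
    """Respell an embedded-language word by applying each rule in turn."""
    out = word.lower()
    for src, dst in rules:
        out = out.replace(src, dst)
    return out


def build_lexicon(words, embedded) -> dict:
    """Grapheme-based lexicon: each word maps to its list of letters.

    Words in `embedded` (assumed to be flagged beforehand) are transliterated
    first, so their graphemes follow matrix-language spelling conventions.
    """
    lexicon = {}
    for w in words:
        spelling = transliterate(w) if w in embedded else w.lower()
        lexicon[w] = list(spelling)
    return lexicon


lex = build_lexicon(["phone", "lumela"], embedded={"phone"})
print(lex["phone"])   # ['f', 'o', 'n', 'e']
print(lex["lumela"])  # ['l', 'u', 'm', 'e', 'l', 'a']
```

Applying plain string replacements in sequence is the simplest possible design; because the output of one rule can be rewritten by a later rule, a real rule set would need careful ordering or a single-pass, longest-match rewriter.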

Reference:

Molapo, B and Barnard, E and De Wet, F. 2014. Speech data collection in an under-resourced language within a multilingual context. In: 4th International Workshop on Spoken Language Technologies for Under-resourced Languages, St Petersburg, Russia, 14-16 May 2014

Molapo, B., Barnard, E., & De Wet, F. (2014). Speech data collection in an under-resourced language within a multilingual context. International Research Institute. http://hdl.handle.net/10204/7621

Molapo, B, E Barnard, and Febe De Wet. "Speech data collection in an under-resourced language within a multilingual context." (2014): http://hdl.handle.net/10204/7621

Molapo B, Barnard E, De Wet F, Speech data collection in an under-resourced language within a multilingual context; International Research Institute; 2014. http://hdl.handle.net/10204/7621 .


4th International Workshop on Spoken Language Technologies for Under-resourced Languages, St Petersburg, Russia, 14-16 May 2014

Molapo, B
Barnard, E
De Wet, Febe

May 2014

Under-resourced languages
Transliteration
Matrix language
Grapheme-based ASR


Files in this item

Molapo_2014.pdf

This item appears in the following Collection(s)

Conference Publications


