Please use this identifier to cite or link to this item: http://hdl.handle.net/10204/951

Title: Text-based language identification for the South African languages
Authors: Botha, G
Zimu, V
Barnard, E
Keywords: Language identification systems
Official languages
Support Vector Machine
Issue Date: Nov-2006
Citation: Botha, G, Zimu, V and Barnard, E. 2006. Text-based language identification for the South African languages. 17th Annual Symposium of the Pattern Recognition Association of South Africa, Parys, South Africa, 29 Nov - 1 Dec 2006, pp 7
Abstract: The authors investigate the performance of text-based language identification systems on the 11 official languages of South Africa when n-gram statistics are used as features for classification. In particular, the authors compare support vector machines (SVMs) and likelihood-based classifiers on different amounts of input text, drawn from both a closed domain and an open domain. With as few as 15 words of input text, reliable language identification is possible. Although the SVM is generally the more accurate classifier, the additional computational complexity of training it may not be justified in light of the importance of using a large value for n.
Description: This paper was later published in the SAIEE Africa Research Journal, Vol 98(4), pp 141-146
URI: http://hdl.handle.net/10204/951
Appears in Collections:Human language technologies
General science, engineering & technology
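
The likelihood-based classification over n-gram statistics mentioned in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names, the use of character trigrams, add-one smoothing, and the two toy training strings (the paper itself covers 11 languages and real corpora).

```python
import math
from collections import Counter


def char_ngrams(text, n):
    """Return all overlapping character n-grams of the lowercased text."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def train(samples, n=3):
    """Count n-grams per language; `samples` maps language name -> training text."""
    return {lang: Counter(char_ngrams(text, n)) for lang, text in samples.items()}


def log_likelihood(text, counts, vocab_size, n=3):
    """Sum of add-one-smoothed log probabilities of the text's n-grams."""
    total = sum(counts.values())
    return sum(
        math.log((counts[g] + 1) / (total + vocab_size))
        for g in char_ngrams(text, n)
    )


def identify(text, models, n=3):
    """Return the language whose n-gram model gives the highest log-likelihood."""
    vocab = set()
    for counts in models.values():
        vocab.update(counts)
    vocab_size = len(vocab) + 1  # one extra slot for n-grams never seen in training
    return max(
        models,
        key=lambda lang: log_likelihood(text, models[lang], vocab_size, n),
    )


# Toy training data (hypothetical, far too small for real use).
SAMPLES = {
    "english": "the cat sits on the mat and the dog sleeps in the house",
    "afrikaans": "die kat sit op die mat en die hond slaap in die huis",
}
MODELS = train(SAMPLES, n=3)

if __name__ == "__main__":
    print(identify("the dog is in the house", MODELS))
    print(identify("die hond is in die huis", MODELS))
```

Training such a model is a single counting pass over the text, which is cheap compared with fitting an SVM on the same n-gram features; raising `n` enlarges the count tables but leaves the procedure unchanged.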

Files in This Item:

File: Botha_2006.pdf
Size: 72.54 kB
Format: Adobe PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.

