ResearchSpace

Text-based language identification for the South African languages


dc.contributor.author Botha, G
dc.contributor.author Zimu, V
dc.contributor.author Barnard, E
dc.date.accessioned 2007-07-04T06:15:37Z
dc.date.available 2007-07-04T06:15:37Z
dc.date.issued 2006-11
dc.identifier.citation Botha, G, Zimu, V and Barnard, E. 2006. Text-based language identification for the South African languages. 17th Annual Symposium of the Pattern Recognition Association of South Africa, Parys, South Africa, 29 Nov - 1 Dec 2006, pp 7 en
dc.identifier.uri http://hdl.handle.net/10204/951
dc.description This paper was later published in the SAIEE Africa Research Journal, Vol 98(4), pp 141-146
dc.description.abstract The authors investigate the performance of text-based language identification systems on the 11 official languages of South Africa, when n-gram statistics are used as features for classification. In particular, the authors compare support vector machines (SVMs) and likelihood-based classifiers on different amounts of input text, both from a closed domain and an open domain. With as few as 15 words of input text, reliable language identification is possible. Although the SVM is generally the more accurate classifier, the additional computational complexity of training it may not be justified in light of the importance of using a large value for n. en
dc.language.iso en en
dc.subject Language identification systems en
dc.subject Official languages en
dc.subject Support Vector Machine en
dc.title Text-based language identification for the South African languages en
dc.type Conference Presentation en
dc.identifier.apacitation Botha, G., Zimu, V., & Barnard, E. (2006). Text-based language identification for the South African languages. http://hdl.handle.net/10204/951 en_ZA
dc.identifier.chicagocitation Botha, G, V Zimu, and E Barnard. "Text-based language identification for the South African languages." (2006): http://hdl.handle.net/10204/951 en_ZA
dc.identifier.vancouvercitation Botha G, Zimu V, Barnard E. Text-based language identification for the South African languages; 2006. http://hdl.handle.net/10204/951. en_ZA
dc.identifier.ris
TY - Conference Presentation
AU - Botha, G
AU - Zimu, V
AU - Barnard, E
AB - The authors investigate the performance of text-based language identification systems on the 11 official languages of South Africa, when n-gram statistics are used as features for classification. In particular, the authors compare support vector machines (SVMs) and likelihood-based classifiers on different amounts of input text, both from a closed domain and an open domain. With as few as 15 words of input text, reliable language identification is possible. Although the SVM is generally the more accurate classifier, the additional computational complexity of training it may not be justified in light of the importance of using a large value for n.
DA - 2006-11
DB - ResearchSpace
DP - CSIR
KW - Language identification systems
KW - Official languages
KW - Support Vector Machine
LK - https://researchspace.csir.co.za
PY - 2006
T1 - Text-based language identification for the South African languages
TI - Text-based language identification for the South African languages
UR - http://hdl.handle.net/10204/951
ER -
en_ZA
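
Illustrative note on the abstract: the record does not include the authors' code or data, so the following is only a minimal sketch of what a likelihood-based classifier over character n-gram statistics might look like. Everything in it is an assumption made for illustration: the class and function names, the add-one smoothing, the choice of n = 3, and the three short toy sentences (English, Afrikaans, isiZulu) standing in for real training corpora. It does not cover the SVM comparison or all 11 languages.

```python
import math
from collections import Counter

# Hypothetical toy training text; a real system would use large corpora
# for each of the 11 official languages.
TOY_TEXTS = {
    "english": "the quick brown fox jumps over the lazy dog and the children are playing outside",
    "afrikaans": "die kinders speel buite in die tuin en die hond slaap onder die boom",
    "isizulu": "abantwana badlala ngaphandle futhi inja ilele ngaphansi kwesihlahla",
}


def char_ngrams(text, n=3):
    """Return the overlapping character n-grams of text, padded with spaces."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NgramLikelihoodClassifier:
    """Scores text under one smoothed n-gram frequency model per language."""

    def __init__(self, n=3):
        self.n = n
        self.counts = {}    # language -> Counter of n-gram counts
        self.totals = {}    # language -> total number of n-grams seen
        self.vocab = set()  # all n-grams seen in any language

    def train(self, language, text):
        grams = char_ngrams(text, self.n)
        self.counts.setdefault(language, Counter()).update(grams)
        self.totals[language] = self.totals.get(language, 0) + len(grams)
        self.vocab.update(grams)

    def log_likelihood(self, language, text):
        counts = self.counts[language]
        # Add-one smoothing; the extra +1 reserves mass for unseen n-grams.
        denom = self.totals[language] + len(self.vocab) + 1
        return sum(
            math.log((counts[g] + 1) / denom)
            for g in char_ngrams(text, self.n)
        )

    def classify(self, text):
        """Return the language whose model gives the highest log-likelihood."""
        return max(self.counts, key=lambda lang: self.log_likelihood(lang, text))


def build_toy_classifier(n=3):
    clf = NgramLikelihoodClassifier(n)
    for language, text in TOY_TEXTS.items():
        clf.train(language, text)
    return clf


if __name__ == "__main__":
    clf = build_toy_classifier()
    for sample in ("the children are playing", "die hond slaap", "inja ilele"):
        print(sample, "->", clf.classify(sample))
```

The sample phrases are drawn from the toy training sentences, so this only demonstrates the mechanics; it says nothing about the accuracy figures reported in the paper.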

