Text-based language identification for the South African languages

Botha, G; Zimu, V; Barnard, E

dc.contributor.author	Botha, G
dc.contributor.author	Zimu, V
dc.contributor.author	Barnard, E
dc.date.accessioned	2007-07-04T06:15:37Z
dc.date.available	2007-07-04T06:15:37Z
dc.date.issued	2006-11
dc.identifier.citation	Botha, G, Zimu, V and Barnard, E. 2006. Text-based language identification for the South African languages. 17th Annual Symposium of the Pattern Recognition Association of South Africa, Parys, South Africa, 29 Nov - 1 Dec 2006, pp 7	en
dc.identifier.uri	http://hdl.handle.net/10204/951
dc.description	This paper was later published in the SAIEE Africa Research Journal, Vol 98(4), pp 141-146
dc.description.abstract	The authors investigate the performance of text-based language identification systems on the 11 official languages of South Africa, when n-gram statistics are used as features for classification. In particular, the authors compare support vector machines (SVMs) and likelihood-based classifiers on different amounts of input text, both from a closed domain and an open domain. With as few as 15 words of input text, reliable language identification is possible. Although the SVM is generally the more accurate classifier, the additional computational complexity of training it may not be justified in light of the importance of using a large value for n.	en
dc.language.iso	en	en
dc.subject	Language identification systems	en
dc.subject	Official languages	en
dc.subject	Support Vector Machine	en
dc.title	Text-based language identification for the South African languages	en
dc.type	Conference Presentation	en
dc.identifier.apacitation	Botha, G., Zimu, V., & Barnard, E. (2006). Text-based language identification for the South African languages. http://hdl.handle.net/10204/951	en_ZA
dc.identifier.chicagocitation	Botha, G, V Zimu, and E Barnard. "Text-based language identification for the South African languages." (2006): http://hdl.handle.net/10204/951	en_ZA
dc.identifier.vancouvercitation	Botha G, Zimu V, Barnard E, Text-based language identification for the South African languages; 2006. http://hdl.handle.net/10204/951 .	en_ZA
dc.identifier.ris	TY - Conference Presentation AU - Botha, G AU - Zimu, V AU - Barnard, E AB - The authors investigate the performance of text-based language identification systems on the 11 official languages of South Africa, when n-gram statistics are used as features for classification. In particular, the authors compare support vector machines (SVMs) and likelihood-based classifiers on different amounts of input text, both from a closed domain and an open domain. With as few as 15 words of input text, reliable language identification is possible. Although the SVM is generally more accurate a classifier, the additional computational complexity of training this classifier may not be justified in light of the importance of using a large value for n. DA - 2006-11 DB - ResearchSpace DP - CSIR KW - Language identification systems KW - Official languages KW - Support Vector Machine LK - https://researchspace.csir.co.za PY - 2006 T1 - Text-based language identification for the South African languages TI - Text-based language identification for the South African languages UR - http://hdl.handle.net/10204/951 ER -	en_ZA
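
The abstract refers to likelihood-based classification over n-gram statistics. The following is a minimal sketch of that general idea, not the authors' implementation: it scores character n-grams with add-one smoothing, which is an assumption made here for simplicity. The two training sentences, the language labels, the function names and the choice of n=3 are all invented for illustration and do not come from the paper.

```python
import math
from collections import Counter


def char_ngrams(text, n):
    """Return overlapping character n-grams of the lower-cased, space-padded text."""
    padded = " " + " ".join(text.lower().split()) + " "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def train(samples, n=3):
    """Build one n-gram count table per language.

    `samples` maps a language label to a training string (hypothetical toy data here).
    """
    return {lang: Counter(char_ngrams(text, n)) for lang, text in samples.items()}


def log_likelihood(counts, vocab_size, grams):
    """Sum of add-one smoothed log probabilities of `grams` under one language model."""
    total = sum(counts.values())
    return sum(math.log((counts[g] + 1) / (total + vocab_size)) for g in grams)


def identify(models, text, n=3):
    """Return the language whose n-gram model gives `text` the highest likelihood."""
    vocab = set()
    for counts in models.values():
        vocab.update(counts)
    vocab_size = len(vocab) + 1  # one extra slot stands in for unseen n-grams
    grams = char_ngrams(text, n)
    return max(models, key=lambda lang: log_likelihood(models[lang], vocab_size, grams))


# Toy training text, invented for this sketch; a real system needs far more data.
TOY_SAMPLES = {
    "english": "the quick brown fox jumps over the lazy dog and the cat sat on the mat",
    "afrikaans": "die vinnige bruin jakkals spring oor die lui hond en die kat sit op die mat",
}

models = train(TOY_SAMPLES, n=3)
print(identify(models, "the dog"))
print(identify(models, "die hond"))
```

Raising n makes each n-gram more language-specific but also sparser, so larger values need correspondingly more training text; the value that works best for the 11 languages is a question the paper itself addresses.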

Files in this item

Name: Botha_2006.pdf

Size: 72.53Kb

Format: PDF

This item appears in the following Collection(s)

Conference Publications
