Author: Botha, GR; Barnard, E
Date: Nov 2007
The authors investigate the factors that determine the performance of text-based language identification, with a particular focus on the 11 official languages of South Africa, using n-gram statistics as features for classification. For a fixed ...
Author: Zulu, PN; Botha, G; Barnard, E
Date: 2007
Two methods for objectively measuring similarities and dissimilarities between the 11 official languages of South Africa are described. The first concerns the use of n-grams. The confusions between different languages in a text-based language ...
Author: Botha, G; Zimu, V; Barnard, E
Date: Nov 2006
The authors investigate the performance of text-based language identification systems on the 11 official languages of South Africa, when n-gram statistics are used as features for classification. In particular, the authors compare support ...