Louw, Johannes A
2025-12-17
2025-12-17
2025
https://doi.org/10.1007/978-3-032-11733-5_11
http://hdl.handle.net/10204/14527
Syllables are fundamental units in speech production and carry prosodic information, but their acoustic and linguistic properties across different language families are not well understood. This study examines syllable discovery approaches across South African languages using algorithmic syllabification and S5-HuBERT, a self-supervised speech representation model that demonstrates emergent syllabic organization. We analyzed speech recordings from eleven languages representing five language families in South Africa using a systematic comparison of rule-based and data-driven syllable discovery methods. We evaluated both approaches using cross-linguistic consistency measures and acoustic quality assessments across speakers. Our analysis reveals fundamental differences between the two approaches. Algorithmic syllables demonstrate strong language-family clustering with predominantly language-specific units, while S5-HuBERT units show superior cross-linguistic sharing and weaker family effects. Speaker independence analysis across four experimental phases demonstrates that data-driven methods achieve better acoustic consistency, with the fully data-driven approach reaching near-optimal speaker generalization. These results provide empirical guidance for implementing syllable-based semantic units in multilingual text-to-speech systems for resource-scarce languages.
Abstract
en
Self-supervised learning; Speech representation; Syllable segmentation; S5-HuBERT; Cross-linguistic analysis; Text-to-speech synthesis
Exploring syllable similarity across South African languages through self-supervised speech representation
Article
N/A