Applying phonological feature embeddings for cross-lingual transfer in text-to-speech

Louw, Johannes A
Wang, Z

2025-01-10
2025-01-10
2024-07

979-8-3503-6559-7
2768-3311
DOI: 10.1109/TSP63128.2024.10605769
http://hdl.handle.net/10204/13924

In this work, we build upon our previous research where we introduced phonological features as input to text-to-speech systems. While the use of phonological features is not a novel concept in our research, our focus in this study is on the comprehensive analysis of the embeddings produced by the encoder model, which we believe offers novel insights into the model’s ability to capture and generalize phonological patterns across languages. Cross-lingual transfer experiments are conducted using both a resource-rich and a resource-constrained language to explore the model’s cross-lingual transfer capabilities across different linguistic families. The analysis of the embedding vectors produced by the encoder model is conducted using cluster maps to visualize the hierarchical clusters obtained using a clustering procedure. This analysis reveals the model’s learning patterns and provides insights into how phonological features contribute to the model’s ability to handle linguistic diversity and data scarcity.

Fulltext
en

Cross-lingual
Phonological feature
Resource scarce
Text-to-speech

Conference Presentation