Google Launches WAXAL To Expand African Languages In AI

By TOSI ORE
GOOGLE, in partnership with leading research institutions across the continent, has launched WAXAL, a large-scale open speech dataset designed to strengthen African representation in artificial intelligence. The name comes from the Wolof word for “speak”.
The initiative aims to give more than 100 million Africans access to voice-enabled AI technologies by addressing the long-standing shortage of high-quality speech data for indigenous African languages.
In a statement on Monday, Head of Google Research Africa, Aisha Walcott-Bryant, said WAXAL was developed to empower African communities through locally relevant technology.
According to her, the dataset provides a foundation for students, researchers, and entrepreneurs to develop AI tools tailored to Africa’s linguistic and cultural realities.
“This dataset allows innovators to build on their own terms, in their own languages, and reach communities that have long been excluded from digital technologies,” Walcott-Bryant said.
WAXAL includes foundational speech data for 21 Sub-Saharan African languages, among them Hausa, Yoruba, Luganda, and Acholi. It was developed over a three-year period with Google funding and contains 1,250 hours of transcribed natural speech, alongside more than 20 hours of studio-quality recordings for synthetic voice development.
Walcott-Bryant said limited access to speech data has slowed the development of voice technologies across Africa’s more than 2,000 languages, preventing many people from using digital tools in their native tongues.
She added that WAXAL prioritised community ownership, with African institutions leading data collection while receiving technical support from Google.
Partner institutions include Makerere University, the University of Ghana, and Rwanda’s Digital Umuganda. All of them retain full ownership of the data under what Google describes as an equitable, partnership-led AI development model.

