Global Seminar students help bring African languages to the digital age

Thirteen Princeton students traveled to Kenya this summer as part of the Global Seminar “Technology for African Languages in the Digital Age,” spending six weeks studying Swahili, collecting and analyzing data in the country, and collaborating with six students from Maseno University to build digital tools for underrepresented languages.
Working in small groups, the students completed three projects related to language models: one on topic classification, one on automatic speech recognition, and one on speech tagging, each focusing on translating to English, Swahili, and one or two of Kenya’s Indigenous languages. The students also conducted fieldwork, visiting fish markets, beaches, and community centers across the country, where they took and captioned photos of culturally significant places, objects, and interactions to generate datasets.
“Not only is having the data in the language important, but having it be culturally relevant is also important,” Rachel Adjei ’28 told PAW. Most large language models (LLMs) rely on automatic translations, but the students conducted manual, on-the-ground work to ensure accuracy and nuance.