Datasets
Published Datasets
AfroBench (under review) for 64 African languages.
GlobalMMLU (under review) for 42 languages on MMLU.
Uhura (under review) for QA on six African languages.
TransWebEdu (under review) for language modeling of 10 languages (machine translated).
BRIGHTER (under review) for emotion detection of 28 languages.
INJONGO (under review) for slot filling and intent detection of 16 African languages.
AFRIDOC-MT (under review) for document-level MT for five African languages.
WorldCuisines published at NAACL 2025 for food visual QA in 30 languages.
IrokoBench published at NAACL 2025 for MMLU, math reasoning and NLI for 16 African languages.
Warri published at NAACL Findings 2025 for machine translation of Nigerian Pidgin.
CVQA published at NeurIPS 2024 for visual QA in 31 languages.
YORULECT published at EMNLP 2024 for three Yoruba dialects (Speech and MT).
SIB-200 published at EACL 2024 for topic classification for 200+ languages.
AfriMTE published at NAACL 2024 for machine translation evaluation of 13 African language pairs.
EkoHate published at WOAH Workshop @NAACL 2024 for abusive and hate speech detection of Lagos election tweets.
NaijaRC published at AfricaNLP Workshop @ICLR 2024 for reading comprehension QA for three Nigerian languages.
AfriHG published at AfricaNLP Workshop @ICLR 2024 for news headline generation for 16 languages.
YAD published at AfricaNLP Workshop @ICLR 2024 for diacritics restoration of the Yoruba language.
Wura published at EMNLP 2024 for language modelling of 16 African languages.
IroyinSpeech published at LREC-COLING 2024 for ASR and TTS for Yoruba.
XTREME-UP published at EMNLP Findings 2023, a benchmark for 88 under-represented languages.
AfriQA published at EMNLP 2023 for cross-lingual open retrieval QA of 10 African languages.
NollySenti published at ACL 2023 for movie sentiment classification of 5 Nigerian languages.
MasakhaPOS published at ACL 2023 for part-of-speech tagging of 20 African languages.
[ε ku](https://github.com/ajesujoba/yoruba_greetings/tree/main/data) published at [C3NLP at EACL 2023](https://aclanthology.org/2023.c3nlp-1.1/) for cultural greetings in machine translation for Yoruba.
MasakhaNEWS published at IJCNLP-AACL for topic classification of 16 African languages (Area Chair Award).
MphayaNER published at AfricaNLP Workshop at ICLR 2023 for named entity recognition of Tshivenda.
AfriSenti published at EMNLP 2023 for sentiment classification of 14 African languages.
MasakhaNER 2.0 published at EMNLP 2022 for named entity recognition of 20 African languages.
ANTC published at COLING 2022 for topic classification of 5 African languages (Best Paper Award, Grand Challenges).
BibleTTS published at Interspeech 2022 for text-to-speech of 10 African languages.
MAFAND published at NAACL 2022 for machine translation of 21 African languages.
MasakhaNER published at TACL 2021 for named entity recognition of 10 African languages.
Menyo-20k published at MT Summit 2021 for machine translation of Yoruba.
Hausa-VOA-NER, Hausa-VOA-Topics, and Yoruba-BBC-Topics published at EMNLP 2020 for named entity recognition and topic classification of Hausa and Yoruba.
Yoruba-NER published at LREC 2020 for named entity recognition of Yoruba.