InkubaLM: A small language model for low-resource African languages

As AI practitioners, we are committed to forging an inclusive future through the power of AI. AI holds the promise of global prosperity. However, the resources needed to build and run large models are out of reach for most of the world, and those models often perform poorly on the languages spoken there. Open-source models have tried to bridge this gap, but more can be done to make models cost-effective, accessible, and locally relevant. Introducing InkubaLM (Dung Beetle Language Model): a robust, compact model designed to serve African communities without requiring extensive resources. The dung beetle can move objects 250 times its own weight, and InkubaLM likewise shows how much a small model can do. InkubaLM is released together with two datasets. It is the first of many initiatives to distribute the resource load, so that African communities can use tools such as Machine Translation, Sentiment Analysis, Named Entity Recognition (NER), Part-of-Speech (POS) Tagging, Question Answering, and Topic Classification in their own languages.

Model
To address the need for lightweight African language models, we introduce a small language model, InkubaLM-0.4B, trained on five African languages: IsiZulu, Yoruba, Hausa, Swahili, and IsiXhosa. English and French data were also included during training.
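As a quick reference, the training language set can be written down as a small mapping from language code to name. This is a minimal sketch. The ISO 639-1 codes are standard, but the dictionary names and the `training_languages` helper are hypothetical and are not taken from the InkubaLM codebase.

```python
# Hypothetical representation of the languages InkubaLM-0.4B was trained on.
# The ISO 639-1 codes are standard; the structure itself is illustrative only.
AFRICAN_LANGUAGES = {
    "zu": "IsiZulu",
    "yo": "Yoruba",
    "ha": "Hausa",
    "sw": "Swahili",
    "xh": "IsiXhosa",
}

# English and French were also included in the training data.
ADDITIONAL_LANGUAGES = {
    "en": "English",
    "fr": "French",
}


def training_languages() -> dict[str, str]:
    """Return every language code seen during training, mapped to its name."""
    return {**AFRICAN_LANGUAGES, **ADDITIONAL_LANGUAGES}


if __name__ == "__main__":
    langs = training_languages()
    print(len(AFRICAN_LANGUAGES), "African languages,", len(langs), "in total")
    # prints: 5 African languages, 7 in total
```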

InkubaLM-0.4B was trained from scratch on 2.4 billion tokens in total: 1.9 billion tokens for the five African languages, plus English and French data. Following an architecture similar to MobileLLM, InkubaLM has 0.4 billion parameters and a vocabulary size of 61,788. The figure below compares the training data and model sizes of different public models. On both measures, InkubaLM is the smallest of the models compared. It has the fewest parameters and was trained on the least data.
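The figures above support some simple back-of-envelope arithmetic. The token counts, parameter count, and vocabulary size below come from the text. Everything else in this sketch is an assumption. In particular, `HIDDEN_SIZE = 1024` is a hypothetical value chosen only to show how much of a 0.4B budget a 61,788-entry embedding table could use. MobileLLM-style designs share (tie) the input and output embeddings, but this post does not say whether InkubaLM does, so both cases are computed.

```python
# Back-of-envelope arithmetic for the InkubaLM-0.4B figures.
# TOTAL_TOKENS, AFRICAN_TOKENS, PARAMS and VOCAB are taken from the text.
TOTAL_TOKENS = 2_400_000_000
AFRICAN_TOKENS = 1_900_000_000
PARAMS = 400_000_000
VOCAB = 61_788

# Hypothetical hidden size, NOT a published InkubaLM setting.
HIDDEN_SIZE = 1024


def token_budget() -> dict[str, float]:
    """Split the training tokens and relate them to the parameter count."""
    return {
        "en_fr_tokens": TOTAL_TOKENS - AFRICAN_TOKENS,   # English + French
        "african_share": AFRICAN_TOKENS / TOTAL_TOKENS,  # fraction of all tokens
        "tokens_per_param": TOTAL_TOKENS / PARAMS,
    }


def embedding_params(vocab: int, hidden: int, tied: bool) -> int:
    """Parameters in the token embedding table, doubled if the output head is untied."""
    table = vocab * hidden
    return table if tied else 2 * table


if __name__ == "__main__":
    b = token_budget()
    print(f"English+French tokens: {b['en_fr_tokens']:,}")      # 500,000,000
    print(f"African share: {b['african_share']:.1%}")           # 79.2%
    print(f"Tokens per parameter: {b['tokens_per_param']:.1f}") # 6.0
    for tied in (True, False):
        n = embedding_params(VOCAB, HIDDEN_SIZE, tied)
        # tied=True: 63,270,912 (15.8%); tied=False: 126,541,824 (31.6%)
        print(f"tied={tied}: {n:,} params ({n / PARAMS:.1%} of 0.4B)")
```

Under this hypothetical hidden size, the vocabulary alone would account for roughly a sixth of the parameter budget with tied embeddings, and nearly a third without them. At this model scale, vocabulary size is therefore a significant design choice.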



https://lelapa.ai/inkubalm-a-small-langu...languages/
https://huggingface.co/lelapa/InkubaLM-0.4B


Posted by AI Prompt Warehouse, 01-15-2026, 07:54 AM
