Training Data Summary

Overview

Our Approach

ANED AI models are built on capable open-weight foundation models, then extensively fine-tuned on African language data, business context, and regulatory knowledge.

Last Updated: April 2026.

Section 1

African Language Data Sources

Our data sources include the Masakhane research community; N-ATLaS; African-language Wikipedia dumps; community-sourced corpora from African universities; and curated text spanning literature, journalism, legal documents, and business communications from across the continent.

Section 2

Business & Regulatory Context

Our training data includes mobile money transaction patterns; cross-border trade documentation; regulatory frameworks from KRA, CBN, SARS, GRA, NBE, and other African authorities; legal contract templates from 20+ jurisdictions; healthcare protocols; and agricultural advisory content in local languages.

Section 3

Languages Covered

We cover 2,000+ African language variants, with a primary focus on Swahili, Hausa, Yoruba, Igbo, Amharic, Zulu, Xhosa, Twi, Wolof, Kikuyu, Luo, Somali, Oromo, Tigrinya, Sotho, Tswana, Fulfulde, Lingala, Kinyarwanda, Shona, and many more. Languages marked "in training" are actively being added.

Section 4

What We Do NOT Use

Your prompts, inputs, and generated outputs are never used to train our models. User content is retained only temporarily for abuse monitoring, then permanently deleted. This applies to all plans including the free tier.

Section 5

Ethical Data Sourcing

We obtain proper licensing for all training data, compensate data contributors fairly, respect intellectual property, acknowledge academic partners, and conduct regular audits of sourcing practices.

Section 6

Continuous Improvement

We work on a quarterly cycle: we track the foundation model ecosystem, run evaluations against our African Language Benchmark Suite, and re-fine-tune when a better base model becomes available. Each release is tested across 8+ languages for quality, accuracy, safety, and latency.
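
As a rough illustration of the cycle described above, the sketch below shows one way the two decisions could be expressed: whether a new base model justifies re-fine-tuning, and whether a release passes its per-language checks. It is not ANED's actual pipeline. The class and function names, the mean-score comparison, and the `min_gain` threshold are all hypothetical assumptions made for this example. Only the four check categories and the 8-language minimum come from the text.

```python
from dataclasses import dataclass

# Hypothetical sketch: names, scoring rule, and thresholds are assumptions,
# not a description of ANED's real tooling.

REQUIRED_CHECKS = {"quality", "accuracy", "safety", "latency"}


@dataclass(frozen=True)
class BenchmarkResult:
    model_name: str
    scores: dict[str, float]  # per-language benchmark score, higher is better

    def mean_score(self) -> float:
        return sum(self.scores.values()) / len(self.scores)


def should_refinetune(current_base: BenchmarkResult,
                      candidate_base: BenchmarkResult,
                      min_gain: float = 0.01) -> bool:
    """True when the candidate base beats the current one by at least min_gain (assumed rule)."""
    return candidate_base.mean_score() - current_base.mean_score() >= min_gain


def release_ready(checks: dict[str, dict[str, bool]], min_languages: int = 8) -> bool:
    """checks maps a language to its pass/fail result for each required check."""
    if len(checks) < min_languages:
        return False
    return all(
        REQUIRED_CHECKS <= set(result) and all(result[name] for name in REQUIRED_CHECKS)
        for result in checks.values()
    )
```

A release would be blocked if fewer than eight languages were tested or if any language failed any of the four checks. In practice, a per-language comparison might be preferable to a single mean, so that a regression in one language is not hidden by gains in others.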

Section 7

Contact

Research: research@anedcenter.com