Alif 1.0 launched in Pakistan: A groundbreaking AI model for Urdu

Alif 1.0 is trained on a high-quality Urdu Alpaca dataset, generated through multilingual synthetic data techniques and human feedback refinement
An illustration created by generative AI. — Canva

Traversaal.ai has released Alif 1.0, billed as the first Urdu large language model (LLM) in Pakistan. Built on Llama-3.1-8B, Alif 1.0 is reported to outperform leading multilingual models, including Google’s Gemma, Cohere’s Aya Expanse, and Meta’s recent Llama models.

According to the developers, the model was trained 2x faster using Unsloth and Hugging Face's Transformer Reinforcement Learning (TRL) library.

Why is Alif 1.0 exciting?

Alif 1.0 is trained on a high-quality Urdu Alpaca dataset, generated through multilingual synthetic data techniques and human feedback refinement. The dataset includes:

  1. Classification
  2. Sentiment Analysis
  3. Logical Reasoning with Urdu Chain-of-Thought (CoT)
  4. Question Answering (QA)
  5. Text Generation
  6. Bilingual Translations
  7. Ethics & Safety Assessments
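
For readers unfamiliar with the term, "Alpaca dataset" usually refers to records with three fields (instruction, input, output) that are rendered into a fixed prompt template before training. The article does not publish Alif's actual schema or template, so the sketch below assumes the widely used Stanford Alpaca layout. The function name `format_alpaca` and the sample record are hypothetical and are not taken from the Alif dataset.

```python
# Minimal sketch of Alpaca-style prompt formatting.
# Assumption: Alif's dataset follows the standard Stanford Alpaca layout;
# the article does not confirm the exact template.

ALPACA_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. Write a response that appropriately "
    "completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n"
)

ALPACA_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n"
)


def format_alpaca(record: dict) -> str:
    """Render one record as a training string (prompt followed by the answer)."""
    if record.get("input"):
        prompt = ALPACA_WITH_INPUT.format(
            instruction=record["instruction"], input=record["input"]
        )
    else:
        # Records without extra context use the shorter template.
        prompt = ALPACA_NO_INPUT.format(instruction=record["instruction"])
    return prompt + record["output"]


# Hypothetical sentiment-analysis record, for illustration only.
sample = {
    "instruction": "درج ذیل جملے کے جذبات کی درجہ بندی کریں۔",
    "input": "یہ فلم بہت اچھی تھی۔",
    "output": "مثبت",
}

print(format_alpaca(sample))
```

Each of the seven task types listed above could be expressed in this shape, with the task described in the instruction field and any text to be processed placed in the input field.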

Alif 1.0 availability

Alif 1.0 is available to users on Hugging Face.

Urdu is a low-resource language, and this is a major step toward democratising AI for Urdu speakers. The developers have built a manually curated dataset covering diverse open-ended and closed-ended tasks.

Alif 1.0 performance highlights

According to its developers, Alif 1.0 outperforms the following open-source models across several benchmarks:

  1. Gemma-2 (9B)
  2. Mistral-Nemo (12B)
  3. Cohere's Aya Expanse (8B)
  4. Qwen 2.5 (7B)
  5. Llama 3.1 (8B)

Alif 1.0 is being described as a major step forward for Urdu NLP, aiming for cultural and linguistic alignment while expanding bilingual AI capabilities in Pakistan.