
Traversaal.ai has released Alif 1.0, the first Urdu large language model (LLM) from Pakistan. Based on Llama-3.1-8B, Alif 1.0 outperforms top multilingual models, including Google's Gemma, Cohere's Aya Expanse, and Meta's own recent Llama models, among many others.
The model was trained 2x faster using Unsloth together with Hugging Face's Transformer Reinforcement Learning (TRL) library.
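For context, the snippet below is a minimal supervised fine-tuning sketch in the spirit of the Unsloth + TRL workflow cited above. The base checkpoint, dataset name, and hyperparameters are placeholders rather than Traversaal.ai's actual training recipe, and exact argument names can vary across TRL versions.

```python
# Minimal Unsloth + TRL fine-tuning sketch (illustrative only; not the Alif 1.0 recipe).
from unsloth import FastLanguageModel
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer

# Load Llama-3.1-8B in 4-bit via Unsloth for faster, memory-efficient training.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Meta-Llama-3.1-8B-Instruct",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters so only a small set of weights is updated.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Hypothetical Urdu Alpaca-style dataset with a pre-formatted "text" column.
dataset = load_dataset("your-org/urdu-alpaca", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        num_train_epochs=1,
        learning_rate=2e-4,
        output_dir="alif-sft-output",
    ),
)
trainer.train()
```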
Why is Alif 1.0 exciting?
Alif 1.0 is trained on a high-quality Urdu Alpaca dataset, generated through multilingual synthetic data techniques and refined with human feedback. The dataset covers the following task types (an illustrative record format is sketched after the list):
- Classification
- Sentiment Analysis
- Logical Reasoning with Urdu Chain-of-Thought (CoT)
- Question Answering (QA)
- Text Generation
- Bilingual Translations
- Ethics & Safety Assessments
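To make the format concrete, here is an illustrative Alpaca-style record for the sentiment analysis task. The Urdu text is a made-up example, not drawn from the released dataset.

```python
# Illustrative Alpaca-style record (hypothetical content, not from the actual dataset).
example = {
    "instruction": "درج ذیل جملے کا جذباتی رجحان بتائیں۔",  # "State the sentiment of the following sentence."
    "input": "یہ فلم بہت شاندار تھی۔",                       # "This film was excellent."
    "output": "مثبت",                                         # "Positive"
}
```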
Alif 1.0 availability
Alif 1.0 is publicly available on Hugging Face.
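A quick way to try it is through the standard transformers API. The sketch below assumes the repository id shown in the comment; verify it on the model's Hugging Face page before use.

```python
# Minimal inference sketch with the transformers library.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "large-traversaal/Alif-1.0-8B-Instruct"  # assumed repo id; confirm on Hugging Face
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

prompt = "پاکستان کا دارالحکومت کیا ہے؟"  # "What is the capital of Pakistan?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```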
Urdu is a low-resource language, and this is a major step toward democratising AI for Urdu speakers. The developers have built a manually curated dataset covering diverse open-ended and closed-ended tasks.
Alif 1.0 performance highlights
Alif 1.0 outperforms top open-source models across multiple benchmarks, including:
- Gemma-2 (9B)
- Mistral-Nemo (12B)
- Cohere's Aya Expanse (8B)
- Qwen 2.5 (7B)
- Llama 3.1 (8B)
Traversaal.ai presents Alif 1.0 as a monumental step forward for Urdu NLP, ensuring cultural and linguistic alignment while expanding bilingual AI capabilities in Pakistan.