Lezghian ML Community
Founded a community of ML enthusiasts and linguists to preserve the Lezghian language through AI. Built a Russian-Lezghian translator, TTS model, and language corpus.
Founded an international community of ML enthusiasts and linguists dedicated to preserving the Lezghian language through modern AI technologies.
Mission
Create accessible tools and resources that preserve and advance Lezghian, an endangered language spoken by about 800,000 people, using state-of-the-art AI.
What We Built
Russian ↔ Lezghian Translator
Fine-tuned the NLLB-200-distilled-600M model for translation between Russian and Lezghian in both directions. Available as a Telegram bot (@lek_translator_bot) for everyday use by native speakers.
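As a rough illustration of how a fine-tuned NLLB checkpoint is typically called through Hugging Face Transformers, here is a minimal sketch. The repository name `your-org/nllb-200-distilled-600M-lez` and the Lezghian language code `lez_Cyrl` are placeholders assumed for this example and are not taken from the project. `rus_Cyrl` is the standard NLLB-200 code for Russian.

```python
from typing import Tuple

# Placeholder values: substitute the project's published checkpoint and codes.
MODEL_ID = "your-org/nllb-200-distilled-600M-lez"  # hypothetical repo name
LANG_CODES = {
    "ru": "rus_Cyrl",   # standard NLLB-200 code for Russian
    "lez": "lez_Cyrl",  # assumed code for a Lezghian token added in fine-tuning
}


def resolve_direction(src: str, tgt: str) -> Tuple[str, str]:
    """Map short language names to the codes the tokenizer expects."""
    if src == tgt:
        raise ValueError("source and target languages must differ")
    try:
        return LANG_CODES[src], LANG_CODES[tgt]
    except KeyError as exc:
        raise ValueError(f"unsupported language: {exc.args[0]}") from exc


def translate(text: str, src: str = "ru", tgt: str = "lez",
              model_id: str = MODEL_ID, max_new_tokens: int = 256) -> str:
    """Translate one sentence with a fine-tuned NLLB checkpoint."""
    # Imported lazily so the helper above works without Transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    src_code, tgt_code = resolve_direction(src, tgt)
    tokenizer = AutoTokenizer.from_pretrained(model_id, src_lang=src_code)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

    inputs = tokenizer(text, return_tensors="pt")
    output_ids = model.generate(
        **inputs,
        # NLLB selects the output language through the first decoder token.
        forced_bos_token_id=tokenizer.convert_tokens_to_ids(tgt_code),
        max_new_tokens=max_new_tokens,
    )
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0]
```

Calling `translate("Привет", src="ru", tgt="lez")` would download the checkpoint and return a Lezghian string. A long-running service such as a bot would more likely load the model once and reuse it across requests.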
Lezghian TTS (Text-to-Speech)
Trained a VITS-based TTS model to generate natural Lezghian speech from text. One of the first TTS systems for the Lezghian language.
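The sketch below shows one way text could be turned into a WAV file with a VITS model. It assumes the checkpoint has been exported in the format read by the `VitsModel` class in Hugging Face Transformers, which may not match how the project's model was trained or published. The repository name `your-org/vits-lezghian` is a placeholder.

```python
import struct
import wave
from typing import Sequence

TTS_MODEL_ID = "your-org/vits-lezghian"  # hypothetical repo name


def write_wav(path: str, samples: Sequence[float], sample_rate: int) -> None:
    """Save mono float samples in [-1, 1] as 16-bit PCM."""
    clipped = [max(-1.0, min(1.0, float(s))) for s in samples]
    frames = struct.pack(f"<{len(clipped)}h",
                         *(int(s * 32767) for s in clipped))
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(frames)


def synthesize(text: str, path: str, model_id: str = TTS_MODEL_ID) -> None:
    """Generate speech for `text` and write it to `path`."""
    # Imported lazily so write_wav works without PyTorch or Transformers.
    import torch
    from transformers import AutoTokenizer, VitsModel

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = VitsModel.from_pretrained(model_id)

    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        waveform = model(**inputs).waveform[0]  # first item in the batch
    write_wav(path, waveform.tolist(), model.config.sampling_rate)
```

`write_wav` uses only the standard library, so the audio-saving step can be checked independently of the model.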
Language Corpus & Datasets
Assembled the largest open Lezghian language corpus with a team of 16 volunteers:
| Dataset | Size | Description |
|---|---|---|
| Bible (Lezghian-Russian) | 13.8K parallel sentences | Largest parallel corpus |
| Lezgi Gazet Archives | 402 articles | News articles corpus |
| CNAL Lezghian-Russian | 762 entries | Literary translations |
| Lez Wiki | 4.4K articles | Wikipedia dump |
| Secret of Third Planet | 361 entries | Children’s book translation |
Multilingual Embeddings
Fine-tuned multilingual-e5-large and LaBSE models for Lezghian language understanding and semantic search.
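Semantic search over such embeddings usually comes down to ranking passages by cosine similarity to a query vector. The sketch below illustrates that idea. It assumes the fine-tuned model can be loaded with the `sentence-transformers` library, and the repository name `your-org/multilingual-e5-large-lez` is a placeholder. The `query: ` and `passage: ` prefixes follow the convention of the base E5 models and would be dropped for LaBSE.

```python
import math
from typing import List, Sequence, Tuple

EMBED_MODEL_ID = "your-org/multilingual-e5-large-lez"  # hypothetical repo name


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(query_vec: Sequence[float], doc_vecs: Sequence[Sequence[float]],
         top_k: int = 3) -> List[Tuple[int, float]]:
    """Return (index, score) pairs for the top_k most similar documents."""
    scored = [(i, cosine(query_vec, vec)) for i, vec in enumerate(doc_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


def search(query: str, passages: List[str],
           model_id: str = EMBED_MODEL_ID, top_k: int = 3):
    """Embed the query and passages, then rank passages by similarity."""
    # Imported lazily so the ranking helpers work without the library.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_id)
    query_vec = model.encode("query: " + query)
    doc_vecs = model.encode(["passage: " + p for p in passages])
    hits = rank(query_vec.tolist(), doc_vecs.tolist(), top_k)
    return [(passages[i], score) for i, score in hits]
```

For more than a few thousand passages, the pure-Python ranking here would normally be replaced by a vectorized or indexed nearest-neighbour search.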
Impact
- First open-source NLP toolkit for the Lezghian language
- 16 active community members and growing
- Telegram bot used by native speakers daily
- Contributing to digital preservation of an endangered language
- All models and datasets freely available on Hugging Face
Technologies
Python, PyTorch, Hugging Face Transformers, NLLB, VITS, LaBSE, multilingual-e5, mT5