Lezghian ML Community

Founded a community of ML enthusiasts and linguists to preserve the Lezghian language through AI. Built a Russian-Lezghian translator, TTS model, and language corpus.

NLP, Translation, TTS, Language Preservation, HuggingFace, Python


Founded an international community of ML enthusiasts and linguists dedicated to preserving the Lezghian language through modern AI technologies.

Mission

Create accessible tools and resources that preserve and advance Lezghian, an endangered language spoken by roughly 800,000 people, using state-of-the-art AI.

What We Built

Russian ↔ Lezghian Translator

Fine-tuned the NLLB-200-distilled-600M model to translate between Russian and Lezghian in both directions. Available as a Telegram bot (@lek_translator_bot) for everyday use by native speakers.
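
For readers who want to try an NLLB-style checkpoint directly, the following is a minimal sketch of how inference is usually wired with HuggingFace Transformers. It is not the bot's actual code. The fine-tuned checkpoint name is not given here, so `load_translator` falls back to the base `facebook/nllb-200-distilled-600M` model, and the `lez_Cyrl` language code is an assumption: a fine-tuned model would have to add such a token, since the base model will not recognize it.

```python
def load_translator(checkpoint="facebook/nllb-200-distilled-600M"):
    """Load a seq2seq model and tokenizer.

    The default is the base NLLB checkpoint; substitute the fine-tuned
    Lezghian checkpoint (its name is not stated in this document).
    """
    # Imported lazily so the helper below can be used without transformers.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)
    return model, tokenizer


def translate(text, model, tokenizer,
              src_lang="rus_Cyrl", tgt_lang="lez_Cyrl", max_new_tokens=128):
    """Translate one string with an NLLB-style model.

    "rus_Cyrl" is the standard NLLB code for Russian; "lez_Cyrl" is a
    hypothetical code assumed to have been added during fine-tuning.
    """
    tokenizer.src_lang = src_lang  # tells the tokenizer which language tag to prepend
    inputs = tokenizer(text, return_tensors="pt")
    generated = model.generate(
        **inputs,
        # Force the decoder to start with the target-language token.
        forced_bos_token_id=tokenizer.convert_tokens_to_ids(tgt_lang),
        max_new_tokens=max_new_tokens,
    )
    return tokenizer.batch_decode(generated, skip_special_tokens=True)[0]
```

Swapping `src_lang` and `tgt_lang` gives the reverse direction with the same model, which is one reason a single NLLB checkpoint can serve a two-way translator.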

Lezghian TTS (Text-to-Speech)

Trained a VITS-based TTS model to generate natural Lezghian speech from text. One of the first TTS systems for the Lezghian language.
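
As an illustration of what VITS inference can look like, here is a small sketch that assumes the model has been exported in a format loadable by the `VitsModel` class in HuggingFace Transformers. That is an assumption: the training framework and checkpoint name are not stated here, so `checkpoint` is a required argument and no real model ID is implied. The `save_wav` helper uses only the standard library.

```python
import struct
import wave


def synthesize(text, checkpoint):
    """Generate speech with a Transformers-compatible VITS checkpoint.

    `checkpoint` is a placeholder for the Lezghian model ID or local path.
    Returns (samples as a list of floats, sampling rate in Hz).
    """
    # Imported lazily so save_wav works without torch or transformers.
    import torch
    from transformers import AutoTokenizer, VitsModel

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = VitsModel.from_pretrained(checkpoint)
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        waveform = model(**inputs).waveform[0]  # first (only) item in the batch
    return waveform.tolist(), model.config.sampling_rate


def save_wav(samples, sample_rate, path):
    """Write mono float samples in [-1.0, 1.0] as 16-bit PCM WAV."""
    pcm = b"".join(
        struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767)) for s in samples
    )
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)      # mono
        wav_file.setsampwidth(2)      # 2 bytes = 16-bit samples
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm)
```

A typical call would be `samples, rate = synthesize(text, checkpoint)` followed by `save_wav(samples, rate, "out.wav")`.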

Language Corpus & Datasets

Assembled the largest open Lezghian language corpus with a team of 16 volunteers:

Dataset                   Size                      Description
Bible (Lezghian-Russian)  13.8K parallel sentences  Largest parallel corpus
Lezgi Gazet Archives      402 articles              News articles corpus
CNAL Lezghian-Russian     762 entries               Literary translations
Lez Wiki                  4.4K articles             Wikipedia dump
Secret of Third Planet    361 entries               Children’s book translation

Multilingual Embeddings

Fine-tuned multilingual-e5-large and LaBSE models for Lezghian language understanding and semantic search.
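
Semantic search with such embeddings typically comes down to encoding a query and a set of passages, then ranking passages by cosine similarity. The sketch below shows that flow under a few assumptions: it uses the `sentence-transformers` library, defaults to the base `intfloat/multilingual-e5-large` checkpoint because the fine-tuned model name is not given here, and applies the `query: ` / `passage: ` prefixes that E5 models expect (LaBSE does not need them). The function names are illustrative.

```python
import math


def e5_inputs(texts, kind):
    """Prefix texts the way E5 models expect: 'query: ' or 'passage: '."""
    if kind not in ("query", "passage"):
        raise ValueError("kind must be 'query' or 'passage'")
    return [f"{kind}: {text}" for text in texts]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query_vec, passage_vecs, top_k=3):
    """Return up to top_k (passage_index, score) pairs, best match first."""
    scored = [(cosine(query_vec, vec), i) for i, vec in enumerate(passage_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(i, score) for score, i in scored[:top_k]]


def embed(texts, kind, checkpoint="intfloat/multilingual-e5-large"):
    """Encode texts with an E5-style model.

    The default is the base checkpoint; substitute the fine-tuned
    Lezghian model, whose name is not stated in this document.
    """
    # Imported lazily so the ranking helpers work without the library.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(checkpoint)
    return model.encode(e5_inputs(texts, kind), normalize_embeddings=True).tolist()
```

In use, `rank(embed([question], "query")[0], embed(passages, "passage"))` returns the indices of the closest passages. For larger collections, a vector index would replace the linear scan in `rank`.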

Impact

  • First open-source NLP toolkit for the Lezghian language
  • 16 active community members and growing
  • Telegram bot used daily by native speakers
  • Contributing to digital preservation of an endangered language
  • All models and datasets freely available on HuggingFace

Technologies

Python, PyTorch, HuggingFace Transformers, NLLB, VITS, LaBSE, mT5