The first manually annotated benchmark dataset for emotion detection in Ukrainian texts, covering six basic emotions plus a None class, crowdsourced via Toloka.ai with high-confidence annotation.
While Ukrainian NLP has made progress on many text processing tasks (sentiment analysis, toxicity detection, NER), emotion classification has remained underexplored, with no publicly available benchmark. EmoBench-UA fills this gap.
The annotation schema follows Ekman's framework of basic emotions. Texts were sourced from Ukrainian tweets, filtered for quality, and annotated by crowdworkers on Toloka.ai with stringent quality controls — language proficiency tests, exam phases, and confidence-weighted majority voting via the Dawid-Skene model.
We evaluate a broad range of approaches: linguistic baselines, Transformer encoders (mBERT, XLM-RoBERTa, monolingual Ukrainian models), cross-lingual transfer, and modern LLMs — establishing the first comprehensive benchmark for this task.
A carefully designed crowdsourcing pipeline on Toloka.ai, split into two sequential projects to reduce annotator cognitive load and improve label quality.
A corpus of Ukrainian tweets filtered by length (5–49 words), toxicity score, and likelihood of emotional content, estimated with an English emotion classifier applied to translated texts.
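The filtering step above can be sketched as a simple predicate. This is illustrative only: the field names (`text`, `toxicity`, `emotion_prob`) and the toxicity/emotion thresholds are assumptions; only the 5–49 word range comes from the pipeline description.

```python
def keep(tweet: dict) -> bool:
    """Sketch of the candidate filter: length in words, low toxicity,
    and high likelihood of emotional content.
    Field names and the 0.5 thresholds are hypothetical."""
    n_words = len(tweet["text"].split())
    return (
        5 <= n_words <= 49          # length filter from the pipeline
        and tweet["toxicity"] < 0.5  # drop likely-toxic tweets
        and tweet["emotion_prob"] > 0.5  # keep likely-emotional tweets
    )
```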
Project 1: Fear, Surprise, Disgust. Project 2: Anger, Joy, Sadness. Sequential design ensures multi-label coverage with lower cognitive load.
Ukrainian proficiency tests, training + exam phases, random control tasks, anti-fatigue breaks, and automated banning rules.
5 annotators per instance. Final labels via confidence-weighted majority voting. Only instances with ≥90% confidence are retained.
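The aggregation step can be sketched as follows. This is a simplified stand-in for the actual Dawid-Skene procedure: it takes per-annotator reliability estimates as given (Dawid-Skene infers them jointly with the labels via EM) and keeps only labels whose weighted-vote confidence meets the 90% threshold.

```python
def aggregate(votes: dict, reliability: dict, threshold: float = 0.9):
    """Confidence-weighted majority vote for one (instance, emotion) pair.

    votes:       annotator_id -> bool (emotion present?)
    reliability: annotator_id -> estimated annotator accuracy
    Returns (label, confidence), or None if confidence < threshold
    (the instance is then discarded, as in the 90%-confidence filter).
    """
    weight_yes = sum(reliability[a] for a, v in votes.items() if v)
    weight_no = sum(reliability[a] for a, v in votes.items() if not v)
    total = weight_yes + weight_no
    label = weight_yes >= weight_no
    confidence = max(weight_yes, weight_no) / total
    return (label, confidence) if confidence >= threshold else None
```

With five annotators per instance, a unanimous vote always passes the threshold, while a 4-to-1 split may fall below it if the dissenting annotator is estimated to be reliable.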
Ukrainian social media texts labelled with Ekman's basic emotions. Multi-label instances are supported.
We evaluate linguistic baselines, Transformer encoders, and large language models. Ukrainian-specific fine-tuned models achieve the best performance.
| Model | Type | F1 (macro) |
|---|---|---|
| Keywords Baseline | Linguistic | 0.21 |
| Logistic Regression (TF-IDF) | Linguistic | 0.38 |
| mBERT | Transformer | 0.46 |
| XLM-RoBERTa base | Transformer | 0.51 |
| XLM-RoBERTa large | Transformer | 0.55 |
| Twitter-XLM-RoBERTa | Transformer | 0.54 |
| EuroLLM | LLM | 0.41 |
| Llama 3.1 | LLM | 0.44 |
| ukr-emotions-classifier (ours) | Fine-tuned | 0.61 ★ |
Note: Full results table with all model variants and per-emotion scores available in the paper.
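The macro-F1 scores above average per-emotion F1 over all emotion labels, treating each emotion as an independent binary decision. A minimal pure-Python sketch (equivalent to scikit-learn's `f1_score(average="macro")` on binarized labels; the label names in the usage example are illustrative):

```python
def f1(tp: int, fp: int, fn: int) -> float:
    """Per-label F1; defined as 0.0 when a label has no gold or predicted positives."""
    if 2 * tp + fp + fn == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)

def macro_f1(gold, pred, labels) -> float:
    """gold, pred: per-instance sets of emotion labels (multi-label)."""
    scores = []
    for lab in labels:
        tp = sum(1 for g, p in zip(gold, pred) if lab in g and lab in p)
        fp = sum(1 for g, p in zip(gold, pred) if lab not in g and lab in p)
        fn = sum(1 for g, p in zip(gold, pred) if lab in g and lab not in p)
        scores.append(f1(tp, fp, fn))
    return sum(scores) / len(scores)
```

Macro averaging weights every emotion equally, so rare emotions count as much as frequent ones, which is why it is the headline metric for an imbalanced label set.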
All resources are publicly released on Hugging Face under open licenses.
Multi-label emotion annotations (binary presence/absence) for 4,949 Ukrainian texts. Train / Dev / Test splits.
Emotion annotations with perceived intensity levels (Low / Medium / High) collected from crowdworkers.
Raw individual annotations from all 5 annotators per instance, enabling inter-annotator analysis.
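One simple analysis the raw annotations enable is observed pairwise agreement. A minimal sketch (plain agreement, not chance-corrected like Krippendorff's alpha; the input shape is an assumption about how the raw annotations might be arranged):

```python
from itertools import combinations

def pairwise_agreement(rows) -> float:
    """rows: one dict per instance mapping annotator_id -> binary label.
    Returns the mean fraction of annotator pairs that agree."""
    agree, total = 0, 0
    for row in rows:
        for a, b in combinations(sorted(row), 2):
            agree += row[a] == row[b]
            total += 1
    return agree / total
```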
Best-performing fine-tuned classifier (0.6B parameters), achieving state-of-the-art results on EmoBench-UA.
Full paper with dataset statistics, annotation guidelines, ablation studies, and error analysis.
Published in Findings of EMNLP 2025. Also featured in SemEval-2025 Task 11 (Ukrainian track).
If you use EmoBench-UA in your research, please cite our paper: