🏛 EMNLP 2025 Findings  ·  SemEval-2025 Task 11

EmoBench-UA

The first manually annotated benchmark dataset for emotion detection in Ukrainian texts — covering six basic emotions plus a neutral None class, crowdsourced via Toloka.ai with high-confidence annotation.

📄 Dementieva, Babakov & Fraser 🇺🇦 Ukrainian Language NLP 🔬 TUM · USC · MCML
📄 Read the Paper 🤗 Models & Datasets 📚 ACL Anthology
😄Joy
😡Anger
😨Fear
😢Sadness
🤢Disgust
😲Surprise
😐None

Overview

A Benchmark for Ukrainian Emotion Detection

While Ukrainian NLP has seen progress across many text processing tasks — sentiment analysis, toxicity detection, NER — emotion classification remained an underexplored frontier with no publicly available benchmark. We fill this gap with EmoBench-UA.

The annotation schema follows Ekman's framework of basic emotions. Texts were sourced from Ukrainian tweets, filtered for quality, and annotated by crowdworkers on Toloka.ai with stringent quality controls — language proficiency tests, exam phases, and confidence-weighted majority voting via the Dawid-Skene model.

We evaluate a broad range of approaches: linguistic baselines, Transformer encoders (mBERT, XLM-RoBERTa, monolingual Ukrainian models), cross-lingual transfer, and modern LLMs — establishing the first comprehensive benchmark for this task.

4,949
Labeled Instances
7
Emotion Classes
5
Annotators per Text
0.85
Krippendorff's α
≥90%
Confidence Threshold

Methodology

Annotation Pipeline

A carefully designed crowdsourcing pipeline on Toloka.ai, split into two sequential projects to reduce annotator cognitive load and improve label quality.

01 — DATA SELECTION

Source & Filtering

Ukrainian tweets corpus filtered by length (5–49 words), toxicity score, and likelihood of emotional content, estimated by an English emotion classifier applied to machine-translated texts.

02 — TWO-STAGE ANNOTATION

Split Projects

Project 1: Fear, Surprise, Disgust. Project 2: Anger, Joy, Sadness. Sequential design ensures multi-label coverage with lower cognitive load.

03 — QUALITY CONTROL

Annotator Screening

Ukrainian proficiency tests, training + exam phases, random control tasks, anti-fatigue breaks, and automated banning rules.

04 — AGGREGATION

Dawid-Skene Voting

5 annotators per instance. Final labels via confidence-weighted majority voting. Only instances with ≥90% confidence are retained.
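The aggregation step above can be sketched in a few lines. This is illustrative only: the paper uses the Dawid-Skene model, which also learns per-annotator skill; here a plain confidence-weighted majority vote with hypothetical annotator weights stands in for it.

```python
from collections import Counter

def aggregate_label(votes, threshold=0.90):
    """Confidence-weighted majority vote over one instance's annotations.

    votes: list of (label, weight) pairs, one per annotator; the weights
    would come from a worker-skill model such as Dawid-Skene (hypothetical
    values here). Returns the winning label only if its weighted share of
    the total vote mass meets the confidence threshold, else None (the
    instance is discarded, mirroring the >=90% retention rule).
    """
    mass = Counter()
    for label, weight in votes:
        mass[label] += weight
    label, top = mass.most_common(1)[0]
    confidence = top / sum(mass.values())
    return label if confidence >= threshold else None
```

For example, five unanimous votes for "Joy" pass the threshold, while a 3-vs-2 split falls below 90% confidence and is dropped.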


Dataset Examples

Sample Annotations

Ukrainian social media texts labeled with Ekman's basic emotions. Multi-label instances are supported.

а я сьогодні біжу до щастя :)
and today I am running to happiness :)
Joy
ти з мене іздіваєшся?!
Are you kidding me?!
Anger
Шось мені ця арома кава не подобається, фу
I don't like this flavored coffee, ew
Disgust
такого ви ще не бачили!
you have never seen anything like it!
Surprise
Я скучила за цим місцем…
I missed this place...
Sadness
Починаю серйозно хвилюватись за котика.
I am starting to worry about the kitty.
Fear

Model Comparison

We evaluate linguistic baselines, Transformer encoders, and large language models. Ukrainian-specific fine-tuned models achieve the best performance.

| Model | Type | F1 (macro) |
|---|---|---|
| Keywords Baseline | Linguistic | 0.21 |
| Logistic Regression (TF-IDF) | Linguistic | 0.38 |
| mBERT | Transformer | 0.46 |
| XLM-RoBERTa base | Transformer | 0.51 |
| XLM-RoBERTa large | Transformer | 0.55 |
| Twitter-XLM-RoBERTa | Transformer | 0.54 |
| EuroLLM | LLM | 0.41 |
| Llama 3.1 | LLM | 0.44 |
| ukr-emotions-classifier (ours) | Fine-tuned | 0.61 ★ |

Note: The full results table with all model variants and per-emotion scores is available in the paper.
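Macro F1, the metric in the table, averages per-class F1 scores with equal weight, so rare emotions count as much as frequent ones. A self-contained sketch for the multi-label setting (illustrative; the paper's exact evaluation script may differ, e.g. in how empty classes are handled):

```python
def macro_f1(y_true, y_pred):
    """Macro-averaged F1 over classes for multi-hot label vectors.

    y_true, y_pred: lists of equal-length binary vectors, one per text.
    Per class: F1 = 2*TP / (2*TP + FP + FN); a class with no true or
    predicted positives scores 0.0 here (one common convention).
    """
    n_classes = len(y_true[0])
    scores = []
    for c in range(n_classes):
        tp = sum(t[c] and p[c] for t, p in zip(y_true, y_pred))
        fp = sum((not t[c]) and p[c] for t, p in zip(y_true, y_pred))
        fn = sum(t[c] and (not p[c]) for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n_classes
```

With two classes where the first is predicted perfectly and the second misses one of two positives, the macro score is (1.0 + 2/3) / 2 ≈ 0.83, regardless of how many examples each class has.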


Resources

Open Data & Models

All resources are publicly released on Hugging Face under open licenses.

📊

ukr-emotions-binary

Multi-label emotion annotations (binary presence/absence) for 4,949 Ukrainian texts. Train / Dev / Test splits.

📈

ukr-emotions-intensity

Emotion annotations with perceived intensity levels (Low / Medium / High) collected from crowdworkers.

🗂

ukr-emotions-per-annotator

Raw individual annotations from all 5 annotators per instance, enabling inter-annotator analysis.

🤖

ukr-emotions-classifier

Best-performing fine-tuned classifier (0.6B) achieving state-of-the-art results on EmoBench-UA.

📄

arXiv Preprint

Full paper with dataset statistics, annotation guidelines, ablation studies, and error analysis.

📚

ACL Anthology

Published in Findings of EMNLP 2025. Also featured in SemEval-2025 Task 11 (Ukrainian track).


Authors

Research Team

Daryna Dementieva

TUM · MCML
Technical University of Munich
Munich Center for Machine Learning

Nikolay Babakov

USC
Universidade de Santiago de Compostela

Alexander Fraser

TUM · MCML · MDSI
Technical University of Munich
Munich Data Science Institute

Citation

Cite This Work

If you use EmoBench-UA in your research, please cite our paper:

@inproceedings{dementieva-etal-2025-emobench,
    title = "{E}mo{B}ench-{UA}: A Benchmark Dataset for Emotion Detection in {U}krainian",
    author = "Dementieva, Daryna and
      Babakov, Nikolay and
      Fraser, Alexander",
    editor = "Christodoulopoulos, Christos and
      Chakraborty, Tanmoy and
      Rose, Carolyn and
      Peng, Violet",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2025",
    month = nov,
    year = "2025",
    address = "Suzhou, China",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.findings-emnlp.107/",
    doi = "10.18653/v1/2025.findings-emnlp.107",
    pages = "2025--2048",
    ISBN = "979-8-89176-335-7",
    abstract = "While Ukrainian NLP has seen progress in many texts processing tasks, emotion classification remains an underexplored area with no publicly available benchmark to date. In this work, we introduce EmoBench-UA, the first annotated dataset for emotion detection in Ukrainian texts. Our annotation schema is adapted from the previous English-centric works on emotion detection (Mohammad et al., 2018; Mohammad, 2022) guidelines. The dataset was created through crowdsourcing using the Toloka.ai platform ensuring high-quality of the annotation process. Then, we evaluate a range of approaches on the collected dataset, starting from linguistic-based baselines, synthetic data translated from English, to large language models (LLMs). Our findings highlight the challenges of emotion classification in non-mainstream languages like Ukrainian and emphasize the need for further development of Ukrainian-specific models and training resources."
}