The first manually annotated benchmark dataset for emotion detection in Ukrainian texts, covering six basic emotions plus a None class, crowdsourced via Toloka.ai with high-confidence annotation.
While Ukrainian NLP has made progress on many text processing tasks (sentiment analysis, toxicity detection, NER), emotion classification has remained underexplored, with no publicly available benchmark. EmoBench-UA fills this gap.
The annotation schema follows Ekman's framework of basic emotions. Texts were sourced from Ukrainian tweets, filtered for quality, and annotated by crowdworkers on Toloka.ai with stringent quality controls — language proficiency tests, exam phases, and confidence-weighted majority voting via the Dawid-Skene model.
We evaluate a broad range of approaches: linguistic baselines, Transformer encoders (mBERT, XLM-RoBERTa, monolingual Ukrainian models), cross-lingual transfer, and modern LLMs — establishing the first comprehensive benchmark for this task.
A carefully designed crowdsourcing pipeline on Toloka.ai, split into two sequential projects to reduce annotator cognitive load and improve label quality.
A corpus of Ukrainian tweets filtered by length (5–49 words), toxicity score, and likelihood of emotional content, estimated with an English emotion classifier applied to translated texts.
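The filtering step above can be sketched as a simple predicate. This is illustrative only: the field names (`text`, `toxicity`, `emotion_prob`) and the toxicity/emotion thresholds are assumptions; only the 5–49 word range comes from the pipeline description.

```python
def keep(tweet: dict) -> bool:
    """Sketch of the candidate filter: length in words, low toxicity,
    and high likelihood of emotional content.
    Field names and the 0.5 thresholds are hypothetical."""
    n_words = len(tweet["text"].split())
    return (
        5 <= n_words <= 49          # length filter from the pipeline
        and tweet["toxicity"] < 0.5  # drop likely-toxic tweets
        and tweet["emotion_prob"] > 0.5  # keep likely-emotional tweets
    )
```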
Project 1: Fear, Surprise, Disgust. Project 2: Anger, Joy, Sadness. Sequential design ensures multi-label coverage with lower cognitive load.
Ukrainian proficiency tests, training + exam phases, random control tasks, anti-fatigue breaks, and automated banning rules.
5 annotators per instance. Final labels via confidence-weighted majority voting. Only instances with ≥90% confidence are retained.
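The aggregation step can be sketched as follows. This is a simplified stand-in for the actual Dawid-Skene procedure: it takes per-annotator reliability estimates as given (Dawid-Skene infers them jointly with the labels via EM) and keeps only labels whose weighted-vote confidence meets the 90% threshold.

```python
def aggregate(votes: dict, reliability: dict, threshold: float = 0.9):
    """Confidence-weighted majority vote for one (instance, emotion) pair.

    votes:       annotator_id -> bool (emotion present?)
    reliability: annotator_id -> estimated annotator accuracy
    Returns (label, confidence), or None if confidence < threshold
    (the instance is then discarded, as in the 90%-confidence filter).
    """
    weight_yes = sum(reliability[a] for a, v in votes.items() if v)
    weight_no = sum(reliability[a] for a, v in votes.items() if not v)
    total = weight_yes + weight_no
    label = weight_yes >= weight_no
    confidence = max(weight_yes, weight_no) / total
    return (label, confidence) if confidence >= threshold else None
```

With five annotators per instance, a unanimous vote always passes the threshold, while a 4-to-1 split may fall below it if the dissenting annotator is estimated to be reliable.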
Ukrainian social media texts labelled with Ekman's basic emotions. Multi-label instances are supported.
We evaluate linguistic baselines, Transformer encoders, and large language models. Ukrainian-specific fine-tuned models achieve the best performance.
| Model | Type | F1 (macro) |
|---|---|---|
| Keywords Baseline | Linguistic | 0.21 |
| Logistic Regression (TF-IDF) | Linguistic | 0.38 |
| mBERT | Transformer | 0.46 |
| XLM-RoBERTa base | Transformer | 0.51 |
| XLM-RoBERTa large | Transformer | 0.55 |
| Twitter-XLM-RoBERTa | Transformer | 0.54 |
| EuroLLM | LLM | 0.41 |
| Llama 3.1 | LLM | 0.44 |
| ukr-emotions-classifier (ours) | Fine-tuned | 0.61 ★ |
Note: Full results table with all model variants and per-emotion scores available in the paper.
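The macro-F1 scores above average per-emotion F1 over all emotion labels, treating each emotion as an independent binary decision. A minimal pure-Python sketch (equivalent to scikit-learn's `f1_score(average="macro")` on binarized labels; the label names in the usage example are illustrative):

```python
def f1(tp: int, fp: int, fn: int) -> float:
    """Per-label F1; defined as 0.0 when a label has no gold or predicted positives."""
    if 2 * tp + fp + fn == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)

def macro_f1(gold, pred, labels) -> float:
    """gold, pred: per-instance sets of emotion labels (multi-label)."""
    scores = []
    for lab in labels:
        tp = sum(1 for g, p in zip(gold, pred) if lab in g and lab in p)
        fp = sum(1 for g, p in zip(gold, pred) if lab not in g and lab in p)
        fn = sum(1 for g, p in zip(gold, pred) if lab in g and lab not in p)
        scores.append(f1(tp, fp, fn))
    return sum(scores) / len(scores)
```

Macro averaging weights every emotion equally, so rare emotions count as much as frequent ones, which is why it is the headline metric for an imbalanced label set.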
All resources are publicly released on Hugging Face under open licenses.
Multi-label emotion annotations (binary presence/absence) for 4,949 Ukrainian texts. Train / Dev / Test splits.
Emotion annotations with perceived intensity levels (Low / Medium / High) collected from crowdworkers.
Raw individual annotations from all 5 annotators per instance, enabling inter-annotator analysis.
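One simple analysis the raw annotations enable is observed pairwise agreement. A minimal sketch (plain agreement, not chance-corrected like Krippendorff's alpha; the input shape is an assumption about how the raw annotations might be arranged):

```python
from itertools import combinations

def pairwise_agreement(rows) -> float:
    """rows: one dict per instance mapping annotator_id -> binary label.
    Returns the mean fraction of annotator pairs that agree."""
    agree, total = 0, 0
    for row in rows:
        for a, b in combinations(sorted(row), 2):
            agree += row[a] == row[b]
            total += 1
    return agree / total
```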
Best-performing fine-tuned classifier (0.6B parameters), achieving state-of-the-art results on EmoBench-UA.
Full paper with dataset statistics, annotation guidelines, ablation studies, and error analysis.
Published in Findings of EMNLP 2025. Also featured in SemEval-2025 Task 11 (Ukrainian track).
If you use EmoBench-UA in your research, please cite our paper: