Arabic NLP

WojoodNER 2023: The First Arabic Named Entity Recognition Shared Task

We present WojoodNER-2023, the first Arabic Named Entity Recognition (NER) Shared Task. The primary focus of WojoodNER 2023 is on Arabic NER, offering novel NER datasets (i.e., …

mustafa-jarrar

SALMA: Arabic Sense-Annotated Corpus and WSD Benchmarks

SALMA, the first Arabic sense-annotated corpus, consists of 34K tokens, which are all sense-annotated. The corpus is annotated using two different sense inventories simultaneously …

mustafa-jarrar

ArBanking77: Intent Detection Neural Model and a New Dataset in Modern and Dialectical Arabic

This paper presents ArBanking77, a large Arabic dataset for intent detection in the banking domain. Our dataset was arabized and localized from the original English Banking77 …

mustafa-jarrar

Arabic Fine-Grained Entity Recognition

Traditional NER systems are typically trained to recognize coarse-grained entities, and less attention is given to classifying entities into a hierarchy of fine-grained lower-level …

haneen-liqreina

Offensive Hebrew Corpus and Detection using BERT

Offensive language detection has been well studied in many languages, but it is lagging behind in low-resource languages, such as Hebrew. In this paper, we present a new offensive …

nagham-hamad

Context-Gloss Augmentation for Improving Arabic Target Sense Verification

The Arabic language lacks semantic datasets and sense inventories. The most common semantically-labeled dataset for Arabic is ArabGlossBERT, a relatively small dataset that …

sanad-malaysha

Wojood: Nested Arabic Named Entity Corpus and Recognition using BERT

This paper presents Wojood, a corpus for Arabic nested Named Entity Recognition (NER). Nested entities occur when one entity mention is embedded inside another entity mention. …

mustafa-jarrar

Wojood - Arabic NER

Wojood consists of about 550K tokens (MSA and dialect) that are manually annotated with 21 entity types (e.g., person, organization, location, event, and date). It covers multiple …