Wojood - Arabic NER


Wojood is an Arabic named entity recognition (NER) corpus of about 550K tokens of Modern Standard Arabic (MSA) and dialectal Arabic, manually annotated with 21 entity types (e.g., person, organization, location, event, and date). It covers multiple domains and is annotated with nested entities: of its roughly 75K entities, 22.5% are nested. A nested NER model based on BERT was trained on the corpus and achieved an F1-score of 88.4%.
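To make "nested" concrete, here is a minimal Python sketch. It assumes each entity is stored as a token-offset span (start inclusive, end exclusive) with a type label. The `Span` class, the helper functions, the example sentence, and its labels are all hypothetical illustrations, not Wojood's actual file format or annotations. In this sketch, an entity counts as nested when it lies strictly inside a longer entity, and a ratio analogous to the reported 22.5% is simply the share of such entities.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """A hypothetical entity annotation over token offsets."""
    start: int  # index of the first token (inclusive)
    end: int    # index one past the last token (exclusive)
    label: str  # entity type, e.g. "ORG" (illustrative label names)


def is_nested(span: Span, spans: list[Span]) -> bool:
    """Return True if `span` lies strictly inside some longer span in `spans`."""
    return any(
        other.start <= span.start
        and span.end <= other.end
        and (other.end - other.start) > (span.end - span.start)
        for other in spans
    )


def nested_fraction(spans: list[Span]) -> float:
    """Share of entities that are contained inside another entity."""
    if not spans:
        return 0.0
    return sum(is_nested(s, spans) for s in spans) / len(spans)


# Hypothetical sentence: "He studied at Birzeit University"
tokens = ["درس", "في", "جامعة", "بيرزيت"]
spans = [
    Span(2, 4, "ORG"),  # "جامعة بيرزيت" (Birzeit University)
    Span(3, 4, "GPE"),  # "بيرزيت" (Birzeit), inside the ORG span
]

for s in spans:
    print(s.label, " ".join(tokens[s.start:s.end]), is_nested(s, spans))
print(nested_fraction(spans))  # 0.5
```

A flat BIO tagger assigns exactly one tag per token, so the token "بيرزيت" could not be labeled as both part of the organization and a place name on its own. Span-based representations like this one are a common way to keep both annotations, which is why nested entities call for a nested NER model.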

Mohammed Khalilia (محمد عبد الستار قاسم)
Senior Applied Scientist | Adjunct Professor