Mohammed Khalilia

Lead Data Scientist

Professional Summary

Mohammed Khalilia (محمد عبد الستار قاسم) is a researcher, computer scientist, and data scientist with a PhD in Computer Science from the University of Missouri. Following his doctorate, he joined Georgia Tech’s Computational Science and Engineering school and Emory University as a Postdoctoral Fellow, where his research spanned predictive modeling, relational cluster analysis, and health and nursing informatics.

He then spent nearly five years at Amazon, working across Amazon Web Services (AWS) and Amazon Studios in natural language processing (NLP), speech synthesis, and computer vision. In 2018, he was part of the team that launched Comprehend Medical, Amazon’s NLP service for clinical and biomedical text. At Qualtrics, he developed the company’s first fine-tuned large language model, trained a synthetic sampling model, and worked on conversational machine learning and active learning.

He is also an adjunct professor at Birzeit University, where he teaches NLP courses for doctoral students.

Education

PhD Computer Science

2007-01-01
2014-05-15

University of Missouri-Columbia • USA

BS Computer Science

2001-08-31
2006-12-31

University of Missouri-Columbia • USA

Interests

Machine learning, Natural Language Processing, Biomedical/health informatics, Predictive modeling, Large Language Models

Featured Projects

Health

SuraMed (AI Radiology)

SuraMed is an AI radiology company built for the Arab world, focused on developing advanced clinical imaging tools tailored to hospitals, clinics, and radiology centres across …

Apr 1, 2026 • 1 min read

Surveys

Synthetic Panels

Recruiting the right participants for a study can be difficult. You may not get the exact demographics you need, and the shorter the deadline, the less sure you can be that …

Jan 11, 2025 • 1 min read

Anonymization

Data Anonymization using NER

Customers own their data, and while a data use agreement permits the use of anonymized data, raw data cannot be used for model training. The anonymization tools are rule-based, …

Jan 11, 2024 • 1 min read

Surveys

Insights Explorer

Insights Explorer is an AI-powered text analytics tool that uses your open-ended feedback to identify top themes, create headlines, and generate helpful summaries. This feature can …

Jan 11, 2024 • 1 min read

Arabic NLP

Wojood - Arabic NER

Wojood consists of about 550K tokens (MSA and dialect) that are manually annotated with 21 entity types (e.g., person, organization, location, event, and date). It covers multiple …

Jan 1, 2022 • 1 min read

Clinical NLP

Comprehend Medical

Amazon Comprehend Medical is a HIPAA-eligible natural language processing (NLP) service that uses machine learning that has been pre-trained to understand and extract health data …

Jan 11, 2018 • 1 min read

Featured Publications

Speech

Multi-Channel Volume Level Equalization Based on User Preferences

Systems, devices, and methods are provided for multi-stem volume equalization, wherein the volume levels of each stem may be adjusted non-uniformly. Audio may be diarized into a …

mohammed-khalilia

• Dec 19, 2023 • 1 min read

NLP

Service architecture for entity and relationship detection in unstructured text

Techniques for entity and relationship detection from unstructured text as a service are described. A service may receive a request to identify entities within a provided unstructured …

thiruvarul-selvan-senthivel

• Nov 1, 2022 • 1 min read

Computer Vision

Facial Feature Location-based Audio Frame Replacement

Played audio frames included in first audio content may be received over one or more networks. The first audio content may further include a replaced audio frame. The first audio …

mohammed-khalilia

• Aug 2, 2022 • 1 min read

Computer Vision

Video Frame Replacement Based on Auxiliary Data

The popularity of videoconferencing has increased rapidly in recent years. Videoconferencing tools may allow multiple people at multiple different locations to interact by …

gregory-johnson

• Jun 21, 2022 • 1 min read

Deep Learning

Wojood: Nested Arabic Named Entity Corpus and Recognition using BERT

This paper presents Wojood, a corpus for Arabic nested Named Entity Recognition (NER). Nested entities occur when one entity mention is embedded inside another entity mention. …

mustafa-jarrar

• Jan 1, 2022 • 1 min read

Deep Learning

Joint Entity Extraction and Assertion Detection for Clinical Text

Negative medical findings are prevalent in clinical reports, yet discriminating them from positive findings remains a challenging task for information extraction. Most of the …

parminder-bhatia

• Jan 1, 2019 • 1 min read

Recent Publications

Alaa Aljabari, Mohammed Khalilia, Mustafa Jarrar (2025). $Wojood^{Relations}$: Arabic Relation Extraction Corpus and Modeling. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing.


Diyam Akra, Mohammed Khalilia, Mustafa Jarrar (2025). Active Learning for Multidialectal Arabic POS Tagging. Findings of the Association for Computational Linguistics: EMNLP 2025.


Alaa Aljabari, Nagham Hamad, Mohammed Khalilia, Mustafa Jarrar (2025). WojoodOntology: Ontology-Driven LLM Prompting for Unified Information Extraction Tasks. Proceedings of The Third Arabic Natural Language Processing Conference.


Nagham Hamad, Mohammed Khalilia, Mustafa Jarrar (2025). Konooz: Multi-domain Multi-dialect Corpus for Named Entity Recognition. Findings of the Association for Computational Linguistics: ACL 2025.
