Experience

Lead Data Scientist

Aramco

Leading research on multimodal AI systems. Contributed to Llama 2 and other open-source models.

Staff Applied Scientist

Qualtrics XM

  • Developed and led Synthetic Panels powered by custom LLMs to generate synthetic responses to surveys.
  • Led and supervised Qualtrics LLM fine-tuning across multiple projects, including Insight Explorer and EX comment summary.
  • Led development of the Qualtrics anonymization model.
  • Mentored multiple junior scientists.

Senior Applied Scientist

Qualtrics XM

Machine Learning Scientist

Amazon

  • Audio data preprocessing using the short-time Fourier transform (STFT).
  • Seq2Seq modeling for applications at the intersection of speech and computer vision.
  • Generating speaker embeddings.
  • Voice synthesis (vocoder).
  • Generative models, including GANs and VAEs.
  • Named entity recognition: seq2seq modeling to build NER models based on BERT and LSTM architectures.
  • Temporality extraction: an extension of NER that enables the extraction of temporal events.
  • Language expansion: extending the English NER model to other languages.
  • Data processing, active learning and ETL: data preprocessing, tokenization, active learning and databases.
  • Mentoring: mentoring interns and new team members, and leading projects.

Postdoctoral Fellow

Emory University

The goal of the “Advanced Development of an Open-source Platform for Web-based Integrative Digital Image Analysis in Cancer” project is to develop open-source integrative technologies that facilitate analysis of data from the National Cancer Institute’s The Cancer Genome Atlas (TCGA). My responsibilities included:

  • Developing annotation capabilities that allow users to mark up image data,
  • Assisting in transforming the current Cancer Digital Slide Archive into open-source software,
  • Providing computational image analysis capabilities,
  • Developing an integrative analysis dashboard.

Postdoctoral Fellow

Georgia Institute of Technology

  • Computational health,
  • Patient similarity, similarity learning, graph-based similarity and treatment recommendation,
  • Predictive modeling,
  • Machine learning including clustering, classification and prediction.

Besides research, I was very hands-on. I:

  • Configured cloud computing clusters on Amazon Web Services (AWS),
  • Installed packages on AWS clusters, including Apache Spark and databases,
  • Developed web services running on an AWS instance,
  • Designed web services based on Health Level Seven (HL7) Fast Healthcare Interoperability Resources (FHIR),
  • Managed servers and data access.

Additional responsibilities included:

  • Working with graduate and undergraduate students on various projects and applications,
  • Mentoring graduate and undergraduate students,
  • Assigning tasks to the students,
  • Publishing scientific papers,
  • Teaching.

This work was done in collaboration with government agencies, companies and healthcare organizations, including:

  • Centers for Disease Control and Prevention (CDC),
  • Union Chimique Belge (UCB), a pharmaceutical company headquartered in Brussels, Belgium,
  • Children’s Healthcare of Atlanta.

Research Analyst

University of Missouri-Columbia, School of Medicine

Leveraging Information Technology for Hi-Tech and Hi-Touch Care (http://light2.missouri.edu/) is a $13.3 million federally funded grant aimed at improving the health of, and lowering the cost of care for, a population of 10,000 patients. As an analyst, my responsibilities included:

  • Performing patient risk stratification using rule-based and cluster analysis,
  • Designing and implementing a web-based portal for tracking patient attribution to physicians,
  • Collaborating with Cerner to design a datamart to support meaningful use and data analytics,
  • Performing Extract, Transform, Load (ETL) processes from the EHR to the datamart,
  • Developing scripts that implemented parts of the business logic (later deployed by Cerner).

Education

PhD Computer Science

University of Missouri-Columbia

Computer science, machine learning, unsupervised learning, predictive modeling, health informatics, medicine.

BS Computer Science

University of Missouri-Columbia

Skills & Hobbies
Technical Skills
Python
Machine Learning
Cloud Computing
Research & Communication
Academic Writing
Conference Presentations
Grant Proposals
Awards
Best Paper Award
NeurIPS ∙ December 2022
Awarded for work on efficient training of large models.
Languages
English Fluent
Arabic Native