Voice Style Transfer

Jan 11, 2021 · 1 min read

Voice style transfer, also known as voice cloning, reproduces the voice characteristics of an original speaker while uttering different words, possibly in different languages. To do this, a variational auto-encoder model was trained on two datasets:

Voice Conversion Toolkit (VCTK)
  • 44 hours
  • 109 speakers
VoxCeleb1 and VoxCeleb2
  • 2,000+ hours
  • 7,000 speakers

The waveforms in the above datasets are converted to mel spectrograms using the Short-Time Fourier Transform (STFT) followed by a mel filterbank.
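The post does not include the preprocessing code, but the waveform-to-mel-spectrogram step can be sketched as follows. This is a minimal NumPy illustration, not the project's actual pipeline; the frame size, hop length, sample rate, and number of mel bands are assumed values typical for speech models.

```python
import numpy as np

def hz_to_mel(f):
    # standard mel-scale mapping
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_spectrogram(wave, sr=16000, n_fft=1024, hop=256, n_mels=80):
    # frame the signal and apply a Hann window
    n_frames = 1 + (len(wave) - n_fft) // hop
    window = np.hanning(n_fft)
    frames = np.stack(
        [wave[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    # magnitude spectrum of each frame (the STFT)
    mag = np.abs(np.fft.rfft(frames, axis=1))  # shape: (n_frames, n_fft//2 + 1)

    # build a triangular mel filterbank spanning 0 Hz .. sr/2
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        fb[m - 1, left:center] = (np.arange(left, center) - left) / max(center - left, 1)
        fb[m - 1, center:right] = (right - np.arange(center, right)) / max(right - center, 1)

    # log-mel spectrogram: time on the first axis, mel bands on the second
    return np.log(mag @ fb.T + 1e-6)
```

In practice a library such as `librosa` or `torchaudio` would handle this in a single call; the point here is only to show the STFT-then-filterbank structure of the conversion.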


Then a variational auto-encoder, AutoVC, is trained on these spectrograms.
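The training objective is not spelled out in the post, but a variational auto-encoder is generally trained with two terms: a reconstruction loss on the spectrogram and a KL divergence that keeps the latent code close to a unit-Gaussian prior, with sampling done via the reparameterization trick. A minimal NumPy sketch of those pieces (illustrative only, not the AutoVC implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, log_var):
    # sample z = mu + sigma * eps, keeping the sampling step differentiable
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def vae_loss(x, x_recon, mu, log_var):
    # reconstruction term: mean squared error on the mel spectrogram
    recon = np.mean((x - x_recon) ** 2)
    # KL divergence between q(z|x) = N(mu, sigma^2) and the N(0, I) prior
    kl = -0.5 * np.mean(1.0 + log_var - mu ** 2 - np.exp(log_var))
    return recon + kl
```

In a voice-conversion setup, the encoder's bottleneck is meant to retain the linguistic content while a separate speaker embedding supplies the target voice at decoding time, which is what lets the same words be rendered in a different speaker's voice.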

Author

Mohammed Khalilia, Lead Data Scientist

Mohammed Khalilia (محمد عبد الستار قاسم) is a researcher, computer scientist, and data scientist with a PhD in Computer Science from the University of Missouri. Following his doctorate, he joined Georgia Tech’s Computational Science and Engineering school and Emory University as a Postdoctoral Fellow, where his research spanned predictive modeling, relational cluster analysis, and health and nursing informatics.

He then spent nearly five years at Amazon, working across Amazon Web Services (AWS) and Amazon Studios in natural language processing (NLP), speech synthesis, and computer vision. In 2018, he was part of the team that launched Comprehend Medical, Amazon's NLP service for clinical and biomedical text. At Qualtrics, he developed the company's first fine-tuned large language model, trained a synthetic sampling model, and worked on conversational machine learning and active learning.

He is also an adjunct professor at Birzeit University, where he teaches NLP courses for doctoral students.