<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Speech and Signal Processing | Mohammed Khalilia (محمد عبد الستار قاسم)</title><link>http://mohammedkhalilia.com/tags/speech-and-signal-processing/</link><atom:link href="http://mohammedkhalilia.com/tags/speech-and-signal-processing/index.xml" rel="self" type="application/rss+xml"/><description>Speech and Signal Processing</description><generator>HugoBlox Kit (https://hugoblox.com)</generator><language>en-us</language><lastBuildDate>Mon, 11 Jan 2021 00:00:00 +0000</lastBuildDate><image><url>http://mohammedkhalilia.com/media/icon_hu_e7e672982174d01f.png</url><title>Speech and Signal Processing</title><link>http://mohammedkhalilia.com/tags/speech-and-signal-processing/</link></image><item><title>Voice Style Transfer</title><link>http://mohammedkhalilia.com/projects/voice-style-transfer/</link><pubDate>Mon, 11 Jan 2021 00:00:00 +0000</pubDate><guid>http://mohammedkhalilia.com/projects/voice-style-transfer/</guid><description>&lt;p&gt;Voice style transfer, also known as voice cloning, produces speech that preserves the voice characteristics of the original speaker while uttering different words, possibly in different languages.
To do that, a variational auto-encoder model was trained on two datasets:&lt;/p&gt;
&lt;h5 id="voice-conversion-toolkit-vctk"&gt;Voice Conversion Toolkit (VCTK)&lt;/h5&gt;
&lt;ul&gt;
&lt;li&gt;44 hours&lt;/li&gt;
&lt;li&gt;109 speakers&lt;/li&gt;
&lt;/ul&gt;
&lt;h5 id="voxceleb1-and-voxceleb2"&gt;VoxCeleb1 and VoxCeleb2&lt;/h5&gt;
&lt;ul&gt;
&lt;li&gt;2,000+ hours&lt;/li&gt;
&lt;li&gt;7,000 speakers&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The above data, which consists of raw waveforms, is converted to mel spectrograms using the Short-Time Fourier Transform (STFT).
&lt;figure &gt;
&lt;div class="flex justify-center "&gt;
&lt;div class="w-full" &gt;
&lt;img alt="alt"
srcset="http://mohammedkhalilia.com/projects/voice-style-transfer/fft_hu_9f1cb69ded5d4f92.webp 320w, http://mohammedkhalilia.com/projects/voice-style-transfer/fft_hu_764a88c2c35cdc64.webp 480w, http://mohammedkhalilia.com/projects/voice-style-transfer/fft_hu_80a6dc05ae176835.webp 760w"
sizes="(max-width: 480px) 100vw, (max-width: 768px) 90vw, (max-width: 1024px) 80vw, 760px"
src="http://mohammedkhalilia.com/projects/voice-style-transfer/fft_hu_9f1cb69ded5d4f92.webp"
width="760"
height="77"
loading="lazy" data-zoomable /&gt;&lt;/div&gt;
&lt;/div&gt;&lt;/figure&gt;
&lt;/p&gt;
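&lt;p&gt;As a rough sketch (not the project&amp;rsquo;s actual code), the waveform-to-mel-spectrogram step could look like the following. It uses scipy for the STFT; the parameter values (16&amp;nbsp;kHz sample rate, 1024-sample FFT, 256-sample hop, 80 mel bands) are illustrative assumptions, not the values used in this project:&lt;/p&gt;

```python
# Hypothetical sketch of the waveform-to-mel-spectrogram conversion described above.
# All parameter values here are assumptions, not the project's actual settings.
import numpy as np
from scipy.signal import stft

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters spaced evenly on the mel scale, from 0 Hz to Nyquist
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for j in range(left, center):
            fb[i - 1, j] = (j - left) / max(center - left, 1)
        for j in range(center, right):
            fb[i - 1, j] = (right - j) / max(right - center, 1)
    return fb

def waveform_to_mel(wav, sr=16000, n_fft=1024, hop=256, n_mels=80):
    # STFT -> magnitude spectrogram -> mel projection -> log compression
    _, _, Z = stft(wav, fs=sr, nperseg=n_fft, noverlap=n_fft - hop)
    mag = np.abs(Z)
    mel = mel_filterbank(sr, n_fft, n_mels) @ mag
    return np.log(np.clip(mel, 1e-5, None))  # floor avoids log(0)

# Example: one second of a 440 Hz sine at 16 kHz
t = np.linspace(0.0, 1.0, 16000, endpoint=False)
mel = waveform_to_mel(np.sin(2 * np.pi * 440 * t).astype(np.float32))
print(mel.shape)  # (n_mels, frames)
```

&lt;p&gt;The resulting log-mel frames are the input representation a model such as AutoVC would train on.&lt;/p&gt;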
&lt;p&gt;A variational auto-encoder, AutoVC, is then trained on these spectrograms.&lt;/p&gt;</description></item></channel></rss>