
Microsoft recently announced VALL-E, its new language model for text-to-speech synthesis. Using only 3 seconds of recorded audio, this AI system can imitate a speaker’s voice. According to the company’s documentation, VALL-E is being developed to work with other generative AI models, such as GPT-3, which would allow ChatGPT to offer voice output once it is integrated. Microsoft aims to use this AI widely, including in the browser, in office automation apps, and in other areas. Examples of VALL-E show that it can imitate not only the voice but also the speaker’s original cadence and pitch. Such models are not new: Google showcased similar models years ago. Still, Microsoft promises to broaden the use of VALL-E and other powerful AI systems so that they become a presence in popular solutions.