Vall-E: Meet the artificial intelligence that imitates your voice just by listening to you for three seconds

Microsoft is betting big on the artificial intelligence behind ChatGPT; meanwhile, its new Vall-E model can imitate your voice just by listening to you, which raises concerns such as identity theft.

UNITED STATES.- A few years ago the world belonged to cryptocurrencies; 2022 was the year of non-fungible tokens; and this year, without a doubt, is shaping up to be the year of artificial intelligence and the realization of the metaverse.

With this in mind, Microsoft is betting big on GPT-3, the artificial intelligence designed by OpenAI, for several of its applications and services, ranging from Bing to Word. However, the company has also said that it is developing its own models.

Microsoft’s plan to implement ChatGPT within its products continues: during the first half of 2023 it will arrive in Bing, and there is also information about its planned integration into the Office suite, in addition to a new artificial intelligence.

Meet Vall-E: The artificial intelligence that imitates voices.

Specifically, Vall-E is a language model for text-to-speech (TTS) synthesis. It is based on EnCodec, an audio codec from Meta, and is similar to other artificial intelligences that generate audio from a short text description.

Although Microsoft itself already has a similar product, Text-To-Speech, which converts text into synthesized speech, the difference is that Vall-E can analyze a person’s voice and then work out how that voice would sound saying different phrases.

One notable feature: according to the company, it preserves the speaker’s intonation and emotion and can achieve great results with just three seconds of voice.

“Specifically, we train a neural codec language model (named Vall-E) using discrete codes derived from a standard neural audio codec model, and consider TTS as a conditional language modeling task rather than continuous signal regression as in previous works,” the statement said.
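
To make the quoted idea more concrete, here is a toy Python sketch; it is not Microsoft's implementation. It assumes a crude uniform quantizer in place of a neural codec such as EnCodec and a simple bigram counter in place of a neural language model, and it leaves out the text conditioning entirely. The function names `quantize`, `train_bigram`, and `continue_codes` are hypothetical. It only illustrates the shift the statement describes: audio becomes a sequence of discrete codes, and generation becomes predicting the next code from a short prompt rather than regressing a continuous signal.

```python
from collections import Counter, defaultdict


def quantize(samples, levels=8):
    """Map floats in [-1, 1] to integer codes 0..levels-1.

    Assumption: a crude stand-in for a neural audio codec.
    """
    codes = []
    for s in samples:
        s = max(-1.0, min(1.0, s))  # clip to the expected range
        codes.append(min(levels - 1, int((s + 1.0) / 2.0 * levels)))
    return codes


def train_bigram(sequences):
    """Count how often each code follows another (toy 'language model')."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts


def continue_codes(counts, prompt, n_new):
    """Greedily extend a prompt of codes by up to n_new predicted codes."""
    out = list(prompt)
    for _ in range(n_new):
        options = counts.get(out[-1])
        if not options:
            break  # the last code was never seen in training
        # Most frequent successor; ties go to the smallest code.
        out.append(min(options, key=lambda c: (-options[c], c)))
    return out


if __name__ == "__main__":
    # Hypothetical "training audio": one short repeating waveform.
    training_audio = [[-1.0, -0.5, 0.0, 0.5, -1.0, -0.5, 0.0, 0.5]]
    model = train_bigram([quantize(a) for a in training_audio])
    prompt = quantize([-1.0, -0.5])  # plays the role of the short voice sample
    print(continue_codes(model, prompt, 4))
```

Run as a script, this prints `[0, 2, 4, 6, 0, 2]`: the two-code prompt is continued with the pattern learned from the training sequence. A real system would replace both stand-ins with neural networks and would also condition on the text to be spoken.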

In practice, once this model is integrated, ChatGPT itself would be able to deliver voice results. A request such as “Imitate the voice of the little boy on the road” would be possible, as long as the prior training has been carried out.

The aim, as explained in an article by Hipertextual, is to create speech from a text input; however, this brings drawbacks because, if Vall-E becomes available to the public, many could use it to impersonate other people.

