News + Trends

Microsoft's VALL-E imitates any voice - three seconds of recording is enough

Martin Jud
11.1.2023
Translation: machine translated

DALL-E is followed by VALL-E: Microsoft has created a new artificial intelligence (AI) that can imitate voices. A voice recording of just three seconds is said to be enough input for the AI.

By now we know that what photos or videos show need not actually have happened. Since ChatGPT and DALL-E, it has also been clear that a text need not come from an author's pen, nor a picture from an artist's brush. Now it's the voice's turn.

Microsoft is aware that the technology also has potential for misuse. For this reason, future applications are to include a protocol that ensures content created by VALL-E can be recognised as such.

The AI delivers impressive results in the examples presented by Microsoft. It was trained on 60,000 hours of English-language recordings. That is a hundred times the training data used by existing speech synthesis systems.

Cover image: shutterstock



