

Microsoft's VALL-E imitates any voice - three seconds of recording is enough

DALL-E is followed by VALL-E: Microsoft has created a new artificial intelligence (AI) that can imitate voices. A voice recording of just three seconds is said to be enough input for the AI.
We now know that what photos or videos show need not actually have happened. Since ChatGPT and DALL-E, it has also been clear that a text need not come from an author's pen, nor a picture from an artist's brush. Now it's the voice's turn.
Microsoft is aware that the technology also has potential for misuse. For this reason, a protocol in future applications will ensure that content created by VALL-E can be recognised as such.
The AI delivers impressive results in the examples presented by Microsoft. It was trained on 60,000 hours of English-language recordings, which corresponds to a hundred times the training data of existing speech synthesis systems.
Cover image: shutterstock
