Meta unveils Voicebox, the most versatile generative AI for speech generation

Meta has launched Voicebox, a generative artificial intelligence (AI) model for speech. The model handles a range of speech generation tasks, such as audio editing, sampling, and stylising, that it was not specifically trained to perform. Instead, Voicebox picks up these skills through in-context learning.

Voicebox can create high-quality audio clips and edit pre-recorded audio, removing intrusive sounds such as car horns or barking dogs while retaining the original content and style. It is also multilingual and can generate speech in six languages.

Voicebox: The Future of Multipurpose Generative AI

Looking forward, Voicebox illustrates the potential of multipurpose generative AI models. Such models could give virtual assistants and non-player characters in the metaverse natural-sounding voices. They could also allow visually impaired people to hear written messages from friends read aloud by AI in familiar voices.

Beyond these applications, Voicebox promises to revolutionise content creation, providing creators with innovative tools for effortless audio track creation and editing for videos, and much more besides.

Meta’s Voicebox: A Multitude of Applications

Voicebox, Meta’s latest offering, is versatile, with capabilities extending across a wide spectrum of tasks:

In-context text-to-speech synthesis: Given an audio sample as short as two seconds, Voicebox can match its style when generating speech from text.

Speech editing and noise reduction: Voicebox can recreate a speech segment interrupted by noise or replace mispronounced words, without the need to re-record the whole speech. Think of it as an eraser for audio editing.

Cross-lingual style transfer: Given a sample of speech and a text passage in English, French, German, Spanish, Polish, or Portuguese, Voicebox can generate a reading of the text in any of these languages, even if the sample speech and the text are not in the same language. This function could be instrumental in facilitating natural and authentic communication, even between speakers of different languages.

Diverse speech sampling: Having learned from diverse data, Voicebox can generate speech that reflects how people talk in real-world conversations, in each of the six languages listed above.

The unveiling of Voicebox reflects Meta’s commitment to advancing generative AI research. As the company moves further into the audio space, it says it looks forward to seeing how other researchers and innovators build on Voicebox’s capabilities.
