{"id":166009,"date":"2023-06-18T19:25:13","date_gmt":"2023-06-19T00:25:13","guid":{"rendered":"https:\/\/lifeboat.com\/blog\/2023\/06\/meta-introduces-voicebox-does-a-first-on-generative-ai-speech"},"modified":"2023-06-18T19:25:13","modified_gmt":"2023-06-19T00:25:13","slug":"meta-introduces-voicebox-does-a-first-on-generative-ai-speech","status":"publish","type":"post","link":"https:\/\/lifeboat.com\/blog\/2023\/06\/meta-introduces-voicebox-does-a-first-on-generative-ai-speech","title":{"rendered":"Meta introduces Voicebox, does a first on Generative AI speech"},"content":{"rendered":"<p><a class=\"aligncenter blog-photo\" href=\"https:\/\/lifeboat.com\/blog.images\/meta-introduces-voicebox-does-a-first-on-generative-ai-speech2.jpg\"><\/a><\/p>\n<p>Meta AI researchers have moved a step forward in the field of generative AI for speech with the development of Voicebox. Unlike previous models, Voicebox can generalize to speech-generation tasks that it was not specifically trained for, demonstrating state-of-the-art performance.<\/p>\n<p>Voicebox is a versatile generative system for speech that can produce high-quality audio clips in a wide variety of styles. It can create outputs from scratch or modify existing samples. The model supports speech synthesis in six languages, as well as <a class=\"\" href=\"https:\/\/tech.hindustantimes.com\/tags\/noise\" data-name=\"noise\">noise<\/a> removal, content editing, style conversion, and diverse sample generation.<\/p>\n<p>Traditionally, generative AI models for speech required specific training for each task using carefully prepared training data. However, Voicebox adopts a new approach called Flow Matching, which surpasses diffusion models in performance. On English text-to-speech tasks, it outperforms existing state-of-the-art models like VALL-E, achieving a lower <a class=\"\" href=\"https:\/\/tech.hindustantimes.com\/tags\/word\" data-name=\"word\">word<\/a> error rate (1.9% vs. 5.9% for VALL-E) and higher audio similarity (0.681 vs. 0.580), while also being up to 20 times faster. In cross-lingual style transfer, Voicebox surpasses YourTTS by reducing word error rates from 10.9% to 5.2% and improving audio similarity from 0.335 to 0.481.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Meta AI researchers have moved a step forward in the field of generative AI for speech with the development of Voicebox. Unlike previous models, Voicebox can generalize to speech-generation tasks that it was not specifically trained for, demonstrating state-of-the-art performance. Voicebox is a versatile generative system for speech that can produce high-quality audio clips in [\u2026]<\/p>\n","protected":false},"author":609,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[6],"tags":[],"class_list":["post-166009","post","type-post","status-publish","format-standard","hentry","category-robotics-ai"],"_links":{"self":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/posts\/166009","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/users\/609"}],"replies":[{"embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/comments?post=166009"}],"version-history":[{"count":0,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/posts\/166009\/revisions"}],"wp:attachment":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/media?parent=166009"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/categories?post=166009"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/tags?post=166009"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}