{"id":154836,"date":"2023-01-07T22:24:24","date_gmt":"2023-01-08T04:24:24","guid":{"rendered":"https:\/\/lifeboat.com\/blog\/2023\/01\/microsoft-unveils-vall-e-a-voice-dall-e"},"modified":"2023-01-07T22:24:24","modified_gmt":"2023-01-08T04:24:24","slug":"microsoft-unveils-vall-e-a-voice-dall-e","status":"publish","type":"post","link":"https:\/\/lifeboat.com\/blog\/2023\/01\/microsoft-unveils-vall-e-a-voice-dall-e","title":{"rendered":"Microsoft Unveils VALL-E, A Voice DALL-E"},"content":{"rendered":"<p><a class=\"aligncenter blog-photo\" href=\"https:\/\/lifeboat.com\/blog.images\/microsoft-unveils-vall-e-a-voice-dall-e3.jpg\"><\/a><\/p>\n<p>VALL-E can generate varied outputs from the same input text while preserving the speaker\u2019s emotion and the acoustic environment of the prompt. Prompted with only a short recording of a speaker it never saw during training (the zero-shot scenario), VALL-E can synthesise natural speech with high speaker similarity. According to the evaluation results, VALL-E significantly outperforms the previous state-of-the-art zero-shot TTS system on <a href=\"https:\/\/analyticsindiamag.com\/amid-chatgpt-hype-openai-silently-releases-second-version-of-whisper\/\">LibriSpeech<\/a> and VCTK, setting new state-of-the-art zero-shot TTS results on both datasets.<\/p>\n<p>Notably, people who have lost their voice could \u2018talk\u2019 again through this text-to-speech method, provided they have earlier recordings of their own voice. Two years ago, Stanford University professor <a href=\"https:\/\/analyticsindiamag.com\/stanford-university-professor-maneesh-agrawala-on-video-editing-tools-deep-fakes-more\/\">Maneesh Agrawala<\/a> told AIM that they were working on something similar: they planned to record a patient\u2019s voice before surgery and then use that recording to convert the patient\u2019s electrolarynx voice back into their pre-surgery voice.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>VALL-E can generate varied outputs from the same input text while preserving the speaker\u2019s emotion and the acoustic environment of the prompt. Prompted with only a short recording of a speaker it never saw during training (the zero-shot scenario), VALL-E can synthesise natural speech with high speaker similarity. According to the evaluation results, VALL-E [\u2026]<\/p>\n","protected":false},"author":556,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[11],"tags":[],"class_list":["post-154836","post","type-post","status-publish","format-standard","hentry","category-biotech-medical"],"_links":{"self":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/posts\/154836","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/users\/556"}],"replies":[{"embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/comments?post=154836"}],"version-history":[{"count":0,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/posts\/154836\/revisions"}],"wp:attachment":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/media?parent=154836"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/categories?post=154836"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/tags?post=154836"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}