Microsoft TTS voices free

Recently, Microsoft announced limited access to its Neural Text-To-Speech (TTS) AI, which enables developers to create custom synthetic voices. So far, AT&T, Duolingo, Progressive, and Swisscom have tapped the Custom Neural Voice feature to develop unique speech solutions. Below, we explain the tech behind the Neural TTS system. The technology consists of three main components: text analyser, neural acoustic model, and neural vocoder. The text analyser converts plain text to pronunciations, the acoustic model converts pronunciations to acoustic features and, finally, the vocoder generates waveforms.

The process starts with a text analyser that has to overcome many challenges. It becomes especially tricky with heteronyms like ‘read’, which is pronounced differently depending on the tense. Another major challenge is word segmentation, since it works differently in different languages. Tagging words according to their part of speech (noun, verb, adjective, adverb) is another concern. Normalisation, which expands shorthand like ‘$300’ to ‘300 dollars’ or abbreviations like ‘Dr’ to ‘Doctor’, is also essential. Morphology, used to classify words in terms of number (singular or plural), gender, or case, is also needed. Microsoft’s recent innovation called Unified Neural Text Analyser (UniTA) helps simplify the text analyser workflow while reducing pronunciation errors by over 50%. Firstly, UniTA converts the input text to word embedding vectors. Word embedding is an NLP technique that maps words or phrases to vectors of real numbers.
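To make the text analyser's challenges more concrete, here is a minimal Python sketch of two of the steps the article mentions: normalising ‘$300’ and ‘Dr’, and choosing a pronunciation for the heteronym ‘read’. It is a toy, rule-based illustration and not how UniTA works, since UniTA is described as a neural model. The names `ABBREVIATIONS`, `PAST_CUES`, `normalise`, and `pronounce_read` are hypothetical, and the cue-word heuristic is an assumption made purely for illustration.

```python
import re

# Hypothetical abbreviation table; a real system would need a far larger one.
ABBREVIATIONS = {"Dr": "Doctor"}

# Hypothetical cue words hinting that 'read' is in the past tense.
PAST_CUES = {"yesterday", "had", "has", "have", "already"}


def normalise(text: str) -> str:
    """Expand currency amounts and known abbreviations into spoken-form words."""
    # '$300' -> '300 dollars'
    text = re.sub(r"\$(\d+(?:\.\d+)?)", r"\1 dollars", text)
    # 'Dr' or 'Dr.' -> 'Doctor' (whole-word match only)
    for abbr, full in ABBREVIATIONS.items():
        text = re.sub(rf"\b{re.escape(abbr)}\b\.?", full, text)
    return text


def pronounce_read(sentence: str) -> str:
    """Return a rough pronunciation label for 'read': 'red' (past) or 'reed' (present)."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    if "read" not in tokens:
        raise ValueError("sentence does not contain the word 'read'")
    # Toy heuristic: any past-tense cue word in the sentence selects 'red'.
    return "red" if any(tok in PAST_CUES for tok in tokens) else "reed"


if __name__ == "__main__":
    print(normalise("Dr Smith paid $300 for the course."))
    print(pronounce_read("I have read the book."))
    print(pronounce_read("I will read the book."))
```

Hand-written rules like these break down quickly, for example ‘I read it last week’ contains none of the cue words, which helps explain why a learned text analyser is attractive for this stage.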