I am currently using IBM Watson's text-to-speech API and am working on translation customization for some specific words. I was wondering what you think I should do. In the following text:
the company estimates revenues to be $194MM to 200MM
IBM Watson will pronounce the "MM" as "M M", when it should say "millions". I don't think I want to translate "MM" to "millions". Is there a way to tell Watson that if "MM" occurs after a number, it is pronounced as "millions" instead of "M M"?
Also, does capitalization make any difference? Is "mm" the same as "MM"?
If you want MM to be read as "millions", there must be a "$" preceding the number. The text-to-speech engine can't blindly change MM to millions because that may be inappropriate in some contexts. For example, it could represent millimeters.
In your example, you should change 200MM to $200MM.
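If the input text comes from a source you cannot edit by hand, the missing "$" could be added in a preprocessing step before the text is sent to the service. The following is only a minimal sketch under that assumption: the function name add_missing_dollar and the pattern are illustrative and not part of any Watson API, and the pattern only covers ranges written as "$<number>MM to <number>MM".

```python
import re

# A "$" amount ending in MM, the word "to", then a bare amount ending in MM.
_RANGE = re.compile(r"(\$\d+(?:\.\d+)?MM\s+to\s+)(\d+(?:\.\d+)?MM)\b")


def add_missing_dollar(text: str) -> str:
    """Prefix the second amount of a '$194MM to 200MM' range with '$'."""
    return _RANGE.sub(r"\1$\2", text)


print(add_missing_dollar("the company estimates revenues to be $194MM to 200MM"))
```

Amounts that have no "$" anywhere nearby (for example "5MM", which could mean millimeters) are left untouched on purpose.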
I am stumped.
I have a column with a few thousand rows of unique addresses of universities, pharmaceutical companies, etc. in a KNIME workflow.
Example:
55 Shattuck Street Boston Massachusetts 02115 US [NAT: US RES: US] for all designated states
What I need is to clean the data so that each row looks nice and computable, like this:
55 Shattuck Street Boston Massachusetts 02115 US.
My problem is that I can't seem to get the system to remove everything after US. Does anyone know a suitable approach in KNIME?
You should be able to use either String Replacer or String Manipulation for this. The first one lets you use either a simple wildcard or a full regular expression pattern while the second one uses a Java-like syntax - the choice comes down to how many different variations on the input data you need to handle and which syntax you prefer.
If you just need to remove any text between square brackets including the space before the open bracket then you can use String Replacer configured like this:
Besides the nodes already mentioned by nekomatic, which will work perfectly for the given scenario, there's also a user-friendly regular expression tool in the Palladian nodes extension called Regex Extractor. It lets you build your regexes with a live preview, as you might know from popular online regex testers.
For your scenario, you could e.g. set up a regex like this:
^(?<address>.*)(?:\s\[.*)
In prose, this means: Capture all characters until a space + square opening bracket and output into a column named address.
The Palladian extension is available here as a free plugin for KNIME Desktop and provides a variety of different tools for web, text, and geo data mining and classification.
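To check the pattern outside KNIME, the same idea can be tried in plain Python. This is a sketch rather than the KNIME configuration itself, with two deliberate differences from the regex above: Python spells named groups as (?P<name>...), and the bracket part is made optional and the capture lazy, so rows without a bracket pass through unchanged and the cut happens at the first " [". The function name clean_address is only illustrative.

```python
import re

# Capture everything up to the first " [" (if any) into the group "address".
ADDRESS = re.compile(r"^(?P<address>.*?)(?:\s\[.*)?$")


def clean_address(raw: str) -> str:
    """Return the address without the bracketed suffix and trailing text."""
    match = ADDRESS.match(raw)
    return match.group("address") if match else raw


row = ("55 Shattuck Street Boston Massachusetts 02115 US "
       "[NAT: US RES: US] for all designated states")
print(clean_address(row))
```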
Please tell me how I can change the stress in some words in the Azure text-to-speech voice engine. I use Russian voices. I am not working through SSML.
When I send text for processing, it puts the stress on the wrong syllable or letter in some words.
I know that some voice engines use special characters such as + or ' in front of a stressed vowel. I have not found such an option here.
To specify the stress for individual words, you can use the SpeakSsmlAsync method and pass a lexicon URL, or you can specify it directly in the SSML by using the phoneme element. In both cases you can use IPA.
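As a rough illustration of the phoneme approach, the SSML string could be assembled like this before being passed to SpeakSsmlAsync. This is a sketch under several assumptions: the helper name ssml_with_stress is made up, the voice name is just one example of a Russian voice, and the IPA transcription shown is illustrative and should be checked against the phone set the service accepts for ru-RU.

```python
from xml.sax.saxutils import escape, quoteattr


def ssml_with_stress(before: str, word: str, ipa: str, after: str,
                     voice: str = "ru-RU-SvetlanaNeural") -> str:
    """Wrap one word in a phoneme element and return the full SSML document."""
    # quoteattr() adds the surrounding quotes and escapes the value.
    phoneme = (f'<phoneme alphabet="ipa" ph={quoteattr(ipa)}>'
               f'{escape(word)}</phoneme>')
    return (
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="ru-RU">'
        f'<voice name="{voice}">{escape(before)}{phoneme}{escape(after)}</voice>'
        '</speak>'
    )


# Illustrative only: stress on the first syllable of the word.
print(ssml_with_stress("Это старый ", "замок", "ˈzamək", "."))
```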
I have an application developed using Dialogflow and the actions-on-google framework.
When I provide a response that has numbers in it, the text-to-speech engine pronounces 0 (zero) as "O" (oh).
Is there any way to configure it so that 0 is never spoken as "O" (oh) and is always spoken as "zero"?
Please help
You can look at the SSML documentation to provide more specific nuances in the text-to-speech response.
If you want to say specific characters, you should be able to use an SSML say-as tag:
<speak>
<say-as interpret-as="characters">1234567890</say-as>
</speak>
Using the alias attribute of the sub element inside speak fixed my issue:
<speak>This is test<sub alias="one one zero seven">1107</sub> </speak>
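Writing the alias by hand does not scale if the numbers are dynamic. A small helper could generate the sub element instead. This is only a sketch: the function name spell_digits is hypothetical, and it assumes every digit should be read out individually.

```python
DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]


def spell_digits(number: str) -> str:
    """Return a <sub> element whose alias spells each digit as a word."""
    if not (number.isascii() and number.isdigit()):
        raise ValueError("expected a string of ASCII digits")
    alias = " ".join(DIGIT_WORDS[int(ch)] for ch in number)
    return f'<sub alias="{alias}">{number}</sub>'


print(f"<speak>This is test {spell_digits('1107')}</speak>")
```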
With Dialogflow (API.AI), I find that vessel names are not matched well when the input comes from Google Home.
It seems that the speech-to-text engine completely ignores them and does the speech-to-text conversion based only on a dictionary, so Dialogflow can't match the resulting text in the end.
Is it really like that, or is there some way to improve it?
Thanks and
Best regards
I'd recommend looking at Dialogflow's training feature to identify where the speech recognition of the Google Assistant may not have worked the way you expect. In those cases, you'll see how Google's speech recognition detected words you may not have accounted for. In cases where you'd like to match these unrecognized words to an entity value, simply add them as synonyms.
I capture audio from a speaker who says "I want to meet John Disilva". I pass this to the Google Speech API with the phrases { 'John Disilva', 'Ashish Mundra' }. However, the Google Speech API returns the full sentence, i.e. 'I want to meet John Disilva'.
Is there a way to get only my phrase as the return value, since I am only interested in extracting the name?
The reason is that I cannot control what someone is saying to my mic. They can say 'I would like to see John Disilva' or 'Do you know John Disilva', but I am sure that my user will always have that name somewhere in this sentence which I want to extract.
If the Google Speech API could give me the exact phrase with which it detected John Disilva in that sentence, then I could use that phrase for further processing in my code.
This isn't possible with the Google Speech API. Your best bet may be to just do post-processing to see which name is present. If you need something more accurate than that, look for an ASR system that supports "keyword spotting."
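The post-processing step could be as simple as searching the returned transcript for each known name. The sketch below assumes the transcript is already available as a string; the function name find_names is illustrative, and the match is a case-insensitive whole-word search, so misrecognized spellings will not be found.

```python
import re


def find_names(transcript: str, names: list[str]) -> list[str]:
    """Return the known names that occur in the transcript, in list order."""
    found = []
    for name in names:
        pattern = r"\b" + re.escape(name) + r"\b"
        if re.search(pattern, transcript, flags=re.IGNORECASE):
            found.append(name)
    return found


phrases = ["John Disilva", "Ashish Mundra"]
print(find_names("I would like to see John Disilva", phrases))
```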