how to use Azure speech phonetic set for Japanese - azure

this is for text to speech Azure service
how do I use the sapi phonetic alphabet set referenced here https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/speech-ssml-phonetic-sets#ja-jp
<phoneme alphabet="sapi" ph="">Akari</phoneme>
for example what do you put in ph for あかり
the doc gave some examples at the bottom e.g ゴ'ウセ but when i paste that i get
SSML parsing error: 0x8004507A - Unknown phoneme

Related

Issue with translating multiple languages in a single document

I am trying a language translation code where I am using the translate package where the provider is Microsoft. The input text has 2 languages english and Russian and my to language is english. The translated text does not change to English. Can anyone provide some inputs ?
from translate import Translator
to_lang = "en"
translator = Translator(provider='microsoft', to_lang=to_lang, secret_access_key=secret)
translator.translate("Elapsed Task Time – время в течение, которого выполнялась задача ")
'Elapsed Task Time – время в течение, которого выполнялась задача '
Here is what I tried to compare the issue
from googletrans import Translator
translator = Translator()
translator.translate(r.text, dest='en').text
"Elapsed Task Time - the time during which the task was performed"
Expected result :
"Elapsed Task Time - the time during which the task was performed"
For whatever reason, the (older) version of the Microsoft translator API used in this library won't autodetect mixed languages properly. It will work if your mixed languages include English and you specify from_lang for the other language. It always detects English. For example, if you specify from_lang='ru' and translate to 'it', the English portion will also be translated to Italian.
So, back to your scenario, this should work:
translator = Translator(provider='microsoft', to_lang=to_lang, from_lang='ru', secret_access_key=secret)
That said, I recommend you look at: https://github.com/MicrosoftTranslator/Text-Translation-API-V3-Python. In particular, Translate.py. This should work as expected and uses the latest API (well, more precisely, you get control over which API).

Azure Cognitives services speech to text recognition of numeric entities as text

I was wondering if its possible that the c++ sdk of Cognitives services Speech to text to return the numeric entities as text instead of numbers.
Current response 'I want to order 2 Cokes'
Expected response 'I want to order two Cokes'
Of course i can implement a feature to the translation. But i was wondering if its something that the service already provides. Particularly on spanish.
take a look at the sample repository at https://github.com/Azure-Samples/cognitive-services-speech-sdk
especially the file speech_recognition_samples.cpp , function SpeechRecognitionWithLanguageAndUsingDetailedOutputFormat
Enabling ‘detailed output’ will give you the result you want:
config->SetOutputFormat(OutputFormat::Detailed);
Then you need to look at the detailed output:
result->Properties.GetProperty(PropertyId::SpeechServiceResponse_JsonResult)
And that would create detailed output like this:
{"Duration":35500000,"NBest":[{"Confidence":0.7535948753356934,"Display":"I want to order 2 Cokes.","ITN":"I want to order 2 cokes","Lexical":"i want to order two cokes","MaskedITN":"I want to order 2 cokes"}],"Offset":17000000,"RecognitionStatus":"Success"}
The lexical output is probably what you want
Wolfgang

Extract a particular string given its heading - regex

Am trying to extract a particular string in the below data.
I have to extract the observations mentioned in the data. There are various ways observations has been written, for ex : OBSERVATION, Observation, observed, OBSERVED ....
Please let me know how to extract this.
Data :
Preconditions :1)One valid navigation map should be installed on the MGU.2)One route must be active.\nActions/steps : 1)Press PTT button.\n2)Give voice command "Navigate to Alibaba Restaurant" and observe system\' s behavior.\n\nExpected result/behaviour:\n1) Confirmation prompt of given spoken command should be played. \n2)User shall get the list of POI\'s which are spoken by user for e.g. \n\n\nObserved result/behavior:\n1) Confirmation prompt of given spoken command is not played.\n2) User not able to select POI via speech commands.\n3)User getting the list of POI destination but user not able to select those point via spoken commands for e.g.

Finding Related Topics using Google Knowledge Graph API

I'm currently working on a behavioral targeting application and I need a considerably large keyword database/tool/provider that enables applications to reach to the similar keywords via given keyword for my app. I've recently found that Freebase, which had been providing a similar service before Google acquired them and then integrated to their Knowledge Graph. I was wondering if it's possible to have a list of related topics/keywords for the given entity.
import json
import urllib
api_key = 'API_KEY_HERE'
query = 'Yoga'
service_url = 'https://kgsearch.googleapis.com/v1/entities:search'
params = {
'query': query,
'limit': 10,
'indent': True,
'key': api_key,
}
url = service_url + '?' + urllib.urlencode(params)
response = json.loads(urllib.urlopen(url).read())
for element in response['itemListElement']:
print element['result']['name'] + ' (' + str(element['resultScore']) + ')'
The script above returns the queries below, though I'd like to receive related topics to yoga, such as health, fitness, gym and so on, rather than the things that has the word "Yoga" in their name.
Yoga Sutras of Patanjali (71.245544)
Yōga, Tokyo (28.808222)
Sri Aurobindo (28.727333)
Yoga Vasistha (28.637642)
Yoga Hosers (28.253984)
Yoga Lin (27.524054)
Patanjali (27.061115)
Yoga Journal (26.635073)
Kripalu Center (26.074436)
Yōga Station (25.10318)
I'd really appreciate any suggestions, and I'm also open to using any other API if there is any that I could make use of. Cheers.
See your point:) So here's the script I use for that using Serpstat's API. Here's how it works:
Script collects the keywords from Serpstat's database
Then, collects search suggestions from Serpstat's database
Finally, collects search suggestions from Google's suggestions
Note that to make script work correctly, it's preferable to fill all input boxes. But not all of them are required.
Keyword — required keyword
Search Engine — a search engine for which the analysis will be carried out. For example, for the US Google, you need to set the g_us. The entire list of available search engines can be found here.
Limit the maximum number of phrases from the organic issue, which will participate in the analysis. You cannot set more than 1000 here.
Default keys — list of two-word keywords. You should give each of them some "weight" to receive some kind of result if something goes wrong.
Format: type, keyword, "weight". Every keyword should be written from a new line.
Types:
w — one word
p — two words
Examples:
"w; bottle; 50" — initial weight of word bottle is 50.
"p; plastic bottle; 30" — initial weight of phrase plastic bottle is 30.
"w; plastic bottle; 20" — incorrect. You cannot use a two-word phrase for the "w" type.
Bad words — comma-separated list of words you want the script to exclude from the results.
Token — here you need to enter your token for API access. It can be found on your profile page.
You can download the source code for script here

Creating a tag cloud from text:nsp output

I am using Text::NSP which creates n-gram from text files. Is it possible to create tag clouds from an output file of Text-NSP? I have used and liked IBM Word Cloud Generator which only gives a tag cloud output from the frequency of each word within a file. However, I am working with 2-grams and 3-grams. In short, I need a tag cloud generator which will accept an input file with words and their occurrence number. I am running on Debian.
Thanks all.
I started to use R snippets package which is what I was searching for.
The output of text::nsp should be changed with some bash scripts in order to obtain a dataframe acceptable by the R.

Resources