List of Azure speech dictation words per language - speech-to-text

When using speech-to-text in Azure with dictation mode on, it recognizes phrases like "question mark" and returns "?". We found other phrases like this and were looking for a complete list, but were not able to find one in the documentation (https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/index-speech-to-text).

You can find a list of all supported punctuation words here: https://support.microsoft.com/en-us/office/dictate-your-documents-in-word-3876e05f-3fcc-418f-b8ab-db7ce0d11d3c#Tab=Windows
Scroll down to the "What can I say?" section and select a language to see the phrases supported for it.

Related

Do I need to add updated phoneme sequences of words to the .dict file while adapting an acoustic model (AM) using CMUSphinx?

I am trying to adapt the en-us acoustic model with Indian English accent recordings. Since many words are pronounced differently in this accent, do I need to add the updated phoneme representation of those words? Currently I am following this tutorial: https://cmusphinx.github.io/wiki/tutorialadapt/#accumulating-observation-counts and it does not mention updating the .dict file.
PS: Should I add new words directly to the dictionary?
There is an Indian English model in the downloads; you should use it instead. It comes with an Indian English dictionary.

Text to phonemes converter

I'm searching for a tool that converts text to phonemes (like text-to-speech software does).
I could program one myself, but it would not be error-free and would take a lot of time.
So my question is:
Is there a simple tool for converting e.g.
"hello" to "HH AH0 L OW1"?
Maybe some command-line tool, so I can capture the stdout?
I'm looking for the phonemes in Arpabet style (see the "hello" example).
espeak does something like that, but its output is not in Arpabet style and the phonemes are
not separated by any delimiter.
If you had searched for Arpabet on Wikipedia, you would have found your answer. The CMU folks have prepared scripts that convert most English words to their respective Arpabet phonetic breakdown.
If you want the phone sequence for a couple of words, you can use their interface here. But if you want it for a big file, you might have to run their scripts on your own. They used to have a working page here, but it seems to be down now.
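For the big-file case, a dictionary lookup can also be scripted directly. The following is a minimal Python sketch, assuming a pronunciation dictionary in the plain-text layout used by the CMU Pronouncing Dictionary (one `WORD  PH1 PH2 ...` entry per line, comment lines starting with `;;;`). The names `SAMPLE_CMUDICT`, `load_cmudict`, and `text_to_phonemes` are hypothetical, and the embedded sample lines are only illustrative stand-ins for the real dictionary file.

```python
# Illustrative sample in CMUdict-style layout (assumption: the real
# dictionary file would be read from disk instead of this string).
SAMPLE_CMUDICT = """\
;;; comment lines start with three semicolons
HELLO  HH AH0 L OW1
HELLO(1)  HH EH0 L OW1
WORLD  W ER1 L D
"""


def load_cmudict(lines):
    """Parse CMUdict-style lines into {word: [phoneme, ...]}."""
    pronunciations = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(";;;"):
            continue  # skip blanks and comments
        word, *phones = line.split()
        if "(" in word:
            continue  # alternate pronunciation such as HELLO(1); keep the first only
        pronunciations.setdefault(word.lower(), phones)
    return pronunciations


def text_to_phonemes(text, pronunciations):
    """Return one space-separated Arpabet string per word in `text`."""
    result = []
    for token in text.lower().split():
        word = token.strip(".,!?;:\"'")  # drop surrounding punctuation
        phones = pronunciations.get(word)
        # Words missing from the dictionary are flagged rather than guessed.
        result.append(" ".join(phones) if phones else f"<unknown:{word}>")
    return result


if __name__ == "__main__":
    cmudict = load_cmudict(SAMPLE_CMUDICT.splitlines())
    for phones in text_to_phonemes("hello world", cmudict):
        print(phones)
```

Because the script prints one word per line with phonemes separated by spaces, its stdout is easy to capture from another program. Out-of-dictionary words would still need a separate grapheme-to-phoneme step, which this sketch does not attempt.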

Lucene phrase search

I have large text documents. Say, if I search for "computer m", then I want to get "computer monitor", "computer memory", and "computer market share". How can I get matched phrases only?
Should I index files using ShingleAnalyzerWrapper?
Should I use SpellChecker for this purpose?
How can I do this?
org.apache.lucene.search.highlight.Highlighter is used to extract the best-matching text from a found document, much like how Google highlights (or displays in bold) the matching text in your search results.
This blog entry might help you get started:
http://hrycan.com/2009/10/25/lucene-highlighter-howto/
You can use MultiPhraseQuery for that.
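To make the idea behind that suggestion concrete, here is a plain-Python sketch of the concept, not Lucene code: the fixed words of the query must match exactly, and the last position may match any indexed term that starts with the typed prefix. The function names `tokenize` and `phrase_prefix_matches` are hypothetical, and the tokenization is a simplifying assumption that a real analyzer would replace.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def phrase_prefix_matches(text, query):
    """Find phrases whose last word starts with the query's last token.

    Example: the query "computer m" matches "computer" followed
    immediately by any word beginning with "m".
    Assumes the query contains at least one token.
    """
    tokens = tokenize(text)
    *fixed, prefix = tokenize(query)
    # Expand the prefix to every known term that starts with it; this
    # set plays the role of the alternative terms at the last position.
    expansions = {term for term in set(tokens) if term.startswith(prefix)}
    n = len(fixed)
    found = []
    for start in range(len(tokens) - n):
        window = tokens[start:start + n]
        candidate = tokens[start + n]
        if window == fixed and candidate in expansions:
            phrase = " ".join(window + [candidate])
            if phrase not in found:  # keep first occurrence order, no duplicates
                found.append(phrase)
    return found


if __name__ == "__main__":
    sample = ("The computer monitor is new. Computer memory is cheap. "
              "The computer market share grew.")
    print(phrase_prefix_matches(sample, "computer m"))
```

Note that this sketch returns only as many words as the query has, so it yields "computer market" rather than "computer market share"; longer phrases would need extra handling, for example indexing word n-grams as the question suggests with ShingleAnalyzerWrapper.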

How can I automate text rewriting?

I will explain what I want to do.
Say I have a text like "go there Jack" and I want to automate rewriting it as "Jack went there".
Imagine it's a lengthy text of thousands of lines with a fixed format throughout, like "go there John", "go there Joe", "go there Smith", etc. (these are just made-up examples, but the real text is not much different).
So I want to ask: is there a tool or a programming language library to automate such a task?
NB: I have heard about text filters in Linux, but Google didn't help me.
Mmm, I'd say go with some kind of scripting language for something like that. Try Python. Also, depending on how complex these patterns get, you might want to look into regular expressions. So Python + regular expressions, IMHO.
If the text has the fixed form "go there <name>", then you can use a regular expression to match it with /go there (.*)/i and build the new string from the captured name + ' went there'.
It helps if you specify which language you are using so an example can be given.
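Since Python and regular expressions were both suggested, here is a minimal sketch of that approach, assuming every line to rewrite has exactly the form "go there <name>". The function names `rewrite_line` and `rewrite_text` are hypothetical; lines that do not match the pattern are left unchanged.

```python
import re

# Same idea as /go there (.*)/i, anchored so that only whole lines match.
PATTERN = re.compile(r"^go there (.+)$", re.IGNORECASE)


def rewrite_line(line):
    """Turn 'go there <name>' into '<name> went there'."""
    match = PATTERN.match(line.strip())
    if match is None:
        return line  # not in the expected format; leave untouched
    return f"{match.group(1)} went there"


def rewrite_text(text):
    """Apply rewrite_line to every line of a multi-line string."""
    return "\n".join(rewrite_line(line) for line in text.splitlines())


if __name__ == "__main__":
    print(rewrite_text("go there Jack\ngo there Joe\nsomething else"))
```

For a file with thousands of lines, the same function can be applied line by line while reading the file, so the whole text never has to be held in memory.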

Text indexer search tool which can filter by punctuation?

This is not a programming question per se, but a question about searching source code files, which helps me in programming.
I use a search tool, X1, which quickly tells me which source code files contain the keywords I am looking for. However, it doesn't work well for keywords that have punctuation attached to them. For example, if I search for "show()", X1 shows everything that has "show" in it, including far too many results for "MessageBox.Show(.....)", which I don't want to see.
Another example: I need to filter for ".parent" (notice the dot) and not show everything that has "parent" (no dot) in it.
Does anyone know a text search tool that can filter by keywords that include punctuation? I would really prefer a desktop app over a web-based tool like Google (I find it clunky).
I am looking for a tool that indexes words, not a general file searcher like Windows File Explorer.
If you want to search code files efficiently for keywords and punctuation,
consider the SD Source Code Search Engine. It indexes each source language according
to language-specific rules, so it knows exactly the identifiers, keywords,
strings, comments, and operators in that language and indexes the code according to
those elements. It handles a wide variety of languages (C, C++, Java, VB6, C#, COBOL),
all at once.
Your first query would be posed as:
I=show - I=MessageBox ... '('
(locate identifiers named "show" but eliminate those that are overlapped by
MessageBox leftparen).
Your second query would be posed simply as:
'.' I=parent
See http://www.semanticdesigns.com/Products/SearchEngine/index.html
This seems to be a job for tools like ctags and cscope.
Ctags is used to index declarations in source files (many languages are supported), and Cscope is used for in-depth analysis of C files.
In my opinion, these tools are better suited to per-project use. Moreover, you may need another tool to make use of these indexes; I use Vim myself for this purpose, but many text editors support ctags.
Try the tool from DTSearch.com.

Resources