How to create a .dic file for the Turkish language in CMUSphinx - speech-to-text

I installed "sphinxbase" and "pocketsphinx" on Windows and ran the "PocketSphinxDemo" in Eclipse and on my phone.
Next I want to add Turkish language support to this application. Recognizing a few words or sentences is enough to begin with, so it should be fairly easy.
I could not find a ready-made Turkish model on VoxForge.
Is there another website where I can find one, or a tool with which I can easily create one?
I used lmtool, but the pronunciations in the generated dic file are English.
How can I create a dic file for the Turkish language?

You need a list of words first of all. After that you can use espeak rules to create a phonetic dictionary:
espeak -v tr -x
Türkçe
tYRktS'E
You only need to parse the output and put it in the dictionary in alpha-only format. You just need to create a map to a letter-only phoneset, not necessarily a map to ARPAbet. Open a text editor and create a map:
t t
y yy
r rr
k k
e ee
S' sh
So in the end you get entries like this:
türkçe t yy rr k t sh ee
That's it. There is no requirement to use ARPAbet. For more details, see the acoustic model training tutorial.
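As a rough illustration of the mapping step, the following Python sketch turns an espeak phoneme string into a dictionary line. PHONE_MAP and to_dict_entry are hypothetical names introduced here, the table only covers the symbols in the Türkçe example, and the sketch assumes that the apostrophe in the espeak output is a stress mark that can be dropped rather than part of a phone symbol.

```python
# Hypothetical map from espeak phoneme symbols to a letter-only phoneset.
# It only covers the symbols that occur in the example word.
PHONE_MAP = {"t": "t", "Y": "yy", "R": "rr", "k": "k", "S": "sh", "E": "ee"}

# Assumption: these characters are stress marks and carry no phone of their own.
STRESS_MARKS = "',"


def to_dict_entry(word, espeak_phones, phone_map=PHONE_MAP):
    """Build one dictionary line: the lowercased word followed by its phones."""
    # Try longer symbols first so multi-character symbols win over prefixes.
    symbols = sorted(phone_map, key=len, reverse=True)
    phones = []
    i = 0
    while i < len(espeak_phones):
        if espeak_phones[i] in STRESS_MARKS:
            i += 1
            continue
        for sym in symbols:
            if espeak_phones.startswith(sym, i):
                phones.append(phone_map[sym])
                i += len(sym)
                break
        else:
            # Fail loudly so that gaps in the map are noticed.
            raise KeyError("no mapping for symbol at: " + espeak_phones[i:])
    return word.lower() + " " + " ".join(phones)


print(to_dict_entry("Türkçe", "tYRktS'E"))
```

Raising an error on an unmapped symbol is a deliberate choice here: a silently skipped phone would produce a wrong pronunciation in the dictionary.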

Related

How to ask Cygwin to repeat a command with different parameters, drawn from a list, and then return the mean?

The Phonological CorpusTools documentation says:
The procedure for running command-line analysis scripts is essentially
the same for any analysis. First, open a Terminal window (on Mac OS X
or Linux) or a CygWin window (on Windows, can be downloaded at
https://www.cygwin.com/). Using the “cd” command, navigate to the
directory containing your corpus file. If the analysis you want to
perform requires any additional input files, then they must also be in
this directory. (Instead of running the script from the relevant file
directory, you may also run scripts from any working directory as long
as you specify the full path to any files.) You then type the analysis
command into the Terminal and press enter/return to run the analysis.
The first (positional) argument after the name of the analysis script
is always the name of the corpus file.
The command I need is
corpustools.symbolsim.edit_distance.edit_distance(word1, word2, sequence_type, max_distance=None)
The corpus file will be set up as the documentation requires:
The file should be set up in columns (e.g., imported from a
spreadsheet) and be delimited with some uniform character (tab, comma,
backslash, etc.). The names of most columns of information can be
anything you like, but the column representing common spelling of the
word should be called “spelling”; that with transcription should be
called “transcription”; and that with token frequency should be called
“frequency.”
I want to return values for a "transcription" sequence_type. So, from the above, I think my .txt file will look as follows:
spelling . transcription . frequency
PLEASE . P L IY Z . 1
HELP . HH EH L P . 1
ME . M IY . 1
Though getting all the data into .txt files of this sort will take a while.
Question
Is it possible, in Cygwin, to run the above command for every possible pair of words in the list as it appears in the .txt file, and then return the mean of these values? I think that means using the transcription values as the "word" arguments (e.g. "P L IY Z"), so the pairs would be P L IY Z and HH EH L P, P L IY Z and M IY, HH EH L P and M IY, and so on.
I have no experience with Cygwin or with coding.
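One possible approach is a short Python script run from the Cygwin terminal instead of a shell loop. The sketch below is a stand-in under stated assumptions: it does not call corpustools at all but computes a plain Levenshtein distance over space-separated segments, it assumes a tab-delimited file with a "transcription" column, and the function names are made up for illustration.

```python
import csv
import io
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance between two sequences of segments."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution or match
        prev = curr
    return prev[-1]


def mean_pairwise_distance(handle, delimiter="\t"):
    """Mean edit distance over every unordered pair of transcriptions."""
    rows = csv.DictReader(handle, delimiter=delimiter)
    words = [row["transcription"].split() for row in rows]
    distances = [edit_distance(a, b) for a, b in combinations(words, 2)]
    if not distances:
        raise ValueError("need at least two words to form a pair")
    return sum(distances) / len(distances)


# Assumed tab-delimited layout, using the three words from the question.
SAMPLE = (
    "spelling\ttranscription\tfrequency\n"
    "PLEASE\tP L IY Z\t1\n"
    "HELP\tHH EH L P\t1\n"
    "ME\tM IY\t1\n"
)

print(mean_pairwise_distance(io.StringIO(SAMPLE)))
```

For a real corpus, the io.StringIO(SAMPLE) argument would be replaced by an opened file, for example open("corpus.txt", newline=""), where corpus.txt is a hypothetical file name.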

How to make a text object/motion with beginning and ending patterns in Vim

I would like to define a text object, like iw, aB and the others listed in :help text-objects, that covers an area beginning with some {pattern1} and ending with some {pattern2}, with both patterns included. It is important that it can stretch over multiple lines (like aB but unlike a").
The examples I have in mind are selecting inline equations in LaTeX, that is, everything between one $ and the next $ (including the $'s), and selecting LaTeX environments, that is, everything between \begin{*} and the following \end{*}, where * stands for any string of characters (matched non-greedily, like \{-} in a Vim regex).
I have tried looking at this guide on the Vim Tips Wiki, but I do not know how to replace [z and ]z with something that searches backwards for one pattern and forwards for another, so that it works the way I want.
To take the inline equation as an example (let's say the text object is called ad), suppose the cursor is placed somewhere between the $'s in the following line:
it follows that $ \sum_{n=0}^\infty 2^{-n} $ is two
In normal mode, pressing vad should then visually select $ \sum_{n=0}^\infty 2^{-n} $, and pressing dad should delete it.
The Vim Tips Wiki page you mention lists two plugins (under "Related scripts") that make defining new text objects very easy:
textobj-user is very flexible and generic
CountJump plugin (by me) is specially written for text objects defined by start and end patterns
The following call defines an ad text object for text inside $...$:
call CountJump#TextObject#MakeWithCountSearch('', 'd', 'a', 'v', '\$', '\$')

Remove s p a ce s from a word in MS Word doc

A 200-page file was imported from PDF into a Word document. The text came out garbled and I am trying to clean it up using VBA macros.
The issue is that the text looks like this
CarrierCOM is a c a r rie r ’s c a r rie r into and ou t of Mexico.
We provide a fu ll line of services including co-loca tion, private
line, conversions, in te rconnections, c ro s s -b o rd e r services,
in ternet, video conferencing and specialized services as required.
I need help removing the spaces that appear randomly inside words, so that the output looks like this:
CarrierCOM is a carrier’s carrier into and out of Mexico. We provide a
full line of services including co-location, private line,
conversions, interconnections, cross-border services, internet, video
conferencing and specialized services as required.
Any help you could provide would be appreciated. Doesn't have to be VBA, could be any other programming language/technique/software.
Use Ctrl-h (search and replace). First, replace ". " (without quotes) with ".$%", which will mark all of your end-of-sentence spaces. Second, replace " " with "" (i.e., replace all spaces with nothing). Third, replace ".$%" with ". " to put back the end-of-sentence spaces. There you go; you are a programmer.
I forgot to say that for each replacement you have to choose Replace All. Also, start from the beginning of the document.
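Since removing every space also deletes the spaces between whole words, another option is to decide fragment by fragment what to join. The Python sketch below is a different technique from the search-and-replace steps above: it merges a run of adjacent fragments only when the joined result appears in a word list. VOCAB, normalize and rejoin are hypothetical names, the tiny word list stands in for a full dictionary file, and the longest-match rule is a heuristic that can misjoin ambiguous cases (for example "a part" versus "apart").

```python
import re

# Hypothetical word list; a real run would load a full dictionary file.
VOCAB = {"carrier", "out", "full", "line"}


def normalize(text):
    """Lowercase, drop a trailing possessive, and keep ASCII letters only."""
    text = re.sub(r"[’']s$", "", text.lower())
    return re.sub(r"[^a-z]", "", text)


def rejoin(line, vocab=VOCAB, max_parts=8):
    """Merge runs of fragments whose concatenation is a known word."""
    tokens = line.split()
    out = []
    i = 0
    while i < len(tokens):
        # Try the longest run of fragments first, down to a run of two.
        for n in range(min(max_parts, len(tokens) - i), 1, -1):
            candidate = "".join(tokens[i:i + n])
            if normalize(candidate) in vocab:
                out.append(candidate)
                i += n
                break
        else:
            # No run matched: keep the single token unchanged.
            out.append(tokens[i])
            i += 1
    return " ".join(out)


print(rejoin("CarrierCOM is a c a r rie r ’s c a r rie r into and ou t of Mexico."))
```

The cleaned text could then be pasted back into Word. Hyphenated pieces such as "c ro s s -b o rd e r" would need the vocabulary to contain the hyphenated word's parts and are not handled specially here.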

Bulk replacement of strings in single text file (Notepad++)

I am using Notepad++ to edit a log file whose text was captured incorrectly: the program didn't take into account the user's AZERTY keyboard layout. The result is a text file like the following (an example I made up):
Hi guysm this is Qqron<
I zonder zhen ze cqn go to the szi;;ing pool together
:y phone nu;ber is !%%)#!####(
Cqll ;e/
I need to make bulk replacement of characters as follows
a > q
q > a
[/0] > 0
! > 1
and a few others
Is it possible to create a table of characters to be replaced? I'm a bit of a beginner and I don't know whether Notepad++ can run scripts.
Notepad++ has a macro recorder, but macros can't be written in any documented embedded language. You could potentially record a macro that does 70 or so search and replace operations. See this explanation. There is some information on "hacking" the macro language here.
Clearly Notepad++ was not meant for this task. The Python solutions are okay, but Perl was meant originally for stuff exactly like this. Here's a one-liner. This is for Windows. In bash/Linux, replace the double quotes with single ones.
perl -n -e "tr/aqAQzwZW;:!@#$%^&*()m\/</qaQAwzWZmM1234567890:?./;print"
It will do what @kreativitea's solution does (I used his translation strings), reading from standard input and printing to standard output.
I don't know about Notepad++, but if you have Python installed on your machine, you can use this little script.
source = """Hi guysm this is Qqron<
I zonder zhen ze cqn go to the szi;;ing pool together
:y phone nu;ber is !%%)#!####(
Cqll ;e/"""
# Map each mistyped character to the intended one. The lookup is done
# per character, so a multi-character key such as '[/0]' never matches.
replace_dict = {'a': 'q', 'q': 'a', '[/0]': '0', '!': '1'}
target = ''
for char in source:
    # Fall back to the original character when it has no mapping.
    target += replace_dict.get(char, char)
print(target)
Just customize the replace_dict variable to suit your needs.
There are quite a few different kinds of AZERTY layouts, so this isn't a complete answer. However, it does pass your test case, and it is as fast as any single-character replacement can be done in Python (unless you need to take Unicode into account as well).
# Python 3: str.maketrans replaces the Python 2 string.maketrans function.
test = '''Hi guysm this is Qqron<
I zonder zhen ze cqn go to the szi;;ing pool together
:y phone nu;ber is !%%)#!####(
Cqll ;e/'''
# warning: not a full table.
table = str.maketrans('aqAQzwZW;:!@#$%^&*()m/<', 'qaQAwzWZmM1234567890:?.')
print(test.translate(table))
So, as long as you find out what version of AZERTY your user is using, you should be okay. Just make sure to properly fill out the translation table with the details of the AZERTY implementation.

How do I search through MATLAB command history?

I would like to search for a specific command I've previously used. Is it possible to do a free text search on MATLAB command history?
Yes. MATLAB stores your command history in a file called history.m in the "preferences folder," a directory containing preferences, history, and layout files. You can find the preferences folder using the prefdir command:
>> prefdir
ans =
/home/tobin/.matlab/R2010a
Then search the history.m file in that directory using the mechanism of your choice. For instance, using grep on Unix:
>> chdir(prefdir)
>> !grep plot history.m
plot(f, abs(tf))
doc biplot
!grep plot history.m
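Where grep is not available, the same file search could be sketched in Python. This is only an illustration: search_history is a hypothetical helper, the demo runs on a throwaway file rather than a real history.m, and it assumes the history is stored as plain text with one command per line.

```python
import re
import tempfile
from pathlib import Path


def search_history(history_file, pattern):
    """Return the lines of a history file that match a regular expression."""
    regex = re.compile(pattern)
    text = Path(history_file).read_text(encoding="utf-8", errors="replace")
    return [line for line in text.splitlines() if regex.search(line)]


# Demo on a temporary file; in practice, pass the path of the history.m
# file located in the folder reported by prefdir.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "history.m"
    demo.write_text("plot(f, abs(tf))\nx = 1;\ndoc biplot\n", encoding="utf-8")
    matches = search_history(demo, r"plot")

print(matches)
```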
You can also simply use the search function in the command history window if you just want to use the GUI.
If you want to accomplish this in a programmatic and platform-independent manner, you can first use MATLAB's Java internals to get the command history as a character array:
history = com.mathworks.mlservices.MLCommandHistoryServices.getSessionHistory;
historyText = char(history);
Then you can search through the character array however you like, using functions like STRFIND or REGEXP. You can also turn the character array into a cell array of strings (one line per cell) with the function CELLSTR, which can sometimes be easier to work with.

Resources