CMU Sphinx for Indian English

I have tried CMU Sphinx and it works fine with American English. Now, I want to use CMU Sphinx to recognize Indian-accented English. What exactly are the steps/changes I should make?

What you will have to do is adapt the acoustic model. Check the CMU Sphinx wiki page; it explains the procedure for both training and adapting acoustic models. The link that works for now: http://cmusphinx.sourceforge.net/wiki/
According to what the site says,
CMUSphinx provides ways for adaptation which is sufficient for most cases when more accuracy is required. Adaptation is known to work well when you are using different recording environments (close-distance or far microphone or telephone channel), or when a slightly different accent is present (UK English or even Indian English) or even another language.

You can also download pre-trained model files from here:
https://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/
The files inside these .tar.gz archives are structured a bit differently from what my version of the library expects, so I had to follow the steps in the following link to make them work:
https://github.com/Uberi/speech_recognition/issues/192
I'll show the steps I took, which are basically what the link above says; that link may die, so here they are:
On my computer (Ubuntu 18.04.4), the dictionaries are kept here:
~/.local/lib/python2.7/site-packages/speech_recognition/pocketsphinx-data
Inside the above folder I had a subfolder en-US, which contains the following files (F) and directories (D):
D acoustic-model
F language-model.lm.bin
F LICENSE.txt
F pronounciation-dictionary.dict
So I downloaded the .tar.gz for Indian English and made it look like the en-US directory, like this:
tar zxvf cmusphinx-en-in-8khz-5.2.tar.gz
mv cmusphinx-en-in-8khz-5.2 en-IN
cd en-IN
mv en-us.lm.bin language-model.lm.bin
mv en_in.dic pronounciation-dictionary.dict
mv en_in.cd_cont_5000 acoustic-model
cd ..
Then I moved it to the correct directory.
mv en-IN ~/.local/lib/python2.7/site-packages/speech_recognition/pocketsphinx-data
From this point, I was able to use en-IN.
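To double-check the result before pointing the library at it, the layout en-IN must mirror can be verified with a short stdlib-only Python sketch. The names below are the ones my speech_recognition install expects; note that the "pronounciation" misspelling is the library's own file name, not a typo here:

```python
import os

# Entries a pocketsphinx-data language folder is expected to contain.
# The "pronounciation" misspelling is the library's own file name.
EXPECTED = {
    "acoustic-model": "dir",
    "language-model.lm.bin": "file",
    "pronounciation-dictionary.dict": "file",
}

def check_model_dir(path):
    """Return the list of expected entries missing (or of the wrong type) under path."""
    problems = []
    for name, kind in EXPECTED.items():
        full = os.path.join(path, name)
        ok = os.path.isdir(full) if kind == "dir" else os.path.isfile(full)
        if not ok:
            problems.append(name)
    return problems
```

Running check_model_dir on the new en-IN folder should return an empty list; anything it reports is an entry you still need to rename or move.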

Related

How to generate API documentation from docstrings, for functional code

All I want is to generate API docs from the function docstrings in my source code, presumably through Sphinx's autodoc extension, to form my lean API documentation. My code follows the functional programming paradigm, not OOP, as demonstrated below.
I'd probably, as a second step, add one or more documentation pages for the project, hosting things like introductory comments, code examples (leveraging doctest I guess) and of course linking to the API documentation itself.
What might be a straightforward flow to accomplish documentation from docstrings here? Sphinx is a great popular tool, yet I find its getting started pages a bit dense.
What I've tried, from within my source directory:
$ mkdir documentation
$ sphinx-apidoc -f --ext-autodoc -o documentation .
No error messages, yet this doesn't find (or handle) the docstrings in my source files; it just creates an .rst file per source, with contents like the following:
tokenizer module
================
.. automodule:: tokenizer
   :members:
   :undoc-members:
   :show-inheritance:
Basically, my source files look like the following, without much module ceremony or object-oriented content (I like functional programming, even though it's Python this time around). I've truncated the sample source file, of course; it contains more functions than shown below.
tokenizer.py
from hltk.util import clean, safe_get, safe_same_char

"""
Basic tokenization for text

not supported:
+ forms of pseudo-ellipsis (...)

support for the above should be added only as part of an automata rewrite
"""

always_swallow_separators = u" \t\n\v\f\r\u200e"
always_separators = ",!?()[]{}:;"

def is_one_of(char, chars):
    '''
    Returns whether the input `char` is any of the characters of the string `chars`
    '''
    return chars.count(char)
Or would you recommend a different tool and flow for this use case?
Many thanks!
If you find Sphinx too cumbersome and particular to use for simple projects, try pdoc:
$ pdoc --html tokenizer.py
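If you do stay with Sphinx, note that sphinx-apidoc only generates the .rst stubs; the docstrings are pulled in later, when sphinx-build imports your modules. That import step is the usual failure point, so conf.py has to put your source directory on sys.path. A minimal conf.py sketch (the relative path and project name are assumptions; adjust them to your layout):

```python
# conf.py — minimal autodoc setup (sketch; the path and names are assumptions)
import os
import sys

# Make `tokenizer` and friends importable when sphinx-build runs.
sys.path.insert(0, os.path.abspath('..'))

project = 'tokenizer'
extensions = ['sphinx.ext.autodoc']
```

With this in place, running sphinx-build on the directory containing conf.py should expand the automodule directives into the actual docstrings.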

How to find which Yocto Project recipe populates a particular file on an image root filesystem

I work with the Yocto Project quite a bit, and a common challenge is determining why (or from what recipe) a file has been included on the rootfs. This is something that can hopefully be derived from the build system's environment, logs and metadata. Ideally, a set of commands would allow linking a file back to its source (i.e. a recipe).
My usual strategy is to search the metadata (e.g. grep -R filename ../layers/*) and to search the internet for said filenames to find clues about possible responsible recipes. However, this is not always very effective. In many cases, filenames are not explicitly stated within a recipe. Additionally, there are many cases where a filename is provided by multiple recipes, which leads to additional work to find which recipe ultimately supplied it. There are of course many other clues available. Regardless, this investigation is often quite laborious, when it seems the build system should have enough information to make resolving the answer simple.
This is the exact use case for the oe-pkgdata-util script and its subcommand find-path. The script is part of openembedded-core.
See this example (executed in OE build environment, i.e. bitbake works):
tom@pc:~/oe/build> oe-pkgdata-util find-path /lib/ld-2.24.so
glibc: /lib/ld-2.24.so
You can clearly see that this library belongs to glibc recipe.
oe-pkgdata-util has more useful subcommands for inspecting packages and recipes; it is worth checking the --help output.
If you prefer a graphical presentation, the Toaster web UI will also show you this, plus dependency information.
The candidate files deployed by each recipe are placed in that recipe's $WORKDIR/image directory.
So you can cd to
$ cd ${TMPDIR}/work/${MULTIMACH_TARGET_SYS}
and perform a
$ find . -path '*/image/*/fileYouAreLookingFor'
From the result you should be able to infer the ${PN} of the recipe which deploys that file.
For example:
$ find . -path '*/image/*/mc'
./bash-completion/2.4-r0/image/usr/share/bash-completion/completions/mc
./mc/4.8.18-r0/image/usr/share/mc
./mc/4.8.18-r0/image/usr/bin/mc
./mc/4.8.18-r0/image/usr/libexec/mc
./mc/4.8.18-r0/image/etc/mc
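The same walk can also be scripted; here is a stdlib-only Python sketch of the find command above, assuming the standard <PN>/<PV-PR>/image/... layout under the work directory:

```python
import fnmatch
import os

def find_deployers(work_root, pattern):
    """Walk ${TMPDIR}/work/${MULTIMACH_TARGET_SYS} and yield (recipe, path)
    for every file under a recipe's image/ tree whose name matches `pattern`."""
    for root, _dirs, files in os.walk(work_root):
        for name in files:
            rel = os.path.relpath(os.path.join(root, name), work_root)
            parts = rel.split(os.sep)
            # Expected layout: <PN>/<PV-PR>/image/<installed path...>
            if len(parts) > 3 and parts[2] == "image" and fnmatch.fnmatch(name, pattern):
                yield parts[0], rel
```

As with the find command, files staged by several recipes (like the mc completions example above) will show up once per recipe, so you still have to judge which one actually ends up on the rootfs.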

Use German dictionary and language model with Sphinx4

I can use the en-us things that come with Sphinx4, no problem:
cfg.setAcousticModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us")
cfg.setDictionaryPath("resource:/edu/cmu/sphinx/models/en-us/cmudict-en-us.dict")
cfg.setLanguageModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us.lm.bin")
I can use this to transcribe an English sound file recording.
Now I want to use this with German recordings. On the website I find a link to Acoustic and Language Models. There is an archive 'German Voxforge' there, and in it I find the corresponding files for the acoustic model path. But it does not contain a dictionary or language model as far as I can see.
How do I get the dictionary and language model path for German in Sphinx4?
You create them yourself. You can create a language model from subtitles or Wikipedia dumps. The documentation is here.
The latest German models are actually not on the CMUSphinx page; they are at github/gooofy. In this gooofy project you can find dictionary documentation, models and related materials.
I tried the German model with pocketsphinx and got errors because the *.lm.bin language model files were reported as invalid.
I switched to the *.lm.gz file and it works fine.
The proper configuration list is:
fst = voxforge-de.fst
hmm folder = model_parameters/voxforge.cd_cont_6000
dictionary = cmusphinx-voxforge-de.dic
language model = cmusphinx-voxforge-de.lm.gz
To get the "hmm" path you should unpack the archive:
cmusphinx-de-voxforge-5.2.tar.gz
I think it should be the same for Sphinx4, so please give it a try.
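For the pocketsphinx side, the configuration list above can be expressed through the old pocketsphinx-python bindings. This is a sketch only: it assumes those bindings are installed and that the archive has been unpacked in the current directory, and it uses exactly the file names from the list above:

```python
# Sketch: assumes the old-style pocketsphinx Python bindings are installed
# and cmusphinx-de-voxforge-5.2.tar.gz has been unpacked in the current dir.
from pocketsphinx.pocketsphinx import Decoder

config = Decoder.default_config()
config.set_string('-hmm', 'model_parameters/voxforge.cd_cont_6000')  # acoustic model
config.set_string('-dict', 'cmusphinx-voxforge-de.dic')             # dictionary
config.set_string('-lm', 'cmusphinx-voxforge-de.lm.gz')             # language model
decoder = Decoder(config)
```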

CMake Help find_path find_library - I just don't understand

I am trying to learn CMake. I have the Mastering CMake book and I'm trying to go through my first "easy" tutorial. Using CMake: Hello World Example
I made it through the first part alright, but when I tried to add the sub folders for the "Building a library" part of the tutorial I'm just not getting it. I followed the instructions all the way to the very end.
**We've seen an example of how to build a program. Now let's make a library as well. The library will be called "namer" and it will have a single function "getWorld" that returns the name of the nearest planet. We choose to put this library in a subdirectory called "namer" - it doesn't really matter where we put it, this is just an example.
I made it a subfolder in my HelloWorld project. Should I be making this a separate project?
**One way we can help CMake find the Namer package (which will be our namer library) is by writing a helper script called FindNamer.cmake. This is just another file written in the CMake language that pokes around in all the places our library might be hiding. Here is an example (put this in "hello/FindNamer.cmake"):
This is my FindNamer.cmake file:
find_path(Namer_INCLUDE_DIRS world.h /usr/include "$ENV{NAMER_ROOT}")
find_library(Namer_LIBRARIES namer /usr/lib "$ENV{NAMER_ROOT}")

set(Namer_FOUND TRUE)
if (NOT Namer_INCLUDE_DIRS)
  set(Namer_FOUND FALSE)
endif (NOT Namer_INCLUDE_DIRS)
if (NOT Namer_LIBRARIES)
  set(Namer_FOUND FALSE)
endif (NOT Namer_LIBRARIES)
**The important parts here are the "find_path" and "find_library" commands, which look for the header file world.h and the namer library.
I followed the next instructions and at the very end the tutorial includes this:
**If we try again, configuration will still fail since the search path we gave for "find_path" and "find_library" doesn't actually include the needed files. We could copy them, or have added a hard-coded directory to find_path and find_library pointing to where the files are on our hard drive - but better, in the CMake GUI on windows or by running "ccmake ." on Linux, we can just fill in the directories there.
At this point I am completely confused (Newbie!!!!). I don't have a NamerConfig.cmake or namer-config.cmake file and I don't know what the find_path and find_library is supposed to be pointing to.
Thank you in advance for your help,
Severely Confused :-(
I said I was a newbie. I guess I'm a little tired too! Yes, these must be in two separate projects.

example of using external libraries or packages in Common Lisp

In Common Lisp, quicklisp is a popular library management tool. I'm going to use that tool and I'm going to try and use CL-WHO. I use the SBCL 1.0.57 implementation. I'm going to answer my own question below.
As a beginner, it's not clear how ASDF and quicklisp actually work together. And so it's not clear how to actually use packages or libraries that you've downloaded through quicklisp in an external source file. The quicklisp FAQ, at least at this moment, does not help. In python, it's incredibly simple: you can just put 'import somemodule' and life is great. Is there an equivalent for CL+quicklisp?
If you search, you find many results. Here are some of the most relevant ones I've found:
Lisp importing/loading file
How to use packages installed by quicklisp?
When I was reading through these originally, at least one question came to mind: do I actually have to care about ASDF if I'm using quicklisp? Quicklisp seems to be a higher level management tool. Other people suggest using quickproject. But is that really necessary?
The analogy to Python's imports is the system definition... well, it's a very loose analogy, but it's the way to go. You declare dependencies in the system definition, and then in your source code you expect them to be there, so that if you later refer to bits of the foreign code, you just do it.
E.g. in the system definition (usually a my-program.asd file) you might have:
(defsystem :my-program
  :version "0.0.1"
  :serial t
  :description "My program"
  :components ((:file "some-source-file"))
  ;; `some-external-package' here is the "import", i.e. here you
  ;; declare that you will be using code from this package.
  ;; ASDF will generally "know" how to get the code of that package
  ;; from here on. But if it doesn't, there are ways to "help it",
  ;; similar to how in Python there's a procedure to prepare your local
  ;; files to be used by easy_install.
  :depends-on (:some-external-package))
Later on in your code you just assume that the some-external-package is available to your program, e.g.:
(some-external-package:exported-symbol)
should just work. ("Your code" being the some-source-file.lisp you specified in :components.)
This is the ASDF documentation on how to define systems
After you have this file in a place where ASDF can find it*, and assuming you have ASDF installed (available to your Lisp; SBCL comes bundled with it), you'd load the system using (asdf:load-system :my-program). Explained here.
* - A quick way to test it would be to do
(push "/path/to/your/system/definition/" asdf:*central-registry*)
Download cl-who through the instructions on the quicklisp page and run this:
#!/usr/bin/sbcl --script
(load "~/quicklisp/setup.lisp")
(ql:quickload "asdf")
(asdf:load-system 'cl-who)
(with-open-file (*standard-output* "out.html" :direction :output)
  (cl-who:with-html-output (*standard-output* nil :indent t)
    (:html
     (:head
      (:title "Test page"))
     (:body
      (:p "CL-WHO is really easy to use")))))
To a beginner, or someone who's really lazy, there's no reason why you should have to write three lines at the top instead of just one (like in Python).
