I'm looking for an open source library to detect the spoken language used in an audio file, such as a wav file.
I tried CMU sphinx , but was not able to understand how to use it for language detection? Can someone please help?
If you are willing to learn another toolkit, you should consider Kaldi [1]. It is an open-source speech recognition toolkit with a speaker recognition system (which uses similar models as a language identification system) in the trunk and an experimental language Identification setup in the sandbox language_id. After checking out the repository, you can switch to the LID sandbox with svn switch ^/sandbox/language_id. The LID examples are in egs/lre07.
Whichever toolkit you use, I recommend an i-Vector based system instead of a phonotactic system. An i-Vector based system will be easier to setup as it doesn't require transcripts, and significantly faster, since it avoids decoding.
You can try CMU sphinx in all phone decode mode.
Train models for languages you wish to identify.
Pick language for which hypothesis score is best
Related
I'd like to create a TTS system for a native american language (wayuunaiki).
The language is written in latin (western) alphabet.
I also have information about the phonetics (the rules to convert each word into IPA symbols).
I'm planning to create a database of voice recordings from the native people. Then I want to somehow train that data, using the IPA equivalency information to generate a more accurate speech model.
I'm totally new to Natural Language Processing, so my question is.. which tools can I use to perform what I'm planning?
I've heard that HTK ans CMU Sphinx are quite good in speech recognition. No idea about speech generation. Also heard about Festival, but i read it only uses predefined most known languages: English, Spanish, and so.
Excuse my typing faults. I'm still learning English. Thanks in advance!
You can add new language in Festival, it's actually specifically designed to simplify new language creation. For more details read the festvox book:
http://festvox.org/bsv/
Another toolkit to consider is OpenMary, see their documentation too
https://github.com/marytts/marytts/wiki/New-Language-Support
It is more modern and might be easier for you.
In any case you will have to spend some time and write the code to describe your language. Usually it's about 300 lines of code. After that you can record single-speaker TTS database and run voice building process. The more you record the better the result would be.
Use Festival toolkit for text to speech (Tips : Use Linux operating system)
I want to use sphinx4 for general purpose voice recognition, e.g. you tell the application something and it prints what you've said. However when I walk through the examples it's all about recognising a very small amount of vocabulary. Is there any good tutorial to help configure it to recognise something more challenging e.g. a dialog between two people?
PS: I believe that sphinx4 already has the correct acoustic models and dictionaries, but the lm file is for specific applications, so I'd need a lm file, correct?
However when I walk through the examples it's all about recognising with very small amount of vocabulary. Is there any good tutorial to help config it to recognise something more challenging, e.g. a dialog between two people.
You do not need to configure sphinx4. You can just checkout the latest version from subversion and use the demo as is, for more information see the tutorial
http://cmusphinx.sourceforge.net/wiki/tutorialsphinx4
When I believe is sphinx4 already included the right acoustic models and dictionaries, but the lm file is for the specific applications, so I'm needing a lm file, am I correct?
Default lm file provided is good enough for generic speech, however, if you have specific domain it makes sense to create your domain-specific language model.
My final year project group is planning to build a real time application with neural network support and need to handle image processing efficiently, Any language suggestions would be very much helpful. Thanks.
Mathematica may offer some useful features. The last couple of releases have added quite a lot of image processing functionality. You can get a taste by looking at these blog entries:
How to Make a Webcam Intruder Alarm with Mathematica
The Battle of the Marlborough Maze at Blenheim Palace Continues
The Incredible Convenience of Mathematica Image Processing
Mathematica is an interpreted language, which would appear to present an obstacle to your real-time constraints. However, Mathematica has always integrated well will foreign code (notably C, Java and .NET) and the latest release adds considerable new capabilities with respect to C-code generation, dynamic-library loading and CUDA / OpenCL GPU programming.
Alas, Mathematica is not FOSS and is pretty expensive for commercial use. However, they give great student discounts (90%+, last time I checked) and some college/university departments have site licenses.
On the down side, the Mathematica language is quite unconventional and it takes time to get into the swing of things. IMO, the effort is worth it, but the learning curve might be too long if your project timelines are short.
Note: I am not affiliated with WRI in any way.
My suggestion is OpenCV and C++. OpenCV is also usable with Python, but I don't recommend it if you need to write fast code, Python can be really slow.
How about Python? There is PIL, which
adds image processing capabilities to your Python interpreter. This library supports many file formats, and provides powerful image processing and graphics capabilities.
An introductory article about NN with python and a feed forward NN library:
http://www.ibm.com/developerworks/library/l-neurnet/
http://pypi.python.org/pypi/ffnet/0.6
Matlab provides a lot of features for image processing. May be slightly slow, but I assume performance is not an issue.
ImageMagick is suppose to be real good, but I have no first-hand experience. Mathematica?
Can some one provide me with a list of leading binary research tools for Windows OS and windows applications? I found BinScope from microsoft itself but was wondering if there are any other better tools around?
Thanks,
Omer
If you only have access to the binary your access is limited. If you want to peer into the inner workings of this binary your best bet is a Decompiler like IDA Pro and a assembler level debugger like OllyDBG.
Tom Reps, a professor at the University of Wisconsin and founder of GrammaTech, gave an impressive talk on this at Stanford last summer. GrammaTech is working on binary analysis (http://www.grammatech.com/research/contracts/HSARPA/HSARPA-2005-MCSB/), but I don't know whether it's available in their static analysis product yet.
Disclaimer: One of their VP's bought me lunch and got me to try a demo of their source code analysis tool while I was at Palm (before the binary analysis talk), but I think the results are confidential.
BAP is a toolkit for performing binary analysis on x86 programs. It lifts binary code to an easily understandable and analyzable language similar to compiler intermediate languages. It's not a point and click solution (i.e., programming is required to use it effectively), but it can be useful for people who want to write new program analyses on binary code without redefining the semantics of x86.
I'm interested in creating a visual programming language which can aid non-programmers(like children) to write simple programs, much like Labview or Simulink allows engineers to connect functional blocks together without the knowledge of how they are internally built. Is this called programming by demonstration? What are example applications?
What would be an ideal platform which can allow me to do this(it can be a desktop or a web app)
Check out Google Blockly. Blockly allows a developer to create their own blocks, translations (generators) to virtually any programming language (or even JSON/XML) and includes a graphical interface to allow end users to create their own programs.
Brief summary:
Blockly was influenced by App Inventor, which itself was based off Scratch
App Inventor now uses Blockly (?!)
So does the BBC microbit
Blockly itself runs in a browser (typically) using javascript
Focused on (visual) language developers
language independent blocks and generators
includes a Block Factory - which allows visual programming to create new Blocks (?!) - I didn't find this useful myself...except for understanding
includes generators to map blocks to javascript/python
e.g. These blocks:
Generated this code:
See https://developers.google.com/blockly/about/showcase for more details
Best wishes - Andy
The adventure on which you are about to embark is the design and implementation of a visual programming language. I don't know of any good textbooks in this area, but there are an IEEE conference and refereed journal devoted to this field. Margaret Burnett of Oregon State University, who is a highly regarded authority, has assembled a bibliography on visual programming languages; I suggest you start there.
You might consider writing to Professor Burnett for advice. If you do, I hope you will report the results back here.
There is Scratch written by MIT which is much like what you are looking for.
http://scratch.mit.edu/
A restricted form of programming is dataflow (aka. flow-based) programming, where the application is built from components by connecting their ports. Depending on the platform and purpose, the components are simple (like a path selector) or complex (like an image transformator). There are several dataflow systems (just I've made two), some of them has no visual editor, some of them are just a part of a bigger system, and there're some which don't even mention the approach. (Did you think, that make, MS-Excel and Unix Shell pipes are some kind of this?)
All modern digital synths based on dataflow approach, there's an amazing visual example: http://www.youtube.com/watch?v=0h-RhyopUmc
AFAIK, there's no dataflow system for definitly educational purposes. For more information, you should check this site: http://flowbased.org/start
There is a new open source library out there: TUM.CMS.VPLControl. Get it here. This library may serve as a basis for your purposes.
There is Snap written by UC Berkeley. It is another option to understand VPL.
Pay attention on CoSpaces Edu. It is an online platform that enables the creation of virtual worlds and learning experiences whilst providing a more flexible approach to the learning curriculum.
There is visual coding named "CoBlocks".
Learners can animate and code their creations with "CoBlocks" before exploring and sharing them in mobile VR.
Also It is possible to use JavaScript or TypeScript.
If you want to go ahead with this, the platform that I suggest is the one used to implement Scratch (which already does what you want, IMHO), which is Squeak Smalltalk. The Squeak environment was designed with visual programming explicitly in mind. It's free, and Smalltalk syntax can learned in half an hour. Learning the gigantic class library may take just a little longer.
The blocks editor which was most support and development for microbit is microsoft makecode
Scratch is a horrible language to teach programming (i'm biased, but check out Pipes Visual Programming Language)
What you seem to want to do sounds a lot like Functional Block programming (as in functional block programming language IEC 61499 and other VPLs for mechatronics development). There is already a lot of research into VPLs so you might want to make sure that A) what your are trying to do has an audience and B) what you are trying to do can be done easily.
It sounds a bit negative in tone, but a good place to start to test the plausibility of your idea is by reading Davor Babic's short blog post at http://blog.davor.se/blog/2012/09/09/Visual-programming/
As far as what platform to use - you could use pretty much anything, just make sure it has good graphic libraries (You could use Java with Swing - if you like pain - or Python with TKinter) just depends what you are familiar with. Just keep in mind who you want to eventually launch the language to (if its iOS, then look at using Objective-C, etc.)