PocketSphinx output nothing - cmusphinx

I want to build a Chinese speech recognition application. I built a language model (using CMUCLMTK) and an acoustic model based on these tutorials:
http://cmusphinx.sourceforge.net/wiki/tutoriallm
http://cmusphinx.sourceforge.net/wiki/tutorialam
But I get no output when running:
pocketsphinx_continuous -hmm ... -lm ... -dict ...
The output looks like this:
READY....
Listening...
Recoding is stopped, start...
Stoped listening...
...(lots of INFO)
000000000:(nothing here!)
READY....
I've checked my wav files' format: it's 16-bit, 16 kHz, mono.
I also ran the ./scripts_pl/decode/slave.pl script, and the result is a 40% error rate (my model and training set are very small).
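For anyone wanting to double-check the same thing, Python's standard wave module can report the parameters PocketSphinx expects. This is just an illustrative sketch; the file name and the one-second synthetic silence are made up for the demo:

```python
import wave

def check_wav(path):
    """Report the parameters PocketSphinx cares about: channels, bit depth, rate."""
    with wave.open(path, "rb") as w:
        return (w.getnchannels(), w.getsampwidth() * 8, w.getframerate())

# Demo with a synthetic file: one second of silence in the expected format.
with wave.open("check_me.wav", "wb") as w:
    w.setnchannels(1)        # mono
    w.setsampwidth(2)        # 16-bit
    w.setframerate(16000)    # 16 kHz
    w.writeframes(b"\x00\x00" * 16000)

print(check_wav("check_me.wav"))  # (1, 16, 16000)
```

A file that prints anything other than (1, 16, 16000) would need to be resampled before decoding.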
My devices are working well, because I've tested PocketSphinx with an English model and it performs very well.
What else could lead to this strange result?

What else could lead to this strange result?
It's impossible to answer your question because you didn't provide the data files you are using. The troubleshooting section of the tutorial says:
If you want to ask for help about training, try to provide the
training folder or at least logdir folder. We would be glad to help you.

Related

Sound detection of cutting wood

I'm really new to machine learning. I have a project to identify a given sound (e.g. cutting wood). In the audio clip there will be several sounds, and what I need to do is recognise that particular sound among them. I've read some articles about machine learning, but I still lack the knowledge of where to start this project, and I'm also running out of time.
Any help will be really appreciated. Can anyone please tell me how to do this?
Can I directly perform template matching (with some algorithm) on a sound?
It's a long journey ahead of you, and Stack Overflow isn't a good place for asking such a generic question. Consult the help section for more.
To get you started, here are some web sites:
Awesome Bioacoustic
Comparative Audio Analysis With Wavenet, MFCCs, UMAP, t-SNE and PCA
Here are two small repos of mine related to audio classification:
Gender classification from audio
Kiwi / not-a-kiwi bird calls detector
They might give you an idea where to start your project. Check the libraries I am using - likely they will be of help to you.
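To make the template-matching part of the question concrete: a very naive baseline is to compute a spectral "fingerprint" of a reference clip and compare it against other audio with cosine similarity. This is only a sketch with synthetic signals and made-up function names; a real project would use proper features such as MFCCs from a library like librosa:

```python
import numpy as np

def fingerprint(signal, n_fft=1024):
    """Average magnitude spectrum over fixed-size frames -- a crude 'template'."""
    frames = [signal[i:i + n_fft] for i in range(0, len(signal) - n_fft, n_fft)]
    spectra = [np.abs(np.fft.rfft(f)) for f in frames]
    return np.mean(spectra, axis=0)

def similarity(a, b):
    """Cosine similarity between two fingerprints (1.0 = identical shape)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

sr = 8000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 880 * t)
noise = np.random.default_rng(0).normal(size=sr)

template = fingerprint(tone)
print(similarity(template, fingerprint(tone)))   # 1.0 (identical clip)
print(similarity(template, fingerprint(noise)))  # much lower
```

This kind of direct matching breaks down quickly with real recordings (background sounds, timing shifts), which is why the classifiers in the repositories above learn features instead.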

Tutorial tensorflow audio pitch analysis

I'm a beginner with TensorFlow and Python, and I'm trying to build an app that automatically detects key moments in a football (soccer) match (yellow/red cards, goals, etc.).
I'm starting to understand how to do video analysis by training the program on a dataset I built myself, downloading images from the web and tagging them. To obtain better results, I was wondering if someone had suggestions on tutorials to follow so that I can also train my app on audio files, making the program able to detect a pitch variation in the audio of the video and combine both video and audio analysis for better results.
Thank you in advance
Since you are new to Python and to TensorFlow, I recommend you focus on just audio for now, especially since it's a strong indicator of events of importance in a football match (red/yellow cards, nasty fouls, goals, strong chances, good plays, etc.).
Very simply, without using much ML at all, you can use the average volume of a time period to infer significance. If you want to get a little more sophisticated, you can consider speech-to-text libraries to look for keywords in commentator speech.
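The average-volume idea above can be sketched without any ML at all, assuming the match audio is already decoded to a sample array. The window length and threshold here are illustrative, and the "loud burst" is synthetic:

```python
import numpy as np

def loud_windows(samples, sr, window_s=5.0, threshold=3.0):
    """Return start times (s) of windows whose RMS exceeds threshold x median RMS."""
    win = int(sr * window_s)
    rms = np.array([
        np.sqrt(np.mean(samples[i:i + win] ** 2))
        for i in range(0, len(samples) - win, win)
    ])
    median = np.median(rms)
    return [i * window_s for i, r in enumerate(rms) if r > threshold * median]

# Synthetic demo: 60 s of quiet crowd noise with a loud burst at t = 30..35 s.
sr = 1000
rng = np.random.default_rng(1)
audio = rng.normal(0, 0.05, 60 * sr)
audio[30 * sr:35 * sr] += rng.normal(0, 1.0, 5 * sr)
print(loud_windows(audio, sr))  # [30.0]
```

On real commentary audio you would tune the window length to how long a cheer lasts and the threshold to the crowd's baseline level.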
Using video to try to determine when something important is happening is much, much more challenging.
This page can help you get started with audio signal processing in Python.
https://bastibe.de/2012-11-02-real-time-signal-processing-in-python.html

Recording microphone audio with Haskell

I am working on some voice recognition software as my first Haskell project, and I am looking for something that could work cross-platform. My plan is as follows:
Record audio as a .wav file
Use https://hackage.haskell.org/package/flac to convert the .wav file to a FLAC file.
Send a recognition request to Google's Speech API (as shown here: https://cloud.google.com/speech/docs/quickstart).
Receive response and interpret it.
However, I am currently somewhat stuck on step 1 as it seems rather difficult to record microphone input in Haskell.
Since this is a learning experience, I wanted to find something simple that I could understand quite easily. After some googling I found this: https://codereview.stackexchange.com/questions/169577/sound-recorder-based-on-sdl2, which uses the Haskell SDL2 bindings and Codec.Audio.Wave. I tried copying the code to see if it would work, but alas it did not. I did not get any errors running the program, but when I tried playing the wave file, nothing happened; it seemed completely empty. Nothing appears to have been recorded, as the file size is only 44 bytes.
I figured the issue was the SDL open-device spec not being able to find a suitable recording device, so I first checked the frequency to see if my microphone couldn't handle 48000 Hz; since that didn't help (it supports up to 96000 Hz), I tried messing around with the device channels, but that didn't help either.
The problem is not with my microphone, since I tried recording some audio with arecord (on Arch Linux), which worked just fine. As I said, this is my first Haskell project and I am a bit of a newbie, so although I do have experience in other languages, it is entirely possible that I have done something really silly.
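As a side note on the 44 bytes: that is exactly the size of a canonical WAV header with zero sample frames, which confirms that the callback captured nothing at all. A quick Python sketch (independent of the Haskell code in question) demonstrates this:

```python
import wave

# Write a WAV file with no audio data at all, like the one the recorder produced.
with wave.open("empty.wav", "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(48000)
    w.writeframes(b"")  # zero frames

with open("empty.wav", "rb") as f:
    header = f.read()
print(len(header))  # 44 -- RIFF (12) + fmt chunk (24) + data chunk header (8)

with wave.open("empty.wav", "rb") as w:
    nframes = w.getnframes()
print(nframes)  # 0
```

So the file is being created and finalized correctly; the audio callback is simply never delivering samples, which points at device selection rather than the WAV-writing code.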
I am not opposed to any solutions not involving SDL2, but as I am looking for something (hopefully) simple and cross-platform, this seemed like a good fit.

Can FFmpeg or any other project detect that an audio file contains only noise?

I have a batch of audio files recording people's voices, but some of these files contain only noise or microphone bursts. I want to detect such files and skip over them while my program processes the batch.
I'm not sure whether FFmpeg can do this. If yes, could you provide a link to that method? If not, do you know of other software that can? Or do you have any solution or suggestion for this problem?
Thank you.
I would approach this by looking at peak values and duration. SoX is a command-line program that supports shell scripting, which could batch-analyze this. There is a large user base and forum as well.
Here is a link to a forum topic discussing its use for batch-discovering peak values and writing the information to a .csv file.
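If shelling out to SoX is inconvenient, the same peak-level check can be approximated in plain Python with the standard library. The threshold below is purely illustrative (real speech levels vary), and FFmpeg's volumedetect and silencedetect filters are another option:

```python
import array
import wave

def wav_stats(path):
    """Return (peak, rms) of a 16-bit mono WAV, normalized to [0, 1]."""
    with wave.open(path, "rb") as w:
        samples = array.array("h", w.readframes(w.getnframes()))
    peak = max(abs(s) for s in samples) / 32768.0
    rms = (sum(s * s for s in samples) / len(samples)) ** 0.5 / 32768.0
    return peak, rms

def looks_like_noise_only(path, speech_peak=0.1):
    """Illustrative heuristic: real speech should exceed this peak threshold."""
    peak, _ = wav_stats(path)
    return peak < speech_peak

# Demo: a file containing only a very quiet constant signal gets flagged.
with wave.open("quiet.wav", "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(16000)
    w.writeframes(array.array("h", [100] * 16000).tobytes())
print(looks_like_noise_only("quiet.wav"))  # True
```

A peak check alone will not catch loud microphone bursts; combining it with the duration of above-threshold audio, as the answer suggests, works better.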

getting audio frequency from amr file

I am new to the J2ME development world.
I just want to know how to get the audio frequency from an audio recording application which stores data in a .amr file.
Please help me. I have tried a lot, but I am stuck.
Any idea regarding this will be appreciated.
Thanks in advance.
I'm going to add here what I have found on other sites that may be useful to you and me (as a newbie).
http://www.developer.nokia.com/Community/Discussion/showthread.php?154169-Getting-Recorded-Audio-Frequency-in-J2ME
If you want the frequency of a sound in Hz, it is actually not a single value but a series of values as a function of time.
You will have to calculate the Fourier transform of the sound samples, which will give you the frequency content.
Read on Wikipedia about how to calculate the Fourier transform and a frequency graph.
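The Fourier-transform step described above can be sketched in Python with numpy. Note that the AMR data would first have to be decoded to raw PCM samples; the 440 Hz tone here is a synthetic stand-in:

```python
import numpy as np

def dominant_frequency(samples, sr):
    """Return the frequency (Hz) of the largest-magnitude FFT bin."""
    spectrum = np.abs(np.fft.rfft(samples))
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sr)
    return freqs[np.argmax(spectrum)]

sr = 8000
t = np.arange(sr) / sr                  # one second of audio
tone = np.sin(2 * np.pi * 440.0 * t)    # 440 Hz test tone
print(dominant_frequency(tone, sr))     # ~440.0 Hz
```

For a tuner you would run this over short overlapping windows, giving the frequency-versus-time series the quote describes.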
http://www.developer.nokia.com/Community/Discussion/showthread.php?95262-Frequency-Analysis-in-J2ME-MMAPI
This forum thread says something about the FFT (fast Fourier transform) and analysing recorded AMR sound rather than processing a live stream, and it provides three links about the FFT right underneath that line; have a look at them.
Look at the site mobile-tuner.com. (I'm new too; in fact, I know nothing about Java.)
The site says that the phones with a tuner function are S60 phones. I was trying to write a guitar tuner program, but since my phone is a Nokia 5310 XpressMusic, which is S40, I gave up.
So good luck to you.
Note: javax.microedition.media.control.RecordControl
I don't know too much, but I have a hunch that this RecordControl class is related to the audio frequency function in J2ME, and that the frequency analysis part belongs under "sound processing".