Is there a command-line utility in PocketSphinx or CMU Sphinx that converts a .wav file to text?
Running pocketsphinx_continuous -hmm -lm -dict works, but it decodes live microphone input, so I would have to keep speaking the same sentence again and again.
Starting with version 0.8, pocketsphinx_continuous has an -infile option that you can use to decode a file. The file must be in a specific format: a 16 kHz, 16-bit, mono WAV file.
pocketsphinx_continuous -infile file.wav
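Because the input must be 16 kHz, 16-bit mono WAV, it can help to check a file before decoding it. The sketch below uses Python's standard wave module; the function names are my own (hypothetical), and it only handles PCM WAV, since the wave module raises wave.Error for other encodings.

```python
import io
import sys
import wave


def check_pocketsphinx_wav(source):
    """Return a list of problems; an empty list means the file is
    16 kHz, 16-bit, mono PCM. `source` is a path or a binary file object."""
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        bits = 8 * wav.getsampwidth()
        channels = wav.getnchannels()
    problems = []
    if rate != 16000:
        problems.append(f"sample rate is {rate} Hz, expected 16000")
    if bits != 16:
        problems.append(f"sample width is {bits} bits, expected 16")
    if channels != 1:
        problems.append(f"{channels} channels, expected 1 (mono)")
    return problems


def make_silent_wav(rate, sampwidth, channels, nframes=160):
    """Build a short silent WAV in memory, handy for trying the checker."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(sampwidth)
        w.setframerate(rate)
        w.writeframes(b"\x00" * nframes * sampwidth * channels)
    buf.seek(0)  # rewind so the buffer can be read back
    return buf


if __name__ == "__main__":
    # Usage: python3 check_wav.py file1.wav [file2.wav ...]
    for path in sys.argv[1:]:
        issues = check_pocketsphinx_wav(path)
        print(path, "OK" if not issues else "; ".join(issues))
```

If the check fails, sox can convert the file, for example sox input.wav -r 16000 -b 16 -c 1 output.wav (file names are placeholders).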
I am using pocketsphinx on Ubuntu to convert audio to text. The result contains the text, but in addition to the text generated from the audio, I also want the time (in minutes and seconds) at which each word or phrase is pronounced. I am using this command:
pocketsphinx_continuous -infile file.wav 2> pocketsphinx.log > result.txt
Add the -time yes option, which prints the start and end time of each word (in seconds):
pocketsphinx_continuous -time yes -infile file.wav 2> pocketsphinx.log > result.txt
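To get minutes and seconds, the timing lines can be post-processed. This sketch assumes each timing line has the form word start end confidence, with times in seconds (as far as I know, this is how recent pocketsphinx_continuous builds print them; check your own result.txt). The sample text is made up for illustration, and the helper names are hypothetical.

```python
def parse_word_times(text, skip_fillers=True):
    """Return (word, start_s, end_s) tuples from `-time yes` output.

    Assumes timing lines look like 'word start end confidence'; any other
    line (for example the plain hypothesis) is skipped.
    """
    results = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 4:
            continue
        word = parts[0]
        try:
            start_s, end_s, _conf = (float(p) for p in parts[1:])
        except ValueError:
            continue
        # Fillers such as <s>, </s>, <sil> and [NOISE] are not spoken words.
        if skip_fillers and word[:1] in ("<", "["):
            continue
        results.append((word, start_s, end_s))
    return results


def format_mmss(seconds):
    """Format seconds as M:SS.mmm, using integer milliseconds to avoid drift."""
    total_ms = round(seconds * 1000)
    minutes, ms = divmod(total_ms, 60_000)
    return f"{minutes}:{ms / 1000:06.3f}"


# Hypothetical sample output, for illustration only (not from a real run).
SAMPLE = """\
<s> 0.000 0.450 0.998
hello 0.460 0.890 0.912
world 0.900 1.350 0.874
</s> 1.360 1.500 1.000
"""

for word, start, end in parse_word_times(SAMPLE):
    print(f"{format_mmss(start)}-{format_mmss(end)}  {word}")
```

Reading result.txt instead of SAMPLE (for example with open("result.txt").read()) applies the same conversion to a real run.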
I want to use ffmpeg to trim some MP3s without re-encoding. The command I used was:
ffmpeg -i "inputfile.mp3" -t 00:00:12.414 -c copy out.mp3
However, out.mp3 has a length of 12.460s, and when I load the file in Audacity I can see that it was cut at the wrong spot, and not at 12.414s.
Why is this? I googled a bit and tried some other commands, like ffmpeg -i "inputfile.mp3" -ss 0 -to 00:00:12.414 -c copy out.mp3 (which, interestingly, results in a different length of 12.434s), but I could never get the cut accurate to the millisecond.
P.S. I wasn't sure whether SO was the right place to ask, since this isn't strictly programming-related; however, most of what I found on trimming audio files with ffmpeg were Stack Overflow questions, e.g. "ffmpeg trimming videos with millisecond precision".
You can't trim MP3 (or most other lossy codec output) with that level of precision. MP3 audio is stored in fixed-size frames, and a stream copy (-c copy) can only cut between whole frames; in addition, an MP3 frame or so of padding is added during encoding. (See also: https://wiki.hydrogenaud.io/index.php?title=Gapless, and all the hacks required to make this work.)
If you need precise timing, use something uncompressed, like PCM in WAV, or a lossless codec, like FLAC.
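A quick check of the frame arithmetic, assuming MPEG-1 Layer III at 44.1 kHz (the question does not state the sample rate, so that is an assumption): each frame holds 1152 samples, about 26.12 ms. A target of 12.414 s falls between the 475- and 476-frame boundaries, and the two lengths reported in the question, 12.434 s and 12.460 s, work out to exactly 476 and 477 frames, which is consistent with the cut snapping to frame boundaries.

```python
import math

SAMPLES_PER_FRAME = 1152  # MPEG-1 Layer III; MPEG-2/2.5 Layer III frames hold 576
SAMPLE_RATE = 44100       # assumption: the question does not give the sample rate


def frame_boundaries(target_s, count=3, sample_rate=SAMPLE_RATE,
                     samples_per_frame=SAMPLES_PER_FRAME):
    """Return `count` (frames, seconds) pairs, starting at the last frame
    boundary at or before target_s."""
    frame_s = samples_per_frame / sample_rate
    first = math.floor(target_s / frame_s)
    return [(n, n * samples_per_frame / sample_rate)
            for n in range(first, first + count)]


print(f"one frame = {1000 * SAMPLES_PER_FRAME / SAMPLE_RATE:.2f} ms")
for n, t in frame_boundaries(12.414):
    print(f"{n} frames -> {t:.3f} s")
```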
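With PCM there is no frame structure, so a cut can land on any individual sample. Here is a minimal sketch using Python's wave module (trim_wav is a hypothetical helper name, not part of any tool mentioned here); times are rounded to the nearest sample.

```python
import array
import io
import wave


def trim_wav(src, dst, start_s, end_s):
    """Copy PCM sample frames between start_s and end_s from src to dst.

    src and dst may be paths or binary file objects.
    Returns the number of sample frames written.
    """
    with wave.open(src, "rb") as inp:
        rate = inp.getframerate()
        total = inp.getnframes()
        # Round to the nearest sample and clamp to the file's length.
        first = min(max(round(start_s * rate), 0), total)
        last = min(max(round(end_s * rate), first), total)
        inp.setpos(first)
        data = inp.readframes(last - first)
        channels, width = inp.getnchannels(), inp.getsampwidth()
    with wave.open(dst, "wb") as out:
        out.setnchannels(channels)
        out.setsampwidth(width)
        out.setframerate(rate)
        out.writeframes(data)
    return last - first


# Demo: 1 s of 8 kHz mono 16-bit audio whose sample values are 0..7999.
src = io.BytesIO()
with wave.open(src, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(8000)
    w.writeframes(array.array("h", range(8000)).tobytes())
src.seek(0)
dst = io.BytesIO()
print(trim_wav(src, dst, 0.25, 0.5))  # prints 2000 (samples 2000..3999)
```

With real files, trim_wav("in.wav", "out.wav", 0.0, 12.414) would keep exactly round(12.414 * rate) samples (file names are placeholders).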
On Linux you can use mp3splt:
mp3splt -f mp3file.mp3 from to -o output file format
Example:
mp3splt -f "/home/audio folder/test.mp3" 0.11.89 3.25.48 -o #f_trimmed
This will create "/home/audio folder/test_trimmed.mp3".
For more information on the parameters, see the mp3splt man page.
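Reading the example above as minutes.seconds.hundredths (0.11.89 being 11.89 s and 3.25.48 being 3 min 25.48 s), here is a small sketch for converting to and from that notation. The format is inferred from the example, so check it against the man page; the function names are hypothetical. It also shows that in this notation a target such as 12.414 s can only be written to the hundredth, as 0.12.41.

```python
def to_mp3splt_time(seconds):
    """Format seconds as minutes.seconds.hundredths, e.g. 205.48 -> '3.25.48'."""
    total = round(seconds * 100)          # work in whole hundredths of a second
    minutes, rest = divmod(total, 6000)   # 6000 hundredths per minute
    secs, hundredths = divmod(rest, 100)
    return f"{minutes}.{secs:02d}.{hundredths:02d}"


def from_mp3splt_time(text):
    """Parse 'minutes.seconds[.hundredths]' into seconds.

    Assumes two-digit hundredths, as in the example above.
    """
    parts = [int(p) for p in text.split(".")]
    minutes, secs = parts[0], parts[1]
    hundredths = parts[2] if len(parts) > 2 else 0
    return minutes * 60 + secs + hundredths / 100


print(to_mp3splt_time(11.89), to_mp3splt_time(205.48))  # 0.11.89 3.25.48
print(to_mp3splt_time(12.414))                          # 0.12.41
```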
On Windows you can use mp3DirectCut.
mp3DirectCut has a GUI, but it also has command-line support.
I am trying to use a keyphrase with pocketsphinx, but it keeps throwing this error:
ERROR: "kws_search.c", line 171: The word 'hey' is missing in the dictionary
This happens even though the word is definitely in the dictionary: it is a big part of the dictionary, and pocketsphinx recognizes the word fine when I leave the keyphrase out. Am I using it wrong? I couldn't find a tutorial; everything I found uses Python or Android.
pocketsphinx_continuous -hmm /usr/local/share/pocketsphinx/model/en-us/en-us -dict 9063.dic -lm 9063.lm -vad_threshold 3.0 -kws keyphrase.file -infile /dev/stdin
and keyphrase.file contains:
hey /1.0/
The correct command line is:
pocketsphinx_continuous -vad_threshold 3.0 -kws keyphrase.file -infile /dev/stdin
You do not need -lm and -dict, which configure language-model search mode; you need keyword search mode. When you use -dict, you replace the default dictionary with your own dictionary, which contains upper-case words. Words are case-sensitive, so the lower-case 'hey' in keyphrase.file is not found in it.
Tutorial is here.
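Following the case-sensitivity explanation above, a small script can spot keyphrase words that only exist in the dictionary in a different case. This is a sketch under assumptions: a CMU-style .dic with one "WORD PHONEMES" entry per line (alternate pronunciations written like READ(2)) and keyphrase lines of the form "phrase /threshold/". The file contents in the demo are hypothetical, shaped like the ones in the question.

```python
import re


def dictionary_words(dic_text):
    """Headwords of a CMU-style .dic file; 'READ(2)' is folded into 'READ'."""
    words = set()
    for line in dic_text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith(";;;"):  # skip blanks/comments
            continue
        words.add(re.sub(r"\(\d+\)$", "", fields[0]))
    return words


def keyphrase_words(kws_text):
    """Words of a keyphrase list whose lines look like 'phrase /threshold/'."""
    words = []
    for line in kws_text.splitlines():
        words.extend(line.split("/", 1)[0].split())
    return words


def missing_keyphrase_words(kws_text, dic_text):
    """Map each keyphrase word missing from the dictionary (case-sensitive)
    to the same word in a different case, or None if it is absent entirely."""
    vocab = dictionary_words(dic_text)
    by_upper = {w.upper(): w for w in vocab}
    return {w: by_upper.get(w.upper())
            for w in keyphrase_words(kws_text) if w not in vocab}


# Hypothetical contents shaped like keyphrase.file and 9063.dic.
print(missing_keyphrase_words("hey /1.0/\n", "HEY\tHH EY\nHELLO\tHH AH L OW\n"))
# {'hey': 'HEY'}: the word exists, but only in upper case
```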
I want to create a PDF with selectable/searchable text.
I have source.png, which has gone through some pre-processing before OCR, and view.jpg, which is a compressed version of source.png meant to reduce the size of the output PDF.
How do I specify view.jpg in the command?
tesseract -l eng source.png out pdf
I'm not sure whether you can specify view.jpg in the command. In any case, out.pdf already contains some sort of compressed copy of source.png.
How do you convert an mp3wav file (a WAV file containing MP3-compressed audio) to uncompressed WAV (PCM) using sox?
mp3wav sample files can be downloaded here: http://www.clayloomis.com/simsong.html
I would have thought the following would simply work:
sox file.mp3 file.wav
It may be that your version of sox doesn't handle MP3 files at all. I think this happened to me recently with the default RPM for openSUSE...
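Before blaming sox, it can help to confirm what the file actually contains. A WAV file records its codec as a format tag in its "fmt " chunk: 0x0001 is PCM and 0x0055 is MPEG Layer III, so an mp3wav should report 0x0055. The sketch below reads that tag with a minimal RIFF chunk walk; it is my own illustration (not a sox feature), and the test header it builds is hypothetical.

```python
import struct

FORMAT_NAMES = {
    0x0001: "PCM",
    0x0003: "IEEE float",
    0x0055: "MPEG Layer III (MP3)",
    0xFFFE: "WAVE_FORMAT_EXTENSIBLE",
}


def wav_format_tag(data):
    """Return the format tag stored in the 'fmt ' chunk of RIFF/WAVE bytes."""
    if len(data) < 12 or data[:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    pos = 12
    while pos + 8 <= len(data):
        chunk_id = data[pos:pos + 4]
        (size,) = struct.unpack_from("<I", data, pos + 4)
        if chunk_id == b"fmt ":
            (tag,) = struct.unpack_from("<H", data, pos + 8)
            return tag
        pos += 8 + size + (size & 1)  # chunk bodies are padded to even length
    raise ValueError("no 'fmt ' chunk found")


def minimal_header(tag):
    """Hypothetical 44-byte header with an empty data chunk, for trying the parser."""
    fmt = struct.pack("<HHIIHH", tag, 1, 16000, 32000, 2, 16)
    body = (b"WAVE" + b"fmt " + struct.pack("<I", len(fmt)) + fmt
            + b"data" + struct.pack("<I", 0))
    return b"RIFF" + struct.pack("<I", len(body)) + body


print(FORMAT_NAMES[wav_format_tag(minimal_header(0x0055))])  # MPEG Layer III (MP3)
```

On a real file, something like hex(wav_format_tag(open("file.wav", "rb").read())) shows which codec the file declares.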