Unable to Convert Speech to Text - audio

I'm getting a 500 error when converting a simple MP3 file from speech to text using the Wit.ai site.
I'm thinking the WAV I'm sending is not in the right format. Here's my conversion:
ffmpeg -i input.mp3 -acodec pcm_s16le -ac 1 -ar 16000 input.wav
This gives me a WAV file with pcm_s16le ([1][0][0][0] / 0x0001), 16000 Hz, mono, s16, 256 kb/s
Here's my request:
curl -X "POST" "https://api.wit.ai/speech?v=20160526" \
-H "Authorization: Bearer TOKEN_HERE" \
-H "Content-Type: audio/wav" \
--data-binary "@input.wav"
I'm sending the file as binary content in the request. I also tried MP3 but it does not work either. Any idea why?

It turned out that audio files longer than 10 seconds cannot be processed. The API should return a specific 400 Bad Request for that, ideally with a response body that says so.
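For anyone hitting the same limit, one workaround is to check the clip length and cut it before uploading. This is only a minimal sketch: the function names are made up for illustration, the 10-second default is taken from the answer above rather than from any Wit.ai documentation, and clip_duration assumes ffprobe is installed.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative, not part of ffmpeg or Wit.ai.

# Print the duration of a media file in seconds (requires ffprobe).
clip_duration() {
    ffprobe -v error -show_entries format=duration \
        -of default=noprint_wrappers=1:nokey=1 "$1"
}

# Succeed when duration $1 (seconds, may be fractional) exceeds limit $2.
exceeds_limit() {
    awk -v d="$1" -v max="$2" 'BEGIN { exit !(d > max) }'
}

# Print (do not run) the ffmpeg command that converts $1 to a 16 kHz mono
# WAV $2, keeping only the first $3 seconds (assumed default: 10).
trim_cmd() {
    printf 'ffmpeg -i %s -t %s -acodec pcm_s16le -ac 1 -ar 16000 %s\n' \
        "$1" "${3:-10}" "$2"
}

# Example use:
#   if exceeds_limit "$(clip_duration input.mp3)" 10; then
#       trim_cmd input.mp3 input.wav
#   fi
```

The command is printed rather than executed so it can be inspected first; file names containing spaces would need quoting before running it.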

Related

Detect silence(s) in audio channel from video stream

We need to detect the 'silence'(s) in the audio channel of a video stream. We have been able to receive a UDP video stream and extract audio from it using the command:
ffmpeg -y -i udp://127.0.0.1:23000 -ab 3000k -ar 44100 -ac 1 test.wav
The audio file was saved only to verify whether audio has been extracted correctly or not.
To detect 'silence'(s) in the audio, we are using the silencedetect filter. We referred to some examples and it seems to work for audio files:
ffmpeg -i audio/file/path -af silencedetect=noise=-50dB:d=0.25 -f null -
We are unable to detect silence(s) in the audio from a video stream. This is the command we came up with:
ffmpeg -y -i udp://127.0.0.1:23000 -ab 3000k -ar 44100 -ac 1 -af silencedetect=noise=-50dB:d=0.25 -f null -
What is it that we are doing wrong? Any help would be appreciated.
Thanks!
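Not an answer to the stream problem itself, but one thing worth ruling out is how the results are being read: silencedetect does not write to the output file, it logs silence_start, silence_end and silence_duration lines on stderr. A minimal sketch for pulling those values out of the log, assuming a POSIX shell with awk; parse_silences is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read ffmpeg log text on stdin and print one
# "start end duration" line for each silence reported by silencedetect.
parse_silences() {
    awk '
        /silence_start:/ {
            for (i = 1; i <= NF; i++)
                if ($i == "silence_start:") s = $(i + 1)
        }
        /silence_end:/ {
            for (i = 1; i <= NF; i++) {
                if ($i == "silence_end:") e = $(i + 1)
                if ($i == "silence_duration:") d = $(i + 1)
            }
            print s, e, d
        }
    '
}

# Example use (stderr is redirected into the pipe):
#   ffmpeg -i audio/file/path -af silencedetect=noise=-50dB:d=0.25 \
#       -f null - 2>&1 | parse_silences
```

If this prints nothing for the UDP input, the filter is either not seeing audio or not finding anything below the -50dB threshold for 0.25 seconds.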

How to convert AC3 audio to Wav audio?

I would like to convert an AC3 audio file (ATSC A/52, aka AC-3, aka Dolby Digital; 6 channels) to a WAV audio file (16 kHz, mono).
While searching online, I found that a lot of people just use ffmpeg -i file.ac3 file.wav, but I'm not sure that even works.
I keep getting
[ac3 @ 0x55ac1a0b0660] exponent -1 is out-of-range
[ac3 @ 0x55ac1a0b0660] error decoding the audio block
[ac3 @ 0x55ac1a0b0660] frame sync error
Error while decoding stream #0:0: Invalid data found when processing input
and so on, when I run the same command.
How do I convert ac3 to wav (16khz mono)?
Note: I also tried ffmpeg -i file.ac3 -codec:a:1 ac3 -codec copy -b:a 384 file.wav -ac 1 -ar 16000, but this doesn't output an actual WAV file.
This should do it:
ffmpeg -i file.ac3 -vcodec copy -acodec pcm_s16le -ar 16000 -ab 128k -ac 1 file.wav
You can also convert E-AC-3 to WAV at high quality.
The resulting WAV file will be 48 kHz, 24-bit, 6 channels.
ffmpeg -i "input.eac3" -acodec pcm_s24le -ar 48000 -ac 6 "output.wav"
If you want to export in 8 channels, just write 8 instead of 6.

ffmpeg sequence of multiple filters syntax

I am trying to use multiple filters in ffmpeg, but it does not allow more than one -af.
So I decided to try it with -filter_complex.
sudo ffmpeg -f alsa -i default:CARD=Device \
-filter_complex \
"lowpass=5000,highpass=200; \
volume=+5dB; \
afftdn=nr=0.01:nt=w;" \
-c:a libmp3lame -b:a 128k -ar 48000 -ac 1 -t 00:00:05 -y $recdir/audio_$(date '+%Y_%m_%d_%H_%M_%S').mp3
It should work, but for some reason I get an error:
Guessed Channel Layout for Input Stream #0.0 : stereo
Input #0, alsa, from 'default:CARD=Device':
Duration: N/A, start: 1625496748.441207, bitrate: 1536 kb/s
Stream #0:0: Audio: pcm_s16le, 48000 Hz, stereo, s16, 1536 kb/s
[AVFilterGraph @ 0xaaab0a8b14e0] No such filter: ''
Error initializing complex filters.
Invalid argument
I have tried different quoting and other variations, but nothing helps.
ffmpeg -f alsa -i default:CARD=Device \
-filter_complex \
"lowpass=5000,highpass=200,volume=+5dB,afftdn=nr=0.01:nt=w" \
-c:a libmp3lame -b:a 128k -ar 48000 -ac 1 -t 00:00:05 -y $recdir/audio_$(date '+%Y_%m_%d_%H_%M_%S').mp3
If you end your filtergraph with ; then ffmpeg expects another filter. That is why you got the error No such filter: ''. Avoid ending with ;.
You have a linear series of simple filters so separate the filters with commas. This also means you can still use -af instead of -filter_complex if you prefer.
See FFmpeg Filtering Introduction to see the difference between ; and ,.
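If the filter list is assembled in a script, joining the entries with commas and never appending a trailing separator avoids the empty-filter error. A minimal sketch, assuming a POSIX shell; join_filters is a hypothetical helper name, not an ffmpeg feature.

```shell
#!/bin/sh
# Hypothetical helper: join all arguments with commas into one linear
# filter chain, with no leading or trailing separator.
join_filters() {
    chain=""
    for f in "$@"; do
        # Prepend "existing chain," only when the chain is non-empty.
        chain="${chain:+$chain,}$f"
    done
    printf '%s\n' "$chain"
}

# Example use with -af:
#   ffmpeg -f alsa -i default:CARD=Device \
#       -af "$(join_filters lowpass=5000 highpass=200 volume=+5dB afftdn=nr=0.01:nt=w)" \
#       -c:a libmp3lame -b:a 128k -ar 48000 -ac 1 -t 00:00:05 -y out.mp3
```

Here out.mp3 is just a placeholder output name.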

Not outputting Opus raw audio

I'm currently writing a small script in Go that converts an MP4 to Opus audio on the fly and sends it to Discord. Initially my script would pass an MP4, as it was downloading, to ffmpeg through stdin and then pass stdout to an Opus encoder, then to Discord (exactly like this). After learning I could build ffmpeg with Opus support, I'd like to cut out the Opus encoder I previously had and pass ffmpeg's output directly to Discord.
Previously, my ffmpeg command looked like this (using the second Opus encoder):
ffmpeg -i - -f s16le -ar 48000 -ac 2 pipe:1
Now, without the encoder and letting ffmpeg do all the work, this is what I've come up with so far.
ffmpeg -i - -f s16le -ar 48000 -ac 2 -acodec libopus -b:a 192k -vbr on -compression_level 10 pipe:1
With this command, however, the audio doesn't get accepted by Discord's server, so I suspect Opus audio isn't coming out the other end. No errors are output. Have I done something wrong with ffmpeg that could have caused this?
Try
ffmpeg -i - -sample_fmt s16 -ar 48000 -ac 2 -acodec libopus -b:a 192k -vbr on -compression_level 10 -f opus pipe:1
You can't use -f s16le because it specifies an uncompressed output format (of a specific sample type), whereas you need a compressed data stream in a specific codec. Instead, use -sample_fmt and -f opus.

Extract every audio and subtitles from a video with ffmpeg

I have multiple audio tracks and subtitles to extract from a single .mkv file. I'm new to ffmpeg commands; this is what I've tried (audio):
ffmpeg -i VIDEO.mkv -vn -acodec copy AUDIO.aac
It only extracts one audio track. What I want is to tell ffmpeg to extract every audio and subtitle track to a destination, keeping the original name and extension of each (I don't know in advance which format the audio tracks are in; it may be .flac or .aac).
I'm not sure about the solutions I found online because they're quite complicated, and I need explanations of how they work so that I can adapt the command in the future. By the way, I plan to run the command from Windows CMD. Thanks.
There is no option yet in ffmpeg to automatically extract all streams into an appropriate container, but it is certainly possible to do manually.
You only need to know the appropriate containers for the formats you want to extract.
Default stream selection only chooses one stream per stream type, so you have to manually map each stream with the -map option.
1. Get input info
Using ffmpeg or ffprobe you can get the info for each individual stream, and a wide variety of output formats (xml, json, csv, etc.) is available to fit your needs.
ffmpeg example
ffmpeg -i input.mkv
The resulting output (I cut out some extra stuff, the stream numbers and format info are what is important):
Input #0, matroska,webm, from 'input.mkv':
Metadata:
Duration: 00:00:05.00, start: 0.000000, bitrate: 106 kb/s
Stream #0:0: Video: h264 (High 4:4:4 Predictive), yuv444p, 320x240 [SAR 1:1 DAR 4:3], 25 fps, 25 tbr, 1k tbn, 50 tbc (default)
Stream #0:1: Audio: vorbis, 44100 Hz, mono, fltp (default)
Stream #0:2: Audio: aac, 44100 Hz, mono, fltp (default)
Stream #0:3: Audio: flac, 44100 Hz, mono, fltp (default)
Stream #0:4: Subtitle: ass (default)
ffprobe example
ffprobe -v error -show_entries stream=index,codec_name,codec_type input.mkv
The resulting output:
[STREAM]
index=0
codec_name=h264
codec_type=video
[/STREAM]
[STREAM]
index=1
codec_name=vorbis
codec_type=audio
[/STREAM]
[STREAM]
index=2
codec_name=aac
codec_type=audio
[/STREAM]
[STREAM]
index=3
codec_name=flac
codec_type=audio
[/STREAM]
[STREAM]
index=4
codec_name=ass
codec_type=subtitle
[/STREAM]
2. Extract the streams
Using the info from one of the commands above:
ffmpeg -i input.mkv \
-map 0:v -c copy video_h264.mkv \
-map 0:a:0 -c copy audio0_vorbis.oga \
-map 0:a:1 -c copy audio1_aac.m4a \
-map 0:a:2 -c copy audio2.flac \
-map 0:s -c copy subtitles.ass
In this case, the example above is the same as:
ffmpeg -i input.mkv \
-map 0:0 -c copy video_h264.mkv \
-map 0:1 -c copy audio0_vorbis.oga \
-map 0:2 -c copy audio1_aac.m4a \
-map 0:3 -c copy audio2.flac \
-map 0:4 -c copy subtitles.ass
I prefer the first example because the input_file_index:stream_specifier:stream_index form is more flexible and efficient; it is also less prone to incorrect mapping.
See documentation on stream specifiers and the -map option to fully understand the syntax. Additional info is in the answer to FFmpeg mux video and audio (from another video) - mapping issue.
These examples will stream copy (re-mux) so no re-encoding will occur.
Container formats
A partial list to match the stream with the output extension for some common formats:
Video Format           Extensions
H.264                  .mp4, .m4v, .mov, .h264, .264
H.265/HEVC             .mp4, .h265, .265
VP8/VP9                .webm
AV1                    .mp4
MPEG-4                 .mp4, .avi
MPEG-2                 .mpg, .vob, .ts
DV                     .dv, .avi, .mov
Theora                 .ogv/.ogg
FFV1                   .mkv
Almost anything        .mkv, .nut
Audio Format           Extensions
AAC                    .m4a, .aac
MP3                    .mp3
PCM                    .wav
Vorbis                 .oga/.ogg
Opus                   .opus, .oga/.ogg, .mp4
FLAC                   .flac, .oga/.ogg
Almost anything        .mka, .nut
Subtitle Format        Extensions
Subrip/SRT             .srt
SubStation Alpha/ASS   .ass
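The two steps above (read the stream list, then pick a container per stream) can be roughly scripted. This is a minimal sketch, not a built-in ffmpeg feature: map_args is a hypothetical helper, the extension table is a small assumed subset of the list above keyed by ffprobe codec names, and it assumes a POSIX shell with awk (on Windows that means something like Git Bash or WSL rather than CMD).

```shell
#!/bin/sh
# Hypothetical helper: read "index,codec_name,codec_type" lines on stdin and
# print one "-map 0:INDEX -c copy FILE" group per audio or subtitle stream.
# $1 is the base name used for the output files.
# Codecs missing from the table fall back to Matroska (.mka audio, .mkv other).
map_args() {
    awk -F, -v base="$1" '
        BEGIN {
            ext["aac"] = "m4a";   ext["mp3"] = "mp3";  ext["vorbis"] = "oga"
            ext["opus"] = "opus"; ext["flac"] = "flac"
            ext["subrip"] = "srt"; ext["ass"] = "ass"
        }
        $3 == "audio" || $3 == "subtitle" {
            e = ($2 in ext) ? ext[$2] : ($3 == "audio" ? "mka" : "mkv")
            printf "-map 0:%s -c copy %s_%s.%s\n", $1, base, $1, e
        }
    '
}

# Example use (file names without spaces):
#   ffprobe -v error -show_entries stream=index,codec_name,codec_type \
#       -of csv=p=0 input.mkv | map_args input | xargs ffmpeg -i input.mkv
```

Each printed group ends with an output file name, so ffmpeg treats it as a separate output with its own -map and -c copy, just like the manual commands above.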
You would first list all the audio streams:
ffmpeg -i VIDEO.mkv
and then based on the output you can compile the command to extract the audio tracks individually.
With a short shell script you can then automate this so that it works generically for any mkv file.
Subtitles are pretty much the same. The subtitles will be printed in the info and then you can extract them, similar to:
ffmpeg -threads 4 -i VIDEO.mkv -map 0:2 -vn -an -c:s srt myLangSubtitle.srt
0:2 is the stream identifier (input index:stream index) that you have to read from the info.
I solved it like this:
ffprobe -show_entries stream=index,codec_type:stream_tags=language -of compact $video1 2>&1 | { while read line; do if $(echo "$line" | grep -q -i "stream #"); then echo "$line"; fi; done; while read -d $'\x0D' line; do if $(echo "$line" | grep -q "time="); then echo "$line" | awk '{ printf "%s\r", $8 }'; fi; done; }
Just set the $video1 variable before running the command.
Enjoy!
If someone stumbles on this question with a modern version of ffmpeg, it looks like the option has since been added.
I needed to convert a file by maintaining all tracks:
ffmpeg -i "${input_file}" -vcodec hevc -crf 28 -map 0 "${output_file}"
To achieve what the original question asked, probably this could be used:
mappings="`ffmpeg -i \"${input_file}\" |& awk 'BEGIN { i = 1 }; /Stream.*Audio/ {gsub(/^ *Stream #/, \"-map \"); gsub(/\(.*$/, \" -acodec mp3 audio\"i\".mp3\"); print; i +=1}'`"
ffmpeg -i "${input_file}" ${mappings}
The first line (mappings=...) finds the existing audio streams and converts them into "-map X:Y -acodec mp3 FILENAME" arguments, while the second line runs the extraction.
The following script extracts all audio streams from the files in the current directory:
ls |parallel "ffmpeg -i {} 2>&1 |\
sed -n 's/.*Stream \#\(.\+\)\:\(.\+\)\: Audio\: \([a-zA-Z0-9]\+\).*$/-map \1:\2 -c copy \"{.}.\1\2.\3\"/p' |\
xargs -n5 ffmpeg -i {} "

Resources