Use of Google Speech API, issue with audio file format - audio

Hope you're all well,
I'm trying to use the Google Speech API to convert interviews I give directly into text files.
I'm still getting the environment up and running, so I'm testing it with a single file.
I'm really new to audio, so I converted a test file to FLAC using an online converter: http://www.online-convert.com/
Then I used ffprobe to verify the result, and it looks good to me:
Input #0, flac, from '../../Walk_Away.flac':
Metadata:
MAJOR_BRAND : mp42
MINOR_VERSION : 0
COMPATIBLE_BRANDS: isommp42
ARTIST : Aaron Michael Cox
TITLE : Walk Away
ENCODER : Lavf57.57.100
Duration: 00:03:12.08, start: 0.000000, bitrate: 185 kb/s
Stream #0:0: Audio: flac, 16000 Hz, mono, s16
[FORMAT]
filename=../../Walk_Away.flac
nb_streams=1
nb_programs=0
format_name=flac
format_long_name=raw FLAC
start_time=0.000000
duration=192.078375
size=4444256
bit_rate=185101
probe_score=50
TAG:MAJOR_BRAND=mp42
TAG:MINOR_VERSION=0
TAG:COMPATIBLE_BRANDS=isommp42
TAG:ARTIST=Aaron Michael Cox
TAG:TITLE=Walk Away
TAG:ENCODER=Lavf57.57.100
[/FORMAT]
But when I run:
node recognize.js async-gcs gs://file.flac -e LINEAR16 -r 16000
I get a really sad result: "Transcription: ,,"
Can anyone help me, please? Thanks a lot
Ivan

The problem is that the file is a FLAC file, but in the command you've specified that it's a raw audio file (LINEAR16). Try again but specify that it's a FLAC file with the option -e FLAC and see if that helps. So the command should look something like this:
node recognize.js async-gcs gs://file.flac -e FLAC -r 16000
Alternatively, you could convert the audio file to raw 16-bit PCM and keep -e LINEAR16.
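To avoid this kind of mismatch, the -e value can be derived from the codec that ffprobe reports instead of being typed by hand. A minimal sketch, assuming ffprobe is on the PATH; the helper names are hypothetical, and only the two encodings mentioned in this thread are mapped:

```shell
# Hypothetical helper: map an ffprobe codec name to the -e value used above.
speech_encoding_for_codec() {
    case "$1" in
        flac)      echo FLAC ;;
        pcm_s16le) echo LINEAR16 ;;
        *)         echo "unsupported codec: $1" >&2; return 1 ;;
    esac
}

# Hypothetical helper: ask ffprobe for the codec of the first audio stream.
audio_codec_of() {
    ffprobe -v error -select_streams a:0 \
        -show_entries stream=codec_name \
        -of default=noprint_wrappers=1:nokey=1 "$1"
}

# Usage (needs ffprobe on PATH):
#   speech_encoding_for_codec "$(audio_codec_of Walk_Away.flac)"
```

For the file probed in the question, ffprobe reports flac, so the helper would select FLAC rather than LINEAR16.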

Related

Can't give metadata of comment to MP3 file using ffmpeg

I want to convert an AAC audio file to MP3 and add a comment in the metadata of the MP3 file using ffmpeg.
The -metadata comment option doesn't work and ffmpeg doesn't return an error.
The complete command I'm running is
ffmpeg -i "test.aac" -ab 128k -metadata comment='this is test' "test.mp3"
I also tried -metadata description='this is test' and even updated ffmpeg. Other options such as -metadata artist and -metadata album work well.
What's wrong with this approach?
Output
Stream mapping:
Stream #0:0 -> #0:0 (aac (native) -> mp3 (libmp3lame))
Press [q] to stop, [?] for help
Output #0, mp3, to 'test.mp3':
Metadata:
description : this is test
TSSE : Lavf58.29.100
Stream #0:0: Audio: mp3 (libmp3lame), 48000 Hz, stereo, fltp, 128 kb/s
Metadata:
encoder : Lavc58.54.100 libmp3lame
Environment
ffmpeg version 4.2.2 Copyright (c) 2000-2019 the FFmpeg developers
built with Apple clang version 11.0.3 (clang-1103.0.32.59)
Possibly a bug: ffmpeg is writing the comment metadata as a user text frame (TXXX) instead of the expected comment frame (COMM).
For now I suggest using a different tool for the comment tag. eyeD3 example:
eyeD3 --comment "added a comment" input.mp3
By default, ffmpeg writes an ID3v2 tag to MP3 output files. As of version 4.3.1 (or even snapshots 2021-02-10 and 2022-09-20), the comment is still wrongly written as a TXXX frame instead of COMM: /libavformat/id3v2.c nowhere handles the needed association, and /libavformat/id3v2enc.c indicates that the -comment parameter is only used when providing a graphic to embed (i.e. album cover).
As an alternative, you could force an ID3v1 tag (with all its shortcomings). You must also disable ID3v2 tag creation, since almost all software that encounters both ID3 versions prefers the v2 data over v1. The parameters to add are -write_id3v1 true -id3v2_version 0, so the overall command is (on Windows):
ffmpeg -i "test.aac" -ab 128k -metadata "comment=this is a test" -write_id3v1 true -id3v2_version 0 "test.mp3"
This works as expected: no ID3v2 tag, only an ID3v1 tag in which only the comment is filled. The quotation marks that start before comment and end after test are needed so that Windows knows where that one parameter starts and ends (by default it would end at the next space character, which is also why filenames should go in quotation marks).
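One of those ID3v1 shortcomings is size: the comment field holds 30 bytes (28 if an ID3v1.1 track number is stored), so longer text will not fit. A minimal sketch of a pre-check, with a hypothetical helper name:

```shell
# The ID3v1 comment field is 30 bytes (28 when an ID3v1.1 track number is stored).
ID3V1_COMMENT_MAX=30

# Hypothetical helper: succeed only if the comment fits the ID3v1 field.
fits_id3v1_comment() {
    # Arithmetic expansion strips any padding that wc adds on some systems.
    bytes=$(( $(printf '%s' "$1" | wc -c) ))
    [ "$bytes" -le "$ID3V1_COMMENT_MAX" ]
}

# Usage (needs ffmpeg on PATH):
#   fits_id3v1_comment "this is a test" &&
#     ffmpeg -i "test.aac" -ab 128k -metadata "comment=this is a test" \
#       -write_id3v1 true -id3v2_version 0 "test.mp3"
```

The check counts bytes rather than characters, since the field size is a byte limit.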

FFmpeg - how to set output sample_size

I'm trying to create a simple command line player for .dsf (DSD audio) files that outputs to an ALSA device supporting up to 24-bit, 192 kHz audio. The following command almost works, and it does play the track. As the output below shows, the dsf input file is converted to 24-bit/192 kHz (s32 (24 bit)), but the output is then truncated to 16-bit 192 kHz (pcm_s16le, i.e. 16-bit little endian).
ffmpeg -i '01 - Sweet Georgia Brown.dsf' -f alsa hw:0,0
After displaying the ffmpeg banner and song metadata (tags), here is the result:
Duration: 00:05:14.83, start: 0.000000, bitrate: 9234 kb/s
Stream #0:0: Audio: flac, 192000 Hz, stereo, s32 (24 bit)
Stream mapping:
Stream #0:0 -> #0:0 (flac (native) -> pcm_s16le (native))
Press [q] to stop, [?] for help
Output #0, alsa, to 'hw:0,0':
Since I can play this and many other tracks at full resolution using another player (foobar2000), it seems there might be an option for the encoder that is part of FFmpeg (Lavf57.83.100). I can find no information in the FFmpeg documentation that helps. I tried finding options in FFplay, and even guessing with other FFmpeg options, as in this example:
ffmpeg -sample_fmt s24 -i '01 - Sweet Georgia Brown.dsf' -f alsa hw:0,0
Same results.
I'm stuck. Any suggestions?
Environment: Linux Mint 19.2, 64-bit, ASUS Xonar STXii sound card.
Each output format or device has a default encoder registered for each media type it accepts. ALSA accepts audio and its default encoder is 16-bit signed PCM.
You can override that default by specifying an encoder with -c:a:
ffmpeg -i '01 - Sweet Georgia Brown.dsf' -c:a pcm_s24le -f alsa hw:0,0
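To make the choice explicit, the wanted bit depth can be mapped to one of FFmpeg's signed little-endian PCM encoder names. A minimal sketch; the helper name is hypothetical, and whether a given ALSA device accepts each format is an assumption to check on your own hardware:

```shell
# Hypothetical helper: map a bit depth to an FFmpeg signed little-endian PCM encoder.
pcm_encoder_for_bits() {
    case "$1" in
        16|24|32) echo "pcm_s${1}le" ;;
        *)        echo "unsupported bit depth: $1" >&2; return 1 ;;
    esac
}

# Usage (needs ffmpeg built with ALSA output):
#   ffmpeg -i '01 - Sweet Georgia Brown.dsf' -c:a "$(pcm_encoder_for_bits 24)" -f alsa hw:0,0
```

If the device rejects packed 24-bit samples, pcm_s32le may be worth trying, since some cards expect 24-bit audio in 32-bit containers.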

Changing sample rate of mp3 files

I have a large number of MP3 files that are not at the correct sample rate for the external hardware I want to use them with. Is there any way of changing them all in one go, rather than file by file through Audacity?
You should mention what OS you're on. This works on Linux:
sudo apt install libav-tools # install needed tool
Show what we have for one file:
avprobe mysong.mp3
The bottom of its output says:
Duration: 00:00:01.65, start: 0.000000, bitrate: 192 kb/s
Stream #0:0: Audio: mp3, 44100 Hz, mono, s16p, 192 kb/s
OK, it's normal CD-quality 44.1 kHz, so let's cut the sample rate in half to 22050 Hz:
avconv -i mysong.mp3 -ar 22050 mysong_22k.mp3
Verify what we have now:
avprobe mysong_22k.mp3
Duration: 00:00:01.70, start: 0.050113, bitrate: 33 kb/s
Stream #0:0: Audio: mp3, 22050 Hz, mono, s16p, 32 kb/s
So far so good. Now let's wrap this in a loop over all files in one directory:
#!/bin/bash
# Resample every MP3 in the current directory to 22050 Hz.
for curr_song in *.mp3; do
    [ -e "$curr_song" ] || continue # the glob matched nothing
    echo
    echo "current specs on song -->${curr_song}<--"
    echo
    curr_song_base_name=${curr_song%.*}
    echo "curr_song_base_name $curr_song_base_name"
    curr_new_output="${curr_song_base_name}_22k.mp3"
    echo "avprobe $curr_song"
    avprobe "$curr_song"
    echo
    avconv -i "$curr_song" -ar 22050 "$curr_new_output"
    echo "now confirm it worked"
    echo
    avprobe "$curr_new_output"
done
This should get you up and running. Looping over the glob and quoting every variable lets it handle file names that contain spaces. It names each output file by appending _22k to the base name, so
input songhere.mp3
output songhere_22k.mp3
It's easy enough to give it a different output directory.
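For the different output directory mentioned above, here is a minimal sketch. The helper names resampled_path and resample_all are hypothetical, and it assumes avconv is installed (ffmpeg accepts the same -i and -ar options if it is not):

```shell
# Hypothetical helper: build the output path for one input file.
# $1 = output directory, $2 = input file (may contain spaces or a directory part)
resampled_path() {
    name=${2##*/}                      # drop any leading directory
    printf '%s/%s_22k.mp3\n' "$1" "${name%.*}"
}

# Resample every MP3 in the current directory into $1 at rate $2 (default 22050 Hz).
resample_all() {
    out_dir=$1
    rate=${2:-22050}
    mkdir -p "$out_dir" || return 1
    for song in *.mp3; do
        [ -e "$song" ] || continue     # the glob matched nothing
        avconv -i "$song" -ar "$rate" "$(resampled_path "$out_dir" "$song")"
    done
}

# Usage (not run automatically):
#   resample_all resampled 22050
```

Writing to a separate directory also keeps the converted files out of the *.mp3 glob if the script is run a second time.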

Fluent-ffmpeg "not a suitable output format"

I'm using the fluent-ffmpeg module for Node.js to convert audio files. I have a .mp3 file that I'd like to convert to .wma
Here's what that looks like:
var ffmpeg = require('fluent-ffmpeg');
var proc = new ffmpeg({
    source: 'file.mp3',
    nolog: false
}).toFormat('wma')
    .saveToFile('file.wma', function(stdout, stderr) {
        console.log(stderr);
    });
Unfortunately, I get the error:
Requested output format 'wma' is not a suitable output format
This is the entire error log:
ffmpeg version 0.8.9-4:0.8.9-0ubuntu0.12.04.1, Copyright (c) 2000-2013 the Libav developers
built on Nov 9 2013 19:25:10 with gcc 4.6.3
*** THIS PROGRAM IS DEPRECATED ***
This program is only provided for compatibility and will be removed in a future release. Please use avconv instead.
Input #0, mp3, from 'song_downloads/You Suffer.mp3':
Metadata:
title : You Suffer
artist : Napalm Death
album : Scum
genre : Death Metal
track : 12
date : 1987
Duration: 00:00:04.98, start: 0.000000, bitrate: 381 kb/s
Stream #0.0: Audio: mp3, 44100 Hz, stereo, s16, 193 kb/s
Requested output format 'wma' is not a suitable output format
I know this isn't an ffmpeg issue because
ffmpeg -i file.mp3 file.wma
Works fine. Any ideas?
I think wma is not a container format; it is an audio codec. WMA audio is most commonly stored in the ASF (Advanced Systems Format) container. So use the options given in the fluent-ffmpeg API to set the codec and the format correctly. You can run these commands:
ffmpeg -formats to see all formats and
ffmpeg -codecs to see all supported codecs
This worked fine for me:
fluent_ffmpeg('input.mp3').audioCodec("aac").save('output.aac')
in a case where ffmpeg -formats listed aac with only the D flag:
D aac raw ADTS AAC (Advanced Audio Coding)
that is, without E (muxing supported).
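Checking the E flag by eye gets tedious, so the ffmpeg -formats listing can be filtered instead. A minimal sketch with a hypothetical helper name; it assumes the usual layout of that listing, with the flags in the first column and the format name (or comma-separated names) in the second:

```shell
# Hypothetical helper: read `ffmpeg -formats` output on stdin and succeed
# only if the named format is listed with the E (muxing) flag.
can_mux() {
    awk -v want="$1" '
        {
            n = split($2, names, ",")
            for (i = 1; i <= n; i++)
                if (names[i] == want && $1 ~ /E/) found = 1
        }
        END { exit found ? 0 : 1 }
    '
}

# Usage (needs ffmpeg on PATH):
#   ffmpeg -formats 2>/dev/null | can_mux asf && echo "asf can be written"
```

A name that is missing from the listing, or is listed with D only, makes the helper fail, which is the situation behind a "not a suitable output format" error.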

RTMP: Is there such a linux command line tool?

I have looked everywhere for a Linux utility that will let me download RTMP streams. Not FLV video, but MP3 streams. The locations of the streams I want to download are in this format:
rtmp://live.site.com/loc/45/std_fc74a6b7f79c70a5f60.mp3
Anyone know of such a command line tool? Or even anything close to what I am asking for?
I do not want full software applications and it would be great if it worked on Linux via Shell or something.
Thanks all
One of the following should do, if you have mplayer or vlc compiled with RTMP access.
mplayer -dumpstream rtmp://live.site.com/loc/45/std_fc74a6b7f79c70a5f60.mp3
This will generate a ./stream.dump.
vlc -I dummy rtmp://live.site.com/loc/45/std_fc74a6b7f79c70a5f60.mp3 \
--sout file/ts:output.mpg vlc://quit
This will generate a ./output.mpg. You'll have to demux it to extract just the audio stream.
This question is old, but this may help other users with the same problem.
To download directly, without any conversion, there are two options (both programs have the same author and behave the same way):
RTMPDump. Example: rtmpdump -r "rtmp://host.com/dir/file.flv" -o filename.flv
flvstreamer. Example: flvstreamer -r "rtmp://od.flash.plus.es/ondemand/14314/plus/plustv/PO770632.flv" -o salida.flv
And if you want to download and convert the video at the same time, the best way is to use ffmpeg:
ffmpeg -i rtmp://server/live/streamName -acodec copy -vcodec copy dump.mp4
I think the landscape has changed a bit since some of the previous answers were written. According to the RTMP Wikipedia page, the RTMP protocol specification is now open for public use. To that end, you can use two tools to accomplish what the original poster was asking for: rtmpdump and ffmpeg. Here's what I did to download an RTMP stream that was sending an audio podcast.
step #1 - download the stream
I used the tool rtmpdump to accomplish this. Like so:
% rtmpdump -r rtmp://url/to/some/file.mp3 -o /path/to/file.flv
RTMPDump v2.3
(c) 2010 Andrej Stepanchuk, Howard Chu, The Flvstreamer Team; license: GPL
Connecting ...
INFO: Connected...
Starting download at: 0.000 kB
28358.553 kB / 3561.61 sec
Download complete
step #2 - convert the flv file to mp3
OK, so now you've got a local copy of the stream, file.flv. You can use ffmpeg to interrogate the file further and also to extract just the audio portion.
% ffmpeg -i file.flv
....
[flv @ 0x25f6670]max_analyze_duration reached
[flv @ 0x25f6670]Estimating duration from bitrate, this may be inaccurate
Input #0, flv, from 'file.flv':
Duration: 00:59:21.61, start: 0.000000, bitrate: 64 kb/s
Stream #0.0: Audio: mp3, 44100 Hz, 1 channels, s16, 64 kb/s
From the above output we can see that the file.flv contains a single stream, just audio, and it's in mp3 format, and it's a single channel. To extract it to a proper mp3 file you can use ffmpeg again:
% ffmpeg -i file.flv -vn -acodec copy file.mp3
....
[flv @ 0x22a6670]max_analyze_duration reached
[flv @ 0x22a6670]Estimating duration from bitrate, this may be inaccurate
Input #0, flv, from 'file.flv':
Duration: 00:59:21.61, start: 0.000000, bitrate: 64 kb/s
Stream #0.0: Audio: mp3, 44100 Hz, 1 channels, s16, 64 kb/s
Output #0, mp3, to 'file.mp3':
Metadata:
TSSE : Lavf52.64.2
Stream #0.0: Audio: libmp3lame, 44100 Hz, 1 channels, 64 kb/s
Stream mapping:
Stream #0.0 -> #0.0
Press [q] to stop encoding
size= 27826kB time=3561.66 bitrate= 64.0kbits/s
video:0kB audio:27826kB global headers:0kB muxing overhead 0.000116%
The above command will copy the audio stream into a file, file.mp3. You could also have extracted it to a wav file like so:
ffmpeg -i file.flv -vn -acodec pcm_s16le -ar 44100 -ac 2 file.wav
This page was useful in determining how to convert the flv file to other formats.
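As a sanity check on the download, the 3561.61 sec that rtmpdump reported can be converted to the HH:MM:SS form that ffmpeg prints for Duration. A small sketch with a hypothetical helper name:

```shell
# Hypothetical helper: format a duration in seconds as HH:MM:SS.ss.
# Note: values that round up to 60.00 seconds are not carried into the minutes.
seconds_to_hms() {
    awk -v t="$1" 'BEGIN {
        h = int(t / 3600)
        m = int((t - h * 3600) / 60)
        s = t - h * 3600 - m * 60
        printf "%02d:%02d:%05.2f\n", h, m, s
    }'
}

seconds_to_hms 3561.61   # prints 00:59:21.61
```

That matches the Duration: 00:59:21.61 line in the ffmpeg output above, which suggests the whole stream was saved.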

Resources