FFMPEG audio decoding

I have used the avcodec_decode_audio3 function to decode AMR content, frame by frame in order.
I get 640 bytes of output for each frame, the sample format is float, and I have saved the output as a raw file.
Now I want to validate this output, but I can't play it in any player because it has no header or media info. I also can't find any ffmpeg command that gives me raw audio output to compare against.
And if I want to re-encode that raw output with FFmpeg, what input format do I need to specify?
Can anybody give some suggestions on this?

If the audio data is saved in a binary file as raw (headerless) samples, you can use Audacity to import it as raw data and play it back. You will need to provide the sample encoding, sample rate and number of channels.
If there are any problems, you can convert a known-good file to raw with ffmpeg and use the result for comparison. For example:
ffmpeg -i input.wav -f f32le output.raw
produces a raw audio file with 32-bit little-endian float samples, keeping the original sample rate and number of channels. Alternatively, the output sample rate and number of channels can be set explicitly, for example with -ar 44100 and -ac 2.
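If importing into Audacity is inconvenient, another option is to prepend a WAV header so that ordinary players can open the file. The following is a minimal Python sketch, not part of ffmpeg: the function name wrap_f32le_as_wav is made up for illustration, and the 8000 Hz mono setting in the usage note is an assumption (it would match AMR-NB, where a 20 ms frame is 160 samples, i.e. 640 bytes of 32-bit floats). Adjust both values if your stream differs.

```python
import struct


def wrap_f32le_as_wav(raw: bytes, sample_rate: int, channels: int) -> bytes:
    """Return a WAV file image (IEEE float, format tag 3) around raw f32le samples."""
    bytes_per_sample = 4  # 32-bit float
    block_align = channels * bytes_per_sample
    if len(raw) % block_align:
        raise ValueError("raw data does not contain a whole number of frames")
    num_frames = len(raw) // block_align

    # 18-byte 'fmt ' body: tag, channels, rate, byte rate, block align, bits, cbSize
    fmt = struct.pack(
        "<HHIIHHH",
        3,                          # WAVE_FORMAT_IEEE_FLOAT
        channels,
        sample_rate,
        sample_rate * block_align,  # bytes per second
        block_align,
        32,                         # bits per sample
        0,                          # no extension bytes
    )
    # Non-PCM formats carry a 'fact' chunk with the number of sample frames.
    fact = struct.pack("<I", num_frames)

    chunks = (
        b"fmt " + struct.pack("<I", len(fmt)) + fmt
        + b"fact" + struct.pack("<I", len(fact)) + fact
        + b"data" + struct.pack("<I", len(raw)) + raw
    )
    return b"RIFF" + struct.pack("<I", 4 + len(chunks)) + b"WAVE" + chunks
```

Usage would be along the lines of open("output.wav", "wb").write(wrap_f32le_as_wav(open("output.raw", "rb").read(), 8000, 1)). If the result plays too fast, too slow, or as noise, the assumed sample rate, channel count or sample format is probably wrong.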

Related

Is there a way to ensure mp3 duration accuracy with variable bit rate using FFMPEG?

In our application, we process audio files using ffmpeg. Specifically, we use the NodeJS library fluent-ffmpeg.
Our audio files are generated by various text-to-speech providers. We recently noticed that when we used SSML to add pauses to the generated audio, the duration reported for the converted file was no longer correct. Upon further investigation, we noticed that the durations of the standard audio files were also incorrect, just closer to the truth because their data is more uniform. When we put a pause at the beginning of the audio, the estimate was the worst, overshooting by a very large margin (e.g., a 25s audio clip would read as 3 minutes long, but skip to the end when playing past the 25s mark).
I did some searching and research into the structure of MP3 files, and it seems to me that the issue arises because the duration is estimated by various audio players. Windows Media Player is one example, but Firefox's web player seems to do this as well. I tried changing the ffmpeg command from using .audioQuality(0), which sets ffmpeg to use VBR, to .audioBitrate(320), which tells ffmpeg to use a constant bitrate.
For reference, we are using libmp3lame, and the full commands that get run for the VBR and CBR cases respectively are the following:
For VBR (broken durations): ffmpeg -i <URL> -acodec libmp3lame -aq 0 -f mp3 pipe:1
For CBR (correct duration): ffmpeg -i <URL> -acodec libmp3lame -b:a 320k -f mp3 pipe:1
Note: we then pipe the output to the requesting client application after sending the appropriate file headers, hence the pipe:1 output. The input is a cloud storage URL where the source file is located.
This fixes our problem of having a correct duration, and it makes sense to me why this would fix it if the problem was because the duration is being estimated by some of these players / audio consumers. But, this came at the cost that the file size was significantly larger, which also makes sense to me. While testing we found that compared to the same file in WAV, the VBR mp3 was about 10% of the WAV file size, while the CBR mp3 was still 50% of the WAV file size. This practically defeats the purpose of supporting the mp3 format for our use-case, which is a smaller but slightly lossy alternative to the large WAV file.
While researching, I found that there can be ID3 tags in a chunk at the beginning of the mp3 file, which let the consumer of the audio know certain information before it has processed the whole file. But I also found that there doesn't seem to be a standard for this, at least not for duration; the tags mostly cover things like song title, album, artist, etc.
My question is, is there a way to get the proper duration onto an mp3 file, preferably via some ffmpeg mechanism, while still using VBR? Thanks!
FFmpeg does write a Xing header with duration info by default. However, that value is only known after the entire stream has been encoded, so ffmpeg has to seek back to the start of the output to write it. Since you're piping the output, that can't be done.
Write the file locally or to some seekable destination, and then upload.
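The question uses NodeJS, but the idea is language-independent, so here is a hedged Python sketch of it: encode to a temporary file (which is seekable, so ffmpeg can go back and fill in the Xing header), then copy the finished file to the client stream. The function names vbr_mp3_command and encode_then_stream are made up for illustration, and the runner parameter exists only so the ffmpeg call can be swapped out; the ffmpeg options are the ones from the question with pipe:1 replaced by a file path.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def vbr_mp3_command(source: str, destination: str) -> list:
    """Build the VBR command from the question, targeting a file instead of pipe:1."""
    return [
        "ffmpeg", "-y",
        "-i", source,
        "-acodec", "libmp3lame",
        "-aq", "0",          # VBR, highest quality
        "-f", "mp3",
        destination,
    ]


def encode_then_stream(source: str, out_stream, runner=subprocess.run) -> None:
    """Encode to a seekable temp file first, then copy the result to out_stream."""
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "out.mp3"
        # ffmpeg can seek in a regular file, so the header can be completed at the end.
        runner(vbr_mp3_command(source, str(target)), check=True)
        with target.open("rb") as encoded:
            shutil.copyfileobj(encoded, out_stream)
```

The trade-off is latency: the client receives nothing until the whole file has been encoded, whereas piping starts sending immediately.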

How do you encode raw pcm_f32le audio to AAC encoded audio with FFmpeg (C/C++)?

I am trying to encode raw audio (pcm_f32le) to AAC encoded audio. One thing I've noticed is that I can accomplish this via the CLI tool:
ffmpeg -f f32le -ar 48000 -ac 2 -c:a pcm_f32le -i out.raw out.m4a -y
This plays just fine and decodes fine.
The steps I've taken:
I took the C example code at https://ffmpeg.org/doxygen/3.4/encode_audio_8c-example.html and switched the encoder to codec = avcodec_find_encoder(AV_CODEC_ID_AAC);
When I print the sample formats supported by the AAC encoder, it only lists FLTP. That assumes a particular layout (planar versus interleaved).
This page seems to provide the various supported input formats per codec.
This is confusing because I don't think my raw captured audio is interleaved. I've certainly tried passing it through and it doesn't work as intended.
It will stay stuck here with this ret code indefinitely after calling avcodec_receive_packet:
AVERROR(EAGAIN): output is not available in the current state - user must try to send input
Questions:
How can I modify the example code from FFmpeg to convert pcm_f32le raw audio to AAC encoded audio?
Why is the CLI tool able to?
I am using libsoundio to capture raw audio from Linux's Dummy Output. I wonder how I could get a planar format to pass through to get AAC encoded audio.
If AAC is not a possibility, is it possible with MP3?
Find here a working example of how to encode raw pcm_f32le to aac with ffmpeg
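On the planar question: in FFmpeg's naming, FLT is packed (interleaved) float and FLTP is planar float, and a two-channel pcm_f32le stream is interleaved (L R L R ...). The encoder therefore needs each channel in its own buffer. In C this conversion is normally done with libswresample, which is also what the CLI tool sets up for you automatically. The Python sketch below only illustrates what that conversion does to the bytes; the function name deinterleave_f32le is made up for illustration and is not an FFmpeg API.

```python
import struct


def deinterleave_f32le(raw: bytes, channels: int) -> list:
    """Split interleaved f32le audio into one bytes object (plane) per channel."""
    frame_size = 4 * channels  # one 32-bit float per channel per frame
    if len(raw) % frame_size:
        raise ValueError("raw data does not contain a whole number of frames")
    total = len(raw) // 4
    samples = struct.unpack("<%df" % total, raw)
    per_channel = total // channels
    # Every channels-th sample, starting at index ch, belongs to channel ch.
    return [
        struct.pack("<%df" % per_channel, *samples[ch::channels])
        for ch in range(channels)
    ]
```

Each returned plane corresponds to what would go into one data pointer of an FLTP frame. Note also that an encoder returning EAGAIN from avcodec_receive_packet is asking for more input; FFmpeg's native AAC encoder works on frames of 1024 samples per channel, so captured audio usually has to be buffered into frames of that size before sending.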

Batch amplification of PCM audio using sox

I have a large number of .PCM files (248 total) that are all encoded as:
Encoding: Signed 16-bit uncompressed PCM
Byte order: Little-endian
Channels: 2 channel (stereo)
Sample rate: 44100 Hz
8 Byte header
I need to apply a -7.5 dB amplification (attenuation, really) to every single one of these files.
The problem I have is that all of these tracks are looped, and I need to preserve the loop data (contained in the 8-byte header).
I've yet to see a batch audio editing problem that sox couldn't handle, so I'm hoping someone on here would know how to use sox to accomplish this, or failing that, know of a program that can do this for me.
Thanks for the help!
Edit: A bit of research got me the exact encoding of the PCM audio I need to edit:
"The audio tracks are 44.1 kilohertz, 16-bit stereo uncompressed unsigned PCM files in little-endian order, left channel first, with a simple eight-byte header. The first four bytes spell out “MSU1” in ASCII. This is followed by a 32-bit unsigned integer used as the loop point, measured in samples (a sample being four bytes) – if the repeat bit is set in the audio state register, this value is used to determine where to seek the audio track to."
Edit 2: I've managed to work out the sox command I need, I just have no idea how to turn it into a batch file. Also, it turns out the files were 16-bit signed, not unsigned, PCM.
sox -t raw -e signed -b 16 -r 44100 -c 2 -L [filename].pcm -t raw -L [filename].raw vol -7.5dB
I'm fine with either a .BAT I drag and drop files onto or a .BAT that just converts every .PCM file in the folder.
Help appreciated, because I don't even know where to start looking for this one...
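One thing to watch with the command above: when the file is read as headerless raw, the 8 header bytes are treated as samples too and get scaled along with the audio, which would corrupt the "MSU1" marker and the loop point. As an alternative to sox, here is a hedged Python sketch that copies the header untouched and scales only the samples. It assumes the layout described in the question (8-byte header, then 16-bit signed little-endian samples) and the usual 20*log10 amplitude convention for dB. The function names and the "attenuated" output folder are made up for illustration.

```python
import array
import sys
from pathlib import Path

HEADER_SIZE = 8  # "MSU1" plus the 32-bit loop point


def attenuate_msu1_pcm(data: bytes, gain_db: float = -7.5) -> bytes:
    """Scale the 16-bit samples by gain_db while leaving the header as-is."""
    header, body = data[:HEADER_SIZE], data[HEADER_SIZE:]
    if len(body) % 2:
        raise ValueError("sample data is not a whole number of 16-bit samples")
    samples = array.array("h")
    samples.frombytes(body)
    if sys.byteorder == "big":
        samples.byteswap()  # file data is little-endian
    factor = 10 ** (gain_db / 20)  # -7.5 dB is roughly 0.42
    scaled = array.array(
        "h",
        (max(-32768, min(32767, round(s * factor))) for s in samples),
    )
    if sys.byteorder == "big":
        scaled.byteswap()
    return header + scaled.tobytes()


def process_folder(folder: str) -> None:
    """Write an attenuated copy of every .pcm file into an 'attenuated' subfolder."""
    out_dir = Path(folder) / "attenuated"
    out_dir.mkdir(exist_ok=True)
    for path in Path(folder).glob("*.pcm"):
        out_path = out_dir / path.name
        out_path.write_bytes(attenuate_msu1_pcm(path.read_bytes()))
```

Calling process_folder(".") from the folder containing the tracks would handle all of them in one go. The loop point is measured in samples, and the sample count does not change, so it stays valid. Unlike sox, this sketch applies no dither, which should not matter much for a simple attenuation but is worth knowing.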

Extract audio from Transport Stream and preserve length

I'm using ffmpeg to extract audio from an MPEG Transport Stream file recorded by a DVB-S card. The command:
ffmpeg -i video.ts -vn audio.wav
The source file seems to be corrupted. I noticed the corruption happens from time to time, especially for videos longer than 1 hour. I've got errors like these:
[mp2 @ 0x1bb5500] Header missing
Error while decoding stream #0:1
[mpegts @ 0x17eaf40] Continuity check failed for pid 5261 expected 2 got 6
The problem is that the resulting audio.wav is shorter than the source video (40m33s and 40m59s respectively). I'm looking for a way to preserve the original length in the resulting audio file.
I tried a recent ffmpeg under Windows and avconv under Ubuntu, with MP3 and WAV as output formats. In every case I got the same result.
I couldn't find out whether this is possible with ffmpeg; however, I found ProjectX, a tool which tries to fix broken TS streams. Website: http://project-x.sourceforge.net/
With:
java -jar ProjectX.jar -demux my_video.ts
the stream is demuxed into audio and video files which are guaranteed to have the same length. I simply mux them back using ffmpeg.

Convert audio to 8-bit unsigned PCM

I have a .mp4 audio file that I want to convert to an 8-bit unsigned PCM format for an Arduino Uno using the TMRpcm library.
It also could be a .wav file. Anyways, I have tried many things to no avail. The closest I got was with Audacity using the NIST Sphere codec. I tried to do this with FFmpeg, but it only supports demuxing NIST Sphere files. How do I convert audio to this format on Mac OS X (10.10.2)?
avconv is a fork of ffmpeg, so use ffmpeg instead if you wish:
avconv -i input.mp4 -ar 8000 -acodec pcm_u8 -ac 1 output.wav
WAV is a container format for PCM audio, so if you MUST have headerless PCM, open that WAV file in a binary file editor (wxHexEditor is a nice one) and delete its header, which is the first 44 bytes in the simplest case.
The command above gives you 8000 samples per second, a bit depth of 8 bits, and mono.
Verify this using
avprobe some_video_audio_file.wav
see bit depth listing available using avconv here
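As a hedged alternative to deleting bytes by hand: a WAV header is not always exactly 44 bytes, because extra chunks can sit in front of the sample data, so it is safer to let a parser find the data. The sketch below uses Python's standard wave module; the function name wav_to_raw_pcm is made up for illustration.

```python
import io
import wave


def wav_to_raw_pcm(wav_bytes: bytes) -> bytes:
    """Return only the PCM sample data of a WAV file, whatever its header size."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.readframes(wav.getnframes())


def describe_wav(wav_bytes: bytes) -> tuple:
    """Return (channels, bytes per sample, sample rate) to check the conversion."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getnchannels(), wav.getsampwidth(), wav.getframerate()
```

For the output of the command above, describe_wav should report 1 channel, 1 byte per sample and 8000 Hz. The bytes returned by wav_to_raw_pcm(open("output.wav", "rb").read()) can then be written to a .raw file.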
I realized that I was trying to convert a corrupt audio file. Audacity converted a valid file correctly.
