How to merge several audio files using the Libav API?

I am implementing a new feature of my software using the Libav API. The requirement is to merge a list of audio files (MP3 and WAV) into a single audio file (MP3) as output. Note: the challenge is not about concatenating files, but mixing them. When the output is played, all the input audio content must sound at the same time, as when you merge several tracks in a video editor.
I have been reading about Libav audio streams, and my guess is that the requirement is related to the "channels" concept, that is, that it is possible to include several audio sources in one stream, using one channel per source or something like that. I was hoping to find more information about this topic, but the FFmpeg/Libav documentation is scarce.
Right now, I can successfully add several audio streams to a video stream and create a playable MP4 file. My problem is that players like MPlayer/VLC only play the first audio stream with the video; the other two audio streams are ignored.
I looked at the examples included in the FFmpeg source code, but none of them addresses my requirement specifically, so I would appreciate any source code reference or algorithm explanation about how to merge several audio files into one using libav. Thanks.
The ffmpeg command to merge several audio files requires the "amix" filter, as in this example:
ffmpeg -i 1.mp3 -i 2.mp3 -i 3.mp3 -filter_complex amix=inputs=3:duration=first result.mp3
The syntax of this filter is described in the FFmpeg Documentation.
Checking the FFmpeg source code, the amix implementation appears to live in the file af_amix.c.
I am not 100% sure, but the general algorithm seems to be driven by the function:
static int activate(AVFilterContext *ctx)
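To illustrate what "mixing" means at the sample level, here is a minimal sketch in Python. It is not a port of af_amix.c: it simply sums the inputs sample by sample and scales by a constant 1/N, and the function name `mix_samples` and its `duration` argument are made up for this illustration (the argument only mimics the idea behind the filter's `duration` option).

```python
def mix_samples(streams, duration="first"):
    """Mix several mono sample lists (floats in [-1.0, 1.0]) into one list.

    Illustrative only: every input is scaled by 1/len(streams) and summed.
    Inputs that have already ended contribute silence.
    """
    if not streams:
        return []
    if duration == "first":
        length = len(streams[0])
    elif duration == "longest":
        length = max(len(s) for s in streams)
    elif duration == "shortest":
        length = min(len(s) for s in streams)
    else:
        raise ValueError("duration must be 'first', 'longest' or 'shortest'")

    scale = 1.0 / len(streams)
    mixed = []
    for i in range(length):
        total = sum(s[i] for s in streams if i < len(s))
        # Clamp to the valid range in case the scaled sum still overshoots.
        mixed.append(max(-1.0, min(1.0, total * scale)))
    return mixed
```

With the Libav API the same effect is obtained by feeding decoded frames through a filter graph rather than by touching samples by hand, but the arithmetic above is the underlying idea.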

Do you know how to merge several audio files using the ffmpeg command line? It helps to first understand how to do it with the ffmpeg command and then work out how the tool achieves it. It is all about how to construct a filtergraph and pass data through it.
As for examples, check out doc/examples/filter_audio.c and doc/examples/filtering_audio.c in the FFmpeg source tree.

This C example takes two WAV audio files and merges them into a new WAV file using the ffmpeg-4.4 API. Tip: the key to the process is to use these filters: abuffer, amix and abuffersink.
Although it doesn't support MP3 as the output format, it gives you the basics needed to implement your own requirements. I hope it is handy for anyone looking for references on this specific topic.


Merging of two AAC files into a single file

I am trying to merge two different AAC audio files and an H264 video file into a single TS file using C++ code, and I have been successful. My TS file now has the following order: first a video part from the video file, then an audio part from the first audio file, then an audio part from the second audio file, then video again, and so on. When playing the resulting file, I can hear both audio files along with the video. The problem is that the resulting audio is not clear: distortions make it hard to understand, and the audio also seems slow compared to the original. Can anyone guide me in getting rid of those distortions and obtaining an exact replica of my original files?

How to create byte-range m3u8 playlist for HLS?

Apple gives an example of support for byte-range segments in m3u8 files for HLS
But I cannot figure out how to create such playlist for given .ts file.
Are there any tools for that?
ffmpeg has an -hls_flags option.
The following command generates a single .ts file that is segmented using the byte-range feature (supported from HLS version 4) in the m3u8 index file.
$ ffmpeg -i sample.mp3 -hls_time 20 -hls_flags single_file out.m3u8
Looks like
ffprobe -show_frames media.ts -print_format json
gives enough information about frames to build such playlist, although some scripting will be required to construct it.
I'll update this answer with script if I succeed with that approach.
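A minimal sketch of that scripting step in Python, assuming the JSON printed by the ffprobe command above has a top-level "frames" list whose entries carry "media_type", "key_frame", "pkt_pos" and "pkt_size" (field names can differ between ffprobe versions, so check your own output). The function name `keyframe_ranges` is made up for this illustration.

```python
import json


def keyframe_ranges(ffprobe_json):
    """Return (offset, size) pairs for video key frames found in ffprobe JSON.

    Assumes each frame entry has "media_type", "key_frame", "pkt_pos" and
    "pkt_size"; ffprobe prints some of these as strings, hence the int() calls.
    """
    data = json.loads(ffprobe_json)
    ranges = []
    for frame in data.get("frames", []):
        if frame.get("media_type") != "video":
            continue
        if int(frame.get("key_frame", 0)) != 1:
            continue
        ranges.append((int(frame["pkt_pos"]), int(frame["pkt_size"])))
    return ranges
```

The resulting pairs use the plain pkt_size value, so they correspond to the simpler of the size calculations discussed in this thread.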
Here are a couple of useful links I've found so far:
Bash scripts for generating iframe playlists - needs a bit of optimization, as it calls ffprobe multiple times
iframe-playlist-generator - project on python that can be used to generate iframe playlists from usual ones
It is not exactly what I was searching for initially, but I-Frame playlists are similar to byte-range ones and fit my task even better, so I'm going to use these two projects as a reference/starting point to create something a bit more suitable for me.
The projects use different methods to find the size of an I-Frame. The bash script just uses what ffprobe shows in pkt_size, while the python project calculates the size as the difference between packet positions and adds 188 to match the example playlists from Apple. 188 bytes is the size of an MPEG-TS packet, so the two are probably related, although I have not managed to understand how. This difference in size calculation produces different playlists, so one of them is probably incorrect in some way. VLC plays both without any problems, though, so I'm going to stick to the simpler method until it is proven incorrect.
Update 2:
I've created a ruby module that can extract I-Frame information of given .ts file with ffprobe and build both I-Frame and usual byterange m3u8 playlist (as it was requested in question) based on that information.
I've found the simple method of creating an I-Frame playlist that I mentioned before to be incorrect, so I used the method from iframe-playlist-generator. The output is almost identical to the I-Frame playlist generated by mediafilesegmenter -output-single-file -file-base output-dir/ input.ts, mentioned by Duvrai, but sometimes the size of a frame is off by 188 bytes. I could not find a pattern in these misses, so they are currently ignored.
You can use a standard segmenter such as Apple's mediafilesegmenter, check the lengths of the files, and then concatenate (with the cat program) them into a single file. From the file sizes you have all the information needed to specify the byte ranges in a playlist file.
Not as nice as just downloading a tool from the net, but it's not a very complicated algorithm.
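The algorithm described above can be sketched in Python as follows. This is only an illustration: `byterange_playlist` is a made-up name, the durations and sizes are assumed to come from the segmenter output, and the segments are assumed to have been concatenated in the same order as they are listed.

```python
import math


def byterange_playlist(media_uri, segments):
    """Build a byte-range m3u8 playlist for one concatenated media file.

    segments: list of (duration_seconds, size_bytes) tuples in file order.
    Each EXT-X-BYTERANGE entry is written as <length>@<offset>.
    """
    target = max((math.ceil(d) for d, _ in segments), default=0)
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:4",
        f"#EXT-X-TARGETDURATION:{target}",
        "#EXT-X-MEDIA-SEQUENCE:0",
    ]
    offset = 0
    for duration, size in segments:
        lines.append(f"#EXTINF:{duration:.3f},")
        lines.append(f"#EXT-X-BYTERANGE:{size}@{offset}")
        lines.append(media_uri)
        offset += size  # the next segment starts where this one ends
    lines.append("#EXT-X-ENDLIST")
    return "\n".join(lines) + "\n"
```

For example, `byterange_playlist("all.ts", [(10.0, 326744), (9.5, 326368)])` lists the second segment as `326368@326744`, that is, 326368 bytes starting right after the first segment.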
Unified Streaming also offers a tool that can do this for you:
mp4split --package-hls output-single-file -o prog_index.m3u8 input.mp4
This is part of their commercial streaming package (they offer a free trial upon request). They also provide an Amazon AWS instance with hourly fees.

Storing multiple channels with different encoding/sampling in a single WAV file

I have two RTP streams (one for each call direction) that I want to mix in a single WAV file.
The problem is that the two streams may use different codecs (and therefore different sampling frequency, encoding, etc).
Is it possible to store the two RTP streams in a WAV file using two channels (i.e. stereo)? Asked differently, is there a way to store multiple channels with different encoding, sampling frequency, etc?
The structure of a WAV file assumes that the sampling rate and bit depth are the same for all channels of the feed. The encoding applies to the entire feed (with many encodings/formats/codecs you cannot separate a channel without decoding the feed).
You will need to store feeds in separate files, or you need a file format which supports multiple audio tracks (MP4, MKV for example) though they all have their own restrictions.
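If a single stereo WAV is still wanted, both directions have to be decoded to PCM and brought to one common sampling rate first. The following Python sketch shows why: the WAV header written by the standard `wave` module has exactly one frame rate and one sample width for all channels. It assumes the two streams are already decoded to 16-bit integer samples; the helper names are made up, and the nearest-sample resampler is deliberately crude (a real implementation would use a proper resampling filter).

```python
import struct
import wave


def resample_nearest(samples, src_rate, dst_rate):
    """Crude nearest-sample rate conversion, for illustration only."""
    count = int(len(samples) * dst_rate / src_rate)
    last = len(samples) - 1
    return [samples[min(last, int(i * src_rate / dst_rate))] for i in range(count)]


def write_stereo_wav(fileobj, left, right, rate):
    """Write two mono 16-bit sample lists as one stereo WAV.

    The header stores a single rate and sample width shared by both channels.
    The shorter channel is padded with silence.
    """
    length = max(len(left), len(right))
    left = list(left) + [0] * (length - len(left))
    right = list(right) + [0] * (length - len(right))
    frames = bytearray()
    for l_sample, r_sample in zip(left, right):
        frames += struct.pack("<hh", l_sample, r_sample)  # interleave L, R
    with wave.open(fileobj, "wb") as wav:
        wav.setnchannels(2)
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(rate)
        wav.writeframes(bytes(frames))
```

A typical use would be to resample the lower-rate direction to the higher rate with `resample_nearest` and then pass both lists to `write_stereo_wav`.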
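As Roman R. mentioned, this is not straightforward. You need an intermediate step that converts whatever you receive on the RTP streams into a proper WAV file. The idea is to use a tool such as ffmpeg:
2 × mono → stereo: ffmpeg -i left.mp3 -i right.mp3 -filter_complex "[0:a][1:a]amerge=inputs=2[a]" -map "[a]" output.wav
After that you could try something along these lines (untested):
ffmpeg -i rtp://leftrtp -i rtp://rightrtp -filter_complex "[0:a][1:a]amerge=inputs=2[a]" -map "[a]" output.wav
Note that passing two inputs with only -ac 2 is not enough: without a filter or explicit mapping, ffmpeg picks a single audio stream for the output and ignores the other input. Most likely you will need to tune the codec settings to make it work as you want. You can search around for more information on the subject or read the ffmpeg documentation.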

ffmpeg - Can I draw an audio channel as an image?

I'm wondering if it's possible to draw an audio channel of a video or audio file as an image using ffmpeg, or if there's another tool that would do it on Win2k8 x64. I'm doing this as part of an encoding process after a user uploads a video or audio file.
I'm using ColdFusion 10 to handle the upload and calling cfexecute to run ffmpeg.
I need the image to look something like this (without the horizontal lines):
You can do this programmatically very easily.
Study the basics of FFmpeg. I suggest you to compile this sample. It explains how to open a video/audio, identify the streams and loop over the packets.
Once you have a data packet (in this case you are interested only in the audio packets), you decode it (line 87 of this document) and obtain the raw audio data. It's the waveform itself (the audio analogue of a "bitmap").
You could also study this sample. This second example is how to write a video/audio file. You don't want to write any video, but with this sample you can easily understand how the audio raw data packet works, if you see the functions get_audio_frame() and write_audio_frame().
You need to have some knowledge about creating a bitmap. Any platform has an easy way to do that.
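As a rough sketch of the drawing step, assuming the audio has already been decoded to a list of float samples in [-1.0, 1.0] for one channel: reduce the samples to one (min, max) pair per pixel column, then map those values to pixel rows. The function names below are made up for this illustration, and writing the actual image file is left to whatever bitmap library your platform provides.

```python
def waveform_columns(samples, width):
    """Reduce samples to one (min, max) pair per pixel column."""
    if width <= 0 or not samples:
        return []
    total = len(samples)
    columns = []
    for x in range(width):
        start = x * total // width
        end = max(start + 1, (x + 1) * total // width)  # never an empty slice
        chunk = samples[start:end]
        columns.append((min(chunk), max(chunk)))
    return columns


def column_to_rows(column, height):
    """Map a (min, max) pair to (top_row, bottom_row); row 0 is the top."""
    low, high = column

    def to_row(value):
        value = max(-1.0, min(1.0, value))
        return int(round((1.0 - value) / 2.0 * (height - 1)))

    return to_row(high), to_row(low)
```

For each column x you would then draw a vertical line from top_row to bottom_row, which yields the familiar waveform shape.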
So, the answer for you: YES, IT IS POSSIBLE TO DO THIS WITH FFMPEG! But you have to code a little bit in order to get what you want...
Sorry, there are ALSO built-in features for this. You could use these filters:
showspectrum, showwaves, avectorscope
Here are some examples of how to use them: FFmpeg Filters - 12.22 showwaves.

Multiple audio streams in a MPEG-4 file

The MPEG-4 file format allows multiple streams to be present in a file.
This is useful for videos containing audio in multiple languages. In the case of such a video, the audio streams are synchronized to the video.
Is it possible to create an MPEG-4 file that contains desynchronized audio streams, i.e. the audio tracks are played one after another?
I want to design a MPEG-4 file that contains a music album, so it is crucial that the tracks are played one after another by media players such as VLC.
When I use MP4Box (from the GPAC framework) the resulting file is recognised by VLC as having synchronized audio streams. Which box of the MPEG-4 file format is responsible for this? Or how can I tell VLC that these audio streams are not synchronized?
Thanks in advance!
I can think of two ways you could do that, and both would be somewhat problematic.
You could concatenate all the audio streams into one audio track in the MP4 file. This won't be ideal, for some obvious reasons. For one thing, it's not exactly what you were asking for.
You could also just store the tracks as synchronized audio streams, but set the timing information in such a way that the first sample of the second track won't start playing until the first track finished playing, etc.
I'm not aware of any tools that can do this, but the file format will support such a scheme. Since it's an unusual way to store audio in an MP4 file, I would expect players to have problems with this, too.
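Either approach needs the point in time at which each track begins: as the start offset of each audio stream in the second scheme, or as a marker inside the concatenated track in the first. A small Python sketch of that arithmetic, assuming the track durations are known in milliseconds (the function names are made up, and this does not write any MP4 structures):

```python
def track_start_times(durations_ms):
    """Return the start offset (ms) of each track when played back to back."""
    starts = []
    offset = 0
    for duration in durations_ms:
        starts.append(offset)
        offset += duration
    return starts


def format_timestamp(ms):
    """Format milliseconds as HH:MM:SS.mmm."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"
```

For three tracks of 185000, 242500 and 60000 ms, the third track starts at 427500 ms, which formats as 00:07:07.500.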
Concatenating all streams would work and the individual tracks can be addressed by adding chapters. It works at least with VLC.
MP4Box -new -cat track1.m4a -cat track2.m4a -chap chapters.txt album.m4a
The chapters.txt would look something like this:
But this is only a hack.
The solution I'm looking for should preserve the tracks as individual streams.
