Why is FFmpeg framemd5 giving different audio sizes?

I am using FFmpeg framemd5 to verify that when I rewrap a Sony XDCAM "MP4" file to an MXF file I am not re-encoding the audio-video data. The "MP4" has a stereo PCM audio stream, which has to be split into two mono streams for the MXF container. The video is 25 fps and the audio sample rate is 48000 Hz. (I know that the MP4 container specification does not allow PCM as an audio stream; however, this is Sony's special non-standard MP4, which fortunately FFmpeg will still read.)
The first few lines of framemd5 output for my original (MP4) are as follows:
0, 0, 0, 1, 3110400, 1851d2848eeef6636ea5ff1caa0c3555
1, 0, 0, 1024, 4096, eb35a0242f1b59d64dc340913d4ba757
1, 1024, 1024, 1024, 4096, 37c3a63ff6af92890056e42d8146275a
The first few lines output for the MXF are as follows:
0, 0, 0, 1, 3110400, 1851d2848eeef6636ea5ff1caa0c3555
1, 0, 0, 1920, 3840, a01565b99da62249d86200070eff2729
0, 1, 1, 1, 3110400, eb46f1690b2f8e3f32d07cf8ccefcdf4
In the MXF output the "duration" for the audio stream is 1920 (which makes sense, since 48000 / 25 = 1920), and the "size" is 3840 (which also makes sense, because 48000 * 16 / 8 / 25 = 3840).
Can somebody explain why the MP4 file has duration = 1024 and size = 4096?

a stereo PCM audio stream, which has to be split into two mono streams for the MXF container
If you're doing this, you're transcoding the audio. But since the target codec is PCM, and stream parameters are presumably unchanged, audio fidelity is preserved.
As to your main query: MP4s typically contain AAC audio, where each frame contains 1024 samples, and your PCM has been stored with the same 1024-sample framing (1024 samples * 2 channels * 2 bytes = 4096 bytes, which matches the size column). PCM is uncoded audio and so can be encapsulated into frames of arbitrary sizes.
Add -af asetnsamples=1024 when checking the MXF to replicate the MP4 framing.
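For example (file names hypothetical), the MXF check would then look something like:
ffmpeg -i rewrapped.mxf -af asetnsamples=1024 -f framemd5 -
which should make the audio lines come out in 1024-sample, 4096-byte chunks, directly comparable with the MP4 output.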

Related

How to ensure that FFmpeg libraries use / do not use the GPU

My library (Linux, Debian) uses the FFmpeg libraries (avformat, avcodec, swscale, etc.) for reading video streams from network cameras. I need to capture each video frame from a network camera, decode it, scale it, and store it in memory; another thread passes this data to the calling program for display.
The problem is that all of this runs on the CPU and takes a huge amount of CPU resources. How can I force the GPU accelerator to be used for processing?
I have this video card: VGA compatible controller: Intel Corporation HD Graphics 620 (rev 02)
My decode thread looks like this (I've omitted declarations, error handling, etc.):
fmt = avformat_alloc_context();
// initialising, setting options via av_dict_set
// finding the video stream index
***
// finding the decoder and allocating its context
frame = av_frame_alloc();
while (av_read_frame(ctx->fmt, &pkt) >= 0)
{
    AVPacket orig_pkt = pkt;
    avcodec_send_packet(ctx->dec_ctx, &pkt);
    avcodec_receive_frame(ctx->dec_ctx, frame);
    ***
    // get a buffer allocated to store the frame data
    buff = get_free_buffer(ctx);
    sws_scale(ctx->sws, (const uint8_t * const *)frame->data,
              frame->linesize, 0, ctx->dec_ctx->height, buff->data,
              buff->linesize);
    do {
        ret = decode_packet(ctx, frame, &pkt, &got_frame);
        if (ret < 0)
            break;
        pkt.data += ret;
        pkt.size -= ret;
    } while (pkt.size > 0);
    av_packet_unref(&orig_pkt);
}
You can find hardware-accelerated FFmpeg transcoding commands on the internet; I am using:
ffmpeg -vaapi_device /dev/dri/renderD128 -i "inputfile" -vf format=nv12,hwupload -c:v h264_vaapi -f mp4 -qp 18 -map 0 "outputfile.mp4"
You can list the available hardware accelerators with ffmpeg -hwaccels, find the DRI device path with ls /dev/dri/, and find the video encoder (h264_vaapi in the example above) with ffmpeg -encoders. The -f mp4 parameter may not be necessary to define the file format; -qp sets the quality (in this case similar to the original); -map 0 uses all streams of the input file, not just the highest-quality video stream and the first/default audio and subtitle streams.
On the other hand, when I do not specify the hardware accelerator device and use the default libx264 encoder, I can see the CPU is maxed out, so no hardware acceleration is likely being used.
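Since the question asks about the libav* libraries rather than the CLI: the usual route there is to attach a hardware device context to the decoder before avcodec_open2(). A minimal sketch, assuming a VAAPI-enabled FFmpeg build and the render node from ls /dev/dri/ (enable_vaapi is a hypothetical helper name):
#include <libavcodec/avcodec.h>
#include <libavutil/hwcontext.h>

/* Attach a VAAPI device to a decoder context; returns 0 on success,
   a negative AVERROR on failure. */
static int enable_vaapi(AVCodecContext *dec_ctx, const char *dri_path)
{
    AVBufferRef *hw_dev = NULL;
    int err = av_hwdevice_ctx_create(&hw_dev, AV_HWDEVICE_TYPE_VAAPI,
                                     dri_path /* e.g. "/dev/dri/renderD128" */,
                                     NULL, 0);
    if (err < 0)
        return err;            /* no VAAPI available: keep decoding on the CPU */
    dec_ctx->hw_device_ctx = av_buffer_ref(hw_dev);
    av_buffer_unref(&hw_dev);
    return 0;
}
Decoded frames then arrive as AV_PIX_FMT_VAAPI surfaces in GPU memory, so before the sws_scale() call in the question you would copy each one back to a software frame with av_hwframe_transfer_data().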

FFmpeg Decklink playout audio PCM

I'm working on an FFmpeg playout application for Decklink, but I'm facing some audio issues. I've seen other questions about this topic but none of them are currently helping.
I've tried Reuben's code (https://stackoverflow.com/a/15372417/12610231) with swr_convert for playing out ffmpeg/libav frames to a Decklink board (this needs to be 16-bit interleaved PCM), but the audio sounds wrong. It sounds like it's missing samples / only getting half of the required samples.
When I record the samples to a raw audio file and play it back with Audacity, the timeline is half the length of the actual recording and the samples play at double speed.
I also tried the 'manual' conversion (https://stackoverflow.com/a/15372417/12610231) but unfortunately it did not give the result I was hoping for.
Here are some snippets of my code:
swr_ctx = swr_alloc();
av_opt_set_int(swr_ctx, "in_channel_count", pAudioCodecCtx->channels, 0);
av_opt_set_int(swr_ctx, "in_sample_rate", pAudioCodecCtx->sample_rate, 0);
av_opt_set_int(swr_ctx, "in_channel_layout", pAudioCodecCtx->channel_layout, 0);
av_opt_set_sample_fmt(swr_ctx, "in_sample_fmt", pAudioCodecCtx->sample_fmt, 0);
av_opt_set_int(swr_ctx, "out_channel_count", 2, 0);
av_opt_set_int(swr_ctx, "out_sample_rate", 48000, 0);
av_opt_set_int(swr_ctx, "out_channel_layout", AV_CH_LAYOUT_STEREO, 0);
av_opt_set_sample_fmt(swr_ctx, "out_sample_fmt", AV_SAMPLE_FMT_S16, 0);
if (swr_init(swr_ctx))
{
printf("Error SWR");
}
///
ret = avcodec_decode_audio4(pAudioCodecCtx, pFrame, &frameFinished, &packet);
if (ret < 0) {
printf("Error in decoding audio frame.\n");
}
swr_convert(swr_ctx, (uint8_t **)&m_audioBuffer, pFrame->nb_samples, (const uint8_t **)pFrame->extended_data, pFrame->nb_samples);
It also looks like FFmpeg gives me one video packet and two audio packets; I'm not sure what to do with the second audio packet. I already tried combining the first and second audio packets, without any good result on the audio side.
Any help is appreciated.

Overwriting TS Stream File with FFmpeg in Linux

I'm trying to convert an RTMP stream to an m3u8 stream, using FFmpeg. There is no problem with converting and downloading. However, it writes lots of .ts files, such as channel0000.ts, channel0001.ts, channel0002.ts; one .ts file is created every 10 seconds. At this point, I want a single .ts file. In other words, I need overwriting, because I don't want to store the whole stream, just the last 10 seconds. When I try to write to the same file I get this error:
Invalid segment filename template 'channel.ts'
Could not write header for output file #0 (incorrect codec parameters ?): Invalid argumentStream mapping:
Stream #0:0 -> #0:0 (copy)
Stream #0:1 -> #0:1 (copy)
Last message repeated 1 times
Here is my FFmpeg command:
ffmpeg -loglevel quiet -i rtmp://example -c:v libx264 -profile:v baseline -level 3.1 -c:a aac -strict experimental -f mpegts - | ffmpeg -i - -c copy -map 0 -f segment -segment_list channel.m3u8 -segment_format mpegts -segment_time 10 channel%04d.ts
Any suggestions?
In the FFmpeg documentation I found the "segment_wrap" option. When you add this option, files are written in a loop. In my case I added "-segment_wrap 1" to the command and it writes just a single file now.
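With the command above, the second ffmpeg invocation would then become something like:
ffmpeg -i - -c copy -map 0 -f segment -segment_list channel.m3u8 -segment_format mpegts -segment_time 10 -segment_wrap 1 channel%04d.ts
so the segment muxer keeps overwriting channel0000.ts instead of accumulating new files.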
I had a similar issue; my solution was to pipe out the entire result of the HLS muxer except for the playlist, using the -hls_segment_filename pipe:1 option (I'm therefore only piping out the .ts files and not the .m3u8 ones).
On my pipe out, I stuck a piece of code I wrote that detects the header of a .ts file, which looks something like the bytes [71, 64, 17, 1X, 0, 66, 240, 34, 0, 1, 193, 0, 0, 255, 1, 255, 0, 1, 252, 128, 17, 72] (I recommend GHex to inspect your .ts files if you want more of the header to discriminate on). I then made a piece of code that cuts along this line (warning: the fourth byte here can change its value, but it is the only one that does), and recomposes the file.
The only thing left to do for your application, I think, is to use a queue for the files' contents. If you want ten seconds and your files are about 1 s each, you can use a queue of length 10.
My code was in Go (because my whole project was), so here's a bit of Go that might help:
type tsFile struct {
	contents []byte
	finished bool
	child    *tsFile
}

func in(a int, array []int) bool {
	for _, b := range array {
		if b == a {
			return true
		}
	}
	return false
}

func CutAdd(curFile *tsFile, pipedOut []byte) {
	header := []byte{}        // insert here the header you want
	unNeededIndex := []int{3} // insert here the byte indexes that change between files
	cur_header_pointer := 0
	header_cache := make([]byte, 0, len(header))
	for i, b := range pipedOut {
		if header[cur_header_pointer] == b || in(cur_header_pointer, unNeededIndex) {
			header_cache = append(header_cache, b)
			cur_header_pointer++
			if cur_header_pointer == len(header) {
				curFile.finished = true
				curFile.child = &tsFile{contents: header_cache}
				CutAdd(curFile.child, pipedOut[i+1:]) // continue after the matched header
				return
			}
		} else {
			if cur_header_pointer != 0 {
				// we store what we thought was a header
				curFile.contents = append(curFile.contents, header_cache...)
				header_cache = header_cache[:0]
				cur_header_pointer = 0
			}
			curFile.contents = append(curFile.contents, b) // we store the byte
		}
	}
}
It is a bit janky, but it works for me (there might also be mistakes, as I didn't paste the exact code I used back then, but you should get a rough idea of what I mean).

How to lower the quality and specs of a wav file on Linux

So to preface my problem, I'll give some context.
In SDL2 you can load wav files, as in this example from the wiki:
SDL_AudioSpec wav_spec;
Uint32 wav_length;
Uint8 *wav_buffer;
/* Load the WAV */
if (SDL_LoadWAV("test.wav", &wav_spec, &wav_buffer, &wav_length) == NULL) {
fprintf(stderr, "Could not open test.wav: %s\n", SDL_GetError());
} else {
/* Do stuff with the WAV data, and then... */
SDL_FreeWAV(wav_buffer);
}
The error I'm getting from SDL_GetError is Complex WAVE files not supported
Now the wav file I'm intending to open has the following properties:
Playing test.wav.
Detected file format: WAV / WAVE (Waveform Audio) (libavformat)
ID_AUDIO_ID=0
[lavf] stream 0: audio (pcm_s24le), -aid 0
Clip info:
encoded_by: Pro Tools
ID_CLIP_INFO_NAME0=encoded_by
ID_CLIP_INFO_VALUE0=Pro Tools
originator_reference:
ID_CLIP_INFO_NAME1=originator_reference
ID_CLIP_INFO_VALUE1=
date: 2016-05-1
ID_CLIP_INFO_NAME2=date
ID_CLIP_INFO_VALUE2=2016-05-1
creation_time: 20:13:34
ID_CLIP_INFO_NAME3=creation_time
ID_CLIP_INFO_VALUE3=20:13:34
time_reference:
ID_CLIP_INFO_NAME4=time_reference
ID_CLIP_INFO_VALUE4=
ID_CLIP_INFO_N=5
Load subtitles in dir/
ID_FILENAME=dir/test.wav
ID_DEMUXER=lavfpref
ID_AUDIO_FORMAT=1
ID_AUDIO_BITRATE=2304000
ID_AUDIO_RATE=48000
ID_AUDIO_NCH=2
ID_START_TIME=0.00
ID_LENGTH=135.53
ID_SEEKABLE=1
ID_CHAPTERS=0
Selected audio codec: Uncompressed PCM [pcm]
AUDIO: 48000 Hz, 2 ch, s24le, 2304.0 kbit/100.00% (ratio: 288000->288000)
ID_AUDIO_BITRATE=2304000
ID_AUDIO_RATE=48000
ID_AUDIO_NCH=2
AO: [pulse] 48000Hz 2ch s16le (2 bytes per sample)
ID_AUDIO_CODEC=pcm
From the wiki.libsdl.org/SDL_OpenAudioDevice page and the subsequent wiki.libsdl.org/SDL_AudioSpec#Remarks page, I can at least surmise that a wav file with the following spec should work:
freq = 48000;
format = AUDIO_F32;
channels = 2;
samples = 4096;
The main problem I can see is that my wav file has the s24le format, which is not listed on the SDL_AudioSpec page.
This leads me to believe I need to reduce the quality of test.wav so it does not appear as "complex" to SDL.
When I search for Complex WAVE files not supported, nothing helpful comes up, except that it appears in the SDL_mixer library, which as far as I know I'm not using.
Can the format be changed via ffmpeg to work in SDL2?
Edit: This appears to be the actual code in SDL2 where it complains. I don't really know enough about C to dig all the way through the vast SDL2 library, but I thought it might help if someone notices something just from hints like the variable names:
/* Read the audio data format chunk */
chunk.data = NULL;
do {
if ( chunk.data != NULL ) {
SDL_free(chunk.data);
chunk.data = NULL;
}
lenread = ReadChunk(src, &chunk);
if ( lenread < 0 ) {
was_error = 1;
goto done;
}
/* 2 Uint32's for chunk header+len, plus the lenread */
headerDiff += lenread + 2 * sizeof(Uint32);
} while ( (chunk.magic == FACT) || (chunk.magic == LIST) );
/* Decode the audio data format */
format = (WaveFMT *)chunk.data;
if ( chunk.magic != FMT ) {
SDL_SetError("Complex WAVE files not supported");
was_error = 1;
goto done;
}
After a couple of hours of fun audio converting I got it working; I'll have to tweak it to try to get better sound quality.
To answer the question at hand, converting can be done by:
ffmpeg -i old.wav -acodec pcm_s16le -ac 1 -ar 16000 new.wav
To find codecs on your version of ffmpeg:
ffmpeg -codecs
This format works with SDL.
Next within SDL when setting the desired SDL_AudioSpec make sure to have the correct settings:
freq = 16000;
format = AUDIO_S16LSB;
channels = 1; /* to match the -ac 1 in the conversion above */
samples = 4096;
Finally, the main issue was most likely using the legacy SDL_MixAudio instead of the newer SDL_MixAudioFormat.
With the following settings:
SDL_MixAudioFormat(stream, mixData, AUDIO_S16LSB, len, SDL_MIX_MAXVOLUME / 2);
as can be found on the wiki.
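For reference, a minimal sketch of opening a device that matches the converted file (my_callback is a hypothetical callback that would do the SDL_MixAudioFormat call above):
#include <SDL2/SDL.h>

SDL_AudioSpec want;
SDL_zero(want);
want.freq = 16000;           /* matches -ar 16000 */
want.format = AUDIO_S16LSB;  /* matches pcm_s16le */
want.channels = 1;           /* matches -ac 1 */
want.samples = 4096;
want.callback = my_callback; /* hypothetical: mixes audio into the stream */
SDL_AudioDeviceID dev = SDL_OpenAudioDevice(NULL, 0, &want, NULL, 0);
if (dev == 0)
    SDL_Log("SDL_OpenAudioDevice failed: %s", SDL_GetError());
else
    SDL_PauseAudioDevice(dev, 0); /* unpause to start playback */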

FFmpeg playing audio slowly after conversion from AAC

I'm attempting to convert an AAC audio stream for playback. I've discovered that I need to convert from AV_SAMPLE_FMT_FLTP to AV_SAMPLE_FMT_S16, but when I do so the audio plays back at about half speed.
swr = swr_alloc();
assert(av_opt_set_int(swr, "in_channel_layout", audioContext->channel_layout, 0) == 0);
assert(av_opt_set_int(swr, "out_channel_layout", audioContext->channel_layout, 0) == 0);
assert(av_opt_set_int(swr, "in_sample_rate", audioContext->sample_rate, 0) == 0);
assert(av_opt_set_int(swr, "out_sample_rate", 44100, 0) == 0);
assert(av_opt_set_int(swr, "in_sample_fmt", audioContext->sample_fmt, 0) == 0);
assert(av_opt_set_int(swr, "out_sample_fmt", AV_SAMPLE_FMT_S16, 0) == 0);
swr_init(swr);
That is my conversion code. The input sample rate is 44100 and the audio is stereo.
I call the code with:
swr_convert(swr, &output, aDecodedFrame->nb_samples, (const uint8_t **)aDecodedFrame->extended_data, aDecodedFrame->nb_samples)
You didn't show the actual audio encoding code, so I'd speculate there's a chance you might not be handling the resampling properly. Note that you get back half as much data from the resampling operation (i.e. if you pass in 80 bytes, you'll read 40 bytes from the resampler).
You may take a look at my video writing code, and strip off the audio encoding part. It is here: http://sourceforge.net/p/karlyriceditor/code/HEAD/tree/src/ffmpegvideoencoder.cpp
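In other words, size the output by what swr_convert() actually returns rather than by the input sample count. A minimal sketch (out_channels and max_out_samples are assumptions for illustration):
/* swr_convert() returns the number of samples written per channel */
int out_samples = swr_convert(swr, &output, max_out_samples,
                              (const uint8_t **)aDecodedFrame->extended_data,
                              aDecodedFrame->nb_samples);
if (out_samples >= 0) {
    /* bytes to hand to the playback device: samples * channels * 2 for S16 */
    int out_bytes = av_samples_get_buffer_size(NULL, out_channels,
                                               out_samples,
                                               AV_SAMPLE_FMT_S16, 1);
    /* queue out_bytes from `output`, not a size derived from the FLTP input */
}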
