I have a video file, and I dumped its stream info to a txt file with ffmpeg nearly 3 years ago.
...
Stream #0:1[0x1c0]: Audio: mp2, 48000 Hz, stereo, s16, 256 kb/s
Stream #0:2[0x1c1]: Audio: mp2, 48000 Hz, stereo, s16, 256 kb/s
But I found the format had changed when I used an updated ffprobe (ffprobe version N-78046-g46f67f4 Copyright (c) 2007-2016 the FFmpeg developers).
...
Stream #0:1[0x1c0]: Audio: mp2, 48000 Hz, stereo, s16p, 256 kb/s
Stream #0:2[0x1c1]: Audio: mp2, 48000 Hz, stereo, s16p, 256 kb/s
For the same video, the reported sample format changed to s16p.
I implemented a simple video player that uses ffmpeg. It played this video fine 3 years ago, but after switching to the updated ffmpeg it no longer outputs a correct PCM stream. I spent a lot of time on it and finally found that the audio should have been s16 instead of s16p. The decoded audio stream works after I add this line before calling avcodec_decode_audio4:
audio_codec_ctx->sample_fmt = AV_SAMPLE_FMT_S16;
but it is just a hack. Has anyone else encountered this issue? How can I make ffmpeg work correctly? Any hint is appreciated. Thanks!
The output format changed. The reason for this is fairly convoluted and technical, but let me try explaining it anyway.
Most audio codecs are structured such that the output of each channel is best reconstructed individually, and the merging of channels (interleaving of a "left" and "right" buffer into an array of samples ordered left0 right0 left1 right1 [etc]) happens at the very end. You can probably imagine that if the encoder wants to deinterleave again, then transcoding of audio involves two redundant operations (interleaving/deinterleaving). Therefore, all decoders where it makes sense were switched to output planar audio (so s16 changed to s16p, where p means planar), where each channel is its own buffer.
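A toy C sketch (plain C, no ffmpeg calls; the function name is mine) may make the two layouts concrete - it performs the interleaving step that decoders used to do internally:

```c
#include <stdint.h>

/* Merge per-channel "planar" buffers (s16p layout: one buffer per channel)
 * into a single interleaved buffer (s16 layout: L0 R0 L1 R1 ...). */
void interleave_s16(const int16_t *const *planes, int nb_channels,
                    int nb_samples, int16_t *out)
{
    for (int s = 0; s < nb_samples; s++)
        for (int ch = 0; ch < nb_channels; ch++)
            out[s * nb_channels + ch] = planes[ch][s];
}
```

Decoders now stop before this step and hand you the planes directly; the interleaving, if you need it, is your (or libswresample's) job.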
So: nowadays, interleaving is done using a resampling library (libswresample) after decoding instead of as an integral part of decoding, and only if the user explicitly wants to do so, rather than automatically/always.
You can indeed set the requested sample format (the request_sample_fmt field of the codec context) to AV_SAMPLE_FMT_S16 to force decoding to s16 instead of s16p. Consider this a compatibility hack that will at some point be removed for the few decoders for which it does work, and one that will not work at all for new decoders. Instead, consider adding libswresample support to your application to convert between whatever the native output format of the decoder is and the format you want to use for further data processing (e.g. playback through a sound card).
Related
I am trying to encode raw audio (pcm_f32le) to AAC encoded audio. One thing I've noticed is that I can accomplish this via the CLI tool:
ffmpeg -f f32le -ar 48000 -ac 2 -c:a pcm_f32le -i out.raw out.m4a -y
This plays just fine and decodes fine.
The steps I've taken:
When I use the C example code (https://ffmpeg.org/doxygen/3.4/encode_audio_8c-example.html) and switch the encoder to codec = avcodec_find_encoder(AV_CODEC_ID_AAC);
and then print the sample formats the AAC encoder supports, it only offers FLTP, which is a planar (non-interleaved) format.
This page seems to provide the various supported input formats per codec.
This is confusing because I don't think my raw captured audio is planar. I've certainly tried passing it through as-is and it doesn't work as intended.
It stays stuck indefinitely with this return code after calling avcodec_receive_packet:
AVERROR(EAGAIN): output is not available in the current state - user must try to send input
Questions:
How can I modify the example code from FFmpeg to convert pcm_f32le raw audio to AAC encoded audio?
Why is the CLI tool able to do this?
I am using libsoundio to capture raw audio from Linux's Dummy Output. I wonder how I could get a planar format to pass through to get AAC encoded audio.
If AAC is not a possibility, is MP3?
You can find here a working example of how to encode raw pcm_f32le to AAC with ffmpeg.
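The conversion the encoder needs can be pictured with a toy C sketch (plain C, no ffmpeg calls; the helper name is mine): captured pcm_f32le audio is interleaved, while the AAC encoder's FLTP format wants one plane per channel. In a real application libswresample (swr_convert) should perform this split, along with any sample-format conversion:

```c
/* Split interleaved f32 samples (pcm_f32le order: L0 R0 L1 R1 ...)
 * into per-channel planar buffers, the layout FLTP expects. */
void deinterleave_f32(const float *in, int nb_channels,
                      int nb_samples, float *const *planes)
{
    for (int s = 0; s < nb_samples; s++)
        for (int ch = 0; ch < nb_channels; ch++)
            planes[ch][s] = in[s * nb_channels + ch];
}
```

Feeding interleaved data to an encoder that expects planes (or vice versa) produces exactly the kind of garbled or rejected frames described in the question.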
I have gotten a set of FLAC (audio) files from a friend. I copied them to my Sonos music library, and got set to enjoy a nice album. Unfortunately, Sonos would not play the files. As a result I have been getting to know ffmpeg.
Sonos' complaint with the FLAC files was that it was "encoded at an unsupported sample rate". With rolling eyes and shaking head, I note that the free VLC media player happily plays these files, but the product I've paid for (Sonos) - does not. But I digress...
ffprobe revealed that the FLAC files contain both an audio stream and a video stream:
$ ffprobe -hide_banner -show_streams "/path/to/Myaudio.flac"
Duration: 00:02:23.17, start: 0.000000, bitrate: 6176 kb/s
Stream #0:0: Audio: flac, 176400 Hz, stereo, s32 (24 bit)
Stream #0:1: Video: mjpeg (Progressive), yuvj444p(pc, bt470bg/unknown/unknown), 450x446 [SAR 72:72 DAR 225:223], 90k tbr, 90k tbn, 90k tbc (attached pic)
Metadata:
comment : Cover (front)
Cool! I guess this is how some audio players are able to display the 'album artwork' when they play a song? Note also that the audio stream is reported at 176400 Hz! Apparently I'm out of touch; I thought a 44.1 kHz sampling rate effectively removed all of the 'sampling artifacts' we could hear. Anyway, I learned that Sonos supports a maximum 48 kHz sampling rate, and this (the 176.4 kHz rate) is what Sonos was unhappy about. I used ffmpeg to 'dumb it down' for them:
$ ffmpeg -i "/path/to/Myaudio.flac" -sample_fmt s32 -ar 48000 "/path/to/Myaudio48K.flac"
This seemed to work - at least I got a FLAC file that Sonos would play. However, I also got what looks like a warning of some sort:
[swscaler @ 0x108e0d000] deprecated pixel format used, make sure you did set range correctly
[flac @ 0x7feefd812a00] Frame rate very high for a muxer not efficiently supporting it.
Please consider specifying a lower framerate, a different muxer or -vsync 2
A bit more research turned up this answer, which I don't quite understand, though a comment there says "not to worry" - at least wrt the swscaler part of the warning.
And that (finally) brings me to my questions:
1.a. What framerate, muxer & other specifications make a graphic compatible with a majority of programs that use the graphic?
1.b. How should I use ffmpeg to modify the Video channel to set these specifications (ref. Q 1.a.)?
2.a. How do I remove the Video channel from the .flac audio file?
2.b. How do I add a Video channel into a .flac file?
EDIT:
I asked the above (4) questions after failing to accomplish a 'direct' conversion (a single ffmpeg command) from FLAC at 176.4 kHz to ALAC (.m4a) at 48 kHz (max supported by Sonos). I reasoned that an 'incremental' approach through a series of conversions might get me there. With the advantage of hindsight, I now see I should have posted my original failed direct conversion incantation... we live and learn.
That said, the accepted answer below meets my final objective to convert a FLAC file encoded at 176.4kHz to an ALAC (.m4a) at 48kHz, and preserve the cover art/video channel.
What framerate, muxer & other specifications make a graphic compatible with a majority of programs that use the graphic?
Cover art is just a single frame, so framerate has no relevance here. However, you don't want a real video stream; it has to remain a single image, so -vsync 0 should be added. "Muxer" is simply the specific term for the container packager used in media file processing; it is determined by the choice of format, e.g. FLAC, WAV, etc. What's important is the codec for the cover art; usually it's PNG or JPEG. For FLAC, PNG is the default codec.
How do I remove the Video channel from the .flac audio file?
ffmpeg -i "/path/to/Myaudio.flac" -vn -c copy "/path/to/Myaudio48K.flac"
(All this does is skip any video in the input and copy everything else)
How do I add a Video channel into a .flac file?
To add cover art to audio-only formats like MP3, FLAC, etc., the video stream has to have a disposition of attached picture. So,
ffmpeg -i "/path/to/Myaudio.flac" -i coverimage -sample_fmt s32 -ar 48000 -disposition:v attached_pic -vsync 0 "/path/to/Myaudio48K.flac"
For direct conversion to ALAC, use
ffmpeg -i "/path/to/Myaudio.flac" -i coverimage -ar 48000 -c:a alac -disposition:v attached_pic -vsync 0 -c:v png "/path/to/Myaudio48K.m4a"
I have an mp4 file with only a single video stream (no audio) and a wav audio file that I would like to add to the video using ffmpeg. The audio and the video have been recorded simultaneously during a conference, the former from a mixer output on a PC and the latter from a digital videocamera.
I am using this ffmpeg command:
ffmpeg -i incontro3.mp4 -itsoffset 18.39 -i audio_mix.wav -c:v copy -c:a aac final-video.mp4
where I'm using the -itsoffset 18.39 option since I know that 18.39s is the video-audio delay.
The problem I'm experiencing is that in the output file, while the audio is perfectly in sync with the video at the beginning, it slowly drifts out of sync during the movie.
The output of ffprobe on the video file is:
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'incontro3.mp4':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2avc1mp41
encoder : Lavf57.25.100
Duration: 00:47:22.56, start: 0.000000, bitrate: 888 kb/s
Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 1280x720 [SAR 1:1 DAR 16:9], 886 kb/s, 25 fps, 25 tbr, 12800 tbn (default)
Metadata:
handler_name : VideoHandler
and the ffprobe output for the audio file is:
Input #0, wav, from 'audio_mix.wav':
Metadata:
track : 5
encoder : Lavf57.25.100
Duration: 00:46:32.20, bitrate: 1411 kb/s
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, 2 channels, s16, 1411 kb/s
I'm using the latest ffmpeg Zeranoe windows build git-9591ca7 (2016-05-25).
Thanks in anticipation for any help/ideas!
UPDATE 1: It looks like the problem is upstream of the video-audio merging, and could be in the concatenation and conversion of the MTS files generated by the video camera into the mp4 video. I will follow up as I make progress in understanding it...
UPDATE 2: The problem is not in the initial merging of the MTS files generated by the camera. Or, at least, it occurs identically whether I merge them with cat or with ffmpeg -f concat.
UPDATE 3: Following @Mulvya's suggestion, I observed that the drift rate is constant (at least as far as I can tell judging by eye). I also tried to superimpose the A/V tracks with another piece of software, and the drift is exactly the same, thereby ruling out ffmpeg as the culprit. My (bad) feeling is that the issue could be related to the internal clocks of the digital video camera and the laptop used for audio recording running at slightly different rates (see here the report of an identical issue I just found).
Since the drift rate is constant, you can use a combination of FFmpeg filters to retime the audio.
ffmpeg -i audio_mix.wav -af "asetrate=44100*(10/9),aresample=44100" retimed.wav
Here, 44100*(10/9) indicates the actual number of samples that represent 1 second of sound, i.e. if after 100 seconds of playback of the original WAV the audio just heard corresponds to the 90th second, then the sample consumption rate should be increased by a factor of 10/9. That would make for an unconventional sample rate, so aresample is added to resample the result back to a standard rate.
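The arithmetic behind that factor can be sketched in plain C (the helper name is mine, not an ffmpeg API):

```c
/* Corrected sample rate for asetrate: if `played_seconds` of wall-clock
 * playback yields only `heard_seconds` of intended content, the sample
 * consumption rate must grow by played/heard. */
int retimed_rate(int rate, int played_seconds, int heard_seconds)
{
    return rate * played_seconds / heard_seconds;
}
```

For the example in the answer, retimed_rate(44100, 100, 90) gives 49000, i.e. 44100*(10/9), the unconventional rate that aresample then converts back to 44100.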
I want to reduce the muxing overhead when creating .ts files using FFmpeg.
I'm using FFmpeg to create a series of transport stream files used for HTTP Live Streaming.
./ffmpeg -i myInputFile.ismv \
-vcodec copy \
-acodec copy \
-bsf h264_mp4toannexb \
-map 0 \
-f segment \
-segment_time 10 \
-segment_list_size 999999 \
-segment_list output/myVarientPlaylist.m3u8 \
-segment_format mpegts \
output/myAudioVideoFile-%04d.ts
My input is in ismv format and contains a video and audio stream:
Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 320x240, 348 kb/s, 29.97 tbr, 10000k tbn, 59.94 tbc
Stream #0:1(und): Audio: aac (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 63 kb/s
There is an issue related to muxing that causes a large amount of overhead to be added to the streams. This is how the issue was described to me for the audio:
So for a given aac stream, the overhead will be 88% (since 200 bytes will map to 2 x 188 byte packets).
For video, the I-frame packets are quite large, so they translate nicely into .ts packets; however, the difference frames can be as small as an audio packet, so they suffer from the same issue.
The solution would be to combine several AAC packets into one larger stream before packaging them into .ts. Is this possible out of the box with FFmpeg?
It is not possible. Codecs rely on the encapsulating container for framing, i.e. for signaling the start and length of each frame.
Your graphic is actually missing an element: the PES packet. Your audio frame is put into a PES packet first (which indicates its length), and then the PES packet is cut into smaller chunks, which are the TS packets.
By design you cannot start a new PES packet (containing an audio frame, in your case) in a TS packet which already contains data. A new PES packet always starts in a new TS packet. Otherwise it would be impossible to start playing mid-stream (the broadcast situation) - it would be impossible to know at which byte in the TS the new PES begins (remember, you have missed the beginning of the current PES).
There are some mitigating factors: the FF FF FF padding will probably be compressed by the networking hardware. Also, if you are using HTTP (instead of UDP or RTP), gzip compression can be enabled (but I doubt it would help much).
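Ignoring the PES header bytes for simplicity, the packetization rule above can be sketched in plain C (helper names are mine) and reproduces the 88% figure quoted for a 200-byte frame:

```c
/* Each TS packet is 188 bytes with a 4-byte header, leaving 184 payload
 * bytes, and a PES packet must always start in a fresh TS packet, so the
 * last TS packet of each PES is padded out. */
int ts_packets_for_pes(int pes_bytes)
{
    return (pes_bytes + 183) / 184; /* ceil(pes_bytes / 184) */
}

/* Fractional overhead: TS bytes on the wire vs. useful payload bytes. */
double mux_overhead(int payload_bytes)
{
    int ts_bytes = ts_packets_for_pes(payload_bytes) * 188;
    return (double)(ts_bytes - payload_bytes) / payload_bytes;
}
```

A 200-byte AAC frame needs 2 TS packets (376 bytes), i.e. 88% overhead, which is why small audio frames are so costly in this container.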
I've fixed the worst problem of syncing the TS output on each frame in http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=75c8d7c2b4972f6ba2cef605949f57322f7c0361 - please try a version after that.
I have a project where I need to restream a live stream that carries two languages in one audio track:
Spanish on the left channel and English on the right.
The stream mapping is:
Stream #0:0: Video: h264 ([7][0][0][0] / 0x0007), yuv420p, 512x288 [SAR 1:1 DAR 16:9], q=2-31, 1k tbn, 1k tbc
Stream #0:1: Audio: mp3 ([2][0][0][0] / 0x0002), 44100 Hz, stereo, s16, 18 kb/s
I need to restream this back live with just the English from the right channel, or just the Spanish from the left channel. I tried looking everywhere but did not find any solution.
Since this needs to be done live, I can't be using other programs to separate video and audio to get it done.
This needs to be done through ffmpeg, and I wonder if it is even capable of doing so with the original build, or whether it would need some custom modification.
You can use the -map_channel option or the pan filter. Unfortunately you didn't specify whether you want stereo or mono output. If stereo, you can simply mute a channel, or duplicate one channel into both the left and right channels of the output. Here are some examples, assuming you want to keep a stereo output.
To copy the right channel of the input to the left and right channels of the output:
ffmpeg -i input -map_channel 0.1.1 -map_channel 0.1.1 output
To mute the left channel:
ffmpeg -i input -map_channel -1 -map_channel 0.1.1 output
To mute the left channel using pan:
ffmpeg -i input -filter:a pan="stereo:c1=c1" output
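At the sample level, muting a channel amounts to zeroing its samples in the interleaved buffer. Here is a toy plain-C sketch (not an ffmpeg API) of what the mute-left examples achieve:

```c
#include <stdint.h>

/* In interleaved s16 stereo (L0 R0 L1 R1 ...), zero every left sample
 * to mute the left (Spanish) channel, keeping the right (English) one. */
void mute_left(int16_t *samples, int nb_frames)
{
    for (int i = 0; i < nb_frames; i++)
        samples[2 * i] = 0;
}
```

ffmpeg's pan filter does the same work per frame inside the filter graph, so no external program is needed for a live restream.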
FFmpeg usage questions are better suited for superuser.com since stackoverflow is programming specific.