I have gotten a set of FLAC (audio) files from a friend. I copied them to my Sonos music library, and got set to enjoy a nice album. Unfortunately, Sonos would not play the files. As a result I have been getting to know ffmpeg.
Sonos' complaint about the FLAC files was that they were "encoded at an unsupported sample rate". With rolling eyes and shaking head, I note that the free VLC media player happily plays these files, but the product I've paid for (Sonos) does not. But I digress...
ffprobe revealed that the FLAC files contain both an Audio channel and a Video channel:
$ ffprobe -hide_banner -show_streams "/path/to/Myaudio.flac"
Duration: 00:02:23.17, start: 0.000000, bitrate: 6176 kb/s
Stream #0:0: Audio: flac, 176400 Hz, stereo, s32 (24 bit)
Stream #0:1: Video: mjpeg (Progressive), yuvj444p(pc, bt470bg/unknown/unknown), 450x446 [SAR 72:72 DAR 225:223], 90k tbr, 90k tbn, 90k tbc (attached pic)
Metadata:
comment : Cover (front)
Cool! I guess this is how some audio players are able to display the 'album artwork' when they play a song? Note also that the Audio stream is reported at 176400 Hz! Apparently I'm out of touch; I thought that a 44.1 kHz sampling rate effectively removed all of the 'sampling artifacts' we could hear. Anyway, I learned that Sonos supports a maximum sampling rate of 48 kHz, and this (the 176.4 kHz rate) is what Sonos was unhappy about. I used ffmpeg to 'dumb it down' for them:
$ ffmpeg -i "/path/to/Myaudio.flac" -sample_fmt s32 -ar 48000 "/path/to/Myaudio48K.flac"
This seemed to work - at least I got a FLAC file that Sonos would play. However, I also got what looks like a warning of some sort:
[swscaler @ 0x108e0d000] deprecated pixel format used, make sure you did set range correctly
[flac @ 0x7feefd812a00] Frame rate very high for a muxer not efficiently supporting it.
Please consider specifying a lower framerate, a different muxer or -vsync 2
A bit more research turned up this answer, which I don't quite understand, and a comment on it that says "not to worry" - at least with respect to the swscaler part of the warning.
And that (finally) brings me to my questions:
1.a. What framerate, muxer & other specifications make a graphic compatible with a majority of programs that use the graphic?
1.b. How should I use ffmpeg to modify the Video channel to set these specifications (ref. Q 1.a.)?
2.a. How do I remove the Video channel from the .flac audio file?
2.b. How do I add a Video channel into a .flac file?
EDIT:
I asked the above (4) questions after failing to accomplish a 'direct' conversion (a single ffmpeg command) from FLAC at 176.4 kHz to ALAC (.m4a) at 48 kHz (max supported by Sonos). I reasoned that an 'incremental' approach through a series of conversions might get me there. With the advantage of hindsight, I now see I should have posted my original failed direct conversion incantation... we live and learn.
That said, the accepted answer below meets my final objective to convert a FLAC file encoded at 176.4kHz to an ALAC (.m4a) at 48kHz, and preserve the cover art/video channel.
What framerate, muxer & other specifications make a graphic compatible with a majority of programs that use the graphic?
Cover art is just a single frame, so framerate has no relevance in this case. However, you don't want a real video stream; it has to remain a single image, so -vsync 0 should be added. 'Muxer' is simply the specific term for the packager as used in media file processing; it is decided by the choice of format, e.g. FLAC, WAV, etc. What's important is the codec for the cover art; usually it's PNG or JPEG. For FLAC, PNG is the default codec.
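If you want to confirm which codec a file's embedded art actually uses, an ffprobe invocation along these lines (path hypothetical) prints just the cover stream's codec name:
ffprobe -hide_banner -select_streams v:0 -show_entries stream=codec_name "/path/to/Myaudio.flac"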
How do I remove the Video channel from the .flac audio file?
ffmpeg -i "/path/to/Myaudio.flac" -vn -c copy "/path/to/Myaudio48K.flac"
(All this does is skip any video in the input and copy everything else)
How do I add a Video channel into a .flac file?
To add cover art to audio-only formats like MP3, FLAC, etc., the video stream has to have a disposition of attached picture. So,
ffmpeg -i "/path/to/Myaudio.flac" -i coverimage -sample_fmt s32 -ar 48000 -disposition:v attached_pic -vsync 0 "/path/to/Myaudio48K.flac"
For direct conversion to ALAC, use
ffmpeg -i "/path/to/Myaudio.flac" -i coverimage -ar 48000 -c:a alac -disposition:v attached_pic -vsync 0 -c:v png "/path/to/Myaudio48K.m4a"
Related
I am trying to encode raw audio (pcm_f32le) to AAC encoded audio. One thing I've noticed is that I can accomplish this via the CLI tool:
ffmpeg -f f32le -ar 48000 -ac 2 -c:a pcm_f32le -i out.raw out.m4a -y
This plays just fine and decodes fine.
The steps I've taken:
When I use the C example code (https://ffmpeg.org/doxygen/3.4/encode_audio_8c-example.html), switch the encoder to codec = avcodec_find_encoder(AV_CODEC_ID_AAC); and output the various sample formats associated with AAC, it only provides FLTP, which is a planar (non-interleaved) format.
This page seems to provide the various supported input formats per codec.
This is confusing because my raw captured audio is interleaved (that is what pcm_f32le is), not planar. I've certainly tried passing it through as-is, and it doesn't work as intended. After calling avcodec_receive_packet it stays stuck indefinitely with this return code:
AVERROR(EAGAIN): output is not available in the current state - user must try to send input
Questions:
How can I modify the example code from FFmpeg to convert pcm_f32le raw audio to AAC encoded audio?
Why is the CLI tool able to do this?
I am using libsoundio to capture raw audio from Linux's Dummy Output. I wonder how I could get a planar format to pass through to get AAC encoded audio.
If AAC is not a possibility, is it doable with MP3?
Here is a working example of how to encode raw pcm_f32le to AAC with ffmpeg's libraries.
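The following is a minimal sketch rather than production code: it assumes the FFmpeg 3.x/4.x send/receive API used by the linked example, hard-codes 48 kHz stereo to match the question, and uses the hypothetical file names out.raw and out.aac. The step the stock example leaves out is a libswresample conversion from the interleaved f32 samples that libsoundio delivers to the planar fltp layout the AAC encoder requires.

#include <stdio.h>
#include <stdlib.h>
#include <libavcodec/avcodec.h>
#include <libavutil/channel_layout.h>
#include <libswresample/swresample.h>

static void encode(AVCodecContext *ctx, AVFrame *frame, AVPacket *pkt, FILE *out)
{
    if (avcodec_send_frame(ctx, frame) < 0)
        exit(1);
    /* AVERROR(EAGAIN) from avcodec_receive_packet simply means the encoder
     * needs more input before it can emit a packet - keep sending frames. */
    while (avcodec_receive_packet(ctx, pkt) == 0) {
        fwrite(pkt->data, 1, pkt->size, out);
        av_packet_unref(pkt);
    }
}

int main(void)
{
    const AVCodec *codec = avcodec_find_encoder(AV_CODEC_ID_AAC);
    AVCodecContext *ctx  = avcodec_alloc_context3(codec);
    ctx->sample_fmt      = AV_SAMPLE_FMT_FLTP;   /* the only format the AAC encoder accepts */
    ctx->sample_rate     = 48000;
    ctx->channel_layout  = AV_CH_LAYOUT_STEREO;
    ctx->channels        = 2;
    ctx->bit_rate        = 128000;
    if (avcodec_open2(ctx, codec, NULL) < 0)
        exit(1);

    /* Converter: interleaved float (what libsoundio hands you) -> planar float. */
    SwrContext *swr = swr_alloc_set_opts(NULL,
        AV_CH_LAYOUT_STEREO, AV_SAMPLE_FMT_FLTP, 48000,  /* output */
        AV_CH_LAYOUT_STEREO, AV_SAMPLE_FMT_FLT,  48000,  /* input  */
        0, NULL);
    swr_init(swr);

    AVFrame *frame        = av_frame_alloc();
    frame->format         = ctx->sample_fmt;
    frame->channel_layout = ctx->channel_layout;
    frame->sample_rate    = ctx->sample_rate;
    frame->nb_samples     = ctx->frame_size;     /* 1024 samples per AAC frame */
    av_frame_get_buffer(frame, 0);

    AVPacket *pkt = av_packet_alloc();
    FILE *in  = fopen("out.raw", "rb");          /* raw interleaved f32le capture (hypothetical name) */
    FILE *out = fopen("out.aac", "wb");          /* raw AAC packets (hypothetical name) */

    float *inbuf = malloc(frame->nb_samples * 2 * sizeof(float));
    while (fread(inbuf, 2 * sizeof(float), frame->nb_samples, in) == (size_t)frame->nb_samples) {
        const uint8_t *inplane = (const uint8_t *)inbuf;
        av_frame_make_writable(frame);
        /* Deinterleave: one packed input plane in, two planar output planes out. */
        swr_convert(swr, frame->data, frame->nb_samples, &inplane, frame->nb_samples);
        encode(ctx, frame, pkt, out);
    }
    encode(ctx, NULL, pkt, out);                 /* flush the encoder */

    fclose(out); fclose(in); free(inbuf);
    swr_free(&swr); av_frame_free(&frame); av_packet_free(&pkt);
    avcodec_free_context(&ctx);
    return 0;
}

Note that the packets written this way are raw AAC without ADTS headers; for a directly playable file you would normally hand them to a muxer via libavformat (as the CLI does when it writes out.m4a) rather than fwrite them.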
I have some audio files from an old video game in a very rare format:
22050 Hz, 2 channels, 4-bit PCM (not ADPCM).
Is there any tool around to convert that into a modern format?
I tried ffplay -ac 2 -acodec adpcm_ima_apc -i $audiofile, but that did not work out (it played back, but in terrible quality).
I uploaded one of the files here (be sure to use the right download button, not the one from the ad):
https://www.file-upload.net/download-13698824/Morningmood.wav.html
I have a video file, and I dumped the video info to a txt file with ffmpeg nearly 3 years ago.
...
Stream #0:1[0x1c0]: Audio: mp2, 48000 Hz, stereo, s16, 256 kb/s
Stream #0:2[0x1c1]: Audio: mp2, 48000 Hz, stereo, s16, 256 kb/s
But I found the format had changed when I used the updated ffprobe (ffprobe version N-78046-g46f67f4 Copyright (c) 2007-2016 the FFmpeg developers).
...
Stream #0:1[0x1c0]: Audio: mp2, 48000 Hz, stereo, s16p, 256 kb/s
Stream #0:2[0x1c1]: Audio: mp2, 48000 Hz, stereo, s16p, 256 kb/s
With the same video, its sample format now reads s16p.
I implemented a simple video player which uses ffmpeg. It could play video 3 years ago, but it failed to output the correct PCM stream after I switched to the updated ffmpeg. I spent a lot of time and finally found that the audio should have been s16 instead of s16p. The decoded audio stream works after I added this line before calling avcodec_decode_audio4:
audio_codec_ctx->sample_fmt = AV_SAMPLE_FMT_S16
but it is just a hack. Has anyone else encountered this issue? How do I make ffmpeg work correctly? Any hint is appreciated. Thanks!
The output format changed. The reason for this is fairly convoluted and technical, but let me try explaining it anyway.
Most audio codecs are structured such that the output of each channel is best reconstructed individually, and the merging of channels (interleaving of a "left" and "right" buffer into an array of samples ordered left0 right0 left1 right1 [etc]) happens at the very end. You can probably imagine that if the encoder wants to deinterleave again, then transcoding of audio involves two redundant operations (interleaving/deinterleaving). Therefore, all decoders where it makes sense were switched to output planar audio (so s16 changed to s16p, where p means planar), where each channel is its own buffer.
So: nowadays, interleaving is done using a resampling library (libswresample) after decoding instead of as an integral part of decoding, and only if the user explicitly wants to do so, rather than automatically/always.
You can indeed set the request sample format to S16 to force decoding to s16 instead of s16p. Consider this a compatibility hack that will at some point be removed for the few decoders for which it does work, and also one that will not work for new decoders. Instead, consider adding libswresample support to your application to convert between whatever is the native output format of the decoder, and the format you want to use for further data processing (e.g. playback using sound card).
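As a sketch of that libswresample route (assuming the stereo 48 kHz mp2 streams shown above; the SwrContext is created once and then reused for every decoded frame; the helper name is hypothetical):

#include <libavutil/channel_layout.h>
#include <libavutil/frame.h>
#include <libavutil/samplefmt.h>
#include <libswresample/swresample.h>

/* One-time setup: planar s16p from the decoder -> interleaved s16 for playback. */
static SwrContext *make_interleaver(void)
{
    SwrContext *swr = swr_alloc_set_opts(NULL,
        AV_CH_LAYOUT_STEREO, AV_SAMPLE_FMT_S16,  48000,   /* what the sound card wants */
        AV_CH_LAYOUT_STEREO, AV_SAMPLE_FMT_S16P, 48000,   /* what the decoder emits    */
        0, NULL);
    swr_init(swr);
    return swr;
}

/* Per decoded frame: interleave into a freshly allocated buffer.
 * Returns samples per channel (or <0 on error); caller frees *out with av_freep(out). */
static int interleave_s16(SwrContext *swr, const AVFrame *frame, uint8_t **out)
{
    int ret = av_samples_alloc(out, NULL, 2, frame->nb_samples, AV_SAMPLE_FMT_S16, 0);
    if (ret < 0)
        return ret;
    return swr_convert(swr, out, frame->nb_samples,
                       (const uint8_t **)frame->extended_data, frame->nb_samples);
}

The resulting buffer holds L0 R0 L1 R1 ... exactly as the old s16 decoders produced, without relying on the request-sample-format hack.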
I want to reduce the muxing overhead when creating .ts files using FFmpeg.
I'm using FFmpeg to create a series of transport stream files used for HTTP Live Streaming.
./ffmpeg -i myInputFile.ismv \
-vcodec copy \
-acodec copy \
-bsf h264_mp4toannexb \
-map 0 \
-f segment \
-segment_time 10 \
-segment_list_size 999999 \
-segment_list output/myVarientPlaylist.m3u8 \
-segment_format mpegts \
output/myAudioVideoFile-%04d.ts
My input is in ismv format and contains a video and audio stream:
Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 320x240, 348 kb/s, 29.97 tbr, 10000k tbn, 59.94 tbc
Stream #0:1(und): Audio: aac (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 63 kb/s
There is an issue related to muxing that is causing a large amount of overhead to be added to the streams. This is how the issue was described to me for the audio:
So for a given aac stream, the overhead will be 88% (since 200 bytes will map to 2 x 188 byte packets).
For video, the I-frame packets are quite large, so they translate nicely into .ts packets; however, the diffs (P- and B-frames) can be as small as an audio packet, so they suffer from the same issue.
The suggested solution is to combine several AAC packets into one larger unit before packaging them into .ts. Is this possible out of the box with FFmpeg?
It is not possible. Codecs rely on the encapsulating container for framing, that is, to signal the start and length of each frame.
Your diagram is actually missing an element: the PES packet. Your audio frame will be put into a PES packet first (which indicates its length), and then the PES packet will be cut into smaller chunks, which become the TS packets.
By design you cannot start a new PES packet (containing an audio frame, in your case) in a TS packet which already contains data; a new PES packet always starts in a new TS packet. Otherwise it would be impossible to start playing mid-stream (the broadcast situation) - there would be no way to know at which byte of the TS the new PES begins (remember, you have missed the beginning of the current PES).
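To put numbers on the overhead described above: a TS packet is 188 bytes, 4 of which are header, leaving 184 bytes of payload. A ~200-byte AAC frame plus its PES header therefore spans two TS packets, and the unused remainder of the second packet is stuffed with padding - 376 bytes on the wire carrying roughly 200 bytes of audio, which is where the 88% figure quoted in the question comes from.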
There are some mitigating factors: the FF FF FF padding will probably be compressed by the networking hardware, and if you are using HTTP (instead of UDP or RTP) gzip compression can be enabled (though I doubt it would help much).
I've fixed the worst problem of syncing the TS output on each frame in http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=75c8d7c2b4972f6ba2cef605949f57322f7c0361 - please try a version after that.
I have a project where I need to restream a live stream which has two languages set up on the audio:
Spanish on the left and English on the right.
The stream mapping is:
Stream #0:0: Video: h264 ([7][0][0][0] / 0x0007), yuv420p, 512x288 [SAR 1:1 DAR 16:9], q=2-31, 1k tbn, 1k tbc
Stream #0:1: Audio: mp3 ([2][0][0][0] / 0x0002), 44100 Hz, stereo, s16, 18 kb/s
I need to restream this back live with just the English from the right side or just the Spanish from the left side. I tried looking everywhere but did not find any kind of solution.
Since this needs to be done live, I can't use other programs to separate video and audio to get it done.
This needs to be done through ffmpeg, and I wonder if it is even capable of doing so with the original build or whether it would need some custom modification.
You can use the -map_channel option or the pan filter. Unfortunately you didn't specify whether you want a stereo or mono output. If stereo, you can simply mute a channel or duplicate a channel into both the left and right channels of the output. Here are some examples assuming you want to keep a stereo output.
To copy the right channel of the input to the left and right channels of the output:
ffmpeg -i input -map_channel 0.1.1 -map_channel 0.1.1 output
To mute the left channel:
ffmpeg -i input -map_channel -1 -map_channel 0.1.1 output
To mute the left channel using pan:
ffmpeg -i input -filter:a pan="stereo:c1=c1" output
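If a mono output is acceptable instead, the same pan filter can keep just one channel; for example, to output only the right (English) channel (a sketch using the same pan syntax as above, untested):
ffmpeg -i input -filter:a pan="mono:c0=c1" output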
FFmpeg usage questions are better suited for superuser.com since stackoverflow is programming specific.