Decoding AAC to PCM with ffmpeg results in noise

I have a .mp4 file generated with ffmpeg as follows.
ffmpeg -y -i video_extended.mp4 -itsoffset 00:00:04.00 -i output5-1.wav -map 0:0 -map 1:0 -c:v copy -c:a aac -ac 6 -ar 48000 -b:a 128k -async 1 mixed.mp4
Playing mixed.mp4 with ffplay works fine, and the sound quality is unaffected. Below is the output I get from ffplay when using the command ffplay -i mixed.mp4:
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'mixed_h264_aac_512k_async_qp0_all_I.mp4':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2avc1mp41
encoder : Lavf58.76.100
Duration: 00:00:16.02, start: 0.000000, bitrate: 49136 kb/s
Stream #0:0[0x1](und): Video: h264 (High 4:4:4 Predictive) (avc1 / 0x31637661), yuv422p10le(progressive), 1920x1080, 65409 kb/s, 59.94 fps, 59.94 tbr, 11988 tbn (default)
Metadata:
handler_name : VideoHandler
vendor_id : [0][0][0][0]
Stream #0:1[0x2](und): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, 5.1, fltp, 71 kb/s (default)
Metadata:
handler_name : SoundHandler
vendor_id : [0][0][0][0]
Switch subtitle stream from #-1 to #-1 vq= 1606KB sq= 0B f=0/0
Then, I decode the mixed.mp4 file back to raw PCM using the following command.
ffmpeg -i mixed.mp4 -vn -acodec pcm_s16le -f s16le -ar 48000 -ac 6 raw_audio.pcm
However, raw_audio.pcm contains a lot of noise, and ffplay shows the following output:
[s16le # 0x7f7490000c80] Estimating duration from bitrate, this may be inaccurate
Input #0, s16le, from 'separated_audio_s16.pcm':
Duration: 00:00:16.02, bitrate: 4607 kb/s
Stream #0:0: Audio: pcm_s16le, 48000 Hz, 6 channels, s16, 4608 kb/s
[pcm_s16le # 0x7f749002b940] Multiple frames in a packet.
[pcm_s16le # 0x7f749002b940] Invalid PCM packet, data has size 8 but at least a size of 12 was expected
Last message repeated 32 times
[pcm_s16le # 0x7f749002b940] Invalid PCM packet, data has size 8 but at least a size of 12 was expected
Last message repeated 11 times
Switch subtitle stream from #-1 to #-1 vq= 0KB sq= 0B f=0/0
[pcm_s16le # 0x7f749002b940] Invalid PCM packet, data has size 8 but at least a size of 12 was expected
Last message repeated 11 times
[pcm_s16le # 0x7f749002b940] Invalid PCM packet, data has size 8 but at least a size of 12 was expected
Last message repeated 11 times
[pcm_s16le # 0x7f749002b940] Invalid PCM packet, data has size 8 but at least a size of 12 was expected
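The sizes in this log can be cross-checked with a little arithmetic. With s16le each sample is 2 bytes, so one sample frame of 6 interleaved channels is 12 bytes. That is the "at least a size of 12" the PCM decoder expects. 48000 Hz × 6 channels × 16 bits gives the reported 4608 kb/s. The Python sketch below assumes interleaved s16le, 6 channels and 48 kHz. The function names are illustrative. The optional file check only reports whether the file length is a whole number of 12-byte frames.

```python
# Cross-check the raw PCM parameters reported for raw_audio.pcm.
# Assumption: signed 16-bit little-endian, interleaved, 6 channels, 48 kHz.
import os
import sys

BYTES_PER_SAMPLE = 2  # s16le: one sample is 2 bytes
CHANNELS = 6          # 5.1 layout of the AAC stream
SAMPLE_RATE = 48000


def block_align(channels: int, bytes_per_sample: int) -> int:
    """Bytes in one sample frame (one sample for every channel)."""
    return channels * bytes_per_sample


def bitrate_kbps(sample_rate: int, channels: int, bytes_per_sample: int) -> float:
    """Uncompressed PCM bitrate in kb/s (1 kb = 1000 bits)."""
    return sample_rate * channels * bytes_per_sample * 8 / 1000


def trailing_bytes(file_size: int, channels: int, bytes_per_sample: int) -> int:
    """Bytes left over after the last complete sample frame (0 for a clean file)."""
    return file_size % block_align(channels, bytes_per_sample)


def main() -> None:
    print("bytes per frame:", block_align(CHANNELS, BYTES_PER_SAMPLE))
    print("bitrate kb/s:", bitrate_kbps(SAMPLE_RATE, CHANNELS, BYTES_PER_SAMPLE))
    if len(sys.argv) > 1:  # optional: python check.py raw_audio.pcm
        size = os.path.getsize(sys.argv[1])
        print("trailing bytes:", trailing_bytes(size, CHANNELS, BYTES_PER_SAMPLE))


if __name__ == "__main__":
    main()
```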
Can someone please explain the issue here? Note that for mixed.mp4, which plays correctly, ffplay reports fltp as the audio format, whereas raw_audio.pcm is seen as s16.
Is this a resampling issue in ffmpeg, and how can I rectify this?
I’m using ffmpeg and ffplay versions 5.0.1 in a Fedora 36 system.
Thank you.
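On the fltp vs s16 point: fltp is planar 32-bit float, with one buffer per channel. This is what the AAC decoder outputs here, as the log shows. pcm_s16le is packed (interleaved) signed 16-bit integers, so a sample format conversion is expected when writing raw_audio.pcm.

Below is a simplified Python illustration of that conversion and its inverse. It is not libswresample's actual algorithm: there is no dithering, only plain rounding and clipping.

```python
# Simplified illustration of "fltp" -> "s16" (not libswresample's exact code).
# fltp: one list of floats per channel, nominally in [-1.0, 1.0]
# s16:  interleaved little-endian int16 (ch0 ch1 ... ch0 ch1 ...)
import struct
from typing import List


def float_to_s16(x: float) -> int:
    """Scale a float sample to int16 range with clipping."""
    v = round(x * 32768.0)
    return max(-32768, min(32767, v))


def planar_float_to_s16le(planes: List[List[float]]) -> bytes:
    """Interleave per-channel float planes into s16le bytes."""
    n = len(planes[0])
    if any(len(p) != n for p in planes):
        raise ValueError("all channel planes must have the same length")
    out = [float_to_s16(plane[i]) for i in range(n) for plane in planes]
    return struct.pack("<%dh" % len(out), *out)


def s16le_to_planar_float(data: bytes, channels: int) -> List[List[float]]:
    """Inverse: split interleaved s16le bytes back into float planes."""
    if len(data) % (channels * 2):
        raise ValueError("data is not a whole number of sample frames")
    samples = struct.unpack("<%dh" % (len(data) // 2), data)
    return [[s / 32768.0 for s in samples[c::channels]] for c in range(channels)]
```

If the same bytes are read back with a different channel count than they were written with (for example 2 instead of 6), every sample frame is misaligned, and the result sounds like noise. A reader of a headerless raw file therefore has to be given the same format, rate and channel count that were used to write it.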

Related

FFmpeg: Concatenate .webm files some are video only some audio only some with both

In my case I have 3 .webm files: the first one is audio only, the second one is video only, and the third one has both audio and video.
I want to concatenate them into a single file that shows a black screen for the audio-only parts, shows video for the video-only parts, and plays both for the parts that have audio and video.
The video codec is VP8, and the audio codec is Opus.
concat.txt contains the entries for the three files.
I am using the following command to concatenate them.
ffmpeg -f concat -safe 0 -i ./concat.txt -c copy -y output.webm
This command creates the output file, but when I play it, it only plays the first (audio-only) part and crashes when it reaches the video-only part.
I also tried adding a dummy picture to the audio-only file, but then the command fails when I try to concatenate.
Any and all help/critique is welcome.
Thank you!
More info on the input files:
Input #0, matroska,webm, from 'original1.webm':
Metadata:
title : -
ENCODER : Lavf58.45.100
Duration: 00:00:09.99, start: 0.000000, bitrate: 34 kb/s
Stream #0:0: Audio: opus, 48000 Hz, stereo, fltp (default)
Metadata:
DURATION : 00:00:09.990000000
Input #1, matroska,webm, from 'original2.webm':
Metadata:
title : -
ENCODER : Lavf58.45.100
Duration: 00:00:09.75, start: 0.000000, bitrate: 954 kb/s
Stream #1:0: Video: vp8, yuv420p(tv, bt470bg/unknown/unknown, progressive), 1680x1050, SAR 1:1 DAR 8:5, 1k tbr, 1k tbn, 1k tbc (default)
Metadata:
DURATION : 00:00:09.754000000
Input #2, matroska,webm, from 'original3.webm':
Metadata:
title : -
ENCODER : Lavf58.45.100
Duration: 00:00:10.02, start: 0.000000, bitrate: 912 kb/s
Stream #2:0: Audio: opus, 48000 Hz, stereo, fltp (default)
Metadata:
DURATION : 00:00:10.023000000
Stream #2:1: Video: vp8, yuv420p(tv, bt470bg/unknown/unknown, progressive), 1680x1050, SAR 1:1 DAR 8:5, 1k tbr, 1k tbn, 1k tbc (default)
Metadata:
DURATION : 00:00:09.965000000
All files to be concatenated must have the same attributes and stream order.
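This requirement can be checked mechanically before running the concat step. The hypothetical Python helper below compares a hand-written summary of each file's streams against the first file. The summaries are taken from the ffmpeg output above; in practice they could be filled in from ffprobe. It only compares stream type, codec, sample rate/layout and frame size, which is a simplification of everything the concat demuxer needs to match.

```python
# Hypothetical check: do all concat inputs have the same streams, in the same order?
from typing import Dict, List, Tuple

Stream = Tuple[str, ...]  # e.g. ("audio", "opus", "48000", "stereo")


def layout_mismatches(files: Dict[str, List[Stream]]) -> List[str]:
    """Return one message per file whose stream list differs from the first file's."""
    names = list(files)
    reference = files[names[0]]
    return [
        f"{name} differs from {names[0]}: {files[name]} vs {reference}"
        for name in names[1:]
        if files[name] != reference
    ]


AUDIO = ("audio", "opus", "48000", "stereo")
VIDEO = ("video", "vp8", "1680x1050")

# As described in the question: audio only, video only, audio + video.
ORIGINAL = {"original1.webm": [AUDIO], "original2.webm": [VIDEO], "original3.webm": [AUDIO, VIDEO]}
# After adding black video / silent audio, every file is audio + video.
FIXED = {"output1.webm": [AUDIO, VIDEO], "output2.webm": [AUDIO, VIDEO], "original3.webm": [AUDIO, VIDEO]}

if __name__ == "__main__":
    for label, files in (("original", ORIGINAL), ("fixed", FIXED)):
        problems = layout_mismatches(files)
        print(label, "OK" if not problems else problems)
```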
Add black video to audio only file:
ffmpeg -i audio.webm -f lavfi -i color=s=1680x1050 -r 1000 -map 0 -map 1 -c:a copy -c:v libvpx -shortest output1.webm
Add silent audio to video only file:
ffmpeg -f lavfi -i anullsrc=r=48000:cl=stereo -i video.webm -map 0 -map 1 -c:a libopus -c:v copy -shortest output2.webm
Make concat.txt with the following contents:
file 'output1.webm'
file 'output2.webm'
file 'original3.webm'
Concatenate with the concat demuxer:
ffmpeg -f concat -safe 0 -i concat.txt -c copy output.webm
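The concat demuxer reads each "file" line using FFmpeg's quoting rules, so paths containing spaces or apostrophes need quoting. Below is a minimal Python sketch that generates the list. It assumes the single-quote convention described in the FFmpeg utilities documentation, where a literal ' inside a quoted string is written as '\''. The function names are hypothetical.

```python
# Sketch: generate a concat-demuxer list, single-quoting every path.
from pathlib import Path
from typing import Iterable


def quote_concat_path(path: str) -> str:
    """Wrap a path in single quotes; an embedded ' becomes '\\'' (close, escaped quote, reopen)."""
    return "'" + path.replace("'", "'\\''") + "'"


def concat_list_text(inputs: Iterable[str]) -> str:
    """One 'file ...' line per input, in the order they should be joined."""
    return "".join(f"file {quote_concat_path(p)}\n" for p in inputs)


def write_concat_list(list_path: str, inputs: Iterable[str]) -> None:
    """Write the list file that is later passed to: ffmpeg -f concat -safe 0 -i <list_path> ..."""
    Path(list_path).write_text(concat_list_text(inputs), encoding="utf-8")


if __name__ == "__main__":
    print(concat_list_text(["output1.webm", "output2.webm", "original3.webm"]), end="")
```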

ffmpeg concat of multiple files: audio out of sync

I am trying to concatenate multiple files using the ffmpeg concat demuxer. However, the final video is out of sync.
The first parts come from a static image, which is converted to a 5-second video as follows:
ffmpeg -r 30 -i 1.png -vf loop=loop=150:size=1:start=0 -pix_fmt yuv420p -c:v libx264 -preset superfast -tune stillimage loop.mp4
Then I add silent audio stream:
ffmpeg -i loop.mp4 -f lavfi -i anullsrc -map 0:v -map 1:a -ar 44100 -ac 2 -c:v copy -c:a aac -shortest silent.mp4
ffprobe output for silent.mp4:
ffprobe -v quiet -show_entries stream=start_time,duration silent.mp4
[STREAM]
start_time=0.000000
duration=5.733333
[/STREAM]
[STREAM]
start_time=0.000000
duration=5.665011
[/STREAM]
This already shows that the audio and video streams have different durations.
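The difference can be read straight from the ffprobe output. Below is a small Python sketch that parses the default [STREAM] ... [/STREAM] blocks shown above and reports by how much the first stream outlasts the second. It assumes stream 0 is the video and stream 1 is the audio, which matches the -map 0:v -map 1:a order used to create silent.mp4.

```python
# Parse `ffprobe -show_entries stream=start_time,duration` default output.
from typing import Dict, List, Optional

PROBE_OUTPUT = """\
[STREAM]
start_time=0.000000
duration=5.733333
[/STREAM]
[STREAM]
start_time=0.000000
duration=5.665011
[/STREAM]
"""


def parse_streams(text: str) -> List[Dict[str, str]]:
    """Collect key=value pairs from each [STREAM] block."""
    streams: List[Dict[str, str]] = []
    current: Optional[Dict[str, str]] = None
    for line in text.splitlines():
        line = line.strip()
        if line == "[STREAM]":
            current = {}
        elif line == "[/STREAM]":
            if current is not None:
                streams.append(current)
            current = None
        elif current is not None and "=" in line:
            key, value = line.split("=", 1)
            current[key] = value
    return streams


streams = parse_streams(PROBE_OUTPUT)
durations = [float(s["duration"]) for s in streams]
gap = durations[0] - durations[1]  # video minus audio, per the map order assumption
print(f"video is longer than audio by {gap:.6f} s")
```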
Then I prepare the concat input file. The last line is a video with the same dimensions and frame rate, which also has an existing audio stream (44.1 kHz, stereo):
file silent.mp4
file silent.mp4
... (multiple lines, say 10)
file silent.mp4
file video.mp4
To make sure that the inputs have the same parameters:
ffmpeg -i silent.mp4 -i video.mp4
ffmpeg version 4.2.2 Copyright (c) 2000-2019 the FFmpeg developers
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'silent.mp4':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2avc1mp41
encoder : Lavf58.29.100
Duration: 00:00:05.03, start: 0.000000, bitrate: 32 kb/s
Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 1920x1080 [SAR 1:1 DAR 16:9], 19 kb/s, 30 fps, 30 tbr, 15360 tbn, 60 tbc (default)
Metadata:
handler_name : VideoHandler
Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 2 kb/s (default)
Metadata:
handler_name : SoundHandler
Input #1, mov,mp4,m4a,3gp,3g2,mj2, from 'video.mp4':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2avc1mp41
encoder : Lavf58.29.100
Duration: 00:00:43.03, start: 0.000000, bitrate: 1622 kb/s
Stream #1:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 1920x1080, 1484 kb/s, 30 fps, 30 tbr, 15360 tbn, 60 tbc (default)
Metadata:
handler_name : VideoHandler
Stream #1:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 128 kb/s (default)
Metadata:
handler_name : SoundHandler
Now, after concatenation, the final video is out of sync with the audio (the audio stream of the last part starts before the static part has finished):
ffmpeg -f concat -i concat.txt -c copy result.mp4 (no warnings)
I have tried padding the audio stream with silence in both loop.mp4 and video.mp4, but it did not help: it seemingly randomly modifies the duration and start time of both the video and audio streams, and the audio is again out of sync after concatenation.
Also, I am not able to increase the duration of the static part (so that the number of entries in concat.txt could be reduced), because each static part can be different; this is just an example.
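One way to see why many short entries make the problem worse is to look at how a per-segment mismatch adds up. This is a hypothetical model, not confirmed by the question. Suppose each silent.mp4 entry carries about 0.068 s more video than audio (per the ffprobe values above), and those gaps are not kept as silence in the concatenated audio. Then the audio of every later segment starts that much earlier. The sketch only does the arithmetic under that assumption.

```python
# Hypothetical drift model: per-segment (video - audio) shortfall accumulates
# if the gaps are collapsed instead of being preserved as silence.

VIDEO_DURATION = 5.733333  # ffprobe duration of stream 0 in silent.mp4
AUDIO_DURATION = 5.665011  # ffprobe duration of stream 1 in silent.mp4


def cumulative_audio_lead(segments: int, video: float, audio: float) -> float:
    """Seconds by which audio would run ahead after `segments` mismatched entries."""
    return segments * (video - audio)


if __name__ == "__main__":
    for n in (1, 5, 10):
        lead = cumulative_audio_lead(n, VIDEO_DURATION, AUDIO_DURATION)
        print(f"after {n:2d} static segments: audio ahead by ~{lead:.3f} s")
```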

How to replace the video track of a part of a video file?

I have an mp4 file like this (same format, but longer):
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'N1.2.mp4':
Metadata:
major_brand : mp42
minor_version : 0
compatible_brands: mp42mp41
creation_time : 2018-10-31T13:44:21.000000Z
Duration: 00:28:54.21, start: 0.000000, bitrate: 10295 kb/s
Stream #0:0(eng): Video: h264 (Main) (avc1 / 0x31637661), yuv420p(tv, bt709), 1920x1080, 9972 kb/s, 50 fps, 50 tbr, 50k tbn, 100 tbc (default)
Metadata:
creation_time : 2018-10-31T13:44:21.000000Z
handler_name : ?Mainconcept Video Media Handler
encoder : AVC Coding
Stream #0:1(eng): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 317 kb/s (default)
Metadata:
creation_time : 2018-10-31T13:44:21.000000Z
handler_name : #Mainconcept MP4 Sound Media Handler
I also have another video file that is 3 minutes long and has no audio. What is the fastest way to encode this other video so that it matches my main video's encoding, and then replace the last three minutes of my original video's video track with it?
In other words:
I have video A that is 1 hour long, with the encoding shown above.
I have video B that is 3 minutes long, with no audio and an arbitrary encoding.
I want video C with the same encoding and the same audio as A, but its video track should be the first 57 minutes of A + B (which is 3 minutes).
I want to do this as fast as possible, so I would prefer not to re-encode A.
I know how to concatenate two videos, I use this command:
ffmpeg -f concat -i files.txt -c copy res.mp4
Make end video using parameters of main video:
ffmpeg -i videob.mp4 -f lavfi -i anullsrc=sample_rate=48000:channel_layout=stereo -filter_complex "[0:v]scale=1920:1080:force_original_aspect_ratio=decrease,pad=1920:1080:(ow-iw)/2:(oh-ih)/2,setsar=1,format=yuv420p,fps=50[v]" -map "[v]" -map 1:a -c:v libx264 -profile:v main -c:a aac -video_track_timescale 50000 -shortest videob2.mp4
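The scale/pad part of this filter chain fits video B inside 1920x1080 without distortion and then centres it. Below is an approximate Python model of the resulting geometry. It assumes the "decrease" mode compares the rescaled width and height against the target box with round-to-nearest. It ignores the pad filter's extra alignment of offsets to chroma subsampling, as well as any SAR handling, so treat the numbers as a guide.

```python
# Approximate geometry of:
#   scale=1920:1080:force_original_aspect_ratio=decrease,pad=1920:1080:(ow-iw)/2:(oh-ih)/2
# Simplified model; ffmpeg's exact rounding/alignment may differ slightly.


def rescale(a: int, b: int, c: int) -> int:
    """a * b / c rounded to nearest (non-negative integers)."""
    return (a * b + c // 2) // c


def fit_and_pad(in_w: int, in_h: int, out_w: int = 1920, out_h: int = 1080):
    tmp_w = rescale(out_h, in_w, in_h)  # width if the height were out_h
    tmp_h = rescale(out_w, in_h, in_w)  # height if the width were out_w
    w = min(tmp_w, out_w)               # "decrease": never exceed the box
    h = min(tmp_h, out_h)
    x = (out_w - w) // 2                # (ow-iw)/2: centre horizontally
    y = (out_h - h) // 2                # (oh-ih)/2: centre vertically
    return w, h, x, y


if __name__ == "__main__":
    for size in [(1280, 720), (1440, 1080), (1080, 1920)]:
        print(size, "->", fit_and_pad(*size))
```

A 4:3 input, for example, ends up pillarboxed with equal black bars left and right, and a 16:9 input is simply scaled.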
Get duration of main video:
ffprobe -v error -show_entries format=duration -of default=noprint_wrappers=1:nokey=1 main.mp4
Make files.txt which is needed for concat demuxer:
file 'main.mp4'
outpoint 3420
file 'videob2.mp4'
In this example, outpoint is the main video duration minus the end video duration (3600 - 180 = 3420 seconds for a 1-hour main video and a 3-minute end video).
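The outpoint calculation and the files.txt layout can be scripted. Below is a minimal Python sketch. It assumes the two durations have already been obtained, for instance with the ffprobe command above, and that the file names contain no single quotes. The helper names are hypothetical.

```python
# Build files.txt for the concat demuxer: play main.mp4 up to
# (main duration - end duration), then append the re-encoded end clip.


def seconds_text(x: float) -> str:
    """Format seconds without trailing zeros, e.g. 3420.0 -> '3420'."""
    return f"{x:.6f}".rstrip("0").rstrip(".")


def outpoint(main_duration: float, end_duration: float) -> float:
    if end_duration > main_duration:
        raise ValueError("end clip is longer than the main video")
    return main_duration - end_duration


def files_txt(main: str, main_duration: float, end: str, end_duration: float) -> str:
    return (
        f"file '{main}'\n"
        f"outpoint {seconds_text(outpoint(main_duration, end_duration))}\n"
        f"file '{end}'\n"
    )


if __name__ == "__main__":
    print(files_txt("main.mp4", 3600.0, "videob2.mp4", 180.0), end="")
```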
Concatenate:
ffmpeg -f concat -i files.txt -i main.mp4 -map 0:v -map 1:a -c copy -shortest output.mp4

Metadata in mp3 not working when piping from ffmpeg with album art

In my program I am piping a webm from a stream to ffmpeg and then piping the output to an HTTP request. Part of the process is adding metadata for the mp3. This has worked great so far. However, after adding an image as album art, it has started to behave unexpectedly.
First, this is the command line I am using inside the program:
val parameters = listOf("ffmpeg",
"-i", "-",
"-i", albumImage.absolutePath,
"-map", "0",
"-map", "1",
"-c:v", "copy",
"-f", "mp3",
"-id3v2_version", "4",
"-metadata", "title=${info.title}",
"-metadata", "album=YouTube",
"-metadata", "artist=${info.author}",
"-metadata:s:v", "title=Album Cover",
"-metadata:s:v", "comment=Cover (front)",
"-"
)
It creates a valid mp3 file, and I can find both the metadata and the image in it; however, when playing it, none of them are displayed in VLC or anywhere else. To test various configurations, I have converted it to a command line.
In a first try I saved the video and the image to files and stopped using pipes altogether, which results in this:
ffmpeg -i video.webm -i image.jpeg -map 0 -map 1 -c:v copy -f mp3 -id3v2_version 4 -metadata title="Tiësto & KSHMR feat. Vassy - Secrets (Official Music Video)" -metadata album="YouTube" -metadata artist="Spinnin' Records" -metadata:s:v title="Album Cover" -metadata:s:v comment="Cover (front)" output3.mp3
In this case all metadata including the album art is displayed in VLC.
I then recreated the same setup as in my program, piping both the video input and the audio output:
ffmpeg -i - -i image.jpeg -map 0 -map 1 -c:v copy -f mp3 -id3v2_version 4 -metadata title="Tiësto & KSHMR feat. Vassy - Secrets (Official Music Video)" -metadata album="YouTube" -metadata artist="Spinnin' Records" -metadata:s:v title="Album Cover" -metadata:s:v comment="Cover (front)" - < video.webm > output3.mp3
This file is the same as my program's output. Neither the title nor the album nor the album image is displayed (although the file plays).
To test a few more options I have hardcoded the output file but pipe the input file like this:
ffmpeg -i - -i image.jpeg -map 0 -map 1 -c:v copy -f mp3 -id3v2_version 4 -metadata title="Tiësto & KSHMR feat. Vassy - Secrets (Official Music Video)" -metadata album="YouTube" -metadata artist="Spinnin' Records" -metadata:s:v title="Album Cover" -metadata:s:v comment="Cover (front)" output3.mp3 < video.webm
Now the metadata works again. When I hardcode the input video and pipe the output, it is gone again.
So to sum up: when piping the output of ffmpeg, the metadata in the file does not work properly. Interestingly, the stderr output of ffmpeg is nearly identical in both cases (only the output name differs):
Hardcoded output3.mp3:
ffmpeg version 3.4.4-0ubuntu0.18.04.1 Copyright (c) 2000-2018 the FFmpeg developers
Input #0, matroska,webm, from 'pipe:':
Metadata:
encoder : google/video-file
Duration: 00:03:39.58, start: -0.007000, bitrate: N/A
Stream #0:0(eng): Audio: opus, 48000 Hz, stereo, fltp (default)
Input #1, image2, from 'image.jpeg':
Duration: 00:00:00.04, start: 0.000000, bitrate: 1466 kb/s
Stream #1:0: Video: mjpeg, yuvj420p(pc, bt470bg/unknown/unknown), 320x180, 25 tbr, 25 tbn, 25 tbc
Stream mapping:
Stream #0:0 -> #0:0 (opus (native) -> mp3 (libmp3lame))
Stream #1:0 -> #0:1 (copy)
Output #0, mp3, to 'output3.mp3':
Metadata:
TPE1 : Spinnin' Records
TIT2 : Tiësto & KSHMR feat. Vassy - Secrets (Official Music Video)
TALB : YouTube
TSSE : Lavf57.83.100
Stream #0:0(eng): Audio: mp3 (libmp3lame), 48000 Hz, stereo, fltp (default)
Metadata:
encoder : Lavc57.107.100 libmp3lame
Stream #0:1: Video: mjpeg, yuvj420p(pc, bt470bg/unknown/unknown), 320x180, q=2-31, 25 tbr, 25 tbn, 25 tbc
Metadata:
title : Album Cover
comment : Cover (front)
With pipe output:
ffmpeg version 3.4.4-0ubuntu0.18.04.1 Copyright (c) 2000-2018 the FFmpeg developers
Input #0, matroska,webm, from 'pipe:':
Metadata:
encoder : google/video-file
Duration: 00:03:39.58, start: -0.007000, bitrate: N/A
Stream #0:0(eng): Audio: opus, 48000 Hz, stereo, fltp (default)
Input #1, image2, from 'image.jpeg':
Duration: 00:00:00.04, start: 0.000000, bitrate: 1466 kb/s
Stream #1:0: Video: mjpeg, yuvj420p(pc, bt470bg/unknown/unknown), 320x180, 25 tbr, 25 tbn, 25 tbc
Stream mapping:
Stream #0:0 -> #0:0 (opus (native) -> mp3 (libmp3lame))
Stream #1:0 -> #0:1 (copy)
Output #0, mp3, to 'pipe:':
Metadata:
TPE1 : Spinnin' Records
TIT2 : Tiësto & KSHMR feat. Vassy - Secrets (Official Music Video)
TALB : YouTube
TSSE : Lavf57.83.100
Stream #0:0(eng): Audio: mp3 (libmp3lame), 48000 Hz, stereo, fltp (default)
Metadata:
encoder : Lavc57.107.100 libmp3lame
Stream #0:1: Video: mjpeg, yuvj420p(pc, bt470bg/unknown/unknown), 320x180, q=2-31, 25 tbr, 25 tbn, 25 tbc
Metadata:
title : Album Cover
comment : Cover (front)
Yes: the ID3 header size cannot be filled in when the ID3v2 metadata has to be written in two steps (such as when an image packet has to be inserted) and the output is not seekable.
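For context, an ID3v2 tag starts with a 10-byte header: "ID3", two version bytes, one flags byte, and a 4-byte "synchsafe" size (7 bits used per byte). That size covers the whole tag after the header, so a writer that cannot seek back has to know it before the first byte goes out. Below is a small Python sketch of encoding and reading that field. The helper names are illustrative, and reading the first 10 bytes of each output3.mp3 variant with declared_tag_size is just one way to compare them.

```python
# ID3v2 header: b"ID3" + major version + revision + flags + 4-byte synchsafe size.


def synchsafe_encode(n: int) -> bytes:
    """Encode a 28-bit size as 4 bytes, 7 bits per byte (top bit always 0)."""
    if not 0 <= n < 1 << 28:
        raise ValueError("ID3v2 tag size must fit in 28 bits")
    return bytes([(n >> 21) & 0x7F, (n >> 14) & 0x7F, (n >> 7) & 0x7F, n & 0x7F])


def synchsafe_decode(b: bytes) -> int:
    return (b[0] << 21) | (b[1] << 14) | (b[2] << 7) | b[3]


def id3v2_header(tag_size: int, major_version: int = 4) -> bytes:
    """Build a header with revision 0 and no flags."""
    return b"ID3" + bytes([major_version, 0, 0]) + synchsafe_encode(tag_size)


def declared_tag_size(data: bytes) -> int:
    """Size recorded in the header at the start of an MP3 file's bytes."""
    if data[:3] != b"ID3":
        raise ValueError("no ID3v2 header")
    return synchsafe_decode(data[6:10])


if __name__ == "__main__":
    header = id3v2_header(300000)
    print(header, declared_tag_size(header))
```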
You can still work around this to a degree by telling ffmpeg to not flush the data quickly. However, ffmpeg will flush if its buffer exceeds 256 kB. Make a small allowance for the other parts of the ID3 header, and that gives you a ceiling for the maximum size of the image.
ffmpeg -i - -i image.jpeg -map 0 -map 1 -c:v copy -f mp3 -id3v2_version 4 -metadata title="Tiësto & KSHMR feat. Vassy - Secrets (Official Music Video)" -metadata album="YouTube" -metadata artist="Spinnin' Records" -metadata:s:v title="Album Cover" -metadata:s:v comment="Cover (front)" -flush_packets 0 - > output3.mp3 < video.webm

FFMPEG: Converting from raw audio to audio/mp4 (audio is slowed down)

If I convert from mp3 to mp4 directly, everything works perfectly. But if I try to convert from raw PCM, the audio is slowed down.
I've tried the following (this works):
ffmpeg -i mp3/1.mp3 -strict -2 final.mp4
This doesn't work as expected:
ffmpeg -f s16le -i final.raw -strict -2 -r 26 final.mp4
With the following output:
Input #0, s16le, from 'final.raw':
Duration: 00:08:37.38, bitrate: 705 kb/s
Stream #0:0: Audio: pcm_s16le, 44100 Hz, 1 channels, s16, 705 kb/s
File 'final.mp4' already exists. Overwrite ? [y/N] y
Output #0, mp4, to 'final.mp4':
Metadata:
encoder : Lavf56.40.101
Stream #0:0: Audio: aac ([64][0][0][0] / 0x0040), 44100 Hz, mono, fltp, 128 kb/s
Metadata:
encoder : Lavc56.60.100 aac
Stream mapping:
Stream #0:0 -> #0:0 (pcm_s16le (native) -> aac (native))
Press [q] to stop, [?] for help
size= 8273kB time=00:08:37.38 bitrate= 131.0kbits/s
video:0kB audio:8185kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 1.073808%
I've tried to set parameters like:
ffmpeg -ar 44100 -f s16le -i final.raw -strict -2 -r 26 final.mp4
With no luck.
To get the PCM from the mp3, I'm using the Node.js lame decoder:
var decoder = new lame.Decoder({
channels: 2,
bitDepth: 16,
sampleRate: 44100,
bitRate: 128,
outSampleRate: 44100, // 22050
mode: lame.STEREO
});
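Note that the input line in the log reports "44100 Hz, 1 channels", while the lame decoder is configured with channels: 2. A headerless raw file carries no layout information, so the s16le demuxer uses whatever it is told, or its defaults. If the decoder really writes interleaved stereo, reading those bytes as mono makes the audio last twice as long.

The Python sketch below shows the arithmetic. It reconstructs the file size from the logged 8:37.38 of mono audio. Giving ffmpeg the channel count as an input option (for example -ac 2 before -i) would be the corresponding change, assuming the decoder output is indeed stereo.

```python
# Duration of headerless s16le PCM depends on the channel count you assume.


def pcm_duration(num_bytes: int, sample_rate: int, channels: int, bytes_per_sample: int = 2) -> float:
    """Seconds of audio in num_bytes of interleaved PCM."""
    return num_bytes / (sample_rate * channels * bytes_per_sample)


# File size implied by the log: 8:37.38 (517.38 s) of mono s16le at 44100 Hz.
size = round(517.38 * 44100 * 1 * 2)

if __name__ == "__main__":
    print(f"read as mono:   {pcm_duration(size, 44100, 1):.2f} s")
    print(f"read as stereo: {pcm_duration(size, 44100, 2):.2f} s")
```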
