mkv file out of sync with linear drift - audio

I have a bunch of mkv files, with FLAC as the audio codec and FFV1 as the video one.
The files were created using an EasyCap acquisition dongle from a VCR analog source. Specifically, I used VLC's "open acquisition device" prompt and selected PAL. Then, I converted the files (audio PCM, video raw YUV) to (FLAC, FFV1) using:
ffmpeg.exe -i input.avi -acodec flac -vcodec ffv1 -level 3 -threads 4 -coder 1 -context 1 -g 1 -slices 24 -slicecrc 1 output.mkv
Now the files are progressively out of sync. It may be because, while the video (maybe) has a constant frame rate, the FLAC track has a variable frame rate. So, is there a way to sync the video track to the audio, or something similar? Can FFmpeg do this? Thanks
EDIT
Following Mulvya's hint, I plotted the sync offset at various times; the first column shows the seconds elapsed, the second shows the offset in seconds. The plot looks linear, with a constant slope of 0.0078. NOTE: the measurements were taken by hand, with a stopwatch.
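To turn a handful of hand-timed points into one drift figure, a least-squares fit is less sensitive to stopwatch error than reading two points off the plot. A minimal sketch (the function names are illustrative); the second helper gives the slope you would expect if the only problem were a mislabelled frame rate, for example video that really advances at 25 fps but is tagged as 24.889 fps:

```python
def fit_drift(times_s, offsets_s):
    """Least-squares fit of offset = slope * t + intercept; returns (slope, intercept)."""
    n = len(times_s)
    if n < 2 or n != len(offsets_s):
        raise ValueError("need at least two (time, offset) pairs")
    mean_t = sum(times_s) / n
    mean_o = sum(offsets_s) / n
    # slope = covariance(t, offset) / variance(t)
    cov = sum((t - mean_t) * (o - mean_o) for t, o in zip(times_s, offsets_s))
    var = sum((t - mean_t) ** 2 for t in times_s)
    if var == 0:
        raise ValueError("measurement times must not all be equal")
    slope = cov / var
    return slope, mean_o - slope * mean_t


def expected_slope(actual_fps, labelled_fps):
    """Seconds of drift per second of playback when frames captured at
    actual_fps are played back at labelled_fps (audio assumed correct)."""
    return actual_fps / labelled_fps - 1.0
```

For example, expected_slope(25, 24.889) is roughly 0.0045 s/s, so comparing it with the fitted slope is a quick sanity check on the hand measurements.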
EDIT 2
Playing around with VirtualDub, I found that changing the frame rate from the original 24.889 fps to 25 fps (Video->Frame rate...->Change frame rate to) and using the track converted to WAV does work. Two problems, though: VirtualDub crashes when importing the original FFV1/FLAC MKV file, so I had to convert the video to H.264 to try it out; moreover, I find it difficult to use an external encoder to save VirtualDub's output.
So, could I avoid using VirtualDub, and simply use ffmpeg for it? Here's the exported vdscript:
VirtualDub.audio.SetSource("E:\\4_track2.wav", "");
VirtualDub.audio.SetMode(0);
VirtualDub.audio.SetInterleave(1,500,1,0,0);
VirtualDub.audio.SetClipMode(1,1);
VirtualDub.audio.SetEditMode(1);
VirtualDub.audio.SetConversion(0,0,0,0,0);
VirtualDub.audio.SetVolume();
VirtualDub.audio.SetCompression();
VirtualDub.audio.EnableFilterGraph(0);
VirtualDub.video.SetInputFormat(0);
VirtualDub.video.SetOutputFormat(7);
VirtualDub.video.SetMode(3);
VirtualDub.video.SetSmartRendering(0);
VirtualDub.video.SetPreserveEmptyFrames(0);
VirtualDub.video.SetFrameRate2(25,1,1);
VirtualDub.video.SetIVTC(0, 0, 0, 0);
VirtualDub.video.SetCompression();
VirtualDub.video.filters.Clear();
VirtualDub.audio.filters.Clear();
The first line imports the wav-converted audio track.
Can I set up an equivalent pipeline in ffmpeg (ideally using FLAC, not WAV)? SetFrameRate2 is probably the key here.
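VirtualDub's "Change frame rate to" keeps every frame and only relabels the timing, which in ffmpeg corresponds to rewriting the video timestamps. A hedged sketch, assuming the file names are placeholders: setpts=N/(25*TB) stamps frame n at n/25 s, the FLAC track is copied untouched, and the video is re-encoded (applying a filter requires it) with the same FFV1 options as the original conversion, so it stays lossless.

```python
def retime_video_cmd(src, dst, new_fps=25):
    """Build an ffmpeg command that re-stamps every video frame at new_fps
    (frame n gets time n / new_fps) and copies the audio as-is."""
    return [
        "ffmpeg", "-i", src,
        # N is the frame index, TB the time base: a pure relabel, no frames dropped
        "-filter:v", f"setpts=N/({new_fps}*TB)",
        "-r", str(new_fps),
        # filtering forces a video re-encode; FFV1 keeps it lossless
        "-c:v", "ffv1", "-level", "3", "-coder", "1", "-context", "1",
        "-g", "1", "-slices", "24", "-slicecrc", "1",
        # FLAC audio is left untouched
        "-c:a", "copy",
        dst,
    ]
```

Run it with subprocess.run(retime_video_cmd("input.mkv", "output.mkv"), check=True) and re-measure the sync afterwards.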

Related

Is there a way to ensure mp3 duration accuracy with variable bit rate using FFMPEG?

In our application, we are processing audio files using ffmpeg. Specifically, we use the Node.js library fluent-ffmpeg.
Our audio files are generated by various text-to-speech providers. We recently noticed that when we used SSML to add pauses to the generated audio, the duration reported for the file was no longer correct. On further investigation, we noticed that the durations of the standard audio files were also incorrect, just closer overall because of the more consistent data. When we put a pause at the beginning of the audio, the estimate was the worst, overshooting by a very large margin (e.g., a 25 s clip would read as 3 minutes long, but skip to the end when playback passed the 25 s mark).
I did some searching and research into the structure of MP3 files, and it seems to me that the issue is that the duration gets estimated by various audio players. Windows Media Player is one example, but Firefox's web player seems to do this too. I tried changing the ffmpeg command from using .audioQuality(0), which makes ffmpeg use VBR, to .audioBitrate(320), which makes ffmpeg use a constant bitrate.
For reference, we are using libmp3lame, and the full commands that get run are the following, for the VBR and CBR cases respectively:
For VBR (broken durations): ffmpeg -i <URL> -acodec libmp3lame -aq 0 -f mp3 pipe:1
For CBR (correct duration): ffmpeg -i <URL> -acodec libmp3lame -b:a 320k -f mp3 pipe:1
Note: we then pipe the output to the requesting client application after sending the appropriate file headers, hence the pipe:1 output. The input is a cloud storage URL where the source file is located.
This fixes our duration problem, and it makes sense to me why this would fix it if the problem was that some of these players / audio consumers estimate the duration. But this came at the cost of a significantly larger file, which also makes sense to me. While testing, we found that compared to the same file in WAV, the VBR MP3 was about 10% of the WAV file size, while the CBR MP3 was still 50% of the WAV file size. This practically defeats the purpose of supporting the MP3 format for our use case, which is a smaller but slightly lossy alternative to the large WAV file.
While researching, I found that there can be ID3 tags in a chunk at the beginning of the MP3 file, specifying information so the consumer of the audio can know the duration without having processed the whole file. But I also found that there doesn't seem to be a standard for this, at least not for duration; the tags seem to be more for things like song title, album, artist, etc.
My question is, is there a way to get the proper duration onto an mp3 file, preferably via some ffmpeg mechanism, while still using VBR? Thanks!
FFmpeg does write a Xing header with duration info by default. However, that value is only known after all the stream data has been received, so ffmpeg has to seek back to the start of the file to write it. Since you're piping the output, that can't be done.
Write the file locally or to some seekable destination, and then upload.
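A minimal sketch of the "write locally, then send" approach; the function and temporary file names are illustrative. ffmpeg encodes to a seekable temporary file so the muxer can go back and fill in the Xing/Info frame, and has_xing_header lets you check the result. The check is a simplification: it assumes the first MPEG frame starts right after any ID3v2 tag and only looks for the Xing/Info tag in that frame's first 64 bytes.

```python
import subprocess
import tempfile
from pathlib import Path


def transcode_vbr(source_url: str) -> bytes:
    """Encode VBR MP3 into a seekable temp file, then return the finished bytes."""
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp) / "out.mp3"
        subprocess.run(
            ["ffmpeg", "-y", "-i", source_url,
             "-acodec", "libmp3lame", "-aq", "0", str(out)],
            check=True,
        )
        return out.read_bytes()


def skip_id3v2(data: bytes) -> int:
    """Return the offset of the first byte after an ID3v2 tag (0 if none)."""
    if len(data) >= 10 and data[:3] == b"ID3":
        # The tag size is a 4-byte "syncsafe" integer: 7 useful bits per byte
        size = 0
        for b in data[6:10]:
            size = (size << 7) | (b & 0x7F)
        footer = 10 if data[5] & 0x10 else 0  # ID3v2.4 footer flag
        return 10 + size + footer
    return 0


def has_xing_header(data: bytes) -> bool:
    """Look for the Xing (VBR) or Info (CBR) tag inside the first MPEG frame."""
    start = skip_id3v2(data)
    # Frame header (4 bytes) + side info (up to 32 bytes) precede the tag
    window = data[start:start + 64]
    return b"Xing" in window or b"Info" in window
```

For large files, streaming the temp file to the client in chunks rather than reading it fully into memory would be the better design.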

Using FFmpeg or Similar to Normalize audio in a video to EBU R128 standard

This is my first time asking a question here on Stack Overflow.
I am stuck and really struggling with this. I am trying to make the audio of some of my MXF video files compliant with the EBU R128 standard.
This means that the integrated loudness has to be -23 LUFS and must not deviate from it by more than 0.5 LU.
My current process
Watch_folder > Encoding to MXF > Output_folder
I need to make sure that, by the time they reach the output folder, those MXF files are EBU R128 loudness compliant.
What I have done so far:
FFMPEG:
ffmpeg -i input.mxf -af loudnorm=I=-23:LRA=7:tp=-2:print_format=json -f null -
I got this result:
Input Integrated: -15.1 LUFS
Input True Peak: +0.0 dBTP
Input LRA: 17.1 LU
Input Threshold: -26.2 LUFS
Output Integrated: -17.1 LUFS
Output True Peak: -1.5 dBTP
Output LRA: 5.3 LU
Output Threshold: -27.6 LUFS
Normalization Type: Dynamic
Target Offset: +1.1 LU
Then I ran:
ffmpeg -i input.mxf -af loudnorm=I=-23:LRA=7:tp=-2:measured_I=-15.1:measured_LRA=17.1:measured_tp=0:measured_thresh=-27.6:offset=1.1 -ar 48k -y output.mxf
However, when I put it through the software Eff, it says that it's not EBU compliant.
EDIT:
This also reduces the quality: for example, my 6 GB file becomes 250 MB and you can tell the quality has degraded.
ffmpeg-normalize
I ran the following:
ffmpeg-normalize input.mxf -c:a pcm_s32le -ar 48000 -o output.mxf
but this gives me errors.
If I run it without specifying the output file, I get an MKV, which will not work for me. I need it to be MXF.
OK, a few issues here.
Firstly, if your file is measured at -26.2 LUFS, you'd need to add 3.2 dB to get it to -23. But you can't do that, because your true peak is too high (you'd be over full scale). You'll need to compress (dynamic audio compression, not file/rate compression) the audio or use at least a limiter to achieve this.
A good R128 audio track should be mixed properly rather than just run through a normaliser, otherwise you risk it either failing the standard or unwanted audio effects.
If you don't have access to audio editing software or someone who can do this for you, then FFMPEG does include an audio limiter, which will give you enough headroom to raise the level to -23 LUFS.
You can do that with something like this:
-filter_complex alimiter=level_in=1:level_out=1:limit=0.89:attack=7:release=100:level=disabled
Note that limit is a linear amplitude and must lie between 0.0625 and 1; 0.89 is roughly -1 dBFS. However, tuning a limiter well depends on what the video file contains (music, speech, etc.), and it is worth taking some time over. Alter the attack and release values until you get the result you want.
Secondly, the reason FFMPEG produced a smaller file of lower quality is that you didn't specify anything for the video. When you don't, FFMPEG re-encodes the video with the output format's default encoder (for MXF that is MPEG-2 video at a low default bitrate), so whatever your codec is here (I am assuming DNxHD, since you're using an MXF wrapper) needs to be specified. FFMPEG will instead copy the video stream and leave it alone if you include the option -c:v copy (which copies the video stream without re-encoding it).
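The two-pass loudnorm route from the question can be automated, with -c:v copy so the picture is not re-encoded. A hedged sketch; the function names are illustrative. Two assumptions worth stating: the measured_* values for the second pass should come from the first pass's input_* fields (in the question, measured_thresh was given the output threshold, -27.6, rather than the input threshold, -26.2), and pcm_s24le is assumed to be an audio format your MXF flavour accepts. Verify the result with your loudness meter either way.

```python
import json
import subprocess


def parse_loudnorm_json(stderr_text: str) -> dict:
    """loudnorm with print_format=json prints a flat JSON object at the end of stderr."""
    start = stderr_text.rfind("{")
    end = stderr_text.find("}", start)
    if start == -1 or end == -1:
        raise ValueError("no loudnorm JSON found")
    return json.loads(stderr_text[start:end + 1])


def second_pass_filter(m: dict, target_i=-23.0, target_lra=7.0, target_tp=-2.0) -> str:
    # measured_* take the *input_* values reported by pass one
    return (
        f"loudnorm=I={target_i}:LRA={target_lra}:TP={target_tp}"
        f":measured_I={m['input_i']}:measured_LRA={m['input_lra']}"
        f":measured_TP={m['input_tp']}:measured_thresh={m['input_thresh']}"
        f":offset={m['target_offset']}:linear=true"
    )


def normalize_mxf(src: str, dst: str) -> None:
    # Pass 1: measure only, discard the output
    probe = subprocess.run(
        ["ffmpeg", "-hide_banner", "-i", src,
         "-af", "loudnorm=I=-23:LRA=7:TP=-2:print_format=json", "-f", "null", "-"],
        capture_output=True, text=True, check=True,
    )
    measured = parse_loudnorm_json(probe.stderr)
    # Pass 2: apply the correction, copying the video stream untouched
    subprocess.run(
        ["ffmpeg", "-y", "-i", src,
         "-c:v", "copy",
         "-af", second_pass_filter(measured),
         "-ar", "48000",        # loudnorm may resample internally; set 48 kHz back
         "-c:a", "pcm_s24le",   # uncompressed audio in the MXF
         dst],
        check=True,
    )
```

If loudnorm cannot stay in linear mode (for example because the requested LRA is smaller than the source's), it falls back to dynamic normalization, which is where the limiter approach above becomes relevant.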
Post your results once you have tried these...!

Combine Audio and Images in Stream

I would like to be able to create images and audio on the fly and combine them into an RTMP stream (for Twitch or YouTube). The goal is to do this in Python 3, as that is the language my bot is written in. Bonus points for not having to save to disk.
So far, I have figured out how to stream to RTMP servers using ffmpeg by looping a PNG image and loading an MP3, then combining them in the stream. The problem is that I have to load at least one of them from a file.
I know I can use MoviePy to create videos, but I cannot figure out whether I can stream the video from MoviePy to ffmpeg or directly to RTMP. I think I would have to generate lots of very short clips and send them, but I want to know if there's an existing solution.
There's also OpenCV which I hear can stream to rtmp, but cannot handle audio.
A redacted version of an ffmpeg command I have successfully tested is:
ffmpeg -loop 1 -framerate 15 -i ScreenRover.png -i "Song-Stereo.mp3" -c:v libx264 -preset fast -pix_fmt yuv420p -threads 0 -f flv rtmp://SITE-SUCH-AS-TWITCH/.../STREAM-KEY
or
cat Song-Stereo.mp3 | ffmpeg -loop 1 -framerate 15 -i ScreenRover.png -i - -c:v libx264 -preset fast -pix_fmt yuv420p -threads 0 -f flv rtmp://SITE-SUCH-AS-TWITCH/.../STREAM-KEY
I know these commands are not set up properly for smooth streaming; the result manages to break both Twitch's and YouTube's players, and I will have to figure out how to fix that.
The problem is that I don't think I can stream both the image and the audio when creating them on the spot; I have to load one of them from the hard drive. This becomes a problem when trying to react to a command, user chat, or anything else that requires live reactions. I also do not want to wear out my hard drive by constantly writing to it.
As for the Python code, what I have tried so far to create a video is the following. It still saves to disk and is not responsive in real time, so it is not very useful to me. The video itself is okay, with one exception: as time passes, the time encoded in the QR code and the video's own clock drift further and further apart toward the end of the video. I can work around that limitation if it shows up while live streaming.
import numpy
import qrcode
from moviepy import editor as mpy

def make_frame(t):
    # Render the current clip time into a QR code and return it as an RGB array
    img = qrcode.make("Hello! The second is %s!" % t)
    return numpy.array(img.convert("RGB"))

clip = mpy.VideoClip(make_frame, duration=120)
clip.write_gif("test.gif", fps=15)
gifclip = mpy.VideoFileClip("test.gif")
gifclip.set_duration(120).write_videofile("test.mp4", fps=15)
My goal is to be able to produce something along the lines of this pseudo-code:
original_video = qrcode_generator("I don't know, a clock, pyotp, today's news sources, just anything that can be generated on the fly!")
original_video.overlay_text(0,0,"This is some sample text, the left two are coordinates, the right three are font, size, and color", Times_New_Roman, 12, Blue)
original_video.add_audio(sine_wave_generator(0,180,2)) # frequency min-max, seconds
# NOTICE - I did not add any time measurements to the video itself. The whole point is that this is a live stream, not a video clip, so the time frame would be now. The 2 seconds listed above are for our pseudo sine wave generator to know how long the audio clip should be, not for the actual streaming library.
stream.send_to_rtmp_server(original_video) # Doesn't matter if ffmpeg or some native library
The above example is what I am looking for in terms of video creation in Python and then streaming. I am not trying to create a clip and stream it later; I am trying to have the program respond to outside events and then update its stream to do whatever it wants. It is sort of like a chat bot, but with video instead of text.
def track_movement(...):
...
return ...
original_video = user_submitted_clip(chat.lastVideoMessage)
original_video.overlay_text(0,0,"The robot watches the user's movements and puts a blue square around it.", Times_New_Roman, 12, Blue)
original_video.add_audio(sine_wave_generator(0,180,2)) # frequency min-max, seconds
# It would be awesome if I could also figure out how to perform advanced actions, such as tracking movements or pulling a face out of a clip and then applying effects to it on the fly. I know OpenCV can track movements and I hear that it can work with streams, but I cannot figure out how that works. Any help would be appreciated! Thanks!
Because I forgot to add the imports, here are some useful imports I have in my file!
import pyotp
import qrcode
from io import BytesIO
from moviepy import editor as mpy
The pyotp library generates one-time password (OTP) authenticator codes, qrcode makes the QR codes, BytesIO is used for in-memory files, and moviepy is what I used to generate the GIF and MP4. I believe BytesIO might be useful for piping data to the streaming service, but how that would work depends entirely on how the data is sent to the service, whether through the ffmpeg command line (from subprocess import Popen, PIPE) or through a native library.
Are you using ffmpeg.exe and running a command through CMD? If so, you can use either the concat demuxer or a pipe. With the concat demuxer, ffmpeg takes its image input from a text file. The text file should contain the image paths, and ffmpeg can find those images in different folders. The following command shows how to use the concat demuxer, with the image locations saved in input.txt.
ffmpeg -f concat -i input.txt -vsync vfr -pix_fmt yuv420p output.mp4
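Writing input.txt by hand gets tedious, so a small helper can generate it. A sketch, assuming a fixed display time per image; the function names are illustrative. It uses the concat demuxer's file and duration directives, escapes single quotes as '\'', and lists the last image a second time, as the FFmpeg wiki's slideshow recipe recommends, because otherwise the final duration is not honoured.

```python
def _quote(path) -> str:
    # Inside single quotes a literal ' is written as '\'' (close, escaped quote, reopen)
    return "'" + str(path).replace("'", "'\\''") + "'"


def concat_list(paths, seconds_per_image) -> str:
    """Build concat-demuxer text: a `file` and a `duration` line per image."""
    if not paths:
        return ""
    lines = []
    for p in paths:
        lines.append(f"file {_quote(p)}")
        lines.append(f"duration {seconds_per_image}")
    # Repeat the last image without a duration so its duration is honoured
    lines.append(f"file {_quote(paths[-1])}")
    return "\n".join(lines) + "\n"
```

Write it with pathlib.Path("input.txt").write_text(concat_list(images, 0.5)) before running the command above; absolute paths may additionally need -safe 0.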
But the most suitable solution would be to use a pipe to feed images to ffmpeg.
cat *.png | ffmpeg -f image2pipe -i - output.mkv
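A Python version of the pipe idea that never touches the disk. Instead of image2pipe with PNG files, this sketch swaps in raw RGB frames written straight to ffmpeg's stdin (-f rawvideo), and uses ffmpeg's lavfi sine source as a stand-in for generated audio. The frame size, frame rate, tone, and the solid_frame and run helpers are illustrative assumptions; real frames could come from PIL/numpy, e.g. numpy.asarray(img.convert("RGB")).tobytes(), as long as they match WIDTH x HEIGHT.

```python
import subprocess
import time

WIDTH, HEIGHT, FPS = 640, 360, 15  # illustrative values


def stream_cmd(rtmp_url: str) -> list:
    return [
        "ffmpeg",
        # input 0: raw RGB frames written by this script to stdin
        "-f", "rawvideo", "-pix_fmt", "rgb24", "-s", f"{WIDTH}x{HEIGHT}",
        "-framerate", str(FPS), "-i", "-",
        # input 1: a generated test tone (stand-in for live audio)
        "-f", "lavfi", "-i", "sine=frequency=440:sample_rate=44100",
        "-map", "0:v", "-map", "1:a",
        "-c:v", "libx264", "-preset", "veryfast", "-pix_fmt", "yuv420p",
        "-g", str(FPS * 2),  # a keyframe every 2 s
        "-c:a", "aac", "-b:a", "128k",
        "-shortest",  # stop when the video pipe closes (the tone is endless)
        "-f", "flv", rtmp_url,
    ]


def solid_frame(r: int, g: int, b: int) -> bytes:
    """One frame of a single colour, in rgb24 byte order."""
    return bytes((r, g, b)) * (WIDTH * HEIGHT)


def run(rtmp_url: str, n_frames: int) -> None:
    proc = subprocess.Popen(stream_cmd(rtmp_url), stdin=subprocess.PIPE)
    start = time.monotonic()
    try:
        for i in range(n_frames):
            proc.stdin.write(solid_frame(i % 256, 0, 128))
            # sleep until the next frame is due so the stream runs in real time
            delay = start + (i + 1) / FPS - time.monotonic()
            if delay > 0:
                time.sleep(delay)
    finally:
        proc.stdin.close()
        proc.wait()
```

Feeding generated audio as well needs a second channel into ffmpeg (for example a named pipe), which this sketch does not cover.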
You can check the FFmpeg documentation on the pipe protocol for more information about feeding ffmpeg through a pipe.
Generating multiple videos and streaming at real time is not a very stable solution. You can run into several problems.
I have settled on using GStreamer to create my streams on the fly. It lets me take separate video and audio streams and combine them. I do not have a working example right now, but I hope to either get an answer or figure it out myself soon in my question "Gstreamer in Python exits instantly, but is fine on command line".

FFMPEG: 4-channel audio workflow suggestions?

I’ve got a bunch of stereo files recorded for a documentary with a Zoom in 4-channel mode. Basically, they are pairs of stereo files: file A would be a stereo file with a lav or boom mic recording, and file B, of identical length, would be a proper stereo recording made by the Zoom itself.
Now I’m trying to convert all this into something I can correctly ingest into my editing suite. The A files are a mess, but I came up with an ffmpeg script which downconverts them to mono and then converts them back to stereo (to get rid of inconsistencies). Now, how do I merge two stereo files into a single WAV or AIFF file containing two separate stereo pairs? I browsed around for workflows and/or standards on that but can’t really find anything useful.
Any ideas on how to do that with ffmpeg (or anything else, really) would be appreciated!
I don't know whether FCP-X reads multi-track WAVs, but you can output to a multi-track MOV.
ffmpeg -i file1.wav -i file2.wav -c copy -map 0 -map 1 file.mov
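If a single file is preferred over the multi-track MOV, ffmpeg's amerge filter can stack the two stereo files into one 4-channel WAV. A sketch, assuming both files share a sample rate and that 24-bit PCM output is acceptable (the function name is illustrative). Editors may present the result as four mono channels rather than two stereo pairs, so check how your NLE interprets it.

```python
def merge_to_4ch_cmd(stereo_a: str, stereo_b: str, out_wav: str) -> list:
    """ffmpeg command stacking two stereo inputs into one 4-channel WAV."""
    return [
        "ffmpeg", "-i", stereo_a, "-i", stereo_b,
        # amerge outputs the channels in input order: A.L, A.R, B.L, B.R
        "-filter_complex", "[0:a][1:a]amerge=inputs=2[a]",
        "-map", "[a]",
        "-c:a", "pcm_s24le",  # assumption: 24-bit PCM suits the editing suite
        out_wav,
    ]
```

Run it with subprocess.run(merge_to_4ch_cmd("fileA.wav", "fileB.wav", "merged.wav"), check=True).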

ffmpeg conversion to mp4 shifts the audio by one frame

I have a .mov file (codec = Motion JPEG) whose audio stream contains a short pulse every second.
When I convert this file to mp4 using ffmpeg I notice that all my pulses are now off by one frame.
I simply used "ffmpeg -i source_file.mov target_file.mp4"
Here is an image of the comparison between the audio signals:
A1 is the original audio (.mov) and A2 is the mp4 output audio of ffmpeg.
As you can see the pulses are one frame late compared to the original.
I know that the H.264 codec is lossy, but a one-frame offset seems like a big loss if you ask me.
Is there any option I could use with ffmpeg to get a better audio stream?
Here is the input file: https://www.dropbox.com/s/6y5g7lo5dvu0ub1/BBB_09_tree_trunk_009_ANIM_001.mov?dl=0
Here is the output file:
https://www.dropbox.com/s/10zuzwn0qs8l853/BBB_09_tree_trunk_009_ANIM_001.mp4?dl=0
If you copy the audio over, you shouldn't get the shift.
ffmpeg -i source_file.mov -c:a copy target_file.mp4
I've been working on this issue for my own needs and my file format has to be mp4. I'm working from mxf files. I've tried several options and found this to give the most accurate result (I've removed specifics for simplicity):
ffmpeg -ss 00:00:00.021 -i "input.mxf" -itsoffset -0.044 -i "input.mxf" -c:v libx264 -c:a aac -map 0:a -map 1:v "output.mp4"
Starting the first input at 21 ms and mapping it as the audio, then shifting the video back by 44 ms, gave me the most accurate sync (within several samples). I don't know why 22 ms wasn't as accurate (when that's what the primer-sample issue seems to equate to), and I found nothing that allowed me to work more granularly, in samples. A filter with a PTS offset had no effect. Perhaps it works differently with different file formats. It's also worth noting that the same command without -itsoffset gave the same sync result with one difference: the video stream duration was 1 frame and 1 ms off the audio and container durations. With -itsoffset, the durations differed by only 1 ms. You can use 22 ms to get an accurate duration, but check your sync; it might be off by that slightest bit more.
Also worth noting: I stumbled across some developer commentary on the -itsoffset option which clarified that it doesn't work on audio, only on video. The answer above seems to suggest mapping the offset against the audio, which apparently is not how the option is built to work. https://trac.ffmpeg.org/ticket/1349
Try MP2 audio (-acodec mp2); it worked for me.
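The arithmetic behind the 21 ms versus 22 ms question, assuming the usual 1024-sample AAC encoder delay (priming): 1024 samples is about 21.33 ms at 48 kHz, so a millisecond-granular -ss can only land near it. At 48 kHz, 21 ms is 1008 samples (16 short) and 22 ms is 1056 samples (32 over), which might be why 21 ms measured closer. The helper names are illustrative.

```python
def samples_to_ms(samples: int, sample_rate: int) -> float:
    """Duration of a sample count in milliseconds."""
    return 1000.0 * samples / sample_rate


def ms_to_samples(ms: float, sample_rate: int) -> int:
    """Nearest whole sample count for a duration in milliseconds."""
    return round(ms * sample_rate / 1000.0)


AAC_PRIMING = 1024  # assumption: typical AAC encoder delay in samples

for rate in (44100, 48000):
    print(rate, "Hz:", round(samples_to_ms(AAC_PRIMING, rate), 2), "ms of priming")
    print("  21 ms ->", ms_to_samples(21, rate), "samples;",
          "22 ms ->", ms_to_samples(22, rate), "samples")
```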

Resources