hls video: Why is there a difference between segment length and target duration? - http-live-streaming

I am trying to prepare the files for an HLS video.
As the player I am using video.js, and I am transcoding my content with ffmpeg into multiple streams of different sizes and bitrates.
I tried a lot of options, but mainly I kept the framerate and bitrate constant and produced an I-frame every second, as I want to have 3s segments.
Then I segmented the streams with mp4hls and processed the playlists.
It all seems to work perfectly; the playlists are correct, and so are the I-frames, but:
The length of the segments is 2 seconds and not the expected 3 seconds.
The ffmpeg options are something like:
...-b:v 192k -bufsize 200k -maxrate 192k -r 30 -g 30 -x264opts no-scenecut
and in python:
mp4hls --segment-duration 3 320x180.mp4 480x270.mp4 640x360.mp4 ....
I would like to know whether there is an error somewhere in my workflow,
or whether this is correct, as I read in the HLS specification
that a segment must be equal to or shorter than the #EXT-X-TARGETDURATION:3.
Can somebody please explain to a beginner why the segments are not the same length as written in the playlist? I could find nothing about this topic. Thanks.

Segments don't all need to be the same duration. The target duration is an upper bound: each segment's duration, rounded to the nearest integer, must not exceed it. Players also use it to decide how often to reload a live playlist.

This is almost always a framerate issue. It looks like you're using FFmpeg. It has default settings for scene-cut detection and works out its own keyframes, so you need to set this explicitly if you want specifically timed segments. Off the top of my head, you need to set the GOP and keyframe values. Something like:
-g 60 -keyint_min 60 -sc_threshold 0
will give you strict 2-second segments with a 30fps input.
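The arithmetic behind the -g value can be sketched as follows. Segmenters typically cut only at keyframes, so the GOP length in frames should equal the frame rate times the desired segment duration. This is a minimal Python sketch; the function names are illustrative and are not part of ffmpeg or mp4hls.

```python
def gop_size(fps: float, segment_seconds: float) -> int:
    """Number of frames between keyframes for one keyframe per segment."""
    frames = fps * segment_seconds
    if abs(frames - round(frames)) > 1e-9:
        raise ValueError("fps * segment_seconds must be a whole number of frames")
    return round(frames)


def keyframe_flags(fps: float, segment_seconds: float) -> list:
    """Build the fixed-GOP options quoted in the answer for a given segment length."""
    g = str(gop_size(fps, segment_seconds))
    return ["-g", g, "-keyint_min", g, "-sc_threshold", "0"]


print(gop_size(30, 2))                 # 60
print(gop_size(30, 3))                 # 90
print(" ".join(keyframe_flags(30, 3)))
```

With a 30fps input, 3-second segments would therefore need a GOP of 90 frames rather than 60.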

Related

Is there a way to ensure mp3 duration accuracy with variable bit rate using FFMPEG?

In our application, we are processing audio files using ffmpeg. Specifically, we use the NodeJS library fluent-ffmpeg.
Our audio files are generated by various text-to-speech providers. We recently noticed that when we used SSML to add pauses to the generated audio, the duration reported for the converted file was no longer correct. Upon further investigation, we noticed that the durations of the standard audio files were also incorrect, just closer overall because the data is more consistent. When we put a pause at the beginning of the audio, the estimate was the worst, overshooting by a very large margin (e.g., a 25s audio clip would read as 3 minutes long, but skip to the end when playing past the 25s mark).
I did some searching and research into the structure of MP3 files, and to me it seems like the issue is because the duration gets estimated by various audio players. Windows media player is an example, but Firefox's web player seems to also do this. I tried changing the ffmpeg command from using .audioQuality(0), which sets ffmpeg to use VBR, to .audioBitrate(320), which tells ffmpeg to use a constant bitrate.
For reference, we are using libmp3lame, and the full commands that get run are the following, for the VBR and CBR cases respectively:
For VBR (broken durations): ffmpeg -i <URL> -acodec libmp3lame -aq 0 -f mp3 pipe:1
For CBR (correct duration): ffmpeg -i <URL> -acodec libmp3lame -b:a 320k -f mp3 pipe:1
Note: we then pipe the output to the requesting client application after sending the appropriate file headers, hence the pipe:1 output. The input is a cloud storage url where the source file is located
This fixes our problem of having a correct duration, and it makes sense to me why this would fix it if the problem was because the duration is being estimated by some of these players / audio consumers. But, this came at the cost that the file size was significantly larger, which also makes sense to me. While testing we found that compared to the same file in WAV, the VBR mp3 was about 10% of the WAV file size, while the CBR mp3 was still 50% of the WAV file size. This practically defeats the purpose of supporting the mp3 format for our use-case, which is a smaller but slightly lossy alternative to the large WAV file.
While researching, I found that there can be ID3 tags in a chunk at the beginning of the mp3 file, giving the consumer of the audio information before it has processed the whole file. But I also found that there doesn't seem to be a standard, at least for duration; the tags are more for things like song title, album, artist, etc.
My question is, is there a way to get the proper duration onto an mp3 file, preferably via some ffmpeg mechanism, while still using VBR? Thanks!
FFmpeg does write a Xing header by default with duration info. However, that value is only known after the entire stream data has been received, so ffmpeg has to seek to the head to write it. Since you're piping the output, that can't be done.
Write the file locally or to some seekable destination, and then upload.
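To see why a player that finds no Xing header can be so far off, here is a rough Python sketch of the kind of estimate such a player might make: file size divided by the bitrate of the first frame it reads. The function name and all numbers are illustrative assumptions, not measurements from the files in the question.

```python
def estimated_duration_seconds(file_size_bytes: int, assumed_bitrate_bps: int) -> float:
    """Duration a player would guess if it assumed one constant bitrate."""
    return file_size_bytes * 8 / assumed_bitrate_bps


# Hypothetical VBR file: 25 seconds long at an average of 230 kbit/s.
true_seconds = 25
average_bps = 230_000
file_size = true_seconds * average_bps // 8  # 718750 bytes

# Assumption: the leading pause is encoded with small 32 kbit/s frames,
# and the player takes that first frame's bitrate as representative.
print(estimated_duration_seconds(file_size, 32_000))       # 179.6875 (about 3 minutes)
print(estimated_duration_seconds(file_size, average_bps))  # 25.0
```

A CBR file avoids the problem because every frame has the same bitrate, which is why the 320k command reports the right duration even when piped.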

Frequency response of ffmpeg filters

I'm using ffmpeg to decode and encode a signal. It works perfectly, and I added filters. For example, I'm using a command like this:
ffmpeg -re -i /home/dr_click/live.wav -af "anequalizer=c0 f=200 w=100 g=-5 t=0|c1 f=200 w=100 g=-5 t=0, anequalizer=c0 f=1000 w=100 g=3 t=0|c1 f=1000 w=100 g=3 t=0" -acodec pcm_s16be -ar 44100 -ac 2 -f rtp rtp://127.0.0.1:1234
I'm streaming my file, adding 2 filters with 200 Hz and 1000 Hz as center frequencies and a 100 Hz width, and it works.
With such a filter, I know my gain will be -5 dB at 200 Hz. But what is the gain at 250 Hz? Still -5 dB? -4.5 dB? -3 dB? And the same question at 350 Hz or any other frequency.
What I'm looking for and didn't find is a way to get the frequency response of such a filter over a bandwidth from 20 Hz to 20 kHz. In other words, what I'd like to know for any frequency is: gain = f(frequency) for a given ffmpeg filter.
Thank you for your help,
Dr_Click
I'm working on a quite similar issue. Mine is to replace the system-wide 15-band graphical LADSPA equalizer (mbeq_1197, controlled by JACK Rack) with an ffmpeg filter. As it is AFAIK impossible to adjust ffmpeg filter parameters during runtime, I have to rely on my already generated JACK EQ settings and need to transfer them to the ffmpeg EQ. Alas, I could not find any two "comparable" EQs: ffmpeg only offers an 18-band "superequalizer". My previous EQ has 15 bands, so I decided to do some interpolation and compare the frequency responses of the old and the new EQ.
Now to answer your question: I'm not an audio engineer, and I'm sure there are more professional ways. But what I found out for now is my current workflow:
Generate some white noise. On Linux you can e.g. use sox or Audacity. In Audacity, do Generate -> Built-in -> Noise... => White noise (1 min should be enough).
Save the file as WAV.
Apply your filter to this WAV: ffmpeg -i whitenoise.wav -af "<your filter>" whitenoise_filtered.wav
Load the filtered file into Audacity and do Analyze -> Plot Spectrum...
The output will be a little scattered because the white noise is not perfect, but this should be negligible.
Good luck!
Flittermice
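As a scriptable alternative to reading the spectrum plot by eye, the gain at one frequency can be estimated by comparing the same DFT bin of the unfiltered and the filtered signal. This is a minimal sketch in plain Python, assuming both signals are already loaded as equal-length lists of float samples (reading the WAV files is left out); the function names are illustrative. With white noise a single-bin estimate is noisy, so a sine tone at the frequency of interest, or averaging over many blocks, should give steadier values.

```python
import cmath
import math


def bin_magnitude(samples, freq_hz, sample_rate):
    """Magnitude of the signal's DFT at one frequency (direct correlation)."""
    step = -2j * math.pi * freq_hz / sample_rate
    acc = sum(s * cmath.exp(step * n) for n, s in enumerate(samples))
    return abs(acc) / len(samples)


def gain_db(original, filtered, freq_hz, sample_rate):
    """Gain of the filter at freq_hz, in dB, from input and output samples."""
    ratio = bin_magnitude(filtered, freq_hz, sample_rate) / bin_magnitude(
        original, freq_hz, sample_rate
    )
    return 20 * math.log10(ratio)


# Synthetic check: a 200 Hz tone and a copy at half the amplitude,
# standing in for the signal before and after a filter.
rate = 8000
tone = [math.sin(2 * math.pi * 200 * n / rate) for n in range(rate)]
halved = [0.5 * s for s in tone]
print(round(gain_db(tone, halved, 200, rate), 2))  # -6.02
```

Repeating the measurement for a list of frequencies between 20 Hz and 20 kHz gives a table of gain against frequency.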

mkv file out of sync with linear drift

I have a bunch of mkv files, with FLAC as the audio codec and FFV1 as the video one.
The files were created using an EasyCap acquisition dongle from an analog VCR source. Specifically, I used VLC's "open acquisition device" prompt and selected PAL. Then, I converted the files (audio PCM, video raw YUV) to (FLAC, FFV1) using
ffmpeg.exe -i input.avi -acodec flac -vcodec ffv1 -level 3 -threads 4 -coder 1 -context 1 -g 1 -slices 24 -slicecrc 1 output.mkv
Now, the files are progressively out of sync. It may be due to the fact that while (maybe) the video has a constant framerate, the FLAC track has variable framerate. So, is there a way to sync the track to audio, or something alike? Can FFmpeg do this? Thanks
EDIT
On Mulvya's hint, I plotted the sync difference at various times; the first column shows the seconds elapsed, the second shows the difference in seconds. The plot looks linear, with a constant slope of 0.0078. NOTE: measurements were taken by hand, using a stopwatch.
EDIT 2
Playing around with VirtualDub, I found that changing the framerate to 25 fps from the original 24.889 (Video->Frame rate...->Change frame rate to) and using the track converted to wav definitely does work. Two problems, though: VirtualDub crashes when importing the original FFV1-FLAC mkv file, so I had to convert the video to H264 to try it out; moreover, I find it difficult to use an external encoder to save VirtualDub's output.
So, could I avoid using VirtualDub, and simply use ffmpeg for it? Here's the exported vdscript:
VirtualDub.audio.SetSource("E:\\4_track2.wav", "");
VirtualDub.audio.SetMode(0);
VirtualDub.audio.SetInterleave(1,500,1,0,0);
VirtualDub.audio.SetClipMode(1,1);
VirtualDub.audio.SetEditMode(1);
VirtualDub.audio.SetConversion(0,0,0,0,0);
VirtualDub.audio.SetVolume();
VirtualDub.audio.SetCompression();
VirtualDub.audio.EnableFilterGraph(0);
VirtualDub.video.SetInputFormat(0);
VirtualDub.video.SetOutputFormat(7);
VirtualDub.video.SetMode(3);
VirtualDub.video.SetSmartRendering(0);
VirtualDub.video.SetPreserveEmptyFrames(0);
VirtualDub.video.SetFrameRate2(25,1,1);
VirtualDub.video.SetIVTC(0, 0, 0, 0);
VirtualDub.video.SetCompression();
VirtualDub.video.filters.Clear();
VirtualDub.audio.filters.Clear();
The first line imports the wav-converted audio track.
Can I set an equivalent pipe in ffmpeg (possibly, using FLAC - not wav)? SetFrameRate2 is maybe the key, here.
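A frame-rate mismatch of this kind produces a drift that grows linearly with time. Here is a small Python sketch of the arithmetic, assuming the frames were really captured at 25 fps but are timestamped as 24.889 fps; the function name is illustrative, and the slope it gives need not match hand-timed measurements.

```python
def drift_seconds(elapsed_seconds: float, nominal_fps: float, true_fps: float) -> float:
    """How far the video lags the audio after elapsed_seconds of real time.

    A frame captured at real time t has index true_fps * t and is shown
    at true_fps * t / nominal_fps, so the lag is t * (true_fps / nominal_fps - 1).
    """
    return elapsed_seconds * (true_fps / nominal_fps - 1.0)


for minutes in (1, 10, 30):
    t = minutes * 60
    print(minutes, "min:", round(drift_seconds(t, 24.889, 25.0), 3), "s")
```

Under this assumption the lag is a little under half a percent of the elapsed time, and it vanishes once the video is retimed to 25 fps.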

How to Simply Remove Duplicate Frames from a Video using ffmpeg

First of all, I'd preface this by saying I'm NO EXPERT with video manipulation,
although I've been fiddling with ffmpeg for years (in a fairly limited way). Hence, I'm not too flash with all the language folk often use... and how it affects what I'm trying to do in my manipulations... but I'll have a go with this anyway...
I've checked a few links here, for example:
ffmpeg - remove sequentially duplicate frames
...but the content didn't really help me.
I have some hundreds of video clips that have been created under both Windows and Linux using both ffmpeg and other similar applications. However, they have some problems with times in the video where the display is 'motionless'.
As an example, let's say we have some web site that streams a live video into, say, a Flash video player/plugin in a web browser. In this case, we're talking about a traffic camera video stream, for example.
There's an instance of ffmpeg running that is capturing a region of the (Windows) desktop into a video file, viz:-
ffmpeg -hide_banner -y -f dshow ^
-i video="screen-capture-recorder" ^
-vf "setpts=1.00*PTS,crop=448:336:620:360" ^
-an -r 25 -vcodec libx264 -crf 0 -qp 0 ^
-preset ultrafast SAMPLE.flv
Let's say the actual 'display' that is being captured looks like this:-
123456789 XXXXX 1234567 XXXXXXXXXXX 123456789 XXXXXXX
^---a---^ ^-P-^ ^--b--^ ^----Q----^ ^---c---^ ^--R--^
...where each character position represents a (sequence of) frame(s). Owing to a poor internet connection, a "single frame" can be displayed for an extended period (the 'X' characters being an (almost) exact copy of the immediately previous frame). So this means we have segments of the captured video where the image doesn't change at all (to the naked eye, anyway).
How can we deal with the duplicate frames?... and how does our approach change if the 'duplicates' are NOT the same to ffmpeg but LOOK more-or-less the same to the viewer?
If we simply remove the duplicate frames, the 'pacing' of the video is lost, and what used to take, maybe, 5 seconds to display, now takes a fraction of a second, giving a very jerky, unnatural motion, although there are no duplicate images in the video. This seems to be achievable using ffmpeg with an 'mp_decimate' option, viz:-
ffmpeg -i SAMPLE.flv ^ ... (i)
-r 25 ^
-vf mpdecimate,setpts=N/FRAME_RATE/TB DEC_SAMPLE.mp4
That reference I quoted uses a command that shows which frames 'mp_decimate' will remove when it considers them to be 'the same', viz:-
ffmpeg -i SAMPLE.flv ^ ... (ii)
-vf mpdecimate ^
-loglevel debug -f null -
...but knowing that (complicated formatted) information, how can we re-organize the video without executing multiple runs of ffmpeg to extract 'slices' of video for re-combining later?
In that case, I'm guessing we'd have to run something like:-
user specifies a 'threshold duration' for the duplicates (maybe run for 1 sec only)
determine & save main video information (fps, etc - assuming constant frame rate)
map the (frame/time where duplicates start)->no. of frames/duration of duplicates
if the duration of duplicates is less than the user threshold, don't consider this period as a 'series of duplicate frames' and move on
extract the 'non-duplicate' video segments (a, b & c in the diagram above)
create 'new video' (empty) with original video's specs
for each video segment
    extract the last frame of the segment
    create a short video clip with repeated frames of the frame just extracted (duration = user spec. = 1 sec)
    append (current video segment+short clip) to 'new video' and repeat
...but in my case, a lot of the captured videos might be 30 minutes long and have hundreds of 10 sec long pauses, so the 'rebuilding' of the videos will take a long time using this method.
This is why I'm hoping there's some "reliable" and "more intelligent" way to use ffmpeg (with/without the 'mp_decimate' filter) to do the 'decimate' function in only a couple of passes or so... Maybe there's a way that the required segments could even be specified (in a text file, for example) and as ffmpeg runs it will stop/restart its transcoding at specified times/frame numbers?
Short of this, is there another application (for use on Windows or Linux) that could do what I'm looking for, without having to manually set start/stop points, extracting/combining video segments manually...?
I've been trying to do all this with ffmpeg N-79824-gcaee88d under Win7-SP1 and (a different version I don't currently remember) under Puppy Linux Slacko 5.6.4.
Thanks a heap for any clues.
I assume what you want to do is to keep frames with motion and up to 1 second of duplicate frames but discard the rest.
ffmpeg -i in.mp4 -vf "select='if(gt(scene,0.01),st(1,t),lte(t-ld(1),1))',setpts=N/FRAME_RATE/TB" trimmed.mp4
What the select filter expression does is make use of an if-then-else operator:
gt(scene,0.01) checks if the current frame has detected motion relative to the previous frame. The value will have to be calibrated based on manual observation by seeing which value accurately captures actual activity as compared to sensor/compression noise or visual noise in the frame. See here on how to get a list of all scene change values.
If the frame is evaluated to have motion, the then clause evaluates st(1,t). The function st(val,expr) stores the value of expr in a variable numbered val and it also returns that expression value as its result. So, the timestamp of the kept frames will keep on being updated in that variable until a static frame is encountered.
The else clause checks the difference between the current frame timestamp and the timestamp of the stored value. If the difference is less than 1 second, the frame is kept, else discarded.
The setpts sanitizes the timestamps of all selected frames.
Edit: I tested my command with a video input I synthesized and it worked.
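The logic of the select expression can be traced outside ffmpeg with a small simulation. This is a minimal Python sketch, assuming each frame is given as a (timestamp, scene score) pair and that the expression variable starts at 0; the function name and the sample data are made up for illustration.

```python
def select_frames(frames, threshold=0.01, hold=1.0):
    """Mimic select='if(gt(scene,T),st(1,t),lte(t-ld(1),H))' on (t, scene) pairs."""
    stored = 0.0  # variable 1; assumed to start at 0
    kept = []
    for t, scene in frames:
        if scene > threshold:
            stored = t          # st(1,t) stores t ...
            result = t          # ... and returns it as the expression value
        else:
            # Static frame: keep it only while within `hold` seconds
            # of the last frame that had motion.
            result = 1.0 if (t - stored) <= hold else 0.0
        if result != 0:         # select keeps frames with a non-zero result
            kept.append(t)
    return kept


# 2 fps sample: motion at 0.5 s and 3.0 s, static frames in between.
sample = [(0.0, 0.0), (0.5, 0.5), (1.0, 0.0), (1.5, 0.0),
          (2.0, 0.0), (2.5, 0.0), (3.0, 0.5)]
print(select_frames(sample))  # [0.0, 0.5, 1.0, 1.5, 3.0]
```

The static frames at 2.0 s and 2.5 s are more than one second past the last motion frame, so they are dropped; setpts then renumbers whatever is left.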
I've done a bit of work on this question... and have found the following works pretty well...
It seems like the input video has to have a "constant frame rate" for things to work properly, so the first command is:-
ffmpeg -i test.mp4 ^
-vf "scale=trunc(iw/2)*2:trunc(ih/2)*2" ^
-vsync cfr test01.mp4
I then need to look at the 'scores' for each frame. Such a listing is produced by:-
ffmpeg -i test01.mp4 ^
-vf select="'gte(scene,0)',metadata=print" -f null -
I'll look at all those scores... and average them (mean) - a bit dodgy but it seems to work Ok. In this example, that average score is '0.021187'.
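Averaging the scores by hand gets tedious, so here is a short Python sketch that pulls them out of the saved log and computes the mean. It assumes the log contains lines with lavfi.scene_score=<value>, as printed by metadata=print; the sample text below is made up, and the function name is illustrative.

```python
import re
import statistics

SCORE_RE = re.compile(r"lavfi\.scene_score=([0-9]*\.?[0-9]+)")


def mean_scene_score(log_text: str) -> float:
    """Mean of all lavfi.scene_score values found in an ffmpeg log."""
    scores = [float(m) for m in SCORE_RE.findall(log_text)]
    if not scores:
        raise ValueError("no scene scores found in the log")
    return statistics.mean(scores)


# Made-up excerpt in the assumed log format.
sample_log = """\
[Parsed_metadata_1 @ 0x0] frame:0    pts:0       pts_time:0
[Parsed_metadata_1 @ 0x0] lavfi.scene_score=0.000000
[Parsed_metadata_1 @ 0x0] frame:1    pts:512     pts_time:0.04
[Parsed_metadata_1 @ 0x0] lavfi.scene_score=0.020000
[Parsed_metadata_1 @ 0x0] frame:2    pts:1024    pts_time:0.08
[Parsed_metadata_1 @ 0x0] lavfi.scene_score=0.040000
"""
print(round(mean_scene_score(sample_log), 6))  # 0.02
```

ffmpeg writes its log to stderr, so the listing has to be captured from there before it can be fed to the function.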
I then have to select a 'persistence' value -- how long to let the 'duplicated' frames run. If you force it to only keep one frame, the entire video will tend to run much too quickly... So, I've been using 0.2 seconds as a starting point.
So the next command becomes:-
ffmpeg -i test01.mp4 ^
-vf "select='if(gt(scene,0.021187),st(1,t),lte(t-ld(1),0.20))',setpts=N/FRAME_RATE/TB" output.mp4
After that, the resultant 'output.mp4' video seems to work pretty well. It's only a bit of fiddling with the 'persistence' value that might need to be done to compromise between having a smoother-playing video and scenes that change a bit abruptly.
I've put together some Perl code that works Ok, which I'll work out how to post, if folks are interested in it... eventually(!)
Edit: Another advantage of doing this 'decimating', is that files are of shorter duration (obviously) AND they are smaller in size. For example, a sample video that ran for 00:07:14 and was 22MB in size went to 00:05:35 and 11MB.
Variable frame rate encoding is totally possible, but I don't think it does what you think it does. I am assuming that you wish to remove these duplicate frames to save space/bandwidth? If so, it will not work, because the codec is already doing it. Codecs use reference frames and only encode what has changed from the reference. Hence the duplicate frames take almost no space to begin with. Basically, each frame is encoded as a packet of data saying: copy the previous frame, and make this change. The X frames have zero changes, so it only takes a few bytes to encode each one.

Setting dwScale and dwRate values in the AVISTREAMHEADER structure at AVI muxing

While capturing from audio and video sources and encoding into an AVI container, I set audio as the master stream for synchronizing audio and video, and this gave the best synchronization result.
http://msdn.microsoft.com/en-us/library/windows/desktop/dd312034(v=vs.85).aspx
But this method gives a higher FPS value as a result: about 40 or 50 instead of 30 FPS.
If the media file is just played back, everything is OK, but if I try to re-encode it to another video format with different software, it goes out of sync.
How can I programmatically set dwScale and dwRate values in the AVISTREAMHEADER structure at AVI muxing?
MSDN:
This method works by adjusting the dwScale and dwRate values in the AVISTREAMHEADER structure.
You requested that the multiplexer manage the scale/rate values, so you cannot adjust them. You should be seeing more odd things in your file, not just a higher FPS. The file itself is perhaps out of sync, and as soon as you process it with other applications that don't fine-tune playback, you start seeing issues. You might have a video media type showing one frame rate while the effective rate is different.
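For reference, the rate an AVI stream header declares is dwRate divided by dwScale, in samples (frames, for video) per second. Here is a tiny Python sketch of that relationship; the helper names are illustrative, and the code does not read or write a real AVI file.

```python
from fractions import Fraction


def avi_frame_rate(dw_rate: int, dw_scale: int) -> Fraction:
    """Frames per second declared by a stream header: dwRate / dwScale."""
    if dw_scale == 0:
        raise ValueError("dwScale must be non-zero")
    return Fraction(dw_rate, dw_scale)


def rate_and_scale_for(fps: Fraction):
    """One (dwRate, dwScale) pair that declares exactly the given frame rate."""
    return fps.numerator, fps.denominator


print(float(avi_frame_rate(30, 1)))               # 30.0
print(rate_and_scale_for(Fraction(30000, 1001)))  # (30000, 1001)
```

A file that reports 40 or 50 FPS while the capture ran at 30 would show up here as a dwRate/dwScale ratio that no longer matches the real capture rate.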
