What is the function of the SegmentTimeline element in MPD files? - mpeg-dash

I am looking at the differences between the SegmentTimelines of one MPD and the next, but I don't know what function they serve in a live video stream.
For example, in Facebook Gaming I have this MPD:
https://video.fagp1-1.fna.fbcdn.net/hvideo-ash2-ftw/s3/v/rv2BbY6JOGDx9uSFI3pAo/live-dash/dash-abr6/2334520010097563.mpd?_nc_rl=AfASGXHXqR__Qe_w&oh=9b8743ae3dd255d8d4363515f7c07f2d&oe=5CDCDFAF
But I have a video segment with a different time, like this:
https://video.fagp1-1.fna.fbcdn.net/hvideo-ash2-ftw/s3/v/rv2BbY6JOGDx9uSFI3pAo/live-dash/live-q1-v/2334520010097563_0-15243750.m4v
Could it be a synchronization problem?
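For context: in live DASH, the SegmentTimeline inside a SegmentTemplate lists the available media segments as S elements, where @t is a start time and @d a duration (both in @timescale units) and @r a repeat count; every refreshed MPD shifts that list as old segments fall out of the availability window and new ones are appended, which is why consecutive manifests differ. A minimal sketch of reading those values, assuming a locally saved copy of the manifest named manifest.mpd:
# Minimal sketch: list the media segments described by a SegmentTimeline.
# Assumes a downloaded MPD saved as "manifest.mpd" that uses the standard
# DASH namespace; element/attribute names follow ISO/IEC 23009-1.
import xml.etree.ElementTree as ET

NS = {"dash": "urn:mpeg:dash:schema:mpd:2011"}

root = ET.parse("manifest.mpd").getroot()

for template in root.iter("{urn:mpeg:dash:schema:mpd:2011}SegmentTemplate"):
    timescale = int(template.get("timescale", "1"))
    timeline = template.find("dash:SegmentTimeline", NS)
    if timeline is None:
        continue
    t = 0
    for s in timeline.findall("dash:S", NS):
        if s.get("t") is not None:      # explicit start time in timescale units
            t = int(s.get("t"))
        d = int(s.get("d"))             # segment duration in timescale units
        r = int(s.get("r", "0"))        # number of additional repeats
        for _ in range(r + 1):
            print(f"segment at {t / timescale:.3f}s, duration {d / timescale:.3f}s")
            t += d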

Related

Does YouTube store video and audio separately

youtube-dl can be used to see what formats are used to store YouTube content:
youtube-dl -F https://youtu.be/??????
The above command suggests that the audio and video are mostly stored separately. Is that right? Does YouTube combine audio and video in real time when streaming?
Formats listed for a sample YouTube video
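The same format table can also be read from Python with youtube-dl's own API; formats reporting vcodec == "none" are audio-only and formats reporting acodec == "none" are video-only, which is how the separate storage shows up. A rough sketch (the URL is a placeholder):
# Sketch: inspect a video's formats with youtube-dl's Python API
# (pip install youtube-dl). Audio-only formats report vcodec == "none";
# video-only formats report acodec == "none".
import youtube_dl

URL = "https://youtu.be/VIDEO_ID"  # placeholder

with youtube_dl.YoutubeDL({"quiet": True}) as ydl:
    info = ydl.extract_info(URL, download=False)

for f in info["formats"]:
    kind = ("audio-only" if f.get("vcodec") == "none"
            else "video-only" if f.get("acodec") == "none"
            else "muxed")
    print(f["format_id"], f.get("ext"), kind)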
Most large streaming services will use ABR streaming (see: https://stackoverflow.com/a/42365034/334402).
The two most common ABR streaming formats are HLS and MPEG-DASH, and both provide a manifest or index file which the player downloads first and which contains links to the media streams, typically audio, video and subtitle tracks.
For encrypted content the audio and video, and even different bit rate video tracks, may all have separate encryption keys.
The player will download the audio and video tracks and synchronise them for playback.
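On the HLS side that structure can be listed with the third-party m3u8 package; a rough sketch with a placeholder master-playlist URL, where the video variants sit in the playlist entries and audio/subtitle renditions are separate EXT-X-MEDIA entries that the player fetches and synchronises itself:
# Sketch: list the separate tracks advertised by an HLS master playlist,
# using the third-party "m3u8" package (pip install m3u8).
import m3u8

MASTER_URL = "https://example.com/master.m3u8"  # placeholder

master = m3u8.load(MASTER_URL)

# Variant streams: one entry per video bitrate/resolution.
for variant in master.playlists:
    info = variant.stream_info
    print("video variant:", info.bandwidth, info.resolution, variant.uri)

# EXT-X-MEDIA entries: audio (and subtitle) renditions kept as separate streams.
for rendition in master.media:
    print("rendition:", rendition.type, rendition.language, rendition.uri)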
In general, streaming video and audio are sent in separate channels; the same goes for multi-track audio such as 5.1. During transport these channels are wrapped in a media container like MP4.
The motive is partly the distinct compression algorithms: some algorithms are best for audio and others for video, and baked into the video algorithms is the spread and sharing of data over time across video frames (see B-frames for details). These channels are not limited to video and audio: if you own both the sending and receiving sides, you can send arbitrary data in many distinct channels by making up your own data protocol. As an aside, modern codecs like H.265 allow data to be sent from the receiver back to the sender while you think you are simply viewing a movie (read the RFC).
YouTube stores each of its various flavors of video and audio in separate files on its end, then combines them based on the desired streaming quality on a per-download basis.
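To make the "combines them" step concrete: once an audio-only and a video-only download are on disk, they can be muxed without re-encoding by shelling out to ffmpeg. A rough sketch with placeholder file names:
# Sketch: mux a separately downloaded video track and audio track into one
# file without re-encoding (ffmpeg must be on PATH).
import subprocess

subprocess.run(
    [
        "ffmpeg",
        "-i", "video_only.mp4",   # placeholder: video-only download
        "-i", "audio_only.m4a",   # placeholder: audio-only download
        "-c", "copy",             # copy both streams, no re-encode
        "combined.mp4",
    ],
    check=True,
)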

ffmpeg - correctly handling misaligned audio/video input stream before outputting to rtmp

I use a video player called MPV to transcode a dynamic playlist of media files.
I pipe MPV's encoded output into FFMPEG and format it for rtmp delivery.
However, the playlist may contain media with misaligned audio and video, i.e. the audio track may be shorter or longer than the video track.
No matter what, MPV will only output what it's given. So if my media file has audio that is 1 second long and video that is 2 seconds long, it will output a media stream with exactly the same misalignment, rather than generating null audio or skipping to the next item in the playlist when it first encounters an active stream ending (eof).
For example, assuming my playlist was full of problematic media where the audio and video of each file was misaligned:
If I output this media stream to a popular streaming service's server, it could lead to stuttering and/or loss of a/v sync.
Similarly, if I output this media stream to a file and played it back in MPV or another video player, the result appears to be more like this:
I have tried to fix this in MPV in all sorts of ways, trying every relevant command line option available. I even wrote a user script that detects the audio reaching 'eof' and skips to the next item in the playlist, but it is not fast enough and still leaves small gaps in the audio.
So my only hope is correcting it in ffmpeg. In the event of null audio/video, I need a fallback or a generative filter that can fill these empty gaps with silence (audio) or a colour/image (video).
I'm open to any ideas, and if my understanding of a/v encoding is a little off, please educate me.
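One direction worth trying on the ffmpeg side, sketched under assumptions rather than tested against this exact MPV pipe: the apad audio filter appends silence to a too-short audio track, and -shortest ends the output with the (now longer) video, so the gap is filled with silence instead of propagating downstream. The input file and RTMP target below are placeholders:
# Sketch: pad too-short audio with silence so it matches the video length.
# "apad" appends silence to the audio stream; "-shortest" then ends the
# output when the video (the shorter remaining stream) ends.
import subprocess

subprocess.run(
    [
        "ffmpeg",
        "-i", "input_with_short_audio.mp4",       # placeholder input
        "-af", "apad",                            # pad audio with silence
        "-c:v", "copy",                           # leave video untouched
        "-c:a", "aac",                            # filtered audio must be re-encoded
        "-shortest",
        "-f", "flv",
        "rtmp://live.example.com/app/streamkey",  # placeholder RTMP target
    ],
    check=True,
)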

Routing AVPlayer audio output to AVAudioEngine

Due to the richness and complexity of my app's audio content, I am using AVAudioEngine to manage all audio across the app. I am converting every audio source to be represented as a node in my AVAudioEngine graph.
For example, instead of using AVAudioPlayer objects to play mp3 files in my app, I create AVAudioPlayerNode objects using buffers of those audio files.
However, I do have a video player in my app that plays video files with audio using AVPlayer (I know of nothing else in iOS that can play video files). Unfortunately, there seems to be no way for me to obtain its audio output stream as a node in my AVAudioEngine graph.
Any pointers?
If you have a video file, you can extract the audio data from the video.
Then you can set the volume of the AVPlayer to 0 (if you didn't remove the audio data from the video) and play the audio through an AVAudioPlayerNode.
If you receive the video data over the network, you will need to parse the packets and split the streams yourself.
But A/V sync is a very tough problem.

Streaming video from nodejs to an open player

Oddball question from somebody just getting started with HTML5 players and streaming video...
When using YouTube, long videos can be scrolled towards the end and then played from there. I assume YouTube first pulls down metadata like the total video start/stop points and a bunch of thumbnails for scrolling.
Is this possible with an open HTML5 video player (like Projekktor)? The reason I'm asking is that I have video data inside a MongoDB database that I would like to stream similarly to the YouTube player.
Inside Mongo I have a bunch of smaller h264 files, each in a document: the actual raw h264, usually around 1000 kB (max 2 seconds), a creation timestamp (long), and potentially a converted format (like mp4) for known clients. The idea is to query a time range, order by creation time, and pipe the results into a readable stream. There is a nice ffmpeg module to take streams and reformat them if needed. I thought about piping the stream to the client with BinaryJS and appending it into the player.
But the source directives in the documentation are usually URLs, plus I need to lock down the start/stop points for the total video being played, as well as the thumbnails.
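Purely to illustrate that flow, sketched in Python rather than Node and with made-up field and collection names: query the documents for a time range in creation order, feed the raw h264 into ffmpeg on stdin, and get back a fragmented MP4 that an HTML5 player can consume:
# Sketch: turn a time range of stored h264 chunks into a streamable MP4.
# Collection name, field names ("created", "h264") and the timestamp range
# are placeholders; ffmpeg must be on PATH.
import subprocess
from pymongo import MongoClient

clips = MongoClient()["video"]["clips"]                    # placeholder collection
start_ts, end_ts = 1_700_000_000_000, 1_700_000_060_000    # placeholder range (ms)

with open("range.mp4", "wb") as out:                       # or stream this to the client
    ffmpeg = subprocess.Popen(
        [
            "ffmpeg",
            "-f", "h264", "-i", "pipe:0",                  # raw Annex-B h264 on stdin
            "-c", "copy",
            "-movflags", "frag_keyframe+empty_moov",       # MP4 layout suitable for streaming
            "-f", "mp4", "pipe:1",
        ],
        stdin=subprocess.PIPE,
        stdout=out,
    )
    cursor = clips.find({"created": {"$gte": start_ts, "$lt": end_ts}}).sort("created", 1)
    for doc in cursor:
        ffmpeg.stdin.write(doc["h264"])                    # assumes raw bytes per document
    ffmpeg.stdin.close()
    ffmpeg.wait()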

How does youtube support starting playback from any part of the video?

Basically I'm trying to replicate YouTube's ability to begin video playback from any part of a hosted movie. So if you have a 60-minute video, a user could skip straight to the 30-minute mark without streaming the first 30 minutes of video. Does anyone have an idea how YouTube accomplishes this?
Well, the player opens the HTTP resource as normal. When you hit the seek bar, the player requests a different portion of the file.
It passes a header like this:
Range: bytes=10001-
and the server serves the resource from that byte range. Depending on the codec, the player may need to read on until it reaches a sync frame before it can begin playback.
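From the client side that request looks like the rough sketch below, using Python's requests library and a placeholder URL; a range-capable server answers with 206 Partial Content and a Content-Range header describing which bytes it returned:
# Sketch: request a file starting at a given byte offset.
import requests

URL = "https://example.com/video.mp4"   # placeholder

resp = requests.get(URL, headers={"Range": "bytes=10001-"}, stream=True)
print(resp.status_code)                   # 206 if byte ranges are supported
print(resp.headers.get("Content-Range"))  # e.g. "bytes 10001-9999999/10000000"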
Video is a series of frames, played at a frame rate. That said, there are some rules about the order in which frames can be decoded.
Essentially, you have reference frames (called I-frames) and you have modification frames (P-frames and B-frames). It is generally true that a properly configured decoder will be able to join a stream at any I-frame (that is, start decoding), but not at a P- or B-frame. So, when the user drags the slider, you're going to need to find the closest I-frame and start decoding from there.
This may of course be hidden under the hood of Flash for you, but that is what it will be doing.
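One hedged way to locate those join points for a given file is to list only the keyframes with ffprobe and pick the one closest to the requested seek time; the sketch below uses standard ffprobe options and a placeholder file name (older ffprobe builds report the timestamp as pkt_pts_time rather than pts_time):
# Sketch: list keyframe (I-frame) timestamps with ffprobe, then pick the one
# closest to the requested seek position.
import subprocess

def keyframe_times(path):
    out = subprocess.run(
        [
            "ffprobe", "-v", "error",
            "-select_streams", "v:0",
            "-skip_frame", "nokey",            # report keyframes only
            "-show_entries", "frame=pts_time", # pkt_pts_time on older builds
            "-of", "csv=p=0",
            path,
        ],
        capture_output=True, text=True, check=True,
    ).stdout
    return [float(t.strip(",")) for t in out.split() if t.strip(",")]

times = keyframe_times("movie.mp4")        # placeholder file
target = 30 * 60                           # seek to the 30-minute mark (seconds)
print(min(times, key=lambda t: abs(t - target)))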
I don't know how YouTube does it, but if you're looking to replicate the functionality, check out Annodex. It's an open standard that is based on Ogg Theora, but with an extra XML metadata stream.
Annodex allows you to have links to named sections within the video or temporal URIs to specific times in the video. Using libannodex, the server can seek to the relevant part of the video and start serving it from there.
If I were to guess, it would be some sort of selective data retrieval, like the Range header in HTTP; that might even be what they use. You can find more about it in the HTTP specification.
