I'm trying to save an HTTP live video stream to a file. I know that for this purpose I will need to periodically request the M3U8 file, parse it to extract the URLs of the media segments, download the segments and reassemble them. The problem I'm having is finding the right strategy to achieve smooth playback. The reassembled video is always choppy, the audio skips, etc. Only the first few seconds are okay. My M3U8 file looks something like this:
#EXTM3U
#EXT-X-TARGETDURATION:2
#EXT-X-VERSION:3
#EXT-X-ALLOW-CACHE:NO
#EXT-X-MEDIA-SEQUENCE:1105
#EXTINF:1.00000,
tmp/video1.ts
#EXTINF:1.00000,
tmp/video2.ts
#EXTINF:1.00000,
tmp/video2.ts
#EXTINF:1.00000,
tmp/video3.ts
#EXTINF:1.00000,
tmp/video4.ts
#EXTINF:1.00000,
tmp/video5.ts
After I parse the file, I start downloading all the TS files (one at a time), and when I'm about to download the second-to-last one, I request a new M3U8 file. Is this wrong? Maybe the server has not yet updated the segments, so I'm re-downloading the same ones? I tried waiting 5 seconds (number_of_videos * duration) before requesting a new playlist, but I still experience the playback issues mentioned. Any idea/strategy on how I can achieve smooth playback?
The basic strategy is more or less as follows.
You start by processing the manifest file and downloading the first few segments to fill your buffer. You begin playback once you are happy you have enough data in the buffer, while continuing to download additional segments sequentially until the end of the manifest, at which point you request it again. If you find new segments in the refreshed manifest, you add their URLs to your download queue. If you do not, you wait for a predetermined period of time and refresh the manifest again. For example, your client could poll the M3U8 manifest at an interval of (segment duration * number of segments) / 2, i.e. roughly half the playlist length.
I know some commercial players enter a paranoid mode when the playback buffer is getting low and the refreshed manifest does not contain any new segments to download. In that case, they start requesting the updated manifest much more frequently.
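Here is a minimal sketch of that loop, assuming Node 18+ (built-in fetch) and two hypothetical helpers, parsePlaylist and appendToBuffer, that you would replace with your own code:

// Sketch of the download / refresh loop described above (not a drop-in implementation).
// parsePlaylist() and appendToBuffer() are placeholders for your own code.
declare function parsePlaylist(m3u8: string): { segments: string[]; targetDuration?: number };
declare function appendToBuffer(data: Uint8Array): void;

const MANIFEST_URL = 'http://example.com/live/stream.m3u8';

async function run(): Promise<void> {
  const downloaded = new Set<string>();   // segment URIs already fetched
  let targetDuration = 2;                 // seconds, from #EXT-X-TARGETDURATION

  while (true) {
    const manifest = await (await fetch(MANIFEST_URL)).text();
    const { segments, targetDuration: td } = parsePlaylist(manifest);
    if (td) targetDuration = td;

    const fresh = segments.filter(uri => !downloaded.has(uri));
    for (const uri of fresh) {
      const data = new Uint8Array(await (await fetch(uri)).arrayBuffer());
      appendToBuffer(data);               // feed the playback / output buffer
      downloaded.add(uri);
    }

    // If there were new segments, wait roughly one target duration before refreshing;
    // if not, poll sooner (a "paranoid" client would shorten this further when its buffer runs low).
    const waitMs = fresh.length > 0 ? targetDuration * 1000 : (targetDuration * 1000) / 2;
    await new Promise(resolve => setTimeout(resolve, waitMs));
  }
}

run();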
You also need to pay attention to caching between your client and the HTTP server hosting the content. For example, some CDNs will cache the manifest file for a minimum mandatory period of time, so if you request it again within this period, you may get a stale manifest served back to you.
From your example above (and I hope this is just an artifact of your hand-crafted example), the duration of each segment appears to be 1 second, which is quite low. If that is really the case, you will want to size your initial buffer accordingly.
And lastly, I presume you've verified the source stream with a stable player, to make sure the problem is not on the other end?
-- ab1
Related
After creating the playlist with MP4 URLs, the loading time between two MP4 files is high and the stream does not run smoothly. Please let me know if this can be fixed by changing some settings on the server.
Let me explain the best practices for that. I hope it helps.
Improve WebRTC Playback Experience
ATTENTION: It does not make sense to play the stream with WebRTC, because it's already a recorded file and there is no ultra-low-latency requirement. It makes more sense to play the stream with HLS. Just keep in mind that WebRTC playback uses more processing resources than HLS. Still, if you would like to decrease the time it takes to switch streams, read the following.
Open the embedded player (/usr/local/antmedia/webapps/{YOUR_APP}/play.html).
Find the genericCallback method and decrease the timeout value at the end of it from 3000 to 1000 or even lower.
Decrease the key frame interval of the video. You can set it to 1 second; the generally recommended value is 2 seconds. WebRTC needs a key frame to start playback, so if the key frame interval is 10 seconds (the ffmpeg default), the player may wait up to 10 seconds before it starts playing. An example ffmpeg command is shown below.
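To illustrate the key frame point only (the exact publishing command depends on your setup), an ffmpeg push with a 1-second key frame interval at 30 fps could look like this; the RTMP URL is a placeholder:
ffmpeg -re -i input.mp4 -c:v libx264 -g 30 -keyint_min 30 -c:a aac -f flv rtmp://your-server/LiveApp/stream1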
Improve HLS Playback Experience
Open the properties file of the application -> /usr/local/antmedia/webapps/{YOUR_APP}/WEB-INF/red5-web.properties
Add the following property
settings.hlsflags=delete_segments+append_list+omit_endlist
Let me explain what it means.
delete_segments deletes segment files that have fallen out of the list, so your disk does not fill up.
append_list appends new segment files to the existing m3u8 file, so the player thinks it is still playing the same stream.
omit_endlist disables writing EXT-X-ENDLIST at the end of the file, so the player assumes new segments are on the way and keeps waiting for them instead of stopping the stream.
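These are the same flag names ffmpeg's own HLS muxer uses, so for comparison, a standalone ffmpeg command producing a playlist with the same behaviour would look something like this (illustrative paths and URL):
ffmpeg -i rtmp://your-server/LiveApp/stream1 -c copy -f hls -hls_time 2 -hls_list_size 5 -hls_flags delete_segments+append_list+omit_endlist /path/to/stream.m3u8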
Disable deleting HLS files when the stream ends, so you do not run into a race condition. Continue editing the file /usr/local/antmedia/webapps/{YOUR_APP}/WEB-INF/red5-web.properties and replace the following line
settings.deleteHLSFilesOnEnded=true
with this one
settings.deleteHLSFilesOnEnded=false
Restart the Ant Media Server
sudo service antmedia restart
antmedia.io
I've run into a technical problem and I need your help.
The situation:
I record the screen as well as 1 to 2 audio tracks (microphone and speaker).
These three recordings are done separately (they could be mixed, but I'd rather not), and every 10 s (this is configurable) I send the chunk of recorded data to my backend. We therefore have 2 to 3 chunks sent every 10 s.
These data chunks are interdependent. For example, the first video chunk starts with the headers and a keyframe, while the second chunk may begin in the middle of a frame. It's like taking the entire video and cutting it at an arbitrary point.
The video stream is in h264 in a WebM container. I don't have a lot of control over it.
The audio stream is in opus in a WebM container. I can't use aac directly, nor do I have much control.
In practice, the server may be restarted at any time (crash, update, scaling, ...). It doesn't happen often (about 4 times a week). In addition, the customer may close the application or his computer as soon as the recording ends on his side, which prevents the end of the recording from being sent; once the client reconnects, the missing data chunks are sent. This therefore rules out using a plain "live" stream on the backend side.
Goals:
Store video and audio as it is received on the server in cloud storage.
Be able to start playing the video/audio even when the upload has not finished (so in a live stream)
As soon as the last chunks have been received on the server, I want the entire video to be already available in VoD (Video On Demand) with as little delay as possible.
Everything must be delivered with the audio in AAC. The audio tracks may or may not be mixed together, and may or may not be muxed with the video.
Current solution and where it blocks:
The most promising solution I have seen is using HLS to support the Live and VoD mode that I need. It would also bring a lot of optimization possibilities for the future.
Video isn't a problem in this context, here's what I do:
Every time I get a data chunk, I append it to a screen.webm file.
Then I split the file with ffmpeg:
ffmpeg -ss {total_duration_in_storage} -i screen.webm -c:v copy -f hls -hls_time 8 -hls_list_size 0 output.m3u8
I ignore the last file unless it's the last chunk.
I upload all the files to the cloud storage along with a newly updated output.m3u8 with the new file information.
Note: total_duration_in_storage corresponds to the time already uploaded to cloud storage, i.e. the sum of the segment durations present in the last output.m3u8.
Note 2: I ignore the last file in step 3 because it gives me a keyframe at the start of each segment of my playlist, and therefore lets me use seeking, which allows segmenting only the parts needed for each new chunk.
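For what it's worth, here is a rough Node/TypeScript sketch of that per-chunk step, assuming ffmpeg is on the PATH; uploadToCloud and the 8-second duration bookkeeping are placeholders (a real version would read the #EXTINF values from output.m3u8):

import { appendFileSync, readdirSync } from 'fs';
import { execFileSync } from 'child_process';

// Placeholder for your cloud storage client.
declare function uploadToCloud(localPath: string): Promise<void>;

let totalDurationInStorage = 0; // seconds already segmented and uploaded

async function onVideoChunk(chunk: Buffer, isLastChunk: boolean): Promise<void> {
  // 1. Append the raw WebM chunk to the growing file.
  appendFileSync('screen.webm', chunk);

  // 2. Re-segment only from the part not yet uploaded (-ss before -i, stream copy).
  execFileSync('ffmpeg', [
    '-y',
    '-ss', String(totalDurationInStorage),
    '-i', 'screen.webm',
    '-c:v', 'copy',
    '-f', 'hls', '-hls_time', '8', '-hls_list_size', '0',
    'output.m3u8',
  ]);

  // 3. Upload the finished segments (skip the last one unless this is the final
  //    chunk) together with the refreshed playlist.
  const segments = readdirSync('.').filter(f => f.endsWith('.ts')).sort();
  const toUpload = isLastChunk ? segments : segments.slice(0, -1);
  for (const seg of toUpload) await uploadToCloud(seg);
  await uploadToCloud('output.m3u8');

  // 4. Advance the offset; approximate here, exact values come from #EXTINF.
  totalDurationInStorage += toUpload.length * 8;
}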
My problem is with the audio. I could use the same method and it works fine as long as I don't re-encode, but I need to re-encode to AAC to be compatible with HLS and with Safari.
If I re-encode only the new chunks as they arrive, there is an audible glitch.
The only viable approach I have found is to re-encode and re-segment all the audio each time a new chunk comes along, which will be problematic for long recordings (multiple hours).
Do you have any solutions for this problem or another way to achieve my goal?
Thanks a lot for your help!
The most common situation in which the integrity of an MP3 file is broken is when the file has only been partially uploaded to the server. In this case, the indicated audio duration doesn't correspond to what is really in the MP3 file: we can hear the beginning, but at some point playback stops and the duration shown by the audio player is wrong.
I tried libraries like node-ffprobe, but it seems they just read metadata without comparing it against the real audio data in the file. Is there a way to efficiently detect a corrupted or incomplete MP3 file from Node.js?
Note: the client uploading the MP3 files is a piece of hardware (an audio recorder) uploading to an FTP server, not a browser, so I'm not able to send potentially more useful data from the client side.
MP3 files don't normally have a duration. They're just a series of MPEG frames. Sometimes, there is an ID3 tag indicating duration, but not always.
Players can determine duration by choosing one of a few methods:
Decode the entire audio file. This is the slowest method, but if you're going to decode the file anyway, you might as well go this route, as it gives you an exact duration.
Read the whole file, skimming through frame headers. You'll have to read the whole file from disk, but you won't have to decode it. It can be slow if I/O is slow, but it gives you an exact duration.
Read the first frame's bitrate and estimate duration from file size. This is definitely the fastest method, and the one most commonly used by players. The duration is only an estimate; it is reasonably accurate for CBR, but can be wildly inaccurate for VBR.
What I'm getting at is that these files might not actually be broken. They might just be VBR files that your player doesn't know the duration of.
If you're convinced they are broken (such as stopping in the middle of content), then you'll have to figure out how you want to handle it. There are probably only a couple ways to determine this:
Ideally, there's an ID3 tag indicating duration, and you can decode the whole file and determine its real duration to compare.
Usually, that ID3 tag won't exist, so you'll have to check to see if the last frame is complete or not.
Beyond that, you don't really have a good way of knowing if the stream is incomplete, since there is no outer container that actually specifies number of frames to expect.
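If you do go the route of checking whether the last frame is complete, a rough Node/TypeScript sketch is below. It only handles the common MPEG-1 Layer III case, skips a leading ID3v2 tag and a trailing ID3v1 tag, and bails out on anything unexpected; a production checker would cover more variants:

import { readFileSync } from 'fs';

// MPEG-1 Layer III tables (the common case for .mp3 files).
const BITRATES = [0, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320];
const SAMPLE_RATES = [44100, 48000, 32000];

// Returns true if the audio data ends exactly on a frame boundary, i.e. the
// last frame is complete. Truncated uploads will usually fail this check.
function lastFrameIsComplete(path: string): boolean {
  const buf = readFileSync(path);
  let pos = 0;

  // Skip a leading ID3v2 tag if present (size is a 28-bit synchsafe integer).
  if (buf.length > 10 && buf.toString('latin1', 0, 3) === 'ID3') {
    const size = (buf[6] << 21) | (buf[7] << 14) | (buf[8] << 7) | buf[9];
    pos = 10 + size;
  }

  // Ignore a trailing ID3v1 tag ("TAG" + 125 bytes) if present.
  let end = buf.length;
  if (end >= 128 && buf.toString('latin1', end - 128, end - 125) === 'TAG') {
    end -= 128;
  }

  while (pos + 4 <= end) {
    // Frame sync: 11 set bits; then require MPEG-1 Layer III with a valid header.
    if (buf[pos] !== 0xff || (buf[pos + 1] & 0xe0) !== 0xe0) return false;
    const version = (buf[pos + 1] >> 3) & 0x03;   // 3 = MPEG-1
    const layer = (buf[pos + 1] >> 1) & 0x03;     // 1 = Layer III
    const bitrateIdx = (buf[pos + 2] >> 4) & 0x0f;
    const sampleIdx = (buf[pos + 2] >> 2) & 0x03;
    const padding = (buf[pos + 2] >> 1) & 0x01;
    if (version !== 3 || layer !== 1 || bitrateIdx === 0 || bitrateIdx === 15 || sampleIdx === 3) {
      return false; // unsupported or invalid header; a real checker would handle more cases
    }
    const frameLen =
      Math.floor((144 * BITRATES[bitrateIdx] * 1000) / SAMPLE_RATES[sampleIdx]) + padding;
    if (pos + frameLen > end) return false;       // last frame is cut off
    pos += frameLen;
  }
  return pos === end; // walked cleanly to the end of the audio data
}

console.log(lastFrameIsComplete('upload.mp3') ? 'looks complete' : 'possibly truncated');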
The expression for calculating the file size of an MP3 based on duration and encoding (from this answer) is quite simple:
x = length of song in seconds
y = bitrate in kilobits per second
(x * y) / 8 = filesize in kilobytes (divide by a further 1024 for megabytes)
There is also a javascript implementation for the Web Audio API in another answer on that same question. Perhaps that would be useful in your Node implementation.
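As a quick complement to that formula, if the recorder always produces CBR files and you know the intended duration and bitrate, a very cheap truncation check in Node is to compare the expected size with the size that actually landed on the FTP server (the numbers and threshold below are illustrative):

import { statSync } from 'fs';

// Rough truncation check for CBR MP3s: expected size vs. actual size on disk.
function looksTruncated(path: string, durationSec: number, bitrateKbps: number): boolean {
  const expectedBytes = (durationSec * bitrateKbps * 1000) / 8; // kilobits -> bytes
  const actualBytes = statSync(path).size;
  return actualBytes < expectedBytes * 0.98; // small tolerance for tags/headers
}

console.log(looksTruncated('upload.mp3', 180, 128)); // a 3-minute file at 128 kbps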
mp3diags is some older open source software for fixing mp3s and which was great for batch processing stuff like this. The source is c++ and still available if you're feeling nosy and want to see how some of these features are implemented.
Worth a look since it has some features that might be useful in your context:
What is MP3 Diags and what does it do?
low quality audio
missing VBR header
missing normalization data
Correcting files that show incorrect song duration
Correcting files in which the player cannot seek correctly
I have a video file that I would like to start broadcasting from NodeJS, preferably through Express, at a given time. That is, if the video starts being available at timestamp t0, then if a client hits the video endpoint at time t0+60, the video playback would start at 60 seconds in.
My key requirement is that when a client connect at a given time, no more of that video be available than what would have been seen so far, so the client connecting at t0+60 would not be able to watch past the minute mark (plus some error threshold) initially, and every ~second, another second of video availability would be added, simulating a live experience synced across all clients regardless of when each loads the stream.
So far, I've tried my luck converting videos to Apple's HLS protocol (because the name sounds promising) and I was able to host the m3u8 files using Node's hls-server library, where the call is very straightforward:
import HLSServer = require('hls-server');
import http = require('http');
const source = __dirname + '/resources';
const server = http.createServer();
const hls = new HLSServer(server, {
path: '/streams', // Base URI to output HLS streams
dir: source // Directory that input files are stored
});
server.listen(8000);
However, it sends the entire video to the browser when asked, and appears to offer no option of forcing a start at a given frame. (I imagine forcing the start position can be done out of band by simply sending the current time to the client and then having the client do whatever is necessary with HTML and Javascript to advance to the latest position).
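That out-of-band idea can be as small as one extra endpoint. A sketch with Express (the /start-offset route and port are mine), plus the client-side seek it implies:

import express = require('express');

const app = express();
const streamStartedAt = Date.now(); // t0: when the "broadcast" began

// The client asks how far into the video it should seek before playing.
app.get('/start-offset', (_req, res) => {
  res.json({ offsetSeconds: (Date.now() - streamStartedAt) / 1000 });
});

app.listen(8001);

// In the page, once the player is attached to the <video> element:
//   const { offsetSeconds } = await (await fetch('/start-offset')).json();
//   video.currentTime = offsetSeconds;

Note that this only synchronizes the starting position; it does not by itself stop a client from seeking ahead, so the playlist would still need to be limited on the server to enforce the availability requirement.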
There are some vague approaches I saw online that use MP4, but from what I understand, due to its compression it is hard to know how many bytes of video data correspond to a given duration of footage, as this can vary widely.
There are also some other tutorials which have a direct pipe from an input source such as a webcam, thereby requiring liveness, but for my comparatively simple use case where the video file is already present, I'm content with the ability to maintain a limited amount of precision, such as ±10 seconds, just as long as all clients are forced to be approximately in sync.
Thank you very much in advance, and I appreciate any pointers.
Basically I'm trying to replicate YouTube's ability to begin video playback from any part of hosted movie. So if you have a 60 minute video, a user could skip straight to the 30 minute mark without streaming the first 30 minutes of video. Does anyone have an idea how YouTube accomplishes this?
Well the player opens the HTTP resource like normal. When you hit the seek bar, the player requests a different portion of the file.
It passes a header like this:
Range: bytes=10001-
and the server serves the resource from that byte offset onward. Depending on the codec, the player will need to read until it reaches a sync frame before it can begin playback.
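A stripped-down sketch of the server side of that exchange in plain Node (partial-content handling only; the file path is illustrative):

import { createServer } from 'http';
import { createReadStream, statSync } from 'fs';

const FILE = 'movie.mp4'; // illustrative path

createServer((req, res) => {
  const size = statSync(FILE).size;
  const range = req.headers.range;               // e.g. "bytes=10001-"
  if (!range) {
    res.writeHead(200, { 'Content-Length': size, 'Accept-Ranges': 'bytes' });
    createReadStream(FILE).pipe(res);
    return;
  }
  const [startStr, endStr] = range.replace('bytes=', '').split('-');
  const start = Number(startStr);
  const end = endStr ? Number(endStr) : size - 1;
  res.writeHead(206, {                           // 206 Partial Content
    'Content-Range': `bytes ${start}-${end}/${size}`,
    'Content-Length': end - start + 1,
    'Accept-Ranges': 'bytes',
  });
  createReadStream(FILE, { start, end }).pipe(res);
}).listen(8080);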
Video is a series of frames played at a frame rate. That said, there are some rules about the order in which frames can be decoded.
Essentially, you have reference frames (called I-frames) and you have modification frames (called P-frames and B-frames). It is generally true that a properly configured decoder will be able to join a stream at any I-frame (that is, start decoding there), but not at a P- or B-frame. So when the user drags the slider, you're going to need to find the closest I-frame and start decoding from that...
This may of course be hidden under the hood of Flash for you, but that is what it will be doing...
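If you end up needing to locate those I-frames yourself (say, to build a seek index), ffprobe can list key frame timestamps; something like the command below should work, though the exact field names vary a little between ffprobe versions:
ffprobe -select_streams v:0 -skip_frame nokey -show_entries frame=pts_time -of csv=p=0 input.mp4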
I don't know how YouTube does it, but if you're looking to replicate the functionality, check out Annodex. It's an open standard that is based on Ogg Theora, but with an extra XML metadata stream.
Annodex allows you to have links to named sections within the video or temporal URIs to specific times in the video. Using libannodex, the server can seek to the relevant part of the video and start serving it from there.
If I were to guess, it would be some sort of selective data retrieval, like the Range header in HTTP. That might even be what they use. You can find more about it here.