Mediafilesegmenter inserts timed metadata ID3 tags in HLS stream but at the wrong point in time

I am inserting timed metadata into an HLS (HTTP Live Streaming) stream using id3taggenerator and mediafilesegmenter, following the instructions from Jake's Blog.
First, I create the id3tag using id3taggenerator:
id3taggenerator -o text.id3 -t "video"
Then add the tag to the id3macro file:
0 id3 /path/to/file/text.id3
And segment the video and insert the id3 tags with mediafilesegmenter:
mediafilesegmenter -M /path/to/id3macro -I -B "my_video" video.mp4
However, the timed metadata is inserted at the wrong point in time. Instead of showing up at the beginning of the video (time 0), it arrives with a delay of about 10 s (give or take 0.05 s, sometimes more, sometimes less).
I've written a simple iOS player app that logs whenever it is notified of an ID3 tag in the video. The app is only notified of the tag after around 10 seconds of playback. I've also tried another id3macro file with multiple timed metadata entries (around 0 s, 5 s, 7 s), all showing up with the same approximate delay. I have also changed the segment duration to 5 s, but each time the result is the same.
The mediafilesegmenter I am using is Beta Version 1.1(140602).
Can anyone else confirm this problem, or pinpoint what I am doing wrong here?
Cheers!

I can confirm that I experience the same issue, using the same version of mediafilesegmenter:
mediafilesegmenter: Beta Version 1.1(140602)
Moreover, I can see that the packet with the ID3 is inserted at the right moment in the stream. E.g. if I specify a 10-second delay, I can see that my ID3 is inserted at the end of the first 10-second segment.
However, it appears 10 seconds later in iOS notifications.
I can see the following possible reasons:
mediafilesegmenter inserts the metadata packet in the right place, but its timestamp is delayed by 10 seconds for some reason. Therefore, clients (e.g. the iOS player) show the tag 10 seconds later. Apple's tools are not well documented, so it's hard to verify.
Maybe the iOS player receives the metadata in time (I know the tag was included in the previous segment file) but issues the notification with a 10-second delay, for whatever reason.
I cannot dig further because I don't have any Flash/desktop HLS players that support in-stream ID3 tags. If I had one, I would check whether a desktop player displays/processes the ID3 in time, without delay. If it did, that would mean the problem is in iOS, not in mediafilesegmenter.
Another useful thing to do would be extracting the MPEG-TS packet carrying the ID3 tag from the segment file and checking its headers for anything strange (e.g. a wrong timestamp).
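For what it's worth, here is a minimal sketch of the kind of check I mean, assuming unencrypted 188-byte TS packets; it simply prints the PTS of every PES packet start per PID (without parsing the PMT), so you can compare the ID3 PES timestamp against the audio/video ones:

import sys

TS_PACKET_SIZE = 188

def parse_pts(pes: bytes):
    # PES header: 00 00 01, stream_id, packet_length(2), flags(2), header_length, optional fields...
    if len(pes) < 14 or pes[:3] != b"\x00\x00\x01":
        return None
    if not (pes[7] & 0x80):  # PTS_DTS_flags: is a PTS present?
        return None
    p = pes[9:14]
    pts = ((p[0] >> 1) & 0x07) << 30
    pts |= (p[1] << 22) | ((p[2] >> 1) << 15)
    pts |= (p[3] << 7) | (p[4] >> 1)
    return pts  # in 90 kHz units

with open(sys.argv[1], "rb") as f:
    data = f.read()

for off in range(0, len(data) - TS_PACKET_SIZE + 1, TS_PACKET_SIZE):
    pkt = data[off:off + TS_PACKET_SIZE]
    if pkt[0] != 0x47:
        continue  # lost sync, skip this packet
    pusi = pkt[1] & 0x40  # payload_unit_start_indicator
    pid = ((pkt[1] & 0x1F) << 8) | pkt[2]
    afc = (pkt[3] >> 4) & 0x03  # adaptation_field_control
    if not pusi or not (afc & 0x01):
        continue  # no new PES payload starts in this packet
    payload = pkt[4:]
    if afc & 0x02:  # skip the adaptation field if present
        payload = payload[1 + payload[0]:]
    pts = parse_pts(payload)
    if pts is not None:
        print(f"PID 0x{pid:04x}  PTS {pts / 90000.0:10.3f} s")

Running this against a segment produced by mediafilesegmenter makes it easy to spot a metadata PES whose PTS doesn't match where the packet actually sits in the stream.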
Update:
I did some more research including reverse engineering of TS segments created with Apple tools, and it seems:
mediafilesegmenter starts PTS (presentation timestamps) at 10 seconds, while ffmpeg, for example, starts at 0.
mediafilesegmenter adds the ID3 frame at the correct place in the TS file, but with a wrong PTS that is 10 seconds ahead of what was specified in the meta file.
While the first issue doesn't seem to affect playback (as far as I understand, it matters more that the PTS increases continuously than where it starts), the second is definitely an issue and the reason you/we are experiencing the problem.
Therefore, the iOS player receives the ID3 frame in time, but since its PTS is 10 seconds ahead, it waits 10 seconds before issuing the notification. As far as I can tell, some other players simply ignore this ID3 frame because it's in the wrong place.
As a workaround, you can shift all ID3 entries by 10 seconds in your meta file, but obviously you won't be able to put anything at the very beginning.
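For example, a macro file that should fire tags at 0 s, 5 s and 7 s would, under this workaround (and assuming the 10-second offset measured above), become:
10 id3 /path/to/file/text.id3
15 id3 /path/to/file/text.id3
17 id3 /path/to/file/text.id3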

Related

HLS Live streaming with re-encoding

I have run into a technical problem and I need your help.
The situation:
I record the screen as well as 1 to 2 audio tracks (microphone and speaker).
These recordings are done separately (they could be mixed, but I'd rather not), and every 10 s (configurable) I send the chunk of recorded data to my backend. We therefore have 2 to 3 chunks sent every 10 s.
These data chunks are interdependent. Example: the first video chunk starts with the headers and a keyframe; the second chunk can begin in the middle of a frame. It's like taking the entire video and splitting it at an arbitrary point.
The video stream is H.264 in a WebM container. I don't have a lot of control over it.
The audio stream is Opus in a WebM container. I can't use AAC directly, nor do I have much control there either.
Given this reality, the server may be restarted at random (crash, update, scaling, ...). It doesn't happen often (about 4 times a week). In addition, once the recording ends on his side, the customer can close the application or his computer, which prevents the end of the recording from being sent; once he reconnects, the missing data chunks are sent. This rules out using a "live" stream on the backend side.
Goals:
Store video and audio as it is received on the server in cloud storage.
Be able to start playing the video/audio even when the upload has not finished (so in a live stream)
As soon as the last chunks have been received on the server, I want the entire video to be already available in VoD (Video On Demand) with as little delay as possible.
Everything must be delivered with the audio in AAC. The audio tracks may or may not be mixed together, and may or may not be muxed with the video.
Current solution, and where it blocks:
The most promising solution I have seen is using HLS to support the Live and VoD mode that I need. It would also bring a lot of optimization possibilities for the future.
Video isn't a problem in this context, here's what I do:
1. Every time I get a data chunk, I append it to a screen.webm file.
2. Then I split the file with ffmpeg:
ffmpeg -ss {total_duration_in_storage} -i screen.webm -c:v copy -f hls -hls_time 8 -hls_list_size 0 output.m3u8
3. I ignore the last file unless it's the last chunk.
4. I upload all the files to the cloud storage along with a newly updated output.m3u8 containing the new file information.
Note: total_duration_in_storage corresponds to the duration already uploaded to cloud storage, i.e. the sum of the segment durations present in the last output.m3u8.
Note 2: I ignore the last file in step 3 because it guarantees a keyframe at the start of each segment of my playlist, and therefore lets me seek so that only the parts needed for each new chunk have to be segmented.
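For reference, total_duration_in_storage (from the first note) can be computed by summing the EXTINF values of the last uploaded playlist; a minimal Python sketch, assuming that playlist has been downloaded to a local path:

import re

def total_duration_in_storage(playlist_path: str) -> float:
    # Sum the #EXTINF durations of the segments already published.
    with open(playlist_path) as f:
        durations = re.findall(r"#EXTINF:([\d.]+)", f.read())
    return sum(float(d) for d in durations)

print(total_duration_in_storage("output.m3u8"))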
My problem is with the audio. I can use the same method and it works fine as long as I don't re-encode. But I need to re-encode to AAC to be compatible with HLS and with Safari.
If I re-encode only the new chunks as they arrive, there is an audible glitch at the boundaries.
The only possible avenue I have found is to re-encode and segment all the files each time a new chunk comes along. This will be problematic for long recordings (multiple hours).
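For reference, that full re-encode pass would be something along the lines of (bitrate and file names are illustrative):
ffmpeg -i audio.webm -c:a aac -b:a 128k -f hls -hls_time 8 -hls_list_size 0 audio.m3u8
Since the whole Opus track has to be decoded and re-encoded on every run, the cost grows with the total recording length, which is exactly what hurts for multi-hour recordings.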
Do you have any solutions for this problem or another way to achieve my goal?
Thanks a lot for your help!

Custom player using NDK/C++/MediaCodec - starvation/buffering in decoder

I have a very interesting problem.
I am running a custom movie player based on an NDK/C++/CMake toolchain that opens a streaming URL (MP4, H.264 & stereo audio). In order to restart from a given position, the player opens the stream, buffers frames to some length, then seeks to the new position and starts decoding and playing. This works fine every time, except if we power-cycle the device and follow the same steps.
This was reproduced on a few versions of the software (plugin built against android-22..26) and hardware (LG G6, G5 and LeEco). The issue does not happen if you keep the app open for 10 minutes.
I am looking for possible areas of concern. I have played with decode logic (it is based on the approach described as synchronous processing using buffers).
Edit - More Information (4/23)
I modified the player to pick a stream and play only video instead of video+audio. This resulted in constant starvation and therefore buffering. The behavior appears to vary across Android versions (no firm data here). I do believe I am running into decoder starvation. Previously I had set timeouts of 0 for both AMediaCodec_dequeueInputBuffer and AMediaCodec_dequeueOutputBuffer, which I changed on the input side to 1000 and then 10000, but it does not make much difference.
My player is based on the NDK/C++ interface to MediaCodec; the CMake build passes -DANDROID_ABI="armeabi-v7a with NEON", -DANDROID_NATIVE_API_LEVEL="android-22" and c++_static.
Can anyone share which timeouts they have used successfully, or anything else that would help avoid starvation and the resulting buffering?
This is solved for now. The starvation was not caused by decoding itself; images were being consumed at a faster pace because the clock values returned were out of sync. I was using clock_gettime with the CLOCK_MONOTONIC clock id, which is the recommended way, but it always ran fast for the first 5-10 minutes after restarting the device (the device only had a Wi-Fi connection). Changing the clock id to CLOCK_REALTIME ensures correct presentation of images and no starvation.
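To make the failure mode concrete, here is the pacing pattern reduced to a small sketch (Python purely for illustration; the real player calls clock_gettime from C++, and the frame queue and render sink here are hypothetical). Frames are released when the chosen clock reaches their presentation time, so a clock that runs fast makes the renderer drain frames quicker than the decoder can supply them, which looks exactly like starvation:

import time

CLOCK_ID = time.CLOCK_MONOTONIC  # the fix described above was switching this to CLOCK_REALTIME

def now_us() -> int:
    return int(time.clock_gettime(CLOCK_ID) * 1_000_000)

def present(frames, render=print):
    # frames: iterable of (pts_us, frame) in presentation order (hypothetical source)
    start = now_us()
    base_pts = None
    for pts_us, frame in frames:
        if base_pts is None:
            base_pts = pts_us
        due = start + (pts_us - base_pts)
        delay_s = (due - now_us()) / 1_000_000
        if delay_s > 0:
            time.sleep(delay_s)  # wait until the frame is due on the chosen clock
        render(frame)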

HLS 10 second segment and seeking

Apple recommends HLS segments of 10 seconds; however, this means that seeking would be limited to 10-second boundaries.
I have tried shorter 3-second segments, which is better for seeking, but this is not ideal or recommended.
Is there any way of keeping the segments at 10 seconds but allowing better seeking?
Would adding a keyframe every 30 frames (1 keyframe per second) allow better seeking?
Ultimately, it depends upon your player. If I'm watching a video with the default iPad player, I can navigate via the progress bar on the bottom and seek to any point in the video, and it works very well, regardless of segment length or key frame cadence.
Some players support the #EXT-X-I-FRAMES-ONLY tag. This implementation of trick play works by playing back only the intra frames. It was introduced in version 4 of the Pantos spec, and I have only seen it working well on newer iPads. A good sample clip can be found here.
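For reference, a bare-bones I-frame playlist looks roughly like this (byte ranges, durations and file names are purely illustrative; real playlists are generated by the segmenter):

#EXTM3U
#EXT-X-VERSION:4
#EXT-X-TARGETDURATION:10
#EXT-X-MEDIA-SEQUENCE:0
#EXT-X-PLAYLIST-TYPE:VOD
#EXT-X-I-FRAMES-ONLY
#EXTINF:4.0,
#EXT-X-BYTERANGE:35000@0
segment0.ts
#EXTINF:6.0,
#EXT-X-BYTERANGE:28000@180000
segment0.ts
#EXT-X-ENDLIST

Each entry points at just the bytes of one intra frame, and its EXTINF is the time until the next one, which is what lets players scrub quickly without changing the segment length.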

MP4 Atom Parsing - where to configure time...?

I've written an MP4 parser that can read atoms in an MP4 just fine, and stitch them back together - the result is a technically valid MP4 file that QuickTime can open and such, but it can't play any audio, as I believe the timing/sampling information is all off. I should probably mention I'm only interested in audio.
What I'm doing is trying to take the moov atoms/etc from an existing MP4, and then take only a subset of the mdat atom in the file to create a new, smaller MP4. In doing so I've altered the duration in the mvhd atom, as well as the duration in the mdia header. There are no tkhd atoms in this file that have edits, so I believe I don't need to alter the durations there - what am I missing?
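For reference, the mvhd duration edit described above boils down to rewriting a single fixed-offset field; a rough sketch, assuming you already have the raw mvhd box bytes and the new duration expressed in mvhd timescale units:

import struct

def set_mvhd_duration(mvhd: bytes, duration: int) -> bytes:
    assert mvhd[4:8] == b"mvhd"
    version = mvhd[8]
    if version == 0:
        # 32-bit times: creation(4) modification(4) timescale(4) duration(4) follow the 12-byte box header
        return mvhd[:24] + struct.pack(">I", duration) + mvhd[28:]
    # version 1 uses 64-bit creation/modification times, so duration sits at offset 32
    return mvhd[:32] + struct.pack(">Q", duration) + mvhd[40:]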
In creating the new MP4 I'm properly sectioning the mdat block with a wide box, and keeping the 'mdat' header/size in their right places - I make sure to update the size with the new content.
Now it's entirely 110% possible I'm missing something crucial about the format, but if this is possible I'd love to get the final piece. Anybody got any input/ideas?
Code can be found at the following link:
https://gist.github.com/ryanmcgrath/958c602cff133bd7fa0b
I'm going to take a stab in the dark here and say that you're not updating your stbl offsets properly. At least I didn't (at first glance) see your Python doing that anywhere.
STSC
Let's start with the location of data. Samples (packets) are written into the file in chunks, and the header tells the decoder where each of these chunks lives. The stsc table says how many samples each chunk holds; each entry's first_chunk field says at which chunk that count starts to apply. It's a little confusing, but look at this example, which says there are 100 samples per chunk up to the 8th chunk, and 98 samples per chunk from the 8th chunk on:

first_chunk   samples_per_chunk   sample_description_index
1             100                 1
8             98                  1
STCO
That said, you also have to track where those chunks sit in the file. That's the job of the stco table: where in the file is chunk 1, where is chunk 2, and so on.
If you modify any data in mdat you have to maintain these tables. You can't just chop mdat data out, and expect the decoder to know what to do.
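To make that concrete, here is a minimal sketch of the stco side of the bookkeeping, assuming you have the raw bytes of a 32-bit stco box and the signed number of bytes by which the chunk data moved (co64 needs the same treatment with 64-bit offsets):

import struct

def shift_stco(stco: bytes, delta: int) -> bytes:
    # stco layout: size(4) 'stco'(4) version+flags(4) entry_count(4) offsets(4*n), all big-endian
    assert stco[4:8] == b"stco"
    entry_count = struct.unpack(">I", stco[12:16])[0]
    offsets = struct.unpack(f">{entry_count}I", stco[16:16 + 4 * entry_count])
    shifted = [off + delta for off in offsets]
    return stco[:16] + struct.pack(f">{entry_count}I", *shifted)

If you drop samples out of mdat rather than just moving it, you also have to remove the corresponding stco/stsc entries instead of merely shifting them.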
As if this wasn't enough, now you also have to maintain the sample time table (stts), the sample size table (stsz), and, if this were video, the sync sample table (stss).
STTS
stts says how long each sample should play for, in units of the timescale. If you're doing audio, the timescale is probably 44100 or 48000 Hz.
If you've lopped off some data, everything could now be out of sync. If all the entries here have the exact same duration, though, you'd be OK.
STSZ
stsz says what size each sample is in bytes. This is important for the decoder to be able to start at a chunk, and then go through each sample by its size.
Again, if all the sample sizes are exactly the same, you'd be OK. Audio tends to be pretty much the same, but video varies a lot (with keyframes and whatnot).
STSS
And last but not least, we have the stss table, which says which frames are keyframes. I only have experience with AAC, where every audio frame is considered a keyframe; in that case you can have one entry that describes all the packets.
In relation to your original question, the time display isn't always computed the same way by every player. The most accurate way is to sum up the durations of all the frames in the header and use that as the total time; other players use the metadata in the track headers. I've found it best to keep all the values consistent, and then players are happy.
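Summing the frame durations is just a walk over stts; a minimal sketch, assuming you have the raw bytes of the stts box:

import struct

def stts_total_duration(stts: bytes) -> int:
    # stts layout: size(4) 'stts'(4) version+flags(4) entry_count(4),
    # then entry_count pairs of (sample_count, sample_delta), all big-endian uint32
    assert stts[4:8] == b"stts"
    entry_count = struct.unpack(">I", stts[12:16])[0]
    total = 0
    for i in range(entry_count):
        count, delta = struct.unpack(">II", stts[16 + 8 * i:24 + 8 * i])
        total += count * delta
    return total  # in timescale units; divide by the mdhd timescale to get seconds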
If you're doing all that and I missed it in the script, then post a sample MP4 and a standalone app and I can try to help you out.

Download last 30 seconds of an mp3

Is it possible to download only the last 30 seconds of an mp3? Or is it necessary to download the whole thing and crop it after the fact? I would be downloading via http, i.e. I have the URL of the file but that's it.
No, it is not possible... at least not without knowing some more information first.
The real problem here is determining at what byte offset the last 30 seconds start. That requires knowing:
Sample Rate
Bit Depth (per sample)
# of Channels
CBR or VBR
Bit Rate
Even then, you're not going to get that with a VBR MP3 file, and even with CBR, who knows how big the ID3 and other crap at the beginning of the file is. Even if you know all of that, there is still some variability, as you have the problem of the bit reservoir.
The only way to know would be to download the whole file and use a tool such as FFmpeg to find out the right offset. Then, if you want to play it, you'll want to add the appropriate headers and make sure you are trimming on an eligible frame, or fix the bit reservoir yourself.
Now, if this could all be figured out server-side ahead of time, then yes, you could request the offset from the server, and then download from there. As for how to download it, your question is very incomplete and didn't mention what protocol you were using, so I cannot help you there.
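That said, for the simple case where you do know the relevant numbers ahead of time (a CBR file with a known bitrate, fetched over HTTP), the tail can be grabbed with an HTTP suffix range request; a hedged sketch, with the URL and bitrate as placeholder values, and all the VBR/ID3/bit-reservoir caveats above still applying:

import urllib.request

URL = "https://example.com/audio.mp3"  # placeholder
BITRATE_KBPS = 128                     # must be known in advance; CBR only
SECONDS = 30

tail_bytes = BITRATE_KBPS * 1000 // 8 * SECONDS
req = urllib.request.Request(URL, headers={"Range": f"bytes=-{tail_bytes}"})
with urllib.request.urlopen(req) as resp, open("tail.mp3", "wb") as out:
    out.write(resp.read())  # the server must honor Range (HTTP 206); otherwise you get the whole file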
