hls ext-x-discontinuity-sequence for endless stream - http-live-streaming

In an endless HLS stream, I'm not sure how to implement the EXT-X-DISCONTINUITY-SEQUENCE tag.
The RFC states:
If the server removes an EXT-X-DISCONTINUITY tag from the Media
Playlist, it MUST increment the value of the EXT-X-DISCONTINUITY-
SEQUENCE tag so that the Discontinuity Sequence Numbers of the
segments still in the Media Playlist remain unchanged. The value of
the EXT-X-DISCONTINUITY-SEQUENCE tag MUST NOT decrease or wrap.
Clients can malfunction if each Media Segment does not have a
consistent Discontinuity Sequence Number.
The media playlist I create always has the same number of segments, and the oldest one gets deleted when a newer one is added. Sometimes there is a discontinuity between two segments, so I add an EXT-X-DISCONTINUITY tag to the segment. However, after some time, when there are no more discontinuities in the playlist, I remove this tag and should increment the EXT-X-DISCONTINUITY-SEQUENCE.
Since the stream is endless, the value will have to wrap at some point. How do people usually implement this?

The value of EXT-X-DISCONTINUITY-SEQUENCE is defined as a decimal-integer, which in turn is defined as a number in the range 0 to 2^64 - 1 (see https://datatracker.ietf.org/doc/html/draft-pantos-hls-rfc8216bis-09#section-4.2).
Even if you were incrementing EXT-X-DISCONTINUITY-SEQUENCE many many times a second (which the question implies you are not) it seems highly unlikely you would need to handle wrapping of this value.
Given the possible range and relatively slow incrementing in the general case, I seriously doubt anyone worries about wrapping this value, but I'd be interested to be proved wrong.
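As a rough sanity check, here is a back-of-the-envelope calculation. The rate of 1,000 increments per second is an arbitrary and deliberately absurd assumption, not something taken from the question:

```python
# Rough estimate of how long a 64-bit discontinuity sequence takes to wrap.
MAX_VALUE = 2**64 - 1            # upper bound of an HLS decimal-integer
INCREMENTS_PER_SECOND = 1000     # assumed rate, far above any realistic stream
SECONDS_PER_YEAR = 365.25 * 24 * 3600

years_to_wrap = MAX_VALUE / INCREMENTS_PER_SECOND / SECONDS_PER_YEAR
print(f"{years_to_wrap:.3g} years")  # hundreds of millions of years
```

Even at that rate the counter would outlive any stream, so in practice storing the value in an unsigned 64-bit integer should be enough.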

Related

What integer type is used for MP3 data frames?

I am writing a universal parser library for various binary formats in Rust as part of a personal project. I've started researching the file structure of MP3 files. As I understand it, an MP3 file structure consists of header and data frames. Each header frame provides meta information about the data frame that follows it. Here is a diagram and a listing of allowed values for MP3 header frames that I am referencing.
I understand the format of the MP3 header. My confusion, or lack of information, surrounds MP3 data frames. I can't seem to find a source that specifies what integer type samples are encoded as in the data frame portion of an MP3 file. Are they 8 bit, 16 bit, 32 bit, signed, unsigned, etc?
The best I can think of is to use a combination of the sample rate and the bitrate to calculate what each sample size should be. However, that doesn't determine whether each sample is a signed or unsigned integer.
I'm not trying to decode these files, I'm just trying to parse them. I've had a surprisingly hard time finding this information. Any information or help someone can offer would be much appreciated.
Although this is not related to .mp3 per se, there could potentially be some helpful information in Will C. Pirkle's book, Designing Audio Effect Plugins in C++.
He discusses the way in which the .wav audio format stores its information. At 16 bits it uses signed integers ranging from -32,768 to 32,767. This represents a range of 2^16 values in a bipolar format, where the exponent corresponds to the bit depth (most commonly 16 or 24).
Another important thing to note is that while phase inversion is a common thing in many audio applications, there is no corresponding integer for inverting -32,768. To compensate, it's common to treat the value -32,768 as -32,767. This only matters though if you are using the value 0 in your processing, which is most often the case. Otherwise, one could extend the upper limit to 32,768.
He does state that it's more common for audio processing applications to deal with floating point numbers either between 0.0f and 1.0f or -1.0f and 1.0f. The reason is that due to addition and multiplication being common operations in DSP, we avoid overflowing that range if we use these floating points. In the bipolar integer format, it's too easy to find two numbers that result in a product or sum outside that range. In the range of -1.0f to 1.0f, any two numbers will always result in a product that's still within that range. Unfortunately, addition still requires caution, but eh...
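A small sketch of that point, assuming 16-bit samples and the common convention of dividing by 2^15 (the helper name `int16_to_float` is just for illustration):

```python
INT16_SCALE = 32768.0  # 2**15, magnitude of the most negative 16-bit sample

def int16_to_float(sample: int) -> float:
    """Map a signed 16-bit sample to the range [-1.0, 1.0)."""
    return sample / INT16_SCALE

a, b = 30000, -20000
product_int = a * b  # -600,000,000: far outside the 16-bit range
product_float = int16_to_float(a) * int16_to_float(b)  # stays within [-1.0, 1.0]
sum_float = int16_to_float(a) + int16_to_float(a)      # addition can still exceed 1.0
```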
I'm sorry I don't have more information about .mp3s specifically, but perhaps this could still be insightful.
Good luck!

Seeking on Ogg/Opus

I have ogg-opus audio files, each containing a single track (mono) at a fixed sample rate (16kHz). I'm trying to implement seeking on them for streaming. For example, I want to know byte offsets so that I can partially download a file (with HTTP Range) and play only the first 10 seconds, or say from second 10 to second 15. That is, I need to get the byte offset for any given time position.
Is there a way to do it without loading/decoding an entire file in this case?
I don't believe there's a way to determine the exact byte offset for a specific time up front, but op_pcm_seek() from libopusfile could be used to seek while decoding once you have the bytes. Between the varying bit rates, page sizes, and packet durations of Opus files, some guesswork and dynamic calculations seem to be required. I'm attempting to do the same thing and a few people have asked me to implement it in OpusStreamDecoder. You could look at its underlying opus_chunkdecoder.c and the specific feature request which outlines how this could be achieved:
https://github.com/AnthumChris/opus-stream-decoder/issues/1
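One building block for that guesswork could look like the sketch below: given a chunk of bytes fetched from somewhere in the middle of the file, find the first Ogg page and read its granule position, which Ogg Opus expresses in 48 kHz samples regardless of the original sample rate. This is not taken from OpusStreamDecoder; the function name `first_page_time` is hypothetical, `pre_skip` is assumed to have been read from the OpusHead header, and the page CRC is not verified, so a stray "OggS" inside packet data could produce a false match. A bisection over byte offsets could then call it repeatedly to narrow down the page that contains the target time.

```python
import struct

# Fixed 27-byte part of an Ogg page header: capture pattern, version, flags,
# granule position, serial number, page sequence, CRC, segment count.
OGG_HEADER = struct.Struct("<4sBBqIIIB")

def first_page_time(chunk: bytes, pre_skip: int = 0):
    """Return (offset, seconds) for the first Ogg page in chunk, or None.

    The time is that of the last packet completed on the page.
    """
    pos = chunk.find(b"OggS")
    while pos != -1 and pos + OGG_HEADER.size <= len(chunk):
        _, version, _, granule, _, _, _, _ = OGG_HEADER.unpack_from(chunk, pos)
        # A granule position of -1 means no packet finishes on this page.
        if version == 0 and granule >= 0:
            return pos, max(granule - pre_skip, 0) / 48000.0
        pos = chunk.find(b"OggS", pos + 1)
    return None
```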

Associating segment with an absolute date and time (equivalent of PROGRAM-DATE-TIME for DASH)

HLS offers the EXT-X-PROGRAM-DATE-TIME tag that
associates the first sample of a
Media Segment with an absolute date and/or time
What is the equivalent of PROGRAM-DATE-TIME for MPEG-DASH?
I looked into MPEG-DASH events, but they associate arbitrary metadata with a period of time. What I need is to associate a time with a segment.
I looked into MPEG-DASH descriptors, but as far as I could understand from the spec they associate metadata with a period/adaptation-set: things like audio configuration, frame packing, DRM protection. I don't think descriptors can associate metadata with a specific segment (by segment I mean an element described by the <S> tag in the MPD).
I know that PROGRAM-DATE-TIME falls into the more general category of associating arbitrary metadata with a segment, so I looked into the segment description in the MPEG-DASH spec (5.3.9.6.3 in ISO/IEC 23009-1) and I noticed that in addition to "t", "d" and "r" any other attribute can also be specified. But the spec doesn't say a word on the meaning or format of these additional attributes. I guess they are left to the application. But still I am wondering if there is any standardized name for specifying an absolute date and time?
presentationTimeOffset is the DASH equivalent to PDT
https://dashif-documents.azurewebsites.net/DASH-IF-IOP/master/DASH-IF-IOP.html#timing-sampletimeline
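As I read the timing model in that document, presentationTimeOffset is one term in the calculation rather than the whole answer: the wall-clock time of a segment comes out as MPD@availabilityStartTime + Period@start + (S@t - @presentationTimeOffset) / @timescale. A minimal sketch under that assumption (the function name and the sample values are made up for illustration):

```python
from datetime import datetime, timedelta, timezone

def segment_wallclock(availability_start, period_start_s, t,
                      presentation_time_offset, timescale):
    """Absolute time of the first sample of a segment.

    availability_start: MPD@availabilityStartTime as an aware datetime
    period_start_s:     Period@start in seconds
    t:                  S@t of the segment, in timescale units
    """
    offset_s = period_start_s + (t - presentation_time_offset) / timescale
    return availability_start + timedelta(seconds=offset_s)

ast = datetime(2024, 1, 1, tzinfo=timezone.utc)
start = segment_wallclock(ast, 0.0, t=900_000,
                          presentation_time_offset=0, timescale=90_000)
print(start.isoformat())  # 10 seconds after availabilityStartTime
```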

MPEG-DASH trick modes

Does anyone know how to do trick modes (rewind/forward at different speeds) with MPEG-DASH ?
DASH-IF Interoperability Points V3.0 states that it is possible.
The general idea is laid out in the document, but the details are not specified.
A DASH segmenter should add tracks with a frame rate lower than normal to a specially marked AdaptationSet. Roughly you could say (even though in theory you should look at the exact profile/level thresholds) that half the frame rate allows double the playout rate, and a quarter of the frame rate allows quadruple the playout rate.
All this is only an offer to the DASH client to facilitate fast-forward. The client can use it but doesn't have to. If the DASH client doesn't understand the AdaptationSet at all, it will disregard it due to the EssentialProperty that marks it as a trick play AdaptationSet.
I can't see that fast rewind can be supported in any spec conforming way. You'd need to implement it according to your needs but with no expectation of interoperability.
You can find an indication in ISO/IEC 23009-1:2014(E), Annex A:
The client may pause or stop a Media Presentation. In this case client simply stops requesting Media Segments or parts thereof. To resume, the client sends requests to Media Segments, starting with the next Subsegment after the last requested Subsegment.
If a specific Representation or SubRepresentation element includes the #maxPlayoutRate attribute, then the corresponding Representation or Sub-Representation may be used for the fast-forward trick mode. The client may play the Representation or Sub-Representation with any speed up to the regular speed times the specified #maxPlayoutRate attribute with the same decoder profile and level requirements as the normal playout rate. If a specific Representation or SubRepresentation element includes the #codingDependency attribute with value set to 'false', then the corresponding Representation or Sub-Representation may be used for both fast-forward and fast-rewind trick modes.
Sub-Representations in combination with Index Segments and Subsegment Index boxes may be used for efficient trick mode implementation. Given a Sub-Representation with the desired #maxPlayoutRate, ranges corresponding to SubRepresentation#level all level values from SubRepresentation#dependencyLevel may be extracted via byte ranges constructed from the information in Subsegment Index Box. These ranges can be used to construct more compact HTTP GET request.

Detecting presence (arrival/departure) with active RFID tags

Arrival is actually pretty simple: the tag gets into range of a receiver's antenna. Departure is what is causing the problems.
First some information about the setup we have.
Tags:
They work at 433 MHz. Every 1.5 seconds they transmit a "heartbeat", and on movement they go into a transmission burst mode which lasts for as long as they are moving.
They transmit their ID, a transmission sequence number (1 to 255, repeating over and over), how long they have been in use, and input from the motion sensor, if any. We have no control over them whatsoever. They will continue doing what they do until their battery dies. And they are sealed shut.
The receiver forwards all that data, plus the signal strength of the tag, to our software. The software can work with several receivers. Currently we are using omnidirectional antennas.
How can we be sure that the tag has departed from premises?
Problems:
Sometimes two or more tags transmit a "heartbeat" at the same time and no signal is received. As the number of tags increases, these collisions happen more often. The tags mitigate this by randomly varying their heartbeat interval (by several milliseconds). The problem is that I can't treat a tag not "checking in" for a certain period of time as a sign of departure: it could just be a timeout caused by collisions, so we cannot rely on every "heartbeat" being received.
The tag manufacturer advised that we use two receivers and set them up as a gate for tags to pass through. Based on the order in which a tag passes the "gates" we can tell in which direction it is going. The problem with our omnidirectional antennas is that sometimes a tag's signal bounces off a building before arriving at the receiver, so based on signal strength it looks like the tag is farther away than it is.
Does anybody have a solution of what we can do to have a reliable way of determining if tags are coming or leaving? Also we can setup antennas in different way as well.
I wrote the software that interprets data from receivers, so that part can be manipulated in any way. But I'm out of ideas of how to interpret information to get reliability we need.
Right now the only idea is to try directional antennas, but I would like to try out all the options with the equipment we currently have.
Also, any literature suggestion that deals with active RFID tags is more than welcome; most of the books I've found deal with passive tag solutions.
As a top level statement, if you need to track items leaving your site, your RFID technology is probably the wrong one. The technology you have is better suited to positional tracking of tags within a large area, e.g. a factory floor. Notwithstanding the above, here is my take:
A good approach to active RFID is to break your area down into zones that are tied to your business processes, for example:
Warehouse
Loading bay
Packing
Entry of a tag into a zone represents the start of a new process or perhaps the end of a process the tag is currently in. For example, moving from warehouse to the packing represents assembling a shipment, and movement into the loading bay initiates a shipment.
The crux of many RFID implementations is the installation and configuration of the RFID infrastructure to:
Map tag -> asset (which you have done)
Map tag read -> zone (and by inference asset -> zone)
Map movements between zones to steps in a business process (and therefore understand when an asset leaves the site, your goal)
There are a number of considerations: the physical characteristics of 433MHz signals, position of antennae, sensitivity of antennae and some tricks that some vendors have. After an optimal site configuration, then you may need to have some processing tricks on the tag reads that will pour in.
Dirty data
Always keep in mind that tag read data is dirty - that RF interference (from unshielded motors, electric wiring, etc), weather conditions and physical manipulation of tags (eg covering with metal) happen all the time.
RSSIs are like stock tickers - there is a lot of random/microeconomic noise on top of broad macroeconomic trends. To interpret movement, compute the linear regression of groups of reads rather than relying on a specific read's RSSI.
If you do see a tag broadcasting with a high RSSI, which then falls to medium then low and then disappears, you really can interpret that as the tag is leaving the range of the receiver. Is that off-site? Well, you need to consider the site's layout (the zones) and the positioning of receivers within the zones.
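A minimal sketch of that regression, assuming each read has been reduced to a (timestamp in seconds, RSSI) pair for one tag at one receiver. The function name and the sample values are invented for illustration:

```python
def rssi_slope(reads):
    """Least-squares slope (RSSI units per second) of (timestamp_s, rssi) pairs."""
    n = len(reads)
    if n < 2:
        return 0.0
    mean_t = sum(t for t, _ in reads) / n
    mean_r = sum(r for _, r in reads) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in reads)
    if var_t == 0:
        return 0.0  # all reads share one timestamp, no trend can be computed
    cov = sum((t - mean_t) * (r - mean_r) for t, r in reads)
    return cov / var_t

# Noisy but clearly fading signal: individual reads go up and down,
# yet the fitted slope is negative.
reads = [(0.0, -52), (1.5, -55), (3.0, -53), (4.5, -60),
         (6.0, -63), (7.5, -61), (9.0, -70)]
slope = rssi_slope(reads)
```

A persistently negative slope over the last few groups of reads, followed by silence, is a much stronger hint of departure than any single weak read.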
Trilateration
EDIT I had incorrectly used the term 'triangulation'. That refers to determining the position of something by knowing the angle it subtends from two or three known locations. In RFID, you use the distance, and as such it is called 'trilateration'.
In my experience, vendors selling the tag technology you describe have server software that determines the absolute position of the tags using the received RSSI. You should be able to obtain the position of the tag within 1-10m using such software. Determining if the tag is moving off-site is then easy.
To code this yourself:
First, each tag is pinging away when moving. These pings hit the receivers at almost the same time and are sent to the server. However, the messages can sometimes arrive out of order or interleaved with earlier and later reads from other receivers. To help correlate pings, the ping contains a sequence number. You are looking for tag reads from the same tag, with the same sequence number, received by three (or more) receivers. If more than three, pick the three with the largest RSSI.
The distance is approximated from RSSI. This is not linear and subject to non-trivial random variation. A quick google turns up:
Given three approximate distances from three known points (the receivers' locations), you can then resolve the approximate position of the tag using Trilateration using 3 latitude and longitude points, and 3 distances.
Now you have the absolute position of the tag. You can use these positions to track the absolute movement of the tag.
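The distance and position steps above can be sketched roughly as follows. Everything here is an assumption for illustration: the log-distance path loss model and its constants (`rssi_at_1m`, `path_loss_exponent`) would have to be calibrated on site, and the receivers are placed on a flat x/y plane in metres rather than at latitude/longitude points.

```python
import math

def rssi_to_distance(rssi, rssi_at_1m=-40.0, path_loss_exponent=2.5):
    """Approximate distance in metres using a log-distance path loss model."""
    return 10 ** ((rssi_at_1m - rssi) / (10 * path_loss_exponent))

def trilaterate(p1, r1, p2, r2, p3, r3):
    """Estimate (x, y) from three receiver positions and three distances.

    Subtracting the circle equations pairwise gives two linear equations,
    a*x + b*y = c and d*x + e*y = f, which are solved directly.
    """
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    a, b = 2 * (x2 - x1), 2 * (y2 - y1)
    c = r1**2 - r2**2 - x1**2 + x2**2 - y1**2 + y2**2
    d, e = 2 * (x3 - x2), 2 * (y3 - y2)
    f = r2**2 - r3**2 - x2**2 + x3**2 - y2**2 + y3**2
    det = a * e - b * d
    if abs(det) < 1e-12:
        raise ValueError("receivers are collinear, position is ambiguous")
    return (c * e - b * f) / det, (a * f - c * d) / det

# Receivers at three corners of a 10 m square, tag actually at (3, 4).
x, y = trilaterate((0, 0), 5.0, (10, 0), math.sqrt(65), (0, 10), math.sqrt(45))
```

With real, noisy distances the three circles rarely meet in a single point; the linear solve still returns an estimate, which is one more reason to smooth positions over several pings.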
To make this useful, you should position receivers so that you can reliably detect tags right up to the physical site boundaries. You should then determine a 'geofence' around your site, within receiver range. I would write a business rule that states:
If the last known position of a tag was outside the geofence, and
A tag read from the tag has not been detected in (say) 10s, then
Declare the tag has left the site.
By using the trilateration and geofence, you can focus the business logic on only those tags close to going awol. If you fail to receive your 1.5s ping only a few times from such a tag, it's highly likely that the tag has gone outside your receiver's range, and therefore off-site.
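The business rule above might be expressed like this. It is only a sketch: the geofence is assumed to be an axis-aligned rectangle in site coordinates, and `TagState`, `inside_geofence` and `has_left_site` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class TagState:
    last_position: tuple   # (x, y) from trilateration, in site coordinates
    last_read_time: float  # time of the most recent read, in seconds

def inside_geofence(pos, fence):
    """fence is (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = fence
    x, y = pos
    return xmin <= x <= xmax and ymin <= y <= ymax

def has_left_site(tag, now, fence, timeout_s=10.0):
    """Last seen outside the geofence AND silent for longer than timeout_s."""
    return (not inside_geofence(tag.last_position, fence)
            and now - tag.last_read_time > timeout_s)

fence = (0.0, 0.0, 100.0, 50.0)
near_exit = TagState(last_position=(105.0, 20.0), last_read_time=0.0)
in_warehouse = TagState(last_position=(40.0, 20.0), last_read_time=0.0)
```

A tag deep inside the site that goes quiet for 10 seconds is not declared gone; only tags last seen beyond the fence are.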
You're already aware that tag reads can sometimes come from reflections. If you have a lot of these, then your trilateration will be pretty poor. So this method works best when there are fairly large open spaces and minimal reflectors.
Some RFID vendors have all this built into their servers - processing this by writing your own code is (clearly) non-trivial.
Zone design using wide-area receivers
Logical design of zones can help the business logic layer. For example, suppose you have two zones (A and B) with two receivers (1 and 2):
      A          B
+----------+----------+
|          |          |
|    1     |    2     |
|          |          |
+----------+----------+
If you get tag reads from the tag at receiver 1, then one at receiver 2, how do you interpret that? Did tag T move into zone B, or just get a read at the extreme range of 2?
If you get a later read at 1, did the tag move back, or did it never move?
A better physical solution is:
      A          B
+----------+----------+
|          |          |
|    1     2     3    |
|          |          |
+----------+----------+
In this approach, a tag moving from A to B would get reads from the following receivers:
1 1 1 2 1 2 2 3 2 2 3 2 3 3 3 3 3
-------> time
From a programming logic point of view, a movement from A -> B has to traverse reads 1 -> 2 -> 3 (even though there is a lot of jitter). It gets even easier to interpret when you combine with RSSI.
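One simple way to see through the jitter, sketched under the assumption that the receivers are numbered in their physical order along the path (so a median is meaningful): run a sliding median over the receiver IDs and then collapse repeats. The function names are illustrative only.

```python
def smooth_reads(receiver_ids, window=5):
    """Sliding median over receiver IDs (window should be odd)."""
    smoothed = []
    for i in range(len(receiver_ids) - window + 1):
        ordered = sorted(receiver_ids[i:i + window])
        smoothed.append(ordered[window // 2])
    return smoothed

def collapse(seq):
    """Drop consecutive duplicates: [1, 1, 2, 2, 3] becomes [1, 2, 3]."""
    out = []
    for r in seq:
        if not out or out[-1] != r:
            out.append(r)
    return out

# The read sequence from the example above.
reads = [1, 1, 1, 2, 1, 2, 2, 3, 2, 2, 3, 2, 3, 3, 3, 3, 3]
path = collapse(smooth_reads(reads))   # [1, 2, 3]: a clean A -> B traversal

# A tag sitting still in zone A that occasionally reaches receiver 2.
stationary = collapse(smooth_reads([1, 1, 2, 1, 1, 2, 1, 1]))  # [1]
```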
Portal design with directional receivers
You can create quite a good portal using two directional receivers (you will need to spend some time configuring the antenna and sensitivity carefully). Mount a receiver well above the door on both sides. Below is a schematic from the side. R1 and R2 are the receivers (and the rough read field is shown), and on the left is a worker pushing an asset through the door:
----> direction of motion
-------------------+----------------
R1 | R2
/ \ | / \
o / \ / \
|-++ / \ / \
|\++ / \ / \
------------------------------------------
You should get a pattern of reads like this:
<nothing> 1 1 1 1 1 12 1 21 2 12 2 1 2 2 2 2 2 <nothing>
-------> time
This indicates a movement from receiver 1 to receiver 2.
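A crude way to turn such a pattern into a direction, assuming the reads for one pass through the portal have already been grouped together: compare the average time at which each receiver saw the tag. The function name and return values are hypothetical.

```python
def portal_direction(reads):
    """reads: list of (timestamp_s, receiver_id), with receiver_id 1 or 2.

    Returns "1->2", "2->1", or None when the direction cannot be decided.
    """
    t1 = [t for t, r in reads if r == 1]
    t2 = [t for t, r in reads if r == 2]
    if not t1 or not t2:
        return None  # the tag was never seen on both sides of the door
    mean1, mean2 = sum(t1) / len(t1), sum(t2) / len(t2)
    if mean1 == mean2:
        return None
    return "1->2" if mean1 < mean2 else "2->1"

# Receiver IDs in arrival order, overlapping in the middle as in the pattern above.
seq = [1, 1, 1, 1, 1, 1, 2, 1, 2, 1, 2, 1, 2, 2, 1, 2, 2, 2, 2, 2]
forward = portal_direction(list(enumerate(seq)))
backward = portal_direction(list(enumerate(reversed(seq))))
```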
"Signposts"
Savi implementations often use "signposts" to assist with location. The signpost illuminates a small area (like a doorway) with a 123 kHz beam. The signpost also transmits a unique number identifying itself (the left door might be 1, while the right door might be 2). When the tag passes through the beam, it wakes up and re-broadcasts the number. The reader now knows which door the tag passed through.
Watch out for any metal in the surrounding area. 123 kHz travels extremely well down rebar in concrete walls, metal fences and rail tracks. We once had tags reporting themselves hundreds of meters from a signpost due to such effects.
With this approach you can implement a portal much like you would for passive.
Simulating signposts
If you don't have the ability to use signposts, then there is a dirty hack:
Stick a passive RFID tag to your active RFID tag
Install a passive RFID reader on each doorway
Passive RFID is actually very good in restricted spaces, so this implementation can work very well. This solution may cost the same as (or less than) a signpost solution from your active RFID vendor.
If you're clever, you can use the EPC GIAI namespace for the passive tag ID and so burn it with the active tag ID. Both active and passive tags would then be identically named.
Physical considerations
433MHz tags have some interesting characteristics. Well-constructed receivers can get a read of tags within about 100m, which is a long way for RFID. In addition, 433MHz wraps itself around obstacles very well, especially metal ones. We could even read tags in the boot (trunk) of a car travelling at 50km/h - the signal propagates from the rubber seal.
When installing a reader to monitor a zone, you need to adjust its location and sensitivity very carefully to maximize the reads from tags within your zone, but also to minimize reads from outside your zone. This might be done in HW or in SW configuration (like dropping all reads below a particular RSSI).
One idea might be to move the receiver away from the area where your tags are exiting as in the layout below (R is the reader):
+-------------------------+-----------+
| Warehouse | Exit |
| . |
| .
| R . R --->
| .
| . |
| | |
+-------------------------+-----------+
It pays to do a RF site survey and spend enough time to properly understand how tags and readers work in an area. Getting the physical installation right is critical.
Another thing to do is to consider physical constrictions such as corridors and doorways and treat them as choke-points: map logical zones to them. Put a reader at each one, with a directional receiver and lowered sensitivity tuned to cover just the constriction.
What no tag-reads actually means
If my experience of RFID has taught me anything, it is that you can get spurious reads at any time, and you need to treat everything with a degree of suspicion. For example, you might have a few seconds of missing reads from a given tag - this can mean anything:
A user accidentally putting a metal tin over the tag
A fork lift truck getting between tag and reader
An RF collision
A momentary network congestion
The battery dying or fading out (remember to check the low-battery flag in tag reads and ensure the business has a process to replace old tags).
Tag destroyed by a pallet being pushed into it
Stolen by someone wanting to resell it for scrap (Not a joke - this actually happened)
Oh yeah, it may be that the tag moved off-site.
If the tag has not been heard of in, say, 5 minutes, odds are that it's off site.
In most business processes that you would use this active tag technology for, a short delay before the system decides the tag is off-site is acceptable.
Conclusions
Site survey: spend time experimenting with readers in different locations. Walk around the site with a tag and see what reads you are actually getting. Use this to:
Logically segment your site into zones and locate receivers to most accurately position tags in zones
It's easier to determine movement between zones using several receivers; if possible, instrument physical constrictions such as doors and corridors as portals. As part of your RFID implementation, you might even want to install new walls or fences to create such constrictions. Consider a passive RFID for portals.
Beware of metal, especially large expanses of it.
You have dirty data. You need to compute linear regressions on the RSSIs to spot trends over short periods, and you need to be able to forgive a small number of missing tag reads.
Make sure that there are business processes to handle dying batteries and sudden disappearances of tags.
Above all, this problem is best solved by getting the receivers installed in the best locations and configuring them carefully, then getting the software right. Trying to solve a bad site installation with software can cause premature ageing.
Disclosure: I worked 8 years for a major active RFID vendor.
Using directional antennas sounds like it may be a more reliable option, although this obviously depends on the precise layout of your premises.
As far as using your current omnidirectional receivers, there are a couple of options I can think of:
The first one, and likely the easiest, would be to collect some data on the average 'check-in' times you are seeing for on-site tags, possibly as a function of the number of on-site tags (if the number is likely to change dramatically - as your collision frequency will be related to the number of tags present). You can then analyse this data to see if you can choose a suitable cut-off time, after which you declare that a tag is no longer present. Obviously exactly what cut-off you choose will depend on the data you see and your willingness to accept false positives - it could also be that any acceptable cut-off time lies outside your 3 minute window (although I suspect that if that is the case then your 3 minute window may not be viable).
Another, more difficult, option (or group of options more like), would be to utilise more historical information about each tag - for instance, look for tags whose signal strength gradually decreases and then disappears, or tags whose check-in time changes drastically, or perhaps utilise multiple receivers and look for patterns between receivers - such as tags which are only seen by one receiver and then disappear, or distinctive patterns of signal strength (indicating bearing) between receivers as tags go off-site.
Obviously the second option is really about looking for patterns, both over time and between receivers, and is likely to be much more labour (and analysis) intensive to implement. If you are able to capture enough good quality data you might be able to utilise machine-learning algorithms to identify relevant patterns.
We do this every day.
First question is: "How many tags do you have at a reader at any given time?". Collisions are more rare than you might think, but they do happen and tag over-population can be easily determined.
Our software may well be using the same readers and tags that you are using. We set reader timeouts to determine when a tag is "away" or "offsite", usually 30 seconds without the tag being read. Arrival of course is instantaneous: when a tag is detected at the reader, the tag is flagged "onsite".
We also have the option to use multiple readers; one at a gate and another on the parking lot or in the building, for example. The gate reader has a short timeout. If a tag passes the gate reader, it is read and then times out very quickly, which flags the tag as "offsite". If the tag is then read by any other reader, it is considered "onsite" again.
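To make the idea concrete, here is a rough sketch of per-reader timeouts. It is not our product's code; the class name, the reader names and the 5 second gate timeout are made up for illustration.

```python
class PresenceTracker:
    """Flags a tag onsite or offsite based on the reader that last saw it."""

    def __init__(self, reader_timeouts, default_timeout_s=30.0):
        self.reader_timeouts = reader_timeouts  # reader_id -> timeout in seconds
        self.default_timeout_s = default_timeout_s
        self.last_read = {}                     # tag_id -> (timestamp_s, reader_id)

    def on_read(self, tag_id, reader_id, now):
        # Any read marks the tag as present immediately.
        self.last_read[tag_id] = (now, reader_id)

    def is_onsite(self, tag_id, now):
        if tag_id not in self.last_read:
            return False
        seen_at, reader_id = self.last_read[tag_id]
        timeout = self.reader_timeouts.get(reader_id, self.default_timeout_s)
        return now - seen_at <= timeout

tracker = PresenceTracker({"gate": 5.0})
tracker.on_read("T1", "yard", now=0.0)
yard_ok = tracker.is_onsite("T1", now=20.0)     # within the 30 s default
tracker.on_read("T1", "gate", now=30.0)
gate_ok = tracker.is_onsite("T1", now=33.0)     # within the 5 s gate timeout
gate_gone = tracker.is_onsite("T1", now=40.0)   # gate timeout expired
```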
I can post links if you think it would be helpful; otherwise you can search for RFID Track. It's an iOS app and there are settings posted for a demo server.
Peter
