MPEG-DASH and codecs specification

Looking at this article: http://www.streamingmedia.com/Articles/Editorial/What-Is-.../What-is-MPEG-DASH-79041.aspx
It makes statements like:
DASH is codec-independent, and will work with H.264, WebM and other codecs
DASH supports both the ISO Base Media File Format (essentially the MP4 format) and MPEG-2 transport streams
DASH does not specify a DRM method but supports all DRM techniques specified in ISO/IEC 23001-7: Common Encryption
But how are the audio/video codec and the DRM method specified in the Media Presentation? Where can I find more details?

DASH is a streaming protocol - the video stream is inside a 'container' and the container is broken into chunks and streamed. A very high level view of the video component is:
elementary video stream encoded with some codec
fragmented mp4 container (broken into chunks to facilitate ABR)
MPEG DASH streaming protocol
The mp4 container header contains information about all the streams inside it, including the codec used to encode each stream (e.g. h.264 for a video stream).
ABR essentially allows the client device or player to download the video in chunks, e.g. 10-second chunks, and select the next chunk from the bit rate most appropriate to the current network conditions.
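As a toy illustration of that selection step, here is a minimal Python sketch of a throughput-based heuristic (the function name, inputs and safety factor are my own assumptions; real players use considerably more sophisticated logic):
def pick_rendition(bitrates_bps, measured_throughput_bps, safety=0.8):
    # Pick the highest advertised bit rate that fits comfortably
    # within the measured network throughput.
    affordable = [b for b in sorted(bitrates_bps)
                  if b <= measured_throughput_bps * safety]
    return affordable[-1] if affordable else min(bitrates_bps)

# e.g. renditions at 0.5, 1, 2 and 4 Mbit/s, measured throughput 2.5 Mbit/s:
print(pick_rendition([500_000, 1_000_000, 2_000_000, 4_000_000], 2_500_000))
# -> 2000000 (2.5e6 * 0.8 = 2e6, so the 2 Mbit/s rendition still fits)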
The DASH manifest (essentially an index file that contains pointers to the different bit rate streams etc.) contains header information about the protection systems in use, for example the Widevine or PlayReady DRMs.
The mp4 container also carries information about the protection system in a special PSSH ('Protection System Specific Header') box for each protection system in use, again for example Widevine or PlayReady.
Generally DASH streams will have the protection information in both places to ensure that all players can play the stream, but last time I looked the spec, strictly speaking, allows it to be in either or both.
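To illustrate, here is a minimal Python sketch (not production code; the file name and the set of container boxes walked are assumptions) that walks the ISO Base Media File Format box structure of a DASH init segment and reports the SystemID of any pssh boxes it finds:
import struct

CONTAINER_BOXES = {b"moov", b"moof", b"trak", b"mdia", b"minf", b"stbl"}

def find_pssh(buf, start=0, end=None):
    # Walk ISO-BMFF boxes: 4-byte big-endian size, 4-byte type,
    # optional 64-bit largesize.
    end = len(buf) if end is None else end
    off = start
    while off + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", buf, off)
        header = 8
        if size == 1:              # 64-bit largesize follows the type
            size = struct.unpack_from(">Q", buf, off + 8)[0]
            header = 16
        elif size == 0:            # box extends to end of enclosing space
            size = end - off
        if size < header:          # corrupt input; stop rather than loop
            break
        if box_type == b"pssh":
            # FullBox: 1 byte version + 3 bytes flags, then 16-byte SystemID
            system_id = buf[off + header + 4 : off + header + 20]
            print("pssh SystemID:", system_id.hex())
        elif box_type in CONTAINER_BOXES:
            find_pssh(buf, off + header, off + size)
        off += size

with open("init.mp4", "rb") as f:  # hypothetical DASH init segment
    find_pssh(f.read())
# Widevine's SystemID is edef8ba979d64acea3c827dcd51d21ed and
# PlayReady's is 9a04f07998404286ab92e65be0885f95.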
The specs themselves are available here:
http://standards.iso.org/ittf/PubliclyAvailableStandards/index.html (search for DASH)
https://www.iso.org/standard/68042.html - unfortunately, this one requires payment AFAIK. You can see a W3C spec which uses it here, however: https://w3c.github.io/encrypted-media/format-registry/stream/mp4.html
And there is a nice overview of DASH here:
https://www.w3.org/2011/09/webtv/slides/W3C-Workshop.pdf
And, of course, the classic reference to some of the drivers for DASH and similar standards:
https://xkcd.com/927/

Related

Does YouTube store video and audio separately?

youtube-dl can be used to see what formats are used to store YouTube content:
youtube-dl -F https://youtu.be/??????
The above command suggests that the audio and video are mostly stored separately. Is that right? Does YouTube combine audio and video in real time when streaming?
[Image: format listing for a sample YouTube video]
Most large streaming services will use ABR streaming (see: https://stackoverflow.com/a/42365034/334402).
The two most common ABR streaming formats are HLS and MPEG-DASH. Both provide a manifest or index file, which the player downloads first and which contains links to the media streams - typically audio, video and subtitle tracks etc.
For encrypted content the audio and video, and even different bit rate video tracks, may all have separate encryption keys.
The player will download the audio and video tracks and synchronise them for playback.
In general, streaming video and audio are sent in separate channels; the same goes for multi-track audio such as 5.1 surround. During transport these channels are wrapped in a media container such as MP4.
The motive is partly the distinct compression algorithms: some algorithms are best for audio and others for video, and baked into the video algorithms is the spreading and sharing of data over time across video frames (see B-frames for details). These channels are not limited to video and audio: if you own both the sending and receiving sides, you can send arbitrary data in many distinct channels by making up your own data protocol. As an aside, modern streaming transports (see the RTP/RTCP RFCs) also allow data to be sent from the receiver back to the sender while you think you are simply viewing a movie.
YouTube stores each of its various flavors of video and audio in separate files on its end, then combines them based on the desired streaming-quality choice on a per-download basis.
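As a rough sketch of that combining step, here is a Python example that shells out to FFmpeg to remux separately stored audio and video into one file without re-encoding (the file names are hypothetical):
import subprocess

# -c copy remuxes the existing streams into one container; nothing is re-encoded
subprocess.run([
    "ffmpeg", "-i", "video_1080p.mp4", "-i", "audio_128k.m4a",
    "-c", "copy", "muxed.mp4",
], check=True)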

Is it possible to splice advertisements or messages dynamically into an MP3 file via a standard GET request?

Say you have an MP3 file and it's 60,000,000 bytes, and you also have an MP3 advertisement that's 500,000 bytes, both encoded at the same bit rate.
Would it be possible, using an nginx or Apache module, to change the MP3 "Content-Length" header value to 60,500,000 and then handle the incoming "Range" requests so the first 500,000 bytes return the advertisement audio, and any range request beyond 500,000 returns the regular audio file with a 500,000-byte offset?
Or is it only possible to splice advertisements (or messages) into an MP3 file using an application such as FFmpeg to re-render the entire file?
Apologies if this is a stupid question, I'm just trying to think outside of the box.
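To make the byte arithmetic concrete, here is a minimal Python sketch of the mapping I have in mind (file names are made up, sizes are from the example above; this shows the arithmetic only, not whether the result would decode cleanly):
AD_LEN = 500_000  # advertisement size in bytes, from the example above

def map_range(first, last):
    # Map a byte range in the virtual (ad + episode) stream to the real files.
    if last < AD_LEN:
        return [("ad.mp3", first, last)]
    if first >= AD_LEN:
        return [("episode.mp3", first - AD_LEN, last - AD_LEN)]
    return [("ad.mp3", first, AD_LEN - 1),
            ("episode.mp3", 0, last - AD_LEN)]

print(map_range(0, 1_000_000))
# -> [('ad.mp3', 0, 499999), ('episode.mp3', 0, 500000)]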
You cannot arbitrarily splice MP3 without artifacts and decoder errors.
You also generally cannot cut/splice MP3 on frame boundaries, due to the bit reservoir. Basically, a particular MP3 frame may contain data from another frame to more efficiently use the available bandwidth when it's needed. Ignoring the bit reservoir can also cause artifacts and/or decoder errors.
What you can do is re-encode your advertisement and eventually re-join the stream. That is, at the point of ad insertion, decode the stream to PCM, mix in (or replace) the audio for your ad, and re-encode this parallel stream back to MP3. If the encoding parameters are the same, eventually (after a couple of extra MP3 frames) you'll have identical bitstreams, and you can go back to reading the stream from the same buffer.
If you're doing this for ad insertion on internet radio (live) streams, keep in mind that you'll have to do this on the server for every client (or at least for each ad variant and timing variant). If this is for podcasts or other pre-recorded content, I'd recommend the FFmpeg route. You won't have to build anything, you can stream and cache the output as it's being encoded, and you'll have compatibility with other codecs without building one-off code for each codec/container.
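For the pre-recorded case, a minimal sketch of that FFmpeg route in Python (file names and bit rate are assumptions); the concat filter decodes both inputs to PCM, joins them, and re-encodes a single clean MP3:
import subprocess

subprocess.run([
    "ffmpeg", "-i", "ad.mp3", "-i", "episode.mp3",
    "-filter_complex", "[0:a][1:a]concat=n=2:v=0:a=1[out]",
    "-map", "[out]", "-b:a", "128k", "combined.mp3",
], check=True)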

How do Shoutcast servers and clients deal with mp3 frame headers and frame dependencies?

Short story:
If I intend to receive and then re-send a Shoutcast-compatible audio stream processed by my application, how do I do it properly using an MP3 encoder/decoder library? Pseudo code, or better, LAME-specific code, would be highly appreciated.
Long story:
The more specific questions bothering me were prompted by an article about MP3, which says:
Generally, frames are independent items. Each frame has its own header
and audio informations. There is no file header. Therefore, you can
cut any part of MPEG file and play it correctly (this should be done
on frame boundaries but most applications will handle incorrect
headers). For Layer III, this is not 100% correct. Due to internal
data organization in MPEG version 1 Layer III files, frames are often
dependent of each other and they cannot be cut off just like that.
This made me wonder how Shoutcast servers and clients deal with frame headers and frame dependencies.
Do I have to encode to constant bitrate (CBR) only, if I want to achieve maximum compatibility with most Shoutcast players out there?
Is the MP3 frame header used at all, or is the stream format deduced from a Shoutcast-protocol-specific HTTP header?
Does the Shoutcast protocol guarantee (or is it common good practice) that serving starts on a frame boundary and that responses are chunked at frame boundaries? And what is the minimum or recommended MP3 frame size for streaming live audio?
How does Shoutcast deal with frame dependencies - does it do something special with the MP3 encoding to ensure that the served stream has no frames that depend on previous frames (if that is even possible)? Or does it ignore these dependencies on the server/client side, accepting reduced audio quality or even artifacts?
SHOUTcast servers do not know or care about the data being passed through them. They send it as-is. You can actually send arbitrary data through a SHOUTcast server, and receive it. SHOUTcast will segment the media data wherever the buffer size falls.
It's up to the client to re-sync to the data. It does this by locating the next frame header, then beginning to decode. Once the codec has enough frames to reliably play back audio, it will begin outputting raw PCM. It's up to the codec to decide when it's safe to start playback. Since the codec knows what it's doing in terms of decoding the media, it knows when it has sufficient data (including bit reservoirs) to begin without artifacts. It's also worth noting that the bit reservoir cannot carry data back very far, so at worst it takes only a few frames to handle it.
This is one of the reasons it's important to have a sizable buffer server-side, to flush to the clients as fast as possible on connect. If playback is to start quickly, the codec needs more data than the current frame to begin.
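To make the re-sync step concrete, here is a minimal Python sketch (assuming the standard MPEG audio frame-header layout) that scans a byte buffer for the 11-bit sync word and applies a few sanity checks to reject false positives; a real client would also verify that another valid header follows at the computed frame length:
def find_next_frame(buf, start=0):
    # MPEG audio frame header: 11 sync bits, then version, layer,
    # bitrate index and sample-rate index fields with reserved values.
    i = start
    while i + 4 <= len(buf):
        if buf[i] == 0xFF and (buf[i + 1] & 0xE0) == 0xE0:
            version = (buf[i + 1] >> 3) & 0x03  # 0b01 is reserved
            layer   = (buf[i + 1] >> 1) & 0x03  # 0b00 is reserved
            bitrate = (buf[i + 2] >> 4) & 0x0F  # 0b1111 is invalid
            s_rate  = (buf[i + 2] >> 2) & 0x03  # 0b11 is reserved
            if version != 1 and layer != 0 and bitrate != 0x0F and s_rate != 3:
                return i                        # plausible frame header
        i += 1
    return -1                                   # no sync found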

How to read video file using v4l2

I want to read a video file, say an AVI file, using v4l2, and read it frame by frame.
As far as I can tell I need to use the read() function, but how to do so isn't very clear to me, and there are hardly any examples available. A simple example of how to do this would help.
This is not what the Video4Linux2 (V4L2) API is for. It is not designed for reading multimedia files from disk, decoding them and playing them. Rather, it is designed to interface to assorted multimedia input devices (like webcams, microphones, TV tuners, and video capture devices), capture A/V data, and play it.
Take it from the V4L2 API introduction:
Video For Linux Two is [...] a kernel interface for analog radio and
video capture and output drivers.
For reading an AVI file and decoding/playing it (programmatically) on Linux, look into FFmpeg or GStreamer.
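For example, here is a minimal Python sketch using OpenCV, one of several libraries that wrap FFmpeg-style demuxing and decoding (the file name is hypothetical):
import cv2

cap = cv2.VideoCapture("input.avi")
while True:
    ok, frame = cap.read()  # frame is a NumPy array in BGR order
    if not ok:
        break               # end of file (or read error)
    # ... process the frame here ...
cap.release()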

How to determine the codec of an audio file on Windows?

I need to find the codec of an audio file. How can I do this?
Do I need to write code to do this, or is there a simpler way?
Please share helpful links if possible.
The good old file utility will reveal lots of information about audio files, sometimes including the codec:
$ file X.wav
X.wav: RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, mono 16000 Hz
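If you do need to determine the codec from code, here is a minimal Python sketch that shells out to ffprobe (part of FFmpeg, which must be installed) and reads the codec name of the first audio stream:
import json
import subprocess

def audio_codec(path):
    # ffprobe emits JSON describing the selected stream's codec_name field
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "a:0",
         "-show_entries", "stream=codec_name", "-of", "json", path],
        capture_output=True, text=True, check=True,
    ).stdout
    return json.loads(out)["streams"][0]["codec_name"]

print(audio_codec("X.wav"))  # e.g. 'pcm_s16le' for the file above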
"It is important to distinguish between a file format and an audio codec. A codec performs the encoding and decoding of the raw audio data while the data itself is stored in a file with a specific audio file format. Although most audio file formats support only one type of audio data (created with an audio coder), a multimedia container format (as Matroska or AVI) may support multiple types of audio and video data." - http://en.wikipedia.org/wiki/Audio_file_format
The application GSpot does a good job of pulling codec information from audio and video files:
http://www.headbands.com/gspot/
Run it and drag a file into the window, and it will pull all of the data from there. Note that some audio files will not display a codec, as they are made from "raw" audio.
You can often identify the file type from the extension alone, e.g. file1audio.mp3 or fileaudio.avi: .mp3 and .avi tell you the container/file type, though not necessarily the exact codec inside it. You can also use the K-Lite Codec Pack to play many different audio formats: http://www.free-codecs.com/download/k_lite_codec_pack.htm
