How does chopping the media file into segments work with MPEG-DASH? - mpeg-dash

The basic idea behind MPEG-DASH is to chop the media file into segments (e.g. low, medium and best) which can be encoded at different bitrates.
Do I need to do that in advance and offline, or do I need some sort of streaming HTTP server that reads whatever I have in the filesystem and encodes in realtime?

Related

How can I detect corrupt/incomplete MP3 file, from a node.js app?

The common situation when the integrity of an MP3 file is not correct, is when the file has been partially uploaded to the server. In this case, the indicated audio duration doesn't correspond to what is really in the MP3 file: we can hear the beginning, but at some point the playing stops and the indicated duration of the audio player is broken.
I tried with libraries like node-ffprobe, but it seems they just read metadata, without making comparison with real audio data in the file. Is there a way to detect efficiently a corrupted or incomplete MP3 file from node.js?
Note: the client uploading MP3 files is a hardware (an audio recorder), uploading files on a FTP server. Not a browser. So I'm not able to upload potentially more useful data from the client.
MP3 files don't normally have a duration. They're just a series of MPEG frames. Sometimes, there is an ID3 tag indicating duration, but not always.
Players can determine duration by choosing one of a few methods:
Decode the entire audio file.This is the slowest method, but if you're going to decode the file anyway, you might as well go this route as it gives you an exact duration.
Read the whole file, skimming through frame headers.You'll have to read the whole file from disk, but you won't have to decode it. Can be slow if I/O is slow, but gives you an exact duration.
Read the first frame's bitrate and estimate duration by file size.Definitely the fastest method, and the one most commonly used by players. Duration is an estimate only, and is reasonably accurate for CBR, but can be wildly inaccurate for VBR.
What I'm getting at is that these files might not actually be broken. They might just be VBR files that your player doesn't know the duration of.
If you're convinced they are broken (such as stopping in the middle of content), then you'll have to figure out how you want to handle it. There are probably only a couple ways to determine this:
Ideally, there's an ID3 tag indicating duration, and you can decode the whole file and determine its real duration to compare.
Usually, that ID3 tag won't exist, so you'll have to check to see if the last frame is complete or not.
Beyond that, you don't really have a good way of knowing if the stream is incomplete, since there is no outer container that actually specifies number of frames to expect.
The expression for calculating the filesize of an mp3 based on duration and encoding (from this answer) is quite simple:
x = length of song in seconds
y = bitrate in kilobits per second
(x * y) / 1024 = filesize (MB)
There is also a javascript implementation for the Web Audio API in another answer on that same question. Perhaps that would be useful in your Node implementation.
mp3diags is some older open source software for fixing mp3s and which was great for batch processing stuff like this. The source is c++ and still available if you're feeling nosy and want to see how some of these features are implemented.
Worth a look since it has some features that might be be useful in your context:
What is MP3 Diags and what does it do?
low quality audio
missing VBR header
missing normalization data
Correcting files that show incorrect song duration
Correcting files in which the player cannot seek correctly

Is it possible to splice advertisements or messages dynamically into an MP3 file via a standard GET request?

Say you have an MP3 file and it's 60,000,000 bytes, and you also have an MP3 advertisement that's 500,000 bytes, both encoded at the same bit rate.
Would it be possible using an nginx or apache module to change the MP3 "Content-Length" header value to 60,500,000 and then control the incoming "Content-Range" requests so the first 500,000 bytes return the advertisement audio, and any range request greater than 500,000 begins returning the regular audio file with a 500,000 byte offset?
Or is it only possible to splice advertisements (or messages) into an MP3 file using an application such as FFmpeg to re-render the entire file?
Apologies if this is a stupid question, I'm just trying to think outside of the box.
You cannot arbitrarily splice MP3 without artifacts and decoder errors.
You also generally cannot cut/splice MP3 on frame boundaries due to the Bit Reservoir. Basically, a particular MP3 frame may contain data from another frame to more efficiently use the available bandwidth when its needed. Ignoring the bit reservoir can also cause artifacts and/or decoder errors.
What you can do is re-encode your advertisement and eventually re-join the stream. That is, at the point of ad insertion, decode the stream to PCM, mix (or replace in the audio) for your ad, and have this parallel stream re-encoded to PCM. If the encoding parameters are the same, eventually (after a couple of extra MP3 frames), you'll have identical bitstreams, and you can go back to reading the stream from the same buffer.
If you're doing this for ad-insertion on internet radio (live) streams, keep in mind that you'll have to do this on the server for every client (or at least, for each ad variant and timing variant). If this is for podcasts or other pre-recorded content, I'd recommend the FFmpeg route. You won't have to build anything, you can stream and cache the output as its being encoded, and you'll have compatibility with other codecs without building one-off code for each codec/container.

How do Shoutcast servers and clients deal with mp3 frame headers and frame dependencies?

Short story:
If I myself intend to receive and then send a Shoutcast compatible audio stream processed by my application, then how to do it properly using an mp3 (de/en)coder library? Pseudo code, or better - lame mp3 specific code would be highly appreciated.
Long story:
More specific questions which bother me were caused by an article about mp3, which says:
Generally, frames are independent items. Each frame has its own header
and audio informations. There is no file header. Therefore, you can
cut any part of MPEG file and play it correctly (this should be done
on frame boundaries but most applications will handle incorrect
headers). For Layer III, this is not 100% correct. Due to internal
data organization in MPEG version 1 Layer III files, frames are often
dependent of each other and they cannot be cut off just like that.
This made me wonder, how Shoutcast servers and clients deal with frame headers and frame dependencies.
Do I have to encode to constant bitrate (CBR) only, if I want to achieve maximum compatibility with the most of Shoutcast players out there?
Is the mp3 frame header used at all or the stream format is deduced from a Shoutcast protocol specific HTTP header?
Does Shoutcast protocol guarantee (or is it common good practice) to start serving mp3 stream on frame boundaries and continue to respond with chunks that are cut at frame boundaries? But what is the minimum or recommended size of a mp3 frame for streaming live audio?
How does Shoutcast deal with frame dependencies - does it do something special with mp3 encoding to ensure that the served stream does not have frames which depend on previous frames (if this is even possible)? Or maybe it ignores these dependencies on server side/client side, thus getting audio quality reduction or even artifacts?
SHOUTcast servers do not know or care about the data being passed through them. They send it as-is. You can actually send arbitrary data through a SHOUTcast server, and receive it. SHOUTcast will segment the media data wherever the buffer size falls.
It's up to the client to re-sync to the data. It does this by locating the frame header, then being decoding. Once the codec has enough frames to reliably play back audio, it will begin outputting raw PCM. It's up to the codec when to decide it's safe to start playback. Since the codec knows what it's doing in terms of decoding the media, it knows when it has sufficient data (including bit reservoirs) to begin without artifacts. It's also worth noting that the bit reservoir cannot be carried on too far, so it doesn't take but a few frames at worst to handle it.
This is one of the reasons it's important to have a sizable buffer server-side, to flush to the clients as fast as possible on connect. If playback is to start quickly, the codec needs more data than the current frame to begin.

Azure Media Services encoded file size

I have similar problem to this: Azure Media Services encoded mp4 file size is 10x the original I have a 500MB mp4 file. After encoding with 'H264 Multiple Bitrate 720p' file size is 11.5 GB. Processing costed a lot too. It's no problem with 1 file, but I have to be prepared to share about 100 1GB mp4 files. It has to cost so much? Maybe I didn't know about something? I'd like to share files with AES encryption.
There are a lot of dependencies on what your final output file will be.
1) You have a 500MB mp4 file. Is it SD, HD, 720P? 1080p? Resolution you started at will matter. If you had SD originally, the 720p preset will scale your video up and use more data.
2) The bitrate you started out at matters. If you have such a small file, it is likely encoded at a very low bitrate. Using the 720p multi bitrate preset will also throw more bits at the file. Basically creating bits that are not necessary for your source- as encoding cannot make things look better.
3) You started with a single bitrate - the preset you are using is generating Multiple bitrates (several MP4 files at different bitrates and resolutions.) The combination of starting with a small low bitrate file, and blowing it up to larger resolutions and bitrates, multiplied by 6 or more files tends to add a lot of data to the output.
The solution to your problem is to use "custom' presets. You don't have to use the built in presets that we define. You can modify them to suit your needs.
I recommend downloading the Azure Media Services Explorer tool at http://aka.ms/amse and using that to modify and submit your own custom JSON presets that match your output requirements better.

Strategy for time-indexed audio archive with lossy compression

For part of one of my projects, I am considering developing an audio archive for internet radio stations. This archive would be indexed and addressable by date/time.
For example, the server would connect to a stream (generally encoded in MP3), and save the stream data. A client could connect to this server and request audio from 2011-07-05 15:58:30 to 2011-07-05 15:59:37. The server would return the audio data to the client for playback.
My initial thought was to save the data to 1-minute chunks of raw MP3 data to disk, and reference these files from a database. The server would be dumb to the stream/file format, and wouldn't understand mpeg frames. It would simply pass on data to the client, dividing the chunks up linearly to send. It would be up to the client to sync to the stream. This is not unlike how internet radio servers run in general. SHOUTcast servers simply output the data, byte for byte, that is sent to them from the encoder. When a client connects, data is sent, regardless of whether or not it even ends on an MP3 frame. It is up to the client to sync.
I am wondering if there might be a better approach, maximizing compatibility with clients and audio formats. Any thoughts on how to go about this?
The only other thing I can think of is decoding the MP3 to raw PCM audio and re-encoding as necessary when requested. I would prefer not to go this route due to the disk space required, and the loss of quality when re-encoding.
This question is language-agnostic, but if it is helpful, I will likely implement a solution in PHP with MySQL as the database.
You don't have to worry about this, since ALL mp3 that I accessed over shoutcast is Constant Bitrate. Do you don't have to index it. I have POC project that had archive in 5 minute chunks, then uses PHP to combine that files and pseudo-stream it to the winamp via shoutcast. It worked!
And since you are working with mp3, you can assume (and you'll assume correctly) that the density of the captured file is linear, so to access 30 second of the 60 second file you should seek in the middle. Since mp3 decoders are robust enough, you don't have to track the frames at all here.
AACplus, whole different story. It's inherent VBR.

Resources