Live audio stream playback with a delay - audio

This might be a broad question. I'll try my best to keep the scope as narrow as possible.
I have the url of a mp3 stream I want to playback the stream with an adjustable delay.
So the final program will have some input to allow the user to specify the exact delay. Right now I don't have a particular language or library in mind specifically, cross-platform would be ideal.
For the implementation I have the following:
A ring buffer to hold the audio data and the following pointers:
loadPtr := where to store next byte
livePtr := loadPtr + 1, first byte ready from streaming
delayedPtr := livePtr + delta, where delta is a delay (default=0)
TCP connection for getting the data.
I've tried searching, but have not had much luck. Specially,
how do I store the continuous stream from TCP connection into the ring buffer?
what do I use to play audio from a memory buffer?
would the structure of the mp3 file, headers and data sections, affect this buffer structure?

Related

how to get low latency in Exoplayer RTSP live streaming on android?

I am working on RTSP live Streaming. I am getting live stream on my android App using exoplayer RTSP stream player. But latency of that streaming is about 3 seconds. As latency on vlc media player is 1 second. so how to reduce latency in exoplayer. Is there any way please tell me
What you're facing is buffering latency. VLC uses its own engine and buffering algorithms. However, if you wanna reduce buffering latency on ExoPlayer, you gotta get yourself familiarized with LoadControl. ExoPlayer uses a DefaultLoadControl in default instantiation. This LoadControl is a class that belongs to the ExoPlayer library, and it determines the values of time durations the player should spend in order to buffer the stream. If you wanna reduce the delay, you gotta reduce the LoadControl buffer values.
Here's a quick snippet of creating a SimpleExoPlayer instance with a custom load control :
val customLoadControl = DefaultLoadControl.Builder()
.setBufferDurationsMs(minBuffer, maxBuffer, playbackBuffer, playbackRebuffer)
.build()
Parameters in a nutshell : minBuffer is the minimum buffered video duration, maxBuffer is the maximum buffered video duration, playbackBuffer is the required buffered video duration in order to start playing , playbackRebuffer is the required buffered video duration in case it failed and it retries.
In your case, you should set your values really low, especially the playbackBuffer and minBuffer parameters. You should mess around with small values (they are in milliseconds). Going with a value of 1000 for both minBuffer and playbackBuffer
How to use the custom load control : After building the custom load control instance and storing it in a variable, you should pass it when you're building your SimpleExoPlayer :
myMediaPlayer = SimpleExoPlayer.Builder(this#MainActivity)
.setLoadControl(customLoadControl)
.build()
What to expect:
Using the default values is always recommended. If you mess with the values, the app may crash, the stream may be stuck, or the player may get very glitchy. So manipulate the values intelligently.
Here is the javadocs DefaultLoadControl Javadocs
If you don't know what buffering is, exoplayer (or any other player) may need to buffer (load the upcoming portion of the video/audio and store it in memory, rendered, way faster to access and reduces playback problems. Streamed media however needs buffering because it comes in form of chunks. So each chunk that arrives, will eventually be buffered. If you set the required buffered duration to 1000, you are telling ExoPlayer that the first chunk of stream that arrives whose length is 1000 millisecond should be buffered and played right away. I believe there is no simpler way to explain this. Best of luck.

Is it possible to splice advertisements or messages dynamically into an MP3 file via a standard GET request?

Say you have an MP3 file and it's 60,000,000 bytes, and you also have an MP3 advertisement that's 500,000 bytes, both encoded at the same bit rate.
Would it be possible using an nginx or apache module to change the MP3 "Content-Length" header value to 60,500,000 and then control the incoming "Content-Range" requests so the first 500,000 bytes return the advertisement audio, and any range request greater than 500,000 begins returning the regular audio file with a 500,000 byte offset?
Or is it only possible to splice advertisements (or messages) into an MP3 file using an application such as FFmpeg to re-render the entire file?
Apologies if this is a stupid question, I'm just trying to think outside of the box.
You cannot arbitrarily splice MP3 without artifacts and decoder errors.
You also generally cannot cut/splice MP3 on frame boundaries due to the Bit Reservoir. Basically, a particular MP3 frame may contain data from another frame to more efficiently use the available bandwidth when its needed. Ignoring the bit reservoir can also cause artifacts and/or decoder errors.
What you can do is re-encode your advertisement and eventually re-join the stream. That is, at the point of ad insertion, decode the stream to PCM, mix (or replace in the audio) for your ad, and have this parallel stream re-encoded to PCM. If the encoding parameters are the same, eventually (after a couple of extra MP3 frames), you'll have identical bitstreams, and you can go back to reading the stream from the same buffer.
If you're doing this for ad-insertion on internet radio (live) streams, keep in mind that you'll have to do this on the server for every client (or at least, for each ad variant and timing variant). If this is for podcasts or other pre-recorded content, I'd recommend the FFmpeg route. You won't have to build anything, you can stream and cache the output as its being encoded, and you'll have compatibility with other codecs without building one-off code for each codec/container.

How do Shoutcast servers and clients deal with mp3 frame headers and frame dependencies?

Short story:
If I myself intend to receive and then send a Shoutcast compatible audio stream processed by my application, then how to do it properly using an mp3 (de/en)coder library? Pseudo code, or better - lame mp3 specific code would be highly appreciated.
Long story:
More specific questions which bother me were caused by an article about mp3, which says:
Generally, frames are independent items. Each frame has its own header
and audio informations. There is no file header. Therefore, you can
cut any part of MPEG file and play it correctly (this should be done
on frame boundaries but most applications will handle incorrect
headers). For Layer III, this is not 100% correct. Due to internal
data organization in MPEG version 1 Layer III files, frames are often
dependent of each other and they cannot be cut off just like that.
This made me wonder, how Shoutcast servers and clients deal with frame headers and frame dependencies.
Do I have to encode to constant bitrate (CBR) only, if I want to achieve maximum compatibility with the most of Shoutcast players out there?
Is the mp3 frame header used at all or the stream format is deduced from a Shoutcast protocol specific HTTP header?
Does Shoutcast protocol guarantee (or is it common good practice) to start serving mp3 stream on frame boundaries and continue to respond with chunks that are cut at frame boundaries? But what is the minimum or recommended size of a mp3 frame for streaming live audio?
How does Shoutcast deal with frame dependencies - does it do something special with mp3 encoding to ensure that the served stream does not have frames which depend on previous frames (if this is even possible)? Or maybe it ignores these dependencies on server side/client side, thus getting audio quality reduction or even artifacts?
SHOUTcast servers do not know or care about the data being passed through them. They send it as-is. You can actually send arbitrary data through a SHOUTcast server, and receive it. SHOUTcast will segment the media data wherever the buffer size falls.
It's up to the client to re-sync to the data. It does this by locating the frame header, then being decoding. Once the codec has enough frames to reliably play back audio, it will begin outputting raw PCM. It's up to the codec when to decide it's safe to start playback. Since the codec knows what it's doing in terms of decoding the media, it knows when it has sufficient data (including bit reservoirs) to begin without artifacts. It's also worth noting that the bit reservoir cannot be carried on too far, so it doesn't take but a few frames at worst to handle it.
This is one of the reasons it's important to have a sizable buffer server-side, to flush to the clients as fast as possible on connect. If playback is to start quickly, the codec needs more data than the current frame to begin.

How do I change the volume of a PCM audio stream in node?

I am trying to figure out how to adjust the volume level of a PCM audio stream in node.
I have looked all over npmjs.org at all of the modules that I could find for working with audio, but haven't found anything that will take a stream in, change the volume, and give me a stream out.
Are there any modules that exist that can do this, perhaps even even something that wasn't made specifically for it?
If not, then I could create a module, if someone can point me in the right direction for modifying a stream byte by byte.
Here is what I am trying to accomplish:
I am writing a program to receive several PCM audio streams, and mix them for several outputs with varying volume levels. Example:
inputs vol output
music 25% output 1
live audio 80% output 1
microphone 0% output 1
music 100% output 2
live audio 0% output 2
microphone 0% output 2
What type of connection are you using? (Would make it easier to give example code)
What you basically want to do, is create a connection. Then on the connection or request object add a listener for the 'data' event. If you don't set an encoding, the data parameter on the callback should be a Buffer. The data event is triggered after each chunk is delivered through the network.
The Buffer gives you byte-size access to the data-stream using regular javascript number values. You can then parse that chunk, keep them in memory over multiple data-events using a closure (in order to buffer multiple chunks). And when appropriate write the parsed and processed data to a socket (another socket or the same in case of bi-directional sockets). Don't forget to manage your closure in order to avoid memory leaks!
This is just an abstract description. Let me know if anything needs clarification.

Strategy for time-indexed audio archive with lossy compression

For part of one of my projects, I am considering developing an audio archive for internet radio stations. This archive would be indexed and addressable by date/time.
For example, the server would connect to a stream (generally encoded in MP3), and save the stream data. A client could connect to this server and request audio from 2011-07-05 15:58:30 to 2011-07-05 15:59:37. The server would return the audio data to the client for playback.
My initial thought was to save the data to 1-minute chunks of raw MP3 data to disk, and reference these files from a database. The server would be dumb to the stream/file format, and wouldn't understand mpeg frames. It would simply pass on data to the client, dividing the chunks up linearly to send. It would be up to the client to sync to the stream. This is not unlike how internet radio servers run in general. SHOUTcast servers simply output the data, byte for byte, that is sent to them from the encoder. When a client connects, data is sent, regardless of whether or not it even ends on an MP3 frame. It is up to the client to sync.
I am wondering if there might be a better approach, maximizing compatibility with clients and audio formats. Any thoughts on how to go about this?
The only other thing I can think of is decoding the MP3 to raw PCM audio and re-encoding as necessary when requested. I would prefer not to go this route due to the disk space required, and the loss of quality when re-encoding.
This question is language-agnostic, but if it is helpful, I will likely implement a solution in PHP with MySQL as the database.
You don't have to worry about this, since ALL mp3 that I accessed over shoutcast is Constant Bitrate. Do you don't have to index it. I have POC project that had archive in 5 minute chunks, then uses PHP to combine that files and pseudo-stream it to the winamp via shoutcast. It worked!
And since you are working with mp3, you can assume (and you'll assume correctly) that the density of the captured file is linear, so to access 30 second of the 60 second file you should seek in the middle. Since mp3 decoders are robust enough, you don't have to track the frames at all here.
AACplus, whole different story. It's inherent VBR.

Resources