Which muxer has the lowest overhead for solely audio? - audio

I am muxing an audio-only mp3float data stream to RTMP. Which muxer is the least resource-intensive?
I am using flv right now and it works, but I am wondering about efficiency of operation.

Related

ffmpeg transcode ONLY if downsampling

I'm using Airsonic as a home music server. It has ffmpeg transcoding capabilities. I'd like to transcode audio ONLY IF the input stream has higher parameters (any of kb/s, bit depth, kHz), and keep the native quality if the file has lower parameters (so as not to UPscale). Is this possible with some of the ffmpeg options?
P.S. Airsonic has a Downsample field, but it doesn't seem to work at all.

PCM stream real-time music compression

How can I compress a 44.1kHz sampled, 16-bit PCM real-time music data stream to reduce its size and send it over an AXI4 Stream interconnect in a Zynq Z7020?
Can anyone suggest a codec for such a use-case and maybe links to its implementation?
Take a look at IMA ADPCM, a fairly simple lossy codec. It doesn't need floating-point operations and it produces a constant-bitrate stream, which is easy to handle in hardware.
The quality might not be that great, though; without any specs from you it's not possible to suggest something more suitable.
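As a rough illustration of why IMA ADPCM suits hardware, here is a minimal Python sketch of the usual reference algorithm: each 16-bit sample becomes one 4-bit code using only compares, shifts, and adds. The function names and the nibble packing order in `pack` are assumptions made for this example, not part of any particular file format, and a real design would port the same loop to HDL or fixed-point C.

```python
# Sketch of IMA ADPCM: 16-bit PCM in, one 4-bit code per sample out.
INDEX_TABLE = [-1, -1, -1, -1, 2, 4, 6, 8]
STEP_TABLE = [
    7, 8, 9, 10, 11, 12, 13, 14, 16, 17,
    19, 21, 23, 25, 28, 31, 34, 37, 41, 45,
    50, 55, 60, 66, 73, 80, 88, 97, 107, 118,
    130, 143, 157, 173, 190, 209, 230, 253, 279, 307,
    337, 371, 408, 449, 494, 544, 598, 658, 724, 796,
    876, 963, 1060, 1166, 1282, 1411, 1552, 1707, 1878, 2066,
    2272, 2499, 2749, 3024, 3327, 3660, 4026, 4428, 4871, 5358,
    5894, 6484, 7132, 7845, 8630, 9493, 10442, 11487, 12635, 13899,
    15289, 16818, 18500, 20350, 22385, 24623, 27086, 29794, 32767,
]


def _clamp(value, lo, hi):
    return max(lo, min(hi, value))


def _delta(code, step):
    """Rebuild the quantised difference from a 4-bit code (shifts and adds only)."""
    delta = step >> 3
    if code & 4:
        delta += step
    if code & 2:
        delta += step >> 1
    if code & 1:
        delta += step >> 2
    return -delta if code & 8 else delta


def encode(samples, predictor=0, index=0):
    """Encode signed 16-bit samples into a list of 4-bit codes."""
    codes = []
    for sample in samples:
        step = STEP_TABLE[index]
        diff = sample - predictor
        code = 8 if diff < 0 else 0  # bit 3 carries the sign
        diff = abs(diff)
        if diff >= step:
            code |= 4
            diff -= step
        if diff >= step >> 1:
            code |= 2
            diff -= step >> 1
        if diff >= step >> 2:
            code |= 1
        # Track exactly what the decoder will reconstruct.
        predictor = _clamp(predictor + _delta(code, step), -32768, 32767)
        index = _clamp(index + INDEX_TABLE[code & 7], 0, 88)
        codes.append(code)
    return codes


def decode(codes, predictor=0, index=0):
    """Decode 4-bit codes back into signed 16-bit samples."""
    samples = []
    for code in codes:
        step = STEP_TABLE[index]
        predictor = _clamp(predictor + _delta(code, step), -32768, 32767)
        index = _clamp(index + INDEX_TABLE[code & 7], 0, 88)
        samples.append(predictor)
    return samples


def pack(codes):
    """Pack two codes per byte, low nibble first (packing order is an assumption)."""
    if len(codes) % 2:
        codes = codes + [0]
    return bytes(codes[i] | (codes[i + 1] << 4) for i in range(0, len(codes), 2))
```

Four bits per sample gives a fixed 4:1 reduction over 16-bit PCM, which is what makes the output rate easy to budget on a streaming interconnect.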

How do Shoutcast servers and clients deal with mp3 frame headers and frame dependencies?

Short story:
If I intend to receive a Shoutcast-compatible audio stream, process it in my application, and then send it on, how do I do that properly using an mp3 (de/en)coder library? Pseudocode, or better, LAME-specific code, would be highly appreciated.
Long story:
More specific questions which bother me were caused by an article about mp3, which says:
Generally, frames are independent items. Each frame has its own header
and audio informations. There is no file header. Therefore, you can
cut any part of MPEG file and play it correctly (this should be done
on frame boundaries but most applications will handle incorrect
headers). For Layer III, this is not 100% correct. Due to internal
data organization in MPEG version 1 Layer III files, frames are often
dependent of each other and they cannot be cut off just like that.
This made me wonder how Shoutcast servers and clients deal with frame headers and frame dependencies.
Do I have to encode to constant bitrate (CBR) only if I want to achieve maximum compatibility with most Shoutcast players out there?
Is the mp3 frame header used at all, or is the stream format deduced from a Shoutcast-protocol-specific HTTP header?
Does the Shoutcast protocol guarantee (or is it common good practice) that the server starts serving the mp3 stream on a frame boundary and continues to respond with chunks that are cut at frame boundaries? Also, what is the minimum or recommended size of an mp3 frame for streaming live audio?
How does Shoutcast deal with frame dependencies? Does it do something special with mp3 encoding to ensure that the served stream has no frames that depend on previous frames (if this is even possible)? Or does it ignore these dependencies on the server side or client side, resulting in reduced audio quality or even artifacts?
SHOUTcast servers do not know or care about the data being passed through them. They send it as-is. You can actually send arbitrary data through a SHOUTcast server, and receive it. SHOUTcast will segment the media data wherever the buffer size falls.
It's up to the client to re-sync to the data. It does this by locating the frame header, then beginning to decode. Once the codec has enough frames to reliably play back audio, it will begin outputting raw PCM. It's up to the codec to decide when it's safe to start playback. Since the codec knows what it's doing in terms of decoding the media, it knows when it has sufficient data (including bit reservoirs) to begin without artifacts. It's also worth noting that the bit reservoir cannot be carried on too far, so it takes only a few frames at worst to handle it.
This is one of the reasons it's important to have a sizable buffer server-side, to flush to the clients as fast as possible on connect. If playback is to start quickly, the codec needs more data than the current frame to begin.
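The client-side re-sync described above can be sketched roughly as follows. This is a minimal Python example that assumes MPEG-1 Layer III only; the function names are hypothetical, and confirming a candidate header by checking that a second valid header follows at the computed frame length is a common heuristic, not something SHOUTcast itself prescribes.

```python
# MPEG-1 Layer III lookup tables (index 0 and 15 of the bitrate field are not usable here).
BITRATES_KBPS = [0, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320]
SAMPLE_RATES = [44100, 48000, 32000]


def frame_length(header):
    """Return the frame length in bytes for a 4-byte MPEG-1 Layer III header, else None."""
    if len(header) < 4:
        return None
    if header[0] != 0xFF or (header[1] & 0xE0) != 0xE0:
        return None  # 11-bit sync word missing
    version = (header[1] >> 3) & 0x03  # 3 means MPEG-1
    layer = (header[1] >> 1) & 0x03    # 1 means Layer III
    if version != 3 or layer != 1:
        return None  # other versions/layers are out of scope for this sketch
    bitrate_index = header[2] >> 4
    rate_index = (header[2] >> 2) & 0x03
    padding = (header[2] >> 1) & 0x01
    if bitrate_index in (0, 15) or rate_index == 3:
        return None
    bitrate = BITRATES_KBPS[bitrate_index] * 1000
    return 144 * bitrate // SAMPLE_RATES[rate_index] + padding


def find_sync(buf):
    """Return the offset of the first frame that is followed by another valid header, or -1."""
    for pos in range(len(buf) - 3):
        length = frame_length(buf[pos:pos + 4])
        if length is None:
            continue
        nxt = pos + length
        if nxt + 4 <= len(buf) and frame_length(buf[nxt:nxt + 4]) is not None:
            return pos
    return -1
```

A client would discard everything before the returned offset and hand the rest to its decoder, which then decides when it has enough frames (bit reservoir included) to start output.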

A way to add data "mid stream" to encoded audio (possibly with AAC)

Is there a way to add lossless data to an AAC audio stream?
Essentially, I am looking to inject "this frame of audio should be played at XXX time" every n frames.
If I used a lossless codec, I suppose I could just inject my own header mid-stream and that data would be intact, since it has to come out the same as it went in, just as gzip does not lose data.
Any ideas? I suppose I could encode the data into chunks of AAC on the server and add a timestamp at the network layer saying "play the following chunk of AAC at time x", but I'd prefer to find a way to add it to the audio itself.
This is not really possible (short of writing your own specialized encoder), as AAC (and MP3) frames are not truly standalone.
There is a concept of the bit reservoir, where unused bandwidth from one frame can be utilized for a later frame that may need more bandwidth to store a more complicated sound. That is, data from frame 1 might be needed in frame 2 and/or 3. If you cut the stream between frames 1 and 2 and insert your alternative frames, the reference to the bit reservoir data is broken and you have damaged frame 2's ability to be decoded.
There are encoders that can work in a mode where the bit reservoir isn't used (at the cost of quality). If operating in this mode, you should be able to cut the stream more freely along frame boundaries.
Unfortunately, the best way to handle this is to do it in the time domain when dealing with your raw PCM samples. This gives you more control over the timing placement anyway, and ensures that your stream can also be used with other codecs.
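For comparison, the network-layer alternative raised in the question (leaving the encoded audio untouched and prefixing each chunk with a play-at time) could look roughly like this in Python. The 12-byte header layout and the function names are purely hypothetical choices for this sketch; the payload is treated as opaque bytes, so the codec's bit reservoir is never disturbed.

```python
import struct

# Hypothetical framing: 64-bit play-at time in microseconds, then 32-bit payload length.
HEADER = struct.Struct("!QI")


def wrap_chunk(play_at_us, payload):
    """Prefix an opaque encoded chunk with its play-at time and length."""
    return HEADER.pack(play_at_us, len(payload)) + payload


def unwrap_chunks(buf):
    """Split a byte buffer into (play_at_us, payload) pairs.

    Returns the complete chunks found plus any trailing bytes that do not
    yet form a whole chunk, so the caller can prepend them to the next read.
    """
    chunks = []
    offset = 0
    while offset + HEADER.size <= len(buf):
        play_at_us, length = HEADER.unpack_from(buf, offset)
        start = offset + HEADER.size
        if start + length > len(buf):
            break  # payload not fully received yet
        chunks.append((play_at_us, buf[start:start + length]))
        offset = start + length
    return chunks, buf[offset:]
```

The receiver feeds the payloads to the decoder in order and uses the timestamps only for scheduling playback.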

RTP AAC Packet Depacketizer

I asked earlier about H264 at RTP H.264 Packet Depacketizer
My question now is about the audio packets.
Looking at the RTP packets, I noticed that audio frames for AAC, G.711, G.726 and others all have the Marker Bit set.
I think the frames are independent. Am I right?
My question is: audio frames are small, and I know that I can have more than one frame per RTP packet. Regardless of how many frames a packet carries, are they always complete, or can a frame be fragmented across RTP packets?
The difference between audio and video is that audio is typically encoded either as individual samples or in [small] frames that do not reference previous data. Additionally, the amount of data is small, so audio does not typically need complicated fragmentation to be transmitted over RTP. However, for any payload type you should again refer to the RFC that describes the details:
AAC - RTP Payload Format for MPEG-4 Audio/Visual Streams
G.711 - RTP Payload Format for ITU-T Recommendation G.711.1
G.726 - RTP Profile for Audio and Video Conferences with Minimal Control
Other
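To inspect the Marker Bit and payload type mentioned above, the fixed 12-byte RTP header can be unpacked as in the following Python sketch. The field layout follows the standard RTP header; the function name and the returned dictionary keys are made up for this example, and trailing padding (the P bit) is reported but not stripped. How the payload itself is split into audio frames depends on the payload format RFC for the codec in question.

```python
import struct


def parse_rtp(packet):
    """Parse the fixed RTP header and return its fields plus the raw payload."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the fixed RTP header")
    b0, b1, seq, timestamp, ssrc = struct.unpack_from("!BBHII", packet, 0)
    csrc_count = b0 & 0x0F
    offset = 12 + 4 * csrc_count  # skip the CSRC list
    if b0 & 0x10:  # X bit: a header extension follows the CSRC list
        if len(packet) < offset + 4:
            raise ValueError("truncated header extension")
        (ext_words,) = struct.unpack_from("!H", packet, offset + 2)
        offset += 4 + 4 * ext_words
    return {
        "version": b0 >> 6,
        "padding": bool(b0 & 0x20),
        "marker": bool(b1 & 0x80),
        "payload_type": b1 & 0x7F,
        "sequence": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "payload": packet[offset:],
    }
```

A depacketizer would typically order packets by `sequence` and then apply the codec-specific payload rules to `payload`.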
