PCM stream real-time music compression

How can I compress a 44.1 kHz, 16-bit PCM real-time music data stream to reduce its size and send it over an AXI4-Stream interconnect on a Zynq Z7020?
Can anyone suggest a codec for such a use-case and maybe links to its implementation?

Take a look at IMA ADPCM, a fairly simple lossy codec. It needs no floating-point operations and produces a constant-bitrate stream (4 bits per 16-bit sample, i.e. 4:1 compression), which is easy to handle in hardware.
The quality might not be that great, though; without any specs from you it's not possible to suggest something more suitable.
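To make the structure concrete, here is a minimal, untested C sketch of the per-sample IMA ADPCM encode step, using the standard step-size and index tables (the function and type names are made up for the example). It is integer-only with tiny fixed state, which is what makes it straightforward to port to FPGA fabric or HLS:

```c
#include <stdint.h>

/* Standard IMA ADPCM step-size table (89 entries) and index-adjust table. */
static const int16_t step_table[89] = {
        7,     8,     9,    10,    11,    12,    13,    14,    16,    17,
       19,    21,    23,    25,    28,    31,    34,    37,    41,    45,
       50,    55,    60,    66,    73,    80,    88,    97,   107,   118,
      130,   143,   157,   173,   190,   209,   230,   253,   279,   307,
      337,   371,   408,   449,   494,   544,   598,   658,   724,   796,
      876,   963,  1060,  1166,  1282,  1411,  1552,  1707,  1878,  2066,
     2272,  2499,  2749,  3024,  3327,  3660,  4026,  4428,  4871,  5358,
     5894,  6484,  7132,  7845,  8630,  9493, 10442, 11487, 12635, 13899,
    15289, 16818, 18500, 20350, 22385, 24623, 27086, 29794, 32767
};
static const int8_t index_table[16] = {
    -1, -1, -1, -1, 2, 4, 6, 8,
    -1, -1, -1, -1, 2, 4, 6, 8
};

typedef struct { int16_t predictor; int8_t index; } ima_state;

/* Encode one 16-bit PCM sample into a 4-bit IMA ADPCM code. */
static uint8_t ima_encode_sample(ima_state *s, int16_t sample)
{
    int step = step_table[s->index];
    int diff = sample - s->predictor;
    uint8_t code = 0;

    if (diff < 0) { code = 8; diff = -diff; }   /* sign bit */

    /* Quantize the magnitude to three bits. */
    if (diff >= step)     { code |= 4; diff -= step; }
    if (diff >= step / 2) { code |= 2; diff -= step / 2; }
    if (diff >= step / 4) { code |= 1; }

    /* Reconstruct exactly what the decoder will compute, so the
       encoder and decoder predictors stay in lock step. */
    int delta = step / 8;
    if (code & 4) delta += step;
    if (code & 2) delta += step / 2;
    if (code & 1) delta += step / 4;
    if (code & 8) delta = -delta;

    int pred = s->predictor + delta;
    if (pred >  32767) pred =  32767;
    if (pred < -32768) pred = -32768;
    s->predictor = (int16_t)pred;

    int idx = s->index + index_table[code];
    if (idx < 0)  idx = 0;
    if (idx > 88) idx = 88;
    s->index = (int8_t)idx;

    return code;   /* pack two 4-bit codes per output byte */
}
```

Two codes pack into one byte, so a 44.1 kHz stream comes out at a fixed 88.2 kB/s, which maps cleanly onto an AXI4-Stream data beat.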

Related

Is it realistic to stream 12-16 bit audio through SPP bluetooth in realtime?

I have tried to send 12-bit audio to be listened to in real time through an HC-05 SPP Bluetooth module hooked up to an Arduino and a DAC over serial, using a Python RFCOMM socket. I have since learned that the Serial Port Profile is not great for this purpose due to its low bandwidth. I figured I could send the data first and then play it out through the DAC, but I doubt an Arduino would hold an array the size of a WAV file, maybe not even an MP3 file, and that would defeat the purpose of controlling the audio (play, pause, rewind, etc.) from my computer. Would it be more realistic and worthwhile to use an A2DP-enabled Bluetooth module? Or is it still possible to listen to acceptable-quality 12-16 bit audio in real time with SPP? I have tried lower-bitrate songs, adjusted baud rates on the Arduino and HC-05 serial ports, and adjusted the magnitude of the values output by the DAC to the audio port, and I still get crackly audio. It seems the problem comes down to the low transfer speed of SPP, or am I wrong?
Is it realistic to stream 12-16 bit audio through SPP bluetooth in realtime?
Sure, at some awfully slow sample rate <= 8 kHz. For example, at 115200 baud a UART moves roughly 11,520 bytes/s (8N1 framing), while 16-bit 8 kHz mono already needs 16,000 bytes/s. You'd be better off sending 8-bit audio at a higher sample rate.
Would it be more realistic and worthwhile to use an A2DP enabled bluetooth module?
Yes, absolutely, without question. That's what it's designed for, as I mentioned in your other question.
Or is it still possible to listen to acceptable quality 12-16 bit audio in real time with SPP?
Acceptable is subjective. If it's just voice, you can get away with it. If you want reasonable audio quality for music, almost universally, no, it's not acceptable.
It seems the problem comes down to the low bitrate transfer speed of SPP, or am I wrong?
Without any code to inspect and debug, it's impossible to say what the specific problem is that you're referring to. In any case, the low bandwidth will prevent good-quality audio.
If you must continue to use SPP and simple codecs like PCM, at least use differential PCM to save a bit more bandwidth.
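As an illustration of that suggestion (not code from the question), here is a minimal C sketch of 16-bit-to-8-bit differential PCM; the `dpcm_encode`/`dpcm_decode` names are made up for the example:

```c
#include <stdint.h>

/* Minimal 16-bit -> 8-bit differential PCM. Instead of the samples
   themselves, transmit the saturated difference from the previous
   reconstructed sample; the decoder accumulates the differences.
   This halves the bandwidth at the cost of slope-overload distortion
   on sharp transients. */
typedef struct { int16_t prev; } dpcm_state;

int8_t dpcm_encode(dpcm_state *s, int16_t sample)
{
    int32_t diff = (int32_t)sample - s->prev;
    if (diff >  127) diff =  127;          /* saturate to one signed byte */
    if (diff < -128) diff = -128;
    s->prev = (int16_t)(s->prev + diff);   /* mirror the decoder's state */
    return (int8_t)diff;
}

int16_t dpcm_decode(dpcm_state *s, int8_t code)
{
    s->prev = (int16_t)(s->prev + code);
    return s->prev;
}
```

Note that the encoder tracks the decoder's reconstruction, not the raw input, so the two stay in sync as long as no bytes are lost on the link.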

How to convert PCM audio stream for online play

I have access to an audio stream of PCM audio buffers. To be clear, I do not have access to the audio file, only to a stream of 4096-byte chunks of the audio data.
The PCM buffers come in with the following format:
PCM Int 16
Little Endian
Two Channels
Interleaved
To support audio playback on a standard browser I need to convert the audio to the following format:
PCM Float 32
Big Endian
Two channels (at most)
Deinterleaved
This audio is coming from an iOS app so I have access to Swift and Objective C (although I am not very comfortable with Objective C...which makes Apple's Audio Converter Services almost impossible to use because Swift really doesn't like pointers).
Additionally, the playback will occur in a browser, so I could handle the conversion in client-side JavaScript or server-side. I am proficient enough in the following server-side languages to do a conversion:
Java (preferred)
PHP
Node.js
Python
If anyone knows a way to do this in any of these languages please let me know. I have worked on this for long enough that I will probably understand even a very technical description of how to do this.
My current plan is to use bitwise operations to deinterleave the left and right channels, then cast the Int 16 Buffer to a Float 32 Buffer with the Web Audio API. Does this seem like a good plan?
Any help is appreciated, thank you.
My current plan is to use bitwise operations to deinterleave the left and right channels, then cast the Int 16 Buffer to a Float 32 Buffer with the Web Audio API. Does this seem like a good plan?
Yes, that is exactly what you need to do. I do the same thing in my applications; this method works well and is really the only approach that makes sense. You don't want to send 32-bit float samples from the server to the client, since that doubles the bandwidth. Do the conversion client-side.
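For reference, the per-sample math looks like this minimal C sketch (in the browser you would write the same loop against a DataView and two Float32Arrays); the function name is made up for the example:

```c
#include <stddef.h>
#include <stdint.h>

/* Deinterleave little-endian interleaved stereo int16 PCM into two
   float32 channel buffers in the [-1.0, 1.0) range expected by
   Web Audio. `frames` is the number of L/R sample pairs, so `in`
   holds frames * 4 bytes. */
void deinterleave_i16_to_f32(const uint8_t *in, size_t frames,
                             float *left, float *right)
{
    for (size_t i = 0; i < frames; i++) {
        /* Assemble each little-endian int16 explicitly, so the code
           does not depend on the host's byte order. */
        int16_t l = (int16_t)(in[4 * i + 0] | (in[4 * i + 1] << 8));
        int16_t r = (int16_t)(in[4 * i + 2] | (in[4 * i + 3] << 8));
        left[i]  = l / 32768.0f;
        right[i] = r / 32768.0f;
    }
}
```

The divide by 32768 maps the full int16 range onto [-1.0, 1.0), which is all the "cast" to Float32 amounts to.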

RTP AAC Packet Depacketizer

I asked earlier about H264 at RTP H.264 Packet Depacketizer
My question now is about the audio packets.
I noticed via the RTP packets that audio frames like AAC, G.711, G.726 and others all have the Marker Bit set.
I think the frames are independent. Am I right?
My question is: audio frames are small, but I know I can have more than one frame per RTP packet. Regardless of how many frames a packet carries, are they always complete? Or can a frame be fragmented across RTP packets?
The difference between audio and video is that audio is typically encoded either as individual samples or as [small] frames that don't reference previous data. Additionally, the amount of data per frame is small. So audio does not typically need complicated fragmentation to be transmitted over RTP. However, for each payload type you should again refer to the RFC that describes the details:
AAC - RTP Payload Format for MPEG-4 Audio/Visual Streams
G.711 - RTP Payload Format for ITU-T Recommendation G.711.1
G.726 - RTP Profile for Audio and Video Conferences with Minimal Control
Other payload types are covered by their own RFCs.
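For orientation when inspecting those packets, here is a minimal C sketch (not from the answer) of parsing the fixed RTP header defined in RFC 3550, including the marker bit the question asks about:

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t  version;      /* always 2 */
    int      marker;       /* often set on the last packet of a frame */
    uint8_t  payload_type;
    uint16_t sequence;     /* detects loss and reordering */
    uint32_t timestamp;    /* in the payload's clock rate */
    uint32_t ssrc;
} rtp_header;

/* Parse the 12-byte fixed RTP header (RFC 3550). Returns the payload
   offset, or -1 if the packet is too short. CSRC entries and header
   extensions are skipped but not interpreted. */
int rtp_parse(const uint8_t *p, size_t len, rtp_header *h)
{
    if (len < 12) return -1;
    h->version      = p[0] >> 6;
    h->marker       = (p[1] >> 7) & 1;
    h->payload_type = p[1] & 0x7f;
    h->sequence     = (uint16_t)((p[2] << 8) | p[3]);
    h->timestamp    = ((uint32_t)p[4] << 24) | (p[5] << 16) | (p[6] << 8) | p[7];
    h->ssrc         = ((uint32_t)p[8] << 24) | (p[9] << 16) | (p[10] << 8) | p[11];

    size_t offset = 12 + 4 * (size_t)(p[0] & 0x0f);   /* skip CSRC list */
    if (p[0] & 0x10) {                                /* header extension */
        if (len < offset + 4) return -1;
        uint16_t ext_words = (uint16_t)((p[offset + 2] << 8) | p[offset + 3]);
        offset += 4 + 4 * (size_t)ext_words;
    }
    return (len >= offset) ? (int)offset : -1;
}
```

How the bytes after that offset split into audio frames (and whether a frame may be fragmented) is exactly what each payload-format RFC specifies.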

WebRTC: non 10ms audio frames possible?

Has anyone tried to change the audio frame size in WebRTC? It uses 10ms frames. I need a different size, and the code doesn't look promising...
Fundamentally there's no reason you can't use non-10ms frames, but much of the code is written with that assumption. It would indeed likely be a serious undertaking to change it.
On the device side you can use other frame sizes (e.g. 20 ms or 40 ms). The codec can also accept other frame sizes, because it keeps an internal audio buffer. I used the SILK codec on an iOS device.
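The buffering that answer alludes to is just re-framing between the capture frame size and the codec frame size. Here is a minimal C sketch of that idea (the names and the fixed-capacity FIFO are assumptions for the example, not WebRTC internals):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Re-frame audio between mismatched frame sizes (e.g. 10 ms capture
   frames into 20 ms codec frames) with a simple sample FIFO.
   Sizes are in samples; 10 ms at 16 kHz mono = 160 samples. */
#define FIFO_CAP 4096

typedef struct {
    int16_t buf[FIFO_CAP];
    size_t  count;
} sample_fifo;

void fifo_push(sample_fifo *f, const int16_t *in, size_t n)
{
    /* Assumes the FIFO never overflows; a real implementation would
       report overrun instead. */
    memcpy(f->buf + f->count, in, n * sizeof(int16_t));
    f->count += n;
}

/* Returns 1 and fills `out` when a full frame of n samples is
   available, else 0. */
int fifo_pop_frame(sample_fifo *f, int16_t *out, size_t n)
{
    if (f->count < n) return 0;
    memcpy(out, f->buf, n * sizeof(int16_t));
    memmove(f->buf, f->buf + n, (f->count - n) * sizeof(int16_t));
    f->count -= n;
    return 1;
}
```

Push every 10 ms capture frame, then pop 20 ms (or 40 ms) frames whenever enough samples have accumulated; the cost is the extra latency of one accumulation period.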

Is GSM6.10 audio format block or stream based?

I might be asking the wrong question, but my knowledge in this area is very limited.
I'm using acmStreamConvert to convert PCM to GSM (6.10).
Audio Format: 8khz, 16-bit, mono
For the PCM buffer size I'm using 640 bytes (320 samples). For the GSM buffer I'm using 65 bytes. My understanding is that GSM "always" converts 320 samples to 65 bytes.
The reason I ask "block or stream" is that I'm wondering if I can safely convert multiple audio streams (real-time) using the same acmStreamConvert handle. I see the function has the flags ACM_STREAMCONVERTF_START, ACM_STREAMCONVERTF_END, and ACM_STREAMCONVERTF_BLOCKALIGN, but am I required to use the start/end sequence for GSM? I understand that might be required for formats that use headers/tails, but I'm hoping it isn't required for the GSM format.
I'm working on a group VoIP client where each client sends GSM-format audio, which then needs to be converted to PCM before playback. I'm hoping I don't need one ACM handle per client.
Stream-based, or at least the ACM API's usage of it is. Trying to use the same ACM objects/handles for multiple streams will produce undesired results. I suspect this also means it doesn't handle lost packets as well as other codecs might (I haven't confirmed that part yet).
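In other words, give each client its own stream handle. As a rough sketch (assuming the Windows ACM API from the question; error handling and the per-block acmStreamPrepareHeader/acmStreamConvert loop are omitted), opening one GSM 6.10 -> PCM decode stream per client might look like this:

```c
#include <windows.h>
#include <mmreg.h>
#include <msacm.h>
#pragma comment(lib, "msacm32.lib")

/* One ACM conversion stream per client: the stream carries codec
   state across calls, so sharing a handle between clients would
   interleave their state and corrupt the audio. */
typedef struct {
    HACMSTREAM has;
} client_decoder;

BOOL client_decoder_open(client_decoder *c)
{
    GSM610WAVEFORMAT src = {0};
    src.wfx.wFormatTag      = WAVE_FORMAT_GSM610;
    src.wfx.nChannels       = 1;
    src.wfx.nSamplesPerSec  = 8000;
    src.wfx.nAvgBytesPerSec = 1625;   /* 65 bytes per 320-sample block */
    src.wfx.nBlockAlign     = 65;
    src.wfx.wBitsPerSample  = 0;
    src.wfx.cbSize          = sizeof(WORD);
    src.wSamplesPerBlock    = 320;

    WAVEFORMATEX dst = {0};
    dst.wFormatTag      = WAVE_FORMAT_PCM;
    dst.nChannels       = 1;
    dst.nSamplesPerSec  = 8000;
    dst.wBitsPerSample  = 16;
    dst.nBlockAlign     = 2;
    dst.nAvgBytesPerSec = 16000;

    /* fdwOpen = 0 requests real-time conversion. */
    return acmStreamOpen(&c->has, NULL, &src.wfx, &dst, NULL,
                         0, 0, 0) == MMSYSERR_NOERROR;
}
```

The handles are cheap relative to the audio path, so one per client is the safe design here.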
