How to convert PCM audio stream for online play - web

I have access to an audio stream of PCM audio buffers. To be clear, I do not have access to the audio file; I only have access to a stream of 4096-byte chunks of the audio data.
The PCM buffers come in with the following format:
PCM Int 16
Little Endian
Two Channels
Interleaved
To support audio playback on a standard browser I need to convert the audio to the following format:
PCM Float 32
Big Endian
Two channels (at most)
Deinterleaved
This audio is coming from an iOS app, so I have access to Swift and Objective-C (although I am not very comfortable with Objective-C, which makes Apple's Audio Converter Services almost impossible to use because Swift really doesn't like pointers).
Additionally, the playback will occur in a browser, so I could handle the conversion in client-side JavaScript or on the server side. I am proficient enough in the following server-side languages to do the conversion:
Java (preferred)
PHP
Node.js
Python
If anyone knows a way to do this in any of these languages please let me know. I have worked on this for long enough that I will probably understand even a very technical description of how to do this.
My current plan is to use bitwise operations to deinterleave the left and right channels, then cast the Int 16 Buffer to a Float 32 Buffer with the Web Audio API. Does this seem like a good plan?
Any help is appreciated, thank you.

My current plan is to use bitwise operations to deinterleave the left and right channels, then cast the Int 16 Buffer to a Float 32 Buffer with the Web Audio API. Does this seem like a good plan?
Yes, that is exactly what you need to do. I do the same thing in my own applications, and this method works well; it's really the only approach that makes sense. You don't want to send 32-bit float samples from the server to the client because of the bandwidth it would consume, so do the conversion client-side.
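For what it's worth, here is a minimal client-side sketch of that conversion (the function name and chunk handling are illustrative, not code from the question). It reads each interleaved little-endian Int16 frame with a DataView, so no manual byte-swapping is needed, and writes the scaled samples into two separate Float32Arrays - the deinterleaved float layout the Web Audio API expects:

// Sketch: convert one 4096-byte chunk of interleaved 16-bit little-endian
// stereo PCM into two Float32Arrays in the -1..1 range.
function int16StereoToFloat32(arrayBuffer) {
    var view = new DataView(arrayBuffer);
    var frameCount = arrayBuffer.byteLength / 4; // 2 channels * 2 bytes per sample
    var left = new Float32Array(frameCount);
    var right = new Float32Array(frameCount);
    for (var i = 0; i < frameCount; i++) {
        // 'true' tells DataView to read little-endian regardless of the host
        left[i] = view.getInt16(i * 4, true) / 32768;
        right[i] = view.getInt16(i * 4 + 2, true) / 32768;
    }
    return { left: left, right: right };
}

The two Float32Arrays can then be copied into an AudioBuffer, for example with audioBuffer.getChannelData(0).set(left) and audioBuffer.getChannelData(1).set(right), and scheduled for playback.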

Related

String compression to refresh WS2811 RGB LEDs faster

I have the following problem. I am using WS2811 diodes, an Arduino Due and Node.js in my project. I want to stream video from a device connected to a Node.js server and show it on an array of diodes. Right now I am able to capture video from any device with a browser and a camera, change the resolution of the video to the one I want (15x10), and create a string containing the color information (R,G,B) for all of the diodes. I am sending it from the Node.js server to the Arduino through a serial port at a baud rate of 115200. Unfortunately, the sending process is too slow. I would like it to refresh the LED array at least 10 times per second, so I was wondering whether I could compress the string I am sending to the Arduino, decompress it when it gets there, and then set the colors of the diodes. Maybe you have experience with a similar project and can advise me on what to do.
For handling the diodes I am using the adafruit_neopixel library.
If I were you I would try to convert the video to a 16-bit encoding (like RGB565), or maybe even 8-bit, on your server.
Even at that low resolution I'm not certain the ATmega328P is powerful enough to convert it back to 24-bit and send the data out to the display, but try it and see. If it doesn't work, you might want to consider switching to a BeagleBone or a Raspberry Pi.
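As an illustration of that suggestion, a server-side packing step in Node.js might look like the following rough sketch (the helper names and the per-pixel object shape are assumptions, not code from the answer):

// Pack the top 5 bits of red, 6 of green and 5 of blue into one 16-bit value.
function rgbToRgb565(r, g, b) {
    return ((r & 0xF8) << 8) | ((g & 0xFC) << 3) | (b >> 3);
}

// Pack a frame of {r, g, b} pixels into a Buffer of 2 bytes per LED
// before writing it to the serial port.
function packFrame(pixels) {
    var out = Buffer.alloc(pixels.length * 2);
    pixels.forEach(function (p, i) {
        out.writeUInt16BE(rgbToRgb565(p.r, p.g, p.b), i * 2);
    });
    return out;
}

For a 15x10 frame this is 300 bytes instead of 450, before any further compression.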
If you have large areas of a similar colour, especially if you have dropped your bit depth to 16 or 8 bits as suggested in the previous answer, Run Length Encoding compression might be worth a try.
It's easy to implement in a few lines of code:
https://en.wikipedia.org/wiki/Run-length_encoding
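A run-length encoder really can be just a few lines. Here is a rough sketch (assuming the frame has already been reduced to one value per LED, and using illustrative names; the Arduino side would need a matching decoder):

// Run-length encode an array of per-LED color values into (count, value) pairs.
function rleEncode(values) {
    var out = [];
    var i = 0;
    while (i < values.length) {
        var run = 1;
        // Count how many consecutive LEDs share the same value (cap runs at 255
        // so the count fits in a single byte).
        while (i + run < values.length && values[i + run] === values[i] && run < 255) {
            run++;
        }
        out.push([run, values[i]]);
        i += run;
    }
    return out;
}

The worst case (no repeated neighbours) is larger than the raw frame, so this only pays off when the image has areas of flat colour.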

Audio streaming by websockets

I'm going to create a voice chat. My backend server runs on Node.js and almost every connection between client and server uses socket.io.
Are websockets appropriate for my use case? I prefer client -> server -> clients communication over P2P because I expect as many as 1000 clients connected to one room.
If websockets are OK, which method is best for sending an AudioBuffer to the server and playing it back on the other clients? Currently I do it like this:
navigator.getUserMedia({audio: true}, initializeRecorder, errorCallback);

function initializeRecorder(MediaStream) {
    var audioCtx = new window.AudioContext();
    // Turn the microphone stream into a Web Audio source node
    var sourceNode = audioCtx.createMediaStreamSource(MediaStream);
    // ScriptProcessor with a 4096-sample buffer, 1 input channel, 1 output channel
    var recorder = audioCtx.createScriptProcessor(4096, 1, 1);
    recorder.onaudioprocess = recorderProcess;
    sourceNode.connect(recorder);
    recorder.connect(audioCtx.destination);
}

function recorderProcess(e) {
    // Float32Array of raw samples from the first channel
    var left = e.inputBuffer.getChannelData(0);
    io.socket.post('url', left);
}
But after receiving the data on the other clients, I don't know how to play this audio stream back from the buffer arrays.
EDIT
1) Why isn't the onaudioprocess method fired if I don't connect the ScriptProcessor (the recorder variable) to the destination?
Documentation info - "although you don't have to provide a destination if you, say, just want to visualise some audio data" - Web Audio concepts and usage
2) Why don't I hear anything from my speakers after connecting the recorder variable to the destination, while I do if I connect the sourceNode variable directly to the destination, even though the onaudioprocess method doesn't do anything?
Can anyone help?
I think web sockets are appropriate here. Just make sure that you are using binary transfer. (I use BinaryJS for this myself, allowing me to open up arbitrary streams to the server.)
Getting the data from user media capture is pretty straightforward. What you have is a good start. The tricky part is playback: you will have to buffer the data and play it back using your own script processor node.
This isn't too hard if you use PCM everywhere, i.e. the raw samples you get from the Web Audio API. The downside is that there is a lot of overhead in shoving 32-bit floating-point PCM around; it uses a ton of bandwidth that isn't needed for speech alone.
I think the easiest thing to do in your case is to reduce the bit depth to one that works well for your application. 8-bit samples are plenty for discernible speech and take up quite a bit less bandwidth. By sticking with PCM, you avoid having to implement a codec in JS and then having to deal with buffering and framing the data for that codec.
To summarize: once you have the raw sample data in a typed array in your script processor node, convert those samples from 32-bit float to 8-bit signed integers. Send these buffers to your server, in the same size chunks as they come in, over your binary web socket. The server then sends them to all the other clients on their binary web sockets. When a client receives audio data, it buffers it for whatever amount of time you choose to prevent dropping audio. The client code converts those 8-bit samples back to 32-bit float and puts them in a playback buffer. Your script processor node then picks up whatever is in the buffer and starts playback as data becomes available.
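A rough sketch of that round trip might look like this (the function names, the 4096-sample buffer size and the naive queueing scheme are illustrative assumptions, not code from the answer):

// Down-convert a Float32Array of samples in the -1..1 range to 8-bit signed PCM.
function floatTo8BitPCM(float32Samples) {
    var out = new Int8Array(float32Samples.length);
    for (var i = 0; i < float32Samples.length; i++) {
        var s = Math.max(-1, Math.min(1, float32Samples[i])); // clamp
        out[i] = Math.round(s * 127);
    }
    return out;
}

// Convert received 8-bit samples back to 32-bit float for the Web Audio API.
function int8ToFloat32(int8Samples) {
    var out = new Float32Array(int8Samples.length);
    for (var i = 0; i < int8Samples.length; i++) {
        out[i] = int8Samples[i] / 127;
    }
    return out;
}

// Very naive playback: queue incoming Float32Array chunks and feed them
// to a ScriptProcessor that is connected to the destination.
var playbackQueue = [];
function createPlaybackNode(audioCtx) {
    var node = audioCtx.createScriptProcessor(4096, 1, 1);
    node.onaudioprocess = function (e) {
        var output = e.outputBuffer.getChannelData(0);
        var chunk = playbackQueue.shift();
        if (chunk) {
            output.set(chunk.subarray(0, output.length));
        } else {
            output.fill(0); // underrun: output silence
        }
    };
    node.connect(audioCtx.destination);
    return node;
}

In practice you would also need to handle chunks whose length doesn't match the processor's buffer size, but this shows the basic shape of the down-conversion, the up-conversion and the playback buffer.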

Adding audio effects (reverb etc..) to a BackgroundAudioPlayer driven streaming audio app

I have a windows phone 8 app which plays audio streams from a remote location or local files using the BackgroundAudioPlayer. I now want to be able to add audio effects, for example, reverb or echo, etc...
Please could you advise me on how to do this? I haven't been able to find a way of hooking extra audio processing code into the audio pipeline, even though I've read a lot about WASAPI and XAudio2 and looked at many code examples.
Note that the app is written in C# but, from my previous experience with writing audio processing code, I know that I should be writing the audio code in native C++. Roughly speaking, I need to find a point at which there is an audio buffer containing raw PCM data which I can use as an input for my audio processing code which will then write either back to the same buffer or to another buffer which is read by the next stage of audio processing. There need to be ways of synchronizing what happens in my code with the rest of the phone's audio processing mechanisms and, of course, the process needs to be very fast so as not to cause audio glitches. Or something like that; I'm used to how VST works, not how such things might work in the Windows Phone world.
Looking forward to seeing what you suggest...
Kind regards,
Matt Daley
I need to find a point at which there is an audio buffer containing raw PCM data
AFAIK there's no such point. This MSDN page hints that audio/video decoding is performed not by the OS but by the Qualcomm chip itself.
You can use something like Mp3Sharp for decoding. That way the MP3 is decoded on the CPU by your managed code, you can process the samples however you like, and then feed the PCM into the media stream source. The main downside is battery life: the hardware-provided codecs should be much more power-efficient.

Streaming audio from avconv via NodeJs WebSockets into Chrome with AudioContext

We're having trouble playing streamed audio in a browser (using Chrome).
We have a process which is streaming some audio (for example an internet radio station) over UDP on some port. It's avconv (avconv -y -i SOMEURL -f alaw udp://localhost:PORT).
We have a Node.js server which receives this audio stream and forwards it to multiple clients connected via websockets. The audio stream which Node.js receives is wrapped in a buffer which is an array of numbers from 0 to 255. The data is sent to the browser without any issues, and then we're using AudioContext to play the audio stream in the browser (our code is based on AudioStreamer - https://github.com/agektmr/AudioStreamer).
At first, all we got at this point was static. When looking into the AudioStreamer code, we realized that the audio stream data should be in the -1 to 1 range. With this knowledge we tried modifying each value in the buffer with the formula x = (x/128) - 1. We did it just to see what would happen, and surprisingly the static became a bit less awful - you could even make out melodies of songs, or words if the audio was speech. But it's still very, very bad, with lots of static, so this is obviously not a solution - but it does show that we are indeed receiving the audio stream via the websockets and not just some random data.
So the question is: what are we doing wrong? Is there a codec/format we should be using? Of course all the code (the avconv command, the Node.js server and the client side) can be modified at will. We could also use another browser if needed, though I assume that's not the problem here. The only thing we do know is that we really need this to work through websockets.
The OS running avconv and Node.js is Ubuntu (various versions, 10-13).
Any ideas? All help will be appreciated.
Thanks!
Tomas
The conversion from integer samples to floating point samples is incorrect. You must take into account:
Number of channels
Number of bits per sample
Signed/unsigned
Endianness
Let's assume you have typical WAV-style data: 16-bit, stereo, signed, little-endian. You're on the right track with your formula, but the scaling has to match the sample format. If you read each pair of bytes as a signed little-endian 16-bit integer, the conversion is:
x = x / 32768
(If you instead combine the two bytes into an unsigned 16-bit value, it would be x = (x/32768) - 1.)
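On the client, that could look something like this minimal sketch (assuming the incoming websocket message arrives as an ArrayBuffer of signed 16-bit little-endian samples; the function name is illustrative):

// Convert a chunk of signed 16-bit little-endian PCM to Float32 in -1..1.
// (Mono shown for brevity; for stereo, split every other sample into a
// separate channel array.)
function pcm16leToFloat32(arrayBuffer) {
    var view = new DataView(arrayBuffer);
    var out = new Float32Array(arrayBuffer.byteLength / 2);
    for (var i = 0; i < out.length; i++) {
        // 'true' makes getInt16 read little-endian and return a signed value
        out[i] = view.getInt16(i * 2, true) / 32768;
    }
    return out;
}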

Is GSM6.10 audio format block or stream based?

I might be asking the wrong question, but my knowledge in this area is very limited.
I'm using acmStreamConvert to convert PCM to GSM (6.10).
Audio Format: 8khz, 16-bit, mono
For the PCM buffer size I'm using 640 bytes (320 samples). For GSM buffer I'm using 65 bytes. My understanding is that GSM "always" converts 320 samples to 65 bytes.
The reason I ask "block or stream" is that I'm wondering whether I can safely convert multiple audio streams (in real time) using the same acmStreamConvert handle. I see the function has flags for ACM_STREAMCONVERTF_START, ACM_STREAMCONVERTF_END and ACM_STREAMCONVERTF_BLOCKALIGN, but am I required to use this start/end sequence for GSM? I understand that might be required for some formats that use headers/trailers, but I'm hoping it isn't required for the GSM format.
I'm working on a group VoIP client; each client sends GSM-format audio, which then needs to be converted to PCM before playing. I'm hoping I don't need one ACM handle per client.
Stream based, or at least the ACM API's usage of it is. Trying to use the same ACM objects/handles for multiple streams will produce undesired results. I suspect this also means it doesn't handle lost packets as well as other codecs might (I haven't confirmed that part yet).
