Streaming audio from avconv via NodeJs WebSockets into Chrome with AudioContext

We're having trouble playing streamed audio in a browser (using Chrome).
We have a process which is streaming some audio (for example an internet radio) on udp on some port. It's avconv (avconv -y -i SOMEURL -f alaw udp://localhost:PORT).
We have a NodeJs server which receives this audio stream and forwards it to multiple clients connected via websockets. The audio stream NodeJs receives is wrapped in a buffer, an array of numbers from 0 to 255. The data is sent to the browser without any issues, and we then use AudioContext to play the audio stream in the browser (our code is based on AudioStreamer - https://github.com/agektmr/AudioStreamer).
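For context, the relay is essentially the following (a simplified sketch, not our exact code; the ws module here stands in for whichever websocket library is used):

const dgram = require('dgram');
const WebSocket = require('ws');

const wss = new WebSocket.Server({ port: 8080 }); // browser clients connect here
const udp = dgram.createSocket('udp4');

udp.on('message', (msg) => {
    // msg is a Buffer of raw bytes (values 0-255) coming from avconv
    wss.clients.forEach((client) => {
        if (client.readyState === WebSocket.OPEN) {
            client.send(msg);
        }
    });
});

udp.bind(PORT); // the same PORT avconv streams to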
At first, all we got was static. Looking into the AudioStreamer code, we realized that the audio stream data should be in the -1 to 1 range. With this knowledge we tried modifying each value in the buffer with the formula x = (x/128) - 1. We did it just to see what would happen, and surprisingly the static became a bit less awful: you could even make out melodies of songs, or words if the audio was speech. But it's still very, very bad, with lots of static, so this is obviously not a solution. It does show, however, that we are indeed receiving the audio stream via the websockets and not just some random data.
So the question is - what are we doing wrong? Is there a codec/format we should be using? Of course all the code (the avconv, NodeJs and client side) can be modified at will. We could also use another browser if needed, though I assume that's not the problem here. The only thing we do know is that we really need this to work through websockets.
The OS running avconv and NodeJs is Ubuntu (various versions, 10-13).
Any ideas? All help will be appreciated.
Thanks!
Tomas

The conversion from integer samples to floating point samples is incorrect. You must take into account:
Number of channels
Number of bits per sample
Signed/unsigned
Endianness
Let's assume you have a typical WAV file at 16-bit stereo, signed, little-endian. You're on the right track with your formula, but for signed 16-bit samples the correct scaling is:
x = x / 32768
(The -1 offset in your formula only applies to unsigned samples; for unsigned 8-bit, x = (x/128) - 1 is right.)
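Note also that the stream in the question is a-law, which is companded 8-bit audio: those bytes need an a-law decode table (or avconv should be told to emit linear PCM, e.g. -f s16le) before any linear scaling will sound right. Assuming the bytes do arrive as 16-bit signed little-endian PCM in a Node Buffer, a minimal conversion sketch:

// Convert 16-bit signed little-endian PCM to Float32 samples in -1..1
// (interleaved stereo samples stay interleaved).
function int16leToFloat32(buf) {
    const out = new Float32Array(buf.length / 2);
    for (let i = 0; i < out.length; i++) {
        out[i] = buf.readInt16LE(i * 2) / 32768; // signed: no -1 offset
    }
    return out;
}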

Related

How to get exact timestamp of audio recording start with PortAudio?

We are using PortAudio for recording audio in our Electron application. As a Node wrapper, we use naudiodon.
The application needs to record both audio and video, but using different sources. Audio, as said, is recorded with PortAudio, with additional app logic on top. Video, on the other hand, is recorded with the standard MediaRecorder API, with its own formats, properties, and codecs.
We use the 'onstart' event to track the actual video start, and in order to sync audio and video, we must also know the exact audio start time.
The problem is: we are not able to detect the exact timestamp of the audio start. What would be the correct way of doing it?
Here is what we tried:
 1. The first option is to listen to portaudio.AudioIO events, such as 'data' and 'readable'. These fire as soon as PortAudio has a new data chunk, so taking the arrival time of the very first chunk minus its length in milliseconds gives an approximate audio start (see the sketch after this list).
 2. The second option is to add Writable pipe to AudioIO, and do pretty much the same thing as with events.
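Here is a minimal sketch of option 1, assuming naudiodon 2.x's AudioIO readable-stream interface (the exact option names follow its docs; treat them as an assumption) and 16-bit mono capture:

const portAudio = require('naudiodon');

const sampleRate = 44100;
const channels = 1;
const bytesPerFrame = 2 * channels; // 16-bit samples

const ai = new portAudio.AudioIO({
    inOptions: {
        channelCount: channels,
        sampleFormat: portAudio.SampleFormat16Bit,
        sampleRate: sampleRate,
        deviceId: -1 // default input device
    }
});

let startTimestamp = null;
ai.on('data', (chunk) => {
    if (startTimestamp === null) {
        // The chunk we just received was captured over the last chunkMs
        // milliseconds, so recording started roughly chunkMs before now.
        // This inherits the 50-500 ms buffering error described below.
        const chunkMs = (chunk.length / bytesPerFrame / sampleRate) * 1000;
        startTimestamp = Date.now() - chunkMs;
    }
});
ai.start();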
The issue is that with either option, the calculated start doesn't always match the actual timestamp of the audio start. While experimenting with PortAudio we noticed that the calculated timestamp is later than it should be, as though some chunks are buffered before actually being released.
The gap between the actual audio start and the release of the first chunk varies in a range of roughly 50-500 ms, with a chunk length of ~50 ms. So chunks sometimes buffer and sometimes don't. Is there any way to track the actual start time of the first chunk? I wasn't able to find any relevant info in the PortAudio docs.
Maybe there are other ways to keep using PortAudio while recording video separately, but still achieve the desired result of syncing them together?
PortAudio 19.5, Naudiodon 2.1.0, Electron 6.0.9, Node.js 12.4.0

NodeJS Simulate Live Video Stream

I have a video file that I would like to start broadcasting from NodeJS, preferably through Express, at a given time. That is, if the video starts being available at timestamp t0, then if a client hits the video endpoint at time t0+60, the video playback would start at 60 seconds in.
My key requirement is that when a client connects at a given time, no more of that video be available than what would have been seen so far: a client connecting at t0+60 would not be able to watch past the minute mark (plus some error threshold) initially, and every ~second another second of video availability would be added, simulating a live experience synced across all clients regardless of when each loads the stream.
So far, I've tried my luck converting videos to Apple's HLS protocol (because the name sounds promising) and I was able to host the m3u8 files using Node's hls-server library, where the call is very straightforward:
import HLSServer = require('hls-server');
import http = require('http');

const source = __dirname + '/resources';
const server = http.createServer();
const hls = new HLSServer(server, {
    path: '/streams', // Base URI to output HLS streams
    dir: source       // Directory where the input files are stored
});
server.listen(8000);
However, it sends the entire video to the browser when asked, and appears to offer no option for forcing a start at a given frame. (I imagine forcing the start position can be done out of band, by simply sending the current time to the client and then having the client do whatever is necessary in HTML and JavaScript to advance to the latest position.)
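For what it's worth, that out-of-band idea could look roughly like this on the client (the /stream-start endpoint is hypothetical):

const video = document.querySelector('video');

// Ask the server when the "broadcast" started, then seek to the live edge.
fetch('/stream-start')
    .then((res) => res.json())
    .then(({ startedAt }) => {
        video.currentTime = (Date.now() - startedAt) / 1000; // seconds elapsed
        video.play();
    });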
There are some vague approaches I saw online that use MP4, but from what I understand, due to its compression it is hard to know how many bytes of video data correspond to what footage duration, as it can vary widely.
There are also some other tutorials that pipe directly from an input source such as a webcam, thereby requiring liveness; for my comparatively simple use case, where the video file already exists, I'm content with limited precision, such as ±10 seconds, as long as all clients are forced to be approximately in sync.
Thank you very much in advance, and I appreciate any pointers.

String compression to refresh WS2811 RGB LEDs faster

I have the following problem. I am using WS2811 diodes, an Arduino Due and Node.js in my project. I want to stream video from a device connected to a Node.js server and show it on an array of diodes. Right now I am able to capture video from any device with a browser and camera, change the resolution of the video to the one I want (15x10), and create a string containing the color information (R, G, B) of all the diodes. I am sending it from the Node.js server to the Arduino through the serial port at baud rate 115200.
Unfortunately the sending process is too slow: I would like to refresh the LED array at least 10 times per second. So I was wondering whether I could compress the string I am sending to the Arduino, decompress it when it gets there, and then set the colors of the diodes. Maybe you have experience with a similar project and can advise me on what to do.
For handling diodes I am using adafruit_neopixel library.
If I were you I would try to convert the video to a 16-bit encoding (like RGB565), or maybe even 8-bit, on your server.
Even at that low resolution I'm not certain the Due's processor is powerful enough to convert it back to 24-bit and send the data out to the display, but TIAS (try it and see). If it doesn't work, you might want to consider switching to a BeagleBone or RPi.
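A sketch of the RGB565 packing on the Node side (channel values assumed to be 0-255; two bytes per pixel instead of three):

// Pack 24-bit RGB into 16-bit RGB565: 5 bits red, 6 bits green, 5 bits blue.
function rgbToRgb565(r, g, b) {
    return ((r & 0xf8) << 8) | ((g & 0xfc) << 3) | (b >> 3);
}

// pixels: [{r, g, b}, ...] for all 150 LEDs of the 15x10 frame
function packFrame(pixels) {
    const buf = Buffer.alloc(pixels.length * 2);
    pixels.forEach((p, i) => buf.writeUInt16BE(rgbToRgb565(p.r, p.g, p.b), i * 2));
    return buf;
}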
If you have large areas of a similar colour, especially if you have dropped your bit depth to 16 or 8 bits as suggested in the previous answer, Run Length Encoding compression might be worth a try.
It's easy to implement in a few lines of code:
https://en.wikipedia.org/wiki/Run-length_encoding
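A sketch of such an encoder over the frame's bytes, with each run emitted as a (count, value) pair and runs capped at 255 (the Arduino side would run the mirror-image loop to expand it):

// Run-length encode a Buffer: output alternates run length and byte value.
function rleEncode(buf) {
    const out = [];
    let i = 0;
    while (i < buf.length) {
        let run = 1;
        while (i + run < buf.length && buf[i + run] === buf[i] && run < 255) {
            run++;
        }
        out.push(run, buf[i]);
        i += run;
    }
    return Buffer.from(out);
}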

Audio streaming by websockets

I'm going to create a voice chat. My backend server runs on Node.js and almost every connection between client and server uses socket.io.
Are websockets appropriate for my use case? I prefer communication client -> server -> clients over P2P because I expect as many as 1000 clients connected to one room.
If websockets are OK, then which method is best for sending an AudioBuffer to the server and playing it back on the other clients? Currently I do it like this:
navigator.getUserMedia({ audio: true }, initializeRecorder, errorCallback);

function initializeRecorder(MediaStream) {
    var audioCtx = new window.AudioContext();
    // Wrap the microphone stream in a source node
    var sourceNode = audioCtx.createMediaStreamSource(MediaStream);
    // ScriptProcessor with a 4096-sample buffer, 1 input and 1 output channel
    var recorder = audioCtx.createScriptProcessor(4096, 1, 1);
    recorder.onaudioprocess = recorderProcess;
    sourceNode.connect(recorder);
    recorder.connect(audioCtx.destination);
}

function recorderProcess(e) {
    // Raw 32-bit float PCM samples (-1..1) for the only channel
    var left = e.inputBuffer.getChannelData(0);
    io.socket.post('url', left);
}
But after receiving the data on the other clients, I don't know how to play back this audio stream from the buffer arrays.
EDIT
1) Why, if I don't connect the ScriptProcessor (the recorder variable) to the destination, is the onaudioprocess method not fired?
Documentation info - "although you don't have to provide a destination if you, say, just want to visualise some audio data" - Web Audio concepts and usage
2) Why don't I hear anything from my speakers after connecting the recorder variable to the destination, while I do if I connect the sourceNode variable directly to the destination - even though the onaudioprocess method doesn't do anything?
Can anyone help?
I think web sockets are appropriate here. Just make sure that you are using binary transfer. (I use BinaryJS for this myself, allowing me to open up arbitrary streams to the server.)
Getting the data from user media capture is pretty straightforward. What you have is a good start. The tricky part is playback: you will have to buffer the data and play it back using your own script processing node.
This isn't too hard if you use PCM everywhere... the raw samples you get from the Web Audio API. The downside of this is that there is a lot of overhead shoving 32-bit floating point PCM around. This uses a ton of bandwidth which isn't needed for speech alone.
I think the easiest thing to do in your case is to reduce the bit depth to one that works well for your application. 8-bit samples are plenty for discernible speech and take up considerably less bandwidth. By staying with PCM, you avoid having to implement a codec in JS and then deal with the buffering and framing of data for that codec.
To summarize: once you have the raw sample data in a typed array in your script processing node, write something to convert those samples from 32-bit float to 8-bit signed integers. Send these buffers to your server, in the same size chunks as they come in, over your binary web socket. The server will then send them to all the other clients on their binary web sockets. When a client receives audio data, it should buffer it for whatever amount of time you choose, to prevent dropping audio. Your client code converts those 8-bit samples back to 32-bit float and puts them in a playback buffer. Your script processing node picks up whatever is in the buffer and starts playback as data is available.
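A sketch of the two conversions described above (the 128/127 scale split is one common choice, an assumption here rather than something mandated by the approach):

// 32-bit float (-1..1) -> 8-bit signed integers, for the wire
function floatTo8BitPCM(float32) {
    const out = new Int8Array(float32.length);
    for (let i = 0; i < float32.length; i++) {
        const s = Math.max(-1, Math.min(1, float32[i])); // clamp first
        out[i] = s < 0 ? s * 128 : s * 127;
    }
    return out;
}

// 8-bit signed integers -> 32-bit float, for the playback buffer
function pcm8ToFloat32(int8) {
    const out = new Float32Array(int8.length);
    for (let i = 0; i < int8.length; i++) {
        out[i] = int8[i] < 0 ? int8[i] / 128 : int8[i] / 127;
    }
    return out;
}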

How can I concatenate ATSC streams from DVB card?

I'm trying to make a simple "TV viewer" using a Linux DVB video capture card. Currently I watch TV using the following process (I'm on a Raspberry Pi):
 1. Tune to a channel using azap -r TV_CHANNEL_HERE. This will supply bytes to device /dev/dvb/adapter0/dvr0.
 2. Open OMXPlayer: omxplayer /dev/dvb/adapter0/dvr0
 3. Watch TV!
The problem comes when I try to change channels. Even if I set the player to cache incoming bytes (tried with MPlayer also), the player can't withstand a channel change (done by restarting azap with a new channel).
I'm thinking this is because of changes in the MPEG TS stream metadata.
Looking for a C library that would let me do the following (a packet-parsing sketch follows the list):
 1. Pull cache_size * mpeg_ts_packet_size bytes from the DVR device.
 2. Evaluate each packet and rewrite its metadata (PID, etc.) as needed.
 3. Populate a FIFO with the resulting packets.
 4. Set {OMXPlayer,MPlayer} to read from the FIFO.
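For concreteness, stepping through TS packets and reading each PID looks like this (sketched in JavaScript to match the rest of this page, though I'm after a C library; layout per the MPEG-TS spec: 188-byte packets, sync byte 0x47, 13-bit PID spanning bytes 1-2):

// Iterate 188-byte TS packets in a Buffer and yield each packet's PID.
function* tsPackets(buf) {
    for (let off = 0; off + 188 <= buf.length; off += 188) {
        if (buf[off] !== 0x47) continue; // lost sync; a real reader would resync
        const pid = ((buf[off + 1] & 0x1f) << 8) | buf[off + 2];
        yield { pid, packet: buf.subarray(off, off + 188) };
    }
}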
The other thing I was thinking would be to use a program that converts MPEG TS into MPEG PS and concatenate the bytes that way.
Thoughts?
Indeed, when you tune to another channel, some metadata can change and invalidate previously cached data.
Unfortunately I'm not familiar with the tools you are using, but your point 2 makes me raise an eyebrow: you will waste your time trying to rewrite Transport Stream data.
I would rather suggest stopping and restarting the process on zapping, since that seems to work fine at startup.
P.S.: Here are some tools that can help. Also, I'm not sure at what level your problem lies, but VLC can be installed on a Raspberry Pi and it handles TS gracefully.
