Read sound output of computer in Node.js and analyze sound

I can't find a Node.js npm package that lets me record the current sound output of my Mac and analyze it.
I'm trying to create a music visualizer that simply shows the current volume of the sound playing from the computer.
Does anyone have tips or ideas about what kind of package to use?

You could use node-audiorecorder to capture time-series audio data as a WAV stream.
This gives you raw sample values to work with (signed 16-bit integers with the options shown below).
Some example code for the above:
// Import module.
const AudioRecorder = require('node-audiorecorder');
// Options is an optional parameter for the constructor call.
// If an option is not given the default value, as seen below, will be used.
const options = {
program: `rec`, // Which program to use, either `arecord`, `rec`, or `sox`.
device: null, // Recording device to use, e.g. `hw:1,0`
bits: 16, // Sample size. (only for `rec` and `sox`)
channels: 1, // Channel count.
encoding: `signed-integer`, // Encoding type. (only for `rec` and `sox`)
format: `S16_LE`, // Encoding type. (only for `arecord`)
rate: 16000, // Sample rate.
type: `wav`, // Format type.
// Following options only available when using `rec` or `sox`.
silence: 2, // Duration of silence in seconds before it stops recording.
thresholdStart: 0.5, // Silence threshold to start recording.
thresholdStop: 0.5, // Silence threshold to stop recording.
keepSilence: true // Keep the silence in the recording.
};
// Optional parameter intended for debugging.
// The object has to implement a log and warn function.
const logger = console;
// Create an instance.
let audioRecorder = new AudioRecorder(options, logger);
You can then pass the captured data to audio-render for any analysis you need. (The library also has some functions of its own for capturing data.)
.pipe(Render(function (canvas) {
var data = this.getFloatTimeDomainData();
//draw volume, spectrum, spectrogram, waveform — any data you need
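For a visualizer that only needs the current volume, one option is to compute the RMS level of each PCM chunk yourself. The following is a minimal sketch, assuming the chunk holds signed 16-bit little-endian samples (as with the bits: 16 and signed-integer options above). The names rmsLevel and toDecibels are hypothetical helpers, not part of either library. A real WAV stream also starts with a header (typically 44 bytes) that you would want to skip before measuring.

```javascript
// Hypothetical helper: RMS level (0..1) of a chunk of signed 16-bit LE PCM.
function rmsLevel(chunk) {
  const sampleCount = Math.floor(chunk.length / 2);
  if (sampleCount === 0) return 0;
  let sumSquares = 0;
  for (let i = 0; i < sampleCount; i++) {
    // Scale each sample to the -1..1 range before squaring.
    const sample = chunk.readInt16LE(i * 2) / 32768;
    sumSquares += sample * sample;
  }
  return Math.sqrt(sumSquares / sampleCount);
}

// Hypothetical helper: converts a linear level to decibels relative to full scale.
function toDecibels(level) {
  return level > 0 ? 20 * Math.log10(level) : -Infinity;
}
```

You could call rmsLevel from a 'data' handler on the recorder's stream and redraw a volume bar with the result.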


Interpolate silence in Discord.js stream

I'm making a Discord bot with Discord.js v14 that records users' audio as individual files and as one collective file. Because Discord.js receiver streams emit nothing while a user is silent, my question is how to interpolate silence into the streams.
My code is based off the Discord.js recording example.
In essence, a privileged user enters a voice channel (or stage), runs /record and all the users in that channel are recorded up until the point that they run /leave.
I've tried using Node packages like combined-stream, audio-mixer, multistream and multipipe, but I'm not familiar enough with Node streams to use the pros of each to fill in the gaps the cons add to the problem. I'm not entirely sure how to go about interpolating silence, either, whether it be through a Transform (likely requires the stream to be continuous, or for the receiver stream to be applied onto silence) or through a sort of "multi-stream" that swaps between piping the stream and a silence buffer. I also have yet to overlay the audio files (e.g, with ffmpeg).
Would it even be possible for a Readable to await an audio chunk and, if none is given within a certain timeframe, push a chunk of silence instead? My attempt at doing so is below (again, based off the Discord.js recorder example):
const SILENCE = Buffer.from([0xf8, 0xff, 0xfe]);
async function createListeningStream(connection, userId) {
// Creating manually terminated stream
let receiverStream = connection.receiver.subscribe(userId, {
end: {
behavior: EndBehaviorType.Manual
// Interpolating silence
// TODO Increases file length over tenfold by stretching audio?
let userStream = new Readable({
read() {
receiverStream.on('data', chunk => {
if (chunk) {
else {
// Never occurs
/* Piping userStream to file at 48kHz sample rate */
As an unnecessary bonus, it would help if it were possible to check whether a user ever spoke or not to eliminate creating empty recordings.
Thanks in advance.
Record all users in a voice channel in discord js v12
Adding silent frames to a node js stream when no data is received
After a lot of reading about Node streams, the solution I arrived at was unexpectedly simple.
Create a boolean variable recording that is true while the recording should continue and false when it should stop
Create a buffer to handle backpressure (i.e., when data comes in faster than it goes out)
let buffer = [];
Create a readable stream into which the incoming user audio stream is fed
// New audio stream (with silence)
let userStream = new Readable({
// ...
// User audio stream (without silence)
let receiverStream = connection.receiver.subscribe(userId, {
end: {
behavior: EndBehaviorType.Manual,
receiverStream.on('data', chunk => buffer.push(chunk));
In that stream's read method, pace the output with a 20 ms timer, which matches the duration of each Opus frame in the user audio stream (the audio itself is sampled at 48 kHz)
read() {
if (recording) {
let delay = new NanoTimer();
delay.setTimeout(() => {
if (buffer.length > 0) {
else {
}, '', '20m');
// ...
In the same method, also handle ending the stream
// ...
else if (buffer.length > 0) {
// Stream is ending: sending buffered audio ASAP
else {
// Ending stream
If we put it all together:
const NanoTimer = require('nanotimer'); // node
/* import NanoTimer from 'nanotimer'; */ // es6
const SILENCE = Buffer.from([0xf8, 0xff, 0xfe]);
async function createListeningStream(connection, userId) {
// Accumulates very, very slowly, but only when user is speaking: reduces buffer size otherwise
let buffer = [];
// Interpolating silence into user audio stream
let userStream = new Readable({
read() {
if (recording) {
// Pushing audio at the same rate of the receiver
// (Could probably be replaced with standard, less precise timer)
let delay = new NanoTimer();
delay.setTimeout(() => {
if (buffer.length > 0) {
else {
// delay.clearTimeout();
}, '', '20m'); // One push every 20 ms, the duration of one Opus frame (the audio itself is sampled at 48 kHz)
else if (buffer.length > 0) {
// Sending buffered audio ASAP
else {
// Ending stream
// Redirecting user audio to userStream to have silence interpolated
let receiverStream = connection.receiver.subscribe(userId, {
end: {
behavior: EndBehaviorType.Manual, // Manually closed elsewhere
// mode: 'pcm',
receiverStream.on('data', chunk => buffer.push(chunk));
// pipeline(userStream, ...), etc.
From here, you can pipe that stream into a fileWriteStream, etc. for individual purposes. Note that it's a good idea to also close the receiverStream whenever recording = false with something like:
The userStream should also be closed if it isn't already, e.g., when it is not the first argument of the pipeline method.
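The core idea above (emit the next queued frame if there is one, otherwise a silence frame, at a steady 20 ms pace) can also be sketched with the standard timer instead of NanoTimer. This is a simplified illustration, not the code from the answer. nextFrame and createPacedStream are hypothetical names, setInterval is less precise and may drift, and the sketch ignores the return value of push(), so it does not honor downstream backpressure.

```javascript
const { Readable } = require('stream');

// Opus silence frame, as used in the answer above.
const SILENCE = Buffer.from([0xf8, 0xff, 0xfe]);

// Hypothetical helper: the next queued frame, or silence if nothing arrived.
function nextFrame(queue) {
  return queue.length > 0 ? queue.shift() : SILENCE;
}

// Hypothetical helper: emits exactly one frame every `frameMs` milliseconds.
function createPacedStream(queue, frameMs = 20) {
  // Pushing is driven by the timer, so read() itself does nothing.
  const stream = new Readable({ read() {} });
  const timer = setInterval(() => stream.push(nextFrame(queue)), frameMs);
  // Call stop() when recording ends: flush what is left, then end the stream.
  const stop = () => {
    clearInterval(timer);
    while (queue.length > 0) stream.push(queue.shift());
    stream.push(null);
  };
  return { stream, stop };
}
```

Here the receiver's 'data' handler would push each chunk onto the same queue array, and stop() would be called from wherever the recording is ended.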
As a side note, although outside the scope of my original question, there are many other modifications you can make to this. For instance, you can prepend silence to the audio before piping the receiverStream's data to the userStream, e.g, to make multiple audio streams of the same length:
// let startTime = ...
let creationTime;
for (let i = startTime; i < (creationTime =; i++) {
Happy coding!
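For the idea of prepending silence so that several recordings end up the same length, the number of silence frames can be derived from the gap between the session start and the moment a user's stream was created. This is a minimal sketch under stated assumptions: both timestamps are in milliseconds (e.g., from Date.now()), each frame lasts 20 ms, and leadingSilence is a hypothetical helper rather than the loop from the answer.

```javascript
// Opus silence frame, as used in the answer above.
const SILENCE = Buffer.from([0xf8, 0xff, 0xfe]);
// Assumed frame duration in milliseconds.
const FRAME_MS = 20;

// Hypothetical helper: silence frames covering the time between
// startTime and creationTime (both in milliseconds).
function leadingSilence(startTime, creationTime, frameMs = FRAME_MS) {
  const gapMs = Math.max(0, creationTime - startTime);
  const frameCount = Math.floor(gapMs / frameMs);
  return Array.from({ length: frameCount }, () => SILENCE);
}
```

The returned frames could be placed in the buffer before any received audio is queued.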

Precise method of segmenting & transcoding video+audio (via ffmpeg), into an on-demand HLS stream?

Recently I've been messing around with FFmpeg and streams through Node.js. My ultimate goal is to serve a transcoded video stream, from any input file type, via HTTP, generated in real time in segments as it's needed.
I'm currently attempting to handle this using HLS. I pre-generate a dummy m3u8 manifest using the known duration of the input video. It contains a bunch of URLs that point to individual constant-duration segments. Then, once the client player starts requesting the individual URLs, I use the requested path to determine which time range of video the client needs. Then I transcode the video and stream that segment back to them.
Now for the problem: This approach mostly works, but has a small audio bug. Currently, with most test input files, my code produces a video that - while playable - seems to have a very small (< .25 second) audio skip at the start of each segment.
I think this may be an issue with splitting using time in ffmpeg, where possibly the audio stream cannot be accurately sliced at the exact frame the video is. So far, I've been unable to figure out a solution to this problem.
If anybody can steer me in the right direction, or even point me to a preexisting library/server that solves this use case, I'd appreciate the guidance. My knowledge of video encoding is fairly limited.
I'll include an example of my relevant current code below, so others can see where I'm stuck. You should be able to run this as a Nodejs Express server, then point any HLS player at localhost:8080/master to load the manifest and begin playback. See the transcode.get('/segment/:seg.ts' line at the end, for the relevant transcoding bit.
'use strict';
const express = require('express');
const ffmpeg = require('fluent-ffmpeg');
let PORT = 8080;
let HOST = 'localhost';
const transcode = express();
/*
* This file demonstrates an Express-based server, which transcodes & streams a video file.
* All transcoding is handled in memory, in chunks, as needed by the player.
* It works by generating a fake manifest file for an HLS stream, at the endpoint "/m3u8".
* This manifest contains links to each "segment" video clip, which browser-side HLS players will load as-needed.
* The "/segment/:seg.ts" endpoint is the request destination for each clip,
* and uses FFmpeg to generate each segment on-the-fly, based off which segment is requested.
*/
const pathToMovie = 'C:\\input-file.mp4'; // The input file to stream as HLS.
const segmentDur = 5; // Controls the duration (in seconds) that the file will be chopped into.
const getMetadata = async(file) => {
return new Promise( resolve => {
ffmpeg.ffprobe(file, function(err, metadata) {
// Generate a "master" m3u8 file, which the player should point to:
transcode.get('/master', async(req, res) => {
res.set({"Content-Disposition":"attachment; filename=\"m3u8.m3u8\""});
// Generate an m3u8 file to emulate a premade video manifest. Guesses segments based off duration.
transcode.get('/m3u8', async(req, res) => {
let met = await getMetadata(pathToMovie);
let duration = met.format.duration;
let out = '#EXTM3U\n' +
'#EXT-X-VERSION:3\n' +
`#EXT-X-TARGETDURATION:${segmentDur}\n` +
let splits = Math.ceil(duration / segmentDur); // Round up so the final, shorter segment is counted.
for(let i=0; i< splits; i++){
out += `#EXTINF:${segmentDur},\n/segment/${i}.ts\n`;
res.set({"Content-Disposition":"attachment; filename=\"m3u8.m3u8\""});
// Transcode the input video file into segments, using the given segment number as time offset:
transcode.get('/segment/:seg.ts', async(req, res) => {
const segment = req.params.seg;
const time = segment * segmentDur;
let proc = new ffmpeg({source: pathToMovie})
.outputOptions('-preset faster')
.outputOptions('-g 50')
.outputOptions('-profile:v main')
.outputOptions('-ar 48000')
.outputOptions('-c:v h264')
.outputOptions(`-output_ts_offset ${time}`)
.on('error', function(err, st, ste) {
console.log('an error happened:', err, st, ste);
}).on('progress', function(progress) {
.pipe(res, {end: true});
transcode.listen(PORT, HOST);
console.log(`Running on http://${HOST}:${PORT}`);
I had the same problem as you, and I managed to fix it, as I mentioned in the comment, by running the complete HLS transcoding instead of manually transcoding each segment requested by the client. I'm going to simplify what I've done and also share the link to my GitHub repo where I've implemented this. I generated the m3u8 manifest the same way you did:
const segmentDur = 4; // Segment duration in seconds
const splits = Math.ceil(duration / segmentDur); // duration = duration of the video in seconds; round up to count the final segment
let out = '#EXTM3U\n' +
'#EXT-X-VERSION:3\n' +
`#EXT-X-TARGETDURATION:${segmentDur}\n` +
for (let i = 0; i < splits; i++) {
out += `#EXTINF:${segmentDur}, nodesc\n/api/video/${id}/hls/${quality}/segments/${i}.ts?segments=${splits}&group=${group}&audioStream=${audioStream}&type=${type}\n`;
out += '#EXT-X-ENDLIST\n';
This works fine when you transcode the video (i.e., use for example libx264 as the video encoder in the ffmpeg command later on). If you copy the video codec instead, the segments won't match the segment duration, from my testing. Now you have a choice: either you start the ffmpeg transcoding at this point, when the m3u8 manifest is requested, or you wait until the first segment is requested. I went with the second option since I want to support starting the transcoding based on which segment is requested.
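The manifest generation described above can be sketched as a standalone function. This is an illustration under stated assumptions, not the code from the repo. buildManifest and its segmentUrl callback are hypothetical, segmentDur is assumed to be a whole number of seconds, and the last EXTINF entry is set to the remaining duration rather than the full segment length.

```javascript
// Hypothetical helper: builds a VOD playlist for a file of `duration` seconds.
// `segmentUrl` maps a segment index to the URL the player should request.
function buildManifest(duration, segmentDur, segmentUrl) {
  const splits = Math.ceil(duration / segmentDur);
  const lines = [
    '#EXTM3U',
    '#EXT-X-VERSION:3',
    `#EXT-X-TARGETDURATION:${segmentDur}`,
    '#EXT-X-MEDIA-SEQUENCE:0',
    '#EXT-X-PLAYLIST-TYPE:VOD',
  ];
  for (let i = 0; i < splits; i++) {
    // The last segment only covers whatever is left of the file.
    const length = Math.min(segmentDur, duration - i * segmentDur);
    lines.push(`#EXTINF:${length.toFixed(3)},`, segmentUrl(i));
  }
  lines.push('#EXT-X-ENDLIST');
  return lines.join('\n') + '\n';
}
```

For example, buildManifest(met.format.duration, 4, i => `/segment/${i}.ts`) would produce one entry per 4-second segment plus a shorter final entry.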
Now comes the tricky part. When the client requests a segment (api/video/${id}/hls/<quality>/segments/<segment_number>.ts in my case), you first have to check whether a transcoding is already active. If one is active, you have to check whether the requested segment has been processed yet. If it has, we can simply send the requested segment back to the client. If it hasn't (for example because of a user seek action), we can either wait for it (if the latest processed segment is close to the requested one) or stop the previous transcoding and restart at the newly requested segment.
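The decision described above can be written as a small pure function. This is a sketch with hypothetical names (decideAction, MAX_WAIT_SEGMENTS), and the wait threshold of 3 segments is an arbitrary assumption. The transcoding argument is assumed to be null when nothing is running, or an object carrying the startSegment and latestSegment values tracked elsewhere.

```javascript
// Hypothetical threshold: how many segments ahead we are willing to wait for.
const MAX_WAIT_SEGMENTS = 3;

// Hypothetical helper: decides what to do with a segment request.
// Returns 'start', 'serve', 'wait' or 'restart'.
function decideAction(transcoding, requestedSegment) {
  if (!transcoding) return 'start'; // Nothing is running yet.
  const { startSegment, latestSegment } = transcoding;
  const inRange = requestedSegment >= startSegment;
  // Already processed: the file can be sent straight away.
  if (inRange && requestedSegment <= latestSegment) return 'serve';
  // Slightly ahead of the encoder: waiting is cheaper than restarting.
  if (inRange && requestedSegment - latestSegment <= MAX_WAIT_SEGMENTS) return 'wait';
  // Far ahead, or before the point where this run started: seek and restart.
  return 'restart';
}
```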
I'm gonna try to keep this answer as simple as I can, the ffmpeg command I use to achieve the HLS transcoding looks like this:
this.ffmpegProc = ffmpeg(this.filePath)
.on('end', () => {
this.finished = true;
.on('progress', progress => {
const seconds = this.addSeekTimeToSeconds(this.timestampToSeconds(progress.timemark));
const latestSegment = Math.max(Math.floor(seconds / Transcoding.SEGMENT_DURATION) - 1); // - 1 because the first segment is 0
this.latestSegment = latestSegment;
.on('start', (commandLine) => {
logger.DEBUG(`[HLS] Spawned Ffmpeg (startSegment: ${this.startSegment}) with command: ${commandLine}`);
.on('error', (err, stdout, stderr) => {
if (err.message != 'Output stream closed' && err.message != 'ffmpeg was killed with signal SIGKILL') {
logger.ERROR(`Cannot process video: ${err.message}`);
logger.ERROR(`ffmpeg stderr: ${stderr}`);
Where output options are:
return [
'-copyts', // Fixes timestamp issues (Keep timestamps as original file)
'-pix_fmt yuv420p',
'-map 0',
'-map -v',
'-map 0:V',
'-g 52',
`-crf ${this.CRF_SETTING}`,
'-deadline realtime',
'-preset:v ultrafast',
'-f hls',
`-hls_time ${Transcoding.SEGMENT_DURATION}`,
'-force_key_frames expr:gte(t,n_forced*2)',
'-hls_playlist_type vod',
`-start_number ${this.startSegment}`,
'-strict -2',
'-level 4.1', // Fixes chromecast issues
'-ac 2', // Set two audio channels. Fixes audio issues for chromecast
'-b:v 1024k',
'-b:a 192k',
And input options:
let inputOptions = [
'-copyts', // Fixes timestamp issues (Keep timestamps as original file)
'-threads 8',
`-ss ${this.startSegment * Transcoding.SEGMENT_DURATION}`
One parameter worth noting is -start_number in the output options. It tells ffmpeg which number to use for the first segment: if the client requests, for example, segment 500, we want to keep it simple and start the numbering at 500 when we have to restart the transcoding. Then we have the standard HLS settings (hls_time, hls_playlist_type and f). In the input options I use -ss to seek to the requested position; since we told the client in the generated m3u8 manifest that each segment is 4 seconds long, we can just seek to 4 * requestedSegment.
In the 'progress' event from ffmpeg, I calculate the latest processed segment by looking at the timemark. By converting the timemark to seconds and then adding the seek time applied to the transcoding, we can calculate approximately which segment was just finished by dividing the number of seconds by the segment duration, which I've set to 4.
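The timemark arithmetic can be illustrated as follows. The implementations of the answer's own helpers are not shown, so the versions here are hypothetical. They assume the timemark has the form HH:MM:SS.xx and that the seek time equals startSegment times the segment duration.

```javascript
// Segment duration in seconds, as in the answer.
const SEGMENT_DURATION = 4;

// Hypothetical implementation: 'HH:MM:SS.xx' to seconds.
function timestampToSeconds(timemark) {
  const [hours, minutes, seconds] = timemark.split(':').map(Number);
  return hours * 3600 + minutes * 60 + seconds;
}

// Hypothetical helper: index of the last fully written segment,
// or -1 if no segment has been completed yet.
function latestFinishedSegment(timemark, startSegment) {
  // Add the seek offset so the result is relative to the whole file.
  const seconds = timestampToSeconds(timemark) + startSegment * SEGMENT_DURATION;
  // Subtract 1 because segment numbering starts at 0.
  return Math.max(Math.floor(seconds / SEGMENT_DURATION) - 1, -1);
}
```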
Now there is a lot more to keep track of than just this, you have to save the ffmpeg processes that you've started so you can check if a segment is finished or not and if a transcoding is active when the segment is requested. You also have to stop already running transcodings if the user requests a segment far in the future so you can restart it with the correct seek time.
The downside to this approach is that the file is actually being transcoded and saved to your file system while the transcoding is running, so you need to remove the files when the user stops requesting segments.
I've implemented this so that it handles the things I've mentioned (long seeks, different resolution requests, waiting until a segment is finished, etc.). If you want to have a look, it's located here: Github Dose. The most interesting files are the transcoding class, the hlsManger class and the endpoint for the segments. I tried to explain this as well as I can, so I hope you can use it as some sort of base or idea for how to move forward.

Can I offline render an audio file with dynamic tempo?

I'm developing a karaoke application.
I'm trying to provide a fun feature.
Can I use AudioKit to render an audio file offline with a time-based, dynamic tempo value?
The image below illustrates the idea.
image example
Here is some of my code.
// I want to change the tempo for bgm audio file dynamically
self.timePitch = AKTimePitch(self.bgmPlayer)
// here I set the initialized rate value to time Pitch
self.timePitch.rate = 1.0
// support iOS10+
self.out = AKOfflineRenderNode()
self.timePitch.connect(to: self.out)
// make the renderer as AudioKit.out
AudioKit.output = self.out
do {
try AudioKit.start()
} catch {
let url = URL(fileURLWithPath: NSTemporaryDirectory() + "output.caf")
// get total duration
let duration = self.duration() .background).async {
do {
let avAudioTime = AVAudioTime(sampleTime: 0, atRate:self.out.avAudioNode.inputFormat(forBus: 0).sampleRate)
// start play BGM avAudioTime)
// and render it to an offline file
try self.out?.renderToURL(url, duration: duration)
// **********
// Question:
// Can I change the tempo value when rendering?
// **********
// stop when finished
} catch {
It really depends on how the dynamic tempo is realized - you can send the audio through time/pitch shifting and render the result.

Decode streaming audio with gstreamer 1.0 and access the waveform data?

The actual gst version is 1.8.1.
Currently I have code that receives a gstreamer encoded stream and plays it through my soundcard. I want to modify it to instead give my application access to the raw un-compressed audio data. This should result in an array of integer sound samples, and if I were to plot them I would see the audio wave form (e.g. a perfect tone would be a nice sine wave), and if I were to append the most recent array to the last one received by a callback I wouldn't see any discontinuity.
This is the current playback code:
I think I need to change the alsasink to an appsink and set up a callback that gets the latest chunk of audio after it has passed through the decoder. The following is adapted from another example:
_sink = gst_element_factory_make("appsink", "sink");
g_object_set (G_OBJECT (_sink), "emit-signals", TRUE,
"sync", FALSE, NULL);
g_signal_connect (_sink, "new-sample",
G_CALLBACK (on_new_sample_from_sink), this);
And then there is the callback:
static GstFlowReturn
on_new_sample_from_sink (GstElement * elt, gpointer data)
RosGstProcess *client = reinterpret_cast<RosGstProcess*>(data);
GstSample *sample;
GstBuffer *app_buffer, *buffer;
GstElement *source;
/* get the sample from appsink */
sample = gst_app_sink_pull_sample (GST_APP_SINK (elt));
buffer = gst_sample_get_buffer (sample);
/* make a copy */
app_buffer = gst_buffer_copy (buffer);
/* we don't need the appsink sample anymore */
gst_sample_unref (sample);
/* get source and push new buffer */
source = gst_bin_get_by_name (GST_BIN (client->_sink), "app_source");
return gst_app_src_push_buffer (GST_APP_SRC (source), app_buffer);
Can I get at the data in that callback? What am I supposed to do with the GstFlowReturn? If that is passing data to another pipeline element I don't want to do that, I'd rather get it there and be done.
Is the gpointer data passed to that callback exactly what I want (cast to a gint16 array?), or otherwise how do I convert and access it?
The GstFlowReturn is merely a return value for the underlying base classes. If you return an error there, the pipeline will probably stop because, well, there was a critical error.
The cb_need_data events are triggered by your appsrc element. This can be used as a throttling mechanism if needed. Since you probably use the appsrc in a pure push mode (as soon as something arrives at the appsink, you push it to the appsrc), you can ignore these. You can also explicitly disable these events on the appsrc element. (Or do you still use the one?)
The data format in the buffer depends on the caps that the decoder and appsink agreed on. That is usually the decoder's preferred format. You may have some control over this format, depending on the decoder, or you can convert it to your preferred format. It may be worthwhile to check the format; Float32 is not that uncommon.
I kind of forgot what your actual question was, I'm afraid..
I can interpret the data from the modified callback below (there is a script that plots it to the screen); it looks like signed 16-bit samples in the uint8 array.
I'm not clear about the proper return value for the callback; there is a cb_need_data callback set up elsewhere in the code that is triggered all the time with this code.
static void // GstFlowReturn
on_new_sample_from_sink (GstElement * elt, gpointer data)
RosGstProcess *client = reinterpret_cast<RosGstProcess*>(data);
GstSample *sample;
GstBuffer *buffer;
GstElement *source;
/* get the sample from appsink */
sample = gst_app_sink_pull_sample (GST_APP_SINK (elt));
buffer = gst_sample_get_buffer (sample);
GstMapInfo map;
if (gst_buffer_map (buffer, &map, GST_MAP_READ))
audio_common_msgs::AudioData msg;
// TODO(lucasw) copy this more efficiently
for (size_t i = 0; i < map.size; ++i)
{[i] =[i];
gst_buffer_unmap (buffer, &map);

Playing PCM stream from Web Audio API on Node.js

I'm streaming recorded PCM audio from a browser with web audio api.
I'm streaming it with binaryJS (websocket connection) to a nodejs server and I'm trying to play that stream on the server using the speaker npm module.
This is my client. The audio buffers are at first non-interleaved IEEE 32-bit linear PCM with a nominal range between -1 and +1. I take one of the two PCM channels to start off and stream it below.
var client = new BinaryClient('ws://localhost:9000');
var Stream = client.send();
recorder.onaudioprocess = function(AudioBuffer){
var leftChannel = AudioBuffer.inputBuffer.getChannelData (0);
Now I receive the data as a buffer and try writing it to a speaker object from the npm package.
var Speaker = require('speaker');
var speaker = new Speaker({
channels: 1, // 1 channel
bitDepth: 32, // 32-bit samples
sampleRate: 48000, // 48,000 Hz sample rate
server.on('connection', function(client){
client.on('stream', function(stream, meta){
stream.on('data', function(data){
The result is a high pitch screech on my laptop's speakers, which is clearly not what's being recorded. It's not feedback either. I can confirm that the recording buffers on the client are valid since I tried writing them to a WAV file and it played back fine.
The docs for speaker and the docs for the AudioBuffer in question
I've been stumped on this for days. Can someone figure out what is wrong or perhaps offer a different approach?
Update with solution
First off, I was using the websocket API incorrectly. I updated above to use it correctly.
I needed to convert the audio buffers to an array buffer of integers. I chose to use Int16Array. Since the given audio buffer has a range between -1 and 1, it was as simple as scaling each sample to the range of the new array's elements (-32768 to 32767).
recorder.onaudioprocess = function(AudioBuffer){
var left = AudioBuffer.inputBuffer.getChannelData (0);
var l = left.length;
var buf = new Int16Array(l)
while (l--) {
buf[l] = left[l] * 0x7FFF; // convert to 16 bit (0xFFFF would overflow the signed 16-bit range)
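A slightly more defensive variant of this conversion clamps each sample first and scales by the signed 16-bit limits, so out-of-range input cannot wrap around. This is a minimal sketch, and floatTo16BitPCM is a hypothetical helper name. It assumes the receiving side is configured for signed 16-bit samples.

```javascript
// Hypothetical helper: converts Float32 samples in -1..1 to signed 16-bit PCM.
function floatTo16BitPCM(samples) {
  const out = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    // Clamp first so values slightly outside -1..1 do not overflow.
    const s = Math.max(-1, Math.min(1, samples[i]));
    // Negative values scale to -32768, positive values to 32767.
    out[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return out;
}
```

In the handler above, this would be called with the Float32Array returned by getChannelData(0).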
It looks like you're sending your stream through as the meta object.
According to the docs, BinaryClient.send takes a data object (the stream) and a meta object, in that order. The callback for the stream event receives the stream (as a BinaryStream object, not a Buffer) in the first parameter and the meta object in the second.
You're passing send() the string 'channel' as the stream and the Float32Array from getChannelData() as the meta object. Perhaps if you were to swap those two parameters (or just use client.send(leftChannel)) and then change the server code to pass stream to speaker.write instead of leftchannel (which should probably be renamed to meta, or dropped if you don't need it), it might work.
Note that since Float32Array isn't a stream or buffer object, BinaryJS might try to send it in one chunk. You may want to send leftChannel.buffer (the ArrayBuffer behind that object) instead.
Let me know if this works for you; I'm not able to test your exact setup right now.
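If the raw bytes behind the Float32Array are sent, the receiving Node.js side has to reinterpret those bytes as 32-bit floats before doing anything else with them. The sketch below shows that round trip on the Node.js side only. floatsToBuffer and bufferToFloats are hypothetical helpers, and it assumes the sender's typed array is little-endian, which holds on virtually all current platforms.

```javascript
// Hypothetical helper: wraps the bytes behind a Float32Array without copying.
function floatsToBuffer(samples) {
  return Buffer.from(samples.buffer, samples.byteOffset, samples.byteLength);
}

// Hypothetical helper: reads little-endian 32-bit floats back out of a Buffer.
function bufferToFloats(buf) {
  const out = new Float32Array(Math.floor(buf.length / 4));
  for (let i = 0; i < out.length; i++) {
    out[i] = buf.readFloatLE(i * 4);
  }
  return out;
}
```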
