Prometheus and Node exporter in milliseconds - node.js

I have a Node Express app that uses prom-client to monitor a serial connection and expose the values at an HTTP endpoint. The serial link runs at 9600 baud and carries some statistics.
A Prometheus instance is configured with a job that scrapes that endpoint at a 10-millisecond interval.
I want to see these metrics at a resolution of at least 10 milliseconds, but the Prometheus graph resolution does not seem to accept anything below 1 second.
What should I do to get Prometheus to collect data at a resolution of at least 10 milliseconds?
Is there a config option I am missing?
I have searched for hours.
This is my Node.js app. A serial port listener waits for JSON messages, parses them, and updates gauge metrics from 'prom-client', which Express then serves:
const serialPath = '/dev/tty.usbmodem14201';
const port = new SerialPort(serialPath, {
  baudRate: 9600
});
const parser = new Readline();
port.pipe(parser);
parser.on('data', (line) => {
  try {
    const obj = JSON.parse(line);
    if (obj.command !== undefined) {
      console.log(obj);
    }
    if (obj.a) {
      obj.a.forEach((analog) => {
        analogGuage.set({
          pin: analog.i
        }, analog.v);
      })
    }
  } catch (ex) {
    console.log('Exception in parsing serial json:', ex);
    console.log('Exception in parsing serial json:', line);
  }
});
This is the metrics endpoint that Prometheus calls every 10 ms:
expressApp.get('/metrics', (req, res) => {
  const metrics = client.register.metrics();
  res.set('Content-Type', client.register.contentType);
  res.end(metrics);
});
It is worth mentioning that all this is for an experimental personal embedded system :) so there are no bottleneck or performance considerations, other than being able to transfer and parse the serial readings in less than 10 ms.
Right now both Prometheus and the Node exporter app run on my PC, so 10 ms intervals should be easy for Prometheus.
Please help.
Answer edit: I decided to drop Prometheus in favor of InfluxDB, as both licenses allow source access and they promote millisecond and nanosecond monitoring.
For future reference, 9600 baud was not enough either, but even at 115200 baud with 150-millisecond reporting loops, Prometheus still did not manage to show a resolution below 1 second.
InfluxDB handled it beautifully. Here are some pictures:
Below is a 30-second window in Prometheus at 115200 baud,
and about 10 seconds at the same 115200 baud in InfluxDB.

While you can set scrape intervals of less than a second, this isn't what Prometheus is designed for. At that point you are doing hard real-time monitoring, and, for example, the kernel scheduler may cause Prometheus to stop running briefly and miss some scrapes, which wouldn't be an issue with more typical scrape intervals.
I'd suggest looking at a custom solution if you need such a high resolution.
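As a rough illustration of what such a custom solution might start from, here is a minimal sketch that timestamps every reading inside the Node process instead of waiting to be scraped. The class name SampleBuffer and its methods are hypothetical, not part of prom-client or Prometheus. The timestamps come from process.hrtime.bigint(), which is a monotonic clock with an arbitrary origin, so they are only useful for measuring intervals between samples.

```javascript
// Hypothetical helper: keeps the most recent `capacity` samples in memory.
class SampleBuffer {
  constructor(capacity) {
    this.capacity = capacity;
    this.samples = [];
  }

  // Store a reading with a nanosecond timestamp from the monotonic clock.
  record(pin, value, timestampNs = process.hrtime.bigint()) {
    this.samples.push({ pin, value, timestampNs });
    if (this.samples.length > this.capacity) {
      this.samples.shift(); // drop the oldest sample
    }
  }

  // Return the samples taken at or after the given timestamp.
  since(timestampNs) {
    return this.samples.filter((s) => s.timestampNs >= timestampNs);
  }
}

// Usage: call record() from the serial 'data' handler for every parsed value.
const sampleBuffer = new SampleBuffer(1000);
sampleBuffer.record(0, 512);
console.log(sampleBuffer.samples.length);
```

The buffered samples could then be written in batches to whatever store is used for graphing, which keeps the sampling rate independent of any scrape interval.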

Related

Nodejs Cluster Architecture reading from single REDIS instance

I'm using the Node.js cluster module to run multiple workers.
I created a basic architecture with a single MASTER process, which is an Express server handling multiple requests. The main task of the MASTER is to write incoming request data into a Redis instance. The other workers (numOfCPUs - 1) are non-master, i.e. they don't handle any requests because they are only consumers. I have two features, ABC and DEF, and I distribute the non-master workers evenly across the features by assigning each one a type.
For example, on an 8-core machine:
1 will be the MASTER instance, handling requests via the Express server.
The remaining 7 (8 - 1) will be distributed evenly: 4 to feature ABC and 3 to feature DEF.
Non-master workers are basically consumers, i.e. they read from Redis, to which only the MASTER can write.
Here's the code for this:
if (cluster.isMaster) {
  // Fork workers.
  for (let i = 0; i < numCPUs - 1; i++) {
    ClusteringUtil.forkNewClusterWithAutoTypeBalancing();
  }
  cluster.on('exit', function(worker) {
    console.log(`Worker ${worker.process.pid}::type(${worker.type}) died`);
    ClusteringUtil.removeWorkerFromList(worker.type);
    ClusteringUtil.forkNewClusterWithAutoTypeBalancing();
  });
  // Start consuming on server-start
  ABCConsumer.start();
  DEFConsumer.start();
  console.log(`Master running with process-id: ${process.pid}`);
} else {
  console.log('CLUSTER type', cluster.worker.process.env.type, 'running on', process.pid);
  if (
    cluster.worker.process.env &&
    cluster.worker.process.env.type &&
    cluster.worker.process.env.type === ServerTypeEnum.EXPRESS
  ) {
    // worker for handling requests
    app.use(express.json());
    ...
  }
}
Everything works fine except for the consumers reading from Redis.
Since there are multiple consumers for a particular feature, each one reads the same message and starts processing it individually, which is what I don't want. Say there are 4 consumers: 1 is marked as busy and cannot consume until it is free, and 3 are available. Once the MASTER writes a message for that feature to Redis, all 3 available consumers of that feature start consuming it. This means that a single message is processed once per available consumer.
const stringifedData = JSON.stringify(req.body);
const key = uuidv1();
const asyncHsetRes = await asyncHset(type, key, stringifedData);
if (asyncHsetRes) {
  await asyncRpush(FeatureKeyEnum.REDIS.ABC_MESSAGE_QUEUE, key);
  res.send({ status: 'success', message: 'Added to processing queue' });
} else {
  res.send({ error: 'failure', message: 'Something went wrong in adding to queue' });
}
The consumer simply accepts messages and stops when it is busy:
module.exports.startHeartbeat = startHeartbeat = async function(config = {}) {
  if (!config || !config.type || !config.listKey) {
    return;
  }
  heartbeatIntervalObj[config.type] = setInterval(async () => {
    await asyncLindex(config.listKey, -1).then(async res => {
      if (res) {
        await getFreeWorkerAndDoJob(res, config);
        stopHeartbeat(config);
      }
    });
  }, HEARTBEAT_INTERVAL);
};
Ideally, a message should be read by only one consumer of that particular feature. After consuming, that consumer is marked as busy so it won't consume again until it is free (I have handled this). The next message should then be processed by exactly one of the other available consumers.
Please help me tackle this problem. Again, I want each message to be read by only one free consumer, while the remaining free consumers wait for a new message.
Thanks
I'm not sure I fully understand your Redis consumer architecture, but I feel it goes against the use case of Redis itself. What you're trying to achieve is essentially queue-based messaging with the ability to commit a message once it's done.
Redis has its own pub/sub feature, but it is built on a fire-and-forget principle. It doesn't distinguish between consumers; it just sends the data to all of them, assuming that handling the incoming data is their job.
I recommend using a queue server such as RabbitMQ. You can achieve your goal with features that AMQP 0-9-1 supports: message acknowledgment, consumer prefetch count, and so on. You can set up your cluster with very flexible configs, for example: I want X consumers, each handling 1 unique (!) message at a time and receiving a new one only after letting the server (RabbitMQ) know that it has finished processing the previous message. This is highly configurable and robust.
However, if you want to go serverless with a fully managed service, so that you don't have to provision virtual machines or anything else to run a message queue server of your choice, you can use AWS SQS. It has a fairly similar API and feature list.
Hope it helps!
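To make the acknowledgment and prefetch idea concrete, here is a small in-memory sketch of the behavior described above. It is not a RabbitMQ or AMQP client; the class WorkQueue and its methods publish, deliver and ack are hypothetical names used only to show the semantics of "one unacknowledged message per consumer".

```javascript
// In-memory stand-in for a broker queue with a prefetch count of 1.
class WorkQueue {
  constructor() {
    this.pending = [];         // messages not yet delivered to anyone
    this.inFlight = new Map(); // consumerId -> message awaiting ack
  }

  publish(message) {
    this.pending.push(message);
  }

  // Deliver the next message, unless this consumer still holds an unacked one.
  deliver(consumerId) {
    if (this.inFlight.has(consumerId) || this.pending.length === 0) {
      return null;
    }
    const message = this.pending.shift(); // removed, so no other consumer sees it
    this.inFlight.set(consumerId, message);
    return message;
  }

  // The consumer reports completion and becomes eligible for the next message.
  ack(consumerId) {
    return this.inFlight.delete(consumerId);
  }
}

const queue = new WorkQueue();
queue.publish('job-1');
queue.publish('job-2');
console.log(queue.deliver('A')); // job-1
console.log(queue.deliver('B')); // job-2
console.log(queue.deliver('A')); // null, because A has not acked job-1 yet
queue.ack('A');
```

The key point is that delivery removes the message from the shared list in the same step that assigns it, so two free consumers can never pick up the same message.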

Node.js in Azure Functions is much slower than local Node.js

I'm a beginner with Node.js and Azure.
I'm trying to use the wav-encoder npm module in my program, so I wrote the code below:
var WavEncoder = require('wav-encoder');
const whiteNoise1sec = {
  sampleRate: 40000,
  channelData: [
    new Float32Array(40000).map(() => Math.random() - 0.5),
    new Float32Array(40000).map(() => Math.random() - 0.5)
  ]
};
WavEncoder.encode(whiteNoise1sec).then((buffer) => {
  console.log(whiteNoise1sec);
  console.log(buffer);
});
It runs on my local machine in less than 2 seconds,
but if I upload similar code to Azure Functions, it takes more than 2 minutes.
Below is the code in my function. It is triggered by an HTTP REST call.
var WavEncoder = require('wav-encoder');
module.exports = function (context, req) {
  context.log('JavaScript HTTP trigger function processed a request.');
  const whiteNoise1sec = {
    sampleRate: 40000,
    channelData: [
      new Float32Array(40000).map(() => Math.random() - 0.5),
      new Float32Array(40000).map(() => Math.random() - 0.5)
    ]
  };
  WavEncoder.encode(whiteNoise1sec).then((buffer) => {
    context.res = {
      // status: 200, /* Defaults to 200 */
      body: whiteNoise1sec
    };
    context.done();
  });
};
Do you know how I can improve performance on Azure?
Update
context.res = {
  // status: 200, /* Defaults to 200 */
  body: whiteNoise1sec
};
context.done();
I found that these lines cause the slow performance.
If I assign a large array to context.res.body, it takes a long time when I call context.done().
Is a large JSON response not suitable for Azure Functions?
Performance issues like this are a bit hard to analyze, but there are a few things to consider and look at.
Cold vs. warm function performance
If the function hasn't been invoked in a while, or ever (I think the threshold is about 10 or 20 minutes), it goes idle, meaning it gets deprovisioned. The next time you hit that function, it needs to be loaded from storage. Due to the architecture and its reliance on a certain type of storage, I/O for small files is currently slow. There is work in progress to improve that, but a large npm tree can cause a loading time of more than 1 minute just to fetch all the small .js files. If the function is warm, however, the response time should be in the millisecond range (depending on the work your function is doing; see below for more).
Workaround: use this to pack your function: https://github.com/Azure/azure-functions-pack
Slower CPU on the Consumption SKU
On the Consumption SKU, you are scaled out to many instances (in the hundreds), but each instance is affinitized to a single core. That is fine for I/O-bound operations, regular Node functions (since they are single-threaded anyway), and so on. But if your function relies on the CPU for CPU-bound workloads, it is not going to perform as you expect.
Workaround: use dedicated SKUs for CPU-bound workloads.
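On the question's update about the large response body, a rough sketch may help show how big that body can get. It assumes the body object is turned into JSON roughly the way JSON.stringify does it, which is an assumption about the runtime and not something confirmed in this thread. The function name serializedSizes is hypothetical. JSON.stringify writes a typed array as an object keyed by index, so the text form is far larger than the raw samples.

```javascript
// Compare the JSON text size of a Float32Array with its raw byte size.
function serializedSizes(samples) {
  return {
    jsonBytes: Buffer.byteLength(JSON.stringify(samples)),
    rawBytes: samples.byteLength // 4 bytes per 32-bit float
  };
}

// A typed array is serialized as an object with one key per index.
console.log(JSON.stringify(new Float32Array([0.5, 0.25]))); // {"0":0.5,"1":0.25}

// One channel of the white noise from the question.
const channel = new Float32Array(40000).map(() => Math.random() - 0.5);
console.log(serializedSizes(channel));
```

If the caller only needs the audio, returning the encoded buffer instead of the whiteNoise1sec object may be worth trying, since the raw data is much smaller than its JSON form.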

Node.js: Why firing requests and forgetting them can cause a massive performance drop

I found this problem accidentally when I was sending Google Analytics events without waiting for the response, so I built sample code here: https://github.com/tanapoln/node_perf_test
The code is very simple: just 2 endpoints, both sending OK as the response, except that the slow one also fires a pageview event to GA (an HTTP request), as you can see below:
let express = require('express')
let ua = require('universal-analytics')
let app = express()
let visitor = ua('UA-34321454-1', 'user1', {strictCidFormat: false, https: false});
app.get('/fast', function(req, res) {
  res.send('OK')
})
app.get('/slow', function(req, res) {
  // This line will simply fire an HTTP request to Google Analytics
  visitor.pageview('/slow').send()
  res.send('OK')
})
app.listen(3000, function() {
  console.log("Server started at port 3000")
})
When benchmarking these 2 endpoints, you can see the results here:
Running 10s test @ http://localhost:3000/fast
  10 threads and 500 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   103.83ms   59.20ms 698.91ms   85.15%
    Req/Sec   201.36    113.56   750.00     79.63%
  19531 requests in 10.06s, 3.73MB read
  Socket errors: connect 0, read 659, write 0, timeout 0
Requests/sec:   1941.10
Transfer/sec:    379.12KB
Running 10s test @ http://localhost:3000/slow
  10 threads and 500 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   407.06ms  228.33ms   1.23s    74.15%
    Req/Sec    37.18     29.17   136.00     70.59%
  2847 requests in 10.09s, 556.05KB read
  Socket errors: connect 0, read 2900, write 1, timeout 0
Requests/sec:    282.10
Transfer/sec:     55.10KB
Requests per second drop massively and I don't know why.
Please help.
I think the problem does not lie in Node.js: I also tried another language (Go, in my case) and found the same problem with the same performance limitation.
This is far beyond my current knowledge. I guess the problem lies in the Linux/Unix TCP stack, or at an even lower level.
I will close this and do more research. If I hit a roadblock, I'll post another question.
Thanks everyone.

node.js azure cache latency

I am working on a Node.js Express application that uses Azure Cache. I have deployed the service to Azure and I notice a latency of 50 ms or so for get and put requests.
The methods I am using are:
var time1, time2;
var start = Date.now();
var cacheObject = this.cache;
cacheObject.put('test1', { first: 'Jane', last: 'Doe' }, function (error) {
  if (error) throw error;
  time1 = Date.now() - start;
  start = Date.now();
  cacheObject.get('test1', function (error, data) {
    if (error) throw error;
    console.log('Data from cache:' + data);
    time2 = Date.now() - start;
    res.send({t1: time1, t2: time2});
  });
});
The time for put is represented by time1 and time2 represents the time for get.
From reading other posts on the internet, I understood that the latency should be in the order of a couple of ms, but 50ms seems a bit high. Am I using the methods properly? Are there any special settings I need to setup on the management portal? Or is 50ms latency expected?
A few obvious things to check first:
Is the client code running in the same region as the cache? The minimum possible latency is the network round-trip time, which may be around 50 ms between regions.
Is the Node.js Date.now() precise enough to measure a small number of milliseconds? I'm not familiar with Node.js, but in .NET you should use the Stopwatch class for timing, rather than DateTime.Now.
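On the timing point, a minimal sketch for Node.js: Date.now() returns whole milliseconds, which is coarse but still enough to tell a 2 ms call from a 50 ms one, while process.hrtime.bigint() reads a monotonic clock in nanoseconds. The helper name elapsedMsSince is hypothetical.

```javascript
// Milliseconds elapsed since a process.hrtime.bigint() reading.
function elapsedMsSince(startNs) {
  return Number(process.hrtime.bigint() - startNs) / 1e6;
}

const startNs = process.hrtime.bigint();
// ... the cache put or get would run here; call elapsedMsSince in its callback ...
const elapsed = elapsedMsSince(startNs);
console.log(`took ${elapsed.toFixed(3)} ms`);
```

Because the clock is monotonic, the measurement is not affected by system clock adjustments during the call.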

Socket.IO server throttling a fast client

I have a server that uses socket.io and I need a way of throttling a client that is sending the server data too quickly. The server exposes both a TCP interface and a socket.io interface - with the TCP server (from the net module) I can use socket.pause() and socket.resume(), and this effectively throttles the client. But with socket.io's socket class there are no pause() and resume() methods.
What would be the easiest way of getting feedback to a client that it is overwhelming the server and needs to slow down? I liked socket.pause() and socket.resume() because they didn't require any additional code on the client side: back up the TCP socket and things naturally slow down. Is there any equivalent for socket.io?
Update: I provide an API to interact with the server (there is currently a python version which runs over TCP and a JavaScript version which uses socket.io). So I don't have any real control over what the client does. Which is why using socket.pause() and socket.resume() is so great - backing up the TCP stream slows the python client down no matter what it tries to do. I'm looking for an equivalent for a JavaScript client.
With enough digging I found this:
this.manager.transports[this.id].socket.pause();
and
this.manager.transports[this.id].socket.resume();
Granted this probably won't work if the socket.io connection isn't a web sockets connection, and may break in a future update, but for now I'm going to go with it. When I get some time in the future I'll probably change it to the QUOTA_EXCEEDED solution that Pascal proposed.
Here is a dirty way to achieve throttling. Although this is an old post, some people may benefit from it.
First register a middleware:
io.on("connection", function (socket) {
  socket.use(function (packet, next) {
    if (throttler.canBeServed(socket, packet)) {
      next();
    }
  });
  // Your other code ..
});
canBeServed is a simple throttler, as seen below:
function canBeServed(socket, packet) {
  if (socket.markedForDisconnect) {
    return false;
  }
  var previous = socket.lastAccess;
  var now = Date.now();
  if (previous) {
    var diff = now - previous;
    // Check diff and disconnect if needed.
    if (diff < 50) {
      socket.markedForDisconnect = true;
      setTimeout(function () {
        socket.disconnect(true);
      }, 1000);
      return false;
    }
  }
  socket.lastAccess = now;
  return true;
}
You can use process.hrtime() instead of Date.now().
If you have a callback on your server somewhere which normally sends back the response to your client, you could try changing it like this:
before:
var respond = function (res, callback) {
  res.send(data);
};
after:
var respond = function (res, callback) {
  setTimeout(function () {
    res.send(data);
  }, 500); // or whatever delay you want.
};
Looks like you should slow down your clients. If one client can send too fast for your server to keep up, this is not going to go very well with 100s of clients.
One way to do this would be have the client wait for the reply for each emit before emitting anything else. This way the server can control how fast the client can send by only answering when ready for example, or only answer after a set time.
If this is not enough, when a client exceeded x requests per second, start replying with something like QUOTA_EXCEEDED error, and ignore the data they send in. This will force external developers to make their app behave as you want them to do.
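The "x requests per second" check could be sketched as a small per-client counter over a time window. The class QuotaLimiter and its allow method are hypothetical names, not part of socket.io; the server would call allow for each incoming event and reply with the QUOTA_EXCEEDED error when it returns false.

```javascript
// Hypothetical per-client limiter: at most `limit` events per `windowMs`.
class QuotaLimiter {
  constructor(limit, windowMs) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.hits = new Map(); // clientId -> timestamps of recent accepted events
  }

  // Returns true if the event is allowed, false if the quota is exceeded.
  allow(clientId, now = Date.now()) {
    const recent = (this.hits.get(clientId) || []).filter(
      (t) => now - t < this.windowMs
    );
    const allowed = recent.length < this.limit;
    if (allowed) {
      recent.push(now);
    }
    this.hits.set(clientId, recent);
    return allowed;
  }
}

const limiter = new QuotaLimiter(2, 1000);
console.log(limiter.allow('client-1', 0));    // true
console.log(limiter.allow('client-1', 100));  // true
console.log(limiter.allow('client-1', 200));  // false, third event within 1 s
console.log(limiter.allow('client-1', 1100)); // true, the first two have expired
```

Rejected events are not counted against the quota here, so a client that backs off recovers as soon as its earlier events leave the window. Counting rejected events as well would be a stricter design choice.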
As another suggestion, I would propose a solution like this:
It is common for MySQL to receive requests faster than it can apply them.
The server can record incoming requests in a database table, assuming that this write is fast enough to keep up with the incoming rate, and then process the queue at a rate the server can sustain. This buffer lets the server run slowly while still processing all the requests.
But if you want something sequential, the callback for one request should be confirmed before the client can send another. In this case there should be a server-ready flag. If the client sends a request while the flag is still red, the server can reply with a message telling the client to slow down.
Simply wrap your client emitter in a function like the one below:
let emit_live_users = throttle(function () {
  socket.emit("event", "some_data");
}, 2000);
using a throttle function like this:
function throttle(fn, threshold) {
  threshold = threshold || 250;
  var last, deferTimer;
  return function() {
    var context = this; // keep the caller's `this` for the deferred call
    var now = Date.now(), args = arguments;
    if (last && now < last + threshold) {
      // Called too soon: replace any pending call with this one.
      clearTimeout(deferTimer);
      deferTimer = setTimeout(function() {
        last = Date.now(); // record when the call actually runs
        fn.apply(context, args);
      }, threshold);
    } else {
      last = now;
      fn.apply(context, args);
    }
  };
}
