node.js azure cache latency

I am working on a Node.js Express application which uses Azure Cache. I have deployed the service to Azure and I notice a latency of 50ms or so for get and put requests.
The methods I am using are:
var time1, time2;
var start = Date.now();
var cacheObject = this.cache;
cacheObject.put('test1', { first: 'Jane', last: 'Doe' }, function (error) {
    if (error) throw error;
    time1 = Date.now() - start;
    start = Date.now();
    cacheObject.get('test1', function (error, data) {
        if (error) throw error;
        console.log('Data from cache:' + data);
        time2 = Date.now() - start;
        res.send({ t1: time1, t2: time2 });
    });
});
time1 represents the time for the put and time2 the time for the get.
From reading other posts on the internet, I understood that the latency should be on the order of a couple of ms, but 50ms seems a bit high. Am I using the methods properly? Are there any special settings I need to set up on the management portal? Or is 50ms latency expected?

A few obvious things to check first:
Is the client code running in the same region as the cache? The minimum possible latency is the network round trip time, which may be around 50ms between regions.
Is Node.js's Date.now() precise enough to measure a small number of milliseconds? I'm not familiar with Node.js, but in .NET you should use the StopWatch class for timing, rather than DateTime.Now.
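In Node.js the rough equivalent of StopWatch is process.hrtime(), which has nanosecond resolution. A minimal sketch (reusing cacheObject from the question) might look like:
// Minimal sketch: timing the put with process.hrtime() instead of
// Date.now(), which only has millisecond granularity.
var start = process.hrtime();
cacheObject.put('test1', { first: 'Jane', last: 'Doe' }, function (error) {
    if (error) throw error;
    var diff = process.hrtime(start); // [seconds, nanoseconds]
    var ms = diff[0] * 1e3 + diff[1] / 1e6;
    console.log('put took ' + ms.toFixed(3) + 'ms');
});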

Related

Prometheus and Node exporter in milliseconds

I have a Node Express app with prom-client to monitor a serial connection and report the values to an HTTP endpoint; the serial speed is 9600 baud and it is transferring some statistics over.
A Prometheus instance is configured with a job at a 10 millisecond scrape interval to target that endpoint and grab the metrics.
I want to be able to see these metrics at a resolution of at least 10 milliseconds, but it seems the Prometheus graph does not accept a resolution of less than 1 second.
What should I do to get Prometheus to collect data with at least 10 millisecond resolution?
Is there a config I missed?
I have searched for hours.
This is my Node.js app: a serial port listener waits for JSON messages, parses them, and updates gauge metric types from 'prom-client' to be served by Express:
// Imports implied by the description: serialport with its Readline
// parser, plus prom-client for the gauge used below.
const SerialPort = require('serialport');
const Readline = require('@serialport/parser-readline');
const client = require('prom-client');

const serialPath = '/dev/tty.usbmodem14201';
const port = new SerialPort(serialPath, {
    baudRate: 9600
});
const parser = new Readline();
port.pipe(parser);

// Gauge definition not shown in the original; name and help are assumptions.
const analogGuage = new client.Gauge({
    name: 'analog_reading',
    help: 'Latest analog pin value parsed from the serial stream',
    labelNames: ['pin']
});

parser.on('data', (line) => {
    try {
        const obj = JSON.parse(line);
        if (obj.command !== undefined) {
            console.log(obj);
        }
        if (obj.a) {
            obj.a.forEach((analog) => {
                analogGuage.set({
                    pin: analog.i
                }, analog.v);
            });
        }
    } catch (ex) {
        console.log('Exception in parsing serial json:', ex);
        console.log('Exception in parsing serial json:', line);
    }
});
The metrics endpoint for Prometheus to call every 10ms:
expressApp.get('/metrics', (req, res) => {
    const metrics = client.register.metrics();
    res.set('Content-Type', client.register.contentType);
    res.end(metrics);
});
It is critical to mention that all this is for an experimental personal embedded system :) so no bottleneck or performance considerations are in place, other than being able to transfer and parse the serial readings in less than 10ms.
Right now Prometheus and the Node exporter app are both running on my PC, so 10ms intervals should be easy for Prometheus.
Please help.
Answer Edit: So I decided to drop Prometheus in favor of InfluxDB, as both licenses allow source access and InfluxDB promotes millisecond and nanosecond monitoring.
For future reference, 9600 baud was not enough either; even after moving to a 115200 baud rate and 150 millisecond reporting loops, Prometheus still did not manage to show less than 1 second.
InfluxDB did it beautifully. Here are some pictures:
Below is a 30 second window of Prometheus at 115200 baud,
and about a 10 second window at the same 115200 baud in InfluxDB.
While you can set scrape intervals of less than a second, this isn't what Prometheus is designed for; that's hard real-time monitoring at that point, and e.g. the kernel scheduler may cause Prometheus to stop running briefly and miss some scrapes, which wouldn't be an issue with more typical scrape intervals.
I'd suggest looking at a custom solution if you need such a high resolution; see the sketch below.
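As a rough illustration only (not from the original thread), a custom collector could push each reading straight into InfluxDB as it arrives, rather than waiting to be scraped. This sketch assumes the 'influx' npm client and a local database named serial_stats:
const Influx = require('influx');

// Hypothetical target; host and database name are assumptions.
const influx = new Influx.InfluxDB({
    host: 'localhost',
    database: 'serial_stats'
});

// Call this from the parser.on('data') handler shown in the question,
// instead of (or alongside) updating the prom-client gauge.
function recordReading(pin, value) {
    influx.writePoints([{
        measurement: 'analog_reading',
        tags: { pin: String(pin) },
        fields: { value: value }
        // timestamp defaults to "now", with nanosecond precision
    }]).catch((err) => console.log('influx write failed:', err));
}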

NodeJS in Azure Functions is much slower than local NodeJS

I'm a beginner at Node.js and Azure.
I'm trying to use the wav-encoder npm module in my program.
wav-encoder
So I wrote code like below:
var WavEncoder = require('wav-encoder');

const whiteNoise1sec = {
    sampleRate: 40000,
    channelData: [
        new Float32Array(40000).map(() => Math.random() - 0.5),
        new Float32Array(40000).map(() => Math.random() - 0.5)
    ]
};

WavEncoder.encode(whiteNoise1sec).then((buffer) => {
    console.log(whiteNoise1sec);
    console.log(buffer);
});
It runs on my local machine in less than 2 seconds, but if I upload similar code to Azure Functions, it takes more than 2 minutes.
Below is the code in my Function. It is triggered by an HTTP REST call.
var WavEncoder = require('wav-encoder');

module.exports = function (context, req) {
    context.log('JavaScript HTTP trigger function processed a request.');
    const whiteNoise1sec = {
        sampleRate: 40000,
        channelData: [
            new Float32Array(40000).map(() => Math.random() - 0.5),
            new Float32Array(40000).map(() => Math.random() - 0.5)
        ]
    };
    WavEncoder.encode(whiteNoise1sec).then((buffer) => {
        context.res = {
            // status: 200, /* Defaults to 200 */
            body: whiteNoise1sec
        };
        context.done();
    });
};
Do you know how I can improve the performance on Azure?
Update
context.res = {
    // status: 200, /* Defaults to 200 */
    body: whiteNoise1sec
};
context.done();
I found that these lines cause the slow performance.
If I give a large array to context.res.body, it takes a long time when I call context.done().
Isn't a large JSON response appropriate for Azure Functions?
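One thing worth trying (an assumption on my part, not something verified in this thread): return the encoded WAV bytes instead of the whiteNoise1sec object, since JSON-serializing two 40,000-element Float32Arrays is expensive. wav-encoder's encode resolves with an ArrayBuffer, and Azure Functions' isRaw flag should skip the default body serialization. A sketch:
WavEncoder.encode(whiteNoise1sec).then((arrayBuffer) => {
    context.res = {
        headers: { 'Content-Type': 'audio/wav' },
        // isRaw bypasses the default JSON serialization of the body
        isRaw: true,
        body: Buffer.from(arrayBuffer)
    };
    context.done();
});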
It's a bit hard to analyze performance issues like this, but there are a few things to consider and a few things to look at here.
Cold functions vs warm functions performance
If the function hasn't been invoked in a while, or ever (I think the window is about 10 or 20 minutes), it goes idle, meaning it gets deprovisioned. The next time you hit that function it needs to be loaded from storage. Due to the architecture and its reliance on a certain type of storage, IO hits for small files are currently bad. There is work in progress to improve that, but a large npm tree can cause more than a minute of loading time just to fetch all the small js files. If the function is warm, however, it should be in the msec range (or, depending on the work your function is doing, see below for more).
Workaround: use this to pack your function https://github.com/Azure/azure-functions-pack
Slower CPU for consumption SKU
In the consumption SKU, you are scaled out to many instances (in the hundreds), but each instance is affinitized to a single core. That is fine for IO-bound operations, regular node functions (since they are single threaded anyway), etc. But if your function tries to utilize the CPU for CPU-bound workloads, it's not going to perform as you expect.
Workaround: you can use dedicated SKUs for CPU-bound workloads.

Amazon SQS with aws-sdk receiveMessage Stall

I'm using the aws-sdk node module with the (as far as I can tell) approved way to poll for messages,
which basically sums up to:
sqs.receiveMessage({
    QueueUrl: queueUrl,
    MaxNumberOfMessages: 10,
    WaitTimeSeconds: 20
}, function (err, data) {
    if (err) {
        logger.fatal('Error on Message Receive');
        logger.fatal(err);
    } else {
        // all good
        if (undefined === data.Messages) {
            logger.info('No Messages Object');
        } else if (data.Messages.length > 0) {
            logger.info('Messages Count: ' + data.Messages.length);
            var delete_batch = [];
            for (var x = 0; x < data.Messages.length; x++) {
                // process
                receiveMessage(data.Messages[x]);
                // flag to delete; deleteMessageBatch expects plain
                // { Id, ReceiptHandle } objects, not arrays with named keys
                delete_batch.push({
                    Id: data.Messages[x].MessageId,
                    ReceiptHandle: data.Messages[x].ReceiptHandle
                });
            }
            if (delete_batch.length > 0) {
                logger.info('Calling Delete');
                sqs.deleteMessageBatch({
                    Entries: delete_batch,
                    QueueUrl: queueUrl
                }, function (err, data) {
                    if (err) {
                        logger.fatal('Failed to delete messages');
                        logger.fatal(err);
                    } else {
                        logger.debug('Deleted received ok');
                    }
                });
            }
        } else {
            logger.info('No Messages Count');
        }
    }
});
receiveMessage is my "do stuff with collected messages if I have enough collected messages" function.
Occasionally, my script stalls because I get no response from Amazon at all; for example, when there are no messages in the queue to consume, instead of hitting the WaitTimeSeconds and sending a "no messages object", the callback is never called.
(I'm chalking this up to Amazon weirdness.)
What I'm asking is: what's the best way to detect and deal with this, given that I have some code in place to stop concurrent calls to receiveMessage?
The suggested answer here: Nodejs sqs queue processor also has code that prevents concurrent message request queries (granted it's only fetching one message at a time).
I do have the whole thing wrapped in:
var running = false;
runMonitorJob = setInterval(function () {
    if (running) {
    } else {
        running = true;
        // call SQS.receive
    }
}, 500);
(with a running = false after the delete loop (not in its callback)).
My solution would be:
watchdogTimeout = setTimeout(function () {
    running = false;
}, 30000);
But surely this would leave a pile of floating sqs.receives lurking about, and thus use more and more memory over time?
(This job runs all the time; I left it running on Friday, it stalled Saturday morning, and it hung until I manually restarted the job this morning.)
Edit: I have seen cases where it hangs for ~5 minutes and then suddenly gets messages, BUT with a wait time of 20 seconds it should throw a "no messages" after 20 seconds. So a watchdog of ~10 minutes might be more practical (depending on the rest of one's business logic).
Edit: Yes, long polling is already configured queue-side.
Edit: This is under (latest) v2.3.9 of aws-sdk and NodeJS v4.4.4.
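A possible refinement of the watchdog idea (my own sketch, not tested against this exact stall): aws-sdk v2's receiveMessage returns an AWS.Request, and calling abort() on it should surface an error in the callback rather than leaving an orphaned receive floating around:
var running = false;

function pollOnce() {
    running = true;
    // Keep a handle on the in-flight request so the watchdog can kill it.
    var request = sqs.receiveMessage({
        QueueUrl: queueUrl,
        MaxNumberOfMessages: 10,
        WaitTimeSeconds: 20
    }, function (err, data) {
        clearTimeout(watchdog);
        running = false;
        // ... handle err/data as in the code above ...
    });
    // Well past WaitTimeSeconds; if the callback still hasn't fired,
    // abort the request rather than just resetting the flag.
    var watchdog = setTimeout(function () {
        request.abort();
    }, 30000);
}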
I've been chasing this (or a similar) issue for a few days now, and here's what I've noticed:
The receiveMessage call does eventually return, although only after 120 seconds.
Concurrent calls to receiveMessage are serialised by the AWS SDK library, so making multiple calls in parallel has no effect.
The receiveMessage callback does not error; in fact, after the 120 seconds have passed, it may contain messages.
What can be done about this? This sort of thing can happen for a number of reasons, and some/many of these things can't necessarily be fixed. The answer is to run multiple services, each calling receiveMessage and processing the messages as they come; SQS supports this. At any time, one of these services may hit this 120 second lag, but the other services should be able to continue on as normal.
My particular problem is that I have some critical singleton services that can't afford 120 seconds of downtime. For this I will look into either 1) using HTTP instead of SQS to push messages into my service, or 2) spawning slave processes around each of the singletons to fetch the messages from SQS and push them into the service.
I also ran into this issue, though not when calling receiveMessage but sendMessage. I also saw hangups of exactly 120 seconds. I also saw it with a few other services, like Firehose.
That led me to this line in the AWS SDK docs:
SQS Constructor
httpOptions:
timeout [Integer] — Sets the socket to timeout after timeout milliseconds of inactivity on the socket. Defaults to two minutes (120000).
To implement a fix, I override the timeout for my SQS client that performs the sendMessage to time out after 10 seconds, and use another client with a 25 second timeout for receiving (where I long poll for 20 seconds), so the socket timeout only fires once the long poll should already have returned:
var sendClient = new AWS.SQS({ httpOptions: { timeout: 10 * 1000 } });
var receiveClient = new AWS.SQS({ httpOptions: { timeout: 25 * 1000 } });
I've had this out in production for a week now, and all of my SQS stalling issues have been eliminated.

Building a chat app: How to get time

I am building a chat app, currently with PubNub. The problem now is, from the app/frontend point of view, how should it get the (server) time? If every message were sent to my own server, I could get the server time there. But with a 3rd-party service like PubNub, how can I manage this, since the app sends messages to PubNub rather than my server? I don't want to rely on local time, as users might have inaccurate clocks.
The simplest solution I thought of is: when the app starts up, get the server time. Record the difference between local time and server time (diff = Date.now() - serverTime). When sending messages, the time will be Date.now() - diff. Is this correct so far?
I guess this solution assumes zero transmission (or latency) time? Is there a more correct or recommended way to implement this?
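For what it's worth, a minimal sketch of that offset idea with the round trip halved to approximate one-way latency; the /time endpoint returning { now: <ms> } is hypothetical and would live on your own server:
let clockOffset = 0; // local minus server, in ms

async function syncClock() {
    const t0 = Date.now();
    const res = await fetch('/time'); // hypothetical endpoint
    const { now: serverTime } = await res.json();
    const latency = (Date.now() - t0) / 2; // assume a symmetric round trip
    clockOffset = Date.now() - (serverTime + latency);
}

function serverNow() {
    return Date.now() - clockOffset;
}
The answer below does essentially the same thing, with pubnub.time() as the time source.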
Your use case is probably the reason why pubnub.time() exists.
In fact, they even have a code example describing your drift calculation.
https://github.com/pubnub/javascript/blob/1fa0b48227625f92de9460338c222152c853abda/examples/time-drift-detla-detection/drift-delta-detection.html
// Drift Functions
function now() { return +new Date; }

function clock_drift(cb) {
    clock_drift.start = now();
    PUBNUB.time(function (timetoken) {
        var latency = (now() - clock_drift.start) / 2
          , server_time = (timetoken / 10000) + latency
          , local_time = now()
          , drift = local_time - server_time;
        cb(drift);
    });
    if (clock_drift.ival) return;
    clock_drift.ival = setInterval(function () { clock_drift(cb) }, 1000);
}

// This is how you use the code
// Periodically Get Latency in Milliseconds
clock_drift(function (latency) {
    var out = PUBNUB.$('latency');
    out.innerHTML = "Clock Drift Delta: " + latency + "ms";
    // Flash Update
    PUBNUB.css(out, { background: latency > 2000 ? '#f32' : '#5b5' });
    setTimeout(function () {
        PUBNUB.css(out, { background: '#444' });
    }, 300);
});

How to detect and measure event loop blocking in node.js?

I'd like to monitor how long each run of the event loop in node.js takes. However, I'm uncertain about the best way to measure this. The best way I could come up with looks like this:
var intervalMs = 500;
var blockDelta = 100; // threshold in ms; pick whatever counts as "blocked"

setInterval(function () {
    var last = Date.now();
    setImmediate(function () {
        var delta = Date.now() - last;
        if (delta > blockDelta) {
            report("node.eventloop_blocked", delta);
        }
    });
}, intervalMs);
I basically infer the event loop run time by looking at the delay of a setInterval. I've seen the same approach in the blocked node module, but it feels inaccurate and heavy. Is there a better way to get at this information?
Update: Changed the code to use setImmediate, as done by hapi.js.
"Is there a better way to get this information?"
I don't have a better way to test the eventloop than checking the time delay of SetImmediate, but you can get better precision using node's high resolution timer instead of Date.now()
var intervalMs = 500;
var blockDelta = 100; // threshold in ms

setInterval(function () {
    var last = process.hrtime(); // replace Date.now()
    setImmediate(function () {
        var delta = process.hrtime(last); // with process.hrtime()
        var deltaMs = delta[0] * 1e3 + delta[1] / 1e6; // tuple -> milliseconds
        if (deltaMs > blockDelta) {
            report("node.eventloop_blocked", deltaMs);
        }
    });
}, intervalMs);
NOTE: process.hrtime() returns a [seconds, nanoseconds] tuple Array, so convert it to a single number before comparing it against a millisecond threshold.
For more details on process.hrtime():
https://nodejs.org/api/all.html#all_process_hrtime
"The primary use is for measuring performance between intervals."
Check out this plugin: https://github.com/tj/node-blocked. I'm using it now and it seems to do what you want.
let blocked = require("blocked");

blocked(ms => {
    console.log("EVENT LOOP Blocked", ms);
});
It will print out how long, in ms, the event loop is blocked for.
Code
This code will measure, in nanoseconds, how long it took the event loop to trigger: it measures the time between the current operation and the next tick.
var time = process.hrtime();
process.nextTick(function () {
    var diff = process.hrtime(time);
    console.log('benchmark took %d nanoseconds', diff[0] * 1e9 + diff[1]);
    // benchmark took 1000000527 nanoseconds
});
EDIT: added explanation:
process.hrtime([time])
Returns the current high-resolution real time in a [seconds, nanoseconds] tuple Array. time is an optional parameter that must be the result of a previous process.hrtime() call (and therefore, a real time in a [seconds, nanoseconds] tuple Array containing a previous time) to diff with the current time. These times are relative to an arbitrary time in the past, and not related to the time of day and therefore not subject to clock drift. The primary use is for measuring performance between intervals.
process.nextTick(callback[, arg][, ...])
Once the current event loop turn runs to completion, call the callback function.
This is not a simple alias to setTimeout(fn, 0), it's much more efficient. It runs before any additional I/O events (including timers) fire in subsequent ticks of the event loop.
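To make the ordering concrete, a tiny demo of my own (not from the original answer):
// process.nextTick always fires before timers and setImmediate;
// at the top level, the timer vs setImmediate order can vary per run.
process.nextTick(() => console.log('nextTick'));
setImmediate(() => console.log('setImmediate'));
setTimeout(() => console.log('timeout 0'), 0);
// 'nextTick' is always printed first.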
You may also want to look at the profiling built into Node and io.js. See, for example, this article: http://www.brendangregg.com/flamegraphs.html
and this related SO answer: How to debug Node.js applications
