Node.js, (Hi)Redis and the multi command - node.js

I'm playing around with node.js and redis and installed the hiredis library via this command
npm install hiredis redis
I looked at the multi examples here:
https://github.com/mranney/node_redis/blob/master/examples/multi2.js
At line 17 it says
// you can re-run the same transaction if you like
which implies that the internal multi.queue object is never cleared once the commands have finished executing.
My question is: How would you handle the situation in an http environment? For example, tracking the last connected user (this doesn't really need multi as it just executes one command but it's easy to follow)
var http = require('http');
var redis = require('redis');
var client = redis.createClient();
var multi = client.multi();

http.createServer(function (request, response) {
  multi.set('lastconnected', request.ip); // won't work, just an example
  multi.exec(function (err, replies) {
    console.log(replies);
  });
});
In this case, multi.exec would execute 1 transaction for the first connected user, and 100 transactions for the 100th user (because the internal multi.queue object is never cleared).
Option 1: Should I create the multi object inside the http.createServer callback function, which would effectively kill it at the end of the function's execution? How expensive in terms of CPU cycles would creating and destroying this object be?
Option 2: The other option would be to create a new version of multi.exec(), something like multi.execAndClear(), which would clear the queue the moment redis has executed that batch of commands.
Which option would you take? I suppose option 1 is better - we're killing one object instead of cherry picking parts of it - I just want to be sure as I'm brand new to both node and javascript.

The multi objects in node_redis are very inexpensive to create. As a side-effect, I thought it would be fun to let you re-use them, but this is obviously only useful under some circumstances. Go ahead and create a new multi object every time you need a new transaction.
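For example, here is a minimal sketch of option 1 (creating the multi inside the request handler). Note that request.ip does not exist on a plain http request, so this sketch uses request.connection.remoteAddress, and the port number is arbitrary:

var http = require('http');
var redis = require('redis');
var client = redis.createClient();

http.createServer(function (request, response) {
  // A fresh multi per request: its queue starts empty and is garbage-collected
  // once the request is done.
  var multi = client.multi();
  multi.set('lastconnected', request.connection.remoteAddress);
  multi.exec(function (err, replies) {
    console.log(replies);
    response.end();
  });
}).listen(8080);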
One thing to keep in mind is that you should only use multi if you actually need all of the operations to execute atomically in the Redis server. If you just want to batch up a series of commands efficiently to save network bandwidth and reduce the number of callbacks you have to manage, just send the individual commands, one after the other. node_redis will automatically "pipeline" these requests to the server in order, and the individual command callbacks, if any, will be invoked in order.
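And if you don't need the atomicity, a sketch of the same thing with plain commands, relying only on that automatic pipelining (again, remoteAddress and the port are just placeholders):

var http = require('http');
var redis = require('redis');
var client = redis.createClient();

http.createServer(function (request, response) {
  // A plain command on the shared client is queued and pipelined in order;
  // no transaction object is created or torn down per request.
  client.set('lastconnected', request.connection.remoteAddress, function (err, reply) {
    response.end('stored: ' + reply);
  });
}).listen(8080);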

Related

Is NodeJS suitable for websites - (ie. Port Blocking)?

I have gone through many painful months with this issue and I am now ready to let this go to the bin of "what-a-great-lie-for-websites-nodejs-is"!!!
All NodeJS tutorials discuss how to create a website. When done, it works. For one person at a time though. All requests sent to the port will be blocked by the first-come-first-served situation. Why? Because most requests sent to the nodejs server have to get parsed, data requested from the database, data calculated and parsed, the response prepared and sent back to the ajax call. (This is a mere simple website example.)
Same applies for authentication - a request is made, data is parsed, authentication takes place, session is created and sent back to the requester.
No matter how you sugar coat this - All requests are done this way. Yes you can employ async functionality which will shorten the time spent on some portions, yes you can try promises, yes you can employ clustering, yes you can employ forking/spawning, etc... The result is always the same at all times: port gets blocked.
I tried responding with a reference so that we can use sockets to pass the data back and matching it with the reference - that also blocked it.
The point of the matter is this: when you ask for help, everyone wants all sorts of code examples, but nobody ever gets to the task of helping with an actual answer that works. The whole wide world!!!!! Which leads me to the answer that nodeJS is not suitable for websites.
I read many requests regarding this and all have been met with: "you must code properly"..! Really? Is there no NodeJS skilled and experienced person who can lend an answer on this one????!!!!!
Most of the NodeJS coders come from the PHP side - All websites using PHP never have to utilise any workaround whatsoever in order to display a web page and it never blocks 2 people at the same time. Thanks to the web server.
So how come the NodeJS community cannot come to some sort of an answer on this one other than: "code carefully"???!!!
They want examples - each example is met with: "well that is blocking code, do it another way", or "code carefully"!!! Come on, people!!!
Think of the following scenario: One user, one page with 4 lists of records. Theoretically all lists should be filled up with records independently. Because of how data is requested, prepared and responded to, each list in reality is waiting for the others to finish. That is on one session alone.
What about 2 people, 2 sessions at the same time?
So my question is this: is NodeJS suitable for a website and if it is, can anyone show and prove this with a short code??? If you can't prove it, then the answer is: "NodeJS is not suitable for websites!"
Here is an example based on the simplest tutorial and it is still blocking:
var express = require('express'),
    fs = require('fs');

var app = express();

app.get('/heavyload', function (req, res) {
  var file = '/media/sudoworx/www/sudo-sails/test.zip';
  res.send('Heavy Load');
  res.end();
  fs.readFile(file, 'utf8', function (err, fileContent) {
  });
});

app.get('/lightload', function (req, res) {
  var file = '/media/sudoworx/www/sudo-sails/test.zip';
  res.send('Light Load');
  res.end();
});

app.listen(1337, function () {
  console.log('Listening!');
});
Now, if you go to "/heavyload" it will immediately respond, because that is the first thing sent to the browser, and then nodejs proceeds to read a heavy (large) file. If you now go to the second call "/lightload" at the same time, you will see that it waits for the file read from the first call to finish before it proceeds with the browser output. This is the simplest example of how nodejs simply fails at handling what would otherwise be simple in php and similar scripting languages.
Like mentioned before, I tried as many as 20 different ways to do this in my career as a nodejs programmer. I totally love nodejs, but I cannot get past this obstacle... This is not a complaint - it is a call for help, because I am at my end road with nodejs and I don't know what to do.
I thank you kindly.
So here is what I found out. I will answer it with an example of blocking code:
for (var k = 0; k < 15000; k++) {
  console.log('Something Output');
}
res.status(200).json({test: 'Heavy Load'});
This will block because it has to run the for loop for a long time, and only after it finishes will it send the output.
Now if you do the same code like this it won't block:
function doStuff() {
  for (var k = 0; k < 15000; k++) {
    console.log('Something Output');
  }
}

doStuff();
res.status(200).json({test: 'Heavy Load'});
Why? Because the functions are run asynchronously...! So how will I then send the resulting response to the requesting client? Currently I am doing it as follows:
1. Run the doStuff function.
2. Send a unique call reference which is then received by the ajax call on the client side.
3. Put the callback function of the client side into a waiting object.
4. Listen on a socket.
5. When the doStuff function is completed, it should issue a socket message with the resulting response together with the unique reference.
6. When the socket on the client side gets the message with the unique reference and the resulting response, it will then match it with the waiting callback function and run it.
Done! A bit of a workaround (as mentioned before), but it's working! It does require a socket to be listening. And that is my solution to this port-blocking situation in NodeJS.
Is there some other way? I am hoping someone answers with another way, because I am still feeling like this is some workaround. Eish! ;)
is NodeJS suitable for a website
Definitely yes.
can anyone show and prove this with a short code
No.
All websites using PHP never have to utilise any workaround whatsoever in order to display a web page and it never blocks 2 people at the same time.
Node.js doesn't require any workarounds either; it simply works without blocking.
To more broadly respond to your question/complaint:
A single node.js machine can easily serve as a web-server for a website and handle multiple sessions (millions actually) without any need for workarounds.
If you're coming from a PHP web server, then instead of trying to migrate existing code to a new Node website, first play with a simple online example of a Node.js + Express website. If that works well, you can start adding code that requires long-running operations, like reading from DBs or reading/writing files, and verify that visitors aren't being blocked (they shouldn't be).
See this tutorial on how to get started.
UPDATE FOR EXAMPLE CODE
To fix the supplied example code, you should convert your fs.readFile call to fs.createReadStream. readFile is less recommended for large files; I don't think readFile literally blocked anything, but the need to allocate and move large amounts of bytes may choke the server. createReadStream uses chunks instead, which is much easier on the CPU and RAM:
var rstream = fs.createReadStream(file);
var length = 0;

rstream.on('data', function (chunk) {
  length += chunk.length;
  // Do something with the chunk
});

rstream.on('end', function () { // done
  console.log('file read! length = ' + length);
});
After switching your code to createReadStream I'm able to serve continuous calls to heavyload / lightload in ~3ms each.
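As a further sketch (not something covered above): if the goal were to send the file itself to the client, the read stream could be piped straight to the response, reusing the app and fs from the question's example; the '/download' route is just a placeholder:

app.get('/download', function (req, res) {
  var file = '/media/sudoworx/www/sudo-sails/test.zip';
  res.setHeader('Content-Type', 'application/zip');
  // pipe() handles back-pressure: chunks are only read as fast as the
  // client can receive them, so memory stays flat regardless of file size.
  fs.createReadStream(file).pipe(res);
});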
THE BETTER ANSWER I SHOULD HAVE WRITTEN
Node.js has a single-process architecture, but it also has multi-process capabilities: using the cluster module, you can write master/worker code that distributes the load across multiple workers in multiple processes.
You can also use pm2 to do the clustering for you; it has a built-in load balancer to distribute the work, and also allows for scaling up/down without downtime, see this.
In your case, while one process is reading/writing a large file, other processes can accept incoming requests and handle them.
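A minimal sketch of that cluster idea, reusing the port from the earlier example (the route and log messages are just placeholders):

var cluster = require('cluster');
var os = require('os');

if (cluster.isMaster) {
  // Fork one worker per CPU core; the master only manages workers.
  os.cpus().forEach(function () {
    cluster.fork();
  });
  cluster.on('exit', function (worker) {
    console.log('worker ' + worker.process.pid + ' died, forking a new one');
    cluster.fork();
  });
} else {
  // Each worker runs its own server; incoming connections are distributed
  // among them, so a slow request in one worker doesn't stall the others.
  var express = require('express');
  var app = express();
  app.get('/', function (req, res) {
    res.send('Handled by worker ' + process.pid);
  });
  app.listen(1337);
}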

firebase-queue: Multiple workers not running correctly?

I'm pretty new to Node.js, though I've been writing javascript for years. I'm more than open to any node advice for best-practices that I'm not following, or other rethinks. That said:
I'm building a system in which a user creates a reservation, and simultaneously submits a task for my firebase-queue to pick up. This queue has multiple specs associated with it. In turn, it's supposed to:
1. Check availability, and in response confirm/throw an alert on the reservation and update the firebase data accordingly.
2. Update the user's reservations, which is an index of reservation object keys, removing any redundant ones.
3. Use node-schedule to create dated functions to send notifications about the pending expiration of their reservation.
However, when I run my script, only one of the firebase-queues that I instantiate runs. I can look in the dashboard and see that the progress is at 100, the _state is the new finished_state (which is the next spec's start_state), but that next queue won't pick up the task and process it.
If I quit my script and rerun it, that next queue will work fine. And then the queue after that won't work, until I repeat the act of quitting and rerunning the script. I can continue this until the entire task sequence completes, so I don't think the specs or the code being executed itself are blocking. I don't see any error states spring up, anyway.
From the documentation it looks like I should be able to write the script this way, with multiple calls to 'new Queue(queueRef, options, function(data, progress, resolve, reject)...' and they'll just run each task as I set them in their options (all of which are basically:
var options = {
  'specId': 'process_reservation',
  'numWorkers': 5,
  'sanitize': true,
  'suppressStack': false
};
but, nope. I guess I can spawn child-processes for each of the queue instances, but I'm not sure if that's an extreme reaction to the issues I'm seeing, and I'm not sure if it would complicate the node structure in terms of shared module exports. Also, I'm not sure if it'll start eating into my concurrent connection count.
Thanks!

Concurrent processing via scala singleton object

I'm trying to build a simple orchestration engine in a functional test like the following:
object Engine {
  def orchestrate(apiSequence: Seq[Any]) {
    val execUnitList = getExecutionUnits(apiSequence) // build a specific list
    schedule(execUnitList) // call multiple APIs
  }
}
In the methods called underneath (getExecutionUnits, and schedule), the pattern I've applied is one where I incrementally build a list (hence, not a val but a var), iterate over the list, call specific APIs and run some custom validation on each one.
I'm aware that an object in scala is sort of equivalent to a singleton (so there's only one instance of Engine, in my case). I'm wondering if this is an appropriate pattern if I'm expecting 100's of invocations of the orchestrate method concurrently. I'm not managing any other internal variables within the Engine object and I'm simply acting on the provided arguments in the method. Assuming that the schedule method can take up to 10 seconds, I'm worried about the behavior when it comes to concurrent access. If client1, client2 and client3 call this method at the same time, will 2 of the clients get queued up and be blocked by the client currently being processed?
Is there a safer idiomatic way to handle the use-case? Do you recommend using actors to wrap up the "orchestrate" method to handle concurrent requests?
Edit: To clarify, it is absolutely essential that the 2 methods (getExecutionUnits and schedule) are called in sequence. Moreover, the schedule method in turn calls multiple APIs (anywhere between 1 and 10) and it is important that they too get executed in sequence. As of right now I have a simple for loop that tackles 1 API at a time, waits for the response, then moves on to the next one if appropriate.
I'm not managing any other internal variables within the Engine object and I'm simply acting on the provided arguments in the method.
If you are using any vars in Engine at all, this won't work. However, from your description it seems like you don't: you have a local var in the getExecutionUnits method and (possibly) a local var in schedule which is initialized with the return value of getExecutionUnits. This case should be fine.
If client1, client2 and client3 call this method at the same time, will 2 of the clients get queued up and be blocked my the current client being processed?
No, if you don't add any synchronization (and if Engine itself has no state, you shouldn't).
Do you recommend using actors to wrap up the "orchestrate" method to handle concurrent requests?
If you wrap it in one actor, then the clients will be blocked waiting while the engine is handling one request.

How to pipeline in node.js to redis?

I have lots of data to insert (SET / INCR) into a Redis DB, so I'm looking for pipelining / mass insertion through node.js.
I couldn't find any good example/ API for doing so in node.js, so any help would be great!
Yes, I must agree that there is a lack of examples for that, but I managed to create a stream over which I sent several insert commands in a batch.
You should install the redis-stream module:
npm install redis-stream
And this is how you use the stream:
var redis = require('redis-stream'),
    client = new redis(6379, '127.0.0.1');

// Open stream
var stream = client.stream();

// Example of setting 10000 records
for (var record = 0; record < 10000; record++) {
  // Command is an array of arguments:
  var command = ['set', 'key' + record, 'value'];
  // Send command to stream, but parse it before
  stream.redis.write(redis.parse(command));
}

// Create event when stream is closed
stream.on('close', function () {
  console.log('Completed!');
  // Here you can create stream for reading results or similar
});

// Close the stream after batch insert
stream.end();
Also, you can create as many streams as you want and open and close them at any time.
There are several examples of using redis-stream in node.js in the redis-stream node module's repository.
In node_redis, all commands are pipelined:
https://github.com/mranney/node_redis/issues/539#issuecomment-32203325
You might want to look at batch() too. The reason why it'd be slower with multi() is because it's transactional. If something failed, nothing would be executed. That may be what you want, but you do have a choice for speed here.
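A quick sketch of the difference (batch() is available in newer node_redis releases, so check your client version; the key names are just placeholders):

var redis = require('redis');
var client = redis.createClient();

// batch() pipelines the commands but does NOT wrap them in MULTI/EXEC,
// so there is no transactional guarantee - and no transactional overhead.
var batch = client.batch();
for (var i = 0; i < 10000; i++) {
  batch.set('key:' + i, 'value:' + i);
}
batch.exec(function (err, replies) {
  console.log(replies.length + ' commands executed');
  client.quit();
});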
The redis-stream package doesn't seem to make use of Redis' mass insert functionality, so it's also slower than the mass insert approach Redis' site goes on to talk about with redis-cli.
Another idea would be to use redis-cli and give it a file to stream from, which this NPM package does: https://github.com/almeida/redis-mass
Not keen on writing to a file on disk first? This repo: https://github.com/eugeneiiim/node-redis-pipe/blob/master/example.js
...also streams to Redis, but without writing to file. It streams to a spawned process and flushes the buffer every so often.
On Redis' site under mass insert (http://redis.io/topics/mass-insert) you can see a little Ruby example. The repo above basically ported that to Node.js and then streamed it directly to that redis-cli process that was spawned.
So in Node.js, we have:
var redisPipe = spawn('redis-cli', ['--pipe']);
spawn() returns a reference to a child process that you can pipe to with stdin. For example: redisPipe.stdin.write().
You can just keep writing to a buffer, streaming that to the child process, and then clearing it every so often. This won't fill it up and will therefore be a bit better on memory than perhaps the node_redis package (which literally says in its docs that data is held in memory), though I haven't looked into it that deeply, so I don't know what the memory footprint ends up being. It could be doing the same thing.
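Here is a rough sketch of that buffer-and-flush idea (the helper name and the flush threshold are my own choices; the protocol builder mirrors the Ruby example from the mass insert page):

var spawn = require('child_process').spawn;

// Build one command in the Redis (RESP) protocol format.
function toRedisProtocol(args) {
  var proto = '*' + args.length + '\r\n';
  args.forEach(function (arg) {
    arg = String(arg);
    proto += '$' + Buffer.byteLength(arg) + '\r\n' + arg + '\r\n';
  });
  return proto;
}

var redisPipe = spawn('redis-cli', ['--pipe']);
redisPipe.stdout.pipe(process.stdout); // redis-cli prints a summary when done

var buffer = '';
for (var i = 0; i < 100000; i++) {
  buffer += toRedisProtocol(['SET', 'key:' + i, 'value:' + i]);
  if (buffer.length > 16384) { // flush every so often
    redisPipe.stdin.write(buffer);
    buffer = '';
  }
}
redisPipe.stdin.write(buffer);
redisPipe.stdin.end();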
Of course keep in mind that if something goes wrong, it all fails. That's what tools like fluentd were created for (and that's yet another option: http://www.fluentd.org/plugins/all - it has several Redis plugins)...But again, it means you're backing data on disk somewhere to some degree. I've personally used Embulk to do this too (which required a file on disk), but it did not support mass inserts, so it was slow. It took nearly 2 hours for 30,000 records.
One benefit to a streaming approach (not backed by disk) is if you're doing a huge insert from another data source. Assuming that data source returns a lot of data and your server doesn't have the hard disk space to support all of it - you can stream it instead. Again, you risk failures.
I find myself in this position as I'm building a Docker image that will run on a server with not enough disk space to accommodate large data sets. Of course it's a lot easier if you can fit everything on the server's hard disk...But if you can't, streaming to redis-cli may be your only option.
If you are really pushing a lot of data around on a regular basis, I would probably recommend fluentd to be honest. It comes with many great features for ensuring your data makes it to where it's going and if something fails, it can resume.
One problem with all of these Node.js approaches is that if something fails, you either lose it all or have to insert it all over again.
By default, node_redis, the Node.js library, sends commands in pipelines and automatically chooses how many commands will go into each pipeline (https://github.com/NodeRedis/node-redis/issues/539#issuecomment-32203325). Therefore, you don't need to worry about this. However, other Redis clients may not use pipelines by default; you will need to check the client documentation to see how to take advantage of pipelines.

Node.js server GET to separate API failing after a few hours of use

In my node site I call a restful API service I have built using a standard http get. After a few hours of this communication successfully working I find that the request stops being sent, it just waits and eventually times out.
The API that is being called is still receiving requests from elsewhere perfectly well but when a request is sent from the site it does not reach the API.
I have tried with stream.pipe, util.pump and just writing the file to the file system.
I am using Node 0.6.15. My site and the service that is being called are on the same server, so calls to localhost are being made. Memory usage is about 25% overall, with cpu averaging about 10% usage.
After a while of the problem occurring I started using the request module, but I get the same behaviour. The number of calls it makes before failing seems to vary between 5 and 100. In the end I have to restart the site, but not the api, to make it work again.
Here is roughly what the code in the site looks like:
var Request = require('request');

downloadPDF: function (req, res) {
  Project.findById(req.params.Project_id, function (err, project) {
    project.findDoc(req.params.doc_id, function (err, doc) {
      var pdfileName = doc.name + ".pdf";
      res.contentType(pdfileName);
      res.header('Content-Disposition', "filename=" + pdfileName);
      Request("http://localhost:3001/" + project._id).pipe(res);
    });
  });
}
I am lost as to what could be happening.
Did you try increasing agent.maxSockets, or disabling the http.Agent functionality? By default, recent node versions use socket pooling for HTTP client connections; this may be the source of the problem:
http://nodejs.org/api/http.html#http_class_http_agent
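For reference, a sketch of both suggestions against the 0.x-era API the question mentions (the path here is a placeholder for the project URL in your code):

var http = require('http');

// Option A: raise the per-host socket pool limit (the old default was 5).
http.globalAgent.maxSockets = 50;

// Option B: bypass the pool entirely for a single request with agent: false.
http.get({ host: 'localhost', port: 3001, path: '/some-project-id', agent: false },
  function (res) {
    res.pipe(process.stdout);
  });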
I'm not sure how busy your Node server is, but it could be that all of your sockets are in TIME_WAIT status.
If you run this command, you should see how many sockets are in this state:
netstat -an | awk '/tcp/ {print $6}' | sort | uniq -c
It's normal to have some, of course. You just don't want to max out your system's available sockets and have them all be in TIME_WAIT.
If this is the case, you would actually want to reduce the agent.maxSockets setting (contrary to #user1372624's suggestion), as otherwise each request would receive a new socket even though it could reuse a recent one. It will simply take longer to reach a non-responsive state.
I found this Gist (a patch to http.Agent) that might help you.
This Server Fault answer might also help: https://serverfault.com/a/212127
Finally, it's also possible that updating Node may help, as they may have addressed the keep-alive behavior since your version (you might check the change log).
You're using a callback to return a value which doesn't make a lot of sense because your Project.findById() returns immediately, without waiting for the provided callback to complete.
Don't feel bad though, the programming model that nodejs uses is somewhat difficult at first to wrap your head around.
In event-driven programming (EDP), we provide callbacks to accomplish results, ignoring their return values since we never know when the callback might actually be called.
Here's a quick example.
Suppose we want to write the result of an HTTP request into a file.
In a procedural (non-EDP) programming environment, we rely on functions that only return values when they have them to return.
So we might write something like (pseudo-code):
url = 'http://www.example.com'
filepath = './example.txt'
content = getContentFromURL(url)
writeToFile(filepath,content)
print "Done!"
which assumes that our program will wait until getContentFromURL() has contacted the remote server, made its request, waited for a result and returned that result to the program.
The writeToFile() function then asks the operating system to open a local file at some filepath for write, waiting until it is told the open file operation has completed (typically waiting for the disk driver to report that it can carry out such an operation.)
writeToFile() then requests that the operating system write the content to the newly opened file, waiting until it is told that the driver the operating system uses to write files has accomplished this goal, and returning the result to the program so it can tell us that the program has completed.
The problem that nodejs was created to solve is to better use all the time wasted by all the waiting that occurs above.
It does this by using functions (callbacks) which are called when operations like retrieving a result from a remote web request or writing a file to the filesystem complete.
To accomplish the same task above in an event-driven programming environment, we need to write the same program as:
getContentFromURL(url, onGetContentFromURLComplete)

function onGetContentFromURLComplete(content, err) {
  writeToFile(filepath, content, onWriteToFileComplete);
}

function onWriteToFileComplete(err) {
  print "Done!";
}
where
calling getContentFromURL() only calls the onGetContentFromURLComplete callback once it has the result of the web request and
calling writeToFile() only calls its callback to display a success message when it completes writing the content successfully.
The real magic of nodejs is that it can do all sorts of other things during the surprisingly large amount of time procedural functions have to wait for most time-intensive operations (like those concerned with input and output) to complete.
(The examples above ignore all errors which is usually considered a bad thing.)
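For completeness, a sketch of that same flow with real Node.js APIs (the host, path and file name are placeholders, and this version does check the errors the pseudo-code ignores):

var http = require('http');
var fs = require('fs');

var filepath = './example.txt';

http.get({ host: 'www.example.com', path: '/' }, function (res) {
  var content = '';
  res.setEncoding('utf8');
  res.on('data', function (chunk) { content += chunk; });
  res.on('end', function () {
    // Only once the whole body has arrived do we ask the OS to write the file.
    fs.writeFile(filepath, content, function (err) {
      if (err) throw err;
      console.log('Done!');
    });
  });
}).on('error', function (err) {
  console.error('Request failed:', err);
});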
I have also experienced intermittent errors using the built-in functions. As a workaround I use the native wget. I do something like the following:
var exec = require('child_process').exec;

function fetchURL(url, callback) {
  var child;
  var command = 'wget -q -O - ' + url;
  child = exec(command, function (error, stdout, stderr) {
    callback(error, stdout, stderr);
  });
}
With a few small adaptations you could make it work for your needs. So far it is rock solid for me.
Did you try logging your parameters while calling this function? The error can depend on req.params.Project_id. You should also provide error handling in your callback functions.
If you can nail down the failing requests to a certain parameter set (make them reproducible), you could debug your application easily with node-inspector.
