So I'm learning Node.js and making a random player-vs-player card game. I will be using sockets and sessions.
Theoretically I'll have lots of games running at the same time.
What's the best practice here: do I spawn a child process of my Node server for every game, or simply keep the data of ongoing games and 'boards' in the main server and just grab a given game every time there is an action?
Presuming there's no intensive CPU processing going on, you should do it all in a single process. Node is very good at doing a high volume of very small operations very quickly, all on a single thread, and it sounds like that fits your application.
If you were going to be doing CPU-intensive calculations you might want to hand off to worker processes, but you still wouldn't want a separate process per game.
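As an illustration of the single-process approach, a minimal sketch (all the names here are hypothetical, not from any library) might just keep ongoing games in a map keyed by game id:

    // in-memory store of ongoing games, keyed by game id
    const games = new Map();

    function createGame(id, playerA, playerB) {
      games.set(id, { players: [playerA, playerB], board: [], turn: playerA });
    }

    function handleAction(id, player, action) {
      const game = games.get(id);
      if (!game || game.turn !== player) return; // ignore unknown games and out-of-turn moves
      // ...apply the action to game.board here...
      game.turn = game.players.find(p => p !== player); // pass the turn to the other player
    }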
More than likely, you'll want to load balance however many servers you need. If you're using socket.io, then you should look into socket.io-redis (briefly covered here) to see how you can have connections running across multiple servers still talk to each other. With an approach like this, you'll need to consider the fact that you can't keep all state in one process's memory; instead you will need to persist it to Redis or a similarly accessible store that all the servers can reach.
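Wiring up the adapter is short; a minimal sketch, assuming a Redis instance on localhost (check the socket.io-redis README for the options your version expects):

    const io = require('socket.io')(3000);
    const redisAdapter = require('socket.io-redis');

    // events broadcast on one server are relayed to sockets connected to the others
    io.adapter(redisAdapter({ host: 'localhost', port: 6379 }));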
Related
I can't seem to get this concept right in my head. If I have a website that gets 1 million concurrent users, without any databases at all, will I need to scale? I'm using Node.js and Socket.IO. Also, is there a way I could simulate something like this on my localhost?
Having one million users, or connections, on Socket.IO doesn't by itself mean you have to scale, but depending on what they are doing, you probably would. Having a database adds storage but has nothing more to do with the need to scale the Node.js server.
You can create a test that opens as many connections as you want in a loop and then emits an event on each of them.
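For example, a rough sketch with socket.io-client (the event name and target URL are placeholders; note that your OS file-descriptor limit will cap how many sockets one machine can open):

    const io = require('socket.io-client');

    // open many connections and emit an event on each one
    for (let i = 0; i < 10000; i++) {
      const socket = io('http://localhost:3000', { forceNew: true }); // forceNew: a fresh connection per socket
      socket.on('connect', () => socket.emit('ping', { client: i }));
    }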
For scaling node you can use a cluster. A single instance of Node.js runs in a single thread. To take advantage of multi-core systems, the user will sometimes want to launch a cluster of Node.js processes to handle the load. https://nodejs.org/api/cluster.html#cluster_cluster
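The basic pattern from the docs looks like this:

    const cluster = require('cluster');
    const http = require('http');
    const numCPUs = require('os').cpus().length;

    if (cluster.isMaster) {
      // fork one worker per core; the workers share the same listening port
      for (let i = 0; i < numCPUs; i++) cluster.fork();
    } else {
      http.createServer((req, res) => res.end('hello')).listen(3000);
    }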
To simulate high load, there are open source tools you can use for free: http://www.opensourcetesting.org/category/performance/
I have a simple Node.js web server running. It:
Accepts requests
Spawns separate thread to perform background processing
Background thread returns results
App responds to client
Using Apache Bench ("ab -r -n 100 -c 10"), I performed 100 requests, 10 at a time.
Average response time of 5.6 seconds.
My logic for using Node.js is that it is typically quite resource-efficient, especially when the bulk of the work is being done by another process. It seems like the most lightweight web server option for this scenario.
The Problem
With 10 concurrent requests my CPU was maxed out, which is no surprise since there is CPU-intensive work going on in the background.
Scaling horizontally is an easy thing to do, although I want to make the most of each server for obvious reasons.
So with Node.js, either raw or with some framework, how can one keep that work under control so the CPU isn't overwhelmed?
Potential Approach?
Could I accept the request, store it in a database or some other persistent storage, and have a separate process that uses an async library to process x jobs at a time?
In your potential approach, you're basically describing a queue. You can store incoming messages (jobs) there and have each process take one job at a time, only fetching the next one when processing of the previous job has finished. You could spawn a number of processes working in parallel, such as one per core in your system. Spawning more won't help performance, because multiple processes sharing a core will just run more slowly. Keeping one core free might be preferable, to keep the system responsive for administrative tasks.
Many different queues exist. A node-based one using redis for persistence that seems to be well supported is Kue (I have no personal experience using it). I found a tutorial for building an implementation with Kue here. Depending on the software your environment is running in though, another choice might make more sense.
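To give a feel for the shape of it, a Kue producer/worker pair looks roughly like this (a sketch, not tested; the job name 'crunch' and the payload are placeholders):

    const kue = require('kue');
    const queue = kue.createQueue(); // connects to a local Redis by default

    // producer: the web process stores the incoming request as a job
    queue.create('crunch', { input: 'some payload' }).save();

    // worker: each worker process pulls one 'crunch' job at a time
    queue.process('crunch', (job, done) => {
      // ...do the CPU-intensive work on job.data.input here...
      done();
    });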
Good luck and have fun!
I am in the process of beginning to write a worker queue for node using node's cluster API and mongoose.
I noticed that a lot of libs already exist that do this, but they use Redis and forking. Is there a good reason to fork versus using the cluster API?
Edit: and now I have also found this: https://github.com/xk/node-threads-a-gogo -- too many options!
I would rather not add redis to the mix since I already use mongo. Also, my requirements are very loose, I would like persistence but could go without it for the first version.
Part two of the question:
What are the most stable/used nodejs worker queue libs out there today?
Wanted to follow up on this. My solution ended up being a roll-your-own cluster implementation where some of my cluster workers are dedicated job workers (i.e. they just have code to work on jobs).
I use agenda for job scheduling.
Cron-type jobs are scheduled by the cluster master. The rest of the jobs are created in the non-worker cluster processes as they are needed (verification emails, etc.).
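For reference, basic agenda usage is roughly as follows (a sketch; the job names and Mongo URL are placeholders):

    const Agenda = require('agenda');
    const agenda = new Agenda({ db: { address: 'mongodb://localhost/agenda-jobs' } });

    // define what each job does
    agenda.define('nightly cleanup', (job, done) => { /* ... */ done(); });
    agenda.define('send verification email', (job, done) => {
      // ...send the email using job.attrs.data...
      done();
    });

    agenda.on('ready', () => {
      agenda.every('0 3 * * *', 'nightly cleanup');            // cron-type, scheduled by the master
      agenda.now('send verification email', { userId: 123 });  // ad-hoc, created as needed
      agenda.start();
    });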
Before that I was using Kue but dropped it, because the rest of my app uses MongoDB and I didn't like having to run Redis just for job scheduling.
Have you tried https://github.com/rvagg/node-worker-farm?
It is very lightweight and doesn't require a separate server.
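The API is tiny; roughly (two files shown in one sketch):

    // main.js
    const workerFarm = require('worker-farm');
    const workers = workerFarm(require.resolve('./child'));

    // each call is dispatched to a pooled child process
    workers('#1', (err, result) => {
      console.log(result);
      workerFarm.end(workers); // shut the pool down when finished
    });

    // child.js
    module.exports = (input, callback) => {
      // ...do the heavy work here...
      callback(null, input + ' done');
    };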
I personally am partial to cluster-master.
https://github.com/isaacs/cluster-master
The reason I like cluster-master is that it does very little besides add the logic for forking your process, give you the ability to manage the number of processes you're running, and throw in a little bit of logging/recovery to boot! I find that overly bloated process-management libraries tend to be unstable, and sometimes even slow things down.
This library will be good for you if the following are true:
Your module is largely asynchronous
You don't have a huge amount of different types of events triggering
The events that fire have small amounts of work to do, but you have lots of similar events firing (things like web servers)
The reason for the above list is also the reason why threads-a-gogo may be good for you, for the opposite reasons. If you have a few spots in your code where there is a lot of work to do within your event loop, something like threads-a-gogo, which launches a "thread" specifically for that work, is awesome, because you aren't determining ahead of time how many workers to spawn, but rather spawning them to do work when needed. Note: this can also be bad if a lot of them have the potential to spawn; if you start launching too many processes, things can actually bog down. But I digress.
To summarize: if your module is largely asynchronous already, what you really want is a worker pool, to minimize the downtime when your process is not listening for events and to maximize the amount of processor you can use. Unless you have a very busy synchronous call, a single Node event loop will have trouble taking advantage of even a single core of a processor. Under this circumstance, you are best off with cluster-master. What I recommend is doing a little benchmarking to see how much of a single core your program can use under the "worst case scenario". Let's say this is 33% of one core. If you have a quad-core machine, you then tell cluster-master to launch 12 workers (4 cores ÷ ⅓ of a core per worker).
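Based on its README, usage is essentially a one-liner plus options ('worker.js' stands in for your own server entry point):

    const clusterMaster = require('cluster-master');

    // 12 workers ≈ 4 cores ÷ 1/3 core per worker, from the benchmark above
    clusterMaster({
      exec: 'worker.js', // the script each worker runs
      size: 12
    });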
Hope this helped!
The following two diagrams show my understanding of how threads work in an event-driven web server (like Node.js + JavaScript) compared to a non-event-driven web server (like IIS + C#).
From the diagrams it is easy to tell that on a traditional web server the number of threads used to perform 3 long-running operations is larger than on an event-driven web server (3 vs. 1).
I think I got the "traditional web server" count correct (3), but I wonder about the event-driven one (1). Here are my questions:
Is it correct to assume that only one thread was used in the event-driven scenario? That can't be right; something must have been created to handle the I/O tasks, right?
How did the evented server handle the I/O? Let's say the I/O was reading from a database. I suspect the web server had to create a thread to hand off the job of connecting to the database, right?
If the event-driven web server indeed created threads to handle the I/O, where is the gain?
A possible explanation for my confusion could be that in both scenarios, traditional and event-driven, three separate threads were indeed created to handle the I/O (not shown in the pictures), but the difference is really in the number of threads on the web server per se, not in the I/O threads. Is that accurate?
Node may use threads for I/O. The JS code runs in a single thread, but all the I/O requests run in parallel threads. If you want some JS code to run in parallel threads, use threads-a-gogo or one of the other packages out there that provide that behaviour.
Same as 1: threads are created by Node for I/O operations.
You don't have to handle threading, unless you want to. Easier to develop. At least that's my point of view.
A node application can be coded to run like another web server. Typically, JS code runs in a single thread, but there are ways to make it behave differently.
Personally, I recommend threads-a-gogo (the package name isn't that revealing, but it is easy to use) if you want to experiment with threads. It's faster.
Node also supports multiple processes; you may run a completely separate process if you also want to try that out.
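For example, with the built-in child_process module ('worker.js' is a placeholder for your own script):

    const { fork } = require('child_process');

    // start a completely separate Node process and talk to it via messages
    const worker = fork('./worker.js');
    worker.on('message', result => console.log('worker replied:', result));
    worker.send({ task: 'heavy-computation' });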
The best way to picture NodeJS is like a furious squirrel (i.e. your thread) running in a wheel with an infinite number of pigeons (your I/O) available to pass messages around.
I/O in node is "free". Your squirrel works to set up the connection and send the pigeon off, then can go on to do other things while the pigeon retrieves the data, only dealing with the data when the pigeon returns.
If you write bad code, you can end up having the squirrel waiting for each pigeon.
So always write non-blocking I/O code.
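For instance, with the core fs module:

    const fs = require('fs');

    // blocking: the squirrel stands still until the pigeon returns
    const data = fs.readFileSync('/tmp/data.txt', 'utf8');
    console.log(data);

    // non-blocking: the squirrel sends the pigeon off and keeps running the wheel
    fs.readFile('/tmp/data.txt', 'utf8', (err, contents) => {
      if (err) throw err;
      console.log(contents);
    });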
If you can encourage your Pigeons to promise to come back ;)
Promises and generators are probably the best approach you can take to this.
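A promisified version of the same kind of read, as a sketch:

    const fs = require('fs');

    // wrap the callback API in a promise: the pigeon now "promises" to come back
    const readFile = path =>
      new Promise((resolve, reject) =>
        fs.readFile(path, 'utf8', (err, data) => (err ? reject(err) : resolve(data)))
      );

    readFile('/tmp/data.txt')
      .then(contents => console.log(contents))
      .catch(err => console.error(err));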
HOWEVER, you can always use Node's cluster module to establish a master squirrel that will procreate child squirrels, based on the number of CPUs the master squirrel can find, to dole out the work.
Hope this helps and note the complete lack of a car analogy.
I am trying to learn Node.js, and here are some of the points that I understand:
Node.js doesn't create a separate process for each request; instead, it is just one process that handles all requests.
It is asynchronous, which means you can attach a callback to a long-lasting operation and continue with the rest of your work without waiting for it to finish.
What I really don't understand is the author's point in Understanding node.js: "Everything runs in parallel except your code". I have understood the analogy and the code that explains it, but I still don't get the distinction between "everything" and "your code". I have heard this often about node.js.
Also, people praise node.js for its efficiency, since the memory overhead for one concurrent connection may be as low as 8 KB, but what about CPU load? Does node.js make it way less compared to PHP + Apache?
Node.js uses a single thread any time it is running the JavaScript in your application. Tasks that are asynchronous (network, filesystem, etc.) are all handled on separate threads automatically for you. This means that you get much of the usefulness of a multithreaded application without having to worry about all of the trouble that comes with locking resources and what not.
Node is not a tool for every job. It is ideal for applications that are IO bound. For example, if your application required a ton of work to process templates and what not, Node probably isn't for you. If instead you're just shuffling data around, Node can be very effective.
The reason Node is often quoted as being faster than servers like Apache is that it doesn't create a thread, with all of the resources that go with it, to handle each request. In Apache, most of the time, the thread handling a request is waiting on network or filesystem data; while it does this, it is wasting resources. With Node, only one thread processes those requests (in your application). Again, this is great for some things, but if you have a lot of processing to do, Node would not be effective, as it can really only handle a single request at a time in those situations.
This video does a pretty good job of explaining: http://www.youtube.com/watch?v=F6k8lTrAE2g&feature=youtube_gdata
Everything runs in parallel except your code.
It means if you do
while(true){}
anywhere in your code, the entire node application will stop. While the code you write executes, nothing else does. Requests will not be handled, responses won't be returned, nothing. You have to be extremely careful not to hog the CPU in node.
but what about CPU load?
That completely depends on the nature of your application and the load. If your app is busy, it'll use more CPU.
Imagine a busy intersection with a traffic cop in the middle. When the cop is doing his job properly, hundreds of cars can pass through the intersection in a very fast and efficient way.
If the cop starts receiving and answering SMS messages on his cell while doing traffic, then things might go out of hand really quickly.
The traffic cop is your node.js app, and the time he spends doing SMS is what the author refers to as "your code".
In other words: node.js performance will shine the more you use it as a traffic cop. The more you start using it to do things other than pulling and pushing data (e.g. sorting a list of numbers, rendering an HTML template, etc.), the more your capacity to accept and process new connections quickly will suffer.
"Everything" refers to everything else besides your code. For example, the stuff that handles HTTP. Another way to say the same thing is "your code doesn't wait for node.js to do stuff, like send data over TCP, because that's done asynchronously."
To answer your second question: I don't know which has less CPU load; I'm guessing they're similar. Node.js' touted advantage is that the CPU is better utilized, due to the aforementioned asynchronicity.