Concurrent users without database - node.js

I can't seem to get this concept right in my head. If I have a website that gets 1 million concurrent users, without any databases at all, will I need to scale? I'm using Node.js and Socket.IO. Also, is there a way I could simulate something like this on my localhost?

Having one million users, or connections, on Socket.IO doesn't automatically mean you have to scale, but depending on what they are doing, you probably will. A database adds storage, but it has nothing to do with whether the Node.js server itself needs to scale.
You can write a test that opens as many connections as you want in a loop and then emits an event on each of them.

To scale Node.js on a single machine, you can use the cluster module. A single instance of Node.js runs in a single thread. To take advantage of multi-core systems, the user will sometimes want to launch a cluster of Node.js processes to handle the load. https://nodejs.org/api/cluster.html#cluster_cluster
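The cluster pattern can be sketched as follows: the primary process forks workers, and every worker listens on the same port while the cluster module spreads connections across them. Assumptions: Node 16+ (older versions call cluster.isPrimary cluster.isMaster), the worker count is capped at 4, and, purely so the sketch terminates, the primary sends eight demo requests and then shuts the workers down. A real server would skip the demo part and keep the workers running.

```javascript
// Sketch of the cluster module: one worker process per core, all sharing one port.
// Assumption: the demo requests and the shutdown at the end exist only so the sketch exits.
const cluster = require("cluster");
const http = require("http");
const os = require("os");

const WORKERS = Math.max(1, Math.min(os.cpus().length, 4));
let demo = null; // primary only: resolves to the PIDs that answered

function get(port) {
  return new Promise((resolve, reject) => {
    http.get({ host: "127.0.0.1", port, agent: false }, (res) => {
      let body = "";
      res.setEncoding("utf8");
      res.on("data", (chunk) => (body += chunk));
      res.on("end", () => resolve(body));
    }).on("error", reject);
  });
}

if (cluster.isPrimary) {
  demo = new Promise((resolve, reject) => {
    let listening = 0;
    for (let i = 0; i < WORKERS; i++) {
      const worker = cluster.fork();
      worker.on("message", async (msg) => {
        if (msg.type !== "listening" || ++listening < WORKERS) return;
        try {
          // All workers share one port; record which worker PID answered each request.
          const pids = [];
          for (let n = 0; n < 8; n++) pids.push(Number(await get(msg.port)));
          cluster.disconnect(() => resolve(pids)); // closes worker servers, workers exit
        } catch (err) {
          reject(err);
        }
      });
    }
  });
  demo.then((pids) => console.log("answered by worker PIDs:", [...new Set(pids)]));
} else {
  // Worker: an ordinary HTTP server; cluster shares the listening socket between workers.
  const server = http.createServer((req, res) => res.end(String(process.pid)));
  server.listen(0, "127.0.0.1", () => {
    process.send({ type: "listening", port: server.address().port });
  });
}
```

With listen(0), the cluster documentation says every worker receives the same "random" port, which is why the primary can use the port reported by any worker.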
To simulate high load, there are open source tools you can use for free: http://www.opensourcetesting.org/category/performance/

Related

NodeJS Monitoring Website (Worker Threads?/Multi Process?)

I am working on a small application that will monitor some servers.
It will be based on telnet port checks, ping, and libraries that connect directly to databases (MSSQL, Oracle, MySQL) to check their status.
I wonder what the most effective solution for this would be. Currently, with around 30 servers, it works quite smoothly: around 2.5 s to check the status of all of them (running async). However, I am worried that it might get worse in the future with more servers. Hence I'm thinking about using some alternative, maybe Worker Threads, or some multiprocessing? Any ideas? Everything happens in an internal network, so I do not expect huge latency.
Thank you in advance.
Have you tried PM2's cluster mode?
https://pm2.keymetrics.io/docs/usage/cluster-mode/
The telnet stuff is TCP, which Node.js handles very well using OS-level networking events. The connections to databases can vary. In the case of Oracle, you'll likely be using node-oracledb. Those are SQL*Net connections that rely on the OCI libraries and Node.js' thread pool. The thread pool defaults to four threads, but you can grow it (via the UV_THREADPOOL_SIZE environment variable) up to 128 per Node.js process (newer libuv versions allow up to 1024). See this doc for info:
https://oracle.github.io/node-oracledb/doc/api.html#-143-connections-threads-and-parallelism
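To illustrate the point that TCP checks are cheap in Node.js, here is a sketch of an event-driven "telnet-style" port check. The helper names checkPort and checkAll, and the 2-second default timeout, are hypothetical. All checks run concurrently on the single event loop, so adding servers mostly adds sockets rather than threads.

```javascript
// Sketch of an event-driven port check using Node's core `net` module.
// Hypothetical helpers: checkPort() and checkAll().
const net = require("net");

// Resolves { host, port, up, ms } instead of rejecting, so one dead server can't fail the batch.
function checkPort(host, port, timeoutMs = 2000) {
  return new Promise((resolve) => {
    const started = Date.now();
    const socket = net.connect({ host, port });
    const finish = (up) => {
      socket.destroy();
      resolve({ host, port, up, ms: Date.now() - started });
    };
    socket.setTimeout(timeoutMs, () => finish(false)); // also covers the connect phase
    socket.once("connect", () => finish(true));
    socket.once("error", () => finish(false)); // e.g. ECONNREFUSED, EHOSTUNREACH
  });
}

// All checks run concurrently on the event loop; no extra threads are involved.
function checkAll(targets, timeoutMs) {
  return Promise.all(targets.map(({ host, port }) => checkPort(host, port, timeoutMs)));
}
```

For example, checkAll([{ host: "db1.internal", port: 1521 }]) with a hypothetical host name would resolve to one result object per server.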
Having said all that, other than increasing the size of the thread pool, I wouldn't recommend you make any changes. Why fight fires before they're burning? No need to over-engineer things. You're getting acceptable performance given the current number of servers you have.
How many servers do you plan to add in, say, 5 years? What's the difference in timing if you run the status checks for half of the servers vs all of them? Perhaps you could use that kind of data to make an educated guess as to where things would go.
As you add new ones, keep track of the total time to check the status. Is it slipping? If so, look into where the time is being spent and write the solution that will help.
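One way to "keep track of the total time" and see where it is spent is to time every check as well as the whole batch. The sketch below assumes each check is an async function (port check, ping, DB query, and so on) keyed by a hypothetical server name; timeChecks is a hypothetical helper name.

```javascript
// Sketch: time each status check and the whole batch, slowest first.
// Assumption: `checks` maps a (hypothetical) server name to an async check function.
async function timeChecks(checks) {
  const batchStart = process.hrtime.bigint();
  const results = await Promise.all(
    Object.entries(checks).map(async ([name, check]) => {
      const start = process.hrtime.bigint();
      let ok = true;
      try {
        await check();
      } catch {
        ok = false; // a failed check still gets timed
      }
      const ms = Number(process.hrtime.bigint() - start) / 1e6;
      return { name, ok, ms };
    })
  );
  const totalMs = Number(process.hrtime.bigint() - batchStart) / 1e6;
  results.sort((a, b) => b.ms - a.ms); // slowest first: that's where to look
  return { totalMs, results };
}
```

Logging totalMs after every run shows whether it is slipping as servers are added; the first entries of results point at the servers responsible.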

Is there a way to share memory among workers/threads/something in Node.JS?

I have a Node app which accesses a static, large (>100M), complex, in-memory data structure, accepts queries, and then serves out little slices of that data to the client over HTTP.
Most queries can be answered in tenths of a second. Hurray for Node!
But, for certain queries, searching this data structure takes a few seconds. This sucks because everyone else has to wait.
To serve more clients efficiently, I would like to use some sort of parallelism.
But, because this data structure is so large, I would like to share it among the workers or threads or what have you, so I don't burn hundreds of megabytes. This would be perfectly safe, because the data structure is not going to be written to. A typical 'fork()' in any other language would do it.
However, as far as I can tell, all the standard ways of doing parallelism in Node explicitly make this impossible. For safety, they don't want you to share anything.
But is there a way?
Background:
It is impractical to put this data structure in a database, or use memcached, or anything like that.
WebWorker API libraries and similar only allow short serialized messages to be passed in and out of the workers.
Node's Cluster uses a call named 'fork', but it is not really a fork of the existing process, it is spawning a new one. So once again, no shared memory.
Probably the really correct answer would be to use filesystem-like access to shared memory, aka tmpfs, or mmap. There are some node libraries that make mount() and mmap() available for exactly something like this. Unfortunately then one has to implement complex data structure access on top of synchronous seeks and reads. My application uses arrays of arrays of dicts and so on. It would be nice to not have to reimplement all that.
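For comparison, later Node.js releases added the worker_threads module, which is a different technique from the ones discussed in the answers here: threads can share a SharedArrayBuffer without copying it. The catch is the one the question anticipates: only raw binary data can be shared, so arrays of arrays of dicts would have to be flattened into typed arrays first. The sketch below assumes such a flattened Int32Array; countInParallel is a hypothetical name.

```javascript
// Sketch: share one SharedArrayBuffer between worker threads without copying it.
// Assumption: the data has been flattened into an Int32Array; object graphs
// (arrays of dicts) cannot live in shared memory directly.
const { Worker } = require("worker_threads");

// Worker body, run with `eval: true`: count occurrences of `target` in its slice.
const workerSource = `
  const { parentPort, workerData } = require("worker_threads");
  const { shared, lo, hi, target } = workerData;
  const data = new Int32Array(shared); // a view on the SAME memory as the main thread
  let count = 0;
  for (let i = lo; i < hi; i++) {
    if (data[i] === target) count++;
  }
  parentPort.postMessage(count);
`;

function countInParallel(shared, target, threads) {
  const length = new Int32Array(shared).length;
  const chunk = Math.ceil(length / threads);
  const jobs = [];
  for (let t = 0; t < threads; t++) {
    const lo = t * chunk;
    const hi = Math.min(length, lo + chunk);
    jobs.push(
      new Promise((resolve, reject) => {
        // A SharedArrayBuffer in workerData is shared, not cloned; an ArrayBuffer would be copied.
        const worker = new Worker(workerSource, { eval: true, workerData: { shared, lo, hi, target } });
        worker.once("message", resolve);
        worker.once("error", reject);
      })
    );
  }
  return Promise.all(jobs).then((counts) => counts.reduce((sum, n) => sum + n, 0));
}

// Usage: build ~4 MB of data once in the main thread, then search it from 4 threads.
const shared = new SharedArrayBuffer(1000000 * Int32Array.BYTES_PER_ELEMENT);
const view = new Int32Array(shared);
for (let i = 0; i < view.length; i++) view[i] = i % 10;

countInParallel(shared, 3, 4).then((n) => console.log(`found ${n} matches`)); // found 100000 matches
```

Because the data is read-only, as in the question, no Atomics or locking is needed; concurrent writers would need them.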
I tried to write a C/C++ binding for shared-memory access from Node.js: https://github.com/supipd/node-shm
It's still a work in progress (but working for me) and may be useful; if you find a bug or have a suggestion, let me know.
Building with waf is the old style (Node 0.6 and below); the new build uses gyp.
You should look at Node cluster (http://nodejs.org/api/cluster.html). It's not clear this is going to help you without more details, but it runs multiple Node processes on the same machine using fork.
Actually, Node does support spawning processes. I'm not sure how close Node's fork is to a real fork, but you can try it:
http://nodejs.org/api/child_process.html#child_process_child_process_fork_modulepath_args_options
By the way, it is not true that Node is unsuited for that. It is as suited as any other language/web server. You can always start multiple instances of your server on different ports and put a proxy in front.
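The "multiple instances on different ports plus a proxy" setup can be sketched with nothing but Node's core http module. Assumptions: backends are given as hypothetical { host, port } objects and picked round-robin; in production this job would more often go to a dedicated proxy such as nginx or HAProxy. The demo starts two tiny backends that answer with their own name.

```javascript
// Sketch: a round-robin reverse proxy in front of several server instances.
// Assumption: backends are { host, port } objects; error handling is minimal.
const http = require("http");

function createRoundRobinProxy(backends) {
  let next = 0;
  return http.createServer((clientReq, clientRes) => {
    const target = backends[next];
    next = (next + 1) % backends.length; // rotate through the instances

    const upstream = http.request(
      {
        host: target.host,
        port: target.port,
        method: clientReq.method,
        path: clientReq.url,
        headers: clientReq.headers,
        agent: false, // one connection per request keeps the sketch simple
      },
      (upstreamRes) => {
        clientRes.writeHead(upstreamRes.statusCode, upstreamRes.headers);
        upstreamRes.pipe(clientRes);
      }
    );
    upstream.on("error", () => {
      if (clientRes.headersSent) return clientRes.destroy();
      clientRes.writeHead(502);
      clientRes.end("bad gateway");
    });
    clientReq.pipe(upstream); // forward any request body
  });
}

// Helpers for the demo below.
const listen = (server) =>
  new Promise((resolve) => server.listen(0, "127.0.0.1", () => resolve(server.address().port)));

const getText = (port) =>
  new Promise((resolve, reject) => {
    http.get({ host: "127.0.0.1", port, agent: false }, (res) => {
      let body = "";
      res.setEncoding("utf8");
      res.on("data", (c) => (body += c));
      res.on("end", () => resolve(body));
    }).on("error", reject);
  });

async function demo() {
  // Two hypothetical app instances, each answering with its own name.
  const a = http.createServer((req, res) => res.end("a"));
  const b = http.createServer((req, res) => res.end("b"));
  const backends = [
    { host: "127.0.0.1", port: await listen(a) },
    { host: "127.0.0.1", port: await listen(b) },
  ];
  const proxy = createRoundRobinProxy(backends);
  const proxyPort = await listen(proxy);

  const answers = [];
  for (let i = 0; i < 4; i++) answers.push(await getText(proxyPort));

  await Promise.all([a, b, proxy].map((s) => new Promise((r) => s.close(r))));
  return answers;
}

demo().then((answers) => console.log(answers)); // [ 'a', 'b', 'a', 'b' ]
```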
If you need more memory, add more memory. :) It is as simple as that. Also, you should think about putting all of that data in a dedicated in-memory database like Redis or Memcached (or even Couchbase if you need complex queries). Then you won't have to worry about duplicating that data any more.
Most web applications spend the majority of their life waiting for network buffers and database reads. Node.js is designed to excel at this I/O-bound work. If your work is truly CPU-bound, you might be better served by another platform.
With that out of the way...
Break expensive CPU work into chunks and schedule each chunk with setImmediate so that it doesn't block your thread. (process.nextTick is not suitable for this: its callbacks run before the event loop continues, so recursive nextTick calls can still starve I/O.) This makes sure one client making expensive requests doesn't negatively impact all the others.
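A sketch of that chunking approach, assuming the expensive work is a scan over an in-memory array with a synchronous predicate; searchInChunks and its default chunk size of 1000 are hypothetical. Between chunks, setImmediate hands control back to the event loop, so other requests (and the timer in the usage example) get served while the scan is still running.

```javascript
// Sketch: split a long CPU-bound scan into chunks and yield between them.
// Assumptions: `items` is an in-memory array, `predicate` is synchronous,
// and chunkSize is a tuning knob.
function searchInChunks(items, predicate, chunkSize = 1000) {
  return new Promise((resolve, reject) => {
    const matches = [];
    let index = 0;

    function processChunk() {
      try {
        const end = Math.min(index + chunkSize, items.length);
        for (; index < end; index++) {
          if (predicate(items[index])) matches.push(items[index]);
        }
      } catch (err) {
        return reject(err);
      }
      // setImmediate lets pending I/O run before the next chunk; recursive nextTick would not.
      if (index < items.length) setImmediate(processChunk);
      else resolve(matches);
    }

    setImmediate(processChunk);
  });
}

// Usage: a timer keeps firing while the scan runs instead of waiting for it to finish.
const items = Array.from({ length: 200000 }, (_, i) => i);
let timerFired = 0;
const timer = setInterval(() => timerFired++, 1);
searchInChunks(items, (n) => n % 7 === 0, 500).then((matches) => {
  clearInterval(timer);
  console.log(`${matches.length} matches; timer fired ${timerFired} times meanwhile`);
});
```

Chunking keeps the server responsive but does not make the search itself any faster; for that, the work has to move to other cores.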
Use Node.js cluster to add a worker process for each CPU in the system. Worker processes can all bind to a single HTTP port and use Memcached or Redis to share memory state. Workers also have a messaging API that can be used to keep an in-process memory cache synchronized; however, it has some consistency limitations.
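The messaging approach can be sketched like this: each worker keeps a private cache, a worker announces a change to the primary with process.send(), and the primary relays it to every other worker with worker.send(). Assumptions: the "cache-set"/"applied" message shapes, the three workers, and the key/value are hypothetical, and the demo stops after one update. It also shows the consistency limitation: each worker applies the update at its own moment, so for a short window the caches disagree.

```javascript
// Sketch: cluster messaging used to keep per-worker in-memory caches in sync.
// Assumptions: message shapes and the demo key/value are hypothetical.
const cluster = require("cluster");

const WORKERS = 3;
let demo = null; // primary only: resolves to what the other workers applied

if (cluster.isPrimary) {
  demo = new Promise((resolve) => {
    const applied = [];
    for (let i = 0; i < WORKERS; i++) {
      const worker = cluster.fork();
      worker.on("message", (msg) => {
        if (msg.type === "cache-set") {
          // Relay the update to every worker except the one that sent it.
          for (const other of Object.values(cluster.workers)) {
            if (other !== worker) other.send(msg);
          }
        } else if (msg.type === "applied") {
          applied.push({ workerId: worker.id, key: msg.key, value: msg.value });
          if (applied.length === WORKERS - 1) cluster.disconnect(() => resolve(applied));
        }
      });
    }
  });
  demo.then((applied) => console.log(applied));
} else {
  const cache = new Map(); // this worker's private in-process cache

  process.on("message", (msg) => {
    if (msg.type === "cache-set") {
      cache.set(msg.key, msg.value);
      process.send({ type: "applied", key: msg.key, value: cache.get(msg.key) });
    }
  });

  // The first worker changes a value locally and announces it.
  if (cluster.worker.id === 1) {
    cache.set("greeting", "hello");
    process.send({ type: "cache-set", key: "greeting", value: "hello" });
  }
}
```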

How do I set up routing to multiple instances of a node.js server on one url?

I have a simple Node.js server app that I'm hoping to test soon. It's single-threaded and works fine without any child processes whatsoever. My problem is that the server box has multiple cores, and the simplest way I can think of to use them is to run multiple instances of the server app. However, this would require them all to be on the same domain name, so some sort of request routing is required. I don't have much experience with servers in general and don't know whether this is a task for Node.js or for some other program (less complicated or more complicated). If there is a Node.js mechanism to solve this, for example if one running instance can pass incoming requests to the next instance, then how would I detect when this needs to happen? Conversely, if I use some other program, how will it detect when it needs to start talking to a new instance?
Node.js includes built-in support for managing a cluster of instances of your application to take advantage of multiple cores via the cluster module.

NodeJS + SocketIO: Scaling and preventing single point of failure

The first app people usually build with Socket.IO and Node is a chat app. This chat app basically has one Node server that broadcasts to multiple clients. In the Node code, you would have something like:
// Broadcast a chat message to every connected client except its sender.
// Assumes `clients` is an iterable (e.g. a Set) of connection objects with a send() method.
function broadcast(clients, messageSender, message) {
  for (const client of clients) {
    if (client !== messageSender) {
      client.send(message);
    }
  }
}
This is great for a low number of users, but I see a problem with it. First, there is a single point of failure: the Node server. Second, the app will slow down as the number of clients grows. What can we do when we reach this bottleneck? Is there an architecture (horizontal/vertical scaling) that can be used to alleviate this problem?
For that "one day" when your chat app needs multiple, fault-tolerant node servers, and you want to use socket.io to cross communicate between the server and the client, there is a node.js module that fits the bill.
https://github.com/hookio/hook.io
It's basically an event emitting framework to cross communicate between multiple "things" -- such as multiple node servers.
It's relatively complicated to use, compared to most modules, which is understandable since this is a complex problem to solve.
That being said, you'd probably have a few thousand simultaneous users, and lots of other problems, before this one starts to matter.
Another thing you can do is design your application so that it handles lost connections (which happen all the time anyway: the server goes down, the client has network issues, e.g. a mobile user, etc.) and recovers from them gracefully.
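One common recovery strategy is to reconnect with exponential backoff, so that clients don't all hammer a restarting server at once. The sketch below assumes plain TCP from Node's net module as the transport (the Socket.IO client already has reconnection built in); backoffDelay, connectWithRetry and their default delays and limits are hypothetical.

```javascript
// Sketch: reconnect with exponential backoff after a failed connection attempt.
// Assumptions: plain TCP stands in for the real transport; delays/limits are hypothetical.
const net = require("net");

// 100 ms, 200 ms, 400 ms, ... capped at maxMs.
function backoffDelay(attempt, baseMs = 100, maxMs = 5000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

function connectWithRetry(port, host = "127.0.0.1", { maxAttempts = 8, baseMs = 100 } = {}) {
  return new Promise((resolve, reject) => {
    let attempt = 0;
    const tryOnce = () => {
      const socket = net.connect(port, host);
      const onError = (err) => {
        socket.destroy();
        attempt += 1;
        if (attempt >= maxAttempts) return reject(err);
        setTimeout(tryOnce, backoffDelay(attempt - 1, baseMs));
      };
      socket.once("error", onError);
      socket.once("connect", () => {
        // Hand over the socket; the caller attaches its own 'error'/'close' handlers.
        socket.removeListener("error", onError);
        resolve({ socket, attempts: attempt + 1 });
      });
    };
    tryOnce();
  });
}
```

In a real client, the socket's 'close' handler would call connectWithRetry() again, so a dropped connection is re-established the same way as a failed first attempt.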
Since Node.js has a single event-loop thread, this single point of failure is written into its DNA. Even reloading a server after code changes requires this thread to be stopped.
There are, however, a lot of tools available to handle such failures gracefully. You could use forever, a simple CLI tool for ensuring that a given script runs continuously. Other options include distribute and up. Distribute is load-balancing middleware for Node. Up builds on top of Distribute to offer zero-downtime reloads using either a JavaScript API or a command-line interface.
On further reading, I find you just need to use the Redis store with Socket.IO to maintain connection references between two or more processes/servers. These options have already been discussed extensively here and here.
There's also the option of using socket.io-clusterhub if you don't intend to use the Redis store.

Scaling Node.JS across multiple cores / servers

OK, so I have an idea I want to pursue, but before I do, I need to fully understand a few things.
Firstly, the way I think I'm going to go ahead with this system is to have three servers, described below:
The first server will be my web front end. This is the server that will listen for connections and respond to clients; it will have 8 cores and 16 GB RAM.
The second server will be the database server. Pretty self-explanatory really: connect to the host and set/get data.
The third server will be my storage server, where downloadable files are stored.
My first question is:
On my front end server, I have 8 cores, what's the best way to scale node so that the load is distributed across the cores?
My second question is:
Is there a system out there I can drop into my application framework that will allow me to talk to the other cores and pass messages around to save I/O.
and final question:
Is there any system I can use to move content from my storage server to the request on the front-end server with as little overhead as possible? Speed is a concern here, as we would have 500+ clients downloading and uploading concurrently at peak times.
I have finally convinced my employer that Node.js is extremely fast and the latest in programming technology, and that we should invest in a platform for our intranet system, but he has requested detailed documentation on how this could be scaled across the hardware we currently have available.
On my front end server, I have 8 cores, what's the best way to scale node so that the load is distributed across the cores?
Take a look at the Node.js cluster module, which is a multi-core server manager.
Firstly, I wouldn't describe the setup you propose as 'scaling'; it's more like 'spreading'. You only have one app server serving the requests. If you add more app servers in the future, then you will have a scaling problem.
I understand that Node.js is single-threaded, which implies that it can only use a single core. How to scale it (or whether you can) is not my area of expertise, so I'll leave that part to someone else.
I would suggest NFS-mounting a directory from the storage server on the app server. NFS has relatively low overhead. Then you can access the files as if they were local.
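Once the files look local, the front end can stream them to clients rather than reading whole files into memory, which keeps memory use flat with 500+ concurrent downloads. A sketch, assuming a hypothetical mount point such as /mnt/storage and a hypothetical createFileServer helper; stream.pipeline moves the data in chunks and destroys both streams if either side fails.

```javascript
// Sketch: serve files from a (hypothetically NFS-mounted) directory by streaming them.
// Assumption: the mount point is passed in, e.g. createFileServer("/mnt/storage").
const fs = require("fs");
const http = require("http");
const path = require("path");
const { pipeline } = require("stream");

function createFileServer(rootDir) {
  const root = path.resolve(rootDir);
  return http.createServer((req, res) => {
    let filePath;
    try {
      const { pathname } = new URL(req.url, "http://localhost");
      filePath = path.resolve(root, "." + decodeURIComponent(pathname));
    } catch {
      res.writeHead(400); // malformed URL or percent-encoding
      return res.end();
    }
    // Refuse paths that escape the storage root (e.g. an encoded "..%2f").
    if (!filePath.startsWith(root + path.sep)) {
      res.writeHead(403);
      return res.end();
    }
    fs.stat(filePath, (err, stats) => {
      if (err || !stats.isFile()) {
        res.writeHead(404);
        return res.end();
      }
      res.writeHead(200, { "Content-Type": "application/octet-stream", "Content-Length": stats.size });
      // Streams the file in chunks; memory use doesn't grow with file size.
      pipeline(fs.createReadStream(filePath), res, (pipeErr) => {
        if (pipeErr) console.error("download aborted:", pipeErr.message);
      });
    });
  });
}
```

Usage would be along the lines of createFileServer("/mnt/storage").listen(8080), with the path and port being placeholders.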
Concerning your first question: use cluster (we already use it in a production system; it works like a charm).
When it comes to worker messaging, I can't really help you out, but your best bet is cluster too. Maybe there will be some functionality that provides "inter-core" messaging across all cluster workers in the future (I don't know the roadmap of cluster, but it seems like an idea).
For your third requirement, I'd use a low-overhead protocol like NFS or (if you can go really crazy when it comes to infrastructure) a high-speed SAN backend.
One more piece of advice: use MongoDB as your database backend. You can start with low-end hardware and scale up your database with ease using MongoDB's sharding and replica set features (if that is some kind of requirement).

Resources