Stress testing in Node.js/Socket.IO for 1000 users?

I'm currently working on a chat-like application. There are users, chatrooms, messages, all the usual pieces. The app is powered by Node.js and Socket.IO.
One thing I am interested in doing, though, is stress testing the application. The tests that currently exist are simply some feature tests that use a single client. However, I want to do things like test 1000 people logging into the application at the same time, many people entering the chatroom, everyone sending a message in that room, etc...
How do I go about doing this? Google searches appear to bring up unrelated results.
I've noticed that some have identified this as a duplicate. However, that question involves ASP.NET, and the use of Selenium is out of the question. In addition, I originally intended to get answers involving special actions within the node.js framework. Another developer (no longer part of the team) wrote some stress tests that involve defining a custom user object and iteratively signing those users up, joining a room, etc. Those tests are incomplete and no longer usable, though, since much of the codebase has changed since they were written. In any case, an answer that somehow allows for stress testing the application another way is acceptable.
EDIT: Here's my attempt of stress testing with many users.
function Client() {
  // Define a testing client that responds to Socket.IO events and the like
}
...
var testers = [];
for (var i = 0; i < 1000; i++) testers.push(new Client());
// Each client connects to the server with the option 'force new connection': true
it('Should login many users simultaneously', function (done) {
  // Use async.each on testers
  // Call client.login() on each client, then signal done()
});
I've found that this turns out to be problematic. The test simply doesn't advance after logging in about 250 users. What is the issue here? Could it have to do with the limitations of node.js and socket.io?

The solution that my team and I ended up going with is as follows:
We defined a Client object in its own file (client.js), with various actions and events defined in response to Socket.IO events. In the test.js file, we would instantiate many instances of the Client object, creating many connections and clients.
The limitations we encountered are most likely due to the capabilities of the machine that runs the tests (the machine that hosts the server can run the test with 1000 users, though).
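For illustration, here is a minimal sketch of what such a client.js and test might look like; the event names, login payload, server URL, and batch size are hypothetical and would need to match the real application:
// client.js - a test client wrapping one Socket.IO connection.
var io = require('socket.io-client');

function Client(serverUrl) {
  // 'force new connection' gives every Client its own socket instead of
  // reusing socket.io-client's cached connection to the same URL.
  this.socket = io.connect(serverUrl, { 'force new connection': true });
}

Client.prototype.login = function (name, cb) {
  this.socket.once('login:success', function () { cb(); }); // hypothetical event
  this.socket.once('login:error', cb);                      // hypothetical event
  this.socket.emit('login', { username: name });            // hypothetical event
};

module.exports = Client;

// test.js - log the clients in using a bounded batch size rather than all
// at once; exhausting file descriptors or ephemeral ports on the test
// machine is a common reason a run stalls around a few hundred sockets.
var async = require('async');
var Client = require('./client');

var testers = [];
for (var i = 0; i < 1000; i++) testers.push(new Client('http://localhost:3000'));

it('Should login many users simultaneously', function (done) {
  async.eachLimit(testers, 100, function (client, next) {
    client.login('user' + Math.random(), next);
  }, done);
});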

You can use the Apache JMeter tool, which lets you simulate N concurrent users from a single client machine.

Go with, for example, Apache ab.
Just create an API (for testing purposes only) and use an endpoint of that API to send a lot of messages from apache ab.
ab -n 1000 -c 10 -T 'multipart/form-data; boundary=1234567890' -p post.txt http://yourdomain.com/yourproject/api/sendmessage/
And post.txt would contain something like:
--1234567890
Content-Type: application/x-www-form-urlencoded

from=myId&to=yourId&message=alrighty
<base64 data>
--1234567890--
Where <base64 data> stands for a long base64 string carrying the rest of the data of that POST request.

Related

Is a node.js app that both serves a REST API and handles web sockets a good idea?

Disclaimer: I'm new to node.js so I am sorry if this is a weird question :)
I have a node.js app using express.js to serve a REST API. The data served by the REST API is fetched from a NoSQL database by the node.js app. All clients only use HTTP GET. There is one exception, though: data is PUT and DELETEd by the master database (a relational database on another server).
The idea behind this setup is, of course, to let the 'node.js/NoSQL database' server(s) be a public front end, thereby protecting the master database from heavy traffic.
Potentially a number of different client applications will use the REST API, but mainly it will be used by a client app with a long lifetime (typically 0.5 to 2 hours). Instead of letting this app constantly poll the REST API for possible new data, I want to use websockets so that data is only sent to the client when there is new data. I will use a node.js app for this, probably with socket.io, so that it can fall back to API polling if websockets are not supported by the client. New data should be sent to clients each time the master database PUTs or DELETEs objects in the NoSQL database.
The question is whether I should use one node.js app for both the API and the websockets, or one app for the API and one for the websockets.
Things to consider:
- Performance: The app(s) will be hosted on a cluster of servers with a load balancer and an HTTP accelerator in front. Would one app handling everything perform better than two apps with distinct tasks?
- Traffic between apps: If I choose a two-app solution, the API app that receives PUTs and DELETEs from the master database will have to notify the websocket app every time it receives new data (or the master database will have to notify both apps). Could the doubled traffic be a performance issue?
- Code cleanliness: I believe two apps will result in cleaner and better code, but then again there will surely be some code common to both apps, which leads to maintaining two copies of it.
As to how heavy the load can be, it is very difficult to say, but a possible peak can involve:
- 50,000 clients
- each listening to up to 5 different channels
- new data being sent from the master every 5 seconds
- new data being sent to approximately 25% of the clients (some data should be sent to all clients, other data probably to below 1% of the clients)
UPDATE:
Thanks for the answers, guys. More food for thought here. I have decided to have two node.js apps, one for the REST API and one for web sockets. The reason is that I believe it will be easier to scale them. To begin with, the whole system will be hosted on three physical servers, and one node.js app for the REST API on each server should be sufficient, but the websocket app will probably need several instances on each physical server.
This is a very good question.
If you are looking at a legacy system, and you already have a REST interface defined, there is not a lot of advantages to adding WebSockets. Things that may point you to WebSockets would be:
a demand for server-to-client or client-to-client real-time data
a need to integrate with server-components using a classic bi-directional protocol (e.g. you want to write an FTP or sendmail client in javascript).
If you are starting a new project, I would try to have a hard split in the project between:
the serving of static content (images, js, css) using HTTP (that was what it was designed for) and
the serving of dynamic content (real-time data) using WebSockets (load-balanced, subscription/messaging based, automatic reconnect enabled to handle network blips).
So, why should we try to have a hard separation? Let's consider the advantages of a HTTP-based REST protocol.
The use of the HTTP protocol for REST semantics is an invention that has certain advantages
Stateless Interactions: none of the client's context is to be stored on the server side between the requests.
Cacheable: Clients can cache the responses.
Layered system: intermediaries (proxies, caches, gateways) can be inserted between client and server without the client being able to detect them.
Easy testing: it's easy to use curl to test an HTTP-based protocol
On the other hand...
The use of a messaging protocol (e.g. AMQP, JMS/STOMP) on top of WebSockets does not preclude any of these advantages.
WebSockets can be transparently load-balanced, messages and state can be cached, efficient stateful or stateless interactions can be defined.
A basic reactive analysis style can define which events trigger which messages between the client and the server.
Key additional advantages are:
a WebSocket is intended to be a long-term persistent connection, usable for multiple different messaging purpose over a single connection
a WebSocket connection allows for full bi-directional communication, allowing data to be sent in either direction in sympathy with network characteristics.
one can use connection offloading to share subscriptions to common topics using intermediaries. This means with very few connections to a core message broker, you can serve millions of connected users efficiently at scale.
monitoring and testing can be implemented with an admin interface to send/receive messages (provided with all message brokers).
the cost of all this is that one needs to deal with re-establishment of state when the WebSocket needs to reconnect after being dropped. Many protocol designers build in the notion of a "sync" message to provide context from the server to the client.
Either way, your model object could be the same whether you use REST or WebSockets, but that might mean you are still thinking too much in terms of request-response rather than publish/subscribe.
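To make the "sync" message idea above concrete, here is a minimal client-side Socket.IO sketch; the event names and the localState/applyDelta pieces are hypothetical:
var io = require('socket.io-client'); // in a browser, io is provided as a global
var socket = io.connect('http://localhost:3000');
var localState = { version: 0 };

function applyDelta(state, delta) {
  // Hypothetical merge of one incremental update into the local model.
  Object.keys(delta).forEach(function (k) { state[k] = delta[k]; });
}

socket.on('connect', function () {
  // Fires on the initial connection and again on automatic reconnects:
  // ask the server for everything missed since our last known version.
  socket.emit('sync', { sinceVersion: localState.version }); // hypothetical event
});

socket.on('sync:snapshot', function (snapshot) { // hypothetical event
  localState = snapshot; // replace the local model with the server's view
});

socket.on('update', function (delta) { // hypothetical event
  applyDelta(localState, delta); // one incremental publish/subscribe update
});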
The first thing you must think about is how you're going to scale the servers and manage their state. With a REST API this is largely straightforward, as REST APIs are for the most part stateless, and every load balancer knows how to proxy HTTP requests. Hence, REST APIs can be scaled horizontally, leaving the few bits of state to the persistence layer (database) to deal with.
With websockets it's often a different matter. You need to research which load balancer you're going to use (for a cloud deployment this often depends on the cloud provider), then figure out what kind of websocket support or configuration the load balancer will need. Then, depending on your application, you need to figure out how to manage the state of your websocket connections across the cluster. Think about the different use cases: e.g., if a websocket event on one server alters the state of the data, will you need to propagate this change to a different user on a different connection? If the answer is yes, then you'll probably need something like Redis to manage your ws connections and communicate changes between the servers.
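For instance, one common pattern (sketched here with the socket.io-redis adapter; the event name is hypothetical) is to let Redis fan events out across all server instances:
// Each server instance plugs the Redis adapter into Socket.IO, so a
// broadcast on one instance also reaches sockets connected to the others.
var io = require('socket.io')(3000);
var redisAdapter = require('socket.io-redis');

io.adapter(redisAdapter({ host: 'localhost', port: 6379 }));

io.on('connection', function (socket) {
  socket.on('data:changed', function (change) { // hypothetical event
    // Emitted cluster-wide via Redis pub/sub, not just on this instance.
    io.emit('data:changed', change);
  });
});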
As for performance, at the end of the day it's still just HTTP connections, so I doubt there will be a big difference in separating the server functionality. However, I think two servers would go a long way in improving code cleanliness, as long as you have another 'core' module to isolate code common to both servers.
Personally, I would do them together, because you can share the models and most of the code between the REST and the WS parts.
At the end of the day, what Yuri said in his answer is correct, but it is not that much work to load-balance WS anyway; everyone does it nowadays. The approach I took is to have REST for everything and then create some WS "endpoints" for subscribing to realtime data server-to-client.
From what I understood, your client would just get notifications from the server with updates, so I would definitely go with WS. You subscribe to some events and then you get new results when there are any. Polling with HTTP calls is not the best way.
We had this need and basically built a small framework around this idea: http://devalien.github.io/Axolot/
You can get a feel for our approach from the controller below (this is just an example; in our real-world app we have subscriptions so we can notify clients when we have new data or when we finish a procedure). Under actions are the REST endpoints and under sockets the websocket endpoints.
module.exports = {
  model: 'user', // We are attaching the user to the model, so CRUD operations are there (good for dev purposes)
  path: '/user', // This is the endpoint
  actions: {
    'get /': [
      function (req, res) {
        var query = {};
        Model.user.find(query).then(function (user) { // Find from the user model declared above
          res.send(user);
        }).catch(function (err) {
          res.send(400, err);
        });
      }],
  },
  sockets: {
    getSingle: function (userId, cb) { // Callable from socket.io as "user:getSingle"
      Model.user.findOne(userId).then(function (user) {
        cb(user);
      }).catch(function (err) {
        cb({ error: err });
      });
    }
  }
};

Sending messages between clients with socket.io

I'm working on a chat application and using socket.io / node for that. Basically I came up with the following strategies:
1. Send the message from the client to the socket server, which then sends it to the receiving client. In the background, store the message in the DB so it can be retrieved later if the user wishes to see his old conversations.
The pro of this approach is that the user gets the message almost instantly, since we don't wait for the DB operation to complete. The con is that if the DB operation fails, and exactly at that time the client refreshes its page to fetch the messages, it won't get that message.
2. Send the message from the client to the server; the server stores it in the DB first and only then sends it to the receiving client.
The pro is that we make sure the message is delivered to the client only if it is stored in the DB. The con is that it is nowhere near real time, since we do a DB operation in between, slowing down the message passing.
3. Send the message to the server, which stores it in a cache layer (redis, for example) and instantly broadcasts it to the receiving client. In the background, keep flushing records from redis into the DB. If the client refreshes the page, we first look into the DB and then into the redis layer.
The pro is that we make the communication faster and also make sure messages are presented correctly on demand. The con is that this is quite complex compared to the above implementations, and I'm wondering if there's an easier way to achieve it.
My question is: what's the way to go if you're building a serious chat application that ensures both faster communication and data persistence? What strategies do apps like Facebook, WhatsApp etc. use for this? I'm not looking for an exact example, but a few pointers will help.
Thanks.
I would go for option number 2. I've been building chat apps in node myself, and I found that this is the best option. Saving to a database takes a few milliseconds, which includes the 0.x milliseconds to write to the database plus a few milliseconds of communication latency ( https://blog.serverdensity.com/mongodb-benchmarks/ ).
So I would consider this approach realtime. The good thing is that if it fails, you can display a message to the sender saying it failed, for whatever reason.
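A minimal server-side sketch of option 2 (the saveMessage helper and the event names are hypothetical):
var io = require('socket.io')(3000);

function saveMessage(msg) {
  // Hypothetical persistence helper; imagine an insert into MongoDB/MySQL
  // here that resolves once the write has been acknowledged.
  return Promise.resolve();
}

io.on('connection', function (socket) {
  socket.on('message:send', function (msg) { // hypothetical event
    // Persist first; only emit to the recipient once the write succeeded.
    saveMessage(msg)
      .then(function () {
        io.to(msg.to).emit('message:new', msg); // deliver to the receiving client
      })
      .catch(function (err) {
        // Tell the sender the message was not stored, so the UI can retry.
        socket.emit('message:failed', { message: msg, error: String(err) });
      });
  });
});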
Facebook, WhatsApp and many other big messaging apps are based on XMPP (Jabber), which is a very, very big protocol for instant messaging, and everything about how to do things is very well documented. But it is based on XML, so you still have to parse everything, etc.; luckily there are very good libraries for working with XMPP. So if you want to go the common way you can use XMPP, although most of the big players in this area no longer follow all of its standards, since the standard does not have all the features we are used to today.
I would go with doing my own version; actually, I already have something made (similar to Slack). If you want, I could give you access to it in private.
So to end this: number 2 is the way to go (for me). XMPP is cool but also brings a lot of complexity.

Pass data between multiple NodeJS servers

I am still pretty new to NodeJS and want to know if I am looking at this in the wrong way.
Background:
I am making an app that runs once a week, generates a report, and then emails it out to a list of recipients. My initial reason for using Node was that I have an existing front end built with Angular, and I wanted to reuse code to simplify maintenance. My main idea was to have 4+ individual node apps running in parallel on our server.
The first app would use node-cron in order to run every Sunday. This would check the database for all scheduled tasks and retrieve the stored parameters for the reports it is running.
The next app is a simple queue that would store the scheduled tasks and pass them to the worker tasks.
The actual pdf generation would be somewhat CPU intensive, so this would be a cluster of n apps that would retrieve and run individual reports from the queue.
When done making the pdf, they would pass the file to a final email app that sends it out.
My main concern is communication between the apps. At the moment I am setting up the three lower levels (i.e. all but the scheduler) on separate ports with express, and opening HTTP requests to them when needed. Is there a better way to handle this? Would the basic 'net' module work better than the 'http' package? Is Express even necessary for something like this, or would I be better off running everything as a basic http/net server? So far the only real use I've made of Express is to listen on a path for PUT requests and to parse the incoming JSON. I was led to asking here because in the logs so far I see every so often that the HTTP request is reset; this doesn't appear to affect the data received by the child process, but I'd still like to avoid errors in my code.
I think this kind of decoupling could leverage some sort of stateful priority queue with features like retry on failure, clustering, ...
I've used Kue.js in the past with great success; it's Redis-backed and has nice documentation and interface: http://automattic.github.io/kue/
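For example, a minimal sketch of the scheduler/worker split with Kue (the job types, payload fields, and generatePdf helper are hypothetical):
var kue = require('kue');
var queue = kue.createQueue(); // backed by Redis on localhost by default

function generatePdf(reportId, cb) {
  // Hypothetical CPU-intensive report rendering.
  cb(null, 'pdf-bytes');
}

// Scheduler side: instead of an HTTP call, push a job onto the queue.
queue.create('report', { reportId: 42, recipients: ['a@example.com'] })
  .attempts(3) // retry on failure up to 3 times
  .save();

// Worker side (can run in a separate, clustered process):
queue.process('report', function (job, done) {
  generatePdf(job.data.reportId, function (err, pdf) {
    if (err) return done(err); // failed jobs are retried per .attempts()
    queue.create('email', { to: job.data.recipients, attachment: pdf }).save();
    done();
  });
});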

Node.js - Handling hundreds of user preferences

I'm learning node.js (my web background is mainly PHP) and I'm loving it so far, but I have the following question. In PHP and other similar languages, each request is a single, short-lived execution of the script. All user preferences etc. can be loaded per request, and there's no issue there, as once the script execution has completed, all resources are released.
In node.js, especially in a long-running process like a chatroom (I'm using socket.io), one process will handle hundreds or thousands of users. Assuming, for instance, I have a chatroom with 200 people and I want messages to be highlighted if they come from a participant the user has deemed a "Friend", then I would have to loop through 200 users to see whether each one is a friend or not (especially if chats are to be sent only to friends and not publicly).
Won't this be really slow, especially over time? Is there something I'm missing? In my small tests, as the number of users and the number of messages go up, the responsiveness of the server goes down noticeably.
If you are going to develop a complex chatroom, you have to design the server-side code so that it maintains client information at the server side. For example, you should map each newly connected client socket to variables at the server side; likewise, if you want to introduce a "Friend" feature, you have to maintain that information at the server side too. Then your server doesn't have to scan every client to see whether it is a correct message receiver.
With all of that implemented, in the scenario of sending a message to the public, the server can first find all the "friend" sockets, send the message highlighted as "Friend" to those sockets, and then send the normal text to the others. A private message to a friend is even easier, as we only consider the friend sockets.
So you still need to reuse some of the design patterns you used in PHP; socket.io only maintains the long-lived connections for you, and that is all.
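A minimal sketch of that server-side bookkeeping (the data shapes and event names are hypothetical):
// Keep O(1) friend lookups at the server side instead of scanning
// every user's friend list for each message.
var io = require('socket.io')(3000);
var socketsByUserId = {}; // userId -> socket
var friendsOf = {};       // userId -> { friendUserId: true, ... }, loaded from the DB (hypothetical)

io.on('connection', function (socket) {
  socket.on('login', function (userId) { // hypothetical event
    socket.userId = userId;
    socketsByUserId[userId] = socket;
  });

  socket.on('chat', function (text) { // hypothetical event
    var from = socket.userId;
    Object.keys(socketsByUserId).forEach(function (userId) {
      // Constant-time check: is the sender a friend of this recipient?
      var isFriend = !!(friendsOf[userId] && friendsOf[userId][from]);
      socketsByUserId[userId].emit('chat', { from: from, text: text, friend: isFriend });
    });
  });

  socket.on('disconnect', function () {
    delete socketsByUserId[socket.userId];
  });
});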

Node.js: Handling outgoing HTTP request from node

I've been wrestling with this problem for a while but could not find a good solution, so I came here for help.
I have a node.js server; on receiving a request from a client, the server contacts a 3rd-party backend to grab some data and returns it to the client.
The server-to-backend communication involves multiple calls back and forth, and it typically takes ~3 seconds to finish this process.
If I fire off, say, 50 concurrent requests from test tools like JMeter, the performance degradation becomes severe very quickly, even causing timeouts for some of the later served calls.
Initially I started looking into asyncblock, but since it runs on fibers I wasn't seeing a big improvement in performance, so I started looking into threads.
The only mature module I could find was threads-a-gogo, but I also recently found out that you cannot use required external modules (like crypto, for example) within threads spawned by TAGG.
Given that there are proxy products built with node.js, I believe there is an efficient way to do this, but I can't really think of other approaches if threads cannot use external modules.
Any advice would be appreciated.
I can't reveal the full details due to an NDA, but here's the basic concept of what I'm doing. I'd like to send the logic below to a separate thread.
asyncblock(function (flow) {
  var result1 = flow.sync(externalRequest1(flow.callback()));
  if (isSuccess(result1)) { // pseudocode: result1 contains a success message
    var result2 = flow.sync(externalRequest2(flow.callback()));
    if (isSuccess(result2)) {
      // process result2 and return it to the client
    }
  }
});
First thing to check and be certain of: are you using the node.js core HTTP Agent? If so, you are subject to its default maxSockets limit of 5 concurrent connections to the same server. Read the hyperquest README rant for details.
Secondly, be aware that the remote side may impose abuse limitations as well, so check whether there are issues there or whether, for example, overall performance would be better with a single pool of 10 connections instead of an unlimited number of simultaneous connections to the upstream server.
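A minimal sketch of both approaches (the pool size of 10 and the host name are example values):
var http = require('http');

// Option 1: raise the global default, affecting all http requests in this process.
http.globalAgent.maxSockets = Infinity;

// Option 2: give the upstream calls their own bounded pool, which can behave
// better than unlimited sockets if the remote side throttles busy clients.
var upstreamAgent = new http.Agent({ maxSockets: 10 });

http.get({ host: 'backend.example.com', path: '/data', agent: upstreamAgent }, function (res) {
  // ... consume res here ...
});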
