Benchmarking comet applications - node.js

I'm currently working on my master's thesis. It's about real-time web applications.
Now I'd like to compare Node.js with, for example, long polling.
I know some benchmarking tools such as ab, autobench, etc., but these don't really test the application. Once they've made a request to the server, the request is handled and a new request is made. What I need is a benchmarking tool that will 'stay' on the webpage for a longer time so it'll simulate real people.
For example: I've made a demo chat in both Node.js and long polling (PHP). Now I want to test this with 100 simultaneous users who stay on the chat for about 30 seconds.
Does anyone have some suggestions for how I can reach this goal?
Thanks in advance!

Now I'd like to compare Node.js with for example long polling.
Long polling itself is a platform-agnostic web push technology, so you can compare a long polling application made in Node.js with a similar application made in, for example, PHP.
What I need is a benchmarking tool that will 'stay' on the webpage for a longer time so it'll simulate real people.
You can create another server application that simulates client connections. However, this application shouldn't be hosted on the same machine as your long polling server application, so that you get "near real" latency between clients and server. Even this approach may not give you exactly the environment you would have with "real human" clients (since all simulated connections would come from the same origin machine, and also because of the famous quote "there is no test like production"), but it can give you a rough environment in which to test your long polling server and gather some benchmark data. For example, socket.io has this kind of application for simulating a variety of browser transports.
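The client-simulating application described above can be sketched in Python with asyncio. This is a minimal sketch, assuming a `poll(user_id)` coroutine that performs one long-poll round trip; in a real test it would send an HTTP request (with an HTTP client of your choice) to the chat server running on another machine. Here the hypothetical `FakeChatServer` stands in for that server so the sketch runs on its own. Unlike ab, each virtual user keeps polling until its session time is up, which is what makes it 'stay' on the page.

```python
import asyncio
import time
from dataclasses import dataclass, field
from typing import Awaitable, Callable

PollFn = Callable[[int], Awaitable[None]]


@dataclass
class UserStats:
    user_id: int
    polls: int = 0
    timeouts: int = 0
    latencies: list = field(default_factory=list)


async def virtual_user(user_id: int, poll: PollFn, session_seconds: float,
                       poll_timeout: float) -> UserStats:
    """Keep one simulated user 'on the page' for session_seconds, long-polling in a loop."""
    stats = UserStats(user_id)
    deadline = time.monotonic() + session_seconds
    while time.monotonic() < deadline:
        started = time.monotonic()
        try:
            # One long-poll round trip: resolves when the server has something to send.
            await asyncio.wait_for(poll(user_id), timeout=poll_timeout)
            stats.latencies.append(time.monotonic() - started)
            stats.polls += 1
        except asyncio.TimeoutError:
            stats.timeouts += 1  # request held open with no news; poll again
    return stats


async def run_load(users: int, poll: PollFn, session_seconds: float,
                   poll_timeout: float = 30.0) -> list:
    """Run all simulated users concurrently, like `users` people sitting in the chat."""
    return await asyncio.gather(
        *(virtual_user(i, poll, session_seconds, poll_timeout) for i in range(users)))


class FakeChatServer:
    """Hypothetical in-process stand-in for a long-poll chat endpoint (demo only)."""

    def __init__(self) -> None:
        self._new_message = asyncio.Event()

    async def poll(self, user_id: int) -> None:
        await self._new_message.wait()

    async def broadcast_every(self, interval: float) -> None:
        while True:
            await asyncio.sleep(interval)
            # Swap in a fresh event for the next round, then wake everyone waiting now.
            event, self._new_message = self._new_message, asyncio.Event()
            event.set()


async def demo(users: int = 100, session_seconds: float = 1.0) -> list:
    server = FakeChatServer()
    broadcaster = asyncio.create_task(server.broadcast_every(0.05))
    try:
        return await run_load(users, server.poll, session_seconds, poll_timeout=1.0)
    finally:
        broadcaster.cancel()


if __name__ == "__main__":
    results = asyncio.run(demo())
    total = sum(s.polls for s in results)
    worst = max(max(s.latencies, default=0.0) for s in results)
    print(f"{len(results)} users, {total} messages received, worst wait {worst:.3f}s")
```

For the scenario in the question you would call `run_load(100, poll, session_seconds=30)` with a real `poll`; the demo uses short times only so that it finishes quickly.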

Related

Node.js design approach. Server polling periodically from clients

I'm trying to learn Node.js and adequate design approaches.
I've implemented a little API server (using express) that fetches a set of data from several remote sites, according to client requests that use the API.
This process can take some time (several fetch / await calls), so I want the user to know how their request is doing. I've read about socket.io / websockets, but maybe that's somewhat overkill for this case.
So what I did is:
For each client request, a requestID is generated and returned to the client.
With that ID, the client can query the API (via another endpoint) to know his request status at any time.
Using setTimeout() on the client page and some DOM manipulation, I can update and display the current request status every X, like a polling approach.
Although the solution works fine, even with several clients connecting concurrently, is there maybe a better solution? Are there any caveats I'm not considering?
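The requestID-plus-status-endpoint design described above can be modelled in a few lines. This is a hedged sketch in Python/asyncio rather than Express: `JobRegistry.start` plays the role of the endpoint that returns the requestID, `status` the second endpoint, and `poll_until_finished` the client's setTimeout loop. All names are hypothetical, and the remote fetches are simulated with `asyncio.sleep`.

```python
import asyncio
import uuid


class JobRegistry:
    """In-memory request tracker: one entry per requestID (hypothetical sketch)."""

    def __init__(self) -> None:
        self._jobs = {}
        self._tasks = set()  # keep references so background tasks aren't garbage-collected

    def start(self, sources: list) -> str:
        """'Start' endpoint: kick off the work and return the requestID immediately."""
        request_id = uuid.uuid4().hex
        self._jobs[request_id] = {"state": "pending", "done": 0,
                                  "total": len(sources), "result": None}
        task = asyncio.create_task(self._run(request_id, sources))
        self._tasks.add(task)
        task.add_done_callback(self._tasks.discard)
        return request_id

    async def _run(self, request_id: str, sources: list) -> None:
        job = self._jobs[request_id]
        job["state"] = "running"
        results = []
        for source in sources:
            await asyncio.sleep(0.01)  # stand-in for one remote fetch/await
            results.append(f"data from {source}")
            job["done"] += 1  # progress the client can see
        job["result"] = results
        job["state"] = "finished"

    def status(self, request_id: str) -> dict:
        """'Status' endpoint: a snapshot of the request's progress."""
        return dict(self._jobs[request_id])


async def poll_until_finished(registry: JobRegistry, request_id: str,
                              interval: float) -> tuple:
    """Client side: like the setTimeout loop, re-check the status every `interval` seconds."""
    progress = []
    while True:
        snapshot = registry.status(request_id)
        progress.append(snapshot["done"])
        if snapshot["state"] == "finished":
            return snapshot, progress
        await asyncio.sleep(interval)


async def demo() -> tuple:
    registry = JobRegistry()
    request_id = registry.start(["site-a", "site-b", "site-c"])
    return await poll_until_finished(registry, request_id, interval=0.005)


if __name__ == "__main__":
    final, progress = asyncio.run(demo())
    print(final["state"], final["done"], "/", final["total"], "progress seen:", progress)
```

One caveat this makes visible: finished entries stay in the in-memory dict forever and live in a single process, so a real server would need to expire them and, once several instances run behind a load balancer, keep the status somewhere all instances can read.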
TL;DR The approach you're using is just fine, although it may not scale very well. Websockets are a different approach to solve the same problem, but again, may not scale very well.
You've identified what are basically the only two options for real-time (or close to it) updates on a web site:
polling the server - the client requests information periodically
using Websockets - the server can push updates to the client when something happens
There are a couple of things to consider.
How important are "real time" updates? If the user can wait several seconds (or longer), then go with polling.
What sort of load can the server handle? If load is a concern, then Websockets might be the way to go.
That last question is really the crux of the issue. If you're expecting a few or a few dozen clients to use this functionality, then either solution will work just fine.
If you're expecting thousands or more to be connecting, then polling starts to become a concern, because now we're talking about many repeated requests to the server. Of course, if the interval is longer, the load will be lower.
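To put rough numbers on this, here is a back-of-envelope sketch in Python. It assumes every client polls at a fixed interval and that each client's status actually changes about once every `update_interval_s` seconds; both values are hypothetical inputs, not measurements. It shows that polling load grows linearly with the number of clients, shrinks with a longer interval, and that most polls return nothing new when the interval is much shorter than the update rate.

```python
def polling_load(clients: int, poll_interval_s: float, update_interval_s: float) -> tuple:
    """Back-of-envelope polling cost.

    Returns (requests per second hitting the server, fraction of polls that find
    nothing new), assuming each client's status changes about once per
    update_interval_s seconds.
    """
    requests_per_s = clients / poll_interval_s
    # A poll can only return something new if an update happened since the last poll.
    useful_per_s = clients / max(poll_interval_s, update_interval_s)
    return requests_per_s, 1 - useful_per_s / requests_per_s


if __name__ == "__main__":
    for clients in (30, 10_000):
        for interval in (2, 10):
            rps, wasted = polling_load(clients, interval, update_interval_s=30)
            print(f"{clients:>6} clients, every {interval:>2}s: "
                  f"{rps:7.1f} req/s, {wasted:.0%} of polls empty")
```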
It is my understanding that the overhead for Websockets is lower, but it can still be a concern when you're talking about large numbers of clients. Again, a lot of clients means the server is managing a lot of open connections.
The way large services handle this is to design their applications in such a way that they can be distributed over many identical servers and which server you connect to is managed by a load balancer. This is true for either polling or Websockets.

Should I use a REST API, or Socket.io for a Geolocation App?

I need to track moving cars.
Should I post the location every time the location changes, and send it over the socket?
Or should I make a REST API, post the location (from the tracked device), and check it (from the tracker device) every 10 seconds, regardless of whether the location changed or not?
(The App is being made with React Native)
Sending a separate HTTP request for each of many frequent updates requires more resources than sending messages through a websocket. Keeping websocket connections open for a lot of users requires more resources than using HTTP. In my opinion the answer depends on the user count, the update frequency, whether you apply the REST constraints (no server-side session) and which version of HTTP you use (HTTP/2 is more efficient than HTTP/1.1 as far as I know). I don't think this is something we can tell you without measurements.
The same is true if you want to push data from the server to the client. If you do it frequently and the update must be almost immediate, then a websocket is probably a better choice than polling. If you do it rarely and the delay (the polling interval) can be a few minutes, then polling might be better.
Note that I am not an expert in load scaling; this is just a layman's logic.
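To make the frequency trade-off concrete, here is a small, hedged Python sketch comparing the two options for a single tracked car. It assumes we know the times at which the car's position changes: `push_on_change` models sending each change over the socket, and `poll_every` models the tracker asking every `interval_s` seconds (first poll at `interval_s`). The scenario numbers are illustrative, not measurements.

```python
import math


def push_on_change(change_times: list) -> tuple:
    """Socket approach: one message per location change, delivered right away.

    Returns (messages sent, mean delay in seconds before the tracker sees a change).
    """
    return len(change_times), 0.0


def poll_every(change_times: list, duration_s: float, interval_s: float) -> tuple:
    """REST approach: the tracker asks every interval_s seconds, changed or not.

    Returns (requests sent, mean time from a change until the next poll). If the
    car moves twice between polls, the earlier position is never seen at all.
    """
    requests = math.floor(duration_s / interval_s)
    delays = [math.ceil(t / interval_s) * interval_s - t for t in change_times]
    mean_delay = sum(delays) / len(delays) if delays else 0.0
    return requests, mean_delay


if __name__ == "__main__":
    moving = list(range(2, 61, 2))  # position changes every 2 s for one minute
    parked = []                     # the car does not move at all
    for label, changes in (("moving", moving), ("parked", parked)):
        print(label, "push:", push_on_change(changes), "poll:", poll_every(changes, 60, 10))
```

With a car that moves every 2 seconds, polling every 10 seconds sends 6 requests per minute instead of 30 messages, at the cost of a 4-second average delay; for a parked car, push sends nothing while polling still sends 6.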
I would use WebSockets. For small deployments and low-frequency updates basically anything works, but with WebSockets you have technology that scales better in the long term. (And no, I would not consider this premature optimization, since the choice of technology here does not mean unnecessary initial overhead.)
Shameless plug: If you're using WebSocket, you could take a look at Crossbar.io (http://crossbar.io), or WAMP (http://wamp-proto.org) in general, which provides messaging mechanisms on top of WebSocket and should work well for your use case. I work for the company which is at the core of this, but it's open source software.

Using Laravel + Redis + Node.js on Heroku for websocket app... worried about connection limit

This is a bit of a stretch, but I hope someone can help.
I'm a PHP/iOS developer who's been working on an app that has a messaging component. Front end is Obj-C, backend is PHP/MySQL currently. As I've gone further into development, I'm feeling the shortcomings of polling and I've been looking for a more realtime solution and, sure enough, I've found the answer in web sockets. PHP doesn't play too well in this domain, but I've been able to get things working locally by using Laravel + Redis + Node.js.
Next I needed to find a suitable host for the real-world app deployment, and this is where I'm running into my first major obstacle (or perceived obstacle?).
Heroku appears to have very low limits on the number of Redis connections allowable:
Link: https://elements.heroku.com/addons/heroku-redis
Free plan: 20 connections
$120/month: 400 connections
$1450/month: 5000 connections
The problem is, if this app does well and gains the kind of traction I want, a LOT of people will be using it at the same time all across the country, and these limits have me worried. Either these prices are a bit ridiculous or I'm not looking at this correctly.
So my question is, does maintaining an open web socket (one user) mean that one of the Redis connections is used? Or am I looking at this completely wrong? Trying to decide if I need to just stick to polling or if there is a cost-efficient solution to this. I do want to stick to Laravel/Redis if possible because I am not too familiar with JS and I feel that my backend will be much less secure if I try to go down that route at this point.
Proper design will use 2 Redis connections per server (or per Heroku Dyno):
One connection will be used to Subscribe (to listen) to the app's channel(s). This connection cannot be used for other functions, so...
A second connection is used for all other Redis features, such as Database use and Publishing to the app's channel(s).
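The point that the number of Redis connections depends on the number of servers, not the number of websocket users, can be illustrated with a hypothetical in-process model in Python. `FakeBroker` stands in for Redis pub/sub and is not the API of any Redis client library. Each `AppServer` (think: one dyno) opens exactly two broker connections, one used only for subscribing and one for publishing and everything else, and fans incoming messages out to however many websocket clients it holds. The global and per-server channels mirror the split described in this answer.

```python
from collections import defaultdict


class FakeBroker:
    """Hypothetical in-process stand-in for Redis pub/sub that counts connections."""

    def __init__(self) -> None:
        self.connections = 0
        self._handlers = defaultdict(list)  # channel name -> subscriber callbacks

    def connect(self) -> "Connection":
        self.connections += 1
        return Connection(self)


class Connection:
    def __init__(self, broker: FakeBroker) -> None:
        self._broker = broker

    def subscribe(self, channel: str, handler) -> None:
        self._broker._handlers[channel].append(handler)

    def publish(self, channel: str, message) -> None:
        for handler in self._broker._handlers[channel]:
            handler(message)


class AppServer:
    """One server/dyno: many websocket clients, but only two broker connections."""

    def __init__(self, name: str, broker: FakeBroker) -> None:
        self.name = name
        self.clients = {}                 # client id -> inbox (stand-in for an open websocket)
        self.sub_conn = broker.connect()  # 1: dedicated to SUBSCRIBE, nothing else
        self.cmd_conn = broker.connect()  # 2: PUBLISH and all other commands
        self.sub_conn.subscribe("app:global", self._to_all_clients)
        self.sub_conn.subscribe(f"app:server:{name}", self._to_one_client)

    def accept(self, client_id: str) -> None:
        self.clients[client_id] = []      # a new websocket costs no new broker connection

    def _to_all_clients(self, message) -> None:
        for inbox in self.clients.values():
            inbox.append(message)

    def _to_one_client(self, message) -> None:
        client_id, text = message
        if client_id in self.clients:
            self.clients[client_id].append(text)

    def broadcast(self, text: str) -> None:
        self.cmd_conn.publish("app:global", text)

    def send_to(self, server_name: str, client_id: str, text: str) -> None:
        # Routed only to the server that holds this client (the "private" channel).
        self.cmd_conn.publish(f"app:server:{server_name}", (client_id, text))


if __name__ == "__main__":
    broker = FakeBroker()
    servers = [AppServer("web.1", broker), AppServer("web.2", broker)]
    for i in range(1000):
        servers[i % 2].accept(f"user-{i}")
    servers[0].broadcast("hello everyone")
    servers[0].send_to("web.2", "user-7", "hi user 7")
    print("broker connections:", broker.connections)  # 2 servers x 2 = 4
```

Accepting another websocket client only adds an entry to `clients`; in this model the connection count grows with the number of servers (here 2 × 2 = 4), not with the number of users.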
I don't know if you're into Ruby, but I'm the author of the Plezi HTTP (REST)/WebSocket framework and had to build a solution for Plezi's scaling over Redis (which is an automated feature: you just tell Plezi the Redis server's address and you're good to go).
If you want to look over Plezi's Redis code, you will notice there are two connections and that each server registers to two channels - a global channel and a private channel: one used for application wide events and the other one allows messages to be routed to specific connections based on the server they belong to (avoiding workload on unrelated servers).
Good luck!

What is the best way to communicate between two servers?

I am building a web app which has two parts. In one part it uses a real-time connection between the server and the client, and in the other part it does some CPU-intensive task to provide relevant data.
I'm implementing the real-time communication in Node.js and the CPU-intensive part in Python/Java. What is the best way for the Node.js server to take part in duplex communication with the other server?
For a basic solution you can use Socket.IO if you are already using it and know how it works. It will get the job done, since it allows communication between a client and a server where the client can be a different server written in a different language.
If you want a more robust solution with additional options and controls or which can handle higher traffic throughput (though this shouldn't be an issue if you are ultimately just sending it through the relatively slow internet) you can look at something like ØMQ (ZeroMQ). It is a messaging queue which gives you more control and lots of different communications methods beyond just request-response.
When you set either one up, I would recommend using your CPU-intensive server as the stable end (server) and your web server(s) as the clients. This assumes that you are using a single server for your CPU-intensive tasks and that you are running several Node.js server instances to take advantage of multiple cores for your web server. This simplifies your communication, since you want to have a single point to connect to.
If you foresee needing multiple CPU servers, you will want to set up a routing server that can route between multiple web servers and multiple CPU servers, and in that case I would recommend the extra work of learning ØMQ.
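The recommended topology (the CPU server as the single stable end, each web-server instance connecting to it as a client) can be sketched with Python's asyncio streams. Note the swap: instead of Socket.IO or ØMQ this uses plain TCP with one JSON message per line, purely to keep the sketch dependency-free. The message format, the `n` field, and the sum-of-squares "work" are hypothetical, and both sides are written in Python here; a Node instance would play the client role with its own socket client.

```python
import asyncio
import json


async def handle_web_server(reader: asyncio.StreamReader,
                            writer: asyncio.StreamWriter) -> None:
    """CPU server side: one long-lived connection per web-server instance."""
    while True:
        line = await reader.readline()
        if not line:                      # web server disconnected
            break
        request = json.loads(line)
        result = sum(n * n for n in range(request["n"]))  # stand-in for the heavy work
        reply = {"id": request["id"], "result": result}
        writer.write((json.dumps(reply) + "\n").encode())
        await writer.drain()
    writer.close()
    await writer.wait_closed()


async def web_server_client(port: int, jobs: list) -> list:
    """Web-server side: connect once, send jobs, read one reply line per job."""
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    for job_id, n in jobs:
        writer.write((json.dumps({"id": job_id, "n": n}) + "\n").encode())
    await writer.drain()
    replies = [json.loads(await reader.readline()) for _ in jobs]
    writer.close()
    await writer.wait_closed()
    return replies


async def demo() -> list:
    # Port 0 asks the OS for any free port.
    server = await asyncio.start_server(handle_web_server, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    async with server:
        # Two "web server instances" sharing the single CPU server.
        return await asyncio.gather(
            web_server_client(port, [(1, 10), (2, 100)]),
            web_server_client(port, [(3, 1000)]))


if __name__ == "__main__":
    for replies in asyncio.run(demo()):
        print(replies)
```

Because the connection stays open, either side can write at any time, which is what makes it duplex. In a real CPU server you would hand the computation to worker processes so one long job does not stall every connected web server.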
You can use the http.request method that Node provides to make curl-like requests from within your Node code.
The http.request method can also be used to call an authentication API.
You can put your callback in the success handler of the request, and when you get the response data in Node, you can send it back to the user.
Meanwhile, in the background, the Java/Python server can handle Node's request for the CPU-intensive task.
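For comparison, this request/response style can be sketched in Python with only the standard library: the CPU service exposes a hypothetical `POST /compute` endpoint via `http.server`, and the web tier calls it with `urllib.request`, playing the role that `http.request` plays in Node. The endpoint path and JSON shape are assumptions for illustration.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class ComputeHandler(BaseHTTPRequestHandler):
    """CPU service: POST /compute with {"n": ...} returns {"result": ...} (hypothetical)."""

    def do_POST(self) -> None:
        length = int(self.headers["Content-Length"])
        request = json.loads(self.rfile.read(length))
        result = sum(n * n for n in range(request["n"]))  # stand-in for the heavy work
        body = json.dumps({"result": result}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args) -> None:
        pass  # keep the demo output quiet


def call_compute_service(base_url: str, n: int) -> int:
    """Web-tier side: send the request, then use the response data (e.g. forward it to the user)."""
    req = urllib.request.Request(base_url + "/compute",
                                 data=json.dumps({"n": n}).encode(),
                                 headers={"Content-Type": "application/json"},
                                 method="POST")
    with urllib.request.urlopen(req, timeout=5) as resp:
        return json.loads(resp.read())["result"]


if __name__ == "__main__":
    service = ThreadingHTTPServer(("127.0.0.1", 0), ComputeHandler)
    threading.Thread(target=service.serve_forever, daemon=True).start()
    url = f"http://127.0.0.1:{service.server_address[1]}"
    print(call_compute_service(url, 1000))
    service.shutdown()
```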
I maintain a node.js application that intercommunicates among 34 tasks spread across 2 servers.
In your case, for communication between the web server and the app server you might consider mqtt.
I use mqtt for this kind of communication. There are mqtt clients for most languages, including Node/JavaScript, Python and Java. In my case I publish JSON messages using mqtt 'topics', and any task that has subscribed to a 'topic' receives its data when it is published. If you google "pub sub", "mqtt" and "mosquitto" you'll find lots of references and examples. Mosquitto (now an Eclipse project) is only one of a number of mqtt brokers that are available. Another very good broker, written in Java, is called HiveMQ.
This is a very simple, reliable solution that scales well. In my case literally millions of messages reliably pass through mqtt every day.
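The publish/subscribe flow described above can be illustrated with a tiny in-process model in Python. `MiniBroker` is a hypothetical stand-in for a real broker such as Mosquitto or HiveMQ and does not use any MQTT client library; `topic_matches` follows the MQTT topic-filter rules for the `+` and `#` wildcards, so a task can, for example, receive every job result through one subscription. Topic names like `jobs/42/result` are made up.

```python
import json


def topic_matches(topic_filter: str, topic: str) -> bool:
    """MQTT-style topic filter matching.

    '+' matches exactly one level; '#' (only as the last level) matches the
    parent level and everything below it. Filters starting with a wildcard
    do not match topics starting with '$' (e.g. broker-internal $SYS topics).
    """
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    if topic.startswith("$") and filter_levels[0] in ("+", "#"):
        return False
    for i, level in enumerate(filter_levels):
        if level == "#":
            return True
        if i >= len(topic_levels):
            return False
        if level != "+" and level != topic_levels[i]:
            return False
    return len(filter_levels) == len(topic_levels)


class MiniBroker:
    """Hypothetical in-process stand-in for an MQTT broker (demo only)."""

    def __init__(self) -> None:
        self._subscriptions = []  # (topic filter, callback)

    def subscribe(self, topic_filter: str, callback) -> None:
        self._subscriptions.append((topic_filter, callback))

    def publish(self, topic: str, message: dict) -> int:
        """Deliver a JSON message to every matching subscriber; return how many got it."""
        payload = json.dumps(message).encode()  # MQTT payloads are raw bytes; JSON is a convention
        delivered = 0
        for topic_filter, callback in self._subscriptions:
            if topic_matches(topic_filter, topic):
                callback(topic, json.loads(payload))
                delivered += 1
        return delivered


if __name__ == "__main__":
    broker = MiniBroker()
    broker.subscribe("jobs/+/result", lambda t, m: print("web server got", t, m))
    broker.subscribe("jobs/#", lambda t, m: print("logger got", t, m))
    broker.publish("jobs/42/result", {"status": "done", "value": 7})
```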
You're probably looking for Socket.IO:
Socket.IO enables real-time bidirectional event-based communication.
It works on every platform, browser or device, focusing equally on reliability and speed.
Sockets have traditionally been the solution around which most realtime systems are architected, providing a bi-directional communication channel between a client and a server.

Load test a Backbone App

I've got an NGinx/Node/Express3/Socket.io/Redis/Backbone/Backbone.Marionette app that proxies requests to a PHP/MySQL REST API. I need to load test the entire stack as a whole.
My app takes advantage of static asset caching with NGinx and clustering with Node/Express, and Socket.io is multi-core enabled using Redis. All that's to say, I've gone to a lot of trouble to try to make sure it can stand up to the load.
I hit it with 50,000 users in 10 seconds using blitz.io and it didn't even blink... which concerned me, because I wanted to see it crash, or at least breathe a little heavily. But 50k was the max you could throw at it with that tool, which suggested to me that they don't expect you to reasonably be able to, or need to, handle more than that... That's when I realized it wasn't actually incurring the load I was expecting, because the load is initiated after the page loads: the Backbone app starts up, kicks off the socket connection, and requests the data from the correct REST API endpoint (on a different server).
So, here's my question:
How can I load test the entire app as a whole? I need the load test to tax the server in the same way that the clients actually will, which means:
Request the single page Backbone app from my NGinx/Node/Express server
Kick off requests for the static assets from NGinx (simulating what the browser would do)
Kick off requests to the REST API (PHP/MySQL running on a different server)
Create the connection to the Socket.io service (running on NGinx/Node/Express, utilizing Redis to handle multi-core junk)
If the testing tool uses a browser-like environment to load the page up, parsing the JS and running it, everything will be copacetic (both the NGinx/Node/Express server and the PHP/MySQL server will get hit). Otherwise, the testing tool will need to simulate this by firing off at least a dozen different kinds of requests nearly simultaneously. If it doesn't, it's like stress testing a door by looking at it 10,000 times (that is to say, it's pointless).
I need to ensure my app can handle 1,000 users hitting it in under a minute all loading the same page.
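A rough, hedged sketch of the "simulate what the browser would do" approach, in Python with only the standard library. Each virtual user requests the page, parses it for stylesheet, script and image URLs (assuming root-relative paths such as `/app.js`), then fetches those assets and the REST API URLs in parallel; `ramp_up` spreads arrivals evenly. For the target above that is 1,000 users / 60 s ≈ 16.7 new users per second, and if each page load triggers at least the dozen requests mentioned above, roughly 200 requests per second or more. The sketch does not execute the Backbone JavaScript or open the Socket.io connection (that needs a Socket.io-aware client), and the local `DemoServer`, page and `/api/items` path are hypothetical stand-ins so it runs on its own. A dedicated tool is still the better choice for real runs.

```python
import threading
import time
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class AssetCollector(HTMLParser):
    """Collect the asset URLs a browser would request after parsing the page."""

    def __init__(self) -> None:
        super().__init__()
        self.assets = []

    def handle_starttag(self, tag, attrs) -> None:
        attrs = dict(attrs)
        if tag in ("script", "img") and attrs.get("src"):
            self.assets.append(attrs["src"])
        elif tag == "link" and attrs.get("rel") == "stylesheet" and attrs.get("href"):
            self.assets.append(attrs["href"])


def fetch(url: str) -> int:
    """Return the HTTP status code, or 0 if the request failed outright."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            resp.read()
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code
    except OSError:  # connection refused, timeout, ...
        return 0


def virtual_user(base_url: str, api_urls: list) -> dict:
    """One simulated visitor: load the page, then hit assets and API endpoints in parallel."""
    started = time.monotonic()
    with urllib.request.urlopen(base_url + "/", timeout=10) as resp:
        parser = AssetCollector()
        parser.feed(resp.read().decode())
    follow_ups = [base_url + path for path in parser.assets] + list(api_urls)
    with ThreadPoolExecutor(max_workers=6) as pool:  # a handful of parallel connections
        statuses = list(pool.map(fetch, follow_ups))
    return {"requests": 1 + len(statuses),
            "errors": sum(status != 200 for status in statuses),
            "seconds": time.monotonic() - started}


def ramp_up(base_url: str, api_urls: list, users: int, over_seconds: float) -> list:
    """Start `users` virtual users evenly spread over `over_seconds` seconds."""
    with ThreadPoolExecutor(max_workers=users) as pool:
        futures = []
        for _ in range(users):
            futures.append(pool.submit(virtual_user, base_url, api_urls))
            time.sleep(over_seconds / users)
        return [future.result() for future in futures]


# Hypothetical local stand-in for the real stack, so the sketch runs on its own.
PAGE = (b'<html><head><link rel="stylesheet" href="/app.css">'
        b'<script src="/app.js"></script></head><body></body></html>')


class DemoHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        body = PAGE if self.path == "/" else b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args) -> None:
        pass  # keep output quiet


class DemoServer(ThreadingHTTPServer):
    request_queue_size = 128  # larger listen backlog than the default of 5


if __name__ == "__main__":
    server = DemoServer(("127.0.0.1", 0), DemoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = f"http://127.0.0.1:{server.server_address[1]}"
    results = ramp_up(base, [base + "/api/items"], users=50, over_seconds=1.0)
    server.shutdown()
    print(len(results), "users,", sum(r["errors"] for r in results), "errors,",
          f"slowest page {max(r['seconds'] for r in results):.3f}s")
```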
You should learn to use Apache JMeter http://jmeter.apache.org/
You can perform stress tests with it; see this tutorial: https://www.youtube.com/watch?v=8NLeq-QxkSw
As you said, "I need the load test to tax the server in the same way that the clients actually will"
That means the test is agnostic to the technology you are using.
I highly recommend JMeter: it is widely used, and you can integrate it with Jenkins and do a lot of cool stuff with it.
