Non-Websocket Socket.io clients for benchmarking - node.js

There are a number of Socket.io client implementations out there, e.g. in Java (see Java socket.io client), that seem to support only the WebSocket protocol.
I'd like to benchmark server performance for the other transports. I'm particularly interested in htmlfile, as it will be used by IE browsers < 10 unless I enable Flash, which I'm not sure I'll do, since the socket.io 'flashsocket' transport takes 5 seconds to start on IE 8. Is there any Socket.io client available that would allow benchmarking the server over these transports?
I don't care too much what OS or programming language it is.

There's
https://github.com/Gottox/socket.io-java-client
In addition to WebSocket, it only does XHR, and that feature is currently considered to be in beta:
Status: Connecting with Websocket is production ready. XHR is in beta.
Light testing of XHR polling seems to support the claim that it is not production ready yet. I had a bunch of disconnects without subsequent reconnects. This was when testing a few hundred client instances simultaneously in one JVM. As there were errors in the server log, I guess it's the client.
One more guess: as the connection drops are so frequent, and the load on the server is so much higher compared to WebSockets, I wonder whether this client's 'xhr polling' does not use HTTP keep-alive. That would explain a lot. Will check as soon as time permits.
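For comparison, if a polling benchmark client were written in Node.js, connection reuse could be made explicit with a shared keep-alive agent. This is only a sketch of that idea and says nothing about how the Java client behaves; the helper name `buildPollOptions` and the host, port and path values are made up for illustration.

```javascript
const http = require('http');

// One shared agent so that repeated polls reuse TCP connections (HTTP keep-alive)
// instead of opening a new connection for every poll.
const keepAliveAgent = new http.Agent({ keepAlive: true, maxSockets: 50 });

// Hypothetical helper: request options for a single polling GET.
function buildPollOptions(host, port, path) {
  return { host, port, path, method: 'GET', agent: keepAliveAgent };
}

// Usage (not executed here):
// http.get(buildPollOptions('localhost', 3000, '/some/polling/path'), onResponse);
```

Without the shared agent, each simulated client pays for a fresh TCP connection on every poll, which would inflate server load in exactly the way guessed at above.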
Using WebSocket, I could do 1000 instances per JVM (probably more) and 5000 instances (5 JVMs x 1000 instances each) per machine (probably more, too) without issues.
And apparently "it's easy to write your own transport". Will check this out.

It should also be easy to create your own by
sniffing on a browser <-> socket.io session (On Windows, that's Fiddler)
copying the code from one of the official socket.io tests
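As a starting point for a hand-rolled client, here is a sketch of the first thing a sniffed session shows: the handshake. It assumes the Socket.IO 0.9-era wire protocol, where the handshake response body has the form `sid:heartbeat timeout:close timeout:transports` and transport URLs look like `/socket.io/1/<transport>/<sid>`. Verify both against what the sniffer actually shows for your server version; the function names are hypothetical.

```javascript
// Parse a handshake body such as "4d4f185e96a7b:15:10:websocket,xhr-polling"
// (assumed Socket.IO 0.9 format: sid, heartbeat timeout, close timeout, transports).
function parseHandshake(body) {
  const parts = body.trim().split(':');
  if (parts.length < 4 || !parts[0]) {
    throw new Error('Unexpected handshake body: ' + body);
  }
  return {
    sid: parts[0],
    heartbeatTimeoutSec: Number(parts[1]),
    closeTimeoutSec: Number(parts[2]),
    transports: parts[3].split(','),
  };
}

// Build the URL a given transport would then connect to (assumed URL layout).
function transportUrl(baseUrl, transport, sid) {
  return `${baseUrl}/socket.io/1/${transport}/${sid}`;
}
```

A benchmark client would request the handshake over plain HTTP, pick a transport such as `htmlfile` or `xhr-polling` from the returned list, and then open `transportUrl(...)` for that session.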

Related

Node.js design approach. Server polling periodically from clients

I'm trying to learn Node.js and adequate design approaches.
I've implemented a little API server (using express) that fetches a set of data from several remote sites, according to client requests that use the API.
This process can take some time (several fetch / await calls), so I want the user to know how their request is doing. I've read about socket.io / websockets, but maybe that's overkill for this case.
So what I did is:
For each client request, a requestID is generated and returned to the client.
With that ID, the client can query the API (via another endpoint) to know his request status at any time.
Using setTimeout() on the client page and some DOM manipulation, I can update and display the current request status every X seconds, like a polling approach.
Although the solution works fine, even with several clients connecting concurrently, maybe there's a better solution? Are there any caveats I'm not considering?
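The server side of the approach described above can be sketched as a small in-memory store keyed by request ID. This is an illustration under assumptions, not the asker's actual code: the class name `JobStore`, its methods and the status strings are hypothetical, and the Express wiring is left out.

```javascript
const { randomUUID } = require('crypto');

// Minimal in-memory status store: one entry per client request.
class JobStore {
  constructor() {
    this.jobs = new Map();
  }

  // Called when a request arrives; the returned ID is sent back to the client.
  create() {
    const id = randomUUID();
    this.jobs.set(id, { status: 'pending', result: null });
    return id;
  }

  // Called by the fetching code as work progresses or completes.
  update(id, status, result = null) {
    if (!this.jobs.has(id)) return false;
    this.jobs.set(id, { status, result });
    return true;
  }

  // Called by the status endpoint; null means "unknown ID" (e.g. respond with 404).
  get(id) {
    return this.jobs.get(id) || null;
  }
}
```

One caveat this makes visible: entries live in process memory, so they are lost on restart and need to be expired at some point, or the map grows without bound.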
TL;DR The approach you're using is just fine, although it may not scale very well. Websockets are a different approach to solve the same problem, but again, may not scale very well.
You've identified what are basically the only two options for real-time (or close to it) updates on a web site:
polling the server - the client requests information periodically
using Websockets - the server can push updates to the client when something happens
There are a couple of things to consider.
How important are "real time" updates? If the user can wait several seconds (or longer), then go with polling.
What sort of load can the server handle? If load is a concern, then Websockets might be the way to go.
That last question is really the crux of the issue. If you're expecting a few or a few dozen clients to use this functionality, then either solution will work just fine.
If you're expecting thousands or more to be connecting, then polling starts to become a concern, because now we're talking about many repeated requests to the server. Of course, if the interval is longer, the load will be lower.
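The relationship between client count, polling interval and load is simple arithmetic. A back-of-the-envelope sketch (the function name and the example numbers are made up, and it assumes polls are spread evenly over time):

```javascript
// Average status requests per second generated by polling clients.
function pollingRequestsPerSecond(clients, intervalSeconds) {
  if (intervalSeconds <= 0) throw new RangeError('interval must be positive');
  return clients / intervalSeconds;
}

console.log(pollingRequestsPerSecond(5000, 2));  // 2500
console.log(pollingRequestsPerSecond(5000, 10)); // 500
```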
It is my understanding that the overhead for Websockets is lower, but still can be a concern when you're talking about large numbers of clients. Again, a lot of clients means the server is managing a lot of open connections.
The way large services handle this is to design their applications in such a way that they can be distributed over many identical servers and which server you connect to is managed by a load balancer. This is true for either polling or Websockets.

Websockets for non-realtime apps?

I have been studying web sockets recently and plan to use them in my application even though the app is not realtime. I am mostly doing this because I want to try them out, and further down the line they might open up more possibilities for the app's functionality. Also, I am not bothered about having an API for mobile at the moment, but I think it would still be possible to have some kind of API over web sockets if I needed one in the future.
However, for in-production apps, are there any real reasons why somebody would consider implementing websockets if there is no real-time element?
Are there any benefits over HTTP requests other than the real-time aspect?
HTTP requests include the full HTTP headers. Depending on the cookie load, this may reach a couple of KB per request. WebSocket protocol headers are minimal compared to that. If you have a lot of requests and care about bandwidth then going with WebSocket makes sense.
Additionally, an HTTP connection is (traditionally) negotiated for each request, which means you have overhead on each request compared to WebSocket, which has persistent connections. Connection establishment takes time (hence the advantage in real-time applications), but it also uses resources on the server. Again, depending on your app's communication patterns, using WebSocket may make sense.
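To put a number on "minimal": per RFC 6455, a WebSocket frame header is 2 bytes, plus 2 or 8 bytes of extended length for larger payloads, plus a 4-byte masking key on client-to-server frames. A small sketch of that calculation (the function name is made up for illustration):

```javascript
// Size in bytes of the WebSocket frame header for a given payload length.
// masked = true for client-to-server frames, false for server-to-client.
function wsFrameHeaderBytes(payloadLength, masked) {
  let bytes = 2;                             // FIN/opcode byte + mask bit/length byte
  if (payloadLength > 65535) bytes += 8;     // 64-bit extended payload length
  else if (payloadLength > 125) bytes += 2;  // 16-bit extended payload length
  if (masked) bytes += 4;                    // masking key
  return bytes;
}

console.log(wsFrameHeaderBytes(100, true));    // 6
console.log(wsFrameHeaderBytes(1000, false));  // 4
console.log(wsFrameHeaderBytes(70000, true));  // 14
```

Compare that with a request whose headers and cookies run to a couple of KB, as described above.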

What is the best way to communicate between two servers?

I am building a web app which has two parts. In one part it uses a real time connection between the server and the client and in the other part it does some cpu intensive task to provide relevant data.
I'm implementing the real-time communication in Node.js and the CPU-intensive part in Python/Java. What is the best way for the Node.js server to take part in duplex communication with the other server?
For a basic solution you can use Socket.IO if you are already using it and know how it works. It will get the job done, since it allows communication between a client and a server where the client can be a different server written in a different language.
If you want a more robust solution with additional options and controls, or one that can handle higher throughput (though this shouldn't be an issue if you are ultimately just sending the data over the relatively slow internet), you can look at something like ØMQ (ZeroMQ). It is a messaging library that gives you more control and many communication patterns beyond plain request-response.
When you set either up, I would recommend using your CPU-intensive server as the stable end (the server) and your web server(s) as the clients. This assumes that you are using a single server for your CPU-intensive tasks and running several Node.js instances to take advantage of multiple cores for your web server. It simplifies your communication, since you have a single point to connect to.
If you foresee needing multiple CPU servers, you will want to set up a routing server that can route between multiple web servers and multiple CPU servers, and in that case I would recommend the extra work of learning ØMQ.
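To illustrate the "web server connects out to the CPU server" layout without pulling in Socket.IO or ØMQ, here is a sketch that swaps in a plain TCP socket carrying newline-delimited JSON. That is a different, simpler technique than either library mentioned above, and the function names, message shapes, host and port are all assumptions.

```javascript
const net = require('net');

// Encode one message as a single line of JSON.
function encodeMessage(msg) {
  return JSON.stringify(msg) + '\n';
}

// Returns a function that accepts string chunks and calls onMessage
// once for every complete line received so far.
function createLineDecoder(onMessage) {
  let buffer = '';
  return function feed(chunk) {
    buffer += chunk;
    let idx;
    while ((idx = buffer.indexOf('\n')) !== -1) {
      const line = buffer.slice(0, idx);
      buffer = buffer.slice(idx + 1);
      if (line.length > 0) onMessage(JSON.parse(line));
    }
  };
}

// Hypothetical wiring: each Node.js web process connects to the single CPU server.
function connectToCpuServer(host, port, onResult) {
  const socket = net.connect({ host, port });
  socket.setEncoding('utf8'); // deliver strings, handling split multi-byte characters
  socket.on('data', createLineDecoder(onResult));
  return {
    send: (task) => socket.write(encodeMessage(task)),
    close: () => socket.end(),
  };
}
```

Because the connection stays open and either side can write at any time, this gives duplex communication. What it lacks (reconnects, routing, queuing) is what a library like ØMQ adds.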
You can use the http.request method that Node provides to make curl-like HTTP requests from within Node code.
The http.request method is also commonly used when implementing authentication APIs.
Put your callback on the request's response handler; when the response data arrives in Node, you can send it back to the user.
Meanwhile, in the background, the Java/Python server can use Node's request to run the CPU-intensive task.
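A sketch of that flow with Node's built-in `http.request`. The `/tasks` path, the JSON payload shape and the function names are assumptions for illustration; the Java/Python server would need to expose a matching endpoint.

```javascript
const http = require('http');

// Build request options and body for POSTing a JSON task (the path is hypothetical).
function buildTaskRequest(host, port, task) {
  const body = JSON.stringify(task);
  return {
    body,
    options: {
      host,
      port,
      path: '/tasks',
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'Content-Length': Buffer.byteLength(body),
      },
    },
  };
}

// Send the task and resolve with the parsed JSON response.
function sendTask(host, port, task) {
  const { options, body } = buildTaskRequest(host, port, task);
  return new Promise((resolve, reject) => {
    const req = http.request(options, (res) => {
      let data = '';
      res.setEncoding('utf8');
      res.on('data', (chunk) => { data += chunk; });
      res.on('end', () => {
        try { resolve(JSON.parse(data)); } catch (err) { reject(err); }
      });
    });
    req.on('error', reject);
    req.end(body);
  });
}
```

In a route handler, the result of `sendTask(...)` would be awaited and then forwarded to the user. Note that this is request-response only, so the CPU server cannot push data to Node on its own initiative.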
I maintain a node.js application that intercommunicates among 34 tasks spread across 2 servers.
In your case, for communication between the web server and the app server you might consider mqtt.
I use mqtt for this kind of communication. There are mqtt clients for most languages, including node/javascript, python and java. In my case I publish JSON messages using mqtt 'topics', and any task that has registered to subscribe to a 'topic' receives its data when published. If you google "pub sub", "mqtt" and "mosquitto" you'll find lots of references and examples. Mosquitto (now an Eclipse project) is only one of a number of mqtt brokers that are available. Another very good broker, written in Java, is called hivemq.
This is a very simple, reliable solution that scales well. In my case literally millions of messages reliably pass through mqtt every day.
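The publish/subscribe pattern described here can be illustrated with an in-process stand-in. This is not an MQTT client and involves no broker; real code would use an MQTT client library talking to a broker such as Mosquitto. The class name `TopicBus` and the topic names are hypothetical, and topics are matched exactly (no MQTT wildcards).

```javascript
// In-process stand-in for a broker: handlers register per topic,
// and every published message is delivered to that topic's handlers.
class TopicBus {
  constructor() {
    this.subscribers = new Map();
  }

  subscribe(topic, handler) {
    if (!this.subscribers.has(topic)) this.subscribers.set(topic, []);
    this.subscribers.get(topic).push(handler);
  }

  // Messages travel as JSON text, as they would over the wire.
  // Returns the number of handlers the message was delivered to.
  publish(topic, message) {
    const payload = JSON.stringify(message);
    const handlers = this.subscribers.get(topic) || [];
    for (const handler of handlers) handler(JSON.parse(payload));
    return handlers.length;
  }
}
```

The publisher never needs to know which tasks are listening, which is what lets many tasks on several servers intercommunicate through one broker.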
You may be looking for Socket.IO:
Socket.IO enables real-time bidirectional event-based communication.
It works on every platform, browser or device, focusing equally on reliability and speed.
Sockets have traditionally been the solution around which most realtime systems are architected, providing a bi-directional communication channel between a client and a server.

Does socket.io and node.js's performance get affected on heroku's server (with no websocket)?

Since the Heroku server doesn't support WebSockets, does that mean that if we run a node.js + socket.io app on it and expect many concurrent users, it will become less efficient as the number of users grows?
I was building a multiuser app and suddenly noticed that Heroku is using long polling instead of WebSockets. I couldn't see much delay in my prototype, but I am worried. Should I be building my app on a server that supports real WebSockets?
... should i be building my app on a server that supports real websockets?
Probably.
http://websocket.org/quantum.html, says "HTML5 Web Sockets can provide a 500:1 or—depending on the size of the HTTP headers—even a 1000:1 reduction in unnecessary HTTP header traffic and 3:1 reduction in latency."
Long polling is old and inefficient, and is slowly being replaced by WebSockets. They are supported by every server. Most of the latest browsers have already added support too. Hopefully Heroku will do so soon as well. You can continue with your prototype; maybe WebSocket support will be added before you finish it.
The advantages of WebSockets are given here

socket.io websocket fallbacks

I want to use dotcloud with node.js + socket.io for realtime applications.
But they don't support websockets.
Will there be noticeable bandwidth or performance degradation by relying purely on fallbacks?
Is it worth it to use my own server? Linode or aws or whatnot.
Thanks.
I'm implementing an instant messaging system which depends completely on WebSocket. As the web is evolving quite fast and WebSocket is in the web standard, I decided to use a Flash WebSocket fallback for any browser that doesn't support it by default (Firefox, Opera). Here is what you may want to know:
I use WebSocket. I use a pure WebSocket server. I don't use any other protocols. I don't use socket.io. I must say that if you decide to use only WebSocket, you won't get any benefit from the socket.io library, not even in development time. It only adds unnecessary overhead to your server because of its support for multiple transport layers.
On the client side, I use WebSocket plus a Flash WebSocket fallback, which implements the WebSocket spec using a Flash socket, and I would say that there's no noticeable difference. The only thing you should know is that, due to the "same origin policy", you may need to serve the Flash socket policy request yourself (on port 843 by default) to allow the Flash socket to connect.
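Serving that policy request yourself can be sketched with Node's `net` module. Flash sends the string `<policy-file-request/>` followed by a null byte and expects a cross-domain policy XML document, also null-terminated, in reply. The function names, the allowed domain and the port list below are assumptions, and a real deployment should restrict `domain` rather than use `*`.

```javascript
const net = require('net');

// Build a minimal Flash socket policy document, terminated by a null byte.
function buildPolicy(domain, ports) {
  return '<?xml version="1.0"?>' +
    '<cross-domain-policy>' +
    `<allow-access-from domain="${domain}" to-ports="${ports}"/>` +
    '</cross-domain-policy>\0';
}

// Create (but do not start) a TCP server that answers any incoming
// data with the policy and then closes the connection.
function createPolicyServer(domain, ports) {
  const policy = buildPolicy(domain, ports);
  return net.createServer((socket) => {
    socket.once('data', () => socket.end(policy));
    socket.on('error', () => {}); // ignore clients that disconnect abruptly
  });
}

// Usage (binding to port 843 usually requires elevated privileges):
// createPolicyServer('*', '8080').listen(843);
```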
We're currently using a private server because we have a dedicated sysadmin. However, it's better if you can just focus on doing what you intended to do, and not on unwanted things. Oh, and sometimes, it's better if you have complete control of your own server :-).
Hope it helps.

Resources