I am currently creating a horizontally scalable socket.io server which looks like the following:
LoadBalancer (nginx)
Proxy1 Proxy2 Proxy3 Proxy{N}
BackEnd1 BackEnd2 BackEnd3 BackEnd4 BackEnd{N}
The proxies use sticky sessions + cluster, each with a socket.io server running on a core, and are load balanced by the nginx proxy.
Now to my question: these backend nodes use Redis pub/sub to communicate with the proxies, which handle all of the client communication via the transport (WebSockets).
When a request is sent to a backend server by a proxy, it knows the user who requested it, along with the proxy the user is on. My fear is that when a proxy server goes offline for whatever reason, any pending request on my backend nodes will fail to reach the user once the proxy comes back online, because the messages were sent while the server was offline. What can I implement to circumvent this issue and essentially have messages queued while any proxy server is offline, then delivered when it's back online?
Pub/sub doesn't persist messages. At all. In order to use Redis for this you would need to use a queue instead. For example, you can use a combination of list operations where the producer pushes messages onto a list and your proxy server consumes them with BLPOP or BRPOP, depending on which end you push to and whether you want the messages in FIFO or LIFO order.
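For instance, a minimal sketch with the ioredis client (the key name outbox:proxy1 is just an illustration of one list per proxy):

    const Redis = require('ioredis');

    // Back-end (producer): push each message for a given proxy onto a list.
    // Unlike PUBLISH, the entry stays in Redis until something pops it.
    const producer = new Redis();
    producer.rpush('outbox:proxy1', JSON.stringify({ userId: 42, body: 'hello' }));

    // Proxy (consumer): BLPOP blocks until an entry is available, so messages
    // queued while the proxy was offline are delivered as soon as it reconnects.
    const consumer = new Redis();
    async function drain() {
      for (;;) {
        const [, raw] = await consumer.blpop('outbox:proxy1', 0);
        const msg = JSON.parse(raw);
        // forward msg to the right websocket here
      }
    }
    drain().catch(console.error);

RPUSH on the producer side combined with BLPOP on the consumer side gives FIFO delivery.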
Related
Normally I use AJAX HTTP requests to get/post data. Now I'm wondering: why shouldn't I replace all the AJAX GET requests with Socket.IO? Is there any disadvantage to following this approach?
I understand that session cookies are sent via HTTP headers between client and server during every HTTP request. During client<=>server interactions using sockets, will the browser's session cookies automatically be sent to the server via socket headers (if those exist)?
In which use cases should I prefer Socket.IO over HTTP? (If you consider this a question that demands a broad answer, you can link me to some relevant articles.)
WebSockets are useful when the server needs to push real-time information to the client about events that happened on the server. This avoids the client making repeated polling AJAX calls to check whether some event has occurred on the server.
Think of a simple chat application. If the client needs to know whether the other participant in a chat session has written something in order to display it, it would need to make AJAX calls at regular intervals to check this on the server. With WebSockets, on the other hand, the server can notify the client when the event occurs, which is much more efficient in terms of network traffic. The WebSocket protocol also allows the server to push real-time information to multiple subscribed clients at the same time: for example, you could have a web browser and a mobile application subscribed to a WebSocket and talking to each other through the server. With AJAX, those kinds of scenarios would be harder to achieve and would require many more stateless HTTP calls.
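As a rough sketch of that chat example with socket.io (the event name 'chat message' is purely illustrative):

    const { Server } = require('socket.io');
    const io = new Server(3000);

    io.on('connection', (socket) => {
      socket.on('chat message', (msg) => {
        // Instead of every participant polling, the server pushes the new
        // message to all other connected clients the moment it arrives.
        socket.broadcast.emit('chat message', msg);
      });
    });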
I understand that session cookies will be sent between client and server during every HTTP request; is this also the case during client<=>server interactions using sockets?
The WebSocket protocol is different from the HTTP protocol. So after the initial handshake occurs (which happens over HTTP), there is no longer any notion of HTTP-specific things such as cookies.
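As a small, hedged illustration: because the handshake itself happens over HTTP, the browser's cookies are visible at that point (in socket.io they show up on socket.handshake.headers.cookie), but they are not re-sent with every subsequent WebSocket message:

    const { Server } = require('socket.io');
    const io = new Server(3000);

    io.use((socket, next) => {
      // Cookie header sent with the initial HTTP handshake/upgrade request;
      // after this point the WebSocket frames carry no cookies.
      const rawCookies = socket.handshake.headers.cookie || '';
      // e.g. parse a session id out of rawCookies for authentication
      next();
    });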
There's one important thing you should be aware of when using WebSockets: they require a persistent connection to be established between the client and the server. This can make things tricky when you need to load balance your servers. Of course, the different implementations of the WebSocket protocol may offer solutions to this problem. For example, Socket.IO has a Redis adapter that allows the servers to keep track of connected clients across a cluster of nodes.
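With the current @socket.io/redis-adapter package, the setup looks roughly like this (a sketch; your Redis URL and port will differ):

    const { createClient } = require('redis');
    const { Server } = require('socket.io');
    const { createAdapter } = require('@socket.io/redis-adapter');

    async function start() {
      const pubClient = createClient({ url: 'redis://localhost:6379' });
      const subClient = pubClient.duplicate();
      await Promise.all([pubClient.connect(), subClient.connect()]);

      const io = new Server(3000);
      io.adapter(createAdapter(pubClient, subClient));
      // io.emit(...) now reaches clients connected to any node in the cluster
    }

    start().catch(console.error);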
On the server side for WebSockets there is already a ping/pong implementation where the server sends a ping and the client replies with a pong, to let the server know whether a client is still connected. But there isn't anything implemented in reverse to let the client know whether the server is still connected to it.
There are two ways to go about this that I have read about:
Every client sends a message to the server every x seconds, and whenever an error is thrown when sending, that means the server is down, so reconnect.
The server sends a message to every client every x seconds; the client receives this message and updates a variable. On the client side a thread checks every x seconds whether this variable has changed; if it hasn't changed in a while, the client hasn't received a message from the server, so you can assume the server is down and re-establish the connection.
Either method lets the client figure out whether the server is still online. With the first you send traffic to the server, whereas with the second you send traffic out of the server. Both seem easy enough to implement, but I'm not sure which is better in terms of being more efficient/cost effective.
Server upload speeds are higher than client upload speeds, but server CPUs are an expensive resource while client CPUs are relatively cheap. Unloading logic onto the client is a more cost-effective approach...
Having said that, servers must implement this specific logic (actually, all ping/timeout logic), otherwise they might be left with "half-open" sockets that drain resources but aren't connected to any client.
Remember that sockets (file descriptors) are a limited resource. Not only do they use memory even when no traffic is present, but they prevent new clients from connecting when the resource is maxed out.
Hence, servers must clear out dead sockets, either using timeouts or by implementing ping.
P.S.
I'm not a node.js expert, but this type of logic should be implemented using the WebSocket protocol's ping rather than by your application. You should probably look into the node.js server / WebSocket framework and check how to enable pinging.
You should set pings to accommodate your specific environment. For example, if you host on Heroku, then Heroku imposes a timeout of ~55 seconds, and your pings should be sent before that timeout occurs.
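In socket.io that tuning is a matter of configuration, something like the following sketch (the 25s/20s values are illustrative, chosen to stay under a ~55 second proxy timeout):

    const { Server } = require('socket.io');

    const io = new Server(3000, {
      pingInterval: 25000, // server sends a ping every 25 seconds
      pingTimeout: 20000,  // drop the connection if no pong arrives within 20 seconds
    });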
I have a logging server that receives data from some stateless clients on a single network (inaccessible from the outside world). I'd like to make sure all logs are eventually received by the server, even if the internet connection goes down.
To do this the easiest solution would be to set up a proxy server, and have the client log to both the logging server and the proxy server. The proxy server then tries to log to the logging server, and if it fails it caches the request for later. Something like this:
Notes:
All requests are idempotent.
The clients are stateless (logs can not be cached on the clients)
All parts of the system, except the intermediate "internet" step, are configurable.
The proxy server does not need to read or modify the data.
The logging server response is not used by the client.
I cannot make significant changes to the client or logging server (Cassandra would be great for this application, though).
My questions: is there any off the shelf software that can serve as the proxy? If not, anything to think about when writing this? Are there any concerns with this scheme?
Your proxy looks like a simple persistent queue. All you have to do is add/configure a connector to the logging server.
But even without a queue the whole process looks like 2 DB queries and 2 REST calls - you will probably waste more time comparing different products than writing it on your own.
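A rough sketch of that idea, assuming Express, ioredis, Node 18+ (for the global fetch), and a hypothetical LOG_SERVER URL; because the requests are idempotent, retrying a failed forward is safe:

    const express = require('express');
    const Redis = require('ioredis');

    const redis = new Redis();
    const LOG_SERVER = 'http://logging-server.internal/logs'; // hypothetical

    const app = express();
    app.use(express.text({ type: '*/*' }));

    // Persist the log entry before acknowledging the client.
    app.post('/logs', async (req, res) => {
      await redis.rpush('pending-logs', req.body);
      res.sendStatus(202);
    });

    // Worker loop: pop an entry and try to forward it; if the internet
    // connection (or logging server) is down, re-queue it and retry later.
    async function drain() {
      for (;;) {
        const [, entry] = await redis.blpop('pending-logs', 0);
        try {
          await fetch(LOG_SERVER, { method: 'POST', body: entry });
        } catch (err) {
          await redis.rpush('pending-logs', entry);
          await new Promise((resolve) => setTimeout(resolve, 5000));
        }
      }
    }

    app.listen(8080);
    drain().catch(console.error);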
I am working on a WebRTC application where a P2P connection is established between a customer and free agents. The agents are fetched using an AJAX call in the application. I want to scale the application such that if the agents are running on any node server, they have a communication mechanism and updates to an agent's status (available, busy, unavailable) can be performed.
My problem is that the application is running on port 8040 and the agents service is running on port 8088, where the application makes AJAX calls to fetch the data. What is the best way to scale the agents, or any idea about how to scale the application?
I followed https://github.com/rajaraodv/redispubsub using Redis pub/sub, but my problem is not resolved, as the agents are being updated and fetched on another node using AJAX calls.
You didn't give enough info... but to scale your Node.js app you need a central place that holds all the needed info and that can itself scale. Redis can scale easily; you can also try socket.io, etc.
Now, once you have your Redis cluster, for example, you need to make all your Node.js servers communicate with the Redis server; that way all your node servers have access to the same info. Then it's up to you to send the right info to the right clients.
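For example, a minimal sketch where every Node.js server reads and writes agent status in one shared Redis instance (key names like agent:<id> are just illustrative):

    const Redis = require('ioredis');
    const redis = new Redis({ host: 'redis.internal' }); // the shared instance

    // Any node can update an agent's status: 'available' | 'busy' | 'unavailable'
    async function setAgentStatus(agentId, status) {
      await redis.hset(`agent:${agentId}`, 'status', status);
    }

    // Any node answering an AJAX call sees the same, up-to-date status.
    async function getAgentStatus(agentId) {
      return redis.hget(`agent:${agentId}`, 'status');
    }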
Message Bus approach:
The AJAX call is sent to one of the Node.js servers. If the message doesn't find its destination on that server, it is sent to the next one, and so on. So the signaling server must distribute the received message to all the other nodes in the cluster by establishing a message bus.
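One way to sketch such a bus is with Redis pub/sub (the channel name is illustrative): the node that receives the AJAX call publishes, every node subscribes, and only the node holding the target agent's connection acts on the message:

    const Redis = require('ioredis');
    const pub = new Redis();
    const sub = new Redis();

    sub.subscribe('signaling-bus');
    sub.on('message', (channel, raw) => {
      const msg = JSON.parse(raw);
      // Deliver only if this node owns the destination connection, e.g.
      // io.to(socketIdForAgent(msg.agentId)).emit('signal', msg);
    });

    // Called by whichever node received the AJAX request.
    function forward(msg) {
      pub.publish('signaling-bus', JSON.stringify(msg));
    }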
I'm about to begin with socket.io and this is more of a theoretical question,
let's say that I want to send a message to a specific user with socket.io,
normally I would have to store the socket id with the relevant user id, and when sending, look up the socket id and send to it.
But what if I have multiple server processes running? I'll have to make sure the correct server, the one that the client is actually connected to, does the sending. Is that possible?
For multiple server instances, you need to have a caching service (memcached, Redis) for authentication and a central message queue service (StormMQ, RabbitMQ, AQ, a Java-based MQ) to which all your Node instances bind. A Node instance binds to the message queue for each client / channel / whatever, and all the other bound Node instances receive the messages and forward them to the client.
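As a hedged sketch with RabbitMQ via amqplib (the exchange name is illustrative): a fanout exchange gives every Node instance a copy of each outgoing message, and only the instance that actually holds the target socket delivers it:

    const amqp = require('amqplib');

    async function bindToBus(io) {
      const conn = await amqp.connect('amqp://localhost');
      const ch = await conn.createChannel();

      // Fanout: every Node instance gets its own exclusive queue bound to the
      // same exchange, so each one sees every message.
      await ch.assertExchange('outgoing', 'fanout', { durable: false });
      const { queue } = await ch.assertQueue('', { exclusive: true });
      await ch.bindQueue(queue, 'outgoing', '');

      ch.consume(queue, (msg) => {
        const { socketId, payload } = JSON.parse(msg.content.toString());
        io.to(socketId).emit('message', payload); // no-op on the other instances
        ch.ack(msg);
      });
    }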
The problem is typically about how to play with a WebSocket cluster:
Several front-end servers which will be in charge of handling bidirectional connections with each client. They form the WebSocket cluster.
Several back-end servers which will be in charge of handling the business logic of your application.
Each time the back-end wants to inform the client, it will send a request to the WebSocket cluster which has the responsibility to communicate with the client.
A possible scenario:
Identify each WebSocket cluster's server with a unique id.
Identify each client with a unique id.
Each time a client connects to one of your WebSocket cluster's servers, store its unique id along with the server's unique id in a distributed key/value-like database.
Thus you know which client is connected with which server.
The next time your back-end application wants to notify a client there are two possibilities:
The pair (clientId, serverId) is not present in the database and you cannot inform the client.
The pair (clientId, serverId) is present in the database; then you have to ask the server identified by serverId to notify the client identified by clientId.
Notes:
Each WebSocket cluster's server can run a node.js instance supercharged with socket.io. It has to provide a route which takes the clientId as a parameter and uses socket.io to notify this client. Indeed, socket.io is aware of which client is using which socket on this server.
Every time a server crashes, you have to clean your database and remove all pairs which contain that server's id.
Deploying a WebSocket cluster can be tedious, so you have commercial offers like Kaazing.
A good distributed key/value database is Riak. It is better than Redis or Memcached for the above purpose because it can easily be distributed within a data center and across several data centers.
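Putting the scenario above into a rough sketch (using Redis as the shared key/value store purely for brevity, although the answer recommends Riak; the /notify route, SERVER_ID, and clientId query parameter are all hypothetical):

    const express = require('express');
    const Redis = require('ioredis');
    const { Server } = require('socket.io');

    const SERVER_ID = process.env.SERVER_ID; // unique id of this WebSocket server
    const kv = new Redis({ host: 'kv.internal' }); // the shared key/value store
    const io = new Server(3000);

    io.on('connection', async (socket) => {
      const clientId = socket.handshake.query.clientId; // unique client id
      socket.join(clientId); // lets us target this client by id later
      await kv.set(`client:${clientId}`, SERVER_ID); // who is connected where
      socket.on('disconnect', () => kv.del(`client:${clientId}`));
    });

    // The back-end looks up client:<clientId>, finds serverId, and calls that
    // server's /notify route:
    const app = express();
    app.use(express.json());
    app.post('/notify/:clientId', (req, res) => {
      io.to(req.params.clientId).emit('notification', req.body);
      res.sendStatus(204);
    });
    app.listen(8080);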