I want to log a public chat to a database so I can look up chat history in the future.
I am familiar with node.js and mongoDB.
I don't want to miss any chat messages, so I was looking for a redundant solution in case of a network disconnect or a server failure/restart.
Everything I've seen regarding failover and balancing treats the node app as an HTTP server, so it can be solved with a reverse proxy sending requests to the different servers.
But I'm at a loss as to how to have 2+ VPSs in different regions, each running a node app that monitors the same public chat and logs those chat entries to a DB, without race conditions on the DB.
Messaging between the node instances? But it seems like there would also be race conditions with that...
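For example, I was imagining both loggers could just attempt the same write and let a unique key make the duplicate a no-op; a minimal sketch, assuming each message carries a stable id from the chat platform (all names here are made up):

    // Sketch: use the chat platform's message id as _id, so a duplicate
    // insert from the second logger fails harmlessly instead of racing.
    const { MongoClient } = require('mongodb');

    async function main() {
      const client = await MongoClient.connect('mongodb://localhost:27017');
      const db = client.db('chatlog');

      async function logMessage(msg) {
        try {
          await db.collection('chat_log').insertOne({
            _id: msg.id, // assumed stable, platform-provided message id
            user: msg.user,
            text: msg.text,
            ts: msg.ts,
          });
        } catch (err) {
          if (err.code !== 11000) throw err; // 11000 = duplicate key: the other logger won
        }
      }
    }

But I'm not sure that covers every failure mode, hence the question.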
Thanks for the help.
Okay, so "multithreading" Node.js isn't much of a problem from what I've been reading: just deploy several identical apps and use nginx as a reverse proxy and load balancer across all of them.
But I found the native cluster module actually works pretty well too.
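For reference, the bare-bones cluster version looks roughly like this (the port is arbitrary):

    // Minimal native cluster setup: fork one worker per CPU core;
    // all workers share the same listening port.
    const cluster = require('cluster');
    const http = require('http');
    const os = require('os');

    if (cluster.isMaster) {
      os.cpus().forEach(() => cluster.fork());
      cluster.on('exit', (worker) => {
        console.log(`worker ${worker.process.pid} died, forking a new one`);
        cluster.fork(); // simple self-healing
      });
    } else {
      http.createServer((req, res) => {
        res.end(`handled by pid ${process.pid}\n`);
      }).listen(3000);
    }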
However, what if I have socket.io with the Node.js app? I tried the same strategy with Node.js + Socket.IO, but it did not work, because emitted socket events get distributed more or less evenly across the instances, and any instance other than the one holding the connection has no idea where the request came from.
So the best method I can think of right now is to separate the Node.js server and the Socket.IO server: scale the Node.js server horizontally (multiple identical apps) but keep just one Socket.IO server. Although I believe that would be enough for our purposes, I still need to plan for the future. Has anyone succeeded in horizontally scaling Socket.IO across multiple processes?
The guidelines on the socket.io website use Redis, with a package called socket.io-redis:
https://socket.io/docs/using-multiple-nodes/
It looks like it just acts as a single pool for the connections, and each node instance connects to that.
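From those docs, the wiring is roughly this (host/port assumed to be a local Redis):

    // Each instance attaches the Redis adapter; broadcasts are relayed
    // through Redis pub/sub to the sockets held by every other instance.
    const io = require('socket.io')(3000);
    const redisAdapter = require('socket.io-redis');

    io.adapter(redisAdapter({ host: 'localhost', port: 6379 }));

    io.on('connection', (socket) => {
      socket.on('chat message', (msg) => {
        io.emit('chat message', msg); // now reaches clients on every instance
      });
    });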
Putting your socket server in a separate service (micro-service) is probably fine; the downside is needing to manage communication between the two instances.
I am working on a WebRTC application where a P2P connection is established between a customer and free agents. The agents are fetched via an AJAX call in the application. I want to scale the application so that, no matter which node server the agents are running on, they have a communication mechanism and agent status updates (available, busy, unavailable) can be performed.
My problem statement: the application is running on port 8040 and the agents service is running on port 8088, where the application makes AJAX calls to fetch the data. What is the best way to scale the agents, or any idea how to scale the application?
I followed https://github.com/rajaraodv/redispubsub using Redis pub/sub, but my problem is not resolved, as the agents are updated and fetched on another node via AJAX calls.
You didn't give enough info... but to scale your node.js app you need a central place that holds all the needed info and can itself scale. Redis scales easily; you can also try socket.io, etc.
Now, once you have your Redis cluster, for example, you make all your node.js instances communicate with the Redis server. That way all your node servers have access to the same info, and it's up to you to send the right info to the right clients.
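For example, a rough sketch with node_redis, where the key and field names are just illustrative:

    // Sketch: every node server reads/writes agent status in one shared
    // Redis, so it doesn't matter which instance serves the AJAX call.
    const redis = require('redis');
    const client = redis.createClient(); // defaults to localhost:6379

    // mark an agent available/busy/unavailable
    function setAgentStatus(agentId, status, cb) {
      client.hset('agents:status', agentId, status, cb);
    }

    // fetch all agent statuses for the AJAX response
    function getAgents(cb) {
      client.hgetall('agents:status', cb);
    }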
Message Bus approach:
The AJAX call will be sent to one of the node.js servers. If the message doesn't find its destination on that server, it is forwarded to the next one, and so on. So the signaling server must distribute the received message to all the other nodes in the cluster by establishing a message bus.
I've gotten some great feedback from Stackoverflow and wanted to check on one more idea.
Currently I've got a webapp that runs node.js on a PaaS (Heroku, and I'm trying out Bluemix). The server is being configured to talk to a CouchDB (hosted on Cloudant). There are two types of data saved to the db: first, user data (each user will have its own database), and second, the app data itself (metrics, user account info, auth/admin stuff).
After some great feedback from here, the idea is that after the user logs in, they will sync their local (browser) PouchDB instance with Cloudant (probably proxied through my server, as was recommended here).
Now the question: for the app/admin data, maybe I run a CouchDB instance on my server so I'm not making repeated network calls for things like user logins, metrics data, etc. The data would not be very big, and it is already separated from the user data calls. The point is to have a faster, local instance mainly for authentication; changes/updates get synced outside of user requests.
The backend is the Express web framework, and it looks like my option is PouchDB... to sync to the Cloudant instance?
If I want local db access (backed by a CouchDB instance) on a node/express server running on a PaaS, is that the recommended setup?
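Concretely, what I'm picturing on the server is something like this (the URL is a placeholder):

    // What I have in mind: an in-process PouchDB for fast local reads,
    // continuously replicating with the Cloudant-hosted CouchDB.
    const PouchDB = require('pouchdb');

    const local = new PouchDB('appdata'); // stored on the server's disk by default
    const remote = 'https://USER:PASS@ACCOUNT.cloudant.com/appdata'; // placeholder

    local.sync(remote, { live: true, retry: true })
      .on('error', (err) => console.error('sync error', err));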
Thanks vm for any feedback,
Paul
Not sure if you found a solution, but this is what I would try.
Because Heroku clears any temp data, you wouldn't be able to run a default express-pouchdb database; you will need to change PouchDB from using the file system to using a LevelDOWN adapter. (Link to PouchDB adapters: https://pouchdb.com/adapters.html)
Some of these adapters would include:
https://github.com/watson/mongodown
https://github.com/kesla/mysqldown
https://github.com/hmalphettes/redisdown
You can easily get a Heroku mongo, mysql, or redis addon and connect that to your express-pouchdb backend.
This way you will be able to keep your data.
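A rough sketch of that setup with redisdown (the env var names assume the Heroku Redis addon):

    // Sketch: express-pouchdb backed by Redis via redisdown, so the data
    // survives Heroku's ephemeral filesystem.
    const express = require('express');
    const PouchDB = require('pouchdb');

    // every database created through this constructor stores its data in Redis
    const RedisPouchDB = PouchDB.defaults({
      db: require('redisdown'),
      url: process.env.REDIS_URL, // e.g. set by a Heroku Redis addon
    });

    const app = express();
    app.use('/db', require('express-pouchdb')(RedisPouchDB));
    app.listen(process.env.PORT || 3000);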
I'm using node.js and socket.io to deliver a chat on my business app, but I want to distribute the deployment so I can have as many chat servers as I want, to balance the traffic load.
I tried the load-balancing approach with nginx, but that only balances the traffic; the socket.io servers don't talk to each other, so a chat message sent from user A to server S1 won't travel to user B on server S2.
Is there any tool or approach to do this?
Thanks in advance.
===== EDIT =====
Here is the architecture of the app.
The main app frontend, in PHP CodeIgniter; let's tag it as PHPCI.
The chat app backend, in NodeJs and SocketIO; let's tag it as CHAT.
The chat model data, in Redis; let's tag it as REDIST.
So what I have now is PHPCI -> CHAT -> REDIST. That works just fine.
What I need is to distribute the application so I can have as many PHPCI or CHAT or REDIST instances as I want, for example:
PHPCI1 \
PHPCI2  ->  CHAT1 or CHAT2  ->  REDIST1
PHPCI3 /
Where the numbers represent instances, not different apps.
So a user A connected to PHPCI1 can send a message to a user B connected on PHPCI3.
I think some queue in the middle of CHAT could handle this, something like RabbitMQ, with SocketIO used only to deliver the messages to the client.
If you're distributing the server load (and that's a requirement), I'd suggest adding a designated chat data server (usually an in-memory database or message queue) to handle chat state and message passing across edge servers.
Redis Pub/Sub is ideal for this purpose, and can scale up to crazy levels on even a low-end machine. The Redis Cookbook has a chapter on precisely this use case.
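A stripped-down sketch of the idea, with an arbitrary channel name:

    // Each CHAT instance publishes outgoing messages to Redis and relays
    // anything it hears back to its own sockets, so a message from a user
    // on server S1 reaches users connected to server S2.
    const redis = require('redis');
    const io = require('socket.io')(3000);

    const pub = redis.createClient();
    const sub = redis.createClient(); // a subscriber connection can't issue other commands

    sub.subscribe('chat');
    sub.on('message', (channel, raw) => {
      io.emit('chat message', JSON.parse(raw)); // deliver to sockets on THIS instance
    });

    io.on('connection', (socket) => {
      socket.on('chat message', (msg) => {
        pub.publish('chat', JSON.stringify(msg));
      });
    });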
If you set up the server side of your chat app correctly, you shouldn't have to distribute socket.io yourself. The socket.io client is browser-based and doesn't require any pre-installed client-side code (other than the resources downloaded from the webpage), so it works automatically: when the socket.io client files are correctly included in a webpage, they are downloaded to users on the fly (just like jQuery). If you are using node.js and socket.io to make an Android app, the files should be bundled into your application when you distribute it, not shipped separately.
In addition, if you wish to use two separate socket.io servers, you should be able to establish communication between the two by connecting them much like a client connects to a server, but with a special parameter that tells the receiving server that another server has connected, so it can respond and track that connection separately.
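Something like this sketch with socket.io-client, where the role query parameter is an invented convention the receiving server would have to check:

    // Server A dials Server B like an ordinary client, but flags itself
    // as a server; the 'role' parameter is an invented convention.
    const ioClient = require('socket.io-client');

    const peer = ioClient('http://server-b.example.com:3000', {
      query: { role: 'server' },
    });
    peer.on('connect', () => peer.emit('relay', { text: 'hello from server A' }));

    // ...and on Server B, inside io.on('connection', ...):
    //   if (socket.handshake.query.role === 'server') { /* treat as a peer */ }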
I am new to node.js, so I need your help validating the approach below.
Problem: I am developing a node.js application which broadcasts messages to people who have subscribed to a topic. If the user is logged into the application, via web or mobile, I want to use socket.io to push new messages as and when they are created. As I mentioned, I need to push the messages to a selected list of logged-in users based on certain filters; the message is not pushed to everyone logged in, only to the users matching the filter criteria. This is an Express application.
Approach: As soon as a client makes a connection to the server, a socket is established. The socket will be allocated to a room, keyed by the login name, so if there are further requests from the same login (e.g. multiple browser windows), those sockets are also added to the same room. The login name and the room with its sockets will be stored in Redis. When a new message is created, internal application logic determines the users who need to be notified; I fetch only those logins from Redis, along with the room information, and push the message to them. When the sockets are closed, I remove the Redis entry for that login... Also, this needs to be scalable, since I might use node cluster in the future.
I read a lot about the socket.io and Redis pub/sub approach, and I am not using it above; I'm just storing the logins and sockets as key-value pairs.
Can you please let me know if this is a good approach? Will there be any performance/scalability issues? Are there any better ways to do this?
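To make that concrete, this is roughly what I mean (event names are invented):

    // Sketch of the approach: every socket from the same login joins a
    // room named after the login, so a push to that room reaches all of
    // that user's windows; Redis remembers who is connected.
    const io = require('socket.io')(3000);
    const redis = require('redis');
    const client = redis.createClient();

    io.on('connection', (socket) => {
      socket.on('login', (loginName) => {
        socket.join(loginName);            // room per login
        client.sadd('logins:online', loginName);
      });
      // a real implementation would remove the login on 'disconnect'
      // once its last socket closes
    });

    // when internal logic decides which logins to notify:
    function notify(logins, message) {
      logins.forEach((login) => io.to(login).emit('new message', message));
    }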
Thanks a lot for all your help....
Your Redis model will have to be a little more complicated than that. You'll need to maintain an index using sets, so you can compute intersections, which can be used to find all the users in a given room. You'll then need to use Redis's pub/sub functionality to enable realtime notifications. You'll also need to store messages in indexed sets, then simply publish to inform your application that a change has been made, and send the new data from the set.
If you could provide an example I can provide some redis commands to better explain how Redis works.
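To give a rough idea in the meantime, the set/intersection part looks something like this with node_redis (key names are purely illustrative):

    // The set/index idea: one set per room and per status, with SINTER
    // to find e.g. every user who is both in a room and online.
    const redis = require('redis');
    const client = redis.createClient();

    client.sadd('room:generalChat', 'user:john'); // john joins the room
    client.sadd('status:online', 'user:john');    // john is online

    // everyone who is BOTH in generalChat AND online
    client.sinter('room:generalChat', 'status:online', (err, users) => {
      if (err) throw err;
      console.log(users); // ['user:john']
    });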
Update
This is in response to comments below.
Technologies I would use:
Nginx
Socket.io
mranney/node_redis
Redis
Scaling Redis
There are several solutions for scaling Redis. If you need higher concurrency you can scale using master-slave replication. If you need more memory you can set up partitioning, or you can use the Redis Cluster beta (3.0.0). Alternatively, you can outsource your solution to one of many Redis services (RedisGreen, RedisLabs, etc.), but this is best paired with a PaaS provider (AWS, Google Compute, Joyent) so it can be deployed in the same cloud.
Scaling Socket.io
Socket.io can be scaled using Nginx; this is pretty common practice when scaling WebSockets. You can then synchronize each node app (with socket.io) using Redis as a messaging layer (pub/sub).
You can SUBSCRIBE to a connections channel to track when a user joins or leaves; on such an event, whichever app/server fires it will PUBLISH a connections update, e.g. PUBLISH connections "user:john left". If a user leaves, as in that example, you must also remember to remove that user from the set that represents the room (e.g. generalChat), so something like SREM generalChat user:john, and then execute the PUBLISH in the callback of the SREM command. Once the PUBLISH is sent, all apps/servers connected to Redis that have subscribed will receive the message in realtime, notifying them to update. All apps/servers then broadcast to the corresponding room either a new user list (a Redis set) or a command telling the frontend to remove the user.
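Spelled out with node_redis, that leave flow is roughly (names illustrative):

    // The "user:john left" flow: remove him from the room's set, then
    // PUBLISH on the connections channel so every app/server updates.
    const redis = require('redis');
    const pub = redis.createClient();
    const sub = redis.createClient(); // subscriptions need their own connection

    sub.subscribe('connections');
    sub.on('message', (channel, msg) => {
      // every socket.io server gets this in realtime and can broadcast
      // the new user list (or a remove command) to its own room
      console.log('update:', msg); // e.g. 'user:john left'
    });

    function userLeft(user, room) {
      pub.srem(room, user, (err) => {
        if (err) throw err;
        pub.publish('connections', user + ' left'); // publish only after the SREM succeeds
      });
    }

    userLeft('user:john', 'generalChat');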
Basically all your sockets are in sync with Redis, so you can host multiple socket.io servers and use Messaging via Pub/Sub to queue actions across your entire cloud.
Examples
It's not hard to scale socket.io with Redis. Redis itself may be cumbersome to set up and scale, but it doesn't use much memory, because you manage your own relations and therefore only map the relations you specifically need. Also, you can lease cloud hosting with 8GB for $80 a month, which would support higher concurrency than the Big Boy plan from Pusher for less than half the price, and you get persistence as well, so your stack is more uniform and has fewer dependencies.
If you were to use Pusher you'd probably need a persistent storage medium like MongoDB, MySQL, Postgres, etc., whereas with Redis you can rely on it for all your data storage (excluding file storage). This would then create more traffic, depending on your implementation.
Ex 1
You can use Pusher to notify clients of changes and have them refer to the backend to populate the new/changed data.
Pusher for Messaging
Boiler Plate:
Client <== Socket Opened ==> Pusher
Client === User Left ==> Pusher
All Clients <== User left === Pusher
All Clients === Request New Data ==> Backend <==> Database
All Clients <== Response === Backend
This can create a lot of problems, and you'd have to implement timeouts. This also takes a lot of Pusher connections, which is expensive.
Ex 2
You can connect to Pusher from your backend to save the frontend from handling many requests (probably better for mobile users). This saves Pusher traffic, because it's not sending to hundreds/thousands of clients, just a handful of your backend servers.
This example assumes that you have 4 socket.io servers running.
Pusher for MQ on Backend
Boiler Plate:
Backend 1/2/3/4 <== Socket Opened ==> Pusher
Backend 1 === Remove User from room ==> Database
Backend 1 === User Leaves ==> Pusher
Backend 1/2/3/4 <== User Left === Pusher
Backend 1/2/3/4 === Get Data ==> Database
Backend 1/2/3/4 <== Receive Data === Database
Backend 1/2/3/4 === New Data ==> Room(clients)
Ex 3
You can use Redis as explained above.
Again assuming 4 socket.io servers.
Redis as MQ and datastore
Boiler Plate:
Backend 1/2/3/4 <== Connected ==> Redis
Backend 1/2/3/4 === Subscribe ==> Redis
Backend 1 === User Left ==> Redis (removes user)
Backend 1 === PUBLISH queue that user left ==> Redis
Backend 1/2/3/4 <== User Left Message === Redis
Backend 1/2/3/4 === Get New Data ==> Redis
Backend 1/2/3/4 <== New Data === Redis
Backend 1/2/3/4 === New Data ==> Room(clients)
All of these examples can be improved and optimized significantly, but I won't do that for the sake of readability and clarity.
Conclusion
If you know how Redis works, implementing this should be fairly straightforward. If you're learning Redis, you should start out a little smaller to get the hang of how it works (it's more than key:value storage). In the end, running Redis would be more cost-effective and efficient, but it would take longer to develop. Pusher would be much more expensive, add more dependencies to your stack, and wouldn't be as effective (Pusher is on a different cloud). The only advantage of using Pusher, or any similar service, is the ease of use of the platform they provide. You're essentially paying a monthly fee for boilerplate code and stack management.
Bottom Line
It would be best to reverse proxy with Nginx regardless of which stack you choose, so you can easily scale.
A Redis, Socket.io, Node.js stack would be best for large-scale projects and professional products. It will keep your operating costs down and increase your concurrency without dramatically increasing your costs as you scale.
A Redis, Socket.io (optional), Node.js, Pusher, database stack would be best for smaller projects that you don't expect much growth from. Once you get to 5,000 connections you're forking out $199/mo just for Pusher, and then you have to consider the cost of the rest of your stack. If you connect your backend to Pusher instead, you'll save money and save production time, but you'll still suffer performance hits from retrieving data from a third-party cloud.