I have the following dilemma:
I need to send a heartbeat message every 5 minutes (or less) to all users of my app
I thought about topic messaging, but the 1 million subscriber limit is not acceptable for my application
So: the only possibility left is sending out the message in batches of 1000
This is really resource intensive
Now my question:
How can I make this process of batching and sending really efficient? Is there a good solution already made, preferably in node.js?
Thank you,
Sebastian
You may use XMPP instead of HTTP.
As Google says, it is less resource-intensive compared to HTTP:
The asynchronous nature of XMPP allows you to send more messages with
fewer resources.
Also, you can have 1000 simultaneous connections per app (sender ID):
For each sender ID, GCM allows 1000 connections in parallel.
There is also a node-xmpp library available for this.
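For reference, a minimal sketch of a CCS connection with node-xmpp might look like this (the endpoint and stanza format are taken from Google's CCS documentation; the sender ID, API key, and registration token are placeholders):

// Rough sketch of a GCM Cloud Connection Server (CCS) client using node-xmpp.
var xmpp = require('node-xmpp');

var client = new xmpp.Client({
  jid: 'YOUR_SENDER_ID@gcm.googleapis.com',
  password: 'YOUR_API_KEY',
  host: 'gcm.googleapis.com',
  port: 5235,
  legacySSL: true
});

client.on('online', function () {
  // One downstream message; GCM expects the JSON payload wrapped
  // in a <gcm xmlns="google:mobile:data"> element.
  var stanza = new xmpp.Element('message', { id: '' }).c('gcm', {
    xmlns: 'google:mobile:data'
  }).t(JSON.stringify({
    to: 'DEVICE_REGISTRATION_TOKEN',
    message_id: 'heartbeat-' + Date.now(),
    data: { type: 'heartbeat' }
  }));
  client.send(stanza);
});

client.on('stanza', function (stanza) {
  // ACKs/NACKs from CCS arrive here; handle flow control in production.
  console.log('received:', stanza.toString());
});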
I'm using the Azure Service Bus to send messages, and I was wondering whether using a SessionId will affect the sending speed compared to the case where I don't use it.
I know that a SessionId will preserve ordering, but what about the overall speed?
Thanks
Sending a message will not be much slower when you specify a session ID. Processing will be, but that's the wrong comparison to make. You can't compare handling sessionless messages with multiple concurrent consumers against handling sessioned messages, where the intent is to process the messages in the order they were sent. These are different business requirements with different justifications, right? If you plan to use sessions, processing will be somewhat slower because only a single active consumer can process the messages from a given session, and that trade-off should be backed by a requirement.
Take, for example, handling items scanned at a grocery checkout. If you want to know what items are purchased in general, competing consumers are the way to go. However, if you want to know what items were bought per purchase, you can't use competing consumers and have to use sessions to ensure only the items for a given purchase are included and nothing else. Will the latter be somewhat slower? Yes, but you can't accomplish it with competing consumers, and if the business wants it, it will accept the cost of slightly slower processing to gain the insights. Note that there are always multiple ways to solve a problem, and maybe sessions are not what's needed at all.
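For illustration, here is a minimal sketch of the grocery scenario using the modern @azure/service-bus Node package (the connection string and queue name are placeholders, and the queue must be created with sessions enabled):

// One session per purchase: a single consumer drains each purchase's
// items in the order they were scanned.
const { ServiceBusClient } = require("@azure/service-bus");

const sbClient = new ServiceBusClient(process.env.SERVICEBUS_CONNECTION_STRING);

async function recordScan(purchaseId, item) {
  const sender = sbClient.createSender("checkout-items");
  // Every item from one purchase shares a sessionId, preserving order.
  await sender.sendMessages({ body: item, sessionId: purchaseId });
  await sender.close();
}

async function processPurchase(purchaseId) {
  // Locks the session: only this receiver sees this purchase's messages.
  const receiver = await sbClient.acceptSession("checkout-items", purchaseId);
  const messages = await receiver.receiveMessages(100, { maxWaitTimeInMs: 5000 });
  for (const msg of messages) {
    console.log("item:", msg.body);
    await receiver.completeMessage(msg);
  }
  await receiver.close();
}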
So, we're designing a new micro-service architecture. One of the biggest challenges is internal communication. For communication where a response is required, we're using REST APIs. But for services that just want to relay information, this API processing is unnecessary overhead.
One way is to use a queue. Service1 pushes the information into a queue, and service2 consumes from there, so service1 doesn't have to wait (unlike with an API call). (If there is any error in processing the information, service2 can inform service1 via a callback URL, or in some other way; this is not a concern at this point [1].)
Now with a queue, there are two options: one is RabbitMQ, the other is AWS SQS. With RabbitMQ I have to worry about server setup and everything (which can be done, but I want to avoid it). After a POC, SQS seems like a good option, but SQS internally uses REST APIs to communicate with AWS servers, so there will be overhead at both points (service1 when pushing, service2 when consuming). So now I'm thinking: why not do it in NodeJS? Service1 will hit service2 with the information, and service2 will respond immediately, acknowledging that it has received the information; if there is any error, then [1].
Now, the pros and cons I could summarise are:
RabbitMQ
Pros: easy to implement; if the receiver is unavailable, the sender doesn't have to worry about retrying.
Cons: server setup cost + maintenance (+ tuning).
SQS
Pros: easiest to implement.
Cons: pricing; constant polling for messages; overhead at push/receive.
Non-blocking APIs
Pros: no third medium required for communication; less overhead relative to SQS.
Cons: service1 has to manage the retry mechanism; information stays in memory until processed.
So to sum up, my question is: is it a good idea to go with non-blocking APIs? Or which approach will be better in terms of making the system scalable?
Edit -
Can a PubSub provider like PubNub or Pusher be used instead of a queue?
SQS uses XML over HTTP, RabbitMQ uses AMQP; all protocols have overhead, and serializing/deserializing has a cost. Both Amazon SQS and AMQP are very efficient. I would exclude these "overheads" from your calculations and instead focus on your other requirements.
One of the big advantages of using a queue is the handling of surge activity. If you get 100K hits, and need to send 100K messages, and you try to implement this as inter-service calls (non-blocking or otherwise), you will hit real limits on the scalability of your system (from a port count if nothing else). If you instead put 100K messages on a queue, those messages can be processed basically at the remote server's "leisure".
Additionally, as you mentioned above, queues give you a persistence that is much more difficult to implement on your own. If your data is not critical, this is not a big concern, but if the data is of higher importance, you really want something that pushes to a persistent store (like SQS, or RabbitMQ persistent queues).
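As a sketch of that surge-absorbing setup with the aws-sdk Node package (the region and queue URL are placeholders):

// service1 relays work by durably parking it on SQS; service2 drains
// the queue at its own pace, so a 100K burst doesn't exhaust ports.
const AWS = require('aws-sdk');
const sqs = new AWS.SQS({ region: 'us-east-1' });
const QUEUE_URL = 'https://sqs.us-east-1.amazonaws.com/123456789012/service2-inbox';

function relay(info) {
  // Resolves as soon as SQS has stored the message.
  return sqs.sendMessage({
    QueueUrl: QUEUE_URL,
    MessageBody: JSON.stringify(info)
  }).promise();
}

// Consumer side (service2): long polling softens the "constant polling" con.
async function drain() {
  const res = await sqs.receiveMessage({
    QueueUrl: QUEUE_URL,
    MaxNumberOfMessages: 10,
    WaitTimeSeconds: 20 // long poll instead of hammering the API
  }).promise();
  for (const msg of res.Messages || []) {
    // ... process JSON.parse(msg.Body) here ...
    await sqs.deleteMessage({
      QueueUrl: QUEUE_URL,
      ReceiptHandle: msg.ReceiptHandle
    }).promise();
  }
}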
I am late here, but of late I have started working with non-blocking I/O and see a great benefit to it, especially when you are calling external services that cannot be given access to a message queue. Using a fixed connection pool ensures that the 100K problem is handled with non-blocking I/O without creating too many connections.
When calling internal services a message queue is preferred, but let's say you do not have that option: you can leverage non-blocking I/O with a retry mechanism and connection pooling to get the same scalability a message queue would give you. This assumes that the receivers are able to handle the load of the non-blocking calls.
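A minimal sketch of that idea in Node, assuming the receiving service exposes a plain HTTP endpoint (host, port, path, and pool size are placeholders):

const http = require('http');

// keepAlive reuses sockets; maxSockets caps concurrent connections, so a
// burst of requests queues inside the agent instead of exhausting ports.
const agent = new http.Agent({ keepAlive: true, maxSockets: 100 });

function send(payload, retriesLeft = 3) {
  const req = http.request({
    agent,
    method: 'POST',
    host: 'service2.internal',
    port: 8080,
    path: '/events',
    headers: { 'Content-Type': 'application/json' }
  }, res => res.resume()); // drain the response so the socket is reused

  req.on('error', () => {
    if (retriesLeft > 0) {
      // Crude fixed backoff; a real retry mechanism would also persist
      // the payload somewhere, since it only lives in memory here.
      setTimeout(() => send(payload, retriesLeft - 1), 1000);
    }
  });
  req.end(JSON.stringify(payload));
}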
We are using socketIO on a large chat application.
At some points we want to dispatch "presence" (user availability) to all other users.
io.in('room1').emit('availability:update', { userid: 'xxx', isAvailable: false });
room1 may contain a lot of users (500 max). We observe a significant rise in our NodeJS load when many availability updates are triggered.
The idea was to use something similar to the Redis store for Socket.IO: have web browser clients connect to different NodeJS servers.
When we want to emit to a room, we dispatch the "emit to room1" payload to all other NodeJS processes using Redis Pub/Sub, ZeroMQ, or even RabbitMQ for persistence. Each process then calls its own io.in('room1').emit to target its subset of connected users.
One of the concerns with this setup is that the inter-process communication may become quite busy, and I was wondering if it may become a problem in the future.
Here is the architecture I have in mind.
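In code, the dispatch side would look roughly like this (a sketch using node_redis; the channel name is arbitrary, and io is each process's Socket.IO server):

var redis = require('redis');
var pub = redis.createClient();
var sub = redis.createClient(); // a subscribed connection cannot publish

// Instead of emitting directly, publish the intent to every process.
function broadcastAvailability(userid, isAvailable) {
  pub.publish('socketio:dispatch', JSON.stringify({
    room: 'room1',
    event: 'availability:update',
    data: { userid: userid, isAvailable: isAvailable }
  }));
}

// Every NodeJS process (including the publisher) relays the event
// to its own subset of connected sockets.
sub.subscribe('socketio:dispatch');
sub.on('message', function (channel, raw) {
  var msg = JSON.parse(raw);
  io.in(msg.room).emit(msg.event, msg.data);
});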
Could you batch changes and only distribute them every 5 seconds or so? In other words, on each node server, simply take a 'snapshot' every X seconds of the current state of all users (e.g. 'connected', 'idle', etc.) and then send that to the other relevant servers in your cluster.
Each server then does the same: every 5 seconds or so it sends only the changes in user state, as one batched object array, to all connected clients.
Right now, I'm rather surprised you are attempting to send information about each user as its own packet. Batching seems like it would solve your problem quite well, as it would also make better use of the standard packet sizes that routers and switches normally transmit.
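A minimal sketch of that batching idea (the event name and the 5-second interval are arbitrary, and io is your Socket.IO server):

var pendingChanges = {}; // userid -> latest state, so repeated flaps collapse

function onPresenceChange(userid, isAvailable) {
  pendingChanges[userid] = { userid: userid, isAvailable: isAvailable };
}

setInterval(function () {
  var ids = Object.keys(pendingChanges);
  if (ids.length === 0) return;
  var batch = ids.map(function (id) { return pendingChanges[id]; });
  pendingChanges = {};
  // One packet carrying all deltas instead of one emit per user.
  io.in('room1').emit('availability:batch', batch);
}, 5000);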
You are looking for this library:
https://github.com/automattic/socket.io-redis
Which can be used with this emitter:
https://github.com/Automattic/socket.io-emitter
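Typical wiring, following those two READMEs (the Redis host and port are placeholders):

var io = require('socket.io')(3000);
var redisAdapter = require('socket.io-redis');
io.adapter(redisAdapter({ host: 'localhost', port: 6379 }));
// From here, io.in('room1').emit(...) reaches sockets on every node.

// And from a process that isn't running socket.io (worker, cron job, etc.):
var emitter = require('socket.io-emitter')({ host: 'localhost', port: 6379 });
emitter.in('room1').emit('availability:update', { userid: 'xxx', isAvailable: false });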
Regarding the available-users function, I think there are two alternatives: you can create a "users queue" that holds the "public data" of connected users, or you can use the exchanges' binding information to show which users are connected. If you use a users queue, it would be the same for each "room", and you would update it when a user leaves by popping their state message from the queue (although you would then have to "reorganize" the whole queue for it).
Nevertheless, I think RabbitMQ is designed for asynchronous communication, and it is not a very good fit for keeping a register of which users are present. It is better for applications where you don't know when the user will receive the message or what their "real availability" is ("fire and forget" architectures). ZeroMQ requires more work from scratch, but you could implement something more specific to your situation with better performance.
The publish/subscribe example on the RabbitMQ site could be a good starting point for a design like yours, where a message is sent to several users at the same time. In summary, I would create two queues per user (one for receiving and one for sending messages) and use a specific exchange for each chat room, controlling which users are in each room through the exchange bindings. You always have two queues per user, and you create exchanges and bind them to one or more chat rooms.
I hope this answer is useful to you; sorry for my bad English.
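As a rough sketch of that two-queues-plus-bindings idea with the amqplib Node client (all names are placeholders, and this is one possible shape rather than a full design):

const amqp = require('amqplib');

async function joinRoom(userId, room) {
  const conn = await amqp.connect('amqp://localhost');
  const ch = await conn.createChannel();
  // One fanout exchange per room; room membership is just a binding.
  await ch.assertExchange('room.' + room, 'fanout', { durable: false });
  const q = await ch.assertQueue('user.' + userId + '.receive', {});
  await ch.bindQueue(q.queue, 'room.' + room, '');
  ch.consume(q.queue, function (msg) {
    if (msg === null) return; // consumer was cancelled
    console.log('to %s: %s', userId, msg.content.toString());
    ch.ack(msg);
  });
}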
This is the common approach for sharing data across several Socket.IO processes. You have done well so far with a single process and a single thread. I would assume that you could pick any of the mentioned technologies for communicating shared data without hitting performance issues.
If all you need is IPC, you could perhaps have a look at Faye. If, however, you need some data to be persisted, you could start a Redis cluster with as many Redis masters as you have CPUs, though this will add minor networking noise for Pub/Sub.
I have no clue if it's better to ask this here, or over on Programmers.SE, so if I have this wrong, please migrate.
First, a bit about what I'm trying to implement. I have a node.js application that takes messages from one source (a socket.io client), and then does processing on the message, which might result in zero or more messages back out, either to the sender, or other clients within that group.
For the processing, I would like to essentially just shove the message into a queue, then it works its way through various message processors that might kick off their own items, and eventually, the bit running socket.io is informed "Hey, send this message back"
As a concrete example, say a user signs into the service. That sign-in message is placed in the queue, where the authorization processor gets it, does its thing, then places a message back in the queue saying the client has been authorized. This goes back to the socket.io socket that is connected to the client, along with other clients that might be interested. It can also go to other subsystems that might want to do more processing on authorization (looking up user info, sending more info to the client based on their data, etc.).
If I wanted strong coupling, this would be easy, but I tried that before, and it just devolves into a mess of spaghetti code that's very fragile, which I would like to avoid. Another wrench in the setup is that this should be cluster-able, which is where the real problem comes in. There might be more than one, say, authorization processor running, but an authorization message should be processed only once.
So, in short, I'm looking for a pattern/technique that will allow me to, essentially, have multiple "groups" of subscribers for a message, and the message will be processed only once per group.
I thought about maybe having each instance of a processor generate a unique name that would be used as a list in Redis. This name would then be registered with some sort of dispatch handler and placed into a set for that group of subscribers. Then, when a message arrives, the dispatcher pulls a random member out of that set and pushes the message onto that member's list. While it seems like this would work, it seems somewhat over-complicated and fragile.
The core problem is I've never designed a system like this, so I'm not even sure the proper terms to use or look up. If anyone can point me in the right direction for this, I would be most appreciative.
I think what you're describing is similar to the https://www.getbridge.com/ service. I tried it but ended up writing my own on top of ZeroMQ; it allows you to register services (request/reply) and channels, which are pub/sub workers.
As for the design, I used client -> broker -> services & channels, all plug-and-play using auto-discovery: services register their schema with the brokers, which open a TCP connection so that brokers on other servers can communicate with that broker group's services. Internal services and clients then connect via Unix sockets or IPC channels, whichever is preferred.
I ended up wrapping the redis publish/subscribe functions a bit to do this. Each type of message processor gets a "group name", and there can be multiple instances of a processor within that group (so multiple instances of the program can run for clustering).
When publishing a message, I generate an incremental ID, store the message in a string key under that ID, then publish the message ID.
On the receiving end, the first thing the subscriber does is attempt to add the message ID it just received to a set of seen messages for its group, using sadd. If sadd returns 0, the message has already been grabbed by another instance, and the subscriber just returns. If it returns 1, the full message is pulled out of the string key and handed to the listener.
Of course, this relies on redis being single-threaded, which I imagine will continue to be the case.
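A condensed sketch of that wrapper, using node_redis (the key and channel names are placeholders, and handleMessage stands in for the listener callback):

var redis = require('redis');
var client = redis.createClient();
var sub = redis.createClient();

// Publisher: store the payload under an incremental ID, publish only the ID.
function publishToGroup(channel, message) {
  client.incr('msg:id', function (err, id) {
    client.set('msg:' + id, JSON.stringify(message), function () {
      client.publish(channel, id);
    });
  });
}

// Subscriber: sadd is atomic, so exactly one instance in the group "wins".
var GROUP = 'auth-processors';
sub.subscribe('signin');
sub.on('message', function (channel, id) {
  client.sadd('seen:' + GROUP, id, function (err, added) {
    if (added === 0) return; // another instance already claimed this ID
    client.get('msg:' + id, function (err, raw) {
      handleMessage(JSON.parse(raw)); // hypothetical listener callback
    });
  });
});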
What you might be looking for is an AMQP protocol implementation, where you can have queues with custom exchanges and implement a pub/sub model.
RabbitMQ is a popular AMQP protocol implementation with lots of libraries;
it also has a node.js library.
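A minimal publisher with the amqplib Node client, along the lines of the RabbitMQ publish/subscribe tutorial (the exchange name is arbitrary):

const amqp = require('amqplib');

async function publish(event) {
  const conn = await amqp.connect('amqp://localhost');
  const ch = await conn.createChannel();
  await ch.assertExchange('events', 'fanout', { durable: false });
  // Empty routing key: a fanout exchange copies to every bound queue.
  ch.publish('events', '', Buffer.from(JSON.stringify(event)));
  await ch.close();
  await conn.close();
}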
Since Azure Service Bus limits the maximum number of concurrent connections to a Queue or Topic to 100, is there a method that we can use to query our Queues/Topics to determine how many concurrent connections there are?
We are aware that we can capture the throttling events, but would very much prefer an active approach, where we can proactively increase or decrease the number of Queues/Topics when the system is under a heavy load.
The use case here is a process waiting for a reply message, where the reply is coming from a long-running process, and the subscription is using a Correlation Filter to facilitate two-way communication between the Publisher and Subscriber. Thus, we must have a BeginReceive() going in order to await the response, and each such Publisher will be consuming a connection for the duration of their wait time. The system already balances load across multiple Topics, but we need a way to be proactive about how many Topics are created, so that we do not get throttled too often, but at the same time not have an excess of Topics for this purpose.
I don't believe it is currently possible to query the listener counts. I think the subscription count also figures into this: in theory, with up to 2,000 subscriptions per topic and each allowing up to 100 connections, that's a lot of potential connections. We just need to keep in mind that subscriptions are cooperative (each gets a copy of all messages) while receivers on a subscription are competitive (only one gets a given message).
I've also seen unconfirmed reports of performance delays when you run more than 1,000 subscribers, so make sure you test this scenario.
But... given your scenario, I'd deduce that processing time likely isn't the biggest factor (you have long-running processes already), so introducing a couple of seconds of lag into the workflow likely won't be critical. If that's the case, I'd set the timeout for your BeginReceive to something fairly short (a couple of seconds) and add a sleep/wait delay between attempts. This gives other listeners an opportunity to get messages as well. You might also consider an approach where you attempt to receive multiple messages and then hand them out to other processes for processing (correlating them there, in this case?).
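The shape of that receive loop, sketched in Node-style code (receiveOnce is a hypothetical stand-in for your client's receive call, e.g. a BeginReceive with a short timeout; it is not a real API):

// Hypothetical stand-in for the SDK's receive call; replace it with
// whatever your client library actually provides.
function receiveOnce(timeoutMs) {
  return Promise.resolve(null); // placeholder
}

async function awaitReply(correlationId) {
  for (;;) {
    const msg = await receiveOnce(2000); // short timeout, then give up
    if (msg && msg.correlationId === correlationId) return msg;
    // Pause between attempts so other listeners get a turn on the
    // limited pool of concurrent connections.
    await new Promise(function (resolve) { setTimeout(resolve, 2000); });
  }
}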
Just some thoughts.