I am looking at building an app that monitors the public transport buses for a major city:
I did a quick prototype using pubnub. The buses have a phone transmitting gps signals to a channel and bus users have phones subscribed to channels. I have questions:
I am planning for each bus route there is a channel. The city has 50 routes so there will be 50 routes. Does this adhere to the best practice?
Is there an api to list channels ?
I am sending a message to a channel every second. Assume, there are 50 routes with 5 buses each running 24 hours. There will be 216000000 daily messages. what will i be charged for a day?
Does your Android client open a network connection everytime a publish is call? I want to minimize the bandwith used by the phone that is transmitting the GPS signal.
Bus users may want to see location of multiples buses. I know best practice is to subscribe to one public and one private channel. What is the best way to do it?
I would appreciate if you could answer the above questions.
Full disclosure up front - I work for PubNub Customer Success so responses for pricing related questions are informational in nature only and not to be construed as a promotional. Asker specifically mentions PubNub and the information provided below is publicly available from the PubNub website.
Anant, also as an FYI StackOverflow would normally ask that each of these questions gets asked as a separate thread. Moving forward please do your best to adhere to community guidelines.
1 Every implementation will be different as far as the specific architecture and design pattern strategy though your proposed approach seems to be a sensible utilization of channel methodology. PubNub does not limit the total number of channels in use, however as a practical limitation for most mobile development frameworks subscribing to more than 50 channels simultaneously would be around the upper limit. Adding more than that and both iOS and Android will begin exhibiting performance limitations. If new bus lines are added the subscriptions can be managed to only subscribe to nearby routes, etc.
Question 1 the second with the indent. Yes that can be done with the here_now API
2 PubNub charges $1 per million messages (without SSL enabled) so based on your hypothetical your message charges would be $216 per day. That being said, there is significant room here for design pattern optimization so that busses only publish a new location whenever there is a change - repeated publishes while the bus is standing still are unnecessary. This optimization on it's own will bring the message usage figure down significantly, and there are other strategies which can be utilized to further optimize depending on your specific implementation approach. If you anticipate needing more than 1 billion messages per month, a deployment to Global Cloud would make sense so as to avail yourself of volume discount pricing not otherwise available on Go Cloud.
3 Rather than opening a new connection with every publish, PubNub keeps an active socket connection open until unsubscribed or disconnected via loss of network connection/app force close. The bandwidth utilization to keep this connection active over a period of several hours and absent any other publish/subscribe activity typically measures less than 1K depending on your configuration parameters. Android supports background threading so even when the app is not in focus the connection can remain open to facilitate data push alerts which can be used to prompt the user to bring the app back into the foreground to review any updated information.
4 This question is not clear, assuming that the bus locations are published to the public channel what would the purpose of the private channel serve? If you meant a private channel to receive alerts for the arrival of the user's selected bus, then yes that would be an appropriate implementation strategy. Please clarify if you meant something different.
Related
We are designing a system where users can exchange "messages" (let's say XML files for simplicity sake). This system is peer to peer by design - meaning only directed messages are supported. User A can only send message to User B, it is not possible to send messages to "groups" of users etc. FIFO order is mandatory requirement as well.
This must be a reliable solution - so we started looking into Azure and its services. And Service Bus does look like the right solution to me. It offers all bells and whistles we are looking for:
FIFO order is guaranteed
Dead-letter queue with timeouts
Geo-redundancy
Transactions
and so on
So naturally, I started playing with it. And the first idea I had was to give each user of my system a QUEUE from the service bus. It will act as an INBOX for them. Other users send messages to the user (let's say using unique USER_ID as a queue ID for example), messages get accumulated in the queue and when user decides to check the inbox, they will get all the messages in the correct order. This way we "outsource" all routing, security etc etc to the service bus itself - thus considerable simplifying the app logic.
But there is a serious caveat in this approach - Service Bus supports only up to 10,000 queues: https://learn.microsoft.com/en-us/azure/service-bus-messaging/service-bus-azure-and-service-bus-queues-compared-contrasted#capacity-and-quotas and the number of users in my system can reach tens of thousands (but max out at 100,000 or so). So I'm somewhat in the range but not really. Therefore, I have questions:
Is there a flaw in my approach? Overall, is that a good idea to give a queue to the user exclusively? Or perhaps I should implement some kind of metadata and route messages based on it?
Am I looking at the right solution? I want to use SaaS as much as possible so I don't want to start building RabbitMQs on VMs etc - but are there built-in alternatives? may be a different approach should be considered?..
As for the numbers, I'm looking to start with 2,000 users and 200,000 messages a day - not a high load by any means. But if things work out, I see how these numbers can increase by 20x - 30x (but no more).
I would appreciate any options on this. Thank you.
I am trying to design the strategy that my organization will employ to create topics, and which messages will go to which one. I am looking at either creating a separate topic for each event, or a single topic to hold messages from all events, and then to triage with filters. I am convinced that using a separate topic for every event is better because:
Filters will be less complex and thus more performant, since each
event is already separated in its own topic.
There will be less chance of message congestion in any given topic.
Messages are less likely to be needlessly copied into any given
subscription.
More topics means more messaging stores, which means better message
retrieval and sending.
From a risk management perspective, it seems like having more topics
is better. If I only used a single topic, an outage would affect all
subscribers for all messages. If I use many topics, then perhaps outages would only affect some
topics and leave the others operational.
I get 12 more shared access keys per topic. It's easier to have more granular control over which topics are
exposed to which client apps since I can add/revoke access by
add/revoking the shared access key for each app on a per-topic basis.
Any thoughts would be appreciated
Like Sean already mentioned, there is really no one answer but here are some details about topics that could help you.
Topics are designed for large number of recipients by sending messages to multiple (upto 2000) subscriptions, which actually have the filters
Topics don't really store messages but subscriptions do
For outages, unless you have topics across regions, I'm not sure if it would help as such
The limit is for shared access authorization rules per policy. You should be using one of these to generate a SAS key for your clients.
Also, chaining service bus with autoforwarding is something you could consider as required.
We are using socketIO on a large chat application.
At some points we want to dispatch "presence" (user availability) to all other users.
io.in('room1').emit('availability:update', {userid='xxx', isAvailable: false});
room1 may contains a lot of users (500 max). We observe a significant raise in our NodeJS load when many availability updates are triggered.
The idea was to use something similar to redis store with Socket IO. Have web browser clients to connect to different NodeJS servers.
When we want to emit to a room we dispatch the "emit to room1" payload to all other NodeJS processes using Redis PubSub ZeroMQ or even RabbitMQ for persistence. Each process will itself call his own io.in('room1').emit to target his subset of connected users.
One of the concern with this setup is that the inter-process communication may become quite busy and I was wondering if it may become a problem in the future.
Here is the architecture I have in mind.
Could you batch changes and only distribute them every 5 seconds or so? In other words, on each node server, simply take a 'snapshot' every X seconds of the current state of all users (e.g. 'connected', 'idle', etc.) and then send that to the other relevant servers in your cluster.
Each server then does the same, every 5 seconds or so it sends the same message - of only the changes in user state - as one batch object array to all connected clients.
Right now, I'm rather surprised you are attempting to send information about each user as a packet. Batching seems like it would solve your problem quite well, as it would also make better use of standard packet sizes that are normally transmitted via routers and switches.
You are looking for this library:
https://github.com/automattic/socket.io-redis
Which can be used with this emitter:
https://github.com/Automattic/socket.io-emitter
About available users function, I think there are two alternatives,you can create a "queue Users" where will contents "public data" from connected users or you can use exchanges binding information for show users connected. If you use an "user's queue", this will be the same for each "room" and you could update it when an user go out, "popping" its state message from queue (Although you will have to "reorganize" all queue message for it).
Nevertheless, I think that RabbitMQ is designed for asynchronous communication and it is not very useful approximation have a register for presence or not from users. I think it's better for applications where you don't know when the user will receive the message and its "real availability" ("fire and forget architectures"). ZeroMQ require more work from zero but you could implement something more specific for your situation with a better performance.
An publish/subscribe example from RabbitMQ site could be a good point to begin a new design like yours where a message it's sent to several users at same time. At summary, I will create two queues for user (receive and send queue messages) and I'll use specific exchanges for each "room chat" controlling that users are in each room using exchange binding's information. Always you have two queues for user and you create exchanges to binding it to one or more "chat rooms".
I hope this answer could be useful for you ,sorry for my bad English.
This is the common approach for sharing data across several Socket.io processes. You have done well, so far, with a single process and a single thread. I could lamely assume that you could pick any of the mentioned technologies for communicating shared data without hitting any performance issues.
If all you need is IPC, you could perhaps have a look at Faye. If, however, you need to have some data persisted, you could start a Redis cluster with as many Redis masters as you have CPUs, though this will add minor networking noise for Pub/Sub.
I'm working on a server architecture for sending/receiving messages from remote embedded devices, which will be hosted on Windows Azure. The front-facing servers are going to be maintaining persistent TCP connections with these devices, and I need a way to communicate with them on the backend.
Problem facts:
Devices: ~10,000
Frequency of messages device is sending up to servers: 1/min
Frequency of messages originating server side (e.g. from user actions, scheduled triggers, etc.): 100/day
Average size of message payload: 64 bytes
Upward communication
The devices send up messages very frequently (sensor readings). The constraints for that data are not very strong, due to the fact that we can aggregate/insert those sensor readings in a batched manner, and that they don't require in-order guarantees. I think the best way of handling them is to put them in a Storage Queue, and have a worker process poll the queue at intervals and dump that data. Of course, I'll have to be careful about making sure the worker process does this frequently enough so that the queue doesn't infinitely back up. The max batch size of Azure Storage Queues is 32, but I'm thinking of potentially pulling in more than that: something like publishing to the data store every 1,000 readings or 30 seconds, whichever comes first.
Downward communication
The server sends down updates and notifications much less frequently. This is a slightly harder problem, as I can see two viable paradigms here (with some blending in between). Could either:
Create a Service Bus Queue for each device (or one queue with thousands of subscriptions - limit is for number of queues is 10,000)
Have a state table housed in a DB that contains the latest "state" of a specific message type that the devices will get sent to them
With option 1, the application server simply enqueues a message in a fire-and-forget manner. On the front-end servers, however, there's quite a bit of things that have to happen. Concerns I can see include:
Monitoring 10k queues (or many subscriptions off of a queue - the
Azure SDK apparently reuses connections for subscriptions to the same
queue)
Connection Management
Should no longer monitor a queue if device disconnects.
Need to expire messages if device is disconnected for an extended period of time (so that queue isn't backed up)
Need to enable some type of "refresh" mechanism to update device's complete state when it goes back online
The good news is that service bus queues are durable, and with sessions can arrange messages to come in a FIFO manner.
With option 2, the DB would house a table that would maintain state for all of the devices. This table would be checked periodically by the front-facing servers (every few seconds or so) for state changes written to it by the application server. The front-facing servers would then dispatch to the devices. This removes the requirement for queueing of FIFO, the reasoning being that this message contains the latest state, and doesn't have to compete with other messages destined for the same device. The message is ephemeral: if it fails, then it will be resent when the device reconnects and requests to be refreshed, or at the next check interval of the front-facing server.
In this scenario, the need for queues seems to be removed, but the DB becomes the bottleneck here, and I fear it's not as scalable.
These are both viable approaches, and I feel this question is already becoming too large (although I can provide more descriptions if necessary). Just wanted to get a feel for what's possible, what's usually done, if there's something fundamental I'm missing, and what things in the cloud can I take advantage of to not reinvent the wheel.
If you can identify the device (may be device id/IMEI/Mac address) by the the message it sends then you can reduce the number of queues from 10,000 to 1 queue and not have 10000 subscriptions too. This could also help you in the downward communication as you will be able to identify the device and send the message to the appropriate socket.
As you mentioned the connections last longer you could deliver the command to the device that is connected and decide what to do with the commands to the device that are not connected.
Hope it helps
I have channels for push notification. Can I use this adresses for ping user's device? I want to know the count of online users.
The push notification channels could give you a rather rough count of devices that are reachable at a given instant, but it would potentially double-count the same user on multiple devices and it would be the number that receive the notification (roughly), not the number that are in your app at that time.
Keep in mind too that users could turn off notifications, and if you're surfacing toasts or tiles without perceptible value to the user, they're likely to get rather annoyed and potentially uninstall your app.
Analytics providers like Flurry and Localytics might be an option to provide finer granularity and better accuracy on user behavior. Or simply add some code into your own app to provide the level of tracking required; notifications seems like a rather backdoor means to this end.