How do I get PubNub to remember my last states and amalgamate them? - pubnub

I'm fairly new to PubNub. I've been using it to control a relatively simple IoT device.
A user interface allows a user to publish messages of the form { "desiredState": state }.
Then, the actual device which is subscribed to the channel sets the state accordingly and publishes a new message back { "actualState": state }.
Then both ends are aware if there is a difference between actualState and desiredState. It's not clear to me though, how I would leverage the PubNub APIs to retrieve the latest of each of those messages. So, for example, I want a single JSON object something like this:
{
"desiredState": state1,
"actualState": state2
}
In this case, state1 would be the latest "desiredState" message and state2 would be the latest "actualState" message.
I have looked into the storage and playback mechanism, but I'm not sure that is the simplest way to go.
What is an efficient PubNub solution for remembering the latest values for each of the possible keys I am publishing?

Related

How to handle publishing event when message broker is out?

I'm thinking how can I handle sending events when suddenly message broker go down. Please take a look at this code
using (var uow = uowProvider.Create())
{
...
...
var policy = offer.Buy(customer);
uow.Policies.Add(policy);
// DB changes are saved here! but what would happen if...
await uow.CommitChanges();
// ...eventPublisher throw an exception?
await eventPublisher.PublishMessage(PolicyCreated(policy));
return true;
}
IMHO if eventPublisher throw exception the event PolicyCreated won't be published. I don't know how to deal with this situation. The event must be published in system. I suppose that only good solution will be creating some kind of retry mechanism but I'm not sure...
I would like to elaborate a bit on the answers provided by both #Imran Arshad and #VoiceOfUnreason which are, of course, correct.
There are basically 3 patterns when it comes to publishing messages:
exactly once delivery (requires distributed transactions)
at most once delivery (no distributed transaction but may miss messages - like the actor model)
at least once delivery (no distributed transaction but may have duplicate messages)
The following is all in terms of your example.
For exactly once delivery both the database and the queue would need to provide the ability to enlist in distributed transactions. Some queues do not proivde this functionality out-of-the-box (like RabbitMQ) and even though it may be possible to roll your own it may not be the best option. Distributed transactions are typically quite slow.
For at most once delivery we have to accept that we may miss messages and I'm guessing that in most use-cases this is quite troublesome. You would get around this by tracking the progress and picking up the missed messages and resending them if required.
For at least once delivery we would need to ensure that the messages are idempotent. When we get a duplicate messages (usually quite an edge case) they should be ignored or their outcome should be the same as the initial message processed.
Now, there are a couple of ways around your issue. You could start a database transaction and make your database changes. Before you comit you perform the message sending. Should that fail then your transaction would be rolled back. That works fine for sending a single message but in your case some subscribers may have received a message. This complicates matters as all your subscribers need to receive the message or none of them get to receive it.
You could have your subscriber check whether the state is indeed true and whether it should continue processing. This places a burden on the subscriber and introduces some coupling. It could either postpone the action should the state not allow processing, or ignore it.
Another option is that instead of publishing the event you send yourself a command that indicates completion of the step. The command handler would perform the publishing and retry until all subscriber queues receive the message. This would require the relevant subscribers to ignore those messages that they had already processed (idempotence).
The outbox is a store-and-forward approach and will eventually send the message to all subscribers. You could have your outbox perhaps be included in the database transaction. In my Shuttle.Esb service bus one of the folks that used it came across a weird side-effect that I had not planned. He used a sql-based queue as an outbox and the queue connection was to the same database. It was therefore included in the database transasction and would roll back with all the other changes if not committed. Apologies for promoting my own product but I'm sure other service bus offerings may have the same functionality.
There are therefore quite a few things to consider and various techniques to mitigate the risk of a queue outage. I would, however, move the queue interaction to before the database commit.
For reliable system you need to save events locally. If your broker is down you have to retry and publish event.
There are many ways to achieve this but most common is outbox pattern. Just like your mail box your event/message stays locally and you keep retrying until it's sent and you mark the message published in your local DB.
you can read more about here Publish Events
You'll want to review Udi Dahan's discussion of Reliable Messaging without Distributed Transactions.
But very roughly, the PolicyCreated event becomes part of the unit of work; either because it is saved in the Policy representation itself, or because it is saved in an EventRepository that participates in the same transaction as the Policies repository.
Once you've captured the information in your database, retry the publish is relatively straight forward - read the events from the database, publish, optionally mark the events in the database as successfully published so that they can be cleaned up.

How to enforce the order of messages passed to an IoT device over MQTT via a cloud-based system (API design issue)

Suppose I have an IoT device which I'm about to control (lets say switch on/off) and monitor (e.g. collect temperature readings). It seems MQTT could be the right fit. I could publish messages to the device to control it and the device could publish messages to a broker to report temperature readings. So far so good.
The problems start to occur when I try to design the API to control the device.
Lets day the device subscribes to two topics:
/device-id/control/on
/device-id/control/off
Then I publish messages to these topics in some order. But given the fact that messaging is typically an asynchronous process there are no guarantees on the order of messages received by the device.
So in case two messages are published in the following order:
/device-id/control/on
/device-id/control/off
they could be received in the reversed order leaving the device turned on, which can have dramatic consequences, depending on the context.
Of course the API could be designed in some other way, for example there could be just one topic
/device-id/control
and the payload of individual messages would carry the meaning of an individual message (on/off). So in case messages are published to this topic in a given order they are expected to be received in the exact same order on the device.
But what if the order of publishes to individual topics cannot be guaranteed? Suppose the following architecture of a system for IoT devices:
/ control service \
application -> broker -> control service -> broker -> IoT device
\ control service /
The components of the system are:
an application which effectively controls the device by publishing messages to a broker
a typical message broker
a control service with some business logic
The important part is that as in most modern distributed systems the control service is a distributed, multi instance entity capable of processing multiple control messages from the application at a time. Therefore the order of messages published by the application can end up totally mixed when delivered to the IoT device.
Now given the fact that most MQTT brokers only implement QoS0 and QoS1 but no QoS2 it gets even more interesting as such control messages could potentially be delivered multiple times (assuming QoS1 - see https://stackoverflow.com/a/30959058/1776942).
My point is that separate topics for control messages is a bad idea. The same goes for a single topic. In both cases there are no message delivery order guarantees.
The only solution to this particular issue that comes to my mind is message versioning so that old (out-dated) messages could simply be skipped when delivered after another message with more recent version property.
Am I missing something?
Is message versioning the only solution to this problem?
Am I missing something?
Most definitely. The example you brought up is a generic control system, being attached to some message-oriented scheme. There are a number of patterns that can be used when referring to a message-based architecture. This article by Microsoft categorizes message patterns into two primary classes:
Commands and
Events
The most generic pattern of command behavior is to issue a command, then measure the state of the system to verify the command was carried out. If you forget to verify, your system has an open loop. Such open loops are (unfortunately) common in IT systems (because it's easy to forget), and often result in bugs and other bad behaviors such as the one described above. So, the proper way to handle a command is:
Issue the command
Inquire as to the state of the system
Evaluate next action
Events, on the other hand, are simply fired off. As the publisher of an event, it is not my business to worry about who receives the event, in what order, etc. Now, it should also be pointed out that the use of any decent message broker (e.g. RabbitMQ) generally carries strong guarantees that messages will be delivered in the order which they were originally published. Note that this does not mean they will be processed in order.
So, if you treat a command as an event, your system is guaranteed to act up sooner or later.
Is message versioning the only solution to this problem?
Message versioning typically refers to a property of the message class itself, rather than a particular instance of the class. It is often used when multiple versions of a message-based API exist and must be backwards-compatible with one another.
What you are instead referring to is unique message identifiers. Guids are particularly handy for making sure that each message gets its own unique id. However, I would argue that de-duplication in message-based architectures is an anti-pattern. One of the consequences of using messaging is that duplicates are possible, so you should try to design your system behaviors to be stateless and idempotent. If this is not possible, it should be considered that messaging may not be the correct communication solution for the need.
Using the command-event dichotomy as an example, you could perform the following transaction:
The controller issues the command, assigning a unique identifier to the command.
The control system receives the command and turns on.
The control system publishes the "light on" event notification, containing the unique id of the command that was used to turn on the light.
The controller receives the notification and correlates it to the original command.
In the event that the controller doesn't receive notification after some timeout, the controller can retry the command. Note that "light on" is an idempotent command, in that multiple calls to it will have the same effect.
When state changes, send the new state immediately and after that periodically every x seconds. With this solution your systems gets into desired state, after some time, even when it temporarily disconnects from the network (low battery).
BTW: You did not miss anything.
Apart from the comment that most brokers don't support QOS2 (I suspect you mean that a number of broker as a service offerings don't support QOS2, such as Amazon's AWS IoT service) you have covered most of the major points.
If message order really is that important then you will have to include some form of ordering marker in the message payload, be this a counter or timestamp.

Service Fabric actors that receive events from other actors

I'm trying to model a news post that contains information about the user that posted it. I believe the best way is to send user summary information along with the message to create a news post, but I'm a little confused how to update that summary information if the underlying user information changes. Right now I have the following NewsPostActor and UserActor
public interface INewsPostActor : IActor
{
Task SetInfoAndCommitAsync(NewsPostSummary summary, UserSummary postedBy);
Task AddCommentAsync(string content, UserSummary, postedBy);
}
public interface IUserActor : IActor, IActorEventPublisher<IUserActorEvents>
{
Task UpdateAsync(UserSummary summary);
}
public interface IUserActorEvents : IActorEvents
{
void UserInfoChanged();
}
Where I'm getting stuck is how to have the INewsPostActor implementation subscribe to events published by IUserActor. I've seen the SubscribeAsync method in the sample code at https://github.com/Azure/servicefabric-samples/blob/master/samples/Actors/VS2015/VoiceMailBoxAdvanced/VoicemailBoxAdvanced.Client/Program.cs#L45 but is it appropriate to use this inside the NewsPostActor implementation? Will that keep an actor alive for any reason?
Additionally, I have the ability to add comments to news posts, so should the NewsPostActor also keep a subscription to each IUserActor for each unique user who comments?
Events may not be what you want to be using for this. From the documentation on events (https://azure.microsoft.com/en-gb/documentation/articles/service-fabric-reliable-actors-events/)
Actor events provide a way to send best effort notifications from the
Actor to the clients. Actor events are designed for Actor-Client
communication and should NOT be used for Actor-to-Actor communication.
Worth considering notifying the relevant actors directly or have an actor/service that will manage this communication.
Service Fabric Actors do not yet support a Publish/Subscribe architecture. (see Azure Feedback topic for current status.)
As already answered by charisk, Actor-Events are also not the way to go because they do not have any delivery guarantees.
This means, the UserActor has to initiate a request when a name changes. I can think of multiple options:
From within IUserAccount.ChangeNameAsync() you can send requests directly to all NewsPostActors (assuming the UserAccount holds a list of his posts). However, this would introduce additional latency since the client has to wait until all posts have been updated.
You can send the requests asynchronously. An easy way to do this would be to set a "NameChanged"-property on your Actor state to true within ChangeNameAsync() and have a Timer that regularly checks this property. If it is true, it sends requests to all NewsPostActors and sets the property to false afterwards. This would be an improvement to the previous version, however it still implies a very strong connection between UserAccounts and NewsPosts.
A more scalable solution would be to introduce the "Message Router"-pattern. You can read more about this pattern in Vaughn Vernon's excellent book "Reactive Messaging Patterns with the Actor Model". This way you can basically setup your own Pub/Sub model by sending a "NameChanged"-Message to your Router. NewsPostActors can - depending on your scalability needs - subscribe to that message either directly or through some indirection (maybe a NewsPostCoordinator). And also depending on your scalability needs, the router can forward the messages either directly or asynchronously (by storing it in a queue first).

Distributed pub/sub with single consumer per message type

I have no clue if it's better to ask this here, or over on Programmers.SE, so if I have this wrong, please migrate.
First, a bit about what I'm trying to implement. I have a node.js application that takes messages from one source (a socket.io client), and then does processing on the message, which might result in zero or more messages back out, either to the sender, or other clients within that group.
For the processing, I would like to essentially just shove the message into a queue, then it works its way through various message processors that might kick off their own items, and eventually, the bit running socket.io is informed "Hey, send this message back"
As a concrete example, say a user signs into the service, that sign in message is then placed in the queue, where the authorization processor gets it, does it's thing, then places a message back in the queue saying the client's been authorized. This goes back to the socket.io socket that is connected to the client, along with other clients that might be interested. It can also go to other subsystems that might want to do more processing on authorization (looking up user info, sending more info to the client based on their data, etc).
If I wanted strong coupling, this would be easy, but I tried that before, and it just goes to a mess of spaghetti code that's very fragile, and I would like to avoid that. Another wrench in the setup is this should be cluster-able, which is where the real problem comes in. There might be more than one, say, authorization processor running. But the authorization message should be processed only once.
So, in short, I'm looking for a pattern/technique that will allow me to, essentially, have multiple "groups" of subscribers for a message, and the message will be processed only once per group.
I thought about maybe having each instance of a processor generate a unique name that would be used as a list in Reids. This name would then be registered with some sort of dispatch handler, and placed into a set for that group of subscribers. Then when a message arrives, the dispatch pulls a random member out of that set, and places it into that list. While it seems like this would work, it seems somewhat over-complicated and fragile.
The core problem is I've never designed a system like this, so I'm not even sure the proper terms to use or look up. If anyone can point me in the right direction for this, I would be most appreciative.
I think what your describing is similar to https://www.getbridge.com/ service. I it but ended up writing my own based on zeromq, it allows you to register services, req -> <- rec and channels which are pub / sub workers.
As for the design, I used a client -> broker -> services & channels which are all plug and play using auto discovery, you have the services register their schema with the brokers who open a tcp connection so that brokers on other servers can communicate with that broker groups services. Then internal services and clients connect via unix sockets or ipc channels which ever is preferred.
I ended up wrapping around the redis publish/subscribe functions a bit to do this. Each type of message processor gets a "group name", and there can be multiple instances of the processor within that group (so multiple instances of the program can run for clustering).
When publishing a message, I generate an incremental ID, then store the message in a string key with that ID, then publish the message ID.
On the receiving end, the first thing the subscriber does is attempt to add the message ID it just got from the publisher into a set of received messages for that group with sadd. If sadd returns 0, the message has already been grabbed by another instance, and it just returns. If it returns 1, the full message is pulled out of the string key and sent to the listener.
Of course, this relies on redis being single threaded, which I imagine will continue to be the case.
What you might be looking for is an AMQP protocol implementation,where you can have queue get custom exchanges,and implement a pub-sub model.
RabbitMQ - a popular amqp protocol implementation with lots of libraries
it also has node.js library

does pusher store persistent historical data in its channels?

Pusher seems like a good service but I was wondering if anyone knows if it can be used as a persistent activity stream. For instance I would like to subscribe to a channel and get historical activity rather than just the new real-time activity events after I've subscribed.
Channel event history is on our (Pusher's) backlog. You can keep track of updates here:
http://pusher.tenderapp.com/discussions/requests/30-event-history
Since we generally recommend publishing data from your server (for security reasons) this also gives you the opportunity to persist messages and provide your own channel history.
We do understand the benefits of event history which is why it's on our backlog.

Resources