does pusher store persistent historical data in its channels? - pusher

Pusher seems like a good service but I was wondering if anyone knows if it can be used as a persistent activity stream. For instance I would like to subscribe to a channel and get historical activity rather than just the new real-time activity events after I've subscribed.

Channel event history is on our (Pusher's) backlog. You can keep track of updates here:
http://pusher.tenderapp.com/discussions/requests/30-event-history
Since we generally recommend publishing data from your server (for security reasons) this also gives you the opportunity to persist messages and provide your own channel history.
We do understand the benefits of event history which is why it's on our backlog.

Related

How to handle publishing event when message broker is out?

I'm thinking how can I handle sending events when suddenly message broker go down. Please take a look at this code
using (var uow = uowProvider.Create())
{
...
...
var policy = offer.Buy(customer);
uow.Policies.Add(policy);
// DB changes are saved here! but what would happen if...
await uow.CommitChanges();
// ...eventPublisher throw an exception?
await eventPublisher.PublishMessage(PolicyCreated(policy));
return true;
}
IMHO if eventPublisher throw exception the event PolicyCreated won't be published. I don't know how to deal with this situation. The event must be published in system. I suppose that only good solution will be creating some kind of retry mechanism but I'm not sure...
I would like to elaborate a bit on the answers provided by both #Imran Arshad and #VoiceOfUnreason which are, of course, correct.
There are basically 3 patterns when it comes to publishing messages:
exactly once delivery (requires distributed transactions)
at most once delivery (no distributed transaction but may miss messages - like the actor model)
at least once delivery (no distributed transaction but may have duplicate messages)
The following is all in terms of your example.
For exactly once delivery both the database and the queue would need to provide the ability to enlist in distributed transactions. Some queues do not proivde this functionality out-of-the-box (like RabbitMQ) and even though it may be possible to roll your own it may not be the best option. Distributed transactions are typically quite slow.
For at most once delivery we have to accept that we may miss messages and I'm guessing that in most use-cases this is quite troublesome. You would get around this by tracking the progress and picking up the missed messages and resending them if required.
For at least once delivery we would need to ensure that the messages are idempotent. When we get a duplicate messages (usually quite an edge case) they should be ignored or their outcome should be the same as the initial message processed.
Now, there are a couple of ways around your issue. You could start a database transaction and make your database changes. Before you comit you perform the message sending. Should that fail then your transaction would be rolled back. That works fine for sending a single message but in your case some subscribers may have received a message. This complicates matters as all your subscribers need to receive the message or none of them get to receive it.
You could have your subscriber check whether the state is indeed true and whether it should continue processing. This places a burden on the subscriber and introduces some coupling. It could either postpone the action should the state not allow processing, or ignore it.
Another option is that instead of publishing the event you send yourself a command that indicates completion of the step. The command handler would perform the publishing and retry until all subscriber queues receive the message. This would require the relevant subscribers to ignore those messages that they had already processed (idempotence).
The outbox is a store-and-forward approach and will eventually send the message to all subscribers. You could have your outbox perhaps be included in the database transaction. In my Shuttle.Esb service bus one of the folks that used it came across a weird side-effect that I had not planned. He used a sql-based queue as an outbox and the queue connection was to the same database. It was therefore included in the database transasction and would roll back with all the other changes if not committed. Apologies for promoting my own product but I'm sure other service bus offerings may have the same functionality.
There are therefore quite a few things to consider and various techniques to mitigate the risk of a queue outage. I would, however, move the queue interaction to before the database commit.
For reliable system you need to save events locally. If your broker is down you have to retry and publish event.
There are many ways to achieve this but most common is outbox pattern. Just like your mail box your event/message stays locally and you keep retrying until it's sent and you mark the message published in your local DB.
you can read more about here Publish Events
You'll want to review Udi Dahan's discussion of Reliable Messaging without Distributed Transactions.
But very roughly, the PolicyCreated event becomes part of the unit of work; either because it is saved in the Policy representation itself, or because it is saved in an EventRepository that participates in the same transaction as the Policies repository.
Once you've captured the information in your database, retry the publish is relatively straight forward - read the events from the database, publish, optionally mark the events in the database as successfully published so that they can be cleaned up.

Can domain events be deleted?

In order to make the domain event handling consistent, I want to persist the domain events to database while saving the AggregateRoot. later react to them using an event processor, for example let's say I want to send them to an event bus as integration events, I wonder whether or not the event is allowed to be deleted from the database after passing it through the bus?
So the events will never ever be loaded with the AggregateRoot root anymore.
I wonder whether or not the reactor is allowed to remove the event from db after the reaction.
You'll probably want to review Reliable Messaging Without Distributed Transactions, by Udi Dahan; also Pat Helland's paper Life Beyond Distributed Transactions.
In event sourced systems, meaning that the history of domain events is the persisted history of the aggregate, you will almost never delete events.
In a system where the log of domain events is simply a journal of messages to be communicated to other "partners": fundamentally, the domain events are messages that describe information to be copied from one part of the system to another. So when we get an acknowledgement that the message has been copied successfully, we can remove the copy stored "here".
In a system where you can't be sure that all of the consumers have received the domain event (because, perhaps, the list of consumers is not explicit), then you probably can't delete the domain events.
You may be able to move them -- which is, instead of having implicit subscriptions to the aggregate, you could have an explicit subscription from an event history to the aggregate, and then implicit subscriptions to the history.
You might be able to treat the record of domain events as a cache -- if the partner's aren't done consuming the message within 7 days of it being available, then maybe the delivery of the message isn't the biggest problem in the system.
How many nines of delivery guarantee do you need?
Domain events are things that have happened in the past. You can't delete the past, assuming you're not Martin McFly :)
Domain events shouldn't be deleted from event store. If you want to know whether you already processed it before, you can add a flag to know it.
UPDATE ==> DESCRIPTION OF EVENT MANAGEMENT PROCESS
I follow the approach of IDDD (Red Book by Vaughn Vernon, see picture on page 287) this way:
1) The aggregate publish the event locally to the BC (lightweight publisher).
2) In the BC, a lightweight subscriber store all the event published by the BC in an "event store" (which is a table in the same database of the BC).
3) A batch process (worker) reads the event store and publish the events to a message queue (or an event bus as you say).
4) Other BCs interested in the event (or even the same BC) subscribe to the message queue (or event bus) for listening and react to the event.
Anyway, even the worker had sent the event away ok to the message queue, you shouldn't delete the domain event from the event store. Instead simply dont send it again, but events are things that have happened and you cannot (should not) delete a thing that have occurred in the past.
Message queue or event bus are just a mechanism to send/receive events, but the events should remain stored in the BC they were created and published.

How to control idempotency of messages in an event-driven architecture?

I'm working on a project where DynamoDB is being used as database and every use case of the application is triggered by a message published after an item has been created/updated in DB. Currently the code follows this approach:
repository.save(entity);
messagePublisher.publish(event);
Udi Dahan has a video called Reliable Messaging Without Distributed Transactions where he talks about a solution to situations where a system can fail right after saving to DB but before publishing the message as messages are not part of a transaction. But in his solution I think he assumes using a SQL database as the process involves saving, as part of the transaction, the correlationId of the message being processed, the entity modification and the messages that are to be published. Using a NoSQL DB I cannot think of a clean way to store the information about the messages.
A solution would be using DynamoDB streams and subscribe to the events published either using a Lambda or another service to transformed them into domain-specific events. My problem with this is that I wouldn't be able to send the messages from the domain logic, the logic would be spread across the service processing the message and the Lambda/service reacting over changes and the solution would be platform-specific.
Is there any other way to handle this?
I can't say a specific solution based on DynamoDB since I've not used this engine ever. But I've built an event driven system on top of MongoDB so I can share my learnings you might find useful for your case.
You can have different approaches:
1) Based on an event sourcing approach you can just save the events/messages your use case produce within a transaction. In Mongo when you are just inserting/appending new items to the same collection you can ensure atomicity. Anyway, if the engine does not provide that capability the query operation is so centralized that you are reducing the possibility of an error at minimum.
Once all the events are stored, you can then consume them and project them to a given state and then persist the updated state in another transaction.
Here you have to deal with eventual consistency as data will be stale in your read model until you have projected the events.
2) Another approach is applying the UnitOfWork pattern where you cache all the query operations (insert/update/delete) to save both events and the state. Once your use case finishes, you execute all the cached queries against the database (flush). This way although the operations are not atomic you are again centralizing them quite enough to minimize errors.
Of course the best is to use an ACID database if you require that capability and any other approach will be a workaround to get close to it.
About publishing the events I don't know if you mean they are published to a messaging transportation mechanism such as rabbitmq, Kafka, etc. But that must be a background process where you fetch the events from the DB and publishes them in order to break the 2 phase commit within the same transaction.

Service Fabric actors that receive events from other actors

I'm trying to model a news post that contains information about the user that posted it. I believe the best way is to send user summary information along with the message to create a news post, but I'm a little confused how to update that summary information if the underlying user information changes. Right now I have the following NewsPostActor and UserActor
public interface INewsPostActor : IActor
{
Task SetInfoAndCommitAsync(NewsPostSummary summary, UserSummary postedBy);
Task AddCommentAsync(string content, UserSummary, postedBy);
}
public interface IUserActor : IActor, IActorEventPublisher<IUserActorEvents>
{
Task UpdateAsync(UserSummary summary);
}
public interface IUserActorEvents : IActorEvents
{
void UserInfoChanged();
}
Where I'm getting stuck is how to have the INewsPostActor implementation subscribe to events published by IUserActor. I've seen the SubscribeAsync method in the sample code at https://github.com/Azure/servicefabric-samples/blob/master/samples/Actors/VS2015/VoiceMailBoxAdvanced/VoicemailBoxAdvanced.Client/Program.cs#L45 but is it appropriate to use this inside the NewsPostActor implementation? Will that keep an actor alive for any reason?
Additionally, I have the ability to add comments to news posts, so should the NewsPostActor also keep a subscription to each IUserActor for each unique user who comments?
Events may not be what you want to be using for this. From the documentation on events (https://azure.microsoft.com/en-gb/documentation/articles/service-fabric-reliable-actors-events/)
Actor events provide a way to send best effort notifications from the
Actor to the clients. Actor events are designed for Actor-Client
communication and should NOT be used for Actor-to-Actor communication.
Worth considering notifying the relevant actors directly or have an actor/service that will manage this communication.
Service Fabric Actors do not yet support a Publish/Subscribe architecture. (see Azure Feedback topic for current status.)
As already answered by charisk, Actor-Events are also not the way to go because they do not have any delivery guarantees.
This means, the UserActor has to initiate a request when a name changes. I can think of multiple options:
From within IUserAccount.ChangeNameAsync() you can send requests directly to all NewsPostActors (assuming the UserAccount holds a list of his posts). However, this would introduce additional latency since the client has to wait until all posts have been updated.
You can send the requests asynchronously. An easy way to do this would be to set a "NameChanged"-property on your Actor state to true within ChangeNameAsync() and have a Timer that regularly checks this property. If it is true, it sends requests to all NewsPostActors and sets the property to false afterwards. This would be an improvement to the previous version, however it still implies a very strong connection between UserAccounts and NewsPosts.
A more scalable solution would be to introduce the "Message Router"-pattern. You can read more about this pattern in Vaughn Vernon's excellent book "Reactive Messaging Patterns with the Actor Model". This way you can basically setup your own Pub/Sub model by sending a "NameChanged"-Message to your Router. NewsPostActors can - depending on your scalability needs - subscribe to that message either directly or through some indirection (maybe a NewsPostCoordinator). And also depending on your scalability needs, the router can forward the messages either directly or asynchronously (by storing it in a queue first).

CQRS - Event replay for read side

I have read several blogs on CQRS and all of them explain that at write-side events are persisted on event store and upon a request, events are retrieved and replayed on aggregate.
My question is why doesn't event replay on an aggregate is required at read side?
Because your read side doesn't use aggregates.
Read side is implemented as projections which calculate the current state from the stream of events emited by aggregates and persist the current state in some pesistent store or in memory. The while point on the read side is to have a current state readily available for clients.
I wanna add example to Jakub Konecki explanation.
Let's imagine that you model a bank account using event sourcing. Every operation on that account causes event(s) to be persisted. After few years you have hundreds of events that are connected with that bank account. Now, if you want to display balance of that account you would replay all events to calculate the balance? And if there are many accounts, replaying events only to calculate balance would be performance bottleneck for application. We don't even mention other information that are needed to display from bank account and describe current state of account.
That is why we store snapshots of aggregate state on read side, because mainly read side is used for presentation purposes. And we want to keep that part of our system simple.

Resources