Duplicate Requests on Service Fabric When Remoting

I have a Stateless service on Service Fabric (ASP.NET Core) which will call an Actor and the Actor may internally also call other Actors and/or Stateful Services depending on the scenarios.
My question is, do we need to account for duplicate requests due to the remoting aspect of the system?
In our earlier Akka.Net implementations there was a chance that the Actor received duplicate requests due to TCP/IP network congestion etc, and we handled that by giving each message a unique Correlation Id. We would store the request and its outcome in state on the actors and if the same correlation id came back again, we would just assume it was a duplicate and sent the earlier outcome instead of re-processing the request.
I had seen a similar approach used in one of the sample projects Microsoft had but I can't seem to find that anymore (dead link on Github).
Does anyone know if this needs to be handled in Actors and/or Stateful Services?

You could add custom headers to your remoting calls by creating custom implementations of IServiceRemotingClientFactory and IServiceRemotingClient.
Add the custom headers inside the operations RequestResponseAsync and SendOneWay.
Another example, by Peter Bons:
var header = requestRequestMessage.GetHeader();
var customHeaders = customHeadersProvider.Invoke() ?? new CustomHeaders();
header.AddHeader(CustomHeaders.CustomHeader, customHeaders.Serialize());
On the receiving side, you can get the custom header from IServiceRemotingRequestMessageHeader in a custom ActorServiceRemotingDispatcher.
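Once the correlation id arrives in a header, the deduplication described in the question could look roughly like the sketch below. This is only an illustration in plain C#: DeduplicatingHandler is a hypothetical name, not part of the Service Fabric SDK, and the outcomes are kept in memory here, whereas an actor would keep them in its persisted state.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical helper: remembers the outcome of each request by correlation id.
public sealed class DeduplicatingHandler<TResult>
{
    // Assumption: in-memory only. In an actor this map would live in actor state.
    private readonly ConcurrentDictionary<Guid, Lazy<Task<TResult>>> outcomes =
        new ConcurrentDictionary<Guid, Lazy<Task<TResult>>>();

    // Runs 'process' at most once per correlation id.
    // A repeated id gets the stored outcome instead of being re-processed.
    public Task<TResult> HandleAsync(Guid correlationId, Func<Task<TResult>> process)
    {
        return outcomes
            .GetOrAdd(correlationId, _ => new Lazy<Task<TResult>>(process))
            .Value;
    }
}
```

Note that this sketch also remembers failed outcomes and never evicts old entries, so a real implementation would need a retention policy and a decision on whether failures should be retried.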

Related

Hexagonal Architecture for a real-time stock watcher

I'm designing a stock market watcher system.
It accepts registration of patterns from subscribers. Meanwhile it polls the latest market info every few seconds. It supports multiple markets, so the polling interval and working hours depend on each market's configuration. It may also dynamically adapt polling rates based on market information or subscriptions.
If the market info matches some pattern, it logs the information and sends an alert to subscribers.
The subscribers, patterns, and logs are undoubtedly domain models.
The market info source and the alert fanout are out adapters.
Then where should the polling engine be?
Some approaches came to mind:
1. Be a Domain Service: the domain service manages threads to poll the market and match patterns.
2. Be an Application Service: threads are implementation details, so the application manages the polling threads. There are two variants:
2a. The application does most of the logic: it queries market info, invokes pattern.match(), creates logs and sends alerts.
2b. The application just invokes a method like GetInfoAndMatch() in the Domain, and the Domain handles the details that 2a did.
I'm struggling to decide which one makes more sense. What's your opinion?

The polling engine triggers a controller. Just like a user would trigger an update manually. The controller then invokes the use case (or primary port in hexagonal architecture) and passes the result to presenter. The presenter updates the ui models and the views show the new values.
In a rich client application this is no big deal since the controller can directly access the ui models.
In a web application the ui controller is on the client side and the backend controller at the backend side (see this answer for details).
Here the backend controller gets either triggered by the "polling engine" or by the client.
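To make that concrete, here is a rough sketch of the polling engine as a driving adapter that only decides when to trigger the use case. All names (IMarketInfoSource, IAlertSender, PricePattern, CheckMarketUseCase, PollingEngine) are made up for illustration, and the pattern is reduced to a simple price threshold.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Secondary (driven) ports, implemented by out adapters.
public interface IMarketInfoSource { Task<decimal> GetLatestPriceAsync(string symbol); }
public interface IAlertSender { Task SendAsync(string subscriber, string message); }

// Domain model: a subscriber's pattern, reduced here to a price threshold.
public sealed class PricePattern
{
    public PricePattern(string subscriber, string symbol, decimal threshold)
    {
        Subscriber = subscriber; Symbol = symbol; Threshold = threshold;
    }
    public string Subscriber { get; }
    public string Symbol { get; }
    public decimal Threshold { get; }
    public bool Matches(decimal price) => price >= Threshold;
}

// Primary port / use case: knows nothing about timers or threads.
public sealed class CheckMarketUseCase
{
    private readonly IMarketInfoSource source;
    private readonly IAlertSender alerts;
    private readonly IReadOnlyList<PricePattern> patterns;

    public CheckMarketUseCase(IMarketInfoSource source, IAlertSender alerts,
                              IReadOnlyList<PricePattern> patterns)
    {
        this.source = source; this.alerts = alerts; this.patterns = patterns;
    }

    // Returns the number of alerts sent in this run.
    public async Task<int> ExecuteAsync()
    {
        int sent = 0;
        foreach (PricePattern p in patterns)
        {
            decimal price = await source.GetLatestPriceAsync(p.Symbol);
            if (!p.Matches(price)) continue;
            await alerts.SendAsync(p.Subscriber, p.Symbol + " reached " + price);
            sent++;
        }
        return sent;
    }
}

// Driving adapter: the polling engine only decides *when* to trigger the use case.
public sealed class PollingEngine
{
    private readonly CheckMarketUseCase useCase;
    private readonly TimeSpan interval;

    public PollingEngine(CheckMarketUseCase useCase, TimeSpan interval)
    {
        this.useCase = useCase; this.interval = interval;
    }

    public async Task RunAsync(CancellationToken token)
    {
        while (!token.IsCancellationRequested)
        {
            await useCase.ExecuteAsync();
            try { await Task.Delay(interval, token); }
            catch (OperationCanceledException) { break; }
        }
    }
}

// In-memory out adapters, only for trying the sketch out.
public sealed class FixedPriceSource : IMarketInfoSource
{
    private readonly decimal price;
    public FixedPriceSource(decimal price) { this.price = price; }
    public Task<decimal> GetLatestPriceAsync(string symbol) => Task.FromResult(price);
}
public sealed class RecordingAlertSender : IAlertSender
{
    public List<string> Sent { get; } = new List<string>();
    public Task SendAsync(string subscriber, string message)
    {
        Sent.Add(subscriber + ": " + message);
        return Task.CompletedTask;
    }
}
```

With this split, a per-market interval or an adaptive polling rate only changes PollingEngine, while a manual "refresh now" request from a client can call the same use case without going through the engine at all.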

Using infrastructure in use cases

I have been reading the book Patterns, Principles, and Practices of Domain-Driven Design, specifically the chapter dedicated to repositories, and one of the code examples uses infrastructure interfaces in the use cases. Is it correct that the application layer has knowledge of infrastructure? I thought that use cases should only have knowledge of the domain ...

Using an interface to separate from the implementation is the right way, so the use case layer knows interfaces, not infrastructure details.

It is the Application Layer's responsibility to invoke the (injected) infrastructure services, call the domain layer methods, and persist/load necessary data for the business logic to be executed. The domain layer is unconcerned about how data is persisted or loaded, yes, but the application layer makes it possible to use the business logic defined in the domain layer.
You would probably have three layers that operate on any request: a Controller that accepts the request and knows which application layer method to invoke, an Application Service that knows what data to load and which domain layer method to invoke, and the Domain Entity (usually an Aggregate) that encloses the business logic (a.k.a. Invariants).
The Controller's responsibility is only to gather the request params (gather user input in your case), ensure authentication (if needed), and then make the call to the Application Service method.
Application Services are direct clients of the domain model and act as intermediaries to coordinate between the external world and the domain layer. They are responsible for handling infrastructure concerns like ID Generation, Transaction Management, Encryption, etc.
Let's take the example of an imaginary MessageSender Application Service. Here is an example control flow:
1. API sends the request with conversation_id, user_id (author), and message.
2. Application Service loads Conversation from the database. If the Conversation ID is valid, and the author can participate in this conversation (these are invariants), you invoke a send method on the Conversation object.
3. The Conversation object adds the message to its own data, runs its business logic, and decides which users to send it to.
4. The Conversation object raises events to be dispatched into a message interface (collected in a temporary variable valid for that session) and returns. These events contain the entire data to reconstruct details of the message (timestamps, audit log, etc.) and don't just cater to what is pushed out to the receiver later.
5. The Application Service persists the updated Conversation object and dispatches all events raised during the recent processing.
6. A subscriber listening for the event gathers it, constructs the message in the right format (picking only the data it needs from the event), and performs the actual push to the receiver.
So you see the interplay between Application Services and Domain Objects is what makes it possible to use the Domain in the first place. With this structure, you also have a good implementation of the Open-Closed Principle.
Your Conversation object changes only if you are changing business logic (like who should receive the message).
Your Application service will seldom change because it simply loads and persists Conversation objects and publishes any raised events to the message broker.
Your Subscriber logic changes only if you are pushing additional data to the receiver.
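The control flow above could be sketched roughly as follows. This is a simplified illustration, not code from the book: the type names (MessageSent, Conversation, IConversationRepository, IEventDispatcher, MessageSenderService) are assumptions, and the in-memory classes at the end exist only so the sketch can be tried without a database or broker.

```csharp
using System;
using System.Collections.Generic;

// Domain event; carries the full message data, not just what one receiver needs.
public sealed class MessageSent
{
    public MessageSent(string conversationId, string authorId, string text, DateTime sentAtUtc)
    {
        ConversationId = conversationId; AuthorId = authorId; Text = text; SentAtUtc = sentAtUtc;
    }
    public string ConversationId { get; }
    public string AuthorId { get; }
    public string Text { get; }
    public DateTime SentAtUtc { get; }
}

// Aggregate: enforces the invariants and records what happened as events.
public sealed class Conversation
{
    private readonly HashSet<string> participants;
    private readonly List<MessageSent> pendingEvents = new List<MessageSent>();

    public Conversation(string id, IEnumerable<string> participants)
    {
        Id = id;
        this.participants = new HashSet<string>(participants);
    }

    public string Id { get; }
    public IReadOnlyList<MessageSent> PendingEvents => pendingEvents;

    public void Send(string authorId, string text, DateTime nowUtc)
    {
        if (!participants.Contains(authorId))
            throw new InvalidOperationException("Author cannot participate in this conversation.");
        pendingEvents.Add(new MessageSent(Id, authorId, text, nowUtc));
    }

    public void ClearEvents() => pendingEvents.Clear();
}

// Interfaces the application layer depends on; real implementations are infrastructure.
public interface IConversationRepository
{
    Conversation Load(string conversationId); // null when the id is unknown
    void Save(Conversation conversation);
}
public interface IEventDispatcher { void Dispatch(MessageSent evt); }

// Application service: load, invoke the domain, persist, dispatch.
public sealed class MessageSenderService
{
    private readonly IConversationRepository repository;
    private readonly IEventDispatcher dispatcher;

    public MessageSenderService(IConversationRepository repository, IEventDispatcher dispatcher)
    {
        this.repository = repository;
        this.dispatcher = dispatcher;
    }

    public void Send(string conversationId, string userId, string message)
    {
        Conversation conversation = repository.Load(conversationId);
        if (conversation == null)
            throw new ArgumentException("Unknown conversation.", nameof(conversationId));
        conversation.Send(userId, message, DateTime.UtcNow);
        repository.Save(conversation);
        foreach (MessageSent evt in conversation.PendingEvents) dispatcher.Dispatch(evt);
        conversation.ClearEvents();
    }
}

// In-memory stand-ins, only for trying the sketch out.
public sealed class InMemoryConversationRepository : IConversationRepository
{
    private readonly Dictionary<string, Conversation> store = new Dictionary<string, Conversation>();
    public Conversation Load(string conversationId) =>
        store.TryGetValue(conversationId, out Conversation found) ? found : null;
    public void Save(Conversation conversation) => store[conversation.Id] = conversation;
}
public sealed class RecordingDispatcher : IEventDispatcher
{
    public List<MessageSent> Dispatched { get; } = new List<MessageSent>();
    public void Dispatch(MessageSent evt) => Dispatched.Add(evt);
}
```

The application service only sees IConversationRepository and IEventDispatcher, which is the sense in which it "knows" infrastructure: it depends on the interfaces, while the concrete database or broker code stays outside.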

How Spring Cloud Stream prevents the application’s instances from receiving duplicate messages?

Spring Cloud Stream is based on at-least-once delivery. This means that in some rare cases a duplicate message can arrive at an endpoint.
Does Spring Cloud Stream keep a buffer of already received messages?
The Idempotent Receiver pattern in the Enterprise Integration Patterns book suggests:
Design a receiver to be an Idempotent Receiver, one that can safely receive the same message multiple times.
Does Spring Cloud Stream control duplicate messages in consumers?
Update:
A paragraph from the Spring Cloud Stream documentation says:
4.5.1. Durability
Consistent with the opinionated application model of Spring Cloud Stream, consumer group subscriptions are durable. That is, a binder implementation ensures that group subscriptions are persistent and that, once at least one subscription for a group has been created, the group receives messages, even if they are sent while all applications in the group are stopped.
Anonymous subscriptions are non-durable by nature. For some binder implementations (such as RabbitMQ), it is possible to have non-durable group subscriptions.
In general, it is preferable to always specify a consumer group when binding an application to a given destination. When scaling up a Spring Cloud Stream application, you must specify a consumer group for each of its input bindings. Doing so prevents the application’s instances from receiving duplicate messages (unless that behavior is desired, which is unusual).

I think your assumptions about the responsibility of the spring-cloud-stream framework are incorrect.
Spring-cloud-stream in a nutshell is a framework responsible for connecting and adapting producers/consumers provided by the developer to the message broker(s) exposed by the spring-cloud-stream binder (e.g., Kafka, Rabbit, Kinesis etc).
So connecting to a broker, receiving a message from the broker, deserialising it, invoking user code, serialising the message and sending it back to the broker are within the scope of the framework's responsibility. So you can look at it as purely infrastructure.
What you're describing is more of an application concern, since the actual receiver is something that the user would develop as part of the spring-cloud-stream development experience, hence responsibility for idempotence would reside with that user.
Also, on top of that, most brokers already handle idempotency (in a way) by ensuring that a particular message has been delivered only once. That said, if someone sends an identical message to such a broker, it will have no idea that it is a duplicate, so the requirement for idempotency and/or deduplication is still valid. But as you can see it is not straightforward given the number of factors in play, where your understanding of idempotence could be different from mine, hence our approaches could be different as well.
One last thing (partially to prove my last point): "can safely receive the same message multiple times." That is all it states, but what does "safely" really mean to you vs. me vs. some other person?

If you are concerned about a case where the application receives and processes a message from the broker but crashes before it acknowledges the message, that can happen. Spring Cloud Stream app starters provide support for auto-configuration of a persistent message metadata store which backs Spring Integration's IdempotentReceiverInterceptor. An example of this is in the SFTP source app starter. By default, the sftp source uses an in-memory metadata store, so it would not survive a restart, but it can be customized to use a persistent store.

Service Fabric - A web api in cluster whose only job is to serve data from reliable collection

I am new to Service Fabric and I am currently struggling to find out how to access data from a reliable collection (defined and initialized in a Stateful Service context) from a Web API (which also lives in the Service Fabric cluster, as a separate application). The problem is very basic and I am sure I am missing something very obvious, so apologies to the community if this sounds lame.
I have a large XML, portions of which I want to expose via Web API endpoints as results of various queries. I searched for similar questions, but couldn't find a suitable answer.
Would be happy to see how an experienced SF developer would do such a task.
EDIT: I posted the solution I came up with.

After reading around and observing others' issues and Azure's samples, I have implemented a solution. Posting here the gotchas I had, hoping that will help other devs that are new to Azure Service Fabric (Disclaimer: I am still a newbie in Service Fabric, so comments and suggestions are highly appreciated):
First, pretty simple - I ended up with a stateful service and a Web API stateless service in an Azure Service Fabric application:
DataStoreService - Stateful service that reads the large XMLs and stores them in a Reliable Dictionary (this happens in the RunAsync method).
Web Api provides an /api/query endpoint that filters the collection of XElements stored in the reliable dictionary and serializes the result back to the requestor
3 Gotchas
1) How to get your hands on the reliable dictionary data from the Stateless service, i.e how to get an instance of the Stateful service from Stateless one :
ServiceUriBuilder builder = new ServiceUriBuilder("DataStoreService");
IDataStoreService DataStoreServiceClient = ServiceProxy.Create<IDataStoreService>(builder.ToUri(), new ServicePartitionKey("Your.Partition.Name"));
The above code already gives you the instance, i.e. you need to use a service proxy. For that purpose you need to:
Define an interface that your stateful service will implement, and use it when invoking the Create method of ServiceProxy (IDataStoreService)
Pass the correct partition key to the Create method. This article gives a very good intro to Service Fabric partitions
2) Registering replica listeners - in order to avoid errors saying
"The primary or stateless instance for the partition 'a67f7afa-3370-4e6f-ae7c-15188004bfa1' has invalid address, this means that right address from the replica/instance is not registered in the system",
you need to register replica listeners as stated in this post:
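public DataStoreService(StatefulServiceContext context)
    : base(context)
{
    configurationPackage = Context.CodePackageActivationContext.GetConfigurationPackageObject("Config");
}

// Assumption: the listener registration itself is not shown in the snippet above; a typical remoting (V1) registration looks like this.
protected override IEnumerable<ServiceReplicaListener> CreateServiceReplicaListeners()
    => new[] { new ServiceReplicaListener(ctx => this.CreateServiceRemotingListener(ctx)) };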
3) Service Fabric name spacing and referencing services - I took the ServiceUriBuilder class from the service-fabric-dotnet-web-reference-app. Basically you need something to generate a Uri of the form:
new Uri("fabric:/" + this.ApplicationInstance + "/" + this.ServiceInstance);
where ServiceInstance is the name of the service you want to get an instance of (DataStoreService in this case)
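For illustration, a stripped-down builder of that kind could look like the sketch below. FabricUriBuilder is a hypothetical name, not the class from the reference app, and it takes the application name as an explicit constructor argument (an assumption made so the sketch has no Service Fabric dependency) instead of reading it from the running service's context.

```csharp
using System;

// Hypothetical, simplified stand-in for the sample's ServiceUriBuilder.
public sealed class FabricUriBuilder
{
    public FabricUriBuilder(string applicationInstance, string serviceInstance)
    {
        // Accept both "MyApp" and "fabric:/MyApp" as the application name.
        ApplicationInstance = applicationInstance.Replace("fabric:/", string.Empty).Trim('/');
        ServiceInstance = serviceInstance.Trim('/');
    }

    public string ApplicationInstance { get; }
    public string ServiceInstance { get; }

    // Builds fabric:/<application>/<service>.
    public Uri ToUri() => new Uri("fabric:/" + ApplicationInstance + "/" + ServiceInstance);
}
```

The resulting Uri is what gets passed as the first argument to ServiceProxy.Create in gotcha 1.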

You can use WebAPI with OWIN to set up a communication listener and expose data from your reliable collections. See Build a web front end for your app for info on how to set that up. Take a look at the WordCount sample in the Getting started sample apps, which feeds a bunch of random words into a stateful service and keeps a count of the words processed. Hope that helps.

Service Fabric actors that receive events from other actors

I'm trying to model a news post that contains information about the user that posted it. I believe the best way is to send user summary information along with the message to create a news post, but I'm a little confused how to update that summary information if the underlying user information changes. Right now I have the following NewsPostActor and UserActor
public interface INewsPostActor : IActor
{
    Task SetInfoAndCommitAsync(NewsPostSummary summary, UserSummary postedBy);
    Task AddCommentAsync(string content, UserSummary postedBy);
}

public interface IUserActor : IActor, IActorEventPublisher<IUserActorEvents>
{
    Task UpdateAsync(UserSummary summary);
}

public interface IUserActorEvents : IActorEvents
{
    void UserInfoChanged();
}
Where I'm getting stuck is how to have the INewsPostActor implementation subscribe to events published by IUserActor. I've seen the SubscribeAsync method in the sample code at https://github.com/Azure/servicefabric-samples/blob/master/samples/Actors/VS2015/VoiceMailBoxAdvanced/VoicemailBoxAdvanced.Client/Program.cs#L45 but is it appropriate to use this inside the NewsPostActor implementation? Will that keep an actor alive for any reason?
Additionally, I have the ability to add comments to news posts, so should the NewsPostActor also keep a subscription to each IUserActor for each unique user who comments?

Events may not be what you want to be using for this. From the documentation on events (https://azure.microsoft.com/en-gb/documentation/articles/service-fabric-reliable-actors-events/):
Actor events provide a way to send best effort notifications from the
Actor to the clients. Actor events are designed for Actor-Client
communication and should NOT be used for Actor-to-Actor communication.
It is worth considering notifying the relevant actors directly, or having an actor/service that manages this communication.

Service Fabric Actors do not yet support a Publish/Subscribe architecture. (see Azure Feedback topic for current status.)
As already answered by charisk, Actor-Events are also not the way to go because they do not have any delivery guarantees.
This means, the UserActor has to initiate a request when a name changes. I can think of multiple options:
From within IUserAccount.ChangeNameAsync() you can send requests directly to all NewsPostActors (assuming the UserAccount holds a list of his posts). However, this would introduce additional latency since the client has to wait until all posts have been updated.
You can send the requests asynchronously. An easy way to do this would be to set a "NameChanged"-property on your Actor state to true within ChangeNameAsync() and have a Timer that regularly checks this property. If it is true, it sends requests to all NewsPostActors and sets the property to false afterwards. This would be an improvement to the previous version, however it still implies a very strong connection between UserAccounts and NewsPosts.
A more scalable solution would be to introduce the "Message Router"-pattern. You can read more about this pattern in Vaughn Vernon's excellent book "Reactive Messaging Patterns with the Actor Model". This way you can basically set up your own Pub/Sub model by sending a "NameChanged"-Message to your Router. NewsPostActors can - depending on your scalability needs - subscribe to that message either directly or through some indirection (maybe a NewsPostCoordinator). And also depending on your scalability needs, the router can forward the messages either directly or asynchronously (by storing them in a queue first).
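As a rough idea of the router option, the sketch below shows the subscribe/publish shape in plain C#. It is an in-process illustration only: MessageRouter is a hypothetical class, handlers are plain delegates rather than actor proxies, and it forwards directly instead of queueing. In Service Fabric the same role would be played by a dedicated actor or service that holds the subscriptions in its state.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical in-process router; not thread-safe, for illustration only.
public sealed class MessageRouter
{
    private readonly Dictionary<string, List<Func<string, Task>>> subscribers =
        new Dictionary<string, List<Func<string, Task>>>();

    // Registers a handler for a topic such as "NameChanged".
    public void Subscribe(string topic, Func<string, Task> handler)
    {
        if (!subscribers.TryGetValue(topic, out List<Func<string, Task>> handlers))
        {
            handlers = new List<Func<string, Task>>();
            subscribers[topic] = handlers;
        }
        handlers.Add(handler);
    }

    // Forwards the payload to every subscriber of the topic, one after another,
    // and returns how many subscribers were notified.
    public async Task<int> PublishAsync(string topic, string payload)
    {
        if (!subscribers.TryGetValue(topic, out List<Func<string, Task>> handlers)) return 0;
        Func<string, Task>[] snapshot = handlers.ToArray();
        foreach (Func<string, Task> handler in snapshot) await handler(payload);
        return snapshot.Length;
    }
}
```

Here the user side would call PublishAsync("NameChanged", userId) once, and each news post (or a coordinator on their behalf) would have subscribed with a handler that refreshes its stored user summary.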