Deleting an idle Stateful Service in Service Fabric - Azure

I have a set of user-specific stateful services servicing requests forwarded from a public-facing stateless service (web API) in an app.
I'm trying to delete a stateful service if it has not serviced any user request since a given time interval, say an hour. Currently, I'm managing this by keeping a .NET timer in the service itself and using the tick event to self-destruct the service if it's been idle.
Is this the right way to do it? Or is there any other more efficient approach to do this in Azure service fabric?
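Roughly, the mechanism looks like this (simplified sketch; the timer interval and the FabricClient delete call are illustrative, and assume the service has rights to delete itself):

    using System;
    using System.Fabric;
    using System.Fabric.Description;
    using System.Threading;

    // Track the last serviced request and let a timer delete the service
    // once it has been idle for an hour.
    internal sealed class IdleSelfDestruct
    {
        private readonly Uri _serviceName;
        private readonly TimeSpan _idleLimit = TimeSpan.FromHours(1);
        private readonly Timer _timer;
        private DateTime _lastRequestUtc = DateTime.UtcNow;

        public IdleSelfDestruct(Uri serviceName)
        {
            _serviceName = serviceName;
            _timer = new Timer(CheckIdle, null, TimeSpan.FromMinutes(5), TimeSpan.FromMinutes(5));
        }

        // Called from every user request the service handles.
        public void Touch() => _lastRequestUtc = DateTime.UtcNow;

        private async void CheckIdle(object state)
        {
            if (DateTime.UtcNow - _lastRequestUtc < _idleLimit)
            {
                return;
            }

            using (var fabricClient = new FabricClient())
            {
                // Removes this service instance from the application.
                await fabricClient.ServiceManager.DeleteServiceAsync(
                    new DeleteServiceDescription(_serviceName));
            }
        }
    }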

The mechanism you have will work great and is what we'd normally recommend.
Another way to do it would be to have a general "service manager" service that periodically checked whether services were busy (or was informed by them) and could kick off the DeleteServiceAsync call. That way only that service would need cluster admin rights, while all the others could be locked down to read-only.
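A rough sketch of that manager variant; how the per-user services report their activity to it (the ReportActivity call here) is an assumption, it could be a remoting call, a message, or anything similar:

    using System;
    using System.Collections.Generic;
    using System.Fabric;
    using System.Fabric.Description;
    using System.Threading.Tasks;

    // Central "service manager": the only service that needs cluster admin rights.
    public class ServiceReaper
    {
        private readonly FabricClient _fabricClient = new FabricClient();
        private readonly TimeSpan _idleLimit = TimeSpan.FromHours(1);

        // serviceName -> last time that per-user service handled a request
        private readonly Dictionary<Uri, DateTime> _lastActivity = new Dictionary<Uri, DateTime>();

        public void ReportActivity(Uri serviceName) => _lastActivity[serviceName] = DateTime.UtcNow;

        // Run this periodically (e.g. from RunAsync) to delete idle services.
        public async Task ReapIdleServicesAsync()
        {
            var idle = new List<Uri>();
            foreach (KeyValuePair<Uri, DateTime> entry in _lastActivity)
            {
                if (DateTime.UtcNow - entry.Value > _idleLimit)
                {
                    idle.Add(entry.Key);
                }
            }

            foreach (Uri serviceName in idle)
            {
                await _fabricClient.ServiceManager.DeleteServiceAsync(
                    new DeleteServiceDescription(serviceName));
                _lastActivity.Remove(serviceName);
            }
        }
    }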

Related

How to maintain state in a Service Fabric microservice deployed in multiple clusters accessing external resource

My Service Fabric service makes a SOAP call to an external service. I am trying to make it work when deployed across two or more clusters, so that if the service in one cluster has already made the connection to the external service, the service in the other cluster doesn't try to make the connection, and vice versa.
I can't think of a better way to design this than storing the state in a database, which introduces a host of issues such as locking, race conditions, etc. What are some designs that can fit this scenario? Any suggestions would be highly appreciated.
There is no way to do that out of the box on Service Fabric.
You have to find an approach to orchestrate these calls between clusters/services. You could:
Create a service in one of the clusters to delegate the calls to the other services, and store the info about connections in that single service.
Put a message in a queue and have each service take one message to open a connection (this can be combined with the approach above).
Store every active call in a shared cache (Redis): before you attempt to make the call, check whether the connection is already active somewhere; when the connection closes, remove it from the cache so other services can open the connection; also enable expiration so stale entries are cleaned up in case of service failure (a sketch of this option follows the list).
Store the state in a database, as you suggested.
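For example, a minimal sketch of the shared-cache option with StackExchange.Redis; the key naming, expiry, and how you obtain the Redis configuration are assumptions:

    using System;
    using StackExchange.Redis;

    // Coordinates "who currently owns the external SOAP connection" through Redis.
    public class ExternalCallCoordinator
    {
        private readonly ConnectionMultiplexer _redis;

        public ExternalCallCoordinator(string redisConfiguration)
        {
            _redis = ConnectionMultiplexer.Connect(redisConfiguration);
        }

        // Try to claim the external connection. Returns false if a service in
        // another cluster already holds it. The expiry guards against a holder
        // that dies without releasing the entry.
        public bool TryClaimConnection(string connectionKey, TimeSpan expiry)
        {
            IDatabase db = _redis.GetDatabase();
            // Equivalent to SET key value NX EX <expiry>: only succeeds if the key doesn't exist.
            return db.StringSet(connectionKey, Environment.MachineName, expiry, When.NotExists);
        }

        // Release the claim when the SOAP connection closes so the service in
        // the other cluster can open it.
        public void ReleaseConnection(string connectionKey)
        {
            _redis.GetDatabase().KeyDelete(connectionKey);
        }
    }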

Migration to Azure Service Fabric - Architectural considerations

We have been on Azure since 2010 and have benefited greatly from its performance and reliability in our application. Azure offers a lot of enterprise-level services, and I think the new "Azure Service Fabric" is great.
What I cannot work out from the documentation is the approach for migrating an "old" Cloud Service to the new Service Fabric. Why do we want to migrate? For horizontal scaling and more reliability.
Currently we have a single-instance cloud service that spins up a lot of subservices. Those subservices are great candidates for microservices. The only problem is that some of these subservices are "runners", i.e. they just cycle over our users database and decide whether an operation (service) has to be run for a particular user or not.
How would you migrate a service like this considering that more than one instance may run this service?
Thanks
First thing to keep in mind is that once a service is started it keeps running, and its lifecycle and uptime are controlled by Service Fabric (e.g. it will restart it automatically if it crashes). Second thing to keep in mind is that you will end up with multiple instances of the service running at the same time (on different nodes), so they will end up doing the exact same thing on different nodes of your cluster.
Your first reflex could be to have one stateless service kind/instance per runner "subservice" that keeps running and leverages RunAsync (https://learn.microsoft.com/en-us/azure/service-fabric/service-fabric-reliable-services-advanced-usage). Personally, I wouldn't take that approach, since it could require some kind of synchronization between services to prevent useless concurrency, given that they would all do the exact same thing independently.
A better approach would be to have your runner services run only when requested by a "main" service acting as an orchestrator: take a queue-based approach where the "main" service submits tasks (messages) to be processed by the runners, which listen concurrently on the same queue, so that at most one service instance completes each task (see the sketch below the links).
For the queue, think Service Bus or Reliable Concurrent Queue (https://learn.microsoft.com/en-us/dotnet/api/microsoft.servicefabric.data.collections.preview.ireliableconcurrentqueue-1).
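A minimal sketch of the queue-based runner using a Service Bus queue (the connection string, queue name, and ProcessUserAsync are placeholders; the same pattern applies if the orchestrator owns a reliable queue instead):

    using System;
    using System.Fabric;
    using System.Text;
    using System.Threading;
    using System.Threading.Tasks;
    using Microsoft.Azure.ServiceBus;
    using Microsoft.ServiceFabric.Services.Runtime;

    // A "runner" stateless service that only does work when the orchestrating
    // "main" service drops a task onto the queue.
    internal sealed class RunnerService : StatelessService
    {
        private const string ConnectionString = "<service-bus-connection-string>";
        private const string QueueName = "runner-tasks";

        public RunnerService(StatelessServiceContext context) : base(context) { }

        protected override async Task RunAsync(CancellationToken cancellationToken)
        {
            var client = new QueueClient(ConnectionString, QueueName);

            client.RegisterMessageHandler(
                async (message, token) =>
                {
                    string userId = Encoding.UTF8.GetString(message.Body);
                    await ProcessUserAsync(userId);  // the actual "runner" work
                    await client.CompleteAsync(message.SystemProperties.LockToken);
                },
                new MessageHandlerOptions(args => Task.CompletedTask)
                {
                    MaxConcurrentCalls = 1, // each instance handles one task at a time
                    AutoComplete = false    // complete only after the work succeeded
                });

            try
            {
                // Keep RunAsync alive until Service Fabric asks the service to stop.
                await Task.Delay(Timeout.Infinite, cancellationToken);
            }
            finally
            {
                await client.CloseAsync();
            }
        }

        private Task ProcessUserAsync(string userId) => Task.CompletedTask; // placeholder
    }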

Waiting for a service to be ready (Service Fabric)

I have four services running on Azure Service Fabric, but two of those four services depend on another one. Is there a way to make a service's initialization wait until another service announces it is ready?
No. There's no ordering to service creation (services can be created at any time, not just during a deployment from your build machine), and what does it even mean for your service to be ready? From our perspective it means the Failover Manager found nodes that the service is able to run on and the code packages have been activated on those nodes. The platform doesn't know what your service code does though. From your perspective it probably means "when it's responding to my requests" otherwise it's not "ready," which can happen at any time during the service's lifetime for any number of reasons:
Service was just deployed and its communication stack hasn't opened an endpoint yet
Service instance/replica moved and its communication stack is spinning back up on a new node
Service partition is in quorum loss and not accepting write operations
etc.
This is an ongoing thing that your services need to be prepared to handle. If two of your services can't do any work until they can talk to another service, then they need to poll for the service they depend on until it's available, through an endpoint on that service that you define, as sketched below.
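For example, a minimal sketch of that polling, assuming the dependency exposes an HTTP endpoint you can probe (the address and path are whatever you define):

    using System;
    using System.Net.Http;
    using System.Threading;
    using System.Threading.Tasks;

    // Poll a dependency's endpoint until it responds before starting work.
    internal static class DependencyGate
    {
        private static readonly HttpClient Http = new HttpClient();

        public static async Task WaitForDependencyAsync(Uri healthEndpoint, CancellationToken cancellationToken)
        {
            while (true)
            {
                cancellationToken.ThrowIfCancellationRequested();
                try
                {
                    HttpResponseMessage response = await Http.GetAsync(healthEndpoint, cancellationToken);
                    if (response.IsSuccessStatusCode)
                    {
                        return; // the dependency answered, we can start doing work
                    }
                }
                catch (HttpRequestException)
                {
                    // Endpoint not open yet, replica moving, quorum loss, etc.; just retry.
                }

                await Task.Delay(TimeSpan.FromSeconds(5), cancellationToken);
            }
        }
    }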

Windows Azure Inter-Role communication

I want to create an Azure application which does the following:
User is presented with a MVC 4 website (web role) which shows a list of commands.
When the user selects a command, it is broadcast to all worker roles.
Worker roles process the task, store the results and notify web role
Web role displays the combined results of the worker roles
From what I've been reading there seem to be two ways of doing this: the Windows Azure Service Bus or Azure Storage queues. Each worker role also stores the results in the database.
The Service Bus seems more appropriate with its publish/subscribe model, so all worker roles would get the same command at roughly the same time. Queues seem easier to use, though.
Can the Service Bus be used locally with the emulator when developing? I am using a free trial and cannot keep the application running constantly while still developing. Also, when using queues, how can you notify the web role that processing is complete?
I agree. ServiceBus is a better choice for this messaging requirement. You could, with some effort, do the same with queues. But, you'll be writing a lot of code to implement things that the ServiceBus already gives you.
There is not a local emulator for ServiceBus like there is for the Azure Storage service (queues/tables/blobs). However, you could still use the ServiceBus for messaging between roles while they are running locally in your development environment.
As for your last question about notifying the web role that processing is complete, there are several ways to go here. Just a few thoughts (not an exhaustive list)...
Table storage where the web role can periodically check the status of the unit of work.
Another ServiceBus queue/topic for completed work (see the sketch after this list).
Internal endpoints. You'll have to have logic to know if it's just an update from worker role N or if it is indicating a completed unit of work for all worker roles.
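A rough sketch of the second option, a Service Bus queue for completed work; the queue name and the WorkCompleted contract are placeholders, and in real code you would cache the QueueClient rather than recreate it per call:

    using System;
    using System.Runtime.Serialization;
    using Microsoft.ServiceBus.Messaging;

    [DataContract]
    public class WorkCompleted
    {
        [DataMember] public Guid CommandId { get; set; }
        [DataMember] public string WorkerId { get; set; }
    }

    public static class CompletionQueue
    {
        private const string QueueName = "completed-work";

        // Called by a worker role when it finishes processing a command.
        public static void ReportCompleted(string connectionString, WorkCompleted result)
        {
            var client = QueueClient.CreateFromConnectionString(connectionString, QueueName);
            client.Send(new BrokeredMessage(result));
        }

        // Called by the web role to check for finished work; returns null if nothing arrived.
        public static WorkCompleted TryReceiveCompleted(string connectionString)
        {
            var client = QueueClient.CreateFromConnectionString(connectionString, QueueName);
            BrokeredMessage message = client.Receive(TimeSpan.FromSeconds(5));
            if (message == null)
            {
                return null;
            }

            var completed = message.GetBody<WorkCompleted>();
            message.Complete(); // remove it from the queue once read
            return completed;
        }
    }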
I agree with Rick's answer, but would also add the following things to think about:
If you choose the Service Bus Topic approach, then as each worker role comes online it would need to create a subscription to the topic. You'll need to think about subscription maintenance: what happens when one of the workers fails and is recycled, or any of the other reasons a stale subscription may be left out there (a sketch of the subscription setup follows below).
Telling the web role that all the workers are complete is interesting. The options Rick provides are good ones, but you'll need to think about some things here. It means the web role needs to know just how many workers are out there, or needs some other mechanism to decide when all have reported done. You could have the situation of five worker roles receiving a message and starting work, and then one of them starts to repeatedly fail processing. The other four report their completion, but now the web role is waiting on the fifth. How long do you wait for a reply? Can you continue? What if you just told the system to scale down, and while the web role thinks there are five workers there are now only four? These are things you'll need to think about, and they all depend on your requirements.
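As a rough sketch of the per-worker subscription setup (the topic name, the naming convention, and the AutoDeleteOnIdle value are assumptions):

    using System;
    using Microsoft.ServiceBus;
    using Microsoft.ServiceBus.Messaging;

    public static class WorkerSubscriptions
    {
        // Each worker role instance creates (or re-attaches to) its own
        // subscription on the command topic when it starts.
        public static SubscriptionClient CreateWorkerSubscription(string connectionString, string instanceId)
        {
            const string topicPath = "commands";
            string subscriptionName = "worker-" + instanceId;

            var namespaceManager = NamespaceManager.CreateFromConnectionString(connectionString);
            if (!namespaceManager.SubscriptionExists(topicPath, subscriptionName))
            {
                namespaceManager.CreateSubscription(new SubscriptionDescription(topicPath, subscriptionName)
                {
                    // If this worker is recycled and never comes back, let the stale
                    // subscription clean itself up rather than accumulate forever.
                    AutoDeleteOnIdle = TimeSpan.FromHours(1)
                });
            }

            return SubscriptionClient.CreateFromConnectionString(connectionString, topicPath, subscriptionName);
        }
    }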
Based on your question, you could use either queue service and get good results. But each of them is going to have different challenges to overcome, as well as advantages.
Some advantages of Service Bus queues are that they provide blocking receive with a persistent connection (up to 100 connections), can monitor messages for completion, and can send larger messages (256 KB).
Some advantages of storage queues over the Service Bus solution are that they're slightly faster (if 15 ms matters to you), you can use a single storage system (since you'll probably be using Storage for blob and table services anyway), and auto-scaling is simple. If you need to auto-scale your worker roles based on load, passing the requests through a storage queue makes auto-scaling trivial: you just set up auto-scaling in the Azure Cloud Service UI under the scale tab.
A more in-depth comparison of the two azure queue services can be found here: http://msdn.microsoft.com/en-us/library/hh767287.aspx
Also, when using queues how can you notify the web role that processing is complete?
For the Azure Storage Queues solution, I've written a library that can help: https://github.com/brentrossen/AzureDistributedService.
It provides a proxy layer that facilitates RPC style communication from web roles to worker roles and back through Storage Queues.

Azure - RoleEnvironement events not firing when role is rebooted or taken down for patching

In short, is there a RoleEnvironment event that I can handle in code when any other role in my deployment is rebooted or taken offline for patching?
I've got an application in production that has both web roles for a web front end and web roles running WCF services as an application layer (business logic, data access, etc.). The web layer communicates with the WCF layer over an internal endpoint, as we don't want to expose the services at this point in time. So this means it is not possible to use the load balancer to call my service layer through a single URL.
So I have to load balance requests to the WCF web roles manually. This has caused problems in the past when a machine has been recycled by the fabric controller for patching.
I'm handling the RoleEnvironment.Changing and RoleEnvironment.Changed events to adjust the list of backend web roles I am communicating with, which works well in testing when I make a configuration change to increase or decrease the number of instances in my deployment. But if I reboot a role through the portal, this does not fire the RoleEnvironment events.
Thanks,
Rob
RoleEnvironment.Changing will be fired "before a change to the service configuration" (my emphasis). In this case no configuration change is occurring; your service is still configured to have exactly the same number of instances. AFAIK there is no way to know when an instance in your deployment is taken offline, and clearly there are cases where notice cannot be given in advance (e.g. hardware failure). Therefore you have to code for communication failure, intercept the error, and try another role instance (sketched below).
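A minimal sketch of that failover, assuming you keep your own list of the internal endpoints of the WCF role instances (the actual contract call is passed in as a delegate):

    using System;
    using System.Collections.Generic;
    using System.ServiceModel;

    public static class WcfFailover
    {
        // Walk the known internal endpoints and fall through to the next one
        // on a communication failure.
        public static TResult CallWithFailover<TResult>(
            IEnumerable<Uri> serviceEndpoints,
            Func<Uri, TResult> callService)
        {
            var failures = new List<Exception>();

            foreach (Uri endpoint in serviceEndpoints)
            {
                try
                {
                    return callService(endpoint);
                }
                catch (CommunicationException ex)
                {
                    // This instance is rebooting, being patched, or gone; try the next one.
                    failures.Add(ex);
                }
                catch (TimeoutException ex)
                {
                    failures.Add(ex);
                }
            }

            throw new AggregateException("No WCF role instance could be reached.", failures);
        }
    }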
I do not believe you can intercept RoleEnvironment changes from a different Role easily.
I would suggest that you trap RoleEnvironment changes in the role where they occur, handle them by putting a message/record onto some persisted storage, and let your web roles check that storage either on a regular schedule or every time they communicate with the WCF roles.
Basically, if you're doing your own internal load balancing, you need a mechanism for registration/tear-down of your instances so that you can manage your WCF workers.
You can use Azure Storage queues to post a message when a role is going down, and have a worker role that listens on that queue and adjusts things accordingly. For example:
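A rough sketch with the classic storage client; the connection-string setting name and the "role-status" queue name are illustrative:

    using Microsoft.WindowsAzure.ServiceRuntime;
    using Microsoft.WindowsAzure.Storage;
    using Microsoft.WindowsAzure.Storage.Queue;

    public class WcfWorkerRole : RoleEntryPoint
    {
        public override bool OnStart()
        {
            // Fired when this instance is being stopped (reboot, patching, scale-down).
            RoleEnvironment.Stopping += (sender, args) =>
            {
                var account = CloudStorageAccount.Parse(
                    RoleEnvironment.GetConfigurationSettingValue("StorageConnectionString"));
                CloudQueue queue = account.CreateCloudQueueClient().GetQueueReference("role-status");
                queue.CreateIfNotExists();
                queue.AddMessage(new CloudQueueMessage(
                    "STOPPING:" + RoleEnvironment.CurrentRoleInstance.Id));
            };

            return base.OnStart();
        }
    }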

Resources