Storing a large amount of state in a service fabric cluster - azure

I have a scenario where we need to store x*100 GBs of data. The data is in-general a good candidate for persistent state for an actor (well-partitioned, used by the specific actors only) in the service fabric cluster itself.
Is the service fabric persistent state storage recommended for data of this scale? (Our compute load is going to be fairly low, so bumping up VMs just to store the state is not a desirable option.)
How does the amount of persistent state affect the latency of moving partitions between nodes in the cluster?

Well let's look at how state is stored in a service (this applies to actors too).
The component that stores your data in your service is called a State Provider. State providers can be in-memory only or in-memory + local disk. The default state provider you get with an actor service is in-memory + local disk but it only keeps hot data in memory so your storage requirements are not memory bound. Contrast with the Reliable Collections state provider which currently stores all data both in-memory and on local disk, although in a future release it will also have an option to only keep hot data in memory and offload the rest to local disk.
Given that you are using actors, you can use the default actor state provider which means your data capacity is limited by local disk storage on your machines or VMs, which should be reasonable for storing 100s of GB. We generally don't move entire partitions around, but occasionally Service Fabric does need to rebuild a replica of your service, and the more data you have the longer it will take to build a replica. However, this doesn't really affect the latency of your service, because you have multiple replicas in a stateful service and you usually have enough replicas up that you don't need to wait for another to be rebuilt. Rebuilding a replica is usually something that happens "off to the side."
It's true that it's not economical to add VMs just for storing state, but keep in mind that you can pack as many services onto your VMs as you want. So even though your actor service isn't using much compute, you can always pack other services on those VMs to use up that compute so that you're maximizing both compute and storage on your VMs, which can in fact be very economical.

Related

What's the difference between primary and non-primary nodes in Azure Service Fabric?

I can't find any specific documentation that says what's the difference between primary node and a non-primary node, and how are they being used. Can somebody shed light on it? Thanks.
If you compare Service Fabric to other Orchestration Tools like Kubernetes, you will notice a small difference on how clusters are defined.
Kubernetes uses a concept of Master to host cluster management services, and Minion to host your application services(containers). Until version 1.1 it was not possible to run containers on the masters, because it had the idea that Master's should be isolated to avoid conflicting with containers running on it, like consuming too much memory, disk, cpu, and so on.
On Service Fabric this is a bit different. When you define a NodeType as Primary, what it means inside the cluster is that this NodeType will be responsible to host the Service Fabric Management Services(services needed to control the cluster health, orchestration and so on).
When you deploy a cluster via Azure Portal, depending on the durability tier (Bronze,Silver,Gold) you choose, it will require a certain number of nodes on Primary Node Type, to keep the cluster management healthy. For production workloads, 5 nodes the minimum recommended size for Primary NodeType or NonPrimary with stateful workloads in it. The minimum supported use VM SKU is Standard D1 or Standard D1_V2.
There is a catch for Primary Node-type, the change of VMSS Sku (Size) is not supported, you can do on your own risk, but is a recipe for disaster, because the risk of loosing management services is too high.
Non-primary NodeType, there is no overall difference other than these mentioned above. All NodeTypes will have a VMSS and a LoadBalancer(with an domain) being able to configure the access rules. All NodeType will have a limit of 100 nodes.
Compared to Kubernetes, SF does not add any constraints to prevent you deploying your services alongside the management services on primary nodes, Every node is part of a pool of resources(including the primary). So the default behaviour is deploy applications on every node available no matter the NodeType.
When you plan bigger clusters (100+ nodes), it is important that you take that in account, and isolate your Primary NodeType from your workloads, and remove the pressure on your management services nodes.
Having multiple node types can be useful in these situations:
You want to run services exposed to the internet & services not exposed. The first set would run on a node type (VMSS) attached to the Load Balancer and the second on a scale set that isn't.
You need to run services for certain customers on premium hardware and trials on cheaper hardware. The first set would run on nodes with lots of CPU, lots of RAM. The second on lower SKU's.
You want to build a cluster that exceeds the max node count that one VMSS can hold.
Or you need to add scale sets on the fly, to support huge growth.
And: The primary nodes run your system services, the secondaries don't.
There is not much of a difference. Nodes of different node types all share the same characteristics of a Service Fabric Cluster. They all participate in load balancing etc.
Except for one thing: system services run om the nodes of the primairy node type only (source):
Primary node type is where the system services run, so the VM SKU you choose for it, must take into account the overall peak load you plan to place into the cluster. Here is an analogy to illustrate what I mean here - Think of the primary node type as your "Lungs", it is what provides oxygen to your brain, and so if the brain does not get enough oxygen, your body suffers.
An important purpose of node types is to constraint service placement to specific node types. For example, you can have several node types, one uses VM's with higher cpu capacity and one with focus on amount of memory. The you can place memory resource hungry services on one node type and cpu intensive services on the other.

Service fabric Stateful service - Scaling without partitioning?

I am planning to migrate my existing cloud monolithic Restful Web API service to Service fabric in three steps.
The Memory cache (in process) has been heavily used in my cloud service.
Step 1) Migrate cloud service to SF stateful service with 1 replica and single partition. The cache code is as it is. No use of Reliable collection.
Step 2) Horizontal scaling of SF Monolithic stateful service to 5 replica and single partition. Cache code is modified to use Reliable collection.
Step 3) Break down the SF monolithic service to micro services (stateless / stateful)
Is the above approach cleaner? Any recommendation.? Any drawback?
More on Step 2) Horizontal scaling of SF stateful service
I am not planning to use SF partitioning strategy as I could not think of uniform data distribuition in my applictaion.
By adding more replica and no partitioning with SF stateful service , I am just making my service more reliable (Availability) . Is my understanding correct?
I will modify the cache code to use Reliable collection - Dictionary. The same state data will be available in all replicas.
I understand that the GET can be executed on any replica , but update / write need to be executed on primary replica?
How can i scale my SF stateful service without partitioning ?
Can all of the replica including secondory listen to my client request and respond the same? GET shall be able to execute , How PUT & POST call works?
Should i prefer using external cache store (Redis) over Reliable collection at this step? Use Stateless service?
This document has a good overview of options for scaling a particular workload in Service Fabric and some examples of when you'd want to use each.
Option 2 (creating more service instances, dynamically or upfront) sounds like it would map to your workload pretty well. Whether you decide to use a custom stateful service as your cache or use an external store depends on a few things:
Whether you have the space in your main compute machines to store the cached data
Whether your service can get away with a simple cache or whether it needs more advanced features provided by other caching services
Whether your service needs the performance improvement of a cache in the same set of nodes as the web tier or whether it can afford to call out to a remote service in terms of latency
whether you can afford to pay for a caching service, or whether you want to make due with using the memory, compute, and local storage you're already paying for with the VMs.
whether you really want to take on building and running your own cache
To answer some of your other questions:
Yes, adding more replicas increases availability/reliability, not scale. In fact it can have a negative impact on performance (for writes) since changes have to be written to more replicas.
The state data isn't guaranteed to be the same in all replicas, just a majority of them. Some secondaries can even be ahead, which is why reading from secondaries is discouraged.
So to your next question, the recommendation is for all reads and writes to always be performed against the primary so that you're seeing consistent quorum committed data.

Service fabric actors state

We are planning to use Service fabric actor model for one of our user services. We have thousands of users and they have their own profile data. By far reading the materials, service fabric actor model maintains its states with their service fabric cluster. I couldn't get a clear picture in disaster recovery/planned shutdown scenarios/offline data access. In such cases, Is it needed to persist the data out side of these actor service?
What happens to the data, if we decided to shutdown all the service fabric cluster one day, and wanted to reactivate few days later?
In an SF cluster in Azure, the data is stored on the temp drive. There's no guarantee that a node that is shutdown retains the temp drive. So shutting down all nodes simultaneously will result in data loss.
To avoid this, you should regularly create backups of your (Actor) Services. For instance by using this Nuget package. Store the resulting files outside the cluster.
The cluster technology will help keep your data safe during failures of nodes, e.g. in a 5 node cluster, 4 remaining healthy nodes can take over the work of a failed node. Data is stored redundantly, so your services remain operational. The same functionality also allows for rolling upgrades of services/actors.
Here's an article about DR.
I had implemented a large enterprise application in service fabric using actor model for management of orders.
Few things that might help while choosing a strategy for data backup and restoration
As the package https://github.com/loekd/ServiceFabric.BackupRestore is not full fledged and you need to take care of some of the scenario.
for example: During deployment your actor partitions moved to other nodes and if you try to take incremental backups it will failed with FabricMissingFullBackupException because on that node after becoming primary you haven't took the Full backup and some one needs to manually fix the issue.
How we added the retry pattern to fix that issue is not in the scope of this question.
Incremental backups didn't restore always during restoration process.
Some time Incremental backup creation failed even if you set the logTrunctationIntervalInMinutes properly.
Some developer by mistake deleted the service or application you will loss all your data.
if your system heavily dependent on Reminder's which was in our case.
During restoration all the reminders gets reset.
Good Solution: Override the default KvsActorStateProvider with your own implementation which stores the data in DocumentDB, MongoDB, Cassandra or Azure SQL if you want to use the power BI for some analytics.

How to store temporary data in an Azure multi-instance (scale set) virtual machine?

We developed a server service that (in a few words) supports the communications between two devices. We want to make advantage of the scalability given by an Azure Scale Set (multi instance VM) but we are not sure how to share memory between each instance.
Our service basically stores temporary data in the local virtual machine and these data are read, modified and sent to the devices connected to this server.
If these data are stored locally in one of the instances the other instances cannot access and do not have the same information. Is it correct?
If one of the devices start making some request to the server the instance that is going to process the request will not always be the same so the data at the end is spread between instances.
So the question might be, how to share memory between Azure instances?
Thanks
Depending on the type of data you want to share and how much latency matters, as well as ServiceFabric (low latency but you need to re-architect/re-build bits of your solution), you could look at a shared back end repository - Redis Cache is ideal as a distributed cache; SQL Azure if you want to use a relation db to store the data; storage queue/blob storage - or File storage in a storage account (this allows you just to write to a mounted network drive from both vm instances). DocumentDB is another option, which is suited to storing JSON data.
You could use Service Fabric and take advantage of Reliable Collections to have your state automagically replicated across all instances.
From https://azure.microsoft.com/en-us/documentation/articles/service-fabric-reliable-services-reliable-collections/:
The classes in the Microsoft.ServiceFabric.Data.Collections namespace provide a set of out-of-the-box collections that automatically make your state highly available. Developers need to program only to the Reliable Collection APIs and let Reliable Collections manage the replicated and local state.
The key difference between Reliable Collections and other high-availability technologies (such as Redis, Azure Table service, and Azure Queue service) is that the state is kept locally in the service instance while also being made highly available.
Reliable Collections can be thought of as the natural evolution of the System.Collections classes: a new set of collections that are designed for the cloud and multi-computer applications without increasing complexity for the developer. As such, Reliable Collections are:
Replicated: State changes are replicated for high availability.
Persisted: Data is persisted to disk for durability against large-scale outages (for example, a datacenter power outage).
Asynchronous: APIs are asynchronous to ensure that threads are not blocked when incurring IO.
Transactional: APIs utilize the abstraction of transactions so you can manage multiple Reliable Collections within a service easily.
Working with Reliable Collections -
https://azure.microsoft.com/en-us/documentation/articles/service-fabric-work-with-reliable-collections/

Turning off ServiceFabric clusters overnight

We are working on an application that processes excel files and spits off output. Availability is not a big requirement.
Can we turn the VM sets off during night and turn them on again in the morning? Will this kind of setup work with service fabric? If so, is there a way to schedule it?
Thank you all for replying. I've got a chance to talk to a Microsoft Azure rep and documented the conversation in here for community sake.
Response for initial question
A Service Fabric cluster must maintain a minimum number of Primary node types in order for the system services to maintain a quorum and ensure health of the cluster. You can see more about the reliability level and instance count at https://azure.microsoft.com/en-gb/documentation/articles/service-fabric-cluster-capacity/. As such, stopping all of the VMs will cause the Service Fabric cluster to go into quorum loss. Frequently it is possible to bring the nodes back up and Service Fabric will automatically recover from this quorum loss, however this is not guaranteed and the cluster may never be able to recover.
However, if you do not need to save state in your cluster then it may be easier to just delete and recreate the entire cluster (the entire Azure resource group) every day. Creating a new cluster from scratch by deploying a new resource group generally takes less than a half hour, and this can be automated by using Powershell to deploy an ARM template. https://azure.microsoft.com/en-us/documentation/articles/service-fabric-cluster-creation-via-arm/ shows how to setup the ARM template and deploy using Powershell. You can additionally use a fixed domain name or static IP address so that clients don’t have to be reconfigured to connect to the cluster. If you have need to maintain other resources such as the storage account then you could also configure the ARM template to only delete the VM Scale Set and the SF Cluster resource while keeping the network, load balancer, storage accounts, etc.
Q)Is there a better way to stop/start the VMs rather than directly from the scale set?
If you want to stop the VMs in order to save cost, then starting/stopping the VMs directly from the scale set is the only option.
Q) Can we do a primary set with cheapest VMs we can find and add a secondary set with powerful VMs that we can turn on and off?
Yes, it is definitely possible to create two node types – a Primary that is small/cheap, and a ‘Worker’ that is a larger size – and set placement constraints on your application to only deploy to those larger size VMs. However, if your Service Fabric service is storing state then you will still run into a similar problem that once you lose quorum (below 3 replicas/nodes) of your worker VM then there is no guarantee that your SF service itself will come back with all of the state maintained. In this case your cluster itself would still be fine since the Primary nodes are running, but your service’s state may be in an unknown replication state.
I think you have a few options:
Instead of storing state within Service Fabric’s reliable collections, instead store your state externally into something like Azure Storage or SQL Azure. You can optionally use something like Redis cache or Service Fabric’s reliable collections in order to maintain a faster read-cache, just make sure all writes are persisted to an external store. This way you can freely delete and recreate your cluster at any time you want.
Use the Service Fabric backup/restore in order to maintain your state, and delete the entire resource group or cluster overnight and then recreate it and restore state in the morning. The backup/restore duration will depend entirely on how much data you are storing and where you export the backup.
Utilize something such as Azure Batch. Service Fabric is not really designed to be a temporary high capacity compute platform that can be started and stopped regularly, so if this is your goal you may want to look at an HPC platform such as Azure Batch which offers native capabilities to quickly burst up compute capacity.
No. You would have to delete the cluster and recreate the cluster and deploy the application in the morning.
Turning off the cluster is, as Todd said, not an option. However you can scale down the number of VM's in the cluster.
During the day you would run the number of VM's required. At night you can scale down to the minimum of 5. Check this page on how to scale VM sets: https://azure.microsoft.com/en-us/documentation/articles/service-fabric-cluster-scale-up-down/
For development purposes, you can create a Dev/Test Lab Service Fabric cluster which you can start and stop at will.
I have also been able to start and stop SF clusters on Azure by starting and stopping the VM scale sets associated with these clusters. But upon restart all your applications (and with them their state) are gone and must be redeployed.

Resources