Do Service Fabric singleton services have downtime when scaling?

We use a Service Fabric cluster to deploy stateless microservices. One of the microservices is designed as a singleton. This means it is designed to be deployed on a single node only.
But does this mean that the service will be down when we scale the VM scale set up or down (horizontal scaling)? Or does the Service Fabric cluster take care of it?

There are two main concepts to keep in mind about services in Service Fabric, mainly but not limited to stateful services: partitions and replicas.
Partitions define how the data is split into groups; they come in three flavours, as sketched below:
Ranged partitioning (otherwise known as UniformInt64Partition). Used to split the data by a range of integer values.
Named partitioning. Applications using this model usually have data that can be bucketed, within a bounded set. Some common examples of data fields used as named partition keys would be regions, postal codes, customer groups, or other business boundaries.
Singleton partitioning. Singleton partitions are typically used when the service does not require any additional routing. For example, stateless services use this partitioning scheme by default.
When you use a singleton partition for a stateful service, the data is managed as a single group; no actual data partitioning takes place.
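To make the three schemes concrete, here is a minimal sketch using the System.Fabric client API to describe services with each scheme (the application name, service names, key range, and partition names are invented for illustration):

```csharp
using System;
using System.Fabric;
using System.Fabric.Description;
using System.Threading.Tasks;

class PartitionSchemeExamples
{
    static async Task Main()
    {
        var client = new FabricClient(); // connects to the local cluster by default

        // Singleton: a single partition, the default for stateless services.
        var singleton = new SingletonPartitionSchemeDescription();

        // Ranged (UniformInt64): splits data across integer key ranges.
        var ranged = new UniformInt64RangePartitionSchemeDescription
        {
            PartitionCount = 10,
            LowKey = long.MinValue,
            HighKey = long.MaxValue
        };

        // Named: one partition per business bucket (regions, customer groups, ...).
        var named = new NamedPartitionSchemeDescription();
        named.PartitionNames.Add("EMEA");
        named.PartitionNames.Add("AMER");

        // Create a stateful service that uses the singleton scheme.
        await client.ServiceManagementClient.CreateServiceAsync(new StatefulServiceDescription
        {
            ApplicationName = new Uri("fabric:/MyApp"),          // hypothetical app
            ServiceName = new Uri("fabric:/MyApp/SingletonSvc"), // hypothetical service
            ServiceTypeName = "SingletonSvcType",                // hypothetical type
            HasPersistedState = true,
            TargetReplicaSetSize = 3, // three copies, see the replica discussion below
            MinReplicaSetSize = 2,
            PartitionSchemeDescription = singleton
        });
    }
}
```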
Replicas define the number of copies of a partition kept around the cluster, in order to prevent data loss when a primary replica fails.
In summary:
If you use a singleton partition, scaling shouldn't be a problem as long as the number of replicas is at least 3.
That means that when a node gets updated (or removed while scaling down), the replica hosted on that node is moved to another node. If the replica being moved is the primary, it is demoted to secondary while a secondary is promoted to primary; the demoted replica is then shut down and rebuilt on another node.
The third replica is needed in case a replica fails during an upgrade; the third one can then be promoted to primary.
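All of that movement is invisible to callers as long as they re-resolve the service endpoint after a failure instead of caching it forever. A minimal client-side sketch, assuming a singleton service named fabric:/MyApp/SingletonSvc (hypothetical):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.ServiceFabric.Services.Client;

class ResolveSingleton
{
    static async Task Main()
    {
        var resolver = ServicePartitionResolver.GetDefault();
        var serviceUri = new Uri("fabric:/MyApp/SingletonSvc"); // hypothetical service

        // Resolve the current endpoint of the singleton partition. If the replica
        // was just moved (e.g. during a scale-down), resolving again returns the
        // address of the new primary rather than the stale one.
        var partition = await resolver.ResolveAsync(
            serviceUri, ServicePartitionKey.Singleton, CancellationToken.None);

        Console.WriteLine(partition.GetEndpoint().Address);
    }
}
```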

Related

Are Service Fabric Partition Identifiers persistent?

In Service Fabric we can get a Partition Identifier from a Service Context.
The docs don't say much about this property. :-(
I'm wondering whether this identifier is persistent for stateful services, so that it is assigned the same value every time the stateful service partition is started.
I've tested this by creating a reliable service on the local development cluster. After stopping and restarting the cluster the same partition identifiers were indeed reported.
Can I trust that this will always be the case?
If not, what would cause it to change?
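For reference, the property being asked about is the Guid exposed on the service context; a minimal sketch of where it comes from, in a hypothetical reliable service:

```csharp
using System;
using System.Fabric;
using Microsoft.ServiceFabric.Services.Runtime;

// Hypothetical stateful service, just to show where the identifier lives:
// ServiceContext.PartitionId is the Guid assigned to this partition.
class MyStatefulService : StatefulService
{
    public MyStatefulService(StatefulServiceContext context) : base(context)
    {
        Guid partitionId = context.PartitionId; // the identifier in question
        Console.WriteLine($"PartitionId: {partitionId}");
    }
}
```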

What's the difference between primary and non-primary nodes in Azure Service Fabric?

I can't find any specific documentation that explains the difference between a primary node and a non-primary node, and how each is used. Can somebody shed light on it? Thanks.
If you compare Service Fabric to other orchestration tools like Kubernetes, you will notice a small difference in how clusters are defined.
Kubernetes uses the concept of a Master to host cluster management services and Minions to host your application services (containers). Until version 1.1 it was not possible to run containers on the masters, the idea being that masters should be isolated so that containers running on them couldn't interfere by consuming too much memory, disk, CPU, and so on.
On Service Fabric this is a bit different. When you define a NodeType as primary, it means that this NodeType will be responsible for hosting the Service Fabric management services (the services needed to control cluster health, orchestration, and so on).
When you deploy a cluster via the Azure Portal, depending on the durability tier (Bronze, Silver, Gold) you choose, it will require a certain number of nodes on the Primary NodeType to keep the cluster management healthy. For production workloads, 5 nodes is the minimum recommended size for the Primary NodeType, or for a non-primary NodeType that hosts stateful workloads. The minimum supported VM SKU is Standard D1 or Standard D1_V2.
There is a catch for the Primary NodeType: changing the VMSS SKU (size) is not supported. You can do it at your own risk, but it is a recipe for disaster, because the risk of losing the management services is too high.
For non-primary NodeTypes, there is no difference other than the points mentioned above. Every NodeType gets its own VMSS and load balancer (with its own domain), on which you can configure access rules, and every NodeType has a limit of 100 nodes.
Compared to Kubernetes, SF does not add any constraints to prevent you from deploying your services alongside the management services on primary nodes. Every node is part of one pool of resources (including the primary ones), so the default behaviour is to deploy applications to every available node, regardless of NodeType.
When you plan bigger clusters (100+ nodes), it is important to take that into account: isolate your Primary NodeType from your workloads to take the pressure off the nodes hosting your management services.
Having multiple node types can be useful in these situations:
You want to run services exposed to the internet & services not exposed. The first set would run on a node type (VMSS) attached to the Load Balancer and the second on a scale set that isn't.
You need to run services for certain customers on premium hardware and trials on cheaper hardware. The first set would run on nodes with lots of CPU and lots of RAM, the second on lower SKUs.
You want to build a cluster that exceeds the max node count that one VMSS can hold.
Or you need to add scale sets on the fly, to support huge growth.
And: the primary node type runs your system services; the non-primary ones don't.
There is not much of a difference. Nodes of different node types all share the same characteristics of a Service Fabric cluster; they all participate in load balancing, etc.
Except for one thing: system services run on the nodes of the primary node type only (source):
Primary node type is where the system services run, so the VM SKU you choose for it, must take into account the overall peak load you plan to place into the cluster. Here is an analogy to illustrate what I mean here - Think of the primary node type as your "Lungs", it is what provides oxygen to your brain, and so if the brain does not get enough oxygen, your body suffers.
An important purpose of node types is to constrain service placement to specific node types. For example, you can have several node types, one using VMs with higher CPU capacity and one focused on memory; you can then place memory-hungry services on one node type and CPU-intensive services on the other.
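As a sketch of that placement idea (the node type name and service names are invented; NodeType is a built-in node property you can use in placement constraint expressions):

```csharp
using System;
using System.Fabric;
using System.Fabric.Description;
using System.Threading.Tasks;

class PlacementConstraintExample
{
    static async Task Main()
    {
        var client = new FabricClient();

        await client.ServiceManagementClient.CreateServiceAsync(new StatelessServiceDescription
        {
            ApplicationName = new Uri("fabric:/MyApp"),          // hypothetical app
            ServiceName = new Uri("fabric:/MyApp/MemoryHungry"), // hypothetical service
            ServiceTypeName = "MemoryHungryType",                // hypothetical type
            InstanceCount = -1, // one instance on every eligible node
            PartitionSchemeDescription = new SingletonPartitionSchemeDescription(),

            // Keeps the service off the primary node type and on the
            // high-memory node type only.
            PlacementConstraints = "(NodeType == HighMem)"
        });
    }
}
```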

Service Fabric stateful service - scaling without partitioning?

I am planning to migrate my existing monolithic RESTful Web API cloud service to Service Fabric in three steps.
The in-process memory cache has been heavily used in my cloud service.
Step 1) Migrate the cloud service to an SF stateful service with 1 replica and a single partition. The cache code stays as it is; no use of Reliable Collections.
Step 2) Scale the SF monolithic stateful service horizontally to 5 replicas and a single partition. The cache code is modified to use a Reliable Collection.
Step 3) Break the SF monolithic service down into microservices (stateless/stateful).
Is the above approach clean? Any recommendations? Any drawbacks?
More on Step 2) Horizontal scaling of SF stateful service
I am not planning to use an SF partitioning strategy, as I could not come up with a uniform data distribution for my application.
By adding more replicas and no partitioning to an SF stateful service, I am just making my service more reliable (availability). Is my understanding correct?
I will modify the cache code to use a Reliable Collection (a reliable dictionary). The same state data will be available in all replicas.
I understand that a GET can be executed on any replica, but updates/writes need to be executed on the primary replica?
How can I scale my SF stateful service without partitioning?
Can all of the replicas, including secondaries, listen to my client requests and respond the same way? GET should work, but how do PUT & POST calls work?
Should I prefer an external cache store (Redis) over a Reliable Collection at this step? Use a stateless service?
This document has a good overview of options for scaling a particular workload in Service Fabric and some examples of when you'd want to use each.
Option 2 (creating more service instances, dynamically or upfront) sounds like it would map to your workload pretty well. Whether you decide to use a custom stateful service as your cache or use an external store depends on a few things:
Whether you have the space in your main compute machines to store the cached data
Whether your service can get away with a simple cache or whether it needs more advanced features provided by other caching services
Whether your service needs the performance improvement of a cache on the same set of nodes as the web tier, or whether it can afford (in terms of latency) to call out to a remote service
Whether you can afford to pay for a caching service, or whether you want to make do with the memory, compute, and local storage you're already paying for with the VMs
Whether you really want to take on building and running your own cache
To answer some of your other questions:
Yes, adding more replicas increases availability/reliability, not scale. In fact it can have a negative impact on performance (for writes) since changes have to be written to more replicas.
The state data isn't guaranteed to be the same in all replicas, just a majority of them. Some secondaries can even be ahead, which is why reading from secondaries is discouraged.
So, to your next question: the recommendation is for all reads and writes to always be performed against the primary, so that you're seeing consistent, quorum-committed data.
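To ground that, here is a minimal sketch of a reliable-dictionary-backed cache inside a stateful service (the type and collection names are invented). Both the read and the write run on the primary, inside transactions, so you always observe quorum-committed data:

```csharp
using System.Fabric;
using System.Threading.Tasks;
using Microsoft.ServiceFabric.Data.Collections;
using Microsoft.ServiceFabric.Services.Runtime;

// Hypothetical stateful service acting as the cache from step 2.
class CacheService : StatefulService
{
    public CacheService(StatefulServiceContext context) : base(context) { }

    // The "PUT" path: the write is quorum-committed across the replica set.
    public async Task SetAsync(string key, string value)
    {
        var cache = await StateManager
            .GetOrAddAsync<IReliableDictionary<string, string>>("cache");

        using (var tx = StateManager.CreateTransaction())
        {
            await cache.SetAsync(tx, key, value);
            await tx.CommitAsync();
        }
    }

    // The "GET" path: also served by the primary, reading committed data.
    public async Task<string> GetAsync(string key)
    {
        var cache = await StateManager
            .GetOrAddAsync<IReliableDictionary<string, string>>("cache");

        using (var tx = StateManager.CreateTransaction())
        {
            var result = await cache.TryGetValueAsync(tx, key);
            return result.HasValue ? result.Value : null;
        }
    }
}
```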

How to store temporary data in an Azure multi-instance (scale set) virtual machine?

We developed a server service that (in a few words) supports the communication between two devices. We want to take advantage of the scalability offered by an Azure scale set (a multi-instance VM), but we are not sure how to share memory between the instances.
Our service basically stores temporary data on the local virtual machine, and these data are read, modified and sent to the devices connected to the server.
If these data are stored locally on one of the instances, the other instances cannot access them and do not have the same information. Is that correct?
If one of the devices starts making requests to the server, the instance that processes each request will not always be the same, so in the end the data are spread across instances.
So the question might be, how to share memory between Azure instances?
Thanks
Depending on the type of data you want to share and how much latency matters, as well as Service Fabric (low latency, but you need to re-architect/rebuild bits of your solution), you could look at a shared back-end repository: Redis Cache is ideal as a distributed cache; SQL Azure if you want a relational DB to store the data; storage queues/blob storage; or File storage in a storage account (which allows you to simply write to a mounted network drive from both VM instances). DocumentDB is another option, suited to storing JSON data.
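If you go the Redis route, the basic pattern looks like the sketch below (using the StackExchange.Redis client; the cache endpoint and key names are invented). Every VM instance reads and writes the same shared entry instead of its own local memory:

```csharp
using System;
using StackExchange.Redis;

class SharedStateDemo
{
    static void Main()
    {
        // Hypothetical Azure Redis Cache endpoint shared by all instances.
        var redis = ConnectionMultiplexer.Connect(
            "mycache.redis.cache.windows.net:6380,password=<secret>,ssl=True");
        IDatabase db = redis.GetDatabase();

        // Any instance can write the temporary data for a device...
        db.StringSet("device:42:session", "{\"state\":\"connected\"}",
            expiry: TimeSpan.FromMinutes(30));

        // ...and any other instance can read it back.
        string session = db.StringGet("device:42:session");
        Console.WriteLine(session);
    }
}
```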
You could use Service Fabric and take advantage of Reliable Collections to have your state automagically replicated across all instances.
From https://azure.microsoft.com/en-us/documentation/articles/service-fabric-reliable-services-reliable-collections/:
The classes in the Microsoft.ServiceFabric.Data.Collections namespace provide a set of out-of-the-box collections that automatically make your state highly available. Developers need to program only to the Reliable Collection APIs and let Reliable Collections manage the replicated and local state.
The key difference between Reliable Collections and other high-availability technologies (such as Redis, Azure Table service, and Azure Queue service) is that the state is kept locally in the service instance while also being made highly available.
Reliable Collections can be thought of as the natural evolution of the System.Collections classes: a new set of collections that are designed for the cloud and multi-computer applications without increasing complexity for the developer. As such, Reliable Collections are:
Replicated: State changes are replicated for high availability.
Persisted: Data is persisted to disk for durability against large-scale outages (for example, a datacenter power outage).
Asynchronous: APIs are asynchronous to ensure that threads are not blocked when incurring IO.
Transactional: APIs utilize the abstraction of transactions so you can manage multiple Reliable Collections within a service easily.
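The "Transactional" point is the one that matters most for coordination: a single transaction can span several collections, so related changes commit (and replicate) together or not at all. A minimal sketch, with invented service and collection names:

```csharp
using System.Fabric;
using System.Threading.Tasks;
using Microsoft.ServiceFabric.Data.Collections;
using Microsoft.ServiceFabric.Services.Runtime;

class DeviceStateService : StatefulService // hypothetical service
{
    public DeviceStateService(StatefulServiceContext context) : base(context) { }

    public async Task RecordMessageAsync(string deviceId, string payload)
    {
        var state = await StateManager
            .GetOrAddAsync<IReliableDictionary<string, string>>("deviceState");
        var outbox = await StateManager
            .GetOrAddAsync<IReliableQueue<string>>("outbox");

        // Atomic across both collections: the dictionary update and the
        // queued work item are committed together.
        using (var tx = StateManager.CreateTransaction())
        {
            await state.SetAsync(tx, deviceId, payload);
            await outbox.EnqueueAsync(tx, payload);
            await tx.CommitAsync();
        }
    }
}
```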
Working with Reliable Collections -
https://azure.microsoft.com/en-us/documentation/articles/service-fabric-work-with-reliable-collections/

Storing a large amount of state in a Service Fabric cluster

I have a scenario where we need to store x*100 GBs of data. The data is in general a good candidate for persistent actor state (well-partitioned, used by the specific actors only) in the Service Fabric cluster itself.
Is the service fabric persistent state storage recommended for data of this scale? (Our compute load is going to be fairly low, so bumping up VMs just to store the state is not a desirable option.)
How does the amount of persistent state affect the latency of moving partitions between nodes in the cluster?
Well, let's look at how state is stored in a service (this applies to actors too).
The component that stores your data in your service is called a state provider. State providers can be in-memory only, or in-memory + local disk. The default state provider you get with an actor service is in-memory + local disk, but it only keeps hot data in memory, so your storage requirements are not memory-bound. Contrast that with the Reliable Collections state provider, which currently stores all data both in memory and on local disk, although in a future release it will also have an option to keep only hot data in memory and offload the rest to local disk.
Given that you are using actors, you can use the default actor state provider, which means your data capacity is limited by the local disk storage on your machines or VMs; that should be reasonable for storing 100s of GB. We generally don't move entire partitions around, but occasionally Service Fabric does need to rebuild a replica of your service, and the more data you have, the longer it will take to build that replica. However, this doesn't really affect the latency of your service, because a stateful service has multiple replicas and you usually have enough of them up that you don't need to wait for one to be rebuilt. Rebuilding a replica is usually something that happens "off to the side."
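For reference, persisted actor state with the default state provider looks roughly like this (the actor type and state name are invented); the state provider writes through to local disk on each replica, so only hot data stays in memory:

```csharp
using System.Threading.Tasks;
using Microsoft.ServiceFabric.Actors;
using Microsoft.ServiceFabric.Actors.Runtime;

public interface IDeviceActor : IActor
{
    Task SetReadingAsync(byte[] reading);
}

// Persisted: state is replicated and written to local disk, which is what
// makes disk (not memory) the capacity limit discussed above.
[StatePersistence(StatePersistence.Persisted)]
class DeviceActor : Actor, IDeviceActor
{
    public DeviceActor(ActorService actorService, ActorId actorId)
        : base(actorService, actorId) { }

    public Task SetReadingAsync(byte[] reading)
    {
        // Stored by the default actor state provider.
        return StateManager.SetStateAsync("latestReading", reading);
    }
}
```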
It's true that it's not economical to add VMs just for storing state, but keep in mind that you can pack as many services onto your VMs as you want. So even though your actor service isn't using much compute, you can always pack other services on those VMs to use up that compute so that you're maximizing both compute and storage on your VMs, which can in fact be very economical.
