We are planning to build a AKS cluster in HA cross regional using Azure Traffic manager. We intend to apply certain network policies on one region and same needs to be replicated on the other region.
How can we ensure replication across the multi region ?
Additionally how can we ensure Storage isolation in AKS region?
Any leads would be appreciated.
As far as I know kubernetes doesnt offer any sync engine. You can use flux or any other gitops tool to sync configs between clusters (or rather apply the same config to clusters).
Related
We are planning to deploy 15 different applications in azure kubernetes. These applications are owned by multiple business portfolios - eg: marketing, finance, legal.
Currently we have provisioned an AKS cluster with the following configuration
Subnet : x.x.x.x/21, that gives us ~2k IPs.
Network : Azure CNI
Network policy : calico
Min & Max nodes: Azure auto scaling
Min & Max Pods : Based on the CPU & Memory utilization, maximum allowed.
Azure services (eg: SQL) : Leverages service endpoints
On premises : Leverages private endpoint & private DNS.
Region : West US
Question :
Can we deploy all the applications on the same AKS cluster? If not, what is the industry standard & why?
The simple answer is yes. I wouldn't call it industry standard, because AFAIK there isn't one.
The more complicated answer is: It depends. It depends entirely on the security requirements you need to adhere to for the applications you're deploying.
There's a number of ways you can manage this in a single cluster, assuming you want to prevent access (Network / User) via RBAC and the CNI within Kubernetes. It does make things more complicated but arguably less cumbersome long term.
You could separate out the different apps through namespacing, node taints. There's a lot of different options that will be dependent on the requirements you need to fulfill for your clients / company.
Even if you have region specific requirements, you can manage this in a single environment.
I am looking a way to run aks or k8s cluster in dev/test labs but I couldn't find an official way. I guess Azure has allow using production services in Dev/Test Lab however they haven't published yet a document to achieve this. I need rich memory VMs such as 128/256 gb though AKS doesn't support that vm on cluster. And AutoShutdown option will be cost saving for these VMs. So I have to build this in dev/test lab. Any suggestion would be helpful. Thanks!
AKS is a managed service and you can't run it on you own VMs or the ones from Dev/Test Labs. Why are you saying that you can't use 128/256GB RAM VMs? When selecting your VMs size in the portal, make sure to select the Memory Optimized family.
If I understand correctly, your goal is to save money running these high cost VMs. One possible way you can achieve this is create your cluster with a single instance of a smaller VM and create a a second node pool with the larger VMs. You can then create and destroy that second pool on demand.
I'm deciding if I should provide use vanilla kubernetes or use Azure Kubernetes Service for my CI build agents.
What control will I lose if used AKS; SSH inside cluster? turning on and off the VMS? How about the cost, I see that AKS use the VM pricing, is there something beyond that
There are several limitations which come to my mind, but neither of them should restrict your use case:
You lose control over master nodes (control plane). Shouldn't be an issue in your use case, and I hardly imagine where this may be a limitation. You still can SSH into worker nodes in AKS.
You lose fine-grained control over size of worker nodes. Node pools become an abstraction to control size of the VMs. In a self-managed cluster you can attach VMs of completely different size to the cluster. In AKS all the nodes in the same pool must be of the same size (but you can create several node pools with different VM sizes).
It's not possible to choose node's OS in AKS (it's Ubuntu-based).
You're not flexible in choosing network plugins for k8s. It's either kubenet or Azure CNI. But that's fine as long as you're not using some weird applications which requre L2 networking, more info here
There are definitely benefits of AKS:
You're not managing control plane which is a real pain reliever.
AKS can scale its nodes dynamically, which may be a good option for bursty workloads like build agents, but also imposes additional delay during node scaling procedure.
Cluster (control and data planes) upgrades are just couple of clicks in azure portal.
Control plane is free in AKS (in contrast e.g. to EKS in Amazon), you pay only for the worker nodes, you can calculate your price here
We are working on an application that processes excel files and spits off output. Availability is not a big requirement.
Can we turn the VM sets off during night and turn them on again in the morning? Will this kind of setup work with service fabric? If so, is there a way to schedule it?
Thank you all for replying. I've got a chance to talk to a Microsoft Azure rep and documented the conversation in here for community sake.
Response for initial question
A Service Fabric cluster must maintain a minimum number of Primary node types in order for the system services to maintain a quorum and ensure health of the cluster. You can see more about the reliability level and instance count at https://azure.microsoft.com/en-gb/documentation/articles/service-fabric-cluster-capacity/. As such, stopping all of the VMs will cause the Service Fabric cluster to go into quorum loss. Frequently it is possible to bring the nodes back up and Service Fabric will automatically recover from this quorum loss, however this is not guaranteed and the cluster may never be able to recover.
However, if you do not need to save state in your cluster then it may be easier to just delete and recreate the entire cluster (the entire Azure resource group) every day. Creating a new cluster from scratch by deploying a new resource group generally takes less than a half hour, and this can be automated by using Powershell to deploy an ARM template. https://azure.microsoft.com/en-us/documentation/articles/service-fabric-cluster-creation-via-arm/ shows how to setup the ARM template and deploy using Powershell. You can additionally use a fixed domain name or static IP address so that clients don’t have to be reconfigured to connect to the cluster. If you have need to maintain other resources such as the storage account then you could also configure the ARM template to only delete the VM Scale Set and the SF Cluster resource while keeping the network, load balancer, storage accounts, etc.
Q)Is there a better way to stop/start the VMs rather than directly from the scale set?
If you want to stop the VMs in order to save cost, then starting/stopping the VMs directly from the scale set is the only option.
Q) Can we do a primary set with cheapest VMs we can find and add a secondary set with powerful VMs that we can turn on and off?
Yes, it is definitely possible to create two node types – a Primary that is small/cheap, and a ‘Worker’ that is a larger size – and set placement constraints on your application to only deploy to those larger size VMs. However, if your Service Fabric service is storing state then you will still run into a similar problem that once you lose quorum (below 3 replicas/nodes) of your worker VM then there is no guarantee that your SF service itself will come back with all of the state maintained. In this case your cluster itself would still be fine since the Primary nodes are running, but your service’s state may be in an unknown replication state.
I think you have a few options:
Instead of storing state within Service Fabric’s reliable collections, instead store your state externally into something like Azure Storage or SQL Azure. You can optionally use something like Redis cache or Service Fabric’s reliable collections in order to maintain a faster read-cache, just make sure all writes are persisted to an external store. This way you can freely delete and recreate your cluster at any time you want.
Use the Service Fabric backup/restore in order to maintain your state, and delete the entire resource group or cluster overnight and then recreate it and restore state in the morning. The backup/restore duration will depend entirely on how much data you are storing and where you export the backup.
Utilize something such as Azure Batch. Service Fabric is not really designed to be a temporary high capacity compute platform that can be started and stopped regularly, so if this is your goal you may want to look at an HPC platform such as Azure Batch which offers native capabilities to quickly burst up compute capacity.
No. You would have to delete the cluster and recreate the cluster and deploy the application in the morning.
Turning off the cluster is, as Todd said, not an option. However you can scale down the number of VM's in the cluster.
During the day you would run the number of VM's required. At night you can scale down to the minimum of 5. Check this page on how to scale VM sets: https://azure.microsoft.com/en-us/documentation/articles/service-fabric-cluster-scale-up-down/
For development purposes, you can create a Dev/Test Lab Service Fabric cluster which you can start and stop at will.
I have also been able to start and stop SF clusters on Azure by starting and stopping the VM scale sets associated with these clusters. But upon restart all your applications (and with them their state) are gone and must be redeployed.
I'm a little bit confused about when to use Azure Availability Set and when to use Azure Affinity Group.
Lets look at the key purpose of Availability set and Affinity Group briefly to begin with.
Availability Set: is predominately to provide High Availability for your deployment. Azure does this via Fault domains and Upgrade domains.
A fault domain: is basically a different hardware rack in the same datacenter. The solution will be deployed in two different hardware racks.
Upgrade domains: is exactly same like fault domains in function, but they support upgrades rather than failures. The Upgrade domain is a logical unit of instance separation that determines which instances in a particular service will be upgraded at a point in time.
Affinity Group: In order to explain it, we need to take peek into Azure DC . Windows Azure Data Centers are purpose build , you might see rows and rows of containers (something like shipping containers) that contain clusters and racks. Each of those Containers have specific services, for example, Compute and Storage, SQL Azure, Service Bus, Access Control Service, and so on. Those containers are spread across the data center.
When you deploy a service using Portal or PowerShell , the service will talk directly to RDFE (Red Dog Front End). The RDFE controls the DC and nodes. The Cluster of nodes is controlled by Fabric Controller.. When you specify Affinity Group , the Fabric controller will place all the required elements of a deployment together. This has number of advantages like reducing latency (since required elements are close together) , Networking.
There are new changes brought in related to Network Affinity group , you can refer them (https://azure.microsoft.com/en-us/documentation/articles/virtual-networks-migrate-to-regional-vnet/).
To address you question
You would use Availability set when you want to have Highly Available system and also want to have SLA for Compute. Without Availability set there wont be SLA for your VM or PaaS Instances in other words will single instances of VM (IaaS) and PaaS wont have SLA and prone to downtime during HW failure and Upgrades of OS.
Availability set can be implemented after the deployment as well. Do note there is cost associated with the Availability set , since you are running additional instances , so they will be charged.
Affinity group you need to include them at the time of Creation of the services . It cannot be updated after the creation. So it very important to include Affinity group at the time of creation. There is no additional charges for including Affinity group.
Do share your feedback if the response addresses your question.