AKS randomly change deployments and pods - azure

I am investigating a robust way to scan my Azure AKS clusters and randomly change the number of pods, allocated resources and throttling, and, if possible, limit connections to other resources (e.g. database, queues, cache).
The idea is to have this running against any environment (test, QA, live)
Log what changes were made and when
Email that the script has run
Return environment to desired state
My questions are:
Is there tooling for this already?
Is this possible via cron or Azure Pipelines?
This is part of my stress-testing work cycle, which includes API integration and load testing, to help find weaknesses and feed back ways we can improve our offering and our team's reputation.
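To make the requirements above concrete (random changes, a timestamped log of what changed, then a return to the desired state), here is a minimal sketch. It is an assumption-laden simulation: the cluster is modelled as an in-memory dict, and the deployment names and values are hypothetical. A real tool would read and patch deployments through the Kubernetes API (or use Chaos Studio / Chaos Mesh, as suggested in the answer below) and would restore the desired state from source control.

```python
import copy
import random
from datetime import datetime, timezone

# Hypothetical in-memory model of the cluster: deployment name -> settings.
# A real tool would read and write these through the Kubernetes API instead.
DESIRED_STATE = {
    "orders-api": {"replicas": 3, "cpu_limit_millicores": 500},
    "payments-api": {"replicas": 2, "cpu_limit_millicores": 250},
}

def plan_perturbations(desired, rng, max_changes=2):
    """Pick random deployments and random new values for replicas or CPU limits."""
    changes = []
    names = rng.sample(sorted(desired), k=min(max_changes, len(desired)))
    for name in names:
        field = rng.choice(["replicas", "cpu_limit_millicores"])
        old = desired[name][field]
        if field == "replicas":
            new = rng.randint(0, old)  # scale down, possibly to zero pods
        else:
            new = max(50, old // rng.randint(2, 4))  # throttle CPU, keep a floor
        changes.append({"deployment": name, "field": field, "old": old, "new": new})
    return changes

def run_experiment(desired, rng):
    """Apply random changes, log them with timestamps, then restore the desired state."""
    live = copy.deepcopy(desired)
    log = []
    for change in plan_perturbations(desired, rng):
        live[change["deployment"]][change["field"]] = change["new"]
        log.append({**change, "at": datetime.now(timezone.utc).isoformat()})
    # ... API integration and load tests would run against the perturbed `live` state here ...
    live = copy.deepcopy(desired)  # return the environment to its desired state
    return live, log
```

The returned log is what the "log what changed and when" and "email that the script has run" steps would consume; how the run is scheduled (cron, an Azure Pipelines schedule) is independent of this logic.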

Google "Kubernetes chaos engineering".
Look at Azure Chaos Studio: https://azure.microsoft.com/en-us/products/chaos-studio/#overview
Create a chaos experiment that uses a Chaos Mesh fault to kill AKS pods with the Azure portal: https://learn.microsoft.com/en-us/azure/chaos-studio/chaos-studio-tutorial-aks-portal

Related

Multi regional Azure Container Service DC/OS clusters

I'm experimenting a little with ACS using the DC/OS orchestrator, and while spinning up a cluster within a single region seems simple enough, I'm not quite sure what the best practice would be for doing deployments across multiple regions.
Azure itself does not seem to support deploying to more than one region right now. With that assumption, I guess my only other option is to create multiple, identical clusters in all the regions I wish to be available, and then use Azure Traffic Manager to route incoming traffic to the nearest available cluster.
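The routing idea (send each client to the nearest cluster that is currently up) can be sketched as below. This is a simplified illustration, not Traffic Manager's actual implementation; it only roughly corresponds to its Performance routing method combined with endpoint health probes. The region names and latency figures are hypothetical.

```python
def pick_endpoint(endpoints):
    """endpoints: region -> (healthy, latency_ms from the client).
    Return the healthy region with the lowest latency, or None if every region is down."""
    healthy = {region: latency for region, (ok, latency) in endpoints.items() if ok}
    if not healthy:
        return None
    return min(healthy, key=healthy.get)
```

If the nearest cluster fails its health check, traffic simply falls through to the next-nearest healthy one.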
While this solution works, it also causes a few issues I'm not 100% sure how to work around.
Our deployment pipelines must make sure to deploy to all regions when deploying a new version of a service. If we have an East US and a North Europe region, then during deployments from our CI tool I have to connect to the Marathon API in both regions to trigger the new deployments. If the deployment fails in one region and succeeds in the other, I suddenly have a disparity between the two regions.
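One way to keep the regions from drifting apart is to treat a multi-region deployment as all-or-nothing: if any region fails, roll the regions that already succeeded back to the previous version. A minimal sketch, assuming `deploy` is a hypothetical callback that would wrap the Marathon API call for one region and return True on success, and assuming that a failed deploy leaves the old version running and that rollbacks succeed.

```python
from typing import Callable, Dict, List

def deploy_all_regions(
    regions: List[str],
    new_version: str,
    current: Dict[str, str],
    deploy: Callable[[str, str], bool],
) -> Dict[str, str]:
    """Deploy new_version to every region in order. If any region fails,
    roll the already-updated regions back so that all regions match again.
    Returns the resulting region -> version mapping."""
    updated = []
    for region in regions:
        if deploy(region, new_version):
            updated.append(region)
        else:
            # Assumption: the failed region still runs current[region] and rollbacks succeed.
            for done in updated:
                deploy(done, current[done])
            return dict(current)
    return {region: new_version for region in regions}
```

A failed run then leaves every region on the old version instead of leaving a mix.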
If I have a service using local persistent volumes deployed, let's say PostgreSQL or Elasticsearch, it needs to have instances in both regions, since service discovery will only find services local to the region. That brings up the problem of replication between regions to keep all state in all regions; this seems to require some (or a lot of) manual configuration to get working.
Has anyone ever used a setup somewhat like this using Azure Container Service (or really Amazon Container Service, as I assume the same challenges can be found there) and have some pointers on how to approach this?
You have multiple options for spinning up across regions. I would use a custom installation together with Terraform for each of them. This is a great starting point: https://github.com/bernadinm/terraform-dcos
Distributing agents across different regions should be no problem, ensuring that your services will keep running despite failures.
Distributing masters (which gives you control over the services during failures) is a little more difficult, as it involves distributing a ZooKeeper quorum across high-latency links, so you should be careful in choosing the "distance" between regions.
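The reason region choice matters for masters is quorum arithmetic: a ZooKeeper ensemble only makes progress while a strict majority of its servers is reachable. With masters in only two regions, one region must hold the majority, so losing that region loses the quorum. A small sketch to check a placement (region names are hypothetical):

```python
def quorum_size(ensemble_size: int) -> int:
    """A ZooKeeper ensemble needs a strict majority of its servers."""
    return ensemble_size // 2 + 1

def survives_any_region_loss(masters_by_region: dict) -> bool:
    """True if losing any single region still leaves a majority of masters."""
    total = sum(masters_by_region.values())
    needed = quorum_size(total)
    return all(total - count >= needed for count in masters_by_region.values())
```

For example, three masters split 2 + 1 across two regions do not survive losing the larger region, while one master in each of three regions does.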
Have a look at the documentation for more details.
You are correct: ACS does not currently support multi-region deployments.
Your first issue is specific to Marathon in DC/OS, I'll ping some of the engineering folks over there to see if they have any input on best practice.
Your second point is something we (I'm the ACS PM) are looking at. There are some solutions you can use in certain scenarios (e.g. ArangoDB is in the DC/OS universe and will provide replication). The DC/OS team may have something to say here too. In ACS we are evaluating the best approaches to providing solutions for this use case but I'm afraid I can't give any indication of timeline.
An alternative solution is to have your database in a SaaS offering. This takes away all the complexity of managing redundancy and replication.

Service Fabric Azure test environment

We have a number of Service Fabric clusters provisioned in Azure, for dev and testing. I would like to find a way to 'pause' these over night to save paying for them when they're not being used.
This seems to be what Azure DevTest Labs is for, but as far as I can see it doesn't support Service Fabric clusters.
I'm thinking of writing a script to completely tear these environments down at night and rebuild them in the morning, but before doing that I'm wondering if there are any better ways.
Service Fabric clusters cannot be safely "paused". If you shut down all VMs, there is a chance that the cluster's state - the applications and their data - will be lost.
If you don't mind starting with a fresh set of clusters every morning, it's pretty straightforward to automate. You can define your environments using ARM templates and write a short script to provision, then create another script to delete the resource groups at the end of the day, which will remove the VMs and all associated resources.
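The two scripts described above might look roughly like this sketch, which builds Azure CLI commands (`az group create`, `az deployment group create`, `az group delete`) and by default only prints them. The resource group name, location and template file are hypothetical; executing for real assumes the Azure CLI is installed and logged in, and the scheduling itself (cron, Azure Automation, a pipeline schedule) is left out.

```python
import subprocess
from typing import List

def morning_commands(rg: str, location: str, template: str) -> List[List[str]]:
    """Recreate the environment from its ARM template."""
    return [
        ["az", "group", "create", "--name", rg, "--location", location],
        ["az", "deployment", "group", "create",
         "--resource-group", rg, "--template-file", template],
    ]

def evening_commands(rg: str) -> List[List[str]]:
    """Delete the whole resource group, which removes the VMs and all associated resources."""
    return [["az", "group", "delete", "--name", rg, "--yes", "--no-wait"]]

def run(commands: List[List[str]], dry_run: bool = True) -> List[str]:
    """Print (dry run) or execute each command; return the commands as strings."""
    rendered = [" ".join(cmd) for cmd in commands]
    for cmd, text in zip(commands, rendered):
        if dry_run:
            print(text)
        else:
            subprocess.run(cmd, check=True)
    return rendered
```

Keeping one resource group per environment is what makes the evening step a single delete.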

How to turn on/off Azure web apps during office hours [duplicate]

I thought one of the advantages of Azure was that I could turn services on and off depending on when I want them to be available.
However, I can't see how to pause my App Service Plan.
Is it possible?
I want to use the S1 tier so that I can play with what it offers. However I want to be able to pause the cost accumulation when I am not using it.
I see from the App Service pricing help that an app is still billed even when it is in the stopped state.
Yet the link also clearly states that I only pay for what I use. So how does that work?
If you put your hosting plan onto the free tier, you will stop being charged for it. However if you have things like deployment slots and certificates these will be deleted.
The ability to turn services on and off has more to do with being able to scale services, so if you need 50 servers for an hour you can easily do that.
What you can do to make your solution temporary is to create a deployment script using PowerShell or Resource Manager templates. Then you can deploy your solution for exactly as long as you need it and delete it again when you don't. In this sense you can turn your services on and off at a whim.
Azure provides building blocks for you to create the solution you need, it is up to you to figure out how to best use those building blocks to create the solution you seek.
Edited to answer extended question.
If you want to use the S1 pricing plan, and not have it charge when you are not using it, the only way of achieving that is by using automation. Fortunately, this is reasonably trivial to achieve.
If you look at this template, it is pretty much all configured to deploy a website from GitHub to Azure on demand. If you edit it to fit your needs, you can have a new Azure website online within 2 minutes of running the script.
Then you would have another script that deleted it once you had finished.
Doing it this way you would lose no functionality, and you would probably learn quite a bit about what is possible with Azure along the way.
App Service Plan
An App Service plan is the hardware that a web app runs on. In the Free and Shared tiers your web apps share an instance with other web apps. In the other tiers you have a dedicated virtual machine, and it is this virtual machine that you pay for. In that case it is irrelevant whether or not you have web apps running on your App Service plan: you still have a virtual machine running, and you will be charged for it.
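Because a dedicated tier bills for the instance's time rather than for running apps, the saving from scaling down outside office hours is simple arithmetic. This sketch counts billed instance-hours per week; no real prices are assumed (multiply by your own hourly rate), and it assumes, as the earlier answer states, that the Free tier is not charged.

```python
def weekly_instance_hours(instances: int = 1, hours_per_day: float = 24, days_per_week: int = 7) -> float:
    """Instance-hours billed per week while a dedicated-tier plan is provisioned.
    Stopping the apps on the plan does not change this number."""
    return instances * hours_per_day * days_per_week

def savings_fraction(hours_per_day: float, days_per_week: int) -> float:
    """Share of the always-on bill saved by keeping the plan on a dedicated tier only in office hours."""
    always_on = weekly_instance_hours()
    office_only = weekly_instance_hours(1, hours_per_day, days_per_week)
    return 1 - office_only / always_on
```

For instance, 10 hours a day, 5 days a week is 50 of the week's 168 hours, so roughly 70% of the instance-hours are avoided.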
To change the App Service Plan via PowerShell, you can run the following command
Set-AzureRmAppServicePlan -ResourceGroupName $rg -Name $AppServicePlan -Tier Free
I was able to accomplish this using the dashboard by selecting the App Service Plan, clicking Scale up (App Service Plan), and then from there if you click Dev/Test you can select the Free tier.
As others have mentioned, you need to script this. Fortunately, I created a repository with one-click deployment to your Azure resources.
https://github.com/jraps20/jrap-AzureVerticalScaling
The steps are intended to be as simple and generic as possible:
Execute the one-click deployment from the repo readme
Select the subscription, resource group etc.
Deploy resource to Azure
Set up your schedule to scale up and scale down as-needed
The scripting relies on runbooks and variables to maintain the previous state of each App Service Plan and the App Services within those plans. Some App Services cannot be scaled down because of specific settings in use (AlwaysOn, Use32BitWorkerProcess, ClientCertEnabled, etc.). In those cases, the previous values are stored as variables before scaling down, and the original values are reapplied when the services are scaled back up.
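The save-and-restore idea can be sketched independently of the runbooks. This is not the repository's code: a plain dict stands in for Automation variables, and the values each setting is set to before downscaling are assumptions for illustration (for example, that the lower tier needs AlwaysOn off and a 32-bit worker process).

```python
# Hypothetical mapping: setting -> value the lower tier is assumed to require.
DOWNSCALE_VALUES = {"AlwaysOn": False, "Use32BitWorkerProcess": True, "ClientCertEnabled": False}

def scale_down(app: dict, saved: dict) -> dict:
    """Record the app's current values for the blocking settings, then override them."""
    saved[app["name"]] = {key: app[key] for key in DOWNSCALE_VALUES if key in app}
    return {**app, **{key: DOWNSCALE_VALUES[key] for key in saved[app["name"]]}}

def scale_up(app: dict, saved: dict) -> dict:
    """Reapply the values recorded before the scale-down and forget them."""
    return {**app, **saved.pop(app["name"], {})}
```

Only settings the app actually has are recorded, so the scale-up restores exactly what was there before.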
For more clarity, I have written a blog post that goes into detail. The post pertains to Sitecore but applies to any App Service setup: Drastically Reduce Azure PaaS Hosting Costs in Non-Prod Environments With Scheduled Vertical Scaling. It also includes a brief video tutorial showing its use case.
Others and I have been using this repository/approach for well over a year, and it works great. I mostly use it for POCs to reduce costs when I'm not actively working on something. However, its main intention is to target non-prod environments to save costs during non-work hours.
Azure App Service Plan is just a logical concept: a set of features and capacity that you can share across multiple apps. I don't think you can "pause" a plan; instead you can pause your service, and depending on the billing model of each service, you might or might not be charged.
Pausing = Delete or lower tier.
Scripting is the key.
Design Diagram
Use scripts to create (also consider shared resources)
Delete using scripts
Use scripts to recreate.
e.g. if we use resource groups properly, one per environment, then
Export-AzureRmResourceGroup will create a template for us (everything in the resource group is exported as a template), so we can delete the group and recreate it at any time.
To pause a VM and stop billing for it, you need to shut it down and deallocate it. Just shutting it down still keeps the capacity reserved, as if it's running.
Storage can't be shut down; it can be moved to lower-cost tiers.
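For reference, the Azure CLI has counterparts for the export and deallocate steps above: `az group export` and `az vm deallocate`. This sketch only builds the command lines; the resource group and VM names are hypothetical.

```python
from typing import List

def export_template_cmd(rg: str) -> List[str]:
    """Print the ARM template for everything in the resource group (Azure CLI counterpart of Export-AzureRmResourceGroup)."""
    return ["az", "group", "export", "--name", rg]

def pause_vm_cmd(rg: str, vm: str) -> List[str]:
    """Deallocate rather than just stop: compute capacity is released and compute
    charges stop, but managed disks (storage) are still billed."""
    return ["az", "vm", "deallocate", "--resource-group", rg, "--name", vm]
```

`az vm stop` on its own corresponds to "just shutting it down": the VM is powered off but still allocated.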

Azure Service Plans and non-production slots

I'm looking for best practice when it comes to Azure service plans in a microservice architecture. We have a series of microservices, each completely independent of the others in terms of capacity, resources, developers and overall architecture. It goes without saying that if one service experiences issues, the others should not be affected unless they interact with the problematic service. Those services are hosted in Azure.
My question is about service plans and how they should relate to dev / staging environments. Up until now we would create a service plan for each microservice, call it PersonService. So we would create the PersonService service plan; the default slot would be production (person-service), and we would have another staging slot (person-service-staging) to cater for staging / testing needs. All of these would be served under the same service plan.
A terrible thought came to me today that if a dev deploys some horrible bug in staging that eats up all the CPU and / or mem then the production slot would starve from those resources and essentially the staging environment would be affecting the response times of production.
Am I right to think this would be the case? How do you guys recommend to set this up to avoid this issue? Thanks
Yes you are correct, if person-service-staging starts to consume a significant proportion of the underlying server's resources, it will affect person-service.
Avoiding it very much depends on your current setup and your priorities. Adding a separate dev / test / staging service plan is by far the easiest approach. This leaves your production service plan solely for production-ready code, with deployment slots there simply to allow easy switching between versions (and quick rollback if you realise something in production is broken).
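To find plans where this contention can happen, a simple audit over an inventory of slots is enough. A minimal sketch, assuming you have exported (plan, slot, environment) triples from somewhere; the inventory format and the names in it are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

def shared_plans(slots: List[Tuple[str, str, str]]) -> List[str]:
    """slots: (plan, slot_name, environment) triples. Return the plans that host both
    production and non-production slots, which therefore compete for the same CPU and memory."""
    kinds: Dict[str, Set[bool]] = defaultdict(set)
    for plan, _slot, env in slots:
        kinds[plan].add(env == "production")
    return sorted(plan for plan, seen in kinds.items() if seen == {True, False})
```

With the PersonService setup from the question, PersonService would be flagged until the staging slot moves to its own plan.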
The alternative to this is having a service plan that is solely dedicated to staging that you deploy as part of your testing pipeline. The speed with which Azure can stand up a service plan means that you can create and destroy them on the fly. This gives you the benefit of being able to performance test against your staging server, when it is running on an identical plan to your production code.
One of the major benefits of cloud computing is the ability to create disposable servers. It takes a deliberate effort to shift out of the old 'that's our staging server' philosophy. Even in a CI scenario (unless you're deploying code every 30 minutes!) it can be much cleaner to throw up some new servers to test against. Even if you don't have an automated test pipeline, it is only a matter of a couple of Azure Automation scripts connected to a button on a webpage (though it is surprising how quickly those couple of scripts multiply into something much more elegant / complicated).
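The key property of a disposable staging plan is that it is always torn down, even when the tests fail. A minimal sketch of that pattern as a Python context manager; `create` and `destroy` are hypothetical callbacks that would wrap whatever provisioning scripts you use.

```python
from contextlib import contextmanager

@contextmanager
def disposable_plan(name, create, destroy):
    """Create a throwaway service plan for one test run and always tear it down,
    even if the tests raise an exception."""
    create(name)
    try:
        yield name
    finally:
        destroy(name)
```

A pipeline step would wrap its performance tests in `with disposable_plan(...)`, so a failing run cannot leave an orphaned plan behind.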

Alternative to running a Windows service in the Azure cloud

We currently have a Windows service that sends notification emails to users after doing some processing on a database (SQL Database). It runs once a day.
We want to move this to the Azure cloud. One alternative is to put it on an Azure VM as is, but I am looking for a better solution.
I have read about recurring and on-demand WebJobs, but I am not sure whether this is the best solution.
Also, is there any way to update the service's configuration in App.config without redeploying the service code to the cloud? I mean, can we manage the configuration from the Azure portal?
Thanks in advance.
Update 11/4/2016
Since this was written, there are 2 additional features available in Azure that are both excellent choices depending on what functionality you need:
Azure Functions (which was based on the WebJobs described below): serverless code that can be triggered/invoked in various ways, and has scaling support.
Azure Service Fabric: Microservice platform, with support for actor model, stateful and stateless services.
You've got 3 basic options:
Windows service running on VM
WebJob
Cloud service
There's a lot of information out there on the tradeoffs between these choices, but here's a brief summary.
VM - Advantages: you can move your service basically as it is, without having to change much or any of your code. VMs also have the easiest connectivity with other resources in Azure (blob storage, virtual networks, etc.). The disadvantage is that you're giving up all of the PaaS advantages and are still stuck managing your own VM infrastructure.
WebJob - Advantages: multiple invocation options (queues, blobs, manually, queue receive loops, continuous while-loop style, etc.) and scheduling (which would cover your case). Easy to deploy (can go with a website, as a console app, automatically through Kudu), and has some built-in logging in the Azure portal. And yes, to answer your question, you can alter the configuration in the portal itself for connection strings and app settings.
Disadvantages: you'll need to update code, you don't have access to the underlying resources (if you need that), and, more something to keep in mind than a disadvantage, it uses the same resources as the web app it's deployed with.
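On the configuration question: App Service exposes the app settings and connection strings configured in the portal to the app's processes as environment variables, with connection strings prefixed by their type (for example `SQLAZURECONNSTR_` or `CUSTOMCONNSTR_`). A minimal sketch of reading them, so values can be changed in the portal without redeploying; the setting names are hypothetical, and a .NET WebJob would typically go through its own configuration APIs instead.

```python
import os

# Prefixes App Service uses for portal connection strings, depending on the type chosen.
CONNSTR_PREFIXES = ("SQLAZURECONNSTR_", "SQLCONNSTR_", "CUSTOMCONNSTR_")

def get_setting(name, default=None):
    """Read an app setting configured in the portal (exposed as an environment variable)."""
    return os.environ.get(name, default)

def get_connection_string(name):
    """Return the first connection string found under any known type prefix, else None."""
    for prefix in CONNSTR_PREFIXES:
        value = os.environ.get(prefix + name)
        if value is not None:
            return value
    return None
```

Changing a value in the portal restarts the app, and the job picks up the new value on its next run.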
Web Jobs are the newest of the options, but at the same time appear to have active development going on to increase the functionality and usefulness.
Cloud Service - like a managed VM, has some deployment options, access to underlying VM if needed. Would require some code changes from your existing service.
There's nothing you've mentioned in your use case that makes me think a Web Job shouldn't be first thing you try.
(Edit: Troy Hunt has a great and relatively recent blog post illustrating most of the points I've mentioned about Web Jobs above: http://www.troyhunt.com/2015/01/azure-webjobs-are-awesome-and-you.html)

Resources