Creating pv backups on AKS managed disks (dynamic) using velero - azure

I'm currently trying out Azure AKS, and during setup I obviously also want to make backups. For this, the best practice seems to be Velero. According to the Velero documentation, to include PV snapshots you annotate the pod/deployment. Example:
backup.velero.io/backup-volumes: wp-pv
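Concretely, the annotation sits on the Deployment's pod template, something like this (the Deployment and PVC names here are just illustrative):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: wordpress                  # illustrative name
spec:
  selector:
    matchLabels:
      app: wordpress
  template:
    metadata:
      labels:
        app: wordpress
      annotations:
        backup.velero.io/backup-volumes: wp-pv   # name of the volume in the pod spec below
    spec:
      containers:
        - name: wordpress
          image: wordpress
          volumeMounts:
            - name: wp-pv
              mountPath: /var/www/html
      volumes:
        - name: wp-pv
          persistentVolumeClaim:
            claimName: wp-pvc                    # illustrative PVC name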
Note that the above is for a statically provisioned managed disk. I can see that the snapshot is created. However, when I do a restore, a new PV is created instead of the one from the snapshot being reused. Is this expected behavior?
Ideally, I would like to use dynamically provisioned PVs instead, but that makes it even less straightforward, because I don't know beforehand what name the PV will get and therefore can't add the proper annotation in advance.
How can I solve this in a clean way? My ideal situation would be scheduled backups via Velero and, in case of a recovery, having it automatically use the snapshot as the base for the PV instead of creating a new one that doesn't contain my data. For now this seems to be a manual procedure. Am I missing something?

This is by design.
PersistentVolumes, by definition, can only ever belong to one PVC claimant, even when provisioned dynamically.
I think what you want is to have the reclaim policy set to Retain. See here:
https://kubernetes.io/docs/concepts/storage/persistent-volumes/#retain
A reclaim policy of "Retain" means that the PV's data persists; it just needs to be reclaimed by a new PV/PVC. AKS should pick up on this, but I've only ever done this with AWS/bare metal.
In this case Velero, rightly, has to recreate both the PVC and the PV so that the volume can be released and reassigned to the new claimant.
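As a rough sketch of what that looks like on AKS (the StorageClass name and disk SKU are placeholders, and the provisioner assumes the current managed disk CSI driver):
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: managed-retain               # placeholder name
provisioner: disk.csi.azure.com      # AKS managed disk CSI driver (older clusters use kubernetes.io/azure-disk)
reclaimPolicy: Retain                # keep the underlying disk when the PVC is deleted
parameters:
  skuName: StandardSSD_LRS
For an already-bound PV you can patch the policy in place:
kubectl patch pv <pv-name> -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'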

Related

Custom Node Configuration in AKS using Terraform

I wanted to create a node pool with swap memory enabled in AKS. I have gone through the Terraform documentation, and swap_file_size_mb and vm_swappiness seem to be the only things related to swap. My questions are:
Is there any way to set the flag --fail-swap-on to false (or is it automatically set to false when swap_file_size_mb is set)?
And is there any way to change MemorySwap.SwapBehavior to "UnlimitedSwap"?
Are these things possible in AKS, or am I missing something? I want a node that has swap memory and uses it for workloads, created through Terraform. Any suggestion is appreciated. Thanks.
There is a kubelet_config block in the AKS schema that allows setting failSwapOn: https://learn.microsoft.com/en-us/azure/aks/custom-node-configuration#virtual-memory
However, that setting is not exposed in Terraform; I think it is hardcoded to false: https://github.com/hashicorp/terraform-provider-azurerm/blob/ddd6a9e2ef99f2e859b567badbed1aa829261caa/internal/services/containers/kubernetes_cluster_node_pool_resource.go#L1020
MemorySwap - I don't think so; at least I don't see it in the docs: https://learn.microsoft.com/en-us/azure/aks/custom-node-configuration#virtual-memory
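What you can configure from Terraform today is the swap file itself and the swappiness, via linux_os_config on the node pool. A minimal sketch (the pool name, sizes and cluster reference are placeholders; verify the arguments against your provider version):
resource "azurerm_kubernetes_cluster_node_pool" "swap" {
  name                  = "swappool"                              # placeholder pool name
  kubernetes_cluster_id = azurerm_kubernetes_cluster.example.id   # placeholder cluster reference
  vm_size               = "Standard_D4s_v3"
  node_count            = 1

  linux_os_config {
    swap_file_size_mb = 2048       # creates a swap file on each node

    sysctl_config {
      vm_swappiness = 60           # kernel swappiness value
    }
  }
}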

Terraform multi-stage resource initialization with temporary resources

I use terraform to initialize some OpenStack cloud resources.
I have a scenario where I need to initialize/prepare a volume disk using a temporary compute resource. Once the volume is fully initialized, I no longer need the temporary compute resource, but I need to attach the volume to another compute resource (different network configuration and other settings make reuse of the first one impossible). As you might have guessed, I cannot reach the expected long-term goal directly without the intermediary step.
I know I could drive a state machine or some sort of processing queue from outside Terraform to achieve this, but I wonder if it is possible to do it nicely in one single run of Terraform.
The best I could think of is that a main Terraform script would trigger creation/destruction of the intermediate compute resource by launching another Terraform run responsible just for the intermediate resources (using terraform apply followed by terraform destroy). However, this requires extra care, such as ensuring a unique folder to deal with concurrent "main" resource initialization, and it makes the whole thing a bit messy, I think.
I wonder if it is possible to do it nicely in one single run of Terraform.
Sadly, no. Any "solution" you could implement for that in a single Terraform run (e.g. running custom scripts through local-exec, etc.) will only be a convoluted mess and will lead to more issues than it solves in the long term.
The proper way, as you wrote, is to use a dedicated CI/CD pipeline for a multi-stage deployment. Alternatively, don't use Terraform at all and use another IaC tool.
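If you go the pipeline route, the stages can simply be separate root modules applied in order; a sketch of the pipeline steps (directory names are placeholders, and -chdir requires Terraform 0.14+):
# stage 1: temporary compute resource prepares the volume
terraform -chdir=stages/bootstrap init
terraform -chdir=stages/bootstrap apply -auto-approve
# stage 2: attach the prepared volume to the long-lived compute resource
terraform -chdir=stages/main init
terraform -chdir=stages/main apply -auto-approve
# stage 3: tear the temporary compute resource back down
terraform -chdir=stages/bootstrap destroy -auto-approve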

Backing up of Terraform statefile

I usually run all my Terraform scripts from a bastion server, and all my code, including the tf state file, resides on that same server. There was an incident where the machine accidentally went down (hard reboot) and the root filesystem got corrupted. Now my state file is gone, but my resources still exist and are running. I don't want to run terraform apply again and recreate the whole environment with downtime. What's the best way to recover from this mess, and what can be done so that it doesn't happen again in the future?
I have already taken a look at terraform refresh and terraform import, but are there any better ways to do this?
and all my code, including the tf state file, resides on that same server.
As you don't have a .backup file, I'm not sure you can recover the state file smoothly the Terraform way; do let me know if you find a way :). However, there are a few steps you can take to avoid ending up in this situation again.
The best practice is to keep all your state files in remote storage such as S3 or Blob Storage and configure your backend accordingly, so that each time you destroy or create a stack, Terraform always reads and writes the state file remotely.
On top of that, you can take advantage of Terraform workspaces to avoid a mess of state files in a multi-environment scenario. Also consider creating a plan for backtracking and versioning of previous deployments, for example:
terraform plan -var-file "" -out "" -target=module.<blue/green>
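A minimal example of such a remote backend block (the bucket, key and table names are placeholders; the azurerm backend works the same way for Blob Storage):
terraform {
  backend "s3" {
    bucket         = "my-terraform-state"      # remote bucket holding the state file
    key            = "prod/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"         # optional state locking
    encrypt        = true
  }
}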
what can be done so that it doesn't happen again in the future?
Terraform blue-green deployment is the answer to that part of your question. We have been running this model for quite a while and it works smoothly. The whole idea is modularity and reusability: the same templates serve five different components with different architectures, without any downtime (the core template stays the same; only the variable files differ).
We take advantage of Terraform modules. We have two modules, called blue and green (you can name them anything). At any given point in time either blue or green is taking traffic. When we have changes to deploy, we bring up the alternative stack based on the state output (targeting a module based on the Terraform state), automatically validate it, then move traffic to the new stack and destroy the old one.
Here is an article you can keep as a reference; it doesn't exactly reflect what we do, but it's a good starting point.
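A stripped-down sketch of that layout (module names, the shared source path and the variable are placeholders, not our actual templates):
module "blue" {
  source      = "./modules/stack"   # same core template for both stacks
  environment = "blue"
}

module "green" {
  source      = "./modules/stack"
  environment = "green"
}
The targeted plan from above then becomes, for example, terraform plan -var-file="green.tfvars" -out="green.plan" -target=module.green.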
Please see this blog post, which, unfortunately, suggests that import is the only solution.
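In practice the import flow for a single resource looks roughly like this (the resource address and ID are made-up examples):
# write a matching resource block in your configuration first, then:
terraform import aws_instance.web i-0abc123def4567890
terraform plan   # should now show little or no drift for the imported resource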
If you are still unable to recover the Terraform state, you can generate a blueprint of both the Terraform configuration and the state for specific AWS resources using terraforming, but it requires some manual effort to edit the state in order to manage the resources again. Once you have that state file, run terraform plan and compare its output with your infrastructure. It is good to have remote state, especially using an object store like AWS S3 or a key-value store like Consul, which supports locking the state when multiple transactions happen at the same time. The backup process is also quite simple.

How to mount a file and access it from application in a container kubernetes

I am looking for the best solution for a problem where, let's say, an application has to access a CSV file (say employee.csv) and perform operations on it such as getEmployee or updateEmployee.
Which volume type is best suited for this, and why?
Please note that employee.csv will already contain some pre-loaded data.
Also, to be precise, we are using azure-cli for handling Kubernetes.
Please help!
My first question would be: is your application meant to be scalable (i.e. have multiple instances running at the same time)? If so, you should choose a volume that can be written by multiple instances at the same time (ReadWriteMany, https://kubernetes.io/docs/concepts/storage/persistent-volumes/). As you are on Azure, the AzureFile volume type could fit your case. However, I am concerned that concurrent writers could conflict (and some data may be lost). My advice would be to use a database system instead, so you avoid that kind of situation.
If you only need one writer, then you could use pretty much any of them. However, if you use local volumes you could have problems when a pod gets rescheduled on another host (it would not be able to retrieve the data). Given your requirements (a simple CSV file), the reason I would pick one PersistentVolume provider over another is whichever is the least painful to set up. In that sense, as before, since you are on Azure you could simply use an AzureFile volume type, as it should be the most straightforward to configure in that cloud: https://learn.microsoft.com/en-us/azure/aks/azure-files
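A minimal sketch of that setup, assuming the built-in azurefile storage class on AKS (the claim, pod and image names are placeholders):
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: employee-data              # placeholder claim name
spec:
  accessModes:
    - ReadWriteMany                # allows multiple pods to mount the share
  storageClassName: azurefile      # built-in Azure Files class on AKS
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: employee-app               # placeholder pod name
spec:
  containers:
    - name: app
      image: myregistry/employee-app:latest    # placeholder image
      volumeMounts:
        - name: data
          mountPath: /data                     # employee.csv would live here
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: employee-data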

How do you rename a GCE persistent disk?

This should be easy but ...
I have been working on a Google Compute Engine persistent disk image that I'm calling utilserver, and I basically now need to build it again from scratch, but I might need the original one to try a few things in case problems come up. So I'd like to rename utilserver to utilserver-backup and then create a new utilserver that will hopefully end up being more correct. However, under the web console for my project there's only a "Delete" button, no "Rename" button. Neither does gcutil seem to have a rename command. I tried creating a snapshot of utilserver and then creating a new persistent disk called utilserver-backup from that snapshot, but the new disk looked like a completely new image; none of my prior installation work was on there. Any ideas?
You can create a snapshot of your disk and then create multiple disks from that snapshot. By creating the snapshot you will have a backup of your original disk. You can then delete the original disk and create a new one with the same name. See the following link for more details about snapshots: https://cloud.google.com/compute/docs/disks/create-snapshots
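For reference, the snapshot itself can be created like this (the snapshot name is just an example):
gcloud compute disks snapshot utilserver --snapshot-names=utilserver-backup-snap --zone=<zone-name>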
I personally have tried creating a new disk from a snapshot using the following command, and it created a new disk with all my data:
gcutil adddisk <disk-name> --project=<project-id> --source_snapshot=<snapshot-name>
gcutil has been deprecated in favor of gcloud compute.
gcloud compute disks create <new-disk-name> --source-snapshot <snapshot-name> --zone=<zone-name>
Example:
gcloud compute disks create production --source-snapshot production-backup-2023-01-23 --zone=asia-southeast1-b
