Capturing a kubectl set command in Terraform

We have a case where we need to update the AWS EKS CNI config on the aws-node daemonset, but the only documented way to do it is through a kubectl command. How do we update an existing daemonset with specific values through Terraform code? The requirement is that the solution has to be IaC. The equivalent kubectl command is
kubectl set env daemonset -n kube-system aws-node WARM_IP_TARGET=2,MINIMUM_IP_TARGET=12
The numeric values are planned to become Terraform variables.

What you are asking for doesn't exist. Here is the open Terraform GitHub issue tracking it:
https://github.com/hashicorp/terraform-provider-kubernetes/issues/723
Even if it did exist, I wouldn't consider it IaC, as it's not declarative (you might as well just run a bash script).
In my opinion, the real solution is for AWS to allow the provisioning of bare clusters so that "addons" can be managed completely through IaC tools. But that also does not exist:
https://github.com/aws/containers-roadmap/issues/923
The closest you're going to get will be to use a null_resource to execute the patch. Here's an example in that Github issue:
https://github.com/hashicorp/terraform-provider-kubernetes/issues/723#issuecomment-679423792
So your final result will look similar to this:
resource "null_resource" "patch_aws_cni" {
triggers = {
always_run = timestamp()
}
provisioner "local-exec" {
command = <<EOF
# do all those commands to get kubectl and auth info, then run:
kubectl set env daemonset -n kube-system aws-node WARM_IP_TARGET=2,MINIMUM_IP_TARGET=12
EOF
}
}
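Since the question plans to turn the two numbers into Terraform variables, a minimal sketch of the same null_resource with the values parameterised could look like this (the variable names warm_ip_target and minimum_ip_target are illustrative, not from the original answer):
variable "warm_ip_target" {
  default = 2
}

variable "minimum_ip_target" {
  default = 12
}

resource "null_resource" "patch_aws_cni" {
  triggers = {
    always_run = timestamp()
  }

  provisioner "local-exec" {
    command = <<EOF
# do all those commands to get kubectl and auth info, then run:
kubectl set env daemonset -n kube-system aws-node WARM_IP_TARGET=${var.warm_ip_target},MINIMUM_IP_TARGET=${var.minimum_ip_target}
EOF
  }
}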

Related

Kubeconfig in Azure

I have an Azure cloud where I created a Kubernetes cluster. I also have Jenkins running in my environment for the pipeline, and I need to build a container with a React front end in it. To give the pipeline access to the Kubernetes cluster in Azure, I run several kubectl commands with a kubeconfig. The lines below are from the Jenkins Groovy file:
sh "helm template $podPath -f $destPath --set namespace=$namespace > helm_chart_${env}.yaml" sh "kubectl config set-context jenkins-react#react --kubeconfig=/root/.kube/sa_new_kubeconfig" sh "kubectl delete -f helm_chart_${env}.yaml
--kubeconfig=/root/.kube/sa_new_kubeconfig || true" sh "sleep 10"
I would like to know whether there is an alternative way to use the kubeconfig rather than referencing it explicitly in the Jenkins Groovy code. If so, which approach is more convenient?
You can use the KUBECONFIG environment variable with the path to your Kubernetes config file.
Then, it depends on how you configure your Jenkins and your Jenkins pipeline, but you may:
Add this variable to your Jenkins agent configuration
Add this variable to your Jenkinsfile pipeline

How does Terraform local-exec work on Concourse?

I used a null_resource and passed an AWS CLI command to a local-exec provisioner to update a Step Function:
resource "null_resource" "enable_step_function_xray" {
triggers = {
state_machine_arn = xxxxxxx
}
provisioner "local-exec" {
command = "aws stepfunctions update-state-machine --state-machine-arn ${self.triggers.state_machine_arn} --tracing-configuration enabled=true"
}
}
This works fine when I test with local Terraform. My question is whether it will also work when I apply the Terraform on Concourse.
It depends entirely on whether the Concourse job is configured to use a container image that has the AWS CLI installed. If the AWS CLI is installed and on the path, the local-exec should succeed. If not, it will obviously fail.
My assumption is that on your local machine you've already set up the required credentials, so if you simply try it on Concourse CI it will fail with an authentication error.
To set it up in Concourse -
AWS Console - Create a new IAM user cicd with programmatic access only and the relevant permissions. For testing purposes you can use the AdministratorAccess policy, but make sure to scope it down to least privilege later on.
AWS Console - Create AWS security credentials (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY) for the cicd user (save them in a safe place)
Concourse CI - Create the secrets AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
Concourse CI - Add ((AWS_ACCESS_KEY_ID)) and ((AWS_SECRET_ACCESS_KEY)) environment variables to your Concourse CI task
I'm sure there are many tutorials about this subject, but the above steps will probably appear in most of them. Concourse CI should now be able to apply changes to your AWS account.
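If you would rather wire the credentials through Terraform itself instead of exporting them in the task, the local-exec provisioner accepts an environment map. Here is a rough sketch, assuming you pass the Concourse secrets in as Terraform variables (the variable names and var.state_machine_arn are illustrative, not from the question):
variable "aws_access_key_id" {}
variable "aws_secret_access_key" {}
variable "state_machine_arn" {}

resource "null_resource" "enable_step_function_xray" {
  triggers = {
    state_machine_arn = var.state_machine_arn
  }

  provisioner "local-exec" {
    # The AWS CLI picks these up from the provisioner's environment,
    # so nothing needs to be exported globally in the Concourse task.
    # Depending on your setup you may also need AWS_DEFAULT_REGION here.
    environment = {
      AWS_ACCESS_KEY_ID     = var.aws_access_key_id
      AWS_SECRET_ACCESS_KEY = var.aws_secret_access_key
    }

    command = "aws stepfunctions update-state-machine --state-machine-arn ${self.triggers.state_machine_arn} --tracing-configuration enabled=true"
  }
}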

Terraform - How to Create a GKE Cluster and Install Helm Charts?

Goal
I have a specific workflow to set up a fresh Kubernetes cluster on Google Cloud, and I want to automate the process with Terraform. These are the steps:
Create cluster
gcloud beta container --project "my-google-project" clusters create "cluster-name" --zone "europe-west3-b"
Setup Helm repos
helm repo add stable https://kubernetes-charts.storage.googleapis.com/
helm repo add jetstack https://charts.jetstack.io/
helm repo update
Install NGINX Ingress
kubectl create clusterrolebinding cluster-admin-binding --clusterrole cluster-admin --user $(gcloud config get-value account)
helm install nginx-ingress stable/nginx-ingress
Install Cert-Manager
kubectl apply --validate=false -f https://raw.githubusercontent.com/jetstack/cert-manager/v0.13.0/deploy/manifests/00-crds.yaml
kubectl create namespace cert-manager
helm install cert-manager jetstack/cert-manager --namespace cert-manager
Ideas
The first step will probably look like this:
resource "google_container_cluster" "primary" {
name = "cluster-name"
location = "europe-west3-b"
initial_node_count = 3
master_auth {
username = ""
password = ""
client_certificate_config {
issue_client_certificate = false
}
}
node_config {
oauth_scopes = [
"https://www.googleapis.com/auth/logging.write",
"https://www.googleapis.com/auth/monitoring",
]
metadata = {
disable-legacy-endpoints = "true"
}
}
}
But I have no idea how to approach steps 2 - 4.
While Terraform makes sense for building and provisioning the cloud infrastructure that something like Kubernetes runs on, it doesn't necessarily make sense for configuring that infrastructure after deployment. Most infrastructure designs would treat applications deployed onto a provisioned cluster as configuration of that cluster. The semantics here are a bit nuanced, but I maintain that a tool like Ansible is better suited to deploying applications to your cluster after provisioning.
So my advice would be to define a handful of Ansible Roles. Perhaps:
create_cluster
deploy_helm
install_nginx_ingress
install_cert_manager
Within each respective role, define the tasks and variables that are required to be used as per the Galaxy schema. Lastly, define a Playbook that Ansible uses to include or import these roles. This would allow you to provision your infrastructure and deploy all of the required applications to it in a single command:
ansible-playbook playbook.yml
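If you would rather keep steps 2-4 inside Terraform after all, the Helm provider can install charts against the cluster created above. This is only a rough sketch: the chart repositories are the ones from the question, the provider arguments assume a reasonably recent Helm provider version, and cert-manager's CRD manifest would still need to be applied separately (for example via a kubectl local-exec):
data "google_client_config" "default" {}

provider "helm" {
  kubernetes {
    host                   = "https://${google_container_cluster.primary.endpoint}"
    token                  = data.google_client_config.default.access_token
    cluster_ca_certificate = base64decode(google_container_cluster.primary.master_auth[0].cluster_ca_certificate)
  }
}

resource "helm_release" "nginx_ingress" {
  name       = "nginx-ingress"
  repository = "https://kubernetes-charts.storage.googleapis.com/"
  chart      = "nginx-ingress"
}

resource "helm_release" "cert_manager" {
  name             = "cert-manager"
  repository       = "https://charts.jetstack.io"
  chart            = "cert-manager"
  namespace        = "cert-manager"
  create_namespace = true # needs a recent helm provider version
}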

Terraform local-exec Provisioner to run on multiple Azure virtual machines

I had a working TF setup to spin up multiple Linux VMs in Azure. I was running a local-exec provisioner in a null_resource to execute an Ansible playbook. I was extracting the private IP addresses from the TF state file. The state file was stored locally.
I have recently configured Azure backend and now the state file is stored in a storage account.
I have modified the local provisioner and am trying to obtain all the private IP addresses to run the Ansible playbook against, as follows:
resource "null_resource" "Ansible4Ubuntu" {
provisioner "local-exec" {
command = "sleep 20;ansible-playbook -i '${element(azurerm_network_interface.unic.*.private_ip_address, count.index)}', vmlinux-playbook.yml"
I have also tried:
resource "null_resource" "Ansible4Ubuntu" {
provisioner "local-exec" {
command = "sleep 20;ansible-playbook -i '${azurerm_network_interface.unic.private_ip_address}', vmlinux-playbook.yml"
Both work fine for the first VM only and ignore the rest. I have also tried count.index+1 and self.private_ip_address, but no luck.
Actual result: TF provides the private IP of only the first VM to Ansible.
Expected result: TF to provide a list of all private IPs to Ansible so that it can run the playbook against all of them.
PS: I am also looking at using Terraform's remote_state data source, but it seems the state file contains IPs from previous builds as well, making it hard to extract the ones relevant to the current build.
I would appreciate any help.
Thanks
Asghar
As Matt said, the null_resource only runs once, so it works for the first VM and ignores the rest. You need to configure triggers on the null_resource with the NIC list so it re-runs when the NICs change, and pass all the private IPs to Ansible. Sample code like this:
resource "null_resource" "Ansible4Ubuntu" {
triggers = {
network_interface_ids = "${join(",", azurerm_network_interface.unic.*.id)}"
}
provisioner "local-exec" {
command = "sleep 20;ansible-playbook -i '${join(" ", azurerm_network_interface.unic.*.private_ip_address)}, vmlinux-playbook.yml"
}
}
You can adjust it as needed. For more information, see null_resource.

Capture Terraform provisioner output?

Use Case
Trying to provision a (Docker Swarm or Consul) cluster where initializing the cluster first occurs on one node, which generates some token, which then needs to be used by other nodes joining the cluster. Key thing being that nodes 1 and 2 shouldn't attempt to join the cluster until the join key has been generated by node 0.
Eg. on node 0, running docker swarm init ... will return a join token. Then on nodes 1 and 2, you'd need to pass that token to the join command, like docker swarm join --token ${JOIN_TOKEN} ${NODE_0_IP_ADDRESS}:${SOME_PORT}. And magic, you've got a neat little cluster...
Attempts So Far
Tried initializing all nodes with the AWS SDK installed, storing the join key from node 0 on S3, then fetching that join key on the other nodes. This is done via a null_resource with 'remote-exec' provisioners. Due to the way Terraform executes things in parallel, there are race conditions, and predictably nodes 1 and 2 frequently attempt to fetch a key from S3 that's not there yet (eg. node 0 hasn't finished its stuff yet).
Tried using the 'local-exec' provisioner to SSH into node 0 and capture its join key output. This hasn't worked well or I sucked at doing it.
I've read the docs. And stack overflow. And Github issues, like this really long outstanding one. Thoroughly. If this has been solved elsewhere though, links appreciated!
PS - this is directly related to and is a smaller subset of this question, but wanted to re-ask it in order to focus the scope of the problem.
You can redirect the outputs to a file:
resource "null_resource" "shell" {
provisioner "local-exec" {
command = "uptime 2>stderr >stdout; echo $? >exitstatus"
}
}
and then read the stdout, stderr and exitstatus files with local_file
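That read-back step might look roughly like this (a sketch; it assumes the files were written next to the configuration by the local-exec above and already exist when Terraform reads them):
data "local_file" "stdout" {
  filename = "${path.module}/stdout"
}

output "uptime_stdout" {
  value = "${chomp(data.local_file.stdout.content)}"
}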
The problem is that if the files disappear, then terraform apply will fail.
In Terraform 0.11 I worked around this by reading the files with an external data source and storing the results in null_resource triggers (!)
resource "null_resource" "contents" {
triggers = {
stdout = "${data.external.read.result["stdout"]}"
stderr = "${data.external.read.result["stderr"]}"
exitstatus = "${data.external.read.result["exitstatus"]}"
}
lifecycle {
ignore_changes = [
"triggers",
]
}
}
But in 0.12 this can be replaced with file()
and then finally I can use / output those with:
output "stdout" {
value = "${chomp(null_resource.contents.triggers["stdout"])}"
}
See the module https://github.com/matti/terraform-shell-resource for full implementation
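For completeness, the 0.12 file() variant mentioned above might look roughly like this (again a sketch, reading the same files produced by the local-exec):
output "stdout" {
  value = chomp(file("${path.module}/stdout"))
}

output "exitstatus" {
  value = chomp(file("${path.module}/exitstatus"))
}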
You can use an external data source:
data "external" "docker_token" {
program = ["/bin/bash", "-c" "echo \"{\\\"token\\\":\\\"$(docker swarm init...)\\\"}\""]
}
Then the token will be available as data.external.docker_token.result.token.
If you need to pass arguments in, you can use a script (e.g. relative to path.module). See https://www.terraform.io/docs/providers/external/data_source.html for details.
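A script-based variant might look like the sketch below. get-token.sh and aws_instance.manager are hypothetical; the script must read the query as a JSON object on stdin and print a JSON object (for example {"token": "..."}) on stdout:
data "external" "docker_token" {
  # get-token.sh is a placeholder script kept next to this configuration
  program = ["${path.module}/get-token.sh"]

  query = {
    # illustrative input; the script receives this as JSON on stdin
    manager_ip = aws_instance.manager.private_ip
  }
}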
When I asked myself the same question, "Can I use output from a provisioner to feed into another resource's variables?", I went to the source for answers.
At this moment in time, provisioner results are simply streamed to terraform's standard out and never captured.
Given that you are running remote provisioners on both nodes, and you are trying to access values from S3 - I agree with this approach by the way, I would do the same - what you probably need to do is handle the race condition in your script with a sleep command, or by scheduling a script to run later with the at or cron or similar scheduling systems.
In general, Terraform wants to access all variables either up front, or as the result of a provider. Provisioners are not necessarily treated as first-class in Terraform. I'm not on the core team so I can't say why, but my speculation is that it reduces complexity to ignore provisioner results beyond success or failure, since provisioners are just scripts so their results are generally unstructured.
If you need more enhanced capabilities for setting up your instances, I suggest a dedicated tool for that purpose like Ansible, Chef, Puppet, etc. Terraform's focus is really on Infrastructure, rather than software components.
A simpler solution would be to provide the token yourself.
When creating the ACL token, simply pass in the ID value and Consul will use that instead of generating one at random.
You could effectively run the docker swarm init step for node 0 as a Terraform External Data Source, and have it return JSON. Make the provisioning of the remaining nodes depend on this step and refer to the join token generated by the external data source.
https://www.terraform.io/docs/providers/external/data_source.html
With resource dependencies you can ensure that a resource is created before another.
Here's an incomplete example of how I create my consul cluster, just to give you an idea.
resource "aws_instance" "consul_1" {
user_data = <<EOF
#cloud-config
runcmd:
- 'docker pull consul:0.7.5'
- 'docker run -d -v /etc/localtime:/etc/localtime:ro -v $(pwd)/consul-data:/consul/data --restart=unless-stopped --net=host consul:0.7.5 agent -server -advertise=${self.private_ip} -bootstrap-expect=2 -datacenter=wordpress -log-level=info -data-dir=/consul/data'
EOF
}
resource "aws_instance" "consul_2" {
depends_on = ["aws_instance.consul_1"]
user_data = <<EOF
#cloud-config
runcmd:
- 'docker pull consul:0.7.5'
- 'docker run -d -v /etc/localtime:/etc/localtime:ro -v $(pwd)/consul-data:/consul/data --restart=unless-stopped --net=host consul:0.7.5 agent -server -advertise=${self.private_ip} -retry-join=${aws_instance.consul_1.private_ip} -datacenter=wordpress -log-level=info -data-dir=/consul/data'
EOF
}
As for the Docker Swarm setup, I think it's out of Terraform's scope, and rightly so, because the join token isn't an attribute of the infrastructure you are creating. So I agree with nbering: you could try to achieve that setup with a tool like Ansible or Chef.
But anyway, if the example helps you set up your Consul cluster, I think you just need to configure Consul as your Docker Swarm backend.
Sparrowform is a lightweight provisioner for Terraform-based infrastructure that can handle your case. Here is an example for AWS EC2 instances.
Assuming we have 3 EC2 instances for the Consul cluster (node0, node1 and node2), the first one (node0) is where we obtain the token and store it in an S3 bucket; the other two load the token from S3 later.
$ nano aws_instance.node0.sparrowfile
#!/usr/bin/env perl6
# have not checked this command, but that's the idea ...
bash "docker swarm init | aws s3 cp - s3://alexey-bucket/stream.txt"
$ nano aws_instance.node1.sparrowfile
#!/usr/bin/env perl6
my $i=0;
my $token;
try {
  while True {
    my $s3-token = run 'aws', 's3', 'cp', 's3://alexey-bucket/stream.txt', '-', :out;
    $token = $s3-token.out.lines[0];
    $s3-token.out.close;
    last if $i++ > 8 or $token;
    say "retry num $i ...";
    sleep 2*$i;
  }
  CATCH { { .resume } }
}
die "we have not succeeded in fetching the token" unless $token;
bash "docker swarm init $token";
$ nano aws_instance.node2.sparrowfile - the same setup as for node1
$ terraform apply # bootstrap infrastructure
$ sparrowform --ssh_private_key=~/.ssh/aws.pub --ssh_user=ec2-user # run provisioning on node0, node1, node2
PS disclosure, I am the tool author.
