The problem:
I'm trying to build a Docker Swarm cluster on DigitalOcean, consisting of 3 "manager" nodes and however many worker nodes. The number of worker nodes isn't particularly relevant for this question. I'm trying to modularize the Docker Swarm provisioning so it's not specifically coupled to the digitalocean provider, but instead receives a list of IP addresses to provision the cluster against.
In order to provision the master nodes, the first node needs to be put into swarm mode, which generates a join token that the other master nodes will use to join the first one. null_resources are being used to execute remote provisioners against the master nodes; however, I cannot figure out how to make sure the first master node finishes its work ("docker swarm init ...") before another null_resource provisioner executes against the other master nodes that need to join it. They all run in parallel and, predictably, it doesn't work.
Further, I'm trying to figure out how to collect the first node's generated join token and make it available to the other nodes. I've considered doing this with Consul, storing the join token as a key and reading that key on the other nodes, but that isn't ideal: there are still issues with ensuring the Consul cluster itself is provisioned and ready (so it's essentially the same problem).
main.tf
variable "master_count" { default = 3 }
# master nodes
resource "digitalocean_droplet" "master_nodes" {
count = "${var.master_count}"
... etc, etc
}
module "docker_master" {
source = "./docker/master"
private_ip = "${digitalocean_droplet.master_nodes.*.ipv4_address_private}"
public_ip = "${digitalocean_droplet.master_nodes.*.ipv4_address}"
instances = "${var.master_count}"
}
docker/master/main.tf
variable "instances" {}
variable "private_ip" { type = "list" }
variable "public_ip" { type = "list" }
# Act only on the first item in the list of masters...
resource "null_resource" "swarm_master" {
count = 1
# Just to ensure this gets run every time
triggers {
version = "${timestamp()}"
}
connection {
...
host = "${element(var.public_ip, 0)}"
}
provisioner "remote-exec" {
inline = [<<EOF
... install docker, then ...
docker swarm init --advertise-addr ${element(var.private_ip, 0)}
MANAGER_JOIN_TOKEN=$(docker swarm join-token manager -q)
# need to do something with the join token, like make it available
# as an attribute for interpolation in the next "null_resource" block
EOF
]
}
}
# Act on the other 2 swarm master nodes (*not* the first one)
resource "null_resource" "other_swarm_masters" {
count = "${var.instances - 1}"
triggers {
version = "${timestamp()}"
}
# Host key slices the 3-element IP list and excludes the first one
connection {
...
host = "${element(slice(var.public_ip, 1, length(var.public_ip)), count.index)}"
}
provisioner "remote-exec" {
inline = [<<EOF
SWARM_MASTER_JOIN_TOKEN=$(consul kv get docker/swarm/manager/join_token)
docker swarm join --token ??? ${element(var.private_ip, 0)}:2377
EOF
]
}
##### THIS IS THE MEAT OF THE QUESTION ###
# How do I make this "null_resource" block not run until the other one has
# completed and generated the swarm token output? depends_on doesn't
# seem to do it :(
}
From reading through GitHub issues, I get the feeling this isn't an uncommon problem... but it's kicking my ass. Any suggestions appreciated!
@victor-m's comment is correct. If you use null_resources and give each one a trigger that references a property of the previous one, they will execute in order.
resource "null_resource" "first" {
provisioner "local-exec" {
command = "echo 'first' > newfile"
}
}
resource "null_resource" "second" {
triggers = {
order = null_resource.first.id
}
provisioner "local-exec" {
command = "echo 'second' >> newfile"
}
}
resource "null_resource" "third" {
triggers = {
order = null_resource.second.id
}
provisioner "local-exec" {
command = "echo 'third' >> newfile"
}
}
$ terraform apply
null_resource.first: Creating...
null_resource.first: Provisioning with 'local-exec'...
null_resource.first (local-exec): Executing: ["/bin/sh" "-c" "echo 'first' > newfile"]
null_resource.first: Creation complete after 0s [id=3107778766090269290]
null_resource.second: Creating...
null_resource.second: Provisioning with 'local-exec'...
null_resource.second (local-exec): Executing: ["/bin/sh" "-c" "echo 'second' >> newfile"]
null_resource.second: Creation complete after 0s [id=3159896803213063900]
null_resource.third: Creating...
null_resource.third: Provisioning with 'local-exec'...
null_resource.third (local-exec): Executing: ["/bin/sh" "-c" "echo 'third' >> newfile"]
null_resource.third: Creation complete after 0s [id=6959717123480445161]
Apply complete! Resources: 3 added, 0 changed, 0 destroyed.
To verify, cat the new file; the lines appear in the expected order:
$ cat newfile
first
second
third
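Mapped back onto the original swarm question, a minimal sketch (0.12+ syntax, not the asker's exact code) chains the joining masters onto the first one through a trigger, so they only provision after docker swarm init has completed. The inline SSH fetch of the join token is an assumption: it presumes the droplets share an SSH key, and Consul, an external data source, or any other transport could fill that role instead.
# Sketch: make the other masters wait for the first one by triggering on its id.
resource "null_resource" "other_swarm_masters" {
  count = var.instances - 1

  triggers = {
    # Referencing the first master's null_resource forces the ordering.
    swarm_master_id = null_resource.swarm_master[0].id
  }

  connection {
    host = element(slice(var.public_ip, 1, length(var.public_ip)), count.index)
    # ... user, key, etc.
  }

  provisioner "remote-exec" {
    inline = [
      # Hypothetical token hand-off: pull the manager join token from the first
      # master over SSH (assumes a shared key), then join it.
      "JOIN_TOKEN=$(ssh -o StrictHostKeyChecking=no root@${element(var.private_ip, 0)} docker swarm join-token manager -q)",
      "docker swarm join --token $JOIN_TOKEN ${element(var.private_ip, 0)}:2377",
    ]
  }
}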
Related
I am creating a VPN using a script in Terraform, since the provider offers no resource for it. The VPN also has some other attached resources, like security groups.
So when I run terraform destroy, it starts deleting the VPN, but in parallel it also starts deleting the security group. The security group deletion fails because the group is still associated with the VPN, which is in the process of being deleted.
When I run terraform destroy -parallelism=1 it works fine, but due to some limitations, I cannot use this in prod.
Is there a way I can enforce VPN to be deleted first before any other resource deletion starts?
EDIT:
See the security group and VPN Code:
resource "<cloud_provider>_security_group" "sg" {
name = format("%s-%s", local.name, "sg")
vpc = var.vpc_id
resource_group = var.resource_group_id
}
resource "null_resource" "make_vpn" {
triggers = {
vpn_name = var.vpn_name
local_script = local.scripts_location
}
provisioner "local-exec" {
command = "${local.scripts_location}/login.sh"
interpreter = ["/bin/bash", "-c"]
environment = {
API_KEY = var.api_key
}
}
provisioner "local-exec" {
command = local_file.make_vpn.filename
}
provisioner "local-exec" {
when = "destroy"
command = <<EOT
${self.triggers.local_script}/delete_vpn_server.sh ${self.triggers.vpn_name}
EOT
on_failure = continue
}
}
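A hedged sketch against the placeholder names above: Terraform destroys resources in the reverse of their dependency order, so making the VPN null_resource depend on the security group should cause terraform destroy to remove the VPN (and run its destroy-time provisioner) before it attempts to delete the group, without resorting to -parallelism=1.
resource "null_resource" "make_vpn" {
  # Added line; the rest of the resource stays as above. Because of this
  # dependency, "terraform destroy" removes the VPN before the security group.
  depends_on = [<cloud_provider>_security_group.sg]

  triggers = {
    vpn_name     = var.vpn_name
    local_script = local.scripts_location
  }

  # ... provisioners unchanged ...
}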
I want to perform the exec operation only once per hour; that is, if it's now 12 o'clock, don't exec again until 13 o'clock.
timestamp() in combination with formatdate() produces a value that only changes once per hour.
resource "null_resource" "helm_login" {
triggers = {
hour = formatdate("YYYYMMDDhh", timestamp())
}
provisioner "local-exec" {
command = <<-EOF
az acr login -n ${var.helm_chart_acr_fqdn} -t -o tsv --query accessToken \
| helm registry login ${var.helm_chart_acr_fqdn} \
-u "00000000-0000-0000-0000-000000000000" \
--password-stdin
EOF
}
The problem is that Terraform reports that this value will only be known after apply, and it always wants to recreate the resource.
# module.k8s.null_resource.helm_login must be replaced
-/+ resource "null_resource" "helm_login" {
~ id = "4503742218368236410" -> (known after apply)
~ triggers = {
- "hour" = "2021112010"
} -> (known after apply) # forces replacement
}
I have observed similar issues where values are fetched from data sources and passed to resources on creation, forcing me to hard-code them instead of using those data values.
As you have found out, Terraform evaluates the timestamp() function at apply time,
which is why we see the (known after apply) # forces replacement.
But we can do something about that to meet your goal: we can pass the hour in as a variable:
variable "hour" {
type = number
}
resource "null_resource" "test" {
triggers = {
hour = var.hour
}
provisioner "local-exec" {
command = "echo 'test'"
}
}
Then to call terraform we do:
hour=$(date +%Y%m%d%H); sudo terraform apply -var="hour=$hour"
First run:
Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
+ create
Terraform will perform the following actions:
# null_resource.test will be created
+ resource "null_resource" "test" {
+ id = (known after apply)
+ triggers = {
+ "hour" = "2021112011"
}
}
Plan: 1 to add, 0 to change, 0 to destroy.
Do you want to perform these actions?
Terraform will perform the actions described above.
Only 'yes' will be accepted to approve.
Enter a value: yes
null_resource.test: Creating...
null_resource.test: Provisioning with 'local-exec'...
null_resource.test (local-exec): Executing: ["/bin/sh" "-c" "echo 'test'"]
null_resource.test (local-exec): test
null_resource.test: Creation complete after 0s [id=6793564729560967989]
Second run:
null_resource.test: Refreshing state... [id=6793564729560967989]
Apply complete! Resources: 0 added, 0 changed, 0 destroyed.
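As a usage note, assuming the same "hour" variable, Terraform also reads variables from TF_VAR_-prefixed environment variables, which avoids quoting -var on every invocation:
# Same calendar-year/hour value as the "YYYYMMDDhh" trigger format.
export TF_VAR_hour=$(date +%Y%m%d%H)
terraform apply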
I have a Terraform module, which we'll call parent, and a child module used inside of it that we'll refer to as child. The goal is to have the child module run its provisioner before the kubernetes_deployment resource is created. Basically, the child module builds and pushes a Docker image. If the image is not already present, the kubernetes_deployment will wait and eventually time out because there's no image for the Deployment to use to create pods. I've tried everything I've been able to find online: output variables in the child module, using depends_on in the kubernetes_deployment resource, etc., and have hit a wall. I would greatly appreciate any help!
parent.tf
module "child" {
source = ".\\child-module-path"
...
}
resource "kubernetes_deployment" "kub_deployment" {
...
}
child-module-path\child.tf
data "external" "hash_folder" {
program = ["powershell.exe", "${path.module}\\bin\\hash_folder.ps1"]
}
resource "null_resource" "build" {
triggers = {
md5 = data.external.hash_folder.result.md5
}
provisioner "local-exec" {
command = "${path.module}\\bin\\build.ps1 ${var.argument_example}"
interpreter = ["powershell.exe"]
}
}
Example Terraform error output:
module.parent.kubernetes_deployment.kub_deployment: Still creating... [10m0s elapsed]
Error output:
Error: Waiting for rollout to finish: 0 of 1 updated replicas are available...
In your child module, declare an output value that depends on the null resource that has the provisioner associated with it:
output "build_complete" {
# The actual value here doesn't really matter,
# as long as this output refers to the null_resource.
value = null_resource.build.triggers.md5
}
Then in your "parent" module, you can either make use of module.child.build_complete in an expression (if including the MD5 string in the deployment somewhere is useful), or you can just declare that the resource depends on the output.
resource "kubernetes_deployment" "example" {
depends_on = [module.child.build_complete]
...
}
Because the output depends on the null_resource and the kubernetes_deployment depends on the output, transitively the kubernetes_deployment now effectively depends on the null_resource, creating the ordering you wanted.
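A coarser-grained variant, assuming Terraform 0.13 or later: a resource can also declare a dependency on the child module as a whole, which waits for everything in the module (including null_resource.build) without a dedicated output:
resource "kubernetes_deployment" "example" {
  # Waits for every resource in the child module, not just the null_resource.
  depends_on = [module.child]

  ...
}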
This is different from "Capture Terraform provisioner output?". I have a resource (a null_resource in this case) with a count and a local-exec provisioner that has some complex interpolated arguments:
resource "null_resource" "complex-provisioning" {
count = "${var.count}"
triggers {
server_triggers = "${null_resource.api-setup.*.id[count.index]}"
db_triggers = "${var.db_id}"
}
provisioner "local-exec" {
command = <<EOF
${var.init_command}
do-lots-of-stuff --target=${aws_instance.api.*.private_ip[count.index]} --bastion=${aws_instance.bastion.public_ip} --db=${var.db_name}
EOF
}
}
I want to be able to show what the provisioner did as output (this is not valid Terraform, just a mock-up of what I want):
output "provisioner_commands" {
value = {
api_commands = "${null_resource.complex-provisioning.*.provisioner.0.command}"
}
}
My goal is to get some output like
provisioner_commands = {
  api_commands = [
    "do-lots-of-stuff --target=10.0.0.1 --bastion=77.2.4.34 --db=mydb.local",
    "do-lots-of-stuff --target=10.0.0.2 --bastion=77.2.4.34 --db=mydb.local",
    "do-lots-of-stuff --target=10.0.0.3 --bastion=77.2.4.34 --db=mydb.local",
  ]
}
Can I read provisioner configuration and output it like this? If not, is there a different way to get what I want? (If I didn't need to run over an array of resources, I would define the command in a local variable and reference it both in the provisioner and the output.)
You cannot grab the interpolated command from the local-exec provisioner block, but if you put the same interpolation into a trigger, you can retrieve it in an output with a for expression in 0.12.x:
resource "null_resource" "complex-provisioning" {
count = 2
triggers = {
command = "echo ${count.index}"
}
provisioner "local-exec" {
command = self.triggers.command
}
}
output "data" {
value = [
for trigger in null_resource.complex-provisioning.*.triggers:
trigger.command
]
}
$ terraform apply
null_resource.complex-provisioning[0]: Refreshing state... [id=9105930607760919878]
null_resource.complex-provisioning[1]: Refreshing state... [id=405391095459979423]
Apply complete! Resources: 0 added, 0 changed, 0 destroyed.
Outputs:
data = [
"echo 0",
"echo 1",
]
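Adapted to the original mock-up, a sketch could look like the following; instance_count is a stand-in for the question's var.count (which 0.12 reserves), and the aws_instance addresses are taken from the question:
resource "null_resource" "complex-provisioning" {
  count = var.instance_count # renamed from var.count, which 0.12 reserves

  triggers = {
    # Keeping the whole interpolated command in a trigger lets an output read
    # it back from state after apply.
    command = "do-lots-of-stuff --target=${aws_instance.api[count.index].private_ip} --bastion=${aws_instance.bastion.public_ip} --db=${var.db_name}"
  }

  provisioner "local-exec" {
    command = self.triggers.command
  }
}

output "provisioner_commands" {
  value = [
    for trigger in null_resource.complex-provisioning.*.triggers :
    trigger.command
  ]
}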
Let's assume we have some DO tags:
resource "digitalocean_tag" "foo" {
name = "foo"
}
resource "digitalocean_tag" "bar" {
name = "bar"
}
And we have configured the swarm worker nodes with the tags mentioned above:
resource "digitalocean_droplet" "swarm_data_worker" {
name = "swarm-worker-${count.index}"
tags = [
"${digitalocean_tag.foo.id}",
"${digitalocean_tag.bar.id}"
]
// swarm node config stuff
provisioner "remote-exec" {
inline = [
"docker swarm join --token ${data.external.swarm_join_token.result.worker} ${digitalocean_droplet.swarm_manager.ipv4_address_private}:2377"
]
}
}
I want to label each created swarm node with the corresponding resource (droplet) tags.
To label a worker node, we need to run the following on the swarm master:
docker node update --label-add foo --label-add bar worker-node
How can we automate this with Terraform?
Got it! Probably not the best way to solve the issue, but until a Terraform release with full Swarm support is out, I can't find anything better.
The main idea is to use a pre-installed DigitalOcean SSH key:
variable "public_key_path" {
description = "DigitalOcean public key"
default = "~/.ssh/hcmc_swarm/key.pub"
}
variable "do_key_name" {
description = "Name of the key on Digital Ocean"
default = "terraform"
}
resource "digitalocean_ssh_key" "default" {
name = "${var.do_key_name}"
public_key = "${file(var.public_key_path)}"
}
Then we can provision the manager:
resource "digitalocean_droplet" "swarm_manager" {
...
ssh_keys = ["${digitalocean_ssh_key.default.id}"]
provisioner "remote-exec" {
inline = [
"docker swarm init --advertise-addr ${digitalocean_droplet.swarm_manager.ipv4_address_private}"
]
}
}
And finally, once a worker is ready, we can connect to the swarm_manager via SSH from the worker:
# Docker swarm labels list
variable "swarm_data_worker__lables" {
  type    = "list"
  default = ["type=data-worker"]
}

resource "digitalocean_droplet" "swarm_data_worker" {
  ...

  provisioner "remote-exec" {
    inline = [
      "ssh -o StrictHostKeyChecking=no root@${digitalocean_droplet.swarm_manager.ipv4_address_private} docker node update --label-add ${join(" --label-add ", var.swarm_data_worker__lables)} ${self.name}",
    ]
  }
}
Please, if you know a better approach to this issue, don't hesitate to point it out in a new answer or comment.
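One possible alternative, sketched here in 0.12+ syntax with illustrative names and connection details: run the labelling step from a separate null_resource whose connection targets the manager directly, one instance per worker, so the workers never need SSH access to the manager.
resource "null_resource" "label_data_workers" {
  count = length(digitalocean_droplet.swarm_data_worker)

  triggers = {
    worker_id = digitalocean_droplet.swarm_data_worker[count.index].id
  }

  connection {
    host = digitalocean_droplet.swarm_manager.ipv4_address
    user = "root"
    # Private key matching digitalocean_ssh_key.default; the path is an assumption.
    private_key = file("~/.ssh/hcmc_swarm/key")
  }

  provisioner "remote-exec" {
    inline = [
      "docker node update --label-add ${join(" --label-add ", var.swarm_data_worker__lables)} ${digitalocean_droplet.swarm_data_worker[count.index].name}",
    ]
  }
}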