I have a Terraform infrastructure where I need to provision a Windows server EC2 instance with an additional volume (D:).
The Terraform configuration itself is straightforward and well explained in the volume_attachment documentation.
To initialize, partition and format the new volume I found this answer; I tested it by hand and it works as expected.
The problem is how to automate everything: aws_volume_attachment depends on aws_instance, so I can't run the script that initializes and formats the new volume in the user_data section of the aws_instance, because the aws_volume_attachment has not yet been created by Terraform at that point.
I'm trying to execute the script with a null_resource.
To build the instance image with Packer I'm using WinRM with the following configuration:
source "amazon-ebs" "windows" {
  ami_name       = var.image_name
  communicator   = "winrm"
  instance_type  = "t2.micro"
  user_data_file = "setup.txt"
  winrm_insecure = true
  winrm_port     = 5986
  winrm_use_ssl  = true
  winrm_username = "Administrator"
}
so I tried to replicate the same connection settings for the null_resource in Terraform:
resource "null_resource" "it" {
  depends_on = [aws_volume_attachment.it]

  triggers = {
    instance_id = local.jenkins_build_win.id
    volume_id   = aws_ebs_volume.it.id
  }

  connection {
    host     = local.jenkins_build_win.fqdn
    https    = true
    insecure = true
    password = local.jenkins_build_win.password
    port     = 5986
    type     = "winrm"
    user     = "Administrator"
  }

  provisioner "remote-exec" {
    inline = [
      "Initialize-Disk -Number 1 -PartitionStyle MBR",
      "$part = New-Partition -DiskNumber 1 -UseMaximumSize -IsActive -AssignDriveLetter",
      "Format-Volume -DriveLetter $part.DriveLetter -Confirm:$FALSE",
    ]
  }
}
local.jenkins_build_win.fqdn and local.jenkins_build_win.password resolve correctly (I wrote them out with a local_file resource and can use them to connect to the instance via Remote Desktop), but Terraform can't connect to the instance. :(
Running Terraform with TF_LOG=trace, the only detail I can get about the error is:
[DEBUG] connecting to remote shell using WinRM
[ERROR] error creating shell: unknown error Post "https://{fqdn}:5986/wsman": read tcp {local_ip}:44538->{remote_ip}:5986: read: connection reset by peer
while running Packer with PACKER_LOG=1 I can't get any detail on the WinRM connection: my intention was to compare the calls made by Packer with the ones made by Terraform, to try to identify the problem...
I feel I'm stuck. :( Any idea?
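One thing that might be worth trying (an assumption on my part, not something verified against this exact setup) is making the Terraform connection block match Packer's authentication and timing more explicitly. Terraform's winrm connection type supports use_ntlm and timeout arguments that Packer may effectively be using:

```hcl
connection {
  host     = local.jenkins_build_win.fqdn
  type     = "winrm"
  user     = "Administrator"
  password = local.jenkins_build_win.password
  port     = 5986
  https    = true
  insecure = true
  use_ntlm = true  # assumption: Packer may negotiate NTLM where Terraform does not
  timeout  = "10m" # give the instance time to finish booting the WinRM service
}
```

A "connection reset by peer" right after the Post can also simply mean the WinRM listener is not up yet, which a longer timeout would paper over.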
I am creating a VPN using a script in Terraform, as no provider resource is available for it. The VPN also has some other attached resources, like security groups.
So when I run terraform destroy, it starts deleting the VPN, but in parallel it also starts deleting the security group. The security group deletion fails because the group is "still" associated with the VPN, which is itself in the process of being deleted.
When I run terraform destroy -parallelism=1 it works fine, but due to some limitations, I cannot use this in prod.
Is there a way I can enforce VPN to be deleted first before any other resource deletion starts?
EDIT:
See the security group and VPN code:
resource "<cloud_provider>_security_group" "sg" {
  name           = format("%s-%s", local.name, "sg")
  vpc            = var.vpc_id
  resource_group = var.resource_group_id
}
resource "null_resource" "make_vpn" {
  triggers = {
    vpn_name     = var.vpn_name
    local_script = local.scripts_location
  }

  provisioner "local-exec" {
    command     = "${local.scripts_location}/login.sh"
    interpreter = ["/bin/bash", "-c"]
    environment = {
      API_KEY = var.api_key
    }
  }

  provisioner "local-exec" {
    command = local_file.make_vpn.filename
  }

  provisioner "local-exec" {
    when       = destroy
    on_failure = continue
    command    = <<-EOT
      ${self.triggers.local_script}/delete_vpn_server.sh ${self.triggers.vpn_name}
    EOT
  }
}
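One way to force that ordering is an explicit depends_on. Terraform destroys resources in reverse dependency order, so if the VPN's null_resource depends on the security group, the VPN is destroyed before the group is. A sketch, keeping the placeholder resource names from the question:

```hcl
resource "null_resource" "make_vpn" {
  # Because this resource now depends on the security group, on destroy
  # Terraform must delete the VPN (this resource) before the group.
  depends_on = [<cloud_provider>_security_group.sg]

  triggers = {
    vpn_name     = var.vpn_name
    local_script = local.scripts_location
  }

  # ... provisioners as in the question ...
}
```

This works even with full parallelism, because the ordering constraint is expressed in the graph rather than relying on -parallelism=1.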
I have the following VMSS Terraform config:
resource "azurerm_linux_virtual_machine_scale_set" "my-vmss" {
  ...
  instances    = 2
  ...
  upgrade_mode = "Rolling"

  rolling_upgrade_policy {
    max_batch_instance_percent              = 100
    max_unhealthy_instance_percent          = 100
    max_unhealthy_upgraded_instance_percent = 0
    pause_time_between_batches              = "PT10M"
  }

  extension {
    name                      = "my-vmss-app-health-ext"
    publisher                 = "Microsoft.ManagedServices"
    type                      = "ApplicationHealthLinux"
    automatic_upgrade_enabled = true
    type_handler_version      = "1.0"
    settings = jsonencode({
      protocol = "tcp"
      port     = 8080
    })
  }
  ...
}
However, whenever a change is applied (e.g., changing custom_data), the VMSS is updated but the instances are not reimaged. Only after a manual reimage (via the UI or the Azure CLI) do the instances get updated.
The terraform plan output is as expected: the custom_data change is detected:
# azurerm_linux_virtual_machine_scale_set.my-vmss will be updated in-place
~ resource "azurerm_linux_virtual_machine_scale_set" "my-vmss" {
...
~ custom_data = (sensitive value)
...
Plan: 0 to add, 1 to change, 0 to destroy.
Any idea how to make Terraform trigger the instance reimaging?
It looks like this is not a Terraform issue but a consequence of the "rolling upgrades" design in Azure. From here (1) it follows that updates to custom_data won't affect existing instances: until an instance is manually reimaged (e.g., via the UI or the Azure CLI) it won't pick up the new custom_data (e.g., the new cloud-init script).
In contrast, AWS does refresh instances on custom_data updates. Please let me know if my understanding is incorrect or if you have an idea of how to work around this limitation in Azure.
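One possible workaround (a sketch, not a verified solution: it assumes the az CLI is installed and logged in wherever Terraform runs, and that var.resource_group_name exists) is to drive the reimage from Terraform itself with a null_resource that fires whenever custom_data changes:

```hcl
resource "null_resource" "reimage_on_custom_data_change" {
  # Re-run whenever the cloud-init content changes
  triggers = {
    custom_data_hash = sha256(local.custom_data) # assumed local holding your cloud-init script
  }

  # Ensure the VMSS model is updated before reimaging
  depends_on = [azurerm_linux_virtual_machine_scale_set.my-vmss]

  provisioner "local-exec" {
    command = "az vmss reimage --name ${azurerm_linux_virtual_machine_scale_set.my-vmss.name} --resource-group ${var.resource_group_name}"
  }
}
```

Note that this reimages instances outside the rolling_upgrade_policy batching, so it may be too blunt for a production cluster.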
I am creating an AWS EC2 instance and I am using Terraform Cloud as the backend.
In ./main.tf:
terraform {
  required_version = "~> 0.12"

  backend "remote" {
    hostname     = "app.terraform.io"
    organization = "organization"
    workspaces { prefix = "test-dev-" }
  }
}
In ./modules/instances/function.tf:
resource "aws_instance" "test" {
  ami                    = "${var.ami_id}"
  instance_type          = "${var.instance_type}"
  subnet_id              = "${var.private_subnet_id}"
  vpc_security_group_ids = ["${aws_security_group.test_sg.id}"]
  key_name               = "${var.test_key}"

  tags = {
    Name     = "name"
    Function = "function"
  }

  provisioner "remote-exec" {
    inline = [
      "sudo useradd someuser"
    ]

    connection {
      host        = "${self.public_ip}"
      type        = "ssh"
      user        = "ubuntu"
      private_key = "${file("~/.ssh/mykey.pem")}"
    }
  }
}
and as a result I get the following error:
Call to function "file" failed: no file exists at /home/terraform/.ssh/...
So what is happening here is that Terraform is looking for the file in the Terraform Cloud execution environment instead of on my local machine. How can I transfer a file from my local machine while still using Terraform Cloud?
There is no direct way to do what I asked in the question. In the end I uploaded the key into AWS with its CLI, like this:
aws ec2 import-key-pair --key-name "name_for_the_key" --public-key-material file:///home/user/.ssh/name_for_the_key.pub
and then referenced it like this:
resource "aws_instance" "test" {
  ami = "${var.ami_id}"
  ...
  key_name = "name_for_the_key"
  ...
}
Note: yes, file:// looks like the most Windows-like syntax ever, but you have to use it on Linux too.
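An alternative worth considering (a sketch, assuming you can store the key content as a sensitive workspace variable in Terraform Cloud) is to pass the private key through a Terraform variable instead of reading a local file, since variables are available in the remote execution environment while your local filesystem is not:

```hcl
variable "ssh_private_key" {
  type = string
  # Set this as a sensitive workspace variable in Terraform Cloud
}

resource "aws_instance" "test" {
  ami           = var.ami_id
  instance_type = var.instance_type

  provisioner "remote-exec" {
    inline = ["sudo useradd someuser"]

    connection {
      host        = self.public_ip
      type        = "ssh"
      user        = "ubuntu"
      private_key = var.ssh_private_key
    }
  }
}
```

This keeps the key out of the configuration itself and avoids the file() call entirely.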
With Terraform 0.12 there is a templatefile function, but I haven't figured out the syntax for passing it a non-trivial map as the second argument and using the result as the newly created instance's provisioning step, executed remotely.
Here's the gist of what I'm trying to do, although it doesn't parse properly because one can't just create a local variable named scriptstr within the resource block.
While my real goal is to have the output of the templatefile call executed on the remote side once the provisioner can SSH to the machine, so far I've gone down the path of writing the templatefile output to a local file via the local-exec provisioner. It's probably easy; I just haven't found the documentation or examples to understand the necessary syntax. TIA
resource "aws_instance" "server" {
  count                  = "${var.servers}"
  ami                    = "${local.ami}"
  instance_type          = "${var.instance_type}"
  key_name               = "${local.key_name}"
  subnet_id              = "${element(aws_subnet.consul.*.id, count.index)}"
  iam_instance_profile   = "${aws_iam_instance_profile.consul-join.name}"
  vpc_security_group_ids = ["${aws_security_group.consul.id}"]

  ebs_block_device {
    device_name = "/dev/sda1"
    volume_size = 2
  }

  tags = "${map(
    "Name", "${var.namespace}-server-${count.index}",
    var.consul_join_tag_key, var.consul_join_tag_value
  )}"

  scriptstr = templatefile("${path.module}/templates/consul.sh.tpl",
    {
      consul_version = "${local.consul_version}"
      config = <<-EOF
        "bootstrap_expect": ${var.servers},
        "node_name": "${var.namespace}-server-${count.index}",
        "retry_join": ["provider=aws tag_key=${var.consul_join_tag_key} tag_value=${var.consul_join_tag_value}"],
        "server": true
      EOF
    })

  provisioner "local-exec" {
    command = "echo ${scriptstr} > ${var.namespace}-server-${count.index}.init.sh"
  }

  provisioner "remote-exec" {
    script = "${var.namespace}-server-${count.index}.init.sh"

    connection {
      type        = "ssh"
      user        = "clear"
      private_key = file("${local.private_key_file}")
    }
  }
}
From your question I can see that the higher-level problem you're trying to solve is creating a pool of HashiCorp Consul servers and then, once they are all booted, telling them about each other so that they can form a cluster.
Provisioners are essentially a "last resort" in Terraform, provided out of pragmatism because sometimes logging in to a host and running commands on it is the only way to get a job done. An alternative available in this case is to instead pass the information from Terraform to the server via the aws_instance user_data argument, which will then allow the servers to boot up and form a cluster immediately, rather than being delayed until Terraform is able to connect via SSH.
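That user_data approach might look something like this (a sketch; it assumes consul.sh.tpl is written so that cloud-init can run it at boot):

```hcl
resource "aws_instance" "server" {
  count = var.servers
  # ... other arguments as in the question ...

  # Rendered at plan time and executed by cloud-init on first boot,
  # with no SSH connection from Terraform required.
  user_data = templatefile("${path.module}/templates/consul.sh.tpl", {
    consul_version = local.consul_version
    config = <<-EOF
      "bootstrap_expect": ${var.servers},
      "node_name": "${var.namespace}-server-${count.index}",
      "retry_join": ["provider=aws tag_key=${var.consul_join_tag_key} tag_value=${var.consul_join_tag_value}"],
      "server": true
    EOF
  })
}
```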
Either way, I'd generally prefer to have the main body of the script I intend to run already included in the AMI so that Terraform can just run it with some arguments, since that then reduces the problem to just templating the invocation of that script rather than the whole script:
provisioner "remote-exec" {
  inline = ["/usr/local/bin/init-consul --expect='${var.servers}' etc, etc"]

  connection {
    type        = "ssh"
    user        = "clear"
    private_key = file("${local.private_key_file}")
  }
}
However, if templating an entire script is what you want or need to do, I'd upload it first using the file provisioner and then run it, like this:
provisioner "file" {
  destination = "/tmp/consul.sh"
  content = templatefile("${path.module}/templates/consul.sh.tpl", {
    consul_version = "${local.consul_version}"
    config = <<-EOF
      "bootstrap_expect": ${var.servers},
      "node_name": "${var.namespace}-server-${count.index}",
      "retry_join": ["provider=aws tag_key=${var.consul_join_tag_key} tag_value=${var.consul_join_tag_value}"],
      "server": true
    EOF
  })
}

provisioner "remote-exec" {
  inline = ["sh /tmp/consul.sh"]
}
I was trying to run
terraform apply
but got the error below:
1 error(s) occurred:
digitalocean_droplet.testvm[0]: Resource 'digitalocean_droplet.testvm' not found for variable
'digitalocean_droplet.testvm.ipv4_address'
Terraform does not automatically rollback in the face of errors.
Instead, your Terraform state file has been partially updated with any
resources that successfully completed. Please address the error above
and apply again to incrementally change your infrastructure.
How can I pass the public IP of the created droplet to the provisioner's local-exec command?
Below is my .tf file:
provider "digitalocean" {
  token = "----TOKEN----"
}

resource "digitalocean_droplet" "testvm" {
  count              = "10"
  name               = "do-instance-${count.index}"
  image              = "ubuntu-16-04-x64"
  size               = "512mb"
  region             = "nyc3"
  ipv6               = true
  private_networking = false

  ssh_keys = [
    "----SSH KEY----"
  ]

  provisioner "local-exec" {
    command = "fab production deploy ${digitalocean_droplet.testvm.ipv4_address}"
  }
}
Thanks in advance!
For the local-exec provisioner you can make use of the self keyword. In this case it would be ${self.ipv4_address}.
My guess is that your snippet would have worked if you didn't put count = 10 on the testvm droplet. You can also make use of ${count.index}.
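Applied to the snippet from the question, that would look like this (a sketch; self refers to the specific instance being provisioned, so it works correctly with count):

```hcl
resource "digitalocean_droplet" "testvm" {
  count = "10"
  # ... other arguments as in the question ...

  provisioner "local-exec" {
    # self.ipv4_address is this droplet's own IP, not a reference to the
    # whole testvm resource, so it resolves even with count set.
    command = "fab production deploy ${self.ipv4_address}"
  }
}
```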
More info: https://www.terraform.io/docs/provisioners/
Also, found this github issue that might be helpful to you.
Hope it helps