Error handling in cloud init scripts within azurerm_linux_virtual_machine - azure

I run a custom shell script when I deploy my virtual machine with terraform, which can throw errors.
My question is, how do you handle these errors, because regardless of the return code of the script, terraform always reports the deployment was successful, which leads to confusion when the VM does not what it’s supposed to do.
Here a snippet of the terraform file for context:
data "template_file" "setup_script" {
count = var.agent_count
template = file("scripts/setup.sh")
vars = {
POOL_NAME = var.pool_name
AGENT = "agent-${count.index}"
ORGANIZATION_URL = var.organization_url
TOKEN = var.token
TERRAFORM_VERSION = var.terraform_version
VSTS_AGENT_VERSION = var.vsts_agent_version
}
}
resource "azurerm_linux_virtual_machine" "vmachine" {
count = length(module.network.network_interfaces)
name = "agent-${count.index}"
resource_group_name = azurerm_resource_group.deployment-agents.name
location = azurerm_resource_group.deployment-agents.location
size = "Standard_B1ms"
admin_username = "adminuser"
network_interface_ids = [
module.network.network_interfaces[count.index].id,
]
admin_ssh_key {
username = "adminuser"
public_key = var.ssh_public_key
}
os_disk {
caching = "ReadWrite"
storage_account_type = "Standard_LRS"
}
source_image_reference {
publisher = "Canonical"
offer = "UbuntuServer"
sku = "18.04-LTS"
version = "latest"
}
boot_diagnostics {
storage_account_uri = azurerm_storage_account.boot.primary_blob_endpoint
}
custom_data = base64encode(data.template_file.setup_script.*.rendered[count.index])
}
And the setup.sh shell script:
# --- snip ----
apt-get install azure-cli
if [ $? -gt 0 ]; then
echo "Cannot install azure cli!"
exit 1
fi
# Test
exit 1
Thanks for the help.

Terraform only return the error about itself, not the script execute inside the VM. You can find the error message inside the VM.
And to install the Azure CLI via the cloud-init with a shell script, you need to add #!/bin/bash at the beginning of the shell script, see the note:
And install the Azure CLI, I think there are more things you need to do than what you have tried, take a look at the steps that install the Azure CLI in Ubuntu. Or use the existing shell script here.

The important piece here is "a cloud-init image deployment will NOT fail if the script fails" from the Azure cloud-init docs.
This mean that Azure think the deployment was successful, which is what Terraform will also show.
In order to fail the Terraform deployment, you would have to use cloud-init to install the cli using cloud-init itself, which should make the deployment fail:
data "template_cloudinit_config" "config" {
gzip = true
base64_encode = true
part {
content_type = "text/cloud-config"
content = "packages: ['azure-cli']"
}
}
More examples on what could be done using cloud-init in the docs.

Related

Terraform default script path

I am creating multiple VMs in Azure using cloud-init, they are created in parallel and when any of them fails, I can see in the logs:
Error: error executing "/tmp/terraform_876543210.sh": Process exited with status 1
But I have no way to figure out which VM is failing, I need to ssh each of them and check
The script path seems to be defined for provisioning Terraform
Is there a way to override it also for cloud-init to something like: /tmp/terraform_vmName_876543210.sh ?
I am not using provisioner but cloud-init, any idea how I can force terraform to override the terraform sh file?
Below my script:
resource "azurerm_linux_virtual_machine" "example" {
name = "example-machine"
resource_group_name = azurerm_resource_group.example.name
location = azurerm_resource_group.example.location
size = "Standard_F2"
admin_username = "adminuser"
network_interface_ids = [
azurerm_network_interface.example.id,
]
admin_ssh_key {
username = "adminuser"
public_key = file("~/.ssh/id_rsa.pub")
}
os_disk {
caching = "ReadWrite"
storage_account_type = "Standard_LRS"
}
source_image_reference {
publisher = "Canonical"
offer = "UbuntuServer"
sku = "16.04-LTS"
version = "latest"
}
custom_data = base64encode(templatefile(
"my-cloud-init.tmpl", {
var1 = "value1"
var2 = "value2"
})
)
}
And my cloud-init script:
## template: jinja
#cloud-config
runcmd:
- sudo /tmp/bootstrap.sh
write_files:
- path: /tmp/bootstrap.sh
permissions: 00700
content: |
#!/bin/sh -e
echo hello
From the code you found for the Terraform, it shows:
DefaultUnixScriptPath is used as the path to copy the file to for
remote execution on Unix if not provided otherwise.
It's the configuration for the remote execution. And for the remote execution of the SSH, you can set the source and the destination for the copy file in the provisioner "file".
But it's used to set the path in the remote VM, not the local machine that you execute the Terraform. And you can overwrite the file name like this:
provisioner "file" {
source = "conf/myapp.conf"
destination = "/etc/terraform_$(var.vmName).conf"
...
}

Creating azure automation dsc configuration and dsc configuration node using terraform doesn't seems to be working

As a very first step of my release process I run the following terraform code
resource "azurerm_automation_account" "automation_account" {
for_each = data.terraform_remote_state.pod_bootstrap.outputs.ops_rg
name = "${local.automation_account_prefix}-${each.key}"
location = each.key
resource_group_name = each.value.name
sku_name = "Basic"
tags = {
environment = "development"
}
}
The automation accounts created as expected and I can see those in Azure portal.
I also have terraform code that creates a couple of windows VMs,each VM creation accompained by the following
resource "azurerm_virtual_machine_extension" "dsc" {
name = "DevOpsDSC"
virtual_machine_id = var.vm_id
publisher = "Microsoft.Powershell"
type = "DSC"
type_handler_version = "2.83"
settings = <<SETTINGS_JSON
{
"configurationArguments": {
"RegistrationUrl": "${var.dsc_server_endpoint}",
"NodeConfigurationName": "${var.dsc_config}",
"ConfigurationMode": "${var.dsc_mode}",
"ConfigurationModeFrequencyMins": 15,
"RefreshFrequencyMins": 30,
"RebootNodeIfNeeded": false,
"ActionAfterReboot": "continueConfiguration",
"AllowModuleOverwrite": true
}
}
SETTINGS_JSON
protected_settings = <<PROTECTED_SETTINGS_JSON
{
"configurationArguments": {
"RegistrationKey": {
"UserName": "PLACEHOLDER_DONOTUSE",
"Password": "${var.dsc_primary_access_key}"
}
}
}
PROTECTED_SETTINGS_JSON
}
The result is the following
So VM extension is created for each VM and the status says that provisioning succeeded.
For the next step I run the following terraform code
resource "azurerm_automation_dsc_configuration" "iswebserver" {
for_each = data.terraform_remote_state.pod_bootstrap.outputs.ops_rg
name = "iswebserver"
resource_group_name = each.value.name
automation_account_name = data.terraform_remote_state.ops.outputs.automation_account[each.key].name
location = each.key
content_embedded = "configuration iswebserver {}"
}
resource "azurerm_automation_dsc_nodeconfiguration" "iswebserver" {
for_each = data.terraform_remote_state.pod_bootstrap.outputs.ops_rg
name = "iswebserver.localhost"
resource_group_name = each.value.name
automation_account_name = data.terraform_remote_state.ops.outputs.automation_account[each.key].name
depends_on = [azurerm_automation_dsc_configuration.iswebserver]
content_embedded = file("${path.cwd}/iswebserver.mof")
}
The mof file content is the following
/*
#TargetNode='IsWebServer'
#GeneratedBy=P120bd0
#GenerationDate=02/25/2021 17:33:16
#GenerationHost=D-MJ05UA54
*/
instance of MSFT_RoleResource as $MSFT_RoleResource1ref
{
ResourceID = "[WindowsFeature]IIS";
IncludeAllSubFeature = True;
Ensure = "Present";
SourceInfo = "D:\\DSC\\testconfig.ps1::5::9::WindowsFeature";
Name = "Web-Server";
ModuleName = "PsDesiredStateConfiguration";
ModuleVersion = "1.0";
ConfigurationName = "TestConfig";
};
instance of OMI_ConfigurationDocument
{
Version="2.0.0";
MinimumCompatibleVersion = "1.0.0";
CompatibleVersionAdditionalProperties= {"Omi_BaseResource:ConfigurationName"};
Author="P120bd0";
GenerationDate="02/25/2021 17:33:16";
GenerationHost="D-MJ05UA54";
Name="TestConfig";
};
After running the code I have got the following result
The configuration is created as expected, clicking on configuration entry in UI grid, leads to the following
Meaning that node configuration is created as well. My expectation was that for each VM I will see the Node configured to run configuration provided in mof file but Nodes UI shows empty Nodes
So I was trying to configure node manually to connect all peaces together
and that fails with the following
So I am totally confisued. On the one hand there's azurerm_virtual_machine_extension that allows to create extension and bind it to the automation account. In addition there are azurerm_automation_dsc_configuration and azurerm_automation_dsc_nodeconfiguration that allows to create configuration and node configuration. But the bottom line is that you cannot connect all those dots to be able to create node.
Just to confirm that configuration is valid, I create additional vm without using azurerm_virtual_machine_extension and I was able succesfully add this MV to created node configuration
The problem was in azurerm_virtual_machine_extension dsc_configuration parameter. The value needs to be the same as name property of the azurerm_automation_dsc_nodeconfiguration resource.

How to send local files using Terraform Cloud as remote backend?

I am creating AWS EC2 instance and I am using Terraform Cloud as backend.
in ./main.tf:
terraform {
required_version = "~> 0.12"
backend "remote" {
hostname = "app.terraform.io"
organization = "organization"
workspaces { prefix = "test-dev-" }
}
in ./modules/instances/function.tf
resource "aws_instance" "test" {
ami = "${var.ami_id}"
instance_type = "${var.instance_type}"
subnet_id = "${var.private_subnet_id}"
vpc_security_group_ids = ["${aws_security_group.test_sg.id}"]
key_name = "${var.test_key}"
tags = {
Name = "name"
Function = "function"
}
provisioner "remote-exec" {
inline = [
"sudo useradd someuser"
]
connection {
host = "${self.public_ip}"
type = "ssh"
user = "ubuntu"
private_key = "${file("~/.ssh/mykey.pem")}"
}
}
}
and as a result, I got the following error:
Call to function "file" failed: no file exists at /home/terraform/.ssh/...
so what is happening here, is that terraform trying to find the file in Terraform Cloud instead of my local machine. How can I transfer file from my local machine and still using Terraform Cloud?
There is no straight way to do what I asked in the question. In the end I ended up uploading the keys into AWS with its CLI like this:
aws ec2 import-key-pair --key-name "name_for_the_key" --public-key-material file:///home/user/.ssh/name_for_the_key.pub
and then reference it like that:
resource "aws_instance" "test" {
ami = "${var.ami_id}"
...
key_name = "name_for_the_key"
...
}
Note Yes file:// looks like the "Windowsest" syntax ever but you have to use it on linux too.

Remote-exec not working in Terraform with aws_instance resourse

I have this below code when I run apply it gets a timeout. An instance is created but remote-exec commands don't work.
I am running this in the windows 10 machine.
Terraform version is v0.12.12 provider.aws v2.33.0
resource "aws_instance" "web" {
ami = "ami-54d2a63b"
instance_type = "t2.nano"
key_name = "terra"
tags = {
Name = "HelloWorld"
}
connection {
type = "ssh"
user = "ubuntu"
private_key = "${file("C:/Users/Vinayak/Downloads/terra.pem")}"
host = self.public_ip
}
provisioner "remote-exec" {
inline = [
"echo cat > test.txt"
]
}
}
Please try to change you host line to
host = "${self.public_ip}"
Letting people know the actual error message you are getting might help too. :)

Using function templatefile(path, vars) with a remote-exec provisioner

With terraform 0.12, there is a templatefile function but I haven't figured out the syntax for passing it a non-trivial map as the second argument and using the result to be executed remotely as the newly created instance's provisioning step.
Here's the gist of what I'm trying to do, although it doesn't parse properly because one can't just create a local variable within the resource block named scriptstr.
While I'm really trying to get the output of the templatefile call to be executed on the remote side, once the provisioner can ssh to the machine, I've so far gone down the path of trying to get the templatefile call output written to a local file via the local-exec provisioner. Probably easy, I just haven't found the documentation or examples to understand the syntax necessary. TIA
resource "aws_instance" "server" {
count = "${var.servers}"
ami = "${local.ami}"
instance_type = "${var.instance_type}"
key_name = "${local.key_name}"
subnet_id = "${element(aws_subnet.consul.*.id, count.index)}"
iam_instance_profile = "${aws_iam_instance_profile.consul-join.name}"
vpc_security_group_ids = ["${aws_security_group.consul.id}"]
ebs_block_device {
device_name = "/dev/sda1"
volume_size = 2
}
tags = "${map(
"Name", "${var.namespace}-server-${count.index}",
var.consul_join_tag_key, var.consul_join_tag_value
)}"
scriptstr = templatefile("${path.module}/templates/consul.sh.tpl",
{
consul_version = "${local.consul_version}"
config = <<EOF
"bootstrap_expect": ${var.servers},
"node_name": "${var.namespace}-server-${count.index}",
"retry_join": ["provider=aws tag_key=${var.consul_join_tag_key} tag_value=${var.consul_join_tag_value}"],
"server": true
EOF
})
provisioner "local-exec" {
command = "echo ${scriptstr} > ${var.namespace}-server-${count.index}.init.sh"
}
provisioner "remote-exec" {
script = "${var.namespace}-server-${count.index}.init.sh"
connection {
type = "ssh"
user = "clear"
private_key = file("${local.private_key_file}")
}
}
}
In your question I can see that the higher-level problem you seem to be trying to solve here is creating a pool of HashiCorp Consul servers and then, once they are all booted up, to tell them about each other so that they can form a cluster.
Provisioners are essentially a "last resort" in Terraform, provided out of pragmatism because sometimes logging in to a host and running commands on it is the only way to get a job done. An alternative available in this case is to instead pass the information from Terraform to the server via the aws_instance user_data argument, which will then allow the servers to boot up and form a cluster immediately, rather than being delayed until Terraform is able to connect via SSH.
Either way, I'd generally prefer to have the main body of the script I intend to run already included in the AMI so that Terraform can just run it with some arguments, since that then reduces the problem to just templating the invocation of that script rather than the whole script:
provisioner "remote-exec" {
inline = ["/usr/local/bin/init-consul --expect='${var.servers}' etc, etc"]
connection {
type = "ssh"
user = "clear"
private_key = file("${local.private_key_file}")
}
}
However, if templating an entire script is what you want or need to do, I'd upload it first using the file provisioner and then run it, like this:
provisioner "file" {
destination = "/tmp/consul.sh"
content = templatefile("${path.module}/templates/consul.sh.tpl", {
consul_version = "${local.consul_version}"
config = <<EOF
"bootstrap_expect": ${var.servers},
"node_name": "${var.namespace}-server-${count.index}",
"retry_join": ["provider=aws tag_key=${var.consul_join_tag_key} tag_value=${var.consul_join_tag_value}"],
"server": true
EOF
})
}
provisioner "remote-exec" {
inline = ["sh /tmp/consul.sh"]
}

Resources