Terraform and using provisioners

Terraform and using provisioners - terraform

I have a module that creates some VMs (along with a VApp) and I need to enable users of that module to pass scripts that would get run on creation and/or deletion of each VM. For example, to enroll the VM into a company authentication system, CMDB, monitoring system or other.
And I just cannot find a way to implement it nicely.
There are several approaches that I've tried:
Passing a script that gets passed directly to commands inside provisioner blocks on each VM resource.
Advantages of this approach:
Clean and simple - there are no additional resources.
Works as intended - runs on creation and/or deletion.
The problems with this approach:
You can only pass one script per VM.
You could concatenate them together, but that has obvious drawbacks like the failure in one is then a failure in everything leaving things in an inconsistent state. Retrying gets pretty messy.
It's somewhat ugly since if the command is null, Terraform crashes.
As a workaround, you could pass a one-liner containing only a comment about why it's there but that doesn't exactly make it pretty.
Creating a separate null_resource for each VM-script combination and putting the script there.
Advantages of this approach:
There is a separate resource for each script for each VM that if failed could be retried.
The problems with this approach:
Problems arise when updating the scripts.
Creation script updates can be ignored with a lifecycle ignore statement since if you're changing the enroll procedure after a machine has already been succesfully enrolled, you don't need to enroll it again.
But changing a when = destroy script results in the resource being recreated. Meaning it being destroyed and created with the new version of a script. Which in turn triggers the run of the old script which is very unlikely to be desirable.
Creating "runner" null_resources that only contain the name of a script to run. The script and its content is managed through a set of local_file resources.
Advantages of this approach:
Script content is separated from script execution.
The problems with this approach:
This falls apart in ephemeral environments (like running Terraform in a new container each time) where the files would have to be created each time. Maybe this can be worked around though seems it might be difficult because depends_on fails if something other than a static resource is used - fails when trying to use resource subscripts.
Additional considerations:
Using mechanisms of pushing those scripts to VMs and having them run there is undesirable because:
Additional risk of having to handle secrets passed to a lot of machines.
There is the assumption that the machines can also reach the APIs that is not always true.
The hack of putting information from other resources in triggers (because a when = destroy provisioner can only use references to self) is a bit worrisome because relying on it means everything will break completely if Terraform developers decide to remove it entirely.

Related

Terraform multi-stage resource initialization with temporary resources

I use terraform to initialize some OpenStack cloud resources.
I have a scenario where I would need to initialize/prepare a volume disk using a temporary compute resource. Once volume is fully initialized, I would no longer need the temporary compute resource but need to attach to another compute resource (different network configuration and other settings making reuse of first impossible). As you might have guessed, I cannot reach directly the expected long term goal without the intermediary step.
I know I could drive a state machine or some sort of processing queue from outside terraform to achieve this, but I wonder if it was possible to do it nicely in one single run of terraform.
The best I could think of, is that a main terraform script would trigger creation/destruction of the intermediate compute resource by launching a another terraform instance responsible just for the intermediate resources (using terraform apply followed by terraform destroy). However it requires extra care such as ensuring unique folder to deal with concurrent "main" resource initialization and makes the whole a bit messy I think.

I wonder if it was possible to do it nicely in one single run of terraform.
Sadly, no. Any "solution" which you could possibly implement for that (e.g. running custom scripts through local-exec, etc) in a single TF will only be convoluted mess, and will only lead to more issues that it solves in the long term.
The proper way, as you wrote, is to use dedicated CI/CD pipeline for a multistage deployment. Alternatively, don't use TF at all, and use other IaC tool.

How to change a terraform variable depending on runtime information?

I have a Terraform script that, after creating several nodes, installs a piece of software on them by telling one node about the other ones using cloud-init. I want this cloud-init piece to only run when this one node is initially created, and not if it's altered or re-created. Now thanks to Terraform plan, Terraform has the information needed, it's telling the user clearly if the node has to be re-created or if it's created the first time at all. But how do I get this kinda information in my script?
The (boring) and manual way is of course to make it a variable that a human enters after reviewing the plan. But this is hardly scalable, secure or sophisticated.
Maybe Terraform is entirely the wrong tool for this kinda job?

Terraform does not expose the information about what action is planned for an object for use in the configuration itself, because Terraform is a desired state system and so the actions are derived from the configuration, rather than the configuration being derived from the actions.
To achieve what you described I think you'll need to arrange for the initialization process to itself remember that it already run in some persistent location. Cloud-init itself remembers when it has run to completion so that rebooting the system won't re-run initialization tasks, but of course that information cannot survive replacing the VM entirely and so you'd need to create a similar marker yourself in a data store that will outlive that particular VM.
One unanswered question down that path is how that external state would get cleaned up if you were to destroy the VM entirely, since the software running in the VM can't tell whether the system is being shut down in preparation for replacement or being shut down in response to just destroying.

How can I debug custom terraform provider which I have implemented

I have implemented a kubernetes terraform provider which applies manifest files to k8s cluster. I have created .tf files also but when I run terraform init it downloads plugins from terraform registry.
How can I make my plugin to run for terraform apply.

Debugging a TerraForm provider
Understanding the design
In order to do it, you first have to understand how Go builds apps, and then how terraform works with it.
Every terraform provider is a sort of module. In order to support an open, modular system, in almost any language, you need to be able to dynamically load modules and interact with them. Terraform is no exception.
However, the go lang team long ago decided to compile to statically linked applications;
any dependencies you have will be compiled into 1 single binary. Unlike in other native languages (like C, or C++), a
.dll or .so is not used; there is no dynamic library to load at runtime and thus, modularity becomes a whole other trick.
This is done to avoid the notorious dll hell that was so common up until most modern systems included some
kind of dependency management. And yes, it can still be an issue.
Every terraform provider is its own mini RPC server. When terraform runs your provider, it actually starts a new process that is your provider, and connects to it through
this RPC channel. Compounding the problem is that the lifetime of your provider process is very much
ephemeral; potentially lasting no more and a few seconds. It's this process you need to connect to with your debugger
Normal debugging
Normally, you would directly spin-up your app, and it would load modules into application memory. That's why you can actually
debug it, because your debugger knows how to find the exact memory address for your provider. However, you don't have
this arrangement, and you need to do a remote debug session.
The conundrum
So, you don't load terraform directly, and even if you did, your module (a.k.a your provider) is in the memory
space of an entirely different process; and that lasts no more than a few seconds, potentially.
The solution
You need the debugging tool delve.
You are going to have to place a little bit of shim code close to the spot in the code where you want to begin
debugging. We need to stop this provider process from exiting before we can connect. So, put this bit of code in place:
connected := false
for !connected {
time.Sleep(time.Second) // set breakpoint here
}
This code effectively creates an infinite sleep loop; but that's actually essential to solving the problem.
Place a break point right inside this loop. It won't do anything, yet.
Now run the terraform commands you need to, to engage the code you're desiring to debug. Upon doing so,
terraform will basically stop, as it waits on a response from you provider; because you put an infinite sleep loop in
You must now tell delve to connect to this remote process using it's PID. This isn't as hard as it seems.
Run this commands:
dlv --listen=:2345 --headless=true --api-version=2 --accept-multiclient attach $(pgrep terraform-provider-artifactory)
The last argument gets the PID for your provider and supplies it to delve to connect. Immediately upon running this
command, you're going to hit your break point. Please make sure to substitute terraform-provider-artifactory for your provider name
To exit this infinite loop, use your debugger to set connected to true. By doing so you change the loop predicate
and it will exit this loop on the next iteration.
DEBUG! - At this point you can, step, watch, drop the call stack, etc. Your whole arsenel is available

Can't figure out how to reuse terraform provisioners

I've created some (remote-exec and file) provisioners to bootstrap (GCP) VMs that I'm creating that I want to apply to all my VMs, but I can't seem to figure out how to reuse them...?
Modules seem like the obvious answer, but creating a module to create the VMs means I'd need to make input vars for everything that I'd want to configure on each of the VMs specifically...
Reusing the snippets with the provisioners doesn't seem possible though?

Terraform's Provisioner feature is intended as a sort of "last resort" for situations where there is no alternative but to SSH into a machine and run commands on it remotely, but generally we should explore other options first.
The ideal case is to design your machine images so that they are already correctly configured for what they need to do and so they can immediately start doing that work on boot. If you use HashiCorp Packer then you can potentially run very similar steps at image build time to what you might've otherwise run at Terraform create time with provisioners, perhaps allowing you to easily adapt the work you already did.
If they need some configuration parameters from Terraform in order to start their work, you can use features like the GCP instance metadata argument to pass in those values so that the software in the image can access it as soon as the system boots.
A second-best sort of option is to use features like GCP startup scripts to pass the script to run via metadata so that again it's available immediately on boot, without the need to wait for the SSH server to start up and become available.
In both of these cases, the idea is to rely on features provided by the compute platform to treat the compute instances as a sort of "appliance", so Terraform (and you) can think of them as being similar to a resource modelling a hosted service. Terraform is concerned only with starting and stopping this black box infrastructure and wiring it in with other infrastructure, and the instance handles its implementation details itself. For use-cases where horizontal scaling is appropriate, this also plays nicely with managed autoscaling functionality like google_compute_instance_group, since new instances can be started by that system rather than directly by Terraform.
Because Provisioners are designed as a last-resort for when approaches like the above are not available, their design does not include any means for general reuse. It's expected that each provisioner will be a tailored solution to a specific problem inline in the resource it relates to, not something you use systematically across many separate callers.
With that said, if you are using file and remote-exec in particular you can get partway there by factoring out the specific file to be uploaded and the remote command to execute, in which case your resource blocks will contain just the declaration boilerplate while avoiding repetition of the implementation details. For example, if you had a module that exported outputs local_file_path, remote_file_path, and remote_commands you could write something like this:
module "provisioner_info" {
source = "./modules/provisioner-info"
}
resource "any" "example" {
# ...
provisioner "file" {
source = module.provisioner_info.local_file_path
destination = module.provisioner_info.remote_file_path
}
provisioner "remote-exec" {
inline = module.provisioner_info.remote_commands
}
}
That is the limit for factoring out provisioner details in current versions of Terraform.

How to test terraform templates other than trial and error

I'm creating cloud resources using Terraform. Each resource is expected to be in a particular desired state after provisioning. For example, when I create a Google Cloud Bucket, I would like certain permissions to be applied automatically. So, my plan contains necessary code for this but I wanted to make sure that this works all the time regardless before I apply. Is there any testing tool/library that can help here?

Yes, I had the same thinking before. Currently, I use several ways to reduce the risk when I apply a new terraform change.
They can't guarantee a 100% successful terraform apply, but will fix the most issues before you apply it.
Validate terraform configuration files.
Terraform has the validate function for starting. But it is not smart enough to go through subfolders. I create a small shell function and add in CI/CD pipeline to run it automatically before terraform apply.
validate() {
modules=$(find . -type f -name "*.tf" -exec dirname {} \;|sort -u)
for m in ${modules}
do
(terraform validate "$m" && echo "√ $m") || exit 1
done
}
Of course, do terraform fmt before you submit your change is not bad idea.
terraform plan
#Martin Atkins explained it already, and terraform.io has details about this command.
run automation test kitchen.
That's a test Kitchen plugin for testing Terraform configurations
https://github.com/newcontext-oss/kitchen-terraform
That's an integration test. The test will run in separate VPC with as more as test cases you added. Add the automation test in CI/CD pipeline as well to trigger an automation test every time when you raise merge request to master branch. Apply the change only after getting the test passed.

The terraform plan command is intended to give a preview of what changes Terraform will make when the plan is applied, which is the closest we can get to testing a Terraform configuration without touching the "real" API.
For situations where that isn't enough, it's common to deploy the same config multiple times with different states, thus allowing one to be used as a "staging" environment to test changes without affecting the primary environment. The State Environments feature added in Terraform 0.9 can make this easier, since the multiple environment states can be managed directly with Terraform CLI commands.
When it comes to automated testing of the result, there is currently no full solution to this integrated into Terraform, but there are some building blocks that could be useful to assist in writing tests in a separate programming language.
Terraform produces state files in JSON format that can, in principle, be used by external programs to extract certain data about what Terraform created. While this format is not yet considered officially stable, in practice it changes infrequently enough that people have successfully integrated with it, accepting that they might need to make adjustments as they upgrade Terraform.
What strategy is appropriate here will depend a lot on what exactly you want to test. For example:
In an environment that's spinning up virtual servers, tools like Serverspec can be used to run tests from the perspective of these servers. This can either be run separately from Terraform using some out-of-band process, or as part of the Terraform apply using the remote-exec provisioner. This allows verification of questions like "can the server reach the database?", but is not suitable for questions such as "is the instance's security group restrictive enough?", since robustly checking that requires accessing data from outside of the instance itself.
It's possible to write tests using an existing test framework (such as RSpec for Ruby, unittest for Python, etc) which gather relevant resource ids or addresses from the Terraform state file and then use the relevant platform's SDK to retrieve data about the resources and assert that they are set up as expected. This is a more general form of the previous idea, running the tests from the perspective of a host outside of the infrastructure under test, and can thus collect a broader set of data to make assertions on.
For more modest needs, one can choose to trust that the Terraform state is an accurate representation of reality (a valid assumption in many cases) and simply assert directly on that. This is most appropriate for simple "lint-like" cases, such as verifying that the correct resource tagging scheme is being followed for cost-allocation purposes.
There is some more discussion about this in a relevant Terraform Github issue.
In the latest versions of Terraform it is strongly recommended to use a remote backend for any non-toy application, but that means that the state data is not directly available on local disk. However, a snapshot of it can be retrieved from the remote backend using the terraform state pull command, which prints the JSON-formatted state data to stdout so it can be captured and parsed by a calling program.

We recently open sourced Terratest, our swiss army knife for testing infrastructure code.
Today, you're probably testing all your infrastructure code manually by deploying, validating, and undeploying. Terratest helps you automate this process:
Write tests in Go.
Use helpers in Terratest to execute your real IaC tools (e.g., Terraform, Packer, etc.) to deploy real infrastructure (e.g., servers) in a real environment (e.g., AWS). Note that this environment would be a separate "sandbox" account and not prod!
Use helpers in Terratest to validate that the infrastructure works correctly in that environment by making HTTP requests, API calls, SSH connections, etc.
Use helpers in Terratest to undeploy everything at the end of the test.
Here's an example test for some Terraform code:
terraformOptions := &terraform.Options {
// The path to where your Terraform code is located
TerraformDir: "../examples/terraform-basic-example",
}
// This will run `terraform init` and `terraform apply` and fail the test if there are any errors
terraform.InitAndApply(t, terraformOptions)
// At the end of the test, run `terraform destroy` to clean up any resources that were created
defer terraform.Destroy(t, terraformOptions)
// Run `terraform output` to get the value of an output variable
instanceUrl := terraform.Output(t, terraformOptions, "instance_url")
// Verify that we get back a 200 OK with the expected text
// It can take a minute or so for the Instance to boot up, so retry a few times
expected := "Hello, World"
maxRetries := 15
timeBetweenRetries := 5 * time.Second
http_helper.HttpGetWithRetry(t, instanceUrl, 200, expected, maxRetries, timeBetweenRetries)
These are integration tests, and depending on what you're testing, can take 5 - 50 minutes. It's not fast (though using Docker and test stages, you can speed some things up), and you'll have to work to make the tests reliable, but it is well worth the time.
Check out the Terratest repo for docs and lots of examples of various types of infrastructure code and the corresponding tests for them.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string