I'm trying to decide whether to use Ansible or Terraform for a project. This project is doing scientific computations on cloud servers. I want to be able to say "here's an input file, go grab a computer with 16vCPUs. When everything's done, copy the output file back and destroy the resource". Is there a way for me to tell terraform "destroy/take action on this resource when you detect a command is done or some api is hit"?
Is there a way for me to tell terraform "destroy/take action on this
resource when you detect a command is done or some api is hit"?
If it's detectable via API hit then you can use local-exec Provisioner and use curl to hit the api endpoint in loop (in a wrapping shell script), polling the status of the API to deduce if the task is done or not. If you do so, make sure to give appropriate timeout value to this provisioner.
For the cleanup sequence (post API status is affirmative), declare the dependency explicitly to ensure Terraform is able to follow the execution accordingly.
Having explained all this, your end to end objective is best suited with a combination of IaC + Config mgmt tool. If I had to do it, I'd have preferred to use Terraform + Ansible combination in following outline
A shell script invoking Terraform to provision the 16vCPUs resource with all necessary depdendency
From within Terraform, invoke local-exec provisioner to launch ansible for appropriately configuring the instance, copying the file and launching remote execution etc.
Let the control come back to shell script which will start polling the status of the task based on API polling using curl
As and when shell script detects the task is done, it'd have copied the remote file locally (this step can be made better but that's a topic for another discussion)
Invoke Terraform again, with destroy this time, to tear down entire environment.
I'd further recommend to have it tied up in a jenkins workflow (if you are using jenkins already) giving control to other user's and still having full visibility and auditing in place.
Related
I have a module that creates some VMs (along with a VApp) and I need to enable users of that module to pass scripts that would get run on creation and/or deletion of each VM. For example, to enroll the VM into a company authentication system, CMDB, monitoring system or other.
And I just cannot find a way to implement it nicely.
There are several approaches that I've tried:
Passing a script that gets passed directly to commands inside provisioner blocks on each VM resource.
Advantages of this approach:
Clean and simple - there are no additional resources.
Works as intended - runs on creation and/or deletion.
The problems with this approach:
You can only pass one script per VM.
You could concatenate them together, but that has obvious drawbacks like the failure in one is then a failure in everything leaving things in an inconsistent state. Retrying gets pretty messy.
It's somewhat ugly since if the command is null, Terraform crashes.
As a workaround, you could pass a one-liner containing only a comment about why it's there but that doesn't exactly make it pretty.
Creating a separate null_resource for each VM-script combination and putting the script there.
Advantages of this approach:
There is a separate resource for each script for each VM that if failed could be retried.
The problems with this approach:
Problems arise when updating the scripts.
Creation script updates can be ignored with a lifecycle ignore statement since if you're changing the enroll procedure after a machine has already been succesfully enrolled, you don't need to enroll it again.
But changing a when = destroy script results in the resource being recreated. Meaning it being destroyed and created with the new version of a script. Which in turn triggers the run of the old script which is very unlikely to be desirable.
Creating "runner" null_resources that only contain the name of a script to run. The script and its content is managed through a set of local_file resources.
Advantages of this approach:
Script content is separated from script execution.
The problems with this approach:
This falls apart in ephemeral environments (like running Terraform in a new container each time) where the files would have to be created each time. Maybe this can be worked around though seems it might be difficult because depends_on fails if something other than a static resource is used - fails when trying to use resource subscripts.
Additional considerations:
Using mechanisms of pushing those scripts to VMs and having them run there is undesirable because:
Additional risk of having to handle secrets passed to a lot of machines.
There is the assumption that the machines can also reach the APIs that is not always true.
The hack of putting information from other resources in triggers (because a when = destroy provisioner can only use references to self) is a bit worrisome because relying on it means everything will break completely if Terraform developers decide to remove it entirely.
I'm new to this terraform world and I've been assigned into the task of creating many configurations to azure with it.
I'm developing a main.tf script (which creates some resources, like resource group, vnets, kubernetes cluster, app services, etc.) and while coding it and executing
Terraform apply, it seems to only apply what changed doing in fact updates.
Then we deleted the resource group the script created and a colegue of mine had to run the same script with terraform creating a resource group with another name since i didn't had a required permission, after that, if i run the command Terraform apply it fails and gives errors, that say that the resource cannot be created because it already exists.
After reading some documentation i found that it might be because of the state
https://www.terraform.io/docs/state/index.html
Is the update of a script something that only works for each session of terraform?
Even doing a Terraform refresh doesn't seem to work.
Or probably I'm just mistaking and there is no way to update some resources.
EDIT: for some reason the state file that was on the storage only had a few things, the solution was to delete everything and create again.
For the new resources, there is nothing more, the Terraform script helps you create the resources you set in the script.
For the existing resources, when you make changes in the script that you already deployed via the Terraform, then it will check the state file to make sure what changes the resources should update. If there is no state file ( or you delete it), then it will deploy the Terraform script directly, but if any resources you want to deploy already exists, then it will fail due to the existing resources. And the command terraform refresh just updates the last state of the resources in the Terraform script that you already deployed. If the deployment failed and the state file has no resources in it, then refresh is not useful.
If someone else ran terraform apply for you because you didn't have access, and now you want to modify that terraform and run it yourself, you need to get the state file that was generated when that other person ran it. You absolutely have to maintain the Terraform state file somewhere, so that it can be accessed on subsequent runs. You should really configure a Terraform backend, instead of using local state files.
You need to be aware that Terraform stores everything it does in the state file, and refers to that file before every run. A terraform refresh only tells Terraform to refresh the state of the things that are in the state file, it doesn't rebuild the state file from scratch. Understanding Terraform state files is so fundamental to the use of Terraform that you really need to understand this before using it.
Is it possible to fail an aws_instance creation if script passed into user_data fails to run? e.g., with exit 1?
I have a null_resource that uses depends_on [aws_instance.myVM] to post a JSON to the instance, and I need that depends_on to fail when the user_data script fails.
Thanks!
user_data is handled by software running inside the instance itself, such as cloud-init, and so its processing is asynchronous from the ec2:RunInstances call that Terraform's AWS provider makes to start the instance running. There is therefore no way to feed back status information from the user_data handling because it could potentially be running some time after the EC2 API starts reporting that the instance is "running", depending on where in the boot process it's dealt with.
Also, depends_on is for ordering rather than for handling errors, so a depends_on clause will never change anything about Terraform's error handling.
If you want to run software on your instance synchronously as part of Terraform's create operation then unfortunately the only practical option is to use the remote-exec provisioners. This comes at the expense of some considerable extra complexity, because Terraform must now be able to open an SSH session with the instance and authenticate with it to create a two-way communications channel.
In return for that complexity, Terraform can be the one to run the code in question and so Terraform can detect whether it succeeded or not (using its exit status). If the remote command fails, Terraform will halt further processing, return an error, and mark the instance as tainted so that the next plan will attempt to create it again.
With that said, it may be better to find a different way to achieve your goal that doesn't require coupling the Terraform run directly to the state of the software running in the virtual machine. It's not usually expected that a Terraform configuration should need to both start up a virtual machine and interact with software running in that virtual machine in the same operation. This could be a good point for some system decomposition, where you'd have one Terraform configuration that declares that the virtual machine should exist and then a second configuration that assumes that the virtual machine exists and takes actions against it. You can then run whatever other software you need to run in between those two, in order to check whether the instance started up successfully.
One not so obvious way I am doing this is I register the instance as a consul service at the end of user data. If the service never arrives, downstream polling for arrival of the consul service can also timeout and fail. This could be located in another local exec function which could be used as a dependency, or it could be in other instance user data as well.
Perhaps a better way would be to use the consul KV store though, and not to just rely on service arrival. The KV store could be used to provide more information than just a binary state.
I want to run a script using terraform inside an existing instance on any cloud which is pre-created .The instance was created manually , is there any way to push my script to this instance and run it using terraform ?
if yes ,then How can i connect to the instance using terraform and push my script and run it ?
I believe ansible is a better option to achieve this easily.
Refer the example give here -
https://docs.ansible.com/ansible/latest/modules/script_module.html
Create a .tf file and describe your already existing resource (e.g. VM) there
Import existing thing using terraform import
If this is a VM then add your script to remote machine using file provisioner and run it using remote-exec - both steps are described in Terraform file, no manual changes needed
Run terraform plan to see if expected changes are ok, then terraform apply if plan was fine
Terraform's core mission is to create, update, and destroy long-lived infrastructure objects. It is not generally concerned with the software running in the compute instances it deploys. Instead, it generally expects each object it is deploying to behave as a sort of specialized "appliance", either by being a managed service provided by your cloud vendor or because you've prepared your own machine image outside of Terraform that is designed to launch the relevant workload immediately when the system boots. Terraform then just provides the system with any configuration information required to find and interact with the surrounding infrastructure.
A less-ideal way to work with Terraform is to use its provisioners feature to do late customization of an image just after it's created, but that's considered to be a last resort because Terraform's lifecycle is not designed to include strong support for such a workflow, and it will tend to require a lot more coupling between your main system and its orchestration layer.
Terraform has no mechanism intended for pushing arbitrary files into existing virtual machines. If your virtual machines need ongoing configuration maintenence after they've been created (by Terraform or otherwise) then that's a use-case for traditional configuration management software such as Ansible, Chef, Puppet, etc, rather than for Terraform.
I'm creating cloud resources using Terraform. Each resource is expected to be in a particular desired state after provisioning. For example, when I create a Google Cloud Bucket, I would like certain permissions to be applied automatically. So, my plan contains necessary code for this but I wanted to make sure that this works all the time regardless before I apply. Is there any testing tool/library that can help here?
Yes, I had the same thinking before. Currently, I use several ways to reduce the risk when I apply a new terraform change.
They can't guarantee a 100% successful terraform apply, but will fix the most issues before you apply it.
Validate terraform configuration files.
Terraform has the validate function for starting. But it is not smart enough to go through subfolders. I create a small shell function and add in CI/CD pipeline to run it automatically before terraform apply.
validate() {
modules=$(find . -type f -name "*.tf" -exec dirname {} \;|sort -u)
for m in ${modules}
do
(terraform validate "$m" && echo "√ $m") || exit 1
done
}
Of course, do terraform fmt before you submit your change is not bad idea.
terraform plan
#Martin Atkins explained it already, and terraform.io has details about this command.
run automation test kitchen.
That's a test Kitchen plugin for testing Terraform configurations
https://github.com/newcontext-oss/kitchen-terraform
That's an integration test. The test will run in separate VPC with as more as test cases you added. Add the automation test in CI/CD pipeline as well to trigger an automation test every time when you raise merge request to master branch. Apply the change only after getting the test passed.
The terraform plan command is intended to give a preview of what changes Terraform will make when the plan is applied, which is the closest we can get to testing a Terraform configuration without touching the "real" API.
For situations where that isn't enough, it's common to deploy the same config multiple times with different states, thus allowing one to be used as a "staging" environment to test changes without affecting the primary environment. The State Environments feature added in Terraform 0.9 can make this easier, since the multiple environment states can be managed directly with Terraform CLI commands.
When it comes to automated testing of the result, there is currently no full solution to this integrated into Terraform, but there are some building blocks that could be useful to assist in writing tests in a separate programming language.
Terraform produces state files in JSON format that can, in principle, be used by external programs to extract certain data about what Terraform created. While this format is not yet considered officially stable, in practice it changes infrequently enough that people have successfully integrated with it, accepting that they might need to make adjustments as they upgrade Terraform.
What strategy is appropriate here will depend a lot on what exactly you want to test. For example:
In an environment that's spinning up virtual servers, tools like Serverspec can be used to run tests from the perspective of these servers. This can either be run separately from Terraform using some out-of-band process, or as part of the Terraform apply using the remote-exec provisioner. This allows verification of questions like "can the server reach the database?", but is not suitable for questions such as "is the instance's security group restrictive enough?", since robustly checking that requires accessing data from outside of the instance itself.
It's possible to write tests using an existing test framework (such as RSpec for Ruby, unittest for Python, etc) which gather relevant resource ids or addresses from the Terraform state file and then use the relevant platform's SDK to retrieve data about the resources and assert that they are set up as expected. This is a more general form of the previous idea, running the tests from the perspective of a host outside of the infrastructure under test, and can thus collect a broader set of data to make assertions on.
For more modest needs, one can choose to trust that the Terraform state is an accurate representation of reality (a valid assumption in many cases) and simply assert directly on that. This is most appropriate for simple "lint-like" cases, such as verifying that the correct resource tagging scheme is being followed for cost-allocation purposes.
There is some more discussion about this in a relevant Terraform Github issue.
In the latest versions of Terraform it is strongly recommended to use a remote backend for any non-toy application, but that means that the state data is not directly available on local disk. However, a snapshot of it can be retrieved from the remote backend using the terraform state pull command, which prints the JSON-formatted state data to stdout so it can be captured and parsed by a calling program.
We recently open sourced Terratest, our swiss army knife for testing infrastructure code.
Today, you're probably testing all your infrastructure code manually by deploying, validating, and undeploying. Terratest helps you automate this process:
Write tests in Go.
Use helpers in Terratest to execute your real IaC tools (e.g., Terraform, Packer, etc.) to deploy real infrastructure (e.g., servers) in a real environment (e.g., AWS). Note that this environment would be a separate "sandbox" account and not prod!
Use helpers in Terratest to validate that the infrastructure works correctly in that environment by making HTTP requests, API calls, SSH connections, etc.
Use helpers in Terratest to undeploy everything at the end of the test.
Here's an example test for some Terraform code:
terraformOptions := &terraform.Options {
// The path to where your Terraform code is located
TerraformDir: "../examples/terraform-basic-example",
}
// This will run `terraform init` and `terraform apply` and fail the test if there are any errors
terraform.InitAndApply(t, terraformOptions)
// At the end of the test, run `terraform destroy` to clean up any resources that were created
defer terraform.Destroy(t, terraformOptions)
// Run `terraform output` to get the value of an output variable
instanceUrl := terraform.Output(t, terraformOptions, "instance_url")
// Verify that we get back a 200 OK with the expected text
// It can take a minute or so for the Instance to boot up, so retry a few times
expected := "Hello, World"
maxRetries := 15
timeBetweenRetries := 5 * time.Second
http_helper.HttpGetWithRetry(t, instanceUrl, 200, expected, maxRetries, timeBetweenRetries)
These are integration tests, and depending on what you're testing, can take 5 - 50 minutes. It's not fast (though using Docker and test stages, you can speed some things up), and you'll have to work to make the tests reliable, but it is well worth the time.
Check out the Terratest repo for docs and lots of examples of various types of infrastructure code and the corresponding tests for them.