How to compare two Terraform plans for differences?

There is a pipeline that generates a Terraform plan and pauses until a manager approves the changes. An undetermined amount of time can pass before approval (one second, three hours, etc.), so the plan executed after approval could differ from the one proposed, for many reasons, such as the infrastructure having been modified manually in the meantime (not intended, but possible).
After approval, the pipeline runs a second terraform plan and compares it with the one generated in the pre-approval stage. The comparison is done with a git diff, and the pipeline fails if there is any difference. This does not work as expected: the plans differ even when generated one right after the other, in a section called relevant_attributes, but the differences are only in the order in which the JSON is emitted, not in the content or effective changes.
The following commands are used to generate the JSON files and compare them:
terraform show -json expected.tfplan | jq . > expected.json
terraform show -json actual.tfplan | jq . > actual.json
git diff --no-index expected.json actual.json || exit 1
Is there a fix for this approach? Alternatively, is there a better way to compare two Terraform plans for differences in this approval scenario?

You can export your plan with -out=PLAN_FILE and then apply that exact plan only when you want.
For example, you can run
terraform plan -out tfplan.zip
To show the plan you can run
terraform show tfplan.zip
Then, when you want to apply it, you just run terraform apply PLAN_FILE:
terraform apply tfplan.zip
Or you can go for other approaches, for example using the approval features of some CI/CD platforms (CircleCI, or GitHub Actions deployments, for example), where you block the apply step and keep your pipeline pending.
Or simply treat your IaC as you would treat your application code: only deploy when merging into certain branches, make your main branches protected, and require approvals from authorized people. When a PR is opened, run a simple terraform plan; when it gets merged, run terraform apply.
Edit:
Apparently these solutions might not fit your use case, so your approach might be better; you just need to sort the resource_changes array with jq in case its order gets shuffled between plans:
terraform show -json expected.tfplan | jq '.resource_changes | sort_by(.address)'
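If the ordering issue also shows up in relevant_attributes (as described in the question), the same idea can be extended before diffing. A rough sketch, assuming the field layout of Terraform's JSON plan output (adjust the sort keys to whatever your Terraform version emits):
# Normalize each plan's JSON so order-sensitive arrays are sorted and only
# real content differences show up in the diff.
normalize() {
  terraform show -json "$1" \
    | jq '.resource_changes |= sort_by(.address)
          | if .relevant_attributes != null
            then .relevant_attributes |= sort_by([.resource, (.attribute | tostring)])
            else . end' \
    > "$2"
}
normalize expected.tfplan expected.json
normalize actual.tfplan actual.json
git diff --no-index expected.json actual.json || exit 1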

Related

Restrict variable grouping being used in Azure DevOps Step

I have two variable groups with overlapping keys but different values. I want to use one group in one task [JSON replace] and the other group in another [JSON replace].
I have tried going through the documentation, and it says that variables can only be set at the root/stage/job levels. Is there a way I can work around this?
I want to use one group in one task [JSON replace] and the other group in another [JSON replace].
According to the document Specify jobs in your pipeline, we can see that:
You can organize your pipeline into jobs. Every pipeline has at least one job. A job is a series of steps that run sequentially as a unit. In other words, a job is the smallest unit of work that can be scheduled to run.
The variable group is added as a preselected condition when the stage is precompiled, so it cannot be re-set at the task level.
To resolve this issue, you can overwrite the specific variable with a logging command:
Write-Host "##vso[task.setvariable variable=testvar;]testvalue"
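For illustration, in a YAML pipeline the override would sit right before the task that should see the second group's value (a rough sketch with hypothetical variable and task names; in a classic release the same PowerShell step is added in the designer):
steps:
- powershell: |
    Write-Host "##vso[task.setvariable variable=testvar]value-for-second-replace"
  displayName: 'Override testvar before the second JSON replace'
# ... the second JSON replace task follows here and sees the overridden value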

Terraform apply in GitLab CI/CD only if plan is not empty

I use GitLab CI/CD to provision infrastructure with Terraform.
I currently have a 3-stage pipeline (init, plan, apply) that works great with a manual apply job.
The plan job shares a plan artifact with the apply job.
Sometimes the plan is empty (no resources to change), but the apply job is still mandatory.
Do you know a way to avoid running the apply job when the plan is empty?
Or to automatically (instead of manually) run the apply job when the plan is empty?
The easiest way to implement that is by using -detailed-exitcode in your terraform plan step. As per Terraform documentation:
-detailed-exitcode Return detailed exit codes when the command exits. This
will change the meaning of exit codes to:
0 - Succeeded, diff is empty (no changes)
1 - Errored
2 - Succeeded, there is a diff
As an example, run the plan in your terminal:
terraform plan -detailed-exitcode
And then run:
echo $?
If the plan contains any changes, the value won't be zero.
You just need to evaluate that value in your CI/CD with an if statement to decide whether to run the apply or not.
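For example, a shell wrapper along these lines could drive the decision (a sketch, not GitLab-specific; note that the non-zero exit code 2 must not be allowed to fail the script):
# Capture the exit code without letting "2" (changes present) fail the job.
terraform plan -detailed-exitcode -out planoutput || exit_code=$?
exit_code=${exit_code:-0}

case "$exit_code" in
  0) echo "No changes - skipping apply" ;;
  2) echo "Changes detected - running apply"
     terraform apply -auto-approve planoutput ;;
  *) echo "terraform plan failed" >&2
     exit "$exit_code" ;;
esac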
In your plan job, you output your plan file:
terraform plan --out planoutput
which you pass as an artifact to your apply job.
Before applying, you can grep your plan (I'll let you find the correct grep command here):
terraform show planoutput | grep Plan
and, depending on that, do a terraform apply or not.
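One way that grep check could look (a sketch; the exact wording of the summary line may vary between Terraform versions):
if terraform show planoutput | grep -q "Plan:"; then
  terraform apply -auto-approve planoutput
else
  echo "No changes in plan - skipping apply"
fi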

How to access the output folder from a PythonScriptStep?

I'm new to Azure ML and have been tasked with writing some integration tests for a couple of pipeline steps. I have prepared some input test data and some expected output data, which I store on a 'test_datastore'. The following example code is a simplified version of what I want to do:
from azureml.core import Workspace, Datastore
from azureml.data.data_reference import DataReference
from azureml.pipeline.steps import PythonScriptStep

ws = Workspace.from_config('blabla/config.json')
ds = Datastore.get(ws, datastore_name='test_datastore')

main_ref = DataReference(datastore=ds,
                         data_reference_name='main_ref')
data_ref = DataReference(datastore=ds,
                         data_reference_name='data_ref',
                         path_on_datastore='/data')

# arbitrary_run_config is assumed to be defined elsewhere
data_prep_step = PythonScriptStep(
    name='data_prep',
    script_name='pipeline_steps/data_prep.py',
    source_directory='/.',
    arguments=['--main_path', main_ref,
               '--data_ref_folder', data_ref],
    inputs=[main_ref, data_ref],
    outputs=[data_ref],
    runconfig=arbitrary_run_config,
    allow_reuse=False
)
I would like:
my data_prep_step to run,
have it store some data on the path of my data_ref, and
then access this stored data afterwards, outside of the pipeline.
But I can't find a useful function in the documentation. Any guidance would be much appreciated.
Two big ideas here -- let's start with the main one.
main ask
With an Azure ML Pipeline, how can I access the output data of a PythonScriptStep outside of the context of the pipeline?
short answer
Consider using OutputFileDatasetConfig (docs example) instead of DataReference.
For your example above, I would just change your last two definitions:
from azureml.data import OutputFileDatasetConfig

data_ref = OutputFileDatasetConfig(
    name='data_ref',
    destination=(ds, '/data')
).as_upload()

data_prep_step = PythonScriptStep(
    name='data_prep',
    script_name='pipeline_steps/data_prep.py',
    source_directory='/.',
    arguments=[
        '--main_path', main_ref,
        '--data_ref_folder', data_ref
    ],
    inputs=[main_ref],  # data_ref is now an output config, so it is no longer an input
    outputs=[data_ref],
    runconfig=arbitrary_run_config,
    allow_reuse=False
)
Some notes:
Be sure to check out how DataPaths work; they can be tricky at first glance.
Set overwrite=False in the .as_upload() method if you don't want future runs to overwrite the first run's data.
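Once the pipeline has finished, the uploaded output lives on the datastore at /data, so outside the pipeline it can be read back as a normal file dataset. A rough sketch, reusing the workspace and datastore from the question:
from azureml.core import Workspace, Datastore, Dataset

ws = Workspace.from_config('blabla/config.json')
ds = Datastore.get(ws, datastore_name='test_datastore')

# The step's output was uploaded to /data on test_datastore, so load it back
# as a file dataset and download it (or mount it) locally for inspection.
output_files = Dataset.File.from_files(path=(ds, '/data'))
local_paths = output_files.download(target_path='./pipeline_output', overwrite=True)
print(local_paths)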
more context
PipelineData used to be the de facto object for passing data ephemerally between pipeline steps. The idea was to make it easy to:
stitch steps together
get the data after the pipeline runs if need be (datastore/azureml/{run_id}/data_ref)
The downside was that you had no control over where the data was saved. If you wanted the data to be more than just a baton passed between steps, you had to add a DataTransferStep to land the PipelineData wherever you please after the PythonScriptStep finishes.
This downside is what motivated OutputFileDatasetConfig.
auxiliary ask
How might I programmatically test the functionality of my Azure ML pipeline?
There are not enough people talking about data pipeline testing, IMHO.
There are three areas of data pipeline testing:
unit testing (does the code in the step work?)
integration testing (does the code work when submitted to the Azure ML service?)
data expectation testing (does the data coming out of the step meet my expectations?)
For #1, I think it should be done outside of the pipeline, perhaps as part of a package of helper functions.
For #2, why not just see if the whole pipeline completes? I think you get more information that way. That's how we run our CI.
#3 is the juiciest, and we do this in our pipelines with the Great Expectations (GE) Python library. The GE community calls these "expectation tests". To me, you have two options for including expectation tests in your Azure ML pipeline:
within the PythonScriptStep itself, i.e.
run whatever code you have
test the outputs with GE before writing them out; or,
for each functional PythonScriptStep, hang a downstream PythonScriptStep off of it in which you run your expectations against the output data.
Our team does #1, but either strategy should work. What's great about this approach is that you can run your expectation tests by just running your pipeline (which also makes integration testing easy).
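As a rough illustration of the first option (a sketch with hypothetical column names, using GE's classic pandas-dataset style API; newer GE versions may prefer the validator API):
import great_expectations as ge
import pandas as pd

def validate_and_write(df: pd.DataFrame, output_path: str) -> None:
    # Wrap the frame so expectation methods are available on it.
    gdf = ge.from_pandas(df)
    results = [
        gdf.expect_column_values_to_not_be_null("id"),
        gdf.expect_column_values_to_be_between("score", min_value=0, max_value=1),
    ]
    if not all(r.success for r in results):
        raise ValueError(f"Expectation tests failed: {results}")
    # Only write the output once the expectations pass.
    df.to_parquet(output_path)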

Azure DevOps Releases skip tasks

I'm currently working on implementing CI/CD pipelines for my company in Azure DevOps 2020 (on-premises). There is one requirement I just can't seem to solve conveniently: skipping certain tasks depending on user input in a release pipeline.
What I want:
The user creates a new release manually and decides whether a task group should be executed.
Agent Tasks:
1. Powershell
2. Task Group (conditional)
3. Task Group
4. Powershell
What I tried:
Splitting the tasks into multiple jobs, with the task group depending on a manual intervention task.
Does not work: if the manual intervention is rejected, the whole execution stops as failed.
Splitting the tasks into multiple stages, doing almost the same as above, with the same outcome.
Splitting the tasks into multiple stages and triggering every stage manually.
Not very usable, because you have to execute what you want in the correct order and only after the previous stages have succeeded.
A variable set at release creation (true/false).
I will use that if nothing better comes up, but it is kind of prone to typos and not very usable for the colleagues who will use this. Unfortunately, Azure DevOps does not seem to support dropdown or checkbox variables for releases (but it works with parameters in builds).
Two stages, one with tasks 1, 2, 3, 4 and one with tasks 1, 3, 4.
Not very desirable for me because of the duplication.
Any help would be highly appreciated!
It depends on what the criteria are for the pipelines to run. One recommendation would be two pipelines calling the same template, where each pipeline has a true/false embedded in it to pass as a parameter to the template.
The template will have all the tasks defined in it; however, the conditional one will have a condition like:
condition: and(succeeded(), eq('${{ parameters.runExtraStep}}', true))
This condition would be set at the task level.
Any specific triggers can be defined in the corresponding pipeline.
Here is the documentation on Azure YAML Templates to get you started.
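A minimal sketch of that layout (file names and step contents are just placeholders):
# templates/deploy-steps.yml
parameters:
- name: runExtraStep
  type: boolean
  default: false

steps:
- powershell: ./scripts/step1.ps1
- powershell: ./scripts/extra-step.ps1
  condition: and(succeeded(), eq('${{ parameters.runExtraStep }}', true))
- powershell: ./scripts/step3.ps1

# azure-pipelines-with-extra.yml (the other pipeline passes false)
steps:
- template: templates/deploy-steps.yml
  parameters:
    runExtraStep: true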
Unfortunately, it's not possible to add a custom condition for a task group, but this feature is on the roadmap. Check the following user voice item, which you can vote for:
https://developercommunity.visualstudio.com/idea/365689/task-group-custom-conditions-at-group-and-task-lev.html
The workaround is to clone the release definition (right-click a release definition > Clone), then remove some tasks or task groups and save it; after that, you can create a release from the corresponding release definition for each scenario.
Finally, I decided to stick with Releases and split my tasks into 3 agent jobs: job 1 with the first PowerShell task, job 2 with the conditional task group that executes only if a variable is true, and job 3 with the remaining tasks.
As both cece-dong and dreadedfrost stated, I could have achieved a selectable runtime parameter for the condition with YAML pipelines. Unfortunately, one of the task groups needs a specific artifact from a YAML pipeline. Most of the time it would be the "latest", which can easily be achieved with a download-artifacts task, but sometimes a previous artifact gets chosen. I have found no easy way to achieve this as conveniently as in releases, where you get a dropdown with a list of artifacts by default.
I found this blog post for anyone interested in how to handle different build artifacts in YAML pipelines.
Thanks for helping me out!

Azure DevOps Pipeline Test results contain Duplicate Test Cases

If I go to the Test results screen after a run of my pipeline, it shows each test case from my Java/Maven/TestNG automated test project duplicated. One instance of each test case shows a blank machine name, and the duplicate shows a machine name.
Run 1000122 - JUnit_TestResults_3662
There are several possibilities. First, did you add multiple configurations to your test plan? If so, the test cases will be repeated in the plan with each of the configurations you have assigned.
Another possibility is that when you passed parameters to the test method, you used multiple parameter sets, so the test method was executed multiple times.
The information you provided is not sufficient. Could you share the code or screenshots of your test samples?
