Managing an Auto Scaling Group via Terraform

Let's say I have an Auto Scaling group that I manage via Terraform, and I want that Auto Scaling group to scale up and scale down based on our business hours.
The TF template for managing the ASG:
resource "aws_autoscaling_group" "foobar" {
availability_zones = ["us-west-2a"]
name = "terraform-test-foobar5"
max_size = 1
min_size = 1
health_check_grace_period = 300
health_check_type = "ELB"
force_delete = true
termination_policies = ["OldestInstance"]
}
resource "aws_autoscaling_schedule" "foobar" {
scheduled_action_name = "foobar"
min_size = 0
max_size = 1
desired_capacity = 0
start_time = "2016-12-11T18:00:00Z"
end_time = "2016-12-12T06:00:00Z"
autoscaling_group_name = aws_autoscaling_group.foobar.name
}
As you can see, I have to set a particular date and time for the action.
What I want is: scale down by 10% of my current capacity on Saturday night at 9 PM, and scale back up by 10% on Monday morning at 6 AM.
How can I achieve this? Any help is highly appreciated.

The solution is not straightforward, but it is doable. The required steps are:
create a Lambda function that scales down the ASG (e.g. with Boto3 and Python)
assign an IAM role with the right permissions
create a cron trigger for "every Saturday 9 PM" with aws_cloudwatch_event_rule
create an aws_cloudwatch_event_target that wires the previously created cron trigger to the Lambda function (see the sketch after this list)
repeat for scaling up
This module will probably fit your needs; you just have to code the Lambda and use the module to trigger it on a schedule.
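A minimal sketch of steps 3 and 4, assuming the Lambda from step 1 is declared as aws_lambda_function.asg_scale_down (the names and schedule here are illustrative, not from the original answer):
resource "aws_cloudwatch_event_rule" "scale_down_saturday" {
  name                = "asg-scale-down-saturday"
  schedule_expression = "cron(0 21 ? * SAT *)" # every Saturday at 21:00 UTC
}

resource "aws_cloudwatch_event_target" "scale_down_saturday" {
  rule = aws_cloudwatch_event_rule.scale_down_saturday.name
  arn  = aws_lambda_function.asg_scale_down.arn # hypothetical Lambda from step 1
}

# CloudWatch Events also needs permission to invoke the function.
resource "aws_lambda_permission" "allow_cloudwatch" {
  statement_id  = "AllowExecutionFromCloudWatchEvents"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.asg_scale_down.function_name
  principal     = "events.amazonaws.com"
  source_arn    = aws_cloudwatch_event_rule.scale_down_saturday.arn
}
The scale-up counterpart on Monday morning would mirror this with cron(0 6 ? * MON *).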


The "count" value depends on resource attributes that cannot be determined until apply, so Terraform cannot predict how many instances will be created

I want to exempt certain policies for an Azure VM. I have the following Terraform code to exempt the policies.
It uses locals to identify the scope at which policies should be exempted.
locals {
  exemption_scope = try({
    mg       = length(regexall("(\\/managementGroups\\/)", var.scope)) > 0 ? 1 : 0,
    sub      = length(split("/", var.scope)) == 3 ? 1 : 0,
    rg       = length(regexall("(\\/managementGroups\\/)", var.scope)) < 1 ? length(split("/", var.scope)) == 5 ? 1 : 0 : 0,
    resource = length(split("/", var.scope)) >= 6 ? 1 : 0,
  })

  expires_on = var.expires_on != null ? "${var.expires_on}T23:00:00Z" : null
  metadata   = var.metadata != null ? jsonencode(var.metadata) : null

  # generate reference Ids when unknown, assumes the set was created with the initiative module
  policy_definition_reference_ids = length(var.member_definition_names) > 0 ? [for name in var.member_definition_names :
    replace(substr(title(replace(name, "/-|_|\\s/", " ")), 0, 64), "/\\s/", "")
  ] : var.policy_definition_reference_ids

  exemption_id = try(
    azurerm_management_group_policy_exemption.management_group_exemption[0].id,
    azurerm_subscription_policy_exemption.subscription_exemption[0].id,
    azurerm_resource_group_policy_exemption.resource_group_exemption[0].id,
    azurerm_resource_policy_exemption.resource_exemption[0].id,
  "")
}
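For reference, this is how the branches in exemption_scope line up with the shapes of Azure scope IDs (the IDs below are hypothetical, for illustration only):
# Illustrative scope strings and the branch they select:
#   "/subscriptions/0000"                                      -> 3 "/"-segments      -> sub
#   "/subscriptions/0000/resourceGroups/rg1"                   -> 5 segments          -> rg
#   "/providers/Microsoft.Management/managementGroups/mg1"     -> has /managementGroups/ -> mg
#   "/subscriptions/0000/resourceGroups/rg1/providers/.../vm1" -> 6 or more segments  -> resource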
The above locals are used as shown below:
resource "azurerm_management_group_policy_exemption" "management_group_exemption" {
count = local.exemption_scope.mg
name = var.name
display_name = var.display_name
description = var.description
management_group_id = var.scope
policy_assignment_id = var.policy_assignment_id
exemption_category = var.exemption_category
expires_on = local.expires_on
policy_definition_reference_ids = local.policy_definition_reference_ids
metadata = local.metadata
}
Both the locals and the azurerm_management_group_policy_exemption resource are part of the same module file, and the policy exemption is applied as shown below:
module "exemption_jumpbox_sql_vulnerability_assessment" {
  count                           = var.enable_jumpbox == true ? 1 : 0
  source                          = "../policy_exemption"
  name                            = "Exemption - SQL servers on machines should have vulnerability"
  display_name                    = "Exemption - SQL servers on machines should have vulnerability"
  description                     = "Not required for Jumpbox"
  scope                           = module.create_jumbox_vm[0].virtual_machine_id
  policy_assignment_id            = module.security_center.azurerm_subscription_policy_assignment_id
  policy_definition_reference_ids = var.exemption_policy_definition_ids
  exemption_category              = "Waiver"
  depends_on                      = [module.create_jumbox_vm, module.security_center]
}
It works for an existing Azure VM. However, it throws the following error while trying to provision the Azure VM and apply the policy exemption to it.
Ideally, module.exemption_jumpbox_sql_vulnerability_assessment should get executed only after module.create_jumbox_vm, as it is declared as a dependency. But I am not sure why it is throwing the error:
│ The "count" value depends on resource attributes that cannot be determined
│ until apply, so Terraform cannot predict how many instances will be
│ created. To work around this, use the -target argument to first apply only
│ the resources that the count depends on.
I tried to reproduce the scenario in my environment.
resource "azurerm_management_group_policy_exemption" "management_group_exemption" {
count = local.exemption_scope.mg
name = var.name
display_name = var.display_name
description = var.description
management_group_id = var.scope
policy_assignment_id = var.policy_assignment_id
exemption_category = var.exemption_category
expires_on = local.expires_on
policy_definition_reference_ids = local.policy_definition_reference_ids
metadata = local.metadata
}
locals {
exemption_scope = try({
...
})
Received the same error:
│ The "count" value depends on resource attributes that cannot be determined
│ until apply, so Terraform cannot predict how many instances will be
│ created. To work around this, use the -target argument to first apply only
│ the resources that the count depends on.
Referring to local values, the values will be known only at apply time, not at plan time. So if the count did not depend on other resources, the policies would be exempted; but here it depends on the VM, which may still be in the process of being created.
So first apply only the resource that the count depends on (for example terraform apply -target=module.create_jumbox_vm, followed by a full terraform apply), as the exemption policy can be assigned to the VM only once the VM has been created.
Check count: using-expressions-in-count | Terraform | HashiCorp Developer
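An alternative workaround, as a hedged sketch rather than part of the original answer: drive the count from a value that is already known at plan time, for example a hypothetical enable_exemption flag, instead of deriving it from the VM's ID:
variable "enable_exemption" {
  type    = bool
  default = false # flip to true once the VM exists (hypothetical flag, not in the original module)
}

resource "azurerm_management_group_policy_exemption" "management_group_exemption" {
  count = var.enable_exemption ? 1 : 0 # known at plan time, unlike the VM-derived scope
  # ... same arguments as in the module above ...
}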
Also note that when using the Terraform count argument with Azure Virtual Machines, a NIC resource also has to be created for each Virtual Machine resource.
resource "azurerm_network_interface" "nic" {
count = var.vm_count
name = "${var.vm_name_pfx}-${count.index}-nic"
location = data.azurerm_resource_group.example.location
resource_group_name = data.azurerm_resource_group.example.name
//tags = var.tags
ip_configuration {
name = "internal"
subnet_id = azurerm_subnet.internal.id
private_ip_address_allocation = "Dynamic"
}
}
Reference: terraform-azurerm-policy-exemptions/examples/count at main · AnsumanBal-MT/terraform-azurerm-policy-exemptions · GitHub

Setup Cloudwatch Alarm with Terraform that uses a query-expression

My goal is to set up an alarm in CloudWatch via Terraform that fires when disk_usage is above a certain threshold. The monitored metrics come from a non-AWS server and are collected via the CloudWatch Agent.
My first step was to do this manually, by setting up a metric that selects the maximum disk_used_percent of all devices on a selected host:
SELECT MAX(disk_used_percent) FROM CWAgent WHERE host = 'MY_HOST'
I then successfully created an alarm based on this metric. Now I want to do the same thing with Terraform, but I can't figure out how.
If I set up the Terraform resource to use a dimension for the host, then I get no results. If I try to set up a metric query, then I get a conflict between Terraform and AWS, where Terraform tells me that my resource should not declare a "period" attribute but AWS demands it and fails if it is not provided:
Error: Updating metric alarm failed: ValidationError: Period must not
be null
Currently, my resource looks like this:
resource "aws_cloudwatch_metric_alarm" "disk_usage_alarm" {
alarm_name = "Disk usage alarm on MY_HOST"
alarm_description = "One or more disks on MY_HOST are over 65% capacity"
comparison_operator = "GreaterThanOrEqualToThreshold"
threshold = "65"
evaluation_periods = "2"
datapoints_to_alarm = "1"
treat_missing_data = "missing"
actions_enabled = "false"
insufficient_data_actions = []
alarm_actions = []
ok_actions = []
metric_query {
id = "q1"
label = "Maximum disk_used_percentage for all disks on Host MY_HOST"
return_data = true
expression = "SELECT MAX(disk_used_percent) FROM CWAgent WHERE host = 'MY_HOST'"
}
}
Does anyone know what's wrong here and how to correctly set up this alarm via Terraform?
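One hedged note rather than a confirmed fix: newer releases of the AWS provider document an optional period argument inside the metric_query block itself for Metrics Insights (SELECT ...) expressions, which may satisfy the "Period must not be null" validation without the top-level period conflict:
metric_query {
  id          = "q1"
  label       = "Maximum disk_used_percentage for all disks on Host MY_HOST"
  return_data = true
  expression  = "SELECT MAX(disk_used_percent) FROM CWAgent WHERE host = 'MY_HOST'"
  period      = 300 # assumption: requires a provider version that supports period on metric_query
}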

Terraform - Create or not create resources based on conditions

I need my resources to be created only in specified environments. For example, if I have an AWS Lambda function that is not ready for production, I need it to exist only in the development environment. Is there a nice way to do this? I know that it's possible to set count to 0, but I'm not sure how to cascade this decision to other resources.
For example, I have a resource for an AWS Lambda function whose count is set to 0 in production:
resource "aws_lambda_function" "example_lambda" {
  count = local.is_production ? 0 : 1
}
How do I cascade this decision to other resources that depend on the AWS Lambda function above?
And let's say I have an S3 bucket which will invoke the Lambda function:
resource "aws_s3_bucket" "example_bucket" {
bucket = "bucket_name"
}
resource "aws_lambda_permission" "example_bucket_etl" {
statement_id = "AllowExecutionFromS3Bucket"
action = "lambda:InvokeFunction"
function_name = aws_lambda_function.example_lambda.arn
principal = "s3.amazonaws.com"
source_arn = aws_s3_bucket.example_bucket.arn
}
resource "aws_s3_bucket_notification" "bucket_notification" {
bucket = aws_s3_bucket.example_bucket.id
lambda_function {
lambda_function_arn = aws_lambda_function.example_lambda.arn
events = ["s3:ObjectCreated:*"]
filter_prefix = "example_bucket/"
filter_suffix = ".txt"
lambda_function {
lambda_function_arn = aws_lambda_function.another_lambda_function.arn
events = ["s3:ObjectCreated:*"]
filter_prefix = "another_example_bucket/"
filter_suffix = ".txt"
}
}
You can use the same count variable on multiple resources. A nicer and clearer way would be to put all the resources into a module, if that is possible in your code. https://www.terraform.io/docs/language/meta-arguments/count.html
When you use count in a resource block, that makes Terraform treat references to that resource elsewhere as producing a list of objects representing each of the instances of that resource.
Since that value is just a normal list value, you can take its length in order to concisely write down what is essentially the statement "there should be one Y for each X", or in your specific case "there should be one lambda permission for each lambda function".
For example:
resource "aws_lambda_function" "example" {
count = local.is_production ? 0 : 1
# ...
}
resource "aws_lambda_permission" "example_bucket_etl" {
count = length(aws_lambda_function.example)
function_name = aws_lambda_function.example[count.index].name
# ...
}
Inside the aws_lambda_permission configuration we first set the count to be whatever is the count of the aws_lambda_function.example, which tells Terraform that we intend for the counts of these to always match. That connection helps Terraform understand how to resolve situations where you increase or reduce the count, by hinting that the resulting create/destroy actions will need to happen in a particular order in order to be valid. We then use count.index to refer to indices of the other resource, which in this case will only ever be zero but again helps Terraform understand our intent during validation.
The lambda_function nested block inside aws_s3_bucket_notification requires a slightly different strategy, since in that case we're not creating a separate resource instance per lambda function but instead just generating some dynamic configuration blocks inside a single resource instance. For that situation, we can use dynamic blocks which serve as a sort of macro for generating multiple blocks based on elements of a collection:
resource "aws_s3_bucket_notification" "bucket_notification" {
bucket = aws_s3_bucket.example_bucket.id
dynamic "lambda_function" {
for_each = aws_lambda_function.example
content {
# "lambda_function" in this block is the iterator
# symbol, so lambda_function.value refers to the
# current element of aws_lambda_function.example.
lambda_function_arn = lambda_function.value.arn
# ...
}
}
}
Again this is relying on the fact that aws_lambda_function.example is a list of objects, but in a different way: we ask Terraform to generate a lambda_function block for each element of aws_lambda_function.example, setting lambda_function.value to the whole aws_lambda_function object corresponding to each block. We can therefore access the .arn attribute from that object to get the corresponding ARN that we need to populate the lambda_function_arn argument inside the block.
Again, for this case there will only ever be zero or one lambda function objects and therefore only zero or one lambda_function blocks, but in both cases this pattern generalizes to other values of count, ensuring that all of these will stay aligned as your configuration evolves.

Terraform .11 to .12 conversion of deeply nested data

So, in my old 0.11 code, I have a file where, in my output module's locals section, I'm building:
this_assigned_nat_ip = google_compute_instance.this_public.*.network_interface.0.access_config.0.assigned_nat_ip
This later gets fed to the output statement. This module could create N instances, so what it used to do was give me the first NAT IP on the first access_config block of the first network interface of all the instances we created. (Someone local wrote the code, so we know there's only ever going to be one network interface with one access_config block.)
How do I translate that to 0.12? I'm unsure of the syntax to keep the nesting.
Update:
Here's a chunk of the raw data out of a terraform show from tf11 (slightly sanitized)
module.gcp_bob_servers_ams.google_compute_instance.this_public.0:
machine_type = n1-standard-2
min_cpu_platform =
network_interface.# = 1
network_interface.0.access_config.# = 1
network_interface.0.access_config.0.assigned_nat_ip =
network_interface.0.access_config.0.nat_ip = 1.2.3.4
network_interface.0.access_config.0.network_tier = PREMIUM
Terraform show of equivalent host in tf12:
# module.bob.module.bob_gcp_ams.module.atom_d.google_compute_instance.this[1]:
resource "google_compute_instance" "this" {
  allow_stopping_for_update = true

  network_interface {
    name               = "nic0"
    network            = "https://www.googleapis.com/compute/v1/projects/stuff-scratch/global/networks/scratch-public"
    network_ip         = "10.112.112.6"
    subnetwork         = "https://www.googleapis.com/compute/v1/projects/stuff-scratch/regions/europe-west4/subnetworks/scratch-europe-west4-x-public-subnet"
    subnetwork_project = "stuff-scratch"

    access_config {
      nat_ip       = "35.204.132.177"
      network_tier = "PREMIUM"
    }
  }

  scheduling {
    automatic_restart   = true
    on_host_maintenance = "MIGRATE"
    preemptible         = false
  }
}
If I understand correctly, this_assigned_nat_ip is a list of IPs. You should be able to get the same thing in Terraform 0.12 with:
this_assigned_nat_ip = [for i in google_compute_instance.this_public : i.network_interface[0].access_config[0].assigned_nat_ip]
I did not test it, so I might have a small syntax error, but the for expression is the key to getting that done.
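Putting that together, the full 0.12 output might look like this (an untested sketch, assuming the resource is named this_public as in the original module):
output "this_assigned_nat_ip" {
  value = [
    for i in google_compute_instance.this_public :
    i.network_interface[0].access_config[0].assigned_nat_ip
  ]
}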
Turns out this[*].network_interface[*].access_config[*].nat_ip gave me what I needed. Given there's only ever going to be one address on the interface, it comes out fine.

Automating Spin Up/Down Of AMIs using Terraform/Terragrunt

This may seem silly, but I've been looking for instructions or a tutorial on how to automatically tear instances launched from an Amazon AMI down and up on a schedule. This is because we have non-production servers used for development that don't need to run 24/7. Any chance someone can assist or point me in the proper direction?
Here is how I do it:
resource "aws_autoscaling_schedule" "asg_morning" {
count = "${var.schedule_enabled}"
scheduled_action_name = "${upper(var.environment)}-${app}-AM-Schedule"
min_size = 1
max_size = 1
desired_capacity = 1
recurrence = "${var.schedule_am}"
autoscaling_group_name = "${aws_autoscaling_group.app.name}"
}
resource "aws_autoscaling_schedule" "asg_evening" {
count = "${var.schedule_enabled}"
scheduled_action_name = "${upper(var.environment)}-${var.app}-PM-Schedule"
min_size = 0
max_size = 0
desired_capacity = 0
recurrence = "${var.schedule_pm}"
autoscaling_group_name = "${aws_autoscaling_group.app.name}"
}
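For the hours asked about in the first question, the recurrence variables could hold standard cron expressions like these (illustrative values; aws_autoscaling_schedule evaluates recurrence in UTC unless a time zone is set):
variable "schedule_am" {
  default = "0 6 * * 1" # scale up Monday at 06:00
}

variable "schedule_pm" {
  default = "0 21 * * 6" # scale down Saturday at 21:00
}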
