How to ensure that the ASG (Auto Scaling Group) replaces existing instances with every change in the launch configuration - Terraform

The infrastructure is provisioned using terraform code.
In our AWS environment, a new AMI is created for every commit made to the repository. We now want to configure auto scaling for the web servers behind an ALB using this new AMI.
How can we make sure that the ASG replaces existing instances with every change in the launch configuration? As I understand it, once you change the LC, only the instances launched by scaling in/out use the new AMI, and the existing ones are not replaced.
Also, do you have any idea how we can programmatically (via Terraform) get how many servers are running at any point in time, in the case of auto scaling?
Any help is highly appreciated here.
Thanks!

For the most part this is pretty straightforward, and there are already a dozen implementations around the web.
The tricky part is to set the create_before_destroy lifecycle argument on both the LC and the ASG. You should also reference the LC in your ASG resource, so that once your LC changes you trigger a workflow that creates a new ASG to replace your current one.
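As a rough sketch of that pattern (the variable names, subnets, and target group below are placeholders, not from the original setup): the ASG name interpolates the LC name, so every LC replacement forces a new ASG, and create_before_destroy brings the new one up before the old one is destroyed.

resource "aws_launch_configuration" "web" {
  name_prefix   = "web-"
  image_id      = var.latest_ami_id   # AMI baked for the latest commit (placeholder variable)
  instance_type = "t3.micro"

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_autoscaling_group" "web" {
  # Interpolating the LC name forces a brand-new ASG whenever the LC is replaced
  name                 = "web-${aws_launch_configuration.web.name}"
  launch_configuration = aws_launch_configuration.web.name
  min_size             = 2
  max_size             = 4
  vpc_zone_identifier  = var.subnet_ids                   # placeholder subnets
  target_group_arns    = [aws_lb_target_group.web.arn]    # placeholder ALB target group

  lifecycle {
    create_before_destroy = true
  }
}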
A very well-documented example
Also, do you have any idea how we can programmatically (via Terraform) get how many servers are running at any point in time, in the case of auto scaling?
This depends on the context. If you have a static number it's easy: you can define it in your module and stick with it. If it's about passing the previous ASG's value, the approach is again described in the guide above :) Otherwise you need to write a custom external handler that reports how many instances are running behind your target groups at that moment. There may of course be a newer AWS API that lets you query the health-check status of all your target groups and sum them up (I'm not aware of one). Then again, you might also add some custom rules for your scaling policies.
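As a hedged sketch of that last point: instances launched by an ASG are automatically tagged with aws:autoscaling:groupName, so a data source can report how many are running at the time Terraform runs (the ASG name below assumes the placeholder resource from the sketch above):

data "aws_instances" "asg_members" {
  instance_tags = {
    "aws:autoscaling:groupName" = aws_autoscaling_group.web.name
  }
  instance_state_names = ["running"]
}

output "running_web_servers" {
  value = length(data.aws_instances.asg_members.ids)
}

Note that this is only a point-in-time snapshot taken at plan/apply/refresh time, not a live counter.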
External Handler
Side note: in the example, the deployment is done with an ELB.

Related

Terraform detecting changes immediately after apply

I have a fairly simple Terraform configuration that sets up:
An AWS VPC
Its default route table, with an endpoint to S3
A couple of security groups
Some EC2 instances
An internal Route53 DNS zone
Now, if I execute terraform plan immediately after terraform apply from scratch, a bunch of spurious changes are detected. These fall into three categories:
Empty attributes (tags and aws_default_route_table.propagating_vgws), even though they are set explicitly empty in the code
Two Route53 records that are marked as changed, but show no changes to be applied
ingress and egress rules in security groups
The first two groups are annoying but no big deal, even if it would be nice to get rid of them.
The last one I would really like to get rid of. I think it's related to the fact that I have the rules as separate aws_security_group_rule resources rather than inline in the security group resources (because some of them refer to each other mutually). I did have a couple of inline rules, but rereading the docs I think mixing the two isn't allowed; even removing them doesn't make the issue go away.
(Running terraform apply -refresh-only makes everything good, but it's really annoying that an apply from a blank slate needs this kind of fixup)
As suggested in Marcin's comment, this seems to have been a bug in Terraform: upgrading the terraform executable to the latest version fixes the problem.
Try using Terraform's count meta-argument to define multiple security group rules from a single resource block. This will help reduce the number of spurious changes Terraform detects.
If you are using a separate security group rule resource for each rule, you can use the depends_on meta-argument to ensure that the rules are applied in the correct order.
Finally, if you're still seeing spurious changes, you can try running terraform refresh before running terraform plan to ensure that Terraform has the most up-to-date information.
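As a minimal sketch of the count suggestion (the port list and security group name are made up for illustration):

locals {
  ingress_ports = [80, 443]   # hypothetical list of ports to open
}

resource "aws_security_group_rule" "ingress" {
  count             = length(local.ingress_ports)
  type              = "ingress"
  from_port         = local.ingress_ports[count.index]
  to_port           = local.ingress_ports[count.index]
  protocol          = "tcp"
  cidr_blocks       = ["0.0.0.0/0"]
  security_group_id = aws_security_group.web.id   # assumed security group
}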

Blue Green Deployment with AWS ECS

We are using ECS Fargate containers to deploy all of our services (~10) and want to follow Blue/Green Deployment.
We have deployed all the services under the BLUE flag, with target groups pointing to the services.
In CI/CD, new target groups are created with slightly different forwarding rules to allow testing without any issues.
Now my system is running with two kinds of target groups, services, and task definitions:
tg_blue, service_blue, task_blue → pointing to old containers and serving live traffic
tg_green, service_green, task_green → pointing to new containers and not receiving any traffic
All above steps are done in Terraform.
Now I want to switch the traffic, and here I am stuck: how do I switch the traffic, and what will the next deployment look like?
I would go for an AWS-native solution if there are no important reasons against it, namely CodeDeploy. It switches between target groups automatically.
Without CodeDeploy, you need to implement weighted balancing between the two target groups and adjust the weights yourself later on, which is extra work.
The whole flow is explained quite well in this YT video.
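If you do go the manual route, a sketch of the weighted listener might look like this (the ALB, listener port, and resource names are assumptions, not taken from the question):

resource "aws_lb_listener" "web" {
  load_balancer_arn = aws_lb.web.arn   # assumed ALB
  port              = 80
  protocol          = "HTTP"

  default_action {
    type = "forward"

    forward {
      target_group {
        arn    = aws_lb_target_group.tg_blue.arn
        weight = 100   # currently serving live traffic
      }
      target_group {
        arn    = aws_lb_target_group.tg_green.arn
        weight = 0     # raise this (and lower blue) to cut over
      }
    }
  }
}

Cutting over is then just a change of the two weight values, which Terraform applies as an in-place listener update.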

Can't figure out how to reuse terraform provisioners

I've created some (remote-exec and file) provisioners to bootstrap (GCP) VMs that I'm creating that I want to apply to all my VMs, but I can't seem to figure out how to reuse them...?
Modules seem like the obvious answer, but creating a module to create the VMs means I'd need to make input vars for everything that I'd want to configure on each of the VMs specifically...
Reusing the snippets with the provisioners doesn't seem possible though?
Terraform's Provisioner feature is intended as a sort of "last resort" for situations where there is no alternative but to SSH into a machine and run commands on it remotely, but generally we should explore other options first.
The ideal case is to design your machine images so that they are already correctly configured for what they need to do and so they can immediately start doing that work on boot. If you use HashiCorp Packer then you can potentially run very similar steps at image build time to what you might've otherwise run at Terraform create time with provisioners, perhaps allowing you to easily adapt the work you already did.
If they need some configuration parameters from Terraform in order to start their work, you can use features like the GCP instance metadata argument to pass in those values so that the software in the image can access it as soon as the system boots.
A second-best sort of option is to use features like GCP startup scripts to pass the script to run via metadata so that again it's available immediately on boot, without the need to wait for the SSH server to start up and become available.
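A small illustration of both ideas, with a hypothetical image name and startup command (not from the question):

resource "google_compute_instance" "app" {
  name         = "app-1"
  machine_type = "e2-small"
  zone         = "us-central1-a"

  boot_disk {
    initialize_params {
      image = "my-project/my-packer-image"   # hypothetical pre-baked image
    }
  }

  network_interface {
    network = "default"
  }

  # Configuration parameters the software in the image can read from the
  # metadata server as soon as the system boots
  metadata = {
    app_environment = "staging"
  }

  # Startup script runs on boot, with no need to wait for SSH
  metadata_startup_script = "systemctl start my-app.service"
}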
In both of these cases, the idea is to rely on features provided by the compute platform to treat the compute instances as a sort of "appliance", so Terraform (and you) can think of them as being similar to a resource modelling a hosted service. Terraform is concerned only with starting and stopping this black box infrastructure and wiring it in with other infrastructure, and the instance handles its implementation details itself. For use-cases where horizontal scaling is appropriate, this also plays nicely with managed autoscaling functionality like google_compute_instance_group_manager, since new instances can be started by that system rather than directly by Terraform.
Because Provisioners are designed as a last-resort for when approaches like the above are not available, their design does not include any means for general reuse. It's expected that each provisioner will be a tailored solution to a specific problem inline in the resource it relates to, not something you use systematically across many separate callers.
With that said, if you are using file and remote-exec in particular you can get partway there by factoring out the specific file to be uploaded and the remote command to execute, in which case your resource blocks will contain just the declaration boilerplate while avoiding repetition of the implementation details. For example, if you had a module that exported outputs local_file_path, remote_file_path, and remote_commands you could write something like this:
module "provisioner_info" {
source = "./modules/provisioner-info"
}
resource "any" "example" {
# ...
provisioner "file" {
source = module.provisioner_info.local_file_path
destination = module.provisioner_info.remote_file_path
}
provisioner "remote-exec" {
inline = module.provisioner_info.remote_commands
}
}
That is the limit for factoring out provisioner details in current versions of Terraform.
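For completeness, the hypothetical ./modules/provisioner-info module referenced above could be nothing more than a handful of outputs, along these lines (file paths and commands are illustrative):

output "local_file_path" {
  value = "${path.module}/files/bootstrap.sh"
}

output "remote_file_path" {
  value = "/tmp/bootstrap.sh"
}

output "remote_commands" {
  value = [
    "chmod +x /tmp/bootstrap.sh",
    "sudo /tmp/bootstrap.sh",
  ]
}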

Backing up the Terraform statefile

I usually run all my Terraform scripts from a bastion server, and all my code including the tf statefile resides on that same server. Recently the machine accidentally went down (hard reboot) and the root filesystem got corrupted. Now my statefile is gone, but my resources still exist and are running. I don't want to run terraform apply again and recreate the whole environment with downtime. What's the best way to recover from this mess, and what can be done so that it doesn't happen again in the future?
I have already looked at terraform refresh and terraform import. But are there any better ways to do this?
and all my code including the tf statefile resides on the same server.
As you don't have a .backup file, I'm not sure you can recover the statefile smoothly the Terraform way; do let me know if you find a way :) However, you can take a few steps that will keep you out of situations like this.
The best practice is to keep all your statefiles in remote storage such as S3 or Azure Blob Storage and configure your backend accordingly, so that every time you create or destroy a stack, Terraform always reads and writes the statefile remotely.
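A minimal sketch of such a backend block, assuming you have already created a versioned S3 bucket and a DynamoDB table for locking (all names below are placeholders):

terraform {
  backend "s3" {
    bucket         = "my-terraform-states"   # pre-created bucket with versioning enabled
    key            = "prod/terraform.tfstate"
    region         = "eu-west-1"
    dynamodb_table = "terraform-locks"       # state locking
    encrypt        = true
  }
}

With bucket versioning turned on, a corrupted or deleted statefile can be rolled back to a previous version.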
On top of that, you can take advantage of Terraform workspaces to avoid a statefile mess in a multi-environment scenario. Also consider creating a plan for backtracking and versioning of previous deployments, e.g.:
terraform plan -var-file "" -out "" -target=module.<blue/green>
what can be done so that this doesn't get repeated in future.
Terraform blue-green deployment is the answer to your question. We implemented this model quite a while ago and it's running smoothly. The whole idea is modularity and reusability: the same templates work for 5 different components with different architectures, without any downtime (the core template stays the same; only the variable files differ).
We are taking advantage of Terraform modules. We have two modules, called blue and green (you can name them anything). At any given point in time, either blue or green is taking traffic. When we have changes to deploy, we bring up the alternative stack based on the state output (a targeted module based on the Terraform state), auto-validate it, then move the traffic to the new stack and destroy the old one.
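As a very rough sketch of that module layout (the shared stack module, its variables, and the blue_active switch are all hypothetical, just to illustrate the shape):

variable "blue_active" {
  type    = bool
  default = true
}

variable "desired_count" {
  type    = number
  default = 2
}

module "blue" {
  source         = "./modules/stack"                          # hypothetical shared stack module
  env_suffix     = "blue"
  instance_count = var.blue_active ? var.desired_count : 0    # takes traffic when active
}

module "green" {
  source         = "./modules/stack"
  env_suffix     = "green"
  instance_count = var.blue_active ? 0 : var.desired_count    # idle until cut-over
}

Targeted plans like the one shown above (-target=module.blue or -target=module.green) then operate on whichever side is being brought up or torn down.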
Here is an article you can keep as a reference; it doesn't exactly reflect what we do, but it's a good place to start.
Please see this blog post, which, unfortunately, suggests that import is the only solution.
If you are still unable to recover the Terraform state, you can generate a blueprint of the Terraform configuration as well as the state for specific AWS resources using terraforming, but it requires some manual effort to edit the state in order to manage those resources again. Once you have this statefile, run terraform plan and compare its output with your infrastructure. It is good to have remote state, especially in an object store like AWS S3 or a key-value store like Consul: these support locking the state when multiple operations happen at the same time, and backing up is also quite simple.

Setting up ELB on AWS for Node.js

I have a very basic question but I have been searching the internet for days without finding what I am looking for.
I currently run one instance on AWS.
That instance has my node server and my database on it.
I would like to make use of ELB by separating the one machine that hosts both the server and the database:
One machine that is never terminated, which hosts the database
One machine that runs the basic node server, which as well is never terminated
A policy to deploy (and subsequently terminate) additional EC2 instances that run the server when traffic demands it.
First of all I would like to know if this setup makes sense.
Secondly,
I am very confused about the way this should work in practice:
Do all deployed instances run using the same volume, or is a snapshot of the volume used?
In general, how do I set up such a system? Again, I searched the web and all of the tutorials and documentation are so generalized for every case that I cannot seem to figure out exactly what to do in mine.
Any tips? Links? Articles? Videos?
Thank you!
You would have an Auto Scaling group with a minimum size of 1 that is configured to use an AMI baked from your Node.js server. The Auto Scaling group registers instances with the ELB as they are created and deregisters them as they are deleted.
EBS volumes cannot be attached to more than one instance at a time. If you need a shared disk volume you would need to look into the EFS service.
Yes, you need to move your database onto a separate server that is not a member of the Auto Scaling group.
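A rough Terraform sketch of that shape (the AMI variable, subnets, and target group are placeholders):

resource "aws_launch_template" "node" {
  name_prefix   = "node-server-"
  image_id      = var.node_ami_id        # AMI baked with your Node.js server
  instance_type = "t3.small"
}

resource "aws_autoscaling_group" "node" {
  min_size            = 1
  max_size            = 4
  desired_capacity    = 1
  vpc_zone_identifier = var.subnet_ids   # placeholder subnets

  launch_template {
    id      = aws_launch_template.node.id
    version = "$Latest"
  }

  # Instances are registered with / deregistered from the load balancer
  # via this target group as they come and go
  target_group_arns = [aws_lb_target_group.node.arn]
}

The database would live on its own instance (or a managed service such as RDS) outside this group.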
