I am provisioning my infrastructure with Terraform, and I am using a bash script, xyz.sh, which runs my deep learning model training on a GPU machine.
My question is: how can I get the logs and finishing time of xyz.sh without SSHing into the machine? If that is not possible and I do SSH into the machine, how can I check whether the script is still running or has finished?
When you use user_data for an EC2 instance, what happens internally is that Terraform sends that string to the EC2 API and then the EC2 infrastructure makes that string available to the instance via the Instance metadata and user data API.
How (and whether) that string is used by the EC2 instance then depends on what software you have installed in the EC2 instance. A typical configuration for common Linux distribution AMIs is to have cloud-init installed and configured to run on first boot. If you are using an AMI with cloud-init, then it is cloud-init that retrieves the user_data string from the EC2 endpoint and executes it as a script (or interprets it in one of the other ways it supports), and so cloud-init is the system responsible for emitting any logs that result from that process.
You can read more about debugging cloud-init in Testing and debugging cloud-init, which mentions that cloud-init writes logs to /var/log/cloud-init.log by default (some Linux distributions may customize this) and that you can use the cloud-init analyze subcommand to retrieve information from that log file.
Terraform's involvement in this process is only to send the given user_data string to the EC2 API, so Terraform has no visibility into what happens after the instance is created. Unless the script you submit includes a step to report its progress somewhere, there is no built-in way to determine that other than to inspect the cloud-init log file from within the EC2 instance itself.
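One way to have the script "report its progress somewhere" is to launch it through a small wrapper that records when it started, when it finished, and how it exited. The following is only a minimal sketch: the function name run_and_report and the status and log paths are hypothetical, and it assumes Python 3 is available on the instance.

```python
import json
import subprocess
import time
from pathlib import Path


def run_and_report(command, status_path, log_path):
    """Run `command`, send its output to `log_path`, and record the outcome in `status_path`."""
    status_file = Path(status_path)
    started = time.time()
    # Write a "running" marker first so a reader can tell the job is in progress.
    status_file.write_text(json.dumps({"state": "running", "started": started}))
    with open(log_path, "w") as log:
        # stderr is merged into stdout so the log holds everything the script printed.
        result = subprocess.run(command, stdout=log, stderr=subprocess.STDOUT)
    finished = time.time()
    status = {
        "state": "finished",
        "exit_code": result.returncode,
        "started": started,
        "finished": finished,
        "duration_seconds": round(finished - started, 3),
    }
    status_file.write_text(json.dumps(status))
    return status
```

From user data this could be called as, for example, run_and_report(["bash", "/opt/xyz.sh"], "/var/tmp/xyz.status.json", "/var/tmp/xyz.log"), where all three paths are placeholders. To read the result without SSH, the wrapper would additionally have to copy the status file somewhere outside the instance, which this sketch does not do.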
You can run this command:
ls -la /var/log/cloud*
It will list the log files related to user data. (In my case, I use Ali Cloud.)
Then you need to identify which file holds your user data output; in my case, all of my user data output is stored in /var/log/cloud-init-output.log.
Other cloud providers might be a little different, but the concept should be the same because most clouds use the same cloud-init library: https://cloud-init.io/
Note: you need to SSH into the server.
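Once you are on the machine, the cloud-init status subcommand gives a quick answer to "still running or finished" by printing a line such as "status: running" or "status: done". The sketch below wraps that call in Python; the function names are made up for illustration. It reports the state of the whole cloud-init run, so it only reflects the script if the script runs in the foreground of user data rather than detaching itself.

```python
import subprocess


def parse_cloud_init_status(output):
    """Return the value of the 'status:' line from `cloud-init status` output, or None."""
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "status":
            return value.strip()
    return None


def cloud_init_state():
    """Run `cloud-init status` on the instance and return, for example, 'running' or 'done'."""
    # cloud-init may exit non-zero (for example on error), so check=True is not used.
    result = subprocess.run(["cloud-init", "status"], capture_output=True, text=True)
    return parse_cloud_init_status(result.stdout)
```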
I have an AWS Windows Server 2016 VM. This VM has a bunch of libraries/software installed (dependencies).
I'd like to, using python3, launch and deploy multiple clones of this instance. I want to do this so that I can use them almost like batch compute nodes in Azure.
I am not very familiar with AWS, but I did find this tutorial.
Unfortunately, it shows how to launch an instance from the store, not an existing configured one.
How would I do what I want to achieve? Should I create an AMI from my configured VM and then just launch that?
Any up-to-date links and/or advice would be appreciated.
Yes, you can create an AMI from the running instance, then launch N instances from that AMI. You can do both using the AWS console or you could call boto3 create_image() and run_instances(). Alternatively, look at Packer for creating AMIs.
You don't strictly need to create an AMI. You could simply bootstrap each instance as it launches via a user data script or some form of configuration management (CM) like Ansible.
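The AMI route with boto3 can be sketched roughly as follows. The helper name clone_instance and its parameters are illustrative only; the ec2 argument is assumed to be a boto3 EC2 client, for example boto3.client("ec2").

```python
def clone_instance(ec2, source_instance_id, image_name, count, instance_type):
    """Create an AMI from an instance, wait for it, and launch `count` copies."""
    image_id = ec2.create_image(InstanceId=source_instance_id, Name=image_name)["ImageId"]
    # The AMI cannot be launched until it reaches the "available" state.
    ec2.get_waiter("image_available").wait(ImageIds=[image_id])
    response = ec2.run_instances(
        ImageId=image_id,
        InstanceType=instance_type,
        MinCount=count,
        MaxCount=count,
    )
    return image_id, [inst["InstanceId"] for inst in response["Instances"]]
```

Taking the client as a parameter keeps the function easy to try against a stub before pointing it at a real account. Note that create_image reboots the source instance by default unless NoReboot=True is passed, and a real launch will usually also need networking and key pair arguments that are left out here.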
I am trying to launch my own AMI using user-data so that it can run a script and then terminate.
So I launched an EC2 Windows Base instance, configured it to have all the tools I need (NodeJS etc.), and saved my script to C:\Projects\index.js.
I then saved it as an Image.
So I then used the console to launch an EC2 from my new AMI with the user-data of
<powershell>
node C:\Projects\index.js --uuid=1
</powershell>
If I run that command after connecting to the EC2 instance over RDP, it works, so it seems that the user data did not run when the image was started.
Having read some of the other questions and answers, it could be because the AMI was created from an instance that had already started, so the user data did not persist.
Can anyone advise me on how I can launch my AMI with a custom userdata each time? (as the UUID will change)
Thanks
Another solution that worked for me is to run Sysprep with EC2Launch.
The issue is that AWS doesn't reestablish the route to the profile service (169.254.169.254) in your custom AMI. See response by SanjitPatel in this post. So when I tried to use my custom AMI to create spot requests, my new instances were failing to find user data.
Shutting down with Sysprep essentially forces AWS to redo all the setup work on the instance, as if it were run for the first time. So when you create your instance, shut it down with Sysprep, and then create your custom AMI, AWS will set up the profile service route correctly for the new instances and execute your user data. This also avoids manually changing Windows Tasks and executing user data on subsequent boots, as the persist tag does.
Here is a quick step-by-step:
1. Create an instance using one of the AWS Windows AMIs (Windows Server 2016 Nano Server doesn't support Sysprep) and pass your desired user data (this may be optional, but it is good for making sure AWS wires up the setup scripts correctly to handle user data).
2. Customize your instance as needed.
3. Shut down your instance with Sysprep. Just open the EC2LaunchSettings application and click "Shutdown with Sysprep".
4. Create your custom AMI from the instance you just shut down.
5. Use your custom AMI to create other instances, passing user data on instance creation. User data will be executed on instance launch. In my case, I used the Spot Request screen, which had a User Data text box.
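Since the UUID changes with every launch, the user data string from the question can be generated per instance instead of typed by hand. The snippet below is only a sketch; build_windows_user_data is a hypothetical helper, and the script path is the one mentioned in the question.

```python
def build_windows_user_data(script_path, uuid):
    """Wrap a node command in the <powershell> tags used for Windows user data."""
    command = f"node {script_path} --uuid={uuid}"
    return f"<powershell>\n{command}\n</powershell>"
```

For example, build_windows_user_data(r"C:\Projects\index.js", 42) produces the same three-line block as in the question with --uuid=42, which can then be pasted into the User Data text box or passed as the user data argument of whatever launch call you use.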
Hope this helps!
When launching EC2 using Terraform (or CloudFormation), we can configure EC2 by putting some scripts in user_data/remote-exec. Alternatively, we can configure EC2 using Ansible/Chef, etc. What is the difference between configuring EC2 in user_data/remote-exec and doing it with Ansible/Chef? When should I use the former, and when the latter? (I know Ansible/Chef is idempotent.)
In my case, the EC2 instance was originally launched manually and then configured manually using a lot of Linux commands, and those commands were not written by me. Now I am the person who has to automate the whole setup using Terraform and configure the EC2 instances. Using user_data/remote-exec to configure EC2 is straightforward: I just need to put all the existing Linux commands into some scripts with a few changes. And if the configuration produced by my script is not correct, at least I can quickly figure out whether I missed some commands by comparing my script with the original Linux commands. But if I use Ansible/Chef, I have to rewrite all the steps in a different language. And if the configuration is not what I expected, it is hard for me to figure out which steps are incorrect, because the syntax of Ansible/Chef and that of Linux commands are totally different.
My question is: in my case, should I use Ansible/Chef or user_data/remote-exec for configuration?
User data is good for initial configuration of the system. If you need longer-term maintenance, configuration management software like Ansible/Chef/Salt/Puppet is a great option.
Packer can be used for immutable infrastructure, i.e. infrastructure that doesn't change after creation. You can run all the scripts and installs ahead of time so the image is ready to just boot; this is also faster because you don't have to wait for user data to run.
A few questions you have to ask as well: how often are you going to patch these? Are you going to just update the existing servers or replace them with new ones? Ansible is great for configuration since it's just yaml files an
Blue/Green deployments generally replace servers with all new ones and gradually move traffic over to the new servers.
Some more things to consider with your Infrastructure as code
I am creating cloud infrastructure using Terraform (e.g. AWS EC2 VM) and after VM creation, I am running a shell script on remote VM using provisioner(remote-exec).
Is there any way to capture the shell script output (from remote vm) and store it in Terraform output (state file on local/consul)?
I already tried Terraform's 'External Data Source' but I guess it works only with local scripts (not remote vm scripts). Please correct me in case I am wrong.
Thanks
Creation-time or destroy-time provisioners in Terraform only run once, when the resource is created or destroyed, not during updates or any other lifecycle step. For that reason, the output of the provisioners is not available in the Terraform state.
Reference : https://www.terraform.io/docs/provisioners/index.html
The https://github.com/matti/terraform-shell-resource module captures output from temporary files into triggers, where it is stored in the state. The same pattern might work for remote-exec too. Otherwise, just use local-exec to run the command on the remote machine.
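Along the lines of running a local command against the remote machine, one possible workaround is a local program for Terraform's external data source that fetches the remote output over ssh. This is a swapped-in approach, not something remote-exec itself provides. The sketch assumes key-based ssh access from the machine running Terraform, and the query keys "host" and "command" are names chosen here for illustration.

```python
import json
import subprocess
import sys


def run_remote(host, command, runner=subprocess.run):
    """Run `command` on `host` over ssh and return its stdout as text."""
    result = runner(["ssh", host, command], capture_output=True, text=True, check=True)
    return result.stdout.strip()


def main(stdin=sys.stdin, stdout=sys.stdout, runner=subprocess.run):
    # The external data source passes its `query` as a JSON object on stdin.
    query = json.load(stdin)
    output = run_remote(query["host"], query["command"], runner)
    # The result must be a JSON object whose values are all strings.
    json.dump({"output": output}, stdout)
```

To use it as the data source program, add a call to main() at the bottom of the file. Because a data source is read on every plan, the remote command should be read-only, for instance printing a result file that the provisioner script wrote earlier.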
I have an EC2 instance which runs an app hosted on a private git repo.
I need to be able to launch many of these from my master server. At the moment, I have 5 fixed "worker" instances which I start/stop from the master with no problem. Each worker starts, pulls the repo, and launches the app on startup. This is obviously not a good solution and I want to make it more flexible (launch as many instances as I want, etc). The configuration and packages are final so I feel good about bundling it all into an AMI.
Is there a way for me to bundle my git keys into the AMI, in order to launch many similar instances and have them all pull and launch my app on startup without having to connect to each of them and enter the password? Is there a better way? I've read about cloud-init, user-data, puppet and many other things, but I'm quite a novice in the matter and couldn't find a proper example using ssh keys.
Instead of bundling the keys into the AMI, I suggest you keep them separate from the AMI because:
If you change your git keys, you don't have to build a new AMI
Unauthorized users who have privileges to launch an instance from your AMI cannot launch your app
I suggest using the user-data feature. You can optionally encrypt your keys and base64-encode them if you want to. When you launch your instance manually or using the CLI/API, you can pass your keys, which can then be accessed by the instance once it is launched. There are a variety of ways to access the data (python and curl, to name a few). I suggest you use the AWS metadata server because your instance does not need your AWS credentials to fetch the user-data. Once your instance is launched, have your app make the following call, get the keys and then pull the repo:
curl http://169.254.169.254/latest/user-data
This returns your user data (no credentials needed). You can then optionally base64-decode and decrypt your keys and use them to pull the repo. If you do not want the extra security, you can skip the encryption and base64 steps.
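The same call can be made from Python using only the standard library. The sketch below is not part of the original suggestion in one respect: it first requests a session token, which is needed on instances where IMDSv2 is enforced, whereas the curl call above is the plain unauthenticated form. The function names are illustrative, and decode_key assumes the keys were base64-encoded but not encrypted.

```python
import base64
import urllib.request

METADATA_BASE = "http://169.254.169.254/latest"


def fetch_user_data(timeout=2):
    """Fetch this instance's user data from the metadata endpoint."""
    # IMDSv2 expects a short-lived session token obtained with a PUT request.
    token_request = urllib.request.Request(
        f"{METADATA_BASE}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    with urllib.request.urlopen(token_request, timeout=timeout) as resp:
        token = resp.read().decode("ascii")
    data_request = urllib.request.Request(
        f"{METADATA_BASE}/user-data",
        headers={"X-aws-ec2-metadata-token": token},
    )
    with urllib.request.urlopen(data_request, timeout=timeout) as resp:
        return resp.read()


def decode_key(raw_user_data):
    """Undo the optional base64 step and return the key text."""
    return base64.b64decode(raw_user_data).decode("utf-8")
```

On the instance, decode_key(fetch_user_data()) would yield the key text, which the app can write to a file with restrictive permissions before pulling the repo.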