How to programmatically download data to AWS EC2 instance? - linux

There are 3 machines involved in my task
A: my desktop
B: EC2 instance spun up by A
C: a remote linux server where the data sits and where I only have read privileges
The task has basically 3 steps:
1. spin up B from A
2. download data from C to B to a specific location
3. change some of the downloaded data on B
I know how to do step 1 using awscli or boto3. Steps 2 and 3 are easy if I SSH to the EC2 instance manually. The problem is: if this task needs to be automated, how can I deal with the login credentials?
Specifically, I am thinking of using user_data to run shell scripts after the EC2 instance boots, but the data download uses scp, which needs a password. Alternatively, I could upload an SSH credential file to the EC2 instance, but then I cannot use user_data to run the script for steps 2 and 3.
So my current solution is done entirely from a shell script (sketched below):
1. spin up B from A
2. upload the SSH credential from A to B
3. ssh from A to B with shell commands attached, where steps 2 and 3 of the task are performed
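A rough sketch of what this currently looks like (the AMI, key names, user names, and paths are placeholders):

# 1. spin up B from A
INSTANCE_ID=$(aws ec2 run-instances --image-id ami-12345678 --instance-type t2.micro \
    --key-name my-key --query 'Instances[0].InstanceId' --output text)
aws ec2 wait instance-running --instance-ids "$INSTANCE_ID"
PUBLIC_IP=$(aws ec2 describe-instances --instance-ids "$INSTANCE_ID" \
    --query 'Reservations[0].Instances[0].PublicIpAddress' --output text)

# 2. upload the credential for C from A to B
scp -i my-key.pem -o StrictHostKeyChecking=no c_credentials ec2-user@"$PUBLIC_IP":~/.ssh/

# 3. run steps 2 and 3 of the task on B over ssh
ssh -i my-key.pem ec2-user@"$PUBLIC_IP" 'bash -s' < steps_2_and_3.sh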
This solution appears really ugly to me. Is there a better practice in this case?

3 Options
1. Pass the encrypted/encoded password as part of the user data. The user data script will first decrypt/decode the password and use it to scp the file from C. Then delete the user data, or otherwise remove the encrypted/encoded password.
2. Use an SSH key instead of an SSH password. The risk is that you have to pass the private key in the user data, which is not a secure way (see the sketch after this list).
3. Use Ansible and an SSH key. But that is too much work for a simple task.
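For example, a user data script for options 1/2 might look roughly like this (the host name, user, paths, and the modification command are placeholders; the key is passed base64-encoded inside the user data, which, as noted, is not secure):

#!/bin/bash
set -euo pipefail

# decode the credential that was embedded in the user data
echo "BASE64_ENCODED_PRIVATE_KEY" | base64 -d > /tmp/c_key
chmod 600 /tmp/c_key

# step 2: download data from C to a specific location on B
mkdir -p /opt/data
scp -i /tmp/c_key -o StrictHostKeyChecking=no user@server-c:/data/source/* /opt/data/

# step 3: change some of the downloaded data (placeholder modification)
sed -i 's/old-value/new-value/' /opt/data/config.txt

# remove the credential afterwards
shred -u /tmp/c_key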

Give Ansible a try; it can help you automate this task by creating a playbook.
For creating an instance you could use the ec2 module; from the doc examples:
# Basic provisioning example
- ec2:
    key_name: mykey
    instance_type: t2.micro
    image: ami-123456
    wait: yes
    group: webserver
    count: 3
    vpc_subnet_id: subnet-29e63245
    assign_public_ip: yes
To download data, use the get_url module; example:
- name: Download file with check (md5)
  get_url:
    url: http://example.com/path/file.conf
    dest: /etc/foo.conf
    checksum: md5:66dffb5228a211e61d6d7ef4a86f5758
For modifying files there are multiple modules that can be found at http://docs.ansible.com/.
Overall, it is a tool that can help automate many things, but some time is required to learn the basics; check the Getting Started guide. Hope it can help.

There are many ways to solve your task. I will not talk about step 1 (spin up B from A) because you have already done it.
Option 1: Use EC2 Run Command to push commands to server B. Flow: A -> EC2 Run Command service -> B -> C. No need to push credentials (SSH key/password) to server B (see the sketch below).
Option 2: Define all your commands in a bash shell file and push this file to S3. Use the user data of server B to download that file from S3 and run it. Flow: A -> S3; B gets the file from S3; B -> C.
With the above 2 options, you do not need to push any credentials (SSH key/password) to server B. Server C can be anywhere you have connectivity from B to C for the download task.
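As an illustration, option 1 could be driven from A roughly like this (EC2 Run Command is now part of AWS Systems Manager, so the CLI call goes through aws ssm; the instance ID and the two scripts are hypothetical placeholders for steps 2 and 3, and server B needs the SSM agent plus an instance profile that allows Systems Manager):

aws ssm send-command \
    --instance-ids "i-0123456789abcdef0" \
    --document-name "AWS-RunShellScript" \
    --comment "run steps 2 and 3 on server B" \
    --parameters 'commands=["/opt/scripts/download_from_c.sh","/opt/scripts/modify_data.sh"]'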

Related

EC2 instance running S3 Sync command terminates before data transfer is complete

I have an EC2 instance running Linux. This instance is used to run aws s3 commands.
I want to sync the last 6 months worth of data from source to target S3 buckets. I am using credentials with the necessary permissions to do this.
Initially I just ran the command:
aws s3 sync "s3://source" "s3://target" --query "Contents[?LastModified>='2022-08-11' && LastModified<='2023-01-11']"
However, after maybe 10 mins this command stops running, and only a fraction of the data is synced.
I thought this was because my SSM session was terminating, and with it the command stopped executing.
To combat this, I used the following command to try and ensure that this command would continue to execute even after my SSM terminal session was closed:
nohup aws s3 sync "s3://source" "s3://target" --query "Contents[?LastModified>='2022-08-11' && LastModified<='2023-01-11']" --exclude "*.log" --exclude "*.bak" &
Checking the status of the EC2 instance, the command appears to run for about 20 mins before it clearly stops, for some unknown reason.
The --query parameter controls what information is displayed in the response from an API call.
It does not control which files are copied in an aws s3 sync command. The documentation for aws s3 sync defines the --query parameter as: "A JMESPath query to use in filtering the response data."
Your aws s3 sync command will be synchronizing ALL files unless you use Exclude and Include Filters. These filters operate on the name of the object. It is not possible to limit the sync command by supplying date ranges.
I cannot comment on why the command would stop running before it is complete. I suggest you redirect output to a log file and then review the log file for any clues.
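For instance, a minimal way to capture that output (the log path is arbitrary):

# run the sync in the background and keep all output for troubleshooting
nohup aws s3 sync "s3://source" "s3://target" \
    --exclude "*.log" --exclude "*.bak" \
    > /tmp/s3-sync.log 2>&1 &

# later, inspect the log for throttling, permission, or credential errors
tail -f /tmp/s3-sync.log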

Is there a way I can run the aws configure command like "aws configure | access key | shared secret | region"?

I'm making a cron job that switches cluster context every time and checks for stuff. But for switching the context to EKS, I need to run aws configure every time to get logged in.
I'm wondering how this step can be fulfilled via a cron job that will also switch the context to EKS. If it is possible to run aws configure like "aws configure | key1 | key2 | region", I'll pass the input in via string templating.
Since you are using EKS, I assume you are also using the aws-auth ConfigMap. To talk to EKS, you need to use a role or user listed in aws-auth.
Here is what you can do now:
Make your credentials file contain multiple profiles:
[profile1]
...
[profile2]
...
Then you can switch profiles in your script with this environment variable:
export AWS_PROFILE=profile1
Example like:
export AWS_PROFILE=profile1
aws eks ...
kubectl ...
export AWS_PROFILE=profile2
aws eks ...
kubectl ...
The export part might be different in the real world, but the base script is similar.
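A rough sketch of how this could look from cron (the cluster name, region, and script path are assumptions):

#!/bin/bash
# /usr/local/bin/check-eks.sh -- hypothetical check script
set -euo pipefail
aws eks update-kubeconfig --name my-cluster --region us-east-1   # point kubectl at this profile's cluster
kubectl get nodes                                                # ...then run whatever checks you need

A crontab entry can then set the profile per run, for example: */15 * * * * AWS_PROFILE=profile1 /usr/local/bin/check-eks.sh, so there is no need to run aws configure interactively.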

AWS-CDK: Using InitFile to create a file in EC2 instance

I'm trying to create a file in my EC2 instance using the InitFile construct in CDK. Below is the code I'm using, during CDK initialisation, to create my EC2 instance, into which I'm trying to create a file textfile.txt containing the text 'welcome', going by the https://docs.aws.amazon.com/cdk/api/v1/python/aws_cdk.aws_ec2/InitFile.html reference:
init_data = ec2.CloudFormationInit.from_elements(
    ec2.InitFile.from_string("/home/ubuntu/textfile.txt", "welcome")
)

self.ec2_instance = ec2.Instance(
    self,
    id='pytenv-instance',
    vpc=self.vpc,
    instance_type=ec2.InstanceType.of(ec2.InstanceClass.BURSTABLE2,
                                      ec2.InstanceSize.NANO),
    machine_image=ec2.MachineImage.generic_linux(
        {'us-east-1': 'ami-083654bd07b5da81d'}
    ),
    key_name="demokeyyt18",
    security_group=self.sg,
    vpc_subnets=ec2.SubnetSelection(
        subnet_type=ec2.SubnetType.PUBLIC
    ),
    init=init_data,
)
From the EC2 configuration it is evident that the machine image here is Ubuntu. Getting this error: Failed to receive 1 resource signal(s) within the specified duration.
Am I missing something? Any inputs?
UPDATE: This same code works when the EC2 machine image is Amazon Linux, but not for Ubuntu. Am I doing something wrong?
CloudFormation init requires the presence of the cfn-init helper script on the instance. Ubuntu does not come with it, so you have to set it up yourself.
Here's the AWS guide that contains links to the installation scripts for Ubuntu 16.04/18.04/20.04. You need to add these to the user_data prop of your instance. Then CloudFormation init will work.
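As a rough sketch, the extra user_data commands for Ubuntu might look like the following (the bootstrap package URL and the /opt/aws/bin path are taken from AWS's helper scripts documentation and may differ for your release, so treat them as assumptions):

# install the CloudFormation helper scripts on Ubuntu before cfn-init is invoked
apt-get update -y
apt-get install -y python3-pip
pip3 install https://s3.amazonaws.com/cloudformation-examples/aws-cfn-bootstrap-py3-latest.tar.gz
# some setups call the helpers via /opt/aws/bin, so link them there as well
mkdir -p /opt/aws/bin
ln -sf /usr/local/bin/cfn-* /opt/aws/bin/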
If you just want to create a file when the instance starts, though, you don't have to use cfn-init at all - you could just supply the command that creates your file to the user_data prop directly:
self.ec2_instance.user_data.add_commands("echo welcome > /home/ubuntu/textfile.txt")

Node red instance in Kubernetes with custom settings.js and other files

I am building a service which creates on-demand Node-RED instances on Kubernetes. This service needs to have custom authentication, and some other service-specific data in a JSON file.
Every instance of Node-RED will have a Persistent Volume associated with it, so one way I thought of doing this was to attach the PVC to a pod, copy the files into the PV, and then start the Node-RED deployment over the modified PVC.
I use the following script to accomplish this:
def paste_file_into_pod(self, src_path, dest_path):
    dir_name = path.dirname(src_path)
    bname = path.basename(src_path)
    exec_command = ['/bin/sh', '-c', 'cd {src}; tar cf - {base}'.format(src=dir_name, base=bname)]
    with tempfile.TemporaryFile() as tar_buffer:
        resp = stream(self.k8_client.connect_get_namespaced_pod_exec,
                      self.kube_methods.component_name, self.kube_methods.namespace,
                      command=exec_command,
                      stderr=True, stdin=True,
                      stdout=True, tty=False,
                      _preload_content=False)
        print(resp)
        while resp.is_open():
            resp.update(timeout=1)
            if resp.peek_stdout():
                out = resp.read_stdout()
                tar_buffer.write(out.encode('utf-8'))
            if resp.peek_stderr():
                print('STDERR: {0}'.format(resp.read_stderr()))
        resp.close()
        tar_buffer.flush()
        tar_buffer.seek(0)
        with tarfile.open(fileobj=tar_buffer, mode='r:') as tar:
            subdir_and_files = [tarinfo for tarinfo in tar.getmembers()]
            tar.extractall(path=dest_path, members=subdir_and_files)
This seems like a very messy way to do this. Can someone suggest a quick and easy way to start node red in Kubernetes with custom settings.js and some additional files for config?
The better approach is not to use a PV for flow storage, but to use a Storage Plugin to save flows in a central database. There are several already in existence using DBs like MongoDB.
You can extend the existing Node-RED container to include a modified settings.js in /data that includes the details for the storage and authentication plugins and uses environment variables to set the instance-specific values at start-up.
Examples here: https://www.hardill.me.uk/wordpress/tag/multi-tenant/
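A minimal sketch of extending the stock image (the image tags, registry name, and file paths are placeholders; the official image keeps its user directory in /data):

# write a tiny Dockerfile via a heredoc, then build and push the custom image
cat > Dockerfile <<'EOF'
FROM nodered/node-red:latest
COPY settings.js /data/settings.js
EOF
docker build -t my-registry/node-red-custom:latest .
docker push my-registry/node-red-custom:latest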

How to start an ec2 instance using sqs and trigger a python script inside the instance

I have a python script which takes a video and converts it to a series of small panoramas. Now, there's an S3 bucket where a video will be uploaded (mp4). I need this file to be sent to the EC2 instance whenever it is uploaded.
This is the flow:
Upload video file to S3.
This should trigger EC2 instance to start.
Once it is running, I want the file to be copied to a particular directory inside the instance.
After this, I want the py file (panorama.py) to start running and read the video file from the directory and process it and then generate output images.
These output images need to be uploaded to a new bucket or the same bucket which was initially used.
Instance should terminate after this.
What I have done so far is, I have created a Lambda function that is triggered whenever an object is added to that bucket. It stores the name of the file and the path. I had read that I now need to use an SQS queue, pass this name and path metadata to the queue, and use SQS to trigger the instance. And then I need to run a script in the instance which pulls the metadata from the SQS queue and uses that to copy the file (mp4) from the bucket to the instance.
How do I do this?
I am new to AWS and hence do not know much about SQS or how to transfer metadata and automatically trigger instance, etc.
Your wording is a bit confusing. It says that you want to "start" an instance (which suggests that the instance already exists), but then it says that you want to "terminate" the instance (which would permanently remove it). I am going to assume that you actually intend to "stop" the instance so that it can be used again.
You can put a shell script in the /var/lib/cloud/scripts/per-boot/ directory. This script will then be executed every time the instance starts.
When the instance has finished processing, it can call sudo shutdown -h now to turn off the instance. (Alternatively, it can tell EC2 to stop the instance, but using shutdown is easier.)
For details, see: Auto-Stop EC2 instances when they finish a task - DEV Community
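For example, a minimal per-boot hook might look like this (the script name and the processing command are placeholders; cloud-init runs everything in that directory on each boot):

#!/bin/bash
# saved as /var/lib/cloud/scripts/per-boot/run-task.sh and made executable
set -euo pipefail
/home/ec2-user/process_video.sh     # placeholder for the actual download/convert/upload work
sudo shutdown -h now                # stop the instance once the task has finished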
I tried to answer in the most minimalist way; there are many points below that can be further improved. I think what follows is still quite a lot, since you mentioned you are new to AWS.
Using AWS Lambda with Amazon S3
Amazon S3 can send an event to a Lambda function when an object is created or deleted. You configure notification settings on a bucket, and grant Amazon S3 permission to invoke a function on the function's resource-based permissions policy.
When the object is uploaded it will trigger the Lambda function, which creates the instance with EC2 user data (Run commands on your Linux instance at launch).
For the EC2 instance, make sure you provide the necessary permissions for downloading and uploading the objects via an instance profile (Using instance profiles).
The user data has a script that does the rest of the work you need for your workflow:
1. Download the S3 object; you can pass the object name and S3 bucket name in the same script.
2. Once #1 has finished, start panorama.py, which processes the videos.
3. In the next step you can start uploading the objects to the S3 bucket.
4. Eventually, terminating the instance will be a bit tricky; you can achieve it by changing the instance-initiated shutdown behavior (Change the instance initiated shutdown behavior)
OR
you can use the method below for terminating the instance, but in that case your EC2 instance profile must have permission to terminate the instance:
aws ec2 terminate-instances --instance-ids "$(curl -s http://169.254.169.254/latest/meta-data/instance-id)"
You can wrap the above steps into a shell script inside the userdata.
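A rough sketch of such a user data script (the bucket names, paths, and panorama.py arguments are assumptions; the instance profile must allow the S3 calls and ec2:TerminateInstances):

#!/bin/bash
set -euo pipefail
export AWS_DEFAULT_REGION=$(curl -s http://169.254.169.254/latest/meta-data/placement/region)

BUCKET=my-video-bucket            # placeholder bucket; the Lambda can template the real name/key in
KEY=uploads/input.mp4
WORKDIR=/home/ec2-user/job
mkdir -p "$WORKDIR"

aws s3 cp "s3://$BUCKET/$KEY" "$WORKDIR/input.mp4"                 # 1. download the uploaded video
python3 /home/ec2-user/panorama.py "$WORKDIR/input.mp4"            # 2. process it into panorama images
aws s3 cp "$WORKDIR/output/" "s3://$BUCKET/output/" --recursive    # 3. upload the generated images

# 4. terminate this instance when done
INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
aws ec2 terminate-instances --instance-ids "$INSTANCE_ID"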
Lambda ec2 start instance:
def launch_instance(EC2, config, user_data):
    ec2_response = EC2.run_instances(
        ImageId=config['ami'],  # ami-0123b531fc646552f
        InstanceType=config['instance_type'],
        KeyName=config['ssh_key_name'],
        MinCount=1,
        MaxCount=1,
        SecurityGroupIds=config['security_group_ids'],
        TagSpecifications=tag_specs,
        # UserData=base64.b64encode(user_data).decode("ascii")
        UserData=user_data
    )

    new_instance_resp = ec2_response['Instances'][0]
    instance_id = new_instance_resp['InstanceId']
    print(f"[DEBUG] Full ec2 instance response data for '{instance_id}': {new_instance_resp}")

    return (instance_id, new_instance_resp)
Upload file to S3 -> Launch EC2 instance
