Testing a connection to S3 from an EC2 Instance using Python 3

I'm attempting to establish whether an EC2 instance can reach S3. Currently I'm doing this through an upload:
import subprocess

try:
    # Create an empty ping file
    subprocess.run(['touch', '/tmp/ping'])
    # Run the upload command
    upload_result = subprocess.run(['aws', 's3', 'cp', '/tmp/ping', 's3://mybucket/ping'])
    # Check if it succeeded
    upload_result.check_returncode()
except Exception as e:
    print('Could not reach S3')
However, I'm wondering if there's a more efficient (non-boto) way of doing this. The EC2 instance does not have s3:GetObject permission, only s3:PutObject, which is intentional. But if there's a way to establish connectivity with a simple HTTPS request or something similar, I would love to hear about it.

A couple of things here are not ideal:
shelling out to the awscli (I would use the boto3 SDK instead)
invoking a mutating operation (PutObject) simply to test connectivity
You might consider giving the instance's role read access to a specific sentinel object (e.g. s3://mybucket/headtest) and then invoking HeadObject against it, as sketched below.
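A minimal sketch of that approach with boto3 (the bucket and sentinel key are just the placeholder names used above; treat this as an illustration, not a drop-in):

import boto3
from botocore.exceptions import ClientError

s3 = boto3.client('s3')

def can_reach_s3(bucket='mybucket', key='headtest'):
    """Check S3 reachability without writing anything."""
    try:
        # HeadObject is read-only; it needs s3:GetObject on the sentinel key
        s3.head_object(Bucket=bucket, Key=key)
        return True
    except ClientError as e:
        # A 403/404 response still proves the instance reached S3;
        # only connection-level failures mean S3 is unreachable.
        return e.response['Error']['Code'] in ('403', '404')
    except Exception:
        return False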

Related

How to increase the AWS lambda to lambda connection timeout or keep the connection alive?

I am using the boto3 Lambda client to invoke a lambda_S from a lambda_M. My code looks something like this:
import json
import boto3
import botocore.config

cfg = botocore.config.Config(
    retries={'max_attempts': 0},
    read_timeout=840,
    connect_timeout=600,
    # region_name="us-east-1"  # tried with and without this
)
lambda_client = boto3.client('lambda', config=cfg)  # even tried without config
invoke_response = lambda_client.invoke(
    FunctionName=lambda_name,
    InvocationType='RequestResponse',
    Payload=json.dumps(request)
)
Lambda_S is supposed to run for about 6 minutes, and I want lambda_M to stay alive to get the response back from lambda_S, but lambda_M times out after logging a CloudWatch message like
"Failed to connect to proxy URL: http://aws-proxy..."
I searched and found suggestions like "configure your HTTP client, SDK, firewall, proxy or operating system to allow long connections with timeout or keep-alive settings", but the issue is I have no idea how to do any of that with Lambda. Any help is highly appreciated.
I would approach this a bit differently. Lambda bills you for the time your function runs, so in general you should avoid waiting inside a function. One way to do that is to create an SNS topic and use it as the messenger to trigger the other Lambda.
The workflow goes like this:
SNS-A -> triggers lambda-A
SNS-B -> triggers lambda-B
So if your lambda-B wants to send something to lambda-A for processing and needs the results back, lambda-B publishes a message to the SNS-A topic and exits.
SNS-A triggers lambda-A, which does its work and, at the end, publishes a message to SNS-B.
SNS-B then triggers lambda-B, as sketched below.
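As a rough, hypothetical sketch of the hand-off from lambda-B (the topic ARN and payload fields are made up for illustration):

import json
import boto3

sns = boto3.client('sns')

def lambda_b_handler(event, context):
    # Hand the work off to lambda-A via its SNS topic, then exit immediately,
    # so lambda-B is not billed while lambda-A runs.
    sns.publish(
        TopicArn='arn:aws:sns:us-east-1:123456789012:SNS-A',  # hypothetical ARN
        Message=json.dumps({'request': event, 'reply_topic': 'SNS-B'}),
    )
    return 'handed off'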
AWS has example documentation on what policies you should put in place; here is one.
I don't know how you are automating the deployment of native assets like SNS and Lambda; assuming you will use CloudFormation:
you create your AWS::Lambda::Function
you create your AWS::SNS::Topic
in the topic's definition, you add a Subscription property and point it at your Lambda.
So in our example, SNS-A will have a subscription defined for lambda-A.
Lastly, you grant SNS permission to trigger the Lambda with AWS::Lambda::Permission.
When these three are in place, you are all set: sending messages to the SNS topic will now trigger the Lambda.
You will find SO answers on how to do this with CloudFormation (example), but you can also read the AWS CloudFormation documentation.
If you are not worried about automating this and just want to test it manually, then the aws-cli is your friend.
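If you would rather wire it up from a script than from CloudFormation, the same three pieces map onto boto3 calls roughly like this (ARNs and names are placeholders):

import boto3

sns = boto3.client('sns')
lambda_client = boto3.client('lambda')

topic_arn = sns.create_topic(Name='SNS-A')['TopicArn']
function_arn = 'arn:aws:lambda:us-east-1:123456789012:function:lambda-A'  # placeholder

# The subscription: point the topic at the function
sns.subscribe(TopicArn=topic_arn, Protocol='lambda', Endpoint=function_arn)

# The permission: allow SNS to invoke the function
lambda_client.add_permission(
    FunctionName='lambda-A',
    StatementId='AllowInvokeFromSNSA',
    Action='lambda:InvokeFunction',
    Principal='sns.amazonaws.com',
    SourceArn=topic_arn,
)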

How to start an ec2 instance using sqs and trigger a python script inside the instance

I have a python script which takes a video and converts it to a series of small panoramas. Now, there's an S3 bucket where a video (mp4) will be uploaded. I need this file to be sent to the EC2 instance whenever it is uploaded.
This is the flow:
Upload video file to S3.
This should trigger an EC2 instance to start.
Once it is running, I want the file to be copied to a particular directory inside the instance.
After this, I want the py file (panorama.py) to start running, read the video file from that directory, process it, and generate the output images.
These output images need to be uploaded to a new bucket or the same bucket which was initially used.
The instance should terminate after this.
What I have done so far: I have created a lambda function that is triggered whenever an object is added to that bucket; it stores the name and path of the file. I had read that I now need to use an SQS queue, pass this name and path metadata to the queue, and use SQS to trigger the instance. Then I need to run a script in the instance which pulls the metadata from the SQS queue and uses it to copy the file (mp4) from the bucket to the instance.
How do I do this?
I am new to AWS and hence do not know much about SQS, how to transfer metadata, how to automatically trigger an instance, etc.
Your wording is a bit confusing. It says that you want to "start" an instance (which suggests that the instance already exists), but then it says that you want to "terminate" an instance (which would permanently remove it). I am going to assume that you actually intend to "stop" the instance so that it can be used again.
You can put a shell script in the /var/lib/cloud/scripts/per-boot/ directory. This script will then be executed every time the instance starts.
When the instance has finished processing, it can call sudo shutdown now -h to turn off the instance. (Alternatively, it can tell EC2 to stop the instance, but using shutdown is easier.)
For details, see: Auto-Stop EC2 instances when they finish a task - DEV Community
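If you go with the "tell EC2 to stop the instance" alternative instead of shutdown, a hedged sketch (the instance profile would need ec2:StopInstances, and IMDSv2 would additionally need a token):

import urllib.request
import boto3

# The instance discovers its own ID from the metadata service (IMDSv1 shown)
instance_id = urllib.request.urlopen(
    'http://169.254.169.254/latest/meta-data/instance-id', timeout=2
).read().decode()

ec2 = boto3.client('ec2')
ec2.stop_instances(InstanceIds=[instance_id])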
I tried to answer in the most minimalist way; many of the points below can be further improved. I think this is still quite a lot to take in, since you mentioned you are new to AWS.
Using AWS Lambda with Amazon S3
Amazon S3 can send an event to a Lambda function when an object is created or deleted. You configure notification settings on a bucket, and grant Amazon S3 permission to invoke a function on the function's resource-based permissions policy.
When the object is uploaded, it will trigger the Lambda function, which creates the instance with EC2 user data (Run commands on your Linux instance at launch).
For the EC2 instance, make sure you provide the necessary permissions for downloading and uploading the objects via Using instance profiles.
The user data contains a script that does the rest of the work your workflow needs:
Download the S3 object; you can pass the object name and bucket name to the same script.
Once #1 has finished, start panorama.py, which processes the videos.
In the next step you can start uploading the output images to the S3 bucket.
Eventually, terminating the instance will be a bit tricky; you can achieve it via Change the instance initiated shutdown behavior
OR
you can use the method below for terminating the instance, but in that case your EC2 instance profile must have permission to terminate the instance.
ec2-terminate-instances $(curl -s http://169.254.169.254/latest/meta-data/instance-id)
You can wrap the above steps into a shell script inside the user data.
Lambda code to launch the EC2 instance:
def launch_instance(EC2, config, user_data):
    # NOTE: tag_specs is not defined in this snippet; it is assumed to be a
    # TagSpecifications list defined elsewhere in the original code.
    ec2_response = EC2.run_instances(
        ImageId=config['ami'],  # e.g. ami-0123b531fc646552f
        InstanceType=config['instance_type'],
        KeyName=config['ssh_key_name'],
        MinCount=1,
        MaxCount=1,
        SecurityGroupIds=config['security_group_ids'],
        TagSpecifications=tag_specs,
        # UserData=base64.b64encode(user_data).decode("ascii")
        UserData=user_data  # boto3 base64-encodes the string for you
    )

    new_instance_resp = ec2_response['Instances'][0]
    instance_id = new_instance_resp['InstanceId']
    print(f"[DEBUG] Full ec2 instance response data for '{instance_id}': {new_instance_resp}")

    return (instance_id, new_instance_resp)
Upload file to S3 -> Launch EC2 instance
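For context, a hedged sketch of how a Lambda handler might call launch_instance above from the S3 trigger; the event parsing follows the standard S3 notification shape, while config and the paths in the user-data script are assumptions:

import boto3

def lambda_handler(event, context):
    # Pull the uploaded object's location out of the S3 event
    record = event['Records'][0]
    bucket = record['s3']['bucket']['name']
    key = record['s3']['object']['key']

    # User-data script the new instance will run at boot (sketch only)
    user_data = f"""#!/bin/bash
aws s3 cp s3://{bucket}/{key} /home/ubuntu/input.mp4
python3 /home/ubuntu/panorama.py /home/ubuntu/input.mp4
"""

    ec2 = boto3.client('ec2')
    instance_id, _ = launch_instance(ec2, config, user_data)  # config assumed defined elsewhere
    return instance_id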

Can't sync s3 with ec2 folder from aws lambda

I am trying to automate data processing using AWS. I have set up an AWS lambda function in python that:
Gets triggered by an S3 PUT event
SSHes into an EC2 instance using a paramiko layer
Copies the new objects from the bucket into a folder in the instance, unzips the file inside the instance and runs a python script that cleans the csv files.
The problem is that the aws cli call to sync the s3 bucket with the ec2 folder is not working, but when I manually ssh into the ec2 instance and run the command it works. My aws-cli is configured with my access keys and the ec2 has an s3 role that allows it full access.
import time

import boto3
import paramiko


def lambda_handler(event, context):
    # Create a low-level client representing S3
    s3 = boto3.client('s3')

    ec2 = boto3.resource('ec2', region_name='eu-west-a')
    instance_id = 'i-058456c79fjcde676'
    instance = ec2.Instance(instance_id)

    # ------------------------------------------------------
    # Start the instance
    instance.start()

    # Allow some time for the instance to start
    time.sleep(30)

    # Print a few details of the instance
    print("Instance id - ", instance.id)
    print("Instance public IP - ", instance.public_ip_address)
    print("Instance private IP - ", instance.private_ip_address)
    print("Public dns name - ", instance.public_dns_name)
    print("----------------------------------------------------")

    print('Downloading pem file')
    s3.download_file('some_bucket', 'some_pem_file.pem', '/tmp/some_pem_file.pem')

    # Allow a few more seconds for the download to complete
    print('waiting for instance to start')
    time.sleep(30)

    print('sshing to instance')
    ssh = paramiko.SSHClient()
    ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    privkey = paramiko.RSAKey.from_private_key_file('/tmp/some_pem_file.pem')
    # Username is most likely 'ec2-user', 'root' or 'ubuntu'
    # depending on your EC2 AMI
    # s3_path = "s3://some_bucket/" + object_name
    ssh.connect(instance.public_dns_name, username='ubuntu', pkey=privkey)

    print('inside machine...running commands')
    stdin, stdout, stderr = ssh.exec_command(
        'aws s3 sync s3://some_bucket/ ~/ec2_folder;'
        ' bash ~/ec2_folder/unzip.sh; python3 ~/ec2_folder/process.py;')
    stdin.flush()
    data = stdout.read().splitlines()
    for line in data:
        print(line)

    print('done, closing ssh session')
    ssh.close()

    # Stop the instance
    instance.stop()

    return 'Triggered'
The use of an SSH tool is somewhat unusual.
Here are a few more 'cloud-friendly' options you might consider.
Systems Manager Run Command
The AWS Systems Manager Run Command allows you to execute a script on an Amazon EC2 instance (and, in fact, on any computer that is running the Systems Manager agent). It can even run the command on many (hundreds!) of instances/computers at the same time, keeping track of the success of each execution.
This means that, instead of connecting to the instance via SSH, the Lambda function could call the Run Command via an API call and Systems Manager would run the code on the instance.
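A rough sketch of what that Lambda-side call could look like with boto3 (the instance ID, paths and commands are placeholders, and the instance must be running the SSM agent with an SSM-enabled role):

import boto3

ssm = boto3.client('ssm')

def run_on_instance(instance_id, bucket, key):
    # Ask Systems Manager to run the processing commands on the instance
    response = ssm.send_command(
        InstanceIds=[instance_id],
        DocumentName='AWS-RunShellScript',
        Parameters={'commands': [
            f'aws s3 cp s3://{bucket}/{key} /home/ubuntu/ec2_folder/',
            'bash /home/ubuntu/ec2_folder/unzip.sh',
            'python3 /home/ubuntu/ec2_folder/process.py',
        ]},
    )
    return response['Command']['CommandId']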
Pull, Don't Push
Rather than 'pushing' the work to the instance, the instance could 'pull' the work:
Configure the Amazon S3 event to push a message into an Amazon SQS queue
Code on the instance could be regularly polling the SQS queue
When it finds a message on the queue, it runs a script that downloads the file (the bucket and key are passed in the message) and then runs the processing script
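A minimal sketch of such a polling loop on the instance (the queue URL, message fields and paths are assumptions):

import json
import boto3

sqs = boto3.client('sqs')
s3 = boto3.client('s3')
QUEUE_URL = 'https://sqs.us-east-1.amazonaws.com/123456789012/video-jobs'  # placeholder

while True:
    # Long-poll the queue for up to 20 seconds
    resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=1, WaitTimeSeconds=20)
    for msg in resp.get('Messages', []):
        body = json.loads(msg['Body'])
        bucket, key = body['bucket'], body['key']  # assumes the Lambda put these in the message
        s3.download_file(bucket, key, '/home/ubuntu/input.mp4')
        # ...run the processing script here...
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg['ReceiptHandle'])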
Trigger via HTTP
The instance could run a web server, listening for a message.
Configure the Amazon S3 event to push a message into an Amazon SNS topic
Add the instance's URL as an HTTP subscription to the SNS topic
When a message is sent to SNS, it forwards it to the instance's URL
Code in the web server then triggers your script
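A bare-bones sketch of such a web server; SNS first sends a SubscriptionConfirmation message containing a SubscribeURL, and subsequent messages arrive as Notification payloads:

import json
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class SnsHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers['Content-Length']))
        msg = json.loads(body)
        if msg.get('Type') == 'SubscriptionConfirmation':
            # Confirm the subscription by visiting the URL SNS provides
            urllib.request.urlopen(msg['SubscribeURL'])
        elif msg.get('Type') == 'Notification':
            payload = json.loads(msg['Message'])  # the S3 event forwarded by SNS
            # ...trigger the processing script with payload here...
        self.send_response(200)
        self.end_headers()

HTTPServer(('', 8080), SnsHandler).serve_forever()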
This answer is based on the additional information that you wish to shutdown the EC2 instance between executions.
I would recommend:
Amazon S3 Event triggers Lambda function
Lambda function starts the instance, passing filename information via the User Data field (it can be used to pass data, not just scripts). The Lambda function can then immediately exit (which is more cost-effective than waiting for the job to complete)
Put your processing script in the /var/lib/cloud/scripts/per-boot/ directory, which will cause it to run every time the instance is started (every time, not just the first time)
The script can extract the User Data passed from the Lambda function by retrieving http://169.254.169.254/latest/user-data/ (for example with curl), so that it knows the filename from S3
The script then processes the file
The script then runs sudo shutdown now -h to stop the instance
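Put together, the per-boot script could be a small Python program along these lines (the "bucket,key" User Data format and the paths are assumptions; the IMDSv1 endpoint is shown as in the answer):

import subprocess
import urllib.request

import boto3

# Read the filename the Lambda passed via User Data
user_data = urllib.request.urlopen(
    'http://169.254.169.254/latest/user-data/', timeout=2
).read().decode().strip()
bucket, key = user_data.split(',', 1)  # assumes "bucket,key" was passed

# Download and process the file
boto3.client('s3').download_file(bucket, key, '/home/ubuntu/input.mp4')
subprocess.run(['python3', '/home/ubuntu/panorama.py', '/home/ubuntu/input.mp4'], check=True)

# Work is done: stop the instance
subprocess.run(['sudo', 'shutdown', '-h', 'now'])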
If there is a chance that another file might come while the instance is already processing a file, then I would slightly change the process:
Rather than passing the filename via User Data, put it into an Amazon SQS queue
When the instance is started, it should retrieve the details from the SQS queue
After the file is processed, it should check the queue again to see if another message has been sent
If yes, process the file and repeat
If not, shut itself down
By the way, things can sometimes go wrong, so it's worth putting a 'circuit breaker' in the script so that it does not shut down the instance when you want to debug things. This could be a matter of passing a flag, or even adding a tag to the instance, which is checked before calling the shutdown command.
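One hedged way to implement that circuit breaker is to check for a tag before shutting down (the tag name here is made up):

import subprocess
import urllib.request

import boto3

instance_id = urllib.request.urlopen(
    'http://169.254.169.254/latest/meta-data/instance-id', timeout=2
).read().decode()

# Look for a hypothetical 'keep-alive' tag on this instance
tags = boto3.client('ec2').describe_tags(Filters=[
    {'Name': 'resource-id', 'Values': [instance_id]},
    {'Name': 'key', 'Values': ['keep-alive']},
])['Tags']

if not tags:
    # No circuit breaker set, so it is safe to shut down
    subprocess.run(['sudo', 'shutdown', '-h', 'now'])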

boto3 s3 connection error: An error occurred (SignatureDoesNotMatch) when calling the ListBuckets operation

I'm using the boto3 package to connect from outside an s3 cluster (i.e. the script is currently not being run within the AWS 'cloud', but from my MBP connecting to the relevant cluster). My code:
s3 = boto3.resource(
    "s3",
    aws_access_key_id=self.settings['CREDENTIALS']['aws_access_key_id'],
    aws_secret_access_key=self.settings['CREDENTIALS']['aws_secret_access_key'],
)
bucket = s3.Bucket(self.settings['S3']['bucket_test'])

for bucket_in_all in boto3.resource('s3').buckets.all():
    if bucket_in_all.name == self.settings['S3']['bucket_test']:
        print("Bucket {} verified".format(self.settings['S3']['bucket_test']))
Now I'm receiving this error message:
botocore.exceptions.ClientError: An error occurred (SignatureDoesNotMatch) when calling the ListBuckets operation
I'm aware of the sequence in which the AWS credentials are checked, and I've tried different permutations of my environment variables and ~/.aws/credentials, and I know that the credentials in my .py script should take precedence; however, I'm still seeing this SignatureDoesNotMatch error. Any ideas where I may be going wrong? I've also tried:
# Create a session
session = boto3.session.Session(
    aws_access_key_id=self.settings['CREDENTIALS']['aws_access_key_id'],
    aws_secret_access_key=self.settings['CREDENTIALS']['aws_secret_access_key'],
    aws_session_token=self.settings['CREDENTIALS']['session_token'],
    region_name=self.settings['CREDENTIALS']['region_name']
)

s3 = boto3.resource('s3')
for bucket in s3.buckets.all():
    print(bucket.name)
...however I also see the same error traceback.
Actually, this was partly answered by @John Rotenstein and @bdcloud; nevertheless, I need to be more specific...
The following code in my case was not necessary and causing the error message:
# Create a session
session = boto3.session.Session(
    aws_access_key_id=self.settings['CREDENTIALS']['aws_access_key_id'],
    aws_secret_access_key=self.settings['CREDENTIALS']['aws_secret_access_key'],
    aws_session_token=self.settings['CREDENTIALS']['session_token'],
    region_name=self.settings['CREDENTIALS']['region_name']
)
The credentials now stored in self.settings mirror those in ~/.aws/credentials. Weirdly (and like last week, where the reverse happened), I now have access. It could be that a simple reboot of my laptop meant that my new credentials in ~/.aws/credentials (which I updated again yesterday) were then 'accepted'.
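Not part of the original answer, but a quick way to see which credentials boto3 actually resolved when debugging this kind of mismatch is to ask STS:

import boto3

# Prints the account, user ID and ARN of whichever credentials boto3 picked up
# (environment variables, ~/.aws/credentials, instance profile, ...)
print(boto3.client('sts').get_caller_identity())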

AWS Lambda Python lots of "could not create '/var/task/__pycache__/FILENAMEpyc'" messages

In the configuration for my Python 3.6 AWS Lambda function I set the environment variable "PYTHONVERBOSE" to 1.
Then in the Cloudwatch logs for my function it shows lots of messages similar to:
could not create '/var/task/__pycache__/auth.cpython-36.pyc': OSError(30, 'Read-only file system')
Is this important? Do I need to fix it?
I don't think you can write to the /var/task/ folder. If you want to write something to disk inside the Lambda runtime, try the /tmp folder instead, as in the sketch below.
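For example, a write under /tmp inside the handler works, whereas the same write under /var/task raises the OSError above (a minimal sketch):

import os

def lambda_handler(event, context):
    # /tmp is the only writable path in the Lambda execution environment
    path = os.path.join('/tmp', 'scratch.txt')
    with open(path, 'w') as f:
        f.write('hello from lambda')
    return os.path.getsize(path)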
