Writing JSON to AWS S3 from AWS Lambda

I am trying to write a response to AWS S3 as a new file each time.
Below is the code I am using:
s3 = boto3.resource('s3', region_name=region_name)
s3_obj = s3.Object(s3_bucket, f'/{folder}/{file_name}.json')
resp_ = s3_obj.put(Body=json.dumps(response_json).encode('UTF-8'))
I get a 200 response and the file appears in the bucket folder as well, but the call also produces the following exception:
[DEBUG] 2020-10-13T08:29:10.828Z. Event needs-retry.s3.PutObject: calling handler <bound method S3RegionRedirector.redirect_from_error of <botocore.utils.S3RegionRedirector object at 0x7f2cf2fdfe123>>
My code throws a 500 exception even though the write succeeds. The Lambda contains other business logic that runs fine, since the write to S3 is the last operation. Any help would be appreciated.

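The Key (filename) of an Amazon S3 object should not start with a slash (/). In the code above, use f'{folder}/{file_name}.json' as the key rather than f'/{folder}/{file_name}.json'.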
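A minimal sketch of the corrected write, assuming the same inputs as in the question. The helper names build_key and write_json are hypothetical, and the S3 resource is passed in as an argument (for example boto3.resource('s3', region_name=region_name)) so that the key handling can be checked without AWS access.

```python
import json


def build_key(folder, file_name):
    """Build an S3 object key that never starts with a slash."""
    name = file_name.strip("/")
    # Drop stray slashes so the key is 'folder/name.json', not '/folder/name.json'.
    parts = [folder.strip("/"), f"{name}.json"]
    return "/".join(part for part in parts if part)


def write_json(s3_resource, bucket, folder, file_name, payload):
    """Serialize payload to JSON and upload it with Object.put()."""
    key = build_key(folder, file_name)
    body = json.dumps(payload).encode("utf-8")
    # s3_resource is assumed to be a boto3 S3 resource created by the caller.
    response = s3_resource.Object(bucket, key).put(Body=body)
    return key, response
```

Stripping the slashes in one place keeps the key correct even if the folder value arrives with a leading or trailing slash.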

Related

Check if a file exists in S3 using GraphQL code

Can someone tell me how to check whether a file exists in an S3 bucket using GraphQL code?
As of now I am using:
await s3.headObject(existParams).promise();
But this is not working: the call never returns a response, and after waiting for 1 minute it times out (504).

Testing a connection to S3 from an EC2 Instance using Python 3

I'm attempting to establish whether an EC2 instance can reach S3. Currently I'm doing this through an upload:
import subprocess

try:
    # Create empty ping file
    subprocess.run(['touch', '/tmp/ping'])
    # Run upload commands
    upload_result = subprocess.run(['aws', 's3', 'cp', '/tmp/ping', 's3://mybucket/ping'])
    # Check if it succeeded
    upload_result.check_returncode()
except Exception as e:
    print('Could not reach S3')
However, I'm wondering if there's a more efficient (non-boto) way of doing this. The EC2 Instance does not have s3:getObject permissions, only s3:putObject which is intended. But if there's a way to establish it by a simple HTTPS request or something similar, I would love to hear about it.
A couple of things here are not ideal:
shelling out to the awscli (I would use the boto3 SDK instead)
invoking a mutating operation (PutObject) simply to test connectivity
You might consider giving the EC2 instance read access to a specific sentinel object (e.g. s3://mybucket/headtest) and then invoking HeadObject against it.
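For the plain HTTPS check the question asks about, here is a minimal standard-library sketch, offered as an alternative to the HeadObject suggestion rather than a replacement for it. It only shows that the endpoint answers over the network; it says nothing about whether the instance's permissions allow any S3 operation. The function name can_reach_s3 and the opener parameter (there so the function can be exercised without network access) are assumptions made for illustration.

```python
import urllib.error
import urllib.request


def can_reach_s3(url, timeout=5.0, opener=urllib.request.urlopen):
    """Return True if the endpoint at `url` answers over HTTPS at all."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # An HTTP error status (for example 403) still proves the endpoint answered.
        return True
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, TLS failure or timeout: not reachable.
        return False
```

A call might look like can_reach_s3("https://mybucket.s3.amazonaws.com/ping"), where the bucket name is the placeholder from the question. No credentials are sent, so a private bucket is expected to answer with an error status, which the function still counts as reachable.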

How to start an ec2 instance using sqs and trigger a python script inside the instance

I have a Python script which takes a video and converts it to a series of small panoramas. Now, there's an S3 bucket where a video (mp4) will be uploaded. I need this file to be sent to the EC2 instance whenever it is uploaded.
This is the flow:
1. Upload video file to S3.
2. This should trigger the EC2 instance to start.
3. Once it is running, I want the file to be copied to a particular directory inside the instance.
4. After this, I want the py file (panorama.py) to start running, read the video file from the directory, process it, and then generate output images.
5. These output images need to be uploaded to a new bucket or to the same bucket that was used initially.
6. The instance should terminate after this.
What I have done so far: I have created a Lambda function that is triggered whenever an object is added to that bucket. It stores the name of the file and the path. I have read that I now need to pass this name and path metadata to an SQS queue and use SQS to trigger the instance. Then I need to run a script in the instance that pulls the metadata from the SQS queue and uses it to copy the file (mp4) from the bucket to the instance.
How do I do this?
I am new to AWS and so do not know much about SQS, how to transfer metadata, how to trigger an instance automatically, and so on.
Your wording is a bit confusing. It says that you want to "start" an instance (which suggests that the instance already exists), but then it says that you want to "terminate" an instance (which would permanently remove it). I am going to assume that you actually intend to "stop" the instance so that it can be used again.
You can put a shell script in the /var/lib/cloud/scripts/per-boot/ directory. This script will then be executed every time the instance starts.
When the instance has finished processing, it can call sudo shutdown now -h to turn off the instance. (Alternatively, it can tell EC2 to stop the instance, but using shutdown is easier.)
For details, see: Auto-Stop EC2 instances when they finish a task - DEV Community
I have tried to answer as minimally as possible; many of the points below can be improved further. Even so, it is still quite a lot to take in, since you mentioned you are new to AWS.
Using AWS Lambda with Amazon S3
Amazon S3 can send an event to a Lambda function when an object is created or deleted. You configure notification settings on a bucket, and grant Amazon S3 permission to invoke a function on the function's resource-based permissions policy.
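As a rough sketch of what the Lambda function receives, the handler below pulls the bucket name and object key out of an S3 event notification. The function names are hypothetical, and the handler only logs and returns what was uploaded; launching the instance is left to the rest of the workflow.

```python
from urllib.parse import unquote_plus


def extract_s3_objects(event):
    """Return (bucket, key) pairs from an S3 event notification."""
    objects = []
    for record in event.get("Records", []):
        s3_info = record["s3"]
        bucket = s3_info["bucket"]["name"]
        # Object keys arrive URL-encoded in the event (a space becomes '+').
        key = unquote_plus(s3_info["object"]["key"])
        objects.append((bucket, key))
    return objects


def lambda_handler(event, context):
    """Hypothetical handler: log each uploaded object and return the list."""
    uploaded = extract_s3_objects(event)
    for bucket, key in uploaded:
        print(f"New object: s3://{bucket}/{key}")
    return [{"bucket": bucket, "key": key} for bucket, key in uploaded]
```

Decoding the key matters for file names with spaces or special characters; passing the raw value on to a later download step would point at an object that does not exist.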
When the object is uploaded, it triggers the Lambda function, which creates the instance with EC2 user data (see: Run commands on your Linux instance at launch).
For the EC2 instance, make sure you provide the necessary permissions for downloading and uploading the objects via an instance profile (see: Using instance profiles).
The user data contains a script that does the rest of the work your workflow needs:
1. Download the S3 object; you can pass the object name and the S3 bucket name into the same script.
2. Once #1 has finished, start panorama.py, which processes the videos.
3. In the next step, upload the output objects to the S3 bucket.
4. Finally, terminate the instance. This is a bit tricky; you can achieve it by changing the instance-initiated shutdown behavior (see: Change the instance initiated shutdown behavior),
OR
you can use the method below to terminate the instance, but in that case your EC2 instance profile must have permission to terminate the instance.
ec2-terminate-instances $(curl -s http://169.254.169.254/latest/meta-data/instance-id)
You can wrap the above steps into a shell script inside the userdata.
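The steps above could be rendered into a user data script roughly as follows. This is a sketch under several assumptions: the AWS CLI and panorama.py are already present on the AMI, panorama.py accepts an input file and an output directory as arguments, and the working directory and output prefix are placeholders. The helper names are hypothetical. Setting InstanceInitiatedShutdownBehavior to 'terminate' in the run_instances arguments makes the final shutdown terminate the instance instead of stopping it.

```python
import shlex


def build_user_data(bucket, key, output_bucket, workdir="/opt/panorama"):
    """Render a shell script for EC2 user data (paths are assumptions)."""
    source = shlex.quote(f"s3://{bucket}/{key}")
    target = shlex.quote(f"s3://{output_bucket}/output/")
    lines = [
        "#!/bin/bash",
        f"cd {shlex.quote(workdir)}",
        # 1. Download the uploaded video.
        f"aws s3 cp {source} ./input.mp4",
        # 2. Process it; the argument order for panorama.py is assumed.
        "python3 panorama.py ./input.mp4 ./output",
        # 3. Upload the generated images.
        f"aws s3 cp ./output {target} --recursive",
        # 4. Shut down; user data runs as root, so sudo is not needed.
        "shutdown -h now",
    ]
    return "\n".join(lines) + "\n"


def with_user_data(run_kwargs, user_data):
    """Return a copy of run_instances kwargs that terminates on shutdown."""
    merged = dict(run_kwargs)
    merged["UserData"] = user_data
    merged["InstanceInitiatedShutdownBehavior"] = "terminate"
    return merged
```

The resulting dictionary can be unpacked into a run_instances call, for example EC2.run_instances(**with_user_data(base_kwargs, script)), where base_kwargs holds the image, instance type, and count settings.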
Lambda ec2 start instance:
def launch_instance(EC2, config, user_data):
    # EC2 is a boto3 EC2 client.
    # tag_specs is not defined in this snippet; it is assumed to be defined elsewhere.
    ec2_response = EC2.run_instances(
        ImageId=config['ami'],  # ami-0123b531fc646552f
        InstanceType=config['instance_type'],
        KeyName=config['ssh_key_name'],
        MinCount=1,
        MaxCount=1,
        SecurityGroupIds=config['security_group_ids'],
        TagSpecifications=tag_specs,
        # boto3 base64-encodes UserData itself, so pass the plain script text.
        UserData=user_data
    )
    new_instance_resp = ec2_response['Instances'][0]
    instance_id = new_instance_resp['InstanceId']
    print(f"[DEBUG] Full ec2 instance response data for '{instance_id}': {new_instance_resp}")
    return (instance_id, new_instance_resp)
Upload file to S3 -> Launch EC2 instance

list_datasets() method does nothing in AWS Lambda

I am trying to get the list of datasets from BigQuery inside an AWS Lambda function. But when the client.list_datasets() call executes, nothing happens and the Lambda times out.
My code is as follows:
from google.cloud.bigquery import Client
from google.oauth2.service_account import Credentials

credentials = Credentials.from_service_account_info(
    service_account_dict)
client = Client(
    project=service_account_dict.get("project_id"),
    credentials=credentials
)
datasets = client.list_datasets()
print(datasets)
for dataset in datasets:
    print("dataset info", dataset.__dict__)
The output of the first print statement is:
<google.api_core.page_iterator.HTTPIterator object at 0x7fbae4975550>
But the second print, for dataset.__dict__, never produces any output; in other words, the loop over the HTTPIterator object never runs.
By the way, the code works perfectly fine on my local machine.
The AWS VPC that I used for the Lambda function was causing this issue: it blocked requests to external APIs (in my case, the BigQuery API).
Configuring the VPC subnet and a NAT Gateway to give the Lambda function access to the internet (0.0.0.0/0) solved the issue.
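One way to surface this kind of problem quickly, instead of waiting for the Lambda timeout, is a short outbound connection probe with its own timeout. The sketch below uses only the standard library; the function name can_connect is hypothetical, and bigquery.googleapis.com on port 443 is assumed to be the host the client library needs to reach.

```python
import socket


def can_connect(host, port=443, timeout=3.0):
    """Return True if a TCP connection to host:port opens within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

Calling can_connect("bigquery.googleapis.com") at the start of the handler and logging the result would show a blocked egress path within a few seconds. A successful connection only indicates that the network path is open, not that the credentials are valid.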

boto3 s3 connection error: An error occurred (SignatureDoesNotMatch) when calling the ListBuckets operation

I'm using the boto3 package to connect from outside an s3 cluster (i.e. the script is currently not being run within the AWS 'cloud', but from my MBP connecting to the relevant cluster). My code:
s3 = boto3.resource(
    "s3",
    aws_access_key_id=self.settings['CREDENTIALS']['aws_access_key_id'],
    aws_secret_access_key=self.settings['CREDENTIALS']['aws_secret_access_key'],
)
bucket = s3.Bucket(self.settings['S3']['bucket_test'])
for bucket_in_all in boto3.resource('s3').buckets.all():
    if bucket_in_all.name == self.settings['S3']['bucket_test']:
        print("Bucket {} verified".format(self.settings['S3']['bucket_test']))
Now I'm receiving this error message:
botocore.exceptions.ClientError: An error occurred (SignatureDoesNotMatch) when calling the ListBuckets operation
I'm aware of the sequence of how the aws credentials are checked, and tried different permutations of my environment variables and ~/.aws/credentials, and know that the credentials as per my .py script should override, however I'm still seeing this SignatureDoesNotMatch error message. Any ideas where I may be going wrong? I've also tried:
# Create a session
session = boto3.session.Session(
    aws_access_key_id=self.settings['CREDENTIALS']['aws_access_key_id'],
    aws_secret_access_key=self.settings['CREDENTIALS']['aws_secret_access_key'],
    aws_session_token=self.settings['CREDENTIALS']['session_token'],
    region_name=self.settings['CREDENTIALS']['region_name']
)
s3 = boto3.resource('s3')
for bucket in s3.buckets.all():
    print(bucket.name)
...however I also see the same error traceback.
Actually, this was partly answered by @John Rotenstein and @bdcloud; nevertheless, I need to be more specific...
The following code was not necessary in my case and was causing the error message:
# Create a session
session = boto3.session.Session(
    aws_access_key_id=self.settings['CREDENTIALS']['aws_access_key_id'],
    aws_secret_access_key=self.settings['CREDENTIALS']['aws_secret_access_key'],
    aws_session_token=self.settings['CREDENTIALS']['session_token'],
    region_name=self.settings['CREDENTIALS']['region_name']
)
The credentials now stored in self.settings mirror the ~/.aws/credentials. Weirdly (and like last week where the reverse happened), I now have access. It could be that a simple reboot of my laptop meant that my new credentials (since I updated these again yesterday) in ~/.aws/credentials were then 'accepted'.
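One detail in the question's snippets is worth noting: both loops call boto3.resource('s3') at module level, which builds the resource from the default session, so the explicitly supplied keys never reach the ListBuckets call. Whether that explains the SignatureDoesNotMatch error here is not certain. If explicit credentials are wanted, a minimal sketch is to create the resource from the session itself. The function names are hypothetical, and the session is passed in so the logic can be exercised without AWS access.

```python
def list_bucket_names(session):
    """List bucket names using the credentials held by `session`."""
    # session.resource(...) uses the session's own credentials; the
    # module-level boto3.resource(...) would use the default session instead.
    s3 = session.resource("s3")
    return [bucket.name for bucket in s3.buckets.all()]


def bucket_exists(session, bucket_name):
    """Return True if `bucket_name` is among the buckets visible to `session`."""
    return bucket_name in list_bucket_names(session)
```

Here session would be the boto3.session.Session object built from the values in self.settings, as in the question.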
