How to download a PDF file from AWS API Gateway in Python

I'm creating an AWS API Gateway endpoint (GET) to return a PDF file and I'm facing a serialization issue.
An AWS Lambda function is mapped to fetch the file from S3.
import boto3
import base64

def lambda_handler(event, context):
    response = client.get_object(
        Bucket='test-bucket',
        Key=file_path,
    )
    data = response['Body'].read()
    return {
        'statusCode': 200,
        'isBase64Encoded': True,
        'body': data,
        'headers': {
            'content-type': 'application/pdf',
            'content-disposition': 'attachment; filename=test.pdf'
        }
    }
[ERROR] Runtime.MarshalError: Unable to marshal response: bytes is not JSON serializable.
If I return str(data, "utf-8") instead, the PDF file downloads but fails to open.
Please suggest where I'm going wrong.
Thanks.

You will need to initialize the client variable first, and then base64-encode the data coming back from S3, as follows:
import json
import boto3
import base64

client = boto3.client('s3')

def lambda_handler(event, context):
    bucket_name = 'bucket-name'
    file_name = 'file-name.pdf'
    fileObject = client.get_object(Bucket=bucket_name, Key=file_name)
    file_content = fileObject["Body"].read()
    print(bucket_name, file_name)
    # b64encode returns bytes; decode to str so the result is JSON serializable
    return base64.b64encode(file_content).decode('utf-8')
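If you want to keep the response shape from the question, so API Gateway returns the PDF with the right headers, here is a minimal sketch. It assumes the API has binary media types enabled for application/pdf (or */*); the key point is that the body must be a base64 string, not raw bytes:

import boto3
import base64

client = boto3.client('s3')

def lambda_handler(event, context):
    response = client.get_object(Bucket='test-bucket', Key='test.pdf')
    data = response['Body'].read()
    return {
        'statusCode': 200,
        'isBase64Encoded': True,
        # base64-encode the bytes, then decode to str for JSON serialization
        'body': base64.b64encode(data).decode('utf-8'),
        'headers': {
            'content-type': 'application/pdf',
            'content-disposition': 'attachment; filename=test.pdf'
        }
    }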

Related

AttributeError: module 'urllib3' has no attribute 'HTTPHeaderDict'

I am trying to send headers from Lambda into an API. I have taken HTTPHeaderDict from https://urllib3.readthedocs.io/en/latest/user-guide.html .
import urllib3
import os
import json

# Get environment variable
api_url = os.getenv('API_URL')

def lambda_handler(event, context):
    # print("Received event: " + json.dumps(event, indent=2))
    request = json.loads(json.dumps(event))
    print(json.dumps(request['headers']))
    headers = request['headers']
    request_headers = urllib3.HTTPHeaderDict()
    for key in headers:
        request_headers.add(key, headers[key])
    http = urllib3.PoolManager()
    response = http.request('GET', api_url + '/', headers=request_headers, timeout=10)
    return {
        'statusCode': response.status,
        'headers': response.headers,
        'body': response.data
    }
I see this error in CloudWatch:
[ERROR] AttributeError: module 'urllib3' has no attribute 'HTTPHeaderDict'
What version of urllib3 are you using? If you are using the latest pip-installed package, which is version 1.26.7, it won't be exposed at the package import level. If you look at the docs for the latest stable release, you'll see that it isn't mentioned as a top-level import.
The link you posted is for the 2.0.0dev0 version, which you would have to install from the GitHub repo itself. If you can't install from the repo, you should be able to access the HTTPHeaderDict class from the _collections module, i.e. from urllib3._collections import HTTPHeaderDict, and then call it as request_headers = HTTPHeaderDict().
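For example, a minimal sketch on urllib3 1.26.x (note that _collections is a private module, so this import may break in later releases):

import urllib3
from urllib3._collections import HTTPHeaderDict  # private path in 1.26.x

request_headers = HTTPHeaderDict()
request_headers.add('Accept', 'application/json')
request_headers.add('X-Custom', 'value')  # add() keeps multiple values per key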
It turns out HTTPHeaderDict is not JSON serializable, and the error was in the way I was sending the response.
import urllib3
import os
import json
# from urllib3._collections import HTTPHeaderDict

# Get environment variable
api_url = os.getenv('API_URL')

def lambda_handler(event, context):
    # print("Received event: " + json.dumps(event, indent=2))
    request = json.loads(json.dumps(event))
    http = urllib3.PoolManager()
    response = http.request('GET', api_url + '/', headers=request['headers'], timeout=10)
    # copy the HTTPHeaderDict into a plain dict so the response is JSON serializable
    response_headers = {}
    for header_key in response.headers:
        response_headers[header_key] = response.headers[header_key]
    return {
        'statusCode': response.status,
        'headers': response_headers,
        'body': json.dumps('Hello from Lambda!')
    }
I had to loop through the HTTPHeaderDict and put the values into another dictionary. As pointed out by @mata, request['headers'] can be used directly.
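A shorter variant of the same copy (a sketch, assuming urllib3 1.26.x, whose HTTPHeaderDict joins multi-valued headers with ', ' on lookup):

# inside the handler above, replacing the loop:
response_headers = dict(response.headers)  # plain, JSON-serializable dict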
P.S. I couldn't find AWS docs that indicate which version of urllib3 is bundled with the AWS Lambda Python runtime.

Kaggle login and unzip file to store in S3 bucket

Create a Lambda function for Python 3.7.
The role attached to the Lambda function should have S3 access and Lambda basic execution.
Read data from https://www.kaggle.com/therohk/india-headlines-news-dataset/download and save it into S3 as CSV. The file is a zip; how do I unzip it and store it in a temp file?
I'm getting a failure in the AWS Lambda function:
Lambda handler to download the news headline dataset from Kaggle:
import urllib3
import boto3
from botocore.client import Config

http = urllib3.PoolManager()

def lambda_handler(event, context):
    bucket_name = 'news-data-kaggle'
    file_name = "india-news-headlines.csv"
    lambda_path = "/tmp/" + file_name
    kaggle_info = {'UserName': "bossdk", 'Password': "xxx"}
    url = "https://www.kaggle.com/account/login"
    data_url = "https://www.kaggle.com/therohk/india-headlines-news-dataset/download"
    r = http.request('POST', url, kaggle_info)
    r = http.request('GET', data_url)
    f = open(lambda_path, 'wb')
    for chunk in r.iter_content(chunk_size=512 * 1024):
        if chunk:
            f.write(chunk)
    f.close()
    data = ZipFile(lambda_path)
    # S3 Connect
    s3 = boto3.resource('s3', config=Config(signature_version='s3v4'))
    # Uploaded File
    s3.Bucket(bucket_name).put(Key=lambda_path, Body=data, ACL='public-read')
    return {
        'status': 'True',
        'statusCode': 200,
        'body': 'Dataset Uploaded'
    }
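Several things stand out in this code: iter_content is a requests method (urllib3 responses stream via stream() with preload_content=False), ZipFile is used but never imported, the session cookie from the login POST is never carried over to the download, and a boto3 Bucket resource has no put() method (use put_object or upload_file). Below is a minimal sketch of the download/unzip/upload flow. It assumes the dataset URL is directly downloadable; Kaggle's real login flow requires API-token authentication, which this sketch skips:

import urllib3
import boto3
from zipfile import ZipFile

http = urllib3.PoolManager()
s3 = boto3.client('s3')

def lambda_handler(event, context):
    bucket_name = 'news-data-kaggle'
    file_name = 'india-news-headlines.csv'
    zip_path = '/tmp/dataset.zip'
    data_url = "https://www.kaggle.com/therohk/india-headlines-news-dataset/download"
    # Stream the zip to /tmp (urllib3 has no iter_content)
    r = http.request('GET', data_url, preload_content=False)
    with open(zip_path, 'wb') as f:
        for chunk in r.stream(512 * 1024):
            f.write(chunk)
    r.release_conn()
    # Extract the CSV into /tmp, then upload the extracted file to S3
    with ZipFile(zip_path) as z:
        z.extract(file_name, '/tmp')
    s3.upload_file('/tmp/' + file_name, bucket_name, file_name)
    return {
        'statusCode': 200,
        'body': 'Dataset Uploaded'
    }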

How to get AWS S3 object location/URL using Python 3.8?

I am uploading a file to AWS S3 using an AWS Lambda function (Python 3.8) with the following code.
file_obj = open(filename, 'rb')
s3_upload = s3.put_object(Bucket="aaa", Key="aaa.png", Body=file_obj)
return {
    'statusCode': 200,
    'body': json.dumps("Executed Successfully")
}
I want to get the location/URL of the S3 object in return. In Node.js we use the .location parameter to get the object location/URL.
Any idea how to do this using Python 3.8?
The URL of an S3 object has a known format and follows virtual-hosted-style access:
https://bucket-name.s3.Region.amazonaws.com/keyname
Thus, you can construct the url yourself:
bucket_name = 'aaa'
aws_region = boto3.session.Session().region_name
object_key = 'aaa.png'
s3_url = f"https://{bucket_name}.s3.{aws_region}.amazonaws.com/{object_key}"
return {
    'statusCode': 200,
    'body': json.dumps({'s3_url': s3_url})
}
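Note that this constructed URL only works for callers who can access the object (for example, a public-read object). For private objects, one alternative is a presigned URL; a minimal sketch using the question's bucket and key:

import boto3

s3_client = boto3.client('s3')
# Time-limited URL that grants access to the private object
s3_url = s3_client.generate_presigned_url(
    'get_object',
    Params={'Bucket': 'aaa', 'Key': 'aaa.png'},
    ExpiresIn=3600,  # seconds
)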

How to upload video to S3 using API Gateway and Python?

I'm trying to make an API which will upload video to S3. I already managed to upload the video to S3, but the video file doesn't play. I checked the content type of the video file, and it's binary/octet-stream instead of video/mp4. So I set the content type to "video/mp4" while calling the put_object API, but it still doesn't work.
I use a Lambda function to put the video into S3. Here is my Lambda code:
import json
import base64
import boto3

def lambda_handler(event, context):
    bucket_name = 'ad-live-streaming'
    s3_client = boto3.client('s3')
    file_content = event['content']
    merchantId = event['merchantId']
    catelogId = event['catelogId']
    file_name = event['fileName']
    file_path = '{}/{}/{}.mp4'.format(merchantId, catelogId, file_name)
    s3_response = s3_client.put_object(Bucket=bucket_name, Key=file_path, Body=file_content, ContentType='video/mp4')
    return {
        'statusCode': 200,
        "merchantId": merchantId,
        "catelogId": catelogId,
        "file_name": file_name,
    }
Any idea how to solve this issue ?
Based on the example in Upload binary files to S3 using AWS API Gateway with AWS Lambda | by Omer Hanetz | The Startup | Medium, it appears that you need to decode the file from base64:
file_content = base64.b64decode(event['content'])
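Applied to the handler above, a minimal sketch (same event fields and bucket as the question):

import base64
import boto3

def lambda_handler(event, context):
    s3_client = boto3.client('s3')
    # API Gateway delivers binary payloads base64-encoded; decode before upload
    file_content = base64.b64decode(event['content'])
    file_path = '{}/{}/{}.mp4'.format(event['merchantId'], event['catelogId'], event['fileName'])
    s3_client.put_object(Bucket='ad-live-streaming', Key=file_path,
                         Body=file_content, ContentType='video/mp4')
    return {'statusCode': 200, 'file_path': file_path}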

Writing string to S3 with boto3: "'dict' object has no attribute 'put'"

In an AWS Lambda function, I am using boto3 to put a string into an S3 file:
import boto3
s3 = boto3.client('s3')
data = s3.get_object(Bucket=XXX, Key=YYY)
data.put('Body', 'hello')
I am told this:
[ERROR] AttributeError: 'dict' object has no attribute 'put'
The same happens with data.put('hello'), which is the method recommended by the top answers at How to write a file or data to an S3 object using boto3, and with data.put_object: 'dict' object has no attribute 'put_object'.
What am I doing wrong?
By contrast, reading works great (with data.get('Body').read().decode('utf-8')).
put_object is a method of the s3 client, not of the dict that get_object returns.
Here is a full working example with Python 3.7:
import json
import logging
import boto3

s3 = boto3.client('s3')
logger = logging.getLogger()
logger.setLevel(logging.INFO)

def lambda_handler(event, context):
    bucket = 'mybucket'
    key = 'id.txt'
    id = None
    # Write id to S3
    s3.put_object(Body='Hello!', Bucket=bucket, Key=key)
    # Read id from S3
    data = s3.get_object(Bucket=bucket, Key=key)
    id = data.get('Body').read().decode('utf-8')
    logger.info("Id:" + id)
    return {
        'statusCode': 200,
        'body': json.dumps('Id:' + id)
    }
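If you were following answers that call .put(...), that method lives on the resource API's Object, not on the client or on the dict that get_object returns. A minimal sketch of that variant, using the same bucket and key as above:

import boto3

# The resource API exposes Object(...).put(), which the linked answers use
s3_resource = boto3.resource('s3')
s3_resource.Object('mybucket', 'id.txt').put(Body='Hello!')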
