How to read the boto3 file object in OpenCV (Python 3)

I am trying to read a file from an AWS S3 presigned URL with OpenCV, but the result of the read is of NoneType. How can I read the boto3 file object in OpenCV and process it further?
import cv2
import boto3

s3Client = boto3.client('s3')
file_path = s3Client.generate_presigned_url(
    'get_object',
    Params={'Bucket': 'www.mybucket.com', 'Key': 'hello.txt'},
    ExpiresIn=100)
img = cv2.imread(file_path)
But the result is of <class 'NoneType'>, not an image. I need cv2 to be able to read the file.

Can you please try this? urllib2 is Python 2 only, so use urllib.request in Python 3, and since cv2.imread expects a file path rather than raw bytes, decode the downloaded bytes with cv2.imdecode:

import urllib.request

import cv2
import numpy as np

response = urllib.request.urlopen(file_path)
image_bytes = response.read()
img = cv2.imdecode(np.frombuffer(image_bytes, np.uint8), cv2.IMREAD_COLOR)
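Alternatively, a minimal sketch that skips the presigned URL and reads the object into memory directly with boto3 (bucket and key as in the question; note that cv2.imdecode returns None unless the key actually points at an encoded image, so a plain text file like hello.txt will still fail):

import boto3
import cv2
import numpy as np

s3 = boto3.client('s3')
# read the object body into memory and decode it as an image
obj = s3.get_object(Bucket='www.mybucket.com', Key='hello.txt')
img = cv2.imdecode(np.frombuffer(obj['Body'].read(), np.uint8), cv2.IMREAD_COLOR)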

Related

How to read parquet file from s3 using pandas

I am trying to read a parquet file that is in S3 using pandas.
Below is the code
import boto3
import pandas as pd

key = 'key'
secret = 'secret'
s3_client = boto3.client(
    's3',
    aws_access_key_id=key,
    aws_secret_access_key=secret,
    region_name='region_name'
)
print(s3_client)

AWS_S3_BUCKET = 'bucket_name'
filePath = 'data/wine_dataset'
response = s3_client.get_object(Bucket=AWS_S3_BUCKET, Key=filePath)
status = response.get("ResponseMetadata", {}).get("HTTPStatusCode")
if status == 200:
    print(f"Successful S3 get_object response. Status - {status}")
    books_df = pd.read_parquet(response.get("Body"))
    print(books_df)
else:
    print(f"Unsuccessful S3 get_object response. Status - {status}")
I am getting the below error
NoSuchKey: An error occurred (NoSuchKey) when calling the GetObject operation: The specified key does not exist.
But when I read the same S3 path using PySpark, it worked:
path = 's3a://bucket_name/data/wine_dataset'
df = spark.read.parquet(path)
I am not sure why it is not working with pandas. Can anyone help me with this?
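A likely cause: spark.read.parquet treats data/wine_dataset as a directory and reads the part files inside it, while get_object needs the exact key of a single object, hence NoSuchKey. If the s3fs package is available, pandas can read the dataset path directly; a minimal sketch (storage_options needs a reasonably recent pandas):

import pandas as pd

# pandas delegates s3:// paths to s3fs, which lists the part files like Spark does
df = pd.read_parquet(
    's3://bucket_name/data/wine_dataset',
    storage_options={'key': key, 'secret': secret},
)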

Kaggle login and unzip file to store in s3 bucket

Create a Lambda function for Python 3.7.
The role attached to the Lambda function should have S3 access and Lambda basic execution.
Read the data from https://www.kaggle.com/therohk/india-headlines-news-dataset/download and save it into S3 as CSV. The file is a zip; how do I unzip it and store it in a temp file?
I am getting failures in the AWS Lambda function below, a Lambda handler to download the news headline dataset from Kaggle:
import urllib3
import boto3
from botocore.client import Config

http = urllib3.PoolManager()

def lambda_handler(event, context):
    bucket_name = 'news-data-kaggle'
    file_name = "india-news-headlines.csv"
    lambda_path = "/tmp/" + file_name
    kaggle_info = {'UserName': "bossdk", 'Password': "xxx"}
    url = "https://www.kaggle.com/account/login"
    data_url = "https://www.kaggle.com/therohk/india-headlines-news-dataset/download"
    r = http.request('POST', url, kaggle_info)
    r = http.request('GET', data_url)
    f = open(lambda_path, 'wb')
    for chunk in r.iter_content(chunk_size=512 * 1024):
        if chunk:
            f.write(chunk)
    f.close()
    data = ZipFile(lambda_path)
    # S3 Connect
    s3 = boto3.resource('s3', config=Config(signature_version='s3v4'))
    # Uploaded File
    s3.Bucket(bucket_name).put(Key=lambda_path, Body=data, ACL='public-read')
    return {
        'status': 'True',
        'statusCode': 200,
        'body': 'Dataset Uploaded'
    }
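Several things here would fail: ZipFile is never imported, urllib3 responses have no iter_content (that is a requests method), a boto3 Bucket has no put method, and a plain POST to the login page is unlikely to authenticate the download (Kaggle expects its API token). A hedged sketch of the download-and-upload flow using the official kaggle package instead, assuming that package can be bundled with the Lambda and that KAGGLE_USERNAME/KAGGLE_KEY are set in its environment:

import boto3
from kaggle.api.kaggle_api_extended import KaggleApi

def lambda_handler(event, context):
    bucket_name = 'news-data-kaggle'
    file_name = "india-news-headlines.csv"
    # authenticate from the KAGGLE_USERNAME / KAGGLE_KEY environment variables
    api = KaggleApi()
    api.authenticate()
    # download the dataset zip to /tmp and unzip it in one step
    api.dataset_download_files(
        'therohk/india-headlines-news-dataset',
        path='/tmp',
        unzip=True)
    # upload the extracted CSV; upload_file streams the file from disk
    s3 = boto3.resource('s3')
    s3.Bucket(bucket_name).upload_file('/tmp/' + file_name, file_name)
    return {'statusCode': 200, 'body': 'Dataset Uploaded'}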

How to upload video to s3 using API GW and python?

I'm trying to make an API that uploads video to S3. I already managed to upload the video to S3, but the uploaded file does not play. I checked the content type of the video file, and it is binary/octet-stream instead of video/mp4, so I set the content type to "video/mp4" when calling the put_object API, but it still does not work.
I use a Lambda function to put the video into S3. Here is my Lambda code:
import json
import base64
import boto3

def lambda_handler(event, context):
    bucket_name = 'ad-live-streaming'
    s3_client = boto3.client('s3')
    file_content = event['content']
    merchantId = event['merchantId']
    catelogId = event['catelogId']
    file_name = event['fileName']
    file_path = '{}/{}/{}.mp4'.format(merchantId, catelogId, file_name)
    s3_response = s3_client.put_object(Bucket=bucket_name, Key=file_path,
                                       Body=file_content, ContentType='video/mp4')
    return {
        'statusCode': 200,
        "merchantId": merchantId,
        "catelogId": catelogId,
        "file_name": file_name,
    }
Any idea how to solve this issue?
Based on the example in Upload binary files to S3 using AWS API Gateway with AWS Lambda | by Omer Hanetz | The Startup | Medium, it appears that you need to decode the file from base64:
file_content = base64.b64decode(event['content'])
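For intuition, a quick standalone check of the round trip (the byte string is just a stand-in for real video bytes):

import base64

raw = b'\x00\x01\x02 video bytes'
encoded = base64.b64encode(raw).decode()  # roughly what API Gateway hands the Lambda for a binary payload
decoded = base64.b64decode(encoded)       # what should actually be written to S3
assert decoded == raw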

Writing string to S3 with boto3: "'dict' object has no attribute 'put'"

In an AWS lambda, I am using boto3 to put a string into an S3 file:
import boto3
s3 = boto3.client('s3')
data = s3.get_object(Bucket=XXX, Key=YYY)
data.put('Body', 'hello')
I am told this:
[ERROR] AttributeError: 'dict' object has no attribute 'put'
The same happens with data.put('hello'), which is the method recommended by the top answers at "How to write a file or data to an S3 object using boto3", and with data.put_object: 'dict' object has no attribute 'put_object'.
What am I doing wrong?
In contrast, reading works fine (with data.get('Body').read().decode('utf-8')).
put_object is a method of the s3 client object, not of data, which is just the dict returned by get_object.
Here is a full working example with Python 3.7:
import json
import logging
import boto3

s3 = boto3.client('s3')
logger = logging.getLogger()
logger.setLevel(logging.INFO)

def lambda_handler(event, context):
    bucket = 'mybucket'
    key = 'id.txt'
    id = None
    # Write id to S3
    s3.put_object(Body='Hello!', Bucket=bucket, Key=key)
    # Read id from S3
    data = s3.get_object(Bucket=bucket, Key=key)
    id = data.get('Body').read().decode('utf-8')
    logger.info("Id:" + id)
    return {
        'statusCode': 200,
        'body': json.dumps('Id:' + id)
    }
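For reference, the .put() style that the linked answers recommend belongs to the resource API's Object, not to the dict that the client's get_object returns; a minimal sketch of that variant:

import boto3

s3 = boto3.resource('s3')
# Object(...).put does exist on the resource API
s3.Object('mybucket', 'id.txt').put(Body='Hello!')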

Unable to read the buffer from BytesIO in google app engine flex environment

Here is the related code
import logging
logging.getLogger('googleapiclient.discovery_cache').setLevel(logging.ERROR)
import datetime
import io
import json
import httplib2
from flask import Flask, render_template, request
from flask import make_response
from googleapiclient.discovery import build
from googleapiclient.http import MediaIoBaseDownload
from oauth2client.client import AccessTokenCredentials
...
@app.route('/callback_download')
def userselectioncallback_with_drive_api():
    """
    Need to make it a background process
    """
    logging.info("In download callback...")
    code = request.args.get('code')
    fileId = request.args.get('fileId')
    logging.info("code %s", code)
    logging.info("fileId %s", fileId)
    credentials = AccessTokenCredentials(
        code,
        'flex-env/1.0')
    http = httplib2.Http()
    http_auth = credentials.authorize(http)
    # Exports a Google Doc to the requested MIME type and returns the exported
    # content. Please note that the exported content is limited to 10MB.
    # v3 does not work? over quota?
    drive_service = build('drive', 'v3', http=http_auth)
    drive_request = drive_service.files().export(
        fileId=fileId,
        mimeType='application/pdf')
    b = bytes()
    fh = io.BytesIO(b)
    downloader = MediaIoBaseDownload(fh, drive_request)
    done = False
    try:
        while done is False:
            status, done = downloader.next_chunk()
            # logging.log needs a level as its first argument; use info instead
            logging.info("Download %d%%.", int(status.progress() * 100))
    except Exception as err:
        logging.error(err)
        logging.error(err.__class__)
    response = make_response(fh.getbuffer())
    response.headers['Content-Type'] = 'application/pdf'
    response.headers['Content-Disposition'] = \
        'inline; filename=%s.pdf' % 'yourfilename'
    return response
It is based on a Drive API code example. I am trying to export some files from Google Drive to PDF format.
The exception comes from the line
response = make_response(fh.getbuffer())
It throws:
TypeError: 'memoryview' object is not callable
How can I retrieve the PDF content properly from fh? Do I need to apply some base64 encoding as well?
My local runtime is Python 3.4.3.
I have used an incorrect API. I should do this instead:
response = make_response(fh.getvalue())
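The difference, in a standalone snippet: getbuffer() returns a memoryview over the underlying buffer, while getvalue() returns a bytes copy, which is what make_response can serialize:

import io

buf = io.BytesIO(b'fake pdf bytes')
print(type(buf.getbuffer()))  # <class 'memoryview'>
print(type(buf.getvalue()))   # <class 'bytes'>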
