How to read an image from S3 with OpenCV using Python 3? - python-3.x

I have a bunch of images in my S3 bucket folder.
I have a list of keys from S3 (img_list) and I can read and display the images:
key = img_list[0]
bucket = s3_resource.Bucket(bucket_name)
img = bucket.Object(key).get().get('Body').read()
I have a function for that:
def image_from_s3(bucket, key):
    bucket = s3_resource.Bucket(bucket)
    image = bucket.Object(key)
    img_data = image.get().get('Body').read()
    return Image.open(io.BytesIO(img_data))
Now what I want is to read the image using OpenCV but I am getting an error:
key = img_list[0]
bucket = s3_resource.Bucket(bucket_name)
img = bucket.Object(key).get().get('Body').read()
cv2.imread(img)
SystemError Traceback (most recent call last)
<ipython-input-13-9561b5237a85> in <module>
2 bucket = s3_resource.Bucket(bucket_name)
3 img = bucket.Object(key).get().get('Body').read()
----> 4 cv2.imread(img)
SystemError: <built-in function imread> returned NULL without setting an error
How do I read it the right way?

Sorry, I got it wrong in the comments. This code sets up a PNG file in a memory buffer to simulate what you get from S3:
#!/usr/bin/env python3
from PIL import Image, ImageDraw
import numpy as np
import cv2
# Create new solid red image and save to disk as PNG
im = Image.new('RGB', (640, 480), (255, 0, 0))
im.save('image.png')
# Slurp entire contents, raw and uninterpreted, from disk to memory
with open('image.png', 'rb') as f:
    raw = f.read()
# "raw" should now be very similar to what you get from your S3 bucket
Now, all I need is this:
nparray = cv2.imdecode(np.asarray(bytearray(raw)), cv2.IMREAD_COLOR)
So, for your S3 object, all you should need is:
bucket = s3_resource.Bucket(bucket_name)
img = bucket.Object(key).get().get('Body').read()
nparray = cv2.imdecode(np.asarray(bytearray(img)), cv2.IMREAD_COLOR)
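One caveat worth adding: cv2.imdecode returns None instead of raising when the bytes cannot be decoded, so a quick check saves confusion later. A minimal sketch, reusing the key variable from above:
# imdecode returns None on failure rather than raising an exception
if nparray is None:
    raise ValueError(f'Could not decode image for key: {key}')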

Hopefully this is a good solution for the stated requirement: reading images from an S3 bucket using presigned URLs.
import os
import logging
import boto3
from botocore.client import Config
from botocore.exceptions import ClientError
import numpy as np
import urllib.request
import cv2
s3_signature = {
    'v4': 's3v4',
    'v2': 's3'
}
def create_presigned_url(bucket_name, bucket_key, expiration=3600, signature_version=s3_signature['v4']):
    s3_client = boto3.client('s3',
                             aws_access_key_id="AWS_ACCESS_KEY",
                             aws_secret_access_key="AWS_SECRET_ACCESS_KEY",
                             config=Config(signature_version=signature_version),
                             region_name='ap-south-1')
    try:
        response = s3_client.generate_presigned_url('get_object',
                                                    Params={'Bucket': bucket_name,
                                                            'Key': bucket_key},
                                                    ExpiresIn=expiration)
        print(s3_client.list_buckets()['Owner'])
        for key in s3_client.list_objects(Bucket=bucket_name, Prefix=bucket_key)['Contents']:
            print(key['Key'])
    except ClientError as e:
        logging.error(e)
        return None
    # The response contains the presigned URL
    return response
def url_to_image(url):
    resp = urllib.request.urlopen(url)
    image = np.asarray(bytearray(resp.read()), dtype="uint8")
    image = cv2.imdecode(image, cv2.IMREAD_COLOR)
    return image
seven_days_as_seconds = 604800
generated_signed_url = create_presigned_url(your_bucket_name, bucket_key,
                                            seven_days_as_seconds, s3_signature['v4'])
print(generated_signed_url)
image_complete = url_to_image(generated_signed_url)
# just for debugging purposes: show the image that was read
cv2.imshow('Final_Image',image_complete)
cv2.waitKey(0)
cv2.destroyAllWindows()
Use a for loop to iterate over all the keys in your key list, calling create_presigned_url for each one, as sketched below.
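A minimal sketch of that loop, assuming img_list and bucket_name from the original question, with bucket_key as the loop variable:
images = []
for bucket_key in img_list:
    signed_url = create_presigned_url(bucket_name, bucket_key,
                                      seven_days_as_seconds, s3_signature['v4'])
    if signed_url is not None:
        # decode each image straight from its presigned URL
        images.append(url_to_image(signed_url))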

Another alternative using download_fileobj and BytesIO:
from io import BytesIO
import cv2
import numpy as np
bucket = s3_resource.Bucket(bucket_name)
file_stream = BytesIO()
bucket.Object(key).download_fileobj(file_stream)
np_1d_array = np.frombuffer(file_stream.getbuffer(), dtype="uint8")
img = cv2.imdecode(np_1d_array, cv2.IMREAD_COLOR)
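Wrapped up as a helper, this mirrors the PIL-based image_from_s3 from the question but returns an OpenCV array. A sketch under the same assumptions (an existing s3_resource; the name cv2_image_from_s3 is made up here):
def cv2_image_from_s3(bucket, key):
    # download the object into an in-memory buffer, then decode with OpenCV
    file_stream = BytesIO()
    s3_resource.Bucket(bucket).Object(key).download_fileobj(file_stream)
    np_1d_array = np.frombuffer(file_stream.getbuffer(), dtype="uint8")
    return cv2.imdecode(np_1d_array, cv2.IMREAD_COLOR)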

Related

How to resize image using Pillow and tkinter

I am trying to display photos from the NASA API, but I only get a piece of the photo.
I tried changing the geometry of the app window, but I still have the same problem.
I also tried to resize the image, but I am getting errors.
import requests
from pprint import PrettyPrinter
import json
import datetime
from tkinter import *
from PIL import Image, ImageTk
from urllib.request import urlopen
root = Tk()
root.geometry("800x500")
root.title("Nasa photo of a day")
pp = PrettyPrinter()
api_key = 'PRIVATE KEY'
today_date = datetime.date.today()
URL_APOD = "https://api.nasa.gov/planetary/apod"
date = today_date
params = {
    'api_key': api_key,
    'date': date,
    'hd': 'True'
}
response_image = requests.get(URL_APOD, params=params)
json_data = json.loads(response_image.text)
image_url = json_data['hdurl']
print(image_url)
imageUrl = image_url
u = urlopen(imageUrl)
raw_data = u.read()
u.close()
nasa_img = ImageTk.PhotoImage(data=raw_data)
my_label1 = Label(image=nasa_img)
my_label1.pack()
root.mainloop()
This is how it looks (screenshot), and this is how it should look (screenshot).
You can pass the result of urlopen(...) to PIL.Image.open() to fetch the image and then resize it as below:
with urlopen(imageUrl) as fd:
    # fetch the image from the URL and resize it to the desired size
    image = Image.open(fd).resize((500, 500))
    nasa_img = ImageTk.PhotoImage(image)
    my_label1 = Label(image=nasa_img)
    my_label1.pack()
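If you would rather preserve the aspect ratio than force a 500x500 square, Image.thumbnail() resizes in place while keeping proportions. A hedged variant of the same idea (the 780x460 bounds are just a guess to fit the 800x500 window):
with urlopen(imageUrl) as fd:
    image = Image.open(fd)
    # thumbnail() shrinks in place and preserves the aspect ratio
    image.thumbnail((780, 460))
    nasa_img = ImageTk.PhotoImage(image)
    my_label1 = Label(image=nasa_img)
    my_label1.pack()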

Header is repeating when merging multiple files using Python shell in AWS Glue

I am new to Python and AWS Glue.
I am trying to merge a few Excel files from an S3 source bucket and generate one output file (CSV) in a target S3 bucket. I am able to read the files and generate the merged output, but the only problem is that the header is repeated from each file.
Can someone help me debug this and remove the repeating headers?
Below is my code:
import pandas as pd
import glob
import xlrd
import openpyxl
import boto3
import io
import json
import os
from io import StringIO
import numpy as np
s3 = boto3.resource('s3')
bucket = s3.Bucket('test bucket')
prefix_objs = bucket.objects.filter(Prefix='source/file')
prefix_df = []
for obj in prefix_objs:
    key = obj.key
    print(key)
    temp = pd.read_excel(obj.get()['Body'], encoding='utf8')
    prefix_df.append(temp)
bucket = 'test bucket'
csv_buffer = StringIO()
for current_df in prefix_df:
    current_df.to_csv(csv_buffer, index=None)
    print(current_df)
s3_resource = boto3.resource('s3')
s3_resource.Object(bucket, 'merge.csv').put(Body=csv_buffer.getvalue())
Please help!
Regards,
Vijay
Change this line by adding the header parameter:
temp = pd.read_excel(obj.get()['Body'], encoding='utf8')
to
temp = pd.read_excel(obj.get()['Body'], encoding='utf8', header=1)
or
temp = pd.read_excel(obj.get()['Body'], encoding='utf8', skiprows=1)
You may need to experiment with the header value, because sometimes the header does not start in the first row.
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_excel.html
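Alternatively (not part of the original answer), the duplicated headers can be avoided at write time: concatenate the frames first and call to_csv once, so the header is written only once. A sketch reusing prefix_df, csv_buffer, and bucket from the question:
# concatenate all frames, then write a single CSV with one header row
merged_df = pd.concat(prefix_df, ignore_index=True)
merged_df.to_csv(csv_buffer, index=None)
s3_resource = boto3.resource('s3')
s3_resource.Object(bucket, 'merge.csv').put(Body=csv_buffer.getvalue())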

Python cv2.imread() by url

I want to get the images by URL on lines 6 and 7 below. Any help or ideas?
import urllib
import numpy as np
import cv2
mkembed = ""
ourembed = ""
mkpic = cv2.imread("image.jpg")
ourpic = cv2.imread("image2.jpg")
difference = cv2.subtract(mkpic, ourpic)
b, g, r = cv2.split(difference)
if cv2.countNonZero(b) == 0 and cv2.countNonZero(g) == 0 and cv2.countNonZero(r) == 0:
    print("The images are completely Equal")
Use the following code to get an image by url with cv2:
#import necessary packages
import numpy as np
import urllib.request as urllib
import cv2
#get image by url
resp = urllib.urlopen("https://homepages.cae.wisc.edu/~ece533/images/airplane.png")
image = np.asarray(bytearray(resp.read()), dtype="uint8")
image = cv2.imdecode(image, cv2.IMREAD_COLOR)
#show image
cv2.imshow("Image", image)
cv2.waitKey()
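One follow-up worth knowing: OpenCV decodes into BGR channel order, so if the array is handed to a library that expects RGB (matplotlib, PIL), convert it first:
# cv2.imdecode yields BGR; convert to RGB before using matplotlib or PIL
rgb_image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)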

Read a video file object from S3 and use it for further processing through OpenCV

import boto3
import cv2
import numpy as np
s3 = boto3.resource('s3')
vid = (s3.Object('bucketname', 'video.blob').get()['Body'].read())
cap = cv2.VideoCapture(vid)
This is my code. I have a video file in an S3 bucket. I want to do some processing on it with OpenCV without downloading it, so I am storing the file contents in vid. The problem is that type(vid) is bytes, which results in the error TypeError: an integer is required (got type bytes)
on line 6. I tried converting it to an integer or a string but was unable to.
On an attempt to convert byte to an integer: I referred to this and was getting length issues. This is just a sample video file. The actual file I want to do processing on will be huge when converted to byte object.
On an attempt to get the object as a string and then convert it to an integer: I referred to this. Even this doesn't seem to work for me.
If anyone can help me solve this issue, I will be grateful. Please comment if anything is uncertain to you regarding my issue and I'll try to provide more details.
If streaming the video from a url is an acceptable solution, I think that is the easiest solution. You just need to generate a url to read the video from.
import boto3
import cv2
s3_client = boto3.client('s3')
bucket = 'bucketname'
key = 'video.blob'
url = s3_client.generate_presigned_url('get_object',
                                       Params={'Bucket': bucket, 'Key': key},
                                       ExpiresIn=600)  # this url will be available for 600 seconds
cap = cv2.VideoCapture(url)
ret, frame = cap.read()
You should see that you are able to read and process frames from that url.
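A hedged sketch of a typical frame-processing loop over that capture (the grayscale conversion is only a stand-in for whatever processing you need):
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break  # end of stream or read error
    # process the frame here, e.g. convert to grayscale
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
cap.release()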
Below are some useful code snippets for performing various operations on an S3 bucket.
import boto3
s3 = boto3.resource('s3', region_name='us-east-2')
Listing buckets in S3:
for bucket in s3.buckets.all():
    print(bucket.name)
Creating a bucket in S3:
my_bucket = s3.create_bucket(Bucket='Bucket Name', CreateBucketConfiguration={
    'LocationConstraint': 'us-east-2'
})
Listing objects inside a bucket:
my_bucket = s3.Bucket('Bucket Name')
for file in my_bucket.objects.all():
    print(file.key)
Uploading a file from the current directory:
import os
print(os.getcwd())
fileName = "B01.jpg"
bucketName = "Bucket Name"
# upload_file takes the local path directly, so there is no need to open() the file
s3.meta.client.upload_file(fileName, bucketName, 'test2.txt')
Reading an image/video from a bucket:
import matplotlib.pyplot as plt
s3 = boto3.resource('s3', region_name='us-east-2')
bucket = s3.Bucket('Bucket Name') # bucket name
object = bucket.Object('maisie_williams.jpg') # image name
object.download_file('B01.jpg') #donwload image with this name
img=plt.imread('B01.jpg') #read the downloaded image
imgplot = plt.imshow(img) #plot the image
plt.show()
Reading from one bucket and then dumping to another:
import boto3
s3 = boto3.resource('s3', region_name='us-east-2')
bucket = s3.Bucket('Bucket Name') # bucket name
object = bucket.Object('maisie_williams.jpg') # image name
object.download_file('B01.jpg')
fileName="B01.jpg"
bucketName="Bucket Name"
# upload_file takes the local path directly, so there is no need to open() the file
s3.meta.client.upload_file(fileName, bucketName, 'testz.jpg')
If you have access keys stored in a CSV, then you can do the following (this assumes pandas is imported as pd):
keys = pd.read_csv('accessKeys.csv')
# creating a session for S3 buckets
session = boto3.session.Session(aws_access_key_id=keys['Access key ID'][0],
                                aws_secret_access_key=keys['Secret access key'][0])
s3 = session.resource('s3')
buck = s3.Bucket('Bucket Name')
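As an aside (my suggestion, not part of the original answer): shipping access keys in a CSV is risky. boto3 can pick up credentials from environment variables or the shared credentials file automatically, so nothing sensitive needs to appear in code:
import boto3
# with AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY set in the environment,
# or a configured ~/.aws/credentials file, no keys appear in the source
s3 = boto3.resource('s3')
buck = s3.Bucket('Bucket Name')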

Write Pandas Dataframe to_csv StringIO instead of file

Objective of this code is to read an existing CSV file from a specified S3 bucket into a Dataframe, filter the dataframe for desired columns, and then write the filtered Dataframe to a CSV object using StringIO that I can upload to a different S3 bucket.
Everything works right now except the code block for the function "prepare_file_for_upload". Below is the full code block:
from io import StringIO
import io  # unused at the moment
import logging
import pandas as pd
import boto3
from botocore.exceptions import ClientError
FORMAT = '%(asctime)s - %(name)s - %(levelname)s - %(message)s'
logging.basicConfig(level=logging.INFO, format=FORMAT)
logger = logging.getLogger(__name__)
#S3 parameters
source_bucket = 'REPLACE'
source_folder = 'REPLACE/'
dest_bucket = 'REPLACE'
dest_folder = 'REPLACE'
output_name = 'REPLACE'
def get_file_name():
    try:
        s3 = boto3.client("s3")
        logging.info(f'Determining filename from: {source_bucket}/{source_folder}')
        bucket_path = s3.list_objects(Bucket=source_bucket, Prefix=source_folder)
        file_name = [key['Key'] for key in bucket_path['Contents']][1]
        logging.info(file_name)
        return file_name
    except ClientError as e:
        logging.info(f'Unable to determine file name from bucket {source_bucket}/{source_folder}')
        logging.info(e)
def get_file_data(file_name):
    try:
        s3 = boto3.client("s3")
        logging.info(f'file name from get data: {file_name}')
        obj = s3.get_object(Bucket=source_bucket, Key=file_name)
        body = obj['Body']
        body_string = body.read().decode('utf-8')
        file_data = pd.read_csv(StringIO(body_string))
        #logging.info(file_data)
        return file_data
    except ClientError as e:
        logging.info(f'Unable to read {file_name} into dataframe')
        logging.info(e)
def filter_file_data(file_data):
    try:
        all_columns = list(file_data.columns)
        columns_used = ('col_1', 'col_2', 'col_3')
        desired_columns = [x for x in all_columns if x in columns_used]
        filtered_data = file_data[desired_columns]
        logging.info(type(filtered_data))  # for testing
        return filtered_data
    except Exception as e:
        logging.info('Unable to filter file')
        logging.info(e)
The block below is where I am attempting to write the existing DataFrame to CSV with the to_csv method and a StringIO buffer instead of a local file. to_csv writes to a local file fine, but it does not work with the buffer (yes, I tried moving the buffer cursor to the start position afterwards, and still nothing).
def prepare_file_for_upload(filtered_data):  # this is the function block where I am stuck
    try:
        buffer = StringIO()
        output_name = 'FILE_NAME.csv'
        # code below writes to a file, but I cannot get it to write to the buffer
        output_file = filtered_data.to_csv(buffer, sep=',')
        df = pd.DataFrame(buffer)  # for testing
        logging.info(df)  # for testing
        return output_file
    except Exception as e:
        logging.info(f'Unable to prepare {output_name} for upload')
        logging.info(e)
def upload_file(adjusted_file):
    try:
        #dest_key = f'{dest_folder}/{output_name}'
        dest_key = f'{output_name}'
        s3 = boto3.resource('s3')
        s3.meta.client.upload_file(adjusted_file, dest_bucket, dest_key)
    except ClientError as e:
        logging.info(f'Unable to upload {output_name} to {dest_key}')
        logging.info(e)
def execute_program():
    file_name = get_file_name()
    file_data = get_file_data(file_name)
    filtered_data = filter_file_data(file_data)
    adjusted_file = prepare_file_for_upload(filtered_data)
    upload_file = upload_file(adjusted_file)

if __name__ == '__main__':
    execute_program()
The following solution worked for me:
csv_buffer = StringIO()
output_file = filtered_data.to_csv(csv_buffer)
s3_resource = boto3.resource('s3')
s3_resource.Object(dest_bucket, output_name).put(Body=csv_buffer.getvalue())
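Note that no seek(0) is needed here, because StringIO.getvalue() returns the buffer's entire contents regardless of the current stream position.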
When working with a BytesIO object, pay careful attention to the order of operations. In your code, you instantiate the BytesIO object and then fill it via a call to to_csv(). So far so good. But one thing to manage when working with a BytesIO object that is different from a file workflow is the stream position.
After writing data to the stream, the stream position is at the end of the stream. If you try to read from that position, you will likely read nothing! The operation completes, leaving you scratching your head about why no results are written to S3. Add a call to seek() with the argument 0 to your function. Here is a demo program that shows the behavior:
from io import BytesIO
import boto3
import pandas
from pandas import util
df = util.testing.makeMixedDataFrame()
s3_resource = boto3.resource("s3")
buffer = BytesIO()
df.to_csv(buffer, sep=",", index=False, mode="wb", encoding="UTF-8")
# The following call to `tell()` returns the stream position. 0 is the beginning of the file.
buffer.tell()
>> 134
# Reposition the stream to the beginning by calling `seek(0)` before uploading
buffer.seek(0)
s3_resource.Object("test-bucket", "test_df_from_resource.csv").put(Body=buffer.getvalue())
You should get a response similar to the following (with actual values):
>> {'ResponseMetadata': {'RequestId': 'request-id-value',
'HostId': '###########',
'HTTPStatusCode': 200,
'HTTPHeaders': {'x-amz-id-2': '############',
'x-amz-request-id': '00000',
'date': 'Tue, 31 Aug 2021 00:00:00 GMT',
'x-amz-server-side-encryption': 'value',
'etag': '"xxxx"',
'server': 'AmazonS3',
'content-length': '0'},
'RetryAttempts': 0},
'ETag': '"xxxx"',
'ServerSideEncryption': 'value'}
Changing the code to move the stream position should solve the issues you were facing. It is also worth mentioning that Pandas had a bug that caused unexpected behavior when writing to a bytes object. It was fixed, and the sample above assumes you are running a Python version greater than 3.8 and a Pandas version greater than 1.3.2. Further information on IO can be found in the Python documentation.
