I want to upload a figure made with matplotlib to GCS.
Current code:
from tensorflow.gfile import MakeDirs, Open
import numpy as np
import matplotlib.pyplot as plt
import datetime
_LOGDIR = "{date:%Y%m%d-%H%M%S}".format(date=datetime.datetime.now())
_PATH_LOGDIR = 'gs://{0}/logs/{1}'.format('skin_cancer_mnist', _LOGDIR)
MakeDirs(_PATH_LOGDIR)

def saving_figure(path_logdir):
    data = np.arange(0, 21, 2)
    fig = plt.figure(figsize=(20, 10))
    plt.plot(data)
    fig.savefig("{0}/accuracy_loss_graph.png".format(path_logdir))
    plt.close()

saving_figure(_PATH_LOGDIR)
This raises:
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/matplotlib/backends/backend_agg.py", line 512, in print_png
    filename_or_obj = open(filename_or_obj, 'wb')
FileNotFoundError: [Errno 2] No such file or directory: 'gs://skin_cancer_mnist/logs/20190116-195604/accuracy_loss_graph.png'
(The directory exists, I checked)
I could change the matplotlib source code to use tf.gfile.Open instead of the built-in open, but there should be a better option...
Joan's second option didn't work for me, but I found a solution that did:
from google.cloud import storage
import io
import numpy as np
import matplotlib.pyplot as plt

def saving_figure(path_logdir):
    data = np.arange(0, 21, 2)
    fig = plt.figure(figsize=(20, 10))
    plt.plot(data)
    fig_to_upload = plt.gcf()
    # Save figure image to a bytes buffer
    buf = io.BytesIO()
    fig_to_upload.savefig(buf, format='png')
    # init GCS client and upload buffer contents
    client = storage.Client()
    bucket = client.get_bucket('skin_cancer_mnist')
    blob = bucket.blob('logs/20190116-195604/accuracy_loss_graph.png')
    blob.upload_from_file(buf, content_type='image/png', rewind=True)
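The rewind=True argument matters because the buffer's position sits at the end once savefig has written to it, so an upload that starts reading from the current position would find nothing. A minimal stdlib-only sketch of that behavior; the byte string is a placeholder standing in for the PNG data (an assumption for illustration, not real image bytes):

```python
import io

buf = io.BytesIO()
# Placeholder for fig.savefig(buf, format='png'); not a real PNG
buf.write(b"\x89PNG placeholder bytes")

# After writing, the position is at the end, so reading yields nothing
at_end = buf.read()

# Seeking back to the start is what rewind=True asks the client to do
buf.seek(0)
from_start = buf.read()
```

Calling buf.seek(0) yourself before the upload should have the same effect as passing rewind=True.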
You cannot upload a file to Google Cloud Storage directly with Python's built-in open function, which is what matplotlib.pyplot.savefig uses under the hood.
Instead, use the Cloud Storage Client Library for Python; see its documentation for details on how it is used. It lets you manipulate files and upload or download them to and from GCS, among other things.
To use the library, install it with pip install google-cloud-storage and import it with from google.cloud import storage.
Also, since plt.figure is an object and not the actual .png image that you want to upload, you cannot upload it to Google Cloud Storage directly either.
However, you can do either of the following:
Option 1: Save the image locally, and then upload it to Google Cloud Storage:
Using your code:
from google.cloud import storage
import numpy as np
import matplotlib.pyplot as plt

def saving_figure(path_logdir):
    data = np.arange(0, 21, 2)
    fig = plt.figure(figsize=(20, 10))
    plt.plot(data)
    # Save the figure to a local file first
    fig.savefig("your_local_path/accuracy_loss_graph.png")
    plt.close()
    # init GCS client and upload file
    client = storage.Client()
    bucket = client.get_bucket('skin_cancer_mnist')
    blob = bucket.blob('logs/20190116-195604/accuracy_loss_graph.png') # This defines the path where the file will be stored in the bucket
    blob.upload_from_filename(filename="your_local_path/accuracy_loss_graph.png")
Option 2: Save the figure's image data to a variable, then upload it to GCS as a byte string:
I found a StackOverflow answer that seems to save the figure image into a .png byte string; however, I haven't tried it myself.
Again, based on your code:
from google.cloud import storage
import io
import numpy as np
import matplotlib.pyplot as plt

def saving_figure(path_logdir):
    data = np.arange(0, 21, 2)
    fig = plt.figure(figsize=(20, 10))
    plt.plot(data)
    fig_to_upload = plt.gcf()
    # Save figure image to a bytes buffer
    buf = io.BytesIO()
    fig_to_upload.savefig(buf, format='png')
    # Take the raw PNG bytes; base64-encoding them would store a file that is not a valid PNG
    image_as_bytes = buf.getvalue()
    # init GCS client and upload buffer contents
    client = storage.Client()
    bucket = client.get_bucket('skin_cancer_mnist')
    blob = bucket.blob('logs/20190116-195604/accuracy_loss_graph.png') # This defines the path where the file will be stored in the bucket
    blob.upload_from_string(image_as_bytes, content_type='image/png')
Edit: Both options assume that the environment you run the script from has the Cloud SDK installed and an authenticated Google Cloud account activated (if it doesn't, check the documentation that explains how to set this up).
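Both options hard-code the bucket name and the blob path, while the question builds a single gs:// path. A small sketch of how such a path could be split into the two pieces the client library expects; split_gcs_uri is a hypothetical helper name, not part of google-cloud-storage:

```python
from urllib.parse import urlsplit

def split_gcs_uri(uri):
    """Split 'gs://bucket/path/to/blob' into (bucket_name, blob_name)."""
    parts = urlsplit(uri)
    if parts.scheme != "gs" or not parts.netloc:
        raise ValueError("not a gs:// URI: {!r}".format(uri))
    # The leading slash is not part of the blob name
    return parts.netloc, parts.path.lstrip("/")
```

The results could then be passed on as client.get_bucket(bucket_name) and bucket.blob(blob_name).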
Related
I want to run an Azure function and save a CSV output to an Azure container.
I currently have two blocks of code: one generates a CSV file and the other loads a CSV file into my container.
Each block works on my local PC in a Jupyter Notebook, but I am struggling to combine them so that they work together in an Azure function, so I am looking for help.
Block 1 (Generate the CSV)
import yfinance as yf
import pandas as pd
from datetime import date
import csv

#stock names
NZX = [["Ascension Capital Limited", "ACE"], ["AFC Group Holdings Limited", "AFC"], ["Z Energy Limited", "ZEL"]]
today = str(date.today().isoformat())
directory = "C:\\Users\\Etc...\\SharePrices\\CSVs\\"
df_list = list()
for i in NZX:
    code = i[1]
    name = i[0]
    cmpy = f"{code}.NZ"
    tickerStrings = [cmpy]
    for ticker in tickerStrings:
        data = yf.download(ticker, group_by="Ticker", period='1d')
        data['ticker'] = ticker
        df_list.append(data)
df = pd.concat(df_list)
df.to_csv(f"{directory}_{today}.csv")
Block 2 (Load the CSV into the container)
from azure.storage.blob import BlobClient

blob = BlobClient.from_connection_string(conn_str="Myconnectionstring", container_name="container1", blob_name="StevesBlob3.csv")
with open("./output.csv", "rb") as data:
    blob.upload_blob(data)
Can anyone point me in the right direction? These are the issues I am currently struggling with:
Do I need to save the file in a temp folder in the Azure function before trying to move it, or can I push it directly to the container?
How do I reference the destination folder/container when I save the CSV?
Any guidance would be much appreciated; I am new to Azure functions.
Example with a generated CSV file
#creates random csv in blob storage
import numpy as np
import pandas as pd
from datetime import datetime
from azure.storage.blob import BlobClient

#Create dynamic filename
dateTimeObj = datetime.now()
timestampStr = dateTimeObj.strftime("%d%b%Y%H%M%S")
filename = f"{timestampStr}.csv"
df = pd.DataFrame(np.random.randn(5, 3), columns=['Column1', 'Column2', 'Column3'])
df.to_csv(filename, index=False)
blob = BlobClient.from_connection_string(
    conn_str="DefaultEndpointsProtocol=https;AccountName=storageaccountXXXXXXX;AccountKey=XXXXXXXXXXXXXXXX;EndpointSuffix=core.windows.net",
    container_name="container2",
    blob_name=filename)
with open(filename, "rb") as data:
    blob.upload_blob(data)
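On the question of whether a temp file is needed at all: pandas returns the CSV as a string when to_csv is called without a path, so the data can probably be pushed to the container straight from memory. A minimal sketch under that assumption; upload_dataframe_as_csv is a hypothetical helper, and blob_client is expected to be a BlobClient created as in the example above:

```python
def upload_dataframe_as_csv(df, blob_client, overwrite=True):
    """Serialize df to CSV in memory and upload it via blob_client.upload_blob."""
    # With no path argument, to_csv returns the CSV text instead of writing a file
    csv_bytes = df.to_csv(index=False).encode("utf-8")
    blob_client.upload_blob(csv_bytes, overwrite=overwrite)
    return len(csv_bytes)
```

The destination is then determined only by the container_name and blob_name given to BlobClient.from_connection_string, not by any local folder.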
I downloaded the fastText model lid.176.bin. If I run my code locally with the model in a folder, everything works fine. But I need to run it in Google Cloud, so I uploaded the model to a bucket and changed the model path from the local path to the gs:// bucket path. That gave an error: ValueError: gs://models/fasttext-model/lid.176.bin cannot be opened for loading!
How can I use the model from the bucket?
path_to_pretrained_model = 'gs://models/fasttext-model/lid.176.bin'
fasttext_model = fasttext.load_model(path_to_pretrained_model)
This function from the Google documentation solved the problem for me: it downloads the model to a local file, which can then be loaded as usual.
from google.cloud import storage

def download_blob(bucket_name, source_blob_name, destination_file_name):
    """Downloads a blob from the bucket."""
    # bucket_name = "your-bucket-name"
    # source_blob_name = "storage-object-name"
    # destination_file_name = "local/path/to/file"
    storage_client = storage.Client()
    bucket = storage_client.bucket(bucket_name)
    # Construct a client side representation of a blob.
    # Note `Bucket.blob` differs from `Bucket.get_blob` as it doesn't retrieve
    # any content from Google Cloud Storage. As we don't need additional data,
    # using `Bucket.blob` is preferred here.
    blob = bucket.blob(source_blob_name)
    blob.download_to_filename(destination_file_name)
    print(
        "Blob {} downloaded to {}.".format(
            source_blob_name, destination_file_name
        )
    )
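To tie the download back to the loading step, here is a sketch that downloads into a temporary directory and hands the local path to a loader. load_from_bucket is a hypothetical helper; it assumes the loader reads the whole file before returning, because the temporary copy is deleted afterwards:

```python
import os
import tempfile

def load_from_bucket(download, load, bucket_name, blob_name):
    """Download a blob to a temp file via `download`, then pass the local path to `load`."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        local_path = os.path.join(tmp_dir, os.path.basename(blob_name))
        # `download` takes (bucket_name, source_blob_name, destination_file_name)
        download(bucket_name, blob_name, local_path)
        # The temp directory is removed when this block exits
        return load(local_path)
```

With the names from this thread, a call might look like load_from_bucket(download_blob, fasttext.load_model, 'models', 'fasttext-model/lid.176.bin').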
I have a bucket on S3.
I want to be able to connect to it and read the pictures/PDFs into my EC2 machine memory, perform OCR and get needed fields.
Here is what I have done so far but unfortunately it doesn't work.
import cv2
import boto3
import matplotlib
import numpy as np
import pytesseract
from PIL import Image
boto3.setup_default_session(profile_name='default-mfasession')
s3_client = boto3.client('s3')
s3_resource = boto3.resource('s3')
bucket_name = "my_bucket"
key = "my-files/._Screenshot 2020-04-20 at 14.21.20.png"
bucket = s3_resource.Bucket(bucket_name)
object = bucket.Object(key)
response = object.get()
file_stream = response['Body']
im = Image.open(file_stream)
np.array(im)
It returns an error:
UnidentifiedImageError: cannot identify image file <_io.BytesIO object at 0x7fae33dce110>
I have tried all the answers related to this issue on SO, but nothing helped, including:
matplotlib: ValueError: invalid PNG header
and
PIL cannot identify image file for io.BytesIO object
How can I solve this?
This is what I usually use. Maybe it will work for you as well:
import io

def image_from_s3(bucket, key):
    # s3_resource is the boto3 S3 resource created in the question
    bucket = s3_resource.Bucket(bucket)
    image = bucket.Object(key)
    img_data = image.get().get('Body').read()
    return Image.open(io.BytesIO(img_data))
And in your handler you execute this:
img = image_from_s3(image_bucket, image_key)
img should be a Pillow image if this executes successfully.
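If Image.open still raises UnidentifiedImageError, the downloaded bytes may not be an image at all. Keys that start with "._", like the one in the question, are often macOS metadata sidecar files rather than the screenshot itself, so that is worth ruling out. A rough sketch for checking the leading bytes before handing them to Pillow; sniff_format is a hypothetical helper and covers only three formats:

```python
def sniff_format(data):
    """Guess a file type from its leading bytes; returns None if unrecognized."""
    signatures = [
        (b"\x89PNG\r\n\x1a\n", "png"),
        (b"\xff\xd8\xff", "jpeg"),
        (b"%PDF-", "pdf"),
    ]
    for magic, name in signatures:
        if data.startswith(magic):
            return name
    return None
```

Calling sniff_format(img_data) on the bytes read from S3 would also tell PDFs apart from pictures, and PDFs need a different code path than Image.open.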
I am a beginner with boto3 and I'd like to compress a file that is in an S3 bucket without downloading it to my local laptop. It is supposed to be a streaming compression (AWS Glue). Below are my three attempts. The first would be the best one because, in my opinion, it streams (similar to the gzip.open function).
First wrong attempt (gzip.s3.open does not exist...):
with gzip.s3.open('s3://bucket/attempt.csv','wb') as fo:
    "operations (write a file)"
Second wrong attempt (s3fs gzip compression on pandas dataframe):
import gzip
import boto3
from io import BytesIO, TextIOWrapper
s3 = boto3.client('s3', aws_access_key_id='', aws_secret_access_key='')
# read file
source_response_m = s3.get_object(Bucket=bucket,Key='file.csv')
df = pd.read_csv(io.BytesIO(source_response_m['Body'].read()))
# compress file
buffer = BytesIO()
with gzip.GzipFile(mode='w', fileobj=buffer) as zipped_file:
    df.to_csv(TextIOWrapper(zipped_file, 'utf8'), index=False)
# upload it
s3_resource = boto3.resource('s3',aws_access_key_id='', aws_secret_access_key='')
s3_object = s3_resource.Object(bucket, 'file.csv.gz')
s3_object.put(Body=buffer.getvalue())
Third wrong attempt (Upload Gzip file using Boto3 & https://gist.github.com/tobywf/079b36898d39eeb1824977c6c2f6d51e)
from io import BytesIO
import gzip
import shutil
import boto3
from tempfile import TemporaryFile

s3 = boto3.resource('s3',aws_access_key_id='', aws_secret_access_key='')
bucket = s3.Bucket('bucket')

def upload_gzipped(bucket, key, fp, compressed_fp=None, content_type='text/plain'):
    """Compress and upload the contents from fp to S3.
    If compressed_fp is None, the compression is performed in memory.
    """
    if not compressed_fp:
        compressed_fp = BytesIO()
    with gzip.GzipFile(fileobj=compressed_fp, mode='wb') as gz:
        shutil.copyfileobj(fp, gz)
    compressed_fp.seek(0)
    bucket.upload_fileobj(compressed_fp, key, {'ContentType': content_type, 'ContentEncoding': 'gzip'})

upload_gzipped(bucket,'folder/file.gz.csv', 'file.csv.gz')
Honestly, I have no idea how to use the last attempt. The documentation I found is not very clear and has no examples.
Do you have any ideas/suggestions to overcome my issue?
Thanks in advance.
Solution
I was able to solve my issue using the link below. Hope it will be useful for you.
https://gist.github.com/veselosky/9427faa38cee75cd8e27
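Independently of the linked gist, one likely problem in the third attempt is that fp was the string 'file.csv.gz', while shutil.copyfileobj needs an object with a read() method. A minimal sketch of the compression step on its own, taking any binary file-like object; gzip_fileobj is a hypothetical helper, and it buffers the compressed data in memory rather than truly streaming it:

```python
import gzip
import io
import shutil

def gzip_fileobj(fp):
    """Gzip everything readable from fp into an in-memory buffer, rewound for reading."""
    compressed = io.BytesIO()
    with gzip.GzipFile(fileobj=compressed, mode="wb") as gz:
        shutil.copyfileobj(fp, gz)
    # Rewind so that an uploader reads from the first byte
    compressed.seek(0)
    return compressed
```

The source could be the Body returned by s3.get_object(...), and the result could then be passed to bucket.upload_fileobj(compressed, key, ExtraArgs={'ContentEncoding': 'gzip'}).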
I am writing machine learning code that can be trained locally or in the cloud. I am using keras.preprocessing to load images, which under the hood uses PIL. It works fine for local files, but understandably doesn't understand Google Cloud Storage paths, like "gs://...".
from keras.preprocessing import image
image.load_img("gs://myapp-some-bucket/123.png")
Gives this error:
  File ".../lib/python2.7/site-packages/keras/preprocessing/image.py", line 320, in load_img
    img = pil_image.open(path)
  File ".../lib/python2.7/site-packages/PIL/Image.py", line 2530, in open
    fp = builtins.open(filename, "rb")
IOError: [Errno 2] No such file or directory: 'gs://myapp-some-bucket/123.png'
What is the correct way of doing this? I ultimately need a folder of images to be a single numpy array (images decoded and grayscale).
I found a replacement for keras.preprocessing.image.load_img that understands GCS paths. I also included code that reads the whole folder and turns every image in it into a single numpy array for training:
import os
import numpy as np
import tensorflow as tf
from tensorflow.python.platform import gfile

train_files_dir = "gs://myapp-some-bucket"  # folder that holds the training images
filelist = gfile.ListDirectory(train_files_dir)
sess = tf.Session()
with sess.as_default():
    x = np.array([np.array(tf.image.decode_png(tf.read_file(os.path.join(train_files_dir, filename))).eval()) for filename in filelist])
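A listing can contain entries that are not PNG files, and os.path.join would insert backslashes on Windows, which a gs:// path does not accept. A small sketch of building the list of image URIs first; image_uris is a hypothetical helper, and the extension filter is an assumption about what the folder holds:

```python
import posixpath

def image_uris(directory, filenames, extensions=(".png",)):
    """Return sorted full paths for the entries whose names end in one of `extensions`."""
    return [
        posixpath.join(directory, name)
        for name in sorted(filenames)
        if name.lower().endswith(extensions)
    ]
```

The result could replace the os.path.join expression inside the list comprehension above.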
Load image:
import os
import tensorflow as tf

sess = tf.Session()
image_path = 'gs://xxxxxxx.jpg'
image = tf.read_file(image_path)
image = tf.image.decode_jpeg(image)
image_array = sess.run(image)
Save image:
job_dir = 'gs://xxxxxxxx'
image = tf.image.encode_jpeg(image_array)
file_name = 'xxx.jpg'
write_op = tf.write_file(os.path.join(job_dir, file_name), image)
sess.run(write_op)