FastText model cannot be opened for loading from a bucket

I downloaded the fastText model lid.176.bin. If I run my code locally with the model in a folder, everything works fine. But I need to run it on Google Cloud, so I uploaded the model into a bucket and changed the model path from the local path to the GCS bucket, and got an error: ValueError: gs://models/fasttext-model/lid.176.bin cannot be opened for loading!
How can I use the model from the bucket?
path_to_pretrained_model = 'gs://models/fasttext-model/lid.176.bin'
fasttext_model = fasttext.load_model(path_to_pretrained_model)

This function from the Google documentation helped me solve the problem:
from google.cloud import storage

def download_blob(bucket_name, source_blob_name, destination_file_name):
    """Downloads a blob from the bucket."""
    # bucket_name = "your-bucket-name"
    # source_blob_name = "storage-object-name"
    # destination_file_name = "local/path/to/file"

    storage_client = storage.Client()
    bucket = storage_client.bucket(bucket_name)

    # Construct a client-side representation of a blob.
    # Note `Bucket.blob` differs from `Bucket.get_blob` as it doesn't retrieve
    # any content from Google Cloud Storage. As we don't need additional data,
    # using `Bucket.blob` is preferred here.
    blob = bucket.blob(source_blob_name)

    blob.download_to_filename(destination_file_name)

    print(
        "Blob {} downloaded to {}.".format(
            source_blob_name, destination_file_name
        )
    )
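With that helper, the model can be copied to a local path first and then loaded from there, since the gs:// path cannot be opened by fasttext.load_model directly. A minimal sketch, using the bucket and object names from the path above (the /tmp destination is just an example):

import fasttext

# Bucket and object names taken from gs://models/fasttext-model/lid.176.bin;
# the local destination is an arbitrary writable path.
download_blob('models', 'fasttext-model/lid.176.bin', '/tmp/lid.176.bin')
fasttext_model = fasttext.load_model('/tmp/lid.176.bin')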

Related

How to save and access pickle/hdf5 files in azure machine learning studio

I have a pickle file parameters.pkl containing some parameters of a model and their values. The pickle file was created through the following process:
dict = {'scaler': scaler,
        'features': z_tags,
        'Z_reconstruction_loss': Z_reconstruction_loss}
pickle.dump(dict, open('parameters.pkl', 'wb'))
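For reference, reading the stored values back from that file only takes a pickle.load; a minimal sketch using the keys defined above:

import pickle

# Load the dictionary written by pickle.dump above and inspect its contents.
with open('parameters.pkl', 'rb') as f:
    params = pickle.load(f)
print(params['features'], params['Z_reconstruction_loss'])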
There is also an hdf5 model file, model_V2.hdf5.
I am new to Azure Machine Learning Studio. It would be helpful to know how the pickle and hdf5 files can be stored in Azure Machine Learning Studio and how an API endpoint can be created, so that the pickle file and its contents can be accessed through the API. I have tried the following:
pip install azureml azureml-core
from azureml.core import Workspace
from azureml.core.webservice import AciWebservice
from azureml.core.webservice import Webservice
from azureml.core.model import InferenceConfig
from azureml.core.environment import Environment
from azureml.core.model import Model
from azureml.core.conda_dependencies import CondaDependencies

ws = Workspace.create(
    name='myworkspace',
    subscription_id='<azure-subscription-id>',
    resource_group='myresourcegroup',
    create_resource_group=True,
    location='eastus2'
)
ws.write_config()
ws = Workspace.from_config()

model = Model.register(workspace=ws,
                       model_path="model/parameters.pkl",
                       model_name="parameters",
                       tags={"version": "1"},
                       description="parameters")

# To install required packages
env = Environment('env')
cd = CondaDependencies.create(pip_packages=['pandas==1.1.5', 'azureml-defaults', 'joblib==0.17.0'],
                              conda_packages=['scikit-learn==0.23.2'])
env.python.conda_dependencies = cd

# Register environment to re-use later
env.register(workspace=ws)
print("Registered Environment")
myenv = Environment.get(workspace=ws, name="env")
myenv.save_to_directory('./environ', overwrite=True)

aciconfig = AciWebservice.deploy_configuration(
    cpu_cores=1,
    memory_gb=1,
    tags={"data": "parameters"},
    description='parameters MODEL',
)
inference_config = InferenceConfig(entry_script="score.py", environment=myenv)
What should I modify in the following score script, given that I don't want to predict anything, only access the parameter values stored in the pickle file?
import json
import joblib
from azureml.core.model import Model

def init():
    global model
    model_path = Model.get_model_path("parameters")
    print("Model Path is ", model_path)
    model = joblib.load(model_path)

def run(data):
    try:
        data = json.loads(data)
        result = model.predict(data['data'])
        return {'data': result.tolist(), 'message': "Successfully accessed"}
    except Exception as e:
        error = str(e)
        return {'data': error, 'message': 'Failed to access'}
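One possible adjustment, sketched on the assumption that init() only needs to load the pickled dictionary and run() only needs to return its values rather than call predict() (str() is used to keep the response JSON-serializable):

import pickle
from azureml.core.model import Model

def init():
    global parameters
    # "parameters" is the name used in Model.register above.
    model_path = Model.get_model_path("parameters")
    with open(model_path, 'rb') as f:
        parameters = pickle.load(f)  # dict with 'scaler', 'features', 'Z_reconstruction_loss'

def run(data):
    try:
        # Return the stored values instead of a prediction; str() keeps the
        # response serializable even for objects such as the scaler.
        return {'data': {k: str(v) for k, v in parameters.items()},
                'message': "Successfully accessed"}
    except Exception as e:
        return {'data': str(e), 'message': 'Failed to access'}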
Deploy the Model
service = Model.deploy(workspace=ws,
                       name='iris-model',
                       models=[model],
                       inference_config=inference_config,
                       deployment_config=aciconfig,
                       overwrite=True)
service.wait_for_deployment(show_output=True)

url = service.scoring_uri
print(url)
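Once deployed, the scoring URI printed above can be called over HTTP. A hedged example call, assuming key-based authentication was not enabled on the ACI service and using the {'data': ...} payload shape that run() above expects:

import json
import requests

# 'url' is the scoring_uri printed above; replace the inner list with a
# feature row of the shape your model expects.
payload = json.dumps({"data": [[0.0, 0.0, 0.0]]})
response = requests.post(url, data=payload, headers={"Content-Type": "application/json"})
print(response.json())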
We need to deploy the model as an external service, i.e. as a web service application, in Azure Machine Learning Studio.
Follow the steps below to deploy the model as a web service application and perform API calls.
Note: before proceeding with the steps, keep the model's pickle file handy.
Open https://studio.azureml.net/
Click on New
Then click on Upload from local files
Select the zip file to upload
Under Experiments, upload the dataset.
Use the canvas steps to create the model.
After mapping all the elements and executing the Python script, click on Set up web service.
Click on Deploy web service.

Not able to read model.pkl from output folder in Azure ML

I'm trying to read the model.pkl file from the artifacts output folder like this:
def init():
    global model
    # infile = open('model.pkl','rb')
    # model = pickle.load(infile)
    # model = joblib.load('model.pkl')
    model_path = Model.get_model_path(model_name='<<modelname>>')
    model_path = "outputs/model.pkl"
    # deserialize the model file back into a sklearn model
    model = joblib.load(model_path)
But it is still not working. Please guide me on how to read the model.pkl file from the artifacts output folder; because of this, the deployment to an Azure Container Instance is failing.
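A minimal sketch of how init() could resolve the registered model instead, assuming the model was registered under the name passed as model_name (the hard-coded "outputs/model.pkl" line above overrides the resolved path and likely does not exist inside the service container):

import joblib
from azureml.core.model import Model

def init():
    global model
    # Resolve the model file from the registered model; '<<modelname>>' is a
    # placeholder for the name used at registration time.
    model_path = Model.get_model_path(model_name='<<modelname>>')
    model = joblib.load(model_path)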

Can't read PNG files from S3 in Python 3?

I have a bucket on S3.
I want to be able to connect to it and read the pictures/PDFs into my EC2 machine's memory, perform OCR, and get the needed fields.
Here is what I have done so far, but unfortunately it doesn't work:
import cv2
import boto3
import numpy as np
import matplotlib
import pytesseract
from PIL import Image
boto3.setup_default_session(profile_name='default-mfasession')
s3_client = boto3.client('s3')
s3_resource = boto3.resource('s3')
bucket_name = "my_bucket"
key = "my-files/._Screenshot 2020-04-20 at 14.21.20.png"
bucket = s3_resource.Bucket(bucket_name)
object = bucket.Object(key)
response = object.get()
file_stream = response['Body']
im = Image.open(file_stream)
np.array(im)
It returns an error:
UnidentifiedImageError: cannot identify image file <_io.BytesIO object at 0x7fae33dce110>
I have tried all the answers related to this issue on SO; nothing helped.
Including:
matplotlib: ValueError: invalid PNG header
and
PIL cannot identify image file for io.BytesIO object
Please advise how to solve this.
This is what I usually use. Maybe it will work for you as well:
import io

# Uses the s3_resource client defined in the question above.
def image_from_s3(bucket, key):
    bucket = s3_resource.Bucket(bucket)
    image = bucket.Object(key)
    img_data = image.get().get('Body').read()
    return Image.open(io.BytesIO(img_data))
And in your handler you execute this:
img = image_from_s3(image_bucket, image_key)
img should be a Pillow image if it executes successfully.
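Applied to the original snippet, the equivalent inline fix is to read the streaming body fully and hand PIL a seekable in-memory buffer:

import io

# Same bucket and key as in the question; read the body into memory first.
img_data = bucket.Object(key).get()['Body'].read()
im = Image.open(io.BytesIO(img_data))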

I'm getting the following error when I try to run inference on an sklearn model used as a Lambda function on AWS

The code used for inference is:
import json
import pickle
import numpy as np
import boto3

s3 = boto3.resource('s3')

# Function for transferring pickles from s3 to lambda
def download_files_from_s3():
    with open('/tmp/km_model_on_space_data.pkl', 'wb') as f:
        s3.Bucket("ml-model-first-try").download_fileobj("km_model_on_space_data.pkl", f)
    with open('/tmp/tfidf_vectorizer.pkl', 'wb') as f:
        s3.Bucket("ml-model-first-try").download_fileobj("tfidf_vectorizer.pkl", f)
    with open('/tmp/cluster_distances.pkl', 'wb') as f:
        s3.Bucket("ml-model-first-try").download_fileobj("cluster_distances.pkl", f)
    with open('/tmp/space_ids_only.pkl', 'wb') as f:
        s3.Bucket("ml-model-first-try").download_fileobj("space_ids_only.pkl", f)

# Downloading files from s3 to lambda ------ comment out the next line if the data is already downloaded
download_files_from_s3()

# Loading pickles
km = pickle.load(open('/tmp/km_model_on_space_data.pkl', 'rb'))
tfidf_vectorizer = pickle.load(open('/tmp/tfidf_vectorizer.pkl', 'rb'))
cluster_distances = pickle.load(open('/tmp/cluster_distances.pkl', 'rb'))
space_ids = pickle.load(open('/tmp/space_ids_only.pkl', 'rb'))

# Automatically called every time
def lambda_handler(event, context):
    tfidf = tfidf_vectorizer.transform([event['text']])
    cluster = km.predict(tfidf)
    cluster_arr = cluster_distances[:, cluster[0]]
    cluster_arr_sorted = cluster_arr.argsort()
    recommended_space_ids = []
    for i in cluster_arr_sorted[0:10]:
        recommended_space_ids.append(space_ids[i])
    return {'Cluster_Number': cluster[0], 'Space_Ids': recommended_space_ids}
The pickled files are stored in S3 and retrieved into Lambda, then used for inference. The zip file uploaded to S3 contains all the necessary libraries, such as numpy and sklearn, along with a function.py file whose code is given above; it contains the lambda_handler used for inference.
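For reference, a handler of this shape would be invoked with an event like the following (hypothetical payload; 'text' is the key read inside lambda_handler):

# Hypothetical local test invocation, assuming the pickles were downloaded to /tmp.
event = {"text": "open plan office space downtown"}
print(lambda_handler(event, None))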
The error is:
"Unable to import module 'function': dynamic module does not define module export function (PyInit_multiarray)"

Writing figure to Google Cloud Storage instead of local drive

I want to upload a figure made with matplotlib to GCS.
Current code:
from tensorflow.gfile import MakeDirs, Open
import numpy as np
import matplotlib.pyplot as plt
import datetime

_LOGDIR = "{date:%Y%m%d-%H%M%S}".format(date=datetime.datetime.now())
_PATH_LOGDIR = 'gs://{0}/logs/{1}'.format('skin_cancer_mnist', _LOGDIR)
MakeDirs(_PATH_LOGDIR)

def saving_figure(path_logdir):
    data = np.arange(0, 21, 2)
    fig = plt.figure(figsize=(20, 10))
    plt.plot(data)
    fig.savefig("{0}/accuracy_loss_graph.png".format(path_logdir))
    plt.close()

saving_figure(_PATH_LOGDIR)
"/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/matplotlib/backends/backend_agg.py", line 512, in print_png
filename_or_obj = open(filename_or_obj, 'wb')
FileNotFoundError: [Errno 2] No such file or directory: 'gs://skin_cancer_mnist/logs/20190116-195604/accuracy_loss_graph.png'
(The directory exists, I checked)
I could change the source code of matplotlib to use tf.gfile.Open, but there should be a better option...
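Since savefig also accepts file-like objects, an untested sketch of that tf.gfile route (assuming the installed TensorFlow build supports gs:// paths) might not require patching matplotlib at all:

# Sketch only: write the PNG through tf.gfile's Open instead of the builtin open().
def saving_figure_via_gfile(path_logdir):
    data = np.arange(0, 21, 2)
    fig = plt.figure(figsize=(20, 10))
    plt.plot(data)
    with Open("{0}/accuracy_loss_graph.png".format(path_logdir), 'wb') as f:
        fig.savefig(f, format='png')
    plt.close()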
Joan's 2nd option didn't work for me; I found a solution that did:
from google.cloud import storage
import io

def saving_figure(path_logdir):
    data = np.arange(0, 21, 2)
    fig = plt.figure(figsize=(20, 10))
    plt.plot(data)
    fig_to_upload = plt.gcf()

    # Save figure image to a bytes buffer
    buf = io.BytesIO()
    fig_to_upload.savefig(buf, format='png')

    # init GCS client and upload buffer contents
    client = storage.Client()
    bucket = client.get_bucket('skin_cancer_mnist')
    blob = bucket.blob('logs/20190116-195604/accuracy_loss_graph.png')
    blob.upload_from_file(buf, content_type='image/png', rewind=True)
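If you'd rather not hard-code the bucket and object names, a small variation (a hypothetical helper, assuming path_logdir always has the form gs://<bucket>/<prefix> as in the question) could derive them from the path:

def upload_figure_to_gcs(fig, path_logdir, filename='accuracy_loss_graph.png'):
    # Split 'gs://<bucket>/<prefix>' into bucket name and object prefix.
    bucket_name, _, prefix = path_logdir[len('gs://'):].partition('/')
    buf = io.BytesIO()
    fig.savefig(buf, format='png')
    client = storage.Client()
    blob = client.get_bucket(bucket_name).blob('{0}/{1}'.format(prefix, filename))
    blob.upload_from_file(buf, content_type='image/png', rewind=True)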
You cannot directly upload a file to Google Cloud Storage using Python's built-in open function (which is what matplotlib.pyplot.savefig uses behind the scenes).
Instead, you should use the Cloud Storage Client Library for Python. Check this documentation for details on how this library is used. It will allow you to manipulate files and upload/download them to GCS, among other things.
You will have to import this library in order to use it; you can install it by running pip install google-cloud-storage and import it with from google.cloud import storage.
Also, since the plt.figure is an object, and not the actual .png image that you want to upload, you cannot upload it directly to Google Cloud Storage either.
However, you can do either of the following:
Option 1: Save the image locally, and then upload it to Google Cloud Storage:
Using your code:
from google.cloud import storage

def saving_figure(path_logdir):
    data = np.arange(0, 21, 2)
    fig = plt.figure(figsize=(20, 10))
    plt.plot(data)
    fig.savefig("your_local_path/accuracy_loss_graph.png".format(path_logdir))
    plt.close()

    # init GCS client and upload file
    client = storage.Client()
    bucket = client.get_bucket('skin_cancer_mnist')
    blob = bucket.blob('logs/20190116-195604/accuracy_loss_graph.png')  # This defines the path where the file will be stored in the bucket
    your_file_contents = blob.upload_from_filename(filename="your_local_path/accuracy_loss_graph.png")
Option 2: Save the image result from the figure to a variable, then upload it to GCS as a string (of bytes):
I have found the following StackOverflow answer that seems to save the figure image into a .png byte string; however, I haven't tried it myself.
Again, based on your code:
from google.cloud import storage
import io
import urllib, base64

def saving_figure(path_logdir):
    data = np.arange(0, 21, 2)
    fig = plt.figure(figsize=(20, 10))
    plt.plot(data)
    fig_to_upload = plt.gcf()

    # Save figure image to a bytes buffer
    buf = io.BytesIO()
    fig_to_upload.savefig(buf, format='png')
    buf.seek(0)
    image_as_a_string = base64.b64encode(buf.read())

    # init GCS client and upload buffer contents
    client = storage.Client()
    bucket = client.get_bucket('skin_cancer_mnist')
    blob = bucket.blob('logs/20190116-195604/accuracy_loss_graph.png')  # This defines the path where the file will be stored in the bucket
    your_file_contents = blob.upload_from_string(image_as_a_string, content_type='image/png')
Edit: Both options assume that the environment you are running the script from has the Cloud SDK installed and an authenticated Google Cloud account activated (if not, you can check this documentation, which explains how to do it).
