Can't copy/download a Blob Storage file to Azure Functions folder - python-3.x

Here are the problem details:
We are able to upload a file to Azure Blob Storage.
We also have an Azure Function that runs a Python (Python 3.6) app (engine/__init__.py) using an HTTP trigger. (The function essentially takes a file and runs it through an Azure Speaker Recognition service.)
We can run the code on a local machine and it works.
The problem is that when we move the code to the Azure cloud (configured as a Linux machine) and try to access the file in Blob Storage, the app fails (returns error 500). We have verified that the path is correct (both source and destination) and we can see the file in Blob Storage, but we cannot read the contents of the file (i.e. we cannot read the text in a .txt file), and we cannot copy or download the file into the Azure Function directory.
The file access details are in a file called downloadFile.py (see code below) which uses BlockBlobService to connect with the Azure Blob Storage.
We use block_blob_service.list_blobs to find the file
We use block_blob_service.get_blob_to_path to download the file
Both functions work fine on the local machine as we have the azure-storage-blob library included in our requirements.txt file.
Ultimately, our objective is for the Python app to access a file in Blob Storage and return a result. We are open to how we achieve that. Thanks in advance for your help.
Here is the requirements.txt file contents:
azure-functions
azure-storage-blob == 1.5.0
Here is our code (downloadFile.py):
from azure.storage.blob import BlockBlobService, PublicAccess
import os, uuid, sys
def run_sample():
    try:
        # Create the BlockBlobService that is used to call the Blob service for the storage account
        block_blob_service = BlockBlobService(account_name='', account_key='')

        # Container that holds the uploaded files
        #container_name = 'quickstartblobs'
        container_name = 'images'
        #block_blob_service.create_container(container_name)

        # Set the permission so the blobs are public.
        #block_blob_service.set_container_acl(container_name, public_access=PublicAccess.Container)

        # Build the local download path next to this script.
        __location__ = os.path.realpath(
            os.path.join(os.getcwd(), os.path.dirname(__file__)))
        #local_path = os.path.join(__location__, 'SoundsForJay/')
        local_path = os.path.join(__location__)
        # not able to download file to azure function.
        #local_path = os.path.abspath(os.path.curdir)

        # List the blobs in the container
        print("\nList blobs in the container")
        generator = block_blob_service.list_blobs(container_name)
        for blob in generator:
            print("\t Blob name: " + blob.name)
            if ".wav" in blob.name:
                local_file_name = blob.name

                # Download the blob(s).
                # Add '_DOWNLOADED' as a suffix so you can see both files locally.
                full_path_to_file2 = os.path.join(local_path, str.replace(local_file_name, '.wav', '_DOWNLOADED.wav'))
                print("\nDownloading blob to " + full_path_to_file2)
                block_blob_service.get_blob_to_path(container_name, local_file_name, full_path_to_file2)

        sys.stdout.write("Sample finished running. When you hit <any key>, the sample will be deleted and the sample "
                         "application will exit.")
        sys.stdout.flush()
        #input()

        # Clean up resources. This includes the container and the temp files.
        #block_blob_service.delete_container(container_name)
        #os.remove(full_path_to_file2)
    except Exception as e:
        print(e)
    return "run_sample is running."

A Python Azure Function does not allow writing files; the app runs in read-only mode and this cannot be changed. So you cannot use get_blob_to_path, because it has to write the downloaded file to disk.
So if you just want to read the content of a text file, you can use the code below:
filename = "test.txt"
account_name = "storage account"
account_key = "account key"
input_container_name="test"
block_blob_service = BlockBlobService(account_name, account_key)
blobstring = block_blob_service.get_blob_to_text(input_container_name, filename).content
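Since the files in the question are .wav audio rather than text, a similar call that reads the blob into memory as bytes may be more useful. A minimal sketch, assuming the same account credentials and the 'images' container from the question; 'sample.wav' is a placeholder blob name:
from azure.storage.blob import BlockBlobService

block_blob_service = BlockBlobService(account_name, account_key)
# .content holds the raw bytes of the blob, so nothing is written to the local disk.
wav_bytes = block_blob_service.get_blob_to_bytes("images", "sample.wav").content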
You can also use the Functions blob input binding to read the content; bind inputblob as a stream.
import logging
import azure.functions as func

def main(req: func.HttpRequest, inputblob: func.InputStream) -> func.HttpResponse:
    logging.info('Python HTTP trigger function processed a request.')

    name = req.params.get('name')
    if not name:
        try:
            req_body = req.get_json()
        except ValueError:
            pass
        else:
            name = req_body.get('name')

    if name:
        return func.HttpResponse(inputblob.read(size=-1))
    else:
        return func.HttpResponse(
            "Please pass a name on the query string or in the request body",
            status_code=400
        )
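For the input binding to resolve, function.json also needs a blob binding entry alongside the HTTP trigger. A minimal sketch; the path and connection values are placeholders and must match your own container, blob, and storage connection setting:
{
  "scriptFile": "__init__.py",
  "bindings": [
    {
      "authLevel": "anonymous",
      "type": "httpTrigger",
      "direction": "in",
      "name": "req",
      "methods": ["get", "post"]
    },
    {
      "type": "blob",
      "direction": "in",
      "name": "inputblob",
      "path": "images/sample.wav",
      "connection": "AzureWebJobsStorage"
    },
    {
      "type": "http",
      "direction": "out",
      "name": "$return"
    }
  ]
}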

Related

Saving file into Azure Blob

I am using the Python code below to save the file into a local folder. I want to save this file into an Azure blob directly; I do not want the file to be stored locally and then uploaded to a blob.
I tried giving a blob location in the folder variable, but it did not work. I have an Excel file that I want to read from a web browser and save into Azure blobs using Python.
folder = 'Desktop/files/ab'
r = requests.get(api_end_point, headers=api_headers, stream=True)
with open(folder, 'wb') as f:
    f.write(r.content)
First, you should get the file content as something like a stream.
import os
from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient
connect_str = os.getenv('str')
blob_service_client = BlobServiceClient.from_connection_string(connect_str)
container_name = "test"
container_client = blob_service_client.get_container_client(container_name)
blob_client = blob_service_client.get_blob_client(container_name, "MyFirstBlob.txt")
blob_client.upload_blob(req.get_body(), blob_type="BlockBlob")
On my side, I put the data in the body of the request and upload that to an Azure blob; it is a stream. You can also put any other stream in it, for example a streamed HTTP response, as in the sketch below.
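A minimal sketch of that idea, reusing the connection string and container from above; the endpoint and headers are placeholders standing in for the question's API call:
import os
import requests
from azure.storage.blob import BlobServiceClient

connect_str = os.getenv('str')
blob_service_client = BlobServiceClient.from_connection_string(connect_str)
blob_client = blob_service_client.get_blob_client("test", "MyFirstBlob.xlsx")

api_end_point = "https://example.com/report.xlsx"  # placeholder for the question's endpoint
api_headers = {}                                    # placeholder for the question's headers

# r.raw is a file-like stream, so the response body goes straight to Blob Storage
# without ever being written to the local disk.
r = requests.get(api_end_point, headers=api_headers, stream=True)
blob_client.upload_blob(r.raw, blob_type="BlockBlob", overwrite=True)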
These are the official docs:
https://learn.microsoft.com/en-us/azure/storage/blobs/storage-quickstart-blobs-python#upload-blobs-to-a-container
https://learn.microsoft.com/en-us/azure/developer/python/sdk/storage/azure-storage-blob/azure.storage.blob.blobserviceclient?view=storage-py-v12
https://learn.microsoft.com/en-us/azure/developer/python/sdk/storage/azure-storage-blob/azure.storage.blob.blobclient?view=storage-py-v12#upload-blob-data--blob-type--blobtype-blockblob---blockblob----length-none--metadata-none----kwargs-

Upload time very slow with multiple file uploads as blobs to Azure storage containers using Python

I want to store some user images from a feedback section of an app I am creating, for which I am using Azure containers.
Although I am able to store the images, the process is taking ~150 to 200 seconds for 3 files of ~190 KB each.
Is there a better way to upload the files, or am I missing something here (the file type used for upload, maybe)?
from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient, ContentSettings

connect_str = 'my_connection_string'

# Create the BlobServiceClient object which will be used to create a container client
blob_service_client = BlobServiceClient.from_connection_string(connect_str)

photos_object_list = []
my_content_settings = ContentSettings(content_type='image/png')

# Files to upload
file_list = ['/Users/vikasnair/Documents/feedbackimages/reference_0.jpg',
             '/Users/vikasnair/Documents/feedbackimages/reference_1.jpeg',
             '/Users/vikasnair/Documents/feedbackimages/reference_2.jpeg']

# creating list of file objects as I would be taking list of file objects from the front end as an input
for i in range(0, len(file_list)):
    photos_object_list.append(open(file_list[i], 'rb'))

import timeit
start = timeit.default_timer()

if photos_object_list != None:
    for u in range(0, len(photos_object_list)):
        blob_client = blob_service_client.get_blob_client(container="container/folder", blob=loc_id+'_'+str(u)+'.jpg')
        blob_client.upload_blob(photos_object_list[u], overwrite=True, content_settings=my_content_settings)

stop = timeit.default_timer()
print('Time: ', stop - start)
A couple of things you can do to decrease the duration of the upload:
Reuse blob_service_client for all the uploads rather than creating a new client per file.
Use the async blob methods to upload all the files in parallel rather than sequentially (see the sketch after this list).
If the target blob parent folder is static, you can also use an Azure file share with your container instance. This will allow you to use regular file system operations to persist files to and retrieve files from a storage account.
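A minimal sketch of the parallel-upload idea using the async client from azure-storage-blob v12; connect_str, file_list, and loc_id mirror the question's names, and the container/folder path is a placeholder:
import asyncio
from azure.storage.blob.aio import BlobServiceClient
from azure.storage.blob import ContentSettings

my_content_settings = ContentSettings(content_type='image/png')

async def upload_all(connect_str, file_list, loc_id):
    # One service client is shared by every upload.
    async with BlobServiceClient.from_connection_string(connect_str) as service:
        container_client = service.get_container_client("container")  # placeholder container name

        async def upload_one(index, path):
            with open(path, 'rb') as data:
                await container_client.upload_blob(
                    name='folder/' + loc_id + '_' + str(index) + '.jpg',
                    data=data,
                    overwrite=True,
                    content_settings=my_content_settings)

        # Start all uploads and wait for them to finish in parallel.
        await asyncio.gather(*(upload_one(i, p) for i, p in enumerate(file_list)))

# asyncio.run(upload_all(connect_str, file_list, loc_id))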

Writing a new file to a Google Cloud Storage bucket from a Google Cloud Function (Python)

I am trying to write a new file (not upload an existing file) to a Google Cloud Storage bucket from inside a Python Google Cloud Function.
I tried using google-cloud-storage, but it does not have an "open" attribute for the bucket.
I tried to use the App Engine library GoogleAppEngineCloudStorageClient, but the function cannot be deployed with this dependency.
I tried to use gcs-client, but I cannot pass the credentials inside the function as it requires a JSON file.
Any ideas would be much appreciated.
Thanks.
from google.cloud import storage
import io
# bucket name
bucket = "my_bucket_name"
# Get the bucket that the file will be uploaded to.
storage_client = storage.Client()
bucket = storage_client.get_bucket(bucket)
# Create a new blob and upload the file's content.
my_file = bucket.blob('media/teste_file01.txt')
# create in memory file
output = io.StringIO("This is a test \n")
# upload from string
my_file.upload_from_string(output.read(), content_type="text/plain")
output.close()
# list created files
blobs = storage_client.list_blobs(bucket)
for blob in blobs:
print(blob.name)
# Make the blob publicly viewable.
my_file.make_public()
You can now write files directly to Google Cloud Storage. It is no longer necessary to create a file locally and then upload it.
You can use blob.open() as follows (the example lines are just placeholders for whatever you want to write):
from google.cloud import storage

def write_file():
    client = storage.Client()
    bucket = client.get_bucket('bucket-name')
    blob = bucket.blob('path/to/new-blob.txt')
    lines = ["first line\n", "second line\n"]  # placeholder content to write
    with blob.open(mode='w') as f:
        for line in lines:
            f.write(line)
You can find more examples and snippets here:
https://github.com/googleapis/python-storage/tree/main/samples/snippets
You have to create your file locally and then push it to GCS. You can't create a file dynamically in GCS by using open.
For this, you can write to the /tmp directory, which is an in-memory file system. As a consequence, you will never be able to create a file bigger than the amount of memory allowed to your function minus the memory footprint of your code; with a 2 GB function, you can expect a maximum file size of about 1.5 GB.
Note: GCS is not a file system, and you shouldn't use it like one.
EDIT 1
Things have changed since my answer:
It's now possible to write in any directory of the container (not only /tmp).
You can stream-write a file to GCS, just as you can receive it in streaming mode on Cloud Run. Here is a sample to stream-write to GCS.
Note: stream writes disable checksum validation, so you won't have an integrity check at the end of the streamed file.
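A minimal sketch of such a streaming write with the current google-cloud-storage client, assuming a placeholder bucket, object path, and source URL; the response body is copied to GCS chunk by chunk instead of being buffered or written to a local file:
import requests
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("bucket-name")                  # placeholder bucket
blob = bucket.blob("path/to/streamed-file.bin")        # placeholder object path

resp = requests.get("https://example.com/big-file", stream=True)  # placeholder source URL
# chunk_size must be a multiple of 256 KiB for resumable uploads.
with blob.open("wb", chunk_size=256 * 1024) as f:
    for chunk in resp.iter_content(chunk_size=256 * 1024):
        f.write(chunk)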

Unable to use data from Google Cloud Storage in App Engine using Python 3

How can I read the data stored in the Cloud Storage bucket of my project and use it in the Python code I am writing in App Engine?
I tried using:
storage_client = storage.Client()
bucket = storage_client.bucket(bucket_name)
blob = bucket.blob(source_blob_name)
But I am unable to figure out how to extract the actual data from the blob to get it into a usable form.
Any help would be appreciated.
Getting a file from a Google Cloud Storage bucket means that you are just getting an object; this concept abstracts the file itself away from your code. You will either need to store the file locally to perform any operation on it or, depending on the extension of your file, pass that object to a file reader or to whatever method you need to read it.
Here is a code example of how to read a file from App Engine:
import cloudstorage as gcs  # the GoogleAppEngineCloudStorageClient library (first-generation App Engine runtime)

def read_file(self, filename):
    self.response.write('Reading the full file contents:\n')
    gcs_file = gcs.open(filename)
    contents = gcs_file.read()
    gcs_file.close()
    self.response.write(contents)
You have a couple of options.
content = blob.download_as_string() --> downloads the content of your Cloud Storage object into memory (as bytes).
blob.download_to_file(file_obj) --> writes the Cloud Storage object content into an existing file object.
blob.download_to_filename(filename) --> saves the object to a file. On the App Engine Standard environment, you can store files in the /tmp/ directory.
Refer to this link for more information.
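A minimal sketch showing the three options side by side, with a placeholder bucket and object name:
import io
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("my-bucket")            # placeholder bucket name
blob = bucket.blob("data/example.txt")         # placeholder object path

# Option 1: load the object into memory and decode it.
text = blob.download_as_string().decode("utf-8")

# Option 2: write the object content into an existing file-like object.
buf = io.BytesIO()
blob.download_to_file(buf)

# Option 3: save the object to a local path (/tmp/ is writable on App Engine Standard).
blob.download_to_filename("/tmp/example.txt")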

I am trying to read a file in a Google Cloud Storage bucket with Python code but am getting an error

I am trying to read a file stored in a Google Cloud Storage bucket with Python:
textfile = open("${gcs_bucket}mdm/OFF-B/test.txt", 'r')
times = textfile.read().splitlines()
textfile.close()
print(getcwd())
print(times)
The file is present in that location but I am receiving the following error:
File "/var/cache/tomcat/temp/interpreter-9196592956267519250.tmp", line 3, in <module>
textfile = open("gs://tp-bi-datalake-mft-landing-dev/mdm/OFF-B/test.txt", 'r')
IOError: [Errno 2] No such file or directory: 'gs://tp-bi-datalake-mft-landing-dev/mdm/OFF-B/test.txt'
That's because you are trying to read it as a local file.
To read from Cloud Storage you need to import the client library and use a client.
Check this similar Stack Overflow question.
In your case it would be something like:
from google.cloud import storage
# Instantiates a client
client = storage.Client()
bucket_name = 'tp-bi-datalake-mft-landing-dev'
bucket = client.get_bucket(bucket_name)
blob = bucket.get_blob('mdm/OFF-B/test.txt')
downloaded_blob = blob.download_as_string()
print(downloaded_blob)
You will also need to install the library; you can do that simply by running
pip install google-cloud-storage before you run your code.
Here you can also find some more Google Cloud Storage Python samples.
