Azure Blob Using Python - python-3.x

I am accessing a website that allows me to download a CSV file. I would like to store the CSV file directly in the blob container. I know that one way is to download the file locally and then upload it, but I would like to skip the step of downloading the file locally. Is there a way I could achieve this?
I tried the following:
block_blob_service.create_blob_from_path(
    'containername',
    'blobname',
    'https://*****.blob.core.windows.net/containername/FlightStats',
    content_settings=ContentSettings(content_type='application/CSV'))
but I keep getting errors stating that the path is not found.
Any help is appreciated. Thanks!

The file_path parameter in create_blob_from_path is the path of a local file, e.g. "C:\xxx\xxx". The path you passed ('https://*****.blob.core.windows.net/containername/FlightStats') is a blob URL.
You could download the file into a byte array or stream, then use the create_blob_from_bytes or create_blob_from_stream method.
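For instance, a minimal sketch using the same legacy SDK might look like this (the source URL, account credentials, container, and blob names are placeholders):
import requests
from azure.storage.blob import BlockBlobService, ContentSettings

# Placeholder credentials -- replace with your own
block_blob_service = BlockBlobService(account_name='myaccount', account_key='mykey')

# Download the CSV into memory instead of saving it to disk
csv_bytes = requests.get('https://example.com/path/to/flightstats.csv').content

# Upload the in-memory bytes straight to the container
block_blob_service.create_blob_from_bytes(
    'containername',
    'FlightStats.csv',
    csv_bytes,
    content_settings=ContentSettings(content_type='text/csv'))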

The other answer uses the so-called legacy Azure SDK for Python.
If this is a fresh implementation, I recommend using a Gen2 storage account (instead of Gen1 or plain Blob storage).
For a Gen2 storage account, see the example below:
from azure.storage.filedatalake import DataLakeFileClient

data = b"abc"
file = DataLakeFileClient.from_connection_string(
    "my_connection_string",
    file_system_name="myfilesystem",
    file_path="myfile")
file.append_data(data, offset=0, length=len(data))
file.flush_data(len(data))
It's painful: if you're appending multiple times, you'll have to keep track of the offset on the client side.
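A minimal sketch of what that offset bookkeeping might look like (connection string, file system, and file names are the same placeholders as above):
from azure.storage.filedatalake import DataLakeFileClient

file = DataLakeFileClient.from_connection_string(
    "my_connection_string",
    file_system_name="myfilesystem",
    file_path="myfile")
file.create_file()  # start with an empty file

offset = 0
for chunk in (b"abc", b"def", b"ghi"):
    # each append must state how much has already been written
    file.append_data(chunk, offset=offset, length=len(chunk))
    offset += len(chunk)

file.flush_data(offset)  # commit everything written so far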

Related

Moving data from a database to Azure blob storage

I'm able to use dask.dataframe.read_sql_table to read the data, e.g. df = dd.read_sql_table(table='TABLE', uri=uri, index_col='field', npartitions=N).
What would be the next (best) steps to saving it as a parquet file in Azure blob storage?
From my brief research, there are a couple of options:
Save locally and then use AzCopy (https://learn.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-blobs?toc=/azure/storage/blobs/toc.json) to upload (not great for big data)
I believe adlfs is for reading from blob storage
use dask.dataframe.to_parquet and work out how to point to the blob container
the intake project (not sure where to start)
$ pip install adlfs
dd.to_parquet(
    df=df,
    path='abfs://{BLOB}/{FILE_NAME}.parquet',
    storage_options={'account_name': 'ACCOUNT_NAME',
                     'account_key': 'ACCOUNT_KEY'},
)
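Putting the two steps together (the connection URI, table, container, and account details below are placeholders), the end-to-end flow might look like:
import dask.dataframe as dd

# Read from the database, as in the question
uri = 'postgresql://user:password@host:5432/dbname'   # placeholder
df = dd.read_sql_table(table='TABLE', uri=uri, index_col='field', npartitions=8)

# Write the partitions as parquet straight into the blob container via adlfs
dd.to_parquet(
    df=df,
    path='abfs://{BLOB}/{FILE_NAME}.parquet',
    storage_options={'account_name': 'ACCOUNT_NAME',
                     'account_key': 'ACCOUNT_KEY'},
)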

Writing a new file to a Google Cloud Storage bucket from a Google Cloud Function (Python)

I am trying to write a new file (not upload an existing file) to a Google Cloud Storage bucket from inside a Python Google Cloud Function.
I tried using google-cloud-storage, but it does not have an "open" attribute for the bucket.
I tried to use the App Engine library GoogleAppEngineCloudStorageClient, but the function cannot be deployed with this dependency.
I tried to use gcs-client, but I cannot pass the credentials inside the function because it requires a JSON file.
Any ideas would be much appreciated.
Thanks.
from google.cloud import storage
import io

# bucket name
bucket = "my_bucket_name"

# Get the bucket that the file will be uploaded to.
storage_client = storage.Client()
bucket = storage_client.get_bucket(bucket)

# Create a new blob and upload the file's content.
my_file = bucket.blob('media/teste_file01.txt')

# Create an in-memory file.
output = io.StringIO("This is a test \n")

# Upload from the string.
my_file.upload_from_string(output.read(), content_type="text/plain")
output.close()

# List the created files.
blobs = storage_client.list_blobs(bucket)
for blob in blobs:
    print(blob.name)

# Make the blob publicly viewable.
my_file.make_public()
You can now write files directly to Google Cloud Storage. It is no longer necessary to create a file locally and then upload it.
You can use blob.open() as follows:
from google.cloud import storage

def write_file(lines):
    # 'lines' is any iterable of strings to write into the new object
    client = storage.Client()
    bucket = client.get_bucket('bucket-name')
    blob = bucket.blob('path/to/new-blob.txt')
    with blob.open(mode='w') as f:
        for line in lines:
            f.write(line)
You can find more examples and snippets here:
https://github.com/googleapis/python-storage/tree/main/samples/snippets
You have to create your file locally and then push it to GCS. You can't create a file dynamically in GCS by using open.
For this, you can write to the /tmp directory, which is an in-memory file system. Note that you will never be able to create a file bigger than the amount of memory allotted to your function minus the memory footprint of your code. With a 2 GB function, you can expect a maximum file size of about 1.5 GB.
Note: GCS is not a file system, and you shouldn't use it like one.
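As an illustration, a minimal sketch of that pattern (the bucket and object names are assumptions) could be:
from google.cloud import storage

def generate_and_upload():
    # /tmp is writable in Cloud Functions, but it counts against the
    # function's memory allocation
    local_path = '/tmp/report.txt'
    with open(local_path, 'w') as f:
        f.write('This is a test\n')

    # Push the temporary file to the bucket
    client = storage.Client()
    bucket = client.get_bucket('bucket-name')
    blob = bucket.blob('path/to/report.txt')
    blob.upload_from_filename(local_path)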
EDIT 1
Things have changed since my answer:
It's now possible to write to any directory in the container (not only /tmp).
You can stream-write a file to GCS, just as you can receive it in streaming mode on Cloud Run. Here is a sample of stream-writing to GCS.
Note: stream writing disables checksum validation, so you won't have an integrity check at the end of the stream write.

Unable to use data from Google Cloud Storage in App Engine using Python 3

How can I read the data stored in my Cloud Storage bucket of my project and use it in my Python code that I am writing in App Engine?
I tried using:
storage_client = storage.Client()
bucket = storage_client.bucket(bucket_name)
blob = bucket.blob(source_blob_name)
But I am unable to figure out how to extract actual data from the code to get it in a usable form.
Any help would be appreciated.
Getting a file from a Google Cloud Storage bucket means that you are just getting an object. This concept abstracts the file itself from your code. You will either need to store the file locally to perform any operation on it, or, depending on your file's extension, put that object into a file read-streamer or whatever method you need to read the file.
Here is a code example of how to read a file from App Engine:
# webapp2 handler method; 'gcs' is the App Engine cloudstorage client library
def read_file(self, filename):
    self.response.write('Reading the full file contents:\n')
    gcs_file = gcs.open(filename)
    contents = gcs_file.read()
    gcs_file.close()
    self.response.write(contents)
You have a couple of options.
content = blob.download_as_string() --> Converts the content of your Cloud Storage object to String.
blob.download_to_file(file_obj) --> Updates an existing file_obj to include the Cloud Storage object content.
blob.download_to_filename(filename) --> Saves the object in a file. On App Engine Standard environment, you can store files in /tmp/ directory.
Refer to this link for more information.
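A short sketch of those three options (bucket and blob names are placeholders):
from google.cloud import storage
import io

storage_client = storage.Client()
bucket = storage_client.bucket('my-bucket')
blob = bucket.blob('path/to/data.csv')

# 1. Read the whole object into memory
content = blob.download_as_string()

# 2. Write the object's content into an existing file-like object
file_obj = io.BytesIO()
blob.download_to_file(file_obj)

# 3. Save the object to a local file (/tmp/ on App Engine Standard)
blob.download_to_filename('/tmp/data.csv')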

Copying multiple files from Azure Blob Storage

We are modelling a directory structure in Azure Blob storage. I would like to be able to copy all files from a folder to a local directory. Is there any way to copy multiple files from blob storage at once that match a pattern or do I have to get these files individually?
As you may already know, blob storage supports only one level of hierarchy: you have blob containers (folders), and each container contains blobs (files). There's no concept of a folder hierarchy there. The way you create an illusion of a folder hierarchy is via something called a blob prefix. For example:
Suppose you have two folders under an images blob container: 16x16 and 24x24. In the cloud, the blob names include these folder names, so the name of the AddCertificate.png file in the 24x24 folder is 24x24/AddCertificate.png.
Now, coming to your question: you would still need to download individual files, but what the storage client library allows you to do is fetch a list of blobs by blob prefix. So if you want to download all files in the 24x24 folder (in other words, all blobs with the prefix 24x24), you would first list the blobs with prefix 24x24 and then download each blob individually. On the local computer, you could create a folder named after the prefix.
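As an illustration, a minimal sketch of that list-then-download flow with the current azure-storage-blob (v12) Python package might look like this (connection string, container, and prefix are assumptions):
import os
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient.from_connection_string('my_connection_string')
container = service.get_container_client('images')

prefix = '24x24/'
os.makedirs(prefix, exist_ok=True)  # recreate the "folder" locally

# List every blob whose name starts with the prefix, then download each one
for blob in container.list_blobs(name_starts_with=prefix):
    with open(blob.name, 'wb') as f:
        f.write(container.download_blob(blob.name).readall())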
You can also refer to the code below as a sample reference (it's written in JavaScript, but you can easily map the logic to any language). This code is maintained by Microsoft.
https://github.com/WindowsAzure/azure-sdk-tools-xplat/blob/master/lib/commands/storage.blob._js#L144
https://github.com/WindowsAzure/azure-sdk-tools-xplat/blob/master/lib/commands/storage.blob._js#L689
The second link explains how to parse blob prefixes and get the folder hierarchy out of them.
It also shows how to download blobs using multiple threads and how to ensure the integrity of each blob using MD5.
I'm just including the high-level code that processes blob paths containing prefixes. Please refer to the link above for the full implementation; I cannot copy-paste the entire implementation here.
if (!fileName) {
  var structure = StorageUtil.getStructureFromBlobName(specifiedBlobName);
  fileName = structure.fileName;
  fileName = utils.escapeFilePath(fileName);
  structure.dirName = StorageUtil.recursiveMkdir(dirName, structure.dirName);
  fileName = path.join(structure.dirName, fileName);
  dirName = '.'; // fileName already contains the dirName
}

How to create a sub container in azure storage location

How can I create a sub container in the azure storage location?
Windows Azure doesn't provide the concept of hierarchical containers, but it does provide a mechanism to traverse hierarchy by convention and API. All containers are stored at the same level. You can gain similar functionality by using naming conventions for your blob names.
For instance, you may create a container named "content" and create blobs with the following names in that container:
content/blue/images/logo.jpg
content/blue/images/icon-start.jpg
content/blue/images/icon-stop.jpg
content/red/images/logo.jpg
content/red/images/icon-start.jpg
content/red/images/icon-stop.jpg
Note that these blobs are a flat list under your "content" container. That said, using "/" as a conventional delimiter provides you with the functionality to traverse them in a hierarchical fashion.
protected IEnumerable<IListBlobItem>
    GetDirectoryList(string directoryName, string subDirectoryName)
{
    CloudStorageAccount account =
        CloudStorageAccount.FromConfigurationSetting("DataConnectionString");
    CloudBlobClient client =
        account.CreateCloudBlobClient();
    CloudBlobDirectory directory =
        client.GetBlobDirectoryReference(directoryName);
    CloudBlobDirectory subDirectory =
        directory.GetSubdirectory(subDirectoryName);
    return subDirectory.ListBlobs();
}
You can then call this as follows:
GetDirectoryList("content/blue", "images")
Note the use of the GetBlobDirectoryReference and GetSubdirectory methods and the CloudBlobDirectory type instead of CloudBlobContainer. These provide the traversal functionality you are likely looking for.
This should help you get started. Let me know if this doesn't answer your question:
[ Thanks to Neil Mackenzie for inspiration ]
Are you referring to blob storage? If so, the hierarchy is simply StorageAccount/Container/BlobName. There are no nested containers.
Having said that, you can use slashes in your blob name to simulate nested containers in the URI. See this article on MSDN for naming details.
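For example, a minimal Python sketch (using the v12 azure-storage-blob package; the names are placeholders) that creates such a "nested" blob would be:
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient.from_connection_string('my_connection_string')
container = service.get_container_client('content')

# The slashes in the blob name simulate the folder hierarchy
blob = container.get_blob_client('blue/images/logo.jpg')
with open('logo.jpg', 'rb') as data:
    blob.upload_blob(data, overwrite=True)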
I agree with tobint's answer, and I want to add something to this situation because I also needed to upload my game's HTML files to Azure Storage and create these directories:
Games\Beautyshop\index.html
Games\Beautyshop\assets\apple.png
Games\Beautyshop\assets\aromas.png
Games\Beautyshop\customfont.css
Games\Beautyshop\jquery.js
So, following the recommendations, I tried to upload my content with the Azure Storage Explorer tool; you can download the tool and its source code at this URL: Azure Storage Explorer
First of all, I tried to upload via the tool, but it doesn't allow uploading a hierarchical directory structure (see: How to create sub directory in a blob container).
Finally, I debugged the Azure Storage Explorer source code and edited the Background_UploadBlobs method and the UploadFileList field in the StorageAccountViewModel.cs file. You can edit it however you want. I may have made spelling errors :/ I am sorry, but that's my only recommendation.
If you are trying to upload files from the Azure portal:
To create a subfolder in a container, while uploading a file you can go to Advanced options and select upload to a folder, which will create a new folder in the container and upload the file into it.
Kotlin Code
val blobClient = blobContainerClient.getBlobClient("$subDirNameTimeStamp/$fileName$extension");
This will create a directory named with the timestamp, and inside it will be your blob file. Notice the use of the slash (/) in the code above, which nests your blob file by creating a folder named after the string that precedes the slash.
It will appear as a folder on the portal.
Sample code
string myfolder = "<folderName>";
string myfilename = "<fileName>";
string fileName = String.Format("{0}/{1}.csv", myfolder, myfilename);
CloudBlockBlob blob = container.GetBlockBlobReference(fileName);
