azure container_client delete_blobs causes oom - python-3.x

I am using the Python version of the Azure Blob SDK. Here is how I delete blobs:
blobs = container_client.list_blobs(name_starts_with="myprefix")
container_client.delete_blobs(*blobs)
If blobs here returns a large number of blob objects, the above code crashes.
What is the standard practice here? Are there other ways to do batch deletion?
Update:
reply to #Ivan Yang:
This is slightly different from your solution. I ran it, but got this error:
There is a partial failure in the batch operation.
blobs = container_client.list_blobs(name_starts_with="myprefix")
blobs_list = list(blobs)
for i in range(0, len(blobs_list), 10):
    container_client.delete_blobs(*blobs_list[i: i+10])

It would help to specify how many blobs you are trying to delete with the delete_blobs method; that makes it easier to debug.
As a workaround, you can fetch a certain number of blobs (say, 10) at a time and keep deleting until the continuation token is null.
Here is the sample code:
# define a continuation token
continuation_token = None
while True:
    # fetch 10 blobs each time
    blob_list = container_client.list_blobs(name_starts_with="xxx", results_per_page=10).by_page(continuation_token=continuation_token)
    list_segment = [blob.name for blob in next(blob_list)]
    container_client.delete_blobs(*list_segment)
    continuation_token = blob_list.continuation_token
    if not continuation_token:
        break

Related

add header at top in csv file of append blob

I am creating a pipeline in Azure Data Factory where I use a Function App as one of the activities to transform data and store it in an append blob container in CSV format. Since I run 50 batches in a for loop, my Function App processes data 50 times, once per order. I am appending the header to the CSV file with the logic below.
// First I create the file as per business logic
// csveventcontent is my source data
var dateAndTime = DateTime.Now.AddDays(-1);
string FileDate = dateAndTime.ToString("ddMMyyyy");
string FileName = _config.ContainerName + FileDate + ".csv";
StringBuilder csveventcontent = new StringBuilder();
OrderEventService obj = new OrderEventService();
// Check whether today's file exists; if it doesn't, create it
if (await appBlob.ExistsAsync() == false)
{
    await appBlob.CreateOrReplaceAsync(); //CreateOrReplace();
    // Append header
    csveventcontent.AppendLine(obj.GetHeader());
}
Now the problem is that the header is appended many times to the CSV file, and sometimes it is not at the top, probably because the Function App runs 50 times in parallel.
How can I make sure the header appears only once, at the top?
I have also tried Data Flow and Logic Apps, but was unable to do it. If it can be handled through code, that would be easier, I guess.
I think you are right there. It's the concurrency of the function app that is causing the problem. The best approach would be to use a queue and process messages one by one, or you could use a distributed lock to ensure only one function writes to the file at a time. You can use blob leases for this.
The Lease Blob operation creates and manages a lock on a blob for write and delete operations. The lock duration can be 15 to 60 seconds, or can be infinite.
Refer: Lease Blob Request Headers
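To make the lease idea concrete, here is a minimal sketch using the newer Azure.Storage.Blobs v12 SDK (your code uses the older CloudAppendBlob API, so treat this as the pattern rather than a drop-in replacement; the method and parameter names are made up for illustration). Each invocation retries until it can acquire the lease, writes the header only if the blob is still empty, appends its rows, and releases the lease:
using System;
using System.IO;
using System.Text;
using System.Threading.Tasks;
using Azure;
using Azure.Storage.Blobs;
using Azure.Storage.Blobs.Models;
using Azure.Storage.Blobs.Specialized;

public static class LeasedAppendWriter
{
    // Sketch only: appends csvContent to an append blob, writing the header exactly once.
    public static async Task AppendWithLeaseAsync(
        string connectionString, string containerName, string fileName,
        string header, string csvContent)
    {
        var container = new BlobContainerClient(connectionString, containerName);
        AppendBlobClient appendBlob = container.GetAppendBlobClient(fileName);
        await appendBlob.CreateIfNotExistsAsync(); // safe to call from concurrent invocations

        BlobLeaseClient leaseClient = appendBlob.GetBlobLeaseClient();
        BlobLease lease = null;

        // Take the lock; another invocation may hold it, so retry on 409 (lease already present).
        while (lease == null)
        {
            try
            {
                lease = await leaseClient.AcquireAsync(TimeSpan.FromSeconds(30));
            }
            catch (RequestFailedException ex) when (ex.Status == 409)
            {
                await Task.Delay(TimeSpan.FromSeconds(1));
            }
        }

        try
        {
            // Only the lease holder may write, so concurrent invocations are serialized.
            var conditions = new AppendBlobRequestConditions { LeaseId = lease.LeaseId };

            // Write the header only if nothing has been appended yet.
            BlobProperties props = await appendBlob.GetPropertiesAsync();
            string payload = props.ContentLength == 0
                ? header + Environment.NewLine + csvContent
                : csvContent;

            using var stream = new MemoryStream(Encoding.UTF8.GetBytes(payload));
            await appendBlob.AppendBlockAsync(stream, conditions: conditions);
        }
        finally
        {
            await leaseClient.ReleaseAsync();
        }
    }
}
The lease here is the distributed lock: while one invocation holds it, the other invocations wait in the acquire loop, so the header check and the append happen atomically with respect to the other 49 runs.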

Firebase Storage remove custom metadata key

I couldn't remove a custom metadata key from a file in Firebase storage.
This is what I tried so far:
blob = bucket.get_blob("dir/file")
metadata = blob.metadata
metadata.pop('custom_key', None) # or del metadata['custom_key']
blob.metadata = metadata
blob.patch()
I also tried to set its value to None but it didn't help.
It seems there are a couple of reasons that could be preventing you from deleting the custom metadata. I will address them individually so it's easier to understand.
First, it seems that when you read the metadata with blob.metadata, it is returned as a read-only copy, as clarified here, so modifying it in place will not work the way you are trying. Second, it seems that writing the metadata back to the blob and patching it needs to follow a specific order, as shown here.
You can give it a try using the below code:
blob = bucket.get_blob("dir/file")
metadata = blob.metadata
metadata.pop('custom_key', None)
blob.metadata = metadata
blob.patch()
While this code is untested, I believe it might help you by fixing the order of the calls and avoiding the blob.metadata read-only situation.
In case this doesn't help, I would recommend raising an issue in the official GitHub repository for the Cloud Storage Python library, for further clarification from the developers.

How to convert from Azure Append Blob to Azure Block Blob

Is there any way to convert from Append Blob to Block Blob?
Regards
C
For a blob conversion, I am using a
--blob-type=BlockBlob
option at the end of my azcopy.exe statement. So far it works well.
Good luck!
Is there any way to convert from Append Blob to Block Blob?
Once the blob has been created, its type cannot be changed, and it can be updated only by using operations appropriate for that blob type, i.e., writing a block or list of blocks to a block blob, appending blocks to an append blob, and writing pages to a page blob.
More information please refer to this link: Understanding Block Blobs, Append Blobs, and Page Blobs
Is there any way to convert from Append Blob to Block Blob?
Automatic conversion between blob types is not allowed. What you would need to do is download the blob and reupload it as Block Blob.
Given: I have a source blob which is an append blob
And: I have to copy the source to a new blob container as a block blob
When: I use the CopyBlobToBlobckBlobContainer function
Then: the destination container will have the same blob as the source, but as a block blob.
public void CopyBlobToBlobckBlobContainer(string sourceBlobName)
{
    var sourceContainerClient = new BlobContainerClient(sourceConnectionString, BlobContainerName);
    var destinationContainerClient = new BlobContainerClient(destinationConnectionString, OutBlobContainerName);
    destinationContainerClient.CreateIfNotExists();
    var sourceBlobClient = sourceContainerClient.GetBlockBlobClient(sourceBlobName);
    var sourceUri = sourceBlobClient.GenerateSasUri(BlobSasPermissions.Read, ExpiryOffset);
    var destBlobClient = destinationContainerClient.GetBlockBlobClient(sourceBlobName);
    var result = destBlobClient.SyncUploadFromUri(sourceUri, overwrite: true);
    var response = result.GetRawResponse();
    if (response.Status != 201) throw new BlobCopyException(response.ReasonPhrase);
}
Use the azcopy command below:
azcopy copy 'https://<storage-account-name>.<blob or dfs>.core.windows.net/<container-name>/<append-or-page-blob-name>' 'https://<storage-account-name>.<blob or dfs>.core.windows.net/<container-name>/<name-of-new-block-blob>' --blob-type BlockBlob --block-blob-tier <destination-tier>
The --block-blob-tier parameter is optional. If you omit that parameter, then the destination blob infers its tier from the default account access tier setting. To change the tier after you've created a block blob, see Change a blob's tier.

Azure Table Storage access time - inserting/reading from

I'm making a program that stores data from CSV files in Azure tables and reads it back. The CSV files can have a varying number of columns and between 3k and 50k rows. What I need to do is upload that data into an Azure table. So far I have managed to both upload data and retrieve it.
I'm using the REST API, and for uploading I'm creating an XML batch request with 100 rows per request. That works fine, except it takes a bit too long to upload, e.g. around 30 seconds for 3k rows. Is there any way to speed that up? I noticed that most of the time is spent processing the response (the ReadToEnd() call). I read somewhere that setting the proxy to null could help, but it doesn't do much in my case.
I also found somewhere that it is possible to upload the whole XML request to a blob and then execute it from there, but I couldn't find any example of doing that.
using (Stream requestStream = request.GetRequestStream())
{
    requestStream.Write(content, 0, content.Length);
}
using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
{
    Stream dataStream = response.GetResponseStream();
    using (var reader = new StreamReader(dataStream))
    {
        String responseFromServer = reader.ReadToEnd();
    }
}
As for retrieving data from Azure tables, I manage to get 1000 entities per request; that takes around 9 seconds for a CSV with 3k rows. Again, most of the time is spent reading from the stream, when I'm calling this part of the code (again ReadToEnd()):
response = request.GetResponse() as HttpWebResponse;
using (StreamReader reader = new StreamReader(response.GetResponseStream()))
{
    string result = reader.ReadToEnd();
}
Any tips?
Since you mentioned you are using the REST API, you have to write extra code and rely on your own methods to implement performance improvements, unlike when using the client library. In your case, using the Storage client library would be best, as you can use already-built features to expedite inserts, upserts, etc., as described here.
However, if you were using the Storage Client Library and ADO.NET, you could follow the article below, written by the Windows Azure Table team, as a supported way to improve Azure Table access performance:
.NET and ADO.NET Data Service Performance Tips for Windows Azure Tables
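For example, with the classic Microsoft.WindowsAzure.Storage table client you can replace hand-rolled XML batches with TableBatchOperation, which sends up to 100 operations per request. This is only a sketch; the CsvRowEntity type and the helper names are made up to match your CSV scenario:
using System.Collections.Generic;
using System.Linq;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Table;

// Hypothetical entity representing one CSV row.
public class CsvRowEntity : TableEntity
{
    public CsvRowEntity() { }
    public CsvRowEntity(string fileName, string rowId)
    {
        PartitionKey = fileName; // all rows of one file share a partition
        RowKey = rowId;          // must be unique within the partition
    }
    public string Data { get; set; }
}

public static class TableUploader
{
    // Inserts rows in batches of up to 100 operations (the service limit per batch).
    public static void UploadRows(string connectionString, string tableName, IEnumerable<CsvRowEntity> rows)
    {
        CloudStorageAccount account = CloudStorageAccount.Parse(connectionString);
        CloudTableClient tableClient = account.CreateCloudTableClient();
        CloudTable table = tableClient.GetTableReference(tableName);
        table.CreateIfNotExists();

        // A batch may only contain entities with the same PartitionKey.
        foreach (var partition in rows.GroupBy(r => r.PartitionKey))
        {
            List<CsvRowEntity> pending = partition.ToList();
            for (int i = 0; i < pending.Count; i += 100)
            {
                var batch = new TableBatchOperation();
                foreach (var entity in pending.Skip(i).Take(100))
                    batch.InsertOrReplace(entity);
                table.ExecuteBatch(batch);
            }
        }
    }
}
Grouping by PartitionKey before batching matters because the service rejects a batch that mixes partitions.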

Azure Table storage batch suddenly fails

I get an odd error while batching data to Azure Table storage.
I have an array with 350,000+ strings, and I save each string in a row. It works fine until the first 50,000+ records; then Azure Table storage starts to throw an exception with "invalid inputtype" and "statuscode 400".
When I batch, I batch 10 items at a time, with a simple retry policy.
_TableContext.RetryPolicy = RetryPolicies.Retry(4, new TimeSpan(0, 0, 30));
_TableContext.SaveChanges(System.Data.Services.Client.SaveChangesOptions.Batch);
No async, no parallelism. It works fine in the dev environment.
Grrr...
There is a hard limit in Azure Table Storage of 1 MB per entity (row), and a limit of 64 KB per string property.
Also, if you are storing the strings as partition keys or row keys, some characters are not allowed.
Source:
http://msdn.microsoft.com/en-us/library/dd179338.aspx
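If it helps, here is a small sketch that checks those documented limits before an entity goes into a batch (the helper is made up; '/', '\', '#', '?' and control characters are the documented disallowed key characters, and string properties are capped at 64 KB of UTF-16, roughly 32K characters):
using System.Linq;

public static class TableLimitsCheck
{
    // Characters the Table service rejects in PartitionKey / RowKey values.
    private static readonly char[] ForbiddenKeyChars = { '/', '\\', '#', '?' };

    public static bool IsValidKey(string key) =>
        !string.IsNullOrEmpty(key) &&
        !key.Any(c => ForbiddenKeyChars.Contains(c) || char.IsControl(c));

    // String properties are limited to 64 KB (UTF-16), i.e. roughly 32K characters.
    public static bool FitsStringProperty(string value) =>
        value == null || value.Length <= 32 * 1024;
}
Running a check like this while building the batch points at the offending row instead of leaving you with a blind 400 from the service.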
The error was my own mistake. I had attempted to save batches containing entities with the same partition key and row key. When I changed that, it worked perfectly.
Azure FTW! :)
