Azure batch operations: delete several blobs and tables

I have a function that deletes every table entity & blob that belongs to the affected user.
CloudTable uploadTable = CloudStorageServices.GetCloudUploadsTable();
TableQuery<UploadEntity> uploadQuery = uploadTable.CreateQuery<UploadEntity>();
List<UploadEntity> uploadEntity = (from e in uploadTable.ExecuteQuery(uploadQuery)
                                   where e.PartitionKey == "uploads" && e.UserName == User.Identity.Name
                                   select e).ToList();

foreach (UploadEntity uploadTableItem in uploadEntity)
{
    // Delete the table entity
    TableOperation retrieveOperationUploads = TableOperation.Retrieve<UploadEntity>("uploads", uploadTableItem.RowKey);
    TableResult retrievedResultUploads = uploadTable.Execute(retrieveOperationUploads);
    UploadEntity deleteEntityUploads = (UploadEntity)retrievedResultUploads.Result;
    TableOperation deleteOperationUploads = TableOperation.Delete(deleteEntityUploads);
    uploadTable.Execute(deleteOperationUploads);

    // Delete the blob
    CloudBlobContainer blobContainer = CloudStorageServices.GetCloudBlobsContainer();
    CloudBlockBlob blob = blobContainer.GetBlockBlobReference(uploadTableItem.BlobName);
    blob.Delete();
}
Each table entity has its own blob, so if the list contains 3 upload entities, the 3 table entities and the 3 blobs will be deleted.
I heard you can use table batch operations to reduce cost and load. I tried it, but failed miserably. Anyone interested in helping me? :)
I'm guessing table batch operations are for tables only, so it's a no-go for blobs, right?
How would you add TableBatchOperation to this code? Do you see any other improvements that could be made?
Thanks!

I wanted to use batch operations but I didn't know how. Anyhow, I figured it out after some testing.
Improved code for deleting several entities:
CloudTable uploadTable = CloudStorageServices.GetCloudUploadTable();
TableQuery<UserUploadEntity> uploadQuery = uploadTable.CreateQuery<UserUploadEntity>();
List<UserUploadEntity> uploadEntity = (from e in uploadTable.ExecuteQuery(uploadQuery)
                                       where e.PartitionKey == "useruploads" && e.MapName == currentUser
                                       select e).ToList();

var batchOperation = new TableBatchOperation();

foreach (UserUploadEntity uploadTableItem in uploadEntity)
{
    // Queue the entity delete in the batch
    batchOperation.Delete(uploadTableItem);

    // Delete the blob directly (blobs cannot be part of a table batch)
    CloudBlobContainer blobContainer = CloudStorageServices.GetCloudBlobContainer();
    CloudBlockBlob blob = blobContainer.GetBlockBlobReference(uploadTableItem.BlobName);
    blob.Delete();
}

// Execute all entity deletes in a single batch request (all entities share one partition key)
uploadTable.ExecuteBatch(batchOperation);
I am aware that batch operations are limited to 100 entities, but in my case that's nothing to worry about.
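If the list could ever grow beyond 100 entities, a minimal sketch of chunking the deletes into batches of 100 might look like the following. It assumes the same uploadEntity list, uploadTable, and helper methods as above, and that all entities share the "useruploads" partition key, which a single batch requires:
var batch = new TableBatchOperation();
CloudBlobContainer blobContainer = CloudStorageServices.GetCloudBlobContainer();

foreach (UserUploadEntity item in uploadEntity)
{
    batch.Delete(item);
    blobContainer.GetBlockBlobReference(item.BlobName).DeleteIfExists();

    // A table batch accepts at most 100 operations, so flush and start a new one
    if (batch.Count == 100)
    {
        uploadTable.ExecuteBatch(batch);
        batch = new TableBatchOperation();
    }
}

// Flush any remaining operations
if (batch.Count > 0)
{
    uploadTable.ExecuteBatch(batch);
}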

Related

Download Images stored in Azure Blob Storage as Images using C#

I have stored a bunch of images in Azure Blob Storage. Now I want to retrieve and resize them.
I have successfully managed to read a lot of information from the account, such as the file name, the date last modified, and the size, but how do I get the actual image? The examples I have seen show how to download it to a file, but that is no use to me; I want to download it as an image so I can process it.
This is what I have so far:
BlobContainerClient containerClient = blobServiceClient.GetBlobContainerClient(containerName);

Console.WriteLine("Listing blobs...");

// Build table to hold the info
DataTable table = new DataTable();
table.Columns.Add("ID", typeof(int));
table.Columns.Add("blobItemName", typeof(string));
table.Columns.Add("blobItemLastModified", typeof(DateTime));
table.Columns.Add("blobItemSizeKB", typeof(double));
table.Columns.Add("blobImage", typeof(Image));

// Row counter for table
int intRowNo = 0;
// Divider to convert bytes to KB
double dblBytesToKB = 1024.00;

// List all blobs in the container
await foreach (BlobItem blobItem in containerClient.GetBlobsAsync())
{
    // Increment row number
    intRowNo++;
    //Console.WriteLine("\t" + blobItem.Name);

    // Length in bytes
    long? longContentLength = blobItem.Properties.ContentLength;
    double dblKb = 0;
    if (longContentLength.HasValue)
    {
        long longContentLengthValue = longContentLength.Value;
        // Convert to double DataType
        double dblContentLength = Convert.ToDouble(longContentLengthValue);
        // Convert to KB
        dblKb = dblContentLength / dblBytesToKB;
    }

    // Get the image
    // **** Image thisImage = what goes here ?? actual data from blobItem ****

    // Last modified date
    string date = blobItem.Properties.LastModified.ToString();
    try
    {
        DateTime dateTime = DateTime.Parse(date);
        //Console.WriteLine("The specified date is valid: " + dateTime);
        table.Rows.Add(intRowNo, blobItem.Name, dateTime, dblKb);
    }
    catch (FormatException)
    {
        Console.WriteLine("Unable to parse the specified date");
    }
}
You need to open a read stream for your image, and construct your .NET Image from this stream:
await foreach (BlobItem item in containerClient.GetBlobsAsync())
{
    var blobClient = containerClient.GetBlobClient(item.Name);
    using Stream stream = await blobClient.OpenReadAsync();
    Image myImage = Image.FromStream(stream);
    //...
}
The BlobClient class also exposes some other helpful methods, such as downloading to a stream.
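For completeness, here is a minimal sketch of downloading a blob into memory and resizing it, assuming System.Drawing / System.Drawing.Imaging is available (e.g. on Windows) and the blobClient variable from the snippet above; the output file name and target size are only illustrative choices:
using var ms = new MemoryStream();
await blobClient.DownloadToAsync(ms);   // download the blob content into memory
ms.Position = 0;

using var original = Image.FromStream(ms);
// Resize to half the original dimensions (arbitrary example)
using var resized = new Bitmap(original, new Size(original.Width / 2, original.Height / 2));
resized.Save("resized.jpg", ImageFormat.Jpeg);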

Get ListBlobs() sorted by LastModifiedDate?

I have 30,000 images in blob storage and I want to fetch them in descending order of modified date. Is there any way to fetch them in chunks of 1000 images per call?
Here is my code, but it takes too much time. Basically, can I sort ListBlobs() by last updated date?
CloudBlobContainer rootContainer = blobClient.GetContainerReference("installations");
CloudBlobDirectory dir1;

var items = rootContainer.ListBlobs(id + "/Cameras/" + camId.ToString() + "/", false);
foreach (var blob in items.OfType<CloudBlob>()
                          .OrderByDescending(b => b.Properties.LastModified)
                          .Skip(1000)
                          .Take(500))
{
}
Basically, can I sort ListBlobs() by last updated date?
No, you can't do server-side sorting on LastUpdated; the Blob Storage service returns the data sorted by blob name. You would need to fetch the complete list on the client and sort it there.
Another alternative would be to store the blobs' information (such as the blob URL and last modified date) in a SQL database and fetch the list from there; that gives you the ability to sort the data any way you like.
I have sorted the blobs in last modified order as in the below example and it is the only solution I could think of :)
/**
 * List the blob items in the blob container, ordered by the last modified date.
 * @return
 */
public List<FileProperties> listFiles() {
    Iterable<ListBlobItem> listBlobItems = rootContainer.listBlobs();
    List<FileProperties> list = new ArrayList<>();
    for (ListBlobItem listBlobItem : listBlobItems) {
        if (listBlobItem instanceof CloudBlob) {
            String substring = ((CloudBlob) listBlobItem).getName();
            FileProperties info = new FileProperties(substring, ((CloudBlob) listBlobItem).getProperties().getLastModified());
            list.add(info);
        }
    }
    // Sort the listed blob items in last modified order
    list.sort(new Comparator<FileProperties>() {
        @Override
        public int compare(FileProperties o1, FileProperties o2) {
            return new Long(o2.getLastModifiedDate().getTime()).compareTo(o1.getLastModifiedDate().getTime());
        }
    });
    return list;
}
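Since the rest of this page is C#, here is a rough equivalent sketch using the newer Azure.Storage.Blobs SDK, again sorting on the client because the service cannot do it. The connectionString variable is a placeholder, and id/camId are taken from the question's code:
BlobContainerClient container = new BlobServiceClient(connectionString)
    .GetBlobContainerClient("installations");

// Pull down the listing (metadata only, not the blob contents)
var blobs = new List<BlobItem>();
await foreach (BlobItem item in container.GetBlobsAsync(prefix: id + "/Cameras/" + camId + "/"))
{
    blobs.Add(item);
}

// Client-side sort by last modified date, newest first
var sorted = blobs.OrderByDescending(b => b.Properties.LastModified).ToList();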

Saving & Testing Stored Procedures/Triggers (maybe User Defined Functions) For Partitioned Collections

I'm receiving the following error when attempting to save modifications to a Stored Procedure that has been created within a partitioned collection:
Failed to save the script
Here are the details from within the Azure Portal:
Operation name: Failed to save the script
Time stamp: Fri Feb 17 2017 08:46:32 GMT-0500 (Eastern Standard Time)
Event initiated by: -
Description: Database Account: MyDocDbAccount, Script: bulkImport, Message: {"code":400,"body":"{\"code\":\"BadRequest\",\"message\":\"Replaces and upserts for scripts in collections with multiple partitions are not supported.
The Stored Procedure in question is the example "bulkImport" script that can be found here.
There is a known missing capability (bug, if you prefer) in DocumentDB right now where you cannot update existing stored procedures in a partitioned collection. The workaround is to delete it first and then recreate it under the same name/id.
Contrary to the error message, it turns out that _client.ReplaceStoredProcedureAsync(...) does work (as of June 2018) on partitioned collections. So you can do something like this:
try
{
    await _client.CreateStoredProcedureAsync(...);
}
catch (DocumentClientException dex) when (dex.StatusCode == HttpStatusCode.Conflict)
{
    await _client.ReplaceStoredProcedureAsync(...);
}
Once your SP has been created the first time, there will never be a window during which it is unavailable (as there would be with delete + recreate).
This extension method can handle add or update of a stored procedure.
public static async Task AddOrUpdateProcedure(this DocumentClient client,
    string databaseId,
    string collectionId,
    string storedProcedureId,
    string storedProcedureBody)
{
    try
    {
        // Try to create the stored procedure
        var documentCollectionUri = UriFactory.CreateDocumentCollectionUri(databaseId, collectionId);
        await client.CreateStoredProcedureAsync(documentCollectionUri, new StoredProcedure
        {
            Id = storedProcedureId,
            Body = storedProcedureBody
        });
    }
    catch (DocumentClientException ex) when (ex.StatusCode == HttpStatusCode.Conflict)
    {
        // It already exists, so read it and replace its body
        var storedProcedureUri = UriFactory.CreateStoredProcedureUri(databaseId, collectionId, storedProcedureId);
        var storedProcedure = await client.ReadStoredProcedureAsync(storedProcedureUri);
        storedProcedure.Resource.Body = storedProcedureBody;
        await client.ReplaceStoredProcedureAsync(storedProcedure);
    }
}
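A hypothetical usage call might look like this; the endpoint, key, database, collection, and script file name are placeholders, and the script body is assumed to be loaded from a local file:
var client = new DocumentClient(new Uri(endpointUrl), authKey);
string body = File.ReadAllText("bulkImport.js");

// Creates the stored procedure, or replaces its body if it already exists
await client.AddOrUpdateProcedure("MyDatabase", "MyCollection", "bulkImport", body);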
As of now, updating a stored procedure still does not work in the Azure Portal / Cosmos DB Data Explorer. There is a Cosmos DB extension for Visual Studio Code where this works. However, I don't see a way of executing the procedure from the extension like I can from Data Explorer.
try
{
    var spResponse = await dbClient.CreateStoredProcedureAsync($"/dbs/{dataRepoDatabaseId}/colls/{collectionName}", new StoredProcedure
    {
        Id = sp.Item1,
        Body = sp.Item2
    }, new RequestOptions { PartitionKey = new PartitionKey(partitionKey) });
}
catch (DocumentClientException dex) when (dex.StatusCode == HttpStatusCode.Conflict)
{
    // Fetch the resource to be updated
    StoredProcedure sproc = dbClient.CreateStoredProcedureQuery($"/dbs/{dataRepoDatabaseId}/colls/{collectionName}")
        .Where(r => r.Id == sp.Item1)
        .AsEnumerable()
        .SingleOrDefault();

    // Only replace it if the body has actually changed
    if (!sproc.Body.Equals(sp.Item2))
    {
        sproc.Body = sp.Item2;
        StoredProcedure updatedSPResponse = await dbClient.ReplaceStoredProcedureAsync(sproc);
    }
}

Why isn't my TakeLimit honored by TableQuery?

I'd like to fetch the top n rows from my Azure table with a simple TableQuery, but with the code below all rows are fetched regardless of the limit I set with Take.
What am I doing wrong?
int entryLimit = 5;
var table = GetFromHelperFunc();

TableQuery<MyEntity> query = new TableQuery<MyEntity>()
    .Where(TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, "MyPK"))
    .Take(entryLimit);

List<FeedEntry> entryList = new List<FeedEntry>();
TableQuerySegment<FeedEntry> currentSegment = null;

while (currentSegment == null || currentSegment.ContinuationToken != null)
{
    currentSegment = table.ExecuteQuerySegmented(query, this.EntryResolver, currentSegment != null ? currentSegment.ContinuationToken : null);
    entryList.AddRange(currentSegment.Results);
}

Trace.WriteLine(entryList.Count); // <-- Why does this exceed my limit?
The Take method on the storage SDK doesn't work like it would in LINQ. Imagine you do something like this:
TableQuery<TableEntity> query = new TableQuery<TableEntity>()
    .Where(TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, "temp"))
    .Take(5);

var result = table.ExecuteQuery(query);
When you start iterating over result you'll initially get only 5 items. But underneath, if you keep iterating over the result, the SDK will keep querying the table (and proceed to the next 'page' of 5 items).
If I have 5000 items in my table, this code will output all 5000 items (and underneath the SDK will do 1000 requests and fetch 5 items per request):
TableQuery<TableEntity> query = new TableQuery<TableEntity>()
    .Where(TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, "temp"))
    .Take(5);

var result = table.ExecuteQuery(query);
foreach (var item in result)
{
    Trace.WriteLine(item.RowKey);
}
The following code will fetch exactly 5 items in 1 request and stop there:
TableQuery<TableEntity> query = new TableQuery<TableEntity>()
    .Where(TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, "temp"))
    .Take(5);

var result = table.ExecuteQuery(query);
int index = 0;
foreach (var item in result)
{
    Console.WriteLine(item.RowKey);
    index++;
    if (index == 5)
        break;
}
Actually, the Take() method sets the page size, or "take count" (the TakeCount property on TableQuery), but it's still up to you to stop iterating in time if you only want 5 records.
In your example, you should modify the while loop to stop once the TakeCount (which you set by calling Take) is reached:
while (entryList.Count < query.TakeCount && (currentSegment == null || currentSegment.ContinuationToken != null))
{
    currentSegment = table.ExecuteQuerySegmented(query, this.EntryResolver, currentSegment != null ? currentSegment.ContinuationToken : null);
    entryList.AddRange(currentSegment.Results);
}
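Alternatively, since ExecuteQuery streams results lazily, you can let LINQ stop the enumeration for you. A minimal sketch under the same assumptions as the question's code (MyEntity, entryLimit, and the same filtered query) might look like this:
// Enumerable.Take stops pulling items from the lazy result once it has entryLimit of them,
// so the SDK does not keep paging through the whole table.
List<MyEntity> firstFive = table.ExecuteQuery(query).Take(entryLimit).ToList();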
AFAIK, Storage Client Library 2.0 had a bug in the Take implementation; it was fixed in version 2.0.4.
Read the last comments at http://blogs.msdn.com/b/windowsazurestorage/archive/2012/11/06/windows-azure-storage-client-library-2-0-tables-deep-dive.aspx
[EDIT]
Original MSDN post no longer available. Still present on WebArchive:
http://web.archive.org/web/20200722170914/https://learn.microsoft.com/en-us/archive/blogs/windowsazurestorage/windows-azure-storage-client-library-2-0-tables-deep-dive

Microsoft.WindowsAzure.StorageClient.CloudBlockBlob.DownloadBlockList returns 0 blocks

I'm getting a list of IListBlobItems using CloudBlobContainer.ListBlobs. I'm then looping over each entry to show the size of the blob using the following code:
foreach (IListBlobItem item in blobs)
{
    if (item.GetType() == typeof(CloudBlobDirectory))
    { }
    else if (item.GetType() == typeof(CloudBlockBlob))
    {
        CloudBlockBlob blockBlob = (CloudBlockBlob)item;
        IEnumerable<ListBlockItem> blocks = blockBlob.DownloadBlockList(new BlobRequestOptions { BlobListingDetails = BlobListingDetails.All });
        Console.WriteLine(blockBlob.Name.PadRight(50, ' ') + blocks.Sum(b => b.Size));
    }
    else
    {
        Console.WriteLine(item.Uri.LocalPath);
    }
}
However, when I check the count of the blocks variable, it is always 0. Why is that?
I believe (not 100% sure) that DownloadBlockList() only returns blocks if the blob was initially uploaded in blocks, rather than all at once. That may be why you get no blocks back.
In any case, you seem to only want the total size of the blob anyway, so using the blob.Properties.Length property may be an easier approach:
CloudBlockBlob blockBlob = (CloudBlockBlob)item;
blockBlob.FetchAttributes();
Console.WriteLine(blockBlob.Name.PadRight(50, ' ') + blockBlob.Properties.Length);
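For reference, here is a rough sketch of uploading a blob as explicit blocks so that DownloadBlockList has a committed block list to return; the container variable, file name, and block size are illustrative, and block IDs must be base64-encoded strings of equal length:
CloudBlockBlob blob = container.GetBlockBlobReference("blocked.bin");
byte[] data = File.ReadAllBytes("somefile.bin");
var blockIds = new List<string>();
const int blockSize = 4 * 1024 * 1024; // 4 MB per block (example value)

for (int i = 0; i * blockSize < data.Length; i++)
{
    // Block IDs within a blob must be base64-encoded and the same length
    string blockId = Convert.ToBase64String(BitConverter.GetBytes(i));
    int count = Math.Min(blockSize, data.Length - i * blockSize);
    using (var ms = new MemoryStream(data, i * blockSize, count))
    {
        blob.PutBlock(blockId, ms, null); // null = no MD5 check
    }
    blockIds.Add(blockId);
}

// Commit the uploaded blocks; after this, DownloadBlockList returns them
blob.PutBlockList(blockIds);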
I would think you should be checking for CloudBlob instead of the CloudBlockBlob type. If you add another else if with that, do you get the sizes? If that isn't it, do you see the code going into the else if, with b.Size just being zero?
