Append to CloudBlockBlob stream - azure

We have a file system abstraction that allows us to easily switch between local and cloud (Azure) storage.
For reading and writing files we have the following members:
Stream OpenRead();
Stream OpenWrite();
Part of our application "bundles" documents into one file. For our local storage provider OpenWrite returns an appendable stream:
public Stream OpenWrite()
{
return new FileStream(fileInfo.FullName, FileMode.Open, FileAccess.ReadWrite, FileShare.ReadWrite, BufferSize, useAsync: true);
}
For Azure blob storage we do the following:
public Stream OpenWrite()
{
return blob.OpenWrite();
}
Unfortunately this overrides the blob contents each time. Is it possible to return a writable stream that can be appended to?

Based on the documentation for OpenWrite here http://msdn.microsoft.com/en-us/library/microsoft.windowsazure.storage.blob.cloudblockblob.openwrite.aspx, The OpenWrite method will overwrite an existing blob unless explicitly prevented using the accessCondition parameter.
One thing you could do is read the blob data in a stream and return that stream to your calling application and let that application append data to that stream. For example, see the code below:
static void BlobStreamTest()
{
storageAccount = CloudStorageAccount.DevelopmentStorageAccount;
CloudBlobContainer container = storageAccount.CreateCloudBlobClient().GetContainerReference("temp");
container.CreateIfNotExists();
CloudBlockBlob blob = container.GetBlockBlobReference("test.txt");
blob.UploadFromStream(new MemoryStream());//Let's just create an empty blob for the sake of demonstration.
for (int i = 0; i < 10; i++)
{
try
{
using (MemoryStream ms = new MemoryStream())
{
blob.DownloadToStream(ms);//Read blob data in a stream.
byte[] dataToWrite = Encoding.UTF8.GetBytes("This is line # " + (i + 1) + "\r\n");
ms.Write(dataToWrite, 0, dataToWrite.Length);
ms.Position = 0;
blob.UploadFromStream(ms);
}
}
catch (StorageException excep)
{
if (excep.RequestInformation.HttpStatusCode != 404)
{
throw;
}
}
}
}

There is now a CloudAppendBlob class that allows you to add content to an existing blob :
var account = CloudStorageAccount.Parse("storage account connectionstring");
var client = account.CreateCloudBlobClient();
var container = client.GetContainerReference("container name");
var blob = container.GetAppendBlobReference("blob name");
In your case you want to append from a stream:
await blob.AppendFromStreamAsync(new MemoryStream());
But you can append from text, byte array, file. Check the documentation.

Related

Azure Block Blob: "The specified block list is invalid." Microsoft.Azure.Storage.StorageException when compressing files > 2GB between Blobs

The issue happens when I upload a file to one blob (Blob1) which in turn runs a background compression service. The background service streams the file from Blob1, compresses it, and stores it as a zip file in a separate blob (Blob2) to cache for the user to download.
The process works without issue for files < 2GB, but throws a Micrososft.Azure.Storage.StorageException when the file size is > 2GB.
using Microsoft.Azure.Storage.Blob 11.2.2
Sample Code
public async void DoWork(CancellationToken cancellationToken)
{
while (!cancellationToken.IsCancellationRequested)
{
await _messageSemaphore.WaitAsync(cancellationToken);
MyModel model = await _queue.PeekMessage();
if(model != null)
{
try
{
//Get CloudBlockBlob zip blob reference
var zipBlockBlob = await _storageAccount.GetFileBlobReference(_configuration[ConfigKeys.ContainerName], model.Filename, model.FileRelativePath);
using (var zipStream = zipBlockBlob.OpenWrite()) //Opens zipstream
{
using (var archive = new ZipArchive(zipStream, ZipArchiveMode.Create, false)) // Create new ZipArchive
{
//Add each file to the zip archive
foreach (var fileUri in Files)
{
var file = new Uri(fileUri);
var cloudBlockBlob = blobContainer.GetBlockBlobReference(file);
using (var blobStream = cloudBlockBlob.OpenRead())//Opens read stream
{
//Create new ZipEntry
var zipEntry = archive.CreateEntry(model.Filename, CompressionLevel.Fastest);
using (var zipEntryStream = zipEntry.Open())
{
//Zip file
blobStream.CopyTo(zipEntryStream);
}
}
}
}
}
}
catch (FileNotFoundException e)
{
Console.WriteLine($"Download Error: {e.Message}");
}
catch (Microsoft.Azure.Storage.StorageException e) // "
{
Console.WriteLine($"Storage Exception Error: {e.Message}");
}
}
else
{
_messageSemaphore.Release();
// Wait for 1 minute between polls when idle
await Task.Delay(_sleepTime);
}
}
}
The problem was with concurrency and having multiple application instances writing to the same blob at the same time. To get around this I ended up implementing an Azure blob lease on the blob being created but had to create a temp file to apply the lease to. This is a decent enough work around for now, but this could / should probably be implemented using an Azure event driven service.

azure blob client returns file not fully

i'm trying to download file from azure blob storage, but it returns only part of file. What i'm doing wrong ? File in storage is not corrupted
public async Task<byte[]> GetFile(string fileName)
{
var blobClient = BlobContainerClient.GetBlobClient(fileName);
var downloadInfo = await blobClient.DownloadAsync();
byte[] b = new byte[downloadInfo.Value.ContentLength];
await downloadInfo.Value.Content.ReadAsync(b, 0, (int)downloadInfo.Value.ContentLength);
return b;
}
I'm using Azure.Storage.Blobs 12.4.2 package. I tried this code and it works for me
public async Task<byte[]> GetFile(string fileName)
{
var blobClient = BlobContainerClient.GetBlobClient(fileName);
using (var memorystream = new MemoryStream())
{
await blobClient.DownloadToAsync(memorystream);
return memorystream.ToArray();
}
}
I am not able to full understand your code as the current BlobClient as of v11.1.1 does not expose any download methods. As #Guarav Mantri-AIS mentioned the readAsync can behave in that manner.
Consider an alternative using the DownloadToByteArrayAsync() which is part of the API. I have include the code required to connect but of course this is just for the purpose of demonstrating a full example.
Your method would be condensed as follows:
public async Task<byte[]> GetFile(string containerName, string fileName)
{
//i am getting the container here, not sure where or how you are doing this
var container = GetContainer("//your connection string", containerName);
//Get the blob first
ICloudBlob blob = container.GetBlockBlobReference(fileName);
//and now download it straight to a byte array
return await blobClient.DownloadAsync();
}
public CloudBlobContainer GetContainer(string connectionString, string containerName)
{
//1. connect to the account
var account = CloudStorageAccount.Parse(connectionString);
//2. create a client
var blobClient = _account.CreateCloudBlobClient();
//3. i am getting the container here, not sure where or how you are doing this
return = _blobClient.GetContainerReference(containerName);
}

Uploading ID to same Azure Blob Index that a file is being uploaded to using .Net SDK

I am uploading documents to my an Azure Blob Storage, which works perfect, but I want to be able to link and ID to this specifically uploaded document.
Below is my code for uploading the file:
[HttpPost]
public ActionResult Upload(HttpPostedFileBase file)
{
try
{
var path = Path.Combine(Server.MapPath("~/App_Data/Uploads"), file.FileName);
string searchServiceName = ConfigurationManager.AppSettings["SearchServiceName"];
string blobStorageKey = ConfigurationManager.AppSettings["BlobStorageKey"];
string blobStorageName = ConfigurationManager.AppSettings["BlobStorageName"];
string blobStorageURL = ConfigurationManager.AppSettings["BlobStorageURL"];
file.SaveAs(path);
var credentials = new StorageCredentials(searchServiceName, blobStorageKey);
var client = new CloudBlobClient(new Uri(blobStorageURL), credentials);
// Retrieve a reference to a container. (You need to create one using the mangement portal, or call container.CreateIfNotExists())
var container = client.GetContainerReference(blobStorageName);
// Retrieve reference to a blob named "myfile.gif".
var blockBlob = container.GetBlockBlobReference(file.FileName);
// Create or overwrite the "myblob" blob with contents from a local file.
using (var fileStream = System.IO.File.OpenRead(path))
{
blockBlob.UploadFromStream(fileStream);
}
System.IO.File.Delete(path);
return new JsonResult
{
JsonRequestBehavior = JsonRequestBehavior.AllowGet,
Data = "Success"
};
}
catch (Exception ex)
{
throw;
}
}
I have added the ClientID field to the Index(It is at the bottom), but have no idea how I am able to add this to this index. This is still al nerw to me and just need a little guidance if someone can help :
Thanks in advance.

Azure blob Storage is messing Image Quality

I'm using Azure Blob Storage for adding an image. Quality of my image is altered for some reason in bad manner which is leading me to incorrect functionality.
Can someone put some light on it. how can I avoid?
CloudBlockBlob blob = _blobContainer.GetBlockBlobReference(filename);
file.Position = 0;
stickerBlob.Properties.ContentType = "image/bmp";
stickerBlob.UploadFromStream(file); //its just a Stream Object.
Code shared. I'm also mentioning the content type as well. I can download and see the image as well but without any ContentType ,also my logic fails on that image.
Saving your images in .png format would be good. You may want to specify the compression levels for other formats (https://msdn.microsoft.com/en-us/library/bb882583(v=vs.110).aspx)
public bool UploadImage(Bitmap bitmap)
{
bool success = false;
try
{
CloudBlobContainer container = BlobClient.GetContainerReference("YourContainerName");
CloudBlockBlob blob = container.GetBlockBlobReference("YourBlobReference");
blob.Properties.ContentType = "image/png";
byte[] file;
using (MemoryStream stream = new MemoryStream())
{
bitmap.Save(stream, System.Drawing.Imaging.ImageFormat.Png);
file = stream.ToArray();
}
blob.UploadFromStream(new MemoryStream(file));
success = true;
}
catch (Exception ex)
{
//Handle error here
}
return success;
}

Getting blob count in an Azure Storage container

What is the most efficient way to get the count on the number of blobs in an Azure Storage container?
Right now I can't think of any way other than the code below:
CloudBlobContainer container = GetContainer("mycontainer");
var count = container.ListBlobs().Count();
If you just want to know how many blobs are in a container without writing code you can use the Microsoft Azure Storage Explorer application.
Open the desired BlobContainer
Click the Folder Statistics icon
Observe the count of blobs in the Activities window
I tried counting blobs using ListBlobs() and for a container with about 400,000 items, it took me well over 5 minutes.
If you have complete control over the container (that is, you control when writes occur), you could cache the size information in the container metadata and update it every time an item gets removed or inserted. Here is a piece of code that would return the container blob count:
static int CountBlobs(string storageAccount, string containerId)
{
CloudStorageAccount cloudStorageAccount = CloudStorageAccount.Parse(storageAccount);
CloudBlobClient blobClient = cloudStorageAccount.CreateCloudBlobClient();
CloudBlobContainer cloudBlobContainer = blobClient.GetContainerReference(containerId);
cloudBlobContainer.FetchAttributes();
string count = cloudBlobContainer.Metadata["ItemCount"];
string countUpdateTime = cloudBlobContainer.Metadata["CountUpdateTime"];
bool recountNeeded = false;
if (String.IsNullOrEmpty(count) || String.IsNullOrEmpty(countUpdateTime))
{
recountNeeded = true;
}
else
{
DateTime dateTime = new DateTime(long.Parse(countUpdateTime));
// Are we close to the last modified time?
if (Math.Abs(dateTime.Subtract(cloudBlobContainer.Properties.LastModifiedUtc).TotalSeconds) > 5) {
recountNeeded = true;
}
}
int blobCount;
if (recountNeeded)
{
blobCount = 0;
BlobRequestOptions options = new BlobRequestOptions();
options.BlobListingDetails = BlobListingDetails.Metadata;
foreach (IListBlobItem item in cloudBlobContainer.ListBlobs(options))
{
blobCount++;
}
cloudBlobContainer.Metadata.Set("ItemCount", blobCount.ToString());
cloudBlobContainer.Metadata.Set("CountUpdateTime", DateTime.Now.Ticks.ToString());
cloudBlobContainer.SetMetadata();
}
else
{
blobCount = int.Parse(count);
}
return blobCount;
}
This, of course, assumes that you update ItemCount/CountUpdateTime every time the container is modified. CountUpdateTime is a heuristic safeguard (if the container did get modified without someone updating CountUpdateTime, this will force a re-count) but it's not reliable.
The API doesn't contain a container count method or property, so you'd need to do something like what you posted. However, you'll need to deal with NextMarker if you exceed 5,000 items returned (or if you specify max # to return and the list exceeds that number). Then you'll make add'l calls based on NextMarker and add the counts.
EDIT: Per smarx: the SDK should take care of NextMarker for you. You'll need to deal with NextMarker if you're working at the API level, calling List Blobs through REST.
Alternatively, if you're controlling the blob insertions/deletions (through a wcf service, for example), you can use the blob container's metadata area to store a cached container count that you compute with each insert or delete. You'll just need to deal with write concurrency to the container.
Example using PHP API and getNextMarker.
Counts total number of blobs in an Azure container.
It takes a long time: about 30 seconds for 100000 blobs.
(assumes we have a valid $connectionString and a $container_name)
$blobRestProxy = ServicesBuilder::getInstance()->createBlobService($connectionString);
$opts = new ListBlobsOptions();
$nblobs = 0;
while($cont) {
$blob_list = $blobRestProxy->listBlobs($container_name, $opts);
$nblobs += count($blob_list->getBlobs());
$nextMarker = $blob_list->getNextMarker();
if (!$nextMarker || strlen($nextMarker) == 0) $cont = false;
else $opts->setMarker($nextMarker);
}
echo $nblobs;
If you are not using virtual directories, the following will work as previously answered.
CloudBlobContainer container = GetContainer("mycontainer");
var count = container.ListBlobs().Count();
However, the above code snippet may not have the desired count if you are using virtual directories.
For instance, if your blobs are stored similar to the following: /container/directory/filename.txt where the blob name = directory/filename.txt the container.ListBlobs().Count(); will only count how many "/directory" virtual directories you have. If you want to list blobs contained within virtual directories, you need to set the useFlatBlobListing = true in the ListBlobs() call.
CloudBlobContainer container = GetContainer("mycontainer");
var count = container.ListBlobs(null, true).Count();
Note: the ListBlobs() call with useFlatBlobListing = true is a much more expensive/slow call...
Bearing in mind all the performance concerns from the other answers, here is a version for v12 of the Azure SDK leveraging IAsyncEnumerable. This requires a package reference to System.Linq.Async.
public async Task<int> GetBlobCount()
{
var container = await GetBlobContainerClient();
var blobsPaged = container.GetBlobsAsync();
return await blobsPaged
.AsAsyncEnumerable()
.CountAsync();
}
With Python API of Azure Storage it is like:
from azure.storage import *
blob_service = BlobService(account_name='myaccount', account_key='mykey')
blobs = blob_service.list_blobs('mycontainer')
len(blobs) #returns the number of blob in a container
If you are using Azure.Storage.Blobs library, you can use something like below:
public int GetBlobCount(string containerName)
{
int count = 0;
BlobContainerClient container = new BlobContainerClient(blobConnctionString, containerName);
container.GetBlobs().ToList().ForEach(blob => count++);
return count;
}
Another Python example, works slow but correctly with >5000 files:
from azure.storage.blob import BlobServiceClient
constr="Connection string"
container="Container name"
blob_service_client = BlobServiceClient.from_connection_string(constr)
container_client = blob_service_client.get_container_client(container)
blobs_list = container_client.list_blobs()
num = 0
size = 0
for blob in blobs_list:
num += 1
size += blob.size
print(blob.name,blob.size)
print("Count: ", num)
print("Size: ", size)
I have spend quite period of time to find the below solution - I don't want to some one like me to waste time - so replying here even after 9 years
package com.sai.koushik.gandikota.test.app;
import com.microsoft.azure.storage.CloudStorageAccount;
import com.microsoft.azure.storage.blob.*;
public class AzureBlobStorageUtils {
public static void main(String[] args) throws Exception {
AzureBlobStorageUtils getCount = new AzureBlobStorageUtils();
String storageConn = "<StorageAccountConnection>";
String blobContainerName = "<containerName>";
String subContainer = "<subContainerName>";
Integer fileContainerCount = getCount.getFileCountInSpecificBlobContainersSubContainer(storageConn,blobContainerName, subContainer);
System.out.println(fileContainerCount);
}
public Integer getFileCountInSpecificBlobContainersSubContainer(String storageConn, String blobContainerName, String subContainer) throws Exception {
try {
CloudStorageAccount storageAccount = CloudStorageAccount.parse(storageConn);
CloudBlobClient blobClient = storageAccount.createCloudBlobClient();
CloudBlobContainer blobContainer = blobClient.getContainerReference(blobContainerName);
return ((CloudBlobDirectory) blobContainer.listBlobsSegmented().getResults().stream().filter(listBlobItem -> listBlobItem.getUri().toString().contains(subContainer)).findFirst().get()).listBlobsSegmented().getResults().size();
} catch (Exception e) {
throw new Exception(e.getMessage());
}
}
}
Count all blobs in a classic and new blob storage account. Building on #gandikota-saikoushik, this solution works for blob containers with a very large number of blobs.
//setup set values from Azure Portal
var accountName = "<ACCOUNTNAME>";
var accountKey = "<ACCOUTNKEY>";
var containerName = "<CONTAINTERNAME>";
uristr = $"DefaultEndpointsProtocol=https;AccountName={accountName};AccountKey={accountKey}";
var storageAccount = Microsoft.WindowsAzure.Storage.CloudStorageAccount.Parse(uristr);
var client = storageAccount.CreateCloudBlobClient();
var container = client.GetContainerReference(containerName);
BlobContinuationToken continuationToken = new BlobContinuationToken();
blobcount = CountBlobs(container, continuationToken).ConfigureAwait(false).GetAwaiter().GetResult();
Console.WriteLine($"blobcount:{blobcount}");
public static async Task<int> CountBlobs(CloudBlobContainer container, BlobContinuationToken currentToken)
{
BlobContinuationToken continuationToken = null;
var result = 0;
do
{
var response = await container.ListBlobsSegmentedAsync(continuationToken);
continuationToken = response.ContinuationToken;
result += response.Results.Count();
}
while (continuationToken != null);
return result;
}
List blobs approach is accurate but slow if you have millions of blobs. Another way that works in a few cases but is relatively fast is querying the MetricsHourPrimaryTransactionsBlob table. It is at the account level and metrics get aggregated hourly.
https://learn.microsoft.com/en-us/azure/storage/common/storage-analytics-metrics
You can use this
public static async Task<List<IListBlobItem>> ListBlobsAsync()
{
BlobContinuationToken continuationToken = null;
List<IListBlobItem> results = new List<IListBlobItem>();
do
{
CloudBlobContainer container = GetContainer("containerName");
var response = await container.ListBlobsSegmentedAsync(null,
true, BlobListingDetails.None, 5000, continuationToken, null, null);
continuationToken = response.ContinuationToken;
results.AddRange(response.Results);
} while (continuationToken != null);
return results;
}
and then call
var count = await ListBlobsAsync().Count;
hope it will be useful

Resources