The issue happens when I upload a file to one blob (Blob1) which in turn runs a background compression service. The background service streams the file from Blob1, compresses it, and stores it as a zip file in a separate blob (Blob2) to cache for the user to download.
The process works without issue for files < 2GB, but throws a Micrososft.Azure.Storage.StorageException when the file size is > 2GB.
using Microsoft.Azure.Storage.Blob 11.2.2
Sample Code
public async void DoWork(CancellationToken cancellationToken)
{
while (!cancellationToken.IsCancellationRequested)
{
await _messageSemaphore.WaitAsync(cancellationToken);
MyModel model = await _queue.PeekMessage();
if(model != null)
{
try
{
//Get CloudBlockBlob zip blob reference
var zipBlockBlob = await _storageAccount.GetFileBlobReference(_configuration[ConfigKeys.ContainerName], model.Filename, model.FileRelativePath);
using (var zipStream = zipBlockBlob.OpenWrite()) //Opens zipstream
{
using (var archive = new ZipArchive(zipStream, ZipArchiveMode.Create, false)) // Create new ZipArchive
{
//Add each file to the zip archive
foreach (var fileUri in Files)
{
var file = new Uri(fileUri);
var cloudBlockBlob = blobContainer.GetBlockBlobReference(file);
using (var blobStream = cloudBlockBlob.OpenRead())//Opens read stream
{
//Create new ZipEntry
var zipEntry = archive.CreateEntry(model.Filename, CompressionLevel.Fastest);
using (var zipEntryStream = zipEntry.Open())
{
//Zip file
blobStream.CopyTo(zipEntryStream);
}
}
}
}
}
}
catch (FileNotFoundException e)
{
Console.WriteLine($"Download Error: {e.Message}");
}
catch (Microsoft.Azure.Storage.StorageException e) // "
{
Console.WriteLine($"Storage Exception Error: {e.Message}");
}
}
else
{
_messageSemaphore.Release();
// Wait for 1 minute between polls when idle
await Task.Delay(_sleepTime);
}
}
}
The problem was with concurrency and having multiple application instances writing to the same blob at the same time. To get around this I ended up implementing an Azure blob lease on the blob being created but had to create a temp file to apply the lease to. This is a decent enough work around for now, but this could / should probably be implemented using an Azure event driven service.
i'm trying to download file from azure blob storage, but it returns only part of file. What i'm doing wrong ? File in storage is not corrupted
public async Task<byte[]> GetFile(string fileName)
{
var blobClient = BlobContainerClient.GetBlobClient(fileName);
var downloadInfo = await blobClient.DownloadAsync();
byte[] b = new byte[downloadInfo.Value.ContentLength];
await downloadInfo.Value.Content.ReadAsync(b, 0, (int)downloadInfo.Value.ContentLength);
return b;
}
I'm using Azure.Storage.Blobs 12.4.2 package. I tried this code and it works for me
public async Task<byte[]> GetFile(string fileName)
{
var blobClient = BlobContainerClient.GetBlobClient(fileName);
using (var memorystream = new MemoryStream())
{
await blobClient.DownloadToAsync(memorystream);
return memorystream.ToArray();
}
}
I am not able to full understand your code as the current BlobClient as of v11.1.1 does not expose any download methods. As #Guarav Mantri-AIS mentioned the readAsync can behave in that manner.
Consider an alternative using the DownloadToByteArrayAsync() which is part of the API. I have include the code required to connect but of course this is just for the purpose of demonstrating a full example.
Your method would be condensed as follows:
public async Task<byte[]> GetFile(string containerName, string fileName)
{
//i am getting the container here, not sure where or how you are doing this
var container = GetContainer("//your connection string", containerName);
//Get the blob first
ICloudBlob blob = container.GetBlockBlobReference(fileName);
//and now download it straight to a byte array
return await blobClient.DownloadAsync();
}
public CloudBlobContainer GetContainer(string connectionString, string containerName)
{
//1. connect to the account
var account = CloudStorageAccount.Parse(connectionString);
//2. create a client
var blobClient = _account.CreateCloudBlobClient();
//3. i am getting the container here, not sure where or how you are doing this
return = _blobClient.GetContainerReference(containerName);
}
I am uploading documents to my an Azure Blob Storage, which works perfect, but I want to be able to link and ID to this specifically uploaded document.
Below is my code for uploading the file:
[HttpPost]
public ActionResult Upload(HttpPostedFileBase file)
{
try
{
var path = Path.Combine(Server.MapPath("~/App_Data/Uploads"), file.FileName);
string searchServiceName = ConfigurationManager.AppSettings["SearchServiceName"];
string blobStorageKey = ConfigurationManager.AppSettings["BlobStorageKey"];
string blobStorageName = ConfigurationManager.AppSettings["BlobStorageName"];
string blobStorageURL = ConfigurationManager.AppSettings["BlobStorageURL"];
file.SaveAs(path);
var credentials = new StorageCredentials(searchServiceName, blobStorageKey);
var client = new CloudBlobClient(new Uri(blobStorageURL), credentials);
// Retrieve a reference to a container. (You need to create one using the mangement portal, or call container.CreateIfNotExists())
var container = client.GetContainerReference(blobStorageName);
// Retrieve reference to a blob named "myfile.gif".
var blockBlob = container.GetBlockBlobReference(file.FileName);
// Create or overwrite the "myblob" blob with contents from a local file.
using (var fileStream = System.IO.File.OpenRead(path))
{
blockBlob.UploadFromStream(fileStream);
}
System.IO.File.Delete(path);
return new JsonResult
{
JsonRequestBehavior = JsonRequestBehavior.AllowGet,
Data = "Success"
};
}
catch (Exception ex)
{
throw;
}
}
I have added the ClientID field to the Index(It is at the bottom), but have no idea how I am able to add this to this index. This is still al nerw to me and just need a little guidance if someone can help :
Thanks in advance.
I'm using Azure Blob Storage for adding an image. Quality of my image is altered for some reason in bad manner which is leading me to incorrect functionality.
Can someone put some light on it. how can I avoid?
CloudBlockBlob blob = _blobContainer.GetBlockBlobReference(filename);
file.Position = 0;
stickerBlob.Properties.ContentType = "image/bmp";
stickerBlob.UploadFromStream(file); //its just a Stream Object.
Code shared. I'm also mentioning the content type as well. I can download and see the image as well but without any ContentType ,also my logic fails on that image.
Saving your images in .png format would be good. You may want to specify the compression levels for other formats (https://msdn.microsoft.com/en-us/library/bb882583(v=vs.110).aspx)
public bool UploadImage(Bitmap bitmap)
{
bool success = false;
try
{
CloudBlobContainer container = BlobClient.GetContainerReference("YourContainerName");
CloudBlockBlob blob = container.GetBlockBlobReference("YourBlobReference");
blob.Properties.ContentType = "image/png";
byte[] file;
using (MemoryStream stream = new MemoryStream())
{
bitmap.Save(stream, System.Drawing.Imaging.ImageFormat.Png);
file = stream.ToArray();
}
blob.UploadFromStream(new MemoryStream(file));
success = true;
}
catch (Exception ex)
{
//Handle error here
}
return success;
}
What is the most efficient way to get the count on the number of blobs in an Azure Storage container?
Right now I can't think of any way other than the code below:
CloudBlobContainer container = GetContainer("mycontainer");
var count = container.ListBlobs().Count();
If you just want to know how many blobs are in a container without writing code you can use the Microsoft Azure Storage Explorer application.
Open the desired BlobContainer
Click the Folder Statistics icon
Observe the count of blobs in the Activities window
I tried counting blobs using ListBlobs() and for a container with about 400,000 items, it took me well over 5 minutes.
If you have complete control over the container (that is, you control when writes occur), you could cache the size information in the container metadata and update it every time an item gets removed or inserted. Here is a piece of code that would return the container blob count:
static int CountBlobs(string storageAccount, string containerId)
{
CloudStorageAccount cloudStorageAccount = CloudStorageAccount.Parse(storageAccount);
CloudBlobClient blobClient = cloudStorageAccount.CreateCloudBlobClient();
CloudBlobContainer cloudBlobContainer = blobClient.GetContainerReference(containerId);
cloudBlobContainer.FetchAttributes();
string count = cloudBlobContainer.Metadata["ItemCount"];
string countUpdateTime = cloudBlobContainer.Metadata["CountUpdateTime"];
bool recountNeeded = false;
if (String.IsNullOrEmpty(count) || String.IsNullOrEmpty(countUpdateTime))
{
recountNeeded = true;
}
else
{
DateTime dateTime = new DateTime(long.Parse(countUpdateTime));
// Are we close to the last modified time?
if (Math.Abs(dateTime.Subtract(cloudBlobContainer.Properties.LastModifiedUtc).TotalSeconds) > 5) {
recountNeeded = true;
}
}
int blobCount;
if (recountNeeded)
{
blobCount = 0;
BlobRequestOptions options = new BlobRequestOptions();
options.BlobListingDetails = BlobListingDetails.Metadata;
foreach (IListBlobItem item in cloudBlobContainer.ListBlobs(options))
{
blobCount++;
}
cloudBlobContainer.Metadata.Set("ItemCount", blobCount.ToString());
cloudBlobContainer.Metadata.Set("CountUpdateTime", DateTime.Now.Ticks.ToString());
cloudBlobContainer.SetMetadata();
}
else
{
blobCount = int.Parse(count);
}
return blobCount;
}
This, of course, assumes that you update ItemCount/CountUpdateTime every time the container is modified. CountUpdateTime is a heuristic safeguard (if the container did get modified without someone updating CountUpdateTime, this will force a re-count) but it's not reliable.
The API doesn't contain a container count method or property, so you'd need to do something like what you posted. However, you'll need to deal with NextMarker if you exceed 5,000 items returned (or if you specify max # to return and the list exceeds that number). Then you'll make add'l calls based on NextMarker and add the counts.
EDIT: Per smarx: the SDK should take care of NextMarker for you. You'll need to deal with NextMarker if you're working at the API level, calling List Blobs through REST.
Alternatively, if you're controlling the blob insertions/deletions (through a wcf service, for example), you can use the blob container's metadata area to store a cached container count that you compute with each insert or delete. You'll just need to deal with write concurrency to the container.
Example using PHP API and getNextMarker.
Counts total number of blobs in an Azure container.
It takes a long time: about 30 seconds for 100000 blobs.
(assumes we have a valid $connectionString and a $container_name)
$blobRestProxy = ServicesBuilder::getInstance()->createBlobService($connectionString);
$opts = new ListBlobsOptions();
$nblobs = 0;
while($cont) {
$blob_list = $blobRestProxy->listBlobs($container_name, $opts);
$nblobs += count($blob_list->getBlobs());
$nextMarker = $blob_list->getNextMarker();
if (!$nextMarker || strlen($nextMarker) == 0) $cont = false;
else $opts->setMarker($nextMarker);
}
echo $nblobs;
If you are not using virtual directories, the following will work as previously answered.
CloudBlobContainer container = GetContainer("mycontainer");
var count = container.ListBlobs().Count();
However, the above code snippet may not have the desired count if you are using virtual directories.
For instance, if your blobs are stored similar to the following: /container/directory/filename.txt where the blob name = directory/filename.txt the container.ListBlobs().Count(); will only count how many "/directory" virtual directories you have. If you want to list blobs contained within virtual directories, you need to set the useFlatBlobListing = true in the ListBlobs() call.
CloudBlobContainer container = GetContainer("mycontainer");
var count = container.ListBlobs(null, true).Count();
Note: the ListBlobs() call with useFlatBlobListing = true is a much more expensive/slow call...
Bearing in mind all the performance concerns from the other answers, here is a version for v12 of the Azure SDK leveraging IAsyncEnumerable. This requires a package reference to System.Linq.Async.
public async Task<int> GetBlobCount()
{
var container = await GetBlobContainerClient();
var blobsPaged = container.GetBlobsAsync();
return await blobsPaged
.AsAsyncEnumerable()
.CountAsync();
}
With Python API of Azure Storage it is like:
from azure.storage import *
blob_service = BlobService(account_name='myaccount', account_key='mykey')
blobs = blob_service.list_blobs('mycontainer')
len(blobs) #returns the number of blob in a container
If you are using Azure.Storage.Blobs library, you can use something like below:
public int GetBlobCount(string containerName)
{
int count = 0;
BlobContainerClient container = new BlobContainerClient(blobConnctionString, containerName);
container.GetBlobs().ToList().ForEach(blob => count++);
return count;
}
Another Python example, works slow but correctly with >5000 files:
from azure.storage.blob import BlobServiceClient
constr="Connection string"
container="Container name"
blob_service_client = BlobServiceClient.from_connection_string(constr)
container_client = blob_service_client.get_container_client(container)
blobs_list = container_client.list_blobs()
num = 0
size = 0
for blob in blobs_list:
num += 1
size += blob.size
print(blob.name,blob.size)
print("Count: ", num)
print("Size: ", size)
I have spend quite period of time to find the below solution - I don't want to some one like me to waste time - so replying here even after 9 years
package com.sai.koushik.gandikota.test.app;
import com.microsoft.azure.storage.CloudStorageAccount;
import com.microsoft.azure.storage.blob.*;
public class AzureBlobStorageUtils {
public static void main(String[] args) throws Exception {
AzureBlobStorageUtils getCount = new AzureBlobStorageUtils();
String storageConn = "<StorageAccountConnection>";
String blobContainerName = "<containerName>";
String subContainer = "<subContainerName>";
Integer fileContainerCount = getCount.getFileCountInSpecificBlobContainersSubContainer(storageConn,blobContainerName, subContainer);
System.out.println(fileContainerCount);
}
public Integer getFileCountInSpecificBlobContainersSubContainer(String storageConn, String blobContainerName, String subContainer) throws Exception {
try {
CloudStorageAccount storageAccount = CloudStorageAccount.parse(storageConn);
CloudBlobClient blobClient = storageAccount.createCloudBlobClient();
CloudBlobContainer blobContainer = blobClient.getContainerReference(blobContainerName);
return ((CloudBlobDirectory) blobContainer.listBlobsSegmented().getResults().stream().filter(listBlobItem -> listBlobItem.getUri().toString().contains(subContainer)).findFirst().get()).listBlobsSegmented().getResults().size();
} catch (Exception e) {
throw new Exception(e.getMessage());
}
}
}
Count all blobs in a classic and new blob storage account. Building on #gandikota-saikoushik, this solution works for blob containers with a very large number of blobs.
//setup set values from Azure Portal
var accountName = "<ACCOUNTNAME>";
var accountKey = "<ACCOUTNKEY>";
var containerName = "<CONTAINTERNAME>";
uristr = $"DefaultEndpointsProtocol=https;AccountName={accountName};AccountKey={accountKey}";
var storageAccount = Microsoft.WindowsAzure.Storage.CloudStorageAccount.Parse(uristr);
var client = storageAccount.CreateCloudBlobClient();
var container = client.GetContainerReference(containerName);
BlobContinuationToken continuationToken = new BlobContinuationToken();
blobcount = CountBlobs(container, continuationToken).ConfigureAwait(false).GetAwaiter().GetResult();
Console.WriteLine($"blobcount:{blobcount}");
public static async Task<int> CountBlobs(CloudBlobContainer container, BlobContinuationToken currentToken)
{
BlobContinuationToken continuationToken = null;
var result = 0;
do
{
var response = await container.ListBlobsSegmentedAsync(continuationToken);
continuationToken = response.ContinuationToken;
result += response.Results.Count();
}
while (continuationToken != null);
return result;
}
List blobs approach is accurate but slow if you have millions of blobs. Another way that works in a few cases but is relatively fast is querying the MetricsHourPrimaryTransactionsBlob table. It is at the account level and metrics get aggregated hourly.
https://learn.microsoft.com/en-us/azure/storage/common/storage-analytics-metrics
You can use this
public static async Task<List<IListBlobItem>> ListBlobsAsync()
{
BlobContinuationToken continuationToken = null;
List<IListBlobItem> results = new List<IListBlobItem>();
do
{
CloudBlobContainer container = GetContainer("containerName");
var response = await container.ListBlobsSegmentedAsync(null,
true, BlobListingDetails.None, 5000, continuationToken, null, null);
continuationToken = response.ContinuationToken;
results.AddRange(response.Results);
} while (continuationToken != null);
return results;
}
and then call
var count = await ListBlobsAsync().Count;
hope it will be useful