What is the difference between the OpenReadAsync and DownloadToStreamAsync methods of CloudBlockBlob in Azure Blob Storage? I searched on Google but could not find an answer.
Both OpenReadAsync and DownloadToStreamAsync initiate an asynchronous operation to retrieve the blob's contents as a stream.
Based on my testing, the following sections should give you a better understanding of them:
Basic Concepts
DownloadToStreamAsync: Initiates an asynchronous operation to download the contents of a blob to a stream.
OpenReadAsync: Initiates an asynchronous operation to download the blob's contents as a stream.
Usage
a) DownloadToStreamAsync
Sample Code:
using (var fs = new FileStream(<yourLocalFilePath>, FileMode.Create))
{
    await blob.DownloadToStreamAsync(fs);
}
b) OpenReadAsync
Sample Code:
// Set the buffer size for reading from a blob stream; the default value is 4 MB.
blob.StreamMinimumReadSizeInBytes = 10 * 1024 * 1024; // 10 MB
using (var blobStream = await blob.OpenReadAsync())
{
    using (var fs = new FileStream(localFile, FileMode.Create))
    {
        await blobStream.CopyToAsync(fs);
    }
}
Capturing Network requests via Fiddler
a) DownloadToStreamAsync
b) OpenReadAsync
As the captures show, DownloadToStreamAsync sends a single GET request to retrieve the blob, while OpenReadAsync sends multiple range requests, sized according to the StreamMinimumReadSizeInBytes you have set (or its default value).
The difference between DownloadToStreamAsync and OpenReadAsync is that DownloadToStreamAsync will download the contents of the blob to the stream before returning, but OpenReadAsync will not trigger a download until the stream is consumed.
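To make that lazy behaviour concrete, here is a minimal sketch (assuming blob is an existing CloudBlockBlob, as in the samples above) that reads only the first kilobyte; because OpenReadAsync downloads on demand, roughly one chunk is transferred rather than the whole blob:
// Minimal sketch: read only the first 1 KB of the blob.
// OpenReadAsync downloads lazily, so only the first chunk (up to
// StreamMinimumReadSizeInBytes) is fetched, not the entire blob.
byte[] header = new byte[1024];
using (Stream blobStream = await blob.OpenReadAsync())
{
    int bytesRead = await blobStream.ReadAsync(header, 0, header.Length);
    // Inspect "header" here; the rest of the blob was never downloaded.
}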
For example, if you are returning a file stream from an ASP.NET Core service, you should use OpenReadAsync and not DownloadToStreamAsync:
Example with DownloadToStreamAsync (not recommended in this case):
Stream target = new MemoryStream(); // Could be FileStream
await blob.DownloadToStreamAsync(target); // Returns when streaming (downloading) is finished. This requires the whole blob to be kept in memory before returning!
_logger.Log(LogLevel.Debug, $"DownloadToStreamAsync: Length: {target.Length} Position: {target.Position}"); // Output: DownloadToStreamAsync: Length: 517000 Position: 517000
target.Position = 0; // Rewind before returning Stream:
return File(target, contentType: blob.Properties.ContentType, fileDownloadName: blob.Name, lastModified: blob.Properties.LastModified, entityTag: null);
Example with OpenReadAsync (recommended in this case):
// Do NOT put the stream in a using (or close it), as this will close the stream before ASP.NET finishes consuming it.
Stream blobStream = await blob.OpenReadAsync(); // Returns when the stream has been opened
_logger.Log(LogLevel.Debug, $"OpenReadAsync: Length: {blobStream.Length} Position: {blobStream.Position}"); // Output: OpenReadAsync: Length: 517000 Position: 0
return File(blobStream, contentType: blob.Properties.ContentType, fileDownloadName: blob.Name, lastModified: blob.Properties.LastModified, entityTag: null);
Answer from a member of Microsoft Azure (here):
The difference between DownloadStreamingAsync and OpenReadAsync is that the former gives you a network stream (wrapped with a few layers, but effectively think about it as a network stream) which holds on to a single connection, while the latter fetches the payload in chunks and buffers it, issuing multiple requests to fetch the content. Picking one over the other depends on the scenario: if the consuming code is fast and you have a good, broad network link to the storage account, then the former might be the better choice, as you avoid multiple request-response exchanges; but if the consumer is slow, then the latter might be a good idea, as it releases the connection back to the pool right after reading and buffering the next chunk. We recommend perf testing your app with both to reveal which is the best choice if it's not obvious.
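The quote refers to the newer Azure.Storage.Blobs (v12) client. As a rough, hedged sketch of the two options it compares (assuming blobClient is an existing BlobClient and a package version that includes DownloadStreamingAsync, roughly 12.9 or later):
// DownloadStreamingAsync: a single response whose Content is effectively the
// network stream; it holds one connection until it is fully read or disposed.
var response = await blobClient.DownloadStreamingAsync();
using (Stream networkStream = response.Value.Content)
using (var file = File.Create("download-streaming.bin"))
{
    await networkStream.CopyToAsync(file);
}

// OpenReadAsync: fetches the payload in buffered chunks via multiple range
// requests, releasing the connection between chunks.
using (Stream chunkedStream = await blobClient.OpenReadAsync())
using (var file = File.Create("open-read.bin"))
{
    await chunkedStream.CopyToAsync(file);
}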
OpenReadAsync returns a Task<Stream> and you use it with an await.
Sample test method:
CloudBlobContainer container = GetRandomContainerReference();
try
{
    await container.CreateAsync();
    CloudBlockBlob blob = container.GetBlockBlobReference("blob1");
    using (MemoryStream wholeBlob = new MemoryStream(buffer))
    {
        await blob.UploadFromStreamAsync(wholeBlob);
    }
    using (MemoryStream wholeBlob = new MemoryStream(buffer))
    {
        using (var blobStream = await blob.OpenReadAsync())
        {
            await TestHelper.AssertStreamsAreEqualAsync(wholeBlob, blobStream);
        }
    }
}
finally
{
    await container.DeleteIfExistsAsync();
}
DownloadToStreamAsync is a virtual (can be overridden) method that returns a Task and takes a stream object as input.
Sample usage:
await blob.DownloadToStreamAsync(memoryStream);
Related
I want to download a storage blob from Azure and stream it to a client via a .NET web app. The blob was uploaded correctly and is visible in my Azure storage account.
Surprisingly, the following throws an exception within HttpBaseStream:
[...]
var blobClient = _containerClient.GetBlobClient(Path.Combine(fileName));
var stream = await blobClient.OpenReadAsync();
return stream;
-> When I step further and return a File (return File(stream, MediaTypeNames.Application.Octet);), the download works as intended.
I tried to push the stream into a MemoryStream, which also fails with the same exception:
[...]
var blobClient = _containerClient.GetBlobClient(Path.Combine(fileName));
var stream = new MemoryStream();
await blobClient.DownloadToAsync(stream);
return stream;
-> When I step further, returning the file results in a timeout.
How can I fix that? Why do I get this exception? I followed the official quickstart guide from Microsoft.
the following throws an exception within HttpBaseStream
It looks like the HTTP result type is attempting to set the Content-Length header and is reading Length to do so. That would be the natural thing to do. However, it would also be natural to handle the NotSupportedException and just not set Content-Length at all.
If the NotSupportedException only shows up when running in the debugger, then just ignore it.
If the exception is actually thrown to your code (i.e., causing the request to fail), then you'll need to follow the rest of this answer.
First, create a minimal reproducible example and report a bug to the .NET team.
To work around this issue in the meantime, I recommend writing a stream wrapper that returns an already-determined length, which you can get from the Azure blob attributes. E.g.:
public sealed class KnownLengthStreamWrapper : Stream
{
    private readonly Stream _stream;
    private readonly long _length;

    public KnownLengthStreamWrapper(Stream stream, long length)
    {
        _stream = stream;
        _length = length;
    }

    // Stream.Length has no setter, so report the known length from a field.
    public override long Length => _length;

    ... // override all other Stream members and forward to _stream.
}
That should be sufficient to get your app working.
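For example, a hedged usage sketch, assuming the same blobClient as in the question: the length comes from the blob's properties, and the wrapper reports it so ASP.NET Core never has to call Length on the underlying blob stream.
var properties = await blobClient.GetPropertiesAsync();       // ContentLength is known up front
var blobStream = await blobClient.OpenReadAsync();
var wrappedStream = new KnownLengthStreamWrapper(blobStream, properties.Value.ContentLength);
return File(wrappedStream, MediaTypeNames.Application.Octet);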
I tried to push the stream into a MemoryStream
This didn't work because you'd need to "rewind" the MemoryStream at some point, e.g.:
var stream = new MemoryStream();
await blobClient.DownloadToAsync(stream);
stream.Position = 0;
return stream;
Check this sample covering all the blob options, which I have already posted on GitHub and which works as expected. Reference
public void DownloadBlob(string path)
{
    CloudStorageAccount storageAccount = CloudStorageAccount.Parse(CloudConfigurationManager.GetSetting("StorageConnectionString"));
    CloudBlobClient client = storageAccount.CreateCloudBlobClient();
    CloudBlobContainer container = client.GetContainerReference("images");
    CloudBlockBlob blockBlob = container.GetBlockBlobReference(Path.GetFileName(path));
    using (MemoryStream ms = new MemoryStream())
    {
        blockBlob.DownloadToStream(ms);
        HttpContext.Current.Response.ContentType = blockBlob.Properties.ContentType;
        HttpContext.Current.Response.AddHeader("Content-Disposition", "Attachment; filename=" + Path.GetFileName(path));
        HttpContext.Current.Response.AddHeader("Content-Length", blockBlob.Properties.Length.ToString());
        HttpContext.Current.Response.BinaryWrite(ms.ToArray());
        HttpContext.Current.Response.Flush();
        HttpContext.Current.Response.Close();
    }
}
The following trigger removes EXIF data from blobs (which are images) after they are uploaded to Azure Storage. The problem is that the blob trigger fires at least 5 times for each blob.
In the trigger, the blob is updated by writing a new stream of data to it. I had assumed that blob receipts would prevent the blob trigger from firing again for this blob.
[FunctionName("ExifDataPurge")]
public async System.Threading.Tasks.Task RunAsync(
[BlobTrigger("container/{name}.{extension}", Connection = "")]CloudBlockBlob image,
string name,
string extension,
string blobTrigger,
ILogger log)
{
log.LogInformation($"C# Blob trigger function Processed blob\n Name:{name}");
try
{
var memoryStream = new MemoryStream();
await image.DownloadToStreamAsync(memoryStream);
memoryStream.Position = 0;
using (Image largeImage = Image.Load(memoryStream))
{
if (largeImage.Metadata.ExifProfile != null)
{
//strip the exif data from the image.
for (int i = 0; i < largeImage.Metadata.ExifProfile.Values.Count; i++)
{
largeImage.Metadata.ExifProfile.RemoveValue(largeImage.Metadata.ExifProfile.Values[i].Tag);
}
var exifStrippedImage = new MemoryStream();
largeImage.Save(exifStrippedImage, new SixLabors.ImageSharp.Formats.Jpeg.JpegEncoder());
exifStrippedImage.Position = 0;
await image.UploadFromStreamAsync(exifStrippedImage);
}
}
}
catch (UnknownImageFormatException unknownImageFormatException)
{
log.LogInformation($"Blob is not a valid Image : {name}.{extension}");
}
}
Triggers are handled in such a way that they track which blobs have been processed by storing receipts in the container azure-webjobs-hosts. Any blob without a receipt, or with an old receipt (based on the blob's ETag), will be processed (or reprocessed).
Since you are calling await image.UploadFromStreamAsync(exifStrippedImage);, the blob's ETag changes and it gets triggered again (it is treated as not yet processed).
When you call await image.UploadFromStreamAsync(exifStrippedImage);, it updates the blob, so the blob function will trigger again.
To break the loop, you can check the existing CacheControl property on the blob and skip the update if it has already been set.
// Set the CacheControl property to expire in 1 hour (3600 seconds)
blob.Properties.CacheControl = "max-age=3600";
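A hedged sketch of that guard, using the same CloudBlockBlob "image" as in the trigger above (the "max-age=3600" value is only an illustrative marker):
// If a previous run already set CacheControl, stop here and break the loop.
if (image.Properties.CacheControl == "max-age=3600")
{
    log.LogInformation($"Blob {name} already processed, skipping.");
    return;
}

// Otherwise mark the blob (alongside re-uploading the stripped image).
image.Properties.CacheControl = "max-age=3600";
await image.SetPropertiesAsync();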
So I've addressed this by storing a Status in metadata against the blob as it's processed.
https://learn.microsoft.com/en-us/azure/storage/blobs/storage-blob-container-properties-metadata
The trigger then contains a guard to check for the metadata.
if (image.Metadata.ContainsKey("Status") && image.Metadata["Status"] == "Processed")
{
    // Any subsequent trigger for this blob will enter this block.
    log.LogInformation($"blob: {name} has already been processed");
}
else
{
    // First time the trigger fires for this blob: process it, then mark it.
    image.Metadata.Add("Status", "Processed");
    await image.SetMetadataAsync();
}
The other answers pointed me in the right direction. I think it is more correct to use the metadata. Storing an ETag elsewhere seems redundant when we can store metadata. The use of "CacheControl" seems like too much of a hack; other developers might be confused about what I have done and why.
I'm currently using v2 of Azure Functions. I've set the environment to 64-bit and am compiling to .NET Standard 2.0. host.json specifies version 2.
I'm reading in a .csv and it works fine for smaller files. But when I read a 180 MB .csv into a List of string[], memory balloons to over 1 GB on read, and when I try to parse it, it goes over 2 GB and then throws an 'Out of Memory' exception. Even running on an App Service plan with more than 3.5 GB hasn't solved the issue.
Edit:
I'm using this:
Uri blobUri = AppendSasOnUri(blobName);
_webClient = new WebClient();
Stream sourceStream = _webClient.OpenRead(blobUri);
_reader = new StreamReader(sourceStream);
However, since it's a CSV, I'm splitting out entire columns of data. It's pretty hard to get away from this:
internal async Task<List<string[]>> ReadCsvAsync()
{
    while (!_reader.EndOfStream)
    {
        string[] currentCsvRow = await ReadCsvRowAsync();
        _fullBlobCsv.Add(currentCsvRow);
    }
    return _fullBlobCsv;
}
The goal is to store JSON into a blob when all is said and done.
Try using a stream (StreamReader) to read the input .csv file and process one line at a time.
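A minimal sketch of that line-at-a-time approach, assuming blob is the CloudBlockBlob holding the .csv; ProcessRow is a hypothetical per-row handler standing in for whatever you do with each row, so only the current line is held in memory instead of a full List<string[]>:
using (var reader = new StreamReader(await blob.OpenReadAsync()))
{
    while (!reader.EndOfStream)
    {
        string line = await reader.ReadLineAsync();
        string[] columns = line.Split(',');
        ProcessRow(columns); // hypothetical handler: write each result out instead of buffering
    }
}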
I'm able to parse 300 MB files on the Consumption plan with streams. My use case is not the same as yours, but it is similar: parse a large concatenated PDF file, separate it into 5,000+ smaller files, and store the separated files in a blob container. Below is my code for reference.
For your use case, you may want to use CloudAppendBlob instead of CloudBlockBlob if you're pushing all parsed data into a single blob (see the sketch after the code below).
public async static void ExtractSmallerFiles(CloudBlockBlob myBlob, string fileDate, ILogger log)
{
    using (var reader = new StreamReader(await myBlob.OpenReadAsync()))
    {
        CloudBlockBlob blockBlob = null;
        var fileContents = new StringBuilder(string.Empty);
        while (!reader.EndOfStream)
        {
            var line = reader.ReadLine();
            if (line.StartsWith("%%MS_SKEY_0000_000_PDF:"))
            {
                var matches = Regex.Match(line, @"%%MS_SKEY_0000_000_PDF: A(\d+)_SMFL_B1234_D(\d{8})_A\d+_M(\d{15}) _N\d+");
                var smallFileDate = matches.Groups[2];
                var accountNumber = matches.Groups[3];
                var fileName = $"SmallerFiles/{smallFileDate}/{accountNumber}.pdf";
                blockBlob = myBlob.Container.GetBlockBlobReference(fileName);
            }
            fileContents.AppendLine(line);
            if (line.Equals("%%EOF"))
            {
                log.LogInformation($"Uploading {fileContents.Length} bytes to {blockBlob.Name}");
                await blockBlob.UploadTextAsync(fileContents.ToString());
                fileContents = new StringBuilder(string.Empty);
            }
        }
        await myBlob.DeleteAsync();
        log.LogInformation("Extracted Smaller files");
    }
}
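For the CloudAppendBlob variant mentioned before the code, a hedged sketch (assuming container is the target CloudBlobContainer; the blob name and the jsonFragment variable are illustrative):
CloudAppendBlob appendBlob = container.GetAppendBlobReference("parsed/output.json");
if (!await appendBlob.ExistsAsync())
{
    await appendBlob.CreateOrReplaceAsync(); // create the append blob once, then keep appending to it
}
await appendBlob.AppendTextAsync(jsonFragment); // append each parsed chunk as you produce it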
I am looking for a way to upload files to Azure Blob Storage.
I found the azure-storage npm package.
But I'm having a problem with the 'createBlockBlobFromStream' method.
I don't know how to create a stream from a Uint8Array.
xhr.onload = function (e) {
    if (this.status == 200) {
        // Note: .response instead of .responseText
        const blob = new Blob([this.response]);
        console.log(audios[i].file);
        const reader = new FileReader();
        reader.onload = function () {
            const data = new Uint8Array(reader.result);
            Meteor.call('uploadFilesViaSDK', data);
        };
        reader.readAsArrayBuffer(blob);
    }
};
I'm trying to migrate files from S3 to Azure Blob Storage. That's why I download files from S3, read them as an ArrayBuffer, and convert them to a Uint8Array.
And now I am looking for a way to upload this data to Azure via the azure.createBlockBlobFromStream method.
Specifically, I need an example of creating a stream from Uint8Array.
I'll be grateful for any answer
In addition to the approach provided by Gaurav, once you have created a stream from the Uint8Array by using streamifier, you can use the createWriteStreamToBlockBlob function to write to a block blob from a stream. With that, you can transmit the stream by calling .pipe():
streamifier.createReadStream(new Buffer(uint8)).pipe(blobSvc.createWriteStreamToBlockBlob('mycontainer', 'test.txt'));
I am writing automated tests and I need to check whether the upload succeeded.
How do I do that? How come there is no FileExist method?
An Exists method for blobs has been added in the new Storage Client Library 2.0 release. Since you are using an older library version, you can instead use FetchAttributes, which will throw an exception if the blob does not exist.
On the other hand, as Magnus also mentioned, Upload* methods throw an exception if they do not succeed.
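A minimal sketch of the FetchAttributes pattern, assuming the 1.x Microsoft.WindowsAzure.StorageClient library (where the not-found case surfaces as a StorageClientException with ErrorCode ResourceNotFound):
public static bool BlobExists(CloudBlob blob)
{
    try
    {
        blob.FetchAttributes(); // throws if the blob does not exist
        return true;
    }
    catch (StorageClientException ex)
    {
        if (ex.ErrorCode == StorageErrorCode.ResourceNotFound)
            return false;
        throw; // any other failure is unexpected, so rethrow it
    }
}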
I recommend checking the file size for cases where, for example, the connection to the server was closed before the data transfer completed.
public bool WriteDocumentStream(String documentId, Stream dataStream, long length)
{
    CloudBlobContainer container = BlobClient.GetContainerReference(ContainerName);
    CloudBlob blob = container.GetBlobReference(documentId);
    blob.UploadFromStream(dataStream);
    blob.FetchAttributes();
    bool success = blob.Properties.Length == length;
    if (!success)
        blob.Delete();
    return success;
}
// "length" should be something like context.Request.ContentLength
// (if the request has ContentLength defined in its headers).