Azure Blob storage append blob 409 / modified error when refreshing portal / storage explorer whilst appending - azure

I am receiving an error whilst uploading chunks to an append blob in Azure. Left alone, the process works fine, but the problem arises the moment I refresh the container with Storage Explorer (latest version) or refresh the page in the Azure portal whilst it's uploading. My process throws the following:
An exception of type 'Azure.RequestFailedException' occurred in System.Private.CoreLib.dll but was not handled in user code: 'The blob has been modified while being read.
RequestId:62778adb-001e-011e-29a4-d589bf000000
Time:2021-11-09T20:02:29.3183234Z
Status: 409 (The blob has been modified while being read.)
ErrorCode: BlobModifiedWhileReading
Taking a lease out on the file makes no difference.
Test code is
using System;
using System.IO;
using System.Buffers;
using System.Threading.Tasks;
using Azure.Storage.Blobs;
using System.Text;
using Azure.Storage.Blobs.Specialized;

namespace str
{
    static class Program
    {
        static async Task Main(string[] args)
        {
            const string ContainerName = "files";
            const string BlobName = "my.blob";
            const int ChunkSize = 4194304; // 4MB
            const string connstr = "some-connecting-string-to-your-datalake-gen-2-account";

            BlobServiceClient blobClient = new(connstr);
            BlobContainerClient containerClient = blobClient.GetBlobContainerClient(ContainerName);
            await containerClient.CreateIfNotExistsAsync();

            AppendBlobClient appendClient = containerClient.GetAppendBlobClient(BlobName);
            await appendClient.CreateIfNotExistsAsync();

            using FileStream fs = await FileMaker.CreateNonsenseFileAsync();
            using BinaryReader reader = new(fs);

            bool readLoop = true;
            while (readLoop)
            {
                byte[] chunk = reader.ReadBytes(ChunkSize);
                if (chunk.Length > 0)
                    await appendClient.AppendBlockAsync(new MemoryStream(chunk));
                readLoop = chunk.Length == ChunkSize;
            }

            fs.Close();
            File.Delete(fs.Name);
        }
    }

    public static class FileMaker
    {
        public static async Task<FileStream> CreateNonsenseFileAsync(int blocks = 30000)
        {
            string tempFile = Path.GetTempFileName();
            FileStream fs = File.OpenWrite(tempFile);
            byte[] buffer = ArrayPool<byte>.Shared.Rent(1024);
            Random randy = new();

            using (StreamWriter writer = new(fs))
            {
                for (int i = 0; i < blocks; i++)
                {
                    randy.NextBytes(buffer);
                    await writer.WriteAsync(Encoding.UTF8.GetString(buffer));
                }
            }

            return File.OpenRead(tempFile);
        }
    }
}
csproj is
<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>net5.0</TargetFramework>
  </PropertyGroup>
  <ItemGroup>
    <PackageReference Include="Azure.Storage.Files.DataLake" Version="12.8.0" />
    <PackageReference Include="System.Buffers" Version="4.5.1" />
  </ItemGroup>
</Project>
I can only imagine that the act of refreshing the page somehow triggers a metadata change and the service gives up rather enthusiastically, but that seems fairly arbitrary, as you may not know who is poking about in blob storage while you are uploading to it.
As above, left alone with no one refreshing the page in the portal or Storage Explorer, this code works fine and uploads the garbage in 4 MB chunks (the limit for append blob writes; switching to 2 MB chunks doesn't make a difference) without a problem.
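For reference, a defensive version of the append call that simply retries on a 409 looks like this (a sketch only; the retry count and delay are arbitrary, and this works around the error rather than explains it):

// Sketch: retry a single AppendBlockAsync call when the service reports 409.
static async Task AppendWithRetryAsync(AppendBlobClient client, byte[] chunk, int maxAttempts = 3)
{
    for (int attempt = 1; ; attempt++)
    {
        try
        {
            using MemoryStream ms = new(chunk);
            await client.AppendBlockAsync(ms);
            return;
        }
        catch (Azure.RequestFailedException ex) when (ex.Status == 409 && attempt < maxAttempts)
        {
            // Presumably the portal / Storage Explorer touched the blob while the
            // block was being read; wait briefly and resend the same block.
            await Task.Delay(TimeSpan.FromSeconds(attempt));
        }
    }
}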

I tried to reproduce the scenario in my environment and am not facing the issue that you are facing. I was able to get the appended data after refreshing the portal.
OUTPUT (screenshots omitted): the blob contents before refreshing the portal and after refreshing it.

Related

.NET Core: Reading Azure Storage Blob into Memory Stream throws NotSupportedException in HttpBaseStream

I want to download a storage blob from Azure and stream it to a client via a .NET web app. The blob was uploaded correctly and is visible in my Azure storage account.
Surprisingly, the following throws an exception within HttpBaseStream:
[...]
var blobClient = _containerClient.GetBlobClient(Path.Combine(fileName));
var stream = await blobClient.OpenReadAsync();
return stream;
-> When I step further and return a File (return File(stream, MediaTypeNames.Application.Octet);), the download works as intended.
I tried to push the stream into a MemoryStream, which also fails with the same exception:
[...]
var blobClient = _containerClient.GetBlobClient(Path.Combine(fileName));
var stream = new MemoryStream();
await blobClient.DownloadToAsync(stream);
return stream;
-> When I step further, returning the file results in a timeout.
How can I fix that? Why do I get this exception? I followed the official quickstart guide from Microsoft.
the following throws an exception within HttpBaseStream
It looks like the HTTP result type is attempting to set the Content-Length header and is reading Length to do so. That would be the natural thing to do. However, it would also be natural to handle the NotSupportedException and just not set Content-Length at all.
If the NotSupportedException only shows up when running in the debugger, then just ignore it.
If the exception is actually thrown to your code (i.e., causing the request to fail), then you'll need to follow the rest of this answer.
First, create a minimal reproducible example and report a bug to the .NET team.
To work around this issue in the meantime, I recommend writing a stream wrapper that returns an already-determined length, which you can get from the Azure blob attributes. E.g.:
public sealed class KnownLengthStreamWrapper : Stream
{
    private readonly Stream _stream;
    private readonly long _length;

    public KnownLengthStreamWrapper(Stream stream, long length)
    {
        _stream = stream;
        _length = length;
    }

    // Stream.Length only has a getter, so expose the known length via a backing field.
    public override long Length => _length;

    ... // override all other Stream members and forward to _stream.
}
That should be sufficient to get your app working.
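A possible way to wire the wrapper up, getting the length from the blob's properties first (a sketch, assuming the Azure.Storage.Blobs v12 blobClient from the question):

// Sketch: fetch the blob's length, then wrap the read stream so Length is known.
var properties = await blobClient.GetPropertiesAsync();
var blobStream = await blobClient.OpenReadAsync();
return new KnownLengthStreamWrapper(blobStream, properties.Value.ContentLength);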
I tried to push the stream into a MemoryStream
This didn't work because you'd need to "rewind" the MemoryStream at some point, e.g.:
var stream = new MemoryStream();
await blobClient.DownloadToAsync(stream);
stream.Position = 0;
return stream;
Check this sample of all the blob download options, which I have already posted on GitHub and which works as expected. Reference:
public void DownloadBlob(string path)
{
    storageAccount = CloudStorageAccount.Parse(CloudConfigurationManager.GetSetting("StorageConnectionString"));
    CloudBlobClient client = storageAccount.CreateCloudBlobClient();
    CloudBlobContainer container = client.GetContainerReference("images");
    CloudBlockBlob blockBlob = container.GetBlockBlobReference(Path.GetFileName(path));

    using (MemoryStream ms = new MemoryStream())
    {
        blockBlob.DownloadToStream(ms);
        HttpContext.Current.Response.ContentType = blockBlob.Properties.ContentType.ToString();
        HttpContext.Current.Response.AddHeader("Content-Disposition", "Attachment; filename=" + Path.GetFileName(path).ToString());
        HttpContext.Current.Response.AddHeader("Content-Length", blockBlob.Properties.Length.ToString());
        HttpContext.Current.Response.BinaryWrite(ms.ToArray());
        HttpContext.Current.Response.Flush();
        HttpContext.Current.Response.Close();
    }
}
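Note that the sample above uses the legacy storage SDK and System.Web's HttpContext.Current, which isn't available in ASP.NET Core. A rough equivalent for the v12 client the question already uses might look like this (a sketch; the controller, the injected _containerClient and the file-name handling are assumptions):

// Sketch: stream a blob back to the client from an ASP.NET Core controller action.
public async Task<IActionResult> Download(string fileName)
{
    var blobClient = _containerClient.GetBlobClient(fileName);
    var properties = await blobClient.GetPropertiesAsync();
    var stream = await blobClient.OpenReadAsync();
    var contentType = string.IsNullOrEmpty(properties.Value.ContentType)
        ? MediaTypeNames.Application.Octet
        : properties.Value.ContentType;

    // File(...) streams the blob to the client without buffering it all in memory.
    return File(stream, contentType, fileName);
}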

Is it possible to use more than 1.5 Gb of memory with an Azure Function App V2

I'm currently using v2 of Azure Function Apps. I've set the environment to be 64-bit and am compiling to .NET Standard 2.0. host.json specifies version 2.
I'm reading in a .csv and it works fine for smaller files. But when I read a 180 MB .csv into a List of string[], it balloons to over a GB on read, and when I try to parse it, it's up over 2 GB before throwing the 'Out of Memory' exception. Even running on an App Service plan with more than 3.5 GB hasn't solved the issue.
Edit:
I'm using this:
Uri blobUri = AppendSasOnUri(blobName);
_webClient = new WebClient();
Stream sourceStream = _webClient.OpenRead(blobUri);
_reader = new StreamReader(sourceStream);
However, since it's a CSV, I'm splitting out entire columns of data. It's pretty hard to get away from this:
internal async Task<List<string[]>> ReadCsvAsync()
{
    while (!_reader.EndOfStream)
    {
        string[] currentCsvRow = await ReadCsvRowAsync();
        _fullBlobCsv.Add(currentCsvRow);
    }
    return _fullBlobCsv;
}
The goal is to store JSON into a blob when all's said and done.
Try using a stream (StreamReader) to read the input .csv file and process one line at a time.
I'm able to parse 300 MB files on the Consumption plan with streams. My use case may not be the same, but it is similar: parse a large concatenated PDF file, separate it into 5000+ smaller files, and store the separated files in a blob container. Below is my code for reference.
For your use case you may want to use CloudAppendBlob instead of CloudBlockBlob if you're pushing all parsed data into a single blob (see the sketch after the code below).
public async static void ExtractSmallerFiles(CloudBlockBlob myBlob, string fileDate, ILogger log)
{
    using (var reader = new StreamReader(await myBlob.OpenReadAsync()))
    {
        CloudBlockBlob blockBlob = null;
        var fileContents = new StringBuilder(string.Empty);
        while (!reader.EndOfStream)
        {
            var line = reader.ReadLine();
            if (line.StartsWith("%%MS_SKEY_0000_000_PDF:"))
            {
                var matches = Regex.Match(line, @"%%MS_SKEY_0000_000_PDF: A(\d+)_SMFL_B1234_D(\d{8})_A\d+_M(\d{15}) _N\d+");
                var smallFileDate = matches.Groups[2];
                var accountNumber = matches.Groups[3];
                var fileName = $"SmallerFiles/{smallFileDate}/{accountNumber}.pdf";
                blockBlob = myBlob.Container.GetBlockBlobReference(fileName);
            }
            fileContents.AppendLine(line);
            if (line.Equals("%%EOF"))
            {
                log.LogInformation($"Uploading {fileContents.Length} bytes to {blockBlob.Name}");
                await blockBlob.UploadTextAsync(fileContents.ToString());
                fileContents = new StringBuilder(string.Empty);
            }
        }
        await myBlob.DeleteAsync();
        log.LogInformation("Extracted Smaller files");
    }
}
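Applying the same streaming idea to the CSV scenario in the question (a sketch only, using the same legacy SDK as above; the blob arguments and the per-line transform are placeholders):

// Sketch: read the CSV one line at a time and append each processed line to an
// append blob, so the whole file never has to sit in memory at once.
public static async Task StreamCsvToAppendBlobAsync(CloudBlockBlob sourceCsv, CloudAppendBlob target)
{
    await target.CreateOrReplaceAsync();
    using (var reader = new StreamReader(await sourceCsv.OpenReadAsync()))
    {
        while (!reader.EndOfStream)
        {
            var line = await reader.ReadLineAsync();
            // Transform the row to JSON (or whatever shape is needed) here.
            // Note: appending per line means one service call per line; batching
            // several lines per AppendTextAsync call would be cheaper.
            await target.AppendTextAsync(line + Environment.NewLine);
        }
    }
}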

Azure Blob Storage Uploads Fine, but Blob Doesn't Exist

Here is my code:
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;
using System;
using Microsoft.WindowsAzure;
using System.Net.Http;

namespace Test
{
    class Program
    {
        static void Main(string[] args)
        {
            //get the storage account from the connection string
            CloudStorageAccount storageAccount = CloudStorageAccount.Parse("DefaultEndpointsProtocol=https;AccountName=[account name];AccountKey=[account key];EndpointSuffix=core.windows.net");

            //instantiate the client
            CloudBlobClient blobClient = storageAccount.CreateCloudBlobClient();

            //set the container
            CloudBlobContainer container = blobClient.GetContainerReference("images");

            //get the blob reference
            CloudBlockBlob blockBlob = container.GetBlockBlobReference("myblob.jpg");

            //get image from stream and upload
            using (var client = new HttpClient())
            {
                using (var stream = client.GetStreamAsync(some_url).GetAwaiter().GetResult())
                {
                    if (stream != null)
                    {
                        blockBlob.UploadFromStreamAsync(stream);
                    }
                }
                client.Dispose();
            }
        }
    }
}
The storage account instantiation works fine.
The container referencing works fine (it actually exists).
The block blob referencing works, as well, with no errors.
The stream has the image I am getting from the URL referenced.
Finally, the upload returns no errors.
Except, there is no image when I navigate to the Blob URI.
I get the following error:
The specified blob does not exist. RequestId:7df0aadc-0001-007c-6b90-f95158000000 Time:2017-07-10T15:21:25.2984015Z
I have also uploaded an image via the Azure Portal and that exists and can be navigated to through a browser.
Am I missing something?
Update the below line in your code, as you're calling an async method without waiting for it to complete:
blockBlob.UploadFromStreamAsync(stream).GetAwaiter().GetResult();
This should resolve your problem.
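Alternatively (a sketch, assuming C# 7.1 or later so Main can be async), the call can be awaited directly:

// Sketch: an async entry point avoids blocking with GetAwaiter().GetResult().
static async Task Main(string[] args)
{
    // ... same storage account / container / blockBlob setup as above ...
    using (var client = new HttpClient())
    using (var stream = await client.GetStreamAsync(some_url))
    {
        await blockBlob.UploadFromStreamAsync(stream);
    }
}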

Uploading files to Azure blob storage taking more time for larger files

Hi All...
I am trying to upload larger files (more than 100 MB in size) to Azure Blob Storage. Below is the code.
My problem is that even though I have used BeginPutBlock with TPL (Task Parallelism), it is taking a long time (20 minutes to upload 100 MB), but I have to upload files of more than 2 GB in size. Can anyone please help me with this?
namespace BlobSamples
{
    public class UploadAsync
    {
        static void Main(string[] args)
        {
            //string filePath = @"D:\Frameworks\DNCMag-Issue26-DoubleSpread.pdf";
            string filePath = @"E:\E Books\NageswaraRao Meterial\ebooks\applied_asp.net_4_in_context.pdf";
            string accountName = "{account name}";
            string accountKey = "{account key}";
            string containerName = "sampleContainer";
            string blobName = Path.GetFileName(filePath);

            //byte[] fileContent = File.ReadAllBytes(filePath);
            Stream fileContent = System.IO.File.OpenRead(filePath);

            StorageCredentials creds = new StorageCredentials(accountName, accountKey);
            CloudStorageAccount storageAccount = new CloudStorageAccount(creds, useHttps: true);
            CloudBlobClient blobclient = storageAccount.CreateCloudBlobClient();
            CloudBlobContainer container = blobclient.GetContainerReference(containerName);
            CloudBlockBlob blob = container.GetBlockBlobReference(blobName);

            // Define your retry strategy: retry 5 times, starting 1 second apart
            // and adding 2 seconds to the interval each retry.
            var retryStrategy = new Incremental(5, TimeSpan.FromSeconds(1), TimeSpan.FromSeconds(2));

            // Define your retry policy using the retry strategy and the Azure storage
            // transient fault detection strategy.
            var retryPolicy = new RetryPolicy<StorageTransientErrorDetectionStrategy>(retryStrategy);

            // Receive notifications about retries.
            retryPolicy.Retrying += (sender, arg) =>
            {
                // Log details of the retry.
                var msg = String.Format("Retry - Count:{0}, Delay:{1}, Exception:{2}",
                    arg.CurrentRetryCount, arg.Delay, arg.LastException);
            };

            Console.WriteLine("Upload Started" + DateTime.Now);
            ChunkedUploadStreamAsync(blob, fileContent, (1024 * 1024), retryPolicy);
            Console.WriteLine("Upload Ended" + DateTime.Now);
            Console.ReadLine();
        }

        private static Task PutBlockAsync(CloudBlockBlob blob, string id, Stream stream, RetryPolicy policy)
        {
            Func<Task> uploadTaskFunc = () => Task.Factory
                .FromAsync(
                    (asyncCallback, state) => blob.BeginPutBlock(id, stream, null, null, null, null, asyncCallback, state),
                    blob.EndPutBlock,
                    null
                );
            Console.WriteLine("Uploaded " + id + DateTime.Now);
            return policy.ExecuteAsync(uploadTaskFunc);
        }

        public static Task ChunkedUploadStreamAsync(CloudBlockBlob blob, Stream source, int chunkSize, RetryPolicy policy)
        {
            var blockids = new List<string>();
            var blockid = 0;
            int count;

            // first create a list of TPL Tasks for uploading blocks asynchronously
            var tasks = new List<Task>();
            var bytes = new byte[chunkSize];
            while ((count = source.Read(bytes, 0, bytes.Length)) != 0)
            {
                var id = Convert.ToBase64String(BitConverter.GetBytes(++blockid));
                blockids.Add(id);
                tasks.Add(PutBlockAsync(blob, id, new MemoryStream(bytes, true), policy));
                bytes = new byte[chunkSize]; //need a new buffer to avoid overriding previous one
            }

            return Task.Factory.ContinueWhenAll(
                tasks.ToArray(),
                array =>
                {
                    // propagate exceptions and make all faulted Tasks as observed
                    Task.WaitAll(array);
                    policy.ExecuteAction(() => blob.PutBlockListAsync(blockids));
                    Console.WriteLine("Uploaded Completed " + DateTime.Now);
                });
        }
    }
}
If you can accept a command-line tool, you can try AzCopy, which is able to transfer Azure Storage data with high performance, and its transfers can be resumed.
If you want to control the transfer jobs programmatically, please use the Azure Storage Data Movement Library, which is the core of AzCopy.
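For reference, a minimal Data Movement Library upload might look like this (a sketch; the namespaces come from the Microsoft.Azure.Storage.DataMovement package, and the connection string, container, blob and file names are placeholders):

// Sketch: let the Data Movement Library chunk and parallelise the upload.
CloudStorageAccount account = CloudStorageAccount.Parse("{connection string}");
CloudBlobClient client = account.CreateCloudBlobClient();
CloudBlobContainer container = client.GetContainerReference("samplecontainer");
CloudBlockBlob destBlob = container.GetBlockBlobReference("largefile.pdf");

// Tune the number of concurrent operations if needed.
TransferManager.Configurations.ParallelOperations = 8;
await TransferManager.UploadAsync(@"E:\path\to\largefile.pdf", destBlob);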
As far as I know, block blobs are made up of blocks, and a block can be up to 4 MB in size. According to your code, you set the block size to 1 MB and programmatically uploaded each block in parallel. For a simpler way, you could leverage the property ParallelOperationThreadCount to upload blob blocks in parallel as follows:
//set the number of blocks that may be simultaneously uploaded
var requestOption = new BlobRequestOptions()
{
    ParallelOperationThreadCount = 5,
    //Gets or sets the maximum size of a blob in bytes that may be uploaded as a single blob
    SingleBlobUploadThresholdInBytes = 10 * 1024 * 1024 //maximum 64 MB, 32 MB by default
};

//upload a file to blob
blob.UploadFromFile("{filepath}", options: requestOption);
With this option, when your blob (file) is larger than the value in SingleBlobUploadThresholdInBytes, the storage client automatically breaks the file into blocks (4 MB in size) and uploads the blocks simultaneously.
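The same options object can also be passed to the async overload in newer versions of the storage client (a sketch; the two nulls are the AccessCondition and OperationContext parameters of that overload):

// Sketch: the async overload accepts the same BlobRequestOptions.
await blob.UploadFromFileAsync("{filepath}", null, requestOption, null);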
Based on your requirement, I created an ASP.NET Web API application which exposes an API to upload a file to Azure Blob Storage.
Project URL: AspDotNet-WebApi-AzureBlobFileUploadSample
Note:
In order to upload a large file, you need to increase maxRequestLength and maxAllowedContentLength in your web.config as follows:
<system.web>
  <httpRuntime maxRequestLength="2097152"/> <!--KB in size, 4MB by default, increase it to 2GB-->
</system.web>
<system.webServer>
  <security>
    <requestFiltering>
      <requestLimits maxAllowedContentLength="2147483648" /> <!--Byte in size, increase it to 2GB-->
    </requestFiltering>
  </security>
</system.webServer>
I'd suggest you use AzCopy when uploading large files; it saves a lot of time compared with coding it yourself and is more efficient. To upload a single file, run the command below:
AzCopy /Source:C:\folder /Dest:https://youraccount.blob.core.windows.net/container /DestKey:key /Pattern:"test.txt"

Getting an error when uploading a file to Azure Storage

I'm converting a website from a standard ASP.NET website over to use Azure. The website had previously taken an Excel file uploaded by an administrative user and saved it on the file system. As part of the migration, I'm saving this file to Azure Storage. It works fine when running against my local storage through the Azure SDK. (I'm using version 1.3 since I didn't want to upgrade during the development process.)
When I point the code to run against Azure Storage itself, though, the process usually fails. The error I get is:
System.IO.IOException occurred
Message=Unable to read data from the transport connection: The connection was closed.
Source=Microsoft.WindowsAzure.StorageClient
StackTrace:
at Microsoft.WindowsAzure.StorageClient.Tasks.Task`1.get_Result()
at Microsoft.WindowsAzure.StorageClient.Tasks.Task`1.ExecuteAndWait()
at Microsoft.WindowsAzure.StorageClient.CloudBlob.UploadFromStream(Stream source, BlobRequestOptions options)
at Framework.Common.AzureBlobInteraction.UploadToBlob(Stream stream, String BlobContainerName, String fileName, String contentType) in C:\Development\RateSolution2010\Framework.Common\AzureBlobInteraction.cs:line 95
InnerException:
The code is as follows:
public void UploadToBlob(Stream stream, string BlobContainerName, string fileName,
    string contentType)
{
    // Setup the connection to Windows Azure Storage
    CloudStorageAccount storageAccount = CloudStorageAccount.Parse(GetConnStr());

    DiagnosticMonitorConfiguration dmc = DiagnosticMonitor.GetDefaultInitialConfiguration();
    dmc.Logs.ScheduledTransferPeriod = TimeSpan.FromMinutes(1);
    dmc.Logs.ScheduledTransferLogLevelFilter = LogLevel.Verbose;
    DiagnosticMonitor.Start(storageAccount, dmc);

    CloudBlobClient BlobClient = null;
    CloudBlobContainer BlobContainer = null;
    BlobClient = storageAccount.CreateCloudBlobClient();

    // For large file copies you need to set up a custom timeout period
    // and using parallel settings appears to spread the copy across multiple threads
    // if you have big bandwidth you can increase the thread number below
    // because Azure accepts blobs broken into blocks in any order of arrival.
    BlobClient.Timeout = new System.TimeSpan(1, 0, 0);
    Role serviceRole = RoleEnvironment.Roles.Where(s => s.Value.Name == "OnlineRates.Web").First().Value;
    BlobClient.ParallelOperationThreadCount = serviceRole.Instances.Count;

    // Get and create the container
    BlobContainer = BlobClient.GetContainerReference(BlobContainerName);
    BlobContainer.CreateIfNotExist();

    //delete prior version if one exists
    BlobRequestOptions options = new BlobRequestOptions();
    options.DeleteSnapshotsOption = DeleteSnapshotsOption.None;
    CloudBlob blobToDelete = BlobContainer.GetBlobReference(fileName);
    Trace.WriteLine("Blob " + fileName + " deleted to be replaced by newer version.");
    blobToDelete.DeleteIfExists(options);

    //set stream to starting position
    stream.Position = 0;
    long totalBytes = 0;

    //Open the stream and read it back.
    using (stream)
    {
        // Create the Blob and upload the file
        CloudBlockBlob blob = BlobContainer.GetBlockBlobReference(fileName);
        try
        {
            BlobClient.ResponseReceived += new EventHandler<ResponseReceivedEventArgs>((obj, responseReceivedEventArgs) =>
            {
                if (responseReceivedEventArgs.RequestUri.ToString().Contains("comp=block&blockid"))
                {
                    totalBytes += Int64.Parse(responseReceivedEventArgs.RequestHeaders["Content-Length"]);
                }
            });
            blob.UploadFromStream(stream);

            // Set the metadata into the blob
            blob.Metadata["FileName"] = fileName;
            blob.SetMetadata();

            // Set the properties
            blob.Properties.ContentType = contentType;
            blob.SetProperties();
        }
        catch (Exception exc)
        {
            Logging.ExceptionLogger.LogEx(exc);
        }
    }
}
I've tried a number of different alterations to the code: deleting a blob before replacing it (although the problem exists on new blobs as well), setting container permissions, not setting permissions, etc.
Your code looks like it should work, but it has lots of extra functionality that is not strictly required. I would cut it down to an absolute minimum and go from there. It's really only a gut feeling, but I think it might be the using statement giving you grief. This entire function could be written (presuming the container already exists) as:
public void UploadToBlob(Stream stream, string BlobContainerName, string fileName,
    string contentType)
{
    // Setup the connection to Windows Azure Storage
    CloudStorageAccount storageAccount = CloudStorageAccount.Parse(GetConnStr());
    CloudBlobClient BlobClient = storageAccount.CreateCloudBlobClient();
    CloudBlobContainer BlobContainer = BlobClient.GetContainerReference(BlobContainerName);
    CloudBlockBlob blob = BlobContainer.GetBlockBlobReference(fileName);
    stream.Position = 0;
    blob.UploadFromStream(stream);
}
Notes on the stuff that I've removed:
You should set up diagnostics just once when your app starts, not every time a method is called, usually in RoleEntryPoint.OnStart().
I'm not sure why you're trying to set ParallelOperationThreadCount higher if you have more instances. Those two things seem unrelated.
It's not good form to check for the existence of a container/table every time you save something to it. It's more usual to do that check once when your app starts or to have a process external to the website to make sure all the required containers/tables/queues exist. Of course if you're trying to dynamically create containers this is not true.
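For example, a one-time container check at application start could look like this (a sketch only; where it runs (Global.asax, RoleEntryPoint.OnStart(), etc.) and the container name are assumptions, and GetConnStr() is the question's own helper):

// Sketch: ensure required containers exist once at startup, not per upload.
void EnsureContainersExist()
{
    CloudStorageAccount storageAccount = CloudStorageAccount.Parse(GetConnStr());
    CloudBlobClient blobClient = storageAccount.CreateCloudBlobClient();
    blobClient.GetContainerReference("uploads").CreateIfNotExist();
}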
The problem turned out to be firewall settings on my laptop. It's my personal laptop originally set up at home, so the firewall rules weren't set up for a corporate environment, resulting in slow performance on uploads and downloads.
