Is it possible to use more than 1.5 GB of memory with an Azure Function App V2?

I'm currently using v2 of Azure Function Apps. I've set the environment to 64-bit and am compiling against .NET Standard 2.0. host.json specifies version 2.
I'm reading in a .csv and it works fine for smaller files. But when I read a 180 MB .csv into a List of string[], it balloons to over a GB on read, and when I try to parse it, it climbs past 2 GB and then throws an OutOfMemoryException. Even running on an App Service plan with more than 3.5 GB of memory hasn't solved the issue.
Edit:
I'm using this:
Uri blobUri = AppendSasOnUri(blobName);
_webClient = new WebClient();
Stream sourceStream = _webClient.OpenRead(blobUri);
_reader = new StreamReader(sourceStream);
However, since it's a CSV, I'm splitting out entire columns of data, so it's pretty hard to get away from this:
internal async Task<List<string[]>> ReadCsvAsync()
{
    while (!_reader.EndOfStream)
    {
        string[] currentCsvRow = await ReadCsvRowAsync();
        _fullBlobCsv.Add(currentCsvRow);
    }

    return _fullBlobCsv;
}
The goal is to store JSON into a blob when all is said and done.

Try using a stream (StreamReader) to read the input .csv file and process one line at a time.
I'm able to parse 300 MB files on the Consumption plan with streams. My use case may not be the same, but it's similar: parse a large concatenated PDF file, separate it into 5000+ smaller files, and store the separated files in a blob container. Below is my code for reference.
For your use case, you may want to use CloudAppendBlob instead of CloudBlockBlob if you're pushing all parsed data into a single blob; a sketch of that variant follows the code below.
public async static void ExtractSmallerFiles(CloudBlockBlob myBlob, string fileDate, ILogger log)
{
    using (var reader = new StreamReader(await myBlob.OpenReadAsync()))
    {
        CloudBlockBlob blockBlob = null;
        var fileContents = new StringBuilder(string.Empty);
        while (!reader.EndOfStream)
        {
            var line = reader.ReadLine();
            if (line.StartsWith("%%MS_SKEY_0000_000_PDF:"))
            {
                var matches = Regex.Match(line, @"%%MS_SKEY_0000_000_PDF: A(\d+)_SMFL_B1234_D(\d{8})_A\d+_M(\d{15}) _N\d+");
                var smallFileDate = matches.Groups[2];
                var accountNumber = matches.Groups[3];
                var fileName = $"SmallerFiles/{smallFileDate}/{accountNumber}.pdf";
                blockBlob = myBlob.Container.GetBlockBlobReference(fileName);
            }
            fileContents.AppendLine(line);
            if (line.Equals("%%EOF"))
            {
                log.LogInformation($"Uploading {fileContents.Length} bytes to {blockBlob.Name}");
                await blockBlob.UploadTextAsync(fileContents.ToString());
                fileContents = new StringBuilder(string.Empty);
            }
        }
        await myBlob.DeleteAsync();
        log.LogInformation("Extracted Smaller files");
    }
}
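For the CSV-to-JSON goal in the question, a minimal line-at-a-time sketch along the same lines (same WindowsAzure.Storage SDK as above; ParseRow and the blob names are placeholders, not real helpers) could look like this, so the full file never has to sit in memory as a List<string[]>:
internal static async Task ConvertCsvToJsonAsync(CloudBlobContainer container, string csvBlobName, string jsonBlobName)
{
    CloudBlockBlob csvBlob = container.GetBlockBlobReference(csvBlobName);
    CloudAppendBlob jsonBlob = container.GetAppendBlobReference(jsonBlobName);
    await jsonBlob.CreateOrReplaceAsync();

    using (var reader = new StreamReader(await csvBlob.OpenReadAsync()))
    {
        while (!reader.EndOfStream)
        {
            string line = await reader.ReadLineAsync();
            string[] columns = line.Split(',');      // naive split; use a CSV parser for quoted fields
            string json = ParseRow(columns);         // placeholder: build the JSON for one row
            await jsonBlob.AppendTextAsync(json + Environment.NewLine);
        }
    }
}
Appending one row at a time keeps memory flat but costs a round trip per row; buffering a batch of rows in a StringBuilder before each AppendTextAsync call is a reasonable middle ground.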

Related

.NET Core: Reading Azure Storage Blob into Memory Stream throws NotSupportedException in HttpBaseStream

I want to download a storage blob from Azure and stream it to a client via a .NET web app. The blob was uploaded correctly and is visible in my Azure storage account.
Surprisingly, the following throws an exception within HttpBaseStream:
[...]
var blobClient = _containerClient.GetBlobClient(Path.Combine(fileName));
var stream = await blobClient.OpenReadAsync();
return stream;
-> When I step further and return a File (return File(stream, MediaTypeNames.Application.Octet);), the download works as intended.
I tried to push the stream into a MemoryStream, which also fails with the same exception:
[...]
var blobClient = _containerClient.GetBlobClient(Path.Combine(fileName));
var stream = new MemoryStream();
await blobClient.DownloadToAsync(stream);
return stream
-> When I step further, returning the file results in a timeout.
How can I fix that? Why do I get this exception? I followed the official quickstart guide from Microsoft.
the following throws an exception within HttpBaseStream
It looks like the HTTP result type is attempting to set the Content-Length header and is reading Length to do so. That would be the natural thing to do. However, it would also be natural to handle the NotSupportedException and just not set Content-Length at all.
If the NotSupportedException only shows up when running in the debugger, then just ignore it.
If the exception is actually thrown to your code (i.e., causing the request to fail), then you'll need to follow the rest of this answer.
First, create a minimal reproducible example and report a bug to the .NET team.
To work around this issue in the meantime, I recommend writing a stream wrapper that returns an already-determined length, which you can get from the Azure blob attributes. E.g.:
public sealed class KnownLengthStreamWrapper : Stream
{
    private readonly Stream _stream;

    public KnownLengthStreamWrapper(Stream stream, long length)
    {
        _stream = stream;
        Length = length;
    }

    public override long Length { get; }

    ... // override all other Stream members and forward to _stream.
}
That should be sufficient to get your app working.
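As a sketch of how the wrapper might be wired up (assuming the Azure.Storage.Blobs _containerClient and fileName from the question; the surrounding controller plumbing is omitted), the known length can come from the blob's properties:
// Sketch only: _containerClient and fileName are from the question's code.
var blobClient = _containerClient.GetBlobClient(fileName);
var properties = await blobClient.GetPropertiesAsync();   // blob properties include ContentLength
var rawStream = await blobClient.OpenReadAsync();
return new KnownLengthStreamWrapper(rawStream, properties.Value.ContentLength);
Getting the length from GetPropertiesAsync is one extra round trip; if the blob was just listed or uploaded, the length may already be at hand.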
I tried to push the stream into a MemoryStream
This didn't work because you'd need to "rewind" the MemoryStream at some point, e.g.:
var stream = new MemoryStream();
await blobClient.DownloadToAsync(stream);
stream.Position = 0;
return stream;
Check this sample of all the blob options, which I have already posted on GitHub and which works as expected. Reference
public void DownloadBlob(string path)
{
    CloudStorageAccount storageAccount = CloudStorageAccount.Parse(CloudConfigurationManager.GetSetting("StorageConnectionString"));
    CloudBlobClient client = storageAccount.CreateCloudBlobClient();
    CloudBlobContainer container = client.GetContainerReference("images");
    CloudBlockBlob blockBlob = container.GetBlockBlobReference(Path.GetFileName(path));
    using (MemoryStream ms = new MemoryStream())
    {
        blockBlob.DownloadToStream(ms);
        HttpContext.Current.Response.ContentType = blockBlob.Properties.ContentType.ToString();
        HttpContext.Current.Response.AddHeader("Content-Disposition", "Attachment; filename=" + Path.GetFileName(path).ToString());
        HttpContext.Current.Response.AddHeader("Content-Length", blockBlob.Properties.Length.ToString());
        HttpContext.Current.Response.BinaryWrite(ms.ToArray());
        HttpContext.Current.Response.Flush();
        HttpContext.Current.Response.Close();
    }
}

Uploading files to Azure blob storage taking more time for larger files

Hi All...
I am trying to upload larger files (more than 100 MB in size) to Azure Blob Storage. Below is the code.
My problem is that even though I have used BeginPutBlock with TPL (Task Parallelism), it is taking a long time (20 minutes for a 100 MB upload). But I have to upload files of more than 2 GB in size. Can anyone please help me with this?
namespace BlobSamples {
public class UploadAsync
{
static void Main(string[] args)
{
//string filePath = @"D:\Frameworks\DNCMag-Issue26-DoubleSpread.pdf";
string filePath = @"E:\E Books\NageswaraRao Meterial\ebooks\applied_asp.net_4_in_context.pdf";
string accountName = "{account name}";
string accountKey = "{account key}";
string containerName = "sampleContainer";
string blobName = Path.GetFileName(filePath);
//byte[] fileContent = File.ReadAllBytes(filePath);
Stream fileContent = System.IO.File.OpenRead(filePath);
StorageCredentials creds = new StorageCredentials(accountName, accountKey);
CloudStorageAccount storageAccount = new CloudStorageAccount(creds, useHttps: true);
CloudBlobClient blobclient = storageAccount.CreateCloudBlobClient();
CloudBlobContainer container = blobclient.GetContainerReference(containerName);
CloudBlockBlob blob = container.GetBlockBlobReference(blobName);
// Define your retry strategy: retry 5 times, starting 1 second apart
// and adding 2 seconds to the interval each retry.
var retryStrategy = new Incremental(5, TimeSpan.FromSeconds(1),
TimeSpan.FromSeconds(2));
// Define your retry policy using the retry strategy and the Azure storage
// transient fault detection strategy.
var retryPolicy =
new RetryPolicy<StorageTransientErrorDetectionStrategy>(retryStrategy);
// Receive notifications about retries.
retryPolicy.Retrying += (sender, arg) =>
{
// Log details of the retry.
var msg = String.Format("Retry - Count:{0}, Delay:{1}, Exception:{2}",
arg.CurrentRetryCount, arg.Delay, arg.LastException);
};
Console.WriteLine("Upload Started" + DateTime.Now);
ChunkedUploadStreamAsync(blob, fileContent, (1024*1024), retryPolicy);
Console.WriteLine("Upload Ended" + DateTime.Now);
Console.ReadLine();
}
private static Task PutBlockAsync(CloudBlockBlob blob, string id, Stream stream, RetryPolicy policy)
{
Func<Task> uploadTaskFunc = () => Task.Factory
.FromAsync(
(asyncCallback, state) => blob.BeginPutBlock(id, stream, null, null, null, null, asyncCallback, state)
, blob.EndPutBlock
, null
);
Console.WriteLine("Uploaded " + id + DateTime.Now);
return policy.ExecuteAsync(uploadTaskFunc);
}
public static Task ChunkedUploadStreamAsync(CloudBlockBlob blob, Stream source, int chunkSize, RetryPolicy policy)
{
var blockids = new List<string>();
var blockid = 0;
int count;
// first create a list of TPL Tasks for uploading blocks asynchronously
var tasks = new List<Task>();
var bytes = new byte[chunkSize];
while ((count = source.Read(bytes, 0, bytes.Length)) != 0)
{
var id = Convert.ToBase64String(BitConverter.GetBytes(++blockid));
blockids.Add(id);
tasks.Add(PutBlockAsync(blob, id, new MemoryStream(bytes, true), policy));
bytes = new byte[chunkSize]; //need a new buffer to avoid overriding previous one
}
return Task.Factory.ContinueWhenAll(
tasks.ToArray(),
array =>
{
// propagate exceptions and make all faulted Tasks as observed
Task.WaitAll(array);
policy.ExecuteAction(() => blob.PutBlockListAsync(blockids));
Console.WriteLine("Uploaded Completed " + DateTime.Now);
});
}
} }
If you can accept a command-line tool, you can try AzCopy, which transfers Azure Storage data with high performance, and its transfers can be resumed.
If you want to control the transfer jobs programmatically, please use the Azure Storage Data Movement Library, which is the core of AzCopy.
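As a rough sketch of what the Data Movement Library looks like in code (the package has shipped under both the Microsoft.WindowsAzure.Storage.DataMovement and Microsoft.Azure.Storage.DataMovement names over time; the connection string, container, and file path below are placeholders):
using System;
using System.Threading.Tasks;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;
using Microsoft.WindowsAzure.Storage.DataMovement;

class DataMovementUpload
{
    static async Task Main()
    {
        CloudStorageAccount account = CloudStorageAccount.Parse("<connection string>");
        CloudBlockBlob blob = account.CreateCloudBlobClient()
                                     .GetContainerReference("sampleContainer")
                                     .GetBlockBlobReference("largefile.pdf");

        // Raise the number of parallel block uploads; tune this for your bandwidth.
        TransferManager.Configurations.ParallelOperations = 16;

        await TransferManager.UploadAsync(@"E:\largefile.pdf", blob);
        Console.WriteLine("Upload completed " + DateTime.Now);
    }
}
TransferManager handles the block splitting and parallel uploads internally, which is what the question's code implements by hand.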
As far as I know, block blobs are made up of blocks. A block can be up to 4 MB in size. According to your code, you set the block size to 1 MB and uploaded each block programmatically in parallel. For a simpler approach, you could leverage the ParallelOperationThreadCount property to upload blob blocks in parallel, as follows:
// Set the number of blocks that may be uploaded simultaneously.
var requestOption = new BlobRequestOptions()
{
    ParallelOperationThreadCount = 5,
    // Maximum size of a blob in bytes that may be uploaded as a single blob
    // (32 MB by default, 64 MB maximum).
    SingleBlobUploadThresholdInBytes = 10 * 1024 * 1024
};

// Upload a file to the blob.
blob.UploadFromFile("{filepath}", options: requestOption);
With this option, when your blob (file) is larger than the value in SingleBlobUploadThresholdInBytes, the storage client breaks the file into blocks (4 MB in size) automatically and uploads the blocks simultaneously.
Based on your requirement, I created an ASP.NET Web API application which exposes an API to upload files to Azure Blob Storage.
Project URL: AspDotNet-WebApi-AzureBlobFileUploadSample
Note:
In order to upload large files, you need to increase maxRequestLength and maxAllowedContentLength in your web.config as follows:
<system.web>
  <httpRuntime maxRequestLength="2097152"/> <!-- KB in size; 4 MB by default, increased here to 2 GB -->
</system.web>
<system.webServer>
  <security>
    <requestFiltering>
      <requestLimits maxAllowedContentLength="2147483648" /> <!-- bytes in size; increased here to 2 GB -->
    </requestFiltering>
  </security>
</system.webServer>
I'd suggest you use AzCopy when uploading large files; it saves a lot of coding time and is more efficient. To upload a single file, run the command below:
AzCopy /Source:C:\folder /Dest:https://youraccount.blob.core.windows.net/container /DestKey:key /Pattern:"test.txt"

Updating many json files in Azure blob storage?

I have about 1,000,000 JSON files that I would like to update every 30 minutes. The update simply appends a new array to the end of the existing content.
A single update uses code similar to:
CloudBlockBlob blockBlob = container.GetBlockBlobReference(blobName);
JObject jObject = null;

// If the blob exists, then we may need to update it.
if (blockBlob.Exists())
{
    MemoryStream memoryStream = new MemoryStream();
    blockBlob.DownloadToStream(memoryStream);
    jObject = JsonConvert.DeserializeObject(Encoding.UTF8.GetString(memoryStream.ToArray())) as JObject;
} // End of the blob exists

if (null == jObject)
{
    jObject = new JObject();
    jObject.Add(new JProperty("identifier", identifier));
} // End of the blob did not exist

JArray jsonArray = new JArray();
jObject.Add(new JProperty(string.Format("entries{0}", timestamp.ToString()), jsonArray));

foreach (var entry in newEntries)
{
    jsonArray.Add(new JObject(
        new JProperty("someId", entry.id),
        new JProperty("someValue", value)
    ));
} // End of loop

string jsonString = JsonConvert.SerializeObject(jObject);

// Upload
blockBlob.Properties.ContentType = "text/json";
blockBlob.UploadFromStream(new MemoryStream(Encoding.UTF8.GetBytes(jsonString)));
Basically:
Check if the blob exists,
If it does, download the data and create a json object from the existing details.
If it did not, then create a new object with the details.
Push the update to the blob.
The problem with this is performance. I've done quite a few things to increase performance (the updates run in five parallel threads, and I have set ServicePointManager.UseNagleAlgorithm to false).
It still runs slowly, though; roughly ~100,000 updates can take up to an hour.
So I guess basically, my questions would be:
Should I be using Azure Blob storage for this? (I'm open to alternative suggestions).
If so, any suggestions on improving performance?
Note: The file basically contains a history of events and I cannot re-generate the entire file based on existing data. This is why the contents are downloaded before being updated.

NPOI writing XLS file converting to Azure Blob

I'm trying to convert a current application that uses NPOI for creating an xls document on the server into an Azure-hosted application. I have little experience with NPOI and Azure, so two strikes right there. I have the app uploading the xls to a Blob container, however it is always blank (9 bytes). From what I understand, NPOI uses a FileStream to write to the file, so I just changed that to write to the blob container.
Here is what I think are the relevant portions:
internal void GenerateExcel(DataSet ds, int QuoteID, string ReportFileName)
{
string ExcelFileName = string.Format("{0}_{1}.xls",ReportFileName,QuoteID);
try
{
//these 2 strings will get deleted but left here for now to run side by side at the moment
string ReportDirectoryPath = HttpContext.Current.Server.MapPath(".") + "\\Reports";
if (!Directory.Exists(ReportDirectoryPath))
{
Directory.CreateDirectory(ReportDirectoryPath);
}
string ExcelReportFullPath = ReportDirectoryPath + "\\" + ExcelFileName;
if (File.Exists(ExcelReportFullPath))
{
File.Delete(ExcelReportFullPath);
}
// Create a new workbook.
var workbook = new HSSFWorkbook();
//Rest of the NPOI XLS rows cells etc. etc. all works fine when writing to disk////////////////
// Retrieve storage account from connection string.
CloudStorageAccount storageAccount = CloudStorageAccount.Parse(CloudConfigurationManager.GetSetting("StorageConnectionString"));
// Create the blob client.
CloudBlobClient blobClient = storageAccount.CreateCloudBlobClient();
// Retrieve a reference to a container.
CloudBlobContainer container = blobClient.GetContainerReference("pricingappreports");
// Create the container if it doesn't already exist.
if (container.CreateIfNotExists())
{
container.SetPermissions(new BlobContainerPermissions { PublicAccess = BlobContainerPublicAccessType.Blob });
}
// Retrieve reference to a blob with the same name.
CloudBlockBlob blockBlob = container.GetBlockBlobReference(ExcelFileName);
// Write the output to a file on the server
String file = ExcelReportFullPath;
using (FileStream fs = new FileStream(file, FileMode.Create))
{
workbook.Write(fs);
fs.Close();
}
// Write the output to a file on Azure Storage
String Blobfile = ExcelFileName;
using (FileStream fs = new FileStream(Blobfile, FileMode.Create))
{
workbook.Write(fs);
blockBlob.UploadFromStream(fs);
fs.Close();
}
}
I'm uploading to the blob and the file exists, so why doesn't the data get written to the xls?
Any help would be appreciated.
Update: I think I found the problem. It doesn't look like you can write to a file in Blob Storage directly. I found this blog, which pretty much answers my question: it doesn't use NPOI, but the concept is the same. http://debugmode.net/2011/08/28/creating-and-updating-excel-file-in-windows-azure-web-role-using-open-xml-sdk/
Thanks
Can you install Fiddler and check the request and response packets? You may also need to seek back to 0 between the two writes, so the correct fix here could be to add the seek below before trying to write the stream to the blob:
workbook.Write(fs);
fs.Seek(0, SeekOrigin.Begin);
blockBlob.UploadFromStream(fs);
fs.Close();
I also noticed that you are using String Blobfile = ExcelFileName instead of String Blobfile = ExcelReportFullPath.
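A fuller hedged sketch of the upload path writes the workbook into a MemoryStream and uploads the bytes, so no temporary file or stream-position handling is needed (blockBlob and workbook are the objects from the question's code):
// Sketch only: blockBlob and workbook come from the question's code above.
using (var ms = new MemoryStream())
{
    workbook.Write(ms);                      // NPOI writes the workbook into memory
    byte[] bytes = ms.ToArray();             // ToArray() is safe even if NPOI closes the stream
    blockBlob.UploadFromByteArray(bytes, 0, bytes.Length);
}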

Getting an error when uploading a file to Azure Storage

I'm converting a website from a standard ASP.NET website over to use Azure. The website had previously taken an Excel file uploaded by an administrative user and saved it on the file system. As part of the migration, I'm saving this file to Azure Storage. It works fine when running against my local storage through the Azure SDK. (I'm using version 1.3 since I didn't want to upgrade during the development process.)
When I point the code to run against Azure Storage itself, though, the process usually fails. The error I get is:
System.IO.IOException occurred
Message=Unable to read data from the transport connection: The connection was closed.
Source=Microsoft.WindowsAzure.StorageClient
StackTrace:
at Microsoft.WindowsAzure.StorageClient.Tasks.Task`1.get_Result()
at Microsoft.WindowsAzure.StorageClient.Tasks.Task`1.ExecuteAndWait()
at Microsoft.WindowsAzure.StorageClient.CloudBlob.UploadFromStream(Stream source, BlobRequestOptions options)
at Framework.Common.AzureBlobInteraction.UploadToBlob(Stream stream, String BlobContainerName, String fileName, String contentType) in C:\Development\RateSolution2010\Framework.Common\AzureBlobInteraction.cs:line 95
InnerException:
The code is as follows:
public void UploadToBlob(Stream stream, string BlobContainerName, string fileName,
string contentType)
{
// Setup the connection to Windows Azure Storage
CloudStorageAccount storageAccount = CloudStorageAccount.Parse(GetConnStr());
DiagnosticMonitorConfiguration dmc = DiagnosticMonitor.GetDefaultInitialConfiguration();
dmc.Logs.ScheduledTransferPeriod = TimeSpan.FromMinutes(1);
dmc.Logs.ScheduledTransferLogLevelFilter = LogLevel.Verbose;
DiagnosticMonitor.Start(storageAccount, dmc);
CloudBlobClient BlobClient = null;
CloudBlobContainer BlobContainer = null;
BlobClient = storageAccount.CreateCloudBlobClient();
// For large file copies you need to set up a custom timeout period
// and using parallel settings appears to spread the copy across multiple threads
// if you have big bandwidth you can increase the thread number below
// because Azure accepts blobs broken into blocks in any order of arrival.
BlobClient.Timeout = new System.TimeSpan(1, 0, 0);
Role serviceRole = RoleEnvironment.Roles.Where(s => s.Value.Name == "OnlineRates.Web").First().Value;
BlobClient.ParallelOperationThreadCount = serviceRole.Instances.Count;
// Get and create the container
BlobContainer = BlobClient.GetContainerReference(BlobContainerName);
BlobContainer.CreateIfNotExist();
//delete prior version if one exists
BlobRequestOptions options = new BlobRequestOptions();
options.DeleteSnapshotsOption = DeleteSnapshotsOption.None;
CloudBlob blobToDelete = BlobContainer.GetBlobReference(fileName);
Trace.WriteLine("Blob " + fileName + " deleted to be replaced by newer version.");
blobToDelete.DeleteIfExists(options);
//set stream to starting position
stream.Position = 0;
long totalBytes = 0;
//Open the stream and read it back.
using (stream)
{
// Create the Blob and upload the file
CloudBlockBlob blob = BlobContainer.GetBlockBlobReference(fileName);
try
{
BlobClient.ResponseReceived += new EventHandler<ResponseReceivedEventArgs>((obj, responseReceivedEventArgs)
=>
{
if (responseReceivedEventArgs.RequestUri.ToString().Contains("comp=block&blockid"))
{
totalBytes += Int64.Parse(responseReceivedEventArgs.RequestHeaders["Content-Length"]);
}
});
blob.UploadFromStream(stream);
// Set the metadata into the blob
blob.Metadata["FileName"] = fileName;
blob.SetMetadata();
// Set the properties
blob.Properties.ContentType = contentType;
blob.SetProperties();
}
catch (Exception exc)
{
Logging.ExceptionLogger.LogEx(exc);
}
}
}
I've tried a number of different alterations to the code: deleting a blob before replacing it (although the problem exists on new blobs as well), setting container permissions, not setting permissions, etc.
Your code looks like it should work, but it has lots of extra functionality that is not strictly required. I would cut it down to an absolute minimum and go from there. It's really only a gut feeling, but I think it might be the using statement giving you grief. This entire function could be written (presuming the container already exists) as:
public void UploadToBlob(Stream stream, string BlobContainerName, string fileName,
    string contentType)
{
    // Setup the connection to Windows Azure Storage
    CloudStorageAccount storageAccount = CloudStorageAccount.Parse(GetConnStr());
    CloudBlobClient BlobClient = storageAccount.CreateCloudBlobClient();
    CloudBlobContainer BlobContainer = BlobClient.GetContainerReference(BlobContainerName);
    CloudBlockBlob blob = BlobContainer.GetBlockBlobReference(fileName);
    stream.Position = 0;
    blob.UploadFromStream(stream);
}
Notes on the stuff that I've removed:
You should set up diagnostics just once when your app starts, not every time a method is called, usually in RoleEntryPoint.OnStart().
I'm not sure why you're trying to set ParallelOperationThreadCount higher if you have more instances. Those two things seem unrelated.
It's not good form to check for the existence of a container/table every time you save something to it. It's more usual to do that check once when your app starts, or to have a process external to the website make sure all the required containers/tables/queues exist. Of course, if you're dynamically creating containers this doesn't apply; a sketch of the one-time setup follows below.
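As a minimal sketch of that one-time setup at role startup (assuming the classic 1.x StorageClient API used elsewhere in this question; GetConnStr and the container name are placeholders):
public class WebRole : RoleEntryPoint
{
    public override bool OnStart()
    {
        // One-time setup: create the container here instead of on every request.
        CloudStorageAccount account = CloudStorageAccount.Parse(GetConnStr()); // placeholder helper
        CloudBlobClient client = account.CreateCloudBlobClient();
        client.GetContainerReference("uploads").CreateIfNotExist();            // 1.x API spelling

        return base.OnStart();
    }
}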
The problem turned out to be firewall settings on my laptop. It's my personal laptop originally set up at home and so the firewall rules weren't set up for a corporate environment resulting in slow performance on uploads and downloads.
