Getting progress of CloudBlob.DownloadToFileParallelAsync - azure

I'm trying to download binary files from my Azure storage account. Initially I was using CloudBlob.DownloadToFileAsync(), which let me supply an IProgress parameter and receive progress updates for the transfer.
However, on files larger than 2 GB, DownloadToFileAsync was hanging. According to the documentation, I needed to use DownloadToFileParallelAsync for larger files. I have implemented this and can confirm it now works, but I can no longer get the progress of the download because the method does not accept an IProgress parameter.
Can anyone point me to a way to gather useful progress data, or offer a workaround?
int parallelIOCount = SystemInfo.processorCount;
long rangeSizeInBytes = 16 * Constants.MB;
await cloudModuleBlob.DownloadToFileParallelAsync(targetTempModuleFile, FileMode.Create, parallelIOCount, rangeSizeInBytes, cancellationTokenSource.Token);
progressSlider.value = 1.0f;
//When the download is finished...
//Rename the temp file to the full version.
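if (File.Exists(targetCiqModuleFile))
{
    // File.Move throws if the destination already exists, so remove the old copy first.
    File.Delete(targetCiqModuleFile);
}
File.Move(targetTempModuleFile, targetCiqModuleFile);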
Debug.Log("Download saved to: " + targetCiqModuleFile);

Solved it with a workaround. Rather than using DownloadToFileAsync, I used DownloadRangeToStreamAsync to download the blob in smaller pieces and combine them on the client. It works well with 16 MB chunks.
//Create the file.
using (FileStream fileStream = File.Create(targetTempModuleFile))
{
    long chunkSize = 16 * Constants.MB;
    Int64 current = 0;
    while (current < cloudModuleBlob.Properties.Length)
    {
        if ((current + chunkSize) > cloudModuleBlob.Properties.Length)
        {
            // Last chunk: download only the bytes that remain.
            await cloudModuleBlob.DownloadRangeToStreamAsync(fileStream, current, (cloudModuleBlob.Properties.Length - current), default, default, default, progressHandler, cancellationToken);
        }
        else
        {
            await cloudModuleBlob.DownloadRangeToStreamAsync(fileStream, current, chunkSize, default, default, default, progressHandler, cancellationToken);
        }
        current = current + chunkSize;
    }
}
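The range arithmetic in this workaround does not depend on the storage SDK, so it can be checked on its own. Below is a minimal sketch in JavaScript (used only for illustration). `chunkRanges` and the 40 MB total are hypothetical and not part of any Azure library. It splits a total length into consecutive (offset, length) pairs, with a shorter final chunk, and derives an overall percentage after each chunk.

```javascript
// Split `totalLength` bytes into consecutive { offset, length } ranges.
// The final range is shortened so that it never runs past the end.
function chunkRanges(totalLength, chunkSize) {
  if (!Number.isInteger(totalLength) || totalLength < 0) {
    throw new RangeError('totalLength must be a non-negative integer');
  }
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new RangeError('chunkSize must be a positive integer');
  }
  const ranges = [];
  for (let offset = 0; offset < totalLength; offset += chunkSize) {
    ranges.push({ offset, length: Math.min(chunkSize, totalLength - offset) });
  }
  return ranges;
}

const MB = 1024 * 1024;
const total = 40 * MB; // hypothetical blob size

// Overall progress is the sum of the completed chunk lengths.
let completed = 0;
for (const range of chunkRanges(total, 16 * MB)) {
  completed += range.length;
  const percent = Math.round((completed / total) * 100);
  console.log(`offset=${range.offset} length=${range.length} overall=${percent}%`);
}
```

For a 40 MB total and 16 MB chunks this yields three ranges, the last one 8 MB long. A per-chunk progress handler only reports progress within its own range, so the overall figure has to add the offset of the current chunk, as the loop above does.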


Fast file copy with progress information in Node.js?

Is there any way to copy large files with Node.js quickly and with progress information?
Solution 1: fs.createReadStream().pipe(...) = useless, up to 5 times slower than native cp
See: Fastest way to copy file in node.js. Progress information is possible (with the npm package 'progress-stream'):
fs = require('fs');
The only problem with this approach is that it easily takes 5 times longer than "cp source dest". See also the appendix below for the full test code.
Solution 2: rsync --info=progress2 = as slow as solution 1 = useless
Solution 3: My last resort: write a native module for Node.js, using "CoreUtils" (the Linux sources for cp and others) or other functions as shown in Fast file copy with progress
Does anyone know of something better than solution 3? I'd like to avoid native code, but it seems the best fit.
Thanks! Any package recommendations or hints are welcome (I have tried all fs**).
Test code, using pipe and progress:
var path = require('path');
var progress = require('progress-stream');
var fs = require('fs');
var _source = path.resolve('../inc/big.avi'); // 1.5GB
var _target = '/tmp/a.avi';
var stat = fs.statSync(_source);
var str = progress({
    length: stat.size,
    time: 100
});
str.on('progress', function(progress) {
    console.log(progress.percentage);
});
function copyFile(source, target, cb) {
    var cbCalled = false;
    var rd = fs.createReadStream(source);
    rd.on("error", function(err) {
        done(err);
    });
    var wr = fs.createWriteStream(target);
    wr.on("error", function(err) {
        done(err);
    });
    wr.on("close", function(ex) {
        done();
    });
    // Route the data through progress-stream on its way to the target.
    rd.pipe(str).pipe(wr);
    function done(err) {
        if (!cbCalled) {
            cb && cb(err);
            cbCalled = true;
        }
    }
}
copyFile(_source, _target);
Update: a fast C version (with detailed progress!) has been implemented. It seems the best place to start from :-)
One aspect that may slow down the process is related to console.log. Take a look at this code:
const fs = require('fs');
const sourceFile = 'large.exe'
const destFile = 'large_copy.exe'
console.time('copying')
fs.stat(sourceFile, function(err, stat){
  const filesize = stat.size
  let bytesCopied = 0
  const readStream = fs.createReadStream(sourceFile)
  readStream.on('data', function(buffer){
    bytesCopied += buffer.length
    let percentage = ((bytesCopied/filesize)*100).toFixed(2)
    console.log(percentage+'%') // run once with this line and once with it commented out
  })
  readStream.on('end', function(){
    console.timeEnd('copying')
  })
  readStream.pipe(fs.createWriteStream(destFile));
})
Here are the execution times for copying a 400 MB file:
with console.log: 692.950ms
without console.log: 382.540ms
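If the logging itself is the cost, one option is to keep the progress output but emit it less often. The sketch below is one way to do that, not the answerer's code. `throttleProgress` is a hypothetical helper, and the fake clock exists only so the example is deterministic. It forwards a report at most once per interval and always forwards the final one.

```javascript
// Wrap `report` so it runs at most once every `intervalMs` milliseconds.
// The final update (bytesCopied >= total) is always forwarded.
// `now` is injectable so the behaviour can be demonstrated without real timers.
function throttleProgress(report, intervalMs, now = Date.now) {
  let last = -Infinity;
  return function (bytesCopied, total) {
    const t = now();
    if (bytesCopied >= total || t - last >= intervalMs) {
      last = t;
      report(bytesCopied, total);
    }
  };
}

// Demonstration with a fake clock: ten updates arrive 30 ms apart,
// but only those at least 100 ms apart (plus the last one) are reported.
const lines = [];
let fakeTime = 0;
const report = throttleProgress(
  (done, total) => lines.push(((done / total) * 100).toFixed(2) + '%'),
  100,
  () => fakeTime
);
for (let copied = 10; copied <= 100; copied += 10) {
  fakeTime += 30;
  report(copied, 100);
}
console.log(lines.join(' ')); // logs: 10.00% 50.00% 90.00% 100.00%
```

In the code above, the throttled function would be called from the 'data' handler with bytesCopied and filesize in place of the direct console.log call.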
cpy and cp-file both support progress reporting
I have the same issue. I want to copy large files as fast as possible and want progress information. I created a test utility that tests the different copy methods:
You can run it simply with:
npx copy-speed-test --source --destination someNonExistentFolder
It does a native copy using child_process.exec(), a file copy using fs.copyFile, and copies using createReadStream with a variety of buffer sizes (you can change the buffer sizes by passing them on the command line; run npx copy-speed-test -h for more info).
Some things I learnt:
fs.copyFile is just as fast as native
you can get quite inconsistent results on all these methods, particularly when copying from and to the same disc and with SSDs
if using a large buffer then createReadStream is nearly as good as the other methods
if you use a very large buffer then the progress is not very accurate.
The last point is because the progress is based on the read stream, not the write stream. If you are copying a 1.5GB file and your buffer is 1GB, the progress immediately jumps to 66%, then jumps to 100%, and you then have to wait while the write stream finishes writing. I don't think that you can display the progress of the write stream.
If you have the same issue, I would recommend running these tests with file sizes similar to those you will be dealing with and across similar media. My end use case is copying a file from an SD card plugged into a Raspberry Pi across a network to a NAS, so that is the scenario I ran the tests for.
I hope someone other than me finds it useful!
I solved a similar problem (using Node v8 or v10) by changing the buffer size. I think the default buffer size is around 16kb, which fills and empties quickly but requires a full cycle around the event loop for each operation. I changed the buffer to 1MB and writing a 2GB image fell from taking around 30 minutes to 5, which sounds similar to what you are seeing. My image was also decompressed on the fly, which possibly exacerbated the problem. Documentation on stream buffering has been in the manual since at least Node v6:
Here are the key code components you can use:
let gzSize = 1; // do not initialize divisors to 0
let offset = 0; // bytes read so far
const hwm = { highWaterMark: 1024 * 1024 }
const inStream = fs.createReadStream( filepath, hwm );
// Capture the filesize for showing percentages
inStream.on( 'open', function fileOpen( fdin ) {
    inStream.pause(); // wait for fstat before starting
    fs.fstat( fdin, function( err, stats ) {
        gzSize = stats.size;
        // openTargetDevice does a complicated fopen() for the output.
        // This could simply be inStream.resume()
        openTargetDevice( gzSize, targetDeviceOpened );
    });
});
inStream.on( 'data', function shaData( data ) {
    const bytesRead = data.length;
    offset += bytesRead;
    console.log( `Read ${offset} of ${gzSize} bytes, ${Math.floor( offset * 100 / gzSize )}% ...` );
    // Write to the output file, etc.
});
// Once the target is open, I convert the fd to a stream and resume the input.
// For the purpose of example, note only that the output has the same buffer size.
function targetDeviceOpened( error, fd, device ) {
    if( error ) return exitOnError( error );
    const writeOpts = Object.assign( { fd }, hwm );
    outStream = fs.createWriteStream( undefined, writeOpts );
    outStream.on( 'open', function fileOpen( fdin ) {
        // In a simpler structure, this is in the fstat() callback.
        inStream.resume(); // we have the _input_ size, resume read
    });
    // [...]
}
I have not made any attempt to optimize these further; the result is similar to what I get on the commandline using 'dd' which is my benchmark.
I left in converting a file descriptor to a stream and using the pause/resume logic so you can see how these might be useful in more complicated situations than the simple fs.statSync() in your original post. Otherwise, this is simply adding the highWaterMark option to Tulio's answer.
Here is what I'm using now; it copies one file with progress:
String.prototype.toHHMMSS = function () {
    var sec_num = parseInt(this, 10); // don't forget the second param
    var hours = Math.floor(sec_num / 3600);
    var minutes = Math.floor((sec_num - (hours * 3600)) / 60);
    var seconds = sec_num - (hours * 3600) - (minutes * 60);
    if (hours < 10) {hours = "0"+hours;}
    if (minutes < 10) {minutes = "0"+minutes;}
    if (seconds < 10) {seconds = "0"+seconds;}
    return hours+':'+minutes+':'+seconds;
};
var purefile="20200811140938_0002.MP4";
var filename="/sourceDir/"+purefile;
var output="/destinationDir/"+purefile;
var progress = require('progress-stream');
var fs = require('fs');
const convertBytes = function(bytes) {
    const sizes = ["Bytes", "KB", "MB", "GB", "TB"]
    if (bytes == 0) {
        return "n/a"
    }
    const i = parseInt(Math.floor(Math.log(bytes) / Math.log(1024)))
    if (i == 0) {
        return bytes + " " + sizes[i]
    }
    return (bytes / Math.pow(1024, i)).toFixed(1) + " " + sizes[i]
}
var copiedFileSize = fs.statSync(filename).size;
var str = progress({
    length: copiedFileSize, // length(integer) - If you already know the length of the stream, then you can set it. Defaults to 0.
    time: 200, // time(integer) - Sets how often progress events are emitted in ms. If omitted then the default is to do so every time a chunk is received.
    speed: 1, // speed(integer) - Sets how long the speedometer needs to calculate the speed. Defaults to 5 sec.
    // drain: true // drain(boolean) - In case you don't want to include a readstream after progress-stream, set to true to drain automatically. Defaults to false.
    // transferred: false // transferred(integer) - If you want to set the size of previously downloaded data. Useful for a resumed download.
});
/* Example of a progress object:
{
    percentage: 9.05,
    transferred: 949624,
    length: 10485760,
    remaining: 9536136,
    eta: 42,
    runtime: 3,
    delta: 295396,
    speed: 949624
}
*/
str.on('progress', function(progress) {
    console.log('elapsed: '+progress.runtime.toString().toHHMMSS() + 's / remaining: ' + progress.eta.toString().toHHMMSS()+'s');
    console.log(convertBytes(progress.speed)+"/s"+' '+progress.speed);
});
//const hwm = { highWaterMark: 1024 * 1024 } ;
var hrstart = process.hrtime(); // measure the copy time
var rs=fs.createReadStream(filename)
    .pipe(str) // route the data through progress-stream so that 'progress' events fire
    .pipe(fs.createWriteStream(output, {emitClose: true}).on("close", () => {
        var hrend = process.hrtime(hrstart);
        var timeInSeconds = (hrend[0]* 1000000000 + hrend[1]) / 1000000000;
        var finalSpeed=convertBytes(copiedFileSize/timeInSeconds);
        console.log('Done: file copy: '+ finalSpeed+"/s");
        console.info('Execution time (hr): %ds %dms', hrend[0], hrend[1] / 1000000);
    }) );
Refer to
With that package, you can track progress while copying or moving files. Progress tracking is based on events and method calls, so it is very convenient to use.
You can provide options to control many things, e.g. the total number of files to process concurrently and the chunk size to read from a file at a time.
It was tested with single files up to 17GB and with directories that were, as far as I remember, pretty large. It is also safe to use for large files.
So go ahead and have a look to see whether it matches your expectations :D

Append to Azure Append Blob Using AppendTextAsync Results in Missing Data

I'm attempting to create a logger for an application in Azure using the new Azure append blobs and the Azure Storage SDK 6.0.0. So I created a quick test application to get a better understanding of append blobs and their performance characteristics.
My test program simply loops 100 times and appends a line of text to the append blob. If I use the synchronous AppendText() method everything works fine, however, it appears to be limited to writing about 5-6 appends per second. So I attempted to use the asynchronous AppendTextAsync() method; however, when I use this method, the loop runs much faster (as expected) but the append blob is missing about 98% of the appended text without any exception being thrown.
If I add a Thread.Sleep and sleep for 100 milliseconds between each append operation, I end up with about 50% of the data. Sleep for 1 second and I get all of the data.
This seems similar to an issue that was discovered in v5.0.0 but was fixed in v5.0.2:
Here is my test code if you'd like to try to reproduce this issue:
static void Main(string[] args)
{
    var accountName = "<account-name>";
    var accountKey = "<account-key>";
    var credentials = new StorageCredentials(accountName, accountKey);
    var account = new CloudStorageAccount(credentials, true);
    var client = account.CreateCloudBlobClient();
    var container = client.GetContainerReference("<container-name>");
    var blob = container.GetAppendBlobReference("append-blob.txt");
    for (int i = 0; i < 100; i++)
    {
        blob.AppendTextAsync(string.Format("Appending log number {0} to an append blob.\r\n", i));
    }
    Console.WriteLine("Press any key to exit.");
    Console.ReadKey();
}
Does anyone know if I'm doing something wrong with my attempt to append lines of text to an append blob? Otherwise, any idea why this would just lose data without throwing some kind of exception?
I'd really like to start using this as a repository for my application logs (since it was largely created for that purpose). However, it would be quite unreliable if logs would just go missing without warning if the rate of logging went above 5-6 logs per second.
Any thoughts or feedback would be greatly appreciated.
I now have a working solution based upon the information provided by @ZhaoxingLu-Microsoft. According to the API documentation, the AppendTextAsync() method should only be used in a single-writer scenario, because the API internally uses the append-offset conditional header to avoid duplicate blocks, which does not work in a multiple-writer scenario.
Here is the documentation that specifies this behavior is by design:
So the solution is to use the AppendBlockAsync() method instead. The following implementation appears to work correctly:
var tasks = new Task[100];
for (int i = 0; i < 100; i++)
{
    var message = string.Format("Appending log number {0} to an append blob.\r\n", i);
    var bytes = Encoding.UTF8.GetBytes(message);
    var stream = new MemoryStream(bytes);
    tasks[i] = blob.AppendBlockAsync(stream);
}
Task.WaitAll(tasks); // wait for every append to finish
Please note that I am not explicitly disposing the memory stream in this example as that solution would entail a using block with an async/await inside the using block in order to wait for the async append operation to finish before disposing the memory stream... but that causes a completely unrelated issue.
You are using the async method incorrectly. blob.AppendTextAsync() is non-blocking, but the operation has not actually finished when the call returns. You should wait for all the async tasks to complete before exiting the process.
The following code shows the correct usage:
var tasks = new Task[100];
for (int i = 0; i < 100; i++)
{
    tasks[i] = blob.AppendTextAsync(string.Format("Appending log number {0} to an append blob.\r\n", i));
}
Task.WaitAll(tasks); // block until every append has completed
Console.WriteLine("Press any key to exit.");
Console.ReadKey();

How can I cleanup files & directories in Azure Local Storage after a file begins streaming to the browser?

BACKGROUND: I'm making use of Azure Local Storage. This is supposed to be treated as "volatile" storage. First of all, how long do the files & directories that I create persist on the Web Role Instances (there are 2, in my case)? Do I need to worry about running out of storage if I don't do cleanup on those files/directories after each user is done with it? What I'm doing is I'm pulling multiple files from a separate service, storing them in Azure Local Storage, compressing them into a zip file and storing that zip file, and then finally file streaming that zip file to the browser.
THE PROBLEM: This all works beautifully except for one minor hiccup. The file seems to stream to the browser asynchronously. So what happens is that an exception gets thrown when I try to delete the zipped file from azure local storage afterward since it is still in the process of streaming to the browser. What would be the best approach to forcing the deletion process to happen AFTER the file is completely streamed to the browser?
Here is my code:
using (Service.Company.ServiceProvider CONNECT = new eZ.Service.CompanyConnect.ServiceProvider())
{
    // Iterate through all of the files chosen
    foreach (Uri fileId in fileIds)
    {
        // Get the int file id value from the uri
        System.Text.RegularExpressions.Regex rex = new System.Text.RegularExpressions.Regex(@"e[B|b]://[^\/]*/\d*/(\d*)");
        string id_str = rex.Match(fileId.ToString()).Groups[1].Value;
        int id = int.Parse(id_str);
        // Get the file object from eB service from the file id passed in
        eZ.Data.File f = new eZ.Data.File(CONNECT.eZSession, id);
        f.Retrieve("Header; Repositories");
        string _fileName = f.Name;
        try
        {
            using (MemoryStream stream = new MemoryStream())
            {
                f.ContentData = new eZ.ContentData.File(f, stream);
                // After the ContentData is created, hook into the event
                f.ContentData.TransferProgressed += (sender, e) => { Console.WriteLine(e.Percentage); };
                // Now do the transfer, the event will fire as blocks of data are read
                int bytesRead;
                // Open the Azure Local Storage file stream
                using (azure_file_stream = File.OpenWrite(curr_user_path + _fileName))
                {
                    while ((bytesRead = f.ContentData.Read()) > 0)
                    {
                        // Write the chunk to azure local storage
                        byte[] buffer = stream.GetBuffer();
                        azure_file_stream.Write(buffer, 0, bytesRead);
                        stream.Position = 0;
                    }
                }
            }
        }
        catch (Exception e)
        {
            throw; // rethrow without resetting the stack trace
            //Console.WriteLine("The following error occurred: " + e);
        }
    } // end of foreach block
} // end of eB using block
string sevenZipDllPath = Path.Combine(Utilities.GetCurrentAssemblyPath(), "7z.dll");
Global.logger.Info(string.Format("sevenZipDllPath: {0}", sevenZipDllPath));
var compressor = new SevenZipCompressor
{
    ArchiveFormat = OutArchiveFormat.Zip,
    CompressionLevel = CompressionLevel.Fast
};
// Compress the user directory
compressor.CompressDirectory(webRoleAzureStorage.RootPath + curr_user_directory, curr_user_package_path + "");
// stream to the browser
httpResponse.BufferOutput = false;
httpResponse.ContentType = Utilities.GetMIMEType("BigStuff3.mp4");
httpResponse.AppendHeader("content-disposition", "attachment;");
azure_file_stream = File.OpenRead(curr_user_package_path + "");
// Azure Local Storage cleanup
foreach (FileInfo file in user_directory.GetFiles())
{
    file.Delete();
}
foreach (FileInfo file in package_directory.GetFiles())
{
    file.Delete();
}
Can you simply run a job on the machine that cleans up files, say, a day after their creation? This could be as simple as a batch file in the task scheduler or a separate thread started from WebRole.cs.
You can even use AzureWatch to auto-re-image your instance if the local space drops below a certain threshold.
Could you place the files (esp. the final compressed one that the users download) in Windows Azure blob storage? The file could be made public, or create a Shared Access Signature so that only the persons you provide the URL to could download it. Placing the files in blob storage for download could alleviate some pressures on the web server.

Is http.ServerResponse.write() blocking?

Is it possible to write non-blocking response.write? I've written a simple test to see if other clients can connect while one downloads a file:
var connect = require('connect');
var longString = 'a';
for (var i = 0; i < 29; i++) { // 512 MiB
    longString += longString;
}
function download(request, response) {
    response.setHeader("Content-Length", longString.length);
    response.setHeader("Content-Type", "application/force-download");
    response.setHeader("Content-Disposition", 'attachment; filename="file"');
    response.write(longString);
    response.end();
}
var app = connect().use(download);
app.listen(8080); // port chosen for illustration
And it seems like write is blocking!
Am I doing something wrong?
Update: So it both blocks and doesn't block. It doesn't block in the sense that two files can be downloaded simultaneously, and it blocks in the sense that creating a buffer is a long operation.
Any processing done strictly in JavaScript will block. response.write(), at least as of v0.8, is no exception to this:
The first time response.write() is called, it will send the buffered header information and the first body to the client. The second time response.write() is called, Node assumes you're going to be streaming data, and sends that separately. That is, the response is buffered up to the first chunk of body.
Returns true if the entire data was flushed successfully to the kernel buffer. Returns false if all or part of the data was queued in user memory. 'drain' will be emitted when the buffer is again free.
What may save some time is to convert longString to Buffer before attempting to write() it, since the conversion will occur anyways:
var longString = 'a';
for (...) { ... }
longString = new Buffer(longString);
But, it would probably be better to stream the various chunks of longString rather than all-at-once (Note: Streams are changing in v0.10):
var longString = 'a',
    chunkCount = Math.pow(2, 29),
    bufferSize = Buffer.byteLength(longString),
    longBuffer = new Buffer(longString);
function download(request, response) {
    var current = 0;
    response.setHeader("Content-Length", bufferSize * chunkCount);
    response.setHeader("Content-Type", "application/force-download");
    response.setHeader("Content-Disposition", 'attachment; filename="file"');
    function writeChunk() {
        if (current < chunkCount) {
            current++;
            if (response.write(longBuffer)) {
                // Flushed to the kernel: yield to the event loop, then continue.
                process.nextTick(writeChunk);
            } else {
                // Queued in user memory: wait until the buffer drains.
                response.once('drain', writeChunk);
            }
        } else {
            response.end();
        }
    }
    writeChunk();
}
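For readers on current Node.js versions, where new Buffer() is deprecated in favour of Buffer.from(), the same drain-aware loop can be sketched as a small reusable helper. This is only a sketch. `writeRepeated` and the slow `sink` used to exercise it are hypothetical names, not part of Node or connect, and the sink merely stands in for an HTTP response.

```javascript
const { Writable } = require('stream');

// Write `chunk` to `writable` `count` times. Whenever write() returns false
// (data queued in user memory), stop and continue on the next 'drain' event.
// `done` is passed to end() and runs once the stream has finished.
function writeRepeated(writable, chunk, count, done) {
  let current = 0;
  function writeChunk() {
    while (current < count) {
      current++;
      if (!writable.write(chunk)) {
        writable.once('drain', writeChunk);
        return;
      }
    }
    writable.end(done);
  }
  writeChunk();
}

// A deliberately slow sink with a tiny buffer, so that backpressure kicks in.
let received = 0;
const sink = new Writable({
  highWaterMark: 4,
  write(data, _encoding, callback) {
    received += data.length;
    setImmediate(callback); // pretend the consumer is slow
  }
});

writeRepeated(sink, Buffer.from('a'), 1000, () => {
  console.log(`sink received ${received} bytes`);
});
```

In the download handler above, the call would be writeRepeated(response, Buffer.from(longString), chunkCount) after the headers are set. Between 'drain' events the process is free to serve other clients, which is the non-blocking behaviour the question is after.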
And, if the eventual goal is to stream a file from disk, this can be even easier with fs.createReadStream() and stream.pipe():
function download(request, response) {
    // response.setHeader(...)
    // ...
    fs.createReadStream('./file-on-disk').pipe(response); // the path is a placeholder
}
Nope, it does not block. I tried one download from IE and another from Firefox. I started IE first, but the file could still be downloaded from Firefox first.
I tried it with 1 MB (i < 20); it works the same, just faster.
You should know that whatever longString you create requires memory allocation. Try it with i < 30 (on Windows 7) and it will throw FATAL ERROR: JS Allocation failed - process out of memory.
The time is spent on memory allocation and copying, nothing else. Since it is a huge file, the response takes a long time and your download looks like it is blocking. Try it yourself with smaller values (i < 20 or so).

"The specified block list is invalid" while uploading blobs in parallel

I've a (fairly large) Azure application that uploads (fairly large) files in parallel to Azure blob storage.
In a few percent of uploads I get an exception:
The specified block list is invalid.
System.Net.WebException: The remote server returned an error: (400) Bad Request.
This is when we run a fairly innocuous looking bit of code to upload a blob in parallel to Azure storage:
public static void UploadBlobBlocksInParallel(this CloudBlockBlob blob, FileInfo file)
{
    blob.Properties.ContentType = file.GetContentType();
    blob.Metadata["Extension"] = file.Extension;
    byte[] data = File.ReadAllBytes(file.FullName);
    int numberOfBlocks = (data.Length / BlockLength) + 1;
    string[] blockIds = new string[numberOfBlocks];
    Parallel.For(
        0,
        numberOfBlocks,
        x =>
        {
            string blockId = Convert.ToBase64String(Guid.NewGuid().ToByteArray());
            int currentLength = Math.Min(BlockLength, data.Length - (x * BlockLength));
            using (var memStream = new MemoryStream(data, x * BlockLength, currentLength))
            {
                var blockData = memStream.ToArray();
                var md5Check = System.Security.Cryptography.MD5.Create();
                var md5Hash = md5Check.ComputeHash(blockData, 0, blockData.Length);
                blob.PutBlock(blockId, memStream, Convert.ToBase64String(md5Hash));
            }
            blockIds[x] = blockId;
        });
    byte[] fileHash = _md5Check.ComputeHash(data, 0, data.Length);
    blob.Metadata["Checksum"] = BitConverter.ToString(fileHash).Replace("-", string.Empty);
    blob.Properties.ContentMD5 = Convert.ToBase64String(fileHash);
    data = null;
    // Commit the uploaded blocks; this is the call that validates the block list.
    blob.PutBlockList(blockIds);
}
All very mysterious; I'd think the algorithm we're using to calculate the block list should produce strings that are all the same length...
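One detail of the arithmetic in the code above may be worth ruling out. Whether it is what triggers the error is not established in this thread. The block count is computed as integer division plus one, so a file whose size is an exact multiple of BlockLength gets one extra block of length zero. The sketch below reproduces only that arithmetic in JavaScript (used purely for illustration). The function names and the 4 MB block size are hypothetical.

```javascript
// Block count as computed in the question: integer division plus one.
function blockCountPlusOne(dataLength, blockLength) {
  return Math.floor(dataLength / blockLength) + 1;
}

// Ceiling division: exactly as many blocks as there is data to fill.
function blockCountCeil(dataLength, blockLength) {
  return Math.ceil(dataLength / blockLength);
}

// Length of block `x`, mirroring the Math.Min(...) expression in the question.
function blockLengthAt(x, dataLength, blockLength) {
  return Math.min(blockLength, dataLength - x * blockLength);
}

const BLOCK = 4 * 1024 * 1024; // hypothetical BlockLength

for (const size of [2 * BLOCK + 1, 2 * BLOCK]) {
  const n = blockCountPlusOne(size, BLOCK);
  console.log(
    `size=${size} blocks(+1)=${n} lastBlockLength=${blockLengthAt(n - 1, size, BLOCK)} ` +
    `blocks(ceil)=${blockCountCeil(size, BLOCK)}`
  );
}
```

For a size of 2 * BLOCK + 1 both formulas give three blocks with a final block of one byte. For a size of exactly 2 * BLOCK the plus-one formula still gives three blocks, the last of length zero, while ceiling division gives two. If exact multiples occur among the uploaded files, that empty final block would be sent to PutBlock and its ID included in the committed list.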
We ran into a similar issue, however we were not specifying any block ID or even using the block ID anywhere. In our case, we were using:
using (CloudBlobStream stream = blob.OpenWrite(condition))
{
    //// [write data to stream]
}
This would cause The specified block list is invalid. errors under parallelized load. Switching this code to use the UploadFromStream(…) method while buffering the data into memory fixed the issue:
using (MemoryStream stream = new MemoryStream())
{
    //// [write data to stream]
    stream.Seek(0, SeekOrigin.Begin);
    blob.UploadFromStream(stream, condition);
}
Obviously this could have negative memory ramifications if too much data is buffered into memory, but this is a simplification. One thing to note is that UploadFromStream(...) uses Commit() in some cases, but checks additional conditions to determine the best method to use.
This exception can also happen when multiple threads open a stream to a blob with the same file name and try to write to that blob simultaneously.
NOTE: this solution is based on Azure JDK code, but I think we can safely assume that pure REST version will have the very same effect (as any other language actually).
Since I have spent entire work day fighting this issue, even if this is actually a corner case, I'll leave a note here, maybe it will be of help to someone.
I did everything right. I had block IDs in the right order, I had block IDs of the same length, I had a clean container with no leftovers of some previous blocks (these three reasons are the only ones I was able to find via Google).
There was one catch: I had been building my block list for the commit via
CloudBlockBlob.commitBlockList(Iterable<BlockEntry> blockList)
using this constructor:
BlockEntry(String id, BlockSearchMode searchMode)
and passing an explicit search mode in the second argument. And THAT proved to be the root cause. Once I changed the search mode and eventually landed on the one-parameter constructor
BlockEntry(String id)
which uses UNCOMMITTED by default, committing the block list worked and the blob was successfully persisted.
