NLog with Azure Blob Storage container file size

I have configured NLog with Azure Blob Storage in my application, but if my log
content is very large, say 500,000 lines, it only stores 50,000.
How many log lines can I store per blob container file?

I think you are hitting the hard limit of 50,000 blocks in a single blob:
https://learn.microsoft.com/en-us/azure/storage/common/storage-scalability-targets#azure-blob-storage-scale-targets
When writing one line at a time, each line becomes its own block.
I guess the NLog blob target should have some batch logic and only write every 1-2 seconds (this can be done by applying AsyncWrapper or BufferingWrapper with an infrequent flush interval; see the sketch below).
I also guess the NLog blob target should keep track of the number of write operations and automatically roll to a new blob after reaching 49,999 writes (and also roll when a write fails because the block limit has been reached).
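As a rough illustration of the wrapper idea, here is a minimal sketch assuming NLog 4.x with programmatic configuration; the blob target instance is passed in, since the exact target class depends on which Azure blob logging package you use:

using NLog;
using NLog.Config;
using NLog.Targets;
using NLog.Targets.Wrappers;

class LoggingSetup
{
    public static void Configure(Target blobTarget)
    {
        // Buffer log events so the blob target receives them in batches
        // (up to 100 events, or at least every 2 seconds) instead of
        // issuing one write operation (and hence one block) per line.
        var buffered = new BufferingTargetWrapper("bufferedBlob", blobTarget)
        {
            BufferSize = 100,     // events per batch
            FlushTimeout = 2000   // flush interval in milliseconds
        };

        var config = new LoggingConfiguration();
        config.AddRule(LogLevel.Info, LogLevel.Fatal, buffered);
        LogManager.Configuration = config;
    }
}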

Update: a new NLog Blob Storage target has been created. It performs batching by default, which reduces the chance of hitting the 50,000-block limit.
See: https://www.nuget.org/packages/NLog.Extensions.AzureBlobStorage/

Related

Very high number of file transactions when using Azure Functions with Python

I am building a function app that is triggered by a queue message, reads some input files from Blob Storage, combines them, and writes a new file to Blob Storage.
Each time the function runs, I see a very high number of file transactions, resulting in unexpected costs. The costs are related to "File Write/Read/Protocol Operation Units".
The function has a queue trigger binding, three input bindings pointing to Blob Storage, and an output binding pointing to Blob Storage.
The function app is running on Python (which I know is experimental).
When looking at the metrics of my storage account, I see spikes of up to 50k file transactions each time I run my function. Testing with an empty function triggered by a queue message, I still get 5k file transactions.
Normally the function writes the output to the output binding location (which for a Python function is a temporary file on the function app's storage, which is then copied back to Blob Storage, I presume).
In this related question, the high storage costs are suspected to be related to logging. In my case logging is not enabled in the host.json file, and I've also disabled logging on the storage account. This hasn't resolved the issue. (Expensive use of storage account from Azure Functions)
Are these values normal for an output file of 60 KB and an input file of around 2 MB?
Is this related to the Python implementation, or is this to be expected for all languages?
Can I avoid this?
The Python implementation in V1 Functions creates inefficiencies that can lead to significant file usage. This is a known shortcoming. Work is in progress on a Python implementation for Functions V2 that will not have this problem.

Parallelize Azure Logic App executions when copying a file from SFTP to Blob Storage

I have an Azure Logic App which is triggered when a new file is added or modified on an SFTP server. When that happens, the file is copied to Azure Blob Storage and then deleted from the SFTP server. This operation takes approximately 2 seconds per file.
The only problem I have is that these files (on average 500 KB) are processed one by one. Given that I'm looking to transfer around 30,000 files daily, this approach becomes very slow (around 18 hours).
Is there a way to scale out/parallelize these executions?
I am not sure that a single Azure Logic App run can be scaled out or parallelized. But based on my experience, if the timeliness requirements are not very strict, you could use a ForEach loop to do this; the ForEach parallelism limit is 50 and the default is 20.
In your case, my suggestion is to have the trigger fire when a new file is added or modified on the SFTP server, then insert a queue message with the file path as its content into an Azure Storage queue, and end the loop based on time or queue length. You can then get the queue message collection and, in a ForEach action, fetch each queue message and copy the corresponding file from the SFTP server into a blob.
If you're using C#, use Parallel.ForEach as Tom Sun said. If you go that route, I also recommend using the async/await pattern for the IO operation (saving to the blob); it frees up the executing thread to serve other requests while the file is being saved.
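For illustration, here is a rough sketch of that pattern, assuming the classic Microsoft.WindowsAzure.Storage SDK; the SFTP read is abstracted behind a caller-supplied openSftpFileAsync delegate (hypothetical), and a SemaphoreSlim caps the degree of parallelism so the SFTP server is not overwhelmed:

using System;
using System.Collections.Generic;
using System.IO;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.WindowsAzure.Storage.Blob; // assumption: classic storage SDK

class SftpToBlobCopier
{
    public static async Task CopyAllAsync(
        IEnumerable<string> filePaths,
        CloudBlobContainer container,
        Func<string, Task<Stream>> openSftpFileAsync, // hypothetical SFTP helper
        int maxParallel = 20)
    {
        var throttle = new SemaphoreSlim(maxParallel);
        var tasks = new List<Task>();

        foreach (var path in filePaths)
        {
            await throttle.WaitAsync();
            tasks.Add(Task.Run(async () =>
            {
                try
                {
                    // Async IO frees the thread while the upload is in flight.
                    using (var source = await openSftpFileAsync(path))
                    {
                        var blob = container.GetBlockBlobReference(Path.GetFileName(path));
                        await blob.UploadFromStreamAsync(source);
                    }
                }
                finally
                {
                    throttle.Release();
                }
            }));
        }

        await Task.WhenAll(tasks);
    }
}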

Azure Storage Performance Queue vs Table

I've got a nice logging system I've set up that writes to Azure Table Storage and it has worked well for a long time. However, there are certain places in my code where I now need to write a lot of messages to the log (50-60 messages) instead of just a couple. It is also important enough that I can't start a new thread to finish writing to the log and return the MVC action before I know the log write succeeded, because theoretically that thread could die. I have to write to the log before I return data to the web user.
According to the Azure dashboard, Table Storage transactions take ~37ms to commit, end to end (E2E), while queues only take ~6ms E2E to commit.
I'm now considering not logging directly to table storage, and instead logging to an Azure queue, then having a batch job run that reads off the queue and puts the entries in their proper place in table storage. That way I can still index them properly via their partition and row keys. I can also write just a single queue message with all of the log entries (see the sketch after this question), so it should only take 6 ms instead of (37 * 50) ms.
I know that there are Table Storage batch operations. However, each of the log entries typically goes to a different partition, and batch operations need to stay within a single partition.
I know that queue messages only live for 7 days, so I'll make sure I store queue messages in a new mechanism if they're older than a day (if it doesn't work the first 50 times, it just isn't going to work).
My question, then is: what am I not thinking about? How could this completely kick me in the balls in 4 months down the road?
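For what it's worth, here is a minimal sketch of the "single queue message carrying all the log entries" idea described above, assuming the classic Microsoft.WindowsAzure.Storage queue client and Json.NET; the LogEntry type and its fields are illustrative only:

using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.WindowsAzure.Storage.Queue; // assumption: classic storage SDK
using Newtonsoft.Json;

class LogEntry
{
    public string PartitionKey { get; set; }
    public string RowKey { get; set; }
    public string Message { get; set; }
}

class QueueLogger
{
    // Serialize all 50-60 entries into one queue message, so the request
    // pays for a single ~6 ms queue round-trip instead of ~50 table writes.
    // A drain job later reads the message and writes the entries to
    // Table Storage under their proper partition and row keys.
    public static Task EnqueueBatchAsync(CloudQueue queue, IList<LogEntry> entries)
    {
        var payload = JsonConvert.SerializeObject(entries);
        // Keep in mind the 64 KB queue message limit; very large batches
        // may need to be split or spilled to blob storage instead.
        return queue.AddMessageAsync(new CloudQueueMessage(payload));
    }
}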

Azure Queue Storage: Send files in messages

I am assessing Azure Queue Storage to communicate between two decoupled applications.
My requirement is to send a file (flat file, size: small to large) in the queue message.
As per my reading, an individual message in a queue cannot exceed 64 KB, so sending a file of variable size in the message is out of the question.
Another solution I can think of is using a combination of Queue Storage and Blob Storage, i.e. in the queue message add a reference to the file (in blob storage), and then, when required, read the file from the blob using the reference/address in the queue message.
My question is: is this the right approach, or are there other, more elegant ways of achieving this?
Thanks,
Sandeep
While there's no single right approach, since you can put anything you want in a queue message (within size limits), consider this: if your file sizes can go over 64 KB, you simply cannot store them within a queue message, so you will have no choice but to store your content somewhere else (e.g. blobs). For files under 64 KB, you'll need to decide whether you want two different methods for dealing with files, or to just use blobs as your file source across the board and have a consistent approach.
Also remember that message-passing will eat up bandwidth and processing. If you store your files in queue messages, you'll need to account for this with high-volume message-passing, and you'll also need to extract your file content from your queue messages.
One more thing: If you store content in blobs, you can use any number of tools to manipulate these files, and your files remain in blob storage permanently (until you explicitly delete them). Queue messages must be deleted after processing, giving you no option to keep your file around. This is probably an important aspect to consider.
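To make the blob-plus-reference combination concrete, here is a small sketch assuming the classic Microsoft.WindowsAzure.Storage SDK; the container, queue, and blob name are whatever your two applications agree on:

using System.Threading.Tasks;
using Microsoft.WindowsAzure.Storage.Blob;  // assumption: classic storage SDK
using Microsoft.WindowsAzure.Storage.Queue;

class FileMessenger
{
    // Producer side: upload the file to blob storage and enqueue only a
    // small reference to it, so the message never hits the 64 KB limit.
    public static async Task SendFileAsync(
        CloudBlobContainer container, CloudQueue queue,
        string blobName, string localFilePath)
    {
        var blob = container.GetBlockBlobReference(blobName);
        await blob.UploadFromFileAsync(localFilePath);

        // The consumer reads the blob named in the message, processes it,
        // deletes the message, and keeps or deletes the blob as needed.
        await queue.AddMessageAsync(new CloudQueueMessage(blobName));
    }
}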

Limit number of blobs processed at one time by Azure Webjobs

I have an Azure WebJob that copies large CSVs (500 MB to 10+ GB) into a SQL Azure table. I get a new CSV every day and I only retain records for one month, because it's expensive to keep them in SQL, so they are pretty volatile.
To get things started, I bulk uploaded last month's data (~200 GB) and I'm seeing all 30 CSV files getting processed at the same time. This causes a pretty crazy backup in the uploads, as shown by this picture:
I have about 5 pages that look like this, counting all of the retries.
If I upload them 2 at a time, everything works great! But as you can see from the running times, some can take over 14 hours to complete.
What I want to do is bulk upload 30 CSVs and have the Webjob only process 3 of the files at a time, then once one completes, start the next one. Is this possible with the current SDK?
Yes, absolutely possible.
Assuming the pattern you are using is Scheduled or On-Demand WebJobs that pop a message onto a queue, which is then picked up by a constantly running WebJob that processes messages from the queue and does the work, you can use the JobHost.Queues.BatchSize property to limit the number of queue messages that can be processed at one time:
using Microsoft.Azure.WebJobs;

class Program
{
    static void Main()
    {
        JobHostConfiguration config = new JobHostConfiguration();
        // AzCopy cannot be invoked multiple times in the same host
        // process, so read and process one message at a time
        config.Queues.BatchSize = 1;
        var host = new JobHost(config);
        host.RunAndBlock();
    }
}
If you would like to see what this looks like in action, feel free to clone this GitHub repo I published recently on how to use WebJobs and AzCopy to create your own blob backup service. I had the same problem you're facing, which is that I could not run too many jobs at once.
https://github.com/markjbrown/AzCopyBackup
Hope that is helpful.
Edit: I almost forgot. While you can change the BatchSize property above, you can also take advantage of having multiple VMs host and process these jobs, which basically allows you to scale this into multiple, independent, parallel processes. You may find that you can scale up the number of VMs and process the data very quickly instead of having to throttle it using BatchSize.
