We have a streaming service for small data files. Files of around 1 MB are uploaded to a storage account every minute. A Stream Analytics job takes the data from these blobs and passes it to an event hub, which in turn triggers a function app.
Lately the input from this storage account suddenly stops, and the stream has to be restarted before it starts processing data again.
Any feedback is welcome, and I will happily provide more information if needed to solve this.
These are the facts:
Blob uploads every minute.
Stream gets input events from blob.
Even though the data is successfully uploaded to the blob, the stream eventually stops processing it, reporting 0 input events received.
The watermark delay increases, with 0 events output, until the job is restarted.
Every time the stream stops receiving input, the storage account logs a network error followed by a timeout error every 10 minutes (likely the stream's retry policy).
Whenever the job is restarted there is a spike in output, which normalizes in volume after a short while.
At any point there are only 30 files (1 per minute for the last 30 minutes), as older files are deleted.
The storage account and the stream are located in different regions.
Related
I have an Azure Function triggered by Event Grid when a blob is created.
Based on the size of the blob (this is a PDF file), my Azure Function can take anywhere between 2 seconds and 600 seconds (10 minutes) to execute.
As per the Azure documentation, Event Grid retries delivering the event if it does not receive a response from the endpoint (in this case, my Azure Function) within 30 seconds, on the following schedule:
10 seconds
30 seconds
1 minute
5 minutes
10 minutes
30 minutes
1 hour
3 hours
6 hours
Every 12 hours up to 24 hours
I don't see any issues with the smaller files that I upload to storage: my Azure Function executes and, presumably, Event Grid receives the response within 30 seconds, so my function is executed only once.
Issue:
For larger files, my Azure Function is triggered by Event Grid (as expected) and execution starts. However, due to the large file size, my function runs for well over 30 seconds. Since Event Grid does not receive a success response from the endpoint (the function is still executing), it delivers the event again and another instance of my function starts for the same file; in this way the function executes several times for the same file.
How can I handle this situation? Can I change the retry mechanism of Event Grid for just this function, or is there a better way to handle this problem?
Any help would be greatly appreciated.
Azure expects a timely response (under 30 seconds) from Azure Function or webhook event handlers, and there appears to be no setting to increase this time limit. On receiving an event, instead of doing the actual long-running work, push a message onto an Azure queue and let another function pick up messages from that queue. This lets you simply enqueue the work and quickly return a response to Azure Event Grid within 30 seconds, and it also scales your event handling: even if more blobs are uploaded in a burst, your application can handle it.
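As a rough illustration only, the pattern could look like the following pair of in-process C# functions; the queue name "pdf-work-items", the function names, and the bindings are my assumptions, not code from the original post.

using Microsoft.Azure.EventGrid.Models;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.EventGrid;
using Microsoft.Extensions.Logging;
using Newtonsoft.Json;

public static class PdfFunctions
{
    // Responds to Event Grid well within the 30-second window: it only
    // serializes the event data onto a storage queue and returns.
    [FunctionName("EnqueuePdfWork")]
    public static void EnqueuePdfWork(
        [EventGridTrigger] EventGridEvent blobCreatedEvent,
        [Queue("pdf-work-items")] out string queueMessage)
    {
        queueMessage = JsonConvert.SerializeObject(blobCreatedEvent.Data);
    }

    // The long-running PDF work happens here, driven by the queue, so Event
    // Grid never redelivers the event and the duplicate executions stop.
    [FunctionName("ProcessPdfWork")]
    public static void ProcessPdfWork(
        [QueueTrigger("pdf-work-items")] string queueMessage,
        ILogger log)
    {
        log.LogInformation("Processing PDF work item: {message}", queueMessage);
        // ... parse the blob URL from the message and process the PDF here ...
    }
}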
So suppose that you have an application that lets a user request a job. For example (hypothetical): a user uploads a video. An entry is made in an RDBMS with the URL of the video blob, and the status is set to "Pending".
There is a recurring time-triggered function app that executes every 10 seconds or so, fetches 10 pending jobs from the RDBMS, and performs some compression etc.
The problem here is that as long as the number of requests stays at 10-30 videos per 10 seconds, we should be fine. But if the number of requests suddenly increases, say to 200 requests per 10 seconds, there will be a lot of pending jobs and the user would have to wait 10 times longer than usual to see the status change. How do you scale out the function app automatically in such a scenario? Does it have to be manual?
There's an easier way to get fan out and parallel processing through multiple concurrently running Azure Functions.
Add an Azure Service Bus Queue to your solution.
For each video that needs to be processed, enqueue a service bus message with the appropriate data you'll need to retrieve and process the video (like the BlobId).
Have your Azure Function triggered by a ServiceBusTrigger.
Azure will spin up additional instances of your Azure Function as the queue depth increases. It'll also scale in idle instances after there's no more data to process.
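A rough sketch of such a function, not taken from the answer above; the queue name "video-jobs", the connection setting name, and the VideoWorkItem type are illustrative.

using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;
using Newtonsoft.Json;

public class VideoWorkItem
{
    public string BlobId { get; set; }
}

public static class VideoProcessor
{
    // One invocation per message; the runtime adds instances as the queue
    // depth grows and scales back in when the queue drains.
    [FunctionName("ProcessVideo")]
    public static void Run(
        [ServiceBusTrigger("video-jobs", Connection = "ServiceBusConnection")] string message,
        ILogger log)
    {
        var workItem = JsonConvert.DeserializeObject<VideoWorkItem>(message);
        log.LogInformation("Compressing video for blob {blobId}", workItem.BlobId);
        // ... download the blob by BlobId, compress it, update the job status ...
    }
}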
I've got a nice logging system I've set up that writes to Azure Table Storage, and it has worked well for a long time. However, there are certain places in my code where I now need to write a lot of messages to the log (50-60) instead of just a couple. It is also important enough that I can't start a new thread to finish writing to the log and return the MVC action before I know the write succeeded, because theoretically that thread could die. I have to write to the log before I return data to the web user.
According to the Azure dashboard, Table Storage transactions take ~37ms to commit, end to end (E2E), while queues only take ~6ms E2E to commit.
I'm now considering not logging directly to table storage, and instead logging to an Azure queue, then having a batch job read off the queue and put the entries in their proper place in table storage. That way I can still index them properly via their partition and row keys. I can also write just a single queue message containing all of the log entries, so it should take only ~6 ms instead of (37 * 50) ms, i.e. roughly 1,850 ms.
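A rough sketch of what I have in mind, using the classic WindowsAzure.Storage SDK; the LogEntry type and the queue/table names are placeholders.

using System.Collections.Generic;
using Microsoft.WindowsAzure.Storage.Queue;
using Microsoft.WindowsAzure.Storage.Table;
using Newtonsoft.Json;

public class LogEntry : TableEntity
{
    public string Message { get; set; }
}

public static class QueueLogger
{
    // In the MVC action: one queue message carries all 50-60 entries, so the
    // user waits for a single ~6 ms queue write instead of 50 table writes.
    public static void WriteBatch(CloudQueue logQueue, IEnumerable<LogEntry> entries)
    {
        logQueue.AddMessage(new CloudQueueMessage(JsonConvert.SerializeObject(entries)));
    }

    // In the background batch job: drain the queue and write each entry to its
    // proper partition/row in table storage.
    public static void Drain(CloudQueue logQueue, CloudTable logTable)
    {
        CloudQueueMessage message;
        while ((message = logQueue.GetMessage()) != null)
        {
            var entries = JsonConvert.DeserializeObject<List<LogEntry>>(message.AsString);
            foreach (var entry in entries)
            {
                logTable.Execute(TableOperation.InsertOrReplace(entry));
            }
            logQueue.DeleteMessage(message);
        }
    }
}

One thing I'm aware of: a single storage queue message maxes out at 64 KB, so a burst of large log entries may need to be split across messages.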
I know that there are Table Storage batch operations. However, each of the log entries typically goes to a different partition, and batch operations need to stay within a single partition.
I know that queue messages only live for 7 days, so I'll make sure I store queue messages in a new mechanism if they're older than a day (if it doesn't work the first 50 times, it just isn't going to work).
My question, then, is: what am I not thinking about? How could this completely kick me in the balls 4 months down the road?
I am using an Azure worker role to read and process queue messages.
It works fine, but sometimes the performance is very slow and it doesn't read the queue properly.
The queue message count then starts to increase, so all functionality gets delayed.
Web app details:
The main use of the app is vehicle tracking. Each vehicle contains a device that sends GPS data every 15 seconds. A web role receives this data and pushes it into the queue; the worker role then reads and processes each message.
Sometimes the worker role performance is very low: it takes 2 seconds to read a single message.
I can't say it is caused by the workload. There are morning and evening trips, when I have to process more details (like sending messages etc.), and at those times it works fine. In the afternoon there are no trips and the worker role is simply reading messages and pushing them into Azure Table Storage. Yet once or twice a day it stops reading the queue quickly, the queue message count rises above 5,000, and all data processing gets delayed.
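For reference, the read-and-store loop looks roughly like this; a simplified sketch using the classic WindowsAzure.Storage SDK, with GpsReading and all names as placeholders.

using System.Threading;
using Microsoft.WindowsAzure.Storage.Queue;
using Microsoft.WindowsAzure.Storage.Table;
using Newtonsoft.Json;

public class GpsReading : TableEntity
{
    public double Latitude { get; set; }
    public double Longitude { get; set; }
}

public static class GpsWorker
{
    public static void Run(CloudQueue gpsQueue, CloudTable trackingTable, CancellationToken token)
    {
        while (!token.IsCancellationRequested)
        {
            // Each GetMessage call is one REST round trip; this is where the
            // occasional ~2 second reads show up.
            var message = gpsQueue.GetMessage();
            if (message == null)
            {
                Thread.Sleep(1000); // back off briefly when the queue is empty
                continue;
            }

            var reading = JsonConvert.DeserializeObject<GpsReading>(message.AsString);
            trackingTable.Execute(TableOperation.InsertOrReplace(reading));
            gpsQueue.DeleteMessage(message);
        }
    }
}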
How can I avoid this?
I have a WebJob which gets triggered when a user uploads a file to the blob storage - it is triggered by a queue storage message which is created once the upload is complete.
Depending on the purpose of the file, it will post messages to other queues to trigger processing jobs.
Some of these jobs are time critical, and run relatively quickly. In one case the processing takes about three seconds, and the user is waiting for the result.
However, because the minimum queue polling interval is 2 seconds, the polling delay across the two WebJobs generally ends up doubling the user's wait time.
I tried combining the two WebJobs into one, hoping that when the first handler posts a queue message the corresponding processing handler would be immediately triggered, but in fact it consistently waits two seconds before picking up the message.
My question is: is there a way to tell my WebJob to check the queue triggers immediately from within the same WebJob when I know a message is waiting? Or, even better, can I configure it to check the queue triggers immediately when I post to a queue from inside the WebJob?
Or would switching to a service bus queue improve the responsiveness to new messages?
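For context, the chained setup looks roughly like this (WebJobs SDK style; the queue and function names are illustrative, not my real ones):

using Microsoft.Azure.WebJobs;

public static class Functions
{
    // First WebJob function: triggered by the upload-complete message, it
    // decides which processing queue to post to.
    public static void Orchestrate(
        [QueueTrigger("upload-complete")] string uploadMessage,
        [Queue("time-critical-processing")] out string processingMessage)
    {
        processingMessage = uploadMessage;
    }

    // Second function: the ~3 second job the user is actually waiting on; with
    // default polling it can sit for up to 2 seconds before this fires.
    public static void Process(
        [QueueTrigger("time-critical-processing")] string processingMessage)
    {
        // ... do the time-critical processing ...
    }
}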
Update
In the docs about using blob triggers, it says:
There is an exception for blobs that you create by using the Blob attribute. When the WebJobs SDK creates a new blob, it passes the new blob immediately to any matching BlobTrigger functions. Therefore if you have a chain of blob inputs and outputs, the SDK can process them efficiently. But if you want low latency running your blob processing functions for blobs that are created or updated by other means, we recommend using QueueTrigger rather than BlobTrigger.
http://azure.microsoft.com/en-gb/documentation/articles/websites-dotnet-webjobs-sdk-storage-blobs-how-to/
However, there is no mention of anything similar for queues, which implies that if you need really low latency in this scenario, blobs are better than queues. That seems wrong.
Update 2
I ended up working around this by pulling the orchestrating code out of the first WebJob, moving it into the application's service layer, and removing that WebJob. It was fast-running anyway, so perhaps separating it into its own WebJob was overkill. This means only the processing WebJob has to be triggered after the file upload.
Currently, 2 seconds is the minimum time it will take the SDK to poll for a new message. The SDK uses exponential back-off polling, so you can configure MaxPollingInterval to cap how far the back-off goes (setting it to 2 seconds keeps it polling at the minimum interval all the time). For example:
config.Queues.MaxPollingInterval = TimeSpan.FromSeconds(15);
For more details please see http://azure.microsoft.com/en-us/documentation/articles/websites-dotnet-webjobs-sdk-storage-queues-how-to/#config
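For context, here is roughly where that setting sits in a WebJobs SDK console host (set to 2 seconds here, per the point above; the host wiring is a sketch, not code from the question):

using System;
using Microsoft.Azure.WebJobs;

class Program
{
    static void Main()
    {
        var config = new JobHostConfiguration();

        // Cap the exponential back-off so an idle host never waits more than
        // 2 seconds between queue polls.
        config.Queues.MaxPollingInterval = TimeSpan.FromSeconds(2);

        var host = new JobHost(config);
        host.RunAndBlock();
    }
}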