I am working with the Cosmos DB change feed for a real-time project. We run our WebJobs in Azure App Service on the P3V2 tier, and multiple WebJobs consume the change feed. To monitor them, we use the change feed estimator to track record lag. The implementation follows this document:
https://learn.microsoft.com/en-us/azure/cosmos-db/nosql/how-to-use-change-feed-estimator
In the .NET Core code for one of the WebJobs, we added a 10-minute delay using await Task.Delay(). For that specific WebJob, the estimator reports lag in the millions, even though we are processing no more than 100 records.
This behavior is unexpected. Can anyone help identify the exact cause?
Is the estimator pointed at the same processor (the same processor name and lease container) as the one that is currently running and processing documents? Normally, what you describe matches a scenario where the processor is not running, never ran, or never completed a successful run on some of the leases.
You can use the detailed estimation to understand how the lag is distributed across leases: https://docs.microsoft.com/en-us/azure/cosmos-db/sql/how-to-use-change-feed-estimator#as-an-on-demand-detailed-estimation
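To illustrate why the per-lease view helps: a single lease on which the processor never completed a successful run can carry almost all of the total lag while the other leases look healthy. The following is a minimal Python model, not SDK code. LeaseEstimate and summarize are hypothetical names, and the fields are assumed to mirror the per-lease values that the .NET SDK's detailed estimator reports (LeaseToken, EstimatedLag and InstanceName on ChangeFeedProcessorState), with an empty owner assumed when no instance holds the lease.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class LeaseEstimate:
    # Hypothetical record: one row of a detailed (per-lease) estimation.
    lease_token: str
    estimated_lag: int
    instance_name: Optional[str]  # assumed None when no processor instance owns the lease


def summarize(estimates: list[LeaseEstimate]) -> dict:
    """Aggregate per-lease lag and flag the leases most likely to explain it."""
    total = sum(e.estimated_lag for e in estimates)
    worst = max(estimates, key=lambda e: e.estimated_lag)
    unowned = [e.lease_token for e in estimates if e.instance_name is None]
    return {
        "total_lag": total,
        "worst_lease": worst.lease_token,
        "worst_share": worst.estimated_lag / total if total else 0.0,
        "unowned_leases": unowned,
    }


if __name__ == "__main__":
    # Illustrative values only: one unowned lease dominates the total.
    sample = [
        LeaseEstimate("0", 12, "host-a"),
        LeaseEstimate("1", 3_000_000, None),
        LeaseEstimate("2", 40, "host-a"),
    ]
    print(summarize(sample))
```

If the dominant lease has no owner, that points to the "processor not running on that lease" case rather than to slow processing.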
Related
We have a long-running ASP.NET Web App in Azure that exposes no real endpoints. It serves a single purpose, primarily reading and manipulating database data: effectively a batched, scheduled task triggered by a timer every 30 seconds.
The app runs fine most of the time, but occasionally its CPU load jumps close to the maximum for the App Service plan, instantaneously rather than gradually, and the app stops executing any further timer triggers. We cannot find anything in the executing code to account for this: there are no signs of deadlocks, and all code paths have try/catch, so there should be no unhandled exceptions. More often than not, we also see errors obtaining a database connection, but it's not clear whether those are a cause or a symptom.
Note that this is the only resource in the App Service plan. The Azure SQL database is in the same region and, although other apps use it, they use it very lightly and exhibit none of the issues seen in the problem app.
It feels infrastructure-related, but we have been unable to find anything that explains what is happening, so any suggestions on where to look would be gratefully received. We have enabled basic Application Insights (not the SDK), but apart from the CPU spike before the app stops responding, it shows little of interest, given our limited knowledge of how best to use Application Insights.
Based on your description, there are two things I would check. First, track the running status of your program in code: write a log entry at the beginning and end of each batch run to record its status. If possible, also record request and response information along with start and end times. This gives you a complete record of when each task ran and how it finished.
Second, write a log entry before the program starts any database operation, and record whether the database connection succeeded. Ideally, also record which operations are running when the CPU load rises, and under what conditions, so you can analyze specifically what causes the database connection failures.
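A minimal Python sketch of both suggestions, assuming one run is a single call that opens a connection and then does its work. run_batch, connect and work are hypothetical names, and sqlite3 stands in for the real Azure SQL connection. Each run gets a short ID so that its start, connection attempt and end can be correlated in the logs.

```python
import logging
import sqlite3
import time
import uuid
from typing import Any, Callable

logger = logging.getLogger("batch")


def run_batch(connect: Callable[[], Any], work: Callable[[Any], None]) -> str:
    """Run one scheduled batch, logging start, connection outcome, end and duration."""
    run_id = uuid.uuid4().hex[:8]
    started = time.perf_counter()
    status = "failed"
    logger.info("run %s started", run_id)
    try:
        logger.info("run %s opening database connection", run_id)
        try:
            conn = connect()
        except Exception:
            # Record the connection failure with its stack trace, then let it propagate.
            logger.exception("run %s database connection failed", run_id)
            raise
        logger.info("run %s database connection succeeded", run_id)
        try:
            work(conn)
        finally:
            conn.close()
        status = "succeeded"
        return status
    finally:
        # Runs on success and failure alike, so every run leaves an end record.
        elapsed = time.perf_counter() - started
        logger.info("run %s %s in %.3f s", run_id, status, elapsed)


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
    run_batch(lambda: sqlite3.connect(":memory:"),
              lambda conn: conn.execute("SELECT 1").fetchone())
```

A run that logs "started" but never logs its end line is the one to investigate when the app stops responding.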
Because you cannot reproduce the problem, you can only guess at its cause. If the two steps above still don't reveal it, adjust your timer so the program triggers once every 5 minutes instead of every 30 seconds.
I have tried Azure Functions for the first time. Apart from a couple of problems for which I found workarounds, it was quite easy to develop and publish my function to Azure. I even tried preview features such as Durable Entities, and they work great; I am enthusiastic.
However, I have some concerns about timing. My function is HTTP-triggered and is called by another application. Most of the time, execution takes about 1 second, which is great. Sometimes, for reasons I don't understand, the same function takes up to 30 seconds to execute. Is this normal? Maybe a cold start? Or am I doing something wrong? I am a newbie, so I'd like the experts' opinion. I am using the Consumption plan in West Europe.
Unfortunately, for this application anything over 4 seconds is unacceptable, because it causes an error in the caller that is in turn surfaced to the end user.
Here is a screen capture of the logs with timings; look at the bottom for the extremely slow times.
Is there any way to ensure execution always completes within 4 seconds?
This much variation would not be expected from a cold start. A cold start is generally about 2-5 seconds and should only happen after a long period with no invocations. Also, the measurement here is execution time only and doesn't include startup time. I'd recommend looking into the logs and adding traces to see whether there's a line of code it's hanging on.
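One way to add such traces is to time each step of an invocation and log a line before and after it. This is a minimal Python sketch, not the Functions runtime's own logging API: StepTimer and the step names are hypothetical, and the standard logging module stands in for whatever logger the function uses. Because each step logs when it starts, a step that hangs appears as a "started" line with no matching "finished" line.

```python
import logging
import time
from contextlib import contextmanager
from typing import Dict, Iterator, Tuple

logger = logging.getLogger("function")


class StepTimer:
    """Times the named steps of a single invocation."""

    def __init__(self) -> None:
        self.durations: Dict[str, float] = {}

    @contextmanager
    def step(self, name: str) -> Iterator[None]:
        # Logged before the step runs, so a hang leaves a start with no finish.
        logger.info("step %s started", name)
        start = time.perf_counter()
        try:
            yield
        finally:
            # Recorded even if the step raises.
            elapsed = time.perf_counter() - start
            self.durations[name] = elapsed
            logger.info("step %s finished in %.3f s", name, elapsed)

    def slowest(self) -> Tuple[str, float]:
        return max(self.durations.items(), key=lambda kv: kv[1])

    def total(self) -> float:
        return sum(self.durations.values())


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
    timer = StepTimer()
    with timer.step("parse request"):  # hypothetical step
        time.sleep(0.01)
    with timer.step("call downstream service"):  # hypothetical step
        time.sleep(0.1)
    name, secs = timer.slowest()
    budget_s = 4.0  # the caller's limit from the question
    logger.info("slowest step: %s (%.3f s); total %.3f s; within budget: %s",
                name, secs, timer.total(), timer.total() <= budget_s)
```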
The first step is to understand what happens, step by step, when you hit an Azure Functions endpoint:
Azure must allocate your application to a server with capacity,
The Functions runtime must then start up on that server,
Your code then needs to execute.
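The three steps above can be summarized as a rough latency budget for the caller. The 2-5 second cold-start range is taken from the earlier answer and the roughly 1 second execution time from the question; both are approximations, not measurements.

```latex
\begin{aligned}
T_{\text{caller}} &= \underbrace{T_{\text{allocate}} + T_{\text{runtime}}}_{\text{cold start}} + T_{\text{execute}} \\
&\approx (2 \text{ to } 5\,\text{s}) + 1\,\text{s} = 3 \text{ to } 6\,\text{s}
\end{aligned}
```

So a cold start on its own can push a request past the 4-second limit. It cannot, however, explain a logged execution time of 30 seconds, because the logged figure covers only the execution step.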
I don't know why it takes up to 30 secs to execute the same function. Is this normal? Maybe some cold start?
I think the answer is related to cold start. The following image shows what happens when you trigger a function app's endpoint (source: Understanding serverless cold start):
I had similar issues once on the Consumption plan. A dedicated plan might be a solution in your case; half a minute to warm up an endpoint is pretty bad. To keep the function warm, you can use the Premium plan, which promises the following:
When you're using the Premium plan, instances of the Azure Functions host are added and removed based on the number of incoming events just like the Consumption plan. Premium plan supports the following features: Perpetually warm instances to avoid any cold start
You can read more about this here: Premium plan (preview)
Additional information:
Be careful with this option, because the pricing model is different:
Instead of billing per execution and memory consumed, billing for the Premium plan is based on the number of core seconds, execution time, and memory used across needed and reserved instances. At least one instance must be warm at all times. This means that there is a fixed monthly cost per active plan, regardless of the number of executions.
I would consider this option, at least for testing purposes. I hope this helps and gives you an idea of why you are seeing slow startup.
I have a C# console application that extracts a 15 GB Firebird database file at a server location into multiple files and loads the data from those files into a SQL Server database. The console application uses the System.Threading.Tasks.Parallel class to run the data load from the files into SQL Server in parallel.
It is a weekly process and it takes 6 hours to complete.
What is the best option for moving this (console application) process to the Azure cloud: a WebJob, a Worker Role, or some other cloud service?
How can I reduce the execution time (6 hours) after moving to the cloud?
How should I implement the suggested option? Please provide pointers, code samples, etc.
Detailed comments would be very much appreciated.
Thanks
Bhanu.
Let me share some thoughts on your question:
"What is best option to move this (console application) process to azure cloud - WebJob or WorkerRole or Any other cloud service ?"
First, you can achieve this with either a WebJob or a Worker Role, but I would suggest going with a WebJob.
The advantages of a WebJob are:
Quicker deployment: you can turn your console app, without any changes, into a continuously running WebJob within minutes (https://azure.microsoft.com/en-us/documentation/articles/web-sites-create-web-jobs/)
Built-in timer support, whereas with a Worker Role you need to handle scheduling on your own
Fault tolerance: when your WebJob fails, there is built-in resume logic
You might want to check out Azure Functions. On the Consumption plan you pay only for the processing time you use, but a single execution there is capped (10 minutes at most, 5 by default); the Premium and Dedicated plans allow longer run times.
They can be set up on a schedule or kicked off from other events.
If you are already doing work in parallel, you could break out some of the parallel tasks into separate Azure Functions. Beyond that, speeding things up would require specific knowledge of what you are trying to accomplish.
In the past, when I've tried to speed up work like this, I started by emitting log messages during processing that record the current time or the elapsed duration (using the Stopwatch class). Then I looked for the areas that could be improved. The slowness may also be due to a bottleneck on the SQL Server side, so more investigation would be needed on your part. But the first step is always capturing metrics.
Since Azure Functions can scale out horizontally, you might first break the data from the files into smaller chunks and let the functions handle each chunk, processing multiple chunks in parallel. Be sure not to run more in parallel than your SQL Server can handle.
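The chunk-and-cap idea can be sketched in-process in Python as a stand-in for fanning chunks out to separate functions or workers. chunked, load_in_parallel and load_chunk are hypothetical names; load_chunk would be whatever writes one chunk to SQL Server. The max_parallel cap plays the role of "don't run more than SQL Server can handle" (in the original C# app, ParallelOptions.MaxDegreeOfParallelism serves the same purpose for Parallel.ForEach), and the semaphore also keeps the reader from loading more chunks into memory than are being processed.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def chunked(rows: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` rows, reading the input lazily."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def load_in_parallel(rows: Iterable[T], chunk_size: int, max_parallel: int,
                     load_chunk: Callable[[Sequence[T]], int]) -> int:
    """Load rows chunk by chunk with at most `max_parallel` chunks in flight.

    `load_chunk` is a hypothetical callback that writes one chunk to the
    database and returns the number of rows it loaded.
    """
    slots = threading.BoundedSemaphore(max_parallel)
    futures: List[Future] = []
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        for chunk in chunked(rows, chunk_size):
            slots.acquire()  # wait for a free slot before reading the next chunk
            future = pool.submit(load_chunk, chunk)
            future.add_done_callback(lambda _f: slots.release())
            futures.append(future)
    # Leaving the `with` block waits for every chunk; result() re-raises any failure.
    return sum(f.result() for f in futures)
```

Tuning chunk_size and max_parallel against SQL Server's observed load is where the timing metrics mentioned above become useful.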
I have an executable that performs long calculations, and I want to run those calculations on Azure. What would be the optimal service: Azure Batch, or perhaps a VM?
Azure Batch or VM scale sets. Azure Batch is built on top of scale sets and is designed specifically for tasks and jobs, while VM scale sets help with scaling generic VMs.
Use cases for Batch:
Batch is a managed Azure service that is used for batch processing or batch computing: running a large volume of similar tasks to get some desired result. Batch computing is most commonly used by organizations that regularly process, transform, and analyze large volumes of data.
Batch works well with intrinsically parallel (also known as "embarrassingly parallel") applications and workloads. Intrinsically parallel workloads are easily split into multiple tasks that perform work simultaneously on many computers.
More info here for batch: https://azure.microsoft.com/en-us/documentation/articles/batch-technical-overview/
If you can change the doctype to multipart, and you're able to pause your long job every minute or so to update progress, that will make it more interactive for the user and stop the HTTP connection from timing out. You could also add a cancel-job button. Or is the question about something else?
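The job side of this suggestion (check for cancellation, report progress about once a minute) can be sketched in Python as follows. long_job, work_unit and report are hypothetical names, and threading.Event stands in for whatever flag a cancel button would set. How the progress actually reaches the browser (a multipart response, polling, or similar) is not shown and depends on the web stack.

```python
import threading
import time
from typing import Callable


def long_job(total_steps: int,
             work_unit: Callable[[int], None],
             report: Callable[[int, int], None],
             cancel: threading.Event,
             report_every_s: float = 60.0) -> int:
    """Run `total_steps` units of work, reporting progress periodically.

    Checks `cancel` before each unit so a cancel button can stop the job early,
    calls report(done, total) at most every `report_every_s` seconds plus once
    at the end, and returns the number of units completed.
    """
    done = 0
    last_report = time.monotonic()
    for i in range(total_steps):
        if cancel.is_set():
            break
        work_unit(i)
        done += 1
        now = time.monotonic()
        if now - last_report >= report_every_s:
            report(done, total_steps)
            last_report = now
    report(done, total_steps)  # final progress, whether finished or cancelled
    return done
```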
I have a continuous WebJob running on my Azure website. It does some work after retrieving items via a QueueTrigger. I am trying to increase the rate at which items are processed off the queue. As I scale out my App Service plan, the processing rate increases as expected.
My concern is that it seems wasteful to pay for additional VMs just to run additional instances of my WebJob. I am looking for options or best practices for running multiple instances of the same WebJob on a single server.
I've tried starting multiple JobHosts in individual threads within Main(), but either that doesn't work or I was doing something wrong: the WebJob would fail to run due to what looks like each thread trying to access 'WebJobSdk.marker'. My current solution is to publish my WebJob multiple times, each time slightly modifying 'webJobName' in 'webjob-publish-settings.json' so that the same project is treated as a different WebJob at publish time. This works well so far, except that it creates a lot of additional work each time I need to make an update.
Ultimately, I'm looking for some advice on what the recommended way of accomplishing this would be. Ideally, I would like to get the multiple instances running via code, and only have to publish once when I need to update the code.
Any thoughts out there?
You can use the JobHostConfiguration.Queues.BatchSize and NewBatchThreshold settings to control the concurrency level of your queue processing. NewBatchThreshold is new in the current in-progress beta1 release; if you enable "prerelease" packages in the NuGet package manager, you'll see the new release if you'd like to try it. Raising NewBatchThreshold increases the concurrency level: for example, setting it to 100 means that once the number of currently running queue functions drops below 100, a new batch of messages is fetched for concurrent processing.
The marker file bug was fixed in this commit a while back, and the fix is also part of the current in-progress v1.1.0 release.
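To make the effect of these two settings concrete, here is a simplified Python model of the fetch rule as described above: whenever fewer than NewBatchThreshold messages are in flight, fetch up to BatchSize more. It is not the SDK's implementation; simulate, its tick-based timing and the fixed per-message duration are assumptions for illustration only.

```python
from typing import List, Tuple


def simulate(total_messages: int, batch_size: int, new_batch_threshold: int,
             work_ticks: int = 5) -> Tuple[int, int]:
    """Return (peak concurrency, messages completed) for a simple tick-based model.

    Each message takes `work_ticks` ticks. At the start of every tick, if fewer
    than `new_batch_threshold` messages are running, up to `batch_size` more
    are fetched from the queue.
    """
    if min(batch_size, new_batch_threshold, work_ticks) < 1:
        raise ValueError("batch_size, new_batch_threshold and work_ticks must be >= 1")
    remaining = total_messages
    in_flight: List[int] = []  # ticks left for each running message
    peak = 0
    completed = 0
    while remaining > 0 or in_flight:
        if len(in_flight) < new_batch_threshold and remaining > 0:
            fetched = min(batch_size, remaining)
            remaining -= fetched
            in_flight.extend([work_ticks] * fetched)
        peak = max(peak, len(in_flight))
        # Advance one tick: messages reaching zero complete, the rest keep running.
        still_running = [t - 1 for t in in_flight if t > 1]
        completed += len(in_flight) - len(still_running)
        in_flight = still_running
    return peak, completed


if __name__ == "__main__":
    for threshold in (1, 100):
        peak, done = simulate(1000, batch_size=16, new_batch_threshold=threshold)
        print(f"NewBatchThreshold={threshold}: peak concurrency {peak}, completed {done}")
```

In this model, raising the threshold lets a single host keep several batches in flight at once, which is the kind of extra concurrency the question was trying to get from duplicate WebJob deployments.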