How can I account for each resource's schedule (including break times) across all the resources? The table of resource break times is shown below.
(image: table of resource break times)
I want to determine the schedule (break times) for four resources.
I created a pool in Azure Batch (from the Azure portal) with Auto scale activated.
I also defined a formula where the initial number of nodes is set to 0. This number ramps up according to the number of active tasks and goes back to 0 when no tasks remain.
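For reference, a formula of roughly this shape (a sketch only; the node cap and the exact sampling call are placeholders, not the asker's actual formula) scales with active tasks and drops back to zero when idle:

```
// Most recent sample of the active task count (assumes a sample exists)
$tasks = max($ActiveTasks.GetSample(1));
// One node per task, capped at 10 nodes (placeholder cap); 0 when idle
$TargetDedicatedNodes = min($tasks, 10);
// Let running tasks finish before a node is removed
$NodeDeallocationOption = taskcompletion;
```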
My problem is that the minimum evaluation interval for the formula is 5 minutes, which means that in the worst case I have to wait at least 5 minutes (plus the time it takes for the nodes to boot and execute the start task) before a task can be assigned to a node.
I would like to apply the formula on the pool on demand by using the REST API (for instance right after adding a job).
According to the API documentation:
https://learn.microsoft.com/en-us/rest/api/batchservice/pool/evaluate-auto-scale
You can evaluate a formula, but the result will not be applied to the pool.
https://learn.microsoft.com/en-us/rest/api/batchservice/pool/enable-auto-scale
You can enable automatic scaling for a pool but if it is already enabled you have to specify a new auto scale formula and/or a new evaluation interval.
If you specify a new interval, then the existing auto scale evaluation schedule will be stopped and a new auto scale evaluation schedule will be started, with its starting time being the time when this request was issued.
You can disable and then re-enable the autoscale formula, but note the call limits on the enable API. Also note that if you try to do this more frequently than the minimum evaluation period, you are thrashing the pool faster than the underlying infrastructure can allocate resources, which provides no benefit.
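To make the shape of the enable-auto-scale call concrete, here is a minimal sketch that only builds the request described in the linked docs (no auth, and it doesn't send anything; the account URL, pool ID, and api-version are placeholders you would replace with real values):

```python
import json

def build_enable_autoscale_request(batch_url, pool_id, formula,
                                   interval=None,
                                   api_version="2023-11-01.18.0"):
    # POST {batchUrl}/pools/{poolId}/enableautoscale?api-version=...
    # api_version above is a placeholder; check the docs for the current one.
    url = f"{batch_url}/pools/{pool_id}/enableautoscale?api-version={api_version}"
    body = {"autoScaleFormula": formula}
    if interval is not None:
        # ISO-8601 duration, e.g. "PT5M"; omit it to keep the existing schedule
        body["autoScaleEvaluationInterval"] = interval
    return "POST", url, json.dumps(body)

# Hypothetical account and pool for illustration:
method, url, payload = build_enable_autoscale_request(
    "https://myaccount.westeurope.batch.azure.com",
    "mypool", "$TargetDedicatedNodes = 0;")
```

Note that omitting autoScaleEvaluationInterval avoids resetting the evaluation schedule, which matters given the restart behavior described above.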
With an Azure Data Factory "Tumbling Window" trigger, is it possible to limit the hours of each day during which it fires (adding a window, you might say)?
For example I have a Tumbling Window trigger that runs a pipeline every 15 minutes. This is currently running 24/7 but I'd like it to only run during business hours (0700-1900) to reduce costs.
Edit:
I played around with this, and found another option which isn't ideal from a monitoring perspective, but it appears to work:
Create a new pipeline with a single "If Condition" step with a dynamic Expression like this:
@and(greater(int(formatDateTime(utcnow(),'HH')),6),less(int(formatDateTime(utcnow(),'HH')),20))
In the true case activity, add an Execute Pipeline step executing your original pipeline (with "Wait on completion" ticked)
In the false case activity, add a wait step which sleeps for X minutes
The longer the sleep, the further a run can encroach past the end of your window, so adjust the sleep duration to match.
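The hour window in that expression can be sanity-checked offline; this sketch just mirrors the and(greater(...),less(...)) logic, i.e. the pipeline body runs from 07:00 to 19:59 UTC:

```python
from datetime import datetime, timezone

def in_business_window(now=None):
    # Mirrors the If Condition expression: hour > 6 and hour < 20 (UTC)
    hour = (now or datetime.now(timezone.utc)).hour
    return hour > 6 and hour < 20
```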
I need to give it a couple of days before I check the billing on the portal to see if it has reduced costs. At the moment I'm assuming a job which just sleeps for 15 minutes won't incur the costs that one running and processing data would.
There is no easy way, but you can create two deployment pipelines for the same job in Azure DevOps: as soon as your 0700-1900 window expires, replace that job with a dummy job using an Azure DevOps pipeline.
All the functions in our Function Apps seem to be reporting weirdly high resource consumption. From the docs:
Functions are billed based on observed resource consumption measured in gigabyte seconds (GB-s). Observed resource consumption is calculated by multiplying average memory size in gigabytes by the time in milliseconds it takes to execute the function. Memory used by a function is measured by rounding up to the nearest 128 MB
(I know it's also priced per million executions, but I'm investigating a single execution here).
The "monitor" blade for my function shows that the last execution had a duration of 47659ms, or 47.659 seconds. The "memory working set" metric says that my function was using 117MB at its peak, so round that up to 128MB, and my function's resource consumption should be:
0.128GB * 47.659s = 6.1 GB-s.
Problem is, for that execution the "Function Execution Units" metric shows that it actually used 5.94M (million) GB-s. I'm pretty sure M = Million because some tiny ones earlier on had a "k" suffix instead of an M (the function was failing to start back then and only ran for ~100ms).
I also have the daily execution quota set to 13,333 so the whole Function App should have immediately stopped for the day, but it hasn't, and one of the other functions executed a few minutes after. Attached is a screenshot with the execution in question indicated along with the other one. I've asked other people here to log in and check it out, and everyone sees the same kind of thing, on all function apps.
What am I missing here?
Edit: the metrics API shows the same weirdness - a function execution that runs for 591ms returns this:
az monitor metrics list --resource 'func-name' --resource-group 'rg-name' --resource-type 'Microsoft.Web/sites' --metrics 'FunctionExecutionUnits'
{
  "average": null,
  "count": null,
  "maximum": null,
  "minimum": null,
  "timeStamp": "2020-07-06T06:55:00+00:00",
  "total": 120064.0
}
I finally found some clarity on a different documentation page, Estimating Consumption plan costs. At the bottom of the "View Execution Data" section there's an example of how to generate a chart of Function Execution Units, and underneath the example chart it briefly mentions the following:
This chart shows a total of 1.11 billion Function Execution Units consumed in a two-hour period, measured in MB-milliseconds. To convert to GB-seconds, divide by 1024000.
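So the conversion is just a division, which reconciles the numbers above:

```python
def units_to_gb_seconds(execution_units_mb_ms):
    # Function Execution Units are reported in MB-milliseconds; divide by
    # 1,024,000 (1024 MB per GB * 1000 ms per s) to get GB-seconds.
    return execution_units_mb_ms / 1_024_000

# The "5.94M" from the question lands near the hand-computed 6.1 GB-s,
# and the 591 ms run's 120064 units is a small fraction of a GB-s:
big = units_to_gb_seconds(5_940_000)    # ~5.80 GB-s
small = units_to_gb_seconds(120_064.0)  # ~0.117 GB-s
```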
Would have been a lot clearer if the units were displayed in Azure Monitor and/or mentioned in the Functions Pricing docs. But my fault for not RTFM...
We would like to set up Azure auto-scaling based on specific times of day. E.g. at 7:00 we would like to increase the number of instances, and at 17:00 we would like to decrease them.
We are aware that we can scale on other metrics (CPU, number of messages in a queue, etc.), but this has some negative impacts for us: starting a new instance takes time, and the w3wp warm-up takes time too. We need instances to be ready the moment high load arrives.
Is there any way to set auto-scaling for a specific time of day (from 7:00 to 17:00) and specific days of the week (working days)?
You could follow the general guidelines below based on your requirement:
Scale based on a schedule
In addition to scale based on CPU, you can set your scale differently for specific days of the week.
Click Add a scale condition.
Setting the scale mode and the rules is the same as the default condition.
Select Repeat specific days for the schedule.
Select the days and the start/end time for when the scale condition should be applied.
Scale differently on specific dates
In addition to scale based on CPU, you can set your scale differently for specific dates.
Click Add a scale condition.
Setting the scale mode and the rules is the same as the default condition.
Select Specify start/end dates for the schedule.
Select the start/end dates and the start/end time for when the scale condition should be applied.
Refer to Get started with Autoscale in Azure for more details.
As general Autoscaling guidelines:
When you can predict the load on the application well enough, use scheduled autoscaling, adding and removing instances to meet anticipated peaks in demand. If this isn't possible, use reactive autoscaling based on runtime metrics to handle unpredictable changes in demand. Typically, you can combine the two approaches: for example, create a strategy that adds resources on a schedule for the times when you know the application is busiest. This helps ensure that capacity is available when required, without any delay from starting new instances. For each scheduled rule, also define metric-based rules so that reactive autoscaling during that period can handle sustained but unpredictable peaks in demand.
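The combined strategy boils down to logic like the following toy sketch (the instance counts, the 07:00-17:00 weekday schedule, and the CPU threshold are all hypothetical values, not Azure defaults):

```python
from datetime import datetime

def desired_instances(now: datetime, cpu_percent: float) -> int:
    # Scheduled baseline: 4 instances on weekdays between 07:00 and 17:00,
    # otherwise 1 (placeholder counts).
    in_schedule = now.weekday() < 5 and 7 <= now.hour < 17
    baseline = 4 if in_schedule else 1
    # Reactive rule layered on top: bump capacity when CPU is high
    # (placeholder threshold and increment).
    if cpu_percent > 75:
        baseline += 2
    return baseline
```

In Azure this logic is expressed declaratively as autoscale profiles and rules rather than code, but the effect is the same: the schedule guarantees warm capacity, and the metric rule absorbs unexpected spikes.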
The Background
Our clients use a service for which they set a daily budget. This is a prepaid service, and we allocate a particular amount from the user's budget every day.
Tables:
budgets - how much we are allowed to spend per day
money - clients real balance
money_allocated - amount deducted from money that can be spent today (based on budgets)
There is a cron job that runs every few minutes and checks:
whether the user already has money_allocated for the given day
whether money_allocated covers budgets (the user may increase the budget during the day)
If there is no allocation yet, we allocate the full daily budget; if the budget has been increased, we allocate the difference between the budget and the amount already allocated for that day (in this case we create an additional record in money_allocated for the same day).
Allocation has two stages: in the first, we add a row with status "pending" (allocation requested); another cron then checks all "pending" allocations and moves money from money to money_allocated if the user has enough money. This changes the status to "completed".
The Problem
We have a cluster of application servers (behind an NLB), and the cron job above runs on each of them, which means that money can accidentally be allocated multiple times (or not allocated at all if we implement the wrong "already allocated" triggers).
Our options include:
Run cron job on one server only - no redundancy, client complaints and money lost on failure
Add a unique index on money_allocated over (client_id, date, amount) - but this won't allocate more money for a given day if the client doubles the budget or increases it multiple times by the same amount during the day
There is an option to record each movement in budgets and link all allocations to either "first allocation of the day" or "change of budget (id xxx)" (add this to the unique index as well). This does not look sexy enough, however.
Any other options? Any advice would be highly appreciated!
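To make option 2 concrete, here is a minimal SQLite sketch (table and column names follow the question; the schema details are otherwise invented for illustration). The unique index turns a concurrent double allocation into a constraint violation one of the jobs can simply swallow:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE money_allocated (
        client_id INTEGER NOT NULL,
        date      TEXT    NOT NULL,
        amount    REAL    NOT NULL,
        status    TEXT    NOT NULL DEFAULT 'pending',
        UNIQUE (client_id, date, amount)
    )""")

def allocate(client_id, date, amount):
    """True if this job inserted the allocation; False if another job
    already inserted the same (client_id, date, amount) row."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO money_allocated (client_id, date, amount) "
                "VALUES (?, ?, ?)", (client_id, date, amount))
        return True
    except sqlite3.IntegrityError:
        return False

first = allocate(1, "2024-01-01", 100.0)      # True: row inserted
duplicate = allocate(1, "2024-01-01", 100.0)  # False: unique index blocks it
```

This also demonstrates the limitation noted in option 2: a legitimate second allocation of the same amount on the same day would collide with the index just like a duplicate would.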
Ok, so I ended up running this on one of the cluster's instances. If you use Amazon AWS and are in a similar situation, below is one of the options..
On each machine, at the beginning of your cron job's code, do the following:
Call describe_load_balancers (AWS API), parse the response to get a list/array of all instances
Get http://169.254.169.254/latest/meta-data/instance-id - this returns instance ID of the machine that is sending request
If the received instance ID is first in the list/array of all instances, proceed; if not, exit.
Also, be sure to automatically replace unhealthy instances behind this load balancer promptly, as describe_load_balancers returns a list of both healthy and unhealthy instances. You may end up with the job not being done for a while if instance #1 goes down.
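The election itself reduces to a small pure function; in this sketch the AWS calls are stubbed out (in production, instances would come from describe_load_balancers via boto3, and my_id from http://169.254.169.254/latest/meta-data/instance-id):

```python
def is_leader(instances, my_id):
    # Proceed only on the instance listed first behind the load balancer;
    # every other instance exits without doing work.
    return bool(instances) and instances[0] == my_id

# Hypothetical instance IDs for illustration:
instances = ["i-0abc123", "i-0def456"]
print(is_leader(instances, "i-0abc123"))  # True: this instance runs the cron body
print(is_leader(instances, "i-0def456"))  # False: exit immediately
```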