While testing, I found that using an Azure Integration Runtime (AIR) considerably reduced the time required to finish a pipeline.
To fully understand how this configuration works, and how it is billed, I have a few questions. Let's assume I have two independent pipelines whose Data Flow activities all use the same AIR with TTL = 10 minutes.
The first pipeline takes 7 minutes to finish. If I understand correctly, the billing will be:
billing: time to acquire cluster + job execution time + TTL (7 + 10)
Five minutes later, I trigger the second pipeline. It takes only 3 minutes to finish (I understand it will also use the same pool as the first one). After it finishes, is the TTL reset to 10 minutes, or is it equal to 2 minutes (10 - 5 - 3: original TTL - start time of the second pipeline - runtime of the second pipeline)? In the latter case, what will happen if I trigger a third pipeline that could take more than 2 minutes?
And how will the billing be calculated?

Look at the ADF pipeline monitoring view and find all of your data flow activity executions.
Add up that total data flow activity execution time.
Now add the TTL value for that Azure IR you were using to that total.
That is the total time you will be billed.
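The rule in this answer reduces to a simple sum. The sketch below applies it to the two runs from the question (7 and 3 minutes on an IR with TTL = 10). The function name is hypothetical, and it encodes only this answer's rule, not an official pricing formula: it ignores, for example, compute size and per-vCore rates.

```python
from typing import Iterable


def billed_minutes(activity_minutes: Iterable[float], ttl_minutes: float) -> float:
    """Billed data flow time per the answer above: the sum of all data flow
    activity execution times on the Azure IR, plus the IR's TTL once."""
    return sum(activity_minutes) + ttl_minutes


# The two pipelines from the question: 7 min and 3 min, sharing an IR with TTL = 10 min.
print(billed_minutes([7, 3], 10))  # 20
```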


How to trigger/force Azure Batch pool autoscale formula

I created a pool in Azure Batch (from the Azure portal) with Auto scale activated.
I also defined a formula where the initial number of nodes is set to 0. This number ramps up according to the number of active tasks and goes back to 0 when there are no tasks remaining.
My problem is that the minimum evaluation interval for the formula is 5 minutes, which means that in the worst case I have to wait at least 5 minutes (plus the time it takes for the nodes to boot and execute the start task) before a task can be assigned to a node.
I would like to apply the formula to the pool on demand by using the REST API (for instance, right after adding a job).
According to the API documentation:
You can evaluate a formula, but it will not be applied to the pool.
You can enable automatic scaling for a pool, but if it is already enabled you have to specify a new autoscale formula and/or a new evaluation interval.
If you specify a new interval, the existing autoscale evaluation schedule is stopped and a new one is started, with its start time being the time the request was issued.
You can disable and then re-enable autoscaling, but note the call limits on the enable API. Also note that doing this more often than the minimum evaluation interval provides no benefit: you would be thrashing the pool faster than the underlying infrastructure can allocate resources.
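The enable-autoscale route from the list above can be sketched as follows. This is a minimal sketch, assuming the Batch REST operation `POST {batchUrl}/pools/{poolId}/enableautoscale` with `autoScaleFormula` and `autoScaleEvaluationInterval` in the request body. The account URL, pool ID, api-version string and the formula are placeholders or hypothetical, and authentication and sending the request are left out. The hypothetical `ReapplyThrottle` helper encodes the advice not to re-apply more often than the minimum evaluation interval. Since evaluating a formula does not apply it, the evaluate API is a reasonable way to check a real formula first.

```python
import json
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional, Tuple

# Minimum autoscale evaluation interval mentioned in the question.
MIN_EVAL_INTERVAL = timedelta(minutes=5)

# Hypothetical formula: target as many nodes as active tasks (capped at 10),
# falling back to 0 when nothing is queued. Not validated against the service.
FORMULA = (
    "$tasks = max(0, $ActiveTasks.GetSample(1));\n"
    "$TargetDedicatedNodes = min(10, $tasks);\n"
    "$NodeDeallocationOption = taskcompletion;"
)


def enable_autoscale_request(batch_url: str, pool_id: str, formula: str,
                             interval: timedelta, api_version: str) -> Tuple[str, str]:
    """Build the URL and JSON body for an enable-autoscale call (not sent here)."""
    minutes = int(interval.total_seconds() // 60)
    url = f"{batch_url}/pools/{pool_id}/enableautoscale?api-version={api_version}"
    body = {
        "autoScaleFormula": formula,
        # ISO 8601 duration; per the answer, a new interval restarts the schedule.
        "autoScaleEvaluationInterval": f"PT{minutes}M",
    }
    return url, json.dumps(body)


@dataclass
class ReapplyThrottle:
    """Refuse to re-apply the formula more often than the minimum interval."""
    min_interval: timedelta = MIN_EVAL_INTERVAL
    last_applied: Optional[float] = None  # seconds on a caller-supplied clock

    def should_reapply(self, now: float) -> bool:
        if (self.last_applied is None
                or now - self.last_applied >= self.min_interval.total_seconds()):
            self.last_applied = now
            return True
        return False
```

In use, a job submitter would call `should_reapply` with a monotonic timestamp after adding a job and only build (and send) the request when it returns `True`.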

Azure Data Factory - Tumbling Window Trigger - Limit hours it is running

With an Azure Data Factory "Tumbling Window" trigger, is it possible to limit the hours of each day during which it triggers (adding a window, you might say)?
For example I have a Tumbling Window trigger that runs a pipeline every 15 minutes. This is currently running 24/7 but I'd like it to only run during business hours (0700-1900) to reduce costs.
I played around with this, and found another option which isn't ideal from a monitoring perspective, but it appears to work:
Create a new pipeline with a single "If Condition" step with a dynamic Expression like this:
In the True case activities, add an Execute Pipeline activity that runs your original pipeline (with "Wait on completion" ticked)
In the False case activities, add a Wait activity that sleeps for X minutes
The longer the wait, the further a run can encroach on your window, so adjust it to match.
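The actual ADF expression for the If Condition is not shown above, so the sketch below only mirrors the branching logic in Python. It assumes a 0700 to 1900 window with an inclusive start and exclusive end, and a 15-minute Wait; all names are hypothetical. Note that ADF's `utcnow()` returns UTC, so business hours in a local time zone need converting.

```python
from datetime import datetime
from typing import Tuple

BUSINESS_START_HOUR = 7   # 0700, inclusive (assumption)
BUSINESS_END_HOUR = 19    # 1900, exclusive (assumption)


def in_business_hours(now: datetime) -> bool:
    """True when `now` falls inside the 0700-1900 window."""
    return BUSINESS_START_HOUR <= now.hour < BUSINESS_END_HOUR


def decide(now: datetime, wait_minutes: int = 15) -> Tuple[str, int]:
    """Mirror of the If Condition: the True branch runs the original pipeline,
    the False branch only waits."""
    if in_business_hours(now):
        return ("execute_pipeline", 0)
    return ("wait", wait_minutes)


print(decide(datetime(2024, 1, 8, 9, 30)))  # ('execute_pipeline', 0)
print(decide(datetime(2024, 1, 8, 22, 0)))  # ('wait', 15)
```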
I need to give it a couple of days before I check the billing on the portal to see if it has reduced costs. At the moment I'm assuming a job which just sleeps for 15 minutes won't incur the costs that one running and processing data would.
There is no easy way, but you can create two deployment pipelines for the same job in Azure DevOps: as soon as your 0700 to 1900 window expires, replace that job with a dummy job using an Azure DevOps pipeline.

Azure Data Factory Pricing - Activity Count

I'm thinking of using Data Factory in order to copy data from a Blob Storage container to an SQL table but I'm not quite sure I understand how the pricing works, specifically how the activities are counted.
So if I have a pipeline with 3 activities that copies the data from a CSV with 1000 lines, will the total activity count be 3*1 or 3*1000? In other words, will I be charged based on the number of files it processes or the total number of lines it copies?
That's 3 activity runs. Activity runs are billed by the thousand, at $1 per 1,000. Since these are Copy activities, they consume Data Integration Units (DIU) at $0.25 per DIU-hour. Pipeline execution time is billed at $0.005 per hour. If you add all this up for 1 pipeline with 3 Copy activities that runs for 1 hour, your total bill is about 26 cents.
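The arithmetic in this answer can be written out as follows. The rates are the ones quoted above (current prices vary by region and over time), and the assumption of 1 DIU-hour in total is labeled in the code; the function name is hypothetical. Note that the 1000 CSV lines appear nowhere: the count is per activity run, not per row.

```python
# Rates as quoted in the answer above; check current Azure pricing before relying on them.
ACTIVITY_RUN_RATE = 1.00 / 1000   # $ per activity run ($1 per 1,000 runs)
DIU_HOUR_RATE = 0.25              # $ per DIU-hour
PIPELINE_HOUR_RATE = 0.005        # $ per hour of pipeline execution


def estimate_cost(activity_runs: int, diu_hours: float, pipeline_hours: float) -> float:
    """Sum of orchestration, Copy (DIU) and pipeline execution charges."""
    return (activity_runs * ACTIVITY_RUN_RATE
            + diu_hours * DIU_HOUR_RATE
            + pipeline_hours * PIPELINE_HOUR_RATE)


# One pipeline, 3 Copy activities, running for 1 hour (assumption: 1 DIU-hour in total).
cost = estimate_cost(activity_runs=3, diu_hours=1.0, pipeline_hours=1.0)
print(f"${cost:.3f}")  # $0.258
```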
We run THOUSANDS of pipelines a month, all with many activities including quite a few Copy activities. Our Data Factory billing is still so low that it looks like a rounding error in our total Azure spend.
The exception to this is Data Flow. Data Flow is a Spark wrapper, so you have to pay for Cluster time, which can get expensive quickly if you aren't careful.
Actually, you have to pay for 2 important metrics: Orchestration and Execution. Please refer to this document for more details.
1. Orchestration: $1 per 1,000 runs. You have 3 activities, so it should be $3/1000.
2. Execution: this depends on the number of DIUs you configure, which determines the throughput of your copy.

Azure Monitor Custom log search Query - understanding Period and Frequency

The actual problem is different from what I described. I'll update this question once we resolve the issue. More details can be found in this thread:
Original question:
We use Azure Monitor to create alerts based on logs in Log Analytics. For this we choose our Log Analytics account as a "RESOURCE", then choose "Custom log search" signal name for "CONDITION". Alert logic - "Number of results greater than 0".
Sample query:
search *
| where ResourceProvider == "MICROSOFT.DATAFACTORY" and status_s == "Failed"
For Period and Frequency, let's set 15 minutes. All looks simple, but...
The issue: the setup described above works only intermittently. Alerts fire only sometimes and many are missed, which is completely unacceptable.
If we set Period = Frequency = 5 minutes, we miss almost every event. Period = Frequency = 15 minutes works better, but many events are still missed. Period = Frequency = 30 minutes works even better, but all of this looks odd.
Important notice: logs are collected from Data Factory V2 into Log Analytics. I suspect the missed alerts are caused by logs being delivered to Log Analytics with some delay (up to several minutes). When Azure Monitor evaluates the alert query over the last 15 minutes (Period = 15), the most recent log entries may not be in Log Analytics yet, and the next evaluation, 15 minutes later, no longer covers the previous interval, so entries ingested late are never seen. Is this assumption correct? If so, this is very odd: how are we supposed to configure Period and Frequency? If I set Period > Frequency (e.g. Period = 30 and Frequency = 5, meaning "evaluate the query every 5 minutes over the last 30 minutes of data"), we get duplicate alerts, because the overlapping windows make it very likely that the query returns the same log entries every 5 minutes, which is highly undesirable.
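The suspected mechanism can be modeled in a few lines. This sketch encodes only the question's hypothesis (the update below says the real cause was different). It assumes evaluations at t = Frequency, 2·Frequency, ..., a query window covering event times in (t - Period, t], and a record that becomes searchable only at event time + ingestion delay; times are integer minutes and the function name is hypothetical.

```python
from typing import List


def evaluation_hits(event_time: int, delay: int, period: int,
                    frequency: int, horizon: int) -> List[int]:
    """Evaluation times (minutes) at which the alert query would return the event.

    Model assumptions: evaluations at t = frequency, 2*frequency, ..., up to horizon;
    each covers event times in (t - period, t]; a record is visible only once
    it has been ingested, i.e. at event_time + delay.
    """
    hits = []
    for t in range(frequency, horizon + 1, frequency):
        in_window = t - period < event_time <= t
        ingested = event_time + delay <= t
        if in_window and ingested:
            hits.append(t)
    return hits


# A failure logged at minute 14 and ingested 3 minutes later.
print(evaluation_hits(14, 3, period=15, frequency=15, horizon=60))  # [] (missed)
print(evaluation_hits(14, 3, period=30, frequency=5, horizon=60))   # [20, 25, 30, 35, 40] (fires 5 times)
```

Under this model, Period = Frequency misses any entry whose delay pushes ingestion past the end of its window, while Period > Frequency catches it but repeatedly, which matches the two behaviors described in the question.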
The issue turned out to be buggy behavior in the ARM template that created the alerts. Thanks to Stanislav Zhelyazkov, it has been nailed down and resolved: I now use an alternative ARM API and it seems to work fine. More details on the topic can be found here:

Does IoTHub delay messages by a batching interval in a custom endpoint to Azure Storage?

I am sending some messages in a pipeline using Azure IoT Edge. There is a custom endpoint (say, GenericEndpoint) that I have set up, which will send/put the messages to Azure Blob storage. I am using a route to push the device messages to the specific endpoint GenericEndpoint.
The batch frequency of GenericEndpoint is set to 60 seconds, so each batch creates a single file containing some messages in the specified container.
Let's say there are N messages in a single blob batch file (say, blobX) in that container. If I take the average of the differences between IoTHub.EnqueuedTime(i) of each message i in blobX and the 'Creation Time' of blobX, and call it AVG, I get:
I think this essentially gives me the average time those N messages spent in IoT Hub before being written to blob storage. Now, what I observe is that if p and q are the first and last messages written in blobX, respectively, then
But since the batching interval is set to 60 seconds, I would expect AVG to be close to 30 seconds: if each message joins the current batch as soon as it arrives, the average for each batch file should be about half the interval, i.e. near 30 seconds.
In my case, however, AVG ≈ 90 seconds, which suggests that messages wait for roughly one batching interval (60 seconds in this case) before being considered for a particular batch.
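The expectation of about 30 seconds, versus the observed 90, can be checked numerically. This sketch computes AVG exactly as defined in the question for a hypothetical batch with one message per second over a 60-second interval; the timestamps and function name are made up. It only shows that an extra one-interval wait before the blob is written would produce the observed shift; it says nothing about IoT Hub's internals.

```python
from datetime import datetime, timedelta
from typing import List


def average_lag_seconds(enqueued: List[datetime], blob_created: datetime) -> float:
    """AVG from the question: mean of (blob creation time - EnqueuedTime(i))."""
    lags = [(blob_created - t).total_seconds() for t in enqueued]
    return sum(lags) / len(lags)


# Hypothetical batch: one message per second across a 60 s batching interval.
start = datetime(2024, 1, 1, 12, 0, 0)
enqueued = [start + timedelta(seconds=s) for s in range(61)]

# Blob written right at the end of the interval: AVG is half the interval.
print(average_lag_seconds(enqueued, start + timedelta(seconds=60)))   # 30.0
# Blob written one extra interval later: AVG shifts by the full 60 s.
print(average_lag_seconds(enqueued, start + timedelta(seconds=120)))  # 90.0
```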
Assumption: when a batch of messages is written to a blob file, the messages are all written at once.
My question:
Is this delay of one batch interval (60 seconds) intentional? If so, I assume it would change if I changed the batching interval to, say, 100 seconds.
If not, does it usually take 60 seconds to process a message in IoT Hub and then send it through a route to a custom endpoint? Or am I looking at this from a completely wrong angle?
I apologize beforehand if my question seems confusing.
