File transfer in Azure Integration Services

I have a requirement to transfer files (20-150 MB) between two systems. For this requirement, is it better to use a Durable Function instead of Azure Data Factory (ADF)? As per my understanding, ADF execution will be costlier compared to Durable Functions. Note: the Durable Function uses an Event Grid trigger. Any suggestion will be helpful. The file transfer is a simple pass-through; no transformation is involved.
Also, for my requirement, would a simple Azure Function work instead of a Durable Function? There is no need for function orchestration, as files are not processed in batches; each file will be processed based on an event trigger.

In my experience, using Azure Functions over ADF is a good idea here, for the following reasons:
Cost: ADF costs considerably more than Azure Functions for a simple workload like this.
Custom logic: ADF is not built to run cleansing logic or any custom code. Its primary goal is data integration from external systems using its vast connector pool.
Latency: ADF has much higher latency due to the large overhead of its job framework.
Durable Functions mainly matter here because of the maximum execution time of a single call. For "out of the box" functions on the Consumption plan, that timeout tops out at 10 minutes; for Durable Functions the limitation is effectively removed, since orchestrations checkpoint and can resume. In this case, where you simply need to copy the data, a large file might run into the timeout, so you can consider a Durable Function; otherwise, a simple function should work fine (see the sketch after the link below). Moreover, Durable Functions and normal functions share the same billing model.
For more details: https://learn.microsoft.com/en-us/azure/azure-functions/durable/durable-functions-overview?tabs=csharp
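For illustration, here is a minimal sketch (not the original poster's code) of an Event Grid-triggered function in the .NET isolated model that does a pass-through copy. The function name, destination container, and connection-setting name are assumptions:

```csharp
using System;
using System.IO;
using System.Threading.Tasks;
using Azure.Messaging.EventGrid;
using Azure.Messaging.EventGrid.SystemEvents;
using Azure.Storage.Blobs;
using Microsoft.Azure.Functions.Worker;
using Microsoft.Extensions.Logging;

public class CopyFileFunction
{
    [Function("CopyFile")]
    public async Task Run([EventGridTrigger] EventGridEvent eventGridEvent,
                          FunctionContext context)
    {
        ILogger logger = context.GetLogger("CopyFile");

        // A Blob Created system event carries the source blob URL in its payload.
        if (!eventGridEvent.TryGetSystemEventData(out object systemEvent) ||
            systemEvent is not StorageBlobCreatedEventData blobCreated)
        {
            logger.LogWarning("Ignoring non-BlobCreated event {Type}", eventGridEvent.EventType);
            return;
        }

        var sourceUri = new Uri(blobCreated.Url);

        // Destination account, container, and setting name are illustrative.
        var destBlob = new BlobClient(
            Environment.GetEnvironmentVariable("DEST_STORAGE_CONNECTION"),
            "incoming-files",
            Path.GetFileName(sourceUri.LocalPath));

        // Server-side copy: the 20-150 MB payload moves storage-to-storage,
        // so the function never buffers it and finishes well within the timeout.
        // (A cross-account copy needs a readable source, e.g. via a SAS URL.)
        await destBlob.StartCopyFromUriAsync(sourceUri);
        logger.LogInformation("Copy started for {Source}", sourceUri);
    }
}
```

Because StartCopyFromUriAsync offloads the transfer to the storage service itself, files in this size range should not need a Durable Function at all.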

Related

Azure function queue trigger distributed tracing

I have a .NET isolated function with a queue trigger.
When triggering that queue from the storage explorer or from another function using a QueueServiceClient, a new operationId is generated, so it seems I cannot correlate the traces.
Is it possible to do distributed tracing using the W3C standard for an Azure Function queue trigger? I cannot find any source on this.
If so, how?
Currently not supported.
The Azure Functions team will evaluate (at some unmentioned point in time) whether this scenario can or will be supported; it depends on the team building the Azure.Storage.Queues SDK. Until then, a common workaround is to propagate the trace context manually, as sketched below.
https://github.com/Azure/azure-functions-dotnet-worker/issues/1126
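Purely as a sketch of that workaround, assuming the worker uses the default W3C activity-ID format: stamp the current traceparent into the message payload yourself and restore it in the consumer. The type and method names here are illustrative, and the queue's message-encoding settings must match on both sides:

```csharp
using System.Diagnostics;
using System.Text.Json;
using System.Threading.Tasks;
using Azure.Storage.Queues;

public record TracedMessage(string TraceParent, string Body);

public static class TracedQueue
{
    // Producer side: embed the current W3C traceparent in the payload.
    public static async Task SendAsync(QueueClient queue, string body)
    {
        var message = new TracedMessage(Activity.Current?.Id ?? "", body);
        await queue.SendMessageAsync(JsonSerializer.Serialize(message));
    }

    // Consumer side (inside the queue-triggered function): start an Activity
    // parented to the embedded traceparent so spans correlate end to end.
    // The caller is responsible for stopping/disposing the returned Activity.
    public static Activity StartConsumerActivity(string rawMessage)
    {
        var message = JsonSerializer.Deserialize<TracedMessage>(rawMessage);
        var activity = new Activity("ProcessQueueMessage");
        if (!string.IsNullOrEmpty(message?.TraceParent))
            activity.SetParentId(message!.TraceParent); // must precede Start()
        return activity.Start();
    }
}
```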

Is there a way to easily track CosmosDB RU/s in Azure Functions?

I've combined Azure Functions with CosmosDB and need to track how many RU/s every function consumes. Is there an easy way to track this information? Some of my functions make trips to several containers to aggregate data, and so it's difficult to get RequestCharge from every operation and calculate the sum without accidentally skipping RU/s. I'm wondering if there is an existing integration with Functions.
If not, is it possible to define a static variable only for the current execution context so that I could leverage it in different classes for counting RU/s? Due to the serverless nature of Functions, I'm not sure how to define a static variable that won't be overwritten by another concurrent execution.
Implementing this kind of accounting inside the Azure Functions themselves is not recommended, as it introduces another way for processing of an item to fail.
If you are looking to monitor usage of Cosmos DB containers you should use one of the monitoring options in the Monitoring Azure Cosmos DB article.
However, given that your Functions call multiple containers, another possible option is to measure each operation performed by each Azure Function once, manually, then monitor the Functions' execution counts in Azure Monitor and multiply by the RU/s you measured. If you do want the per-execution counter described in the question, see the sketch below.
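On the "static variable per execution context" part of the question: a plain static field would indeed be shared across concurrent executions, but AsyncLocal<T> flows with each invocation's async context. A minimal sketch (class and method names are illustrative; note the mutable box, since AsyncLocal value assignments made inside awaited child calls do not flow back to the caller):

```csharp
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

public static class RuMeter
{
    // A mutable box is set once at the top of the invocation; child calls
    // mutate the same object, so the total survives await boundaries.
    private sealed class Counter { public double Value; }
    private static readonly AsyncLocal<Counter?> _counter = new();

    public static void StartInvocation() => _counter.Value = new Counter();
    public static void Add(double ru)
    {
        if (_counter.Value is { } c) c.Value += ru;
    }
    public static double Total => _counter.Value?.Value ?? 0;
}

public record Order(string id, string customerId);

public class OrderRepository
{
    private readonly Container _container;
    public OrderRepository(Container container) => _container = container;

    public async Task<Order> GetAsync(string id, string partitionKey)
    {
        ItemResponse<Order> response =
            await _container.ReadItemAsync<Order>(id, new PartitionKey(partitionKey));
        RuMeter.Add(response.RequestCharge); // every operation reports its charge
        return response.Resource;
    }
}
```

Call RuMeter.StartInvocation() at the top of each function and log RuMeter.Total at the end; because the accumulator lives in the async execution context, concurrent invocations do not overwrite each other.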

Azure Data Factory(ADF) vs Azure Functions: How to choose?

Currently we are using Blob trigger Azure Functions to move json data into Cosmos DB. We are planning to replace Azure Functions with Azure Data Factory(ADF) pipeline.
I am new to Azure Data Factory (ADF), so I'm not sure: would an ADF pipeline be the better option or not?
Though my answer is a bit late, I would like to add that I would not recommend replacing your current setup with ADF. Reasons:
Cost: ADF is too expensive. It costs way more than Azure Functions.
Custom logic: ADF is not built to run cleansing logic or any custom code. Its primary goal is data integration from external systems using its vast connector pool.
Latency: ADF has much higher latency due to the large overhead of its job framework.
Based on your requirements, Azure Data Factory is your perfect option. You could follow this tutorial to configure the Cosmos DB output and the Azure Blob Storage input.
The advantage over an Azure Function is that you don't need to write any custom code unless data cleansing is involved; Azure Data Factory is the recommended option for plain data movement, and even if you need an Azure Function for other purposes, you can add it as an activity within the pipeline.
The fundamental use of Azure Data Factory is data ingestion. Azure Functions are serverless (Functions as a Service), and their best usage is for short-lived executions; Functions that run for many seconds at a time become far more expensive. Azure Functions are good for event-driven microservices. For data ingestion, Azure Data Factory is the better option, as its running cost for huge volumes of data will be lower than Azure Functions. You can also integrate Spark processing pipelines in ADF for more advanced ingestion pipelines.
Moreover, it depends on your situation. Azure Functions are serverless, lightweight processes meant for quick responses to an event, rather than the volumetric work that batch processes call for.
So, if your requirement is to respond quickly to an event with a little information, stay with Azure Functions; if you need a batch process, switch to ADF.
Cost
(The figures below come from the answerer's test runs; the original screenshots are not reproduced here.)
Let's calculate the cost.
If your file is large (7.514 GB, copy duration 43:51 hours ≈ 43.85 h):
4 DIU × 43.85 h × $0.25/DIU-hour = $43.85
$43.85 / 7.514 GB ≈ $5.84/GB
If your file is small (2.497 MB, taking about 45 seconds, billed as a full minute since copy duration is rounded up to the minute):
4 DIU × (1/60) h × $0.25/DIU-hour ≈ $0.0167
2.497 MB / 1024 ≈ 0.00244 GB
$0.0167 / 0.00244 GB ≈ $6.84/GB
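To make that arithmetic reusable, here is a small sketch that reproduces the calculation above. The $0.25/DIU-hour rate and per-minute rounding reflect ADF's published copy-activity pricing at the time of the answer; treat them as assumptions to verify:

```csharp
using System;

// Estimates the cost per GB of an ADF copy activity, mirroring the
// hand calculation above.
public static class AdfCopyCost
{
    const double RatePerDiuHour = 0.25; // $ per DIU-hour (assumption to verify)

    public static double CostPerGb(int diu, TimeSpan duration, double sizeGb)
    {
        // ADF bills copy duration prorated by the minute, rounded up.
        double billedHours = Math.Ceiling(duration.TotalMinutes) / 60.0;
        double cost = diu * billedHours * RatePerDiuHour;
        return cost / sizeGb;
    }
}

// Usage:
// Large file: AdfCopyCost.CostPerGb(4, TimeSpan.FromMinutes(43 * 60 + 51), 7.514)   ≈ 5.84
// Small file: AdfCopyCost.CostPerGb(4, TimeSpan.FromSeconds(45), 2.497 / 1024.0)    ≈ 6.84
```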
Scale
The maximum number of instances an Azure Function app can scale out to is 200.
ADF can run 3,000 concurrent external activities, though in my test only 1,500 copy activities actually ran in parallel. (This test wasted a lot of money.)

What state is maintained in Azure Durable functions?

When going through the Azure Durable Functions docs, they mention that we can write stateful functions. What is meant by "stateful", and what state is maintained? Are we talking about the running state of a function?
A stateful function is a function that has state, that is, some data associated with the function. In this specific case, we are talking about:
managing the state of the workflow (which step we are at)
creating progress checkpoints (when a checkpoint is reached, the state is updated)
persisting the execution history
scheduling activity functions
From the docs:
Durable Functions is an extension to the Azure Functions runtime that enables the definition of stateful workflows in code. By breaking down workflows into activities, the Durable Functions extension can manage state, create progress checkpoints, and handle the distribution of function calls across servers. In the background, it makes use of an Azure Storage account to persist execution history, schedule activity functions and retrieve responses. Your serverless code should never interact with persisted information in that storage account, and it is typically not something with which developers need to interact.
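To make "which step we are at" concrete, here is a minimal orchestrator sketch in the .NET isolated model (the workflow and activity names are illustrative). Every await on an activity is a checkpoint where the runtime persists the orchestration's progress:

```csharp
using System.IO;
using System.Threading.Tasks;
using Microsoft.Azure.Functions.Worker;
using Microsoft.DurableTask;

public static class FileWorkflow
{
    [Function(nameof(RunOrchestrator))]
    public static async Task RunOrchestrator(
        [OrchestrationTrigger] TaskOrchestrationContext context)
    {
        string fileUrl = context.GetInput<string>()!;

        // Checkpoint 1: once this activity completes, its result is persisted.
        // If the host restarts, replay resumes here instead of re-downloading.
        string localPath = await context.CallActivityAsync<string>("DownloadFile", fileUrl);

        // Checkpoint 2: the workflow "remembers" the download already finished.
        await context.CallActivityAsync("UploadFile", localPath);
    }

    [Function("DownloadFile")]
    public static string DownloadFile([ActivityTrigger] string fileUrl)
        => $"/tmp/{Path.GetFileName(fileUrl)}"; // placeholder for real download logic

    [Function("UploadFile")]
    public static void UploadFile([ActivityTrigger] string localPath)
    {
        // placeholder for real upload logic
    }
}
```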

Custom azure function trigger and scaling

Hi, I'm trying to find information on how an Azure Function running on a Consumption plan would scale with a custom trigger. This article - https://learn.microsoft.com/en-us/azure/azure-functions/functions-scale#how-the-consumption-plan-works - seems to imply there's a custom scaling implementation per trigger, and it has no explanation of how that works with custom triggers (if at all).
Custom triggers are not supported for Azure Functions. I think the main reason is indeed the lack of scale-controller hooks.
Based on what Durable Functions does, you might be able to define your own triggers on top of existing ones (the way the orchestration trigger is built on Storage queues) to add your specific semantics while reusing the existing scaling logic.
