Fetch on-demand data from Azure Data Factory Pipeline - azure

I searched for how to fetch data on demand, but only found details about scheduling ADF pipelines.
How can I run an on-demand data load from an ADF pipeline?

Documentation for One-time pipelines is here: https://learn.microsoft.com/en-us/azure/data-factory/data-factory-scheduling-and-execution#onetime-pipeline
For example, you can script a one-time execution with PowerShell (https://learn.microsoft.com/en-us/azure/data-factory/data-factory-copy-activity-tutorial-using-powershell).
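As a rough illustration of the one-time pipeline mentioned above, here is a minimal Python sketch that builds the JSON definition with "pipelineMode" set to "OneTime". The pipeline, activity, and dataset names are hypothetical placeholders, and the Blob-to-Blob copy is only an assumed example of an activity; adapt the source and sink types to your own data stores.

```python
import json


def one_time_copy_pipeline(name, input_dataset, output_dataset):
    """Build a pipeline definition that runs once instead of on a schedule."""
    return {
        "name": name,
        "properties": {
            "activities": [
                {
                    "name": "CopyOnce",  # hypothetical activity name
                    "type": "Copy",
                    "inputs": [{"name": input_dataset}],
                    "outputs": [{"name": output_dataset}],
                    "typeProperties": {
                        # Assumed Blob-to-Blob copy; swap for your stores.
                        "source": {"type": "BlobSource"},
                        "sink": {"type": "BlobSink"},
                    },
                }
            ],
            # OneTime marks the pipeline as a single on-demand run.
            "pipelineMode": "OneTime",
        },
    }


if __name__ == "__main__":
    # Dataset names are placeholders for datasets that already exist.
    pipeline = one_time_copy_pipeline("OnDemandCopy", "InputDataset", "OutputDataset")
    print(json.dumps(pipeline, indent=2))
```

The printed JSON can be saved to a file and deployed with the PowerShell steps from the tutorial linked above.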

Related

Running Code in Azure Repo from Azure Data Factory

Main problem:
I need to orchestrate the run of Python scripts using an Azure Data Factory pipeline.
What I have tried:
Databricks: This solution is costly, a little slow (clusters need to spin up), and requires me to write my code in a notebook.
Batch activity from ADF: It is also costly and a little slow. I don't have to write my code in a notebook, but I have to upload it manually to a storage account, which is inconvenient when debugging or updating.
My question:
Is there a way to run code in an Azure repo (or GitHub repo) directly from Data Factory? Something like the Batch activity, but reading the code from the repo itself instead of from a storage account?
Thanks for your help
According to the document "Pipelines and activities in Azure Data Factory", Azure Repos and GitHub repositories are not supported as source or sink data stores for ADF pipelines. So it is not possible to run code directly from a Git repository in an ADF pipeline.
However, ADF has a Source control option that lets you connect the factory to a Git repository in either Azure Repos or GitHub. You can then configure CI/CD pipelines in Azure DevOps to integrate with ADF. Those CI/CD pipelines can run code directly from the Git repository.
For more details, see the document "CI/CD in ADF".

Azure Data Factory and Calling an Azure Batch Job

I am new to Azure Data Factory pipelines.
I want guidance on how to call an Azure Batch job from an Azure Data Factory pipeline and monitor the Batch job for failure or completion. Is this possible?
Regards
I found the following article, which I am working through:
https://learn.microsoft.com/en-us/azure/data-factory/v1/data-factory-data-processing-using-batch

Azure Data Factory: how to get output from a Scala (jar) job?

We have an Azure Data Factory pipeline, and one step is a jar job that should return output used in the next steps.
It is possible to get output from a notebook with dbutils.notebook.exit(....)
I need a similar feature to retrieve output from the main class of a jar.
Thanks!
As far as I know, there is no built-in feature to execute a jar job directly. However, you could implement it easily with the Azure Databricks service.
There are two ways in an Azure Databricks workspace:
If your jar is an executable jar, just use Set JAR, which lets you set the main class and parameters.
Alternatively, you could use a notebook and call dbutils.notebook.exit(....) or something similar.
Back to ADF: it has Databricks activities, and you can use their output in the next steps. If you have any concerns, please let me know.
Update:
As far as I know, the Jar activity has no feature similar to dbutils.notebook.exit(....). For now I can only offer a workaround: inside the jar execution, store the parameters in a specific file that resides in (for example) Blob storage. Then use a Lookup activity after the Jar activity to get the parameters for the next steps.
Update on 1.21.2020
I got an update from MSFT in this GitHub issue: https://github.com/MicrosoftDocs/azure-docs/issues/46347
Sending output is a feature that only notebooks support for notebook
workflows and not jar or python executions in databricks. This should
be a feature ask for databricks and only then ADF can support it.
I would recommend you to submit this as a product feedback on Azure
Databricks feedback forum.
It seems that output from jar execution is not supported by Azure Databricks, and ADF naturally supports only the features that Azure Databricks provides. You could push this forward by contacting the Azure Databricks team. That is everything I know about this.
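The Blob storage plus Lookup workaround described above could look roughly like the following Python sketch, which builds the Lookup activity definition and the expression a later activity would use to read a value. The activity name "LookupJarOutput", the dataset name, the field name, and the assumption that the jar writes a JSON file are all hypothetical; the dataset itself (pointing at the file the jar wrote) is not shown.

```python
import json


def lookup_after_jar(jar_activity, dataset):
    """Build a Lookup activity that reads the file written by the jar run."""
    return {
        "name": "LookupJarOutput",  # hypothetical activity name
        "type": "Lookup",
        # Run only after the jar activity has succeeded.
        "dependsOn": [
            {"activity": jar_activity, "dependencyConditions": ["Succeeded"]}
        ],
        "typeProperties": {
            # Assumes the jar wrote its output as a JSON file.
            "source": {"type": "JsonSource"},
            "dataset": {"referenceName": dataset, "type": "DatasetReference"},
            "firstRowOnly": True,
        },
    }


def first_row_expression(lookup_name, field):
    """Expression a later activity can use to read one field of the result."""
    return "@activity('{}').output.firstRow.{}".format(lookup_name, field)


if __name__ == "__main__":
    activity = lookup_after_jar("RunJar", "JarOutputDataset")
    print(json.dumps(activity, indent=2))
    print(first_row_expression(activity["name"], "rowCount"))
```

With "firstRowOnly" set to true, the Lookup result is exposed under output.firstRow, so the jar only needs to write a single JSON object containing the values the next steps need.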

Use Azure Functions as custom activity in ADFv2

Is it possible to somehow package and execute already written azure function as a custom activity in azure data factory?
My workflow is as follows:
I want to use an Azure Function (which does some data processing) in an ADF pipeline as a custom activity. This custom activity is just one of the activities in the pipeline, but it is essential that it runs.
Is it possible to somehow package and execute already written azure
function as a custom activity in azure data factory?
As far as I know, there is no way to do that so far. In my opinion, you do not need to package the Azure Function. I suggest using a Web Activity to invoke the endpoint of your Azure Function, which fits nicely into your existing pipeline.
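A minimal sketch of the Web Activity suggested above, written as a Python function that builds the activity definition. It assumes the function has an HTTP trigger; the activity name, the URL placeholders, and the request body are hypothetical and must be replaced with your own values.

```python
import json


def web_activity_for_function(name, function_url, payload):
    """Build a Web Activity that POSTs a JSON payload to an HTTP-triggered function."""
    return {
        "name": name,
        "type": "WebActivity",
        "typeProperties": {
            "url": function_url,
            "method": "POST",
            "headers": {"Content-Type": "application/json"},
            "body": payload,
        },
    }


if __name__ == "__main__":
    activity = web_activity_for_function(
        "CallProcessingFunction",  # hypothetical activity name
        # Placeholder URL: fill in your function app, function name, and key.
        "https://<your-function-app>.azurewebsites.net/api/<function-name>?code=<function-key>",
        {"inputPath": "container/input.csv"},  # hypothetical payload
    )
    print(json.dumps(activity, indent=2))
```

The resulting object goes into the "activities" array of the pipeline, next to the other activities.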

Scheduling U-SQL Job

I am trying to schedule a U-SQL job. Please let me know whether I can schedule a U-SQL job and, if so, how.
Thanks,
Vinoth
To my mind, the best way to orchestrate your U-SQL job, along with related data management such as getting source data and pushing output data, is Azure Data Factory V2. ADF has a rich API. Basically, you can run your jobs using PowerShell, C#, or a trigger.
See my very simple example of the job and how to add a trigger below. In this example, I process the documents with my U-SQL job and then push the output file (CSV or Avro) into Azure SQL Server.
You could use Azure Automation (with the help of the Azure Data Lake Analytics Cmdlets) or Azure Data Factory to schedule a U-SQL script in the cloud.
You can get some guidance on creating an ADF pipeline here:
https://azure.microsoft.com/en-us/documentation/articles/data-factory-build-your-first-pipeline-using-editor/
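For the Azure Data Factory V2 route, a schedule could be sketched roughly as follows: one Python function builds a U-SQL activity definition and another builds a daily schedule trigger for the pipeline that contains it. All names (activity, linked services, script path, pipeline, trigger) and the start time are hypothetical placeholders, and this is not the example the answers above refer to.

```python
import json


def usql_activity(name, adla_linked_service, script_linked_service, script_path):
    """Build a U-SQL activity that runs a script stored in a linked storage account."""
    return {
        "name": name,
        "type": "DataLakeAnalyticsU-SQL",
        "linkedServiceName": {
            "referenceName": adla_linked_service,
            "type": "LinkedServiceReference",
        },
        "typeProperties": {
            "scriptPath": script_path,
            "scriptLinkedService": {
                "referenceName": script_linked_service,
                "type": "LinkedServiceReference",
            },
        },
    }


def daily_trigger(trigger_name, pipeline_name, start_time_utc):
    """Build a schedule trigger that starts the pipeline once per day."""
    return {
        "name": trigger_name,
        "properties": {
            "type": "ScheduleTrigger",
            "typeProperties": {
                "recurrence": {
                    "frequency": "Day",
                    "interval": 1,
                    "startTime": start_time_utc,
                    "timeZone": "UTC",
                }
            },
            "pipelines": [
                {
                    "pipelineReference": {
                        "type": "PipelineReference",
                        "referenceName": pipeline_name,
                    }
                }
            ],
        },
    }


if __name__ == "__main__":
    # Every name below is a placeholder for a resource in your own factory.
    activity = usql_activity("RunUSql", "AdlaLinkedService", "StorageLinkedService", "scripts/job.usql")
    trigger = daily_trigger("DailyUSqlTrigger", "USqlPipeline", "2024-01-01T02:00:00Z")
    print(json.dumps(activity, indent=2))
    print(json.dumps(trigger, indent=2))
```

A trigger has to be started after it is created before it fires on schedule.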
