Azure data factory data flow task cannot take on prem as source - azure

Hey,
I am having trouble creating a Data Flow task that uses an on-prem source. Is this not possible in the Preview version?
I have created a self-hosted IR to connect ADF to my laptop, and that is what I want to use. In the pic below I am trying to create a dataset off the self-hosted IR. It works great in the Copy activity, but for Data Flows it is greyed out.

For this question, I asked Azure support for help and they replied with the following answer:
Answer:
This means an on-premises SQL Server is not supported as a dataset in Data Flow at the current stage.
Update:
Data Flow currently supports only the Azure IR, so it does not support on-premises datasets.
Refer to Integration runtime types.
Hope this helps.

If your goal is to use visual data transformations in ADF using Mapping Data Flows with on-prem data, then build a pipeline with a Copy Activity first. Use the Self-Hosted Integration Runtime with the Copy Activity to stage your data in Blob Store. Then add a subsequent Execute Data Flow activity to transform that data.
I made a video on how to do this:
https://www.youtube.com/watch?v=IN-4v0e7UIs
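If it helps to see the shape of it, here is a minimal sketch of that staging pattern as pipeline JSON, written out as a Python dict so the placeholders can be commented. The dataset, data flow and activity names are made up for illustration; the on-prem dataset is the one whose linked service points at your self-hosted IR.

# Hedged sketch only: dataset/data flow names below are placeholders.
stage_then_transform = {
    "name": "StageThenTransform",
    "properties": {
        "activities": [
            {
                # Copy runs on the self-hosted IR (via the on-prem dataset's
                # linked service) and lands the data in Blob storage.
                "name": "StageOnPremToBlob",
                "type": "Copy",
                "inputs": [{"referenceName": "OnPremSqlDataset", "type": "DatasetReference"}],
                "outputs": [{"referenceName": "StagingBlobDataset", "type": "DatasetReference"}],
                "typeProperties": {
                    "source": {"type": "SqlSource"},
                    "sink": {"type": "BlobSink"},
                },
            },
            {
                # The data flow then runs on an Azure IR against the staged copy.
                "name": "TransformStagedData",
                "type": "ExecuteDataFlow",
                "dependsOn": [{"activity": "StageOnPremToBlob", "dependencyConditions": ["Succeeded"]}],
                "typeProperties": {
                    "dataFlow": {"referenceName": "StagedDataFlow", "type": "DataFlowReference"},
                },
            },
        ]
    }
}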

I reproduced your issue on my side; however, nothing about this limitation is claimed in the official documentation, as you can see from everything published about Data Flow.
You could submit feedback here:
I also found a feedback item for Data Flow in ADF for your reference. If you want to push its progress, you can vote it up. I would also suggest referring to the comments in that link:
For access to the 80+ ADF connectors, use Copy Activity to stage data for transformation.
Data Flows will access data in your lake (Blob, ADB, ADW, ADLS) for transformation.
Think of Copy Activity as your data heavy-lifting activity and Data Flow as your data transformation engine.

Related

How to import your example files into my Azure Data Factory

I believe many of you have ADF experience, and maybe have seen Mark Kromer's examples of Azure Data Flows (https://github.com/kromerm/adfdataflowdocs)
I am a beginner with ADF, and with Azure Data Flows especially. I am very curious about these examples, and I really want to import all of the example files (JSON) into my newly created data factory. There must be an easier way than creating all the activities, connections, datasets and so on manually. The templates are of course good, but I want to test out the example code in my Azure portal and in my data factory.
And a second question: I am an SSIS man, used to control flows with master packages executing other packages. Now that I am building a data warehouse in ADF with many dimension tables and fact tables, is it best practice to have separate data flows, or should I build general data flows that perform many parallel upserts to different dimensions? I think I need some guidance here.
Thank you
Regards Geir
I've been publishing those example data flows as ADF pipeline templates.
In your browser, go to ADF and then click Factory Resources > Pipeline from Template. You should see a Data Flow category. Many of the examples are in there.
In ADF, you can also have master / child pipelines. You'll use the Execute Pipeline activity in ADF instead of Execute Package.
ADF supports re-use and generalization of data flow patterns, so that you could process multiple dimension tables in a single data flow. You'll have to weigh the value of doing that vs. supporting it long-term as well as the danger of creating a very complex data flow.
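As a rough illustration of the master/child pattern (the pipeline and activity names here are invented, not from the question), a master pipeline is simply a pipeline of Execute Pipeline activities; its JSON, shown as a Python dict, looks something like this:

# Sketch only: "DimCustomerPipeline" and "FactSalesPipeline" are hypothetical child pipelines.
load_warehouse_master = {
    "name": "LoadWarehouseMaster",
    "properties": {
        "activities": [
            {
                "name": "LoadDimCustomer",
                "type": "ExecutePipeline",
                "typeProperties": {
                    "pipeline": {"referenceName": "DimCustomerPipeline", "type": "PipelineReference"},
                    "waitOnCompletion": True,
                },
            },
            {
                # Facts wait on the dimension load, the way a master package orders child packages.
                "name": "LoadFactSales",
                "type": "ExecutePipeline",
                "dependsOn": [{"activity": "LoadDimCustomer", "dependencyConditions": ["Succeeded"]}],
                "typeProperties": {
                    "pipeline": {"referenceName": "FactSalesPipeline", "type": "PipelineReference"},
                    "waitOnCompletion": True,
                },
            },
        ]
    }
}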

Does azure data factory support real time copy activity?

I want to copy data from an on-premises Oracle DB to SQL Server on a real-time basis.
Data Factory has many strengths, but frequency is not one of them. Have you considered a different approach?
For real-time integrations I would recommend a Function App, a Logic App and/or Service Bus. I would call this app whenever there is a relevant change in the Oracle DB. Alternatively, you could have an API on top of the on-premises Oracle DB that you call from a scheduled app.
If you expect heavy traffic, you might want to consider using Service Bus. The illustration below shows how Azure Service Bus sends data from a publisher (on-premises) to a subscriber (Azure SQL DB) using a topic message.
(Illustration: large-scale Service Bus data flow)
Ref. Azure Service Bus
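To make the Service Bus suggestion concrete, here is a minimal publisher-side sketch in Python using the azure-servicebus package; the connection string, topic name and payload are placeholders, and the subscriber that writes to Azure SQL is not shown.

# Minimal sketch only (pip install azure-servicebus). Connection string,
# topic name and the change payload are placeholders for illustration.
import json
from azure.servicebus import ServiceBusClient, ServiceBusMessage

CONN_STR = "<service-bus-connection-string>"
TOPIC = "oracle-changes"

def publish_change(row: dict) -> None:
    """Send one changed Oracle row to the topic; a subscription on the
    Azure side picks it up and writes it to Azure SQL."""
    client = ServiceBusClient.from_connection_string(CONN_STR)
    with client, client.get_topic_sender(topic_name=TOPIC) as sender:
        sender.send_messages(ServiceBusMessage(json.dumps(row)))

publish_change({"order_id": 42, "status": "SHIPPED"})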
Welcome to Stack Overflow!
You can copy data from an Oracle database to any supported sink data store. For a list of data stores that are supported as sources or sinks by the copy activity, see the Supported data stores table.
To copy data from and to an Oracle database that isn't publicly accessible, you need to set up a Self-hosted Integration Runtime. The integration runtime provides a built-in Oracle driver. Therefore, you don't need to manually install a driver when you copy data from and to Oracle.
For more details and a step-by-step procedure, refer to Copy data from and to Oracle by using Azure Data Factory.
You can also use a custom activity as per your needs. Refer to Custom activities.
Hope this helps.
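As a hedged sketch of the pieces involved (the names, connection string and IR name below are placeholders), the Oracle linked service is bound to the self-hosted IR via connectVia, and the Copy activity then uses an Oracle source:

# Sketch only: names, connection string and IR name are placeholders.
oracle_linked_service = {
    "name": "OnPremOracle",
    "properties": {
        "type": "Oracle",
        "typeProperties": {
            "connectionString": "Host=dbhost;Port=1521;Sid=orcl;User Id=user;Password=***",
        },
        # Route connections through the self-hosted IR installed on-premises.
        "connectVia": {"referenceName": "MySelfHostedIR", "type": "IntegrationRuntimeReference"},
    },
}

# Source/sink section of the Copy activity (the sink type depends on your destination dataset).
oracle_copy_type_properties = {
    "source": {"type": "OracleSource", "oracleReaderQuery": "SELECT * FROM ORDERS"},
    "sink": {"type": "SqlSink"},
}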
You could copy data incrementally, but there is a frequency limitation. Please see this post: https://social.msdn.microsoft.com/Forums/en-US/54380f98-716b-4a95-88af-cad2ab7e47b5/what-type-of-data-ingestion-does-azure-data-factory-use?forum=AzureDataFactory

Is Azure Data Factory suitable for downloading data from non-Azure REST APIs?

Consider a data processing pipeline as follows:
Fetch a large amount of data from a REST API that's hosted somewhere on the internet and persist it to a data store.
Perform some complex data transformations on the persisted data.
Persist the results of the data transformations on a data store.
Aiming to implement such a pipeline in Azure, I think steps 2 and 3 are a good fit for implementation as Azure Data Factory activities.
My question is: does it make sense to implement step 1 as an Azure Data Factory activity as well?
Technically it might be possible to code a .Net activity that performs the data download and persistence.
No - do not implement step 1 in an Azure Data Factory activity.
Technically it is possible to run the entire process from ADF but I would argue that the choice is more costly (relatively) than other options available to you because you will pay for each activity in Azure Data Factory.
For instance, what if the rest api does not have any new data to offer when you initiate the (scheduled) activity? You'll pay for that.
You might consider the following as an easy to implement alternative:
1 - Create a .NET console app, publish it as a WebJob, and schedule it to run daily.
2 - The long-running console app can query the REST API, persist the data into Azure Storage / DocumentDB, and push a message onto a queue which triggers ADF steps 2/3 to run against the saved data (a sketch of this flow follows below).
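The answer describes a .NET console app; purely for illustration, here is the same flow sketched in Python (the API URL, storage connection string, container and queue names are all placeholders, not anything from the answer).

# pip install requests azure-storage-blob azure-storage-queue
import requests
from azure.storage.blob import BlobClient
from azure.storage.queue import QueueClient

STORAGE_CONN = "<storage-connection-string>"

def run_once() -> None:
    # 1. Pull from the external REST API (placeholder URL).
    payload = requests.get("https://api.example.com/v1/records", timeout=60).text

    # 2. Persist the raw payload to blob storage for ADF to transform later.
    blob = BlobClient.from_connection_string(
        STORAGE_CONN, container_name="raw", blob_name="records.json")
    blob.upload_blob(payload, overwrite=True)

    # 3. Drop a message on a queue that triggers the downstream ADF work.
    queue = QueueClient.from_connection_string(STORAGE_CONN, queue_name="adf-trigger")
    queue.send_message("records.json ready")

if __name__ == "__main__":
    run_once()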
I have done exactly that using .Net Activity. I had a need to fetch data from Salesforce api. This has been working well for my needs. Here is a post I wrote up about creating a .net activity and storing the data in azure data lake.
As in Newport99's answer, yes, you will incur costs for that activity, but I am not sure how cost-effective it would be to run a separate web app to host a WebJob and also run the Azure Data Factory pipeline. When I was originally designing a solution, the WebJob was my first choice, but in the end I preferred to have the whole solution use one Azure service instead of multiple.
Hope that helps.
There have been a lot of improvements to ADF in the years since this question was posted, including a REST connector.
Here's the approach recommended by ADF at this time...
Copy data from a REST endpoint by using Azure Data Factory
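For a rough idea of what the REST connector pieces look like (URL, names and relative path below are placeholders, shown as Python dicts of the JSON bodies):

# Sketch only: URL, names and relative path are placeholders.
rest_linked_service = {
    "name": "PublicApiService",
    "properties": {
        "type": "RestService",
        "typeProperties": {
            "url": "https://api.example.com",
            "authenticationType": "Anonymous",
        },
    },
}

rest_dataset = {
    "name": "ApiRecords",
    "properties": {
        "type": "RestResource",
        "linkedServiceName": {"referenceName": "PublicApiService", "type": "LinkedServiceReference"},
        "typeProperties": {"relativeUrl": "/v1/records"},
    },
}

# In the Copy activity, the source side is then simply:
rest_copy_source = {"type": "RestSource"}  # paired with e.g. a JSON/Blob sink dataset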

Azure Data Factory - moving data from On-Premise SQL to Azure SQL

A simple question: can this be achieved directly? I mean without Azure Blob storage in between (as shown in all the examples)? Can someone provide a code example, please?
Yes, you can do this directly. In fact, you can do direct copies between any of our supported sources/sinks; you don't have to pass through blob. To go from on-prem SQL Server --> Azure SQL, you will need to set up a Data Management Gateway connector on your on-prem server. Then, you use a linked service of type AzureSqlDatabase and an output dataset of type AzureSqlTable, instead of AzureBlob as shown in the example. The exact steps to set up the DMG and the JSON code for the linked services, datasets, and pipelines can be found in our documentation. We are also improving our UI in the near future to make these kinds of copy setups an easy, code-free experience.
https://azure.microsoft.com/en-us/documentation/articles/data-factory-sqlserver-connector/
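For a rough idea of the code involved (this is the ADF v1 JSON style the answer refers to, with placeholder names, gateway name and connection strings), the two linked services look something like the sketch below; the copy pipeline then uses a SqlServerTable input dataset and an AzureSqlTable output dataset.

# Sketch only: names, gateway name and connection strings are placeholders.
onprem_sql_linked_service = {
    "name": "OnPremSqlLinkedService",
    "properties": {
        "type": "OnPremisesSqlServer",
        "typeProperties": {
            "connectionString": "Data Source=MYSERVER;Initial Catalog=MyDb;User ID=user;Password=***",
            # The Data Management Gateway installed on the on-prem server.
            "gatewayName": "MyGateway",
        },
    },
}

azure_sql_linked_service = {
    "name": "AzureSqlLinkedService",
    "properties": {
        "type": "AzureSqlDatabase",
        "typeProperties": {
            "connectionString": "Server=tcp:myserver.database.windows.net,1433;Database=MyDb;User ID=user;Password=***",
        },
    },
}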

trigger azure ml experiment from powerbi

I have created an Azure ML experiment which fetches data from an API and updates it in a SQL Azure database. My Power BI report picks up data from this database and displays the report. The data at the source changes frequently, so I need something like a checkbox in Power BI which, when checked, will trigger the Azure ML experiment and update the database with the latest data.
I know that we can schedule it to run via an RStudio pipeline, but we are not considering this approach as it is not financially viable.
Thanks in Advance.
You could use a DirectQuery connection from Power BI to your Azure SQL instance. Then the reports in Power BI will always be up to date with the latest data you have. The only remaining question is when to trigger the ML experiment. If this really needs to be on demand (rather than on a schedule), you could do that with a button in your own app. You could embed the report in your app so that you get an end-to-end update.
You could have a look at Azure Data Factory (ADF), which will help you build data pipelines in the cloud.
You can use ADF to read the data from the API (refreshing your data), batch-score it in Azure Machine Learning, and push it directly to your Azure SQL database, so Power BI always sees the latest scored data.
Take a look at the following blog, where they take data through this kind of pipeline. You just have to change it so that the data doesn't come from Stream Analytics but from your API.
http://blogs.msdn.com/b/data_insights_global_practice/archive/2015/09/16/event-hubs-stream-analytics-azureml-powerbi-end-to-end-demo-part-i-data-ingestion-and-preparation.aspx
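If you go the ADF route, the scoring step is an Azure ML Batch Execution activity in the pipeline. A bare-bones sketch of that activity follows; the linked service name is a placeholder, and the web service input/output mappings are only indicated in comments rather than filled in.

# Sketch only: "AzureMLScoringService" is a hypothetical AzureML linked service
# registered in the factory.
score_activity = {
    "name": "BatchScoreWithAzureML",
    "type": "AzureMLBatchExecution",
    "linkedServiceName": {"referenceName": "AzureMLScoringService", "type": "LinkedServiceReference"},
    "typeProperties": {
        # Map the web service's inputs/outputs to your storage datasets here, e.g.
        # "webServiceInputs": {...}, "webServiceOutputs": {...}
    },
}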
