I have created a Copy activity with a REST API source and Azure ADLS as the sink. It works for the first 1000 records.
I need to apply custom pagination to handle cases where the API returns more than 1000 records. I am trying to parameterize the Copy activity dataset, but I am not sure how to use an Until activity for looping in this case.
How can I achieve this custom pagination?
I am expecting the Copy activity to iterate until it has retrieved the full set of records. The source of the Copy activity is a REST API and the sink is ADLS. I am not sure how to use the Until activity for this kind of pagination, where each iteration returns a different result set (the Top value and the sink are parameters) and custom pagination is required.
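A minimal sketch of the looping pattern, under these assumptions: the API accepts hypothetical $top/$skip query parameters, the page size is 1000, the REST source dataset exposes a relativeUrl parameter, and the ADLS sink dataset exposes a fileName parameter (all of these names are illustrative, not taken from your pipeline). The Until activity repeats a parameterized Copy activity and stops once a page returns fewer than 1000 rows; because an ADF variable cannot reference itself in a Set variable activity, the offset is bumped through a temporary variable:

    {
      "name": "CopyAllPages",
      "properties": {
        "variables": {
          "offset":     { "type": "String", "defaultValue": "0" },
          "offsetTemp": { "type": "String", "defaultValue": "0" },
          "rowsCopied": { "type": "String", "defaultValue": "1000" }
        },
        "activities": [ {
          "name": "UntilLastPage",
          "type": "Until",
          "typeProperties": {
            "expression": { "value": "@less(int(variables('rowsCopied')), 1000)", "type": "Expression" },
            "activities": [
              {
                "name": "CopyOnePage",
                "type": "Copy",
                "inputs": [ {
                  "referenceName": "RestSourceDataset",
                  "type": "DatasetReference",
                  "parameters": {
                    "relativeUrl": { "value": "@concat('/records?$top=1000&$skip=', variables('offset'))", "type": "Expression" }
                  }
                } ],
                "outputs": [ {
                  "referenceName": "AdlsSinkDataset",
                  "type": "DatasetReference",
                  "parameters": {
                    "fileName": { "value": "@concat('page_', variables('offset'), '.json')", "type": "Expression" }
                  }
                } ],
                "typeProperties": { "source": { "type": "RestSource" }, "sink": { "type": "JsonSink" } }
              },
              {
                "name": "SetRowsCopied",
                "type": "SetVariable",
                "dependsOn": [ { "activity": "CopyOnePage", "dependencyConditions": [ "Succeeded" ] } ],
                "typeProperties": {
                  "variableName": "rowsCopied",
                  "value": { "value": "@string(activity('CopyOnePage').output.rowsCopied)", "type": "Expression" }
                }
              },
              {
                "name": "SetOffsetTemp",
                "type": "SetVariable",
                "dependsOn": [ { "activity": "SetRowsCopied", "dependencyConditions": [ "Succeeded" ] } ],
                "typeProperties": {
                  "variableName": "offsetTemp",
                  "value": { "value": "@string(add(int(variables('offset')), 1000))", "type": "Expression" }
                }
              },
              {
                "name": "SetOffset",
                "type": "SetVariable",
                "dependsOn": [ { "activity": "SetOffsetTemp", "dependencyConditions": [ "Succeeded" ] } ],
                "typeProperties": {
                  "variableName": "offset",
                  "value": { "value": "@variables('offsetTemp')", "type": "Expression" }
                }
              }
            ]
          }
        } ]
      }
    }

If your API instead returns a total record count or a next-page link, you can drive the Until condition from that value. Depending on the API, the REST connector's built-in pagination rules on the source may remove the need for the loop entirely.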
I am new to Azure Data Factory and wonder if you can please help me with the following scenario:
I want to get data from a REST endpoint. The endpoint URL is stored in a SQL Database table, so I fetch the URL using a Lookup activity.
Next, I store the URL value in a variable using a Set Variable activity.
After that, I fetch the data from the endpoint using a Web activity.
Now I want to store the output data from the Web activity in Blob storage. For this I am using a Copy activity, but I cannot get it working at all: I am unable to pass the output from the Web activity into my Copy activity.
If any of you have come across this situation, your help would be much appreciated.
It seems you are not referencing the correct output from the Web activity in the Copy activity. Check what the Web activity actually returns by inspecting its output in the activity run details, and then pass those values to the Copy activity accordingly.
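For example, if the Web activity is named Web1 and its JSON response contains a hypothetical url field, you could feed it into a parameterized source dataset on the Copy activity roughly like this (the dataset, parameter, and property names are illustrative):

    "inputs": [ {
      "referenceName": "RestSourceDataset",
      "type": "DatasetReference",
      "parameters": {
        "relativeUrl": { "value": "@activity('Web1').output.url", "type": "Expression" }
      }
    } ]

The exact property path depends on the shape of the Web activity's output, which you can see in the monitoring view for that activity run.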
Is there a way to get the list of files copied into Azure Data Lake Storage after a Copy activity in Azure Data Factory? I have to copy data from a data source and then skip files based on a particular condition. The condition must also check the file path and name against other data from a SQL database. Any ideas?
As of now, there is no built-in function to get the list of files after a Copy activity. You can, however, use a Get Metadata activity or a Lookup activity and chain a Filter activity to it to get the list of files that match your condition.
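A minimal sketch of that chain, assuming a Get Metadata activity named GetFileList that returns childItems for the sink folder, and a hypothetical condition that skips files whose names start with a given prefix (in practice you would build the condition from the values you look up in SQL):

    {
      "name": "FilterFiles",
      "type": "Filter",
      "dependsOn": [ { "activity": "GetFileList", "dependencyConditions": [ "Succeeded" ] } ],
      "typeProperties": {
        "items": { "value": "@activity('GetFileList').output.childItems", "type": "Expression" },
        "condition": { "value": "@not(startswith(item().name, 'skip_'))", "type": "Expression" }
      }
    }

Note that the Get Metadata activity must include childItems in its field list for this output to be available.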
There's also a workaround that you can check out, quoted below:
"The solution was actually quite simple in this case. I just created another pipeline in Azure Data Factory, which was triggered by a Blob Created event, and the folder and filename passed as parameters to my notebook. Seems to work well, and a minimal amount of configuration or code required. Basic filtering can be done with the event, and the rest is up to the notebook.
For anyone else stumbling across this scenario, details below:
https://learn.microsoft.com/en-us/azure/data-factory/how-to-create-event-trigger"
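For reference, a storage event trigger can hand the folder and file name to the pipeline through trigger body expressions, roughly like this (the pipeline and parameter names are illustrative):

    "pipelines": [ {
      "pipelineReference": { "referenceName": "ProcessNewBlobPipeline", "type": "PipelineReference" },
      "parameters": {
        "folderPath": "@triggerBody().folderPath",
        "fileName": "@triggerBody().fileName"
      }
    } ]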
I have a dataset in Data Factory, and I would like to know whether it is possible to update row values using only Data Factory activities, without data flows, stored procedures, or queries.
There is a way to run an UPDATE (and probably any other SQL statement) from Data Factory, though it's a bit of a hack.
The Lookup activity can execute a set of statements in Query mode, as in the sketch below.
The only condition is that the query must end with a SELECT statement; otherwise the Lookup activity throws an error.
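A minimal sketch of such a Lookup activity, with a hypothetical table and column; the trailing SELECT only exists to give the Lookup something to return:

    {
      "name": "RunUpdate",
      "type": "Lookup",
      "typeProperties": {
        "source": {
          "type": "AzureSqlSource",
          "sqlReaderQuery": "UPDATE dbo.MyTable SET Status = 'Processed' WHERE Status = 'Pending'; SELECT 1 AS Done;"
        },
        "dataset": { "referenceName": "AzureSqlTableDataset", "type": "DatasetReference" }
      }
    }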
This works for Azure SQL, PostgreSQL, and most likely for any other DB Data Factory can connect to.
Concepts:
Datasets:
Datasets represent data structures within the data stores. A dataset is a named view of data that simply points to or references the data you want to use in your activities as inputs and outputs. Datasets identify data within different data stores, such as tables, files, folders, and documents. For example, an Azure Blob dataset specifies the blob container and folder in Blob storage from which the activity should read the data.
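For illustration, a minimal delimited-text dataset pointing at a blob container and folder might look like this (the linked service and all names here are placeholders):

    {
      "name": "BlobOutputDataset",
      "properties": {
        "type": "DelimitedText",
        "linkedServiceName": { "referenceName": "AzureBlobStorageLinkedService", "type": "LinkedServiceReference" },
        "typeProperties": {
          "location": { "type": "AzureBlobStorageLocation", "container": "output", "folderPath": "daily" },
          "columnDelimiter": ",",
          "firstRowAsHeader": true
        }
      }
    }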
Currently, in my experience, it is not possible to update row values using only Data Factory activities; Azure Data Factory does not support this today.
For more details, please see:
Datasets
Datasets and linked services in Azure Data Factory.
For example, when I use the Copy activity, Data Factory does not provide any way to update the rows.
Hope this helps.
This is now possible in Azure Data Factory: your data flow should include an Alter Row transformation, and the sink has a drop-down where you can select the key column used for updates.
https://learn.microsoft.com/en-us/azure/data-factory/data-flow-alter-row
As mentioned in the comment above regarding ADF data flows, a data flow does not support an on-premises source or sink; the source and sink must reside in Azure SQL, Azure Data Lake, or another Azure data service.
I'm using ADF v2 and trying to get to grips with the Web activity.
The tasks are:
Copy a file from Blob storage and load the data into an Azure SQL database
Iterate through the data and use a PUT call to a REST API to update the data
Okay, so I can get the data into the table with no problem. I can also make a call to the API using a Web activity and send some hard-coded data.
But I've been trying to use a ForEach activity to iterate through the table and call the Web activity to pass that data to the API.
This is where I'm stuck. I'm new to Data Factory and have been through all the standard help information, but I'm not getting anywhere.
Any help is appreciated
I think you need to drive the ForEach with a Lookup activity that runs a SQL query to populate the result set, and then call the Web activity for each row, as in the sketch below.
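A minimal sketch of that pattern, assuming a Lookup activity named LookupRows (with "First row only" turned off, so that output.value is the full array) and a hypothetical API that takes an id in the URL and a value field in the body; the URL and property names are placeholders:

    {
      "name": "ForEachRow",
      "type": "ForEach",
      "dependsOn": [ { "activity": "LookupRows", "dependencyConditions": [ "Succeeded" ] } ],
      "typeProperties": {
        "items": { "value": "@activity('LookupRows').output.value", "type": "Expression" },
        "activities": [ {
          "name": "PutRow",
          "type": "WebActivity",
          "typeProperties": {
            "url": { "value": "@concat('https://example.com/api/items/', item().id)", "type": "Expression" },
            "method": "PUT",
            "headers": { "Content-Type": "application/json" },
            "body": { "value": "@json(concat('{\"value\":\"', item().value, '\"}'))", "type": "Expression" }
          }
        } ]
      }
    }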
Here are some posts to get you started:
https://learn.microsoft.com/en-us/azure/data-factory/control-flow-for-each-activity
https://learn.microsoft.com/en-us/azure/data-factory/control-flow-lookup-activity
Replace the Copy activity with the Web activity call in the tutorial below:
https://learn.microsoft.com/en-us/azure/data-factory/tutorial-bulk-copy-portal
Can anyone tell me how Azure Data Factory datasets are cleaned up (removed, deleted, etc.)? Is there any policy or setting to control this?
From what I can tell, all the time-series dataset slices are left behind intact.
Say I want to develop an activity that overwrites data daily in a destination folder in Azure Blob or Data Lake storage (for example, a folder mapped to an external table in Azure SQL Data Warehouse and holding a full data extract). How can I achieve this with just a Copy activity? Or should I add a custom .NET activity to clean up the datasets that are no longer needed myself?
Yes, you would need a custom activity to delete pipeline output.
You could have pipeline activities that overwrite the same output, but you must be careful to understand how the ADF slicing model and activity dependencies work, so that anything using the output gets a clean and repeatable set.