Issue in reading paginated API data using ADF

I am trying to read data from a paginated API using an ADF copy activity. On the first call, only the base URL is used (without any query parameters). The response contains a key called cursor with a token, which has to be passed as a query-string parameter to get the next page. How can I implement this in ADF? That is, how do I append "?cursor=" to the base URL and then add the cursor value from the API response?
First Page : {{Base URL}} #returns a JSON with a key called cursor
From Second Page : {{Base URL}}?cursor={{value of cursor key in JSON response}}
I have tried using Query Parameters under Pagination in the Copy activity, but it is not working.

You can use a Web activity to get the cursor value and pass it to the relative URL in the copy activity.
First make a call to the base URL using a Web activity.
Parameterize the relative URL in the dataset.
Connect the Web activity to the Copy data activity and pass the required value from the Web activity output to the copy activity's dataset parameter, so the copy runs against the URL with the cursor value.
Ex: @concat('?cursor=',activity('Web1').output.cursor)
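As a rough sketch, the copy activity could bind a dataset parameter to the Web activity output like this (the activity name Web1, the dataset name, and the relativeUrl parameter name are assumptions, not from the original post):

```json
{
  "name": "CopyWithCursor",
  "type": "Copy",
  "dependsOn": [
    { "activity": "Web1", "dependencyConditions": [ "Succeeded" ] }
  ],
  "inputs": [
    {
      "referenceName": "RestSourceDataset",
      "type": "DatasetReference",
      "parameters": {
        "relativeUrl": {
          "value": "@concat('?cursor=', activity('Web1').output.cursor)",
          "type": "Expression"
        }
      }
    }
  ]
}
```

The dataset itself would then reference `@dataset().relativeUrl` in its Relative URL field.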

Related

Add parameter as drop down list in ADF pipeline

I have a Copy Data task which is obtaining data from an API
I want to add the Request Method as a parameter to be defined at run time so I want it to be a Drop Down list instead of a String, Can this be done in Azure Data Factory?
Unfortunately, it is not possible to create a dropdown list for the Request Method.
But you can raise a feature request here.
Refer - https://learn.microsoft.com/en-us/azure/data-factory/connector-http?tabs=data-factory

Custom Pagination of Rest API in Azure Data Factory

I would like to retrieve all results from a REST API endpoint. The URL has the below form
https://myapi.com/relativeapi?project=&repo=&prId=&page=&pageSize=&startTime=&endTime
By default, only the first page is returned when requesting data. A sample output is below:
"pageSize":50,
"num":50,
"isLastPage":false,
"data":
{"ABC":{"mock1":[{"Id":18,"Date":"202104T02:04:53.000Z","attr1":0,"attr2":0,"attr3":0,"historyData":[{"Date":"2021-11-03T00:08:13.000Z","attr1":0,"attr2":0,"attr3":0,"attr4":{}}
How can we achieve this in Azure Data Factory and retrieve results from all pages (the last page is reached when "isLastPage": true and "data" is empty)?
Also, how can we request API data incrementally, so the pipeline does not need to re-read all results from the beginning (page 1), but gets results from the last updated page?
@christi08
Since the next-page information is not returned in the output, you will unfortunately not be able to make use of the built-in pagination feature.
As an alternative/workaround, you could use the iterative approach below to achieve your end goal.
Step 1:
Your requests are going to be in the below format:
https://myapi.com/relativeapi?page=1.......
https://myapi.com/relativeapi?page=2.......
https://myapi.com/relativeapi?page=3.......
https://myapi.com/relativeapi?page=n.......
Step 2:
Create a variable named pageno at the pipeline level.
Step 3:
In the REST connector, create a parameter named page.
This page parameter would be appended to the relative URL along with the other parameters and the path.
In your case, the base URL will be different.
Step 4:
Now in the copy activity, under the source settings, pass the parameter the value of the pipeline variable.
This pipeline variable will be incremented, so for each iteration the pageno increases and the relative URL is therefore also dynamic.
You will need a Set Variable activity to increment the pageno pipeline variable.
To loop, you could use an Until activity.
End condition:
For the Until activity to end, you have to provide an expression.
You could add another Web activity / Lookup activity with the dynamic relative URL, access the isLastPage node in its output, and loop until it is true.
Alternatively, you could check the copy activity output to see whether the number of rows written is 0, and end the Until activity then.
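The loop described above could be sketched roughly as follows. The activity names, the Lookup output path, and the temporary nextpage variable are all assumptions; the temporary variable is needed because a Set Variable activity cannot reference the same variable it is setting:

```json
{
  "name": "UntilLastPage",
  "type": "Until",
  "typeProperties": {
    "expression": {
      "value": "@equals(activity('LookupPage').output.firstRow.isLastPage, true)",
      "type": "Expression"
    },
    "activities": [
      { "name": "CopyPage", "type": "Copy" },
      {
        "name": "IncrementTemp",
        "type": "SetVariable",
        "typeProperties": {
          "variableName": "nextpage",
          "value": {
            "value": "@string(add(int(variables('pageno')), 1))",
            "type": "Expression"
          }
        }
      },
      {
        "name": "UpdatePageNo",
        "type": "SetVariable",
        "typeProperties": {
          "variableName": "pageno",
          "value": { "value": "@variables('nextpage')", "type": "Expression" }
        }
      }
    ]
  }
}
```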

Paginate REST API call in Data Factory

I want to call a ZenDesk REST API in Data Factory,
here is what the output looks like:
next_page determines the next URL page of the REST API
end_of_stream returns false if I've reached the end of the page
In Data Factory, I've set up a Copy Activity in a pipeline which copies the data from REST JSON into my Azure DB, something like this
The problem is that the pagination in ADF doesn't support the type of pagination the Zendesk API provides. From my research, it looks like I'll need to create an Until loop to make this work, so something like this:
I need to know how I can
Set a variable to true/false based on the Copy Activity output from the REST API call
Dynamically set the URL of my Copy Activity, so it can loop through all the pages in the REST API
Thanks!
Set a variable to true/false based on the Copy Activity output from the REST API call
In the Value section of the Set Variable activity, click on Add dynamic content for the variable.
@activity('Copy data1').output.end_of_stream
You can replace Copy data1 with your own Copy activity name.
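A sketch of that Set Variable activity as pipeline JSON (the variable name end_of_stream_reached is an assumption; note that if the Copy activity output does not expose response-body fields such as end_of_stream, a Web or Lookup activity may be needed to read it instead):

```json
{
  "name": "SetEndOfStream",
  "type": "SetVariable",
  "typeProperties": {
    "variableName": "end_of_stream_reached",
    "value": {
      "value": "@activity('Copy data1').output.end_of_stream",
      "type": "Expression"
    }
  }
}
```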
Dynamically set the URL of my Copy Activity, so it can loop through all the pages in the REST API
For your REST API dataset, configure the below parameter.
Relative URL :
In the Copy Data Activity :
Once the first copy action is done, update the start_time variable with the value of end_time from the output.
Snippet Output of the REST API
The reason is that the next-page URL is the same relative URL of the API, with the start_time parameter set to the end_time value of the current page's output.
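Following the description above, the dataset's relative URL could be built from the start_time variable like this (the incremental/tickets.json path and the variable name are assumptions based on Zendesk's incremental export style, not taken from the original answer):

```json
{
  "relativeUrl": {
    "value": "@concat('incremental/tickets.json?start_time=', variables('start_time'))",
    "type": "Expression"
  }
}
```

Each Until iteration would then set start_time to the end_time returned by the previous page before the next copy runs.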

Variable from lookup in a foreach block in Azure datafactory

I'm trying to set up a simple pipeline in ADF. The first step selects access tokens and logfile names from a table on an MSSQL server. This works OK; the preview shows a table containing two columns, token and logfilename. After this lookup I have a ForEach loop which needs to make HTTP requests to a REST API using the values in the token and logfilename columns. As Items in the ForEach block, I set @activity('nameoftheactivity').output. Inside the ForEach loop is a copy block. The source of this copy block is a REST API with its base URL (https://api.domain.com/v2/); the relative URL is set as
@concat('logfile/',dataset().ForEachLogfilename,'.',formatDateTime(utcNow(), 'yyyy-MM-dd'),'.log/json?access_token=',dataset().ForEachToken)
The ForEachLogfilename and the ForEachToken are set as Dataset properties with values as
@{item().token} and @{item().logfilename}
When I hit the preview button, Azure suggests that I set values for @item().token and @item().logfilename, which I do as suggested. A click on Finish AND I HAVE DATA from the REST API. But only in the preview... It just errors when I perform a "Trigger now". Can anyone point me in the right direction?
cheers!
Found it!!
Instead of
@concat('logfile/',dataset().ForEachLogfilename,'.',formatDateTime(utcNow(), 'yyyy-MM-dd'),'.log/json?access_token=',dataset().ForEachToken)
I had to use
@concat('logfile/',item().logfilename,'.',formatDateTime(utcNow(), 'yyyy-MM-dd'),'.log/json?access_token=',item().token)
in the "Add dynamic content" field.
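A rough sketch of the ForEach wiring described above (the Lookup activity name is an assumption, and the .value suffix reflects the usual shape of a Lookup activity's output when it returns multiple rows):

```json
{
  "name": "ForEachLogfile",
  "type": "ForEach",
  "typeProperties": {
    "items": {
      "value": "@activity('LookupTokens').output.value",
      "type": "Expression"
    },
    "activities": [
      { "name": "CopyLogfile", "type": "Copy" }
    ]
  }
}
```

Inside the loop, each row's fields are then available as item().token and item().logfilename.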

Parse-Server query fed directly from URL query string

I'd like to know if this is even possible. And if it is possible, what the security ramifications would be.
I want to use Javascript to build a dynamic URL to query a Parse-Server database.
It appears that it might be possible, based on an earlier Stack Overflow question here and a Node.js doc here.
Here's how I envision it working....
So, a user would be sent (via email/Twitter/etc) a link which was created by above method. Once the user clicked on that URL link, the following would happen automatically:
Step #1: User's system would automatically submit a parse-server query.
Step #2: On success, the user's browser would download a web page which displayed the query results.
Step 1: Create the pointer value, i.e. the query pseudo-semantic.
Step 2: Insert the pointer value into a text-type field in a Parse class (cls=clazz).
Step 2b: Send (e.g. via Mailgun) a message containing a link like:
express.domain.com/#!clazz/:yALNMbHWoy - where 'yA...oy' is the object id of the pointer row in the parse/clazz table.
Note that the link is an abstraction only. It is first a URI to an Express route and a function that will simply get a row from parse.clazz. That row contains the semantics for making a Parse query to get back the full DB complement, which is passed along to the Node template that builds the HTML.
In your Node router, GET /clazz/:oid will look up that Parse row in parse/clazz, using the pointer/text value to format a second Parse query. The response from this second Parse query is your real meat; it can be used by the Express template formatting the HTML response to the original request on "express.domain.com".
Where you ask about "download web page": that is simply Node's response to a GET on a route like GET /clazz.