Azure Data Factory - Retrieve next pagination link (decoded) from response headers in a copy data activity

I have created a copy data activity in Azure Data Factory. This pipeline pulls data from an API (via a REST source) and writes the response body (JSON) to a file in Azure Blob Storage.
The API I am fetching from is paginated, and the link to the next page is sent in the response headers at response->headers->link.
The URL to the next page has the following general format:
<https%3A%2F%2FsomeAPI.com%2Fv2%2FgetRequest%3FperPage%3D80%26sortOrder%3DDESCENDING%26nextPageToken%3DVAdjkjklfjjgkl>; rel="next"
I want to fetch the next page token present in the above URL and use it in the pagination rule.
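For reference, this is the decoding and token extraction I am trying to achieve; outside ADF it would look roughly like the following minimal Python sketch (using the example header value above):

# Minimal sketch (outside ADF): decode the Link header and pull out nextPageToken.
from urllib.parse import unquote, urlparse, parse_qs

link_header = '<https%3A%2F%2FsomeAPI.com%2Fv2%2FgetRequest%3FperPage%3D80%26sortOrder%3DDESCENDING%26nextPageToken%3DVAdjkjklfjjgkl>; rel="next"'

# Strip the surrounding <...>; rel="next" wrapper, then URL-decode.
encoded_url = link_header.split(';')[0].strip('<> ')
next_url = unquote(encoded_url)
# next_url == 'https://someAPI.com/v2/getRequest?perPage=80&sortOrder=DESCENDING&nextPageToken=VAdjkjklfjjgkl'

next_page_token = parse_qs(urlparse(next_url).query)['nextPageToken'][0]
print(next_page_token)  # VAdjkjklfjjgkl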
I have tried using some pagination rules:
> AbsoluteURL = Headers.link
But this did not work: the entire encoded link shown above gets appended directly, and so the pipeline throws an error.
> Query Parameters
I have also tried using query parameters, but could not get any result.
I have gone through related questions on Stack Overflow and read the documentation.
Please help me with how I can access this next page token, or with what pagination rule could support this scenario.
Pasting the Postman output and ADF data pipeline below, for reference.
Postman Response Headers Output
Pagination rules I need help with

Related

How to get the creation datetime of a blob file from Azure Data Factory or Azure Data Flow

I can fetch the last modified file from the Get Metadata activity, but the requirement is the creation date of the file, and I can see no option to fetch that in Azure Data Factory. Besides, I need to fetch:
the first row by fileCreationDateTime in descending order, so I need the same CreationDateTime functionality as LastModified.
As shown in the picture, there are two properties, CREATION TIME and LAST MODIFIED. I need to fetch all the creation times, sort them in descending order, and pick the first row.
You can use a Web activity in ADF to call the Get Blob REST API, which returns the blob creation date in the Web activity's response headers, and you can then use that value as per your requirement.
Get Blob REST API sample URI: https://myaccount.blob.core.windows.net/mycontainer/myblob
Ref document: Get Blob - Response Headers
You can capture the value of the blob creation datetime from the REST API response headers in ADF using the dynamic expression below:
@activity('WebGetBlobDetails').output.ADFWebActivityResponseHeaders['x-ms-creation-time']
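Outside ADF, the same header can be checked directly, for example with this minimal Python sketch (the blob URI and bearer token are placeholders):

# Minimal sketch: call Get Blob and read the creation time from the response headers.
import requests

url = "https://myaccount.blob.core.windows.net/mycontainer/myblob"   # placeholder blob URI
headers = {
    "Authorization": "Bearer <access_token>",  # placeholder token, e.g. from an earlier login/web call
    "x-ms-version": "2021-08-06",              # a recent service version that returns x-ms-creation-time
}

resp = requests.get(url, headers=headers)
resp.raise_for_status()
print(resp.headers["x-ms-creation-time"])      # e.g. Thu, 01 Jan 2024 00:00:00 GMT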
Here is a sample demo. I have hard-coded the bearer token in the headers for a quick demo, but you can automate this and pass it dynamically by adding another Web activity that gets the bearer token before the WebGetBlobDetails activity.
Sample GIF:

Got Error Ingest JSON array data into Azure (via Azure Synapse or Azure Data Factory)

I'm trying to ingest the JSON array format from the US Census API into Azure; either Azure Synapse Analytics or ADF is fine. I have tried the HTTP and REST connectors and neither was successful.
The error when using the HTTP connector is:
"Error occurred when deserializing source JSON file 'c753cdb5-b33b-4f22-9ca2-778c97a69953'. Check if the data is in valid JSON object format. Error reading JObject from JsonReader. Current JsonReader item is not an object: StartArray. Path '[0]', line 1, position 2. Activity ID: 8608038f-3dd1-474f-86f1-d94bf5a45eba".
I attached the error message as well as sample API data and "test connection successful" screenshots in this post.
Should I set some parameters or advanced settings to specify something about the array form of the census data? Please advise.
The sample data link is inserted for your reference.
https://api.census.gov/data/2020/acs/acs5/subject?get=group(S1903)&for=state:51
Greatly appreciate your help in advance!
T.R.
error in azure synapse ingestion
connection test is good
US Census API Sample Test Data
As per the official documentation, the REST connector only supports responses in JSON.
The data link you provided returns a JSON array, not a JSON object, which is why ADF cannot accept the data it returns.
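For illustration only, this is the shape difference, and one possible workaround (outside the REST connector) is to fetch the array and wrap it in an object before landing it; a minimal Python sketch, where the wrapper key "value" is an arbitrary choice:

# Minimal sketch: the Census endpoint returns a top-level JSON array, which the
# REST connector rejects; wrapping it in an object is one possible pre-processing step.
import json
import requests

url = "https://api.census.gov/data/2020/acs/acs5/subject?get=group(S1903)&for=state:51"
rows = requests.get(url).json()   # top-level list, e.g. [["GEO_ID", "NAME", ...], ["0400000US51", ...]]

wrapped = {"value": rows}         # arbitrary wrapper key; the root is now a JSON object
with open("census_s1903.json", "w") as f:
    json.dump(wrapped, f)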

How can I import all records from an Airtable table using an Azure Synapse Analytics pipeline rather than just retrieving the first 100?

When using the REST integration in an Azure Synapse pipeline and supplying the proper authorization (api_key), I'm only getting 100 records loaded into my Azure Synapse data sink. How do I ensure all records are imported?
There is a pagination offset that appears in the JSON response from Airtable. On the Source tab of the copy data step in Synapse, under Pagination rules, select QueryParameter, enter "offset" (no quotes) in the field next to QueryParameter, and enter "$['offset']" (no quotes) as the Value. That's it; there is no need for a relative URL or a parameter configuration. The pagination rule tells Synapse to look for the data element "offset" in the response and to keep fetching more data until a response no longer contains that element in the JSON. See the screenshot below. The second screenshot shows the authorization configuration.
The authorization configuration for the Airtable API is shown below; it causes Synapse to send the HTTP header "Authorization: Bearer <api_key>" to the Airtable API. Just replace <api_key> with your Airtable API key, which can be found and/or created under your account settings in Airtable.
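For reference, the offset behaviour the QueryParameter rule above relies on is the same one you would follow by hand; a minimal Python sketch (the base ID, table name, and API key are placeholders):

# Minimal sketch of Airtable's offset pagination: keep requesting until the
# response no longer contains an "offset" field.
import requests

url = "https://api.airtable.com/v0/<base_id>/<table_name>"   # placeholders
headers = {"Authorization": "Bearer <api_key>"}              # placeholder API key

records, params = [], {}
while True:
    page = requests.get(url, headers=headers, params=params).json()
    records.extend(page.get("records", []))
    if "offset" not in page:            # no offset means this was the last page
        break
    params["offset"] = page["offset"]   # pass the offset back as a query parameter

print(len(records))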

Pagination error invalid token for Rest API Data Factory

I am trying to use REST API pagination in Azure ADF, writing to Blob storage, because right now it only sends the first page. I am currently using AbsoluteUrl with $['@odata.nextLink'] to go over all the pages, but I am getting the error below. I first used a token (Web) activity to get the token, then used it in a Copy activity whose source is a REST API dataset with headers coming dynamically from the token activity, and then enabled pagination. Can you point me in the right direction on whether this is the correct approach, or am I missing something?
This is what the import schema looks like:
And the error after importing the schema:
This is what my REST API configuration looks like:
And this is what my token call Web activity looks like:
Edit 2:
This is how the output is for Web activity:
Including the part of the snip that missed the access token:
This is the output for Copy Activity when Pagination is on:
This is the setup of the pipeline:
HttpStatusCode 401 indicates that authentication was not completed or failed due to invalid credentials. It may be that the access token is missing from the copy activity request, is not referenced properly, or has expired. Make sure you already have the right access to this API.
Here is an example with basic configuration requirements:
Get the access token
Ensure you are able to reference it dynamically using the Add dynamic content fields. Adjust the reference according to the output you received from the earlier Login activity.
Additional headers: Authorization: @concat('Bearer ', activity('Login').output.access_token)
AbsoluteUrl: ${result_root}.{nextPageURL}
Here is the official doc on pagination support; refer to the supported key and value pairs.
If you are getting the access token correctly but still seeing the error, try Import schema in the Mapping settings of the copy activity, and make sure the nextPageUrl (@odata.nextLink in your case) is mapped correctly.
Recheck the $['@odata.nextLink'] AbsoluteUrl value against the pattern:
$.rootElementName.CollectionOfItems.nextLinkURL
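If it helps to sanity-check the API outside ADF, the token-then-paginate flow the copy activity automates looks roughly like this minimal Python sketch (the token endpoint, credentials, and first-page URL are placeholders):

# Minimal sketch: obtain a bearer token, then follow @odata.nextLink page by page,
# which is what the AbsoluteUrl pagination rule does for you.
import requests

token_resp = requests.post(
    "https://<api_host>/oauth/token",                                        # placeholder token endpoint
    data={"grant_type": "client_credentials",
          "client_id": "<client_id>", "client_secret": "<client_secret>"},   # placeholder credentials
)
access_token = token_resp.json()["access_token"]

headers = {"Authorization": f"Bearer {access_token}"}   # note the space after 'Bearer'
url = "https://<api_host>/odata/items"                  # placeholder first-page URL

while url:
    page = requests.get(url, headers=headers)
    page.raise_for_status()                             # a 401 here means the token is missing, wrong, or expired
    body = page.json()
    # ... write body["value"] to the sink here ...
    url = body.get("@odata.nextLink")                   # None on the last page ends the loop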

How to copydata from RestAPI using datafactory and save it in Datalake?

I'm trying to fetch data from a REST API and save the JSON string into Data Lake, and I'm getting an error. I've followed the steps mentioned here:
https://learn.microsoft.com/en-us/azure/data-factory/connector-rest & https://www.alexvolok.com/2019/adfv2-rest-api-part1-oauth2/
The API which I'm trying to connect uses OAuth2 so I need to first get the access token and then do a get request to get actual data.
Below are the steps which I'm following
Creating a Web HTTP request in the pipeline and passing the client_ID, client secret, username, password and grant type in the body of the request. When I debug the pipeline I do get the access_token which I need in step 2.
In step 2 I have a copy activity which uses the output (access_token) from the Web activity to authenticate the second REST GET request, but this is where I'm facing a lot of issues. The code I'm using is "@concat('Bearer ', activity('GetAccessToken').output.access_token)".
In step 3 I have two datasets and two linked services. Dataset 1 is a REST dataset with the base URL and relative URL, linked to the REST linked service; the sink dataset is connected to Azure Data Lake Storage.
In the source dataset I'm passing the additional header Authorization = @concat('Bearer ', activity('GetAccessToken').output.access_token). Since the API I want to call returns an empty response if no parameters are sent, I pass the parameters inside the "Request body" - is that even correct? The request body would look something like "start_date=2020/07/17&end_date=2020/07/18".
The sink is a simple Json dataset stored in DataLake.
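For comparison, the call described in step 3 above would look roughly like this outside ADF (a minimal Python sketch; the host and relative URL are placeholders, and it simply mirrors the header and request body described):

# Minimal sketch of the request the copy activity is meant to issue:
# bearer token in the Authorization header, date range in the request body.
import requests

access_token = "<access_token from the GetAccessToken web activity>"   # placeholder

resp = requests.get(
    "https://<api_host>/<relative_url>",                 # placeholder; note it must be HTTPS (see the answer below)
    headers={"Authorization": f"Bearer {access_token}"},
    data="start_date=2020/07/17&end_date=2020/07/18",    # body as described above
)
print(resp.status_code, resp.text[:200])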
When I try to debug, I get the below error:
{
"errorCode": "2200",
"message": "Failure happened on 'Source' side. ErrorCode=UserErrorHttpStatusCodeIndicatingFailure,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=The HttpStatusCode 401 indicates failure. { \"Error\": { \"Message\":\"Authentication failed: Invalid headers\", \"Server-Time\":\"2020-07-27T06:59:24\", \"Id\":\"6AAF87BC-5634-4C28-8626-810A19B86BFF\" } },Source=Microsoft.DataTransfer.ClientLibrary,'",
"failureType": "UserError",
"target": "CopyDataFromAPI",
"details": []
}
Please advise if I'm doing anything wrong.
I knew this was a simple issue. So, for people who are looking for answers:
Please make sure the REST source URL starts with HTTPS:// instead of HTTP://. I guess Azure does not pass headers to a URL that starts with HTTP://, which is strange because Postman and a Python script have no problem sending the headers.
