How to copy data from a REST API using Data Factory and save it in Data Lake? - azure

I'm trying to fetch data from a REST API and save the JSON string into Data Lake, but I'm getting an error. I've followed the steps described here:
https://learn.microsoft.com/en-us/azure/data-factory/connector-rest & https://www.alexvolok.com/2019/adfv2-rest-api-part1-oauth2/
The API I'm trying to connect to uses OAuth2, so I first need to get an access token and then make a GET request to fetch the actual data.
These are the steps I'm following:
1. Create a Web activity (HTTP request) in the pipeline and pass the client ID, client secret, username, password and grant type in the request body. When I debug the pipeline, I do get the access_token that I need in step 2.
2. A Copy activity uses the output (access_token) of the Web activity to authenticate the second REST GET request, and this is where I'm running into problems. The expression I'm using is "@concat('Bearer ', activity('GetAccessToken').output.access_token)".
3. I have two datasets and two linked services. The first is a REST dataset with the base URL and relative URL, linked to the REST linked service; the second is the sink dataset, connected to Azure Data Lake Storage.
In the source dataset I pass the additional header Authorization = @concat('Bearer ', activity('GetAccessToken').output.access_token). The API I want to call returns an empty result if no parameters are sent, so I pass the parameters in the "Request body". Is that even correct? The request body looks like this: "start_date=2020/07/17&end_date=2020/07/18".
The sink is a simple JSON dataset stored in Data Lake.
When I debug the pipeline, I get the error below:
{
"errorCode": "2200",
"message": "Failure happened on 'Source' side. ErrorCode=UserErrorHttpStatusCodeIndicatingFailure,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=The HttpStatusCode 401 indicates failure. { \"Error\": { \"Message\":\"Authentication failed: Invalid headers\", \"Server-Time\":\"2020-07-27T06:59:24\", \"Id\":\"6AAF87BC-5634-4C28-8626-810A19B86BFF\" } },Source=Microsoft.DataTransfer.ClientLibrary,'",
"failureType": "UserError",
"target": "CopyDataFromAPI",
"details": []
}
Please advise if I'm doing anything wrong.

I knew this was a simple issue, so for anyone looking for an answer:
Make sure the REST source URL starts with https:// instead of http://. I guess Azure does not pass headers to a URL that starts with http://, which is strange because Postman and a Python script have no problem sending the headers.
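For anyone reproducing this outside Data Factory, here is a minimal Python sketch of the two things the pipeline depends on: a header value in the form "Bearer <token>" (with a space) and a source URL that uses HTTPS. The function names and the example.com URL are made up for illustration; nothing here is part of the Data Factory API.

```python
from urllib.parse import urlsplit


def build_bearer_headers(access_token: str) -> dict:
    """Build the Authorization header the GET request is expected to carry."""
    # The space after 'Bearer' matters; it mirrors concat('Bearer ', token).
    return {"Authorization": f"Bearer {access_token}"}


def require_https(url: str) -> str:
    """Return the URL unchanged if it uses HTTPS, otherwise raise."""
    scheme = urlsplit(url).scheme.lower()
    if scheme != "https":
        raise ValueError(f"expected an https:// URL, got scheme {scheme!r}")
    return url


if __name__ == "__main__":
    # Hypothetical endpoint, used only to show the check.
    print(require_https("https://api.example.com/data"))
    print(build_bearer_headers("my-token"))
```

Running a check like require_https on the base URL before configuring the linked service would have caught the http:// mistake early.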

Related

Webhook validation handshake failed

I'm new to Azure and want to create an Event Subscription that pushes any changes (here: resource write success) from my blob storage to my Logic App via HTTP. In my Logic App I have a "When a HTTP request is received" trigger, which runs when I send a request.
Here is how I'm trying to create the subscription:
NAME: 'i dont think this matters'
...
TOPIC NAME: 'just gave it a Random Name, should this refer to sth?'
Source Resource:'My Storage Account'
ENDPOINT: 'I got the URL from the Overview page of my Logic App (Workflow URL)'
URL looks like this:
https://LOGICAPPNAME.azurewebsites.net:443/api/APPNAME/triggers/manual/invoke?api-version=2022-05-01&sp=%2Ftriggers%2Fmanual%2Frun&sv=1.0&sig=RjVKZbs-0CV559hZYlFfhM0k22W39lS5
When I copy and paste this URL into my browser, I can trigger my Logic App. I think that acts as a GET; I'm not sure whether it is sent as a POST or something else, or whether that would make any difference.
I got this error:
Deployment has failed with the following error: {"code":"Url validation","message":"Webhook validation handshake failed for https://LOGICAPPNAME.azurewebsites.net/api/APPNAME/triggers/manual/invoke. Http POST request retuned 2XX response with response body . When a validation request is accepted without validation code in the response body, Http GET is expected on the validation url included in the validation event(within 10 minutes). For troublehooting, visit https://aka.ms/esvalidation. Activity id:ID, timestamp: DATE TIME"}
If it's still unclear how I'm doing this, I'm trying to follow THIS EXAMPLE. How can I fix this error?
After reproducing this on my end, I was able to achieve it by following the link you provided and using the details below to create my Logic App.
The reason you are receiving that error is that you need to use the Logic App's HTTP trigger request URL as the webhook endpoint, not the Logic App's workflow URL.
NOTE: You don't have to call the trigger yourself again; whenever you make a change to your storage account, the Logic App gets triggered.
RESULTS:
When Blob is Created
When a Blob is Deleted
REFERENCES: '"When blob is added or modified" trigger will not be fired on subfolder', answered by SamaraSoucy-MSFT - Microsoft Q&A
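The error text in the question refers to the Event Grid validation handshake: before delivering events, Event Grid posts a validation event to the endpoint and expects the validation code to be echoed back in the response body. The Python sketch below shows what a custom webhook would do to pass that handshake. It is only an illustration of the handshake itself, not of how Logic Apps handles it internally, and the function name is made up.

```python
import json

VALIDATION_EVENT_TYPE = "Microsoft.EventGrid.SubscriptionValidationEvent"


def handle_event_grid_post(raw_body: str):
    """Return (status_code, response_body) for one Event Grid delivery.

    Event Grid delivers a JSON array of events. If one of them is the
    subscription validation event, echo its validation code back.
    """
    events = json.loads(raw_body)
    for event in events:
        if event.get("eventType") == VALIDATION_EVENT_TYPE:
            code = event["data"]["validationCode"]
            return 200, json.dumps({"validationResponse": code})
    # Ordinary events (for example blob created/deleted): acknowledge only.
    return 200, ""


if __name__ == "__main__":
    sample = json.dumps([
        {"eventType": VALIDATION_EVENT_TYPE,
         "data": {"validationCode": "example-code"}}
    ])
    print(handle_event_grid_post(sample))
```

An endpoint that returns 2XX with an empty body to the validation event, as in the error above, has not completed the handshake.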

Pagination error invalid token for Rest API Data Factory

I am trying to paginate a REST API in Azure Data Factory, because only the first page is being written to blob storage. I am currently using AbsoluteUrl with $['@odata.nextLink'] to iterate over all the pages, but I am getting the error shown below. I first use a token activity to get the token, then use it in a Copy activity whose source is a REST dataset with headers coming dynamically from the token activity, and then apply pagination. Can you tell me whether this is the correct approach, or whether I am missing something?
This is what the import schema looks like:
And this is the error after importing the schema:
This is what my REST API configuration looks like:
And this is what my token call Web activity looks like:
Edit 2:
This is the output of the Web activity:
Including the part of the snip that was missing the access token:
This is the output of the Copy activity when pagination is on:
This is the setup of the pipeline:
HttpStatusCode 401 indicates that authentication was not completed or failed because of invalid credentials. It may be that the access token is missing from the Copy activity's request, is not referenced properly, or has expired. Make sure you have the right access to this API.
Here is an example with the basic configuration requirements:
Get the access token.
Ensure you can reference it dynamically using the "Add dynamic content" fields. Adjust the reference to match the output you received from the earlier Login activity.
Additional headers: Authorization: @concat('Bearer ', activity('Login').output.access_token)
AbsoluteUrl: ${result_root}.{nextPageURL}
See the official documentation on pagination support for the supported key and value pairs.
If you are getting the access token correctly but still see the error, try Import Schema in the Mapping settings of the Copy activity, and make sure nextPageUrl (odata.nextLink in your case) is mapped correctly.
Recheck $['@odata.nextLink'] and the AbsoluteUrl value, which should have the form:
$.rootElementName.CollectionOfItems.nextLinkURL
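To make the pagination rule concrete, here is a small Python sketch of what following an absolute next-page link means: fetch a page, read the "@odata.nextLink" property from the response body, and request that URL next until the property is absent. The fetch_json callable and the max_pages guard are assumptions added for illustration; in a real client fetch_json would perform an authenticated GET.

```python
from typing import Callable, Dict, Iterator, Optional


def iter_odata_pages(
    first_url: str,
    fetch_json: Callable[[str], Dict],
    max_pages: int = 1000,
) -> Iterator[Dict]:
    """Yield each page, following '@odata.nextLink' until it is missing."""
    url: Optional[str] = first_url
    pages = 0
    while url and pages < max_pages:
        page = fetch_json(url)
        yield page
        # The next absolute URL lives at the root of the response body.
        url = page.get("@odata.nextLink")
        pages += 1


if __name__ == "__main__":
    # Fake two-page API standing in for real HTTP calls.
    fake_api = {
        "https://api.example.com/items": {
            "value": [1, 2],
            "@odata.nextLink": "https://api.example.com/items?page=2",
        },
        "https://api.example.com/items?page=2": {"value": [3]},
    }
    for page in iter_odata_pages("https://api.example.com/items", fake_api.__getitem__):
        print(page["value"])
```

If the token expires partway through a long run, later pages will return 401 even though the first page succeeded, which is worth ruling out when only the first page arrives.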

How to pass Body Parameters(format) when calling a POST request with Content-Type as form-data in Azure Data Factory

I am trying to call an API endpoint with POST and Content-Type form-data using an Azure Data Factory Web activity. I tried different ways of passing the body parameters, but they all failed.
Here is the Postman Request.
Here is the Azure Data Factory Web activity configuration. (I used the body as JSON and tried different combinations, but none of them worked.)
And above is the error message.
Any help would be highly appreciated.
Since your request executes successfully from Postman, try copying the entire body from there and using it in the Web activity.
The format for passing the body of a POST request from a Web activity is shown here.
Also make sure you have entered a valid URL (target endpoint and path). This error commonly appears when the activity expects a public endpoint but the URL is hosted in a private virtual network. The Web activity can also invoke URLs hosted in a private virtual network by using a self-hosted integration runtime; the integration runtime must have a line of sight to the URL endpoint.
Note: The activity will time out after 1 minute with an error if it does not receive a response from the endpoint.
Going through some similar scenarios, a few further points came up:
The header is mostly passed as a string in the Web activity, whereas in Postman it is an integer/long.
If your API tries to redirect, it seems that the Web activity in Azure Data Factory does not currently support following redirects, whereas Postman and other tools and libraries usually follow redirects by default or include an option for handling them.
Check out the supported authentication types in the Web activity. If you are trying to authorize your request, try setting the following.
URL: https://login.microsoftonline.com/<<tenantid>>/oauth2/token
Headers: Content-Type - "application/x-www-form-urlencoded"
Body: grant_type=client_credentials&client_id=<<clientid>>&client_secret=<<secret>>&resource=https%3A%2F%2Fmicrosoft.onmicrosoft.com%2F<<resourceId>>
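The body above is a URL-encoded form string, not JSON. As a sketch of how such a string is produced, the Python standard library can build it from key/value pairs and take care of the percent-encoding. The function name and the sample values are placeholders, not real credentials.

```python
from urllib.parse import urlencode


def build_token_request_body(client_id: str, client_secret: str, resource: str) -> str:
    """Build an application/x-www-form-urlencoded body for a token request."""
    # urlencode percent-encodes reserved characters such as ':' '/' and '&'.
    return urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "resource": resource,
    })


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(build_token_request_body(
        "my-id", "s&cret", "https://microsoft.onmicrosoft.com/abc"
    ))
```

The resulting string can be pasted into the Web activity body as-is, together with the Content-Type header shown above.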
Error code: 2108:
Message: Error calling the endpoint '%url;'. Response status code: '%code;'
Cause: The request failed due to an underlying issue such as network connectivity, a DNS failure, a server certificate validation, or a timeout.
Workaround: Make the API call using Powershell, and call that Powershell script from within Data Factory.

How to read the async URL returned (in the Location header) by a Logic App authenticated through OAuth?

I've created a Logic App that is configured to authenticate using Azure AD OAuth, as described here:
https://learn.microsoft.com/en-us/azure/logic-apps/logic-apps-securing-a-logic-app
This Logic App takes a long time to execute, so to avoid timeouts the response was configured with the asynchronous pattern described in:
https://learn.microsoft.com/en-us/azure/azure-resource-manager/management/async-operations
Authentication with the Bearer token works, and the Logic App responds with a URL (in the Location header) where the final response will eventually be stored.
The problem is that I can't access this URL, because I receive the following error:
{
"error": {
"code": "DirectApiAuthorizationRequired",
"message": "The request must be authenticated only by Shared Access scheme."
}
}
The URL returned in the Location header only contains SAS keys when I run the Logic App using SAS, and I need to use only OAuth for security reasons.
If I try to access this URL using the Bearer token, the response is:
{
"error": {
"code": "InvalidUseOfOAuthToken",
"message": "The requested operation is not supported, Use of open authentication token is only supported for workflow trigger request."
}
}
Here is an example of the URL:
https://prod-05.southcentralus.logic.azure.com/workflows/a98d6ba3becd449db74ac0527a64ec57/runs/08585941366423271731798768425CU04/operations/c4d9cb98-03b3-4c44-87c3-5752c2ed403c?api-version=2016-10-01
So, given that it is not possible to access this URL using OAuth, how can I force the Location header to include the SAS parameters when the Logic App is called using OAuth?
In the end, the proposed solution was to use the "Create expiring callback URLs" functionality for async Logic Apps:
https://learn.microsoft.com/en-us/azure/logic-apps/logic-apps-securing-a-logic-app#create-expiring-callback-urls
This consists of creating a temporary URL for each execution. These URLs contain SAS parameters that expire on a given date. They are created by calling the ARM REST API with a Bearer token and specifying the date on which the URL stops working. This way, using SAS is not a problem because the URLs are temporary.
https://learn.microsoft.com/en-us/rest/api/logic/workflowversions/listcallbackurl
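As a rough sketch of the "specify the date on which it stops working" step, the Python snippet below builds a JSON request body with an expiry timestamp. The field names notAfter and keyType follow my reading of the listCallbackUrl reference linked above and should be checked against it; the function name is made up, and the ARM request URL and Bearer-token call are deliberately left out.

```python
import json
from datetime import datetime, timedelta, timezone
from typing import Optional


def build_callback_url_request(valid_for: timedelta, now: Optional[datetime] = None) -> str:
    """Build a JSON body asking for a callback URL that expires after valid_for."""
    now = now or datetime.now(timezone.utc)
    # Expiry as an ISO 8601 UTC timestamp.
    not_after = (now + valid_for).strftime("%Y-%m-%dT%H:%M:%SZ")
    # Assumed field names; verify against the listCallbackUrl documentation.
    return json.dumps({"notAfter": not_after, "keyType": "Primary"})


if __name__ == "__main__":
    print(build_callback_url_request(timedelta(hours=1)))
```

Keeping the validity window short limits how long a leaked SAS URL would remain usable.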

SharePoint API: Invalid Access Token Resource

I am trying to obtain an access token for use with the SharePoint REST API. For my organization's base site, I am able to obtain a token and use it to make subsequent requests successfully.
Next, I followed the same process and created more app permissions for a different site: {{tenant removed}}/sites/testsite. Initially I could not create the token request because the resource parameter was not valid (see image below):
Following the URI encoding standards, I replaced the "/" in the site URL with "%2f", and I was then able to get a token (see image below):
However, requests to the API using that token fail:
{
"error_description":
"Exception of type 'Microsoft.IdentityModel.Tokens.AudienceUriValidationFailedException' was thrown."
}
In the response header:
3000003;reason="Invalid audience Uri '00000003-0000-0ff1-ce00-000000000000/{{tenant removed}}%2fsites%2f{{removed}}@{{realm removed}}'.";category="invalid_client"
Did I encode the resource incorrectly? What am I missing? How can I use this method to get information from the other site?
I can see many developers making the same assumption when they create requests, since almost no documentation points out this scenario. You can obtain a token for the site as long as the resource is in a valid URI format, because no validation is done on the URI itself. But even though you get a token, it will not work for any requests.
When fetching the access token for subsites (i.e. {{tenant}}/sites/testsite), the resource part of the request body does not need to be modified.
So, for example, when you are getting a token for test.sharepoint.com/sites/testsite, the resource in the request body should just be:
00000003-0000-0ff1-ce00-000000000000/test.sharepoint.com@{{realm}} (without /sites/testsite)
However, when you make HTTP requests to the API with the token, you should use the full site URL. Example:
https://test.sharepoint.com/sites/testsite/_api/web/
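The rule can be sketched in Python: the token resource uses only the host name of the site, in the form principal/host@realm, while API calls use the full site URL. The function names and the sample realm value are illustrative, not part of any SharePoint library.

```python
from urllib.parse import urlsplit

# SharePoint principal ID, as it appears in the resource string above.
SHAREPOINT_PRINCIPAL = "00000003-0000-0ff1-ce00-000000000000"


def token_resource(site_url: str, realm: str) -> str:
    """Build the token 'resource' value: host name only, no /sites/... path."""
    host = urlsplit(site_url).hostname
    if not host:
        raise ValueError(f"cannot extract a host name from {site_url!r}")
    return f"{SHAREPOINT_PRINCIPAL}/{host}@{realm}"


def api_url(site_url: str, endpoint: str = "_api/web/") -> str:
    """Build the REST URL: the full site URL, including the subsite path."""
    return site_url.rstrip("/") + "/" + endpoint.lstrip("/")


if __name__ == "__main__":
    site = "https://test.sharepoint.com/sites/testsite"
    print(token_resource(site, "my-realm"))  # placeholder realm
    print(api_url(site))
```

Deriving both values from the same site URL avoids the temptation to percent-encode the subsite path into the resource.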
