How to get the creation datetime of a blob file from Azure Data Factory or Azure Data Flow - azure

I can fetch the last modified date of a file from the Get Metadata activity, but the requirement is the creation date of the file. I can see no option to fetch that in Azure Data Factory. Besides, I need to fetch:
First row from fileCreationDateTime --- descending order. So I need the same CreationDateTime functionality as LastModified.
As shown in the picture, there are two things: CREATION TIME and LAST MODIFIED. So I need to fetch all the creation times, sort them in descending order, and pick the first row.

You can use a Web activity in ADF and make a call to the Get Blob REST API, which returns the blob creation date in the Web activity's response headers section; you can then use that value as per your requirement.
Get Blob REST API sample URI: https://myaccount.blob.core.windows.net/mycontainer/myblob
Ref document: Get Blob - Response Headers
You can capture the blob creation datetime from the REST API response headers in ADF using the dynamic expression below:
@activity('WebGetBlobDetails').output.ADFWebActivityResponseHeaders['x-ms-creation-time']
Here is a sample demo. I have used a hard-coded bearer token in the headers for a quick demo, but you can automate this and pass the token dynamically by adding another Web activity that gets the bearer token before the WebGetBlobDetails activity.
Sample GIF:
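For a quick sanity check outside ADF, the same call can be made with a short script. This is only a sketch: the blob URL matches the sample URI above, and the bearer token is a placeholder you would obtain for the storage resource. The creation datetime comes back in the x-ms-creation-time response header, exactly as in the Web activity output.
import requests

# Placeholder token; in ADF this is what a preceding token Web activity would fetch.
blob_url = "https://myaccount.blob.core.windows.net/mycontainer/myblob"
bearer_token = "<bearer-token>"

resp = requests.get(blob_url, headers={
    "Authorization": f"Bearer {bearer_token}",
    "x-ms-version": "2021-08-06",  # a service version recent enough to return x-ms-creation-time
})
resp.raise_for_status()

# The blob creation datetime is a response header, not part of the body.
print(resp.headers["x-ms-creation-time"])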

Related

Azure Data Factory - Retrieve next pagination link (decoded) from response headers in a copy data activity of Azure Data Factory

I have created a Copy data activity in Azure Data Factory; this pipeline pulls data from an API (via a REST source) and writes the response body (JSON) to a file in Azure Blob Storage.
The API I am fetching the response from is paginated, and the link to the next page is sent in the response headers at response->headers->link.
The URL to the next page is in the following general format:
<https%3A%2F%2FsomeAPI.com%2Fv2%2FgetRequest%3FperPage%3D80%26sortOrder%3DDESCENDING%26nextPageToken%3DVAdjkjklfjjgkl>; rel="next"
I want to fetch the next page token present in the above URL and use it in the pagination rule.
I have tried using some pagination rules:
> AbsoluteURL = Headers.link
But this did not work, as the entire encoded link shown above gets appended directly, and hence the pipeline throws an error.
> Query Parameters
I have also tried using query parameters but could not get any result.
I have followed related questions on Stack Overflow and have read the documentation.
Please help me with how I can access this next page token, or with a pagination rule that supports this scenario.
Pasting the Postman output and ADF data pipeline for reference.
Postman Response Headers Output
Pagination Rules, I need help on
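For reference, this is the transformation the pagination rule needs to replicate: URL-decode the link header and pull out the nextPageToken query parameter. A minimal sketch using the sample header value from above:
from urllib.parse import unquote, urlparse, parse_qs

# Sample value of response->headers->link, copied from the question
link_header = '<https%3A%2F%2FsomeAPI.com%2Fv2%2FgetRequest%3FperPage%3D80%26sortOrder%3DDESCENDING%26nextPageToken%3DVAdjkjklfjjgkl>; rel="next"'

# Drop the rel="next" part and the angle brackets, then URL-decode
encoded_url = link_header.split(';')[0].strip('<>')
decoded_url = unquote(encoded_url)  # https://someAPI.com/v2/getRequest?perPage=80&sortOrder=DESCENDING&nextPageToken=VAdjkjklfjjgkl

# Extract the nextPageToken query parameter
token = parse_qs(urlparse(decoded_url).query)['nextPageToken'][0]
print(token)  # VAdjkjklfjjgkl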

Azure Data Factory - how to save the rowCount of copied records to a file in ADLS Gen2 or send this information as an email notification?

I want to save the output values of rowsRead (source) and rowsCopied (sink) to a file in ADLS Gen2, or send them as an email notification.
This is the output I want to store in a file or send as an email notification (see picture).
You can get the number of rows read and the number of rows written from the Copy activity output.
To avoid complicated dynamic content in the Web activity body, first store these two values in string variables.
For rows read, use the dynamic content below:
@string(activity('Copy data1').output.rowsRead)
For rows written, use the expression below:
@string(activity('Copy data1').output.rowsCopied)
For mail, you can use Logic Apps. Use a Web activity to invoke the logic app.
Create a parameter for the mail receiver in the pipeline.
Web activity:
In the body of the Web activity, give the dynamic content below.
{
"message" : "This is a custom dynamic message from your pipeline with run ID @{pipeline().RunId} and rows read are @{variables('rowsread')} and rows written are @{variables('rowswritten')}.",
"dataFactoryName" : "@{pipeline().DataFactory}",
"pipelineName" : "@{pipeline().Pipeline}",
"receiver" : "@{pipeline().parameters.receiver}"
}
For the logic app workflow, please go through this official Microsoft documentation, which has a step-by-step explanation of sending mail from ADF using a logic app.
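If you want to sanity-check the Logic App's HTTP trigger outside ADF, the same payload can be posted with a short script. This is only a sketch: the trigger URL and the sample values are placeholders standing in for what the pipeline supplies at run time.
import requests

# Placeholder Logic App HTTP trigger URL and sample values
logic_app_url = "https://<logic-app-http-trigger-url>"
payload = {
    "message": "This is a custom dynamic message from your pipeline with run ID <run-id> and rows read are 100 and rows written are 100.",
    "dataFactoryName": "<data-factory-name>",
    "pipelineName": "<pipeline-name>",
    "receiver": "someone@example.com",
}

resp = requests.post(logic_app_url, json=payload)
print(resp.status_code)  # 200/202 means the workflow was triggered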

How can I import all records from an Airtable table using an Azure Synapse Analytics pipeline rather than just retrieving the first 100?

When using the REST integration in an Azure Synapse pipeline and supplying the proper authorization (api_key), I'm only getting 100 records loaded into my Azure Synapse data sink. How do I ensure all records are imported?
There is a pagination offset that appears in the JSON response of Airtable. On the Source tab of the copy data step in Synapse, under Pagination rules, select QueryParameter, enter "offset" (no quotes) into the field next to QueryParameter, and enter "$['offset']" (no quotes) into the Value. That's it - no need for a relative URL or a parameter configuration. The pagination rule tells Synapse to look for the data element "offset" in the response and to continue fetching more data until a response no longer contains that data element in the JSON. See the screenshot below. The second screenshot shows the authorization configuration.
The authorization configuration for the Airtable API is shown below - it causes Synapse to send the HTTP header "Authorization: Bearer <api_key>" to the Airtable API. Just replace <api_key> with your Airtable API key, which can be found and/or created under your account settings in Airtable.
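For illustration, this is the loop the pagination rule automates: keep requesting pages and passing back the offset value from each response until it disappears. The base ID, table name, and API key below are placeholders from your Airtable account.
import requests

base_id = "<base-id>"
table_name = "<table-name>"
api_key = "<api_key>"

url = f"https://api.airtable.com/v0/{base_id}/{table_name}"
headers = {"Authorization": f"Bearer {api_key}"}

records = []
params = {}
while True:
    page = requests.get(url, headers=headers, params=params).json()
    records.extend(page["records"])  # at most 100 records per page
    # Airtable includes an 'offset' field while more pages remain; pass it back
    # as the 'offset' query parameter, exactly what the QueryParameter rule does.
    if "offset" not in page:
        break
    params = {"offset": page["offset"]}

print(len(records))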

If two or more metadata headers with the same name are submitted for a resource, would the Blob service return 200 or 400? (Azure Blob Service)

According to
https://learn.microsoft.com/en-us/learn/modules/work-azure-blob-storage/5-set-retrieve-properties-metadata-rest -
"If two or more metadata headers with the same name are submitted for a resource, the Blob service returns status code 400 (Bad Request)".
But according to
https://learn.microsoft.com/en-us/learn/modules/work-azure-blob-storage/4-manage-container-properties-metadata-dotnet -
"If two or more metadata headers with the same name are submitted for a resource, Blob storage comma-separates and concatenates the two values and return HTTP response code 200 (OK)".
Well, which is it?
Am I missing something?
Actually, both of them are true.
The first one is for the REST API operation. If you call the REST API directly and set two metadata headers with the same name, the request will fail with 400 (Bad Request).
The second one is for when you are using the .NET SDK. Here, if you set two metadata items with the same name, the SDK combines them and sends a single header to the REST API.
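To see the REST-level behavior directly, you need a client that can send two headers with the same name (most high-level libraries store headers in a dictionary and silently deduplicate them). A rough sketch with Python's low-level http.client, using a placeholder account, blob, and SAS token, of a Set Blob Metadata call that the quoted documentation says should be rejected:
import http.client

# Placeholders: account, container, blob, and SAS token
host = "myaccount.blob.core.windows.net"
path = "/mycontainer/myblob?comp=metadata&<sas-token>"

conn = http.client.HTTPSConnection(host)
conn.putrequest("PUT", path)
conn.putheader("x-ms-version", "2021-08-06")
conn.putheader("Content-Length", "0")
# Two metadata headers with the same name, sent as-is over REST
conn.putheader("x-ms-meta-category", "alpha")
conn.putheader("x-ms-meta-category", "beta")
conn.endheaders()

resp = conn.getresponse()
print(resp.status)  # per the quoted REST docs, expect 400 (Bad Request)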

Pagination error invalid token for Rest API Data Factory

I am trying to use the REST API with pagination in Azure ADF, writing to blob storage, because at the moment only the first page is being sent. I am currently using AbsoluteUrl with $['@odata.nextLink'] to get through all the pages. The issue is that I am getting this error. I first used a token web activity to get the token, then used it in a copy activity whose source is a REST API dataset with headers coming dynamically from the token activity, and then enabled pagination. Can you point me in the right direction on whether this is the correct approach, or am I missing something?
This is what the import schema looks like:
And the error after importing the schema:
This is what my REST API configuration looks like:
And this is what my token call Web activity looks like:
Edit 2:
This is the output of the Web activity:
Including the part of the snip that missed the access token:
This is the output for Copy Activity when Pagination is on:
This is the setup of the pipeline:
HttpStatusCode 401 indicates that authentication was not completed or failed due to invalid credentials. It may be that the access token is missing from the copy activity request, is not referenced properly, or has expired. Make sure you have the right access to this API.
Here is an example with basic configuration requirements:
Get the access token
Ensure you are able to reference it dynamically using the Add dynamic content fields. Modify the reference according to the output you received from the earlier Login activity.
Additional headers: Authorization: @{concat('Bearer ', activity('Login').output.access_token)}
AbsoluteUrl: ${result_root}.{nextPageURL}
Here is the official doc on pagination support; refer to the supported key and value pairs.
If you are getting the access token correctly but still seeing the error, try Import schema in the Mapping settings of the copy activity, and make sure that nextPageUrl (or @odata.nextLink in your case) is mapped correctly.
Recheck the $['@odata.nextLink'] AbsoluteUrl value; it should follow the pattern:
$.rootElementName.CollectionOfItems.nextLinkURL
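Put together, this is roughly what the token activity plus the paginated copy activity do under the hood. Everything below is a placeholder sketch: the token endpoint, credentials, and API URL stand in for your own service.
import requests

# Placeholders for your token endpoint and source API
token_url = "https://<your-token-endpoint>"
api_url = "https://<your-api>/items"

# 1. Get the access token (what the token web activity does)
token_resp = requests.post(token_url, data={
    "grant_type": "client_credentials",
    "client_id": "<client-id>",
    "client_secret": "<client-secret>",
})
access_token = token_resp.json()["access_token"]

# 2. Follow @odata.nextLink until it is absent (what the AbsoluteUrl pagination rule automates)
headers = {"Authorization": "Bearer " + access_token}  # note the space after 'Bearer'
url = api_url
while url:
    page = requests.get(url, headers=headers).json()
    # ... process page["value"] here ...
    url = page.get("@odata.nextLink")  # missing on the last page, which ends the loop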
