Azure Data Factory specify custom output filename when copying to Blob Storage

I'm currently using ADF to copy files from an SFTP server to Blob Storage on a scheduled basis.
The filename structure is AAAAAA_BBBBBB_CCCCCC.txt.
Is it possible to rename the file before copying to Blob Storage so that I end up with a folder-like structure like below?
AAAAAA/BBBBBB/CCCCCC.txt

Here is what worked for me
I created three parameters in my Blob Storage dataset: FileName, Timestamp and FileExtension.
I set the file name and the file extension; for Timestamp you can enter anything, just to satisfy the ADF requirement that a parameter can't be empty, since it gets overwritten at run time anyway.
Next, click on the Connection tab and add the following expression in the File name box: @concat(dataset().FileName, dataset().Timestamp, dataset().FileExtension). This simply concatenates the three parameters, so you end up with something like "FileName_Timestamp_FileExtension".
Next, click on your pipeline and select your Copy Data activity. Click on the Sink tab, find the Timestamp parameter under Dataset properties and set it to this expression: @pipeline().TriggerTime.
Finally, publish your pipeline and run/debug it. If it worked for me then I am sure it will work for you as well :)
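For reference, a minimal sketch of what the sink dataset JSON could look like with those three parameters (the dataset name, linked service name and container below are made up for illustration; only the parameter names and the @concat expression come from the steps above):
{
    "name": "SinkBlobDataset",
    "properties": {
        "linkedServiceName": {
            "referenceName": "AzureBlobStorageLinkedService",
            "type": "LinkedServiceReference"
        },
        "parameters": {
            "FileName": { "type": "String" },
            "Timestamp": { "type": "String" },
            "FileExtension": { "type": "String" }
        },
        "type": "AzureBlob",
        "typeProperties": {
            "folderPath": "output",
            "fileName": {
                "value": "@concat(dataset().FileName, dataset().Timestamp, dataset().FileExtension)",
                "type": "Expression"
            }
        }
    }
}
In the Copy Data activity's sink, the Timestamp dataset property is then set to @pipeline().TriggerTime as described above.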

With ADF V2, you could do that. First, use a lookup activity to get all the filenames of your source.
Then chain a ForEach activity to iterate over the source file names. The ForEach activity contains a Copy activity. Both the source dataset and the sink dataset of the Copy activity have parameters for the file name and folder path.
You could use split and replace functions to generate the sink folder path and filename based on your source file names.
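For example, for a source file named AAAAAA_BBBBBB_CCCCCC.txt, the sink dataset reference inside the ForEach could derive the folder path and file name roughly like this (the dataset and parameter names are illustrative; only the split idea comes from this answer):
"outputs": [
    {
        "referenceName": "SinkBlobDataset",
        "type": "DatasetReference",
        "parameters": {
            "folderPath": {
                "value": "@concat(split(item().name, '_')[0], '/', split(item().name, '_')[1])",
                "type": "Expression"
            },
            "fileName": {
                "value": "@split(item().name, '_')[2]",
                "type": "Expression"
            }
        }
    }
]
This would turn AAAAAA_BBBBBB_CCCCCC.txt into the folder path AAAAAA/BBBBBB with the file CCCCCC.txt.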

First you have to get the file names with a Get Metadata activity. You can then use them as a parameter in a Copy activity and rename the files.
As mentioned in the previous answer, you can use a replace function to do this:
{
    "name": "TgtBooksBlob",
    "properties": {
        "linkedServiceName": {
            "referenceName": "Destination-BlobStorage-data",
            "type": "LinkedServiceReference"
        },
        "folder": {
            "name": "Target"
        },
        "type": "AzureBlob",
        "typeProperties": {
            "fileName": {
                "value": "@replace(item().name, '_', '\\')",
                "type": "Expression"
            },
            "folderPath": "data"
        }
    },
    "type": "Microsoft.DataFactory/factories/datasets"
}
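To feed item().name, the dataset above would typically be used inside a ForEach activity that iterates over the Get Metadata output; a rough sketch, assuming a Get Metadata activity named GetFileNames that requests the childItems field (the activity names here are made up):
{
    "name": "ForEachFile",
    "type": "ForEach",
    "dependsOn": [
        {
            "activity": "GetFileNames",
            "dependencyConditions": [ "Succeeded" ]
        }
    ],
    "typeProperties": {
        "items": {
            "value": "@activity('GetFileNames').output.childItems",
            "type": "Expression"
        },
        "activities": [
            {
                "name": "CopyAndRename",
                "type": "Copy",
                ...
            }
        ]
    }
}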

Related

Reading files from Azure blob into Function App in Python

I'm trying to read multiple files of the same type recursively from a container in Azure Blob Storage, in Python with a Function App. How could that be done using the bindings in the orchestrator's function.json shown below? What appropriate changes should be made in the local settings, given that I've already specified the connection strings and blob paths there?
{
    "scriptFile": "__init__.py",
    "bindings": [
        {
            "name": "context",
            "type": "orchestrationTrigger",
            "direction": "in"
        },
        {
            "name": "inputblob",
            "type": "blob",
            "dataType": "string",
            "path": "test/{file_name}.pdf{queueTrigger}",
            "connection": "CONTAINER_CONN_STR",
            "direction": "in"
        }
    ]
}
test: the directory I have.
CONTAINER_CONN_STR: the connection string already specified in the local settings.
Also, when I try the normal method without bindings, I get the error below while downloading the files to the local system:
Exception: PermissionError: [Errno 13] Permission denied: 'analytics_durable_activity/'
Stack: File "C:\Program Files\Microsoft\Azure Functions Core Tools\workers\python\3.8\WINDOWS\X64\azure_functions_worker\dispatcher.py", line 271, in _handle__function_load_request
func = loader.load_function(
How could that be done using the bindings in the orchestrator's function.json as shown below? What appropriate changes should be made in the local settings?
The configuration that you have used looks good. For more information, you can refer to this Example.
Also, when doing so, the normal method without bindings gives an error when downloading the files to the local system, as given below:
You might get this error when you are trying to open a file but your path is a folder, or if you don't have the required permissions.
You can refer to this SO thread which discusses a similar issue.
REFERENCES:
Set, View, Change, or Remove Permissions on Files and Folders | Microsoft Docs
You can keep the state of the trigger in an entity and check it every time the function gets triggered. The function will process the file only when the state matches, i.e. the previous file has already been received but not yet processed.
Please refer to https://learn.microsoft.com/en-us/azure/azure-functions/durable/durable-functions-overview?tabs=csharp - Pattern #6: Aggregator (stateful entities)

Using parameters to locate file during trigger creation in Azure Data Factory

I am trying to create a trigger that I will use for starting a pipeline in ADF:
The folder I want to set my trigger on can have different paths:
202001/Test/TriggerFolder
202002/Test/TriggerFolder
202003/Test/TriggerFolder
etc..
Therefore, in my "Blob path begins with" I would like to use a parameter (that I will set somewhere else through another pipeline) that tells the trigger where to look, instead of having a static file name.
Unfortunately it doesn't seem to give me the chance to add dynamic content as (for example) in a dataset.
If there is really no way, perhaps because the trigger is something instantiated only once, is there a way to create a trigger as a step during a pipeline?
Thank you!
It is possible to pass a parameter from the ARM template of the Azure Data Factory. During the deployment of pipelines, this parameter can be passed the necessary value. Below is example code for it.
Sample Code:
{
    "name": "[concat(parameters('factoryName'), '/trigger1')]",
    "type": "Microsoft.DataFactory/factories/triggers",
    "apiVersion": "2018-06-01",
    "properties": {
        "annotations": [],
        "runtimeState": "Stopped",
        "pipelines": [],
        "type": "BlobEventsTrigger",
        "typeProperties": {
            "blobPathBeginsWith": "[parameters('trigger1_properties_typeProperties_blobPathBeginsWith')]",
            "ignoreEmptyBlobs": true,
            "scope": "[parameters('trigger1_properties_typeProperties_scope')]",
            "events": [
                "Microsoft.Storage.BlobCreated"
            ]
        }
    },
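The value itself then goes into the ARM template parameters file used at deployment time; a minimal sketch (the factory name, container path and storage account scope below are placeholders, not values from the question):
{
    "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentParameters.json#",
    "contentVersion": "1.0.0.0",
    "parameters": {
        "factoryName": {
            "value": "MyDataFactory"
        },
        "trigger1_properties_typeProperties_blobPathBeginsWith": {
            "value": "/mycontainer/blobs/202001/Test/TriggerFolder/"
        },
        "trigger1_properties_typeProperties_scope": {
            "value": "/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Storage/storageAccounts/<storage-account>"
        }
    }
}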

Is it possible to create a blob triggered azure function with file name pattern?

I am developing a blob triggered azure function. Following is the configuration of my "function.json" file:
{
    "disabled": false,
    "bindings": [
        {
            "name": "myBlob",
            "type": "blobTrigger",
            "direction": "in",
            "path": "input/{name}",
            "connection": "BlobConnectionString"
        }
    ]
}
My function is working fine. It is triggered for all files in the "input" container. Now I want to filter files by their naming pattern. For example, I want to trigger my Azure function only for those files whose name contains "~123~".
Is it possible to do with some change in "path" property of "function.json" file?
If yes, then what should be the value of the "path" property?
If not, please let me know if there is any other workaround possible.
Thanks,
input/{prefix}~123~{suffix} should work. In the function method signature, use prefix and suffix instead of name to get the parts of the blob name if needed.
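Applied to the function.json from the question, the binding would look something like this:
{
    "disabled": false,
    "bindings": [
        {
            "name": "myBlob",
            "type": "blobTrigger",
            "direction": "in",
            "path": "input/{prefix}~123~{suffix}",
            "connection": "BlobConnectionString"
        }
    ]
}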

Azure Logic Apps - Moving Email Message with Move Message action

I'm fairly new to Logic Apps and I have an app that I'm trying to get to move an email to a subfolder of the Inbox in a shared mailbox, but I'm trying to generate the path based on the date and I cannot for the life of me get it to work. I don't know if my path syntax is wrong or what.
The subfolder structure is basically
- Inbox
  - 2018
    - Jan
    - Feb
    - Mar
    - Etc
And I'm trying to generate the path based off the year and the month using the Expressions part of a field. I've got an expression that generates the path for me
concat('Inbox\',formatDateTime(convertFromUtc(utcNow(),'Mountain Standard Time'),'MMM'),'\',formatDateTime(convertFromUtc(utcNow(),'Mountain Standard Time'),'yyyy'))
When the logic app runs this generates the correct path string of Inbox\2018\Jan but when the Move Email action runs it always escapes the backslash and then says it can't find the folder Inbox\\2018\\Jan.
So I either have this format wrong, I can't put the email in a subfolder or there's another way to do this.
I tried using the folder picker to pick one of the month subfolders and then peeked at the code and it uses some base64 encoded string for the path. I've pasted below what the peeked code shows
{
    "inputs": {
        "host": {
            "connection": {
                "name": "@parameters('$connections')['office365']['connectionId']"
            }
        },
        "method": "post",
        "path": "/Mail/Move/@{encodeURIComponent(triggerBody()?['Id'])}",
        "queries": {
            "folderPath": "Id::AAMkADRmOTgyMDI1LThkODYtNDMwYy1iYThiLTIzODQwN2Y1OGMzYQAuAAAAAAA6K3dJssnITb8NwkAsBOo7AQBaJ9ZTcg-MSoOEUUjjUdOAAAAD0nvYAAA="
        },
        "authentication": "@parameters('$authentication')"
    },
    "metadata": {
        "Id::AAMkADRmOTgyMDI1LThkODYtNDMwYy1iYThiLTIzODQwN2Y1OGMzYQAuAAAAAAA6K3dJssnITb8NwkAsBOo7AQBaJ9ZTcg-MSoOEUUjjUdOAAAAD0nvYAAA=": "Jan"
    }
}
Does anyone know how I would be able to move an email to a subfolder without using the folder picker?
Edit: Since posting I've also tried using the following strings that also do not work
Inbox/2018/Jan
Inbox:/2018/Jan
/Inbox/2018/Jan
You can't really have the path in terms of a hierarchical folder structure in this particular logic app.
If you look at the documentation for the Office 365 Mail REST operations at
https://msdn.microsoft.com/office/office365/api/mail-rest-operations#MoveCopyMessages
you will notice that to move messages what you actually need is a folder ID. Also, if you look at the Logic App designer, when you select a folder directly from there and then look at the code view, you will see an ID. It looks something like:
"method": "post",
"path": "/Mail/Move/#{encodeURIComponent(triggerBody()?['Id'])}",
"queries": {
"folderPath": "Id::AAMkADZmZDQ5OWNhLTU3NzQtNDRlZC1iMDRlLTg5NTA1NGM3NWJlZgAuAAAAAAAhZj7Qt8LySYhKvlgbXRNVAQBT8bGPBJK8Qqoy01hgwH4rAAEJysaQAAA="
}
},
"metadata": {
"Id::AAMkADZmZDQ5OWNhLTU3NzQtNDRlZC1iMDRlLTg5NTA1NGM3NWJlZgAuAAAAAAAhZj7Qt8LySYhKvlgbXRNVAQBT8bGPBJK8Qqoy01hgwH4rAAEJysaQAAA=": "Jan"
},
The folder ID is unique to every folder. One easy way to find the folder IDs is to use the Graph Explorer at
https://developer.microsoft.com/en-us/graph/graph-explorer#
and, after signing in, run
https://graph.microsoft.com/beta/me/mailFolders/Inbox/childFolders
as the query, which will give you the child folders of the Inbox. The values will look something like the following for every folder:
"value": [
{
"id": "AAMkADZmZDQ5OWNhLTU3NzQtNDRlZC1iMDRlLTg5NTA1NGM3NWJlZgAuAAAAAAAhZj7Qt8LySYhKvlgbXRNVAQBT8bGPBJK8Qqoy01hgwH4rAAEJysWPAAA=",
"displayName": "AZCommunity",
"parentFolderId": "AAMkADZmZDQ5OWNhLTU3NzQtNDRlZC1iMDRlLTg5NTA1NGM3NWJlZgAuAAAAAAAhZj7Qt8LySYhKvlgbXRNVAQDX8XL9o4tkR5vF5sEdh44eAIYnQnhhAAA=",
"childFolderCount": 0,
"unreadItemCount": 5,
"totalItemCount": 169,
"wellKnownName": null
},
For what you are trying to do, you will have to do some additional work to map the folder names to their folder IDs and then assign the ID. I would suggest using Azure Functions to easily do this.
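If you would rather stay inside the Logic App, one possible sketch is an HTTP action that calls the same Graph endpoint at run time, whose response you could then filter by displayName to pick out the ID. This assumes the Logic App has a managed identity that has been granted the Mail.Read application permission; the action name and mailbox address below are placeholders:
"List_Inbox_child_folders": {
    "type": "Http",
    "inputs": {
        "method": "GET",
        "uri": "https://graph.microsoft.com/v1.0/users/shared.mailbox@contoso.com/mailFolders/Inbox/childFolders",
        "authentication": {
            "type": "ManagedServiceIdentity",
            "audience": "https://graph.microsoft.com"
        }
    },
    "runAfter": {}
}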

Is it possible to use U-SQL managed tables as output datasets in Azure Data Factory?

I have a small ADF pipeline that copies a series of files from an Azure Storage Account to an Azure Data Lake account. As a final activity in the pipeline I want to run a U-SQL script that uses the copied files as inputs and outputs the result to a U-SQL managed table.
The U-SQL script basically extracts the data from the copied files, applies some transformations and then INSERTs it into an existing U-SQL managed table.
How (if possible) can I add the U-SQL table as an output dataset in Azure Data Factory?
You cannot currently add a U-SQL internal table as an output dataset in Azure Data Factory (ADF). A similar question came up recently here and the answer from Michael Rys (the "father" of U-SQL) was "I know that the ADF team has a work item to do this for you."
You could, however, use Azure Data Factory to run a parameterised U-SQL script, where the input parameter is the file path. This would have a similar result.
Example pipeline from a recent question:
{
    "name": "ComputeEventsByRegionPipeline",
    "properties": {
        "description": "This is a pipeline to compute events for en-gb locale and date less than 2012/02/19.",
        "activities": [
            {
                "type": "DataLakeAnalyticsU-SQL",
                "typeProperties": {
                    "scriptPath": "adlascripts\\SearchLogProcessing.txt",
                    "scriptLinkedService": "StorageLinkedService",
                    "degreeOfParallelism": 3,
                    "priority": 100,
                    "parameters": {
                        "in": "/input/SearchLog.tsv",
                        "out": "/output/Result.tsv"
                    }
                },
                ...
Basically the U-SQL script goes from:
@searchlog =
    EXTRACT ...
    FROM @in
    USING Extractors.Tsv();
to:
@searchlog =
    EXTRACT ...
    FROM "/input/SearchLog.tsv"
    USING Extractors.Tsv();
which I think achieves the same thing you want.
