I am currently using ADF to copy a bunch of files from FTP to an Azure Storage account. I have to add metadata for each file, and I have been able to do this by adding metadata under the Sink tab.
The problem is that this metadata is dynamic for each file and is derived from the name of the file. Can I do something like this in ADF, or do I need a separate Azure Function / API to update the metadata for each file?
Regards Tarun
I think you can use an ADF expression here.
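If the separate Azure Function route is taken instead, the "derived from the name of the file" step could look roughly like the sketch below. The naming convention (`<source>_<yyyymmdd>_<sequence>.<extension>`) and the metadata keys are assumptions invented for this example; the question does not say how the file names are structured.

```python
import os


def metadata_from_filename(filename: str) -> dict:
    """Derive blob metadata from a file name.

    Assumes a hypothetical naming convention:
    <source>_<yyyymmdd>_<sequence>.<extension>
    """
    stem, extension = os.path.splitext(os.path.basename(filename))
    parts = stem.split("_")
    if len(parts) != 3:
        raise ValueError(f"unexpected file name format: {filename!r}")
    source, file_date, sequence = parts
    # Keys are illustrative; use whatever metadata names the target needs.
    return {
        "source": source,
        "filedate": file_date,
        "sequence": sequence,
        "extension": extension.lstrip(".").lower(),
    }
```

The returned dictionary is what the function would then write as metadata on the copied blob.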
Related
I am new to Azure Data Factory and have started doing projects with it. Currently, I have managed to copy files from SharePoint to ADLS. After copying, I would like to move the file in SharePoint to an Archive folder using ADF, but I have not been successful.
So for example, my file is in "Shared Documents/latestfile/test.xlsx". After copying it into ADLS, I would like to move the file to "Shared Documents/Archive/test.xlsx".
I would appreciate some help with this. Thank you.
SharePoint as a sink is not yet supported by Azure Data Factory.
Please refer to the docs for the Azure Data Factory connector overview.
You can try to leverage the SharePoint APIs to achieve the same, as described in: REST call to move a file/folder to another location
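As a hedged illustration of that REST call, the sketch below only builds the request URL for the SharePoint `moveto` endpoint on a file. The site URL and the `/sites/site1` prefix are assumptions (the question only gives the library-relative paths), and the request itself would still need to be sent as a POST with a valid bearer token, for example from a Web activity.

```python
from urllib.parse import quote


def build_move_url(site_url: str, source_path: str, target_path: str,
                   overwrite: bool = True) -> str:
    """Build the SharePoint REST URL that moves a file to a new location.

    Paths are server-relative, e.g. '/sites/site1/Shared Documents/a.xlsx'.
    flags=1 asks SharePoint to overwrite an existing file at the target.
    """
    flags = 1 if overwrite else 0
    src = quote(source_path, safe="/")  # percent-encode spaces etc.
    dst = quote(target_path, safe="/")
    return (
        f"{site_url.rstrip('/')}/_api/web/"
        f"GetFileByServerRelativeUrl('{src}')/moveto(newurl='{dst}',flags={flags})"
    )


# Hypothetical site; the paths mirror the example in the question.
url = build_move_url(
    "https://contoso.sharepoint.com/sites/site1",
    "/sites/site1/Shared Documents/latestfile/test.xlsx",
    "/sites/site1/Shared Documents/Archive/test.xlsx",
)
```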
I have a few hundred files in a folder in Blob Storage. Each file has custom metadata (Dictionary type). So when traversing all the files, I need to get the metadata of each file.
How can I read those details? I tried the GetMetadata feature, which has some hardcoded fields like exists, filename, lastedit, etc. But I need to get the custom metadata of those files.
Please share some ideas.
#Sandeep
I am assuming you're looking to get the user-defined metadata through the Get Metadata activity.
Unfortunately, this is currently not possible: the activity only returns a predefined set of metadata.
The list of supported metadata has been documented here.
Workaround
One workaround I can think of is to use the Web activity in ADF to call the Get Blob Properties REST API.
The Get blob properties API returns all user-defined metadata, standard HTTP properties, and system properties for the blob.
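Get Blob Properties returns each user-defined metadata entry as a response header named `x-ms-meta-<name>`. A minimal sketch of pulling those entries out of a response header dictionary follows; the sample header values are made up for illustration.

```python
def extract_user_metadata(headers: dict) -> dict:
    """Return only the user-defined metadata from blob response headers.

    User-defined metadata arrives as 'x-ms-meta-<name>: <value>' headers;
    header names are matched case-insensitively.
    """
    prefix = "x-ms-meta-"
    return {
        name[len(prefix):]: value
        for name, value in headers.items()
        if name.lower().startswith(prefix)
    }


# Illustrative response headers (values are invented).
sample_headers = {
    "Content-Length": "1024",
    "x-ms-blob-type": "BlockBlob",
    "x-ms-meta-department": "finance",
    "X-Ms-Meta-Owner": "sandeep",
}
custom = extract_user_metadata(sample_headers)
```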
I have some Excel files stored in SharePoint Online. I want to copy the files stored in SharePoint folders to Azure Blob Storage.
To achieve this, I am creating a new pipeline in Azure Data Factory using the Azure portal. What are the possible ways to copy files from SharePoint to Azure Blob Storage using Azure Data Factory pipelines?
I have looked at all the linked service types in Azure Data Factory but couldn't find any suitable type to connect to SharePoint.
Rather than directly accessing the file in SharePoint from Data Factory, you might have to use an intermediate technology and have Data Factory call that. You have a few options:
Use a Logic App to move the file
Use an Azure Function
Use a custom activity and write your own C# to copy the file.
To call a Logic App from ADF, you use a web activity.
You can directly call an Azure Function now.
We can create a linked service of type 'File system' by providing the directory URL as the 'Host' value. To authenticate the user, provide the username and password/AKV details.
Note: Use a self-hosted IR.
You can use a Logic App to fetch data from SharePoint and load it into Azure Blob Storage, and then use Azure Data Factory to fetch the data from the blob. You can even set an event trigger so that the pipeline runs automatically whenever a file arrives in the blob container.
You can use Power Automate (https://make.powerautomate.com/) to do this task automatically:
Create an automated cloud flow that triggers whenever a new file is dropped in SharePoint
Use any mentioned trigger as per your requirement and fill in the SharePoint details
Add an action to create a blob and fill in the details as per your use case
This way, all the SharePoint content is copied to Blob Storage without even using ADF.
My previous answer was true at the time, but in the last few years, Microsoft has published guidance on how to copy documents from a SharePoint library. You can copy a file from SharePoint Online by using a Web activity to authenticate and grab an access token from SPO, then passing it to a subsequent Copy activity that copies the data with the HTTP connector as the source.
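To make the token step concrete, here is a sketch of the URL and form body the Web activity would POST, following the pattern in Microsoft's guidance as I understand it (treat the exact endpoint and `resource` format as something to confirm against the current docs). The tenant, client ID, and secret values are placeholders, and the body must be sent with `Content-Type: application/x-www-form-urlencoded`.

```python
from urllib.parse import urlencode

# Fixed principal ID that identifies SharePoint Online in the resource string.
SHAREPOINT_PRINCIPAL_ID = "00000003-0000-0ff1-ce00-000000000000"


def build_token_request(tenant_id: str, tenant_name: str,
                        client_id: str, client_secret: str):
    """Return (url, form_body) for the client-credentials token request."""
    url = f"https://accounts.accesscontrol.windows.net/{tenant_id}/tokens/OAuth/2"
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": f"{client_id}@{tenant_id}",
        "client_secret": client_secret,
        "resource": f"{SHAREPOINT_PRINCIPAL_ID}/{tenant_name}.sharepoint.com@{tenant_id}",
    })
    return url, body


# Placeholder values only.
token_url, token_body = build_token_request(
    "tenant-guid", "mytenant", "client-guid", "not-a-real-secret"
)
```

The `access_token` field of the JSON response is then passed to the Copy activity as an `Authorization: Bearer <token>` header.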
I ran into some issues with large files and Logic Apps. It turned out there were some extremely large files to be copied from that SharePoint library. SharePoint has a default limit of 100 MB buffer size, and the Get File Content action doesn’t natively support chunking.
I successfully pulled the files with the web activity and copy activity. But I found the SharePoint permissions configuration to be a bit tricky. I blogged my process here.
You can use a binary dataset if you just want to copy the full file rather than read the data.
If my file is located at https://mytenant.sharepoint.com/sites/site1/libraryname/folder1/folder2/folder3/myfile.CSV, the URL I need to retrieve the file is https://mytenant.sharepoint.com/sites/site1/_api/web/GetFileByServerRelativeUrl('/sites/site1/libraryname/folder1/folder2/folder3/myfile.CSV')/$value.
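The mapping from a file's browser URL to its download URL can be sketched as below, assuming the `_api/web/GetFileByServerRelativeUrl('<server-relative path>')/$value` pattern from Microsoft's guidance. The function name is hypothetical, and the sketch does not percent-encode the path, so names with special characters would need extra handling.

```python
from urllib.parse import urlparse


def build_download_url(file_url: str, site_url: str) -> str:
    """Turn a SharePoint file URL into the REST URL that returns its content."""
    site_url = site_url.rstrip("/")
    if not file_url.startswith(site_url + "/"):
        raise ValueError("file_url is not inside site_url")
    # The server-relative path is everything after the host name.
    server_relative_path = urlparse(file_url).path
    return (
        f"{site_url}/_api/web/"
        f"GetFileByServerRelativeUrl('{server_relative_path}')/$value"
    )


download_url = build_download_url(
    "https://mytenant.sharepoint.com/sites/site1/libraryname/folder1/folder2/folder3/myfile.CSV",
    "https://mytenant.sharepoint.com/sites/site1",
)
```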
Be careful about when you get your auth token. Your auth token is valid for 1 hour. If you copy a bunch of files sequentially, and it takes longer than that, you might get a timeout error.
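One way to reason about this is to refresh the token before it expires rather than fetching it once up front. The sketch below is plain Python, not ADF; in a pipeline the rough equivalent would be requesting the token inside the loop, close to each copy. The class name, the 5-minute safety margin, and the injected `fetch_token` callable are all assumptions for illustration.

```python
import time


class TokenCache:
    """Cache an access token and refresh it shortly before it expires."""

    def __init__(self, fetch_token, lifetime_seconds=3600,
                 safety_margin_seconds=300, clock=time.monotonic):
        self._fetch_token = fetch_token      # callable returning a new token
        self._lifetime = lifetime_seconds    # 1 hour, per the note above
        self._margin = safety_margin_seconds
        self._clock = clock
        self._token = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = self._clock()
        if self._token is None or now >= self._expires_at:
            self._token = self._fetch_token()
            # Treat the token as expired a little early to be safe.
            self._expires_at = now + self._lifetime - self._margin
        return self._token
```

Calling `cache.get()` before each file copy keeps long sequential runs from reusing a token that has already expired.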
I have created a pipeline in Azure Data Factory (V1). I have a copy pipeline that has an AzureSqlTable dataset as input and an AzureBlob dataset as output. The AzureSqlTable dataset that I use as input is created as the output of another pipeline. In this pipeline I launch a procedure that copies one table entry to a blob CSV file.
I get the following error when launching pipeline:
Copy activity encountered a user error: ErrorCode=UserErrorTabularCopyBehaviorNotSupported,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=CopyBehavior property is not supported if the source is tabular data source.,Source=Microsoft.DataTransfer.ClientLibrary,'.
How can I solve this?
According to the error message, this action is not supported by Azure Data Factory. However, using an Azure SQL table as input and an Azure blob as output should be supported by Azure Data Factory.
I also ran a demo test in the Azure portal. You can follow these steps to do the same:
1. Click Copy data in the Azure portal.
2. Set the copy properties.
3. Select the source.
4. Select the destination data store.
5. Complete the deployment.
6. Check the result in Azure and in the storage account.
Update:
If we want to use an existing dataset, we can choose [From Existing Connections]. For more information, please refer to the screenshot.
Update2:
For Data Factory (V1), the copy activity settings only support using an existing Azure Blob Storage or Azure Data Lake Store dataset. For more details, please refer to this link.
If using Data Factory (V2) is acceptable, we can use the existing Azure SQL dataset.
So, actually, if we don't use this awful "Copy data (PREVIEW)" action and instead add an activity to an existing pipeline rather than a new pipeline, everything works. So the solution is to add a copy activity manually to an existing pipeline.
How do I delete all files in a source folder (located on an on-premises file system)? I need help with a .NET custom activity or any out-of-the-box solution in Azure Data Factory.
PS: I did find a delete custom activity but it's more towards Blob storage.
Please help.
Currently, there is NO support for a custom activity on Data Management Gateway. Data Management Gateway only supports copy activity and Stored Procedure activity as of today (02/22/2017).
Workaround: As I do not have a delete facility for on-premises files, I am planning to keep the source files in a folder structure of yyyy-mm-dd. So, every date folder (e.g. the 2017-02-22 folder) will contain all the related files. I will then configure my Azure Data Factory job to pull data based on the date.
Example: The ADF job on Feb 22nd will search for 2017-02-22 folder. In the next run my ADF job will search for 2017-02-23 folder. This way I don't need to delete the processed files.
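The folder selection in this workaround boils down to formatting the run date. A small Python sketch of that logic follows; the base path and function name are hypothetical, and in ADF itself this would be expressed with the pipeline's date expressions rather than Python.

```python
from datetime import date, timedelta


def source_folder_for_run(base_path: str, run_date: date) -> str:
    """Return the yyyy-mm-dd folder that a run on run_date should read."""
    return f"{base_path.rstrip('/')}/{run_date.strftime('%Y-%m-%d')}"


first_run = date(2017, 2, 22)
today_folder = source_folder_for_run("/data/incoming", first_run)
next_folder = source_folder_for_run("/data/incoming", first_run + timedelta(days=1))
```

Because each run reads only its own date folder, files processed by earlier runs are never picked up again, which is why no delete step is needed.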
Actually, there is a normal way to do it. You will have to create an Azure Functions app that accepts a POST request with your FTP/SFTP settings (in case you use one) and the name of the file to remove. The function parses the request content as JSON, extracts the settings, and uses the SSH.NET library to remove the desired file. In case you just have a file share, you do not even need to bother with SSH.
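For the plain file share case, the core of such a function might look like the sketch below. It is written in Python rather than the C#/SSH.NET setup described above, it leaves out the Azure Functions binding code, and the `fileName` field and the share root argument are assumed names, not part of the original answer.

```python
import json
import os


def handle_delete_request(body: str, share_root: str) -> dict:
    """Delete the file named in a JSON request body from a file share."""
    payload = json.loads(body)
    file_name = payload["fileName"]  # hypothetical field name

    root = os.path.realpath(share_root)
    target = os.path.realpath(os.path.join(root, file_name))
    # Refuse paths that escape the share root (e.g. '../other/file').
    if os.path.commonpath([root, target]) != root:
        raise ValueError("file is outside the share root")

    if not os.path.isfile(target):
        return {"deleted": False, "reason": "not found"}
    os.remove(target)
    return {"deleted": True}
```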
Later on, in Data Factory, you add a Web activity with dynamic content in the Body section that constructs the JSON request in the form I've mentioned above. For the URL, you specify the published Azure Function URL + ?code=<your function key>
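The request the Web activity ends up sending can be sketched as follows. The function URL, key, and JSON field names (`fileName`, `host`) are placeholders chosen for this example; the actual body shape is whatever the function was written to accept.

```python
import json
from urllib.parse import urlencode


def build_delete_call(function_url: str, function_key: str,
                      file_name: str, host: str = None):
    """Return (url, json_body) for the POST that asks the function to delete a file."""
    # The function key is passed as the 'code' query parameter.
    url = f"{function_url}?{urlencode({'code': function_key})}"
    body = {"fileName": file_name}
    if host is not None:
        body["host"] = host  # FTP/SFTP settings, only if one is used
    return url, json.dumps(body)


call_url, call_body = build_delete_call(
    "https://example.azurewebsites.net/api/DeleteFile",  # placeholder URL
    "abc123",                                            # placeholder key
    "processed/test.csv",
)
```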
We actually ended up creating a whole bunch of Azure Functions that serve as custom activities for our DF pipelines.