How to delete a Linked Service in Data Factory?

I have a Data Factory that contains a bunch of Linked Services. Some of them are not used anymore and I would like to delete them. However, when I try to delete some of them, Data Factory complains that they are still in use, with the following error:
Error: Not able to delete linked service, message returned from
service - The document cannot be deleted since it is referenced by
staging_zone_dim.
Question
I made sure that no Datasets or Pipelines reference it, and I still get the error message when I try to delete the linked service. What am I missing?

You should have a look at the published (live) view of the data factory and delete the referencing pipelines there too before deleting the linked service.
You can switch views by choosing 'Data Factory' (live mode) instead of 'Azure DevOps GIT' in the mode selector at the top of the authoring canvas.

I have been able to work around this issue by going directly to the Git repository that contains the JSON files and deleting the unused configuration files.
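For illustration, a rough PowerShell sketch of that cleanup, assuming the Az.DataFactory module, a signed-in Az session and a locally cloned copy of the ADF Git repo; the repo path, resource group, factory and linked service names are all placeholders:

# Find any JSON definition in the ADF repo that still references the linked service.
# All names and paths below are placeholders.
$linkedServiceName = 'MyOldLinkedService'
Get-ChildItem -Path 'C:\repos\my-adf-repo' -Recurse -Filter *.json |
    Select-String -Pattern $linkedServiceName -List |
    Select-Object -ExpandProperty Path

# Once nothing references it any more (and the change has been published),
# delete it from the live factory. Requires Connect-AzAccount first.
Remove-AzDataFactoryV2LinkedService -ResourceGroupName 'my-rg' `
    -DataFactoryName 'my-adf' -Name $linkedServiceName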

Related

Not able to delete a linked service referenced in another linked service

I am testing Azure Data Factory deployment using ARM templates, deleting the ADF items (pipelines, linked services, datasets, data flows, triggers, etc.) with the Azure Data Factory Delete Items built-in task in an Azure DevOps pipeline before deploying to UAT and production. According to the task output, all items were deleted, but there is one linked service that didn't delete.
The error is: deleting LS_1 Linked Service: the document cannot be deleted since it is referenced by LS_2. However, LS_2 has already been deleted and no longer shows in the UAT ADF environment; only LS_1 is showing.
Please see the attached screenshot. Any suggestions on how to resolve this?
Thanks
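No accepted fix is recorded here, but as a hedged diagnostic sketch (Az.DataFactory module; the resource group and factory names are placeholders): linked services can reference other linked services (for example through a Key Vault linked service), so one way to see what in the target factory still points at LS_1 is to dump each linked service definition and search it for that name.

# Diagnostic sketch only - find linked services whose definition still mentions LS_1.
$rg = 'rg-uat'      # placeholder
$df = 'adf-uat'     # placeholder
Get-AzDataFactoryV2LinkedService -ResourceGroupName $rg -DataFactoryName $df |
    Where-Object { ($_.Properties | ConvertTo-Json -Depth 20) -match 'LS_1' } |
    Select-Object Name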

Automatically adding data to Cosmos DB through an ARM template

I made an ARM template which runs through an Azure DevOps pipeline to create a new Cosmos DB instance and put two collections inside it. I'd like to put some data inside the collections (fixed values, the same every time). Everything is created in the standard way, e.g. the collections use
"type": "Microsoft.DocumentDb/databaseAccounts/apis/databases/containers"
I think these are the relevant docs.
I haven't found much mention of automatically adding data, but it's such an obviously useful thing that I'm sure it must be supported. If I need to add another step to my pipeline to add the data, that's an option too.
ARM templates are not able to insert data into Cosmos DB or any service with a data plane for many of the reasons listed in the comments and more.
If you need to both provision a Cosmos resource and then insert data into it you may want to consider creating another ARM template to deploy an Azure Data Factory resource and then invoke the pipeline using PowerShell to copy the data from Blob Storage into the Cosmos DB collection. Based upon the ARM doc you referenced above it sounds as though you are creating a MongoDB collection resource. ADF supports MongoDB so this should work very well.
You can find the ADF ARM template docs here and the ADF PowerShell docs here. If you're new to using ARM to create ADF resources, I recommend first creating the factory in the Azure Portal, then exporting the template and examining the properties you will need to drive with parameters or variables during deployment.
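For illustration, a rough sketch of that deploy-then-run flow in PowerShell (Az.Resources and Az.DataFactory modules; the template paths, resource names and pipeline name are all placeholders):

# Sketch only - deploy the factory from its ARM template, then run the pipeline
# that copies the seed data from Blob Storage into the Cosmos DB collections.
New-AzResourceGroupDeployment -ResourceGroupName 'my-rg' `
    -TemplateFile '.\adf-template.json' `
    -TemplateParameterFile '.\adf-parameters.json'

$runId = Invoke-AzDataFactoryV2Pipeline -ResourceGroupName 'my-rg' `
    -DataFactoryName 'my-adf' -PipelineName 'SeedCosmosCollections'

# Poll the run status afterwards if the release should wait for the copy to finish.
Get-AzDataFactoryV2PipelineRun -ResourceGroupName 'my-rg' `
    -DataFactoryName 'my-adf' -PipelineRunId $runId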
PS: I'm not sure why, but the container resource path (below) you pointed to in your question should not be used, as it breaks a few things in ARM: namely, you cannot put a resource lock on it or use Azure Policy. Please use the latest api-version, which as of this writing is 2021-04-15.
"type": "Microsoft.DocumentDb/databaseAccounts/apis/databases/containers"

Get files list after Azure Data Factory Copy activity

Is there a method that gives me the list of files copied into Azure Data Lake Storage after a Copy activity in Azure Data Factory? I have to copy data from a data source and afterwards skip files based on a particular condition. The condition must also check the file path and name against other data from a SQL database. Any ideas?
As of now, there's no function to get the list of files after a Copy activity. You can, however, use a Get Metadata activity or a Lookup activity and chain a Filter activity to it to get the list of files based on your condition.
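Not the Get Metadata + Filter pattern itself, but as a client-side illustration of the same idea (list what landed at the sink and keep only the files matching a condition), assuming the Az.Storage module; the account, container and name pattern are placeholders:

# Illustrative client-side equivalent only - list blobs at the sink and filter by name.
$ctx = New-AzStorageContext -StorageAccountName 'mydatalake' -UseConnectedAccount
Get-AzStorageBlob -Container 'staging' -Context $ctx |
    Where-Object { $_.Name -like 'sales/2024/*.csv' } |
    Select-Object Name, LastModified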
There's a workaround that you can check out here.
"The solution was actually quite simple in this case. I just created another pipeline in Azure Data Factory, which was triggered by a Blob Created event, and the folder and filename passed as parameters to my notebook. Seems to work well, and a minimal amount of configuration or code required. Basic filtering can be done with the event, and the rest is up to the notebook.
For anyone else stumbling across this scenario, details below:
https://learn.microsoft.com/en-us/azure/data-factory/how-to-create-event-trigger"

Azure Data Factory and SharePoint

I have some Excel files stored in SharePoint Online. I want to copy the files stored in SharePoint folders to Azure Blob Storage.
To achieve this, I am creating a new pipeline in Azure Data Factory using the Azure Portal. What are the possible ways to copy files from SharePoint to Azure Blob Storage using Azure Data Factory pipelines?
I have looked at all the linked service types in Azure Data Factory but couldn't find any suitable type to connect to SharePoint.
Rather than directly accessing the file in SharePoint from Data Factory, you might have to use an intermediate technology and have Data Factory call that. You have a few options:
Use a Logic App to move the file
Use an Azure Function
Use a custom activity and write your own C# to copy the file.
To call a Logic App from ADF, you use a web activity.
You can directly call an Azure Function now.
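For what it's worth, the Web activity (or a direct Function call) is just an HTTP request; here is a hedged sketch of the equivalent call from PowerShell, where the Logic App callback URL and the body fields are placeholders for whatever your flow expects:

# Illustration only - the same POST an ADF Web activity would send to a
# Logic App HTTP trigger. The callback URL (with its sig) and body are placeholders.
$body = @{ fileName = 'report.xlsx'; sourceFolder = '/Shared Documents/Reports' } | ConvertTo-Json
Invoke-RestMethod -Method Post `
    -Uri 'https://prod-00.westeurope.logic.azure.com/workflows/<workflow-id>/triggers/manual/paths/invoke?api-version=2016-10-01&sp=<sp>&sig=<sig>' `
    -ContentType 'application/json' `
    -Body $body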
You can create a linked service of type 'File system' by providing the directory URL as the 'Host' value. To authenticate, provide a username and password (or Azure Key Vault details).
Note: use a self-hosted integration runtime.
You can use a Logic App to fetch data from SharePoint and load it into Azure Blob Storage, and then use Azure Data Factory to fetch the data from the blob. You can even set an event trigger so that whenever a file arrives in the blob container, the pipeline triggers automatically.
You can use Power Automate (https://make.powerautomate.com/) to do this task automatically:
Create an automated cloud flow that triggers whenever a new file is dropped in SharePoint
Use whichever trigger fits your requirement and fill in the SharePoint details
Add an action to create a blob and fill in the details as per your use case
By doing this you will be copying the SharePoint files to the blob without even using ADF.
My previous answer was true at the time, but in the last few years Microsoft has published guidance on how to copy documents from a SharePoint library. You can copy a file from SharePoint Online by using a Web activity to authenticate and grab an access token from SPO, then passing it to a subsequent Copy activity that copies the data with an HTTP connector as the source.
I ran into some issues with large files and Logic Apps. It turned out there were some extremely large files to be copied from that SharePoint library. SharePoint has a default limit of 100 MB buffer size, and the Get File Content action doesn’t natively support chunking.
I successfully pulled the files with the web activity and copy activity. But I found the SharePoint permissions configuration to be a bit tricky. I blogged my process here.
You can use a binary dataset if you just want to copy the full file rather than read the data.
If my file is located at https://mytenant.sharepoint.com/sites/site1/libraryname/folder1/folder2/folder3/myfile.CSV, the URL I need to retrieve the file is https://mytenant.sharepoint.com/sites/site1/_api/web/GetFileByServerRelativeUrl('/sites/site1/libraryname/folder1/folder2/folder3/myfile.CSV')/$value.
Be careful about when you get your auth token. Your auth token is valid for 1 hour. If you copy a bunch of files sequentially, and it takes longer than that, you might get a timeout error.
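Putting the last few answers together, the Web activity + Copy activity flow boils down to two HTTP calls. Here is a hedged PowerShell sketch of the same flow, assuming an app registration that has been granted access to the site (as in Microsoft's guidance); the tenant ID, client credentials and file path are placeholders:

# Sketch of the token + download flow the Web activity / Copy activity perform.
# Tenant ID, client ID/secret and paths are placeholders; assumes an app
# registration granted access to the SharePoint site (ACS app-only flow).
$tenantId     = '<tenant-guid>'
$clientId     = '<client-id>'
$clientSecret = '<client-secret>'
$spoDomain    = 'mytenant.sharepoint.com'

$tokenBody = @{
    grant_type    = 'client_credentials'
    client_id     = "$clientId@$tenantId"
    client_secret = $clientSecret
    resource      = "00000003-0000-0ff1-ce00-000000000000/$spoDomain@$tenantId"
}
$token = (Invoke-RestMethod -Method Post `
    -Uri "https://accounts.accesscontrol.windows.net/$tenantId/tokens/OAuth/2" `
    -Body $tokenBody).access_token

# Download the file via the SharePoint REST endpoint used as the HTTP source URL.
Invoke-WebRequest -Method Get `
    -Uri "https://$spoDomain/sites/site1/_api/web/GetFileByServerRelativeUrl('/sites/site1/libraryname/folder1/folder2/folder3/myfile.CSV')/`$value" `
    -Headers @{ Authorization = "Bearer $token" } `
    -OutFile '.\myfile.CSV'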

Delete remote files using Azure Data Factory

How do I delete all files in a source folder located on an on-premises file system? I would need help with a .NET custom activity or any out-of-the-box solution in Azure Data Factory.
PS: I did find a Delete custom activity, but it's geared more towards Blob storage.
Please help.
Currently, there is NO support for custom activities on Data Management Gateway. As of today (02/22/2017), Data Management Gateway only supports the Copy activity and the Stored Procedure activity.
Workaround: as I do not have a delete facility for on-premises files, I am planning to organize the source files in folder structures of yyyy-mm-dd, so every date folder (e.g. the 2017-02-22 folder) will contain all the related files. I will then configure my Azure Data Factory job to pull data based on the date.
Example: the ADF job on Feb 22nd will look for the 2017-02-22 folder; on the next run it will look for the 2017-02-23 folder. This way I don't need to delete the processed files.
Actually, there is a clean way to do it. You create an Azure Functions app that accepts a POST containing your FTP/SFTP settings (in case you use one) and the name of the file to remove. In the function you parse the request content as JSON, extract the settings, and use the SSH.NET library to remove the desired file. If you just have a file share, you do not even need to bother with SSH.
Then, in Data Factory, you add a Web activity with dynamic content in the Body section constructing the JSON request in the form mentioned above. For the URL, you specify the published Azure Function URL plus ?code=<your function key>.
We actually ended up creating a whole bunch of Azure Functions that serve as custom activities for our ADF pipelines.
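The function in that answer is C# with SSH.NET; for the simpler "just a file share" case it mentions, here is a hedged sketch of the same idea as an HTTP-triggered PowerShell function. The UNC path is a placeholder, and the Function App needs network line of sight to the share (e.g. through a Hybrid Connection or VNet integration):

# run.ps1 of an HTTP-triggered PowerShell Azure Function - a sketch for the
# plain file-share case (no SSH). The UNC path is a placeholder.
using namespace System.Net
param($Request, $TriggerMetadata)

$fileName  = $Request.Body.fileName          # e.g. { "fileName": "extract_2017-02-22.csv" }
$shareRoot = '\\onprem-fileserver\exports'   # placeholder share
$target    = Join-Path $shareRoot $fileName

if ($fileName -and (Test-Path $target)) {
    Remove-Item -Path $target -Force
    $status = [HttpStatusCode]::OK
    $body   = "Deleted $fileName"
}
else {
    $status = [HttpStatusCode]::NotFound
    $body   = "File '$fileName' not found"
}

Push-OutputBinding -Name Response -Value ([HttpResponseContext]@{
    StatusCode = $status
    Body       = $body
})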
