How to remove special characters from XML stored in ADLS using Azure data factory or any other option? - azure

I have a scenario where I need to remove some characters from XML tags in files stored in ADLS. I am looking for a way to do this with ADF. Can someone suggest the approach I should follow?

This is not possible with ADF alone. Azure Data Factory handles data movement and data transformation only, and editing XML tag names does not fall under either. You can instead put a small piece of code in an Azure Function to do this.
You can use the Azure Function activity in a Data Factory pipeline to run Azure Functions. To call an Azure Function, you must first set up a linked service connection and then add an activity that specifies the Azure Function you want to run.
The Microsoft documentation on the Azure Function activity in ADF covers this in detail.
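The cleanup step that such a function would run on each file can be sketched in a few lines of Python. This is a minimal sketch and not the answerer's code: the function name `strip_chars_from_tags` and the default set of unwanted characters are assumptions. It works on the raw text with a regular expression, because an XML parser would reject tag names that contain illegal characters. Reading the file from ADLS and writing it back are left out.

```python
import re

# Matches "<" or "</" followed by an element name.
# Declarations (<?xml ...?>), comments (<!-- -->) and CDATA sections are
# skipped because the first character after "<" is "?" or "!".
_TAG_NAME = re.compile(r"<(/?)([^\s<>/!?][^\s<>/]*)")


def strip_chars_from_tags(xml_text, unwanted="#$%@"):
    """Remove every character in `unwanted` from element names only.

    Attribute values and text content are left untouched.
    `unwanted` is a hypothetical default; pass the characters you need.
    """
    table = str.maketrans("", "", unwanted)

    def _clean(match):
        slash, name = match.group(1), match.group(2)
        return "<" + slash + name.translate(table)

    return _TAG_NAME.sub(_clean, xml_text)


sample = "<?xml version='1.0'?><or#der><it$em id='a#1'>x # y</it$em><em@pty/></or#der>"
cleaned = strip_chars_from_tags(sample)
print(cleaned)
```

One limitation to keep in mind: a literal `<name` sequence inside a comment or CDATA section would also be rewritten, so this sketch only suits files where that does not occur.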

Related

Is it possible to store variables in Azure Data Factory pipelines?

In my Azure Data Factory pipeline, I want to use a variable, which gets updated on each run and which is also read on each run. At the moment, I am using a Database to achieve that. But it would be much simpler if Azure Data Factory provided a way of storing variables. So, my question is, is there any such facility in Azure Data Factory?
As @Joel Cochran says, ADF does not support persisting a variable across pipeline runs. You need to write the value to external storage, e.g. a database or Azure Storage, and then use a Lookup activity to read it back from the blob storage file or the database. :)
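As a rough illustration of the read side, the snippet below builds a Lookup activity definition as a Python dictionary and prints it as JSON. It is only a sketch: the activity name, the dataset name `StateTableDataset`, and the table `dbo.PipelineState` are hypothetical, and the property names should be checked against the Lookup activity documentation for your source type.

```python
import json


def build_lookup_activity(dataset_name, query):
    """Return a Lookup activity definition that reads a single row."""
    return {
        "name": "LookupStoredValue",  # hypothetical activity name
        "type": "Lookup",
        "typeProperties": {
            "source": {"type": "AzureSqlSource", "sqlReaderQuery": query},
            "dataset": {"referenceName": dataset_name, "type": "DatasetReference"},
            "firstRowOnly": True,
        },
    }


activity = build_lookup_activity(
    "StateTableDataset",  # hypothetical dataset pointing at the state table
    "SELECT Value FROM dbo.PipelineState WHERE Name = 'lastRun'",  # hypothetical table
)
print(json.dumps(activity, indent=2))

# A later activity could read the value with an expression such as:
VALUE_EXPRESSION = "@activity('LookupStoredValue').output.firstRow.Value"
```

Writing the updated value back at the end of the run would need a separate step, for example a Stored Procedure activity.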

Use Azure Functions as custom activity in ADFv2

Is it possible to somehow package and execute already written azure function as a custom activity in azure data factory?
My workflow is as follows:
I want to use an Azure Function (which does some data processing) in an ADF pipeline as a custom activity. This custom activity is just one of the activities in the pipeline, but it is essential that it runs.
As far as I know, there is no way to do that so far. In my opinion, you do not need to package the Azure Function. I suggest using a Web activity to invoke the endpoint of your Azure Function, which fits into an existing pipeline nicely.
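A Web activity that calls the function's HTTP endpoint might look roughly like the definition below, built here as a Python dictionary and printed as JSON. Treat it as a sketch under assumptions: the activity name, the placeholder URL, and the payload fields are all hypothetical.

```python
import json


def build_web_activity(function_url, payload):
    """Return a Web activity definition that POSTs `payload` to the function."""
    return {
        "name": "CallProcessingFunction",  # hypothetical activity name
        "type": "WebActivity",
        "typeProperties": {
            "url": function_url,
            "method": "POST",
            "headers": {"Content-Type": "application/json"},
            "body": payload,
        },
    }


web_activity = build_web_activity(
    # Placeholder URL; substitute your own app, function name and key.
    "https://<your-app>.azurewebsites.net/api/<your-function>?code=<function-key>",
    {"inputPath": "raw/2019/01/", "outputPath": "processed/"},  # hypothetical payload
)
print(json.dumps(web_activity, indent=2))
```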

Generating and storing JSON files from the run-time parameters passed to Azure Data Factory v2 pipeline?

Can we create a file (preferably JSON) and store it in one of the supported storage sinks (like Blob, Azure Data Lake Store, etc.) using the parameters that are passed to an Azure Data Factory v2 pipeline at run-time? I suppose it can be done via Azure Batch, but that seems to be overkill for such a trivial task. Is there a better way to do that?
Looking at the transformation activities that ADFv2 currently provides, I'm afraid there isn't a direct way to create a file in ADFv2. You could use a Custom activity to achieve this by running your own code on an Azure Batch pool of virtual machines. Hope it helps a little.
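The script run by such a Custom activity could be as small as the sketch below. It assumes that the Custom activity places an `activity.json` file in the task's working directory with the passed values under `typeProperties.extendedProperties`; verify that layout against the Custom activity documentation before relying on it. The function name and output file name are hypothetical, and uploading the result to Blob or Data Lake storage is left out.

```python
import json
import tempfile
from pathlib import Path


def write_parameters_file(work_dir, out_name="parameters.json"):
    """Copy extendedProperties from activity.json into a standalone JSON file."""
    activity = json.loads(Path(work_dir, "activity.json").read_text(encoding="utf-8"))
    # Assumed layout: parameters live under typeProperties.extendedProperties.
    params = activity.get("typeProperties", {}).get("extendedProperties", {})
    out_path = Path(work_dir, out_name)
    out_path.write_text(json.dumps(params, indent=2), encoding="utf-8")
    return out_path


# Demo with a stand-in activity.json; on a Batch node the file would already exist.
with tempfile.TemporaryDirectory() as work_dir:
    sample = {"typeProperties": {"extendedProperties": {"runDate": "2019-01-01", "mode": "full"}}}
    Path(work_dir, "activity.json").write_text(json.dumps(sample), encoding="utf-8")
    result = json.loads(write_parameters_file(work_dir).read_text(encoding="utf-8"))
    print(result)
```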

How to handle Incremental & Full Upload in an Azure Data Factory

We have an Azure Storage Account with 2 blob stores: a Full and an Inc.
In the Full we place the full upload CSV files whenever a full upload is needed; in the Inc we just place small incremental CSV files day by day.
We load all our data first into a staging area, then into the ODS, and finally into an EDW (Enterprise DW).
A full upload is only needed when there are structural changes to the tables.
Basically the only difference between the two uploads is that the full one also clears all data in the ODS and the EDW, but it runs the same stored procedures in the pipelines, ...
Does anybody have tips on how to handle such a situation in an Azure Data Factory?
I would prefer not to duplicate the data factories, but due to the different availability/frequency of the output datasets I can't use the same logical staging table (in the data factory) as the output dataset ...
So any hints are appreciated ...
First of all, to be clear: ADF is just there to invoke other Azure services; it doesn't do any of the work itself. So the question really is: what services in Azure could you call from ADF to do this work and manage this situation?
To answer that...
Option 1: I would suggest you look at Azure Data Lake. I've written simple procedures in U-SQL to do what you've described above, where parameters can be passed to the U-SQL procedures from ADF for different types of behaviour.
The code you create can live in an Azure Data Lake Analytics database, similar to T-SQL objects. Then maybe start using Azure Data Lake Storage as well, instead of normal blobs.
Option 2: Break out the C# and create yourself an Azure Data Factory custom activity and create a set of classes to do exactly what you require. Again with params passed by ADF or include logic in the methods to check the 'full' table contents. This will however involve a lot more development work and require an Azure Batch Service for the compute.
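Both options come down to one routine whose behaviour is switched by a parameter passed from ADF. The sketch below shows that idea in Python, standing in for a U-SQL procedure or a C# custom activity. The function name `run_load`, the `mode` values, and the in-memory dictionary used as the target table are all illustrative assumptions.

```python
def run_load(mode, staged_rows, target):
    """Apply staged rows to `target`; a full load empties `target` first.

    `mode` would come from a pipeline parameter: "full" or "incremental".
    `target` is a dict keyed by row id, standing in for an ODS/EDW table.
    """
    if mode not in ("full", "incremental"):
        raise ValueError(f"unknown load mode: {mode!r}")
    if mode == "full":
        target.clear()  # the only step that differs between the two modes
    for row in staged_rows:
        target[row["id"]] = row  # insert new rows, overwrite existing ones
    return target


ods = {1: {"id": 1, "name": "old"}, 2: {"id": 2, "name": "keep"}}

run_load("incremental", [{"id": 1, "name": "new"}], ods)
incremental_ids = sorted(ods)
incremental_name = ods[1]["name"]

run_load("full", [{"id": 3, "name": "fresh"}], ods)
full_ids = sorted(ods)
print(incremental_ids, incremental_name, full_ids)
```

Keeping the shared steps in one place means the full and incremental pipelines can call the same code and differ only in the parameter value.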

Azure Data Factory - moving data from On-Premise SQL to Azure SQL

A simple question: Can this be achieved directly? I mean without the Azure blob storage in between (as shown in all the examples)? Can someone provide a code example, please?
Yes, you can do this directly. In fact, you can copy directly between any of our supported sources and sinks; you don't have to pass through blob storage. To go from on-premises SQL Server to Azure SQL, you will need to set up a Data Management Gateway (DMG) on your on-prem server. Then, for the output, you use a linked service of type AzureSqlDatabase and a dataset of type AzureSqlTable, instead of the AzureStorage linked service and AzureBlob dataset shown in the example. The exact steps to set up the DMG and the JSON code for the linked services, datasets, and pipelines can be found in our documentation. We are also improving our UI in the near future to make these kinds of copy setups an easy, code-free experience.
https://azure.microsoft.com/en-us/documentation/articles/data-factory-sqlserver-connector/
