Dynamically generate Extract scripts from metadata in USQL - azure

I have a requirement to read metadata information that comes in json format and dynamically generate extract statements to further transform data for that table.
I have currently loaded metadata information in Azure SQL DB. So, I would need to read this data and create extract statements on the fly and pass them to the USQL as a parameter.
Need some help in how to proceed with this and also whether this is the correct approach that I am following.
Thanks in advance.

Don't equate executing U-SQL to something like Stored Procedures in SQL Server: the two are quite different under the covers. For instance, passing parameters is kinda supported, but not like you may think, and [to the best of my knowledge] dynamic script elements aren't supported.
I do, however, think you could accomplish this with Azure Data Factory (ADF) and some custom code.
ADF executes U-SQL scripts by referencing a blob in Blob Storage, so you could have an ADF custom activity (Azure Batch) that reads your metadata and dynamically generates the U-SQL script to an Azure Blob.
Once available, the Data Factory can execute the generated script based on a pipeline parameter that holds the script name.
Doing this in ADF allows you to perform this complex operation dynamically. If you go this route, be sure to use ADF V2.

Related

Is it possible to store variables in Azure Data Factory pipelines?

In my Azure Data Factory pipeline, I want to use a variable, which gets updated on each run and which is also read on each run. At the moment, I am using a Database to achieve that. But it would be much simpler if Azure Data Factory provided a way of storing variables. So, my question is, is there any such facility in Azure Data Factory?
As #Joel Cochran says, ADF doesn't support persist a variable inside pipeline runs. We need to write data to a storage, eg. database or azure storage. Use Lookup Activity to get the value from blob storage file or DB. :)

Create a generic data factory with multiple linked services

Use Case: To create a generic data factory which can read data from different azure blob containers which has flat files into Azure SQL. I have created a data pipeline which uses stored procedures to populate the Azure SQL tables.
Issue: The trouble that I have is that I want to execute this data factory from my code and change the database and blob container on the fly and execute the same data factory with this new parameters. The Table names will remain the same on the Azure SQL side and the File name will also remain same in the blob storage. The change will the the Container or the folder name inside the Container which will be know before hand.
Please help me out or point me in the direction as to what could help me achieve this and if this can be at all be achieved or not.
You would need to use the parameterized datasets and linked services. Define parameters on your data factory pipeline (which you want to pass from your code e.g. container name or the folder name, connection string for SQL azure and connection string for blob storage). Once this is defined - you would need to pass these values downstream all the way till the linked service
i.e. something like this
Pipeline Parameters > Dataset Parameters > Linked Service Parameters

Get files list after azure data factory copy activity

Is there a method that gives me the list of files copied in azure data lake storage after a copy activity in azure data factory? I have to copy data from a datasource and after i have to skip files based on a particular condition. Condition must check also file path and name with other data from sql database. any idea?
As of now, there's no function to get the files list after a copy activity. You can however use a get Metadata activity or a Lookup Activity and chain a Filter activity to it to get the list of files based on your condition.
There's a workaround that you can check out here.
"The solution was actually quite simple in this case. I just created another pipeline in Azure Data Factory, which was triggered by a Blob Created event, and the folder and filename passed as parameters to my notebook. Seems to work well, and a minimal amount of configuration or code required. Basic filtering can be done with the event, and the rest is up to the notebook.
For anyone else stumbling across this scenario, details below:
https://learn.microsoft.com/en-us/azure/data-factory/how-to-create-event-trigger"

Generating and storing JSON files from the run-time parameters passed to Azure Data Factory v2 pipeline?

Can we create a file (preferably json) and store it in its supported storage sinks (like Blob, Azure Data Lake Service etc) using the parameters that are passed to Azure Data Factory v2 pipeline at run-time. I suppose it can be done via Azure Batch but it seems to be an overkill for such a trivial task. Is there a better way to do that?
Here are all the transform activities ADFv2 currently equips with, I'm afraid there isn't a direct way to create a file in ADFv2. You could leverage Custom activity to achieve this by running your customized code logic on an Azure Batch pool of virtual machines. Hope it'll help a little.

Azure Data Factory U-SQL activity dynamic parameter

Lets suppose I have a U-SQL script which get executed every one hour from ADF pipeline.
I have mssql database which contains config table. Is there any way to read a config from database and pass it to U-SQL script?
In ADF docs i couldn't find any way of doing it. Only SliceStart, SliceEnd, but what if my parameter is type of GUID ?
You can achieve this in ADF V2 (currently in Public Preview) using the Lookup activity. The lookup activity can pass the lookup results to the subsequent activity (in your case U-SQL activity).
https://learn.microsoft.com/en-us/azure/data-factory/control-flow-lookup-activity

Resources