Use Case: Create a generic data factory that can read flat files from different Azure Blob containers into Azure SQL. I have created a data pipeline which uses stored procedures to populate the Azure SQL tables.
Issue: I want to execute this data factory from my code, change the database and blob container on the fly, and run the same data factory with the new parameters. The table names will remain the same on the Azure SQL side, and the file name will remain the same in blob storage. The only change will be the container, or the folder name inside the container, which will be known beforehand.
Please point me in the right direction: how can I achieve this, and can it be achieved at all?
You would need to use parameterized datasets and linked services. Define parameters on your data factory pipeline for the values you want to pass from your code (e.g. container name or folder name, the connection string for Azure SQL, and the connection string for blob storage). Once these are defined, you need to pass the values downstream all the way to the linked service,
i.e. something like this
Pipeline Parameters > Dataset Parameters > Linked Service Parameters
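For the "execute from my code" part, you can trigger the parameterized pipeline with the Azure SDK. Below is a minimal sketch using the Python azure-mgmt-datafactory package; the subscription, resource group, factory, pipeline, and parameter names (containerName, folderName, sqlDatabaseName) are placeholders that must match whatever you define on your own pipeline.

```python
# pip install azure-identity azure-mgmt-datafactory
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

# Placeholder names -- replace with your own subscription/factory/pipeline.
SUBSCRIPTION_ID = "<subscription-id>"
RESOURCE_GROUP = "<resource-group>"
FACTORY_NAME = "<data-factory-name>"
PIPELINE_NAME = "CopyFlatFilesToSql"  # hypothetical pipeline name

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

# Pass the values that your pipeline parameters expect; these names are
# examples and must match the parameters defined on the pipeline.
run = adf_client.pipelines.create_run(
    RESOURCE_GROUP,
    FACTORY_NAME,
    PIPELINE_NAME,
    parameters={
        "containerName": "customer-a",
        "folderName": "2024/01",
        "sqlDatabaseName": "CustomerA_DB",
    },
)

# Optionally poll the run status.
status = adf_client.pipeline_runs.get(RESOURCE_GROUP, FACTORY_NAME, run.run_id)
print(status.status)
```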
What I want to do is similar to what we can easily do with Azure SQL databases, where we can click the copy functionality to create the same database on another SQL server.
I don't see that functionality in the Azure Cosmos DB resource.
Looking at the Microsoft documentation, they seem to point to a data migration tool,
but if we already have many containers/collections and millions of records, running this locally might be impractical.
Is there any other suggestion?
You can use an Azure Data Factory pipeline to move data from one Azure Cosmos DB container to another.
Here are the steps I have followed to move data from one container to another using ADF.
I have two containers in Cosmos DB as shown below,
Employee container has data,
Initially staff1 has no data,
Created an Azure data factory resource.
Created Linked service of type Azure cosmos DB.
Created two data sets of type Azure cosmos DB. One is for source, and another is for sink.
Created a pipeline as shown below,
Selected Sink as shown below,
Ran the pipeline successfully, and data was inserted into the target container.
Data is inserted into the target as shown below,
Reference link
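If you prefer to script these steps instead of clicking through the portal, the same copy pipeline can be sketched with the Python azure-mgmt-datafactory SDK. This is only a rough outline under some assumptions: the Cosmos DB linked service and the two datasets (here called EmployeeDataset and Staff1Dataset) already exist, the resource group/factory names are placeholders, and the CosmosDbSqlApiSource/CosmosDbSqlApiSink model classes are the ones that apply to a SQL API account.

```python
# pip install azure-identity azure-mgmt-datafactory
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    CopyActivity,
    CosmosDbSqlApiSink,
    CosmosDbSqlApiSource,
    DatasetReference,
    PipelineResource,
)

adf = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Copy everything from the Employee dataset to the staff1 dataset.
copy_activity = CopyActivity(
    name="CopyEmployeeToStaff1",
    inputs=[DatasetReference(reference_name="EmployeeDataset")],
    outputs=[DatasetReference(reference_name="Staff1Dataset")],
    source=CosmosDbSqlApiSource(),
    sink=CosmosDbSqlApiSink(),
)

adf.pipelines.create_or_update(
    "<resource-group>",
    "<data-factory-name>",
    "CopyCosmosContainer",
    PipelineResource(activities=[copy_activity]),
)
```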
I have the database username, server name, host, and other details stored in a table. I want to make a linked service that can take the connection details from this table and store them in parameters.
As of now I am hardcoding these details in parameters created in the linked service, but I want a generic linked service that can take the details from a table or from a pipeline parameter.
AFAIK, there is no feature in Azure Data Factory that allows you to parameterize a linked service or pipeline with values stored in an external table or file. You need to define the values in ADF itself.
The standard (and only) way is to parameterize a linked service and pass dynamic values at run time, with the values defined in ADF. For example, if you want to connect to different databases on the same logical SQL server, you can parameterize the database name in the linked service definition. This prevents you from having to create a linked service for each database on the logical SQL server.
You can use parameters to pass external values into pipelines, datasets, linked services, and data flows. Once the parameter has been passed into the resource, it cannot be changed. By parameterizing resources, you can reuse them with different values each time. Parameters can be used individually or as a part of expressions. JSON values in the definition can be literal or expressions that are evaluated at runtime.
The official document Parameterize linked services in Azure Data Factory will help you understand the fundamentals.
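As a concrete illustration of that document, here is a rough sketch (using the Python azure-mgmt-datafactory SDK; server, user, and resource names are placeholders) of an Azure SQL linked service with a DBName parameter referenced in the connection string via the @{linkedService().DBName} expression, so datasets and pipelines can supply a different database name at run time.

```python
# pip install azure-identity azure-mgmt-datafactory
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    AzureSqlDatabaseLinkedService,
    LinkedServiceResource,
    ParameterSpecification,
)

adf = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

# The DBName parameter is resolved at run time; the actual database name still
# has to be supplied from within ADF (pipeline -> dataset -> linked service).
# In practice keep credentials in Key Vault; they are inline here only for brevity.
ls = AzureSqlDatabaseLinkedService(
    connection_string=(
        "Server=tcp:<your-server>.database.windows.net,1433;"
        "Database=@{linkedService().DBName};"
        "User ID=<user>;Password=<password>;"
    ),
    parameters={"DBName": ParameterSpecification(type="String")},
)

adf.linked_services.create_or_update(
    "<resource-group>",
    "<data-factory-name>",
    "AzureSqlDatabase_Parameterized",
    LinkedServiceResource(properties=ls),
)
```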
Use case: I have data files of varying size copied to a specific SFTP folder periodically (daily/weekly). All of these files need to be validated and processed, then written to the related tables in Azure SQL. The files are in CSV format, and each flat text file corresponds directly to a specific table in Azure SQL.
Implementation:
I am planning to use Azure Data Factory. So far, from my reading, I can see that I can have a Copy pipeline to copy the data from the on-premises SFTP server to Azure Blob storage, and an SSIS pipeline to copy data from an on-premises SQL Server to Azure SQL.
But I don't see an existing solution that achieves what I am looking for. Can someone provide some insight into how I can achieve this?
I would try to use Data Factory with a Data Flow to validate/process the files (if possible for your case). If the validation is too complex or depends on other components, I would use Functions and put the resulting files in blob storage. The Copy activity is also able to import the resulting CSV files into SQL Server.
You can create a pipeline that does the following:
Copy data: copy the files from SFTP to Blob Storage (a sketch of this step follows below)
Do the data processing/validation via a Data Flow
Sink the results directly to the SQL tables (via the Data Flow sink)
Of course, you need an integration runtime that can access the on-premises server (if it is not publicly accessible), either by using VNet integration or by using a self-hosted IR.
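Here is a rough sketch of the first step (copy from SFTP to Blob) with the Python azure-mgmt-datafactory SDK. It assumes delimited-text datasets named SftpCsvDataset and BlobCsvDataset already exist, and the specific source/sink settings classes shown (DelimitedTextSource with SftpReadSettings, DelimitedTextSink with AzureBlobStorageWriteSettings) are my assumption for CSV copies; the validation Data Flow and the SQL sink would still be wired up as further activities.

```python
# pip install azure-identity azure-mgmt-datafactory
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    AzureBlobStorageWriteSettings,
    CopyActivity,
    DatasetReference,
    DelimitedTextSink,
    DelimitedTextSource,
    DelimitedTextWriteSettings,
    PipelineResource,
    SftpReadSettings,
)

adf = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Step 1: stage the CSV files from SFTP into Blob storage.
copy_to_staging = CopyActivity(
    name="CopySftpToBlob",
    inputs=[DatasetReference(reference_name="SftpCsvDataset")],
    outputs=[DatasetReference(reference_name="BlobCsvDataset")],
    source=DelimitedTextSource(store_settings=SftpReadSettings(recursive=True)),
    sink=DelimitedTextSink(
        store_settings=AzureBlobStorageWriteSettings(),
        format_settings=DelimitedTextWriteSettings(file_extension=".csv"),
    ),
)

# The validation/processing Data Flow and the SQL sink would be added as
# further activities in this same pipeline (via the ADF UI or the SDK).
adf.pipelines.create_or_update(
    "<resource-group>",
    "<data-factory-name>",
    "IngestSftpCsvFiles",
    PipelineResource(activities=[copy_to_staging]),
)
```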
In my Azure Data Factory pipeline, I want to use a variable that gets updated on each run and is also read on each run. At the moment, I am using a database to achieve that, but it would be much simpler if Azure Data Factory provided a way of storing variables across runs. So, my question is: is there any such facility in Azure Data Factory?
As #Joel Cochran says, ADF doesn't support persisting a variable across pipeline runs. We need to write the value to storage, e.g. a database or Azure Storage, and then use a Lookup activity to read it back from the blob file or database. :)
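To illustrate the pattern, here is a rough sketch (Python azure-mgmt-datafactory SDK; the dataset, table, and column names are made up) of a Lookup activity that reads the persisted value from an Azure SQL watermark table at the start of each run. Writing the new value back could be done with a Stored Procedure activity or a copy sink at the end of the run.

```python
from azure.mgmt.datafactory.models import (
    AzureSqlSource,
    DatasetReference,
    LookupActivity,
)

# Read the persisted value at the start of every run. Downstream activities
# can reference it with:
#   @activity('ReadWatermark').output.firstRow.LastValue
read_watermark = LookupActivity(
    name="ReadWatermark",
    dataset=DatasetReference(reference_name="WatermarkTableDataset"),
    source=AzureSqlSource(
        sql_reader_query="SELECT TOP 1 LastValue FROM dbo.PipelineState"
    ),
    first_row_only=True,
)

# This activity would then be added to PipelineResource(activities=[...]) as usual.
```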
I want to copy data from Azure Blob storage to an Azure SQL database. The destination database is split across different tables.
So is there any way to send the blob data directly to different SQL tables using a single pipeline with one copy activity?
As this should be a trigger-based pipeline and a continuous process, I created a trigger for every hour. Right now I can only send the blob data to one table, and then divide it into different tables by invoking another pipeline whose source and sink datasets are both the SQL database.
I am looking for a solution to this.
You could use a stored procedure in your database as a sink in the copy activity. This way, you can define the logic in the stored procedure to write the data to your destination tables. You can find the description of the stored procedure sink here.
You'll have to use a user-defined table type for this solution, and maintaining those can be difficult. If you run into issues, have a look at my and BioEcoSS' answers in this thread.
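For reference, here is a rough sketch of what that sink configuration looks like when defined through the Python azure-mgmt-datafactory SDK. The stored procedure, table type, parameter, and dataset names are placeholders, the source type depends on your actual source dataset, and the procedure plus its user-defined table type must already exist in the database.

```python
from azure.mgmt.datafactory.models import (
    AzureSqlSink,
    BlobSource,
    CopyActivity,
    DatasetReference,
)

# The sink calls dbo.usp_UpsertCustomers, passing the copied rows as a
# table-valued parameter of type dbo.CustomerTableType.
copy_via_proc = CopyActivity(
    name="CopyBlobToSqlViaProc",
    inputs=[DatasetReference(reference_name="BlobCsvDataset")],
    outputs=[DatasetReference(reference_name="AzureSqlDataset")],
    source=BlobSource(),
    sink=AzureSqlSink(
        sql_writer_stored_procedure_name="dbo.usp_UpsertCustomers",
        sql_writer_table_type="dbo.CustomerTableType",
        stored_procedure_table_type_parameter_name="Customers",
    ),
)
```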
In my experience, and according to the Azure Data Factory documentation, you cannot directly send blob data to different SQL tables using a single pipeline with one copy activity.
In the table mapping settings, one Copy Data activity only allows us to select one corresponding table in the destination data store, or to specify one stored procedure to run at the destination.
You don't need to create a new pipeline, though; just add another Copy Data activity to the same pipeline, with each copy activity calling a different stored procedure.
Hope this helps.
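To make that concrete, here is a minimal sketch (Python azure-mgmt-datafactory SDK, placeholder dataset and procedure names, and the same assumed source/sink types as the sketch above) of one pipeline containing two Copy Data activities, each sinking through a different stored procedure.

```python
from azure.mgmt.datafactory.models import (
    AzureSqlSink,
    BlobSource,
    CopyActivity,
    DatasetReference,
    PipelineResource,
)

def copy_to_proc(name, source_ds, sink_ds, proc_name, table_type):
    """One Copy Data activity that writes its rows through a stored procedure."""
    return CopyActivity(
        name=name,
        inputs=[DatasetReference(reference_name=source_ds)],
        outputs=[DatasetReference(reference_name=sink_ds)],
        source=BlobSource(),
        sink=AzureSqlSink(
            sql_writer_stored_procedure_name=proc_name,
            sql_writer_table_type=table_type,
        ),
    )

# One pipeline, two Copy Data activities, two different stored procedures.
pipeline = PipelineResource(
    activities=[
        copy_to_proc("CopyOrders", "BlobOrdersCsv", "SqlOrdersDataset",
                     "dbo.usp_LoadOrders", "dbo.OrdersTableType"),
        copy_to_proc("CopyCustomers", "BlobCustomersCsv", "SqlCustomersDataset",
                     "dbo.usp_LoadCustomers", "dbo.CustomersTableType"),
    ]
)
```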