Get files list after Azure Data Factory copy activity

Is there a method that gives me the list of files copied to Azure Data Lake Storage after a copy activity in Azure Data Factory? I have to copy data from a data source, and afterwards I have to skip files based on a particular condition. The condition must also check the file path and name against other data from a SQL database. Any ideas?

As of now, there's no function that returns the list of files after a copy activity. You can, however, use a Get Metadata activity or a Lookup activity and chain a Filter activity to it to get the list of files matching your condition.
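For the Get Metadata route, a minimal sketch of the chained activities in pipeline JSON; the DataLakeFolder dataset and the startswith condition are hypothetical, and matching the path and name against your SQL data would take an extra Lookup plus a richer condition expression:

```json
{
  "activities": [
    {
      "name": "GetFileList",
      "type": "GetMetadata",
      "typeProperties": {
        "dataset": { "referenceName": "DataLakeFolder", "type": "DatasetReference" },
        "fieldList": [ "childItems" ]
      }
    },
    {
      "name": "FilterFiles",
      "type": "Filter",
      "dependsOn": [ { "activity": "GetFileList", "dependencyConditions": [ "Succeeded" ] } ],
      "typeProperties": {
        "items": { "value": "@activity('GetFileList').output.childItems", "type": "Expression" },
        "condition": { "value": "@startswith(item().name, 'export_')", "type": "Expression" }
      }
    }
  ]
}
```

Downstream activities can then read the surviving files from `@activity('FilterFiles').output.value`.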
There's a workaround that you can check out here.
"The solution was actually quite simple in this case. I just created another pipeline in Azure Data Factory, which was triggered by a Blob Created event, and the folder and filename passed as parameters to my notebook. Seems to work well, and a minimal amount of configuration or code required. Basic filtering can be done with the event, and the rest is up to the notebook.
For anyone else stumbling across this scenario, details below:
https://learn.microsoft.com/en-us/azure/data-factory/how-to-create-event-trigger"
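In trigger JSON, the parameter passing described on that page looks roughly like the sketch below; the pipeline name, path filter, and scope are placeholders:

```json
{
  "name": "BlobCreatedTrigger",
  "properties": {
    "type": "BlobEventsTrigger",
    "typeProperties": {
      "blobPathBeginsWith": "/input/blobs/",
      "events": [ "Microsoft.Storage.BlobCreated" ],
      "scope": "/subscriptions/<subscription-id>/resourceGroups/<rg>/providers/Microsoft.Storage/storageAccounts/<account>"
    },
    "pipelines": [
      {
        "pipelineReference": { "referenceName": "ProcessNewFile", "type": "PipelineReference" },
        "parameters": {
          "folderPath": "@triggerBody().folderPath",
          "fileName": "@triggerBody().fileName"
        }
      }
    ]
  }
}
```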

Related

Can I populate different SQL tables at once inside Azure Data Factory when the source dataset is Blob Storage?

I want to copy data from Azure Blob Storage to an Azure SQL database. The destination database is divided among different tables.
So is there any way to send the blob data directly to different SQL tables using a single pipeline with one copy activity?
Since this is a trigger-based pipeline, it is a continuous process: I created a trigger for every hour, but right now I can only send the blob data to one table and then divide it among the different tables by invoking another pipeline whose source and sink datasets are both the SQL database.
I'm looking for a solution to this.
You could use a stored procedure in your database as a sink in the copy activity. This way, you can define the logic in the stored procedure to write the data to your destination tables. You can find the description of the stored procedure sink here.
You'll have to use a user-defined table type for this solution, and maintaining them can be difficult. If you run into issues, you can have a look at my and BioEcoSS' answers in this thread.
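A minimal sketch of that sink configuration in copy activity JSON, with hypothetical dataset, stored procedure, and table type names; ADF passes each batch of rows to the procedure as a table-valued parameter of the given type, and the procedure's body can then insert into several destination tables:

```json
{
  "name": "CopyBlobToSqlViaSproc",
  "type": "Copy",
  "inputs": [ { "referenceName": "SourceBlobDataset", "type": "DatasetReference" } ],
  "outputs": [ { "referenceName": "AzureSqlDataset", "type": "DatasetReference" } ],
  "typeProperties": {
    "source": { "type": "BlobSource" },
    "sink": {
      "type": "SqlSink",
      "sqlWriterStoredProcedureName": "usp_RouteIncomingRows",
      "sqlWriterTableType": "IncomingRowsType"
    }
  }
}
```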
In my experience, and according to the Azure Data Factory documentation, we cannot directly send the blob data to different SQL tables using a single pipeline with one copy activity.
That's because in the table mapping settings, one Copy Data activity only allows us to select a single corresponding table in the destination data store, or to specify a stored procedure to run at the destination.
You don't need to create a new pipeline, though: just add another copy activity, with each copy activity calling a different stored procedure.
Hope this helps.

Filter blob data in Copy Activity

I have a copy activity that copies data from Blob Storage to Azure Data Lake. The blob is populated by an Azure Function with an Event Hub trigger. Blob file names are appended with a UNIX timestamp, which is the event's enqueued time in the Event Hub. Azure Data Factory is triggered every hour to merge the files and move them over to the Data Lake.
The source dataset offers out-of-the-box filters by last-modified date in UTC. I could use those, but they limit me to the blob's last-modified date. I want to use my own date filters and decide where to apply them. Is this possible in Data Factory? If yes, can you please point me in the right direction?
For ADF, in any case, the only idea that came to my mind is a combination of a Lookup activity, a ForEach activity, and a Filter activity. It may be somewhat complex (a sketch follows below).
1. Use a Lookup activity to retrieve the data from the blob file.
2. Use a ForEach activity to loop over the result and apply your date-time filters.
3. Inside the ForEach activity, do the copy task.
Please refer to this blog to get some clues.
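One way to wire that combination up, sketched in pipeline JSON; the dataset name, the modifiedTime field, and the windowStart parameter are hypothetical, and the copy task would go inside the ForEach's activities array:

```json
{
  "activities": [
    {
      "name": "LookupBlobRows",
      "type": "Lookup",
      "typeProperties": {
        "source": { "type": "BlobSource" },
        "dataset": { "referenceName": "BlobFileDataset", "type": "DatasetReference" },
        "firstRowOnly": false
      }
    },
    {
      "name": "FilterByDate",
      "type": "Filter",
      "dependsOn": [ { "activity": "LookupBlobRows", "dependencyConditions": [ "Succeeded" ] } ],
      "typeProperties": {
        "items": { "value": "@activity('LookupBlobRows').output.value", "type": "Expression" },
        "condition": { "value": "@greaterOrEquals(item().modifiedTime, pipeline().parameters.windowStart)", "type": "Expression" }
      }
    },
    {
      "name": "ForEachMatchingRow",
      "type": "ForEach",
      "dependsOn": [ { "activity": "FilterByDate", "dependencyConditions": [ "Succeeded" ] } ],
      "typeProperties": {
        "items": { "value": "@activity('FilterByDate').output.value", "type": "Expression" },
        "activities": [ ]
      }
    }
  ]
}
```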
Reviewing your description of all the tasks, I suggest you also get an idea of the Azure Stream Analytics service. Whether the data source is Event Hubs or Azure Blob Storage, ASA supports it as an input, and it supports ADLS as an output.
You could create a job that configures the input and output, then use the familiar SQL language to filter your data however you want, for example with the WHERE operator or the date-time functions.

Dynamically generate EXTRACT scripts from metadata in U-SQL

I have a requirement to read metadata information that comes in JSON format and dynamically generate EXTRACT statements to further transform the data for that table.
I have currently loaded the metadata into an Azure SQL DB. So I need to read this data, create EXTRACT statements on the fly, and pass them to U-SQL as a parameter.
I need some help with how to proceed, and also with whether the approach I'm following is correct.
Thanks in advance.
Don't equate executing U-SQL with something like stored procedures in SQL Server: the two are quite different under the covers. For instance, passing parameters is kind of supported, but not in the way you may think, and [to the best of my knowledge] dynamic script elements aren't supported.
I do, however, think you could accomplish this with Azure Data Factory (ADF) and some custom code.
ADF executes U-SQL scripts by referencing a blob in Blob Storage, so you could have an ADF custom activity (running on Azure Batch) that reads your metadata and writes the dynamically generated U-SQL script to an Azure blob.
Once the script is available, Data Factory can execute it based on a pipeline parameter that holds the script name.
Doing this in ADF allows you to perform this complex operation dynamically. If you go this route, be sure to use ADF V2.
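A minimal sketch of the execution step, assuming the U-SQL activity's scriptPath property can be fed from a pipeline parameter via an expression; both linked service names are hypothetical:

```json
{
  "name": "RunGeneratedScript",
  "type": "DataLakeAnalyticsU-SQL",
  "linkedServiceName": { "referenceName": "AdlaLinkedService", "type": "LinkedServiceReference" },
  "typeProperties": {
    "scriptLinkedService": { "referenceName": "ScriptBlobStorage", "type": "LinkedServiceReference" },
    "scriptPath": { "value": "@pipeline().parameters.scriptName", "type": "Expression" }
  }
}
```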

Iterating through an Azure SQL table in Azure Data Factory

I'm using ADF V2 and trying to grapple with the Web activity.
The tasks are:
1. Copy a file from Blob Storage and put the data in an Azure SQL database.
2. Iterate through the data and use a PUT call to a REST API to update it.
Okay, so I can get the data into the table, no problem. I can also make a call to the API using a Web activity and put some hard-coded data there.
But I've been trying to use a ForEach activity to iterate through my table and call the Web activity to pass that data to the API.
This is where I'm stuck. I'm new to Data Factory and have been through all the standard help information, but I'm not getting anywhere.
Any help is appreciated.
I think you need to drive the ForEach via a Lookup activity that runs a SQL query to populate a dataset, and then call the Web activity for each row (a sketch follows the links below).
Here are some posts to get you started:
https://learn.microsoft.com/en-us/azure/data-factory/control-flow-for-each-activity
https://learn.microsoft.com/en-us/azure/data-factory/control-flow-lookup-activity
Replace the copy activity with the web call in the tutorial below:
https://learn.microsoft.com/en-us/azure/data-factory/tutorial-bulk-copy-portal
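Putting those pieces together, a minimal sketch in pipeline JSON; the table, columns, and URL are hypothetical, and note that the Lookup activity caps the number of rows it returns, so very large tables need batching:

```json
{
  "activities": [
    {
      "name": "LookupRows",
      "type": "Lookup",
      "typeProperties": {
        "source": {
          "type": "AzureSqlSource",
          "sqlReaderQuery": "SELECT Id, Payload FROM dbo.StagingTable"
        },
        "dataset": { "referenceName": "AzureSqlDataset", "type": "DatasetReference" },
        "firstRowOnly": false
      }
    },
    {
      "name": "ForEachRow",
      "type": "ForEach",
      "dependsOn": [ { "activity": "LookupRows", "dependencyConditions": [ "Succeeded" ] } ],
      "typeProperties": {
        "items": { "value": "@activity('LookupRows').output.value", "type": "Expression" },
        "activities": [
          {
            "name": "PutToApi",
            "type": "WebActivity",
            "typeProperties": {
              "url": { "value": "@concat('https://api.example.com/items/', string(item().Id))", "type": "Expression" },
              "method": "PUT",
              "headers": { "Content-Type": "application/json" },
              "body": { "value": "@item().Payload", "type": "Expression" }
            }
          }
        ]
      }
    }
  ]
}
```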

Azure Data Factory dataset cleanup

Can anyone tell me how Azure Data Factory datasets are cleaned up (removed, deleted, etc.)? Is there any policy or setting to control this?
From what I can tell, all the time-series datasets are left behind intact.
Say I want to develop an activity that overwrites data daily in the destination folder in Azure Blob or Data Lake storage (for example, one that is mapped to an external table in Azure SQL Data Warehouse and is a full data extract). How can I achieve this with just a copy activity? Should I add a custom .NET activity to clean up the no-longer-needed datasets myself?
Yes, you would need a custom activity to delete pipeline output.
You could have pipeline activities that overwrite the same output, but you must be careful to understand how the ADF slicing model and activity dependencies work, so that anything using the output gets a clean and repeatable set.
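For reference, in current (V2) syntax a custom cleanup step would look roughly like the sketch below; the Azure Batch and storage linked services, the executable, and its arguments are all hypothetical, and the blob-deletion logic lives in the executable itself:

```json
{
  "name": "CleanupPreviousOutput",
  "type": "Custom",
  "linkedServiceName": { "referenceName": "AzureBatchLinkedService", "type": "LinkedServiceReference" },
  "typeProperties": {
    "command": "Cleanup.exe --container output --prefix daily/",
    "resourceLinkedService": { "referenceName": "StorageLinkedService", "type": "LinkedServiceReference" },
    "folderPath": "customactivities/cleanup"
  }
}
```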
