I am trying to use two ForEach activities to iterate over the subfolders of parameterized folders and get the metadata of each subfolder. I have ForEach1 and ForEach2, each with its own items array. Within the second loop I need to combine both loops' item() in a Get Metadata activity to access my dataset, something like @item1()@item2(). Is this possible?
Nested ForEach activities are not allowed, but you can use an Execute Pipeline activity inside the ForEach activity, and in the child pipeline you can have another ForEach.
It is possible, but the second ForEach activity needs to run inside the first one, not as a separate activity later in the pipeline.
As you have it now, the first ForEach will run to completion, then the second one will start, and at that point you can no longer access the items of the first one.
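For example (a hedged sketch, assuming the outer ForEach iterates folders and the child pipeline has a string parameter called outerFolder, both of which are illustrative names): in the Execute Pipeline activity inside the outer ForEach, pass @item().name as the value of outerFolder; then, inside the child pipeline's own ForEach, the Get Metadata dataset's folder path can combine both levels, for example:

    @concat(pipeline().parameters.outerFolder, '/', item().name)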
Hope this helped!
I have an Azure Data Factory pipeline with a Lookup activity that checks a JSON file.
The size shown in Azure is as below:
Azure blob size screenshot
When I download it, I see the values below for the file, so it's not larger than the value the error states: "The size 5012186 of lookup activity result exceeds the limitation 4194304"
Size of the data as opened in Notepad++
Also below is the design of my pipeline that gets stuck:
Pipeline design - Lookup Activity to Read my model.json file to retrieve metadata
Any ideas on how to tackle this issue? Thanks in advance.
The Lookup activity has a limit of 5,000 rows and about 4 MB of output, so you can try the workaround below.
The workaround is the one mentioned in the Microsoft documentation:
Design a two-level pipeline where the outer pipeline iterates over an inner pipeline, which retrieves data that doesn't exceed the maximum rows or size.
Possible solution:
First, save your file list as multiple JSON files in a Blob Storage folder, each with at most 5,000 rows.
Create a Get Metadata activity to fetch the list of files in that folder.
Get Metadata activity settings
Then create a ForEach activity to iterate over the files.
In the ForEach activity settings, set Items to @activity('Get Metadata1').output.childItems
For the files, create a dataset: give the folder name manually and use a dataset parameter for the file name, so the file name can be supplied from the Lookup inside the parent ForEach.
In the Lookup activity inside the parent ForEach, give the file name as @string(item().name)
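A minimal sketch of such a parameterized dataset, assuming JSON files in Blob Storage (the names JsonFilesDataset, AzureBlobStorageLS, sourceFileName, staging and filelists are illustrative assumptions):

    {
      "name": "JsonFilesDataset",
      "properties": {
        "type": "Json",
        "linkedServiceName": { "referenceName": "AzureBlobStorageLS", "type": "LinkedServiceReference" },
        "parameters": { "sourceFileName": { "type": "string" } },
        "typeProperties": {
          "location": {
            "type": "AzureBlobStorageLocation",
            "container": "staging",
            "folderPath": "filelists",
            "fileName": { "value": "@dataset().sourceFileName", "type": "Expression" }
          }
        }
      }
    }

In the Lookup's dataset properties, set sourceFileName to @string(item().name).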
Execute Pipeline activity:
Before this, create an array parameter in the child pipeline, then pass the Lookup output (from inside the ForEach) to that parameter in the Execute Pipeline activity.
Give the Lookup output as @activity('Lookup1').output.value
Now create a ForEach inside the child pipeline and give it the array parameter as Items: @pipeline().parameters.ok
You can use whichever activity you want inside this ForEach; here I have used Append Variable.
Then create a result1 variable of type Array and set its value to @variables('arrayid')
The output will be the array of all ids in the file.
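A minimal sketch of how the pieces fit together, reusing the names from the steps above (ok, arrayid, result1, and an assumed id property on each row of the Lookup output); the pipeline and activity names are illustrative. In the parent pipeline, the Execute Pipeline activity passes the Lookup output to the child parameter:

    "parameters": {
      "ok": { "value": "@activity('Lookup1').output.value", "type": "Expression" }
    }

And the child pipeline iterates that parameter:

    {
      "parameters": { "ok": { "type": "array" } },
      "variables": { "arrayid": { "type": "Array" }, "result1": { "type": "Array" } },
      "activities": [
        {
          "name": "ForEach1",
          "type": "ForEach",
          "typeProperties": {
            "items": { "value": "@pipeline().parameters.ok", "type": "Expression" },
            "activities": [
              {
                "name": "Append id",
                "type": "AppendVariable",
                "typeProperties": {
                  "variableName": "arrayid",
                  "value": { "value": "@item().id", "type": "Expression" }
                }
              }
            ]
          }
        },
        {
          "name": "Set result1",
          "type": "SetVariable",
          "dependsOn": [ { "activity": "ForEach1", "dependencyConditions": [ "Succeeded" ] } ],
          "typeProperties": {
            "variableName": "result1",
            "value": { "value": "@variables('arrayid')", "type": "Expression" }
          }
        }
      ]
    }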
I have a folder in ADLS that has a few files. For the purpose of understanding, I will keep it simple. I have the following three files. When I loop through this folder, I want to get the "file name" and "source" as separate parameters so that I can pass them to subsequent activities/pipelines.
employee_crm.txt
contractor_ps.txt
manager_director_sap.txt
I want to put this in an array so that it can be passed accordingly to the subsequent activities.
(employee, contractor, manager_director)
(crm, ps, sap)
I want to pass two parameters to my subsequent activity (maybe a stored procedure) as usp_foo (employee, crm), and it will execute the process based on the parameters. Similarly, usp_foo (contractor, ps) and usp_foo (manager_director, sap).
How do I get the child items as two separate parameters so that it can be passed to SP?
To rephrase the question, you would like to 1) get a list of blob names and 2) parse those names into 2 variables. This pattern occurs frequently, so the following steps will guide you through how to accomplish these tasks.
Define an ADLS DataSet that specifies the folder. You do not need a schema, and you can optionally parameterize the FileSystem and Directory names:
To get a list of the objects within, use the GetMetadata activity. Expand the "Field list" section and select "Child Items" in the drop down:
Add a Filter activity to make sure you are only dealing with .txt files. Note it targets the "childItems" property:
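For reference, the Filter settings might look like this (assuming the Get Metadata activity is named Get Metadata1):

    Items:     @activity('Get Metadata1').output.childItems
    Condition: @endswith(item().name, '.txt')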
You may obviously alter these expressions to meet the specific needs of your project.
Use a ForEach activity to loop through each element of the Filter output sequentially:
Inside the ForEach, add activities to parse the filename. To access the file name, use "@item().name":
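For example, given names like employee_crm.txt, hedged expressions for two Set Variable activities could be (the variable names and the split rule are assumptions: the "name" is everything before the last underscore, and the "source" is the last underscore-delimited token with the extension stripped):

    For the "name" variable:   @substring(item().name, 0, lastIndexOf(item().name, '_'))
    For the "source" variable: @last(split(replace(item().name, '.txt', ''), '_'))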
In my example, I am storing these values as pipeline variables, which are global [hence the need to perform this operation sequentially]. Storing them in an Array for further use gets complicated and tricky in a hurry because of the limited Array and Object support in the Pipeline Expression Language. The inability to have nested foreach activities may also be a factor.
To overcome these limitations, at this point I would pass these values to another pipeline directly from inside the ForEach loop.
This pattern has the added benefit of allowing individual file execution apart from the folder processing.
I have a pipeline built that reads metadata from a blob container subfolder (raw/subfolder). I then execute a ForEach loop with another Get Metadata task to get data for each subfolder; it returns values like /raw/subfolder1/folder1, /raw/subfolder2/folder1, and so on. I need another ForEach loop to access the files inside each folder. The problem is that you cannot run a ForEach loop inside another ForEach loop, so I cannot iterate further down to the files.
I have an Execute Pipeline activity that calls the above pipeline and then uses a ForEach. My issue is that I'm not finding a way to pass the item().name from the above iteration to my new pipeline. It doesn't appear you can pass in objects from the previous pipeline? How would I accomplish this nested ForEach metadata gathering so I can iterate further over my files?
Have you tried using parameters? Here is how it would look:
In your parent pipeline, click on the "Execute Pipeline" activity that triggers the inner pipeline (your new pipeline), go to Settings, and specify the item name as a parameter called "name".
In your inner pipeline, click anywhere on empty space and add new parameter "name".
Now you can refer to that parameter like this: @pipeline().parameters.name
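A minimal sketch of that Execute Pipeline activity, assuming the inner pipeline is called InnerPipeline and its parameter is called name (both names are illustrative):

    {
      "name": "Execute Pipeline1",
      "type": "ExecutePipeline",
      "typeProperties": {
        "pipeline": { "referenceName": "InnerPipeline", "type": "PipelineReference" },
        "parameters": {
          "name": { "value": "@item().name", "type": "Expression" }
        }
      }
    }

Inside InnerPipeline, any property (for example a Get Metadata dataset parameter) can then use @pipeline().parameters.name.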
Using parameters works in this scenario, as @Andrii mentioned.
For more on passing parameters between activities refer to this link.
https://azure.microsoft.com/en-in/resources/azure-data-factory-passing-parameters/
I use Get Metadata to retrieve the file names in a blob container. I also have a README.md in this same blob container.
I would like to be able to apply a Filter and set a variable value based on the files present in the blob container, but without taking the README.md file into consideration. How is this possible?
As an example, here is the logic I would like to implement for setting the variable value:
@if(and(equals(greater(activity('FilterOnOthers').output.FilteredItemsCount,0),true),not(equals(activity('FilterOnOthers').output.Value[0],'README.md'))),'YES','NO')
But it does not work as expected.
Thank you for your help
You can use an If Condition activity. In the If Condition, check the metadata output; the condition should test whether the file name equals README.md. Put your desired activity inside the If Condition's True or False branch accordingly.
Great question! Here's an almost fool-proof way to do it:
Create a variable of Array type in your pipeline, say 'x'.
Have the Get Metadata activity read the folder and its childItems by adding Child Items in the field list, as shown below (highlighted):
After getting the list of child items as an array in the output of the Get Metadata activity, chain a ForEach activity as shown in the above screenshot.
In the ForEach activity, for Items, use the expression: @activity('Get Metadata1').output.childItems
In the Activities tab of the ForEach activity, create an If Condition activity.
In the If Condition activity, specify the condition, e.g. @equals(item().name, 'README.md').
In the Activities tab of the If Condition, add an "Append Variable" activity under the False condition.
In the Append Variable activity, append the value @item().name to the variable 'x'.
Now your variable 'x' has all values except 'README.md'.
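A minimal sketch of that ForEach body (the activity names are assumptions):

    {
      "name": "ForEach1",
      "type": "ForEach",
      "typeProperties": {
        "items": { "value": "@activity('Get Metadata1').output.childItems", "type": "Expression" },
        "activities": [
          {
            "name": "If README",
            "type": "IfCondition",
            "typeProperties": {
              "expression": { "value": "@equals(item().name, 'README.md')", "type": "Expression" },
              "ifFalseActivities": [
                {
                  "name": "Append file name",
                  "type": "AppendVariable",
                  "typeProperties": {
                    "variableName": "x",
                    "value": { "value": "@item().name", "type": "Expression" }
                  }
                }
              ]
            }
          }
        ]
      }
    }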
Hope I was clear in the explanation.
I have a pipeline that includes a simple copy task that reads data from an SFTP source and writes to a table within a server. I have successfully parameterized the pipeline to prompt for which server and table I want to use at runtime but I want to specify a list of server/table pairs in a table that is accessed by a lookup task for use as parameters instead of needing to manually enter the server/table each time. For now it's only three combinations of servers and tables but that number should be able to flex as needed.
The issue I'm running into is that when I try to specify the array variable as my parameter in the Lookup task within a ForEach loop, the pipeline fails, telling me I need to specify an integer index into the value array. I understand what it's telling me, but it doesn't seem logical that I'd have to specify '0', '1', '2' and so on each time.
How do I just let it iterate through the server and table pairs until there aren't any more to process? I'm not sure of the exact syntax, but there has to be a way to tell it to run the pipeline once with this server and table, again with a different server and table, and then again and again until no more pairs are found in the table.
Not sure if it matters, but I am on the Data Flow preview and using ADF v2.
https://learn.microsoft.com/en-us/azure/data-factory/control-flow-for-each-activity#iteration-expression-language
I guess you want to access the current iteration item, which is @item() in the ADF expression language.
If you chain a ForEach activity after a Lookup activity and put the Lookup activity's output in the Items field of the ForEach activity, then @item() refers to the current item of the Lookup output.
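For example, assuming the Lookup returns rows with ServerName and TableName columns (the column names are assumptions based on your pairs table):

    ForEach Items: @activity('Lookup1').output.value

    Inside the ForEach, e.g. as dataset or stored procedure parameters:
      @item().ServerName
      @item().TableName

Each iteration then runs the copy (or the usp_foo call) once per server/table pair, without any hard-coded integer indexes.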