Would really appreciate some assistance / pointers with what has to be a very common and probably simple situation, but the answer just isn't obvious to me.
Consider the situation below:
All I am doing is reading some data in from a blob - works fine.
Then I use a JSON parse to get the dynamic tags/labels to work with.
Then I append the 'name' values to an array variable named 'myArr', which I initialised a step or two earlier.
When I run I can inspect the contents of 'myArr' and all the names are in there. So far so good.
Now.. how can I write the contents of 'myArr' into a blob or out to a data lake? When I add a create blob activity there is no way to select the 'myArr' variable as the content. I am messing around with another foreach control activity, but it just gets messy and doesn't work.
There has to be a simple/elegant way to push that array variable into a blob?
Thanks!
The "Blob content" input box requires file content, not an array variable, so you need to do some operation on "myArr". You can use the "Parse JSON" action to parse "myArr" and then select the "Body" from the "Parse JSON" action for the "Blob content" input box, as shown in the screenshot below:
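Alternatively, if you just want the raw JSON text in the blob, you can skip the extra Parse JSON step and serialize the variable yourself. A minimal sketch, assuming the variable is named 'myArr' as in the question: set the "Blob content" dynamic content to

string(variables('myArr'))

which writes the array out as its JSON representation.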
Hope it helps~
I have an ADF pipeline which iterates over a set of files, performing various operations, and I have an Azure CosmosDB (SQL API) instance where I would like to insert the name of each file and a timestamp, mainly to keep track of which files have already been processed and which have not; in the future I might want to add some other bits of data related to each file.
What I have is my CosmosDB
And currently I am trying to utilize the Copy Data activity for the insert part.
One problem I have is that this activity expects a source, while at this point I have only the filename. In theory, one option was to use the Blob Storage from which I read the file at the beginning, but since that Blob Storage dataset is set to store binary files, I got the following error when I tried to use it as the source:
Because of that I created a dummy CosmosDB Linked service, but I have several issues with this approach:
Generally, the idea of a dummy source is not very appealing to me
I haven't found a lot of information on the topic, but it seems that if I want to use something in the Sink I need to SELECT it from the source
Even though I have selected a value for the id, the item is not saved with the value selected in the source query; as you can see from the first screenshot, I got a GUID and only the name is as I want it.
So my questions are two. I'm just learning ADF, but this approach doesn't look like the proper way to insert an item into CosmosDB from an activity, so a better/more common approach would be appreciated. If there is no better proposal, how can I at least apply my own value for the id column? If I create the item in the CosmosDB GUI and save it from there, as you can see, I am able to use the filename as the id, which for now seems like a good idea. But I wasn't able to add a custom value (string or int) when trying through the activity, so how can I achieve this?
This is what my Sink looks like:
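For reference, one pattern that addresses the id problem is to project the values you want directly in the dummy source query and then map that column to id in the Copy activity's mapping tab; when id is not mapped explicitly, the Cosmos DB sink generates a GUID, which matches what the first screenshot shows. A sketch, where the pipeline parameter fileName and the column alias processedAt are assumed names, and the dummy collection is assumed to contain at least one document:

SELECT TOP 1 '@{pipeline().parameters.fileName}' AS id, '@{utcnow()}' AS processedAt FROM c

ADF evaluates the @{...} interpolations before sending the query to Cosmos DB, so the literal values come back as a one-row result set that can be mapped to the sink columns.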
I'm working with ADF and trying to leverage parameters to make life easier and reduce the number of objects being created in the ADF itself. What I am trying to do would appear on the surface to be extremely simple, but in reality it's driving me slowly crazy. Would greatly appreciate any assistance!
I am trying to set up a parameterised dataset to be used as a sink target. Inside that dataset I have added a param named "filename" of type string. In the connection tab I have added that param to the file part of the path. The folder part points to my Azure Data Lake folder and the file part is set to @dataset().filename, which is the result of choosing 'dynamic content' and then selecting the param.
So far so good.. my sink target is, as far as I am aware, ready to receive file names to write out to.
This is where it all goes wrong.
I now create a new pipeline. I want to use a list or array of values inside that pipeline which represent the names of the files I want to process. I have been told that I'll need a ForEach to send each of the values one at a time to the Copy Data task inside the ForEach. I am no stranger to foreach-type loops and behaviours... but for the life of me I CANNOT see where to set up the list of filenames. I can create a param of type "array", but how the heck do you populate it?
I have another use case which this problem is preventing me from completing. This use case is, I think, the same problem but perhaps serves to explain the situation more clearly. It goes like this:
I have a linked service to a remote database. I need to copy data from that database (around 12 tables) into the data lake. At the moment I have about 12 Copy Data activities linked together - which is ridiculous. I want to use a ForEach loop to copy the data from source to data lake one table after another. Again, I can set up the sink dataset to be parameterised just fine... but how the heck do I create the array/list of table names in the pipeline to pass to the sink dataset?
I add the ForEach and, inside the ForEach, a Copy Data - but where do I add all the table names?
Would be very grateful for any assistance. THANK YOU.
If you want to manually populate the values of an array as a pipeline parameter, create the parameter with the Array type and set its value with syntax like: ["File1","File2","File3"]
You then iterate that array using a ForEach activity.
Inside the ForEach, you reference @item() to get the current file name value the loop is on.
You can also use a Lookup activity to get data from elsewhere and iterate over that using the ForEach.
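Putting those pieces together, a minimal sketch (the parameter name, table names, and activity names here are placeholders, not from the question):

Pipeline parameter: tablenames (type Array), value: ["Customers","Orders","Invoices"]
ForEach activity, Items: @pipeline().parameters.tablenames
Copy Data inside the ForEach, sink dataset parameter filename: @item()

For the 12-table use case, you can swap the hard-coded array for a Lookup activity running something like SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES, set the ForEach Items to @activity('Lookup1').output.value, and reference @item().TABLE_NAME inside the loop.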
Is there a way to create a template to validate incoming files, including such checks as empty-file checks, format, data types, and record counts, and that will stop the workflow if any of the checks fail? The solution for this requirement should handle multiple file formats and reduce the burden on ETL processing and checks, to enable scale.
File transfer is to occur either by trigger or by a data currency rule.
Data Factory focuses more on data transfer than on file filtering.
We could use Get Metadata and If Condition to achieve some of these features, such as validating the file format, size, and file name. You can use Get Metadata to get the file properties, and If Condition can help you filter the files.
But that's too complex for Data Factory alone to achieve all the features you want.
Update:
For example, we can parameterize the file name in the source dataset:
Create a dataset parameter filename and a pipeline parameter name:
Use Get Metadata to get its properties: Item type, Exists, Size, Item name.
Output:
For example, we can build an expression in the If Condition to judge whether the file is empty (size = 0):
@equals(activity('Get Metadata1').output.size,0)
If True, the file is empty; if False, it isn't. Then we can build the workflow in the True or False activities.
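You can also fold several checks into a single expression. A sketch that treats the file as invalid when it is missing or empty (assuming the activity is named 'Get Metadata1' and its field list includes Exists and Size):

@if(activity('Get Metadata1').output.exists, equals(activity('Get Metadata1').output.size, 0), true)

If this returns True, the file is missing or empty and the True branch can stop the workflow (for example with a Fail activity); otherwise processing continues in the False branch.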
Hope this helps.
I demonstrate similar techniques to validate source files and take appropriate downstream actions in your pipeline based on those values in this video.
My JSON array is as follows.
[{"20656":"20656","20648":"20648","20666":"20666","20657":"20657","20658":"20658","20659":"20659","20660":"20660","20665":"20665","20672":"20672","20667":"20667","24517":"24517","20677":"20677","20662":"20662","24605":"24605","20675":"20675","20663":"20663","20649":"20649","20664":"20664","20668":"20668","20669":"20669","20670":"20670","20671":"20671","20673":"20673","20674":"20674","20676":"20676"}]
How do I take each individual value and use it as a variable for my next query?
Thanks,
Assuming your variable looks like this
Add a Select action with its From property set to:
split(replace(replace(replace(variables('MyJsonArray'),'[{',''),'}]',''),'"',''),',')
And its Map property set to pair MyID with the expression:
substring(item(),0,lastIndexOf(item(),':'))
Now you can simply iterate over all IDs with a simple ForEach and refer to each ID by using the expression
item()['MyID']
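For the sample array in the question, the Select action output would look like this (trimmed to the first three entries):

[
  { "MyID": "20656" },
  { "MyID": "20648" },
  { "MyID": "20666" }
]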
You can use the "Parse JSON" action to parse your JSON data.
First, I create an "Initialize variable" action to store the JSON data (shown in the screenshot below).
Then I create a "Parse JSON" action to parse the JSON object above.
If you don't know how to create the schema, you can click "Use sample payload to generate schema" and paste your JSON data into it; it will generate the schema for you automatically. You can also refer to this tutorial: https://learn.microsoft.com/en-us/azure/logic-apps/logic-apps-perform-data-operations#parse-json-action
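For reference, the generated schema for the sample payload comes out roughly like this (trimmed here to two of the keys; the generator emits one entry per key):

{
  "type": "array",
  "items": {
    "type": "object",
    "properties": {
      "20656": { "type": "string" },
      "20648": { "type": "string" }
    }
  }
}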
After that, we can use each individual value as a variable in our logic app (I created "Initialize variable 2" in the screenshot as an example).
I use Get Metadata to retrieve the file names in a blob container. I also have a README.md in that same container.
I would like to be able to apply a filter and set a variable's value based on the files present in the blob container, but without taking the README.md file into account. How can I do this?
As an example, here is the logic I would like to implement for setting the variable value:
@if(and(equals(greater(activity('FilterOnOthers').output.FilteredItemsCount,0),true),not(equals(activity('FilterOnOthers').output.Value[0],'README.md'))),'YES','NO')
But it does not work as expected.
Thank you for your help
You can use an If Condition activity. In the If Condition, check the Get Metadata output; the condition should test whether the file name equals README.md. Then place your desired activity inside the True/False branches.
Great question! Here's an almost fool-proof way to do it:
Create a variable of Array type in your pipeline, say 'x'.
Have a Get Metadata activity read the folder and its childItems by adding Child Items to the field list of the dataset, as shown below (highlighted):
After getting the list of child items as an array in the output of the Get Metadata activity, chain a ForEach activity as shown in the screenshot above.
In the ForEach activity, for Items, use the expression: @activity('Get Metadata1').output.childItems
In the Activities tab of the ForEach activity, create an If Condition activity.
In the If Condition activity, specify the condition, e.g. @equals(item().name, 'README.md').
In the Activities tab of the If Condition, add an "Append variable" activity for the False condition.
In the Append variable activity, append the value @item().name to the variable 'x'.
Now your variable 'x' has all values except 'README.md'.
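As an aside, since the question already mentions a Filter activity ('FilterOnOthers'), the ForEach/If/Append chain can be replaced by a single Filter activity. A sketch, assuming the Get Metadata activity is named 'Get Metadata1':

Items: @activity('Get Metadata1').output.childItems
Condition: @not(equals(item().name, 'README.md'))

The filtered list is then available as @activity('FilterOnOthers').output.Value, and its length as @activity('FilterOnOthers').output.FilteredItemsCount - the same properties used in the question's expression.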
Hope I was clear in the explanation.