Long story short, I have a data dump that is too large for an Azure Function, so we are using Data Factory.
I have tasked another function to generate an access token for an API and output it as part of a JSON payload. I would like to set that token to a variable within the pipeline. So far I have this:
I'm attempting to use the Dynamic Content "language" to set the variable:
@activity('Get_Token').output
I'd like something like Python's:
token = data.get('data', {}).get('access_token', '')
As a secondary question, my next step is to use this token to call an API while iterating over another output, so perhaps this exact step can be added into the ForEach?
Looks like the variable should be @activity('Get token').output.data.access_token, as others have indicated, but, as you've guessed, there's no need to assign a variable if you only need it within the ForEach. You can access any predecessor's output from a successor activity. Here's how to use the token while iterating over another output:
1. Let's say your function also outputs listOfThings as an array within the data key. Then you can set the ForEach activity to iterate over @activity('Get token').output.data.listOfThings.
2. Inside the ForEach you will have (let's say) a Copy activity with a REST dataset as the source. Configure the REST linked service with anonymous auth ...
3. ... then you'll find a field called Additional Headers in the REST dataset where you can create a key Authorization with a value like Bearer @{activity('Get token').output.data.access_token} (use whatever auth scheme your API expects; the @{...} interpolation embeds the expression in the header string).
4. The thing that you said you want to iterate over (in the listOfThings JSON array) can be referenced inside the ForEach activity with @item() (or, if it's a member of an item in the listOfThings iterable, then it would be @item().myMember).
To make step 4 explicit for anyone else arriving here:

If listOfThings looks like this: listOfThings: ["thing1", "thing2", ...]
for example, filenames: ["file1.txt", "file2.txt", ...]
then @item() becomes file1.txt etc.,

whereas

if listOfThings looks like this: listOfThings: [{"key1":"value1", "key2":"value2", ...}, {"key1":"value1", "key2":"value2", ...}, ...]
for example, filenames: [{"folder":"folder1", "filename":"file1.txt"}, {"folder":"folder2", "filename":"file2.txt"}, ...]
then @item().filename becomes file1.txt etc.
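To make the dataset fields concrete, here is a minimal sketch of the two dynamic values involved, assuming the activity and array names used above and that listOfThings holds relative URL fragments; the Bearer scheme is an assumption about what the target API expects:

    Relative URL:       @{item()}
    Additional header:  Authorization = Bearer @{activity('Get token').output.data.access_token}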
Sorry if this is a bit vague or rambly; I'm still getting to grips with Data Factory, and a lot of it seems a bit obtuse...
What I want to do is query my Cosmos Database for a list of Ids of records that need to be updated. For each of these records, I want to call a REST API using the Id (i.e. /Record/{Id}/Details)
I've created a Data Flow that took a string as a parameter and then called the REST API fine.
I then made a pipeline using a Lookup with a query (select c.RecordId from c where...) and passed that into a ForEach with items set to @activity('Lookup1').output.value
I then set up the activity of the ForEach to run my Data Flow. From research, I think I'm supposed to set the parameter value to "@item().RecordId", but that gives an error: "parameter [name] does not match parameter type 'string'".
I can change the type of the parameter to any (and use toString([parameter]) to cast it), and then when I try to debug it passes the parameter in, but it gives an error of "Job failed due to reason: at (Line 2/Col 14): Datatype any not found".
I'm not sure what the solution is. Is there a way to cast the result of the lookup to an integer or string? Is there a way to narrow an any down? Is there a better way than toString() that would work? Is there a better way than ForEach?
I tried to reproduce a scenario similar to what you are trying.
My sample data in Cosmos DB:
To query the Cosmos database for a list of ids and call a REST API using the id for each of these records:
First, I used a Lookup activity in Data Factory and selected the ids where last_name is Bluth.
Its output and settings are as below:
Then I passed the output of the Lookup activity to a ForEach activity.
Then, inside the ForEach activity, I created a Data Flow activity whose data source is the REST API. My REST API call for a specific user is https://reqres.in/api/users/2, so I gave the base URL as https://reqres.in/api/users.
Then I created a parameter called demoId with datatype string, and in the relative URL I gave it the dynamic value @dataset().demoId.
After this, I gave the source parameter the value @item().id, since after https://reqres.in/api/users only the id needs to be provided to get the data. In your case you can try /Record/@{item().id}/Details.
For each id, it successfully passes the id to the REST API and fetches the data.
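To summarize the wiring without the screenshots (names as above): note that the @{...} string interpolation always yields a string, which also avoids the "does not match parameter type 'string'" error from the question.

    REST dataset relative URL:    @dataset().demoId
    Data flow parameter demoId:   @{item().id}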
I am making a call to a REST API with Azure Synapse and the return dataset looks something like this:
{
"links": [
{
"rel": "next",
"href": "[myRESTendpoint]?limit=1000&offset=1000"
},
{
"rel": "last",
"href": "[myRESTendpoint]?limit=1000&offset=60000"
},
{
"rel": "self",
"href": "[myRESTendpoint]"
}
],
"count": 1000,
"hasMore": true,
"items": [
{
"links": [],
"closedate": "6/16/2014",
"id": "16917",
"number": "62000",
"status": "H",
"tranid": "0062000"
},...
],
"offset": 0,
"totalResults": 60316
}
I am familiar with making a REST call to a single endpoint that can return all the data with a single call using a Synapse pipeline, but this particular REST endpoint has a hard limit of 1000 records per call. It does, however, return a property named "hasMore".
Is there a way to recursively make REST calls in a Synapse pipeline until the "hasMore" property equals false?
The end goal of this is to sink data to either a dedicated SQL pool or into ADLS2 and transform from there.
I have tried to achieve the same scenario using Azure Data Factory, which seems more appropriate for, and makes it easier to achieve, the goal "The end goal of this is to sink data to either a dedicated SQL pool or into ADLS2 and transform from there".
As you have to hit the endpoint repeatedly to fetch 1000 records at a time, you can set up the REST connector's built-in pagination if the response header/body contains the URL for the next page.
You won't be able to use that built-in functionality if the next-page link or query parameter isn't included in the response headers/body.
Alternatively, you may use loop logic around the Copy activity:
Create two parameters in the REST connector.
Fill in the parameters for the REST connector's relative URL.
Using the Set Variable activity, increase the value of this variable on each pass through the loop, so the URL for the Copy activity is set dynamically on each cycle. For the loop itself, you may use the Until activity.
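A minimal sketch of that loop, assuming a string variable named offset and some activity (here called 'Get Page', e.g. a Lookup) whose output exposes the hasMore flag; all of these names are illustrative, not from the original post:

    Until expression:    @equals(string(activity('Get Page').output.firstRow.hasMore), 'false')
    Copy relative URL:   ?limit=1000&offset=@{variables('offset')}
    Increment offset (via a second variable, since Set Variable cannot reference the variable it is setting):
        Set Variable temp:    @string(add(int(variables('offset')), 1000))
        Set Variable offset:  @variables('temp')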
Alternative:
In my experience, the REST connector's pagination is quite rigid. I usually put the Copy activity inside a loop instead, so as to have more control.
For those following the thread, I used IpsitaDash-MT's suggestion of the ForEach loop. In the case of this API, when a call is made, I get a property named "totalResults" returned at the end of the call. Here are the steps I used to achieve what I was looking to do:
Make a dummy call to the API to get the "totalResults" property. This is just a call to find the number of results I am looking to get. In the case of this API, the body of the request is a SQL statement, so for the dummy request I ask only for the IDs of the results I am looking to get.
SQL statement example
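The screenshot of the SQL statement isn't reproduced here; it would be something along these lines (table and column names are illustrative):

    SELECT id FROM customers WHERE ...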
I then take the property "totalResults" from that request and set a dynamic value in the "Items" of the ForEach loop, like this:
@range(0,add(div(sub(int(activity('Get Pages Customers').output.totalResults),mod(int(activity('Get Pages Customers').output.totalResults),1000)),1000),1))
NOTE: The API only allows pages of 1000 results, so I do some math to get a range of page numbers. I also have to add 1 to the final result to include the last page.
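Working that expression through with the totalResults of 60316 from the sample response above:

    mod(60316, 1000)  -> 316
    sub(60316, 316)   -> 60000
    div(60000, 1000)  -> 60
    add(60, 1)        -> 61
    range(0, 61)      -> [0, 1, ..., 60], i.e. 61 page numbers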
ForEach Loop Settings
The API takes two parameters, "limit" and "offset". Since I want all of the data, there is no reason to set limit to anything other than 1000 (the maximum allowed). The offset parameter can be set to any number greater than or equal to 0 and less than or equal to "totalResults" - "limit". So I take the range established in step 2 and multiply each item by 1000 to set the offset parameter in the URL.
Setting the offset parameter in the copy data activity
Dynamic value of the Relative URL in the REST connector
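The screenshots aren't reproduced here, but from the description the relative URL's dynamic value would look something like this (limit and offset being the API query parameters named above):

    ?limit=1000&offset=@{string(mul(item(), 1000))}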
NOTE: I found it better to sink the data as JSON into ADLS2 first, rather than into a dedicated SQL pool, because of the Lookup activity downstream.
Since Synapse does not allow nested ForEach loops, I run the data through a data flow to format it and check for duplicates and updates.
When the data flow completes, it kicks off a Lookup activity to get the data that was just processed and pass it into a new pipeline, which uses another ForEach loop to get the child data for each ID of the parent data.
Data Flow and Lookup for child data pipeline
I have an Azure Stream Analytics job that uses an Event Hub and reference data in Blob storage as its two inputs. The reference data is a CSV that looks something like this:
REGEX_PATTERN,FRIENDLY_NAME
115[1-2]{1}9,Name 1
115[3-9]{1}9,Name 2
I then need to look up an attribute of the incoming Event Hub event against this CSV to get the FRIENDLY_NAME.
The typical way of using reference data is a JOIN clause, but in this case I cannot use it because such regex matching is not supported with the LIKE operator.
A UDF is another option, but I cannot seem to find a way of using the reference data CSV inside the function.
Is there any other way of doing this in an Azure Stream Analytics job?
As far as I know, JOIN is not supported in your scenario. The join key has to be specific; it can't be a regex value.
Thus, reference data is not suitable here, because it has to be used in the ASA SQL like below:
SELECT I1.EntryTime, I1.LicensePlate, I1.TollId, R.RegistrationId
FROM Input1 I1 TIMESTAMP BY EntryTime
JOIN Registration R
ON I1.LicensePlate = R.LicensePlate
WHERE R.Expired = '1'
A specific join key is needed, which is why the reference data input doesn't help here.
Your idea is to use a UDF script and load the data inside the UDF to compare against hard-coded regex data. That is not easy to maintain. Maybe you could consider my workaround:
1. You said you have different reference data. Group it and store each group as a JSON array, assigning a group id to every group. For example:
Group Id 1:
[
{
"REGEX":"115[1-2]{1}9",
"FRIENDLY_NAME":"Name 1"
},
{
"REGEX":"115[3-9]{1}9",
"FRIENDLY_NAME":"Name 2"
}
]
....
2. Add a column referring to the group id and set an Azure Function as an output of your ASA SQL. Inside the Azure Function, accept the group id column, load the corresponding group's JSON array, then loop over the rows to match the regex and save the data to its destination; a rough sketch follows.
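A rough sketch of such a function (JavaScript, HTTP-triggered), assuming the groups are deployed with the function as groups.json and that each event carries groupId and attribute fields; all of these names are illustrative, not part of the original answer:

// regex groups deployed alongside the function, keyed by group id,
// e.g. { "1": [ { "REGEX": "115[1-2]{1}9", "FRIENDLY_NAME": "Name 1" }, ... ] }
const groups = require('./groups.json');

module.exports = async function (context, req) {
    // ASA posts a JSON array of events to the function output
    const events = req.body || [];
    for (const evt of events) {
        const rules = groups[evt.groupId] || [];
        // first regex in the group that matches the event attribute wins
        const hit = rules.find(r => new RegExp(r.REGEX).test(evt.attribute));
        evt.FRIENDLY_NAME = hit ? hit.FRIENDLY_NAME : null;
        // ...save evt to the destination store here...
    }
    context.res = { status: 200 };
};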
I think an Azure Function is more flexible than a UDF in an ASA SQL job. Additionally, this solution may be easier to maintain.
I have a pipeline that takes a list of IDs as input, and I need to iterate through these IDs and call a REST API in batches of 10 IDs at a time (the IDs are passed as a parameter in a JSON request).
1) Is there any approach using the ForEach activity in Data Factory that lets you pass a step size?
2) Do you have any other suggestions of how to accomplish this?
I have tried using a "forEach" loop, and also thought of using the "setVariable" and "appendVariable" activities to store the current index during the loop, but I couldn't find a way to get the current index inside the "forEach".
You should use a Lookup activity. With that you can get information from a database, files, or whatever, and then pass it to a ForEach loop.
Consider I have the following information in my txt file:
name|age
orochiBrabo|25
NarutoBoy|98
You can read it using a Lookup activity, which I will call MyLookUp, and then connect its box to a ForEach box.
In the ForEach activity's settings tab, write @activity('MyLookUp').output.value, and now you can iterate over all rows in the file. Inside your ForEach you can refer to results like @item().age, @item().name, or @item().myColumnName.
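For the file above, @activity('MyLookUp').output.value comes back as roughly this array, which is what the ForEach iterates over:

[
    { "name": "orochiBrabo", "age": "25" },
    { "name": "NarutoBoy", "age": "98" }
]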
I'm calling an API using the HTTP connector inside an Until loop, getting an array of results back, so on every iteration I get some records in the results array.
Now I want to append all the records together so that I end up with all of them.
For example, the 1st time I got 2 records (below) and the 2nd time 1 record; I want to append them so there are 3 in total.
1st iteration result -
"results":[
{"id":"2","name":"t1"},{"id":"3","name":"t4"}
]
2nd iteration result -
"results":[
{"id":"66","name":"i7"}]
I want to append all data so that final result will be like -
[{"id":"2","name":"t1"},{"id":"3","name":"t4"}, {"id":"66","name":"i7"}]
Instead of a ForEach, I tried using the Append to array variable activity, but it throws the error below:
its a type of array need to be string to append.
I am able to achieve it using a ForEach, but it doesn't make sense to use a ForEach just to add values; if there is a way to append the arrays directly, that would be great.
You can use JS inline code to implement your requirement. I did some testing on my side: I posted two arrays (result1 and result2) to the logic app and composed them using JS.
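A minimal sketch of that Inline Code (JavaScript) action, assuming the two arrays arrive in the trigger body under the names result1 and result2 (illustrative names, not from the original post):

var r1 = workflowContext.trigger.outputs.body.result1;
var r2 = workflowContext.trigger.outputs.body.result2;
return r1.concat(r2);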
Please note that if you want to use this feature, you should create an integration account and associate it with your logic app in the "Workflow settings" blade.
The above solution works only if you have an integration account.
Another simple option: use the union function inside a Compose action to append two array collections:
union(variables('ResponseArray'),body('Response'))
Note that union also removes duplicate items from the combined collection, which may or may not be what you want here.
https://learn.microsoft.com/en-us/azure/logic-apps/workflow-definition-language-functions-reference#union