Azure Data Factory: copy Lookup activity output to JSON in Blob storage

In ADF, below is the output of my Lookup activity (this is the header of a flat file from SFTP):
{
    "firstRow": {
        "Prop_0": "000",
        "Prop_1": "IN",
        "Prop_2": "12123",
        "Prop_3": "XYZ_ABC",
        "Prop_4": "20211011",
        "Prop_5": "034255",
        "Prop_6": "272023"
    }
}
Can someone help me with an approach to transform this into a JSON file with custom field names instead of Prop_x, and save it to Blob storage?

You can simply leverage the additional columns feature of a Copy activity.
Follow your Lookup activity with a Copy activity:
In the source settings of the Copy activity, add the new column names (i.e. the ones you expect in the JSON). Here I used p0, p1, ...
Taking p0 as an example, simply put @activity('Lookup1').output.firstRow.Prop_0 in the dynamic content.
Then, in the Mapping tab, map to the target columns of your JSON file (assuming you have already imported the schema of your target JSON).
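As a rough sketch, the Copy activity source could look like this in the pipeline JSON (the Lookup name Lookup1 and column names p0, p1 are assumptions; the remaining columns follow the same pattern):

```json
"source": {
    "type": "DelimitedTextSource",
    "additionalColumns": [
        {
            "name": "p0",
            "value": {
                "value": "@activity('Lookup1').output.firstRow.Prop_0",
                "type": "Expression"
            }
        },
        {
            "name": "p1",
            "value": {
                "value": "@activity('Lookup1').output.firstRow.Prop_1",
                "type": "Expression"
            }
        }
    ]
}
```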

Related

Azure Data Factory Pipeline - Store single-value source query output as a variable to then use in Copy Data activity

I am looking to implement an incremental table loading pipeline in ADF. I want to execute a query to get the latest timestamp from the table in an Azure SQL database. Then, store this value as a variable in ADF so I can then reference it in the "Source" query of a Copy Data activity.
The goal is to only request data from an API with a timestamp greater than the latest timestamp in the SQL table.
Is this functionality possible within ADF pipelines, or do I need to look at Azure Functions or Data Flows?
This is definitely possible with Data Factory. You could use the Lookup Activity or a Stored Procedure, but the team just released the new Script Activity:
This will return results like so:
{
    "resultSetCount": 1,
    "recordsAffected": 0,
    "resultSets": [
        {
            "rowCount": 1,
            "rows": [
                {
                    "MaxDate": "2018-03-20"
                }
            ]
        }
    ],
    ...
}
Here is the expression to read this into a variable:
@activity('Script1').output.resultSets[0].rows[0].MaxDate
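As a sketch, a Set Variable activity following the Script activity could hold this value (the variable name MaxDate and activity names are assumptions):

```json
{
    "name": "SetMaxDate",
    "type": "SetVariable",
    "dependsOn": [
        { "activity": "Script1", "dependencyConditions": [ "Succeeded" ] }
    ],
    "typeProperties": {
        "variableName": "MaxDate",
        "value": {
            "value": "@activity('Script1').output.resultSets[0].rows[0].MaxDate",
            "type": "Expression"
        }
    }
}
```

You can then reference @variables('MaxDate') in the Copy activity's source query.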

Azure Stream Analytics: Regex in Reference Data

I have an Azure Stream Analytics job that uses an EventHub and a Reference data in Blob storage as 2 inputs. The reference data is CSV that looks something like this:
REGEX_PATTERN,FRIENDLY_NAME
115[1-2]{1}9,Name 1
115[3-9]{1}9,Name 2
I then need to lookup an attribute in the incoming event in EventHub against this CSV to get the
FRIENDLY_NAME.
The typical way of using reference data is a JOIN clause. But in this case I cannot use it, because such regex matching is not supported with the LIKE operator.
UDF is another option, but I cannot seem to find a way of using reference data as a CSV inside the function.
Is there any other way of doing this in an Azure Stream Analytics job?
As far as I know, JOIN is not supported in your scenario: the join key has to be a specific value, it can't be a regex.
Thus, reference data is not suitable here, because it has to be used in ASA SQL like this:
SELECT I1.EntryTime, I1.LicensePlate, I1.TollId, R.RegistrationId
FROM Input1 I1 TIMESTAMP BY EntryTime
JOIN Registration R
ON I1.LicensePlate = R.LicensePlate
WHERE R.Expired = '1'
A concrete join key is needed; what I mean is that the reference data input does not help here.
Your idea of loading the data in a UDF and comparing it against hard-coded regex data is not easy to maintain. Maybe you could consider my workaround:
1. You said you have different reference data; group them and store each group as a JSON array, assigning one group id to every group. For example, group id 1:
[
{
"REGEX":"115[1-2]{1}9",
"FRIENDLY_NAME":"Name 1"
},
{
"REGEX":"115[3-9]{1}9",
"FRIENDLY_NAME":"Name 2"
}
]
....
2. Add a column referring to the group id and set an Azure Function as the output of your ASA SQL. Inside the Azure Function, accept the group id column and load the corresponding group's JSON array, then loop through the rows to match the regex and save the data to its destination.
I think an Azure Function is more flexible than a UDF in an ASA job. Additionally, this solution may be easier to maintain.
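The matching logic inside such an Azure Function could be sketched like this (a hypothetical Python example; in practice the reference group would be loaded from Blob storage by group id rather than hard-coded):

```python
import re

# Hypothetical reference data, keyed by group id; in the real function this
# would be loaded from the JSON array stored in Blob storage.
REFERENCE_GROUPS = {
    1: [
        {"REGEX": r"115[1-2]{1}9", "FRIENDLY_NAME": "Name 1"},
        {"REGEX": r"115[3-9]{1}9", "FRIENDLY_NAME": "Name 2"},
    ]
}

def lookup_friendly_name(group_id, value):
    """Return the FRIENDLY_NAME of the first pattern that fully matches value."""
    for entry in REFERENCE_GROUPS.get(group_id, []):
        if re.fullmatch(entry["REGEX"], value):
            return entry["FRIENDLY_NAME"]
    return None  # no pattern in the group matched
```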

How to set up output path while copying data from Azure Cosmos DB to ADLS Gen 2 via Azure Data Factory

I have a cosmos DB collection in the following format:
{
"deviceid": "xxx",
"partitionKey": "key1",
.....
"_ts": 1544583745
}
I'm using Azure Data Factory to copy data from Cosmos DB to ADLS Gen 2. If I copy using a copy activity, it is quite straightforward. However, my main concern is the output path in ADLS Gen 2. Our requirements state that we need to have the output path in a specific format. Here is a sample of the requirement:
outerfolder/version/code/deviceid/year/month/day
Now, since deviceid, year, month, and day are all in the payload itself, I can't find a way to use them except to create a Lookup activity and use its output in the Copy activity.
And this is how I set the output folder using the dataset property:
I'm using SQL API on Cosmos DB to query the data.
Is there a better way I can achieve this?
I think your way works, but it's not the cleanest. What I'd do is create a separate pipeline variable for each part: version, code, deviceid, etc. Then, after the Lookup, assign the variables, and finally do the Copy activity referencing the pipeline variables.
It may look kind of redundant, but think of someone (or you, two years from now) having to modify the pipeline when you are not around (or have forgotten): this way makes it clear how it works and what should be modified.
Hope this helped!!
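To make the path logic concrete, here is a hedged sketch in Python of how the folder parts could be derived from a Cosmos document, with year/month/day taken from the epoch-seconds `_ts` field (the version and code values are assumptions):

```python
from datetime import datetime, timezone

def build_output_path(doc, version, code):
    """Derive outerfolder/version/code/deviceid/year/month/day from a Cosmos document."""
    # Cosmos DB's _ts is a Unix timestamp in seconds (UTC).
    ts = datetime.fromtimestamp(doc["_ts"], tz=timezone.utc)
    return "/".join([
        "outerfolder", version, code, doc["deviceid"],
        f"{ts.year:04d}", f"{ts.month:02d}", f"{ts.day:02d}",
    ])
```

The same decomposition can be done in ADF dynamic content with @concat() over the Lookup output, which is what the variable-per-part approach above makes readable.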

Azure Data factory copy activity failed mapping strings (from csv) to Azure SQL table sink uniqueidentifier field

I have an Azure Data Factory (ADF) pipeline that consists of a Copy activity. The Copy activity uses the HTTP connector as source to invoke a REST endpoint, which returns a CSV stream that is sunk into an Azure SQL Database table.
The Copy fails when the CSV contains strings (such as 40f52caf-e616-4321-8ea3-12ea3cbc54e9) that are mapped to a uniqueidentifier field in the target table, with the error message: The given value of type String from the data source cannot be converted to type uniqueidentifier of the specified target column.
I have tried wrapping the source string with braces, e.g. {40f52caf-e616-4321-8ea3-12ea3cbc54e9}, with no success.
The Copy activity works if I change the target table field from uniqueidentifier to nvarchar(100).
I reproduced your issue on my side.
The reason is that the data types of the source and sink don't match; you can check the data type mapping for SQL Server.
Your source data type is string, which maps to nvarchar or varchar, while a uniqueidentifier column in a SQL database needs the GUID type in Azure Data Factory.
So, as a workaround, please configure a SQL Server stored procedure in your SQL sink.
Please follow the steps from this doc:
Step 1: Configure your Sink dataset:
Step 2: Configure Sink section in copy activity as follows:
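As a sketch, the Sink section from step 2 might look like this in the Copy activity JSON (the exact sink type depends on your connector; the names match the table type and stored procedure defined in the following steps):

```json
"sink": {
    "type": "AzureSqlSink",
    "sqlWriterStoredProcedureName": "[dbo].[convertCsv]",
    "sqlWriterTableType": "CsvType"
}
```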
Step 3: In your database, define the table type with the same name as sqlWriterTableType. Notice that the schema of the table type should be the same as the schema returned by your input data.
CREATE TYPE [dbo].[CsvType] AS TABLE(
[ID] [varchar](256) NOT NULL
)
Step 4: In your database, define the stored procedure with the same name as SqlWriterStoredProcedureName. It handles input data from your specified source, and merge into the output table. Notice that the parameter name of the stored procedure should be the same as the "tableName" defined in dataset.
CREATE PROCEDURE convertCsv @ctest [dbo].[CsvType] READONLY
AS
BEGIN
MERGE [dbo].[adf] AS target
USING @ctest AS source
ON (1=1)
WHEN NOT MATCHED THEN
INSERT (id)
VALUES (convert(uniqueidentifier,source.ID));
END
Output:
Hope it helps. Any concerns, please feel free to let me know.
There is a way to fix guid conversion into uniqueidentifier SQL column type properly via JSON configuration.
Edit the Copy Activity via Code {} button in top right toolbar.
Put:
"translator": {
"type": "TabularTranslator",
"typeConversion": true
}
into the typeProperties block of the Copy activity. This will also work if the mapping schema is unspecified/dynamic.

How to output a value from ADF .NET Custom Activity

I have an ADF that takes a Dataset input from Azure Data Lake Storage, it then has a pipeline with a custom .NET activity. The activity moves files from their Import folder into a custom folder location ready for processing and then deletes the original file.
I want to be able to pass the custom folder location back out into the activities pipeline so that I can give it to the next activity.
Is there a way of outputting the custom folder string to the activities pipeline?
Thank you,
We have an improvement item so that the output of one custom activity can be the next custom activity's input. It's pending deployment; you can have a try by the end of this month :)
For how to use this feature:
Update the code of Activity1:
Execute(...)
{
    return new Dictionary<string, string> { { "Foo", "Value" } };
}
Update the pipeline JSON of Activity2:
"extendedProperties": { "ValueOfFoo": "$$Foo" }
If you want to use a custom activity, as the title of the question suggests, this is possible using the activity's execution outputs:
Reference: https://learn.microsoft.com/en-us/azure/data-factory/transform-data-using-custom-activity#retrieve-execution-outputs
@activity('MyCustomActivity').output.outputs[0]
You can also consume the output in another activity as described here:
Reference: https://learn.microsoft.com/en-us/azure/data-factory/transform-data-using-custom-activity#pass-outputs-to-another-activity
You can send custom values from your code in a Custom Activity back to the service by writing them into outputs.json from your application. The service copies the content of outputs.json and appends it into the activity output as the value of the customOutput property. (The size limit is 2 MB.) If you want to consume the content of outputs.json in downstream activities, you can get the value by using the expression
@activity('<MyCustomActivity>').output.customOutput
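A minimal sketch of what the custom activity's code might write (the key customFolderLocation is a hypothetical name; ADF only prescribes the outputs.json file name):

```python
import json

def build_custom_output(folder_location):
    """Payload that ADF copies from outputs.json into the activity's customOutput."""
    return {"customFolderLocation": folder_location}

def write_custom_output(folder_location, path="outputs.json"):
    # ADF picks this file up from the activity's working directory.
    with open(path, "w") as f:
        json.dump(build_custom_output(folder_location), f)
```

A downstream activity could then read @activity('MyCustomActivity').output.customOutput.customFolderLocation.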
