How do you filter for a string (not) containing a substring in an Azure Data Factory data flow expression? - azure

As someone with a background in Alteryx, it has been a slow process to get up to speed with the expressions and syntax within Azure Data Factory data flows. I am trying to filter out rows containing the following string in a similar manner to this Alteryx filter code below:
!Contains([Subtype], "News")
After scrolling through all the string expressions in Azure Data Factory, I am struggling to find anything similar to the logic above. Thanks in advance for any help you can provide me on this front!

You can use the Filter transformation in an ADF data flow and give a condition on any column like below:
My Sample Data:
Here I am filtering out the rows that contain the string "Rakesh" in the Name column with the data flow expression instr(Name, "Rakesh") == 0.
instr() returns the 1-based position of the substring within the string, or 0 if the substring is not found. So the filter condition is satisfied (the row is kept) when the result is 0, i.e. when the column does not contain the substring.
Filter Transformation:
Output in Data preview of filter:
You can see the remaining rows only in the result.
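The same logic can be sketched in plain Python (the sample rows below are hypothetical, mirroring the Name column in the answer): an instr-style lookup returns 0 exactly for the rows the filter keeps.

```python
# Mimics ADF Data Flow's instr(): 1-based position of the substring,
# or 0 when it is absent (Python's str.find() is 0-based, -1 on miss).
def instr(string, substring):
    return string.find(substring) + 1

# Hypothetical sample rows, mirroring the answer's Name column
rows = [{"Name": "Rakesh"}, {"Name": "John"}, {"Name": "Rakesh Kumar"}]

# The filter condition instr(Name, "Rakesh") == 0 keeps only the rows
# that do NOT contain the substring
kept = [r for r in rows if instr(r["Name"], "Rakesh") == 0]
```
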

Related

Azure Data Flow Flatten and Parsing key/value column

I'm trying to transform a key/value data into column using Azure Data Flow. Basically this:
{"key":"rate1","value":"123"}-{"key":"rate2","value":"456"}
into this:
key    value
rate1  123
rate2  456
I was following this example here ( Flatten and Parsing Json using Azure Data Flow ), and everything looked good until I tried to use parse.
The output just shows the value column, not the key. I don't know why. Below are my dataflow settings.
Source query: https://i.stack.imgur.com/6Q8Xb.png
Source Data preview: https://i.stack.imgur.com/UNj8x.png
Derived Column: https://i.stack.imgur.com/C0g1N.png
Derived Column Data preview: https://i.stack.imgur.com/vtVY7.png
Flatten: https://i.stack.imgur.com/Bkp7P.png
Flatten Data preview: https://i.stack.imgur.com/yM6h1.png
Parse: https://i.stack.imgur.com/RUJpr.png
Parse Data preview: https://i.stack.imgur.com/RC42Y.png
Anyone have any idea what I'm missing?
Edit: My source is Snowflake
Thanks in advance!
I reproduced the above and got the same result after the parse transformation.
The process above is correct; the preview may just not be displaying it properly. You can view the desired result as individual columns by using a Derived Column transformation after the parse.
In the sink, select the desired columns via Mapping->deselect auto mapping->+->Fixed mapping.
Sink Data preview:
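Outside ADF, the same key/value string can be pulled apart in a few lines of Python (a sketch only, assuming the "-" separator between JSON objects shown in the question):

```python
import json

# The source string from the question: JSON objects joined by "-"
raw = '{"key":"rate1","value":"123"}-{"key":"rate2","value":"456"}'

# Turn the "}-{" joins into one JSON document per line, then parse each
rows = [json.loads(part) for part in raw.replace("}-{", "}\n{").splitlines()]
```
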

Data Flow - Window Transformation - NTILE Expression

I'm attempting to assign quartiles to a numeric source data range as it transits a data flow.
I gather that this can be accomplished by using the ntile expression within a window transform.
I've tried to follow the documentation provided here, so far without success.
This is just a basic attempt to understand the implementation before using it for real application. I have a numeric value in my source dataset, and I want the values within the range to be spread across 4 buckets and defined as such.
Thanks in advance for any assistance with this.
In the Window transformation of a Data Flow, we can configure the settings, keeping the source's numeric column in the "Sort" tab as shown below:
Next, in the Window columns tab, create a new column and write the expression nTile(4) in order to create 4 buckets:
In the Data Preview we can see that the data is spread across 4 Buckets:
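For reference, nTile(n) behaves like SQL's NTILE: the rows are sorted and dealt into n buckets of as equal size as possible, with earlier buckets taking any remainder. A rough Python sketch of that behavior (not ADF's actual implementation):

```python
def ntile(values, n):
    """Assign each value a bucket number 1..n, NTILE-style, by sorted order."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    buckets = [0] * len(values)
    size, remainder = divmod(len(values), n)
    pos = 0
    for bucket in range(1, n + 1):
        # The first `remainder` buckets take one extra row each
        count = size + (1 if bucket <= remainder else 0)
        for _ in range(count):
            buckets[order[pos]] = bucket
            pos += 1
    return buckets
```
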

Azure Data Factory DataFlow exclude 1 column from expression columns()

I'm looking for a solution for the following problem.
I've created the following expression in a Derived Column in Azure Data Factory DataFlow
md5(concatWS("||", toString(columns())))
But I want to exclude one column from columns() in the above expression,
so something like this: md5(concatWS("||", toString(columns() - 'PrimaryKey'))). I cannot exclude the primary key column with a Select transformation in front of the Derived Column, because I need it in a later stage.
In Databricks I'm executing the following, and I want to achieve the same in ADF:
non_key_columns = [column for column in dfsourcechanges.columns if column not in key_columns]
Are there any suggestions on how I can solve this?
You can try the byNames() function for this. Create an array holding all your column names except 'PrimaryKey', then pass it to byNames() as the first parameter. Something like this expression:
md5(concatWS("||", toString(byNames(['yourColumn1','yourColumn2',...]))))
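The hashing step itself amounts to this Python sketch (the row and column names here are hypothetical): concatenate every non-key value with "||" and take the MD5 of the result.

```python
import hashlib

# Hypothetical row; PrimaryKey must not take part in the hash
row = {"PrimaryKey": 42, "colA": "x", "colB": "y"}

# Equivalent of concatWS("||", ...) over every column except the key
non_key_values = [str(v) for k, v in row.items() if k != "PrimaryKey"]
row_hash = hashlib.md5("||".join(non_key_values).encode("utf-8")).hexdigest()
```
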

Azure Data Factory - Exists transformation in Data Flow with generic dataset

I'm having issues using the Exists Transformation within a Data Flow with a generic dataset.
I have two sources (one from staging table "sourceStg", one from DWH table "sourceDwh") and want to compare if the UniqueIdentifier-Column in the staging table is existing in the UniqueIdentifier-Column in the DWH table. For that I have a generic data set which I query with a SQL statement containing parameters.
When I open the "Exists settings" I cannot choose any Column from the source in the conditions since the source is generic and has no Projection until I run the data flow. However, I have a parameter which I get from the parent pipeline which provides me the name of the Column containing the UniqueIdentifier (both column names in staging / DWH are the same).
I tried to add the statement byName($UniqueIdentifier) in both the left and right column fields, but the engine resolves both of them to the sourceStg column, since the prefix of the source transformation is missing and it defaults to the first one. What I am basically trying to achieve is a statement like the following, defining the correct source transformation and the column containing the unique identifier with a parameter:
exists(sourceStg#$UniqueIdentifier == sourceDwh#$UniqueIdentifier)
But either the expression cannot be parsed or the result does not retrieve the actual UniqueIdentifier value from the column but writes the statement (e.g. sourceStg#$UniqueIdentifier) as column value.
The only workaround I found so far is to have two derived columns that add a suffix to the UniqueIdentifier column in one source, plus a new parameter $UniqueIdentifierDwh, which is populated with the parameter $UniqueIdentifier and the same suffix as used in the derived column.
Any Azure Data Factory experts out there to help?
Thanks in advance!
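For clarity, the Exists transformation's semantics boil down to this sketch (the sample rows and the Id key are hypothetical; in the real flow the key name would come from the $UniqueIdentifier parameter):

```python
# Two hypothetical sources: staging rows and DWH rows
source_stg = [{"Id": 1}, {"Id": 2}, {"Id": 3}]
source_dwh = [{"Id": 2}, {"Id": 3}]

key = "Id"  # stands in for the $UniqueIdentifier pipeline parameter

# Exists: keep the staging rows whose key value occurs in the DWH source
dwh_keys = {row[key] for row in source_dwh}
existing = [row for row in source_stg if row[key] in dwh_keys]
```
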

Azure-Data-Factory - If Condition returns false despite being logically true

I'm trying to do a logical test to compare two activity outputs.
The first one gives back a file name (derived from a Get Metadata activity) and the other one gives the distinct file names that are already in the database (derived from a Lookup activity).
So the first activity gives X.csv (a file in a Blob), while the second one gives a list Y.csv; Z.csv (the result of the lookup SELECT DISTINCT from table X).
Based on this outcome I would say that the logical test is true, so ADF has to start a particular activity. I'm using the expression below, but despite the fact that there are no errors, the outcome is always false. What am I doing wrong? I guess it has something to do with the Lookup activity, because the query gives back a list of values, I think.
Please help, thanks in advance!
#equals(activity('GetBlobName').output,activity('LookupBestandsnaam').output)
Output activity LookupBestandsnaam:
Output activity GetBlobName:
The outputs of Lookup and Get Metadata are different:
The Lookup activity reads and returns the content of a configuration file or table.
The Get Metadata activity retrieves the metadata of any data in Azure Data Factory.
We can't compare the outputs directly; you will always get false in the If Condition expression.
Please try the below expression:
#equals(activity('GetBlobName').output.value.name,activity('LookupBestandsnaam').output.value.bestandsnaam)
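The shape mismatch is easy to see in a sketch (the field names below are assumptions about the two activities' JSON outputs): the Lookup returns an object wrapping a value array, so comparing it against the whole Get Metadata object can never succeed, while comparing the file name against the looked-up names can.

```python
# Assumed shapes of the two activity outputs
get_blob_output = {"itemName": "X.csv"}
lookup_output = {"count": 2,
                 "value": [{"bestandsnaam": "Y.csv"},
                           {"bestandsnaam": "Z.csv"}]}

# Comparing the raw outputs (the original expression) is always False
whole_outputs_equal = get_blob_output == lookup_output

# Compare the single file name against the looked-up names instead
names = [row["bestandsnaam"] for row in lookup_output["value"]]
already_loaded = get_blob_output["itemName"] in names
```

Here already_loaded is False because X.csv is not yet in the database, which is exactly the case that should trigger the follow-on activity.
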
Update:
Congratulations on solving it in another way:
"I have now replaced the if condition with a stored procedure that uses an IF exists script running on the basis of look-up activity in ADF."
