How to modify dynamic complex data type fields in Azure Data Factory data flow

I have a complex data type (fraudData) that undesirably has hyphen characters in its field names. I need to remove the hyphens or change them to some other character.
The input schema of the complex object looks like:
I have tried using the "Select" and "Derived Column" data flow transformations and adding a custom mapping. It seems both have the same mapping interface. My current attempt with Select is:
This gets me close to the desired result. I can use the replace expression to convert hyphens to underscores.
The problem here is that this mapping creates new root level columns outside of the fraudData structure. I would like to preserve the hierarchy of the fraudData structure and modify the column names in place.
If I am unable to modify fraudData in place, is there any way I can take the new columns and merge them into another complex data type?
Update: I do not know the fields of the complex data type in advance; this is a schema drift problem, which is why I have tried the pattern matching approach. I will not be able to hardcode known sub-column names.

You can rename the sub-columns of a complex data type using the Derived Column transformation and rebuild them as a complex data type again. I tried this with sample data; the approach is below.
A sample complex data type column with two sub-fields is used, as shown in the image below.
img:1 source data preview
In the Derived Column transformation, the expression for the column fraudData is given as
#(fraudData_1_chn=fraudData.{fraudData-1-chn},
fraudData_2_chn=fraudData.{fraudData-2-chn})
img:2 Derived column settings
This expression renames the subfields and nests them under the parent column fraudData.
img:3 Transformed data- Fields are renamed.
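Since the data previews above are images, here is a rough sketch of the schema before and after this step, using only the field names from the sample:

Before: fraudData { fraudData-1-chn, fraudData-2-chn }
After:  fraudData { fraudData_1_chn, fraudData_2_chn }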
Update: To rename sub-columns dynamically
You can use the expression below to rename all the fields under the root column fraudData.
#(each(fraudData, match(true()), replace($$,'-','_') = $$))
This will replace - with _ in any field name that contains it.
You can also use a pattern match in the expression.
#(each(fraudData, patternMatch(`fraudData-.+` ), replace($$,'-','_') = $$))
This expression will take fields matching the pattern fraudData-.+ and replace - with _ in those fields only.
Reference:
Microsoft document on script for hierarchical definition in data flow.
Microsoft document on building schemas using derived column transformation.

Related

Dynamic column masking in Azure with derived column

I am building a data flow within Azure Data Factory and I would like to apply some GDPR masking rules within the flow.
What I would like to do is the following: in the Derived Column (or another component) I would like to match my input columns against a reference array and, for the columns that match, replace/mask those values.
PowerPoint of the data flow and what I would like to do
I have tried some IN and regex functions but have not gotten it to work yet. Does anyone know how, and whether, this is possible?
Update: I might have got somewhere with the SELECT component. However, there's something that I don't quite get:
Let's say that I have a data flow parameter called ColumnsToMask of the type string[]. I define the variable content as ['a', 'b']. (a and b are two of my input columns.)
In the SELECT component I add a rule-based mapping: in($ColumnsToMask, name)
That doesn't work for some reason. However, this works:
in(['a', 'b'], name)
(By work I mean that I get the matching columns added to my output.)
Anyone knows what I am doing wrong setting my parameter?
Update 2.5
Changed the text to a picture in an effort to hopefully explain it a bit better:
How come the evaluated expression works but not the expression itself?
When I use the evaluated expression, everything works as I would like it to, but when I use the parameter that holds the value, it for some reason does not work. What should I change?
Your parameter definition would look something like this:
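The screenshot is not reproduced here, so as a hedged sketch of what a working setup could look like (the column names a and b come from the question; the point being illustrated is that the default value is entered as an array expression, not as a quoted string):

Parameters tab of the data flow:
    Name:           ColumnsToMask
    Type:           string[]
    Default value:  ['a', 'b']

Select rule-based mapping:
    Matches:  in($ColumnsToMask, name)
    Name as:  $$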

Look Up not working as expected in SSRS expression

I have 2 datasets in the form of lists (SharePoint) in my RDL in Visual Studio 2012.
BranchCode is the common column in both of my datasets. I have one tablix in my report where I am writing an expression to look up BranchCode from dataset1 against BranchCode of dataset2; on a match I want it to retrieve the corresponding BranchCost value from dataset2.
I am able to write the lookup expression, but the final output is just a blank value. Can somebody please help me out with this?
I always recommend casting your datatypes in expressions.
So what you should have is something like this:
=LOOKUP(Fields!BranchCode.Value, Fields!BranchCode.Value, Fields!BranchCost.Value, "DataSet2")
You would use the VB.NET functions to cast your values to the same type. Common examples are CSTR() for string, CINT() for integer, and CDEC() for decimal.
=LOOKUP(CSTR(Fields!BranchCode.Value), CSTR(Fields!BranchCode.Value), Fields!BranchCost.Value, "DataSet2")
If it is a string you could also wrap it in the RTRIM() function to make sure there are no trailing spaces.
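Putting the casting and trimming suggestions together, the expression could look like this (a sketch; adjust the dataset name to match your report):
=LOOKUP(RTRIM(CSTR(Fields!BranchCode.Value)), RTRIM(CSTR(Fields!BranchCode.Value)), Fields!BranchCost.Value, "DataSet2")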
If you still have issues I recommend outputting the data in both DataSets into tables in the report. Run the report and inspect the data to ensure the DataSets contain the expected data. I also like to add special characters around strings in the table such as # so you can easily identify any leading or trailing spaces.

Excel Dependent Drop Down List with Repetitive Values

I have some data stored in a hierarchical way like this:
And I want to create three drop-down lists where you can select the Product within the Category based on the Store, something like this:
The tricky thing here is that a product (e.g. Frozen Pizza) can be found in both stores while others (Lays) can only be found in one store.
How can I achieve this, or how can I store the data in such a way that I can get the same result?
I've tried things like named ranges with the data stored in a table-like structure and with =INDIRECT (but that won't work because of illegal characters like spaces, symbols, etc. in the named range). I am looking for a formula, not VBA.
I think you were on the right track. If not using VBA, I would use data stored in a table with named ranges and the INDIRECT formula.
That approach would be arduous as you would have to build out each list in its own range (e.g. products in category 1 of store 1, products in category 2 of store 1, etc.).
Also, as you mentioned, the named ranges are strict, so you would need to convert spaces and symbols to _ or omit them completely. You could consider using numeric IDs in the drop down lists instead of text, but the user would need to know what the IDs represent. You could then translate the IDs back to text using a lookup table once selected.
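As a sketch of that workaround (assuming the Store drop-down lives in cell A2 and the named ranges were created with underscores in place of spaces), the data-validation source for the dependent list could be:
=INDIRECT(SUBSTITUTE(A2," ","_"))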
VBA would certainly provide a better solution.

How to eval a field name contained in another field in an Access Query?

I need to create a long list of complex strings, containing the data of different fields in different places to create explanatory reports.
The only way I have conceived of, in Access 2010, is to save text parts in a table together with the field names used to compose the string to be shown (see the line1 expression in the figure). Briefly:
//field A contains a string with a field name:
A = "[Quantity]"
//query expression:
=EVAL(A)
//returns an error instead of the number contained in field [Quantity], present in the query dataset
I thought of doing an EVAL on a field (A) to obtain the value of the field (B) whose name is contained in field A, but it does not seem to work.
Does any way exist?
Example (very simplified):
Sample query that EVALs a field containing other field names to obtain the value of those fields
Any idea?
PS: Sorry for my English, it is not my mother tongue.
I found an interesting workaround in another forum.
Other people had the same problem using EVAL, but found that it is possible to substitute a field's contents into a string using the REPLACE function.
REPLACE("The value of field Quantity is {Quantity}";"{Quantity}";[Quantity])
({} are used only for clarity; they are not needed if one knows that the words to be substituted do not otherwise appear in the string). Using this code in a query, and nesting as many REPLACEs as there are different fields one wants to use:
REPLACE(REPLACE("<Salutation> <Name>";"<Salutation>";[Salutation]);"<Name>";[Name])
it is possible to embed field names in a string and substitute them with the current value of that field in a query. Of course the latter example can be done more simply with concatenation (&), but if the string is contained in a field instead of being hardcoded, it can be linked to records as needed.
REPLACE(REPLACE([DescriptiveString];"[Salutation]";[Salutation]);"[Name]";[Name])
Moreover, it is possible to create complex, context-based strings such as:
REPLACE(REPLACE(REPLACE("{Salutation} {Name} {MaidenName}";"{Salutation}";[Salutation]);"{Name}";[Name]);"{MaidenName}";IIF(Isnull([MaidenName]);"";[MaidenName]))
The hard part is enumerating all the field placeholders one wants to insert in the string (like {Quantity}, {Salutation}, {Name}, {MaidenName}) in the REPLACE calls, whereas with EVAL one could avoid this tedious part, if only it worked.
Not as neat as I would like, but it works.

Correlations/Data Mining in Microsoft Excel 2003

I have an Excel spreadsheet where each column is a certain variable. At the end of my columns I have a special last column called "Type" which can be A, B, C, or D.
Each row is a data point with different variables that ends up in a certain "Type" bucket (A/B/C/D) recorded in the last column.
I need a way to examine all entries of a certain type (say, "C" or "C"|"D") and find out which of the variables are good predictors of this last column, and which are better predictors than others.
Some variables are numbers, others are fixed strings (from a set of strings), so it's not just a number/number correlation.
Is Excel 2003 a good tool for that, or are there better statistical programs that make this easier? Do I create a Pivot/Histogram for each category, or is there a better way to run these queries? Thanks
You can do some filtering, especially to clean the data (I mean, to convert the data values into one type, string or numeric), using Microsoft Excel. Excel also offers some data mining. However, for the kind of problem you have, a good tool I recommend is WEKA. Using this tool, you can do associative classification prediction (i.e., class association rule mining) over all data instances (rows) and thereby determine which variables predict whether a row belongs to A/B/C/D. Your special attribute will be your class attribute.
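For illustration, WEKA reads data in ARFF (or CSV) format; the Type column would be declared as a nominal attribute and selected as the class. The attribute names and values below are placeholders, not taken from the question:

@relation typed_rows
@attribute Var1 numeric
@attribute Var2 {red,green,blue}
@attribute Type {A,B,C,D}
@data
3.5,red,A
1.2,blue,C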
