Azure Data Flow - Can we have dynamic columns or a change in projection for Unpivot functionality - azure

The Excel file consists of 62 columns: 7 columns are fixed and the rest are week columns for the year (week1 to week52).
I have used a data flow task to unpivot the 53 columns into rows with 2 extra columns, year and value.
The problem is that the 52 week column names keep changing on every weekly data load, and I don't know how to handle this change in column names in the data flow. For a single run it gives the exact output.

What you'll want to do here is implement late binding of your schema, or what ADF refers to as "schema drift". Instead of setting a hardened, early-binding schema in your Source projection, leave the dataset schema and projection empty.
Next, add a Derived Column after your source and call it "Projection". This is where you'll build your projection, using rules to account for your evolving schema.
Build out your canonical model with the column names for your entire year using byName('columnname'). That tells ADF to look for the existence of the column named in single quotes in your source data, while also providing a schema that you can use to build out your pivot table.
If you need to cast the values, wrap byName() inside a casting function, e.g. toString(), toDate(), etc.
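For example, the "Projection" derived column for this case might define the canonical week columns along these lines (a sketch: the column names and the toInteger() cast are illustrative, and the middle weeks are elided; byName() returns NULL when a named column is absent from that run's file):

Week1  = toInteger(byName('Week1'))
Week2  = toInteger(byName('Week2'))
...
Week52 = toInteger(byName('Week52'))

With the 52 canonical names fixed in this transformation, the downstream Unpivot always sees the same schema, whatever the incoming headers were.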

Related

How to change the types of multiple columns in Data Factory

I have a table with hundreds of columns.
This table will be modified in a pipeline in Data Factory.
One of these modifications is changing column types from the original type to the desired type.
For example: the table has 50+ columns of type 'decimal' but I need to change them to 'integer'. All of these columns have a name starting with 'CD_'.
How can I do that in Data Factory's Data Flow?
I'm trying to use Column Patterns in the Derived Column transformation. I created the following pattern (see picture attached).
My Column Pattern
But when I update the Data Visualization tab, I get the following error:
DF-EXPR-021 at Derive 'ChangeTypes'(Line 589/Col 30): Return types do not match
Thank you very much.
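For reference, a column pattern has three parts: a matching condition over the incoming columns' metadata (name, type, etc.), a name expression, and a value expression, where $$ stands for the matched column. A sketch of a pattern for this case (the condition is an assumption based on the CD_ prefix described above):

Matching condition:  type == 'decimal' && startsWith(name, 'CD_')
Name expression:     $$
Value expression:    toInteger($$)

Constraining the match by type as well as by name keeps every matched expression resolving to the same return type; an expression whose matches or branches resolve to different types is one common cause of the "Return types do not match" error.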

Azure Data Factory - Exists transformation in Data Flow with generic dataset

I'm having issues using the Exists Transformation within a Data Flow with a generic dataset.
I have two sources (one from the staging table "sourceStg", one from the DWH table "sourceDwh") and want to check whether the UniqueIdentifier column of the staging table exists in the UniqueIdentifier column of the DWH table. For that I have a generic dataset which I query with a SQL statement containing parameters.
When I open the "Exists settings" I cannot choose any column from the source in the conditions, since the source is generic and has no projection until I run the data flow. However, I have a parameter from the parent pipeline which provides the name of the column containing the UniqueIdentifier (the column names in staging and DWH are the same).
I tried to add the statement "byName($UniqueIdentifier)" in the left and right column fields, but the engine resolves both to the sourceStg column, since the prefix of the source transformation is missing and it defaults to the first one. What I am basically trying to achieve is a statement like the following, defining the correct source transformation and the column containing the unique identifier with a parameter:
exists(sourceStg#$UniqueIdentifier == sourceDwh#$UniqueIdentifier)
But either the expression cannot be parsed, or the result does not retrieve the actual UniqueIdentifier value from the column but writes the statement itself (e.g. sourceStg#$UniqueIdentifier) as the column value.
The only workaround I have found so far is two Derived Columns: one adds a suffix to the UniqueIdentifier column in one source, and a new parameter $UniqueIdentifierDwh is populated with the value of $UniqueIdentifier plus the same suffix used in the derived column.
Any Azure Data Factory experts out there to help?
Thanks in advance!
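For what it's worth, one way to express that workaround as data flow expressions (the fixed names stgKey and dwhKey are illustrative):

Derived Column on sourceStg:  stgKey = toString(byName($UniqueIdentifier))
Derived Column on sourceDwh:  dwhKey = toString(byName($UniqueIdentifier))
Exists condition:             stgKey == dwhKey

Because each Derived Column is evaluated inside its own stream, byName() resolves against the right source there, and the Exists condition then compares two unambiguous fixed names.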

Implementing SCD Type 2 as a generalized stored procedure in Azure Data Warehouse for all dimensions?

I am new to Azure and I am working on Azure Data Warehouse. I have loaded a few dimensions and staging tables. I want to implement SCD Type 2 as a generalized procedure for all the updates, using HASHBYTES. Since ADW doesn't support MERGE, I am trying to implement this with plain INSERT and UPDATE statements and StartDate/EndDate columns. But the schemas of the dimension tables are not exactly the same as those of the staging tables; there are a few columns that are not considered.
Initially I thought I would pass in the staging and dimension tables as parameters, fetch the schema from the system catalog, create a temp table, load the necessary columns from staging, compute a hash with HASHBYTES, and compare the hashes between the temp table and the dimension. But is this a good approach?
PS: One more problem is that sometimes the column names are mapped differently, like branchid as branch_id. How do I fetch the columns for these? Note that this is just one case; this could be the case in many tables as well.
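As a concrete starting point, here is a minimal sketch of the expire-and-insert pattern for a single dimension, assuming illustrative tables dbo.StgCustomer and dbo.DimCustomer with business key CustomerId, tracked attributes Name and City, and a persisted RowHash column:

-- Illustrative names throughout: DimCustomer / StgCustomer, key CustomerId
-- Step 1: expire the current row when the attribute hash no longer matches staging
UPDATE dbo.DimCustomer
SET    EndDate = GETDATE(),
       IsCurrent = 0
WHERE  IsCurrent = 1
AND EXISTS (
    SELECT 1
    FROM dbo.StgCustomer s
    WHERE s.CustomerId = dbo.DimCustomer.CustomerId
    AND HASHBYTES('SHA2_256', CONCAT(s.Name, '|', s.City)) <> dbo.DimCustomer.RowHash
);

-- Step 2: insert a new current row for brand-new keys and for keys expired in step 1
INSERT INTO dbo.DimCustomer (CustomerId, Name, City, RowHash, StartDate, EndDate, IsCurrent)
SELECT s.CustomerId, s.Name, s.City,
       HASHBYTES('SHA2_256', CONCAT(s.Name, '|', s.City)),
       GETDATE(), NULL, 1
FROM dbo.StgCustomer s
LEFT JOIN dbo.DimCustomer d
       ON d.CustomerId = s.CustomerId
      AND d.IsCurrent = 1
WHERE d.CustomerId IS NULL;

To generalize this into one procedure, the two statements would be generated as dynamic SQL from sys.columns for the table names passed in, which is essentially the temp-table-plus-hash approach you describe; for mismatched names like branchid vs branch_id, a small mapping table (staging column to dimension column) consulted during the generation step is a common answer.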

SSAS Calculated member that knows if the user is using the report filter

I am trying to write a calculated member which acts differently depending on whether the user is filtering by that member or has it dragged down as rows or columns on their pivot table (using Excel).
The rules are:
1. If the user is using the date dimension as a Report Filter in Excel, then the calculated member should get the maximum date out of all the dates they are filtering by.
2. If they have the date dimension as rows on the pivot table, then I need to apply ClosingPeriod and some other logic.
Please try this. The idea came from here.
Basically, the dynamic named set represents what's in the report filters. And the EXISTING keyword trims the list of days down to the filter context of the current cell, letting you detect, say, whether one month is on rows. Compare the counts and you can detect what the user did.
CREATE HIDDEN DYNAMIC SET CURRENTCUBE.SelectedDays AS
    // Dynamic: re-evaluated per query, so it reflects the report filters
    [Date].[Date].[Date].Members;

CREATE MEMBER CURRENTCUBE.[Measures].[My Calc] AS
    CASE
        // Cell context is narrower than the report filter: dates are on rows/columns
        WHEN SelectedDays.Count > {EXISTING [Date].[Date].[Date].Members}.Count
            THEN Tail({EXISTING [Date].[Date].[Date].Members}, 1).Item(0).Item(0).Name
        // Report filter trims the full day list: take the last filtered day
        WHEN SelectedDays.Count < [Date].[Date].[Date].Members.Count
            THEN Tail(SelectedDays, 1).Item(0).Item(0).Name
    END;
Performance is not going to be good. And I suspect users will be confused by the results of your calc. If you want to describe the business scenario in more detail, I can maybe recommend a better approach.

How to get value from nested relations in Power Pivot?

I'm using the Power Pivot add-in to create a data warehouse for generating dynamic tables and graphs (the data source is strictly Excel), but I have a problem with a calculation across the relationships. My data model is the following:
My Snowflake data warehouse model
So for the fact table "fSales" I need to multiply dCostDetail[Value] by dWorkCost[Value] to generate the fSales[Expenses] amount.
I tried to use RELATED, but I get an error: it doesn't allow nesting across the relationships, e.g. fSales[Expenses] = related(dCostDetail[Value])*related(dWorkCost[Value])
Also I tried to use the next formula:
fSales[Expenses] = related(dWorkCost[Value]) * Calculate(Calculate(Calculate(Value(dCostDetail[Value]), Userelationship(fSales[IdProduct],dProduct[Sku]),Userelationships(dProduct[IdCateg],dCategory[IdCategory]), Userelationships(dCategory[IdCategory],dCostDetail[IdCateg]))))
And I need this "type" of normalized model to keep the details when I analyze the information, e.g. filtering, but if you know another way to generate the calculation, that would be fine.
RELATED doesn't work in measures, because it evaluates on a record-by-record level. So you're on the right track, but what you need to do is create a calculated column in Power Pivot in the fSales table called "CostDetail" or whatever, and use a RELATED formula there to pull in that value from the dCostDetail table. Create another column and do the same thing to pull the dWorkCost value into the fSales table.
Then you can do a measure for the expenses like this:
Expenses:=SUM([whatever you called CostDetailColumn])*SUM([whatever you called WorkCostColumn])
You should be able to drop that measure into a pivot and it should do what you're looking for.
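In DAX terms, the two calculated columns and the measure would look roughly like this (a sketch; the column names CostDetailValue and WorkCostValue are assumptions):

// Calculated columns in fSales -- RELATED works here because columns have row context
fSales[CostDetailValue] = RELATED(dCostDetail[Value])
fSales[WorkCostValue]   = RELATED(dWorkCost[Value])

// Measure built on the two columns
Expenses := SUM(fSales[CostDetailValue]) * SUM(fSales[WorkCostValue])

Note that if Expenses should be the sum of per-row products rather than the product of two totals, SUMX(fSales, fSales[CostDetailValue] * fSales[WorkCostValue]) is the row-by-row alternative.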
