Populate missing data points in Azure Data Flow

We are working on building an ETL pipeline using Azure Data Flows.
Our requirement is to fill in the missing data points (adding rows as required), with the data for each added row copied from the previous available data point (when sorted on the key columns).
Example:
If the input data has missing periods, the output should contain additional rows for those gaps, with the values of the key columns (Name, Year and Period) copied from the previous available row.
Any idea how I can achieve this in Azure Data Flow?

You can use the mapLoop() function to generate the years + quarters in one column, then a flatten transformation to turn that array into a table of year+quarter rows, and then left outer join that table to the original table.
The resulting table will have nulls for the missing quarters. Then use the fill-down technique to fill in the values (note this only works for small data).
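As a rough sketch of the first two steps (a minimal illustration; the column names year and periods are assumptions, not from the original post), a Derived Column can build the quarter labels for each row's year:

periods : mapLoop(4, concat(toString(year), '-Q', toString(#index)))

mapLoop(4, ...) iterates #index from 1 to 4, so each row gets an array like ['2021-Q1', ..., '2021-Q4']. A flatten transformation unrolling by periods then yields one row per year+quarter, ready for the left outer join back to the original table.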

Related

Excel data tables: Multiple outputs with only one input column

I am trying to create a data table with multiple outputs across periods, but for the same scenarios.
Is it possible to create one without inserting an extra column between each output column to deliver the input for the data table (i.e., input column = index 50-110)?
See the picture of what I would usually mark to create the data table (it only covers one period/output, though). If I were to make the scenario for FY23, I would need to insert a column between FY22 and FY23 where I copy the index 50-110 again, which I would like to avoid.

Delete bottom two rows in Azure Data Flow

I would like to delete the bottom two rows of an Excel file in ADF, but I don't know how to do it.
The flow I am thinking of is to filter for, and then delete, the rows highlighted in yellow.
The file has over 40,000 rows of data and is updated once a month. (The number of rows changes with each update, so the condition must be specified with a function.)
The bottom two lines of the file contain spaces and asterisks.
Any help would be appreciated; I'm new to Azure and having trouble.
Add a Surrogate Key transformation to put a row number on each row. Add a New Branch to duplicate the stream, and in that new branch add an Aggregate transformation.
Use the aggregate to find the max() value of the surrogate key counter.
Then subtract 2 from that max and filter for just the rows up to that max-2.
Let me provide a more detailed answer here ... I think I can get it in without writing a separate blog.
The simplest way to filter out the final 2 rows is the pattern depicted in the screenshot. Instead of the new branch, I just created 2 sources, both pointing to the same data source. The 2nd stream is there just to get a row count and store it in a cached sink; for the aggregation expression I used count(1) as the row count aggregator.
In the first stream, which is the primary data-processing stream, I add a Surrogate Key transformation so that I have a row number for each row; I called my key column "sk".
Finally, set the Filter transformation to only allow rows with a row number <= the max row count from the cached sink minus 2.
The Filter expression looks like this: sk <= cachedSink#output().rowcount-2

Excel Power Query: merging different workbooks

I can't find a solution to the problem described here.
I have an Excel file with the sales data of 2020 and another one with the data for 2021, each with lots of rows. If I copy-paste one below the other in a single Excel file, I can't use a pivot table because there are too many rows, so I want to merge my 2 Excel files in this way:
First table:
Second table:
Desired final table (in Excel):
Is there any way I can do that with Power Query or something else in Excel?
Note: my tables don't just have Sales 2020 and Sales 2021; they also contain other data (for example, Growth 2020 and Growth 2021), but for simplicity I didn't include it here.
So if anyone can help me, I will appreciate it a lot!
I would start with 2 queries, each one just reading the rows from the First table and the Second table respectively.
Then I would start a new query by Reference to the First query.
In this Output query I would add a Merge Queries step, matching the first 3 columns from the First and Second queries, with the Join Type set to Full Outer Join.
Next, in the Expand step, I would return all the columns from the Second table.
Then I would add 3 columns using the Conditional Column option, to create merged versions of Name, Surname and Month. For example:
= if [Name] = null then [Second.Name] else [Name]
Finally, I would remove the unneeded columns, and rename and re-order the remaining columns if needed.
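For reference, the whole Output query in M would look roughly like this (a sketch only; it assumes the two queries are named First and Second, with key columns Name, Surname, Month and a value column Sales 2021 - adjust to your real names):

let
    // Full outer join on the three key columns; matches land in a nested "Second" column
    Merged = Table.NestedJoin(First, {"Name", "Surname", "Month"}, Second, {"Name", "Surname", "Month"}, "Second", JoinKind.FullOuter),
    // Expand the columns we need from the Second table
    Expanded = Table.ExpandTableColumn(Merged, "Second", {"Name", "Surname", "Month", "Sales 2021"}, {"Second.Name", "Second.Surname", "Second.Month", "Sales 2021"}),
    // Merged key: prefer First's value, fall back to Second's where First had no match
    MergedName = Table.AddColumn(Expanded, "Name Merged", each if [Name] = null then [Second.Name] else [Name])
in
    MergedName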

Azure Data Flow - Can we have dynamic columns or changing projections for Unpivot functionality?

The Excel file consists of 62 columns: 7 columns are fixed and the rest are weeks of the year (week1 to week52).
I have used a data flow to unpivot the 53 week columns into rows, with 2 extra columns, year and value.
The problem is that the 52 week column names keep changing with every weekly data load. How do I handle this change in column names in the data flow? For a single run it gives the exact output.
What you'll want to do here is to implement late-binding of your schema, or what ADF refers to as "schema drift". Instead of setting a hardened "early binding" schema in your Source projection, leave the dataset schema and projection empty.
Next, add a Derived Column after your source and call it "Projection". This is where you'll build your projection using rules to account for your evolving schema.
Build out your canonical model with the column names for your entire year using byName('columnname'). That tells ADF to look for a column with the quoted name in your source data, while also providing a schema that you can use to build out your pivot table.
If you need to cast the values, wrap byName() inside of a casting function, i.e. toString(), toDate(), etc.
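For example, the "Projection" derived column might contain entries like these (a sketch; the column names year, week1, week2 and the integer type are assumptions about your data):

year : toInteger(byName('year'))
week1 : toInteger(byName('week1'))
week2 : toInteger(byName('week2'))

Each expression resolves the column by name at runtime, so the data flow keeps working even though the source schema is not declared up front.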

Power BI: Comparing a filtered table against a variable string returns an empty table

Please see the code below for a Power BI table in DAX:
TABLE1 =
VAR ParticipantOneParticipantId =
    SELECTEDVALUE ( ParticipantOneDetails[ParticipantId] )
RETURN
    FILTER (
        ParticipantOneMeetings,
        ParticipantOneMeetings[ParticipantId] = ParticipantOneParticipantId
    )
I am fetching the ParticipantId value from a slicer-filtered table called ParticipantOneDetails and assigning it to ParticipantOneParticipantId.
In the next step I am trying to filter the ParticipantOneMeetings table by comparing its ParticipantId column against ParticipantOneParticipantId.
The problem is that the resulting table comes out empty, even though I know that ParticipantOneParticipantId must have a value and the ParticipantOneMeetings table also has values (I verified by comparing against a hard-coded string).
Can you please point out what I am doing wrong? Is comparing this way not legal?
The problem lies in the approach you are trying. Calculated/custom tables and columns are static: they refresh only when the dataset is refreshed, and they do not interact dynamically with slicer values. So it is impossible to feed a slicer selection into the generation of a calculated table.
Also, your requirement of creating a new table based on a slicer value is not completely clear to me, since what you are trying to produce is simply the filtered output of your ParticipantOneMeetings table after applying the slicer. If there is a relationship between your 2 tables on the ParticipantId column, a change in the slicer will automatically filter the ParticipantOneMeetings table. Why you want to hold these same filtered values in a new custom table is really the thing to understand before finding the appropriate solution for you.
Turns out I needed to add the following measure to the table output:
MeetingsAttendedByBothParticipants =
COUNTROWS (
    INTERSECT (
        VALUES ( ParticipantOneMeetings[Name] ),
        VALUES ( ParticipantTwoMeetings[Name] )
    )
)
The measure intersects the outputs of the two slicer-filtered meetings tables and counts the meetings that both participants attended.
