Excel Power Query: unpivot years only to create multiple measures on each row

I'm trying to normalise some data that is supplied in Excel. The data is made up of a number of dimension columns followed by several measure columns over time. Unfortunately the data comes in with a single "Measure/Year" identifier which means that if there are 10 years of data and 4 measures, there will be 40 measure columns.
I can't select specific columns to unpivot as the number of columns will change over time and I want to automate this completely.
A simplified sample of data looks like this (just showing 2 measures over 3 years in this example - but potentially 5 measures over an ever increasing number of years).
Country  Category  Product  QTY_2018  QTY_2019  QTY_2020  Value_2018  Value_2019  Value_2020
France   Fruit     Apple    10        20        30        11          22          33
France   Fruit     Orange   40        50        60        44          55          66
Germany  Veg       Carrot   70        80        90        77          88          99
What I would like to achieve is...
Country  Category  Product  Year  QTY  Value
France   Fruit     Apple    2018  10   11
France   Fruit     Apple    2019  20   22
France   Fruit     Apple    2020  30   33
France   Fruit     Orange   2018  40   44
France   Fruit     Orange   2019  50   55
France   Fruit     Orange   2020  60   66
Germany  Veg       Carrot   2018  70   77
Germany  Veg       Carrot   2019  80   88
Germany  Veg       Carrot   2020  90   99
So far I have selected all the non-measure columns, applied the "Unpivot other columns" transform, and then created 2 custom columns to extract the measure name (QTY or Value in this example) and the year. This gets around the problem of the varying number of measure columns, but it only gets me so far.
I now have data that looks like this:
Country  Category  Product  Year  Measure  Amount
France   Fruit     Apple    2018  QTY      10
France   Fruit     Apple    2018  Value    11
and so on...
Notes:
The measure column labels will always have the form 'measurename_YYYY'.
The list of measure names is finite (maybe 4 or 5), so updating the query when a measure name is added is fine, as this will be rare. The number of years will grow each year, though, and I want end users to be able to refresh the query from a sheet they update (the sample data above), so the query must handle the varying periods.
If this can be done in the data model, I'm happy to go with that too.
I may be going about this the wrong way with my attempts so far, but my Power Query knowledge is pretty basic, so any help would be gratefully received.

You should be able to just repivot on the new Measure column to get your desired result now.

You're nearly there. Just Pivot on your "Measure" column, to complete the output:
let
    Source = Excel.CurrentWorkbook(){[Name = "Table1"]}[Content],  // assumption: the data sits in a table named "Table1"
    Unpivoted = Table.UnpivotOtherColumns(Source, {"Country", "Category", "Product"}, "Attribute", "Value"),
    #"Split Column" = Table.SplitColumn(Unpivoted, "Attribute", Splitter.SplitTextByEachDelimiter({"_"}, QuoteStyle.Csv, false), {"Measure", "Year"}),
    #"Pivoted Column" = Table.Pivot(#"Split Column", List.Distinct(#"Split Column"[Measure]), "Measure", "Value")
in  #"Pivoted Column"
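For readers who want to check the logic outside Power Query, here is a minimal plain-Python sketch of the same three steps (unpivot, split the column name, pivot on the measure). It is only an illustration of the reshaping, not a replacement for the M query; the function name `reshape` and the two sample rows are assumptions made for the example.

```python
from collections import defaultdict

# Dimension columns as named in the question.
DIMENSIONS = ("Country", "Category", "Product")

# Two sample rows copied from the question's input table.
rows = [
    {"Country": "France", "Category": "Fruit", "Product": "Apple",
     "QTY_2018": 10, "QTY_2019": 20, "QTY_2020": 30,
     "Value_2018": 11, "Value_2019": 22, "Value_2020": 33},
    {"Country": "Germany", "Category": "Veg", "Product": "Carrot",
     "QTY_2018": 70, "QTY_2019": 80, "QTY_2020": 90,
     "Value_2018": 77, "Value_2019": 88, "Value_2020": 99},
]


def reshape(rows, dimensions=DIMENSIONS):
    """Unpivot every non-dimension column, split its name, then pivot on measure."""
    dimensions = tuple(dimensions)
    grouped = defaultdict(dict)  # (dimension values..., year) -> {measure: amount}
    for row in rows:
        key_prefix = tuple(row[d] for d in dimensions)
        for column, amount in row.items():
            if column in dimensions:
                continue  # dimension columns are kept, not unpivoted
            # Split at the first underscore only: "QTY_2018" -> ("QTY", "2018").
            measure, year = column.split("_", 1)
            grouped[key_prefix + (year,)][measure] = amount
    # Rebuild one output row per (dimensions, year) combination.
    return [
        {**dict(zip(dimensions + ("Year",), key)), **measures}
        for key, measures in grouped.items()
    ]


result = reshape(rows)
for line in result:
    print(line)
```

Because the columns to unpivot are found by exclusion, adding another year or measure column to the input needs no change to the code, which mirrors why "Unpivot other columns" suits the refresh scenario. Note that the year comes out as text here, as it does after the split step in the query.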

Related

How can I convert a repeated column element in to a title row?

I have some rather ugly post-pivot data, much like the following:
Location  Team  Staff  Sales
North     1     1100   55
North     2     2100   56
North     3     3200   91
South     1     7100   75
South     2     3100   16
South     3     9200   41
East      1     8100   25
East      2     9100   56
East      3     4200   31
My users don't like the duplication in the first column and would rather it be a header row with only one element, with the three resulting tables side-by-side. So, something like this:
with the obvious extension for East.
How can I achieve this automatically? I would do it by hand, but the real version of my table has a few hundred categories of values in the Location column.

In Excel how can a formula verify whether the column location or column element has taken the correct data from its header name?

The input data is in Sheet1 and the output is calculated in Sheet2.
The Sheet1 data can be changed by the user, so the columns 'Units1' and 'Units2' may not always sit in columns C and D. Suppose a new user enters data in which 'Avocado' and 'Banana' are in columns C and D: the 'Output' calculation in Sheet2 will then be incorrect, because we always want to use Units1 and Units2 for the calculation.
How can I fix this so that, every time data is entered, the formula checks whether the correct columns have been used for the calculation?
Is there a way to use INDEX, the LOOKUP family of functions, or any other function for this?
Maybe by creating a new sheet with a table of indexes that refer (or point) to the column names of the data sheet?
Location      Dates     Units1  Units2  Avocado  Banana
New York      05-01-18  10      12      1        2
Los Angeles   02-02-18  20      23      1        2
Chicago       08-03-18  30      34      1        2
Houston       05-04-18  40      45      1        2
Phoenix       02-05-18  50      56      1        2
Philadelphia  08-06-18  60      67      1        2
San Antonio   05-07-18  70      78      1        2
San Diego     02-08-18  80      89      1        2
Dallas        08-09-18  90      99      1        2
San Jose      05-10-18  100     112     1        2
Use INDEX/MATCH:
=INDEX(2:2,1,MATCH("Units2",$1:$1,0))/INDEX(2:2,1,MATCH("Units1",$1:$1,0))
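The idea behind the formula is to locate each column by its header text rather than by its letter. A small Python sketch of that lookup, assuming a hypothetical sheet in which the user has moved 'Avocado' and 'Banana' into columns C and D (the helper name `column_by_header` is invented for the example):

```python
# Hypothetical sheet contents: the first row is the header row, and the
# Units columns have been pushed to the right by the user.
sheet = [
    ["Location", "Dates", "Avocado", "Banana", "Units1", "Units2"],
    ["New York", "05-01-18", 1, 2, 10, 12],
    ["Los Angeles", "02-02-18", 1, 2, 20, 23],
]


def column_by_header(sheet, header):
    """Return the data cells under `header`, wherever that column sits."""
    # Finding the position by name plays the role of MATCH(header, $1:$1, 0).
    position = sheet[0].index(header)
    # Picking that position out of each data row plays the role of INDEX.
    return [row[position] for row in sheet[1:]]


units1 = column_by_header(sheet, "Units1")
units2 = column_by_header(sheet, "Units2")

# Same calculation as the formula: Units2 divided by Units1, row by row.
ratios = [u2 / u1 for u1, u2 in zip(units1, units2)]
print(ratios)
```

As with MATCH returning #N/A, `list.index` raises an error when the header is missing, so a misnamed column fails loudly instead of silently using the wrong data.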

Use my custom row order with pandas .describe() function

Assuming I have the following test DataFrame df:
Car Sold make profit
Honda 100 Accord 10
Honda 20 Fit 5
Toyota 300 Corolla 20
Hyundai 150 Elantra 20
BMW 20 Z-class 100
Toyota 45 Lexus 7
BMW 50 X-class 30
JEEP 150 cherokee 2
Honda 20 CRV 5
Toyota 30 Yaris 3
I need a summary statistic table for number of cars sold, by type of car.
I can do that this way:
df.groupby('Car')['Sold'].describe()
this gives me something like the following:
Car count mean std min 25th 50th 75th max
BMW 2
Honda 3
Hyundai 1
JEEP 1
Toyota 3
The 'Car' column values are listed in the summary statistic table in alphabetically ascending order. I am looking for a way to sort it in my own pre-specified way. I want the summary statistic table to be listed as "Toyota, Hyundai, JEEP, BMW, Honda"
df.groupby('Car')['Sold'].describe().loc[["Toyota", "Hyundai", "JEEP", "BMW", "Honda"]]
helps me put it in order, but I am not able to do it for multi-level indexing. For instance, if I want the summary statistics table by 'Car', and further by the make, .loc does not give me the desired solution.
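One possible approach, offered as a sketch rather than the definitive fix, is to make 'Car' an ordered categorical before grouping, so that groupby sorts the outer index level by the custom order. The DataFrame below is rebuilt from the sample data in the question; the variable names `order`, `ordered_df`, and `summary` are assumptions for the example.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "Car": ["Honda", "Honda", "Toyota", "Hyundai", "BMW",
            "Toyota", "BMW", "JEEP", "Honda", "Toyota"],
    "Sold": [100, 20, 300, 150, 20, 45, 50, 150, 20, 30],
    "make": ["Accord", "Fit", "Corolla", "Elantra", "Z-class",
             "Lexus", "X-class", "cherokee", "CRV", "Yaris"],
})

order = ["Toyota", "Hyundai", "JEEP", "BMW", "Honda"]

# An ordered categorical makes groupby sort by the custom order
# instead of alphabetically. assign() leaves the original df untouched.
ordered_df = df.assign(
    Car=pd.Categorical(df["Car"], categories=order, ordered=True)
)

# observed=True keeps only the (Car, make) pairs that occur in the data.
summary = ordered_df.groupby(["Car", "make"], observed=True)["Sold"].describe()
print(summary)
```

Within each car, the 'make' level is still sorted alphabetically; it could be given its own categorical order in the same way if needed.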

Excel combining up to 100 ranges that have similar but not identical columns into one

I'm pulling in data for unique items, and each item has a potentially unique set of columns that go with it. As a trivial example, two ranges might look like this:
Range 1:
Brand Year MPG Sales
Nissan 2000 25 150
Nissan 2005 27 180
Nissan 2008 30 190
Range 2:
Brand Year Sales
Honda 2000 95
Honda 2001 100
Honda 2001 150
Ideally I would like to combine into one named range:
Brand Year MPG Sales
Nissan 2000 25 150
Nissan 2005 27 180
Nissan 2008 30 190
Honda 2000 Null 95
Honda 2001 Null 100
Honda 2001 Null 150
In reality I have something like 100+ named ranges like this and the columns vary from anywhere between 25-35.
Is there a smart way to achieve the desired result? Right now I'm trying to use VBA to loop through each range and check whether the row exists, but that seems like a sloppy solution and I'm having trouble getting it to work, though I can probably brute-force it.
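To make the desired result concrete, here is a plain-Python sketch of the combining logic: collect the union of column names in first-seen order, then emit every row with None where a range lacks a column. It illustrates the logic only, not a VBA or Excel solution, and the function name `combine` and the dict-per-row layout are assumptions.

```python
# The two example ranges from the question, one dict per row.
range1 = [
    {"Brand": "Nissan", "Year": 2000, "MPG": 25, "Sales": 150},
    {"Brand": "Nissan", "Year": 2005, "MPG": 27, "Sales": 180},
    {"Brand": "Nissan", "Year": 2008, "MPG": 30, "Sales": 190},
]
range2 = [
    {"Brand": "Honda", "Year": 2000, "Sales": 95},
    {"Brand": "Honda", "Year": 2001, "Sales": 100},
    {"Brand": "Honda", "Year": 2001, "Sales": 150},
]


def combine(ranges):
    """Stack ranges vertically, aligning cells by column name."""
    columns = []  # union of column names, in first-seen order
    for rng in ranges:
        for row in rng:
            for name in row:
                if name not in columns:
                    columns.append(name)
    # row.get() yields None (the "Null" in the question) for missing columns.
    combined = [[row.get(name) for name in columns] for rng in ranges for row in rng]
    return columns, combined


columns, combined = combine([range1, range2])
print(columns)
for line in combined:
    print(line)
```

Aligning by name rather than by position means it does not matter whether a given range has 25 or 35 columns, or in which order they appear.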

EXCEL: Count of Column Items For Every Distinct Item In Another

In my Excel sheet I have 2 columns: restaurant names and a rating for each one. Each rating gets its own row, so a restaurant name can occur multiple times.
Restaurant Rating
McDonalds 8
McDonalds 7
Red Robin 5
Qdoba 7
Etc.
How can I get the number of times each rating happens for each restaurant? We'll say rating goes from 1-10.
I want it to look like this:
Restaurant 1(rating) 2 3 4 5 6 7 8 9 10
McDonalds 889 22 45 77 484 443 283 333 44 339
Any help is appreciated!
Using Pivot Tables:
Use a pivot table: set your rows to "Restaurant", your columns to "Rating", and your values to "Count of Rating".
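What the pivot table computes is a count per (restaurant, rating) pair. A minimal Python sketch of that count, using only the four sample rows shown in the question and assuming ratings run from 1 to 10 (the variable names are invented for the example):

```python
from collections import Counter

# The sample rows shown in the question.
ratings = [("McDonalds", 8), ("McDonalds", 7), ("Red Robin", 5), ("Qdoba", 7)]

# Count how many rows exist for each (restaurant, rating) pair.
counts = Counter(ratings)

# One row per restaurant, one column per rating from 1 to 10.
restaurants = sorted({name for name, _ in ratings})
table = {
    name: [counts[(name, rating)] for rating in range(1, 11)]
    for name in restaurants
}

print("Restaurant", *range(1, 11))
for name, row in table.items():
    print(name, *row)
```

A Counter returns 0 for pairs that never occur, so ratings nobody gave still appear as zero-filled columns, much like empty cells in the pivot table.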
Using Countifs:
