Restructuring a table with PowerQuery - excel

I am taking my first steps in PowerQuery, so here's my problem. I have a raw data table which lists countries and certain products. For each product there is the "market" value followed by a MyValue (meaning my own sales of that product in that country). An example here:
raw table
What I was trying to obtain with PowerQuery is a table that unpivots the products category and leaves two columns, one for Market and one for MyValue.
I tried in many ways and the closest to the result I could get was splitting the original table in two, one for the Market and one for MyValues. Then unpivot each one of them in PowerQuery so that I could get them in this way:
[image: unpivoted Market table] and [image: unpivoted MyValue table]
I then tried to merge the two tables but couldn't work it out. Of course I could do that manually, but I'm sure there's a way to do it with PowerQuery, either by splitting into two tables, unpivoting and then merging or, even better, with a single query.
The result I'm aiming at is like
Desired Result

You are close.
After you unpivot, you need to create a custom column that you can pivot on, and also modify the names in the resultant "attribute" column.
Read the comments in the code and explore the Applied Steps window to understand the algorithm
M Code
let
Source = Excel.CurrentWorkbook(){[Name="rawTable"]}[Content],
//generalized "typer" in case you add other Items
#"Changed Type" = Table.TransformColumnTypes(Source,{
{"Country", type text}, {"Date", type date}} &
List.Transform(List.RemoveFirstN(Table.ColumnNames(Source),2),each {_, Int64.Type})),
//Unpivot all except Country|Date
#"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Changed Type", {"Country", "Date"}, "Item", "Value"),
//Add Custom Column to create Pivot column for "Market" and "MyValue"
//Items whose name starts with "My" hold your own sales; all others hold the market value
#"Added Custom" = Table.AddColumn(#"Unpivoted Other Columns", "Custom", each
if Text.StartsWith([Item],"My")
then "MyValue"
else "Market"),
//Replace "My" so Item Labels will be consistent
#"Replaced Value" = Table.ReplaceValue(#"Added Custom","My","",Replacer.ReplaceText,{"Item"}),
//Pivot with no aggregation (unless you want to)
#"Pivoted Column" = Table.Pivot(#"Replaced Value", List.Distinct(#"Replaced Value"[Custom]), "Custom", "Value"),
//Sort "Items" to original Column Order
itemSortOrder = List.Distinct(#"Replaced Value"[Item]),
sorted = Table.Sort(#"Pivoted Column",
{{"Country", Order.Ascending},
each List.PositionOf(itemSortOrder,[Item])
})
in
sorted
Hopefully, this is what you want for a result

Thank you so much for taking the time to help me.
I think I solved my problem using the List.Zip function. The solution was not mine; I took it from THIS video. With this trick, I don't even have to split the original source data into two tables (market & MyShare).
It does exactly what I needed, with little if any effort for data cleaning...
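The video itself is not reproduced here, so the following is only a minimal sketch of how List.Zip could be used for this, not the video's actual method. It assumes the layout described in the question (two key columns, then one Market column and one "My"-prefixed column per product, in matching order); the sample table and the step names are made up for illustration.

```powerquery
let
    // Hypothetical sample data; the product names and figures are made up
    Source = #table(
        {"Country", "Date", "Apples", "MyApples", "Pears", "MyPears"},
        {
            {"Italy", #date(2021, 1, 1), 100, 10, 200, 20},
            {"Spain", #date(2021, 1, 1), 300, 30, 400, 40}
        }),
    // Split the value columns into Market columns and their "My" counterparts
    ValueColumns = List.Skip(Table.ColumnNames(Source), 2),
    MarketColumns = List.Select(ValueColumns, each not Text.StartsWith(_, "My")),
    MyColumns = List.Select(ValueColumns, each Text.StartsWith(_, "My")),
    // For every row, zip {product name, market value, my value} into one record per product
    AddZipped = Table.AddColumn(Source, "Zipped", (row) =>
        List.Transform(
            List.Zip({
                MarketColumns,
                List.Transform(MarketColumns, each Record.Field(row, _)),
                List.Transform(MyColumns, each Record.Field(row, _))
            }),
            each Record.FromList(_, {"Item", "Market", "MyValue"}))),
    KeepColumns = Table.SelectColumns(AddZipped, {"Country", "Date", "Zipped"}),
    // One output row per product, then spread the record into columns
    ExpandedList = Table.ExpandListColumn(KeepColumns, "Zipped"),
    Result = Table.ExpandRecordColumn(ExpandedList, "Zipped", {"Item", "Market", "MyValue"})
in
    Result
```

The pairing relies purely on column order, so a product with a missing "My" column would shift the pairs; the unpivot and pivot approach in the answer above does not have that dependency.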

Related

Get values of top N based on sum and condition [duplicate]

I would like to extract the top 5 players based on the sales by each employee (without Pivot Table / Auto filter).
Refer my input and output screenshot
Snapshot
Any suggestions, how to obtain first top 5 ranks (even if repeated; as shown in the screenshots)
I have verified Extract Top 5 Values for Each Group in a List without VBA and some other links also.
Thanks in advance for your time and consideration! Please let me know if my request is unclear and/or if you have any specific questions.
This is what I use to track the top 5 absentees...
Edit to suit your needs.
Formula in cell A1:
=INDEX(A$13:A$52,AGGREGATE(15,6,ROW($1:$40)/(B$13:B$52=B1),COUNTIF(B$1:B1,B1)))
Formula in cell B1:
=LARGE(B$13:B$52,ROW())
An alternative approach using Power Query which is available in Excel 2010 Professional Plus and all later versions of Excel.
Steps are:
Add your input data table to the Power Query Editor;
Sort the table by Sales (descending), then by Employee (ascending);
Add an Index Column starting from 1;
Filter the Index column to show values less than or equal to 5;
Remove the Index column, then you should have something like the following:
Close & Load the output table to a new worksheet (by default).
Here is the Power Query M code for your reference. All functions used are available in the GUI, so it should be easy and straightforward.
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Employee", type text}, {"Month", type text}, {"Sales", type number}}),
#"Sorted Rows" = Table.Sort(#"Changed Type",{{"Sales", Order.Descending}, {"Employee", Order.Ascending}}),
#"Added Index" = Table.AddIndexColumn(#"Sorted Rows", "Index", 1, 1),
#"Filtered Rows" = Table.SelectRows(#"Added Index", each [Index] <= 5),
#"Removed Columns" = Table.RemoveColumns(#"Filtered Rows",{"Index"})
in
#"Removed Columns"
Let me know if you have any questions. Cheers :)
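As a shorter variant of the sort, index and filter steps above, Table.MaxN can return the top rows in one step. This is only a sketch: the sample rows are invented, and the column names are taken from the query above.

```powerquery
let
    // Hypothetical sample data standing in for Table1
    Source = #table(
        type table [Employee = text, Month = text, Sales = number],
        {
            {"Ann", "Jan", 500}, {"Bob", "Jan", 700}, {"Cid", "Feb", 700},
            {"Dee", "Feb", 300}, {"Eve", "Mar", 900}, {"Fay", "Mar", 100}
        }),
    // Keep the five rows with the largest Sales; repeated values stay as separate rows
    Top5 = Table.MaxN(Source, "Sales", 5)
in
    Top5
```

Table.MaxN gives no control over how tied Sales values are ordered, so if the tie-break by Employee matters, the explicit sort in the query above is the safer choice.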
Try this one. As you have in your sample:
On Cell E16:
=VLOOKUP(LARGE($C$3:$C$12,ROW()-15),CHOOSE({2,1},$A$3:$A$12,$C$3:$C$12),2,FALSE)
On Cell F16:
=VLOOKUP(LARGE($C$3:$C$12,ROW()-15),CHOOSE({2,1},$B$3:$B$12,$C$3:$C$12),2,FALSE)
On Cell G16:
=VLOOKUP(LARGE($C$3:$C$12,ROW()-15),$C$3:$C$12,1,FALSE)
You can drag it down to get the list sorted.
Hope it helps!

Excel Power Query: Keep only matched and row above

I wanted to know if Power Query in Excel can handle matching something from another worksheet and keeping only the matching row and the row above it all the while not sorting the list.
Above is the report I get sent daily. It contains orders going out. But we only give our customers their orders if they paid, which our system also records as an "order". Our database links these two orders together, but it does so in a single column, with the order IN directly above the order OUT.
The above is the flat text file from the database that shows the OUT orders and the IN orders (i.e. payments). They are sorted by IN and linked OUT order. The numbers are randomly made by the system.
Can Power Query be used to import this flat text file from the database, match those OUT orders from "Today's OUTS" sheet and the OrdersINs which is always the single row above?
I want to just end up with a sheet that contains Today's OUTS and their linked Order INs.
Thank you.
Yes, it can.
Read in the two tables
Add an Index column to the "Links" table to be able to restore original order
Do Table.Join with JoinKind.FullOuter (all rows from both)
Sort according to the Index column
At this point one could either
add a custom column to reference the previous row if there is something in the OUTS column or,
my preference as it will often be faster: offset the Links column by one; then filter out the nulls
Please read the comments in the code and explore the Applied Steps to better understand the algorithm:
M Code
let
Source = Excel.CurrentWorkbook(){[Name="Outs"]}[Content],
Outs = Table.TransformColumnTypes(Source,{{"Today's OUTS", type text}}),
Source2 = Excel.CurrentWorkbook(){[Name="Links"]}[Content],
Links = Table.TransformColumnTypes(Source2,{{"Order Links", type text}}),
//Add index column to links to restore order after join
#"Added Index" = Table.AddIndexColumn(Links, "Index", 0, 1, Int64.Type),
Joined = Table.Join(Outs,"Today's OUTS", #"Added Index", "Order Links", JoinKind.FullOuter),
#"Sorted Rows" = Table.Sort(Joined,{{"Index", Order.Ascending}}),
#"Removed Columns" = Table.RemoveColumns(#"Sorted Rows",{"Index"}),
//offset Links by one row (usually faster than using Index to reference previous row)
prevRow = let
ShiftedList = {null} & List.RemoveLastN(Table.Column(#"Removed Columns", "Order Links"),1),
Custom1 = Table.ToColumns(#"Removed Columns") & {ShiftedList},
Custom2 = Table.FromColumns(Custom1, Table.ColumnNames(#"Removed Columns") & {"Order IN"})
in
Custom2,
#"Removed Columns1" = Table.RemoveColumns(prevRow,{"Order Links"}),
//Filter out the nulls
#"Filtered Rows" = Table.SelectRows(#"Removed Columns1", each ([#"Today's OUTS"] <> null))
in
#"Filtered Rows"
Edit: Outs without Links will show up in the Outs column with a blank in the In column. Not sure how you might want to handle this
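For comparison, the first option listed above (referencing the previous row through an index) could look roughly like the sketch below. It is not the code from this answer: it filters by list membership instead of a full outer join, so OUTS without a link are dropped, and the order numbers are invented for illustration.

```powerquery
let
    // Hypothetical sample data; the order numbers are made up
    Outs = #table({"Today's OUTS"}, {{"B2"}, {"D4"}}),
    Links = #table({"Order Links"}, {{"A1"}, {"B2"}, {"C3"}, {"D4"}, {"E5"}, {"F6"}}),
    // Index the links so each row can address the row above it
    Indexed = Table.AddIndexColumn(Links, "Index", 0, 1, Int64.Type),
    // Look up the row above by position; the first row has nothing above it
    WithPrev = Table.AddColumn(Indexed, "Order IN", each
        if [Index] = 0 then null else Indexed{[Index] - 1}[Order Links],
        type nullable text),
    // Keep only the rows whose order appears in today's OUTS list
    Matched = Table.SelectRows(WithPrev, each
        List.Contains(Outs[#"Today's OUTS"], [Order Links])),
    Result = Table.SelectColumns(Matched, {"Order Links", "Order IN"})
in
    Result
```

Positional lookups such as Indexed{[Index] - 1} are evaluated row by row, which is why the shifted-list approach in the query above tends to be faster on large files.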

Power Query / Power BI: look up data from another Excel workbook

I am trying to combine worksheets from two different workbooks with Power Query and I have trouble doing that.
I do not want to merge the two workbooks.
I do not want to create relationships or "joins".
However, I want to get very specific information for one workbook which has only one column. The "ID" column.
The ID column has rows with letter tags : AB or BE.
Each of these letter tags is followed by a number from a specific range.
For both AB and BE, the numbers range first from 0000 to 3000 and then from 3001 to 6000.
I thus have the following possibilities:
From AB0000 to AB3000
From AB3001 to AB6000
From BE0000 to BE3000
From BE3001 to BE6000
Each category matches a specific item in my geography column, from the other workbook:
From AB0000 to AB3000, it is ItalyZ
From AB3001 to AB6000, it is ItalyB
From BE0000 to BE3000, it is UKY
From BE3001 to BE6000, it is UKM
I am thus trying to find the highest number associated with the first AB category, the second AB category, the first BE category, and the second BE category.
I then want to "bring" this number into the other query and increment it each time the matching country is found in the other workbook.
For example :
AB356 is the highest number in the first workbook.
Once the first "ItalyB" is found, the column beside it shows "AB357".
Once the second "ItalyB" is found, the column beside it shows "AB358".
Here is the one columned worksheet:
Here is the other worksheet with the various countries in geography:
Here is an example of results:
have one column (geography) with
I think that this is something which I should work towards:
I added the index column, starting at one, because each row (even row zero) should increment one of the four matching codes.
To keep moving forward, I have also been trying to create some sort of mapping in a third Excel sheet, which I imported into Power BI, but I am not sure that this is a good way forward:
I have the following result when I create a blank query:
After a correction, I still get this result when creating the blank query:
This is not an easy answer, as there are many steps to get to your result. I have chosen M (the Power Query language) because of the complexity.
In Power BI, click on Transform data; you are now in the Power Query Editor.
The table with the IDs (I called it "HighestID") needs expansion, because we need to be able to map on the prefix.
You need a mapping table ("GeoMapping"), else there is no relation between the prefixes and the geolocation.
We need the new ID on the Geo table (which I called "Geo").
Expand the HighestID table.
Click on the table and open the Advanced Editor, then compare your code with the code below. The last two steps are essential: they add two columns (Prefix and Number) which we need later.
let
Source = Csv.Document(File.Contents("...\HighestID.csv"),[Delimiter=";", Columns=1, Encoding=1252, QuoteStyle=QuoteStyle.None]),
#"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
#"Changed Type1" = Table.TransformColumnTypes(#"Promoted Headers",{{"ID", type text}}),
#"Added Custom" = Table.AddColumn(#"Changed Type1", "Prefix", each Text.Middle([ID],0,2), type text),
#"Added Custom1" = Table.AddColumn(#"Added Custom", "Number", each Number.FromText(Text.Middle([ID],2,5)))
in
#"Added Custom1"
Result:
Create mapping table
Right-click under your last table and click Blank Query:
Paste the source below, and ensure the name of the merged table equals the name of your table. As I mentioned, I called it HighestID.
let
Source = #table({"Prefix", "Seq_Start", "Seq_End","GeoLocation"},{{"AB",0,2999,"ItalyZ"},{"AB",3000,6000,"ItalyB"},{"BE",0,2999,"UKY"},{"BE",3000,6000,"UKM"}}),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Seq_Start", Int64.Type}, {"Seq_End", Int64.Type}}),
#"Merged Queries" = Table.NestedJoin(#"Changed Type", {"Prefix"}, HighestID, {"Prefix"}, "HighestID", JoinKind.LeftOuter),
#"Expanded HighestID" = Table.ExpandTableColumn(#"Merged Queries", "HighestID", {"Number"}, {"Number"}),
#"Filtered Rows" = Table.SelectRows(#"Expanded HighestID", each [Number] >= [Seq_Start] and [Number] <= [Seq_End]),
#"Grouped Rows" = Table.Group(#"Filtered Rows", {"Prefix", "Seq_Start", "Seq_End", "GeoLocation"}, {{"NextSeq", each List.Max([Number]) + 1, type number}})
in
#"Grouped Rows"
Result:
Adding the NextSeq Column
This is the hard part: if I only gave you the code, I am afraid it would not work, so here are the steps you need to follow.
1. Select the table, right-click on Geography and click Group By. Select as below:
2. Merge with table GeoMapping as below:
3. Expand the GeoMapping column with NextSeq.
4. Add a custom column:
5. Remove the columns that are not needed, so that only the custom column created in step 4 is left.
6. Expand the column (select all). The end result is all the columns you had earlier plus an Index column.
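Since the screenshots for these steps are missing, here is a rough sketch of what they could amount to in M. It is an assumption-laden reconstruction, not the exact code behind the steps: the sample tables, the step names and the NewID format (prefix followed by the unpadded number) are hypothetical, and every Geography value is assumed to have a row in the mapping table.

```powerquery
let
    // Hypothetical stand-ins for the "Geo" and "GeoMapping" queries
    Geo = #table({"Geography"}, {{"ItalyZ"}, {"UKY"}, {"ItalyZ"}, {"ItalyZ"}}),
    GeoMapping = #table(
        {"Prefix", "GeoLocation", "NextSeq"},
        {{"AB", "ItalyZ", 357}, {"BE", "UKY", 101}}),
    // Step 1: group by Geography, keeping all rows of each group as a nested table
    Grouped = Table.Group(Geo, {"Geography"}, {{"Rows", each _, type table}}),
    // Steps 2-3: merge with the mapping table and expand Prefix and NextSeq
    Merged = Table.NestedJoin(Grouped, {"Geography"}, GeoMapping, {"GeoLocation"},
        "GeoMapping", JoinKind.LeftOuter),
    Expanded = Table.ExpandTableColumn(Merged, "GeoMapping", {"Prefix", "NextSeq"}),
    // Step 4: number the rows inside each group, starting at that group's NextSeq
    AddCustom = Table.AddColumn(Expanded, "Custom", (g) =>
        Table.AddColumn(
            Table.AddIndexColumn(g[Rows], "Index", g[NextSeq], 1),
            "NewID",
            (r) => g[Prefix] & Text.From(r[Index]))),
    // Steps 5-6: keep only the nested tables and stack them back into one table
    Combined = Table.Combine(AddCustom[Custom])
in
    Combined
```

Grouping keeps the row order within each Geography but not across groups, so add an index column before step 1 if the original row order has to be restored afterwards.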

How do I reconstruct a data-set based on unique ID

Looking for a solution in either Excel or IBM SPSS:
I have a dataset with around 95,000 rows. Each row is one response from a participant on a particular question. For example, Row 2 is the response from participant A, on Question 1, where they indicated a score of 2. As pictured.
Ideally I need 1 line of responses per participant as pictured here:
I've tried VLOOKUP and then a macro to delete #N/A and move up the values but memory can't even handle the VLOOKUP, so it's not a viable option.
I feel out of options on what to do, but without laying out my data-set like this, I can't do later analysis (Later I need to average across all participants where Q5 = 80 etc [Q5 is a category code]).
You can do this with a Pivot Table.
Using Power Query (Excel 2010+) (aka Get&Transform in Excel 2016+) gives you a bit more flexibility in, for example, automating the naming of the column Headers.
If you will always have the same five questions, you can use the GUI to just pivot the QuestionNumber column.
But if the number of questions might vary from run to run, the code to handle that needs to be written in the Advanced Editor:
let
Source = Excel.CurrentWorkbook(){[Name="Table2"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"UserID", type text}, {"QuestionNumber", Int64.Type}, {"Score", Int64.Type}}),
#"Pivoted Column" = Table.Pivot(Table.TransformColumnTypes(#"Changed Type", {{"QuestionNumber", type text}}, "en-US"), List.Distinct(Table.TransformColumnTypes(#"Changed Type", {{"QuestionNumber", type text}}, "en-US")[QuestionNumber]), "QuestionNumber", "Score", List.Sum),
Renames = List.Transform(List.Skip(Table.ColumnNames(#"Pivoted Column"),1), each {_, "Q" &_}),
#"New Headers" = Table.RenameColumns(#"Pivoted Column", Renames)
in
#"New Headers"
SPSS ANSWER:
Run this code in a new syntax window:
casestovars /id=userid /index=questionNum /separator="".
