Row totals based on column name in PowerQuery - excel

I have a data file with around 400 columns in it. I need to import this data into PowerPivot. In order to reduce my file size, I would like to use PowerQuery to create 2 different row totals, and then delete all my unneeded columns upon load.
While my first row total column (RowTotal1) would summate all 400 columns, I would also like a second row total (RowTotal2) that subtracts from RowTotal1 any column whose name contains the text "click" in it.
Secondly, I would like to use the the value in my Country column as a variable, to also subtract any column that contains this var. e.g.
Site----Country----Col1----Col2----ClickCol1----Col3----Germany----RowTotal1----RowTotal2
1a--------USA----------2---------4-----------8------------16----------24--------------54---------------46-------
2a-----Germany-------2---------4-----------8------------16----------24--------------54---------------22-------
RowTotal1 = 2 + 4 + 8 + 16 + 24
RowTotal2 (first row) = 54 - 8 (ClickCol1)
RowTotal2 (second row) = 54 - 24 (Germany) - 8 (ClickCol1)
Is this possible? (EDIT: Yes. See answer below)
REVISED QUESTION: Is there a more memory efficient way to do than trying to group 300+ million rows at once?

Code would look something like this:
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Site", type text}, {"Country", type text}, {"Col1", Int64.Type}, {"Col2", Int64.Type}, {"ClickCol1", Int64.Type}, {"Col3", Int64.Type}, {"Germany", Int64.Type}}),
#"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Changed Type", {"Country", "Site"}, "Attribute", "Value"),
#"Added Conditional Column" = Table.AddColumn(#"Unpivoted Other Columns", "Value2", each if [Country] = [Attribute] or [Attribute] = "ClickCol1" then 0 else [Value] ),
#"Grouped Rows" = Table.Group(#"Added Conditional Column", {"Site", "Country"}, {{"RowTotal1", each List.Sum([Value]), type number},{"RowTotal2", each List.Sum([Value2]), type number}})
in
#"Grouped Rows"
But since you have a lot of columns, I should explain the steps:
(Assuming you have these in Excel file) Import them to Power Query
Select "Site" and "Country" columns (with Ctrl), right click > Unpivot Other Columns
Add Column with this formula (you might need to use Advanced Editor): Table.AddColumn(#"Unpivoted Other Columns", "Value2", each if [Country] = [Attribute] or [Attribute] = "ClickCol1" then 0 else [Value])
Select Site and Country columns, Right Click > Group By
Make it look like this:

Related

Aggregate multiple (many!) pair of columns (Exce)

I have table; The table consists of pairs of date and value columns
Pair Pair Pair Pair .... ..... ......
What I need is the sum of all values for the same date.
The total table has 3146 columns (so 1573 pairs of value and date)!! with up to 186 entries on row level.
Thankfully, the first column contains all possible date values.
Considering the 3146 columns I am not sure how to do that without doing massivle amount of small steps :(
This shows a different method of creating the two column table that you will group by Date and return the Sum. Might be faster than the List.Accumulate method. Certainly worth a try in view of your comment above.
Unpivot the original table
Add 0-based Index column; then IntegerDivide by 2
Group by the IntegerDivide column and extract the Date and Value to separate columns.
Then group by date and aggregate by sum
let
Source = Excel.CurrentWorkbook(){[Name="Table12"]}[Content],
//assuming only columns are Date and Value, this will set the data types for any number of columns
Types = List.Transform(List.Alternate(Table.ColumnNames(Source),1,1,1), each {_, type date}) &
List.Transform(List.Alternate(Table.ColumnNames(Source),1,1,0), each {_, type number}),
#"Changed Type" = Table.TransformColumnTypes(Source,Types),
//Unpivot all columns to create a two column table
//The Value.1 table will alternate the related Date/Value
#"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Changed Type", {}, "Attribute", "Value.1"),
//add a column to group the pairs of values
//below two lines => a column in sequence of 0,0,1,1,2,2,3,3, ...
#"Added Index" = Table.AddIndexColumn(#"Unpivoted Other Columns", "Index", 0, 1, Int64.Type),
#"Inserted Integer-Division" = Table.AddColumn(#"Added Index", "Integer-Division", each Number.IntegerDivide([Index], 2), Int64.Type),
#"Removed Columns" = Table.RemoveColumns(#"Inserted Integer-Division",{"Index"}),
// Group by the "pairing" sequence,
// Extract the Date and Value to new columns
// => a 2 column table
#"Grouped Rows" = Table.Group(#"Removed Columns", {"Integer-Division"}, {
{"Date", each [Value.1]{0}, type date},
{"Value", each [Value.1]{1}, type number}}),
#"Removed Columns1" = Table.RemoveColumns(#"Grouped Rows",{"Integer-Division"}),
//Group by Date and aggregate by Sum
#"Grouped Rows1" = Table.Group(#"Removed Columns1", {"Date"}, {{"Sum Values", each List.Sum([Value]), type number}}),
//Sort into date order
#"Sorted Rows" = Table.Sort(#"Grouped Rows1",{{"Date", Order.Ascending}})
in
#"Sorted Rows"
Quick google shows "Number of columns per table 16,384" for powerquery and 16000 for powerBI, so I'm thinking you have to split your input data somehow first, or perhaps this is not the tool for you, maybe AWK
Assuming that works, an M version of what you are looking for. It stacks the columns in groups of 2, then groups and sums them
let Source = Excel.CurrentWorkbook(){[Name="Table4"]}[Content],
Combo = List.Split(Table.ColumnNames(Source),2),
#"Added Custom" =List.Accumulate(
Combo,
#table({"Column1"}, {}),
(state,current)=> state & Table.Skip(Table.DemoteHeaders(Table.SelectColumns(Source, current)),1)
),
#"Grouped Rows" = Table.Group(#"Added Custom", {"Column1"}, {{"Sum", each List.Sum([Column2]), type number}})
in #"Grouped Rows"
186 rows * 1573 pairs of columns = 292,578 records.
Assuming not a very old version of Excel, 293k rows is fine, so it can be done with formulae:
Insert five columns to the left, so data starts in F3.
In A3 put zero, in A4 put 1, select the two and drag down to A188.
In A189 put =A3.
In B3 put 0, and drag down to B188.
In B189 put =B3
"Drag"* down A189 and B189 to row 292580
In C3 put =OFFSET($F$3,A3,B3)
In D3 put =OFFSET($F$3,A3,B3+1)
Select those two cells and click on the cross at bottom right to copy them to the end of column B.
Then put Date and Value in A1 and B1, and use a Pivot Table to get totals, averages, or whatever you need.
Any blank cells in the original input do not matter.
to "drag" down hundred of thousands of cells:
Copy A189 and B189
Goto (F5) A292580
Paste
Pin (F8)
CTRL-up arrow
Enter
And rather than $F$3 I would name that cell Origin, and use "Origin" in the two Offset formulae, but many people seem to consider that too complicated.

Group by column A value, transpose column B, column C row values for each grouped column A value

This is in Excel 2016. I have a spreadsheet where each row represents a response to two questions "Qa" and "Qb" from a unique student. The spreadsheet columns are: "Section" (class section student is in), "Qa", and "Qb".
Thus, if three students answered from the same class section, that section will be listed three times under "Section", with each unique students answers in the other columns.
I want to group by section and spread the answers to each question across a single row in separate columns. The number of columns to create will default to the section with the most unique responses
In this case, 10003 has the greatest number of responses, so I want to get the following end result.
I am at a loss with how to get this going. Something like grouping by the section but transposing the rows within that group?
As #ScottCraner pointed out, you can obtain your desired output using Power Query, available in Windows Excel 2010+ and Office 365 Excel
Select some cell in your original table
Data => Get&Transform => From Table/Range
When the PQ UI opens, navigate to Home => Advanced Editor
Make note of the Table Name in Line 2 of the code.
Replace the existing code with the M-Code below
Change the table name in line 2 of the pasted code to your "real" table name
Examine any comments, and also the Applied Steps window, to better understand the algorithm and steps
M Code
let
//Change table name in next row to actual table name in workbook
Source = Excel.CurrentWorkbook(){[Name="Table20"]}[Content],
//set data type
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Section", Int64.Type}, {"Qa", type text}, {"Qb", type text}}),
//Group by Section
//Add a 1-based Index column to each Group
#"Grouped Rows" = Table.Group(#"Changed Type", {"Section"}, {
{"Row", each Table.AddIndexColumn(_,"Row",1,1)}}),
//Expand the grouped tables
#"Expanded Row" = Table.ExpandTableColumn(#"Grouped Rows", "Row", {"Qa", "Qb", "Row"}, {"Qa", "Qb", "Row"}),
//Unpivot
//Merge Row and Attribute columns to create the q-number headers
#"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Expanded Row", {"Section", "Row"}, "Attribute", "Value"),
#"Merged Columns" = Table.CombineColumns(Table.TransformColumnTypes(#"Unpivoted Other Columns",
{{"Row", type text}}, "en-US"),{"Attribute", "Row"},
Combiner.CombineTextByDelimiter("-", QuoteStyle.None),"Merged"),
//Pivot on the Sorted Merged column with no aggregation
#"Pivoted Column" = Table.Pivot(#"Merged Columns", List.Sort(List.Distinct(#"Merged Columns"[Merged])), "Merged", "Value")
in
#"Pivoted Column"
Note that there are no empty columns (iow, there is no Qa-4)
If you really need an empty column, insert a step at the beginning replacing nulls with a blank
let
//Change table name in next row to actual table name in workbook
Source = Excel.CurrentWorkbook(){[Name="Table20"]}[Content],
//set data type
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Section", Int64.Type}, {"Qa", type text}, {"Qb", type text}}),
//if you really need a blank Qa column since you have four distinct Qb rows but only 3 Qa rows,
// then we insert the next line
#"Replaced Value" = Table.ReplaceValue(#"Changed Type",null,"",Replacer.ReplaceValue,{"Qa", "Qb"}),
//Group by Section
//Add a 1-based Index column to each Group
#"Grouped Rows" = Table.Group(#"Replaced Value", {"Section"}, {
{"Row", each Table.AddIndexColumn(_,"Row",1,1)}}),
//Expand the grouped tables
#"Expanded Row" = Table.ExpandTableColumn(#"Grouped Rows", "Row", {"Qa", "Qb", "Row"}, {"Qa", "Qb", "Row"}),
//Unpivot
//Merge Row and Attribute columns to create the q-number headers
#"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Expanded Row", {"Section", "Row"}, "Attribute", "Value"),
#"Merged Columns" = Table.CombineColumns(Table.TransformColumnTypes(#"Unpivoted Other Columns",
{{"Row", type text}}, "en-US"),{"Attribute", "Row"},
Combiner.CombineTextByDelimiter("-", QuoteStyle.None),"Merged"),
//Pivot on the Sorted Merged column with no aggregation
#"Pivoted Column" = Table.Pivot(#"Merged Columns", List.Sort(List.Distinct(#"Merged Columns"[Merged])), "Merged", "Value")
in
#"Pivoted Column"

I have 3 time periods in excel - I need to know the duration of the longest continuous period

Please help!
Ideally, I would really like to solve this using formulas only - not VBA or anything I consider 'fancy'.
I work for a program that awards bonuses for continuous engagement. We have three (sometimes more) engagement time periods that could overlap and/or could have spaces of no engagement. The magic figure is 84 days of continuous engagement. We have been manually reviewing each line (hundreds of lines) to see if the time periods add up to 84 days of continuous engagement, with no periods of inactivity.
In the link there is a pic of a summary of what we work with. Row 3 for example, doesn't have 84 days in any of the 3 time periods, but the first 2 time periods combined includes 120 consecutive days. The dates will not appear in date order - e.g. early engagements may be listed in period 3.
Really looking forward to your advice.
Annie
#TomSharpe has shown you a method of solving this with formulas. You would have to modify it if you had more than three time periods.
Not sure if you would consider a Power Query solution to be "too fancy", but it does allow for an unlimited number of time periods, laid out as you show in the sample.
With PQ, we
construct lists of all the consecutive dates for each pair of start/end
combine the lists for each row, removing the duplicates
apply a gap and island technique to the resulting date lists for each row
count the number of entries for each "island" and return the maximum
Please note: I counted both the start and the end date. In your days columns, you did not (except for one instance). If you want to count both, leave the code as is; if you don't we can make a minor modification
To use Power Query
Create a table which excludes that first row of merged cells
Rename the table columns in the format I show in the screenshot, since each column header in a table must have a different name.
Select some cell in that Data Table
Data => Get&Transform => from Table/Range
When the PQ Editor opens: Home => Advanced Editor
Make note of the Table Name in Line 2
Paste the M Code below in place of what you see
Change the Table name in line 2 back to what was generated originally.
Read the comments and explore the Applied Steps to better understand the algorithm
M Code
code edited to Sort the date lists to handle certain cases
let
Source = Excel.CurrentWorkbook(){[Name="Table2"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Start P1", type datetime}, {"Comment1", type text}, {"End P1", type datetime}, {"Days 1", Int64.Type}, {"Start P2", type datetime}, {"Comment2", type text}, {"End P2", type datetime}, {"Days 2", Int64.Type}, {"Start P3", type datetime}, {"Comment3", type text}, {"End P3", type datetime}, {"Days 3", Int64.Type}}),
//set data types for columns 1/5/9... and 3/7/11/... as date
dtTypes = List.Transform(List.Alternate(Table.ColumnNames(#"Changed Type"),1,1,1), each {_,Date.Type}),
typed = Table.TransformColumnTypes(#"Changed Type",dtTypes),
//add Index column to define row numbers
rowNums = Table.AddIndexColumn(typed,"rowNum",0,1),
//Unpivot except for rowNum column
#"Unpivoted Other Columns" = Table.UnpivotOtherColumns(rowNums, {"rowNum"}, "Attribute", "Value"),
//split the attribute column to filter on Start/End => just the dates
//then filter and remove the attributes columns
#"Split Column by Delimiter" = Table.SplitColumn(#"Unpivoted Other Columns", "Attribute", Splitter.SplitTextByEachDelimiter({" "}, QuoteStyle.Csv, false), {"Attribute.1", "Attribute.2"}),
#"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Attribute.1", type text}, {"Attribute.2", type text}}),
#"Removed Columns" = Table.RemoveColumns(#"Changed Type1",{"Attribute.2"}),
#"Filtered Rows" = Table.SelectRows(#"Removed Columns", each ([Attribute.1] = "End" or [Attribute.1] = "Start")),
#"Removed Columns1" = Table.RemoveColumns(#"Filtered Rows",{"Attribute.1"}),
#"Changed Type2" = Table.TransformColumnTypes(#"Removed Columns1",{{"Value", type date}, {"rowNum", Int64.Type}}),
//group by row number
//generate date list from each pair of dates
//combine into a single list of dates with no overlapped date ranges for each row
#"Grouped Rows" = Table.Group(#"Changed Type2", {"rowNum"}, {
{"dateList", (t)=> List.Sort(
List.Distinct(
List.Combine(
List.Generate(
()=>[dtList=List.Dates(
t[Value]{0},
Duration.TotalDays(t[Value]{1}-t[Value]{0})+1 ,
#duration(1,0,0,0)),idx=0],
each [idx] < Table.RowCount(t),
each [dtList=List.Dates(
t[Value]{[idx]+2},
Duration.TotalDays(t[Value]{[idx]+3}-t[Value]{[idx]+2})+1,
#duration(1,0,0,0)),
idx=[idx]+2],
each [dtList]))))}
}),
//determine Islands and Gaps
#"Expanded dateList" = Table.ExpandListColumn(#"Grouped Rows", "dateList"),
//Duplicate the date column and turn it into integers
#"Duplicated Column" = Table.DuplicateColumn(#"Expanded dateList", "dateList", "dateList - Copy"),
#"Changed Type3" = Table.TransformColumnTypes(#"Duplicated Column",{{"dateList - Copy", Int64.Type}}),
//add an Index column
//Then subtract the index from the integer date
// if the dates are consecutive the resultant ID column will => the same value, else it will jump
#"Added Index" = Table.AddIndexColumn(#"Changed Type3", "Index", 0, 1, Int64.Type),
#"Added Custom" = Table.AddColumn(#"Added Index", "ID", each [#"dateList - Copy"]-[Index]),
#"Removed Columns2" = Table.RemoveColumns(#"Added Custom",{"dateList - Copy", "Index"}),
//Group by the date ID column and a Count will => the consecutive days
#"Grouped Rows1" = Table.Group(#"Removed Columns2", {"rowNum", "ID"}, {{"Count", each Table.RowCount(_), Int64.Type}}),
#"Removed Columns3" = Table.RemoveColumns(#"Grouped Rows1",{"ID"}),
//Group by the Row number and return the Maximum Consecutive days
#"Grouped Rows2" = Table.Group(#"Removed Columns3", {"rowNum"}, {{"Max Consecutive Days", each List.Max([Count]), type number}}),
//combine the Consecutive Days column with original table
result = Table.Join(rowNums,"rowNum",#"Grouped Rows2","rowNum"),
#"Removed Columns4" = Table.RemoveColumns(result,{"rowNum"})
in
#"Removed Columns4"
Unfortunately Gap and Island seems to be a non-starter, because I don't think you can use it without either VBA or a lot of helper columns, plus the start dates need to be in order. It's a pity, because the longest continuous time on task (AKA largest island) drops out of the VBA version very easily and arguably it's easier to understand than the array formula versions below see this.
Moving on to option 2, if you have Excel 365, you can Use Sequence to generate a list of dates in a certain range, then check that each of them falls in one of the periods of engagement like this:
=LET(array,SEQUENCE(Z$2-Z$1+1,1,Z$1),
period1,(array>=A3)*(array<=C3),
period2,(array>=E3)*(array<=G3),
period3,(array>=I3)*(array<=K3),
SUM(--(period1+period2+period3>0)))
assuming that Z1 and Z2 contain the start and end of the range of dates that you're interested in (I've used 1/1/21 and 31/7/21).
If you don't have Excel 365, you can used the Row function to generate the list of dates instead. I suggest using the Name Manager to create a named range Dates:
=INDEX(Sheet1!$A:$A,Sheet1!$Z$1):INDEX(Sheet1!$A:$A,Sheet1!$Z$2)
Then the formula is:
= SUM(--(((ROW(Dates)>=A3) * (ROW(Dates)<=C3) +( ROW(Dates)>=E3) * (ROW(Dates)<=G3) + (ROW(Dates)>=I3) * (ROW(Dates)<=K3))>0))
You will probably have to enter this using CtrlShiftEnter or use Sumproduct instead of Sum.
EDIT
As #Qualia has perceptively noted, you want the longest time of continuous engagement. This can be found by applying Frequency to the first formula:
=LET(array,SEQUENCE(Z$2-Z$1+1,1,Z$1),
period1,(array>=A3)*(array<=C3),
period2,(array>=E3)*(array<=G3),
period3,(array>=I3)*(array<=K3),
onDays,period1+period2+period3>0,
MAX(FREQUENCY(IF(onDays,array),IF(NOT(onDays),array)))
)
and the non_365 version becomes
=MAX(FREQUENCY(IF((ROW(Dates)>=A3)*(ROW(Dates)<=C3)+(ROW(Dates)>=E3)*(ROW(Dates)<=G3)+(ROW(Dates)>=I3)*(ROW(Dates)<=K3),ROW(Dates)),
IF( NOT( (ROW(Dates)>=A3)*(ROW(Dates)<=C3)+(ROW(Dates)>=E3)*(ROW(Dates)<=G3)+(ROW(Dates)>=I3)*(ROW(Dates)<=K3) ),ROW(Dates))))

Power Query Sum of column by group as new column

So I am new to power query and I just wasted over an hour looking for something that I can do easily in many other programs.
I just want to create a new column summing up another column. FOr instance, to check if the percentage a correct and if not normalize therafter. I dont want to group by and reduce the table.
I ve been searching left and right and tried to add a new column like "Group Sum" using stuff like
= list.sum([Number])
= Calculate(SUM([Number])
just to get the the total sum of all entries 200. No success.
Maybe its me, but I really dont see the logic.
I now tried
let
Quelle = Excel.CurrentWorkbook(){[Name="Tabelle1"]}[Content],
#"Geänderter Typ" = Table.TransformColumnTypes(Quelle,{{"Group", type text}, {"Gender", type text}, {"Number", Int64.Type}, {"Group Sum", Int64.Type}, {"Spalte1", Int64.Type}})
#"Added Custom" = Table.AddColumn(#"Geänderter Typ","Group Sum",(i)=>List.Sum(Table.SelectRows(#"Geänderter Typ", each [Group]=i[Group])[Number]), type number )
in
#"Geänderter Typ"
which results in an error and
let
Quelle = Excel.CurrentWorkbook(){[Name="Tabelle1"]}[Content],
#"Geänderter Typ" = Table.TransformColumnTypes(Quelle,{{"Group", type text}, {"Gender", type text}, {"Number", Int64.Type}, {"Group Sum", Int64.Type}}),
#"Hinzugefügte benutzerdefinierte Spalte" = Table.AddColumn(#"Geänderter Typ", "Benutzerdefiniert", each Table.Group(Quelle, {"Group"}, {{"Group Sum", each List.Sum([Number]), type nullable number}}))
in
#"Hinzugefügte benutzerdefinierte Spalte"
Which gives me a new column where all entries say "Table"
Here are two other options. The examples assume your source table is named Table1. Here's how mine looks at its source in Excel:
Note it does not have a Group Sum column. The query will derive that.
Option 1.
Click Add Column then Custom Column and fill out the screen like this and click OK:
You should see a table like this:
Then just click the table in the first row of the Custom column and you should get a table that looks like this:
Then you can merge this new table with the original source table (Table1). Click Home > Merge Queries and fill out the information for the merge like this and click OK. (Note that the same query "Table1" is being merged to itself at this point, and only the Group column is selected for each entry.)
You should see a table like this:
Then, in the formula bar above that table, where you see = Table.NestedJoin(Custom, {"Group"}, Custom, {"Group"}, "Custom", JoinKind.LeftOuter), change the first instance of Custom to Source, so the line reads = Table.NestedJoin(Source, {"Group"}, Custom, {"Group"}, "Custom", JoinKind.LeftOuter) instead.
That is, change it from:
To:
Then expand the new Custom column by clicking the button, only selecting the Group Sum column, clearing the checkbox beside "Use original column name as prefix," and clicking OK:
You should get this result:
Here's the M code:
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Group", type text}, {"Gender", type text}, {"Number", Int64.Type}}),
#"Added Custom" = Table.AddColumn(#"Changed Type", "Custom", each Table.Group(Source, {"Group"}, {{"Group Sum", each List.Sum([Number]), type nullable number}})),
Custom = #"Added Custom"{0}[Custom],
#"Merged Queries" = Table.NestedJoin(Source, {"Group"}, Custom, {"Group"}, "Custom", JoinKind.LeftOuter),
#"Expanded Custom" = Table.ExpandTableColumn(#"Merged Queries", "Custom", {"Group Sum"}, {"Group Sum"})
in
#"Expanded Custom"
(You can replace Table1, Source and Changed Type with Tablelle1, Quelle, and #"Geänderter Typ", respectively throughout the code above to align with Max's language.)
Option 2.
Click Transform then Group By and fill out the screen like this and click OK:
Then expand the AllData column with only the Gender and Number columns selected like this:
The result:
Here's the M code:
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Grouped Rows" = Table.Group(Source, {"Group"}, {{"AllData", each _, type table [Group=text, Gender=text, Number=number]}, {"Group Sum", each List.Sum([Number]), type number}}),
#"Expanded AllData" = Table.ExpandTableColumn(#"Grouped Rows", "AllData", {"Gender", "Number"}, {"Gender", "Number"})
in
#"Expanded AllData"
try
let Quelle= Excel.CurrentWorkbook(){[Name="Tabelle1"]}[Content],
#"Promoted Headers" = Table.PromoteHeaders(Quelle, [PromoteAllScalars=true]),
#"Geänderter Typ" = Table.TransformColumnTypes(#"Promoted Headers",{{"Group", type text}, {"Gender", type text}, {"Number", Int64.Type}}),
#"Added Custom" = Table.AddColumn(#"Geänderter Typ","Group Sum2",(i)=>List.Sum(Table.SelectRows(#"Geänderter Typ", each [Group]=i[Group]) [Number]), type number )
in #"Added Custom"
Group and Join Method
I have now seen a few ways to do this, but I think the most efficient is probably a group-and-join approach that builds on previous comments and answers here. It takes one line:
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Added Custom" = Table.Join(Source, "Group", Table.Group(Source,{"Group"},{{"Group Sum", each List.Sum([Number]), type nullable number}}), "Group")
in
#"Added Custom"
The Table.Group() part of this creates a table with each unique value of the grouping variable ("Group" here) and, for each of those unique values, its summary value (the sum of [Number] for all rows with the same "Group" value here). To attach these summary values onto the original table becomes the job for Table.Join(). The Table.Join() function gets four input arguments: 1.) the original table, 2.) the grouping column in the original table ("Group" here), 3.) the summary table (that's the output of the Table.Group() function here) and 4.) the grouping column in summary table (also "Group" here).
I tested this and get the results as shown:
Note: I changed Number column values from the question to show that the code is working. In the example provided in the original question, the Group Sum is 100 for both groups, and that seems to make the approach suggested in another answer look like it's working when it does not.

Excel - Struggling with complex Index/Match/Match lookup

I've got a table with 9 columns and about 6000 rows. Each row has a price as the last column. Some of those prices are 0.00 when they should be a value.
In another worksheet I have the "original" table with about 3700 rows. The prices I need are in those rows. However, the original table has the prices horizontally within the rows, each next to a cell with specific range of gals. Basically the table I have has unique rows for every location/gal range/price combo, the original has all gal range/prices sequentially in single location rows
For example, a row in the original table looks like this:
... / 1-2000 / 2.8383 / 2001-4000 / 2.5382 / ...
Where as in my new table they look like this:
... / 1-2000 / 2.8383
... / 20001-4000 / 2.5382
etc
Everything is the same in my new table and the original table EXCEPT those gal ranges and prices.
What I'm trying to do is use an array multiple criteria Index/Match (based on 3 cells in both my new table and the original) to lookup the row, find the value that matches the range of gals, and then take the price to the right of that gal range cell.
Here is a row I'm trying to get the value for in my new table:
Here is the row with the value I need in the original table:
Here's a closer look at the formula I've constructed:
INDEX(old!$A$2:$Q$3755,MATCH(1,(A29=old!$A$2:$A$3755)*(F29=old!$F$2:$F$3755)*(G29=old!$G$2:$G3755),1),MATCH(H29,old!$J$2:$Q$3755,1)+1)
The first standard Index/Match part works great... I Index the table and Match to find the row. If I just enter a number for the Col (e.g., 1, 2, 3) it'll return the value from the corresponding cell PERFECTLY. However, I can't seem to make the Col match part work... I continuously get REF and N/A errors.
Is there some trick to doing a two way search? It seems like it should be a simple thing to just find that value in the row and take the next cell after if...?
One catch here is that the gal range value I'm looking for is NOT unique... there are at least 20 other references that have the same range (e.g., "1-2000"). Is there a way to limit the col match to just the row I find with the row match?
Any help is GREATLY appreciated.
Thanks,
Rick
One way to tackle this case is to use #powerquery.
Please refer to this article to find out how to use Power Query on your version of Excel. It is availeble in Excel 2010 Professional Plus and later versions. My demonstration is using Excel 2016.
The steps are:
Load/Add your old data to the Power Query Editor. My sample data only has one row but it is same for thousands of rows;
Use Merge Columns function under Transform tab to merge the first 7 columns with a delimiter say semicolon ;;
Repeat Merge Columns for each pair of GALLONS and TOTAL PRICES with the same delimiter ;. If you have done it correctly you should have something like the following:
Use Unpivot Columns function under Transform tab to unpivot all the merged columns for GALLONS;TOTAL PRICE, then remove the Attribute column;
Use Split Column function under the Transform tab to split each column by delimiter ;. If you have done it correctly you should have something like the following:
Make a duplicate column of the GALLONS range column (which is the second last column in the above screen-shot), and then split the original GALLONS range column by delimiter -. Then you should have:
Rename the column headers as desired;
Close & Load the new table to a new worksheet (by default) or you can change the default setting and create a connection for the new table and load it to a desired location in your workbook.
The second table is the output table and you can do INDEX+MATCH from this new table which should be much easier than from the old table. If the data are identical but just in different structure then you may just use the output table without worrying about looking up missing prices.
I have added a test line to my source table and here is the Refreshed output with a click of button:
Here are the power query M codes behind the scene for reference only. All steps are performed using built-in functions of the editor which is quite straight forward.
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"IATA", Int64.Type}, {"ST", type text}, {"FUELER", type text}, {"UPDATED", type datetime}, {"RESTRICTIONS", type text}, {"BASEF UEL", type text}, {"NOTES", type any}, {"GALLONS1", type text}, {"TOTAL PRICES1", type text}, {"GALLONS2", type text}, {"TOTAL PRICES2", type text}, {"GALLONS3", type text}, {"TOTAL PRICES3", type text}, {"GALLONS4", type text}, {"TOTAL PRICES4", type text}, {"GALLONS5", type text}, {"TOTAL PRICES5", type text}}),
#"Merged Columns" = Table.CombineColumns(Table.TransformColumnTypes(#"Changed Type", {{"IATA", type text}, {"UPDATED", type text}, {"NOTES", type text}}, "en-AU"),{"IATA", "ST", "FUELER", "UPDATED", "RESTRICTIONS", "BASEF UEL", "NOTES"},Combiner.CombineTextByDelimiter(";", QuoteStyle.None),"Merged"),
#"Merged Columns1" = Table.CombineColumns(#"Merged Columns",{"GALLONS1", "TOTAL PRICES1"},Combiner.CombineTextByDelimiter(";", QuoteStyle.None),"Merged.1"),
#"Merged Columns2" = Table.CombineColumns(#"Merged Columns1",{"GALLONS2", "TOTAL PRICES2"},Combiner.CombineTextByDelimiter(";", QuoteStyle.None),"Merged.2"),
#"Merged Columns3" = Table.CombineColumns(#"Merged Columns2",{"GALLONS3", "TOTAL PRICES3"},Combiner.CombineTextByDelimiter(";", QuoteStyle.None),"Merged.3"),
#"Merged Columns4" = Table.CombineColumns(#"Merged Columns3",{"GALLONS4", "TOTAL PRICES4"},Combiner.CombineTextByDelimiter(";", QuoteStyle.None),"Merged.4"),
#"Merged Columns5" = Table.CombineColumns(#"Merged Columns4",{"GALLONS5", "TOTAL PRICES5"},Combiner.CombineTextByDelimiter(";", QuoteStyle.None),"Merged.5"),
#"Unpivoted Columns" = Table.UnpivotOtherColumns(#"Merged Columns5", {"Merged"}, "Attribute", "Value"),
#"Removed Columns" = Table.RemoveColumns(#"Unpivoted Columns",{"Attribute"}),
#"Split Column by Delimiter" = Table.SplitColumn(#"Removed Columns", "Merged", Splitter.SplitTextByDelimiter(";", QuoteStyle.Csv), {"Merged.1", "Merged.2", "Merged.3", "Merged.4", "Merged.5", "Merged.6", "Merged.7"}),
#"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Merged.1", Int64.Type}, {"Merged.2", type text}, {"Merged.3", type text}, {"Merged.4", type datetime}, {"Merged.5", type text}, {"Merged.6", type text}, {"Merged.7", type text}}),
#"Split Column by Delimiter1" = Table.SplitColumn(#"Changed Type1", "Value", Splitter.SplitTextByDelimiter(";", QuoteStyle.Csv), {"Value.1", "Value.2"}),
#"Changed Type2" = Table.TransformColumnTypes(#"Split Column by Delimiter1",{{"Value.1", type text}, {"Value.2", type text}}),
#"Duplicated Column" = Table.DuplicateColumn(#"Changed Type2", "Value.1", "Value.1 - Copy"),
#"Reordered Columns" = Table.ReorderColumns(#"Duplicated Column",{"Merged.1", "Merged.2", "Merged.3", "Merged.4", "Merged.5", "Merged.6", "Merged.7", "Value.1 - Copy", "Value.1", "Value.2"}),
#"Split Column by Delimiter2" = Table.SplitColumn(#"Reordered Columns", "Value.1", Splitter.SplitTextByDelimiter("-", QuoteStyle.Csv), {"Value.1.1", "Value.1.2"}),
#"Changed Type3" = Table.TransformColumnTypes(#"Split Column by Delimiter2",{{"Value.1.1", Int64.Type}, {"Value.1.2", Int64.Type}}),
#"Renamed Columns" = Table.RenameColumns(#"Changed Type3",{{"Merged.1", "IATA"}, {"Merged.2", "ST"}, {"Merged.3", "FUELER"}, {"Merged.4", "UPDATED"}, {"Merged.5", "RESTRICTIONS"}, {"Merged.6", "BASEF UEL"}, {"Merged.7", "NOTES"}, {"Value.1 - Copy", "GALLONS"}, {"Value.1.1", "Min Fuel"}, {"Value.1.2", "Max Fuel"}, {"Value.2", "TOTAL PRICE"}}),
#"Changed Type4" = Table.TransformColumnTypes(#"Renamed Columns",{{"UPDATED", type date}})
in
#"Changed Type4"
This can be done with Index/Match, but you have to keep your cool.
My Screenshot for reference
The formula in cell F7 is
=INDEX(
INDEX(A2:A3,MATCH(B7&C7&D7,INDEX(A2:A3&C2:C3&F2:F3,0),0))
:INDEX(L2:L3,MATCH(B7&C7&D7,INDEX(A2:A3&C2:C3&F2:F3,0),0)),
MATCH(E7,
INDEX(A2:A3,MATCH(B7&C7&D7,INDEX(A2:A3&C2:C3&F2:F3,0),0))
:INDEX(L2:L3,MATCH(B7&C7&D7,INDEX(A2:A3&C2:C3&F2:F3,0),0)),0)
+1)
To explain: First we build an Index on column A finding the row with a concatenation of the three values. That is combined with the union operator : with an Index on column L, using the same Match.
This Index will return one row of data, from column A to L. It is then used as the range argument for another Index/Match, where inside that row of data Match looks for the "g value" and adds 1 to move one to the right of the found cell.
Note that you don't want whole columns in this formula, since it will be very slow to calculate.
Here is another approach, not using Index/Match at all, but Sumproduct instead. in F7:
=SUMPRODUCT($B$2:$L$3,($A$2:$A$3=B7)*($C$2:$C$3=C7)*($F$2:$F$3=D7)*($A$2:$K$3=E7))
Note that the offset of one column is achieved with the first range from B to L and the last range from A to K.

Resources