Aggregate multiple (many!) pair of columns (Exce) - excel

I have table; The table consists of pairs of date and value columns
Pair Pair Pair Pair .... ..... ......
What I need is the sum of all values for the same date.
The total table has 3146 columns (so 1573 pairs of value and date)!! with up to 186 entries on row level.
Thankfully, the first column contains all possible date values.
Considering the 3146 columns I am not sure how to do that without doing massivle amount of small steps :(

This shows a different method of creating the two column table that you will group by Date and return the Sum. Might be faster than the List.Accumulate method. Certainly worth a try in view of your comment above.
Unpivot the original table
Add 0-based Index column; then IntegerDivide by 2
Group by the IntegerDivide column and extract the Date and Value to separate columns.
Then group by date and aggregate by sum
let
Source = Excel.CurrentWorkbook(){[Name="Table12"]}[Content],
//assuming only columns are Date and Value, this will set the data types for any number of columns
Types = List.Transform(List.Alternate(Table.ColumnNames(Source),1,1,1), each {_, type date}) &
List.Transform(List.Alternate(Table.ColumnNames(Source),1,1,0), each {_, type number}),
#"Changed Type" = Table.TransformColumnTypes(Source,Types),
//Unpivot all columns to create a two column table
//The Value.1 table will alternate the related Date/Value
#"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Changed Type", {}, "Attribute", "Value.1"),
//add a column to group the pairs of values
//below two lines => a column in sequence of 0,0,1,1,2,2,3,3, ...
#"Added Index" = Table.AddIndexColumn(#"Unpivoted Other Columns", "Index", 0, 1, Int64.Type),
#"Inserted Integer-Division" = Table.AddColumn(#"Added Index", "Integer-Division", each Number.IntegerDivide([Index], 2), Int64.Type),
#"Removed Columns" = Table.RemoveColumns(#"Inserted Integer-Division",{"Index"}),
// Group by the "pairing" sequence,
// Extract the Date and Value to new columns
// => a 2 column table
#"Grouped Rows" = Table.Group(#"Removed Columns", {"Integer-Division"}, {
{"Date", each [Value.1]{0}, type date},
{"Value", each [Value.1]{1}, type number}}),
#"Removed Columns1" = Table.RemoveColumns(#"Grouped Rows",{"Integer-Division"}),
//Group by Date and aggregate by Sum
#"Grouped Rows1" = Table.Group(#"Removed Columns1", {"Date"}, {{"Sum Values", each List.Sum([Value]), type number}}),
//Sort into date order
#"Sorted Rows" = Table.Sort(#"Grouped Rows1",{{"Date", Order.Ascending}})
in
#"Sorted Rows"

Quick google shows "Number of columns per table 16,384" for powerquery and 16000 for powerBI, so I'm thinking you have to split your input data somehow first, or perhaps this is not the tool for you, maybe AWK
Assuming that works, an M version of what you are looking for. It stacks the columns in groups of 2, then groups and sums them
let Source = Excel.CurrentWorkbook(){[Name="Table4"]}[Content],
Combo = List.Split(Table.ColumnNames(Source),2),
#"Added Custom" =List.Accumulate(
Combo,
#table({"Column1"}, {}),
(state,current)=> state & Table.Skip(Table.DemoteHeaders(Table.SelectColumns(Source, current)),1)
),
#"Grouped Rows" = Table.Group(#"Added Custom", {"Column1"}, {{"Sum", each List.Sum([Column2]), type number}})
in #"Grouped Rows"

186 rows * 1573 pairs of columns = 292,578 records.
Assuming not a very old version of Excel, 293k rows is fine, so it can be done with formulae:
Insert five columns to the left, so data starts in F3.
In A3 put zero, in A4 put 1, select the two and drag down to A188.
In A189 put =A3.
In B3 put 0, and drag down to B188.
In B189 put =B3
"Drag"* down A189 and B189 to row 292580
In C3 put =OFFSET($F$3,A3,B3)
In D3 put =OFFSET($F$3,A3,B3+1)
Select those two cells and click on the cross at bottom right to copy them to the end of column B.
Then put Date and Value in A1 and B1, and use a Pivot Table to get totals, averages, or whatever you need.
Any blank cells in the original input do not matter.
to "drag" down hundred of thousands of cells:
Copy A189 and B189
Goto (F5) A292580
Paste
Pin (F8)
CTRL-up arrow
Enter
And rather than $F$3 I would name that cell Origin, and use "Origin" in the two Offset formulae, but many people seem to consider that too complicated.

Related

Group by column A value, transpose column B, column C row values for each grouped column A value

This is in Excel 2016. I have a spreadsheet where each row represents a response to two questions "Qa" and "Qb" from a unique student. The spreadsheet columns are: "Section" (class section student is in), "Qa", and "Qb".
Thus, if three students answered from the same class section, that section will be listed three times under "Section", with each unique students answers in the other columns.
I want to group by section and spread the answers to each question across a single row in separate columns. The number of columns to create will default to the section with the most unique responses
In this case, 10003 has the greatest number of responses, so I want to get the following end result.
I am at a loss with how to get this going. Something like grouping by the section but transposing the rows within that group?
As #ScottCraner pointed out, you can obtain your desired output using Power Query, available in Windows Excel 2010+ and Office 365 Excel
Select some cell in your original table
Data => Get&Transform => From Table/Range
When the PQ UI opens, navigate to Home => Advanced Editor
Make note of the Table Name in Line 2 of the code.
Replace the existing code with the M-Code below
Change the table name in line 2 of the pasted code to your "real" table name
Examine any comments, and also the Applied Steps window, to better understand the algorithm and steps
M Code
let
//Change table name in next row to actual table name in workbook
Source = Excel.CurrentWorkbook(){[Name="Table20"]}[Content],
//set data type
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Section", Int64.Type}, {"Qa", type text}, {"Qb", type text}}),
//Group by Section
//Add a 1-based Index column to each Group
#"Grouped Rows" = Table.Group(#"Changed Type", {"Section"}, {
{"Row", each Table.AddIndexColumn(_,"Row",1,1)}}),
//Expand the grouped tables
#"Expanded Row" = Table.ExpandTableColumn(#"Grouped Rows", "Row", {"Qa", "Qb", "Row"}, {"Qa", "Qb", "Row"}),
//Unpivot
//Merge Row and Attribute columns to create the q-number headers
#"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Expanded Row", {"Section", "Row"}, "Attribute", "Value"),
#"Merged Columns" = Table.CombineColumns(Table.TransformColumnTypes(#"Unpivoted Other Columns",
{{"Row", type text}}, "en-US"),{"Attribute", "Row"},
Combiner.CombineTextByDelimiter("-", QuoteStyle.None),"Merged"),
//Pivot on the Sorted Merged column with no aggregation
#"Pivoted Column" = Table.Pivot(#"Merged Columns", List.Sort(List.Distinct(#"Merged Columns"[Merged])), "Merged", "Value")
in
#"Pivoted Column"
Note that there are no empty columns (iow, there is no Qa-4)
If you really need an empty column, insert a step at the beginning replacing nulls with a blank
let
//Change table name in next row to actual table name in workbook
Source = Excel.CurrentWorkbook(){[Name="Table20"]}[Content],
//set data type
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Section", Int64.Type}, {"Qa", type text}, {"Qb", type text}}),
//if you really need a blank Qa column since you have four distinct Qb rows but only 3 Qa rows,
// then we insert the next line
#"Replaced Value" = Table.ReplaceValue(#"Changed Type",null,"",Replacer.ReplaceValue,{"Qa", "Qb"}),
//Group by Section
//Add a 1-based Index column to each Group
#"Grouped Rows" = Table.Group(#"Replaced Value", {"Section"}, {
{"Row", each Table.AddIndexColumn(_,"Row",1,1)}}),
//Expand the grouped tables
#"Expanded Row" = Table.ExpandTableColumn(#"Grouped Rows", "Row", {"Qa", "Qb", "Row"}, {"Qa", "Qb", "Row"}),
//Unpivot
//Merge Row and Attribute columns to create the q-number headers
#"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Expanded Row", {"Section", "Row"}, "Attribute", "Value"),
#"Merged Columns" = Table.CombineColumns(Table.TransformColumnTypes(#"Unpivoted Other Columns",
{{"Row", type text}}, "en-US"),{"Attribute", "Row"},
Combiner.CombineTextByDelimiter("-", QuoteStyle.None),"Merged"),
//Pivot on the Sorted Merged column with no aggregation
#"Pivoted Column" = Table.Pivot(#"Merged Columns", List.Sort(List.Distinct(#"Merged Columns"[Merged])), "Merged", "Value")
in
#"Pivoted Column"

I have 3 time periods in excel - I need to know the duration of the longest continuous period

Please help!
Ideally, I would really like to solve this using formulas only - not VBA or anything I consider 'fancy'.
I work for a program that awards bonuses for continuous engagement. We have three (sometimes more) engagement time periods that could overlap and/or could have spaces of no engagement. The magic figure is 84 days of continuous engagement. We have been manually reviewing each line (hundreds of lines) to see if the time periods add up to 84 days of continuous engagement, with no periods of inactivity.
In the link there is a pic of a summary of what we work with. Row 3 for example, doesn't have 84 days in any of the 3 time periods, but the first 2 time periods combined includes 120 consecutive days. The dates will not appear in date order - e.g. early engagements may be listed in period 3.
Really looking forward to your advice.
Annie
#TomSharpe has shown you a method of solving this with formulas. You would have to modify it if you had more than three time periods.
Not sure if you would consider a Power Query solution to be "too fancy", but it does allow for an unlimited number of time periods, laid out as you show in the sample.
With PQ, we
construct lists of all the consecutive dates for each pair of start/end
combine the lists for each row, removing the duplicates
apply a gap and island technique to the resulting date lists for each row
count the number of entries for each "island" and return the maximum
Please note: I counted both the start and the end date. In your days columns, you did not (except for one instance). If you want to count both, leave the code as is; if you don't we can make a minor modification
To use Power Query
Create a table which excludes that first row of merged cells
Rename the table columns in the format I show in the screenshot, since each column header in a table must have a different name.
Select some cell in that Data Table
Data => Get&Transform => from Table/Range
When the PQ Editor opens: Home => Advanced Editor
Make note of the Table Name in Line 2
Paste the M Code below in place of what you see
Change the Table name in line 2 back to what was generated originally.
Read the comments and explore the Applied Steps to better understand the algorithm
M Code
code edited to Sort the date lists to handle certain cases
let
Source = Excel.CurrentWorkbook(){[Name="Table2"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Start P1", type datetime}, {"Comment1", type text}, {"End P1", type datetime}, {"Days 1", Int64.Type}, {"Start P2", type datetime}, {"Comment2", type text}, {"End P2", type datetime}, {"Days 2", Int64.Type}, {"Start P3", type datetime}, {"Comment3", type text}, {"End P3", type datetime}, {"Days 3", Int64.Type}}),
//set data types for columns 1/5/9... and 3/7/11/... as date
dtTypes = List.Transform(List.Alternate(Table.ColumnNames(#"Changed Type"),1,1,1), each {_,Date.Type}),
typed = Table.TransformColumnTypes(#"Changed Type",dtTypes),
//add Index column to define row numbers
rowNums = Table.AddIndexColumn(typed,"rowNum",0,1),
//Unpivot except for rowNum column
#"Unpivoted Other Columns" = Table.UnpivotOtherColumns(rowNums, {"rowNum"}, "Attribute", "Value"),
//split the attribute column to filter on Start/End => just the dates
//then filter and remove the attributes columns
#"Split Column by Delimiter" = Table.SplitColumn(#"Unpivoted Other Columns", "Attribute", Splitter.SplitTextByEachDelimiter({" "}, QuoteStyle.Csv, false), {"Attribute.1", "Attribute.2"}),
#"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Attribute.1", type text}, {"Attribute.2", type text}}),
#"Removed Columns" = Table.RemoveColumns(#"Changed Type1",{"Attribute.2"}),
#"Filtered Rows" = Table.SelectRows(#"Removed Columns", each ([Attribute.1] = "End" or [Attribute.1] = "Start")),
#"Removed Columns1" = Table.RemoveColumns(#"Filtered Rows",{"Attribute.1"}),
#"Changed Type2" = Table.TransformColumnTypes(#"Removed Columns1",{{"Value", type date}, {"rowNum", Int64.Type}}),
//group by row number
//generate date list from each pair of dates
//combine into a single list of dates with no overlapped date ranges for each row
#"Grouped Rows" = Table.Group(#"Changed Type2", {"rowNum"}, {
{"dateList", (t)=> List.Sort(
List.Distinct(
List.Combine(
List.Generate(
()=>[dtList=List.Dates(
t[Value]{0},
Duration.TotalDays(t[Value]{1}-t[Value]{0})+1 ,
#duration(1,0,0,0)),idx=0],
each [idx] < Table.RowCount(t),
each [dtList=List.Dates(
t[Value]{[idx]+2},
Duration.TotalDays(t[Value]{[idx]+3}-t[Value]{[idx]+2})+1,
#duration(1,0,0,0)),
idx=[idx]+2],
each [dtList]))))}
}),
//determine Islands and Gaps
#"Expanded dateList" = Table.ExpandListColumn(#"Grouped Rows", "dateList"),
//Duplicate the date column and turn it into integers
#"Duplicated Column" = Table.DuplicateColumn(#"Expanded dateList", "dateList", "dateList - Copy"),
#"Changed Type3" = Table.TransformColumnTypes(#"Duplicated Column",{{"dateList - Copy", Int64.Type}}),
//add an Index column
//Then subtract the index from the integer date
// if the dates are consecutive the resultant ID column will => the same value, else it will jump
#"Added Index" = Table.AddIndexColumn(#"Changed Type3", "Index", 0, 1, Int64.Type),
#"Added Custom" = Table.AddColumn(#"Added Index", "ID", each [#"dateList - Copy"]-[Index]),
#"Removed Columns2" = Table.RemoveColumns(#"Added Custom",{"dateList - Copy", "Index"}),
//Group by the date ID column and a Count will => the consecutive days
#"Grouped Rows1" = Table.Group(#"Removed Columns2", {"rowNum", "ID"}, {{"Count", each Table.RowCount(_), Int64.Type}}),
#"Removed Columns3" = Table.RemoveColumns(#"Grouped Rows1",{"ID"}),
//Group by the Row number and return the Maximum Consecutive days
#"Grouped Rows2" = Table.Group(#"Removed Columns3", {"rowNum"}, {{"Max Consecutive Days", each List.Max([Count]), type number}}),
//combine the Consecutive Days column with original table
result = Table.Join(rowNums,"rowNum",#"Grouped Rows2","rowNum"),
#"Removed Columns4" = Table.RemoveColumns(result,{"rowNum"})
in
#"Removed Columns4"
Unfortunately Gap and Island seems to be a non-starter, because I don't think you can use it without either VBA or a lot of helper columns, plus the start dates need to be in order. It's a pity, because the longest continuous time on task (AKA largest island) drops out of the VBA version very easily and arguably it's easier to understand than the array formula versions below see this.
Moving on to option 2, if you have Excel 365, you can Use Sequence to generate a list of dates in a certain range, then check that each of them falls in one of the periods of engagement like this:
=LET(array,SEQUENCE(Z$2-Z$1+1,1,Z$1),
period1,(array>=A3)*(array<=C3),
period2,(array>=E3)*(array<=G3),
period3,(array>=I3)*(array<=K3),
SUM(--(period1+period2+period3>0)))
assuming that Z1 and Z2 contain the start and end of the range of dates that you're interested in (I've used 1/1/21 and 31/7/21).
If you don't have Excel 365, you can used the Row function to generate the list of dates instead. I suggest using the Name Manager to create a named range Dates:
=INDEX(Sheet1!$A:$A,Sheet1!$Z$1):INDEX(Sheet1!$A:$A,Sheet1!$Z$2)
Then the formula is:
= SUM(--(((ROW(Dates)>=A3) * (ROW(Dates)<=C3) +( ROW(Dates)>=E3) * (ROW(Dates)<=G3) + (ROW(Dates)>=I3) * (ROW(Dates)<=K3))>0))
You will probably have to enter this using CtrlShiftEnter or use Sumproduct instead of Sum.
EDIT
As #Qualia has perceptively noted, you want the longest time of continuous engagement. This can be found by applying Frequency to the first formula:
=LET(array,SEQUENCE(Z$2-Z$1+1,1,Z$1),
period1,(array>=A3)*(array<=C3),
period2,(array>=E3)*(array<=G3),
period3,(array>=I3)*(array<=K3),
onDays,period1+period2+period3>0,
MAX(FREQUENCY(IF(onDays,array),IF(NOT(onDays),array)))
)
and the non_365 version becomes
=MAX(FREQUENCY(IF((ROW(Dates)>=A3)*(ROW(Dates)<=C3)+(ROW(Dates)>=E3)*(ROW(Dates)<=G3)+(ROW(Dates)>=I3)*(ROW(Dates)<=K3),ROW(Dates)),
IF( NOT( (ROW(Dates)>=A3)*(ROW(Dates)<=C3)+(ROW(Dates)>=E3)*(ROW(Dates)<=G3)+(ROW(Dates)>=I3)*(ROW(Dates)<=K3) ),ROW(Dates))))

powerquery group by rows with formula on multiple columns

I am trying to group/merge two rows by dividing the values in each based on another column (Eligible) value.
From the initial raw data, I have reached this level with different steps (by unpivoting etc.) in power query.
Now I need to have a ratio per employee (eligible/not-eligible) for each month.
So for employee A, "Jan-14" will be -10/(-10 + -149) and so on. Any ideas will be appreciated. Thanks
Really appreciate the response. Interestingly, I have used your other answer to reach this stage from the raw data.
Since we are calculating how much time an employee worked on eligible activities each month so We will be grouping on the Employee. Employee name was just for reference which I took out and later will join with employee query to get the names if required. There was a typo in the image, the last row should also be an employee with id 2.
So now when there is a matching row, we use the formula to calculate the percentage of time spent on eligible activities but
If there isn't a matching row with eligible=1, then the outcome should be 0
if there isn't a matching row with eligible-0, then the outcome should be 1 (100%)
Try this and modify as needed. It assumes you are starting with all Eligible=0 and will only pick up matching Eligible=1. If there is a E=1 without E=0 it is removed. Also assumes we match on both Employee and EmployeeName
~ ~ ~ ~ ~
Click select the first three columns (Employee, EmployeeName, Eligible), right click .... Unpivot other other columns
Add custom column with name "One" and formula =1+[Eligible]
Merge the table onto itself, Home .. Merge Queries... with join kind Left Outer
Click to match Employee, EmployeeName and Attribute columns in the two boxes, and match One column in the top box to the Eligible Column in the bottom box
In the new column, use arrows atop the column to expand, choosing [x] onlt the Value column. Make the name of the column: Custom.Value
Add column .. custom column ... formula = [Custom.Value] / ( [Custom.Value] + [Value])
Filter Eligible to only pick up the zeroes using the arrow atop that column
Remove extra columns
Click select Attribute column, Transform ... pivot ... use custom as the values column
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Unpivoted Other Columns" = Table.UnpivotOtherColumns(Source, {"Employee", "EmployeeName", "Eligible"}, "Attribute", "Value"),
#"Added Custom" = Table.AddColumn(#"Unpivoted Other Columns", "One", each 1+[Eligible]),
#"Merged Queries" = Table.NestedJoin(#"Added Custom",{"Employee", "EmployeeName", "Attribute", "One"},#"Added Custom",{"Employee", "EmployeeName", "Attribute", "Eligible"},"Added Custom",JoinKind.LeftOuter),
#"Expanded Added Custom" = Table.ExpandTableColumn(#"Merged Queries", "Added Custom", {"Value"}, {"Custom.Value"}),
#"Added Custom1" = Table.AddColumn(#"Expanded Added Custom", "Custom", each [Custom.Value]/([Custom.Value]+[Value])),
#"Filtered Rows" = Table.SelectRows(#"Added Custom1", each ([Eligible] = 0)),
#"Removed Columns" = Table.RemoveColumns(#"Filtered Rows",{"Eligible", "Value", "One", "Custom.Value"}),
#"Pivoted Column" = Table.Pivot(#"Removed Columns", List.Distinct(#"Removed Columns"[Attribute]), "Attribute", "Custom", List.Sum)
in #"Pivoted Column"

Excel Min Positive number with multiple criteria

I want to get the location of each RefID based on the smallest positive number. The answer for the example below is:
Table
RefID Location Number
vT70SAAS Sixth Floor 311.39
wmhXXAAY Sixth Floor 35.57
wm2xcAAA Rooftop 7.55
I tried =MIN(IF(F27:F30>0,F27:F30)) as array, but it gets the overall min positive. I need one for each RefID
Sample data:
Number Location RefID
-3.50 Basement wmhXXAAY
-32.39 First Floor wm2xcAAA
524.71 Second Floor vT70SAAS
-7.19 Second Floor wm2xcAAA
61.81 Third Floor wm2xcAAA
150.63 Third Floor wmhXXAAY
467.76 Fifth Floor wm2xcAAA
102.30 Fifth Floor wmhXXAAY
311.39 Sixth Floor vT70SAAS
35.57 Sixth Floor wmhXXAAY
521.51 Rooftop vT70SAAS
7.55 Rooftop wm2xcAAA
244.54 Rooftop wm2xcAAA
Table
One more option:
=AGGREGATE(15,6,(1/(($A$2:$A$14>0)*($C$2:$C$14=F2)))*$A$2:$A$14,1)
A pivot table is the best way. Especially if you are going to deal with larger datasets.
Make one from your data. Add RefID as the ROW and Min(Number) as DATA
Then - If you want to add a value back to your original table (in another, adjacent column) - you could do a lookup on the pivot table.
=VLOOKUP(A1,F1:G5,2)
Where F1:G5 is the range of the pivot table.
If, for some reason, you need the VLOOKUP to work without being able to manually select the pivottable, then see this: https://answers.microsoft.com/en-us/msoffice/forum/msoffice_excel-mso_other/use-pivot-table-name-in-a-formula/17aa5ad3-eee4-4a5c-a70c-a9296853066d
Just add another condition to the IF():
With data starting in the second row, array enter:
=MIN(IF((F2:F14>0)*(H2:H14="vT70SAAS"),F2:F14))
Note we use multiply rather than AND()
(do the same for the other IDs)
You can also do this using Power Query, available in Excel 2010+
Most steps can be done from the UI.
The Table renaming can be done manually, but I customized the code a bit in case you wind up adding extra columns.
Algorithm
Filter the table for values > 0
Group Rows by RefID, with Operation = All Rows
Add a Custom Column with formula Table.Min([Grouped],"Number") to extract the RefID with the lowest Number
Expand the Resultant column
ReOrder the columns into the desired final order
Remove the Extra columns
Rename the resultant columns.
M-Code
You may need to change the Table Name in Line 2
let
Source = Excel.CurrentWorkbook(){[Name="refIDtbl"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Number", type number}, {"Location", type text}, {"RefID", type text}}),
#"Filtered Rows" = Table.SelectRows(#"Changed Type", each [Number] >= 0),
#"Grouped Rows" = Table.Group(#"Filtered Rows", {"RefID"}, {{"Grouped", each _, type table [Number=number, Location=text, RefID=text]}}),
#"Added Custom" = Table.AddColumn(#"Grouped Rows", "Custom", each Table.Min([Grouped],"Number")),
#"Expanded Custom" = Table.ExpandRecordColumn(#"Added Custom", "Custom", {"Number", "Location", "RefID"}, {"Custom.Number", "Custom.Location", "Custom.RefID"}),
#"Reordered Columns" = Table.ReorderColumns(#"Expanded Custom",{"RefID", "Grouped", "Custom.Location", "Custom.RefID", "Custom.Number"}),
#"Removed Columns" = Table.RemoveColumns(#"Reordered Columns",{"Grouped", "Custom.RefID"}),
oldColNames = Table.ColumnNames(#"Removed Columns"),
newColNames = List.ReplaceValue(oldColNames,"Custom.","",Replacer.ReplaceText),
#"Renamed Columns" = Table.RenameColumns(#"Removed Columns",List.Zip({oldColNames,newColNames}))
in
#"Renamed Columns"

Row totals based on column name in PowerQuery

I have a data file with around 400 columns in it. I need to import this data into PowerPivot. In order to reduce my file size, I would like to use PowerQuery to create 2 different row totals, and then delete all my unneeded columns upon load.
While my first row total column (RowTotal1) would summate all 400 columns, I would also like a second row total (RowTotal2) that subtracts from RowTotal1 any column whose name contains the text "click" in it.
Secondly, I would like to use the the value in my Country column as a variable, to also subtract any column that contains this var. e.g.
Site----Country----Col1----Col2----ClickCol1----Col3----Germany----RowTotal1----RowTotal2
1a--------USA----------2---------4-----------8------------16----------24--------------54---------------46-------
2a-----Germany-------2---------4-----------8------------16----------24--------------54---------------22-------
RowTotal1 = 2 + 4 + 8 + 16 + 24
RowTotal2 (first row) = 54 - 8 (ClickCol1)
RowTotal2 (second row) = 54 - 24 (Germany) - 8 (ClickCol1)
Is this possible? (EDIT: Yes. See answer below)
REVISED QUESTION: Is there a more memory efficient way to do than trying to group 300+ million rows at once?
Code would look something like this:
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Site", type text}, {"Country", type text}, {"Col1", Int64.Type}, {"Col2", Int64.Type}, {"ClickCol1", Int64.Type}, {"Col3", Int64.Type}, {"Germany", Int64.Type}}),
#"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Changed Type", {"Country", "Site"}, "Attribute", "Value"),
#"Added Conditional Column" = Table.AddColumn(#"Unpivoted Other Columns", "Value2", each if [Country] = [Attribute] or [Attribute] = "ClickCol1" then 0 else [Value] ),
#"Grouped Rows" = Table.Group(#"Added Conditional Column", {"Site", "Country"}, {{"RowTotal1", each List.Sum([Value]), type number},{"RowTotal2", each List.Sum([Value2]), type number}})
in
#"Grouped Rows"
But since you have a lot of columns, I should explain the steps:
(Assuming you have these in Excel file) Import them to Power Query
Select "Site" and "Country" columns (with Ctrl), right click > Unpivot Other Columns
Add Column with this formula (you might need to use Advanced Editor): Table.AddColumn(#"Unpivoted Other Columns", "Value2", each if [Country] = [Attribute] or [Attribute] = "ClickCol1" then 0 else [Value])
Select Site and Country columns, Right Click > Group By
Make it look like this:

Resources