Concatenate power query columns that are offset from each other - excel

The problem
I have a data set with two header rows. I've transposed the rows into columns to work with the headers before combining, but I need help with concatenation of column1 into column2, since past row 7 the columns are offset from one another by one row (see example image).
The goal
I've tried to use replace and concatenate myself with an index, but have been unable to achieve the desired end result where column2 row 8 is concatenated with column1 row 7, so that when I combine these columns and transpose again the headers will be correctly labeled (see example image).
Thank you for any suggestions and your time.
Example image:

Here's one way.
I start with your Problem table as a table named Table1:
Then I add an index. (Add Column > Index Column):
Then I add a custom column. (Add Column > Custom Column) With this setup:
(#"Added Index"{[Index]-1}[Column1] references the entry in Column1 of the row whose position equals the current row's Index value minus 1, that is, the previous row.)
...to get this:
Then I replaced Errors in the new Custom column. (Right-click Custom column title > click Replace Errors > type null > click OK)
Then I select Column1 and the Custom column and remove the other columns. (Select the Column1 column title > hold Ctrl and click the Custom column title > keep holding Ctrl and right-click the Custom column title > click Remove Other Columns)
Here's my M code:
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}, {"Column2", type text}}),
#"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 0, 1),
#"Added Custom" = Table.AddColumn(#"Added Index", "Custom", each #"Added Index"{[Index]-1}[Column1]&"-"&[Column2]),
#"Replaced Errors" = Table.ReplaceErrorValues(#"Added Custom", {{"Custom", null}}),
#"Removed Other Columns" = Table.SelectColumns(#"Replaced Errors",{"Column1", "Custom"})
in
#"Removed Other Columns"
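The row-offset logic of the query above can be sketched outside Power Query. This is a minimal Python illustration, not the answer's code: the function name and the sample rows are made up, and None stands in for null. It treats the first row (which has no previous row) the same way the Replace Errors step does.

```python
def concat_with_previous(rows, sep="-"):
    """rows: list of (column1, column2) tuples; None stands in for null."""
    result = []
    for i, (col1, col2) in enumerate(rows):
        # The first row has no previous row; the M query errors there
        # and the error is then replaced with null.
        prev_col1 = rows[i - 1][0] if i > 0 else None
        if prev_col1 is None or col2 is None:
            custom = None  # a null operand makes the whole concatenation null
        else:
            custom = f"{prev_col1}{sep}{col2}"
        result.append((col1, custom))
    return result


# Hypothetical sample: headers in column1, sub-headers one row lower in column2
sample = [("Header A", None), (None, "Sub 1"), ("Header B", None), (None, "Sub 2")]
for row in concat_with_previous(sample):
    print(row)
```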

Another way.
Code:
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
IndexedTable = Table.AddIndexColumn(Source, "Index", 0, 1),
Transform = Table.TransformRows(IndexedTable, (row)=>[Column1= row[Column1], Column2 = if row[Column1]=null then Text.Combine({IndexedTable{row[Index]-1}[Column1], "-",row[Column2]}) else row[Column2]]),
ToTable = Table.FromRecords(Transform)
in
ToTable
Brief explanation:
Source
Add index to address previous record
Use Table.TransformRows to analyze and transform each row into a record in this manner: Column1 is taken from each row's Column1 (row[Column1]); Column2 is generated from the previous row using Text.Combine and IndexedTable{row[Index]-1}[Column1], which yields the value from the previous row's Column1. Table.TransformRows returns a list of records.
Transform the list of records into a table.
This code will fail if the first row contains null in [Column1]. If this is unacceptable, add another if-then-else.

Another way:
let
Source = Excel.CurrentWorkbook(){[Name="Table"]}[Content],
fillDown = Table.FillDown(Table.DuplicateColumn(Source, "Column1", "copy"),{"copy"}),
replace = Table.ReplaceValue(fillDown, each [Column2], each if [Column2] = null then null
else [copy]&"-"&[Column2], Replacer.ReplaceValue, {"Column2"})[[Column1],[Column2]]
in
replace
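The fill-down variant above can be illustrated the same way. A rough Python sketch, assuming rows are (Column1, Column2) pairs with None for null; the function name and sample data are hypothetical.

```python
def fill_down_concat(rows, sep="-"):
    """Carry the last non-null column1 value down, then prefix it to column2."""
    result = []
    last_seen = None  # plays the role of the filled-down "copy" column
    for col1, col2 in rows:
        if col1 is not None:
            last_seen = col1
        if col2 is None or last_seen is None:
            new_col2 = None  # nothing to combine
        else:
            new_col2 = f"{last_seen}{sep}{col2}"
        result.append((col1, new_col2))
    return result


sample = [("A", None), (None, "x"), (None, "y"), ("B", "z")]
print(fill_down_concat(sample))
```

Unlike the previous-row approach, a filled-down value keeps applying to every following row until a new non-null Column1 value appears.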

Related

Aggregate multiple (many!) pairs of columns (Excel)

I have a table that consists of pairs of date and value columns:
Pair Pair Pair Pair .... ..... ......
What I need is the sum of all values for the same date.
The total table has 3146 columns (so 1573 pairs of value and date)!! with up to 186 entries on row level.
Thankfully, the first column contains all possible date values.
Considering the 3146 columns, I am not sure how to do that without a massive number of small steps :(
This shows a different method of creating the two column table that you will group by Date and return the Sum. Might be faster than the List.Accumulate method. Certainly worth a try in view of your comment above.
Unpivot the original table
Add 0-based Index column; then IntegerDivide by 2
Group by the IntegerDivide column and extract the Date and Value to separate columns.
Then group by date and aggregate by sum
let
Source = Excel.CurrentWorkbook(){[Name="Table12"]}[Content],
//assuming only columns are Date and Value, this will set the data types for any number of columns
Types = List.Transform(List.Alternate(Table.ColumnNames(Source),1,1,1), each {_, type date}) &
List.Transform(List.Alternate(Table.ColumnNames(Source),1,1,0), each {_, type number}),
#"Changed Type" = Table.TransformColumnTypes(Source,Types),
//Unpivot all columns to create a two column table
//The Value.1 table will alternate the related Date/Value
#"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Changed Type", {}, "Attribute", "Value.1"),
//add a column to group the pairs of values
//below two lines => a column in sequence of 0,0,1,1,2,2,3,3, ...
#"Added Index" = Table.AddIndexColumn(#"Unpivoted Other Columns", "Index", 0, 1, Int64.Type),
#"Inserted Integer-Division" = Table.AddColumn(#"Added Index", "Integer-Division", each Number.IntegerDivide([Index], 2), Int64.Type),
#"Removed Columns" = Table.RemoveColumns(#"Inserted Integer-Division",{"Index"}),
// Group by the "pairing" sequence,
// Extract the Date and Value to new columns
// => a 2 column table
#"Grouped Rows" = Table.Group(#"Removed Columns", {"Integer-Division"}, {
{"Date", each [Value.1]{0}, type date},
{"Value", each [Value.1]{1}, type number}}),
#"Removed Columns1" = Table.RemoveColumns(#"Grouped Rows",{"Integer-Division"}),
//Group by Date and aggregate by Sum
#"Grouped Rows1" = Table.Group(#"Removed Columns1", {"Date"}, {{"Sum Values", each List.Sum([Value]), type number}}),
//Sort into date order
#"Sorted Rows" = Table.Sort(#"Grouped Rows1",{{"Date", Order.Ascending}})
in
#"Sorted Rows"
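The idea behind the query (pair up adjacent Date/Value cells, then sum per date) can be sketched in a few lines of Python. This is only an illustration of the logic, with made-up sample rows and ISO date strings; it is not a replacement for the M code.

```python
from collections import defaultdict


def sum_by_date(table):
    """table: list of rows, each laid out as [date, value, date, value, ...]."""
    totals = defaultdict(float)
    for row in table:
        # Walk the row two cells at a time: (date, value) pairs
        for i in range(0, len(row) - 1, 2):
            date, value = row[i], row[i + 1]
            if date is None or value is None:
                continue  # blank pair: nothing to add
            totals[date] += value
    # ISO-formatted date strings sort chronologically
    return dict(sorted(totals.items()))


sample = [
    ["2021-01-01", 1, "2021-01-01", 10],
    ["2021-01-02", 2, None, None],
    ["2021-01-03", 3, "2021-01-02", 20],
]
print(sum_by_date(sample))
```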
A quick Google search shows "Number of columns per table 16,384" for Power Query and 16,000 for Power BI, so I'm thinking you have to split your input data somehow first, or perhaps this is not the tool for you; maybe AWK would be.
Assuming that works, an M version of what you are looking for. It stacks the columns in groups of 2, then groups and sums them
let Source = Excel.CurrentWorkbook(){[Name="Table4"]}[Content],
Combo = List.Split(Table.ColumnNames(Source),2),
#"Added Custom" =List.Accumulate(
Combo,
#table({"Column1"}, {}),
(state,current)=> state & Table.Skip(Table.DemoteHeaders(Table.SelectColumns(Source, current)),1)
),
#"Grouped Rows" = Table.Group(#"Added Custom", {"Column1"}, {{"Sum", each List.Sum([Column2]), type number}})
in #"Grouped Rows"
186 rows * 1573 pairs of columns = 292,578 records.
Assuming not a very old version of Excel, 293k rows is fine, so it can be done with formulae:
Insert five columns to the left, so data starts in F3.
In A3 put zero, in A4 put 1, select the two and drag down to A188.
In A189 put =A3.
In B3 put 0, and drag down to B188.
In B189 put =B3+2, so that each block of 186 rows points at the next Date/Value pair.
"Drag"* down A189 and B189 to row 292580
In C3 put =OFFSET($F$3,A3,B3)
In D3 put =OFFSET($F$3,A3,B3+1)
Select those two cells and click on the cross at bottom right to copy them to the end of column B.
Then put Date and Value in A1 and B1, and use a Pivot Table to get totals, averages, or whatever you need.
Any blank cells in the original input do not matter.
* To "drag" down hundreds of thousands of cells:
Copy A189 and B189
Goto (F5) A292580
Paste
Pin (F8)
CTRL-up arrow
Enter
And rather than $F$3 I would name that cell Origin, and use "Origin" in the two Offset formulae, but many people seem to consider that too complicated.
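As a sanity check on the helper columns, here is a small Python sketch of the offsets they are meant to produce. It assumes column A cycles through 0..185 and column B advances by 2 for each block of 186 rows, so that OFFSET lands on successive Date/Value pairs; the function name is hypothetical.

```python
ROWS_PER_PAIR = 186   # entries per column pair, from the question
PAIR_COUNT = 1573     # number of Date/Value pairs, from the question


def helper_offsets(n):
    """0-based position n in the stacked list -> (row offset, column offset)."""
    row_offset = n % ROWS_PER_PAIR         # column A: 0..185, then repeats
    col_offset = 2 * (n // ROWS_PER_PAIR)  # column B: 0, 2, 4, ... one step per block
    return row_offset, col_offset


total = ROWS_PER_PAIR * PAIR_COUNT
print(total)                      # number of stacked records
print(helper_offsets(0))          # first record of the first pair
print(helper_offsets(186))        # first record of the second pair
print(helper_offsets(total - 1))  # last record of the last pair
```

The Date cell sits at the column offset itself and the Value cell one column to the right, which is what the two OFFSET formulae express.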

Replace text within a table for all cells that contain a given word for n columns

I have data within a table where some cells have been filled in with text such as not available or No Data. I want to replace every cell that contains no with null, across n columns. I don't know every word that has been entered, but every cell that should become null appears to contain the characters no, so I will go with that.
i.e.
Is there any way to combine `if Text.Contains([n columns],"no") then null else [n columns]`?
In powerquery, this removes the content of any cell containing (No,NO,no,nO) and converts to a null
Click select the first column, right click, Unpivot other columns
click select Value column and transform ... data type .. text
right click Value column and transform ... lower case
we really don't want that so change this in the formula bar
= Table.TransformColumns(#"Changed Type1",{{"Value", Text.Lower, type text}})
to resemble this instead (which also ignores the case of the No)
= Table.TransformColumns(#"Changed Type1",{{"Value", each if Text.Contains(_,"no", Comparer.OrdinalIgnoreCase) then null else _, type text}})
click select attribute column
Transform ... pivot column
values column:Value, Advanced ... don’t aggregate
sample full code:
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Unpivoted Other Columns" = Table.UnpivotOtherColumns(Source, {"Column"}, "Attribute", "Value"),
#"Changed Type1" = Table.TransformColumnTypes(#"Unpivoted Other Columns",{{"Value", type text}}),
#"CheckForNo" = Table.TransformColumns(#"Changed Type1",{{"Value", each if Text.Contains(_,"no", Comparer.OrdinalIgnoreCase) then null else _, type text}}),
#"Pivoted Column" = Table.Pivot(#"CheckForNo", List.Distinct(#"CheckForNo"[Attribute]), "Attribute", "Value")
in #"Pivoted Column"
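The replacement rule itself (null out any cell whose text contains "no", ignoring case) can be sketched in Python. This is an illustration only: the function name and sample table are invented, and unlike the query above it leaves non-text cells untouched instead of converting them to text.

```python
def blank_out_no(table, needle="no"):
    """Return a copy of table where text cells containing needle (any case) become None."""
    lowered = needle.lower()
    return [
        [
            None if isinstance(cell, str) and lowered in cell.lower() else cell
            for cell in row
        ]
        for row in table
    ]


sample = [
    ["a", "No Data", 5],
    ["b", "not available", "ok"],
    ["c", "NONE", 7],
]
print(blank_out_no(sample))
```

A substring test like this also blanks legitimate values that happen to contain "no" (for example "Norway"), which is the trade-off the question accepts.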

How can I get the rows with the most cells that have the highest (or the lowest) values

I have a table of data which consists of 18 columns and 2,017 rows. I can get the row that has the highest (MAX) value in a cell, but I need the rows that have the most cells with high values, in DESC order. I haven't yet managed to find a post relevant to this.
Here follows an example:
Using numbers up to 10 for illustration, the following shows the logic behind. (The actual numbers are those shown in Exhibit1)
Thank you
EDIT:
I am adding the below in order to try to clarify further. I am not sure if it is the correct path to go but I hope it makes sense.
In Exhibit2 I am indexing each column Desc (Based on Exhibit1) and then =SUM in the end of the row. Following this logic, the name having the lowest total is the one with the most high values (not the highest) in its row.
The result table is the following
Although possible with formulas and helper tables/columns, this can also be accomplished using Power Query, available in Windows Excel 2010+ and Excel 365 (Windows or Mac)
To use Power Query
Select some cell in your Data Table
Data => Get&Transform => from Table/Range or from within sheet
When the PQ Editor opens: Home => Advanced Editor
Make note of the Table Name in Line 2
Paste the M Code below in place of what you see
Change the Table name in line 2 back to what was generated originally.
Read the comments and explore the Applied Steps to understand the algorithm
As we discussed in our Chat, I transform each column into a list of Ranked Entries; then sum the ranks for each row and sort as you have laid out.
M Code
let
Source = Excel.CurrentWorkbook(){[Name="Table5"]}[Content],
//type all the columns
data = Table.TransformColumnTypes(Source,{
{"Order", Int64.Type},
{"Name", type text}} &
List.Transform(List.RemoveFirstN(Table.ColumnNames(Source),2), each {_, type number})
),
//Replace with ranks
//generate list of transforms to dynamically include all columns
cols = List.RemoveFirstN(Table.ColumnNames(data),2),
xForms = List.Transform(cols, (c)=> {c, each List.PositionOf(List.Sort(Table.Column(data,c),Order.Descending),_)}),
ranks = Table.TransformColumns(data,xForms),
//add Index column to enable row-wise sums
// then add the sumRank column and delete the Index column
#"Added Index" = Table.AddIndexColumn(ranks, "Index", 0, 1, Int64.Type),
#"Added Custom" = Table.AddColumn(#"Added Index", "sumRank", each
List.Sum(
Record.ToList(
Record.RemoveFields(#"Added Index"{[Index]},{"Order","Name","Index"})
)
)),
#"Removed Columns" = Table.RemoveColumns(#"Added Custom",{"Index"}),
//join back with the original data table
//extract the sumRank column
join = Table.NestedJoin(data,{"Order","Name"}, #"Removed Columns",{"Order","Name"}, "joined",JoinKind.FullOuter),
#"Expanded joined" = Table.ExpandTableColumn(join, "joined", {"sumRank"}, {"sumRank"}),
//sort by the sumRank column, then remove it
#"Sorted Rows" = Table.Sort(#"Expanded joined",{{"sumRank", Order.Ascending}}),
#"Removed Columns1" = Table.RemoveColumns(#"Sorted Rows",{"sumRank"})
in
#"Removed Columns1"
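The rank-and-sum algorithm described above can be sketched in Python to make the steps concrete. The names and numbers below are made up; as in the query's List.PositionOf approach, ranks are 0-based and tied values share the first matching position.

```python
def sort_by_rank_sum(names, columns):
    """names: row labels; columns: list of columns, each a list aligned with names."""
    rank_sums = [0] * len(names)
    for col in columns:
        ordered = sorted(col, reverse=True)  # highest value gets rank 0
        for i, value in enumerate(col):
            rank_sums[i] += ordered.index(value)  # first position, so ties share a rank
    # Lowest total rank first: the row with the most high values
    order = sorted(range(len(names)), key=lambda i: rank_sums[i])
    return [(names[i], rank_sums[i]) for i in order]


names = ["A", "B", "C"]
columns = [[10, 5, 7], [3, 9, 8], [6, 6, 1]]
print(sort_by_rank_sum(names, columns))
```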
This set-up is volatile, so I would only adopt it if non-volatile alternatives are not forthcoming.
An additional column in your table with the following formula:
=SUM(COUNTIF(OFFSET([Column1],,TRANSPOSE(ROW(INDIRECT("1:"&COLUMNS(Table1[@[Column1]:[Column4]])))-1)),">="&Table1[@[Column1]:[Column4]]))
which you can then use to sort your table.
Note that this formula will most likely require committing with CTRL+SHIFT+ENTER for your version of Excel.
Amend the table and column names as required, noting that the part
Table1[@[Column1]:[Column4]]
as well as including the table name, should comprise the leftmost and rightmost of the contiguous columns to be interrogated.

Increment difference between cells

I'm trying to duplicate data in a sheet with increments of 12 between each cell from a sheet with 1 cell per row. Between the 12-incremented rows there's other data. This means I can't drag to extend the formula. Like this for customer numbers:
'SheetA'E3 = 'SheetB'Y2
'SheetA'E15 = 'SheetB'Y3
'SheetA'E27 = 'SheetB'Y4
..and so on. I've tried extending 12/24 cells at a time and copying but I can't make it work. Extending doesn't add +1 to one sheet, just +12/+24 to both. Doing this manually will take months. Can this be done without a VBA solution?
Any suggestions? I'm sorry if my terminology isn't on point here.
SheetA:
Try this (run as VBA code):
Sub test1()
For i01 = 0 To 100
Worksheets("SheetA").Cells(3 + 12 * i01, 5) = Worksheets("SheetB").Cells(2 + i01, 25)
Next i01
End Sub
Power Query, available in Windows Excel 2010+ and Office 365, can produce your SheetA given SheetB. Not sure about the effect of the variability you mention.
The query assumes that the correct parameters are listed as column headers in Sheet B. The column headers will get copied over as parameters to sheet A.
To use Power Query:
Select some cell in your Data Table
Data => Get&Transform => from Table/Range
When the PQ Editor opens: Home => Advanced Editor
Make note of the Table Name in Line 2
Paste the M Code below in place of what you see
Change the Table name in line 2 back to what was generated originally.
Read the comments and explore the Applied Steps to understand the algorithm
M Code
let
//Read in the data
//Change table name in next line to be the "real" table name
Source = Excel.CurrentWorkbook(){[Name="Table12"]}[Content],
//set data types based on first entry in the column
//will be independent of the column names
typeIt = Table.TransformColumnTypes(Source,
List.Transform(
Table.ColumnNames(Source), each
{_,Value.Type(Table.Column(Source,_){0})})
),
//UNpivot except for the c.number and c.name columns to create the Parameter and Level columns
#"Unpivoted Other Columns" = Table.UnpivotOtherColumns(typeIt, {"C. number", "C. name"}, "Parameter", "Level"),
//Group By C.Number
//Add the appropriate rows for each customer
//And a blank row to separate the customers
#"Grouped Rows" = Table.Group(#"Unpivoted Other Columns", {"C. number"}, {
{"All", each _, type table [C. number=nullable number, C. name=nullable text, Parameter=text, Level=any]},
{"custLabel", (t)=> Table.InsertRows(t,0,{
[C. number = null, C. name=null,Parameter = null, Level = null],
[C. number = null, C. name=null, Parameter = "Customer Number", Level="Customer Name"],
[C. number = null, C. name=null,Parameter = t[C. number]{0}, Level = t[C. name]{0}],
[C. number = null, C. name=null,Parameter = "Parameter", Level = "Level"]
})}
}),
//Remove the unneeded columns and expand the remaining table
#"Removed Columns" = Table.RemoveColumns(#"Grouped Rows",{"C. number", "All"}),
#"Expanded custLabel" = Table.ExpandTableColumn(#"Removed Columns", "custLabel", {"Parameter", "Level"}, {"Parameter", "Level"}),
//Remove the top blank row
//promote the new blank row to the Header location
#"Removed Top Rows" = Table.Skip(#"Expanded custLabel",1),
#"Promoted Headers" = Table.PromoteHeaders(#"Removed Top Rows", [PromoteAllScalars=true]),
//data type set to text since it will look better on the report
#"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"Customer Number", type text}, {"Customer Name", type text}})
in
#"Changed Type"
Data
Results
[ Indirect with row() ]
Assuming 'SheetA'E3 column is the target and 'SheetB'Y2 is the source data.
In SheetA!E3 cell put:
=INDIRECT("SheetB!Y"&(((ROW()-3)/12)+2))
Press Enter
Then select the SheetA!E3 cell and copy it. Then paste it into SheetA!E15, SheetA!E27, and so on. The formula will update itself.
Idea :
Find the relation between the target cell row number b and the source cell row number a. The pairs [ b -> a : 3 -> 2, 15 -> 3, 27 -> 4 ] lead to a = (b-3)/12 + 2. (The math is like working out the equation of a straight line from three points.) Then use INDIRECT() to combine the calculated row number with the column letter.
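The row relation can be checked quickly outside Excel. A minimal Python sketch, assuming targets start at row 3 and repeat every 12 rows while sources start at row 2, as in the question; the function name and parameters are hypothetical.

```python
def source_row(target_row, first_target=3, step=12, first_source=2):
    """Map a SheetA target row b to the SheetB source row a = (b - 3) / 12 + 2."""
    offset = target_row - first_target
    if offset < 0 or offset % step != 0:
        # Rows between the 12-row steps hold other data and have no source row
        raise ValueError(f"row {target_row} is not a target row")
    return offset // step + first_source


for b in (3, 15, 27):
    print(b, "->", source_row(b))
```

The formula only gives a whole row number on the target rows themselves, which is why it belongs in E3, E15, E27 and so on.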

Counter for one unique expression for every row in Power Query

I am trying to get a counter for a unique expression in my table. My table looks something like this:
Starting format
So every time the expression "Date" appears in the voteOptionText column, the counter should increase by one, so that I'm able to distinguish between the different persons who gave the answers, because I know every new set of data begins with the Date expression. So it should look like this:
Desired counter
So the counter should only count the word "date" and not other expressions.
I need this counter to pivot the table afterwards and to distinguish multiple answers. So the next step would be to pivot the index column. Do you have any idea how to get this counter? I appreciate any help!
You may use following approach:
let
Source = Excel.CurrentWorkbook(){[Name="Data"]}[Content],
group = Table.Group(Source, {"voteOptionText"}, {"temp", each _}, 0, (a,b)=>Number.From(b[voteOptionText]="Date")),
i = Table.AddIndexColumn(group, "i", 1),
del = Table.RemoveColumns(i,"voteOptionText"),
final = Table.ExpandTableColumn(del, "temp", {"voteOptionText", "voteAnswer"})
in
final
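The counting rule from the question (start a new group whenever "Date" appears) can be sketched in Python. This only illustrates the logic; the function name and sample rows are made up, and rows that appear before the first "Date" get 0 here, which is a choice of this sketch.

```python
def add_counter(rows, marker="Date"):
    """rows: list of (voteOptionText, voteAnswer); returns rows with a counter appended."""
    counter = 0
    result = []
    for option, answer in rows:
        if option == marker:
            counter += 1  # a new respondent's block starts here
        result.append((option, answer, counter))
    return result


sample = [
    ("Date", "2021-05-01"),
    ("Question 1", "yes"),
    ("Question 2", "maybe"),
    ("Date", "2021-05-02"),
    ("Question 1", "yes"),
]
for row in add_counter(sample):
    print(row)
```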
You don't really need to have a sequential number in your "Counter" column. You just need to have the same number for each group. You can then Group by the Counter column.
So I would
Add an index column
Add a custom column whereby if the contents of voteOptionText is Date then copy the Index number, else null
Fill down
eg
M Code
let
Source = Excel.CurrentWorkbook(){[Name="Table9"]}[Content],
#"Added Index" = Table.AddIndexColumn(Source, "Index", 0, 1, Int64.Type),
//Add Counter Column
#"Added Custom" = Table.AddColumn(#"Added Index", "Counter",
each if [voteOptionText] = "Date" then [Index] else null),
#"Filled Down" = Table.FillDown(#"Added Custom",{"Counter"}),
//Remove Index column
#"Removed Columns" = Table.RemoveColumns(#"Filled Down",{"Index"})
in
#"Removed Columns"
Note: You mentioned you wanted to Pivot the results. Since you have multiple rows with the same content/label, you will need to do some further processing before you can create the Pivot Table

Resources