I want to merge/concatenate rows if those rows have a duplicate values in one column. The merge step applies to more than one columns. In de table below I show an example of the problem:
+-----+--------+--------+--------+--------+--------+----------+
| | A | B | C | D | E | F |
+-----+--------+--------+--------+--------+--------+----------+
| Dog | | | param1 | | | |
+-----+--------+--------+--------+--------+--------+----------+
| Dog | param2 | | | | | |
+-----+--------+--------+--------+--------+--------+----------+
| Dog | | | | | | |
+-----+--------+--------+--------+--------+--------+----------+
| Dog | | | | | | param3 |
+-----+--------+--------+--------+--------+--------+----------+
| Cat | | param5 | | | | |
+-----+--------+--------+--------+--------+--------+----------+
| Cat | | | | param6 | | |
+-----+--------+--------+--------+--------+--------+----------+
I have about 4000 unique row values and about 30 columns. The duplicate row values are ranging from n=1 to n=10.
My preferred table:
+-----+--------+--------+--------+--------+--------+----------+
| | A | B | C | D | E | F |
+-----+--------+--------+--------+--------+--------+----------+
| Dog | param2 | | param1 | | | param3 |
+-----+--------+--------+--------+--------+--------+----------+
| Cat | | param5 | | param6 | | |
+-----+--------+--------+--------+--------+--------+----------+
Can this be done in Excel with some magic or do I need advanced stuff like python for this?
I have tried multiple formula's with CONCATINATE but to no success.
Thank you in advance
This can also be accomplished using Power Query, available in Windows Excel 2010+ and Excel 365 (Windows or Mac)
To use Power Query
Select some cell in your Data Table
Data => Get&Transform => from Table/Range
When the PQ Editor opens: Home => Advanced Editor
Make note of the Table Name in Line 2
Paste the M Code below in place of what you see
Change the Table name in line 2 back to what was generated originally.
Read the comments and explore the Applied Steps to understand the algorithm
M Code
let
//change next line to reflect actual data source
Source = Excel.CurrentWorkbook(){[Name="Table10"]}[Content],
//set all columns to data type text
#"Changed Type" = Table.TransformColumnTypes(Source,
List.Transform(Table.ColumnNames(Source), each {_, type text})),
//Group by Animal
//Then "Fill Up" each column and return only the first row
#"Group Animal" = Table.Group(#"Changed Type","Animal",
{"Consolidate", each Table.FillUp(_,Table.ColumnNames(_)){0}}),
//Expand the grouped table and re-set the data types to text
#"Expanded Consolidate" = Table.ExpandRecordColumn(#"Group Animal", "Consolidate",
List.RemoveFirstN(Table.ColumnNames(#"Changed Type"))),
#"Changed Type1" = Table.TransformColumnTypes(#"Expanded Consolidate", List.Transform(Table.ColumnNames(#"Expanded Consolidate"), each {_, type text}))
in
#"Expanded Consolidate"
Tricky. One way is to nest REDUCE() in another:
Formula in A8:
=DROP(REDUCE(0,UNIQUE(A1:A6),LAMBDA(a,b,VSTACK(a,REDUCE(b,SEQUENCE(6),LAMBDA(x,y,HSTACK(x,#SORT(INDEX(FILTER(B1:G6&"",A1:A6=b),,y),,-1))))))),1)
Or, a bit more dynamic:
=LET(r,A1:G6,s,TAKE(r,,1),t,DROP(r,,1)&"",DROP(REDUCE(0,UNIQUE(s),LAMBDA(a,b,VSTACK(a,REDUCE(b,SEQUENCE(COLUMNS(t)),LAMBDA(x,y,HSTACK(x,#SORT(INDEX(FILTER(t,s=b),,y),,-1))))))),1))
Highlight the values you need to be grouped together then go to data, click on groe tab
Current input:
+-------------------------+
| col A |
+-------------------------+
| Test(read) |
| - ABC(dsakl) |
| - ABC(dslø) Write(T) |
| - ABC(dsjkl) |
| Test(true) |
| - ABC(DSdsa) |
| - ABC(dsalk**) write(F) |
+-------------------------+
Wanted output:
+-------------------------+-------------------------+
| col A | col B |
+-------------------------+-------------------------+
| Test(read) | - ABC(dsakl) |
| Test(read) | - ABC(dslø) Write(T) |
| Test(read) | - ABC(dsjkl) |
| Test(true) | - ABC(DSdsa) |
| Test(true) | - ABC(dsalk**) write(F) |
+-------------------------+-------------------------+
Hence, I want to transform my data so everytime "Test" occurs it creates a new row in column "A" corresponding to the number of rows until the next "Test" occurs. The cells containing "ABC" should be written in column "B" in the output file.
Really appreciate input on this one. Thanks!
There may be other approaches to this, but if you want to do it purely with regular formulas you could try this:
Do col B with Aggregate to find the first row with a "-", then the second etc.
=IFERROR(INDEX(A$1:A$10,AGGREGATE(15,6,ROW(A$1:A$10)/(LEFT(A$1:A$10)="-"),ROW())),"")
Col A is more difficult - this finds the nth row starting with "-" as before, then looks before this to find the last row starting with T:
=IFERROR(INDEX(A$1:A$10,AGGREGATE(14,6,ROW($A$1:INDEX(A$1:A$10,AGGREGATE(15,6,ROW(A$1:A$10)/(LEFT(A$1:A$10)="-"),ROW())))
/(LEFT($A$1:INDEX(A$1:A$10,AGGREGATE(15,6,ROW(A$1:A$10)/(LEFT(A$1:A$10)="-"),ROW())))="T"),1)),"")
I am trying to create a count of dates that occur during particular weeks (the last five weeks including this week) and a total running count for dates which would include records prior to the listed five week duration.
I could do this fairly easily using standard Excel formula, but I'm trying to learn PowerQuery and the M language.
My first query returns the relevant rows from the original source data (another query that pulls everything from the Excel table and forces the correct data type for each column).
The dates I'm trying to count are contained in the Section A#(lf)Turnbacks.
The query is called Triage_Turnbacks.
let
Source = #"Source_Data",
#"Filter_Rows" = Table.SelectRows(Source, each ([#"Cancelled?"] = null) and ([#"Section A#(lf)Turnbacks"] <> null)),
#"Removed_Columns" = Table.SelectColumns(Filter_Rows,{"VENDOR CODE", "Supplier Lookup", "Section A#(lf)Turnbacks"}),
#"Duplicate_Date_Column" = Table.DuplicateColumn(Removed_Columns, "Section A#(lf)Turnbacks", "Start_Of_Week"),
#"Start_of_Week" = Table.TransformColumns(#"Duplicate_Date_Column",{{"Start_Of_Week", Date.StartOfWeek, type date}})
in
Start_of_Week
| VENDOR CODE | Supplier Lookup | Section ATurnbacks | Start_Of_Week |
|-------------|-----------------|--------------------|---------------|
| A1 | Supplier A | 08/04/2019 | 08/04/2019 |
| A1 | Supplier A | 08/04/2019 | 08/04/2019 |
| A2 | Supplier B | 16/04/2019 | 15/04/2019 |
| A3 | Supplier C | 30/05/2019 | 27/05/2019 |
| A10 | Supplier D | 24/04/2019 | 22/04/2019 |
| A5 | Supplier E | 18/06/2019 | 17/06/2019 |
| A1 | Supplier A | 17/07/2019 | 15/07/2019 |
| A1 | Supplier A | 24/06/2019 | 24/06/2019 |
| A2 | Supplier B | 11/06/2019 | 10/06/2019 |
| A3 | Supplier C | 03/07/2019 | 01/07/2019 |
| A10 | Supplier D | 02/07/2019 | 01/07/2019 |
| A5 | Supplier E | 17/07/2019 | 15/07/2019 |
| A1 | Supplier A | 17/07/2019 | 15/07/2019 |
| A1 | Supplier A | 05/08/2019 | 05/08/2019 |
| A2 | Supplier B | 12/07/2019 | 08/07/2019 |
| A1 | Supplier A | 05/08/2019 | 05/08/2019 |
| A1 | Supplier A | 29/07/2019 | 29/07/2019 |
| A2 | Supplier B | 12/06/2019 | 10/06/2019 |
| A3 | Supplier C | 21/02/2019 | 18/02/2019 |
| A10 | Supplier D | 23/04/2019 | 22/04/2019 |
| A5 | Supplier E | 14/06/2019 | 10/06/2019 |
| A10 | Supplier D | 08/02/2019 | 04/02/2019 |
A second query returns the previous five week Monday and Sunday dates.
The query is called 5_Week_Calendar.
let
Source = Table.FromList({-5..0}, each{_}),
AddedStartOfWeek = Table.AddColumn(Source, "StartOfWeek", each Date.StartOfWeek(Date.AddDays(Date.From(DateTime.LocalNow()),[Column1]*7)), type date),
AddedEndOfWeek = Table.AddColumn(AddedStartOfWeek, "EndOfWeek", each Date.EndOfWeek([StartOfWeek]), type date),
RemovedColumn = Table.RemoveColumns(AddedEndOfWeek,{"Column1"})
in RemovedColumn
| StartOfWeek | EndOfWeek |
|-------------|------------|
| 01/07/2019 | 07/07/2019 |
| 08/07/2019 | 14/07/2019 |
| 15/07/2019 | 21/07/2019 |
| 22/07/2019 | 28/07/2019 |
| 29/07/2019 | 04/08/2019 |
| 05/08/2019 | 11/08/2019 |
My third query returns the count per week (i.e. which records in Triage_Turnbacks.Start_Of_Week matches 5_Week_Calendar.StartOfWeek.
let
Source = Table.NestedJoin(#"5_Week_Calendar", {"StartOfWeek"}, Triage_Turnbacks, {"Start_Of_Week"}, "Triage_Turnbacks", JoinKind.LeftOuter),
#"Removed Columns" = Table.RemoveColumns(Source,{"EndOfWeek"}),
#"Aggregated Triage_Turnbacks" = Table.AggregateTableColumn(#"Removed Columns", "Triage_Turnbacks", {{"Start_Of_Week", List.Count, "Total Turnbacks"}})
in
#"Aggregated Triage_Turnbacks"
| StartOfWeek | Total Turnbacks |
|-------------|-----------------|
| 01/07/2019 | 2 |
| 08/07/2019 | 1 |
| 15/07/2019 | 3 |
| 22/07/2019 | 1 |
| 29/07/2019 | 1 |
| 05/08/2019 | 2 |
The problem
How do I add the running total which includes records before the initial 5 week date of 01/07/2019?
| StartOfWeek | Total Turnbacks | Running Total |
|-------------|-----------------|---------------|
| 01/07/2019 | 2 | 15 |
| 08/07/2019 | 1 | 16 |
| 15/07/2019 | 3 | 19 |
| 22/07/2019 | 1 | 19 |
| 29/07/2019 | 1 | 20 |
| 05/08/2019 | 2 | 22 |
A running total is simple in concept but, as far as I know, M's standard library does not currently provide anything like List.CumulativeSum or Table.CumulativeSum. This makes things a bit awkward and a custom function will likely be needed.
A running total can be achieved with something like:
ListCumulativeSum = (listOfNumbers as list) as list => List.Accumulate(listOfNumbers, {}, (listState, currentItem) => listState & {List.Last(listState, 0) + currentItem}),
which, given a list of numbers, returns a list of numbers.
The equivalent for tables might be the function below. (I am sure there are many ways of doing this; some more performant/scalable than others.)
TableCumulativeSum = (someTable as table, columnToSum as text, newColumnName as text) =>
let
runningTotal = ListCumulativeSum(Table.Column(someTable, columnToSum)),
rightTable = Table.FromColumns({{0..List.Count(runningTotal)-1}, runningTotal}, {"$rightJoinKey", newColumnName}),
leftTable = Table.AddIndexColumn(someTable, "$leftJoinKey"),
joined = Table.Join(leftTable, {"$leftJoinKey"}, rightTable, {"$rightJoinKey"}, JoinKind.FullOuter),
dropped = Table.RemoveColumns(joined, {"$leftJoinKey", "$rightJoinKey"})
in dropped,
Overall and going back to your question, I think it can be achieved with the code below:
let
// This is just a function which returns a record. The record itself
// contains two values:
// • a list of dates (which will be the "left join column")
// • a replacer function (which conditionally replaces dates)
GetReplacerAndDates = (n as number, replaceWith as any) as record =>
let
startOfCurrentWeek = Date.StartOfWeek(DateTime.Date(DateTime.LocalNow())),
nMondaysAgo = Date.AddWeeks(startOfCurrentWeek, -n),
defaultAndLastNWeeks = {replaceWith} & List.Dates(nMondaysAgo, n + 1, #duration(7, 0, 0, 0)),
conditionalReplacer = (someDate as date) =>
let
startOfWeek = Date.StartOfWeek(someDate),
startOfWeekOrReplacement = if startOfWeek >= nMondaysAgo then startOfWeek else replaceWith
in startOfWeekOrReplacement,
toReturn = [replacer = conditionalReplacer, values = defaultAndLastNWeeks]
in toReturn,
rec = GetReplacerAndDates(5, "Prior period"), // Needs a better variable name.
// You don't need this below, I only used it to give me a starting point
// and so that I could verify my answer.
triageTurnbacks = Table.FromColumns(
{
{"A1","A1","A2","A3","A10","A5","A1","A1","A2","A3","A10","A5","A1","A1","A2","A1","A1","A2","A3","A10","A5","A10"},
{"Supplier A","Supplier A","Supplier B","Supplier C","Supplier D","Supplier E","Supplier A","Supplier A","Supplier B","Supplier C","Supplier D","Supplier E","Supplier A","Supplier A","Supplier B","Supplier A","Supplier A","Supplier B","Supplier C","Supplier D","Supplier E","Supplier D"},
{#date(2019,4,8), #date(2019,4,8), #date(2019,4,16), #date(2019,5,30), #date(2019,4,24), #date(2019,6,18), #date(2019,7,17), #date(2019,6,24), #date(2019,6,11), #date(2019,7,3), #date(2019,7,2), #date(2019,7,17), #date(2019,7,17), #date(2019,8,5), #date(2019,7,12), #date(2019,8,5), #date(2019,7,29), #date(2019,6,12), #date(2019,2,21), #date(2019,4,23), #date(2019,6,14), #date(2019,2,8)}
},
type table [Vendor Code = text, Supplier Lookup = text, Section A Turnbacks = date]
),
// I think the single step "startOfWeekColumn" below should probably replace the #"Duplicate_Date_Column"
// and #"Start_of_Week" steps of your Triage_Turnbacks query.
// You could also assign rec[replacer] to a variable,
// just to make the code/call site easier to read.
startOfWeekColumn = Table.AddColumn(triageTurnbacks, "startOfWeek", each rec[replacer]([Section A Turnbacks]), type any),
datesToJoin = Table.FromColumns({rec[values]}, {"startOfWeek"}),
joined = Table.NestedJoin(datesToJoin, "startOfWeek", startOfWeekColumn, "startOfWeek", "Total Turnbacks", JoinKind.LeftOuter),
reduced = Table.TransformColumns(joined, {{"Total Turnbacks", Table.RowCount, type number}}),
// Helper functions. I don't think cumulative summation can be achieved natively (as far as I know).
ListCumulativeSum = (listOfNumbers as list) as list => List.Accumulate(listOfNumbers, {}, (listState, currentItem) => listState & {List.Last(listState, 0) + currentItem}),
TableCumulativeSum = (someTable as table, columnToSum as text, newColumnName as text) as table =>
let
runningTotal = ListCumulativeSum(Table.Column(someTable, columnToSum)),
rightTable = Table.FromColumns({{0..List.Count(runningTotal)-1}, runningTotal}, {"$rightJoinKey", newColumnName}),
leftTable = Table.AddIndexColumn(someTable, "$leftJoinKey"),
joined = Table.Join(leftTable, {"$leftJoinKey"}, rightTable, {"$rightJoinKey"}, JoinKind.FullOuter),
dropped = Table.RemoveColumns(joined, {"$leftJoinKey", "$rightJoinKey"}),
changedTypes = Table.TransformColumnTypes(dropped, {{newColumnName, type number}})
in changedTypes,
runningTotal = TableCumulativeSum(reduced, "Total Turnbacks", "Running Total")
in
runningTotal
which gives me this:
which I think matches your expected output with the exception of the Total Turnbacks column of the week starting 22/07/2019 (where you have 1 and I have 0).
You could obviously filter/remove the "Prior period" row from the final table.
For further details, please see comments in the code. If they don't help, let me know.