Consolidate entries in Excel - excel

I'm trying to consolidate the dates on which some students did their homework. The goal is to get one row per unique date and name combination (i.e. multiple entries for one person on the same date count as one), ideally with output like the following. I thought about using arrays or a pivot table, but I can't think of any non-manual way to do this. Thanks a lot (pardon my poor formatting...).
(Note: the actual problem involves a wide range of dates and >100 names).
Input
Date Name Quan
10/22/2019 Amy 4
10/10/2019 Amy 3
10/23/2019 Amy 1
10/23/2019 Amy 3
10/10/2019 Amy 5
1/31/2011 Cathy 5
1/31/2011 Cathy 2
10/23/2019 Cathy 1
1/31/2011 Cathy 4
1/31/2011 Cathy 5
Output
Date Name Quan
10/23/2019 Amy 4
10/22/2019 Amy 4
10/10/2019 Amy 8
10/23/2019 Cathy 1
1/31/2011 Cathy 16

If you have O365 with dynamic arrays, you can do it all with formulas.
To get the unique list of Date/Name, sorted in the order you show:
eg: F2: =SORT(UNIQUE(INDEX(Table1,SEQUENCE(ROWS(Table1)),{1,2})),{2,1},{1,-1})
Results will spill to the appropriate rows
To get the sums:
eg: H2: =SUMIFS(Table1[Quan],Table1[Date],F2,Table1[Name],G2)
Note that I used a Table and structured references, but you can use regular range references if you prefer. Structured references have the ability of automatically expanding/contracting if you add/remove data from your table.
Note:
If you don't have the latest functions, you can still use formulas:
To create a unique list, you'll need a helper column, e.g. column K:
K2: =IFERROR(INDEX(Table1[Date]&Table1[Name],MATCH(0,COUNTIF($K$1:K1,Table1[Date]&Table1[Name]),0)),"")
Fill down until Blanks
Then, for the Date Column:
L2: =--LEFT(K2,5)
Name column:
M2: =MID(K2,6,99)
And use SUMIFS as before to get the SUM.
You'll need to sort using the Data/Sort tab.
Another method is by using Power Query aka Get & Transform (available in Excel 2010+) to
Group By Date and Name
Aggregate with Sum
MCode
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Date", type date}, {"Name", type text}, {"Quan", Int64.Type}}),
    #"Grouped Rows" = Table.Group(#"Changed Type", {"Date", "Name"}, {{"Quant", each List.Sum([Quan]), type number}}),
    #"Sorted Rows" = Table.Sort(#"Grouped Rows",{{"Date", Order.Descending}, {"Name", Order.Ascending}})
in
    #"Sorted Rows"

I assume you have three columns: Date, Name, and Quan?
If that's the case, you can try using the CONCATENATE formula to combine Date and Name, then use the new "Date & Name" column and the "Quan" column to make a pivot table.
CONCATENATE result should look like this
Pivot table will look for the same "Date & Name" and give you the "Quan" sum you want.
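For anyone who wants to check the expected totals outside Excel, here is a minimal Python sketch of the same idea: treat the Date and Name pair as one combined key and sum Quan per key. The function name `consolidate` and the sort order (name ascending, newest date first, as in the question's output) are my own assumptions for illustration, not part of any answer above.

```python
from collections import defaultdict
from datetime import datetime

# Sample rows copied from the question: (date, name, quantity).
rows = [
    ("10/22/2019", "Amy", 4),
    ("10/10/2019", "Amy", 3),
    ("10/23/2019", "Amy", 1),
    ("10/23/2019", "Amy", 3),
    ("10/10/2019", "Amy", 5),
    ("1/31/2011", "Cathy", 5),
    ("1/31/2011", "Cathy", 2),
    ("10/23/2019", "Cathy", 1),
    ("1/31/2011", "Cathy", 4),
    ("1/31/2011", "Cathy", 5),
]


def consolidate(rows):
    """Sum the quantity for every distinct (date, name) pair."""
    totals = defaultdict(int)
    for date, name, quan in rows:
        # The (date, name) tuple plays the role of the combined "Date & Name" key.
        totals[(date, name)] += quan
    result = [(date, name, quan) for (date, name), quan in totals.items()]
    # Two stable sorts: newest date first, then by name.
    result.sort(key=lambda r: datetime.strptime(r[0], "%m/%d/%Y"), reverse=True)
    result.sort(key=lambda r: r[1])
    return result


for row in consolidate(rows):
    print(*row)
```

With the sample data this reproduces the five rows of the desired output, e.g. Amy on 10/10/2019 sums to 8 and Cathy on 1/31/2011 sums to 16.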

If you use Office 2016 or later (or have installed the Power Query add-in) and you have a lot of data rows, group the data with Power Query, without any formulas.

Related

Excel: How to analyze data in a table that contains multivalue cells

I am working on a science project about insects, and I have been logging information about the insects I find along the way. I now realize that it was a bad decision to register the names of all the insects I found in each observation in a single cell.
I am not allowed to provide too much information because it is confidential, but here is a similar example of my case in the following table:
# of sample | insect (family)
1 | Dermestidae, Histeridae
2 | Histeridae, Dichotumius
3 | Histeriade
4 | Dermestidae, Histeridae
5 | Cleridae, Dichotumius
485 | Histeriade
486 | Dermestidae, Histeridae
487 | Dermestidae, Cleridae
488 | Histeriade
Something like the above table. In my actual table, I have cells with 5 or 6 different insects. The thing is:
How can I search for all the different values? I mean, I want to create a table that contains all the different values and how many of each there are. Something like the following table:
Insect (family) | Count
Cleridae | 54
Histeridae | 154
Dermestidae | 34
(There are at least 100 different insects and some of them appear just once, so it is impossible for me to search for all the different names manually.)
Furthermore, I was thinking about converting my table to a long structure. Something like the following:
Instead of this:
# of sample | insect (family)
1 | Dermestidae, Histeridae
2 | Histeridae, Dichotumius
3 | Histeriade
4 | Dermestidae, Histeridae
5 | Cleridae, Dichotumius
I want this:
# of sample | insect (family)
1 | Dermestidae
1 | Histeridae
2 | Histeridae
2 | Dichotumius
3 | Histeriade
4 | Dermestidae
4 | Histeridae
5 | Cleridae
5 | Dichotumius
I was thinking that this arrangement should be better than the one that I have now.
I hope someone can help me with this issue. Thanks so much.
I tried the above, but I didn't get it to work. That's why I'm asking for help.
What you are trying to accomplish is called unpivoting data. Power Query is best suited to this. If you want to do it with a formula, you can try the following:
=DROP(REDUCE(0,REDUCE(0,B2:B6,LAMBDA(a,x,VSTACK(a,CONCAT(CHOOSEROWS(A2:A6,ROW(x)-1)&"|")&TEXTSPLIT(x,,",")))),LAMBDA(p,q,VSTACK(p,TEXTSPLIT(q,"|")))),2)
This can be accomplished using Power Query, available in Windows Excel 2010+ and Excel 365 (Windows or Mac)
I am uncertain if you want just the unpivoted table, the Counts of each family, or something else, but I have shown the results at each of the last two steps in the query. You can use what you need.
To use Power Query
Select some cell in your Data Table
Data => Get&Transform => from Table/Range
When the PQ Editor opens: Home => Advanced Editor
Make note of the Table Name in Line 2
Paste the M Code below in place of what you see
Change the Table name in line 2 back to what was generated originally.
Read the comments and explore the Applied Steps to understand the algorithm
let
    //Change next line to reflect your actual data source
    Source = Excel.CurrentWorkbook(){[Name="Insects"]}[Content],
    //set the data types
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"# of sample", Int64.Type}, {"insect (family)", type text}}),
    //Split Insect Family column by the comma, into rows
    #"Split Column by Delimiter" =
        Table.ExpandListColumn(
            Table.TransformColumns(#"Changed Type", {{"insect (family)",
                Splitter.SplitTextByDelimiter(
                    ",", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}),
            "insect (family)"),
    //Remove any leading and trailing spaces => your unpivoted table
    #"Trim Spaces" = Table.TransformColumns(#"Split Column by Delimiter", {"insect (family)", each Text.Trim(_), type text}),
    //To create your table with counts, merely Group by the insect (family) column and aggregate with Count
    #"Grouped Rows" = Table.Group(#"Trim Spaces", {"insect (family)"}, {{"Count", each Table.RowCount(_), Int64.Type}})
in
    #"Grouped Rows"
Source Data
Next to last step showing unpivoted table
Last Step
To answer 'How can I search for all the different values?', the below formula will create a unique list of the insect families (where the insect families are in range B2:B100)
=UNIQUE(TEXTSPLIT(TEXTJOIN(", ",TRUE,B2:B100),"|",", ",TRUE))
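You will then be able to use a COUNTIF() formula with wildcards to find how many samples contain each family, for example =COUNTIF(B2:B100,"*"&D2&"*") if the unique list starts in D2 (an assumed location). The wildcards are needed because a single cell can hold several families.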
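The split-then-count logic can also be sketched in a few lines of Python, which may help when sanity-checking the Excel results. This is only an illustration: the `cells` list is a small made-up subset in the same shape as the question's data, and `family_counts` is a hypothetical helper name.

```python
from collections import Counter

# A small made-up subset in the same shape as the question's column B.
cells = [
    "Dermestidae, Histeridae",
    "Histeridae, Dichotumius",
    "Histeridae",
    "Dermestidae, Histeridae",
    "Cleridae, Dichotumius",
]


def family_counts(cells):
    """Count how many samples mention each family."""
    counts = Counter()
    for cell in cells:
        # Split on commas and trim spaces; the set counts a family once per sample.
        families = {part.strip() for part in cell.split(",") if part.strip()}
        counts.update(families)
    return counts


for family, count in family_counts(cells).most_common():
    print(family, count)
```

Because the match is on the whole trimmed name rather than on a substring, a family whose name is contained in another family's name is not over-counted, which is something to watch for with wildcard matching.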

How to speed up dynamic columns with formulas in Power Query

The Question (How do I make it faster)
I have been playing around with Power Query in Excel for over a year now but for the first time, I have a query that takes 20+ minutes to run.
I am sure there is something here I can learn!
While it does currently work I believe if it was well-written it would run much faster.
Data Structure
There are two databases here
Database of Company (Aka attendees) - About 400 rows
Company Title
Rita Book
Paige Turner
Dee End
etc
Database of Events - About 500 rows
An Event can have many Company (Attendees). The database exports this as a comma-separated list in the column [#"Export CSV - Company"]
Event Title | Export CSV - Company | Date | Year
Event 1 | Rita Book, Dee End | 1/1/2015 | 2015
Event 2 | Paige Turner | 2/1/2015 | 2015
Event 3 | Dee End | 3/1/2015 | 2015
Event 4 | Rita Book, Paige Turner, Dee End | 1/1/2016 | 2016
etc | ... | ... | ...
Note that I also have a separate query called #"Company Event Count - 1 Years List" which is a list of all years that events have been run.
The Goal
For a visualization, I need to get the data into the following structure:
Company Title | 2015 | 2016 | etc
John Smith | 10 | 20 | ...
Jane Doe | 5 | 14 | ...
etc | ... | ... | ...
The Code
I have done my best to comment on my code below. Feel free to ask any questions.
let
    // This is a function. It was the only way I could figure out how to use [Company Title] from #"Keep only names column" and "currentColumnTitleYearStr" from the dynamically created columns in the same scope
    count_table_year_company = (myTbl, yearStr, companyStr) =>
        Table.RowCount(
            Table.SelectRows(
                myTbl,
                each Text.Contains([#"Export CSV - Company"], companyStr)
            )
        ),
    Source = #"Company 1 - Loaded CSV From Folder", // Grab a list of all Company
    #"Keep only names column" = Table.SelectColumns(Source,{"Company Title"}), // Keep only the [Company Title] field
    // Dynamically create columns for each year. Example Columns: [Company Title], [2015], [2016], [2017], etc
    #"Add Columns for each year" =
        List.Accumulate(
            #"Company Event Count - 1 Years List", // Get a table of all events
            #"Keep only names column",
            (state, currentColumnTitleYearStr) => Table.AddColumn(
                state,
                currentColumnTitleYearStr, // The Year becomes the column title and is also used in filters
                let // I hoped that filtering the table by Year at this point would mean it only has to do it once per column, instead of once per cell.
                    eventsThisYearTbl = Table.SelectRows(
                        #"Event 1 - Loaded CSV From Folder",
                        each ([Year] = Number.FromText(currentColumnTitleYearStr))
                    )
                in(
                    // Finally for each cell, calculate the count of events. E.g How many events did 'John Smith' attend in 2015
                    each count_table_year_company(eventsThisYearTbl, currentColumnTitleYearStr, [Company Title]) //CompanyTitleVar
                )
            )
        ),
    FinalStep = #"Add Columns for each year"
in
    FinalStep
My Theories
I believe one of a few things may be making it slow:
I am using List.Accumulate to dynamically create a column for each year. While this does work, I think it may be the wrong function for the job, especially because the state value, which is like a running total of every cell, must be huge.
I worry that I have an 'each' where I don't need it, but I can't seem to remove any. It's my understanding that every 'each' is effectively a nested loop, so removing one may have a dramatic impact on performance.
In Conclusion
While it does currently work I know there is something for me to learn here.
Thank you so much for any guidance or suggested readings you can provide :)
Does this do what you want? Converts from left to right. If not please explain more clearly
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    SplitNames = Table.TransformColumns(Source,{{"Names", each Text.Split(_,", ")}}),
    #"Expanded Names" = Table.ExpandListColumn(SplitNames, "Names"),
    #"Removed Columns" = Table.RemoveColumns(#"Expanded Names",{"Event Title", "Date"}),
    #"Added Custom" = Table.AddColumn(#"Removed Columns", "Count", each 1),
    #"Pivoted Column" = Table.Pivot(Table.TransformColumnTypes(#"Added Custom", {{"Year", type text}}, "en-US"), List.Distinct(Table.TransformColumnTypes(#"Added Custom", {{"Year", type text}}, "en-US")[Year]), "Year", "Count", List.Sum)
in
    #"Pivoted Column"
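To illustrate why the split, expand and pivot approach scales better than filtering the events table once per cell, here is a rough Python sketch of the same logic. It walks the events exactly once and tallies a count per company and year. The data is the four sample events from the question; `attendance_by_year` is a name I made up for the example.

```python
from collections import defaultdict

# The four sample events from the question: (title, attendee CSV, year).
events = [
    ("Event 1", "Rita Book, Dee End", 2015),
    ("Event 2", "Paige Turner", 2015),
    ("Event 3", "Dee End", 2015),
    ("Event 4", "Rita Book, Paige Turner, Dee End", 2016),
]


def attendance_by_year(events):
    """Return {company: {year: number of events attended}} in a single pass."""
    counts = defaultdict(lambda: defaultdict(int))
    for _title, companies, year in events:
        # Expand the comma-separated list into one item per attendee.
        for company in companies.split(","):
            counts[company.strip()][year] += 1
    return {company: dict(years) for company, years in counts.items()}


for company, years in attendance_by_year(events).items():
    print(company, years)
```

The work is proportional to the number of attendee entries, instead of companies times years times events as with a row filter evaluated per cell. Splitting into exact names also avoids a name matching inside a longer name, which a substring test such as Text.Contains can do.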

How to transpose data in Excel while using the 1st column as unique identifier?

I need help transposing some 3rd-party Excel output that comes in this format below:
Employee | Question | Response
Bob | Q1 | Yes
Bob | Q2 | No
Bob | Q3 | 100
Jane | Q1 | No
Jane | Q2 | No
Jane | Q3 | 50
Tom | Q1 | No
Tom | Q2 | Yes
Tom | Q3 | 0
Background:
This is survey data containing up to 10 questions and each employee MUST answer each question. So if data was collected from 10 employees for a survey of 3 questions, then the output file will contain (10x3) 30 rows of data
I need to rearrange this data for the "business side" and I realized that the desired output is beyond the scope of simply using TRANSPOSE() in Excel
Here is the final result that I've been asked to design
Employee | Q1 | Q2 | Q3
Bob | Yes | No | 100
Jane | No | No | 50
Tom | No | Yes | 0
Basically, I need 1-row per employee with each question horizontally lined up and their responses.
Is this even possible? If so, any help would be greatly appreciated!
cheers
This can also be accomplished using Power Query, available in Windows Excel 2010+ and Excel 365 (Windows or Mac)
It is a simple Pivot with no aggregation, and can actually be done entirely from the UI.
I did change your table 1 as Jane seems to have two different answers to Q2 - I suspect the numerical answer is really for Q3
To use Power Query
Select some cell in your Data Table
Data => Get&Transform => from Table/Range
When the PQ Editor opens: Home => Advanced Editor
Make note of the Table Name in Line 2
Paste the M Code below in place of what you see
Change the Table name in line 2 back to what was generated originally.
Read the comments and explore the Applied Steps to understand the algorithm
let
    //read the data
    //change the table name in next line to actual table name in your workbook
    Source = Excel.CurrentWorkbook(){[Name="Table34"]}[Content],
    //set the data types
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Employee", type text}, {"Question", type text}, {"Response", type any}}),
    //Pivot with no aggregation
    #"Pivoted Column" = Table.Pivot(#"Changed Type", List.Distinct(#"Changed Type"[Question]), "Question", "Response")
in
    #"Pivoted Column"
So this works, with an extra column in column D:
=A2&B2
So based on your data:
So, formula in cell B13:
=INDEX($C$2:$C$10,MATCH($A13&B$12,$D$2:$D$10,0))
The issue with your data is that Jane has two of Q2... I had to correct that.
For your list of names, you could copy the Employee column and use Remove Duplicates, or use UNIQUE().
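Both answers boil down to a pivot with no aggregation: look up the single response for each employee and question pair. A minimal Python sketch of that logic, using the sample data from the question, might look like this. The function name `pivot` and the choice to raise an error on a duplicate employee/question pair are my own assumptions.

```python
# Sample rows from the question: (employee, question, response).
responses = [
    ("Bob", "Q1", "Yes"), ("Bob", "Q2", "No"), ("Bob", "Q3", "100"),
    ("Jane", "Q1", "No"), ("Jane", "Q2", "No"), ("Jane", "Q3", "50"),
    ("Tom", "Q1", "No"), ("Tom", "Q2", "Yes"), ("Tom", "Q3", "0"),
]


def pivot(rows):
    """Return a header row plus one row per employee, one column per question."""
    questions = []  # column order = order of first appearance
    table = {}      # employee -> {question: response}
    for employee, question, response in rows:
        if question not in questions:
            questions.append(question)
        answers = table.setdefault(employee, {})
        if question in answers:
            # Without an aggregation, a duplicate pair cannot be placed in one cell.
            raise ValueError(f"duplicate answer for {employee} / {question}")
        answers[question] = response
    header = ["Employee"] + questions
    body = [[emp] + [answers.get(q, "") for q in questions]
            for emp, answers in table.items()]
    return [header] + body


for line in pivot(responses):
    print(" | ".join(line))
```

The duplicate check mirrors the problem both answers mention: if one employee has two rows for the same question, the data has to be corrected (or an aggregation chosen) before pivoting.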

VERY messy data: How do I clean up horizontal data that is all inconsistent? [VBA] [closed]

I keep trying ways to fix a dataset, but keep running into problems because of how inconsistent it is.
Here's what the data looks like:
Entry1
Age
45
Occupation
Scientist
Phone Number
408-283-3721
User I.D.
390842
Housing Type
Condo
Square Footage
1073.29
Floors
2
Bathrooms
2.5
Budget Max
$289,287
Household Size
3
Pets?
Yes
Entry2
Floors
2
Square Footage
1974.19
User I.D.
379733
Phone Number
312-246-9121
Pets?
No
Budget Max
$481,621
Household Size
4
Bathrooms
3
Housing Type
Apartment
Occupation
Pilot
Age
32
Entry3
User I.D.
379621
Floors
1
Square Footage
1223.12
Pets?
No
Occupation
Managing Director
Budget Max
$402,342
Phone Number
714-343-1358
Household Size
2
Age
31
Bathrooms
2
Housing Type
House
I want to create a new, cleaned dataset with headers along the top (e.g. "Age", "Occupation", etc) and the values associated (to the right of each variable name cell) as the row, underneath each column.
The variable names are all mixed up, not always on the same column or relative row, so it's not only transposing into a clean new dataset but finding the appropriate values depending on where the variable is (so, I'm thinking something like .Cells.Find(What:="the variable name") for each one and somehow returning the value next to it in a loop). Then, there's the issue where some entries have 3 rows and 8 columns and others 4 rows and 6 columns (not all rows being full too). I also struggle with placing the values under the appropriate column header and not replace the former value. (i.e not just changing one cell but adding to the one below and so on)
There are over 400 records like this, so doing it manually would be super tedious. I'm fairly certain these are all the variations though.
Loop through the data row by row.
If only the first column has data it is the header of an entry. Write that to a new workbook column A.
Entry Name
Entry1
Then go to the next row. If 2 or more columns have data, it is a data row belonging to the previous entry. Data rows contain data in blocks of 2 cells, where the first cell is the data description and the second cell is the data value.
So you need to loop through the columns of the data rows in blocks of 2:
Take the first block which is Age | 45
Check if the column Age exists. Here it does not, so we name the next free column Age and fill in the data for the last entry:
Entry Name | Age
Entry1 | 45
Then we move on to the next block Occupation | Scientist and do the same. Check if a column Occupation exists? No, ok insert next free column:
Entry Name | Age | Occupation
Entry1 | 45 | Scientist
We do this until the entire row is done, then we move over to the next one and if this is a data row too, we keep going until we find a new entry header.
So after the first entry your data would look like this:
Entry Name | Age | Occupation | Phone Number | User I.D. | Housing Type | Square Footage | Floors | Bathrooms | Budget Max | Household Size | Pets?
Entry1 | 45 | Scientist | 408-283-3721 | 390842 | Condo | 1073.29 | 2 | 2.5 | $289,287 | 3 | Yes
Then you move over to the next entry
Entry Name | Age | Occupation | Phone Number | User I.D. | Housing Type | Square Footage | Floors | Bathrooms | Budget Max | Household Size | Pets?
Entry1 | 45 | Scientist | 408-283-3721 | 390842 | Condo | 1073.29 | 2 | 2.5 | $289,287 | 3 | Yes
Entry2
The first data pair here is Floors | 2, so you search the first row for Floors. It is found in column 8, so we write 2 into column 8.
Entry Name | Age | Occupation | Phone Number | User I.D. | Housing Type | Square Footage | Floors | Bathrooms | Budget Max | Household Size | Pets?
Entry1 | 45 | Scientist | 408-283-3721 | 390842 | Condo | 1073.29 | 2 | 2.5 | $289,287 | 3 | Yes
Entry2 | | | | | | | 2 | | | |
If you keep that going you have cleaned up data in the end.
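The loop described above can be sketched in Python to show the control flow before writing it in VBA. This is a sketch under assumptions, not a finished macro: the sheet is modelled as a list of rows of strings, empty cells are empty strings, label/value pairs are assumed to sit in adjacent cells starting in the first column, and `tidy` plus the sample `grid` are made up for the example.

```python
def tidy(grid):
    """Turn entry-header rows and label/value rows into columns and records."""
    columns = ["Entry Name"]  # header row of the cleaned sheet
    records = []              # one dict per entry
    for row in grid:
        filled = [cell for cell in row if cell != ""]
        if not filled:
            continue  # skip blank rows
        if len(filled) == 1 and row[0] != "":
            # Only the first column has data: this row starts a new entry.
            records.append({"Entry Name": row[0]})
            continue
        if not records:
            continue  # data row before any entry header: nothing to attach it to
        # Walk the row in blocks of 2 cells: label, then value.
        for i in range(0, len(row) - 1, 2):
            label, value = row[i], row[i + 1]
            if label == "":
                continue
            if label not in columns:
                columns.append(label)  # next free column gets this header
            records[-1][label] = value
    return columns, records


# Made-up sample in the shape described in the question.
grid = [
    ["Entry1", "", "", ""],
    ["Age", "45", "Occupation", "Scientist"],
    ["Floors", "2", "Pets?", "Yes"],
    ["Entry2", "", "", ""],
    ["Floors", "2", "Age", "32"],
    ["Pets?", "No", "", ""],
]

columns, records = tidy(grid)
print(" | ".join(columns))
for record in records:
    print(" | ".join(record.get(col, "") for col in columns))
```

Keeping each entry as a dictionary keyed by label is what prevents a later value from overwriting an earlier one: the value always lands under its own header, whatever order the labels appear in.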
If your real data corresponds to your example, where all the parameters are spelled identically, you can do this using Power Query.
If there are variations in your data that this table doesn't show, examples of these variations would be needed to craft a better solution.
Select some cell in your Data Table
Data => Get&Transform => from Table/Range
Select My data does NOT have headers
When the PQ Editor opens: Home => Advanced Editor
Make note of the Table name in Line 2
Paste the M Code below in place of what you see
Change the Table name in line 2 back to what was generated originally.
Read the comments and explore the Applied Steps to understand the algorithm
M Code (Modified to deal with missing Parameter Values)
let
    Source = Excel.CurrentWorkbook(){[Name="Table3"]}[Content],
    //Add grouping column Entries and Unpivot
    #"Added Custom" = Table.AddColumn(Source, "Entry", each
        if Text.StartsWith([Column1],"entry",Comparer.OrdinalIgnoreCase) then [Column1] else null),
    #"Filled Down" = Table.FillDown(#"Added Custom",{"Entry"}),
    //Remove extra entry rows
    remRows = Table.SelectRows(#"Filled Down", each [Entry] <> [Column1]),
    //Table.ReplaceValue(#"Removed Columns1"," ",null,Replacer.ReplaceValue,{"Value"})
    //Replace nulls with space so we don't lose one item of a "pair"
    #"Replaced Value" = Table.ReplaceValue(remRows,null," ",Replacer.ReplaceValue,Table.ColumnNames(remRows)),
    #"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Replaced Value", {"Entry"}, "Attribute", "Value"),
    #"Removed Columns" = Table.RemoveColumns(#"Unpivoted Other Columns",{"Attribute"}),
    #"Added Index" = Table.AddIndexColumn(#"Removed Columns", "Index", 0, 1, Int64.Type),
    #"Inserted Integer-Division" = Table.AddColumn(#"Added Index", "Integer-Division", each Number.IntegerDivide([Index], 2), Int64.Type),
    #"Removed Columns1" = Table.RemoveColumns(#"Inserted Integer-Division",{"Index"}),
    //Group in pairs
    //Mark blank subTables
    //Extract Entry, Parameter and Value
    #"Grouped Rows" = Table.Group(#"Removed Columns1", {"Integer-Division"}, {
        {"Empties", each List.NonNullCount(List.ReplaceValue(_[Value]," ", null,Replacer.ReplaceValue))},
        {"Entry", each _[Entry]{0}},
        {"Parameter", each _[Value]{0}},
        {"Value", each _[Value]{1}}
        }),
    #"Filtered Rows" = Table.SelectRows(#"Grouped Rows", each ([Empties] <> 0)),
    #"Removed Columns2" = Table.RemoveColumns(#"Filtered Rows",{"Integer-Division", "Empties"}),
    //Group by Entry, Pivot and expand
    #"Grouped Rows1" = Table.Group(#"Removed Columns2", {"Entry"}, {
        {"Pivot", each Table.Pivot(_, _[Parameter], "Parameter","Value")}
        }),
    //Column name list for the Pivot table.
    //Could sort or order them any way you want
    colNames = List.Distinct(#"Removed Columns2"[Parameter]),
    #"Expanded Pivot" = Table.ExpandTableColumn(#"Grouped Rows1", "Pivot", colNames,colNames)
in
    #"Expanded Pivot"
Original Data
Transformed

Filter companies that have at least 3 specific products

I have an Excel pivot table (and a table dataset behind it) with a structure like the one below. How can I filter/show only companies (Col A) with Products (Col B) 1 AND 2 AND 3? It sounds easy, but I can't find a way to do it. Achieving this with Power Query (available in Power BI or Excel) would be no problem.
A1: Company 1 | B1: Product 1
A2: Company 1 | B2: Product 2
A3: Company 1 | B3: Product 3
A4: Company 1 | B4: Product 4
A5: Company 2 | B5: Product 1
A6: Company 3 | B6: Product 1
A7: Company 4 | B7: Product 1
A8: Company 4 | B8: Product 2
A9: Company 4 | B9: Product 3
A10: Company 4 | B10: Product 4
A11: Company 4 | B11: Product 5
Here's an approach using Power Query.
Starting with this brought into Power Query from the table in Excel:
I then group on Company (Transform > Group By):
Then I add a new custom column (Add Column > Custom Column) to flag whether each company has the 3 products included in its associated grouped table's Product column:
Then I filter out the FALSE entries from the new custom column (use button at top right of Custom column):
Then I expand the Products column from the embedded table in the AllData column (use button at top right of AllData column).
Then I remove the Custom column:
Here's the M code:
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Company", type text}, {"Product", type text}}),
    #"Grouped Rows" = Table.Group(#"Changed Type", {"Company"}, {{"AllData", each _, type table}}),
    #"Added Custom" = Table.AddColumn(#"Grouped Rows", "Custom", each List.ContainsAll([AllData][Product], {"Product 1","Product 2","Product 3"})),
    #"Filtered Rows" = Table.SelectRows(#"Added Custom", each ([Custom] = true)),
    #"Expanded AllData" = Table.ExpandTableColumn(#"Filtered Rows", "AllData", {"Product"}, {"Product"}),
    #"Removed Columns" = Table.RemoveColumns(#"Expanded AllData",{"Custom"})
in
    #"Removed Columns"
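The group-then-test logic of the query above (collect each company's products, then keep the company only if the required products are all present) can be sketched in Python as follows. The rows are the sample data from the question; `companies_with_all` and `REQUIRED` are names I made up for the illustration.

```python
# Sample rows from the question: (company, product).
rows = [
    ("Company 1", "Product 1"), ("Company 1", "Product 2"),
    ("Company 1", "Product 3"), ("Company 1", "Product 4"),
    ("Company 2", "Product 1"),
    ("Company 3", "Product 1"),
    ("Company 4", "Product 1"), ("Company 4", "Product 2"),
    ("Company 4", "Product 3"), ("Company 4", "Product 4"),
    ("Company 4", "Product 5"),
]

REQUIRED = {"Product 1", "Product 2", "Product 3"}


def companies_with_all(rows, required):
    """Return companies whose product set contains every required product."""
    products = {}
    for company, product in rows:
        # Group step: one set of products per company.
        products.setdefault(company, set()).add(product)
    # Filter step: the subset test plays the role of List.ContainsAll.
    return [company for company, owned in products.items() if required <= owned]


print(companies_with_all(rows, REQUIRED))
```

Using a set per company means duplicate company/product rows do not affect the result, since the test is about presence rather than row counts.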
Basically, you'll need to do a couple of things to do this entirely in Excel:
Add a new table that lists the products, with a column indicating whether that product is included/flagged:
Update your company/product table to have 2 helper columns: One to VLOOKUP whether the product is flagged, and one to indicate whether a company has all 3 flagged products:
The first helper column would use a formula like =VLOOKUP([@Product],tProducts,2,FALSE).
The second helper column would use a formula like =COUNTIFS([Company],[@Company],[Product Flagged],TRUE)>=3.
Rows with a TRUE in Column D have 1 each of Products 1, 2, and 3 (unless you have rows with duplicate company/product combinations, where it gets a bit trickier):
In your pivot table, you can filter by this helper column:

Resources