How to find non-unique combinations of two variables in Excel?

I have two columns in Excel that identify individual records (ID and code). Some of these may occur multiple times. For some records, the code may be missing. And some IDs belong to multiple codes. I need to find these IDs that have non unique associations with a given code, and show what those are.
Minimal example:
ID code
K151 ABC
K152 BCD
K153 EFG
K154
K151 ABC
K154 HDG
K153 EFF
K151 ABC
K153 EFG
So I need to have a list (possibly with the number of occurrences):
ID code freq
K153 EFG 2
K153 EFF 1
K154 1
K154 HDG 1
It is fairly easy to do something similar using a Pivot Table, but note that e.g. K151 - ABC occurs 3 times, and it should not be listed, just those IDs that have multiple codes. Also, in the pivot table the codes are collapsed under the ID as parent category and they are not shown side by side.
It is also OK, if the non-uniquely coded IDs are flagged in the original table in a new variable, and then these records can be filtered manually using the flag.
ID code flag
K151 ABC 0
K152 BCD 0
K153 EFG 1
...
K153 EFF 1
...
I need to find a solution in Excel (2013), not VBA or anything else, and ideally the solution should also be compatible with LibreOffice Calc.

The flagging will be simple.
Formula in C2 downwards:
=COUNTIF($A:$A,"="&A2)<>COUNTIFS($A:$A,"="&A2,$B:$B,"="&B2)
Formula in D2 downwards:
=IF(C2,COUNTIFS($A:$A,"="&A2,$B:$B,"="&B2),0)
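For OpenOffice we need SUMPRODUCT in place of COUNTIFS, because the count will not work there when the code is blank (adjust the $A$1:$A$20 and $B$1:$B$20 ranges to your data).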
Formula in C2 downwards:
=COUNTIF($A$1:$A$20,"="&A2)<>SUMPRODUCT(($A$1:$A$20=A2)*($B$1:$B$20=B2))
Formula in D2 downwards:
=IF(C2,SUMPRODUCT(($A$1:$A$20=A2)*($B$1:$B$20=B2)),0)
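
For readers who want to check the flagging logic outside a spreadsheet, here is a minimal Python sketch of the same comparison: a row is flagged when the number of rows with its ID differs from the number of rows with its exact ID and code pair. The function name `flag_rows` and the use of `None` for a missing code are assumptions made for illustration, not part of the answer above.

```python
from collections import Counter

# Sample rows from the question; None stands for a missing code (assumption).
rows = [
    ("K151", "ABC"), ("K152", "BCD"), ("K153", "EFG"), ("K154", None),
    ("K151", "ABC"), ("K154", "HDG"), ("K153", "EFF"), ("K151", "ABC"),
    ("K153", "EFG"),
]


def flag_rows(rows):
    """Return (id, code, flag, freq) per row, mirroring columns C and D."""
    id_counts = Counter(id_ for id_, _ in rows)  # plays the role of COUNTIF on column A
    pair_counts = Counter(rows)                  # plays the role of COUNTIFS on A and B
    result = []
    for id_, code in rows:
        # Flag the row when its ID also occurs with some other code.
        flag = id_counts[id_] != pair_counts[(id_, code)]
        freq = pair_counts[(id_, code)] if flag else 0
        result.append((id_, code, flag, freq))
    return result


flagged = flag_rows(rows)
for row in flagged:
    print(row)
```

Filtering the result on the flag then gives the rows that belong to IDs with more than one code.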

Copy the first two columns and paste them in a different location.
Select the pasted data and use the Remove Duplicates tool in the Data tab.
In a third column, use COUNTIFS with both columns as criteria (against the original data) to get the count of each remaining row.
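
The same steps can be sketched in Python. Note that the last filtering step (keeping only IDs with more than one distinct code) is an addition made here to match the output requested in the question; it is not part of the answer above. Variable names are illustrative and `None` stands for a missing code.

```python
from collections import Counter

# Sample rows from the question; None stands for a missing code (assumption).
rows = [
    ("K151", "ABC"), ("K152", "BCD"), ("K153", "EFG"), ("K154", None),
    ("K151", "ABC"), ("K154", "HDG"), ("K153", "EFF"), ("K151", "ABC"),
    ("K153", "EFG"),
]

# Step 1 and 2: copy the pairs and remove duplicates, keeping first-seen order.
unique_pairs = list(dict.fromkeys(rows))

# Step 3: count how often each (ID, code) pair occurs in the original data.
pair_counts = Counter(rows)

# Extra step (not in the answer): keep only IDs with more than one distinct code.
codes_per_id = Counter(id_ for id_, _ in unique_pairs)
report = [
    (id_, code, pair_counts[(id_, code)])
    for id_, code in unique_pairs
    if codes_per_id[id_] > 1
]

for line in report:
    print(line)
```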

Related

Count unique values in column A based on a moving range of criteria in column B

Okay, so this is my first question, let's hope I can explain it well...
Essentially, I would like to count the number of unique values in column A, but from a subset of those which have, in column B, a value that falls within a specified range.
Here's an example:
ColumnA ColumnB
potato 29.1
potato 29.7
potato 30.3
potato 31.0
bean 31.6
apple 32.2
apple 32.8
bean 33.5
bean 34.0
apple 34.3
potato 35.0
Count b/w 29-31: 1
Count b/w 30-32: 2
Count b/w 31-33: 3
Count b/w 32-34: 2
Count b/w 33-35: 3
In other words, I want to know how many unique items are present within each range (as specified by column B), and I want to carry that down through a series of overlapping ranges.
So far, the best I've been able to come up with is a COUNTIFS formula that counts the total number of records in each range. e.g.:
=COUNTIFS(B1:B11,">=29",B1:B11,"<=31")
=COUNTIFS(B1:B11,">=30",B1:B11,"<=32")
=COUNTIFS(B1:B11,">=31",B1:B11,"<=33")
etc...
And this obviously doesn't even reference column A. I've tried a few different array formulas based on similar questions, but they're always solving a slightly different problem, so I've been largely unsuccessful.
Any help much appreciated! Thank you.
You would use this array formula:
=SUM(IF(($B$2:$B$12>=A16)*($B$2:$B$12<=B16),(1/COUNTIFS($A$2:$A$12,$A$2:$A$12,$B$2:$B$12,">=" & A16,$B$2:$B$12,"<=" & B16))))
Being an array formula, it must be confirmed with Ctrl-Shift-Enter when exiting edit mode. If done correctly, Excel will put {} around the formula automatically.
It finds all the rows where the value in B lies between the bounds in A16 and B16, then uses 1/COUNTIFS() so that every distinct item in that range contributes exactly 1 to the sum.
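
To see why summing reciprocals counts distinct items, here is a small Python sketch of the same idea using the data from the question. An item that appears n times inside the range contributes n terms of 1/n, which add up to 1. The function name `distinct_in_range` is an assumption for illustration, and `Fraction` is used only to keep the arithmetic exact.

```python
from fractions import Fraction

items = ["potato", "potato", "potato", "potato", "bean", "apple",
         "apple", "bean", "bean", "apple", "potato"]
values = [29.1, 29.7, 30.3, 31.0, 31.6, 32.2, 32.8, 33.5, 34.0, 34.3, 35.0]


def distinct_in_range(items, values, low, high):
    """Count distinct items whose value lies in [low, high] via 1/count terms."""
    # Keep only the items whose value falls inside the range (the IF part).
    in_range = [item for item, v in zip(items, values) if low <= v <= high]
    # Each occurrence adds 1/n, where n is how often that item occurs in range
    # (the 1/COUNTIFS part), so every distinct item sums to exactly 1.
    return int(sum(Fraction(1, in_range.count(item)) for item in in_range))


for low in range(29, 34):
    print(f"Count b/w {low}-{low + 2}:", distinct_in_range(items, values, low, low + 2))
```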

Equation/ sorting to separate one long column of data into separate columns

I have two columns of data in an Excel spreadsheet; each date has three numbers associated with it. It is shown like this:
1 112
1 123
1 456
2 788
2 989
2 901
What I am trying to do is have the data shown like this:
1
112
123
456
Then in another column next to it have;
2
788
989
901
Okay, this can be done pretty easily/quickly.
First, select your entire column that has # ### and go to Data --> Text to Columns, and choose “Delimited”, then use a “Space” delimiter. This will separate your numbers by the space, so 1 and 2 will be in Column A, and the three digit numbers are in B (or wherever you decide to put them).
Then, just get the unique values from column A. I tend to copy the entire column to a temporary column (or worksheet), then highlight them and go to Data --> Remove Duplicates. Now you have a list of unique numbers. Copy these and paste them transposed into row 1, starting at (for example) D1.
Then, in D2, enter this formula (adjust ranges as necessary) as an array, using CTRL+SHIFT+ENTER:
=IFERROR(INDEX($B$1:$B$6,SMALL(IF($A$1:$A$6=D$1,ROW($B$1:$B$6)-ROW($B$1)+1),ROWS($B$2:$B2))),"")
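
As a cross-check of what the formula produces, here is a minimal Python sketch of the same reshaping: each distinct value from the first column becomes a column header, and the matching numbers are listed underneath in their original order. The function name `to_columns` is an assumption for illustration.

```python
# Rows from the question after splitting: (key, number).
pairs = [(1, 112), (1, 123), (1, 456), (2, 788), (2, 989), (2, 901)]


def to_columns(pairs):
    """Group numbers under their key, preserving the original row order."""
    columns = {}
    for key, value in pairs:
        # setdefault creates the column the first time a key is seen.
        columns.setdefault(key, []).append(value)
    return columns


columns = to_columns(pairs)
for header, numbers in columns.items():
    print(header, numbers)
```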

Automatically working out the average of filtered results

I have a spreadsheet where column P holds a score between 1 and 6.
Cell O4 has the following formula: =AVERAGEIFS(P8:P5000,P8:P5000,"<>6",P8:P5000,"<>0")
This formula calculates the average of the scores in column P, excluding 6, blanks and 0.
Column O has staff names, e.g. John, Mark, Tim...
What I want is for cell O4 to automatically calculate the average of the figures shown in column P after I have used the filter function to show only the results of a selected staff member.
I was hoping Excel might do this automatically; however, cell O4 still shows the average of the whole of column P regardless of whether I have filtered or not.
I was given the formula below on another forum, but it seems to give slightly wrong results. The difference is small, but I need the results to be exact if possible. Any help appreciated.
=SUMPRODUCT(1-ISNUMBER(MATCH(P8:P100,{0,6},0)),SUBTOTAL(9,OFFSET(P8,ROW(P8:P100)-ROW(P8),0,1)))/SUMPRODUCT(1-ISNUMBER(MATCH(P8:P100,{0,6},0)),SUBTOTAL(2,OFFSET(P8,ROW(P8:P100)-ROW(P8),0,1)))
Maybe
{=AVERAGE(IF((P8:P5000<>6)*(P8:P5000<>0)*SUBTOTAL(103,INDIRECT("O"&ROW(8:5000))),P8:P5000))}
will do what you want, assuming the filter is on column O.
The 103 in SUBTOTAL will also exclude rows that are hidden manually. If this is unwanted and only rows hidden by the filter should be excluded, use 3 instead.
This is an array formula. Enter it into the cell without the curly brackets and then press [Ctrl]+[Shift]+[Enter] to create the array formula.
I would create a separate table in a new sheet with all unique staff members and then perform the calculation. This way, you can quickly compare values for all staff just by scanning the table instead of having to constantly update the filter to see the values for potentially dozens or hundreds of staff. You would add the staff name range and criteria to your AVERAGEIFS formula.
For your example:
Sheet 2
A B
--- ---
1 | Staff Average
2 | Bob =AVERAGEIFS(Sheet1!$P$8:$P$5000,Sheet1!$O$8:$O$5000,A2,Sheet1!$P$8:$P$5000,"<>6",Sheet1!$P$8:$P$5000,"<>0")
3 | Mary =AVERAGEIFS(Sheet1!$P$8:$P$5000,Sheet1!$O$8:$O$5000,A3,Sheet1!$P$8:$P$5000,"<>6",Sheet1!$P$8:$P$5000,"<>0")
4 | Joe =AVERAGEIFS(Sheet1!$P$8:$P$5000,Sheet1!$O$8:$O$5000,A4,Sheet1!$P$8:$P$5000,"<>6",Sheet1!$P$8:$P$5000,"<>0")
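
The per-staff summary can also be sketched in Python to verify the numbers by hand. The records below are hypothetical sample data (the question does not list actual scores), `None` stands for a blank cell, and the function name `average_scores` is an assumption for illustration.

```python
# Hypothetical (staff name, score) records; None stands for a blank score.
records = [
    ("Bob", 3), ("Bob", 6), ("Bob", 5),
    ("Mary", 0), ("Mary", 2), ("Mary", None),
    ("Joe", 4),
]


def average_scores(records, excluded=(0, 6)):
    """Average score per staff member, skipping blanks and excluded values."""
    totals = {}
    for name, score in records:
        if score is None or score in excluded:
            continue  # same effect as the "<>6" and "<>0" criteria plus blanks
        total, count = totals.get(name, (0, 0))
        totals[name] = (total + score, count + 1)
    return {name: total / count for name, (total, count) in totals.items()}


averages = average_scores(records)
for name, value in averages.items():
    print(name, value)
```

A staff member whose scores are all excluded simply does not appear in the result, which avoids a division by zero.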

Complex Conditional Formatting

Is this even possible? Here's what I'm trying to do:
This is a gigantic spread sheet with lots of data on lots of different things, one particular section of it is set up like so:
name1 name2 num1 num2
john smith 3
jane doe 5
samuel jackson 0
jackie chan 2 12
abe lincoln 19
Most of the time num2 is going to be left blank, but if there is an entry, I want to concatenate name1 and name2, with the space, and then apply the conditional formatting to cells in the spreadsheet that contain the concatenated name.
So in the above example, ANY cells in the spreadsheet containing "jackie chan" will be the target cells for the conditional formatting.
Any advice will be appreciated!
I quickly copied your data to Excel and tried to solve the problem.
In a column next to the data (E) I concatenated the two names if num2 was not blank, using the expression =IF(NOT(ISBLANK(D2));A2&" "&B2;""); otherwise the cell is left empty.
I created a small test column containing some names that appear in column E and some that do not. Then I used the expression =NOT(ISERROR(VLOOKUP(INDIRECT(ADDRESS(ROW();COLUMN()));$E$2:$E$6;1;0))) to conditionally format the cells in the test column. Here INDIRECT(...) gets the cell's own value; if VLOOKUP does not find a match in column E, it returns an #N/A error, which is caught by the ISERROR function (not ISERR!).
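
The two steps above (build the helper column, then test each cell against it) can be sketched in Python as follows. The sample rows assume that the single number in most rows belongs to num1 and that num2 is blank, which the flattened table in the question does not make explicit. The function names are hypothetical, and like the exact-match VLOOKUP, the check requires the whole cell value to equal the concatenated name.

```python
# (name1, name2, num1, num2); None stands for a blank num2 (assumption).
rows = [
    ("john", "smith", 3, None),
    ("jane", "doe", 5, None),
    ("samuel", "jackson", 0, None),
    ("jackie", "chan", 2, 12),
    ("abe", "lincoln", 19, None),
]


def names_to_highlight(rows):
    """Helper column E: the concatenated name for every row that has a num2."""
    return {f"{first} {last}" for first, last, _num1, num2 in rows if num2 is not None}


def should_format(cell_value, targets):
    """Conditional-format test: True when the cell matches a helper-column entry."""
    return cell_value in targets


targets = names_to_highlight(rows)
print(should_format("jackie chan", targets))
print(should_format("john smith", targets))
```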

Cross referencing Excel worksheets

I'm working with 3 worksheets.
PROJECTS consists of the following:
Project ClientCode Code
------ ---------- ----
Project1 ABC 123
Project2 ABC 456
Project3 DEF 789
INVOICES consists of:
ProjectCode Amount
----------- -----
123 $100
789 $200
123 $50
And CLIENTS consists of:
Code Total
---- -----
ABC [$150]
DEF [$200]
I'm trying to create a formula which will populate the "Total" field on the client sheet by determining which invoices belong to which project belong to which client. I feel like it would be a combination of SUMIF and LOOKUP, but I'm stumped.
EDIT: Revised the above to the format discussed below (swapped Projects column B and C)
Using VLOOKUP and SUMIF in a single cell without any helper column is possible, but you will need to interchange the positions of columns ClientCode and Code in PROJECTS for it to work.
Interchange the column positions as mentioned above (so that ClientCode is before Code), then use:
=SUMIF(INVOICES!A:A, VLOOKUP(CLIENTS!A2, PROJECTS!B:C, 2, 0), INVOICES!B:B)
I'm assuming that row 1 of each worksheet has the column headers. A2 here refers to ABC.
VLOOKUP first looks for the Code of the ClientCode and SUMIF then sums the amounts of matched Code in the INVOICES worksheet.
EDIT: The formula below should work better, since VLOOKUP finds only the first match, which is not enough here because a client can have several projects. It assumes the PROJECTS layout shown in the question (ClientCode in column B, Code in column C).
=SUM(SUMIF(INVOICES!A:A,IF(CLIENTS!A2=PROJECTS!B:B,PROJECTS!C:C),INVOICES!B:B))
Note that you have to confirm this formula with Ctrl+Shift+Enter. Once it works for ABC, you can fill the formula down for the remaining clients. Also note that this formula can take some time to evaluate, so it may be better to replace the whole-column references with appropriately sized ranges. For example, if INVOICES has only 100 rows, change INVOICES!A:A and INVOICES!B:B to INVOICES!A2:A100 and INVOICES!B2:B100; the same goes for the other ranges in this formula.
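
To check the expected totals independently of the spreadsheet, here is a minimal Python sketch of the same two-step lookup using the sample data from the question: first collect every project code that belongs to a client, then sum the invoices for those codes. The function name `client_total` is an assumption for illustration.

```python
# (Project, ClientCode, Code) rows from the PROJECTS sheet.
projects = [
    ("Project1", "ABC", 123),
    ("Project2", "ABC", 456),
    ("Project3", "DEF", 789),
]

# (ProjectCode, Amount) rows from the INVOICES sheet.
invoices = [(123, 100), (789, 200), (123, 50)]


def client_total(client_code, projects, invoices):
    """Sum all invoice amounts whose project code belongs to the given client."""
    # All project codes for this client, not just the first match.
    codes = {code for _name, client, code in projects if client == client_code}
    return sum(amount for code, amount in invoices if code in codes)


for client in ("ABC", "DEF"):
    print(client, client_total(client, projects, invoices))
```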
