Find distinct values based upon multiple columns - excel

I have a spreadsheet of sales with (to keep the example simple) 3 columns
NAME -- STATE -- COUNTRY
It's easy to find how many sales. (sum all the lines)
I can find out how many customers I have but how about finding out how many customers from a particular state (and country)
NAME -- STATE -- COUNTRY
p1----- CA------ USA
p2----- CA------ USA
p1----- CA------ USA
p1----- CA------ USA
p3----- NY------ USA
p3----- NY------ USA
The above example would give 2 unique customers from CA and 1 unique customer from NY and 3 from the USA
EDIT:
The desired result from the above table would be
STATE - UNIQUE CUSTOMERS
CA ---- 2
NY ---- 1
COUNTRY - UNIQUE CUSTOMERS
USA ---- 3

Assuming your data have headers in row 1 of columns A, B, and C, follow these directions.
In cell F1 enter STATE.
In cell G1 enter COUNT.
In cell F2 enter this array-formula (must be confirmed with Ctrl+Shift+Enter↵):
=IFERROR(INDEX(B$2:INDEX(B:B,COUNTA(B:B)),MATCH(0,COUNTIF(F$1:F1,B$2:INDEX(B:B,COUNTA(B:B))),)),"")
In cell G2 enter this regular formula (confirmed with Enter):
=IF(LEN(F2),COUNTIF(B2:B13,F2),"")
Select F2:G2 and copy.
Now select F3:F51 and paste.
UPDATE
The nature of the question changed. The first formula is exactly the same as before. It gets the distinct states in the source data and culls them so they display with no blanks.
The second formula is now different. It needs to count the number of distinct customers in each state, and it is now an array formula confirmed with Ctrl+Shift+Enter↵).
=IF(LEN(F2),SUM(IF(F2=$B$2:$B$50,1/(COUNTIFS($B$2:$B$50,F2,$A$2:$A$50,$A$2:$A$50)),)),"")

This formula (entered as an array formula CTRL-SHIFT-ENTER) will count the number of occurrences of a Name in MyState
=COUNTIFS(Names,Names,States,MyState)
So if MyState="CA" this would return {3;1;3;3;0;0}
To get the number of names in CA you can sum the reciprocals of this array, EXCEPT taking the reciprocal of zero is invalid/infinite. So wrap the formula above in a test for zero: if it's zero, output zero, otherwise take the reciprocal (one of the rare situations where you get to set infinity equal to zero!):
=IF(COUNTIFS(Names,Names,States,MyState)=0,0,1/COUNTIFS(Names,Names,States,MyState))
(Still an array formula.)
For CA this will return {0.333333;1;0.333333;0.333333;0;0}
The final step is to sum with the array formula:
=SUM(IF(COUNTIFS(Names,Names,States,MyState)=0,0,1/COUNTIFS(Names,Names,States,MyState)))
It's possible that this could return say 2.99999... instead of 3 due to rounding errors. If that's a problem you can fix it by wrapping it with the ROUND function or setting the display format zero decimal places.
It should be straightforward to modify this to count by country. Hope that helps.

Since the question also has an 'google-spreadsheets' tag, this would be my suggested formula to use in a google spreadsheet:
For the state counts:
=query(unique(ArrayFormula({A2:A&B2:B, A2:C})), "select Col3, count(Col1) where Col3 <> '' group by Col3 label count(Col1)''",0)
And for the country counts:
=query(unique(ArrayFormula({A2:A&B2:B&C2:C, C2:C})), "select Col2, count(Col1) where Col2 <> '' group by Col2 label count(Col1)''",0)
Also see this example spreadsheet.

Easy with a PivotTable in Excel 2013:
Also easy with Google Spreadsheets:

Related

Displaying a label, sorted in a third column pulled from the first column according to data in the second column? i.e. Ranking

Imagine you have spreadsheet with data in a fixed # of contiguous rows.. let's say row 1 through row 20
Now let's say you have 3 columns of interest.
A, B and C
Column A is a label column.. the data in there are just string labels.. let's say types of canned food.. Tuna, Spam, Sardines, etc.
Column B is our number column.. let's say it is prices. e.g. 2 for Tuna, 5 for Spam and 3 for Sardines. These prices can change often very rapidly.. ok so prices are not the best example but let's imagine that prices change rapidly.
Now Column C is where we want to put the formula.
I would like to have a formula in Column C that will pull the labels from Column A, based on their prices in column B and rank them from highest to lowest.. that is C1 would calculate to "Spam", C2 to "Sardines" and C3 to "Tuna"
right now there are 20 rows of data.. but maybe at some other point there might be 30 or 6 or 40, etc.
So can someone help me out with the formula or at least explain what functions I need to use and the general idea involved? thanks
=IF(A2:A200<>"";SORTBY(A2:A200;B2:B200;-1);"")
You can simply use SORT formula. In this case =SORT(A1:B1000,2,-1) where A1:B1000 is range to be sorted, second parameter 2 is column number from range to sort by, 3rd parameter for order (-1 is desceding).
Place formula in C1 and you will get spilled array.

How to use VLOOKUP function in MS Excel

I am very very new to Excel
I have two sheets
Sheet 1
Country PMU Cluster
A Asia Mercury
B Australia Venus
C North America Jupiter
All the countries and continents are unique here
In sheet 2
I have
CountryCode Country PMU Cluster
123 A
234 A
453 B
235 C
1 country can have multiple codes
I have to take the PMU and Cluster and merge it with Sheet 2 , sheet 2 will have an additional column of Country Code.
Any help is very much apprciated.
Replacing my answer per your edits.
I'm just doing this on a single sheet but you can easily adapt by pointing to your other sheet for your lookup array.
Here is the formula for cell G2:
==VLOOKUP($F2,$A:$C,2,FALSE)
Here is the formula for cell H2:
=VLOOKUP($F2,$A:$C,3,FALSE)
Drag your formulas down and you're done. Vlookup formulas are very useful I recommend looking up how they work as someone else could better explain than I. Basically, you are looking up a value (column F) in an array (columns A,B,C) and returning a column index (B = 2, C = 3, etc) for a match. Lastly, you are looking for an approximate (TRUE) or exact (FALSE) match. Almost always use FALSE.
Also, look up cell references and how to lock them (ie, how $ signs rules vary). That way you can easily drag formulas across and keep your lookup value and array the same.

Automatically working out the average of filtered results

I have a spreadsheet where column P has a score between 1-6
The cell O4 has the following formula: =AVERAGEIFS(P8:P5000,P8:P5000,"<>6",P8:P5000,"<>0")
This formula searches for the average of the score in column P excluding 6, blanks and 0
Column O has staff names e.g John, Mark, Tim.......
What i want to do is for Cell O4 to automatically calculate the average of the figures shown in column P after i have used the filter function to show only results of a selected staff member.
I was hoping excel might be able to do this automatically however cell O4 appears to still be showing the average of the whole column P regardless of whether i have filtered or not.
I was given the formula below on another forum but it seems to be giving slightly wrong results albeit only by a small amount but i need to have the results exact if possible. Any help appreciated.
=SUMPRODUCT(1-ISNUMBER(MATCH(P8:P100,{0,6},0)),SUBTOTAL(9,OFFSET(P8,ROW(P8:P100)-ROW(P8),0,1)))/SUMPRODUCT(1-ISNUMBER(MATCH(P8:P100,{0,6},0)),SUBTOTAL(2,OFFSET(P8,ROW(P8:P100)-ROW(P8),0,1)))
Maybe
{=AVERAGE(IF((P8:P5000<>6)*(P8:P5000<>0)*SUBTOTAL(103,INDIRECT("O"&ROW(8:5000))),P8:P5000))}
will do what you want. Assuming the Filter is on column O.
The 103 in SUBTOTAL will also exclude if rows are manually hidden. If this ist unwanted and it should only exclude hidden rows, if filtered, then use 3 instead.
This is an array formula. Input it into the cell without the curly brackets and then press [Ctrl]+[Shift]+[Enter] to create the array formula.
I would create a separate table in a new sheet with all unique staff members and then perform the calculation. This way, you can quickly compare values for all staff just by scanning the table instead of having to constantly update the filter to see the values for potentially dozens or hundreds of staff. You would add the staff name range and criteria to your AVERAGEIFS formula.
For your example:
Sheet 2
A B
--- ---
1 | Staff Average
2 | Bob =AVERAGEIFS(Sheet1!$P$8:$P$5000,Sheet1!$O$8:$O$5000,A2,Sheet1!$P$8:$P$5000,"<>6",Sheet1!$P$8:$P$5000,"<>0")
3 | Mary =AVERAGEIFS(Sheet1!$P$8:$P$5000,Sheet1!$O$8:$O$5000,A3,Sheet1!$P$8:$P$5000,"<>6",Sheet1!$P$8:$P$5000,"<>0")
4 | Joe =AVERAGEIFS(Sheet1!$P$8:$P$5000,Sheet1!$O$8:$O$5000,A4,Sheet1!$P$8:$P$5000,"<>6",Sheet1!$P$8:$P$5000,"<>0")

How can I remove non-matching values in two different columns and sort in Excel?

I have several columns of data in my Excel spreadsheet.
Originally, I had two different spreadsheets, as they were generated from reports in a software application.
One of the spreadsheets contains the names of individuals who have had transactions with us in the past year. The other spreadsheet contains the names and the phone numbers. I copied and pasted the columns with the names and phone numbers into my spreadsheet with just the names of people who have purchased something from us in the past year.
My ultimate goal is to extract the names and phone numbers of only the names that have purchased something in the past year.
My column for the past year contains 1,002 names, while my master customer list (with phone numbers) contains over 20,000 individuals. I need the phone numbers of all of the individuals that have purchased something from us in the past year, but I don't want to have to manually go through 1,000 names (and, essentially, 20,000+ to find the match).
If I can achieve my goal without having to use VBA, that would be great. If this is the only route I can take, then I will go that route, but I would like to avoid coding if possible. (This is simply due to time constraints.)
The VLOOKUP function is likely the best solution for you. From the Excel documentation, it:
Looks for a value in the leftmost column of a table, and then returns
a value in the same row from a column you specify. By default, the
table must be sorted in an ascending order.
Note well the implication of that last sentence: the column you're searching in (leftmost column of the lookup table) must be sorted in ascending order for this function to produce the correct results.
Taking a simple example, let's say you have Sheet1 in your Excel workbook with the following information:
A B C
1 Name Transactions Phone
2 Sally 3
3 Alice 5
4 Joe 2
5 Jon 1
You need to add their phone numbers to this sheet, from another workbook. Let's say your phone number information in the other workbook looks like this:
A B
1 Name Phone
2 Alice 2222222
3 Bill 3333333
4 Bob 4444444
5 Jim 5555555
6 Joe 6666666
7 Sally 7777777
8 Sue 8888888
9 Tom 9999999
Take the following steps to add the phone numbers to Sheet1 in the first workbook:
Copy the phone information into a blank sheet in the first workbook. Let's call this Sheet2 for this example.
Make sure the phone information is sorted ascending by the Name column (A), because that's the leftmost column and thus the lookup column.
In cell C2 of Sheet1 (the empty phone cell for Sally), enter: =VLOOKUP(A2, Sheet2!A$2:B$9, 2,FALSE).
Drag-copy this formula down to the remaining cells in the Phone column.
Result:
A B C
1 Name Transactions Phone
2 Sally 3 7777777
3 Alice 5 2222222
4 Joe 2 6666666
5 Jon 1 #N/A
Notes:
The second parameter (Table_array - the lookup data range) should not include the column headings. As you can see, it's Sheet2!A$2:B$9 so it includes the information from rows 2 to 9 in columns A and B.
The last parameter (Range_lookup) should be set to FALSE so you don't pick up the information from the closest match. Note how Jon has no matching phone number record, so his Phone is set to "#N/A" - otherwise he would have been assigned Joe's phone number since that's closest match to Jon.
Parameter documentation:
Lookup_value is the value to be found in the first column of the table, and can be a value, a reference, or a text string.
Table_array is a table of text, numbers, or logical values, in which data is retrieved. Table_array can be a reference to a range or
a range name.
Col_index_num is the column number in Table_array from which the matching value should be returned. The first column of values in
the table is column 1.
Range_lookup is a logical value: to find the closest match in the first column (sorted in ascending order) = TRUE or omitted; find
an exact match = FALSE.

Excel function for ranking duplicate values

I have an excel sheet containing two columns of data that I'd like to rank.
Suppose we have the following:
A B
Franz 58
Daniel 92
Markus 37
Jörg 58
I would like a formula to rank the above data based on column B, and where there are duplicate values (Franz and Jörg) to put the alphabetical name first. What I have at the moment is simply duplicating Franz twice:
=INDEX(Name,MATCH(A2,Points,0))
Can someone advise me of formula / code that will rank the data and arrange duplicate values alphabetically?
Thanks
I would add a helper column in next to your data to help out with ties.
so in column C use
=B1+1/COUNTIF($A$1:$A$4,"<="&A1)/10
This will add on a decimal ranking system based on the name. This assumes that your numbers in column B do not have decimal places, if they do then you will need to increase the 10 on the end of the formula to account for it ie: for 2 decimal places use 1000, 3 : 10000 etc
Use this formula to get the first name
=INDEX(name,MATCH(LARGE(points,1),points,0))
adjust the 1 to 2 for the second name etc
EDIT had the sign around the wrong way
This will rank your data and will not repeat duplicates too:
In C2:
=SUM(1*(b2>$b$2:$b$5))+1+IF(ROW(b2)-ROW($b$2)=0,0,SUM(1*(b2=OFFSET($b$2,0,0,INDEX(ROW(b2)-ROW($b$2)+1,1)-1,1))))
CTRL+SHIFT+ENTER to turn it into an array
Drag these down to C5 and it will not duplicate rank where the name is the same, it will rank them alphabetically if they are the same.
Then if you wanted to order them automatically in order of top performer/score you then do this:
Putting this in E2:
=INDEX(A2:A5,MATCH(LARGE(C2:C5,ROW()-1),C2:C5,0))
...and drag down
Then use a vlookup on your data to return the score putting this in F2:
=vlookup(E2,A2:C5,2,false)
...and drag down
This should give you a table of highest scoring people in score order.
Assuming A2 is the first of the ranked points scores try this version
=INDEX(Name,SMALL(IF(A2=Points,ROW(Points)-MIN(ROW(Points))+1),COUNTIF(A$2:A2,A2)))
confirmed with CTRL+SHIFT+ENTER and copied down
Requires the Name list to be sorted because names with duplicate scores will be listed in the order shown

Resources