Excel: counting combinations of logical statements

I have a spreadsheet with data in multiple categories, e.g. pet (Dog, Cat, Rabbit), gender (M, F), mode of transport (Car, Bike, Skateboard). For each individual, each of these can be either true or false. I want to count the number of individuals with a particular combination of pet, gender and mode of transport. I want this to be automatic, so that I can specify the gender, pet and mode of transport in cells and the formula counts based on those values.
e.g. How many people are Male, have a Dog and have a Bike? In this case, Male, Dog and Bike need to be read from cells in the spreadsheet.
I have a formula which uses indirect and offset to select columns but can't help but think there must be a better way.
Here's an example which makes it far clearer than my wordy explanation above. Thanks in advance for your help.
Is there a way of representing my data that's better suited to make this counting easier?
Google Drive Link to Excel file
In response to a very valid comment ("I don't download random files from the internet"), here's a CSV:
Name,Dog,Cat,Rabbit,Male,Female,Car,Bike,Skateboard
Alice,TRUE,FALSE,FALSE,FALSE,TRUE,FALSE,TRUE,FALSE
Bob,TRUE,FALSE,FALSE,TRUE,FALSE,FALSE,TRUE,FALSE
Chris,FALSE,FALSE,TRUE,FALSE,TRUE,TRUE,FALSE,FALSE
Dave,TRUE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE,TRUE
Ellie,TRUE,FALSE,TRUE,FALSE,TRUE,TRUE,TRUE,TRUE
Frank,FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE,FALSE
Gerald,FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE,FALSE
Also the formula I used was:
=COUNTIFS( OFFSET(INDIRECT("R3C"&MATCH(L3,Headings,0),FALSE),0,0,numberOfPeople,1),TRUE,
OFFSET(INDIRECT("R3C"&MATCH(M3,Headings,0),FALSE),0,0,numberOfPeople,1),TRUE,
OFFSET(INDIRECT("R3C"&MATCH(N3,Headings,0),FALSE),0,0,numberOfPeople,1),TRUE)
Here numberOfPeople is a reference to a cell that counts the number of entries, and Headings is a reference to the column headings of the table.

OFFSET and INDIRECT are volatile functions: they recalculate every time Excel calculates, regardless of whether the underlying data has changed.
I prefer to use INDEX to return a range.
INDEX takes three arguments: INDEX(range,row,column). Putting 0 in the row argument makes it return the full column.
So, using MATCH to find the correct column and 0 for the row, we get the correct column returned:
=COUNTIFS(INDEX(A:I,0,MATCH($K$2,1:1,0)),TRUE,INDEX(A:I,0,MATCH($K$3,1:1,0)),TRUE,INDEX(A:I,0,MATCH($K$4,1:1,0)),TRUE)
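As a cross-check outside Excel, the same "look up each column by its heading, then count rows where all of them are TRUE" logic can be sketched in Python. This is only an illustration, not part of the original answer; the function name `count_matching` is made up here, and the data is the CSV from the question.

```python
import csv
import io

# Sample data copied from the CSV in the question.
CSV_TEXT = """Name,Dog,Cat,Rabbit,Male,Female,Car,Bike,Skateboard
Alice,TRUE,FALSE,FALSE,FALSE,TRUE,FALSE,TRUE,FALSE
Bob,TRUE,FALSE,FALSE,TRUE,FALSE,FALSE,TRUE,FALSE
Chris,FALSE,FALSE,TRUE,FALSE,TRUE,TRUE,FALSE,FALSE
Dave,TRUE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE,TRUE
Ellie,TRUE,FALSE,TRUE,FALSE,TRUE,TRUE,TRUE,TRUE
Frank,FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE,FALSE
Gerald,FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE,FALSE
"""


def count_matching(csv_text, wanted_headers):
    """Count rows in which every wanted column holds TRUE.

    Selecting a column by its heading plays the role of
    INDEX(range, 0, MATCH(heading, header_row, 0)) in the formula.
    """
    rows = csv.DictReader(io.StringIO(csv_text))
    return sum(all(row[h] == "TRUE" for h in wanted_headers) for row in rows)


# How many people are Male, have a Dog and have a Bike?
male_dog_bike = count_matching(CSV_TEXT, ["Male", "Dog", "Bike"])
print(male_dog_bike)  # 1 (only Bob)
```

Changing the list of headings corresponds to changing the values in K2:K4.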

It is pretty easy if you use the COUNTIFS formula combined with several IF statements. Here is what should work (put in "O3" and copy down):
=COUNTIFS(IF($L3=F$2,$F$3:$F$9,$G$3:$G$9),TRUE,IF($M3=C$2,$C$3:$C$9,IF($M3=D$2,$D$3:$D$9,$E$3:$E$9)),TRUE,IF($N3=H$2,$H$3:$H$9,IF($N3=I$2,$I$3:$I$9,$J$3:$J$9)),TRUE)
The inner IF statements compare the given values against your category headings and return the corresponding columns to check for TRUE. COUNTIFS then counts all rows where every one of the given criteria is TRUE. You can also expand your categories by adding more inner IF statements to return the corresponding columns.
Changing your data representation would not be entirely straightforward, because a single row can have multiple values per category. You would have to split rows with multiple values into multiple rows. Here is an example:
You could then just use the following formula:
=COUNTIFS($D$3:$D$14,$L3,$C$3:$C$14,$M3,$E$3:$E$14,$N3)
This assumes the given gender is in "L3", the given pet in "M3" and the given transportation in "N3".
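To make the "split into multiple rows" idea concrete, here is a rough Python sketch. The exact layout of the reshaped table is an assumption (the example is not reproduced in the text): one row per person per pet/transport combination, with a blank where a person has no pet or no transport. The names `people`, `to_long_rows` and `count_rows` are invented for illustration.

```python
from itertools import product

# The question's sample data, rewritten as one record per person.
people = [
    {"name": "Alice", "gender": "Female", "pets": ["Dog"], "transport": ["Bike"]},
    {"name": "Bob", "gender": "Male", "pets": ["Dog"], "transport": ["Bike"]},
    {"name": "Chris", "gender": "Female", "pets": ["Rabbit"], "transport": ["Car"]},
    {"name": "Dave", "gender": "Male", "pets": ["Dog"], "transport": ["Skateboard"]},
    {"name": "Ellie", "gender": "Female", "pets": ["Dog", "Rabbit"],
     "transport": ["Car", "Bike", "Skateboard"]},
    {"name": "Frank", "gender": "Male", "pets": [], "transport": []},
    {"name": "Gerald", "gender": "Male", "pets": [], "transport": []},
]


def to_long_rows(people):
    """Emit one (name, gender, pet, transport) row per pet/transport pair."""
    rows = []
    for person in people:
        # `or [None]` keeps people without a pet or transport as a single
        # row with a blank value (an assumed convention).
        pets = person["pets"] or [None]
        modes = person["transport"] or [None]
        for pet, mode in product(pets, modes):
            rows.append((person["name"], person["gender"], pet, mode))
    return rows


def count_rows(rows, gender, pet, mode):
    """Equivalent of COUNTIFS over the gender, pet and transport columns."""
    return sum(1 for _, g, p, m in rows if (g, p, m) == (gender, pet, mode))


long_rows = to_long_rows(people)
print(len(long_rows))                               # 12
print(count_rows(long_rows, "Male", "Dog", "Bike"))  # 1
```

Under this assumed layout the sample data expands to 12 rows, which is consistent with the 12-row ranges ($D$3:$D$14 and so on) in the formula. Because each person contributes any given combination at most once, counting rows still counts people.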
If you have further questions, leave a comment.

Related

Count how many cells contain this string with lookup value

I have a large table of IDs and file quality from files that pertain to that ID. There are multiple files that constitute a data point (imaging data, volumetric in manner) so there are multiple rows per ID. A sample table similar in manner is posted below.
The goal is to create a function that counts how many "good" and "bad" values there are per lookup subject in the SubjID column. I want to apply thresholds: if more than 50% of the G_R_B values for that SubjID are good, then G_R_B = G; if exactly 50% are good, G_R_B = R; and if fewer than 50% are good, G_R_B = B.
I will then sort and get unique value per subject ID.
Does anyone know how to do this? I tried using VLOOKUP and COUNTIF and found ways to do it when all duplicate IDs are stripped, but I am struggling to take into account when duplicate IDs occur.
EDIT: SAMPLE TABLE TO SEE WHAT TO DO IF RANGES ARE INVOLVED (see comment)
If you can't use latest Excel formulas, you can use:
=IF(COUNTIFS($A$2:$A$7,"good",$B$2:$B$7,B2)/COUNTIF($B$2:$B$7,B2)>0.5,"G",IF(COUNTIFS($A$2:$A$7,"good",$B$2:$B$7,B2)/COUNTIF($B$2:$B$7,B2)<0.5,"B","R"))
Result:
COUNTIF is used to find how many times the value in B2 appears in column B.
COUNTIFS is used to find how many times the value in B2 appears in column B with "good" in column A.
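The ratio logic of that formula (good count divided by total count per ID, then compared with 0.5) can be sketched in Python as follows. The sample rows and subject IDs are hypothetical, and `grade_subjects` is a name invented for this illustration.

```python
from collections import defaultdict


def grade_subjects(rows):
    """rows: iterable of (quality, subj_id) pairs, mirroring columns A and B.

    Returns a dict mapping each subject ID to "G", "R" or "B".
    """
    good = defaultdict(int)   # COUNTIFS(A, "good", B, id)
    total = defaultdict(int)  # COUNTIF(B, id)
    for quality, subj_id in rows:
        total[subj_id] += 1
        if quality.lower() == "good":  # COUNTIFS ignores case as well
            good[subj_id] += 1

    grades = {}
    for subj_id, count in total.items():
        share = good[subj_id] / count
        if share > 0.5:
            grades[subj_id] = "G"
        elif share < 0.5:
            grades[subj_id] = "B"
        else:
            grades[subj_id] = "R"
    return grades


# Hypothetical sample: S1 is 2/3 good, S2 is 1/2 good, S3 is 0/1 good.
sample = [
    ("good", "S1"), ("good", "S1"), ("bad", "S1"),
    ("good", "S2"), ("bad", "S2"),
    ("bad", "S3"),
]
print(grade_subjects(sample))  # {'S1': 'G', 'S2': 'R', 'S3': 'B'}
```

Since the result is one grade per ID, it also covers the "unique value per subject" step without a separate de-duplication pass.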
EDIT:
Updated formula according to your comment and edit:
=IF(COUNTIFS($A$2:$A$21,"good",$B$2:$B$21,B2)/COUNTIF($B$2:$B$21,B2)>0.8,"G",IF(COUNTIFS($A$2:$A$21,"good",$B$2:$B$21,B2)/COUNTIF($B$2:$B$21,B2)>=0.2,"R","B"))

VLOOKUP text with two criteria

I'm looking for a way to insert a column based on two criteria, as illustrated below. I have a main table with one row per company, and I want to add a column to this with the city names. However, the lookup table has two rows for some companies - one for "small" and one for "large". I'm only interested in retrieving the cities for companies that have size value "small".
I know that I can achieve this with =SUMIFS if the content of the column was a number instead of text. However, with the cities column consisting of text, I don't know how to proceed. I'd ideally like a solution where I don't have to use a helper column.
Edit: this is just an example of my data. I have hundreds of rows, and the suggested duplicate answer uses INDEX/MATCH, which requires me to give the exact cell location of each condition. This is not the case in my data.
There are a few solutions that I usually use for these tasks. They're not elegant i.e. not a 2-criteria look-up per se, but they get the job done.
Going by your data structure, you have these choices:
Sort your lookup table by size-company, with size in descending order. Thereafter, it's a straightforward VLOOKUP since your big companies are segregated from small ones.
Build a new key consisting of company-size i.e. CONCAT(company,size) and do the vlookup based on this key.
It's not possible with VLOOKUP. See my solution in the picture, which uses an array formula.
Solution using array formulas
Formula in F2: =INDEX($C$1:$C$6;SUM(IF(E2=$A$2:$A$6;1)*IF($B$2:$B$6="small";1)*ROW($C$2:$C$6));1)
Ps: don't forget to confirm the formula with Ctrl+Shift+Enter.
Multi-column lookups are certainly possible, but not using VLOOKUP. You'll need to use INDEX and MATCH. This becomes pretty complex as it combines array formulas with boolean logic. Here's a nice explanation.
https://exceljet.net/formula/index-and-match-with-multiple-criteria
For your example, assuming Desired Result Company is in column I.
=INDEX($F$4:$F$5,MATCH(1,(D4:D5=I4)*(E4:E5="small"),0))
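The boolean-multiplication trick in MATCH(1,(...)*(...),0) amounts to "find the first row where both conditions hold". A minimal Python sketch of that idea, using made-up company and city names (the thread's actual table is not reproduced here) and a hypothetical helper `lookup_city`:

```python
def lookup_city(table, company, size="small"):
    """Return the city of the first row matching both criteria, else None.

    table: list of (company, size, city) tuples, mirroring the three
    lookup columns. Multiplying the two boolean arrays in the Excel
    formula corresponds to the `and` below.
    """
    for row_company, row_size, city in table:
        if row_company == company and row_size == size:
            return city
    return None


# Hypothetical lookup table with two rows for some companies.
lookup_table = [
    ("Acme", "large", "Berlin"),
    ("Acme", "small", "Hamburg"),
    ("Globex", "small", "Munich"),
    ("Initech", "large", "Cologne"),
]

print(lookup_city(lookup_table, "Acme"))     # Hamburg
print(lookup_city(lookup_table, "Initech"))  # None (no "small" row)
```

Returning None for a company without a "small" row corresponds to the #N/A that MATCH would produce in that case.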

Excel VLOOKUP with multiple possible options in table array

I have two lists, the first is a set of users. The second list contains different encounter dates for these users.
I need to identify the date that is within 10 days of the "Renew Date" [Column C], but not before. With Member 1 this would be row 3 1/8/2017. With Member 2 this would be row 6, 1/21/2017.
The VLOOKUP used by the person who managed this spreadsheet before me obviously isn't viable, as it simply picks up the first date that has a matching Member ID. Is there a way to do this in Excel at all?
I have included a link to a sample file and a screenshot of the sample data.
https://drive.google.com/file/d/0B5kjsJZFrUgFcUFTNFBzQzN4cm8/view?usp=sharing
To avoid the slowness and complexity of array formulas, you can try SUMIFS, but the problem is that if you have more than one match it will add them together rather than return the first match, and the sum will look like an aberration. It will work, however, if you are guaranteed to have only one match in the data.
An alternative is also to use AVERAGEIFS, which, in case of multiple matches, will give you their average and it will look like a valid date and a good result. Enter this formula in D2 and fill down the column:
D2:
=AVERAGEIFS(G:G,F:F,A2,G:G,">="&C2,G:G,"<="&C2+10)
and don't forget to format column D as Date.
Try this
=SUMPRODUCT($G$2:$G$7,--($F$2:$F$7=A2),--($G$2:$G$7<=C2+10),--($G$2:$G$7>C2))
Format the result as date. See screenshot (my system uses DMY order)
Don't use whole column references with this formula. It will slow down the workbook.
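For comparison, here is the same date-window filter sketched in Python. The member IDs, renew dates and encounter dates are hypothetical, and `first_encounter_in_window` is an invented name. Note that the two formulas above differ at the lower boundary (AVERAGEIFS uses >=, SUMPRODUCT uses >); this sketch assumes the inclusive reading of "not before".

```python
from datetime import date, timedelta


def first_encounter_in_window(encounters, member_id, renew_date, days=10):
    """Return the earliest encounter on or after renew_date and at most
    `days` days later, or None if the member has no such encounter.

    encounters: list of (member_id, encounter_date) pairs.
    """
    window_end = renew_date + timedelta(days=days)
    matches = [
        d for m, d in encounters
        if m == member_id and renew_date <= d <= window_end
    ]
    return min(matches) if matches else None


# Hypothetical encounter list and renew dates.
encounters = [
    (1, date(2016, 12, 20)),
    (1, date(2017, 1, 8)),
    (1, date(2017, 2, 15)),
    (2, date(2017, 1, 2)),
    (2, date(2017, 1, 21)),
]

print(first_encounter_in_window(encounters, 1, date(2017, 1, 5)))   # 2017-01-08
print(first_encounter_in_window(encounters, 2, date(2017, 1, 15)))  # 2017-01-21
```

Unlike the sum or average approaches, taking the minimum returns a real encounter date even when several encounters fall inside the window.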

Sumproduct or Countif on a 2D matrix

I'm working on data from a population of people with allergies. Each person has a unique ExceptionID, and each allergen has a unique AllergenID (451 in total).
I have a data table with 2 columns (ExceptionID and AllergenID), where each person's allergies are listed row by row. This means that the ExceptionID column has repeated values for people with multiple allergies, and the AllergenID column has repeated values for the different people who have that allergy.
I am trying to count how many times each pair of allergies is present in this population (e.g. Allergen#107 & Allergen#108, Allergen#107 & Allergen#109,etc). To keep it simple I've created a matrix of 451 rows X 451 columns, representing every pair (twice actually because A/B and B/A are equivalent).
I somehow need to use the row name (allergenID) to lookup the ExceptionID in my data table, and count the cases where that matches the ExceptionIDs from the column name (also AllergenID). I have no problem using Vlookup or Index/Match, but I'm struggling with the correct combination of a lookup and Sumproduct or Countif formula.
Any help is greatly appreciated!
Mike
PS I'm using Excel 2016 if that changes anything.
-=UPDATE=-
So the methods suggested by Dirk and MacroMarc both worked, though I couldn't apply the latter to my full data set (17,000+ rows) because it was taking a long time.
I've since decided to turn this into a VBA macro because we now want to see the counts of triplets instead of pairs.
With the 2 columns you start with, it is as good as impossible... You would need to check every ExceptionID to have 2 different specific AllergenID. Better use a helper-table with ExceptionID as rows and AllergenID as columns (or the opposite... whatever you like). The helper table needs a formula like:
=COUNTIFS($A:$A,$D2,$B:$B,E$1)
Which then can be auto-filled. (The ranges are from my example, you need to change them to your needs).
With this helper-matrix you can easily go for your bigger matrix like this:
=COUNTIFS(E:E,1,INDEX($E:$G,,MATCH($I2,$E$1:$G$1,0)),1)
Again, you can auto-fill with this formula, but you need to change it, so it fits your needs.
Because the columns carry the same ID2 values (these would be your AllergenID), there is no need to look them up: E:E shifts automatically with the auto-fill.
The most important part of the formulas is the $ signs, which must not be messed up, or you cannot auto-fill.
Picture of my self-made example (formulas are from the upper left cell in each table):
If you still have any questions, just ask :)
It can be done straight from your original set-up with array formulas:
Please note that array formulas MUST be entered with Ctrl-Shift-Enter, before copying across and down:
In the example pic, I have NAMED the data ranges $A$2:$A$21 as 'People' and $B$2:$B$21 as 'Allergens' to make it a nicer set-up. You can see in the formula bar how that looks as a formula. However you could use the standard references like this in your first matrix cell:
EDIT: silly me, N function is not needed to turn the booleans into 1's and 0's, since multiplying booleans will do the trick. Below formula works...
=SUM(IF(MATCH($A$2:$A$21,$A$2:$A$21,0)=ROW($A$2:$A$21)-1, NOT(ISERROR(MATCH($A$2:$A$21&$E2,$A$2:$A$21&$B$2:$B$21,0)))*NOT(ISERROR(MATCH($A$2:$A$21&F$1, $A$2:$A$21&$B$2:$B$21,0))), 0))
Then copy from F2 across and down. It can be perhaps improved in technique with sumproduct or whatever, but it's just a rough example of the technique....
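Since the update mentions moving to a macro and counting triplets, here is a hedged sketch of the underlying counting idea in Python rather than VBA: group the allergens per person, then count every combination of a given size. The IDs below are made up, and `count_allergen_groups` is an illustrative name, not something from the thread.

```python
from collections import Counter, defaultdict
from itertools import combinations


def count_allergen_groups(rows, size=2):
    """Count how many people have each combination of `size` allergens.

    rows: iterable of (exception_id, allergen_id) pairs, i.e. the
    two-column data table. Returns a Counter keyed by sorted tuples,
    so (107, 108) and (108, 107) are the same key.
    """
    by_person = defaultdict(set)
    for exception_id, allergen_id in rows:
        by_person[exception_id].add(allergen_id)

    counts = Counter()
    for allergens in by_person.values():
        counts.update(combinations(sorted(allergens), size))
    return counts


# Hypothetical data: four people, three allergens.
rows = [
    (1, 107), (1, 108), (1, 109),
    (2, 107), (2, 108),
    (3, 108), (3, 109),
    (4, 107),
]

pairs = count_allergen_groups(rows, size=2)
print(pairs[(107, 108)])  # 2
print(pairs[(107, 109)])  # 1

triplets = count_allergen_groups(rows, size=3)
print(triplets[(107, 108, 109)])  # 1
```

This only visits combinations that actually occur for some person, rather than filling all 451 x 451 cells, which may help with the slowness reported for the full data set.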

Finding the maximum value among the products in two rows

I am an Excel beginner and I would like to do the following.
Let row1 = (a_1 a_2 a_3) and row2 = (b_1 b_2 b_3).
I want Excel to calculate the largest number among the products (a_1b_1, a_2b_2, a_3b_3).
It is very difficult to look these things up, as I am not sure what kind of calculation I am doing and it is hard to explain.
Take a third column, C, and enter the formula =$A1*$B1 in C1 (this assumes your two series are laid out in columns A and B). Fill it down to all other rows so that the row number is incremented for each.
Then, in a fourth column, use the formula =MAX(C:C)
The following formula, array-entered, gives you the result of the largest number among the products:
{=MAX(A1:C1*A2:C2)}
(provided your data is in A1:C2 in the form you presented it).
For an explanation of how to enter an array formula in Excel, see e.g. this Microsoft link; in short, you type the formula without the curly brackets and confirm with CTRL+SHIFT+ENTER instead of only ENTER.
If you want to find where this couple of numbers is (in your case: which column), I would try this:
{=MATCH(MAX(A1:C1*A2:C2);A1:C1*A2:C2;0)}
(also array-entered).
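The two steps (largest product, then its position) can be checked with a few lines of Python. The numbers are arbitrary example values, not data from the question.

```python
# Two example rows, standing in for A1:C1 and A2:C2.
row1 = [2, 5, 3]
row2 = [4, 1, 6]

# Element-wise products, like the array expression A1:C1*A2:C2.
products = [a * b for a, b in zip(row1, row2)]

largest = max(products)                 # like MAX over the products
position = products.index(largest) + 1  # like MATCH(..., 0), 1-based

print(products)  # [8, 5, 18]
print(largest)   # 18
print(position)  # 3
```

As with MATCH in exact-match mode, `index` returns the first position if the largest product occurs more than once.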
You can do that, or make a pivot table from the raw data and get the MAX/MIN/AVG based on the pivot options. I tend to use that instead, and then VLOOKUP the ID against the pivot to get whatever aggregate I need.
