Excel grouping by prioritized ruleset - excel

I wonder why nobody has asked this, but how do I classify (ordinal) entries in a table according to a prioritized ruleset / tree? (Ideally with plain Excel and without a nested IF cascade.)
Minimal example (only 3 of 11 or more features shown)
Name | IsCool | IsNerdy | HasChild
Joe | 1 | 1 | 1
Charliese | 1 | 0 | 1
Peter | 1 | 0 | 0
Jonas | 0 | 0 | 0
Rules
Priority | IsCool | IsNerdy | HasChild | => Group
1. | 1 | 1 | ignore | A (at least cool&nerdy)
2. | ignore | ignore | 1 | B (not A, but has a child)
3. | 1 | 0 | 0 | C (only cool)
4. | ignore | ignore | ignore | D (everything else)
stop after first match
yielding:
Name | IsCool | IsNerdy | HasChild | Group
Joe | 1 | 1 | 1 | A
Charliese | 1 | 0 | 1 | B
Peter | 1 | 0 | 0 | C
Jonas | 0 | 0 | 0 | D

You can convert a "ruleset" into all possible combinations of the attributes (IsCool, IsNerdy, HasChild, etc.) by expanding each "Ignore" into both 0 (zero) and 1 (unity).
So, the first rule in the question's ruleset would be replaced by two rules.
IsCool ¦ IsNerdy ¦ HasChild¦ Group
1 ¦ 1 ¦ 0 ¦ A
1 ¦ 1 ¦ 1 ¦ A
Although there are only 8 possibilities across the three attributes, this approach can lead to more than 8 rules. For example, in the question's ruleset, a person with the data tuple (IsCool, IsNerdy, HasChild) given by (1,1,1) would match both Group A and Group B when the "ignore" entries in the ruleset are expanded in this way. To eliminate the ambiguity this causes, it is also necessary to apply the priorities: the match to Group A has higher priority than that for B, so the lookup table would include (1,1,1,A) as a row but exclude (1,1,1,B).
With a larger ruleset involving more attributes, constructing the required VLOOKUP table from the table of rules will not be without its problems, particularly if a non-manual approach is desired and just Excel is to be used rather than Excel in conjunction with VBA.
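The expansion described above can be sketched outside Excel to see what the finished lookup table should contain. This is only an illustration in Python, not part of the Excel solution; the names `RULES`, `expand` and `LOOKUP` are made up for the example, and the key is assumed to be the attribute values concatenated into a string such as "111".

```python
from itertools import product

IGNORE = "ignore"

# Rules from the question in priority order: (IsCool, IsNerdy, HasChild, Group)
RULES = [
    (1, 1, IGNORE, "A"),
    (IGNORE, IGNORE, 1, "B"),
    (1, 0, 0, "C"),
    (IGNORE, IGNORE, IGNORE, "D"),
]

def expand(rules):
    """Expand every 'ignore' into both 0 and 1 and keep the highest-priority group per key."""
    table = {}
    for *pattern, group in rules:
        choices = [(0, 1) if p == IGNORE else (p,) for p in pattern]
        for combo in product(*choices):
            key = "".join(str(bit) for bit in combo)
            # setdefault keeps the first (highest-priority) group seen for a key
            table.setdefault(key, group)
    return table

LOOKUP = expand(RULES)
```

Each key of `LOOKUP` corresponds to one row of the VLOOKUP table, so (1,1,1) appears once with Group A and never with Group B.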
An alternative approach not involving VBA which treats the ruleset as data is as follows.
Formalising the notation introduced above, a rule involving n attributes can be expressed as
(r[1],r[2],...,r[n],G)
where r[i] (i=1,...,n) can take values of 0, 1 or "ignore" and G represents the group (one of A, B, C or D in the question's example).
An instance of data can similarly be represented as
(d[1],d[2],...,d[n])
where d[i] (i=1,...,n) takes values of 0 or 1 (but not "ignore")
The rule is matched if
r[i] = "ignore" OR d[i] = r[i], for each i from 1 to n
There is a fairly obvious way of implementing this in Excel as
=AND(OR(r[1]="ignore",d[1]=r[1]),OR(r[2]="ignore",d[2]=r[2]),...,OR(r[n]="ignore",d[n]=r[n]))
where, of course, the relevant cell references are used in place of the d[i] and r[i] placeholders shown above and the appropriate number of OR's are nested inside of the AND to replace the ... shorthand.
The pseudo-formula above has a value of either TRUE or FALSE with the former indicating the data instance matches to the rule and the latter that it does not.
However, this is not the end of the story as the rules still need to be applied in priority order.
So, extending the notation further, assume that the rules are listed in priority order (rule 1 has a higher priority than rule 2, which has a higher priority than rule 3, etc)
If cell C[k] holds the result of applying the k'th rule to an instance of the data, then modifying the above pseudo-formula so that C[k] now has the formula
=IF(OR(C[1],...,C[k-1]),FALSE,AND(...))
ensures that the k'th rule can be matched only if no earlier (and therefore higher priority) rule is matched. (Here the third part of the IF is the AND formula previously noted.)
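For readers who find the logic easier to follow in code, the match test and the "stop after first match" priority handling can be sketched as follows. This is a Python illustration of the same idea rather than anything Excel runs; `matches`, `classify`, `PEOPLE` and `GROUPS` are names invented for the sketch, and the data is the question's example.

```python
IGNORE = "ignore"

# Rules from the question in priority order: (IsCool, IsNerdy, HasChild, Group)
RULES = [
    (1, 1, IGNORE, "A"),
    (IGNORE, IGNORE, 1, "B"),
    (1, 0, 0, "C"),
    (IGNORE, IGNORE, IGNORE, "D"),
]

def matches(pattern, data):
    """AND over all attributes of (r[i] = 'ignore' OR d[i] = r[i])."""
    return all(r == IGNORE or r == d for r, d in zip(pattern, data))

def classify(data, rules=RULES):
    """Return the group of the first (highest-priority) rule that matches."""
    for *pattern, group in rules:
        if matches(pattern, data):
            return group
    return None  # only reachable if the ruleset has no catch-all rule

PEOPLE = {
    "Joe": (1, 1, 1),
    "Charliese": (1, 0, 1),
    "Peter": (1, 0, 0),
    "Jonas": (0, 0, 0),
}
GROUPS = {name: classify(bits) for name, bits in PEOPLE.items()}
```

Returning from the loop at the first match plays the same role as the `IF(OR(C[1],...,C[k-1]),FALSE,...)` wrapper: a lower-priority rule is never consulted once a higher one has matched.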
The screengrab below shows the approach in action for the question's example.
The cells in blue are formulae. Those for TRUE/FALSE values in the second table implement the pseudo-formula discussed above. For example, cell F13 shows the result of applying rule 1 to the data instance (0,0,0) and has the following formula
=AND(OR($C$5="Ignore",$C$5=$C13),OR($D$5="Ignore",$D$5=$D13),OR($E$5="Ignore",$E$5=$E13))
NB: no IF needs to be wrapped around this formula because there is no rule with a higher priority than rule 1.
The formula for cell I13 shows the result of applying rule 4 to the same data instance and needs to take account of the higher priorities accorded to rules 1, 2 and 3. The formula in this cell is
=IF(OR($F13:H13),FALSE,AND(OR($C$8="Ignore",$C$8=$C13),OR($D$8="Ignore",$D$8=$D13),OR($E$8="Ignore",$E$8=$E13)))
The formulae in cells G13 and H13 are similar to that of I13 (left as an exercise).
By design, there can be at most one TRUE value in each row and, if the ruleset is sound, there should be exactly one such value. The formulae in the final column of the second table make this assumption about the ruleset and simply pick out the relevant value from the final column of the first table corresponding to whichever rule shows as TRUE.
The formula in cell J13 is
=INDEX(F$5:F$8,SUMPRODUCT(1*(F13:I13),F$12:I$12))
The formulae in range F13:J13 were simply copied down the rows of the table to cells F14:J20
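The INDEX/SUMPRODUCT step can be read as "multiply each TRUE/FALSE flag by its rule number, add them up, and use the total as a position in the Group column". A small Python sketch of that arithmetic follows. It assumes, since the screengrab is not reproduced here, that F12:I12 holds the rule numbers 1 to 4; `pick_group` and the list names are hypothetical.

```python
def pick_group(flags, rule_numbers, groups):
    """Mimic INDEX(groups, SUMPRODUCT(1*flags, rule_numbers)) for a single data row."""
    # With at most one TRUE flag, the sum equals the number of the matched rule.
    position = sum(int(flag) * number for flag, number in zip(flags, rule_numbers))
    if position == 0:
        raise ValueError("no rule matched; the ruleset has no catch-all rule")
    return groups[position - 1]  # INDEX is 1-based, Python lists are 0-based

RULE_NUMBERS = [1, 2, 3, 4]          # assumed contents of F12:I12
GROUP_COLUMN = ["A", "B", "C", "D"]  # final column of the rules table
JONAS_FLAGS = [False, False, False, True]  # row for the data instance (0,0,0)
```

Here `pick_group(JONAS_FLAGS, RULE_NUMBERS, GROUP_COLUMN)` picks out "D", which is the value the question expects for Jonas.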

You could do this by creating a key in your data (e.g. Joe = "111", Charliese = "101", etc.), and then it's just a vlookup against your ruleset which contains all possible combinations of the key.

Related

Countif criteria with IF/OR logic across multiple sheets and columns

I've been struggling between the SUMPRODUCT and COUNTIFS formulas as there are a lot of specific dependencies in my data. Wondering if anyone can shed a bit more light on this issue.
Have tried SUMPRODUCT and COUNTIFS which give me calculations based on 1 set, but I need to include additional if/or statements.
I have the following:
| ID | Size | Dead/Alive | Duration | Days | Pass/Fail | Reason |
|----|---------|------------|-----------|------|----------|----------|
| 1 | Full | Dead | Permanent | 125 | Pass | Comments |
| 2 | Partial | Alive | Permanent | 500 | Pass | |
| 3 | Other | Dead | Temporary | 180 | Fail | Comments |
| 4 | No | Dead | Temporary | 225 | Fail | Comments |
| 5 | Yes | Alive | Permanent | 200 | Pass | |
with the following rules:
Only Count the ID/ROW if:
1) Values in column A = Full, Partial or Other
OR...
2) Values in column A = No AND values in column B = Dead
OR...
3) If values in column C = Permanent AND values in column D = >=100 or <=200
OR
4) If values in column C = Temporary AND values in column E = Pass, Fail AND column F=not blank
By my calculations, the total should be 5, but this is just a small sampling of my total data. I'm just not sure how to get that in Excel with either SUMPRODUCT or COUNTIFS. Someone also suggested a LOOKUP function, although I've never used that one.
Given that you have so many different conditions, I have to break it down one by one and create a few helper columns to account for each condition.
In my solution I created 10 helper columns as shown below, and I have added some sample data (ID 6 to 29) to test the solution.
I also named 7 conditions in my solution:
Cond_1 Values in column A = Full, Partial or Other
Cond_2 Values in column A = No AND values in column B = Dead
Cond_3A Values in column C = Permanent
Cond_3B Values in column D >=100
Cond_3C Values in column D <=200
Cond_3A, Cond_3B and Cond_3C must be TRUE at the same time
Cond_4 Values in column C = Temporary AND values in column E = Pass
Cond_5A Values in column C = Temporary AND values in column E = Fail
Cond_5B Column F is not blank (I did not give a name to this condition)
Cond_5A and Cond_5B must be TRUE at the same time
Please note my Cond_4, Cond_5A and Cond_5B are all related to your original condition 4), which reads a bit odd, and I am not 100% sure if my interpretation of the condition is correct. If not please re-state your last condition and I can amend my answer accordingly.
As shown in my screen-shot, the formulas in I2 to Q2 are listed in Column U. I only used MAX, AND, SUM, =, &, and/or <> to interpret each condition. Please note some of the formulas are Array Formula so you need to press Ctrl+Shift+Enter to make it work.
The To Count column simply asks whether the SUM of the previous 9 columns is greater than 0, which means at least one of the conditions is met. If so, it returns 1, otherwise 0.
Then you just need to work out the total of To Count column. In my example it is 22. I have highlighted the entries that did not meet any of the given condition.
You can use only one helper column to capture all conditions in one formula, but I would not recommend it as it would be too long to be easily understood and modified in future.
{=--(SUM(MAX(--(A2=Cond_1)),MAX(--(A2&B2=Cond_2)),--(SUM(--(C2=Cond_3A),--(AND(D2>=Cond_3B,D2<=Cond_3C)))=2),MAX(--((C2&E2)=Cond_4)),--(SUM(MAX(--((C2&E2)=Cond_5)),--(F2<>""))=2))>0)}
Ps. I would also wonder if there is a formula-based solution without using any helper column...? :)
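As a cross-check of the logic (not a replacement for the Excel formulas), the conditions as interpreted in this answer can be written out in Python and applied to the five sample rows from the question. The names `ROWS` and `to_count` are invented for the sketch, condition 3 is read as "between 100 and 200 inclusive", and condition 4 is split the same way as Cond_4 and Cond_5A/5B above.

```python
ROWS = [
    # (Size, Dead/Alive, Duration, Days, Pass/Fail, Reason)
    ("Full", "Dead", "Permanent", 125, "Pass", "Comments"),
    ("Partial", "Alive", "Permanent", 500, "Pass", ""),
    ("Other", "Dead", "Temporary", 180, "Fail", "Comments"),
    ("No", "Dead", "Temporary", 225, "Fail", "Comments"),
    ("Yes", "Alive", "Permanent", 200, "Pass", ""),
]

def to_count(row):
    """Return 1 if the row meets at least one condition, otherwise 0."""
    size, state, duration, days, result, reason = row
    cond_1 = size in ("Full", "Partial", "Other")
    cond_2 = size == "No" and state == "Dead"
    cond_3 = duration == "Permanent" and 100 <= days <= 200
    cond_4 = duration == "Temporary" and result == "Pass"
    cond_5 = duration == "Temporary" and result == "Fail" and reason != ""
    return int(cond_1 or cond_2 or cond_3 or cond_4 or cond_5)

TOTAL = sum(to_count(row) for row in ROWS)
```

On the sample rows `TOTAL` comes out as 5, which agrees with the total the question expects.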

Microsoft Excel- Invert a 0/1 value or leave empty if no value

I ran into an issue while trying to do some work in an Excel doc. I have created a column to count the number of flags in other columns. One of the flags I created simply inverts the value of another flag. I'm aware there's no reason for the new column if all I'm doing is inverting the value of a column that already exists, but space isn't an issue and in this case it's much easier for me to create a 2nd column to keep things easier to understand.
The new column I created is based off a "consent given" column where the value is 1 if consent has been given, 0 if it has not been given, or empty if we don't have that information. I want my flag counter to count when consent has NOT been given, but I can't figure out a formula to set the value as 0 if consent has been given, 1 if it has not been given, or empty if no information. I can easily say 1 for no consent and 0 for consent or no information, but I want the no information columns to be empty. I can't use an empty string in the case of no information because that breaks my flag counter sum column.
Here is some example data of how I would like everything to be-
A | B | C | D
-------------------------------------------------------------
flag counter | consent given | consent not given | other flag
-------------------------------------------------------------
0 | 1 | 0 | 0
1 | 1 | 0 | 1
1 | 0 | 1 | 0
2 | 0 | 1 | 1
1 | | | 1
0 | | | 0
Formulas I am currently using (all formulas only look at cells in their same row- i.e. A1 uses B1, C1, D1, while A2 uses B2, C2, D2. I'll list A1 below for simplicity):
A1: =C1+D1
B1: This is raw data. Values are 1, 0, or empty
C1: This is what I'm looking for. I want something along the lines of =IF(B1=1, 0, IF(B1=0, 1, <empty>)) where <empty> is a pure empty that doesn't break the addition formula in column A (empty string breaks the formula). I've found formulas that leave the empty string or a 0, but neither of those works in this instance. I want 1 -> 0, 0 -> 1, no value -> no value.
D1: This is raw data. Values are 1, 0, or empty
use ABS:
=IF(B2="","",ABS(B2-1))
And then in column A use SUM, which will ignore the text.
=SUM(C2:D2)
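The idea behind the two formulas is "invert when there is a value, pass the blank through, and add up with something that skips blanks". A short Python sketch of that behaviour, with `None` standing in for an empty cell and `invert` and `flag_count` as made-up names:

```python
def invert(flag):
    """1 -> 0, 0 -> 1, empty (None) stays empty, like IF(B2="","",ABS(B2-1))."""
    return None if flag is None else abs(flag - 1)

def flag_count(*flags):
    """Add up the flags while skipping empties, like SUM ignoring text cells."""
    return sum(flag for flag in flags if flag is not None)

def row_count(consent_given, other_flag):
    """Column A for one row: 'consent not given' plus the other flag."""
    return flag_count(invert(consent_given), other_flag)
```

For example, `row_count(0, 1)` gives 2 and `row_count(None, 1)` gives 1, matching the corresponding rows of the example data in the question.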

How to extract multiple rows that meet a criteria which is given by 2 drop-down lists in EXCEL

I have a sheet that looks like this:
A | B | C | D | E | F
1 NAME | TASK | ADRESS | ORDER_GIVER | COUNT | NOTE
2 DROPDOWN_2 | move | NY, xy_street | Ann | 1 | ...
3 DROPDOWN_2 | fill | CA, yx_street | Rose | 3 | ...
...
100 NAME | TASK | ADRESS | ORDER_GIVER | COUNT | NOTE
101 DROPDOWN_1
102
103 NAME | TASK | ADRESS | ORDER_GIVER | COUNT | NOTE
104 DROPDOWN_1
INITIALLY:
In rows 1-99 you find the tasks with 1 column empty (NAME).
In rows 100+ you find "Tickets" which can be printed (2 rows for example 100-101)
THEN
1, The ORGANISER (me) makes tickets with names by copying and pasting (Ctrl+C/Ctrl+V) the "ticket structure" and choosing a name from the DROPDOWN_1 list.
2, Then the organiser starts to assign the tasks (rows 1-99) to people by choosing them from the DROPDOWN_2 list. (Note that both dropdown name lists contain the same names.)
After this I would like Excel to fill in the tickets with the rows that contain the same name as the ticket. One person can be assigned to several tasks, but one task can only be assigned to one person. (So a ticket can have 1 NAME but several rows, depending on the 1-99 list.)
I am asking you to help me make a formula or function for this "autofill" of tickets because I have been searching for days for a solution however couldn't find a proper one.
In the Similar problems and solutions section you can find 2 links which had the closest answer. Unfortunately neither of them contain dropdown lists. I tried to solve the problem with INDEX(MATCH()) functions, but the problem is that it cannot handle the changes of names.
Thank you,
Max
Similar problems and solutions:
https://www.get-digital-help.com/2009/09/28/extract-all-rows-from-a-range-that-meet-criteria-in-one-column-in-excel/
Extracting all rows based on a value of cell without VBA
Select A101:F392 and enter this as an array formula (ctrl+shift+enter):
=IFERROR(INDEX(A1:F99,ROUND(MOD(SMALL(IFERROR(CHOOSE({1,2},SMALL(IFERROR(1/(1/MMULT(IF(SMALL(COUNTIF(A2:A99,"<="&A2:A99),ROW(INDIRECT("2:98")))=SMALL(COUNTIF(A2:A99,"<="&A2:A99),ROW(INDIRECT("1:97"))),0,ROW(A2:A98)),{1,1}))+{0.001,-0.001},FALSE),ROW(INDIRECT("1:196"))),COUNTIF(A2:A99,"<="&A2:A99)+ROW(A2:A99)/1000),FALSE),ROW(INDIRECT("1:292"))),1)*1000,0),{1,2,3,4,5,6}),"")

Counting the number of older siblings in an Excel spreadsheet

I have a longitudinal spreadsheet of adolescent growth.
ID | CollectionDate | DOB | MOTHER ID | Sex
1 | 1Aug03 | 3Apr90 | 12 | 1
1 | 4Sept04 | 3Apr90 | 12 | 1
1 | 1Sept05 | 3Apr90 | 12 | 1
2 | 1Aug03 | 21Dec91 | 12 | 0
2 | 4Sept04 | 21Dec91 | 12 | 0
2 | 1Sept05 | 21Dec91 | 12 | 0
3 | 1Aug03 | 30Jan89 | 23 | 0
3 | 4Sept04 | 30Jan89 | 23 | 0
This is a sample of how my data is formatted and some of the variables that I have. As you can see, since it is longitudinal, each individual has multiple measurements. In the actual database there are over 10 measurements per individual and over 250 individuals.
What I am wanting to do is input a value signifying the number of older brothers and older sisters each individual has. That is why I have included the Mother ID (because it represents genetic relatedness) and sex. These new variable columns would just say how many older siblings of each sex each individual has. Is there a formula that I could use to do this quickly?
=COUNTIFS($B:$B,"<>"&$B2,$H:$H,$H2,$AI:$AI,$AI2,$J:$J,"<"&$J2)
Create a column named Distinct with this formula
=1/COUNTIF([ID],[@ID])
Then you can find all the older 0-sexed siblings like this
=SUMPRODUCT(([DOB]<[@DOB])*([MOTHERID]=[@MOTHERID])*([Sex]=0)*([Distinct]))
Note that I made the data a Table and used table notation. If you're not familiar, [COLUMNNAME] refers to the whole column and [@COLUMNNAME] refers to the value in that column on the current row. It's similar to saying $A:$A and A2 if you're dealing with column A.
The first formula gives you a value to count that will always result in 1 for a particular ID. So ID=1 has three lines and Distinct will result in .33333 for each line. When you add up the three lines you get 1. This is similar to a SELECT DISTINCT in SQL parlance.
The SUMPRODUCT formula sums [Distinct] for every row where the DOB is earlier than the current DOB (an older sibling was born earlier), the Mother is the same as the current Mother, and the Sex is zero.
I have a possible solution. It involves adding two columns -- One for "# older siblings" and one for "unique?". So here are all the headings I have currently:
A -- ID
B -- CollectionDate
C -- DOB
D -- MOTHER ID
E -- Sex
F -- # older siblings
G -- unique?
In G2, I added the following formula:
=IF(A2=A1,0,1)
And dragged down. As long as the data is sorted by ID, this will only display "1" once for each unique person.
In F2, I added the following formula:
=COUNTIFS(G:G,"=1",D:D,"="&D2,C:C,"<"&C2)
And dragged down. It seemed to work correctly for the sample data you provided.
The stipulations are:
You would need the two columns.
The data would need to be sorted by ID
I hope this helps.
You need a formula like this (for example, for row 2):
=COUNTIFS($A:$A,"<>"&$A2,$E:$E,$E2,$D:$D,$D2,$C:$C,"<"&$C2)
Assuming E:E is column for sex, D:D is column for mother ID and C:C is column for DOB.
Write this formula in H2 cell for example and drag it down.
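To check any of the formulas above against the sample data, the intended result can be worked out independently. The Python sketch below counts each sibling once, however many measurement rows they have, and treats "older" as "earlier DOB". The names `ROWS` and `older_siblings` are invented here, CollectionDate is left out because it does not affect the count, and the two-digit years in the sample are assumed to be 19xx.

```python
from datetime import date

# One tuple per measurement row from the question: (ID, DOB, MotherID, Sex)
ROWS = [
    (1, date(1990, 4, 3), 12, 1),
    (1, date(1990, 4, 3), 12, 1),
    (1, date(1990, 4, 3), 12, 1),
    (2, date(1991, 12, 21), 12, 0),
    (2, date(1991, 12, 21), 12, 0),
    (2, date(1991, 12, 21), 12, 0),
    (3, date(1989, 1, 30), 23, 0),
    (3, date(1989, 1, 30), 23, 0),
]

def older_siblings(rows, person_id, sex):
    """Count distinct siblings of the given sex born before the given person."""
    # Collapse repeated measurement rows to one entry per individual.
    people = {pid: (dob, mother, s) for pid, dob, mother, s in rows}
    own_dob, own_mother, _ = people[person_id]
    return sum(
        1
        for pid, (dob, mother, s) in people.items()
        if pid != person_id and mother == own_mother and s == sex and dob < own_dob
    )
```

With the sample rows, individual 2 has one older sibling of sex 1 (individual 1), and individuals 1 and 3 have none.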

Count number of rows where multiple criteria are met

I'm trying to generate a table that shows a count of how many items are in any given status on any given day. My result table has a set of Dates down column A and column headers are various statuses. A sample of my data table with headers looks like this:
Product | Notice | Assigned | Complete | In Office | In Accounting
1 | 5/5/13 | 5/7/13 | 5/9/13 | 5/10/13 | 5/11/13
2 | 5/5/13 | 5/6/13 | 5/8/13 | 5/9/13 | 5/10/13
3 | 5/6/13 | 5/9/13 | 5/10/13 | 5/10/13 | 5/10/13
4 | 5/4/13 | 5/5/13 | 5/7/13 | 5/8/13 | 5/9/13
5 | 5/7/13 | 5/8/13 | 5/10/13 | 5/11/13 | 5/11/13
If my output table were to contain a set of dates in the first column with the statuses as headers, I need a count of how many rows were at the given status and had not yet transitioned to the next status so that in the Notice column, I'd have a count of rows where the Notice Date was <= X AND where the Assigned, Complete, In Office, In Accounting are all greater than X.
I've used a Sum(if(frequency(if statement to get me REALLY close but I feel like I need to have an AND statement within the second IF like this =SUM(IF(FREQUENCY(IF(AND
Here's what I have that won't work:
=SUM(IF(FREQUENCY(IF(AND(Table1[Assigned]<=A279,Table1[[Complete]:[In Accounting]]<=A279),ROW(Table1[[Complete]:[In Accounting]])),ROW(Table1[[Complete]:[In Accounting]]))>0,1))
If I take the "AND" portion out, this works fine except I need it to ONLY count rows where the given status actually has a date so if an "Assigned" date is empty, I don't want that row to be counted for the Assigned column.
Here's an example of what I'd expect to see in the results. I've listed the count in the each column as well as the corresponding product numbers in parenthesis. The corresponding product numbers are for reference only and won't actually be in the result table.
Date | Notice | Assigned | Complete
5/6 | 2 (1,3) | 2 (2,4) | 0
5/7 | 2 (3,5) | 2 (1,2) | 1 (4)
5/8 | 1 (3) | 2 (1,5) | 1 (2)
OK, assuming you have the original data in A1:F6 then with 2nd table headers in B9:D9 and row labels in A10:A12 then you can use this "array formula" in B10
=SUM((B$2:B$6<=$A10)*(MMULT((C$2:$F$6>$A10)+(C$2:$F$6=""),TRANSPOSE(COLUMN(C$2:$F$6)^0))=COLUMNS(C$2:$F$6)))
confirmed with CTRL+SHIFT+ENTER and copied down and across (see screenshot below)
As you can see the results are as per your requirement. If you replace dates with blanks it will still work
MMULT is a way to get a single value from each row even when you are looking at multiple columns.
I used cell references because I think that's easier, especially when copying the formula across and having a reducing range.......but you can use structured references if you want
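The rule the array formula applies (the status date is on or before the given day, and every later status date is either after that day or blank) can be sketched in Python and checked against the expected results in the question. This is only an illustration; `STATUSES`, `PRODUCTS`, `d` and `products_in_status` are names made up for it, and a blank date is represented by `None`. Unlike the formula, the sketch also treats a blank in the status column itself as "not reached", which is what the question asks for.

```python
from datetime import date

STATUSES = ["Notice", "Assigned", "Complete", "In Office", "In Accounting"]

def d(month, day):
    """Shorthand for a 2013 date, matching the sample data."""
    return date(2013, month, day)

# Product number -> dates in the same order as STATUSES
PRODUCTS = {
    1: [d(5, 5), d(5, 7), d(5, 9), d(5, 10), d(5, 11)],
    2: [d(5, 5), d(5, 6), d(5, 8), d(5, 9), d(5, 10)],
    3: [d(5, 6), d(5, 9), d(5, 10), d(5, 10), d(5, 10)],
    4: [d(5, 4), d(5, 5), d(5, 7), d(5, 8), d(5, 9)],
    5: [d(5, 7), d(5, 8), d(5, 10), d(5, 11), d(5, 11)],
}

def products_in_status(status, day):
    """List the products that have reached `status` by `day` but no later status."""
    idx = STATUSES.index(status)
    hits = []
    for product, dates in PRODUCTS.items():
        reached = dates[idx] is not None and dates[idx] <= day
        later_pending = all(x is None or x > day for x in dates[idx + 1:])
        if reached and later_pending:
            hits.append(product)
    return hits
```

For instance, `products_in_status("Assigned", d(5, 7))` returns products 1 and 2, the same pair shown for 5/7 in the expected results.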
Have you tried using COUNTIFS to count based on multiple criteria? It is fairly well documented here: http://office.microsoft.com/en-us/excel-help/countifs-function-HA010047494.aspx (2007+ only)
Basically, you use it like
=COUNTIFS(first_range_to_check, value_you_want_in_first_range, ...)
where the ... represents as many range/criteria pairs as you want (up to 127 pairs in total). Note that the conditions are combined with AND, so if you have two pairs, both the first pair AND the second pair must be true for a row to be counted.

Resources