Compare 2 set of cells and count - excel-formula

I'm trying to make something of a pattern recognize counting cell and calculate their probability of occurrence.
Set 1 is from a database that looks like this (consists of 6+1 columns):
ABABAB 1
CACACA 2
CACACA 2
CACACA 2
CACACA 1
ABABAB 1
Set 2 is what I input manually (only 6 columns, without the 7th column information)
CACACA
If I want to know the probability of the 7th column as "2", pattern recognize counting cell should return 75, where if its "1" it should return 25.
Been cracking my brain using countifs function but nothing seems to work.

From my understanding you have seven columns, like this:
You would like to make a ratio between a combination that includes all seven columns (from A to G in the picture), and the combination of the first six columns (from A to F in the picture).
In this case you may use COUNTIFS as follows (I will use your same example CACACA):
=COUNTIFS(A1:A6,"C",B1:B6,"A",C1:C6,"C",D1:D6,"A",E1:E6,"C",F1:F6,"A",G1:G6,"2")/COUNTIFS(A1:A6,"C",B1:B6,"A",C1:C6,"C",D1:D6,"A",E1:E6,"C",F1:F6,"A")
The result will be 0.75, so if you want 75 you can just multiply the result by 100.
If you want to check the result also for CACACA 1, you just need to change the last value of the first COUNTIFS in the formula from "2" to "1" and you will get 0.25.
In case my understanding was wrong or you need more support do not hesitate to drop me a note!

Related

Create a formula to find the count in Excel column

I not very use to use excel complex formula.
Here is request
Column has few number (-ve/+ve). I have to count these based on intervals. These interval are not pre-decided.
See screen Shot
Values in Label col can change in run time. A is less than -15, B is between -15 to 6 and so on. I have to create a formula to add a count in the Count col.
Please guide
Thank you
Regards
You will save yourself a lot of maintenance headaches by re-formatting the "Labels" Table:
D
E
F
G
6
Greater Than or Equal To
Less Than or Equal To
Count
7
Label A
-99
-16
(formula below goes here)
8
Label B
-15
-6
9
Label C
-5
5
10
Label D
6
15
The formula to place in the first count cell (G7 in my example) is:
=COUNTIFS(A$1:A$19,">="&E7,A$1:A$19,"<="&F7)
And then fill it down the length of the table. (In this example 4 rows). Be mindful of the $ that lock the rows of your values column.
I would strongly advise to follow the approach outlined by #Max as it is easier to maintain and less error-prone. However, as you stated, that you are looking for a solution that takes your format into account, I come up with this.
Note:
I split up the complex formula in different pieces to make it easier to explain. You can paste the different pieces together in a single cell per row if you like.
Step 1: Getting rid of unnecessary descriptions
I use a combination of 'SUBSTITUTE', 'MID' and 'FIND' to extract our operators and values from your labels: F3: =SUBSTITUTE(MID(D3,FIND("(",D3)+1,FIND(")",D3)-FIND("(",D3)-1),"interval ","").
The cell D3 contains your label, e.g. "A (interval <-15).
Step 2: Getting the lower threshold
I check whether the label contains two thresholds or not by looking for the "/" character. Next, I handle both situations.
G3: =IF(ISERR(FIND("/",F3)),MID(F3,1,LENGTH(F3)),">="&MID(F3,1,FIND("/",F3)))
The cell F3contains the result of Step 1
Step 4: Getting the upper threshold
Similar to Step 2, except for the operators.
H3: =IF(ISERR(FINN("/",F3)),MID(F3,1,LENGTH(F3)),"<="&MID(F3,FIND("/",F3),LENGTH(F3)-FIND("/",F2)))
Step 5: Counting
I use MID to include the single pieces of Step 3 and Step 4 as text into the formula. If you paste everything together, the use of it will not be necessary.
I3: =COUNTIFS($A$1:$A$19,MID(G3,1,LENGTH(G3)),$A$1:$A$19,MID(H3,1,LENGTH(H3)))
Update: Here's the screenshot.

Sum row based on criteria across multiple columns

I have googled for hours, not being able to find a solution to what I need/want. I have an Excel sheet where I want to sum the values in one column based on the criteria that either one of two columns should have a specific value in it. For instance
A B C
1 4 20 7
2 5 100 3
3 100 21 4
4 15 21 4
5 21 24 8
I want to sum the values in C given that at least one of A and B contains a value of less than or equal to 20. Let us assume that A1:A5 is named A, B1:B5 is named B, and C1:C5 is named C (for simplicity). I have tried:
={SUMPRODUCT(C,((A<=20)+(C<=20)))}
which gives me the rows where both columns match summed twice, and
={SUMPRODUCT(C,((A<=20)*(C<=20)))}
which gives me only the rows where both columns match
So far, I have settled for the solution of adding a column D with the lowest value of A and B, but it bugs me so much that I can't do it with formulas.
Any help would be highly appreciated, so thanks in advance. All I have found when googling is the "multiple criteria for same column" problem.
Thanks. That works. Found another one that works, after I figured out that excel does not treat 1 + 1 = 1 as I learnt in discrete mathematics, but as you say, counts the both the trues. Tried instead with:
{=SUM(IF((A<=20)+(B<=20);C;0))}
But I like yours better.
Your problem that it is "summing twice" in this formula
={SUMPRODUCT(C,((A<=20)+(C<=20)))}
is due to addition turning first TRUE plus the second TRUE into 2. It is not actually summing twice, because for any row, if only one condition is met, it would count that row only once.
The solution is to transform either the 1 or the 2 into a 1, using an IF:
={SUMPRODUCT(C,IF((A<=20)+(C<=20))>0, 1, 0)}
That way, each value in column C would only be counted at max once.
Following this site you could build up your SUMPRODUCT() formula like this:
=SUMPRODUCT(C,SIGN((A<=20)+(C<=20)))
So, instead of a nested IF() you control your or condition with the SIGN()function.
hth
If you plan to use a large set of data then it is best to use the array formula:
{=SUM(IF((A1:A5<=20)+(B1:B5<=20),C1:C5,0))}
Obviously adjust the range to suit the data set, however if the whole of each column is to form part of the formula then you can simply adjust to:
{=SUM(IF((A:A<=20)+(B:B<=20),C:C,0))}
This will perform the calculation on all rows of data within the A, B and C columns. With either example remember to press Ctrl + Shift + Enter in order to trigger the array formula (as opposed to typing the { and }).

Excel Condition missplacing

I am trying to make an average of multiple cells, and I want to ignore to zero cells.
Here is my formula =IFERROR(AVERAGEIF(L4:L10;L12:L18;L20:L26;L28:L34;L36:L37);"") and I don't know where to put the condition to ignore zero "<>0" . Am I doing something wrong?
Assuming you only have positive values and zeroes you can average without zeroes, for non-contiguous ranges using this syntax
=IFERROR(SUM(L4:L10;L12:L18;L20:L26;L28:L34;L36:L37)/INDEX(FREQUENCY((L4:L10;L12:L18;L20:L26;L28:L34;L36:L37);0);2);"")
The FREQUENCY part gives you a two element array, one being the count of zeroes, the other the count of positive values, INDEX then retrieves the second of those (the number of positive values), so if you divide the sum by that count you get the average excluding zeroes. FREQUENCY function (unlike AVERAGEIF) accepts a non contiguous range argument (a "union")
....but if you can identify which rows to exclude by using values in another column then it's easier with AVERAGEIFS, e.g. if on the excluded rows, e.g. in K11, K21, K35 etc. they all have the value "Total" you can use this version:
=IFERROR(AVERAGEIFS(L4:L37;L4:L37;"<>0";K4:K37;"<>Total");"")
adjust depending on the exact text, wildcards are possible
Here is another way using SUMIF and COUNTIF.
Example data:
Values
1
-3
0
5
777
3
0
0
8
text
4
5
6
0
6
7
Formulas:
B18=SUMIF(A2:A17,"<>0",A2:A17)-SUM(A6,A11,A15)
B19=COUNTIF(A2:A5,"<>0")+COUNTIF(A7:A10,"<>0")+COUNTIF(A12:A17,"<>0")
B20=B18/B19

Looking for formula to extract specific values from a row containing numbers and blanks

I have a sheet with rows of data, with many columns. I am looking for help on a formula that will extract the sum of the smallest 3 numbers in a row based on the last 5 values entered. Note that not all the rows will have values for each column, so the first value found on each row will may be found in a different column.
To determine the sum of the smallest 3 I am using the formula =SUM(SMALL(B3:R3,{1,2,3})), Unfortunately, that formula is looking at the entire range. Again, I am looking for help that with a formula that will select only the last 5 values posted.
Here is simple example. The results for each line show the totals that should be determined. Again, it needs to look for the sum of the smallest 3 based on the last 5 posted (in the example below the range would be col. 1 thru 10, with col 10 having the latest postings).
Ex.
1.....2.....3......4......5.....6.....7.....8......9.....10...... Result
31.........44....51....36..........44...34....36....38.......106 (34+36+34)
35..31...44...40.....38...52..........42....37...............115 (37+38+40)
Hope this is understandable. I am looking for a formula solution vs a VBA macro solution because of my users. Thanks for any help!!
Now that you clarified the question, I have an answer for you. This is fairly ugly but it gets the job done. You might want to hide the columns with the intermediate results - or you could get adventurous and "nest" the expressions. This makes it really hard to understand / debug though. If there's a smarter way to do this I am always open to learning.
Assuming you have the data in columns A through J, starting in row 2, put the following in cell L2:P2:
=MATCH(9999, A2:J2,1)
=MATCH(9999,OFFSET($A2,0,0, 1, L2-1)) ... copy this by dragging right to the next 2 columns
=MATCH(9999,OFFSET($A2,0,0, 1, M2-1))
=MATCH(9999,OFFSET($A2,0,0, 1, N2-1))
=MATCH(9999,OFFSET($A2,0,0, 1, O2-1))
The first line finds the last cell with data in it; the next ones find the last cell "not including the last cell", and so they work backwards. The result is a number corresponding to the columns with data. For your example, this gives
10 9 8 7 5
9 8 6 5 4
Now we want to find the sum of the smallest 3 of these: put the following equation in cell Q2:
=SUM(SMALL(INDIRECT("RC["&P2-17&"]:RC["&L2-17&"]",FALSE),{1,2,3}))
Working from the inside out:
RC["&P2-17"] results in RC[-12], which is "the cell 12 to the left of this one".
That is the first of the "last five cells with data", cell E2
RC["&L2-17"] results in RC[-7], the last cell with data in this row
FALSE use "RC" rather than "A1" indexing
INDIRECT turn string into an address (in this case a range)
SMALL find the 3 smallest values in this range
SUM and add them together.
This formula did indeed give me 106, 115 for the example you provided.
I would hide columns L through P so you only see the result (and not the intermediate stuff).

Permutations in Excel

I am have a string with 6 spaces, e.g. 000000. Each space can hold one of three digits - 0, 1, or 2. I know that I can get a total of 120 permutations using the Permut function in Excel, i.e. =PERMUT(6,3) = 120. But I would actually like to have each individual permutation in a cell, e.g. 000001, 000010, etc.. Ideally, the end result would be 120 rows of unique 6-digit IDs.
Please help if you know a faster way of accomplishing this without entering the figures manually.
Thanks!
There is a VBA functionin the last post on this page. Copy it into a VBA module, then in Excel, create a column of integers from 0 to n where n = the number of IDs you want. In the next column, call the VBA function with the value from the first column as the first argument, and 3 as the second argument. Something like
Column A Column b
0 =baseconv(A1, 3)
1 =baseconv(A2, 3)
2 =baseconv(A3, 3)
... etc.
Your IDs are really just incremental values using a base 3 counting system. You can format the output to get leading zeros with a custom format of '000000'.
Incidentally, with 6 positions and 3 available values, you can get 3 ^ 6, or 729 unique IDs
First, I don't think you're using PERMUT correctly here. What PERMUT(6,3) gives you is the total number of ways to arrange three things picked out of a set of six things. So the result is 120 because you could have 6*5*4 possible permutations. In your case you have 3^6 = 729 possible strings, because each position has one of three possible characters.
Others have posted perfectly fine VBA-based solutions, but this isn't that hard to do in the worksheet. Here is an array formula that will return an array of the last six digits of the ternary (base-3) representation of a number:
=FLOOR(MOD(<the number>,3^({5,4,3,2,1,0}+1))/(3^{5,4,3,2,1,0}),1)
(As WarrenG points out, just getting a bunch of base-3 numbers is one way to solve your problem.)
You would drag out the numbers 0 through 728 in a column somewhere, say $A$1:$A$729. Then in $B$1:$G$1, put the formula:
=FLOOR(MOD(A1,3^({5,4,3,2,1,0}+1))/(3^{5,4,3,2,1,0}),1)
remembering to enter it as an array formula with Ctrl-Shift-Enter. Then drag that down through $B$729:$G$729.
Finally in cell $H$1, put the formula:
=CONCATENATE(B1,C1,D1,E1,F1,G1)
and drag that down through $H$729. You're done!

Resources