Excel IF AND statement with percentage output - excel

I have a dataset of coordinates, with each coordinate split across two columns, X and Y.
I'm trying to find the percentage of coordinates that fall in a single quadrant, and thought IF/AND statements would work, i.e.
IF "number in X column is between -5 and 0" AND "number in Y column is between -5 and 0" then display the total as a percentage of all data
IF "number in X column is between 0 and 5" AND "number in Y column is between 0 and 5" then display the total as a percentage of all data
Etc.
I think that may be the easiest way, but I'm stuck on three things:
1. How to do it for negative numbers
2. How to do "is greater than a number BUT less than a number" (rather than just greater/less than), and
3. How to show results as a percentage (I only know how to return TRUE or FALSE), i.e.:
=IF(AND(A2:A100>5,B2:B100>5),TRUE, FALSE)

If you have a version of Excel with dynamic arrays (Excel 365 or Excel 2021), you can use FILTER.
For example, if you are looking for the percent of coordinates in Q1 (0<=X<=5, 0<=Y<=5), you can use something like:
=ROWS(
FILTER($A$2:$B$100,
(($A$2:$A$100>=0)*($A$2:$A$100<=5)*($B$2:$B$100>=0)*($B$2:$B$100<=5))))/
ROWS($A$2:$A$100)
If not, a simple way is to do what you were originally trying in a helper column, one row at a time, and divide the sum of the results by the total number of rows.
To answer your questions:
1. For negative numbers, e.g. Q3, just use negative numbers. Something like this in a helper column (say C2, filled down to C100), testing one row at a time, because AND over whole ranges collapses to a single result:
=IF(AND($A2>=-5, $A2<=0, $B2>=-5, $B2<=0), 1, 0)
2. Logically, it is not really "greater than a number BUT less than a number", but "greater than a number AND less than a number". The formula for question 1 shows that in action.
3. To get a percentage, add up all the values and divide by the total, e.g. =SUM($C$2:$C$100)/ROWS($A$2:$A$100), then format the cell as a percentage. In an IF statement you can return anything (the format is =IF(test, value_if_true, value_if_false)), so in the above example I chose to return 1 whenever a row meets our criteria and 0 otherwise.
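The helper-column idea can also be sketched outside Excel to check the logic. The following Python snippet is only an illustration: the function name `quadrant_share` and the sample points are made up, and the ranges are treated as inclusive, as in the formulas above.

```python
def quadrant_share(points, x_range, y_range):
    """Return the fraction of (x, y) points with x and y inside the inclusive ranges."""
    x_lo, x_hi = x_range
    y_lo, y_hi = y_range
    # 1 for each row that passes all four comparisons, 0 otherwise (the helper column).
    flags = [1 if x_lo <= x <= x_hi and y_lo <= y <= y_hi else 0 for x, y in points]
    # Sum of the flags divided by the number of rows gives the share.
    return sum(flags) / len(flags)


points = [(-3, -2), (4, 1), (-1, -4), (2, -3)]  # made-up sample data
share_q3 = quadrant_share(points, (-5, 0), (-5, 0))
print(f"{share_q3:.0%}")
```

Negative bounds need no special handling; they are just the lower and upper limits of the comparison.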

Related

A function in excel that outputs a Y or an N if all the values are equal for a specific value

| Components | Serial Num | Market 1 | Market 2 | Market 3 |
|---|---|---|---|---|
| 1234 | 100000000 | N | Y | N |
| 1233 | 100000001 | N | Y | Y |
| 1235 | 100000000 | Y | N | N |
| 1236 | 100000000 | N | Y | N |
| 1231 | 100000001 | Y | Y | Y |
I have a table with over 1k rows, in which serial numbers repeat across rows. Each market status (Y or N) comes from an XLOOKUP that checks the component number against the component's market status in another tab, and those component statuses are accurate. Because the serial numbers repeat, I'm trying to create a function that checks whether a serial number is fully qualified for a market. For example, for serial number "100000001" the result for Market 2 would be "Y", because every "100000001" row has a Y in Market 2, but for "100000000" it should be "N", because not all of its rows have "Y" in Market 2. This would be simple if there were only a few serial numbers, but there are over 1k, so I'm trying to find out whether this is even possible with a spreadsheet of this size. In essence, I think it would work like an XLOOKUP, but it would need to find every row for a serial number, check the market status in each, and return "Y" only if all of them are "Y", otherwise "N".
I hope this makes sense. This is my first post on Stack Overflow, so feel free to let me know if there's anything I can do to make this question clearer or bring it up to Stack Overflow standards.
--
I tried using the =FILTER function, but its output spills, so for each serial I would need empty space below to hold the values. From there I would create an =AND(EXACT([range of the values output],"Y")). This would determine whether all the values in the range are equal to Y: if they are, "TRUE" would be the output, if not, "FALSE".
I also tried creating an XLOOKUP function that would look for the smallest value. In doing so I would make the Market outputs Y1 and N. What I hoped would happen is that if it came across an "N", it would give me that output first, since all it takes for the market status of the serial to be "N" is one N in its market. The only way it can be a Y is if all the values are Y. Nonetheless, this does not work, since XLOOKUP only looks for the value of the serial code and not the market status value.
Hey guys, thank you all for your support! I figured out a solution to the madness, haha. I downloaded Ablebits, which can merge rows that contain duplicate values, combining the values from the other columns into one cell separated by semicolons. For instance, if I merged all the cells in my example, serial number "100000000" would have the following under each market: Market 1 (N;Y;N), Market 2 (Y;N;Y), Market 3 (N;N;N). From there I created 2 color conditions in this order:
1.) If a cell contains an "N", it is red, indicating that it isn't qualified.
2.) If a cell contains a "Y", it is green. Since the first rule already made the cells that contain an N red, this one only applies to cells that are all Y (Y;Y;Y;Y), or a single "Y" if the serial didn't repeat.
It does not exactly give me an output of Y or N, but with all the values in one cell we can distinguish which serials are fully qualified and which aren't.
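For comparison, the "all rows for a serial must be Y" check can be written as a short grouping step. This Python sketch is not the Ablebits approach described above; the function name `fully_qualified` is hypothetical and the rows are the five from the example table. In a spreadsheet, the same test might be expressed by comparing two COUNTIFS results (rows for the serial versus rows for the serial that also have "Y"), though that is a suggestion rather than something tested in this thread.

```python
from collections import defaultdict

# Rows from the example table: (component, serial, market 1, market 2, market 3).
rows = [
    (1234, "100000000", "N", "Y", "N"),
    (1233, "100000001", "N", "Y", "Y"),
    (1235, "100000000", "Y", "N", "N"),
    (1236, "100000000", "N", "Y", "N"),
    (1231, "100000001", "Y", "Y", "Y"),
]


def fully_qualified(rows, market_index):
    """Map each serial to "Y" if all of its rows have "Y" in the given column, else "N"."""
    statuses = defaultdict(list)
    for row in rows:
        # Group the market status of every row under its serial number.
        statuses[row[1]].append(row[market_index])
    return {
        serial: "Y" if all(status == "Y" for status in values) else "N"
        for serial, values in statuses.items()
    }


market_2 = fully_qualified(rows, 3)  # tuple index 3 holds the Market 2 column
print(market_2)
```

A single "N" in the group is enough to make the result "N", which matches the rule described in the question.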

Counting if part of string is within interval

I am currently trying to check if a number in a comma-separated string is within a number interval. What I am trying to do is to check if an area code (from the comma-separated string) is within the interval of an area.
The data:
AREAS

| Area interval | Name | Number of locations |
|---|---|---|
| 1000-1499 | Area 1 | ? |
| 1500-1799 | Area 2 | ? |
| 1800-1999 | Area 3 | ? |

GEOLOCATIONS

| Name | Areas List |
|---|---|
| Location A | 1200, 1400 |
| Location B | 1020, 1720 |
| Location C | 1700, 1920 |
| Location D | 1940, 1950, 1730 |
The result I want is the number of unique locations whose "Areas List" contains at least one code within the area interval. So Location D should only count ONCE in the 1800-1999 area, and likewise Location A in the 1000-1499 area. But Location B should count once in both 1000-1499 and 1500-1799 (because a number from each interval is in the comma-separated string in "Areas List"):
| Area interval | Name | Number of locations |
|---|---|---|
| 1000-1499 | Area 1 | 2 |
| 1500-1799 | Area 2 | 3 |
| 1800-1999 | Area 3 | 2 |
How can this be done?
I have tried with COUNTIFS, but it doesn't seem to do the job.
Here is one option using FILTERXML():
Formula in C2:
=SUM(FILTERXML("<x><t>"&TEXTJOIN("</s></t><t>",,"1<s>"&SUBSTITUTE(B$7:B$10,", ","</s><s>"))&"</s></t></x>","//t[count(.//*[.>="&SUBSTITUTE(A2,"-","][.<=")&"])>0]"))
Where:
"<x><t>"&TEXTJOIN("</s></t><t>",,"1<s>"&SUBSTITUTE(B$7:B$10,", ","</s><s>"))&"</s></t></x>" - Is the part where we construct a valid piece of XML. The theory is that we use three axes. Each t-node starts with a literal 1 so that once we return the nodes with XPath we can sum the result. The outer x-node is there to make sure Excel handles the inner axes correctly. If you are curious to see what this XML looks like in the end, it's best to step through using the 'Evaluate Formula' function on the Formulas tab;
//t[count(.//*[.>="&SUBSTITUTE(A2,"-","][.<=")&"])>0]")) - Basically means that we collect all t-nodes where the count of child s-nodes that are >= the leftmost number and <= the rightmost number is larger than zero. For A2 the XPath would look like //t[count(.//*[.>=1000][.<=1499])>0] after substitution. In short: //t selects the t-nodes, and count(.//*[.>=1000][.<=1499])>0 keeps those where the count of child nodes that fulfill both requirements is larger than zero;
Since all t-nodes equal the number 1, the SUM() of these t-nodes equals the number of unique locations that have at least one area within the interval in their Areas List;
Note that FILTERXML() results in an error if no t-nodes are found. To counter that and keep the SUM() working correctly, wrap the FILTERXML() in an IFERROR(...., 0).
Or, wrap the above in BYROW():
Formula in C2:
=BYROW(A2:A4,LAMBDA(a,SUM(FILTERXML("<x><t>"&TEXTJOIN("</s></t><t>",,"1<s>"&SUBSTITUTE(B$7:B$10,", ","</s><s>"))&"</s></t></x>","//t[count(.//*[.>="&SUBSTITUTE(a,"-","][.<=")&"])>0]"))))
Using MMULT and TEXTSPLIT:
=LET(rng,TEXTSPLIT(D2,"-"),
tarr,IFERROR(--TRIM(TEXTSPLIT(TEXTJOIN(";",,$B$2:$B$5),",",";")),0),
SUM(--(MMULT((tarr>=--TAKE(rng,,1))*(tarr<=--TAKE(rng,,-1)),SEQUENCE(COLUMNS(tarr),,1,0))>0)))
I am in very distinguished company but will add my version anyway, as BYROW is probably a slightly different approach
=LET(range,B$2:B$5,
lowerLimit,--INDEX(TEXTSPLIT(E2,"-"),1),
upperLimit,--INDEX(TEXTSPLIT(E2,"-"),2),
counts,BYROW(range,LAMBDA(r,SUM((--TEXTSPLIT(r,",")>=lowerLimit)*(--TEXTSPLIT(r,",")<=upperLimit)))),
SUM(--(counts>0))
)
Here is the ugly way to do it, with A LOT of helper columns. But it's not so complicated 🙂
F4= =TRANSPOSE(FILTERXML("<m><r>"&SUBSTITUTE(B4;",";"</r><r>")&"</r></m>";"//r"))
F11= =TRANSPOSE(FILTERXML("<m><r>"&SUBSTITUTE(A11;"-";"</r><r>")&"</r></m>";"//r"))
F16= =SUM(F18:F21)
F18= =IF(SUM(($F4:$O4>=$F$11)*($F4:$O4<=$G$11))>0;1;"")
G18= =IF(SUM(($F4:$O4>=$F$12)*($F4:$O4<=$G$12))>0;1;"")
H18= =IF(SUM(($F4:$O4>=$F$13)*($F4:$O4<=$G$13))>0;1;"")
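The logic shared by all of the formulas above (split the list, test each code against the interval, count a location at most once) can be sketched in a few lines of Python. This is only an illustration of the counting rule, not a translation of any one answer; `count_locations` is a made-up name and the data is the sample from the question.

```python
def count_locations(interval, area_lists):
    """Count comma-separated lists with at least one code inside "low-high" (inclusive)."""
    low, high = (int(part) for part in interval.split("-"))
    count = 0
    for area_list in area_lists:
        codes = [int(code) for code in area_list.split(",")]
        # A location counts once, however many of its codes fall in the interval.
        if any(low <= code <= high for code in codes):
            count += 1
    return count


geolocations = {
    "Location A": "1200, 1400",
    "Location B": "1020, 1720",
    "Location C": "1700, 1920",
    "Location D": "1940, 1950, 1730",
}

for interval in ("1000-1499", "1500-1799", "1800-1999"):
    print(interval, count_locations(interval, geolocations.values()))
```

The counts can be compared against the expected table in the question (2, 3 and 2).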

What's the logic behind PERCENTILE.INC Excel function?

I would like to know how Excel calculates the values returned by the PERCENTILE.INC function. I'm studying percentiles and quartiles, and I got the results below:
How does Excel calculate the values in column F?
Here are the formulas I'm using:
=PERCENTILE.INC(B2:B21; 0,75) ==> F2
=PERCENTILE.INC(B2:B21; 0,50) ==> F3
=PERCENTILE.INC(B2:B21; 0,25) ==> F4
=PERCENTILE.INC(B2:B21; 0,00) ==> F5
Short answer - the position of a given percentile when the data is sorted in ascending order, using percentile.inc, is given by
(N-1)p+1
where p is the required percentile as a fraction from 0 to 1 and N is the number of points.
If this expression gives a whole number, you take the value at that position (e.g. percentile zero gives position 1, so its value is exactly 22). If it's not a whole number, you interpolate between the value at the position given by the whole-number part (e.g. for p=0.25 the position is 5.75, so the whole-number part is 5 and the value at this position is 52) and the value at the position one higher (in this case position 6, where the value is 55): multiply the difference of the two values (3) by the fractional part (0.75), giving 2.25, and add this to the lower of the two values, giving 54.25. A shorter way of saying this is that you go three quarters of the way from the lower to the upper of the two nearest values. So for p = 0.25 you have 52 + 0.75 × (55 − 52) = 54.25.
If you wished to show the logic as an Excel formula, you could implement the expression shown here on the right (where h, in the second column of the table, is the position calculated from the formula above and x is the value at that position)
like this:
=LET(p,J3,
range,I$2:I$21,
N,COUNT(range),
position,(N-1)*p+1,
lower,FLOOR(position,1),
fraction,MOD(position,1),
upper,CEILING(position,1),
lowerValue,INDEX(range,lower),
upperValue,INDEX(range,upper),
difference,upperValue-lowerValue,
lowerValue+fraction*difference)
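The same steps can be checked outside Excel. The Python sketch below mirrors the LET formula above, with two assumptions of its own: it sorts the input itself, and it expects p between 0 and 1. The function name `percentile_inc` is chosen only for this illustration.

```python
import math


def percentile_inc(values, p):
    """Inclusive percentile via linear interpolation at position (N-1)*p + 1."""
    data = sorted(values)                # positions refer to ascending order
    position = (len(data) - 1) * p + 1   # 1-based position, may be fractional
    lower = math.floor(position)
    upper = math.ceil(position)
    fraction = position - lower
    lower_value = data[lower - 1]        # convert 1-based position to 0-based index
    upper_value = data[upper - 1]
    # Move `fraction` of the way from the lower value to the upper value.
    return lower_value + fraction * (upper_value - lower_value)


print(percentile_inc([1, 2, 3, 4], 0.25))
```

With four values and p = 0.25 the position is 1.75, so the result lies three quarters of the way from the first value to the second.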

How to evaluate a sum in excel to return 0 if the sum of two values is >0 (for my data I do not care about Positive values)

I am summing loads for member design and have the correct values for downforce. However, when I use my formulas for uplift (-) and there is no uplift, they return a positive value. That is objectively correct, but it is not needed for my purposes: I only need negative values for uplift.
I've tried
sumif(Range,SUM(Range)<0,Sum Range)
sumif(Range,"<0")... this just returns the only negative value which is not what I need
Basically, I still need to sum the data, but if the sum is positive it should return zero, and if the sum is negative it should keep the value.
Ex.)
2+3 should = 0
3-9 should = -6
You can use MIN to not allow the value to become greater than 0
=MIN(SUM(A1,A2),0)
This will make any positive number 0 because 0 is lower than any positive number.
Something like this?
=IF(J87+K87>0,0,J87+K87)
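Both answers apply the same rule: sum first, then cap the result at zero. A minimal Python sketch of that rule, using the two examples from the question (the name `uplift_only` is made up here):

```python
def uplift_only(loads):
    """Sum the loads, but report 0 when the total is positive (no net uplift)."""
    return min(sum(loads), 0)


print(uplift_only([2, 3]))   # total is positive, so it is capped at 0
print(uplift_only([3, -9]))  # total is negative, so it is kept
```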

Binning in Excel

Which formulae in MS Excel can we use for the following?
equi-depth binning
equi-width binning
Here's what I used. The data I was binning was in A2:A2001.
Equi-width:
I calculated the width in a separate cell (U2), using this formula:
=(MAX($A$2:$A$2001) - MIN($A$2:$A$2001) + 0.00000001)/10
10 is the number of bins. The + 0.00000001 is there because without it, values equal to the maximum were getting put into their own bin.
Then, for the actual binning, I used this:
=ROUNDDOWN(($A2-MIN($A$2:$A$2001))/$U$2, 0)
This function is finding how many bin-widths above the minimum your value is, by dividing (value - minimum) by the bin width. We only care about how many full bin-widths fit into the value, not fractional ones, so we use ROUNDDOWN to chop off all the fractional bin-widths (that is, show 0 decimal places).
Equi-depth
This one is simpler.
=ROUNDDOWN(PERCENTRANK($A$2:$A$2001, $A2)*10, 0)
First, get the percentile rank of the current cell ($A2) out of all the cells being binned ($A$2:$A$2001). This will be a value between 0 and 1, so to convert it into bins, just multiply by the total number of bins you want (I used 10). Then, chop off the decimals the same way as before.
For either of these, if you want your bins to start at 1 rather than 0, just add a +1 to the end of the formula.
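To see what the two formulas do, here is a rough Python sketch of both. It makes some assumptions: the function names are invented, the percent rank is computed as (number of smaller values) / (N - 1) for a value that is in the data, and Excel's rounding of PERCENTRANK to a fixed number of significant digits is ignored.

```python
import math


def equi_width_bins(values, n_bins, eps=1e-8):
    """Bin index (from 0): how many full bin widths a value lies above the minimum."""
    lo, hi = min(values), max(values)
    width = (hi - lo + eps) / n_bins  # eps keeps the maximum inside the last bin
    return [math.floor((v - lo) / width) for v in values]


def equi_depth_bins(values, n_bins):
    """Bin index (from 0) from a rank-based percent rank, rounded down."""
    n = len(values)
    bins = []
    for v in values:
        smaller = sum(1 for other in values if other < v)
        # Integer floor of (smaller / (n - 1)) * n_bins, avoiding float rounding.
        bins.append(smaller * n_bins // (n - 1))
    return bins


print(equi_width_bins([0, 5, 10], 2))
print(equi_depth_bins([10, 20, 30, 40, 50], 4))
```

One thing the sketch makes visible: in the equi-depth version the maximum has a percent rank of exactly 1, so it lands in bin index `n_bins`, one past the others. Depending on the use case, that top value may need to be folded into the previous bin.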
Best approach is to use the built-in method:
http://support.microsoft.com/kb/214269
I think the VBA version of the addin (step 3 with most versions) will also give you the code.
Put this formula in B1:
=MAX( ROUNDUP( PERCENTRANK($A$1:$A$8, A1) *4, 0),1)
Fill the formula down column B and you are done. The formula divides the range into 4 equal buckets and returns the number of the bucket that the cell A1 falls into. The first bucket contains the lowest 25% of values.
General pattern is:
=MAX( ROUNDUP ( PERCENTRANK ([Range], [TestCell]) * [NumberOfBuckets], 0), 1)
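The general pattern can be mimicked in Python to see how values are assigned. As before, this is a sketch under assumptions: `bucket_numbers` is a hypothetical name, the percent rank is taken as (number of smaller values) / (N - 1), and Excel's rounding of PERCENTRANK is not reproduced.

```python
def bucket_numbers(values, n_buckets):
    """Bucket number (from 1) as max(ceil(percent_rank * n_buckets), 1)."""
    n = len(values)
    buckets = []
    for v in values:
        smaller = sum(1 for other in values if other < v)
        # Ceiling of smaller * n_buckets / (n - 1) using integer arithmetic.
        scaled = -(-smaller * n_buckets // (n - 1))
        # The minimum has rank 0, so max(..., 1) moves it into the first bucket.
        buckets.append(max(scaled, 1))
    return buckets


print(bucket_numbers([1, 2, 3, 4, 5, 6, 7, 8], 4))
```

Rounding up instead of down, combined with the lower bound of 1, is what keeps the bucket numbers in the range 1 to the number of buckets.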
You may have to build the matrix to graph.
For the bin bracket you could use =PERCENTILE() for equi-depth and a proportion of the difference =Max(Data) - Min(Data) for equi-width.
You could obtain the frequency with =COUNTIF(). The bin's Mean could be obtained using =SUMPRODUCT((Data>LOWER_BRACKET)*(Data<UPPER_BRACKET)*Data)/frequency
More complex statistics could be obtained by hacking around with SUMPRODUCT and/or array formulas (which I do not recommend, since they are very hard to comprehend for a non-programmer)

Resources