Counting if part of string is within interval - excel

I am currently trying to check if a number in a comma-separated string is within a number interval. What I am trying to do is to check if an area code (from the comma-separated string) is within the interval of an area.
The data:
AREAS
Area interval
Name
Number of locations
1000-1499
Area 1
?
1500-1799
Area 2
?
1800-1999
Area 3
?
GEOLOCATIONS
Name
Areas List
Location A
1200, 1400
Location B
1020, 1720
Location C
1700, 1920
Location D
1940, 1950, 1730
The result I want here is the number of unique locations in the "Areas list" within the area interval. So Location D should only count ONCE in the 1800-1999 "area", and the Location A the same in the 1000-1499 location. But location B should count as one in both 1000-1499 and one in 1500-1799 (because a number from each interval is in the comma-separated string in "Areas list"):
Area interval
Name
Number of locations
1000-1499
Area 1
2
1500-1799
Area 2
3
1800-1999
Area 3
2
How is this possible?
I have tried with a COUNTIFS, but it doesnt seem to do the job.

Here is one option using FILTERXML():
Formula in C2:
=SUM(FILTERXML("<x><t>"&TEXTJOIN("</s></t><t>",,"1<s>"&SUBSTITUTE(B$7:B$10,", ","</s><s>"))&"</s></t></x>","//t[count(.//*[.>="&SUBSTITUTE(A2,"-","][.<=")&"])>0]"))
Where:
"<x><t>"&TEXTJOIN("</s></t><t>",,"1<s>"&SUBSTITUTE(B$7:B$10,", ","</s><s>"))&"</s></t></x>" - Is the part where we construct a valid piece of XML. The theory here is that we use three axes here. Each t-node will be named a literal 1 to make sure that once we return them with xpath we can sum the result. The outer x-nodes are there to make sure Excel will handle the inner axes correctly. If you are curious to know how this xml-syntax looks at the end, it's best to step through using the 'Evaluate Formula' function on the Data-tab;
//t[count(.//*[.>="&SUBSTITUTE(A2,"-","][.<=")&"])>0]")) - Basically means that we collect all t-nodes where the count of child s-nodes that are >= to the leftmost number and <= to the rightmost number is larger than zero. For A2 the xpath would look like //t[count(.//*[.>=1000][.<=1499])>0]")) after substitution. In short: //t - Select t-nodes, where count(.//* select all child-nodes where count of nodes that fullfill both requirements [.>=1000][.<=1499] is larger than zero;
Since all t-nodes equal the number 1, the SUM() of these t-nodes equals the amount of unique locations that have at least one area in its Areas List;
Important to note that FILTERXML() will result into an error if no t-nodes could be found. That would mean we need to wrap the FILTERXML() in an IFERROR(...., 0) to counter that and make the SUM() still work correctly.
Or, wrap the above in BYROW():
Formula in C2:
=BYROW(A2:A4,LAMBDA(a,SUM(FILTERXML("<x><t>"&TEXTJOIN("</s></t><t>",,"1<s>"&SUBSTITUTE(B$7:B$10,", ","</s><s>"))&"</s></t></x>","//t[count(.//*[.>="&SUBSTITUTE(a,"-","][.<=")&"])>0]"))))

Using MMULT and TEXTSPLIT:
=LET(rng,TEXTSPLIT(D2,"-"),
tarr,IFERROR(--TRIM(TEXTSPLIT(TEXTJOIN(";",,$B$2:$B$5),",",";")),0),
SUM(--(MMULT((tarr>=--TAKE(rng,,1))*(tarr<=--TAKE(rng,,-1)),SEQUENCE(COLUMNS(tarr),,1,0))>0)))

I am in very distinguished company but will add my version anyway as byrow probably is a slightly different approach
=LET(range,B$2:B$5,
lowerLimit,--#TEXTSPLIT(E2,"-"),
upperLimit,--INDEX(TEXTSPLIT(E2,"-"),2),
counts,BYROW(range,LAMBDA(r,SUM((--TEXTSPLIT(r,",")>=lowerLimit)*(--TEXTSPLIT(r,",")<=upperLimit)))),
SUM(--(counts>0))
)

Here the ugly way to do it, with A LOT of helper columns. But not so complicated 🙂
F4= =TRANSPOSE(FILTERXML("<m><r>"&SUBSTITUTE(B4;",";"</r><r>")&"</r></m>";"//r"))
F11= =TRANSPOSE(FILTERXML("<m><r>"&SUBSTITUTE(A11;"-";"</r><r>")&"</r></m>";"//r"))
F16= =SUM(F18:F21)
F18= =IF(SUM(($F4:$O4>=$F$11)*($F4:$O4<=$G$11))>0;1;"")
G18= =IF(SUM(($F4:$O4>=$F$12)*($F4:$O4<=$G$12))>0;1;"")
H18= =IF(SUM(($F4:$O4>=$F$13)*($F4:$O4<=$G$13))>0;1;"")

Related

Determine Size Of Array of Equal Items Excel

I have an array that looks like this
11100100110
essentially, an array of fixed size with each item being a 1 or 0 with the last item always equal to 0.
Consider each set of consecutive 1's to be a "bucket". I'd like a formula to determine the size of each bucket. So the output of this formula for the above sequence should be
312
as an array. Ideally this works in both excel and google sheets.
If you are interested this is the result of a list of stars and bars configurations where the 0's in my sequence represent bars and the 1's represent stars (the final value is a dummy 0 to make things easier to work with). I want the size of each non-empty bucket in a given configuration of stars and bars.
Thanks, in advance.
You could also use the standard method with Frequency which will work with Excel 365 and GS:
=FILTER(FREQUENCY(IF(A1:A11=1,ROW(A1:A11)),IF(A1:A11=0,ROW(A1:A11))),FREQUENCY(IF(A1:A11=1,ROW(A1:A11)),IF(A1:A11=0,ROW(A1:A11))))
try:
=INDEX((JOIN(, LEN(SPLIT(A1, 0)))))
update:
=INDEX(IFERROR(1/(1/SUBSTITUTE(FLATTEN(QUERY(TRANSPOSE(IFERROR(1/(1/
LEN(SPLIT(SUBSTITUTE(FLATTEN(QUERY(
TRANSPOSE(A1:K),, 9^9)), " ", ), 0))))),, 9^9)), " ", ))))
Assuming A2:A9 contains the data,
=ARRAYFORMULA(QUERY(FREQUENCY(IF(A2:A9,ROW(A2:A9)),IF(NOT(A2:A9),ROW(A2:A9))),"where Col1>0",))
FREQUENCY(data,classes) to get the frequency of data in classes
Make sequence of row numbers as data, if 1
Make sequence of row numbers as classes, if not 1
QUERY to get rid of zeros

Excel: How to find closest number in table, many times

Excel
Need to find nearest float in a table, for each integer 0..99
https://www.excel-easy.com/examples/closest-match.html explains a great technique for finding the CLOSEST number from an array to a constant cell.
I need to perform this for many values (specifically, find nearest to a vertical list of integers 0..99 from within a list of floats).
Array formulas don't allow the compare-to value (integers) to change as we move down the list of integers, it treats it like a constant location.
I tried Tables, referring to the integers (works) but the formula from the above web site requires an Array operation (F2, control shift Enter), which are not permitted in Tables. Correction: You can enter the formula, control-enter the array function for one cell, copy the formulas, then insert table. Don't change the search cell reference!
Update:
I can still use array operations, but I manually have to copy the desired function into each 100 target cells. No biggie.
Fixed typo in formula. See end of question for details about "perfection".
Example code:
AI4=some integer
AJ4=MATCH(MIN(ABS(Table[float_column]-AI4)), ABS(Table[float_column]-AI4), 0)
repeat for subsequent integers in AI5...AI103
Example data:
0.1 <= matches 0
0.5
0.95 <= matches 1
1.51 <= matches 2
2.89
Consider the case where target=5, and 4.5, 5.5 exist in the list. One gives -0.5 and the other +0.5. Searching for ABS(-.5) will give the first one. Either one is decent, unless your data is non-monotonic.
This still needs a better solution.
Thanks in advance!
I had another problem, which pushed to a better solution.
Specifically, since the Y values for the X that I am interested in can be at varying distances in X, I will interpolate X between the X point before and after. Ie search for less than or equal, also greater than or equal, interpolate the desired X, then interpolate the Y values.
I could go a step further and interpolate N - 1 to N + 1, which will give cleaner results for noisy data.

Excel comparison with multiple IFs

I have a scale, based on which I decide the value of the coefficient for the multiplication. The scale looks as following:
Which means that:
for Category1: when value>=1.000.000 then coef is 1, when value>=500.000 then coef is 0.8 and etc.
Same logic applies for Category2;
Then I have input data in the following format:
Company !MainCat|Sales Amount|
Company1|T1 | 6.500.000|
Company2|T2 | 70.000|
I need to find corresponding coefficient, ratio of the coeffitient and the value (=ratio*MaxCoef). Currently, I am finding coef the following way:
- for company1:
=IF(C8>=$D2;$D$1;IF(C8>=$E2;$E$1;IF(C8>=$F2;$F$1;IF(C8>=$G2;$G$1;IF(C8>=$H2;$H$1;IF(C8>=$I2;$I$1))))))
That is literally hardcoded and doesn't look good. Maybe there is a better way of doing ? Any suggestions?
Formula view:
You can COUNTIF(range, [criteria] < value) * 0.2 as your add 0.2 per coef stage.
To you data do: =COUNTIF(D2:H2, "<"&C8) * 0.2, count how many stages the value passes * the value per stage.
Your count if range needs to be until H2 as I2 is 0, so inferior to value and gets counted.
To combine the COUNTIF() with a dynamic search for the right category based on MainCat you can MATCH() the MainCat with Code which will give the row where the Code is located and utilize INDIRECT() to apply it as range.
=COUNTIF(INDIRECT("D"&MATCH(B8,B:B,0)&":H"&MATCH(B8,B:B,0)),"<"&C8)*0.2
MATCH(B8,B:B,0) - will match the value on B8 (lets say T1) and return the row 2.
INDIRECT("D"&MATCH(B8,B:B,0)&":H"&MATCH(B8,B:B,0) = INDIRECT("D"&2&":H"&2) - will turn the text into an actual range to be use by the COUNTIF().
Create a table ‚Mapping’ that contains two columns, ‚Category’ and ‚Coefficient‘, then use INDEX-MATCH on it as described in https://www.deskbright.com/excel/using-index-match/.
=INDEX(Mapping[Category]; MATCH([Coefficient]; Mapping[Coefficient]; -1))
This example assumes that you put this formula into a table that has a column named ‚Coefficient‘ with the input value to your multiple IFs.
The trick is that as a match_type argument, provide either -1 or 1, according to your needs.
You can do this in VBA. Write your own function which ends in something like that
=MyOwnScale(C8; B8; A2:I3)
The first parameter of your VBA-function is the value, the second the category and the third is the range with the thresholds. So you can move your cascading IF-loops in VBA-Code and you (and your users) see only a clean function call in the cell.

Using tbl.Lookup to match just part of a column value

This question relates to the Schematiq add-in for Microsoft Excel.
Using =tbl.Lookup(table, columnsToSearch, valuesToFind, resultColumn, [defaultValue]) the values in the valuesToFind column have a consistent 3 characters to the left and then varying characters after (e.g. 908-123456 or 908-321654 - i.e. 908 is always consistent)
How can I tell the function to lookup the value based on the first 3 characters only? The expected answer should be the sum of the results of the above, i.e. 500 + 300 = 800
tbl.Lookup() works by looking for an exact match - this helps ensure it's fast but in this case it means you need an extra step to calculate a column of lookup values, something like this:
A2: =tbl.CalculateColumn(A1, "code", "x => LEFT(x, 3)", "startOfCode")
This will give you a new column that you can use for the columnsToSearch argument, however tbl.Lookup() also looks for just one match - it doesn't know how to combine values together if there is more than one matching row in the table, so I think you also need one more step to group your table by the first 3 chars of the code, like this:
A3: =tbl.Group(A2, "startOfCode", "amount")
Because tbl.Group() adds values together by default, this will give you a table with a row for each distinct value of startOfCode and the subtotal of amount for each of those values. Finally, you can do the lookup exactly as you requested, which for your input table will return 800:
A4: =tbl.Lookup(A3, "startOfCode", "908", "amount")

Binning in Excel

Which formulae in MS Excel can we use for -
equi-depth binning
equi-width binning
Here's what I used. The data I was binning was in A2:A2001.
Equi-width:
I calculated the width in a separate cell (U2), using this formula:
=(MAX($A$2:$A$2001) - MIN($A$2:$A$2001) + 0.00000001)/10
10 is the number of bins. The + 0.00000000001 is there because without it, values equal to the maximum were getting put into their own bin.
Then, for the actual binning, I used this:
=ROUNDDOWN(($A2-MIN($A$2:$A$2001))/$U$2, 0)
This function is finding how many bin-widths above the minimum your value is, by dividing (value - minimum) by the bin width. We only care about how many full bin-widths fit into the value, not fractional ones, so we use ROUNDDOWN to chop off all the fractional bin-widths (that is, show 0 decimal places).
Equi-depth
This one is simpler.
=ROUNDDOWN(PERCENTRANK($A$2:$A$2001, $A2)*10, 0)
First, get the percentile rank of the current cell ($A2) out of all the cells being binned ($A$2:$A$2001). This will be a value between 0 and 1, so to convert it into bins, just multiply by the total number of bins you want (I used 10). Then, chop off the decimals the same way as before.
For either of these, if you want your bins to start at 1 rather than 0, just add a +1 to the end of the formula.
Best approach is to use the built-in method:
http://support.microsoft.com/kb/214269
I think the VBA version of the addin (step 3 with most versions) will also give you the code.
Put this formula in B1:
=MAX( ROUNDUP( PERCENTRANK($A$1:$A$8, A1) *4, 0),1)
Fill down the formula all across B column and you are done. The formula divides the range into 4 equal buckets and it returns the bucket number which the cell A1 falls into. The first bucket contains the lowest 25% of values.
General pattern is:
=MAX( ROUNDUP ( PERCENTRANK ([Range], [TestCell]) * [NumberOfBuckets], 0), 1)
You may have to build the matrix to graph.
For the bin bracket you could use =PERCENTILE() for equi-depth and a proportion of the difference =Max(Data) - Min(Data) for equi-width.
You could obtain the frequency with =COUNTIF(). The bin's Mean could be obtained using =SUMPRODUCT((Data>LOWER_BRACKET)*(Data<UPPER_BRACKET)*Data)/frequency
More complex statistics could be reached hacking around with SUMPRODUCT and/or Array formulas (which I do not recommend since are very hard to comprehend for a non-programmer)

Resources