Excel COUNTIF match variations of target: LET solution? - excel

This is a slightly more complicated issue than a simple =COUNTIF(rng,"*"&value&"*"), as found here.
I have a 2D array with cells containing data such as:
abc
def
abc def
ghi
abc,def,ghi
abcdef
ghi; def
..... and several other variations of this. I am trying to count exact matches of "abc", but I want the count to be inclusive of cells containing "abc def" and other like variations, however I can't just use the above simple COUNTIF formula since "abcdef" is not an acceptable match. The target string must stand alone or be separated from other text by an acceptable character in chars.
I think I've got this one 90% done, but the bit I need help with is combining all the possible acceptable variations of a target "name" into a flat range that I can then check my data source against for the COUNTIF. I've tried INDEX(r_1:r_8,idxRow,idxCol) and other familiar solutions that work on the sheet when referencing other ranges, but I'm new to using the =LET function. All of this works well when broken out into separate components on my spreadsheet, but I'm looking for a cleaner solution with =LET. See below for current formula:
=LET(rg, DataTable[[Q14_1]:[Q14_9]],
name, AU38,
chars, {" ",",",";"},
r, 8,
r_1, CONCATENATE(name,chars),
r_2, CONCATENATE(chars,name),
r_3, CONCATENATE(chars,name,chars),
r_4, CONCATENATE(name,chars,"*"),
r_5, CONCATENATE("*",chars,name),
r_6, CONCATENATE(chars,name,chars,"*"),
r_7, CONCATENATE("*",chars,name,chars),
r_8, CONCATENATE("*",chars,name,chars,"*"),
c, COUNTA(chars),
mSeq, SEQUENCE(r*c),
idxRow, 1+MOD(mSeq,r),
idxCol, INT((SEQUENCE(r*c)-1)/r)+1,
X, INDEX(**NeedHelpHere**,idxRow,idxCol),
SUM(COUNTIF(rg,name),COUNTIF(rg,X))
)

Give a try on below formula. If you have more delimiter like space, comma & others then you need to use more SUBSTITUTE() function.
=LET(x,FILTERXML("<t><s>"&SUBSTITUTE(SUBSTITUTE(A1:A7," ","</s><s>"),",","</s><s>")&"</s></t>","//s"),y,FILTER(x,x="abc"),SUM(--(y<>"")))
To learn about FILTERXML() please read this article from JvdV.

I've thought about this again and am posting a solution that fits my needs.
I don't need to index a single column of potential matches to then COUNTIF, I can just COUNTIF multiple times. Additionally, I was not taking into account different combinations of chars, I was only searching for the same chars on either side of the target (e.g. ",abc," when I should have also been looking for ",abc;"). Transposing the chars array on one side is a simple way of fixing this. It also turns out that "*"&target&"*" searches for "*target*" AND "target" (duh!), so I simplified further, removing duplicative possibilities.
My final formula is below, which counts the number of times target (by itself or surrounded by any acceptable chars) is present in a given rng:
=LET(rng, DataTable[Q14_1]:[Q14_9]]),
name, $A6,
chars, {" " , "," , ";"},
r_1, CONCATENATE(name,chars,"*"),
r_2, CONCATENATE("*",chars,name),
r_3, CONCATENATE("*",chars,name,TRANSPOSE(chars),"*"),
SUM(COUNTIF(rng,name),COUNTIF(rng,r_1),COUNTIF(rng,r_2),COUNTIF(rng,r_3))
)

Related

How to extract specific text from a sentence in Excel?

I have a database that exports data like this:
How can I get for instance, the Net Rentable Area with the values needed:
E.G.
Net Rentable Area
I tried the TextSplit function but I got a spill.
Please let me know what can be done, thanks!
Also it would be nice to see it working in something such as the Asking Rate, which has a different format.
In cell C2 you can put the following formula:
=1*TEXTSPLIT(TEXTAFTER(A2, B2&" ")," ")
Note: Multiplying by 1 ensures the result will be a number instead of a text.
and here is the output:
If all tokens to find are all words (not interpreted as numbers), then you can use the following without requiring to specify the token to find:
=LET(split, 1*TEXTSPLIT(A2," "), FILTER(split, ISNUMBER(split)))
Under this assumption you can even have the corresponding array version as follow:
=LET(rng, A2:A100, input, FILTER(rng, rng <>""), IFERROR(DROP(REDUCE(0, input,
LAMBDA(acc,text, LET(split, 1*TEXTSPLIT(text," "),
nums, FILTER(split, ISNUMBER(split),""), VSTACK(acc, nums)))),1),"")
)
Note: It uses the trick for creating multiple rows using VSTACK within REDUCE. An idea suggested by #JvdV from this answer. It assumes A1 has the title of the column, if not you can use A:A instead.

Excel - multiple value search across multiple columns or one column with multiple values

I have 7 criteria = TMO-1 through to TMO-7
I have two scenarios to search from.
i have either got a single excel with TMO-6, TMO-201, TMO-67,... etc (some have a lot of values)
or i have split the cell up so the values are all in individual cells such that [TMO-6][TMO-201][TMO-67] etc
I have tried two equations from each. for the first one (the preferred solution) i have tried:
=IF(IFERROR(SEARCH("TMO-1",AB8),0) > 0, "TMO-1",IF(IFERROR(SEARCH("TMO-2",AB8),0) > 0, "TMO-2", "false"))
the problem with that is it finds anything that starts with TMO-1, so will show true if TMO-12 is in the cell.
For option 2 i tried:
=IF(AB9:AR9=TMO-1, TMO-1, IF(AB9:AR9=TMO-2, TMO-2, IF(AB9:AR9=TMO-3, TMO-3,IF(AB9:AR9=TMO-4, TMO-4, IF(AB9:AR9=TMO-5, TMO-5, IF(AB9:AR9=TMO-6, TMO-6, IF(AB9:AR9=TMO-7, TMO-7, "N/A")))))))
and i get the error #spill
any ideas ?
Assuming:
ms365 (Hence the #SPILL error);
The option between concatenated values or seperated (hence AB8 against AB9:AR9);
All numbers are prepended with TMO-;
You are looking for the 1st match in sequence (1-7);
If no match is found, you want to return "Not Found".
First thing that came to mind is to just keep the comma-seperated data in AB8 and use a simple trick to concatenate the delimiters with the sequence:
=ISNUMBER(FIND("-"&SEQUENCE(7)&",",A1&","))
To put that in practice, try:
Formula in B1:
=IFERROR(MATCH("X",IF(ISNUMBER(FIND("-"&SEQUENCE(7)&",",A1&",")),"X"),0),"Not Found")
Other options:
=#IFERROR(SORT(FILTERXML("<t><s>"&SUBSTITUTE(A1,", ","</s><s>")&"</s></t>","//s[substring(.,5)<8]")),"Not Found")
Or, using the insider BETA-functions:
=LET(X,MIN(--DROP(TEXTSPLIT(A1,"-",", "),,1)),IF(X<8,"TMO-"&X,"Not Found"))

Parse a string into a Table using FILTERXML

This is related this question. The OP proposed to give inputs to a formula that contain a list of connection quantities and speeds like this:
1x1000,2x200,1x50 would mean that there is one 1000k connection, two 200k and 1 50k. I would like to parse this into an array table like this:
1
1000
2
200
1
50
I tried this formula, but it only produces the left hand side of the table:
=LET( case, A5,
a, FILTERXML("<t><s>"&SUBSTITUTE(case,",","</s><s>")&"</s></t>","//s[contains(., 'x')]"),
FILTERXML("<t><s>"&SUBSTITUTE(a,"x","</s><s>")&"</s></t>","//s") )
where case is the input variable, a parses the table into strings containing "x" (this is to ensure that only valid "q x speed" strings are used. I then tried to split this array and... no joy.
From this post by JvdV, I think the answer can be found in the xpath, but I cannot find a solution.
Looks like you want to either spill the entire array or use it in later calculations? Either way, I came up with:
=LET(X,FILTERXML("<t><s>"&SUBSTITUTE(SUBSTITUTE(A1,",","x"),"x","</s><s>")&"</s></t>","//s"),INDEX(X,SEQUENCE(COUNT(X)/2,2)))
Or, a littel more verbose without LET():
=INDEX(FILTERXML("<t><s>"&SUBSTITUTE(SUBSTITUTE(A1,",","x"),"x","</s><s>")&"</s></t>","//s"),SEQUENCE(LEN(A1)-LEN(SUBSTITUTE(A1,"x","")),2))
One way to get it is something like
=LET(x, FILTERXML("<t><s>"&SUBSTITUTE($A$1, ",", "</s><s>")&"</s></t>", "//s"),
IF(SEQUENCE(1,2)=1, LEFT(x, SEARCH("x",x)-1), RIGHT(x, LEN(x)-SEARCH("x",x))))
Once you break up the string by comma, you can then break up the component strings by "x" with something like
=TRANSPOSE(FILTERXML("<t><s>"&SUBSTITUTE(A7, "x", "</s><s>")&"</s></t>", "//s"))
but I'm not sure if you can combine the two actions in one go to get both width and depth dimensions (i.e. =TRANSPOSE(FILTERXML("<t><s>"&SUBSTITUTE(original_filterxml, "x", "</s><s>")&"</s></t>", "//s")) will not work).
Maybe,
In C1, formula copied right to D1 and all copied down :
=TRIM(MID(SUBSTITUTE(SUBSTITUTE(","&$A$1,"X",","),",",REPT(" ",99)),((ROW(A1)*2+COLUMN(A1))-2)*99,99))
Or,
If using FILTERXML function, try :
=IFERROR(FILTERXML("<a><b>"&SUBSTITUTE(SUBSTITUTE($A$1,"X",","),",","</b><b>")&"</b></a>","//b["&(ROW(A1)*2+COLUMN(A1))-2&"]"),"")

Find specific characters and return the next value in the cell using Excel Formula

I am not sure where to begin with the formula as I have gotten myself so confused with everything. I have a cell the contains "PON " or "PON: " or "PON = " then the actual PON (Example: PON 123467) I want to formula to return 123467 in the cell.
Examples What I want returned
I have PON 123467 for shoes 123467
I have PON: 234567-AB for food 234567-AB
I have PON - 569874-Weird for accessories 569874-Weird
I have PON = DOG-564-987 for dog food DOG-564-987
I am currently using Excel 365
Filterxml() will give you best companion here in this case. Try-
=FILTERXML("<t><s>"&SUBSTITUTE(FILTERXML("<t><s>"&SUBSTITUTE(A1," for","</s><s>")&"</s></t>","//s[1]")," ","</s><s>")&"</s></t>","//s[last()]")
Using FILTERXML, and testing for a substring following PON, you can try:
=FILTERXML("<t><s>"&SUBSTITUTE(TRIM(A1)," ","</s><s>") & "</s></t>","//s[contains(.,'PON')]/following-sibling::*[string-length(.)>2][1]")
Note that FILTERXML solution will cause a PON that is solely numeric, but with a leading zero, to drop the leading zero. Unfortunately, the xPath implementation in that function does not include the string() function
If dropping the leading zero might be a problem, you can add a character to the node that will force the number to be seen as a string. In the modified formula below, I use the unicode zero-width space, but there are others you can use. Note that this will count as a character for the string=length function, so be sure to maintain the >2 parameter:
=FILTERXML("<t><s>"&SUBSTITUTE(TRIM(A1)," ","</s><s>"&UNICHAR(8203)) & "</s></t>","//s[contains(.,'PON')]/following-sibling::*[string-length(.)>2][1]")
Because of the variablity in your data, that sometimes there are extraneous space-separated substrings between PON and your desired extract, the xpath:
locates the substring PON
returns all subsequent siblings that have a string-length of more than two (adjust if necessary)
returns the first sibling that meets that criterion.
You might try this formula.
=TRIM(LEFT(MID(A2,FIND(#{1,2,3,4,5,6,7,8,9},A2),100),FIND(" ",MID(A2,FIND(#{1,2,3,4,5,6,7,8,9},A2),100))))
It extracts the text between the first number and the first space following that number. The size of that extract is limited to 100 characters.

Formula to look for a word within a sentence

Here is the Sample Google sheet file
https://docs.google.com/spreadsheets/d/1B0CQyFeqxg2wgYHJpFxLIzw_8Pv067p0cwacWk0Nc4o/edit?usp=sharing
I have an Excel Sheet where I need to find Arabic Words and separate them.
For example, I have data like this:
//olyservice/GIS-TANSIQ01/Storage/46-أمانة منطقة عسير -بلدية بللحمر/حدود القري المطلوب اعتمادهاالمعتمد مسمايتها بالوزارة.rar
I'm looking for:
1st column: أمانة منطقة عسير
2nd column: بلدية بللحمر
3rd column: RAR
If there is no أمانة and بلدية words, the columns should be blank.
I tried these methods, without success:
=RIGHT(MID(A2,FIND("-",A2,20)+1,255),25)
and
=TRIM(MID(SUBSTITUTE(A2,"",REPT(" ",99)),MAX(1,FIND("-",SUBSTITUTE(A2,"",REPT(" ",99)))+21),99))
Since you specify certain key words to be found, we can look for those key words and then the relevant delimiter, based on your example.
In your example, أمانة is followed by the dash, and بلدية by the slash. (followed by is in terms of the right-to-left orientation of Arabic words).
Try this:
Col1: =MID(A1,FIND("أمانة",A1),FIND(CHAR(1),SUBSTITUTE(A1,"-",CHAR(1),LEN(A1) - LEN(SUBSTITUTE(A1,"-",""))))-FIND("أمانة",A1))
Col2: =MID(A1,FIND("بلدية",A1),FIND(CHAR(1),SUBSTITUTE(A1,"/",CHAR(1),LEN(A1)-LEN(SUBSTITUTE(A1,"/",""))))-FIND("بلدية",A1))
Col3: =TRIM(RIGHT(SUBSTITUTE(A1,".",REPT(" ",99)),99))
If the keywords are not found, the formula will return an Error. So you can just "wrap" the formula in IFERROR to have it return a blank if the key words are not present.
Edit:
The actual workbook does not have the same pattern as the sample you posted. In particular. Try this for column 2 data:
=MID(A2,FIND("بلدية",A2),99)
or with error suppression:
Col1: =IFERROR(MID(A2,FIND("أمانة",A2),FIND("-",A2,FIND("أمانة",A2))-FIND("أمانة",A2)),"")
Col2: =IFERROR(MID(A2,FIND("بلدية",A2),99),"")
And, the cells that are still returning the #VALUE! error do not have that keyword in the line.
For example:
A6: //olyservice/GIS-TANSIQ01/Storage/103-أمانة منطقة عسير -أحد رفيدة
does not contain بلدية
BTW, those formulas seem to both work on Sheets also.
Edit2:
Since you also posted an example in Sheets, if you can implement this in Sheets, you can use Regular Expressions to account for multiple terminations.
In that case, you would use:
=iferror(REGEXEXTRACT(A2,"(أمانة.*?)\s*(?:[-/\\.]|$)"),"")
or
iferror(REGEXEXTRACT(A2,"(بلدية.*?)\s*(?:[-/\\.\w]|$)"),"")
for the columns.
The regex extracts the pattern that begins with the key phrase, up to the terminator which can be any character in the set of -/\.A-Za-z0-9 or the end of the line. That seems to cover the examples in your sample worksheet, but if there are other terminators, you can add them to the sequence.
In Excel, this would require a VBA UDF to implement the Regex engine.

Resources