Formula to look for a word within a sentence - excel

Here is the Sample Google sheet file
https://docs.google.com/spreadsheets/d/1B0CQyFeqxg2wgYHJpFxLIzw_8Pv067p0cwacWk0Nc4o/edit?usp=sharing
I have an Excel Sheet where I need to find Arabic Words and separate them.
For example, I have data like this:
//olyservice/GIS-TANSIQ01/Storage/46-أمانة منطقة عسير -بلدية بللحمر/حدود القري المطلوب اعتمادهاالمعتمد مسمايتها بالوزارة.rar
I'm looking for:
1st column: أمانة منطقة عسير
2nd column: بلدية بللحمر
3rd column: RAR
If the words أمانة and بلدية are not present, the columns should be blank.
I tried these methods, without success:
=RIGHT(MID(A2,FIND("-",A2,20)+1,255),25)
and
=TRIM(MID(SUBSTITUTE(A2,"",REPT(" ",99)),MAX(1,FIND("-",SUBSTITUTE(A2,"",REPT(" ",99)))+21),99))

Since you specify certain key words to be found, we can look for those key words and then the relevant delimiter, based on your example.
In your example, أمانة is followed by a dash and بلدية by a slash ("followed by" is meant in terms of the right-to-left reading order of the Arabic words).
Try this:
Col1: =MID(A1,FIND("أمانة",A1),FIND(CHAR(1),SUBSTITUTE(A1,"-",CHAR(1),LEN(A1) - LEN(SUBSTITUTE(A1,"-",""))))-FIND("أمانة",A1))
Col2: =MID(A1,FIND("بلدية",A1),FIND(CHAR(1),SUBSTITUTE(A1,"/",CHAR(1),LEN(A1)-LEN(SUBSTITUTE(A1,"/",""))))-FIND("بلدية",A1))
Col3: =TRIM(RIGHT(SUBSTITUTE(A1,".",REPT(" ",99)),99))
If a keyword is not found, the formula returns an error, so you can wrap the formula in IFERROR to have it return a blank when the keyword is not present.
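For anyone who wants to check the logic outside the spreadsheet, here is a rough Python sketch of what the Col1 and Col2 formulas do: find the keyword, find the last occurrence of the delimiter, and take the text in between, with a blank when either step fails. The function name extract_between and the shortened file name example.rar are my own placeholders, not part of the original workbook.

```python
def extract_between(text: str, keyword: str, delimiter: str) -> str:
    """Text from the keyword up to (not including) the LAST delimiter."""
    start = text.find(keyword)    # like FIND(keyword, A1), but 0-based and -1 when absent
    end = text.rfind(delimiter)   # last delimiter, the job of the SUBSTITUTE/CHAR(1) trick
    if start == -1 or end < start:
        return ""                 # plays the role of the IFERROR wrapper
    return text[start:end]


AMANA_PART = "أمانة منطقة عسير"
BALADIYA_PART = "بلدية بللحمر"
# Same shape as the example row; "example.rar" is a shortened stand-in file name.
sample = f"//olyservice/GIS-TANSIQ01/Storage/46-{AMANA_PART} -{BALADIYA_PART}/example.rar"

col1 = extract_between(sample, "أمانة", "-")
col2 = extract_between(sample, "بلدية", "/")
print(repr(col1))
print(repr(col2))
```

Note that Col1 keeps the space that sits before the last dash, just as the formula does; wrap it in TRIM (or call .strip() here) if that matters.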
Edit:
The actual workbook does not follow the same pattern as the sample you posted. Try this for the column 2 data:
=MID(A2,FIND("بلدية",A2),99)
or with error suppression:
Col1: =IFERROR(MID(A2,FIND("أمانة",A2),FIND("-",A2,FIND("أمانة",A2))-FIND("أمانة",A2)),"")
Col2: =IFERROR(MID(A2,FIND("بلدية",A2),99),"")
The cells that still return the #VALUE! error do not contain that keyword.
For example:
A6: //olyservice/GIS-TANSIQ01/Storage/103-أمانة منطقة عسير -أحد رفيدة
does not contain بلدية
BTW, both of those formulas seem to work in Sheets as well.
Edit2:
Since you also posted an example in Sheets: if you can implement this in Sheets, you can use regular expressions to account for multiple terminators.
In that case, you would use:
=iferror(REGEXEXTRACT(A2,"(أمانة.*?)\s*(?:[-/\\.]|$)"),"")
or
=iferror(REGEXEXTRACT(A2,"(بلدية.*?)\s*(?:[-/\\.\w]|$)"),"")
for the columns.
The regex extracts the text that begins with the key phrase and runs up to the terminator, which can be any character in the set -/\. (plus, in the second formula, \w, that is A-Za-z0-9 and the underscore) or the end of the line. That seems to cover the examples in your sample worksheet, but if there are other terminators, you can add them to the set.
In Excel, this would require a VBA UDF to implement the Regex engine.
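If you want to try the two patterns outside Sheets first, here is a minimal Python sketch using the standard re module. One difference to be aware of: Python's \w is Unicode-aware by default and would treat Arabic letters as terminators, so I pass re.ASCII to keep \w limited to A-Za-z0-9 and the underscore, as described above. The helper name extract and the file name example.rar are placeholders of mine.

```python
import re

# re.ASCII keeps \w to A-Za-z0-9_ so Arabic letters do not end the match.
PAT_AMANA = re.compile(r"(أمانة.*?)\s*(?:[-/\\.]|$)", re.ASCII)
PAT_BALADIYA = re.compile(r"(بلدية.*?)\s*(?:[-/\\.\w]|$)", re.ASCII)


def extract(pattern: re.Pattern, text: str) -> str:
    """Return the first capture group, or a blank when there is no match."""
    match = pattern.search(text)
    return match.group(1) if match else ""


# a2 has the shape of the example row ("example.rar" is a stand-in file name);
# a6 is the row quoted above that has no بلدية keyword.
a2 = "//olyservice/GIS-TANSIQ01/Storage/46-أمانة منطقة عسير -بلدية بللحمر/example.rar"
a6 = "//olyservice/GIS-TANSIQ01/Storage/103-أمانة منطقة عسير -أحد رفيدة"

for row in (a2, a6):
    print(repr(extract(PAT_AMANA, row)), repr(extract(PAT_BALADIYA, row)))
```

Because the \s* sits outside the capture group, the trailing space before the dash is dropped without needing TRIM.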

Related

How to extract specific text from a sentence in Excel?

I have a database that exports data like this:
How can I get for instance, the Net Rentable Area with the values needed:
E.G.
Net Rentable Area
I tried the TextSplit function but I got a spill.
Please let me know what can be done, thanks!
Also it would be nice to see it working in something such as the Asking Rate, which has a different format.
In cell C2 you can put the following formula:
=1*TEXTSPLIT(TEXTAFTER(A2, B2&" ")," ")
Note: Multiplying by 1 ensures the result is a number instead of text.
and here is the output:
If all the tokens to find are words (that is, none of them is interpreted as a number), you can use the following, which does not require specifying the token to find:
=LET(split, 1*TEXTSPLIT(A2," "), FILTER(split, ISNUMBER(split)))
Under this assumption you can also use the corresponding array version, as follows:
=LET(rng, A2:A100, input, FILTER(rng, rng <>""), IFERROR(DROP(REDUCE(0, input,
LAMBDA(acc,text, LET(split, 1*TEXTSPLIT(text," "),
nums, FILTER(split, ISNUMBER(split),""), VSTACK(acc, nums)))),1),"")
)
Note: This uses the trick of creating multiple rows with VSTACK inside REDUCE, an idea suggested by @JvdV in this answer. It assumes A1 holds the column title; if not, you can use A:A instead.
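As a rough cross-check of the "split on spaces, keep what converts to a number" idea, here is a small Python sketch. The sample string is made up (the original export was only shown as an image), and Python's float() is not identical to Excel's coercion: for instance it rejects thousands separators such as 12,000, so treat this as an illustration only.

```python
def numeric_tokens(text: str) -> list[float]:
    """Split on single spaces and keep only the tokens that parse as numbers."""
    numbers = []
    for token in text.split(" "):
        try:
            numbers.append(float(token))   # stands in for 1*TEXTSPLIT(...)
        except ValueError:
            pass                           # stands in for FILTER(..., ISNUMBER(...))
    return numbers


# Hypothetical export line, not taken from the question.
row = "Net Rentable Area 1200 1500"
print(numeric_tokens(row))
```

Empty tokens produced by double spaces fail the conversion and are skipped, which matches how the formula drops non-numeric pieces.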

Excel - multiple value search across multiple columns or one column with multiple values

I have 7 criteria: TMO-1 through TMO-7.
I have two scenarios to search from.
Either I have a single Excel cell containing TMO-6, TMO-201, TMO-67, ... etc. (some have a lot of values),
or I have split the cell up so that the values are all in individual cells, such as [TMO-6][TMO-201][TMO-67] etc.
I have tried one formula for each. For the first one (the preferred solution) I tried:
=IF(IFERROR(SEARCH("TMO-1",AB8),0) > 0, "TMO-1",IF(IFERROR(SEARCH("TMO-2",AB8),0) > 0, "TMO-2", "false"))
The problem with that is it finds anything that starts with TMO-1, so it will show true if TMO-12 is in the cell.
For option 2 I tried:
=IF(AB9:AR9=TMO-1, TMO-1, IF(AB9:AR9=TMO-2, TMO-2, IF(AB9:AR9=TMO-3, TMO-3,IF(AB9:AR9=TMO-4, TMO-4, IF(AB9:AR9=TMO-5, TMO-5, IF(AB9:AR9=TMO-6, TMO-6, IF(AB9:AR9=TMO-7, TMO-7, "N/A")))))))
and I get the #SPILL! error.
Any ideas?
Assuming:
ms365 (hence the #SPILL error);
The option between concatenated or separated values (hence AB8 against AB9:AR9);
All numbers are prepended with TMO-;
You are looking for the 1st match in sequence (1-7);
If no match is found, you want to return "Not Found".
The first thing that came to mind is to just keep the comma-separated data in AB8 and use a simple trick to concatenate the delimiters with the sequence:
=ISNUMBER(FIND("-"&SEQUENCE(7)&",",A1&","))
To put that in practice, try:
Formula in B1:
=IFERROR(MATCH("X",IF(ISNUMBER(FIND("-"&SEQUENCE(7)&",",A1&",")),"X"),0),"Not Found")
Other options:
=@IFERROR(SORT(FILTERXML("<t><s>"&SUBSTITUTE(A1,", ","</s><s>")&"</s></t>","//s[substring(.,5)<8]")),"Not Found")
Or, using the insider BETA-functions:
=LET(X,MIN(--DROP(TEXTSPLIT(A1,"-",", "),,1)),IF(X<8,"TMO-"&X,"Not Found"))
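To see why the delimiter trick avoids the TMO-1 versus TMO-12 problem, here is a minimal Python sketch of the same idea: append a comma to the cell text and look for "-n," for n = 1 to 7, so each number has to be followed directly by a delimiter. The function name first_match and the None return value are my own choices; the formula returns "Not Found" instead.

```python
from typing import Optional


def first_match(cell: str, max_n: int = 7) -> Optional[int]:
    """Lowest n in 1..max_n for which 'TMO-n' appears as a complete entry."""
    padded = cell + ","                 # same as A1 & "," so the last entry also ends in a comma
    for n in range(1, max_n + 1):
        if f"-{n}," in padded:          # same test as FIND("-" & n & ",", A1 & ",")
            return n
    return None                         # the formula shows "Not Found" here


for cell in ("TMO-6, TMO-201, TMO-67", "TMO-12, TMO-201", "TMO-12, TMO-3"):
    print(cell, "->", first_match(cell))
```

Because the search text always starts with the dash and ends with the comma, "-1," cannot match inside "-12," or "-201,".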

Excel COUNTIF match variations of target: LET solution?

This is a slightly more complicated issue than a simple =COUNTIF(rng,"*"&value&"*"), as found here.
I have a 2D array with cells containing data such as:
abc
def
abc def
ghi
abc,def,ghi
abcdef
ghi; def
..... and several other variations of this. I am trying to count exact matches of "abc", but I want the count to be inclusive of cells containing "abc def" and other like variations, however I can't just use the above simple COUNTIF formula since "abcdef" is not an acceptable match. The target string must stand alone or be separated from other text by an acceptable character in chars.
I think I've got this one 90% done, but the bit I need help with is combining all the possible acceptable variations of a target "name" into a flat range that I can then check my data source against for the COUNTIF. I've tried INDEX(r_1:r_8,idxRow,idxCol) and other familiar solutions that work on the sheet when referencing other ranges, but I'm new to using the =LET function. All of this works well when broken out into separate components on my spreadsheet, but I'm looking for a cleaner solution with =LET. See below for current formula:
=LET(rg, DataTable[[Q14_1]:[Q14_9]],
name, AU38,
chars, {" ",",",";"},
r, 8,
r_1, CONCATENATE(name,chars),
r_2, CONCATENATE(chars,name),
r_3, CONCATENATE(chars,name,chars),
r_4, CONCATENATE(name,chars,"*"),
r_5, CONCATENATE("*",chars,name),
r_6, CONCATENATE(chars,name,chars,"*"),
r_7, CONCATENATE("*",chars,name,chars),
r_8, CONCATENATE("*",chars,name,chars,"*"),
c, COUNTA(chars),
mSeq, SEQUENCE(r*c),
idxRow, 1+MOD(mSeq,r),
idxCol, INT((SEQUENCE(r*c)-1)/r)+1,
X, INDEX(**NeedHelpHere**,idxRow,idxCol),
SUM(COUNTIF(rg,name),COUNTIF(rg,X))
)
Give the formula below a try. If you have more delimiters (space, comma and others), you need to nest more SUBSTITUTE() functions.
=LET(x,FILTERXML("<t><s>"&SUBSTITUTE(SUBSTITUTE(A1:A7," ","</s><s>"),",","</s><s>")&"</s></t>","//s"),y,FILTER(x,x="abc"),SUM(--(y<>"")))
To learn about FILTERXML() please read this article from JvdV.
I've thought about this again and am posting a solution that fits my needs.
I don't need to index a single column of potential matches to then COUNTIF, I can just COUNTIF multiple times. Additionally, I was not taking into account different combinations of chars, I was only searching for the same chars on either side of the target (e.g. ",abc," when I should have also been looking for ",abc;"). Transposing the chars array on one side is a simple way of fixing this. It also turns out that "*"&target&"*" searches for "*target*" AND "target" (duh!), so I simplified further, removing duplicative possibilities.
My final formula is below, which counts the number of times target (by itself or surrounded by any acceptable chars) is present in a given rng:
=LET(rng, DataTable[[Q14_1]:[Q14_9]],
name, $A6,
chars, {" " , "," , ";"},
r_1, CONCATENATE(name,chars,"*"),
r_2, CONCATENATE("*",chars,name),
r_3, CONCATENATE("*",chars,name,TRANSPOSE(chars),"*"),
SUM(COUNTIF(rng,name),COUNTIF(rng,r_1),COUNTIF(rng,r_2),COUNTIF(rng,r_3))
)
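For comparison, here is a rough Python sketch of the rule the formula is aiming at: a cell counts when the target stands alone or is separated from other text by one of the acceptable chars. It is not a translation of COUNTIF: it counts each cell at most once and is case-sensitive, whereas COUNTIF ignores case. The function name count_standalone is hypothetical.

```python
import re


def count_standalone(cells, target, chars=(" ", ",", ";")):
    """Number of cells in which target appears as a whole token."""
    splitter = re.compile("[" + re.escape("".join(chars)) + "]")
    # Split every cell on the acceptable separators and look for an exact token.
    return sum(target in splitter.split(cell) for cell in cells)


# The sample values listed in the question.
data = ["abc", "def", "abc def", "ghi", "abc,def,ghi", "abcdef", "ghi; def"]
print(count_standalone(data, "abc"))
```

With the sample values, "abcdef" is not counted for "abc", while "abc def" and "abc,def,ghi" are.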

Remove the substring after a character in the last position in Excel

I have Excel sheet which contains data similar to
Addresses
xyz,abc,olk
opn,opk,prt
we-ylj,tyf,uyfas
oiui,ytfy,tydry - We also work in bla,bla,bla
ytfyt,tyfyt,ghfyt
i-hgsd,gsdf-hgd,sdgh,- We also work in xxx,yy,zzz
ytsfgh,gfasdg,tydsfyt
I want to remove the substring that follows the character "-", but only when it is in the last position.
Result should be like
xyz,abc,olk
opn,opk,prt
we-ylj,tyf,uyfas
oiui,ytfy,tydry
ytfyt,tyfyt,ghfyt
i-hgsd,gsdf-hgd,sdgh
ytsfgh,gfasdg,tydsfyt
I tried the SUBSTITUTE function, but I am unable to replace the data because the last substring after the "-" differs from row to row.
Going by your specifications, I would use two columns just so it's not a very long formula:
In B1:
=IFERROR(FIND(CHAR(1),SUBSTITUTE(A1,"-",CHAR(1),LEN(A1)-LEN(SUBSTITUTE(A1,"-",""))))-1,LEN(A1))
This gets the position of the last - or the full text length.
Then in C1:
=LEFT(A1,IF(FIND(",",A1)<B1,B1,LEN(A1)))
This checks if there's a , before the last -. If there is no ,, then the full text is taken.
EDIT: I only now noticed your edited comment. If it's just everything after - We, then I would use this:
=TRIM(LEFT(A1,IFERROR(FIND("- We",A1)-2,LEN(A1))))
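A small Python sketch of what that last formula does, in case it helps to test it on more rows: find "- We", keep everything up to two characters before its 1-based position (which also drops the character just in front of the dash, a space or a comma in the samples), then trim. The function name strip_trailer is a placeholder of mine, and strip() only trims the ends, whereas TRIM also collapses repeated inner spaces.

```python
def strip_trailer(address: str, marker: str = "- We") -> str:
    """Drop the marker, whatever follows it, and the one character before it."""
    pos = address.find(marker)           # 0-based; FIND in Excel is 1-based
    if pos == -1:
        return address.strip()           # IFERROR branch: keep the full text
    # LEFT(A1, FIND(...) - 2) keeps pos - 1 characters in 0-based terms.
    return address[:max(pos - 1, 0)].strip()


rows = [
    "we-ylj,tyf,uyfas",
    "oiui,ytfy,tydry - We also work in bla,bla,bla",
    "i-hgsd,gsdf-hgd,sdgh,- We also work in xxx,yy,zzz",
]
for row in rows:
    print(strip_trailer(row))
```

The max(..., 0) guard is an addition for the case where the marker sits at the very start of the text, which the samples do not contain.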

Remove all text and characters except some

I have here some text strings
"16cg-301 -request","16cg-3368 - for review","16cg-3684 - for process"
What I would like to do is remove all the text and characters except the numbers, the letters "cg" and the - that is part of the reference code.
If the string you want to extract is always before the first space in the full string then you can use SEARCH and LEFT to extract your reference code:
=LEFT(A1,SEARCH(" ",A1)-1)
This formula would take 16cg-3368 from 16cg-3368 - for review.
I suggest using something like what is suggested here:
How to use Regular Expressions (Regex) in Microsoft Excel both in-cell and loops
With a replace regex similar to this
[^\dcg-]*
or a match regex like this
^([0-9cg -]+).*
Otherwise you could also work with a strange formula similar to this:
=CONCATENATE(IF(NOT(ISERROR(SEARCH(MID(A2;1;1);"01234567890cg-")>0));MID(A2;1;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;2;1);"01234567890cg-")>0));MID(A2;2;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;3;1);"01234567890cg-")>0));MID(A2;3;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;4;1);"01234567890cg-")>0));MID(A2;4;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;5;1);"01234567890cg-")>0));MID(A2;5;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;6;1);"01234567890cg-")>0));MID(A2;6;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;7;1);"01234567890cg-")>0));MID(A2;7;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;8;1);"01234567890cg-")>0));MID(A2;8;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;9;1);"01234567890cg-")>0));MID(A2;9;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;10;1);"01234567890cg-")>0));MID(A2;10;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;11;1);"01234567890cg-")>0));MID(A2;11;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;12;1);"01234567890cg-")>0));MID(A2;12;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;13;1);"01234567890cg-")>0));MID(A2;13;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;14;1);"01234567890cg-")>0));MID(A2;14;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;15;1);"01234567890cg-")>0));MID(A2;15;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;16;1);"01234567890cg-")>0));MID(A2;16;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;17;1);"01234567890cg-")>0));MID(A2;17;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;18;1);"01234567890cg-")>0));MID(A2;18;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;19;1);"01234567890cg-")>0));MID(A2;19;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;20;1);"01234567890cg-")>0));MID(A2;20;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;21;1);"01234567890cg-")>0));MID(A2;21;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;22;1);"01234567890cg-")>0));MID(A2;22;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;23;1);"01234567890cg-")>0));MID(A2;23;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;24;1);"01234567890cg-")>0));MID(A2;24;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;25;1);"01234567890cg-")>0));MID(A2;25;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;26;1);"01234567890cg-")>0));MID(A2;26;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;27;1);"01234567890cg-")>0));MID(A2;27;1);"");IF(NOT(
ISERROR(SEARCH(MID(A2;28;1);"01234567890cg-")>0));MID(A2;28;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;29;1);"01234567890cg-")>0));MID(A2;29;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;30;1);"01234567890cg-")>0));MID(A2;30;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;31;1);"01234567890cg-")>0));MID(A2;31;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;32;1);"01234567890cg-")>0));MID(A2;32;1);""))
This currently only works for strings of fewer than 33 characters.
The problem here is that you will get unexpected behavior like this:
123cg-123 - Process => 123cg-123-c
After rereading, I think you should try a different approach from the one described in the question ;-)
If you want to return everything up to and including the last digit, then try:
=LEFT(A1,LOOKUP(2,1/ISNUMBER(-MID(A1,seq,1)),seq))
seq is a named formula: Formulas ► Define Name
Name: seq
Refers to: =ROW(INDEX($1:$65535,1,1):INDEX($1:$65535,255,1))
seq returns an array of sequential numbers from 1 to 255.
MID(A1,seq,1)
returns an array consisting of the individual characters of the string in A1. The leading minus sign converts the digit characters from strings to numbers.
The LOOKUP function then returns the position of the last digit.
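The same "everything up to and including the last digit" rule can be sketched in Python like this, for checking against more sample strings. The function name up_to_last_digit is hypothetical, and returning a blank when there is no digit is my assumption; the formula as written would return an error in that case.

```python
def up_to_last_digit(text: str) -> str:
    """Text from the start through the last 0-9 character."""
    # Walk backwards, like LOOKUP(2, 1/ISNUMBER(...), seq) picking the last hit.
    for index in range(len(text) - 1, -1, -1):
        if text[index] in "0123456789":
            return text[: index + 1]
    return ""   # assumption: blank when the text has no digit at all


for code in ("16cg-301 -request", "16cg-3368 - for review", "16cg-3684 - for process"):
    print(up_to_last_digit(code))
```

Unlike the named formula seq, which covers positions 1 to 255, this has no length limit.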
