How to do a proper keyword search in Excel

If I search for the term 'tfo' in the cell value 'TFO_xyz' then the result should be TRUE.
If I search for the term 'tfo' in the cell value 'TFO systems' then the result should be TRUE.
If I search for the term 'tfo' in the cell value 'spring TFO' then the result should be TRUE.
BUT if I search for the term 'tfo' in the cell value 'Platform', then I want the result to be FALSE.
I have used the formula =IF(COUNTIF(A2,"*tfo*"),"TRUE","FALSE"), but it doesn't return FALSE when I check for 'tfo' in the word 'Platform'.
NOTE:
Platform should be FALSE because tfo occurs in the middle of a word. I'm looking for a TRUE result for cell values that contain tfo as a word of its own, like tfo<space>America, TFO_America or <space>TFO systems. But I want a FALSE result for the words Platform and portfolio, because in these two words the term tfo is surrounded by other letters.

Try this:
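Sub FlagTFO()
    ' Writes TRUE in column B when the value in column A starts or ends with "tfo" (any case), else FALSE
    Dim x As Long: x = 1
    Dim s As String
    With Sheet1
        Do While x <= .Cells(.Rows.Count, 1).End(xlUp).Row
            ' LCase$ so that "TFO_xyz" and "spring TFO" also match
            s = LCase$(CStr(.Cells(x, 1).Value))
            .Cells(x, 2).Value = (Left$(s, 3) = "tfo" Or Right$(s, 3) = "tfo")
            x = x + 1
        Loop
    End With
End Sub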

Try this formula. It assumes that the word tfo will be at the beginning or end of the cell value.
Just make sure to replace 'A2' in the formula with the appropriate cell reference.
=OR(LEFT(A2,3)="tfo",RIGHT(A2,3)="tfo")
Excel's = comparison is not case-sensitive, so this also matches "TFO" at either end.

My suggestion is to spend some time getting to know your data and to create a white-list, since there is no easy way to do a proper fuzzy search in strings.
Function TFO_Search(strText As String) As Boolean
    Dim ArryString As Variant
    Dim ArryWhitelist As Variant
    ' Create a white-list array of accepted spellings
    ArryWhitelist = Array("TFO_", "TFO ", "_TFO", " TFO", "tfoAmerica")
    ' TFO_Search is False by default; it becomes True on the first white-list hit
    For Each ArryString In ArryWhitelist
        ' Compare in UPPER CASE so the match is case-insensitive
        If InStr(UCase(strText), UCase(ArryString)) > 0 Then
            TFO_Search = True
            Exit Function
        End If
    Next
End Function

I see two dimensions of complexity in your question:
Where the keyword occurs in the text (beginning, middle, end).
Which characters separate words.
The first is fixed in size: there are three cases to handle. The second depends on how many characters you want to accept as delimiters. Below I assume that you accept space and underscore; you can expand this set by inserting more SUBSTITUTE function calls.
In my table, $A2 is the cell in which you search for the keyword, while B$1 contains the keyword.
To standardize the separator character, you need the formula:
B2=SUBSTITUTE($A2,"_"," ")
To check if the string starts with the keyword:
C2=--(LEFT($B2,LEN(B$1)+1)=B$1&" ")
To check if the string ends with the keyword:
D2=--(RIGHT($B2,LEN(B$1)+1)=" "&B$1)
To check if the keyword is in the middle of the string:
E2=--(LEN(SUBSTITUTE(UPPER($B2)," "&UPPER(B$1)&" ",""))<LEN($B2))
To evaluate the above three cases:
F2=--(0<$C2+$D2+$E2)
If you want to use a single cell, combine the formulas into:
G2=--(0<--(LEFT(SUBSTITUTE($A2,"_"," "),LEN(B$1)+1)=B$1&" ")+--(RIGHT(SUBSTITUTE($A2,"_"," "),LEN(B$1)+1)=" "&B$1)+--(LEN(SUBSTITUTE(UPPER(SUBSTITUTE($A2,"_"," "))," "&UPPER(B$1)&" ",""))<LEN(SUBSTITUTE($A2,"_"," "))))
It is not very readable in the end, but I don't think there is an easier solution using formulas only.
Note: If you want to modify the set of characters accepted as delimiters, add more SUBSTITUTE function calls to B2, then copy the Formula of F2 into notepad and replace $C2 with the formula of C2, etc., then replace $B2 with the updated Formula of B2.
Update
Building on the idea in Ron Rosenfeld's comment on tigeravatar's answer, the formula can be simplified (the beginning, middle and end cases can be combined):
=--(LEN(SUBSTITUTE(" "&UPPER($B2)&" "," "&UPPER(B$1)&" ",""))<LEN($B2))
After substituting $B2 with its formula:
=--(LEN(SUBSTITUTE(" "&UPPER(SUBSTITUTE($A2,"_"," "))&" "," "&UPPER(B$1)&" ",""))<LEN(SUBSTITUTE($A2,"_"," ")))
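The simplified formula can also be sketched as a VBA UDF, which may be easier to extend with more delimiters than nested SUBSTITUTE calls. This is a minimal sketch, not part of the answer above: `KeywordAsWord` and its optional `Delimiters` argument are hypothetical names, and every character in `Delimiters` is assumed to separate words in the same way that "_" does in the formula.

```vba
' Hypothetical UDF mirroring the simplified worksheet formula:
' normalize delimiters to spaces, pad both sides with a space,
' then look for " keyword " case-insensitively.
Public Function KeywordAsWord(ByVal cellText As String, ByVal Keyword As String, _
                              Optional ByVal Delimiters As String = "_") As Boolean
    Dim i As Long
    ' Assumption: each character in Delimiters separates words, like "_" in the formula
    For i = 1 To Len(Delimiters)
        cellText = Replace(cellText, Mid$(Delimiters, i, 1), " ")
    Next i
    ' vbTextCompare makes InStr case-insensitive, like applying UPPER() to both sides
    KeywordAsWord = InStr(1, " " & cellText & " ", " " & Keyword & " ", vbTextCompare) > 0
End Function
```

=KeywordAsWord($A2,B$1) should give the same result as the formula for the space/underscore case, while a call such as =KeywordAsWord($A2,"tfo","_-/") would additionally treat hyphens and slashes as separators.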

This formula returns TRUE if TFO is at the beginning or end of any word in the text string, or stands by itself. It checks every word, so that word can be at the beginning, middle or end of the string. The formula assumes that a word starting or ending with TFO should give TRUE (as for tfoAmerica, so the same rule applies to tform); otherwise the result is FALSE.
=OR(ISNUMBER(SEARCH({" tfo","tfo "}," "&SUBSTITUTE(A2,"_"," ")&" ")))
EDIT:
In the event that the result should only be TRUE if TFO is found by itself, then this version of the formula will suffice:
=ISNUMBER(SEARCH(" tfo "," "&SUBSTITUTE(A2,"_"," ")&" "))
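If every character that is not a letter should count as a separator (digits, hyphens, punctuation and so on), VBA's `Like` operator can state the rule from the question directly: the keyword must not have a letter immediately before or after it. This is a minimal sketch, assuming that only the English letters a-z count as letters and that the keyword contains no `Like` wildcard characters (`?`, `*`, `#`, `[`); `IsStandaloneKeyword` is a hypothetical name. Unlike the first formula in this answer, it returns FALSE for "tfoAmerica".

```vba
' Hypothetical UDF: TRUE when Keyword appears with no letter directly before or after it.
' Assumes Keyword contains no Like wildcards (?, *, #, [) and that only a-z count as letters.
Public Function IsStandaloneKeyword(ByVal cellText As String, ByVal Keyword As String) As Boolean
    Dim padded As String
    ' Lower-case both sides, and pad with spaces so a keyword at the very
    ' start or end of the text still has a neighbouring character to test
    padded = " " & LCase$(cellText) & " "
    ' [!a-z] matches exactly one character that is not a lower-case letter
    IsStandaloneKeyword = padded Like ("*[!a-z]" & LCase$(Keyword) & "[!a-z]*")
End Function
```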

If you can rely on VBA, then regex is a more flexible solution.
There is a good summary of how to use them in VBA: How to use Regular Expressions (Regex) in Microsoft Excel both in-cell and loops
For your keyword search problem I wrote the following:
Option Explicit
' Include: Tools > References > Microsoft VBScript Regular Expressions 5.5 (C:\Windows\SysWOW64\vbscript.dll\3)
Public Function SearchKeyWord(strHay As String, strNail As String, Optional strDelimiters As String = " _,.;/", Optional lngNthOccurrence As Long = 1) As Long ' Returns 1-based index of nth occurrence or 0 if not found
Dim strPattern As String: strPattern = CreatePattern(strNail, strDelimiters)
Dim rgxKeyWord As RegExp: Set rgxKeyWord = CreateRegex(strPattern, True)
Dim mtcResult As MatchCollection: Set mtcResult = rgxKeyWord.Execute(strHay)
If (0 <= lngNthOccurrence - 1) And (lngNthOccurrence - 1 < mtcResult.Count) Then
Dim mthResult As Match: Set mthResult = mtcResult(lngNthOccurrence - 1)
SearchKeyWord = mthResult.FirstIndex + Len(mthResult.SubMatches(0)) + 1
Else
SearchKeyWord = 0
End If
End Function
Private Function CreateRegex(strPattern As String, Optional blnIgnoreCase As Boolean = False, Optional blnMultiLine As Boolean = True, Optional blnGlobal As Boolean = True) As RegExp
Dim rgxResult As RegExp: Set rgxResult = New RegExp
With rgxResult
.Pattern = strPattern
.IgnoreCase = blnIgnoreCase
.MultiLine = blnMultiLine
.Global = blnGlobal
End With
Set CreateRegex = rgxResult
End Function
Private Function CreatePattern(strNail As String, strDelimiters As String) As String
Dim strDelimitersEscaped As String: strDelimitersEscaped = RegexEscape(strDelimiters)
Dim strPattern As String: strPattern = "(^|[" & strDelimitersEscaped & "]+)(" & RegexEscape(strNail) & ")($|[" & strDelimitersEscaped & "]+)"
CreatePattern = strPattern
End Function
Private Function RegexEscape(strOriginal As String) As String
Dim strEscaped As String: strEscaped = vbNullString
Dim i As Long: For i = 1 To Len(strOriginal)
Dim strChar As String: strChar = Mid(strOriginal, i, 1)
Select Case strChar
Case ".", "$", "^", "{", "}", "[", "]", "(", "|", ")", "*", "+", "?", "\", "-" ' "]" and "-" also matter inside the [...] delimiter class
strEscaped = strEscaped & "\" & strChar
Case Else
strEscaped = strEscaped & strChar
End Select
Next i
RegexEscape = strEscaped
End Function
Once you have the above in a Module, you can insert formulas like the following:
=SearchKeyWord($A1,"tfo")
where A1 contains e.g. "tfo America".
As a third parameter, you may specify which characters you want to treat as delimiters; by default they are space, underscore, comma, dot, semicolon and slash.
The return value is the position of the nth occurrence of the keyword, where n is the value of the fourth parameter (default: 1), or 0 if not found.
To check if the keyword is present in A1, compare the result to 0, which means not found:
=--(SearchKeyWord($A1,"tfo")<>0)

Related

How to convert multiple values in Excel cell

I'm looking for a formula that re-arranges values in excel cells.
The cells contain full names (at least one, up to 20) in the format "last name + name(s)", and these must be converted into the following format:
1.- First letter of the first name, followed by a blank space.
2.- Last name
An example can be found below.
I know I could simply use the Replace function, but it would be great if this could be achieved with Excel formulas.
Thanks in advance.
Since the strings can be so long, I would use FILTERXML and LET if you have the newest version of Excel, rather than repeating things like LEFT, LEN or FIND over and over.
For example, if the data is always separated by a "|" and only comes in the form "Last_Name First_Name (possible Mid_Initial)|", then you can use something like:
=LET(x, FILTERXML("<t><s>"&SUBSTITUTE(I1, "|", "</s><s>")&"</s></t>", "//s"),
y, TRIM(LEFT(RIGHT(x, LEN(x)-SEARCH(" ",x)),1)),
z, TRIM(LEFT(x, SEARCH(" ",x))),
LEFT(CONCAT(y&" "&z&", "), LEN(CONCAT(y&" "&z&", "))-2))
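For comparison, the same steps as the LET formula (split on "|", take the first letter of the first name, then the last name) can be sketched in VBA without regular expressions. This is a minimal sketch assuming the input looks like "Doe John|Smith Anna Marie" and that the output should be joined with ", " as in the formula; `FormatNames` is a hypothetical name.

```vba
' Hypothetical VBA counterpart of the LET formula: split on "|", then turn each
' "LastName FirstName [Middle]" into "F LastName" and join the results with ", ".
' Assumes "|" separates names and a single space separates last and first name.
Public Function FormatNames(ByVal cellText As String) As String
    Dim entries As Variant, parts As Variant
    Dim result As String
    Dim i As Long
    entries = Split(cellText, "|")
    For i = LBound(entries) To UBound(entries)
        parts = Split(Trim$(entries(i)), " ")
        ' Skip entries that do not contain at least a last name and a first name
        If UBound(parts) >= 1 Then
            result = result & ", " & Left$(parts(1), 1) & " " & parts(0)
        End If
    Next i
    ' Drop the leading ", " added before the first name
    If Len(result) > 0 Then result = Mid$(result, 3)
    FormatNames = result
End Function
```

With "Doe John|Smith Anna Marie" in I1, =FormatNames(I1) would return "J Doe, A Smith".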
Try this UDF.
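Option Explicit
Function ExtractName(cellRng As Range) As String
    Dim regex As Object, mc As Object, i As Long, str As String, arr
    Set regex = CreateObject("VBScript.RegExp")
    regex.IgnoreCase = False
    regex.Global = True
    ' Last name (word characters or "-"), one space, then the first character of the first name
    regex.Pattern = "^[\w-]+\s\b."
    arr = Split(cellRng.Value, "|")
    str = ""
    For i = LBound(arr) To UBound(arr)
        ' Trim so that a space after "|" does not defeat the ^ anchor
        Set mc = regex.Execute(Trim$(arr(i)))
        If mc.Count > 0 Then
            ' "Doe J" -> "J Doe"
            str = str & Split(mc(0), " ")(1) & " " & Split(mc(0), " ")(0) & "|"
        End If
    Next i
    ' Remove the trailing "|"
    If Len(str) > 0 Then ExtractName = Left(str, Len(str) - 1)
End Function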

How to remove all numeric characters separated by white space from an Excel cell?

I need to remove the numeric characters that are separated by white space ONLY in a text string in an Excel cell. For example I have:
johndoe99#mail.com 1 concentr8 on work VARIABLE1 99
I need to get:
johndoe99#mail.com concentr8 on work VARIABLE1
Either formula or VBA script solution is good. Thank you.
I think nomad is right that regex is probably a simpler option. However, I also think that by using the Split() and isNumeric() functions I've come up with a good solution here.
Sub test()
Dim cell As Range
For Each cell In Range("A1:A10") 'adjust as necessary
cell.Value2 = RemoveNumbers(cell.Value2)
Next cell
End Sub
Function RemoveNumbers(ByVal inputString As String) As String
Dim tempSplit As Variant
tempSplit = Split(inputString, " ")
Dim result As String
Dim i As Long
For i = LBound(tempSplit) To UBound(tempSplit)
If Not IsNumeric(tempSplit(i)) Then result = result & " " & tempSplit(i)
Next i
RemoveNumbers = Trim$(result)
End Function
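UDF
Function RemNum(cell) As String
    With CreateObject("VBScript.RegExp")
        .Global = True
        ' Whitespace followed by digits that end at a space or at the end of the text,
        ' so a token such as "8ball" is left alone (a number at the very start is not removed)
        .Pattern = "\s+\d+(?=\s|$)"
        RemNum = .Replace(cell, vbNullString)
    End With
End Function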
Note that in addition to testing for spaces before and after, this also tests for the beginning or end of the string as a delimiter.
You did not say what should happen when the number is the only content of the string. This routine removes it; if you want something else, please specify.
Try this:
Function remSepNums(S As String) As String
With CreateObject("vbscript.regexp")
.Global = True
.Pattern = "(?:\s+|^)(?:\d+)(?=\s+|$)"
.MultiLine = True
remSepNums = .Replace(S, "")
End With
End Function
Just for fun, if you have a recent version of Excel (Office 365/2016) you can use the following array formula:
=TEXTJOIN(" ",TRUE,IF(NOT(ISNUMBER(FILTERXML("<t><s>"&SUBSTITUTE(TRIM(A1)," ","</s><s>")&"</s></t>","//s"))),FILTERXML("<t><s>"&SUBSTITUTE(TRIM(A1)," ","</s><s>")&"</s></t>","//s"),""))
FILTERXML splits the string into an array of words, using the spaces as separators.
If a word is not a number, IF returns that word; otherwise it returns an empty string.
TEXTJOIN then joins the remaining words with single spaces (its TRUE argument skips the empty strings).

Remove letters from a cell leaving numbers only

I am trying to remove all letters from a cell and leave the numbers remaining.
I have found bits of code and other questions on here but none are making much sense to me.
I have in cell E23 "as12df34" and want the value of Cell E23 to read "12 34"
Can anyone help with this query please?
You could use a regular expression:
Sub UsageExample()
    Dim cl As Range
    ' iterate each cell
    For Each cl In Range("Sheet1!A1:A100")
        ' replace each non-digit sequence by a space, then trim the ends ("as12df34" -> "12 34")
        cl.Value = Trim(ReplaceRe(cl.Value, "\D+", " "))
    Next
End Sub
Public Function ReplaceRe(text As String, pattern As String, replacement) As String
Static re As Object
If re Is Nothing Then
Set re = CreateObject("VBScript.RegExp")
re.Global = True
End If
re.pattern = pattern
ReplaceRe = re.Replace(text, replacement)
End Function
Here's a UDF if you want to do something like that. Setting "Spaces" to True leaves a single space where the non-numeric characters used to be; setting it to False removes them entirely.
Sub Test()
    Debug.Print Nums("as12df34", True)   ' prints 12 34
End Sub
Function Nums(What As String, Spaces As Boolean) As String
    Dim i As Long
    ' Keep digits; replace every other character with a space
    For i = 1 To Len(What)
        If IsNumeric(Mid(What, i, 1)) Then
            Nums = Nums & Mid(What, i, 1)
        Else
            Nums = Nums & " "
        End If
    Next i
    Nums = Trim(Nums)
    If Spaces Then
        ' Collapse runs of spaces (two or more) into a single space
        Do Until InStr(Nums, "  ") = 0
            Nums = Replace(Nums, "  ", " ")
        Loop
    Else
        ' Remove the spaces entirely
        Nums = Replace(Nums, " ", "")
    End If
End Function
I know this may have been answered already, but I wanted to show another possibility to others who come across this question. I came up with an obvious solution: replace every letter with nothing, so that only the numbers are left in the cell. You can replace "" with " " to keep a space where each letter used to be.
It's a huge clutter, but I use it and it works as intended: just drag the formula to the next cell, no typing required. In my situation, I had a value like "platinum ingot, 3"; the formula removes all the letters, the comma and the spaces and leaves 3, which can then be used in calculations. I use this to hold 2 values in 1 cell when one of the values is never going to contain numbers.
=SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE( SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE( SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE( SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE( SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE( SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE( F5,"A",""),"B",""),"C",""),"D",""),"E",""),"F",""),"G",""),"H",""),"I",""),"J",""),"K",""),"L",""),"M",""),"N",""),"O",""),"P",""),"Q",""),"R",""),"S",""),"T",""),"U",""),"V",""),"W",""),"X",""),"Y",""),"Z",""),"a",""),"b",""),"c",""),"d",""),"e",""),"f",""),"g",""),"h",""),"i",""),"j",""),"k",""),"l",""),"m",""),"n",""),"o",""),"p",""),"q",""),"r",""),"s",""),"t","") ,"u",""),"v","") ,"w",""),"x",""),"y",""),"z",""),",","")," ","")
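SUBSTITUTE is case-sensitive, which is why the formula lists every upper-case and lower-case letter, plus the comma and the space. A VBA sketch of the same idea can instead keep only the digits. It assumes that everything other than 0-9 should be dropped, so unlike the formula it also removes other punctuation such as "." or "-"; `DigitsOnly` is a hypothetical name.

```vba
' Hypothetical UDF: keep only the characters 0-9 and drop everything else.
Public Function DigitsOnly(ByVal cellText As String) As String
    Dim i As Long
    Dim ch As String
    For i = 1 To Len(cellText)
        ch = Mid$(cellText, i, 1)
        ' Like "#" matches exactly one digit character
        If ch Like "#" Then DigitsOnly = DigitsOnly & ch
    Next i
End Function
```

=DigitsOnly(F5) would return "3" for "platinum ingot, 3". The result is text, so wrap it in VALUE() if a number is needed.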

Excel Exact Word Matching

Let's say I have "Vegas is great" in cell A1. I want to write a formula that looks for the exact word, "gas" in cells. Vegas ≠ gas, but the only search formula I'm finding:
=ISNUMBER(SEARCH("gas",LOWER(A1)))
returns TRUE. Is there any way to do exact matching? Ideally I'd like it to be case-insensitive, which I believe is satisfied by wrapping A1 in LOWER().
I believe that to cover all cases correctly you have to pad both the search term "gas" and the searched text with spaces. This ensures that gas is found at the beginning or end of a cell, and also prevents it from being found in the middle of a word. Your post does not say whether the text can contain punctuation. If it can, padding with spaces alone will not work; you would have to include cases such as " gas. " or " gas! " to allow for each punctuation mark specifically. If you are worried about catching values like "gas.cost", the same padding around the punctuation search takes care of that.
=OR(ISNUMBER(SEARCH(" gas ", " "&A1&" ")),ISNUMBER(SEARCH(" gas. ", " "&A1&" ")))
This is a basic search that should find the word gas by itself, or "gas.". Padding a space after "gas." in the search term lets it match the final word of a sentence, or the end of a cell.
Edit: Fixed a dropped parenthesis.
The FIND function is case-sensitive; the SEARCH function is not. There is no need for the LOWER function if you are using SEARCH.
SEARCH(<find_text>, <within_text>, [optional]<start_num>)
Wrap both the find_text and within_text in spaces and perform your SEARCH.
The formula in B1 is,
=ISNUMBER(SEARCH(" gas ", " "&A1&" "))
Fill down as necessary.
One can also use regular expressions in VBA to accomplish this. In regular expressions, "\b" represents a word boundary: the position between a word character and a non-word character, or at the beginning or end of the string. Word characters are [A-Za-z0-9_] (letters, digits and the underscore). Hence, one can use this UDF. Be aware that words which include non-word characters (e.g. a hyphen) may be treated differently than you expect, and that FindWord is inserted into the pattern as-is, so characters with a special meaning in regular expressions (such as "." or "*") would need to be escaped. If you are dealing with non-English letters, the Pattern would also need to be modified.
But the code is fairly compact.
Option Explicit
Function reFindWord(FindWord As String, SearchText As String, Optional MatchCase As Boolean = False) As Boolean
Dim RE As Object
Dim sPattern As String
Set RE = CreateObject("vbscript.regexp")
sPattern = "\b" & FindWord & "\b"
With RE
.Pattern = sPattern
.ignorecase = Not MatchCase
reFindWord = .test(SearchText)
End With
End Function
I think the only way to cover all possible punctuation surrounding the search word is to create a custom macro function. Use the enhanced split function to tokenize the sentence into an array of words then search the array for a match.
Enhanced split function
https://msdn.microsoft.com/en-us/library/aa155763
How to create custom macro
http://www.wikihow.com/Create-a-User-Defined-Function-in-Microsoft-Excel
Code to create FindEngWord function
Public Function FindEngWord(ByVal TextToSearch As String, ByVal WordToFind As String) As Boolean
    Dim WrdArray() As String
    Dim i As Long
    ' Tokenize with the enhanced Split below (punctuation is turned into spaces first)
    WrdArray() = Split(TextToSearch)
    For i = 0 To UBound(WrdArray)
        ' Case-insensitive whole-word comparison; stop at the first match
        If LCase(WrdArray(i)) = LCase(WordToFind) Then
            FindEngWord = True
            Exit Function
        End If
    Next i
End Function
Public Function Split(ByVal InputText As String, _
Optional ByVal Delimiter As String) As Variant
' This function splits the sentence in InputText into
' words and returns a string array of the words. Each
' element of the array contains one word.
' This constant contains punctuation and characters
' that should be filtered from the input string.
Const CHARS = ".!?,;:""'()[]{}"
Dim strReplacedText As String
Dim intIndex As Integer
' Replace tab characters with space characters.
strReplacedText = Trim(Replace(InputText, _
vbTab, " "))
' Filter all specified characters from the string.
For intIndex = 1 To Len(CHARS)
strReplacedText = Trim(Replace(strReplacedText, _
Mid(CHARS, intIndex, 1), " "))
Next intIndex
' Loop until all consecutive space characters are
' replaced by a single space character.
Do While InStr(strReplacedText, "  ")
strReplacedText = Replace(strReplacedText, _
"  ", " ")
Loop
' Split the sentence into an array of words and return
' the array. If a delimiter is specified, use it.
'MsgBox "String:" & strReplacedText
If Len(Delimiter) = 0 Then
Split = VBA.Split(strReplacedText)
Else
Split = VBA.Split(strReplacedText, Delimiter)
End If
End Function
Can be called from your excel sheet with this.
=FindEngWord(A1,"gas")
I think this will handle most of the cases you are planning to handle:
=OR(ISNUMBER(SEARCH(" gas",LOWER(A1),1)),LEFT(A1,3)="gas")
I added a space before "gas" in the search. If gas is the only word in the cell, or the first word in the cell, the LEFT part of the formula handles that case. Note that this only checks the start of the word, so words that begin with "gas" (for example "gasoline") also return TRUE.

Returning a numeric value on either side of a dash in a string?

Does anyone know how to return only the numeric value immediately on either side of a dash in a string?
For example, let's say we have the following string "Text, 2-78, 88-100, 101". I'm looking for a way to identify a dash and then return one of the numbers (left or right).
Ultimately I would like to check to see if a given number, let's say 75, falls within any of the ranges noted in the string. Ideally it would see that 75 falls within "2-78".
Any help would be greatly appreciated!
Go to Tools > References and check "Microsoft VBScript Regular Expressions 5.5". Then you can do something like this. (I know this isn't good code, but it's the idea...) It finds all the #-# patterns and shows either the left or the right number of each one (depending on whether the Boolean useLeft is True or False).
Sub PrintRangeEnds()
    Dim str As String, res As String
    str = "Text, 2-78, 88-100, 101"
    ' True shows the number left of the dash, False the number right of it
    Dim useLeft As Boolean
    useLeft = False
    Dim re1 As New RegExp
    re1.Pattern = "\d+-\d+"
    re1.Global = True
    Dim re2 As New RegExp
    re2.Global = False
    If useLeft Then
        re2.Pattern = "\d+"
    Else
        re2.Pattern = "-\d+"
    End If
    Dim m As Match, n As Match
    For Each m In re1.Execute(str)
        For Each n In re2.Execute(m.Value)
            res = n.Value
            ' Drop the leading "-" matched by the right-hand pattern
            If Not useLeft Then res = Mid(res, 2)
        Next
        MsgBox res
    Next
End Sub
You can do this many different ways with VBA. Using the Split() function to convert into an array, first using the commas as a delimiter and then using the dash would probably be a way to go.
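The Split approach described above could be sketched like this. It is a minimal sketch, assuming the items are comma-separated and that a range is two whole numbers joined by a single "-"; `InAnyRange` is a hypothetical name, and items without a dash (such as "101" or "Text") are simply ignored.

```vba
' Hypothetical UDF: TRUE if testValue lies inside any "low-high" range in the text.
' Assumes items are comma-separated and a range is two numbers joined by one "-".
Public Function InAnyRange(ByVal cellText As String, ByVal testValue As Double) As Boolean
    Dim items As Variant, bounds As Variant
    Dim i As Long
    items = Split(cellText, ",")
    For i = LBound(items) To UBound(items)
        bounds = Split(Trim$(items(i)), "-")
        ' Only "low-high" pairs where both sides are numbers are treated as ranges
        If UBound(bounds) = 1 Then
            If IsNumeric(bounds(0)) And IsNumeric(bounds(1)) Then
                If testValue >= CDbl(bounds(0)) And testValue <= CDbl(bounds(1)) Then
                    InAnyRange = True
                    Exit Function
                End If
            End If
        End If
    Next i
End Function
```

With the example string in A1, =InAnyRange(A1,75) should return TRUE because of "2-78", while =InAnyRange(A1,80) returns FALSE.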
That said, if you want a quick and dirty way to do this in Excel (from which you could record a macro), here is what you can do:
Paste your target string into a cell.
Run Text to Columns on it, using the comma as your delimiter.
Copy the row you now have and Paste-Transpose it onto a new sheet.
Run Text to Columns again on your transposed column, this time with the dash as your delimiter.
You now have side-by-side columns of your numbers, which you can compare to your target values as needed.
You may need to use the TRIM() function somewhere to remove whitespace, but hopefully Text to Columns will leave you with numbers rather than numbers stored as text.
Ultimately I think there are lots of ways you could approach this sort of problem. It looks like a good way to try and use RegExp. RegExp is not my speciality but I do like to try and use it to answer some Q's here on SO. This code has been tested for your example data and is working properly.
Something like this, assuming your text is in cell A1, and you're testing a value like 75, this also captures single digits in your string in the match collection:
Sub TestRegExp()
    ' Requires Tools > References > Microsoft VBScript Regular Expressions 5.5
    Dim re As RegExp, allMatches As MatchCollection, m As Match
    Dim testValue As Long
    Dim rangeArray As Variant
    Dim pattern As String, msg As String
    testValue = 75 'or whatever value you're trying to find
    pattern = "[\d]+[-][\d]+\b|[\d]+"
    Set re = New RegExp
    re.pattern = pattern
    re.Global = True
    re.IgnoreCase = True 'doesn't really matter since you're looking for numbers
    Set allMatches = re.Execute([A1])
    For Each m In allMatches
        rangeArray = Split(m.Value, "-")
        Select Case UBound(rangeArray)
        Case 0
            ' A single number: test for equality
            If testValue = CLng(rangeArray(0)) Then
                msg = testValue & " = " & m.Value
            Else
                msg = testValue & " NOT " & m.Value
            End If
        Case 1
            ' A "low-high" pair: test whether the value lies inside it
            If testValue >= CLng(rangeArray(0)) And testValue <= CLng(rangeArray(1)) Then
                msg = testValue & " is within range: " & m.Value
            Else
                msg = testValue & " is not within range: " & m.Value
            End If
        End Select
        MsgBox msg, vbInformation
    Next
End Sub
