I need help with my code that displays an input box and the user inputs a name then the code splits the names and counts the names displaying the following:
Sub ParseName()
Dim Name As String
Dim Count As Integer
Dim Cell As Object
Dim n As Integer
Count = 0
Name = InputBox("Enter First Name, Middle Name, and Last Name")
If Name = "" Then
For Each Cell In Selection
n = InStr(1, Cell.Value, Name)
While n <> 0
Count = Count + 1
n = InStr(n + 1, Cell.Value, Name)
Next Cell
MsgBox Count & " Occurrences of " & Name
End If
End Sub
Absolutely. You could split the names, but it probably wouldn't help. VBA's split doesn't allow you to split on single characters, AFAIK, as other languages might if you split them a specific delimiter. So you could just loop through the characters using MID to see if each letter is a space or not.
There is a way without any splitting or looping. You can just replace the space and get the length of the what's left.
Len(Replace(Name, " ", ""))
where REPLACE just replaces one string with another, in this case replacing all the spaces with nothing, and LEN just counts the characters in a string.
Here's your code rewritten to use this method, with the unnecessary code and variables removed. I would also change the Name variable, since I believe that is a reserved word in VBA. It will let you do it, but you're potentially impacting some existing behavior. For this particular purpose, using a standalone function to get the character count is someh
Sub ParseName()
Dim fullName As String, charCount As Integer
fullName = InputBox("Enter First Name, Middle Name, and Last Name")
If fullName <> "" Then
charCount = Len(Replace(fullName, " ", ""))
MsgBox fullName & " has " & charCount & " characters"
End If
End Sub
Bear in mind, however, that there are plenty of other character codes you might not want to count. Tabs, new lines, any of a number of whitespace characters. Non-character symbols. Things of that nature.
Also, this code does not check that the string even contains letters, or that the user input has three names, or that it is in the format First Middle Last.
If I search for the term 'tfo' in the cell value 'TFO_xyz' then the result should be TRUE.
If I search for the term 'tfo' in the cell value 'TFO systems' then the result should be TRUE.
If I search for the term 'tfo' in the cell value 'spring TFO' then the result should be TRUE.
BUT if I check 'tfo' in the cell value 'Platform' then I want the result as FALSE
I have used the formula =IF(COUNTIF(A2,"*tfo*"),"TRUE","FALSE"), but this wont give result as FALSE when I check 'tfo' in the word 'Platform'
NOTE:
Platform should be false because tfo is coming in between a word. I'm looking result as True for cell values with just the word tfo like in tfo<space>America or TFO_America or <space>TFO systems. But I want FALSE result for the words Platform and portfolio because in these two words the term tfo comes in between alphabets.
Try this:
Dim x As Long: x = 1
With Sheet1
Do While x <= .Cells(.Rows.Count, 1).End(xlUp).Row
If VBA.Left(.Cells(x, 1).Value, 3) = "tfo" Or VBA.Right(.Cells(x, 1).Value, 3) = "tfo" Then
.Cells(x, 2).Value = True
End If
x = x + 1
Loop
End With
Try this formula. This assumes that word tfo will be at the beginning or end
Just make sure to place appropriate cell names where i have 'A2' in the formula
=IF(OR(PROPER(LEFT(A2,3))="tfo",PROPER(RIGHT(A2,3))="tfo"),TRUE,FALSE)
Test Cases Below:
My suggestion is to spend sometime to know your data and create a white-list.
Since there is no easy way to properly do fuzzy search in strings.
Function TFO_Search(strText As String) As Boolean
Dim ArryString As Variant
Dim ArryWhitelist As Variant
' Create a White-List Array
ArryWhitelist = Array("TFO_", "TFO ", "_TFO", " TFO", "tfoAmerica")
For Each ArryString In ArryWhitelist
If InStr(UCase(strText), UCase(ArryString)) > 0 Then 'force to UPPER CASE
TFO_Search = True
Exit Function
Else
TFO_Search = False
End If
Next
End Function
I see two dimensions of complexity in your question:
Where does the key word occur in the text (beginning, middle, end)
What are the characters that separate words.
The first one is fixed size, you need to handle three cases. The second one depends on the number of characters you want to accept as delimiters. Below I assumed that you accept space and underscore, however, you may expand this set by inserting more SUBSTITUTE function calls.
In my table, $A2 is the cell in which you search for the keyword, while B$1 contains the keyword.
To standardize the separator character, you need the formula:
B2=SUBSTITUTE($A2,"_"," ")
To check if the string starts with the keyword:
C2=--(LEFT($B2,LEN(B$1)+1)=B$1&" ")
To check if the string ends with the keyword:
D2=--(RIGHT($B2,LEN(B$1)+1)=" "&B$1)
To check if the keyword is in the middle of the string:
E2=--(LEN(SUBSTITUTE(UPPER($B2)," "&UPPER(B$1)&" ",""))<LEN($B2))
To evaluate the above three cases:
F2=--(0<$C2+$D2+$E2)
If you want to use a single cell, combine the formulas into:
G2=--(0<--(LEFT(SUBSTITUTE($A2,"_"," "),LEN(B$1)+1)=B$1&" ")+--(RIGHT(SUBSTITUTE($A2,"_"," "),LEN(B$1)+1)=" "&B$1)+--(LEN(SUBSTITUTE(UPPER(SUBSTITUTE($A2,"_"," "))," "&UPPER(B$1)&" ",""))<LEN(SUBSTITUTE($A2,"_"," "))))
It is not very readable in the end but I don't think there was an easier solution using Formulas only.
Note: If you want to modify the set of characters accepted as delimiters, add more SUBSTITUTE function calls to B2, then copy the Formula of F2 into notepad and replace $C2 with the formula of C2, etc., then replace $B2 with the updated Formula of B2.
Update
Building on the idea in Ron Rosenfelds comment to tigeravatar's answer, the formula can be simplified (the beginning, middle, ending cases can be joined):
=--(LEN(SUBSTITUTE(" "&UPPER($B2)&" "," "&UPPER(B$1)&" ",""))<LEN($B2))
After substituting $B2 with its formula:
=--(LEN(SUBSTITUTE(" "&UPPER(SUBSTITUTE($A2,"_"," "))&" "," "&UPPER(B$1)&" ",""))<LEN(SUBSTITUTE($A2,"_"," ")))
This formula will return true if TFO is at the beginning or end of any given word, or by itself, in the text string. It also checks every word in the text string, so TFO can be at beginning, middle, or end. The formula assumes that if a word starts or ends with TFO, then the result should be TRUE (as is the case for tfoAmerica so same rule would apply to tform), else FALSE.
=OR(ISNUMBER(SEARCH({" tfo","tfo "}," "&SUBSTITUTE(A2,"_"," ")&" ")))
Here are its results:
EDIT:
In the event that the result should only be TRUE if TFO is found by itself, then this version of the formula will suffice:
=ISNUMBER(SEARCH(" tfo "," "&SUBSTITUTE(A2,"_"," ")&" "))
Image showing results of second version:
If you can rely on VBA, then regex is a more flexible solution.
There is a good summary, of how to use them in VBA: How to use Regular Expressions (Regex) in Microsoft Excel both in-cell and loops
For your keyword search problem I wrote the following:
Option Explicit
' Include: Tools > References > Microsoft VBScript Regular Expressions 5.5 (C:\Windows\SysWOW64\vbscript.dll\3)
Public Function SearchKeyWord(strHay As String, strNail As String, Optional strDelimiters As String = " _,.;/", Optional lngNthOccurrence As Long = 1) As Long ' Returns 1-based index of nth occurrence or 0 if not found
Dim strPattern As String: strPattern = CreatePattern(strNail, strDelimiters)
Dim rgxKeyWord As RegExp: Set rgxKeyWord = CreateRegex(strPattern, True)
Dim mtcResult As MatchCollection: Set mtcResult = rgxKeyWord.Execute(strHay)
If (0 <= lngNthOccurrence - 1) And (lngNthOccurrence - 1 < mtcResult.Count) Then
Dim mthResult As Match: Set mthResult = mtcResult(lngNthOccurrence - 1)
SearchKeyWord = mthResult.FirstIndex + Len(mthResult.SubMatches(0)) + 1
Else
SearchKeyWord = 0
End If
End Function
Private Function CreateRegex(strPattern As String, Optional blnIgnoreCase As Boolean = False, Optional blnMultiLine As Boolean = True, Optional blnGlobal As Boolean = True) As RegExp
Dim rgxResult As RegExp: Set rgxResult = New RegExp
With rgxResult
.Pattern = strPattern
.IgnoreCase = blnIgnoreCase
.MultiLine = blnMultiLine
.Global = blnGlobal
End With
Set CreateRegex = rgxResult
End Function
Private Function CreatePattern(strNail As String, strDelimiters As String) As String
Dim strDelimitersEscaped As String: strDelimitersEscaped = RegexEscape(strDelimiters)
Dim strPattern As String: strPattern = "(^|[" & strDelimitersEscaped & "]+)(" & RegexEscape(strNail) & ")($|[" & strDelimitersEscaped & "]+)"
CreatePattern = strPattern
End Function
Private Function RegexEscape(strOriginal As String) As String
Dim strEscaped As String: strEscaped = vbNullString
Dim i As Long: For i = 1 To Len(strOriginal)
Dim strChar As String: strChar = Mid(strOriginal, i, 1)
Select Case strChar
Case ".", "$", "^", "{", "[", "(", "|", ")", "*", "+", "?", "\"
strEscaped = strEscaped & "\" & strChar
Case Else
strEscaped = strEscaped & strChar
End Select
Next i
RegexEscape = strEscaped
End Function
Once you have the above in a Module, you can insert formulas like the following:
=SearchKeyWord($A1,"tfo")
where A1 contains e.g. "tfo America".
As a third parameter, you may specify, which characters you want to treat as delimiters, by default they are space, underscore, comma, dot, semicolon and slash.
The return value is the position of the nth occurrence of the keyword, where n is the value of the fourth parameter (default: 1), or 0 if not found.
To check if the keyword is present in A1, compare the result to 0, which means not found:
=--(SearchKeyWord($A1,"tfo")<>0)
I need to remove the numeric characters that are separated by white space ONLY in a text string in an Excel cell. For example I have:
johndoe99#mail.com 1 concentr8 on work VARIABLE1 99
I need to get:
johndoe99#mail.com concentr8 on work VARIABLE1
Either formula or VBA script solution is good. Thank you.
I think nomad is right that regex is probably a simpler option. However, I also think that by using the Split() and isNumeric() functions I've come up with a good solution here.
Sub test()
Dim cell As Range
For Each cell In Range("A1:A10") 'adjust as necessary
cell.Value2 = RemoveNumbers(cell.Value2)
Next cell
End Sub
Function RemoveNumbers(ByVal inputString As String) As String
Dim tempSplit As Variant
tempSplit = Split(inputString, " ")
Dim result As String
Dim i As Long
For i = LBound(tempSplit) To UBound(tempSplit)
If Not IsNumeric(tempSplit(i)) Then result = result & " " & tempSplit(i)
Next i
RemoveNumbers = Trim$(result)
End Function
UDF
Function RemNum(cell)
With CreateObject("VBScript.RegExp")
.Global = True: .Pattern = "\s\d+"
RemNum = .Replace(cell, vbNullString)
End With
End Function
Note that in addition to testing for spaces before and after, this also tests for the beginning or end of the string as a delimiter.
You did not indicate the case where the number is the only contents of the string. This routine will remove it but, if you want something else, specify.
Try this:
Function remSepNums(S As String) As String
With CreateObject("vbscript.regexp")
.Global = True
.Pattern = "(?:\s+|^)(?:\d+)(?=\s+|$)"
.MultiLine = True
remSepNums = .Replace(S, "")
End With
End Function
Just for fun, if you have a recent version of Excel (Office 365/2016) you can use the following array formula:
=TEXTJOIN(" ",TRUE,IF(NOT(ISNUMBER(FILTERXML("<t><s>"&SUBSTITUTE(TRIM(A1)," ","</s><s>")&"</s></t>","//s"))),FILTERXML("<t><s>"&SUBSTITUTE(TRIM(A1)," ","</s><s>")&"</s></t>","//s"),""))
FILTERXML can be used to split the string into an array of words, separated by spaces
If any word is not a number, return that word, else return a null string
Then join the segments using the TEXTJOIN function.
I'm looking for a macro (preferably a function) that would take cell contents, split it into separate words, compare them to one another and remove the shorter words.
Here's an image of what I want the output to look like (I need the words that are crossed out removed):
I tried to write a macro myself, but it doesn't work 100% properly because it's not taking the last words and sometimes removes what shouldn't be removed. Also, I have to do this on around 50k cells, so a macro takes a lot of time to run, that's why I'd prefer it to be a function. I guess I shouldn't use the replace function, but I couldn't make anything else work.
Sub clean_words_containing_eachother()
Dim sht1 As Worksheet
Dim LastRow As Long
Dim Cell As Range
Dim cell_value As String
Dim word, word2 As Variant
Set sht1 = ActiveSheet
col = InputBox("Which column do you want to clear?")
LastRow = sht1.Cells(sht1.Rows.Count, col).End(xlUp).Row
Let to_clean = col & "2:" & col & LastRow
For i = 2 To LastRow
For Each Cell In sht1.Range(to_clean)
cell_value = Cell.Value
cell_split = Split(cell_value, " ")
For Each word In cell_split
For Each word2 In cell_split
If word <> word2 Then
If InStr(word2, word) > 0 Then
If Len(word) < Len(word2) Then
word = word & " "
Cell = Replace(Cell, word, " ")
ElseIf Len(word) > Len(word2) Then
word2 = word2 & " "
Cell = Replace(Cell, word2, " ")
End If
End If
End If
Next word2
Next word
Next Cell
Next i
End Sub
Assuming that the retention of the third word in your first example is an error, since books is contained later on in notebooks:
5003886 book books bound case casebound not notebook notebooks office oxford sign signature
and also assuming that you would want to remove duplicate identical words, even if they are not contained subsequently in another word, then we can use a Regular Expression.
The regex will:
Capture each word
look-ahead to see if that word exists later on in the string
if it does, remove it
Since VBA regexes cannot also look-behind, we work-around this limitation by running the regex a second time on the reversed string.
Then remove the extra spaces and we are done.
Option Explicit
Function cleanWords(S As String) As String
Dim RE As Object, MC As Object, M As Object
Dim sTemp As String
Set RE = CreateObject("vbscript.regexp")
With RE
.Global = True
.Pattern = "\b(\w+)\b(?=.*\1)"
.ignorecase = True
'replace looking forward
sTemp = .Replace(S, "")
' check in reverse
sTemp = .Replace(StrReverse(sTemp), "")
'return to normal
sTemp = StrReverse(sTemp)
'Remove extraneous spaces
cleanWords = WorksheetFunction.Trim(sTemp)
End With
End Function
Limitations
punctuation will not be removed
a "word" is defined as containing only the characters in the class [_A-Za-z0-9] (letters, digits and the underscore).
if any words might be hyphenated, or contain other non-word characters
in the above, they will be treated as two separate words
if you want it treated as a single word, then we might need to change the regex
General steps:
Write cell to array (already working)
for each element (x), go through each element (y) (already working)
if x is in y AND y is longer that x THEN set x to ""
concat array back into string
write string to cell
String/array manipulations are much faster than operations on cells, so this will give you some increase in performance (depending on the amount of words you need to replace for each cell).
The "last word problem" might be that you dont have a space after the last word within your cells, since you only replace word + " " with " ".
I have a column containing multiple string values, like a sentence.
in that sentence i want to find one or all alphanumeric values of 10 or more characters containing atleast one - , and put the resulting values in another column.
For example:
the column containing sentence is like:
upgrade 15.07.2010, old No: WI82-01062. User moved to No: WI12-01012 02.04.2012 to a 2 user network.
or
Upgrade from lite 7/6/07, old No: PTX7-89C367EC5052-01211
Ideally I want a column with values like WI82-01062, WI12-01012 for the first example, and PTX7-89C367EC5052-01211 for the second example.
May be searching for the - in the string and finding the first occurrence of blank space at both ends would help, but I do not have any clue how to write that in excel term.
Thanks
You could probably use a regex like this (there may be better patterns!):
Function ExtractData(r As Variant) As String
Static oRE As Object
Dim sTemp As String
Dim n As Long
Dim matches
If oRE Is Nothing Then
Set oRE = CreateObject("vbscript.regexp")
With oRE
.Pattern = "[A-Za-z0-9\-]{10,}"
.Global = True
End With
End If
Set matches = oRE.Execute(r)
If matches.Count > 0 Then
For n = 1 To matches.Count
sTemp = sTemp & ", " & matches(n - 1)
Next n
ExtractData = Mid$(sTemp, 3)
End If
End Function