I have a string within a cell and I am trying to bold certain parts of that string. I have my code set up so that each case is a line within that cell.
The first cell is what I am starting out with, and the one below it is what I am trying to do. Below is my code on what I have so far.
Sub test()
For Each cel In Range("A1:A" & Cells(Rows.Count, 1).End(xlUp).Row)
Dim arr, line As Long, pos As Long, txt, length, dashPos
arr = Split(cel.Value, Chr(10)) ' Splitting cell contents by newline character
pos = 1
For line = 1 To UBound(arr) + 1
txt = arr(line - 1)
length = Len(txt)
'check which line we're on...
Select Case line
Case 4: 'Underline on line 4
cel.Characters(pos, length).Font.Underline = True
Case 5: 'Bold the team players
Case 6: 'Underline on line 6
cel.Characters(pos, length).Font.Underline = True
End Select
pos = pos + Len(txt) + 1 'start position for next line
Next line
Next cel
End Sub
Since you are looking up a certain pattern I thought this could be done through regular expressions since each match in the MatchCollection2 object will have a starting index including the length of the captured pattern. Let's imagine the following sample data:
Now we can apply the following code:
Sub Test()
Dim str As String: str = [A1]
Dim colMatch, objMatch
With CreateObject("vbscript.regexp")
.Global = True
.Pattern = "\S+:(?=.*$)"
If .Test(str) = True Then
Set colMatch = .Execute(str)
For Each objMatch In colMatch
Range("A1").Characters(objMatch.FirstIndex + 1, objMatch.Length).Font.Bold = True 'FirstIndex is 0-based, Characters is 1-based
Next
End If
End With
End Sub
The result:
About the regular expression's pattern:
\S+:(?=.*$)
You can see an online demo here and a small breakdown below:
\S+: - 1+ Non-whitespace character up to and including a colon.
(?= - A positive lookahead:
.*$ - 0+ characters other than newline up to the end string anchor.
) - Close positive lookahead.
Note: We need to either leave the "MultiLine" property of the regex object alone or set its value to False. In the example I gave I simply didn't include it, because it then defaults to False. If it were set to True, the end-of-string anchor would match not only the end of the whole string but also the end of each line (which is what we want to avoid if we don't want to match "Server:").
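To see the effect of the MultiLine setting on this pattern, here is a minimal sketch. The helper name CountLabelMatches and the sample text are made up for illustration; only the pattern comes from the answer above.

```vba
' Hypothetical helper: counts matches of the pattern with MultiLine on or off.
Function CountLabelMatches(ByVal s As String, ByVal useMultiLine As Boolean) As Long
    With CreateObject("VBScript.RegExp")
        .Global = True
        .MultiLine = useMultiLine
        .Pattern = "\S+:(?=.*$)"
        CountLabelMatches = .Execute(s).Count
    End With
End Function

Sub DemoMultiLine()
    Dim sample As String
    ' Assumed sample: a "Server:" line followed by a line of labelled names.
    sample = "Server: X" & vbLf & "Alice: 1 Bob: 2"
    Debug.Print CountLabelMatches(sample, False) ' labels on the last line only
    Debug.Print CountLabelMatches(sample, True)  ' "Server:" is matched as well
End Sub
```

With MultiLine off, the lookahead can only succeed on the last line, because "." does not cross the line feed and "$" only matches at the very end of the string.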
The answer is in another StackOverflow question here:
excel vba: make part of string bold
Which is similar to this:
Change color of certain characters in a cell
It's roughly:
{{CELL OR CELLS NEEDING BOLD CHARACTERS}}.Characters({{LOCATION, INFO}}).Font.FontStyle = "Bold"
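As a concrete instance of that pattern, the following sketch bolds the first occurrence of a known substring. The procedure name BoldFirstOccurrence and the sample call are assumptions for illustration; it relies only on InStr and Range.Characters.

```vba
' Hypothetical helper: bolds the first occurrence of needle inside a single cell.
Sub BoldFirstOccurrence(ByVal target As Range, ByVal needle As String)
    Dim pos As Long
    If Len(needle) = 0 Then Exit Sub
    ' InStr returns a 1-based position, which is what Characters expects
    pos = InStr(1, CStr(target.Value), needle, vbTextCompare)
    If pos > 0 Then
        target.Characters(pos, Len(needle)).Font.Bold = True
    End If
End Sub

Sub DemoBold()
    ' Assumes A1 of the active sheet holds text containing "Server:"
    BoldFirstOccurrence ActiveSheet.Range("A1"), "Server:"
End Sub
```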
I am trying to extract a substring from a main string. The strings do not all follow the same pattern. The main string is in column "I". The desired output should be as per column "J". I have to extract the substring between "FL" and "WNG".
I have tried to write code but it is not giving the proper output. Can you please assist with an alternative solution to get the desired output using VBA?
Sub Get_Substring()
Range("K2") = Mid(Range("I2"), InStrRev(Range("I2"), "FL") + 1, _
InStrRev(Range("I2"), "WNG") - _
InStrRev(Range("I2"), "FL") - 1)
End Sub
Try the following...
Range("K2") = Mid(Range("I2"), InStrRev(Range("I2"), "FL") + 2, _
InStrRev(Range("I2"), "WNG") - _
InStrRev(Range("I2"), "FL") - 2)
Although, I would make it clear that you want the value for each of the ranges, as follows...
Range("K2").Value = Mid(Range("I2").Value, InStrRev(Range("I2").Value, "FL") + 2, _
InStrRev(Range("I2").Value, "WNG") - _
InStrRev(Range("I2").Value, "FL") - 2)
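The offset of 2 is simply Len("FL"): Mid has to start after the marker, not on its second letter. A minimal sketch that derives the offsets from the marker lengths and returns an empty string when a marker is missing is shown below. The name BetweenLast is hypothetical, and unlike the line above it takes the first "WNG" after the last "FL" rather than the last "WNG" in the string.

```vba
' Hypothetical helper: text between the last startTag and the next endTag after it.
Function BetweenLast(ByVal s As String, ByVal startTag As String, ByVal endTag As String) As String
    Dim p1 As Long, p2 As Long
    p1 = InStrRev(s, startTag)
    If p1 = 0 Then Exit Function        ' start marker not found
    p1 = p1 + Len(startTag)             ' first character after the marker
    p2 = InStr(p1, s, endTag)
    If p2 = 0 Then Exit Function        ' end marker not found
    BetweenLast = Mid$(s, p1, p2 - p1)
End Function

Sub DemoBetweenLast()
    Debug.Print BetweenLast("John12REGNO02FL02WNGARM01", "FL", "WNG")
End Sub
```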
The next piece of code extracts the necessary string using arrays, too. But it can do it, even if more "WNG" strings exist in the string to be analyzed:
Private Function ExtractString(strTxt As String) As String
Dim arrFL, arrWNG, i As Long
arrFL = Split(strTxt, "FL")
For i = 1 To UBound(arrFL) 'start from the second array element
arrWNG = Split(arrFL(i), "WNG") 'split each first array element by "WNG"
'if the array contains at least a "WNG" string:
If UBound(arrWNG) > 0 Then ExtractString = arrWNG(0): Exit Function 'extract the first array element
Next
End Function
Note: If more pairs of "FL" followed by "WNG" exist, the function can be adapted to return an array containing all such occurrences.
It can be tested using the next testing Sub:
Sub testExtractString()
Dim x As String
x = "John12REGNO02FL02WNGARM01"
'x = "John12WNGREGNO02FL02WNGARM01"
'x = "John12WNGREGNO02FL02WNGARWNGM01"
Debug.Print ExtractString(x)
End Sub
Just uncomment each x definition row...
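One possible adaptation for returning every "FL"..."WNG" occurrence is sketched here. It returns a Collection rather than an array, which is a design choice of this sketch, and the name ExtractAll is hypothetical.

```vba
' Hypothetical variant of ExtractString: collects every piece that follows
' an "FL" and is closed by a "WNG".
Function ExtractAll(ByVal strTxt As String) As Collection
    Dim result As Collection
    Dim arrFL, arrWNG, i As Long
    Set result = New Collection
    arrFL = Split(strTxt, "FL")
    For i = 1 To UBound(arrFL)              ' skip the text before the first "FL"
        arrWNG = Split(arrFL(i), "WNG")
        If UBound(arrWNG) > 0 Then result.Add arrWNG(0)
    Next i
    Set ExtractAll = result
End Function

Sub DemoExtractAll()
    Dim item As Variant
    For Each item In ExtractAll("aFL01WNGbFL02WNGc")
        Debug.Print item
    Next item
End Sub
```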
I'll chuck in a solution based on regex to assure you got the exact substring:
Sub Test()
Dim stringIn As String: stringIn = "John12REGNO02FL02WNGARM01"
Debug.Print (Extract(stringIn))
End Sub
Function Extract(stringIn As String) As String
With CreateObject("vbscript.regexp")
.Pattern = "^.*FL(.*?)WNG"
If .Test(stringIn) = True Then
Extract = .Execute(stringIn)(0).Submatches(0)
Else
Extract = "None Found"
End If
End With
End Function
^ - Start of string anchor.
.*FL - 0+ characters, greedy, and therefore up to the last occurrence of "FL".
(.*?) - A capture group with 0+ characters, lazy, and therefore up to the nearest occurrence of:
WNG - Literally match "WNG".
NOTE: you could make a stricter pattern that only catches digits if that's the only type of character possible, e.g.: ^.*FL(\d*)WNG.
Here is an online demo
You can try the following udf:
Public Function FLWNG(s As String) As String
'Purpose: get the substring enclosed by the most right pair of FL..WNG
Dim tmp
tmp = Split(Replace(s, "WNG", "FL"), "FL")
FLWNG = tmp(UBound(tmp) - 1)
End Function
Explanation
Replacing all occurrences of WNG in the original string (s) with FL allows the resulting string to be split by the FL delimiter only.
Assuming that the original string has at least one enclosing structure, you get the enclosed content as the next-to-last element, i.e. via tmp(UBound(tmp)-1).
I completed code to remove any data in front of a string, add some text (with a space) to the front and store it back in the cell.
However, every time I run the macro (to check if changes that I've made are working for example), a new space is added in between the words.
The code below removes anything before the name and adds the required string. I have called an InStr function and stored the value in the integer pos. Note that this is in a loop over a specific range.
If pos > 0 Then
'Removes anything before the channel name
cellValue.Offset(0, 2) = Right(cell, Len(cell) - InStr(cell, pos) - 2)
'Add "DA" to the front of the channel name
cellValue.Offset(0, 0) = "DA " & Right(cell, Len(cell) - InStr(cell, pos) - 2)
'Aligns the text to the right
cellValue.Offset(0, 2).HorizontalAlignment = xlRight
End If
An additional "DA" is not being added and I haven't made any other functions to add spaces anywhere. The extra space is not added if adding "DA " is changed to "DA".
I'd prefer not to add another function/sub/something somewhere to search and remove any extra spaces.
What the string is AND what is in front of the string is unknown. It could be numbers, characters, spaces or exactly what I want it to be. For example, it could be "Q-Quincey", "BA Bob", "DA White" etc. I thought that searching through the cell for the string I want (Quincey, Bob, White) and altering the cell as needed would be the best way.
Solution that you all helped me come up with:
If pos > 0 Then
modString = Right(cell, Len(cell) - InStr(cell, pos) - 2)
'Removes anything before the channel name and places it in the last column
cellValue.Offset(0, 2) = modString
'Aligns the last column text to the right
cellValue.Offset(0, 2).HorizontalAlignment = xlRight
cellValue.Offset(0, 2).Font.Size = 8
'Add "DA" to the front of the channel name in the rightmost column
If StartsWith(cell, "DA ") = True Then
cellValue.Replace cell, "DA" & modString
Else
cellValue.Replace cell, "DA " & modString
End If
End If
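StartsWith is not a built-in VBA function, so the solution above needs a helper with that name. Its definition is not shown in the post; a minimal sketch, assuming a case-sensitive comparison is wanted, could be:

```vba
' Assumed implementation of the StartsWith helper used above (not from the post).
Function StartsWith(ByVal s As String, ByVal prefix As String) As Boolean
    ' Compare the leading characters of s with prefix, case-sensitively
    StartsWith = (StrComp(Left$(s, Len(prefix)), prefix, vbBinaryCompare) = 0)
End Function

Sub DemoStartsWith()
    Debug.Print StartsWith("DA White", "DA ")
    Debug.Print StartsWith("BA Bob", "DA ")
End Sub
```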
Maybe this is something you can work with:
Sample data:
Sample code:
Sub Test()
With Sheet1.Range("A1:A4")
.Replace "*quincey", "AD Quincey"
End With
End Sub
Result:
In your examples, it seems you want to replace the first "word" in the string with something else. If that is always the case, the following function, which makes use of Regular Expressions, can do that:
Option Explicit
Function replaceStart(str As String, replWith As String) As String
Dim RE As Object
Set RE = CreateObject("vbscript.regexp")
With RE
.Global = False
.MultiLine = True
.Pattern = "^\S+\W(?=\w)"
replaceStart = .Replace(str, replWith)
End With
End Function
Sub test()
Debug.Print replaceStart("Q-Quincy", "DA ")
Debug.Print replaceStart("BA Bob", "DA ")
Debug.Print replaceStart("DA White", "DA ")
End Sub
The Debug.Print output is:
DA Quincy
DA Bob
DA White
The regular expression matches everything up to but not including the first "word" character that follows a non-word character. This should be the second word in the string.
A "word" character is anything in the set of [A-Za-z0-9_]
Seems to work on the examples you present.
If you want to go about it through a loop, you should remove some redundancies in your code. For instance, referring to cell.Offset(0,0) doesn't make sense.
I would set the target cells to a range and simply edit that cell without placing the unwanted strings in another cell.
EDIT: I'd try something like this.
nameiwant = "Quincy"
Set cell = Range("A1")
If InStr(cell, nameiwant) > 0 And Left(cell, 3) <> "DA " Then
cell.Value = "DA " & nameiwant
End If
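The same idea can be written as a pure function, which makes it easy to check that running it repeatedly does not keep adding spaces. This is only a sketch; the name NormalizeChannel is hypothetical, and the whole text is rebuilt from the wanted name instead of being trimmed from the left.

```vba
' Hypothetical helper: returns "DA <name>" when the name occurs in the text,
' otherwise returns the text unchanged.
Function NormalizeChannel(ByVal s As String, ByVal nameWanted As String) As String
    If InStr(1, s, nameWanted, vbTextCompare) > 0 Then
        NormalizeChannel = "DA " & nameWanted
    Else
        NormalizeChannel = s
    End If
End Function

Sub DemoNormalizeChannel()
    Dim once As String
    once = NormalizeChannel("Q-Quincy", "Quincy")
    Debug.Print once
    ' Applying it again to its own output gives the same result
    Debug.Print NormalizeChannel(once, "Quincy")
End Sub
```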
If I search for the term 'tfo' in the cell value 'TFO_xyz' then the result should be TRUE.
If I search for the term 'tfo' in the cell value 'TFO systems' then the result should be TRUE.
If I search for the term 'tfo' in the cell value 'spring TFO' then the result should be TRUE.
BUT if I check 'tfo' in the cell value 'Platform' then I want the result as FALSE
I have used the formula =IF(COUNTIF(A2,"*tfo*"),"TRUE","FALSE"), but this won't return FALSE when I check 'tfo' in the word 'Platform'.
NOTE:
Platform should be FALSE because tfo occurs inside a word. I want TRUE for cell values that contain tfo as a separate word, as in tfo<space>America, TFO_America or <space>TFO systems. But I want FALSE for the words Platform and portfolio, because in these two words tfo sits between other letters.
Try this:
Dim x As Long: x = 1
With Sheet1
Do While x <= .Cells(.Rows.Count, 1).End(xlUp).Row
If LCase(Left(.Cells(x, 1).Value, 3)) = "tfo" Or LCase(Right(.Cells(x, 1).Value, 3)) = "tfo" Then 'compare case-insensitively so that "TFO" matches too
.Cells(x, 2).Value = True
Else
.Cells(x, 2).Value = False
End If
x = x + 1
Loop
End With
Try this formula. It assumes that the word tfo will be at the beginning or end of the cell value.
Just make sure to place the appropriate cell reference where I have 'A2' in the formula.
=IF(OR(PROPER(LEFT(A2,3))="tfo",PROPER(RIGHT(A2,3))="tfo"),TRUE,FALSE)
Test Cases Below:
My suggestion is to spend some time getting to know your data and create a white-list,
since there is no easy way to properly do a fuzzy search in strings.
Function TFO_Search(strText As String) As Boolean
Dim ArryString As Variant
Dim ArryWhitelist As Variant
' Create a White-List Array
ArryWhitelist = Array("TFO_", "TFO ", "_TFO", " TFO", "tfoAmerica")
For Each ArryString In ArryWhitelist
If InStr(UCase(strText), UCase(ArryString)) > 0 Then 'force to UPPER CASE
TFO_Search = True
Exit Function
Else
TFO_Search = False
End If
Next
End Function
I see two dimensions of complexity in your question:
Where does the key word occur in the text (beginning, middle, end)
What are the characters that separate words.
The first one is fixed size, you need to handle three cases. The second one depends on the number of characters you want to accept as delimiters. Below I assumed that you accept space and underscore, however, you may expand this set by inserting more SUBSTITUTE function calls.
In my table, $A2 is the cell in which you search for the keyword, while B$1 contains the keyword.
To standardize the separator character, you need the formula:
B2=SUBSTITUTE($A2,"_"," ")
To check if the string starts with the keyword:
C2=--(LEFT($B2,LEN(B$1)+1)=B$1&" ")
To check if the string ends with the keyword:
D2=--(RIGHT($B2,LEN(B$1)+1)=" "&B$1)
To check if the keyword is in the middle of the string:
E2=--(LEN(SUBSTITUTE(UPPER($B2)," "&UPPER(B$1)&" ",""))<LEN($B2))
To evaluate the above three cases:
F2=--(0<$C2+$D2+$E2)
If you want to use a single cell, combine the formulas into:
G2=--(0<--(LEFT(SUBSTITUTE($A2,"_"," "),LEN(B$1)+1)=B$1&" ")+--(RIGHT(SUBSTITUTE($A2,"_"," "),LEN(B$1)+1)=" "&B$1)+--(LEN(SUBSTITUTE(UPPER(SUBSTITUTE($A2,"_"," "))," "&UPPER(B$1)&" ",""))<LEN(SUBSTITUTE($A2,"_"," "))))
It is not very readable in the end but I don't think there was an easier solution using Formulas only.
Note: If you want to modify the set of characters accepted as delimiters, add more SUBSTITUTE function calls to B2, then copy the Formula of F2 into notepad and replace $C2 with the formula of C2, etc., then replace $B2 with the updated Formula of B2.
Update
Building on the idea in Ron Rosenfeld's comment on tigeravatar's answer, the formula can be simplified (the beginning, middle and ending cases can be joined):
=--(LEN(SUBSTITUTE(" "&UPPER($B2)&" "," "&UPPER(B$1)&" ",""))<LEN($B2))
After substituting $B2 with its formula:
=--(LEN(SUBSTITUTE(" "&UPPER(SUBSTITUTE($A2,"_"," "))&" "," "&UPPER(B$1)&" ",""))<LEN(SUBSTITUTE($A2,"_"," ")))
This formula will return true if TFO is at the beginning or end of any given word, or by itself, in the text string. It also checks every word in the text string, so TFO can be at beginning, middle, or end. The formula assumes that if a word starts or ends with TFO, then the result should be TRUE (as is the case for tfoAmerica so same rule would apply to tform), else FALSE.
=OR(ISNUMBER(SEARCH({" tfo","tfo "}," "&SUBSTITUTE(A2,"_"," ")&" ")))
Here are its results:
EDIT:
In the event that the result should only be TRUE if TFO is found by itself, then this version of the formula will suffice:
=ISNUMBER(SEARCH(" tfo "," "&SUBSTITUTE(A2,"_"," ")&" "))
Image showing results of second version:
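For anyone who prefers a UDF, the padding trick from the second formula translates directly to VBA. The sketch below assumes, like the formula, that only space and underscore act as delimiters; the function name HasWord is made up.

```vba
' Hypothetical UDF mirroring =ISNUMBER(SEARCH(" tfo "," "&SUBSTITUTE(A2,"_"," ")&" "))
Function HasWord(ByVal s As String, ByVal word As String) As Boolean
    Dim padded As String
    ' Treat underscores as spaces and pad both ends so that a word at the
    ' start or end of the text is also surrounded by spaces
    padded = " " & Replace(s, "_", " ") & " "
    HasWord = InStr(1, padded, " " & word & " ", vbTextCompare) > 0
End Function

Sub DemoHasWord()
    Debug.Print HasWord("TFO_xyz", "tfo")
    Debug.Print HasWord("Platform", "tfo")
End Sub
```

In a worksheet it could be called as =HasWord(A2,"tfo").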
If you can rely on VBA, then regex is a more flexible solution.
There is a good summary of how to use them in VBA: How to use Regular Expressions (Regex) in Microsoft Excel both in-cell and loops
For your keyword search problem I wrote the following:
Option Explicit
' Include: Tools > References > Microsoft VBScript Regular Expressions 5.5 (C:\Windows\SysWOW64\vbscript.dll\3)
Public Function SearchKeyWord(strHay As String, strNail As String, Optional strDelimiters As String = " _,.;/", Optional lngNthOccurrence As Long = 1) As Long ' Returns 1-based index of nth occurrence or 0 if not found
Dim strPattern As String: strPattern = CreatePattern(strNail, strDelimiters)
Dim rgxKeyWord As RegExp: Set rgxKeyWord = CreateRegex(strPattern, True)
Dim mtcResult As MatchCollection: Set mtcResult = rgxKeyWord.Execute(strHay)
If (0 <= lngNthOccurrence - 1) And (lngNthOccurrence - 1 < mtcResult.Count) Then
Dim mthResult As Match: Set mthResult = mtcResult(lngNthOccurrence - 1)
SearchKeyWord = mthResult.FirstIndex + Len(mthResult.SubMatches(0)) + 1
Else
SearchKeyWord = 0
End If
End Function
Private Function CreateRegex(strPattern As String, Optional blnIgnoreCase As Boolean = False, Optional blnMultiLine As Boolean = True, Optional blnGlobal As Boolean = True) As RegExp
Dim rgxResult As RegExp: Set rgxResult = New RegExp
With rgxResult
.Pattern = strPattern
.IgnoreCase = blnIgnoreCase
.MultiLine = blnMultiLine
.Global = blnGlobal
End With
Set CreateRegex = rgxResult
End Function
Private Function CreatePattern(strNail As String, strDelimiters As String) As String
Dim strDelimitersEscaped As String: strDelimitersEscaped = RegexEscape(strDelimiters)
Dim strPattern As String: strPattern = "(^|[" & strDelimitersEscaped & "]+)(" & RegexEscape(strNail) & ")($|[" & strDelimitersEscaped & "]+)"
CreatePattern = strPattern
End Function
Private Function RegexEscape(strOriginal As String) As String
Dim strEscaped As String: strEscaped = vbNullString
Dim i As Long: For i = 1 To Len(strOriginal)
Dim strChar As String: strChar = Mid(strOriginal, i, 1)
Select Case strChar
Case ".", "$", "^", "{", "[", "(", "|", ")", "*", "+", "?", "\"
strEscaped = strEscaped & "\" & strChar
Case Else
strEscaped = strEscaped & strChar
End Select
Next i
RegexEscape = strEscaped
End Function
Once you have the above in a Module, you can insert formulas like the following:
=SearchKeyWord($A1,"tfo")
where A1 contains e.g. "tfo America".
As a third parameter, you may specify which characters you want to treat as delimiters; by default they are space, underscore, comma, dot, semicolon and slash.
The return value is the position of the nth occurrence of the keyword, where n is the value of the fourth parameter (default: 1), or 0 if not found.
To check if the keyword is present in A1, compare the result to 0, which means not found:
=--(SearchKeyWord($A1,"tfo")<>0)
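If only a TRUE/FALSE answer is needed and adding the library reference is inconvenient, a compact late-bound variant is possible. This is a sketch, not a replacement for the module above: the name ContainsKeyword is hypothetical, the delimiters are fixed to space and underscore, and the keyword is assumed to contain no regex metacharacters because it is not escaped.

```vba
' Hypothetical late-bound variant: True when keyword stands alone between
' delimiters (space or underscore) or at either end of the text.
Function ContainsKeyword(ByVal s As String, ByVal keyword As String) As Boolean
    With CreateObject("VBScript.RegExp")
        .IgnoreCase = True
        ' Assumption: keyword has no regex metacharacters (it is not escaped here)
        .Pattern = "(^|[ _])" & keyword & "($|[ _])"
        ContainsKeyword = .Test(s)
    End With
End Function

Sub DemoContainsKeyword()
    Debug.Print ContainsKeyword("TFO_xyz", "tfo")
    Debug.Print ContainsKeyword("portfolio", "tfo")
End Sub
```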
I'm looking for a macro (preferably a function) that would take cell contents, split it into separate words, compare them to one another and remove the shorter words.
Here's an image of what I want the output to look like (I need the words that are crossed out removed):
I tried to write a macro myself, but it doesn't work 100% properly because it's not taking the last words and sometimes removes what shouldn't be removed. Also, I have to do this on around 50k cells, so a macro takes a lot of time to run, that's why I'd prefer it to be a function. I guess I shouldn't use the replace function, but I couldn't make anything else work.
Sub clean_words_containing_eachother()
Dim sht1 As Worksheet
Dim LastRow As Long
Dim Cell As Range
Dim cell_value As String
Dim word, word2 As Variant
Set sht1 = ActiveSheet
col = InputBox("Which column do you want to clear?")
LastRow = sht1.Cells(sht1.Rows.Count, col).End(xlUp).Row
Let to_clean = col & "2:" & col & LastRow
For i = 2 To LastRow
For Each Cell In sht1.Range(to_clean)
cell_value = Cell.Value
cell_split = Split(cell_value, " ")
For Each word In cell_split
For Each word2 In cell_split
If word <> word2 Then
If InStr(word2, word) > 0 Then
If Len(word) < Len(word2) Then
word = word & " "
Cell = Replace(Cell, word, " ")
ElseIf Len(word) > Len(word2) Then
word2 = word2 & " "
Cell = Replace(Cell, word2, " ")
End If
End If
End If
Next word2
Next word
Next Cell
Next i
End Sub
Assuming that the retention of the third word in your first example is an error, since books is contained later on in notebooks:
5003886 book books bound case casebound not notebook notebooks office oxford sign signature
and also assuming that you would want to remove duplicate identical words, even if they are not contained subsequently in another word, then we can use a Regular Expression.
The regex will:
Capture each word
look-ahead to see if that word exists later on in the string
if it does, remove it
Since VBA regexes do not support look-behind, we work around this limitation by running the regex a second time on the reversed string.
Then remove the extra spaces and we are done.
Option Explicit
Function cleanWords(S As String) As String
Dim RE As Object, MC As Object, M As Object
Dim sTemp As String
Set RE = CreateObject("vbscript.regexp")
With RE
.Global = True
.Pattern = "\b(\w+)\b(?=.*\1)"
.ignorecase = True
'replace looking forward
sTemp = .Replace(S, "")
' check in reverse
sTemp = .Replace(StrReverse(sTemp), "")
'return to normal
sTemp = StrReverse(sTemp)
'Remove extraneous spaces
cleanWords = WorksheetFunction.Trim(sTemp)
End With
End Function
Limitations
punctuation will not be removed
a "word" is defined as containing only the characters in the class [_A-Za-z0-9] (letters, digits and the underscore).
if any words might be hyphenated, or contain other non-word characters
in the above, they will be treated as two separate words
if you want it treated as a single word, then we might need to change the regex
General steps:
Write cell to array (already working)
for each element (x), go through each element (y) (already working)
if x is in y AND y is longer than x THEN set x to ""
concat array back into string
write string to cell
String/array manipulations are much faster than operations on cells, so this will give you some increase in performance (depending on the amount of words you need to replace for each cell).
The "last word problem" might be that you don't have a space after the last word within your cells, since you only replace word + " " with " ".
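The general steps above can be sketched as a function that works purely on an array, so it can also be used as a worksheet formula. The name RemoveContainedWords is hypothetical. This version is case-insensitive and only removes a word when a strictly longer word contains it, so exact duplicates are kept.

```vba
' Hypothetical sketch of the steps above: blank every word that is
' contained in a longer word, then join the survivors.
Function RemoveContainedWords(ByVal s As String) As String
    Dim words() As String
    Dim i As Long, j As Long
    Dim result As String

    words = Split(s, " ")               ' cell text -> array of words

    For i = LBound(words) To UBound(words)
        For j = LBound(words) To UBound(words)
            ' x = words(i), y = words(j): blank x when a longer y contains it
            If Len(words(i)) > 0 And Len(words(j)) > Len(words(i)) Then
                If InStr(1, words(j), words(i), vbTextCompare) > 0 Then
                    words(i) = ""
                    Exit For
                End If
            End If
        Next j
    Next i

    ' Rebuild the string from the surviving words, skipping blanks
    For i = LBound(words) To UBound(words)
        If Len(words(i)) > 0 Then result = result & " " & words(i)
    Next i
    RemoveContainedWords = Mid$(result, 2)   ' drop the leading space
End Function
```

Because words are compared as array elements rather than replaced inside the cell text, the last word needs no trailing space to be handled.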
I am trying to cleanup a set of strings in Excel to extract certain words after removing some prefixes and extra characters. Initially I was trying this with FIND, LEFT, MID, etc. Then, I came across this helpful post and trying my hand at regex.
https://superuser.com/questions/794536/excel-formulas-for-stripping-out-prefix-suffix-around-number
I have used the UDF given there called Remove which takes a regex argument. Now, I am still not able to remove all the items I wanted to remove.
In the attached Excel you can see what I have tried and what the answer I am looking.
Here are the Prefixes I wanted to remove:
The numbers in the beginning surrounded by brackets - Ideally I want this in a separate column.
Any word before a hyphen; there are a number of them, e.g. 'l-', 'al-'
and then these prefixes below.
bi
bil
fa
wa
wal
How do I write a single regex which would remove all the above prefixes?
Here is the UDF I am using:
Function Remove(objCell As Range, strPattern As String)
Dim RegEx As Object
Set RegEx = CreateObject("VBScript.RegExp")
RegEx.Global = True
RegEx.Pattern = strPattern
Remove = RegEx.Replace(objCell.Value, "")
End Function
Here is the link to the XLSM file which contains the data I have:
https://www.dropbox.com/s/et9ee727ompj5fl/Regex%20Trials.xlsm?dl=0
and here is a screenshot to show you what I am looking for:
Not 100% perfect for words but should get you started
Breakdown of RegEx (\d+\:)+\d+
(\d+\:) matches one or more digits followed by a colon, i.e. the format x:
The plus after the bracket then tells it that this group may repeat.
Lastly, the \d+ matches the final number, so that the regex finds a pattern of the form x:x:x
The next RegEx (?!l-|al-|a-|wa-|fa-|bi-)[a-z].* is a lot more complex.
First of all, let's look at [a-z]. This tells it to match any lowercase character between a and z. We then want to capture the rest of the word, so .* captures everything from that first match to the end of the string (this includes non a-z characters). However, we don't want the match to start at one of the prefixes before the hyphen (in most cases), so we use ?!, which is called a negative lookahead. It rejects any starting position where the text that follows is one of the alternatives inside the brackets; | simply means "or". The match therefore begins at the first lowercase letter that does not start one of the listed prefixes.
Go to http://regexr.com/ if you want to have a play around; it is a handy site to learn and test RegEx.
Public Sub test()
Dim rng As Range
Dim matches
Dim c
With Sheet1
Set rng = .Range(.Cells(2, 1), .Cells(.Cells(.Rows.Count, 1).End(xlUp).Row, 1))
End With
For Each c In rng
With c
.Offset(0, 6) = ExecuteRegEx(.Value2, "(\d+\:)+\d+")
.Offset(0, 7) = ExecuteRegEx(.Value2, "(?!l-|al-|a-|wa-|fa-|bi-)[a-z].*")
End With
Next c
End Sub
Public Function ExecuteRegEx(str As String, pattern As String) As String
Dim RegEx As Object
Dim matches
Set RegEx = CreateObject("VBScript.RegExp")
With RegEx
.Global = True
.ignorecase = False
.pattern = pattern
If .test(str) Then
Set matches = .Execute(str)
ExecuteRegEx = matches(0)
Else
ExecuteRegEx = vbNullString
End If
End With
End Function
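To make the behaviour of the two patterns concrete, here is a small sketch that runs them on made-up inputs. The helper name FirstMatch and the sample strings are assumptions, not data from the workbook. It also shows why the answer is "not 100% perfect": a prefix that is not in the lookahead list, such as "wal-", is not skipped.

```vba
' Hypothetical helper: returns the first match of a pattern, or "" if none.
Function FirstMatch(ByVal s As String, ByVal pattern As String) As String
    With CreateObject("VBScript.RegExp")
        .Global = False
        .IgnoreCase = False
        .Pattern = pattern
        If .Test(s) Then FirstMatch = .Execute(s)(0).Value
    End With
End Function

Sub DemoPatterns()
    Const WORD_PATTERN As String = "(?!l-|al-|a-|wa-|fa-|bi-)[a-z].*"
    ' Made-up sample values for illustration only
    Debug.Print FirstMatch("(2:255:10) al-kitab", "(\d+\:)+\d+") ' the x:x:x part
    Debug.Print FirstMatch("al-kitab", WORD_PATTERN)              ' "al-" is skipped
    Debug.Print FirstMatch("wal-kitab", WORD_PATTERN)             ' "wal-" is not in the list
End Sub
```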
I wouldn't use a regex for this: you can split the cell value and test it against a defined array of prefixes:
Note: the array values are ordered so that prefixes which are substrings of other prefixes come later in the list
Public Function RemovePrefix(RngSrc As Range) As String
If RngSrc.Count > 1 Then Exit Function
On Error GoTo ExitFunction
Dim Prefixs() As String: Prefixs = Split("wal,wa',wa,bil,bi,fa", ",")
Dim Arr() As String, i As Long, Temp As String
Arr = Split(RngSrc, "-")
If UBound(Arr) > 0 Then
RemovePrefix = Arr(UBound(Arr))
Exit Function
End If
Arr = Split(RngSrc, " ")
For i = 0 To UBound(Prefixs)
Temp = Arr(UBound(Arr))
If InStr(Temp, Prefixs(i)) = 1 Then
RemovePrefix = Right(Temp, Len(Temp) - Len(Prefixs(i)))
Exit Function
End If
Next i
RemovePrefix = Temp
ExitFunction:
If Err Then RemovePrefix = "Error"
End Function