VBA Excel - Replacing exact word in a String Phrase


I have an array of words that I want to replace with other words, but I have run into a problem.
Example:
Var1 -> wksArray
Var2 -> wksArrayBigger
String: Dim wksArray, wksArrayBigger as Variant
When I try to replace wksArray with "test1", wksArrayBigger is changed as well, which I do not want:
Var1 -> test1
Var2 -> test1Bigger
String: Dim test1, test1Bigger as Variant
How can I solve that?
Function FindAndReplace(VBProjToClean, varArray)
    ' Requires a reference to "Microsoft Visual Basic for Applications Extensibility 5.3"
    Dim i As Long, b As Long
    Dim str As String, replace_str As String
    Dim VBC As VBComponent
    For Each VBC In VBProjToClean.VBComponents
        i = 1
        With VBC.CodeModule
            Do Until i > .CountOfLines
                ' Skip the lines that belong to the VBE_Remove_Comments procedure
                If Not .ProcOfLine(i, vbext_pk_Proc) = "VBE_Remove_Comments" Then
                    str = .Lines(i, 1)
                    For b = 1 To UBound(varArray)
                        If InStr(1, str, varArray(b), vbTextCompare) > 0 Then
                            ' Replace also hits longer names that merely contain varArray(b)
                            replace_str = Replace(str, varArray(b), varArray(b) & "banana")
                            .ReplaceLine i, replace_str
                            str = .Lines(i, 1)
                        End If
                    Next b
                End If
                i = i + 1
            Loop
        End With
    Next
End Function

This is not a proper fix for your problem, but you could sort your search list so that longer names are replaced first, and add intermediary replacements for any conflicts you can't resolve otherwise:
arrReplaceBigger -> arIM01
arrReplace -> aR
arIM01 -> aRB
If you're only replacing variable names, you could write a wrapper around Replace that performs one replacement per character that may follow the name:
white-space, comma, period, left bracket, right bracket, new line, plus, etc.
Or, instead of doing many extra replacements, you could do the same thing using Regular Expressions, as David suggests.
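The regular-expression route could look roughly like the sketch below. It is not taken from any of the answers: the function name ReplaceWholeWord is made up for illustration, it uses late binding to VBScript.RegExp (Windows only), and it assumes findWord holds only letters, digits and underscores (no regex metacharacters) and newWord holds no "$".

```vba
' Sketch: replace a name only where it appears as a whole word.
' \b is a word boundary, so "wksArray" inside "wksArrayBigger" is left alone.
Public Function ReplaceWholeWord(ByVal source As String, _
                                 ByVal findWord As String, _
                                 ByVal newWord As String) As String
    With CreateObject("VBScript.RegExp")
        .Global = True              ' replace every occurrence, not just the first
        .IgnoreCase = True          ' VBA identifiers are case-insensitive
        .Pattern = "\b" & findWord & "\b"
        ReplaceWholeWord = .Replace(source, newWord)
    End With
End Function

Sub TestReplaceWholeWord()
    ' Prints: Dim test1, wksArrayBigger as Variant
    Debug.Print ReplaceWholeWord("Dim wksArray, wksArrayBigger as Variant", "wksArray", "test1")
End Sub
```

In the FindAndReplace loop from the question, the call to Replace could be swapped for a call like this one.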

Write a function that tells you whether a character is valid in a VBA variable name.
Then, for each pair of strings,
look for the occurrences of the string to be replaced, verify that the preceding and succeeding positions do not contain valid characters, and if so, replace the string.
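One way to sketch these steps without regular expressions is shown here. The names IsIdentChar and ReplaceExactWord are hypothetical, and treating only ASCII letters, digits and the underscore as valid identifier characters is a simplifying assumption.

```vba
' True if ch is a single character that may appear in a VBA identifier.
' Assumption: ASCII letters, digits and underscore only.
Private Function IsIdentChar(ByVal ch As String) As Boolean
    IsIdentChar = (ch Like "[A-Za-z0-9_]")   ' an empty string yields False
End Function

' Replace findWord with newWord only where it is not part of a longer name.
Public Function ReplaceExactWord(ByVal source As String, _
                                 ByVal findWord As String, _
                                 ByVal newWord As String) As String
    Dim pos As Long, startAt As Long
    Dim prevCh As String, nextCh As String

    If Len(findWord) = 0 Then
        ReplaceExactWord = source
        Exit Function
    End If

    startAt = 1
    pos = InStr(startAt, source, findWord, vbTextCompare)
    Do While pos > 0
        ' Characters just before and just after the candidate ("" at the string ends)
        prevCh = ""
        nextCh = ""
        If pos > 1 Then prevCh = Mid$(source, pos - 1, 1)
        If pos + Len(findWord) <= Len(source) Then nextCh = Mid$(source, pos + Len(findWord), 1)

        If Not IsIdentChar(prevCh) And Not IsIdentChar(nextCh) Then
            source = Left$(source, pos - 1) & newWord & Mid$(source, pos + Len(findWord))
            startAt = pos + Len(newWord)     ' continue after the inserted text
        Else
            startAt = pos + 1                ' part of a longer name: skip it
        End If

        If startAt > Len(source) Then Exit Do
        pos = InStr(startAt, source, findWord, vbTextCompare)
    Loop
    ReplaceExactWord = source
End Function
```

For the line from the question, ReplaceExactWord("Dim wksArray, wksArrayBigger as Variant", "wksArray", "test1") should leave wksArrayBigger untouched.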

Related

Get every word ending with dot using Regex/VBA

I am using excel 2019 and I am trying to extract from a bunch of messed up text cells any (up to 5) word ending with dot that comes after a ].
This is a sample of the text I am trying to parse/clean
`
some text [asred.] ost. |Monday - Ribben (ult.) lot. ac, sino. other maybe long text; collan.
`
I expect to get this:
ost. ult. lot. sino. collan.
I am using this Function found somewhere on the internet which appears to do the job:
`
Public Function RegExtract(Txt As String, Pattern As String) As String
With CreateObject("vbscript.regexp")
'.Global = True
.Pattern = Pattern
If .test(Txt) Then
RegExtract = .Execute(Txt)(0)
Else
RegExtract = "No match found"
End If
End With
End Function
`
and I call it from an empty cell:
=RegExtract(D2; "([\]])(\s\w+[.]){0,5}")
It's the first time I am using regexp, so I might have done terrible things in the eyes of an expert.
So this is my expression: ([\]])(\s\w+[.]){0,5}
Right now it returns only
] ost.
Which is much more than I was expecting to be able to do on my first approach to regex, but:
1) I am not able to get rid of the first ], which is needed to find the place where my useful bits start inside the text block, since \K does not work in Excel. I might "find and replace" it later as a smart barbarian, but I'd like to know the clean way to do it, if one exists :)
2) I don't understand how iterators work to get all my "up to 5 occurrences": I was expecting that {0,5} after the second group meant exactly: "repeat the previous group again until the end of the text block (or until you manage to do it 5 times)".
Thank you for your time :)
--Added after JdvD accepted answer for the records--
I am using this pattern to get all the words ending with dot, after the FIRST occurrence of the closing bracket.
^.*?\]|(\w+\.\s?)|.
This one (without the question mark) instead gets all the words ending with dot, after the LAST occurrence of the closing bracket.
^.*\]|(\w+\.\s?)|.
I was even missing something in my regExtract function: I needed to store the matches into an array through a for loop and then output this array as a string.
I was wrongly assuming that the regex engine was already storing matches as a unique string.
The correct RegExtract function to extract EVERY match is the following:
Public Function RegExtract(Txt As String, Pattern As String) As String
Dim rMatch As Object, arrayMatches(), i As Long
With CreateObject("vbscript.regexp")
.Global = True
.Pattern = Pattern
If .Test(Txt) Then
For Each rMatch In .Execute(Txt)
If Not IsEmpty(rMatch.SubMatches(0)) Then
ReDim Preserve arrayMatches(i)
arrayMatches(i) = rMatch.SubMatches(0)
i = i + 1
End If
Next
RegExtract = Join(arrayMatches, " ")
Else
RegExtract = "No match found"
End If
End With
End Function

RegexMatch:
In addition to the answer given by @RonRosenfeld, one could apply what some refer to as 'The Best Regex Trick Ever': first match what you don't want, then match what you do want in a capture group. For example:
^.*\]|(\w+\.)
See an online demo. In short, this means:
^.*\] - Match 0+ (greedy) characters from the start of the string up to the last occurrence of a closing square bracket;
| - Or;
(\w+\.) - Capture group holding 1+ (greedy) word characters followed by a dot.
Here is how it could work in an UDF:
Sub Test()
Dim s As String: s = "some text [asred.] ost. |Monday - Ribben (ult.) lot. ac, sino. other maybe long text; collan. "
Debug.Print RegExtract(s, "^.*\]|(\w+\.)")
End Sub
'------
'The above Sub would invoke the below function as an example.
'But you could also invoke this through: `=RegExtract(A1,"^.*\]|(\w+\.)")`
'on your sheet.
'------
Public Function RegExtract(Txt As String, Pattern As String) As String
Dim rMatch As Object, arrayMatches(), i As Long
With CreateObject("vbscript.regexp")
.Global = True
.Pattern = Pattern
If .Test(Txt) Then
For Each rMatch In .Execute(Txt)
If Not IsEmpty(rMatch.SubMatches(0)) Then
ReDim Preserve arrayMatches(i)
arrayMatches(i) = rMatch.SubMatches(0)
i = i + 1
End If
Next
RegExtract = Join(arrayMatches, " ")
Else
RegExtract = "No match found"
End If
End With
End Function
RegexReplace:
Depending on your desired output one could also use a replace function. You'd have to match any remaining character with another alternative for that. For example:
^.*\]|(\w+\.\s?)|.
See an online demo where in short this means that we added another alternative which is simply any single character. A 2nd small addition is that we added the option of an optional space character \s? in the 2nd alternative.
Sub Test()
Dim s As String: s = "some text [asred.] ost. |Monday - Ribben (ult.) lot. ac, sino. other maybe long text; collan. "
Debug.Print RegReplace(s, "^.*\]|(\w+\.\s?)|.", "$1")
End Sub
'------
'The UDF now takes 3 parameters: String, Pattern and Replacement.
'------
Public Function RegReplace(Txt As String, Pattern As String, Replacement As String) As String
    With CreateObject("vbscript.regexp")
        .Global = True
        .Pattern = Pattern
        RegReplace = Trim(.Replace(Txt, Replacement))
    End With
End Function
Note that I used Trim() to remove possible trailing spaces.
Both RegexMatch and RegexReplace would currently return a single string to clean the input but the former does give you the option to deal with the array in the arrayMatches() variable.

There is a method to return all the matches in a string starting after a certain pattern. But I can't recall it at this time.
In the meantime, it seems the simplest would be to remove everything prior to the first ], and then apply Regex to the remainder.
For example:
Option Explicit
Sub findit()
Const str As String = "some text [asred.] ost. |Monday - Ribben (ult.) lot. ac, sino. other maybe long text; collan."
Dim RE As RegExp, MC As MatchCollection, M As Match
Dim S As String
Dim sOutput As String
S = Mid(str, InStr(str, "]"))
Set RE = New RegExp
With RE
.Pattern = "\w+(?=\.)"
.Global = True
If .Test(S) = True Then
Set MC = .Execute(S)
For Each M In MC
sOutput = sOutput & vbLf & M
Next M
End If
End With
MsgBox Mid(sOutput, 2)
End Sub
You could certainly limit the number of matches to 5 by using a counter instead of the For each loop
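A counter-based variant might look like the following sketch. It uses late binding instead of the library reference used above, and the names FindFirstFive and MAX_MATCHES are illustrative only.

```vba
Sub FindFirstFive()
    Const MAX_MATCHES As Long = 5    ' limit taken from the question
    Const src As String = "some text [asred.] ost. |Monday - Ribben (ult.) lot. ac, sino. other maybe long text; collan."
    Dim mc As Object                 ' MatchCollection (late bound)
    Dim s As String, sOutput As String
    Dim i As Long, n As Long

    ' Assumes the text contains a "]"; keep only what follows the first one
    s = Mid$(src, InStr(src, "]"))

    With CreateObject("VBScript.RegExp")
        .Pattern = "\w+(?=\.)"
        .Global = True
        Set mc = .Execute(s)
    End With

    ' Walk the collection by index and stop after MAX_MATCHES items
    n = mc.Count
    If n > MAX_MATCHES Then n = MAX_MATCHES
    For i = 0 To n - 1               ' the MatchCollection is zero-based
        sOutput = sOutput & vbLf & mc.Item(i).Value
    Next i

    MsgBox Mid$(sOutput, 2)
End Sub
```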

You can use the following regex
([a-zA-Z]+)\.
Let me explain a little bit.
[a-zA-Z] - this matches any single letter from a to z or A to Z.
+ - this repeats the previous match, so all letters are matched until something that is not a letter from a to z or A to Z is found.
\. - this matches the literal . at the end of the word.
Here the example.

Separating a character string

I want to separate a character string using the special characters in that string as cutting lines. After each division the next group of strings should be copied in the next column. The picture below shows how it should work.
My first approach doesn't work and maybe it's too complicated. Is there a simple solution to this task?
Sub SeparateString()
Dim i As Long, j As Long, k As Long, counterA As Long, counterB As Long
Dim str1 As String, str2 As String
Const Sonderz As String = "^!§$%&/()=?`*'_:;°,.-#+´ß}][{³²"
For i = 1 To Worksheets("Tabelle1").Range("A1").End(xlDown).Row
counterA = 0
For j = 1 To Len(Worksheets("Tabelle1").Range("A" & i))
counterB = 0
For k = 1 To Len(Sonderz)
If Mid(Worksheets("Tabelle1").Range("A" & i), j, 1) = Mid(Sonderz, k, 1) Then
counterA = counterA + 1
End If
If Mid(Worksheets("Tabelle1").Range("A" & i), j, 1) <> Mid(Sonderz, k, 1) And counterA = 0 And counterB = 0 Then
Worksheets("Tabelle1").Range("B" & i) = Worksheets("Tabelle1").Range("B" & i) & Mid(Worksheets("Tabelle1").Range("A" & i), j, 1)
counterB = counterB + 1
End If
Next k
Next j
Next i
End Sub

If you are interested and you do have access to Microsoft 365's dynamic arrays:
Formula in B1:
=LET(X,MID(A1,SEQUENCE(LEN(A1)),1),TRANSPOSE(FILTERXML(CONCAT("<t><s>",IF(ISNUMBER(FIND(X,"^!§$%&/()=?`*'_:;°,.-#+´ß}][{³²")),"</s><s>",X),"</s></t>"),"//s")))
Or nest a SUBSTITUTE() if you need to return string variables:
=LET(X,MID(A1,SEQUENCE(LEN(A1)),1),TRANSPOSE(SUBSTITUTE(FILTERXML(CONCAT("<t><s>'",IF(ISNUMBER(FIND(X,"^!§$%&/()=?`*'_:;°,.-#+´ß}][{³²")),"</s><s>'",X),"</s></t>"),"//s"),"'","")))
If VBA is a must, you could think about a regular expression to replace all the characters from a certain class with a uniform delimiter to use Split() on:
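Sub Test()
    Dim s As String: s = "CD!02?WX12EF"
    Dim arr() As String
    Dim el As Variant   ' a For Each control variable over an array must be a Variant
    With CreateObject("vbscript.regexp")
        .Global = True
        .Pattern = "[!§$%&\/()=?`*'_:;°,.#+´ß}\][{³²^-]"
        ' Replace every special character with "!" and split on that single delimiter
        arr = Split(.Replace(s, "!"), "!")
    End With
    For Each el In arr
        Debug.Print el
    Next
End Sub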
The caret has been moved away from the first position to prevent a negated character class, and the hyphen has been moved to the end so that it is not read as a character range. Funnily enough, if you actually want to be less verbose you could put these characters in a more condensed class: [!#-/:;?[\]-`{}§=°´ß³²].
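To write the pieces into the neighbouring columns, as the question asks, the same idea could be extended along these lines. This is only a sketch: the procedure name SplitRowsIntoColumns is made up, the sheet name "Tabelle1" and column A come from the question's code, and a "+" quantifier has been added to the pattern so that a run of special characters counts as one delimiter. Cells holding error values are not handled.

```vba
Sub SplitRowsIntoColumns()
    Dim ws As Worksheet
    Dim parts() As String
    Dim r As Long, lastRow As Long, c As Long, i As Long
    Dim re As Object

    Set ws = Worksheets("Tabelle1")            ' sheet name taken from the question
    Set re = CreateObject("VBScript.RegExp")
    re.Global = True
    re.Pattern = "[!§$%&\/()=?`*'_:;°,.#+´ß}\][{³²^-]+"

    lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row
    For r = 1 To lastRow
        ' Turn each run of special characters into a single "!" and split on it
        parts = Split(re.Replace(CStr(ws.Cells(r, "A").Value), "!"), "!")
        c = 2                                  ' first output column is B
        For i = LBound(parts) To UBound(parts)
            If Len(parts(i)) > 0 Then          ' skip empty pieces at the edges
                ws.Cells(r, c).NumberFormat = "@"   ' keep "02" as text
                ws.Cells(r, c).Value = parts(i)
                c = c + 1
            End If
        Next i
    Next r
End Sub
```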

Assuming the first data is in cell A2,
I would go with the simple use of find() with left() mid() and right()
=left(A2,find("!",A2,1)-1)
then:
=mid(A2,find("!",A2,1)+1,find("?",A2,1)-find("!",A2,1)-1)
and:
=right(A2,len(A2)-find("?",A2,1))
Tested and working with one correction done.

You can also do this in Power Query which has a command to split by ANY delimiter.
In the code below, I generate a list of all possible special characters defined as characters NOT in the set of A..Za..z0..9 and you can easily add to that list by editing the code if you want to include other characters in the permitted list.
Edit: If you only have a few special characters, you can just hard-code the list, eg {"!","?"} instead of using List.Generate, but in your question you did not necessarily restrict the list of special characters, even though those are the only two showing in your examples
To use Power Query:
Select some cell in your Data Table
Data => Get&Transform => from Table/Range
When the PQ Editor opens: Home => Advanced Editor
Make note of the Table Name in Line 2
Paste the M Code below in place of what you see
Change the Table name in line 2 back to what was generated originally.
Read the comments and explore the Applied Steps to understand the algorithm
let
//change Table name in next line to reflect actual table name
Source = Excel.CurrentWorkbook(){[Name="Table15"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Source", type text}}),
//Generate list of "special characters" for splitting
//the List.Contains function contains the non-special characters
splitterList = List.RemoveNulls(
List.Generate(()=>0,
each _ <= 255,
each _ +1,
each if List.Contains({"A".."Z","a".."z","0".."9"}, Character.FromNumber(_)) then null else Character.FromNumber(_))),
splitIt = Table.SplitColumn(#"Changed Type", "Source",
Splitter.SplitTextByAnyDelimiter(splitterList))
in
splitIt

declaring a variable as a string that starts with a specific character, and then taking it through a for-loop

For part of my code, I need to take a string value in a cell (ex. G1,G2,H3), select each term separated by a comma that starts with G, and then take each term through a for-loop. This way, only the terms starting with G go through the for-loop
This is what I have so far, and it seems to work except for my "part" variable. It remains empty. Is there an issue with my syntax or with my logic?
Thanks in advance.
dim parts, p as string
part = "G" & "*" 'I tried both string and variant for part. I've also tried writing part="G*" instead
parts = o.Cells(cell.Row, "C")
p = Split(parts, ",")
For Each part In p
.
.
.
Next part
New updated working code:
Dim parts As String, part As String
Dim p As Variant
Dim k As Long
parts = o.Cells(cell.Row, "C")
p = Split(parts, ",")
For k = LBound(p) To UBound(p)
    If Left(Trim(p(k)), 1) = "G" Then
        part = Trim(p(k))   ' Trim also drops a leading space left by the split
        .
        .
        .
        .
    End If
Next k

Arrays should be iterated over using a regular For...Next loop with LBound and UBound. Then check whether the first character of each element is G. It's probably worth calling Trim as well, to remove a potential leading space:
Dim i as Long
For i = Lbound(p) to Ubound(p)
If Left(Trim(p(i)), 1) = "G" Then
...
End If
Next
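Put together as a complete procedure, this could look like the sketch below. The literal "G1, G2,H3" stands in for the cell value from the question, and the procedure name is hypothetical.

```vba
Sub ProcessTermsStartingWithG()
    Dim parts As String, part As String
    Dim p() As String
    Dim i As Long

    parts = "G1, G2,H3"              ' stand-in for o.Cells(cell.Row, "C").Value
    p = Split(parts, ",")

    For i = LBound(p) To UBound(p)
        part = Trim$(p(i))           ' drop spaces around each term
        If Left$(part, 1) = "G" Then
            Debug.Print part         ' prints G1, then G2; H3 is skipped
        End If
    Next i
End Sub
```

The wildcard idea from the question also works as a test inside the loop, for example If part Like "G*" Then, but a wildcard cannot be stored in the loop variable to filter the array.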

Extract Excel VBA filenames from string

I want to extract file names from a string. The length of the string and the length of the file name are always different.
Must be done with VBA!
String:
href ist gleich: "abc/db://test.pdf|0|100">Bsp.:
I would like that:
test.pdf
I do not know how to proceed.
It would also be nice if the script could extract multiple filenames from a string.
For example:
String:
href ist gleich: "abc//db://test.t.pdf|0|100" "db://test1.pdf|0|100">Bsp.
I would like that:
test.t.pdf test1.pdf

Sub testExtractFileName()
Debug.Print extractFileName("file://D:/ETVGI_556/Carconfigurator_file/carconf_d.pdf", "//")
Debug.Print extractFileName("abc//db://test.t.pdf|0|100")
Debug.Print extractFileName("db://test1.pdf|0|100")
End Sub
Function extractFileName(initString As String, Optional delim As String) As String
Dim necString As String
necString = left(initString, InStr(initString, ".pdf") + 3)
necString = Right(necString, Len(necString) - InStrRev(necString, _
IIf(delim <> "", delim, "/")) - IIf(delim <> "", Len(delim) - 1, 0))
extractFileName = necString
End Function
The only requirement is that the file name is always preceded by "//" in the initial string, and that the file extension is always .pdf. If a different extension is needed, the function can easily be adapted.
The function returns the full name (including the path) if the second, optional parameter is "//", or just the file name (without the path) if it is omitted.

One option could be using a pattern where you would match the preceding / and capture in a group 1+ word characters \w+ followed by .pdf
Your value is in capturing group 1.
/(\w+\.pdf)
See a regex demo
If you want to have a broader match than \w you could extend what you do want to match using a character class or use a negated character class [^ to match any char except the listed in the character class.
In this case the negated character class [^/|"\s] would match any char except / | " or a whitespace character \s
/([^/|"\s]+\.pdf)
See another regex demo
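Since the question asks for VBA, the second pattern could be applied roughly as in this sketch. The function name ExtractPdfNames is invented for illustration, late binding to VBScript.RegExp is assumed, and the double quote inside the pattern has to be doubled in a VBA string literal.

```vba
' Return every .pdf file name found in source, separated by spaces.
Public Function ExtractPdfNames(ByVal source As String) As String
    Dim m As Object
    Dim result As String

    With CreateObject("VBScript.RegExp")
        .Global = True                       ' find all file names, not only the first
        .IgnoreCase = True                   ' also accept .PDF
        .Pattern = "/([^/|""\s]+\.pdf)"
        For Each m In .Execute(source)
            result = result & " " & m.SubMatches(0)   ' group 1 = the file name
        Next m
    End With

    ExtractPdfNames = Mid$(result, 2)        ' drop the leading space
End Function

Sub TestExtractPdfNames()
    ' Prints: test.t.pdf test1.pdf
    Debug.Print ExtractPdfNames("href ist gleich: ""abc//db://test.t.pdf|0|100"" ""db://test1.pdf|0|100"">Bsp.")
End Sub
```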

Try this and edit it according to your needs. At least it was designed for two of your examples.
Dim sStringToFormat As String
Dim i As Integer
Dim vSplit As Variant
Dim colFileNames As Collection
Dim sFormattedString As String
Set colFileNames = New Collection
sStringToFormat = "href ist gleich: ""abc//db://test.t.pdf|0|100"" ""db://test1.pdf|0|100"">Bsp."
vSplit = Split(sStringToFormat, "/")
For i = LBound(vSplit) To UBound(vSplit)
If InStr(vSplit(i), ".") > 0 Then
sFormattedString = Split(vSplit(i), "|")(0)
sFormattedString = Split(sFormattedString, "<")(0)
sFormattedString = Split(sFormattedString, ">")(0)
colFileNames.Add sFormattedString
End If
Next i

VBA Regular Expression Mail

I want to split a string into 3 parts. For example, I have an email address like
testuser@gamil.com
and I want to split it into
testuser
gamil
.com
With Left, Right and Mid I can only extract a string if it has a fixed length.
Does anybody have an idea how to do this?

"With Left, Right and Mid I can only extract a string if it has a fixed length."
This is not actually true, because you can also use the Len function to get the length of the string.
Dim L as Integer
L = Len("testuser@gamil.com")
MsgBox L
You can also use the InStr function (and InStrRev, which searches from the end) to find the index of a particular character or substring.
Dim I as Integer
I = InStr("testuser@gamil.com", "@")
So, for your case, a custom function without regex will return an array of three items:
Function SplitEmail(email$)
    'Function returns an array like:
    ' {"username", "domain", "tld"}
    Dim Dot As Integer, At As Integer
    Dot = InStrRev(email, ".")
    At = InStr(email, "@")
    Dim ret(1 To 3) As String
    ret(1) = Left(email, At - 1)
    ret(2) = Mid(email, At + 1, Dot - At - 1)
    ret(3) = Mid(email, Dot + 1)   ' "com"; use Mid(email, Dot) to keep the dot, as in ".com"
    SplitEmail = ret
End Function
To get the username part, you could do:
Dim some_email$
some_email = "testuser@gamil.com"
MsgBox SplitEmail(some_email)(1)
