Identify pattern in words - excel

I have a question I believe is quite simple, but I don't know the proper way to do it.
Basically, I would like my program to identify words that contain a certain pattern and, when one does, to extract what comes before the pattern.
The pattern in this case is /F, specifically at the end of the word, and the program should extract what precedes it.
For example, if the program finds 21/F, it should treat it as a match and extract 21. But if the word were 21/Fudge, it should do nothing.
Do you know how to look for a match at a specific position in a word?

I would do:
If str Like "*/F" Then
    before = Left(str, Len(str) - Len("/F"))
Else
    'No match!
End If
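As a minimal sketch, the Like test above can be wrapped in a function and applied to each word of a sentence. The names BeforeSlashF and DemoBeforeSlashF are hypothetical, and the sketch assumes the default Option Compare Binary, so the match is case-sensitive.

```vba
' Hypothetical helper: returns the text before a trailing "/F",
' or an empty string when the word does not end in "/F".
Function BeforeSlashF(ByVal word As String) As String
    Const SUFFIX As String = "/F"
    If word Like "*" & SUFFIX Then
        BeforeSlashF = Left$(word, Len(word) - Len(SUFFIX))
    Else
        BeforeSlashF = vbNullString
    End If
End Function

' Splits a sample sentence on spaces and tests each word.
Sub DemoBeforeSlashF()
    Dim w As Variant
    For Each w In Split("21/F 21/Fudge 30/F", " ")
        Debug.Print w & " -> [" & BeforeSlashF(CStr(w)) & "]"
    Next w
End Sub
```

Note that a word consisting of just "/F" also returns an empty string, so a caller that needs to tell "matched with nothing before it" from "no match" would have to return a flag as well.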

I would use a regular expression, something like this:
\b\w*?(\d+)/F\b
This matches a word that ends in "/F" and captures the digits immediately before "/F" in group 1, ignoring any earlier characters in the word. To use it in VBA with the early-bound types below, you will need to add a reference to 'Microsoft VBScript Regular Expressions 5.5'. Here's the VBA behind this. Pattern is "\b\w*?(\d+)/F\b"
Public Sub Extract(Pattern As String, Text As String)
    Dim regEx As VBScript_RegExp_55.RegExp
    Dim matches As VBScript_RegExp_55.MatchCollection
    Set regEx = CreateObject("VBScript.RegExp") ' Create a regular expression.
    regEx.Pattern = Pattern
    regEx.Global = True ' Return every match, not just the first.
    Set matches = regEx.Execute(Text)
    Dim i As Long
    For i = 0 To (matches.Count - 1)
        Debug.Print matches.Item(i).SubMatches(0) ' Group 1: the digits before "/F".
    Next i
End Sub
Hope this helps.

Related

Replace non-printable characters with " (Inch sign) VBA Excel

I need to replace non-printable characters with " (inch sign).
I tried Excel's CLEAN function and other UDFs, but they only remove the characters instead of replacing them.
Note: the non-printable characters are highlighted in blue in the above photo, and their positions within the cells are random.
This is a sample string file: Link
The expected correct output should be 12"x14" LPG . OUTLET OCT-SEP# process
Thanks in advance for any useful comments and answers.
As per my comment, you can try:
=SUBSTITUTE(A1,CHAR(25)&CHAR(25),CHAR(34))
Or the VBA pseudo-code:
[A1].Value = Replace([A1].Value, Chr(25) & Chr(25), Chr(34))
Where [A1] is a placeholder for the range object you would want to use, with proper and absolute referencing.
With the newest Microsoft 365 functions, we could also use:
=TEXTJOIN(CHAR(34),,TEXTSPLIT(A1,CHAR(25)))
You can use Regular Expressions within a UDF to create a flexible method to replace "bad" characters when you don't know exactly what they are.
In the UDF below, I show two pattern options, but others are possible:
One replaces all characters with a character code >127.
The other replaces all characters with a character code >255.
Option Explicit
Function ReplaceBadChars(str As String, replWith As String) As String
    Dim RE As Object
    Set RE = CreateObject("Vbscript.Regexp")
    With RE
        .Pattern = "[\u0080-\uFFFF]" 'to replace all characters with code >127 or
        '.Pattern = "[\u0100-\uFFFF]" 'to replace all characters with code >255
        .Global = True
        ReplaceBadChars = .Replace(str, replWith)
    End With
End Function
On the worksheet you can use, for example:
=ReplaceBadChars(A1,"""")
Or you could use it in a macro if you wanted to process a column of data without adding an extra column.
Note: I am uncertain whether there might be an efficiency difference when using a smaller negated character class (e.g. [^\x00-\x7F]) instead of the character class I showed in the code. But if, as written, execution seems slow, I'd try this change.
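For the macro route mentioned above, a minimal sketch might look like the following. It assumes the data sits in column A of the active sheet starting at row 1, and the name ReplaceBadCharsInColumnA is hypothetical. It reuses the >127 pattern from the UDF and overwrites the cells in place, so try it on a copy first.

```vba
' Sketch: replace every character with code >127 in column A by a double quote.
' Assumes text values in column A of the active sheet; formulas are skipped.
Sub ReplaceBadCharsInColumnA()
    Dim re As Object
    Dim ws As Worksheet
    Dim lastRow As Long, r As Long

    Set re = CreateObject("VBScript.RegExp")
    re.Global = True
    re.Pattern = "[\u0080-\uFFFF]"

    Set ws = ActiveSheet
    lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row

    For r = 1 To lastRow
        With ws.Cells(r, "A")
            If Not .HasFormula And VarType(.Value) = vbString Then
                .Value = re.Replace(.Value, """") ' """" is a one-character string: "
            End If
        End With
    Next r
End Sub
```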
You can try this:
Cells.Replace What:="[The character to replace]", Replacement:=""""

What's the best way to keep regex matches in Excel?

I'm working off of the excellent information provided in "How to use Regular Expressions (Regex) in Microsoft Excel both in-cell and loops"; however, I'm running into a wall trying to keep the matched expression rather than the unmatched portion:
"2022-02-14T13:30:00.000Z" converts to "T13:30:00.000Z" instead of "2022-02-14" when the function is used in a spreadsheet. Listed below is the code, which was taken from "How to use Regular Expressions (Regex) in Microsoft Excel both in-cell and loops". I thought a negation of strPattern2 would work; however, I'm still having issues. Any help is greatly appreciated.
Function simpleCellRegex(Myrange As Range) As String
    Dim regEx As New RegExp
    Dim strPattern As String
    Dim strPattern2 As String
    Dim strInput As String
    Dim strReplace As String
    Dim strOutput As String
    strPattern = "^T{0-9][0-9][:]{0-9][0-9][:]{0-9][0-9][0-9][Z]"
    strPattern2 = "^(19|20)\d\d([- /.])(0[1-9]|1[012])\2(0[1-9]|[12][0-9]|3[01])"
    If strPattern2 <> "" Then
        strInput = Myrange.Value
        strReplace = ""
        With regEx
            .Global = True
            .MultiLine = True
            .IgnoreCase = False
            .Pattern = strPattern2
        End With
        If regEx.test(strInput) Then
            simpleCellRegex = regEx.Replace(strInput, strReplace)
        Else
            simpleCellRegex = "Not matched"
        End If
    End If
End Function
Replace is very powerful, but you need to do two things:
Specify all the characters you want to drop. If your regexp is <myregexp>, change it to ^.*?(<myregexp>).*$, assuming you only have one date occurrence in your string. The parentheses are called a 'capturing group', and you can refer to them later as part of your replacement pattern. The ^ at the beginning and the $ at the end ensure that you will only match one occurrence of your pattern even if Global=True. I noticed you were already using a capturing group as a back-reference; you need to add one to the back-reference number (\2 becomes \3) because we added a capturing group. Set up this way, the entire string participates in the match, and the capturing groups preserve what we want to keep.
Change your strReplace="" to strReplace="$1", indicating you want to replace whatever was matched with the contents of capturing group #1.
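Applied to the date pattern from the question, the two steps above could be sketched as follows. This is an illustration rather than the answerer's own RegexpReplace UDF: the function name KeepDatePart is hypothetical, and it takes a String instead of a Range so it can be tried from the Immediate window.

```vba
' Sketch: keep only the date part by matching the whole string
' and replacing it with capturing group 1.
Function KeepDatePart(ByVal strInput As String) As String
    Dim regEx As Object
    Set regEx = CreateObject("VBScript.RegExp")
    With regEx
        .Global = True
        .MultiLine = True
        .IgnoreCase = False
        ' The new outer parentheses are group 1, so the separator
        ' group moved from 2 to 3 and the back-reference \2 became \3.
        .Pattern = "^.*?((19|20)\d\d([- /.])(0[1-9]|1[012])\3(0[1-9]|[12][0-9]|3[01])).*$"
        If .Test(strInput) Then
            KeepDatePart = .Replace(strInput, "$1")
        Else
            KeepDatePart = "Not matched"
        End If
    End With
End Function
```

With the example from the question, KeepDatePart("2022-02-14T13:30:00.000Z") returns 2022-02-14.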
Here is a screenprint from Excel using my RegexpReplace User Defined Function to process your example with my suggestions:
I had to fix up your time portion regexp because you used curly brackets three times where you meant square, and you left out the seconds part completely. Notice by adjusting where you start and end your capturing group parentheses you can keep or drop the T & Z at either end of the time string.
Also, if your program is being passed system timestamps from a reliable source, then they are already well-formed and you don't need those long, long regular expressions to reject March 32. You can code both parts in one pattern:
([-0-9/.]{10,10})T([0-9:.]{12,12})Z
When you want the date part use $1, and when you want the time part use $2.
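A minimal sketch of the combined pattern in a UDF follows. The function name TimestampPart and its parameters are hypothetical, and the ^ and $ anchors are an added assumption so that the whole cell must be a single timestamp.

```vba
' Sketch: return the date or time part of a well-formed timestamp.
' Pass "$1" for the date part or "$2" for the time part.
Function TimestampPart(ByVal stamp As String, ByVal replacement As String) As String
    Dim re As Object
    Set re = CreateObject("VBScript.RegExp")
    re.Pattern = "^([-0-9/.]{10,10})T([0-9:.]{12,12})Z$"
    If re.Test(stamp) Then
        TimestampPart = re.Replace(stamp, replacement)
    Else
        TimestampPart = "Not matched"
    End If
End Function
```

On the worksheet, =TimestampPart(A1,"$1") would give 2022-02-14 and =TimestampPart(A1,"$2") would give 13:30:00.000 for the example timestamp.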

Excel find and replace function correct formula

I wish to use the find and replace function in excel to remove example sentences from cells similar to this:
text <br>〔「text」text,「text」text〕<br>(1)text「sentence―sentence/sentence」<br>(2)text「sentence―sentence」
Sentences are in between 「」brackets and will include a ― and / character somewhere inside the brackets.
I have tried 「*―*/*」 however this will delete everything from the right of the〔
Is there any way to target and delete these specific sentence brackets, with the find and replace tool?
Desired outcome:
text <br>〔「text」text,「text」text〕<br>(1)text<br>(2)text「sentence―sentence」
Quite a long formula but in Excel O365 you could use:
=SUBSTITUTE(CONCAT(FILTERXML("<t><s>"&SUBSTITUTE(CONCAT(IF(MID(A1,SEQUENCE(LEN(A1)),1)="「","</s><s>「",IF(MID(A1,SEQUENCE(LEN(A1)),1)="」","」</s><s>",MID(A1,SEQUENCE(LEN(A1)),1)))),"<br>","|$|")&"</s></t>","//s[not(contains(., '「') and contains(., '―') and contains(., '/') and contains(., '」'))][node()]")),"|$|","<br>")
As long as you have access to CONCAT you could also do this in Excel 2019 but you'll have to swap SEQUENCE(LEN(A1)) for ROW(A$1:INDEX(A:A,LEN(A1)))
This formula won't work in many cases, but if the string has matching rules as in your example, then try this:
=SUBSTITUTE(C5,"「" & INDEX(TRIM(MID(SUBSTITUTE(","&SUBSTITUTE(C5,"」","「"),"「",REPT(" ",99)),(ROW(A1:INDEX(A1:A100,LEN(C5)-LEN(SUBSTITUTE(C5,"」",""))))*2-1)*99,99)),MATCH("*―*/*",TRIM(MID(SUBSTITUTE(","&SUBSTITUTE(C5,"」","「"),"「",REPT(" ",99)),(ROW(A1:INDEX(A1:A100,LEN(C5)-LEN(SUBSTITUTE(C5,"」",""))))*2-1)*99,99)),0)) & "」","")
How it works:
Split the string between the characters "「" and "」" into an array.
Use MATCH("*―*/*",,0) to find the position of the matching string (note that it returns only one value; if you have multiple matching strings, you can replace MATCH("*―*/*",..) with SEARCH("*―*/*",..) and use it in an extra column to get the matching strings).
Use INDEX(array,MATCH("*―*/*",..)) to get the string that needs replacing (result).
Remove the result from the original string: =SUBSTITUTE(txt,result,"")
Or,
In B1, enter the formula:
=SUBSTITUTE(A1,"「"&TRIM(RIGHT(SUBSTITUTE(LEFT(A1,FIND("」",A1,FIND("/",A1))),"「",REPT(" ",99)),99)),"")
You did not tag [VBA], but if you are not averse, you could write a User Defined Function that would do what you want using Regular Expressions.
To enter this User Defined Function (UDF), alt-F11 opens the Visual Basic Editor.
Ensure your project is highlighted in the Project Explorer window.
Then, from the top menu, select Insert/Module and
paste the code below into the window that opens.
To use this User Defined Function (UDF), enter a formula like =replStr(A1) in some cell.
Option Explicit
Function replStr(str As String) As String
    Dim RE As Object
    Const sPat As String = "\u300C(?:(?=[^\u300D]*\u002F)(?=[^\u300D]*\u2015)[^\u300D]*)\u300D"
    Set RE = CreateObject("vbscript.regexp")
    With RE
        .Global = True
        .Pattern = sPat
        replStr = .Replace(str, "")
    End With
End Function
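To show how the pattern above is put together, here is a sketch that builds an equivalent pattern from named pieces. The function name RemoveMarkedQuotes is hypothetical. Each lookahead scans forward from the opening bracket without crossing a closing bracket, so both "/" and "―" must appear inside the same pair of brackets.

```vba
' Sketch: remove bracketed runs that contain both "/" and U+2015.
Function RemoveMarkedQuotes(ByVal txt As String) As String
    Const OPEN_B As String = "\u300C"    ' left corner bracket
    Const CLOSE_B As String = "\u300D"   ' right corner bracket
    Dim inside As String
    Dim re As Object

    ' Any run of characters that stays inside the brackets.
    inside = "[^" & CLOSE_B & "]*"

    Set re = CreateObject("VBScript.RegExp")
    re.Global = True
    ' Opening bracket, then two lookaheads (a "/" and a U+2015 before
    ' the closing bracket), then the contents and the closing bracket.
    re.Pattern = OPEN_B & "(?=" & inside & "/)(?=" & inside & "\u2015)" _
               & inside & CLOSE_B
    RemoveMarkedQuotes = re.Replace(txt, "")
End Function
```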

Trying to parse excel string

I am trying to parse a string from TeamSpeak. I am new to the functions of Excel. I have accomplished this with PHP, but I am driving myself nuts with Excel. This is the string I am trying to parse:
[URL=client://4792/noEto+VRGdhvT9/iV375Ck1ZIfo=~Rizz]Rizz[/URL]
This is what I have accomplished so far:
=TRIM(MID(B22, 15, FIND("=",B22,12) - FIND("//",B22)))
which returns
4792/noEto+VRGdhvT9/iV375Ck1ZIfo=~
I am trying to get it to return:
noEto+VRGdhvT9/iV375Ck1ZIfo=
Any suggestions? I have looked into splitting strings, and the phrasing is just really confusing. Any help would be appreciated.
Paste the URL in A3, then this formula in B3. You can adjust the cell references as needed. It's a lot of nested functions, but it works.
=left(right(A3, len(A3)-find("/",A3,find("//",A3,1)+2)),find("=",right(A3, len(A3)-find("/",A3,find("//",A3,1)+2)),1))
Or you can use a user-defined function in VBA:
Function RegexExtract(myRange As Range) As String
    'VBA Editor, menu Tools - References, add reference to Microsoft VBScript Regular Expressions 5.5
    Dim regex As New RegExp, allMatches As MatchCollection
    With regex
        .Global = True
        .pattern = "\d+/(.+=)"
    End With
    Set allMatches = regex.Execute(myRange.Value)
    With allMatches
        If .Count = 1 Then
            RegexExtract = .Item(0).SubMatches(0)
        Else
            RegexExtract = "N/A"
        End If
    End With
End Function
Then use it as formula:
=RegexExtract(A1)
I am trying to parse a string
For that:
=MID(A1,20,28)
works.
Now, if you have more than one string, the others may not follow an identical pattern, so the above might not work for them. In that case, to help you we'd need to know something about the shape of the others, wouldn't we?

Check if cell is only a-z excel

I would like to be sure that all my cells contain only letters (A-Z/a-z). I want to be sure there aren't any symbols, numbers, or anything else. Any tips?
For example, I have this: "Š".
As a VBA function, the following should work:
Option Compare Binary
Function LettersOnly(S As String) As Boolean
    LettersOnly = Not S Like "*[!A-Za-z]*" And S <> ""
End Function
In using the function, S can be either an actual string, or a reference to the cell of concern.
EDIT: Also, you want to be certain you have not set Option Compare Text in your code. The default is Option Compare Binary which is what you want for this type of comparison. I have added that to the code for completeness.
Open the VBA editor (Alt+F11) and create a new module.
Add a reference to "Microsoft VBScript Regular Expressions 5.5" (Tools -> References).
In your new module, create a new function like this:
Function IsAToZOnly(inputStr As String) As Boolean
    Dim pattern As String: pattern = "^[A-Za-z]*$"
    Dim regEx As New RegExp
    regEx.pattern = pattern
    IsAToZOnly = regEx.Test(inputStr)
End Function
Use the new function in your worksheet:
=IsAToZOnly(A1)
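Note that the pattern ^[A-Za-z]*$ also accepts an empty string, so an empty cell would return TRUE. If that is not wanted, a minimal late-bound sketch (no reference needed; the name IsLettersOnly is hypothetical) could require at least one letter:

```vba
' Sketch: True only when the input is one or more ASCII letters.
Function IsLettersOnly(ByVal inputStr As String) As Boolean
    Dim re As Object
    Set re = CreateObject("VBScript.RegExp")
    re.IgnoreCase = False
    re.Pattern = "^[A-Za-z]+$"   ' + instead of * rejects the empty string
    IsLettersOnly = re.Test(inputStr)
End Function
```

Use it the same way on the worksheet: =IsLettersOnly(A1).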
