This question already has answers here:
Extracting digits from a cell with varying char length
(4 answers)
Closed 2 years ago.
I need to be able to remove all alphabetical characters from a string, leaving just the numbers behind.
I don't need to worry about any other characters like ,.?# and so on, just the letters of the alphabet a-z, regardless of case.
The closest I could get to a solution was the exact opposite: the VBA below removes the numbers from a string.
Function removenumbers(ByVal input1 As String) As String
    Dim x As Long
    Dim tmp As String
    tmp = input1
    ' Replace each digit 0-9 with an empty string
    For x = 0 To 9
        tmp = Replace(tmp, CStr(x), "")
    Next x
    removenumbers = tmp
End Function
Is there any modification I can make to the above so that it removes the letters rather than the numbers, or am I going about this completely wrong?
The letters could fall anywhere in the string, and there is no pattern to the strings.
Failing this, I will use Ctrl+H to remove the letters one by one, but I may need to repeat this every week, so a UDF would be much quicker.
I'm using Office 365 (Excel 16).
Option Explicit

' Requires a reference to "Microsoft VBScript Regular Expressions 5.5"
' Public (not Private) so the function can be called from a worksheet cell
Public Function rgclean(ByVal mystring As String) As String
    'function that finds and removes every part of the string matching the regex pattern
    'returns str
    Dim regex As New RegExp
    With regex
        .Global = True ' replace all matches found in the string, not just the first
        .Pattern = "[A-Z]" ' picks all letters from A-Z; use "[A-Za-z]" for lower case as well
    End With
    rgclean = regex.Replace(mystring, "") ' .. and replaces each match with ""
End Function
Try using a regular expression.
Make sure you enable regular expressions via Tools > References by ticking the checkbox "Microsoft VBScript Regular Expressions 5.5".
The function removes any character in the range A-Z. If you want to include lower case as well, change the pattern to [A-Za-z] (.Pattern = "[A-Za-z]").
You just pass the string into the function, and it uses the regular expression to remove every letter from the string.
Thanks
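If you would rather not add the regex reference, the loop from the question can also be turned around directly. The sketch below is a hypothetical variant (the name RemoveLetters is not from the answers above): instead of replacing the digits 0-9, it replaces each letter A-Z and passes vbTextCompare to Replace so that lower-case letters are removed too. It assumes only plain ASCII letters need removing.

```vba
' Hypothetical variant of the question's removenumbers loop: instead of the
' digits 0-9, it replaces each letter A-Z, ignoring case via vbTextCompare.
Public Function RemoveLetters(ByVal input1 As String) As String
    Dim charCode As Long
    Dim tmp As String
    tmp = input1
    For charCode = Asc("A") To Asc("Z")
        ' Replace(expression, find, replace, start, count, compare):
        ' vbTextCompare makes "A" also remove "a"
        tmp = Replace(tmp, Chr$(charCode), "", , , vbTextCompare)
    Next charCode
    RemoveLetters = tmp
End Function
```

On a sheet it would be called like any other UDF, e.g. =RemoveLetters(A1). Punctuation such as ,.?# is left in place, which matches what the question asks for.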
Related
I have a column of hexadecimal strings with many TRAILING zeros.
The problem I have is that the trailing zeros need to be removed from each string.
I have searched for a VBA function such as Trim, but my attempts have not worked.
Is there a VBA function I can use to remove all of these trailing zeros from each of the strings?
An example of the hex string is 4153523132633403277E7F0000000000000000000000000000. I would like it in the format 4153523132633403277E7F.
The big issue is that the hexadecimal strings can be of various lengths.
Formula:
You could try:
Formula in B1:
=LET(a,TEXTSPLIT(A1,,"0"),TEXTJOIN("0",0,TAKE(a,XMATCH("?*",a,2,-1))))
This uses TEXTSPLIT() to split the input on "0", then XMATCH() with the wildcard ?* (searching from last to first) to return the position of the last non-empty piece. TAKE() keeps everything up to that piece and TEXTJOIN() puts the zeros back between the pieces. However, since TEXTSPLIT() also accepts an array of delimiters, a slightly less verbose option could be:
=TEXTBEFORE(A1,TAKE(TEXTSPLIT(A1,TEXTSPLIT(A1,"0",,1)),,-1),-1)
Or another option, though more verbose, is to use REDUCE() for what it's intended to do, which is to loop a given array:
=REDUCE(A1,SEQUENCE(LEN(A1)),LAMBDA(a,b,IF(RIGHT(a)="0",LEFT(a,LEN(a)-1),a)))
VBA:
If VBA is a must, one way of dealing with this is the RTrim() function: replace every "0" with a space, trim the trailing spaces, then turn the remaining spaces back into "0". Since your hex string should not contain spaces to begin with, I think the following is a safe bet:
Sub Test()
Dim s As String: s = "4153523132633403277E7F0000000000000000000000000000"
Dim s_new As String
s_new = Replace(RTrim(Replace(s, "0", " ")), " ", "0")
Debug.Print s_new
End Sub
If you happen to have spaces anywhere else in your string, another option would be to look for trailing zeros using a regular expression:
Sub Test()
Dim s As String: s = "4153523132633403277E7F0000000000000000000000000000"
Dim s_new As String
With CreateObject("vbscript.regexp")
.Pattern = "0+$"
s_new = .Replace(s, "")
End With
Debug.Print s_new
End Sub
Both the above options should print: 4153523132633403277E7F
As far as I know, there is no built-in function that does this for you. The way I would do it is shown below (s holds the string):
Do While Right$(s, 1) = "0"
    s = Left$(s, Len(s) - 1)   ' remove the last character
Loop
It is quite slow, but VBA itself is no race car either, so you will probably not notice, especially if you do not need to do it many times at once.
A more elegant solution would involve being able to search from the end of the string.
An improvement on the solution above is to parse the string backwards, count the "0" characters, and then remove them all at once.
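The backwards-scan improvement described above could be sketched as the following UDF. The name StripTrailingZeros is hypothetical and not part of the answer. Because VBA's And does not short-circuit, the loop exits explicitly instead of combining the bounds check and the Mid$ test in one condition (Mid$ with position 0 would raise an error).

```vba
' Hypothetical helper sketching the "scan backwards, then cut once" idea.
Public Function StripTrailingZeros(ByVal s As String) As String
    Dim i As Long
    i = Len(s)
    ' Walk left past every trailing "0"
    Do While i > 0
        If Mid$(s, i, 1) <> "0" Then Exit Do
        i = i - 1
    Loop
    ' Keep only the first i characters: a single cut instead of one per zero
    StripTrailingZeros = Left$(s, i)
End Function
```

A string made entirely of zeros comes back empty, and zeros in the middle of the string are left alone.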
I'm having trouble finding a way to remove floating integers from a cell without removing numbers attached to the end of my string. Could I get some help as to how to approach this issue?
For example, in the image attached, instead of:
john123 456 hamilton, I want:
john123 hamilton
This can be done using regular expressions. You will match on the data you want to remove, then replace this data with an empty string.
Since you didn't provide any code, all I can do for you is provide a function that you can implement in your own project. This function can be used in VBA or as a worksheet function, such as =ReplaceFloatingIntegers(A1).
You will need to add a reference to Microsoft VBScript Regular Expressions 5.5 by going to Tools, References in the VBE menu.
Function ReplaceFloatingIntegers(ByVal inputString As String) As String
With New RegExp
.Global = True
.MultiLine = True
.Pattern = "(\b\d+\b\s?)"
If .Test(inputString) Then
ReplaceFloatingIntegers = .Replace(inputString, "")
Else
ReplaceFloatingIntegers = inputString
End If
End With
End Function
Breaking down the pattern
( ... ) This is a capturing group. It is not strictly required here, because .Replace() replaces the entire match whether or not it is grouped.
\b This is a word boundary. We use this because we want to test from edge to edge of any 'words' (which, in our case, includes words that contain only digits).
\d+\b This will match any digit (\d), one to unlimited (+) times, up to the next word boundary \b
\s? This matches a single whitespace character; the ? makes it optional, so it is only consumed if it exists.
You can look at this personalized Regex101 page to see how this matches your data. Anything matched here is replaced with an empty string.
I have this string
Weiss,Emery/Ap #519-8997 Quam. Street/Hawaiian Gardens,IN - 79589|10/13/2010
How do I get only "Hawaiian Gardens"?
I already tried using this:
=mid(left(A1,find("/",A1)-1),find(",",A1)+1,len(A1))
but it gives me Emery instead.
If there are always two slashes before the string you want to extract, then, based on Tyler M's answer, you can use this:
=MID(E1,
FIND("~",SUBSTITUTE(E1,"/","~",2))+1,
FIND(",",RIGHT(E1,LEN(E1)-FIND("~",SUBSTITUTE(E1,"/","~",2))))-1
)
This substitutes the second occurrence of / with a character that would not normally occur in the address, making it findable.
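The same "second slash, then up to the comma" logic can also be written in VBA with Split, which avoids the substitution trick. The sketch below is a hypothetical UDF (the name BetweenSecondSlashAndComma is not from the answers). It assumes that, as in the example, the wanted text starts right after the second "/". Unlike the formula, it returns the whole segment when no comma follows, and it returns an empty string when there are fewer than two slashes.

```vba
' Hypothetical helper: returns the text after the 2nd "/" and before the next ",".
Public Function BetweenSecondSlashAndComma(ByVal s As String) As String
    Dim parts() As String
    Dim tail As String
    Dim commaPos As Long
    parts = Split(s, "/")
    ' Fewer than two slashes: leave the default empty result
    If UBound(parts) < 2 Then Exit Function
    ' parts(2) is everything between the 2nd and 3rd slash
    tail = parts(2)
    commaPos = InStr(tail, ",")
    If commaPos > 0 Then tail = Left$(tail, commaPos - 1)
    BetweenSecondSlashAndComma = tail
End Function
```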
Was your intention to also include Google Sheets (looking at your title)? If so, you can use the REGEXEXTRACT() function. For example, in B1:
=REGEXEXTRACT(A1,"\/([\w\s]*)\,")
In Excel you could build a UDF using this regex rule like so (as an example):
Function REGEXEXTRACT(S As String, PTRN As String) As String
    'We will get the last possible match in your string...
    '(the last captured group of the last match; "" if nothing matches)
    Dim regex As Object, matches As Object, m As Object, subMatch As Variant
    Set regex = CreateObject("VBScript.RegExp")
    With regex
        .Pattern = PTRN
        .Global = True
    End With
    Set matches = regex.Execute(S)
    For Each m In matches
        If m.SubMatches.Count > 0 Then
            For Each subMatch In m.SubMatches
                REGEXEXTRACT = subMatch
            Next subMatch
        End If
    Next m
End Function
Call the function in B1 like so:
=REGEXEXTRACT(A1,"\/([\w\s]*)\,")
I have an extract of all the files on a network drive, and some of the file names contain a part number in the format 0000-000000-00. I'm trying to figure out how to extract these part numbers from the 600,000+ path names in this file. I think a MID formula might work, but I am at a loss as to how to tell it to find anything in the part number format 0000-000000-00 and extract only those 14 characters from the path.
The input looks like this:
c:\users\stuff\folder_name\1234-000001-01_ baskets_1.pdf
c:\users\stuff\folder_name\1234-000001-02_ baskets_2.pdf
c:\users\stuff\folder_name\1234-000001-03_ baskets_3.pdf
c:\users\stuff\folder_name\1234-000030-01_ tree_30.pdf
c:\users\stuff\folder_name\random text_1234-000030-02_ tree_30.pdf
c:\users\stuff\folder_name\more random stuff_1234-000030-02_ tree_30.pdf
The output I'm hoping for:
1234-000001-01
1234-000001-02
1234-000001-03
1234-000030-01
Since you have a pattern we can exploit, use this:
=MID(A1,SEARCH("????-??????-??",A1),14)
This finds the start of the pattern and returns the 14 characters from that point.
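One caveat is that the ? wildcard in SEARCH matches any character, not only digits, so a path containing something like ab12-cdefgh-ij would also be picked up. A stricter option that needs no regex reference is to slide a 14-character window along the path and test each window with VBA's Like operator, where # matches a single digit. The function name FindPartNumber below is hypothetical. This is a minimal sketch that returns only the first match.

```vba
' Hypothetical helper: returns the first 14-character substring shaped like
' 0000-000000-00 (digits only), or an empty string if there is none.
Public Function FindPartNumber(ByVal fullPath As String) As String
    Const PART_MASK As String = "####-######-##"   ' # = any single digit in Like
    Dim i As Long
    Dim candidate As String
    ' If the path is shorter than 14 characters, the loop body never runs
    For i = 1 To Len(fullPath) - 13
        candidate = Mid$(fullPath, i, 14)
        If candidate Like PART_MASK Then
            FindPartNumber = candidate
            Exit Function
        End If
    Next i
    FindPartNumber = vbNullString
End Function
```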
You wanted a formula but a UDF could also be used to apply a regex to get the pattern (a little overkill in this instance but worth being aware of):
Option Explicit
Public Sub GetCustomString()
Dim i As Long, tests()
tests = Array("c:\users\stuff\folder_name\1234-000001-01_ baskets_1.pdf", _
"c:\users\stuff\folder_name\1234-000001-02_ baskets_2.pdf", _
"c:\users\stuff\folder_name\1234-000001-03_ baskets_3.pdf", _
"c:\users\stuff\folder_name\1234-000030-01_ tree_30.pdf", _
"c:\users\stuff\folder_name\random text_1234-000030-02_ tree_30.pdf", _
"c:\users\stuff\folder_name\more random stuff_1234-000030-02_ tree_30.pdf")
For i = LBound(tests) To UBound(tests)
Debug.Print GetString(tests(i))
Next
End Sub
Public Function GetString(ByVal inputString As String) As String
Dim arr() As String, i As Long, matches As Object, re As Object
Set re = CreateObject("VBScript.RegExp")
With re
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = "\d{4}-\d{6}-\d{2}"
If .test(inputString) Then
GetString = .Execute(inputString)(0)
Else
GetString = vbNullString
End If
End With
End Function
[Screenshot: using the UDF in a sheet]
Pattern: \d{4}-\d{6}-\d{2}
Explanation:
\d{4} matches a digit (equal to [0-9])
{4} Quantifier — Matches exactly 4 times
"-" matches the character - literally (case sensitive)
\d{6} matches a digit (equal to [0-9])
{6} Quantifier — Matches exactly 6 times
"-" matches the character - literally (case sensitive)
\d{2} matches a digit (equal to [0-9])
{2} Quantifier — Matches exactly 2 times
Global pattern flags:
g modifier: global. All matches (don't return after first match)
m modifier: multi line. Causes ^ and $ to match the begin/end of each line (not only begin/end of string)
I am currently building a numberplate checker in an Excel spreadsheet that will determine whether the letters and numbers of each numberplate are in the correct places and are valid.
The three criteria I have are whether the numberplates are in one of these formats:
(I have represented a number as 1 and a letter as A)
AAA111A
A111AAA
AA11AAA
The ultimate objective is for the program to ask the question "Look at these number plates, do they follow a format as shown above."
So far I have only been able to check whether there are numbers in certain places; however, I cannot work out how to test for the characters A-Z when searching from the left, right and centre.
=ISNUMBER(--MID(A3,1,3))
If I wanted to search within a cell for example, the first character, is it a letter a-z, return true or false? How would I go about doing this?
An example in this instance might be:
DJO148R
The formula
=ISNUMBER(--MID(A5,4,3))
This returns TRUE because the 4th character is a number and so are the next two.
With the same numberplate, how do I change it to search for letters rather than numbers within the numberplate?
Here is a simpler RegEx implementation. Make sure you add a reference to Microsoft VBScript Regular Expressions 5.5. This goes in a newly inserted module:
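Function PlateCheck(cell As Range) As Boolean
    Dim rex As New RegExp
    ' Letter, five letters/digits, letter; ^ and $ make the whole cell match.
    ' (Inside [...] a "|" is a literal pipe, so it is not used here.)
    ' Note: this is looser than the three formats, e.g. "A1B2C3D" also passes.
    rex.Pattern = "^[A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z][A-Z]$"
    PlateCheck = rex.Test(cell.Value)
End Function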
As per the comments, here's how you can do it with regex:
Make sure to include Microsoft VBScript Regular Expressions 5.5 as a reference.
To do that, in the VBA IDE, go to Tools > References and then look for the regex reference.
Then add this in a new module:
Function VerifyLicensePlate(ip As Range) As String
    Dim regex As New RegExp
    Dim inputstr As String: inputstr = ip.Value
    Dim i As Long
    With regex
        .Global = True
        .IgnoreCase = True
    End With
    ' ^ and $ anchor each pattern so the whole cell must match the format
    Dim strpattern(2) As String
    strpattern(0) = "^[A-Z][A-Z][A-Z][0-9][0-9][0-9][A-Z]$"   ' AAA111A
    strpattern(1) = "^[A-Z][A-Z][0-9][0-9][A-Z][A-Z][A-Z]$"   ' AA11AAA
    strpattern(2) = "^[A-Z][0-9][0-9][0-9][A-Z][A-Z][A-Z]$"   ' A111AAA
    For i = 0 To 2
        regex.Pattern = strpattern(i)
        If regex.Test(inputstr) Then
            VerifyLicensePlate = "Match"
            Exit Function
        End If
    Next i
    VerifyLicensePlate = "No match"
End Function
Output: [screenshot]
Occam's Razor would suggest,
=NOT(ISNUMBER(--MID(A5,4,3)))
... or,
=ISERROR(--MID(A5,4,3))
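Both formulas above tell you that those three characters are not all a number, which is not quite the same as "they are letters" (a value like 1A2 also returns TRUE). To answer the narrower question of whether one specific position holds a letter a-z, a small UDF using the Like operator could be used. This is a hypothetical helper (the name IsLetterAt is not from any answer here), shown as a minimal sketch.

```vba
' Hypothetical helper: TRUE if the character at position pos of s is a letter A-Z/a-z.
Public Function IsLetterAt(ByVal s As String, ByVal pos As Long) As Boolean
    ' Positions outside the string are treated as "not a letter"
    If pos < 1 Or pos > Len(s) Then Exit Function
    IsLetterAt = Mid$(s, pos, 1) Like "[A-Za-z]"
End Function
```

For the plate in A5 (DJO148R), =IsLetterAt(A5,1) would check the first character and =IsLetterAt(A5,7) the last one.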
Here's a version that uses late binding, so there is no need to set a reference. It is case-insensitive, as that seemed to be implied in your question, but that is easily changed.
Option Explicit
Function MatchPattern(S As String) As Boolean
Dim RE As Object
Set RE = CreateObject("vbscript.regexp")
With RE
.Global = True
.Pattern = "\b(?:[A-Z]{3}\d{3}[A-Z]|[A-Z]{2}\d{2}[A-Z]{3}|[A-Z]\d{3}[A-Z]{3})\b"
.ignorecase = True
MatchPattern = .test(S)
End With
End Function
But, as pointed out by G Serg, you don't really need regex for this:
Option Explicit
Option Compare Text 'Case Insensitive
Function MatchPattern(S As String) As Boolean
Const S1 As String = "[A-Z][A-Z][A-Z]###[A-Z]"
Const S2 As String = "[A-Z]###[A-Z][A-Z][A-Z]"
Const S3 As String = "[A-Z][A-Z]##[A-Z][A-Z][A-Z]"
MatchPattern = False
If Len(S) = 7 Then
If S Like S1 Or _
S Like S2 Or _
S Like S3 Then _
MatchPattern = True
End If
End Function
Here is a rather complicated formula that seems to match your specifications:
=AND(LEN(A1)=7,
OR(MMULT(--(CODE(MID(A1,{1,2,3,4,5,6,7},1))>64),--(TRANSPOSE(CODE(MID(A1,{1,2,3,4,5,6,7},1))<91)))={4,5}),
CODE(LEFT(A1,1))>64,CODE(LEFT(A1,1))<91,
CODE(RIGHT(A1,1))>64,CODE(RIGHT(A1,1))<91,
ISNUMBER(-MID(A1,MIN(FIND({1,2,3,4,5,6,7,8,9,0},A1&"0123456789")),
7-MMULT(--(CODE(MID(A1,{1,2,3,4,5,6,7},1))>64),--(TRANSPOSE(CODE(MID(A1,{1,2,3,4,5,6,7},1))<91))))))
Ensure we have only seven characters
The OR(MMULT... function counts the number of letters and returns TRUE if four or five.
Check to make sure first and last character is a letter
There should remain a consecutive string of either two or three digits (seven less the number of letters)
If you want to make the formula case insensitive, replace the instances of A1 with UPPER(A1)
I think the UDF solution is better.