I need to remove the digits from the end of a string if there are more than 8 of them.
I have used the functions below, but they remove all digits from the string.
How can these functions be modified to add the condition that the digit count is greater than 8?
Thanks in advance for any helpful comments and answers.
Option Explicit
Function StripNumber(stdText As String)
Dim str As String, i As Integer
stdText = Trim(stdText)
For i = 1 To Len(stdText)
If Not IsNumeric(Mid(stdText, i, 1)) Then
str = str & Mid(stdText, i, 1)
End If
Next i
StripNumber = str ' * 1
End Function
Function Remove_Number(Text As String) As String
With CreateObject("VBScript.RegExp")
.Global = True
.Pattern = "[0-9]"
Remove_Number = .Replace(Text, "")
End With
End Function
You can use
\d{8,}(?=\.\w+$)
\d{8,}(?=\.[^.]+$)
If there must be at least 9 digits, replace 8 with 9.
Details:
\d{8,} - eight or more digits
(?=\.\w+$) - the digits must be immediately followed by a . char, one or more word chars, and then the end of the string
(?=\.[^.]+$) - the digits must be immediately followed by a . char and then one or more chars other than a . char up to the end of the string.
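As a quick way to check the pattern outside Excel, here is a minimal Python sketch of the same replacement. The function name, the `min_digits` parameter and the file names are made up for illustration; the lookahead is the second pattern above, and for this pattern it should behave the same way in VBScript.RegExp.

```python
import re

def strip_trailing_digits(name: str, min_digits: int = 8) -> str:
    """Remove a run of at least `min_digits` digits that sits right before the extension."""
    # Same idea as \d{8,}(?=\.[^.]+$), with the threshold made configurable.
    pattern = r"\d{%d,}(?=\.[^.]+$)" % min_digits
    return re.sub(pattern, "", name)

# Hypothetical file names, not taken from the question.
print(strip_trailing_digits("invoice20230115.pdf"))     # 8 digits before ".pdf": removed
print(strip_trailing_digits("invoice2023.pdf"))         # only 4 digits: left alone
print(strip_trailing_digits("invoice20230115.pdf", 9))  # threshold 9: left alone
```

Passing 9 as the threshold corresponds to the "more than 8 digits" reading of the question.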
If you have access to the newest functions, you can avoid VBA altogether:
Formula in B2:
=LET(X,TEXTBEFORE(A2,".",-1),Y,TEXTAFTER(A2,X),Z,TEXTAFTER(CONCAT(".",IFERROR(--MID(X,SEQUENCE(LEN(X)),1),".")),".",-1),IF(LEN(Z)>8,SUBSTITUTE(A2,Z&Y,Y),A2))
Or, if there are no leading zeros in these numbers:
=LET(X,TEXTBEFORE(A2,".",-1),Y,TEXTAFTER(A2,X),Z,MAX(IFERROR(--MID(X,SEQUENCE(LEN(X)),LEN(X)),"")),IF(LEN(Z)>8,SUBSTITUTE(A2,Z&Y,Y),A2))
Or, as a spilled array:
Formula in B2:
=BYROW(A2:A6,LAMBDA(a,LET(X,TEXTBEFORE(a,".",-1),Y,TEXTAFTER(a,X),Z,TEXTAFTER(CONCAT(".",IFERROR(--MID(X,SEQUENCE(LEN(X)),1),".")),".",-1),IF(LEN(Z)>8,SUBSTITUTE(a,Z&Y,Y),a))))
Or:
=BYROW(A2:A6,LAMBDA(a,LET(X,TEXTBEFORE(a,".",-1),Y,TEXTAFTER(a,X),Z,MAX(IFERROR(--MID(X,SEQUENCE(LEN(X)),LEN(X)),"")),IF(LEN(Z)>8,SUBSTITUTE(a,Z&Y,Y),a))))
I am using Excel 2019 and I am trying to extract, from a bunch of messed-up text cells, up to 5 words ending with a dot that come after a ].
This is a sample of the text I am trying to parse/clean
some text [asred.] ost. |Monday - Ribben (ult.) lot. ac, sino. other maybe long text; collan.
I expect to get this:
ost. ult. lot. sino. collan.
I am using this Function found somewhere on the internet which appears to do the job:
Public Function RegExtract(Txt As String, Pattern As String) As String
With CreateObject("vbscript.regexp")
'.Global = True
.Pattern = Pattern
If .test(Txt) Then
RegExtract = .Execute(Txt)(0)
Else
RegExtract = "No match found"
End If
End With
End Function
and I call it from an empty cell:
=RegExtract(D2; "([\]])(\s\w+[.]){0,5}")
It's the first time I am using regexp, so I might have done terrible things in the eyes of an expert.
So this is my expression: ([\]])(\s\w+[.]){0,5}
Right now it returns only
] ost.
Which is much more than I was expecting to be able to do on my first approach to regex, but:
1) I am not able to get rid of the first ], which is needed to find the place where my useful bits start inside the text block, since \K does not work in Excel. I might "find and replace" it later as a smart barbarian, but I'd like to know the clean way to do it, if one exists :)
2) I don't understand how quantifiers work to get all my "up to 5 occurrences": I was expecting that {0,5} after the second group meant exactly "repeat the previous group again until the end of the text block (or until you manage to do it 5 times)".
Thank you for your time :)
--Added after JdvD accepted answer for the records--
I am using this pattern to get all the words ending with dot, after the FIRST occurrence of the closing bracket.
^.*?\]|(\w+\.\s?)|.
This one (without the question mark) instead gets all the words ending with dot, after the LAST occurrence of the closing bracket.
^.*\]|(\w+\.\s?)|.
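To see the difference between the two patterns, here is a small Python sketch that applies each one as a replacement that keeps only the first capture group. The helper name and the sample string with two closing brackets are made up for illustration; the patterns themselves are the two shown above.

```python
import re

def keep_dotted_words(text: str, pattern: str) -> str:
    # Replace every match with group 1, or with nothing when group 1 did not take part.
    return re.sub(pattern, lambda m: m.group(1) or "", text).strip()

FIRST = r"^.*?\]|(\w+\.\s?)|."  # lazy .*? stops at the first "]"
LAST = r"^.*\]|(\w+\.\s?)|."    # greedy .* runs to the last "]"

sample = "a [x.] one. [y] two."  # hypothetical input with two closing brackets

print(keep_dotted_words(sample, FIRST))  # one. two.
print(keep_dotted_words(sample, LAST))   # two.
```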
I was even missing something in my regExtract function: I needed to store the matches into an array through a for loop and then output this array as a string.
I was wrongly assuming that the regex engine was already storing matches as a unique string.
The correct RegExtract function to extract EVERY match is the following:
Public Function RegExtract(Txt As String, Pattern As String) As String
Dim rMatch As Object, arrayMatches(), i As Long
With CreateObject("vbscript.regexp")
.Global = True
.Pattern = Pattern
If .Test(Txt) Then
For Each rMatch In .Execute(Txt)
If Not IsEmpty(rMatch.SubMatches(0)) Then
ReDim Preserve arrayMatches(i)
arrayMatches(i) = rMatch.SubMatches(0)
i = i + 1
End If
Next
RegExtract = Join(arrayMatches, " ")
Else
RegExtract = "No match found"
End If
End With
End Function
RegexMatch:
In addition to the answer given by @RonRosenfeld, one could apply what some refer to as 'The Best Regex Trick Ever': first match what you don't want, and then match what you do want in a capture group. For example:
^.*\]|(\w+\.)
In short, this means:
^.*\] - Match 0+ (greedy) characters from the start of the string up to the last occurrence of a closing square bracket;
| - Or;
(\w+\.) - Capture group holding 1+ (Greedy) word-characters ending with a dot.
Here is how it could work in an UDF:
Sub Test()
Dim s As String: s = "some text [asred.] ost. |Monday - Ribben (ult.) lot. ac, sino. other maybe long text; collan. "
Debug.Print RegExtract(s, "^.*\]|(\w+\.)")
End Sub
'------
'The above Sub would invoke the below function as an example.
'But you could also invoke this through: `=RegExtract(A1,"^.*\]|(\w+\.)")`
'on your sheet.
'------
Public Function RegExtract(Txt As String, Pattern As String) As String
Dim rMatch As Object, arrayMatches(), i As Long
With CreateObject("vbscript.regexp")
.Global = True
.Pattern = Pattern
If .Test(Txt) Then
For Each rMatch In .Execute(Txt)
If Not IsEmpty(rMatch.SubMatches(0)) Then
ReDim Preserve arrayMatches(i)
arrayMatches(i) = rMatch.SubMatches(0)
i = i + 1
End If
Next
RegExtract = Join(arrayMatches, " ")
Else
RegExtract = "No match found"
End If
End With
End Function
RegexReplace:
Depending on your desired output one could also use a replace function. You'd have to match any remaining character with another alternative for that. For example:
^.*\]|(\w+\.\s?)|.
In short, we added another alternative, which is simply any single character. A second small addition is the optional whitespace character \s? in the 2nd alternative.
Sub Test()
Dim s As String: s = "some text [asred.] ost. |Monday - Ribben (ult.) lot. ac, sino. other maybe long text; collan. "
Debug.Print RegReplace(s, "^.*\]|(\w+\.\s?)|.", "$1")
End Sub
'------
'There are now 3 parameters to pass to the UDF: String, Pattern and Replacement.
'------
Public Function RegReplace(Txt As String, Pattern As String, Replacement) As String
Dim rMatch As Object, arrayMatches(), i As Long
With CreateObject("vbscript.regexp")
.Global = True
.Pattern = Pattern
RegReplace = Trim(.Replace(Txt, Replacement))
End With
End Function
Note that I used Trim() to remove possible trailing spaces.
Both RegexMatch and RegexReplace would currently return a single string to clean the input but the former does give you the option to deal with the array in the arrayMatches() variable.
There is a method to return all the matches in a string starting after a certain pattern. But I can't recall it at this time.
In the meantime, it seems the simplest would be to remove everything prior to the first ], and then apply Regex to the remainder.
For example:
Option Explicit
Sub findit()
Const str As String = "some text [asred.] ost. |Monday - Ribben (ult.) lot. ac, sino. other maybe long text; collan."
Dim RE As RegExp, MC As MatchCollection, M As Match
Dim S As String
Dim sOutput As String
S = Mid(str, InStr(str, "]"))
Set RE = New RegExp
With RE
.Pattern = "\w+(?=\.)"
.Global = True
If .Test(S) = True Then
Set MC = .Execute(S)
For Each M In MC
sOutput = sOutput & vbLf & M
Next M
End If
End With
MsgBox Mid(sOutput, 2)
End Sub
You could certainly limit the number of matches to 5 by using a counter instead of the For Each loop.
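The counter idea could look roughly like this, sketched in Python rather than VBA. The function name and the `limit` parameter are made up; returning an empty list when there is no "]" is an assumption, not something the VBA above handles.

```python
import re

def words_after_bracket(text: str, limit: int = 5) -> list:
    """Collect up to `limit` words that are followed by a dot, after the first ']'."""
    start = text.find("]")
    if start == -1:
        return []  # assumption: nothing to extract when there is no bracket
    found = []
    # Same pattern as in the VBA above: a word, then a lookahead for the dot.
    for match in re.finditer(r"\w+(?=\.)", text[start:]):
        if len(found) == limit:  # the counter check that replaces the plain loop
            break
        found.append(match.group(0))
    return found

sample = "some text [asred.] ost. |Monday - Ribben (ult.) lot. ac, sino. other maybe long text; collan."
print(words_after_bracket(sample))     # ['ost', 'ult', 'lot', 'sino', 'collan']
print(words_after_bracket(sample, 3))  # ['ost', 'ult', 'lot']
```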
You can use the following regex
([a-zA-Z]+)\.
Let me explain a little bit.
[a-zA-Z] - this matches any letter from a to z or A to Z, but on its own it only matches a single letter.
+ - this tells the engine to keep matching letters until it finds something that is not a letter from a to z or A to Z
\. - this simply looks for the . at the end of the match
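A quick Python check of this pattern against the sample text from the question (the function name is made up for illustration):

```python
import re

def dotted_words(text: str) -> list:
    # With a single capture group, findall returns the letters without the trailing dot.
    return re.findall(r"([a-zA-Z]+)\.", text)

sample = "some text [asred.] ost. |Monday - Ribben (ult.) lot. ac, sino. other maybe long text; collan."
print(dotted_words(sample))  # ['asred', 'ost', 'ult', 'lot', 'sino', 'collan']
```

Note that the pattern on its own also picks up "asred" from inside the brackets, so the part of the text before the ] still has to be dealt with separately.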
I have a column which contain cells that have some list of alphanumeric number system as follows:
4A(4,5,6,7,8,9); 4B(4,5,7,8); 3A(1,2,3); 3B(1,2,3), 3C(1,2)
On a cell next to it, I use a UDF function to get rid of special characters "(),;" in order to leave the data as
4A456789 4B4578 3A123 3B123 3C12
Function RemoveSpecial(Str As String) As String
Dim SpecialChars As String
Dim i As Long
SpecialChars = "(),;-abcdefghijklmnopqrstuvwxyz"
For i = 1 To Len(SpecialChars)
Str = Replace$(Str, Mid$(SpecialChars, i, 1), "")
Next
RemoveSpecial = Str
End Function
For the most part this works well. However, on certain occasions, the cell would contain an unorthodox pattern such as when a space is included between the 4A and the parenthesized items:
4A (4,5,6,7,8,9);
or when a text appears inside the parenthesis (including two spaces on each side):
4A (4,5, skip 8,9);
or a space appears between the first two characters:
4 A(4,5,6)
How would you fix this so that the random spaces are removed, except those that delimit the actual combinations of data?
One strategy would be to substitute the patterns you want to keep before eliminating the "special" characters, then restore the desired patterns.
From your sample data, it looks like you want to keep a space only if it follows ); or ),
Something like this:
Function RemoveSpecial(Data As Variant) As Variant
Dim SpecialChars As String
Dim KeepStr As Variant, PlaceHolder As Variant, ReplaceStr As Variant
Dim i As Long
Dim DataStr As String
SpecialChars = " (),;-abcdefghijklmnopqrstuvwxyz"
KeepStr = Array("); ", "), ")
PlaceHolder = Array("~0~", "~1~") ' choose a PlaceHolder that won't appear in the data
ReplaceStr = Array(" ", " ")
DataStr = Data
For i = LBound(KeepStr) To UBound(KeepStr)
DataStr = Replace$(DataStr, KeepStr(i), PlaceHolder(i))
Next
For i = 1 To Len(SpecialChars)
DataStr = Replace$(DataStr, Mid$(SpecialChars, i, 1), vbNullString)
Next
For i = LBound(KeepStr) To UBound(KeepStr)
DataStr = Replace$(DataStr, PlaceHolder(i), ReplaceStr(i))
Next
RemoveSpecial = Application.Trim(DataStr)
End Function
Another strategy would be regular expressions (RegEx)
It looks like a regular expression could come in handy here, for example:
Function RemoveSpecial(Str As String) As String
With CreateObject("vbscript.regexp")
.Global = True
.Pattern = "\)[;,]( )|[^A-Z\d]+"
RemoveSpecial = .Replace(Str, "$1")
End With
End Function
I have used the regular expression:
\)[;,]( )|[^A-Z\d]+
The way this works is to apply a form of what some would call "The best regex trick ever!"
\)[;,]( ) - Match a literal closing parenthesis, then either a comma or a semicolon, and then capture a space character in our 1st capture group.
| - Or use the following alternative:
[^A-Z\d]+ - 1+ characters other than those in the given character class (uppercase letters and digits).
EDIT:
In case you have values like 4A; or 4A, you can use:
(?:([A-Z])|\))[;,]( )|[^A-Z\d]+
And replace with $1$2.
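For anyone who wants to try the extended pattern outside Excel, here is a minimal Python sketch. The function name is made up, and the lambda stands in for the $1$2 replacement, so that a group which did not take part in the match contributes nothing.

```python
import re

# Extended pattern from the edit above: keep an uppercase letter (group 1) and
# the space (group 2) around a separator, and drop everything else that is not
# an uppercase letter or a digit.
PATTERN = re.compile(r"(?:([A-Z])|\))[;,]( )|[^A-Z\d]+")

def remove_special(text: str) -> str:
    # Emulates replacing with "$1$2".
    return PATTERN.sub(lambda m: (m.group(1) or "") + (m.group(2) or ""), text)

print(remove_special("4A(4,5,6,7,8,9); 4B(4,5,7,8); 3A(1,2,3); 3B(1,2,3), 3C(1,2)"))
# 4A456789 4B4578 3A123 3B123 3C12
print(remove_special("4A (4,5, skip 8,9);"))  # 4A4589
print(remove_special("4 A(4,5,6)"))           # 4A456
print(remove_special("4A; 4B(1,2)"))          # 4A 4B12
```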
I have the following Text sample:
Ins-Si_079_GM_SOC_US_VI SI_SOC_FY1920_US_FY19/20_A2554_Si Resp_2_May
I want to get the number 079, so what I need is the first instance of a run of exactly 3 digits. Sometimes the 3 digits are at the end, but they are usually found between the first 2 underscores. I only want the digits of length three (079) and not 19, 1920, or 2554, which have different lengths.
Sometimes it can look like this with no underscore:
1920 O-B CLI 353 Tar Traf
Or like this with the 3 digit number at the end:
Ins-Si_GM_SOC_US_VI SI_SOC_FY1920_US_FY19/20_A2554_Si Resp_2_079
There are also times when what I need is 2 digits, but when it's 2 digits it's always at the end, like this:
FY1920-Or-OLV-B-45
How would I get what I need in all cases?
You can split the listed items and check for 3 digits via Like:
Function Get3Digits(s As String) As String
Dim tmp, elem
tmp = Split(Replace(Replace(s, "-", " "), "_", " "), " ")
For Each elem In tmp
If elem Like "###" Then Get3Digits = elem: Exit Function
Next
If Get3Digits = vbNullString Then Get3Digits = IIf(Right(s, 2) Like "##", Right(s, 2), "")
End Function
Edited due to comment:
I would execute a 2-digit search when there are no 3-digit numbers before the end part and the last 2 characters are digits. If 3 digits are found at the end then get 3, but if not then get 2. There are times when the last part is a number but only a single digit. I would only want to get the last part if it has 2 or 3 digits. The - would not be relevant to the 2 digits. If nothing that is desired is found, then it would return " ".
If VBA is not a must you could try:
=TEXT(INDEX(FILTERXML("<t><s>"&SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A1,"_"," "),"-"," ")," ","</s><s>")&"</s></t>","//s[.*0=0][string-length()=3 or (position()=last() and string-length()=2)]"),1),"000")
It worked for your sample data.
Edit: Some explanation.
SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A1,"_"," "),"-"," ")," ","</s><s>") - The key part, which transforms all three potential delimiters (hyphen, underscore and space) into a valid XML node end and start construct.
The above concatenated using ampersand into a valid XML construct (adding a parent node <t>).
FILTERXML can be used to now 'split' the string into an array.
//s[.*0=0][string-length()=3 or (position()=last() and string-length()=2)] - The 2nd parameter of FILTERXML, which should be valid XPath syntax. It reads:
//s - Select all <s> nodes that meet the following conditions:
[.*0=0] - Check that the <s> node times zero returns zero (that is, check that the node is numeric).
[string-length()=3 or (position()=last() and string-length()=2)] - Check that the node is 3 characters long, OR that it is the last node and only 2 characters long.
INDEX(.....,1) - I mentioned in the comments that usually this is not needed, but since Excel O365 might spill the returned array, we may as well implement it to prevent spilling errors for those who use the newest Excel version. Now we just retrieve the very first element of whatever array FILTERXML returns.
TEXT(....,"000") - Excel will try delete leading zeros of a numeric value so we use TEXT() to turn it into a string value of three digits.
Now, if no element can be found, this will return an error; however, a simple IFERROR could fix this.
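The selection logic of the XPath filter can also be sketched in Python. This is only an illustration of the filter step, not of the whole formula: the function name is made up, "numeric" is simplified to "digits only", and the final TEXT(...,"000") formatting is not reproduced.

```python
import re

def first_code(text: str) -> str:
    """Return the first all-digit token that is 3 long, or 2 long and in last position."""
    # Split on the same three delimiters as the formula: hyphen, underscore, space.
    tokens = [t for t in re.split(r"[-_ ]", text) if t]
    last = len(tokens) - 1
    for i, tok in enumerate(tokens):
        if not (tok.isascii() and tok.isdigit()):
            continue  # simplified numeric check (assumption: plain digits only)
        if len(tok) == 3 or (i == last and len(tok) == 2):
            return tok
    return ""  # nothing suitable found

print(first_code("Ins-Si_079_GM_SOC_US_VI SI_SOC_FY1920_US_FY19/20_A2554_Si Resp_2_May"))  # 079
print(first_code("1920 O-B CLI 353 Tar Traf"))  # 353
print(first_code("FY1920-Or-OLV-B-45"))         # 45
```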
Try this function, please:
Function ExtractThreeDigitsNumber(x As String) As String
Dim El As Variant, arr As Variant, strFound As String
If InStr(x, "_") > 0 Then
arr = Split(x, "_")
ElseIf InStr(x, "-") > 0 Then
arr = Split(x, "-")
Else
arr = Split(x, " ")
End If
For Each El In arr
If IsNumeric(El) And Len(El) = 3 Then strFound = El: Exit For
Next
If strFound = "" Then
If IsNumeric(Right(x, 2)) Then ExtractThreeDigitsNumber = Right(x, 2)
Else
ExtractThreeDigitsNumber = strFound
End If
End Function
It can be called in this way:
Sub testExtractThreDig()
Dim x As String
x = "Ins-Si_079_GM_SOC_US_VI SI_SOC_FY1920_US_FY19/20_A2554_Si Resp_2_May"
Debug.Print ExtractThreeDigitsNumber(x)
End Sub
I am trying to extract a 5 or 6 digit code from a string.
C:\Users\pthaxthon\Desktop\45697_Originals
C:\Users\pthaxthon\Desktop\123456_Originals
I just need the 5 or 6 digit number
The number always occur after the 4th dash and before the first _
I have tried using the mid and split command but with no success
Range("D14").Value = Mid(fle, 28, 6)
Another solution would be to rely on "The number always occur after the 4th dash and before the first _". Then you could use split in the following way
Option Explicit
Function GetNo(s As String) As String
Const BSLASH = "\"
Const UNDERSCORE = "_"
Const FOUR = 4
Dim v As Variant
' Split the string by backslash
v = Split(s, BSLASH)
' Always take the entry at index 4 (the part after the 4th backslash) and split it by underscore
v = Split(v(FOUR), UNDERSCORE)
' take string before the underscore
GetNo = v(0)
End Function
Sub TestIt()
Dim inp As String
inp = "C:\Users\pthaxthon\Desktop\123456_Originals"
inp = "C:\Users\pthaxthon\Desktop\45697_Originals"
Debug.Print GetNo(inp)
End Sub
You could also use a regular expression to pull out the number.
Set SDI = CreateObject("VBScript.RegExp")
SDI.Pattern = "\d+" '* keep the number only
Set Num_out = SDI.Execute(Str_In)
Criteria_out = Val(Num_out(0))
Since the number always starts at the 28th character and ends before the subsequent "_" character, you can use:
Range("D14").Value = Split(Mid(fle, 28), "_")(0)
I tried to put seconds in 2 text-boxes, each digit in one. Example x= 56 x1= 5 and x2= 6
s = TimeOfDay.Second
TextBox15.Text = s.Substring(0, 1)
TextBox16.Text = s.Substring(1, 1)
When I try this I get the following error: System.ArgumentOutOfRangeException
Any ideas on how to fix this?
An ArgumentOutOfRangeException occurs whenever you attempt to get a character at a position that doesn't exist. So either there is no character at position 0 or there is no character at position 1. The latter happens, for example, when the seconds value is below 10 and the string is only one character long.
To prevent this, add a simple If/Then statement to check that the length of the original String is at least equal to the position of the character plus one. Also, for what it's worth, since you only want one character, simply get the character at the desired index of the String.
Here is a quick example:
If s.Length >= 1 Then
TextBox15.Text = s(0).ToString()
End If
If s.Length >= 2 Then
TextBox16.Text = s(1).ToString()
End If
You don't need to convert it to a string before getting the digits, just doing the maths to get them will work well enough:
Dim rightNow = DateTime.Now
TextBox15.Text = (rightNow.Second \ 10).ToString()
TextBox16.Text = (rightNow.Second Mod 10).ToString()
And another approach.
Dim c() As Char = DateTime.Now.Second.ToString("00").ToArray
TextBox1.Text = c(0)
TextBox2.Text = c(1)