How to extract the first instance of digits in a cell with a specified length in VBA?

I have the following Text sample:
Ins-Si_079_GM_SOC_US_VI SI_SOC_FY1920_US_FY19/20_A2554_Si Resp_2_May
I want to get the number 079, so what I need is the first instance of digits of length 3. There are certain times the 3 digits are at the end, but they are usually found within the first 2 underscores. I only want the digits with length three (079) and not 19, 1920, or 2554, which are different lengths.
Sometimes it can look like this with no underscore:
1920 O-B CLI 353 Tar Traf
Or like this with the 3 digit number at the end:
Ins-Si_GM_SOC_US_VI SI_SOC_FY1920_US_FY19/20_A2554_Si Resp_2_079
There are also times where what I need is 2 digits, but when it's 2 digits it's always at the end, like this:
FY1920-Or-OLV-B-45
How would I get what I need in all cases?

You can split the listed items and check for 3 digits via Like:
Function Get3Digits(s As String) As String
Dim tmp, elem
tmp = Split(Replace(Replace(s, "-", " "), "_", " "), " ")
For Each elem In tmp
If elem Like "###" Then Get3Digits = elem: Exit Function
Next
If Get3Digits = vbNullString Then Get3Digits = IIf(Right(s, 2) Like "##", Right(s, 2), "")
End Function
Edited due to comment:
I would run the 2-digit search only when there is no 3-digit number before the end part and the string ends in 2 digits. If 3 digits are found at the end, get those 3; if not, get the 2. There are times when the last part is a number but only a single digit; I would only want the last part if it has 2 or 3 digits. The - would not be relevant to the 2 digits. If nothing desired is found, the function would return " ".
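The rules described above (first run of exactly three digits, otherwise exactly two digits at the very end) can also be sketched with a regular expression. This is only an illustrative alternative, not the answer's method: the function name FirstThreeOrLastTwo is hypothetical, it works on digit runs rather than on split tokens, and it returns an empty string instead of " " when nothing qualifies.

```vba
Option Explicit

' Hypothetical helper (not from the original answer).
' Returns the first run of exactly three digits; failing that,
' a run of exactly two digits at the very end; otherwise "".
Function FirstThreeOrLastTwo(ByVal s As String) As String
    Dim re As Object
    Set re = CreateObject("VBScript.RegExp")   ' late bound, no reference needed
    re.Global = False

    ' Three digits that are neither preceded nor followed by another digit
    re.Pattern = "(?:^|\D)(\d{3})(?!\d)"
    If re.Test(s) Then
        FirstThreeOrLastTwo = re.Execute(s)(0).SubMatches(0)
        Exit Function
    End If

    ' Fallback: exactly two digits at the end of the string
    re.Pattern = "(?:^|\D)(\d{2})$"
    If re.Test(s) Then
        FirstThreeOrLastTwo = re.Execute(s)(0).SubMatches(0)
    End If
End Function
```

Because the pattern refuses a neighbouring digit on either side, longer runs such as 1920 or 2554 are skipped rather than truncated to three characters.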

If VBA is not a must you could try:
=TEXT(INDEX(FILTERXML("<t><s>"&SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A1,"_"," "),"-"," ")," ","</s><s>")&"</s></t>","//s[.*0=0][string-length()=3 or (position()=last() and string-length()=2)]"),1),"000")
It worked for your sample data.
Edit: Some explanation.
SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A1,"_"," "),"-"," ")," ","</s><s>") - The key part: it turns all three potential delimiters (hyphen, underscore and space) into valid XML node end and start tags.
The result is concatenated with ampersands into a valid XML construct (adding a parent node <t>).
FILTERXML can now be used to 'split' the string into an array.
//s[.*0=0][string-length()=3 or (position()=last() and string-length()=2)] - The 2nd parameter of FILTERXML, which should be valid XPath syntax. It reads:
//s - Select all <s> nodes that meet the following conditions:
[.*0=0] - Check that the <s> node times zero returns zero (i.e. that the node is numeric).
[string-length()=3 or (position()=last() and string-length()=2)] - Check that the node is 3 characters long, OR that it is the last of the nodes that passed the numeric check and only 2 characters long.
INDEX(.....,1) - I mentioned in the comments that this is usually not needed, but since Excel O365 might spill the returned array, we may as well include it to prevent spilling errors for those who use the newest Excel version. We now just retrieve the very first element of whatever array FILTERXML returns.
TEXT(....,"000") - Excel will try to delete the leading zeros of a numeric value, so we use TEXT() to turn it into a string value of three digits (a two-digit result such as 45 therefore comes back as 045).
If no element can be found, this will return an error; wrapping the formula in a simple IFERROR fixes that.

Try this function, please:
Function ExtractThreeDigitsNumber(x As String) As String
Dim El As Variant, arr As Variant, strFound As String
' Treat "_", "-" and " " all as delimiters, because the samples mix them
arr = Split(Replace(Replace(x, "_", " "), "-", " "), " ")
For Each El In arr
' Like "###" accepts exactly three digits; IsNumeric would also accept "1e2" or "-.5"
If El Like "###" Then strFound = El: Exit For
Next
If strFound = "" Then
' Fall back to two digits at the very end of the string
If Right(x, 2) Like "##" Then ExtractThreeDigitsNumber = Right(x, 2)
Else
ExtractThreeDigitsNumber = strFound
End If
End Function
It can be called in this way:
Sub testExtractThreDig()
Dim x As String
x = "Ins-Si_079_GM_SOC_US_VI SI_SOC_FY1920_US_FY19/20_A2554_Si Resp_2_May"
Debug.Print ExtractThreeDigitsNumber(x)
End Sub

Related

Remove numbers from end of string if count of numbers(characters) > 8

I need to remove the numbers from the end of a string if the count of digits is greater than 8.
I have used the functions below, but they remove all numbers from the string.
How can these functions be modified to add a condition on the count of digits (> 8)?
In advance, grateful for any helpful comments and answers.
Option Explicit
Function StripNumber(stdText As String)
Dim str As String, i As Integer
stdText = Trim(stdText)
For i = 1 To Len(stdText)
If Not IsNumeric(Mid(stdText, i, 1)) Then
str = str & Mid(stdText, i, 1)
End If
Next i
StripNumber = str ' * 1
End Function
Function Remove_Number(Text As String) As String
With CreateObject("VBScript.RegExp")
.Global = True
.Pattern = "[0-9]"
Remove_Number = .Replace(Text, "")
End With
End Function
You can use
\d{8,}(?=\.\w+$)
\d{8,}(?=\.[^.]+$)
If the count must be greater than 8, i.e. at least 9 digits, replace 8 with 9.
Details:
\d{8,} - eight or more digits
(?=\.\w+$) - that are immediately followed with a . and one or more word chars and then end of string must follow
(?=\.[^.]+$) - the eight or more digits must be immediately followed with a . char and then one or more chars other than a . char till the end of string.
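As a rough sketch of how the second pattern could be plugged into the question's Remove_Number approach: the function name StripLongTrailingNumber is hypothetical, the quantifier is set to 9 on the assumption that "count > 8" means at least nine digits, and, like the pattern itself, it assumes the value ends in a dot followed by an extension.

```vba
Option Explicit

' Hypothetical helper (not from the original answer).
' Removes a run of nine or more digits that sits directly before the
' final ".extension"; shorter runs are left untouched.
Function StripLongTrailingNumber(ByVal Text As String) As String
    With CreateObject("VBScript.RegExp")
        .Global = False
        ' Nine or more digits, then a dot and non-dot characters to the end
        .Pattern = "\d{9,}(?=\.[^.]+$)"
        StripLongTrailingNumber = .Replace(Text, "")
    End With
End Function
```

The lookahead only checks for the extension without consuming it, so the dot and the extension survive the replacement.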
If you have access to the newest functions you can avoid VBA altogether:
Formula in B2:
=LET(X,TEXTBEFORE(A2,".",-1),Y,TEXTAFTER(A2,X),Z,TEXTAFTER(CONCAT(".",IFERROR(--MID(X,SEQUENCE(LEN(X)),1),".")),".",-1),IF(LEN(Z)>8,SUBSTITUTE(A2,Z&Y,Y),A2))
Or, if there are no leading zeros in these numbers:
=LET(X,TEXTBEFORE(A2,".",-1),Y,TEXTAFTER(A2,X),Z,MAX(IFERROR(--MID(X,SEQUENCE(LEN(X)),LEN(X)),"")),IF(LEN(Z)>8,SUBSTITUTE(A2,Z&Y,Y),A2))
Or, as a spilled array:
Formula in B2:
=BYROW(A2:A6,LAMBDA(a,LET(X,TEXTBEFORE(a,".",-1),Y,TEXTAFTER(a,X),Z,TEXTAFTER(CONCAT(".",IFERROR(--MID(X,SEQUENCE(LEN(X)),1),".")),".",-1),IF(LEN(Z)>8,SUBSTITUTE(a,Z&Y,Y),a))))
Or:
=BYROW(A2:A6,LAMBDA(a,LET(X,TEXTBEFORE(a,".",-1),Y,TEXTAFTER(a,X),Z,MAX(IFERROR(--MID(X,SEQUENCE(LEN(X)),LEN(X)),"")),IF(LEN(Z)>8,SUBSTITUTE(a,Z&Y,Y),a))))

VBA: How to find the values after a "#" symbol in a string

I am trying to set the letters after a # symbol to a variable.
For example, x = #BAL
I want to set y = BAL
Or x = #NE
I want y = NE
I am using VBA.
Split() in my opinion is the easiest way to do it:
Dim myStr As String
myStr = "#BAL"
If InStr(myStr, "#") > 0 Then '<-- Check that the string contains "#" so Split(...)(1) cannot throw an error
MsgBox Split(myStr, "#")(1)
End If
As wisely pointed out by Scott Craner, you should make sure the string actually contains the delimiter, otherwise index (1) is out of range. His comment avoids that by taking the last element: y = Split(x, "#")(UBound(Split(x, "#"))). Another way is to check with InStr() first: If InStr(x, "#") > 0 Then...
The (1) takes the text after the first instance of the character you are looking for (up to the next instance, if there is one). If you had used (0), you would have got everything before the #.
Similar but different example:
Dim myStr As String
myStr = "#BAL#TEST"
MsgBox Split(myStr, "#")(2)
The message box would have returned TEST because you used (2), and this was the second instance of your # character.
Then you can even split them into an array:
Dim myStr As String, splitArr() As String
myStr = "#BAL#TEST"
splitArr = Split(myStr, "#") '< -- don't append the collection number this time
MsgBox SplitArr(1) '< -- This would return "BAL"
MsgBox SplitArr(2) '< -- This would return "TEST"
If you are looking for additional reading, here is more from the MSDN:
Split Function
Description: Returns a zero-based, one-dimensional array containing a specified number of substrings.
Syntax: Split( expression [ ,delimiter [ ,limit [ ,compare ]]] )
The Split function syntax has these named arguments:
expression
Required. String expression containing substrings and delimiters. If expression is a zero-length string(""), Split returns an empty array, that is, an array with no elements and no data.
delimiter
Optional. String character used to identify substring limits. If omitted, the space character (" ") is assumed to be the delimiter. If delimiter is a zero-length string, a single-element array containing the entire expression string is returned.
limit
Optional. Number of substrings to be returned; -1 indicates that all substrings are returned.
compare
Optional. Numeric value indicating the kind of comparison to use when evaluating substrings. See Settings section for values.
You can do the following to get the substring after the # symbol.
x = "#BAL"
y = Right(x,len(x)-InStr(x,"#"))
Where x can be any string, with characters before or after the # symbol.
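The InStr idea above can be wrapped in a small function that also copes with a missing delimiter. This is a minimal sketch; the name TextAfter and its delim parameter are hypothetical, and it uses Mid$ instead of Right, which gives the same result without the length arithmetic.

```vba
Option Explicit

' Hypothetical helper (not from the original answers).
' Returns everything after the first occurrence of delim,
' or "" when delim does not occur in s.
Function TextAfter(ByVal s As String, ByVal delim As String) As String
    Dim pos As Long
    pos = InStr(1, s, delim)
    ' Mid$ with no length argument returns the rest of the string
    If pos > 0 Then TextAfter = Mid$(s, pos + Len(delim))
End Function
```

Unlike Split(x, "#")(1), this keeps any later # characters in the result, so TextAfter("#BAL#TEST", "#") yields BAL#TEST rather than BAL.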

Preserving leading 0's in string - number - string conversion

I am working on a macro for a document-tracking sheet at work. I use a button that prompts the user to enter in the document number and I'd like to specify a default number based on the following numbering convention. The first two characters of the document number are the latter two year digits (15 in this case), then there is a "-" followed by a five digit serialization.
My current code looks at the last-entered document and increments those last 5 characters, but chops off any leading zeroes, which I want to keep. This is an extraction of the code to generate this default number (assuming the variable "prevNCRF" is the previous document name found in the doc):
Sub codeChunkTester()
Dim prevNCRF, defNCRFNum As String
Dim NCRFNumAr() As String
'pretend like we found this in the sheet.
prevNCRF = "15-00100"
'split the string into "15" and "00100" and throw those into an array.
NCRFNumAr() = Split(prevNCRF, "-")
'reconstruct the number by reusing the first part and dash, then converting
'the "00100" to a number with Val(), adding 1, then back to a string with CStr().
defNCRFNum = NCRFNumAr(0) & "-" & CStr(Val(NCRFNumAr(1)) + 1)
'message box shows "15-101" rather than "15-00101" as I had hoped.
MsgBox (defNCRFNum)
End Sub
So can anyone help me preserve those zeroes? I suppose I could include a loop that checks the length of the string and adds a leading zero until there are 5 characters, but perhaps there's a better way...
Converting "00100" to a Double using Val turned it into 100, so CStr(100) returns "100" as it should.
You need to format the string to what you want it to look like:
defNCRFNum = NCRFNumAr(0) & "-" & Format(Val(NCRFNumAr(1)) + 1, "00000")
If you need to parameterize the length of the string, you can use the String function to generate the format string:
Const digits As Integer = 5
Dim formatString As String
formatString = String(digits, "0")
defNCRFNum = NCRFNumAr(0) & "-" & Format(Val(NCRFNumAr(1)) + 1, formatString)
Here is that loop solution I mentioned above. If anyone's got something better, I'm all ears!
prevNCRF = "15-00100"
NCRFNumAr() = Split(prevNCRF, "-")
zeroAdder = CStr(Val(NCRFNumAr(1)) + 1)
'loop: everytime the zeroAdder string is not 5 characters long,
'put a zero in front of it.
Do Until Len(zeroAdder) = 5
zeroAdder = "0" & zeroAdder
Loop
defNCRFNum = NCRFNumAr(0) & "-" & zeroAdder
MsgBox (defNCRFNum)
defNCRFNum = NCRFNumAr(0) & "-" & Format(CStr(Val(NCRFNumAr(1)) + 1), String(Len(NCRFNumAr(1)), "0"))
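The one-liner above takes the padding width from the previous serial instead of hard-coding five zeros. Wrapped as a function it might look like the sketch below; the name NextDocNumber is hypothetical, and it assumes the input always has the form "yy-nnnnn" with exactly one dash.

```vba
Option Explicit

' Hypothetical helper (not from the original answers).
' Increments the serial after the dash and pads it back to the
' width of the previous serial, e.g. "15-00100" -> "15-00101".
Function NextDocNumber(ByVal prev As String) As String
    Dim parts() As String
    parts = Split(prev, "-")
    ' A format string of N zeros pads the number to at least N digits
    NextDocNumber = parts(0) & "-" & _
        Format$(Val(parts(1)) + 1, String$(Len(parts(1)), "0"))
End Function
```

Note that the format only guarantees a minimum width: once the serial passes 99999 the result simply grows to six digits.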

VBA Trim leaving leading white space

I'm trying to compare strings in a macro and the data isn't always entered consistently. The difference comes down to the amount of leading white space (ie " test" vs. "test" vs. " test")
For my macro the three strings in the example should be equivalent. However I can't use Replace, as any spaces in the middle of the string (ex. "test one two three") should be retained. I had thought that was what Trim was supposed to do (as well as removing all trailing spaces). But when I use Trim on the strings, I don't see a difference, and I'm definitely left with white space at the front of the string.
So A) What does Trim really do in VBA? B) Is there a built in function for what I'm trying to do, or will I just need to write a function?
Thanks!
So, as Gary's Student alluded to, the character wasn't 32. It was in fact 160. Now, being the simple man I am, white space is white space. In line with that view I created the following function, which removes ALL leading and trailing Unicode characters that don't actually display to the human eye (i.e. not special characters, not alphanumeric). That function is below:
Function TrueTrim(v As String) As String
Dim out As String
Dim bad As String
bad = "||127||129||141||143||144||160||173||" 'Characters that don't output something
'the human eye can see based on http://www.gtwiki.org/mwiki/?title=VB_Chr_Values
out = v
Dim c As Long
'Chop off the first character so long as it's white space.
'The Len check also stops the loop once the string is empty, which would otherwise make AscW fail.
Do While Len(out) > 0
c = AscW(Left(out, 1))
If c >= 33 And InStr(1, bad, "||" & c & "||") = 0 Then Exit Do
out = Mid(out, 2)
Loop
'Chop off the last character so long as it's white space
Do While Len(out) > 0
c = AscW(Right(out, 1))
If c >= 33 And InStr(1, bad, "||" & c & "||") = 0 Then Exit Do
out = Left(out, Len(out) - 1)
Loop
'Capture result for return
TrueTrim = out
End Function
TRIM() will remove all leading spaces
Sub demo()
Dim s As String
s = " test "
s2 = Trim(s)
msg = ""
For i = 1 To Len(s2)
msg = msg & i & vbTab & Mid(s2, i, 1) & vbCrLf
Next i
MsgBox msg
End Sub
It is possible your data has characters that are not visible, but are not spaces either.
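One way to check for such characters is to list the character codes of the string. The function below is only a diagnostic sketch and its name, CharCodes, is hypothetical; an ordinary space shows up as 32, a tab as 9 and a non-breaking space as 160.

```vba
Option Explicit

' Hypothetical diagnostic helper (not from the original answer).
' Returns the character codes of s separated by single spaces.
Function CharCodes(ByVal s As String) As String
    Dim i As Long, out As String
    For i = 1 To Len(s)
        ' Add a separator before every code except the first
        out = out & IIf(i > 1, " ", "") & AscW(Mid$(s, i, 1))
    Next i
    CharCodes = out
End Function
```

Calling Debug.Print CharCodes(yourCellValue) in the Immediate window makes it obvious why Trim appears to leave white space behind.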
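Without seeing your code it is hard to know, but you could also use the Application.WorksheetFunction.Clean() method, which removes non-printable characters, in conjunction with the Trim() method. Note that Clean only strips the low control characters (codes 0 to 31), so it will not remove a non-breaking space (code 160).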
MSDN Reference page for WorksheetFunction.Clean()
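If the stray characters turn out to be non-breaking spaces or tabs, a lighter option is to convert them to ordinary spaces before calling Trim. This is a sketch under that assumption; the name TrimNbsp is hypothetical, and note that it also converts any non-breaking spaces or tabs in the middle of the string to plain spaces.

```vba
Option Explicit

' Hypothetical helper (not from the original answers).
' Converts non-breaking spaces and tabs to ordinary spaces,
' then lets Trim remove the leading and trailing ones.
Function TrimNbsp(ByVal s As String) As String
    s = Replace(s, ChrW(160), " ")
    s = Replace(s, vbTab, " ")
    TrimNbsp = Trim$(s)
End Function
```

Interior spaces are kept, so "test one two three" still compares equal to itself after cleaning.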
Why don't you try using the Instr function instead? Something like this
Function Comp2Strings(str1 As String, str2 As String) As Boolean
If InStr(str1, str2) <> 0 Or InStr(str2, str1) <> 0 Then
Comp2Strings = True
Else
Comp2Strings = False
End If
End Function
Basically you are checking whether str1 contains str2 or str2 contains str1, so you don't have to trim the data. Keep in mind that this is a containment test rather than an equality test: "test" would also match "testing".
VBA's Trim function is limited to dealing with spaces. It will remove spaces at the start and end of your string.
In order to deal with things like newlines and tabs, I've always imported the Microsoft VBScript RegEx library and used it to replace whitespace characters.
In your VBA window, go to Tools, References, then find Microsoft VBScript Regular Expressions 5.5. Check it and hit OK.
Then you can create a fairly simple function to trim all white space, not just spaces.
Private Function TrimEx(ByVal stringToClean As String) As String ' ByVal so the caller's variable is not modified
Dim re As New RegExp
' Matches any whitespace at start of string
re.Pattern = "^\s*"
stringToClean = re.Replace(stringToClean, "")
' Matches any whitespace at end of string
re.Pattern = "\s*$"
stringToClean = re.Replace(stringToClean, "")
TrimEx = stringToClean
End Function
Non-printables divide different lines of a Web page. I replaced them with X, Y and Z respectively.
Debug.Print Trim(Mid("X test ", 2)) ' first place counts as 2 in VBA
Debug.Print Trim(Mid("XY test ", 3)) ' second place counts as 3 in VBA
Debug.Print Trim(Mid("X Y Z test ", 2)) ' more rounds needed :)
Large blocks of text are convenient because they can be chopped up neatly with built-in tools (InStr, Mid, Left and others). Text gathered from several child elements (i.e. taking .textContent rather than .innerText) may contain several non-printable characters to cope with, and DOM and regex are not for beginners. Addressing the sub-elements precisely for their inner text (child elements one by one) may help avoid non-printable characters.

Extract last alpha+numeric pair in a string in Excel

I'm trying to figure out a way to extract the last alpha+numeric sequence in a string made up of similar patterns. The sequence is an alpha+numeric pair: an alpha string (one or more letters) plus a numeric string (one or more numbers). For instance:
G98Y8RT9 -- I need to isolate "RT9"
H8L77 -- I need to isolate "L77"
D64RL19HT7899 -- I need to isolate "HT7899"
As shown above, there are a variable number of characters in each part of the pair and also in the number of pairs preceding the last one. I've tried Excel formulas using FIND, ISNUMBER, etc., but I couldn't figure out the logic to make it work for these variables.
Is there a formula that would help? Or is some kind of regex VBA function the way to go?
I think this should work, as a user-defined function you can place it in a standard module, and call it like:
=GetLastPair($A$1), etc.
Here is the function:
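Function GetLastPair(str As String) As String
Dim numPart As Long
Dim strPart As Long
' Count the trailing digits; the bound stops Mid from being called with position 0
Do While numPart < Len(str)
If Not Mid(str, Len(str) - numPart, 1) Like "#" Then Exit Do
numPart = numPart + 1
Loop
' Count the letters immediately before those digits
Do While numPart + strPart < Len(str)
If Mid(str, Len(str) - numPart - strPart, 1) Like "#" Then Exit Do
strPart = strPart + 1
Loop
GetLastPair = Right(str, numPart + strPart)
End Function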
A rather long formula, but it seems to work:
=RIGHT(A1,MATCH(TRUE,ISNUMBER(1*MID(A1,LEN(A1)-MATCH(FALSE,ISNUMBER(1*MID(A1,LEN(A1)-{0,1,2,3,4,5,6,7,8},1)),0)-{0,1,2,3,4,5,6,7,8},1)),0)+MATCH(FALSE,ISNUMBER(1*MID(A1,LEN(A1)-{0,1,2,3,4,5,6,7,8},1)),0)-1)
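Since the question also asks whether a regex VBA function would work, here is a minimal sketch of that route. The name LastPairRegex is hypothetical, and the pattern assumes plain ASCII letters followed by digits at the very end of the string.

```vba
Option Explicit

' Hypothetical helper (not from the original answers).
' Returns the last letters-then-digits pair, or "" if the string
' does not end in such a pair.
Function LastPairRegex(ByVal s As String) As String
    With CreateObject("VBScript.RegExp")
        ' One or more letters, then one or more digits, anchored at the end
        .Pattern = "[A-Za-z]+\d+$"
        If .Test(s) Then LastPairRegex = .Execute(s)(0).Value
    End With
End Function
```

Placed in a standard module it can be called from a cell in the same way as the function above, for example =LastPairRegex(A1).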
