Trying to extract a 6 or 5 digit number from a string - excel

I am trying to extract a 5 or 6 digit code from a string.
C:\Users\pthaxthon\Desktop\45697_Originals
C:\Users\pthaxthon\Desktop\123456_Originals
I just need the 5 or 6 digit number
The number always occurs after the 4th backslash and before the first _
I have tried using the Mid and Split functions, but with no success:
Range("D14").Value = Mid(fle, 28, 6)

Another solution would be to rely on the number always occurring after the 4th backslash and before the first "_". Then you could use Split in the following way:
Option Explicit
Function GetNo(s As String) As String
Const BSLASH = "\"
Const UNDERSCORE = "_"
Const FOUR = 4
Dim v As Variant
' Split the string by backslash
v = Split(s, BSLASH)
' Element 4 (the fifth item, since Split is 0-based) holds "<number>_Originals"; split it by underscore
v = Split(v(FOUR), UNDERSCORE)
' take string before the underscore
GetNo = v(0)
End Function
Sub TestIt()
Dim inp As String
inp = "C:\Users\pthaxthon\Desktop\123456_Originals"
Debug.Print GetNo(inp) ' prints 123456
inp = "C:\Users\pthaxthon\Desktop\45697_Originals"
Debug.Print GetNo(inp) ' prints 45697
End Sub
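To sanity-check which element the fixed index lands on, here is the same split logic as a small Python sketch (`get_no` is just an illustrative translation of `GetNo`, not part of the answer). It assumes every path has the same folder depth as the two samples.

```python
def get_no(path: str) -> str:
    """Return the text between the 4th backslash and the next underscore."""
    parts = path.split("\\")        # ['C:', 'Users', 'pthaxthon', 'Desktop', '123456_Originals']
    return parts[4].split("_")[0]   # element 4 (0-based) holds '<number>_Originals'


print(get_no(r"C:\Users\pthaxthon\Desktop\123456_Originals"))  # 123456
print(get_no(r"C:\Users\pthaxthon\Desktop\45697_Originals"))   # 45697
```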

You could also use a regular expression to pull out the number.
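Dim SDI As Object, Num_out As Object, Criteria_out As Double
Set SDI = CreateObject("VBScript.RegExp")
SDI.Pattern = "\d+" '* match the first run of digits
Set Num_out = SDI.Execute(Str_In) ' Str_In holds the input string, e.g. the path in fle
If Num_out.Count > 0 Then Criteria_out = Val(Num_out(0)) ' Val drops any leading zeros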
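A Python sketch of the same pattern, for illustration only (`first_number` is a hypothetical name). It returns the matched text rather than a number, so leading zeros survive, and it assumes no other digits appear earlier in the string (a folder such as `user1` would be matched first).

```python
import re
from typing import Optional


def first_number(text: str) -> Optional[str]:
    """Return the first run of digits in text, or None if there is none."""
    match = re.search(r"\d+", text, re.ASCII)  # re.ASCII: \d means [0-9], as in VBScript
    return match.group(0) if match else None


print(first_number(r"C:\Users\pthaxthon\Desktop\45697_Originals"))  # 45697
```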

Since the number always starts at the 28th character and ends at the next "_" character, you can use:
Range("D14").Value = Split(Mid(fle, 28), "_")(0)
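The position arithmetic is easy to get wrong because `Mid` is 1-based. A Python sketch of the same idea (the function name is illustrative), assuming the prefix before the number is always the 27 characters of `C:\Users\pthaxthon\Desktop\`:

```python
def number_after_prefix(path: str, start: int = 28) -> str:
    """Mirror Split(Mid(path, start), "_")(0); start is 1-based like VBA's Mid."""
    return path[start - 1:].split("_")[0]  # Python slices are 0-based, hence start - 1


print(number_after_prefix(r"C:\Users\pthaxthon\Desktop\45697_Originals"))  # 45697
```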

Related

Remove numbers from end of string if count of numbers(characters) > 8

I need to remove numbers from end of string if count of numbers(characters) > 8
I have used the functions below, but they remove all numbers from the string.
How can these functions be modified to remove the numbers at the end of the string only when there are more than 8 of them?
Thanks in advance for any helpful comments and answers.
Option Explicit
Function StripNumber(stdText As String)
Dim str As String, i As Integer
stdText = Trim(stdText)
For i = 1 To Len(stdText)
If Not IsNumeric(Mid(stdText, i, 1)) Then
str = str & Mid(stdText, i, 1)
End If
Next i
StripNumber = str ' * 1
End Function
Function Remove_Number(Text As String) As String
With CreateObject("VBScript.RegExp")
.Global = True
.Pattern = "[0-9]"
Remove_Number = .Replace(Text, "")
End With
End Function
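Assuming the number is followed by a file extension at the end of the string, you can use either of these patterns (in place of [0-9] in Remove_Number):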
\d{8,}(?=\.\w+$)
\d{8,}(?=\.[^.]+$)
If there must be at least 9 digits, replace 8 with 9.
Details:
\d{8,} - eight or more digits
(?=\.\w+$) - a lookahead requiring that the digits are immediately followed by a . and one or more word characters, and then the end of the string
(?=\.[^.]+$) - a lookahead requiring that the digits are immediately followed by a . and then one or more characters other than . up to the end of the string
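Applied with an empty replacement string, the second pattern removes the long number and keeps the extension. A Python sketch (the filenames are made up for illustration):

```python
import re

# Eight or more digits that are immediately followed by the final ".extension"
LONG_NUMBER = re.compile(r"\d{8,}(?=\.[^.]+$)", re.ASCII)


def strip_long_number(name: str) -> str:
    return LONG_NUMBER.sub("", name)


print(strip_long_number("report_123456789.pdf"))  # report_.pdf
print(strip_long_number("report_1234.pdf"))       # report_1234.pdf (too short, unchanged)
```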
If you have access to the newest Excel functions (LET, TEXTBEFORE, TEXTAFTER, SEQUENCE), you can avoid VBA altogether:
Formula in B2:
=LET(X,TEXTBEFORE(A2,".",-1),Y,TEXTAFTER(A2,X),Z,TEXTAFTER(CONCAT(".",IFERROR(--MID(X,SEQUENCE(LEN(X)),1),".")),".",-1),IF(LEN(Z)>8,SUBSTITUTE(A2,Z&Y,Y),A2))
Or, if there are no leading zeros in these numbers:
=LET(X,TEXTBEFORE(A2,".",-1),Y,TEXTAFTER(A2,X),Z,MAX(IFERROR(--MID(X,SEQUENCE(LEN(X)),LEN(X)),"")),IF(LEN(Z)>8,SUBSTITUTE(A2,Z&Y,Y),A2))
Or, as a spilled array:
Formula in B2:
=BYROW(A2:A6,LAMBDA(a,LET(X,TEXTBEFORE(a,".",-1),Y,TEXTAFTER(a,X),Z,TEXTAFTER(CONCAT(".",IFERROR(--MID(X,SEQUENCE(LEN(X)),1),".")),".",-1),IF(LEN(Z)>8,SUBSTITUTE(a,Z&Y,Y),a))))
Or:
=BYROW(A2:A6,LAMBDA(a,LET(X,TEXTBEFORE(a,".",-1),Y,TEXTAFTER(a,X),Z,MAX(IFERROR(--MID(X,SEQUENCE(LEN(X)),LEN(X)),"")),IF(LEN(Z)>8,SUBSTITUTE(a,Z&Y,Y),a))))
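The LET formulas split the text at the last "." (X is the part before it, Y the extension), take the run of digits at the end of X (Z), and drop that run only if it is longer than 8 characters. A rough Python equivalent, assuming the text contains at least one "." (the formula would return an error otherwise); note that `LEN(Z)>8` removes runs of 9 or more digits, whereas the `\d{8,}` pattern matches 8 or more:

```python
DIGITS = "0123456789"


def strip_trailing_digits(text: str, limit: int = 8) -> str:
    """Drop the digit run before the last '.' if it has more than `limit` digits."""
    stem, sep, ext = text.rpartition(".")  # stem ~ X, sep + ext ~ Y
    if not sep:
        return text  # no '.', leave unchanged in this sketch
    end = len(stem)
    start = end
    while start > 0 and stem[start - 1] in DIGITS:
        start -= 1  # walk back over the trailing digits (Z)
    if end - start > limit:  # LEN(Z) > 8
        return stem[:start] + sep + ext
    return text


print(strip_trailing_digits("report_123456789.pdf"))  # report_.pdf
print(strip_trailing_digits("report_12345678.pdf"))   # report_12345678.pdf
```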

Remove Certain Characters from a String using UDF

I have a column whose cells contain lists of alphanumeric codes, such as:
4A(4,5,6,7,8,9); 4B(4,5,7,8); 3A(1,2,3); 3B(1,2,3), 3C(1,2)
In the adjacent cell, I use a UDF to get rid of the special characters "(),;" in order to leave the data as:
4A456789 4B4578 3A123 3B123 3C12
Function RemoveSpecial(Str As String) As String
Dim SpecialChars As String
Dim i As Long
SpecialChars = "(),;-abcdefghijklmnopqrstuvwxyz"
For i = 1 To Len(SpecialChars)
Str = Replace$(Str, Mid$(SpecialChars, i, 1), "")
Next
RemoveSpecial = Str
End Function
For the most part this works well. However, on certain occasions, the cell would contain an unorthodox pattern such as when a space is included between the 4A and the parenthesized items:
4A (4,5,6,7,8,9);
or when text appears inside the parentheses (with spaces on each side):
4A (4,5, skip 8,9);
or a space appears between the first two characters:
4 A(4,5,6)
How would you fix this so that the random spaces are removed, except for those that delimit the actual combinations of data?
One strategy would be to substitute the patterns you want to keep before eliminating the "special" characters, then restore the desired patterns.
From your sample data, it looks like you want to keep a space only if it follows ); or ),
Something like this:
Function RemoveSpecial(Data As Variant) As Variant
Dim SpecialChars As String
Dim KeepStr As Variant, PlaceHolder As Variant, ReplaceStr As Variant
Dim i As Long
Dim DataStr As String
SpecialChars = " (),;-abcdefghijklmnopqrstuvwxyz"
KeepStr = Array("); ", "), ")
PlaceHolder = Array("~0~", "~1~") ' choose a PlaceHolder that won't appear in the data
ReplaceStr = Array(" ", " ")
DataStr = Data
For i = LBound(KeepStr) To UBound(KeepStr)
DataStr = Replace$(DataStr, KeepStr(i), PlaceHolder(i))
Next
For i = 1 To Len(SpecialChars)
DataStr = Replace$(DataStr, Mid$(SpecialChars, i, 1), vbNullString)
Next
For i = LBound(KeepStr) To UBound(KeepStr)
DataStr = Replace$(DataStr, PlaceHolder(i), ReplaceStr(i))
Next
RemoveSpecial = Application.Trim(DataStr)
End Function
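The placeholder idea is language-agnostic; here is a Python sketch of the same three passes (the function name and the `~0~`/`~1~` placeholders mirror the VBA and assume those markers never occur in the data):

```python
SPECIAL_CHARS = " (),;-abcdefghijklmnopqrstuvwxyz"
KEEP = ["); ", "), "]          # separators whose space should survive
PLACEHOLDERS = ["~0~", "~1~"]  # markers assumed not to appear in the data


def remove_special(data: str) -> str:
    s = data
    for keep, marker in zip(KEEP, PLACEHOLDERS):
        s = s.replace(keep, marker)  # 1. protect the separators we want to keep
    for ch in SPECIAL_CHARS:
        s = s.replace(ch, "")        # 2. delete every unwanted character
    for marker in PLACEHOLDERS:
        s = s.replace(marker, " ")   # 3. turn the markers back into single spaces
    return " ".join(s.split())       # like Application.Trim: collapse and trim spaces


print(remove_special("4A (4,5, skip 8,9); 4B(4,5,7,8), 3C(1,2)"))  # 4A4589 4B4578 3C12
```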
Another strategy would be to use regular expressions (RegEx).
It looks like a regular expression could come in handy here, for example:
Function RemoveSpecial(Str As String) As String
With CreateObject("vbscript.regexp")
.Global = True
.Pattern = "\)[;,]( )|[^A-Z\d]+"
RemoveSpecial = .Replace(Str, "$1")
End With
End Function
I have used the regular expression:
\)[;,]( )|[^A-Z\d]+
The way this works is to apply a form of what some would call "The best regex trick ever!": match the context you want to keep with its wanted part in a capture group, match everything you want to remove, and replace with the group only.
\)[;,]( ) - Match a literal closing parenthesis, then either a comma or a semicolon, and capture the following space character in the 1st capture group.
| - Or match the following alternative:
[^A-Z\d]+ - One or more characters that are neither uppercase letters nor digits (the match is case-sensitive, so lowercase letters are removed as well).
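To try the pattern outside Excel, here is the same substitution in Python (illustrative; `re.ASCII` keeps `\d` to 0-9 as in VBScript, and the match is case-sensitive like VBScript's default). Where the second alternative matches, group 1 does not participate, and Python 3.5+ substitutes an empty string for it.

```python
import re

# Alternative 1 keeps the space after ");" or "),"; alternative 2 removes
# every run of characters that are neither uppercase letters nor digits.
PATTERN = re.compile(r"\)[;,]( )|[^A-Z\d]+", re.ASCII)


def remove_special_regex(text: str) -> str:
    return PATTERN.sub(r"\1", text)


print(remove_special_regex("4A (4,5, skip 8,9); 4B(4,5,7,8), 3C(1,2)"))  # 4A4589 4B4578 3C12
```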
EDIT:
In case you have values like 4A; or 4A, you can use:
(?:([A-Z])|\))[;,]( )|[^A-Z\d]+
And replace with $1$2.
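The extended pattern in the same Python form (again only a sketch): an uppercase letter followed by ";" or "," and a space is kept as letter plus space, and `\1\2` inserts whichever groups took part in the match (an unmatched group becomes an empty string in Python 3.5+).

```python
import re

PATTERN_V2 = re.compile(r"(?:([A-Z])|\))[;,]( )|[^A-Z\d]+", re.ASCII)


def remove_special_v2(text: str) -> str:
    return PATTERN_V2.sub(r"\1\2", text)


print(remove_special_v2("4A; 4B(4,5), 3C(1,2)"))  # 4A 4B45 3C12
```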

How to extract the first instance of digits in a cell with a specified length in VBA?

I have the following Text sample:
Ins-Si_079_GM_SOC_US_VI SI_SOC_FY1920_US_FY19/20_A2554_Si Resp_2_May
I want to get the number 079, so what I need is the first instance of digits of length 3. Sometimes the 3 digits are at the end, but they are usually found between the first 2 underscores. I only want the digits with length three (079), not 19, 1920, or 2554, which are different lengths.
Sometimes it can look like this with no underscore:
1920 O-B CLI 353 Tar Traf
Or like this with the 3 digit number at the end:
Ins-Si_GM_SOC_US_VI SI_SOC_FY1920_US_FY19/20_A2554_Si Resp_2_079
There are also times when what I need is 2 digits, but when it's 2 digits, it's always at the end, like this:
FY1920-Or-OLV-B-45
How would I get what I need in all cases?
You can split the listed items and check for 3 digits via Like:
Function Get3Digits(s As String) As String
Dim tmp, elem
tmp = Split(Replace(Replace(s, "-", " "), "_", " "), " ")
For Each elem In tmp
If elem Like "###" Then Get3Digits = elem: Exit Function
Next
If Get3Digits = vbNullString Then Get3Digits = IIf(Right(s, 2) Like "##", Right(s, 2), "")
End Function
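The logic of `Get3Digits`, sketched in Python to make the fallback explicit (`get_3_digits` is an illustrative name; `"###"` in VBA's `Like` means exactly three digits, expressed here as `[0-9]{3}`):

```python
import re


def get_3_digits(s: str) -> str:
    # Treat "-", "_" and space as delimiters, as the nested Replace calls do
    items = s.replace("-", " ").replace("_", " ").split(" ")
    for item in items:
        if re.fullmatch(r"[0-9]{3}", item):  # item Like "###"
            return item
    # Fallback: the last two characters, if both are digits
    tail = s[-2:]
    return tail if re.fullmatch(r"[0-9]{2}", tail) else ""


samples = [
    "Ins-Si_079_GM_SOC_US_VI SI_SOC_FY1920_US_FY19/20_A2554_Si Resp_2_May",
    "1920 O-B CLI 353 Tar Traf",
    "Ins-Si_GM_SOC_US_VI SI_SOC_FY1920_US_FY19/20_A2554_Si Resp_2_079",
    "FY1920-Or-OLV-B-45",
]
for sample in samples:
    print(get_3_digits(sample))  # 079, 353, 079, 45
```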
Edited due to comment:
I would execute a 2-digit search when there are no 3-digit numbers before the end part and the last item has 2 digits. If 3 digits are found at the end, then get 3, but if not, then get 2. There are times when the last item is a number but only one digit; I would only want to get the last item if it has 2 or 3 digits. The - would not be relevant to the 2 digits. If nothing desired is found, it would return " ".
If VBA is not a must you could try:
=TEXT(INDEX(FILTERXML("<t><s>"&SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A1,"_"," "),"-"," ")," ","</s><s>")&"</s></t>","//s[.*0=0][string-length()=3 or (position()=last() and string-length()=2)]"),1),"000")
It worked for your sample data.
Edit: Some explanation.
SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A1,"_"," "),"-"," ")," ","</s><s>") - The key part: it turns all three potential delimiters (hyphen, underscore and space) into a valid XML end tag followed by a start tag.
The result is concatenated with ampersands into a valid XML string (adding a parent node <t>).
FILTERXML can now be used to 'split' the string into an array.
//s[.*0=0][string-length()=3 or (position()=last() and string-length()=2)] - The 2nd parameter of FILTERXML, which must be valid XPath syntax. It reads:
//s - Select all <s> nodes that meet the following conditions:
[.*0=0] - Check that the node's value multiplied by zero returns zero, i.e. that the node is numeric.
[string-length()=3 or (position()=last() and string-length()=2)] - Check that the node is 3 characters long, OR that it is the last of the remaining (numeric) nodes and only 2 characters long. position() and last() are counted after the [.*0=0] filter has been applied.
INDEX(.....,1) - As I mentioned in the comments, this is usually not needed, but since Excel O365 might spill the returned array, we include it to prevent spill errors for those using the newest Excel version. It simply retrieves the very first element of whatever array FILTERXML returns.
TEXT(....,"000") - Excel drops leading zeros from a numeric value, so we use TEXT() to turn it back into a three-digit string. Note that this also pads a two-digit match, so 45 is returned as 045.
If no element can be found, the formula returns an error; wrapping it in IFERROR fixes this.
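The node filtering can be mimicked in Python to see which item wins, including how a two-digit 45 comes back as 045. This is only a sketch: `is_numeric_node` simplifies XPath's `.*0=0` test to "all ASCII digits" (the real test also accepts things like decimals), and the final padding imitates `TEXT(...,"000")` for whole numbers.

```python
from typing import Optional

DIGITS = set("0123456789")


def is_numeric_node(node: str) -> bool:
    # Simplified stand-in for [.*0=0]: non-empty and digits only
    return bool(node) and set(node) <= DIGITS


def filterxml_pick(text: str) -> Optional[str]:
    # The SUBSTITUTE chain: "_", "-" and space all become node boundaries
    nodes = text.replace("_", " ").replace("-", " ").split(" ")
    numeric = [n for n in nodes if is_numeric_node(n)]  # first predicate
    # Second predicate: position()/last() count within the numeric nodes only
    for position, node in enumerate(numeric, start=1):
        if len(node) == 3 or (position == len(numeric) and len(node) == 2):
            return node.zfill(3)  # TEXT(...,"000") pads with leading zeros
    return None  # the formula would return an error here


print(filterxml_pick("1920 O-B CLI 353 Tar Traf"))  # 353
print(filterxml_pick("FY1920-Or-OLV-B-45"))         # 045
```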
Try this function, please:
Function ExtractThreeDigitsNumber(x As String) As String
Dim El As Variant, arr As Variant, strFound As String
' Treat "_", "-" and space as delimiters at the same time, so that mixed
' inputs such as "1920 O-B CLI 353 Tar Traf" are split into single items
arr = Split(Replace(Replace(x, "_", " "), "-", " "), " ")
For Each El In arr
If IsNumeric(El) And Len(El) = 3 Then strFound = El: Exit For
Next
If strFound = "" Then
If IsNumeric(Right(x, 2)) Then ExtractThreeDigitsNumber = Right(x, 2)
Else
ExtractThreeDigitsNumber = strFound
End If
End Function
It can be called in this way:
Sub testExtractThreDig()
Dim x As String
x = "Ins-Si_079_GM_SOC_US_VI SI_SOC_FY1920_US_FY19/20_A2554_Si Resp_2_May"
Debug.Print ExtractThreeDigitsNumber(x)
End Sub

Split String New Line After 3 Space in VB.net

I have a problem splitting a string into new lines in VB.NET.
Right now I can split it by a single space, but I want to start a new line after every 3 words (i.e. after every third space).
Dim s As String = "SOMETHING BIGGER THAN YOUR DREAM"
Dim words As String() = s.Split(New Char() {" "c})
For Each word As String In words
Console.WriteLine(word)
Next
Output:
SOMETHING
BIGGER
THAN
YOUR
DREAM
Desired output:
SOMETHING BIGGER THAN
YOUR DREAM
Another alternative to the existing answers could be:
Dim separator As Char = CChar(" ")
Dim sArr As String() = "SOMETHING BIGGER THAN YOUR DREAM".Split(separator)
Dim indexOfSplit As Integer = 3
Dim sFinal As String = Join(sArr.Take(indexOfSplit).ToArray, separator) & vbNewLine &
Join(sArr.Skip(indexOfSplit).ToArray, separator)
Console.WriteLine(sFinal)
You can split your input string, then loop the array of parts generated and add them to a StringBuilder object.
When you have read a number of parts that is a multiple of a defined value (wordsPerLine here), you append vbNewLine to the current part; otherwise you append a space.
When the loop completes, print the content of the StringBuilder to the Console:
Dim input As String = "SOMETHING BIGGER THAN YOUR DREAM, NOT MORE THAN YOUR ACCOUNT BALANCE"
Dim wordsPerLine As Integer = 3
Dim wordsCounter As Integer = 1
Dim sb As StringBuilder = New StringBuilder()
For Each word As String In input.Split()
sb.Append(word & If(wordsCounter Mod wordsPerLine = 0, vbNewLine, " "))
wordsCounter += 1
Next
Console.WriteLine(sb.ToString())
Prints:
SOMETHING BIGGER THAN
YOUR DREAM, NOT
MORE THAN YOUR
ACCOUNT BALANCE
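The same every-N-words grouping, sketched in Python (illustrative only; `wrap_words` is a hypothetical name). Building each line with a join avoids the trailing space the StringBuilder loop leaves after the last word when the word count is not a multiple of `wordsPerLine`.

```python
def wrap_words(text: str, words_per_line: int = 3) -> str:
    words = text.split()
    lines = [
        " ".join(words[i:i + words_per_line])  # one chunk of up to N words
        for i in range(0, len(words), words_per_line)
    ]
    return "\n".join(lines)


print(wrap_words("SOMETHING BIGGER THAN YOUR DREAM, NOT MORE THAN YOUR ACCOUNT BALANCE"))
```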
Instead of using split, you might capture 3 words in a capturing group and match the trailing whitespace chars.
In the replacement use the group followed by a newline.
Pattern
(\S+(?:\s+\S+){2})\s*
That will match:
( Capture group 1
\S+ Match 1+ non whitespace chars
(?:\s+\S+){2} Repeat 2 times matching 1+ whitespace chars and 1+ non whitespace chars
) Close group 1
\s* Match trailing whitespace chars
Example code
Dim s As String = "SOMETHING BIGGER THAN YOUR DREAM"
Dim output As String = Regex.Replace(s, "(\S+(?:\s+\S+){2})\s*", "$1" + Environment.NewLine)
Console.WriteLine(output)
Output
SOMETHING BIGGER THAN
YOUR DREAM
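For comparison, the same pattern with Python's `re.sub` (a sketch; `\1` plays the role of `$1`). Because every group of three words gets a newline appended, the result ends with a newline when the word count is an exact multiple of three.

```python
import re

THREE_WORDS = re.compile(r"(\S+(?:\s+\S+){2})\s*")


def wrap_three(text: str) -> str:
    # Each run of three words (plus trailing whitespace) becomes the words + "\n"
    return THREE_WORDS.sub(r"\1\n", text)


print(wrap_three("SOMETHING BIGGER THAN YOUR DREAM"))
```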
String.Join has an overload that will help you.
The first parameter is the separator string to place between the elements of your array.
The second parameter is the array you want to join.
The third parameter is the starting index; for the first line of your desired output, this is the element at index 0.
The fourth parameter is the number of elements to join; for the first line we want three array elements.
Private Sub OPCode()
Dim s As String = "SOMETHING BIGGER THAN YOUR DREAM"
Dim words As String() = s.Split(New Char() {" "c})
Dim line1 As String = String.Join(" ", words, 0, 3)
Console.WriteLine(line1)
Dim line2 As String = String.Join(" ", words, 3, words.Length - 3)
Console.WriteLine(line2)
End Sub
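`String.Join(separator, value, startIndex, count)` joins `count` elements starting at `startIndex`. As a rough Python analogue (the helper name `join_range` is hypothetical), it is a slice followed by a join; .NET throws an `ArgumentOutOfRangeException` when the range runs past the end of the array, which the sketch imitates with an `IndexError`:

```python
def join_range(separator: str, values: list, start_index: int, count: int) -> str:
    if start_index < 0 or count < 0 or start_index + count > len(values):
        raise IndexError("start_index/count fall outside the array")
    return separator.join(values[start_index:start_index + count])


words = "SOMETHING BIGGER THAN YOUR DREAM".split(" ")
print(join_range(" ", words, 0, 3))               # SOMETHING BIGGER THAN
print(join_range(" ", words, 3, len(words) - 3))  # YOUR DREAM
```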

Preserving leading 0's in string - number - string conversion

I am working on a macro for a document-tracking sheet at work. I use a button that prompts the user to enter the document number, and I'd like to suggest a default number based on the following numbering convention: the first two characters of the document number are the last two digits of the year (15 in this case), followed by a "-" and a five-digit serial number.
My current code looks at the last-entered document and increments those last 5 characters, but chops off any leading zeroes, which I want to keep. This is an extraction of the code to generate this default number (assuming the variable "prevNCRF" is the previous document name found in the doc):
Sub codeChunkTester()
Dim prevNCRF As String, defNCRFNum As String
Dim NCRFNumAr() As String
'pretend like we found this in the sheet.
prevNCRF = "15-00100"
'split the string into "15" and "00100" and throw those into an array.
NCRFNumAr() = Split(prevNCRF, "-")
'reconstruct the number by reusing the first part and dash, then converting
'the "00100" to a number with Val(), adding 1, then back to a string with CStr().
defNCRFNum = NCRFNumAr(0) & "-" & CStr(Val(NCRFNumAr(1)) + 1)
'message box shows "15-101" rather than "15-00101" as I had hoped.
MsgBox (defNCRFNum)
End Sub
So can anyone help me preserve those zeroes? I suppose I could include a loop that checks the length of the string and adds a leading zero until there are 5 characters, but perhaps there's a better way...
Converting "00100" to a Double using Val turned it into 100, so CStr(100) returns "100" as it should.
You need to format the string to what you want it to look like:
defNCRFNum = NCRFNumAr(0) & "-" & Format(Val(NCRFNumAr(1)) + 1, "00000")
If you need to parameterize the length of the string, you can use the String function to generate the format string:
Const digits As Integer = 5
Dim formatString As String
formatString = String(digits, "0")
defNCRFNum = NCRFNumAr(0) & "-" & Format(Val(NCRFNumAr(1)) + 1, formatString)
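In Python terms (an illustrative sketch, with `next_serial` as a made-up name), `Format(n, "00000")` corresponds to zero-padding to a minimum width, so a number that outgrows the width is printed in full rather than truncated:

```python
def next_serial(prev: str, digits: int = 5) -> str:
    """Increment the serial after the '-' and zero-pad it to `digits` characters."""
    prefix, serial = prev.split("-")
    return f"{prefix}-{int(serial) + 1:0{digits}d}"  # like Format(..., String(digits, "0"))


print(next_serial("15-00100"))  # 15-00101
```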
Here is that loop solution I mentioned above. If anyone's got something better, I'm all ears!
prevNCRF = "15-00100"
NCRFNumAr() = Split(prevNCRF, "-")
zeroAdder = CStr(Val(NCRFNumAr(1)) + 1)
'loop: as long as the zeroAdder string is shorter than 5 characters,
'put a zero in front of it (< 5 also avoids an endless loop once the serial reaches 6 digits).
Do While Len(zeroAdder) < 5
zeroAdder = "0" & zeroAdder
Loop
defNCRFNum = NCRFNumAr(0) & "-" & zeroAdder
MsgBox (defNCRFNum)
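Alternatively, you can take the width from the existing serial number, so the padding adapts to however many digits it has:
defNCRFNum = NCRFNumAr(0) & "-" & Format(CStr(Val(NCRFNumAr(1)) + 1), String(Len(NCRFNumAr(1)), "0"))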
