In a .csv spreadsheet, I have multiple strings with incrementing numerical values contained in each, and I need to extract the numbers from each string. For example, here are some of the strings:
DEVICE1.CM1 - 4.1.1.C1.CA_VALUE (A)
DEVICE1.CM2 - 6.7.1.C2.CA_VALUE (A)
DEVICE1.CM1 - 4.1.2.C1.CA_VALUE (A)
DEVICE1.CM1 - 4.1.2.C2.CA_VALUE (A)
DEVICE1.CM1 - 4.1.2.C3.CA_VALUE (A)
DEVICE1.CM1 - 5.1.1.C1.CA_VALUE (A)
DEVICE1.CM1 - 5.1.1.C2.CA_VALUE (A)
DEVICE1.CM1 - 5.10.1.C3.CA_VALUE (A)
DEVICE1.CM1 - 6.13.1.C10.CA_VALUE (A)
And I am looking to extract "4.1.1.C1" from the first string, and "6.7.1.C2" from the second string.
I have over 1000 strings, each with a different incremental value of the form "#.#.#.C#", and all of the options I have tried so far involve searching for a specific value to extract, rather than extracting all values of that general form. Is there any reasonable way to accomplish this?
I am not a big fan of regular expressions because they are often hard to read, but this is a typical example where you should use them. Read carefully the Q&A BigBen linked to in the comments.
Function extractCode(s As String) As String
    Static rx As RegExp
    If rx Is Nothing Then
        Set rx = New RegExp
        ' \d+ after the C so that multi-digit suffixes such as C10 are captured in full
        rx.Pattern = "\d+\.\d+\.\d+\.C\d+"
    End If
    If rx.Test(s) Then
        extractCode = rx.Execute(s)(0).Value
    End If
End Function
(You will need to add a reference to the Microsoft VBScript Regular Expressions 5.5 library.)
Update: the dots need to be escaped; otherwise each one is a placeholder for any character, and the pattern would also match something like "4x1y2zC3".
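To see what the escaping changes, here is a minimal sketch. It uses late binding (CreateObject) so it runs without the library reference; the Sub name is made up for illustration, and both patterns end in \d+ (an assumption on my part) so that suffixes like C10 match in full.

```vba
Sub DemoEscapedDot()
    Dim rx As Object
    Set rx = CreateObject("VBScript.RegExp")   ' late binding, no reference needed

    ' Unescaped: each "." matches any single character
    rx.Pattern = "\d+.\d+.\d+.C\d+"
    Debug.Print rx.Test("4x1y2zC3")            ' True (a false positive)

    ' Escaped: each "\." must be a literal dot
    rx.Pattern = "\d+\.\d+\.\d+\.C\d+"
    Debug.Print rx.Test("4x1y2zC3")            ' False
    Debug.Print rx.Test("DEVICE1.CM1 - 6.13.1.C10.CA_VALUE (A)")   ' True
End Sub
```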
So here goes:
MID(A1,FIND("-",A1,1)+2,(FIND("_",A1,1)-FIND("-",A1,1))-5)
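For reference, the same position arithmetic can be sketched in VBA with InStr in place of FIND. The function name is hypothetical, and it assumes every string has the "- " prefix and an underscore right after ".CA", as in the sample data.

```vba
Function ExtractCodeByPosition(ByVal s As String) As String
    Dim dashPos As Long, underscorePos As Long

    dashPos = InStr(1, s, "-")
    underscorePos = InStr(1, s, "_")

    ' Return "" when the expected markers are missing or too close together
    If dashPos = 0 Or underscorePos - dashPos < 5 Then Exit Function

    ' The span from "-" up to (not including) "_" holds:
    '   "- " (2 chars) + the code + ".CA" (3 chars)
    ' so the code length is the span minus 2 + 3 = 5.
    ExtractCodeByPosition = Mid$(s, dashPos + 2, underscorePos - dashPos - 5)
End Function
```

For "DEVICE1.CM1 - 4.1.1.C1.CA_VALUE (A)" the hyphen is at position 13 and the underscore at 26, so the code starts at 15 and is 26 - 13 - 5 = 8 characters long.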
The fixed structure
(items are always preceded by " - " and followed by ".CA_VALUE (A)")
allows you to isolate the code string via Split as follows:
treat ".CA_VALUE (A)" as the closing delimiter, but replace its occurrence(s) with "- "
execute Split on the resulting string using only the first delimiter (StartDelim "- ")
isolate the second token (index 1, as Split results are zero-based)
Function ExtractCode(ByVal s As String) As String
    Const StartDelim As String = "- "
    Const ClosingDelim As String = ".CA_VALUE (A)"
    ExtractCode = Split(Replace(s, ClosingDelim, StartDelim), StartDelim)(1)
End Function
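With over 1000 strings you would probably call the function in a loop (or simply as =ExtractCode(A1) filled down). A rough sketch of the loop, assuming the strings sit in column A of the active sheet and the results should go to column B; the Sub name and that layout are my assumptions, not something stated in the question:

```vba
Sub FillCodes()
    Dim ws As Worksheet
    Dim lastRow As Long, r As Long
    Dim s As String

    Set ws = ActiveSheet                                   ' assumed: data is on the active sheet
    lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row   ' last used row in column A

    For r = 1 To lastRow
        s = CStr(ws.Cells(r, "A").Value)
        ' Split(...)(1) raises "Subscript out of range" if the delimiter is missing,
        ' so skip rows that do not contain "- "
        If InStr(1, s, "- ") > 0 Then
            ws.Cells(r, "B").Value = ExtractCode(s)
        End If
    Next r
End Sub
```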
Another approach with focus on splitting via point delimiters //Edit 2021-11-20
If you want to experiment with a fixed start position of your 4-items code in a split array (based on point delimiters "."),
you might also consider the following approach:
split via point delimiters "."
filter only the 3rd,4th,5th and 6th item via WorksheetFunction.Index (by its columns argument)
join the resulting items again via connecting points "."
a) Using (Excel) version MS 365
Function ExtractCode(ByVal s As String, Optional startPos As Long = 3) As Variant
    Const delim As String = "."
    Dim tmp
    tmp = Split(Replace(s, "- ", delim), delim) ' normalize hyphen to point delimiter
    With Application.WorksheetFunction
        ExtractCode = Join(.Index(tmp, 0, .Sequence(1, 4, startPos)), ".")
    End With
End Function
b) Make it backwards compatible
Just change the function result assignment to
ExtractCode = Join(.Index(tmp, 0, Evaluate("{1,2,3,4}-1+" & startPos)), ".")
In both cases this sets the Index column argument to the 1-based column numbers Array(3, 4, 5, 6).
Related
I have the following strings from which I need to extract 6 digit numbers. Since these strings are generated by another software, they occur interchangeably and I cannot control it. Is there any one method that would extract both 6-digit numbers from each of these strings?
Branch '100235 to 100236 Ckt 1' specified in table 'East Contingency' for record with primary key = 21733 was not found in branch or transformer data.
Loadflow branch ID '256574_701027_1' defined in supplemental branch table was not found in branch or transformer input.
Transmission element from bus number 135415 to bus number 157062 circuit ID = 1 defined for corridor 'IESO-NYISO' was not found in input data
I don't know VBA, but I can learn it if it means I can get the 6 digit numbers using a single method.
thanks
I have been using LEFT(), RIGHT() & MID() previously, but it means manually applying the appropriate formula for individual string.
If you have Microsoft 365, you can use this formula:
=LET(arr,TEXTSPLIT(SUBSTITUTE(SUBSTITUTE(A1,"'"," "),"_"," ")," "),
FILTER(arr,ISNUMBER(-arr)*(LEN(arr)=6)))
Thanks to @TomSharpe for this shorter version, using an array constant within TEXTSPLIT to add on possible delimiters.
=LET(arr,TEXTSPLIT(A1,{"_"," ",","," ","'"}),FILTER(arr,(LEN(arr)=6)*ISNUMBER(-arr)))
An alternative is:
=LET(ζ,MID(A1,SEQUENCE(,LEN(A1)-5),6),ξ,MID(ζ,SEQUENCE(6),1),FILTER(ζ,MMULT(SEQUENCE(,6,,0),1-ISERR(0+ξ))=6))
A couple more suggestions (if you need them):
(1) Replacing all non-digit characters with a space then splitting the resulting string:
=LET(numbers,TEXTSPLIT(TRIM(REDUCE("",MID(A1,SEQUENCE(1,LEN(A1)),1),LAMBDA(a,c,IF(is.digit(c),a&c,a&" "))))," "),FILTER(numbers,LEN(numbers)=6))
Here I've defined a function is.digit as
=LAMBDA(c, IF(c = "", FALSE, AND(CODE(c) > 47, CODE(c) < 58)))
(tl;dr I quite like doing it this way because it hides the implementation details of is.digit and creates a rudimentary form of encapsulation)
(2) A UDF - based on the example here and called as
=RegexTest(A1)
Option Explicit
Function RegexTest(s As String) As Double()
    Dim regexOne As Object
    Dim theNumbers As Object
    Dim Number As Object
    Dim result() As Double
    Dim i As Integer
    Set regexOne = New RegExp
    ' Not sure how you would extract numbers of length 6 only, so extract all numbers...
    regexOne.Pattern = "\d+"
    regexOne.Global = True
    regexOne.IgnoreCase = True
    Set theNumbers = regexOne.Execute(s)
    i = 1
    For Each Number In theNumbers
        '...Check the length of each number here
        If Len(Number.Value) = 6 Then
            ReDim Preserve result(1 To i)
            result(i) = CDbl(Number.Value)
            i = i + 1
        End If
    Next
    RegexTest = result
End Function
Note - if you wanted to preserve leading zeroes you would need to omit the CDbl() and return the numbers as strings. The function returns an error if no 6-digit numbers are found.
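On the open point in the code comment (matching only numbers of length 6 in the pattern itself), one possible sketch is below. VBScript regular expressions have no lookbehind, so the pattern consumes one preceding non-digit (or the start of the string) and uses a negative lookahead for the trailing side. The function name is hypothetical, it returns strings (so leading zeroes survive), and it returns #N/A rather than failing when nothing matches.

```vba
Function SixDigitNumbers(ByVal s As String) As Variant
    Dim rx As Object, matches As Object, m As Object
    Dim result() As String
    Dim i As Long

    Set rx = CreateObject("VBScript.RegExp")   ' late binding, no reference needed
    rx.Global = True
    ' (^|\D)  start of string or one non-digit before the number
    ' (\d{6}) exactly six digits (captured as SubMatches(1))
    ' (?!\d)  not followed by another digit
    rx.Pattern = "(^|\D)(\d{6})(?!\d)"

    Set matches = rx.Execute(s)
    If matches.Count = 0 Then
        SixDigitNumbers = CVErr(xlErrNA)       ' nothing found: return #N/A
        Exit Function
    End If

    ReDim result(1 To matches.Count)
    For Each m In matches
        i = i + 1
        result(i) = m.SubMatches(1)            ' SubMatches is zero-based
    Next m
    SixDigitNumbers = result
End Function
```

Called as =SixDigitNumbers(A1), this should spill the matches across a row in Microsoft 365; in older versions it would need to be entered as an array formula.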
I have a String in VBA with this text: < History Version="1.10" Client="TestClient001" >
I want to get this TestClient001 or anything that's inside Client="xxxx"
I made this code but it's not working
Client = MID(text,FIND("Client=""",text)+1,FIND("""",text)-FIND("Client=""",text)-1)
Is there a way to specifically get the text inside Client="xxxx"?
There's no such function as Find in VBA - that's a worksheet function. The VBA equivalent is InStr, but I don't think you need to use it here.
The best tool for extracting one string from another in VBA is often Split. It takes one string and splits it into an array based on a delimiting string. The best part is that the delimiter doesn't have to be a single character - you can make it an entire string. In this case, we'd probably do well with two nested Split functions.
Client = Split(Split(text,"Client=""")(1),Chr(34))(0)
The inner Split breaks your text string where it finds "Client="". The (1) returns array element 1. Then the outer Split breaks that returned text where it finds a " character, and returns array element 0 as the final result.
For better maintainability, you may want to use constants for your delimiters as well.
Sub EnclosedTextTest()
    Const csFlag1 As String = "Client="""
    Const csFlag2 As String = """"
    Const csSource As String = "< History Version=""1.10"" Client=""TestClient001"" >"
    Dim strClient As String
    strClient = Split(Split(csSource, csFlag1)(1), csFlag2)(0)
    Debug.Print strClient
End Sub
However, if the Split method doesn't work for you, we can use a method similar to the one you were using, with InStr. There are a couple of options here as well.
InStr will return the position in a string that it finds a matching value. Like Split, it can be given an entire string as its delimiter; however, if you use more than one character you need to account for the fact that it will return where it finds the start of that string.
InStr(1,text,"Client=""")
will return 26, the start of the string "Client="" in the text. This is one of the places where it's helpful to have your delimiter stored in a constant.
intStart = InStr(1, text, csFlag1) + Len(csFlag1)
This will return the location it finds the start of the delimiter, plus the length of the delimiter, which positions you at the beginning of the text.
If you store this position in a variable, it makes the next part easier as well. You can use that position to run a second InStr and find the next occurrence of the " character.
intEnd = InStr(intStart,text,csFlag2)
With those values, you can perform your Mid. Your code overall will look something like this:
Sub InstrTextTest()
    Const csFlag1 As String = "Client="""
    Const csFlag2 As String = """"
    Const csSource As String = "< History Version=""1.10"" Client=""TestClient001"" >"
    Dim strClient As String
    Dim intPos(0 To 1) As Integer
    intPos(0) = InStr(1, csSource, csFlag1) + Len(csFlag1)
    intPos(1) = InStr(intPos(0), csSource, csFlag2)
    strClient = Mid(csSource, intPos(0), intPos(1) - intPos(0))
    Debug.Print strClient
End Sub
This will work, but I prefer the Split method for ease of reading and reuse.
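Either way, both versions assume the flags are actually present: the nested Split raises "Subscript out of range" and the Mid call fails if "Client=" is missing. A small reusable sketch with guards, where the function name and the choice to return an empty string on failure are my own assumptions:

```vba
Function TextBetween(ByVal s As String, ByVal startFlag As String, _
                     ByVal endFlag As String) As String
    Dim startPos As Long, endPos As Long

    startPos = InStr(1, s, startFlag)
    If startPos = 0 Then Exit Function        ' start flag not found: return ""

    startPos = startPos + Len(startFlag)      ' first character after the start flag
    endPos = InStr(startPos, s, endFlag)
    If endPos = 0 Then Exit Function          ' end flag not found: return ""

    TextBetween = Mid$(s, startPos, endPos - startPos)
End Function

Sub TextBetweenTest()
    Const csSource As String = "< History Version=""1.10"" Client=""TestClient001"" >"
    Debug.Print TextBetween(csSource, "Client=""", """")   ' prints TestClient001
End Sub
```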
You can use the Split function to split at the = character, then take the last element of the resulting array and remove the quote characters and the > with the Replace function to get the required output.
In the end I got it thanks to the idea given by @alok and @BigBen
Dim cl() As String
Dim ClientCode As String
If (InStr(1, temp, "Client=", vbTextCompare) > 0) Then
    cl = Split(temp, "=")
    ClientCode = cl(UBound(cl))
    ClientCode = Replace(ClientCode, """", "")
    ClientCode = Replace(ClientCode, ">", "")
End If
It's XML, so you could do this:
Dim sXML As String
sXML = "<History Version=""1.10"" Client=""TestClient001"">"
With CreateObject("MSXML.Domdocument")
    .LoadXML Replace(sXML, ">", "/>") 'close the element
    Debug.Print .FirstChild.Attributes.getnameditem("Client").Value
End With
In my program I get tokenized words (after many procedures). Unfortunately, because they were reversed, they hold punctuation characters at the beginning of a word, e.g. "BARGE "UR 106
How can I move that " from the beginning to the end -> BARGE "UR 106"
Another example: (.REIS (HASAN M should become -> REIS (HASAN M.)
Up to now I've tried:
{DOCHS14.SHEM__ONIA} startswith ["\"","\(","\."\,"\(\."]
then
Local StringVar str:={DOCHS14.SHEM__ONIA}[0]
TrimLeft ({DOCHS14.SHEM__ONIA})
{DOCHS14.SHEM__ONIA}&str;
But that gives me errors:
A number, currency amount, boolean, date, time, date-time, or string is expected here.
How to fix that? or is there another way to solve this problem?
There are multiple issues in your formula.
startswith expects a string, not an array of strings.
Only the double quote must be escaped, but you did it wrong. (See here)
While a solution with startswith is also possible, I have used the Left function instead. In your "(.REIS" example two characters have to be checked, so that case must be tested first and produce a different result.
if Left({DOCHS14.SHEM__ONIA}, 2) = "(." Then
    Mid({DOCHS14.SHEM__ONIA}, 3) + ".)"
else if Left({DOCHS14.SHEM__ONIA}, 1) in ["""", "(", ".", ","] Then
    Mid({DOCHS14.SHEM__ONIA}, 2) + Left({DOCHS14.SHEM__ONIA}, 1)
else
    {DOCHS14.SHEM__ONIA}
I'm having trouble finding a way to remove floating integers from a cell without removing numbers attached to the end of my string. Could I get some help as to how to approach this issue?
For example, in the image attached, instead of:
john123 456 hamilton, I want:
john123 hamilton
This can be done using regular expressions. You will match on the data you want to remove, then replace this data with an empty string.
Since you didn't provide any code, all I can do is provide you with a function that you can implement in your own project. This function can be used in VBA or as a worksheet function, such as =ReplaceFloatingIntegers(A1).
You will need to add a reference to Microsoft VBScript Regular Expressions 5.5 by going to Tools, References in the VBE menu.
Function ReplaceFloatingIntegers(ByVal inputString As String) As String
    With New RegExp
        .Global = True
        .MultiLine = True
        .Pattern = "(\b\d+\b\s?)"
        If .Test(inputString) Then
            ReplaceFloatingIntegers = .Replace(inputString, "")
        Else
            ReplaceFloatingIntegers = inputString
        End If
    End With
End Function
Breaking down the pattern
( ... ) This is a capturing group. Here it wraps the whole pattern, so everything it captures is exactly what .Replace() replaces.
\b This is a word boundary. We use this because we want to test from edge to edge of any 'words' (which includes words that contain only digits in our case).
\d+\b This will match any digit (\d), one to unlimited (+) times, up to the next word boundary (\b).
\s? will match a single whitespace character if it exists; the ? makes it optional.
You can look at this personalized Regex101 page to see how this matches your data. Anything matched here is replaced with an empty string.
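If you would rather not add the library reference, a late-bound variant along the same lines is sketched below. The function name is made up, the capturing group is dropped because the whole match is replaced anyway, and the final Trim$ is my own addition to tidy up the space left behind when the removed number is the last word.

```vba
Function RemoveFloatingIntegers(ByVal inputString As String) As String
    With CreateObject("VBScript.RegExp")   ' late binding: no reference required
        .Global = True
        .Pattern = "\b\d+\b\s?"            ' standalone run of digits plus one optional space
        ' .Replace returns the input unchanged when nothing matches,
        ' so no separate .Test call is needed
        RemoveFloatingIntegers = Trim$(.Replace(inputString, ""))
    End With
End Function
```

For example, RemoveFloatingIntegers("john123 456 hamilton") returns "john123 hamilton", because the digits in "john123" are not preceded by a word boundary.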
One of my cells appears to be blank but has a length of 2 characters. I copied the string to this website and it has identified it as a null string.
I have tried using IsNull and IsEmpty, as well as testing to see if it is equivalent to the vbNullString but it is still coming up as False.
How do I identify this string as being Null?
A string value that "appears to be blank but has a length of 2 characters" is said to be whitespace, not blank, not null, not empty.
Use the Trim function (or its Trim$ stringly-typed little brother) to strip leading/trailing whitespace characters, then test the result against vbNullString (or ""):
If Trim$(value) = vbNullString Then
The Trim function won't strip non-breaking spaces though. You can write a function that does:
Public Function TrimStripNBSP(ByVal value As String) As String
    TrimStripNBSP = Trim$(Replace(value, Chr$(160), Chr$(32)))
End Function
This replaces non-breaking spaces with ASCII 32 (a "normal" space character), then trims it and returns the result.
Now you can use it to test against vbNullString (or ""):
If TrimStripNBSP(value) = vbNullString Then
The IsEmpty function can only be used with a Variant (only returns a meaningful result given a Variant anyway), to determine whether that variant contains a value.
The IsNull function has extremely limited use in Excel-hosted VBA, and shouldn't be needed since nothing is ever going to be Null in an Excel worksheet - especially not a string with a length of 2.
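VBA's Trim also leaves tabs, carriage returns and line feeds alone, so if the two characters turn out to be something like a CR/LF pair (a guess on my part, fairly common in imported data), a broader check could look like this sketch. The function name and the list of characters treated as "blank" are assumptions to adjust as needed.

```vba
Public Function IsVisuallyBlank(ByVal value As String) As Boolean
    Dim ch As Variant

    ' Remove every character we consider invisible: NBSP, tab, CR, LF, space
    For Each ch In Array(Chr$(160), vbTab, vbCr, vbLf, " ")
        value = Replace(value, ch, "")
    Next ch

    ' Anything left over is a visible (or at least unlisted) character
    IsVisuallyBlank = (Len(value) = 0)
End Function
```

It would be used as If IsVisuallyBlank(CStr(Cells(1, 1).Value)) Then ...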
Chr(160) Issue
160 is the code number of a Non-Breaking Space.
Let us say the cell is A1.
In any cell write =CODE(A1), and in another cell (e.g. the one next to it) write =CODE(MID(A1,2,1)).
The results are the code numbers (integers, say a and b) of the two characters.
Now in VBA you can use:
If Cells(1, 1) = Chr(a) & Chr(b) Then
End If
or e.g.
If Left(Cells(1, 1), 1) = Chr(160) Then
End If
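The same inspection can be done directly in VBA. A minimal sketch that prints the position and code of every character in the cell to the Immediate window, assuming (as above) that the cell is A1 on the active sheet; the Sub name is made up:

```vba
Sub ShowCharCodes()
    Dim s As String
    Dim i As Long

    s = CStr(ActiveSheet.Range("A1").Value)   ' assumed: the suspicious cell is A1

    For i = 1 To Len(s)
        ' AscW gives the Unicode code of the character, e.g. 160 for a non-breaking space
        Debug.Print i, AscW(Mid$(s, i, 1))
    Next i
End Sub
```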