I am very new to VBA and I'm working with data from a chemistry instrument that outputs values which are not uniformly delimited and contain special characters. I am trying to import these values into Excel and have solved pretty much all of the problems except for one. When I import these values into Excel they are read in line by line, and each line ends up in its own cell in column A. There can be anywhere from 50 to roughly 1000 rows of data, with the associated identifiers and metadata above them. Below is a copy/paste of the first 5 lines of data.
1 7.724 1190 1231 1292 PV 4 724391 121434659 49.60% 9.688%
2 9.272 1451 1481 1484 VB 3961552 186833117 76.32% 14.905%
3 10.968 1732 1754 1816 VV 2673526 111034313 45.36% 8.858%
4 15.249 2382 2445 2453 PV 296082 33844178 13.82% 2.700%
5 15.384 2453 2466 2500 VV 219908 14461812 5.91% 1.154%
The problem I am having is that sometimes multiple peaks make up one value, and these are recorded as two letters, a space, and one or two digits (0-9), whereas peak types with only one peak are just two letters. For an example, look at line 1, which contains "PV 4". I am trying to use regular expressions to loop through column A, starting at row 18 and ending around row 1000, to find the letters and their associated numbers and remove the space between them, so that the cell looks like this:
1 7.724 1190 1231 1292 PV4 724391 121434659 49.60% 9.688%
Once it is in that form, I can use the space delimiter to separate the cells without frame shifting the ones that have the multiple peak types.
Here is the code I've written so far, but I am unsure how to proceed:
Sub PKTYRegexRemoveSpace()
Dim StrPattern As String: StrPattern = "[A,B,H,M,N,P,S,T,U,V,X,\+][A,B,H,M,N,P,S,T,U,V,X,\+]\s[0-9]{1,2}\s"
Dim StrInput As String
Dim MyRange As Range
Dim regEx As New RegExp
Dim Cell As Range
Set MyRange = ActiveSheet.Range("A22:A24")
For Each Cell In MyRange
If StrPattern <> "" Then
StrInput = Cell.Value
With regEx
.Pattern = StrPattern
.Global = False
.IgnoreCase = False
End With
If regEx.Test(StrInput) Then
MsgBox (regEx.Replace(StrInput, *this is where I need help*))
Else
MsgBox ("Not matched")
End If
End If
Next
End Sub
I am using a MsgBox during development to avoid having to re-import the file after every failed replacement attempt.
Any help would be greatly appreciated!
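I suggest changing the regex pattern to use capturing groups and word-boundary tokens (the commas are also removed, because inside [...] a comma is matched as a literal character, not treated as a separator):
\b([ABHMNPSTUVX\+][ABHMNPSTUVX\+])\s([0-9]{1,2})\b
Then, for the replacement string (it joins the two captured groups without the space):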
$1$2
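Putting the pieces together, here is a minimal sketch (not the answerer's exact code) that applies the pattern and the $1$2 replacement to each imported line and writes the result back. JoinPeakType and JoinPeakTypes are hypothetical names, the A18:A1000 range is an assumption taken from the question, and it uses late binding (CreateObject), so no reference to the VBScript Regular Expressions library is required. Unlike the question's pattern, the character classes here contain no commas.

```vba
' Hypothetical helper: joins a two-letter peak type and its 1-2 digit count,
' e.g. "PV 4" -> "PV4". Lines without such a pair come back unchanged.
Function JoinPeakType(ByVal s As String) As String
    Static regEx As Object
    If regEx Is Nothing Then
        ' Late binding: created once and reused on later calls
        Set regEx = CreateObject("VBScript.RegExp")
        regEx.Pattern = "\b([ABHMNPSTUVX+][ABHMNPSTUVX+])\s([0-9]{1,2})\b"
        regEx.Global = False      ' at most one peak-type field per line
        regEx.IgnoreCase = False
    End If
    ' $1 and $2 are the two captured groups; the space between them is dropped
    JoinPeakType = regEx.Replace(s, "$1$2")
End Function

' Hypothetical driver: rewrites the imported lines in place.
' A18:A1000 is an assumption; adjust it to where your data actually sits.
Sub JoinPeakTypes()
    Dim cell As Range
    Dim oldText As String, newText As String
    For Each cell In ActiveSheet.Range("A18:A1000")
        If Not IsError(cell.Value) Then
            oldText = CStr(cell.Value)
            newText = JoinPeakType(oldText)
            ' Only write back lines that actually changed
            If newText <> oldText Then cell.Value = newText
        End If
    Next cell
End Sub
```

A line such as "2 9.272 1451 1481 1484 VB 3961552 ..." is left alone, because the trailing \b cannot sit between two digits of 3961552. One caveat: \b needs a letter or digit on one side, so a peak type that begins with + will not be matched by this pattern.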
Help Needed: How to get only the decimal (real) number in a string using an Excel formula or VBA?
I have a string in column A with just one decimal number in it. I want to extract only that decimal (real) number, but my formula extracts the first number in the string. See below for details...
Current Situation:
I am using on Column B the Formula:
=LOOKUP(9.9E+307,--LEFT(MID(A1,MIN(FIND({1,2,3,4,5,6,7,8,9,0}, $A1&"1023456789")),999),ROW(INDIRECT("1:999"))))
Column A | Column B
"Some text"... The Value of Project 456 is 12.56 ... more text. | 456
Some Text"... Project 459 value is 13.5 ... "more text" | 459
Desired Situation:
I want to extract only the decimal (real) number from the string in Column A and ignore the numbers that don't contain decimals. Example:
Column A | Column B
"Some text"... The Value of Project 456 is 12.56 ... more text. | 12.56
Some Text"... Project 459 value is 13.5 ... "more text" | 13.5
Any help is appreciated; it could be an Excel formula or a VBA solution.
Thank you!
Try this UDF pasted into a standard public module code sheet.
Function realNum(str As String, _
Optional ndx As Integer = 1)
Dim tmp As String
Static rgx As Object
'with rgx as static, it only has to be created once; beneficial when filling a long column with this UDF
If rgx Is Nothing Then
Set rgx = CreateObject("VBScript.RegExp")
End If
realNum = 0
With rgx
.Global = True
.IgnoreCase = True
.MultiLine = True
.Pattern = "[0-9]{1,9}\.[0-9]{1,9}"
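If .Test(str) Then
Dim mc As Object
Set mc = .Execute(str)
'return 0 rather than an error when there are fewer than ndx decimals in the string
If mc.Count >= ndx Then realNum = CDbl(mc(ndx - 1))
End If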
End With
End Function
Note that it defaults to the first occurrence (e.g. =realNum(A1)), but you can pass the optional second parameter to get the second, third, etc. (e.g. =realNum(A1, 2)).
This array formula sums all the numbers that contain a "." in each string. If there is only one per string, it returns just that one:
=SUM(IF((ISNUMBER(--TRIM(MID(SUBSTITUTE(A1," ",REPT(" ",99)),ROW($1:$25)*99-98,99)))) * (ISNUMBER(FIND(".",TRIM(MID(SUBSTITUTE(A1," ",REPT(" ",99)),ROW($1:$25)*99-98,99))))),--TRIM(MID(SUBSTITUTE(A1," ",REPT(" ",99)),ROW($1:$25)*99-98,99))))
Being an array formula, it must be confirmed with Ctrl+Shift+Enter instead of Enter when exiting edit mode.
Just wondering if this is possible, and how I would go about doing it... I'm not sure if I can use conditional formatting, or if it requires VBA, or what.
I have a VLOOKUP that I split the coding for, but it only works when the code includes ".000000.00000.0000.0000" at the end. How would I go about adding ".000000.00000.0000.0000" when the cell has fewer than 46 characters? The coding is always 46 characters when the zeroes are included, so anything shorter is invalid.
I was able to use =IF(LEN(A2) < 23, "0" & LEFT(A2,15),LEFT(A2,16)) to truncate what I needed and add leading zeroes if not included to have the end result do what I needed without all of the zeroes.
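For the padding question itself, a VBA sketch along these lines would append the suffix only when the cell is shorter than 46 characters. PadCode is a hypothetical name, and it assumes (as the question states) that a valid code is exactly 46 characters once the 23-character suffix is present. The plain worksheet equivalent would be =IF(LEN(A2)<46, A2&".000000.00000.0000.0000", A2).

```vba
' Hypothetical UDF: appends the zero suffix to codes that are missing it.
' Assumes a complete code is 46 characters and the suffix is 23 characters.
Public Function PadCode(ByVal code As String) As String
    Const SUFFIX As String = ".000000.00000.0000.0000"
    Const FULL_LENGTH As Long = 46

    If Len(code) < FULL_LENGTH Then
        ' Note: a code shorter than 23 characters is still short after padding
        PadCode = code & SUFFIX
    Else
        PadCode = code        ' already complete (or longer): leave as-is
    End If
End Function
```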
try this
Sub test()
Dim i As Long
Dim text As String
For i = 1 To 32767
text = text & "a"
Next
text = text & "end"
Range("A1").Value = text
'"end" will be lost
End Sub
I have a group of cells where the start of the string never changes: it is, and always will be (until the coder changes it), 20 characters long (including spaces).
I then want to extract the 3 numbers (and in some cases 2) from the remaining sequence.
The monthly cost is 2 silver, 1 copper and 40 iron.
The monthly cost is 1 silver, 94 copper and 40 iron.
The monthly cost is 1 silver and 75 copper.
The monthly cost is 8 silver and 40 copper.
The monthly cost is 1 silver.
The monthly cost is 99 silver, 99 copper and 99 iron.
The monthly cost is 1 gold.
In the sample above you can see that there is no set value after the first 20 chars.
1 to 99 silver
1 to 99 copper
0 to 99 iron
I can't find a formula that gets all the cells right. I've tried the following:
=IF(J7<>1,(MID(TRIM(J7),FIND(" iron",TRIM(J7))-2,FIND(" iron",TRIM(J7))-FIND(" iron",TRIM(J7))+3)),"")
results in: #VALUE! (when no iron)
=TRIM(MID(J6,FIND(" silver",J6)-2,LEN(J6)-FIND(" silver",J6)-26))&TRIM(MID(J6,FIND(" copper",J6)-2,LEN(J6)-FIND(" copper",J6)-16))&TRIM(MID(J6,FIND(" iron",J6)-2,LEN(J6)-FIND(" iron",J6)-3))
results in: 1 s9440
=MID(J7,31,2-ISERR(MID(J7,21,1)+0))
results in: nd
If I concatenate the pieces with &, the result doesn't work in the next mathematical step, because I've had to allow for spaces in case the numbers have 2 digits rather than 1.
=MID(J5,SEARCH(" silver",J5,1)-2,2)&MID(J5,SEARCH(" copper",J5,1)-2,2)&MID(J5,SEARCH(" iron",J5,1)-2,2)
results: 2 140
not: 2140
What I need to end up with is:
2140
19440
175
840
1
999999
Many thanks in advance.
This formula worked for me with your data, assuming text string in cell A1
=IFERROR(MID(A1,SEARCH("silver",A1)-3,2)+0,"")&IFERROR(MID(A1,SEARCH("copper",A1)-3,2)+0,"")&IFERROR(MID(A1,SEARCH("iron",A1)-3,2)+0,"")
I assume you don't want the value for "Gold"?
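If you'd rather do the same lookup (the number in front of each coin name) in VBA, a sketch could pull each coin's count with a regular expression and concatenate them in silver, copper, iron order. CoinDigits is a hypothetical name; like the formula above, it ignores gold and assumes each coin name appears at most once per string.

```vba
' Hypothetical UDF: concatenates the silver, copper and iron counts,
' e.g. "The monthly cost is 1 silver, 94 copper and 40 iron." -> "19440".
Public Function CoinDigits(ByVal s As String) As String
    Dim re As Object, mc As Object
    Dim coins As Variant, coin As Variant
    Dim result As String

    Set re = CreateObject("VBScript.RegExp")   ' late bound, no reference needed
    re.IgnoreCase = True
    coins = Array("silver", "copper", "iron")  ' gold is deliberately left out

    For Each coin In coins
        ' Capture the run of digits immediately before " <coin>"
        re.Pattern = "(\d+)\s+" & coin
        Set mc = re.Execute(s)
        If mc.Count > 0 Then result = result & mc(0).SubMatches(0)
    Next coin
    CoinDigits = result
End Function
```

The result is text; if you need a number for the next calculation, =--CoinDigits(A1) converts it, although a gold-only row would then give #VALUE! because the result is empty.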
When it comes to pattern matching in strings, RegEx is often the way to go.
In Excel, this requires a VBA solution using a reference to "Microsoft VBScript Regular Expressions 5.5" (or you can use late binding if you prefer).
Here's a starter for your case, as a UDF.
Use it as a formula like =GetValues(A1), assuming the first raw data is in A1, and copy it down for as many rows as required.
This will extract up to 3 values from a string.
Function GetValues(r As Range) As Variant
Dim re As RegExp
Dim m As MatchCollection
Dim v As Variant
Dim i As Long
Set re = New RegExp
re.Pattern = "(\d+)\D+(\d+)\D+(\d+)"
If re.test(r.Value) Then
Set m = re.Execute(r.Value)
Else
re.Pattern = "(\d+)\D+(\d+)"
If re.test(r.Value) Then
Set m = re.Execute(r.Value)
Else
re.Pattern = "(\d+)"
If re.test(r.Value) Then
Set m = re.Execute(r.Value)
End If
End If
End If
If m Is Nothing Then
GetValues = vbNullString
Else
For i = 0 To m.Item(0).SubMatches.Count - 1
v = v & m.Item(0).SubMatches(i)
Next
GetValues = v
End If
End Function
Since you are just extracting the digits, you can use a short one-shot RegExp if you want the VBA route:
Function GetDigits(strIn As String) As String
Dim objRegex As Object
Set objRegex = CreateObject("vbscript.regexp")
With objRegex
.Pattern = "[^\d]+"
.Global = True
GetDigits = .Replace(strIn, vbNullString)
End With
End Function
Here's another method, using worksheet formulas, for returning all of the digits in a string. Harlan Grove put it out there many years ago.
First define a Name (with Workbook scope):
Seq
Refers to: =ROW(INDEX($1:$65536,1,1):INDEX($1:$65536,255,1))
Then, assuming your string is in A1, use the following array-entered formula. (The formula is entered by holding down Ctrl+Shift while pressing Enter. If you do this correctly, Excel will place braces {...} around the formula.)
=SUM(IF(ISNUMBER(1/(MID(A1,Seq,1)+1)),MID(A1,Seq,1)*10^MMULT(-(Seq<TRANSPOSE(Seq)),-ISNUMBER(1/(MID(A1,Seq,1)+1)))))
I have a chemical formula column with about 40,000 entries, and I want to count how many atoms of each element each chemical formula contains. For example:
EXACT_MASS FORMULA
626.491026 C40H66O5
275.173274 C13H25NO5
I need some kind of formula that returns this result:
C H O
40 66 5
13 25 5
all as separate columns for the different elements and in rows for the different entries. Is there a formula that can do this?
You could make your own formula.
Open the VBA editor with ALT and F11 and insert a new module.
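Add a reference to Microsoft VBScript Regular Expressions 5.5 by clicking Tools, then References. (The code below creates the object with CreateObject, i.e. late binding, so this step is optional.)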
Now add the following code:
Public Function FormulaSplit(theFormula As String, theLetter As String) As String
Dim RE As Object
Set RE = CreateObject("VBScript.RegExp")
With RE
.Global = True
.MultiLine = False
.IgnoreCase = False
.Pattern = "[A-Z]{1}[a-z]?"
End With
Dim Matches As Object
Set Matches = RE.Execute(theFormula)
Dim TheCollection As Collection
Set TheCollection = New Collection
Dim i As Integer
Dim Match As Object
For i = (Matches.Count - 1) To 0 Step -1
Set Match = Matches.Item(i)
TheCollection.Add Mid(theFormula, Match.FirstIndex + (Len(Match.Value) + 1)), UCase(Trim(Match.Value))
theFormula = Left(theFormula, Match.FirstIndex)
Next
FormulaSplit = "Not found"
On Error Resume Next
FormulaSplit = TheCollection.Item(UCase(Trim(theLetter)))
On Error GoTo 0
If FormulaSplit = "" Then
FormulaSplit = "1"
End If
Set RE = Nothing
Set Matches = Nothing
Set Match = Nothing
Set TheCollection = Nothing
End Function
Usage:
FormulaSplit("C40H66O5", "H") would return 66.
FormulaSplit("C40H66O5", "O") would return 5.
FormulaSplit("C40H66O5", "blah") would return "Not found".
You can use this formula directly in your workbook.
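One limitation worth knowing: Collection.Add raises an error on a duplicate key, so a formula that repeats an element, such as CH3COOH, makes FormulaSplit return #VALUE!. If that matters, a sketch that sums every occurrence instead could use two capture groups (symbol, then count). AtomCount is a hypothetical name, and it still does not expand parentheses such as Ca(OH)2.

```vba
' Hypothetical UDF: total atoms of one element, summing repeated occurrences.
' Late-bound VBScript.RegExp; parentheses and hydrates are not handled.
Public Function AtomCount(ByVal theFormula As String, ByVal theElement As String) As Long
    Dim re As Object, m As Object
    Dim total As Long

    Set re = CreateObject("VBScript.RegExp")
    re.Global = True
    re.IgnoreCase = False
    ' Group 1 = element symbol (capital + optional lowercase), group 2 = count
    re.Pattern = "([A-Z][a-z]?)(\d*)"

    For Each m In re.Execute(theFormula)
        ' Case-sensitive compare, so "Co" (cobalt) is not confused with "C"
        If m.SubMatches(0) = theElement Then
            If m.SubMatches(1) = "" Then
                total = total + 1          ' no digits means one atom
            Else
                total = total + CLng(m.SubMatches(1))
            End If
        End If
    Next m
    AtomCount = total
End Function
```

Unlike FormulaSplit, this returns 0 (not "Not found") for an element that is absent.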
I've had a stab at doing this in a formula and came up with the following:
=IFERROR((MID($C18,FIND(D17,$C18)+1,2))*1,IFERROR((MID($C18,FIND(D17,$C18)+1,1))*1,IFERROR(IF(FIND(D17,$C18)>0,1),0)))
It's not very neat, and it would have to be expanded further if any of your elements appear more than 99 times. I also used an arbitrary placement on my worksheet: the chemical formulas are in column C starting at C18, and the element titles H, C and O are in row 17. I would personally go with Jamie's answer, but I wanted to see whether it was possible in a formula and figured it was worth sharing as another perspective.
Even though this has an excellent (and accepted) VBA solution, I couldn't resist the challenge to do this without using VBA.
I posted a solution earlier, which wouldn't work in all cases. This new code should always work:
=MAX(
IFERROR(IF(FIND(C$1&ROW($1:$99),$B2),ROW($1:$99),0),0),
IFERROR(IF(FIND(C$1&CHAR(ROW($65:$90)),$B2&"Z"),1,0),0)
)
Enter as an array formula: Ctrl + Shift + Enter
The formula outputs 0 when not found, and I simply used conditional formatting to turn zeroes gray.
How it works
This part of the formula looks for the element, followed by a number between 1 and 99. If found, the number of atoms is returned. Otherwise, 0 is returned. The results are stored in an array:
IFERROR(IF(FIND(C$1&ROW($1:$99),$B2),ROW($1:$99),0),0)
In the case of C13H25NO5, a search for "C" returns this array:
{1,0,0,0,0,0,0,0,0,0,0,0,13,0,0,0,...,0}
1 is the first array element, because C1 is a match. 13 is the thirteenth array element, and that's what we're interested in.
The next part of the formula looks for the element, followed by an uppercase letter, which indicates a new element. (The letters A through Z are characters 65 through 90.) If found, the number 1 is returned. Otherwise, 0 is returned. The results are stored in an array:
IFERROR(IF(FIND(C$1&CHAR(ROW($65:$90)),$B2&"Z"),1,0),0)
"Z" is appended to the chemical formula, so that a match will be found when its last element has no number. (For example, "H2O".) There is no element "Z" in the Periodic Table, so this won't cause a problem.
In the case of C13H25NO5, a search for "N" returns this array:
{0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0}
1 is the 15th element in the array. That's because it found the letters "NO", and O is the 15th letter of the alphabet.
Taking the maximum value from each array gives us the number of atoms as desired.
If I have G4ED7883666, I want the output to be 7883666.
I have to apply this to a range of cells that are not all the same length, and the only thing they have in common is that I need to delete everything before the trailing number (i.e. up to and including the last letter). How can I do this?
This formula finds the last number in a string, that is, all digits to the right of the last alpha character in the string.
=RIGHT(A1,MATCH(99,IFERROR(1*MID(A1,LEN(A1)+1-ROW($1:$25),1),99),0)-1)
Note that this is an array formula and must be entered with the Control-Shift-Enter keyboard combination.
How the formula works
Let's assume that the target string is fairly simple: "G4E78"
Working outward from the middle of the formula, the first thing to do is create an array with the elements 1 through 25. (Although this might seem to limit the formula to strings of no more than 25 characters, it actually limits the number that can be extracted to 25 digits.)
ROW($1:$25) = {1;2;3;4;5;6;7; etc.}
Subtracting this array from (1 + the length of the target string) produces a new array whose elements count down from the length of the string. The first five elements correspond to the positions of the characters of the string, in reverse order!
LEN(A1)+1-ROW($1:$25) = {5;4;3;2;1;0;-1;-2;-3;-4; etc.}
The MID function then creates a new array that reverses the order of the characters of the string.
For example, the first element of the new array is the result of MID(A1, 5, 1), the second of MID(A1, 4, 1) and so on. The #VALUE! errors reflect the fact that MID cannot evaluate 0 or negative values as the position of a string, e.g., MID(A1,0,1) = #VALUE!.
MID(A1,LEN(A1)+1-ROW($1:$25),1) = {"8";"7";"E";"4";"G";#VALUE!;#VALUE!; etc.}
Multiplying the elements of the array by 1 converts the digit characters to numbers and turns the letters into #VALUE! errors as well.
1*MID(A1,LEN(A1)+1-ROW($1:$25),1) = {8;7;#VALUE!;4;#VALUE!;#VALUE!;#VALUE!; etc.}
The IFERROR function then turns the #VALUE! errors into 99, which is just an arbitrary number larger than any single digit.
IFERROR(1*MID(A1,LEN(A1)+1-ROW($1:$25),1),99) = {8;7;99;4;99;99;99; etc.}
Matching on the 99 gives the position of the first non-digit character counting from the right end of the string. In this case, "E" is the first non-digit in the reversed string "87E4G", at position 3. This is equivalent to saying that the number we are looking for at the end of the string, plus the "E", is 3 characters long.
MATCH(99,IFERROR(1*MID(A1,LEN(A1)+1-ROW($1:$25),1),99),0) = 3
So, for the final step, we take 3 - 1 (for the "E") characters from the right of the string.
RIGHT(A1,MATCH(99,IFERROR(1*MID(A1,LEN(A1)+1-ROW($1:$25),1),99),0)-1) = "78"
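One more submission for you to consider: this VBA function returns the rightmost digits, i.e. everything after the last non-numeric character.
Public Function GetRightNumbers(str As String) As String
Dim i As Long
'walk backwards from the last character until a non-digit is found
'(stopping at 1, because Mid(str, 0, 1) raises a run-time error)
For i = Len(str) To 1 Step -1
If Not IsNumeric(Mid(str, i, 1)) Then
Exit For
End If
Next i
'if every character is a digit, the loop ends with i = 0 and the whole string is returned
GetRightNumbers = Mid(str, i + 1)
End Function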
You can write some VBA to format the data (just start at the end and work back until you hit a non-digit).
Or, if you're happy to install an add-in like Excelicious, you can use regular expressions to format the text via a formula. An expression like [0-9]+$ would return all the digits at the end of a string.
NOTE: This uses the regex pattern in James Snell's answer, so please upvote his answer if you find this useful.
Your best bet is to use a regular expression. You need to set a reference to VBScript Regular Expressions for this to work. Tools --> References...
Now you can use regex in your VBA.
This will find the numbers at the end of each cell. I am placing the result next to the original so that you can verify it is working the way you want. You can modify it to replace the cell as soon as you feel comfortable with it. The code works regardless of the length of the string you are evaluating, and will skip the cell if it doesn't find a match.
Sub GetTrailingNumbers()
Dim ws As Worksheet
Dim rng As Range
Dim cell As Range
Dim result As Object, results As Object
Dim regEx As New VBScript_RegExp_55.RegExp
Set ws = ThisWorkbook.Sheets("Sheet1")
' range is hard-coded here, but you can define
' it programmatically based on the shape of your data
Set rng = ws.Range("A1:A3")
' pattern from James Snell's answer
regEx.Pattern = "[0-9]+$"
For Each cell In rng
If regEx.Test(cell.Value) Then
Set results = regEx.Execute(cell.Value)
For Each result In results
cell.Offset(, 1).Value = result.Value
Next result
End If
Next cell
End Sub
Takes the last 4 characters of num (counting from the right):
num1 = Right(num, 4)
Takes the first 5 characters of num (counting from the left):
num1 = Left(num, 5)
Takes the first ten characters from the left, then the last four of those (i.e. characters 7 to 10):
num1 = Right(Left(num, 10), 4)
In your case (this assumes the trailing number is always 7 digits long):
num = "G4ED7883666"
num1 = Right(num, 7)