Extracting digits from a cell with varying char length - excel

I have a group of cells, the first of the string never changes, it is and always will (until the coder changes it) 20 characters (inc spaces).
I then want to extract the 3 numbers (and in some cases 2) from the remaining sequence.
The monthly cost is 2 silver, 1 copper and 40 iron.
The monthly cost is 1 silver, 94 copper and 40 iron.
The monthly cost is 1 silver and 75 copper.
The monthly cost is 8 silver and 40 copper.
The monthly cost is 1 silver.
The monthly cost is 99 silver, 99 copper and 99 iron.
The monthly cost is 1 gold.
In the sample above you can see that there is no set value after the first 20 chars.
1 or 99 silver
1 or 99 copper
0, 1 or 99 iron
I can't get a sequence that gets all the cells correct, I've tried the following:
=IF(J7<>1,(MID(TRIM(J7),FIND(" iron",TRIM(J7))-2,FIND(" iron",TRIM(J7))-FIND(" iron",TRIM(J7))+3)),"")
results in: #VALUE! (when no iron)
=TRIM(MID(J6,FIND(" silver",J6)-2,LEN(J6)-FIND(" silver",J6)-26))&TRIM(MID(J6,FIND(" copper",J6)-2,LEN(J6)-FIND(" copper",J6)-16))&TRIM(MID(J6,FIND(" iron",J6)-2,LEN(J6)-FIND(" iron",J6)-3))
results in: 1 s9440
=MID(J7,31,2-ISERR(MID(J7,21,1)+0))
results in: nd
If I & the cells as part of the calculation, they then don't calculate in the next mathematical step as I've had to allow for spaces in my code, in the case that there may be 2 digit numbers, not single.
=MID(J5,SEARCH(" silver",J5,1)-2,2)&MID(J5,SEARCH(" copper",J5,1)-2,2)&MID(J5,SEARCH(" iron",J5,1)-2,2)
results: 2 140
not: 2140
What I need to end up with is:
2140
19440
175
840
1
999999
Many thanks in advance.

This formula worked for me with your data, assuming text string in cell A1
=IFERROR(MID(A1,SEARCH("silver",A1)-3,2)+0,"")&IFERROR(MID(A1,SEARCH("copper",A1)-3,2)+0,"")&IFERROR(MID(A1,SEARCH("iron",A1)-3,2)+0,"")
I assume you don't want the value for "Gold"?

When it comes to pattern matching in strings, RegEx if often the way to go.
In Excel, this requires a VBA solution, using a reference to "Microsoft VBScript Regular Expresions 5.5" (you can go late bound if you prefer)
Here's a starter for your case, as a UDF
Use it as a formula like =GetValues(A1) assuming 1st raw data is in A1. Copy down for as many rows as required
This will extract up to 3 values from a string.
Function GetValues(r As Range) As Variant
Dim re As RegExp
Dim m As MatchCollection
Dim v As Variant
Dim i As Long
Set re = New RegExp
re.Pattern = "(\d+)\D+(\d+)\D+(\d+)"
If re.test(r.Value) Then
Set m = re.Execute(r.Value)
Else
re.Pattern = "(\d+)\D+(\d+)"
If re.test(r.Value) Then
Set m = re.Execute(r.Value)
Else
re.Pattern = "(\d+)"
If re.test(r.Value) Then
Set m = re.Execute(r.Value)
End If
End If
End If
If m Is Nothing Then
GetValues = vbNullString
Else
For i = 0 To m.Item(0).SubMatches.Count - 1
v = v & m.Item(0).SubMatches(i)
Next
GetValues = v
End If
End Function

Since you are just stripping digits you can use a short one-shot RegExp if you wanted the VBA route:
Function GetDigits(strIn As String) As String
Dim objRegex As Object
Set objRegex = CreateObject("vbscript.regexp")
With objRegex
.Pattern = "[^\d]+"
.Global = True
GetDigits = .Replace(strIn, vbNullString)
End With
End Function

Here's another method, using worksheet formulas, for returning all of the digits in a string. Harlan Grove put it out there many years ago.
First define a Name (with Workbook scope):
Seq
Refers to: =ROW(INDEX($1:$65536,1,1):INDEX($1:$65536,255,1))
Then, assuming your string is in A1, use the following array-entered formula. (Formula is entered by holding down ctrl+shift while hitting Enter. (If you do this correctly, Excel will place braces {...} around the formula.
=SUM(IF(ISNUMBER(1/(MID(A1,Seq,1)+1)),MID(A1,Seq,1)*10^MMULT(-(Seq<TRANSPOSE(Seq)),-ISNUMBER(1/(MID(A1,Seq,1)+1)))))

Related

Using VBA Regex To Remove a space within a pattern

I am very new to VBA and I'm working with data from a chemistry instrument which outputs values that are not uniformly delimited and contain special characters. I am trying to import these values into excel and have solved pretty much all of the problems except for one. When I am importing these values into excel they are read in line-by-line. Each line that is read-in is contained within its own cell in column A. There can be anywhere from 50- roughly 1000 columns of data, with the associated identifiers and metadata above. Below is a copy/paste of the first 5 lines of data.
1 7.724 1190 1231 1292 PV 4 724391 121434659 49.60% 9.688%
2 9.272 1451 1481 1484 VB 3961552 186833117 76.32% 14.905%
3 10.968 1732 1754 1816 VV 2673526 111034313 45.36% 8.858%
4 15.249 2382 2445 2453 PV 296082 33844178 13.82% 2.700%
5 15.384 2453 2466 2500 VV 219908 14461812 5.91% 1.154%
The problem I am having is that there are times when there are multiple peaks that make up one value and are recorded as 2 letters a space and one to two numbers (0-9), whereas peak types with only one peak are just two letters. For an example please look in line 1 where there is "PV 4". I am trying to use regular expressions to loop through the A column, starting at row 18 and ending around row 1000, to find the letters and associated numbers, and remove the interstitial space so that he cell will look like this:
1 7.724 1190 1231 1292 PV4 724391 121434659 49.60% 9.688%
Once it is in that form, I can use the space delimiter to separate the cells without frame shifting the ones that have the multiple peak types.
Here is the code I've written so far, but I am unsure how to proceed:
Sub PKTYRegexRemoveSpace()
Dim StrPattern As String: StrPattern = "[A,B,H,M,N,P,S,T,U,V,X,\+][A,B,H,M,N,P,S,T,U,V,X,\+]\s[0-9]{1,2}\s"
Dim StrInput As String
Dim MyRange As Range
Dim regEx As New RegExp
Dim Cell As Range
Set MyRange = ActiveSheet.Range("A22:A24")
For Each Cell In MyRange
If StrPattern <> "" Then
StrInput = Cell.Value
With regEx
.Pattern = StrPattern
.Global = False
.IgnoreCase = False
End With
If regEx.Test(StrInput) Then
MsgBox (regEx.Replace(StrInput, *this is where I need help*))
Else
MsgBox ("Not matched")
End If
End If
Next
End Sub
I am using a msgbox during devlopment in order to avoid having to re-import the file for every failed replacement attempt.
Any help would be greatly appreciated!
I suggest change the regex Pattern to use capturing groups and word boundary tokens
\b([A,B,H,M,N,P,S,T,U,V,X,\+][A,B,H,M,N,P,S,T,U,V,X,\+])\s([0-9]{1,2})\b
Then, for the replace string:
$1$2

Formula to search 3 words "text" in any sequence in different cell

Please take a look at the attached image. I have a long list of items and I've created a common keywords to search in that list. I'm using this formula:
=INDEX(A:A,MATCH((("*"&B2&"*")&("*"&C2&"*")&("*"&D2&"*")&("*"&E2&"*")&("*"&F2&"*")),A:A,0))
The problem that the search is going through the same sequence that I entered.
It gives error if the sequence of the words in the cell is different than the sequence in my formula which make sense.
Is there a way I can search for 3 or more words that are existing in any cell in any sequence?
I am open to using VBA if necessary.
My search results:
Here is the user defined function:
Public Function indexMX(rng As Range, pat1 As Range, pat2 As Range, pat3 As Range, pat4 As Range, pat5 As Range) As Variant
Dim r As Range, rngx As Range, s(1 To 5) As String, Kount As Long, j As Long
s(1) = pat1.Value
s(2) = pat2.Value
s(3) = pat3.Value
s(4) = pat4.Value
s(5) = pat5.Value
Set rngx = Intersect(rng, rng.Parent.UsedRange)
For Each r In rngx
v = r.Value
Kount = 0
For j = 1 To 5
If InStr(1, v, s(j)) > 0 Or s(j) = "" Then Kount = Kount + 1
Next j
If Kount = 5 Then
indexMX = v
Exit Function
End If
Next r
indexMX = "no luck"
End Function
Here is an example of its usage:
As you see, we give the UDF() the address of the column and the addresses of the five keywords and the UDF() finds the first item containing all five words.
If a keyword is blank, it is not used. (so if you want to search for only two keywords, leave the other three blank). If no matches are found the phrase no luck is returned.
User Defined Functions (UDFs) are very easy to install and use:
ALT-F11 brings up the VBE window
ALT-I
ALT-M opens a fresh module
paste the stuff in and close the VBE window
If you save the workbook, the UDF will be saved with it.
If you are using a version of Excel later then 2003, you must save
the file as .xlsm rather than .xlsx
To remove the UDF:
bring up the VBE window as above
clear the code out
close the VBE window
To use the UDF from Excel:
=myfunction(A1)
To learn more about macros in general, see:
http://www.mvps.org/dmcritchie/excel/getstarted.htm
and
http://msdn.microsoft.com/en-us/library/ee814735(v=office.14).aspx
and for specifics on UDFs, see:
http://www.cpearson.com/excel/WritingFunctionsInVBA.aspx
Macros must be enabled for this to work!
EDIT#1:
to remove case sensitivity, replace:
If InStr(1, v, s(j)) > 0 Or s(j) = "" Then Kount = Kount + 1
with:
If InStr(1, LCase(v), LCase(s(j))) > 0 Or s(j) = "" Then Kount = Kount + 1
Yes, it is possible for a single cell to return three matching words from a different cell. The answer in this example uses a formula to return 6 matches. VBA and special array functions are not used.
This is the formula:
=SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUMPRODUCT((IFERROR(SEARCH(FIRST,TARGET),0)>0)*100000+(IFERROR(SEARCH(SECOND,TARGET),0)>0)*20000+(IFERROR(SEARCH(THIRD,TARGET),0)>0)*3000+(IFERROR(SEARCH(FOURTH,TARGET),0)>0)*400+(IFERROR(SEARCH(FIFTH,TARGET),0)>0)*50+(IFERROR(SEARCH(SIXTH,TARGET),0)>0)*6),"1",FIRST&DELIM),"2",SECOND&DELIM),"3",THIRD&DELIM),"4",FOURTH&DELIM),"5",FIFTH&DELIM),"6",SIXTH&DELIM),"0","")
Did you notice the named ranges? FIRST, SECOND, THIRD, etc are individual cells and each one holds a word. We are trying to find those words inside TARGET. If we find the words, then we will write them in this cell holding this formula and each word will be separated by DELIM
The ranges are optional. In the picture below, you'll see cell A2 contains the word "named". This is the first of the six words we're tying to find and it can be expressed as FIRST = "A2" = "named" Inside the formula you'll see that FIRST appears twice. You could replace it with "named" and the cell A2 would become blank but the functionality of the formula would not change.
Even TARGET is optional. It could be written as E1 or typed out word for word.
I don't know why anyone would do that... but it is possible.
DELIM is at cell B2, it is a double space
Now to explain how it works
SEARCH(search for what?, search where?) This is responsible for determining if a match exists or not. If you understand what the named ranges then you've already figured out the syntax is. The location of first letter that of match in TARGET is returned. In this formula it is always 1 If it is not found then the number is 0
IFERROR(value,value) tries to perform the operation. If successful then the result is displayed. If there's an error the second result is displayed. Every IFERROR in this formula is practically the same: IFERROR(SEARCH(FIRST,TARGET),0) It searches inside TARGET trying to find the FIRST word. Result if found is 1 and if not found is 0
It gets a little more complicated from here so lets recap. We're calling SEARCH 6 times. Once for each word we want to find and we're always looking in TARGET. Result will be a 1 if match is found or a 0 if not. Ironically, us humans can put it together and see the match but the formula can't determine which words have been matched without more information
SUMPRODUCT takes the sum (addition) of the product (multiplication) of two or more arrays.
multiply two arrays to get the product
a, b, c * e, f, g = ae, bf, cg
takethe sum of the product to get the SUMPRODUCT
ae + bf + cg`
This is easiest when thought of a price and quantity. If one array is the price of a group of items and the other is quantity of the same group of items, then multiplying the two arrays will create a new array where each element is the cost to buy all items of that type in the group the total, and adding all those numbers give you the total cost you'd pay for all of the items
Here we multiply two arrays:
Qty Price
12.0 0.3
70.0 0.1
20.0 0.4
Multiply them to get the product:
Qty Price Total
12.0 0.3 3.8
70.0 0.1 7.0
20.0 0.4 8.0
Take the sum of the product:
Qty Price Total
12.0 0.3 3.8
70.0 0.1 7.0
20.0 0.4 8.0
18.8 SUMPRODUCT
Lets look at part of the formula:
SUMPRODUCT((IFERROR(SEARCH(FIRST,TARGET),0)>0)*100000+IFERROR(SEARCH(SECOND,TARGET),0)>0)*20000+...
It is easy to see this segment is looking for two words. We know SUMPRODUCT want's to multiply and add arrays. If you're thinking (IFERROR(SEARCH(FIRST,TARGET),0)>0) is an array, you'd be right! It's not an array in the technical sense of the word, but it does evaluate into a single value which can thought of as a 1x1 array, or, a cell. The sharp eyed and quick witted, may have noticed there is something on this array we have mentioned. It's the inequality at the end! Many of you know that you can take numerical values and turn them into boolean by testing them with an inequality. So lets evaluate.... SEARCH for FIRST inside TARGET = 1 because FIRST = "named" which is inside TARGET waaaaaaay in the back and because it wasn't an error we get to keep the 1. Next we do the inequality 1 > 0 = TRUE One is greater than zero and evaluates to TRUE
This is what we have right now
SUMPRODUCT((TRUE*100000+IFERROR(SEARCH(SECOND,TARGET),0)>0)*20000+....
Can you identify the arrays now? We know TRUE is an array, a 1x1. You know the IFERROR bit all the way to the inequality is also an array. Lets evaluate that IFERROR .... Mathematically we should still be working from left to right but trust me, we're ok if we let it slide this once.
IFERROR(SEARCH(SECOND,TARGET),0)>0) SECOND = "array" = 1 = TRUE
Did you follow my short hand? It's ok if you didn't just back up and practice on FIRST until you understand.
Plugging in the value gives us something like this
SUMPRODUCT(TRUE*100000+TRUE*20000+...
SUMPRODUCT is the SUM(addition) of the PRODUCT(multiplication)
So we're adding the stuff we multiply
SUMPRODUCT = (TRUE * 100000) + (TRUE * 20000)
Remember how easy it was to go from 1 > 0 to TRUE. We're "getting all up into that boolean TRUE that" equals 1 Here's a fun fact, -1 is also equal to TRUE. If you've ever seen a formula with a double negative in it like this STUFF(--(MORESTUFF( that's just some Excel wizard person making sure they get a +1 instead of a -1 ... ok, so lets get back on track and evalute
SUMPRODCUT = 1 * 100000 + 1 * 20000+....
SUMPRODCUT = 100000 + 20000+....
SUMPRODCUT = 120000+.....
I know you've been asking about those numbers. One hundred thousand? What's one hundred thousand for? I've been purposefully ignoring until it became convenient to talk about it. And now it's convenient. Go look at the whoooole formula and you'll find a pattern. Those numbers are in a decreasing sequence. Anyone who's ever done bitwise logic can see where this is going.I'm running short on time so I'll cut to the chase. Assume a hypothetical situation where every word was matched. You'd end up with
SUMPRODUCT = 100000 + 20000 + 3000 + 400 + 50 + 6
SUMPRODCUT = 123456
123456 are you pulling my leg? No, I am not. We're almost done so if you're still with me then you're gonna drive it home.
We have a large group of SUBSTITUTE teachers at the front of the line and we gotta get rid of them.
We also have this to contend with :"1",FIRST&DELIM),"2",SECOND&DELIM),"3",THIRD&DELIM),"4",FOURTH&DELIM),"5",FIFTH&DELIM),"6",SIXTH&DELIM),"0","")
Thankfully for us, they are part of the same problem. We worked our way from the middle out.
SUBSTITUTE(SUBSTITUTE(text, old text, new text)
SUBSTITUTE(SUBSTITUTE("123456","1", FIRST & DELIM),"2",SECOND & DELIM)....
Remember at top DELIM was pointing to a cell holding a double space. Each DELIM can be replaced with " " or any other delimiter you want.
SUBSTITUTE(SUBSTITUTE("123456","1", "named" & " "),"2",SECOND & DELIM)...
SUBSTITUTE("named 23456","2",SECOND & DELIM)...
SUBSTITUTE("named 23456","2","array"& " ")...
("named array 3456")... and so on.
Any questions?
Ok, class is dismissed!

How to get the Decimal (REAL) number Only in an string in using an Excel formula or VBA?

Help Needed: How to get the Decimal (REAL) number Only in an string in using an Excel formula or VBA?
I have in "column A" a string with just one decimal number on it. I want to extract that decimal (REAL) number ONLY but it is extracting the first number on the string. See below for details...
Current Situation:
I am using on Column B the Formula:
=LOOKUP(9.9E+307,--LEFT(MID(A1,MIN(FIND({1,2,3,4,5,6,7,8,9,0}, $A1&"1023456789")),999),ROW(INDIRECT("1:999"))))
Column A | Column B
"Some text"... The Value of Project 456 is 12.56 ... more text. | 456
Some Text"... Project 459 value is 13.5 ... "more text" | 459
Desired Situation:
I want to get the decimal (REAL) number ONLY out of the string and ignore the numbers that doesn't contain decimals from Column A. Example:
Column A | Column B
"Some text"... The Value of Project 456 is 12.56 ... more text. | 12.56
Some Text"... Project 459 value is 13.5 ... "more text" | 13.5
Any help needed is appreciated, could be an excel formula or VBA solution.
Thank you!
Try this UDF pasted into a standard public module code sheet.
Function realNum(str As String, _
Optional ndx As Integer = 1)
Dim tmp As String
Static rgx As Object
'with rgx as static, it only has to be created once; beneficial when filling a long column with this UDF
If rgx Is Nothing Then
Set rgx = CreateObject("VBScript.RegExp")
End If
realNum = 0
With rgx
.Global = True
.IgnoreCase = True
.MultiLine = True
.Pattern = "[0-9]{1,9}\.[0-9]{1,9}"
If .Test(str) Then
realNum = CDbl(.Execute(str)(ndx - 1))
End If
End With
End Function
Note that while it defaults to the first occurrence (e.g. B2), you can apply an optional parameter to get the second, third, etc. (e.g. B3).
This Array formula will sum all the number that have . in them, per stirng. If only one per string then it returns just that one:
=SUM(IF((ISNUMBER(--TRIM(MID(SUBSTITUTE(A1," ",REPT(" ",99)),ROW($1:$25)*99-98,99)))) * (ISNUMBER(FIND(".",TRIM(MID(SUBSTITUTE(A1," ",REPT(" ",99)),ROW($1:$25)*99-98,99))))),--TRIM(MID(SUBSTITUTE(A1," ",REPT(" ",99)),ROW($1:$25)*99-98,99))))
Being an array formula it must be confirmed with Ctrl-Shift-enter instead of Enter when exiting edit mode:

comparing two address columns

I have addresses in U and V. I want to see if they are somewhat similar and if they are say "Update" If not say "Omit".
For example 246 N High street in U and 246 North High St in V would return a value of Update.
246 N High Street in U and 458 Auburn Drive in V would return a value of Omit.
Any ideas?
There are a lot of algorithms for doing fuzzy matching. One of the easier ones to implement in excel is N-Gram.
To perform an n-gram match, we have to break each address up into a list of sets of smaller character lengths. A 2-gram list of your address 246 N High street would look like 24,46,6 , N,N , H,Hi,ig,gh,h , s,st,tr,re,ee,et. We could do the same with a 3-gram: 246,46 ,6 N, N ,N H, Hi,Hig,igh,gh ,h s, st,str,tre,ree,eet
We do this with both addresses, then we can check each item in the first address's list to see if it appears in the second address's list; count the matches and divide that by the number of items in the first list. That will give you a percentage of how close they are.
You could get fancy with cell formulas mid() and countif() to do this with sheet formulas, but I think it's easier to just write it out in VBA and make it a UDF.
Function NGramCompare(string1 As String, string2 As String, intGram As Integer) As Double
'Take in two strings and the N-gram
Dim intChar As Integer, intGramMatch As Integer
Dim ngramList1 As String, ngramList2 As String, nGram As Variant
Dim nGramArr1 As Variant
'split the first string into a list of ngrams
For intChar = 1 To Len(string1) - (intGram-1)
If ngramList1 <> "" Then ngramList1 = ngramList1 & ","
ngramList1 = ngramList1 & Mid(string1, intChar, intGram)
Next intChar
'split the secong string into a list of ngrams
For intChar = 1 To Len(string2) - (intGram-1)
If ngramList2 <> "" Then ngramList2 = ngramList2 & ","
ngramList2 = ngramList2 & Mid(string2, intChar, intGram)
Next intChar
'Split the ngramlist1 into an array through which we can iterate
nGramArr1 = Split(ngramList1, ",")
'Iterate through array and compare values to ngramlist2
For Each nGram In nGramArr1
If InStr(1, ngramList2, nGram) Then
'we found a match, add to the counter
intGramMatch = intGramMatch + 1
End If
Next nGram
'output the percentage of grams matching.
NGramCompare = intGramMatch / (UBound(nGramArr1) + 1)
End Function
If you've never used a UDF:
Go to visual basic editor (VBE) with Alt+F11
In the VBA Project window, find your workbook and right click on the name
Choose: Insert>>Module
Double click the new module in the list to bring up it's code window
Paste this function in and save your workbook
Then, assuming address1 is in A1 and address2 is in B1 you can put, in C1:
=NGramCompare(A1, B1, 2)
Which, for your first address, will spit out 56%. Which seems like a reasonably good match. If you find you are getting too many positive hits, you can change your 2-gram to be a 3-gram by changing that last parameter.
To take it a step further so it will say "Update" or "Omit" you could do:
=If(NGramCompare(A1, B1, 2)>.30, "Update", "Omit")
I just set that so that it will consider a match anything above 30%, but you can adjust as necessary. No matter where you set it, you will probably end up with a percentage of compares that are false positives or false negatives, but that's the way fuzzy matching goes.
Some of the naive approaches can be to compare the first few characters
=LEFT(A1,5)=LEFT(B1,5)
or to replace parts until they match
=(SUBSTITUTE(SUBSTITUTE(LOWER(A2)," street"," ST")," north "," N ")
=SUBSTITUTE(SUBSTITUTE(LOWER(B2)," street"," ST")," north "," N "))
both will probably turn into a big ugly formula after adjusting for most cases

How to count up elements in excel

So I have a column called chemical formula for like 40,000 entries, and what I want to be able to do is count up how many elements are contained in the chemical formula. So for example:-
EXACT_MASS FORMULA
626.491026 C40H66O5
275.173274 C13H25NO5
For this, I need some kind of formula that will return with the result of
C H O
40 66 5
13 25 5
all as separate columns for the different elements and in rows for the different entries. Is there a formula that can do this?
You could make your own formula.
Open the VBA editor with ALT and F11 and insert a new module.
Add a reference to Microsoft VBScript Regular Expressions 5.5 by clicking Tools, then references.
Now add the following code:
Public Function FormulaSplit(theFormula As String, theLetter As String) As String
Dim RE As Object
Set RE = CreateObject("VBScript.RegExp")
With RE
.Global = True
.MultiLine = False
.IgnoreCase = False
.Pattern = "[A-Z]{1}[a-z]?"
End With
Dim Matches As Object
Set Matches = RE.Execute(theFormula)
Dim TheCollection As Collection
Set TheCollection = New Collection
Dim i As Integer
Dim Match As Object
For i = (Matches.Count - 1) To 0 Step -1
Set Match = Matches.Item(i)
TheCollection.Add Mid(theFormula, Match.FirstIndex + (Len(Match.Value) + 1)), UCase(Trim(Match.Value))
theFormula = Left(theFormula, Match.FirstIndex)
Next
FormulaSplit = "Not found"
On Error Resume Next
FormulaSplit = TheCollection.Item(UCase(Trim(theLetter)))
On Error GoTo 0
If FormulaSplit = "" Then
FormulaSplit = "1"
End If
Set RE = Nothing
Set Matches = Nothing
Set Match = Nothing
Set TheCollection = Nothing
End Function
Usage:
FormulaSplit("C40H66O5", "H") would return 66.
FormulaSplit("C40H66O5", "O") would return 5.
FormulaSplit("C40H66O5", "blah") would return "Not found".
You can use this formula directly in your workbook.
I've had a stab at doing this in a formula nad come up with the following:
=IFERROR((MID($C18,FIND(D17,$C18)+1,2))*1,IFERROR((MID($C18,FIND(D17,$C18)+1,1))*1,IFERROR(IF(FIND(D17,$C18)>0,1),0)))
It's not very neat and would have to be expanded further if any of your elements are going to appear more than 99 times - I also used a random placement on my worksheet so the titles H,C and O are in row 17. I would personally go with Jamie's answer but just wanted to try this to see if I could do it in a formula possible and figured it was worth sharing just as another perspective.
Even though this has an excellent (and accepted) VBA solution, I couldn't resist the challenge to do this without using VBA.
I posted a solution earlier, which wouldn't work in all cases. This new code should always work:
=MAX(
IFERROR(IF(FIND(C$1&ROW($1:$99),$B2),ROW($1:$99),0),0),
IFERROR(IF(FIND(C$1&CHAR(ROW($65:$90)),$B2&"Z"),1,0),0)
)
Enter as an array formula: Ctrl + Shift + Enter
Output:
The formula outputs 0 when not found, and I simply used conditional formatting to turn zeroes gray.
How it works
This part of the formula looks for the element, followed by a number between 1 and 99. If found, the number of atoms is returned. Otherwise, 0 is returned. The results are stored in an array:
IFERROR(IF(FIND(C$1&ROW($1:$99),$B2),ROW($1:$99),0),0)
In the case of C13H25NO5, a search for "C" returns this array:
{1,0,0,0,0,0,0,0,0,0,0,0,13,0,0,0,...,0}
1 is the first array element, because C1 is a match. 13 is the thirteenth array element, and that's what we're interested in.
The next part of the formula looks for the element, followed by an uppercase letter, which indicates a new element. (The letters A through Z are characters 65 through 90.) If found, the number 1 is returned. Otherwise, 0 is returned. The results are stored in an array:
IFERROR(IF(FIND(C$1&CHAR(ROW($65:$90)),$B2&"Z"),1,0),0)
"Z" is appended to the chemical formula, so that a match will be found when its last element has no number. (For example, "H2O".) There is no element "Z" in the Periodic Table, so this won't cause a problem.
In the case of C13H25NO5, a search for "N" returns this array:
{0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0}
1 is the 15th element in the array. That's because it found the letters "NO", and O is the 15th letter of the alphabet.
Taking the maximum value from each array gives us the number of atoms as desired.

Resources