I have a list of strings in excel as such:
a>b>b>d>c>a
a>b>c>d
b>b>b>d>d>a
etc.
I want to extract the last c or last d from each string whichever comes last,
e.g
a>b>b>d>c>a = c
a>b>c>d = d
b>b>b>d>d>a = d
how would I do this using VBA (or just straight excel if it is possible)?
You could use an excel formula as follows
To help explain, I'll start with just one letter and show the full formula at the end.
First, find the number of occurrences of c:
= LEN(A1) - LEN(SUBSTITUTE(A1,"c",""))
Use this count as the instance number to replace the last c with a unique character ($ as an example):
=SUBSTITUTE(A1,"c","$",LEN(A1) - LEN(SUBSTITUTE(A1,"c","")))
Next find this unique character
= FIND("$",SUBSTITUTE(A1,"c","$",LEN(A1) - LEN(SUBSTITUTE(A1,"c",""))))
This gives the position of the last c. You can now use it in a MID function to return that last c:
= MID(A1,FIND("$",SUBSTITUTE(A1,"c","$",LEN(A1) - LEN(SUBSTITUTE(A1,"c","")))),1)
Finally, to account for both c and d, use MAX to pick whichever comes last:
= MID(A1,MAX(IFERROR(FIND("$",SUBSTITUTE(A1,"c","$",LEN(A1) - LEN(SUBSTITUTE(A1,"c","")))),0),IFERROR(FIND("$",SUBSTITUTE(A1,"d","$",LEN(A1) - LEN(SUBSTITUTE(A1,"d","")))),0)),1)
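The formula steps above can also be traced in VBA, which makes each intermediate value easy to inspect in the Immediate window. This is only a sketch: LastPos and LastCOrD are hypothetical names, and it assumes a single-character search letter and that "$" never appears in the input.

```vba
Option Explicit

' Hypothetical helper: position of the last occurrence of ch in s, or 0 if absent.
' Assumes ch is a single character and "$" does not occur in s.
Function LastPos(ByVal s As String, ByVal ch As String) As Long
    Dim n As Long
    ' Step 1: count the occurrences of ch
    n = Len(s) - Len(Replace(s, ch, ""))
    If n = 0 Then Exit Function
    ' Steps 2 and 3: mark the n-th (last) occurrence with "$", then locate the marker
    LastPos = InStr(1, Application.WorksheetFunction.Substitute(s, ch, "$", n), "$")
End Function

' Hypothetical wrapper: returns the last c or d, whichever comes later, or "" if neither exists.
Function LastCOrD(ByVal s As String) As String
    Dim p As Long
    p = LastPos(s, "c")
    If LastPos(s, "d") > p Then p = LastPos(s, "d")
    If p > 0 Then LastCOrD = Mid$(s, p, 1)
End Function
```

Unlike the worksheet formula, this version returns an empty string when the input contains neither letter.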
Assuming c/d are just examples:
?LastEither("b>b>b>d>d>a", "c", "d")
d
Using
Function LastEither(testStr As String, find1 As String, find2 As String) As String
Dim p1 As Long: p1 = InStrRev(testStr, find1)
Dim p2 As Long: p2 = InStrRev(testStr, find2)
If (p1 > p2) Then
LastEither = find1
ElseIf (p2 > 0) Then LastEither = find2
End If
End Function
General solution:
?FindLastMatch("b>b>b>d>d>a>q>ZZ", ">", "c", "d")
d
?FindLastMatch("b>b>b>d>d>a>q>ZZ", ">", "c", "d", "q")
q
?FindLastMatch("b>b>b>d>d>a>q>ZZ>ppp", ">", "c", "d", "ZZ", "q")
ZZ
Using
Function FindLastMatch(testStr As String, delimiter As String, ParamArray findTokens() As Variant) As String
Dim tokens() As String, i As Long, j As Long
tokens = Split(testStr, delimiter)
For i = UBound(tokens) To 0 Step -1
For j = 0 To UBound(findTokens)
If tokens(i) = findTokens(j) Then
FindLastMatch = tokens(i)
Exit Function
End If
Next
Next
End Function
And here is an array formula to do the same thing. (I changed the formula to avoid a problem with the original pointed out by Grade 'Eh' Bacon.)
=MID(A1,MAX((MID(A1,ROW(INDIRECT("1:"&LEN(A1))),1)={"c","d"})*ROW(INDIRECT("1:"&LEN(A1)))),1)
An array formula is entered by holding down ctrl+shift while hitting enter. If you do it correctly, Excel will place braces {...} around the formula which you can see in the formula bar.
The formula will return a #VALUE! error if there is neither c nor d in the string.
EDIT: Having seen from some of your comments that you might want to use more than single character words, I present the following User Defined Function. It allows you to use words of any length, and also you are not limited to just two words -- you can use an arbitrary number of words.
You would enter a formula such as:
=LastOne(A8,"Charlie","Delta")
or
=LastOne(A8,$I1:$I2)
where I1 and I2 contain the words you wish to check for.
The words need to be separated by some delimiter that is neither a letter nor a digit.
A Regular Expression (regex) is constructed which consists of a pipe-separated | list of the words or phrases. The pipe | , in a regex, is the same as an OR. The \b at the beginning and end of the regex indicates a word boundary -- that is the point at which a digit or letter is adjacent to a non-digit or non-letter, or the beginning or end of the string. Hence the actual delimiter does not matter, so long as it is not a letter or digit.
All of the matches are placed in a Match Collection, and we only need the last item in it. There will be MC.Count matches and, since the collection's index is zero-based, we subtract one to get the last match.
Here is the code:
===========================================
Option Explicit
Function LastOne(sSearch As String, ParamArray WordList() As Variant) As String
Dim RE As Object, MC As Object
Dim sPat As String
Dim RNG, C
For Each RNG In WordList
If IsArray(RNG) Or IsObject(RNG) Then
For Each C In RNG
sPat = sPat & "|" & C
Next C
Else
sPat = sPat & "|" & RNG
End If
Next RNG
sPat = "\b(?:" & Mid(sPat, 2) & ")\b"
Set RE = CreateObject("vbscript.regexp")
With RE
.Global = True
.Pattern = sPat
.ignorecase = True
If .test(sSearch) = True Then
Set MC = .Execute(sSearch)
LastOne = MC(MC.Count - 1)
End If
End With
End Function
===========================================
Note that an absence of a WordList word will result in a blank cell. One could produce an error if that is preferable.
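If an error is preferred over a blank cell, one possible sketch is shown below. LastOneOrNA is a hypothetical name, and returning #N/A is just one choice of error; the function returns a Variant so that it can carry either the matched word or the error value.

```vba
Option Explicit

' Hypothetical variant of LastOne: returns #N/A instead of a blank when no word matches.
Function LastOneOrNA(sSearch As String, ParamArray WordList() As Variant) As Variant
    Dim RE As Object, MC As Object
    Dim sPat As String
    Dim item As Variant, C As Variant

    ' Build the pipe-separated alternation from single words, arrays or ranges
    For Each item In WordList
        If IsArray(item) Or IsObject(item) Then
            For Each C In item
                sPat = sPat & "|" & C
            Next C
        Else
            sPat = sPat & "|" & item
        End If
    Next item

    Set RE = CreateObject("VBScript.RegExp")
    RE.Global = True
    RE.IgnoreCase = True
    RE.Pattern = "\b(?:" & Mid(sPat, 2) & ")\b"

    Set MC = RE.Execute(sSearch)
    If MC.Count > 0 Then
        LastOneOrNA = MC(MC.Count - 1).Value   ' last match in the zero-based collection
    Else
        LastOneOrNA = CVErr(xlErrNA)           ' no word found: show #N/A in the cell
    End If
End Function
```

As in the original function, the words are inserted into the pattern unescaped, so words containing regex metacharacters would need escaping first.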
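In VBA you can do this using the following simple logic.
Dim str As String
str = "a>b>b>d>c>a"
Dim Cet() As String
Cet = Split(str, ">")
Dim i As Long
' Walk backwards so the first hit is the last c or d in the string
For i = UBound(Cet) To LBound(Cet) Step -1
    ' Each letter needs its own comparison; LCase makes the test case-insensitive
    If LCase(Cet(i)) = "c" Or LCase(Cet(i)) = "d" Then
        MsgBox Cet(i)
        Exit For
    End If
Next i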
Assuming your string is in cell A1, and there are no uses of the tilde (~) character in it, you can use the following in a worksheet:
=IF(IFERROR(FIND("~",SUBSTITUTE(A1,"c","~",LEN(A1)-LEN(SUBSTITUTE(A1,"c","")))),0)>IFERROR(FIND("~",SUBSTITUTE(A1,"d","~",LEN(A1)-LEN(SUBSTITUTE(A1,"d","")))),0),"c","d")
EDIT:
In response to a comment, here's an explanation of how this works. I've also neatened up the formula slightly after looking back at it. The two formulae for c and d are identical, so the explanation applies to both. So, working outwards:
LEN(A1)-LEN(SUBSTITUTE(A1,"c",""))
Here we remove all instances of c from the string. By comparing the length of this calculated string and the original string, we calculate the number of times c appears in the original string.
SUBSTITUTE(A1,"c","~",LEN(A1)-LEN(SUBSTITUTE(A1,"c","")))
Now that we know the number of times c appears in our string, we replace the last occurrence of c with the tilde character (here we assume the tilde isn't otherwise used in the string).
FIND("~",SUBSTITUTE(A1,"c","~",LEN(A1)-LEN(SUBSTITUTE(A1,"c",""))))
We then find the position of the tilde in the string, which is equivalent to the position of the last c in the string.
IFERROR(FIND("~",SUBSTITUTE(A1,"c","~",LEN(A1)-LEN(SUBSTITUTE(A1,"c","")))),0)
Wrapping this in an IFERROR ensures that we don't have errors coming through the formula - setting the value to 0 if no c exists ensures that we still get a correct answer if our string contains c but not d (and vice versa).
We then apply the same calculation to d and compare the two to see which occurs later in our string. Note: this will give an incorrect answer if there is neither c nor d in the string.
Related
I have a table which has particular string values. SP-1, SP-2, SP-3,.. SP-8
and also V-4 and V-8. I want to add numbers present in the string. The string will be same (either SP- or V-). The numbers following the string will be different. The sum should be separate for each string type.
I have seen many solutions but not able to adapt them.
The table may contain empty cells. Hence I am unable to use Value function.
I want to check the entire table for all SP- strings and V- strings and have the sum of each type. I want to achieve this using formula and not macros. Can any of you help me with the formula
Use this array formula:
=SUM(IFERROR(--SUBSTITUTE($A$1:$A$6,C1&"-",""),0))
Being an array formula, it needs to be confirmed with Ctrl-Shift-Enter instead of Enter when exiting edit mode.
Try the following User Defined Function:
Public Function SpecialAdder(rng As Range, p As String) As Variant
Dim L As Long, r As Range
If p = "" Then
SpecialAdder = Application.WorksheetFunction.Sum(rng)
Exit Function
End If
SpecialAdder = 0
L = Len(p)
For Each r In rng
If Left(r.Value, L) = p Then
SpecialAdder = SpecialAdder + Mid(r.Value, L + 1)
End If
Next r
End Function
It would be used in the worksheet like:
=specialadder(A1:A100,"SP-")
Good day everyone,
I am trying to find a smart way of extracting 8 digits (a unique ID) from a cell. The problem is that the cell might look like any of these:
112, 65478411, sale
746, id65478411, sale 12.50
999, 65478411
999, id65478411
That covers most of the cases, probably all of them, so I basically need to find the 8 digits in the cell and extract them into a different cell. Does anyone have any ideas? I thought of eliminating the first characters, then checking whether the cell starts with id and eliminating that too, but I realized this is not a smart way to do it.
Thank you for the insights.
Try this formula:
=--TEXT(LOOKUP(10^8,MID(SUBSTITUTE(A1," ","x"),ROW(INDIRECT("1:"&LEN(A1)-7)),8)+0),"00000000")
This will return the 8 digit number in the string.
To return just the text then:
=TEXT(LOOKUP(10^8,MID(SUBSTITUTE(A1," ","x"),ROW(INDIRECT("1:"&LEN(A1)-7)),8)+0),"00000000")
You can also write a UDF to accomplish this task, example below
Public Function GetMy8Digits(cell As Range)
Dim s As String
Dim i As Integer
Dim answer
Dim counter As Integer
'get cell value
s = cell.Value
'set the counter
counter = 0
'loop through the entire string
For i = 1 To Len(s)
'check to see if the character is a numeric one
If IsNumeric(Mid(s, i, 1)) = True Then
'add it to the answer
answer = answer & Mid(s, i, 1)
counter = counter + 1
'check to see if we have reached 8 digits
If counter = 8 Then
GetMy8Digits = answer
Exit Function
End If
Else
'was not numeric so reset counter and answer
counter = 0
answer = ""
End If
Next i
End Function
Here is an alternative:
=RIGHT(TRIM(MID(SUBSTITUTE(A1,",",REPT(" ",LEN(A1))),LEN(A1),LEN(A1))),8)
Replace every comma with a run of spaces as long as the original string.
Then take a MID starting at the length of the original string, for the length of the original string (i.e. the second word in the new string).
Trim out the spaces.
Take the right 8 chars to drop any extra leading chars (like id).
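The same idea (isolate the second comma-separated field, trim it, keep its last 8 characters) can be sketched in VBA with Split instead of the padding trick. SecondFieldLast8 is a hypothetical name, and the sketch assumes the ID is always the last 8 characters of the second field.

```vba
Option Explicit

' Hypothetical helper: last 8 characters of the second comma-separated field.
' Returns "" when the input has no second field.
Function SecondFieldLast8(ByVal s As String) As String
    Dim parts() As String
    parts = Split(s, ",")
    If UBound(parts) >= 1 Then
        ' Trim removes the space after the comma; Right drops a prefix such as "id"
        SecondFieldLast8 = Right$(Trim$(parts(1)), 8)
    End If
End Function
```

It could be called from a cell as =SecondFieldLast8(A1).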
So I have a column called chemical formula for like 40,000 entries, and what I want to be able to do is count up how many elements are contained in the chemical formula. So for example:-
EXACT_MASS FORMULA
626.491026 C40H66O5
275.173274 C13H25NO5
For this, I need some kind of formula that will return with the result of
C H O
40 66 5
13 25 5
all as separate columns for the different elements and in rows for the different entries. Is there a formula that can do this?
You could make your own formula.
Open the VBA editor with ALT and F11 and insert a new module.
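Optionally, add a reference to Microsoft VBScript Regular Expressions 5.5 by clicking Tools, then References. (The code below creates the RegExp object with CreateObject, so it also runs without the reference.)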
Now add the following code:
Public Function FormulaSplit(theFormula As String, theLetter As String) As String
Dim RE As Object
Set RE = CreateObject("VBScript.RegExp")
With RE
.Global = True
.MultiLine = False
.IgnoreCase = False
.Pattern = "[A-Z]{1}[a-z]?"
End With
Dim Matches As Object
Set Matches = RE.Execute(theFormula)
Dim TheCollection As Collection
Set TheCollection = New Collection
Dim i As Integer
Dim Match As Object
For i = (Matches.Count - 1) To 0 Step -1
Set Match = Matches.Item(i)
TheCollection.Add Mid(theFormula, Match.FirstIndex + (Len(Match.Value) + 1)), UCase(Trim(Match.Value))
theFormula = Left(theFormula, Match.FirstIndex)
Next
FormulaSplit = "Not found"
On Error Resume Next
FormulaSplit = TheCollection.Item(UCase(Trim(theLetter)))
On Error GoTo 0
If FormulaSplit = "" Then
FormulaSplit = "1"
End If
Set RE = Nothing
Set Matches = Nothing
Set Match = Nothing
Set TheCollection = Nothing
End Function
Usage:
FormulaSplit("C40H66O5", "H") would return 66.
FormulaSplit("C40H66O5", "O") would return 5.
FormulaSplit("C40H66O5", "blah") would return "Not found".
You can use this formula directly in your workbook.
I've had a stab at doing this in a formula and come up with the following:
=IFERROR((MID($C18,FIND(D17,$C18)+1,2))*1,IFERROR((MID($C18,FIND(D17,$C18)+1,1))*1,IFERROR(IF(FIND(D17,$C18)>0,1),0)))
It's not very neat and would have to be expanded further if any of your elements can appear more than 99 times. I also used an arbitrary placement on my worksheet, so the titles H, C and O are in row 17. I would personally go with Jamie's answer, but I wanted to see whether it could be done in a formula and figured it was worth sharing as another perspective.
Even though this has an excellent (and accepted) VBA solution, I couldn't resist the challenge to do this without using VBA.
I posted a solution earlier, which wouldn't work in all cases. This new code should always work:
=MAX(
IFERROR(IF(FIND(C$1&ROW($1:$99),$B2),ROW($1:$99),0),0),
IFERROR(IF(FIND(C$1&CHAR(ROW($65:$90)),$B2&"Z"),1,0),0)
)
Enter as an array formula: Ctrl + Shift + Enter
Output:
The formula outputs 0 when not found, and I simply used conditional formatting to turn zeroes gray.
How it works
This part of the formula looks for the element, followed by a number between 1 and 99. If found, the number of atoms is returned. Otherwise, 0 is returned. The results are stored in an array:
IFERROR(IF(FIND(C$1&ROW($1:$99),$B2),ROW($1:$99),0),0)
In the case of C13H25NO5, a search for "C" returns this array:
{1,0,0,0,0,0,0,0,0,0,0,0,13,0,0,0,...,0}
1 is the first array element, because C1 is a match. 13 is the thirteenth array element, and that's what we're interested in.
The next part of the formula looks for the element, followed by an uppercase letter, which indicates a new element. (The letters A through Z are characters 65 through 90.) If found, the number 1 is returned. Otherwise, 0 is returned. The results are stored in an array:
IFERROR(IF(FIND(C$1&CHAR(ROW($65:$90)),$B2&"Z"),1,0),0)
"Z" is appended to the chemical formula, so that a match will be found when its last element has no number. (For example, "H2O".) There is no element "Z" in the Periodic Table, so this won't cause a problem.
In the case of C13H25NO5, a search for "N" returns this array:
{0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0}
1 is the 15th element in the array. That's because it found the letters "NO", and O is the 15th letter of the alphabet.
Taking the maximum value from each array gives us the number of atoms as desired.
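For comparison, the same rule (an element symbol followed either by a number or directly by the next element) can be written as a short regex-based function. This is a sketch under assumptions: AtomCount is a hypothetical name, it relies on the VBScript regex engine's negative lookahead, and unlike the array formula it adds up repeated occurrences of an element.

```vba
Option Explicit

' Hypothetical helper: number of atoms of one element in a formula such as "C13H25NO5".
' Returns 0 when the element is absent.
Function AtomCount(ByVal chemFormula As String, ByVal element As String) As Long
    Dim RE As Object, m As Object
    Set RE = CreateObject("VBScript.RegExp")
    RE.Global = True
    RE.IgnoreCase = False
    ' The symbol must not be followed by a lowercase letter (so "C" skips "Cl"),
    ' and any digits that follow are captured as the count
    RE.Pattern = element & "(?![a-z])(\d*)"
    For Each m In RE.Execute(chemFormula)
        If Len(m.SubMatches(0)) = 0 Then
            AtomCount = AtomCount + 1                       ' no number means one atom
        Else
            AtomCount = AtomCount + CLng(m.SubMatches(0))
        End If
    Next m
End Function
```

With the layout used above, it would be entered as =AtomCount($B2,C$1).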
If I have G4ED7883666 and I want the output to be 7883666,
and I have to apply this to a range of cells that are not all the same length, where the only common thing is that I need to delete everything before the number that follows the last letter, how can I do it?
This formula finds the last number in a string, that is, all digits to the right of the last alpha character in the string.
=RIGHT(A1,MATCH(99,IFERROR(1*MID(A1,LEN(A1)+1-ROW($1:$25),1),99),0)-1)
Note that this is an array formula and must be entered with the Control-Shift-Enter keyboard combination.
How the formula works
Let's assume that the target string is fairly simple: "G4E78"
Working outward from the middle of the formula, the first thing to do is create an array with the elements 1 through 25. (Although this might seem to limit the formula to strings with no more than 25 characters, it actually places a limit of 25 digits on the size of the number that may be extracted by the formula.)
ROW($1:$25) = {1;2;3;4;5;6;7; etc.}
Subtracting this array from (1 + the length of the target string) produces a new array whose elements count down from the length of the string. The first five elements correspond to the positions of the characters of the string, in reverse order.
LEN(A1)+1-ROW($1:$25) = {5;4;3;2;1;0;-1;-2;-3;-4; etc.}
The MID function then creates a new array that reverses the order of the characters of the string.
For example, the first element of the new array is the result of MID(A1, 5, 1), the second of MID(A1, 4, 1) and so on. The #VALUE! errors reflect the fact that MID cannot evaluate 0 or negative values as the position of a string, e.g., MID(A1,0,1) = #VALUE!.
MID(A1,LEN(A1)+1-ROW($1:$25),1) = {"8";"7";"E";"4";"G";#VALUE!;#VALUE!; etc.}
Multiplying the elements of the array by 1 converts the digit characters to numbers and turns the letter elements of that array into #VALUE! errors as well.
1*MID(A1,LEN(A1)+1-ROW($1:$25),1) = {8;7;#VALUE!;4;#VALUE!;#VALUE!;#VALUE!; etc.}
And the IFERROR function turns the #VALUE! errors into 99, which is just an arbitrary number greater than the value of a single digit.
IFERROR(1*MID(A1,LEN(A1)+1-ROW($1:$25),1),99) = {8;7;99;4;99;99;99; etc.}
Matching on the 99 gives the position of the first non-digit character counting from the right end of the string. In this case, "E" is the first non-digit in the reversed string "87E4G", at position 3. This is equivalent to saying that the number we are looking for at the end of the string, plus the "E", is 3 characters long.
MATCH(99,IFERROR(1*MID(A1,LEN(A1)+1-ROW($1:$25),1),99),0) = 3
So, for the final step, we take 3 - 1 characters (dropping the "E") from the right of the string.
RIGHT(A1,MATCH(99,IFERROR(1*MID(A1,LEN(A1)+1-ROW($1:$25),1),99),0)-1) = "78"
One more submission for you to consider. This VBA function returns the rightmost digits, that is, everything after the last non-numeric character:
Public Function GetRightNumbers(str As String)
Dim i As Integer
For i = Len(str) To 1 Step -1
If Not IsNumeric(Mid(str, i, 1)) Then
Exit For
End If
Next i
GetRightNumbers = Mid(str, i + 1)
End Function
You can write some VBA to format the data (just starting at the end and working back until you hit a non-number.)
Or, if you're happy to install an add-in like Excelicious, you can use regular expressions to format the text via a formula. An expression like [0-9]+$ would return all the digits at the end of a string, IIRC.
NOTE: This uses the regex pattern in James Snell's answer, so please upvote his answer if you find this useful.
Your best bet is to use a regular expression. You need to set a reference to VBScript Regular Expressions for this to work. Tools --> References...
Now you can use regex in your VBA.
This will find the numbers at the end of each cell. I am placing the result next to the original so that you can verify it is working the way you want. You can modify it to replace the cell as soon as you feel comfortable with it. The code works regardless of the length of the string you are evaluating, and will skip the cell if it doesn't find a match.
Sub GetTrailingNumbers()
Dim ws As Worksheet
Dim rng As Range
Dim cell As Range
Dim result As Object, results As Object
Dim regEx As New VBScript_RegExp_55.RegExp
Set ws = ThisWorkbook.Sheets("Sheet1")
' range is hard-coded here, but you can define
' it programmatically based on the shape of your data
Set rng = ws.Range("A1:A3")
' pattern from James Snell's answer
regEx.Pattern = "[0-9]+$"
For Each cell In rng
If regEx.Test(cell.Value) Then
Set results = regEx.Execute(cell.Value)
For Each result In results
cell.Offset(, 1).Value = result.Value
Next result
End If
Next cell
End Sub
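If a worksheet function is more convenient than a macro, the same pattern can be wrapped in a small UDF. This is a sketch: TrailingDigits is a hypothetical name, and it uses late binding through CreateObject so that it does not depend on the reference being set.

```vba
Option Explicit

' Hypothetical UDF: returns the run of digits at the end of s, or "" if s does not end in a digit.
Function TrailingDigits(ByVal s As String) As String
    Dim RE As Object, MC As Object
    Set RE = CreateObject("VBScript.RegExp")
    RE.Pattern = "[0-9]+$"   ' one or more digits anchored at the end of the string
    Set MC = RE.Execute(s)
    If MC.Count > 0 Then TrailingDigits = MC(0).Value
End Function
```

In a cell this would be used as =TrailingDigits(A1). The result is text; wrapping the call in VALUE would convert it to a number where that is needed.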
Takes the last 4 characters of num:
num1 = Right(num, 4)
Takes the first 5 characters of num:
num1 = Left(num, 5)
Takes the first ten characters of num, then the last four of those:
num1 = Right(Left(num, 10), 4)
In your case:
num = "G4ED7883666"
num1 = Right(num, 7)
I am used to string slicing in 'C' many, many years ago but I am trying to work with VBA for this specific task.
Right now I have created a string "this is a string" and created a new workbook.
What I need now is to use string slicing to put 't' in, say, A1, 'h' in A2, 'i' in A3 etc. to the end of the string.
After which my next string will go in, say B1 etc. until all strings are sliced.
I have searched but it seems most people want to do it the other way around (concatenating a range).
Any thoughts?
Use the mid function.
=MID($A$1,1,1)
The second argument is the start position, so you could replace it with something like the ROW or COLUMN function and then fill the formula down or across.
ie.
=MID($A$1,ROW(),1)
If you wanted to do it purely in VBA, I believe the mid function exists in there too, so just loop through the string.
Dim str As String
Dim i As Long
str = Sheet1.Cells(1, 1).Value
For i = 1 To Len(str)
    'output string 1 character at a time in column C
    Sheet1.Cells(i, 3).Value = Mid(str, i, 1)
Next i
* edit *
If you want to do this with multiple strings from an array, you could use something like:
Dim str(1 To 2) As String
Dim i As Long, j As Long
str(1) = "This is a test string"
str(2) = "Some more test text"
For j = LBound(str) To UBound(str)
    For i = 1 To Len(str(j))
        'output strings 1 character at a time in columns A and B
        Sheet1.Cells(i, j).Value = Mid(str(j), i, 1)
    Next i
Next j