Excel SUMPRODUCT Substring Match - excel

I am using SUMPRODUCT to match a cell equaling one of many things.
Using the below formula, I am attempting to match the value 2147 (A single value from a column of many values) to the below variations of the number 2147:
=SUMPRODUCT( -- ("2147"=Table6[data])) > 0
Table6[data] Return Value
1 2147 TRUE
2 2147, 500 FALSE
3 2146-2148 FALSE
4 21475 FALSE
The first TRUE and last FALSE values are as expected (success), but I need the middle two to match TRUE (Identify the 2147 next to the , 500 and between the range 2146-2148.

This uses a custom function. If you use it, I suggest you give it a more meaningful name.
So you'd put this formula in B1 and copy down
=Match2(2147,A1)
In outline, the function checks if a hyphen exists (using Split) and if so, checks the desired value against the lower and upper limits.
If not, using Split again, we split by commas, and if any element of the resultant array equals our desired value we return TRUE.
Function Match2(d As Double, r As Range) As Boolean
Dim v As Variant, i As Long
v = Split(r, "-")
If UBound(v) = 1 Then
If Val(v(0)) <= d And Val(v(1)) >= d Then
Match2 = True
Else
Match2 = False
End If
'we could shorten the five lines above to
'Match2 = (Val(v(0)) <= d And Val(v(1)) >= d)
Else
v = Split(r, ",")
For i = LBound(v) To UBound(v)
If Val(v(i)) = d Then
Match2 = True
Exit Function
End If
Next i
Match2 = False
End If
End Function

Just for an an FYI, this is the formula:
=SUM(IF(ISNUMBER(SEARCH("-",TRIM(MID(SUBSTITUTE(A2,",",REPT(" ",99)),(ROW(INDEX(XFD:XFD,1):INDEX(XFD:XFD,LEN(A2)-LEN(SUBSTITUTE(A2,",",""))+1))-1)*99+1,99)))),(2147 >= --LEFT(TRIM(MID(SUBSTITUTE(A2,",",REPT(" ",99)),(ROW(INDEX(XFD:XFD,1):INDEX(XFD:XFD,LEN(A2)-LEN(SUBSTITUTE(A2,",",""))+1))-1)*99+1,99)),FIND("-",TRIM(MID(SUBSTITUTE(A2,",",REPT(" ",99)),(ROW(INDEX(XFD:XFD,1):INDEX(XFD:XFD,LEN(A2)-LEN(SUBSTITUTE(A2,",",""))+1))-1)*99+1,99)))-1))*(2147<=--MID(TRIM(MID(SUBSTITUTE(A2,",",REPT(" ",99)),(ROW(INDEX(XFD:XFD,1):INDEX(XFD:XFD,LEN(A2)-LEN(SUBSTITUTE(A2,",",""))+1))-1)*99+1,99)),FIND("-",TRIM(MID(SUBSTITUTE(A2,",",REPT(" ",99)),(ROW(INDEX(XFD:XFD,1):INDEX(XFD:XFD,LEN(A2)-LEN(SUBSTITUTE(A2,",",""))+1))-1)*99+1,99)))+1,99)),--(2147 = --TRIM(MID(SUBSTITUTE(A2,",",REPT(" ",99)),(ROW(INDEX(XFD:XFD,1):INDEX(XFD:XFD,LEN(A2)-LEN(SUBSTITUTE(A2,",",""))+1))-1)*99+1,99)))))>0
It is an array formula that needs to be confirmed with Ctrl-Shift-Enter instead of Enter when exiting edit mode.

Related

How to count differences between Cells

I have VBA which comparing 2 cells. Each cell can contain between 1 and 3 different parameters ant parameters are trimmed by the "," comparison is made by simple double for loop(check code). Thing what i can't figure it out is that: How to modify code and get number of unique entries, example
cell 1 [music, art, science]; cell 2 [art, music]; When i run my for loops i get 2 matches(which is fine) but how to count number of unique words in this case should be 3.
I have tried to enter this part of code but its not working well num_possible = num_possible + 1
game_tags_parts = Split(Cells(11, 2), ",")
game_tags_parts_j = Split(Cells(11, j), ",")
num_matches = 0
num_possible = 0
For m = LBound(game_tags_parts) To UBound(game_tags_parts)
num_possible = num_possible + 1
For n = LBound(game_tags_parts_j) To UBound(game_tags_parts_j)
If Trim(game_tags_parts(m)) = Trim(game_tags_parts_j(n)) Then
num_matches = num_matches + 1
End If
Next n
Next m
Actual result should be number of unique words used in those cells, in some cases i get 3 matches, example cell 1 [scifi, space, star] cell 2 [star, space, scifi] and its in total 3 matches. Modification should provide me an number 3 as number of unique words used in both cells. Or in this case where i have cell 1 [art, music, science] and cell 2 [scifi, space, star] where program gives me 0 same words and modification should give me a number 6 as unique used words.
One easy way to get a unique count is to use a Dictionary object:
game_tags_parts = Split(Cells(11, 2), ",")
game_tags_parts_j = Split(Cells(11, j), ",")
Dim myDict As Object
Set myDict = CreateObject("Scripting.Dictionary")
For Each v In game_tags_parts
If Not myDict.Exists(v) Then myDict.Add v, v
Next v
For Each v In game_tags_parts_j
If Not myDict.Exists(v) Then myDict.Add v, v
Next v
MsgBox "unique count: " & myDict.Count

Summing cells that have formula and string concatenated together

I have a column with formula as follows:
=(2+3*6+8) & "KB"
Basically, each cell is a formula and string concatenated (using &). I want to add all these cells up. I tried the following things:
a) =SUM(B2:B21) gives me a sum of 0.
b) Using =B2+B3... gives me a #VALUE error.
c) I tried something like this also - didn't work, gives a sum of 0: =SUM(IF(ISNUMBER(FIND("KB",$C$2:$C$14)),VALUE(LEFT($C$2:$C$14,FIND("KB",$C$2:$C$14)-1)),0))
Make your own SUM function in VBA. Try this:
=StripTextAndSum(A2:A4) - returns 60
=StripTextAndAverage(A2:A4) - returns 20
This method keeps the left most decimal number and tosses away the rest.
NOTE: You can tweak this to fit your needs. One way would be to retain the text so you can return it in the sum....like 150MB (i am assuming KB means kilobyte). Let me know if you like that idea and I'll make it.
EDIT: As pointed out by #DirkReichel, this has been made a little more efficient using IsNumeric instead, but I have retained all the other functions too. IsLetter is a useful function too.
Public Function StripTextAndSum(myRange As Range)
Dim r As Range
Dim n As Double
n = 0
For Each r In myRange.Cells
n = n + ParseNumberFromString(r.Text)
Next r
StripTextAndSum = n
End Function
Public Function StripTextAndAverage(myRange As Range)
Dim n As Double
n = StripTextAndSum(myRange)
StripTextAndAverage = n / (myRange.Cells.Count * 1#)
End Function
Public Function ParseNumberFromString(s As String) As Double
ParseNumberFromString = Left(s, GetLastNumberIndex(s))
End Function
Public Function GetFirstLetterIndex(s As String) As Integer
Dim i As Integer
For i = 1 To Len(s) Step 1
If IsLetter(Mid(s, i, 1)) = True Then
GetFirstLetterIndex = i
Exit Function
End If
Next i
End Function
Public Function GetLastNumberIndex(s As String) As Integer
Dim i As Integer
For i = Len(s) To 1 Step -1
If IsNumeric(Left(s, i)) = True Then
GetLastNumberIndex = i
Exit Function
End If
Next i
End Function
Function IsLetter(s As String) As Boolean
Dim i As Integer
For i = 1 To Len(s)
If LCase(Mid(s, i, 1)) <> UCase(Mid(s, i, 1)) = True Then
IsLetter = True
Else
IsLetter = False
Exit For
End If
Next
End Function
I'd normally just move the KB to the following column and left-justify it.
That way, it still looks identical but the first column only has real numbers that you can manipulate mathematically to your heart's content.
Or, assuming they're all in kilobytes (which seems to be a requirement if you just want to add the numeric bits), don't put KB in the data area at all.
Instead change the heading from, for example, Used memory to Used memory (KB).
Do you really want to populate your beautiful spreadsheets with butt-ugly monstrosities like the following? :-)
=SUM(IF(ISNUMBER(FIND("KB",$C$2:$C$14)),VALUE(LEFT($C$2:$C$1‌​4,FIND("KB",$C$2:$C$‌​14)-1)),0))
If you need to keep your column as-is, you could always use an array formula to get the sum:
=sum(value(left(b2:b21,len(b2:b21)-2)))
You will need to enter this as an array formula (press Ctrl+Shift+Enter to submit it)
Basically this is taking the leftmost chunk of a cell (all but the last two characters, which we know are 'KB'), using value() to convert it into a numeric, and sum() to add it up. Entering it as an array formula just lets us do this to each cell in the list b2:b21 in one swoop.
As #paxdiablo mentioned, though, it might be best to restructure so that you don't have to deal with your values as text in the first place. My approach would be to enter the values and add the "KB" via formatting. You can use a custom formatting with something like 0.00 "KB" so the cell only holds, say, the value 17, but it displays as "17.00 KB".

VBA Excel Find the last specific instance in a string

I have a list of strings in excel as such:
a>b>b>d>c>a
a>b>c>d
b>b>b>d>d>a
etc.
I want to extract the last c or last d from each string whichever comes last,
e.g
a>b>b>d>c>a = C
a>b>c>d = d
b>b>b>d>d>a = d
how would I do this using VBA (or just straight excel if it is possible)?
You could use an excel formula as follows
To help explain will start with just one letter then will show full formula at the end.
First find the number of occurences of c
= LEN(A1) - LEN(SUBSTITUTE(A1,"c","")
Use this position to replace the last c with a unique character ($ as an example)
=SUBSTITUTE(A1,"c","$",LEN(A1) - LEN(SUBSTITUTE(A1,"c","")))
Next find this unique character
= FIND("$",SUBSTITUTE(A1,"c","$",LEN(A1) - LEN(SUBSTITUTE(A1,"c",""))))
This gives the position of the last c, now you can use this in a mid function to return this last c
= MID(A1,FIND("$",SUBSTITUTE(A1,"c","$",LEN(A1) - LEN(SUBSTITUTE(A1,"c","")))),1)
Finally to account for both c and d, use a max to bring back which comes last
= MID(A1,MAX(IFERROR(FIND("$",SUBSTITUTE(A1,"c","$",LEN(A1) - LEN(SUBSTITUTE(A1,"c","")))),0),IFERROR(FIND("$",SUBSTITUTE(A1,"d","$",LEN(A1) - LEN(SUBSTITUTE(A1,"d","")))),0)),1)
Assuming c/d are just examples:
?LastEither("b>b>b>d>d>a", "c", "d")
d
Using
Function LastEither(testStr As String, find1 As String, find2 As String) As String
Dim p1 As Long: p1 = InStrRev(testStr, find1)
Dim p2 As Long: p2 = InStrRev(testStr, find2)
If (p1 > p2) Then
LastEither = find1
ElseIf (p2 > 0) Then LastEither = find2
End If
End Function
General solution:
?FindLastMatch("b>b>b>d>d>a>q>ZZ", ">", "c", "d")
d
?FindLastMatch("b>b>b>d>d>a>q>ZZ", ">", "c", "d", "q")
q
?FindLastMatch("b>b>b>d>d>a>q>ZZ>ppp", ">", "c", "d", "ZZ", "q")
ZZ
Using
Function FindLastMatch(testStr As String, delimiter As String, ParamArray findTokens() As Variant) As String
Dim tokens() As String, i As Long, j As Long
tokens = Split(testStr, delimiter)
For i = UBound(tokens) To 0 Step -1
For j = 0 To UBound(findTokens)
If tokens(i) = findTokens(j) Then
FindLastMatch = tokens(i)
Exit Function
End If
Next
Next
End Function
And here is a array formula to do the same thing. (Changed formula to avoid problem with original pointed out by Grade 'Eh' Bacon)
=MID(A1,MAX((MID(A1,ROW(INDIRECT("1:"&LEN(A1))),1)={"c","d"})*ROW(INDIRECT("1:"&LEN(A1)))),1)
An array formula is entered by holding down ctrl+shift while hitting enter. If you do it correctly, Excel will place braces {...} around the formula which you can see in the formula bar.
The formula will return a #VALUE! error if there is neither c nor d in the string.
EDIT: Having seen from some of your comments that you might want to use more than single character words, I present the following User Defined Function. It allows you to use words of any length, and also you are not limited to just two words -- you can use an arbitrary number of words.
You would enter a formula such as:
=LastOne(A8,"Charlie","Delta")
or
=LastOne(A8,$I1:$I2)
where I1 and I2 contain the words you wish to check for.
The words need to be separated by some delimiter that is neither a letter nor a digit.
A Regular Expression (regex) is constructed which consists of a pipe-separated | list of the words or phrases. The pipe | , in a regex, is the same as an OR. The \b at the beginning and end of the regex indicates a word boundary -- that is the point at which a digit or letter is adjacent to a non-digit or non-letter, or the beginning or end of the string. Hence the actual delimiter does not matter, so long as it is not a letter or digit.
All of the matches are placed in a Match Collection; and we only need to look for the last item in the match. There will be MC.Count matches and, since this count is zero based, we subtract one to get the last match.
Here is the code:
===========================================
Option Explicit
Function LastOne(sSearch As String, ParamArray WordList() As Variant) As String
Dim RE As Object, MC As Object
Dim sPat As String
Dim RNG, C
For Each RNG In WordList
If IsArray(RNG) Or IsObject(RNG) Then
For Each C In RNG
sPat = sPat & "|" & C
Next C
Else
sPat = sPat & "|" & RNG
End If
Next RNG
sPat = "\b(?:" & Mid(sPat, 2) & ")\b"
Set RE = CreateObject("vbscript.regexp")
With RE
.Global = True
.Pattern = sPat
.ignorecase = True
If .test(sSearch) = True Then
Set MC = .Execute(sSearch)
LastOne = MC(MC.Count - 1)
End If
End With
End Function
===========================================
Here is a sample screenshot:
Note that an absence of a WordList word will result in a blank cell. One could produce an error if that is preferable.
In VBA you can do this using following simple logic.
Dim str As String
str = "a>b>b>d>c>a"
Dim Cet
Cet = split(str,">")
Dim i as Integer
For i= Ubound(Cet) to Lbound(Cet)
If Cet(i) = "c" or "d" or "C" or "D" then
MsgBox Cet(i)
Exit For
End if
Next i
Assuming your string is in cell A1, and there are no uses of the tilde (~) character in it, you can use the following in a worksheet:
=IF(IFERROR(FIND("~",SUBSTITUTE(A1,"c","~",LEN(A1)-LEN(SUBSTITUTE(A1,"c","")))),0)>IFERROR(FIND("~",SUBSTITUTE(A1,"d","~",LEN(A1)-LEN(SUBSTITUTE(A1,"d","")))),0),"c","d")
EDIT:
In response to a comment, here's an explanation of how this works. I've also neatened up the formula slightly having looked back at it again. The two formulae for c and d are identical, so the explanation will apply for both. So, working outwards
LEN(A1)-LEN(SUBSTITUTE(A1,"c",""))
Here we remove all instances of c from the string. By comparing the length of this calculated string and the original string, we calculate the number of times c appears in the original string.
SUBSTITUTE(A1,"c","~",LEN(A1)-LEN(SUBSTITUTE(A1,"c","")))
Now that we know the number of times c appears in our string, we
replace the last occurrence of c with the tilde character (here we assume the tilde isn't used in the string otherwise).
FIND("~",SUBSTITUTE(A1,"c","~",LEN(A1)-LEN(SUBSTITUTE(A1,"c",""))))
We then find the position of the tilde in the string, which is equivalent to the position of the last c in the string.
IFERROR(FIND("~",SUBSTITUTE(A1,"c","~",LEN(A1)-LEN(SUBSTITUTE(A1,"c","")))),0)
Wrapping this in an IFERROR ensures that we don't have errors coming through the formula - setting the value to 0 if no c exists ensures that we still get a correct answer if our string contains c but not d (and vice versa).
We then apply the same calculation to d and compare the two to see which occurs later in our string. Note: this will give an incorrect answer if there is neither c nor d in the string.

how to sum with an interval

I am looking for a formula or macro which can employ the following: I need to sum-square the amount per week. This should be done from the beginning on. My data structure is as follow:
Col A Col B Column C
Year Week Amount
2000 1 368
2000 2 8646
… … …
2000 52 46846
2001 1 656
2001 2 846
… … …
2001 52 4651
2002 1 489
… … …
2014 52 46546
I would have a column D in which I have the sum-squared of the amount per week. So Cell(Column "D", "week 2000w1") should be,
=SUMSQ(Amount 2000w1)
For the first year, this is easy. The problem occurs in the next year. In Cell (Column "D", week "2001w1") the formula should be,
=SUMSQ(Amount 2000w1;Amount 2001w1)
For the last year, cell (Column "D", week "2014w1") should be the formula,
=SUMSQ(Amount 2000w1;Amount 2001w1; Amount 2002w1;Amount 2003w1;Amount 2004w1; Amount 2005w1;Amount 2006w1;Amount 2007w1; Amount 2008w1;Amount 2009w1;Amount 2010w1; Amount 2011w1;Amount 2012w1;Amount 2013w1)
This should be done for the weeks 1 till 52 for all the years. Is there a quick way to do this?
This is a solution with worksheetfunction, you can develop similar with a macro too, but I think now it's easier without it:
=SUMSQ(INDEX([Amount]*([Week]=[#Week])*([Year]<=[#Year]),0))
Thank you Juhász Máté for stealing my jerb. Now I feel dumb for having this developed.
Anyways, here's a VBA solution.
For your sample dataset =SUMQ($C:$C, $B:$B, $B2) would give 804,881 i.e. the sum of *week1*s squares.
The advanced use as =SUMQ($C:$C, $B:$B, $B2, $A:$A, "<=", $A2) will give 135,424 that is sum of *week1*s for years lower than or equal 2000.
Public Function SUMQ(NumsToSquare As Range, Filter1 As Range, FilterCriterion As Variant, _
Optional Filter2 As Range, Optional FilterRelation As String, Optional FilterCriterion2 As Variant) As Long
Set NumsToSquare = Intersect(NumsToSquare, NumsToSquare.Worksheet.UsedRange)
Set Filter1 = Intersect(Filter1, Filter1.Worksheet.UsedRange)
RowsCount = Filter1.Rows.Count
ColumnsCount = Filter1.Columns.Count
If Not Filter2 Is Nothing Then Advanced = True
If Advanced Then Set Filter2 = Intersect(Filter2, Filter2.Worksheet.UsedRange)
On Error Resume Next
For i = 1 To RowsCount
For j = 1 To ColumnsCount
If Not Advanced Then
If Filter1(i, j).Value2 = FilterCriterion Then SUMQ = SUMQ + NumsToSquare(i, j).Value2 ^ 2
Else
If Filter1(i, j).Value2 = FilterCriterion And Judge(Filter2(i, j).Value2, FilterRelation, FilterCriterion2) Then SUMQ = SUMQ + NumsToSquare(i, j).Value2 ^ 2
End If
Next j
Next i
End Function
Private Function Judge(var1 As Variant, FilterRelation As String, var2 As Variant) As Boolean
Judge = False
On Error GoTo err:
Select Case FilterRelation 'cf. https://msdn.microsoft.com/en-us/library/aa711633(v=vs.71).aspx
Case "=" 'The = operator tests whether the two operands are equal.
Judge = (var1 = var2)
Case "<>" 'The <> operator tests whether the two operands are not equal.
Judge = (var1 <> var2)
Case "<" 'The < operator tests whether the first operand is less than the second operand.
Judge = (var1 < var2)
Case ">" 'The > operator tests whether the first operand is greater than the second operand.
Judge = (var1 > var2)
Case "<=" 'The <= operator tests whether the first operand is less than or equal to the second operand.
Judge = (var1 <= var2)
Case ">=" 'The >= operator tests whether the first operand is greater than or equal to the second operand.
Judge = (var1 >= var2)
End Select
err:
End Function

How to count up elements in excel

So I have a column called chemical formula for like 40,000 entries, and what I want to be able to do is count up how many elements are contained in the chemical formula. So for example:-
EXACT_MASS FORMULA
626.491026 C40H66O5
275.173274 C13H25NO5
For this, I need some kind of formula that will return with the result of
C H O
40 66 5
13 25 5
all as separate columns for the different elements and in rows for the different entries. Is there a formula that can do this?
You could make your own formula.
Open the VBA editor with ALT and F11 and insert a new module.
Add a reference to Microsoft VBScript Regular Expressions 5.5 by clicking Tools, then references.
Now add the following code:
Public Function FormulaSplit(theFormula As String, theLetter As String) As String
Dim RE As Object
Set RE = CreateObject("VBScript.RegExp")
With RE
.Global = True
.MultiLine = False
.IgnoreCase = False
.Pattern = "[A-Z]{1}[a-z]?"
End With
Dim Matches As Object
Set Matches = RE.Execute(theFormula)
Dim TheCollection As Collection
Set TheCollection = New Collection
Dim i As Integer
Dim Match As Object
For i = (Matches.Count - 1) To 0 Step -1
Set Match = Matches.Item(i)
TheCollection.Add Mid(theFormula, Match.FirstIndex + (Len(Match.Value) + 1)), UCase(Trim(Match.Value))
theFormula = Left(theFormula, Match.FirstIndex)
Next
FormulaSplit = "Not found"
On Error Resume Next
FormulaSplit = TheCollection.Item(UCase(Trim(theLetter)))
On Error GoTo 0
If FormulaSplit = "" Then
FormulaSplit = "1"
End If
Set RE = Nothing
Set Matches = Nothing
Set Match = Nothing
Set TheCollection = Nothing
End Function
Usage:
FormulaSplit("C40H66O5", "H") would return 66.
FormulaSplit("C40H66O5", "O") would return 5.
FormulaSplit("C40H66O5", "blah") would return "Not found".
You can use this formula directly in your workbook.
I've had a stab at doing this in a formula nad come up with the following:
=IFERROR((MID($C18,FIND(D17,$C18)+1,2))*1,IFERROR((MID($C18,FIND(D17,$C18)+1,1))*1,IFERROR(IF(FIND(D17,$C18)>0,1),0)))
It's not very neat and would have to be expanded further if any of your elements are going to appear more than 99 times - I also used a random placement on my worksheet so the titles H,C and O are in row 17. I would personally go with Jamie's answer but just wanted to try this to see if I could do it in a formula possible and figured it was worth sharing just as another perspective.
Even though this has an excellent (and accepted) VBA solution, I couldn't resist the challenge to do this without using VBA.
I posted a solution earlier, which wouldn't work in all cases. This new code should always work:
=MAX(
IFERROR(IF(FIND(C$1&ROW($1:$99),$B2),ROW($1:$99),0),0),
IFERROR(IF(FIND(C$1&CHAR(ROW($65:$90)),$B2&"Z"),1,0),0)
)
Enter as an array formula: Ctrl + Shift + Enter
Output:
The formula outputs 0 when not found, and I simply used conditional formatting to turn zeroes gray.
How it works
This part of the formula looks for the element, followed by a number between 1 and 99. If found, the number of atoms is returned. Otherwise, 0 is returned. The results are stored in an array:
IFERROR(IF(FIND(C$1&ROW($1:$99),$B2),ROW($1:$99),0),0)
In the case of C13H25NO5, a search for "C" returns this array:
{1,0,0,0,0,0,0,0,0,0,0,0,13,0,0,0,...,0}
1 is the first array element, because C1 is a match. 13 is the thirteenth array element, and that's what we're interested in.
The next part of the formula looks for the element, followed by an uppercase letter, which indicates a new element. (The letters A through Z are characters 65 through 90.) If found, the number 1 is returned. Otherwise, 0 is returned. The results are stored in an array:
IFERROR(IF(FIND(C$1&CHAR(ROW($65:$90)),$B2&"Z"),1,0),0)
"Z" is appended to the chemical formula, so that a match will be found when its last element has no number. (For example, "H2O".) There is no element "Z" in the Periodic Table, so this won't cause a problem.
In the case of C13H25NO5, a search for "N" returns this array:
{0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0}
1 is the 15th element in the array. That's because it found the letters "NO", and O is the 15th letter of the alphabet.
Taking the maximum value from each array gives us the number of atoms as desired.

Resources