How to count up elements in excel - excel

So I have a column called chemical formula for like 40,000 entries, and what I want to be able to do is count up how many elements are contained in the chemical formula. So for example:-
EXACT_MASS FORMULA
626.491026 C40H66O5
275.173274 C13H25NO5
For this, I need some kind of formula that will return with the result of
C H O
40 66 5
13 25 5
all as separate columns for the different elements and in rows for the different entries. Is there a formula that can do this?

You could make your own formula.
Open the VBA editor with ALT and F11 and insert a new module.
Add a reference to Microsoft VBScript Regular Expressions 5.5 by clicking Tools, then references.
Now add the following code:
Public Function FormulaSplit(theFormula As String, theLetter As String) As String
Dim RE As Object
Set RE = CreateObject("VBScript.RegExp")
With RE
.Global = True
.MultiLine = False
.IgnoreCase = False
.Pattern = "[A-Z]{1}[a-z]?"
End With
Dim Matches As Object
Set Matches = RE.Execute(theFormula)
Dim TheCollection As Collection
Set TheCollection = New Collection
Dim i As Integer
Dim Match As Object
For i = (Matches.Count - 1) To 0 Step -1
Set Match = Matches.Item(i)
TheCollection.Add Mid(theFormula, Match.FirstIndex + (Len(Match.Value) + 1)), UCase(Trim(Match.Value))
theFormula = Left(theFormula, Match.FirstIndex)
Next
FormulaSplit = "Not found"
On Error Resume Next
FormulaSplit = TheCollection.Item(UCase(Trim(theLetter)))
On Error GoTo 0
If FormulaSplit = "" Then
FormulaSplit = "1"
End If
Set RE = Nothing
Set Matches = Nothing
Set Match = Nothing
Set TheCollection = Nothing
End Function
Usage:
FormulaSplit("C40H66O5", "H") would return 66.
FormulaSplit("C40H66O5", "O") would return 5.
FormulaSplit("C40H66O5", "blah") would return "Not found".
You can use this formula directly in your workbook.

I've had a stab at doing this in a formula nad come up with the following:
=IFERROR((MID($C18,FIND(D17,$C18)+1,2))*1,IFERROR((MID($C18,FIND(D17,$C18)+1,1))*1,IFERROR(IF(FIND(D17,$C18)>0,1),0)))
It's not very neat and would have to be expanded further if any of your elements are going to appear more than 99 times - I also used a random placement on my worksheet so the titles H,C and O are in row 17. I would personally go with Jamie's answer but just wanted to try this to see if I could do it in a formula possible and figured it was worth sharing just as another perspective.

Even though this has an excellent (and accepted) VBA solution, I couldn't resist the challenge to do this without using VBA.
I posted a solution earlier, which wouldn't work in all cases. This new code should always work:
=MAX(
IFERROR(IF(FIND(C$1&ROW($1:$99),$B2),ROW($1:$99),0),0),
IFERROR(IF(FIND(C$1&CHAR(ROW($65:$90)),$B2&"Z"),1,0),0)
)
Enter as an array formula: Ctrl + Shift + Enter
Output:
The formula outputs 0 when not found, and I simply used conditional formatting to turn zeroes gray.
How it works
This part of the formula looks for the element, followed by a number between 1 and 99. If found, the number of atoms is returned. Otherwise, 0 is returned. The results are stored in an array:
IFERROR(IF(FIND(C$1&ROW($1:$99),$B2),ROW($1:$99),0),0)
In the case of C13H25NO5, a search for "C" returns this array:
{1,0,0,0,0,0,0,0,0,0,0,0,13,0,0,0,...,0}
1 is the first array element, because C1 is a match. 13 is the thirteenth array element, and that's what we're interested in.
The next part of the formula looks for the element, followed by an uppercase letter, which indicates a new element. (The letters A through Z are characters 65 through 90.) If found, the number 1 is returned. Otherwise, 0 is returned. The results are stored in an array:
IFERROR(IF(FIND(C$1&CHAR(ROW($65:$90)),$B2&"Z"),1,0),0)
"Z" is appended to the chemical formula, so that a match will be found when its last element has no number. (For example, "H2O".) There is no element "Z" in the Periodic Table, so this won't cause a problem.
In the case of C13H25NO5, a search for "N" returns this array:
{0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0}
1 is the 15th element in the array. That's because it found the letters "NO", and O is the 15th letter of the alphabet.
Taking the maximum value from each array gives us the number of atoms as desired.

Related

Formula to search 3 words "text" in any sequence in different cell

Please take a look at the attached image. I have a long list of items and I've created a common keywords to search in that list. I'm using this formula:
=INDEX(A:A,MATCH((("*"&B2&"*")&("*"&C2&"*")&("*"&D2&"*")&("*"&E2&"*")&("*"&F2&"*")),A:A,0))
The problem that the search is going through the same sequence that I entered.
It gives error if the sequence of the words in the cell is different than the sequence in my formula which make sense.
Is there a way I can search for 3 or more words that are existing in any cell in any sequence?
I am open to using VBA if necessary.
My search results:
Here is the user defined function:
Public Function indexMX(rng As Range, pat1 As Range, pat2 As Range, pat3 As Range, pat4 As Range, pat5 As Range) As Variant
Dim r As Range, rngx As Range, s(1 To 5) As String, Kount As Long, j As Long
s(1) = pat1.Value
s(2) = pat2.Value
s(3) = pat3.Value
s(4) = pat4.Value
s(5) = pat5.Value
Set rngx = Intersect(rng, rng.Parent.UsedRange)
For Each r In rngx
v = r.Value
Kount = 0
For j = 1 To 5
If InStr(1, v, s(j)) > 0 Or s(j) = "" Then Kount = Kount + 1
Next j
If Kount = 5 Then
indexMX = v
Exit Function
End If
Next r
indexMX = "no luck"
End Function
Here is an example of its usage:
As you see, we give the UDF() the address of the column and the addresses of the five keywords and the UDF() finds the first item containing all five words.
If a keyword is blank, it is not used. (so if you want to search for only two keywords, leave the other three blank). If no matches are found the phrase no luck is returned.
User Defined Functions (UDFs) are very easy to install and use:
ALT-F11 brings up the VBE window
ALT-I
ALT-M opens a fresh module
paste the stuff in and close the VBE window
If you save the workbook, the UDF will be saved with it.
If you are using a version of Excel later then 2003, you must save
the file as .xlsm rather than .xlsx
To remove the UDF:
bring up the VBE window as above
clear the code out
close the VBE window
To use the UDF from Excel:
=myfunction(A1)
To learn more about macros in general, see:
http://www.mvps.org/dmcritchie/excel/getstarted.htm
and
http://msdn.microsoft.com/en-us/library/ee814735(v=office.14).aspx
and for specifics on UDFs, see:
http://www.cpearson.com/excel/WritingFunctionsInVBA.aspx
Macros must be enabled for this to work!
EDIT#1:
to remove case sensitivity, replace:
If InStr(1, v, s(j)) > 0 Or s(j) = "" Then Kount = Kount + 1
with:
If InStr(1, LCase(v), LCase(s(j))) > 0 Or s(j) = "" Then Kount = Kount + 1
Yes, it is possible for a single cell to return three matching words from a different cell. The answer in this example uses a formula to return 6 matches. VBA and special array functions are not used.
This is the formula:
=SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUMPRODUCT((IFERROR(SEARCH(FIRST,TARGET),0)>0)*100000+(IFERROR(SEARCH(SECOND,TARGET),0)>0)*20000+(IFERROR(SEARCH(THIRD,TARGET),0)>0)*3000+(IFERROR(SEARCH(FOURTH,TARGET),0)>0)*400+(IFERROR(SEARCH(FIFTH,TARGET),0)>0)*50+(IFERROR(SEARCH(SIXTH,TARGET),0)>0)*6),"1",FIRST&DELIM),"2",SECOND&DELIM),"3",THIRD&DELIM),"4",FOURTH&DELIM),"5",FIFTH&DELIM),"6",SIXTH&DELIM),"0","")
Did you notice the named ranges? FIRST, SECOND, THIRD, etc are individual cells and each one holds a word. We are trying to find those words inside TARGET. If we find the words, then we will write them in this cell holding this formula and each word will be separated by DELIM
The ranges are optional. In the picture below, you'll see cell A2 contains the word "named". This is the first of the six words we're tying to find and it can be expressed as FIRST = "A2" = "named" Inside the formula you'll see that FIRST appears twice. You could replace it with "named" and the cell A2 would become blank but the functionality of the formula would not change.
Even TARGET is optional. It could be written as E1 or typed out word for word.
I don't know why anyone would do that... but it is possible.
DELIM is at cell B2, it is a double space
Now to explain how it works
SEARCH(search for what?, search where?) This is responsible for determining if a match exists or not. If you understand what the named ranges then you've already figured out the syntax is. The location of first letter that of match in TARGET is returned. In this formula it is always 1 If it is not found then the number is 0
IFERROR(value,value) tries to perform the operation. If successful then the result is displayed. If there's an error the second result is displayed. Every IFERROR in this formula is practically the same: IFERROR(SEARCH(FIRST,TARGET),0) It searches inside TARGET trying to find the FIRST word. Result if found is 1 and if not found is 0
It gets a little more complicated from here so lets recap. We're calling SEARCH 6 times. Once for each word we want to find and we're always looking in TARGET. Result will be a 1 if match is found or a 0 if not. Ironically, us humans can put it together and see the match but the formula can't determine which words have been matched without more information
SUMPRODUCT takes the sum (addition) of the product (multiplication) of two or more arrays.
multiply two arrays to get the product
a, b, c * e, f, g = ae, bf, cg
takethe sum of the product to get the SUMPRODUCT
ae + bf + cg`
This is easiest when thought of a price and quantity. If one array is the price of a group of items and the other is quantity of the same group of items, then multiplying the two arrays will create a new array where each element is the cost to buy all items of that type in the group the total, and adding all those numbers give you the total cost you'd pay for all of the items
Here we multiply two arrays:
Qty Price
12.0 0.3
70.0 0.1
20.0 0.4
Multiply them to get the product:
Qty Price Total
12.0 0.3 3.8
70.0 0.1 7.0
20.0 0.4 8.0
Take the sum of the product:
Qty Price Total
12.0 0.3 3.8
70.0 0.1 7.0
20.0 0.4 8.0
18.8 SUMPRODUCT
Lets look at part of the formula:
SUMPRODUCT((IFERROR(SEARCH(FIRST,TARGET),0)>0)*100000+IFERROR(SEARCH(SECOND,TARGET),0)>0)*20000+...
It is easy to see this segment is looking for two words. We know SUMPRODUCT want's to multiply and add arrays. If you're thinking (IFERROR(SEARCH(FIRST,TARGET),0)>0) is an array, you'd be right! It's not an array in the technical sense of the word, but it does evaluate into a single value which can thought of as a 1x1 array, or, a cell. The sharp eyed and quick witted, may have noticed there is something on this array we have mentioned. It's the inequality at the end! Many of you know that you can take numerical values and turn them into boolean by testing them with an inequality. So lets evaluate.... SEARCH for FIRST inside TARGET = 1 because FIRST = "named" which is inside TARGET waaaaaaay in the back and because it wasn't an error we get to keep the 1. Next we do the inequality 1 > 0 = TRUE One is greater than zero and evaluates to TRUE
This is what we have right now
SUMPRODUCT((TRUE*100000+IFERROR(SEARCH(SECOND,TARGET),0)>0)*20000+....
Can you identify the arrays now? We know TRUE is an array, a 1x1. You know the IFERROR bit all the way to the inequality is also an array. Lets evaluate that IFERROR .... Mathematically we should still be working from left to right but trust me, we're ok if we let it slide this once.
IFERROR(SEARCH(SECOND,TARGET),0)>0) SECOND = "array" = 1 = TRUE
Did you follow my short hand? It's ok if you didn't just back up and practice on FIRST until you understand.
Plugging in the value gives us something like this
SUMPRODUCT(TRUE*100000+TRUE*20000+...
SUMPRODUCT is the SUM(addition) of the PRODUCT(multiplication)
So we're adding the stuff we multiply
SUMPRODUCT = (TRUE * 100000) + (TRUE * 20000)
Remember how easy it was to go from 1 > 0 to TRUE. We're "getting all up into that boolean TRUE that" equals 1 Here's a fun fact, -1 is also equal to TRUE. If you've ever seen a formula with a double negative in it like this STUFF(--(MORESTUFF( that's just some Excel wizard person making sure they get a +1 instead of a -1 ... ok, so lets get back on track and evalute
SUMPRODCUT = 1 * 100000 + 1 * 20000+....
SUMPRODCUT = 100000 + 20000+....
SUMPRODCUT = 120000+.....
I know you've been asking about those numbers. One hundred thousand? What's one hundred thousand for? I've been purposefully ignoring until it became convenient to talk about it. And now it's convenient. Go look at the whoooole formula and you'll find a pattern. Those numbers are in a decreasing sequence. Anyone who's ever done bitwise logic can see where this is going.I'm running short on time so I'll cut to the chase. Assume a hypothetical situation where every word was matched. You'd end up with
SUMPRODUCT = 100000 + 20000 + 3000 + 400 + 50 + 6
SUMPRODCUT = 123456
123456 are you pulling my leg? No, I am not. We're almost done so if you're still with me then you're gonna drive it home.
We have a large group of SUBSTITUTE teachers at the front of the line and we gotta get rid of them.
We also have this to contend with :"1",FIRST&DELIM),"2",SECOND&DELIM),"3",THIRD&DELIM),"4",FOURTH&DELIM),"5",FIFTH&DELIM),"6",SIXTH&DELIM),"0","")
Thankfully for us, they are part of the same problem. We worked our way from the middle out.
SUBSTITUTE(SUBSTITUTE(text, old text, new text)
SUBSTITUTE(SUBSTITUTE("123456","1", FIRST & DELIM),"2",SECOND & DELIM)....
Remember at top DELIM was pointing to a cell holding a double space. Each DELIM can be replaced with " " or any other delimiter you want.
SUBSTITUTE(SUBSTITUTE("123456","1", "named" & " "),"2",SECOND & DELIM)...
SUBSTITUTE("named 23456","2",SECOND & DELIM)...
SUBSTITUTE("named 23456","2","array"& " ")...
("named array 3456")... and so on.
Any questions?
Ok, class is dismissed!

Count Patterns In one Cell Excel

I wanted your help, I'm currently working in extracting some data, now the thing is that I have to count an specific amount of Call IDs a call ID format is the following 9129572520020000711. The pattern is 19 characters that starts with 9 and ends in 1.
and I want to count how many times this pattern appears in one cell
I.E. this is the value in one cell and I want to count how many times the pattern appears.
1912957252002000071129129545183410000711391295381628700007114912959791875000071159129597085000000711691295892838400007117912958908933000071189129452513730000711
To solve this with formulae you need to know:
The starting character
The ending character
The length of your Call ID
Finding all possible Call IDs
Let B1 be your number string and B2 be the call ID (or pattern) you are looking for. In B5 enter the formula =MID($B$2,1,1) to find the starting character you are looking for. In B6 enter =RIGHT($B$2,1) for the end character. In B7 enter =LEN($B$2) for the length of the call ID.
In Column A we'll enter the position of every starting character. The first formula will be a simple Find() formula in B10 as =FIND($B$5,$B$1,1). To find the other starting characters start the Find() at the location after the last starting character: =FIND($B$5,$B$1,$A10+1) in B11. Copy this down the column a few dozen times (or more).
In Column B we'll see if the next X characters (where X is the length of the Call ID) meets the criteria for a Call ID:
=IF(MID($B$1,$A10+($B$7-1),1)=$B$6,TRUE,FALSE)
The MID($B$1,$A10+($B$7-1),1)=$B$6 checks if the character at the end of the character at the end of this possible Call ID is the end character we're looking for. $A10+($B$7) calculates the position of the possible Call ID and $B$6 is the end character.
In Column C we can return the actual Call ID if there is a match. This isn't necessary to find the count, but will be useful later. Simply check if the value in Column B is True and, if yes, return the calculated string: =IF(B10,MID($B$1,$A10,$B$7),"").
To actually count the number of valid Call IDs, do a CountIf() of the Call ID column to check for the number of True values: =IF(B10,MID($B$1,$A10,$B$7),"").
If you don't want all the #Values! just wrap everything in IFERROR(,"") formulas.
Finding all consecutive Call IDs
However , some of these Call IDs overlap. Operating on the assumption that Call IDs cannot overlap, we simply have to start our search after the end character of a found ID, not the start. Insert an "Ending Position" column in Column B with the formulae: =$A10+($C$7-1), starting in B11. Alter A11 to =FIND($C$5,$C$1,$B10+1) and copy down. Don't change A10 as this finds the first starting position and is not depending on anything but the original text.
Which ones are valid?
I don't know, that depends on other criteria for your Call IDs. If you receive them consecutively, then the second method is best and the other possible ones found are by coincidence. If not, then you'll have to apply some other validation criteria to the first method, hence why we identified each ID.
You can solve this simply with a UDF using a regular expression.
Option Explicit
Function callIDcount(S As String) As Long
Dim RE As Object, MC As Object
Const sPat As String = "9\d{17}1"
Set RE = CreateObject("vbscript.regexp")
With RE
.Global = True
.Pattern = sPat
Set MC = .Execute(S)
callIDcount = MC.Count
End With
End Function
Using your example, this returns a count of 8
The regular expression engine captures all of the matches that match the pattern, into the match collection. To see how many are there, we merely return the count of that collection.
Trivial modifications would allow one to return the actual ID's also, should that be necessary.
The regex:
9\d{17}1
9\d{17}1
Match the character “9” literally 9
Match a single character that is a “digit” (ASCII 0–9 only) \d{17}
Exactly 17 times {17}
Match the character “1” literally 1
Created with RegexBuddy
EDIT Reading through TheFizh's post, he considered that you might want the count to include overlapping CallID's. In other words, given:
9129572520020000711291
We see that includes:
9129572520020000711
9572520020000711291
where the second overlaps with the first, but both meet your requirements.
Should that be what you want, merely change the regex so it does not "consume" the match:
Const sPat As String = "9(?=\d{17}1)"
and you will return the result of 15 instead of 8, which would be non-overlapping pattern.
Do you mean something like what's following?
Sub CallID_noPatterns()
Dim CallID As String, CallIDLen As Integer
CallID = "9#################1"
CallIDLen = Len(CallID) 'the CallID's length
'Say that you want to get the value of "A1" cell and deal with its value
Dim CellVal As String, CellLen As Integer
CellVal = CStr(Range("A1").Text) 'get its value as a string
CellLen = Len(CellVal) 'get its length
'You Have 2 options:-
'1-The value is smaller than your CallID length. (Not Applicable)
'2-The value is longer than or equal to your CallID length
'So just run your code for the 2nd option
Dim i As Integer, num_checks, num_patterns
i = 0
num_patterns = 0
'imagine both of them as 2 arrays, every array consists of sequenced elements
'and your job is to take a sub-array from your value, of a length
' equals to CallID's length
'then compare your sub-array with CallID
num_checks = CellLen - CallIDLen + 1
If CellLen >= CallIDLen Then
For i = 0 To num_checks - 1 Step 19
For j = i To num_checks - 1
If Mid(CellVal, (j + 1), CallIDLen) Like CallID Then
num_patterns = num_patterns + 1
Exit For
End If
Next j
Next i
End If
'Display your result
MsgBox "Number of Patterns: " & Str(num_patterns)
End Sub

How can I lookup data from one column, when the value I'm referencing changes columns?

I want to do an INDEX-MATCH-like lookup between two documents, except my MATCH's index array doesn't stay in one column.
In Vague-English: I want a value from a known column that matches another value that may be found in any column.
Refer to the image below. Let's call everything to the left of the bold vertical line on column H doc1, and the right side will be doc2.
Doc2 has a column "Find This", which will be the INDEX's array. It is compared with "ID1" from doc1 (Note that the values in "Find This" will not be in the same order as column ID1, but it's easier to undertsand this way).
The "[Result]" column in doc2 will be the value from doc1's "Want This" column from the row that matches "FIND THIS" ...However, sometimes the value from "FIND THIS" is not in the "ID1" column, and is instead in "ID2","ID3", etc.
So, I'm trying to generate Col K from Col J. This would be like pressing Ctrl+F and searching for a value in Col J, then taking the value from Col D in that row and copying it to Col K.
I made identical values from a column the same color in the other doc to make it easier to visualize where they are coming from.
Note also that in column F of doc1, the same value from doc2's "Find This" can be found after some other text.
Also note that the column headers are only there as examples, the ID columns are not actually numbered.
I would simply hard-code the correct column to search from, but I'm not in control of doc1, and I'm worried that future versions may have new "ID" columns, with other's being removed.
I'd prefer this to be a solution in the form of a formula, but VB will do.
To generate column K based on given values of column J then you could use the following:
=INDEX(doc1!$D$2:$D$14,SUMPRODUCT((doc1!$B$2:$H$14=J2)*ROW(doc1!$B$2:$H$14))-1)
Copy that formula down as far as you need to go.
It basically only returns the row of the where a matching column J is found. we then find that row in the index of your D range to get your value in K.
Proof of concept:
UPDATE:
If you are working with non unique entities n column J. That is the value on its own can be found in multiple rows and columns. Consider using the following to return the Last row where there J value is found:
=INDEX(doc1!$D$2:$D$14,AGGREGATE(14,6,(doc1!$B$2:$H$14=J2)*ROW(doc1!$B$2:$H$14),1)-1)
UPDATE 2:
And to return the first row where what you are looking in column J is found use:
=INDEX($D$2:$D$14,AGGREGATE(15,6,1/($B$2:$H$14=J2)*ROW($B$2:$H$14)-1,1))
Thanks to Scott Craner for the hint on the minimum formula.
To determine if you have UNIQUE data from column J in your range B2:H14 you can enter this array formula. In order to enter an array formula you need to press CTRL+SHFT+ENTER at the same time and not just ENTER. You will know you have done it right when you see {} around your formula in the formula bar. You cannot at the {} manually.
=IF(MAX(COUNTIF($B$2:$H$14,J2:J14))>1,"DUPLICATES","UNIQUE")
UPDATE 3
AGGREGATE - A relatively new function to me but goes back to Excel 2010. Aggregate is 19 functions rolled into 1. It would be nice if they all worked the same way but they do not. I think it is functions numbered 14 and up that will perform the same way an array formula or a CSE formula if you prefer. The nice thing is you do not need to use CSE when entering or editing them. SUMPRODUCT is another example of a regular formula that performs array formula calculations.
The meat of this explanation I believe is what is happening inside of the AGGREGATE brackets. If you click on the link you will get an idea of what the first two arguments are. The first defines which function you are using, and the second tell AGGREGATE how to deal with Errors, hidden rows, and some other nested functions. That is the relatively easy part. What I believe you want to know is what is happening with this:
(doc1!$B$2:$H$14=J2)*ROW(doc1!$B$2:$H$14)
For illustrative purpose lets reduce this formula to something a little smaller in scale that does the same thing. I'll avoid starting in A1 as that can make life a little easier when counting since it the 1st row and first column. So by placing the example range outside of it you can see some more special considerations potentially.
What I want to know is what row each of the items list in Column C occurs in column B
| B | C
3 | DOG | PLATYPUS
4 | CAT | DOG
5 | PLATYPUS |
The full formula for our mini example would be:
{=($B$3:$B$5=C2)*ROW($B$3:$B$5)}
And we are going to look at the following as an array
=INDEX($B$3:$B$5,AGGREGATE(14,6,($B$3:$B$5=C2)*ROW($B$3:$B$5),1)-2)
So the first brackets is going to be a Boolean array as you noted. Every cell that is TRUE will TRUE until its forced into a math calculation. When that happens, True becomes 1 and False becomes 0.I that formula was entered as a CSE formula and place in D2, it would break down as follows:
FALSE X 3
FALSE X 4
TRUE X 5
The 3, 4 and 5 come from ROW() returning the value of the row number that it is dealing with at the time of the array math operation. Little trick, we could have had ROW(1:3). Just need to make sure the size of the array matches! This is not matrix math is just straight across multiplication. And since the Boolean is now experiencing a math operation we are now looking at:
0 X 3 = 0
0 X 4 = 0
1 X 5 = 5
So the array of {0,0,5} gets handed back to the aggregate for more processing. The important thing to note here is that it contains ONLY 0 and the individual row numbers where we had a match. So with the first aggregate formula, formula 14 was chosen which is the LARGE function. And we also told it to ignore errors, which in this particular case does not matter. So after providing the array to the aggregate function, there was a ,1) to finish off the aggregate function. The 1 tells the aggregate function that we want the 1st larges number when the array is sorted from smallest to largest. If that number was 2 it would be the 2nd largest number and so on. So the last row or the only row that something is found on is returned. So in our small example it would be 5.
But wait that 5 was buried inside another function called Index. and in our small example that INDEX formula would be:
=INDEX($B$3:$B$5,AGGREGATE(...)-2)
Well we know that the range is only 3 rows long, so asking for the 5th row, would have excel smacking you up side the head with an error because your index number is out of range. So in comes the header row correction of -1 in the original formula or -2 for the small example and what we really see for the small example is:
=INDEX($B$3:$B$5,5-2)
=INDEX($B$3:$B$5,3)
and here is a weird bit of info, That last one does not become PLATYPUS...it becomes the cell reference to =B5 which pulls PLATYPUS. But that little nuance is a story for another time.
Now in the comments Scott essentially told me to invert for the error to get the first row. And this is important step for the aggregate and it had me running in circles for awhile. So the full equation for the first row option in our mini example is
=INDEX($B$3:$B$5,AGGREGATE(15,6,1/($B$3:$B$5=C2)*ROW($B$3:$B$5),1)-2)
And what Scott Craner was actually suggesting which Skips one math step is:
=INDEX($B$3:$B$5,AGGREGATE(15,6,ROW($B$3:$B$5)/($B$3:$B$5=C2),1)-2)
However since I only realized this after writing this all up the explanation will continue with the first of these two equations
So the important thing to note here is the change from function 14 to function 15 which is SMALL. Think of it a finding the minimum. And this time that 6 plays a huge factor along with the 1/. So our array in the middle this time equates to:
1/FALSE X 3
1/FALSE X 4
1/TRUE X 5
Which then becomes:
1/0 X 3
1/0 X 4
1/1 X 5
Which then has excel slapping you up side the head again because you are trying to divide by 0:
#div/0 X 3
#div/0 X 4
1/1 X 5
But you were smart and you protected yourself from that slap upside the head when you told AGGREGATE to ignore error when you used 6 as the second argument/reference! Therefore what is above becomes:
{5}
Since we are performing a SMALL, and we passed ,1) as the closing part of the AGGREGATE, we have essentially said give me the minimum row number or the 1st smallest number of the resulting array when sorted in ascending order.
The rest plays out the same as it did for the LARGE AGGREGATE method. The pitfall I fell into originally is I did not use the 1/ to force an error. As a result, every time I tried getting the SMALL of the array I was getting 0 from all the false results.
SUMPRODUCT works in a very similar fashion, but only works when your result array in the middle only returns 1 non zero answer. The reason being is the last step of the SUMPRODUCT function is to all the individual elements of the resulting array. So if you only have 1 non zero, you get that non zero number. If you had two rows that matched for instance 12 and 31, then the SUMPRODUCT method would return 43 which is not any of the row numbers you wanted, where as aggregate large would have told you 31 and aggregate small would have told you 12.
Something like this maybe, starting in K2 and copied down:
=IFERROR(INDEX(D:D,MAX(IFERROR(MATCH(J2,B:B,0),-1),IFERROR(MATCH(J2,E:E,0),-1),IFERROR(MATCH(J2,G:G,0),-1),IFERROR(MATCH(J2,H:H,0),-1))),"")
If you want to keep the positions of the columns for the Match variable, consider creating generic range names for each column you want to check, like "Col1", "Col2", "Col3". Create a few more range names than you think you will need and reference them to =$B:$B, =$E:$E etc. Plug all range names into Match functions inside the Max() statement as above.
When columns are added or removed from the table, adjust the range name definitions to the columns you want to check.
For example, if you set up the formula with five Matches inside the Max(), and the table changes so you only want to check three columns, point three of the range names to the same column. The Max() will only return one result and one lookup, even if the same column is matched several times.
I came up with a vba solution if I understood correctly:
Sub DisplayActiveRange()
Dim sheetToSearch As Worksheet
Set sheetToSearch = Sheet2
Dim sheetToOutput As Worksheet
Set sheetToOutput = Sheet1
Dim search As Range
Dim output As Range
Dim searchCol As String
searchCol = "J"
Dim outputCol As String
outputCol = "K"
Dim valueCol As String
valueCol = "D"
Dim r As Range
Dim currentRow As Integer
currentRow = 1
Dim maxRow As Integer
maxRow = sheetToOutput.UsedRange.Rows.Count
For currentRow = 1 To maxRow
Set search = Range("J" & currentRow)
For Each r In sheetToSearch.UsedRange
If r.Value <> "" Then
If r.Value = search.Value Then
Set output = sheetToOutput.Range(outputCol & currentRow)
output.Value = sheetToSearch.Range(valueCol & currentRow).Value
currentRow = currentRow + 1
Set search = sheetToOutput.Range(searchCol & currentRow)
End If
End If
Next
Next currentRow
End Sub
There might be better ways of doing it, but this will give you what you want. We assume headers in both "source" and "destination" sheets. You will need to adapt the "Const" declarations according to how your sheets are named. Press Control & G in Excel to bring up the VBA window and copy and paste this code into "This Workbook" under the "VBA Project" group, then select "Run" from the menu:
Option Explicit
Private Const sourceSheet = "Source"
Private Const destSheet = "Destination"
Public Sub FindColumns()
Dim rowCount As Long
Dim foundValue As String
Sheets(destSheet).Select
rowCount = 1 'Assume a header row
Do While Range("J" & rowCount + 1).value <> ""
rowCount = rowCount + 1
foundValue = FncFindText(Range("J" & rowCount).value)
Sheets(destSheet).Select
Range("K" & rowCount).value = foundValue
Loop
End Sub
Private Function FncFindText(value As String) As String
Dim rowLoop As Long
Dim colLoop As Integer
Dim found As Boolean
Dim pos As Long
Sheets(sourceSheet).Select
rowLoop = 1
colLoop = 0
Do While Range(alphaCon(colLoop + 1) & rowLoop + 1).value <> "" And found = False
rowLoop = rowLoop + 1
Do While Range(alphaCon(colLoop + 1) & rowLoop).value <> "" And found = False
colLoop = colLoop + 1
pos = InStr(Range(alphaCon(colLoop) & rowLoop).value, value)
If pos > 0 Then
FncFindText = Mid(Range(alphaCon(colLoop) & rowLoop).value, pos, Len(value))
found = True
End If
Loop
colLoop = 0
Loop
End Function
Private Function alphaCon(aNumber As Integer) As String
Dim letterArray As String
Dim iterations As Integer
letterArray = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
If aNumber <= 26 Then
alphaCon = (Mid$(letterArray, aNumber, 1))
Else
If aNumber Mod 26 = 0 Then
iterations = Int(aNumber / 26)
alphaCon = (Mid$(letterArray, iterations - 1, 1)) & (Mid$(letterArray, 26, 1))
Else
'we deliberately round down using 'Int' as anything with decimal places is not a full iteration.
iterations = Int(aNumber / 26)
alphaCon = (Mid$(letterArray, iterations, 1)) & (Mid$(letterArray, (aNumber - (26 * iterations)), 1))
End If
End If
End Function

Prevent Partial Duplicates in Excel

I have a worksheet with products where the people in my office can add new positions. The problem we're running into is that the products have specifications but not everybody puts them in (or inputs them wrong).
Example:
"cool product 14C"
Is there a way to convert Data Valuation option so that it warns me now in case I put "very cool product 14B" or anything that contains an already existing string of characters (say, longer than 4), like "cool produKt 14C" but also "good product 15" and so on?
I know that I can prevent 100% matches using COUNTIF and spot words that start/end in the same way using LEFT/RIGHT but I need to spot partial matches within the entries as well.
Thanks a lot!
If you want to cover typo's, word wraps, figure permutations etc. maybe a SOUNDEX algorithm would suit to your problem. Here's an implementation for Excel ...
So if you insert this as a user defined function, and create a column =SOUNDEX(A1) for each product row, upon entry of a new product name you can filter for all product rows with same SOUNDEX value. You can further automate this by letting user enter the new name into a dialog form first, do the validation, present them a Combo Box dropdown with possible duplicates, etc. etc. etc.
edit:
small function to find parts of strings terminated by blanks in a range (in answer to your comment)
Function FindSplit(Arg As Range, LookRange As Range) As String
Dim LookFor() As String, LookCell As Range
Dim Idx As Long
LookFor = Split(Arg)
FindSplit = ""
For Idx = 0 To UBound(LookFor)
For Each LookCell In LookRange.Cells
If InStr(1, LookCell, LookFor(Idx)) <> 0 Then
If FindSplit <> "" Then FindSplit = FindSplit & ", "
FindSplit = FindSplit & LookFor(Idx) & ":" & LookCell.Row
End If
Next LookCell
Next Idx
If FindSplit = "" Then FindSplit = "Cool entry!"
End Function
This is a bit crude ... but what it does is the following
split a single cell argument in pieces and put it into an array --> split()
process each piece --> For Idx = ...
search another range for strings that contain the piece --> For Each ...
add piece and row number of cell where it was found into a result string
You can enter/copy this as a formula next to each cell input and know immediately if you've done a cool input or not.
Value of cell D8 is [asd:3, wer:4]
Note the use of absolute addressing in the start of lookup range; this way you can copy the formula well down.
edit 17-Mar-2015
further to comment Joanna 17-Mar-2015, if the search argument is part of the range you're scanning, e.g. =FINDSPLIT(C5; C1:C12) you want to make sure that the If Instr(...) doesn't hit if LookCell and LookFor(Idx) are really the same cell as this would create a false positive. So you would rewrite the statement to
...
...
If InStr(1, LookCell, LookFor(Idx)) <> 0 And _
Not (LookCell.Row = Arg.Row And LookCell.Column = Arg.Column) _
Then
hint
Do not use a complete column (e.g. $C:$C) as the second argument as the function tends to become very slow without further precautions

Deleting variable number of leading characters from a variable-length string

If I am having G4ED7883666 and I want the output to be 7883666
and I have to apply this on a range of cells and they are not the same length and the only common thing is that I have to delete anything before the number that lies before the alphabet?
This formula finds the last number in a string, that is, all digits to the right of the last alpha character in the string.
=RIGHT(A1,MATCH(99,IFERROR(1*MID(A1,LEN(A1)+1-ROW($1:$25),1),99),0)-1)
Note that this is an array formula and must be entered with the Control-Shift-Enter keyboard combination.
How the formula works
Let's assume that the target string is fairly simple: "G4E78"
Working outward from the middle of the formula, the first thing to do is create an array with the elements 1 through 25. (Although this might seem to limit the formula to strings with no more than 25 characters, it actually places a limit of 25 digits on the size of the number that may be extracted by the formula.
ROW($1:$25) = {1;2;3;4;5;6;7; etc.}
Subtracting from this array the value of (1 + the length of the target string) produces a new array, the elements of which count down from the length of string. The first five elements will correspond to the position of the characters of the string - in reverse order!
LEN(A1)+1-ROW($1:$25) = {5;4;3;2;1;0;-1;-2;-3;-4; etc.}
The MID function then creates a new array that reverses the order of the characters of the string.
For example, the first element of the new array is the result of MID(A1, 5, 1), the second of MID(A1, 4, 1) and so on. The #VALUE! errors reflect the fact that MID cannot evaluate 0 or negative values as the position of a string, e.g., MID(A1,0,1) = #VALUE!.
MID(A1,LEN(A1)+1-ROW($1:$25),1) = {"8";"7";"E";"4";"G";#VALUE!;#VALUE!; etc.}
Multiplying the elements of the array by 1 turns the character elements of that array to #VALUE! errors as well.
=1*MID(A1,LEN(A1)+1-ROW($1:$25),1) = {"8";"7";#VALUE!;"4";#VALUE!;#VALUE!;#VALUE!; etc.}
And the IFERROR function turns the #VALUES into 99, which is just an arbitrary number greater than the value of a single digit.
IFERROR(1*MID(A1,LEN(A1)+1-ROW($1:$25),1),99) = {8;7;99;4;99;99;99; etc.}
Matching on the 99 gives the position of the first non-digit character counting from the right end of the string. In this case, "E" is the first non-digit in the reversed string "87E4G", at position 3. This is equivalent to saying that the number we are looking for at the end of the string, plus the "E", is 3 characters long.
MATCH(99,IFERROR(1*MID(A1,LEN(A1)+1-ROW($1:$25),1),99),0) = 3
So, for the final step, we take 3 - 1 (for the "E) characters from the right of string.
RIGHT(A1,MATCH(99,IFERROR(1*MID(A1,LEN(A1)+1-ROW($1:$25),1),99),0)-1) = "78"
One more submission for you to consider. This VBA function will get the right most digits before the first non-numeric character
Public Function GetRightNumbers(str As String)
Dim i As Integer
For i = Len(str) To 0 Step -1
If Not IsNumeric(Mid(str, i, 1)) Then
Exit For
End If
Next i
GetRightNumbers = Mid(str, i + 1)
End Function
You can write some VBA to format the data (just starting at the end and working back until you hit a non-number.)
Or you could (if you're happy to get an addin like Excelicious) then you can use regular expressions to format the text via a formula. An expression like [0-9]+$ would return all the numbers at the end of a string IIRC.
NOTE: This uses the regex pattern in James Snell's answer, so please upvote his answer if you find this useful.
Your best bet is to use a regular expression. You need to set a reference to VBScript Regular Expressions for this to work. Tools --> References...
Now you can use regex in your VBA.
This will find the numbers at the end of each cell. I am placing the result next to the original so that you can verify it is working the way you want. You can modify it to replace the cell as soon as you feel comfortable with it. The code works regardless of the length of the string you are evaluating, and will skip the cell if it doesn't find a match.
Sub GetTrailingNumbers()
Dim ws As Worksheet
Dim rng As Range
Dim cell As Range
Dim result As Object, results As Object
Dim regEx As New VBScript_RegExp_55.RegExp
Set ws = ThisWorkbook.Sheets("Sheet1")
' range is hard-coded here, but you can define
' it programatically based on the shape of your data
Set rng = ws.Range("A1:A3")
' pattern from James Snell's answer
regEx.Pattern = "[0-9]+$"
For Each cell In rng
If regEx.Test(cell.Value) Then
Set results = regEx.Execute(cell.Value)
For Each result In results
cell.Offset(, 1).Value = result.Value
Next result
End If
Next cell
End Sub
Takes the first 4 digits from the right of num:
num1=Right(num,4)
Takes the first 5 digits from the left of num:
num1=Left(num,5)
First takes the first ten digits from the left then takes the first four digits from the right:
num1=Right(Left(num, 10),4)
In your case:
num=G4ED7883666
num1=Right(num,7)

Resources