Way to extract Count from description - excel

I constantly am tasked with pulling information from another source and receive the information without Size denomination. The size or count is in the description and want to be able to write an excel formula with the following logic:
if cell contains Capsules, Tablets, Liquid, etc, then make this cell equal too Capsules, Tablets, Liquid, etc + the number that comes before it
A few example items:
Nature's Bounty Magnesium 500 mg, 200 Tablets
Aerobic Life Mag 07 Oxygen Digestive System Cleanser Capsules, 180 Count
Natural Vitality Natural Calm Anti Stress Drink 30 count Raspberry Lemon
Want the following column to have Tablets or Count if it is said in the description.
I've tried using if and then statements but don't know how to combine them to create 'or' and 'and'
Expect to have the columns from the examples above to look as follows:
200 Tablets
180 Count
30 Count

If the values are in column A, the formula would be like that:
=+IF(ISERROR(VALUE(RIGHT(LEFT(A3,SEARCH(IF(ISERROR(SEARCH("Tablets",A3)),IF(ISERROR(SEARCH("Capsules",A3)),IF(ISERROR(SEARCH("Count",A3)),"","Count"),"Capsules"),"Tablets"),$A3)-1),4))),0,VALUE(RIGHT(LEFT(A3,SEARCH(IF(ISERROR(SEARCH("Tablets",A3)),IF(ISERROR(SEARCH("Capsules",A3)),IF(ISERROR(SEARCH("Count",A3)),"","Count"),"Capsules"),"Tablets"),$A3)-1),4)))&" "&IF(ISERROR(SEARCH("Tablets",A3)),IF(ISERROR(SEARCH("Capsules",A3)),IF(ISERROR(SEARCH("Count",A3)),"","Count"),"Capsules"),"Tablets")
As the formula is long see below separated in two different cells, the first one search which is the item we are looking (Tablets, Capsules or Count) and returns it, the second one search the previous number:
+IF(ISERROR(SEARCH("Tablets",A4)),IF(ISERROR(SEARCH("Capsules",A4)),IF(ISERROR(SEARCH("Count",A4)),"","Count"),"Capsules"),"Tablets")
+IF(ISERROR(VALUE(RIGHT(LEFT(A4,SEARCH(B4,$A4)-1),4))),0,VALUE(RIGHT(LEFT(A4,SEARCH(B4,$A4)-1),4)))
The formula has a limitation regarding the numbers is getting, and it's limited to 9999, more than that the formula trunks the result. Also for number 1 to 9 the formula may return error #value. This can be addressed changing the second formula if needed.

I would do something like this:
Start with an IF statement IF(-test-, -true-, -false-)
Populate the -test- portion. Say I first wanted to find 'Tablet'. I could use ...FIND("Tablet",A2)... However, the issue here is if "Tablet" is not found it will throw an error so I would error-proof it like this ...IFERROR(Find("Tablet",A2), FALSE)... which will return FALSE if "Tablet" is not found in cell A2.
But I also want to look for 'Count' so I could use an OR along with that I already built up ...OR( IFERROR(Find("Tablet",A2), FALSE), IFERROR(Find("Count",A2), FALSE) ) so this will OR together the outcomes - if either "Tablet" or "Count" is found, this function returns a number (and since it's a number, it's read as TRUE!)
Now I throw that OR() equation into my IF for the -true-, then of course my I could make as "Tablet" or "Count" or put a new equation there to specifically say what it found. And my -false- I can make ""
At this point, I would have a complete equation which will check for "Count" or "Tablet" and spit out whatever is in my -true- equation. Otherwise, the cell would remain blank:
IF(OR( IFERROR(Find("Tablet",A2), FALSE), IFERROR(Find("Count",A2), FALSE) ), "Count/Tablet", "")

Here's a lightly-tested UDF using RegExp:
Function Amounts(txt)
Dim re As Object, allMatches, m, rv, sep
Set re = CreateObject("VBScript.RegExp")
re.Pattern = "(\d+\s?(tablets|tablet|count|liquid))"
re.ignorecase = True
re.MultiLine = True
re.Global = True
Set allMatches = re.Execute(txt)
For Each m In allMatches
rv = rv & sep & m
sep = ","
Next m
Amounts = IIf(rv <> "", rv, "(none)")
End Function
Usage:
=Amounts(A2)

Related

Formula to search 3 words "text" in any sequence in different cell

Please take a look at the attached image. I have a long list of items and I've created a common keywords to search in that list. I'm using this formula:
=INDEX(A:A,MATCH((("*"&B2&"*")&("*"&C2&"*")&("*"&D2&"*")&("*"&E2&"*")&("*"&F2&"*")),A:A,0))
The problem that the search is going through the same sequence that I entered.
It gives error if the sequence of the words in the cell is different than the sequence in my formula which make sense.
Is there a way I can search for 3 or more words that are existing in any cell in any sequence?
I am open to using VBA if necessary.
My search results:
Here is the user defined function:
Public Function indexMX(rng As Range, pat1 As Range, pat2 As Range, pat3 As Range, pat4 As Range, pat5 As Range) As Variant
Dim r As Range, rngx As Range, s(1 To 5) As String, Kount As Long, j As Long
s(1) = pat1.Value
s(2) = pat2.Value
s(3) = pat3.Value
s(4) = pat4.Value
s(5) = pat5.Value
Set rngx = Intersect(rng, rng.Parent.UsedRange)
For Each r In rngx
v = r.Value
Kount = 0
For j = 1 To 5
If InStr(1, v, s(j)) > 0 Or s(j) = "" Then Kount = Kount + 1
Next j
If Kount = 5 Then
indexMX = v
Exit Function
End If
Next r
indexMX = "no luck"
End Function
Here is an example of its usage:
As you see, we give the UDF() the address of the column and the addresses of the five keywords and the UDF() finds the first item containing all five words.
If a keyword is blank, it is not used. (so if you want to search for only two keywords, leave the other three blank). If no matches are found the phrase no luck is returned.
User Defined Functions (UDFs) are very easy to install and use:
ALT-F11 brings up the VBE window
ALT-I
ALT-M opens a fresh module
paste the stuff in and close the VBE window
If you save the workbook, the UDF will be saved with it.
If you are using a version of Excel later then 2003, you must save
the file as .xlsm rather than .xlsx
To remove the UDF:
bring up the VBE window as above
clear the code out
close the VBE window
To use the UDF from Excel:
=myfunction(A1)
To learn more about macros in general, see:
http://www.mvps.org/dmcritchie/excel/getstarted.htm
and
http://msdn.microsoft.com/en-us/library/ee814735(v=office.14).aspx
and for specifics on UDFs, see:
http://www.cpearson.com/excel/WritingFunctionsInVBA.aspx
Macros must be enabled for this to work!
EDIT#1:
to remove case sensitivity, replace:
If InStr(1, v, s(j)) > 0 Or s(j) = "" Then Kount = Kount + 1
with:
If InStr(1, LCase(v), LCase(s(j))) > 0 Or s(j) = "" Then Kount = Kount + 1
Yes, it is possible for a single cell to return three matching words from a different cell. The answer in this example uses a formula to return 6 matches. VBA and special array functions are not used.
This is the formula:
=SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUMPRODUCT((IFERROR(SEARCH(FIRST,TARGET),0)>0)*100000+(IFERROR(SEARCH(SECOND,TARGET),0)>0)*20000+(IFERROR(SEARCH(THIRD,TARGET),0)>0)*3000+(IFERROR(SEARCH(FOURTH,TARGET),0)>0)*400+(IFERROR(SEARCH(FIFTH,TARGET),0)>0)*50+(IFERROR(SEARCH(SIXTH,TARGET),0)>0)*6),"1",FIRST&DELIM),"2",SECOND&DELIM),"3",THIRD&DELIM),"4",FOURTH&DELIM),"5",FIFTH&DELIM),"6",SIXTH&DELIM),"0","")
Did you notice the named ranges? FIRST, SECOND, THIRD, etc are individual cells and each one holds a word. We are trying to find those words inside TARGET. If we find the words, then we will write them in this cell holding this formula and each word will be separated by DELIM
The ranges are optional. In the picture below, you'll see cell A2 contains the word "named". This is the first of the six words we're tying to find and it can be expressed as FIRST = "A2" = "named" Inside the formula you'll see that FIRST appears twice. You could replace it with "named" and the cell A2 would become blank but the functionality of the formula would not change.
Even TARGET is optional. It could be written as E1 or typed out word for word.
I don't know why anyone would do that... but it is possible.
DELIM is at cell B2, it is a double space
Now to explain how it works
SEARCH(search for what?, search where?) This is responsible for determining if a match exists or not. If you understand what the named ranges then you've already figured out the syntax is. The location of first letter that of match in TARGET is returned. In this formula it is always 1 If it is not found then the number is 0
IFERROR(value,value) tries to perform the operation. If successful then the result is displayed. If there's an error the second result is displayed. Every IFERROR in this formula is practically the same: IFERROR(SEARCH(FIRST,TARGET),0) It searches inside TARGET trying to find the FIRST word. Result if found is 1 and if not found is 0
It gets a little more complicated from here so lets recap. We're calling SEARCH 6 times. Once for each word we want to find and we're always looking in TARGET. Result will be a 1 if match is found or a 0 if not. Ironically, us humans can put it together and see the match but the formula can't determine which words have been matched without more information
SUMPRODUCT takes the sum (addition) of the product (multiplication) of two or more arrays.
multiply two arrays to get the product
a, b, c * e, f, g = ae, bf, cg
takethe sum of the product to get the SUMPRODUCT
ae + bf + cg`
This is easiest when thought of a price and quantity. If one array is the price of a group of items and the other is quantity of the same group of items, then multiplying the two arrays will create a new array where each element is the cost to buy all items of that type in the group the total, and adding all those numbers give you the total cost you'd pay for all of the items
Here we multiply two arrays:
Qty Price
12.0 0.3
70.0 0.1
20.0 0.4
Multiply them to get the product:
Qty Price Total
12.0 0.3 3.8
70.0 0.1 7.0
20.0 0.4 8.0
Take the sum of the product:
Qty Price Total
12.0 0.3 3.8
70.0 0.1 7.0
20.0 0.4 8.0
18.8 SUMPRODUCT
Lets look at part of the formula:
SUMPRODUCT((IFERROR(SEARCH(FIRST,TARGET),0)>0)*100000+IFERROR(SEARCH(SECOND,TARGET),0)>0)*20000+...
It is easy to see this segment is looking for two words. We know SUMPRODUCT want's to multiply and add arrays. If you're thinking (IFERROR(SEARCH(FIRST,TARGET),0)>0) is an array, you'd be right! It's not an array in the technical sense of the word, but it does evaluate into a single value which can thought of as a 1x1 array, or, a cell. The sharp eyed and quick witted, may have noticed there is something on this array we have mentioned. It's the inequality at the end! Many of you know that you can take numerical values and turn them into boolean by testing them with an inequality. So lets evaluate.... SEARCH for FIRST inside TARGET = 1 because FIRST = "named" which is inside TARGET waaaaaaay in the back and because it wasn't an error we get to keep the 1. Next we do the inequality 1 > 0 = TRUE One is greater than zero and evaluates to TRUE
This is what we have right now
SUMPRODUCT((TRUE*100000+IFERROR(SEARCH(SECOND,TARGET),0)>0)*20000+....
Can you identify the arrays now? We know TRUE is an array, a 1x1. You know the IFERROR bit all the way to the inequality is also an array. Lets evaluate that IFERROR .... Mathematically we should still be working from left to right but trust me, we're ok if we let it slide this once.
IFERROR(SEARCH(SECOND,TARGET),0)>0) SECOND = "array" = 1 = TRUE
Did you follow my short hand? It's ok if you didn't just back up and practice on FIRST until you understand.
Plugging in the value gives us something like this
SUMPRODUCT(TRUE*100000+TRUE*20000+...
SUMPRODUCT is the SUM(addition) of the PRODUCT(multiplication)
So we're adding the stuff we multiply
SUMPRODUCT = (TRUE * 100000) + (TRUE * 20000)
Remember how easy it was to go from 1 > 0 to TRUE. We're "getting all up into that boolean TRUE that" equals 1 Here's a fun fact, -1 is also equal to TRUE. If you've ever seen a formula with a double negative in it like this STUFF(--(MORESTUFF( that's just some Excel wizard person making sure they get a +1 instead of a -1 ... ok, so lets get back on track and evalute
SUMPRODCUT = 1 * 100000 + 1 * 20000+....
SUMPRODCUT = 100000 + 20000+....
SUMPRODCUT = 120000+.....
I know you've been asking about those numbers. One hundred thousand? What's one hundred thousand for? I've been purposefully ignoring until it became convenient to talk about it. And now it's convenient. Go look at the whoooole formula and you'll find a pattern. Those numbers are in a decreasing sequence. Anyone who's ever done bitwise logic can see where this is going.I'm running short on time so I'll cut to the chase. Assume a hypothetical situation where every word was matched. You'd end up with
SUMPRODUCT = 100000 + 20000 + 3000 + 400 + 50 + 6
SUMPRODCUT = 123456
123456 are you pulling my leg? No, I am not. We're almost done so if you're still with me then you're gonna drive it home.
We have a large group of SUBSTITUTE teachers at the front of the line and we gotta get rid of them.
We also have this to contend with :"1",FIRST&DELIM),"2",SECOND&DELIM),"3",THIRD&DELIM),"4",FOURTH&DELIM),"5",FIFTH&DELIM),"6",SIXTH&DELIM),"0","")
Thankfully for us, they are part of the same problem. We worked our way from the middle out.
SUBSTITUTE(SUBSTITUTE(text, old text, new text)
SUBSTITUTE(SUBSTITUTE("123456","1", FIRST & DELIM),"2",SECOND & DELIM)....
Remember at top DELIM was pointing to a cell holding a double space. Each DELIM can be replaced with " " or any other delimiter you want.
SUBSTITUTE(SUBSTITUTE("123456","1", "named" & " "),"2",SECOND & DELIM)...
SUBSTITUTE("named 23456","2",SECOND & DELIM)...
SUBSTITUTE("named 23456","2","array"& " ")...
("named array 3456")... and so on.
Any questions?
Ok, class is dismissed!

How do I sum data based on a PART of the headers name?

Say I have columns
/670 - White | /650 - black | /680 - Red | /800 - Whitest
These have data in their rows. Basically, I want to SUM their values together if their headers contain my desired string.
For modularity's sake, I wanted to merely specify to sum /670, /650, and /680 without having to mention the rest of the header text.
So, something like =SUMIF(a1:c1; "/NUM & /NUM & /NUM"; a2:c2)
That doesn't work, and honestly I don't know what i should be looking for.
Additional stuff:
I'm trying to think of the answer myself, is it possible to mention the header text as condition for ifs? Like: if A2="/650 - Black" then proceed to sum the next header. Is this possible?
Possibility it would not involve VBA, a draggable formula would be preferable!
At this point, I may as well request a version which handles the complete header name rather than just a part of it as I believe it to be difficult for formula code alone.
Thanks for having a look!
Let me know if I need to elaborate.
EDIT: In regards to data samples, any positive number will do actually, damn shame stack overflow doesn't support table markdown. Anyway, for example then..:
+-------------+-------------+-------------+-------------+-------------+
| A | B | C | D | E |
+---+-------------+-------------+-------------+-------------+-------------+
| 1 |/650 - Black |/670 - White |/800 - White |/680 - Red |/650 - Black |
+---+-------------+-------------+-------------+-------------+-------------+
| 2 | 250 | 400 | 100 | 300 | 125 |
+---+-------------+-------------+-------------+-------------+-------------+
I should have clarified:
The number range for these headers would go from /100 - /9999 and no more than that.
EDIT:
Progress so far:
https://docs.google.com/spreadsheets/d/1GiJKFcPWzG5bDsNt93eG7WS_M5uuVk9cvkt2VGSbpxY/edit?usp=sharing
Formula:
=SUMPRODUCT((A2:D2*
(MID($A$1:$D$1,2,4)=IF(LEN($H$1)=4,$H$1&"",$H$1&" ")))+(A2:D2*
(MID($A$1:$D$1,2,4)=IF(LEN($I$1)=4,$I$1&"",$I$1&" ")))+(A2:D2*
(MID($A$1:$D$1,2,4)=IF(LEN($J$1)=4,$J$1&"",$J$1&" "))))
Apparently, each MID function is returning false with each F9 calculation.
EDIT EDIT:
Okay! I found my issue, it's the /being read when you ALSO mentioned that it wasn't required. Man, I should stop skimming!
Final Edit:
=SUMPRODUCT((RETURNSUM*
(MID(HEADER,2,4)=IF(LEN(Match5)=4,Match5&"",Match5&" ")))+(RETURNSUM*
(MID(HEADER,2,4)=IF(LEN(Match6)=4,Match6&"",Match6&" ")))+(RETURNSUM*
(MID(HEADER,2,4)=IF(LEN(Match7)=4,Match7&"",Match7&" ")))
The idea is that Header and RETURNSUM will become match criteria like the matches written above, that way it would be easier to punch new criterion into the search table. As of the moment, it doesn't support multiple rows/dragging.
I have knocked up a couple of formulas that will achieve what you are looking for. For ease I have made the search input require the number only as pressing / does not automatically type into the formula bar. I apologise for the length of the answer, I got a little carried away with the explanation.
I have set this up for 3 criteria located in J1, K1 and L1.
Here is the output I achieved:
Formula 1 - SUMPRODUCT():
=SUMPRODUCT((A4:G4*(MID($A$1:$G$1,2,4)=IF(LEN($J$1)=4,$J$1&"",$J$1&" ")))+(A4:G4*(MID($A$1:$G$1,2,4)=IF(LEN($K$1)=4,$K$1&"",$K$1&" ")))+(A4:G4*(MID($A$1:$G$1,2,4)=IF(LEN($L$1)=4,$L$1&"",$L$1&" "))))
Sumproduct(array1,[array2]) behaves as an array formula without needed to be entered as one. Array formulas break down ranges and calculate them cell by cell (in this example we are using single rows so the formula will assess columns seperately).
(A4:G4*(MID($A$1:$G$1,2,4)=IF(LEN($J$1)=4,$J$1&"",$J$1&" ")))
Essentially I have broken the Sumproduct() formula into 3 identical parts - 1 for each search condition. (A4:G4*: Now, as the formula behaves like an array, we will multiply each individual cell by either 1 or 0 and add the results together.
1 is produced when the next part of the formula is true and 0 for when it is false (default numeric values for TRUE/FALSE).
(MID($A$1:$G$1,2,4)=IF(LEN($J$1)=4,$J$1&"",$J$1&" "))
MID(text,start_num,num_chars) is being used here to assess the 4 digits after the "/" and see whether they match with the number in the 3 cells that we are searching from (in this case the first one: J1). Again, as SUMPRODUCT() works very much like an array formula, each cell in the range will be assessed individually.
I have then used the IF(logical_test,[value_if_true],[value_if_false]) to check the length of the number that I am searching. As we are searching for a 4 digit text string, if the number is 4 digits then add nothing ("") to force it to a text string and if it is not (as it will have to be 3 digits) add 1 space to the end (" ") again forcing it to become a text string.
The formula will then perform the calculation like so:
The MID() formula produces the array: {"650 ","670 ","800 ","680 ","977 ","9999","143 "}. This combined with the first search produces {TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE} which when multiplied by A4:G4
(remember 0 for false and 1 for true) produces this array: {250,0,0,0,0,0,0} essentially pulling the desired result ready to be summed together.
Formula 2: =SUM(IF(Array)): [This formula does not work for 3 digit numbers as they will exist within the 4 digit numbers! I have included it for educational purposes only]
=SUM(IF(ISNUMBER(SEARCH($J$1,$A$1:$G$1)),A8:G8),IF(ISNUMBER(SEARCH($K$1,$A$1:$G$1)),A8:G8),IF(ISNUMBER(SEARCH($L$1,$A$1:$G$1)),A8:G8))
The formula will need to be entered as an array (once copy and pasted while still in the formula bar hit CTRL+SHIFT+ENTER)
This formula works in a similar way, SUM() will add together the array values produced where IF(ISNUMBER(SEARCH() columns match the result column.
SEARCH() will return a number when it finds the exact characters in a cell which represents it's position in number of characters. By using ISNUMBER() I am avoiding having to do the whole MID() and IF(LEN()=4,""," ") I used in the previous formula as TRUE/FALSE will be produced when a match is found regardless of it's position or cell formatting.
As previously mentioned, this poses a problem as 999 can be found within 9999 etc.
The resulting array for the first part is: {250,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE} (if you would like to see the array you can highlight that part of the formula and calculate with F9 but be sure to highlight the exact brackets for that part of the formula).
I hope I have explained this well, feel free to ask any questions about stuff that you don't understand. It is good to see people keen to learn and not just fishing for a fast answer. I would be more than happy to help and explain in more depth.
I start this solution with the names in an array, you can read the header names into an array with not too much difficulty.
Sub test()
Dim myArray(1 To 4) As String
myArray(1) = "/670 - White"
myArray(2) = "/650 - black"
myArray(3) = "/680 - Red"
myArray(4) = "/800 - Whitest"
For Each ArrayValue In myArray
'Find position of last character
endposition = InStr(1, ArrayValue, " - ", vbTextCompare)
'Grab the number section from the string, based on starting and ending positions
stringvalue = Mid(ArrayValue, 2, endposition - 2)
'Convert to number
NumberValue = CLng(stringvalue)
'Add to total
Total = Total + NumberValue
Next ArrayValue
'Print total
Debug.Print Total
End Sub
This will print the answer to the debug window.

Prevent Partial Duplicates in Excel

I have a worksheet with products where the people in my office can add new positions. The problem we're running into is that the products have specifications but not everybody puts them in (or inputs them wrong).
Example:
"cool product 14C"
Is there a way to convert Data Valuation option so that it warns me now in case I put "very cool product 14B" or anything that contains an already existing string of characters (say, longer than 4), like "cool produKt 14C" but also "good product 15" and so on?
I know that I can prevent 100% matches using COUNTIF and spot words that start/end in the same way using LEFT/RIGHT but I need to spot partial matches within the entries as well.
Thanks a lot!
If you want to cover typo's, word wraps, figure permutations etc. maybe a SOUNDEX algorithm would suit to your problem. Here's an implementation for Excel ...
So if you insert this as a user defined function, and create a column =SOUNDEX(A1) for each product row, upon entry of a new product name you can filter for all product rows with same SOUNDEX value. You can further automate this by letting user enter the new name into a dialog form first, do the validation, present them a Combo Box dropdown with possible duplicates, etc. etc. etc.
edit:
small function to find parts of strings terminated by blanks in a range (in answer to your comment)
Function FindSplit(Arg As Range, LookRange As Range) As String
Dim LookFor() As String, LookCell As Range
Dim Idx As Long
LookFor = Split(Arg)
FindSplit = ""
For Idx = 0 To UBound(LookFor)
For Each LookCell In LookRange.Cells
If InStr(1, LookCell, LookFor(Idx)) <> 0 Then
If FindSplit <> "" Then FindSplit = FindSplit & ", "
FindSplit = FindSplit & LookFor(Idx) & ":" & LookCell.Row
End If
Next LookCell
Next Idx
If FindSplit = "" Then FindSplit = "Cool entry!"
End Function
This is a bit crude ... but what it does is the following
split a single cell argument in pieces and put it into an array --> split()
process each piece --> For Idx = ...
search another range for strings that contain the piece --> For Each ...
add piece and row number of cell where it was found into a result string
You can enter/copy this as a formula next to each cell input and know immediately if you've done a cool input or not.
Value of cell D8 is [asd:3, wer:4]
Note the use of absolute addressing in the start of lookup range; this way you can copy the formula well down.
edit 17-Mar-2015
further to comment Joanna 17-Mar-2015, if the search argument is part of the range you're scanning, e.g. =FINDSPLIT(C5; C1:C12) you want to make sure that the If Instr(...) doesn't hit if LookCell and LookFor(Idx) are really the same cell as this would create a false positive. So you would rewrite the statement to
...
...
If InStr(1, LookCell, LookFor(Idx)) <> 0 And _
Not (LookCell.Row = Arg.Row And LookCell.Column = Arg.Column) _
Then
hint
Do not use a complete column (e.g. $C:$C) as the second argument as the function tends to become very slow without further precautions

Excel Substrings

I have two unordered sets of data here:
blah blah:2020:50::7.1:45
movie blah:blahbah, The:1914:54:
I want to extract all the data to the left of the year (aka, 1915 and 1914).
What excel formula would I use for this?
I tried this formula
=IF(ISNUMBER(SEARCH(":",A1)),MID(A1,SEARCH(":",A1),300),A1)
these were the results below:
: blahblah, The:1914:54::7
:1915:50::7.1:45:
This is because there is a colon in the movie title.
The results I need consistently are:
:1914:54::7.9:17::
:1915:50::7.1:45::
Can someone help with this?
You can use Regular Expressions, make sure you include a reference for it in your VBA editor. The following UDF will do the job.
Function ExtractNumber(cell As Range) As String
ExtractNumber = ""
Dim rex As New RegExp
rex.Pattern = "(:\d{4}:\d{2}::\d\.\d:\d{2}::\d:\d:\d:\d:\d:\d:\d)"
rex.Global = True
Dim mtch As Object, sbmtch As Object
For Each mtch In rex.Execute(cell.Value)
ExtractNumber = ExtractNumber & mtch.SubMatches(0)
Next mtch
End Function
Without VBA:
In reality you don't want to find the : You want to find either :1 or :2 since the year will either start with 1 or 2This formula should do it:
=MID(A1,MIN(IFERROR(FIND(":1",A1,1),9999),IFERROR(FIND(":2",A1),9999)),9999)
Look for a four digit string, in a certain range, bounded by colons.
For example:
=MID(A1,MIN(FIND(":" &ROW(INDIRECT("1900:2100"))&":",A1 &":" &ROW(INDIRECT("1900:2100"))&":")),99)
entered as an array formula by holding down ctrl-shift while hitting Enter would ensure years in the range 1900 to 2100. Change those values as appropriate for your data. The 99 at the end represents the longest possible string. Again, that can be increased as required.
You can use the same approach to return just the left hand part, up to the colon preceding the year:
=LEFT(A1,-1+MIN(FIND(":" &ROW(INDIRECT("1900:2100"))&":",A1 &":" &ROW(INDIRECT("1900:2100"))&":")))
Here is a screen shot, showing the original data in B1:B2, with the results of the first part in B4:B5, and the formula for B4 showing in the formula bar.
The results for the 2nd part are in B7:B9

How to count up elements in excel

So I have a column called chemical formula for like 40,000 entries, and what I want to be able to do is count up how many elements are contained in the chemical formula. So for example:-
EXACT_MASS FORMULA
626.491026 C40H66O5
275.173274 C13H25NO5
For this, I need some kind of formula that will return with the result of
C H O
40 66 5
13 25 5
all as separate columns for the different elements and in rows for the different entries. Is there a formula that can do this?
You could make your own formula.
Open the VBA editor with ALT and F11 and insert a new module.
Add a reference to Microsoft VBScript Regular Expressions 5.5 by clicking Tools, then references.
Now add the following code:
Public Function FormulaSplit(theFormula As String, theLetter As String) As String
Dim RE As Object
Set RE = CreateObject("VBScript.RegExp")
With RE
.Global = True
.MultiLine = False
.IgnoreCase = False
.Pattern = "[A-Z]{1}[a-z]?"
End With
Dim Matches As Object
Set Matches = RE.Execute(theFormula)
Dim TheCollection As Collection
Set TheCollection = New Collection
Dim i As Integer
Dim Match As Object
For i = (Matches.Count - 1) To 0 Step -1
Set Match = Matches.Item(i)
TheCollection.Add Mid(theFormula, Match.FirstIndex + (Len(Match.Value) + 1)), UCase(Trim(Match.Value))
theFormula = Left(theFormula, Match.FirstIndex)
Next
FormulaSplit = "Not found"
On Error Resume Next
FormulaSplit = TheCollection.Item(UCase(Trim(theLetter)))
On Error GoTo 0
If FormulaSplit = "" Then
FormulaSplit = "1"
End If
Set RE = Nothing
Set Matches = Nothing
Set Match = Nothing
Set TheCollection = Nothing
End Function
Usage:
FormulaSplit("C40H66O5", "H") would return 66.
FormulaSplit("C40H66O5", "O") would return 5.
FormulaSplit("C40H66O5", "blah") would return "Not found".
You can use this formula directly in your workbook.
I've had a stab at doing this in a formula nad come up with the following:
=IFERROR((MID($C18,FIND(D17,$C18)+1,2))*1,IFERROR((MID($C18,FIND(D17,$C18)+1,1))*1,IFERROR(IF(FIND(D17,$C18)>0,1),0)))
It's not very neat and would have to be expanded further if any of your elements are going to appear more than 99 times - I also used a random placement on my worksheet so the titles H,C and O are in row 17. I would personally go with Jamie's answer but just wanted to try this to see if I could do it in a formula possible and figured it was worth sharing just as another perspective.
Even though this has an excellent (and accepted) VBA solution, I couldn't resist the challenge to do this without using VBA.
I posted a solution earlier, which wouldn't work in all cases. This new code should always work:
=MAX(
IFERROR(IF(FIND(C$1&ROW($1:$99),$B2),ROW($1:$99),0),0),
IFERROR(IF(FIND(C$1&CHAR(ROW($65:$90)),$B2&"Z"),1,0),0)
)
Enter as an array formula: Ctrl + Shift + Enter
Output:
The formula outputs 0 when not found, and I simply used conditional formatting to turn zeroes gray.
How it works
This part of the formula looks for the element, followed by a number between 1 and 99. If found, the number of atoms is returned. Otherwise, 0 is returned. The results are stored in an array:
IFERROR(IF(FIND(C$1&ROW($1:$99),$B2),ROW($1:$99),0),0)
In the case of C13H25NO5, a search for "C" returns this array:
{1,0,0,0,0,0,0,0,0,0,0,0,13,0,0,0,...,0}
1 is the first array element, because C1 is a match. 13 is the thirteenth array element, and that's what we're interested in.
The next part of the formula looks for the element, followed by an uppercase letter, which indicates a new element. (The letters A through Z are characters 65 through 90.) If found, the number 1 is returned. Otherwise, 0 is returned. The results are stored in an array:
IFERROR(IF(FIND(C$1&CHAR(ROW($65:$90)),$B2&"Z"),1,0),0)
"Z" is appended to the chemical formula, so that a match will be found when its last element has no number. (For example, "H2O".) There is no element "Z" in the Periodic Table, so this won't cause a problem.
In the case of C13H25NO5, a search for "N" returns this array:
{0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0}
1 is the 15th element in the array. That's because it found the letters "NO", and O is the 15th letter of the alphabet.
Taking the maximum value from each array gives us the number of atoms as desired.

Resources