Calculate alphanumeric string to an integer in Excel - string

I have an issue that I've not been able to figure out even with many of the ideas presented in other posts. My data comes in Excel and here are examples of each manner that any given cell might have the data:
4days 4hrs 41mins 29seconds
23hrs 43mins 4seconds
2hrs 2mins
52mins 16seconds
The end result would be to calculate the total minutes while allowing seconds to be ignored, so that the previous values would end up as follows:
6041
52
1423
122
Would anyone have an idea how to go about that?
Thanks for the assistance!

Bit tedious (and assumes units are always plural - also produces results in different order to example) but, with formulae only, if your data is in column A, in B1 and copied down:
="="&SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A1,"days","*1440+"),"hrs","*60+"),"mins","*1+"),"seconds","*0")," ","")&0
then Copy B and Paste Special values into C and apply Text to Columns to C with Tab as the delimiter.

This array formula** should also work:
=SUM(IFERROR(0+MID(REPT(" ",31)&SUBSTITUTE(A1&"dayhrminsecond"," ",REPT(" ",31)),FIND({"day","hr","min","second"},REPT(" ",31)&SUBSTITUTE(A1&"dayhrminsecond"," ",REPT(" ",31)))-31,31),0)*{1440,60,1,0})
Regards
**Array formulas are not entered in the same way as 'standard' formulas. Instead of pressing just ENTER, you first hold down CTRL and SHIFT, and only then press ENTER. If you've done it correctly, you'll notice Excel puts curly brackets {} around the formula (though do not attempt to manually insert these yourself).

The easiest option is probably VBA with a regular expression. You can then easily find each of the fields, and do the maths.
If you want to stick to "pure" Excel, then it seems to only option is to use SEARCH or FIND to find the position of each of the "days", "hrs", "mins" in the text (you may have to check if they're always plural). Then use MID with the position found above to extract the different components. See http://office.microsoft.com/en-gb/excel-help/split-text-among-columns-by-using-functions-HA010102341.aspx for similar examples.
But there's quite a bit of work to handle the cases where some components are missing, so either you'll use quite a few cells, so you'll get a very complex formula...

Here is a User Defined Function, written in VBA, which takes your string as the argument and returns the number of minutes. Only the first characters of the time interval names are checked (e.g. d, h, m) as this seems to provide sufficient discrimination.
To enter this User Defined Function (UDF), opens the Visual Basic Editor.
Ensure your project is highlighted in the Project Explorer window.
Then, from the top menu, select Insert/Module and
paste the code below into the window that opens.
To use this User Defined Function (UDF), enter a formula like
=SumMinutes(A1)
in some cell.
Option Explicit
Function SumMinutes(S As String) As Long
Dim RE As Object, MC As Object
Dim lMins As Long
Dim I As Long
Set RE = CreateObject("vbscript.regexp")
With RE
.Pattern = "(\d+)(?=\s*d)|(\d+)(?=\s*h)|(\d+)(?=\s*m)"
.Global = True
.ignorecase = True
If .test(S) = True Then
Set MC = .Execute(S)
For I = 0 To MC.Count - 1
With MC(I)
lMins = lMins + _
.submatches(0) * 1440 + _
.submatches(1) * 60 + _
.submatches(2)
End With
Next I
End If
End With
SumMinutes = lMins
End Function

Related

Formula to search 3 words "text" in any sequence in different cell

Please take a look at the attached image. I have a long list of items and I've created a common keywords to search in that list. I'm using this formula:
=INDEX(A:A,MATCH((("*"&B2&"*")&("*"&C2&"*")&("*"&D2&"*")&("*"&E2&"*")&("*"&F2&"*")),A:A,0))
The problem that the search is going through the same sequence that I entered.
It gives error if the sequence of the words in the cell is different than the sequence in my formula which make sense.
Is there a way I can search for 3 or more words that are existing in any cell in any sequence?
I am open to using VBA if necessary.
My search results:
Here is the user defined function:
Public Function indexMX(rng As Range, pat1 As Range, pat2 As Range, pat3 As Range, pat4 As Range, pat5 As Range) As Variant
Dim r As Range, rngx As Range, s(1 To 5) As String, Kount As Long, j As Long
s(1) = pat1.Value
s(2) = pat2.Value
s(3) = pat3.Value
s(4) = pat4.Value
s(5) = pat5.Value
Set rngx = Intersect(rng, rng.Parent.UsedRange)
For Each r In rngx
v = r.Value
Kount = 0
For j = 1 To 5
If InStr(1, v, s(j)) > 0 Or s(j) = "" Then Kount = Kount + 1
Next j
If Kount = 5 Then
indexMX = v
Exit Function
End If
Next r
indexMX = "no luck"
End Function
Here is an example of its usage:
As you see, we give the UDF() the address of the column and the addresses of the five keywords and the UDF() finds the first item containing all five words.
If a keyword is blank, it is not used. (so if you want to search for only two keywords, leave the other three blank). If no matches are found the phrase no luck is returned.
User Defined Functions (UDFs) are very easy to install and use:
ALT-F11 brings up the VBE window
ALT-I
ALT-M opens a fresh module
paste the stuff in and close the VBE window
If you save the workbook, the UDF will be saved with it.
If you are using a version of Excel later then 2003, you must save
the file as .xlsm rather than .xlsx
To remove the UDF:
bring up the VBE window as above
clear the code out
close the VBE window
To use the UDF from Excel:
=myfunction(A1)
To learn more about macros in general, see:
http://www.mvps.org/dmcritchie/excel/getstarted.htm
and
http://msdn.microsoft.com/en-us/library/ee814735(v=office.14).aspx
and for specifics on UDFs, see:
http://www.cpearson.com/excel/WritingFunctionsInVBA.aspx
Macros must be enabled for this to work!
EDIT#1:
to remove case sensitivity, replace:
If InStr(1, v, s(j)) > 0 Or s(j) = "" Then Kount = Kount + 1
with:
If InStr(1, LCase(v), LCase(s(j))) > 0 Or s(j) = "" Then Kount = Kount + 1
Yes, it is possible for a single cell to return three matching words from a different cell. The answer in this example uses a formula to return 6 matches. VBA and special array functions are not used.
This is the formula:
=SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUMPRODUCT((IFERROR(SEARCH(FIRST,TARGET),0)>0)*100000+(IFERROR(SEARCH(SECOND,TARGET),0)>0)*20000+(IFERROR(SEARCH(THIRD,TARGET),0)>0)*3000+(IFERROR(SEARCH(FOURTH,TARGET),0)>0)*400+(IFERROR(SEARCH(FIFTH,TARGET),0)>0)*50+(IFERROR(SEARCH(SIXTH,TARGET),0)>0)*6),"1",FIRST&DELIM),"2",SECOND&DELIM),"3",THIRD&DELIM),"4",FOURTH&DELIM),"5",FIFTH&DELIM),"6",SIXTH&DELIM),"0","")
Did you notice the named ranges? FIRST, SECOND, THIRD, etc are individual cells and each one holds a word. We are trying to find those words inside TARGET. If we find the words, then we will write them in this cell holding this formula and each word will be separated by DELIM
The ranges are optional. In the picture below, you'll see cell A2 contains the word "named". This is the first of the six words we're tying to find and it can be expressed as FIRST = "A2" = "named" Inside the formula you'll see that FIRST appears twice. You could replace it with "named" and the cell A2 would become blank but the functionality of the formula would not change.
Even TARGET is optional. It could be written as E1 or typed out word for word.
I don't know why anyone would do that... but it is possible.
DELIM is at cell B2, it is a double space
Now to explain how it works
SEARCH(search for what?, search where?) This is responsible for determining if a match exists or not. If you understand what the named ranges then you've already figured out the syntax is. The location of first letter that of match in TARGET is returned. In this formula it is always 1 If it is not found then the number is 0
IFERROR(value,value) tries to perform the operation. If successful then the result is displayed. If there's an error the second result is displayed. Every IFERROR in this formula is practically the same: IFERROR(SEARCH(FIRST,TARGET),0) It searches inside TARGET trying to find the FIRST word. Result if found is 1 and if not found is 0
It gets a little more complicated from here so lets recap. We're calling SEARCH 6 times. Once for each word we want to find and we're always looking in TARGET. Result will be a 1 if match is found or a 0 if not. Ironically, us humans can put it together and see the match but the formula can't determine which words have been matched without more information
SUMPRODUCT takes the sum (addition) of the product (multiplication) of two or more arrays.
multiply two arrays to get the product
a, b, c * e, f, g = ae, bf, cg
takethe sum of the product to get the SUMPRODUCT
ae + bf + cg`
This is easiest when thought of a price and quantity. If one array is the price of a group of items and the other is quantity of the same group of items, then multiplying the two arrays will create a new array where each element is the cost to buy all items of that type in the group the total, and adding all those numbers give you the total cost you'd pay for all of the items
Here we multiply two arrays:
Qty Price
12.0 0.3
70.0 0.1
20.0 0.4
Multiply them to get the product:
Qty Price Total
12.0 0.3 3.8
70.0 0.1 7.0
20.0 0.4 8.0
Take the sum of the product:
Qty Price Total
12.0 0.3 3.8
70.0 0.1 7.0
20.0 0.4 8.0
18.8 SUMPRODUCT
Lets look at part of the formula:
SUMPRODUCT((IFERROR(SEARCH(FIRST,TARGET),0)>0)*100000+IFERROR(SEARCH(SECOND,TARGET),0)>0)*20000+...
It is easy to see this segment is looking for two words. We know SUMPRODUCT want's to multiply and add arrays. If you're thinking (IFERROR(SEARCH(FIRST,TARGET),0)>0) is an array, you'd be right! It's not an array in the technical sense of the word, but it does evaluate into a single value which can thought of as a 1x1 array, or, a cell. The sharp eyed and quick witted, may have noticed there is something on this array we have mentioned. It's the inequality at the end! Many of you know that you can take numerical values and turn them into boolean by testing them with an inequality. So lets evaluate.... SEARCH for FIRST inside TARGET = 1 because FIRST = "named" which is inside TARGET waaaaaaay in the back and because it wasn't an error we get to keep the 1. Next we do the inequality 1 > 0 = TRUE One is greater than zero and evaluates to TRUE
This is what we have right now
SUMPRODUCT((TRUE*100000+IFERROR(SEARCH(SECOND,TARGET),0)>0)*20000+....
Can you identify the arrays now? We know TRUE is an array, a 1x1. You know the IFERROR bit all the way to the inequality is also an array. Lets evaluate that IFERROR .... Mathematically we should still be working from left to right but trust me, we're ok if we let it slide this once.
IFERROR(SEARCH(SECOND,TARGET),0)>0) SECOND = "array" = 1 = TRUE
Did you follow my short hand? It's ok if you didn't just back up and practice on FIRST until you understand.
Plugging in the value gives us something like this
SUMPRODUCT(TRUE*100000+TRUE*20000+...
SUMPRODUCT is the SUM(addition) of the PRODUCT(multiplication)
So we're adding the stuff we multiply
SUMPRODUCT = (TRUE * 100000) + (TRUE * 20000)
Remember how easy it was to go from 1 > 0 to TRUE. We're "getting all up into that boolean TRUE that" equals 1 Here's a fun fact, -1 is also equal to TRUE. If you've ever seen a formula with a double negative in it like this STUFF(--(MORESTUFF( that's just some Excel wizard person making sure they get a +1 instead of a -1 ... ok, so lets get back on track and evalute
SUMPRODCUT = 1 * 100000 + 1 * 20000+....
SUMPRODCUT = 100000 + 20000+....
SUMPRODCUT = 120000+.....
I know you've been asking about those numbers. One hundred thousand? What's one hundred thousand for? I've been purposefully ignoring until it became convenient to talk about it. And now it's convenient. Go look at the whoooole formula and you'll find a pattern. Those numbers are in a decreasing sequence. Anyone who's ever done bitwise logic can see where this is going.I'm running short on time so I'll cut to the chase. Assume a hypothetical situation where every word was matched. You'd end up with
SUMPRODUCT = 100000 + 20000 + 3000 + 400 + 50 + 6
SUMPRODCUT = 123456
123456 are you pulling my leg? No, I am not. We're almost done so if you're still with me then you're gonna drive it home.
We have a large group of SUBSTITUTE teachers at the front of the line and we gotta get rid of them.
We also have this to contend with :"1",FIRST&DELIM),"2",SECOND&DELIM),"3",THIRD&DELIM),"4",FOURTH&DELIM),"5",FIFTH&DELIM),"6",SIXTH&DELIM),"0","")
Thankfully for us, they are part of the same problem. We worked our way from the middle out.
SUBSTITUTE(SUBSTITUTE(text, old text, new text)
SUBSTITUTE(SUBSTITUTE("123456","1", FIRST & DELIM),"2",SECOND & DELIM)....
Remember at top DELIM was pointing to a cell holding a double space. Each DELIM can be replaced with " " or any other delimiter you want.
SUBSTITUTE(SUBSTITUTE("123456","1", "named" & " "),"2",SECOND & DELIM)...
SUBSTITUTE("named 23456","2",SECOND & DELIM)...
SUBSTITUTE("named 23456","2","array"& " ")...
("named array 3456")... and so on.
Any questions?
Ok, class is dismissed!

Find entire number between # and space at variable places within a cell

How can I find an entire number between a "#" and a space when that combination could appear anywhere in a given cell?
Example cell contents:
"This is a #123 Test that I 45 like to run"
"This is a #45 Test that I 98 like to run"
I need to return "123" from the first one and "45" from the second one.
Using Mid(), I can return the "1", but the problem is the number between # and space can vary in length, but there will generally be a #, number or numbers, then a space.
As a secondary issue, there may be scenarios where there is no "#", but I need to find the first numeric value in the cell and return them (i.e. "1", "34", "648").
Any advice on either of these challenges is greatly appreciated.
This should work as well:
=MID(A11,(FIND("#",A11,1)+1),FIND(" ",A11,FIND("#",A11,1)+1)-FIND("#",A11,1))
works by looking for the hash and the following space... Not for the secondary question...
Since you've put the excel-vba tag on your question, here's a vba way of doing it using regular expressions that should satisfy both your primary and secondary issues:
Sub tmp()
Dim regEx As New RegExp
regEx.Pattern = "^.*?\#?(\d+)"
Dim i As Integer
For i = 1 To Range("A" & Rows.Count).End(xlUp).Row:
Set mat = regEx.Execute(Cells(i, 1).Value)
If mat.Count = 1 Then
Cells(i, 2).Value = mat(0).SubMatches(0)
End If
Next
End Sub
The regular expression uses a non-greedy character search (ie the "?" on the end of "'.*?" is what does that) to find the first pattern in the cell that matches either "#123" or just "123" where the "123" is any arbitrary sequence of digits.
This will return the first number in a string:
=--LEFT(MID(A1,AGGREGATE(15,6,FIND({1,2,3,4,5,6,7,8,9,0},A1),1),LEN(A1)),FIND(" ",MID(A1,AGGREGATE(15,6,FIND({1,2,3,4,5,6,7,8,9,0},A1),1),LEN(A1))))
AGGREGATE was introduced in 2010 Excel. If you do not ahve that then you will need to use this array formula:
=--LEFT(MID(A1,MIN(IFERROR(FIND({1,2,3,4,5,6,7,8,9,0},A1),1E+99)),LEN(A1)),FIND(" ",MID(A1,MIN(IFERROR(FIND({1,2,3,4,5,6,7,8,9,0},A1),1E+99)),LEN(A1))))
Being an array formula it needs to be confirmed with Ctrl-Shift-Enter instead of Enter when exiting edit mode. If done correctly then excel will put {} around the formula.

Add a space after colored text

I'm Using Microsoft Excel 2013.
I have a lot of data that I need to separate in Excel that is in a single cell. The "Text to Columns" feature works great except for one snag.
In a single cell, I have First Name, Last Name & Email address. The last name and email addresses do not have a space between them, but the color of the names are different than the email.
Example (all caps represent colored names RGB (1, 91, 167), lowercase is the email which is just standard black text):
JOHN DOEjohndoe#acmerockets.com
So I need to put a space after DOE so that it reads:
JOHN DOE johndoe#acmerockets.com
I have about 20k rows to go through so any tips would be appreciated. I just need to get a space or something in between that last name and email so I can use the "Text to Columns" feature and split those up.
Not a complete answer, but I would do it way:
Step 1 to get rid of the formatting:
Copy all text that you have to the notepad
Then copy-paste text from Notepad to excel as text
I think this should remove all the formatting issues
Step 2 is to use VBA to grab emails. I assume that you have all your emails as lowercase. Therefore something like this should do the trick (link link2):
([a-z0-9\-_+]*#([a-z0-9\-_+].)?[a-z0-9\-_+].[a-z0-9]{2,6})
Step 3 is to exclude emails that you extracted from Step2 from your main text. Something like this via simple Excel function:
=TRIM(SUBSTITUTE(FULLTEXT,EMAIL,""))
Since you removed all the formatting in Step1, you can apply it back when you done
You can knock this out pretty quickly taking advantage of a how Font returns the Color for a set of characters that do not have the same color: it returns Null! Knowing this, you can iterate through the characters 2 at a time and find the first spot where it throws Null. You now know that the color shift is there and can spit out the pieces using Mid.
Code makes use of this behavior and IsNull to iterate through a fixed Range. Define the Range however you want to get the cells. By default it spits them out in the neighboring two columns with Offset.
Sub FindChangeInColor()
Dim rng_cell As Range
Dim i As Integer
For Each rng_cell In Range("B2:B4")
For i = 1 To Len(rng_cell.Text) - 1
If IsNull(rng_cell.Characters(i, 2).Font.Color) Then
rng_cell.Offset(0, 1) = Mid(rng_cell, 1, i)
rng_cell.Offset(0, 2) = Mid(rng_cell, i + 1)
End If
Next
Next
End Sub
Picture of ranges and results
The nice thing about this approach is that the actual colors involved don't matter. You also don't have to manually search for a switch, although that would have been the next step.
Also your neighboring cells will be blank if no color change was found, so it's decently robust against bad inputs.
Edit adds ability to change original string if you want that instead:
Sub FindChangeInColorAndAddChar()
Dim rng_cell As Range
Dim i As Integer
For Each rng_cell In Range("B2:B4")
For i = 1 To Len(rng_cell.Text) - 1
If IsNull(rng_cell.Characters(i, 2).Font.Color) Then
rng_cell = Mid(rng_cell, 1, i) & "|" & Mid(rng_cell, i + 1)
End If
Next
Next
End Sub
Picture of results again use same input as above.

Excel Substrings

I have two unordered sets of data here:
blah blah:2020:50::7.1:45
movie blah:blahbah, The:1914:54:
I want to extract all the data to the left of the year (aka, 1915 and 1914).
What excel formula would I use for this?
I tried this formula
=IF(ISNUMBER(SEARCH(":",A1)),MID(A1,SEARCH(":",A1),300),A1)
these were the results below:
: blahblah, The:1914:54::7
:1915:50::7.1:45:
This is because there is a colon in the movie title.
The results I need consistently are:
:1914:54::7.9:17::
:1915:50::7.1:45::
Can someone help with this?
You can use Regular Expressions, make sure you include a reference for it in your VBA editor. The following UDF will do the job.
Function ExtractNumber(cell As Range) As String
ExtractNumber = ""
Dim rex As New RegExp
rex.Pattern = "(:\d{4}:\d{2}::\d\.\d:\d{2}::\d:\d:\d:\d:\d:\d:\d)"
rex.Global = True
Dim mtch As Object, sbmtch As Object
For Each mtch In rex.Execute(cell.Value)
ExtractNumber = ExtractNumber & mtch.SubMatches(0)
Next mtch
End Function
Without VBA:
In reality you don't want to find the : You want to find either :1 or :2 since the year will either start with 1 or 2This formula should do it:
=MID(A1,MIN(IFERROR(FIND(":1",A1,1),9999),IFERROR(FIND(":2",A1),9999)),9999)
Look for a four digit string, in a certain range, bounded by colons.
For example:
=MID(A1,MIN(FIND(":" &ROW(INDIRECT("1900:2100"))&":",A1 &":" &ROW(INDIRECT("1900:2100"))&":")),99)
entered as an array formula by holding down ctrl-shift while hitting Enter would ensure years in the range 1900 to 2100. Change those values as appropriate for your data. The 99 at the end represents the longest possible string. Again, that can be increased as required.
You can use the same approach to return just the left hand part, up to the colon preceding the year:
=LEFT(A1,-1+MIN(FIND(":" &ROW(INDIRECT("1900:2100"))&":",A1 &":" &ROW(INDIRECT("1900:2100"))&":")))
Here is a screen shot, showing the original data in B1:B2, with the results of the first part in B4:B5, and the formula for B4 showing in the formula bar.
The results for the 2nd part are in B7:B9

How do I convert the SERIESSUM function to VB6?

On the Daily Dose of Excel website, written by the late Frank Kabel, there are some formulae which can stand in for ATP functions. Not being an Excel guru, I'm struggling with converting one (so far!) to VB6. (Why I'm doing this I may relate once the NDA runs out.)
The problem I'm having is with the code standing in for SERIESSUM, namely,
=SUMPRODUCT(coefficients,x^(n+m*(ROW(INDIRECT("1:"&ROWS(coefficients)))-1)))
Now the SUMPRODUCT and ROWS functions I've been able to render fairly simply with
Public Function SUMPRODUCT(a1 As Variant, a2 As Variant) As Double
Dim dRes As Double
Dim dVal As Double
Dim i As Long
If LBound(a1) = LBound(a2) And UBound(a1) = UBound(a2) Then
For i = LBound(a1) To UBound(a1)
dVal = a1(i) * a2(i)
dRes = dRes + dVal
Next
End If
SUMPRODUCT = dRes
End Function
Public Function ROWS(a1 As Variant)
ROWS = UBound(a1) - LBound(a1) + 1
End Function
What I don't 'get' yet is
how x^(n+m*(ROW(INDIRECT("1:"&ROWS(coefficients)))-1)) evaluates to an array
and what that array might contain
Any Excel gurus out there?
ROW(INDIRECT("1:"&ROWS(coefficients)))-1
If coefficients has 5 rows, this will return the array {1,2,3,4,5}. The rest of the progression is
{1m, 2m, 3m, 4m, 5m)
{n+1m, n+2m, n+3m, n+4m, n+5m)
{x^n+1m, x^n+2m, x^n+3m, x^n+4m, x^n+5m)
That resulting array gets 'series summed' against coeffecients.
You can see the progression in Excel's formula bar by using Ctrl+= on highlighted parts of the formulas. There is a limit on how many characters you can display in the formula bar, so if coefficients has a lot of rows, you may get the error "formula too long"
In the formula bar, select ROW(INDIRECT("1:"&ROWS(coefficients)))-1 and press Ctrl+=. Then select another portion of your formula, making sure you match opening and closing parentheses, and hit Ctrl+=. You can iterate this until you have the whole formula calculated. When you're done, be sure to ESCAPE out of the cell so you don't lose your original formula.
See also Episode 474 here.

Resources