Excel Substrings - excel

I have two unordered sets of data here:
blah blah:2020:50::7.1:45
movie blah:blahbah, The:1914:54:
I want to extract all the data to the left of the year (aka, 1915 and 1914).
What Excel formula would I use for this?
I tried this formula:
=IF(ISNUMBER(SEARCH(":",A1)),MID(A1,SEARCH(":",A1),300),A1)
and these were the results:
: blahblah, The:1914:54::7
:1915:50::7.1:45:
This is because there is a colon in the movie title.
The results I need consistently are:
:1914:54::7.9:17::
:1915:50::7.1:45::
Can someone help with this?

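You can use regular expressions; make sure you add a reference to "Microsoft VBScript Regular Expressions 5.5" in your VBA editor (Tools > References). The following UDF will do the job.
Function ExtractNumber(cell As Range) As String
    ' Requires a reference to Microsoft VBScript Regular Expressions 5.5
    Dim rex As New RegExp
    Dim mtch As Object
    ExtractNumber = ""
    ' Colon, four-digit year, two-digit field, then the remaining colon-separated fields
    rex.Pattern = "(:\d{4}:\d{2}::\d\.\d:\d{2}::\d:\d:\d:\d:\d:\d:\d)"
    rex.Global = True
    ' Append the captured group of every match found in the cell
    For Each mtch In rex.Execute(cell.Value)
        ExtractNumber = ExtractNumber & mtch.SubMatches(0)
    Next mtch
End Function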

Without VBA:
In reality you don't want to find the ":"; you want to find either ":1" or ":2", since the year will start with either 1 or 2. This formula should do it:
=MID(A1,MIN(IFERROR(FIND(":1",A1,1),9999),IFERROR(FIND(":2",A1),9999)),9999)

Look for a four digit string, in a certain range, bounded by colons.
For example:
=MID(A1,MIN(FIND(":" &ROW(INDIRECT("1900:2100"))&":",A1 &":" &ROW(INDIRECT("1900:2100"))&":")),99)
Entered as an array formula (hold down Ctrl+Shift while hitting Enter), this matches only years in the range 1900 to 2100. Change those values as appropriate for your data. The 99 at the end represents the longest possible string; it can be increased as required.
You can use the same approach to return just the left hand part, up to the colon preceding the year:
=LEFT(A1,-1+MIN(FIND(":" &ROW(INDIRECT("1900:2100"))&":",A1 &":" &ROW(INDIRECT("1900:2100"))&":")))
Here is a screen shot, showing the original data in B1:B2, with the results of the first part in B4:B5, and the formula for B4 showing in the formula bar.
The results for the 2nd part are in B7:B9
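For anyone who wants to check the logic outside Excel, here is a rough Python sketch of the same idea: find the first four-digit field bounded by colons that falls in a year range, and split the string at the colon in front of it. The function name split_at_year and the default range are my own choices for illustration, not part of the formulas above.

```python
import re

def split_at_year(text, lo=1900, hi=2100):
    """Split text at the colon preceding the first ':YYYY:' field in [lo, hi]."""
    # The lookahead leaves the closing colon unconsumed, so it can also
    # serve as the opening colon of the next candidate field.
    for m in re.finditer(r":(\d{4})(?=:)", text):
        if lo <= int(m.group(1)) <= hi:
            return text[:m.start()], text[m.start():]
    # No year found: everything counts as the left-hand part.
    return text, ""

left, right = split_at_year("movie blah:blahbah, The:1914:54:")
print(left)   # the title part, colons in the title included
print(right)  # the part starting at the colon before the year
```

The colon inside the movie title is skipped because "blahbah, The" is not a four-digit number in range, which is the same reason the array formula is not fooled by it.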

Related

Summing the digits in Excel cells (long and short strings)

I'm working on research related to frequencies.
I want to sum all the digits in each cell and reduce them to a single digit.
Some cells have 2 digits, others have 13, like these:
24.0542653897891
25.4846064424057
27
28.6055035477009
I tried several formulas to do that. The best ones gave me a two-digit number, which I couldn't sum again to get a single digit,
like these formulas:
=SUMPRODUCT(MID(SUBSTITUTE(B5,".",""),ROW(INDIRECT("1:"&LEN(B5)-1)),1)+0)
=SUMPRODUCT(1*MID(C5,ROW(INDIRECT("1:"&LEN(C5))),1))
Any suggestions?
Thank you in advance.
EDIT
Based on your explanation in the comments, it seems that what you want is the digital root of all the digits (excluding the decimal point). In other words, repeatedly summing the digits until you get to a single digit.
That can be calculated by a simpler formula than adding up the digits.
=1+(SUBSTITUTE(B5,".","")-1)-(INT((SUBSTITUTE(B5,".","")-1)/9)*9)
For long numbers, we can split the number in half and process each half. eg:
=1+MOD(1+MOD(LEFT(SUBSTITUTE(B5,".",""),INT(LEN(SUBSTITUTE(B5,".",""))/2))-1,9)+1+MOD(RIGHT(SUBSTITUTE(B5,".",""),LEN(SUBSTITUTE(B5,".",""))-INT(LEN(SUBSTITUTE(B5,".",""))/2))-1,9)-1,9)
However, the numbers should be stored as TEXT. When numbers are stored as numbers, what we see may not necessarily be what is stored there, and what the formula (as well as the UDF) will process.
The long formula version will correct all the errors on your worksheet EXCEPT for B104. B104 appears to have the value 5226.9332653096000 but Excel is really storing the value 5226.9333265309688. Because of Excel's precision limitations, this will get processed as 5226.93332653097. Hence there will be a disagreement.
Another method that should work would be to round all of the results in your column B to 15 digits (eg: . Combining that with using the long formula version should result in agreement for all the values you show.
Explanation
If a number is divisible by 9, its digital root is 9; otherwise, the digital root is n MOD 9.
The general formula is: =1+MOD(n-1,9)
In your case, since the numbers are larger than the MOD function can handle, we need to remove the dot and also use the arithmetic equivalent of MOD, which is n-(INT(n/9)*9)
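To see that the shortcut really agrees with repeated digit summing, here is a small Python sketch that computes the digital root both ways on the sample values from the question. The function names are mine, and the values are passed as text so that no floating-point rounding gets involved.

```python
def digit_sum_root(value: str) -> int:
    """Repeatedly sum the digits (ignoring the decimal point) until one digit remains."""
    s = "".join(ch for ch in value if ch.isdigit())
    while len(s) > 1:
        s = str(sum(int(ch) for ch in s))
    return int(s) if s else 0

def mod9_root(value: str) -> int:
    """Same result via 1 + (n - 1) mod 9, with n the digits read as one integer."""
    n = int("".join(ch for ch in value if ch.isdigit()) or "0")
    return 0 if n == 0 else 1 + (n - 1) % 9

for sample in ["24.0542653897891", "25.4846064424057", "27", "28.6055035477009"]:
    print(sample, digit_sum_root(sample), mod9_root(sample))
```

Python integers have no 15-digit limit, so the split-in-half trick from the long formula is not needed here; it is only a workaround for Excel's numeric precision.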
Notes:
This will work best with numbers stored as text. Since Excel may display and/or convert large numbers, or numbers with many decimal places, differently than expected, working with text strings of digits is the most stable method.
This method will not work reliably with numbers > 15 digits.
If you have numbers > 15 digits, then I suggest a VBA User Defined Function:
Option Explicit
Function digitalRoot(num As String) As Long
    Dim S As String, Sum As Long, I As Long
    S = num
    ' Keep summing the digits until a single character is left
    Do While Len(S) > 1
        Sum = 0
        For I = 1 To Len(S)
            ' Val returns 0 for the decimal point, so it is ignored
            Sum = Sum + Val(Mid(S, I, 1))
        Next I
        S = Trim(Str(Sum))
    Loop
    digitalRoot = CLng(S)
End Function
You could use a formula like:
=SUMPRODUCT(FILTERXML("<t><s>"&SUBSTITUTE(A1," ","</s><s>")&"</s></t>","//s"))
You might need an extra SUBSTITUTE for changing . to , if that's your decimal delimiter:
=SUMPRODUCT(FILTERXML("<t><s>"&SUBSTITUTE(SUBSTITUTE(A1,".",",")," ","</s><s>")&"</s></t>","//s"))
However, maybe a UDF as others proposed is also a possibility for you. Though, something tells me I might have misinterpreted your question...
I hope you are looking for something like the following UDF.
Function SumToOneDigit(myNumber)
    Dim i As Long
    Dim temp: temp = 0
CalcLoop:
    ' Add up every numeric character; the decimal point is skipped
    For i = 1 To Len(myNumber)
        If IsNumeric(Mid(myNumber, i, 1)) Then temp = temp + Mid(myNumber, i, 1)
    Next
    ' More than one digit left: feed the sum back in and repeat
    If Len(temp) > 1 Then
        myNumber = temp
        temp = 0
        GoTo CalcLoop
    End If
    SumToOneDigit = temp
End Function
A UDF (User Defined Function) is code written in VBA (Visual Basic for Applications).
When you cannot do a calculation with the built-in Excel functions, like the ones in your question, you can write a UDF in a VBA module in Excel. See this link for UDFs. If you don't have the Developer tab, see this link. Add a module in the VBA editor by right-clicking on the workbook, and paste the above code into that module. Remember, this code lives in this workbook only, so if you want to use the UDF in some other file you will have to add a module in that file and paste the code there as well. If you use such a UDF frequently, it is better to make an add-in out of it, like this link.
In addition to using "Text to Columns" as a one-off conversion, this is relatively easy to do in VBA, by creating a user function that accepts the data as a string, splits it into an array separated by spaces, and then loops the elements to add them up.
Add the following VBA code to a new module:
Function fSumData(strData As String) As Double
    On Error GoTo E_Handle
    Dim aData() As String
    Dim lngLoop1 As Long
    ' Split the cell text on spaces and add up the pieces
    aData = Split(strData, " ")
    For lngLoop1 = LBound(aData) To UBound(aData)
        fSumData = fSumData + CDbl(aData(lngLoop1))
    Next lngLoop1
fExit:
    On Error Resume Next
    Exit Function
E_Handle:
    MsgBox Err.Description & vbCrLf & vbCrLf & "fSumData", vbOKOnly + vbCritical, "Error: " & Err.Number
    Resume fExit
End Function
Then enter this into a cell in the Excel worksheet:
=fSumData(A1)
Regards,
The UDF below will return the sum of all numbers in a cell passed to it as an argument.
Function SumCell(Cell As Range) As Double
    Dim Fun As Double ' function return value
    Dim Sp() As String ' helper array
    Dim i As Integer ' index to helper array
    Sp = Split(Cell.Cells(1).Value)
    For i = 0 To UBound(Sp)
        Fun = Fun + Val(Sp(i))
    Next i
    SumCell = Fun
End Function
Install the function in a standard code module, created with a name like Module1. Call it from the worksheet with syntax like =SumCell(A2) where A2 is the cell that contains the numbers to be summed up. Copy down as you would a built-in function.

Find entire number between # and space at variable places within a cell

How can I find an entire number between a "#" and a space when that combination could appear anywhere in a given cell?
Example cell contents:
"This is a #123 Test that I 45 like to run"
"This is a #45 Test that I 98 like to run"
I need to return "123" from the first one and "45" from the second one.
Using Mid(), I can return the "1", but the problem is the number between # and space can vary in length, but there will generally be a #, number or numbers, then a space.
As a secondary issue, there may be scenarios where there is no "#", but I need to find the first numeric value in the cell and return them (i.e. "1", "34", "648").
Any advice on either of these challenges is greatly appreciated.
This should work as well:
=MID(A11,FIND("#",A11)+1,FIND(" ",A11,FIND("#",A11)+1)-FIND("#",A11)-1)
This works by looking for the hash and the space that follows it; the -1 at the end keeps that space out of the result. It does not cover the secondary question.
Since you've put the excel-vba tag on your question, here's a vba way of doing it using regular expressions that should satisfy both your primary and secondary issues:
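Sub tmp()
    ' Requires a reference to Microsoft VBScript Regular Expressions 5.5
    Dim regEx As New RegExp
    Dim mat As Object
    Dim i As Long
    ' Lazily skip to the first run of digits, with an optional "#" in front
    regEx.Pattern = "^.*?\#?(\d+)"
    For i = 1 To Range("A" & Rows.Count).End(xlUp).Row
        Set mat = regEx.Execute(Cells(i, 1).Value)
        If mat.Count = 1 Then
            ' Write the captured digits next to the source cell
            Cells(i, 2).Value = mat(0).SubMatches(0)
        End If
    Next
End Sub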
The regular expression uses a non-greedy character search (the "?" on the end of ".*?" is what does that) to find the first pattern in the cell that matches either "#123" or just "123", where "123" is any arbitrary sequence of digits.
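If you want to try the pattern without opening the VBA editor, the same expression behaves the same way in Python's re module. This is only a quick sketch for experimenting; first_number is a name I made up for it.

```python
import re

# Same pattern as in the macro: lazy skip, optional "#", then a run of digits.
PATTERN = re.compile(r"^.*?#?(\d+)")

def first_number(text):
    """Return the first run of digits in text as a string, or None if there is none."""
    m = PATTERN.search(text)
    return m.group(1) if m else None

print(first_number("This is a #123 Test that I 45 like to run"))
print(first_number("This is a #45 Test that I 98 like to run"))
print(first_number("no hash here, just 648 in the middle"))
```

Because the leading part is lazy, the match stops at the earliest digit run, so the later numbers in the cell (45 and 98 in the examples) are never reached.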
This will return the first number in a string:
=--LEFT(MID(A1,AGGREGATE(15,6,FIND({1,2,3,4,5,6,7,8,9,0},A1),1),LEN(A1)),FIND(" ",MID(A1,AGGREGATE(15,6,FIND({1,2,3,4,5,6,7,8,9,0},A1),1),LEN(A1))))
AGGREGATE was introduced in Excel 2010. If you do not have that, then you will need to use this array formula:
=--LEFT(MID(A1,MIN(IFERROR(FIND({1,2,3,4,5,6,7,8,9,0},A1),1E+99)),LEN(A1)),FIND(" ",MID(A1,MIN(IFERROR(FIND({1,2,3,4,5,6,7,8,9,0},A1),1E+99)),LEN(A1))))
Being an array formula, it needs to be confirmed with Ctrl-Shift-Enter instead of Enter when exiting edit mode. If done correctly, Excel will put {} around the formula.

Returning multiple values using Vlookup in excel

I have an excel sheet set up to automatically calculate meetings per day by day of the week. I would like to write a formula to return all dates I have a meeting scheduled (comma separated preferably), but I am having some difficulty. Using Vlookup, I can only get it to return the first date.
For example, here is what my data looks like:
A                 B                   C
Initial Meetings  Follow-up Meetings  Date
1                 1                   7/29/2015
0                 1                   7/30/2015
1                 1                   7/31/2015
0                 0                   8/1/2015
0                 0                   8/2/2015
I would like to write a formula to return "7/29/2015, 7/31/2015" in one cell, and "7/29/2015, 7/30/2015, 7/31/2015" in another, but I seem to be stuck.
You can't do this with vLookup.
This can be done relatively easily in VBA, but that affects portability: many, if not most, users disable macros by default, and in many cases company policy prevents users from running macros at all.
If you are OK with macros, you can put the following into a new module and then use =MultiVlookup(lookup_value, table_array, col_index_num) in the same way as you'd use VLOOKUP; it should give you a comma-separated list of all the matches:
Public Function MultiVlookup(find_value, search_range, return_row)
    Dim myval ' String to represent return value (comma-separated list)
    Dim comma ' Bool to represent whether we need to prefix the next result with ", "
    comma = False
    'Debug.Print find_value.value, return_row
    For Each rw In search_range.Rows ' Iterate through each row in the range
        If rw.Cells(1, 1).value = find_value Then ' If we have found the lookup value...
            If comma Then ' Add a comma if it's not the first value we're adding to the list
                myval = myval + ", "
            Else
                comma = True
            End If
            myval = myval + Str(rw.Cells(1, return_row).value)
        End If
    Next
    MultiVlookup = myval
End Function
This may not be the cleanest way of doing it, and it isn't a direct copy of VLOOKUP (for instance, it does not have a fourth "range lookup" argument as VLOOKUP does), but it works for my test.
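Stripped of the Excel specifics, the loop just collects the return column of every row whose first column equals the lookup value and joins the results with ", ". A small Python sketch of that logic, using the Initial Meetings column and the dates from the question as sample data (multi_lookup is a hypothetical name, and its index is 0-based, unlike the 1-based column number in the VBA version):

```python
def multi_lookup(find_value, rows, return_index):
    """Join the return_index-th field of every row whose first field equals find_value."""
    matches = [str(row[return_index]) for row in rows if row[0] == find_value]
    return ", ".join(matches)

# (Initial Meetings, Date) pairs taken from the question's example table
initial_meetings = [
    (1, "7/29/2015"),
    (0, "7/30/2015"),
    (1, "7/31/2015"),
    (0, "8/1/2015"),
    (0, "8/2/2015"),
]

print(multi_lookup(1, initial_meetings, 1))
```

As with the VBA function, the lookup column has to be the first column of the range you pass in.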
Finally my original suggestion (in case it helps others - it's not the exact solution to the question) was:
I've not tried it myself, but this link shows what I think you might be looking for.
Great code, but don't forget to add the following if you use Option Explicit:
Dim rw As Range

Prevent Partial Duplicates in Excel

I have a worksheet with products where the people in my office can add new positions. The problem we're running into is that the products have specifications but not everybody puts them in (or inputs them wrong).
Example:
"cool product 14C"
Is there a way to set up Data Validation so that it warns me if I enter "very cool product 14B", or anything else that contains an already existing string of characters (say, longer than 4), like "cool produKt 14C" but also "good product 15" and so on?
I know that I can prevent 100% matches using COUNTIF and spot words that start/end in the same way using LEFT/RIGHT but I need to spot partial matches within the entries as well.
Thanks a lot!
If you want to cover typos, word wraps, figure permutations, etc., maybe a SOUNDEX algorithm would suit your problem. Here's an implementation for Excel ...
So if you insert this as a user defined function, and create a column =SOUNDEX(A1) for each product row, upon entry of a new product name you can filter for all product rows with same SOUNDEX value. You can further automate this by letting user enter the new name into a dialog form first, do the validation, present them a Combo Box dropdown with possible duplicates, etc. etc. etc.
edit:
small function to find parts of strings terminated by blanks in a range (in answer to your comment)
Function FindSplit(Arg As Range, LookRange As Range) As String
    Dim LookFor() As String, LookCell As Range
    Dim Idx As Long
    ' Split the entry into its space-separated pieces
    LookFor = Split(Arg)
    FindSplit = ""
    For Idx = 0 To UBound(LookFor)
        For Each LookCell In LookRange.Cells
            ' Record "piece:row" for every cell that contains the piece
            If InStr(1, LookCell, LookFor(Idx)) <> 0 Then
                If FindSplit <> "" Then FindSplit = FindSplit & ", "
                FindSplit = FindSplit & LookFor(Idx) & ":" & LookCell.Row
            End If
        Next LookCell
    Next Idx
    If FindSplit = "" Then FindSplit = "Cool entry!"
End Function
This is a bit crude ... but what it does is the following
split a single cell argument in pieces and put it into an array --> split()
process each piece --> For Idx = ...
search another range for strings that contain the piece --> For Each ...
add piece and row number of cell where it was found into a result string
You can enter/copy this as a formula next to each cell input and know immediately if you've done a cool input or not.
Value of cell D8 is [asd:3, wer:4]
Note the use of absolute addressing in the start of lookup range; this way you can copy the formula well down.
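In case the nesting of the two loops is hard to follow, here is the same logic as a short Python sketch. The sample rows are made up for illustration (they are not the data from the worksheet), and find_split is just my name for it.

```python
def find_split(entry, existing):
    """existing is a list of (row_number, text) pairs standing in for the lookup range."""
    hits = []
    for piece in entry.split(" "):          # split the new entry into pieces
        for row, text in existing:          # scan every existing cell
            if piece in text:               # substring match, like InStr
                hits.append(f"{piece}:{row}")
    return ", ".join(hits) if hits else "Cool entry!"

# Hypothetical existing entries in rows 3 and 4
existing = [(3, "asd qwe"), (4, "wer tz")]

print(find_split("asd wer", existing))
print(find_split("xyz", existing))
```

Like the VBA version, this reports every piece that occurs anywhere inside an existing entry, so very short pieces will produce a lot of hits.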
edit 17-Mar-2015
further to comment Joanna 17-Mar-2015, if the search argument is part of the range you're scanning, e.g. =FINDSPLIT(C5; C1:C12) you want to make sure that the If Instr(...) doesn't hit if LookCell and LookFor(Idx) are really the same cell as this would create a false positive. So you would rewrite the statement to
...
...
If InStr(1, LookCell, LookFor(Idx)) <> 0 And _
Not (LookCell.Row = Arg.Row And LookCell.Column = Arg.Column) _
Then
hint
Do not use a complete column (e.g. $C:$C) as the second argument as the function tends to become very slow without further precautions

Calculate alphanumeric string to an integer in Excel

I have an issue that I've not been able to figure out even with many of the ideas presented in other posts. My data comes in Excel and here are examples of each manner that any given cell might have the data:
4days 4hrs 41mins 29seconds
23hrs 43mins 4seconds
2hrs 2mins
52mins 16seconds
The end result would be to calculate the total minutes while allowing seconds to be ignored, so that the previous values would end up as follows:
6041
52
1423
122
Would anyone have an idea how to go about that?
Thanks for the assistance!
Bit tedious (and assumes units are always plural - also produces results in different order to example) but, with formulae only, if your data is in column A, in B1 and copied down:
="="&SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A1,"days","*1440+"),"hrs","*60+"),"mins","*1+"),"seconds","*0")," ","")&0
then Copy B and Paste Special values into C and apply Text to Columns to C with Tab as the delimiter.
This array formula** should also work:
=SUM(IFERROR(0+MID(REPT(" ",31)&SUBSTITUTE(A1&"dayhrminsecond"," ",REPT(" ",31)),FIND({"day","hr","min","second"},REPT(" ",31)&SUBSTITUTE(A1&"dayhrminsecond"," ",REPT(" ",31)))-31,31),0)*{1440,60,1,0})
Regards
**Array formulas are not entered in the same way as 'standard' formulas. Instead of pressing just ENTER, you first hold down CTRL and SHIFT, and only then press ENTER. If you've done it correctly, you'll notice Excel puts curly brackets {} around the formula (though do not attempt to manually insert these yourself).
The easiest option is probably VBA with a regular expression. You can then easily find each of the fields, and do the maths.
If you want to stick to "pure" Excel, then it seems the only option is to use SEARCH or FIND to find the position of each of "days", "hrs" and "mins" in the text (you may have to check whether they're always plural). Then use MID with the positions found above to extract the different components. See http://office.microsoft.com/en-gb/excel-help/split-text-among-columns-by-using-functions-HA010102341.aspx for similar examples.
But there's quite a bit of work to handle the cases where some components are missing, so either you'll use quite a few cells, or you'll get a very complex formula...
Here is a User Defined Function, written in VBA, which takes your string as the argument and returns the number of minutes. Only the first characters of the time interval names are checked (e.g. d, h, m) as this seems to provide sufficient discrimination.
To enter this User Defined Function (UDF), press Alt+F11 to open the Visual Basic Editor.
Ensure your project is highlighted in the Project Explorer window.
Then, from the top menu, select Insert/Module and
paste the code below into the window that opens.
To use this User Defined Function (UDF), enter a formula like
=SumMinutes(A1)
in some cell.
Option Explicit
Function SumMinutes(S As String) As Long
    Dim RE As Object, MC As Object
    Dim lMins As Long
    Dim I As Long
    Set RE = CreateObject("vbscript.regexp")
    With RE
        ' One capture group per unit: days, hours, minutes (seconds are ignored)
        .Pattern = "(\d+)(?=\s*d)|(\d+)(?=\s*h)|(\d+)(?=\s*m)"
        .Global = True
        .ignorecase = True
        If .test(S) = True Then
            Set MC = .Execute(S)
            For I = 0 To MC.Count - 1
                With MC(I)
                    lMins = lMins + _
                        .submatches(0) * 1440 + _
                        .submatches(1) * 60 + _
                        .submatches(2)
                End With
            Next I
        End If
    End With
    SumMinutes = lMins
End Function
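For checking the arithmetic outside Excel, the same approach can be sketched in a few lines of Python: pick up each number together with the first letter of its unit and weight it by the minutes in that unit. This is a sketch under the same assumption as the UDF (only d, h and m matter, seconds are dropped); sum_minutes and MINUTES_PER_UNIT are names I picked.

```python
import re

# Minutes contributed by one day, one hour, one minute
MINUTES_PER_UNIT = {"d": 1440, "h": 60, "m": 1}

def sum_minutes(text):
    """Total minutes in a string like '4days 4hrs 41mins 29seconds' (seconds ignored)."""
    total = 0
    # Each match is (number, first letter of the unit); 's' is not in the set,
    # so values followed by "seconds" never match.
    for number, unit in re.findall(r"(\d+)\s*([dhm])", text, flags=re.IGNORECASE):
        total += int(number) * MINUTES_PER_UNIT[unit.lower()]
    return total

for sample in ["4days 4hrs 41mins 29seconds", "23hrs 43mins 4seconds",
               "2hrs 2mins", "52mins 16seconds"]:
    print(sample, "->", sum_minutes(sample))
```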
