String comparison between two cells containing text but one containing multiple delimiters - excel

I wish to compare two cells in Excel that contain similar text, where one of them also contains various delimiters. I want to ignore the delimiters when comparing the strings.
E.g.
John Doe: Mary Ann. Are Married/
John Doe Mary Ann Are Married
I have no experience with macros. Any leads are appreciated!

If you have Office 365 Excel then we can use this array formula:
=TEXTJOIN("",TRUE,IF(((CODE(UPPER(MID(A1,ROW(INDIRECT("1:" & LEN(A1))),1)))>=65)*(CODE(UPPER(MID(A1,ROW(INDIRECT("1:" & LEN(A1))),1)))<=90))+(CODE(UPPER(MID(A1,ROW(INDIRECT("1:" & LEN(A1))),1)))=32),MID(A1,ROW(INDIRECT("1:" & LEN(A1))),1),""))=TEXTJOIN("",TRUE,IF(((CODE(UPPER(MID(A2,ROW(INDIRECT("1:" & LEN(A2))),1)))>=65)*(CODE(UPPER(MID(A2,ROW(INDIRECT("1:" & LEN(A2))),1)))<=90))+(CODE(UPPER(MID(A2,ROW(INDIRECT("1:" & LEN(A2))),1)))=32),MID(A2,ROW(INDIRECT("1:" & LEN(A2))),1),""))
Being an array formula, it must be confirmed with Ctrl-Shift-Enter instead of Enter when exiting edit mode (in versions of Excel without dynamic arrays). If done correctly, Excel will put {} around the formula.

You can try this:
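Function CompareByLetter(t1 As String, t2 As String) As Boolean
    ' True when both strings match after everything except letters and spaces is removed
    CompareByLetter = CleanString(t1) = CleanString(t2)
End Function
Function CleanString(t As String) As String
    Dim result As String, x As Long, c As Long
    For x = 1 To Len(t)
        c = Asc(UCase(Mid(t, x, 1)))
        ' Keep A-Z (either case) and the space character (code 32)
        If (c >= 65 And c <= 90) Or c = 32 Then result = result & Mid(t, x, 1)
    Next x
    CleanString = result
End Function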
Then you can use it as a formula:
=CompareByLetter(A1,A2)
This macro just compares strings by only keeping letters and spaces.
You can also use =CleanString(A1) to remove all other characters from your strings.
To use this in your project, open Excel and press ALT+F11.
Right-click in the project pane on the left side and select Insert -> Module.
Paste the code into the module window on the right.
After that, you should be able to use the functions as you would a formula.
Just enter =CompareByLetter(A1,A2) in a cell.
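For readers who want to sanity-check the comparison outside Excel, the same idea can be sketched in Python. This is only an illustration of the technique (keep ASCII letters and spaces, then compare); the names clean_string and compare_by_letter are made up for this sketch and are not part of the VBA answer.

```python
import string

# Characters that survive cleaning: ASCII letters (both cases) and the space.
ALLOWED = set(string.ascii_letters + " ")


def clean_string(text: str) -> str:
    """Drop every character that is not an ASCII letter or a space."""
    return "".join(ch for ch in text if ch in ALLOWED)


def compare_by_letter(a: str, b: str) -> bool:
    """Compare two strings after cleaning (case-sensitive)."""
    return clean_string(a) == clean_string(b)


print(compare_by_letter("John Doe: Mary Ann. Are Married/",
                        "John Doe Mary Ann Are Married"))  # True
```

As in the VBA version, spaces are kept, so a delimiter that replaces a space (rather than sitting next to one) would still make the two strings differ.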

Related

How can I replace multiple string at once in Excel?

The function I expected
some_function(original_text, "search_text", "replacement_text")
The second and third parameters are each strings of multiple characters. Each character of the original text is replaced by position: the character at position n of the second parameter is replaced with the character at position n of the third parameter. For example:
some_function("9528", "1234567890", "abcdefghij")
1 -> a
2 -> b
3 -> c
...
8 -> h
9 -> i
0 -> j
The result of some_function will be iebh. Nested SUBSTITUTE functions can achieve the goal, but I hope to reduce the complexity.
What you describe is best written with REDUCE(), a LAMBDA helper function that was recently announced as being in production:
=REDUCE("9528",SEQUENCE(10),LAMBDA(x,y,SUBSTITUTE(x,MID("1234567890",y,1),MID("abcdefghij",y,1))))
Needless to say, this would become more vivid when used with cell-references:
Formula in A3:
=REDUCE(A1,SEQUENCE(LEN(B1)),LAMBDA(x,y,SUBSTITUTE(x,MID(B1,y,1),MID(C1,y,1))))
Another, more convoluted way, could be:
=LET(A,9528,B,1234567890,C,"abcdefghij",D,MID(A,SEQUENCE(LEN(A)),1),CONCAT(IFERROR(MID(C,FIND(D,B),1),D)))
Or, with the same cell references as above:
=LET(A,A1,B,B1,C,C1,D,MID(A,SEQUENCE(LEN(A)),1),CONCAT(IFERROR(MID(C,FIND(D,B),1),D)))
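To cross-check the expected output outside Excel, the position-based mapping can be sketched in Python. This is an illustrative sketch, not one of the Excel answers; multi_replace is a hypothetical name. It uses a translation table, so every character is mapped once and simultaneously, which matches the LET/FIND formula. The REDUCE/SUBSTITUTE chain replaces sequentially instead, so the two approaches could differ if a replacement character also appears later in the search text.

```python
def multi_replace(text: str, search: str, replacement: str) -> str:
    """Map the n-th character of `search` to the n-th character of `replacement`."""
    # Ignore any unpaired tail if the two strings differ in length.
    n = min(len(search), len(replacement))
    table = str.maketrans(search[:n], replacement[:n])
    return text.translate(table)


print(multi_replace("9528", "1234567890", "abcdefghij"))  # iebh
```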
Function Multi_Replace(ByVal Original As String, Search_Text As String, Replace_With As String) As String
    'intEnd is the position of the last character to replace; taking the minimum
    'handles search and replacement text of different lengths
    Dim intEnd As Long: intEnd = WorksheetFunction.Min(Len(Search_Text), Len(Replace_With))
    Dim intChar As Long 'tracks which character we're replacing
    'Replace each character individually (ByVal keeps the caller's string unchanged)
    For intChar = 1 To intEnd
        Original = Replace(Original, Mid(Search_Text, intChar, 1), Mid(Replace_With, intChar, 1))
    Next
    Multi_Replace = Original
End Function
Maybe simpler if you do not have LAMBDA yet: =TEXTJOIN(,,CHAR(96+MID(A1,SEQUENCE(LEN(A1)),1)))
*Note that this maps 0 to CHAR(96), a backtick, rather than to the expected j.
Let's say you have a list of countries in column A and want to replace all the abbreviations with the corresponding full names. Start by entering the "Find" and "Replace" items in separate columns (D and E respectively), and then enter this formula in B2:
=XLOOKUP(A2, $D$2:$D$4, $E$2:$E$4, A2)
Translated from the Excel language into the human language, here's what the formula does:
Search for the A2 value (lookup_value) in D2:D4 (lookup_array) and return a match from E2:E4 (return_array). If not found, pull the original value from A2.
Double-click the fill handle to copy the formula down to the cells below.
Since the XLOOKUP function is only available in Excel 365 and Excel 2021, the above formula won't work in earlier versions. However, you can easily mimic this behavior with a combination of IFERROR or IFNA and VLOOKUP:
=IFNA(VLOOKUP(A2, $D$2:$E$4, 2, FALSE), A2)
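The "look it up, otherwise keep the original" behaviour of both formulas can be sketched in Python with a dictionary. The abbreviations and full names below are hypothetical sample data (the original table is not shown in the text), and lookup_or_keep is a made-up name for this illustration.

```python
# Hypothetical "Find" -> "Replace" table, standing in for D2:E4.
full_names = {
    "USA": "United States",
    "UK": "United Kingdom",
    "FR": "France",
}


def lookup_or_keep(value: str, table: dict) -> str:
    """Return the mapped value, or the original value when there is no match."""
    return table.get(value, value)


for cell in ["USA", "Germany", "FR"]:
    print(cell, "->", lookup_or_keep(cell, full_names))
```

The second argument to get plays the role of the fourth XLOOKUP argument (if_not_found) and of the IFNA fallback.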

Separating Data in the same excel column

I have a column of data with multiple value types in it. I am trying to separate each value type into its own column. Below is an example of the data:
6 - Cutler, Jay (Ovr: 83)
22 - Forte, Matt (Ovr: 88)
86 - Miller, Zach (Ovr: 80)
I tried to separate the data by a) going to Data and clicking Text to Columns; however, this does not separate "Ovr" from "80" in the "Ovr: 80" portion of the data. I also tried b) converting to a .csv file, but again was unable to separate "Ovr" from "80". Is there a formula I can use to separate this portion of the data from the rest?
I would like the data to be separated into different columns as shown below:
6 | Cutler | Jay | Ovr | 83
22 | Forte | Matt | Ovr | 88
86 | Miller | Zach | Ovr | 80
Any insight is much appreciated!
Select the cells you wish to process and run this macro to place results in the cells to the right of the selected cells:
Option Explicit
Sub dural()
    Dim r As Range, s As String, ary
    Dim i As Long, a
    For Each r In Selection
        s = r.Value
        If s <> "" Then
            ' Turn every delimiter into a space, then collapse repeated spaces
            s = Replace(Replace(s, "-", " "), ",", " ")
            s = Replace(Replace(s, "(", " "), ")", " ")
            s = Application.WorksheetFunction.Trim(Replace(s, ":", " "))
            ary = Split(s, " ")
            ' Write each piece into the next cell to the right
            i = 1
            For Each a In ary
                r.Offset(0, i).Value = a
                i = i + 1
            Next a
        End If
    Next r
End Sub
Using the method above, you could do something like this:
First clean the text so it is more manageable. Enter this formula in a spare column and copy it down to turn each value into a space-delimited string:
=SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A1,"- ",""),",",""),"(",""),")",""),":","")
From there, copy the values the formula gives you (perhaps to a new sheet) and then use Text to Columns to split them into columns.
For the record, I do not recommend this method if you are willing to use the Text to Columns option.
Functions used for this solution are:
LEFT function
FIND function
MID function
For your first column of text, use the following:
=left(A1,find(" ",A1))*1
That will pull out the first number, presuming you do not have any leading spaces. The *1 converts from text to a number.
For your second column of last names, use the following:
=MID(A1,FIND("-",A1)+2,FIND(",",A1)-(FIND("-",A1)+2))
Provided you have a comma and a dash as indicated in your example data, you will not get an error and it should pull the last name without the comma.
For your third column of first names, follow the same general technique as for last names:
=MID(A1,FIND(",",A1)+2,FIND("(",A1)-2-(FIND(",",A1)+2)+1)
Follow a similar pattern to get your Ovr column:
=MID(A1,FIND("(",A1)+1,FIND(":",A1)-1-(FIND("(",A1)+1)+1)
And finally, to get the column with the number after "Ovr:" (returned as text), use this:
=MID(A1,FIND(":",A1)+2,FIND(")",A1)-1-(FIND(":",A1)+2)+1)
Copy the above formulas down as far as you need to go.
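As a cross-check of what the five formulas should return, the same split can be sketched in Python with a single regular expression. This is a sketch that assumes every row follows the "number - Last, First (Label: value)" layout of the sample data; split_row is a hypothetical name.

```python
import re

# number - Last, First (Label: value)
ROW = re.compile(r"^\s*(\d+)\s*-\s*([^,]+),\s*(.+?)\s*\((\w+):\s*(\d+)\)\s*$")


def split_row(text: str):
    """Return (number, last, first, label, value), or None if the row does not fit."""
    match = ROW.match(text)
    if match is None:
        return None
    number, last, first, label, value = match.groups()
    return int(number), last.strip(), first, label, int(value)


for row in ["6 - Cutler, Jay (Ovr: 83)",
            "22 - Forte, Matt (Ovr: 88)",
            "86 - Miller, Zach (Ovr: 80)"]:
    print(split_row(row))
```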

How to concatenate a list of words into a sentence with "and" before last item in Excel?

I want to join a list of words in Excel (not in VBA... with an Excel formula in the worksheet) to the following specifications:
Formula should ignore empty cells.
Formula should concatenate the words with "and" before final item if there is more than one item in the array of cells.
Formula should add "," between items if there are more than two items.
Examples:
A1=dog
A2=cat
A3=bird
A4=fish
Result would be: dog, cat, bird, and fish
A1=dog
A2=cat
A3=(empty cell)
A4=fish
Result would be: dog, cat, and fish
A1=dog
A2=(empty cell)
A3=bird
A4=(empty cell)
Result would be: dog and bird
A1=dog
A2=(empty cell)
A3=(empty cell)
A4=(empty cell)
Result would be: dog
Pretty please? I promise I've searched and searched for the answer.
Edit: Thank you, ExcelArchitect, I got it! This was the first time I'd ever used a custom function. You use it just like any other function in the worksheet! This is so great.
Not to push my luck, but how to do I get two cells to concatenate with my result if there is only one word in the result and two other cells if there is more than one word? Example: If the function you made for me returns just "dog", I'd want it to concatenate a cell with the text (B1) "My favorite thing to wear is a " and then "dog" and then another cell (B2) that says " costume." to make the sentence "My favorite thing to wear is a dog costume." But if it returns more than one animal, it would concatenate two other cells like this: Cell C1 "My favorite things to wear are " and "dog, cat, and bird" and Cell C2 " costumes." so that it would say "My favorite things to wear are dog, cat, and bird costumes."
If you're curious, my data really has nothing to do with animals or costumes. I am writing a program that will score a psychological test and then create an interpretive report from the test scores (I'm a psychologist).
-Mary Anne
Mary Anne:
This would be a great time to use VBA! But if you don't want to, there is a way to accomplish your goal without it.
You have to account for all of the possible outcomes here. With 4 different animals that means you have 15 outcomes:
Your equation just has to take into account all 15. It is VERY long and drawn out as a result. As such, if you have more than 4 animals that you'd like to turn into phrases, you should go the VBA route.
Here is my setup (the words are in A2:A5):
The formula in A7 is the following:
=IF(AND(A2<>"", A3="", A4="", A5=""), A2, IF(AND(A2="", A3<>"", A4="", A5=""), A3, IF(AND(A2="", A3="", A4<>"", A5=""), A4, IF(AND(A2="", A3="", A4="", A5<>""), A5, IF(AND(A2<>"", A3<>"", A4="", A5=""), A2&" and "&A3, IF(AND(A2<>"", A3="", A4<>"", A5=""), A2&" and "&A4, IF(AND(A2<>"", A3="", A4="", A5<>""), A2&" and "&A5, IF(AND(A2="", A3<>"", A4<>"", A5=""),A3&" and "&A4, IF(AND(A2="", A3<>"", A4="", A5<>""), A3&" and "&A5, IF(AND(A2="", A3="", A4<>"", A5<>""),A4&" and "&A5, IF(AND(A2<>"", A3<>"", A4<>"", A5=""), A2&", "&A3&", and "&A4, IF(AND(A2<>"", A3<>"", A4="", A5<>""), A2&", "&A3&", and "&A5, IF(AND(A2<>"", A3="", A4<>"", A5<>""), A2&", "&A4&", and "&A5, IF(AND(A2="", A3<>"", A4<>"", A5<>""), A3&", "&A4&", and "&A5, A2&", "&A3&", "&A4&", and "&A5))))))))))))))
Mary Anne - I'm such a nerd that I had to do this. Here is the VBA solution, and you can have as many names as you want! Paste this code into a new module in the workbook (go to Developer -> Visual Basic, then Insert -> Module, and paste); you can then use it in your worksheet like a regular function. Just give it the range where the names are and you should be good to go! -Matt
Function CreatePhrase(NamesRng As Range) As String
    'Creates a comma-separated phrase given a list of words or names
    Dim Cell As Range
    Dim l As Long
    Dim cp As String
    'Add commas between the values in the cells
    For Each Cell In NamesRng
        If Not IsEmpty(Cell) And Not Cell.Value = "" And Not Cell.Value = " " Then
            cp = cp & Cell.Value & ", "
        End If
    Next Cell
    'Remove trailing comma and space
    If Right(cp, 2) = ", " Then cp = Left(cp, Len(cp) - 2)
    'If there is only one value (no commas) then quit here
    If InStr(1, cp, ",", vbTextCompare) = 0 Then
        CreatePhrase = cp
        Exit Function
    End If
    'Add "and" to the end of the phrase
    For l = 1 To Len(cp)
        If Mid(cp, Len(cp) - l + 1, 1) = "," Then
            cp = Left(cp, Len(cp) - l + 2) & "and" & Right(cp, l - 1)
            Exit For
        End If
    Next l
    'If there are only two words or names (only one comma) then remove the comma
    If InStr(InStr(1, cp, ",", vbTextCompare) + 1, cp, ",", vbTextCompare) = 0 Then
        cp = Left(cp, InStr(1, cp, ",", vbTextCompare) - 1) & Right(cp, Len(cp) - InStr(1, cp, ",", vbTextCompare))
    End If
    CreatePhrase = cp
End Function
Hope that helps!
Matt, via ExcelArchitect.com
VBA is simpler. A formula is quite complicated, since Excel has no native functions allowing concatenation of a range. However, given that you have written that you would have up to eight animals, it is doable with the following formula which concatenates the contents of A1:A8 according to your rules. You can change those locations in the formula in the obvious locations.
I made one change: I may be wrong, but I believe English rules indicate that the comma preceding the last and should be omitted, so I did so. It could be added in if necessary. EDIT: Further investigation reveals a difference between US and UK rules: US rules are as you requested, UK rules omit the comma before the conjunction. I will modify the formulas and UDF to comply with US conventions.
In the formulas, the modification is to place a comma immediately prior to the and. The change in the UDF is likewise minor.
The formula was constructed from the following sequences:
So putting those formulas together, so as only to refer to A1:A8, we wind up with this monster:
=SUBSTITUTE(IFERROR(SUBSTITUTE(MID(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(CONCATENATE(",",A1,",",A2,",",A3,",",A4,",",A5,",",A6,",",A7,",",A8,","),",,",","),",,",","),",,",","),2,LEN(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(CONCATENATE(",",A1,",",A2,",",A3,",",A4,",",A5,",",A6,",",A7,",",A8,","),",,",","),",,",","),",,",","))-2),",",",and ",LEN(MID(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(CONCATENATE(",",A1,",",A2,",",A3,",",A4,",",A5,",",A6,",",A7,",",A8,","),",,",","),",,",","),",,",","),2,LEN(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(CONCATENATE(",",A1,",",A2,",",A3,",",A4,",",A5,",",A6,",",A7,",",A8,","),",,",","),",,",","),",,",","))-2))-LEN(SUBSTITUTE(MID(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(CONCATENATE(",",A1,",",A2,",",A3,",",A4,",",A5,",",A6,",",A7,",",A8,","),",,",","),",,",","),",,",","),2,LEN(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(CONCATENATE(",",A1,",",A2,",",A3,",",A4,",",A5,",",A6,",",A7,",",A8,","),",,",","),",,",","),",,",","))-2),",",""))),MID(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(CONCATENATE(",",A1,",",A2,",",A3,",",A4,",",A5,",",A6,",",A7,",",A8,","),",,",","),",,",","),",,",","),2,LEN(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(CONCATENATE(",",A1,",",A2,",",A3,",",A4,",",A5,",",A6,",",A7,",",A8,","),",,",","),",,",","),",,",","))-2)),",",", ")
Here is a VBA solution which will allow for any number of items; it concatenates according to the same rules as above.
Option Explicit
Function ConcatRangeWithAnd(RG As Range, Optional Delim As String = ", ") As String
    Dim COL As Collection
    Dim C As Range
    Dim S As String
    Dim I As Long
    'Collect the non-empty cells
    Set COL = New Collection
    For Each C In RG
        If Len(C.Text) > 0 Then COL.Add C.Text
    Next C
    Select Case COL.Count
        Case 0
            'Nothing to join: returns an empty string
            Exit Function
        Case 1
            ConcatRangeWithAnd = COL(1)
        Case 2
            ConcatRangeWithAnd = COL(1) & " and " & COL(2)
        Case Else
            'Three or more: delimiter after every item, "and" before the last
            For I = 1 To COL.Count - 1
                S = S & COL(I) & Delim
            Next I
            ConcatRangeWithAnd = S & "and " & COL(COL.Count)
    End Select
End Function
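The same case analysis can be sketched in Python to check the four examples from the question outside Excel. This is an illustration only; join_with_and is a hypothetical name, and a plain list stands in for the worksheet range.

```python
def join_with_and(items, delim=", "):
    """Join non-empty items, with 'and' before the last one (US-style comma)."""
    words = [str(x) for x in items if x is not None and str(x) != ""]
    if len(words) <= 1:
        return "".join(words)          # zero items -> "", one item -> itself
    if len(words) == 2:
        return f"{words[0]} and {words[1]}"
    # Three or more: delimiter after every item, "and" before the last.
    return delim.join(words[:-1]) + delim + "and " + words[-1]


print(join_with_and(["dog", "cat", "bird", "fish"]))  # dog, cat, bird, and fish
print(join_with_and(["dog", "cat", "", "fish"]))      # dog, cat, and fish
print(join_with_and(["dog", "", "bird", ""]))         # dog and bird
print(join_with_and(["dog", "", "", ""]))             # dog
```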
With the new TEXTJOIN function, this can be done very easily.
Step 1: Use the TEXTJOIN function with the ", " delimiter and set ignore_empty to TRUE. This gives you a comma-separated, concatenated string that ignores the blank values.
Step 2: Count the number of non-blank entries in the list using the COUNTA function and subtract 1. This is the position of the last ", " separator in the joined string. Floor the value at 1 using the MAX function, because SUBSTITUTE returns an error for an instance number below 1 (which is what a single-item list would otherwise produce).
Step 3: Use the SUBSTITUTE function to replace that last instance of ", " (the instance number calculated in Step 2) with " and ".
Putting it all together:
=SUBSTITUTE(TEXTJOIN(", ",TRUE,A1:A14),", "," and ",MAX(1,COUNTA(A1:A14)-1))
Plug in any range you want instead of A1:A14 in the above formula, and you will get a comma-separated list with "and" before the last word. Note that this version does not put a comma before the "and".
Regarding duplicates:
Firstly, I really love Matt's solution and I've added this to my collection of custom functions.
What I do miss though is the possibility to remove duplicates from the phrase without removing them from the original range.
As you can't create a virtual range (a range that you can just play with in VBA independently from your source data), the solution would probably involve converting the range to an array, running some deduplication code and then creating the phrase from that.
My solution (albeit inelegant) is just to use the UNIQUE and FILTER functions to get a deduplicated list elsewhere on the spreadsheet (can be hidden if it bothers you) and to use Matt's function on that.
=UNIQUE(FILTER(yourRange,yourRange<>""))

Sort out dimensions listed incorrectly/only keep data in a certain format

I have a list of around 1500 items with dimensions, but the dimensions do not all have the same format. The dimensions I want to keep are listed as L x W x H. How can I sort the dimensions listed like this from the stuff I don't want (some are listed as only L x H, Diameter, or just gibberish, etc.) Thank you.
If by gibberish you mean text values that could include <space>x<space>, then you have some real problems. However, if it can reasonably be assumed that the L x W x H format is what you want and that the only values containing 2 occurrences of <space>x<space> are valid ones, then a helper column would identify the valid entries.
In an unused column to the right put this formula into the second row.
=ISNUMBER(FIND(" x ", $A2, FIND(" x ", $A2) + 3))
Fill down as necessary. Rows with a second occurrence of <space>x<space> return TRUE; everything else returns FALSE.
Use Data ► Sort & Filter ► Filter to filter your Helper column for FALSE. These entries can be deleted, and when you turn the filter off you will be left with the valid entries.
Elaborating on @jeeped's answer: if you are dealing with data from an external source, you might want to relax your rules to allow other valid input formats:
There must be exactly three numbers, all non-negative integers.
A decimal point is allowed, but no digits after the decimal point.
They can be separated by "x" or "X" or "*".
They can have extra spaces before, after or between the numbers, but not between the digits.
That would mean these values would all be OK:
17x12x13
100 * 50 * 2
100. X 200. X 300
Problems of this sort are ideally suited to regular expressions. The RegExp class can be made available in the VBA editor via Tools > References by checking "Microsoft VBScript Regular Expressions 5.5". Then try this VBA function:
Public Function IsNxNxN(s As String) As Boolean
    With New RegExp
        .Pattern = "^\s*(\d+)\.?\s*[xX*]\s*(\d+)\.?\s*[xX*]\s*(\d+)\.?\s*$"
        With .Execute(s)
            IsNxNxN = (.Count = 1)
        End With
    End With
End Function
In jeeped's sample worksheet, you would replace the B2 formula with:
=IsNxNxN(A2)
If you are trying to clean up the data as well as filter it, you could use this:
Public Function CleanupNxNxN(s As String) As String
    With New RegExp
        .Pattern = "^\s*(\d+)\.?\s*[xX*]\s*(\d+)\.?\s*[xX*]\s*(\d+)\.?\s*$"
        With .Execute(s)
            If .Count = 1 Then
                With .Item(0)
                    CleanupNxNxN = .SubMatches(0) & " x " & _
                        .SubMatches(1) & " x " & _
                        .SubMatches(2)
                End With
            End If
        End With
    End With
End Function
and set the formula for C2 to:
=CleanupNxNxN(A2)
Any dimension values that are invalid will report False in column B and blank in Column C. Valid dimensions such as " 10. x 20X30 " would be reformatted as "10 x 20 x 30".
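The pattern itself can be tried outside Excel as well. The following Python sketch reuses the same regular expression to illustrate which inputs pass and how they are normalized; is_nxnxn and cleanup_nxnxn are hypothetical names mirroring the two VBA functions, not part of the original answer.

```python
import re

# Same pattern as in the VBA functions: three integers separated by x, X or *.
NXNXN = re.compile(r"^\s*(\d+)\.?\s*[xX*]\s*(\d+)\.?\s*[xX*]\s*(\d+)\.?\s*$")


def is_nxnxn(text: str) -> bool:
    """True when the whole string is three integers in N x N x N form."""
    return NXNXN.match(text) is not None


def cleanup_nxnxn(text: str) -> str:
    """Return the dimensions as 'N x N x N', or '' when the input is invalid."""
    match = NXNXN.match(text)
    return " x ".join(match.groups()) if match else ""


for sample in ["17x12x13", "100 * 50 * 2", "100. X 200. X 300", " 10. x 20X30 ", "10 x 20"]:
    print(repr(sample), is_nxnxn(sample), repr(cleanup_nxnxn(sample)))
```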
If you would like to allow extra "gibberish" before or after the dimensions, you could remove the "^" and "$" anchor characters from .Pattern, and get:
"approx. Size: 10*20*30 feet" would yield: True, "10 x 20 x 30"

Calculate alphanumeric string to an integer in Excel

I have an issue that I've not been able to figure out even with many of the ideas presented in other posts. My data comes in Excel and here are examples of each manner that any given cell might have the data:
4days 4hrs 41mins 29seconds
23hrs 43mins 4seconds
2hrs 2mins
52mins 16seconds
The end result would be to calculate the total minutes while allowing seconds to be ignored, so that the previous values would end up as follows:
6041
52
1423
122
Would anyone have an idea how to go about that?
Thanks for the assistance!
Bit tedious (and assumes units are always plural - also produces results in different order to example) but, with formulae only, if your data is in column A, in B1 and copied down:
="="&SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A1,"days","*1440+"),"hrs","*60+"),"mins","*1+"),"seconds","*0")," ","")&0
then Copy B and Paste Special values into C and apply Text to Columns to C with Tab as the delimiter.
This array formula** should also work:
=SUM(IFERROR(0+MID(REPT(" ",31)&SUBSTITUTE(A1&"dayhrminsecond"," ",REPT(" ",31)),FIND({"day","hr","min","second"},REPT(" ",31)&SUBSTITUTE(A1&"dayhrminsecond"," ",REPT(" ",31)))-31,31),0)*{1440,60,1,0})
Regards
**Array formulas are not entered in the same way as 'standard' formulas. Instead of pressing just ENTER, you first hold down CTRL and SHIFT, and only then press ENTER. If you've done it correctly, you'll notice Excel puts curly brackets {} around the formula (though do not attempt to manually insert these yourself).
The easiest option is probably VBA with a regular expression. You can then easily find each of the fields, and do the maths.
If you want to stick to "pure" Excel, then it seems the only option is to use SEARCH or FIND to find the position of each of "days", "hrs" and "mins" in the text (you may have to check whether they're always plural). Then use MID with the positions found above to extract the different components. See http://office.microsoft.com/en-gb/excel-help/split-text-among-columns-by-using-functions-HA010102341.aspx for similar examples.
But there's quite a bit of work to handle the cases where some components are missing, so either you'll use quite a few cells, or you'll get a very complex formula...
Here is a User Defined Function, written in VBA, which takes your string as the argument and returns the number of minutes. Only the first characters of the time interval names are checked (e.g. d, h, m) as this seems to provide sufficient discrimination.
To enter this User Defined Function (UDF), open the Visual Basic Editor (Alt+F11).
Ensure your project is highlighted in the Project Explorer window.
Then, from the top menu, select Insert/Module and
paste the code below into the window that opens.
To use this User Defined Function (UDF), enter a formula like
=SumMinutes(A1)
in some cell.
Option Explicit
Function SumMinutes(S As String) As Long
    Dim RE As Object, MC As Object
    Dim lMins As Long
    Dim I As Long
    Set RE = CreateObject("vbscript.regexp")
    With RE
        'One capture group per unit: digits followed by d(ays), h(rs) or m(ins)
        .Pattern = "(\d+)(?=\s*d)|(\d+)(?=\s*h)|(\d+)(?=\s*m)"
        .Global = True
        .ignorecase = True
        If .test(S) = True Then
            Set MC = .Execute(S)
            For I = 0 To MC.Count - 1
                'Groups that did not take part in the match are empty and count as 0
                With MC(I)
                    lMins = lMins + _
                        .submatches(0) * 1440 + _
                        .submatches(1) * 60 + _
                        .submatches(2)
                End With
            Next I
        End If
    End With
    SumMinutes = lMins
End Function
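To verify the expected totals for the sample data outside Excel, the same first-letter approach can be sketched in Python. This is an illustration under the same assumption as the UDF (units are identified by their first letter: d, h, m; seconds are ignored); sum_minutes is a hypothetical name.

```python
import re

MINUTES_PER_UNIT = {"d": 1440, "h": 60, "m": 1}

# A number followed (optionally after spaces) by a unit starting with d, h or m.
TOKEN = re.compile(r"(\d+)\s*([dhm])", re.IGNORECASE)


def sum_minutes(text: str) -> int:
    """Total minutes in strings like '4days 4hrs 41mins 29seconds'."""
    return sum(int(number) * MINUTES_PER_UNIT[unit.lower()]
               for number, unit in TOKEN.findall(text))


for sample in ["4days 4hrs 41mins 29seconds",
               "23hrs 43mins 4seconds",
               "2hrs 2mins",
               "52mins 16seconds"]:
    print(sample, "->", sum_minutes(sample))
# 6041, 1423, 122 and 52 respectively
```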

Resources