Excel - Checking if 2 strings match, if they don't, return the position of the string where the two diverge

I have two strings that I want to compare. The function I want is to basically:
(1) Check if the two strings match exactly
(2) If they do match, return TRUE
(3) If they do not match, return the position of the string where the two diverge
For example:
Cell A1: Barack Obama
Cell A2: Barack Obana
I know that these two strings don't match and the error is the "n" in "Obana". Therefore, the error happens at the string position of 10 in A2. I would like the function to return 10.
My attempt:
=IF(EXACT(A1,A2), "MATCH", ??(SEARCH(A1,A2,1))??
Thanks!

How about the following VBA function:
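Function MatchOrDiverge(BaseString As String, ComparedString As String) As Variant
    Dim i As Long
    ' Identical strings: report a match
    If BaseString = ComparedString Then
        MatchOrDiverge = "MATCH"
    Else
        ' Walk BaseString and return the first position where the characters differ
        For i = 1 To Len(BaseString)
            If Not (Mid(BaseString, i, 1) = Mid(ComparedString, i, 1)) Then
                MatchOrDiverge = i
                Exit Function
            End If
        Next i
        ' BaseString is a prefix of ComparedString
        MatchOrDiverge = Len(BaseString) + 1
    End If
End Function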
This takes 2 strings as input. First, it checks to see if the 2 strings are the same. If they are, it returns "MATCH".
If the 2 strings are not equal, it loops through the BaseString and checks its characters against the ComparedString. When a character does not match, it returns that character's index.
If the strings match up to the end of the BaseString, but the second one is longer (e.g., "cat" and "cattle"), then it returns the length of the BaseString + 1.
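To check the logic outside Excel, here is a minimal Python sketch of the same comparison. The name match_or_diverge is made up for this illustration; it is not part of the VBA answer. Positions are 1-based, as in Excel's MID, and spaces count as characters, so the "n" in "Barack Obana" is reported at position 11.

```python
def match_or_diverge(base: str, compared: str):
    """Return "MATCH" or the 1-based position where the strings first differ."""
    if base == compared:
        return "MATCH"
    # Compare character by character over the shared length
    for position, (a, b) in enumerate(zip(base, compared), start=1):
        if a != b:
            return position
    # One string is a prefix of the other: they diverge right after the shorter one
    return min(len(base), len(compared)) + 1


print(match_or_diverge("Barack Obama", "Barack Obana"))
print(match_or_diverge("cat", "cattle"))
```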

Here is a formula:
=IF(EXACT(A1,A2),"MATCH",AGGREGATE(15,6,ROW(INDIRECT("1:" & MAX(LEN(A1),LEN(A2))))/(NOT(EXACT(MID(A1,ROW(INDIRECT("1:" & MAX(LEN(A1),LEN(A2)))),1),MID(A2,ROW(INDIRECT("1:" & MAX(LEN(A1),LEN(A2)))),1)))),1))
This is a long and convoluted formula and changing the references is not quick. The UDF option given by elmer007 would be easier to use and reference in the long run.

Related

How can I replace multiple string at once in Excel?

The function I expected
some_function(original_text, "search_text", "replacement_text")
The second and third parameters will each contain multiple characters. Each character of the original text that appears in the second parameter is replaced by the character at the same position in the third parameter. For example:
some_function("9528", "1234567890", "abcdefghij")
1 -> a
2 -> b
3 -> c
...
8 -> h
9 -> i
0 -> j
The result of some_function will be iebh. Nested SUBSTITUTE functions can achieve this, but I hope to reduce the complexity.
Your requirement, as described, is best expressed with REDUCE(), a lambda-related helper function recently announced to be in production:
=REDUCE("9528",SEQUENCE(10),LAMBDA(x,y,SUBSTITUTE(x,MID("1234567890",y,1),MID("abcdefghij",y,1))))
Needless to say, this would become more vivid when used with cell-references:
Formula in A3:
=REDUCE(A1,SEQUENCE(LEN(B1)),LAMBDA(x,y,SUBSTITUTE(x,MID(B1,y,1),MID(C1,y,1))))
Another, more convoluted way, could be:
=LET(A,9528,B,1234567890,C,"abcdefghij",D,MID(A,SEQUENCE(LEN(A)),1),CONCAT(IFERROR(MID(C,FIND(D,B),1),D)))
Or, with the same cell references as above:
=LET(A,A1,B,B1,C,C1,D,MID(A,SEQUENCE(LEN(A)),1),CONCAT(IFERROR(MID(C,FIND(D,B),1),D)))
Function Multi_Replace(ByVal Original As String, Search_Text As String, Replace_With As String) As String
    'intEnd represents the last character being replaced;
    'Min is necessary if Search text and replace text are different lengths
    Dim intEnd As Long: intEnd = WorksheetFunction.Min(Len(Search_Text), Len(Replace_With))
    Dim intChar As Long 'to track which character we're replacing
    'Replace each character individually
    For intChar = 1 To intEnd
        Original = Replace(Original, Mid(Search_Text, intChar, 1), Mid(Replace_With, intChar, 1))
    Next
    Multi_Replace = Original
End Function
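As a cross-check outside Excel, the character-for-character mapping can be sketched in Python with a translation table. The function name multi_replace is hypothetical. Unlike the chained SUBSTITUTE or Replace calls, this sketch maps all characters in one pass, so a replacement character that also appears in the search text is not replaced a second time (chaining "1"->"2" and then "2"->"1" turns "12" into "11").

```python
def multi_replace(original: str, search_text: str, replace_with: str) -> str:
    """Replace each character of search_text with the character at the
    same position in replace_with, all in a single pass."""
    # Only map as many characters as both strings provide
    n = min(len(search_text), len(replace_with))
    table = str.maketrans(search_text[:n], replace_with[:n])
    return original.translate(table)


print(multi_replace("9528", "1234567890", "abcdefghij"))
```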
Maybe simpler if you do not have lambda yet: =TEXTJOIN(,,CHAR(96+MID(A1,SEQUENCE(LEN(A1)),1)))
*Note that this does not handle 0 as expected: 0 maps to CHAR(96), a backtick, instead of j.
Let's say you have a list of countries in column A and aim to replace all the abbreviations with the corresponding full names. You start by entering the "Find" and "Replace" items in separate columns (D and E respectively), and then enter this formula in B2:
=XLOOKUP(A2, $D$2:$D$4, $E$2:$E$4, A2)
Translated from the Excel language into the human language, here's what the formula does:
Search for the A2 value (lookup_value) in D2:D4 (lookup_array) and return a match from E2:E4 (return_array). If not found, pull the original value from A2.
Double-click the fill handle to copy the formula to the cells below, and the result won't keep you waiting.
Since the XLOOKUP function is only available in Excel 365, the above formula won't work in earlier versions. However, you can easily mimic this behavior with a combination of IFERROR or IFNA and VLOOKUP:
=IFNA(VLOOKUP(A2, $D$2:$E$4, 2, FALSE), A2)
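The "look it up, otherwise keep the original" behavior of both formulas can be sketched in Python with a dictionary lookup that falls back to the input. The function name and the sample abbreviations below are invented for illustration and do not come from the original worksheet.

```python
def replace_or_keep(value: str, mapping: dict) -> str:
    """Return the mapped value, or the original value when no match exists."""
    return mapping.get(value, value)


# Hypothetical "Find" -> "Replace" pairs, standing in for columns D and E
find_replace = {"USA": "United States", "UK": "United Kingdom"}

column_a = ["USA", "France", "UK"]
column_b = [replace_or_keep(cell, find_replace) for cell in column_a]
print(column_b)
```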

Count Patterns In one Cell Excel

I need your help. I'm currently extracting some data, and I have to count a specific number of Call IDs. A Call ID has the following format: 9129572520020000711. The pattern is 19 characters that start with 9 and end in 1.
I want to count how many times this pattern appears in one cell.
For example, this is the value in one cell, and I want to count how many times the pattern appears:
1912957252002000071129129545183410000711391295381628700007114912959791875000071159129597085000000711691295892838400007117912958908933000071189129452513730000711
To solve this with formulae you need to know:
The starting character
The ending character
The length of your Call ID
Finding all possible Call IDs
Let B1 be your number string and B2 be the call ID (or pattern) you are looking for. In B5 enter the formula =MID($B$2,1,1) to find the starting character you are looking for. In B6 enter =RIGHT($B$2,1) for the end character. In B7 enter =LEN($B$2) for the length of the call ID.
In Column A we'll enter the position of every starting character. The first formula will be a simple Find() formula in A10: =FIND($B$5,$B$1,1). To find the other starting characters, start the Find() at the position after the previous starting character: =FIND($B$5,$B$1,$A10+1) in A11. Copy this down the column a few dozen times (or more).
In Column B we'll see if the next X characters (where X is the length of the Call ID) meet the criteria for a Call ID:
=IF(MID($B$1,$A10+($B$7-1),1)=$B$6,TRUE,FALSE)
The MID($B$1,$A10+($B$7-1),1)=$B$6 comparison checks whether the character at the end of this possible Call ID is the end character we're looking for. $A10+($B$7-1) calculates the position of the last character of the possible Call ID and $B$6 is the end character.
In Column C we can return the actual Call ID if there is a match. This isn't necessary to find the count, but will be useful later. Simply check if the value in Column B is True and, if yes, return the calculated string: =IF(B10,MID($B$1,$A10,$B$7),"").
To actually count the number of valid Call IDs, do a CountIf() on Column B to count the True values, for example =COUNTIF($B$10:$B$100,TRUE), adjusting the range to cover the rows you filled.
If you don't want all the #VALUE! errors, just wrap everything in IFERROR(,"") formulas.
Finding all consecutive Call IDs
However, some of these Call IDs overlap. Operating on the assumption that Call IDs cannot overlap, we simply have to start our search after the end character of a found ID, not the start. Insert an "Ending Position" column in Column B (the earlier references to column B now point to column C) with the formula =$A10+($C$7-1), starting in B10. Alter A11 to =FIND($C$5,$C$1,$B10+1) and copy down. Don't change A10, as this finds the first starting position and does not depend on anything but the original text.
Which ones are valid?
I don't know, that depends on other criteria for your Call IDs. If you receive them consecutively, then the second method is best and the other possible ones found are by coincidence. If not, then you'll have to apply some other validation criteria to the first method, hence why we identified each ID.
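The two worksheet methods above can be sketched in Python to check the counts outside Excel. The function name count_call_ids and its parameters are made up for this illustration. Like the formulas, it only checks the first and last character of each 19-character candidate. One deliberate difference, stated plainly: in non-overlapping mode this sketch skips ahead only after a valid Call ID, whereas the worksheet version skips ahead after every starting character it finds.

```python
SAMPLE = (
    "1912957252002000071129129545183410000711391295381628700007114912959791875000071"
    "159129597085000000711691295892838400007117912958908933000071189129452513730000711"
)


def count_call_ids(text, start="9", end="1", length=19, allow_overlap=True):
    """Count candidates that begin with `start`, finish with `end`
    and are `length` characters long."""
    count = 0
    pos = text.find(start)
    # Stop once a candidate would run past the end of the text
    while pos != -1 and pos + length <= len(text):
        step = 1
        if text[pos + length - 1] == end:
            count += 1
            if not allow_overlap:
                step = length  # resume the search after this Call ID
        pos = text.find(start, pos + step)
    return count


print(count_call_ids(SAMPLE, allow_overlap=True))
print(count_call_ids(SAMPLE, allow_overlap=False))
```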
You can solve this simply with a UDF using a regular expression.
Option Explicit
Function callIDcount(S As String) As Long
    Dim RE As Object, MC As Object
    Const sPat As String = "9\d{17}1"
    Set RE = CreateObject("vbscript.regexp")
    With RE
        .Global = True
        .Pattern = sPat
        Set MC = .Execute(S)
        callIDcount = MC.Count
    End With
End Function
Using your example, this returns a count of 8.
The regular expression engine captures all of the matches for the pattern into the match collection. To see how many there are, we merely return the count of that collection.
Trivial modifications would allow one to return the actual IDs also, should that be necessary.
The regex:
9\d{17}1
Match the character “9” literally: 9
Match a single character that is a “digit” (ASCII 0–9 only): \d{17}
Exactly 17 times: {17}
Match the character “1” literally: 1
Created with RegexBuddy
EDIT: Reading through TheFizh's post, he considered that you might want the count to include overlapping Call IDs. In other words, given:
9129572520020000711291
We see that includes:
9129572520020000711
9572520020000711291
where the second overlaps with the first, but both meet your requirements.
Should that be what you want, merely change the regex so it does not "consume" the match:
Const sPat As String = "9(?=\d{17}1)"
and the function will return 15 instead of 8, which is the count of non-overlapping matches.
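The two patterns can be tried outside Excel with Python's re module, which also supports lookahead. This is only a sketch to confirm the counts: count_matches is a hypothetical helper, and re.ASCII is passed so that \d means 0-9 only, as in the explanation above.

```python
import re

SAMPLE = (
    "1912957252002000071129129545183410000711391295381628700007114912959791875000071"
    "159129597085000000711691295892838400007117912958908933000071189129452513730000711"
)


def count_matches(text: str, overlapping: bool = False) -> int:
    """Count Call IDs; with overlapping=True the lookahead keeps the
    17 digits and the final 1 from being consumed by the match."""
    pattern = r"9(?=\d{17}1)" if overlapping else r"9\d{17}1"
    return len(re.findall(pattern, text, flags=re.ASCII))


print(count_matches(SAMPLE))
print(count_matches(SAMPLE, overlapping=True))
```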
Do you mean something like the following?
Sub CallID_noPatterns()
    Dim CallID As String, CallIDLen As Long
    CallID = "9#################1" 'Like pattern: # matches any single digit
    CallIDLen = Len(CallID) 'the CallID's length
    'Say that you want to get the value of "A1" cell and deal with its value
    Dim CellVal As String, CellLen As Long
    CellVal = CStr(Range("A1").Text) 'get its value as a string
    CellLen = Len(CellVal) 'get its length
    'You have 2 options:
    '1-The value is shorter than your CallID length. (Not applicable)
    '2-The value is longer than or equal to your CallID length
    'The loop below only runs for the 2nd option
    Dim pos As Long, num_patterns As Long
    num_patterns = 0
    'Imagine both of them as 2 arrays, every array consists of sequenced elements,
    'and your job is to take a sub-array from your value, of a length
    'equal to CallID's length, then compare your sub-array with CallID
    pos = 1
    Do While pos <= CellLen - CallIDLen + 1
        If Mid(CellVal, pos, CallIDLen) Like CallID Then
            num_patterns = num_patterns + 1
            pos = pos + CallIDLen 'continue after this match (no overlaps)
        Else
            pos = pos + 1
        End If
    Loop
    'Display your result
    MsgBox "Number of Patterns: " & Str(num_patterns)
End Sub

How to count up elements in excel

So I have a column called chemical formula for like 40,000 entries, and what I want to be able to do is count up how many elements are contained in the chemical formula. So for example:-
EXACT_MASS FORMULA
626.491026 C40H66O5
275.173274 C13H25NO5
For this, I need some kind of formula that will return with the result of
C H O
40 66 5
13 25 5
all as separate columns for the different elements and in rows for the different entries. Is there a formula that can do this?
You could make your own formula.
Open the VBA editor with ALT and F11 and insert a new module.
Add a reference to Microsoft VBScript Regular Expressions 5.5 by clicking Tools, then References (this is optional here, because the code below creates the object with CreateObject).
Now add the following code:
Public Function FormulaSplit(theFormula As String, theLetter As String) As String
    Dim RE As Object
    Set RE = CreateObject("VBScript.RegExp")
    With RE
        .Global = True
        .MultiLine = False
        .IgnoreCase = False
        .Pattern = "[A-Z]{1}[a-z]?"
    End With
    Dim Matches As Object
    Set Matches = RE.Execute(theFormula)
    Dim TheCollection As Collection
    Set TheCollection = New Collection
    Dim i As Integer
    Dim Match As Object
    For i = (Matches.Count - 1) To 0 Step -1
        Set Match = Matches.Item(i)
        TheCollection.Add Mid(theFormula, Match.FirstIndex + (Len(Match.Value) + 1)), UCase(Trim(Match.Value))
        theFormula = Left(theFormula, Match.FirstIndex)
    Next
    FormulaSplit = "Not found"
    On Error Resume Next
    FormulaSplit = TheCollection.Item(UCase(Trim(theLetter)))
    On Error GoTo 0
    If FormulaSplit = "" Then
        FormulaSplit = "1"
    End If
    Set RE = Nothing
    Set Matches = Nothing
    Set Match = Nothing
    Set TheCollection = Nothing
End Function
Usage:
FormulaSplit("C40H66O5", "H") would return 66.
FormulaSplit("C40H66O5", "O") would return 5.
FormulaSplit("C40H66O5", "blah") would return "Not found".
You can use this formula directly in your workbook.
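For comparison outside Excel, the same "element symbol followed by an optional count" idea can be sketched in Python. The function name element_counts is hypothetical. Two assumptions differ from the VBA above: repeated symbols are summed rather than stored once, and a missing element simply has no entry. Parentheses and other nested notation are not handled.

```python
import re


def element_counts(formula: str) -> dict:
    """Map each element symbol to its atom count; a symbol without a
    number counts as 1, and repeated symbols are summed."""
    counts = {}
    # An uppercase letter, an optional lowercase letter, then optional digits
    for symbol, digits in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(digits) if digits else 1)
    return counts


print(element_counts("C40H66O5"))
print(element_counts("C13H25NO5"))
```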
I've had a stab at doing this in a formula and came up with the following:
=IFERROR((MID($C18,FIND(D17,$C18)+1,2))*1,IFERROR((MID($C18,FIND(D17,$C18)+1,1))*1,IFERROR(IF(FIND(D17,$C18)>0,1),0)))
It's not very neat and would have to be expanded further if any of your elements can appear more than 99 times. I also used an arbitrary placement on my worksheet, so the titles H, C and O are in row 17. I would personally go with Jamie's answer, but I wanted to see whether it was possible in a formula and figured it was worth sharing as another perspective.
Even though this has an excellent (and accepted) VBA solution, I couldn't resist the challenge to do this without using VBA.
I posted a solution earlier, which wouldn't work in all cases. This new code should always work:
=MAX(
IFERROR(IF(FIND(C$1&ROW($1:$99),$B2),ROW($1:$99),0),0),
IFERROR(IF(FIND(C$1&CHAR(ROW($65:$90)),$B2&"Z"),1,0),0)
)
Enter as an array formula: Ctrl + Shift + Enter
The formula outputs 0 when not found, and I simply used conditional formatting to turn zeroes gray.
How it works
This part of the formula looks for the element, followed by a number between 1 and 99. If found, the number of atoms is returned. Otherwise, 0 is returned. The results are stored in an array:
IFERROR(IF(FIND(C$1&ROW($1:$99),$B2),ROW($1:$99),0),0)
In the case of C13H25NO5, a search for "C" returns this array:
{1,0,0,0,0,0,0,0,0,0,0,0,13,0,0,0,...,0}
1 is the first array element, because C1 is a match. 13 is the thirteenth array element, and that's what we're interested in.
The next part of the formula looks for the element, followed by an uppercase letter, which indicates a new element. (The letters A through Z are characters 65 through 90.) If found, the number 1 is returned. Otherwise, 0 is returned. The results are stored in an array:
IFERROR(IF(FIND(C$1&CHAR(ROW($65:$90)),$B2&"Z"),1,0),0)
"Z" is appended to the chemical formula, so that a match will be found when its last element has no number. (For example, "H2O".) There is no element "Z" in the Periodic Table, so this won't cause a problem.
In the case of C13H25NO5, a search for "N" returns this array:
{0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0}
1 is the 15th element in the array. That's because it found the letters "NO", and O is the 15th letter of the alphabet.
Taking the maximum value from each array gives us the number of atoms as desired.
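The two arrays and the final MAX can be mimicked in Python to follow the reasoning step by step. This is a sketch of the technique only, not a replacement for the array formula; atom_count is a made-up name. Python's `in` test is case-sensitive, like FIND.

```python
import string


def atom_count(formula: str, element: str) -> int:
    """Mimic the array formula: the largest of the 'element + number'
    hits and the 'element + uppercase letter' hits."""
    # First array: n where element followed by n (1..99) is found, else 0
    numbered = [n if f"{element}{n}" in formula else 0 for n in range(1, 100)]
    # Second array: 1 where element followed by an uppercase letter is found;
    # "Z" is appended so a trailing element without a number still matches
    bare = [
        1 if element + letter in formula + "Z" else 0
        for letter in string.ascii_uppercase
    ]
    return max(numbered + bare)


for symbol in ("C", "H", "N", "O", "S"):
    print(symbol, atom_count("C13H25NO5", symbol))
```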

Deleting variable number of leading characters from a variable-length string

I have G4ED7883666 and I want the output to be 7883666.
I have to apply this to a range of cells that are not all the same length. The only common thing is that I have to delete everything before the number that follows the last letter.
This formula finds the last number in a string, that is, all digits to the right of the last alpha character in the string.
=RIGHT(A1,MATCH(99,IFERROR(1*MID(A1,LEN(A1)+1-ROW($1:$25),1),99),0)-1)
Note that this is an array formula and must be entered with the Control-Shift-Enter keyboard combination.
How the formula works
Let's assume that the target string is fairly simple: "G4E78"
Working outward from the middle of the formula, the first thing to do is create an array with the elements 1 through 25. (Although this might seem to limit the formula to strings with no more than 25 characters, it actually places a limit of 25 digits on the size of the number that may be extracted by the formula.)
ROW($1:$25) = {1;2;3;4;5;6;7; etc.}
Subtracting this array from (1 + the length of the target string) produces a new array, the elements of which count down from the length of the string. The first five elements will correspond to the positions of the characters of the string - in reverse order!
LEN(A1)+1-ROW($1:$25) = {5;4;3;2;1;0;-1;-2;-3;-4; etc.}
The MID function then creates a new array that reverses the order of the characters of the string.
For example, the first element of the new array is the result of MID(A1, 5, 1), the second of MID(A1, 4, 1) and so on. The #VALUE! errors reflect the fact that MID cannot evaluate 0 or negative values as the position of a string, e.g., MID(A1,0,1) = #VALUE!.
MID(A1,LEN(A1)+1-ROW($1:$25),1) = {"8";"7";"E";"4";"G";#VALUE!;#VALUE!; etc.}
Multiplying the elements of the array by 1 converts the digit characters to numbers and turns the non-digit characters of that array into #VALUE! errors as well.
1*MID(A1,LEN(A1)+1-ROW($1:$25),1) = {8;7;#VALUE!;4;#VALUE!;#VALUE!;#VALUE!; etc.}
And the IFERROR function turns the #VALUE! errors into 99, which is just an arbitrary number greater than the value of a single digit.
IFERROR(1*MID(A1,LEN(A1)+1-ROW($1:$25),1),99) = {8;7;99;4;99;99;99; etc.}
Matching on the 99 gives the position of the first non-digit character counting from the right end of the string. In this case, "E" is the first non-digit in the reversed string "87E4G", at position 3. This is equivalent to saying that the number we are looking for at the end of the string, plus the "E", is 3 characters long.
MATCH(99,IFERROR(1*MID(A1,LEN(A1)+1-ROW($1:$25),1),99),0) = 3
So, for the final step, we take 3 - 1 (for the "E") characters from the right of the string.
RIGHT(A1,MATCH(99,IFERROR(1*MID(A1,LEN(A1)+1-ROW($1:$25),1),99),0)-1) = "78"
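The walkthrough above boils down to: scan from the right, stop at the first non-digit, and keep what was scanned. A minimal Python sketch of that idea follows; trailing_number is a hypothetical name, and the sketch has no 25-digit limit because it does not rely on a fixed-size array.

```python
def trailing_number(text: str) -> str:
    """Return the run of digits at the end of text (possibly empty)."""
    # offset is the 1-based position counted from the right, like MATCH
    for offset, ch in enumerate(reversed(text), start=1):
        if ch not in "0123456789":
            # Keep offset - 1 characters from the right, like RIGHT(A1, n - 1)
            return text[len(text) - (offset - 1):]
    # Every character was a digit
    return text


print(trailing_number("G4E78"))
print(trailing_number("G4ED7883666"))
```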
One more submission for you to consider. This VBA function returns the rightmost digits, that is, everything after the last non-numeric character:
Public Function GetRightNumbers(str As String) As String
    Dim i As Long
    ' Stop at 1: Mid raises an error for a start position of 0
    For i = Len(str) To 1 Step -1
        If Not IsNumeric(Mid(str, i, 1)) Then
            Exit For
        End If
    Next i
    ' i is 0 here when every character was numeric
    GetRightNumbers = Mid(str, i + 1)
End Function
You can write some VBA to format the data (just start at the end and work back until you hit a non-number).
Or, if you're happy to get an add-in like Excelicious, you can use regular expressions to format the text via a formula. An expression like [0-9]+$ would return all the numbers at the end of a string, IIRC.
NOTE: This uses the regex pattern in James Snell's answer, so please upvote his answer if you find this useful.
Your best bet is to use a regular expression. You need to set a reference to VBScript Regular Expressions for this to work. Tools --> References...
Now you can use regex in your VBA.
This will find the numbers at the end of each cell. I am placing the result next to the original so that you can verify it is working the way you want. You can modify it to replace the cell as soon as you feel comfortable with it. The code works regardless of the length of the string you are evaluating, and will skip the cell if it doesn't find a match.
Sub GetTrailingNumbers()
    Dim ws As Worksheet
    Dim rng As Range
    Dim cell As Range
    Dim result As Object, results As Object
    Dim regEx As New VBScript_RegExp_55.RegExp
    Set ws = ThisWorkbook.Sheets("Sheet1")
    ' range is hard-coded here, but you can define
    ' it programmatically based on the shape of your data
    Set rng = ws.Range("A1:A3")
    ' pattern from James Snell's answer
    regEx.Pattern = "[0-9]+$"
    For Each cell In rng
        If regEx.Test(cell.Value) Then
            Set results = regEx.Execute(cell.Value)
            For Each result In results
                cell.Offset(, 1).Value = result.Value
            Next result
        End If
    Next cell
End Sub
Takes the last 4 characters of num:
num1 = Right(num, 4)
Takes the first 5 characters of num:
num1 = Left(num, 5)
First takes the first ten characters from the left, then takes the last four of those:
num1 = Right(Left(num, 10), 4)
In your case (this only works when the number is always 7 digits long):
num = "G4ED7883666"
num1 = Right(num, 7)

Excel 2010 VBA step through a string and place one char into each cell in sequence

I am used to string slicing in 'C' many, many years ago but I am trying to work with VBA for this specific task.
Right now I have created a string "this is a string" and created a new workbook.
What I need now is to use string slicing to put 't' in, say, A1, 'h' in A2, 'i' in A3 etc. to the end of the string.
After which my next string will go in, say B1 etc. until all strings are sliced.
I have searched but it seems most people want to do it the other way around (concatenating a range).
Any thoughts?
Use the mid function.
=MID($A$1,1,1)
The second argument is the start position, so you could replace it with something like the ROW or COLUMN function and fill the formula down or across dynamically.
i.e.
=MID($A$1,ROW(),1)
If you wanted to do it purely in VBA, I believe the mid function exists in there too, so just loop through the string.
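' SliceString is an arbitrary name chosen for this example
Sub SliceString()
    Dim str As String, i As Long
    str = Sheet1.Cells(1, 1).Value
    For i = 1 To Len(str)
        'output string 1 character at a time in column C
        Sheet1.Cells(i, 3).Value = Mid(str, i, 1)
    Next i
End Sub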
* edit *
If you want to do this with multiple strings from an array, you could use something like:
' SliceStrings is an arbitrary name chosen for this example
Sub SliceStrings()
    Dim str(1 To 2) As String, i As Long, j As Long
    str(1) = "This is a test string"
    str(2) = "Some more test text"
    For j = LBound(str) To UBound(str)
        For i = 1 To Len(str(j))
            'output strings 1 character at a time in columns A and B
            Sheet1.Cells(i, j).Value = Mid(str(j), i, 1)
        Next i
    Next j
End Sub
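To see which character ends up in which cell without opening Excel, the nested loop can be sketched in Python as a mapping from (row, column) to character. The name slice_into_columns is invented for this illustration; rows and columns are 1-based to match Cells(i, j).

```python
def slice_into_columns(strings):
    """Place string j in column j, one character per row, as Cells(i, j) does."""
    cells = {}
    for col, text in enumerate(strings, start=1):
        for row, ch in enumerate(text, start=1):
            cells[(row, col)] = ch
    return cells


grid = slice_into_columns(["This is a test string", "Some more test text"])
print(grid[(1, 1)], grid[(2, 1)], grid[(1, 2)])
```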
