How to make a partial match between two strings with misspelled characters - excel

I have a list of 12-character alphanumeric codes that I need to match against a list of entries in which the codes might be misspelled.
For example, if the exact code is "K4I3T9OTG9GZ" the entry I have to check might be "K413T90TGS" (1 instead of capital I, 0 instead of capital O, S instead of Z).
I need to do a partial match to be able to find the right code.
Any ideas?
I already tried VLOOKUP with wildcards, which worked for most entries with at least five consecutive correct characters, but I still have a couple of hundred entries with no match.

Maybe this will help (array formula - Ctrl+Shift+Enter):
=SUMPRODUCT(--ISNUMBER(MATCH(MID($B$2,ROW($A$1:$A$12),1),MID($B$1,ROW($A$1:$A$12),1),0)))
The formula takes each character of the entry in B2, one by one, and checks whether it occurs in the "original"/"exact" code in B1 (MATCH searches the whole code, not only the same position). In your example the result would be 7, as seven characters are matched:
K 4 - 3 T 9 - T G -
{1;1;0;1;1;1;0;1;1;0;0;0}
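For anyone who wants to prototype the same scoring outside Excel, here is a minimal Python sketch of what the formula computes. It is only an illustration: the function names are made up, and `best_match` (picking the highest-scoring code from a list) is an assumption about how the score might be used, not part of the formula above.

```python
def match_score(entry: str, code: str) -> int:
    """Count the characters of `entry` that occur anywhere in `code`."""
    # Excel's MATCH is case-insensitive, so compare upper-cased text.
    code_chars = set(code.upper())
    return sum(1 for ch in entry.upper() if ch in code_chars)


def best_match(entry: str, codes: list[str]) -> str:
    """Hypothetical helper: return the candidate code with the highest score."""
    return max(codes, key=lambda code: match_score(entry, code))


print(match_score("K413T90TGS", "K4I3T9OTG9GZ"))  # 7
```

A score like this only ranks candidates; two different codes can share the same characters, so ties and near-ties probably still need a manual check.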

Related

How to trim a prefix in Google Sheets using various conditions

I have data as follows in Excel/Google Sheets.
Numbers that have a length of 19 characters need to be manipulated in this way:
For all strings with a length of 19, the last 6 digits need to be trimmed (I can easily do it),
and the leading prefix, which is either 200 or 20000, needs to be removed.
For example:
2005507187528000001 to 5507187528
2000017303364000001 to 17303364
I have no idea how to remove the prefix. I tried trimming the last 14 digits to get 20000 or 20055 and using that to determine whether I need to take out the first 3 or first 6, but with no success.
Please help! Thanks.
If I understood your question correctly, you want to remove the leading prefix, whether it is 200 or 20000.
Try:
=IF(LEFT(A2,5)="20000",RIGHT(A2,LEN(A2)-5),RIGHT(A2,LEN(A2)-3))
Drag the formula down the column.
Explanation:
Using the LEFT() function, you can extract the first 5 characters. An IF() then checks whether they are equal to 20000. A combination of RIGHT() and LEN() removes the first N characters: if the first 5 characters are equal to 20000, remove the first 5 characters; if not, remove the first 3 characters.
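The same branching can be sketched in Python for checking the logic outside the spreadsheet. This is a rough illustration, not part of the answer: the function names are invented, and `clean` adds the "trim the last 6 digits of 19-character values" step from the question as an assumption about how the two steps would be combined.

```python
def strip_prefix(value: str) -> str:
    """Remove a leading 20000 (5 characters), otherwise the leading 3 characters."""
    if value.startswith("20000"):
        return value[5:]
    return value[3:]


def clean(value: str) -> str:
    """Assumed full pipeline: only 19-character values are touched."""
    if len(value) != 19:
        return value
    # Drop the last 6 digits first, then the 200 / 20000 prefix.
    return strip_prefix(value[:-6])


print(clean("2005507187528000001"))  # 5507187528
print(clean("2000017303364000001"))  # 17303364
```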
Using an ArrayFormula:
=ARRAYFORMULA(IF(A2:A="","",IF(LEFT(A2:A,5)="20000",RIGHT(A2:A,LEN(A2:A)-5),RIGHT(A2:A,LEN(A2:A)-3))))
Here's a way using arrayformula so you don't have to drag down/copy to cells below. This of course still needs to be adjusted to your range.
Note: I have not included the formula to remove the last 6 characters since, according to you, you already have this, so you can just add this formula to yours.
try:
=INDEX(IFERROR(REGEXEXTRACT(A2:A&""; ".{6}$")))
update:
=REGEXEXTRACT(F905&""; "^20+(\d.*)\d{6}")
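To see what the second pattern captures, here is a small Python sketch using the same regular expression. It is an illustration only: `extract_core` is a made-up name, and Python's `re` engine is a different engine from the one Google Sheets uses, although this greedy pattern should behave the same way on inputs like the ones in the question.

```python
import re

# ^20+      a leading 2 followed by one or more zeros (the 200 / 20000 prefix)
# (\d.*)    the part to keep, starting at the first digit after the prefix
# \d{6}     the last six digits, which are left out of the capture
PATTERN = re.compile(r"^20+(\d.*)\d{6}")


def extract_core(value: str):
    """Return the captured middle part, or None when the pattern does not match."""
    match = PATTERN.search(value)
    return match.group(1) if match else None


print(extract_core("2005507187528000001"))  # 5507187528
print(extract_core("2000017303364000001"))  # 17303364
```

Because `20+` consumes every zero that directly follows the 2, a value whose real number starts with a zero after the prefix would lose that zero.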

Removing first two digits in Excel if the character length is greater than a certain number

I have cell phone numbers in Excel, some with the country code 91 and some without it. I need to remove the country code. We have 10-digit phone numbers, so I need to remove the first two digits if the character length of the cell is greater than 10. For example, if I have a number with a country code like 917465785236, I need to remove the first two digits (91) so that I only have 7465785236. I am trying the formula below, but it doesn't check the IF condition and removes the first two digits from all the cells. Can someone tell me what I am doing wrong here?
=IF((LEN(A1>10)),RIGHT(A1, LEN(A1)-2))
You probably need to put the parentheses differently for the Len function:
=IF((LEN(A1)>10),RIGHT(A1, LEN(A1)-2))
You're not using the parentheses properly. Also, since you strictly want to have 10 characters, you don't need to calculate the length in the RIGHT formula. It needs to be like this:
=IF(LEN(A1)>10,RIGHT(A1, LEN(A1)-2),A1)
Now, that is the issue with your formula, but the solution to your question doesn't even need an IF statement. You can simply use:
=RIGHT(A1,10)
It will automatically take the 10 characters at the end and drop the rest.
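For comparison, both variants can be sketched in a few lines of Python. The function names are hypothetical and the numbers are treated as text, which is an assumption; the two functions agree as long as every input has at least 10 characters.

```python
def strip_country_code(number: str) -> str:
    """Mirror of the IF version: drop the first two characters only when longer than 10."""
    return number[2:] if len(number) > 10 else number


def last_ten(number: str) -> str:
    """Mirror of RIGHT(A1,10): keep the last 10 characters, whatever precedes them."""
    return number[-10:]


print(strip_country_code("917465785236"))  # 7465785236
print(last_ten("917465785236"))            # 7465785236
print(last_ten("7465785236"))              # 7465785236
```

The difference shows up with longer prefixes: `last_ten` also removes a three-digit country code, while `strip_country_code` always removes exactly two characters.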

How to find the last 2 letters in an alphanumeric string?

I have a column of alphanumeric addresses with no punctuation. In all cells, the state is the last 2 letters (abbreviation), followed by the ZIP. However, sometimes the ZIP is 5 digits and sometimes it's xxxxx-xxxx, so MID() and RIGHT() won't work. Can anyone think of a formula that will work?
If those are the only two options: ##### and #####-#### then:
=MID(A1,LEN(A1)-IF(ISNUMBER(--MID(A1,LEN(A1)-5,1)),11,6),2)
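The index arithmetic in that formula is easier to follow when written out step by step. Below is a minimal Python sketch of the same logic, with an invented function name and made-up sample addresses. Like the formula, it assumes the state abbreviation sits immediately before the ZIP with no space in between, and that the ZIP is either ##### or #####-####.

```python
def state_code(address: str) -> str:
    """Return the two letters that directly precede a ##### or #####-#### ZIP."""
    # The 6th character from the end is a digit only for the ZIP+4 form
    # (it is the last digit of the first five); for a plain 5-digit ZIP
    # it is the second letter of the state.
    offset = 11 if address[-6].isdigit() else 6
    # Excel's MID is 1-based, Python slices are 0-based, hence the extra -1.
    start = len(address) - offset - 1
    return address[start:start + 2]


print(state_code("100 MAIN ST ALBANY NY12345"))       # NY
print(state_code("100 MAIN ST ALBANY NY12345-6789"))  # NY
```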

Nesting Excel formulas to extract e-mail address top-level domain

I want to extract the top-level domain from e-mail addresses using Excel formulas.
I tried it first by concatenating RIGHT(..) formulas and splitting at the dot. Sadly, I do not know how to do this recursively with Excel formulas, so I switched to deleting all characters except the last 4. Now the problem is that when I split my formulas into single cells, it works perfectly fine. If I try to use them together, I get only the output of the first inner formula. How do I fix this?
=RIGHT(B8; LEN(B8)-(LEN(B8)-4))
=RIGHT(BF8;LEN(BF8)-FIND(".";BF8))
These are the formulas split into single cells. And here are both together:
=RIGHT(RIGHT(B8; LEN(B8)-(LEN(B8)-4));LEN(B8)-FIND(".";B8))
I get the same return value as in the first row from this formula
=RIGHT(B8; LEN(B8)-(LEN(B8)-4))
This =RIGHT(B8; LEN(B8)-(LEN(B8)-4)) is just a uselessly complicated version of =RIGHT(B8; 4).
Substituting this for BF8 in
=RIGHT(BF8;LEN(BF8)-FIND(".";BF8))
yields this
=RIGHT(RIGHT(B8; 4);LEN(RIGHT(B8; 4))-FIND(".";RIGHT(B8; 4)))
which can be simplified as
=RIGHT(RIGHT(B8; 4);4-FIND(".";RIGHT(B8; 4)))
So that's the answer to your question.
But note that this will fail when parsing e-mail addresses whose top-level domain name has more than 3 characters! So it won't work for e.g. test@test.info. Note that top-level domains can be up to 63 characters long!
In this earlier answer, I give a more general solution to this problem, not limited to searching a predetermined number of characters from the right.
=MID(B8;FIND(CHAR(1);SUBSTITUTE(B8;".";CHAR(1);LEN(B8)-LEN(SUBSTITUTE(B8;".";""))))+1;LEN(B8))
returns everything after the last . in the string.
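The "everything after the last dot" idea can be sketched in Python as a quick cross-check. The function name is hypothetical, and the behavior for a string without any dot (returning it unchanged) is a choice made for this sketch, not something the formula does.

```python
def after_last_dot(text: str) -> str:
    """Return everything after the last '.' in `text`."""
    # rsplit with maxsplit=1 splits once, starting from the right.
    return text.rsplit(".", 1)[-1]


print(after_last_dot("test.example.info"))  # info
print(after_last_dot("mail.example.com"))   # com
```

Unlike the fixed four-character approach, this does not depend on the length of the top-level domain.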
A dot may also appear in the left part of the e-mail address, like: john.johnson@email.com
So you can't just find "."; you first need to find @, then find the dot in the substring to its right.
These are your steps:
1. =FIND("@"; B8)
find the position of the @ character
2. =RIGHT(B8;LEN(B8) - FIND("@"; B8))
get the substring to the right of @
3. =FIND(".";RIGHT(B8;LEN(B8) - FIND("@"; B8)))
find "." in the step 2 substring
4. =RIGHT(RIGHT(B8;LEN(B8) - FIND("@"; B8)); LEN(RIGHT(B8;LEN(B8) - FIND("@"; B8))) - FIND(".";RIGHT(B8;LEN(B8) - FIND("@"; B8))))
get RIGHT(step2; LEN(step2) - step3)
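The four steps above can be sketched in Python as follows. This is only an illustration with an invented function name; the separator between the local part and the domain is passed as a parameter (`sep`, an assumption of this sketch) so the same code can be tried with whatever character the data actually contains.

```python
def domain_suffix(address: str, sep: str = "@") -> str:
    """Return what follows the first dot after the separator."""
    # Steps 1 and 2: keep only the part to the right of the separator.
    _, _, domain = address.partition(sep)
    # Steps 3 and 4: keep only the part to the right of the first dot in it.
    _, _, suffix = domain.partition(".")
    return suffix


print(domain_suffix("john.johnson@email.com"))  # com
print(domain_suffix("someone@mail.co.uk"))      # co.uk
```

As the second call shows, splitting at the first dot after the separator returns "co.uk" for a multi-level domain, so this approach and the "last dot" approach can give different answers.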

Replace string with special character in Excel

I want to partially mask names in Excel after concatenating:
A1: David Goliath
B1 (output): Dav*******ath
Please help. I need the first three and last three characters shown and the rest replaced by a special character. Since this formula will be applied to a long list, the length of the names will vary.
Formula
=LEFT(A1,3)&REPT("*", LEN(A1)-6)&RIGHT(A1,3)
How it works
This formula relies on string manipulation to grab the first 3 characters, last 3 characters, and a string of * in the middle. This assumes that the entries are at least 6 characters long. If you want it to work for less than 6, you would need to decide how to hide the middle.
The only real trick is knowing that the number of * you need is 6 less than the length of the string since you are taking 3 characters from the front and back.
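The same string manipulation can be sketched in Python, which also makes the short-name case explicit. The function and parameter names are hypothetical, and returning names shorter than 6 characters unchanged is an arbitrary choice for this sketch; the answer leaves that decision open.

```python
def mask_name(name: str, keep: int = 3, mask_char: str = "*") -> str:
    """Show the first and last `keep` characters and mask everything in between."""
    if len(name) < 2 * keep:
        # Arbitrary choice for this sketch: too short to mask, return unchanged.
        return name
    hidden = len(name) - 2 * keep  # 6 fewer than the length when keep is 3
    return name[:keep] + mask_char * hidden + name[-keep:]


print(mask_name("David Goliath"))  # Dav*******ath
```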
