Finding Last Name while ignoring Suffixes - excel

I have a field that has first and last names. Some names include a middle initial, some names include a suffix.
I am trying to find a formula that only pulls the last name regardless of which format it is in.
Example format
Donald P Bellisario --> Bellisario
Dale Earnhardt Jr --> Earnhardt
Jimmy M Butler III--> Butler
Kanye E West--> West
Joseph Biden--> Biden
Formula 1: =TRIM(RIGHT(SUBSTITUTE(AS9," ",REPT(" ",LEN(AS9))),LEN(AS9)))
Formula 2:=RIGHT(AS9,LEN(AS9)-FIND("*",SUBSTITUTE(AS9," ","*",LEN(AS9)-LEN(SUBSTITUTE(AS9," ","")))))
Formula 1 and 2 do not ignore suffixes and will list those if existent Jack Smith Jr--> Jr
Formula 3: =SUBSTITUTE(TRIM(RIGHT(SUBSTITUTE(TRIM(SUBSTITUTE($AS9,IFERROR(RIGHT($AS9,LEN(AS9)-FIND(" ",$AS9)-10),""),""))," ",REPT(" ","99")),99)),",","")
Formula 3 will only include 10 characters after the end of the first name without displaying the middle initial. E.G(Heisenberger--> Heisenberg)

Truth is, working with names can be subject to various edge-cases that will prove a working solution wrong at some point. But for those samples shown I'd use FILTERXML() to "split" these input strings on the spaces and use xpath expressions to filter out those substrings:
Formula in B1:
=FILTERXML("<t><s>"&SUBSTITUTE(A1," ","</s><s>")&"</s></t>","//s[translate(., 'ABCDEFGHIJKLMNOPQRSTUVWXYZ.', '')!=''][translate(., 'aeiouAEIOU', '')!=.][last()]")
The trick here is that there are three coherent xpath expressions working together:
[translate(., 'ABCDEFGHIJKLMNOPQRSTUVWXYZ.', '')!=''] - Assert that node is not nothing when all uppercase and dots have been substituted with nothing.
[translate(., 'aeiouAEIOU', '')!=.] - Assert that node is not equal to its original node when all vowels (upper- and lowercase) have been substituted with nothing.
[last()] - The last() function returns an integer equal to the context size from the expression evaluation context, and thus it will return the last node that compiled testing against previous expressions.
I'd guess that depending on possible edge-cases you could add more rules to the equation. For a more comprehensive insight on these expressions you could have a look here.
Good luck.

Related

Excel: How to find six different combinations of words in string?

I have been working for several days on this and have researched everything looking for this answer. I'd appreciate any help you can give.
In Excel I am searching a string of text in column A:
Bought 1 HD Sep 3 2021 325.0 Call # 2.75
I am detecting the first word (in this case "Bought") and detecting the last word before "#" symbol (in this case "Call").
I am then detecting the price following the "#" symbol (in this case "2.75"). This number will go into column B (header "Open") or column C (header "Close") depending on the combination of words found:
Sold/Put=Close
Sold/Call=Open
Bought/Put=Open
Bought/Call=Close
Sold (by itself)=Open
Sold (by itself)=Close.
Bought 1 HD Sep 3 2021 325.0 Call # 2.75
The combination found in the above string is: "Bought Call". Therefore the number at the end ("2.75"), goes into "Open" column.
Here's another example:
Sold 4 AI Sep 17 2021 50.0 Put # 1.5
The combination found in the above string is: "Sold Put". Therefore the number at the end ("1.5") goes into "Close" column.
I am currently using this formula to determine if the string contains "Sold" and "Call" and get the desired number and it does work:
=IF(AND(
ISNUMBER(SEARCH({"Sold","Call"},A10))),
TRIM(MID(A10,SEARCH("#",A10)+LEN("#"),255))," ")
But, I don't know how to search for all the other possible combinations.
The point behind this is to be able to paste the transaction from the broker and have most of the entry process automated. I'm sure many will benefit from this as I've not found anything like this.
I'd appreciate any help and if possible, an explanation of the formula so I can better learn.
Thanks!
I think you have the right idea, but would just extend the IF statement.
Something like the below might work for you:
=IF(ISNUMBER(SEARCH("Call", $A1)),
IF(ISNUMBER(SEARCH({"Bought","Sold"}, $A1)),
NUMBERVALUE(RIGHT($A1, LEN($A1)-SEARCH("#", $A1))),""),
IF(ISNUMBER(SEARCH({"!!!","!!!","Bought","Sold"}, $A1)),
NUMBERVALUE(RIGHT($A1, LEN($A1)-SEARCH("#", $A1))),""))
Just enter in column B and drag down; columns B through E should fill as needed.
For example:
Note that the search for "!!!" is just random characters, it can be anything that you don't think has a good chance of appearing in the string.
Here/screenshots refer:
(requires Office 365 compatible version Excel)
Main lookup
=LET(fn_1,MATCH("*"&$H$7:$H$12&"*",B4,0),fn_2,MATCH("*"&$I$7:$I$12&"*",B4,0),IFERROR(INDEX($J$7:$J$12,MATCH(1,IF($I$7:$I$12="",fn_1*ISNUMBER(fn_2),fn_1*fn_2),0)),))
EDIT:
Other Excel versions:
=IFERROR(INDEX($J$7:$J$12,MATCH(1,IF($I$7:$I$12="",MATCH("*"&$H$7:$H$12&"*",B4,0)*ISNUMBER(MATCH("*"&$I$7:$I$12&"*",B4,0)),MATCH("*"&$H$7:$H$12&"*",B4,0)*MATCH("*"&$I$7:$I$12&"*",B4,0)),0)),)
(all that falls away is the 'Let' formula, replacing fn_1 and fn_2 with respective functions in index formula within the let making first equation somewhat longer, but otherwise identical)
Example applications
Have provided 2 examples of how one might customize to insert numeric in one of the columns (the key part to this question is really how to do lookup in first instance, from thereon it's a matter of finetuning/taking appropriate action)...
Assuming calls/buys are "long" position and strike price go in first col (here, D), and puts/sales are "short" position with strike price going in 2nd col (here, E):
Long - insert strike price col D
=IF(LET(fn_1,MATCH("*"&$H$7:$H$12&"*",B4,0),fn_2,MATCH("*"&$I$7:$I$12&"*",B4,0),IFERROR(INDEX($K$7:$K$12,MATCH(1,IF($I$7:$I$12="",fn_1*ISNUMBER(fn_2),fn_1*fn_2),0)),))=1,MID(SUBSTITUTE(B4," ",""),SEARCH("#",SUBSTITUTE(B4," ",""))+1,LEN(SUBSTITUTE(B4," ",""))),"")
EDIT
Other Excel versions:
=IF(IFERROR(INDEX($K$7:$K$12,MATCH(1,IF($I$7:$I$12="",MATCH("*"&$H$7:$H$12&"*",B4,0)*ISNUMBER(MATCH("*"&$I$7:$I$12&"*",B4,0)),MATCH("*"&$H$7:$H$12&"*",B4,0)*MATCH("*"&$I$7:$I$12&"*",B4,0)),0)),)=1,MID(SUBSTITUTE(B4," ",""),SEARCH("#",SUBSTITUTE(B4," ",""))+1,LEN(SUBSTITUTE(B4," ",""))),"")
Short - insert strike price col E
=IF(LET(fn_1,MATCH("*"&$H$7:$H$12&"*",B4,0),fn_2,MATCH("*"&$I$7:$I$12&"*",B4,0),IFERROR(INDEX($K$7:$K$12,MATCH(1,IF($I$7:$I$12="",fn_1*ISNUMBER(fn_2),fn_1*fn_2),0)),))=2,MID(SUBSTITUTE(B4," ",""),SEARCH("#",SUBSTITUTE(B4," ",""))+1,LEN(SUBSTITUTE(B4," ",""))),"")
EDIT
Other Excel versions:
Follow same routine in previous Edits (remove Let, replace fn_1 & fn_2 with respective formulae...)
Note similarity in all 3 equations above: 2nd and 3rd contain 1st (effectively they just wrap a big old 'if' statement around 1st, use lookup_2 col (here, col K), and use mid/search to extract rate after the hashtag.
Assumes you don't have other hashtags in the sentence..
Customize as required.

Excel COUNTIF match variations of target: LET solution?

This is a slightly more complicated issue than a simple =COUNTIF(rng,"*"&value&"*"), as found here.
I have a 2D array with cells containing data such as:
abc
def
abc def
ghi
abc,def,ghi
abcdef
ghi; def
..... and several other variations of this. I am trying to count exact matches of "abc", but I want the count to be inclusive of cells containing "abc def" and other like variations, however I can't just use the above simple COUNTIF formula since "abcdef" is not an acceptable match. The target string must stand alone or be separated from other text by an acceptable character in chars.
I think I've got this one 90% done, but the bit I need help with is combining all the possible acceptable variations of a target "name" into a flat range that I can then check my data source against for the COUNTIF. I've tried INDEX(r_1:r_8,idxRow,idxCol) and other familiar solutions that work on the sheet when referencing other ranges, but I'm new to using the =LET function. All of this works well when broken out into separate components on my spreadsheet, but I'm looking for a cleaner solution with =LET. See below for current formula:
=LET(rg, DataTable[[Q14_1]:[Q14_9]],
name, AU38,
chars, {" ",",",";"},
r, 8,
r_1, CONCATENATE(name,chars),
r_2, CONCATENATE(chars,name),
r_3, CONCATENATE(chars,name,chars),
r_4, CONCATENATE(name,chars,"*"),
r_5, CONCATENATE("*",chars,name),
r_6, CONCATENATE(chars,name,chars,"*"),
r_7, CONCATENATE("*",chars,name,chars),
r_8, CONCATENATE("*",chars,name,chars,"*"),
c, COUNTA(chars),
mSeq, SEQUENCE(r*c),
idxRow, 1+MOD(mSeq,r),
idxCol, INT((SEQUENCE(r*c)-1)/r)+1,
X, INDEX(**NeedHelpHere**,idxRow,idxCol),
SUM(COUNTIF(rg,name),COUNTIF(rg,X))
)
Give a try on below formula. If you have more delimiter like space, comma & others then you need to use more SUBSTITUTE() function.
=LET(x,FILTERXML("<t><s>"&SUBSTITUTE(SUBSTITUTE(A1:A7," ","</s><s>"),",","</s><s>")&"</s></t>","//s"),y,FILTER(x,x="abc"),SUM(--(y<>"")))
To learn about FILTERXML() please read this article from JvdV.
I've thought about this again and am posting a solution that fits my needs.
I don't need to index a single column of potential matches to then COUNTIF, I can just COUNTIF multiple times. Additionally, I was not taking into account different combinations of chars, I was only searching for the same chars on either side of the target (e.g. ",abc," when I should have also been looking for ",abc;"). Transposing the chars array on one side is a simple way of fixing this. It also turns out that "*"&target&"*" searches for "*target*" AND "target" (duh!), so I simplified further, removing duplicative possibilities.
My final formula is below, which counts the number of times target (by itself or surrounded by any acceptable chars) is present in a given rng:
=LET(rng, DataTable[Q14_1]:[Q14_9]]),
name, $A6,
chars, {" " , "," , ";"},
r_1, CONCATENATE(name,chars,"*"),
r_2, CONCATENATE("*",chars,name),
r_3, CONCATENATE("*",chars,name,TRANSPOSE(chars),"*"),
SUM(COUNTIF(rng,name),COUNTIF(rng,r_1),COUNTIF(rng,r_2),COUNTIF(rng,r_3))
)

Find specific characters and return the next value in the cell using Excel Formula

I am not sure where to begin with the formula as I have gotten myself so confused with everything. I have a cell the contains "PON " or "PON: " or "PON = " then the actual PON (Example: PON 123467) I want to formula to return 123467 in the cell.
Examples What I want returned
I have PON 123467 for shoes 123467
I have PON: 234567-AB for food 234567-AB
I have PON - 569874-Weird for accessories 569874-Weird
I have PON = DOG-564-987 for dog food DOG-564-987
I am currently using Excel 365
Filterxml() will give you best companion here in this case. Try-
=FILTERXML("<t><s>"&SUBSTITUTE(FILTERXML("<t><s>"&SUBSTITUTE(A1," for","</s><s>")&"</s></t>","//s[1]")," ","</s><s>")&"</s></t>","//s[last()]")
Using FILTERXML, and testing for a substring following PON, you can try:
=FILTERXML("<t><s>"&SUBSTITUTE(TRIM(A1)," ","</s><s>") & "</s></t>","//s[contains(.,'PON')]/following-sibling::*[string-length(.)>2][1]")
Note that FILTERXML solution will cause a PON that is solely numeric, but with a leading zero, to drop the leading zero. Unfortunately, the xPath implementation in that function does not include the string() function
If dropping the leading zero might be a problem, you can add a character to the node that will force the number to be seen as a string. In the modified formula below, I use the unicode zero-width space, but there are others you can use. Note that this will count as a character for the string=length function, so be sure to maintain the >2 parameter:
=FILTERXML("<t><s>"&SUBSTITUTE(TRIM(A1)," ","</s><s>"&UNICHAR(8203)) & "</s></t>","//s[contains(.,'PON')]/following-sibling::*[string-length(.)>2][1]")
Because of the variablity in your data, that sometimes there are extraneous space-separated substrings between PON and your desired extract, the xpath:
locates the substring PON
returns all subsequent siblings that have a string-length of more than two (adjust if necessary)
returns the first sibling that meets that criterion.
You might try this formula.
=TRIM(LEFT(MID(A2,FIND(#{1,2,3,4,5,6,7,8,9},A2),100),FIND(" ",MID(A2,FIND(#{1,2,3,4,5,6,7,8,9},A2),100))))
It extracts the text between the first number and the first space following that number. The size of that extract is limited to 100 characters.

Pulling out specific characters that are combined in a single column

I have a data extract that resulted in fields being combined into column A like this:
Sales Figures Report pg 121
Walmart Inc. 001230134 99 Associates Parkway 56.12 20.00 10.00 86.12 00 1
1400.25 262.40 14.50 1677.15 02 9
50.00 100.25 10.00 160.25 00 1
1400.25 262.40 14.50 1677.15 02 9
There are over 50,000 rows in this sort of format, some are a little different as they'll start with vendor information and then have those values after (still all in col. A). In the above example 1677.15 is the combined value of the three numbers before that, i.e. the total amount due.
Originally I wanted to basically separate out each value using things like left(), mid(), right() etc. however at this point all I want is the total figure, i.e. the $1677.15 in the above example. What is the best non-vba method of doing this?
Two issues:
The problem is that the amounts are not always the same number of digits (can range from $xx.xx to $xxx,xxx.xx)
Since There are multiple "." you can't use search() to find the correct character location.
In your example, the Total figure is always the third group from the end. That being the case, you can use the following formula and defined names:
=IFERROR(--INDEX(TRIM(MID(SUBSTITUTE(A1," ",REPT(" ",99)),seq_99,99)),LEN(A1)-LEN(SUBSTITUTE(A1," ",""))-1),"")
Defined Names
seq refers to: =ROW(INDEX($1:$65535,1,1):INDEX($1:$65535,255,1))
seq_99 refers to: =IF(seq=1,1,(seq-1)*99)
In the formula,
SUBSTITUTE: Replace all spaces with a large number (99) of spaces
MID Return an array consisting of each word (space separated)
TRIM the words to remove the extra spaces
INDEX Return the item that is third from the end.
IFERROR handle lines such as your first, which does not contain the relevant pattern.
seq and seq_99 return arrays of {1,2,3,...}and{1,99,198,...}` to be used in the formulas

Remove all text and characters except some

I have here some text strings
"16cg-301 -request","16cg-3368 - for review","16cg-3684 - for process"
what i would like to do is to remove all the text and characters except the number and the letters "cg" and - which is within the reference code.
If the string you want to extract is always before the first space in the full string then you can use SEARCH and LEFT to extract your reference code:
=LEFT(A1,SEARCH(" ",A1)-1)
This formula would take 16cg-3368 from 16cg-3368 - for review.
I suggest using something like suggested here
How to use Regular Expressions (Regex) in Microsoft Excel both in-cell and loops
With a replace regex similar to this
[^\dcg]*
or a match regex like this
^([0-9cg- ]+).*
else you could also work with a strange formule similar to this
=CONCATENATE(IF(NOT(ISERROR(SEARCH(MID(A2;1;1);"01234567890cg-")>0));MID(A2;1;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;2;1);"01234567890cg-")>0));MID(A2;2;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;3;1);"01234567890cg-")>0));MID(A2;3;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;4;1);"01234567890cg-")>0));MID(A2;4;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;5;1);"01234567890cg-")>0));MID(A2;5;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;6;1);"01234567890cg-")>0));MID(A2;6;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;7;1);"01234567890cg-")>0));MID(A2;7;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;8;1);"01234567890cg-")>0));MID(A2;8;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;9;1);"01234567890cg-")>0));MID(A2;9;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;10;1);"01234567890cg-")>0));MID(A2;10;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;11;1);"01234567890cg-")>0));MID(A2;11;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;12;1);"01234567890cg-")>0));MID(A2;12;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;13;1);"01234567890cg-")>0));MID(A2;13;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;14;1);"01234567890cg-")>0));MID(A2;14;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;15;1);"01234567890cg-")>0));MID(A2;15;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;16;1);"01234567890cg-")>0));MID(A2;16;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;17;1);"01234567890cg-")>0));MID(A2;17;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;18;1);"01234567890cg-")>0));MID(A2;18;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;19;1);"01234567890cg-")>0));MID(A2;19;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;20;1);"01234567890cg-")>0));MID(A2;20;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;21;1);"01234567890cg-")>0));MID(A2;21;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;22;1);"01234567890cg-")>0));MID(A2;22;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;23;1);"01234567890cg-")>0));MID(A2;23;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;24;1);"01234567890cg-")>0));MID(A2;24;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;25;1);"01234567890cg-")>0));MID(A2;25;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;26;1);"01234567890cg-")>0));MID(A2;26;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;27;1);"01234567890cg-")>0));MID(A2;27;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;28;1);"01234567890cg-")>0));MID(A2;28;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;29;1);"01234567890cg-")>0));MID(A2;29;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;30;1);"01234567890cg-")>0));MID(A2;30;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;31;1);"01234567890cg-")>0));MID(A2;31;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;32;1);"01234567890cg-")>0));MID(A2;32;1);""))
only works by now for less than 33 signs.
problem here will be that you will get unexpected behavior like this:
123cg-123 - Process => 123cg-123-c
after rereading , I think you should try an other approach than described in the question ;-)
If you want to return everything up to and including the last digit, then try:
=LEFT(A1,LOOKUP(2,1/ISNUMBER(-MID(A1,seq,1)),seq))
seq is a named formula: Formula ► Define Name
Name: seq
Refers to: =ROW(INDEX($1:$65535,1,1):INDEX($1:$65535,255,1))
seq returns an array of sequential numbers from 1 to 255.
mid(a1,seq,1)
returns an array consisting of the individual characters in the string in A1. The leading minus sign converts the digits from strings to numbers.
The lookup function will then return the position of the last digit

Resources