Excel First Word (with error checking) - excel

While I can extract the first word from a cell containing multiple text values with error checking to return the only word if no multiple values exist. I cannot seem to wrap my brain around adding more checks (or if it is even possible in the same nested formula) for situations where some of the source cells contain a comma between multiple words. Example, the formula below will return "James" from "James Marriott". But, it returns "James," from "James, Marriott". If all of my cells in the range were consistent that would be easy, but they aren't. Attempts to nest multiple find statements have resulted in failure. Suggestions?
=IFERROR(LEFT(A1,FIND(" ",A2)-1),A2)
To compound matters, there are also cells that contain abbreviations as the first word, so somehow I need to account for that as well. For example "J.W. Marriott" where I need to apply the above logic to extract "Marriott".
Here are some examples below:
Text Desired output
James Marriott James
James, Marriott James
Able Acme Able
Golden, Eagle Golden
J.W. Marriott Marriott
A.B. Acme Acme

you could use regex (to set up please look at the post here)
Then you can extract the desired word with a formula like:
=regex(A1, "(?![Etc])[a-zA-Z]{2,}")
(This is searching for a pattern of two or more lower or upper case letters in the cell A1...and not searching for Etc)

Related

Find value within cell from a series in Excel

I have a list of addresses, such as this:
Lake Havasu,  Lake Havasu City,  Arizona.
St. Johns River,  Palatka,  Florida.
Tennessee River,  Knoxville,  Tennessee.
I would like to extract the State from these addresses and then have a column showing the abbreviated State name (AZ, FL, TN etc.).
I have a table that has the States with their abbreviation and once I extract the State, doing a simple INDEX MATCH to get the abbreviation is easy. I don't want to use text-to-columns because this file will constantly have values added to it and it would be much easier to just have a formula that does the extraction for me.
The ways I've tried to approach this that have failed so far are:
Some kind of SEARCH() function that looks at the full State list and tries to find a value that exists in the cell
A MID or RIGHT approach to only capture the last section but I can't work out how to have FIND only look for the second ", "
A version of INDEX MATCH but that fails because I can't find a good way to search or find the values as per approach (1)
Any help would be appreciated!
Please try this formula, where A2 is the original text.
=FILTERXML("<data><a>" & SUBSTITUTE(A2,", ","</a><a>") & "</a></data>","data/a[3]")
An alternative would be to look for the 2nd comma as shown below. Note that the "50" in the formula is an irrelevant number required by the MID() function. It shouldn't be smaller than the number of characters you need to return, however.
Char(160) is a character that wouldn't (shouldn't) naturally occur in your text, as it might if the text comes from a UNIX database. You can replace it with another one that fits the description.
=TRIM(MID(A2, FIND(CHAR(160),SUBSTITUTE(A2,",",CHAR(160),2)) + 1,50))
The following variation of the above would remove the final period. It will fail if there is anything following the period, such as an unwanted blank. That could be accommodated within the formula as well but it would be easier to treat the original data, if that is an option for you.
=TRIM(MID(LEFT(A2, LEN(A2)-1), FIND(CHAR(160),SUBSTITUTE(A2,",",CHAR(160),2)) + 1,50))
To find the abbreviation I would recommend to use VLOOKUP rather than INDEX/MATCH.
Use this (screenshot refers):
=MID(MID(B3,1,LEN(B3)-1),SEARCH(",",B3,SEARCH(",",B3,1)+1)+3,LEN(B3))

Extracting text from complex string in excel

The attached image (link: https://i.stack.imgur.com/w0pEw.png) shows a range of cells (B1:B7) from a table I imported from the web. I need a formula that allows me to extract the names from each cell. In this case, my objective is to generate the following list of names, where each name is in its own cell: Erik Karlsson, P.K. Subban, John Tavares, Matthew Tkachuk, Steven Stamkos, Dustin Brown, Shea Weber.
I have been reading about left, right, and mid functions, but I'm confused by the irregular spacing and special characters (i.e. the box with question mark beside some names).
Can anyone help me extract the names? Thanks
Assuming that your cells follow the same format, you can use a variety of text functions to get the name.
This function requires the following format:
Some initial text, followed by
2 new lines in Excel (represented by CHAR(10)
The name, which consists of a first name, a space, then a last name
A second space on the same line as the name, followed by some additional text.
With this format, you can use the following formula (assuming your data is in an Excel table, with the column of initial data named Text):
=MID([#Text],SEARCH(CHAR(10),[#Text],SEARCH(CHAR(10),[#Text])+1)+1,SEARCH(" ",MID([#Text],SEARCH(CHAR(10),[#Text],SEARCH(CHAR(10),[#Text])+1)+1,LEN([#Text])),SEARCH(" ",MID([#Text],SEARCH(CHAR(10),[#Text],SEARCH(CHAR(10),[#Text])+1)+1,LEN([#Text])))+1)-1)
To come up with this formula, we take the following steps:
First, we figure out where the name starts. We know this occurs after the 2 new lines, so we use:
=SEARCH(CHAR(10),[#Text],SEARCH(CHAR(10),[#Text])+1)+1
The inner (occurring second) SEARCH finds the first new line, and the outer (occurring first) finds the 2nd new line.
Now that we have that value, we can use it to determine the rest of the string (after the 2 new lines). Let's say that the previous formula was stored in a table column called Start of Name. The 2nd formula will then be:
=MID([#Text],[#[Start of Name]],LEN([#Text]))
Note that we're using the length of the entire text, which by definition is more than we need. However, that's not an issue, since Excel returns the smaller amount between the last argument to MID and the actual length of the text.
Once we have the text from the start of the name on, we need to calculate the position of the 2nd space (where the name ends). To do that, we need to calculate the position of the first space. This is similar to how we calculated the start of the name earlier (which starts after 2 new lines). The function we need is:
=SEARCH(" ",[#[Rest of String]],SEARCH(" ",[#[Rest of String]])+1)-1
So now, we know where the name starts (after 2 new lines), and where it ends (after the 2nd space). Assuming we have these numbers stored in columns named Start of Name and To Second Space respectively, we can use the following formula to get the name:
=MID([#Text],[#[Start of Name]],[#[To Second Space]])
This is equivalent to the first formula: The difference is that the first formula doesn't use any "helper columns".
Of course, if any cell doesn't match this format, then you'll be out of luck. Using Excel formulas to parse text can be finicky and inflexible. For example, if someone has a middle name, or someone has a initials with spaces (e.g. P.K. Subban was P. K. Subban), or there was a Jr. or something, your job would be a lot harder.
Another alternative is to use regular expressions to get the data you want. I would recommend this thorough answer as a primer. Although you still have the same issues with name formats.
Finally, there's the obligatory Falsehoods Programmers Believe About Names as a warning against assuming any kind of standardized name format.

Excel formula to search partial match and align row

I have 2 column data in Excel like this:
Can somebody help me write a formula in column C that will take the first name or the last name from column A and match it with column B and paste the values in column C. The following picture will give you the exact idea what I am trying to do. Thanks
Since your data is not "regular", you can try this formula which uses wild card to look for just the last name.
=INDEX($B$1:$B$4,MATCH("*" &MID(A1,FIND(" ",A1)+1,99)&"*",$B$1:$B$4,0))
It would be simpler if the first part followed some rule, but some have the first initial of the first name at the beginning; and others at the end.
Edit: (explanation added)
The FIND returns the character number of the first space
Add 1 to that to get the character number of the next word
Use MID to extract that word
Use that word in MATCH (with wild-cards before and after), to find it in the array of email addresses. This will return it's position in the array (row number)
Use that row number as an argument to the INDEX function to return the actual email address.
If you want to first examine the email address, you will need to determine which of the letters comprise the last name. Since this is not regular according to your example, you will need to check both options.
You will not be able to look for the first name from the email, as it is not present.
If you can guarantee that the first part will be follow the rule of one or the other, eg: either
FirstInitialLastName or
LastNameFirstInitial
Then you can try this:
=IFERROR(INDEX($B$1:$B$4,MATCH(MID(A1,FIND(" ",A1)+1,99)& LEFT(A1,1) &"*",$B$1:$B$4,0)),
INDEX($B$1:$B$4,MATCH( LEFT(A1,1)&MID(A1,FIND(" ",A1)+1,99) &"*",$B$1:$B$4,0)))
This seems to do what you want.
=IFERROR(VLOOKUP(LOWER(MID(A1,(SEARCH(" ",A1)+1),LEN(A1)))&LOWER(MID(A1,1,1))&"*",$B$1:$B$4,1,FALSE),VLOOKUP(LOWER(MID(A1,1,1))&LOWER(MID(A1,(SEARCH(" ",A1)+1),LEN(A1)))&"*",$B$1:$B$4,1,FALSE))
Its pretty crazy long and would likely be easier to digest and debug broken up into columns instead of one huge formula.
It basically makes FLast and FirstL out of the name field by splitting on the space.
LastF:
=LOWER(MID(A1,(SEARCH(" ",A1)+1),LEN(A1)))&LOWER(MID(A1,1,1))
And FirstL:
=LOWER(MID(A1,1,1))&LOWER(MID(A1,(SEARCH(" ",A1)+1),LEN(A1)))
We then make 2 vlookups for these by using wildcards:
LastF:
=VLOOKUP([lastfirst equation from above]&"*",$B$1:$B$4,1,FALSE)
And FirstL:
=VLOOKUP([firstlast equation from above]&"*",$B$1:$B$4,1,FALSE)
And then wrap those in IfError so they both get tried:
=IfError([firstLast vlookup],[lastfirst vlookup])
The rub is that's going to be hell to edit if you ever need to, which is why I suggest doing each piece in another column then referencing the previous one.
You also need to be aware that this answer will get tripped up by essentially the same name - e.g. Sam Smith and Sasha Smith would both match whatever the first entry for ssmith was. Any solution here will likely have the same pitfall.

Excel, Numberplate Clarification

I am working on an excel document for fuel cards at the minute and my current issue is to write in a formula for validating number plates based on UK standard plates (two letters followed by two numbers then three letters i.e. BK08JWZ). At this point in time we are not considering personal plates in this just to keep things simple.
Ideally I need excel to look at the text in the box and confirm it to an agreed layout but I am struggling to find the right formula. The plates are in column 'I' and I have already added in another column after titled 'approved plates' in column 'J'but this can be deleted if it's not needed.
Results wise, I can do this one of two ways, to either get the excel document to highlight and number plates that do not match the DVLA standard , or have a column next to the number plate column that registers a boolean response to the recognition i.e. If it is valid (true) or if not (false).
Either way the plate needs to be able to be seen as it was currently, so if there is something wrong with it, it needs to be visible, not throw up an error message.
Any help would be very welcome.
All the information on UK standard number plates are on this site:
https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/359317/INF104_160914.pdf
I would do it like this:
1) create a lookup sheet with data from the booklet. One column for allowed "memory tag" identiffiers (first two letters), one column for the allowed "age identiffiers" (first two numbers), and one column for allowed random letters (last three letters, full alphabet except I and Q)
2) strip spaces from the number plate for comparison
3) Use MID(numberplate,1,2), MID(numberplate,3,2) and MID(numberplate,5,3) to compare to each lookup list repectively (using INDEX()>0).
4) when all 3 parts are found in lookup lists the number plate is valid.
Try researching Regular Expressions or RegEx. This is a powerful programming tool to determine whether strings match specific patterns. You can use RegEx expressions to extract the pattern, replace the pattern or test for the pattern. Very efficient but not for the faint-hearted although there is plenty of help on-line. Try this article for starters.
The following RegEx may be what you need..
(?^[A-Z]{2}[0-9]{2}[A-Z]{3}$)|(?^[A-Z][0-9]{1,3}[A-Z]{3}$)|(?^[A-Z]{3}[0-9]{1,3}[A-Z]$)|(?^[0-9]{1,4}[A-Z]{1,2}$)|(?^[0-9]{1,3}[A-Z]{1,3}$)|(?^[A-Z]{1,2}[0-9]{1,4}$)|(?^[A-Z]{1,3}[0-9]{1,3}$)
This was copied from this article which gives a very full explanation using DVLA rules.
EDIT:
To use RegEx within Excel. In the IDE, Tools menu, select References and add the Microsoft VBScript Regular Expressions 5.5 reference.
With acknowlegement to user3616725s helpful observation.

Looking up Bigrams in Excel

Suppose I have a list of two-word pairs in a column in Excel. These words are delimited by a space so that a typical pair might look like "extreme happiness". The goal is to search for these 'bigrams' in a larger string located in another column. The issue is that the bigram will only be found if the two words are together and separated by a space. What would be preferable is if Excel could look for both words anywhere in a given larger string. It is crucial that the bigrams occupy one cell each since a score is assigned to each bigram and in fact the function used VLOOKUPs this value based on the bigram cell value. Would it make sense to change the space between any two words to a - or some other character? Is there a way to have Excel look up each value one at a time (perhaps by recognizing this character and passing through the larger string twice, that is, once for each word)?
Example: "The weather last night was extremely cold, but the warm fire gave me some happiness."
Here we would like to find both the word 'extreme' within the word extremely and the word happiness. Currently Excel would not be successful in doing this since it would just look for "extreme happiness" and determine that no such string exists.
If the bigram in the row below "extreme happiness" reads "weather gave" (for some reason) Excel will go check whether that bigram exists in the larger string and return a second score. This is done so that at the end every score can be added together.
This is pretty easy with a couple of formulas. See screenshot below:
The logic is simple. Assuming your bigram is in B1, we can input the following in C1. This will replace the spaces with *, which is Excel's wildcard character.
=SUBSTITUTE(B2," ","*")
Then we concatenate it to give us a wildcarded beginning and end.
=CONCATENATE("*",SUBSTITUTE(B2," ","*"),"*")
We then use a simple COUNTIF against the statement (here in A1) to return to us a count of occurence.
=COUNTIF(A2,CONCATENATE("*",SUBSTITUTE(B2," ","*"),"*"))
A simple IF check enclosing the above, with condition >0, can be used to give us either Yes or No.
=IF(COUNTIF(A2,CONCATENATE("*",SUBSTITUTE(B2," ","*"),"*"))>0,"Yes","No")
Let us know if this helps.

Resources