How to extract the text between many commas in Excel

I have a problem extracting the suburb from an address.
For example, the address is "143 Stephanie St, Upper Kedron, QLD, 4055."
How do I set up a formula to extract the suburb, Upper Kedron, from the address?
I really appreciate your review :)

This will extract the second comma-separated field:
=TRIM(MID(SUBSTITUTE(A1,",",REPT(" ",255)),255,255))
You will have a problem if Upper Kedron is not the second field, for example:
143, Stephanie St, Upper Kedron, QLD, 4055 (this will extract Stephanie St).
So if you have cases where the formula doesn't work, give us more sample inputs.
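To see why the padding trick works, here is a minimal Python sketch that mimics the formula step by step. The function name second_field and the use of Python are illustrative assumptions, not part of the answer; Excel's 1-based MID is translated to 0-based slicing, and TRIM is only approximated.

```python
def second_field(address: str, pad: int = 255) -> str:
    """Mimic =TRIM(MID(SUBSTITUTE(A1,",",REPT(" ",255)),255,255))."""
    # SUBSTITUTE(A1, ",", REPT(" ", 255)): every comma becomes a run of spaces
    padded = address.replace(",", " " * pad)
    # MID(..., 255, 255): 1-based start 255 is index 254 in Python
    window = padded[pad - 1 : pad - 1 + pad]
    # TRIM: strip the ends and collapse internal runs of whitespace
    return " ".join(window.split())


print(second_field("143 Stephanie St, Upper Kedron, QLD, 4055."))
print(second_field("143, Stephanie St, Upper Kedron, QLD, 4055"))
```

The window can only contain the second field because each comma is widened to 255 spaces, so this holds as long as the first two fields together are shorter than the padding.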

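As the previous answer notes, it looks like your suburb can be in any position. So the best way is to split the address into multiple columns and then choose the column that holds your suburb. Enter the formula below in the first output column and fill it to the right: COLUMN(A1) becomes COLUMN(B1), COLUMN(C1), and so on, so each column returns the next comma-separated field.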
=TRIM(MID(SUBSTITUTE($A1,",",REPT(" ",999)),1+((COLUMN(A1)-1)*999),999))
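As a rough cross-check outside Excel, the sketch below mimics the fill-right formula for a given column number and compares it with a plain split on commas. The names nth_field and split_fields are hypothetical, and the comparison assumes the fields together are shorter than the 999-character padding.

```python
def nth_field(address: str, n: int, pad: int = 999) -> str:
    """Mimic the formula with COLUMN(...) = n (1-based field number)."""
    padded = address.replace(",", " " * pad)
    # 1 + (n - 1) * 999 in Excel's 1-based terms is index (n - 1) * 999 here
    start = (n - 1) * pad
    window = padded[start : start + pad]
    return " ".join(window.split())  # approximates TRIM


def split_fields(address: str) -> list[str]:
    """Plain equivalent: split on commas and strip each piece."""
    return [part.strip() for part in address.split(",")]


address = "143 Stephanie St, Upper Kedron, QLD, 4055."
columns = [nth_field(address, n) for n in range(1, 5)]
print(columns)
print(columns == split_fields(address))
```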

Related

Extracting the last sequence of numbers in excel

I have a list of addresses from which I need to extract the last sequence of numbers (zip code). I'm looking for a general expression from which I can extract the zip codes from addresses from all over the world. I would have to tweak the expression in order for it to work for each country, or for a group of countries, I assume.
I'm trying to write a formula in Excel that can recognise the last digit in a string and, from that, extract the digits immediately before it, stopping when it reaches a non-digit character. Below I have an example of an address and the formula I've come up with (in E26), but I'm looking for something more compact:
National Institute of Pharmaceutical Education and Research (NIPER), Phase X, Sector 67, SAS Nagar, Punjab, 160062, India.
=MID(E26, MAX(IF(ISNUMBER(VALUE(MID(E26,ROW(INDIRECT("1:" & LEN(E26))),1))),ROW(INDIRECT("1:" & LEN(E26))))+1)-6, 6)
The first part, recognizing the last digit, is working fine. The problem is recognizing the beginning of the sequence, at least in cases where there are also street numbers within the string (such as in this case). This is why I'm subtracting 6 from the position where the last digit was found, since I know the length of the zip code in this particular country. However, that may not be the case for all countries.
Plus there are cases where there's a space within the sequence, such as 160 062. Also, the addresses won't always have delimiters that I could use to extract the zip codes, which is why I need an algorithm for this.
I was wondering if there's a neater way to do this? I would be open to VBA. Thanks for your help!
Best regards,
Antonio
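The algorithm described in the question (find the last digit, then walk back while the characters are still digits) can be sketched with a regular expression. This is only an illustration in Python, not an Excel or VBA answer from the thread; the function name last_number and the allow_spaces flag are assumptions introduced here.

```python
import re


def last_number(text, allow_spaces=False):
    """Return the last run of digits in text, or None if there is none."""
    if allow_spaces:
        # digit groups separated by single spaces, e.g. "160 062"
        pattern = r"(\d+(?: \d+)*)\D*$"
    else:
        pattern = r"(\d+)\D*$"
    # \D*$ requires that no further digit follows the captured run
    match = re.search(pattern, text)
    return match.group(1) if match else None


address = ("National Institute of Pharmaceutical Education and Research "
           "(NIPER), Phase X, Sector 67, SAS Nagar, Punjab, 160062, India.")
print(last_number(address))
print(last_number("SAS Nagar, Punjab, 160 062, India.", allow_spaces=True))
```

With allow_spaces enabled, a street number that sits directly before the code and is separated from it only by a space would be merged into the result, so that option likely needs per-country tuning.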

Excel First Word (with error checking)

I can extract the first word from a cell containing multiple words, with error checking to return the only word if there is just one. However, I cannot seem to wrap my brain around adding more checks (or whether it is even possible in the same nested formula) for situations where some of the source cells contain a comma between words. For example, the formula below will return "James" from "James Marriott", but it returns "James," from "James, Marriott". If all of the cells in the range were consistent that would be easy, but they aren't. Attempts to nest multiple FIND statements have resulted in failure. Suggestions?
=IFERROR(LEFT(A1,FIND(" ",A1)-1),A1)
To compound matters, there are also cells that contain abbreviations as the first word, so somehow I need to account for that as well. For example "J.W. Marriott" where I need to apply the above logic to extract "Marriott".
Here are some examples:
Text            | Desired output
James Marriott  | James
James, Marriott | James
Able Acme       | Able
Golden, Eagle   | Golden
J.W. Marriott   | Marriott
A.B. Acme       | Acme
You could use regex (to set it up, please look at the post here).
Then you can extract the desired word with a formula like:
=regex(A1, "(?![Etc])[a-zA-Z]{2,}")
(This searches cell A1 for a run of two or more upper- or lower-case letters. Note that the lookahead (?![Etc]) is a character class, so it prevents a match from starting with E, t, or c; it does not exclude the literal word Etc.)
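The pattern can be tried against the sample rows with Python's re module, as a quick check outside Excel. This sketch assumes the custom regex function returns the first match, which the answer does not state; first_word is a hypothetical name.

```python
import re

# Same pattern as in the formula above
PATTERN = re.compile(r"(?![Etc])[a-zA-Z]{2,}")


def first_word(text):
    """Return the first match of the pattern, or None if nothing matches."""
    match = PATTERN.search(text)
    return match.group(0) if match else None


samples = ["James Marriott", "James, Marriott", "Able Acme",
           "Golden, Eagle", "J.W. Marriott", "A.B. Acme"]
for sample in samples:
    print(sample, "->", first_word(sample))
```

All six sample rows give the desired output, but a value whose first word starts with E, t, or c is cut short: first_word("Eagle Golden") returns "agle", because the lookahead rejects the position at "E" and the search resumes one character later.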

How do I extract text between two commas in Excel?

How do I extract text between two commas in Excel?
92 4th Street North, Providence, RI 02904
In this case, how would I extract "Providence" substring using simple Excel formulas (LEN, FIND, LEFT, RIGHT, etc)?
Try the following formula.
=MID(A2,FIND("^",SUBSTITUTE(A2,",","^",1))+1,FIND("^",SUBSTITUTE(A2,",","^",2))-FIND("^",SUBSTITUTE(A2,",","^",1))-1)
Try the following formula
=LEFT(RIGHT(A1,FIND(",",A1)),FIND(",",RIGHT(A1,FIND(",",A1)))-1)
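This assumes your data is in A1. Note that RIGHT(A1,FIND(",",A1)) takes as many characters from the end of the string as the position of the first comma, so the result is only correct when the text after the first comma is about as long as the text before it, which happens to hold for the example address.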
@RAJA-THEVAR's formula worked very well for me with a list of over 2500 addresses, as long as the address only contained two commas. With an address like "100 Washington Street, Suite 225, Denver, CO 80220" it returns "Suite 225". I used the following formula to identify addresses that contained more than two commas:
=LEN(A1)-LEN(SUBSTITUTE(A1,",",""))
Many of these many-comma addresses had strange formats or information, and I found it better to fix them by hand.
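For readers who want to check the two ideas above outside Excel, here is a small Python sketch: one helper returns the text between the first and second comma, the other counts commas to flag unusual addresses. Both function names are hypothetical, and the result is stripped of surrounding spaces, which the MID-based formula does not do on its own.

```python
def between_first_two_commas(address):
    """Text between the first and second comma, or None if there are fewer than two."""
    first = address.find(",")
    second = address.find(",", first + 1)
    if first == -1 or second == -1:
        return None
    return address[first + 1 : second].strip()


def comma_count(address):
    """Same idea as =LEN(A1)-LEN(SUBSTITUTE(A1,",",""))."""
    return len(address) - len(address.replace(",", ""))


for addr in ["92 4th Street North, Providence, RI 02904",
             "100 Washington Street, Suite 225, Denver, CO 80220"]:
    print(between_first_two_commas(addr), comma_count(addr))
```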

How to ignore the last two words in a cell

I need a formula that will return the County, separate from the code and date.
Kent BEC100 30/09/14
Not all counties are one word, so I need a formula that ignores the last two words. I have formulas that find the last word, the second-to-last word, and the first word, but I need one that returns everything except the last two words. There are occasions where the county isn't present, so I'm thinking I need to add an ISERROR?
Bedford BED101 30/09/14
BLA102 30/09/14
Lancs BOL100 30/09/14
Coventry, West Midla COV100 30/09/14
west Sussex CRA101 30/09/14
If your string does not contain a pipe and is already trimmed, then use
=LEFT(A1,FIND("|",SUBSTITUTE(A1," ","|",-1+LEN(A1)-LEN(SUBSTITUTE(A1," ",""))))-1)
Here's how it works:
LEN(A1)-LEN(SUBSTITUTE(A1," ","")) is the number of spaces.
SUBSTITUTE inserts a pipe at the penultimate space (with the help of my -1)
FIND gets the position of that pipe.
LEFT extracts the string up to that point.
If your string does contain a pipe, then use a different character. You can enclose the whole thing inside an IFERROR (you allude to this in the question) if you need more robustness.
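The same result can be sketched in Python by splitting from the right instead of marking the penultimate space. This swaps in a different technique (rsplit) for illustration; drop_last_two_words is a hypothetical name, and returning an empty string when the county is missing plays the role of the IFERROR wrapper.

```python
def drop_last_two_words(text):
    """Return everything before the last two space-separated words."""
    # Split at most twice from the right: [county, code, date]
    parts = text.strip().rsplit(" ", 2)
    # Fewer than three parts means there is no county in front
    return parts[0] if len(parts) == 3 else ""


rows = ["Kent BEC100 30/09/14",
        "BLA102 30/09/14",
        "Coventry, West Midla COV100 30/09/14",
        "west Sussex CRA101 30/09/14"]
for row in rows:
    print(repr(drop_last_two_words(row)))
```

Like the formula, this assumes the words are separated by single spaces.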
In your examples, the code and date are always the same length (15 characters, including the space between them), so this formula will work if that always holds:
=TRIM(MID(A1,1,LEN(A1)-15))

How do you parse a street address from a single cell in Excel into individual cells when the address format isn't consistent?

I have a list of addresses which are individual strings in an Excel spreadsheet:
123 Sesame St New York, NY 00000
123 Sesame Ct Atlanta, GA 11111
100 Sesame Way, 400 Jacksonville, FL 22222
As you can see above the third address is different. It has a suite number of 400 on what would normally be the street line 2 line. I am having trouble coming up with a formula that will parse the addresses above into its individual cells: Street 1 (with street 2 or suite information in this line), City, State and Zip.
My thought is to start from the right and extract information based on a space delimiter, but I am not sure how to do this. How would I go about this?
I guess you can do a combination of MID & FIND to extract parts of the address,
e.g.
=IF(IFERROR(MID(A1,1,FIND(",",A1,FIND(",",A1)+1)),1)=1,MID(A1, 1, FIND(",",A1)-1),MID(A1,1,FIND(",",A1,FIND(",",A1)+1)-1))
will extract the street part of the address from cell A1, depending on the number of commas it finds (1 or more than 1).
ZIP and state won't be too difficult following the above mentioned pattern. I think the problem is extracting the city as you don't know where to set the limit between the city name and the street unless you have a finite set of street types, e.g. ct, st, way etc.
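Working from the right, as the question suggests, state and ZIP can be peeled off after the last comma. The Python sketch below is only an illustration under the assumption that every address ends in ", STATE ZIP"; split_address is a hypothetical name, and street and city are deliberately left together because, as noted above, there is no reliable boundary between them.

```python
def split_address(address):
    """Return (street_and_city, state, zip_code), or None if the tail is unexpected."""
    head, sep, tail = address.rpartition(",")
    if not sep:
        return None  # no comma at all
    parts = tail.split()
    if len(parts) != 2:
        return None  # tail is not "STATE ZIP"
    state, zip_code = parts
    return head.strip(), state, zip_code


for addr in ["123 Sesame St New York, NY 00000",
             "123 Sesame Ct Atlanta, GA 11111",
             "100 Sesame Way, 400 Jacksonville, FL 22222"]:
    print(split_address(addr))
```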
You can use a slightly shorter formula using SUBSTITUTE() and LEN() in addition to FIND() and LEFT():
=LEFT(A1,FIND("#",SUBSTITUTE(A1,",","#",LEN(A1)-LEN(SUBSTITUTE(A1,",",""))))-1)
The first part to be evaluated is:
LEN(A1)-LEN(SUBSTITUTE(A1,",",""))
which counts the commas in the input string. Call this result [1]; it feeds the next step:
SUBSTITUTE(A1,",","#",[1])
which replaces the last comma with # (if your addresses can contain #, use another character that never appears in them). Call this result [2]:
=LEFT(A1,FIND("#",[2])-1)
The last step takes the characters up to the # we just inserted.
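The three steps can be traced in Python as a sanity check. substitute_nth is a hypothetical helper written here to imitate SUBSTITUTE with an instance number; returning the text unchanged when there is no comma is an assumption made for the sketch, whereas the Excel formula would return an error in that case.

```python
def substitute_nth(text, old, new, n):
    """Replace only the n-th occurrence (1-based) of old with new."""
    if n < 1:
        raise ValueError("n must be at least 1")
    start = 0
    for _ in range(n):
        pos = text.find(old, start)
        if pos == -1:
            return text  # fewer than n occurrences: leave text unchanged
        start = pos + len(old)
    return text[:pos] + new + text[pos + len(old):]


def before_last_comma(address, marker="#"):
    # Step [1]: count the commas
    commas = len(address) - len(address.replace(",", ""))
    if commas == 0:
        return address  # assumption: nothing to cut
    # Step [2]: mark the last comma
    marked = substitute_nth(address, ",", marker, commas)
    # Final step: LEFT(A1, FIND("#", [2]) - 1)
    return address[: marked.find(marker)]


print(before_last_comma("100 Sesame Way, 400 Jacksonville, FL 22222"))
```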
