How to extract an address in Excel? - excel-formula

Each address is in a single cell, and I want to split it into separate cells on the same row. Some cells have four lines of address and some have three. I can easily split the three-line ones using Text to Columns with various delimiters, but not the four-line ones.
The first example below has four lines and the second has three.
Anchorage Oncology Centre
3801 University Lake Drive
Suite 300-B2
Anchorage, AK 99508 US
I would like the above split into 5 cells: one cell for the address, and one cell each for city, state, zip code and country:
Anchorage Oncology Centre
3801 University Lake Drive
Suite 300-B2
Anchorage
AK
99508
US
In the second example below:
Providence Alaska Medical Center
3200 Providence Drive
Anchorage, AK 99508 US
I would like:
Providence Alaska Medical Center
3200 Providence Drive
Anchorage
AK
99508
US
Could this be done using a formula?
What I have done so far: the full address is in cell A1 and I want the parts in B1, C1, D1, E1 and F1. For the country I use =RIGHT(A1,2), for the zip code =MID(A1,LEN(A1)-7,5), and for the state =MID(A1,LEN(A1)-10,2). Now I am trying to extract the city. The city comes after the last line break (CHAR(10)) and before the comma, and the address is the first 2 or 3 lines. I don't know how to do that.
There is a line break between each line.
Thank you
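The question's fixed-offset formulas, plus the city and address steps, can be sketched in Python (not Excel) to check the arithmetic. This is a minimal sketch assuming the line breaks are CHAR(10) (`\n`) and the last line always ends in a two-letter state, a 5-digit zip and a two-letter country; `mid` and `split_address` are hypothetical helper names. Excel positions are 1-based while Python slices are 0-based, which is why `mid` subtracts 1.

```python
def mid(text: str, start: int, num_chars: int) -> str:
    """Excel-style MID(text, start, num_chars) with a 1-based start."""
    return text[start - 1:start - 1 + num_chars]


def split_address(full: str) -> dict:
    """Assumes the last line looks like 'City, ST 12345 CC' (fixed widths at the end)."""
    n = len(full)
    country = full[-2:]              # =RIGHT(A1,2)
    zip_code = mid(full, n - 7, 5)   # =MID(A1,LEN(A1)-7,5)
    state = mid(full, n - 10, 2)     # =MID(A1,LEN(A1)-10,2)
    # Everything before the last line break is the street address (2 or 3 lines)
    address, last_line = full.rsplit("\n", 1)
    # The city sits between the last line break and the comma
    city = last_line.split(",", 1)[0]
    return {"address": address, "city": city, "state": state,
            "zip": zip_code, "country": country}


four_lines = ("Anchorage Oncology Centre\n3801 University Lake Drive\n"
              "Suite 300-B2\nAnchorage, AK 99508 US")
three_lines = ("Providence Alaska Medical Center\n3200 Providence Drive\n"
               "Anchorage, AK 99508 US")

for cell in (four_lines, three_lines):
    print(split_address(cell))
```

Splitting at the last line break, rather than at a fixed line number, is what lets the same logic handle both the three-line and four-line cells.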

If you use SUBSTITUTE() you can replace the nth occurrence of a character with a new character, then use FIND() to return the position of that character. For example, if you SUBSTITUTE the third occurrence of CHAR(10) with "~", FIND("~",...) gives the position of the line break that ends the address in a four-line cell.
So if your full address is in A1, you could extract the address with =LEFT(A1,FIND("~",SUBSTITUTE(A1,CHAR(10),"~",3))-1). For the three-line cells use 2 instead of 3, or compute the number of line breaks with LEN(A1)-LEN(SUBSTITUTE(A1,CHAR(10),"")) so that one formula handles both.
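A Python sketch of the same SUBSTITUTE-then-FIND idea, assuming "~" never appears in the address; `substitute` and `extract_address` are hypothetical helpers that imitate the Excel functions, not a real library API. It counts the line breaks first, so one function handles both three-line and four-line cells.

```python
def substitute(text: str, old: str, new: str, instance: int) -> str:
    """Excel-style SUBSTITUTE(text, old, new, instance): replace only that occurrence."""
    if instance < 1:
        raise ValueError("instance must be >= 1")  # Excel returns #VALUE! here
    pos = 0
    for _ in range(instance):
        idx = text.find(old, pos)
        if idx == -1:
            return text  # fewer occurrences: the text comes back unchanged
        pos = idx + len(old)
    return text[:idx] + new + text[idx + len(old):]


def extract_address(full: str, marker: str = "~") -> str:
    """LEFT(A1, FIND("~", SUBSTITUTE(A1, CHAR(10), "~", n)) - 1), n = number of line breaks."""
    # LEN(A1) - LEN(SUBSTITUTE(A1, CHAR(10), "")) counts the line breaks
    breaks = len(full) - len(full.replace("\n", ""))
    marked = substitute(full, "\n", marker, breaks)
    # FIND gives a 1-based position p; LEFT(A1, p - 1) equals full[:p - 1],
    # and p - 1 is exactly the 0-based index that str.index returns.
    # str.index raises ValueError where FIND would return #VALUE!
    return full[:marked.index(marker)]


print(extract_address("Providence Alaska Medical Center\n3200 Providence Drive\n"
                      "Anchorage, AK 99508 US"))
```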

A possible solution would be
=IF(ROW(A1)<SUMPRODUCT((A$1:A$7<>"")*1),A1,TRIM(MID(SUBSTITUTE(SUBSTITUTE(INDEX(A$1:A$7,SUMPRODUCT((A$1:A$7<>"")*1)),",",)," ",REPT(" ",999)),(ROW(X1)-SUMPRODUCT((A$1:A$7<>"")*1)+1)*999-998,999)))
The splitting part comes from this formula:
=TRIM(MID(SUBSTITUTE(A$2," ",REPT(" ",999)),ROW(X1)*999-998,999))
I have added some calculations to find the last row and eliminate commas:
SUBSTITUTE(INDEX(A$1:A$7,SUMPRODUCT((A$1:A$7<>"")*1)),",",)
If all the addresses start in row 1 (one address line per cell, going down), you can enter the formula below the address in the first column and copy it down and to the right. As written, the range allows an address of up to 7 lines, so you have to copy it down to at least row 8.
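The padding trick in the split formula replaces every space with 999 spaces, so each word lands in its own 999-character slot, and MID then reads slot k starting at position k*999-998. A Python sketch of that arithmetic; `nth_word` is a hypothetical name, and it assumes the words on the line total fewer than 999 characters, otherwise a word can spill across slots.

```python
def nth_word(line: str, k: int, width: int = 999) -> str:
    """Mirror of TRIM(MID(SUBSTITUTE(text," ",REPT(" ",999)),k*999-998,999))."""
    cleaned = line.replace(",", "")               # SUBSTITUTE(..., ",", ) drops the commas
    padded = cleaned.replace(" ", " " * width)    # every space becomes 999 spaces
    start = k * width - (width - 1)               # k*999-998, a 1-based position
    window = padded[start - 1:start - 1 + width]  # MID(padded, start, 999)
    # Each window holds at most one word plus padding, so stripping the ends
    # gives the same result as Excel's TRIM here
    return window.strip()


line = "Anchorage, AK 99508 US"
print([nth_word(line, k) for k in range(1, 6)])
```

Past the last word the window is all padding, so the slot comes back empty, which is why copying the formula further down than needed is harmless.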

Related

MID formula using SEARCH formula to extract text from cell

These are very simple formulas, but I am missing some understanding, and it seems frustratingly simple.
Very simple text extraction:
MID(A1,Start Num, Num of Chars)
A simple text-finding formula:
SEARCH(Find_text, within_text, start_num)
Combined, these two formulas can find and extract text from a field between two characters, for instance underscores, parentheses or commas.
So, for example, an easy formula to extract the text Jimbo Jones from a cell containing parentheses would be:
Sample text A1 = Incident Report No.1234, user (Jimbo Jones) Status- pending
Formula:
=MID(A1, SEARCH("(", A1)+1, SEARCH(")", A1) - SEARCH("(", A1) -1)
Extracted text = Jimbo Jones
The logic is simple:
1. Identify the cell containing the text.
2. Find the start position by searching for the first delimiter character.
3. Find the end position of the text being extracted by searching for the second delimiter character.
4. Subtracting the start position from the end position, and then 1, gives the number of characters to extract.
Without the SEARCH formula, the code is:
=MID(A1,32,11), which returns Jimbo Jones
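The four steps above translate directly into string slicing. A Python sketch, not Excel: `between` is a hypothetical helper, and it matches characters exactly, whereas SEARCH is case-insensitive and accepts wildcards (which makes no difference for parentheses).

```python
def between(text: str, open_ch: str, close_ch: str) -> str:
    """=MID(A1, SEARCH(open,A1)+1, SEARCH(close,A1) - SEARCH(open,A1) - 1)"""
    start = text.index(open_ch) + 1   # SEARCH returns a 1-based position
    end = text.index(close_ch) + 1
    # MID(A1, start + 1, end - start - 1) is the 0-based slice [start : end - 1]
    return text[start:end - 1]


sample = "Incident Report No.1234, user (Jimbo Jones) Status- pending"
print(between(sample, "(", ")"))   # Jimbo Jones
print(sample[31:31 + 11])          # the hard-coded =MID(A1,32,11): Jimbo Jones
```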
But if I want to extract text between commas or other identical characters (such as quotation marks, apostrophes or asterisks), I need to use the following formula (which I found suggested):
=MID(A1, SEARCH(",", A1)+1, SEARCH(",", A1, SEARCH(",", A1) +1) - SEARCH(",",A1) -1)
Sample text A1 Incident Report No.1234 user, Jimbo Jones, Status- pending
Extracted text = Jimbo Jones (with a leading space, because the start position is the character right after the first comma)
But how do I specify other boundaries, such as the text between the 3rd and 4th comma?
Sample text A1 Incident Report, No.1234, user, Jimbo Jones, Status- pending
The reason for my confusion is that in the above formula Excel finds the second comma no matter where it is in the text, yet the formula used is identical to the one that finds the first comma. The character count somehow seems to assume that I want the second comma rather than the first. How do I instruct the formula to find subsequent commas, such as the 3rd, 4th or 9th?
And what am I not understanding about why the formula finds the 2nd comma?
Cheers!
To explain what you are confused about:
At first sight it looks as if the same formula is used to find the 1st and 2nd occurrences of the symbol. But on a second look you might notice that SEARCH has an argument, start_num, which tells the formula where to start looking. If you pass the position of the first comma + 1 (SEARCH(",", A1)+1) as that argument, the formula starts looking in this part: ' No.1234, user, Jimbo Jones, Status- pending', and returns 25, the position of the second comma counted from the start of the full text. So you get the 1st occurrence with the first formula and the 2nd occurrence with the formula above. Then just find the length by subtracting, and that's it.
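A Python sketch of the start_num chaining described above; `search` and `nth_search` are hypothetical helpers that return 1-based positions like SEARCH (exact match only, with ValueError standing in for #VALUE!). Each search starts one character after the previous hit, which is all that "finding the 3rd, 4th or 9th comma" requires.

```python
def search(find: str, within: str, start_num: int = 1) -> int:
    """1-based position like SEARCH(find, within, start_num); exact match, no wildcards."""
    return within.index(find, start_num - 1) + 1


def nth_search(find: str, within: str, n: int) -> int:
    """Chain start_num: each search starts one character after the previous hit."""
    pos = 0
    for _ in range(n):
        pos = search(find, within, pos + 1)
    return pos


text = "Incident Report, No.1234, user, Jimbo Jones, Status- pending"
first = search(",", text)               # 16
second = search(",", text, first + 1)   # 25, counted from the start of the text
c3, c4 = nth_search(",", text, 3), nth_search(",", text, 4)
# MID(A1, c3 + 1, c4 - c3 - 1) is the 0-based slice [c3 : c4 - 1]
print(first, second, c3, c4, text[c3:c4 - 1].strip())
```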
Possible solutions:
If you have Office 365, use TEXTAFTER() and TEXTBEFORE(), as others have stated; both let you pass the instance number as an argument:
=TEXTAFTER(TEXTBEFORE(A1,",",4),",",3)
This returns the text between the 3rd and 4th commas, including the space after the 3rd comma. Then you can use TRIM() to remove the unwanted spaces at the beginning and end.
If you use an older version of Office, you can use SUBSTITUTE() as a workaround, because it lets you change the nth occurrence of a specific symbol in the text.
Choose a symbol that does not appear in your text and change the 3rd and 4th occurrences of your search symbol to it, then look for them (in this example we substitute "," with "#"):
=MID(A1,SEARCH("#",SUBSTITUTE(A1,",","#",3))+1,SEARCH("#",SUBSTITUTE(A1,",","#",4))-(SEARCH("#",SUBSTITUTE(A1,",","#",3))+1))
A short explanation, with each step in a helper cell in column B:
B1: =SUBSTITUTE(A1,",","#",3)
B2: =SUBSTITUTE(A1,",","#",4)
B3: =SEARCH("#",B1)
B4: =SEARCH("#",B2)
B5: =MID(A1,B3+1,B4-(B3+1))
Full formula:
=MID(A1,SEARCH("#",SUBSTITUTE(A1,",","#",3))+1,SEARCH("#",SUBSTITUTE(A1,",","#",4))-(SEARCH("#",SUBSTITUTE(A1,",","#",3))+1))
Trimmed:
=TRIM(MID(A1,SEARCH("#",SUBSTITUTE(A1,",","#",3))+1,SEARCH("#",SUBSTITUTE(A1,",","#",4))-(SEARCH("#",SUBSTITUTE(A1,",","#",3))+1)))
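The SUBSTITUTE workaround can be checked in Python against a plain split on commas, which returns the same field as the TEXTBEFORE/TEXTAFTER formula. A sketch assuming "#" does not occur in the text; `substitute` and `between_nth` are hypothetical helpers, not library functions.

```python
def substitute(text: str, old: str, new: str, instance: int) -> str:
    """Excel-style SUBSTITUTE with instance_num: replace only that occurrence."""
    if instance < 1:
        raise ValueError("instance must be >= 1")
    pos = 0
    for _ in range(instance):
        idx = text.find(old, pos)
        if idx == -1:
            return text  # fewer occurrences: unchanged, as in Excel
        pos = idx + len(old)
    return text[:idx] + new + text[idx + len(old):]


def between_nth(text: str, delim: str, i: int, j: int, marker: str = "#") -> str:
    """MID(A1, p_i+1, p_j-(p_i+1)) with p_k = SEARCH("#", SUBSTITUTE(A1, delim, "#", k))."""
    p_i = substitute(text, delim, marker, i).index(marker) + 1  # 1-based, like SEARCH
    p_j = substitute(text, delim, marker, j).index(marker) + 1
    return text[p_i:p_j - 1]


text = "Incident Report, No.1234, user, Jimbo Jones, Status- pending"
print(between_nth(text, ",", 3, 4).strip())  # the TRIM(...) version
print(text.split(",")[3].strip())            # same field as TEXTAFTER(TEXTBEFORE(A1,",",4),",",3)
```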
Thanks for the responses, all. I ground through it using the two formulas I asked about (MID and SEARCH), and I have a result.
It's neither pretty nor elegant, but it extracts the data as required. I will benefit from the tips left here in response to my question, as simpler options are available.
Requirement: extract the text between the 3rd and 4th commas using MID and SEARCH.
Sample text in A15:
Incident Report (ammended), No.12545424234, user, Jimbo Jones, Status- pending
=MID(A15,SEARCH(",",A15,SEARCH(",",A15,SEARCH(",",A15)+1)+1)+2,SEARCH(",",A15,SEARCH(",",A15,SEARCH(",",A15,SEARCH(",",A15)+1)+1)+1)-SEARCH(",",A15,SEARCH(",",A15,SEARCH(",",A15)+1)+1)-2)
Extracted text:
Jimbo Jones
This should work on other text with the same layout (a space after each comma), but it is obviously not easy to amend quickly for other positions.
Anyway, cheers again for the pointers...

MS Excel: Extract between the Nth and Nth Character in a string

Using an MS Excel formula (no VBA/macro), I would like to extract the text between the Nth and Mth occurrence of a character in a string. Example: I have text in cells A2 and A3, and I would like to extract the text located between the 4th space and the 9th space in each of the following strings.
Cell A2: Johnny went to the store and bought an apple and some grapes
Cell A3: We were not expecting to go home so early but we had no other choice due to rain
Results:
Cell A2: store and bought an apple
Cell A3: to go home so early
With Microsoft 365 or Excel 2019 you can use TEXTJOIN() in combination with FILTERXML():
Formula in B1 (with the text in A1):
=TEXTJOIN(" ",,FILTERXML("<t><s>"&SUBSTITUTE(A1," ","</s><s>")&"</s></t>","//s[position()>4][position()<6]"))
The XPath expression first selects all elements from the 5th word onwards and then returns only the first 5 elements of that set. Therefore //s[position()>4][position()<6] can also be written as //s[position()>4 and position()<10].
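The XPath selection amounts to slicing the list of words. A Python sketch with a hypothetical helper name; it splits on single spaces, so a double space would count as an empty word.

```python
def words_between_spaces(text: str, first: int, last: int) -> str:
    """Text between the first-th and last-th space, like the FILTERXML/TEXTJOIN formula."""
    words = text.split(" ")              # one <s> element per word
    # //s[position()>first][position()<(last-first+1)] keeps words first+1 .. last
    return " ".join(words[first:last])   # TEXTJOIN(" ", , ...)


print(words_between_spaces(
    "Johnny went to the store and bought an apple and some grapes", 4, 9))
```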
What you are looking for is a substring function, which Excel does not really have. I would suggest looking at this tutorial on functions that provide the same functionality, https://www.excel-easy.com/examples/substring.html, and choosing the best one for your use case.
I think the MID function would suit your use case best.

Partial match in Excel with different languages

I have 2 different columns in Excel, both containing address lines but in different languages: one in English and the other in Japanese. How can I do a partial match based only on the trailing numbers of both addresses?
Please take a look at the attachment to understand better.
My data set looks like this:
So, the first issue is that address 1 does not have a number on the right.
Now, for address 2 you could use:
=RIGHT(B1,8)
which returns the last 8 characters, i.e. the 7 digits plus the hyphen. You could then match against that result as you wish, assuming the data for address 2 is in column B.
Based on your data sample, in the English address the number you are looking for comes at the start and is separated from the rest by a comma. For example, in 3-10-31, AMAVAMA, you can look for 3-10-31, which is separated by a comma from the rest of the address.
If this holds for most of the cells, you can use the formula below.
Assuming Address 1 is in column B, Address 2 is in column C, and the formula is in column A:
=INDEX($C$2:$C$5,MATCH(1,IF(IFERROR(SEARCH(LEFT(B2,FIND(",",B2)-1),$C$2:$C$5),0)>0,1,0),0))
The formula is in cell A2 and is an array formula (in Excel versions without dynamic arrays, confirm it with Ctrl+Shift+Enter).
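A rough Python sketch of what the INDEX/MATCH array formula does: take the text before the first comma in Address 1, then return the first Address 2 that contains it. The helper name and the address data are hypothetical placeholders, not the asker's real data.

```python
def match_by_leading_number(addr1: str, candidates: list):
    """Rough equivalent of
    INDEX(C2:C5, MATCH(1, IF(IFERROR(SEARCH(key, C2:C5), 0) > 0, 1, 0), 0))."""
    key = addr1[:addr1.index(",")]          # LEFT(B2, FIND(",", B2) - 1)
    for candidate in candidates:           # MATCH(..., 0) returns the first hit
        if key.lower() in candidate.lower():  # SEARCH ignores case
            return candidate
    return None                            # Excel would show #N/A


# Hypothetical placeholder data
address_2 = ["JP address A 1-2-3", "JP address B 3-10-31"]
print(match_by_leading_number("3-10-31, AMAVAMA", address_2))
```

Like SEARCH, this is a substring test, so a key such as 3-10-31 would also match inside 13-10-31; the Excel formula has the same limitation.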

Excel formula to delete data in a cell from a specific character forward

I have an Excel spreadsheet with 5,000+ lines. Each line has a single cell containing a number of words, followed by a value in parentheses, followed by more words. The number of words varies from cell to cell.
Example:
North Avenue (123) North Avenue
Highland Parks Mall (456) Highland Parks Mall
I would welcome any help with a formula that can delete everything from the first parenthesis onward, including the parenthesis itself.
Using the examples above, I would like the cell contents to read:
North Avenue
Highland Parks Mall
Any help would be greatly appreciated! I've tried to search the site but am not even sure how to word the search query for this specific scenario.
Assuming your value is in A1, use this:
=LEFT(A1,FIND(" (",A1)-1)
This also assumes that every cell contains a space followed by an opening parenthesis; otherwise FIND fails to match and you will get a #VALUE! error.
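The same cut expressed in Python, as a sketch with a hypothetical helper name: `str.index` finds the first " (" and raises ValueError when it is missing, which corresponds to the #VALUE! case mentioned above.

```python
def strip_from_paren(text: str) -> str:
    """=LEFT(A1, FIND(" (", A1) - 1): keep everything before the first ' ('."""
    return text[:text.index(" (")]   # ValueError where FIND returns #VALUE!


for cell in ("North Avenue (123) North Avenue",
             "Highland Parks Mall (456) Highland Parks Mall"):
    print(strip_from_paren(cell))
```

If some cells have no parenthesis, `text.partition(" (")[0]` returns the whole text instead of raising, similar to wrapping the Excel formula in IFERROR(..., A1).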

Excel file clean up

I have an exported Excel file in which sentences are repeated with no space or period between the copies. Is there any way to clean this up by removing the repeated sentence without doing it manually? Here is a sample sentence:
Integrase, superantigen-encoding pathogenicity islands SaPIIntegrase, superantigen-encoding pathogenicity islands SaPI
If your sentence is repeated only once (so the cell holds exactly two copies), this will do:
=LEFT(A1,LEN(A1)/2)
Put the following formula in cell B1:
=LEFT(A1, LEN(A1)/2)
Then select B1 and double-click the fill handle (the small black square in the bottom-right corner of the cell). If I understood your problem correctly, you now have single instances in column B. Finally, select all of column B, copy it, and use Paste Special > Values to paste into column A. Lastly, delete column B.
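A Python sketch of the half-length idea with one extra safety check: it only keeps the first half when the cell really is the same text written twice, and leaves anything else untouched. The helper name is hypothetical.

```python
def collapse_doubled(text: str) -> str:
    """=LEFT(A1, LEN(A1)/2), applied only when the cell is one sentence written twice."""
    half = len(text) // 2
    first = text[:half]
    if first + first == text:
        return first
    return text   # odd length or not an exact repeat: leave as-is


sample = ("Integrase, superantigen-encoding pathogenicity islands SaPI"
          "Integrase, superantigen-encoding pathogenicity islands SaPI")
print(collapse_doubled(sample))
```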
