Retrieving specific letters from Text in Microsoft Excel - excel

I have the column as you can see in this image.
The Distributor Address is in the format: "Street Address, Postcode, State".
I need to retrieve only "Postcode" and "State" and combine them.
The new column should be like this ➡"NSW2007","VIC3182"...
How could I retrieve the specific letters and combine them?

You can use FILTERXML, which is available in Excel 2013 and later on Windows (it is not available in Excel for Mac or Excel for the web). See Excel - Extract substring(s) from string using FILTERXML for an excellent overview.
In this case, something like the below should work:
EDIT: overlooked the specific format you wanted
=FILTERXML("<t><s>"&SUBSTITUTE(A1,",","</s><s>")&"</s></t>","//s[3]")&
FILTERXML("<t><s>"&SUBSTITUTE(A1,",","</s><s>")&"</s></t>","//s[2]")
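To check the logic outside the spreadsheet, here is a minimal Python sketch of the same idea: split on commas, then join the third field (state) and the second field (postcode). The function name and the sample address are hypothetical, and unlike the formula the sketch strips the space that follows each comma.

```python
def state_postcode(address: str) -> str:
    """Return state + postcode from 'Street Address, Postcode, State'."""
    # Split on commas and drop surrounding spaces from each field.
    parts = [p.strip() for p in address.split(",")]
    # parts[2] plays the role of //s[3] (state), parts[1] of //s[2] (postcode).
    return parts[2] + parts[1]


print(state_postcode("1 Example St, 2007, NSW"))  # NSW2007
```

Like the formula, this assumes exactly two commas, so a street address that itself contains a comma would shift the fields.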

Well, here is a different way:
=RIGHT(A2,3)&","&TRIM(MID(A2,FIND(",",A2,1)+2,LEN(A2)-FIND(",",A2,FIND(",",A2,1)+1)))
This does rely on the state being the last 3 characters and the postcode after the first comma...

There are several ways of solving that. I will show a solution using a few basic functions: RIGHT, MID and LEN.
Assuming your data is in A1:
=RIGHT(A1,3)&MID(A1, LEN(A1)-8, 4)
RIGHT returns the 3 last characters from the cell A1.
MID returns the middle of the cell given a starting position and a length.
LEN supplies the length of the string; in your table the postcode always starts at position LEN(A1)-8.
It is essential that your data is as well organized as in the image for this to work. One alternative is to find the first ", " instead, as below.
=RIGHT(A1,3)&MID(A1, FIND(", ",A1)+2,4)
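The two formulas above can be compared side by side in a short Python sketch. The helper mid imitates Excel's 1-based MID; the function names and the sample address are hypothetical and only serve to illustrate the position arithmetic.

```python
def mid(text: str, start: int, length: int) -> str:
    """Imitate Excel's MID, which counts positions from 1."""
    return text[start - 1:start - 1 + length]


def by_fixed_offset(address: str) -> str:
    # RIGHT(A1,3) & MID(A1, LEN(A1)-8, 4)
    return address[-3:] + mid(address, len(address) - 8, 4)


def by_first_delimiter(address: str) -> str:
    # RIGHT(A1,3) & MID(A1, FIND(", ",A1)+2, 4)
    find_pos = address.find(", ") + 1  # convert to Excel's 1-based FIND result
    return address[-3:] + mid(address, find_pos + 2, 4)


sample = "1 Example St, 2007, NSW"
print(by_fixed_offset(sample))     # NSW2007
print(by_first_delimiter(sample))  # NSW2007
```

Both variants assume a 3-letter state and a 4-digit postcode; the second one additionally assumes the street address contains no ", " of its own.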

Related

How can I extract an instagram username from a hyperlink to a neighboring column and keep the hyperlink on google sheets?

I'm tasked with going through a long column of instagram profile URLs and creating a new adjacent column comprised of just their usernames. I could theoretically go through the list individually, copy-pasting the part between ".com/" and the last "/" and then hyperlinking each of them, but I feel like there might be a faster way.
I've experimented with formulas trying to extract only the username, but to no avail. I also realized formula cells can't be hyperlinked, so I would also need a solution for that. Here is what I have tried so far:
Here are the input and expected output:
URL                                          User Name
http://instgram.com/stack_overflow/          stack_overflow
http://instgram.com/stackoverflowing/        stackoverflowing
http://instgram.com/stackoverflowthestack/   stackoverflowthestack
http://instgram.com/stackoverflowingstacks/  stackoverflowingstacks
The end result should look like User Name Column and be adjustable for any length of usernames (slashes and spaces among other special characters cannot be used in instagram usernames).
Also, I'm unsure why my Google Docs takes semicolons instead of the commas I'm used to with Excel, but it is what it is.
Figuring this out would save me loads of time in the long-run and I would be very appreciative.
If you can use TEXTAFTER (Office Insider Beta only, Windows: 2203 (Build 15104), Mac: 16.60 (220304)), you don't have to deal with LEFT/RIGHT/FIND functions. Enter the following formula in the first cell of the new column and fill it down:
=SUBSTITUTE(TEXTAFTER(A1,"/",-2),"/","")
Here is the output:
Alternatively, you can use TEXTBEFORE instead of SUBSTITUTE to remove the trailing /:
=TEXTBEFORE(TEXTAFTER(A1,"/",-2),"/")
Explanation
The main idea is that:
TEXTAFTER(text, delimiter, [instance_num], [match_mode], [match_end], [if_not_found])
lets you search backward through the third input argument, instance_num, when it is a negative number. We are interested in the penultimate occurrence of /, so this argument is -2.
Alternative Solution
If these functions are not available in your Excel version, the following works using the MID(text, start_num, num_chars) function:
=LET(url, A1, length, LEN(url), count, length-LEN(SUBSTITUTE(url,"/",""))-1,
startPos, FIND(" ",SUBSTITUTE(url,"/"," ",count))+1, numChars, length-startPos,
MID(url,startPos, numChars))
Note: We use LET to avoid repeating the same calculation and also to make it easier to understand. Without LET it's a bigger formula but it works too:
=MID(A1, FIND(" ",SUBSTITUTE(A1,"/"," ",
LEN(A1)-LEN(SUBSTITUTE(A1,"/",""))-1))+1, LEN(A1) -
(FIND(" ",SUBSTITUTE(A1,"/"," ",LEN(A1)-LEN(SUBSTITUTE(A1,"/",""))-1))+1))
We are looking for the penultimate /, and the name count gives its instance number:
count, length-LEN(SUBSTITUTE(url,"/",""))-1
The following formula:
startPos, FIND(" ",SUBSTITUTE(url,"/"," ",count))+1
is a way to replace just the penultimate / with a space ( ). URLs don't contain spaces, so it is a safe replacement. Finding the position of that space and adding 1 gives the starting position required by MID. With the starting position and the length of the URL we can calculate the number of characters (numChars), and finally invoke MID with the required input arguments.
Note: The approach you tried in your question relies on a specific pattern in the URL (ending with .com). The approaches above have no such constraint, so they also work with a URL such as https://www.redcross.org/.
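The marker trick used by the MID-based formula can be traced step by step in a small Python sketch. The function name is hypothetical, and the sketch assumes, as the formula does, that the URL ends with a / and contains no spaces.

```python
def username_from_url(url: str) -> str:
    """Return the text between the penultimate and the last '/'."""
    # count = LEN(url) - LEN(SUBSTITUTE(url,"/","")) - 1
    count = url.count("/") - 1
    # SUBSTITUTE(url,"/"," ",count): replace only the count-th "/" with a space.
    pieces = url.split("/")
    marked = "/".join(pieces[:count]) + " " + "/".join(pieces[count:])
    # FIND(" ", ...) + 1, expressed as a 0-based Python index.
    start = marked.find(" ") + 1
    # Drop the trailing "/" by stopping one character before the end.
    return url[start:len(url) - 1]


print(username_from_url("http://instgram.com/stack_overflow/"))  # stack_overflow
print(username_from_url("https://www.redcross.org/"))            # www.redcross.org
```

In Python itself, url.rstrip("/").rsplit("/", 1)[-1] would be the shorter route; the longer version is kept here only because it mirrors the formula.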

Find value within cell from a series in Excel

I have a list of addresses, such as this:
Lake Havasu,  Lake Havasu City,  Arizona.
St. Johns River,  Palatka,  Florida.
Tennessee River,  Knoxville,  Tennessee.
I would like to extract the State from these addresses and then have a column showing the abbreviated State name (AZ, FL, TN etc.).
I have a table that has the States with their abbreviation and once I extract the State, doing a simple INDEX MATCH to get the abbreviation is easy. I don't want to use text-to-columns because this file will constantly have values added to it and it would be much easier to just have a formula that does the extraction for me.
The ways I've tried to approach this that have failed so far are:
1. Some kind of SEARCH() function that looks at the full State list and tries to find a value that exists in the cell
2. A MID or RIGHT approach to only capture the last section, but I can't work out how to have FIND only look for the second ", "
3. A version of INDEX MATCH, but that fails because I can't find a good way to search or find the values as per approach (1)
Any help would be appreciated!
Please try this formula, where A2 is the original text.
=FILTERXML("<data><a>" & SUBSTITUTE(A2,", ","</a><a>") & "</a></data>","data/a[3]")
An alternative would be to look for the 2nd comma as shown below. Note that the "50" in the formula is an arbitrary upper bound required by the MID() function. It must not be smaller than the number of characters you need to return.
CHAR(160) (the non-breaking space) is used as a marker because it shouldn't naturally occur in your text, although it might if the text comes from a UNIX database. In that case, replace it with another character that never appears in your data.
=TRIM(MID(A2, FIND(CHAR(160),SUBSTITUTE(A2,",",CHAR(160),2)) + 1,50))
The following variation of the above would remove the final period. It will fail if there is anything following the period, such as an unwanted blank. That could be accommodated within the formula as well but it would be easier to treat the original data, if that is an option for you.
=TRIM(MID(LEFT(A2, LEN(A2)-1), FIND(CHAR(160),SUBSTITUTE(A2,",",CHAR(160),2)) + 1,50))
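For readers who want to trace the marker technique outside Excel, here is a hedged Python sketch of the same steps: replace only the n-th comma with a marker character, find the marker, and take what follows. The function name and the limit parameter are illustrative assumptions, and the final period is removed with rstrip rather than LEFT.

```python
MARKER = "\u00a0"  # the character produced by CHAR(160)


def text_after_nth(text: str, delim: str, n: int, limit: int = 50) -> str:
    """Return up to `limit` characters after the n-th `delim`, space-trimmed."""
    pieces = text.split(delim)
    if len(pieces) <= n:
        # Excel's FIND would return an error here; raise to mirror that.
        raise ValueError("delimiter occurs fewer than n times")
    # SUBSTITUTE(text, delim, MARKER, n): replace only the n-th occurrence.
    marked = delim.join(pieces[:n]) + MARKER + delim.join(pieces[n:])
    pos = marked.find(MARKER)
    # MID(text, pos + 1, limit) followed by TRIM (spaces only).
    return text[pos + 1:pos + 1 + limit].strip(" ")


state = text_after_nth("Lake Havasu,  Lake Havasu City,  Arizona.", ",", 2)
print(state)              # Arizona.
print(state.rstrip("."))  # Arizona
```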
To find the abbreviation, I would recommend using VLOOKUP rather than INDEX/MATCH.
Use this (screenshot refers):
=MID(MID(B3,1,LEN(B3)-1),SEARCH(",",B3,SEARCH(",",B3,1)+1)+3,LEN(B3))

Extracting text from complex string in excel

The attached image (link: https://i.stack.imgur.com/w0pEw.png) shows a range of cells (B1:B7) from a table I imported from the web. I need a formula that allows me to extract the names from each cell. In this case, my objective is to generate the following list of names, where each name is in its own cell: Erik Karlsson, P.K. Subban, John Tavares, Matthew Tkachuk, Steven Stamkos, Dustin Brown, Shea Weber.
I have been reading about left, right, and mid functions, but I'm confused by the irregular spacing and special characters (i.e. the box with question mark beside some names).
Can anyone help me extract the names? Thanks
Assuming that your cells follow the same format, you can use a variety of text functions to get the name.
This function requires the following format:
Some initial text, followed by
2 new lines in Excel (each represented by CHAR(10))
The name, which consists of a first name, a space, then a last name
A second space on the same line as the name, followed by some additional text.
With this format, you can use the following formula (assuming your data is in an Excel table, with the column of initial data named Text):
=MID([@Text],SEARCH(CHAR(10),[@Text],SEARCH(CHAR(10),[@Text])+1)+1,SEARCH(" ",MID([@Text],SEARCH(CHAR(10),[@Text],SEARCH(CHAR(10),[@Text])+1)+1,LEN([@Text])),SEARCH(" ",MID([@Text],SEARCH(CHAR(10),[@Text],SEARCH(CHAR(10),[@Text])+1)+1,LEN([@Text])))+1)-1)
To come up with this formula, we take the following steps:
First, we figure out where the name starts. We know this occurs after the 2 new lines, so we use:
=SEARCH(CHAR(10),[@Text],SEARCH(CHAR(10),[@Text])+1)+1
The inner SEARCH (evaluated first) finds the first new line, and the outer SEARCH finds the 2nd new line, starting just after the first.
Now that we have that value, we can use it to get the rest of the string (everything after the 2 new lines). Let's say that the previous formula was stored in a table column called Start of Name. The 2nd formula will then be:
=MID([@Text],[@[Start of Name]],LEN([@Text]))
Note that we're using the length of the entire text, which by definition is more than we need. However, that's not an issue, since MID simply stops at the end of the text when asked for more characters than remain. Let's say this result is stored in a column called Rest of String.
Once we have the text from the start of the name on, we need to calculate the position of the 2nd space (where the name ends). To do that, we first need the position of the first space. This is similar to how we calculated the start of the name earlier (which starts after 2 new lines). The function we need is:
=SEARCH(" ",[@[Rest of String]],SEARCH(" ",[@[Rest of String]])+1)-1
So now we know where the name starts (after 2 new lines) and how long it is (up to the 2nd space). Assuming we have these numbers stored in columns named Start of Name and To Second Space respectively, we can use the following formula to get the name:
=MID([@Text],[@[Start of Name]],[@[To Second Space]])
This is equivalent to the first formula: The difference is that the first formula doesn't use any "helper columns".
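The helper-column steps can also be followed in a short Python sketch. The sample cell contents are hypothetical (the real cells are only visible in the linked image); all the sketch assumes is the format listed above: some text, 2 new lines, then a first name, a space, a last name, a second space, and more text.

```python
def extract_name(text: str) -> str:
    """Return the two words that follow the second newline in `text`."""
    # Start of Name: the position just after the 2nd newline.
    first_newline = text.index("\n")
    start = text.index("\n", first_newline + 1) + 1
    # Rest of String: everything from the start of the name onward.
    rest = text[start:]
    # To Second Space: the name ends just before the 2nd space.
    first_space = rest.index(" ")
    second_space = rest.index(" ", first_space + 1)
    return rest[:second_space]


# Hypothetical cell contents, made up to match the assumed format.
print(extract_name("1\n\nErik Karlsson D, OTT"))  # Erik Karlsson
print(extract_name("2\n\nP.K. Subban D, NSH"))    # P.K. Subban
```

As with the formula, a cell that does not match the format makes this fail: index raises ValueError where Excel would show #VALUE!.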
Of course, if any cell doesn't match this format, then you'll be out of luck. Using Excel formulas to parse text can be finicky and inflexible. For example, if someone has a middle name, or initials with spaces (e.g. if P.K. Subban were written P. K. Subban), or a suffix such as Jr., your job would be a lot harder.
Another alternative is to use regular expressions to get the data you want. I would recommend this thorough answer as a primer. Although you still have the same issues with name formats.
Finally, there's the obligatory Falsehoods Programmers Believe About Names as a warning against assuming any kind of standardized name format.

How to modify numbers at the end of a cell using a formula

I have cells in Excel containing data of the form v-1-2-1, v-1-2-10, v-1-2-100. I want to convert them to v-1-2-001, v-1-2-010, v-1-2-100. I have nearly 2000 entries.
If all of the data follows the format shown then you could use FIND to return the position of '-'. There will be three instances of this character and you need to find the third one so use the position given by the first instance as the start position parameter of the second FIND and again for the third (essentially nesting FIND). Once you have the position of the third '-' you know where the final set of numbers are (from the returned third position+1 to the LEN of the string) and could use SUBSTITUTE or a combination of other excel string functions to configure the final portion as you need it.
I'm assuming that excel has your data formatted as text.
If you need further assistance I'm happy to knock up the formula in excel but I'm off to work now and won't be able to do so for around 9 hours.
Please try:
=LEFT(A1,6)&TEXT(MID(A1,7,10),"000")
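As a cross-check of the padding logic, here is a minimal Python sketch. Instead of the fixed 6-character prefix used in the formula above, it locates the last '-' as the first answer suggests; the function name and the width parameter are illustrative assumptions.

```python
def pad_last_number(code: str, width: int = 3) -> str:
    """Zero-pad the number after the last '-' to `width` digits."""
    # Split at the last "-": head is the prefix, tail is the trailing number.
    head, _, tail = code.rpartition("-")
    return f"{head}-{int(tail):0{width}d}"


for value in ["v-1-2-1", "v-1-2-10", "v-1-2-100"]:
    print(pad_last_number(value))
# v-1-2-001
# v-1-2-010
# v-1-2-100
```

Working from the last '-' means the prefix may be longer than 6 characters (for example v-12-2-1), which the fixed LEFT(A1,6) version would not handle.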

How to find a string within a string

I have a list of about 100,000 site link strings.
Each link is unique, but each contains a consistent ?Item= parameter.
After the item number, the string either ends or continues after an & symbol.
My question is: How do I pull out the item numbers?
I know the REPLACE function offers similar functionality, but it works with fixed sizes; in my case the strings can differ in length.
Link example:
www.site.com?sadfsf?sdfsdf&adfasfd?Item=JGFGGG55555
or
www.site.com?sadfsf?sdfsdf&adfasfd?Item=JGFGGG55555&sdafsdfsdfsdf
In both cases I need to get JGFGGG55555 only
If this always is the last portion of the string, you can use the following:
=MID(A1, FIND("?Item=", A1) + 6, 99)
This assumes:
No item number is longer than 99 characters.
No additional fields follow the item number.
Edit:
With the update to your question, it is apparent you have some strings with additional data after the ?Item= field. Without using VBA there is not a simple means of using MID and FIND to extract this.
However you could create a column which acts as a placeholder.
For example, create a helper column (say column B) using:
=MID(A1,FIND("?Item=",A1)+6,99)
For the second example link, this gives the following value: JGFGGG55555&sdafsdfsdfsdf
Next, create another column using:
=IF(ISERROR(FIND("&",B1)),B1,LEFT(B1,FIND("&",B1)-1))
This produces: JGFGGG55555 by searching the first value for a & and using the portion before it. If it is not found, the first value is simply repeated.
This formula should work for both the examples given:
=MID(A1,FIND("=",A1)+1,IFERROR(FIND("&",A1,FIND("=",A1))-FIND("=",A1)-1,LEN(A1)-FIND("=",A1)))
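The extraction these answers describe (take whatever follows ?Item=, up to the next & if there is one) can be sketched in Python to check both example links. The function name and the key parameter are hypothetical.

```python
def item_number(link: str, key: str = "?Item=") -> str:
    """Return the value that follows `key`, stopping at the next '&' if any."""
    # The value starts right after the key (FIND("?Item=", A1) + 6 in Excel).
    start = link.index(key) + len(key)
    # Look for an "&" only after the key; earlier "&" characters are ignored.
    end = link.find("&", start)
    return link[start:] if end == -1 else link[start:end]


print(item_number("www.site.com?sadfsf?sdfsdf&adfasfd?Item=JGFGGG55555"))
# JGFGGG55555
print(item_number("www.site.com?sadfsf?sdfsdf&adfasfd?Item=JGFGGG55555&sdafsdfsdfsdf"))
# JGFGGG55555
```

Starting the search for & after the key matters here, because both example links already contain an & before ?Item=.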

Resources