Extract Text from URL in Excel

I'd like to do the following in Excel:
http://www.examplesite.com/ABCD123.php --> /ABCD123.php
http://www.examplesite.com/folder/EFG567.php --> /folder/EFG567.php
Any help will be highly appreciated.

This is more generic, but it assumes that .com/ always appears in the web address (clearly not a robust assumption once you consider .co.uk etc.). The + 3 skips ".com" but keeps the "/" that follows it:
=RIGHT(A1,LEN(A1)-(FIND(".com/",A1,1) + 3))

Actually figured it out:
=RIGHT(H8,LEN(H8)-26)
where 26 is the number of characters in the prefix to drop (http://www.examplesite.com).
There must be a more elegant/general solution for this though (i.e. finding the number of characters before the "/" that follows ".com").

This is quite hacky as it assumes that the address will always start with http://
=MID(A1,FIND("/",A1,8),LEN(A1)+1-FIND("/",A1,8))
Translated:
Character position 8 is the first position after the http:// part.
Starting from position 8, find the position of the first occurrence of "/".
Now subtract that position from the overall length of the string and add 1 to avoid losing the final character.
Now extract the substring starting at the position of that first "/" and extending for the number of characters we just calculated.
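
The steps above can also be sketched outside the spreadsheet. The Python function below is only an illustration, not part of the original answer: the name path_from_url and the scheme_len parameter are made up for this example, and it shares the formula's assumption that the address starts with http://.

```python
def path_from_url(url: str, scheme_len: int = 7) -> str:
    """Return everything from the first "/" after the scheme onward."""
    # Python index 7 corresponds to Excel's character position 8,
    # i.e. the first character after "http://" (assumed prefix).
    slash = url.find("/", scheme_len)
    if slash == -1:
        # No path at all; Excel's FIND would raise #VALUE! here.
        return ""
    # Slicing to the end replaces the LEN(A1)+1-FIND(...) length arithmetic.
    return url[slash:]


print(path_from_url("http://www.examplesite.com/ABCD123.php"))        # /ABCD123.php
print(path_from_url("http://www.examplesite.com/folder/EFG567.php"))  # /folder/EFG567.php
```

Like the formula, this breaks for https:// addresses unless scheme_len is changed to 8.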

Related

Nesting Excel formulas to extract e-mail address top-level domain

I want to extract the top-level domain from e-mail addresses using Excel formulas.
I first tried nesting RIGHT(..) formulas and splitting at the dot. Sadly, I do not know how to do this recursively with Excel formulas, so I switched to discarding everything except the last 4 characters. The problem now is that when I split my formulas into separate cells, it works perfectly fine, but when I combine them, I only get the output of the first inner formula. How do I fix this?
=RIGHT(B8; LEN(B8)-(LEN(B8)-4))
=RIGHT(BF8;LEN(BF8)-FIND(".";BF8))
These are the formulas split into single cells. And here both together
=RIGHT(RIGHT(B8; LEN(B8)-(LEN(B8)-4));LEN(B8)-FIND(".";B8))
I get the same return value as in the first row from this formula
=RIGHT(B8; LEN(B8)-(LEN(B8)-4))
This =RIGHT(B8; LEN(B8)-(LEN(B8)-4)) is just a uselessly complicated version of =RIGHT(B8; 4).
Substituting this for BF8 in
=RIGHT(BF8;LEN(BF8)-FIND(".";BF8))
yields this
=RIGHT(RIGHT(B8; 4);LEN(RIGHT(B8; 4))-FIND(".";RIGHT(B8; 4)))
which can be simplified as
=RIGHT(RIGHT(B8; 4);4-FIND(".";RIGHT(B8; 4)))
So that's the answer to your question.
But note that this will fail when parsing e-mail addresses whose top-level domain name has more than 3 characters! So it won't work for e.g. test@test.info. Note that top-level domains can be up to 63 characters long!
In this earlier answer, I give a more general solution to this problem, not limited to searching a predetermined number of characters from the right.
=MID(B8;FIND(CHAR(1);SUBSTITUTE(B8;".";CHAR(1);LEN(B8)-LEN(SUBSTITUTE(B8;".";""))))+1;LEN(B8))
returns everything after the last . in the string.
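
The trick in that formula is to count the dots, mark the last one, and take what follows it. A rough Python sketch of the same idea, assuming a single-character separator (the function name text_after_last is hypothetical and only used for illustration):

```python
def text_after_last(text: str, sep: str = ".") -> str:
    """Return the part of text after the last occurrence of sep."""
    # Mirrors LEN(B8)-LEN(SUBSTITUTE(B8;".";"")): the number of separators.
    # This count is only valid for a one-character separator (assumption).
    count = len(text) - len(text.replace(sep, ""))
    if count == 0:
        # No separator; the Excel formula would return #VALUE! here.
        return ""
    # Walk to the count-th occurrence, which is what SUBSTITUTE's
    # instance number selects in the formula.
    pos = -1
    for _ in range(count):
        pos = text.find(sep, pos + 1)
    # Mirrors MID(B8; position+1; LEN(B8)).
    return text[pos + 1:]


print(text_after_last("test@test.info"))  # info
```

In everyday Python, text.rpartition(".")[2] does the same job; the longer version is shown only because it follows the formula step by step.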
A dot may also appear in the local part of an e-mail address, e.g. john.johnson@email.com.
So you can't just search for "."; you first need to find the @, then find the dot in the substring to its right.
These are the steps:
1. =FIND("@"; B8)
finds the position of the @ character
2. =RIGHT(B8;LEN(B8) - FIND("@"; B8))
gets the substring to the right of the @
3. =FIND(".";RIGHT(B8;LEN(B8) - FIND("@"; B8)))
finds the first "." in the step 2 substring
4. =RIGHT(RIGHT(B8;LEN(B8) - FIND("@"; B8)); LEN(RIGHT(B8;LEN(B8) - FIND("@"; B8))) - FIND(".";RIGHT(B8;LEN(B8) - FIND("@"; B8))))
computes RIGHT(step2; LEN(step2) - step3), i.e. everything after that dot
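
The four steps above map onto a few lines of Python. This is a sketch rather than the answerer's code; it assumes the address uses the usual @ separator, and the name domain_suffix is invented for the example.

```python
def domain_suffix(address: str) -> str:
    """Return everything after the first dot that follows the @."""
    # Step 1: locate the @ character.
    at = address.find("@")
    if at == -1:
        raise ValueError("not an e-mail address: no @ found")
    # Step 2: keep only the substring to the right of the @.
    domain = address[at + 1:]
    # Step 3: locate the first dot inside that substring.
    dot = domain.find(".")
    if dot == -1:
        return ""
    # Step 4: keep what follows that dot.
    return domain[dot + 1:]


print(domain_suffix("john.johnson@email.com"))  # com
```

Because step 3 looks for the first dot in the domain, an address such as a.b@mail.co.uk yields co.uk rather than uk.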

How can I split a phrase into a new line every x characters on Google Sheets?

I am translating a game, and the game's text box only supports 50 characters max per line. Is there a way to use a formula to split the entire sentence every 50 characters or whole word (49, 48, 47, etc)?
I am currently working with this formula.
=JOIN(CHAR(10),SPLIT(REGEXREPLACE(A1, "(.{50})", "/$1"),"/"))
The problem with this formula is that it splits at exactly 50 characters (only once), and it will split in the middle of a word.
So again, my goal is to have it not split on the 50th character IF the 50th character is in the middle of the word, and for the rule to apply for the rest of the lines too because it only applies on the first line.
Please take a look at this test google sheet to get an example of what I am talking about.
If it's impossible to do it on Google Sheets, I don't mind moving to Excel provided I get a functioning code.
For the record, I did ask in Google's product forums 2 days ago, and still haven't received an answer.
=REGEXREPLACE(A1, "(.{1,50})\b", "$1" & CHAR(10))
{50} matches exactly 50 characters, but what you need is 50 or fewer.
\b is a word boundary: it matches between a word character (letter, digit or underscore) and a non-word character.
= REGEXEXTRACT(A1,"(?ism)^"&REPT("([\w\d'\(\),. ]{0,49}\s)", ROUNDUP(LEN(A1)/50,0))&"([\w\d'\(\),. ]{0,49})$")
Tested with various expressions and works as intended. Note that only letters, digits, underscores, spaces and the characters '(),. are allowed, which means - and other characters not mentioned will not work. If you need them, add them to the character class inside the REPT expression and to the final group of the formula.
You are pretty close. I'm not an expert in Sheets, so not sure if this is the best way, but your Regex is wrong for what you want.
Also, you need to be certain that you don't use a split character that might appear in the phrase itself. However, using CHAR(10) for the replace character allows you to insert LF without going through the JOIN SPLIT sequence.
Replace any line feeds, carriage returns and spaces with a single space.
Match strings that start with a non-space character followed by up to 49 more characters, which are followed by a space or the end of the string.
Replace each match with its capture group followed by CHAR(10) (which deletes the space that followed it).
There will be an extra CHAR(10) at the end, which you can strip off.
EDIT Regex changed slightly due to a difference in behavior between Google's RE and what I am used to (probably has to do with how a non-backtracking regex works). The problem showed up on your example:
=regexreplace(REGEXREPLACE(REGEXREPLACE(A1 & " ","[\r\n\s]+"," "),"(\S.{0,49})\s","$1" & char(10)),"\n+\z","")
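
For readers who want to try the three nested replacements outside Sheets, here is a rough Python equivalent. It is a sketch under assumptions: wrap_at_spaces and its width parameter are names made up for the example, Python's re engine backtracks whereas Sheets uses RE2, and no word is assumed to be longer than the width.

```python
import re


def wrap_at_spaces(text: str, width: int = 50) -> str:
    """Break text into lines of at most width characters at spaces."""
    # Step 1: collapse line breaks and runs of spaces into single spaces.
    # The appended space guarantees the last word is followed by whitespace.
    normalized = re.sub(r"[\r\n\s]+", " ", text + " ")
    # Step 2: a non-space character plus up to width-1 more characters,
    # followed by one whitespace character that is replaced by a newline.
    pattern = r"(\S.{0," + str(width - 1) + r"})\s"
    wrapped = re.sub(pattern, r"\1\n", normalized)
    # Step 3: strip the extra newline(s) left at the very end.
    return re.sub(r"\n+\Z", "", wrapped)


print(wrap_at_spaces("ab cd efg", width=5))
```

Python's standard library also offers textwrap.fill(text, width=50) for the same purpose, which may be the simpler choice if the work moves out of a spreadsheet.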

Search within Search Function in Excel

I was going through the Search Function syntax and examples as per the following Office support document here.
Although the overall steps are clear as per the image attached here, one specific portion
SEARCH("""",A2,SEARCH("""",A2)+1)
is not quite clear to me. Could someone explain how it leads to 10 in the results?
It leads to 10 because it is counting from the beginning as in the first search function. The only difference is that you are searching for the second ". The first " is the fifth character in the sentence, while the second is the tenth. You are still looking at the same cell, same characters.
If you want to start the count after the first " is located, you can look at the right part following the first quotation.
SEARCH("""",RIGHT(A2,LEN(A2)-SEARCH("""",A2)))
It just finds the second occurrence of the double quotation mark. This is the simplest way to do such things, because SEARCH only finds the first occurrence and returns its position.
SEARCH() finds the position of the requested character, in this case a quotation mark. The embedded second SEARCH() gives the outer search a new start position, just after the first quotation mark at position 5. Since the outer search starts at position 5 + 1, the next quotation mark it finds is the one at position 10.
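
The nested call can be reproduced in Python to see where the 10 comes from. The sample string below is an assumption chosen so that the quotation marks sit at positions 5 and 10, as described in the answers; quote_positions is a hypothetical helper name.

```python
def quote_positions(text: str, char: str = '"') -> tuple[int, int]:
    """Return the 1-based positions of the first and second char."""
    # Inner search: first occurrence (str.find is 0-based).
    first = text.find(char)
    # Outer search: same string, but starting one past the first hit.
    second = text.find(char, first + 1)
    # Convert to Excel-style 1-based positions, counted from the start
    # of the whole string, not from the new start position.
    return first + 1, second + 1


print(quote_positions('The "boss" is here.'))  # (5, 10)
```

The start argument only changes where the search begins; the returned position is still counted from the first character of the string.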

Remove everything after certain string in Excel 2007

   A                                        B
1  www.harborfreight.com/                   www.harborfreight.com
2  totsy.com                                totsy.com
3  www.totsy.com/customer/account/login/    www.totsy.com/customer/account/login
4  www.pandawill.com/                       www.pandawill.com
I am trying to reduce the above Column A values to their simplest domain name form by removing every character after the first "/". It doesn't work on line 3 above using this formula:
=IF(ISERROR(SEARCH("/",A3)),A3,TRIM(LEFT(A3,FIND("|",SUBSTITUTE(A3,"/","|",LEN(A3)-LEN(SUBSTITUTE(A3,"/",""))))-1)))
Obviously my formula above seems to be stripping every character after the last "/". Can you please recommend the correct change?
Thanks,
Dan
Your formula seems very convoluted to me. Is there a reason you are messing with substitutions?
This seems to work fine for me:
=IFERROR(LEFT(A1,FIND("/",A1)-1),A1)
It returns the string before the first /, or the whole string if no / is found.
Use this formula:
=MID(A1;1;IF(ISERR(FIND("/";A1));LEN(A1);FIND("/";A1)-1))
It finds the "/" character and returns the text before it. If there is no "/", it returns the whole string instead of an error.
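
As a cross-check of both formulas, the same "text before the first slash, or the whole string if there is none" rule can be written in Python. This is a small sketch; before_first_slash is a name invented for the example.

```python
def before_first_slash(value: str) -> str:
    """Return the text before the first "/", or value if none exists."""
    # partition splits at the first separator only; when the separator
    # is missing, the head is simply the whole input string.
    head, _sep, _tail = value.partition("/")
    return head


for cell in ["www.harborfreight.com/", "totsy.com",
             "www.totsy.com/customer/account/login/"]:
    print(before_first_slash(cell))
```

For the third value this prints www.totsy.com, which is the result the question asks for in row 3.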

How do I get the last character of a string using an Excel function?

How do I get the last character of a string using an Excel function?
No need to apologize for asking a question! Try using the RIGHT function. It returns the last n characters of a string.
=RIGHT(A1, 1)
=RIGHT(A1)
is quite sufficient (where the string is contained in A1).
Similar in nature to LEFT, Excel's RIGHT function extracts a substring from a string starting from the right-most character:
SYNTAX
RIGHT( text, [number_of_characters] )
Parameters or Arguments
text
The string that you wish to extract from.
number_of_characters
Optional. It indicates the number of characters that you wish to extract starting from the right-most character. If this parameter is omitted, only 1 character is returned.
Applies To
Excel 2016, Excel 2013, Excel 2011 for Mac, Excel 2010, Excel 2007, Excel 2003, Excel XP, Excel 2000
Since number_of_characters is optional and defaults to 1 it is not required in this case.
However, trailing spaces are a common source of problems. If they are a risk and you want the last visible character, then:
=RIGHT(TRIM(A1))
might be preferred.
Looks like the answer above was a little incomplete. Try the following:
=RIGHT(A2,(LEN(A2)-(LEN(A2)-1)))
Obviously, this is for cell A2.
This combines RIGHT and LEN. LEN returns the length of the string, and LEN(A2)-(LEN(A2)-1) always evaluates to 1, so the formula is equivalent to =RIGHT(A2,1). If you wanted the last two characters, you'd change the -1 to -2, and so on.
RIGHT then returns that many characters from the end of the string.
This works well combined with an IF statement. I use it to check whether the last character of a text string is a specific character and remove it if it is. See the example below, which strips a comma from the end of a text string:
=IF(RIGHT(A2,(LEN(A2)-(LEN(A2)-1)))=",",LEFT(A2,(LEN(A2)-1)),A2)
Just another way to do this:
=MID(A1, LEN(A1), 1)
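
For comparison, the last-character lookup and the comma-stripping IF formula above could look like this in Python. It is only a sketch; both function names are hypothetical.

```python
def last_char(text: str) -> str:
    """Return the last character, like =RIGHT(A1) or =MID(A1, LEN(A1), 1)."""
    # Slicing (rather than text[-1]) returns "" for an empty string
    # instead of raising an error.
    return text[-1:]


def strip_trailing_comma(text: str) -> str:
    """Drop one trailing comma, mirroring the IF/RIGHT/LEFT formula."""
    # IF(RIGHT(A2,1)=",", LEFT(A2,LEN(A2)-1), A2)
    return text[:-1] if last_char(text) == "," else text


print(last_char("abc"))               # c
print(strip_trailing_comma("a,b,"))   # a,b
```

As with the formula, only a single trailing comma is removed; a value ending in ",," keeps one of them.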
