Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 3 years ago.
I have an Excel spreadsheet of clients. One of the cells is a CLOB pulled from our database and contains hundreds of words.
What I want to do is remove everything except the text which starts after "phrase1" and ends before "phrase2". Is there any way to do this in Excel?
i.e.
Text is
a quick brown fox jumped over the lazy dog
I want everything between "brown" and "over", output will be "fox jumped"
Many thanks
Edit: the "I want" wasn't supposed to sound rude, it's just how I would write a requirement in the simplest way to understand. I've tried TRIM and custom filtering but no clue where to go from here.
Here is one approach, although I'm sure there are more compact versions by some formula wizards.
=MID(A1,SEARCH(A2,A1)+LEN(A2),SEARCH(A3,A1)-SEARCH(A2,A1)-LEN(A2))
This assumes your original string is in A1, first substring in A2, second substring in A3.
To also drop the leading and trailing space, use this adjustment (it assumes exactly one space after the first substring and one before the second):
=MID(A1,SEARCH(A2,A1)+LEN(A2)+1,SEARCH(A3,A1)-SEARCH(A2,A1)-LEN(A2)-2)
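If the number of surrounding spaces is not known in advance, a slightly more defensive variant is to wrap the first formula in TRIM() and to start the second SEARCH after the end of the first substring, so that an earlier occurrence of the second substring is ignored. This is only a sketch under the same assumptions as above (string in A1, first substring in A2, second substring in A3). Note that TRIM also collapses runs of spaces inside the result to a single space.

```excel
=TRIM(MID(A1,SEARCH(A2,A1)+LEN(A2),SEARCH(A3,A1,SEARCH(A2,A1)+LEN(A2))-SEARCH(A2,A1)-LEN(A2)))
```

With the example sentence in A1, "brown" in A2 and "over" in A3, this returns "fox jumped". If either substring is missing, SEARCH returns #VALUE!, so you may want to wrap the whole thing in IFERROR().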
Closed 12 days ago.
I have a data set in which I want to delete "SO" when it comes at the end of a string, but not when it appears in the middle or at the start.
Example:
Aurangabad SO will become Aurangabad
Delhi South SO will become Delhi South
A plain find and replace would also change "South" to "outh". How can I avoid replacing the string when it occurs in the middle?
Within Google Sheets you can try:
=INDEX(IF(LEN(A:A),REGEXREPLACE(A:A,"(?i)\sSO$",""),))
In Excel the logic is simple: take the total length with LEN() and keep everything except the last 3 characters, i.e. " SO".
It should be similar in Google Sheets, but do check if needed.
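The description above corresponds to a formula along these lines. It is a sketch that assumes the text is in A1 and that every entry really does end in " SO", since it removes the last three characters unconditionally.

```excel
=LEFT(A1,LEN(A1)-3)
```

For "Delhi South SO" in A1 this returns "Delhi South".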
However, if any entry does not end in "SO" then you can get round that with this:
Check if the last three characters are " SO", and if so, remove the last three characters.
=IF(RIGHT(A1,3)=" SO",LEFT(A1,LEN(A1)-3),A1)
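One thing to be aware of: the = comparison in Excel ignores case, so the formula above would also strip a trailing " so" or " So". If only an upper-case " SO" should be removed, a possible variant uses EXACT(), which compares case-sensitively. As before, A1 is assumed to hold the text.

```excel
=IF(EXACT(RIGHT(A1,3)," SO"),LEFT(A1,LEN(A1)-3),A1)
```

Entries with trailing spaces after "SO" would not match either version; wrapping A1 in TRIM() first is one way around that.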
If you need to modify the original data you can do Find and Replace in Google Sheets with Regular Expressions. Put SO$ as the value to find and be sure to tick Search using regular expressions - then it will only change "SO" when it's at the end of a string
The SUBSTITUTE function in Excel is case-sensitive, so =SUBSTITUTE("Delhi South SO", " SO", "") returns "Delhi South". If you have cases where there might be other " SO" strings you don't want to delete, then check the other answers that use regex and Google Sheets.
Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 6 months ago.
I'm trying to find the average of a large array of candidates' compensation. Some of the cells contain text with multiple numbers showing a range, such as "$100k - $120k". Others are labeled as TC ("120k TC") for total compensation.
How can I find the average of these numbers with something along the lines of substituting letters or parsing the string into a number WITHOUT changing the actual values listed? I do NOT want to mutate the original cell values. I only want to average them all through a formula that works around the additional "k", "TC" and "-", which make the cells un-averageable because they are not parsed as numbers.
You would need to clean up the text in stages.
First, find out whether a certain piece of text is present, e.g.
=IF(IFERROR(FIND("-",A1,1),"")<>"","- is present","")
=IF(IFERROR(FIND("TC",A1,1),"")<>"","TC is present","")
=IF(IFERROR(FIND("$",A1,1),"")<>"","$ is present","")
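Then split out the left and right values if "-" is present (the left part stops one character before the dash, the right part is everything after it), e.g.
=LEFT(A1,FIND("-",A1,1)-1)
=RIGHT(A1,LEN(A1)-FIND("-",A1,1))
Then, if the extra texts are present, remove them, e.g.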
=SUBSTITUTE(A1,"-","")
=SUBSTITUTE(A1,"$","")
=SUBSTITUTE(A1,"k","")
Finally, use TRIM() to remove spaces at the ends, VALUE() to convert the text to a number, and so on.
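Put together, the stages above might look like the following sketch, which uses two helper columns so the original cells in column A are never modified. The layout is an assumption for illustration: the first formula goes in B1, the second in C1 (both filled down), and the third anywhere, with C1:C100 standing in for however many rows you have. It also assumes a lower-case "k" meaning thousands, an ordinary hyphen as the range separator, and that a range should count as its midpoint.

```excel
=SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A1,"$",""),"k",""),"TC","")," ","")
=IFERROR(1000*IF(ISNUMBER(FIND("-",B1)),(VALUE(LEFT(B1,FIND("-",B1)-1))+VALUE(MID(B1,FIND("-",B1)+1,LEN(B1))))/2,VALUE(B1)),"")
=AVERAGE(C1:C100)
```

For "$100k - $120k" the helper cell B1 becomes "100-120" and C1 becomes 110000; for "120k TC" they become "120" and 120000. Cells that cannot be parsed end up as empty text in column C, which AVERAGE ignores. SUBSTITUTE is case-sensitive, so an upper-case "K" would need its own SUBSTITUTE.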
Closed 2 years ago.
Hi, I have a huge set of data with thousands of columns. From one of the columns I need to extract a certain string pattern, e.g. 41242456-2020-12 or 41242456-2020-2 or 41242456-2020-200 (8-digit number, year, 1 to 3 digit number), which is mixed in with other text in the string. Most of the time the numbers appear at the beginning, but sometimes it looks like the following:
Blah Blah LEX#41242456-2020-12BLABLABLAH
Blah Blah LEXIDA ID:41242456-2020-12BLAHBLAHBLAH etc.
Hence unable to extract them fully through one formula.
Is there a way I can use a formula or VBA code to extract only 41242456-2020-12 and remove all other characters?
Look up how to use regular expressions in Excel.
The regular expression you want to match against is \d{8}-[12]\d{3}-\d{1,3} which means
eight digits
a dash
a "1" or a "2" (because if the year starts with 3 or 0, I assume it's not a valid year)
three digits
a dash
one to three digits
You might want to use (\d{8})-([12]\d{3})-(\d{1,3}) so that a match also gives you the three numbers separately. Parentheses in regular expressions form capture groups, meaning 'return what matched this part.'
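As a formula, the pattern could be applied like this. This is a sketch that assumes the text is in A1 and that the REGEXEXTRACT function is available: it exists in Google Sheets and in recent Microsoft 365 versions of Excel, while older Excel versions have no built-in regex function and would need VBA instead.

```excel
=REGEXEXTRACT(A1,"\d{8}-[12]\d{3}-\d{1,3}")
```

For "Blah Blah LEX#41242456-2020-12BLABLABLAH" this returns "41242456-2020-12". If no match is found the function returns an error, so wrapping it in IFERROR() may be useful when filling it down a whole column.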
Closed 3 years ago.
I have a column in my data containing the following type of data
WRO2->DHLPAKET-ASCHHEIM-DI
Each group of words has its own significance.
I am looking for a way to extract the characters between the second and the third minus sign. (In this case it's "ASCHHEIM". It might change from row to row, so extracting it based on a fixed character position would be futile.)
I want to extract whatever is between those two "-" characters and have it appear in a column of its own.
In your case, if the suggested method Text-To-Columns is not an option somehow, you could use:
=TRIM(MID(SUBSTITUTE(A1,"-",REPT(" ",LEN(A1))),2*LEN(A1)+1,LEN(A1)))
The 2*LEN(A1) part stands for (N-1)*LEN(A1), where N is the position of the "-"-separated 'word' you want; in this case N is 3, the third 'word'.
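To make the position configurable, the same formula can read N from a cell. In this sketch the text is in A1 and N is in B1; B1 is a hypothetical helper cell that is not part of the original answer.

```excel
=TRIM(MID(SUBSTITUTE(A1,"-",REPT(" ",LEN(A1))),(B1-1)*LEN(A1)+1,LEN(A1)))
```

With WRO2->DHLPAKET-ASCHHEIM-DI in A1 and 3 in B1, this returns "ASCHHEIM". The trick works by replacing every "-" with as many spaces as the string is long, so each 'word' lands in its own window of LEN(A1) characters, and TRIM then removes the padding. Because of the TRIM, any repeated spaces inside a 'word' would be collapsed to one.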
Closed 8 years ago.
I have an Excel spreadsheet in which one column is for accession numbers. While importing the accession numbers, the person imported the filenames instead of just the numbers. So now the accession 'numbers' look like:
SRA002989.sra
SRA002986.sra
....
Is there a way to strip off the extensions and just keep the first part like SRA002989, SRA002986 etc.?
Try using the SUBSTITUTE() function. If your data is in Column A, the following will work and can be copied around as necessary.
=SUBSTITUTE(A1,".sra","")
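SUBSTITUTE is case-sensitive, so an entry ending in ".SRA" would be left unchanged. If the extensions vary, an alternative sketch is to keep everything before the first dot. It assumes the data is in A1 and that the part you want to keep contains no dot itself.

```excel
=IFERROR(LEFT(A1,FIND(".",A1)-1),A1)
```

For "SRA002989.sra" this returns "SRA002989"; the IFERROR leaves entries without any dot as they are.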
The simplest approach would be to select the column, press CTRL + H for search and replace and to search for ".sra" and replace it with an empty value (just leave the second field empty).