Excel: Remove text before string for multiple possible strings

With Excel, I need to find and remove some text from all cells in a column. Using the example below, I need to get all instances of DEV* and BA* into another column.
Example data in a column:
Data
Dan Smith DEV001
Bob Jones BA005
Bob Jones2 BA 005
Needed Result
DEV001
BA005
BA 005
This example works partially but not with multiple possible matches.
=TRIM(RIGHT(A2, LEN(A2) - SEARCH("DEV", A2)))
How can this be done with multiple possible matches?

Try using this
• Formula used in cell B1
=REPLACE(A1,1,MAX(IFERROR(FIND({"DEV","BA"},A1),""))-1,"")

Here is something to consider for those using Microsoft 365 with access to TEXTBEFORE():
Formula in B1:
=SUBSTITUTE(A1,TEXTBEFORE(A1,{"DEV","BA"}),"",1)
TEXTBEFORE() - looks (case-sensitively) for either 'DEV' or 'BA' and returns the substring before the first occurrence of either one;
SUBSTITUTE() - replaces (also case-sensitively) this returned substring with nothing. To make sure we don't substitute unwanted parts after the lookup value, we only replace the first occurrence.
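For readers who want to check the logic outside Excel, here is a minimal Python sketch of the same idea: find the earliest of several markers and keep everything from there on. The function name text_from_first_marker is made up for illustration, and the sketch assumes the markers never appear inside the name part of the cell.

```python
def text_from_first_marker(text, markers=("DEV", "BA")):
    """Return text starting at the earliest marker found, or "" if none is present."""
    # Case-sensitive positions of every marker that occurs in the text.
    positions = [text.find(m) for m in markers if m in text]
    return text[min(positions):] if positions else ""


for row in ["Dan Smith DEV001", "Bob Jones BA005", "Bob Jones2 BA 005"]:
    print(text_from_first_marker(row))
```

For the three sample rows this prints DEV001, BA005 and BA 005.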

Related

MID formula using SEARCH formula to extract text from cell

Very simple formulas following but I am missing some understanding and it seems frustratingly simple.
Very simple text extraction:
MID(A1,Start Num, Num of Chars)
A simple text-finding formula:
SEARCH(Find_text, within_text, start_num)
Combined, these two formulas can find and extract text from a field between two text characters, for instance underscores, parentheses or commas.
So for example to extract
text to extract >>> Jimbo Jones
from a cell containing parentheses an easy formula would be;
Sample text A1 = Incident Report No.1234, user (Jimbo Jones) Status- pending
formula;
=MID(A1, SEARCH("(", A1)+1, SEARCH(")", A1) - SEARCH("(", A1) -1)
Extracted text = Jimbo Jones
The logic is simple
1.Identify cell containing text
2.Find the start number by nominating a first searchable character
3.Find the end number of the text being extracted by searching for the second searchable character
4.Subtracting the Start Number from the End number gives the number of characters to extract
Without using the SEARCH formula the code is:
=MID(A1,32,11) = Jimbo Jones
But if I want to extract text between commas or other identical characters (such as quotation marks, apostrophes or asterisks), I need to use the following formula (which I found suggested):
=MID(A1, SEARCH(",", A1)+1, SEARCH(",", A1, SEARCH(",", A1) +1) - SEARCH(",",A1) -1)
Sample text A1 Incident Report No.1234 user, Jimbo Jones, Status- pending
Extracted text = Jimbo Jones
But how do I nominate other boundaries, such as text between the 3rd and 4th comma, for example?
Sample text A1 Incident Report, No.1234, user, Jimbo Jones, Status- pending
The reason for my confusion is that in the above formula Excel finds the second comma no matter where the commas are in the text, yet the formula used to find it looks identical to the one finding the first comma. The character count somehow seems to assume that I want the second comma, not the first. How do I instruct the formula to find subsequent commas, such as the 3rd, 4th or 9th?
And what am I not understanding about why the formula finds the 2nd comma?
Cheers!
To explain what you are confused about:
At first sight it looks as if the same formula is used to find the first and second occurrence of the searched symbol. On a second look you may notice the start_num argument, which tells the formula where to start looking. If you pass the position of the first symbol plus one (SEARCH(",", A1) + 1) as that argument, the formula starts looking in the remaining part, ' Jimbo Jones, Status- pending' in the two-comma sample, and returns 42, the position of the second comma counted from the start of the cell. You get the position of the first occurrence with the plain formula and the position of the second with the formula above. Subtract one from the other to get the length, and that's it.
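The role of start_num can be mirrored in Python with the start argument of str.find. This is a minimal sketch, assuming the two-comma sample from the question; keep in mind that Python counts from 0 while SEARCH counts from 1.

```python
text = "Incident Report No.1234 user, Jimbo Jones, Status- pending"

first = text.find(",")              # 0-based index of the first comma
second = text.find(",", first + 1)  # the search resumes one character later

# Excel positions are 1-based, so add 1 to compare with SEARCH.
print(first + 1, second + 1)            # 29 42
print(text[first + 1:second].strip())   # Jimbo Jones
```

The second call is written exactly like the first one; only the starting offset differs, which is why the nested SEARCH lands on the second comma.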
Possible solutions:
If you have Office 365, use TEXTAFTER() and TEXTBEFORE() as others have stated where you can pass instance number as an argument:
=TEXTAFTER(TEXTBEFORE(A1,",",4),",",3)
Result for the five-field sample: " Jimbo Jones" (note the leading space).
Then you can use TRIM() to get rid of unwanted spaces at the beginning and end.
If you use an older version of Office, you can use SUBSTITUTE() as a workaround, as it lets you change the nth occurrence of a specific symbol in a text.
Choose a symbol that does not appear in your text and change the 3rd and 4th occurrences of your searched symbol to it. Then look for them (in this example we substitute , with #):
=MID(A1,SEARCH("#",SUBSTITUTE(A1,",","#",3))+1,SEARCH("#",SUBSTITUTE(A1,",","#",4))-(SEARCH("#",SUBSTITUTE(A1,",","#",3))+1))
A brief explanation, with each intermediate formula in its own cell (B1 to B5):
B1: =SUBSTITUTE(A1,",","#",3)
B2: =SUBSTITUTE(A1,",","#",4)
B3: =SEARCH("#",B1)
B4: =SEARCH("#",B2)
B5: =MID(A1,B3+1,B4-(B3+1))
Full formula:
=MID(A1,SEARCH("#",SUBSTITUTE(A1,",","#",3))+1,SEARCH("#",SUBSTITUTE(A1,",","#",4))-(SEARCH("#",SUBSTITUTE(A1,",","#",3))+1))
Trimmed:
=TRIM(MID(A1,SEARCH("#",SUBSTITUTE(A1,",","#",3))+1,SEARCH("#",SUBSTITUTE(A1,",","#",4))-(SEARCH("#",SUBSTITUTE(A1,",","#",3))+1)))
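The same "locate the nth and (n+1)th delimiter, then cut between them" logic can be sketched in Python. The helper names nth_index and between_nth are hypothetical; instead of substituting a placeholder symbol, the sketch repeats the search from the previous hit, which gives the same positions.

```python
def nth_index(text, delim, n):
    """0-based index of the n-th occurrence of delim (n starts at 1), or -1."""
    idx = -1
    for _ in range(n):
        idx = text.find(delim, idx + 1)
        if idx == -1:
            return -1
    return idx


def between_nth(text, delim, n):
    """Text between the n-th and (n+1)-th delim, with the ends trimmed."""
    start = nth_index(text, delim, n)
    end = nth_index(text, delim, n + 1)
    if start == -1 or end == -1:
        return ""
    return text[start + len(delim):end].strip()


sample = "Incident Report, No.1234, user, Jimbo Jones, Status- pending"
print(between_nth(sample, ",", 3))  # Jimbo Jones
```

Changing the last argument picks a different pair of commas, for example 1 for the text between the 1st and 2nd.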
Thanks for the responses, all. I ground through using the two formulas I asked about (MID and SEARCH) and I have a result.
It's not pretty nor elegant but it extracts the data as per requirement. I will benefit from the tips left here in response to my question as simpler options are available.
Requirement: Extract text between 3rd and 4th Commas using MID and SEARCH
Sample text A15
Incident Report (ammended), No.12545424234, user, Jimbo Jones, Status- pending
=MID(A15,(SEARCH(",",A15,(1+(SEARCH(",",A15,SEARCH(",",A15)+1)))))+2,(SEARCH(",",A15,(SEARCH(",",A15,SEARCH(",",A15)+1)+(SEARCH(",",A15)))-(SEARCH(",",A15,SEARCH(",",A15)+1)-(SEARCH(",",A15)))+1)-(SEARCH(",",A15,(1+(SEARCH(",",A15,SEARCH(",",A15)+1))))))-2)
Text extracted
Jimbo Jones
Obviously this solution works on other text, but it's also obviously not easy to quickly amend for other text locations.
Anyway, cheers again for the pointers...

MS Excel: Extract between the Nth and Nth Character in a string

Using an MS Excel formula (No VBA/Macro), I would like to extract between the Nth and Nth characters and/or letters in a string. Example: I have text in Columns A2 and A3, I would like to extract text located between the 4th space and 9th space in each of the following strings.
Column A2: Johnny went to the store and bought an apple and some grapes
Column A3: We were not expecting to go home so early but we had no other choice due to rain
Results:
Column A2: store and bought an apple
Column A3: to go home so early
With Microsoft 365 or Excel 2019 you can use TEXTJOIN() in combination with FILTERXML():
Formula in B1:
=TEXTJOIN(" ",,FILTERXML("<t><s>"&SUBSTITUTE(A1," ","</s><s>")&"</s></t>","//s[position()>4][position()<6]"))
The XPath expression first selects all elements from the 5th word onwards and then returns only the first 5 elements of that array. Therefore //s[position()>4][position()<6] can also be written as //s[position()>4 and position()<10].
What you are looking for is a substring function, which is not really available in Excel. I would suggest looking at this tutorial for functions that do the same job, https://www.excel-easy.com/examples/substring.html, and trying to find the best one for your use case.
I would think the MID function would be the best fit for your use case.
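The word-window idea behind that XPath can be checked with a short Python sketch. The function name words_between_spaces is made up for illustration, and the sketch assumes single spaces between words, as in the sample sentences.

```python
def words_between_spaces(text, first_space, last_space):
    """Return the words lying between the given space numbers (1-based)."""
    words = text.split(" ")
    # The word after the k-th space has 0-based index k.
    return " ".join(words[first_space:last_space])


a2 = "Johnny went to the store and bought an apple and some grapes"
a3 = "We were not expecting to go home so early but we had no other choice due to rain"
print(words_between_spaces(a2, 4, 9))  # store and bought an apple
print(words_between_spaces(a3, 4, 9))  # to go home so early
```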

Extract Sub-String from String in Excel (No VBA)

I have a series of paths in excel which follow the pattern:
C:\Folder\Subfolder1\SURNAME, Firstname\Subfolder2\SURNAME, Firstname - YYYY MM DD - Invoice.pdf
I cannot use VBA, so using an array formula, how would I extract SURNAME, Firstname?
You may use:
=TRIM(MID(SUBSTITUTE(A1,"\",REPT(" ",LEN(A1))),3*LEN(A1)+1,LEN(A1)))
Where 3* could be read as n-1, so change to whichever number to get the nth substring from a delimited string.
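To see why the padding trick works, here is a minimal Python sketch of the same steps: replace each delimiter with a run of spaces as long as the whole text, cut out the nth window of that width, and trim. The function name nth_field_by_padding is hypothetical.

```python
def nth_field_by_padding(text, delim, n):
    """Return the n-th delimited field (1-based) using the REPT/MID padding trick."""
    width = len(text)
    # Each delimiter becomes `width` spaces, so every field lands in its own window.
    padded = text.replace(delim, " " * width)
    start = (n - 1) * width  # 0-based; the Excel formula uses (n-1)*LEN(A1)+1
    return padded[start:start + width].strip()


path = r"C:\Folder\Subfolder1\SURNAME, Firstname\Subfolder2\SURNAME, Firstname - YYYY MM DD - Invoice.pdf"
print(nth_field_by_padding(path, "\\", 4))  # SURNAME, Firstname
```

Because the padding is as wide as the original text, a window can never contain characters from two different fields, so trimming leaves exactly one field.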
Another option, with access to FILTERXML:
=FILTERXML("<t><s>"&SUBSTITUTE(A1,"\","</s><s>")&"</s></t>","//s[position()=4]")
This would essentially pull the 4th substring from a "\" delimited string. Change position()=4 to the nth position if you like to retrieve other substrings. This option seems a bit longer, but could become handy when you want to retrieve multiple substrings where you just need to change up the XPATH.
After your comment, I think you might want to try:
=FILTERXML("<t><s>"&SUBSTITUTE(SUBSTITUTE(A6," - ","\"),"\","</s><s>")&"</s></t>","//s[position()=last()-2]")
With data in A1, in B1 enter:
=LEFT(RIGHT(A1,LEN(A1)-FIND("#",SUBSTITUTE(A1,"\","#",LEN(A1)-LEN(SUBSTITUTE(A1,"\",""))),1)),FIND("-",RIGHT(A1,LEN(A1)-FIND("#",SUBSTITUTE(A1,"\","#",LEN(A1)-LEN(SUBSTITUTE(A1,"\",""))),1)))-2)
Or try,
In B1 copied across right until blank and all copied down :
=TRIM(MID(SUBSTITUTE("\"&MID($A1,FIND("\",$A1,4)+1,FIND("-",$A1,4)-FIND("\",$A1,4)-2),"\",REPT(" ",199)),COLUMN(A1)*399,199))

How can I remove all letters after a certain character in a column in Excel?

For example,
I have a column with email addresses and I want to remove everything before the # sign and everything after the '.' so I can attain the company names.
Such as:
Emails
loo#yahoo.com
christina#google.com
rachel#espn.com
john#apple.com
ahmed#microsoft.com
I want to create a new column that looks like this:
Companies
yahoo
google
espn
apple
microsoft
What is a function I can use to attain this new column?
With data in A2, in B2 enter:
=SUBSTITUTE(MID(A2,FIND("#",A2)+1,9999),".com","")
This will work for all emails ending in .com. If some records do not end in .com, use:
=MID(A1,FIND("#",A1)+1,FIND(".",A1,FIND("#",A1)+1)-(FIND("#",A1)+1))
This will handle records like:
darth.vader#deathstar.com
in which a dot occurs before the #
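The second formula's logic (find the #, then the first dot after it) can be sketched in Python as follows. The function name company_from_email is made up for illustration, and the # separator is taken from the sample data in the question.

```python
def company_from_email(address, at_sign="#"):
    """Return the text between the separator and the first dot that follows it."""
    at = address.find(at_sign)
    if at == -1:
        return ""  # no separator, nothing to extract
    dot = address.find(".", at + 1)  # only dots after the separator count
    return address[at + 1:] if dot == -1 else address[at + 1:dot]


for email in ["loo#yahoo.com", "ahmed#microsoft.com", "darth.vader#deathstar.com"]:
    print(company_from_email(email))
```

Starting the dot search after the separator is what keeps darth.vader#deathstar.com from being cut at the wrong dot.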
You can use the MID and FIND functions.
MID returns a substring of a text, given the text, the start position and the number of characters to extract.
FIND returns the position of a character within the text.
With your string in cell A1, enter this formula in cell A2:
=MID(A1,FIND("#",A1)+1,FIND(".",A1)-FIND("#",A1)-1)
Select the column (or copy it into a new column) and press Ctrl+H to go to Find & Replace:
Find *# and replace with nothing (keep blank).
Find .* and replace with nothing (keep blank).
Here * represents any sequence of characters.

Match lookup value lengths for match with beginning of lookup value

In Excel 2013 I have two tables.
The first contains alpha numeric codes that vary in length.
Some examples from first table:
12345.12345
12346-12345
12AB1234
123.123
23456.123
A1234567.012
01234.12345
The second table contains alpha numeric codes I need to match with the beginning of the codes in the first table. Any numeric codes are currently stored as text.
Some examples from second table:
12345
12346
123
23456
A1234567
01234
How do I return a value from a different column in the second table? For context, the return column in the second table contains a description of the codes.
I did not yet manage to find a solution using VLOOKUP or MATCH.
I also looked at using wildcards, but this only works one way, the wrong way.
The quickest solution, assuming you don't care about letters, is to use LEFT(FIND( with a substitution. If letters need to be excluded, you will need to explain how the format should be presented.
Solution: =IFERROR(LEFT(A2,FIND(".",SUBSTITUTE(A2,"-","."))-1),A2)
This formula finds the first "." or "-" and returns all characters before it. If neither is found, it returns the full ID.
If letters need to be removed as well, note that this will require heavily nested SUBSTITUTE calls or a VBA script.
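A minimal Python sketch of the prefix-then-lookup idea follows. The descriptions in the dictionary are placeholders invented for illustration, since the question does not list the real ones, and lookup_by_prefix is a hypothetical helper name.

```python
import re

# Placeholder descriptions (assumed); keys are codes from the second table.
descriptions = {
    "12345": "Description A",
    "123": "Description B",
    "A1234567": "Description C",
}


def lookup_by_prefix(code, table):
    """Cut the code at the first '.' or '-' and look the prefix up in the table."""
    prefix = re.split(r"[.-]", code, maxsplit=1)[0]  # whole code if no separator
    return table.get(prefix, "")


print(lookup_by_prefix("12345.12345", descriptions))   # Description A
print(lookup_by_prefix("A1234567.012", descriptions))  # Description C
```

A code with no matching prefix, such as 12AB1234 here, yields an empty string, similar to wrapping the Excel lookup in IFERROR.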
With A1 as the first cell in your column, enter the following in B1:
=LEFT(A1,MATCH(TRUE,ISERROR(VALUE(MID(A1,ROW(INDIRECT("1:"&LEN(A1))),1))),0)-1)
Press Ctrl+Shift+Enter together to enter it as an array formula.
It returns the leading numeric part of your data.
You can then copy and paste the values into column C and compare them with the second table.
To get the result from Table1 directly in B1, use:
=IFERROR(INDEX(Sheet2!$A$1:$A$4,MATCH(VALUE(LEFT(A1,MATCH(TRUE,ISERROR(VALUE(MID(A1,ROW(INDIRECT("1:"&LEN(A1))),1))),0)-1)),Sheet2!$A$1:$A$4,0)),"")
Press Ctrl+Shift+Enter together to enter it as an array formula.
It returns the corresponding number from Table2 (Sheet2) if there is a match, or an empty string "" if there is none.
Change A1:A4 to cover all your numbers in Table2, and keep the $ signs so the references stay fixed when you drag the formula down.

Resources