Excel Remove only last characters if they match - excel-formula

I've been trying a few different ways to search and replace in Excel to remove the last couple of characters of a cell.
For instance, in one column I have product name S
I want to remove the " S" only.
I have tried some IF formulas as well and have not had much luck. I'm assuming there is a simple regex that can be used for the search and replace, e.g. " S/", that would replace only if it's the last characters and has nothing after it.

Try using the SUBSTITUTE function to replace the letters you want to remove with a unique character, word, or space that does not appear anywhere else in the workbook, depending on which part of the string you're trying to remove and what format you're trying to keep.
Then use Find and Replace (Ctrl+H) to replace that word with a blank (space) character.
See how to use the SUBSTITUTE function here:
https://exceljet.net/excel-functions/excel-substitute-function

Since you are only interested in the end of the string, I don't think you need regex or anything too sophisticated.
If I understand correctly, you want the original string (product name S) up to but not including a pattern that appears at the end ( S, including its leading space). In your example, that means the 12 leftmost characters: the length of the original string (14) minus the length of the pattern (2), which gives you product name. If the original string does not end with the pattern, you want the original string unchanged.
Therefore, I suggest the following:
=IF(RIGHT("original string",LEN("pattern"))="pattern",
LEFT("original string",LEN("original string")-LEN("pattern")),
"original string")
Check these examples:
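As a rough cross-check outside Excel, the same "remove the pattern only if it is at the end" logic can be sketched in Python. The function name strip_suffix is hypothetical, and the sketch is case-sensitive, whereas Excel's = comparison on text is not.

```python
def strip_suffix(text, pattern):
    """Return text without pattern, but only when text ends with pattern."""
    # Mirrors IF(RIGHT(text, LEN(pattern)) = pattern, LEFT(...), text).
    # Assumption: an empty pattern leaves the text unchanged.
    if pattern and text.endswith(pattern):
        return text[: len(text) - len(pattern)]
    return text


print(strip_suffix("product name S", " S"))  # product name
print(strip_suffix("S product name", " S"))  # S product name
```

The second call shows that a " S" elsewhere in the string is left alone, which is the behavior the question asks for.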

Related

How can I substitute multiple occurrences of junk strings in Excel?

In the image, 'muddle' is the string containing junk words and the strings I want to extract. There is a fixed list of junk words - the good strings could be literally anything.
You can see this formula has correctly extracted "moo" and "coo", which are not in the list of junk words. The formula is below.
=LET(junkStart,FILTER(SEARCH(Table1[junkwords],Table2[muddle]),ISNUMBER(SEARCH(Table1[junkwords],Table2[muddle]))),
junkEnd,FILTER(SEARCH(Table1[junkwords],Table2[muddle])+LEN(Table1[junkwords])-1,ISNUMBER(SEARCH(Table1[junkwords],Table2[muddle])+LEN(Table1[junkwords])-1)),
goodstart,FILTER(junkEnd+1,(junkEnd+1<=LEN(Table2[muddle]))*(ISERROR(XMATCH(junkEnd+1,junkStart)))),
goodend,FILTER(junkStart-1,(junkStart-1>=LEN(1))*(ISERROR(XMATCH(junkStart-1,junkEnd))))+1,
goodchars,goodend-goodstart,
TEXTJOIN("; ",TRUE,MID(Table2[muddle],goodstart,goodchars)))
This works well, but it falls down if a junk word occurs more than once. See below.
The only difference is that 'woo' occurs twice in the second example.
I need a single cell solution. VBA is not an option for me. Using the name manager would be untidy, as would nested formulas.
I've got this far with formulas, which as far as I can tell is the furthest anyone has got with the 'removing multiple words from a cell' problem. I can see the issue - once SEARCH locates the start of a string in a cell, it doesn't go looking for a second occurrence of that string. But I don't know how to find the start of every instance of every string. Can anyone help?
REDUCE is perfect for this:
=REDUCE(Table2[muddle],Table1[junkwords],LAMBDA(m,j,SUBSTITUTE(m,j,"")))
REDUCE starts with the Table2[muddle] value as m and substitutes the first value of Table1[junkwords] (j) with "". The outcome becomes the new m, in which the second value of j is then substituted. That result becomes the new m again, and so on until every junk word has been processed.
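For readers who find the accumulator idea easier to follow in a general-purpose language, here is a minimal Python sketch of the same fold. The function name remove_junk and the sample data are made up for illustration; they are not taken from the question's tables.

```python
from functools import reduce


def remove_junk(muddle, junk_words):
    # Each step takes the running string m and removes every occurrence of j,
    # in the same way as LAMBDA(m, j, SUBSTITUTE(m, j, "")).
    return reduce(lambda m, j: m.replace(j, ""), junk_words, muddle)


# "woo" occurs twice; every occurrence is removed, not just the first.
print(remove_junk("boomoowoocoowoo", ["boo", "woo"]))  # moocoo
```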
If you want the result comma-separated it becomes more complicated, but you can achieve it with:
=LET(t,SUBSTITUTE(","&REDUCE(Table2[muddle],Table1[junkwords],LAMBDA(x,y,SUBSTITUTE(x,y,",")))&",",",,",","),
MID(t,2,LEN(t)-3))
This does almost the same as the previous solution, but instead of substituting junk words with blanks it substitutes them with , and then replaces every double ,, with a single one, so that consecutive substitutions result in one comma. If the first and/or last part of the string was substituted, the result would have a leading and/or trailing ,. This is handled by first adding a , to the front and back before replacing the double commas with singles. The result t is then wrapped in MID, where the first and last character (both being a ,) are removed.
Alternate solution:
=LET(t,REDUCE(Table2[muddle],Table1[junkwords],LAMBDA(x,y,SUBSTITUTE(x,y," "))),
SUBSTITUTE(TRIM(t)," ",","))
Or in one go if you don't want to use LET:
=SUBSTITUTE(TRIM(REDUCE(Table2[muddle],Table1[junkwords],LAMBDA(x,y,SUBSTITUTE(x,y," "))))," ",",")
This replaces the junk words with a space. Regardless of how many junk words sit between the good words, or how many leading or trailing spaces result, TRIM reduces the string to the words separated by single spaces. Substituting the spaces with commas then gives your result.
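The replace-with-space, trim, then join idea can be sketched in Python as follows. The name extract_good and the sample string are hypothetical. Splitting on spaces and dropping empty pieces stands in for TRIM here.

```python
from functools import reduce


def extract_good(muddle, junk_words, sep=","):
    # Replace every junk word with a single space.
    spaced = reduce(lambda x, y: x.replace(y, " "), junk_words, muddle)
    # Dropping the empty pieces plays the role of TRIM;
    # joining with sep plays the role of the outer SUBSTITUTE.
    return sep.join(piece for piece in spaced.split(" ") if piece)


print(extract_good("boomoowoocoowoo", ["boo", "woo"]))  # moo,coo
```

As with the formula, this assumes the good strings themselves contain no spaces.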
There's no single-formula solution if the junkwords list is not fixed.
Instead, you may choose to use the SUBSTITUTE() function in each cell of the "Extracted Strings" column to remove all occurrences of each junk word in muddle one at a time, i.e. substitute "boo" in muddle, then substitute "voo" in the resulting string, then "noo" in that result, and so on. The last cell holds the final result.
One point to note, though: you need to ensure there are no substring or partial-string overlaps among the junk words, or you need to define the order of processing, for the solution to be "complete". Consider the following:
junk words = abc, def, cde
muddle = 1234abcdef5678
if you process the junk words in the above order, you get "12345678"
if you process the junk words in reverse order, you get "1234abf5678"
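The order dependence is easy to reproduce. This is a small Python sketch with a hypothetical helper name, using the junk words and muddle from the example above:

```python
def remove_in_order(muddle, junk_words):
    # Remove each junk word in turn; earlier removals change
    # what the later ones get to see.
    for word in junk_words:
        muddle = muddle.replace(word, "")
    return muddle


junk = ["abc", "def", "cde"]
print(remove_in_order("1234abcdef5678", junk))        # 12345678
print(remove_in_order("1234abcdef5678", junk[::-1]))  # 1234abf5678
```

In the reverse order, removing "cde" first destroys both "abc" and "def", so neither is found afterwards.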

Excel: Find words of certain length in string?

I have this file where I want to make a conditional check for any cell that contains the letter combination "_SOL", or where the string is followed by any numeric character like "_SOL1524", and stop looking after that. So I don't want matches for "_SOLUTION" or "_SOLothercharactersthannumeric".
So when I use the following formula, I also get results for words like "_SOLUTION":
=IF(ISNUMBER(FIND("_SOL",A1))=TRUE,"Yay","")
How can I avoid this, and only get matches if the match is "_SOL" or "_SOLnumericvalue" (one numeric character)?
Clarification: The whole strings may be "Blabla_SOL_BLABLA", "Blabla_SOLUTION_BLABLA" or "Blabla_SOL1524_BLABLA"
Maybe this, which will check if the character after "_SOL" is numeric.
=IF(ISNUMBER(VALUE(MID(A1,FIND("_SOL",A1)+4,1))),"Yay","")
Or, as per OP's request and suggestion, to include the possibility of an underscore after "SOL"
=IF(OR(ISNUMBER(VALUE(MID(A1,FIND("_SOL",A1)+4,1))),ISNUMBER(FIND("_SOL_",A1))),"Yay","")
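For comparison, the rule these formulas approximate can be written as a regular expression outside Excel. The Python sketch below is only an illustration: the name has_sol is hypothetical, and treating "_SOL" at the very end of the string as a match is an assumption about the intended rule. Unlike the formula, which inspects only the first "_SOL" for a following digit, the regex considers every occurrence.

```python
import re

# "_SOL" followed by a digit, an underscore, or the end of the string.
# Assumption: end-of-string counts as a match.
SOL_PATTERN = re.compile(r"_SOL(?=[0-9_]|$)")


def has_sol(text):
    return SOL_PATTERN.search(text) is not None


for sample in ("Blabla_SOL_BLABLA", "Blabla_SOLUTION_BLABLA", "Blabla_SOL1524_BLABLA"):
    print(sample, "Yay" if has_sol(sample) else "")
```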
Here is an alternative way to check whether your string contains SOL followed by either nothing or a numeric value of any length:
=IF(COUNT(FILTERXML("<t><s>"&SUBSTITUTE(A1,"_","1</s><s>")&"</s></t>","//s[substring-after(.,'SOL')*0=0]")>0),"Yey","Nay")
This is just for the unfortunate event where you encounter SOL1TEXT, for example. Or, maybe safer (in case you have text like AEROSOL):
=IF(COUNT(FILTERXML("<t><s>"&SUBSTITUTE(A1,"_","</s><s>")&"</s></t>","//s[translate(.,'1234567890','')='SOL']")>0),"Yey","Nay")
And to also exclude text like 123SOL123 you could even do:
=IF(COUNT(FILTERXML("<t><s>"&SUBSTITUTE(A1,"_","1</s><s>")&"</s></t>","//s[starts-with(., 'SOL') and substring(., 4)*0=0]")>0),"Yey","Nay")

How to make FIND function exact in Excel

I'm using the FIND function in Excel to check whether certain characters appear in a string of characters in a cell.
However, this function doesn't work cleanly for certain special characters. Specifically F̌,B̌, and some others. When F̌ appears in the string, FIND recognizes it as both F and F̌.
Notably, this is not the case for characters such as Ď and Č. FIND works nicely for these.
How can I get the formula to always differentiate between characters with and without the hat? Is there a way to work in EXACT?
Thank you!
It is because F̌ is actually two characters.
=LEN("F̌") returns 2 not 1. The second character is the hat.
If you do:
=UNICHAR(70)&UNICHAR(780)
It will return the F̌
And as such =FIND("F","F̌") will return 1 as it is the first letter of a two character string.
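The same behavior can be observed outside Excel. The Python sketch below shows that F̌ is a base letter plus the combining caron (code point 780, U+030C), while Č has a single precomposed code point, which is why FIND treats the two differently. The helper find_bare is hypothetical and is a generalization: it skips a letter followed by any combining mark, not only UNICHAR(780).

```python
import unicodedata

f_caron = "F\u030c"  # "F" followed by U+030C COMBINING CARON (code point 780)
print(len(f_caron))                                  # 2
print(len(unicodedata.normalize("NFC", f_caron)))    # 2, no precomposed form
print(len(unicodedata.normalize("NFC", "C\u030c")))  # 1, composes to U+010C


def find_bare(text, letter):
    """1-based position of letter not followed by a combining mark, or 0."""
    for i, ch in enumerate(text):
        following = text[i + 1:i + 2]  # empty string at the end of text
        if ch == letter and not (following and unicodedata.combining(following)):
            return i + 1
    return 0


print(find_bare("A,B,F\u030c,F", "F"))  # 8
```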
To find "F" in A,B,F̌,F use:
=AGGREGATE(15,7,ROW($ZZ1:INDEX($ZZ:$ZZ,LEN(A1)))/((MID(A1,ROW($ZZ1:INDEX($ZZ:$ZZ,LEN(A1))),1)="F")*(MID(A1,ROW($ZZ2:INDEX($ZZ:$ZZ,LEN(A1)+1)),1)<>UNICHAR(780))),1)
To find either one, we need to use IF:
=IF(LEN(A2)=2,FIND(A2,A1),AGGREGATE(15,7,ROW($ZZ$1:INDEX($ZZ:$ZZ,LEN(A1)))/((MID(A1,ROW($ZZ$1:INDEX($ZZ:$ZZ,LEN(A1))),1)=A2)*(MID(A1,ROW($ZZ$2:INDEX($ZZ:$ZZ,LEN(A1)+1)),1)<>UNICHAR(780))),1))
Given that your substrings are comma-separated, look for the character followed by a comma (and add a comma to the end of the string to find the last character).
This allows you to separate multicharacter substrings from uni-character substrings where the latter is contained in the former.
You could use something like:
=FIND("F,",A5&",")
That will find an F in A5, but will not find an F if only F̌ is present.

Excel- Turn MARIA ANDERS to M.ANDERS as well as 99 other names, only if the first name is > 3 letters

I am trying to output a customer name using VLOOKUP. The names are UPPERCASE, but I need to shorten them: if the first name is longer than 3 characters, I need to use just the first letter of the first name. Here is the problem:
If the First name is longer than three characters you should just use the first character of the first name and put a dot after that. For example if the customer name is “Steve Johnson” the system should show “S. JOHNSON” or for “Ana Johnson” the system should show “ANA JOHNSON”.
I should be able to do this without VB. Maybe an IF statement? Like if the first name is > 3 letters, take the first letter in the string?
Use FIND to locate the space, and test its position with IF. Then either return the string as is, or manipulate the string to suit your needs. Wrap the whole thing in UPPER to get upper case:
=UPPER(IF(FIND(" ",A1)<=4,A1,LEFT(A1,1)&"."&MID(A1,FIND(" ",A1),999)))
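The rule the formula applies can be sketched in Python to make the branch explicit. The name short_name is hypothetical. Returning a name without a space unchanged is an assumption; the Excel formula would produce an error in that case because FIND fails.

```python
def short_name(full_name):
    # Split at the first space: first name, the space itself, the remainder.
    first, space, rest = full_name.partition(" ")
    if space and len(first) > 3:
        # First name longer than three characters: keep only its initial.
        result = first[0] + ". " + rest
    else:
        # Short first name (or, by assumption, no space at all): keep as is.
        result = full_name
    return result.upper()


print(short_name("Steve Johnson"))  # S. JOHNSON
print(short_name("Ana Johnson"))    # ANA JOHNSON
```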

Is there a way to use the Find and Replace function in Excel to replace the first and last entry in a string of text?

I have a sequence of a letters, in my case part of a gene. I want to change the first and the last letter in this string of text, but keep the internal characters the same.
For example, if I have the sequence:
ATCGAATCCATGACG
And I want to change the first letter, in this case A to the word START and change the last letter, G, to STOP all while keeping the internal A's and G's the same. Is this possible to do with the Find and Replace function, or will I have to write a script?
It is easy to do when I have a handful of sequences, I do it by hand. When I get into the hundreds, it can be very difficult.
Thank you.
The function LEN(text) returns the number of characters within a string of letters. MID(text, start, num_chars) returns the middle section of a string. CONCATENATE(text1, text2, ...) pieces together different strings. We can use these in combination to get what you want:
=CONCATENATE("START", MID(A1,2,LEN(A1)-2), "STOP")
You could use REPLACE and handle the left and right sides independently, then combine them, or you can use LEFT/RIGHT to take the existing string minus a character at each end and add the new text, like:
="START"&LEFT(RIGHT(A1,LEN(A1)-1),LEN(A1)-2)&"STOP"
I used LEFT/RIGHT, but MID would also work.
Just another option:
=REPLACE(REPLACE(A1,LEN(A1),1,"STOP"),1,1,"START")
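All three formulas do the same thing: keep the inner characters and swap the two ends. A minimal Python sketch of that idea, with the hypothetical name mark_ends, is below. The guard for very short sequences is an assumption; the Excel formulas have no such check.

```python
def mark_ends(seq):
    # Assumption: a sequence needs at least two characters to have
    # a distinct first and last letter; shorter input is returned unchanged.
    if len(seq) >= 2:
        return "START" + seq[1:-1] + "STOP"
    return seq


print(mark_ends("ATCGAATCCATGACG"))  # STARTTCGAATCCATGACSTOP
```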
