In the image, 'muddle' is the string containing junk words and the strings I want to extract. There is a fixed list of junk words - the good strings could be literally anything.
You can see this formula has correctly extracted "moo" and "coo", which are not in the list of junk words. The formula is below.
=LET(junkStart,FILTER(SEARCH(Table1[junkwords],Table2[muddle]),ISNUMBER(SEARCH(Table1[junkwords],Table2[muddle]))),
junkEnd,FILTER(SEARCH(Table1[junkwords],Table2[muddle])+LEN(Table1[junkwords])-1,ISNUMBER(SEARCH(Table1[junkwords],Table2[muddle])+LEN(Table1[junkwords])-1)),
goodstart,FILTER(junkEnd+1,(junkEnd+1<=LEN(Table2[muddle]))*(ISERROR(XMATCH(junkEnd+1,junkStart)))),
goodend,FILTER(junkStart-1,(junkStart-1>=LEN(1))*(ISERROR(XMATCH(junkStart-1,junkEnd))))+1,
goodchars,goodend-goodstart,
TEXTJOIN("; ",TRUE,MID(Table2[muddle],goodstart,goodchars)))
This works well, but it falls down if a junk word occurs more than once. See below.
The only difference is that 'woo' occurs twice in the second example.
I need a single cell solution. VBA is not an option for me. Using the name manager would be untidy, as would nested formulas.
I've got this far with formulas, which as far as I can tell is the furthest anyone has got with the 'removing multiple words from a cell' problem. I can see the issue - once SEARCH locates the start of a string in a cell, it doesn't go looking for a second occurrence of that string. But I don't know how to find the start of every instance of every string. Can anyone help?
REDUCE is perfect for this:
=REDUCE(Table2[muddle],Table1[junkwords],LAMBDA(m,j,SUBSTITUTE(m,j,"")))
REDUCE starts at the Table2[muddle] value as m then it substitutes the first value of Table1[junkwords] j with "" the outcome becomes the new m which will get a substitute of the second value of j. The result will be the new m, etc.
If you would want to have it comma separated it becomes more complicated, but you can realize by:
=LET(t,SUBSTITUTE(","&REDUCE(Table2[muddle],Table1[junkwords],LAMBDA(x,y,SUBSTITUTE(x,y,",")))&",",",,",","),
MID(t,2,LEN(t)-3))
This does almost the same as the previous solution, but instead of substituting for blanks it substitutes for , and substitutes all duplicate ,, for singles, so if more substitutes followed eachother it results in one comma. Also, if the first and/or last part got substituted by a single ,, then the result would have a leading and/or trailing ,. This is solved by first adding , in the front and back before substituting the double comma's for singles. the result t is then wrapped in MID, where the first and last character (both being a ,) are removed.
Alternate solution:
=LET(t,REDUCE(Table2[muddle],Table1[junkwords],LAMBDA(x,y,SUBSTITUTE(x,y," "))),
SUBSTITUTE(TRIM(t)," ",","))
Or in one go if you don't want to use LET:
=SUBSTITUTE(TRIM(REDUCE(Table2[muddle],Table1[junkwords],LAMBDA(x,y,SUBSTITUTE(x,y," "))))," ",",")
This replaces the junk words with a space. Regardless how many junk words in between words or how many trailing or leading spaces TRIM will fix it to the words separated by one space only. Substituting the spaces for comma gets to your result.
There's no single-formula solution if the junkwords list is not fixed.
Instead, you may choose to use the Substitute() function on each cell of the "Extracted Strings" column to substitute all occurances of each junk word in muddle, i.e. substitute "boo" muddle, then substitute "voo" in the resulted string, replace "noo" in the resulted string...so on. You will get the last cell.
One point to note though, you need to ensure no substring / partial strings problem in the junkwords or you need to define the rules of processing in order for the solution to be "complete". Consider the followings:
junk words = abc, def, cde
muddle = 1234abcdef5678
if you process the string in the above order, you got "12345678"
if you process the junk words in reverse order, you got "123abf5678"
I am trying to embed a SUBSTITUTE in my function, but I am not sure where to incorporate it. I am trying to extract just the text "Scrumactiviteiten" but in the source data sometimes a space will be in there. A sample:
Column A
1 Team xxxx 2018-17 Scrumactiviteiten 123 and then something
2 Team xxxx 2018-17 Scrum activiteiten 123 and then something
Column B (My formula)
1 Scrumactiviteiten
2 Scrum activiteiten
The function I used to extract it (ignore the "Balans" search please):
=IFERROR(IFERROR(IFERROR(MID(A1;SEARCH("Scrum activiteiten";A1;1);18);
MID(A1;SEARCH("Scrumactiviteiten";A1;1);17));MID(A1;SEARCH("Balans";A1;1);10));" ")
This works fine, but to remove the space I tried to embed a SUBSTITUTE where I use the mid search result as the old text and provide "Scrumactiviteiten" as the new text:
=IFERROR(IFERROR(IFERROR(SUBSTITUTE(A24;((MID(A24;SEARCH("Scrum activiteiten";A24;1);18)));"Scrumactiviteiten");MID(A24;SEARCH("Scrumactiviteiten";A24;1);17));MID(A24;SEARCH("Balans";A24;1);10));" ")
The result however is a copy of the full string. I also tried putting the substitute before the search but that would not work either. I am pretty new to Excel formula's and I think I messed up the order or just plain don't understand how I embed a SUBSTITUTE in the formula I created. Some explanation would be much appreciated on what I'm doing wrong! Thank you in advance,
Mark
The problem is you are not providing the correct arguments to the function, try this formula:
=IFERROR(IFERROR(IFERROR(SUBSTITUTE(((MID(A24;SEARCH("Scrum activiteiten";A24;1);18)));" ";"");MID(A24;SEARCH("Scrumactiviteiten";A24;1);17));MID(A24;SEARCH("Balans";A24;1);10));" ")
To use SUBSTITUTE you first provide the string in which you want to replace something, the next two arguments are the string you want replaced and the string you want to replace it with. So for example =SUBSTITUTE("Scrum activiteiten";" ";"") returns Scrumactiviteiten as the space " " is replaced with an empty string "".
I've been trying a few different ways to try and search and replace on excell to remove the last couple of characters.
For instance in one column I have product name S
I want to remove the " S" only.
I have tried some if formulas a swell and not had much luck. I'm assuming there is a simple regex that can be used for the search and replace e.g. " S/" that would just replace if its the last characters and has nothing after it.
Try using the SUBSTITUTE function and replace the letters you want to remove with a unique character/ word / space not appearing anywhere else in the booklet, depending on which part of the string you're trying to remove and what format you're trying to keep
then find and replace ( CTRL +F) that word with the black (space) character
see how to use SUBSTITUTE function here:
https://exceljet.net/excel-functions/excel-substitute-function
Since you are only interested in the end of the string, I don't think you need regex or anything too sophisticated.
If I understand correctly, you want to get the original string (product name S) up until but not including something that appears at the end (S). This means that in your example, you want the 12 leftmost digits: the digits of the original string (14) minus the digits of the pattern (2) - this would give you product name. If the original string does not end with the pattern, you want the original string.
Therefore, I suggest the following:
=IF(RIGHT("original string",LEN("pattern"))="pattern",
LEFT("original string",LEN("original string")-LEN("pattern")),
"original string")
Check these examples:
I would like to trim off the text which is after the bracket in the cell Value
The current formula I'm using keeps giving me the error not being able to extract the targetted string.
=LEFT([#[Name ]],FIND("(",[#[Name ]]))
I want to go shopping (Today)
Goal: Is to remove
(Today)
Expected Result:
I want to go shopping
One of these should do.
=TRIM(LEFT([name], FIND("(", [name]&"(")-1))
=TRIM(REPLACE([name], FIND("(", [name]&"("), LEN([name]), TEXT(,)))
Note that I suffixed the original text with the character that the FIND is looking for. In this manner, it will always be found even if it is not in the original text.
You may find that you have a rogue trailing space in the Name header label.
In a Google spreadsheet I have some data imported from a .csv file which is loc A1 = 123.4 followed by 2 spaces
I want to use the numeric value but the spreadsheet refuses to recognize the string as a number.
The obvious answer is substitute(A1;" ";"") but this does not work!!. Nor do any of the other string search commands.
Am I going insane?
I am using a Mac running 10.4.7 and chrome
Not sure if you have already solved this... try this...this seems to be working
=Value(Substitute(A13,CHAR(160), ""))
or
=Substitute(A13,CHAR(160), "")*1
OK. I've examined this in Excel (where I'm more handy with VBA/etc) and these are not ordinary "spaces" in your cell, they are actually non-breaking spaces, an ascii chr value of 160 (ordinary space is Chr(32)).
Try this formula to replace the non-breaking space character with a null string:
=SUBSTITUTE(A13,CHAR(160),"")
Excel has a function called Clean() which removes non-printing characters like this, but I do not see this function in Google Docs.
Thanks #kaushai and #davidzemens
I have edited line 22 of the sheet.
It shows that =Substitute(A4,CHAR(160), "") is not numeric
however =Substitute(A4,CHAR(160), "")*1 is numeric
and =value(Substitute(A4,CHAR(160), "")) is numeric