How can I combine multiple nested Substitute functions in Excel? - excel

I am trying to set up a function to reformat a string that will later be concatenated. An example string would look like this:
Standard_H2_W1_Launch_123x456_S_40K_AB
Though sometimes the "S" doesn't exist, and sometimes the "40K" is "60K" or not there, and the "_AB" can also be "_CD" or "_EF". Finally, all underscores need to be changed to hyphens. The final product should look like this:
Standard-H2-W1-Launch-123x456-
I have four functions that, if run one after the other, will take care of all of this:
=SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A2,"_AB","_"),"_CD","_"),"_EF","_")
=SUBSTITUTE(SUBSTITUTE(B2,"_40K",""),"_60K","")
=SUBSTITUTE(C2,"_S_","_")
=SUBSTITUTE(D2,"_","-")
I've tried a number of ways of combining these into one function, but I'm relatively new to this level of Excel, so I'm at a loss. Is there any way to combine all of this so that it executes one command after the other in one cell?

To simply combine them you can place them all together like this:
=SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A2,"_AB","_"),"_CD","_"),"_EF","_"),"_40K",""),"_60K",""),"_S_","_"),"_","-")
(Note that this may exceed the limit of 7 nested functions in Excel 2003 and earlier; Excel 2007 and later allow 64 levels. I'm testing in Excel 2010.)
Another way to do it is with the LEFT function.
This assumes that the changing data on the end is always present and is 8 characters long:
=SUBSTITUTE(LEFT(A2,LEN(A2)-8),"_","-")
This will achieve the same resulting string.
If the string doesn't always end with 8 characters that you want to strip off, you can search for the "_S" and use its position. This assumes "_S" is present; FIND returns #VALUE! otherwise. Try this:
=SUBSTITUTE(LEFT(A2,FIND("_S",A2,1)),"_","-")
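For readers who want to check the order of the nested substitutions outside Excel, here is a minimal Python sketch of the same chain. The function name reformat_name and the SUBSTITUTIONS list are made up for illustration; the pairs are copied from the nested formula above, innermost first.

```python
# Each pair mirrors one SUBSTITUTE(text, old, new) call, innermost first.
SUBSTITUTIONS = [
    ("_AB", "_"), ("_CD", "_"), ("_EF", "_"),  # trailing two-letter code
    ("_40K", ""), ("_60K", ""),                 # optional 40K / 60K part
    ("_S_", "_"),                               # optional S part
    ("_", "-"),                                 # finally, underscores to hyphens
]


def reformat_name(name: str) -> str:
    """Apply the substitutions one after the other, like the nested formula."""
    for old, new in SUBSTITUTIONS:
        name = name.replace(old, new)
    return name


print(reformat_name("Standard_H2_W1_Launch_123x456_S_40K_AB"))
# Standard-H2-W1-Launch-123x456-
```

The underscore-to-hyphen step has to come last, because every earlier pair still matches on underscores.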

Nesting SUBSTITUTE() calls in one formula can get nasty; however, it's always possible to arrange them:

Thanks for the idea of breaking down a formula Werner!
Using Alt+Enter allows one to put each bit of a complex substitute formula on separate lines: they become easier to follow and automatically line themselves up when Enter is pressed.
Just make sure the number of closing ,"old","new") lines below the cell reference matches the number of substitute( lines above it.
As in this example:
=
substitute(
substitute(
substitute(
substitute(
B11
,"(","")
,")","")
,"[","")
,"]","")
becomes:
=
SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(B12,"(",""),")",""),"[",""),"]","")
which works fine as is, but one can always delete the extra paragraphs manually:
=SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(B12,"(",""),")",""),"[",""),"]","")
Name               substitute()
[American Samoa]   American Samoa

I would use the following approach:
=SUBSTITUTE(LEFT(A2,LEN(A2)-X),"_","-")
where X denotes the length of things you're not after. And, for X I'd use
(ISNUMBER(FIND("_S",A2,1))*2)+
(ISNUMBER(FIND("_40K",A2,1))*4)+
(ISNUMBER(FIND("_60K",A2,1))*4)+
(ISNUMBER(FIND("_AB",A2,1))*3)+
(ISNUMBER(FIND("_CD",A2,1))*3)+
(ISNUMBER(FIND("_EF",A2,1))*3)
Each ISNUMBER(FIND("X",.,.))*x term above returns 0 if X is not found and x (the length of X) if it is found. So technically you're trimming A2 from the right by the total length of the matches. Note that this also trims the underscore in front of the first match, so the result has no trailing hyphen.
The advantage of this approach over the others mentioned is that it's more apparent what substitution (or removal) is taking place, since the "substitution" is not nested.

=SUBSTITUTE(text, old_text, new_text)
if: a=!, b=@, c=#,... x=>, y=?, z=~, " "=" "
then: abcdefghijklmnopqrstuvwxyz ... try this out
equals: !@#$%^&*()-=+[]\{}|;:/<>?~ ... ;}? ;*(| ]:;
RULES:
(1) text to substitute is in cell A1
(2) max 64 nesting levels (the formula below only has 27 levels [alphabet + space])
(3) an "old_text" cannot also be a "new_text" (i.e. if a=z, then z cannot be an "old_text" later in the chain)
---so if a=z,b=y,...y=b,z=a, then after the first 13 substitutions
---abcdefghijklmnopqrstuvwxyz = zyxwvutsrqponnopqrstuvwxyz (m has become n, so from n=m onwards every substitution also changes letters that were already substituted; e.g. the z that came from a changes back to a)
The formula is:
=SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A1,"a","!"),"b","@"),"c","#"),"d","$"),"e","%"),"f","^"),"g","&"),"h","*"),"i","("),"j",")"),"k","-"),"l","="),"m","+"),"n","["),"o","]"),"p","\"),"q","{"),"r","}"),"s","|"),"t",";"),"u",":"),"v","/"),"w","<"),"x",">"),"y","?"),"z","~")," "," ")
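The rule that an "old_text" cannot also be a "new_text" is easy to see outside Excel. Below is a small Python sketch (not an Excel feature; chained_replace is a made-up helper) that replays the a=z, b=y, ... z=a chain one substitution at a time, and compares it with a mapping applied to all characters at once via str.translate.

```python
from functools import reduce
import string


def chained_replace(text: str, pairs) -> str:
    """Apply (old, new) pairs one after the other, like nested SUBSTITUTE calls."""
    return reduce(lambda acc, pair: acc.replace(pair[0], pair[1]), pairs, text)


alphabet = string.ascii_lowercase
reversed_alphabet = alphabet[::-1]

# a=z, b=y, ..., y=b, z=a applied sequentially: later pairs hit earlier results.
chained = chained_replace(alphabet, list(zip(alphabet, reversed_alphabet)))

# The same mapping applied to every character simultaneously.
simultaneous = alphabet.translate(str.maketrans(alphabet, reversed_alphabet))

print(chained)       # abcdefghijklmmlkjihgfedcba
print(simultaneous)  # zyxwvutsrqponmlkjihgfedcba
```

The sequential chain never produces the reversed alphabet, which is why the symbol formula above only works because none of its replacement characters is searched for later in the chain.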

Related

Recursive LAMBDA to replace characters by specific substitutes from a lookup table

The goal is to iterate through the rows of the character table and replace each character with its substitute.
The character table in this example is ={"&","&amp;";"<","&lt;";">","&gt;";"'","&apos;";"""","&quot;"}
*(Sidenote: because the recursion goes from the last row to the first, the "&","&amp;" pair must be last on the list in this exact case; otherwise it would also replace the ampersands introduced by the previous substitutions.)
Formula:
=LAMBDA(XML,Pos,
    LET(
        Cls,{"&","&amp;";"<","&lt;";">","&gt;";"'","&apos;";"""","&quot;"},
        Row,IF(ISOMITTED(Pos),ROWS(Cls),Pos),
        Crf,INDEX(Cls,Row,1),
        Crr,INDEX(Cls,Row,2),
        Sub,SUBSTITUTE(XML,Crf,Crr),
        IF(Row=0,XML,ENCODEXML(Sub,Row-1))
    )
)
Expected result for =ENCODEXML("sl < dk & jf") would be sl &lt; dk &amp; jf
I'm getting a #VALUE! error instead.
You need an exit condition for the recursion:
=LAMBDA(XML,Pos,
    LET(
        Cls,{"&","&amp;";"<","&lt;";">","&gt;";"'","&apos;";"""","&quot;"},
        Row,IF(ISOMITTED(Pos),ROWS(Cls),Pos),
        Crf,INDEX(Cls,Row,1),
        Crr,INDEX(Cls,Row,2),
        Sub,SUBSTITUTE(XML,Crf,Crr),
        IF(Row>1,ENCODEXML(Sub,Row-1),Sub)
    )
)
You need to add the , in the call:
=ENCODEXML("sl < dk & jf",)
Or, as @Filcuk discovered (and I learned just now), if the parameter is optional it needs to be declared using [],
i.e.:
=LAMBDA(XML,[Pos],
    LET(
        Cls,{"&","&amp;";"<","&lt;";">","&gt;";"'","&apos;";"""","&quot;"},
        Row,IF(ISOMITTED(Pos),ROWS(Cls),Pos),
        Crf,INDEX(Cls,Row,1),
        Crr,INDEX(Cls,Row,2),
        Sub,SUBSTITUTE(XML,Crf,Crr),
        IF(Row>1,ENCODEXML(Sub,Row-1),Sub)
    )
)
Then the , is not needed:
=ENCODEXML("sl < dk & jf")
Just to complement the answer above by Scott: a recursive LAMBDA defined through the Name Manager seems unnecessary here (unless one explicitly needs a named function for later use), since REDUCE() already iterates over an array while carrying an accumulated result. Therefore, one can apply the following structure:
=LET(X,<LookupTable>,REDUCE(<InputValue>,INDEX(X,0,1),LAMBDA(a,b,SUBSTITUTE(a,b,VLOOKUP(b,X,<ReturnCol>,0)))))
Where:
<LookupTable> - Refers to a matrix whose leftmost column holds the lookup values. This is required for VLOOKUP(); with a different layout one can switch to XLOOKUP() (to make the solution more widely applicable);
<InputValue> - A reference to the input string you need to apply the substitution to;
<ReturnCol> - Related to the first point: when one uses VLOOKUP(), an index referring to the column with the replacement values needs to be given.
In the case given by OP this would translate to:
=LET(X,{"&","&amp;";"<","&lt;";">","&gt;";"'","&apos;";"""","&quot;"},REDUCE("sl < dk & jf",INDEX(X,0,1),LAMBDA(a,b,SUBSTITUTE(a,b,VLOOKUP(b,X,2,0)))))
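The same fold-over-a-lookup-table idea can be sketched in Python with functools.reduce, which may help when checking how row order affects the result. This is only an illustration, not Excel code; LOOKUP and encode_xml are names made up here, and the table is applied from the first row to the last, as REDUCE does.

```python
from functools import reduce

# (character, replacement) rows, applied from first to last.
LOOKUP = [
    ("&", "&amp;"),
    ("<", "&lt;"),
    (">", "&gt;"),
    ("'", "&apos;"),
    ('"', "&quot;"),
]


def encode_xml(text: str, table=LOOKUP) -> str:
    """Fold the table over the text: the accumulator is the partly encoded string."""
    return reduce(lambda acc, row: acc.replace(row[0], row[1]), table, text)


print(encode_xml("sl < dk & jf"))  # sl &lt; dk &amp; jf

# Moving the "&" row to the end re-encodes the ampersands added by earlier rows.
ampersand_last = LOOKUP[1:] + LOOKUP[:1]
print(encode_xml("sl < dk & jf", ampersand_last))  # sl &amp;lt; dk &amp; jf
```

Whichever direction the table is walked in, the "&" row has to be the one that is applied first.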

Remove first space if string contains exactly 2 spaces

I'm having issues when trying to remove the first space of a string if that string has 2 spaces in it. For example it should be turning "Fully Functional Method" into "FullyFunctional Method", but "Functional Method" should not be changed because it only has 1 space. I can't really think of a way to remove first space if the string contains 2 spaces.
I don't know exactly what you want to do, but you may look into RegExp and String.replace() to replace parts of a String.
Here is another link to understand the characters, metacharacters, and metasequences.
var myPattern1:RegExp = /  /g;
var str1:String = "This is a string  that contains  double spaces.";
trace(str1.replace(myPattern1, " "));
// this replaces every "  " (two spaces) with " " (one space)
// outputs: This is a string that contains double spaces.
Or in your case (I suppose) something like this
var myPattern2:RegExp = / /;
var str2:String = "Fully Functional Method";
trace(str2.replace(myPattern2, ""));
//If you omit the g, only the first space will be replaced by ""
//outputs : FullyFunctional Method
There are so many things you can do with RegExp that I will not explain them all here...
Just check the Adobe website...
This is a quick and efficient way to work on Strings.
I hope this will help.
Once you check those links, you will understand that my example is rough and should be modified to get a FullyFunctional Method. :D
Do a linear scan through the string. Count the number of spaces and record the index of the first space, if any. If there are two spaces, return a string that is the concatenation of the characters up to but not including the first space, and the characters after the first space.
Keep it simple. It is possible to solve your problem with regex, but keep in mind that the worst case time complexity of finding a particular character in an unsorted set is always going to be O(N), so it won't be faster.
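The linear scan described above can be sketched as follows. It is written in Python rather than ActionScript purely for illustration, and the function name remove_first_space_if_two is made up.

```python
def remove_first_space_if_two(text: str) -> str:
    """Drop the first space only when the string contains exactly two spaces."""
    space_count = 0
    first_space = -1
    # Single pass: count spaces and remember where the first one is.
    for index, char in enumerate(text):
        if char == " ":
            space_count += 1
            if first_space == -1:
                first_space = index
    if space_count == 2:
        # Characters before the first space + characters after it.
        return text[:first_space] + text[first_space + 1:]
    return text


print(remove_first_space_if_two("Fully Functional Method"))  # FullyFunctional Method
print(remove_first_space_if_two("Functional Method"))        # Functional Method
```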

How to quickly edit determinate part of code inside different similar lines

I have this problem: I'm adjusting some code I wrote, which has a structure like this:
Apple1 = Fruit("ss","ss",[0.1,0.4],'w')
PineApple = Fruit("ss","ss",[0.315,0.4],'w')
Banana = Fruit("ss","ss",[0.315,0.280],'w')
...
...
Instead of "ss" I would like to type more specific information, like "Golden Delicious". For the moment I'm simply clicking on "ss", deleting it, and then typing the information I want to insert. I'm sure there is a faster way to do it. I've tried something with Vim macros, but I can't figure out how to "raw input" my data.
I've also tried a plain substitute in Spyder, but it is slow because I have to click "substitute" every time; from what I've tried, Vim is the same.
I also wonder how to insert something else after 'w'...
This is an example of the final output, just to make the question clearer:
Apple1 = Fruit("Golden Delicous","red",[0.1,0.4],'w')
PineApple = Fruit("Ananas comosus","green",[0.315,0.4],'w')
Banana = Fruit(" Cavendish banana","yellow",[0.315,0.280],'w')
...
...
To reformulate the question: what is the fastest way to change each "ss"? For the moment I click on "ss", delete it, and type e.g. "Golden Delicious", but that is very slow. What I would like is for the editor to ask me, for every single "ss", what to replace it with.
E.g. for the first "ss" in the first line I want to type "Golden Delicious"; for the second "ss" in the first line, "red". For the first "ss" in the second line, "Ananas comosus"; for the second "ss" in the second line, "green"; and so on.
I'm sure there is an answer for this somewhere but I can't find it!
If you downvote, please explain why so I can improve the question!
As far as I understand, the data that you want to substitute for "ss" does not have regular structure, so you will need to enter it by hand.
In Vim you would do it like this:
1. Place the cursor over the first "ss", then press * and then N.
2. Press ce, enter the new data (e.g. "Golden Delicious"), then leave Insert mode by pressing Escape.
3. Press n to jump to the next instance of "ss".
4. Repeat steps 2 and 3 ad libitum.
Look up :h * and :h n for more information.
I would do it like this:
:%s/ss/\=input('Replacement: ')/gc
This queries you for each occurrence. With the /c flag, the display is even updated during the loop (at the cost of having to additionally answer y for each occurrence); without the flag, you would need to keep track of where you are yourself.
You can use a function that searches the whole file substituting all "ss" strings with values from arrays populated with the replacement data:
function! ChangeSS()
  let ss1 = ['Golden Delicous', 'Ananas comosus', 'Cavendish banana']
  let ss2 = ['red', 'green', 'yellow']
  call cursor(1, 1)
  let l = "ss2"
  while search('"ss"', 'W') > 0
    if l == "ss1"
      let l = "ss2"
    else
      let l = "ss1"
    endif
    execute 'normal ci"' . remove({l}, 0)
  endwhile
endfunction
It uses a toggle variable (l) that holds the name of the list to take the next value from: ss1 for the first appearance of "ss" in a line and ss2 for the second one.
Run it like:
:call ChangeSS()
That (in my test) yields:
Apple1 = Fruit("Golden Delicous","red",[0.1,0.4],'w')
PineApple = Fruit("Ananas comosus","green",[0.315,0.4],'w')
Banana = Fruit("Cavendish banana","yellow",[0.315,0.280],'w')
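If the replacement data is already known up front, the same idea also works outside the editor. Here is a minimal Python sketch, assuming every placeholder is literally "ss" in double quotes; fill_placeholders is a made-up name, and the values are consumed in reading order.

```python
import re


def fill_placeholders(source: str, replacements, placeholder: str = '"ss"') -> str:
    """Replace each placeholder, in order, with the next value from replacements."""
    values = iter(replacements)

    def next_value(match: re.Match) -> str:
        value = next(values, None)
        # Leave the placeholder untouched once the replacement list runs out.
        return match.group(0) if value is None else f'"{value}"'

    return re.sub(re.escape(placeholder), next_value, source)


code = (
    'Apple1 = Fruit("ss","ss",[0.1,0.4],\'w\')\n'
    'PineApple = Fruit("ss","ss",[0.315,0.4],\'w\')'
)
print(fill_placeholders(code, ["Golden Delicious", "red", "Ananas comosus", "green"]))
```

Passing a function to re.sub is what allows each match to get a different replacement.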

How can I remove repeated characters in a string with R?

I would like to implement a function with R that removes repeated characters in a string. For instance, say my function is named removeRS, so it is supposed to work this way:
removeRS('Buenaaaaaaaaa Suerrrrte')
Buena Suerte
removeRS('Hoy estoy tristeeeeeee')
Hoy estoy triste
My function is going to be used with strings written in Spanish, so it is not that common (or at least correct) to find words that have more than three successive vowels. Don't worry about the possible sentiment behind them. Nonetheless, there are words that can have two successive consonants (especially ll and rr), but we could skip this from our function.
So, to sum up, this function should replace the letters that appear at least three times in a row with just that letter. In one of the examples above, aaaaaaaaa is replaced with a.
Could you give me any hints to carry out this task with R?
I did not think very carefully on this, but this is my quick solution using references in regular expressions:
gsub('([[:alpha:]])\\1+', '\\1', 'Buenaaaaaaaaa Suerrrrte')
# [1] "Buena Suerte"
() captures a letter first, \\1 refers to that letter, and + means to match it once or more; putting all these pieces together, we match a letter repeated two or more times.
To include other characters besides letters, replace [[:alpha:]] with a regex matching whatever you wish to include.
I think you should pay attention to the ambiguities in your problem description. This is a first stab, but it collapses every run of repeated characters, so it clearly does not work with "Good Luck" in the manner you desire (it returns "God Luck"):
removeRS <- function(str) paste(rle(strsplit(str, "")[[1]])$values, collapse="")
removeRS('Buenaaaaaaaaa Suerrrrte')
#[1] "Buena Suerte"
Since you want to replace letters that appear AT LEAST 3 times, here is my solution:
gsub("([[:alpha:]])\\1{2,}", "\\1", "Buennaaaa Suerrrtee")
#[1] "Buenna Suertee"
As you can see the 4 "a" have been reduced to only 1 a, the 3 r have been reduced to 1 r but the 2 n and the 2 e have not been changed.
As suggested above, you can replace [[:alpha:]] with any character class such as [a-zA-KM-Z]. Note that | is not an "or" operator inside square brackets (there it is a literal character), so to affect only repetitions of y and Q, simply list them: [yQ].
gsub("([ae])\\1{2,}", "\\1", "Buennaaaa Suerrrtee")
# [1] "Buenna Suerrrtee"
# the triple r is not affected, and there is no triple e.
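For comparison, the same back-reference technique carries over to other regex engines. A minimal Python sketch follows; remove_rs is a made-up name, and [^\W\d_] is used here as a stand-in for [[:alpha:]], since Python's re module has no POSIX classes.

```python
import re

# A letter (word character that is neither a digit nor "_"), captured,
# followed by at least two more copies of itself.
REPEATED_LETTER = re.compile(r"([^\W\d_])\1{2,}")


def remove_rs(text: str) -> str:
    """Collapse any letter repeated three or more times in a row to one letter."""
    return REPEATED_LETTER.sub(r"\1", text)


print(remove_rs("Buenaaaaaaaaa Suerrrrte"))  # Buena Suerte
print(remove_rs("Buennaaaa Suerrrtee"))      # Buenna Suertee
```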

String manipulation with Excel - how to remove part of a string if another part is there?

I've done some Googling, and can't find anything, though maybe I'm just looking in the wrong places. I'm also not very adept at VBA, but I'm sure I can figure it out with the right pointers :)
I have a string I'm building that's a concatenation of various cells, based on various conditions. I hit these in order.
=IF(A405<>A404,G405,G405&H404)
What I want to do is go back through my concatenated list, removing a superseded value if the superseder is in the list.
For example, see the following list:
A, D, G, Y, Z
I want to remove D if and only if Y is present.
How would I go about this? (VBA or in-cell, though I'd prefer in-cell)
Try:
=IF(ISERROR(FIND("Y",A1)),A1,SUBSTITUTE(A1,"D, ",""))
But that assumes you always have the comma and space following the D.
Firstly, why not keep a string array instead as you go through all the cells, then concatenate it all at the end?
Otherwise, you'll be using string functions like INSTR and MID to do something like:
start1 = InStr(myLongString, "Y, ")
If start1 > 0 Then
    start2 = InStr(myLongString, "D, ")
    If start2 > 0 Then
        newLongString = Left(myLongString, start2 - 1) & _
                        Mid(myLongString, start2 + 3)
    End If
End If
But, as I said, I would keep an array that is easy to loop through, then once you have all the values you KNOW you will use, just concatenate them at the end.
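The keep-an-array-and-join-at-the-end idea can be sketched like this. It is shown in Python rather than VBA for brevity, and drop_superseded and the superseded_by mapping are hypothetical names for illustration.

```python
def drop_superseded(values, superseded_by):
    """Remove each value that has at least one of its superseders in the list.

    superseded_by maps a value to the set of values that override it,
    e.g. {"D": {"Y"}} means: drop D if and only if Y is present.
    """
    present = set(values)
    return [v for v in values if not (superseded_by.get(v, set()) & present)]


rules = {"D": {"Y"}}
print(", ".join(drop_superseded(["A", "D", "G", "Y", "Z"], rules)))  # A, G, Y, Z
print(", ".join(drop_superseded(["A", "D", "G", "Z"], rules)))       # A, D, G, Z
```

Because the values stay separate until the final join, there are no stray commas or spaces to clean up afterwards.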
VBA: You can always use the RegExp object.
I think that gives you the ability to test anything in your script, as long as you build the regular expression correctly.
Check out : http://msdn.microsoft.com/en-us/library/yab2dx62(VS.85).aspx ( for regexp reference )
and a simple tool to test your regexps : http://www.codehouse.com/webmaster_tools/regex/
In-cell: you could do it in a more Excel-friendly way:
Suppose you have the values in column A:A.
You can add a new column where you perform the check:
if(indirect("A"&row()) <> indirect("A"&row()-1), indirect("G"&row()), indirect("G"&row())& indirect("H"&row()))
or whatever the values are. I guess however that on one branch of the if statement the value should be blank. After that you concatenate only the B:B column values ( skipping blanks if needed ).
Hope this helps.
It's probably easier to start at the end, make your additions to the beginning of the string, and only add D if Y is not present.
I guess D could appear anywhere, so how about:
If InStr(strString, "Y") > 0 Then
    ' Replace is case-sensitive by default, so match the upper-case D
    strString = Replace(strString, "D", "")
    strString = Replace(strString, " ", "")
    strString = Replace(strString, " ,", "")
    strString = Replace(strString, ",,", ",")
End If
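If there are not too many of these combinations that you want to remove, you can use =IF(ISNUMBER(FIND("D"; A2)); REPLACE(A2; FIND("D"; A2); 3; ""); A2) (FIND returns an error rather than 0 when there is no match, hence the ISNUMBER test).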
I just got this as a possible solution via email, too:
=IF(A15<>A14,G15,IF(OR(AND(G15="CR247, ",ISNUMBER(FIND("CR247, ",H14))),AND(G15="CR149, ",ISNUMBER(FIND("CR215, ",H14))),AND(G15="CR149, ",ISNUMBER(FIND("CR180, ",H14))),AND(G15="CR180, ",ISNUMBER(FIND("CR215, ",H14))),G15="CR113, "),H14,G15&H14))
(this has the "real" values with precedence rules)
It looks relatively similar to @Joseph's answer.
Is there a better solution?

Resources