Recursive LAMBDA to replace characters by specific substitutes from a lookup table

The goal is to iterate through the rows of the character table and replace each character with its substitute.
The character table in this example is ={"&","&amp;";"<","&lt;";">","&gt;";"'","&apos;";"""","&quot;"}.
(Sidenote: the "&","&amp;" pair must be last on the list in this exact case, or it will replace ampersands introduced by previous substitutions, since we're going last to first.)
Formula:
=LAMBDA(XML,Pos,
LET(
Cls,{"&","&amp;";"<","&lt;";">","&gt;";"'","&apos;";"""","&quot;"},
Row,IF(ISOMITTED(Pos),ROWS(Cls),Pos),
Crf,INDEX(Cls,Row,1),
Crr,INDEX(Cls,Row,2),
Sub,SUBSTITUTE(XML,Crf,Crr),
IF(Row=0,XML,ENCODEXML(Sub,Row-1))
))
The expected result for =ENCODEXML("sl < dk & jf") would be sl &lt; dk &amp; jf
I'm getting a #VALUE! error instead.

You need an exit condition for the recursion:
=LAMBDA(XML,Pos,
LET(
Cls,{"&","&amp;";"<","&lt;";">","&gt;";"'","&apos;";"""","&quot;"},
Row,IF(ISOMITTED(Pos),ROWS(Cls),Pos),
Crf,INDEX(Cls,Row,1),
Crr,INDEX(Cls,Row,2),
Sub,SUBSTITUTE(XML,Crf,Crr),
IF(Row>1,ENCODEXML(Sub,Row-1),Sub)
))
You also need to add the trailing comma in the call, so that Pos is passed but left empty:
=ENCODEXML("sl < dk & jf",)
Or, as @Filcuk discovered (and I learned just now), if the parameter is optional it needs to be declared using [],
i.e.:
=LAMBDA(XML,[Pos],
LET(
Cls,{"&","&amp;";"<","&lt;";">","&gt;";"'","&apos;";"""","&quot;"},
Row,IF(ISOMITTED(Pos),ROWS(Cls),Pos),
Crf,INDEX(Cls,Row,1),
Crr,INDEX(Cls,Row,2),
Sub,SUBSTITUTE(XML,Crf,Crr),
IF(Row>1,ENCODEXML(Sub,Row-1),Sub)
))
Then the trailing comma is not needed:
=ENCODEXML("sl < dk & jf")
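A note on row order, sketched in Python purely to illustrate the logic (this is not Excel syntax, and the names `TABLE` and `encode_xml` are made up for the example). Assuming the table maps each character to its XML entity, a recursion that walks from the last row to the first applies the "&" row last when it sits in row 1, so it re-escapes the ampersands that earlier rows introduced. Moving that row to the end of the table avoids this:

```python
# Hypothetical table mirroring Cls: (character, XML entity) pairs, "&" in row 1.
TABLE = [
    ("&", "&amp;"),
    ("<", "&lt;"),
    (">", "&gt;"),
    ("'", "&apos;"),
    ('"', "&quot;"),
]


def encode_xml(text, table, pos=None):
    """Apply rows from index `pos` down to 0, like the recursive LAMBDA."""
    row = len(table) - 1 if pos is None else pos
    old, new = table[row]
    text = text.replace(old, new)
    # Exit condition: stop once the first row has been applied.
    return encode_xml(text, table, row - 1) if row > 0 else text


# "&" in the first row is applied last and escapes the "&" of "&lt;" again.
amp_first = encode_xml("sl < dk & jf", TABLE)

# Same rows, with the "&" row moved to the end so it is applied first.
amp_last = encode_xml("sl < dk & jf", TABLE[1:] + TABLE[:1])

print(amp_first)
print(amp_last)
```

If the Excel table is kept with "&" in the first row, walking the rows from first to last instead would have the same effect as reordering.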

Just to complement the answer above by Scott: using a recursive LAMBDA through the Name Manager seems unnecessary (if one doesn't explicitly need a named function for later use), since REDUCE() already iterates over an array with an accumulator on its own. Therefore, one can apply the following structure:
=LET(X,<LookupTable>,REDUCE(<InputValue>,INDEX(X,0,1),LAMBDA(a,b,SUBSTITUTE(a,b,VLOOKUP(b,X,<ReturnCol>,0)))))
Where:
<LookupTable> - Refers to a matrix where the leftmost column holds the lookup values. The leftmost-column requirement comes from VLOOKUP(); with a different table layout one can use XLOOKUP() instead (to make the solution more widely applicable);
<InputValue> - A reference to the input string you need to apply the substitutions to;
<ReturnCol> - Following on from the first point: when one uses VLOOKUP(), an index referring to the column with the replacement values needs to be given.
In the case given by OP this would translate to:
=LET(X,{"&","&amp;";"<","&lt;";">","&gt;";"'","&apos;";"""","&quot;"},REDUCE("sl < dk & jf",INDEX(X,0,1),LAMBDA(a,b,SUBSTITUTE(a,b,VLOOKUP(b,X,2,0)))))
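For readers who find the fold easier to follow outside Excel, here is a rough Python analogue using `functools.reduce`. It is only a sketch of the same idea: `LOOKUP` and `substitute_all` are names invented for this example, and the table is assumed to map each character to its XML entity.

```python
from functools import reduce

# Hypothetical lookup table: (lookup value, replacement) pairs, applied in order.
LOOKUP = [
    ("&", "&amp;"),
    ("<", "&lt;"),
    (">", "&gt;"),
    ("'", "&apos;"),
    ('"', "&quot;"),
]


def substitute_all(text, lookup):
    """Fold over the table: the accumulator is the text after each substitution."""
    return reduce(lambda acc, pair: acc.replace(pair[0], pair[1]), lookup, text)


result = substitute_all("sl < dk & jf", LOOKUP)
print(result)
```

Because the fold runs from the first row to the last, the "&" row is applied before any entity is introduced, so nothing is escaped twice.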

Related

MATLAB cell to string

I am trying to read an Excel sheet and then find cells that are not empty and have date information in them, by finding two '/' in a string,
but MATLAB keeps erroring on the cell type:
"Undefined operator '~=' for input arguments of type 'cell'."
"Undefined function 'string' for input arguments of type 'cell'."
"Undefined function 'char' for input arguments of type 'cell'."
MyFolderInfo = dir('C:\');
filename = 'Export.xls';
[num,txt,raw] = xlsread(filename,'A1:G200');
for i = 1:length(txt)
if ~isnan(raw(i,1))
if sum(ismember(char(raw(i,1)),'/')) == 2
A(i,1) = raw(i,1);
end
end
end
Please help me fix it.

There are multiple issues with your code. Since raw is a cell array, you can't run isnan on it; isnan is for numerical arrays. Since all you're interested in is cells with text in them, you don't need to use raw at all: any blank cells will not be present in txt.
My approach is to create a logical array, has_2_slashes, and then use it to extract the elements from raw that have two slashes in them.
Here is my code. I generalized it to read multiple columns since your original code only seemed to be written to handle one column.
filename = 'Export.xls';
[~, ~, raw] = xlsread(filename, 'A1:G200');
[num_rows, num_cols] = size(raw);
has_2_slashes = false(num_rows, num_cols);
for row = 1:num_rows
for col = 1:num_cols
has_2_slashes(row, col) = sum(ismember(raw{row, col}, '/')) == 2;
end
end
A = raw(has_2_slashes);

cellfun(@numel,strfind(txt,'/'))
should give you a numerical array where the (i,j)th element contains the number of slashes. For example,
>> cellfun(@numel,strfind({'a','b';'/','/abc/'},'/'))
ans =
0 0
1 2
The key here is to use strfind.
Now you may want to expand a bit in your question on what you intend to do next with txt -- in other words, specify desired output more, which is always a good thing to do. If you intend to read the dates, it may be better to just read it upfront, for example by using regexp or datetime as opposed to getting an array which can then map to where the dates are. As is, using ans>=2 next gives you the logical array that can let you extract the matched entries.
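The same counting idea, sketched in Python only to make the logic concrete (this is not MATLAB, and the `cells` data is made up to mirror the example above): count the slashes per cell, then keep the cells with exactly two.

```python
# Hypothetical 2-D cell contents, mirroring the cellfun example.
cells = [["a", "b"], ["/", "/abc/"]]

# Number of slashes per cell; non-text cells count as zero.
counts = [
    [cell.count("/") if isinstance(cell, str) else 0 for cell in row]
    for row in cells
]

# Cells that contain exactly two slashes, i.e. the date-like candidates.
matches = [
    cell
    for row in cells
    for cell in row
    if isinstance(cell, str) and cell.count("/") == 2
]

print(counts)
print(matches)
```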

Displaying positive symbol for positive elements in MATLAB array?

I have an array in MATLAB, and I want to display the positive symbol, "+", in front of positive elements, and keep the negative symbol, "-", on the negative values that already have it.
I was thinking of constructing a cell string or string array with an if/else system: if the number is > 0, store '+' concatenated with the converted element; if it is 0, just do a straight char conversion, since 0 has no sign; and if it is negative, just convert it. I know what to do, but I think my order of commands is off.
How can I implement this?
I have the following script for an array x, but it just spews out values. I want an orderly string array I can copy and paste for use outside of MATLAB.
x;
pos = '+';
bound = length(x);
for i=1:bound
if(x(i)==0)
num2str(x(i))
end
if(x(i)>0)
num2str(x(i))
strcat(pos,num2str(x(i)))
end
if(x(i)<0)
num2str(x(i))
strcat(pos,num2str(x(i)))
end
end

I think this is what you are looking for.
Here is an example.
First type in your command window:
test = 5;
Then:
sprintf('%+d',test)
This should give you what you want.
Of course, you need to adapt it to your case. I suggest you read this.
I hope it helps.
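To show how the `+` flag extends to a whole array, here is a small Python sketch (Python's format mini-language has the same `+` flag as `%+d`; the `values` list is made up for the example). A plain `%+d` would print zero with a sign, so zero is special-cased here to match the requirement in the question.

```python
# Hypothetical input values standing in for the array x.
values = [5, 0, -3]

# "+d" forces a sign on every integer; zero is handled separately so it stays unsigned.
signed = ["0" if v == 0 else f"{v:+d}" for v in values]

# One line of text that can be copied and pasted elsewhere.
line = ", ".join(signed)
print(line)
```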

How can I combine multiple nested Substitute functions in Excel?

I am trying to set up a function to reformat a string that will later be concatenated. An example string would look like this:
Standard_H2_W1_Launch_123x456_S_40K_AB
Though sometimes the "S" doesn't exist, sometimes the "40K" is "60K" or not there, and the "_AB" can also be "_CD" or "_EF". Finally, all underscores need to be changed to hyphens. The final product should look like this:
Standard-H2-W1-Launch-123x456-
I have four functions that, if run one after the other, will take care of all of this:
=SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A2,"_AB","_"),"_CD","_"),"_EF","_")
=SUBSTITUTE(SUBSTITUTE(B2,"_40K",""),"_60K","")
=SUBSTITUTE(C2,"_S_","_")
=SUBSTITUTE(D2,"_","-")
I've tried a number of ways of combining these into one function, but I'm relatively new to this level of Excel, so I'm at a loss. Is there any way to combine all of this so that it executes one command after the other in one cell?

To simply combine them you can place them all together like this:
=SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A2,"_AB","_"),"_CD","_"),"_EF","_"),"_40K",""),"_60K",""),"_S_","_"),"_","-")
(Note that this may exceed the older Excel limit of 7 nested statements. I'm testing in Excel 2010.)
Another way to do it is by using the LEFT and RIGHT functions.
This assumes that the changing data at the end is always present and is 8 characters long:
=SUBSTITUTE(LEFT(A2,LEN(A2)-8),"_","-")
This will achieve the same resulting string
If the string doesn't always end with 8 characters that you want to strip off you can search for the "_S" and get the current location. Try this:
=SUBSTITUTE(LEFT(A2,FIND("_S",A2,1)),"_","-")
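The combined nested formula is simply the four steps applied in order, each one feeding the next. A Python sketch of that order (illustrative only; `STEPS` and `reformat` are names invented here, not anything from Excel):

```python
# The substitutions from the nested formula, innermost first.
STEPS = [
    ("_AB", "_"),
    ("_CD", "_"),
    ("_EF", "_"),
    ("_40K", ""),
    ("_60K", ""),
    ("_S_", "_"),
    ("_", "-"),
]


def reformat(text):
    """Apply each substitution to the result of the previous one."""
    for old, new in STEPS:
        text = text.replace(old, new)
    return text


with_s = reformat("Standard_H2_W1_Launch_123x456_S_40K_AB")
without_s = reformat("Standard_H2_W1_Launch_123x456_60K_CD")
print(with_s)
print(without_s)
```

The underscore-to-hyphen step has to come last, since the earlier patterns all rely on the underscores still being there.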

Nesting SUBSTITUTE() in a string can be nasty; however, it's always possible to arrange it:

Thanks for the idea of breaking down a formula Werner!
Using Alt+Enter allows one to put each bit of a complex substitute formula on separate lines: they become easier to follow and automatically line themselves up when Enter is pressed.
Just make sure you have enough end statements to match the number of substitute( lines either side of the cell reference.
As in this example:
=
substitute(
substitute(
substitute(
substitute(
B11
,"(","")
,")","")
,"[","")
,"]","")
becomes:
=
SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(B12,"(",""),")",""),"[",""),"]","")
which works fine as is, but one can always delete the extra paragraphs manually:
=SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(B12,"(",""),")",""),"[",""),"]","")
Name > substitute()
[American Samoa] > American Samoa

I would use the following approach:
=SUBSTITUTE(LEFT(A2,LEN(A2)-X),"_","-")
where X denotes the length of the things you're not after. For X I'd use
(ISNUMBER(FIND("_S",A2,1))*2)+
(ISNUMBER(FIND("_40K",A2,1))*4)+
(ISNUMBER(FIND("_60K",A2,1))*4)+
(ISNUMBER(FIND("_AB",A2,1))*3)+
(ISNUMBER(FIND("_CD",A2,1))*3)+
(ISNUMBER(FIND("_EF",A2,1))*3)
Each ISNUMBER(FIND("X",.,.))*x term returns 0 if X is not found and x (the length of X) if it is found, because FIND returns a number on a match and an error otherwise. So technically you're trimming A2 from the right by the total length of the possible matches.
The advantage of this approach above the other mentioned is that it's more apparent what substitution (or removal) is taking place, since the "substitution" is not nested.

=SUBSTITUTE(text, old_text, new_text)
if: a=!, b=@, c=#,... x=>, y=?, z=~, " "=" "
then: abcdefghijklmnopqrstuvwxyz ... try this out
equals: !@#$%^&*()-=+[]\{}|;:/<>?~ ... ;}? ;*(| ]:;
RULES:
(1) text to substitute is in cell A1
(2) max 64 substitution levels (the formula below only has 27 levels [alphabet + space])
(3) "old_text" cannot also be a "new_text" (i.e. if a=z, then z cannot be "old_text")
---so if a=z,b=y,...y=b,z=a, then the result is
---abcdefghijklmnopqrstuvwxyz = zyxwvutsrqponnopqrstuvwxyz (and z changes to a then changes back to z) ... (pattern starts to fail after m=n, n=m... and n becomes n)
The formula is:
=SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A1,"a","!"),"b","#"),"c","#"),"d","$"),"e","%"),"f","^"),"g","&"),"h","*"),"i","("),"j",")"),"k","-"),"l","="),"m","+"),"n","["),"o","]"),"p","\"),"q","{"),"r","}"),"s","|"),"t",";"),"u",":"),"v","/"),"w","<"),"x",">"),"y","?"),"z","~")," "," ")

lua string get Nth number

I am trying to find the Nth number in a string. Should I use string.find? If so, how? I know the arguments are the string to search and the pattern to find, and the 3rd argument (where to start) seems like it might just work.
the lua string tutorial i am looking at
thanks!

You'll want to create a function that splits your string into an array. Once you've done this, you'll be able to return whatever number position you're looking for.
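function findnth(str, nth)
    local array = {}
    -- collect every run of digits, in order of appearance
    for i in string.gmatch(str, "%d+") do
        table.insert(array, i)
    end
    -- nil if the string holds fewer than nth numbers
    return array[nth]
end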
The function above works like so:
print(findnth("1 3 7 2 15 2", 4))
Returns:
2
Edit: Changed function to suit OP's specific needs.

String manipulation with Excel - how to remove part of a string if another part is there?

I've done some Googling, and can't find anything, though maybe I'm just looking in the wrong places. I'm also not very adept at VBA, but I'm sure I can figure it out with the right pointers :)
I have a string I'm building that's a concatenation of various cells, based on various conditions. I hit these in order.
=IF(A405<>A404,G405,G405&H404)
What I want to do is go back through my concatenated list, removing a superseded value if the superseder is in the list.
For example, see the following list:
A, D, G, Y, Z
I want to remove D if and only if Y is present.
How would I go about this? (VBA or in-cell, though I'd prefer in-cell)

Try:
=IF(ISERROR(FIND("Y",A1)),A1,SUBSTITUTE(A1,"D, ",""))
But that assumes you always have the comma and space following the D.

Firstly, why not keep a string array instead as you go through all the cells, then concatenate it all at the end?
Otherwise, you'll be using string functions like INSTR and MID to do something like:
start1 = instr(myLongString,"Y, ")
if start1 > 0 Then
start2 = instr(myLongString,"D, ")
if start2 > 0 then
newLongString = left(myLongString, start2 - 1) & _
mid(myLongString, start2 + 3)
end if
end if
But, as I said, I would keep an array that is easy to loop through, then once you have all the values you KNOW you will use, just concatenate them at the end.
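A sketch of the "keep an array, join at the end" idea, written in Python rather than VBA purely for illustration. The `SUPERSEDED_BY` rule table and the `build_list` name are assumptions made for this example; the one rule shown is the D/Y case from the question.

```python
# Hypothetical precedence rules: a key is dropped when its superseder is present.
SUPERSEDED_BY = {"D": "Y"}


def build_list(values):
    """Drop superseded values, then concatenate what is left."""
    present = set(values)
    kept = [
        v
        for v in values
        if not (v in SUPERSEDED_BY and SUPERSEDED_BY[v] in present)
    ]
    return ", ".join(kept)


with_y = build_list(["A", "D", "G", "Y", "Z"])
without_y = build_list(["A", "D", "G", "Z"])
print(with_y)
print(without_y)
```

Working on whole items also avoids the separator bookkeeping that the string-based versions need.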

VBA : You can always use the regexp object.
I think that gives you the ability to test anything on your script as long as you build correctly the regular expression.
Check out : http://msdn.microsoft.com/en-us/library/yab2dx62(VS.85).aspx ( for regexp reference )
and a simple tool to test your regexps : http://www.codehouse.com/webmaster_tools/regex/
In-cell: you could do it in a more excel friendly way:
suppose on column A:A you have the values.
You can add a new column where you perform the check
if(indirect("A"&row()) <> indirect("A"&row()-1), indirect("G"&row()), indirect("G"&row())& indirect("H"&row()))
or whatever the values are. I guess however that on one branch of the if statement the value should be blank. After that you concatenate only the B:B column values ( skipping blanks if needed ).
Hope this helps.

It's probably easier to start at the end, make your additions to the beginning of the string, and only add D if Y is not present.

I guess D could appear anywhere, so how about:
If InStr(strString, "Y") > 0 Then
' Replace is case-sensitive by default, so match the upper-case "D"
strString = Replace(strString, "D", "")
strString = Replace(strString, " ", "")
strString = Replace(strString, " ,", "")
strString = Replace(strString, ",,", ",")
End If

If there are not too many of these combinations that you want to remove, you can use =IF(FIND("D"; A2)> 0; REPLACE(A2;1;3;"");A2).

I just got this as a possible solution via email, too:
=IF(A15<>A14,G15,IF(OR(AND(G15="CR247, ",ISNUMBER(FIND("CR247, ",H14))),AND(G15="CR149, ",ISNUMBER(FIND("CR215, ",H14))),AND(G15="CR149, ",ISNUMBER(FIND("CR180, ",H14))),AND(G15="CR180, ",ISNUMBER(FIND("CR215, ",H14))),G15="CR113, "),H14,G15&H14))
(this has the "real" values with precedence rules)
It looks relatively similar to @Joseph's answer.
Is there a better solution?
