how to do this
puts [string map { ( ) "\(" "\)"} (3.8.001)]
o\p I'm getting $tclsh main.tcl
(3.8.001)
I'm expecting
\(3.8.001\)
help me to do this
You should use the string map as follows,
puts [string map { ( "\\(" ) "\\)"} (3.8.001)]
Backslash has to be used twice, to have a single backslash when used inside double quotes in Tcl.
Whenever I'm confused about how exactly to write a complex string map with backslashes involved, I try building the mapping list with list. I might then use the literal it produces rather than having my script contain the actual list command call, but that's purely optimisation on my part. (And a very low value one; the bytecode compiler does it for me if all the arguments to list are literals.) In particularly tricky cases, I'll build it by stages with lappend, but that's only where what is going on is a true head-scratcher!
Also, the mapping is supposed to be “replaceA withA replaceB withB ...”; you were putting ) and "\(" in the wrong order, and the result would not have been expected to work at all.
set mapping [list "(" "\\(" ")" "\\)"]
# puts "mapping is “$mapping”"; # Yay for unicode quote characters!
puts [string map $mapping (3.8.001)]
The sequence you were looking for is this, with a few more braces and fewer double-quotes, but I encourage you to learn how to work this out for yourself…
puts [string map {( {\(} ) {\)}} "(3.8.001)"]
Related
I am aware that in octave escape sequences are treated differently in single/double quotes. Nevertheless, there seems to be a type difference:
Whereas class("bla") and class('bla') are both char,
typeinfo("bla") is string, whereas typeinfo('bla') is sq_string,
which may be short for single quote string.
More interesting, warning("on", "Octave:mixed-string-concat") activates warning
that these two types are mixed.
So after activation, ["bla" 'bla'] yields a warning.
Note that typeinfo(["bla" "bla"]) is string,
whereas if one of the two strings concatenated is single quote, so is the result,
e.g. typeinfo(['bla' "bla"]) is sq_string.
I have a situation where someone activates the warning
and so I want to program so to avoid these.
Thus my question: is there a way to convert sq_string to string?
The core of my problem is that fieldnames seem to be single quoted strings.
What an interesting question. I've never thought one might have a need for such a warning or conversion ... though now that I think about it, it makes sense if you want to collect 'raw' strings, and have their escape sequences interpreted and vice versa ...
After some experimentation, I have found a way to do what you want: use sprintf. This seems to return a (double-quoted) string if your formatted string is in double quotes, and an sq_string if it's in single quotes. If your formatted string is simply "%s", then you can pass a bunch of strings as subsequent arguments, and these will be concatenated (as a double-quoted string).
If you'd prefer to go in the reverse direction and ensure your strings are always single quoted, you can then still do the above with a single-quoted formatted string, or you can just use strcat: this does not trigger your warning, can also be called with a single argument, and seems to always return an sq_string.
Also, since I would generally recommend using either of these with "cell-generated sequence" syntax for convenience, this means that you would be better off "collecting" individual strings in cells more generally. E.g.
a = { 'one', 'two', 'three' }
b = { "four", "five", "six" }
typeinfo( sprintf( "%s", a{:} ) ) % outputs: string
typeinfo( strcat( b{:} ) ) % outputs: sq_string
I'm currently writing an interpreter for a language I have designed.
The lexer/parser (GLR) is written in Flex/Bison and the main interpreter in D - and everything working flawlessly so far.
The thing is I want to also add string interpolation, that is identify string literals that contain a specific pattern (e.g. "[some expression]") and convert the included expression. I think this should be done at parser level, from within the corresponding Grammar action.
My idea is converting/treating the interpolated string as what it would look like with simple concatenation (as it works right now).
E.g.
print "this is the [result]. yay!"
to
print "this is the " + result + ". yay!"
However, I'm a bit confused as to how I could do that in Bison: basically, how do I tell it to re-parse a specific string (while constructing the main AST)?
Any ideas?
You could reparse the string, if you really wanted you, by generating a reentrant parser. You would probably want a reentrant scanner, as well, although I suppose you could kludge something together with a default scanner, using flex's buffer stack. Indeed, it's worth learning how to build reentrant parsers and scanners on the general principle of avoiding unnecessary globals, whether or not you need them for this particular purpose.
But you don't really need to reparse anything; you can do the entire parse in one pass. You just need enough smarts in your scanner so that it knows about nested interpolations.
The basic idea is to let the scanner split the string literal with interpolations into a series of tokens, which can easily be assembled into an appropriate AST by the parser. Since the scanner may return more than one token out of a single string literal, we'll need to introduce a start condition to keep track of whether the scan is currently inside a string literal or not. And since interpolations can, presumably, be nested we'll use flex's optional start condition stack, enabled with %option stack, to keep track of the nested contexts.
So here's a rough sketch.
As mentioned, the scanner has extra start conditions: SC_PROGRAM, the default, which is in effect while the scanner is scanning regular program text, and SC_STRING, in effect while the scanner is scanning a string. SC_PROGRAM is only needed because flex does not provide an official interface to check whether the start condition stack is empty; aside from nesting, it is identical to the INITIAL top-level start condition. The start condition stack is used to keep track of interpolation markers ([ and ] in this example), and it is needed because an interpolated expression might use brackets (as array subscripts, for example) or might even include a nested interpolated string. Since SC_PROGRAM is, with one exception, identical to INITIAL, we'll make it an inclusive rule.
%option stack
%s SC_PROGRAM
%x SC_STRING
%%
Since we're using a separate start condition to analyse string literals, we can also normalise escape sequences as we parse. Not all applications will want to do this, but it's pretty common. But since that's not really the point of this answer, I've left out most of the details. More interesting is the way that embedded interpolation expressions are handled, particularly deeply nested ones.
The end result will be to turn string literals into a series of tokens, possibly representing a nested structure. In order to avoid actually parsing in the scanner, we don't make any attempt to create AST nodes or otherwise rewrite the string literal; instead, we just pass the quote characters themselves through to the parser, delimiting the sequence of string literal pieces:
["] { yy_push_state(SC_STRING); return '"'; }
<SC_STRING>["] { yy_pop_state(); return '"'; }
A very similar set of rules is used for interpolation markers:
<*>"[" { yy_push_state(SC_PROGRAM); return '['; }
<INITIAL>"]" { return ']'; }
<*>"]" { yy_pop_state(); return ']'; }
The second rule above avoids popping the start condition stack if it is empty (as it will be in the INITIAL state). It's not necessary to issue an error message in the scanner; we can just pass the unmatched close bracket through to the parser, which will then do whatever error recovery seems necessary.
To finish off the SC_STRING state, we need to return tokens for pieces of the string, possibly including escape sequences:
<SC_STRING>{
[^[\\"]+ { yylval.str = strdup(yytext); return T_STRING; }
\\n { yylval.chr = '\n'; return T_CHAR; }
\\t { yylval.chr = '\t'; return T_CHAR; }
/* ... Etc. */
\\x[[:xdigit]]{2} { yylval.chr = strtoul(yytext, NULL, 16);
return T_CHAR; }
\\. { yylval.chr = yytext[1]; return T_CHAR; }
}
Returning escaped characters like that to the parser is probably not the best strategy; normally I would use an internal scanner buffer to accumulate the entire string. But it was simple for illustrative purposes. (Some error handling is omitted here; there are various corner cases, including newline handling and the annoying case where the last character in the program is a backslash inside an unterminated string literal.)
In the parser, we just need to insert a concatenation node for interpolated strings. The only complication is that we don't want to insert such a node for the common case of a string literal without any interpolations, so we use two syntax productions, one for a string with exactly one contained piece, and one for a string with two or more pieces:
string : '"' piece '"' { $$ = $2; }
| '"' piece piece_list '"' { $$ = make_concat_node(
prepend_to_list($2, $3));
}
piece : T_STRING { $$ = make_literal_node($1); }
| '[' expr ']' { $$ = $2; }
piece_list
: piece { $$ = new_list($1); }
| piece_list piece { $$ = append_to_list($1, $2); }
What is an efficient way in MATLAB to replace/insert one symbol (in series of symbols) with several others that correspond to the one that is being replaced?
For example, consider having a string Eq: Eq = 'A*exp(-((x-xc)/w)^2)'. Is there a way to replace * with .*, / with ./,\ with .\, and ^ with .^ without writing four separate strrep() lines?
Regular expressions will do the job nicely. Regular expressions simply find patterns in text. You specify what kind of pattern you are looking for by a regular expression, and the output gives you the locations of where the pattern occurred.
For our particular case, not only do we want to find where patterns occur, we also want to replace those patterns with something else. Specifically, use the function regexprep from MATLAB to replace matches in a string with something else. What you want to do is replace all *, /, \ and ^ symbols by adding a . in front of each.
How regexprep works is that the first input is the string you're looking at, the second input is a pattern that you're trying to find. In our case, we want to find any of *, /, \ and ^. To specify this pattern, you put those desired symbols in [] brackets. Regular expressions reserve \ as a special symbol to delineate characters that can be parsed as a regular expression but actually aren't. As such, you need to use \\ for the \ character and \^ for the ^ character. The third input is what you want to replace each match with. In our case, we simply want to reuse each matched character, but we add a . at the beginning of the match. This is done by doing \.$0 in the regular expression syntax. $0 means to grab the first token produced by a match... which is essentially the matched symbol from the pattern. . is also a reserved keyword using regular expressions, so we must prepend this symbol with a \ character.
Without further ado:
>> Eq = 'A*exp(-((x-xc)/w)^2)';
>> out = regexprep(Eq, '[*/\\\^]', '\.$0')
out =
A.*exp(-((x-xc)./w).^2)
The pattern we are looking for is [*/\\\^], which means that we want to find any of *, /, \ - denoted as \\ in regex, and \^ - denoted as ^ in regex. We want to find any of these symbols and replace them with the same symbol by adding a . character in front - \.$0.
As a more complicated example, let's make sure that we include all of the symbols you're looking for in a sample equation:
>> A = 'A*exp(-((x-xc)/w)^2) \ b^2';
>> out = regexprep(A, '[*/\\\^]', '\.$0')
out =
A.*exp(-((x-xc)./w).^2) .\ b.^2
I'd go with regexp as in rayryeng's answer. But here's another approach, just to provide an alternative.
ops = '*/\^'; %// operators that need a dot
ii = find(ismember(Eq, ops)); %// find where dots should be inserted
[~, jj] = sort([1:numel(Eq) ii-.5]); %// will be used to properly order the result
result = [Eq repmat('.',1,numel(ii))]; %// insert dots at the end
result = result(jj); %// properly order the result
And a variant:
ops = '*/\^'; %// operators that need a dot
ii = find(ismember(Eq, ops)); %// find where dots should be inserted
jj = sort([1:numel(Eq) ii-.5]); %// dot locations are marked with fractional part
result = Eq(ceil(jj)); %// repeat characters where the dots will be placed
result(mod(jj,1)>0) = '.'; %// place dots at indices with fractional part
The vectorize function already does almost all of what you want except that it does not convert mldivide (\) to ldivide (.\).
By "efficient," do you mean fewer lines of code or faster? Regular expressions are almost always slower than other approaches and less readable. I don't think they're necessary or a good choice in this case. If you only need to convert your string once, then speed is less of a concern than readability (strrep will still be faster). If you need to do it many times, this simple code that you alluded to is 4–5 times faster than regexrep for short strings like your example (and much faster for longer strings):
out = strrep(Eq,'*','.*');
out = strrep(out,'/','./');
out = strrep(out,'\','.\');
out = strrep(out,'^','.^');
If you want one line, use:
out = strrep(strrep(strrep(strrep(Eq,'*','.*'),'/','./'),'\','.\'),'^','.^');
which will also be slightly faster still. Or create your own version of vectorize and call that.
Where regular expressions shine is in more complex cases, e.g., if your string is already partially vectorized: Eq = 'A.*exp(-((x-xc)/w)^2)'. Even still, the vectorize function just uses strrep and then calls strfind to "remove any possible '..*', '../', etc." and replace them with the proper element-wise operators because it's faster (symbolic math strings can get very large, for example).
I'd like to calculate the length of a replace string used in a substitution. That is, "bar" in :s/foo/bar. Suppose I have access to this command string, I can run and undo it, and may separate the parts marked by / with split(). How would I get the string length of the replace string if it contains special characters like \1, \2 etc or ~?
For instance if I have
:s/\v(foo)|(bars)/\2\rreplace/
the replace length would be strlen("bars\rreplace") = 12.
EDIT: Just to be clear, I hope to use this to move the cursor past the text that was affected by a substitute operation. I'd appreciate alternative solutions as well.
You have to use :help sub-replace-expression. In it, you use submatch(2) instead of \2. If the expression is a custom function, you can as a side effect store the original length in a variable, and access that later:
function! Replace()
let g:replaceLength = strlen(submatch(0))
" Equivalent of \2\rreplace
return submatch(2) . "\r" . 'replace'
endfunction
:s/\v(foo)|(bars)/\=Replace()/
Basically, I have a string that consists of multiple, space-separated words. The thing is, however, that there can be multiple spaces instead of just one separating the words. This is why [split] does not do what I want:
split "a b"
gives me this:
{a {} {} {} b}
instead of this:
{a b}
Searching Google, I found a page on the Tcler's wiki, where a user asked more or less the same question.
One proposed solution would look like this:
split [regsub -all {\s+} "a b" " "]
which seems to work for simple string. But a test string such as [string repeat " " 4] (used string repeat because StackOverflow strips multiple spaces) will result in regsub returning " ", which split would again split up into {{} {}} instead of an empty list.
Another proposed solution was this one, to force a reinterpretation of the given string as a list:
lreplace "a list with many spaces" 0 -1
But if there's one thing I've learned about TCL, it is that you should never use list functions (starting with l) on strings. And indeed, this one will choke on strings containing special characters (namely { and }):
lreplace "test \{a b\}"
returns test {a b} instead of test \{a b\} (which would be what I want, every space-separated word split up into a single element of the resulting list).
Yet another solution was to use a 'filter':
proc filter {cond list} {
set res {}
foreach element $list {if [$cond $element] {lappend res $element}}
set res
}
You'd then use it like this:
filter llength [split "a list with many spaces"]
Again, same problem. This would call llength on a string, which might contain special characters (again, { and }) - passing it "\{a b\}" would result in TCL complaining about an "unmatched open brace in list".
I managed to get it to work by modifying the given filter function, adding a {*} in front of $cond in the if, so I could use it with string length instead of llength, which seemed to work for every possible input I've tried to use it on so far.
Is this solution safe to use as it is now? Would it choke on some special input I didn't test so far? Or, is it possible to do this right in a simpler way?
The easiest way is to use regexp -all -inline to select and return all words. For example:
# The RE matches any non-empty sequence of non-whitespace characters
set theWords [regexp -all -inline {\S+} $theString]
If instead you define words to be sequences of alphanumerics, you instead use this for the regular expression term: {\w+}
You can use regexp instead:
From tcl wiki split:
Splitting by whitespace: the pitfalls
split { abc def ghi}
{} abc def {} ghi
Usually, if you are splitting by whitespace and do not want those blank fields, you are better off doing:
regexp -all -inline {\S+} { abc def ghi}
abc def ghi