Macros and string interpolation (Julia) - metaprogramming

Let's say I write this simple string macro:
macro e_str(s)
return string("I touched this: ",s)
end
If I apply it to a string with interpolation, I obtain:
julia> e"foobar $(log(2))"
"I touched this: foobar \$(log(2))"
Whereas I would like to obtain:
julia> e"foobar $(log(2))"
"I touched this: foobar 0.6931471805599453"
What changes do I have to make to my macro declaration?

It's better to parse the string at compile-time than to delegate to Julia. Basically, put the string into an IOBuffer, scan the string for $ signs, and use the parse function whenever they come up.
macro e_str(s)
components = []
buf = IOBuffer(s)
while !eof(buf)
push!(components, rstrip(readuntil(buf, '$'), '$'))
if !eof(buf)
push!(components, parse(buf; greedy=false))
end
end
quote
string($(map(esc, components)...))
end
end
This doesn't work with escaped $ characters, but that can be resolved with some minor changes to handle \ also. I have included a basic example at the bottom of this post.
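The remark about escaped $ characters can be made concrete with a rough Python analogue of the scanning approach. This is Python, not Julia. split_template and interpolate are hypothetical names, and eval stands in for Julia's parse, so only use it on trusted input. The sketch splits a template into literal and expression parts, treats $(...) (with balanced parentheses) and $name as interpolations, and treats a backslash directly before $ as a literal dollar:

```python
def split_template(s):
    """Split s into ("text", ...) and ("expr", ...) parts.

    $name and $(expr) start an interpolation. A backslash directly before
    a dollar sign makes it a literal dollar. Parentheses inside quoted
    strings within $( ... ) are not treated specially in this sketch.
    """
    parts, buf, i = [], [], 0
    while i < len(s):
        c = s[i]
        if c == "\\" and i + 1 < len(s) and s[i + 1] == "$":
            buf.append("$")  # escaped dollar: keep it as literal text
            i += 2
        elif c == "$":
            if buf:  # flush the literal text collected so far
                parts.append(("text", "".join(buf)))
                buf = []
            if i + 1 < len(s) and s[i + 1] == "(":
                # $( ... ): find the matching close paren by counting depth
                depth, j = 0, i + 1
                while j < len(s):
                    if s[j] == "(":
                        depth += 1
                    elif s[j] == ")":
                        depth -= 1
                        if depth == 0:
                            break
                    j += 1
                else:
                    raise SyntaxError("unbalanced $( in template")
                parts.append(("expr", s[i + 2:j]))
                i = j + 1
            else:
                # $name: take the longest run of identifier characters
                j = i + 1
                while j < len(s) and (s[j].isalnum() or s[j] == "_"):
                    j += 1
                if j == i + 1:
                    raise SyntaxError("expected a name or ( after $")
                parts.append(("expr", s[i + 1:j]))
                i = j
        else:
            buf.append(c)
            i += 1
    if buf:
        parts.append(("text", "".join(buf)))
    return parts


def interpolate(s, namespace):
    """Evaluate each expr part in namespace (via eval) and join with str()."""
    return "".join(
        piece if kind == "text" else str(eval(piece, namespace))
        for kind, piece in split_template(s)
    )
```

Keeping the split separate from evaluation mirrors the Julia macro, where the components are collected first and only spliced into a string(...) call at the end.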
I wrote it this way because string macros are generally not meant for emulating Julia strings; regular macros with regular string literals are better for that purpose. So writing the parsing yourself isn't that bad, especially because it allows customized extensions. If you really want parsing to be identical to how Julia parses it, you could escape the string and then reparse it, as @MattB suggested:
macro e_str(s)
esc(parse("\"$(escape_string(s))\""))
end
The resulting expression is a :string expression which you could dump and inspect, and then analyse the usual way.
String macros do not come with built-in interpolation facilities, but the functionality can be implemented manually. Note that a string literal using the same delimiter as the surrounding string macro cannot be embedded without escaping: although """ $("x") """ is possible, " $("x") " is not, and must instead be written as " $(\"x\") ".
There are two approaches to implementing interpolation manually: implement parsing manually, or get Julia to do the parsing. The first approach is more flexible, but the second approach is easier.
Manual parsing
macro interp_str(s)
components = []
buf = IOBuffer(s)
while !eof(buf)
push!(components, rstrip(readuntil(buf, '$'), '$'))
if !eof(buf)
push!(components, parse(buf; greedy=false))
end
end
quote
string($(map(esc, components)...))
end
end
Julia parsing
macro e_str(s)
esc(parse("\"$(escape_string(s))\""))
end
This method escapes the string (but note that escape_string does not escape the $ signs) and passes it back to Julia's parser to parse. Escaping the string is necessary to ensure that " and \ do not affect the string's parsing. The resulting expression is a :string expression, which can be examined and decomposed for macro purposes.
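Python's f-strings allow a close analogue of this escape-and-reparse trick, sketched below (Python, not Julia; parse_as_fstring and render are hypothetical helpers, and Python marks interpolation with {...} rather than $(...)). repr() plays the role of escape_string: it escapes backslashes and, where needed, quotes, but leaves the braces alone, just as escape_string leaves $ alone. ast.parse then does the parsing:

```python
import ast


def parse_as_fstring(s):
    """Escape s with repr() and reparse it as an f-string expression.

    repr() escapes backslashes and, where needed, quotes, but leaves { and }
    alone, so the braces still mark interpolations when reparsed.
    """
    return ast.parse("f" + repr(s), mode="eval")


def render(s, namespace):
    """Compile the parsed f-string and evaluate it in namespace."""
    tree = parse_as_fstring(s)
    return eval(compile(tree, "<interp>", "eval"), namespace)
```

The body of the returned tree is an ast.JoinedStr node whose values alternate between constant text and formatted expressions, so it can be inspected (for example with ast.dump) before evaluation, much like dumping the :string expression in Julia.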

Related

A string interpolation within concatenation is producing two double quotes instead of one (JuliaLang)

I am trying to include a single double quote in a string during a concatenation within JuliaLang, as below:
tmpStr = string(tmpStr, string("graph [label=\" hi \"]; "))
The output in the text file written with writedlm is:
graph [label="" hi ""]
How can I modify the string interpolation to include only a single double quote instead of this repetition?
The extra double quotes come from writedlm. writedlm uses the standard CSV escaping method: a field containing special characters is surrounded by double quotes, and each literal double quote inside it is written as "". This is OK, as long as you apply the inverse transformation when reading the file.
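Python's csv module applies the same convention (the RFC 4180 quoting rule), which makes it easy to see the doubling and the inverse transformation side by side. A small sketch using only the standard library; write_row and read_rows are hypothetical helper names:

```python
import csv
import io


def write_row(fields):
    """Write one row with the csv module's default dialect and return the text."""
    buf = io.StringIO()
    csv.writer(buf).writerow(fields)
    return buf.getvalue()


def read_rows(text):
    """Parse CSV text back into a list of rows, undoing the quote doubling."""
    return list(csv.reader(io.StringIO(text)))
```

Writing a field consisting of a single " produces """" (plus the default \r\n line terminator), and reading that back yields the original single quote, which is the same round trip shown for writedlm and readdlm.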
A good method to trace such problems is to create a minimal working example. In this case, something like:
writedlm("tst.tst",["\""])
This writes tst.tst, which now contains:
""""
But when it is read back properly:
julia> data = readdlm("tst.tst")
1×1 Array{Any,2}:
"\""
As expected.
Another option to avoid getting the extra quotes is to add quotes=false as an option to writedlm, as in the following example:
julia> writedlm(STDOUT,["\""],quotes=false)
"

(F)Lex checking symbol without "consuming" it

The purpose of this is to concatenate strings (with (f)lex if possible) if they're written consecutively separated only by whitespace.
Strings start and end with "s.
The thing is, I used states, and while they concatenate the strings, they also consume the next character/symbol that comes right after the strings.
For example -- "this " "is only " "1 string"id -- this will concatenate the strings ("this is only 1 string") but it will also "consume" the i in id thus destroying one token.
Is there a way to check the next char/symbol without actually "consuming" it (I can't think of a better term)?
\" yy_push_state(X_STRING); yylval.s = new std::string("");
<X_STRING>\" yy_push_state(X_CONC);
<X_STRING>. *yylval.s += yytext;
<X_STRING>\n yyerror("newline in string");
<X_CONC>[ ^\n] ;
<X_CONC>\" yy_pop_state();
<X_CONC>. yy_pop_state(); yy_pop_state(); return STRING;
Any way to do it?
You can use yyless(0) to cause the current token to be rescanned. Make sure you change the start condition, or you'll end up with an endless loop.
By the way, I think your code would be more readable if you switched start conditions with BEGIN rather than using the state stack. In fact, you could easily avoid start conditions, but that would make interpreting escape sequences more complicated. Possibly better would be to just avoid X_CONC by using a rule for \"[[:space:]]*\".
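The last suggestion, matching a whole run of adjacent literals as one token so that nothing after the final closing quote is consumed, can be illustrated with a Python regex tokenizer. This is a sketch of the idea rather than flex code; the token names and the tokenize helper are hypothetical, and escape sequences inside strings are not handled:

```python
import re

# STRING swallows a whole run of literals separated only by whitespace,
# so the match ends right after the last closing quote.
TOKEN_RE = re.compile(r'''
    (?P<STRING>"[^"\n]*"(?:\s*"[^"\n]*")*)
  | (?P<ID>[A-Za-z_]\w*)
  | (?P<WS>\s+)
''', re.VERBOSE)

# Extracts the contents of each individual literal inside a STRING match.
PIECE_RE = re.compile(r'"([^"\n]*)"')


def tokenize(text):
    """Return (kind, value) pairs; whitespace between tokens is skipped."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        kind = m.lastgroup
        if kind == "STRING":
            # Concatenate the pieces of the adjacent literals.
            tokens.append(("STRING", "".join(PIECE_RE.findall(m.group()))))
        elif kind == "ID":
            tokens.append(("ID", m.group()))
        pos = m.end()
    return tokens
```

Because the regex engine backtracks out of the optional "\s* followed by a quote" tail when no quote follows, the identifier right after the last literal is left for the next token, which is the effect the yyless(0) rescan achieves in flex.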

Multiline string literal in Matlab?

Is there a multiline string literal syntax in Matlab or is it necessary to concatenate multiple lines?
I found the verbatim package, but it only works in an m-file or function and not interactively within editor cells.
EDIT: I am particularly after readability and ease of modifying the literal in the code (imagine it contains indented blocks of different levels) - it is easy to make multiline strings, but I am looking for the most convenient syntax for doing that.
So far I have
t = {...
'abc'...
'def'};
t = cellfun(@(x) [x sprintf('\n')],t,'Unif',false);
t = horzcat(t{:});
which gives size(t) = 1 8, but is obviously a bit of a mess.
EDIT 2: Basically verbatim does what I want, except it doesn't work in Editor cells, but maybe my best bet is to update it so it does. I think it should be possible to get the current open file and cursor position from the Java interface to the Editor. The problem would be distinguishing between multiple verbatim calls in the same cell.
I'd go for:
multiline = sprintf([ ...
'Line 1\n'...
'Line 2\n'...
]);
Matlab is an oddball in that escape processing in strings is a function of the printf family of functions instead of the string literal syntax. And no multiline literals. Oh well.
I've ended up doing two things. First, make CR() and LF() functions that just return processed \r and \n respectively, so you can use them as pseudo-literals in your code. I prefer doing this way rather than sending entire strings through sprintf(), because there might be other backslashes in there you didn't want processed as escape sequences (e.g. if some of your strings came from function arguments or input read from elsewhere).
function out = CR()
out = char(13); % sprintf('\r')
function out = LF()
out = char(10); % sprintf('\n')
Second, make a join(glue, strs) function that works like Perl's join or the cellfun/horzcat code in your example, but without the final trailing separator.
function out = join(glue, strs)
strs = strs(:)';
strs(2,:) = {glue};
strs = strs(:)';
strs(end) = [];
out = cat(2, strs{:});
And then use it with cell literals like you do.
str = join(LF, {
'abc'
'defghi'
'jklm'
});
You don't need the "..." ellipses in cell literals like this; omitting them does a vertical vector construction, and it's fine if the rows have different lengths of char strings because they're each getting stuck inside a cell. That alone should save you some typing.
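The interleave-then-drop idea behind join can be sketched in Python as well. Python's built-in str.join already does this, so the explicit version below only mirrors the MATLAB steps; join_like_matlab is a hypothetical name:

```python
def join_like_matlab(glue, strs):
    """Interleave glue after every string, then drop the final glue.

    Mirrors the MATLAB steps: build {s1, glue, s2, glue, ..., sn, glue},
    delete the last element (strs(end) = []), then concatenate.
    """
    interleaved = []
    for s in strs:
        interleaved.extend([s, glue])
    if interleaved:
        interleaved.pop()  # drop the trailing separator
    return "".join(interleaved)
```

Dropping the last element is what distinguishes this from the cellfun/horzcat version in the question, which appends the separator to every string and therefore ends with a trailing newline.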
Bit of an old thread but I got this
multiline = join([
"Line 1"
"Line 2"
], newline)
I think it makes things pretty easy, but obviously it depends on what one is looking for :)

Make string manipulation more convenient in Mathematica

With Mathematica I always feel that strings are "second class citizens." Compared to a language such as Perl, one must juggle a lot of code to accomplish the same task.
The available functionality is not bad, but the syntax is uncomfortable. While there are a few shorthand forms such as <> for StringJoin and ~~ for StringExpression, most of the string functionality lacks such syntax, and uses clumsy names like: StringReplace, StringDrop, StringReverse, Characters, CharacterRange, FromCharacterCode, and RegularExpression.
In Mathematica strings are handled like mathematical objects, allowing 5 "a" + "b" where "a" and "b" act as symbols. This is a feature that I would not change, even if changing it would not break stacks of code. Nevertheless, it precludes certain terse string syntax, wherein the expression 5 "a" + "b" would be rendered "aaaaab", for example.
What is the best way to make string manipulation more convenient in Mathematica?
Ideas that come to mind, either alone or in combination, are:
1. Overload existing functions to work on strings, e.g. Take, Replace, Reverse.
   This was the original topic of my question, to which Sasha replied. It was seen as inadvisable.
2. Use shortened names for string functions, e.g. StringReplace >> StrRpl, Characters >> Chrs, RegularExpression >> "RegEx".
3. Create new infix syntax for string functions, and possibly new string operations.
4. Create a new container for strings, e.g. str["string"], and then definitions for various functions. (This was suggested by Leonid Shifrin.)
5. A variation of (4): expand strings (automatically?) to characters, e.g. "string" >> str["s","t","r","i","n","g"], so that the characters can be seen by Part, Take, etc.
6. Call another language such as Perl from within Mathematica to handle string processing.
7. Create new string functions that conglomerate frequently used sequences of operations.
I think the reason these operations have String* names is that they have tiny differences compared to their list counterparts. Specifically compare Cases to StringCases.
Now, the way to achieve what you want is this:
Begin["StringOverload`"];
{Drop, Cases, Take, Reverse};
Unprotect[String];
ToStringHead[Drop] = StringDrop;
ToStringHead[Take] = StringTake;
ToStringHead[Cases] = StringCases;
ToStringHead[Reverse] = StringReverse;
String /:
HoldPattern[(h : Drop | Cases | Take | Reverse)[s_String, rest__]] :=
With[{head = ToStringHead[h]}, head[s, rest]]
RemoveOverloading[] :=
UpValues[String] =
DeleteCases[UpValues[String],
x_ /; ! FreeQ[Unevaluated[x], (Drop | Cases | Take | Reverse)]]
End[];
You can load it with Get or Needs, and remove the overloading by calling RemoveOverloading[] in the correct context.
In[21]:= Cases["this is a sentence", RegularExpression["\\s\\w\\w\\s"]]
Out[21]= {" is "}
In[22]:= Take["This is dangerous", -9]
Out[22]= "dangerous"
In[23]:= Drop["This is dangerous", -9]
Out[23]= "This is "
I do not think doing this is the right way to go, though. You might consider introducing shorter symbols in some context, which would automatically evaluate to the String* symbols.
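The ToStringHead table combined with UpValues amounts to type-based dispatch: one generic name, with the string case routed to a string-specific implementation. As a rough analogue in Python (not Mathematica), functools.singledispatch does the same; take and drop are hypothetical helpers that follow Take/Drop's sign convention for n:

```python
from functools import singledispatch


@singledispatch
def take(expr, n):
    """Generic case: first n elements if n >= 0, last -n elements if n < 0."""
    items = list(expr)
    return items[:n] if n >= 0 else items[n:]


@take.register
def _(expr: str, n):
    # String case, analogous to routing Take to StringTake.
    return expr[:n] if n >= 0 else expr[n:]


@singledispatch
def drop(expr, n):
    """Generic case: drop the first n elements if n >= 0, the last -n if n < 0."""
    items = list(expr)
    return items[n:] if n >= 0 else items[:n]


@drop.register
def _(expr: str, n):
    # String case, analogous to routing Drop to StringDrop.
    return expr[n:] if n >= 0 else expr[:n]
```

The same caveat applies as in the answer: the string versions behave slightly differently from their list counterparts (here they return a string instead of a list), which is one argument for keeping distinct names.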

Modifying a character in a string in Lua

Is there any way to replace a character at position N in a string in Lua?
This is what I've come up with so far:
function replace_char(pos, str, r)
return str:sub(pos, pos - 1) .. r .. str:sub(pos + 1, str:len())
end
str = replace_char(2, "aaaaaa", "X")
print(str)
I can't use gsub either as that would replace every capture, not just the capture at position N.
Strings in Lua are immutable. That means that any solution that replaces text in a string must end up constructing a new string with the desired content. For the specific case of replacing a single character with some other content, you will need to split the original string into a prefix part and a postfix part, and concatenate them back together around the new content.
This variation on your code:
function replace_char(pos, str, r)
return str:sub(1, pos-1) .. r .. str:sub(pos+1)
end
is the most direct translation to straightforward Lua. It is probably fast enough for most purposes. I've fixed the bug that the prefix should be the first pos-1 chars, and taken advantage of the fact that if the last argument to string.sub is missing it is assumed to be -1 which is equivalent to the end of the string.
But do note that it creates a number of temporary strings that will hang around in the string store until garbage collection eats them. The temporaries for the prefix and postfix can't be avoided in any solution. But this also has to create a temporary for the first .. operator to be consumed by the second.
It is possible that one of two alternate approaches could be faster. The first is the solution offered by Paŭlo Ebermann, but with one small tweak:
function replace_char2(pos, str, r)
return ("%s%s%s"):format(str:sub(1,pos-1), r, str:sub(pos+1))
end
This uses string.format to do the assembly of the result in the hopes that it can guess the final buffer size without needing extra temporary objects.
But do beware that string.format is likely to have issues with any \0 characters in any string that it passes through its %s format. Specifically, since it is implemented in terms of standard C's sprintf() function, it would be reasonable to expect it to terminate the substituted string at the first occurrence of \0. (Noted by user Delusional Logic in a comment.) The Lua 5.1 reference manual states the restriction explicitly: string.format does not accept string values containing embedded zeros, except as arguments to the %q option.
A third alternative that comes to mind is this:
function replace_char3(pos, str, r)
return table.concat{str:sub(1,pos-1), r, str:sub(pos+1)}
end
table.concat efficiently concatenates a list of strings into a final result. It has an optional second argument which is text to insert between the strings, which defaults to "" which suits our purpose here.
My guess is that unless your strings are huge and you do this substitution frequently, you won't see any practical performance differences between these methods. However, I've been surprised before, so profile your application to verify there is a bottleneck, and benchmark potential solutions carefully.
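Python strings are immutable too, so the same prefix + replacement + suffix construction applies there. A sketch of the three variants side by side, using 1-based positions to match string.sub; the function names are hypothetical and none of this is Lua:

```python
def replace_char(pos, s, r):
    """Concatenation, like the .. version. pos is 1-based as in Lua."""
    return s[:pos - 1] + r + s[pos:]


def replace_char2(pos, s, r):
    """Formatting, like the string.format version."""
    return "%s%s%s" % (s[:pos - 1], r, s[pos:])


def replace_char3(pos, s, r):
    """Joining a list, like the table.concat version."""
    return "".join([s[:pos - 1], r, s[pos:]])
```

Unlike Lua 5.1's string.format, Python's %-formatting copies embedded \0 characters through unchanged, so the second variant has no such pitfall there; the advice to profile before choosing still applies.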
You should use pos inside your function instead of literal 1 and 3, but apart from this it looks good. Since Lua strings are immutable you can't really do much better than this.
Maybe
("%s%s%s"):format(str:sub(1,pos-1), r, str:sub(pos+1, str:len()))
is more efficient than the .. operator, but I doubt it - if it turns out to be a bottleneck, measure it (and then decide to implement this replacement function in C).
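With LuaJIT, you can use the FFI library to cast the string to a pointer to unsigned chars and write through it. This sidesteps string immutability and should be treated as a hack: LuaJIT interns strings, so every other reference to an identical string sees the change as well.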
local ffi = require 'ffi'
local txt = 'test'
local ptr = ffi.cast('uint8_t*', txt)
ptr[1] = string.byte('o')  -- FFI pointers are 0-based, so this overwrites 'e': txt now reads 'tost'
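If the goal is simply to edit bytes in place, a mutable copy avoids touching immutable string storage at all. In Python the counterpart of writing through the uint8_t* pointer is a bytearray; replace_byte is a hypothetical helper, and unlike the FFI cast it copies the data, so the original object is left intact:

```python
def replace_byte(data: bytes, index: int, new: int) -> bytes:
    """Return a copy of data with the byte at 0-based index set to new."""
    buf = bytearray(data)  # mutable copy; the original bytes object is untouched
    buf[index] = new       # 0-based, like ptr[1] in the FFI example
    return bytes(buf)
```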
