Understanding Raku's `&?BLOCK` compile-time variable - metaprogramming

I really appreciate Raku's &?BLOCK variable – it lets you recurse within an unnamed block, which can be extremely powerful. For example, here's a simple, inline, and anonymous factorial function:
{ when $_ ≤ 1 { 1 };
$_ × &?BLOCK($_ - 1) }(5) # OUTPUT: «120»
However, I have some questions about it when used in more complex situations. Consider this code:
{ say "Part 1:";
my $a = 1;
print ' var one: '; dd $a;
print ' block one: '; dd &?BLOCK ;
{
my $a = 2;
print ' var two: '; dd $a;
print ' outer var: '; dd $OUTER::a;
print ' block two: '; dd &?BLOCK;
print "outer block: "; dd &?OUTER::BLOCK
}
say "\nPart 2:";
print ' block one: '; dd &?BLOCK;
print 'postfix for: '; dd &?BLOCK for (1);
print ' prefix for: '; for (1) { dd &?BLOCK }
};
which yields this output (I've shortened the block IDs):
Part 1:
var one: Int $a = 1
block one: -> ;; $_? is raw = OUTER::<$_> { #`(Block|…6696) ... }
var two: Int $a = 2
outer var: Int $a = 1
block two: -> ;; $_? is raw = OUTER::<$_> { #`(Block|…8496) ... }
outer block: -> ;; $_? is raw = OUTER::<$_> { #`(Block|…8496) ... }
Part 2:
block one: -> ;; $_? is raw = OUTER::<$_> { #`(Block|…6696) ... }
postfix for: -> ;; $_ is raw { #`(Block|…9000) ... }
prefix for: -> ;; $_ is raw { #`(Block|…9360) ... }
Here's what I don't understand about that: why does the &?OUTER::BLOCK refer (based on its ID) to block two rather than block one? Using OUTER with $a correctly causes it to refer to the outer scope, but the same thing doesn't work with &?BLOCK. Is it just not possible to use OUTER with &?BLOCK? If not, is there a way to access the outer block from the inner block? (I know that I can assign &?BLOCK to a named variable in the outer block and then access that variable in the inner block. I view that as a workaround but not a full solution because it sacrifices the ability to refer to unnamed blocks, which is where much of &?BLOCK's power comes from.)
Second, I am very confused by Part 2. I understand why the &?BLOCK that follows the prefix for refers to an inner block. But why does the &?BLOCK that precedes the postfix for also refer to its own block? Is a block implicitly created around the body of the for statement? My understanding is that the postfix forms were useful in large part because they do not require blocks. Is that incorrect?
Finally, why do some of the blocks have OUTER::<$_> in them but others do not? I'm especially confused by Block 2, which is not the outermost block.
Thanks in advance for any help you can offer! (And if any of the code behavior shown above indicates a Rakudo bug, I am happy to write it up as an issue.)

That's some pretty confusing stuff you've encountered. That said, it does all make some kind of sense...
Why does the &?OUTER::BLOCK refer (based on its ID) to block two rather than block one?
Per the doc, &?BLOCK is a "special compile variable", as is the case for all variables that have a ? as their twigil.
As such it's not a symbol that can be looked up at run-time, which is what syntax like $FOO::bar is supposed to be about afaik.
So I think the compiler ought by rights to reject use of a "compile variable" with the package lookup syntax. (Though I'm not sure. Does it make sense to do "run-time" lookups in the COMPILING package?)
There may already be a bug filed (in either of the GH repos rakudo/rakudo/issues or raku/old-issues-tracker/issues) about it being erroneous to try to do a run-time lookup of a special compile variable (the ones with a ? twigil). If not, it makes sense to me to file one.
Using OUTER with $a correctly causes it to refer to the outer scope
The symbol associated with the $a variable in the outer block is stored in the stash associated with the outer block. This is what's referenced by OUTER.
Is it just not possible to use OUTER with &?BLOCK?
I reckon not for the reasons given above. Let's see if anyone corrects me.
If not, is there a way to access the outer block from the inner block?
You could pass it as an argument. In other words, close the inner block with }(&?BLOCK); instead of just }. Then you'd have it available as $_ in the inner block.
Why does the &?BLOCK that precedes the postfix for also refer to its own block?
It is surprising until you know why, but...
Is a block implicitly created around the body of the for statement?
Seems so, so the body can take an argument passed by each iteration of the for.
My understanding is that the postfix forms were useful in large part because they do not require blocks.
I've always thought of their benefit as being that they A) avoid a separate lexical scope and B) avoid having to type in the braces.
Is that incorrect?
It seems so. for has to be able to supply a distinct $_ to its statement(s) (you can put a series of statements in parens), so if you don't explicitly write braces, it still has to create a distinct lexical frame, and presumably it was considered better that the &?BLOCK variable tracked that distinct frame with its own $_, and "pretended" that was a "block", and displayed its gist with a {...}, despite there being no explicit {...}.
Why do some of the blocks have OUTER::<$_> in them but others do not?
While for (and given etc) always passes an "it" aka $_ argument to its blocks/statements, other blocks do not have an argument automatically passed to them, but they will accept one if the writer of the code passes one manually.
To support this wonderful idiom in which one can either pass or not pass an argument, blocks other than ones that are automatically fed an $_ are given this default of binding $_ to the outer block's $_.
I'm especially confused by Block 2, which is not the outermost block.
I'm confused by you being especially confused by that. :) If the foregoing hasn't sufficiently cleared this last aspect up for you, please comment on what it is about this last bit that's especially confusing.

During compilation the compiler has to keep track of various things. One of which is the current block that it is compiling.
The block object gets stored in the compiled code wherever it sees the special variable &?BLOCK.
Basically the compile-time variables aren't really variables, but more of a macro.
So whenever it sees &?BLOCK the compiler replaces it with whatever block it is currently compiling.
It just happens that &?OUTER::BLOCK is somehow close enough to &?BLOCK that it replaces that too.
I can show you that there really isn't a variable by that name by trying to look it up by name.
{ say ::('&?BLOCK') } # ERROR: No such symbol '&?BLOCK'
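To make the "more of a macro" idea concrete, here is a toy model in Python. It is only a sketch of the behaviour described above, not how Rakudo is implemented: the nested-list program representation, the `compile_block` helper, and the regex that ignores any package qualifier are all assumptions made for illustration.

```python
import itertools
import re

# Toy assumption: the substitution matches &?BLOCK and also any
# package-qualified spelling such as &?OUTER::BLOCK, ignoring the qualifier.
BLOCK_TOKEN = re.compile(r"&\?(?:\w+::)*BLOCK")

def compile_block(body, ids=None):
    """Replace every block token with the id of the block being compiled.

    `body` is a list of statements; a nested list stands for a nested block.
    """
    ids = ids if ids is not None else itertools.count(1)
    block_id = next(ids)  # the block the "compiler" is currently working on
    compiled = []
    for stmt in body:
        if isinstance(stmt, list):
            compiled.append(compile_block(stmt, ids))  # new current block
        else:
            compiled.append(BLOCK_TOKEN.sub(f"<Block {block_id}>", stmt))
    return compiled

program = ["dd &?BLOCK", ["dd &?BLOCK", "dd &?OUTER::BLOCK"], "dd &?BLOCK"]
compiled = compile_block(program)
```

In this model both tokens in the inner block turn into `<Block 2>`, because the substitution happens while the inner block is the current one and nothing is ever looked up through an outer scope at run time.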
Also every pair of {} (that isn't a hash ref or hash index) denotes a new block.
So each of these lines will say something different:
{
say &?BLOCK.WHICH;
say "{ &?BLOCK.WHICH }";
if True { say &?BLOCK.WHICH }
}
That means if you declare a variable inside one of those constructs it is contained to that construct.
"{ my $a = "abc"; say $a }"; # abc
say $a; # COMPILE ERROR: Variable '$a' is not declared
if True { my $b = "def"; say $b } # def
say $b; # COMPILE ERROR: Variable '$b' is not declared
In the case of postfix for, the left side needs to be a lambda/closure so that for can set $_ to the current value.
It was probably just easier to fake it up to be a Block than to create a new Code type just for that use.
Especially since an entire Raku source file is also considered a Block.
A bare Block can have an optional argument.
my &foo;
given 5 {
&foo = { say $_ }
}
foo( ); # 5
foo(42); # 42
If you give it an argument it sets $_ to that value.
If you don't, $_ will point to whatever $_ was outside of that declaration. (Closure)
For many of the uses of that construct, doing that can be very handy.
sub call-it-a (&c){
c()
}
sub call-it-b (&c, $arg){
c( $arg * 10 )
}
for ^5 {
call-it-a( { say $_ } ); # 0␤ 1␤ 2␤ 3␤ 4␤
call-it-b( { say $_ }, $_ ); # 0␤10␤20␤30␤40␤
}
For call-it-a we needed it to be a closure over $_ to work.
For call-it-b we needed it to be an argument instead.
By having :( ;; $_? is raw = OUTER::<$_> ) as the signature it caters to both use-cases.
This makes it easy to create simple lambdas that just do what you want them to do.
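For readers who think more easily in another language, the "optional parameter that defaults to the outer $_" signature can be modelled roughly like this in Python. This is an analogy only: `make_block`, the `scope` dict standing in for the enclosing lexical scope, and the sentinel are hypothetical names, and Python has no real topic variable.

```python
_NO_ARG = object()  # sentinel: distinguishes "no argument" from passing None

def make_block(scope):
    """Build a block whose topic defaults to the enclosing scope's topic.

    `scope` is a hypothetical stand-in for the outer lexical scope.
    """
    def block(topic=_NO_ARG):
        if topic is _NO_ARG:
            topic = scope["topic"]  # fall back to the outer value (closure)
        return topic
    return block

def call_it_a(c):
    return c()           # no argument: the block uses the outer topic

def call_it_b(c, arg):
    return c(arg * 10)   # explicit argument: it becomes the topic

results_a, results_b = [], []
scope = {}
for i in range(5):
    scope["topic"] = i
    block = make_block(scope)
    results_a.append(call_it_a(block))
    results_b.append(call_it_b(block, i))
```

Here `results_a` ends up as 0 through 4 and `results_b` as 0, 10, 20, 30, 40, which mirrors the two outputs in the Raku example.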

Related

How to know if returning an l-value when using `FALLBACK`?

How can I know if I actually need to return an l-value when using FALLBACK?
I'm using return-rw but I'd like to only use return where possible. I want to track if I've actually modified %!attrs or have only just read the value when FALLBACK was called.
Or (alternate plan B) can I attach a callback or something similar to my %!attrs to monitor for changes?
class Foo {
has %.attrs;
submethod BUILD { %!attrs{'bar'} = 'bar' }
# multi method FALLBACK(Str:D $name, *@rest) {
# say 'read-only';
# return %!attrs{$name} if %!attrs«$name»:exists;
# }
multi method FALLBACK(Str:D $name, *@rest) {
say 'read-write';
return-rw %!attrs{$name} if %!attrs«$name»:exists;
}
}
my $foo = Foo.new;
say $foo.bar;
$foo.bar = 'baz';
say $foo.bar;
This feels a bit like an X-Y question, so let's simplify the example, and see if that answer helps with your decisions.
First of all: if you return the "value" of a non-existing key in a hash, you are in fact returning a container that will auto-vivify the key in the hash when assigned to:
my %hash;
sub get($key) { return-rw %hash{$key} }
get("foo") = 42;
dd %hash; # Hash %hash = {:foo(42)}
Please note that you need to use return-rw here to ensure the actual container is returned, rather than just the value in the container. Alternately, you can use the is raw trait, which lets the value of the last statement be returned as-is, container included:
my %hash;
sub get($key) is raw { %hash{$key} }
get("foo") = 42;
dd %hash; # Hash %hash = {:foo(42)}
Note that you should not use return in that case, as that will still de-containerize again.
To get back to your question:
I want to track if I've actually modified %!attrs or have only just read the value when FALLBACK was called.
class Foo {
has %!attrs;
has %!unexpected;
method TWEAK() { %!attrs<bar> = 'bar' }
method FALLBACK(Str:D $name, *@rest) is raw {
if %!attrs{$name}:exists {
%!attrs{$name}
}
else {
%!unexpected{$name}++;
Any
}
}
}
This would either return the container found in the hash, or record the access to the unknown key and return an immutable Any.
Regarding plan B, recording changes: you could use a Proxy object for that.
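The general idea behind plan B (intercept stores so that reads and writes can be told apart) can be sketched outside Raku as well. The Python below is only an analogy, not a Raku Proxy: `TrackedAttrs` and its `modified` set are hypothetical names, and it records assignments made through `[]` only.

```python
class TrackedAttrs(dict):
    """A dict that records which keys were assigned after construction."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.modified = set()  # keys written via item assignment

    def __setitem__(self, key, value):
        # Only item assignment is intercepted; dict.update() and
        # setdefault() bypass this hook in CPython.
        self.modified.add(key)
        super().__setitem__(key, value)

attrs = TrackedAttrs(bar="bar")
read_value = attrs["bar"]              # a plain read records nothing
modified_after_read = set(attrs.modified)
attrs["bar"] = "baz"                   # a write is recorded
modified_after_write = set(attrs.modified)
```

The point of the sketch is that the bookkeeping lives in the store hook of the container, so the accessor that hands the container out does not need to know how it will be used.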
Hope this helps in your quest.
Liz's answer is full of useful info and you've accepted it but I thought the following might still be of interest.
How to know if returning an l-value ... ?
Let's start by ignoring the FALLBACK clause.
You would have to test the value. To deal with Scalars, you must test the .VAR of the value. (For non-Scalar values the .VAR acts like a "no op".) I think (but don't quote me) that Scalar|Array|Hash covers all the l-value super-types:
my \value = 42; # Int is an l-value is False
my \l-value-one = $; # Scalar is an l-value is True
my \l-value-too = @; # Array is an l-value is True
say "{.VAR.^name} is an l-value is {.VAR ~~ Scalar|Array|Hash}"
for value, l-value-one, l-value-too
How to know if returning an l-value when using FALLBACK?
Adding "when using FALLBACK" makes no difference to the answer.
How can I know if I actually need to return an l-value ... ?
Again, let's start by ignoring the FALLBACK clause.
This is a completely different question than "How to know if returning an l-value ... ?". I think it's the core of your question.
Afaik, the answer is, you need to anticipate how the returned value will be used. If there's any chance it'll be used as an l-value, and you want that usage to work, then you need to return an l-value. The language/compiler can't (or at least doesn't) help you make that decision.
Consider some related scenarios:
my $baz := foo.bar;
... (100s of lines of code) ...
$baz = 42;
Unless the first line returns an l-value, the assignment on the last line will fail.
But the situation is actually much more immediate than that:
routine-foo = 42;
routine-foo is evaluated first, in its entirety, before the lhs = rhs expression is evaluated.
Unless the compiler's resolution of the routine-foo call somehow incorporated the fact that the very next thing to happen would be that the lhs will be assigned to, then there would be no way for a singly or multiply dispatched routine-foo to know whether it can safely return an r-value or must return an l-value.
And the compiler's resolution does not incorporate that. Thus, for example:
multi term:<bar> is rw { ... }
multi term:<bar> { ... }
bar = 99; # Ambiguous call to 'term:<bar>(...)'
I can imagine this one day (N years from now) being solved by a combination of allowing = to be an overloadable operator, robust macros that allow overloading of = being available, and routine resolution being modified so the above ambiguous call could do something equivalent to resolving to the is rw multi. But I doubt it will actually come to pass even with N=10. Perhaps there is another way but I can't think of one at the moment.
How can I know if I actually need to return an l-value when using FALLBACK?
Again, adding "when using FALLBACK" makes no difference to the answer.
I want to track if I've actually modified %!attrs or have only just read the value when FALLBACK was called.
When FALLBACK is called it doesn't know what context it's being called in -- r-value or l-value. Any modification comes after it has already returned.
In other words, whatever solution you come up with will have nothing to do per se with FALLBACK (even if you have to use it to implement some other aspect of whatever it is you're trying to do).
(Even if it were, I suspect trying to solve it via FALLBACK itself would just make matters worse. One can imagine writing two FALLBACK multis, one with an is rw trait, but, as explained above, my imagination doesn't stretch to that making any difference any time soon, if ever, and could only happen if the above imaginary things happened (the macros etc.) and the compiler was also modified to pay attention to the two FALLBACK multi variants, and I'm not at all meaning to suggest that that even makes sense.)
Plan B
Or (alternate plan B) can I attach a callback or something similar to my %!attrs to monitor for changes?
As Lizmat notes, that's the realm of Proxys. And thus your next SO question... :)

How to convert a hash ref in one line to a constant in perl

I'm using Sphinx::Search.
Is there an easier way for this code example to convert a string to a constant?
use Sphinx::Search;
my $config = {
x => 'SPH_MATCH_EXTENDED2',
};
my $x = $config->{x};
print Sphinx::Search->$x(); # output: 6
I have used advice from
How do I access a constant in Perl whose name is contained in a variable?
and this example works, but if I am always using a string from a hash then do I need to put it into a separate variable to use it in this way?
my $x = $config->{x};
print Sphinx::Search->$x(); # output: 6
Is there a one-liner for this?
# does not work
print Sphinx::Search->$config->{x}();
You can create a reference to the value and immediately dereference it:
Sphinx::Search->${ \$config->{x} };
(If there are no arguments, the () is optional).
I'm guessing that SPH_MATCH_EXTENDED2 is the name of a constant that is exported by Sphinx::Search. The problem is that these are implemented as a subroutine with no parameters, so you may use them only where a bare subroutine name will be understood by Perl as a call, or where an explicit call is valid ( SPH_MATCH_EXTENDED2() )
The easiest solution is to avoid quoting the hash value at all, like so
my $config = { x => SPH_MATCH_EXTENDED2 };
and afterwards, you may use just
$config->{x}; # 6
instead of calling a pseudo class method
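The two approaches (look the constant up by name at the point of use, or store its value in the config up front) can be compared in a language-neutral way. The Python below is just an analogy: the `Search` class is a hypothetical stand-in for the module, and the value 6 is taken from the output shown in the question.

```python
class Search:
    """Hypothetical stand-in for a module exposing a constant as a method."""

    @staticmethod
    def SPH_MATCH_EXTENDED2():
        return 6  # value taken from the question's output

# Approach 1: the config holds the *name*, resolved dynamically on use.
by_name = {"x": "SPH_MATCH_EXTENDED2"}
value_from_name = getattr(Search, by_name["x"])()

# Approach 2: the config holds the *value*, so no lookup is needed later.
by_value = {"x": Search.SPH_MATCH_EXTENDED2()}
value_direct = by_value["x"]
```

Both give the same number; the second simply moves the lookup to the place where the config is built, which is what unquoting the hash value does in the Perl version.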

Is there any label concept available in TCL?

My requirement is that choosing a label should jump to the specific place where I have given its description.
Example
set a 20
switch -- $a {
"20" : goto check
"abc" : goto check1
}
Label 20:
puts "Given value is integer"
Label abc:
puts "Given value is alpha"
Is there any option like this available in Tcl?
Tcl doesn't support goto at all; its low-level semantics are incompatible with goto, though they work fine with just about all higher-level concepts (such as structured programming, state machines, etc.) What to do instead depends on exactly what you're doing; toy examples aren't very helpful here.
The one option for a direct goto is to use tcl::unsupported::assemble (Tcl 8.6 only).
proc foo a {
tcl::unsupported::assemble {
expr {
$a eq 20
}
jumpTrue check
expr {
$a eq "abc"
}
jumpTrue check2
jump end
label check
eval {
puts "Given value is integer"
}
pop
label check2
eval {
puts "Given value is alpha"
}
pop
label end
# There *must* be one result value pushed onto the stack at the end
push ""
}
}
puts "before"
foo 20
puts "mid-1"
foo abc
puts "mid-2"
foo 3.14
puts "after"
That lets you write a direct goto (the jumpTrue and jump; there's also a jumpFalse) to a label and the expr and eval pseudo-opcodes let you inject an expression evaluation or script rather than writing everything by hand. However, the writing of bytecode by hand will get very boring very quickly and the command isn't supported (because we don't really like our bytecode). This is how to do it, but it's truly not encouraged. In particular, you need to take care to manage the evaluation stack right; both expr and eval push one value, and the net stack effect of the whole bytecode needs to be to push exactly one value (or throw an exception).
Also, the assembler doesn't allow all bytecode instructions that the engine knows. Some are restricted because they're unsafe (except how the compiler uses them) and others because we've no idea how to describe them sensibly! There's not much help for what the legal bytecodes really are either…

Token empty when matching grammar although rule matched

So my rule is
/* Addition and subtraction have the lowest precedence. */
additionExp returns [double value]
: m1=multiplyExp {$value = $m1.value;}
( op=AddOp m2=multiplyExp )* {
if($op != null){ // test if matched
if($op.text == "+" ){
$value += $m2.value;
}else{
$value -= $m2.value;
}
}
}
;
AddOp : '+' | '-' ;
My test is 3 + 4 but op.text always returns NULL and never a char.
Does anyone know how I can test for the value of AddOp?
In the example from ANTLR4 Actions and Attributes it should work:
stat: ID '=' INT ';'
{
if ( !$block::symbols.contains($ID.text) ) {
System.err.println("undefined variable: "+$ID.text);
}
}
| block
;
Are you sure $op.text is always null? Your comparison appears to check for $op.text=="+" rather than checking for null.
I always start these answers with a suggestion that you migrate all of your action code to listeners and/or visitors when using ANTLR 4. It will clean up your grammar and greatly simplify long-term maintenance of your code.
This is probably the primary problem here: Comparing String objects in Java should be performed using equals: "+".equals($op.text). Notice that I used this ordering to guarantee that you never get a NullPointerException, even if $op.text is null.
I recommend removing the op= label and referencing $AddOp instead.
When you switch to using listeners and visitors, removing the explicit label will marginally reduce the size of the parse tree.
(Only relevant to advanced users) In some edge cases involving syntax errors, labels may not be assigned while the object still exists in the parse tree. In particular, this can happen when a label is assigned to a rule reference (your op label is assigned to a token reference), and an error appears within the labeled rule. If you reference the context object via the automatically generated methods in the listener/visitor, the instances will be available even when the labels weren't assigned, improving your ability to report details of some errors.

Velocity: Is it possible to nest macros that use #@ and $bodyContent?

I have a macro that looks essentially like this:
#macro( surround $x )
surround:$x
$bodyContent
/surround:$x
#end
Invocation #@surround("A")bunch o' stuff#end produces "surround:A bunch o' stuff /surround:A" as
expected. Invocation #@surround("A")#@surround("B")more stuff#end#end produces
surround:A surround:B more stuff /surround:B /surround:A which is exactly what I want.
But now I want to build upwards with another macro
#macro( annotated-surround $x $y )
#@surround( $x )
annotate:$y
$bodyContent
#end
#end
The intended expansion of #annotated-surround( "C" "note" ) stuff #end is
surround:C annotate:note stuff /surround:C
...but this doesn't work; I get the dreaded semi-infinite expansion of the annotated-surround body
content.
I have read the answer at Closure in Velocity template macros and still don't quite know whether what I want to do is possible.
I'm willing to do arbitrarily tricky things within the definitions of #surround and
#annotated-surround, but I don't want the users of those macros to see any complexity. The
whole idea is to simplify their lives.
As long as I have your ear: Setting macro.provide.scope.control=true is supposed to provide "a local namespace in macros". What does this mean? Is the provided namespace independent of the default context, but with a single such space shared among all invocations of all macros? Or is a separate context provided for each macro invocation, even recursively? It has to be the latter because of $macro.parent, right?
And yet another question. Consider the following macro:
#macro( recursive $x )
#if($x == 0)
zero
#else
$x before . . .
#set($xMinusOne = $x - 1)
#recursive($xMinusOne)
. . . $x after
#end
#end
#recursive( 4 ) yields:
4 before . . .
3 before . . .
2 before . . .
1 before . . .
zero . . .
0 after . . .
0 after . . .
0 after . . .
4 after
Now I understand all those occurrences of "0": there's only one global $x, so assigning to it on
the recursive calls smashes it and it doesn't get restored. But where on earth does that final "4"
come from? For that matter, how is it that my first "surround" macro works to arbitrary depth;
how come its final $x doesn't get smashed in inner calls?
Sorry to be so prolix, but I have been unable to find clear documentation in this matter.
The problem is the combination of global variables, a name collision, and lazy rendering.
Let's walk through the rendering process for #@annotated-surround( "x" "y" )content#end:
Rendering enters the annotated-surround macro. The context map contains:
$x = String x
$y = String y
$bodyContent = Renderable content - note that the String output of this has not yet been evaluated.
Rendering of the first line enters the surround macro. This updates the context map to:
new $x = old $x = String x
$y = String y
$bodyContent = Renderable annotate:$y\n$bodyContent - note that the String output of this still has not yet been evaluated, it's still template code.
Rendering outputs the first line of surround, producing the String surround:x.
Rendering begins evaluating the second line of surround, which references $bodyContent.
Rendering the first line of $bodyContent produces the String annotate:y.
Rendering begins evaluating the second line of $bodyContent, which references $bodyContent.
Rendering the first line of $bodyContent produces the String annotate:y.
Rendering begins evaluating the second line of $bodyContent, which references $bodyContent.
etc.
The solution is to remove part of the problem's combination. Global variables and lazy rendering are fundamental parts of how Velocity works, so you can't touch those. That leaves the name collision. What you need is for each macro's $bodyContent to be referred to with a different name. This is easily achieved by assigning it to new variables with unique names in each macro before invoking any other macros, and using the new variable in any invoked macro's body, like this:
#macro( surround $x )
surround:$x
$bodyContent
/surround:$x
#end
#macro( annotated-surround $x $y )
#set( $annotated-surround-content = $bodyContent )
#@surround( $x )
annotate:$y
$annotated-surround-content
#end
#end
Rendering of this version goes like this:
Rendering enters the annotated-surround macro. The context map contains:
$x = String x
$y = String y
$bodyContent = Renderable content - note that the String output of this has not yet been evaluated.
Rendering of the first line executes the #set directive, adding a variable to the context map: $annotated-surround-content = current $bodyContent = Renderable content.
Rendering of the second line enters the surround macro. This updates the context map to:
new $x = old $x = String x
$y = String y
$annotated-surround-content = old $bodyContent = Renderable content
$bodyContent = Renderable annotate:$y\n$annotated-surround-content
Rendering outputs the first line of surround, producing the String surround:x.
Rendering begins evaluating the second line of surround, which references $bodyContent.
Rendering the first line of $bodyContent produces the String annotate:y.
Rendering begins evaluating the second line of $bodyContent, which references $annotated-surround-content.
Rendering $annotated-surround-content produces the String content.
Rendering outputs the third line of surround, producing the String /surround:x.
The final rendered output is surround:x annotate:y content /surround:x. This approach can be generalized by applying such substitutions to all occurrences of $bodyContent that are inside the content of another macro call, each time using a variable name derived from the macro's name to ensure uniqueness. It won't work for recursive macros without something extra to distinguish each nested invocation, however.
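The two walkthroughs above can be replayed with a small Python model. It is not Velocity and makes simplifying assumptions: the context is a plain dict shared by every macro, a body is a callable that is rendered only when referenced, and the helper names (`surround`, `annotated_surround_broken`, `annotated_surround_fixed`) are made up for this sketch.

```python
def surround(ctx, x, body):
    # Entering the macro overwrites the shared (global) context entries.
    ctx["x"] = x
    ctx["bodyContent"] = body  # stored unrendered; evaluated when referenced
    return f"surround:{ctx['x']} {ctx['bodyContent'](ctx)} /surround:{ctx['x']}"

def annotated_surround_broken(ctx, x, y, body):
    ctx.update(x=x, y=y, bodyContent=body)
    # The body handed to surround still refers to the *name* bodyContent,
    # which surround is about to rebind to this very body.
    inner = lambda c: f"annotate:{c['y']} {c['bodyContent'](c)}"
    return surround(ctx, x, inner)

def annotated_surround_fixed(ctx, x, y, body):
    ctx.update(x=x, y=y, bodyContent=body)
    # Save the caller's body under a unique name before calling surround.
    ctx["annotated_surround_content"] = ctx["bodyContent"]
    inner = lambda c: f"annotate:{c['y']} {c['annotated_surround_content'](c)}"
    return surround(ctx, x, inner)

content = lambda c: "content"
fixed_output = annotated_surround_fixed({}, "x", "y", content)

try:
    annotated_surround_broken({}, "x", "y", content)
    broken_outcome = "terminated"
except RecursionError:
    broken_outcome = "unbounded self-reference"
```

In the broken variant the lazily rendered body looks up `bodyContent` and finds itself, so Python gives up with a RecursionError where Velocity shows the semi-infinite expansion. In the fixed variant the lookup goes through the uniquely named entry and rendering finishes with the expected string.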
Regarding the scope setting, all that does is add a $macro object to the context, which is unique to each macro invocation and can be used as a map. If you set $macro.myVar to something different in each of two nested macro calls, the outer macro's value for it will be unchanged when the inner one finishes. This does not help with the $bodyContent issue, however, because any reference to $macro inside a macro's $bodyContent will be resolved to the innermost macro when it's rendered.
Regarding the final 4 from #recursive( 4 ), that comes from a combination of macro arguments having local scope and being passed by name. For all but the outermost invocation of #recursive, the argument $x is a reference to the global context variable $xMinusOne - when they render the after line, the use of $x is actually resolved to looking up the current value of $xMinusOne in the global context. For the outermost invocation it is instead the constant value 4, and the arguments of the inner invocations go out of scope when they finish, so when the outermost one gets to the final line it's back to being 4.
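The pass-by-name explanation can be checked with a small Python model. This is a sketch under assumptions, not Velocity itself: each macro argument is modelled as a zero-argument function (a thunk) that is re-evaluated on every use, the global context is a dict, and `run_recursive` is a hypothetical helper name.

```python
def run_recursive(start):
    ctx = {}   # the single global context
    out = []   # rendered lines, in order

    def recursive(x):
        # `x` is a thunk: calling it re-resolves the argument expression.
        if x() == 0:
            out.append("zero")
        else:
            out.append(f"{x()} before")
            ctx["xMinusOne"] = x() - 1
            # The inner call receives a reference to $xMinusOne, not its value.
            recursive(lambda: ctx["xMinusOne"])
            out.append(f"{x()} after")

    recursive(lambda: start)  # outermost argument is a constant
    return out

lines = run_recursive(4)
```

By the time the inner invocations reach their "after" line, the global `xMinusOne` has already been driven down to 0, so they all print 0. Only the outermost invocation holds a constant, which is why the last line is "4 after".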
Starting with the easiest, macro.provide.scope.control=true will definitely create a separate $macro scope object for every macro invocation. Otherwise, as you note, the $macro.parent would be nonsense. The whole point of the "scope controls" is to provide an explicit namespace for the type of VTL block in question. You can even do surround.provide.scope.control=true to automatically create a $surround scope inside of #@surround bodyContent.
On your first question, I'm a little confused as to what's happening. Both the call to #@annotated-surround and the nested call to #@surround will make $bodyContent references available. Am I right that what's happening is that the "wrong" $bodyContent is being used? The $bodyContent reference should belong to the nearest block macro call. To reference the outer macro's $bodyContent within the inner macro, you'll probably need to #set( $macro.bodyContent = $bodyContent ) and then, within the inner, use it via $macro.parent.bodyContent
As for #recursive weirdness, I don't know offhand and have to get on to other work now. It also doesn't help that I don't have Velocity checked out on my present machine, so I can't quickly try things out.
