I found this very helpful SO page while trying to resolve an issue related to macro variable scope.
why doesn't %let create a local macro variable?
So to summarize, writing %let x = []; or %do x = [] %to []; in a macro will:
create a local-scope macro variable x if there is no "x" already in the global symbol table, or
update the global-scope macro variable "x" if an "x" is in the global symbol table
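To see how this differs from a lexically scoped default, here is a rough Python analogy (Python only stands in for the concept; its `global` declaration mimics the resolution SAS performs automatically):

```python
x = "outer"

def python_default():
    x = "local"   # plain assignment creates a new local binding
    return x

def sas_like():
    # SAS's %let behaves roughly as if this declaration were implicit
    # whenever the name already exists in an outer scope:
    global x
    x = "clobbered"

python_default()
print(x)   # outer -- Python left the outer x alone
sas_like()
print(x)   # clobbered -- the SAS-style assignment overwrote it
```

In Python an undeclared assignment defaults to the most local scope; in SAS it defaults to the nearest scope that already holds the name.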
This strikes me as very non-intuitive. I would be willing to bet there are tons of bugs out in the SAS wilderness due to this design choice. I rarely see %local statements in macros, even above loop statements that use common variable names like "i" or "counter". For example, I just pulled up the first paper with the word "macro" in the title from this list of SUGI and SAS Global Forum papers:
http://www.lexjansen.com/cgi-bin/xsl_transform.php?x=sgf2015&c=sugi
And indeed, I found this code in the first SAS conference paper I opened:
%macro flag;
  data CLAIMS;
    set CLAIMS;
    %do j = 1 %to 3;
      if icd9px&j in (&codelist) then _prostate = 1;
    %end;
  run;
%mend;
%flag;
http://support.sas.com/resources/papers/proceedings15/1340-2015.pdf
Woe unto anyone who calls %flag and also has their own &j variable. They could easily end up with no log errors but bogus results because their &j is 4 everywhere after they call %flag, which will be (from experience) a bug that is no fun to track down. Or worse, they may never recognize their results are bogus.
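A rough Python sketch of that failure mode (the `global` declaration stands in for SAS's scope resolution; note that in SAS the loop would actually leave j at 4, one past the %to bound, while Python leaves it at the last value):

```python
j = 99   # the caller's own counter

def flag():
    # Mimics the %flag macro above: because j is not declared local,
    # the loop index resolves to the caller's j and overwrites it.
    global j
    for j in range(1, 4):
        pass   # ... generate code that uses j ...

flag()
print(j)   # 3 -- the caller's j was silently changed by the call
```

No error, no warning: just a counter that quietly has the wrong value afterwards.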
So my question is, why was the decision made not to have all macro variables be local scope by default? Are there good reasons why SAS macro variable scope works the way it does?
Largely, because SAS is a 50-year-old language which existed before lexical scoping was clearly preferred.
SAS has a mixture of the two scoping concepts, but is mostly dynamically scoped unless you intentionally change that. This means that just by reading a function's definition, you can't tell what variables will be available to it at run-time, and assignment statements apply to whichever version of a variable is currently available at run-time (rather than being forced into the most local scope available).
That means that the macro compiler can't tell whether a particular assignment statement is intended to assign a local macro variable, or a possibly-existing-at-runtime higher-scope macro variable. SAS could enforce the local macro variable as you suggest, but that would turn SAS into a lexically scoped language, which isn't desired, both for consistency with the past (keeping backwards compatibility) and for functionality: SAS offers the ability to enforce lexical scoping (use %local) but doesn't offer the ability to intentionally alter a variable in a higher scope (some form of parent?) other than %global.
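That resolution rule can be made concrete with a toy emulation in Python (a sketch of the concept only, not SAS's actual implementation): assignment walks outward from the current scope and updates the nearest existing entry, creating a new local entry only when no scope already holds the name.

```python
# Toy emulation of SAS-style symbol-table resolution.
scopes = [{"x": "global"}]          # scopes[0] is the "global symbol table"

def let(name, value):               # behaves like %let
    for table in reversed(scopes):
        if name in table:
            table[name] = value     # update the nearest existing variable
            return
    scopes[-1][name] = value        # otherwise create it in the local scope

scopes.append({})                   # enter a "macro": push a local table
let("x", "set inside the macro")    # finds and overwrites the global x
let("y", "truly local")             # no outer y exists, so y is local
scopes.pop()                        # leave the macro: y vanishes

print(scopes[0]["x"])               # set inside the macro
print("y" in scopes[0])             # False
```

Declaring %local is the equivalent of pre-seeding the innermost table with the name before any assignment, so the walk stops there.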
Note that dynamic scoping was very common back in the '60s and '70s; S-Plus, Lisp, etc. all had dynamic scoping. SAS tends to prefer backwards compatibility as far back as possible. SAS is also commonly used by analysts, rather than programmers, and so needs to avoid complexity whenever possible. They offer %local for those of us who do want the advantages of lexical scoping.
Answering WHY were the scoping rules defined this way is hard for me, without knowing the history of the macro language.
When I learned the macro language (on 6.12), I was lucky to be taught from early on that macros should always declare their variables to be %LOCAL, unless they had a really good reason not to. Sometimes if a macro var was not declared to be %local or %global I would even put a /* Not Local: MyMacVar */ comment in them to document that I did not intend to declare the scope (this is unusual but sometimes useful). It pains me to see UG papers, SO answers, etc, that do not declare variables as %LOCAL.
I'm going to guess (this is just a guess) that there was some early version of SAS which had (global) macro variables for text generation in code, but did not have macros. In such a version, people would have gotten used to having lots of global macro variables, along with the associated problems (e.g. collisions). Then, when SAS designed macros, the question would have come up: "Can I reference my macro vars from inside a macro?" And the designer chose to answer: "Yes, not only can you reference them, you can also assign values to them, and I'll make it easy by allowing you to do that by default. But a macro will also create its own scope that can hold local macro variables. If you reference or assign a macro var with the same name as one that exists in the global scope (or any outer scope), I'll assume you mean that outer macro variable (like you are used to already), unless you have explicitly declared the macro var to be %LOCAL."
From the perspective of the current macro language / macro developer, most folks think most global macro vars should be avoided. And one of the benefits of the macro language is that it provides macros that allow for modularization/encapsulation/information-hiding. When viewed from this perspective, %local variables are more useful, and macro variables that are not declared to be %local are a threat to encapsulation (i.e. collision threat). So I would tend to agree that if I were redesigning the macro language, I would make macro variables %local by default. But of course at this point, it's too late for a change.
Then we couldn't do this, or at least not without a new declarative statement.
%let c=C is global;
%macro b(arg);
  %let &arg=Set by B;
%mend b;
%macro a(arg);
  %local c;
  %b(c);
  %put NOTE: &=c;
%mend a;
%a();

NOTE: C=Set by B
Related
I've read the spec, but I'm still confused about how a my class differs from an our class. What are the differences, and when should I use which?
The my scope declarator implies lexical scoping: following its declaration, the symbol is visible to the code within the current set of curly braces. We thus tend to call the region within a pair of curly braces a "lexical scope". For example:
sub foo($p) {
# say $var; # Would be a compile time error, it's not declared yet
my $var = 1;
if $p {
$var += 41; # Inner scope, $var is visible
}
return $var; # Same scope that it was declared in, $var is visible
}
# say $var; # $var is no longer available, the scope ended
Since the variable's visibility is directly associated with its location in the code, lexical scope is really helpful in being able to reason about programs. This is true for:
The programmer (both for their own reasoning about the program, but also because more errors can be detected and reported when things have lexical scope)
The compiler (lexical scoping permits easier and better optimization)
Tools such as IDEs (analyzing and reasoning about things with lexical scope is vastly more tractable)
Early on in the design process of the language that would become Raku, subroutines did not default to having lexical scope (and had our scope like in Perl), however it was realized that lexical scope is a better default. Making subroutine calls always try to resolve a symbol with lexical scope meant it was possible to report undeclared subroutines at compile time. Furthermore, the set of symbols in lexical scope is fixed at compile time, and in the case of declarative constructs like subroutines, the routine is bound to that symbol in a readonly manner. This also allows things like compile-time resolution of multiple dispatch, compile-time argument checking, and so forth. It is likely that future versions of the Raku language will specify an increasing number of compile-time checks on lexically scoped program elements.
So if lexical scoping is so good, why does our (also known as package) scope exist? In short, because:
Sometimes we want to share things more widely than within a given lexical scope. We could just declare everything lexical and then mark things we want to share with is export, but…
Once we get to the point of using a lot of different libraries, having everything try to export things into the single lexical scope of the consumer would likely lead to a lot of conflicts
Packages allow namespacing of symbols. For example, if I want to use the Cro clients for both HTTP and WebSockets in the same code, I can happily use both, and refer to them as Cro::HTTP::Client and Cro::WebSocket::Client respectively.
Packages are introduced by package declarators, such as class, module, grammar, and (with caveats) role. An our declaration will make an installation in the enclosing package construct.
These packages ultimately exist within a top-level package named GLOBAL - which is fitting, since they are effectively globally visible. If we declare an our-scoped variable, it is thus a global variable (albeit hopefully a namespaced one), about which enough has been written that we know we should pause for thought and wonder if a global variable is the best API decision (because, ultimately, everything that ends up visible via GLOBAL is an API decision).
Where things do get a bit blurry, however, is that we can have lexical packages. These are packages that do not get installed in GLOBAL. I find these extremely useful when doing OO programming. For example, I might have:
# This class that ends up in GLOBAL...
class Cro::HTTP::Client {
# Lexically scoped classes, which are marked `my` and thus hidden
# implementation details. This means I can refactor them however I
# want, and never have to worry about downstream fallout!
my class HTTP1Pipeline {
# Implementation...
}
my class HTTP2Pipeline {
# Implementation...
}
# Implementation...
}
Lexical packages can also be nested and contain our-scoped variables; however, they don't end up being globally visible (unless we somehow choose to leak them out).
Different Raku program elements have been ascribed a default scope:
Subroutines default to lexical (my) scope
Methods default to has scope (only visible through a method dispatch)
Type (class, role, grammar, subset) and module declarations default to package (our) scope
Constants and enumerations default to package (our) scope
Effectively, things that are most often there to be shared default to package scope, and the rest do not. (Variables do force us to pick a scope explicitly, however the most common choice is also the shortest one to type.)
Personally, I'm hesitant to make a thing more visible than the language defaults, however I'll often make them less visible (for example, my on constants that are for internal use, and on classes that I'm using to structure implementation details). When I could do something by exposing an our-scoped variable in a globally visible package, I'll still often prefer to make it my-scoped and provide a sub (exported) or method (visible by virtue of being on a package-scoped class) to control access to it, to buy myself some flexibility in the future. I figure it's OK to make wrong choices now if I've given myself space to make them righter in the future without inconveniencing anyone. :-)
In summary:
Use my scope for everything that's an implementation detail
Also use my scope for things that you plan to export, but remember exporting puts symbols into the single lexical scope of the consumer and risks name clashes, so be thoughtful about exporting particularly generic names
Use our for things that are there to be shared, and when it's desired to use namespacing to avoid clashes
The elements we'd most want to share default to our scope anyway, so explicitly writing our should give pause for thought
As with variables, my binds a name lexically, whereas our additionally creates an entry in the surrounding package.
module M {
our class Foo {}
class Bar {} # same as above, really
my class Baz {}
}
say M::Foo; # ok
say M::Bar; # still ok
say M::Baz; # BOOM!
Use my for classes internal to your module. You can of course still make such local symbols available to importing code by marking them is export.
The my vs our distinction is mainly relevant when generating the symbol table. For example:
my $a; # Create symbol <$a> at top level
package Foo { # Create symbol <Foo> at top level
my $b; # Create symbol <$b> in Foo scope
our $c; # Create symbol <$c> in Foo scope
} # and <Foo::<$c>> at top level
In practice this means that anything that is our scoped is readily shared to the outside world by prefixing the package identifier ($Foo::c or Foo::<$c> are synonymous), and anything that is my scoped is not readily available — although you can certainly provide access to it via, e.g., getter subs.
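A loose analogy in Python (only an analogy; the mapping is not exact) is to treat a class body as the named package object: attributes on it are reachable from outside via the package name, like our-scoped symbols, while names created inside a function never escape, like my-scoped ones.

```python
# Loose analogy only: Foo plays the role of a package.
class Foo:
    c = "shared"       # ~ our $c : visible from outside as Foo.c

def helper():
    b = "hidden"       # ~ my $b : exists only while this scope runs
    return b.upper()

print(Foo.c)           # shared -- reachable by qualifying the "package"
print(helper())        # HIDDEN -- the value escapes, the name b does not
```

As with my-scoped Raku symbols, the only way to get at `b` from outside is through an accessor the scope deliberately provides.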
Most of the time you'll want to use my. Most variables just belong to their current scope, and no one has any business peeking in. But our can be useful in some cases:
constants that don't poison the symbol table (this is why, actually, using constant implies an our scope). So you can make a more C-style enum/constants by using package Colors { constant red = 1; constant blue = 2; } and then referencing them as Colors::red
classes or subs that should be accessible but needn't be exported (or shouldn't be, because they would overlap symbols from builtins or other modules). Exporting symbols can be great, but sometimes it's also nice to have the package/module namespace to remind you what goes with what. As such, it's also a nice way to manage options at runtime via subs: CoolModule::set-preferences( ... ) (although dynamic variables can be used to nice effect here as well).
I'm sure others will comment with other times the our scope is useful, but these are the ones from my own experience.
Quoted from the Rust blog:
One last thing to mention: Rust’s macros are significantly different from C macros, if you’ve used those
What is the difference between macros and function in Rust? How is it different from C?
Keep on reading the documentation, specifically the chapter on macros!
Rust functions vs Rust macros
Macros are executed at compile time. They generally expand into new pieces of code that the compiler will then need to further process.
Rust macros vs C macros
The biggest difference to me is that Rust macros are hygienic. The book has an example that explains what hygiene prevents, and also says:
Each macro expansion happens in a distinct ‘syntax context’, and each variable is tagged with the syntax context where it was introduced.
It uses this example:
For example, this C program prints 13 instead of the expected 25.
#include <stdio.h>

#define FIVE_TIMES(x) 5 * x

int main() {
    /* FIVE_TIMES(2 + 3) expands textually to 5 * 2 + 3, which is 13 */
    printf("%d\n", FIVE_TIMES(2 + 3));
    return 0;
}
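The pitfall is inherent to text-level substitution, not to C specifically; a small Python sketch can show the contrast between pasting argument text and treating the argument as one expression (which is what a Rust macro's `$x:expr` capture, or an ordinary function, does):

```python
# Text-level expansion, like the C preprocessor: the argument is pasted
# in as characters, so operator precedence leaks across the boundary.
def five_times_textual(arg_text):
    return eval("5 * " + arg_text)   # becomes "5 * 2 + 3"

# Expression-level expansion, like a Rust macro (or a function): the
# argument is a single unit before the multiplication applies.
def five_times(x):
    return 5 * x

print(five_times_textual("2 + 3"))   # 13
print(five_times(2 + 3))             # 25
```

Rust avoids this by matching the argument as a syntax-tree node rather than a character sequence.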
Beyond that, Rust macros
Can be distributed with the compiled code
Can be overloaded in argument counts
Can match on syntax patterns like braces or parenthesis or commas
Can require a repeated input pattern
Can be recursive
Operate at the syntax level, not the text level
Quoting from the Rust documentation:
The Difference Between Macros and Functions
Fundamentally, macros are a way of writing code that writes other code, which
is known as metaprogramming. In Appendix C, we discuss the derive
attribute, which generates an implementation of various traits for you. We’ve
also used the println! and vec! macros throughout the book. All of these
macros expand to produce more code than the code you’ve written manually.
Metaprogramming is useful for reducing the amount of code you have to write and
maintain, which is also one of the roles of functions. However, macros have
some additional powers that functions don’t.
A function signature must declare the number and type of parameters the
function has. Macros, on the other hand, can take a variable number of
parameters: we can call println!("hello") with one argument or
println!("hello {}", name) with two arguments. Also, macros are expanded
before the compiler interprets the meaning of the code, so a macro can, for
example, implement a trait on a given type. A function can’t, because it gets
called at runtime and a trait needs to be implemented at compile time.
The downside to implementing a macro instead of a function is that macro
definitions are more complex than function definitions because you’re writing
Rust code that writes Rust code. Due to this indirection, macro definitions are
generally more difficult to read, understand, and maintain than function
definitions.
Another important difference between macros and functions is that you must
define macros or bring them into scope before you call them in a file, as
opposed to functions you can define anywhere and call anywhere.
In a macro, you can take a variable number of parameters.
In a function, you have to define the number and type of parameters.
I'm not an expert on scope in J, so please correct me if I make a mistake. (That, in fact, is part of the reason for this question.)
What I want to do is create a name that is visible within (but not without) a locale. Note that assigning with =. does not achieve this.
I think this is impossible, but I'd love confirmation from a J expert.
After seeing Eelvex's answer, I feel I have to clarify my question. Here's what I want: I want a name that is global within a locale but invisible outside a locale, even if you know the name and qualify it with the locale suffix, exactly analogous to a private member of a class in OOP.
Let's imagine a J verb called private that makes a name private within a locale.
cocurrent 'foo'
x =: 3
private 'x' NB. x is still visible to all members of _foo_, but cannot be accessed in any way outside of _foo_
bar =: 3 : 'x & *'
cocurrent 'base'
bar_foo_ 14 NB. This works, because bar_foo_ can see x_foo_
x_foo_ NB. value error. We can't see x_foo_ because it's private to the locale.
Edit (after OP's edit):
No, you can't hide a name. If an entity is visible in a locale, then it is accessible from all locales. AFAIK the only names that are truly private are names defined with =. inside an explicit : definition.
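That kind of privacy is what closures provide in many languages; a Python sketch of the idea (analogy only, not J code):

```python
# State created inside a function body has no outside name at all, so it
# is private the way =. names inside an explicit definition are in J.
def make_counter():
    count = 0                  # unreachable from any other scope
    def bump():
        nonlocal count
        count += 1
        return count
    return bump

bump = make_counter()
print(bump())   # 1
print(bump())   # 2
# There is no expression that reads or writes count except through bump.
```

Nothing analogous exists for a name installed in a locale, which is the asker's problem.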
Previous answer:
All names are visible within (but not without) their locale. E.g.:
a_l1_ =: 15
a_l2_ =: 20
coclass 'l1'
a
15
coclass 'l2'
a
20
coclass 'base'
a
|value error: a
Short answer: Yes, it's impossible in current implementations.
Long answer: You probably should think of locales as being the public part of a class or object (though locales can also be used for other purposes, such as stack frames or closures).
If you want hidden information, you might think about putting it in a different process, or on a different machine, rather than in a locale. You could also try obscuring it (for example, using the foreign function interface, or files), but whether this is valid depends on your reasons for hiding the information.
That said, note that accessing arbitrary information in an arbitrary locale is somewhat like using the debugger api or reflection api in another language. You can do it, but if that's not what you want you should probably avoid doing that.
That said, in my opinion, you should ideally eliminate private state, rather than hide it. (And, if that winds up being too slow, you might also consider implementing the relevant speed-critical part of your code in some other language. J is wonderful for exploring architectural alternatives but the current implementations do not include compilers suitable for optimizing arbitrary, highly serial, algorithms. You could consider (13 :) or (f.) to be compilers - but they are not going to replace something like the gcc build tools and they currently are not capable of emitting code that gcc can handle.)
That said, it's also hypothetically possible that a language extension (analogous to 9!:24) could be added, to prevent explicit access to locales from new sentences.
My problem (in Mathematica) is referring to variables given in a particular array and manipulating them in the following manner (as an example):
Inputs: vars={x,y,z}, system=some ODE like x^2+3*x*y+...etc
(note that I haven't actually created variables x y and z)
Aim:
To assign values to the variables in the list "var" with the intention of inputting these values into the system of ODEs. Then, once I am done, clear the values of the variables in the array vars so that it is in its original form {x,y,z} (and not something like {x,1,3} where y=1 and z=3). I want to do this by referring to the positional elements of vars (I aim not to know that x, y and z are the actual variables).
The reason why: I am trying to write a program that can have any number of variables and ODEs as defined by the user. Since the number of variables and the actual letters used for them are unknown, it is necessary to perform manipulations with the array itself.
Attempt:
A fixed number of variables is easy. For the arbitrary case, I have tried modules and blocks, but with no success. Consider the following code:
Clear[x,y,z,vars,svars]
vars={x,y,z}
svars=Map[ToString,vars]
Module[{vars=vars,svars=svars},
Symbol[svars[[1]]]//Evaluate=1
]
then vars={1,y,z} and not {x,y,z} after running this. I have done functional programming with lists, atoms, etc., so it makes sense to me that vars is changed afterwards, because I have changed x and not vars. However, I cannot get the "x" in the list of variables to remain local. Of course I could put in "x" itself, but that is particular to this specific case. I would prefer to put something like:
Clear[x,y,z,vars,svars]
vars={x,y,z}
svars=Map[ToString,vars]
Module[{vars=vars,svars=svars, vars[[1]]},
Symbol[svars[[1]]]//Evaluate=1
]
which of course doesn't work because vars[[1]] is not a symbol or an assignment to a symbol.
Other possibilities:
I found a function
assignToName[name_String, value_] :=
ToExpression[name, InputForm, Function[var, var = value, HoldAll]]
which looked promising. Basically name_String is the name of the variable and value is its new value. I attempted to do:
vars={x,y,z}
svars=Map[ToString,vars]
vars[[1]]//Evaluate=1
assignToName[svars[[1]],svars[[1]]]
but then something like D[x^2, vars[[1]]] doesn't work (x is not a valid variable).
If I am missing something, or if perhaps I am going down the wrong path, I'm open to trying other things.
Thanks.
I can't say that I followed your train(s) of thought very well, so these are fragments which might help you to answer your own questions rather than a coherent and fully-formed answer. But to answer your final 'question', I think you may be going down some wrong path(s).
In passing, note that evaluating the expression
vars = {x,y,z}
does in fact define those three variables though it doesn't define any rewrite rules (such as values) for them.
Given a polynomial poly you can extract the variables in it with the function Variables[poly] so something like
Variables[x^2+3*x*y]
should return
{x,y}
Note that I write 'should' rather than does because I don't have Mathematica on this machine so my syntax may be a bit wonky. Note also that your example ODE is nothing of the sort but it strikes me that you can probably write a wrapper to manipulate an ODE into a form from which Variables can extract the variables. Mathematica offers a lot of other functions for picking expressions apart and re-assembling them, follow the trails from Variables. It often allows the use of functions defined on Lists on expressions with other heads too so it's always worth experimenting a bit.
There are a couple of widely applicable ways to avoid setting values of variables in Mathematica. For instance, you could write
x^2+3*x*y/.{x->2,y->3}
which will evaluate to
22
but not set values for x and y. This is a very simple example of using (sets of) replacement rules for temporary assignment of values to variables.
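The same idea can be expressed in plain Python terms (a conceptual sketch, not Mathematica): never assign to x or y themselves; keep the expression as a function of its free variables and "apply a replacement rule" by calling it with values.

```python
# The expression stays abstract; substitution happens at the call site,
# and no global name x or y ever acquires a value.
def poly(x, y):
    return x**2 + 3 * x * y

print(poly(2, 3))   # 22, matching x^2+3*x*y /. {x->2, y->3}
```

Because the "variables" are just parameters, there is nothing to clear afterwards, which is exactly the property the asker wants.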
The other way to avoid setting values for variables is to define functions using Modules or Blocks both of which define their own contexts. The documentation will tell you all about these two and the differences between them.
I can't help thinking that all your clever tricks using Symbol, ToExpression and ToString are a bit beside the point. Spend some time familiarising yourself with Mathematica's in-built functionality before going further down that route, you may well find you don't need to.
Finally, writing, in any language, expressions such as
vars=vars,svars=svars
will lead to madness. It may be syntactically correct, you may even be able to decrypt the semantics when you first write code like that, but in a week's time you will curse your younger self for writing it.
I'm wondering if anyone is aware of a language that has support for variables (that could be considered 'global') and subroutines (functions), but without a concept of parameter passing, local scope, etc. Something where every subroutine has access to every global variable, and only global variables.
BASIC and assembly come immediately to mind.
Of course, this is not considered a feature. That's why we invent conventions for which global variables should be used for parameter passing.
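One such convention, transliterated into Python for illustration (the names ARG1/ARG2/RESULT are invented here, in the style of old BASIC programs): every routine reads its inputs from, and writes its result to, agreed-upon global names.

```python
# Globals-as-parameters calling convention: no arguments, no return
# values, just well-known global slots.
ARG1 = 0
ARG2 = 0
RESULT = 0

def add():
    global RESULT
    RESULT = ARG1 + ARG2

ARG1, ARG2 = 2, 3
add()
print(RESULT)   # 5
```

The fragility is obvious: any two routines that reuse the same slots, or any call made before the slots are loaded, silently corrupts the "parameters".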