So I have this problem where I have to figure out the output using two different scoping rules. I know the output using lexical scoping is a=3 and b=1, but I am having a hard time figuring out the output using dynamic scoping.
Note: the code example that follows uses C syntax, but let's just treat it as pseudo-code.
int a, b;
int p() {
    int a, p;
    a = 0; b = 1; p = 2;
    return p;
}
void print() {
    printf("%d\n%d\n", a, b);
}
void q() {
    int b;
    a = 3; b = 4;
    print();
}
main() {
    a = p();
    q();
}
Here is what I came up with.
Using dynamic scoping, the nonlocal references to a and b can change. So I have a=2 (returned from p()), then b=4 (inside q()).
So the output is 2 4?
As we know, C doesn't have dynamic scoping, but assuming it did, the program would print 3 4.
In main, a and b are the global ones. a will be set to 2, as we will see that this is what p will return.
In p, called from main, b is still the global one, but a is the one local in p. The local a is set to 0, but will soon disappear. The global b is set to 1. The local p is set to 2, and 2 will be returned. Now the global b is 1.
In q, called from main, a is the global one, but b is the one local in q. Here the global a is set to 3, and the local b is set to 4.
In print, called from q, a is the global one (which has the value 3), and b is the one local in q (which has the value 4).
It is in this last step, inside the function print, that we see a difference from static scoping. With static scoping a and b would be the global ones. With dynamic scoping, we have to look at the chain of calling functions, and in q we find a variable b, which will be the b used inside print.
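To check the walkthrough above, here is a small simulation sketch in Python. It is not real C and not how any particular language implements scoping; the helper names (run, frame_for, call, set_) are made up for this illustration. Each call pushes a dictionary of locals onto an explicit stack, and the only thing that changes between the two rules is which frames a name lookup is allowed to search.

```python
def run(dynamic):
    """Simulate the a/b program; return the pair that print() would show."""
    globals_ = {"a": 0, "b": 0}
    stack = [globals_]  # call stack of variable frames, globals at the bottom
    output = []

    def frame_for(name):
        # Dynamic scoping: search the frames of all callers, newest first.
        # Lexical scoping: every function here is defined at top level, so
        # only the function's own frame and the globals are visible.
        candidates = reversed(stack) if dynamic else (stack[-1], globals_)
        for frame in candidates:
            if name in frame:
                return frame
        raise NameError(name)

    def get(name):
        return frame_for(name)[name]

    def set_(name, value):
        frame_for(name)[name] = value

    def call(local_names, body):
        # Push a frame holding the callee's locals, run it, then pop the frame.
        stack.append(dict.fromkeys(local_names, 0))
        try:
            return body()
        finally:
            stack.pop()

    def p():
        set_("a", 0)
        set_("b", 1)
        set_("p", 2)
        return get("p")

    def print_():
        output.append((get("a"), get("b")))

    def q():
        set_("a", 3)
        set_("b", 4)
        call([], print_)

    # Body of main: a = p(); q();
    set_("a", call(["a", "p"], p))
    call(["b"], q)
    return output[0]


lexical_result = run(dynamic=False)  # (3, 1)
dynamic_result = run(dynamic=True)   # (3, 4)
```

Under the lexical rule print only sees the globals, so b is 1; under the dynamic rule it finds the b that q pushed, so b is 4.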
C is not a dynamically scoped language. If you want to experiment in order to understand the difference, you're better off with a language like Perl, which lets you choose between the two.
I am working on a compiler in TypeScript and thinking a lot about lexical scope. In particular I'm wondering about how you handle the situation where you have hoisted functions, where variables can be undefined at one point, and then defined at another point. For example:
function a() {
  let d = 10
  b()
  return b
  let g = 40
  function b() {
    c()
    let f = 30
    return c
    function c() {
      console.log(d, e, f, g)
    }
  }
  let e = 20
}
const x = a()() // returns b, returns c
x() // log the variables
Here, what is the "lexical scope" inside the function c?
Is it all the variables it could possibly/potentially have access to (at some point) during possible code evaluation? (d, e, f, g)
Is it what it only will eventually be defined within its runtime context? (d, f)
Is it what won't be undefined and throw an error? (d)
Is it every combination of possible parents all at once (like a cartesian product sort of thing, times however many parents in the tree)?
etc.
At first, for a while (for many years), I thought of lexical scope as a simple tree. You had higher scopes and nested scopes, and each nested scope introduced new variables. Nice and clean, easy-peasy. But this, this throws it for a loop. Now it's like, focusing on c, there are steps of evaluation inside b, and at each step, a possible different parent scope may exist. Then a might have a suite of parent scopes too. So it is like a natural flower/tree that blossoms upward (which is hard to imagine in programming). There are more and more possible parents. What I'm thinking is, the lexical scope is the combination of every combination of parents on upward. But then how do you use this scope in a compiler?
Now I am confused. What am I supposed to use the lexical scope for in a compiler if I can't tell, for a nested function, what its one scope definition is? I assumed each nested function has one scope, but no, its parents are vast and combinatorial. You can only really tell what the scope is when it is bound ("binding", as opposed to "scope"). That is, at the specific step in code evaluation, what the value of the scope is.
So how do you use this information in a compiler? I can tell if a variable is in scope only seemingly if I evaluate/simulate the code running itself. Basically, how do I use this lexical scope in a compiler now with this confusion?
I was going to use it as the source of truth for what variables were defined in a nested lexical scope block. But now I can't, because I need to know at each step what the values of the variables are before I can know what the scope is. Or am I missing something?
Thinking of trying something like this:
type SiteContainerScopeType = {
  like: Site.ContainerScopeType
  parent?: SiteContainerScopeType
  children: Array<SiteContainerStepScopeType>
  declarations: Record<string, SitePropertyDeclarationType>
}
type SiteContainerStepScopeType = {
  like: Site.ContainerStepScope
  previous: SiteContainerStepScopeType
  context: SiteContainerScopeType
  declarations: Record<string, SitePropertyDeclarationType>
}
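If I understand correctly the semantics described in Understanding Hoisting in JavaScript, the code can be reorganized as follows for the purpose of scope analysis: declarations first, then variable initializations, and finally the rest of the expressions. Note that JavaScript itself hoists only the declarations; at run time the initializations still happen in source order, so this reordering shows which names are in scope, not the order in which they receive values.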
function a() {
  // DECLARATIONS
  let d
  let g
  let e
  function b() {
    let f
    function c() {
      console.log(d, e, f, g)
    }
    f = 30
    c()
    return c
  }
  // INITIALIZATION
  e = 20
  d = 10
  g = 40
  // EXPRESSIONS
  b()
  return b
}
When doing this, we find a more usual structure of having variables declared before being initialized and used.
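For the compiler side of the question, a minimal sketch of how a scope tree built from declarations alone can resolve names without running any code. It is in Python rather than TypeScript for brevity, and the Scope class and its declare and resolve methods are hypothetical names for this illustration, not an existing API. The idea it assumes: lexical scope answers "which declaration does this name refer to?", while "has it been initialized yet?" is a separate run-time or flow-analysis question.

```python
class Scope:
    """One node of the lexical scope tree (one per function body here)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.declared = set()

    def declare(self, *names):
        # Declarations are collected from the whole body up front, regardless
        # of where in the body they appear; that is what hoisting amounts to.
        self.declared.update(names)
        return self

    def resolve(self, identifier):
        # Walk outward through the enclosing scopes until a declaration is found.
        scope = self
        while scope is not None:
            if identifier in scope.declared:
                return scope.name
            scope = scope.parent
        return None  # undeclared: a compile-time error in most languages


# Scope tree for the example: a contains b, and b contains c.
scope_a = Scope("a").declare("d", "g", "e", "b")
scope_b = Scope("b", scope_a).declare("f", "c")
scope_c = Scope("c", scope_b)

resolved = {name: scope_c.resolve(name) for name in ("d", "e", "f", "g")}
# {'d': 'a', 'e': 'a', 'f': 'b', 'g': 'a'}
```

Each nested function still has exactly one parent scope, so the tree stays a tree. Whether e or g holds a value when c actually runs is not something this structure tries to answer.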
Could you please explain differences between and definition of call by value, call by reference, call by name and call by need?
Call by value
Call-by-value evaluation is the most common evaluation strategy, used in languages as different as C and Scheme. In call-by-value, the argument expression is evaluated, and the resulting value is bound to the corresponding variable in the function (frequently by copying the value into a new memory region). If the function or procedure is able to assign values to its parameters, only its local copy is assigned — that is, anything passed into a function call is unchanged in the caller's scope when the function returns.
Call by reference
In call-by-reference evaluation (also referred to as pass-by-reference), a function receives an implicit reference to a variable used as argument, rather than a copy of its value. This typically means that the function can modify (i.e. assign to) the variable used as argument—something that will be seen by its caller. Call-by-reference can therefore be used to provide an additional channel of communication between the called function and the calling function. A call-by-reference language makes it more difficult for a programmer to track the effects of a function call, and may introduce subtle bugs.
differences
call by value example
If data is passed by value, the data is copied from the variable used in the caller (for example main()) to a variable used by the function. So if the copy stored in the function's variable is modified inside the function, only that copy changes; the caller's variable keeps its value. Let’s take a look at a call by value example:
#include <stdio.h>
void call_by_value(int x) {
    printf("Inside call_by_value x = %d before adding 10.\n", x);
    x += 10;
    printf("Inside call_by_value x = %d after adding 10.\n", x);
}
int main() {
    int a = 10;
    printf("a = %d before function call_by_value.\n", a);
    call_by_value(a);
    printf("a = %d after function call_by_value.\n", a);
    return 0;
}
The output of this call by value code example will look like this:
a = 10 before function call_by_value.
Inside call_by_value x = 10 before adding 10.
Inside call_by_value x = 20 after adding 10.
a = 10 after function call_by_value.
call by reference example
If data is passed by reference, a pointer to the data is copied instead of the data itself, as is done in a call by value. Because a pointer is copied, if the value at that pointer's address is changed in the function, the value is also changed in main(). (Strictly speaking, C passes the pointer itself by value; this is how call by reference is simulated in C.) Let’s take a look at a code example:
#include <stdio.h>
void call_by_reference(int *y) {
    printf("Inside call_by_reference y = %d before adding 10.\n", *y);
    (*y) += 10;
    printf("Inside call_by_reference y = %d after adding 10.\n", *y);
}
int main() {
    int b = 10;
    printf("b = %d before function call_by_reference.\n", b);
    call_by_reference(&b);
    printf("b = %d after function call_by_reference.\n", b);
    return 0;
}
The output of this call by reference source code example will look like this:
b = 10 before function call_by_reference.
Inside call_by_reference y = 10 before adding 10.
Inside call_by_reference y = 20 after adding 10.
b = 20 after function call_by_reference.
when to use which
One advantage of the call by reference method is that it uses pointers, so the data is not duplicated in memory (as it is with the copy made by the call by value method). For large data this lowers the memory footprint. So why don’t we just make all the parameters call by reference?
There are two reasons why this is not a good idea and why you (the programmer) need to choose between call by value and call by reference: side effects and privacy. Unwanted side effects are usually caused by inadvertent changes made to a call by reference parameter. Also, in most cases you want the data to be private, so that someone calling a function can only change it if you want them to. So it is better to use call by value by default and only use call by reference if data changes are expected.
call by name
In call-by-name evaluation, the arguments to a function are not evaluated before the function is called — rather, they are substituted directly into the function body (using capture-avoiding substitution) and then left to be evaluated whenever they appear in the function.
call by need
Lazy evaluation, or call-by-need, is an evaluation strategy which delays the evaluation of an expression until its value is needed (non-strict evaluation) and which also avoids repeated evaluations.
My friend and I are having some trouble understanding static and dynamic scoping. I believe with dynamic, the variable (global) will keep being updated by other functions until printed, whereas with static I think that whatever value gets assigned to a variable first stays that way.
Is this thinking correct or no?
For an example using my thoughts above I have calculated the following from this code snippet.
int a, b, c;
void p() {
    int a = 3;
    b = 1;
    c = a + b;
    q();
}
void print() { printf("%d %d %d\n", a, b, c); }
void q() {
    int b = 4;
    a = 5;
    c = a + b;
    print();
}
main() {
    int c = 5;
    p();
}
Output with static scoping: 315
Output with dynamic scoping: 549
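With static scoping, print refers to the global a, b, and c, because those are the only variables visible where print is defined (the top level of the file). By the time print runs, q has set the global a to 5, p has set the global b to 1, and q has set the global c to 5 + 4 = 9, so the output is 5 1 9, not 3 1 5.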
With dynamic scoping, the output would be 549, since each of a, b, and c has a value assigned in q. Not demonstrated by your code is also the fact that after q returns from its call inside p, the local variable a has the value 5 set in q, not the global variable. Namely, the following occurs:
Global variables a, b, and c are declared, but do not have values. Let's assume your language initializes such values to 0.
main is called. A variable c local to main is given the value 5; global c still equals 0.
p is called. A p-local variable a is assigned the value 3; global a is still 0.
No local variable b exists in p or its caller, main, so the global b is set to 1.
No local variable c exists in p, but one does in its caller main, so its value is set to 3 + 1 = 4.
q is called. A local b is declared and set to 4, leaving global b set to 1.
No local variable a exists in q, but one does in its caller p, so that value changes from 3 to 5.
No local variable c exists in q or its caller p, but does in p's caller main, so that value is set to 5 + 4 = 9. Global c is still 0.
print is called, and lacking any local a, b, or c, it looks back in its call chain. It uses a from p, b from q, and c from main (none of the globals are used).
q returns. In p, the values of a and c are still 5 and 9 as set in q. b is still 1, since q declared a local b.
p returns. In main, we still have a=0 (since p declared its own copy before calling q), b=1 (since p modified the global b), and c=9 (since q ultimately modified the variable local to main).
main returns. We still have global a=0, b=1, and c=0.
If that's confusing (and I didn't confuse myself and make any mistakes), you might understand why most languages use static scoping: it's not only much easier, but possible, to reason about the behavior of the program without having to run or simulate it just to track variable assignments.
Static scoping: 5 1 9 (print uses the global variables, since none are defined within the print function)
Dynamic scoping: 5 4 9
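Both results can be checked with a small simulation. This is only a sketch in Python, not C, and the Frame class and the child helper are invented for the illustration. It assumes globals start at 0 and models each call as a frame with a parent link; the single difference between the two rules is whether a new frame links to its caller (dynamic) or to the frame where the function was defined (static, which is always the global frame in this program).

```python
class Frame:
    """A set of variables plus a link to the frame searched next."""

    def __init__(self, parent=None, **local_vars):
        self.vars = dict(local_vars)
        self.parent = parent

    def _owner(self, name):
        frame = self
        while frame is not None:
            if name in frame.vars:
                return frame
            frame = frame.parent
        raise NameError(name)

    def get(self, name):
        return self._owner(name).vars[name]

    def set(self, name, value):
        self._owner(name).vars[name] = value


def run(dynamic):
    """Return (what print shows, final globals) for the a/b/c program."""
    globals_ = Frame(a=0, b=0, c=0)  # assumption: globals start at 0
    printed = []

    def child(caller, **local_vars):
        # Dynamic: parent is the caller. Static: parent is the defining scope.
        return Frame(caller if dynamic else globals_, **local_vars)

    def print_(caller):
        f = child(caller)
        printed.append((f.get("a"), f.get("b"), f.get("c")))

    def q(caller):
        f = child(caller, b=4)
        f.set("a", 5)
        f.set("c", f.get("a") + f.get("b"))
        print_(f)

    def p(caller):
        f = child(caller, a=3)
        f.set("b", 1)
        f.set("c", f.get("a") + f.get("b"))
        q(f)

    main = child(globals_, c=5)
    p(main)
    final_globals = (globals_.get("a"), globals_.get("b"), globals_.get("c"))
    return printed[0], final_globals


static_result = run(dynamic=False)   # ((5, 1, 9), (5, 1, 9))
dynamic_result = run(dynamic=True)   # ((5, 4, 9), (0, 1, 0))
```

The second element of each result is the state of the globals at the end: under dynamic scoping only the global b was ever touched.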
I want to know what is call-by-need.
Though I searched in wikipedia and found it here: http://en.wikipedia.org/wiki/Evaluation_strategy,
but could not understand properly.
If anyone can explain with an example and point out the difference with call-by-value, it would be a great help.
Suppose we have the function
square(x) = x * x
and we want to evaluate square(1+2).
In call-by-value, we do
square(1+2)
square(3)
3*3
9
In call-by-name, we do
square(1+2)
(1+2)*(1+2)
3*(1+2)
3*3
9
Notice that since we use the argument twice, we evaluate it twice. That would be wasteful if the argument evaluation took a long time. That's the issue that call-by-need fixes.
In call-by-need, we do something like the following:
square(1+2)
let x = 1+2 in x*x
let x = 3 in x*x
3*3
9
In step 2, instead of copying the argument (like in call-by-name), we give it a name. Then in step 3, when we notice that we need the value of x, we evaluate the expression for x. Only then do we substitute.
BTW, if the argument expression produced something more complicated, like a closure, there might be more shuffling of lets around to eliminate the possibility of copying. The formal rules are somewhat complicated to write down.
Notice that we "need" values for the arguments to primitive operations like + and *, but for other functions we take the "name, wait, and see" approach. We would say that the primitive arithmetic operations are "strict". It depends on the language, but usually most primitive operations are strict.
Notice also that "evaluation" still means to reduce to a value. A function call always returns a value, not an expression. (One of the other answers got this wrong.) OTOH, lazy languages usually have lazy data constructors, which can have components that are evaluated on-need, i.e., when extracted. That's how you can have an "infinite" list---the value you return is a lazy data structure. But call-by-need vs call-by-value is a separate issue from lazy vs strict data structures. Scheme has lazy data constructors (streams), although since Scheme is call-by-value, the constructors are syntactic forms, not ordinary functions. And Haskell is call-by-need, but it has ways of defining strict data types.
If it helps to think about implementations, then one implementation of call-by-name is to wrap every argument in a thunk; when the argument is needed, you call the thunk and use the value. One implementation of call-by-need is similar, but the thunk is memoizing; it only runs the computation once, then it saves it and just returns the saved answer after that.
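Here is a rough sketch of those two thunk implementations in Python. Thunk, MemoThunk and count_evaluations are names invented for this illustration, and Python itself is call-by-value, so the laziness is simulated by passing zero-argument functions. The counter shows how often the argument expression 1+2 is actually evaluated when square uses its parameter twice.

```python
class Thunk:
    """Call-by-name: re-run the argument expression on every use."""

    def __init__(self, expr):
        self.expr = expr

    def force(self):
        return self.expr()


class MemoThunk(Thunk):
    """Call-by-need: run the argument expression at most once, then reuse it."""

    _unset = object()  # sentinel meaning "not evaluated yet"

    def __init__(self, expr):
        super().__init__(expr)
        self.value = MemoThunk._unset

    def force(self):
        if self.value is MemoThunk._unset:
            self.value = self.expr()
        return self.value


def square(x):
    # The parameter is used twice, so the thunk is forced twice.
    return x.force() * x.force()


def count_evaluations(thunk_class):
    evaluations = []

    def argument():
        evaluations.append(1)  # record each evaluation of the expression 1+2
        return 1 + 2

    result = square(thunk_class(argument))
    return result, len(evaluations)


by_name = count_evaluations(Thunk)      # (9, 2): evaluated twice
by_need = count_evaluations(MemoThunk)  # (9, 1): evaluated once, then cached
```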
Imagine a function:
fun add(a, b) {
return a + b
}
And then we call it:
add(3 * 2, 4 / 2)
In a call-by-value language this will be evaluated so:
a = 3 * 2 = 6
b = 4 / 2 = 2
return a + b = 6 + 2 = 8
The function will return the value 8.
In a call-by-need (also called lazy) language this is evaluated like so:
a = 3 * 2
b = 4 / 2
return a + b = 3 * 2 + 4 / 2
The function will return the expression 3 * 2 + 4 / 2. So far almost no computational resources have been spent. The whole expression will be computed only if its value is needed - say we wanted to print the result.
Why is this useful? Two reasons. First, if you accidentally include dead code, it doesn't weigh your program down, so the program can be a lot more efficient. Second, it allows you to do very cool things like efficiently calculating with infinite lists:
fun takeFirstThree(list) {
    return [list[0], list[1], list[2]]
}
takeFirstThree([0 ... infinity])
A call-by-value language would hang there trying to create a list from 0 to infinity. A lazy language will simply return [0,1,2].
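The infinite-list idea can be tried out in an eager language too, as long as the list itself is produced lazily. A small sketch in Python, which is call-by-value, using the standard itertools module; take_first_three is just this example's name. Strictly speaking this demonstrates a lazy data structure rather than call-by-need argument passing, but the effect on the example is the same: only the elements that are asked for get computed.

```python
from itertools import count, islice


def take_first_three(items):
    # islice pulls only three elements out of the (possibly infinite) iterator.
    return list(islice(items, 3))


# count(0) yields 0, 1, 2, ... forever, one element at a time and on demand.
first_three = take_first_three(count(0))
```

Building the whole sequence first, for example with list(count(0)), would never finish.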
A simple, yet illustrative example:
function choose(cond, arg1, arg2) {
if (cond)
do_something(arg1);
else
do_something(arg2);
}
choose(true, 7*0, 7/0);
Now let's say we're using the eager evaluation strategy; then it would calculate both 7*0 and 7/0 eagerly. If it is a lazy evaluation strategy (call-by-need), then it would just send the expressions 7*0 and 7/0 through to the function without evaluating them.
The difference? You would expect do_something(0) to be executed because the first argument gets used, but what actually happens depends on the evaluation strategy:
If the language evaluates eagerly, then it will, as stated, evaluate 7*0 and 7/0 first, and what's 7/0? Divide-by-zero error.
But if the evaluation strategy is lazy, it will see that it doesn't need to calculate the division, it will call do_something(0) as we were expecting, with no errors.
In this example, the lazy evaluation strategy can save the execution from producing errors. In a similar manner, it can save the execution from performing unnecessary evaluation that it won't use (the same way it didn't use 7/0 here).
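The same experiment can be run in Python, with the caveat that Python is eager, so the lazy variant below is simulated by passing zero-argument lambdas (thunks) that the function calls only when it needs the value. choose_eager, choose_lazy and the trivial do_something are stand-ins for this sketch.

```python
def do_something(value):
    return value


def choose_eager(cond, arg1, arg2):
    # Both arguments were already evaluated by the caller before we got here.
    if cond:
        return do_something(arg1)
    return do_something(arg2)


def choose_lazy(cond, arg1, arg2):
    # arg1 and arg2 are thunks; only the branch that is taken gets evaluated.
    if cond:
        return do_something(arg1())
    return do_something(arg2())


zero = 0

lazy_result = choose_lazy(True, lambda: 7 * zero, lambda: 7 / zero)

try:
    choose_eager(True, 7 * zero, 7 / zero)
    eager_result = "no error"
except ZeroDivisionError:
    # 7 / zero is evaluated while building the argument list, before the call.
    eager_result = "ZeroDivisionError"
```

Here lazy_result is 0, while the eager call never reaches the function body at all.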
Here's a concrete example for a bunch of different evaluation strategies written in C. I'll specifically go over the difference between call-by-name, call-by-value, and call-by-need, which is kind of a combination of the previous two, as suggested by Ryan's answer.
#include <stdio.h>
int x = 1;
int y[3] = {1, 2, 3};
int i = 0;
int k = 0;
int j = 0;
int foo(int a, int b, int c) {
    i = i + 1;
    // 2 for call-by-name
    // 1 for call-by-value, call-by-value-result, and call-by-reference
    // unsure what call-by-need will do here; will likely be 2, but could have evaluated earlier than needed
    printf("a is %i\n", a);
    b = 2;
    // 1 for call-by-value and call-by-value-result
    // 2 for call-by-reference, call-by-need, and call-by-name
    printf("x is %i\n", x);
    // this triggers multiple increments of k for call-by-name
    j = c + c;
    // we don't actually care what j is, we just don't want it to be optimized out by the compiler
    printf("j is %i\n", j);
    // 2 for call-by-name
    // 1 for call-by-need, call-by-value, call-by-value-result, and call-by-reference
    printf("k is %i\n", k);
    // foo is declared to return int, so return something (the value itself is not important)
    return j;
}
int main() {
    int ans = foo(y[i], x, k++);
    // 2 for call-by-value-result, call-by-name, call-by-reference, and call-by-need
    // 1 for call-by-value
    printf("x is %i\n", x);
    return 0;
}
The part we're most interested in is the fact that foo is called with k++ as the actual parameter for the formal parameter c.
Note that how the ++ postfix operator works is that k++ returns k at first, and then increments k by 1. That is, the result of k++ is just k. (But, then after that result is returned, k will be incremented by 1.)
We can ignore all of the code inside foo up until the line j = c + c (the second section).
Here's what happens for this line under call-by-value:
When the function is first called, before it encounters the line j = c + c, because we're doing call-by-value, c will have the value of evaluating k++. Since evaluating k++ returns k, and k is 0 (from the top of the program), c will be 0. However, we did evaluate k++ once, which will set k to 1.
The line becomes j = 0 + 0, which behaves exactly like how you'd expect, by setting j to 0 and leaving c at 0.
Then, when we run printf("k is %i\n", k); we get that k is 1, because we evaluated k++ once.
Here's what happens for the line under call-by-name:
Since the line contains c and we're using call-by-name, we replace the text c with the text of the actual argument, k++. Thus, the line becomes j = (k++) + (k++).
We then run j = (k++) + (k++). One of the (k++)s will be evaluated first, returning 0 and setting k to 1. Then, the second (k++) will be evaluated, returning 1 (because k was set to 1 by the first evaluation of k++), and setting k to 2. Thus, we end up with j = 0 + 1 and k set to 2.
Then, when we run printf("k is %i\n", k);, we get that k is 2 because we evaluated k++ twice.
Finally, here's what happens for the line under call-by-need:
When we encounter j = c + c; we recognize that this is the first time the parameter c is evaluated. Thus we need to evaluate its actual argument (once) and store that value to be the evaluation of c. Thus, we evaluate the actual argument k++, which will return k, which is 0, and therefore the evaluation of c will be 0. Then, since we evaluated k++, k will be set to 1. We then use this stored evaluation as the evaluation for the second c. That is, unlike call-by-name, we do not re-evaluate k++. Instead, we reuse the previously evaluated initial value for c, which is 0. Thus, we get j = 0 + 0; just as if c was pass-by-value. And, since we only evaluated k++ once, k is 1.
As explained in the previous step, j = c + c is j = 0 + 0 under call-by-need, and it runs exactly as you'd expect.
When we run printf("k is %i\n", k);, we get that k is 1 because we only evaluated k++ once.
Hopefully this helps to differentiate how call-by-value, call-by-name, and call-by-need work. If it would be helpful to differentiate call-by-value and call-by-need more clearly, let me know in a comment and I'll explain the code earlier on in foo and why it works the way it does.
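Since a real C compiler only does call-by-value, the three outcomes for the line j = c + c can't be observed by compiling the program above. Here is a small simulation sketch in Python instead; run and k_post_increment are made-up names, and the actual argument k++ is modeled as a function that returns the old k and then increments it. Python evaluates the two operands of + from left to right, which matches the order assumed in the walkthrough.

```python
def run(strategy):
    """Simulate `j = c + c` where the actual argument for c is `k++`."""
    state = {"k": 0}

    def k_post_increment():
        # Models k++: yield the old value of k, then add 1 to k.
        old = state["k"]
        state["k"] += 1
        return old

    if strategy == "value":
        value = k_post_increment()  # evaluated once, at the call site
        c = lambda: value
    elif strategy == "name":
        c = k_post_increment        # re-evaluated at every use of c
    elif strategy == "need":
        cache = []

        def c():
            if not cache:           # first use: evaluate and remember
                cache.append(k_post_increment())
            return cache[0]
    else:
        raise ValueError(strategy)

    j = c() + c()
    return j, state["k"]


results = {s: run(s) for s in ("value", "name", "need")}
# {'value': (0, 1), 'name': (1, 2), 'need': (0, 1)}
```

Each pair is (j, k) after the line has run. Call-by-value and call-by-need agree here because c is used at least once; they would differ if foo never used c, since call-by-need would then leave k at 0.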
I think this line from Wikipedia sums things up nicely:
Call by need is a memoized variant of call by name, where, if the function argument is evaluated, that value is stored for subsequent use. If the argument is pure (i.e., free of side effects), this produces the same results as call by name, saving the cost of recomputing the argument.
I was wondering. Are there languages that use only pass-by-reference as their eval strategy?
I don't know what an "eval strategy" is, but Perl subroutine calls are pass-by-reference only.
sub change {
$_[0] = 10;
}
$x = 5;
change($x);
print $x; # prints "10"
change(0); # raises "Modification of a read-only value attempted" error
VB (pre-.NET), VBA and VBScript default to ByRef, although this can be overridden when calling or defining the sub or function.
FORTRAN does; well, since it predates concepts such as pass-by-reference, one should probably say that it uses pass-by-address. A FORTRAN function like:
      INTEGER FUNCTION MULTIPLY_TWO_INTS(A, B)
      INTEGER A, B
      MULTIPLY_TWO_INTS = A * B
      RETURN
      END
will have a C-style prototype of:
extern int MULTIPLY_TWO_INTS(int *A, int *B);
and you could call it via something like:
int result, a = 1, b = 100;
result = MULTIPLY_TWO_INTS(&a, &b);
Other examples are languages that do not have function arguments as such but use stacks. One is Forth and its derivatives, where a function can change the variable space (the stack) in whichever way it wants, modifying existing elements as well as adding or removing elements. "Prototype comments" in Forth usually look something like
(argument list -- return value list)
and that means the function takes/processes a certain, not necessarily constant, number of arguments and returns, again, not necessarily a constant, number of elements. I.e. you can have a function that takes a number N as argument and returns N elements - preallocating an array, if you so like.
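To make the stack idea concrete, here is a toy sketch in Python (not Forth). The shared stack is a plain list and each "word" is a function that mutates it in place. The words zeros and swap are hypothetical examples chosen for this illustration, with their stack-effect comments written in the Forth style described above.

```python
def zeros(stack):
    """( n -- 0 ... 0 ): pop a count n, then push n zeros."""
    n = stack.pop()
    stack.extend([0] * n)


def swap(stack):
    """( x y -- y x ): exchange the top two elements in place."""
    stack[-1], stack[-2] = stack[-2], stack[-1]


stack = [7, 3]
zeros(stack)   # consumes the 3, leaves 7 0 0 0
stack.append(9)
swap(stack)    # 7 0 0 0 9 becomes 7 0 0 9 0
```

Every word receives the same stack object and changes it directly, so there is no separate copy of the "arguments" that the caller keeps.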
How about Brainfuck?