In F# what does top-level mean? - scope

When people talk about F# they sometimes mention the term top-level;
what does top-level mean?
For example, in these previous SO Q&A:
Error FS0037 sometimes, very confusing
Defining Modules VS.NET vs F# Interactive
What is the difference between a namespace and a module in F#?
AutoOpen attribute in F#
F# and MEF: Exporting Functions
How to execute this F# function
The term also appears regularly in comments, but I have not referenced those questions here.
The Wikipedia article on scope touches on this, but has no specifics for F#.
The F# 3.x spec only states:
11.2.1.1 Arity Conformance for Functions and Values
The parentheses indicate a top-level function, which might be a
first-class computed expression that computes to a function value,
rather than a compile-time function value.
13.1 Custom Attributes
For example, the STAThread attribute should be placed immediately
before a top-level “do” statement.
14.1.8 Name Resolution for Type Variables
It is initially empty for any member or any other top-level construct
that contains expressions and types.
I suspect the term has different meanings in different contexts:
Scope, F# interactive, shadowing.
If you could also explain the term's origins in F#'s predecessor languages (ML, Caml, OCaml), it would be appreciated.
Lastly I don't plan to mark an answer as accepted for a few days to avoid hasty answers.

I think the term top-level has different meanings in different contexts.
Generally speaking, I'd use it whenever you have some structure that allows nesting to refer to the one position at the top that is not nested inside anything else.
For example, if you said "top-level parentheses" in an expression, it would refer to the outer-most pair of parentheses:
((1 + 2) * (3 * (8)))
^                   ^
When talking about functions and value bindings (and scope) in F#, it refers to the function that is not nested inside another function. So functions inside modules are top-level:
module Foo =
  let topLevel n =
    let nested a = a * 10
    10 + nested n
Here, nested is nested inside topLevel.
In F#, functions and values defined using let can appear inside modules or inside classes, which complicates things a bit - I'd say only those inside modules are top-level, but that's probably just because they are public by default.
The do keyword works similarly: you can nest it (although almost nobody does that), and so the top-level do that allows the STAThread attribute is the one that is not nested inside another do or let:
module Foo =
  [<STAThread>]
  do
    printfn "Hello!"
But it is not allowed on a do nested inside another expression:
do
  [<STAThread>]
  do
    printfn "Hello!"
  printfn "This is odd notation, I know..."

Related

Difficulty of implementing `case` expressions in a template-instantiation evaluator for a lazy functional language?

I'm following "Implementing functional languages: a tutorial" by SPJ, and I'm stuck on Exercise 2.18 (page 70), reproduced below. This is in the chapter about a template-instantiation evaluator for the simple lazy functional language described in the book (similar to a mini Miranda/Haskell):
Exercise 2.18. Why is it hard to introduce case expressions into the template instantiation machine?
(Hint: think about what instantiate would do with a case expression.)
The tutorial then goes on to cover an implementation of several less-general versions of destructuring structured data: an if primitive, a casePair primitive, and a caseList primitive. I haven't yet done the implementation for this section (Chapter 2 Mark 5), but I don't see why implementing these separately would be significantly easier than implementing a single case primitive.
The only plausible explanation I can offer is that the most generic case form is variadic in both the number of alternatives (number of tags to match against) and arity (number of arguments to the structured data). All of the above primitives are fixed-arity and have a known number of alternatives. I don't see why this would make implementation significantly more difficult, however.
The instantiation of the case statement is straightforward:
Instantiate the scrutinee.
Instantiate the body expression of each alternative. (This may be somewhat wasteful if we substitute in unevaluated branches.) (I notice now this may be a problem, will post in an answer.)
Encapsulate the result in a new node type, NCase, where:
data Node = NAp Addr Addr
          | ...
          | NCase [(Int, [Name], Addr)]
Operationally, the reduction of the case statement is straightforward:
Check if the argument is evaluated.
If not, make it the new stack and push the current stack to the dump. (Similar to evaluating the argument of any primitive.)
If the argument is evaluated, then search for an alternative with a matching tag.
If no alternative with a matching tag is found, then throw an error (inexhaustive case branches).
Instantiate the body of the matching alternative with the environment augmented with the structured data arguments. (E.g., in case (Pack{0,2} 3 4) of <0> a b -> a + b, instantiate a + b with the environment [a <- 3, b <- 4].)
A new node type would likely have to be introduced for case (NCase) containing the list of alternatives, but that's not too dissuading.
I found a GitHub repository, bollu/timi, which seems to implement a template-instantiation evaluator following this same tutorial. There is a section called "Lack of lambda and case", which attributes the lack of a generic case statement to the following reason:
Case requires us to have some notion of pattern matching / destructuring which is not present in this machine.
However, in this tutorial there is no notion of pattern-matching; we would simply be matching by tag number (an integer), so I'm not sure if this explanation is valid.
Aside, partly for myself: a very similar question was asked about special treatment for case statements in the next chapter of the tutorial (concerning G-machines rather than template-instantiation).
I think I figured it out while I was expanding on my reasoning in the question. I'll post here for posterity, but if someone has a more understandable explanation or is able to correct me I'll be happy to accept it.
The difficulty lies in the fact that the instantiate step performs all of the variable substitutions, and this happens separately from evaluation (the step function). The problem is as bollu says in the GitHub repository linked in the original question: it is not easy to destructure structured data at instantiation time. This makes it difficult to instantiate the bodies of all of the alternatives.
To illustrate this, consider the instantiation of let expressions. This works like so:
Instantiate each new binding expression.
Augment the current environment with the new bindings.
Instantiate the body with the augmented environment.
However, now consider the case of case expressions. What we want to do is:
Instantiate the scrutinee. (Which should eventually evaluate to the form Pack {m, n} a0 a1 ... an)
For each alternative (each of which has the form <m> b0 b1 ... bn -> body), augment the environment with the new bindings ([b0 <- a0, b1 <- a1, ..., bn <- an] and then instantiate the body of the alternative.)
The problem lies in between the two steps: calling instantiate on the scrutinee results in the instantiated Addr, but we don't readily have access to a0, a1, ..., an to augment the environment with at instantiation time. While this might be possible if the scrutinee were a literal Pack value, if it needed further evaluation (e.g., if it were the evaluated result of a call to a supercombinator) then we would need to evaluate it first.
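To make this concrete, here is a minimal sketch of where instantiate gets stuck. The types below are my own simplified stand-ins for the tutorial's definitions (the real machine has more Node constructors and heap machinery), so treat it as an illustration rather than the book's code:

type Addr = Int
type Name = String
type Heap = [(Addr, Node)]
type Env  = [(Name, Addr)]

data Node = NAp Addr Addr                    -- unevaluated application
          | NData Int [Addr]                 -- evaluated structured data

data Expr = EVar Name
          | ECase Expr [(Int, [Name], Expr)] -- (tag, binders, body)

instantiate :: Expr -> Heap -> Env -> (Heap, Addr)
instantiate (EVar v) heap env =
  case lookup v env of
    Just a  -> (heap, a)
    Nothing -> error ("unbound variable: " ++ v)
instantiate (ECase scrut _alts) heap env =
  let (heap', scrutAddr) = instantiate scrut heap env
  in case lookup scrutAddr heap' of
       Just (NData _tag _args) ->
         -- Only in this lucky case could we bind b0..bn to the components
         -- and instantiate the alternative's body.
         error "scrutinee happened to be evaluated already"
       _ ->
         -- The general case: scrutAddr points at an NAp (or similar), so
         -- the components a0..an do not exist yet, and we cannot build
         -- the augmented environment for the alternatives.
         error "cannot destructure an unevaluated scrutinee"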
To solidify my own understanding, I'd like to answer the additional question: How do the primitives if, casePair, and caseList avoid this problem?
if trivially avoids this problem because boolean values are nullary. casePair and caseList avoid this problem by deferring the variable bindings using thunk(s); the body expressions get instantiated once the thunk is called, which is after the scrutinee is evaluated.
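To see the thunk trick outside the machine, here is a toy model in ordinary Haskell (my own illustration, not the book's primitive): the binders live in a continuation, which cannot run until the scrutinee has actually been evaluated to a pair.

data Pair a b = MkPair a b

-- Destructuring is deferred: the components reach the continuation k
-- only after the scrutinee has reduced to MkPair.
casePair :: Pair a b -> (a -> b -> r) -> r
casePair (MkPair x y) k = k x y

main :: IO ()
main = print (casePair (MkPair 3 4) (+))  -- prints 7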
Possible solutions:
I'm thinking that it might be possible to get around this if we define a destructuring primitive operator on structured data objects. I.e., (Pack {m, n} a0 a1 ... an).3 would evaluate to a3.
In this case, what we could do is call instantiate scrut which would give us the address scrutAddr, and we could then augment the environment with new bindings [b0 <- (NAp .0 scrut), b1 <- (NAp .1 scrut), ..., bn <- (NAp .n scrut)].
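Here is a toy model of that hypothetical selector primitive (the .i notation and the select name are mine, not the book's): laziness delays each projection until the corresponding binder is actually demanded, which is exactly what lets us build the environment before the scrutinee is evaluated.

data Value = Pack Int [Value]   -- structured data: tag and components
           | Num Int
           deriving Show

-- The hypothetical selector: (Pack {m, n} a0 ... an).i  ~~>  ai
select :: Int -> Value -> Value
select i (Pack _ args) = args !! i
select _ _             = error "selector applied to non-structured data"

main :: IO ()
main =
  let scrut = Pack 0 [Num 3, Num 4]
      -- case scrut of <0> a b -> ...  would bind, roughly:
      a = select 0 scrut
      b = select 1 scrut
  in print (a, b)   -- prints (Num 3,Num 4)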
The issue seems to lie in the fact that instantiation (substitution) and evaluation are separated. If variables were not instantiated separately from evaluation but rather added to/looked up from the environment upon binding/usage, then this would not be a problem. This is as if we placed the bodies of the case statements into thunks to be instantiated after the scrutinee is evaluated, which is similar to what casePair and caseList do.
I haven't worked through either of these alternate solutions or how much extra work they would incur.

Does the functionality of Grouping operator `()` in JavaScript differ from Haskell or other programming languages?

Grouping operator ( ) in JavaScript
The grouping operator ( ) controls the precedence of evaluation in expressions.
Does the functionality of ( ) in JavaScript itself differ from that in Haskell or any other programming language?
In other words,
Is the functionality of ( ) in programming languages itself affected by evaluation strategies?
Perhaps we can share the code below:
a() * (b() + c())
to discuss the topic here, but not limited to the example.
Please feel free to use your own examples to illustrate. Thanks.
Grouping parentheses mean the same thing in Haskell as they do in high school mathematics. They group a sub-expression into a single term. This is also what they mean in JavaScript and most other programming languages [1], so you don't have to relearn this for Haskell coming from other common languages, if you have learnt it the right way.
Unfortunately, this grouping is often explained as meaning "the expression inside the parentheses must be evaluated before the outside". This comes from the order of steps you would follow to evaluate the expression in a strict language (like high school mathematics). However, the grouping isn't really about the order in which you evaluate things, even in that setting. Instead it is used to determine what the expression actually is, which you need to know before you can do anything at all with the expression, let alone evaluate it. Grouping is generally resolved as part of parsing the language, totally separate from the order in which any runtime evaluation takes place.
Let's consider the OP's example, but I'm going to declare that function call syntax is f{} rather than f() just to avoid using the same symbol for two purposes. So in my newly-made-up syntax, the OP's example is:
a{} * (b{} + c{})
This means:
a is called on zero arguments
b is called on zero arguments
c is called on zero arguments
+ is called on two arguments; the left argument is the result of b{}, and the right argument is the result of c{}
* is called on two arguments: the left argument is the result of a{}, and the right argument is the result of b{} + c{}
Note I have not numbered these. This is just an unordered list of sub-expressions that are present, not an order in which we must evaluate them.
If our example had not used grouping parentheses, it would be a{} * b{} + c{}, and our list of sub-expressions would instead be:
a is called on zero arguments
b is called on zero arguments
c is called on zero arguments
+ is called on two arguments; the left argument is the result of a{} * b{}, and the right argument is the result of c{}
* is called on two arguments: the left argument is the result of a{}, and the right argument is the result of b{}
This is simply a different set of sub-expressions from the first (because the overall expression doesn't mean the same thing). That is all that grouping parentheses do; they allow you to specify which sub-expressions are being passed as arguments to other sub-expressions [2].
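One way to see that grouping is resolved when the expression is parsed, not when it is evaluated, is to write the two expressions down as explicit syntax trees. The little Haskell datatype below is my own illustration (Call f [] plays the role of f{}):

data Expr = Call String [Expr]
          | Add Expr Expr
          | Mul Expr Expr

grouped, ungrouped :: Expr
grouped   = Mul (Call "a" []) (Add (Call "b" []) (Call "c" []))  -- a{} * (b{} + c{})
ungrouped = Add (Mul (Call "a" []) (Call "b" [])) (Call "c" [])  -- a{} * b{} + c{}

The parentheses change which tree the parser builds; they say nothing about the order in which the leaves of that tree are later evaluated.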
Now, in a strict language "what is being passed to what" does matter quite a bit to evaluation order. It is impossible in a strict language to call anything on "the result of a{} + b{}" without first having evaluated a{} + b{} (and we can't call + without evaluating a{} and b{}). But even though the grouping determines what is being passed to what, and that partially determines evaluation order [3], grouping isn't really "about" evaluation order. Evaluation order can change as a result of changing the grouping in our expression, but changing the grouping makes it a different expression, so almost anything can change as a result of changing grouping!
Non-strict languages like Haskell make it especially clear that grouping is not about order of evaluation, because in non-strict languages you can pass something like "the result of a{} + b{}" as an argument before you actually evaluate that result. So in my lists of subexpressions above, any order at all could potentially be possible. The grouping doesn't determine it at all.
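A one-line Haskell demonstration (mine, not the OP's) that grouping does not force evaluation: the parenthesized argument below is passed as an unevaluated thunk and then discarded, so the program prints 5 instead of crashing.

main :: IO ()
main = print (const 5 (error "the grouped sub-expression was never evaluated"))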
A language needs other rules beyond just the grouping of sub-expressions to pin down evaluation order (if it wants to specify the order), whether it's strict or lazy. So since you need other rules to determine it anyway, it is best (in my opinion) to think of evaluation order as a totally separate concept than grouping. Mixing them up seems like a shortcut when you're learning high school mathematics, but it's just a handicap in more general settings.
[1] In languages with roughly C-like syntax, parentheses are also used for calling functions, as in func(arg1, arg2, arg3). The OP themselves has assumed this syntax in their a() * (b() + c()) example, where this is presumably calling a, b, and c as functions (passing each of them zero arguments).
This usage is totally unrelated to grouping parentheses, and Haskell does not use parentheses for calling functions. But there can be some confusion because the necessity of using parentheses to call functions in C-like syntax sometimes avoids the need for grouping parentheses e.g. in func(2 + 3) * 6 it is unambiguous that 2 + 3 is being passed to func and the result is being multiplied by 6; in Haskell syntax you would need some grouping parentheses because func 2 + 3 * 6 without parentheses is interpreted as the same thing as (func 2) + (3 * 6), which is not func (2 + 3) * 6.
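A quick sketch of that last point, with a made-up func purely for illustration:

func :: Int -> Int
func x = x * 10

main :: IO ()
main = do
  print (func 2 + 3 * 6)    -- parsed as (func 2) + (3 * 6); prints 38
  print (func (2 + 3) * 6)  -- func applied to 5, then times 6; prints 300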
C-like syntax is not alone in using parentheses for two totally unrelated purposes; Haskell overloads parentheses too, just for different things in addition to grouping. Haskell also uses them as part of the syntax for writing tuples (e.g. (1, True, 'c')), and the unit type/value () which you may or may not want to regard as just an "empty tuple".
[2] Which is also what associativity and precedence rules for operators do. Without knowing that * is higher precedence than +, a * b + c is ambiguous; there would be no way to know what it means. With the precedence rules, we know that a * b + c means "add c to the result of multiplying a and b", but we now have no way to write down what we mean when we want "multiply a by the result of adding b and c" unless we also allow grouping parentheses.
[3] Even in a strict language the grouping only partially determines evaluation order. If you look at my "lists of sub-expressions" above it's clear that in a strict language we need to have evaluated a{}, b{}, and c{} early on, but nothing determines whether we evaluate a{} first and then b{} and then c{}, or c{} first, and then a{} and then b{}, or any other permutation. We could even evaluate only the two of them in the innermost +/* application (in either order), and then the operator application before evaluating the third named function call, etc.
Even in a strict language, the need to evaluate arguments before the call they are passed to does not fully determine evaluation order from the grouping. Grouping just provides some constraints.
[4] In general, in a lazy language, evaluation of a given call happens a bit at a time, as it is needed, so in fact all of the sub-evaluations in a given expression could be interleaved in a complicated fashion (not happening precisely one after the other) anyway.
To clarify the dependency graph:
Answer by myself (the questioner); however, I am willing to be corrected, and I am still waiting for other answers (not opinion-based):
The grouping operator () in every language shares the common functionality of composing a dependency graph.
In mathematics, computer science and digital electronics, a dependency graph is a directed graph representing dependencies of several objects towards each other. It is possible to derive an evaluation order or the absence of an evaluation order that respects the given dependencies from the dependency graph.
[Images: dependency graph 1 and dependency graph 2, not reproduced here.]
The functionality of the grouping operator () itself is not affected by the evaluation strategy of any language.

Does Unbound always need to be in a `FreshM` monad?

I'm working on a project based on some existing code that uses the unbound library.
The code uses unsafeUnbind a bunch, which is causing me problems.
I've tried using freshen, but I get the following error:
error "fresh encountered bound name!
Please report this as a bug."
I'm wondering:
Is the library intended to be used entirely within a FreshM monad? Or are there ways to do things like lambda application without being in Fresh?
What kinds of values can I give to freshen, in order to avoid the error above?
If I end up using unsafeUnbind, under what conditions is it safe to use?
Is the library intended to be used entirely within a FreshM monad? Or are there ways to do things like lambda application without being in Fresh?
In most situations you will want to operate within a Fresh or an LFresh monad.
What kinds of values can I give to freshen, in order to avoid the error above?
So I think the reason you're getting the error is that you're passing a term to freshen rather than a pattern. In Unbound, patterns are a generalization of names: a single Name E is a pattern consisting of a single variable which stands for Es, but (p1, p2) and [p] are also patterns, comprising a pair of patterns p1 and p2 or a list of patterns p, respectively. This lets you define terms that bind two variables at the same time, for example. Other, more exotic type constructors include Embed t and Rebind p1 p2: the former makes a pattern that embeds a term inside a pattern, while the latter is similar to (p1, p2) except that the names within p1 scope over p2 (for example, if p2 has Embedded terms in it, p1 will scope over those terms). This is really powerful because it lets you define things like Scheme's let* form, or telescopes as in dependently typed languages. (See the paper for details.)
Now, finally, the type constructor Bind p t is what brings a pattern and a term together: a term Bind p t means that the names in p are bound in Bind p t and scope over t. So an (untyped) lambda term might be constructed with data Expr = Lam (Bind Var Expr) | App Expr Expr | V Var where type Var = Name Expr.
So back to freshen: you should only call freshen on patterns, so calling it on something of type Bind p t is incorrect (and, I suspect, the source of the error message you're seeing). You should call it on just the p, and then apply the resulting permutation to the term t to apply the renaming that freshen constructs.
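To make the standard workflow concrete, here is a minimal sketch using unbound-generics (whose API closely mirrors Unbound's); the Expr type and the whnf function are my own illustration. The key point is that unbind runs in a Fresh monad, so the name it exposes is guaranteed fresh and substitution cannot capture:

{-# LANGUAGE DeriveGeneric #-}
module Main where

import Data.Typeable (Typeable)
import GHC.Generics (Generic)
import Unbound.Generics.LocallyNameless

type Var = Name Expr

data Expr = V Var
          | Lam (Bind Var Expr)
          | App Expr Expr
          deriving (Show, Generic, Typeable)

instance Alpha Expr

instance Subst Expr Expr where
  isvar (V x) = Just (SubstName x)
  isvar _     = Nothing

-- Weak-head normalization inside Fresh: unbind freshens the bound
-- variable before we substitute under it.
whnf :: Fresh m => Expr -> m Expr
whnf (App f a) = do
  f' <- whnf f
  case f' of
    Lam b -> do
      (x, body) <- unbind b
      whnf (subst x a body)
    _ -> pure (App f' a)
whnf e = pure e

main :: IO ()
main = do
  let x   = s2n "x" :: Var
      idE = Lam (bind x (V x))
  print (runFreshM (whnf (App idE idE)))  -- prints the identity lambda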
If I end up using unsafeUnbind, under what conditions is it safe to use?
The place where I've used it is when I need to temporarily sneak under a binder and perform some operation that I know for sure does not do anything to the names. An example might be collecting some source position annotations from a term, or replacing some global constant by a closed term. It is also safe if you can guarantee that the term you're working with has already been renamed, so that any names you unsafeUnbind are already unique.
Hope this helps.
PS: I maintain unbound-generics which is a clone of Unbound, but using GHC.Generics instead of RepLib.

How to run several operations back to back in a lambda in Haskell?

I'm teaching myself Haskell, and I am having difficulty understanding how I might pipeline a number of operations in the body of a lambda function without using a do block. Take the following for example:
par = (\c->
  c + 1
  c + 2
)
In Ruby and other imperative languages, I am used to being able to run several expressions in a lambda block by adding a line break between them. In Haskell, I looked for a similar construct and found that Haskell doesn't respect line breaks in pure expressions.
So, syntactically, if I wanted to run the second line after the first without using a do block, what could I do?
Having multiple independent expressions with no side effects makes no sense in Haskell (nor in any other language, for that matter). You can use let to bind the value of an expression to a name and then use it in the subsequent expression (the return value of the lambda), or use a do block, but anything else is meaningless in Haskell and thus not a valid program.
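For example, a pure pipeline inside a lambda can be written with let. This is a sketch of what the OP's code might become, assuming the second expression was meant to build on the first:

par :: Int -> Int
par = \c ->
  let step1 = c + 1      -- first "statement", bound to a name
      step2 = step1 + 2  -- second expression uses the first
  in  step2

main :: IO ()
main = print (par 0)  -- prints 3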
In other words: it makes little sense to ask this question solely about syntax as Haskell's syntax is all about its semantics.

why do some languages require function to be declared in code before calling?

Suppose you have this pseudo-code
do_something();
function do_something(){
  print "I am saying hello.";
}
Why do some programming languages require the call to do_something() to appear below the function declaration in order for the code to run?
Programming languages use a symbol table to hold the various classes, functions, etc. that are used in the source code. Some languages compile in a single pass, whereby the symbols are pulled out of the symbol table as soon as they are used. Others use two passes, where the first pass is used to populate the table, and then the second is used to find the entries.
Most languages with a static type system are designed to require definition before use, which means there must be some sort of declaration of a function before the call so that the call can be checked (e.g., is the function getting the right number and types of arguments). This sort of design helps both a person and a compiler reading the program: everything you see has already been defined. The ease of reading and the popularity of one-pass compilers may explain the popularity of this design rule.
Unfortunately, definition before use does not play well with mutual recursion, and so language designers resorted to an ugly hack whereby you have:
Declaration (sometimes called a "forward declaration" from the keyword in Pascal)
Use
Definition
You see the same phenomenon at the type level in C in the form of the "incomplete struct declaration."
Around 1990 some language designers figured out that the one-pass compiler with no abstract-syntax tree should be a thing of the past, and two very nice designs from that era, Modula-3 and Haskell, got rid of definition before use: in those languages, any defined function or variable is visible throughout its scope, including parts of the program textually before the definition. In other words, mutual recursion is the default for both types and functions. Good on them, I say; these languages have no ugly and unnecessary forward declarations.
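A tiny Haskell illustration of that default (my example, not part of the original answer): isOdd is referenced before its definition appears, and no forward declaration is needed.

isEven :: Int -> Bool
isEven 0 = True
isEven n = isOdd (n - 1)   -- refers to isOdd, defined below

isOdd :: Int -> Bool
isOdd 0 = False
isOdd n = isEven (n - 1)

main :: IO ()
main = print (isEven 10)   -- prints True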
Why [have definition before use]?
Easy to write a one-pass compiler in 1975.
Without definition before use, you have to think harder about mutual recursion, especially mutually recursive type definitions.
Some people think it makes it easier for a person to read the code.
