name capture in Haskell `let` expression - haskell

I was writing a function something similar to this:
f x = let
x = ...
in
e
Due to scoping rules in Haskell any use of x in e will resolve to the definition of x in the let construct.
Why is such a thing allowed in Haskell?
Shouldn't the compiler reject such a program telling we cannot bind a value that has the same name as argument of the function.
(This example may be simplistic, but in real world context where variables have semantic meaning associated with them it is easy to make such mistake)

You can enable warnings for this type of name shadowing with the compiler flag
-fwarn-name-shadowing
This option causes a warning to be emitted whenever an inner-scope value has the same name as an outer-scope value, i.e. the inner value shadows the outer one. This can catch typographical errors that turn into hard-to-find bugs, e.g., in the inadvertent capture of what would be a recursive call in f = ... let f = id in ... f ....
However, it is more common to compile with -Wall, which includes a lot of other warnings that will help you avoid bad practices.

Related

Understanding recurive let expression in lambda calculus with Haskell, OCaml and nix language

I'm trying to understand how recursive set operate internally by comparing similar feature in another functional programming languages and concepts.
I can find it in wiki. In that, I need to know Y combinator, fixed point. I can get it briefly in wiki.
Then, now I start to apply this in Haskell.
Haskell
It is easy. But I want to know behind the scenes.
*Main> let x = y; y = 10; in x
10
When you write a = f b in a lazy functional language like Haskell or Nix, the meaning is stronger than just assignment. a and f b will be the same thing. This is usually called a binding.
I'll focus on a Nix example, because you're asking about recursive sets specifically.
A simple attribute set
Let's look at the initialization of an attribute set first. When the Nix interpreter is asked to evaluate this file
{ a = 1 + 1; b = true; }
it parses it and returns a data structure like this
{ a = <thunk 1>; b = <thunk 2>; }
where a thunk is a reference to the relevant syntax tree node and a reference to the "environment", which behaves like a dictionary from identifiers to their values, although implemented more efficiently.
Perhaps the reason we're evaluating this file is because you requested nix-build, which will not just ask for the value of a file, but also traverse the attribute set when it sees that it is one. So nix-build will ask for the value of a, which will be computed from its thunk. When the computation is complete, the memory that held the thunk is assigned the actual value, type = tInt, value.integer = 2.
A recursive attribute set
Nix has a special syntax that combines the functionality of attribute set construction syntax ({ }) and let-binding syntax. This is avoids some repetition when you're constructing attribute sets with some shared values.
For example
let b = 1 + 1;
in { b = b; a = b + 5; }
can be expressed as
rec { b = 1 + 1; a = b + 5; }
Evaluation works in a similar manner.
At first the evaluator returns a representation of the attribute set with all thunks, but this time the thunks reference an new environment that includes all the attributes, on top of the existing lexical scope.
Note that all these representations can be constructed while performing a minimal amount of work.
nix-build traverses attrsets in alphabetic order, so it will evaluate a first. It's a thunk that references the a + syntax node and an environment with b in it. Evaluating this requires evaluating the b syntax node (an ExprVar), which references the environment, where we find the 1 + 1 thunk, which is changed to a tInt of 2 as before.
As you can see, this process of creating thunks but only evaluating them when needed is indeed lazy and allows us to have various language constructs with their own scoping rules.
Haskell implementations usually follow a similar pattern, but may compile the code rather than interpret a syntax tree, and resolve all variable references to constant memory offsets completely. Nix tries to do this to some degree, but it must be able to fall back on strings because of the inadvisable with keyword that makes the scope dynamic.
I guess several things by myself.
In eagar evaluation language, I must declare before use it. So the order of declaration is simple.
int x = 10;
int y = x;
Just for Nix language
In wiki, there isn't any concept comparision with Haskell though let ... in is compared with Haskell.
lexical scope
all variables are lexically scoped.
mutual recursion
https://en.wikipedia.org/wiki/Let_expression#Mutually_recursive_let_expression

Statements vs Expressions in Haskell, Ocaml, Javascript

In Haskell, afaik, there are no statements, just expressions. That is, unlike in an imperative language like Javascript, you cannot simply execute code line after line, i.e.
let a = 1
let b = 2
let c = a + b
print(c)
Instead, everything is an expression and nothing can simply modify state and return nothing (i.e. a statement). On top of that, everything would be wrapped in a function such that, in order to mimic such an action as above, you'd use the monadic do syntax and thereby hide the underlying nested functions.
Is this the same in OCAML/F# or can you just have imperative statements?
This is a bit of a complicated topic. Technically, in ML-style languages, everything is an expression. However, there is some syntactic sugar to make it read more like statements. For example, the sample you gave in F# would be:
let a = 1
let b = 2
let c = a + b
printfn "%d" c
However, the compiler silently turns those "statements" into the following expression for you:
let a = 1 in
let b = 2 in
let c = a + b in
printfn "%d" c
Now, the last line here is going to do IO, and unlike in Haskell, it won't change the type of the expression to IO. The type of the expression here is unit. unit is the F# way of expressing "this function doesn't really have result" in the type system. Of course, if the function doesn't have a result, in a purely functional language it would be pointless to call it. The only reason to call it would be for some side-effect, and since Haskell doesn't allow side-effects, they use the IO monad to encode the fact the function has an IO producing side-effect into the type system.
F# and other ML-based languages do allow side-effects like IO, so they have the unit type to represent functions that only do side-effects, like printing. When designing your application, you will generally want to avoid having unit-returning functions except for things like logging or printing. If you feel so inclined, you can even use F#'s moand-ish feature, Computation Expressions, to encapsulate your side-effects for you.
Not to be picky, but there's no language OCaml/F# :-)
To answer for OCaml: OCaml is not a pure functional language. It supports side effects directly through mutability, I/O, and exceptions. In many cases it treats such constructs as expressions with the value (), the single value of type unit.
Expressions of type unit can appear in a sequence separated by ;:
let s = ref 0 in
while !s < 10 do
Printf.printf "%d\n" !s; (* This has type unit *)
incr s (* This has type unit *)
done (* The while as a whole has type unit *)
Update
More specifically, ; ignores the value of the first expression and returns the value of the second expression. The first expression should have type unit but this isn't absolutely required.
# print_endline "hello"; 44 ;;
hello
- : int = 44
# 43 ; 44 ;;
Warning 10: this expression should have type unit.
- : int = 44
The ; operator is right associative, so you can write a ;-separated sequence of expressions without extra parentheses. It has the value of the last (rightmost) expression.
To answer the question we need to define what is an expression and what is a statement.
Distinction between expressions and statements
In layman terms, an expression is something that evaluates (reduces) to a value. It is basically something, that may occur on the right-hand side of the assignment operator. Contrary, a statement is some directive that doesn't produce directly a value.
For example, in Python, the ternary operator builds expressions, e.g.,
'odd' if x % 2 else 'even'
is an expression, so you can assign it to a variable, print, etc
While the following is a statement:
if x % 2:
'odd'
else:
'even'
It is not reduced to a value by Python, it couldn't be printed, assigned to a value, etc.
So far we were focusing more on the semantical differences between expressions and statements. But for a casual user, they are more noticeable on the syntactic level. I.e., there are places where a statement is expected and places where expressions are expected. For example, you can put a statement to the right of the assignment operator.
OCaml/Reason/Haskell/F# story
In OCaml, Reason, and F# such constructs as if, while, print etc are expressions. They all evaluate to values and can occur on the right-hand side of the assignment operator. So it looks like that there is no distinction between statements and expressions. Indeed, there are no statements in OCaml grammar at all. I believe, that F# and Reason are also not using word statement to exclude confusion. However, there are syntactic forms that are not expressions, for example:
open Core_kernel
it is not an expression, definitely, and
type students = student list
is not an expression.
So what is that? In the OCaml parlance, they are called definitions, and they are syntactic constructs that can appear in the module on the, so called, top-level. For example, in OCaml, there are value definitions, that look like this
let harry = student "Harry"
let larry = student "Larry"
let group = [harry; larry]
Every line above is a definition. And every line contains an expression on the right-hand side of the = symbol. In OCaml there is also a let expression, that has form let <v> = <exp> in <exp> that should not be confused with the top-level let definition.
Roughly the same is true for F# and Reason. It is also true for Haskell, that has a distinction between expressions and declarations. It actually should be true to probably every real-world language (i.e., excluding brainfuck and other toy languages).
Summary
So, all these languages have syntactic forms that are not expressions. They are not called statements per se, but we can treat them as statements. So there is a distinction between statements and expressions. The main difference from common imperative languages is that some well-known statements (e.g., if, while, for) are expressions in OCaml/F#/Reason/Haskell, and this is why people commonly say that there is no distinction between expressions and statements.

Infinite loop while updating record [duplicate]

I want to update a record syntax with a change in one field so i did something like:
let rec = rec{field = 1}
But I've noticed that i can't print rec anymore, means the compiler seems to get into an infinite loop when i try. so i have tried doing:
let a = 1 -- prints OK
let a = a -- now i can't print a (also stuck in a loop)
So i can't do let a = a with any type, but i don't understand why, and how should i resolve this issue.
BTW: while doing:
let b = a {...record changes..}
let a = b
works, but seems redundant.
The issue you're running into is that all let and where bindings in Haskell are recursive by default. So when you write
let rec = rec { ... }
it tries to define a circular data type that will loop forever when you try to evaluate it (just like let a = a).
There's no real way around this—it's a tradeoff in the language. It makes recursive functions (and even plain values) easier to write with less noise, but also means you can't easily redefine a a bunch of times in terms of itself.
The only thing you can really do is give your values different names—rec and rec' would be a common way to do this.
To be fair to Haskell, recursive functions and even recursive values come up quite often. Code like
fibs = 0 : 1 : zipWith (+) fibs (tail fibs)
can be really nice once you get the hang of it, and not having to explicitly mark this definition as recursive (like you'd have to do in, say, OCaml) is a definite upside.
You never need to update a variable : you can always make another variable with the new value. In your case let rec' = rec{field = 1}.
Maybe you worry about performance and the value being unnecessarily copied. That's the compiler's job, not yours : even if you declare 2 variables in your code, the compiler should only make one in memory and update it in place.
Now there are times when the code is so complex that the compiler fails to optimize. You can tell by inspecting the intermediate core language, or even the final assembly. Profile first to know what functions are slow : if it's just an extra Int or Double, you don't care.
If you do find a function that the compiler failed to optimize and that takes too much time, then you can rewrite it to handle the memory yourself. You will then use things like unboxed vectors, IO and ST monad, or even language extensions to access the native machine-level types.
First of all, Haskell does not allow "copying" data to itself, which in the normal sense, means the data is mutable. In Haskell you don't have mutable "variable"s, so you will not be able to modify the value a given variable presents.
All you have did, is define a new variable which have the same name of its previous version. But, to do this properly, you have to refer to the old variable, not the newly defined one. So your original definition
let rec = rec { field=1 }
is a recursive definition, the name rec refer to itself. But what you intended to do, is to refer to the rec defined early on.
So this is a name conflict.
Haskell have some machenism to work around this. One is your "temporary renaming".
For the original example this looks like
let rec' = rec
let rec = rec' { field=1 }
This looks like your given solution. But remember this is only available in a command line environment. If you try to use this in a function, you may have to write
let rec' = rec in let rec = rec' { field=1 } in ...
Here is another workaround, which might be useful when rec is belong to another module (say "MyModule"):
let rec = MyModule.rec { field=1 }

Haskell: let statement, copy data type to itself with/without modification not working

I want to update a record syntax with a change in one field so i did something like:
let rec = rec{field = 1}
But I've noticed that i can't print rec anymore, means the compiler seems to get into an infinite loop when i try. so i have tried doing:
let a = 1 -- prints OK
let a = a -- now i can't print a (also stuck in a loop)
So i can't do let a = a with any type, but i don't understand why, and how should i resolve this issue.
BTW: while doing:
let b = a {...record changes..}
let a = b
works, but seems redundant.
The issue you're running into is that all let and where bindings in Haskell are recursive by default. So when you write
let rec = rec { ... }
it tries to define a circular data type that will loop forever when you try to evaluate it (just like let a = a).
There's no real way around this—it's a tradeoff in the language. It makes recursive functions (and even plain values) easier to write with less noise, but also means you can't easily redefine a a bunch of times in terms of itself.
The only thing you can really do is give your values different names—rec and rec' would be a common way to do this.
To be fair to Haskell, recursive functions and even recursive values come up quite often. Code like
fibs = 0 : 1 : zipWith (+) fibs (tail fibs)
can be really nice once you get the hang of it, and not having to explicitly mark this definition as recursive (like you'd have to do in, say, OCaml) is a definite upside.
You never need to update a variable : you can always make another variable with the new value. In your case let rec' = rec{field = 1}.
Maybe you worry about performance and the value being unnecessarily copied. That's the compiler's job, not yours : even if you declare 2 variables in your code, the compiler should only make one in memory and update it in place.
Now there are times when the code is so complex that the compiler fails to optimize. You can tell by inspecting the intermediate core language, or even the final assembly. Profile first to know what functions are slow : if it's just an extra Int or Double, you don't care.
If you do find a function that the compiler failed to optimize and that takes too much time, then you can rewrite it to handle the memory yourself. You will then use things like unboxed vectors, IO and ST monad, or even language extensions to access the native machine-level types.
First of all, Haskell does not allow "copying" data to itself, which in the normal sense, means the data is mutable. In Haskell you don't have mutable "variable"s, so you will not be able to modify the value a given variable presents.
All you have did, is define a new variable which have the same name of its previous version. But, to do this properly, you have to refer to the old variable, not the newly defined one. So your original definition
let rec = rec { field=1 }
is a recursive definition, the name rec refer to itself. But what you intended to do, is to refer to the rec defined early on.
So this is a name conflict.
Haskell have some machenism to work around this. One is your "temporary renaming".
For the original example this looks like
let rec' = rec
let rec = rec' { field=1 }
This looks like your given solution. But remember this is only available in a command line environment. If you try to use this in a function, you may have to write
let rec' = rec in let rec = rec' { field=1 } in ...
Here is another workaround, which might be useful when rec is belong to another module (say "MyModule"):
let rec = MyModule.rec { field=1 }

ocaml style: parameterize programs

I have a OCaml program which modules have lots of functions that depend on a parameter, i.e. "dimension". This parameter is determined once at the beginning of a run of the code and stays constant until termination.
My question is: how can I write the code shorter, so that my functions do not all require a "dimension" parameter. Those modules call functions of each other, so there is no strict hierarchy (or I can't see this) between the modules.
how is the ocaml style to adress this problem? Do I have to use functors or are there other means?
You probably cannot evaluate the parameter without breaking dependencies between modules, otherwise you would just define it in one of the modules where it is accessible from other modules. A solution that comes to my mind is a bit "daring". Define the parameter as a lazy value, and suspend in it a dereference of a "global cell":
let hidden_box = ref None
let initialize_param p =
match !hidden_box with None -> hidden_box := Some p | _ -> assert false
let param =
lazy (match !hidden_box with None -> assert false | Some p -> p)
The downside is that Lazy.force param is a bit verbose.
ETA: Note that "there is no strict hierarchy between the modules" is either:
false, or
you have a recursive module definition, or
you are tying the recursive knot somewhere.
In case (2) you can just put everything into a functor. In case (3) you are passing parameters already.

Resources