What is the meaning of "let x = x in x" and "data Float#" in GHC.Prim in Haskell?

I looked at the GHC.Prim module and found that all the data types in it seem to be declared like data Float#, without anything like = A | B, and that all the functions in it are defined like gtFloat# = let x = x in x.
My question is whether these definitions make sense and what they mean.
I checked the header of GHC.Prim, shown below:
{-
This is a generated file (generated by genprimopcode).
It is not code to actually be used. Its only purpose is to be
consumed by haddock.
-}
I guess this may be related to my question. Could someone please explain it to me?

It's magic :)
These are the "primitive operators and operations". They are hardwired into the compiler, hence there are no data constructors for primitives and all functions are bottom, since they are necessarily not expressible in pure Haskell.
(Bottom represents a "hole" in a Haskell program; an infinite loop or undefined are examples of bottom.)
To put it another way
These data declarations/functions are to provide access to the raw compiler internals. GHC.Prim exists to export these primitives, it doesn't actually implement them or anything (eg its code isn't actually useful). All of that is done in the compiler.
It's meant for code that needs to be extremely optimized. If you think you might need it, some useful reading about the primitives in GHC
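To make "access to the raw compiler internals" a bit more concrete, here is a minimal sketch of using one primop from ordinary code. It assumes GHC with the MagicHash extension and imports from GHC.Exts (the usual way to reach these names) rather than GHC.Prim directly; plusInt is just a name made up for this example.

```haskell
{-# LANGUAGE MagicHash #-}
import GHC.Exts (Int (I#), (+#))

-- Unbox both arguments, add the raw Int# values with the primop +#,
-- and box the result again. plusInt is a hypothetical helper.
plusInt :: Int -> Int -> Int
plusInt (I# a) (I# b) = I# (a +# b)
```

Nothing in this file says how +# works; the compiler supplies that, which is why the generated GHC.Prim source only needs placeholders.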

A brief expansion of jozefg's answer ...
Primops are precisely those operations that are supplied by the runtime because they can't be defined within the language (or shouldn't be, for reasons of efficiency, say). The true purpose of GHC.Prim is not to define anything, but merely to export some operations so that Haddock can document their existence.
The construction let x = x in x is used at this point in GHC's codebase because the value undefined has not yet been, um, "defined". (That waits until the Prelude.) But notice that the circular let construction, just like undefined, is both syntactically correct and can have any type. That is, it's an infinite loop with the semantics of ⊥, just as undefined is.
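As a small sketch of the point above, the circular let can be given a fully polymorphic type, and laziness means it does no harm as long as nothing forces it. The names bottom, ignoresSecond and countsCells are invented for this illustration.

```haskell
-- The same construction GHC.Prim uses as a placeholder: it typechecks at
-- every type, and evaluating it never produces a value.
bottom :: a
bottom = let x = x in x

-- const never looks at its second argument, so bottom is never forced.
ignoresSecond :: Int
ignoresSecond = const 1 (bottom :: String)

-- length only walks the list spine; the elements stay unevaluated.
countsCells :: Int
countsCells = length [bottom, bottom :: Bool]
```

Forcing bottom itself would loop (or be reported as <<loop>> by the runtime), so the example is careful never to do that.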
... and an aside
Also note that in general the Haskell expression let x = z in y means "change the variable x to the expression z wherever x occurs in the expression y". If you're familiar with the lambda calculus, you should recognize this as the reduction rule for the application of the lambda abstraction \x -> y to the term z. So is the Haskell expression let x = x in x nothing more than some syntax on top of the pure lambda calculus? Let's take a look.
First, we need to account for the recursiveness of Haskell's let expressions. The lambda calculus does not admit recursive definitions, but given a primitive fixed-point operator fix,1 we can encode recursiveness explicitly. For example, the Haskell expression let x = x in x has the same meaning as (fix \r x -> r x) z.2 (I've renamed the x on the right side of the application to z to emphasize that it has no implicit relation to the x inside the lambda).
Applying the usual definition of a fixed-point operator, fix f = f (fix f), our translation of let x = x in x reduces (or rather doesn't) like this:
(fix \r x -> r x) z ==>
(\s y -> s y) (fix \r x -> r x) z ==>
(\y -> (fix \r x -> r x) y) z ==>
(fix \r x -> r x) z ==> ...
So at this point in the development of the language, we've introduced the semantics of ⊥ from the foundation of the (typed) lambda calculus with a built-in fixed-point operator. Lovely!
We need a primitive fixed-point operation (that is, one that is built into the language) because it's impossible to define a fixed-point combinator in the simply typed lambda calculus and its close cousins. (The definition of fix in Haskell's Prelude doesn't contradict this—it's defined recursively, but we need a fixed-point operator to implement recursion.)
If you haven't seen this before, you should read up on fixed-point recursion in the lambda calculus. A text on the lambda calculus is best (there are some free ones online), but some Googling should get you going. The basic idea is that we can convert a recursive definition into a non-recursive one by abstracting over the recursive call, then use a fixed-point combinator to pass our function (lambda abstraction) to itself. The base-case of a well-defined recursive definition corresponds to a fixed point of our function, so the function executes, calling itself over and over again until it hits a fixed point, at which point the function returns its result. Pretty damn neat, huh?
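The idea of abstracting over the recursive call and then tying the knot with a fixed-point operator can be sketched in Haskell as follows. This uses fix from Data.Function; factStep and factorial are names chosen for the example.

```haskell
import Data.Function (fix)

-- A non-recursive "step" function: the recursive call has been abstracted
-- out as the parameter self.
factStep :: (Integer -> Integer) -> Integer -> Integer
factStep self n = if n <= 0 then 1 else n * self (n - 1)

-- fix passes factStep to itself, recovering ordinary recursion.
factorial :: Integer -> Integer
factorial = fix factStep
```

Note that factStep never mentions itself; all of the recursion lives in fix.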

Related

Sharing vs. non-sharing fixed-point combinator

This is the usual definition of the fixed-point combinator in Haskell:
fix :: (a -> a) -> a
fix f = let x = f x in x
On https://wiki.haskell.org/Prime_numbers, they define a different fixed-point combinator:
_Y :: (t -> t) -> t
_Y g = g (_Y g) -- multistage, non-sharing, g (g (g (g ...)))
-- g (let x = g x in x) -- two g stages, sharing
_Y is a non-sharing fixpoint combinator, here arranging for a recursive "telescoping" multistage primes production (a tower of producers).
What exactly does this mean? What is "sharing" vs. "non-sharing" in that context? How does _Y differ from fix?
"Sharing" means f x re-uses the x that it creates; but with _Y g = g . g . g . g . ..., each g calculates its output anew (cf. this and this).
In that context, the sharing version has much worse memory usage, leads to a space leak.1
The definition of _Y mirrors the usual lambda calculus definition's effect for the Y combinator, which emulates recursion by duplication, while true recursion refers to the same (hence, shared) entity.
In
x = f x
(_Y g) = g (_Y g)
both xs refer to the same entity, but each of the (_Y g)s refers to an equivalent, but separate, entity. That's the intention of it, anyway.
Of course thanks to referential transparency there's no guarantee in Haskell the language for any of this. But GHC the compiler does behave this way.
_Y g is a common sub-expression and it could be "eliminated" by a compiler by giving it a name and reusing that named entity, subverting the whole purpose of it. That's why GHC has the "no common sub-expression elimination" flag, -fno-cse, which prevents this explicitly. It used to be that you had to use this flag to achieve the desired behaviour here, but not anymore: more recent (read: several years now) versions of GHC are no longer as aggressive at common sub-expression elimination.
disclaimer: I'm the author of that part of the page you're referring to. Was hoping for the back-and-forth that's usual on wiki pages, but it never came, so my work didn't get reviewed like that. Either no-one bothered, or it is passable (lacking major errors). The wiki seems to be largely abandoned for many years now.
1 The g function involved,
(3:) . minus [5,7..] . foldr (\(x:xs) -> (x:) . union xs) []
     . map (\p -> [p*p, p*p + 2*p ..])
produces an increasing stream of all odd primes given an increasing stream of all odd primes. To produce a prime N in value, it consumes its input stream up to the first prime above sqrt(N) in value, at least. Thus the production points are given roughly by repeated squaring, and there are ~ log (log N) of such g functions in total in the chain (or "tower") of these primes producers, each immediately garbage collectible, the lowest one producing its primes given just the first odd prime, 3, known a priori.
And with the two-staged _Y2 g = g x where { x = g x } there would be only two of them in the chain, but only the top one would be immediately garbage collectible, as discussed at the referenced link above.
_Y is translated to the following STG:
_Y f = let x = _Y f in f x
fix is translated identically to the Haskell source:
fix f = let x = f x in x
So fix f sets up a recursive thunk x and returns it, while _Y is a recursive function, and importantly it’s not tail-recursive. Forcing _Y f enters f, passing a new call to _Y f as an argument, so each recursive call sets up a new thunk; forcing the x returned by fix f enters f, passing x itself as an argument, so each recursive call is into the same thunk—this is what’s meant by “sharing”.
The sharing version usually has better memory usage, and also lets the GHC RTS detect some kinds of infinite loop. When a thunk is forced, before evaluation starts, it’s replaced with a “black hole”; if at any point during evaluation of a thunk a black hole is reached from the same thread, then we know we have an infinite loop and can throw an exception (which you may have seen displayed as Exception: <<loop>>).
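For reference, here is a minimal side-by-side sketch of the two definitions under discussion. fixShared and fixUnshared are names picked for this example (they correspond to fix and _Y above); the point is only that both denote the same value, so any difference is operational (memory behaviour), not in the results.

```haskell
-- Sharing: x refers to itself, so there is a single thunk.
fixShared :: (a -> a) -> a
fixShared f = let x = f x in x

-- Non-sharing: every unfolding makes a fresh call to fixUnshared g.
fixUnshared :: (a -> a) -> a
fixUnshared g = g (fixUnshared g)

-- Both produce the same infinite list of ones.
onesShared, onesUnshared :: [Int]
onesShared = fixShared (1 :)
onesUnshared = fixUnshared (1 :)
```

How the two are actually laid out in memory depends on the compiler and optimisation flags, as noted above, so treat this as an illustration of the definitions rather than a guarantee.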
I think you already received excellent answers, from a GHC/Haskell perspective. I just wanted to chime in and add a few historical/theoretical notes.
The correspondence between unfolding and cyclic views of recursion is rigorously studied in Hasegawa's PhD thesis: https://www.springer.com/us/book/9781447112211
(Here's a shorter paper that you can read without paying Springer: https://link.springer.com/content/pdf/10.1007%2F3-540-62688-3_37.pdf)
Hasegawa assumes a traced monoidal category, a requirement that is much less stringent than the usual PCPO assumption of domain theory, which forms the basis of how we think about Haskell in general. What Hasegawa showed was that one can define these "sharing" fixed point operators in such a setting, and established that they correspond to the usual unfolding view of fixed points from Church's lambda-calculus. That is, there is no way to tell them apart by making them produce different answers.
Hasegawa's correspondence holds for what's known as central arrows; i.e., when there are no "effects" involved. Later on, Benton and Hyland extended this work and showed that the correspondence holds for certain cases when the underlying arrow can perform "mild" monadic effects as well: https://pdfs.semanticscholar.org/7b5c/8ed42a65dbd37355088df9dde122efc9653d.pdf
Unfortunately, Benton and Hyland only allow effects that are quite "mild": Effects like the state and environment monads fit the bill, but not general effects like exceptions, lists, or IO. (The fixed point operators for these effectful computations are known as mfix in Haskell, with the type signature (a -> m a) -> m a, and they form the basis of the recursive-do notation.)
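For readers who haven't met mfix, here is a small sketch of what it does, using the MonadFix instance for Maybe from Control.Monad.Fix. The names onesInMaybe and firstThree are made up; the example only illustrates the type (a -> m a) -> m a and says nothing about which monads satisfy the correspondence discussed above.

```haskell
import Control.Monad.Fix (mfix)

-- The function receives (lazily) the very value it is about to return,
-- so the list inside the Just is defined in terms of itself.
onesInMaybe :: Maybe [Int]
onesInMaybe = mfix (\xs -> Just (1 : xs))

-- Only a finite prefix is ever demanded.
firstThree :: Maybe [Int]
firstThree = fmap (take 3) onesInMaybe
```

This works because the function never inspects xs before producing the Just constructor; a function that pattern matched on xs first would diverge.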
It's still an open question how to extend this work to cover arbitrary monadic effects. Though it doesn't seem to be receiving much attention these days. (Would make a great PhD topic for those interested in the correspondence between lambda-calculus, monadic effects, and graph-based computations.)

Monads, composition and the order of computation

All the monad articles often state, that monads allow you to sequence effects in order.
But what about simple composition? Doesn't
f x = x + 1
g x = x * 2
result = f (g x)
require g x to be computed before f ...?
Do monads do the same but with handling of effects?
Disclaimer: Monads are a lot of things. They are notoriously difficult to explain, so I will not attempt to explain what monads are in general here, since the question does not ask for that. I will assume you have a basic grasp on what the Monad interface is as well as how it works for some useful datatypes, like Maybe, Either, and IO.
What is an effect?
Your question begins with a note:
All the monad articles often state, that monads allow you to sequence effects in order.
Hmm. This is interesting. In fact, it is interesting for a couple reasons, one of which you have identified: it implies that monads let you create some sort of sequencing. That’s true, but it’s only part of the picture: it also states that sequencing happens on effects.
Here’s the thing, though… what is an “effect”? Is adding two numbers together an effect? Under most definitions, the answer would be no. What about printing something to stdout, is that an effect? In that case, I think most people would agree that the answer is yes. However, consider something more subtle: is short-circuiting a computation by producing Nothing an effect?
Error effects
Let’s take a look at an example. Consider the following code:
> do x <- Just 1
     y <- Nothing
     return (x + y)
Nothing
The second line of that example “short-circuits” due to the Monad instance for Maybe. Could that be considered an effect? In some sense, I think so, since it’s non-local, but in another sense, probably not. After all, if the x <- Just 1 and y <- Nothing lines are swapped, the result is still the same, so the ordering doesn’t matter.
However, consider a slightly more complex example that uses Either instead of Maybe:
> do x <- Left "x failed"
     y <- Left "y failed"
     return (x + y)
Left "x failed"
Now this is more interesting. If you swap the first two lines now, you get a different result! Still, is this a representation of an “effect” like the ones you allude to in your question? After all, it’s just a bunch of function calls. As you know, do notation is just an alternative syntax for a bunch of uses of the >>= operator, so we can expand it out:
> Left "x failed" >>= \x ->
    Left "y failed" >>= \y ->
      return (x + y)
Left "x failed"
We can even replace the >>= operator with the Either-specific definition to get rid of monads entirely:
> case Left "x failed" of
    Right x -> case Left "y failed" of
      Right y -> Right (x + y)
      Left e -> Left e
    Left e -> Left e
Left "x failed"
Therefore, it’s clear that monads do impose some sort of sequencing, but this is not because they are monads and monads are magic, it’s just because they happen to enable a style of programming that looks more impure than Haskell typically allows.
Monads and state
But perhaps that is unsatisfying to you. Error handling is not compelling because it is simply short-circuiting, it doesn’t actually have any sequencing in the result! Well, if we reach for some slightly more complex types, we can do that. For example, consider the Writer type, which allows a sort of “logging” using the monadic interface:
> execWriter $ do
    tell "hello"
    tell " "
    tell "world"
"hello world"
This is even more interesting than before, since now the result of each computation in the do block is unused, but it still affects the output! This is clearly side-effectful, and order is clearly very important! If we reorder the tell expressions, we would get a very different result:
> execWriter $ do
    tell " "
    tell "world"
    tell "hello"
" worldhello"
But how is this possible? Well, again, we can rewrite it to avoid do notation:
execWriter (
  tell "hello" >>= \_ ->
  tell " " >>= \_ ->
  tell "world")
We could inline the definition of >>= again for Writer, but it’s too long to be terribly illustrative here. The point is, though, that Writer is just a completely ordinary Haskell datatype that doesn’t do any I/O or anything like that, and yet we have used the monadic interface to create something that looks like ordered effects.
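To back up the claim that nothing special is going on, here is a deliberately stripped-down stand-in for Writer, specialised to String logs. It is a sketch written for this answer, not the definition from the transformers or mtl packages; Logger, tell and execLogger are local definitions that merely mirror the real names.

```haskell
-- A value paired with the log accumulated so far.
newtype Logger a = Logger (a, String)

instance Functor Logger where
  fmap f (Logger (a, w)) = Logger (f a, w)

instance Applicative Logger where
  pure a = Logger (a, "")
  Logger (f, w1) <*> Logger (a, w2) = Logger (f a, w1 ++ w2)

instance Monad Logger where
  -- The "effect" is just string concatenation, left log before right log.
  Logger (a, w1) >>= k =
    let Logger (b, w2) = k a
    in Logger (b, w1 ++ w2)

tell :: String -> Logger ()
tell w = Logger ((), w)

execLogger :: Logger a -> String
execLogger (Logger (_, w)) = w

greeting :: String
greeting = execLogger (tell "hello" >>= \_ -> tell " " >>= \_ -> tell "world")
```

The ordering in the output comes entirely from w1 ++ w2 in the definition of >>=.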
We can go even further by creating an interface that looks like mutable state using the State type:
> flip execState 0 $ do
    modify (+ 3)
    modify (* 2)
6
Once again, if we reorder the expressions, we get a different result:
> flip execState 0 $ do
    modify (* 2)
    modify (+ 3)
3
Clearly, monads are a useful tool for creating interfaces that look stateful and have a well-defined ordering despite actually just being ordinary function calls.
Why can monads do this?
What gives monads this power? Well, they’re not magic—they’re just ordinary pure Haskell code. But consider the type signature for >>=:
(>>=) :: Monad m => m a -> (a -> m b) -> m b
Notice how the second argument depends on a, and the only way to get an a is from the first argument? This means that >>= needs to “run” the first argument to produce a value before it can apply the second argument. This doesn’t have to do with evaluation order so much as it has to do with actually writing code that will typecheck.
Now, it’s true that Haskell is a lazy language. But Haskell’s laziness doesn’t really matter for any of this because all of this code is actually pure, even the example using State! It’s simply a pattern that encodes computations that look sort of stateful in a pure way, but if you actually implemented State yourself, you’d find that it just passes around the “current state” in the definition of the >>= function. There’s not any actual mutation.
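Here is roughly what "implementing State yourself" could look like. This is a minimal sketch, not the definition from the transformers or mtl packages; St, modifySt and execFrom are hypothetical names standing in for State, modify and flip execState.

```haskell
-- A stateful computation is just a function from the current state to a
-- result paired with the next state.
newtype St s a = St { runSt :: s -> (a, s) }

instance Functor (St s) where
  fmap f (St g) = St (\s -> let (a, s1) = g s in (f a, s1))

instance Applicative (St s) where
  pure a = St (\s -> (a, s))
  St f <*> St g = St (\s ->
    let (h, s1) = f s
        (a, s2) = g s1
    in (h a, s2))

instance Monad (St s) where
  -- Run the first computation, then hand its output state to the second.
  St g >>= k = St (\s ->
    let (a, s1) = g s
    in runSt (k a) s1)

modifySt :: (s -> s) -> St s ()
modifySt f = St (\s -> ((), f s))

execFrom :: s -> St s a -> s
execFrom s0 m = snd (runSt m s0)

addThenDouble, doubleThenAdd :: Int
addThenDouble = execFrom 0 (modifySt (+ 3) >> modifySt (* 2))
doubleThenAdd = execFrom 0 (modifySt (* 2) >> modifySt (+ 3))
```

There is no mutation anywhere: the state is threaded through >>= as an ordinary argument and return value.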
And that’s it. Monads, by virtue of their interface, impose an ordering on how their arguments may be evaluated, and instances of Monad exploit that to make stateful-looking interfaces. You don’t need Monad to have evaluation ordering, though, as you found; obviously in (1 + 2) * 3 the addition will be evaluated before the multiplication.
But what about IO??
Okay, you got me. Here’s the problem: IO is magic.
Monads are not magic, but IO is. All of the above examples are purely functional, but obviously reading a file or writing to stdout is not pure. So how the heck does IO work?
Well, IO is implemented by the GHC runtime, and you could not write it yourself. However, in order to make it work nicely with the rest of Haskell, there needs to be a well-defined evaluation order! Otherwise things would get printed out in the wrong order and all sorts of other hell would break loose.
Well, it turns out the Monad’s interface is a great way to ensure that evaluation order is predictable, since it works for pure code already. So IO leverages the same interface to guarantee the evaluation order is the same, and the runtime actually defines what that evaluation means.
However, don’t be misled! You don’t need monads to do I/O in a pure language, and you don’t need IO to have monadic effects. Early versions of Haskell experimented with a non-monadic way to do I/O, and the other parts of this answer explain how you can have pure monadic effects. Remember that monads are not special or holy, they’re just a pattern that Haskell programmers have found useful because of its various properties.
Yes, the functions you proposed are strict for the standard numerical types. But not all functions are! In
f _ = 3
g x = x * 2
result = f (g x)
it is not the case that g x must be computed before f (g x).
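One way to see this for yourself is to feed g an argument that would crash if it were ever evaluated. This is a small sketch with type signatures added for concreteness:

```haskell
f :: Int -> Int
f _ = 3

g :: Int -> Int
g x = x * 2

-- f ignores its argument, so g undefined is never evaluated and result
-- is simply 3.
result :: Int
result = f (g undefined)
```

With the original f x = x + 1, the same call would fail, because + on Int does force its arguments.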
Yes, monads use function composition to sequence effects, and are not the only way to achieve sequenced effects.
Strict semantics and side effects
In most languages, there is sequencing by strict semantics applied first to the function side of an expression, then to each argument in turn, and finally the function is applied to the arguments. So in JS, the function application form,
<Code 1>(<Code 2>, <Code 3>)
runs four pieces of code in a specified order: 1, 2, 3, then it checks that the output of 1 was a function, then it calls the function with those two computed arguments. And it does this because any of those steps can have side-effects. You would write,
const logVal = (log, val) => {
  console.log(log);
  return val;
};
logVal(1, (a, b) => logVal(4, a+b))(
  logVal(2, 2),
  logVal(3, 3));
And that works for those languages. These are side-effects, which we can say in this context means that JS's type system doesn't give you any way to know that they are there.
Haskell does have a strict application primitive, but it wanted to be pure, which roughly means that it wanted the type system to track effects. So they introduced a form of metaprogramming where one of their types is a type-level adjective, “programs which compute a _____”. A program interacts with the real world; Haskell code in theory doesn't. You have to define “main is a program which computes a unit type” and then the compiler actually just builds that program for you as an executable binary file. By the time that file is run Haskell is not really in the picture any more!
This is therefore more specific than normal function application, because the abstract problem I wrote in JavaScript is,
I have a program which computes {a function from (X, Y) pairs to programs which compute Zs}.
I also have a program which computes an X, and a program which computes a Y.
I want to put these all together into a program which computes a Z.
That's not just function composition itself. But a function can do that.
Peeking inside monads
A monad is a pattern. The pattern is, sometimes you have an adjective which does not add much when you repeat it. For example there is not much added when you say "a delayed delayed x" or "zero or more (zero or more xs)" or "either a null or else either a null or else an x." Similarly for the IO monad, not much is added by "a program to compute a program to compute an x" that is not available in "a program to compute an x."
The pattern is that there is some canonical merging algorithm which merges:
join: given an <adjective> <adjective> x, I will make you an <adjective> x.
We also add two other properties, the adjective should be outputtish,
map: given an x -> y and an <adjective> x, I will make you an <adjective> y
and universally embeddable,
pure: given an x, I will make you an <adjective> x.
Given these three things and a couple axioms you happen to have a common "monad" idea which you can develop One True Syntax for.
Now this metaprogramming idea obviously contains a monad. In JS we would write,
interface IO<x> {
  run: () => Promise<x>
}
function join<x>(pprog: IO<IO<x>>): IO<x> {
  return { run: () => pprog.run().then(prog => prog.run()) };
}
function map<x, y>(prog: IO<x>, fn: (in: x) => y): IO<y> {
  return { run: () => prog.run().then(x => fn(x)) }
}
function pure<x>(input: x): IO<x> {
  return { run: () => Promise.resolve(input) }
}
// with those you can also define,
function bind<x, y>(prog: IO<x>, fn: (in: x) => IO<y>): IO<y> {
  return join(map(prog, fn));
}
But the fact that a pattern exists does not mean it is useful! I am claiming that these functions turn out to be all you need to resolve the problem above. And it is not hard to see why: you can use bind to create a function scope inside of which the adjective doesn't exist, and manipulate your values there:
function ourGoal<x, y, z>(
    fnProg: IO<(inX: x, inY: y) => IO<z>>,
    xProg: IO<x>,
    yProg: IO<y>): IO<z> {
  return bind(fnProg, fn =>
    bind(xProg, x =>
      bind(yProg, y => fn(x, y))));
}
How this answers your question
Notice that in the above we choose an order of operations by how we write the three binds. We could have written those in some other order. But we needed all the parameters to run the final program.
This choice of how we sequence our operations is indeed realized in the function calls: you are 100% right. But the way that you are doing it, with only function composition, is flawed because it demotes the effects down to side-effects in order to get the types through.
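For comparison, the ourGoal chain of binds above can be transcribed into Haskell more or less directly. This is a sketch, generalised from IO to any Monad so that it can be tried with a pure monad; traced is a made-up example that uses the pair-with-a-String monad from base (a tiny writer) to make the chosen order visible.

```haskell
-- Three nested binds: function program first, then x, then y.
ourGoal :: Monad m => m (x -> y -> m z) -> m x -> m y -> m z
ourGoal fnProg xProg yProg =
  fnProg >>= \fn ->
  xProg >>= \x ->
  yProg >>= \y ->
  fn x y

-- The String component records the order in which the pieces were run.
traced :: (String, Int)
traced = ourGoal ("f ", \x y -> ("call ", x + y)) ("x ", 2) ("y ", 3)
```

Rearranging the binds changes the recorded order but not the final number, which is the sense in which the sequencing is a choice made by whoever writes the binds.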

Can any recursive definition be rewritten using foldr?

Say I have a general recursive definition in haskell like this:
foo a0 a1 ... = base_case
foo b0 b1 ...
  | cond1 = recursive_case_1
  | cond2 = recursive_case_2
  ...
Can it always be rewritten using foldr? Can this be proved?
If we interpret your question literally, we can write const value foldr to achieve any value, as @DanielWagner pointed out in a comment.
A more interesting question is whether we can instead forbid general recursion from Haskell, and "recurse" only through the eliminators/catamorphisms associated to each user-defined data type, which are the natural generalization of foldr to inductively defined data types. This is, essentially, (higher-order) primitive recursion.
When this restriction is performed, we can only compose terminating functions (the eliminators) together. This means that we can no longer define non terminating functions.
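To illustrate what "recursing only through eliminators" looks like, here is a small sketch with a user-defined Nat type. In the restricted language foldNat would be given as a primitive; here it is written with ordinary recursion so the file is self-contained. All names (Nat, foldNat, add, toInt, len) are chosen for the example.

```haskell
data Nat = Zero | Succ Nat

-- The eliminator (catamorphism) for Nat, the analogue of foldr for lists.
foldNat :: r -> (r -> r) -> Nat -> r
foldNat z _ Zero = z
foldNat z s (Succ n) = s (foldNat z s n)

-- Everything below is defined without any explicit recursion.
add :: Nat -> Nat -> Nat
add m n = foldNat n Succ m

toInt :: Nat -> Int
toInt = foldNat 0 (+ 1)

len :: [a] -> Int
len = foldr (\_ acc -> acc + 1) 0
```

Each of these terminates on every finite input simply because the eliminators do, which is the totality property described above.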
As a first example, we lose the trivial recursion
f x = f x
-- or even
a = a
since, as said, the language becomes total.
More interestingly, the general fixed point operator is lost.
fix :: (a -> a) -> a
fix f = f (fix f)
A more intriguing question is: what about the total functions we can express in Haskell? We do lose all the non-total functions, but do we lose any of the total ones?
Computability theory states that, since the language becomes total (no more non termination), we lose expressiveness even on the total fragment.
The proof is a standard diagonalization argument. Fix any enumeration of programs in the total fragment so that we can speak of "the i-th program".
Then, let eval i x be the result of running the i-th program on the natural x as input (for simplicity, assume this is well typed, and that the result is a natural). Note that, since the language is total, a result must exist. Moreover, eval can be implemented in the unrestricted Haskell language, since we can write an interpreter of Haskell in Haskell (left as an exercise :-P), and that would work just as well for the fragment. Then, we simply take
f n = succ $ eval n n
The above is a total function (a composition of total functions) which can be expressed in Haskell, but not in the fragment. Indeed, otherwise there would be a program to compute it, say the i-th program. In such case we would have
eval i x = f x
for all x. But then,
eval i i = f i = succ $ eval i i
which is impossible -- contradiction. QED.
In type theory, it is indeed the case that you can elaborate all definitions by dependent pattern-matching into ones only using eliminators (a more strongly-typed version of folds, the generalisation of lists' foldr).
See e.g. Eliminating Dependent Pattern Matching (pdf)

How did Haskell add Turing-completeness to System F?

I've been reading up on various type systems and lambda calculi, and i see that all of the typed lambda calculi in the lambda cube are strongly normalizing rather than Turing equivalent. This includes System F, the simply typed lambda calculus plus polymorphism.
This leads me to the following questions, for which I've been unable to find any comprehensible answer:
How does the formalism of (e.g.) Haskell differ from the calculus on which it is ostensibly based?
What language features in Haskell don't fall within System F formalism?
What's the minimum change necessary to allow Turing complete computation?
Thank you so much to whomever helps me understand this.
In a word, general recursion.
Haskell allows for arbitrary recursion while System F has no form of recursion. The lack of infinite types means fix isn't expressible as a closed term.
There is no primitive notion of names and recursion. In fact, pure System F has no notion of any such thing as definitions!
So in Haskell this single definition is what adds Turing completeness
fix :: (a -> a) -> a
fix f = let x = f x in x
Really this function is indicative of a more general idea: by having fully recursive bindings, we get Turing completeness. Notice that this applies to types, not just values.
data Rec a = Rec {unrec :: Rec a -> a}
y :: (a -> a) -> a
y f = u (Rec u)
  where u x = f $ unrec x x
With infinite types we can write the Y combinator (modulo some unfolding) and through it general recursion!
In pure System F, we often have some informal notion of definitions, but these are simply shorthands that are to be mentally inlined fully. This isn't possible in Haskell as this would create infinite terms.
The kernel of Haskell terms without any notion of let, where or = is strongly normalizing, since we don't have infinite types. Even this core term calculus isn't really System F. System F has "big lambdas" or type abstraction. The full term for id in System F is
id := /\ A -> \(x : A) -> x
This is because type inference for System F is undecidable! We explicitly notate wherever and whenever we expect polymorphism. In Haskell such a property would be annoying, so we limit the power of Haskell. In particular, we never infer a polymorphic type for a Haskell lambda argument without annotation (terms and conditions may apply). This is why in ML and Haskell
let x = exp in foo
isn't the same as
(\x -> foo) exp
even when exp isn't recursive! This is the crux of HM type inference and algorithm W, called "let generalization".
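A concrete instance of the difference, as a small sketch (viaLet is a name made up for this example): the let-bound identity is generalised and can be used at two types, while the lambda-bound version is rejected by the type checker and is therefore left commented out.

```haskell
-- i is let-bound, so it gets the polymorphic type forall a. a -> a and
-- can be applied to both an Int and a Bool.
viaLet :: (Int, Bool)
viaLet = let i = \x -> x in (i 1, i True)

-- The lambda-bound version does not typecheck, because i would have to be
-- monomorphic inside the lambda body:
-- viaLambda = (\i -> (i 1, i True)) (\x -> x)
```

Uncommenting the last line should produce a type error in plain Haskell, without higher-rank type annotations.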

What would pattern matching look like in a strict Haskell?

As a research experiment, I've recently worked on implementing strict-by-default Haskell modules. Instead of being lazy-by-default and having ! as an escape hatch, we're strict-by-default and have ~ as an escape hatch. This behavior is enabled using a {-# LANGUAGE Strict #-} pragma.
While working on making patterns strict I came upon an interesting question: should patterns be strict at the "top level" only, or in all bound variables? For example, if we have
f x = case x of
  y -> ...
we will force y even though Haskell would not do so. The more tricky case is
f x = case x of
  Just y -> ...
Should we interpret that as
f x = case x of
  Just y -> ... -- already strict in 'x' but not in 'y'
or
f x = case x of
  Just !y -> ... -- now also strict in 'y'
(Note that we're using the normal, lazy Haskell Just here.)
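The observable difference between the two readings can be sketched in today's lazy Haskell with BangPatterns. This only illustrates the two candidate interpretations; it makes no claim about which one the Strict pragma should pick. outerOnly and alsoInner are names invented for the example.

```haskell
{-# LANGUAGE BangPatterns #-}

-- First reading: matching forces x far enough to see the constructor,
-- but the field is left alone.
outerOnly :: Maybe Int -> String
outerOnly x = case x of
  Just _y -> "just"
  Nothing -> "nothing"

-- Second reading: the field is forced as well.
alsoInner :: Maybe Int -> String
alsoInner x = case x of
  Just !_y -> "just"
  Nothing -> "nothing"
```

With these definitions, outerOnly (Just undefined) returns "just", whereas alsoInner (Just undefined) throws when the field is forced.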
One design constraint that might be of value is this: I want the pragma to be modular. For example, even with Strict turned on we don't evaluate arguments to functions defined in other modules. That would make it non-modular.
Is there any prior art here?
As far as I understand things, refutable patterns are always strict at least on the outer level. Which is another way to say that the scrutinized expression must have been evaluated to WHNF, otherwise you couldn't see if it is a 'Just' or a 'Nothing'.
Hence your
!(Just y) -> ...
notation appears useless.
OTOH, since in a strict language, the argument to Just must already have been evaluated, the notation
Just !y ->
doesn't make sense either.
