GHC accepts this code, but it ought to be illegal syntax(?) Any guesses as to what's going on?
module Tilde where
~ x = x + 2 -- huh?
~ x +++ y = y * 3 -- this makes sense
The (+++) equation makes sense: it's declaring an operator, using infix syntax, and using an irrefutable pattern match on the first argument.
The first 'equation' looks like the same to start with. But there's no operator. If I ask
λ> :i ~
===> <interactive>:1:1: error: parse error on input `~'
λ> :i (~)
===> class (a ~ b) => (~) (a :: k) (b :: k)
-- Defined in `Data.Type.Equality'
instance [incoherent] forall k (a :: k) (b :: k). (a ~ b) => a ~ b
-- Defined in `Data.Type.Equality'
which is a bemusing discovery, but nothing to do with it(?) I can't define my own class or operator (~) -- Illegal binding of built-in syntax, not surprisingly.
Oh:
λ> :i x
===> x :: Integer -- GHCi defaulting, presumably
and trying to run x loops for ever. So the strangeness is actually defining
x = x + 2
Then what's the ~ doing?
The tilde is doing exactly what it did in your other example: it makes the pattern irrefutable (so the pattern match can not fail). The pattern already was irrefutable, of course, in both cases (being a plain variable, which always matches), but that doesn't make the tilde illegal, just unnecessary.
Writing
x = 5
creates a global variable named x, bound to the value 5. Adding a tilde makes the pattern match irrefutable, but it was already irrefutable, so that doesn't make much sense. But it's legal to write something like
(xs, ys) = span odd [1..10]
This defines two global variables, xs and ys, containing all the odd numbers and all the even numbers between 1 and 10. You could even make this irrefutable if you want by adding a tilde. Of course, this pattern can't fail (if the expression is well-typed), so there's no point to that. But consider:
~(x:xs) = filter odd [1..10]
This defines two global variables, x and xs, if the filter returns at least one result. If the filter were to return zero results, the pattern match would fail. (In practice, this means that accessing x or xs would throw a pattern match failure exception.)
You can even write utterly bizarre stuff like
False = True
This seemingly nonsensical declaration pattern-matches the pattern False against the value True, and does nothing either way. It's one of those obscure corners of the language.
Related
In Haskell, the () type has two values, namely, () and bottom. If you have an expression e :: (), there's no point in actually inspecting it, since either it's e = () or by inspecting it you're crashing a program which could otherwise have not crashed.
Hence, I figured that perhaps operations on values of type () would not inspect the value and would not distinguish between () and bottom.
However, this is wildly untrue:
▎λ ghci
GHCi, version 9.0.2: https://www.haskell.org/ghc/ :? for help
ghci> u = (undefined :: ())
ghci> show u
"*** Exception: Prelude.undefined
CallStack (from HasCallStack):
error, called at libraries/base/GHC/Err.hs:75:14 in base:GHC.Err
undefined, called at <interactive>:1:6 in interactive:Ghci1
ghci> () == u
*** Exception: Prelude.undefined
CallStack (from HasCallStack):
error, called at libraries/base/GHC/Err.hs:75:14 in base:GHC.Err
undefined, called at <interactive>:1:6 in interactive:Ghci1
ghci> f () = "ok"
ghci> f u
"*** Exception: Prelude.undefined
CallStack (from HasCallStack):
error, called at libraries/base/GHC/Err.hs:75:14 in base:GHC.Err
undefined, called at <interactive>:1:6 in interactive:Ghci1
What is the reason for this? Here are some conjectures:
For some reason that I can't think of, it's useful to be non-lazy on (). Sometimes we want that bottom to propagate.
Haskell semantics are written in such a way that destructuring any ADTs, even trivial ones, inspects them. This means that having case (undefined :: ()) of { () -> ... } not throw would be a violation of language semantics
() is an extremely special case and simply isn't worth the attention to eke out this tiny extra bit of safety in a massive language like Haskell
There's also the possible combination explanation of 2+3, that Haskell could have had semantics dictating that an expression case e of inspects e unless it is of type (), but that would pollute the language spec for relatively low benefit
I will address this part:
For some reason that I can't think of, it's useful to be non-lazy on (). Sometimes we want that bottom to propagate.
Let's have a look at Control.Parallel.Strategies (version 1, an older version). This is a module for parallel evaluation. Let's focus on one of its functions for the sake of simplicity:
parMap :: Strategy b -> (a -> b) -> [a] -> [b]
The result of parMap strat f xs is the same as map f xs, except that the list is computed in parallel. What is the strat argument? Well,
strat :: Strategy b
means
strat :: b -> ()
There are only two things you can do with strat:
call it and ignore the result, which by laziness amounts to not calling it at all;
call it and force the result, even if you know it's () or a bottom.
parMap does the latter, in parallel. This allows the caller to specify a strat argument that evaluates the list values of type b as needed. For example
parMap (\(x,y) -> ()) f xs
parMap (\(x,y) -> x `seq` ()) f xs
parMap (\(x,y) -> x `seq` y `seq` ()) f xs
are valid calls, and will cause parMap to evaluate the new list-of-pairs only to expose the pair constructor, also the first component, also the second component, respectively.
Hence, forcing the () result of strat in this case allows the user to control how much evaluation to perform during parMap, i.e. how much to force the result (in parallel), and consequently which parts of the result should be left unevaluated. (By comparison map f xs would leave the result fully unevaluated -- it is completely lazy. parMap can not do that otherwise it is not longer parallel.)
Minor digression: note that the GADT
data a :~: b where
Refl :: t :~: t
has one constructor like (). Here, it is mandatory that such values are forced as in:
foo :: Int :~: String -> Int -> String
foo Refl x = x ++ " hello"
Here the first argument must be a bottom. By forcing that, we make the function error out with an exception. If we did not force that, we would get a very nasty undefined behavior like those in C and C++, completely breaking type safety. Haskell will correctly reject any attempt to circumvent that:
foo :: Int :~: String -> Int -> String
foo _ x = x ++ " hello"
triggers a type error at compile time.
I don't know for sure, but I suspect it's none of the things you said. Instead, this is so that the language is predictable and consistent.
There are, essentially, two things you observed, and I consider them to be separate things. The first is that checking whether a x is indeed () with a case statement forces evaluation of x; the second is that the instances (of Show and Eq) are written to use a case statement.
Pattern matching: the predictable, consistent rule here is that if you write case <e0> of <pat> -> <e1>, then e0 is evaluated far enough to check whether the constructors in pat are in fact in the given places. Well, okay, there's some wrinkles here to do with irrefutable patterns; let's say instead that e0 is evaluated far enough to check whether pat actually does match! For the () type, that means that the pattern () causes full evaluation -- because you've specified the full value that you expect it to be -- while the pattern x or _ can match without further evaluation.
Class instances: the natural inductive way to specify what the various class instances do is to always have an outermost case that matches against each available constructor with simple variable patterns for the fields, then does something (presumably recursive calls) on each of the fields in turn. That is, simplifying a bit, the show implementation goes like:
show x = case x of
<Con0> field00 field01 field02 <...> -> "<Con0>"
++ " " ++ show field00
++ " " ++ show field01
++ " " ++ show field02
++ <...>
<Con1> field10 field11 field12 <...> -> "<Con1>"
++ " " ++ show field10
++ " " ++ show field11
++ " " ++ show field12
++ <...>
<...>
It is very natural for the specialization of this scheme to the single-constructor, zero-field type () to go:
show x = case x of
() -> "()"
(Additionally, the Report specifies that (==) is is always strict in both arguments; but that property would also arise naturally from the obvious way of writing a generic Eq instance derivation algorithm.) Therefore the path of least surprise is for class instances to pattern match on their argument(s).
#2 is definitely true.
The () type is just a nullary data type with special type/data constructor syntax:
data () = ()
As a result, the Haskell 2010 report, while only providing an informal semantics, makes it pretty clear in section 3.17.2 Informal Semantics of Pattern Matching that the expression:
case undefined of () -> "ack!"
will be evaluated as per rule #5:
Matching the pattern con pat1 … patn against a value, where con is a constructor defined by data, depends on the value:
If the value is of the form con v1 … vn, sub-patterns are matched left-to-right against the components of the data value; if all matches succeed, the overall match succeeds; the first to fail or diverge causes the overall match to fail or diverge, respectively.
If the value is of the form con′ v1 … vm, where con is a different constructor to con′, the match fails.
If the value is ⊥, the match diverges.
Here, the value of undefined is ⊥, so the third bullet point applies, and the match diverges. And if the match diverges, the program diverges, and if the program diverges it must terminate with an error or -- at worst -- loop forever. It cannot continue as if nothing has happened. Admittedly, this last part is not explicitly stated, but it is the only reasonable interpretation for the semantics of a divergent evaluation of an expression.
In Haskell (at least with GHC v8.8.4), being in the Num class does NOT imply being in the Eq class:
$ ghci
GHCi, version 8.8.4: https://www.haskell.org/ghc/ :? for help
λ>
λ> let { myEqualP :: Num a => a -> a -> Bool ; myEqualP x y = x==y ; }
<interactive>:6:60: error:
• Could not deduce (Eq a) arising from a use of ‘==’
from the context: Num a
bound by the type signature for:
myEqualP :: forall a. Num a => a -> a -> Bool
at <interactive>:6:7-41
Possible fix:
add (Eq a) to the context of
the type signature for:
myEqualP :: forall a. Num a => a -> a -> Bool
• In the expression: x == y
In an equation for ‘myEqualP’: myEqualP x y = x == y
λ>
It seems this is because for example Num instances can be defined for some functional types.
Furthermore, if we prevent ghci from overguessing the type of integer literals, they have just the Num type constraint:
λ>
λ> :set -XNoMonomorphismRestriction
λ>
λ> x=42
λ> :type x
x :: Num p => p
λ>
Hence, terms like x or 42 above have no reason to be comparable.
But still, they happen to be:
λ>
λ> y=43
λ> x == y
False
λ>
Can somebody explain this apparent paradox?
Integer literals can't be compared without using Eq. But that's not what is happening, either.
In GHCi, under NoMonomorphismRestriction (which is default in GHCi nowadays; not sure about in GHC 8.8.4) x = 42 results in a variable x of type forall p :: Num p => p.1
Then you do y = 43, which similarly results in the variable y having type forall q. Num q => q.2
Then you enter x == y, and GHCi has to evaluate in order to print True or False. That evaluation cannot be done without picking a concrete type for both p and q (which has to be the same). Each type has its own code for the definition of ==, so there's no way to run the code for == without deciding which type's code to use.3
However each of x and y can be used as any type in Num (because they have a definition that works for all of them)4. So we can just use (x :: Int) == y and the compiler will determine that it should use the Int definition for ==, or x == (y :: Double) to use the Double definition. We can even do this repeatedly with different types! None of these uses change the type of x or y; we're just using them each time at one of the (many) types they support.
Without the concept of defaulting, a bare x == y would just produce an Ambiguous type variable error from the compiler. The language designers thought that would be extremely common and extremely annoying with numeric literals in particular (because the literals are polymorphic, but as soon as you do any operation on them you need a concrete type). So they introduced rules that some ambiguous type variables should be defaulted to a concrete type if that allows compilation to continue.5
So what is actually happening when you do x == y is that the compiler is just picking Integer to use for x and y in that particular expression, because you haven't given it enough information to pin down any particular type (and because the defaulting rules apply in this situation). Integer has an Eq instance so it can use that, even though the most general types of x and y don't include the Eq constraint. Without picking something it couldn't possibly even attempt to call == (and of course the "something" it picks has to be in Eq or it still won't work).
If you turn on -Wtype-defaults (which is included in -Wall), the compiler will print a warning whenever it applies defaulting6, which makes the process more visible.
1 The forall p part is implicit in standard Haskell, because all type variables are automatically introduced with forall at the beginning of the type expression in which they appear. You have to turn on extensions to even write the forall manually; either ExplicitForAll just for the ability to write forall, or any one of the many extensions that actually add functionality that makes forall useful to write explicitly.
2 GHCi will probably pick p again for the type variable, rather than q. I'm just using a different one to emphasise that they're different variables.
3 Technically it's not each type that necessarily has a different ==, but each Eq instance. Some of those instances are polymorphic, so they apply to multiple types, but that only really comes up with types that have some structure (like Maybe a, etc). Basic types like Int, Integer, Double, Char, Bool, each have their own instance, and each of those instances has its own code for ==.
4 In the underlying system, a type like forall p. Num p => p is in fact much like a function; one that takes a Num instance for a concrete type as a parameter. To get a concrete value you have to first "apply the function" to a type's Num instance, and only then do you get an actual value that could be printed, compared with other things, etc. In standard Haskell these instance parameters are always invisibly passed around by the compiler; some extensions allow you to manipulate this process a little more directly.
This is the root of what's confusing about why x == y works when x and y are polymorphic variables. If you had to explicitly pass around the type/instance arguments it would be obvious what's going on here, because you would have to manually apply both x and y to something and compare the results.
5 The gist of the default rules is that if the constraints on an ambiguous type variable are:
all built-in classes
at least one of them is a numeric class (Num, Floating, etc)
then GHC will try Integer to see if that type checks and allows all other constraints to be resolved. If that doesn't work it will try Double, and if that doesn't work then it reports an error.
You can set the types it will try with a default declaration (the "default default" being default (Integer, Double)), but you can't customise the conditions under which it will try to default things, so changing the default types is of limited use in my experience.
GHCi however comes with extended default rules that are a bit more useful in an interpreter (because it has to do type inference line-by-line instead of on the whole module at once). You can turn those on in compiled code with ExtendedDefaultRules extension (or turn them off in GHCi with NoExtendedDefaultRules), but again, neither of those options is particularly useful in my experience. It's annoying that the interpreter and the compiler behave differently, but the fundamental difference between module-at-a-time compilation and line-at-a-time interpretation mean that switching either's default rules to work consistently with the other is even more annoying. (This is also why NoMonomorphismRestriction is in effect by default in the interpreter now; the monomorphism restriction does a decent job at achieving its goals in compiled code but is almost always wrong in interpreter sessions).
6 You can also use a typed hole in combination with the asTypeOf helper to get GHC to tell you what type it's inferring for a sub-expression like this:
λ :t x
x :: Num p => p
λ :t y
y :: Num p => p
λ (x `asTypeOf` _) == y
<interactive>:19:15: error:
• Found hole: _ :: Integer
• In the second argument of ‘asTypeOf’, namely ‘_’
In the first argument of ‘(==)’, namely ‘(x `asTypeOf` _)’
In the expression: (x `asTypeOf` _) == y
• Relevant bindings include
it :: Bool (bound at <interactive>:19:1)
Valid hole fits include
x :: forall p. Num p => p
with x
(defined at <interactive>:1:1)
it :: forall p. Num p => p
with it
(defined at <interactive>:10:1)
y :: forall p. Num p => p
with y
(defined at <interactive>:12:1)
You can see it tells us nice and simply Found hole: _ :: Integer, before proceeding with all the extra information it likes to give us about errors.
A typed hole (in its simplest form) just means writing _ in place of an expression. The compiler errors out on such an expression, but it tries to give you information about what you could use to "fill in the blank" in order to get it to compile; most helpfully, it tells you the type of something that would be valid in that position.
foo `asTypeOf` bar is an old pattern for adding a bit of type information. It returns foo but it restricts (this particular usage of) it to be the same type as bar (the actual value of bar is totally unused). So if you already have a variable d with type Double, x `asTypeOf` d will be the value of x as a Double.
Here I'm using asTypeOf "backwards"; instead of using the thing on the right to constrain the type of the thing on the left, I'm putting a hole on the right (which could have any type), but asTypeOf conveniently makes sure it's the same type as x without otherwise changing how x is used in the overall expression (so the same type inference still applies, including defaulting, which isn't always the case if you lift a small part of a larger expression out to ask GHCi for its type with :t; in particular :t x won't tell us Integer, but Num p => p).
I'm solving 99 Haskell Probems. I've successfully solved problem No. 21, and when I opened solution page, the following solution was proposed:
Insert an element at a given position into a list.
insertAt :: a -> [a] -> Int -> [a]
insertAt x xs (n+1) = let (ys,zs) = split xs n in ys++x:zs
I found pattern (n + 1) interesting, because it seems to be an elegant way to convert 1-based argument of insertAt into 0-based argument of split (it's function from previous exercises, essentially the same as splitAt). The problem is that GHC did not find this pattern that elegant, in fact it says:
Parse error in pattern: n + 1
I don't think that the guy who wrote the answer was dumb and I would like to know if this kind of patterns is legal in Haskell, and if it is, how to fix the solution.
I believe it has been removed from the language, and so was likely around when the author of 99 Haskell Problems wrote that solution, but it is no longer in Haskell.
The problem with n+k patterns goes back to a design decision in Haskell, to distinguish between constructors and variables in patterns by the first character of their names. If you go back to ML, a common function definition might look like (using Haskell syntax)
map f nil = nil
map f (x:xn) = f x : map f xn
As you can see, syntactically there's no difference between f and nil on the LHS of the first line, but they have different roles; f is a variable that needs to be bound to the first argument to map while nil is a constructor that needs to be matched against the second. Now, ML makes this distinction by looking each variable up in the surrounding scope, and assuming names are variables when the look-up fails. So nil is recognized as a constructor when the lookup fails. But consider what happens when there's a typo in the pattern:
map f niil = nil
(two is in niil). niil isn't a constructor name in scope, so it gets treated as a variable, and the definition is silently interpreted incorrectly.
Haskell's solution to this problem is to require constructor names to begin with uppercase letters, and variable names to begin with lowercase letters. And, for infix operators / constructors, constructor names must begin with : while operator names may not begin with :. This also helps distinguish between deconstructing bindings:
x:xn = ...
is clearly a deconstructing binding, because you can't define a function named :, while
n - m = ...
is clearly a function definition, because - can't be a constructor name.
But allowing n+k patterns, like n+1, means that + is both a valid function name, and something that works like a constructor in patterns. Now
n + 1 = ...
is ambiguous again; it could be part of the definition of a function named (+), or it could be a deconstructing pattern match definition of n. In Haskell 98, this ambiguity was solved by declaring
n + 1 = ...
a function definition, and
(n + 1) = ...
a deconstructing binding. But that obviously was never a satisfactory solution.
Note that you can now use view patterns instead of n+1.
For example:
{-# LANGUAGE ViewPatterns #-}
module Temp where
import Data.List (splitAt)
split :: [a] -> Int -> ([a], [a])
split = flip splitAt
insertAt :: a -> [a] -> Int -> [a]
insertAt x xs (subtract 1 -> n) = let (ys,zs) = split xs n in ys++x:zs
I have to dessignate types of 2 functions(without using compiler :t) i just dont know how soudl i read these functions to make correct steps.
f x = map -1 x
f x = map (-1) x
Well i'm a bit confuse how it will be parsed
Function application, or "the empty space operator" has higher precedence than any operator symbol, so the first line parses as f x = map - (1 x), which will most likely1 be a type error.
The other example is parenthesized the way it looks, but note that (-1) desugars as negate 1. This is an exception from the normal rule, where operator sections like (+1) desugar as (\x -> x + 1), so this will also likely1 be a type error since map expects a function, not a number, as its first argument.
1 I say likely because it is technically possible to provide Num instances for functions which may allow this to type check.
For questions like this, the definitive answer is to check the Haskell Report. The relevant syntax hasn't changed from Haskell 98.
In particular, check the section on "Expressions". That should explain how expressions are parsed, operator precedence, and the like.
These functions do not have types, because they do not type check (you will get ridiculous type class constraints). To figure out why, you need to know that (-1) has type Num n => n, and you need to read up on how a - is interpreted with or without parens before it.
The following function is the "correct" version of your function:
f x = map (subtract 1) x
You should be able to figure out the type of this function, if I say that:
subtract 1 :: Num n => n -> n
map :: (a -> b) -> [a] -> [b]
well i did it by my self :P
(map) - (1 x)
(-)::Num a => a->a->->a
1::Num b=> b
x::e
map::(c->d)->[c]->[d]
map::a
a\(c->d)->[c]->[d]
(1 x)::a
1::e->a
f::(Num ((c->d)->[c]->[d]),Num (e->(c->d)->[c]->[d])) => e->(c->d)->[c]->[d]
How should one reason about function evaluation in examples like the following in Haskell:
let f x = ...
x = ...
in map (g (f x)) xs
In GHC, sometimes (f x) is evaluated only once, and sometimes once for each element in xs, depending on what exactly f and g are. This can be important when f x is an expensive computation. It has just tripped a Haskell beginner I was helping and I didn't know what to tell him other than that it is up to the compiler. Is there a better story?
Update
In the following example (f x) will be evaluated 4 times:
let f x = trace "!" $ zip x x
x = "abc"
in map (\i -> lookup i (f x)) "abcd"
With language extensions, we can create situations where f x must be evaluated repeatedly:
{-# LANGUAGE GADTs, Rank2Types #-}
module MultiEvG where
data BI where
B :: (Bounded b, Integral b) => b -> BI
foo :: [BI] -> [Integer]
foo xs = let f :: (Integral c, Bounded c) => c -> c
f x = maxBound - x
g :: (forall a. (Integral a, Bounded a) => a) -> BI -> Integer
g m (B y) = toInteger (m + y)
x :: (Integral i) => i
x = 3
in map (g (f x)) xs
The crux is to have f x polymorphic even as the argument of g, and we must create a situation where the type(s) at which it is needed can't be predicted (my first stab used an Either a b instead of BI, but when optimising, that of course led to only two evaluations of f x at most).
A polymorphic expression must be evaluated at least once for each type it is used at. That's one reason for the monomorphism restriction. However, when the range of types it can be needed at is restricted, it is possible to memoise the values at each type, and in some circumstances GHC does that (needs optimising, and I expect the number of types involved mustn't be too large). Here we confront it with what is basically an inhomogeneous list, so in each invocation of g (f x), it can be needed at an arbitrary type satisfying the constraints, so the computation cannot be lifted outside the map (technically, the compiler could still build a cache of the values at each used type, so it would be evaluated only once per type, but GHC doesn't, in all likelihood it wouldn't be worth the trouble).
Monomorphic expressions need only be evaluated once, they can be shared. Whether they are is up to the implementation; by purity, it doesn't change the semantics of the programme. If the expression is bound to a name, in practice you can rely on it being shared, since it's easy and obviously what the programmer wants. If it isn't bound to a name, it's a question of optimisation. With the bytecode generator or without optimisations, the expression will often be evaluated repeatedly, but with optimisations repeated evaluation would indicate a compiler bug.
Polymorphic expressions must be evaluated at least once for every type they're used at, but with optimisations, when GHC can see that it may be used multiple times at the same type, it will (usually) still be shared for that type during a larger computation.
Bottom line: Always compile with optimisations, help the compiler by binding expressions you want shared to a name, and give monomorphic type signatures where possible.
Your examples are indeed quite different.
In the first example, the argument to map is g (f x) and is passed once to map most likely as partially applied function.
Should g (f x), when applied to an argument within map evaluate its first argument, then this will be done only once and then the thunk (f x) will be updated with the result.
Hence, in your first example, f xwill be evaluated at most 1 time.
Your second example requires a deeper analysis before the compiler can arrive at the conclusion that (f x) is always constant in the lambda expression. Perhaps it will never optimize it at all, because it may have knowledge that trace is not quite kosher. So, this may evaluate 4 times when tracing, and 4 times or 1 time when not tracing.
This is really dependent on GHC's optimizations, as you've been able to tell.
The best thing to do is to study the GHC core that you get after optimizing the program. I would look at the generated Core and examine whether f x had its own let statement outside the map or not.
If you want to be sure, then you should factor f x out into its own variable assigned in a let, but there's not really a guaranteed way to figure it out other than reading through Core.
All that said, with the exception of things like trace that use unsafePerformIO, this will never change the semantics of your program: how it actually behaves.
In GHC without optimizations, the body of a function is evaluated every time the function is called. (A "call" means the function is applied to arguments and the result is evaluated.) In the following example, f x is inside a function, so it will execute each time the function is called.
(GHC may optimize this expression as discussed in the FAQ [1].)
let f x = trace "!" $ zip x x
x = "abc"
in map (\i -> lookup i (f x)) "abcd"
However, if we move f x out of the function, it will execute only once.
let f x = trace "!" $ zip x x
x = "abc"
in map ((\f_x i -> lookup i f_x) (f x)) "abcd"
This can be rewritten more readably as
let f x = trace "!" $ zip x x
x = "abc"
g f_x i = lookup i f_x
in map (g (f x)) "abcd"
The general rule is that, each time a function is applied to an argument, a new "copy" of the function body is created. Function application is the only thing that may cause an expression to re-execute. However, be warned that some functions and function calls do not look like functions syntactically.
[1] http://www.haskell.org/haskellwiki/GHC/FAQ#Subexpression_Elimination