Strictness of dataToTag argument - haskell

In GHC.Prim, we find a magical function named dataToTag#:
dataToTag# :: a -> Int#
It turns a value of any type into an integer based on the data constructor it uses. This is used to speed up derived implementations of Eq, Ord, and Enum. In the GHC source, the docs for dataToTag# explain that the argument should already be evaluated:
The dataToTag# primop should always be applied to an evaluated argument.
The way to ensure this is to invoke it via the 'getTag' wrapper in GHC.Base:
getTag :: a -> Int#
getTag !x = dataToTag# x
It makes total sense to me that we need to force x's evaluation before dataToTag# is called. What I do not get is why the bang pattern is sufficient. The definition of getTag is just syntactic sugar for:
getTag :: a -> Int#
getTag x = x `seq` dataToTag# x
But let's turn to the docs for seq:
A note on evaluation order: the expression seq a b does not guarantee that a will be evaluated before b. The only guarantee given by seq is that the both a and b will be evaluated before seq returns a value. In particular, this means that b may be evaluated before a. If you need to guarantee a specific order of evaluation, you must use the function pseq from the "parallel" package.
In the Control.Parallel module from the parallel package, the docs elaborate further:
... seq is strict in both its arguments, so the compiler may, for example, rearrange a `seq` b into b `seq` a `seq` b ...
How is it that getTag is guaranteed to work, given that seq is insufficient for controlling evaluation order?

GHC tracks certain information about each primop. One key datum is whether the primop "can_fail". The original meaning of this flag is that a primop can fail if it can cause a hard fault. For example, array indexing can cause a segmentation fault if the index is out of range, so indexing operations can fail.
If a primop can fail, GHC will restrict certain transformations around it, and in particular won't float it out of any case expressions. It would be rather bad, for example, if
if n < bound
  then unsafeIndex v n
  else error "out of range"
were compiled to
case unsafeIndex v n of
  !x -> if n < bound
          then x
          else error "out of range"
One of these bottoms is an exception; the other is a segfault.
dataToTag# is marked can_fail. So GHC sees (in Core) something like
getTag = \x -> case x of
  y -> dataToTag# y
(Note that case is strict in Core.) Because dataToTag# is marked can_fail, it won't be floated out of any case expressions.
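A small sketch of the safe calling pattern, assuming a GHC 9.x toolchain in which GHC.Base still exports getTag (Colour and tagOf are illustrative names, not from the question). The lazily built value is forced by getTag's bang pattern, so dataToTag# only ever sees an evaluated constructor.

```haskell
{-# LANGUAGE MagicHash #-}
-- Assumption: GHC.Base exports getTag (true for the GHC 9.x releases I know of).
import GHC.Base (getTag)
import GHC.Exts (Int (I#))

data Colour = Red | Green | Blue

-- 0-based constructor index in declaration order, via the forcing wrapper.
tagOf :: Colour -> Int
tagOf c = I# (getTag c)

main :: IO ()
main = do
  -- Built as a thunk; getTag evaluates it before dataToTag# is applied.
  let lazyColour = if length "ab" == 2 then Blue else Red
  print (map tagOf [Red, Green, Blue])
  print (tagOf lazyColour)
```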

Related

Why are values of type () inspected?

In Haskell, the () type has two values, namely () and bottom. If you have an expression e :: (), there's no point in actually inspecting it: either e = (), or inspecting it crashes a program that might otherwise not have crashed.
Hence, I figured that perhaps operations on values of type () would not inspect the value and would not distinguish between () and bottom.
However, this is wildly untrue:
λ ghci
GHCi, version 9.0.2: https://www.haskell.org/ghc/ :? for help
ghci> u = (undefined :: ())
ghci> show u
"*** Exception: Prelude.undefined
CallStack (from HasCallStack):
error, called at libraries/base/GHC/Err.hs:75:14 in base:GHC.Err
undefined, called at <interactive>:1:6 in interactive:Ghci1
ghci> () == u
*** Exception: Prelude.undefined
CallStack (from HasCallStack):
error, called at libraries/base/GHC/Err.hs:75:14 in base:GHC.Err
undefined, called at <interactive>:1:6 in interactive:Ghci1
ghci> f () = "ok"
ghci> f u
"*** Exception: Prelude.undefined
CallStack (from HasCallStack):
error, called at libraries/base/GHC/Err.hs:75:14 in base:GHC.Err
undefined, called at <interactive>:1:6 in interactive:Ghci1
What is the reason for this? Here are some conjectures:
1. For some reason that I can't think of, it's useful to be non-lazy on (). Sometimes we want that bottom to propagate.
2. Haskell semantics are written in such a way that destructuring any ADT, even a trivial one, inspects it. This means that having case (undefined :: ()) of { () -> ... } not throw would be a violation of language semantics.
3. () is an extremely special case and simply isn't worth the attention to eke out this tiny extra bit of safety in a massive language like Haskell.
4. There's also the possible combination of 2 and 3: Haskell could have had semantics dictating that an expression case e of inspects e unless it is of type (), but that would pollute the language spec for relatively little benefit.
I will address this part:
For some reason that I can't think of, it's useful to be non-lazy on (). Sometimes we want that bottom to propagate.
Let's have a look at Control.Parallel.Strategies (version 1, an older version). This is a module for parallel evaluation. Let's focus on one of its functions for the sake of simplicity:
parMap :: Strategy b -> (a -> b) -> [a] -> [b]
The result of parMap strat f xs is the same as map f xs, except that the list is computed in parallel. What is the strat argument? Well,
strat :: Strategy b
means
strat :: b -> ()
There are only two things you can do with strat:
call it and ignore the result, which by laziness amounts to not calling it at all;
call it and force the result, even if you know it's () or a bottom.
parMap does the latter, in parallel. This allows the caller to specify a strat argument that evaluates the list values of type b as needed. For example
parMap (\(x,y) -> ()) f xs
parMap (\(x,y) -> x `seq` ()) f xs
parMap (\(x,y) -> x `seq` y `seq` ()) f xs
are valid calls, and will cause parMap to evaluate each element of the new list of pairs far enough to expose, respectively, only the pair constructor; the pair constructor and its first component; or the pair constructor and both components.
Hence, forcing the () result of strat in this case allows the user to control how much evaluation to perform during parMap, i.e. how much to force the result (in parallel), and consequently which parts of the result should be left unevaluated. (By comparison map f xs would leave the result fully unevaluated -- it is completely lazy. parMap can not do that otherwise it is not longer parallel.)
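To make the "force the ()" idea concrete, here is a self-contained re-creation of the version-1 style using only par from GHC.Conc. It is a sketch that assumes the old API looked roughly like this (Done, Strategy, using, parList and parMap follow the description above); seqList and the three named strategies are illustrative additions. The sequential seqList forces each () on the calling thread, which makes the different evaluation depths observable.

```haskell
import GHC.Conc (par)

-- Version-1 style: a strategy forces as much as it likes, then returns ().
type Done = ()
type Strategy a = a -> Done

using :: a -> Strategy a -> a
using x strat = strat x `seq` x

-- Spark the strategy on every element (may run in parallel).
parList :: Strategy a -> Strategy [a]
parList _ [] = ()
parList strat (x : xs) = strat x `par` parList strat xs

-- Force the strategy on every element, in order, on the current thread.
seqList :: Strategy a -> Strategy [a]
seqList _ [] = ()
seqList strat (x : xs) = strat x `seq` seqList strat xs

parMap :: Strategy b -> (a -> b) -> [a] -> [b]
parMap strat f xs = map f xs `using` parList strat

-- The three strategies from the text.
pairOnly, firstToo, bothToo :: Strategy (a, b)
pairOnly (_, _) = ()
firstToo (x, _) = x `seq` ()
bothToo (x, y) = x `seq` y `seq` ()

main :: IO ()
main = do
  -- Second components are bottom; strategies that never touch them are fine.
  let pairs = [(n, undefined :: Int) | n <- [1, 2, 3 :: Int]]
  print (map fst (pairs `using` seqList pairOnly))
  print (map fst (pairs `using` seqList firstToo))
  -- pairs `using` seqList bothToo would throw: it forces the undefined parts.
  print (map fst (parMap firstToo (\n -> (n * 10, n)) [1, 2, 3 :: Int]))
```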
Minor digression: note that the GADT
data a :~: b where
Refl :: t :~: t
has one constructor like (). Here, it is mandatory that such values are forced as in:
foo :: Int :~: String -> Int -> String
foo Refl x = x ++ " hello"
Here the first argument must be a bottom. By forcing that, we make the function error out with an exception. If we did not force that, we would get a very nasty undefined behavior like those in C and C++, completely breaking type safety. Haskell will correctly reject any attempt to circumvent that:
foo :: Int :~: String -> Int -> String
foo _ x = x ++ " hello"
triggers a type error at compile time.
I don't know for sure, but I suspect it's none of the things you said. Instead, this is so that the language is predictable and consistent.
There are, essentially, two things you observed, and I consider them to be separate things. The first is that checking whether x is indeed () with a case statement forces evaluation of x; the second is that the instances (of Show and Eq) are written to use a case statement.
Pattern matching: the predictable, consistent rule here is that if you write case <e0> of <pat> -> <e1>, then e0 is evaluated far enough to check whether the constructors in pat are in fact in the given places. Well, okay, there are some wrinkles here to do with irrefutable patterns; let's say instead that e0 is evaluated far enough to check whether pat actually does match! For the () type, that means that the pattern () causes full evaluation -- because you've specified the full value that you expect it to be -- while the pattern x or _ can match without further evaluation.
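The rule is easy to check in GHC. A small sketch (function names are illustrative): only the refutable () pattern forces its argument, while the wildcard and the irrefutable ~() pattern both succeed on undefined. The throws helper reports whether fully evaluating a String raises an ErrorCall, which is the exception undefined throws.

```haskell
import Control.Exception (ErrorCall, evaluate, try)

-- Refutable pattern: the argument must be evaluated to confirm it is ().
strictUnit :: () -> String
strictUnit () = "ok"

-- Wildcard pattern: matches anything without evaluating it.
wildUnit :: () -> String
wildUnit _ = "ok"

-- Irrefutable pattern: the match is deferred, so nothing is forced here.
lazyUnit :: () -> String
lazyUnit ~() = "ok"

-- True if forcing the whole String throws an ErrorCall.
throws :: String -> IO Bool
throws s = either isErrorCall (const False) <$> try (evaluate (length s))
  where
    isErrorCall :: ErrorCall -> Bool
    isErrorCall _ = True

main :: IO ()
main = do
  let u = undefined :: ()
  throws (strictUnit u) >>= print
  throws (wildUnit u) >>= print
  throws (lazyUnit u) >>= print
```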
Class instances: the natural inductive way to specify what the various class instances do is to always have an outermost case that matches against each available constructor with simple variable patterns for the fields, then does something (presumably recursive calls) on each of the fields in turn. That is, simplifying a bit, the show implementation goes like:
show x = case x of
  <Con0> field00 field01 field02 <...> -> "<Con0>"
    ++ " " ++ show field00
    ++ " " ++ show field01
    ++ " " ++ show field02
    ++ <...>
  <Con1> field10 field11 field12 <...> -> "<Con1>"
    ++ " " ++ show field10
    ++ " " ++ show field11
    ++ " " ++ show field12
    ++ <...>
  <...>
It is very natural for the specialization of this scheme to the single-constructor, zero-field type () to go:
show x = case x of
  () -> "()"
(Additionally, the Report specifies that (==) is always strict in both arguments; but that property would also arise naturally from the obvious way of writing a generic Eq instance derivation algorithm.) Therefore the path of least surprise is for class instances to pattern match on their argument(s).
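Nothing stops you from writing instances that never inspect their argument; it is the derived (and standard) instances that choose to match. A sketch contrasting a derived Show for a one-constructor type with hand-written instances that ignore their arguments (U, LazyU and throws are illustrative names):

```haskell
import Control.Exception (ErrorCall, evaluate, try)

-- Derived instance: showsPrec pattern matches on U, so bottom propagates.
data U = U deriving Show

-- Hand-written instances that never look at their arguments.
data LazyU = LazyU

instance Show LazyU where
  show _ = "LazyU"

instance Eq LazyU where
  _ == _ = True

-- True if forcing the whole String throws an ErrorCall.
throws :: String -> IO Bool
throws s = either isErrorCall (const False) <$> try (evaluate (length s))
  where
    isErrorCall :: ErrorCall -> Bool
    isErrorCall _ = True

main :: IO ()
main = do
  throws (show (undefined :: U)) >>= print      -- derived: forces the argument
  throws (show (undefined :: LazyU)) >>= print  -- hand-written: does not
  print ((undefined :: LazyU) == LazyU)
```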
#2 is definitely true.
The () type is just a nullary data type with special type/data constructor syntax:
data () = ()
As a result, the Haskell 2010 report, while only providing an informal semantics, makes it pretty clear in section 3.17.2 Informal Semantics of Pattern Matching that the expression:
case undefined of () -> "ack!"
will be evaluated as per rule #5:
Matching the pattern con pat1 … patn against a value, where con is a constructor defined by data, depends on the value:
If the value is of the form con v1 … vn, sub-patterns are matched left-to-right against the components of the data value; if all matches succeed, the overall match succeeds; the first to fail or diverge causes the overall match to fail or diverge, respectively.
If the value is of the form con′ v1 … vm, where con is a different constructor to con′, the match fails.
If the value is ⊥, the match diverges.
Here, the value of undefined is ⊥, so the third bullet point applies, and the match diverges. And if the match diverges, the program diverges, and if the program diverges it must terminate with an error or -- at worst -- loop forever. It cannot continue as if nothing has happened. Admittedly, this last part is not explicitly stated, but it is the only reasonable interpretation for the semantics of a divergent evaluation of an expression.

How do we overcome the compile time and runtime gap when programming in a Dependently Typed Language?

I'm told that in a dependent type system, "types" and "values" are mixed, and we can treat both of them as "terms" instead.
But there is something I can't understand: in a strongly typed programming language without dependent types (like Haskell), types are decided (inferred or checked) at compile time, but values are decided (computed or input) at runtime.
I think there must be a gap between these two stages. Just think that if a value is interactively read from STDIN, how can we reference this value in a type which must be decided AOT?
For example, suppose there is a natural number n and a list of natural numbers xs (containing n elements) which I need to read from STDIN. How can I put them into a data structure Vect n Nat?
Suppose we input n :: Int at runtime from STDIN. We then read n strings, and store them into vn :: Vect n String (pretend for the moment this can be done).
Similarly, we can read m :: Int and vm :: Vect m String. Finally, we concatenate the two vectors: vn ++ vm (simplifying a bit here). This can be type checked, and will have type Vect (n+m) String.
Now it is true that the type checker runs at compile time, before the values n,m are known, and also before vn,vm are known. But this does not matter: we can still reason symbolically on the unknowns n,m and argue that vn ++ vm has that type, involving n+m, even if we do not yet know what n+m actually is.
It is not that different from doing math, where we manipulate symbolic expressions involving unknown variables according to some rules, even if we do not know the values of the variables. We don't need to know what number n is to see that n+n = 2*n.
Similarly, the type checker can type check
-- pseudocode
readNStrings :: (n :: Int) -> IO (Vect n String)
readNStrings O = return Vect.empty
readNStrings (S p) = do
  s <- getLine
  vp <- readNStrings p
  return (Vect.cons s vp)
(Well, actually some more help from the programmer could be needed to typecheck this, since it involves dependent matching and recursion. But I'll neglect this.)
Importantly, the type checker can check that without knowing what n is.
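For comparison, the readNStrings pseudocode can be written in today's GHC Haskell by faking dependent types with DataKinds, GADTs and a singleton type. This is a sketch under that one encoding, not the only one; Nat, SNat, Vect, Plus, SomeNat and the helper functions are names chosen for the example. append is checked against Plus n m purely symbolically, and toSNat turns a runtime Int into some SNat n whose n the type checker never learns.

```haskell
{-# LANGUAGE DataKinds, GADTs, KindSignatures, TypeFamilies #-}
import Data.Kind (Type)

data Nat = Z | S Nat

-- Singleton: a runtime value whose type index mirrors the number it holds.
data SNat (n :: Nat) where
  SZ :: SNat 'Z
  SS :: SNat n -> SNat ('S n)

-- Length-indexed vectors.
data Vect (n :: Nat) (a :: Type) where
  VNil  :: Vect 'Z a
  VCons :: a -> Vect n a -> Vect ('S n) a

-- Type-level addition, used to state the length of an appended vector.
type family Plus (n :: Nat) (m :: Nat) :: Nat where
  Plus 'Z m = m
  Plus ('S n) m = 'S (Plus n m)

-- The readNStrings pseudocode, with the length passed as a singleton.
readNStrings :: SNat n -> IO (Vect n String)
readNStrings SZ = return VNil
readNStrings (SS p) = do
  s <- getLine
  vp <- readNStrings p
  return (VCons s vp)

-- Type checked once, symbolically in n and m.
append :: Vect n a -> Vect m a -> Vect (Plus n m) a
append VNil ys = ys
append (VCons x xs) ys = VCons x (append xs ys)

vreplicate :: SNat n -> a -> Vect n a
vreplicate SZ _ = VNil
vreplicate (SS p) x = VCons x (vreplicate p x)

toList :: Vect n a -> [a]
toList VNil = []
toList (VCons x xs) = x : toList xs

-- A runtime Int becomes "some n"; which n is hidden from the type checker.
data SomeNat where
  SomeNat :: SNat n -> SomeNat

toSNat :: Int -> SomeNat
toSNat k
  | k <= 0 = SomeNat SZ
  | otherwise = case toSNat (k - 1) of
      SomeNat p -> SomeNat (SS p)

main :: IO ()
main =
  -- With real input one would write:
  --   n <- readLn; case toSNat n of SomeNat sn -> readNStrings sn >>= print . toList
  case toSNat 3 of
    SomeNat sn -> print (toList (append (vreplicate sn "x") (VCons "y" VNil)))
```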
Note that the same issue actually already arises with polymorphic functions.
fst :: forall a b. (a, b) -> a
fst (x, y) = x
test1 = fst # Int # Float (2, 3.5)
test2 = fst # String # Bool ("hi!", True)
...
One might wonder "how can the typechecker check fst without knowing what types a and b will be at runtime?". Again, by reasoning symbolically.
With type arguments this is arguably more obvious since we usually run the programs after type erasure, unlike value parameters like our n :: Int above, which can not be erased. Still, there is some similarity between universally quantifying over types or over Int.
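GHC's TypeApplications extension lets you pass the type arguments of the fst example explicitly, writing @ where the pseudocode above uses #. A minimal sketch (Prelude's fst is hidden so the local definition can reuse the name):

```haskell
{-# LANGUAGE ExplicitForAll, TypeApplications #-}
import Prelude hiding (fst)

-- Checked once, symbolically in a and b; instantiated at each call site.
fst :: forall a b. (a, b) -> a
fst (x, _) = x

test1 :: Int
test1 = fst @Int @Float (2, 3.5)

test2 :: String
test2 = fst @String @Bool ("hi!", True)

main :: IO ()
main = print test1 >> putStrLn test2
```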
It seems to me that there are two questions here:
Given that some values are unknown during compile-time (e.g., values read from STDIN), how can we make use of them in types? (Note that chi has already given an excellent answer to this.)
Some operations (e.g., getLine) seem to make absolutely no sense at compile-time; how could we possibly talk about them in types?
The answer to (1), as chi has said, is symbolic or abstract reasoning. You can read in a number n, and then have a procedure that builds a Vect n Nat by reading from the command line n times, making use of arithmetic properties such as the fact that 1+(n-1) = n for nonzero natural numbers.
The answer to (2) is a bit more subtle. Naively, you might want to say "this function returns a vector of length n, where n is read from the command line". There are two types you might try to give this (apologies if I'm getting Haskell notation wrong)
unsafePerformIO (do n <- getLine; return (IO (Vect (read n :: Int) Nat)))
or (in pseudo-Coq notation, since I'm not sure what Haskell's notation for existential types is)
IO (exists n, Vect n Nat)
These two types can actually both be made sense of, and say different things. The first type, to me, says "at compile time, read n from the command line, and return a function which, at runtime, gives a vector of length n by performing IO". The second type says "at runtime, perform IO to get a natural number n and a vector of length n".
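In GHC Haskell the second type can be written with an existential wrapper, i.e. a GADT constructor whose result type does not mention n. A minimal sketch, with Vect, SomeVect, fromList, vlength and readVect as illustrative names; readVect reads a count and then that many numbers from standard input, so main does not call it.

```haskell
{-# LANGUAGE DataKinds, GADTs, KindSignatures #-}
import Control.Monad (replicateM)
import Data.Kind (Type)

data Nat = Z | S Nat

data Vect (n :: Nat) (a :: Type) where
  VNil  :: Vect 'Z a
  VCons :: a -> Vect n a -> Vect ('S n) a

-- "exists n, Vect n a": the constructor hides the length index n.
data SomeVect (a :: Type) where
  SomeVect :: Vect n a -> SomeVect a

-- Pack a list; its length becomes the hidden index.
fromList :: [a] -> SomeVect a
fromList [] = SomeVect VNil
fromList (x : xs) = case fromList xs of
  SomeVect v -> SomeVect (VCons x v)

vlength :: Vect n a -> Int
vlength VNil = 0
vlength (VCons _ v) = 1 + vlength v

-- Plays the role of IO (exists n, Vect n Nat): all IO happens at runtime,
-- and callers learn about n only by pattern matching on SomeVect.
readVect :: IO (SomeVect Integer)
readVect = do
  n <- readLn
  xs <- replicateM n readLn
  return (fromList xs)

main :: IO ()
main = case fromList [10, 20, 30 :: Integer] of
  SomeVect v -> print (vlength v)
```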
The way I like looking at this is that all side effects (other than, perhaps, non-termination) are monad transformers, and there is only one monad: the "real-world" monad. Monad transformers work just as well at the type level as at the term level; the one thing which is special is run :: M a -> a which executes the monad (or stack of monad transformers) in the "real world". There are two points in time at which you can invoke run: one is at compile time, where you invoke any instance of run which shows up at the type level. Another is at runtime, where you invoke any instance of run which shows up at the value level. Note that run only makes sense if you specify an evaluation order; if your language does not specify whether it is call-by-value or call-by-name (or call-by-push-value or call-by-need or call-by-something-else), you can get incoherencies when you try to compute a type.

Do newtypes incur no cost even when you cannot pattern-match on them?

Context
Most Haskell tutorials I know (e.g. LYAH) introduce newtypes as a cost-free idiom that allows enforcing more type safety. For instance, this code will type-check:
type Speed = Double
type Length = Double
computeTime :: Speed -> Length -> Double
computeTime v l = l / v
but this won't:
newtype Speed = Speed { getSpeed :: Double }
newtype Length = Length { getLength :: Double }
-- wrong!
computeTime :: Speed -> Length -> Double
computeTime v l = l / v
and this will:
-- right
computeTime :: Speed -> Length -> Double
computeTime (Speed v) (Length l) = l / v
In this particular example, the compiler knows that Speed is just a Double, so the pattern-matching is moot and will not generate any executable code.
Question
Are newtypes still cost-free when they appear as arguments of parametric types? For instance, consider a list of newtypes:
computeTimes :: [Speed] -> Length -> [Double]
computeTimes vs l = map (\v -> getLength l / getSpeed v) vs
I could also pattern-match on speed in the lambda:
computeTimes' :: [Speed] -> Length -> [Double]
computeTimes' vs (Length l) = map (\(Speed v) -> l / v) vs
In either case, for some reason, I feel that real work is getting done! I start to feel even more uncomfortable when the newtype is buried within a deep tree of nested parametric datatypes, e.g. Map Speed [Set Speed]; in this situation, it may be difficult or impossible to pattern-match on the newtype, and one would have to resort to accessors like getSpeed.
TL;DR
Will the use of a newtype never ever incur a cost, even when the newtype appears as a (possibly deeply-buried) argument of another parametric type?
On their own, newtypes are cost-free. Applying their constructor or pattern matching on them has zero cost.
When used as parameter for other types e.g. [T] the representation of [T] is precisely the same as the one for [T'] if T is a newtype for T'. So, there's no loss in performance.
However, there are two main caveats I can see.
newtypes and instances
First, newtype is frequently used to introduce new instances of type classes. Clearly, when these are user-defined, there's no guarantee that they have the same cost as the original instances. E.g., when using
newtype Op a = Op a
instance Ord a => Ord (Op a) where
  compare (Op x) (Op y) = compare y x
comparing two Op Int will cost slightly more than comparing Int, since the arguments need to be swapped. (I am neglecting optimizations here, which might make this cost free when they trigger.)
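For what it's worth, base already ships a newtype with this reversed instance: Data.Ord.Down. A small sketch (descending is an illustrative name):

```haskell
import Data.List (sort, sortOn)
import Data.Ord (Down (..))

-- Down's Ord instance reverses the comparison, like Op above.
descending :: [Int] -> [Int]
descending = sortOn Down

main :: IO ()
main = do
  print (descending [3, 1, 2])
  print (compare (Down 1) (Down (2 :: Int)))
  -- Wrapping, sorting and unwrapping gives the same descending order.
  print (map (\(Down x) -> x) (sort (map Down [3, 1, 2 :: Int])))
```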
newtypes used as type arguments
The second point is more subtle. Consider the following two implementations of the identity function on [Int]:
id1, id2 :: [Int] -> [Int]
id1 xs = xs
id2 xs = map (\x->x) xs
The first one has constant cost. The second has a linear cost (assuming no optimization triggers). A smart programmer should prefer the first implementation, which is also simpler to write.
Suppose now we introduce newtypes on the argument type, only:
id1, id2 :: [Op Int] -> [Int]
id1 xs = xs -- error!
id2 xs = map (\(Op x)->x) xs
We can no longer use the constant cost implementation because of a type error. The linear cost implementation still works, and is the only option.
Now, this is quite bad. The input representation for [Op Int] is exactly, bit by bit, the same for [Int]. Yet, the type system forbids us to perform the identity in an efficient way!
To overcome this issue, safe coercions were introduced in GHC.
id3 :: [Op Int] -> [Int]
id3 = coerce
The magic coerce function, under certain hypotheses, removes or inserts newtypes as needed to make types match, even inside other types, as for [Op Int] above. Further, it is a zero-cost function.
Note that coerce works only under certain conditions (the compiler checks for them). One of these is that the newtype constructor must be visible: if a module does not export Op :: a -> Op a you can not coerce Op Int to Int or vice versa. Indeed, if a module exports the type but not the constructor, it would be wrong to make the constructor accessible anyway through coerce. This makes the "smart constructors" idiom still safe: modules can still enforce complex invariants through opaque types.
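A small sketch of coerce applied to the question's Speed newtype, assuming its constructor is in scope (speedsToDoubles and mapToDoubles are illustrative names). Coercing under Data.Map works for the value type, whose role is representational; the key type of Map has a nominal role, so a Map Speed v cannot be coerced to a Map Double v this way.

```haskell
import Data.Coerce (coerce)
import qualified Data.Map as Map

newtype Speed = Speed { getSpeed :: Double } deriving Show

-- No traversal: the same list is reused with a different type.
speedsToDoubles :: [Speed] -> [Double]
speedsToDoubles = coerce

-- Also works through several layers, here Map values containing lists.
mapToDoubles :: Map.Map Int [Speed] -> Map.Map Int [Double]
mapToDoubles = coerce

main :: IO ()
main = do
  print (speedsToDoubles [Speed 1.5, Speed 2])
  print (mapToDoubles (Map.fromList [(1, [Speed 3])]))
```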
It doesn't matter how deeply buried a newtype is in a stack of (fully) parametric types. At runtime, the values v :: Speed and w :: Double are completely indistinguishable – the wrapper is erased by the compiler, so even v is really just a pointer to a single 64-bit floating-point number in memory. Whether that pointer is stored in a list or tree or whatever doesn't make a difference either. getSpeed is a no-op and will not appear at runtime in any way at all.
So what do I mean by “fully parametric”? The thing is, newtypes can obviously make a difference at compile time, via the type system. In particular, they can guide instance resolution, so a newtype that invokes a different class method may certainly have worse (or, just as easily, better!) performance than the wrapped type. For example,
class Integral n => Fibonacci n where
  fib :: n -> Integer
instance Fibonacci Int where
  fib = (fibs !!)
    where fibs = [ if i<2 then 1
                   else fib (i-2) + fib (i-1)
                 | i<-[0::Int ..] ]
this implementation is pretty slow, because it uses a lazy list (and performs lookups in it over and over again) for memoisation. On the other hand,
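{-# LANGUAGE GeneralizedNewtypeDeriving #-}
import qualified Data.Vector as Arr
-- | A number between 0 and 753
-- (the Integral superclass of Fibonacci is derived through the newtype)
newtype SmallInt = SmallInt { getSmallInt :: Int }
  deriving (Eq, Ord, Enum, Num, Real, Integral)
instance Fibonacci SmallInt where
  fib = (fibs Arr.!) . getSmallInt
    where fibs = Arr.generate 754 $
            \i -> if i<2 then 1
                  else fib (SmallInt $ i-2) + fib (SmallInt $ i-1)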
This fib is much faster: because the input is limited to a small range, it is feasible to allocate the whole table of results up front and store it in a fast O(1) lookup array, without needing the spine-laziness.
This of course applies again regardless of what structure you store the numbers in. But the different performance only comes about because different method instantiations are called – at runtime this means simply, completely different functions.
Now, a fully parametric type constructor must be able to store values of any type. In particular, it cannot impose any class restrictions on the contained data, and hence also not call any class methods. Therefore this kind of performance difference cannot happen if you're just dealing with generic [a] lists or Map Int a maps. It can, however, occur when you're dealing with GADTs. In this case, even the actual memory layout might be completely different, for instance with
{-# LANGUAGE GADTs #-}
import qualified Data.Vector as Arr
import qualified Data.Vector.Unboxed as UArr
data Array a where
BoxedArray :: Arr.Vector a -> Array a
UnboxArray :: UArr.Unbox a => UArr.Vector a -> Array a
might allow you to store Double values more efficiently than Speed values, because the former can be stored in a cache-optimised unboxed array. This is only possible because the UnboxArray constructor is not fully parametric.

Can pseq be defined in terms of seq?

As far as I know, seq a b evaluates (forces) a and b before returning b. It does not guarantee that a is evaluated first.
pseq a b evaluates a first, then evaluates/returns b.
Now consider the following:
xseq a b = (seq a id) b
Function application needs to evaluate the left operand first (to get a lambda form), and it can't blindly evaluate the right operand before entering the function because that would violate Haskell's non-strict semantics.
Therefore (seq a id) b must evaluate seq a id first, which forces a and id (in some unspecified order (but evaluating id does nothing)), then returns id b (which is b); therefore xseq a b evaluates a before b.
Is xseq a valid implementation of pseq? If not, what's wrong with the above argument (and is it possible to define pseq in terms of seq at all)?
The answer seems to be "no, at least not without additional magic".
The problem with
xseq a b = (seq a id) b
is that the compiler can see that the result of seq a id is id, which is strict everywhere. Function application is allowed to evaluate the argument first if the function is strict, because then doing so does not change the semantics of the expression. Therefore an optimizing compiler could start evaluating b first because it knows it will eventually need it.
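The classic place where this ordering matters is the par/pseq idiom. A minimal sketch, assuming GHC's base, where GHC.Conc exports both par and pseq (nfib is an illustrative name). The spark for a only pays off if the current thread evaluates b before demanding a + b; pseq guarantees that order, whereas seq, as discussed above, does not.

```haskell
import GHC.Conc (par, pseq)

-- Spark a, evaluate b on this thread, then combine.
nfib :: Int -> Integer
nfib n
  | n < 2 = 1
  | otherwise = a `par` (b `pseq` (a + b))
  where
    a = nfib (n - 1)
    b = nfib (n - 2)

main :: IO ()
main = print (nfib 20)
```

Compiling with -threaded and running with +RTS -N is needed for the sparks to actually run in parallel; the result is the same either way.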
Can pseq be defined in terms of seq?
In GHC - yes.
As noted by Alec, you'll also need the mirror-smoke lazy:
-- for GHC 8.6.5
import Prelude(seq)
import GHC.Base(lazy)
infixr 0 `pseq`
pseq :: a -> b -> b
pseq x y = x `seq` lazy y
the definition matching its counterpart in the GHC sources; the imports are very different.
For other Haskell implementations, this may work:
import Prelude(seq)
infixr 0 `pseq`
pseq :: a -> b -> b
pseq x y = x `seq` (case x of _ -> y)
possibly in conjunction with - at the very least - the equivalent of:
-- for GHC 8.6.5
{-# NOINLINE pseq #-}
I'll let melpomene decide if that also qualifies as mirror-smoke...

Memoizing multiplication

My application multiplies vectors after a (costly) conversion using an FFT. As a result, when I write
f :: (Num a) => a -> [a] -> [a]
f c xs = map (c*) xs
I only want to compute the FFT of c once, rather than for every element of xs. There really isn't any need to store the FFT of c for the entire program, just in the local scope.
I attempted to define my Num instance like:
data Foo = Scalar c
| Vec Bool v -- the bool indicates which domain v is in
instance Num Foo where
  (*) (Scalar c) = \x -> case x of
    Scalar d -> Scalar (c*d)
    Vec b v -> Vec b $ map (c*) v
  (*) v1 = let Vec True v = fft v1
           in \x -> case x of
                Scalar d -> Vec True $ map (c*) v
                v2 -> Vec True $ zipWith (*) v (fft v2)
Then, in an application, I call a function similar to f (which works on arbitrary Nums) where c=Vec False v, and I expected that this would be just as fast as if I hack f to:
g :: Foo -> [Foo] -> [Foo]
g c xs = let c' = fft c
in map (c'*) xs
The function g makes the memoization of fft c occur, and is much faster than calling f (no matter how I define (*)). I don't understand what is going wrong with f. Is it my definition of (*) in the Num instance? Does it have something to do with f working over all Nums, and GHC therefore being unable to figure out how to partially compute (*)?
Note: I checked the core output for my Num instance, and (*) is indeed represented as nested lambdas with the FFT conversion in the top level lambda. So it looks like this is at least capable of being memoized. I have also tried both judicious and reckless use of bang patterns to attempt to force evaluation to no effect.
As a side note, even if I can figure out how to make (*) memoize its first argument, there is still another problem with how it is defined: A programmer wanting to use the Foo data type has to know about this memoization capability. If she wrote
map (*c) xs
no memoization would occur. (It must be written as (map (c*) xs)) Now that I think about it, I'm not entirely sure how GHC would rewrite the (*c) version since I have curried (*). But I did a quick test to verify that both (*c) and (c*) work as expected: (c*) makes c the first arg to *, while (*c) makes c the second arg to *. So the problem is that it is not obvious how one should write the multiplication to ensure memoization. Is this just an inherent downside to the infix notation (and the implicit assumption that the arguments to * are symmetric)?
The second, less pressing issue is the case where we map (v*) onto a list of scalars. In this case, (hopefully) the fft of v would be computed and stored, even though this is unnecessary since the other multiplicand is a scalar. Is there any way around this?
I believe the stable-memo package could solve your problem. It memoizes values not by equality but by reference identity:
Whereas most memo combinators memoize based on equality, stable-memo does it based on whether the exact same argument has been passed to the function before (that is, is the same argument in memory).
And it automatically drops memoized values when their keys are garbage collected:
stable-memo doesn't retain the keys it has seen so far, which allows them to be garbage collected if they will no longer be used. Finalizers are put in place to remove the corresponding entries from the memo table if this happens.
So if you define something like
fft = memo fft'
where fft' = ... -- your old definition
you'll get pretty much what you need: Calling map (c *) xs will memoize the computation of fft inside the first call to (*) and it gets reused on subsequent calls to (c *). And if c is garbage collected, so is fft' c.
See also this answer to How to add fields that only cache something to ADT?
I can see two problems that might prevent memoization:
First, f has an overloaded type and works for all Num instances. So f cannot use memoization unless it is either specialized (which usually requires a SPECIALIZE pragma) or inlined (which may happen automatically, but is more reliable with an INLINE pragma).
Second, the definition of (*) for Foo performs pattern matching on the first argument, but f multiplies with an unknown c. So within f, even if specialized, no memoization can occur. Once again, it very much depends on f being inlined, and a concrete argument for c to be supplied, so that inlining can actually appear.
So I think it'd help to see how exactly you're calling f. Note that if f is defined using two arguments, it has to be given two arguments, otherwise it cannot be inlined. It would furthermore help to see the actual definition of Foo, as the one you are giving mentions c and v which aren't in scope.
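One way to make the memoization independent of inlining and specialisation is to make the costly conversion an explicit, let-bound step, as g already does. A sketch under stated assumptions: Foo is cut down to concrete Double lists so it compiles, fftPlaceholder is a hypothetical stand-in for the real FFT (it is just id here), and Prepared, prepare, mulPrepared and scaleAll are illustrative names. Keeping the original data next to a lazily converted copy also touches on the second issue: the conversion is only forced if some element of xs is actually a vector.

```haskell
-- Cut-down stand-in for the question's Foo (True = frequency domain).
data Foo = Scalar Double | Vec Bool [Double]
  deriving (Show, Eq)

-- Hypothetical placeholder for the expensive FFT; NOT a real transform.
fftPlaceholder :: [Double] -> [Double]
fftPlaceholder = id

toFreq :: Bool -> [Double] -> [Double]
toFreq True v = v
toFreq False v = fftPlaceholder v

-- A left operand prepared once: original data plus a lazily converted copy.
data Prepared
  = PScalar Double
  | PVec Bool [Double] [Double]

prepare :: Foo -> Prepared
prepare (Scalar c) = PScalar c
prepare (Vec b v) = PVec b v (toFreq b v)  -- third field is a shared thunk

mulPrepared :: Prepared -> Foo -> Foo
mulPrepared (PScalar c) (Scalar d) = Scalar (c * d)
mulPrepared (PScalar c) (Vec b v) = Vec b (map (c *) v)
mulPrepared (PVec b v _) (Scalar d) = Vec b (map (d *) v)  -- no conversion
mulPrepared (PVec _ _ freq) (Vec b w) = Vec True (zipWith (*) freq (toFreq b w))

-- prepare c is evaluated at most once because p is an ordinary let binding.
scaleAll :: Foo -> [Foo] -> [Foo]
scaleAll c xs = let p = prepare c in map (mulPrepared p) xs

main :: IO ()
main = print (scaleAll (Vec False [1, 2]) [Scalar 3, Vec False [3, 4]])
```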

Resources