I'm concerned with whether, and when, a polymorphic "global" class value is shared (memoized), particularly across module boundaries. I have read two related questions, but they don't quite seem to reflect my situation, and I'm seeing behavior that differs from what one might expect from their answers.
Consider a class that exposes a value that can be expensive to compute:
{-# LANGUAGE FlexibleInstances, UndecidableInstances #-}
module A where
import Debug.Trace
class Costly a where
  costly :: a
instance Num i => Costly i where
  -- an expensive (but non-recursive) computation
  costly = trace "costly!" $ (repeat 1) !! 10000000
foo :: Int
foo = costly + 1
costlyInt :: Int
costlyInt = costly
And a separate module:
module B where
import A
bar :: Int
bar = costly + 2
-- module B defines main, so build it with ghc -main-is B (or rename it to Main)
main = do
  print foo
  print bar
  print costlyInt
  print costlyInt
Running main yields two separate evaluations of costly (as indicated by the trace): one for foo, and one for bar. I know that costlyInt just returns the (evaluated) costly from foo, because if I remove print foo from main then the first costlyInt becomes costly. (I can also cause costlyInt to perform a separate evaluation no matter what, by generalizing the type of foo to Num a => a.)
I think I know why this behavior happens: the instance of Costly is effectively a function that takes a Num dictionary and generates a Costly dictionary. So when compiling bar and resolving the reference to costly, ghc generates a fresh Costly dictionary, which has an expensive thunk in it. Question 1: am I correct about this?
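To make that explanation concrete, here is a small hand-written model of dictionary passing. NumDict, CostlyDict, costlyDictFromNum, fooModel and barModel are hypothetical names, not what GHC actually generates; the point is only that a function from a Num dictionary to a Costly dictionary builds a fresh, unevaluated thunk every time it is applied.

```haskell
import Debug.Trace (trace)

-- Hypothetical model of a Num dictionary: only the method this example needs.
data NumDict a = NumDict { dictFromInteger :: Integer -> a }

-- Hypothetical model of a Costly dictionary: one lazy field for the class value.
data CostlyDict a = CostlyDict { dictCostly :: a }

-- Models "instance Num i => Costly i": a function from a Num dictionary to a
-- Costly dictionary. Each application builds a new record whose field is a
-- new thunk, so nothing is shared between separate applications.
costlyDictFromNum :: NumDict a -> CostlyDict a
costlyDictFromNum nd =
  -- repeat ... !! 1000000 stands in for an expensive computation
  CostlyDict (trace "costly!" (repeat (dictFromInteger nd 1) !! 1000000))

numDictInt :: NumDict Int
numDictInt = NumDict fromInteger

-- Like foo in module A and bar in module B: each use site applies the
-- "instance function" on its own.
fooModel :: Int
fooModel = dictCostly (costlyDictFromNum numDictInt) + 1

barModel :: Int
barModel = dictCostly (costlyDictFromNum numDictInt) + 2

main :: IO ()
main = do
  print fooModel -- evaluating this traces "costly!"
  print barModel -- at -O0 this traces again; with -O GHC may common up the calls
```

In this model, sharing only happens if both use sites end up referring to the same CostlyDict value, which is roughly what a specialized, top-level Costly Int dictionary would provide.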
There are a few ways to cause just one evaluation of costly, including:
Put everything in one module.
Remove the Num i instance constraint and just define a Costly Int instance.
Unfortunately, the analogs of these solutions are not feasible in my program -- I have several modules that use the class value in its polymorphic form, and only in the top-level source file are concrete types finally used.
There are also changes that don't reduce the number of evaluations, such as:
Using INLINE, INLINABLE, or NOINLINE on the costly definition in the instance. (I didn't expect this to work, but hey, worth a shot.)
Using a SPECIALIZE instance Costly Int pragma in the instance definition.
The latter is surprising to me -- I'd expected it to be essentially equivalent to the second item above that did work. That is, I thought it would generate a special Costly Int dictionary, which all of foo, bar, and costlyInt would share. My question 2: what am I missing here?
My final question: is there any relatively simple and foolproof way to get what I want, i.e., all references to costly of a particular concrete type being shared across modules? From what I've seen so far, I suspect the answer is no, but I'm still holding out hope.
Controlling sharing is tricky in GHC. There are many optimizations that GHC does which can affect sharing (such as inlining, floating things out, etc).
In this case, to answer the question why the SPECIALIZE pragma did not achieve the intended effect, let's look at the Core of the B module, in particular of the bar function:
Rec {
bar_xs
bar_xs = : x1_r3lO bar_xs
end Rec }
bar1 = $w!! bar_xs 10000000
-- ^^^ this repeats the computation. bar_xs is just repeat 1
bar =
  case trace $fCostlyi2 bar1 of _ { I# x_aDm -> I# (+# x_aDm 2) }
-- ^^^ $fCostlyi2 is just the "costly!" string
That didn't work as we wanted: instead of reusing costly, GHC decided to just inline the definition of costly at the use site.
So we have to prevent GHC from inlining costly, or the computation will be duplicated. How do we do that? You might think adding a {-# NOINLINE costly #-} pragma would be enough, but unfortunately specialization and NOINLINE don't seem to work together well:
A.hs:13:3: Warning:
Ignoring useless SPECIALISE pragma for NOINLINE function: ‘$ccostly’
But there is a trick to convince GHC to do what we want: we can write costly in the following way:
instance Num i => Costly i where
  -- an expensive (but non-recursive) computation
  costly = memo where
    memo :: i
    memo = trace "costly!" $ (repeat 1) !! 10000000
    {-# NOINLINE memo #-}
  {-# SPECIALIZE instance Costly Int #-}
  -- (the memo :: i signature requires -XScopedTypeVariables)
This allows us to specialize costly while simultaneously avoiding the inlining of our computation.
I understand that newtype erases the type constructor at compile time as an optimization, so that newtype Foo = Foo Int results in just an Int. In other words, I am not asking this question. My question is not about what newtype does.
Instead, I'm trying to understand why the compiler can't simply apply this optimization itself when it sees a single-value data constructor. When I use hlint, it's smart enough to tell me that a single-value data constructor should be a newtype. (I never make this mistake, but tried it out to see what would happen. My suspicions were confirmed.)
One objection could be that without newtype, we couldn't use GeneralizedNewtypeDeriving and other such extensions. But that's easily solved. If we say…
data Foo m a b = Foo a (m b) deriving (Functor, Applicative, Monad)
The compiler can just barf and tell us of our folly.
Why do we need newtype when the compiler can always figure it out for itself?
It seems plausible that newtype started out mostly as a programmer-supplied annotation to perform an optimization that compilers were too stupid to figure out on their own, sort of like the register keyword in C.
However, in Haskell, newtype isn't just an advisory annotation for the compiler; it actually has semantic consequences. The types:
newtype Foo = Foo Int
data Bar = Bar Int
declare two non-isomorphic types. Specifically, Foo undefined and undefined :: Foo are equivalent while Bar undefined and undefined :: Bar are not, with the result that:
Foo undefined `seq` "not okay" -- is an exception
Bar undefined `seq` "okay" -- is "okay"
and
case undefined of Foo n -> "okay" -- is okay
case undefined of Bar n -> "not okay" -- is an exception
As others have noted, if you make the data field strict:
data Baz = Baz !Int
and take care to only use irrefutable pattern matches, then Baz acts just like the newtype Foo:
Baz undefined `seq` "not okay" -- exception, like Foo
case undefined of ~(Baz n) -> "okay" -- is "okay", like Foo
In other words, if my grandmother had wheels, she'd be a bike!
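The behaviours listed above can be checked with a small program. It is a sketch: outcome, matchFoo, matchBar, matchBazLazy and cases are names introduced here for illustration, and it relies on undefined throwing an ErrorCall, which Control.Exception.try can catch after evaluate forces the result.

```haskell
import Control.Exception (ErrorCall, evaluate, try)
import Control.Monad (forM_)

newtype Foo = Foo Int
data Bar = Bar Int
data Baz = Baz !Int

-- The pattern matches from the answer, wrapped as functions.
matchFoo :: Foo -> String
matchFoo x = case x of Foo _ -> "okay"

matchBar :: Bar -> String
matchBar x = case x of Bar _ -> "okay"

matchBazLazy :: Baz -> String
matchBazLazy x = case x of ~(Baz _) -> "okay"

-- Each expression from the answer, with the result the answer describes.
cases :: [(String, String)]
cases =
  [ ("Foo undefined `seq` ...", Foo undefined `seq` "okay") -- exception
  , ("Bar undefined `seq` ...", Bar undefined `seq` "okay") -- okay
  , ("Baz undefined `seq` ...", Baz undefined `seq` "okay") -- exception
  , ("case undefined of Foo _", matchFoo undefined)         -- okay
  , ("case undefined of Bar _", matchBar undefined)         -- exception
  , ("case undefined of ~(Baz _)", matchBazLazy undefined)  -- okay
  ]

-- Force a String to weak head normal form and report "exception" if that
-- throws an ErrorCall (which is what undefined raises).
outcome :: String -> IO String
outcome s = do
  r <- try (evaluate s) :: IO (Either ErrorCall String)
  pure (either (const "exception") id r)

main :: IO ()
main = forM_ cases $ \(label, s) -> do
  o <- outcome s
  putStrLn (label ++ " -> " ++ o)
```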
So, why can't the compiler simply apply this optimization itself when it sees a single-value data constructor? Well, it can't perform this optimization in general without changing the semantics of a program, so it needs to first prove that the semantics are unchanged if a particular arbitrary, one-constructor, one-field data type is made strict in its field and matched irrefutably instead of strictly. Since this depends on how values of the type are actually used, this can be hard to do for data types exported by a module, especially at function call boundaries, but the existing optimization mechanisms for specialization, inlining, strictness analysis, and unboxing often perform equivalent optimizations in chunks of self-contained code, so you may get the benefits of a newtype even when you use a data type by accident. In general, though, it seems to be too hard a problem for the compiler to solve, so the burden of remembering to newtype things is left on the programmer.
This leads to the obvious question -- why can't we change the semantics so they're equivalent; why are the semantics of newtype and data different in the first place?
Well, the reason for the newtype semantics seems pretty obvious. As a result of the nature of the newtype optimization (erasure of the type and constructor at compile time), it becomes impossible, or at the very least exceedingly difficult, to separately represent Foo undefined and undefined :: Foo in compiled code, which explains the equivalence of these two values. Consequently, irrefutable matching is an obvious further optimization when there's only one possible constructor and there's no possibility that that constructor isn't present (or at least no possibility of distinguishing between presence and absence of the constructor, because the only case where this could happen is in distinguishing between Foo undefined and undefined :: Foo, which we've already said can't be distinguished in compiled code).
The reason for the semantics of a one-constructor, one-field data type (in the absence of strictness annotations and irrefutable matches) is maybe less obvious. However, these semantics are entirely consistent with data types having constructor and/or field counts other than one, while the newtype semantics would introduce an arbitrary inconsistency between this one special case of a data type and all others.
Because of this historical distinction between data and newtype types, a number of subsequent extensions have treated them differently, further entrenching the different semantics. You mention GeneralizedNewtypeDeriving, which works on newtypes but not on one-constructor, one-field data types. There are further differences in how representational equality is computed for safe coercions (i.e., Data.Coerce) and DerivingVia, in the use of existential quantification or more general GADTs, in the UNPACK pragma, etc. There are also some differences in the way types are represented in generics, though now that I look at them more carefully, they seem pretty superficial.
Even if newtypes were an unnecessary historical mistake that could have been replaced by special-casing certain data types, it's a little late to put the genie back in the bottle.
Besides, newtypes don't really strike me as unnecessary duplication of an existing facility. To me, data and newtype types are conceptually quite different. A data type is an algebraic, sum-of-products type, and it's just coincidence that a particular special case of algebraic types happens to have one constructor and one field and so ends up being (nearly) isomorphic to the field type. In contrast, a newtype is intended from the start to be an isomorphism of an existing type, basically a type alias with an extra wrapper to distinguish it at the type level and allow us to pass around a separate type constructor, attach instances, and so on.
This is an excellent question. Semantically,
newtype Foo = Foo Int
is identical to
data Foo' = Foo' !Int
except that pattern matching on the former is lazy and on the latter is strict. So a compiler certainly could compile them the same, and adjust the compilation of pattern matching to keep the semantics right.
For a type like you've described, that optimization isn't really all that critical in practice, because users can just use newtype and sprinkle in seqs or bang patterns as needed. Where it would get a lot more useful is for existentially quantified types and GADTs. That is, we'd like to get the more compact representation for types like
data Baz a b where
  Baz :: !a -> Baz a Bool
data Quux where
  Quux :: !a -> Quux
But GHC doesn't currently offer any such optimization, and doing so would be somewhat trickier in these contexts.
Why do we need newtype when the compiler can always figure it out for itself?
It can’t. data and newtype have different semantics: data adds an additional level of indirection, while newtype has exactly the same representation as its wrapped type, and always uses lazy pattern matching, while you choose whether to make data lazy or strict with strictness annotation (! or pragmas like StrictData).
Likewise, a compiler doesn’t always know for certain when data can be replaced with newtype. Strictness analysis allows it to conservatively determine when it may remove unnecessary laziness around things that will always be evaluated; in this case it can effectively remove the data wrapper locally. GHC does something similar when removing extra boxing & unboxing in a chain of operations on a boxed numeric type like Int, so it can do most of the calculations on the more efficient unboxed Int#. But in general (that is, without global optimisation) it can’t know whether some code is relying on that thunk’s being there.
So HLint offers this as a suggestion because usually you don’t need the “extra” wrapper at runtime, but other times it’s essential. The advice is just that: advice.
GHC has unboxed types for Int, Float, etc.
I know that code built on them runs with less overhead,
but I don't see how to pass data into and out of a function based on unboxed Int.
GHC.Exts defines functions such as (+#) and (*#), but I cannot find boxing/unboxing functions, i.e. something like:
readInt:: String -> Int#
showInt:: Int# -> String
boxInt :: Int# -> Int
unboxInt :: Int -> Int#
instance Show Int# and instance Read Int# cannot exist because show and read are polymorphic.
Without these functions, how can I integrate an optimized code block on unboxed types with the rest of the application?
Int, Float, etc. are just data types in GHC:
data Int = I# Int#
data Float = F# Float#
-- etc.
The constructors are only exported by GHC.Exts. Import it and use the constructors to convert:
{-# LANGUAGE MagicHash #-}
import GHC.Exts
main = do I# x <- readLn
          I# y <- readLn
          print (I# (x +# y))
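Using the same constructor, the four helpers the question asks for can be written directly. This is a minimal sketch; boxInt, unboxInt, readInt, showInt and sumSquares are illustrative names, not functions exported by GHC.Exts.

```haskell
{-# LANGUAGE MagicHash #-}
import GHC.Exts (Int (I#), Int#, (+#), (*#))

-- Wrap an unboxed Int# in the ordinary boxed Int constructor.
boxInt :: Int# -> Int
boxInt i = I# i

-- Pattern-match on the constructor to get the Int# back out.
unboxInt :: Int -> Int#
unboxInt (I# i) = i

-- Read and show go through the boxed Int, which has Read/Show instances.
readInt :: String -> Int#
readInt s = unboxInt (read s)

showInt :: Int# -> String
showInt i = show (boxInt i)

-- An "optimized block" written on Int#; callers only ever see Int.
sumSquares :: Int -> Int -> Int
sumSquares (I# x) (I# y) = I# ((x *# x) +# (y *# y))

main :: IO ()
main = do
  putStrLn (showInt (readInt "41" +# 1#))
  print (sumSquares 3 4)
```

The boxing and unboxing happen only at the edges (the I# patterns and applications), so the rest of the application keeps working with ordinary Int values.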
I know about code built on them is running with less overhead
Though this is true in a sense, it's nothing you should normally worry about. GHC tries very hard to optimise away the boxes of built-in types, and I'd expect that it manages to do it well in most cases where you could do that manually as well.
In practice, what you should be more careful about is ensuring two things:
- that GHC actually sees the concrete Int or Float type for which it knows the unboxed form. In particular, this does not work for polymorphic functions (polymorphism generally relies on the boxes, as it does in OO languages). If you want a function to be polymorphic and still run fast with the primitive types, make sure you add a SPECIALIZE annotation and/or rewrite rules.
- that laziness doesn't get in the way. Unboxed types are always strict, so strictness annotations can make it a lot easier for GHC to remove the boxes.
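A small sketch combining both points, using a hypothetical sumTo function: the SPECIALIZE pragma gives GHC a monomorphic Int copy to work with, and the bang pattern keeps the accumulator evaluated. Whether the boxes actually disappear is something to confirm in the Core (-ddump-simpl) rather than assume.

```haskell
{-# LANGUAGE BangPatterns #-}

-- Hypothetical example function, polymorphic over any ordered Num type.
sumTo :: (Num a, Ord a) => a -> a
sumTo n = go 0 1
  where
    -- The bang keeps acc evaluated, so no chain of (+) thunks builds up,
    -- which makes it easier for GHC to keep acc unboxed in the Int copy.
    go !acc i
      | i > n = acc
      | otherwise = go (acc + i) (i + 1)
{-# SPECIALIZE sumTo :: Int -> Int #-}

main :: IO ()
main = do
  print (sumTo (1000000 :: Int)) -- can use the specialized Int copy
  print (sumTo (10 :: Double))   -- no pragma for Double: generic version, unless GHC specializes or inlines it itself
```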
And of course profile your code.
Only if you're really sure you want it (e.g. to ensure that the boxes won't reappear with a new GHC version that optimises differently), or perhaps if you'd like to get SIMD instructions in, should you actually access unboxed primitive types manually.
I'm working on a project right now where I'm dealing with
the Prim typeclass and I need to ensure that a particular
function I've written is specialized. That is, I need to make sure that
when I call it, I get a specialized version of the function in which the
Prim dictionaries get inlined into the specialized
definition instead of being passed at runtime.
Fortunately, this is a pretty well-understood thing in GHC. You can just
write:
{-# SPECIALIZE foo :: ByteArray Int -> Int #-}
foo :: Prim a => ByteArray a -> Int
foo = ...
And in my code, this approach is working fine. But, since typeclasses are
open, there can be Prim instances that I don't know about yet when
the library is being written. This brings me to the problem at hand.
The GHC user guide's documentation of SPECIALIZE
provides two ways to use it. The first is putting SPECIALIZE at the
site of the definition, as I did in the example above. The second is
putting the SPECIALIZE pragma in another module where the function is imported.
For reference, the example the user manual provides is:
module Map( lookup, blah blah ) where
lookup :: Ord key => [(key,a)] -> key -> Maybe a
lookup = ...
{-# INLINABLE lookup #-}
module Client where
import Map( lookup )
data T = T1 | T2 deriving( Eq, Ord )
{-# SPECIALISE lookup :: [(T,a)] -> T -> Maybe a #-}
The problem I'm having is that this is not working in my code. The project
is on github,
and the relevant lines are:
bench/Main.hs line 24
src/BTree/Compact.hs line 149
To run the benchmark, run these commands:
git submodule init && git submodule update
cabal new-build bench && ./dist-newstyle/build/btree-0.1.0.0/build/bench/bench
When I run the benchmark as is, there is a part of the output that reads:
Off-heap tree, Amount of time taken to build:
0.293197796
If I uncomment line 151 of BTree.Compact,
that part of the benchmark runs fifty times faster:
Off-heap tree, Amount of time taken to build:
5.626834e-2
It's worth pointing out that the function in question, modifyWithM, is enormous. Its implementation is over 100 lines, but I do not think this should make a difference. The docs claim:
... mark the definition of f as INLINABLE, so that GHC guarantees to expose an unfolding regardless of how big it is.
So, my understanding is that, if specializing at the definition site works, it
should always be possible to instead specialize at the call site. I would appreciate
any insights from people who understand this machinery better than I do, and I'm
happy to provide more information if something is unclear. Thanks.
EDIT: I've realized that in the git commit I linked to in this post, there is a problem with the benchmark code. It repeatedly inserts the same value. However, even after fixing this, the specialization problem is still happening.
In Haskell, to define an instance of a type class you need to supply a dictionary of the functions required by the type class, e.g. to define an instance of Bounded, you need to supply definitions for minBound and maxBound.
For the purpose of this question, let's call this dictionary the vtbl for the type class instance. Let me know if this is a poor analogy.
My question centers around what kind of code generation can I expect from GHC when I call a type class function. In such cases I see three possibilities:
the vtbl lookup to find the implementation function is done at run time
the vtbl lookup is done at compile time and a direct call to the implementation function is emitted in the generated code
the vtbl lookup is done at compile time and the implementation function is inlined at the call site
I'd like to understand when each of these occur - or if there are other possibilities.
Also, does it matter if the type class was defined in a separately compiled module as opposed to being part of the "main" compilation unit?
In a runnable program it seems that Haskell knows the types of all the functions and expressions in the program. Therefore, when I call a type class function the compiler should know what the vtbl is and exactly which implementation function to call. I would expect the compiler to at least generate a direct call to implementation function. Is this true?
(I say "runnable program" here to distinguish it from compiling a module which you don't run.)
As with all good questions, the answer is "it depends". The rule of thumb is that there's a runtime cost to any typeclass-polymorphic code. However, library authors have a lot of flexibility in eliminating this cost with GHC's rewrite rules, and in particular there is a {-# SPECIALIZE #-} pragma that can automatically create monomorphic versions of polymorphic functions and use them whenever the polymorphic function can be inferred to be used at the monomorphic type. (The price for doing this is library and executable size, I think.)
You can answer your question for any particular code segment using ghc's -ddump-simpl flag. For example, here's a short Haskell file:
vDouble :: Double
vDouble = 3
vInt = length [2..5]
main = print (vDouble + realToFrac vInt)
Without optimizations, you can see that GHC does the dictionary lookup at runtime:
Main.main :: GHC.Types.IO ()
[GblId]
Main.main =
  System.IO.print
    @ GHC.Types.Double
    GHC.Float.$fShowDouble
    (GHC.Num.+
       @ GHC.Types.Double
       GHC.Float.$fNumDouble
       (GHC.Types.D# 3.0)
       (GHC.Real.realToFrac
          @ GHC.Types.Int
          @ GHC.Types.Double
          GHC.Real.$fRealInt
          GHC.Float.$fFractionalDouble
          (GHC.List.length
             @ GHC.Integer.Type.Integer
             (GHC.Enum.enumFromTo
                @ GHC.Integer.Type.Integer
                GHC.Enum.$fEnumInteger
                (__integer 2)
                (__integer 5)))))
...the relevant bit being realToFrac @Int @Double. At -O2, on the other hand, you can see it did the dictionary lookup statically and inlined the implementation, the result being a single call to int2Double#:
Main.main2 =
  case GHC.List.$wlen @ GHC.Integer.Type.Integer Main.main3 0
  of ww_a1Oq { __DEFAULT ->
  GHC.Float.$w$sshowSignedFloat
    GHC.Float.$fShowDouble_$sshowFloat
    GHC.Show.shows26
    (GHC.Prim.+## 3.0 (GHC.Prim.int2Double# ww_a1Oq))
    (GHC.Types.[] @ GHC.Types.Char)
  }
It's also possible for a library author to choose to rewrite the polymorphic function to a call to a monomorphic one but not inline the implementation of the monomorphic one; this means that all of the possibilities you proposed (and more) are possible.
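To reproduce dumps like these, one option is to put the flags in an OPTIONS_GHC pragma at the top of the file. This is a sketch assuming a reasonably recent GHC; -dsuppress-uniques only shortens the generated names, and newer compilers print Core that looks somewhat different from the dumps above.

```haskell
{-# OPTIONS_GHC -O2 -ddump-simpl -dsuppress-uniques #-}
-- Remove -O2 above to see the unoptimized, dictionary-passing Core instead.
module Main (main) where

vDouble :: Double
vDouble = 3

-- length always returns Int; the list elements are Integer, as in the dump.
vInt :: Int
vInt = length [2 .. 5 :: Integer]

main :: IO ()
main = print (vDouble + realToFrac vInt)
```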
If the compiler can "tell", at compile-time, what actual type you're using, then the method lookup happens at compile-time. Otherwise it happens at run-time. If lookup happens at compile-time, the method code may be inlined, depending on how large the method is. (This goes for regular functions too: If the compiler knows which function you're calling, it will inline it if that function is "small enough".)
Consider, for example, (sum [1 .. 10]) :: Integer. Here the compiler statically knows that the list is a list of Integer values, so it can inline the + function for Integer. On the other hand, if you do something like
foo :: Num x => [x] -> x
foo xs = sum xs - head xs
then, when you call sum, the compiler doesn't know what type you're using. (It depends on what type is given to foo), so it can't do any compile-time lookup.
On the other hand, using the {-# SPECIALIZE #-} pragma, you can do something like
{-# SPECIALIZE foo:: [Int] -> Int #-}
What this does is tell the compiler to compile a special version of foo where the input is a list of Int values. This obviously means that for that version, the compiler can do all the method lookups at compile-time (and almost certainly inline them all). Now there are two versions of foo - one which works for any type and does run-time type lookups, and one that works only for Int, but is [probably] much faster.
When you call the foo function, the compiler has to decide which version to call. If the compiler can "tell", at compile-time, that you want the Int version, it will do that. If it can't "tell" what type you're going to use, it'll use the slower any-type version.
Note that you can have multiple specialisations of a single function. For example, you could do
{-# SPECIALIZE foo :: [Int] -> Int #-}
{-# SPECIALIZE foo :: [Double] -> Double #-}
{-# SPECIALIZE foo :: [Complex Double] -> Complex Double #-}
Now, whenever the compiler can tell that you're using one of these types, it'll use the version hard-coded for that type. But if the compiler can't tell what type you're using, it'll never use the specialised versions, and always the polymorphic one. (That might mean that you need to specialise the function(s) that call foo, for example.)
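A small module pulling these pieces together. twiceFoo is a hypothetical caller added to illustrate the last point: because its own element type is only known to be Num x, it also gets a SPECIALIZE pragma so that its Int copy can call the Int copy of foo.

```haskell
import Data.Complex (Complex ((:+)))

foo :: Num x => [x] -> x
foo xs = sum xs - head xs
{-# SPECIALIZE foo :: [Int] -> Int #-}
{-# SPECIALIZE foo :: [Double] -> Double #-}
{-# SPECIALIZE foo :: [Complex Double] -> Complex Double #-}

-- Hypothetical polymorphic caller. Inside its body the element type is
-- unknown, so on its own it can only call the generic foo; its pragma
-- creates an Int copy in which foo's Int specialization can be used.
twiceFoo :: Num x => [x] -> x
twiceFoo xs = 2 * foo xs
{-# SPECIALIZE twiceFoo :: [Int] -> Int #-}

main :: IO ()
main = do
  print (foo [1, 2, 3 :: Int])
  print (twiceFoo [1, 2, 3 :: Int])
  print (foo [1 :+ 1, 2 :+ 0 :: Complex Double])
```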
If you crawl around the compiler's Core output, you can probably figure out exactly what it did in any particular circumstance. You will probably go stark raving mad though...
As other answers describe, any of these can happen in different situations. For any specific function call, the only way to be sure is to look at the generated core. That said, there are some cases where you can get a good idea of what will happen.
Using a type class method at a monomorphic type.
When a type class method is called in a situation where the type is entirely known at compile time, GHC will perform the lookup at compile time. For example
isFive :: Int -> Bool
isFive i = i == 5
Here the compiler knows that it needs Int's Eq dictionary, so it emits code to call the function statically. Whether or not that call is inlined depends upon GHC's usual inlining rules, and whether or not an INLINE pragma applies to the class method definition.
Exposing a polymorphic function
If a polymorphic function is exposed from a compiled module, then the base case is that the lookup needs to be performed at runtime.
module Foo (isFiveP) where
isFiveP :: (Eq a, Num a) => a -> Bool
isFiveP i = i == 5
What GHC actually does is transform this into a function of the form (more or less)
isFiveP_ eqDict numDict i = (eq_op eqDict) i (fromInteger_fn numDict 5)
so the function lookups would need to be performed at runtime.
That's the base case, anyway. What actually happens is that GHC can be quite aggressive about cross-module inlining. isFiveP is small enough that it would be inlined into the call site. If the type can be determined at the call site, then the dictionary lookups will all be performed at compile time. Even if a polymorphic function isn't directly inlined at the call site, the dictionary lookups may still be performed at compile time due to GHC's usual function transformations, if the code ever gets to a form where the function (with class dictionary parameters) can be applied to a statically-known dictionary.
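The dictionary-passing translation described above can be written out as ordinary Haskell to see where the lookups happen. EqDict, NumDict, isFiveP_ and the instance records below are a hypothetical model with made-up field names, not GHC's actual Core; the literal 5 goes through fromInteger, as numeric literals do.

```haskell
-- Hypothetical records modelling the Eq and Num dictionaries
-- (GHC's real dictionaries have more fields and different names).
data EqDict a = EqDict { eqOp :: a -> a -> Bool }
data NumDict a = NumDict { fromIntegerOp :: Integer -> a }

-- Dictionary-passing form of isFiveP: both method lookups are field
-- selections on arguments, so they happen when the function runs.
isFiveP_ :: EqDict a -> NumDict a -> a -> Bool
isFiveP_ eqDict numDict i = eqOp eqDict i (fromIntegerOp numDict 5)

eqDictInt :: EqDict Int
eqDictInt = EqDict (==)

numDictInt :: NumDict Int
numDictInt = NumDict fromInteger

-- A call site where the dictionaries are statically known. Once isFiveP_ is
-- inlined here, eqOp eqDictInt and fromIntegerOp numDictInt can be reduced at
-- compile time to Int's (==) and the constant 5.
isFiveInt :: Int -> Bool
isFiveInt = isFiveP_ eqDictInt numDictInt

main :: IO ()
main = print (map isFiveInt [4, 5, 6])
```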
I am starting Haskell and was looking at some libraries where data types are defined with "!". Example from the bytestring library:
data ByteString = PS {-# UNPACK #-} !(ForeignPtr Word8) -- payload
                     {-# UNPACK #-} !Int                -- offset
                     {-# UNPACK #-} !Int                -- length
Now I saw this question as an explanation of what this means and I guess it is fairly easy to understand. But my question is now: what is the point of using this? Since the expression will be evaluated whenever it is needed, why would you force the early evaluation?
In the second answer to this question C.V. Hansen says: "[...] sometimes the overhead of lazyness can be too much or wasteful". Is that supposed to mean that it is used to save memory (saving the value is cheaper than saving the expression)?
An explanation and an example would be great!
Thanks!
[EDIT] I think I should have chosen an example without {-# UNPACK #-}. So let me make one myself. Would this ever make sense? If yes, why and in what situation?
data MyType = Const1 !Int
            | Const2 !Double
            | Const3 !SomeOtherDataTypeMaybeMoreComplex
The goal here is not strictness so much as packing these elements into the data structure. Without strictness, any of those three constructor arguments could point either to a heap-allocated value structure or a heap-allocated delayed evaluation thunk. With strictness, it could only point to a heap-allocated value structure. With strictness and packed structures, it's possible to make those values inline.
Since each of those three values is a pointer-sized entity and is accessed strictly anyway, forcing a strict and packed structure saves pointer indirections when using this structure.
In the more general case, a strictness annotation can help reduce space leaks. Consider a case like this:
data Foo = Foo Int
makeFoo :: ReallyBigDataStructure -> Foo
makeFoo x = Foo (computeSomething x)
Without the strictness annotation, if you just call makeFoo, it will build a Foo pointing to a thunk pointing to the ReallyBigDataStructure, keeping it around in memory until something forces the thunk to evaluate. If we instead have
data Foo = Foo !Int
This forces the computeSomething evaluation to proceed immediately (well, as soon as something forces makeFoo itself), which avoids leaving a reference to the ReallyBigDataStructure.
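A runnable sketch of this example, with [Int] and sum standing in (as assumptions) for the answer's ReallyBigDataStructure and computeSomething; LazyFoo and StrictFoo are the two versions of Foo, given distinct names so both fit in one file.

```haskell
-- Hypothetical stand-ins for the answer's types and function.
type ReallyBigDataStructure = [Int]

computeSomething :: ReallyBigDataStructure -> Int
computeSomething = sum

-- Lazy field: the Int may be a thunk that still references the big input.
data LazyFoo = LazyFoo Int

-- Strict field: evaluating the constructor to WHNF evaluates the Int first.
data StrictFoo = StrictFoo !Int

makeLazyFoo :: ReallyBigDataStructure -> LazyFoo
makeLazyFoo x = LazyFoo (computeSomething x)

makeStrictFoo :: ReallyBigDataStructure -> StrictFoo
makeStrictFoo x = StrictFoo (computeSomething x)

main :: IO ()
main = do
  -- Forcing a LazyFoo does not run computeSomething at all, so even an
  -- undefined input goes unnoticed: the thunk (and its input) stays alive.
  makeLazyFoo undefined `seq` putStrLn "LazyFoo forced; computeSomething has not run"
  -- Forcing a StrictFoo runs the sum immediately; afterwards it holds only
  -- an Int, so nothing keeps the input list reachable.
  case makeStrictFoo [1 .. 100] of
    StrictFoo n -> print n
```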
Note that this is a different use case than the bytestring code; the bytestring code forces its parameters quite frequently so it's unlikely to lead to a space leak. It's probably best to interpret the bytestring code as a pure optimization to avoid pointer dereferences.