Specializing Imported Function in GHC Haskell - haskell

I'm working on a project right now where I'm dealing with
the Prim typeclass and I need to ensure that a particular
function I've written is specialized. That is, I need to make sure that
when I call it, I get a specialized version of the function in which the
Prim dictionaries get inlined into the specialized
definition instead of being passed at runtime.
Fortunately, this is a pretty well-understood thing in GHC. You can just
write:
{-# SPECIALIZE foo :: ByteArray Int -> Int #-}
foo :: Prim a => ByteArray a -> Int
foo = ...
And in my code, this approach is working fine. But, since typeclasses are
open, there can be Prim instances that I don't know about yet when
the library is being written. This brings me to the problem at hand.
The GHC user guide's documentation of SPECIALIZE
provides two ways to use it. The first is putting SPECIALIZE at the
site of the definition, as I did in the example above. The second is
putting the SPECIALIZE pragma in another module where the function is imported.
For reference, the example the user manual provides is:
module Map( lookup, blah blah ) where
lookup :: Ord key => [(key,a)] -> key -> Maybe a
lookup = ...
{-# INLINABLE lookup #-}
module Client where
import Map( lookup )
data T = T1 | T2 deriving( Eq, Ord )
{-# SPECIALISE lookup :: [(T,a)] -> T -> Maybe a #-}
The problem I'm having is that this is not working in my code. The project
is on github,
and the relevant lines are:
bench/Main.hs line 24
src/BTree/Compact.hs line 149
To run the benchmark, run these commands:
git submodule init && git submodule update
cabal new-build bench && ./dist-newstyle/build/btree-0.1.0.0/build/bench/bench
When I run the benchmark as is, there is a part of the output that reads:
Off-heap tree, Amount of time taken to build:
0.293197796
If I uncomment line 151 of BTree.Compact,
that part of the benchmark runs about five times faster:
Off-heap tree, Amount of time taken to build:
5.626834e-2
It's worth pointing out that the function in question, modifyWithM, is enormous.
Its implementation is over 100 lines, but I do not think this should make a
difference. The docs claim:
... mark the definition of f as INLINABLE, so that GHC guarantees to expose an unfolding regardless of how big it is.
So, my understanding is that, if specializing at the definition site works, it
should always be possible to instead specialize at the call site. I would appreciate
any insights from people who understand this machinery better than I do, and I'm
happy to provide more information if something is unclear. Thanks.
EDIT: I've realized that in the git commit I linked to in this post, there is a problem with the benchmark code. It repeatedly inserts the same value. However, even after fixing this, the specialization problem is still happening.

Related

reuse/memoization of global polymorphic (class) values in Haskell

I'm concerned with whether and when a polymorphic "global" class value is shared/memoized, particularly across module boundaries. I have read this and this, but they don't quite seem to reflect my situation, and I'm seeing behavior different from what one might expect from the answers.
Consider a class that exposes a value that can be expensive to compute:
{-# LANGUAGE FlexibleInstances, UndecidableInstances #-}
module A where

import Debug.Trace

class Costly a where
  costly :: a

instance Num i => Costly i where
  -- an expensive (but non-recursive) computation
  costly = trace "costly!" $ (repeat 1) !! 10000000

foo :: Int
foo = costly + 1

costlyInt :: Int
costlyInt = costly
And a separate module:
module B where

import A

bar :: Int
bar = costly + 2

main :: IO ()
main = do
  print foo
  print bar
  print costlyInt
  print costlyInt
Running main yields two separate evaluations of costly (as indicated by the trace): one for foo, and one for bar. I know that costlyInt just returns the (evaluated) costly from foo, because if I remove print foo from main then the first costlyInt becomes costly. (I can also cause costlyInt to perform a separate evaluation no matter what, by generalizing the type of foo to Num a => a.)
I think I know why this behavior happens: the instance of Costly is effectively a function that takes a Num dictionary and generates a Costly dictionary. So when compiling bar and resolving the reference to costly, ghc generates a fresh Costly dictionary, which has an expensive thunk in it. Question 1: am I correct about this?
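Concretely, the desugaring I have in mind is roughly this (a hand-written sketch, not actual GHC Core; NumDict and CostlyDict are made-up stand-ins for GHC's dictionary records):
import Debug.Trace (trace)

-- Made-up stand-ins for GHC's real dictionary records:
data NumDict i = NumDict { fromIntegerD :: Integer -> i }
newtype CostlyDict i = CostlyDict { costlyD :: i }

-- "instance Num i => Costly i" compiles to a *function* from a Num
-- dictionary to a Costly dictionary, so each use site that applies it
-- builds a fresh dictionary containing a fresh, expensive thunk.
costlyInstance :: NumDict i -> CostlyDict i
costlyInstance nd =
  CostlyDict (trace "costly!" $ repeat (fromIntegerD nd 1) !! 10000000)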
There are a few ways to cause just one evaluation of costly, including:
Put everything in one module.
Remove the Num i instance constraint and just define a Costly Int instance.
Unfortunately, the analogs of these solutions are not feasible in my program -- I have several modules that use the class value in its polymorphic form, and only in the top-level source file are concrete types finally used.
There are also changes that don't reduce the number of evaluations, such as:
Using INLINE, INLINABLE, or NOINLINE on the costly definition in the instance. (I didn't expect this to work, but hey, worth a shot.)
Using a SPECIALIZE instance Costly Int pragma in the instance definition.
The latter is surprising to me -- I'd expected it to be essentially equivalent to the second item above that did work. That is, I thought it would generate a special Costly Int dictionary, which all of foo, bar, and costlyInt would share. My question 2: what am I missing here?
My final question: is there any relatively simple and foolproof way to get what I want, i.e., all references to costly of a particular concrete type being shared across modules? From what I've seen so far, I suspect the answer is no, but I'm still holding out hope.
Controlling sharing is tricky in GHC. There are many optimizations that GHC does which can affect sharing (such as inlining, floating things out, etc).
In this case, to answer the question why the SPECIALIZE pragma did not achieve the intended effect, let's look at the Core of the B module, in particular of the bar function:
Rec {
bar_xs
bar_xs = : x1_r3lO bar_xs
end Rec }
bar1 = $w!! bar_xs 10000000
-- ^^^ this repeats the computation. bar_xs is just repeat 1
bar =
case trace $fCostlyi2 bar1 of _ { I# x_aDm -> I# (+# x_aDm 2) }
-- ^^^ this is just the "costly!" string
That didn't work as we wanted. Instead of reusing costly, GHC decided to just inline the costly function.
So we have to prevent GHC from inlining costly, or the computation will be duplicated. How do we do that? You might think adding a {-# NOINLINE costly #-} pragma would be enough, but unfortunately specialization and NOINLINE don't seem to work well together:
A.hs:13:3: Warning:
Ignoring useless SPECIALISE pragma for NOINLINE function: ‘$ccostly’
But there is a trick to convince GHC to do what we want: we can write costly in the following way:
instance Num i => Costly i where
  -- an expensive (but non-recursive) computation
  costly = memo where
    memo :: i
    memo = trace "costly!" $ (repeat 1) !! 10000000
    {-# NOINLINE memo #-}
  {-# SPECIALIZE instance Costly Int #-}
  -- (this might require -XScopedTypeVariables)
This allows us to specialize costly while simultaneously avoiding the inlining of our computation.

Haskell: Matching two expressions that are not from class Eq

First of all, I want to clarify that I've tried to find a solution to my problem googling but I didn't succeed.
I need a way to compare two expressions. The problem is that these expressions are not comparable. I'm coming from Erlang, where I can do:
case exp1 of
exp2 -> ...
where exp1 and exp2 are bound. But Haskell doesn't allow me to do this. However, in Haskell I could compare using ==. Unfortunately, their type is not a member of the class Eq. Of course, both expressions are unknown until runtime, so I can't write a static pattern in the source code.
How could I compare these two expressions without having to define my own comparison function? I suppose that pattern matching could be used here in some way (as in Erlang), but I don't know how.
Edit
I think that explaining my goal could help to understand the problem.
I'm modifying an Abstract Syntax Tree (AST). I am going to apply a set of rules that modify this AST, but I want to store the modifications in a list. The elements of this list should be tuples with the original piece of the AST and its modification. So the last step is, for each tuple, to search for a piece of the AST that is exactly the same as the first element and substitute it with the second element of the tuple. So, I will need something like this:
change ((old, new):t) piece_of_ast =
  case piece_of_ast of
    old -> new
    _ -> piece_of_ast
I hope this explanation clarify my problem.
Thanks in advance
It's probably an oversight in the library (but maybe I'm missing a subtle reason why Eq is a bad idea!) and I would contact the maintainer to get the needed Eq instances added in.
But for the record and the meantime, here's what you can do if the type you want to compare for equality doesn't have an instance for Eq, but does have one for Data - as is the case in your question.
The Data.Generics.Twins module (from the syb package) offers a generic version of equality, with the type
geq :: Data a => a -> a -> Bool
As the documentation states, this is 'Generic equality: an alternative to "deriving Eq" '. It works by comparing the toplevel constructors and if they are the same continues on to the subterms.
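For instance (a minimal sketch with a made-up Expr type; deriving Data requires the DeriveDataTypeable extension):
{-# LANGUAGE DeriveDataTypeable #-}
import Data.Data (Data)
import Data.Generics.Twins (geq)

data Expr = Lit Int | Add Expr Expr
  deriving (Data)

main :: IO ()
main = do
  print (geq (Add (Lit 1) (Lit 2)) (Add (Lit 1) (Lit 2))) -- True
  print (geq (Lit 1) (Add (Lit 1) (Lit 2)))               -- False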
Since the Data class inherits from Typeable, you could even write a function like
veryGenericEq :: (Data a, Data b) => a -> b -> Bool
veryGenericEq a b = case cast a of
  Nothing -> False
  Just a' -> geq a' b
but I'm not sure this is a good idea - it certainly is unhaskelly, smashing all types into one big happy universe :-)
If you don't have a Data instance either, but the data type is simple enough that comparing for equality is 100% straightforward, then StandaloneDeriving is the way to go, as @ChristianConkle indicates. To do this you need to add a {-# LANGUAGE StandaloneDeriving #-} pragma at the top of your file and add a number of clauses
deriving instance Eq a => Eq (CStatement a)
one for each type CStatement uses that doesn't have an Eq instance, like CAttribute. GHC will complain about each one you need, so you don't have to trawl through the source.
This will create a bunch of so-called 'orphan instances.' Normally, an instance like instance C T where will be defined in either the module that defines C or the module that defines T. Your instances are 'orphans' because they're separated from their 'parents.' Orphan instances can be bad because you might start using a new library which also has those instances defined; which instance should the compiler use? There's a little note on the Haskell wiki about this issue. If you're not publishing a library for others to use, it's fine; it's your problem and you can deal with it. It's also fine for testing; if you can implement Eq, then the library maintainer can probably include deriving Eq in the library itself, solving your problem.
I'm not familiar with Erlang, but Haskell does not assume that all expressions can be compared for equality. Consider, for instance, undefined == undefined or undefined == let x = x in x.
Equality testing with (==) is an operation on values. Values of some types are simply not comparable. Consider, for instance, two values of type IO String: return "s" and getLine. Are they equal? What if you run the program and type "s"?
On the other hand, consider this:
f :: IO Bool
f = do
  x <- return "s" -- note that using return like this is pointless
  y <- getLine
  return (x == y) -- both x and y have type String.
It's not clear what you're looking for. Pattern matching may be the answer, but if you're using a library type that's not an instance of Eq, the likely answer is that comparing two values is actually impossible (or that the library author has for some reason decided to impose that restriction).
What types, specifically, are you trying to compare?
Edit: As a commenter mentioned, you can also compare using Data. I don't know if that is easier for you in this situation, but to me it is unidiomatic; a hack.
You also asked why Haskell can't do "this sort of thing" automatically. There are a few reasons. In part it's historical; in part it's that, with deriving, Eq satisfies most needs.
But there's also an important principle in play: the public interface of a type ideally represents what you can actually do with values of that type. Your specific type is a bad example, because it exposes all its constructors, and it really looks like there should be an Eq instance for it. But there are many other libraries that do not expose the implementation of their types, and instead require you to use provided functions (and class instances). Haskell allows that abstraction; a library author can rely on the fact that a consumer can't muck about with implementation details (at least without using unsafe tools like unsafeCoerce).

GHC code generation for type class function calls

In Haskell to define an instance of a type class you need to supply a dictionary of functions required by the type class. I.e. to define an instance of Bounded, you need to supply a definition for minBound and maxBound.
For the purpose of this question, let's call this dictionary the vtbl for the type class instance. Let me know if this is a poor analogy.
My question centers around what kind of code generation can I expect from GHC when I call a type class function. In such cases I see three possibilities:
the vtbl lookup to find the implementation function is done at run time
the vtbl lookup is done at compile time and a direct call to the implementation function is emitted in the generated code
the vtbl lookup is done at compile time and the implementation function is inlined at the call site
I'd like to understand when each of these occur - or if there are other possibilities.
Also, does it matter if the type class was defined in a separately compiled module as opposed to being part of the "main" compilation unit?
In a runnable program it seems that Haskell knows the types of all the functions and expressions in the program. Therefore, when I call a type class function the compiler should know what the vtbl is and exactly which implementation function to call. I would expect the compiler to at least generate a direct call to implementation function. Is this true?
(I say "runnable program" here to distinguish it from compiling a module which you don't run.)
As with all good questions, the answer is "it depends". The rule of thumb is that there's a runtime cost to any typeclass-polymorphic code. However, library authors have a lot of flexibility in eliminating this cost with GHC's rewrite rules, and in particular there is a {-# SPECIALIZE #-} pragma that can automatically create monomorphic versions of polymorphic functions and use them whenever the polymorphic function can be inferred to be used at the monomorphic type. (The price for doing this is library and executable size, I think.)
You can answer your question for any particular code segment using ghc's -ddump-simpl flag. For example, here's a short Haskell file:
vDouble :: Double
vDouble = 3
vInt = length [2..5]
main = print (vDouble + realToFrac vInt)
Without optimizations, you can see that GHC does the dictionary lookup at runtime:
Main.main :: GHC.Types.IO ()
[GblId]
Main.main =
System.IO.print
# GHC.Types.Double
GHC.Float.$fShowDouble
(GHC.Num.+
# GHC.Types.Double
GHC.Float.$fNumDouble
(GHC.Types.D# 3.0)
(GHC.Real.realToFrac
# GHC.Types.Int
# GHC.Types.Double
GHC.Real.$fRealInt
GHC.Float.$fFractionalDouble
(GHC.List.length
# GHC.Integer.Type.Integer
(GHC.Enum.enumFromTo
# GHC.Integer.Type.Integer
GHC.Enum.$fEnumInteger
(__integer 2)
(__integer 5)))))
...the relevant bit being realToFrac #Int #Double. At -O2, on the other hand, you can see it did the dictionary lookup statically and inlined the implementation, the result being a single call to int2Double#:
Main.main2 =
case GHC.List.$wlen # GHC.Integer.Type.Integer Main.main3 0
of ww_a1Oq { __DEFAULT ->
GHC.Float.$w$sshowSignedFloat
GHC.Float.$fShowDouble_$sshowFloat
GHC.Show.shows26
(GHC.Prim.+## 3.0 (GHC.Prim.int2Double# ww_a1Oq))
(GHC.Types.[] # GHC.Types.Char)
}
It's also possible for a library author to choose to rewrite the polymorphic function to a call to a monomorphic one but not inline the implementation of the monomorphic one; this means that all of the possibilities you proposed (and more) are possible.
If the compiler can "tell", at compile-time, what actual type you're using, then the method lookup happens at compile-time. Otherwise it happens at run-time. If lookup happens at compile-time, the method code may be inlined, depending on how large the method is. (This goes for regular functions too: If the compiler knows which function you're calling, it will inline it if that function is "small enough".)
Consider, for example, (sum [1 .. 10]) :: Integer. Here the compiler statically knows that the list is a list of Integer values, so it can inline the + function for Integer. On the other hand, if you do something like
foo :: Num x => [x] -> x
foo xs = sum xs - head xs
then, when you call sum, the compiler doesn't know what type you're using. (It depends on what type is given to foo), so it can't do any compile-time lookup.
On the other hand, using the {-# SPECIALIZE #-} pragma, you can do something like
{-# SPECIALIZE foo :: [Int] -> Int #-}
What this does is tell the compiler to compile a special version of foo where the input is a list of Int values. This obviously means that for that version, the compiler can do all the method lookups at compile-time (and almost certainly inline them all). Now there are two versions of foo - one which works for any type and does run-time type lookups, and one that works only for Int, but is [probably] much faster.
When you call the foo function, the compiler has to decide which version to call. If the compiler can "tell", at compile-time, that you want the Int version, it will do that. If it can't "tell" what type you're going to use, it'll use the slower any-type version.
Note that you can have multiple specialisations of a single function. For example, you could do
{-# SPECIALIZE foo :: [Int] -> Int #-}
{-# SPECIALIZE foo :: [Double] -> Double #-}
{-# SPECIALIZE foo :: [Complex Double] -> Complex Double #-}
Now, whenever the compiler can tell that you're using one of these types, it'll use the version hard-coded for that type. But if the compiler can't tell what type you're using, it'll never use the specialised versions, and always the polymorphic one. (That might mean that you need to specialise the function(s) that call foo, for example.)
If you crawl around the compiler's Core output, you can probably figure out exactly what it did in any particular circumstance. You will probably go stark raving mad though...
As other answers describe, any of these can happen in different situations. For any specific function call, the only way to be sure is to look at the generated core. That said, there are some cases where you can get a good idea of what will happen.
Using a type class method at a monomorphic type.
When a type class method is called in a situation where the type is entirely known at compile time, GHC will perform the lookup at compile time. For example
isFive :: Int -> Bool
isFive i = i == 5
Here the compiler knows that it needs Int's Eq dictionary, so it emits code to call the function statically. Whether or not that call is inlined depends upon GHC's usual inlining rules, and whether or not an INLINE pragma applies to the class method definition.
Exposing a polymorphic function
If a polymorphic function is exposed from a compiled module, then the base case is that the lookup needs to be performed at runtime.
module Foo (isFiveP) where
isFiveP :: (Eq a, Num a) => a -> Bool
isFiveP i = i == 5
What GHC actually does is transform this into a function of the form (more or less)
isFiveP_ eqDict numDict i = (eq_op eqDict) i (fromIntegral_fn numDict 5)
so the function lookups would need to be performed at runtime.
That's the base case, anyway. What actually happens is that GHC can be quite aggressive about cross-module inlining. isFiveP is small enough that it would be inlined into the call site. If the type can be determined at the call site, then the dictionary lookups will all be performed at compile time. Even if a polymorphic function isn't directly inlined at the call site, the dictionary lookups may still be performed at compile time due to GHC's usual function transformations, if the code ever gets to a form where the function (with class dictionary parameters) can be applied to a statically-known dictionary.
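For example, a client module like this sketch would typically end up with all lookups resolved at compile time (verifiable with -ddump-simpl):
module Bar where

import Foo (isFiveP)

-- The call is at the known type Int; isFiveP is small enough to be
-- inlined cross-module, so the Eq and Num dictionaries are resolved
-- statically.
checkFive :: Int -> Bool
checkFive = isFiveP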

Performance varies dramatically if a function is moved between modules

If I move a function from where it's used into a separate module, I've noticed the performance of the program drops significantly.
calc = sum . nub . map third . filter isProd . concat . map parts . permutations
  where
    third (_, _, b) = fromDigits b
    isProd (a, b, p) = fromDigits a * fromDigits b == fromDigits p
    -- All possibilities have digits: A x AAAA or AA x AAA
    parts (a:b:c:d:e:rest) = [ ([a], [b,c,d,e], rest)
                             , ([a,b], [c,d,e], rest) ]
in another module:
fromDigits :: Integral a => [a] -> a
fromDigits = foldl1' (\a b -> 10 * a + b)
This runs in 0.1 seconds when fromDigits is in the same module, but 0.4 seconds when I move it to another module.
I assume this is because GHC can't inline the function if it's in a different module, but I feel like it should be able to, since they are in the same package.
I'm not sure what the compiler settings are, but it's built with Leksah/cabal defaults. I'm fairly sure that's with -O2 as a minimum.
For the type-class polymorphic fromDigits, you get a function that is, due to the dictionary lookups for (+), (*) and fromInteger, too large to have its unfolding automatically exposed. That means it can't be specialised at the call sites and the dictionary lookups can't be eliminated to possibly inline addition and multiplication (which might enable further optimisation).
When it is defined in the same module as it is used in, with optimisations, GHC creates a specialised version for the type it's used at, if that is known. Then the dictionary lookups can be eliminated and the (+) and (*) operations can be inlined (if the type they're used at has operations suitable for inlining).
But that depends on the type being known. So if you have the polymorphic calc and fromDigits in one module but use them only in some other module, you are again in the position that only the generic version is available; since its unfolding is not exposed, it can't be specialised or otherwise optimised at the call site.
One solution is to make the unfolding of the function exposed in the interface file, so it can be properly optimised where it is used, when the necessary data (in particular the type) is available. You can expose the function's unfolding in the interface file by adding an {-# INLINE #-}, or, as of GHC 7, an {-# INLINABLE #-} pragma to the function. That makes the almost unchanged source code available when compiling the calling code, so the function can be properly optimised with more information available.
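Applied to the fromDigits from the question, that is simply (a sketch; Digits is a made-up module name):
module Digits (fromDigits) where

import Data.List (foldl1')

fromDigits :: Integral a => [a] -> a
fromDigits = foldl1' (\a b -> 10 * a + b)
{-# INLINABLE fromDigits #-}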
The downside to this is code bloat: you get a copy of the optimised code at every call site (with INLINABLE it's not so extreme; you get at least one copy per calling module, which is usually not too bad).
An alternative solution is to generate specialised versions in the defining module by adding {-# SPECIALISE #-} pragmas (the US spelling SPECIALIZE is also accepted) to let GHC create optimised versions for the important types (Int, Integer, Word, ?). That also creates rewrite rules, so that uses at the specialised-for types get rewritten to use the specialised version (when compiling with optimisations).
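Again for fromDigits, in its defining module (a sketch; specialise for whichever types the program actually uses):
import Data.List (foldl1')

fromDigits :: Integral a => [a] -> a
fromDigits = foldl1' (\a b -> 10 * a + b)
{-# SPECIALISE fromDigits :: [Int] -> Int #-}
{-# SPECIALISE fromDigits :: [Integer] -> Integer #-}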
The downside to this is that some optimisations that would be possible when the code is inlined aren't.
You can tell GHC to inline a function at the call site using the inline function: http://www.haskell.org/ghc/docs/7.0.4/html/libraries/ghc-prim-0.2.0.0/GHC-Prim.html#v%3Ainline. You may want to use it in conjunction with the INLINABLE pragma: http://www.haskell.org/ghc/docs/7.0.4/html/users_guide/pragmas.html#inlinable-pragma

In Haskell, is there some way to forcefully coerce a polymorphic call?

I have a list of values (or functions) of any type. I have another list of functions of any type. The user at runtime will choose one from the first list, and another from the second list. I have a mechanism to ensure that the two items are type compatible (value or output from first is compatible with input of second).
I need some way to call the function with the value (or compose the functions). If the second function has concrete types, unsafeCoerce works fine. But if it's of the form:
polyFunc :: MyTypeclass a => a -> IO ()
polyFunc = print . show . typeclassFunc
Then unsafeCoerce doesn't work since it can't resolve to a concrete type.
Is there any way to do what I'm trying to do?
Here's an example of what the lists might look like. However... I'm not limited to this, if there is some other way to represent these that will solve the problem, I would like to know. A critical thing to consider is that: the list can change at runtime so I do not know at compile time all the possible types that might be involved.
data Wrapper = forall a. Wrapper a
firstList :: [Wrapper]
firstList = [Wrapper "blue", Wrapper 5, Wrapper valueOfMyTypeclass]
data OtherWrapper = forall a. OtherWrapper (a -> IO ())
secondList :: [OtherWrapper]
secondList = [OtherWrapper print, OtherWrapper polyFunc]
Note: As for why I want to do such a crazy thing:
I'm generating code and typechecking it with hint. But that happens at runtime. The problem is that hint is slow at actually executing things and high performance for this is critical. Also, at least in certain cases, I do not want to generate code and run it through ghc at runtime (though we have done some of that, too). So... I'm trying to find somewhere in the middle: dynamically hook things together without having to generate code and compile, but run it at compiled speed instead of interpreted.
Edit: Okay, so now that I see what's going on a bit more, here's a very general approach -- don't use polymorphic functions directly at all! Instead, use functions of type Dynamic -> IO ()! Then, they can use "typecase"-style dispatch directly to choose which monomorphic function to invoke -- i.e. just switching on the TypeRep. You do have to encode this dispatch directly for each polymorphic function you're wrapping. However, you can automate this with some template Haskell if it becomes enough of a hassle.
Essentially, rather than overloading Haskell's polymorphism: just as Dynamic embeds a dynamically typed language in a statically typed language, you now extend that to embed dynamic polymorphism in a statically typed language.
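A minimal sketch of the idea (printDyn is a made-up example; a real dispatcher would enumerate whichever monomorphic types each wrapped function is meant to support):
{-# LANGUAGE ScopedTypeVariables #-}
import Data.Dynamic (Dynamic, fromDynamic, toDyn)

-- A monomorphic wrapper doing "typecase"-style dispatch: try each
-- supported concrete type in turn by switching on the TypeRep hidden
-- inside the Dynamic.
printDyn :: Dynamic -> IO ()
printDyn d
  | Just (s :: String) <- fromDynamic d = putStrLn s
  | Just (n :: Int)    <- fromDynamic d = print n
  | otherwise                           = putStrLn "printDyn: unsupported type"

main :: IO ()
main = mapM_ printDyn [toDyn "blue", toDyn (5 :: Int)]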
--
Old answer: More code would be helpful. But, as far as I can tell, this is the read/show problem. I.e. You have a function that produces a polymorphic result, and a function that takes a polymorphic input. The issue is that you need to pick what the intermediate value is, such that it satisfies both constraints. If you have a mechanism to do so, then the usual tricks will work, making sure you satisfy that open question which the compiler can't know the answer to.
I'm not sure that I completely understand your question. But since you have a value and a function with compatible types, you could combine them into a single value. Then the compiler can prove that the types match.
{-# LANGUAGE ExistentialQuantification #-}
data Vault = forall a . Vault (a -> IO ()) a
runVault :: Vault -> IO ()
runVault (Vault f x) = f x
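Usage would then look something like this (a sketch):
vaults :: [Vault]
vaults = [Vault print (5 :: Int), Vault putStrLn "blue"]

main :: IO ()
main = mapM_ runVault vaults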
