Here's the code:
{-# LANGUAGE FlexibleContexts #-}
import Data.Int
import qualified Data.Vector.Unboxed as U
import qualified Data.Vector.Generic as V
{-# NOINLINE f #-} -- Note the 'NO'
--f :: (Num r, V.Vector v r) => v r -> v r -> v r
--f :: (V.Vector v Int64) => v Int64 -> v Int64 -> v Int64
--f :: (U.Unbox r, Num r) => U.Vector r -> U.Vector r -> U.Vector r
f :: U.Vector Int64 -> U.Vector Int64 -> U.Vector Int64
f = V.zipWith (+) -- or U.zipWith, it doesn't make a difference
main = do
    let iters = 100
        dim = 221184
        y = U.replicate dim 0 :: U.Vector Int64
    let ans = iterate (f y) y !! iters
    putStr $ show $ U.sum ans
I compiled with ghc 7.6.2 and -O2, and it took 1.7 seconds to run.
I tried several different versions of f:
f x = U.zipWith (+) x
f x = (U.zipWith (+) x) . id
f x y = U.zipWith (+) x y
Version 1 is the same as the original, while versions 2 and 3 run in under 0.09 seconds (and adding an INLINE pragma to f doesn't change anything).
I also noticed that if I make f polymorphic (with any of the three signatures above), even with a "fast" definition (i.e. 2 or 3), it slows back down...to exactly 1.7 seconds. This makes me wonder if the original problem is perhaps due to (lack of) type inference, even though I'm explicitly giving the types for the Vector type and element type.
I'm also interested in adding integers modulo q:
newtype Zq q i = Zq {unZq :: i}
As when adding Int64s, if I write a function with every type specified,
h :: U.Vector (Zq Q17 Int64) -> U.Vector (Zq Q17 Int64) -> U.Vector (Zq Q17 Int64)
I get an order of magnitude better performance than if I leave any polymorphism:
h :: (Modulus q) => U.Vector (Zq q Int64) -> U.Vector (Zq q Int64) -> U.Vector (Zq q Int64)
But I should at least be able to remove the specific phantom type! It should be compiled out, since I'm dealing with a newtype.
Here are my questions:
Where is the slowdown coming from?
What is going on in versions 2 and 3 of f that affects performance at all? It seems like a bug to me that (what amounts to) coding style can affect performance like this. Are there other examples outside of Vector where partially applying a function or other stylistic choices affect performance?
Why does polymorphism slow me down an order of magnitude independent of where the polymorphism is (i.e. in the vector type, in the Num type, both, or the phantom type)? I know polymorphism makes code slower, but this is ridiculous. Is there a hack around it?
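(For reference, the usual hack for recovering monomorphic performance from a polymorphic definition is a SPECIALIZE pragma, which asks GHC to compile a dedicated copy at a given type. A sketch using the third signature above; whether it rescues this particular case is part of what I'm asking:
f :: (U.Unbox r, Num r) => U.Vector r -> U.Vector r -> U.Vector r
f x y = U.zipWith (+) x y
{-# SPECIALIZE f :: U.Vector Int64 -> U.Vector Int64 -> U.Vector Int64 #-}
)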
EDIT 1
I filed an issue with the Vector library. I also found a GHC issue relating to this problem.
EDIT 2
I rewrote the question after gaining some insight from #kqr's answer.
Below is the original for reference.
--------------ORIGINAL QUESTION--------------------
Here's the code:
{-# LANGUAGE FlexibleContexts #-}
import Control.DeepSeq
import Data.Int
import qualified Data.Vector.Unboxed as U
import qualified Data.Vector.Generic as V
{-# NOINLINE f #-} -- Note the 'NO'
--f :: (Num r, V.Vector v r) => v r -> v r -> v r
--f :: (V.Vector v Int64) => v Int64 -> v Int64 -> v Int64
--f :: (U.Unbox r, Num r) => U.Vector r -> U.Vector r -> U.Vector r
f :: U.Vector Int64 -> U.Vector Int64 -> U.Vector Int64
f = V.zipWith (+)
main = do
    let iters = 100
        dim = 221184
        y = U.replicate dim 0 :: U.Vector Int64
    let ans = iterate (f y) y !! iters
    putStr $ show $ U.sum ans
I compiled with ghc 7.6.2 and -O2, and it took 1.7 seconds to run.
I tried several different versions of f:
f x = U.zipWith (+) x
f x = (U.zipWith (+) x) . U.force
f x = (U.zipWith (+) x) . Control.DeepSeq.force
f x = (U.zipWith (+) x) . (\z -> z `seq` z)
f x = (U.zipWith (+) x) . id
f x y = U.zipWith (+) x y
Version 1 is the same as the original, version 2 runs in 0.111 seconds, and versions 3-6 run in under 0.09 seconds (and adding an INLINE pragma to f doesn't change anything).
So the order-of-magnitude slowdown appears to be due to laziness since force helped, but I'm not sure where the laziness is coming from. Unboxed types aren't allowed to be lazy, right?
I tried writing a strict version of iterate, thinking the vector itself must be lazy:
{-# INLINE iterate' #-}
iterate' :: (NFData a) => (a -> a) -> a -> [a]
iterate' f x = x `seq` x : iterate' f (f x)
but with the point-free version of f, this didn't help at all.
I also noticed something else, which could be just a coincidence and red herring:
If I make f polymorphic (with any of the three signatures above), even with a "fast" definition, it slows back down...to exactly 1.7 seconds. This makes me wonder if the original problem is perhaps due to (lack of) type inference, even though everything should be inferred nicely.
Here are my questions:
Where is the slowdown coming from?
Why does composing with force help, but using a strict iterate doesn't?
Why is U.force worse than DeepSeq.force? I have no idea what U.force is supposed to do, but it sounds a lot like DeepSeq.force, and seems to have a similar effect.
Why does polymorphism slow me down an order of magnitude independent of where the polymorphism is (i.e. in the vector type, in the Num type, or both)?
Why are versions 5 and 6, neither of which should have any strictness implications at all, just as fast as a strict function?
As #kqr pointed out, the problem doesn't seem to be strictness. So something about the way I write the function is causing the generic zipWith to be used rather than the Unboxed-specific version. Is this just a fluke between GHC and the Vector library, or is there something more general that can be said here?
While I don't have the definitive answer you want, there are two things that might help you along.
The first thing is that x `seq` x is, both semantically and computationally, the same thing as just x. The wiki says about seq:
A common misconception regarding seq is that seq x "evaluates" x. Well, sort of. seq doesn't evaluate anything just by virtue of existing in the source file, all it does is introduce an artificial data dependency of one value on another: when the result of seq is evaluated, the first argument must also (sort of; see below) be evaluated.
As an example, suppose x :: Integer, then seq x b behaves essentially like if x == 0 then b else b – unconditionally equal to b, but forcing x along the way. In particular, the expression x `seq` x is completely redundant, and always has exactly the same effect as just writing x.
What the first paragraph says is that writing seq a b doesn't mean that a will magically get evaluated this instant, it means that a will get evaluated as soon as b needs to be evaluated. This might occur later in the program, or maybe never at all. When you view it in that light, it is obvious that seq x x is a redundancy, because all it says is, "evaluate x as soon as x needs to be evaluated." Which of course is what would happen anyway if you had just written x.
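For instance, in GHCi (a minimal illustration; :sprint shows how far a binding has been evaluated):
Prelude> let x = undefined :: Int
Prelude> let b = x `seq` 42
Prelude> :sprint x
x = _
Prelude> b
*** Exception: Prelude.undefined
Defining b evaluates nothing by itself; x is forced only at the moment b is demanded.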
This has two implications for you:
Your "strict" iterate' function isn't really any stricter than it would be without the seq. In fact, I have a hard time imagining how the iterate function could become any stricter than it already is. You can't make the tail of the list strict, because it is infinite. The main thing you can do is force the "accumulator", f x, but doing so doesn't give any significant performance increase on my system.[1]
Scratch that. Your strict iterate' does exactly the same thing as my bang pattern version. See the comments.
Writing (\z -> z `seq` z) does not give you a strict identity function, which I assume is what you were going for. In fact, the common identity function is as strict as you'll get – it will evaluate its result as soon as it is needed.
However, I peeked at the core GHC generates for
U.zipWith (+) y
and
U.zipWith (+) y . id
and there is only one big difference that my untrained eye can spot. The first one uses just a plain Data.Vector.Generic.zipWith (here's where your polymorphism coincidence might come into play – if GHC chooses a generic zipWith it will of course perform as if the code was polymorphic!) while the latter has exploded this single function call into almost 90 lines of state monad code and unpacked machine types.
The state monad code looks almost like the loops and destructive updates you would write in an imperative language, so I assume it's tailored pretty well to the machine it's running on. If I wasn't in such a hurry, I would take a longer look to see more exactly how it works and why GHC suddenly decided it needed a tight loop. I have attached the generated core as much for myself as anyone else who wants to take a look.[2]
[1]: Forcing the accumulator along the way: (This is what you already do, I misunderstood the code!)
{-# LANGUAGE BangPatterns #-}
iterate' f !x = x : iterate' f (f x)
[2]: What core U.zipWith (+) y . id gets translated into.
Related
As claimed in the documentation, <$!> is the strict version of <$>, but surprisingly
(<$!>) :: Monad m => (a -> b) -> m a -> m b
f <$!> m = do
  x <- m
  let z = f x
  z `seq` return z
instead of the more natural (in my opinion; because it keeps the weaker constraint and mimics $!)
(<$!>) :: Functor f => (a -> b) -> f a -> f b
f <$!> x = x `seq` (f <$> x)
I guess that applying seq after the binding is different than the "natural" approach, but I don't know how different it is. My question is: Is there any reason which makes the "natural" approach useless, and is that why the implementation is constrained to Monad?
GHC's commit message includes the following two links, which shed more light on this function:
https://mail.haskell.org/pipermail/libraries/2013-November/021728.html
https://mail.haskell.org/pipermail/libraries/2014-April/022864.html
This is the reason Johan Tibell gives for it (quoting from the linked mailing list):
It works on Monads instead of Functors as required by us inspecting
the argument.
This version is highly convenient if you want to work with
functors/applicatives in e.g. parser and avoid spurious thunks at the
same time. I realized that it was needed while fixing large space
usage (but not space-leak) issues in cassava.
I guess that applying seq after the binding is different than the "natural" approach, but I don't know how different it is
Since Haskell is purely functional, seq must work through data dependencies; it sets up a relationship: "when seq x y is evaluated to WHNF, x will have been as well".
The idea here is to pin the evaluation of z = f x to the outer monadic structure, which we know must be evaluated for each >>= or <*> to proceed.
In your version:
Prelude> f <$!> x = x `seq` (f <$> x)
Prelude> let thunk = error "explode"
Prelude> case (+) <$!> Just thunk <*> Just thunk of Just _ -> "we can easily build up thunks"
"we can easily build up thunks"
I do wonder if there's a better solution possible, though.
Graham Hutton, in the 2nd edition of Programming in Haskell, spends the last two chapters on the topic of a stack-machine-based implementation of an AST.
And he finishes by showing how to derive the correct implementation of that machine from the semantic model of the AST.
I'm trying to enlist the help of Data.SBV in that derivation, and failing.
And I'm hoping that someone can help me understand whether I'm:
Asking for something that Data.SBV can't do, or
Asking Data.SBV for something it can do, but asking incorrectly.
-- test/sbv-stack.lhs - Data.SBV assisted stack machine implementation derivation.
{-# LANGUAGE OverloadedLists #-}
{-# LANGUAGE ScopedTypeVariables #-}
import Data.SBV
import qualified Data.SBV.List as L
import Data.SBV.List ((.:), (.++)) -- Since they don't collide w/ any existing list functions.
-- AST Definition
data Exp = Val SWord8
         | Sum Exp Exp
-- Our "Meaning" Function
eval :: Exp -> SWord8
eval (Val x) = x
eval (Sum x y) = eval x + eval y
type Stack = SList Word8
-- Our "Operational" Definition.
--
-- This function attempts to implement the *specification* provided by our
-- "meaning" function, above, in a way that is more conducive to
-- implementation in our available (and, perhaps, quite primitive)
-- computational machinery.
--
-- Note that we've (temporarily) assumed that this machinery will consist
-- of some form of *stack-based computation engine* (because we're
-- following Hutton's example).
--
-- Note that we give the *specification* of the function in the first
-- (commented out) line of the definition. The derivation of the actual
-- correct definition from this specification is detailed in Ch. 17 of
-- Hutton's book.
eval' :: Exp -> Stack -> Stack
-- eval' e s = eval e : s -- our "specification"
eval' (Val n) s = push n s -- We're defining this one manually.
  where
    push :: SWord8 -> Stack -> Stack
    push n s = n .: s
eval' (Sum x y) s = add (eval' y (eval' x s))
  where
    add :: Stack -> Stack
    add = uninterpret "add" -- This is the function we're asking to be derived.
-- Now, let's just ask SBV to "solve" our specification of `eval'`:
spec :: Goal
spec = do x :: SWord8 <- forall "x"
          y :: SWord8 <- forall "y"
          -- Our spec., from above, specialized to the `Sum` case:
          constrain $ eval' (Sum (Val x) (Val y)) L.nil .== eval (Sum (Val x) (Val y)) .: L.nil
We get:
λ> :l test/sbv-stack.lhs
[1 of 1] Compiling Main ( test/sbv-stack.lhs, interpreted )
Ok, one module loaded.
Collecting type info for 1 module(s) ...
λ> sat spec
Unknown.
Reason: smt tactic failed to show goal to be sat/unsat (incomplete quantifiers)
What happened?!
Well, maybe asking SBV to solve for anything other than a predicate (i.e., a -> Bool) doesn't work?
The fundamental issue here is that you are mixing SMTLib's sequence logic and quantifiers. And the problem turns out to be too difficult for an SMT solver to handle. This sort of synthesis of functions is indeed possible if you restrict yourself to basic logics. (Bitvectors, Integers, Reals.) But adding sequences to the mix puts it into the undecidable fragment.
This doesn't mean z3 cannot synthesize your add function. Perhaps a future version might be able to handle it. But at this point you're at the mercy of heuristics. To see why, note that you're asking the solver to synthesize the following definition:
add :: Stack -> Stack
add s = v .: s''
  where (a, s')  = L.uncons s
        (b, s'') = L.uncons s'
        v        = a + b
while this looks rather innocent and simple, it requires capabilities beyond the current abilities of z3. In general, z3 can currently synthesize functions that only make a finite number of choices on concrete elements. But it is unable to do so if the output depends on input for every choice of input. (Think of it as a case-analysis producing engine: It can conjure up a function that maps certain inputs to others, but cannot figure out if something should be incremented or two things must be added. This follows from the work in finite-model finding theory, and is way beyond the scope of this answer! See here for details: https://arxiv.org/abs/1706.00096)
A better use case for SBV and SMT solving for this sort of problem is to actually tell it what the add function is, and then prove some given program is correctly "compiled" using Hutton's strategy. Note that I'm explicitly saying a "given" program: It would also be very difficult to model and prove this for an arbitrary program, but you can do this rather easily for a given fixed program. If you are interested in proving the correspondence for arbitrary programs, you really should be looking at theorem provers such as Isabelle, Coq, ACL2, etc.; which can deal with induction, a proof technique you will no doubt need for this sort of problem. Note that SMT solvers cannot perform induction in general. (You can use e-matching to simulate some induction like proofs, but it's a kludge at best and in general unmaintainable.)
Here's your example, coded to prove the \x -> \y -> x + y program is "correctly" compiled and executed with respect to reference semantics:
{-# LANGUAGE ScopedTypeVariables #-}
import Data.SBV
import qualified Data.SBV.List as L
import Data.SBV.List ((.:))
-- AST Definition
data Exp = Val SWord8
         | Sum Exp Exp
-- Our "Meaning" Function
eval :: Exp -> SWord8
eval (Val x) = x
eval (Sum x y) = eval x + eval y
-- Evaluation by "execution"
type Stack = SList Word8
run :: Exp -> SWord8
run e = L.head (eval' e L.nil)
  where eval' :: Exp -> Stack -> Stack
        eval' (Val n)   s = n .: s
        eval' (Sum x y) s = add (eval' y (eval' x s))

        add :: Stack -> Stack
        add s = v .: s''
          where (a, s')  = L.uncons s
                (b, s'') = L.uncons s'
                v        = a + b
correct :: IO ThmResult
correct = prove $ do x :: SWord8 <- forall "x"
                     y :: SWord8 <- forall "y"
                     let pgm     = Sum (Val x) (Val y)
                         spec    = eval pgm
                         machine = run pgm
                     return $ spec .== machine
When I run this, I get:
*Main> correct
Q.E.D.
And the proof takes almost no time. You can easily extend this by adding other operators, binding forms, function calls, the whole works if you like. So long as you stick to a fixed "program" for verification, it should work out just fine.
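For instance, extending the module above with a product node would only take additions like these (a sketch following the same pattern; Mul and mul are names I'm introducing here, not part of the original):
data Exp = Val SWord8
         | Sum Exp Exp
         | Mul Exp Exp                        -- new constructor

eval (Mul x y) = eval x * eval y              -- new "meaning" equation

-- and, inside run's where clause:
eval' (Mul x y) s = mul (eval' y (eval' x s))

mul :: Stack -> Stack
mul s = v .: s''
  where (a, s')  = L.uncons s
        (b, s'') = L.uncons s'
        v        = a * b
The corresponding prove call for, say, Mul (Val x) (Sum (Val y) (Val y)) should then go through just as before.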
If you make a mistake, let's say define add by subtraction (modify the last line of it to read v = a - b), you get:
*Main> correct
Falsifiable. Counter-example:
x = 32 :: Word8
y = 0 :: Word8
I hope this gives an idea of what the current capabilities of SMT solvers are and how you can put them to use in Haskell via SBV.
Program synthesis is an active research area with many custom techniques and tools. An out of the box use of an SMT-solver will not get you there. But if you do build such a custom system in Haskell, you can use SBV to access an underlying SMT solver to solve many constraints you'll have to handle during the process.
(Aside: An extended example, similar in spirit but with different goals, is shipped with the SBV package: https://hackage.haskell.org/package/sbv-8.5/docs/Documentation-SBV-Examples-Strings-SQLInjection.html. This program shows how to use SBV and SMT solvers to find SQL injection vulnerabilities in an idealized SQL implementation. That might be of some interest here, and would be more aligned with how SMT solvers are typically used in practice.)
Take this OCaml code:
let silly (g : (int -> int) -> int) (f : int -> int -> int) =
  g (f (print_endline "evaluated"; 0))
silly (fun _ -> 0) (fun x -> fun y -> x + y)
It prints evaluated and returns 0. But if I eta-expand f to get g (fun x -> f (print_endline "evaluated"; 0) x), evaluated is no longer printed.
Same holds for this SML code:
fun silly (g : (int -> int) -> int, f : int -> int -> int) : int =
g (f (print "evaluated" ; 0));
silly ((fn _ => 0), fn x => fn y => x + y);
On the other hand, this Haskell code doesn't print evaluated even with the strict pragma:
{-# LANGUAGE Strict #-}
import Debug.Trace
silly :: ((Int -> Int) -> Int) -> (Int -> Int -> Int) -> Int
silly g f = g (f (trace "evaluated" 0))
main = print $ silly (const 0) (+)
(I can make it print, though, by using seq, which makes perfect sense to me.)
While I understand that OCaml and SML do the right thing theoretically, is there any practical reason to prefer this behaviour to the "lazier" one? Eta-contraction is a common refactoring tool, and I'm totally scared of using it in a strict language. I feel like I should paranoidly eta-expand everything, just because otherwise arguments to partially applied functions can be evaluated when they're not supposed to be. When is the "strict" behaviour useful?
Why and how does Haskell behave differently under the Strict pragma? Are there any references I can familiarize myself with to better understand the design space and pros and cons of the existing approaches?
To address the technical part of your question: eta-conversion also changes the meaning of expressions in lazy languages; you just need to consider the eta-rule of a different type constructor, e.g., + instead of ->.
This is the eta-rule for binary sums:
(case e of Lft y -> f (Lft y) | Rgt y -> f (Rgt y)) = f e (eta-+)
This equation holds under eager evaluation, because e will always be reduced on both sides. Under lazy evaluation, however, the r.h.s. only reduces e if f also forces it. That might make the l.h.s. diverge where the r.h.s. would not. So the equation does not hold in a lazy language.
To make it concrete in Haskell:
f x = 0
lhs = case undefined of Left y -> f (Left y); Right y -> f (Right y)
rhs = f undefined
Here, trying to print lhs will diverge, whereas rhs yields 0.
There is more that could be said about this, but the essence is that the equational theories of both evaluation regimes are sort of dual.
The underlying problem is that under a lazy regime, every type is inhabited by _|_ (non-termination), whereas under an eager one it is not. That has severe semantic consequences. In particular, there are no inductive types in Haskell, and you cannot prove termination of a structurally recursive function, e.g., a list traversal.
There is a line of research in type theory distinguishing data types (strict) from codata types (non-strict) and providing both in a dual manner, thus giving the best of both worlds.
Edit: As for the question why a compiler should not eta-expand functions: that would utterly break every language. In a strict language with effects that's most obvious, because the ability to stage effects via multiple function abstractions is a feature. The simplest example perhaps is this:
let make_counter () =
  let x = ref 0 in
  fun () -> x := !x + 1; !x
let tick = make_counter ()
let n1 = tick ()
let n2 = tick ()
let n3 = tick ()
But effects are not the only reason. Eta-expansion can also drastically change the performance of a program! In the same way that you sometimes want to stage effects, you sometimes also want to stage work:
match :: String -> String -> Bool
match regex = \s -> run fsm s
  where fsm = ...expensive transformation of regex...

matchFloat = match "[0-9]+(\\.[0-9]*)?((e|E)(\\+|-)?[0-9]+)?"
Note that I used Haskell here, because this example shows that implicit eta-expansion is not desirable in either eager or lazy languages!
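The staging effect is observable without regexes, too. A self-contained sketch (the names here are made up; compile without optimizations, since at -O GHC's full-laziness pass may blur the difference):
import Debug.Trace (trace)

-- A stand-in for the expensive preprocessing step:
buildTable :: Int -> Int
buildTable n = trace "building table" (sum [1 .. n])

-- Staged: the table is built once per partial application and then
-- shared by every use of the resulting predicate.
smallerThanTable :: Int -> Int -> Bool
smallerThanTable n = \x -> x < table
  where table = buildTable n

main :: IO ()
main = do
  let p = smallerThanTable 1000
  print (map p [3, 499500, 10 ^ 9])  -- "building table" is traced only once
Eta-expanding it to smallerThanTable n x = x < buildTable n redoes the work on every call.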
With respect to your final question (why does Haskell do this), the reason "Strict Haskell" behaves differently from a truly strict language is that the Strict extension doesn't really change the evaluation model from lazy to strict. It just makes a subset of bindings into "strict" bindings by default, and only in the limited Haskell sense of forcing evaluation to weak head normal form. Also, it only affects bindings made in the module with the extension turned on; it doesn't retroactively affect bindings made elsewhere. (Moreover, as described below, the strictness doesn't take effect in partial function application. The function needs to be fully applied before any arguments are forced.)
In your particular Haskell example, I believe the only effect of the Strict extension is as if you had explicitly written the following bang patterns in the definition of silly:
silly !g !f = g (f (trace "evaluated" 0))
It has no other effect. In particular, it doesn't make const or (+) strict in their arguments, nor does it generally change the semantics of function applications to make them eager.
So, when the term silly (const 0) (+) is forced by print, the only effect is to evaluate its arguments to WHNF as part of the function application of silly. The effect is similar to writing (in non-Strict Haskell):
let { g = const 0; f = (+) } in g `seq` f `seq` silly g f
Obviously, forcing g and f to their WHNFs (which are lambdas) isn't going to have any side effect, and when silly is applied, const 0 is still lazy in its remaining argument, so the resulting term is something like:
(\x -> 0) ((\x y -> <defn of plus>) (trace "evaluated" 0))
(which should be interpreted without the Strict extension -- these are all lazy bindings here), and there's nothing here that will force the side effect.
As noted above, there's another subtle issue that this example glosses over. Even if you had made everything in sight strict:
{-# LANGUAGE Strict #-}
import Debug.Trace
myConst :: a -> b -> a
myConst x y = x
myPlus :: Int -> Int -> Int
myPlus x y = x + y
silly :: ((Int -> Int) -> Int) -> (Int -> Int -> Int) -> Int
silly g f = g (f (trace "evaluated" 0))
main = print $ silly (myConst 0) myPlus
this still wouldn't have printed "evaluated". This is because, in the evaluation of silly when the strict version of myConst forces its second argument, that argument is a partial application of the strict version of myPlus, and myPlus won't force any of its arguments until it's been fully applied.
This also means that if you change the definition of myPlus to:
myPlus x = \y -> x + y -- now it will print "evaluated"
then you'll be able to largely reproduce the ML behavior. Because myPlus is now fully applied, it will force its argument, and this will print "evaluated". You can suppress it again by eta-expanding f in the definition of silly:
silly g f = g (\x -> f (trace "evaluated" 0) x) -- now it won't
because now when myConst forces its second argument, that argument is already in WHNF (because it's a lambda), and we never get to the application of f, full or not.
In the end, I guess I wouldn't take "Haskell plus the Strict extension and unsafe side effects like trace" too seriously as a good point in the design space. Its semantics may be (barely) coherent, but they sure are weird. I think the only serious use case is when you have some code whose semantics "obviously" don't depend on lazy versus strict evaluation but where performance would be improved by a lot of forcing. Then, you can just turn on Strict for a performance boost without having to think too hard.
What difference does it make if I define a function with a lambda expression or without one when compiling the module with GHC:
f :: A -> B
f = \x -> ...
vs.
f :: A -> B
f x = ...
I think I read that it helps the compiler inline the function, but other than that, can it have an impact on my code if I change from the first to the second version?
I am trying to understand someone else's code and to see the reasoning behind defining this function in the first way and not the second.
To answer that question, I wrote a little program with both ways, and looked at the Core generated:
f1 :: Int -> Int
f1 = \x -> x + 2
{-# NOINLINE f1 #-}
f2 :: Int -> Int
f2 x = x + 2
{-# NOINLINE f2 #-}
I get the core by running ghc test.hs -ddump-simpl. The relevant part is:
f1_rjG :: Int -> Int
[GblId, Arity=1, Str=DmdType]
f1_rjG =
  \ (x_alH :: Int) -> + @ Int GHC.Num.$fNumInt x_alH (GHC.Types.I# 2)

f2_rlx :: Int -> Int
[GblId, Arity=1, Str=DmdType]
f2_rlx =
  \ (x_amG :: Int) -> + @ Int GHC.Num.$fNumInt x_amG (GHC.Types.I# 2)
The results are identical, so to answer your question: there is no impact from changing from one form to the other.
That being said, I recommend looking at leftaroundabout's answer, which deals about the cases where there actually is a difference.
First off, the second form is just more flexible (it allows you to do pattern matching, with other clauses below for alternative cases).
When there's only one clause, it's actually equivalent to a lambda... unless you have a where scope. Namely,
f = \x -> someCalculation x y
  where y = expensiveConstCalculation
is more efficient than
f x = someCalculation x y
  where y = expensiveConstCalculation
because in the latter, y is always recalculated when you evaluate f with a different argument. In the lambda form, y is re-used:
If the signature of f is monomorphic, then f is a constant applicative form, i.e. global constant. That means y is shared throughout your entire program, and only someCalculation needs to be re-done for each call of f. This is typically ideal performance-wise, though of course it also means that y keeps occupying memory.
If f is polymorphic, then it is in fact implicitly a function of the types you're using it with. That means you don't get global sharing, but if you write e.g. map f longList, then y still needs to be computed only once before getting mapped over the list.
That's the gist of the performance differences. Now, of course GHC can rearrange stuff and since it's guaranteed that the results are the same, it might always transform one form to the other if deemed more efficient. But normally it doesn't.
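To make the sharing difference observable, here is a minimal sketch (names made up; compile without optimizations, since at -O GHC may float the binding out and erase the difference):
import Debug.Trace (trace)

expensiveConstCalculation :: Int
expensiveConstCalculation = trace "computing y" (sum [1 .. 100000])

-- Lambda form: a constant applicative form, so y is shared globally.
fLambda :: Int -> Int
fLambda = \x -> x + y
  where y = expensiveConstCalculation

-- Clausal form: y is rebuilt for every application.
fClause :: Int -> Int
fClause x = x + y
  where y = expensiveConstCalculation

main :: IO ()
main = do
  print (fLambda 1, fLambda 2)  -- "computing y" is traced once
  print (fClause 1, fClause 2)  -- "computing y" is traced twice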
I want something like
f :: [forall m. (Mutable v) (PrimState m) r -> m ()] -> v r -> v r -- illegal signature
f gs x = runST $ do
    y <- thaw x
    foldM_ (\_ g -> g y) undefined gs -- you get the idea
    unsafeFreeze y
I'm essentially in the same position I was in this question where Vitus commented:
[I]f you want keep polymorphic functions inside some structure, you need either specialized data type (e.g. newtype I = I (forall a. a -> a)) or ImpredicativeTypes.
Also, see this question. The problem is, these are both really ugly solutions. So I've come up with a third alternative, which is to avoid the polymorphism altogether by running what "should" be a ST computation in IO instead. Thus f becomes:
f :: [(Mutable v) RealWorld r -> IO ()] -> v r -> v r
f gs x = unsafePerformIO $ do
    y <- thaw x
    foldM_ (\_ g -> g y) undefined gs -- you get the idea
    unsafeFreeze y
I feel slightly dirty for going the unsafe IO route compared to the "safe" ST route, but if my alternative is a wrapper or impredicative types... Apparently, I'm not alone.
Is there any reason I shouldn't use unsafePerformIO here? In this case, is it really unsafe at all? Are there performance considerations or anything else I should be aware of?
--------------EDIT----------------
An answer below shows me how to get around this problem altogether, which is great. But I'm still interested in the original question (implications of runST vs unsafePerformIO when using mutable vectors) for educational purposes.
I can't say I understand the problem statement completely yet, but the following file compiles without error under GHC 7.6.2. It has the same body as your first example (and in particular doesn't call unsafePerformIO at all); the primary difference is that the forall is moved outside of all type constructors.
{-# LANGUAGE RankNTypes #-}
import Control.Monad
import Control.Monad.Primitive (PrimState)
import Control.Monad.ST
import Data.Vector.Generic hiding (foldM_)
f :: Vector v r => (forall m. [Mutable v (PrimState m) r -> m ()]) -> v r -> v r
f gs x = runST $ do
    y <- thaw x
    foldM_ (\_ g -> g y) undefined gs
    unsafeFreeze y
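One caveat (my observation, not part of the original answer): with the signature above, the list elements must work at every monad m with no constraints at all, so actual mutation actions, which need PrimMonad m, can't be passed in. A sketch of a variant that adds the constraint, together with a hypothetical call (f', demo, and the write targets are all made-up names):
{-# LANGUAGE RankNTypes #-}
import Control.Monad (foldM_)
import Control.Monad.Primitive (PrimMonad, PrimState)
import Control.Monad.ST (runST)
import Data.Vector.Generic (Mutable, Vector, thaw, unsafeFreeze)
import qualified Data.Vector.Unboxed as U
import qualified Data.Vector.Unboxed.Mutable as MU

f' :: Vector v r => (forall m. PrimMonad m => [Mutable v (PrimState m) r -> m ()]) -> v r -> v r
f' gs x = runST $ do
    y <- thaw x
    foldM_ (\_ g -> g y) undefined gs
    unsafeFreeze y

-- Fill two slots of an unboxed vector:
demo :: U.Vector Int
demo = f' [\mv -> MU.write mv 0 42, \mv -> MU.write mv 2 7] (U.replicate 3 0)
-- demo == U.fromList [42,0,7]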
Now let's tackle the ST vs IO question. The reason it's called unsafePerformIO and not unusablePerformIO is that it comes with a proof burden that can't be checked by the compiler: the thing you are running unsafePerformIO on must behave as if it is referentially transparent. Since ST actions come with a (compiler-checked) proof that they behave transparently when executed with runST, this means there is no more danger in using unsafePerformIO on code that would typecheck in ST than there is in using runST.
BUT: there is danger from a software engineering standpoint. Since the proof is no longer compiler-checked, it's much easier for future refactoring to violate the conditions under which it's safe to use unsafePerformIO. So if it is possible to avoid it (as it seems to be here), you should take efforts to do so. (Additionally, "there is no more danger" doesn't mean "there is no danger": the unsafeFreeze call you are making has its own proof burden that you must satisfy; but then you already had to satisfy that proof burden for the ST code to be correct.)