No stream fusion with unsafeUpdate_ in unboxed vector

Is it possible to maintain stream fusion when processing a vector if the unsafeUpdate_ function is used to update some of its elements? In the test I did, the answer seems to be no. For the code below, a temporary vector is generated in the upd function, as confirmed in the Core:
module Main where
import Data.Vector.Unboxed as U
upd :: Vector Int -> Vector Int
upd v = U.unsafeUpdate_ v (U.fromList [0]) (U.fromList [2])
sum :: Vector Int -> Int
sum = U.sum . upd
main = print $ Main.sum $ U.fromList [1..3]
In the Core, the worker $wupd is called from sum, and as seen below, it allocates a new byte array:
$wupd :: Vector Int -> Vector Int
$wupd =
  \ (w :: Vector Int) ->
    case w `cast` ... of _ { Vector ipv ipv1 ipv2 ->
    case main11 `cast` ... of _ { Vector ipv3 ipv4 ipv5 ->
    case main7 `cast` ... of _ { Vector ipv6 ipv7 ipv8 ->
    runSTRep
      (\ (@ s) (s :: State# s) ->
         case >=# ipv1 0 of _ {
           False -> case main6 ipv1 of wild { };
           True ->
             case newByteArray# (*# ipv1 8) (s `cast` ...)
             of _ { (# ipv9, ipv10 #) ->
             case (copyByteArray# ipv2 (*# ipv 8) ipv10 0 (*# ipv1 8) ipv9)
                  `cast` ...
The Core for sum contains a nice, tight loop, but just before that loop there is a call to $wupd, and hence a temporary vector is allocated.
Is there a way to avoid the temporary in this example? The way I think about it, updating a vector at index i amounts to traversing a stream, acting only on the element at index i (passing the rest through unchanged), and replacing that element with another one. So updating a vector at an arbitrary position shouldn't break stream fusion, right?
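The intuition above can be written as an index-aware map over the elements. Below is a minimal sketch on plain lists, so it runs without the vector package. `updateAsMap` and `sumUpdated` are hypothetical names. With vector, I believe the analogous formulation is `U.imap (\i x -> if i == 0 then 2 else x) v`, which is built from stream combinators rather than from `clone`. Whether it actually fuses in a given program is still something to confirm in the Core.

```haskell
import Data.Maybe (fromMaybe)

-- Hypothetical helper, not part of the vector API: apply point updates in one
-- left-to-right pass, as a pure map over (index, element) pairs. Lookup in the
-- update list is linear, so this only suits a handful of updates; if an index
-- appears twice, the first matching entry wins.
updateAsMap :: [(Int, a)] -> [a] -> [a]
updateAsMap updates xs = zipWith pick [0 ..] xs
  where
    -- Keep the original element unless an update targets this index.
    pick i x = fromMaybe x (lookup i updates)

-- The question's pipeline: set the element at index 0 to 2, then sum.
sumUpdated :: [Int] -> Int
sumUpdated = sum . updateAsMap [(0, 2)]
```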

I can't be 100% sure, because with vector it's turtles all the way down (you never really reach the actual implementation, there's always another indirection), but as far as I understand it, the update variants force a new temporary through cloning:
unsafeUpdate_ :: (Vector v a, Vector v Int) => v a -> v Int -> v a -> v a
{-# INLINE unsafeUpdate_ #-}
unsafeUpdate_ v is w
  = unsafeUpdate_stream v (Stream.zipWith (,) (stream is) (stream w))
unsafeUpdate_stream :: Vector v a => v a -> Stream (Int,a) -> v a
{-# INLINE unsafeUpdate_stream #-}
unsafeUpdate_stream = modifyWithStream M.unsafeUpdate
and modifyWithStream calls clone (and new):
modifyWithStream :: Vector v a
                 => (forall s. Mutable v s a -> Stream b -> ST s ())
                 -> v a -> Stream b -> v a
{-# INLINE modifyWithStream #-}
modifyWithStream p v s = new (New.modifyWithStream p (clone v) s)
new :: Vector v a => New v a -> v a
{-# INLINE_STREAM new #-}
new m = m `seq` runST (unsafeFreeze =<< New.run m)
-- | Convert a vector to an initialiser which, when run, produces a copy of
-- the vector.
clone :: Vector v a => v a -> New v a
{-# INLINE_STREAM clone #-}
clone v = v `seq` New.create (
  do
    mv <- M.new (length v)
    unsafeCopy mv v
    return mv)
and I see no way that vector would get rid of that unsafeCopy again.

If you need to change one or very few elements, the repa and yarr libraries offer nice solutions. They preserve fusion (I'm not sure about repa) and are idiomatic Haskell.
Repa, using fromFunction:
upd arr = fromFunction (extent arr) ix
  where ix (Z :. 0) = 2
        ix i = index arr i
Yarr, using Delayed:
upd arr = Delayed (extent arr) (touchArray arr) (force arr) ix
  where ix 0 = return 2
        ix i = index arr i
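The sketch below shows why a delayed representation avoids the copy. It is a hypothetical, self-contained example in the spirit of repa's `fromFunction`, not repa or yarr code, and `Delayed`, `update` and `sumD` are made-up names. An array is just a length plus an index function. A point update only wraps that function, and nothing is materialised until the final fold.

```haskell
{-# LANGUAGE BangPatterns #-}
import Data.Array (listArray, (!))

-- Hypothetical minimal "delayed array": a length plus an index function.
data Delayed a = Delayed !Int (Int -> a)

-- Wrap a list; the backing Data.Array gives O(1) indexing.
fromList :: [a] -> Delayed a
fromList xs = Delayed n (arr !)
  where
    n = length xs
    arr = listArray (0, n - 1) xs

-- A point update composes with the index function; no elements are copied.
update :: Int -> a -> Delayed a -> Delayed a
update i x (Delayed n f) = Delayed n (\j -> if j == i then x else f j)

-- Consuming the array is a single strict loop over the indices.
sumD :: Num a => Delayed a -> a
sumD (Delayed n f) = go 0 0
  where
    go !i !acc
      | i >= n = acc
      | otherwise = go (i + 1) (acc + f i)

-- Materialise the elements, mainly for inspection.
toList :: Delayed a -> [a]
toList (Delayed n f) = map f [0 .. n - 1]
```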

Related

How do I parameterize a function by module in Haskell?

This might seem artificial, but I can't seem to find an obvious answer to the following:
Say I have the following imports:
import qualified Data.Map as M
import qualified Data.HashMap.Lazy as HML
Now I have some function (comp) that takes some list, does something, creates a map, returns it.
My question is: how can I have two ways of calling comp, so that its calls to (say) insert and size go to the right module?
As a strawman, I could write two copies of this function, one referencing M.insert and M.size, while the other references HML.insert and HML.size ... but how do I "pass the module as a parameter", or indicate this otherwise?
Thanks!
Edit: to make this less abstract, here are the exact definitions of comp:
mapComp :: KVPairs -> IO ()
mapComp kvpairs = do
  let init = M.empty
  let m = foldr ins init kvpairs where
        ins (k, v) t = M.insert k v t
  if M.size m /= length kvpairs
    then putStrLn $ "FAIL: " ++ show (M.size m) ++ ", " ++ show (length kvpairs)
    else pure ()
hashmapComp :: KVPairs -> IO()
hashmapComp kvpairs = do
  let init = HML.empty
  let m = foldr ins init kvpairs where
        ins (k, v) t = HML.insert k v t
  if HML.size m /= length kvpairs
    then putStrLn $ "Fail: " ++ show (HML.size m) ++ ", " ++ show (length kvpairs)
    else pure ()
Edit (2): this turned out to be way more interesting than I anticipated, thanks to everyone who responded!
Here's how to do it with module signatures and mixins (a.k.a. Backpack).
You would have to define a library (it could be an internal library) with a signature like:
-- file Mappy.hsig
signature Mappy where
class C k
data Map k v
empty :: Map k v
insert :: C k => k -> v -> Map k v -> Map k v
size :: Map k v -> Int
In the same library or in another one, write code that imports the signature as if it were a normal module:
module Stuff where
import qualified Mappy as M
type KVPairs k v = [(k,v)]
comp :: M.C k => KVPairs k v -> IO ()
comp kvpairs = do
  let init = M.empty
  let m = foldr ins init kvpairs where
        ins (k, v) t = M.insert k v t
  if M.size m /= length kvpairs
    then putStrLn $ "FAIL: " ++ show (M.size m) ++ ", " ++ show (length kvpairs)
    else pure ()
In another library (it must be a different one) write an "implementation" module that matches the signature:
-- file Mappy.hs
{-# language ConstraintKinds #-}
module Mappy (C,insert,empty,size,Map) where
import Data.Map.Lazy
type C = Ord
The "signature match" is performed based on names and types only, the implementation module doesn't need to know about the existence of the signature.
Then, in a library or executable in which you want to use the abstract code, pull both the library with the abstract code and the library with the implementation:
executable somexe
  main-is: Main.hs
  build-depends: base ^>=4.11.1.0,
                 indeflib,
                 lazyimpl
  default-language: Haskell2010
library indeflib
  exposed-modules: Stuff
  signatures: Mappy
  build-depends: base ^>=4.11.1.0
  hs-source-dirs: src
  default-language: Haskell2010
library lazyimpl
  exposed-modules: Mappy
  build-depends: base ^>=4.11.1.0,
                 containers >= 0.5
  hs-source-dirs: impl1
  default-language: Haskell2010
Sometimes the name of the signature and the name of the implementing module don't match; in that case you have to use the mixins section of the Cabal file.
Edit: creating the HashMap implementation proved somewhat tricky, because insert requires two constraints (Eq and Hashable) instead of one. I had to resort to the "class synonym" trick. Here's the code:
{-# language ConstraintKinds, FlexibleInstances, UndecidableInstances #-}
module Mappy (C,insert,HM.empty,HM.size,Map) where
import Data.Hashable
import qualified Data.HashMap.Strict as HM
type C = EqHash
class (Eq q, Hashable q) => EqHash q -- class synonym trick
instance (Eq q, Hashable q) => EqHash q
insert :: EqHash k => k -> v -> Map k v -> Map k v
insert = HM.insert
type Map = HM.HashMap
The simplest approach is to parameterize by the operations you actually need, rather than by the module. So:
mapComp ::
  m ->
  (K -> V -> m -> m) ->
  (m -> Int) ->
  KVPairs -> IO ()
mapComp empty insert size kvpairs = do
  let m = foldr ins empty kvpairs where
        ins (k, v) t = insert k v t
  if size m /= length kvpairs
    then putStrLn $ "FAIL: " ++ show (size m) ++ ", " ++ show (length kvpairs)
    else pure ()
You can then call it as, e.g., mapComp M.empty M.insert M.size or mapComp HM.empty HM.insert HM.size. As a small side benefit, callers can use this function even if their preferred data structure doesn't offer a module with exactly the right names and types, by writing small adapters and passing them in.
If you like, you can combine these into a single record to ease passing them around:
data MapOps m = MapOps
  { empty :: m
  , insert :: K -> V -> m -> m
  , size :: m -> Int
  }
mops = MapOps M.empty M.insert M.size
hmops = MapOps HM.empty HM.insert HM.size
mapComp :: MapOps m -> KVPairs -> IO ()
mapComp ops kvpairs = do
  let m = foldr ins (empty ops) kvpairs where
        ins (k, v) t = insert ops k v t
  if size ops m /= length kvpairs
    then putStrLn "Yikes!"
    else pure ()
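Here is a runnable sketch of the record-of-operations approach. It rests on a few assumptions. `K` and `V` are fixed to `Int` and `String`, because the question doesn't show `KVPairs`. `Data.IntMap` stands in for `HashMap`, so only `containers` is needed. The check is split out into a pure, hypothetical `allKeysDistinct` so that it can be tested.

```haskell
import qualified Data.IntMap.Strict as IM
import qualified Data.Map.Strict as M

-- Assumed concrete types; the question's K, V and KVPairs are not shown.
type K = Int
type V = String
type KVPairs = [(K, V)]

-- The operations comp needs, bundled as a record.
data MapOps m = MapOps
  { empty  :: m
  , insert :: K -> V -> m -> m
  , size   :: m -> Int
  }

mops :: MapOps (M.Map K V)
mops = MapOps M.empty M.insert M.size

imops :: MapOps (IM.IntMap V)
imops = MapOps IM.empty IM.insert IM.size

-- Pure core of comp: building the map keeps every pair iff keys are distinct.
allKeysDistinct :: MapOps m -> KVPairs -> Bool
allKeysDistinct ops kvpairs = size ops m == length kvpairs
  where
    m = foldr (\(k, v) t -> insert ops k v t) (empty ops) kvpairs

mapComp :: MapOps m -> KVPairs -> IO ()
mapComp ops kvpairs
  | allKeysDistinct ops kvpairs = pure ()
  | otherwise = putStrLn "Yikes!"
```

Swapping the map implementation is then just a matter of passing `mops` or `imops`.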
I'm afraid this isn't possible in Haskell without workarounds. The main problem is that comp would use different types for the same objects in the M and HML variants, which Haskell can't express directly.
You need to let comp know which option you are going to take, using either data or polymorphism.
As a basic idea, I would create an ADT covering the possible options and use a Boolean value to choose the module:
data SomeMap k v = M (M.Map k v) | HML (HML.HashMap k v)
f :: Bool -> IO ()
f shouldIUseM = do ...
Then use a case expression in foldr to check whether your underlying map is M or HML. However, I don't see much point in such bloated code; it would be much better to write compM and compHML separately.
Another approach is to create a typeclass that wraps all your cases:
class SomeMap m where
  empty :: m k v
  insert :: k -> v -> m k v -> m k v
  size :: m k v -> Int
Then write instances for each map manually (or with some Template Haskell magic, which I believe could help here, though it's beyond my skills). This also requires some boilerplate, but then you will be able to parameterize comp over the map type used:
comp :: SomeMap m => m k v -> IO ()
comp thisCouldBeEmptyInitMap = do ...
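As written, the class above cannot actually be instantiated for these maps. `Data.Map.insert` needs `Ord k`, and `HashMap`'s insert needs `Eq k` and `Hashable k`, but the class methods promise no constraint on `k`. One way around that is to fix the key type in the class. The sketch below assumes keys are always `Int`. `allKeysDistinct` is a hypothetical helper, and `IntMap` stands in for `HashMap`.

```haskell
{-# LANGUAGE FlexibleInstances #-}
import qualified Data.IntMap.Strict as IM
import qualified Data.Map.Strict as M

-- Simplified, hypothetical variant of the SomeMap class: keys are fixed to
-- Int so that both instances can be written without extra constraints.
class SomeMap m where
  empty  :: m v
  insert :: Int -> v -> m v -> m v
  size   :: m v -> Int

-- The head M.Map Int mentions a concrete type, hence FlexibleInstances.
instance SomeMap (M.Map Int) where
  empty = M.empty
  insert = M.insert
  size = M.size

instance SomeMap IM.IntMap where
  empty = IM.empty
  insert = IM.insert
  size = IM.size

-- The initial (meant to be empty) map picks the instance, much like
-- thisCouldBeEmptyInitMap above.
allKeysDistinct :: SomeMap m => m v -> [(Int, v)] -> Bool
allKeysDistinct initMap kvpairs = size built == length kvpairs
  where
    built = foldr (\(k, v) t -> insert k v t) initMap kvpairs

comp :: SomeMap m => m v -> [(Int, v)] -> IO ()
comp initMap kvpairs
  | allKeysDistinct initMap kvpairs = pure ()
  | otherwise = putStrLn "FAIL: duplicate keys"
```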
But honestly, I would write this function like this:
comp :: Bool -> IO ()
comp m = if m then fooM else fooHML
I'm a little suspicious this is an XY problem, so here's how I would address the code you linked to. You have the following:
mapComp :: KVPairs -> IO ()
mapComp kvpairs = do
  let init = M.empty
  let m = foldr ins init kvpairs where
        ins (k, v) t = M.insert k v t
  if M.size m /= length kvpairs
    then putStrLn $ "FAIL: " ++ show (M.size m) ++ ", " ++ show (length kvpairs)
    else pure ()
hashmapComp :: KVPairs -> IO()
hashmapComp kvpairs = do
  let init = HML.empty
  let m = foldr ins init kvpairs where
        ins (k, v) t = HML.insert k v t
  if HML.size m /= length kvpairs
    then putStrLn $ "Fail: " ++ show (HML.size m) ++ ", " ++ show (length kvpairs)
    else pure ()
This has a lot of repetition, which is usually not good. So we factor out the bits that are different between the two functions, and parameterize a new function by those changing bits:
-- didn't try to compile this
comp :: mp k v -> (k -> v -> mp k v -> mp k v) -> (mp k v -> Int) -> [(k, v)] -> IO ()
comp h_empty h_insert h_size kvpairs = do
  let init = h_empty
  let m = foldr ins init kvpairs where
        ins (k, v) t = h_insert k v t
  if h_size m /= length kvpairs
    then putStrLn $ "Fail: " ++ show (h_size m) ++ ", " ++ show (length kvpairs)
    else pure ()
As you can see this is a really mechanical process. Then you call e.g. comp M.empty M.insert M.size.
If you want to be able to define comp so that it works on map types you haven't thought of yet (or which your users will specify), then you must define comp against an abstract interface. This is done with typeclasses, as in radrow's SomeMap answer.
In fact, you can already do part of this abstraction by noticing that both maps you want to work with implement the standard Foldable and Monoid classes.
-- didn't try to compile this
comp :: (Foldable (mp k), Monoid (mp k v)) => (k -> v -> mp k v -> mp k v) -> [(k, v)] -> IO ()
comp h_insert kvpairs = do
  let init = mempty -- ...also why not just use `mempty` directly below:
  let m = foldr ins init kvpairs where
        ins (k, v) t = h_insert k v t
  if length m /= length kvpairs
    then putStrLn $ "Fail: " ++ show (length m) ++ ", " ++ show (length kvpairs)
    else pure ()
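Here is a compilable variant of the `Foldable`/`Monoid` version. The one change, which is my assumption rather than the answer's, is to abstract over a container `f :: * -> *` instead of `mp k`. That lets `Data.IntMap` fit alongside `Data.Map`. `allKeysDistinct` is a hypothetical helper name. Note that `length` on a map counts its values, i.e. its entries.

```haskell
{-# LANGUAGE FlexibleContexts #-}
import qualified Data.IntMap.Strict as IM
import qualified Data.Map.Strict as M

-- f is M.Map k, IM.IntMap, ...: Foldable f supplies length and
-- Monoid (f v) supplies mempty, so only insert still has to be passed in.
allKeysDistinct :: (Foldable f, Monoid (f v)) => (k -> v -> f v -> f v) -> [(k, v)] -> Bool
allKeysDistinct h_insert kvpairs = length m == length kvpairs
  where
    m = foldr (\(k, v) t -> h_insert k v t) mempty kvpairs

comp :: (Foldable f, Monoid (f v)) => (k -> v -> f v -> f v) -> [(k, v)] -> IO ()
comp h_insert kvpairs
  | allKeysDistinct h_insert kvpairs = pure ()
  | otherwise = putStrLn "Fail: duplicate keys"
```

Calling it is then `comp M.insert pairs` or `comp IM.insert pairs`.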
As mentioned in the comments, I think Backpack is (will be?) the way to get what I think you're asking for, i.e. parameterized modules. I don't know much about it, and it's not clear to me which use cases it solves where you wouldn't want the more traditional approach I've described above (maybe I'll read the wiki page).

Pattern matching in `Alternative`

I have a function that pattern matches on its arguments to produce a computation in StateT () Maybe (). This computation can fail when run, in which case I want the current pattern match branch to fail, so to speak.
I highly doubt it's possible to have something like
compute :: Int -> StateT () Maybe Int
compute = return
f :: Maybe Int -> Maybe Int -> StateT () Maybe ()
f (Just n1) (Just n2) = do
  m <- compute (n1 + n2)
  guard (m == 42)
f (Just n) _ = do
  m <- compute n
  guard (m == 42)
f _ (Just n) = do
  m <- compute n
  guard (m == 42)
behave the way I want: when the first computation fails, due to the guard or somewhere in compute, I want f to try the next pattern.
Obviously the above can't work, because StateT (as any other monad might) involves an additional parameter when expanded, so I probably can't formulate this as simple pattern guards.
The following does what I want, but it's ugly:
f' :: Maybe Int -> Maybe Int -> StateT () Maybe ()
f' a b = asum (map (\f -> f a b) [f1, f2, f3])
  where
    f1 a b = do
      Just n1 <- pure a
      Just n2 <- pure b
      m <- compute (n1 + n2)
      guard (m == 42)
    f2 a _ = do
      Just n <- pure a
      m <- compute n
      guard (m == 42)
    f3 _ b = do
      Just n <- pure b
      m <- compute n
      guard (m == 42)
A call like execStateT (f (Just 42) (Just 1)) () would fail for f but return Just () for f', because it matches f2.
How do I get the behavior of f' while keeping the elegant pattern matching of f, with as few auxiliary definitions as possible? Are there other, more elegant ways to formulate this?
Complete runnable example:
#! /usr/bin/env stack
-- stack --resolver=lts-11.1 script
import Control.Monad.Trans.State
import Control.Applicative
import Control.Monad
import Data.Foldable
compute :: Int -> StateT () Maybe Int
compute = return
f :: Maybe Int -> Maybe Int -> StateT () Maybe ()
f (Just n1) (Just n2) = do
  m <- compute (n1 + n2)
  guard (m == 42)
f (Just n) _ = do
  m <- compute n
  guard (m == 42)
f _ (Just n) = do
  m <- compute n
  guard (m == 42)
f' :: Maybe Int -> Maybe Int -> StateT () Maybe ()
f' a b = asum (map (\f -> f a b) [f1, f2, f3])
  where
    f1 a b = do
      Just n1 <- pure a
      Just n2 <- pure b
      m <- compute (n1 + n2)
      guard (m == 42)
    f2 a _ = do
      Just n <- pure a
      m <- compute n
      guard (m == 42)
    f3 _ b = do
      Just n <- pure b
      m <- compute n
      guard (m == 42)
main = do
  print $ execStateT (f (Just 42) (Just 1)) () -- Nothing
  print $ execStateT (f' (Just 42) (Just 1)) () -- Just (), because `f2` succeeded
Edit: this question has elicited quite a few clever answers so far, thanks! Unfortunately, they mostly overfit the particular code example I gave. In reality, I need something like this for unifying two expressions (let-bindings, to be precise): I want to try unifying the RHSs of two simultaneous lets if possible, and otherwise fall through to the cases where I handle let bindings one side at a time by floating them. So there's actually no clever structure on Maybe arguments to exploit, and I'm not actually computing on Int.
The answers so far might benefit others beyond the enlightenment they brought me though, so thanks!
Edit 2: Here's some compiling example code with probably bogus semantics:
module Unify (unify) where
import Control.Applicative
import Control.Monad.Trans.State.Strict
data Expr
  = Var String -- meta, free and bound vars
  | Let String Expr Expr
  -- ... more cases
  -- no Eq instance, fwiw
-- | If the two terms unify, return the most general unifier, i.e. a
-- substitution (`Map`) of meta variables for terms as an association
-- list.
unify :: [String] -> Expr -> Expr -> Maybe [(String, Expr)]
unify metaVars l r = execStateT (go [] [] l r) [] -- threads the current substitution as state
  where
    go locals floats (Var x) (Var y)
      | x == y = return ()
    go locals floats (Var x) (Var y)
      | lookup x locals == Just y = return ()
    go locals floats (Var x) e
      | x `elem` metaVars = tryAddSubstitution locals floats x e
    go locals floats e (Var y)
      | y `elem` metaVars = tryAddSubstitution locals floats y e
    -- case in point:
    go locals floats (Let x lrhs lbody) (Let y rrhs rbody) = do
      go locals floats lrhs rrhs -- try this one, fail current pattern branch if rhss don't unify
      -- if we get past the last statement, commit to this branch, whether or
      -- not the next statement fails
      go ((x,y):locals) floats lbody rbody
    -- try to float the let binding. terms mentioning a floated var might still
    -- unify with a meta var
    go locals floats (Let x rhs body) e = do
      go locals (Left (x,rhs):floats) body e
    go locals floats e (Let y rhs body) = do
      go locals (Right (y,rhs):floats) body e
    go _ _ _ _ = empty
    tryAddSubstitution = undefined -- magic
When I need something like this, I just use asum with the blocks inlined. Here I also condensed the multiple patterns Just n1 <- pure a; Just n2 <- pure b into one, (Just n1, Just n2) <- pure (a, b).
f :: Maybe Int -> Maybe Int -> StateT () Maybe ()
f a b = asum
  [ do
      (Just n1, Just n2) <- pure (a, b)
      m <- compute (n1 + n2)
      guard (m == 42)
  , do
      Just n <- pure a
      m <- compute n
      guard (m == 42)
  , do
      Just n <- pure b
      m <- compute n
      guard (m == 42)
  ]
You can also use chains of <|>, if you prefer:
f :: Maybe Int -> Maybe Int -> StateT () Maybe ()
f a b
  = do
      (Just n1, Just n2) <- pure (a, b)
      m <- compute (n1 + n2)
      guard (m == 42)
  <|> do
      Just n <- pure a
      m <- compute n
      guard (m == 42)
  <|> do
      Just n <- pure b
      m <- compute n
      guard (m == 42)
This is about as minimal as you can get for this kind of “fallthrough”.
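For reference, here is the `asum` variant as a self-contained program, with the question's `compute` and the imports it needs (from `transformers`). A failed pattern bind in a `do` block calls `fail`, which for `StateT () Maybe` yields `Nothing`. Each block therefore behaves like a pseudo-pattern that can fall through to the next one.

```haskell
import Control.Monad (guard)
import Control.Monad.Trans.State (StateT, execStateT)
import Data.Foldable (asum)

compute :: Int -> StateT () Maybe Int
compute = return

-- Each do block is one "pseudo-pattern"; a failed pattern bind or guard makes
-- that block empty, and asum moves on to the next one.
f :: Maybe Int -> Maybe Int -> StateT () Maybe ()
f a b = asum
  [ do
      (Just n1, Just n2) <- pure (a, b)
      m <- compute (n1 + n2)
      guard (m == 42)
  , do
      Just n <- pure a
      m <- compute n
      guard (m == 42)
  , do
      Just n <- pure b
      m <- compute n
      guard (m == 42)
  ]
```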
If you were using Maybe alone, you would be able to do this with pattern guards:
import Control.Monad
import Control.Applicative
ensure :: Alternative f => (a -> Bool) -> a -> f a
ensure p a = a <$ guard (p a)
compute :: Int -> Maybe Int
compute = return
f :: Maybe Int -> Maybe Int -> Maybe Int
f (Just m) (Just n)
  | Just x <- ensure (== 42) =<< compute (m + n)
  = return x
f (Just m) _
  | Just x <- ensure (== 42) =<< compute m
  = return x
f _ (Just n)
  | Just x <- ensure (== 42) =<< compute n
  = return x
f _ _ = empty
(ensure is a general purpose combinator. Cf. Lift to Maybe using a predicate)
As you have StateT on the top, though, you would have to supply a state in order to pattern match on Maybe, which would foul up everything. That being so, you are probably better off with something in the vein of your "ugly" solution. Here is a whimsical attempt at improving its looks:
import Control.Monad
import Control.Applicative
import Control.Monad.State
import Control.Monad.Trans
import Data.Foldable
ensure :: Alternative f => (a -> Bool) -> a -> f a
ensure p a = a <$ guard (p a)
compute :: Int -> StateT () Maybe Int
compute = return
f :: Maybe Int -> Maybe Int -> StateT () Maybe Int
f a b = asum (map (\c -> f' (c a b)) [liftA2 (+), const, flip const])
  where
    f' = ensure (== 42) <=< compute <=< lift
While this is an answer specific to the snippet I've given, the refactorings apply only to a limited extent to the code I was facing.
Perhaps it's not that far-fetched of an idea to extract the skeleton of the asum expression above to a more general combinator:
-- A better name would be welcome.
selector :: Alternative f => (a -> a -> a) -> (a -> f b) -> a -> a -> f b
selector g k x y = asum (fmap (\sel -> k (sel x y)) [g, const, flip const])
f :: Maybe Int -> Maybe Int -> StateT () Maybe Int
f = selector (liftA2 (+)) (ensure (== 42) <=< compute <=< lift)
Though it is perhaps a bit awkward of a combinator, selector does show the approach is more general than it might appear at first: the only significant restriction is that k has to produce results in some Alternative context.
P.S.: While writing selector with (<|>) instead of asum is arguably more tasteful...
selector g k x y = k (g x y) <|> k x <|> k y
... the asum version straightforwardly generalises to an arbitrary number of pseudo-patterns:
selector :: Alternative f => [a -> a -> a] -> (a -> f b) -> a -> a -> f b
selector gs k x y = asum (fmap (\g -> k (g x y)) gs)
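Below, the list-based `selector` is assembled with the `ensure` and `compute` definitions from above into one runnable module. The three choosers play the role of the question's three patterns. The import locations are my assumptions about where each name comes from; for instance, `lift` is taken from `transformers`' `Control.Monad.Trans.Class`.

```haskell
import Control.Applicative (Alternative, liftA2)
import Control.Monad (guard, (<=<))
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.State (StateT, evalStateT)
import Data.Foldable (asum)

ensure :: Alternative f => (a -> Bool) -> a -> f a
ensure p a = a <$ guard (p a)

compute :: Int -> StateT () Maybe Int
compute = return

-- Each chooser g combines or picks from the two inputs; the first
-- k (g x y) that succeeds wins.
selector :: Alternative f => [a -> a -> a] -> (a -> f b) -> a -> a -> f b
selector gs k x y = asum (fmap (\g -> k (g x y)) gs)

-- Choosers: both inputs summed, then the first alone, then the second alone.
f :: Maybe Int -> Maybe Int -> StateT () Maybe Int
f = selector [liftA2 (+), const, flip const] (ensure (== 42) <=< compute <=< lift)
```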
It looks like you could get rid of the whole pattern match by relying on the fact that Int forms a monoid with addition and 0 as the identity element (in Haskell, that instance is provided through the Sum newtype), and that Maybe a is a Monoid whenever a is a Semigroup, with Nothing as the identity. Then your function becomes:
f :: Maybe Int -> Maybe Int -> StateT () Maybe Int
f a b = lift (getSum <$> (Sum <$> a) <> (Sum <$> b)) >>= compute >>= \m -> m <$ guard (m == 42)
You could generalise by passing the predicate as an argument:
f :: (Int -> Bool) -> Maybe Int -> Maybe Int -> StateT () Maybe Int
f p a b = lift (getSum <$> (Sum <$> a) <> (Sum <$> b)) >>= compute >>= \m -> m <$ guard (p m)
Here compute keeps its Int -> StateT () Maybe Int type: lift moves the combined Maybe Int into StateT, and Sum and getSum come from Data.Monoid.
Edit: Taking into account your last edit, I find that if you spread your pattern matches into separate computations that may fail, then you can just write
f a b = f1 a b <|> f2 a b <|> f3 a b
  where f1 (Just a) (Just b) = compute (a + b) >>= check
        f1 _ _ = empty
        f2 (Just a) _ = compute a >>= check
        f2 _ _ = empty
        f3 _ (Just b) = compute b >>= check
        f3 _ _ = empty
        check x = guard (x == 42)

GHC Calling Convention for Sum Type Function Arguments

Does GHC ever unpack sum types when passing them to functions? For example, let's say that we have the following type:
data Foo
  = Foo1 {-# UNPACK #-} !Int {-# UNPACK #-} !Word
  | Foo2 {-# UNPACK #-} !Int
  | Foo3 {-# UNPACK #-} !Word
Then I define a function that is strict in its Foo argument:
consumeFoo :: Foo -> Int
consumeFoo x = case x of ...
At runtime, when I call consumeFoo, what can I expect to happen? The GHC calling convention is to pass arguments in registers (or on the stack once there are too many). I can see two ways that the argument passing could go:
(1) A pointer to a Foo on the heap is passed in as one argument.
(2) A three-argument representation of Foo is used: one argument represents the data constructor that was used, and the other two represent the possible Int and Word values in that constructor.
I would prefer the second representation, but I don't know if it is actually what happens. I am aware of UnpackedSumTypes landing in GHC 8.2, but it's unclear if it does what I want. If I had instead written the function as:
consumeFooAlt :: (# (# Int#, Word# #) | Int# | Word# #) -> Int
Then I would expect representation (2) to be what happens. And the Unpacking section of the unpacked sums page indicates that I could do this as well:
data Wrap = Wrap {-# UNPACK #-} !Foo
consumeFooAlt2 :: Wrap -> Int
And that should also have the representation I want, I think.
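For comparison, this is what consuming the raw unboxed sum looks like at the source level. It assumes GHC 8.2 or later with `UnboxedSums`. The question elides the body of `consumeFoo`, so the arithmetic here (adding up whichever fields are present) is a hypothetical stand-in, and `toUnboxed` is an illustrative name. The code only shows the shape. Whether a given call actually passes the fields in registers still has to be checked in the Core or STG output.

```haskell
{-# LANGUAGE MagicHash, UnboxedSums, UnboxedTuples #-}
import GHC.Exts (Int (I#), Int#, Word (W#), Word#, word2Int#, (+#))

data Foo
  = Foo1 {-# UNPACK #-} !Int {-# UNPACK #-} !Word
  | Foo2 {-# UNPACK #-} !Int
  | Foo3 {-# UNPACK #-} !Word

-- Alternative 1 carries both fields, alternative 2 the Int, 3 the Word.
toUnboxed :: Foo -> (# (# Int#, Word# #) | Int# | Word# #)
toUnboxed (Foo1 (I# i) (W# w)) = (# (# i, w #) | | #)
toUnboxed (Foo2 (I# i)) = (# | i | #)
toUnboxed (Foo3 (W# w)) = (# | | w #)

-- Hypothetical body: add up whatever fields are present.
consumeFooAlt :: (# (# Int#, Word# #) | Int# | Word# #) -> Int
consumeFooAlt s = case s of
  (# (# i, w #) | | #) -> I# (i +# word2Int# w)
  (# | i | #) -> I# i
  (# | | w #) -> I# (word2Int# w)

consumeFoo :: Foo -> Int
consumeFoo x = consumeFooAlt (toUnboxed x)
```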
So my question is: without using a wrapper type or a raw unpacked sum, how can I guarantee that a sum is unpacked into registers (or onto the stack) when I pass it as an argument to a function? If it is possible, is it something that GHC 8.0 can already do, or will it only be available in GHC 8.2?
First: guaranteed optimizations and GHC don't mix well. Because the language is so high-level, it is very hard to predict the code that GHC will generate in every case. The only way to be sure is to look at the Core. If you're developing an extremely performance-sensitive application with GHC, then you need to become familiar with Core.
I am not aware of any optimization in GHC that does exactly what you describe. Here is an example program:
module Test where
data Sum = A {-# UNPACK #-} !Int | B {-# UNPACK #-} !Int
consumeSum :: Sum -> Int
consumeSum x = case x of
  A y -> y + 1
  B y -> y + 2
{-# NOINLINE consumeSumNoinline #-}
consumeSumNoinline = consumeSum
{-# INLINE produceSumInline #-}
produceSumInline :: Int -> Sum
produceSumInline x = if x == 0 then A x else B x
{-# NOINLINE produceSumNoinline #-}
produceSumNoinline :: Int -> Sum
produceSumNoinline x = if x == 0 then A x else B x
test :: Int -> Int
--test x = consumeSum (produceSumInline x)
test x = consumeSumNoinline (produceSumNoinline x)
Let's first look at what happens if we inline neither consumeSum nor produceSum. Here is the Core:
test :: Int -> Int
test = \ (x :: Int) -> consumeSumNoinline (produceSumNoinline x)
(produced with ghc-core test.hs -- -dsuppress-unfoldings -dsuppress-idinfo -dsuppress-module-prefixes -dsuppress-uniques)
Here, we can see that GHC (8.0 in this case) does not unbox the sum type passed as a function argument. Nothing changes if we inline either consumeSum or produceSum.
If we inline both, however, the following code is generated:
test :: Int -> Int
test =
  \ (x :: Int) ->
    case x of _ { I# x1 ->
    case x1 of wild1 {
      __DEFAULT -> I# (+# wild1 2#);
      0# -> lvl1
    }
    }
What happened here is that through inlining, GHC ends up with:
\x -> case (if x == 0 then A x else B x) of
        A y -> y + 1
        B y -> y + 2
Through the case-of-case transformation (if is just a special case of case), this is turned into:
\x -> if x == 0 then case (A x) of ... else case (B x) of ...
Now that is a case with a known constructor, so GHC can reduce the case at compile time ending up with:
\x -> if x == 0 then x + 1 else x + 2
So it completely eliminated the constructor.
In summary, I believe that GHC does not have any concept of an "unboxed sum" type prior to version 8.2, which also applies to function arguments. The only way to get "unboxed" sums is to get the constructor eliminated completely through inlining.
If you need such an optimization, your simplest solution is to do it yourself.
I think there are actually many ways to achieve this, but one is:
data Which = Left | Right | Both
data Foo = Foo Which Int Word
The unpacking of any fields of this type is completely irrelevant to the question of the 'shape of the representation', which is what you are really asking about. Enumerations are already highly optimized - only one value for every constructor is ever created - so the addition of this field doesn't affect performance. The unpacked representation of this type is precisely what you want - one word for Which constructor and one for each field.
If you write your functions in the proper way, you get the proper code:
data Which = Lft | Rgt | Both
data Foo = Foo Which {-# UNPACK #-} !Int {-# UNPACK #-} !Word
consumeFoo :: Foo -> Int
consumeFoo (Foo w l r) =
  case w of
    Lft -> l
    Rgt -> fromIntegral r
    Both -> l + fromIntegral r
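A pair of conversions is enough to check that the flat encoding loses no information. In this sketch `Sum3`, with constructors `Both3`, `IntOnly` and `WordOnly`, is a hypothetical name for the original three-constructor sum type. Unused fields are filled with 0, and the `Which` tag records which fields are meaningful.

```haskell
-- Hypothetical original sum type, renamed to avoid clashing with Foo.
data Sum3
  = Both3 {-# UNPACK #-} !Int {-# UNPACK #-} !Word
  | IntOnly {-# UNPACK #-} !Int
  | WordOnly {-# UNPACK #-} !Word
  deriving (Eq, Show)

data Which = Lft | Rgt | Both deriving (Eq, Show)
data Foo = Foo Which {-# UNPACK #-} !Int {-# UNPACK #-} !Word deriving (Eq, Show)

-- Unused fields get 0; the tag says which ones carry data.
toFlat :: Sum3 -> Foo
toFlat (Both3 l r) = Foo Both l r
toFlat (IntOnly l) = Foo Lft l 0
toFlat (WordOnly r) = Foo Rgt 0 r

fromFlat :: Foo -> Sum3
fromFlat (Foo Both l r) = Both3 l r
fromFlat (Foo Lft l _) = IntOnly l
fromFlat (Foo Rgt _ r) = WordOnly r

consumeFoo :: Foo -> Int
consumeFoo (Foo w l r) =
  case w of
    Lft -> l
    Rgt -> fromIntegral r
    Both -> l + fromIntegral r
```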
The generated core is quite obvious:
consumeFoo :: Foo -> Int
consumeFoo =
  \ (ds :: Foo) ->
    case ds of _ { Foo w dt dt1 ->
    case w of _ {
      Lft -> I# dt;
      Rgt -> I# (word2Int# dt1);
      Both -> I# (+# dt (word2Int# dt1))
    }
    }
However, for simple programs such as:
consumeFoos = foldl' (+) 0 . map consumeFoo
This optimization makes no difference. As is indicated in the other answer, the inner function consumeFoo is just inlined:
Rec {
$wgo :: [Foo] -> Int# -> Int#
$wgo =
  \ (w :: [Foo]) (ww :: Int#) ->
    case w of _ {
      [] -> ww;
      : y ys ->
        case y of _ {
          Lft dt -> $wgo ys (+# ww dt);
          Rgt dt -> $wgo ys (+# ww (word2Int# dt));
          Both dt dt1 -> $wgo ys (+# ww (+# dt (word2Int# dt1)))
        }
    }
end Rec }
vs.
Rec {
$wgo :: [Foo] -> Int# -> Int#
$wgo =
  \ (w :: [Foo]) (ww :: Int#) ->
    case w of _ {
      [] -> ww;
      : y ys ->
        case y of _ { Foo w1 dt dt1 ->
        case w1 of _ {
          Lft -> $wgo ys (+# ww dt);
          Rgt -> $wgo ys (+# ww (word2Int# dt1));
          Both -> $wgo ys (+# ww (+# dt (word2Int# dt1)))
        }
        }
    }
end Rec }
Which, in almost every case when working with low-level, unpacked data, is the goal anyway, as most of your functions are small and cost little to inline.

Haskell HashTable help rewrite using State monad

So, here is my clumsy code implementing a chained hash table in Haskell.
{-# LANGUAGE FlexibleInstances #-}
import Data.Array(Array(..), array, bounds, elems, (//), (!))
import Data.List(foldl')
import Data.Char
import Control.Monad.State
class HashTranform a where
  hashPrepare :: a -> Integer
instance HashTranform Integer where
  hashPrepare = id
instance HashTranform String where
  hashPrepare cs = fromIntegral (foldl' (flip ((+) . ord)) 0 cs)
divHashForSize :: (HashTranform a) => Integer -> a -> Integer
divHashForSize sz k = 1 + (hashPrepare k) `mod` sz
type Chain k v = [(k, v)]
chainWith :: (Eq k) => Chain k v -> (k, v) -> Chain k v
chainWith cs p@(k, v) = if (null after) then p:cs else before ++ p:(tail after)
  where (before, after) = break ((== k) . fst) cs
chainWithout :: (Eq k) => Chain k v -> k -> Chain k v
chainWithout cs k = filter ((/= k) . fst) cs
data Hash k v = Hash {
    hashFunc :: (k -> Integer)
  , chainTable :: Array Integer (Chain k v)
  }
--type HState k v = State (Hash k v)
instance (Show k, Show v) => Show (Hash k v) where
  show = show . concat . elems . chainTable
type HashFuncForSize k = Integer -> k -> Integer
createHash :: HashFuncForSize k -> Integer -> Hash k v
createHash hs sz = Hash (hs sz) (array (1, sz) [(i, []) | i <- [1..sz]])
withSlot :: Hash k v -> k -> (Chain k v -> Chain k v) -> Hash k v
withSlot h k op
  | rows < hashed = h
  | otherwise = Hash hf (ht // [(hashed, op (ht!hashed))])
  where hf = hashFunc h
        ht = chainTable h
        rows = snd (bounds ht)
        hashed = hf k
insert' :: (Eq k) => Hash k v -> (k, v) -> Hash k v
insert' h p@(k, v) = withSlot h k (flip chainWith p)
delete' :: (Eq k) => Hash k v -> k -> Hash k v
delete' h k = withSlot h k (flip chainWithout k)
insert :: (Eq k) => Hash k v -> Chain k v -> Hash k v
insert src pairs = foldl' insert' src pairs
delete :: (Eq k) => Hash k v -> [k] -> Hash k v
delete src keys = foldl' delete' src keys
search :: (Eq k) => k -> Hash k v -> Maybe v
search k h
  | rows < hashed = Nothing
  | otherwise = k `lookup` (ht!hashed)
  where hf = hashFunc h
        ht = chainTable h
        rows = snd (bounds ht)
        hashed = hf k
The problem is that I don't want to have to write code like this:
new = intHash `insert` [(1112, "uygfd"), (211, "catdied")]
new' = new `delete` [1112]
I believe this can be restructured with the State monad somehow, but having read online tutorials I couldn't quite grasp how exactly it's done.
So could you show me how to implement insert, delete or search (even just one of them) as an illustration?
At the end of the day your "state" will be a Hash k v. Let's break the interface functions into two groups. First are "state dependent" functions like search k which has a type like Hash k v -> _ (where _ just means "something"). Second are the "state updating" functions like flip insert (k, v) and flip delete ks which have types like Hash k v -> Hash k v.
As you've noted, you can already simulate "state" by manually passing around the Hash k v argument. The State monad is nothing more than type magic to make that easier.
If you look at Control.Monad.State you'll see modify :: (s -> s) -> State s () and gets :: (s -> a) -> State s a. These functions transform your "state updating" and "state dependent" functions into "State monad actions". So now we can write a combined State monad action like so
deleteIf :: (Eq k) => (v -> Bool) -> k -> State (Hash k v) ()
deleteIf predicate k = do
  v <- gets $ search k
  case fmap predicate v of
    Nothing -> return ()
    Just False -> return ()
    Just True -> modify $ flip delete [k]
and then we can sequence together larger computations
computation = deleteIf (>0) 'a' >> deleteIf (>0) 'b'
and then execute them by "running" the State monad
runState computation (createHash f 100)
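Here is a self-contained sketch of the same idea for insert, delete and search. To keep it short, `Table` is a hypothetical association-list stand-in for the question's `Hash k v`, and `insertS`, `deleteS` and `searchS` are made-up names. Over the real `Hash`, the wrappers would look the same, e.g. `modify (\h -> insert h [(k, v)])` and `gets (search k)`. `State`, `modify`, `gets` and `evalState` come from `transformers`' `Control.Monad.Trans.State`.

```haskell
import Control.Monad.Trans.State (State, evalState, gets, modify)

-- Hypothetical stand-in for Hash k v: a plain association list.
newtype Table k v = Table [(k, v)] deriving Show

-- Pure "state updating" and "state dependent" functions.
insertT :: Eq k => (k, v) -> Table k v -> Table k v
insertT (k, v) (Table kvs) = Table ((k, v) : filter ((/= k) . fst) kvs)

deleteT :: Eq k => k -> Table k v -> Table k v
deleteT k (Table kvs) = Table (filter ((/= k) . fst) kvs)

searchT :: Eq k => k -> Table k v -> Maybe v
searchT k (Table kvs) = lookup k kvs

-- State wrappers: modify lifts updates, gets lifts queries.
insertS :: Eq k => k -> v -> State (Table k v) ()
insertS k v = modify (insertT (k, v))

deleteS :: Eq k => k -> State (Table k v) ()
deleteS k = modify (deleteT k)

searchS :: Eq k => k -> State (Table k v) (Maybe v)
searchS k = gets (searchT k)

-- The table is threaded implicitly; no intermediate names like new, new'.
example :: State (Table Integer String) (Maybe String, Maybe String)
example = do
  insertS 1112 "uygfd"
  insertS 211 "catdied"
  deleteS 1112
  gone <- searchS 1112
  kept <- searchS 211
  return (gone, kept)
```

Running it with `evalState example (Table [])` returns the pair of lookups.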

Chaining Haskell Functions in data types

Let's say I have the following:
data FuncAndValue v res = FuncAndValue (v -> res) v
chain :: (res -> new_res) -> FuncAndValue v res -> FuncAndValue v new_res
chain new_f (FuncAndValue old_f v) = FuncAndValue (new_f . old_f) v
Is GHC likely to be able to combine functions new_f and old_f into a single function through inlining?
Basically, does storing functions in data types in any way inhibit optimizations?
I'd like GHC to be able to easily compose chains of functions into one (i.e. so that a "sum" on my structure doesn't involve repeated calls to a thunk that represents (+), but instead just inlines the (+) so it runs like a for loop). I'm hoping storing functions in data types and then accessing them later doesn't inhibit this.
Is GHC likely to be able to combine functions new_f and old_f into a single function through inlining?
Yes, if it could do the same without the intervening FuncAndValue. Of course the unfoldings of the functions need to be available, or there wouldn't be any chance of inlining anyway. But if there is a chance, wrapping the function(s) in a FuncAndValue makes little difference if any.
But let's ask GHC itself. First the type and a very simple chaining:
module FuncAndValue where
data FuncAndValue v res = FuncAndValue (v -> res) v
infixr 7 `chain`
chain :: (res -> new_res) -> FuncAndValue v res -> FuncAndValue v new_res
chain new_f (FuncAndValue old_f v) = FuncAndValue (new_f . old_f) v
apply :: FuncAndValue v res -> res
apply (FuncAndValue f x) = f x
trivia :: FuncAndValue Int (Int,Int)
trivia = FuncAndValue (\x -> (2*x - 1, 3*x + 2)) 1
composed :: FuncAndValue Int Int
composed = chain (uncurry (+)) trivia
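An aside not made in the answer: `chain` has exactly the type and behaviour of `fmap` for `FuncAndValue v`, so it can serve as a `Functor` instance. The sketch restates the definitions so that it is self-contained. `fmap id` and `fmap (g . h)` obey the functor laws because they reduce to the corresponding identity and associativity laws for `(.)`.

```haskell
data FuncAndValue v res = FuncAndValue (v -> res) v

chain :: (res -> new_res) -> FuncAndValue v res -> FuncAndValue v new_res
chain new_f (FuncAndValue old_f v) = FuncAndValue (new_f . old_f) v

apply :: FuncAndValue v res -> res
apply (FuncAndValue f x) = f x

-- chain is post-composition on the stored function, i.e. fmap.
instance Functor (FuncAndValue v) where
  fmap = chain

trivia :: FuncAndValue Int (Int, Int)
trivia = FuncAndValue (\x -> (2 * x - 1, 3 * x + 2)) 1

composed :: FuncAndValue Int Int
composed = uncurry (+) <$> trivia
```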
and (the interesting part of) the core we get for trivia and composed:
FuncAndValue.trivia1 =
  \ (x_af2 :: GHC.Types.Int) ->
    (case x_af2 of _ { GHC.Types.I# y_agp ->
     GHC.Types.I# (GHC.Prim.-# (GHC.Prim.*# 2 y_agp) 1)
     },
     case x_af2 of _ { GHC.Types.I# y_agp ->
     GHC.Types.I# (GHC.Prim.+# (GHC.Prim.*# 3 y_agp) 2)
     })
FuncAndValue.composed2 =
  \ (x_agg :: GHC.Types.Int) ->
    case x_agg of _ { GHC.Types.I# y_agp ->
    GHC.Types.I#
      (GHC.Prim.+#
         (GHC.Prim.-# (GHC.Prim.*# 2 y_agp) 1)
         (GHC.Prim.+# (GHC.Prim.*# 3 y_agp) 2))
    }
Inlined, fair enough, with no (.) to be seen. The two cases from trivia have been joined, so composed has only one. Unless somebody teaches GHC enough algebra to simplify \x -> (2*x-1) + (3*x+2) to \x -> 5*x + 1, that's as good as you can hope for. apply composed is reduced to 6 at compile time, even in a separate module.
But that was very simple, let's give it a somewhat harder nut to crack.
An inlinable version of until (the current definition of until is recursive, so GHC doesn't inline it),
module WWUntil where
wwUntil :: (a -> Bool) -> (a -> a) -> a -> a
wwUntil p f = recur
  where
    recur x
      | p x = x
      | otherwise = recur (f x)
Another simple function in its own module:
module CollatzStep where
import Data.Bits ((.&.), unsafeShiftR)
collatzStep :: Int -> Int
collatzStep n
  | n .&. 1 == 0 = n `unsafeShiftR` 1
  | otherwise = 3*n + 1
and finally, the nut
module Hailstone (collatzLength, hailstone) where
import FuncAndValue
import CollatzStep
import WWUntil
data P = P {-# UNPACK #-} !Int {-# UNPACK #-} !Int
fstP :: P -> Int
fstP (P x _) = x
sndP :: P -> Int
sndP (P _ y) = y
hailstone :: Int -> FuncAndValue Int Int
hailstone n = sndP `chain` wwUntil ((== 1) . fstP) (\(P n k) -> P (collatzStep n) (k+1))
                   `chain` FuncAndValue (\x -> P x 0) n
collatzLength :: Int -> Int
collatzLength = apply . hailstone
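To check the results without setting up four modules, here is a single-file version. This is my consolidation, with the lambda variable renamed to avoid shadowing `n`. The answer deliberately split the code across modules to exercise cross-module inlining, so this version is for checking `collatzLength` values, not for reproducing that Core. It assumes `n >= 1`: 0 is a fixed point of `collatzStep`, so the loop would never reach 1.

```haskell
import Data.Bits ((.&.), unsafeShiftR)

data FuncAndValue v res = FuncAndValue (v -> res) v

infixr 7 `chain`
chain :: (res -> new_res) -> FuncAndValue v res -> FuncAndValue v new_res
chain new_f (FuncAndValue old_f v) = FuncAndValue (new_f . old_f) v

apply :: FuncAndValue v res -> res
apply (FuncAndValue f x) = f x

-- Non-recursive wrapper with a local loop, so GHC can inline wwUntil itself.
wwUntil :: (a -> Bool) -> (a -> a) -> a -> a
wwUntil p f = recur
  where
    recur x
      | p x = x
      | otherwise = recur (f x)

collatzStep :: Int -> Int
collatzStep n
  | n .&. 1 == 0 = n `unsafeShiftR` 1
  | otherwise = 3 * n + 1

-- Strict pair: current value and number of steps taken so far.
data P = P {-# UNPACK #-} !Int {-# UNPACK #-} !Int

fstP, sndP :: P -> Int
fstP (P x _) = x
sndP (P _ y) = y

hailstone :: Int -> FuncAndValue Int Int
hailstone n =
  sndP `chain` wwUntil ((== 1) . fstP) (\(P m k) -> P (collatzStep m) (k + 1))
    `chain` FuncAndValue (\x -> P x 0) n

-- Number of Collatz steps needed to reach 1 (assumes n >= 1).
collatzLength :: Int -> Int
collatzLength = apply . hailstone
```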
I have helped the strictness analyser a bit by using a strict pair. With the vanilla (,) the second component would be unboxed and reboxed after adding 1 in each step, and I just can't bear such waste ;) But otherwise there's no relevant difference.
And (the interesting part of) the core GHC generates:
Rec {
Hailstone.$wrecur [Occ=LoopBreaker]
  :: GHC.Prim.Int#
     -> GHC.Prim.Int# -> (# GHC.Prim.Int#, GHC.Prim.Int# #)
[GblId, Arity=2, Caf=NoCafRefs, Str=DmdType LL]
Hailstone.$wrecur =
  \ (ww_sqq :: GHC.Prim.Int#) (ww1_sqr :: GHC.Prim.Int#) ->
    case ww_sqq of wild_Xm {
      __DEFAULT ->
        case GHC.Prim.word2Int#
               (GHC.Prim.and# (GHC.Prim.int2Word# wild_Xm) (__word 1))
        of _ {
          __DEFAULT ->
            Hailstone.$wrecur
              (GHC.Prim.+# (GHC.Prim.*# 3 wild_Xm) 1) (GHC.Prim.+# ww1_sqr 1);
          0 ->
            Hailstone.$wrecur
              (GHC.Prim.uncheckedIShiftRA# wild_Xm 1) (GHC.Prim.+# ww1_sqr 1)
        };
      1 -> (# 1, ww1_sqr #)
    }
end Rec }
lvl_rsz :: GHC.Types.Int -> GHC.Types.Int
[GblId, Arity=1, Caf=NoCafRefs]
lvl_rsz =
  \ (x_iog :: GHC.Types.Int) ->
    case x_iog of _ { GHC.Types.I# tpl1_B4 ->
    case Hailstone.$wrecur tpl1_B4 0 of _ { (# _, ww2_sqH #) ->
    GHC.Types.I# ww2_sqH
    }
    }
and that's exactly what you get without FuncAndValue. Everything inlined nicely, a beautiful tight loop.
Basically, does storing functions in data types in anyway inhibit optimizations.
If you wrap the function under enough layers, yes. But it's the same with other values.
