I have the following code, which has been stripped down and is I think as minimal as possible that has some very odd behaviour.
The code consists of two source files:
One to define some data:
module MyFunction where
data MyFunction =
MyFunction {
functionNumber :: Int,
functionResult :: IO String
}
makeMyFunction :: Show a => Int -> IO a -> MyFunction
makeMyFunction number result = MyFunction {
functionNumber = number,
functionResult = result >>= return . show }
And the other is Main:
module Main (main) where
import System.CPUTime (getCPUTime)
import Data.List (foldl')
import Data.Foldable (foldlM)
import Control.Monad (foldM)
import MyFunction
exampleFunction = do
--let x = foldl' (\a b -> a `seq` (a + b)) 0 [1..20000000] -- This works
--x <- foldlM (\a b -> a `seq` return (a + b)) 0 [1..20000000] -- This works (*)
x <- foldM (\a b -> a `seq` return (a + b)) 0 [1..20000000] -- This doesn't
print x
return ()
runFunction fn = do
result <- functionResult fn
duration <- getCPUTime
if result /= "()"
then putStrLn ""
else return ()
putStrLn (show (fromIntegral duration / (10^9)) ++ "ms")
return fn
main = do
runFunction (makeMyFunction 123 exampleFunction)
return ()
The code as above (compiled using GHC 7.10.3 with stack 1.0.0 with default flags) has a rapid increase in memory usage (exceeding 1GB), and takes typically 3.3 seconds.
If I make a changes to the code, for example:
Use one of the commented alternatives to the problem line
Take out any line from runFunction
The memory usage will remain minimal, and takes only about 1 second.
One feature that I think is most surprising to me is that replacing foldM with foldlM (which as far as I know foldM = foldlM) fixes the problem.
Also making changes to code that I don't see has any relationship to the problem lines of code also fixes the problem. For example removing the last putStrLn.
Another oddity is that if I merge the MyFunction module into the Main module, while it doesn't fix the problem, it actually causes foldlM to behave as foldM using excessive memory.
In the real code that this came from, I have a large number exampleFunctions, and there is significantly more Main code, and every so often I encounter this sort of unexplained memory usage from functions, that can usually be resolved by some sort of voodoo.
I'm looking for an explanation for the behaviour. If I know why this occurs I can then look into avoiding it. Could this be a compiler issue, or maybe just a misunderstanding on my part?
(*) I've highlighted the secondary issue that causes the same memory growth to occur with foldlM.
Here is foldlM from Foldable.hs (ghc)
-- | Monadic fold over the elements of a structure,
-- associating to the left, i.e. from left to right.
foldlM :: (Foldable t, Monad m) => (b -> a -> m b) -> b -> t a -> m b
foldlM f z0 xs = foldr f' return xs z0
where f' x k z = f z x >>= k
and foldM from Monad.hs
foldM :: (Foldable t, Monad m) => (b -> a -> m b) -> b -> t a -> m b
{-# INLINEABLE foldM #-}
{-# SPECIALISE foldM :: (a -> b -> IO a) -> a -> [b] -> IO a #-}
{-# SPECIALISE foldM :: (a -> b -> Maybe a) -> a -> [b] -> Maybe a #-}
foldM = foldlM
I placed these definitions to a separate module Test and tested the execution with and without INLINEABLE / SPESIALISE lines. Whatever the reason is, leaving out the SPECIALISE directives helped and the execution time and memory usage was like with foldlM.
After a little bit more digging, removing line
{-# SPECIALISE foldM :: (a -> b -> IO a) -> a -> [b] -> IO a #-}
effected the most.
Related
An initial value (x), function (f), a predicate (p) and a time bound (t) is given. I want to apply 'f' repeatedly on x till it satisfies the 'p'. But on the same time wanna make sure it doesn't exceeds time limit. If time exceeds 't', it should return the partial result i.e., a pair of a number 'n' and value of applying 'f' n-times on 'x', for the largest n for which it actually performed the computation.
If partial result condition is relaxed, this can be easily programmed as -
import System.Timeout
iter :: a -> (a -> a) -> (a -> Bool) -> Int -> IO (Maybe (Int, a))
iter x f p t = do
let fs = x:(map f fs)
timeout t $ return $! head $ filter (\x -> p $ snd x) $ zip [1..] fs
I want it to have signature similar to -
iter :: a -> (a -> a) -> (a -> Bool) -> Int -> IO (Either (Int, a) (Int, a))
With Left for partial result and Right for complete result.
An silly and trivial example use of above function is -
*Main> iter 1 (+2) (> 1000000) 1000000
Just (500001,1000001)
*Main> iter 1 (+2) (> 1000000) 100000
Nothing
I want the second call to return partial computed result. Is there a simple way to do it?
More practical examples can be Newton–Raphson method or gradient descent.
I believe it is simplest to delegate such tasks to libraries with better abstraction capability than that offered by base, e.g. async:
{-# LANGUAGE BangPatterns #-}
import Control.Concurrent.Async
import Control.Concurrent
iter :: a -> (a -> a) -> (a -> Bool) -> Int -> IO (Either (Int, a) (Int, a))
iter z f p maxt = do
o <- newMVar (0, z)
let loop old#(!i,x) = do
modifyMVar_ o (const $ return old)
if p x then return old else loop (i+1, f x)
race (threadDelay maxt >> readMVar o) (loop (0, z))
race takes two IO actions and returns whichever completes first, killing the other. The left action completes only if the maximum time has elapsed, and that thread is able to read the MVar. Since the other thread holds the MVar for a short period of time (while it is writing the result) the worker will never be interrupted while writing the result.
Note also that the only thing that forces the chain of applications f $ f $ f.. is the predicate p - if you pass a lazy function (e.g. const False) then this will not work as you want. In practice there are few cases when you would use such a function (esp. with numeric computing), so it likely won't be of much concern. But in this case loop does no actual work and builds a ridiculously large number amount of applications:
>iter 2 (\x -> x * x) (const False) (10^6)
Left (472190,Interrupted.
My computer will never be able to print this result because it has 6.8×10^142142 digits. However:
>iter 2 (\x -> x * x) (<0) (10^6)
Left (24,Interrupted
This is a small number with only about 5,000,000 digits.
I believe your best bet is to use an IORef to keep track of the current computation state. Even if the computation is interrupted, its side effects won't be undone. Be sure to use just one IORef for both the counter and the current value; otherwise they could go out of sync in case of a timeout.
I'm not at all experienced with asynchronous code in Haskell, so take this with a grain of salt, but it seems to work.
{-# LANGUAGE BangPatterns #-}
module IterTimeout where
import Control.Applicative
import Data.IORef
import System.Timeout
iter :: a -> (a -> a) -> (a -> Bool) -> Int -> IO (Either (Int, a) (Int, a))
iter x f p t = do
ref <- newIORef (0, x)
result <- timeout t (iterStep f p ref)
maybe (Left <$> readIORef ref) (return . Right) result
iterStep :: (a -> a) -> (a -> Bool) -> IORef (Int, a) -> IO (Int, a)
iterStep f p ref = go
where
go = do
old#(!oldCount, oldVal) <- readIORef ref
if p oldVal
then return old
else writeIORef ref (oldCount + 1, f oldVal) >> go
I'm trying to parse dates such as 09/10/2015 17:20:52:
{-# LANGUAGE FlexibleContexts #-}
import Text.Parsec
import Text.Parsec.String
import Text.Read
import Control.Applicative hiding (many, (<|>))
data Day = Day
{ mo :: Int
, dy :: Int
, yr :: Int
} deriving (Show)
data Time = Time
{ hr :: Int
, min :: Int
, sec :: Int
} deriving (Show)
day = listUncurry Day <$> (sepCount 3 (char '/') $ read <$> many digit)
time = listUncurry Time <$> (sepCount 3 (char ':') $ dign 2 )
dign :: (Stream s m Char, Read b) => Int -> ParsecT s u m b
dign = (read <$>) . flip count digit
-- how generalize to n?
listUncurry h [x1,x2,x3] = h x1 x2 x3
sepCount n sep p = (:) <$> p <*> (count (n-1) $ sep *> p)
I have a hunch that some kind of zipWithN would generalize listUncurry. Maybe some kind of foldl ($)?
As a side question (out of curiosity), can parsec parsers be used generatively?
Actually, you only need Functor:
listUncurry :: Functor f => (a -> a -> a -> r) -> f [a] -> f r
listUncurry h p =
(\[x, y, z] -> h x y z) <$> p
To me, a hint that only Functor is necessary is when you have a code pattern like:
do x <- m
return (f ...)
This is equivalent to
m >>= (\x -> return (f ...))
which is the same as
fmap (\x -> f ...) m
This is because the monad laws imply this identity:
fmap f xs = xs >>= return . f
Polyvariadic listUncurry
I don't really recommend this in most circumstances since it turns what would be compile time errors into runtime errors, but this is how you could implement a polyvariadic listUncurry:
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE FlexibleInstances #-}
class ListUncurry a x r where
listUncurry :: a -> [x] -> r
instance ListUncurry k a r => ListUncurry (a -> k) a r where
listUncurry f (x:xs) = listUncurry (f x) xs
listUncurry _ _ = error "listUncurry: Too few arguments given"
instance ListUncurry r a r where
listUncurry r [] = r
listUncurry _ _ = error "listUncurry: Too many arguments given"
You will need a lot of explicit type annotations if you use it too. There is probably a way to use a type family or functional dependency to help with that, but I can't think of it off the top of my head at the moment. Since that is probably solvable (to an extent at least), in my mind the bigger problem is the type errors being changed from compile time errors to runtime errors.
Sample usage:
ghci> listUncurry ord ['a'] :: Int
97
ghci> listUncurry ((==) :: Int -> Int -> Bool) [1,5::Int] :: Bool
False
ghci> listUncurry ((==) :: Char -> Char -> Bool) ['a'] :: Bool
*** Exception: listUncurry: Too few arguments given
ghci> listUncurry ((==) :: Char -> Char -> Bool) ['a','b','c'] :: Bool
*** Exception: listUncurry: Too many arguments given
A safer listUncurry
If you change the class to
class ListUncurry a x r where
listUncurry :: a -> [x] -> Maybe r
and change the error cases in the instances appropriately, you will at least get a better interface to handling the errors. You could also replace the Maybe with a type that differentiates between the "too many" and "too few" argument errors if you wanted to retain that information.
I feel that this would be a bit better of an approach, although you will need to add a bit more error handling (Maybe's Functor, Applicative and Monad interfaces will make this fairly nice though).
Comparing the two approaches
It ultimately depends on what sort of error this would represent. If the program execution can no longer continue in any meaningful way if it runs into such an error, then the first approach (or something like it) might be more appropriate than the second. If there is any way to recover from the error, the second approach would be better than the first.
Whether or not a polyvariadic technique should be used in the first place is a different question. It might be better to restructure the program to avoid the additional complexity of the polyvariadic stuff.
also i'm sure i shouldn't be snocing a list -- what's the right way to do this?
The following implementation of sepCount is more efficient:
-- | #sepCount n sep p# applies #n# (>=1) occurrences of #p#,
-- separated by #sep#. Returns a list of the values returned by #p#.
sepCount n sep p = p <:> count (n - 1) (sep *> p)
where (<:>) = liftA2 (:)
I need an list of biased, random booleans. Each boolean needs to have the same probability of being True (Bernoulli distributed). These booleans are passed to a function, which generates zero or more output booleans per input boolean. I need an infinite list, because I don't know in advance how many booleans are required to provide enough output. See the below (simplified) code:
import System.Random.MWC
import System.Random.MWC.Distributions
foo :: [Bool] -> [Bool] -- foo outputs zero or more Bools per input Bool
main = do
gen <- create
bits <- sequence . repeat $ bernoulli 0.25 gen
print . take 32 . foo $ bits
Unfortunately, this code just hangs at the second line of main. I guess that there is something non-lazy happening somewhere with Control.Monad.ST?
(I would be able to do something like this with System.Random.randoms, but the resulting values don't have the required distributions.)
Can I fix this while keep using the System.Random.MWC library? Or does this require me to switch to alternative implementations?
The mwc-random package provides two PrimMonad instances, one for IO and another for ST s. As long as an ST computation is parameterized over all state tags s, we can run the computation and extract the value with runST :: (forall s. ST s a) -> a. By itself this wouldn't be very useful since we'd lose the state: the seed of the random generator, but mwc-random also provides explicit ways to handle the seeds:
save :: PrimMonad m => Gen (PrimState m) -> m Seed
restore :: PrimMonad m => Seed -> m (Gen (PrimState m))
We can use these to make a computation that generates a stream of values from a computation that generates a single value, as long as the generator is in forall s. ST s.
{-# LANGUAGE RankNTypes #-}
{-# LANGUAGE ScopedTypeVariables #-}
import System.Random.MWC
import Control.Monad.ST
import System.Random.MWC.Distributions
randomStream :: forall s a. (forall s. GenST s -> ST s a) -> GenST s -> ST s [a]
randomStream item = go
where
go :: forall s. GenST s -> ST s [a]
go gen = do
x <- item gen
seed <- save gen
return (x:runST (restore seed >>= go))
With this we can write your example as
main = do
bits <- withSystemRandom (randomStream (bernoulli 0.25))
print . take 32 $ bits
We can actually build generators more sophisticated than using the same generator for each item in the stream. We could thread a state along the stream so that each value can depend on the results of the previous ones.
unfoldStream :: forall s a b. (forall s. b -> GenST s -> ST s (a, b)) -> b -> GenST s -> ST s [a]
unfoldStream item = go
where
go :: forall s. b -> GenST s -> ST s [a]
go b gen = do
(x,b') <- item b gen
seed <- save gen
return (x:runST (restore seed >>= go b'))
The following example stream has results that increase in likelihood every time the result is False.
import Control.Monad.Primitive
interesting :: (PrimMonad m) => Double -> Gen (PrimState m) -> m (Bool, Double)
interesting p gen = do
result <- bernoulli p gen
let p' = if result then p else p + (1-p)*0.25
return (result, p')
main = do
bits <- withSystemRandom (unfoldStream interesting 0)
print . take 32 $ bits
The culprit is sequence . repeat - this will hang for (almost?) every monad, since you must perform a potentially infinite number of effects.
The simplest solutions would be to use a different library - which may not be possible if you are relying on the quality of the numbers produced from mwc-random. The next simplest solution is to rewrite foo to have type [IO Bool] -> IO [Bool] and pass it repeat (bernoulli 0.25 gen) - this would allow foo to make the choice of when to stop executing the effects produced by the infinite list. But having your logic inside of IO is not very nice.
The standard trick when you need an infinite list of random numbers is to use a pure function f :: StdGen -> (Result, StdGen). Then unfoldr (Just . f) :: StdGen -> [Result], and the output is an infinite list. At first glance, it may appear that mwc-random only has monadic functions, and that there is no pure interface. However, that is not the case, because ST s is an instance of PrimMonad. You also have the functions converting a Gen to a Seed. Using these, you can get a pure RNG function for any monadic one:
{-# LANGUAGE RankNTypes #-}
import System.Random.MWC
import System.Random.MWC.Distributions
import Control.Monad.ST
import Data.List
pureRand :: (forall s . GenST s -> ST s t) -> Seed -> (t, Seed)
pureRand f s = runST $ do
s' <- restore s
r <- f s'
s'' <- save s'
return (r, s'')
pureBernoulli :: Double -> Seed -> (Bool, Seed)
pureBernoulli a = pureRand (bernoulli a)
foo :: [Bool] -> [Bool]
foo = id
main = do
gen <- create >>= save
let bits = unfoldr (Just . pureBernoulli 0.25) gen
print . take 32 . foo $ bits
It is unfortunate that mwc-random doesn't expose this sort of interface by default but it is pretty easy to get to.
The other option is slightly more scary - use unsafe functions.
import System.IO.Unsafe
repeatM rand = go where
go = do
x <- rand
xs <- unsafeInterleaveIO go
return (x : xs)
main2 = do
gen <- create
bits <- repeatM (bernoulli 0.25 gen)
print . take 32 . foo $ bits
Naturally this comes with the usual caveats surrounding unsafe - use it only if you are exceedingly inconvenienced by the pure functions. unsafeInterleaveIO may reorder or never execute effects - if foo, for example, ignores one element, it will never be computed and the corresponding effect of updating the state stored in gen may not happen. For example, the following will print nothing:
snd <$> ((,) <$> unsafeInterleaveIO (putStrLn "Hello") <*> return ())
Is it possible to have a function that takes a foreign function call where some of the foreign function's arguments are CString and return a function that accepts String instead?
Here's an example of what I'm looking for:
foreign_func_1 :: (CDouble -> CString -> IO())
foreign_func_2 :: (CDouble -> CDouble -> CString -> IO ())
externalFunc1 :: (Double -> String -> IO())
externalFunc1 = myFunc foreign_func_1
externalFunc2 :: (Double -> Double -> String -> IO())
externalFunc2 = myFunc foreign_func_2
I figured out how to do this with the C numeric types. However, I can't figure out a way to do it that can allow string conversion.
The problem seems to be fitting in IO functions, since everything that converts to CStrings such as newCString or withCString are IO.
Here is what the code looks like to just handle converting doubles.
class CConvertable interiorArgs exteriorArgs where
convertArgs :: (Ptr OtherIrrelevantType -> interiorArgs) -> exteriorArgs
instance CConvertable (IO ()) (Ptr OtherIrrelevantType -> IO ()) where
convertArgs = doSomeOtherThingsThatArentCausingProblems
instance (Real b, Fractional a, CConvertable intArgs extArgs) => CConvertable (a->intArgs) (b->extArgs) where
convertArgs op x= convertArgs (\ctx -> op ctx (realToFrac x))
Is it possible to have a function that takes a foreign function call where some of the foreign function's arguments are CString and return a function that accepts String instead?
Is it possible, you ask?
<lambdabot> The answer is: Yes! Haskell can do that.
Ok. Good thing we got that cleared up.
Warming up with a few tedious formalities:
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE UndecidableInstances #-}
Ah, it's not so bad though. Look, ma, no overlaps!
The problem seems to be fitting in IO functions, since everything that converts to CStrings such as newCString or withCString are IO.
Right. The thing to observe here is that there are two somewhat interrelated matters with which to concern ourselves: A correspondence between two types, allowing conversions; and any extra context introduced by performing a conversion. To deal with this fully, we'll make both parts explicit and shuffle them around appropriately. We also need to take heed of variance; lifting an entire function requires working with types in both covariant and contravariant position, so we'll need conversions going in both directions.
Now, given a function we wish to translate, the plan goes something like this:
Convert the function's argument, receiving a new type and some context.
Defer the context onto the function's result, to get the argument how we want it.
Collapse redundant contexts where possible
Recursively translate the function's result, to deal with multi-argument functions
Well, that doesn't sound too difficult. First, explicit contexts:
class (Functor f, Cxt t ~ f) => Context (f :: * -> *) t where
type Collapse t :: *
type Cxt t :: * -> *
collapse :: t -> Collapse t
This says we have a context f, and some type t with that context. The Cxt type function extracts the plain context from t, and Collapse tries to combine contexts if possible. The collapse function lets us use the result of the type function.
For now, we have pure contexts, and IO:
newtype PureCxt a = PureCxt { unwrapPure :: a }
instance Context IO (IO (PureCxt a)) where
type Collapse (IO (PureCxt a)) = IO a
type Cxt (IO (PureCxt a)) = IO
collapse = fmap unwrapPure
{- more instances here... -}
Simple enough. Handling various combinations of contexts is a bit tedious, but the instances are obvious and easy to write.
We'll also need a way to determine the context given a type to convert. Currently the context is the same going in either direction, but it's certainly conceivable for it to be otherwise, so I've treated them separately. Thus, we have two type families, supplying the new outermost context for an import/export conversion:
type family ExpCxt int :: * -> *
type family ImpCxt ext :: * -> *
Some example instances:
type instance ExpCxt () = PureCxt
type instance ImpCxt () = PureCxt
type instance ExpCxt String = IO
type instance ImpCxt CString = IO
Next up, converting individual types. We'll worry about recursion later. Time for another type class:
class (Foreign int ~ ext, Native ext ~ int) => Convert ext int where
type Foreign int :: *
type Native ext :: *
toForeign :: int -> ExpCxt int ext
toNative :: ext -> ImpCxt ext int
This says that two types ext and int are uniquely convertible to each other. I realize that it might not be desirable to always have only one mapping for each type, but I didn't feel like complicating things further (at least, not right now).
As noted, I've also put off handling recursive conversions here; probably they could be combined, but I felt it would be clearer this way. Non-recursive conversions have simple, well-defined mappings that introduce a corresponding context, while recursive conversions need to propagate and merge contexts and deal with distinguishing recursive steps from the base case.
Oh, and you may have noticed by now the funny wiggly tilde business going on up there in the class contexts. That indicates a constraint that the two types must be equal; in this case it ties each type function to the opposite type parameter, which gives the bidirectional nature mentioned above. Er, you probably want to have a fairly recent GHC, though. On older GHCs, this would need functional dependencies instead, and would be written as something like class Convert ext int | ext -> int, int -> ext.
The term-level conversion functions are pretty simple--note the type function application in their result; application is left-associative as always, so that's just applying the context from the earlier type families. Also note the cross-over in names, in that the export context comes from a lookup using the native type.
So, we can convert types that don't need IO:
instance Convert CDouble Double where
type Foreign Double = CDouble
type Native CDouble = Double
toForeign = pure . realToFrac
toNative = pure . realToFrac
...as well as types that do:
instance Convert CString String where
type Foreign String = CString
type Native CString = String
toForeign = newCString
toNative = peekCString
Now to strike at the heart of the matter, and translate whole functions recursively. It should come as no surprise that I've introduced yet another type class. Actually, two, as I've separated import/export conversions this time.
class FFImport ext where
type Import ext :: *
ffImport :: ext -> Import ext
class FFExport int where
type Export int :: *
ffExport :: int -> Export int
Nothing interesting here. You may be noticing a common pattern by now--we're doing roughly equal amounts of computing at both the term and type level, and we're doing them in tandem, even to the point of mimicking names and expression structure. This is pretty common if you're doing type-level calculation for things involving real values, since GHC gets fussy if it doesn't understand what you're doing. Lining things up like this reduces headaches significantly.
Anyway, for each of these classes, we need one instance for each possible base case, and one for the recursive case. Alas, we can't easily have a generic base case, due to the usual bothersome nonsense with overlapping. It could be done using fundeps and type equality conditionals, but... ugh. Maybe later. Another option would be to parameterize the conversion function by a type-level number giving the desired conversion depth, which has the downside of being less automatic, but gains some benefit from being explicit as well, such as being less likely to stumble on polymorphic or ambiguous types.
For now, I'm going to assume that every function ends with something in IO, since IO a is distinguishable from a -> b without overlap.
First, the base case:
instance ( Context IO (IO (ImpCxt a (Native a)))
, Convert a (Native a)
) => FFImport (IO a) where
type Import (IO a) = Collapse (IO (ImpCxt a (Native a)))
ffImport x = collapse $ toNative <$> x
The constraints here assert a specific context using a known instance, and that we have some base type with a conversion. Again, note the parallel structure shared by the type function Import and term function ffImport. The actual idea here should be pretty obvious--we map the conversion function over IO, creating a nested context of some sort, then use Collapse/collapse to clean up afterwards.
The recursive case is similar, but more elaborate:
instance ( FFImport b, Convert a (Native a)
, Context (ExpCxt (Native a)) (ExpCxt (Native a) (Import b))
) => FFImport (a -> b) where
type Import (a -> b) = Native a -> Collapse (ExpCxt (Native a) (Import b))
ffImport f x = collapse $ ffImport . f <$> toForeign x
We've added an FFImport constraint for the recursive call, and the context wrangling has gotten more awkward because we don't know exactly what it is, merely specifying enough to make sure we can deal with it. Note also the contravariance here, in that we're converting the function to native types, but converting the argument to a foreign type. Other than that, it's still pretty simple.
Now, I've left out some instances at this point, but everything else follows the same patterns as the above, so let's just skip to the end and scope out the goods. Some imaginary foreign functions:
foreign_1 :: (CDouble -> CString -> CString -> IO ())
foreign_1 = undefined
foreign_2 :: (CDouble -> SizedArray a -> IO CString)
foreign_2 = undefined
And conversions:
imported1 = ffImport foreign_1
imported2 = ffImport foreign_2
What, no type signatures? Did it work?
> :t imported1
imported1 :: Double -> String -> [Char] -> IO ()
> :t imported2
imported2 :: Foreign.Storable.Storable a => Double -> AsArray a -> IO [Char]
Yep, that's the inferred type. Ah, that's what I like to see.
Edit: For anyone who wants to try this out, I've taken the full code for the demonstration here, cleaned it up a bit, and uploaded it to github.
This can be done with template haskell. In many ways it is simpler than the
alternatives involving classes, since it is easier pattern match on
Language.Haskell.TH.Type than do the same thing with instances.
{-# LANGUAGE TemplateHaskell #-}
-- test.hs
import FFiImport
import Foreign.C
foreign_1 :: CDouble -> CString -> CString -> IO CString
foreign_2 :: CDouble -> CString -> CString -> IO (Int,CString)
foreign_3 :: CString -> IO ()
foreign_1 = undefined; foreign_2 = undefined; foreign_3 = undefined
fmap concat (mapM ffimport ['foreign_1, 'foreign_2, 'foreign_3])
Inferred types of the generated functions are:
imported_foreign_1 :: Double -> String -> String -> IO String
imported_foreign_2 :: Double -> String -> String -> IO (Int, String)
imported_foreign_3 :: String -> IO ()
Checking the generated code by loading test.hs with -ddump-splices (note that
ghc still seems to miss some parentheses in the pretty printing) shows that
foreign_2 writes a definition which after some prettying up looks like:
imported_foreign_2 w x y
= (\ (a, b) -> ((return (,) `ap` return a) `ap` peekCString b) =<<
join
(((return foreign_2 `ap`
(return . (realToFrac :: Double -> CDouble)) w) `ap`
newCString x) `ap`
newCString y))
or translated to do notation:
imported_foreign_2 w x y = do
w2 <- return . (realToFrac :: Double -> CDouble) w
x2 <- newCString x
y2 <- newCString y
(a,b) <- foreign_2 w2 x2 y2
a2 <- return a
b2 <- peekCString b
return (a2,b2)
Generating code the first way is simpler in that there are less variables to
track. While foldl ($) f [x,y,z] doesn't type check when it would mean
((f $ x) $ y $ z) = f x y z
it's acceptable in template haskell which involves only a handful of different
types.
Now for the actual implementation of those ideas:
{-# LANGUAGE TemplateHaskell #-}
-- FFiImport.hs
module FFiImport(ffimport) where
import Language.Haskell.TH; import Foreign.C; import Control.Monad
-- a couple utility definitions
-- args (a -> b -> c -> d) = [a,b,c]
args (AppT (AppT ArrowT x) y) = x : args y
args _ = []
-- result (a -> b -> c -> d) = d
result (AppT (AppT ArrowT _) y) = result y
result y = y
-- con (IO a) = IO
-- con (a,b,c,d) = TupleT 4
con (AppT x _) = con x
con x = x
-- conArgs (a,b,c,d) = [a,b,c,d]
-- conArgs (Either a b) = [a,b]
conArgs ty = go ty [] where
go (AppT x y) acc = go x (y:acc)
go _ acc = acc
The splice $(ffimport 'foreign_2) looks at the type of foreign_2 with reify to
decide on which functions to apply to the arguments or result.
-- Possibly useful to parameterize based on conv'
ffimport :: Name -> Q [Dec]
ffimport n = do
VarI _ ntype _ _ <- reify n
let ty :: [Type]
ty = args ntype
let -- these define conversions
-- (ffiType, (hsType -> IO ffiType, ffiType -> IO hsType))
conv' :: [(TypeQ, (ExpQ, ExpQ))]
conv' = [
([t| CString |], ([| newCString |],
[| peekCString |])),
([t| CDouble |], ([| return . (realToFrac :: Double -> CDouble) |],
[| return . (realToFrac :: CDouble -> Double) |]))
]
sequenceFst :: Monad m => [(m a, b)] -> m [(a,b)]
sequenceFst x = liftM (`zip` map snd x) (mapM fst x)
conv' <- sequenceFst conv'
-- now conv' :: [(Type, (ExpQ, ExpQ))]
Given conv' above, it's somewhat straightforward to apply those functions when
the types match. The back case would be shorter if converting components of
returned tuples wasn't important.
let conv :: Type -- ^ type of v
-> Name -- ^ variable to be converted
-> ExpQ
conv t v
| Just (to,from) <- lookup t conv' =
[| $to $(varE v) |]
| otherwise = [| return $(varE v) |]
-- | function to convert result types back, either
-- occuring as IO a, IO (a,b,c) (for any tuple size)
back :: ExpQ
back
| AppT _ rty <- result ntype,
TupleT n <- con rty,
n > 0, -- for whatever reason $(conE (tupleDataName 0))
-- doesn't work when it could just be $(conE '())
convTup <- map (maybe [| return |] snd .
flip lookup conv')
(conArgs rty)
= do
rs <- replicateM n (newName "r")
lamE [tupP (map varP rs)]
[| $(foldl (\f x -> [| $f `ap` $x |])
[| return $(conE (tupleDataName n)) |]
(zipWith (\c r -> [| $c $(varE r)|]) convTup rs))
|]
| AppT _ nty <- result ntype,
Just (_,from) <- nty `lookup` conv' = from
| otherwise = [| return |]
Finally, put both parts together in a function definition:
vs <- replicateM (length ty) (newName "v")
liftM (:[]) $
funD (mkName $ "imported_"++nameBase n)
[clause
(map varP vs)
(normalB [| $back =<< join
$(foldl (\x y -> [| $x `ap` $y |])
[| return $(varE n) |]
(zipWith conv ty vs))
|])
[]]
Here's a horrible two typeclass solution. The first part (named, unhelpfully, foo) will take things of types like Double -> Double -> CString -> IO () and turn them into things like IO (Double -> IO (Double -> IO (String -> IO ()))). So each conversion is forced into IO just to keep things fully uniform.
The second part, (named cio for "collapse io) will take those things and shove all the IO bits to the end.
class Foo a b | a -> b where
foo :: a -> b
instance Foo (IO a) (IO a) where
foo = id
instance Foo a (IO b) => Foo (CString -> a) (IO (String -> IO b)) where
foo f = return $ \s -> withCString s $ \cs -> foo (f cs)
instance Foo a (IO b) => Foo (Double -> a) (IO (Double -> IO b)) where
foo f = return $ \s -> foo (f s)
class CIO a b | a -> b where
cio :: a -> b
instance CIO (IO ()) (IO ()) where
cio = id
instance CIO (IO b) c => CIO (IO (a -> IO b)) (a -> c) where
cio f = \a -> cio $ f >>= ($ a)
{-
*Main> let x = foo (undefined :: Double -> Double -> CString -> IO ())
*Main> :t x
x :: IO (Double -> IO (Double -> IO (String -> IO ())))
*Main> :t cio x
cio x :: Double -> Double -> String -> IO ()
-}
Aside from being a generally terrible thing to do, there are two specific limitations. The first is that a catchall instance of Foo can't be written. So for every type you want to convert, even if the conversion is just id, you need an instance of Foo. The second limitation is that a catchall base case of CIO can't be written because of the IO wrappers around everything. So this only works for things that return IO (). If you want it to work for something returning IO Int you need to add that instance too.
I suspect that with sufficient work and some typeCast trickery these limitations can be overcome. But the code is horrible enough as is, so I wouldn't recommend it.
It's definitely possible. The usual approach is to create lambdas to pass to withCString. Using your example:
myMarshaller :: (CDouble -> CString -> IO ()) -> CDouble -> String -> IO ()
myMarshaller func cdouble string = ...
withCString :: String -> (CString -> IO a) -> IO a
The inner function has type CString -> IO a, which is exactly the type after applying a CDouble to the C function func. You've got a CDouble in scope too, so that's everything you need.
myMarshaller func cdouble string =
withCString string (\cstring -> func cdouble cstring)
I've seen the other post about this, but is there a clean way of doing this in Haskell?
As a 2nd part, can it also be done without making the function monadic?
The package data-memocombinators on hackage provides lots of reusable memoization routines. The basic idea is:
type Memo a = forall r. (a -> r) -> (a -> r)
I.e. it can memoize any function from a. The module then provides some primitives (like unit :: Memo () and integral :: Memo Int), and combinators for building more complex memo tables (like pair :: Memo a -> Memo b -> Memo (a,b) and list :: Memo a -> Memo [a]).
You can modify Jonathan´s solution with unsafePerformIO to create a "pure" memoizing version of your function.
import qualified Data.Map as Map
import Data.IORef
import System.IO.Unsafe
memoize :: Ord a => (a -> b) -> (a -> b)
memoize f = unsafePerformIO $ do
r <- newIORef Map.empty
return $ \ x -> unsafePerformIO $ do
m <- readIORef r
case Map.lookup x m of
Just y -> return y
Nothing -> do
let y = f x
writeIORef r (Map.insert x y m)
return y
This will work with recursive functions:
fib :: Int -> Integer
fib 0 = 1
fib 1 = 1
fib n = fib_memo (n-1) + fib_memo (n-2)
fib_memo :: Int -> Integer
fib_memo = memoize fib
Altough this example is a function with one integer parameter, the type of memoize tells us that it can be used with any function that takes a comparable type. If you have a function with more than one parameter just group them in a tuple before applying memoize. F.i.:
f :: String -> [Int] -> Float
f ...
f_memo = curry (memoize (uncurry f))
This largely follows http://www.haskell.org/haskellwiki/Memoization.
You want a function of type (a -> b). If it doesn't call itself, then
you can just write a simple wrapper that caches the return values. The
best way to store this mapping depends on what properties of a you can
exploit. Ordering is pretty much a minimum. With integers
you can construct an infinite lazy list or tree holding the values.
type Cacher a b = (a -> b) -> a -> b
positive_list_cacher :: Cacher Int b
positive_list_cacher f n = (map f [0..]) !! n
or
integer_list_cacher :: Cacher Int b
integer_list_cacher f n = (map f (interleave [0..] [-1, -2, ..]) !!
index n where
index n | n < 0 = 2*abs(n) - 1
index n | n >= 0 = 2 * n
So, suppose it is recursive. Then you need it to call not itself, but
the memoized version, so you pass that in instead:
f_with_memo :: (a -> b) -> a -> b
f_with_memo memoed base = base_answer
f_with_memo memoed arg = calc (memoed (simpler arg))
The memoized version is, of course, what we're trying to define.
But we can start by creating a function that caches its inputs:
We could construct one level by passing in a function that creates a
structure that caches values. Except we need to create the version of f
that already has the cached function passed in.
Thanks to laziness, this is no problem:
memoize cacher f = cached where
cached = cacher (f cached)
then all we need is to use it:
exposed_f = memoize cacher_for_f f
The article gives hints as to how to use a type class selecting on the
input to the function to do the above, rather than choosing an explicit
caching function. This can be really nice -- rather than explicitly
constructing a cache for each combination of input types, we can implicitly
combine caches for types a and b into a cache for a function taking a and b.
One final caveat: using this lazy technique means the cache never shrinks,
it only grows. If you instead use the IO monad, you can manage this, but
doing it wisely depends on usage patterns.
Doing a direct translation from the more imperative languages, I came up with this.
memoize :: Ord a => (a -> IO b) -> IO (a -> IO b)
memoize f =
do r <- newIORef Map.empty
return $ \x -> do m <- readIORef r
case Map.lookup x m of
Just y -> return y
Nothing -> do y <- f x
writeIORef r (Map.insert x y m)
return y
But this is somehow unsatisfactory. Also, Data.Map constrains the parameter to be an instance of Ord.
If your arguments are going to be natural numbers, you can do simply:
memo f = let values = map f [0..]
in \n -> values !! n
However, that doesn't really help you with the stack overflowing, and it doesn't work with recursive calls. You can see some fancier solutions at http://www.haskell.org/haskellwiki/Memoization.