Time bounded function with partial result in Haskell - haskell

An initial value (x), a function (f), a predicate (p) and a time bound (t) are given. I want to apply f repeatedly to x until the result satisfies p, while making sure the computation doesn't exceed the time limit. If it runs longer than t, it should return the partial result, i.e. a pair of a number n and the value of applying f n times to x, for the largest n for which the computation was actually performed.
If the partial-result requirement is dropped, this can be programmed easily:
import System.Timeout
iter :: a -> (a -> a) -> (a -> Bool) -> Int -> IO (Maybe (Int, a))
iter x f p t = do
  let fs = x:(map f fs)
  timeout t $ return $! head $ filter (\x -> p $ snd x) $ zip [1..] fs
I want it to have signature similar to -
iter :: a -> (a -> a) -> (a -> Bool) -> Int -> IO (Either (Int, a) (Int, a))
With Left for partial result and Right for complete result.
A silly and trivial example use of the above function is -
*Main> iter 1 (+2) (> 1000000) 1000000
Just (500001,1000001)
*Main> iter 1 (+2) (> 1000000) 100000
Nothing
I want the second call to return the partially computed result. Is there a simple way to do it?
More practical examples would be the Newton–Raphson method or gradient descent.

I believe it is simplest to delegate such tasks to libraries with better abstraction capability than that offered by base, e.g. async:
{-# LANGUAGE BangPatterns #-}
import Control.Concurrent.Async
import Control.Concurrent
iter :: a -> (a -> a) -> (a -> Bool) -> Int -> IO (Either (Int, a) (Int, a))
iter z f p maxt = do
  o <- newMVar (0, z)
  let loop old@(!i,x) = do
        modifyMVar_ o (const $ return old)
        if p x then return old else loop (i+1, f x)
  race (threadDelay maxt >> readMVar o) (loop (0, z))
race takes two IO actions and returns whichever completes first, killing the other. The left action completes only if the maximum time has elapsed, and that thread is able to read the MVar. Since the other thread holds the MVar for a short period of time (while it is writing the result) the worker will never be interrupted while writing the result.
Note also that the only thing that forces the chain of applications f $ f $ f.. is the predicate p. If you pass a lazy predicate (e.g. const False), this will not work as you want. In practice there are few cases where you would use such a predicate (especially in numeric computing), so it likely won't be of much concern. In that case, though, loop does no actual work and just builds up a ridiculously large chain of unevaluated applications:
>iter 2 (\x -> x * x) (const False) (10^6)
Left (472190,Interrupted.
My computer will never be able to print this result because it has 6.8×10^142142 digits. However:
>iter 2 (\x -> x * x) (<0) (10^6)
Left (24,Interrupted
This is a small number with only about 5,000,000 digits.

I believe your best bet is to use an IORef to keep track of the current computation state. Even if the computation is interrupted, its side effects won't be undone. Be sure to use just one IORef for both the counter and the current value; otherwise they could go out of sync in case of a timeout.
I'm not at all experienced with asynchronous code in Haskell, so take this with a grain of salt, but it seems to work.
{-# LANGUAGE BangPatterns #-}
module IterTimeout where
import Control.Applicative
import Data.IORef
import System.Timeout
iter :: a -> (a -> a) -> (a -> Bool) -> Int -> IO (Either (Int, a) (Int, a))
iter x f p t = do
  ref <- newIORef (0, x)
  result <- timeout t (iterStep f p ref)
  maybe (Left <$> readIORef ref) (return . Right) result
iterStep :: (a -> a) -> (a -> Bool) -> IORef (Int, a) -> IO (Int, a)
iterStep f p ref = go
  where
    go = do
      old@(!oldCount, oldVal) <- readIORef ref
      if p oldVal
        then return old
        else writeIORef ref (oldCount + 1, f oldVal) >> go
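The IORef approach can also be combined with explicit forcing, so that the value stored in the reference has already been evaluated by the worker rather than left as a thunk for whoever reads the partial result. The following is a minimal sketch using only base; iterForced is a hypothetical name, and it forces only to weak head normal form, which is enough for flat numeric types but is an assumption that does not hold for nested structures.

```haskell
import Control.Exception (evaluate)
import Data.IORef (newIORef, readIORef, writeIORef)
import System.Timeout (timeout)

-- iterForced is a hypothetical name, not part of the answers above.
-- The time bound t is in microseconds, as for System.Timeout.timeout.
iterForced :: a -> (a -> a) -> (a -> Bool) -> Int -> IO (Either (Int, a) (Int, a))
iterForced x0 f p t = do
  ref <- newIORef (0, x0)
  let go = do
        old@(n, x) <- readIORef ref
        if p x
          then return old
          else do
            x' <- evaluate (f x)              -- force f x to WHNF in the worker
            let n' = n + 1
            n' `seq` writeIORef ref (n', x')  -- publish counter and value together
            go
  result <- timeout t go
  -- Nothing means the timeout fired: report the last published state as Left.
  maybe (Left <$> readIORef ref) (return . Right) result
```

Because the counter and the value are written in a single writeIORef, an interruption between evaluate and the write simply leaves the previous consistent pair in place.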

Related

Haskell instance of `bind` for a custom type

I'm trying to write an instance of the bind operator (>>=) for the custom type ST a.
I found this way to do it but I don't like that hardcoded 0.
Is there any way to implement it without having the hardcoded 0 and respecting the type of the function?
newtype ST a = S (Int -> (a, Int))
-- This may be useful to implement ">>=" (bind), but it is not mandatory to use it
runState :: ST a -> Int -> (a, Int)
runState (S s) = s
instance Monad ST where
  return :: a -> ST a
  return x = S (\n -> (x, n))
  (>>=) :: ST a -> (a -> ST b) -> ST b
  s >>= f = f (fst (runState s 0))
I often find such code easier to follow after rewriting it as a kind of pseudocode. Starting with
instance Monad ST where
  return :: a -> ST a
  return x = S (\n -> (x, n))
we get
runState (return x) n = (x, n)
which expresses the same thing exactly. It is now a kind of a definition through an interaction law that it must follow. This allows me to ignore the "noise"/wrapping around the essential stuff.
Similarly, then, we have
  (>>=) :: ST a -> (a -> ST b) -> ST b
  s >>= f = -- f (fst (runState s 0)) -- nah, 0? what's that?
    --
    -- runState (s >>= f) n = runState (f a) i where
    --     (a, i) = runState s n
    --
    S $ \ n -> let (a, i) = runState s n in
               runState (f a) i
because now we have an Int in sight (i.e. in scope), n, that will get provided to us when the combined computation s >>= f will "run". I mean, when it will runState.
Of course nothing actually runs until called upon from main. But it can be a helpful metaphor to hold in mind.
The way we've defined it is both the easiest and the most general, which is usually the way to go. There are more ways to make the types fit though.
One is to use n twice, in the input to the second runState as well, but this will leave the i hanging unused.
Another way is to flip the time arrow around w.r.t. the state passing, with
    S $ \ n -> let (a, i2) = runState s i
                   (b, i ) = runState (f a) n
               in (b, i2)
which is a bit weird to say the least. s still runs first (as expected for the s >>= f combination) to produce the value a from which f creates the second computation stage, but the state is being passed around in the opposite direction.
The most important thing to keep in mind is that your ST type is a wrapper around a function. What if you started your definition as (>>=) = \s -> \f -> S (\n -> ... )? It might be (ok, is) a bit silly to write separate lambdas for the s and f parameters there, but I did it to show that they're not really any different from the n parameter. You can use it in your definition of (>>=).
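On GHC 7.10 and later, a Monad instance also requires Functor and Applicative instances, so the definition above will not compile by itself. Here is a minimal self-contained sketch that threads the state the usual way; tick is a hypothetical helper added only to exercise the instance.

```haskell
newtype ST a = S (Int -> (a, Int))

runState :: ST a -> Int -> (a, Int)
runState (S s) = s

instance Functor ST where
  -- Transform the result, leave the state threading alone.
  fmap g s = S $ \n -> let (a, i) = runState s n in (g a, i)

instance Applicative ST where
  pure x = S (\n -> (x, n))
  -- Run the function-producing stage first, then the argument stage.
  sf <*> sx = S $ \n ->
    let (g, i) = runState sf n
        (a, j) = runState sx i
    in (g a, j)

instance Monad ST where
  -- The incoming state n goes to s; the state i it leaves goes to f a.
  s >>= f = S $ \n -> let (a, i) = runState s n in runState (f a) i

-- Hypothetical helper: return the current counter and increment it.
tick :: ST Int
tick = S (\n -> (n, n + 1))
```

For example, running tick twice from an initial state of 5 yields the results 5 and 6 and leaves the state at 7, with no hardcoded 0 anywhere.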

Unexpected memory growth with Control.Monad foldM

I have the following code, which I have stripped down to what I think is a minimal example, and it shows some very odd behaviour.
The code consists of two source files:
One to define some data:
module MyFunction where
data MyFunction =
  MyFunction {
    functionNumber :: Int,
    functionResult :: IO String
  }
makeMyFunction :: Show a => Int -> IO a -> MyFunction
makeMyFunction number result = MyFunction {
  functionNumber = number,
  functionResult = result >>= return . show }
And the other is Main:
module Main (main) where
import System.CPUTime (getCPUTime)
import Data.List (foldl')
import Data.Foldable (foldlM)
import Control.Monad (foldM)
import MyFunction
exampleFunction = do
  --let x = foldl' (\a b -> a `seq` (a + b)) 0 [1..20000000] -- This works
  --x <- foldlM (\a b -> a `seq` return (a + b)) 0 [1..20000000] -- This works (*)
  x <- foldM (\a b -> a `seq` return (a + b)) 0 [1..20000000] -- This doesn't
  print x
  return ()
runFunction fn = do
  result <- functionResult fn
  duration <- getCPUTime
  if result /= "()"
    then putStrLn ""
    else return ()
  putStrLn (show (fromIntegral duration / (10^9)) ++ "ms")
  return fn
main = do
  runFunction (makeMyFunction 123 exampleFunction)
  return ()
The code as above (compiled using GHC 7.10.3 with stack 1.0.0 with default flags) has a rapid increase in memory usage (exceeding 1GB), and takes typically 3.3 seconds.
If I make either of the following changes to the code:
Use one of the commented alternatives to the problem line
Take out any line from runFunction
then the memory usage remains minimal, and the run takes only about 1 second.
What surprises me most is that replacing foldM with foldlM fixes the problem, even though, as far as I know, foldM = foldlM.
Making changes to code that I can't see having any relationship to the problem line also fixes the problem, for example removing the last putStrLn.
Another oddity is that merging the MyFunction module into the Main module not only fails to fix the problem, it actually causes foldlM to behave like foldM and use excessive memory.
In the real code that this came from, I have a large number of exampleFunctions and significantly more Main code, and every so often I encounter this sort of unexplained memory usage from functions, which can usually be resolved by some sort of voodoo.
I'm looking for an explanation for the behaviour. If I know why this occurs I can then look into avoiding it. Could this be a compiler issue, or maybe just a misunderstanding on my part?
(*) I've highlighted the secondary issue that causes the same memory growth to occur with foldlM.
Here is foldlM from Foldable.hs (ghc)
-- | Monadic fold over the elements of a structure,
-- associating to the left, i.e. from left to right.
foldlM :: (Foldable t, Monad m) => (b -> a -> m b) -> b -> t a -> m b
foldlM f z0 xs = foldr f' return xs z0
  where f' x k z = f z x >>= k
and foldM from Monad.hs
foldM :: (Foldable t, Monad m) => (b -> a -> m b) -> b -> t a -> m b
{-# INLINEABLE foldM #-}
{-# SPECIALISE foldM :: (a -> b -> IO a) -> a -> [b] -> IO a #-}
{-# SPECIALISE foldM :: (a -> b -> Maybe a) -> a -> [b] -> Maybe a #-}
foldM = foldlM
I placed these definitions in a separate module Test and tested the execution with and without the INLINEABLE / SPECIALISE lines. Whatever the reason is, leaving out the SPECIALISE directives helped: the execution time and memory usage were then the same as with foldlM.
After a little more digging, removing the line
{-# SPECIALISE foldM :: (a -> b -> IO a) -> a -> [b] -> IO a #-}
had the biggest effect.
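If the goal is simply to avoid depending on how the library fold happens to be specialised, one option is to write the strict monadic loop by hand. The following is a minimal sketch for list inputs only; foldM' is a hypothetical name, and this is a possible workaround rather than an explanation of the behaviour above. It has not been measured against the program in the question.

```haskell
-- Hypothetical helper: a left-to-right monadic fold over a list that
-- forces each new accumulator to WHNF before continuing.
foldM' :: Monad m => (b -> a -> m b) -> b -> [a] -> m b
foldM' f = go
  where
    go acc []       = return acc
    go acc (x : xs) = f acc x >>= \acc' -> acc' `seq` go acc' xs
```

In the question's exampleFunction this would be used as x <- foldM' (\a b -> return (a + b)) 0 [1..20000000].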

Haskell infinite list of Bernoulli distributed booleans

I need a list of biased, random booleans. Each boolean needs to have the same probability of being True (Bernoulli distributed). These booleans are passed to a function, which generates zero or more output booleans per input boolean. I need an infinite list, because I don't know in advance how many booleans are required to provide enough output. See the (simplified) code below:
import System.Random.MWC
import System.Random.MWC.Distributions
foo :: [Bool] -> [Bool] -- foo outputs zero or more Bools per input Bool
main = do
  gen <- create
  bits <- sequence . repeat $ bernoulli 0.25 gen
  print . take 32 . foo $ bits
Unfortunately, this code just hangs at the second line of main. I guess that there is something non-lazy happening somewhere with Control.Monad.ST?
(I would be able to do something like this with System.Random.randoms, but the resulting values don't have the required distributions.)
Can I fix this while still using the System.Random.MWC library? Or does this require me to switch to an alternative implementation?
The mwc-random package provides two PrimMonad instances, one for IO and another for ST s. As long as an ST computation is parameterized over all state tags s, we can run the computation and extract the value with runST :: (forall s. ST s a) -> a. By itself this wouldn't be very useful since we'd lose the state: the seed of the random generator, but mwc-random also provides explicit ways to handle the seeds:
save :: PrimMonad m => Gen (PrimState m) -> m Seed
restore :: PrimMonad m => Seed -> m (Gen (PrimState m))
We can use these to make a computation that generates a stream of values from a computation that generates a single value, as long as the generator is in forall s. ST s.
{-# LANGUAGE RankNTypes #-}
{-# LANGUAGE ScopedTypeVariables #-}
import System.Random.MWC
import Control.Monad.ST
import System.Random.MWC.Distributions
randomStream :: forall s a. (forall s. GenST s -> ST s a) -> GenST s -> ST s [a]
randomStream item = go
  where
    go :: forall s. GenST s -> ST s [a]
    go gen = do
      x <- item gen
      seed <- save gen
      return (x:runST (restore seed >>= go))
With this we can write your example as
main = do
  bits <- withSystemRandom (randomStream (bernoulli 0.25))
  print . take 32 $ bits
We can actually build generators more sophisticated than using the same generator for each item in the stream. We could thread a state along the stream so that each value can depend on the results of the previous ones.
unfoldStream :: forall s a b. (forall s. b -> GenST s -> ST s (a, b)) -> b -> GenST s -> ST s [a]
unfoldStream item = go
  where
    go :: forall s. b -> GenST s -> ST s [a]
    go b gen = do
      (x,b') <- item b gen
      seed <- save gen
      return (x:runST (restore seed >>= go b'))
The following example stream has results that increase in likelihood every time the result is False.
import Control.Monad.Primitive
interesting :: (PrimMonad m) => Double -> Gen (PrimState m) -> m (Bool, Double)
interesting p gen = do
  result <- bernoulli p gen
  let p' = if result then p else p + (1-p)*0.25
  return (result, p')
main = do
  bits <- withSystemRandom (unfoldStream interesting 0)
  print . take 32 $ bits
The culprit is sequence . repeat - this will hang for (almost?) every monad, since you must perform a potentially infinite number of effects.
The simplest solution would be to use a different library, which may not be possible if you are relying on the quality of the numbers produced by mwc-random. The next simplest solution is to rewrite foo to have type [IO Bool] -> IO [Bool] and pass it repeat (bernoulli 0.25 gen); this would allow foo to decide when to stop executing the effects produced by the infinite list. But having your logic inside of IO is not very nice.
The standard trick when you need an infinite list of random numbers is to use a pure function f :: StdGen -> (Result, StdGen). Then unfoldr (Just . f) :: StdGen -> [Result], and the output is an infinite list. At first glance, it may appear that mwc-random only has monadic functions, and that there is no pure interface. However, that is not the case, because ST s is an instance of PrimMonad. You also have the functions converting a Gen to a Seed. Using these, you can get a pure RNG function for any monadic one:
{-# LANGUAGE RankNTypes #-}
import System.Random.MWC
import System.Random.MWC.Distributions
import Control.Monad.ST
import Data.List
pureRand :: (forall s . GenST s -> ST s t) -> Seed -> (t, Seed)
pureRand f s = runST $ do
  s' <- restore s
  r <- f s'
  s'' <- save s'
  return (r, s'')
pureBernoulli :: Double -> Seed -> (Bool, Seed)
pureBernoulli a = pureRand (bernoulli a)
foo :: [Bool] -> [Bool]
foo = id
main = do
  gen <- create >>= save
  let bits = unfoldr (Just . pureBernoulli 0.25) gen
  print . take 32 . foo $ bits
It is unfortunate that mwc-random doesn't expose this sort of interface by default but it is pretty easy to get to.
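The unfoldr pattern itself does not depend on mwc-random. To show just the shape of it with nothing but base, here is a sketch in which a toy linear congruential generator stands in for the real seed type. ToySeed, stepToy, toyBernoulli and bernoulliStream are all hypothetical names, and the toy generator is an assumption made for illustration only; its statistical quality is far below that of mwc-random.

```haskell
import Data.List (unfoldr)
import Data.Word (Word32)

-- Toy stand-in for a real generator state (assumption, not mwc-random's Seed).
newtype ToySeed = ToySeed Word32

-- One step of a linear congruential generator; Word32 arithmetic wraps mod 2^32.
stepToy :: ToySeed -> (Word32, ToySeed)
stepToy (ToySeed s) = (s', ToySeed s')
  where s' = 1664525 * s + 1013904223

-- Pure "draw one Bernoulli value and return the next seed".
toyBernoulli :: Double -> ToySeed -> (Bool, ToySeed)
toyBernoulli p seed = (u < p, seed')
  where
    (w, seed') = stepToy seed
    u = fromIntegral w / 4294967296  -- scale the 32-bit word to [0, 1)

-- Infinite lazy stream: unfoldr never sees Nothing, so it never stops.
bernoulliStream :: Double -> ToySeed -> [Bool]
bernoulliStream p = unfoldr (Just . toyBernoulli p)
```

Since the stream is an ordinary lazy list, take 32 (foo (bernoulliStream 0.25 (ToySeed 42))) only computes as many draws as foo actually consumes.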
The other option is slightly more scary - use unsafe functions.
import System.IO.Unsafe
repeatM rand = go where
  go = do
    x <- rand
    xs <- unsafeInterleaveIO go
    return (x : xs)
main2 = do
  gen <- create
  bits <- repeatM (bernoulli 0.25 gen)
  print . take 32 . foo $ bits
Naturally this comes with the usual caveats surrounding unsafe - use it only if you are exceedingly inconvenienced by the pure functions. unsafeInterleaveIO may reorder or never execute effects - if foo, for example, ignores one element, it will never be computed and the corresponding effect of updating the state stored in gen may not happen. For example, the following will print nothing:
snd <$> ((,) <$> unsafeInterleaveIO (putStrLn "Hello") <*> return ())

Unfold returning the last state of the accumulator

The unfoldr function in Haskell is very handy for creating lists. Its type is:
unfoldr :: (b -> Maybe (a, b)) -> b -> [a]
But I would like to get the last value of the accumulator used. A possible implementation is:
unfoldRest :: (b -> Maybe (a, b)) -> b -> ([a], b)
unfoldRest fct ini = go fct ini []
  where
    go f s acc =
      case f s of
        Nothing -> (acc, s)
        Just (a, b) -> go f b (acc ++ [a])
But I was wondering if there wasn't a way to do it with existing functions. In the end this:
countDown 0 = Nothing
countDown n = Just (n, n-1)
unfoldRest countDown 10
will return:
([10,9,8,7,6,5,4,3,2,1],0)
Because the iteration stopped when the accumulator value reached 0.
import Data.List
unfoldr' :: (b -> Maybe (a, b)) -> b -> [(a, b)]
unfoldr' f = unfoldr (fmap (\(a, b) -> ((a, b), b)) . f)
will give you all the states of the accumulator. Then you can choose to look at whichever you want, including the last.
This is not much of an answer (Tom Ellis' answer better covers the "a way to do it with existing functions" part), but it is worth taking a second look at your original solution. Since you are using (++) to append single elements repeatedly, it takes quadratic time with respect to the length of the generated list. You can avoid that by dropping the helper function and building the list directly with (:):
unfoldRest' :: (b -> Maybe (a, b)) -> b -> ([a], b)
unfoldRest' f s = case f s of
  Nothing -> ([], s)
  Just (a, b) -> (\ ~(xs, y) -> (a : xs, y)) $ unfoldRest' f b
The lazy pattern match (~(xs, y) in the lambda) is important; it allows you to look at the first elements of the list without having to calculate the final state, and therefore to do something useful with infinite lists (in any case, Tom Ellis' solution is better for infinite lists, as you can see not only the generated values but also the state after any arbitrary segment). As Will Ness points out, you may find the right hand side of the Just case more natural to write using a let binding, as in let (xs, y) = unfoldRest' f b in (a : xs, y).
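To make the laziness point concrete, here is the let-binding variant as a small self-contained sketch, together with the countDown step function from the question. Pattern bindings in a let are lazy, so the recursive call is only evaluated as far as the consumer demands.

```haskell
-- Same function as above, written with a (lazy) let binding.
unfoldRest' :: (b -> Maybe (a, b)) -> b -> ([a], b)
unfoldRest' f s = case f s of
  Nothing     -> ([], s)
  Just (a, b) -> let (xs, y) = unfoldRest' f b in (a : xs, y)

-- Step function from the question: stops when the accumulator reaches 0.
countDown :: Int -> Maybe (Int, Int)
countDown 0 = Nothing
countDown n = Just (n, n - 1)
```

unfoldRest' countDown 10 gives ([10,9,8,7,6,5,4,3,2,1],0), and take 3 (fst (unfoldRest' (\n -> Just (n, n + 1)) 0)) gives [0,1,2] even though that unfold never terminates and so never has a final state.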
As you tagged the question with "pointfree", it is worth mentioning that you can reduce the number of points quite a lot by using maybe and the Control.Arrow combinators:
import Control.Arrow ((***), first, app)
unfoldRest'' f s =
  maybe ([], s) (app . (first . (:) *** unfoldRest'' f)) $ f s
Whether you want to go that far is a matter of taste. The laziness issue is dealt with correctly, as the implementation of (***) for functions uses a lazy pattern match.
I've grappled with this problem before. One way to solve it is by using the State monad.
In simple terms, State computations wrap functions of the form s -> (d, s). Intuitively, s is the type of the state that may change during a computation.
The first thing to note is that s -> Maybe (d, s) doesn't have the form s -> (d, s): the former returns a Maybe, while the latter returns a tuple. We need a function of type s -> (Maybe d, s) instead: if the original function returns Nothing, the adapted function returns Nothing together with the unchanged state. One possible implementation of this adapter is:
keepFailure :: (s -> Maybe (d, s)) -> (s -> (Maybe d, s))
keepFailure f s = maybe (Nothing, s) (first Just) (f s)
Remember to import Data.Bifunctor because of the first function
There's a function called state that converts from s -> (d, s) to State s d, and runState converts it back. Now we implement the function which will keep running the step until it has exhausted all possible values:
stateUnfoldr :: State s (Maybe d) -> State s [d]
stateUnfoldr f = do
  mx <- f
  case mx of
    Just x -> do
      xs <- stateUnfoldr f
      return $ x:xs
    Nothing -> return []
In simple terms, mx <- f works like "apply f to the input, update the state, and assign the return value to mx".
Then, we can piece everything together:
fStateUnfoldr :: (s -> Maybe (d, s)) -> (s -> ([d], s))
fStateUnfoldr f = runState $ stateUnfoldr $ state . keepFailure $ f
Remember to import Control.Monad.State
state . keepFailure adapts f into a State s (Maybe d) action, then stateUnfoldr unfolds it into a State s [d], and runState turns that back into a function.
We can also use the execState or evalState instead of runState if you want just the state or just the list.

How do you make a generic memoize function in Haskell?

I've seen the other post about this, but is there a clean way of doing this in Haskell?
As a 2nd part, can it also be done without making the function monadic?
The package data-memocombinators on hackage provides lots of reusable memoization routines. The basic idea is:
type Memo a = forall r. (a -> r) -> (a -> r)
I.e. it can memoize any function from a. The module then provides some primitives (like unit :: Memo () and integral :: Memo Int), and combinators for building more complex memo tables (like pair :: Memo a -> Memo b -> Memo (a,b) and list :: Memo a -> Memo [a]).
You can modify Jonathan's solution with unsafePerformIO to create a "pure" memoizing version of your function.
import qualified Data.Map as Map
import Data.IORef
import System.IO.Unsafe
memoize :: Ord a => (a -> b) -> (a -> b)
memoize f = unsafePerformIO $ do
  r <- newIORef Map.empty
  return $ \ x -> unsafePerformIO $ do
    m <- readIORef r
    case Map.lookup x m of
      Just y -> return y
      Nothing -> do
        let y = f x
        writeIORef r (Map.insert x y m)
        return y
This will work with recursive functions:
fib :: Int -> Integer
fib 0 = 1
fib 1 = 1
fib n = fib_memo (n-1) + fib_memo (n-2)
fib_memo :: Int -> Integer
fib_memo = memoize fib
Although this example is a function with one integer parameter, the type of memoize tells us that it can be used with any function whose argument type is an instance of Ord. If you have a function with more than one parameter, just group them in a tuple before applying memoize. For instance:
f :: String -> [Int] -> Float
f ...
f_memo = curry (memoize (uncurry f))
This largely follows http://www.haskell.org/haskellwiki/Memoization.
You want a function of type (a -> b). If it doesn't call itself, then
you can just write a simple wrapper that caches the return values. The
best way to store this mapping depends on what properties of a you can
exploit. Ordering is pretty much a minimum. With integers
you can construct an infinite lazy list or tree holding the values.
type Cacher a b = (a -> b) -> a -> b
positive_list_cacher :: Cacher Int b
positive_list_cacher f n = (map f [0..]) !! n
or
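integer_list_cacher :: Cacher Int b
integer_list_cacher f n = (map f (interleave [0..] [-1, -2 ..])) !! index n
  where
    index n | n < 0  = 2 * abs n - 1
            | n >= 0 = 2 * n
    -- interleave is not in base; a minimal definition for infinite lists:
    interleave (x:xs) ys = x : interleave ys xs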
So, suppose it is recursive. Then you need it to call not itself, but
the memoized version, so you pass that in instead:
f_with_memo :: (a -> b) -> a -> b
f_with_memo memoed base = base_answer
f_with_memo memoed arg = calc (memoed (simpler arg))
The memoized version is, of course, what we're trying to define.
But we can start by creating a function that caches its inputs:
We could construct one level by passing in a function that creates a
structure that caches values. Except we need to create the version of f
that already has the cached function passed in.
Thanks to laziness, this is no problem:
memoize cacher f = cached where
  cached = cacher (f cached)
then all we need is to use it:
exposed_f = memoize cacher_for_f f
The article gives hints as to how to use a type class selecting on the
input to the function to do the above, rather than choosing an explicit
caching function. This can be really nice -- rather than explicitly
constructing a cache for each combination of input types, we can implicitly
combine caches for types a and b into a cache for a function taking a and b.
One final caveat: using this lazy technique means the cache never shrinks,
it only grows. If you instead use the IO monad, you can manage this, but
doing it wisely depends on usage patterns.
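To tie the pieces above together, here is a minimal sketch of the whole pattern for a function on non-negative Ints. listCacher, fibOpen and fibMemo are hypothetical names; the cacher is written so that the lazy list is bound once and shared by the returned function, which is an assumption the two-argument cacher definitions above do not guarantee.

```haskell
type Cacher a b = (a -> b) -> a -> b

-- Build the lazy table once, then index into it on every call.
-- Only valid for non-negative arguments.
listCacher :: Cacher Int b
listCacher f = let table = map f [0 ..] in \n -> table !! n

-- Tie the knot: the function being cached calls the cached version.
memoize :: Cacher a b -> ((a -> b) -> a -> b) -> a -> b
memoize cacher f = cached
  where cached = cacher (f cached)

-- Fibonacci in open-recursive style: recursive calls go through 'self'.
fibOpen :: (Int -> Integer) -> Int -> Integer
fibOpen _    0 = 1
fibOpen _    1 = 1
fibOpen self n = self (n - 1) + self (n - 2)

fibMemo :: Int -> Integer
fibMemo = memoize listCacher fibOpen
```

With this, fibMemo 50 evaluates to 20365011074 after computing each smaller entry once. Lookup in the list is linear in n, so a tree-shaped table would be the natural next step for large arguments.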
Doing a direct translation from the more imperative languages, I came up with this.
memoize :: Ord a => (a -> IO b) -> IO (a -> IO b)
memoize f =
  do r <- newIORef Map.empty
     return $ \x -> do m <- readIORef r
                       case Map.lookup x m of
                         Just y  -> return y
                         Nothing -> do y <- f x
                                       writeIORef r (Map.insert x y m)
                                       return y
But this is somehow unsatisfactory. Also, Data.Map constrains the parameter to be an instance of Ord.
If your arguments are going to be natural numbers, you can do simply:
memo f = let values = map f [0..]
         in \n -> values !! n
However, that doesn't really help you with the stack overflowing, and it doesn't work with recursive calls. You can see some fancier solutions at http://www.haskell.org/haskellwiki/Memoization.
