I want to write a function that takes a time limit (in seconds) and a list, and computes as many elements of the list as possible within the time limit.
My first attempt was to first write the following function, which times a pure computation and returns the time elapsed along with the result:
import Control.DeepSeq
import System.CPUTime
type Time = Double
timed :: (NFData a) => a -> IO (a, Time)
timed x = do t1 <- getCPUTime
r <- return $!! x
t2 <- getCPUTime
let diff = fromIntegral (t2 - t1) / 10^12
return (r, diff)
I can then define the function I want in terms of this:
timeLimited :: (NFData a) => Time -> [a] -> IO [a]
timeLimited remaining [] = return []
timeLimited remaining (x:xs) = if remaining < 0
then return []
else do
(y,t) <- timed x
ys <- timeLimited (remaining - t) xs
return (y:ys)
This isn't quite right though. Even ignoring timing errors and floating point errors, this approach never stops the computation of an element of the list once it has started, which means that it can (and in fact, normally will) overrun its time limit.
If instead I had a function that could short-circuit evaluation if it had taken too long:
timeOut :: Time -> a -> IO (Maybe (a,t))
timeOut = undefined
then I could write the function that I really want:
timeLimited' :: Time -> [a] -> IO [a]
timeLimited' remaining [] = return []
timeLimited' remaining (x:xs) = do
result <- timeOut remaining x
case result of
Nothing -> return []
Just (y,t) -> do
ys <- timeLimited' (remaining - t) xs
return (y:ys)
My questions are:
How do I write timeOut?
Is there a better way to write the function timeLimited, for example, one that doesn't suffer from accumulated floating point error from adding up time differences multiple times?
Here's an example I was able to cook up using some of the suggestions above. I've not done huge amounts of testing to ensure work is cut off exactly when the timer runs out, but based on the docs for timeout, this should work for all things not using FFI.
import Control.Concurrent.STM
import Control.DeepSeq
import System.Timeout
type Time = Int
-- | Compute as many items of a list in given timeframe (microseconds)
-- This is done by running a function that computes (with `force`)
-- list items and pushed them onto a `TVar [a]`. When the requested time
-- expires, ghc will terminate the execution of `forceIntoTVar`, and we'll
-- return what has been pushed onto the tvar.
timeLimited :: (NFData a) => Time -> [a] -> IO [a]
timeLimited t xs = do
v <- newTVarIO []
_ <- timeout t (forceIntoTVar xs v)
readTVarIO v
-- | Force computed values into given tvar
forceIntoTVar :: (NFData a) => [a] -> TVar [a] -> IO [()]
forceIntoTVar xs v = mapM (atomically . modifyTVar v . forceCons) xs
-- | Returns function that does actual computation and cons' to tvar value
forceCons :: (NFData a) => a -> [a] -> [a]
forceCons x = (force x:)
Now let's try it on something costly:
main = do
xs <- timeLimited 100000 expensiveThing -- run for 100 milliseconds
print $ length $ xs -- how many did we get?
-- | Some high-cost computation
expensiveThing :: [Integer]
expensiveThing = sieve [2..]
where
sieve (p:xs) = p : sieve [x|x <- xs, x `mod` p > 0]
Compiled and run with time, it seems to work (obviously there is some overhead outside the timed portion, but I'm at roughly 100ms:
$ time ./timeLimited
1234
./timeLimited 0.10s user 0.01s system 97% cpu 0.112 total
Also, something to note about this approach; since I'm enclosing the entire operation of running the computations and pushing them onto the tvar inside one call to timeout, some time here is likely lost in creating the return structure, though I'm assuming (if your computations are costly) it won't account or much of your overall time.
Update
Now that I've had some time to think about it, due to Haskell's laziness, I'm not 100% positive the note above (about time-spent creating the return structure) is correct; either way, let me know if this is not precise enough for what you are trying to accomplish.
You can implement timeOut with the type you gave using timeout and evaluate. It looks something like this (I've omitted the part that computes how much time is left -- use getCurrentTime or similar for that):
timeoutPure :: Int -> a -> IO (Maybe a)
timeoutPure t a = timeout t (evaluate a)
If you want more forcing than just weak-head normal form, you can call this with an already-seq'd argument, e.g. timeoutPure (deepseq v) instead of timeoutPure v.
I would use two threads together with TVars and raise an exception (that causes every ongoing transaction to be rolled back) in the computation thread when the timeout has been reached:
forceIntoTVar :: (NFData a) => [a] -> TVar [a] -> IO [()]
forceIntoTVar xs v = mapM (atomically . modifyTVar v . forceCons) xs
-- | Returns function that does actual computation and cons' to tvar value
forceCons :: (NFData a) => a -> [a] -> [a]
forceCons x = (force x:)
main = do
v <- newTVarIO []
tID <- forkIO $ forceIntoTVar args v
threadDelay 200
killThread tID
readTVarIO v
In this example you (may) need to adjust forceIntoTVar a bit so that e.g. the list nodes are NOT computet inside the atomic transaction but first computed and then a atomic transaction is started to cons them onto the list.
In any case, when the exception is raised the ongoing transaction is rolled back or the ongoing computation is stopped before the result is consed to the list and that is what you want.
What you need to consider is that when the individual computations to prepare a node run with high frequency then this example is probably very costly compared to not using STM.
Related
I'd like to search through a list, testing each element for property X and then return when an element with property X is found.
This list is very large and would benefit from parallelism, but the cost of the spark is rather high relative to the compute time. parListChunk would be great, but then it must search through the entire list.
Is there some way I can write something like parListChunk but with early abort?
This is the naive search code:
hasPropertyX :: Object -> Bool
anyObjectHasPropertyX :: [Object] -> Bool
anyObjectHasPropertyX [] = False
anyObjectHasPropertyX l
| hasPropertyX (head l) == True = True
| otherwise = anyObjectHasPropertyX (tail l)
and this is my first attempt at parallelism:
anyObjectHasPropertyXPar [] = False
anyObjectHasPropertyXPar [a] = hasPropertyX a
anyObjectHasPropertyXPar (a:b:rest) = runEval $ do c1 <- rpar (force (hasPropertyX a))
c2 <- rpar (force (hasPropertyX b))
rseq c1
rseq c2
if (c1 == True) || (c2 == True) then return True else return (anyObjectHasPropertyXPar rest)
This does run slightly faster than the naive code (even with -N1, oddly enough), but not by much (it helps a little by extending the number of parallel computations). I believe it's not benefitting much because it has to spark one thread for each element in the list.
Is there an approach similar to parListChunk that will only spark n threads and that allows for an early abort?
Edit: I'm having problems thinking about this because it seems that I would need to monitor the return value of all the threads. If I omit the rseq's and have something like
if (c1 == True) || (c2 == True) then ...
Is the runtime environment intelligent enough to monitor both threads and continue when either one of them returns?
I don't think you're going to have much luck using Control.Parallel.Strategies. A key feature of this module is that it expresses "deterministic parallelism" such that the result of the program is unaffected by the parallel evaluation. The problem you've described is fundamentally non-deterministic because threads are racing to find the first match.
Update: I see now that you're only returning True if the element is found, so the desired behavior is technically deterministic. So, perhaps there is a way to trick the Strategies module into working. Still, the implementation below seems to meet the requirements.
Here's an implementation of a parallel find parFind that runs in the IO monad using Control.Concurrent primitives and seems to do what you want. Two MVars are used: runningV keeps count of how many threads are still running to allow the last thread standing to detect search failure; and resultV is used to return Just the result or Nothing when search failure is detected by that last thread. Note that it is unlikely to perform better than a single-threaded implementation unless the test (your hasPropertyX above) is substantially more work than the list traversal, unlike this toy example.
import Control.Monad
import Control.Concurrent
import Data.List
import System.Environment
-- Thin a list to every `n`th element starting with index `i`
thin :: Int -> Int -> [a] -> [a]
thin i n = unfoldr step . drop i
where step [] = Nothing
step (y:ys) = Just (y, drop (n-1) ys)
-- Use `n` parallel threads to find first element of `xs` satisfying `f`
parFind :: Int -> (a -> Bool) -> [a] -> IO (Maybe a)
parFind n f xs = do
resultV <- newEmptyMVar
runningV <- newMVar n
comparisonsV <- newMVar 0
threads <- forM [0..n-1] $ \i -> forkIO $ do
case find f (thin i n xs) of
Just x -> void (tryPutMVar resultV (Just x))
Nothing -> do m <- takeMVar runningV
if m == 1
then void (tryPutMVar resultV Nothing)
else putMVar runningV (m-1)
result <- readMVar resultV
mapM_ killThread threads
return result
myList :: [Int]
myList = [1..1000000000]
-- Use `n` threads to find first element equal to `y` in `myList`
run :: Int -> Int -> IO ()
run n y = do x <- parFind n (== y) myList
print x
-- e.g., stack ghc -- -O2 -threaded SearchList.hs
-- time ./SearchList +RTS -N4 -RTS 4 12345 # find 12345 using 4 threads -> 0.018s
-- time ./SearchList +RTS -N4 -RTS 4 -1 # full search w/o match -> 6.7s
main :: IO ()
main = do [n,y] <- getArgs
run (read n) (read y)
Also, note that this version runs the threads on interleaved sublists rather than dividing the main list up into consecutive chunks. I did it this way because (1) it was easier to demonstrate that "early" elements were found quickly; and (2) my huge list means that memory usage can explode if the whole list needs to be kept in memory.
In fact, this example is a bit of a performance time bomb -- its memory usage is nondeterministic and can probably explode if one thread falls way behind so that a substantial portion of the whole list needs to be kept in memory.
In a real world example where the whole list is probably being kept in memory and the property test is expensive, you may find that breaking the list into chunks is faster.
I have a map operation (that is actually run in parallel using parMap from Control.Parallel.Strategies) that takes quite a while. Given that I know how many times the function is applied (n in this context), how can I easily display, every once in a while, how many of the n applications have been evaluated?
The obvious solution would be to make the map a mapM with some putStr inside the mapping function, but that would:
take an unnecessary amount of efficiency
not sample the status every once in a while but in every applications
basically remove all the good things about a deterministic algorithm in the context of parallelism
So, is there a way to keep track of this information, that I'm missing, that avoids these problems?
In production you probably shouldn't use trace and are forced to deal with the complications of needing IO, but for tests you could modify the definition of parMap to take another parameter telling when to emit a count:
import Control.Monad (sequence)
import Control.Parallel.Strategies (Strategy, using, rseq, rparWith, parMap)
import Debug.Trace (traceShow)
import System.IO (hFlush, hSetBuffering, BufferMode(NoBuffering), stdout)
evalList' :: Integer -> Strategy a -> Strategy [a]
evalList' t s as = sequence $ foldr f [] $ zip as [1..]
where f (a, n) ss | n `mod` t == 0 = s (traceShow n a):ss
| otherwise = s a:ss
parList' :: Integer -> Strategy a -> Strategy [a]
parList' t s = evalList' t (rparWith s)
parMap' :: Integer -> Strategy b -> (a -> b) -> [a] -> [b]
parMap' t s f xs = map f xs `using` parList' t s
-- some work to do
fib :: Integer -> Integer
fib 0 = 1
fib 1 = 1
fib n = fib (n-1) + fib(n-2)
main = do hSetBuffering stdout NoBuffering
print $ sum (parMap' 1000 rseq (fib.(+20).(`mod` 5)) ([1..10000]::[Integer]))
If the work packages given by each list element become to small, you could adapt parListChunk instead accordingly.
One could try to craft this behaviour using timeout.
seconds :: Int
seconds = 1000000
progress :: [a] -> IO ()
progress [] = return ()
progress l#(x:xs) =
do r <- timeout (5 * seconds) x -- 5s
threadDelay (2 * seconds) -- 2s more delay
case r of
Nothing -> progress l -- retry
Just y -> do putStrLn "one done!"
progress xs
Be careful since I fear that timeout is aborting the computation. If there's another thread that evaluates x that should be fine, but if this is the only thread doing that it could cause a livelock if 5 seconds are not enough.
As a part of a solution for the problem I'm trying to solve I need to generate a list of repeated application of a function to it's previous result. Sounds very much like iterate function, with the exception, that iterate has signature of
iterate :: (a -> a) -> a -> [a]
and my function lives inside of IO (I need to generate random numbers), so I'd need something more of a:
iterate'::(a -> IO a) -> a -> [a]
I have looked at the hoogle, but without much success.
You can actually get a lazy iterate that works on infinite lists if you use the pipes library. The definition is really simple:
import Pipes
iterate' :: (a -> IO a) -> a -> Producer a IO r
iterate' f a = do
yield a
a2 <- lift (f a)
iterate' f a2
For example, let's say that our step function is:
step :: Int -> IO Int
step n = do
m <- readLn
return (n + m)
Then applying iterate to step generates a Producer that lazily prompts the user for input and generates the tally of values read so far:
iterate' step 0 :: Producer Int IO ()
The simplest way to read out the value is to loop over the Producer using for:
main = runEffect $
for (iterate' step 0) $ \n -> do
lift (print n)
The program then endlessly loops, requesting user input and displaying the current tally:
>>> main
0
10<Enter>
10
14<Enter>
24
5<Enter>
29
...
Notice how this gets two things correct which the other solutions do not:
It works on infinite lists (you don't need a termination condition)
It produces results immediately. It doesn't wait until you run the action on the entire list to start producing usable values.
However, we can easily filter results just like the other two solutions. For example, let's say I want to stop when the tally is greater than 100. I can just write:
import qualified Pipes.Prelude as P
main = runEffect $
for (iterate' step 0 >-> P.takeWhile (< 100)) $ \n -> do
lift (print n)
You can read that as saying: "Loop over the iterated values while they are less than 100. Print the output". Let's try it:
>>> main
0
10<Enter>
10
20<Enter>
30
75<Enter>
>>> -- Done!
In fact, pipes has another helper function for printing out values, so you can simplify the above to a pipeline:
main = runEffect $ iterate' step 0 >-> P.takeWhile (< 100) >-> P.print
This gives a clear view of the flow of information. iterate' produces a never-ending stream of Ints, P.takeWhile filters that stream, and P.print prints all values that reach the end.
If you want to learn more about the pipes library, I encourage you to read the pipes tutorial.
Your functions lives in IO, so the signature is rather:
iterate'::(a -> IO a) -> a -> IO [a]
The problem is that the original iterate function returns an infinite list, so if you try to do the same in IO you will get an action that never ends. Maybe you should add a condition to end the iteration.
iterate' action value = do
result <- action value
if condition result
then return []
else
rest <- iterate' action result
return $ result : rest
Firstly, your resulting list must be in the IO monad, so iterate' must have produce an IO [a], rather than '[a]'
Iterate can be defined as:
iterate (a -> a) -> a -> [a]
iterate f x = x : iterate f (f x)
so we could make an iterateM quite easily
iterateM :: (a -> m a) -> m a -> [m a]
iterateM f x = x : iterateM f (x >>= f)
This still needs your seed value to be in the monad to start though, and also gives you a list of monadic things, rather than a monad of listy things.
So, lets change it a bit.
iterateM :: (a -> m a) -> a -> m [a]
iterateM f x = sequence $ go f (return x)
where
go f x = x : go f (x >>= f)
However, this doesn't work. This is because sequence first runs every action, and then returns. (You can see this if you write some safeDivide :: Double -> Double -> Maybe Double, and then try something like fmap (take 10) $ iterateM (flip safeDivide 2) 1000. You'll find it doesn't terminate. I'm not sure how to fix that though.
Consider the following abbreviated code from this excellent blog post:
import System.Random (Random, randomRIO)
newtype Stream m a = Stream { runStream :: m (Maybe (NonEmptyStream m a)) }
type NonEmptyStream m a = (a, Stream m a)
empty :: (Monad m) => Stream m a
empty = Stream $ return Nothing
cons :: (Monad m) => a -> Stream m a -> Stream m a
cons a s = Stream $ return (Just (a, s))
fromList :: (Monad m) => [a] -> NonEmptyStream m a
fromList (x:xs) = (x, foldr cons empty xs)
Not too bad thus far - a monadic, recursive data structure and a way to build one from a list.
Now consider this function that chooses a (uniformly) random element from a stream, using constant memory:
select :: NonEmptyStream IO a -> IO a
select (a, s) = select' (return a) 1 s where
select' :: IO a -> Int -> Stream IO a -> IO a
select' a n s = do
next <- runStream s
case next of
Nothing -> a
Just (a', s') -> select' someA (n + 1) s' where
someA = do i <- randomRIO (0, n)
case i of 0 -> return a'
_ -> a
I'm not grasping the mysterious cyclic well of infinity that's going on in the last four lines; the result a' depends on a recursion on someA, which itself could depend on a', but not necessarily.
I get the vibe that the recursive worker is somehow 'accumulating' potential values in the IO a accumulator, but I obviously can't reason about it well enough.
Could anyone provide an explanation as to how this function produces the behaviour that it does?
That code doesn't actually run in constant space, as it composes a bigger and bigger IO a action which delays all the random choices until it's reached the end of the stream. Only when we reach the Nothing -> a case does the action in a actually get run.
For example, try running it on an infinite, constant space stream made by this function:
repeat' :: a -> NonEmptyStream IO a
repeat' x = let xs = (x, Stream $ return (Just xs)) in xs
Obviously, running select on this stream won't terminate, but you should see the memory usage going up as it allocates a lot of thunks for the delayed actions.
Here's a slightly re-written version of the code which does the choices as it goes along, so it runs in constant space and should hopefully be more clear as well. Note that I've replaced the IO a argument with a plain a which makes it clear that there are no delayed actions being built up here.
select :: NonEmptyStream IO a -> IO a
select (x, xs) = select' x 1 xs where
select' :: a -> Int -> Stream IO a -> IO a
select' current n xs = do
next <- runStream xs
case next of
Nothing -> return current
Just (x, xs') -> do
i <- randomRIO (0, n) -- (1)
case i of
0 -> select' x (n+1) xs' -- (2)
_ -> select' current (n+1) xs' -- (3)
As the name implies, current stores the currently selected value at each step. Once we've extracted the next item from the stream, we (1) pick a random number and use this to decide whether to (2) replace our selection with the new item or (3) keep our current selection before recursing on the rest of the stream.
There doesn't seem anything "cyclic" going on here. In particular, a' does not depend on someA. The a' is bound by pattern machting on the result of next. It is being used by someA which is in turn used on the right hand side, but this does not constitute a cycle.
What select' does is to traverse the stream. It maintains two accumulating arguments. The first is a random element from the stream (it's not yet selected and still random, hence IO a). The second is the position in the stream (Int).
The invariant being maintained is that the first accumulator selects an element uniformly from the stream we have seen so far, and that the integer represents the number of elements encountered so far.
Now, if we reach the end of the stream (Nothing), we can return the current random element, and it will be ok.
If we see another element (the Just case), then we recurse by calling select' again. Updating the number of elements to n + 1 is trivial. But how do we update the random element someA? Well, the old random element a chooses between the first n positions of the stream with equal probability. If we choose the new element a' with probability 1 / (n + 1) and use the old one in all other cases, then we have a uniform distribution over the whole stream up to this point again.
I want to write a function that takes a time limit (in seconds) and a list, and computes as many elements of the list as possible within the time limit.
My first attempt was to first write the following function, which times a pure computation and returns the time elapsed along with the result:
import Control.DeepSeq
import System.CPUTime
type Time = Double
timed :: (NFData a) => a -> IO (a, Time)
timed x = do t1 <- getCPUTime
r <- return $!! x
t2 <- getCPUTime
let diff = fromIntegral (t2 - t1) / 10^12
return (r, diff)
I can then define the function I want in terms of this:
timeLimited :: (NFData a) => Time -> [a] -> IO [a]
timeLimited remaining [] = return []
timeLimited remaining (x:xs) = if remaining < 0
then return []
else do
(y,t) <- timed x
ys <- timeLimited (remaining - t) xs
return (y:ys)
This isn't quite right though. Even ignoring timing errors and floating point errors, this approach never stops the computation of an element of the list once it has started, which means that it can (and in fact, normally will) overrun its time limit.
If instead I had a function that could short-circuit evaluation if it had taken too long:
timeOut :: Time -> a -> IO (Maybe (a,t))
timeOut = undefined
then I could write the function that I really want:
timeLimited' :: Time -> [a] -> IO [a]
timeLimited' remaining [] = return []
timeLimited' remaining (x:xs) = do
result <- timeOut remaining x
case result of
Nothing -> return []
Just (y,t) -> do
ys <- timeLimited' (remaining - t) xs
return (y:ys)
My questions are:
How do I write timeOut?
Is there a better way to write the function timeLimited, for example, one that doesn't suffer from accumulated floating point error from adding up time differences multiple times?
Here's an example I was able to cook up using some of the suggestions above. I've not done huge amounts of testing to ensure work is cut off exactly when the timer runs out, but based on the docs for timeout, this should work for all things not using FFI.
import Control.Concurrent.STM
import Control.DeepSeq
import System.Timeout
type Time = Int
-- | Compute as many items of a list in given timeframe (microseconds)
-- This is done by running a function that computes (with `force`)
-- list items and pushed them onto a `TVar [a]`. When the requested time
-- expires, ghc will terminate the execution of `forceIntoTVar`, and we'll
-- return what has been pushed onto the tvar.
timeLimited :: (NFData a) => Time -> [a] -> IO [a]
timeLimited t xs = do
v <- newTVarIO []
_ <- timeout t (forceIntoTVar xs v)
readTVarIO v
-- | Force computed values into given tvar
forceIntoTVar :: (NFData a) => [a] -> TVar [a] -> IO [()]
forceIntoTVar xs v = mapM (atomically . modifyTVar v . forceCons) xs
-- | Returns function that does actual computation and cons' to tvar value
forceCons :: (NFData a) => a -> [a] -> [a]
forceCons x = (force x:)
Now let's try it on something costly:
main = do
xs <- timeLimited 100000 expensiveThing -- run for 100 milliseconds
print $ length $ xs -- how many did we get?
-- | Some high-cost computation
expensiveThing :: [Integer]
expensiveThing = sieve [2..]
where
sieve (p:xs) = p : sieve [x|x <- xs, x `mod` p > 0]
Compiled and run with time, it seems to work (obviously there is some overhead outside the timed portion, but I'm at roughly 100ms:
$ time ./timeLimited
1234
./timeLimited 0.10s user 0.01s system 97% cpu 0.112 total
Also, something to note about this approach; since I'm enclosing the entire operation of running the computations and pushing them onto the tvar inside one call to timeout, some time here is likely lost in creating the return structure, though I'm assuming (if your computations are costly) it won't account or much of your overall time.
Update
Now that I've had some time to think about it, due to Haskell's laziness, I'm not 100% positive the note above (about time-spent creating the return structure) is correct; either way, let me know if this is not precise enough for what you are trying to accomplish.
You can implement timeOut with the type you gave using timeout and evaluate. It looks something like this (I've omitted the part that computes how much time is left -- use getCurrentTime or similar for that):
timeoutPure :: Int -> a -> IO (Maybe a)
timeoutPure t a = timeout t (evaluate a)
If you want more forcing than just weak-head normal form, you can call this with an already-seq'd argument, e.g. timeoutPure (deepseq v) instead of timeoutPure v.
I would use two threads together with TVars and raise an exception (that causes every ongoing transaction to be rolled back) in the computation thread when the timeout has been reached:
forceIntoTVar :: (NFData a) => [a] -> TVar [a] -> IO [()]
forceIntoTVar xs v = mapM (atomically . modifyTVar v . forceCons) xs
-- | Returns function that does actual computation and cons' to tvar value
forceCons :: (NFData a) => a -> [a] -> [a]
forceCons x = (force x:)
main = do
v <- newTVarIO []
tID <- forkIO $ forceIntoTVar args v
threadDelay 200
killThread tID
readTVarIO v
In this example you (may) need to adjust forceIntoTVar a bit so that e.g. the list nodes are NOT computet inside the atomic transaction but first computed and then a atomic transaction is started to cons them onto the list.
In any case, when the exception is raised the ongoing transaction is rolled back or the ongoing computation is stopped before the result is consed to the list and that is what you want.
What you need to consider is that when the individual computations to prepare a node run with high frequency then this example is probably very costly compared to not using STM.