Manual performGC hugely reduces memory footprint - haskell

My program uses the GHC API in IO, doing some computation inside a GhcMonad and forcing the result before returning it; something like this:
main :: IO ()
main = do
    x <- runGhcT $ do
        x0 <- someGhcFunctionality
        x1 <- furtherProcessing
        liftIO . evaluate . force $ x1
    putStrLn "Done with GHC."
    _ <- getLine
    continueProcessingOutsideGhc x
At the pause point, I can see the process using 30+ GB of RAM; since continueProcessingOutsideGhc also uses some amount of memory on its own, this can lead to running out of memory in the middle of continueProcessingOutsideGhc.
However, what I have found is that manually forcing garbage collection at the pause point changes things drastically:
import System.Mem
main :: IO ()
main = do
    x <- runGhcT $ do
        x0 <- someGhcFunctionality
        x1 <- furtherProcessing
        liftIO . evaluate . force $ x1
    putStrLn "Done with GHC."
    _ <- getLine
    performGC
    putStrLn "Done with performGC."
    _ <- getLine
    continueProcessingOutsideGhc x
That performGC line decreases the memory footprint by 85%, to about 4 GB. This is enough, by a wide margin, to let continueProcessingOutsideGhc finish. I should also note that doing liftIO performGC inside runGhcT doesn't have the same effect; I guess that makes sense if the global GHC context is holding on to a lot of things.
What I'd like to understand is why all that garbage is left there after exiting runGhcT without the manual performGC.

I think there's not much to it. The garbage collector runs when the runtime notices that it's needed -- which, generally, is when the "short-lived" generation fills up. So if you have recently generated a bunch of garbage, it's going to live a while, until you allocate a bunch of new stuff and trigger a collection.
You haven't allocated any memory (well, maybe some, but certainly not much) since the end of runGhcT, so you haven't triggered a collection.
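If the goal is simply to drop the garbage before the next phase starts, one option is to wrap the forcing and the collection in a small helper. The following is only a sketch: forceThenCollect is a made-up name, not part of the GHC API, and it assumes the deepseq package is available. It fully evaluates the result of an action and then requests a major collection with performGC, so that whatever the action left unreachable can be reclaimed before the caller continues.

```haskell
import Control.DeepSeq (NFData, force)
import Control.Exception (evaluate)
import System.Mem (performGC)

-- | Hypothetical helper (not part of the GHC API): run an action, fully
-- evaluate its result so it no longer refers to unevaluated thunks, then
-- request a major garbage collection before handing the result back.
forceThenCollect :: NFData a => IO a -> IO a
forceThenCollect action = do
    result <- evaluate . force =<< action
    performGC
    return result
```

In the program from the question, the call would go around the whole runGhcT block rather than inside it, since collecting inside runGhcT was reported not to have the same effect.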

Related

How to reuse efficiently input from stdin in Haskell

I understand that I should not try to re-read from stdin, because doing so fails with a "handle is closed" error.
For example, in the code below:
main = do
    x <- getContents
    putStrLn $ map id x
    x <- getContents --problem line
    putStrLn x
the second call x <- getContents will cause the error:
test: <stdin>: hGetContents: illegal operation (handle is closed)
Of course, I can omit the second call to getContents:
main = do
    x <- getContents
    putStrLn $ map id x
    putStrLn x
But will this become a performance/memory issue? Will GHC have to keep all of the contents read from stdin in the main memory?
I imagine the first time around when x is consumed, GHC can throw away the portions of x that are already processed. So theoretically, GHC could only use a small amount of constant memory for the processing. But since we are going to use x again (and again), it seems that GHC cannot throw away anything. (Nor can it read again from stdin).
Is my understanding about the memory implications here correct? And if so, is there a fix?
Yes, your understanding is correct: if you reuse x, GHC has to keep it all in memory.
I think a possible fix is to consume it lazily (once).
Let's say you want to output x to several output handles hdls :: [Handle]. The naive approach is:
main :: IO ()
main = do
    x <- getContents
    forM_ hdls $ \hdl -> do
        hPutStr hdl x
This will read stdin into x as the first hPutStr traverses the string (at least for unbuffered handles, hPutStr is simply a loop that calls hPutChar for each character in the string). From then on it'll be kept in memory for all following hdls.
Alternatively:
main :: IO ()
main = do
    x <- getContents
    forM_ x $ \c -> do
        forM_ hdls $ \hdl -> do
            hPutChar hdl c
Here we've transposed the loops: Instead of iterating over the handles (and for each handle iterating over the input characters), we iterate over the input characters, and for each character, we print it to each handle.
I haven't tested it, but this form should guarantee that we don't need a lot of memory because each input character c is used once and then discarded.
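Writing one character at a time can be slow. A line-at-a-time variant of the same transposed loop might look like the sketch below. teeLines is a hypothetical helper, not a library function; it assumes text input, and it always writes a newline after each line, so input whose last line has no trailing newline gains one.

```haskell
import Control.Monad (forM_, unless)
import System.IO

-- | Hypothetical helper: copy the input handle to every output handle,
-- one line at a time, so only the current line has to stay in memory.
teeLines :: Handle -> [Handle] -> IO ()
teeLines input outputs = loop
  where
    loop = do
        eof <- hIsEOF input
        unless eof $ do
            line <- hGetLine input
            -- Each line is written to all outputs and then dropped.
            forM_ outputs $ \hdl -> hPutStrLn hdl line
            loop
```

For the question's setting this would be called as teeLines stdin hdls.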

Is evaluate or $! sufficient to WHNF-force a value in a multithreaded monadic context, or do I need pseq?

The following seems to work (as in: it keeps saying Surely tomorrow every second)
import Control.Concurrent
import Control.Concurrent.MVar
import Control.Exception (evaluate)
main :: IO ()
main = do
    godot <- newEmptyMVar
    forkIO $ do
        g <- evaluate $ last [0..]
        putMVar godot g
    let loop = do
          threadDelay $ 10^6
          g <- tryTakeMVar godot
          case g of
            Just g -> return ()
            Nothing -> putStrLn "Surely tomorrow." >> loop
    loop
This uses evaluate to ensure last [0..] is actually forced to WHNF before filling the MVar – if I change the forked thread to
    forkIO $ do
        let g = last [0..]
        putMVar godot g
then the program terminates.
However, evaluate uses seq. In the context of deterministic parallelism, it's always emphasized that seq is not sufficient to actually guarantee evaluation order. Does this problem not arise in a monadic context, or should I instead use
    forkIO $ do
        let g = last [0..]
        g `pseq` putMVar godot g
to ensure the compiler can't reorder the evaluation so tryTakeMVar succeeds prematurely?
The point of pseq is to ensure that after the parent thread sparks a computation with par, it does not immediately proceed to try to evaluate the result of the sparked computation itself, but instead does its own job first. See the documentation for an example. When you're working more explicitly with concurrency, you shouldn't need pseq.
If I'm not totally wrong, evaluating last [0..] to WHNF would take an infinite amount of time, because WHNF for an Int means that you know the exact number.
putMVar will not start executing before last [0..] is evaluated to WHNF (which as we know takes forever), because putMVar will need the RealWorld-token (s) returned by the call to evaluate. (Or to put it more simply: evaluate works. It finishes only after evaluating its argument to WHNF.)
evaluate :: a -> IO a
evaluate a = IO $ \s -> seq# a s
--                        this ^
putMVar (MVar mvar#) x = IO $ \ s# ->
--           which is used here ^^
    case putMVar# mvar# x s# of
--         is needed here ^^
        s2# -> (# s2#, () #)
where seq# is a GHC primitive that guarantees to return (# a, s #) only after evaluating a to WHNF (that's its purpose). That is, only after a is evaluated to WHNF can s be used in the call to putMVar. Although these tokens are purely imaginary ("RealWorld is deeply magical..."), they are respected by the compiler, and the whole IO monad is built on top of them.
So yes, evaluate is enough in this case. evaluate is more than seq: it combines IO-monadic sequencing with seq#-sequencing to produce its effect.
In fact, the pseq version looks a bit fishy to me, because it ultimately depends on lazy, where evaluate ultimately depends on seq# and monadic token-passing. And I trust seq# a bit more.
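To make the ordering concrete, here is a small sketch of the evaluate-then-put pattern as a reusable helper. putForced and staysEmptyOnError are made-up names for illustration. Because evaluate runs in the calling thread before putMVar, any exception raised while forcing the value surfaces in the producer, and the MVar is left empty.

```haskell
import Control.Concurrent.MVar
import Control.Exception (SomeException, evaluate, try)

-- | Hypothetical helper: force a value to WHNF in the calling thread,
-- and only then store it in the MVar.
putForced :: MVar a -> a -> IO ()
putForced var x = evaluate x >>= putMVar var

-- | Hypothetical check: if forcing the value throws, the exception is
-- raised before putMVar runs, so nothing is stored.
staysEmptyOnError :: IO Bool
staysEmptyOnError = do
    var <- newEmptyMVar
    r <- try (putForced var (error "boom" :: Int)) :: IO (Either SomeException ())
    stored <- tryTakeMVar var
    return (either (const True) (const False) r && stored == Nothing)
```

With plain putMVar var x the unevaluated thunk would be stored instead, and whoever takes it from the MVar would pay for (or be hit by) its evaluation.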

Haskell: How to benchmark a computation accurately with deepseq/force

I have a web server written in Haskell that computes some data in multiple steps.
I want to accurately measure and display how long each action takes.
In the presence of laziness, what is a good way to do this?
Note that "benchmarking" is not quite the right terminology since I only want to measure time in a production system and not sample many runs. I know that for that case I can use criterion.
You can use force from Control.DeepSeq to fully evaluate a data structure (and thus demand and measure its computation).
One problem is that forcing a large data structure takes some time itself!
This is because a deepseq (used by force) will walk down your algebraic data type tree, visiting every node (but not doing anything with it).
When you perform only a cheap operation on each node, such as map (*2) mylist, and try to measure how long it takes, this overhead can suddenly become significant, messing up your measurements.
import Control.DeepSeq
import Control.Exception (evaluate)
import Data.Time (diffUTCTime, getCurrentTime)
-- | Measures how long a computation takes, printing both the time and the
-- overhead of `force` to stdout. So it forces *twice*.
benchmarkForce :: NFData a => String -> IO a -> IO a
benchmarkForce msg action = do
    before <- getCurrentTime
    -- Force the first time to measure computation + forcing
    result <- evaluate . force =<< action
    after <- getCurrentTime
    -- Force again to see how long forcing itself takes
    _ <- evaluate . force $ result
    afterAgain <- getCurrentTime
    putStrLn $ msg ++ ": " ++ show (diffTimeMs before after) ++ " ms"
                   ++ " (force time: " ++ show (diffTimeMs after afterAgain) ++ " ms)"
    return result
  where
    -- Time difference `t2 - t1` in milliseconds
    diffTimeMs t1 t2 = realToFrac (t2 `diffUTCTime` t1) * 1000.0 :: Double
The first evaluate . force run will make sure your action and its return value are evaluated entirely.
By doing a second force run over the result, we can measure how much overhead it added to the first traversal.
This of course comes at the expense of two traversals; being able to measure how much time a deepseq wastes requires you to waste that time twice.
Here is an example to measure some pure functions with that:
main :: IO ()
main = do
    l <- benchmarkForce "create list" $
        return [1..10000000 :: Integer]
    _ <- benchmarkForce "double each list element" $
        return $ map (*2) l
    _ <- benchmarkForce "map id l" $
        return $ map id l
    return ()
(Of course it also works with functions in IO.)
The output:
create list: 1091.936 ms (force time: 71.33200000000001 ms)
double each list element: 1416.0569999999998 ms (force time: 96.808 ms)
map id l: 484.493 ms (force time: 67.232 ms)
As we can see, the force creates around 14% overhead in the map id l case (67.232 ms out of 484.493 ms).
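For a production system it may be more convenient to get the elapsed time back as a value than to print it. A minimal sketch, assuming base 4.11 or later for GHC.Clock and the deepseq package; timedForce is a hypothetical name. It uses the monotonic clock, which is not affected by wall-clock adjustments, and it forces only once, so the reported time still includes the traversal overhead discussed above.

```haskell
import Control.DeepSeq (NFData, force)
import Control.Exception (evaluate)
import GHC.Clock (getMonotonicTimeNSec)

-- | Hypothetical helper: run an action, fully evaluate its result, and
-- return the result together with the elapsed time in milliseconds.
timedForce :: NFData a => IO a -> IO (a, Double)
timedForce action = do
    start <- getMonotonicTimeNSec
    result <- evaluate . force =<< action
    end <- getMonotonicTimeNSec
    -- The clock reports nanoseconds; convert the difference to milliseconds.
    return (result, fromIntegral (end - start) / 1e6)
```

The caller can then log the number, attach it to a response, or feed it to whatever metrics system the server already uses.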

Haskell Gtk : mainGUI function blocking other threads

I'm having a little bit of trouble with threading (using STM) and combining that with GTK (http://hackage.haskell.org/package/gtk-0.12.3).
I have a function that loops indefinitely, ticking every 2 seconds, which prints the contents of a list, then clears the list, defined like
data VirtualHelicopter = VirtualHelicopter { getOrders :: TVar [(Option, Int)] }
run :: VirtualHelicopter -> IO ()
run h = do
    forever ( (putStrLn . show =<< (atomRead orders))
              >> clearOrders orders
              >> milliSleep 2000)
  where
    orders = getOrders h
    atomRead = atomically . readTVar
    clearOrders x = atomically $ writeTVar x []
    milliSleep = threadDelay . (*) 1000
Additionally, I have a GUI function, defined as
runGUI :: VirtualHelicopter -> IO ()
runGUI flyer = do
    Gtk.initGUI
    ~ GUI set up stuff, some key listeners that write to the TVar inside flyer ~
    forkIO $ run flyer
    Gtk.mainGUI
If I forkIO the "run" function from inside GHCI, everything works fine - it ticks every 2 seconds, printing to the console and updating the queue with anything I add into it using the TVar from the flyer.
However, when I try to forkIO from within the GUI, it no longer ticks every two seconds - it just kind of hangs until I put in some input, then gives unreliable outputs.
Anyone have any idea why this happens? You can see the whole project this is from, with full context at https://github.com/tetigi/majom, which is a Haskell extension of this project idea http://procrastineering.blogspot.co.uk/2011/11/computer-controlling-syma-helicopter.html. The threading implementation is on the "threadingForVHeli" branch.
As hammar said, adding -threaded to the build options solved it. I didn't realise there was a compiler option for threaded things!
I'll also work on clearing up the race condition that was mentioned :)
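For anyone else who runs into this: the flag is passed to GHC when building, for example ghc -threaded --make Main.hs (Main.hs being a placeholder file name), or via ghc-options: -threaded in a .cabal file. A program can also check at startup whether it was linked against the threaded runtime, since Control.Concurrent exports rtsSupportsBoundThreads for this. The sketch below is only an illustration; threadingAdvice and warnIfNotThreaded are made-up names.

```haskell
import Control.Concurrent (rtsSupportsBoundThreads)
import System.IO (hPutStrLn, stderr)

-- | Hypothetical helper: given whether the threaded runtime is in use,
-- produce a warning message if it is not.
threadingAdvice :: Bool -> Maybe String
threadingAdvice threaded
    | threaded = Nothing
    | otherwise = Just "warning: not linked with -threaded; forked threads may stall while the GUI main loop runs"

-- | Print the warning (if any) to stderr for the running program.
warnIfNotThreaded :: IO ()
warnIfNotThreaded =
    mapM_ (hPutStrLn stderr) (threadingAdvice rtsSupportsBoundThreads)
```

Calling warnIfNotThreaded before Gtk.initGUI would at least make the misconfiguration visible instead of showing up as a mysterious hang.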

Killing a thread when MVar is garbage collected

I have a worker thread which reads data repeatedly from an MVar and performs some useful work on that. After a while, the rest of the program forgets about that worker thread, which means that it will wait on an empty MVar and become very lonely. My question is:
Will the MVar be garbage collected if threads no longer write to it, for instance because they all wait for it?
Will garbage collection kill the waiting threads?
If neither, can I somehow indicate to the compiler that the MVar should be garbage collected and the thread be killed?
EDIT: I should probably clarify the purpose of my question. I don't desire general protection against deadlocks; instead, what I would like to do is to tie the life of the worker thread to life of a value (as in: dead values are claimed by garbage collection). In other words, the worker thread is a resource that I would like to free not by hand, but when a certain value (the MVar or a derivative) is garbage collected.
Here is an example program that demonstrates what I have in mind
import Control.Concurrent
import Control.Concurrent.MVar
import Control.Monad (forever)
main = do
    something
    -- the thread forked in something can be killed here
    -- because the MVar used for communication is no longer in scope
    etc
something = do
    v <- newEmptyMVar
    forkIO $ forever $ work =<< takeMVar v
    putMVar v "Haskell"
    putMVar v "42"
In other words, I want the thread to be killed when I can no longer communicate with it, i.e. when the MVar used for communication is no longer in scope. How can I do that?
It will just work: when the MVar is only reachable by the thread that is blocked on it, then the thread is sent the BlockedIndefinitelyOnMVar exception, which will normally cause it to die silently (the default exception handler for a thread ignores this exception).
BTW, for doing some cleanup when the thread dies, you'll want to use forkFinally (which I just added to Control.Concurrent).
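A minimal sketch of what that could look like, assuming a base version whose Control.Concurrent exports forkFinally. spawnWorker and the done signal are made up for illustration. The finalizer passed to forkFinally runs whether the worker returns normally or is terminated by an exception such as BlockedIndefinitelyOnMVar.

```haskell
import Control.Concurrent (forkFinally)
import Control.Concurrent.MVar

-- | Hypothetical helper: fork a worker that handles messages from an
-- inbox until it receives Nothing. The second MVar is filled when the
-- worker thread ends, however it ends.
spawnWorker :: (a -> IO ()) -> IO (MVar (Maybe a), MVar ())
spawnWorker handleMsg = do
    inbox <- newEmptyMVar
    done <- newEmptyMVar
    let loop = do
          msg <- takeMVar inbox
          case msg of
            Nothing -> return ()
            Just x -> handleMsg x >> loop
    -- The cleanup action receives the worker's result or exception;
    -- here it is ignored and only the termination is signalled.
    _ <- forkFinally loop (\_ -> putMVar done ())
    return (inbox, done)
```

Any real cleanup (closing handles, logging) would go where putMVar done () is.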
If you're lucky, you'll get a "BlockedIndefinitelyOnMVar", indicating that you're waiting on an MVar that no thread will ever write to.
But, to quote Ed Yang,
GHC only knows that a thread can be considered garbage if there are no
references to the thread. Who is holding a reference to the thread?
The MVar, as the thread is blocking on this data structure and has
added itself to the blocking list of this. Who is keeping the MVar
alive? Why, our closure that contains a call to takeMVar. So the
thread stays.
without a bit of work (which would be, by the way, quite interesting to see), BlockedIndefinitelyOnMVar is not an obviously useful mechanism for giving your Haskell programs deadlock protection.
GHC just can't solve the problem in general of knowing whether your thread will make progress.
A better approach would be to explicitly terminate threads by sending them a Done message. E.g. just lift your message type into an optional value that also includes an end-of-message value:
import Control.Concurrent
import Control.Concurrent.MVar
import Control.Monad
import Control.Exception
import Prelude hiding (catch)
main = do
    something
    threadDelay (10 * 10^6)
    print "Still here"
something = do
    v <- newEmptyMVar
    forkIO $
        finally
            (let go = do x <- takeMVar v
                         case x of
                             Nothing -> return ()
                             Just v -> print v >> go
             in go)
            (print "Done!")
    putMVar v $ Just "Haskell"
    putMVar v $ Just "42"
    putMVar v Nothing
and we get the correct clean up:
$ ./A
"Haskell"
"42"
"Done!"
"Still here"
I tested the simple weak MVar and it did get finalized and killed. The code is:
import Control.Monad
import Control.Exception
import Control.Concurrent
import Control.Concurrent.MVar
import System.Mem(performGC)
import System.Mem.Weak
dologger :: MVar String -> IO ()
dologger mv = do
    tid <- myThreadId
    weak <- mkWeakPtr mv (Just (putStrLn "X" >> killThread tid))
    logger weak
logger :: Weak (MVar String) -> IO ()
logger weak = act where
    act = do
        v <- deRefWeak weak
        case v of
            Just mv -> do
                a <- try (takeMVar mv) :: IO (Either SomeException String)
                print a
                either (\_ -> return ()) (\_ -> act) a
            Nothing -> return ()
play mv = act where
    act = do
        c <- getLine
        if c=="quit" then return ()
            else putMVar mv c >> act
doplay mv = do
    forkIO (dologger mv)
    play mv
main = do
    putStrLn "Enter a string to escape, or quit to exit"
    mv <- newEmptyMVar
    doplay mv
    putStrLn "*"
    performGC
    putStrLn "*"
    yield
    putStrLn "*"
    threadDelay (10^6)
    putStrLn "*"
The session with the program was:
(chrisk)-(/tmp)
(! 624)-> ghc -threaded -rtsopts --make weak2.hs
[1 of 1] Compiling Main ( weak2.hs, weak2.o )
Linking weak2 ...
(chrisk)-(/tmp)
(! 625)-> ./weak2 +RTS -N4 -RTS
Enter a string to escape, or quit to exit
This is a test
Right "This is a test"
Tab Tab
Right "Tab\tTab"
quit
*
*
X
*
Left thread killed
*
So blocking on takeMVar did not keep the MVar alive on ghc-7.4.1 despite expectations.
While BlockedIndefinitelyOnMVar should work, also consider using ForeignPtr finalizers. The normal role of those is to delete C structures that are no longer accessible in Haskell. However, you can attach any IO finalizer to them.
