I'm using gloss to create an RTS game in Haskell, but I've noticed that even a very simple program will occupy more and more memory as it runs. The following program, for example, gradually increases its memory use (by roughly 0.025 MB per second).
module Main (
    main
) where

import Graphics.Gloss
import Graphics.Gloss.Interface.IO.Game

main =
    playIO (InWindow "glossmem" (500, 500) (0,0)) white 10 0
        (\world -> return (translate (-250) 0 (text $ show world)))
        (\event -> (\world -> return world))
        (\timePassed -> (\world -> return $ world + timePassed))
I've tried limiting the heap size at runtime, but that just causes the program to crash when it hits the limit. I'm concerned this behaviour will become a performance issue when I have a more complex world. Is there a way to use gloss such that this won't be an issue? Or am I using the wrong tool for the job?
Thanks, I fixed this in gloss-1.7.7.1. It was a typical laziness-induced space leak in the code that manages the frame timing for animations. Your example program now runs in constant space.
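For readers who haven't met this kind of leak before, here is a minimal sketch of the general pattern. It is not the actual gloss code: the Timing record and the function names are made up for illustration. A lazily accumulated value can build up a chain of unevaluated thunks frame after frame (especially without optimisation), while a strict fold or strict record fields keep the state evaluated.

```haskell
import Data.List (foldl')

-- Lazy left fold: the accumulator is not forced while folding, so a chain
-- of (+) thunks may grow with the number of frames until the result is used.
totalLazy :: [Float] -> Float
totalLazy = foldl (+) 0

-- Strict left fold: the accumulator is evaluated at every step, so the
-- running total stays a plain number.
totalStrict :: [Float] -> Float
totalStrict = foldl' (+) 0

-- Hypothetical frame-timing state (NOT gloss's real type). The strict
-- fields (!) force each component whenever a Timing value is constructed.
data Timing = Timing
  { elapsed :: !Float
  , frames  :: !Int
  }

-- Advance the timing state by one frame lasting dt seconds.
stepTiming :: Timing -> Float -> Timing
stepTiming (Timing e n) dt = Timing (e + dt) (n + 1)

-- Fold a list of frame durations into the final timing state.
runFrames :: [Float] -> Timing
runFrames = foldl' stepTiming (Timing 0 0)
```

Both folds return the same number; the difference only shows up in the heap profile of a long-running program.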
Related
I am writing a small snake game in Haskell as a sort of guided tutorial for beginners. The "rendering" just takes a Board and produces a Data.ByteString.Builder which is printed in the terminal. (The HTML profiles are pushed to the repo; you can inspect them without compiling the program.)
The problem
The problem I have is that the heap profile looks weird: there are many spikes, and suddenly Builder, PAP and BuildStep take as much memory as the rest of the program. Considering that rendering happens 10 times a second (i.e. every second we produce 10 builders), it seems inconsistent that every once in a while the builder takes that much memory. I don't know if this counts as a space leak, since there are no thunks in the profile, but the PAP doesn't look right (I don't know...)
Implementation
The board is represented as an immutable array of builders indexed by coordinates (tuples): type Board = Array (Int, Int) Builder (essentially, what should be printed at each coordinate). The function which converts the board into a builder is the expected strict fold, which handles new lines using the height and width of the board.
toBuilder :: RenderState -> Builder
--                     |- The Array (Int, Int) Builder
toBuilder (RenderState b binf@(BoardInfo h w) gOver s) =
--                                       ^^^ height and width
  if gOver
    then ppScore s <> fst (boardToString $ emptyGrid binf) -- Not interesting. On game over, print an empty grid
    else ppScore s <> fst (boardToString b)                -- print the current board
  where
    -- Concatenate builders and count them, so that a new line is added
    -- every time `w` (the width) builders have been concatenated.
    boardToString = foldl' fprint (mempty, 0)
    fprint (!s, !i) cell =
      if ((i + 1) `mod` w) == 0
        then (s <> cell <> B.charUtf8 '\n', i + 1)
        else (s <> cell, i + 1)
According to the .prof file, this function takes most of the time and space (92%, which is expected). Moreover, this is the only part of the code that produces a big builder, so the problem should be here.
The buffering mode
The above profile is produced when BufferMode is set to LineBuffering (the default), but interestingly, if I change it to NoBuffering, the profile looks the same except that a thunk appears and the builder disappears...
The questions
I have reached a point where I don't know what's going on, hence my questions are a little bit vague:
Is my code with line buffering (first profile) actually leaking? No thunk appears, but the PAP eating so much memory looks like a warning sign.
The second profile clearly(?) leaks. Is there a standard way to inspect which part of the code is producing the thunk?
Am I completely missing something, and the profile actually looks fine?
In case anyone is interested, I think I've found the problem. It is the terminal speed... If I run with a smaller board size or a slower rendering rate (the picture is for a 50x70 board with 10 renders a second), then the memory usage is completely normal.
What I think is happening is this: the board is printed to the console using B.hPutBuilder stdout, and this action returns sooner than the console can actually print it, so the Haskell thread continues and creates another board, which has to wait to be printed because the console is busy. I guess this somehow leads to two boards living in memory for a short time.
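One way to test that guess would be to take the Handle's buffering mode out of the equation and flush explicitly after every frame. The sketch below is only an assumption about how that could look: layoutCells, previewLayout, setupTerminal and renderFrame are names I made up, and the fold works on a plain list of cells instead of the Array from the question. It will not make a slow terminal any faster; it only makes each frame get handed to the OS before the next one is built.

```haskell
{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString.Builder as B
import qualified Data.ByteString.Lazy.Char8 as LC
import Data.List (foldl')
import System.IO (BufferMode (BlockBuffering), hFlush, hSetBuffering, stdout)

-- Join cells into rows, adding a newline after every `width` cells.
-- Assumes width > 0 (a width of 0 would divide by zero in `mod`).
layoutCells :: Int -> [B.Builder] -> B.Builder
layoutCells width = fst . foldl' step (mempty, 0 :: Int)
  where
    step (!acc, !i) cell
      | (i + 1) `mod` width == 0 = (acc <> cell <> B.charUtf8 '\n', i + 1)
      | otherwise                = (acc <> cell, i + 1)

-- Pure helper for inspecting the layout without touching the terminal.
previewLayout :: Int -> String -> String
previewLayout width =
  LC.unpack . B.toLazyByteString . layoutCells width . map B.charUtf8

-- Use block buffering with the default buffer size for stdout.
setupTerminal :: IO ()
setupTerminal = hSetBuffering stdout (BlockBuffering Nothing)

-- Write one frame and flush it before returning to the game loop.
renderFrame :: Int -> [B.Builder] -> IO ()
renderFrame width cells = do
  B.hPutBuilder stdout (layoutCells width cells)
  hFlush stdout
```

If the spikes in the profile disappear or change shape with this setup, that would at least point towards buffering rather than the fold itself.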
Other guesses are welcome!
I have been trying to write a function that will take a histogram of a vector using the accelerate library. I recognize that histograms aren't the ideal case for GPU processing, but I'm generating a fairly large dataset from a small seed and it would be nice if it could be reduced to an array of a few kilobytes before transferring it back to main memory.
The code that I've come up with is below. It takes a number of output bins, then creates a new array a where the value of a[x] is the number of occurrences of x in xs.
hist :: A.Exp Int -> A.Acc (A.Vector Int) -> A.Acc (A.Vector Int)
hist bins xs = A.permute
    (const (+1))
    (A.fill (A.index1 bins) 0)
    (A.index1 . (xs A.!))
    xs
The code appears to run properly under the Accelerate interpreter. However, if I try to call it through accelerate-cuda, I get the following error message.
./Data/Array/Accelerate/CUDA/State.hs:85:9: (unhandled): CUDA Exception: unspecified launch failure
My question is two-fold. First, what am I doing that causes CUDA to fail? Second, is there a better way to take a histogram through Accelerate?
This was a bug in Accelerate (and/or underlying change in CUDA) which has now been fixed. Apologies for taking so long to get to it, this slipped off my radar.
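While checking a GPU histogram it can help to have a plain CPU reference to compare against. The sketch below uses accumArray from the array package, which does the same kind of "scatter with a combining function" that permute does. It is not Accelerate code, and histRef is a name I made up. It also uses (+) over a stream of ones as the combining step; as far as I know Accelerate requires the permute combination function to be associative and commutative, which (+) is and const (+1) is not.

```haskell
import Data.Array (Array, accumArray, elems)

-- Reference histogram on the CPU: element x of the result counts how many
-- times x occurs in the input. Values outside [0, bins) are ignored here,
-- which is an assumption of this sketch, not something the question states.
histRef :: Int -> [Int] -> Array Int Int
histRef bins xs =
  accumArray (+) 0 (0, bins - 1) [ (x, 1) | x <- xs, x >= 0, x < bins ]

-- Convenience wrapper returning the counts as a list.
histList :: Int -> [Int] -> [Int]
histList bins = elems . histRef bins
```

For example, histList 4 [0,1,1,3,3,3] gives [1,2,0,3], which can be compared against the result of running hist through the interpreter backend.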
I'm trying to deliver a large HTTP response using the Haskell Snap framework, but memory usage grows in proportion to the size of the response. Here are a couple of cut-down test cases that use a large lazy ByteString:
import Snap.Core (Snap, writeLBS, readRequestBody)
import Snap.Http.Server (quickHttpServe)
import Control.Monad.IO.Class (MonadIO(liftIO))
import qualified Data.ByteString.Lazy.Char8 as LBS (ByteString, length, replicate)

main :: IO ()
main = quickHttpServe $ site test1 where
    test1, test2 :: LBS.ByteString -> Snap ()
    -- Send ss to client
    test1 = writeLBS
    -- Print ss to stdout upon receiving request
    test2 = liftIO . print

    site write = do
        body <- readRequestBody 1000
        -- Making ss dependent on the request stops GHC from keeping a
        -- reference to ss as pointed out by Reid Barton.
        let bodyLength = fromIntegral $ LBS.length body
        write $ ss bodyLength

    ss c = LBS.replicate (1000000000000 * (c + 1)) 'S'
test1 delivers ss to the client. Memory usage grows in proportion to the ByteString's size.
test2 prints ss to stdout within the Snap monad stack upon receiving a request. This runs in a small constant amount of memory.
The responses are delivered using chunked encoding, so I thought Snap should also be able to deliver the response in constant memory. Is there any way to achieve this? It's also worth noting that the response starts being delivered immediately. I've tried using transformRequestBody, but ran into the same problem.
Memory was measured using "top" on Linux and was observed to grow steadily to 15 GB resident, at which point it was just starting to swap. The request did complete, so the memory usage is a couple of orders of magnitude less than the size of the ByteString. Once the request had completed, it sat at 15 GB resident. When I fired another request at it, it remained steady at 15 GB and completed the request as before. The virtual size stayed within 5% of the resident size.
Firing 2 concurrent requests at it resulted at first in a drop in virtual and resident memory to about 5 GB, followed by an increase to about 17 GB, at which point the machine was becoming unusable, so I killed the process.
GHC version 7.8.3
Snap version 0.14.0.5
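For comparison, this is roughly what "constant memory" consumption of a lazy ByteString looks like outside of Snap. It is only a sketch with made-up names (writeChunked, chunkedLength) and does not use any Snap API: each strict chunk is handled and then dropped, so only the current chunk needs to be resident as long as nothing else keeps a reference to the start of the string.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy.Char8 as LBS
import Data.Int (Int64)
import Data.List (foldl')
import System.IO (Handle)

-- Write a lazy ByteString to a handle one strict chunk at a time.
writeChunked :: Handle -> LBS.ByteString -> IO ()
writeChunked h = mapM_ (BS.hPut h) . LBS.toChunks

-- Pure analogue of the same traversal: count the bytes chunk by chunk
-- with a strict accumulator.
chunkedLength :: LBS.ByteString -> Int64
chunkedLength =
  foldl' (\n c -> n + fromIntegral (BS.length c)) 0 . LBS.toChunks
```

If the server side retains the chunks after sending them (for example by buffering the whole body), memory would grow with the response size even though the source ByteString is lazy.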
I am trying to parallelize a ray-tracer. This means I have a very long list of small computations. The vanilla program runs on a specific scene in 67.98 seconds and 13 MB of total memory use and 99.2% productivity.
In my first attempt I used the parallel strategy parBuffer with a buffer size of 50. I chose parBuffer because it walks through the list only as fast as sparks are consumed, and does not force the spine of the list like parList, which would use a lot of memory since the list is very long. With -N2, it ran in a time of 100.46 seconds and 14 MB of total memory use and 97.8% productivity. The spark information is: SPARKS: 480000 (476469 converted, 0 overflowed, 0 dud, 161 GC'd, 3370 fizzled)
The large proportion of fizzled sparks indicates that the granularity of sparks was too small, so next I tried using the strategy parListChunk, which splits the list into chunks and creates a spark for each chunk. I got the best results with a chunk size of 0.25 * imageWidth. The program ran in 93.43 seconds and 236 MB of total memory use and 97.3% productivity. The spark information is: SPARKS: 2400 (2400 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled). I believe the much greater memory use is because parListChunk forces the spine of the list.
Then I tried to write my own strategy that lazily divided the list into chunks and then passed the chunks to parBuffer and concatenated the results.
concat $ withStrategy (parBuffer 40 rdeepseq) (chunksOf 100 (map colorPixel pixels))
This ran in 95.99 seconds with 22 MB of total memory use and 98.8% productivity. This was successful in the sense that all the sparks are being converted and the memory usage is much lower; however, the speed is not improved. Here is an image of part of the eventlog profile.
As you can see, the threads are being stopped due to heap overflows. I tried adding +RTS -M1G, which raises the maximum heap size to 1 GB. The results did not change. I read that the Haskell main thread will use memory from the heap if its stack overflows, so I also tried raising the maximum stack size with +RTS -M1G -K1G, but this also had no impact.
Is there anything else I can try? I can post more detailed profiling info for memory usage or eventlog if needed, I did not include it all because it is a lot of information and I did not think all of it was necessary to include.
EDIT: I was reading about the Haskell RTS multicore support, and it talks about there being a HEC (Haskell Execution Context) for each core. Each HEC contains, among other things, an Allocation Area (which is a part of a single shared heap). Whenever any HEC's Allocation Area is exhausted, a garbage collection must be performed. There appears to be an RTS option to control its size, -A. I tried -A32M but saw no difference.
EDIT2:
Here is a link to a github repo dedicated to this question. I have included the profiling results in the profiling folder.
EDIT3: Here is the relevant bit of code:
render :: [([(Float,Float)],[(Float,Float)])] -> World -> [Color]
render grids world = cs where
    ps = [ (i,j) | j <- reverse [0..wImgHt world - 1] , i <- [0..wImgWd world - 1] ]
    cs = map (colorPixel world) (zip ps grids)
    --cs = withStrategy (parListChunk (round (wImgWd world)) rdeepseq) (map (colorPixel world) (zip ps grids))
    --cs = withStrategy (parBuffer 16 rdeepseq) (map (colorPixel world) (zip ps grids))
    --cs = concat $ withStrategy (parBuffer 40 rdeepseq) (chunksOf 100 (map (colorPixel world) (zip ps grids)))
The grids are random floats that are precomputed and used by colorPixel. The type of colorPixel is:
colorPixel :: World -> ((Float,Float),([(Float,Float)],[(Float,Float)])) -> Color
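For anyone who wants to experiment with the "lazy chunks plus rolling buffer" idea without the rest of the ray tracer, here is a self-contained sketch. It is an approximation, not the code from the repo: rollingSparks roughly mirrors what parBuffer does, but is written with par from GHC.Conc (base) and force from deepseq so it has no dependency on the parallel or split packages. All names (chunksOfN, rollingSparks, parMapChunked) are made up for illustration.

```haskell
import Control.DeepSeq (NFData, force)
import GHC.Conc (par)

-- Lazily split a list into chunks of n elements (n is clamped to >= 1).
chunksOfN :: Int -> [a] -> [[a]]
chunksOfN n xs
  | null xs   = []
  | otherwise = let (chunk, rest) = splitAt (max 1 n) xs
                in chunk : chunksOfN n rest

-- Spark the first `buf` elements, then spark one more element each time
-- the consumer takes one, so at most about `buf` sparks are outstanding
-- and the spine of the list is never forced ahead of that window.
rollingSparks :: Int -> [a] -> [a]
rollingSparks buf xs = go xs (start buf xs)
  where
    go (y : ys) (z : zs) = z `par` (y : go ys zs)
    go ys       _        = ys
    start 0 zs           = zs
    start _ []           = []
    start k (z : zs)     = z `par` start (k - 1) zs

-- Map f over the input in chunks; each spark fully evaluates one chunk,
-- because forcing `force (map f chunk)` to WHNF evaluates it deeply.
parMapChunked :: NFData b => Int -> Int -> (a -> b) -> [a] -> [b]
parMapChunked buf size f =
  concat . rollingSparks buf . map (force . map f) . chunksOfN size
```

A call like parMapChunked 40 100 (colorPixel world) (zip ps grids) would correspond to the last commented-out line above. It needs to be compiled with -threaded and run with +RTS -N2 to use more than one core; the result is the same list as a plain map either way.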
Not the solution to your problem, but a hint to the cause:
Haskell seems to be very conservative about memory, and when the runtime sees the potential to reclaim a memory block, it goes for it. Your problem description fits the minor GC behavior described at the bottom of this page:
https://wiki.haskell.org/GHC/Memory_Management.
New data are allocated in 512kb "nursery". Once it's exhausted, "minor
GC" occurs - it scans the nursery and frees unused values.
So if you chop the data into smaller chunks, you enable the engine to do the cleanup earlier - GC kicks in.
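To see how often the collector actually runs for a given chunk size or -A setting, the program can query the RTS itself. The sketch below uses GHC.Stats from base; the GcSummary type and gcSummary function are names I made up. The statistics are only available when the program is started with +RTS -T (or another option that enables stats collection), so the function returns Nothing otherwise.

```haskell
import Data.Word (Word32, Word64)
import GHC.Stats (RTSStats (..), getRTSStats, getRTSStatsEnabled)

-- A few GC counters pulled out of the RTS statistics.
data GcSummary = GcSummary
  { totalGcs :: Word32  -- all collections so far, minor and major
  , majorGcs :: Word32  -- major collections only
  , maxLive  :: Word64  -- largest amount of live data seen, in bytes
  } deriving Show

-- Read the counters, or return Nothing if the RTS is not collecting stats.
gcSummary :: IO (Maybe GcSummary)
gcSummary = do
  enabled <- getRTSStatsEnabled
  if enabled
    then do
      s <- getRTSStats
      return (Just (GcSummary (gcs s) (major_gcs s) (max_live_bytes s)))
    else return Nothing
```

Printing gcSummary before and after the render call gives a rough count of how many minor collections a particular configuration triggers.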
I've written a daemon in Haskell that scrapes information from a webpage every 5 minutes.
The daemon originally ran fine for about 50 minutes, but then it unexpectedly died with "out of memory (requested 1048576 bytes)". Every time I ran it, it died after the same amount of time. When I set it to sleep for only 30 seconds, it died after 8 minutes instead.
I realized the code to scrape the website was incredibly memory inefficient (going from about 30M while sleeping to 250M while parsing 9M of html), so I rewrote it so that now it only uses about 15M extra while parsing. Thinking the problem was fixed, I ran the daemon overnight and when I woke up it was actually using less memory than it was that night. I thought I was done, but roughly 20 hours after it had started, it had crashed with the same error.
I started looking into GHC profiling but I wasn't able to get that to work. Next I started messing with RTS options: I tried setting -H64m to make the default heap size larger than what my program was using, and also using -Ksize to shrink the maximum size of the stack to see if that would make it crash sooner.
Despite every change I've made, the daemon still seems to crash after a constant number of iterations. Making the parsing more memory efficient made this number higher, but it still crashes. This doesn't make sense to me because none of these runs have even come close to using all of my memory, much less swap space. The heap size is supposed to be unlimited by default, shrinking the stack size didn't make a difference, and all my ulimits are either unlimited or significantly higher than what the daemon is using.
In the original code I pinpointed the crash to somewhere in the HTML parsing, but I haven't done the same for the more memory efficient version because a run takes 20 hours. I don't know if this would even be useful to know, because it doesn't seem like any specific part of the program is broken: it runs successfully for dozens of iterations before crashing.
Out of ideas, I even looked through the ghc source code for this error, and it appears to be a failed call to mmap, which wasn't very helpful to me because I assume that isn't the root of the problem.
(Edit: code rewritten and moved to end of post)
I'm pretty new at Haskell, so I'm hoping this is some quirk of lazy evaluation or something else that has a quick fix. Otherwise, I'm fresh out of ideas.
I'm using GHC version 7.4.2 on FreeBSD 9.1.
Edit:
Replacing the downloading with static HTML got rid of the problem, so I've narrowed it down to how I'm using http-conduit. I've edited the code above to include my networking code. The Hackage docs say to share a manager, so I've done that. They also say that for http you have to explicitly close connections, but I don't think I need to do that for httpLbs.
Here's my code.
{-# LANGUAGE OverloadedStrings #-}
-- OverloadedStrings is needed for the ByteString literal in `pattern`.
-- url, params and doSleep are defined elsewhere (not shown).

import Control.Monad.IO.Class (liftIO)
import qualified Data.Text as T
import qualified Data.ByteString.Lazy as BL
import Text.Regex.PCRE
import Network.HTTP.Conduit

main :: IO ()
main = do
    manager <- newManager def
    daemonLoop manager

daemonLoop :: Manager -> IO ()
daemonLoop manager = do
    rows <- scrapeWebpage manager
    putStrLn $ "number of rows parsed: " ++ (show $ length rows)
    doSleep
    daemonLoop manager

scrapeWebpage :: Manager -> IO [[BL.ByteString]]
scrapeWebpage manager = do
    putStrLn "before makeRequest"
    html <- makeRequest manager
    -- Force evaluation of html.
    putStrLn $ "html length: " ++ (show $ BL.length html)
    putStrLn "after makeRequest"
    -- Breaks ~10M html table into 2d list of bytestrings.
    -- Max memory usage is about 45M, which is about 15M more than when sleeping.
    return $ map tail $ html =~ pattern
  where
    pattern :: BL.ByteString
    pattern = BL.concat $ replicate 12 "<td[^>]*>([^<]+)</td>\\s*"

makeRequest :: Manager -> IO BL.ByteString
makeRequest manager = runResourceT $ do
    defReq <- parseUrl url
    let request = urlEncodedBody params $ defReq
            -- Don't throw errors for bad statuses.
            { checkStatus = \_ _ -> Nothing
            -- 1 minute.
            , responseTimeout = Just 60000000
            }
    response <- httpLbs request manager
    return $ responseBody response
and its output:
before makeRequest
html length: 1555212
after makeRequest
number of rows parsed: 3608
...
before makeRequest
html length: 1555212
after makeRequest
bannerstalkerd: out of memory (requested 2097152 bytes)
Getting rid of the regex computations fixed the problem, but it seems that the error happens after the networking and during the regex, presumably because of something I'm doing wrong with http-conduit. Any ideas?
Also, when I try to compile with profiling enabled I get this error:
Could not find module `Network.HTTP.Conduit'
Perhaps you haven't installed the profiling libraries for package `http-conduit-1.8.9'?
Indeed, I have not installed profiling libraries for http-conduit and I don't know how.
So you've found yourself a leak. By tinkering with compiler options and memory settings you can only postpone the moment your program crashes; you cannot eliminate the source of the problem, so no matter what you set there, you will still run out of memory eventually.
I recommend that you carefully walk through all the non-pure code, primarily the part working with resources. Check whether all resources get released correctly. Check whether you have an accumulating state, like a growing unbounded channel. And, of course, as wisely suggested by n.m., profile it.
I have a scraper that parses pages without pausing and downloads files, and it does it all concurrently. I've never seen it use any more memory than ~60M. I've been compiling it with GHC 7.4.2, GHC 7.6.1 and GHC 7.6.2 and had no problems with any of them.
It should be noted that the root of your problem may also be in the libraries you're using.
In my scraper I use http-conduit, http-conduit-browser, HandsomeSoup and HXT.
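To illustrate the "no accumulating state" point, here is a small sketch of a loop body that fully evaluates its result before returning, so that no thunk is left holding on to the downloaded page between iterations. It is not a diagnosis of the crash above. parseRows is a hypothetical stand-in for the regex-based parser, and the fetch action is passed in as a parameter so the sketch doesn't depend on http-conduit.

```haskell
import Control.DeepSeq (force)
import Control.Exception (evaluate)
import Control.Monad (replicateM)
import qualified Data.ByteString.Lazy.Char8 as BL

-- Hypothetical stand-in for the table parser: one row per line,
-- one cell per whitespace-separated word.
parseRows :: BL.ByteString -> [[BL.ByteString]]
parseRows = map BL.words . BL.lines

-- One iteration: fetch, parse, and fully evaluate the rows.
-- BL.copy gives every cell its own buffer, so the cells do not keep the
-- whole page alive by sharing its underlying chunks.
scrapeOnce :: IO BL.ByteString -> IO Int
scrapeOnce fetch = do
  html <- fetch
  rows <- evaluate (force (map (map BL.copy) (parseRows html)))
  return (length rows)

-- Run a fixed number of iterations and collect the row counts.
runIterations :: Int -> IO BL.ByteString -> IO [Int]
runIterations n fetch = replicateM n (scrapeOnce fetch)
```

If memory still grows with a loop shaped like this and static input, the leak is more likely in a library or the runtime than in the loop itself.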
I ended up solving my own problem. It seems to be a GHC bug on FreeBSD. I submitted a bug report and switched to Linux, and now it's been running flawlessly for the last few days.