After seeing EKG in 24 Days of Hackage, I tried to use it in one of my programs, but it wasn't showing any of my memory allocation.
So I tried it again with a sample program that just sucks up memory:
{-# LANGUAGE OverloadedStrings #-}
module Main where
import System.Remote.Monitoring (forkServer)
import Control.Applicative ((<$>))
import Control.Monad (foldM, forM_)
import Control.Monad.Primitive (PrimMonad, PrimState)
import Data.Vector.Mutable (MVector, replicate, read, write, length)
import Prelude hiding (read, length, replicate)
import Text.Printf
accumBy :: (Functor m, PrimMonad m) => (a -> a -> a) -> MVector (PrimState m) a -> m a
accumBy f v = do
    a <- read v 0
    foldM (\a i -> do
        a' <- f a <$> read v i
        write v i a'
        return a'
        ) a [1 .. length v - 1]
main :: IO ()
main = do
    forkServer "localhost" 8000
    forM_ [1..] $ \n -> do
        v <- replicate (n*1024) (n :: Int)
        accumBy (+) v >>= printf "%08x\n"
The program runs fine:
% ghc --make Temp.hs -rtsopts && ./Temp +RTS -K32mM -RTS
00000400
00001000
00002400
...
But EKG doesn't seem to be detecting my memory usage at all.
What am I doing wrong?
You need to pass one of the RTS options -T, -t, -S, or -s to enable statistics collection, e.g.:
ghc --make Temp.hs -rtsopts && ./Temp +RTS -T -K32mM -RTS
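One way to confirm from inside the program that statistics collection is actually on is to ask the runtime. This is a minimal sketch, assuming base 4.10 or later (where GHC.Stats exports getRTSStatsEnabled and getRTSStats); the helper name statsMessage is made up for illustration and is not part of EKG:

```haskell
module Main where

import Control.Monad (when)
import GHC.Stats (RTSStats (..), getRTSStats, getRTSStatsEnabled)

-- Hypothetical helper: turn the "stats enabled" flag into a readable message.
statsMessage :: Bool -> String
statsMessage True  = "RTS statistics enabled"
statsMessage False = "RTS statistics disabled; run with +RTS -T"

main :: IO ()
main = do
  enabled <- getRTSStatsEnabled
  putStrLn (statsMessage enabled)
  -- getRTSStats throws if statistics are disabled, so only call it when enabled.
  when enabled $ do
    stats <- getRTSStats
    putStrLn ("max live bytes so far: " ++ show (max_live_bytes stats))
```

If passing the flag on every run is inconvenient, linking with ghc -with-rtsopts=-T should bake it into the executable.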
These two programs do the same thing, but one runs 10x faster.
This takes approx. 10 seconds on my machine:
import Control.Monad
import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as BL
theValueOne=B.singleton 1
main = replicateM_ 100000000 $ B.putStr theValueOne
The second version uses lazy I/O for its output. It finishes in about 1 second (as fast as C):
import qualified Data.ByteString.Lazy as BL
main = BL.putStr $ BL.pack $ replicate 100000000 1
Question: Why is the non-lazy version so slow? More importantly, how can I make it fast? (I've tried recursion, forM, modifying the output buffer using hSetBuffering... Nothing has made a difference)
Note: This is more than just an academic question. The non-lazy version is an extremely simplified version of an executable my company uses in production, which is also slow in the same way. It would be nearly impossible to re-architect the larger program around the analogous lazy solution.
Updated: Added possible source of problem and a solution.
I don't think it has anything to do with lazy I/O. If you rewrite the strict I/O version to write two bytes at once:
theValueOne = B.singleton 1
main = replicateM_ 50000000 $ B.putStr (theValueOne <> theValueOne)
that halves the time. Write ten bytes at once:
theValueOne = B.singleton 1
main = replicateM_ 10000000 $ B.putStr (foldMap id (replicate 10 theValueOne))
and it's already faster than the lazy I/O version.
The issue is that there's a bit of overhead in a B.hPutStr call, much more than the overhead of a C fwrite call, and it's just not a particularly efficient way to write a single byte.
A good chunk of the overhead comes from the fact that Haskell I/O buffers have immutable metadata. Even though the buffer content itself is mutable, the pointers to valid data within the buffer are immutable, so writing a single byte requires a heap allocation of a new GHC.IO.Buffer.Buffer structure, which GHC can't optimize away.
One solution is to use a hand-crafted buffering structure with a mutable pointer. The following works, and it's about twice as fast as the lazy I/O version in the original question.
{-# LANGUAGE RecordWildCards #-}
{-# OPTIONS_GHC -Wall #-}
import Control.Monad
import Data.IORef
import Data.Word
import Foreign.ForeignPtr
import Foreign.Ptr
import Foreign.Storable
import System.IO
data WriteBuffer = WriteBuffer
  { handle :: !Handle
  , capacity :: !Int
  , used :: !(IORef Int)
  , content :: !(ForeignPtr Word8)
  }
newBuffer :: Handle -> IO WriteBuffer
newBuffer h = do
  hSetBinaryMode h True
  hSetBuffering h NoBuffering
  WriteBuffer h cap <$> newIORef 0 <*> mallocForeignPtrBytes cap
  where cap = 4096
flushBuffer :: WriteBuffer -> IO ()
flushBuffer WriteBuffer{..} = do
  n <- readIORef used
  withForeignPtr content $ \p -> hPutBuf handle p n
  writeIORef used 0
writeByte :: Word8 -> WriteBuffer -> IO ()
writeByte w buf@(WriteBuffer{..}) = do
  n <- readIORef used
  withForeignPtr content $ \p -> poke (plusPtr p n) w
  let n' = n + 1
  writeIORef used n'
  when (n' == capacity) $
    flushBuffer buf
main :: IO ()
main = do
  b <- newBuffer stdout
  replicateM_ 100000000 (writeByte 1 b)
  flushBuffer b
Somewhat ironically, converting this to a version using an immutable counter and passing the WriteBuffer as state through foldM doubles the speed again, so it's about 4 times as fast as the lazy I/O version in the original question:
{-# LANGUAGE RecordWildCards #-}
{-# OPTIONS_GHC -Wall #-}
import Control.Monad
import Data.Word
import Foreign.ForeignPtr
import Foreign.Ptr
import Foreign.Storable
import System.IO
data WriteBuffer = WriteBuffer
  { handle :: !Handle
  , capacity :: !Int
  , used :: !Int
  , content :: !(ForeignPtr Word8)
  }
newBuffer :: Handle -> IO WriteBuffer
newBuffer h = do
  hSetBinaryMode h True
  hSetBuffering h NoBuffering
  WriteBuffer h cap 0 <$> mallocForeignPtrBytes cap
  where cap = 4096
flushBuffer :: WriteBuffer -> IO WriteBuffer
flushBuffer buf@WriteBuffer{..} = do
  withForeignPtr content $ \p -> hPutBuf handle p used
  return $ buf { used = 0 }
writeByte :: Word8 -> WriteBuffer -> IO WriteBuffer
writeByte w buf@(WriteBuffer{..}) = do
  withForeignPtr content $ \p -> poke (plusPtr p used) w
  let used' = used + 1
      buf' = buf { used = used' }
  if (used' == capacity)
    then flushBuffer buf'
    else return buf'
main :: IO ()
main = do
  b <- newBuffer stdout
  b' <- foldM (\s _ -> writeByte 1 s) b [(1::Int)..100000000]
  void (flushBuffer b')
The reason this one is so fast seems to be that GHC is able to optimize away the WriteBuffer constructor entirely from the fold and just pass around unboxed pointers and integers in the loop. My guess is that if I modified the mutable version above to avoid boxing and unboxing the integer in the used IORef, it would be similarly fast.
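For comparison, here is a different technique from the hand-rolled buffers above, offered only as a sketch and not something measured in this answer: Data.ByteString.Builder (from the bytestring package) collects small writes into larger chunks before they reach the handle. The helper name repeatByte and the smaller count of 1000000 are assumptions chosen for illustration; the question writes 100000000 bytes.

```haskell
module Main where

import qualified Data.ByteString.Builder as BB
import Data.Word (Word8)
import System.IO (hSetBinaryMode, stdout)

-- Hypothetical helper: a builder that emits the byte w exactly n times.
repeatByte :: Int -> Word8 -> BB.Builder
repeatByte n w = mconcat (replicate n (BB.word8 w))

main :: IO ()
main = do
  hSetBinaryMode stdout True
  -- hPutBuilder fills the handle's buffer directly instead of
  -- issuing one hPutStr call per byte.
  BB.hPutBuilder stdout (repeatByte 1000000 1)
```

Whether this beats the versions above would need to be benchmarked on the real workload; it is shown because it keeps the "one byte at a time" call structure while avoiding a per-byte handle operation.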
I've written a function getSamplesFromFile that takes a file and returns its contents as a Vector of Floats. The function reads the contents of the file into a Data.ByteString using Data.ByteString.hGet, then converts this Data.ByteString to a Vector of Floats using:
import qualified Data.Vector.Unboxed as V
import qualified Data.ByteString as BS
import Data.Word
import System.Environment
import GHC.Int
toVector :: BS.ByteString -> V.Vector Float
toVector bs = vgenerate (fromIntegral (BS.length bs `div` 3)) $ \i ->
    myToFloat [BS.index bs (3*i), BS.index bs (3*i+1), BS.index bs (3*i+2)]
  where
    myToFloat :: [Word8] -> Float
    myToFloat = sum . map fromIntegral
    vgenerate n f = V.generate n (f . fromIntegral)
I was testing how lazy this program was via a small test program:
main = do
    [file] <- getArgs
    samples <- getSamplesFromFile file
    let slice = V.slice 0 50000 samples
    print slice
If I run this on a 13MB file, it seems as if every sample is loaded into memory, even though I only need 50000 samples to be printed.
If I make a small modification to this program and first map or filter over the slice, the result is different:
main = do
    [file] <- getArgs
    samples <- getSamplesFromFile file
    let slice = V.slice 0 50000 samples
    let mapped = V.map id slice
    print mapped
This way, it seems that not every sample was loaded into memory, only the slice:
To make sure this was the case, I ran the program again with a slice of half the size (25000 samples):
Now the memory usage seems to be proportional to the size of the slice, just because I map over the slice with id.
The result is the same when filtering over the samples. How can applying a higher-order function suddenly make the behavior lazy?
EDIT
The problem seems to have something to do with cabal. As you can see from the pictures, I was testing my code inside a cabal project called laziness. I can't reproduce this weird behavior if I use a separate Main.hs file outside of a cabal project. This is the Main.hs I'm using:
module Main where
import qualified Data.ByteString as BS
import qualified Data.Vector.Unboxed as V
import Data.Word
import GHC.Int
import System.Environment
main = do
    [file] <- getArgs
    samples <- getSamplesFromFile file
    let slice = V.slice 0 50000 samples
    --let filtered = V.filter (>0) slice
    let mapped = V.map id slice
    print slice
getSamplesFromFile = fmap toVector . BS.readFile
toVector :: BS.ByteString -> V.Vector Float
toVector bs = vgenerate (fromIntegral (BS.length bs `div` 3)) $ \i ->
    myToFloat [BS.index bs (3*i), BS.index bs (3*i+1), BS.index bs (3*i+2)]
  where
    myToFloat :: [Word8] -> Float
    myToFloat = sum . map fromIntegral
    vgenerate n f = V.generate n (f . fromIntegral)
I don't experience the weird behavior if I do the following:
Create a new directory somewhere via mkdir
Add the above Main.hs to the directory.
Compile using ghc Main.hs -O2 -rtsopts -prof.
Run via ./Main myfile.wav +RTS -hy.
Create the pdf using hp2ps and ps2pdf.
I do experience the weird behavior if I do the following:
Create a new directory, laziness, via mkdir laziness.
Initiate a cabal project via cabal init.
Add the above Main.hs to /src.
Add ghc-options: -O2 -rtsopts -prof to laziness.cabal.
Compile using cabal install
Run via laziness myfile.wav +RTS -hy.
Create the pdf using hp2ps and ps2pdf.
I even experience the weird behavior if I:
cd laziness/src
Compile using ghc Main.hs -O2 -rtsopts -prof.
Run via ./Main myfile.wav +RTS -hy.
Create the pdf using hp2ps and ps2pdf.
So it seems that this behavior only occurs when the code is inside a cabal project. This seems weird to me. Could this have something to do with the setup of my cabal project?
I have a computation that, along with other things, generates some data (a lot of it) that I want to write to a file.
The way the code is structured now is (simplified):
writeRecord :: Handle -> Record -> IO ()
writeRecord h r = hPutStrLn h (toByteString r)
This function is then called periodically during a bigger computation. It is almost like a log, and in fact, multiple files are being written simultaneously.
Now I want the output file to be compressed with Gzip.
In languages like Java I would do something like:
outStream = new GzipOutputStream(new FileOutputStream(path))
and then would just write into that wrapped output stream.
What is the way of doing it in Haskell?
I think writing something like
writeRecord h r = hPut h ((compressed . toByteString) r)
is not correct because compressing each small bit individually isn't efficient (I even tried it and the size of the compressed file is bigger than uncompressed this way).
I also don't think that I can just produce a lazy ByteString (or even a list of chunks) and then write it with compressed . fromChunks because this will require my "generator" building the full thing in memory. And the fact that more than one file is produced at the same time makes it even more complicated.
So what would be a way to solve this in Haskell? Writing to file(s) and have them gzipped?
All the streaming libraries support compression. If I understand the particular problem and the way you are thinking about it, io-streams might be the simplest for your purposes. Here I alternate between writing to trump and clinton output streams, which are written as compressed files. I follow by showing the pipes equivalent of Michael's conduit program.
#!/usr/bin/env stack
-- stack --resolver lts-6.21 --install-ghc runghc --package io-streams
{-# LANGUAGE OverloadedStrings #-}
import qualified System.IO.Streams as IOS
import qualified System.IO as IO
import Data.ByteString (ByteString)
analyzer :: IOS.OutputStream ByteString -> IOS.OutputStream ByteString -> IO ()
analyzer clinton trump = do
  IOS.write (Just "This is a string\n") clinton
  IOS.write (Just "This is a string\n") trump
  IOS.write (Just "Clinton string\n") clinton
  IOS.write (Just "Trump string\n") trump
  IOS.write (Just "Another Clinton string\n") clinton
  IOS.write (Just "Another Trump string\n") trump
  IOS.write Nothing clinton
  IOS.write Nothing trump
main :: IO ()
main =
  IOS.withFileAsOutput "some-file-clinton.txt.gz" $ \clinton_compressed ->
  IOS.withFileAsOutput "some-file-trump.txt.gz" $ \trump_compressed -> do
    clinton <- IOS.gzip IOS.defaultCompressionLevel clinton_compressed
    trump <- IOS.gzip IOS.defaultCompressionLevel trump_compressed
    analyzer clinton trump
Obviously you can mix all kinds of IO in analyzer between acts of writing to the two output streams; I'm just showing the writes, so to speak. In particular, if analyzer is understood as depending on an input stream, the writes can depend on reads from the input stream. Here's a (slightly!) more complicated program that does that. If I run the program above I see
$ stack gzip_so.hs
$ gunzip some-file-clinton.txt.gz
$ gunzip some-file-trump.txt.gz
$ cat some-file-clinton.txt
This is a string
Clinton string
Another Clinton string
$ cat some-file-trump.txt
This is a string
Trump string
Another Trump string
With pipes and conduit there are various ways of achieving the above effect, with a higher level of decomposition of parts. Writing to separate files will however be a little more subtle. Here in any case is the pipes equivalent of Michael S's conduit program:
#!/usr/bin/env stack
-- stack --resolver lts-6.21 --install-ghc runghc --package pipes-zlib
{-# LANGUAGE OverloadedStrings #-}
import Control.Monad.IO.Class (MonadIO, liftIO)
import Data.ByteString (ByteString, hPutStr)
import System.IO (IOMode(..), withFile, Handle)
import Pipes
import qualified Pipes.ByteString as PB
import qualified Pipes.GZip as P
-- Some helper function you may have
someAction :: IO ByteString
someAction = return "This is a string\n"
-- Original version
producerHandle :: Handle -> IO ()
producerHandle h = do
    str <- someAction
    hPutStr h str
producerPipe :: MonadIO m => Producer ByteString m ()
producerPipe = do
    str <- liftIO someAction
    yield str
main :: IO ()
main = withFile "some-file-pipes.txt.gz" WriteMode $ \h ->
    runEffect $ P.compress P.defaultCompression producerPipe >-> PB.toHandle h
-- Edit
Here for what it's worth is yet another way of superimposing several producers on a single thread with pipes or conduit, to add to the different approaches Michael S and danidiaz mentioned:
#!/usr/bin/env stack
-- stack --resolver lts-6.21 --install-ghc runghc --package pipes-zlib
{-# LANGUAGE OverloadedStrings #-}
import Pipes
import Pipes.GZip
import qualified Pipes.Prelude as P
import qualified Pipes.ByteString as Bytes
import System.IO
import Control.Monad (replicateM_)
producer = replicateM_ 50000 $ do
    marie "This is going to Marie\n" -- arbitrary IO can be interspersed here
    arthur "This is going to Arthur\n" -- with liftIO
    sylvia "This is going to Sylvia\n"
  where
    marie = yield; arthur = lift . yield; sylvia = lift . lift . yield
sinkHelper h p = runEffect (compress bestSpeed p >-> Bytes.toHandle h)
main :: IO ()
main =
  withFile "marie.txt.gz" WriteMode $ \marie ->
  withFile "arthur.txt.gz" WriteMode $ \arthur ->
  withFile "sylvia.txt.gz" WriteMode $ \sylvia ->
    sinkHelper sylvia
    $ sinkHelper arthur
    $ sinkHelper marie
    $ producer
It is quite simple and fast, and can be written in conduit with the obvious alterations - but finding it natural involves a higher level of buy-in with the 'monad transformer stack' point of view. It would be the most natural way of writing such a program from the point of view of something like the streaming library.
Doing this with conduit is fairly straightforward, though you'd need to adjust your code a bit. I've put together an example of before and after code to demonstrate it. The basic idea is:
Replace hPutStr h with yield
Add some liftIO wrappers
Instead of using withBinaryFile or the like, use runConduitRes, gzip, and sinkFile
Here's the example:
#!/usr/bin/env stack
-- stack --resolver lts-6.21 --install-ghc runghc --package conduit-extra
{-# LANGUAGE OverloadedStrings #-}
import Control.Monad.IO.Class (MonadIO, liftIO)
import Data.ByteString (ByteString, hPutStr)
import Data.Conduit (ConduitM, (.|), yield, runConduitRes)
import Data.Conduit.Binary (sinkFile)
import Data.Conduit.Zlib (gzip)
import System.IO (Handle)
-- Some helper function you may have
someAction :: IO ByteString
someAction = return "This is a string\n"
-- Original version
producerHandle :: Handle -> IO ()
producerHandle h = do
    str <- someAction
    hPutStr h str
-- Conduit version
producerConduit :: MonadIO m => ConduitM i ByteString m ()
producerConduit = do
    str <- liftIO someAction
    yield str
main :: IO ()
main = runConduitRes $ producerConduit
                    .| gzip
                    .| sinkFile "some-file.txt.gz"
You can learn more about conduit in the conduit tutorial.
Your Java idea is interesting, give me a few more minutes, I'll add an answer that looks more like that.
EDIT
Here's a version that's closer to your Java style approach. It relies on a SinkFunc.hs module which is available as a Gist at: https://gist.github.com/snoyberg/283154123d30ff9e201ea4436a5dd22d
#!/usr/bin/env stack
-- stack --resolver lts-6.21 --install-ghc runghc --package conduit-extra
{-# LANGUAGE OverloadedStrings #-}
{-# OPTIONS_GHC -Wall -Werror #-}
import Data.ByteString (ByteString)
import Data.Conduit ((.|))
import Data.Conduit.Binary (sinkHandle)
import Data.Conduit.Zlib (gzip)
import System.IO (withBinaryFile, IOMode (WriteMode))
import SinkFunc (withSinkFunc)
-- Some helper function you may have
someAction :: IO ByteString
someAction = return "This is a string\n"
producerFunc :: (ByteString -> IO ()) -> IO ()
producerFunc write = do
    str <- someAction
    write str
main :: IO ()
main = withBinaryFile "some-file.txt.gz" WriteMode $ \h -> do
    let sink = gzip .| sinkHandle h
    withSinkFunc sink $ \write -> producerFunc write
EDIT 2 One more for good measure, actually using ZipSink to stream the data to multiple different files. There are lots of different ways of slicing this, but this is one way that works:
#!/usr/bin/env stack
-- stack --resolver lts-6.21 --install-ghc runghc --package conduit-extra
{-# LANGUAGE OverloadedStrings #-}
import Control.Monad.Trans.Resource (MonadResource)
import Data.ByteString (ByteString)
import Data.Conduit (ConduitM, (.|), yield, runConduitRes, ZipSink (..))
import Data.Conduit.Binary (sinkFile)
import qualified Data.Conduit.List as CL
import Data.Conduit.Zlib (gzip)
data Output = Foo ByteString | Bar ByteString
fromFoo :: Output -> Maybe ByteString
fromFoo (Foo bs) = Just bs
fromFoo _ = Nothing
fromBar :: Output -> Maybe ByteString
fromBar (Bar bs) = Just bs
fromBar _ = Nothing
producer :: Monad m => ConduitM i Output m ()
producer = do
    yield $ Foo "This is going to Foo"
    yield $ Bar "This is going to Bar"
sinkHelper :: MonadResource m
           => FilePath
           -> (Output -> Maybe ByteString)
           -> ConduitM Output o m ()
sinkHelper fp f
    = CL.mapMaybe f
   .| gzip
   .| sinkFile fp
main :: IO ()
main = runConduitRes
     $ producer
    .| getZipSink
        (ZipSink (sinkHelper "foo.txt.gz" fromFoo) *>
         ZipSink (sinkHelper "bar.txt.gz" fromBar))
For incremental compression, I think you could make use of compressIO/foldCompressStream in Codec.Compression.Zlib.Internal.
If you're able to represent your producer action as an IO (Maybe a) (such as an MVar take or InputStream/Chan read) where Nothing signifies end of input, something like this should work:
import System.IO (Handle)
import qualified Data.ByteString as BS
import qualified Codec.Compression.Zlib.Internal as ZLib
compressedWriter :: Handle -> (IO (Maybe BS.ByteString)) -> IO ()
compressedWriter handle source =
  ZLib.foldCompressStream
    (\next -> source >>= maybe (next BS.empty) next)
    (\chunk next -> BS.hPut handle chunk >> next)
    (return ())
    (ZLib.compressIO ZLib.rawFormat ZLib.defaultCompressParams)
This solution is similar to Michael Snoyman's EDIT 2, but uses the foldl, pipes, pipes-zlib and streaming-eversion packages.
{-# language OverloadedStrings #-}
module Main where
-- cabal install bytestring foldl pipes pipes-zlib streaming-eversion
import Data.Foldable
import Data.ByteString
import qualified Control.Foldl as L
import Pipes
import qualified Pipes.Prelude
import Pipes.Zlib (compress,defaultCompression,defaultWindowBits)
import Streaming.Eversion.Pipes (transvertMIO)
import System.IO
type Tag = String
producer :: Monad m => Producer (Tag,ByteString) m ()
producer = do
    yield $ ("foo","This is going to Foo")
    yield $ ("bar","This is going to Bar")
foldForTag :: Handle -> Tag -> L.FoldM IO (Tag,ByteString) ()
foldForTag handle tag =
      L.premapM (\(tag',bytes) -> if tag' == tag then Just bytes else Nothing)
    . L.handlesM L.folded
    . transvertMIO (compress defaultCompression defaultWindowBits)
    $ L.mapM_ (Data.ByteString.hPut handle)
main :: IO ()
main = do
    withFile "foo.txt" WriteMode $ \h1 ->
        withFile "bar.txt" WriteMode $ \h2 ->
            let multifold = traverse_ (uncurry foldForTag) [(h1,"foo"),(h2,"bar")]
            in  L.impurely Pipes.Prelude.foldM multifold producer
This solution is similar to Michael Snoyman's EDIT 2, but uses the streaming, streaming-bytestring, pipes and pipes-zlib packages.
{-# language OverloadedStrings #-}
module Main where
-- cabal install bytestring streaming streaming-bytestring pipes pipes-zlib
import Data.ByteString
import qualified Data.ByteString.Streaming as B
import Streaming
import qualified Streaming.Prelude as S
import Pipes (next)
import qualified Pipes.Prelude
import Pipes.Zlib (compress,defaultCompression,defaultWindowBits)
import System.IO
type Tag = String
producer :: Monad m => Stream (Of (Tag,ByteString)) m ()
producer = do
    S.yield ("foo","This is going to Foo")
    S.yield ("bar","This is going to Bar")
-- I couldn't find a streaming-zlib on Hackage, took a pipes detour
compress' :: MonadIO m
          => Stream (Of ByteString) m r -> Stream (Of ByteString) m r
compress' = S.unfoldr Pipes.next
          . compress defaultCompression defaultWindowBits
          . Pipes.Prelude.unfoldr S.next
keepTag :: Monad m
        => Tag -> Stream (Of (Tag,ByteString)) m r -> Stream (Of ByteString) m r
keepTag tag = S.map snd . S.filter ((tag==) . fst)
main :: IO ()
main = runResourceT
     . B.writeFile "foo.txt" . B.fromChunks . compress' . keepTag "foo"
     . B.writeFile "bar.txt" . B.fromChunks . compress' . keepTag "bar"
     $ S.copy producer
I make use of the copy function from Streaming.Prelude, that allows you to
Duplicate the content of stream, so that it can be acted on twice in
different ways, but without breaking streaming.
The following program does not explode when the executable (compiled via ghc -O0 Explode.hs) is run, but does explode when run in ghci (via either ghci Explode.hs or ghci -fobject-code Explode.hs) :
--Explode.hs
--Does not explode with : ghc -O0 Explode.hs
--Explodes with : ghci Explode.hs
--Explodes with : ghci -fobject-code Explode.hs
module Main (main) where
import Data.Int
import qualified Data.ByteString.Lazy as BL
import qualified Data.ByteString.Lazy.Char8 as BLC
createStr :: Int64 -> String -> BL.ByteString
createStr num str = BL.take num $ BL.cycle $ BLC.pack str
main = do
    BLC.writeFile "results.txt" $ createStr 100000000 "abc\n"
Why does it explode in ghci and not with ghc -O0 Explode.hs, and how can I stop it from exploding in ghci? The methods I adopted in Memory blowing up for strict sum/strict foldl in ghci don't seem to work here. Thanks.
After inspecting the code of writeFile, it seems that it depends on the hPut function of Data.ByteString.Lazy:
-- | Outputs a 'ByteString' to the specified 'Handle'.
--
hPut :: Handle -> ByteString -> IO ()
hPut h cs = foldrChunks (\c rest -> S.hPut h c >> rest) (return ()) cs
hPut constructs the IO action that will print the lazy bytestring by applying a right fold of sorts over the chunks. The source for the foldrChunks function is:
-- | Consume the chunks of a lazy ByteString with a natural right fold.
foldrChunks :: (S.ByteString -> a -> a) -> a -> ByteString -> a
foldrChunks f z = go
  where go Empty        = z
        go (Chunk c cs) = f c (go cs)
Looking at the code, it seems as if the "spine" of the lazy bytestring (but not the actual data in each chunk) will be forced before writing the first byte, because of how (>>) behaves for the IO monad.
In your example, the strict chunks composing your lazy bytestring are very small. This means a whole lot of them will be generated when foldrChunks "forces the spine" of the 100000000 character long lazy bytestring.
If this analysis is correct, then reducing the number of strict chunks by making them bigger would reduce memory usage. This variant of createStr that creates bigger chunks doesn't blow up for me in ghci:
createStr :: Int64 -> String -> BL.ByteString
createStr num str = BL.take num $ BL.cycle $ BLC.pack $ concat $ replicate 1000 $ str
(I'm not sure why the compiled example doesn't blow up.)
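The chunk-size argument can be checked directly by counting the strict chunks in each variant. This is a small sketch using only the bytestring package; chunkCount is a hypothetical helper, and the length 4000 is an arbitrary small size chosen so the check runs instantly:

```haskell
module Main where

import Data.Int (Int64)
import qualified Data.ByteString.Lazy as BL
import qualified Data.ByteString.Lazy.Char8 as BLC

-- Same construction as createStr in the question.
createStr :: Int64 -> String -> BL.ByteString
createStr num str = BL.take num (BL.cycle (BLC.pack str))

-- Hypothetical helper: number of strict chunks in a lazy ByteString.
chunkCount :: BL.ByteString -> Int
chunkCount = length . BL.toChunks

main :: IO ()
main = do
  let small = createStr 4000 "abc\n"
      big   = createStr 4000 (concat (replicate 1000 "abc\n"))
  -- Both strings have the same length, but different numbers of chunks.
  print (BL.length small, chunkCount small)
  print (BL.length big, chunkCount big)
```

The first variant cycles a 4-byte chunk, so its chunk count grows with the output length, while the second cycles much larger chunks and needs far fewer of them for the same output.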
I am trying to come up with an equivalent of "wc -l" using the Haskell iteratee library. Below is the code for "wc" (which just counts the bytes, like wc -c, and is similar to the code in the iteratee example on Hackage); it runs very fast:
{-# LANGUAGE BangPatterns #-}
import Data.Iteratee as I
import Data.ListLike as LL
import Data.Iteratee.IO
import Data.ByteString
length1 :: (Monad m, Num a, LL.ListLike s el) => Iteratee s m a
length1 = liftI (step 0)
  where
    step !i (Chunk xs) = liftI (step $ i + fromIntegral (LL.length xs))
    step !i stream = idone i stream
{-# INLINE length1 #-}
main = do
    i' <- enumFile 1024 "/usr/share/dict/words" (length1 :: (Monad m) => Iteratee ByteString m Int)
    result <- run i'
    print result
{- Time measured on a linux x86 box:
$ time ./test ## above haskell compiled code
4950996
real 0m0.013s
user 0m0.004s
sys 0m0.007s
$ time wc -c /usr/share/dict/words
4950996 /usr/share/dict/words
real 0m0.003s
user 0m0.000s
sys 0m0.002s
-}
Now, how do you extend it to count the number of lines in a way that also runs fast? I wrote a version using Prelude.filter to keep only "\n" and take the length, but it is slower than Linux "wc -l" because of too much memory use and GC (lazy evaluation, I guess). So I wrote another version using Data.ListLike.filter, but it won't compile because it doesn't type check. Help here would be appreciated:
{-# LANGUAGE BangPatterns #-}
import Data.Iteratee as I
import Data.ListLike as LL
import Data.Iteratee.IO
import Data.ByteString
import Data.Char
import Data.ByteString.Char8 (pack)
numlines :: (Monad m, Num a, LL.ListLike s el) => Iteratee s m a
numlines = liftI $ step 0
  where
    step !i (Chunk xs) = liftI (step $ i + fromIntegral (LL.length $ LL.filter (\x -> x == Data.ByteString.Char8.pack "\n") xs))
    step !i stream = idone i stream
{-# INLINE numlines #-}
main = do
    i' <- enumFile 1024 "/usr/share/dict/words" (numlines :: (Monad m) => Iteratee ByteString m Int)
    result <- run i'
    print result
So I did some experimenting and got a wc -l that is only about twice as slow as the native "wc -l". This is better performance than even the wc -c version shown above.
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString.Lazy.Char8 as BSL
import qualified Data.ByteString.Char8 as BS
import qualified Data.Enumerator as E
import qualified Data.Enumerator.Binary as EB
import Control.Monad.IO.Class (liftIO)
import Data.Int
numlines :: Int64 -> E.Iteratee BS.ByteString IO ()
numlines n = do
    chunk <- EB.take 1024
    case chunk of
        "" -> do liftIO $ print n
                 return ()
        a  -> do let ct = BSL.count '\n' a
                 numlines (n+ct)
main = do
    let i = EB.enumFile "/usr/share/dict/words" E.$$ numlines 0
    E.run_ i
Running it vs. native:
Eriks-MacBook-Air:skunk erikhinton$ time wc -l "/usr/share/dict/words"
235886 /usr/share/dict/words
real 0m0.009s
user 0m0.006s
sys 0m0.002s
Eriks-MacBook-Air:skunk erikhinton$ time ./wcl
235886
real 0m0.019s
user 0m0.013s
sys 0m0.005s
[EDIT]
Here's an even faster, smaller footprint and far more concise/expressive way of doing it. These enumerators are starting to get fun.
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString.Lazy.Char8 as BSL
import qualified Data.ByteString.Char8 as BS
import qualified Data.Enumerator as E
import qualified Data.Enumerator.Binary as EB
import qualified Data.Enumerator.List as EL
import Control.Monad.IO.Class (liftIO)
import Data.Int
numlines :: E.Iteratee BS.ByteString IO ()
numlines = do
    num <- EL.fold (\n b -> (BS.count '\n' b) + n ) 0
    liftIO . print $ num
main = do
    let i = EB.enumFile "/usr/share/dict/words" E.$$ numlines
    E.run_ i
And the timing
Eriks-MacBook-Air:skunk erikhinton$ time ./wcl2
235886
real 0m0.015s
user 0m0.010s
sys 0m0.004s
There are a lot of good answers already; I have very little to offer performance-wise but a few style points.
First, I would write it this way:
import Prelude as P
import Data.Iteratee
import qualified Data.Iteratee as I
import qualified Data.Iteratee.IO as I
import qualified Data.ByteString as B
import Data.Char
import System.Environment
-- numLines has a concrete stream type so it's not necessary to provide an
-- annotation later. It could have a more general type.
numLines :: Monad m => I.Iteratee B.ByteString m Int
numLines = I.foldl' step 0
  where
    --step :: Int -> Word8 -> Int
    step acc el = if el == (fromIntegral $ ord '\n') then acc + 1 else acc
main = do
    f:_ <- getArgs
    words <- run =<< I.enumFile 65536 f numLines
    print words
The biggest difference is that this uses Data.Iteratee.ListLike.foldl'. Note that only the individual stream elements matter to the step function, not the stream type. It's exactly the same function as you would use with e.g. Data.ByteString.Lazy.foldl'.
Using foldl' also means that you don't need to manually write iteratees with liftI. I would discourage users from doing so unless absolutely necessary. The result is usually longer and harder to maintain with little to no benefit.
Finally, I've increased the buffer size significantly. On my system this is marginally faster than enumerator's default of 4096, which is again marginally faster (with iteratee) than your choice of 1024. YMMV with this setting, of course.
If you're reading ByteString chunks, you can use the count function from Data.ByteString, the relevant step would then be
step !i (Chunk xs) = liftI (step $ i + count 10 xs)
(perhaps with a fromIntegral). Data.ByteString.count is pretty fast, that shouldn't be too much slower than wc -l.
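Outside of any iteratee machinery, the same per-chunk counting can be sketched over a plain list of strict chunks. This assumes only the bytestring package; countNewlines is a hypothetical name, and the list of chunks stands in for whatever the enumerator would feed to the step function:

```haskell
module Main where

import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import Data.List (foldl')

-- Count newline bytes (value 10) across a sequence of strict chunks,
-- adding B.count for each chunk to a strictly evaluated accumulator.
countNewlines :: [B.ByteString] -> Int
countNewlines = foldl' (\acc chunk -> acc + B.count 10 chunk) 0

main :: IO ()
main = print (countNewlines (map BC.pack ["one\ntw", "o\nthree\n"]))
```

A line that straddles two chunks is still counted once, because only the newline byte itself is counted and it always falls inside exactly one chunk.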
I figured out how to fix the type error. The key is understanding the relationship between Data.ListLike.filter and the ByteString input that is passed to that filter. Here is the type of Data.ListLike.filter:
Data.ListLike.filter
  :: Data.ListLike.Base.ListLike full item =>
     (item -> Bool) -> full -> full
full refers to the stream in the context of an enumerator/iteratee, if I understand it correctly. item refers to the element of the stream.
Now, if we want to filter on newline in the input file, we have to know the type of input file stream, and the type of elements in that stream. In this case, input file is being read as ByteString stream. ByteString is documented as a space-efficient representation of a Word8 vector. So, item type here is Word8.
So, when we write the filter in the step function, we have to make sure that the predicate is defined for Word8, since that is the type of the item being passed to the filter (as explained above). We are filtering for newline, so a predicate like the one below, which builds a Word8 representation of newline and checks it for equality against x of type Word8, should work:
\x -> x == Data.ByteString.Internal.c2w '\n'
There is still one more missing piece: for some reason, the compiler (v7.0.3 Mac) is unable to deduce the type of el in the numlines type signature (if anyone has ideas on why that is, please do discuss). So, telling it explicitly that it is Word8 solves the compilation issue:
numlines :: (Monad m, Num a, LL.ListLike s Word8) => Iteratee s m a
Full code below - it compiles, and runs quite fast.
{-# LANGUAGE BangPatterns,FlexibleContexts #-}
import Data.Iteratee as I
import Data.ListLike as LL
import Data.Iteratee.IO
import Data.ByteString
import GHC.Word (Word8)
import Data.ByteString.Internal (c2w)
numlines :: (Monad m, Num a, LL.ListLike s Word8) => Iteratee s m a
numlines = liftI $ step 0
  where
    step !i (Chunk xs) = let newline = c2w '\n' in liftI (step $ i + fromIntegral (LL.length $ LL.filter (\x -> x == newline) xs))
    step !i stream = idone i stream
{-# INLINE numlines #-}
main = do
    i' <- enumFile 1024 "/usr/share/dict/words" (numlines :: (Monad m) => Iteratee ByteString m Int)
    result <- run i'
    print result
{- Time to run on mac OSX:
$ time ./test ## above compiled program: ghc --make -O2 test.hs
235886
real 0m0.011s
user 0m0.007s
sys 0m0.004s
$ time wc -l /usr/share/dict/words
235886 /usr/share/dict/words
real 0m0.005s
user 0m0.002s
sys 0m0.002s
-}
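The Word8 point can also be seen without iteratee or ListLike, using Data.ByteString.filter on a single strict chunk. This is an illustrative sketch only; isNewline and newlinesIn are made-up names, and fromIntegral (ord '\n') is used in place of c2w so that only public modules are imported:

```haskell
module Main where

import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import Data.Char (ord)
import Data.Word (Word8)

-- The element type of a strict ByteString is Word8, so the predicate
-- compares against a Word8 rather than against another ByteString.
isNewline :: Word8 -> Bool
isNewline w = w == fromIntegral (ord '\n')

-- Filter, then take the length, mirroring the LL.filter approach on one chunk.
newlinesIn :: B.ByteString -> Int
newlinesIn = B.length . B.filter isNewline

main :: IO ()
main = print (newlinesIn (BC.pack "a\nb\nc"))
```

Replacing isNewline with a comparison against BC.pack "\n" would be rejected by the type checker for the same reason as the original question's code: the predicate would be comparing a Word8 with a ByteString.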