Evaluation of higher-order functions - haskell

I've written a function getSamplesFromFile that takes a file and returns its contents as a Vector of Floats. The function reads the contents of the file into a Data.ByteString using Data.ByteString.hGet, then converts this Data.ByteString to a Vector of Floats using:
    import qualified Data.Vector.Unboxed as V
    import qualified Data.ByteString as BS
    import Data.Word
    import System.Environment
    import GHC.Int

    toVector :: BS.ByteString -> V.Vector Float
    toVector bs = vgenerate (fromIntegral (BS.length bs `div` 3)) $ \i ->
        myToFloat [BS.index bs (3*i), BS.index bs (3*i+1), BS.index bs (3*i+2)]
      where
        myToFloat :: [Word8] -> Float
        myToFloat = sum . map fromIntegral
        vgenerate n f = V.generate n (f . fromIntegral)
I was testing how lazy this program was via a small test program:
    main = do
      [file] <- getArgs
      samples <- getSamplesFromFile file
      let slice = V.slice 0 50000 samples
      print slice
If I run this on a 13MB file, it seems as if every sample is loaded into memory, even though I only need 50000 samples to be printed.
If I make a small modification to this problem and first map or filter over it, the result is different:
    main = do
      [file] <- getArgs
      samples <- getSamplesFromFile file
      let slice = V.slice 0 50000 samples
      let mapped = V.map id slice
      print mapped
This way, it seems that not every sample is loaded into memory, only the slice.
To make sure this was the case, I ran the program again with a slice of half the size (25000 samples).
Now the memory usage is proportional to the size of the slice, just because I map over the slice with id.
The result is the same when filtering over the samples. How can applying a higher-order function suddenly make the behavior lazy?
EDIT
The problem seems to have something to do with cabal. As you can see from the pictures, I was testing my code inside a cabal project called laziness. I can't reproduce this weird behavior if I use a separate Main.hs file outside of a cabal project. This is the Main.hs I'm using:
    module Main where

    import qualified Data.ByteString as BS
    import qualified Data.Vector.Unboxed as V
    import Data.Word
    import GHC.Int
    import System.Environment

    main = do
      [file] <- getArgs
      samples <- getSamplesFromFile file
      let slice = V.slice 0 50000 samples
      --let filtered = V.filter (>0) slice
      let mapped = V.map id slice
      print slice

    getSamplesFromFile = fmap toVector . BS.readFile

    toVector :: BS.ByteString -> V.Vector Float
    toVector bs = vgenerate (fromIntegral (BS.length bs `div` 3)) $ \i ->
        myToFloat [BS.index bs (3*i), BS.index bs (3*i+1), BS.index bs (3*i+2)]
      where
        myToFloat :: [Word8] -> Float
        myToFloat = sum . map fromIntegral
        vgenerate n f = V.generate n (f . fromIntegral)
I don't experience the weird behavior if I do the following:
1. Create a new directory somewhere via mkdir.
2. Add the above Main.hs to the directory.
3. Compile using ghc Main.hs -O2 -rtsopts -prof.
4. Run via ./Main myfile.wav +RTS -hy.
5. Create the pdf using hp2ps and ps2pdf.
I do experience the weird behavior if I do the following:
1. Create a new directory, laziness, via mkdir laziness.
2. Initiate a cabal project via cabal init.
3. Add the above Main.hs to /src.
4. Add ghc-options: -O2 -rtsopts -prof to laziness.cabal.
5. Compile using cabal install.
6. Run via laziness myfile.wav +RTS -hy.
7. Create the pdf using hp2ps and ps2pdf.
I even experience the weird behavior if I:
1. cd laziness/src
2. Compile using ghc Main.hs -O2 -rtsopts -prof.
3. Run via ./Main myfile.wav +RTS -hy.
4. Create the pdf using hp2ps and ps2pdf.
So it seems that this behavior only occurs when the code is inside a cabal project. This seems weird to me. Could this have something to do with the setup of my cabal project?

Related

Data.Binary encodeFile does not seem to be thread safe - corrupted file?

I'm trying to replicate a situation where a binary file was essentially corrupted (with a filesize of 0) in a real-world application via encodeFile; this occurred after a hard reboot.
Although I've not been able to replicate that behavior exactly, I have gotten the code below to produce a corrupted(?) file.
When we first run it (some text is garbled due to multiple threads printing):
"New valid file written"
Example "hmm" [0]
"Testing..."
"Donenn"oo
tt een#no~ouGugHghCh I bDby-ytSteTesAs
R
CTCa~al#lllSSttaacckk ((ffrroomm HHaassCCaallllStS#at~caGkcH)kC:)I
:D
- Fe IreNrrIorSroH,r- ,5c ~ac#la
llelde #da~ tGa HtsC rIscDr/-cMS/aTMiAanRi.Tnh~.s#h:s5:35:31:51 5i ni nm amiani:nM:aMiani
n
"d"ideiien#ig~n.Gg.H..C..I..D..-..S."T.
Command "cabal v2-repl app" exited unexpectedly
After a few runs we eventually get an error of:

    *** Exception: not enough bytes
    CallStack (from HasCallStack):
      error, called at src/Main.hs:53:15 in main:Main
What is the cause of this error? Is it just the case that encodeFile is not safe when used from multiple threads (which is kind of odd, as there is no mention of threads on https://hackage.haskell.org/package/binary-0.10.0.0/docs/Data-Binary.html)?
    {-# LANGUAGE DeriveGeneric #-}
    {-# LANGUAGE LambdaCase #-}
    module Main where

    import System.PosixCompat.Files
    import System.Process
    import System.Process.Internals
    import System.Posix.Signals
    import System.Posix.Process
    import Control.Concurrent
    import Control.Monad
    import Data.Binary
    import GHC.Generics (Generic)
    import Control.Exception

    data Example = Example String [Int] deriving (Generic, Show)

    instance Binary Example

    main :: IO ()
    main = do
      checkFile
      encodeFile "output.txt" $ Example "hmm" [0]
      checkFile
      print "New valid file written"
      decodeFileOrFail "output.txt" >>= \case
        Right v@(Example s z) -> print v
        Left (e,e') -> do
          error $ e'
          rip
      print "Testing..."
      forM_ [1..3] (const $ forkIO $ catch (do
          checkFile
          somethingIO
          checkFile) (\e -> do
            print (e :: SomeException)
            rip
          )
        )
      print "Done"

    checkFile :: IO ()
    checkFile = do
      fileExist "output.txt" >>= \case
        True -> do
          x <- getFileSize "output.txt"
          if x == 0 then
            rip
          else
            pure ()
          decodeFileOrFail "output.txt" >>= \case
            Right (Example s z) -> pure ()
            Left (e,e') -> do
              error $ e'
              rip
        False -> pure ()

    rip :: IO ()
    rip = do
      print "dieing......."
      getProcessID >>= signalProcess sigKILL

    somethingIO :: IO ()
    somethingIO = do
      let v = 10 :: Int
      decodeFileOrFail "output.txt" >>= \case
        Right (Example s z) -> encodeFile "output.txt" $ z ++ [v]
        Left (e,e') -> do
          error $ e'
          rip

    getFileSize :: String -> IO Int
    getFileSize path = getFileStatus path >>= return . fromIntegral . fileSize
With a cabal file of:
    cabal-version: 1.12

    name:           HaskellNixCabalStarter
    version:        0.1.0.0
    author:         HaskellNixCabalStarter
    maintainer:     HaskellNixCabalStarter
    license:        MIT
    build-type:     Simple

    executable app
      main-is: Main.hs
      other-modules:
          Paths_HaskellNixCabalStarter
      hs-source-dirs:
          src
      build-depends:
          base >=4.12 && <4.13
        , binary
        , process
        , random
        , unix
        , unix-compat
      default-language: Haskell2010
There's nothing particularly mysterious going on here. Reading and writing files simply aren't atomic operations, and this is biting you. If you have one thread writing output.txt and another reading output.txt, it is completely normal and expected for the reader to occasionally see only part of the file that the writer would eventually produce.
This is not particularly special to the binary package, nor even to the language -- this is, to a first approximation, true of nearly every library and language that deals with a filesystem. Guaranteeing atomicity of the appropriate kind is quite hard, indeed; but many, many engineering years have gone into providing this kind of thing for databases, so if that's a need for you, you might consider using one of them.
Alternately, a significantly simpler solution is to have a single thread that is responsible for reading and writing the appropriate file, and to communicate with it via one of Haskell's excellent inter-thread communication tools.
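As a hedged sketch of that single-writer approach (the FileRequest type, fileWorker, and the string payloads are my own invention for illustration, not from the question; the same shape works with the question's Example type and encodeFile/decodeFileOrFail):

```haskell
import Control.Concurrent
  (Chan, MVar, forkIO, newChan, newEmptyMVar, putMVar, readChan, takeMVar, writeChan)

-- Requests that any thread may send to the single writer thread.
data FileRequest
  = Write String        -- replace the file's contents
  | Read (MVar String)  -- reply box for the current contents

-- The only thread that ever touches the file. Because it processes
-- requests one at a time, a read can never observe a half-written file.
fileWorker :: FilePath -> Chan FileRequest -> IO ()
fileWorker path chan = loop
  where
    loop = do
      req <- readChan chan
      case req of
        Write s -> writeFile path s >> loop
        Read reply -> do
          s <- readFile path
          length s `seq` putMVar reply s  -- force before replying (lazy IO)
          loop

main :: IO ()
main = do
  chan <- newChan
  _ <- forkIO (fileWorker "output.txt" chan)
  writeChan chan (Write "hello")  -- any thread can enqueue a write
  reply <- newEmptyMVar
  writeChan chan (Read reply)     -- ...or request the current contents
  takeMVar reply >>= putStrLn     -- prints "hello"
```

Forcing the string before putMVar matters: readFile is lazy, and handing an unevaluated read to another thread would reintroduce exactly the interleaving problem the worker is meant to prevent.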
Some OSs do offer an atomic file-rename operation. In such a situation, one could also consider writing to a temporary file, then using an atomic rename to overwrite the filename you actually care about. (Thanks for suggesting this to a commenter whom I will leave anonymous, since they chose to delete their comment.)
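A sketch of that write-then-rename idea (atomicWrite is a name of my own; System.Directory.renameFile is an atomic rename(2) on POSIX systems as long as source and destination are on the same filesystem):

```haskell
import System.Directory (renameFile)

-- Write the new contents to a sibling temporary file first, then
-- atomically rename it over the real destination. A reader opening
-- the destination sees either the old or the new contents, never a
-- half-written file.
atomicWrite :: FilePath -> String -> IO ()
atomicWrite path contents = do
  let tmp = path ++ ".tmp"  -- same directory, hence same filesystem
  writeFile tmp contents    -- readers never open this partial file
  renameFile tmp path       -- swap it in atomically

main :: IO ()
main = do
  atomicWrite "output.txt" "safe contents"
  readFile "output.txt" >>= putStrLn  -- prints "safe contents"
```

With several concurrent writers you would want unique temporary names (e.g. via openTempFile) so they don't clobber each other's .tmp file.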

Why would a program be faster in runghc or with profiling?

I've got the following program that takes a big input (a list of extension/MIME mappings and a list of files) and outputs results line by line (the MIME type for each file).
    import System.IO
    import Control.Monad
    import qualified Data.Map as M
    import System.FilePath
    import Data.Char

    main :: IO ()
    main = do
      input_line <- getLine
      let n = read input_line :: Int -- Number of elements which make up the association table.
      input_line <- getLine
      let q = read input_line :: Int -- Number Q of file names to be analyzed.
      mimeMap <- fmap M.fromList $ replicateM n $ do
        input_line <- getLine
        let input = words input_line
        let ext = input!!0 -- file extension
        let mt = input!!1 -- MIME type.
        return (map toLower ext, mt)
      replicateM_ q $ do
        fname <- getLine
        let ext = map toLower . drop 1 . takeExtension $ fname
            mime = M.findWithDefault "UNKNOWN" ext mimeMap
        putStrLn mime
The program was quite slow, so I started profiling it, and I got a strange result.
When compiled with

    ghc --make -O2 coding.hs

the program is very slow. However, -fprof-auto seems to speed it all up. Compiling with

    ghc --make -O2 coding.hs -prof -fprof-auto -fforce-recomp

makes it blazing fast; -prof alone has no effect.
Strangely, it is also very fast when run with runghc coding.hs.
I have no idea in what direction to go from there. Does anyone understand what is happening here?
EDIT: I should mention that my ghc is 7.10.1.
To provide a complete answer to the question:
As Reid Barton mentioned, the problem seems to be the infamous state hack optimization, which inlines mimeMap into the repeated IO action, executing it many more times than necessary. -fno-state-hack disables that optimization and solves the problem. Another way to solve the problem is to force strict evaluation of mimeMap:

    !mimeMap <- fmap M.fromList $ replicateM n [...]

However, there also seems to be a regression in GHC 7.10, in which -fno-state-hack does not solve the problem. That explains why it didn't fix it for me.
Thanks a lot everyone for your answers.

Haskell Hashtable Performance

I am trying to use hash tables in Haskell with the hashtables package, and finding that I cannot get anywhere near Python's performance. How can I achieve similar performance? Is it possible given current Haskell libraries and compilers? If not, what's the underlying issue?
Here is my Python code:
    y = {}
    for x in xrange(10000000):
        y[x] = x
    print y[100]
Here's my corresponding Haskell code:
    import qualified Data.HashTable.IO as H
    import Control.Monad

    main = do
      y <- H.new :: IO (H.CuckooHashTable Int Int)
      forM_ [1..10000000] $ \x -> H.insert y x x
      H.lookup y 100 >>= print
Here is another version using Data.Map, which is slower than both for me:
    import qualified Data.Map as Map
    import Data.List
    import Control.Monad

    main = do
      let m = foldl' (\m x -> Map.insert x x m) Map.empty [1..10000000]
      print $ Map.lookup 100 m
Interestingly enough, Data.HashMap performs very badly:
    import qualified Data.HashMap.Strict as Map
    import Data.List

    main = do
      let m = foldl' (\m x -> Map.insert x x m) Map.empty [1..10000000]
      print $ Map.lookup 100 m
My suspicion is that Data.HashMap performs badly because unlike Data.Map, it is not spine-strict (I think), so foldl' is just a foldl, with the associated thunk buildup problems.
Note that I have used -prof and verified that the majority of the time is spent in the hashtables or Data.Map code, not in the forM or anything like that. All code is compiled with -O2 and no other parameters.
As reddit.com/u/cheecheeo suggested here, using Data.Judy, you'll get similar performance for your particular microbenchmark:
    module Main where

    import qualified Data.Judy as J
    import Control.Monad (forM_)

    main = do
      h <- J.new :: IO (J.JudyL Int)
      forM_ [0..10000000] $ \i -> J.insert (fromIntegral i) i h
      v <- J.lookup 100 h
      putStrLn $ show v
Timing the above:

    $ time ./Main
    Just 100

    real    0m0.958s
    user    0m0.924s
    sys     0m0.032s
Timing the Python code of the OP:

    $ time ./main.py
    100

    real    0m1.067s
    user    0m0.886s
    sys     0m0.180s
The documentation for hashtables notes that "Cuckoo hashing, like the basic hash table implementation using linear probing, can suffer from long delays when the table is resized." You use new, which creates a new table of the default size. From looking at the source, it appears that the default size is 2. Inserting 10000000 items likely entails numerous resizings.
Try using newSized.
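A sketch of what that change might look like against the question's benchmark (newSized takes the expected number of elements, so the table is allocated large enough up front; the exact speedup will depend on your machine):

```haskell
import qualified Data.HashTable.IO as H
import Control.Monad (forM_)

main :: IO ()
main = do
  -- Pre-size the table for the number of inserts we expect, avoiding
  -- the repeated grow-and-rehash cycles a default-sized table goes through.
  y <- H.newSized 10000000 :: IO (H.CuckooHashTable Int Int)
  forM_ [1..10000000] $ \x -> H.insert y x x
  H.lookup y 100 >>= print  -- prints "Just 100"
```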
Given the times above, I thought I would throw in the Data.Map solution, which seems to be comparable to using newSized.
    import qualified Data.Map as M

    main = do
      print $ M.lookup 100 $ M.fromList $ map (\x -> (x,x)) [1..10000000]

Why isn't EKG showing my allocated memory?

After seeing EKG in 24 days of Hackage, I tried to use it in one of my programs, but it wasn't showing any of my memory allocation.
So I tried it again with a sample program that just sucks up memory:
    {-# LANGUAGE OverloadedStrings #-}
    module Main where

    import System.Remote.Monitoring (forkServer)
    import Control.Applicative ((<$>))
    import Control.Monad (foldM, forM_)
    import Control.Monad.Primitive (PrimMonad, PrimState)
    import Data.Vector.Mutable (MVector, replicate, read, write, length)
    import Prelude hiding (read, length, replicate)
    import Text.Printf

    accumBy :: (Functor m, PrimMonad m) => (a -> a -> a) -> MVector (PrimState m) a -> m a
    accumBy f v = do
      a <- read v 0
      foldM (\a i -> do
          a' <- f a <$> read v i
          write v i a'
          return a'
        ) a [1 .. length v - 1]

    main :: IO ()
    main = do
      forkServer "localhost" 8000
      forM_ [1..] $ \n -> do
        v <- replicate (n*1024) (n :: Int)
        accumBy (+) v >>= printf "%08x\n"
The program runs fine
    % ghc --make Temp.hs -rtsopts && ./Temp +RTS -K32mM -RTS
    00000400
    00001000
    00002400
    ...
But EKG doesn't seem to be detecting my memory usage at all
What am I doing wrong?
You need to use the -T, -t, -S, or -s RTS options for collecting statistics, e.g.:

    ghc --make Temp.hs -rtsopts && ./Temp +RTS -T -K32mM -RTS
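As a sanity check you can also ask the RTS from inside the program whether statistics are being collected (getRTSStatsEnabled lives in GHC.Stats in base; it requires GHC 8.2 or later, and older versions spell it getGCStatsEnabled):

```haskell
import GHC.Stats (getRTSStatsEnabled)
import Control.Monad (unless)

main :: IO ()
main = do
  -- False unless the program was started with +RTS -T (or -t/-s/-S);
  -- in that case EKG's GC and memory metrics will all read as zero.
  enabled <- getRTSStatsEnabled
  unless enabled $
    putStrLn "RTS stats are off; run with +RTS -T so EKG can see memory usage"
```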

IO over big files in haskell: Performance issue

I'm trying to work over big files using Haskell. I'd like to browse an input file byte after byte, and to generate an output byte after byte. Of course I need the IO to be buffered with blocks of reasonable size (a few KB). I can't do it, and I need your help please.
    import System
    import qualified Data.ByteString.Lazy as BL
    import Data.Word
    import Data.List

    main :: IO ()
    main = do
        args <- System.getArgs
        let filename = head args
        byteString <- BL.readFile filename
        let wordsList = BL.unpack byteString
        let foldFun acc word = doSomeStuff word : acc
        let wordsListCopy = foldl' foldFun [] wordsList
        let byteStringCopy = BL.pack (reverse wordsListCopy)
        BL.writeFile (filename ++ ".cpy") byteStringCopy
      where
        doSomeStuff = id
I name this file TestCopy.hs, then do the following:
    $ ls -l *MB
    -rwxrwxrwx 1 root root 10000000 2011-03-24 13:11 10MB
    -rwxrwxrwx 1 root root  5000000 2011-03-24 13:31 5MB
    $ ghc --make -O TestCopy.hs
    [1 of 1] Compiling Main             ( TestCopy.hs, TestCopy.o )
    Linking TestCopy ...
    $ time ./TestCopy 5MB

    real    0m5.631s
    user    0m1.972s
    sys     0m2.488s
    $ diff 5MB 5MB.cpy
    $ time ./TestCopy 10MB

    real    3m6.671s
    user    0m3.404s
    sys     1m21.649s
    $ diff 10MB 10MB.cpy
    $ time ./TestCopy 10MB +RTS -K500M -RTS

    real    2m50.261s
    user    0m3.808s
    sys     1m13.849s
    $ diff 10MB 10MB.cpy
    $
My problem: there is a huge difference between a 5MB and a 10MB file. I'd like the performance to be linear in the size of the input file. What am I doing wrong, and how can I achieve this? I don't mind using lazy bytestrings or anything else, as long as it works, but it has to be a standard GHC library.
To clarify: it's for a university project, and I'm not trying to copy files. The doSomeStuff function will perform compression/decompression actions that I have to customize.
For chunked input processing I would use the enumerator package.

    import Data.Enumerator
    import Data.Enumerator.Binary (enumFile)

We use bytestrings

    import Data.ByteString as BS

and IO

    import Control.Monad.Trans (liftIO)
    import Control.Monad (mapM_)
    import System (getArgs)

Your main function could look like the following:

    main =
        do (filepath:_) <- getArgs
           run_ $ enumFile filepath $$ enumWrite (filepath ++ ".cpy")

enumFile reads 4096 bytes per chunk and passes these to enumWrite, which writes them down.
enumWrite is defined as:

    enumWrite :: FilePath -> Iteratee BS.ByteString IO ()
    enumWrite filepath =
        do liftIO (BS.writeFile filepath BS.empty) -- ensure the destination is empty
           continue step
      where
        step (Chunks xs) =
            do liftIO (mapM_ (BS.appendFile filepath) xs)
               continue step
        step EOF = yield () EOF

As you can see, the step function takes chunks of bytestrings and appends them to the destination file. These chunks have the type Stream BS.ByteString, where Stream is defined as:

    data Stream a = Chunks [a] | EOF

On EOF, step terminates, yielding ().
For a much more elaborate read on this I personally recommend Michael Snoyman's tutorial.
The numbers

    $ time ./TestCopy 5MB
    ./TestCopy 5MB  2,91s user 0,32s system 96% cpu 3,356 total
    $ time ./TestCopy2 5MB
    ./TestCopy2 5MB  0,04s user 0,03s system 93% cpu 0,075 total
That's quite an improvement. Now, in order to implement your fold you probably want to write an Enumeratee, which is used to transform an input stream. Fortunately there is already a map function defined in the enumerator package, which can be modified for your needs, i.e. it can be modified to carry over state.
On the construction of the intermediate result

You construct wordsList in reverse order and reverse it afterwards. I think difference lists do a better job, because appends take only O(1) time, due to the fact that appending is only a function composition. I'm not sure whether they take more space though. Here's a rough sketch of difference lists:

    type DList a = [a] -> [a]

    emptyList :: DList a
    emptyList = id

    snoc :: DList a -> a -> DList a
    snoc dlist a = dlist . (a:)

    toList :: DList a -> [a]
    toList dlist = dlist []
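For completeness, here is how that sketch is used end to end (my own illustration, repeating the definitions so the snippet stands alone; each snoc is O(1) function composition rather than an O(n) list append):

```haskell
type DList a = [a] -> [a]

emptyList :: DList a
emptyList = id

-- Append a single element to the end in O(1).
snoc :: DList a -> a -> DList a
snoc dlist a = dlist . (a:)

-- Materialize the list once, at the end, already in the right order.
toList :: DList a -> [a]
toList dlist = dlist []

main :: IO ()
main = print (toList (emptyList `snoc` 1 `snoc` 2 `snoc` (3 :: Int)))
-- prints [1,2,3], with no final reverse needed
```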
This answer is probably not needed anymore, but I added it for completeness.
I take it this is a follow-on to Reading large file in haskell? from yesterday.
Try compiling with "-rtsopts -O2" instead of just "-O".
You claim "I'd like to browse an input file byte after byte, and to generate an output byte after byte", but your code reads the entire input before trying to create any output. This is just not very representative of the goal.
With my system I see "ghc -rtsopts --make -O2 b.hs" giving

    (! 741)-> time ./b five
    real 0m2.789s  user 0m2.235s  sys 0m0.518s
    (! 742)-> time ./b ten
    real 0m5.830s  user 0m4.623s  sys 0m1.027s
Which now looks linear to me.
