Why would a program be faster in runghc or with profiling? - haskell

I've got the following program that takes a big input (a list of extension/mime mapping, a list of files) and output results line by line (the mime type for each file).
import System.IO
import Control.Monad
import qualified Data.Map as M
import System.FilePath
import Data.Char
main :: IO ()
main = do
input_line <- getLine
let n = read input_line :: Int -- Number of elements which make up the association table.
input_line <- getLine
let q = read input_line :: Int -- Number Q of file names to be analyzed.
mimeMap <- fmap M.fromList $ replicateM n $ do
input_line <- getLine
let input = words input_line
let ext = input!!0 -- file extension
let mt = input!!1 -- MIME type.
return (map toLower ext, mt)
replicateM_ q $ do
fname <- getLine
let ext = map toLower . drop 1 . takeExtension $ fname
mime = M.findWithDefault "UNKNOWN" ext mimeMap
putStrLn mime
The program was quite slow, so I started profiling it, and I got a strange result.
When compiled with
ghc --make -O2 coding.hs
the program is very slow. However, the -fprof-auto seems to speed it all up. Compiled with
ghc --make -O2 coding.hs -prof -fprof-auto -fforce-recomp
makes it blazing fast -prof alone has no effect.
Strangely, it is also very fast when run with runghc coding.hs.
I have no idea in what direction to go from there. Does anyone understand what is happenning here?
EDIT: I should mention that my ghc is 7.10.1.

To provide a complete answer to the question:
As Reid Barton mentioned, the problem seems to be the infamous state hack optimization, which inlines mimeMap into the repeated IO action, executing it much more times than necessary. -fno-state-hack disables that optimization and solves the problem. Another way to solve the problem is to force a strict evaluation of ``mimeMap.
!mimeMap <- fmap M.fromList $ replicateM n [...]
However, there also seems to be a regression in GHC 7.10, in which -fno-state-hack does not solves the problem. That explains why it didn't fix it for me.
Thanks a lot everyone for your answers.

Related

Data.Binary encodeFile does not seem to be thread safe - corrupted file?

I'm trying to replicate a situation where a binary file was essentially corrupted with a filesize of 0 in a real world application via encodeFile, this occurred after a hard reboot.
Although I've not been able to replicate this behavior exactly, I have gotten it to replicate a corrupted(?) file with code below.
When we first run it (some text is garbled due to multiple threads printing):
"New valid file written"
Example "hmm" [0]
"Testing..."
"Donenn"oo
tt een#no~ouGugHghCh I bDby-ytSteTesAs
R
CTCa~al#lllSSttaacckk ((ffrroomm HHaassCCaallllStS#at~caGkcH)kC:)I
:D
- Fe IreNrrIorSroH,r- ,5c ~ac#la
llelde #da~ tGa HtsC rIscDr/-cMS/aTMiAanRi.Tnh~.s#h:s5:35:31:51 5i ni nm amiani:nM:aMiani
n
"d"ideiien#ig~n.Gg.H..C..I..D..-..S."T.
Command "cabal v2-repl app" exited unexpectedly
After a few runs eventually we get an error of:
*** Exception: not enough bytes
CallStack (from HasCallStack):
error, called at src/Main.hs:53:15 in main:Main
What is the cause of this error? Is it just the case that encodeFile is not safe when used via multiple threads (which is kind of odd as there is no mention of threads on https://hackage.haskell.org/package/binary-0.10.0.0/docs/Data-Binary.html).
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE LambdaCase #-}
module Main where
import System.PosixCompat.Files
import System.Process
import System.Process.Internals
import System.Posix.Signals
import System.Posix.Process
import Control.Concurrent
import Control.Monad
import Data.Binary
import GHC.Generics (Generic)
import Control.Exception
data Example = Example String [Int] deriving (Generic, Show)
instance Binary Example
main :: IO ()
main = do
checkFile
encodeFile "output.txt" $ Example "hmm" [0]
checkFile
print "New valid file written"
decodeFileOrFail "output.txt" >>= \case
Right v#(Example s z) -> print v
Left (e,e') -> do
error $ e'
rip
print "Testing..."
forM_ [1..3] (const $ forkIO $ catch (do
checkFile
somethingIO
checkFile) (\e -> do
print (e :: SomeException)
rip
)
)
print "Done"
checkFile :: IO ()
checkFile = do
fileExist "output.txt" >>= \case
True -> do
x <- getFileSize "output.txt"
if x == 0 then
rip
else
pure ()
decodeFileOrFail "output.txt" >>= \case
Right (Example s z) -> pure ()
Left (e,e') -> do
error $ e'
rip
False -> pure ()
rip :: IO ()
rip = do
print "dieing......."
getProcessID >>= signalProcess sigKILL
somethingIO :: IO ()
somethingIO = do
let v = 10 :: Int
decodeFileOrFail "output.txt" >>= \case
Right (Example s z) -> encodeFile "output.txt" $ z ++ [v]
Left (e,e') -> do
error $ e'
rip
getFileSize :: String -> IO Int
getFileSize path = getFileStatus path >>= return . fromIntegral . fileSize
With a cabal file of:
cabal-version: 1.12
name: HaskellNixCabalStarter
version: 0.1.0.0
author: HaskellNixCabalStarter
maintainer: HaskellNixCabalStarter
license: MIT
build-type: Simple
executable app
main-is: Main.hs
other-modules:
Paths_HaskellNixCabalStarter
hs-source-dirs:
src
build-depends:
base >=4.12 && <4.13
, binary
, process
, random
, unix
, unix-compat
default-language: Haskell2010
There's nothing particularly mysterious going on here. Reading and writing files simply aren't atomic operations, and this is biting you. If you have one thread writing output.txt and another reading output.txt, it is completely normal and expected for the reader to occasionally see only part of the file that the writer would eventually produce.
This is not particularly special to the binary package, nor even to the language -- this is, to a first approximation, true of nearly every library and language that deals with a filesystem. Guaranteeing atomicity of the appropriate kind is quite hard, indeed; but many, many engineering years have gone into providing this kind of thing for databases, so if that's a need for you, you might consider using one of them.
Alternately, a significantly simpler solution is to have a single thread that is responsible for reading and writing the appropriate file, and to communicate with it via one of Haskell's excellent inter-thread communication tools.
Some OSs do offer an atomic file-rename operation. In such a situation, one could also consider writing to a temporary file, then using an atomic rename to overwrite the filename you actually care about. (Thanks to a commenter who I will leave anonymous because they chose to delete their comment for suggesting this.)

Evaluation of higher-order functions

I've written a function getSamplesFromFile that takes a file and returns its contents as a Vector of Floats. The functions reads the contents of the file into a Data.ByteString using Data.ByteString.hGet, it then converts this Data.ByteString to a Vector of Floats using:
import qualified Data.Vector.Unboxed as V
import qualified Data.ByteString as BS
import Data.Word
import System.Environment
import GHC.Int
toVector :: BS.ByteString -> V.Vector Float
toVector bs = vgenerate (fromIntegral (BS.length bs `div` 3)) $ \i ->
myToFloat [BS.index bs (3*i), BS.index bs (3*i+1), BS.index bs (3*i+2)]
where
myToFloat :: [Word8] -> Float
myToFloat = sum . map fromIntegral
vgenerate n f = V.generate n (f . fromIntegral)
I was testing how lazy this program was via a small test program:
main = do
[file] <- getArgs
samples <- getSamplesFromFile file
let slice = V.slice 0 50000 samples
print slice
If I run this on a 13MB file, it seems as if every sample is loaded into memory, even though I only need 50000 samples to be printed.
If I make a small modification to this problem and first map or filter over it, the result is different:
main = do
[file] <- getArgs
samples <- getSamplesFromFile file
let slice = V.slice 0 50000 samples
let mapped = V.map id slice
print mapped
This way, it seems that not every sample was loaded into memory, only the slice:
To make sure this was the case, I ran the program again with a slice of half the size (25000 samples):
Now, the memory usage seems to be proportional to the size of the slice. Just because I map over the slice with id.
The result is the same when filtering over the samples. How can applying a higher-order function suddenly make the behavior lazy?
EDIT
The problem seems to have to do something with cabal. As you can see from the pictures, I was testing my code inside a cabal project called laziness. I can't reproduce this weird behavior if use a separate Main.hs file outside of a cabal project. This is the Main.hs I'm using:
module Main where
import qualified Data.ByteString as BS
import qualified Data.Vector.Unboxed as V
import Data.Word
import GHC.Int
import System.Environment
main = do
[file] <- getArgs
samples <- getSamplesFromFile file
let slice = V.slice 0 50000 samples
--let filtered = V.filter (>0) slice
let mapped = V.map id slice
print slice
getSamplesFromFile = fmap toVector . BS.readFile
toVector :: BS.ByteString -> V.Vector Float
toVector bs = vgenerate (fromIntegral (BS.length bs `div` 3)) $ \i ->
myToFloat [BS.index bs (3*i), BS.index bs (3*i+1), BS.index bs (3*i+2)]
where
myToFloat :: [Word8] -> Float
myToFloat = sum . map fromIntegral
vgenerate n f = V.generate n (f . fromIntegral)
I don't experience the weird behavior if I do the following:
Create a new directory somewhere via mkdir
Add the above Main.hs to the directory.
Compile using ghc Main.hs -O2 -rtsopts -prof.
Run via ./Main myfile.wav +RTS -hy.
Create the pdf using hp2ps and ps2pdf.
I do experience the weird behavior if I do the following:
Create a new directory, laziness, via mkdir laziness.
Initiate a cabal project via cabal init.
Add the above Main.hs to /src.
Add ghc-options: -O2 -rtsopts -prof to laziness.cabal.
Compile using cabal install
Run via laziness myfile.wav +RTS -hy.
Create the pdf using hp2ps and ps2pdf.
I even experience the weird behavior if I:
cd laziness/src
Compile using ghc Main.hs -O2 -rtsopts -prof.
Run via ./Main myfile.wav +RTS -hy.
Create the pdf using hp2ps and ps2pdf.
So it seems that this behavior only occurs when the code is inside a cabal project. This seems weird to me. Could this have something to do with the setup of my cabal project?.

Fast counting known elements in a list

I am trying to write a solution for one of the Hackerrank problems. The challenge is to count elements in a list, the elements vary from 0 to 99, so it is possible to count them in linear time. Here is what I got:
{-# LANGUAGE BangPatterns #-}
{-# OPTIONS_GHC -O3 #-}
module Main where
import Data.STRef
import Data.Foldable
import Control.Monad
import Control.Monad.ST
main = do
line1 <- getLine
line2 <- getLine
let
!ns = map read $ words line2 :: [Int]
res = runST $ do
refs <- forM [0..99] $ \i ->
newSTRef (0 :: Int)
traverse_ (\x -> modifySTRef' (refs !! x) (+1) ) ns
mapM (\ref -> readSTRef ref) refs
putStrLn . unwords . map show $ res
This code works but not fast enough to pass the last test case. Can someone recommend an improvement to it? (link to the problem)
This can be done as a one-liner using accumArray from Data.Array. Something like accumArray (+) 0 (0,99) . zip values $ repeat 1 where values is the input.
It appears to still not be fast enough, which is somewhat vexing. accumArray is more or less as efficient as possible for what it does. Testing on my system reveals the time for processing 1,000,000 input values to be about 1 second, even without compiling it, and that time is dominated by generating the random inputs. That's a far cry from the 5 seconds on the test site.. I have to wonder how overloaded that system is.
One problem you have is that you're looking up your STRefs in a list which means that you'll have to traverse O(n) steps for every lookup and modification. This can be alleviated by using something like Data.Map.Map which has O(log(n)) lookup and modification time.
You could also use a mutable Array or Vector for O(1) lookup/modification time in the ST monad. This is probably the fastest method.

Different concurrency behavior in ghci and runghc

I have some simple code which prints to the screen at fixed intervals of time, unless an IORef is set to indicate that the user is currently typing:
import Data.IORef
import Control.Concurrent
main = do
amTyping <- newIORef False
forkIO $ printALot amTyping
aChar <- getChar
writeIORef amTyping True
aLine <- getLine
writeIORef amTyping False
putStrLn $ aChar : aLine
main
printALot :: IORef Bool -> IO ()
printALot theRef = do
putStrLn "1111111"
threadDelay 1000000
isTyping <- readIORef theRef
if isTyping
then return ()
else printALot theRef
This works beautifully in GHCi, but when I use it with runghc (or compile it), the read of or write to the IORef seems not to work -- printALot just continues looping, overrunning anything the user types.
What's the difference here between ghci and runghc/compiled? Am I using IORefs wrong, but not noticing because ghci isn't truly multithreaded?
This has nothing to do with concurrency.
Your interpreted and compiled programs differ in the terminal mode they use: non-canonical vs canonical.
In the canonical mode, your program doesn't get the character before the whole line is available — hence the effect you are observing.
To fix this, simply put the handle in the non-buffering mode:
import System.IO
main = do
hSetBuffering stdin NoBuffering
...

Better data stream reading in Haskell

I am trying to parse an input stream where the first line tells me how many lines of data there are. I'm ending up with the following code, and it works, but I think there is a better way. Is there?
main = do
numCases <- getLine
proc $ read numCases
proc :: Integer -> IO ()
proc numCases
| numCases == 0 = return ()
| otherwise = do
str <- getLine
putStrLn $ findNextPalin str
proc (numCases - 1)
Note: The code solves the Sphere problem https://www.spoj.pl/problems/PALIN/ but I didn't think posting the rest of the code would impact the discussion of what to do here.
Use replicate and sequence_.
main, proc :: IO ()
main = do numCases <- getLine
sequence_ $ replicate (read numCases) proc
proc = do str <- getLine
putStrLn $ findNextPalin str
sequence_ takes a list of actions, and runs them one after the other, in sequence. (Then it throws away the results; if you were interested in the return values from the actions, you'd use sequence.)
replicate n x makes a list of length n, with each element being x. So we use it to build up the list of actions we want to run.
Dave Hinton's answer is correct, but as an aside here's another way of writing the same code:
import Control.Applicative
main = (sequence_ . proc) =<< (read <$> getLine)
proc x = replicate x (putStrLn =<< (findNextPalin <$> getLine))
Just to remind everyone that do blocks aren't necessary! Note that in the above, both =<< and <$> stand in for plain old function application. If you ignore both operators, the code reads exactly the same as similarly-structured pure functions would. I've added some gratuitous parentheses to make things more explicit.
Their purpose is that <$> applies a regular function inside a monad, while =<< does the same but then compresses an extra layer of the monad (e.g., turning IO (IO a) into IO a).
The interesting part of looking at code this way is that you can mostly ignore where the monads and such are; typically there's very few ways to place the "function application" operators to make the types work.
You (and the previous answers) should work harder to divide up the IO from the logic. Make main gather the input and separately (purely, if possible) do the work.
import Control.Monad -- not needed, but cleans some things up
main = do
numCases <- liftM read getLine
lines <- replicateM numCases getLine
let results = map findNextPalin lines
mapM_ putStrLn results
When solving SPOJ problems in Haskell, try not to use standard strings at all. ByteStrings are much faster, and I've found you can usually ignore the number of tests and just run a map over everything but the first line, like so:
{-# OPTIONS_GHC -O2 -optc-O2 #-}
import qualified Data.ByteString.Lazy.Char8 as BS
main :: IO ()
main = do
(l:ls) <- BS.lines `fmap` BS.getContents
mapM_ findNextPalin ls
The SPOJ page in the Haskell Wiki gives a lot of good pointers about how to read Ints from ByteStrings, as well as how to deal with a large quantities of input. It'll help you avoid exceeding the time limit.

Resources