I've been learning Haskell for two years now and I'm still confused about the best (fastest) way to read tons of numbers from a single input line.
For practice I registered on hackerearth.com and am trying to solve every challenge in Haskell. But now I'm stuck on a challenge because I keep running into timeout issues: my program is just too slow to be accepted by the site.
Using the profiler, I found that 80%+ of the time is spent parsing a line with lots of integers, and the percentage gets even higher as the number of values on the line increases.
This is how I currently read numbers from an input line:
import Data.Maybe (fromJust)
import qualified Data.ByteString.Char8 as C8
main :: IO ()
main = do
  scores <- fmap (map (fst . fromJust . C8.readInt) . C8.words) C8.getLine :: IO [Int]
  -- ... the rest of the program consumes scores
Is there any way to get the data into the variable faster?
BTW: the biggest test case consists of a line with 200,000 9-digit values. Parsing it takes incredibly long (> 60 secs).
It's always difficult to declare a particular approach "the fastest", since there's almost always some way to squeeze out more performance. However, an approach using Data.ByteString.Char8 and the general method you suggest should be among the fastest methods for reading numbers. If you encounter a case where performance is poor, the problem likely lies elsewhere.
To give some concrete results, I generated a 191 MB file of 20 million 9-digit numbers, space-separated on a single line. I then tried several general methods of reading a line of numbers and printing their sum (which, for the record, was 10999281565534666). The obvious approach using String:
reader :: IO [Int]
reader = map read . words <$> getLine
sum' xs = sum xs -- work around GHC ticket 10992
main = print =<< sum' <$> reader
took 52 secs; a similar approach using Text:
import qualified Data.Text as T
import qualified Data.Text.IO as T
import qualified Data.Text.Read as T
readText = map parse . T.words <$> T.getLine
where parse s = let Right (n, _) = T.decimal s in n
ran in 2.4 secs (but note that it would need T.signed T.decimal to handle negative numbers!); and the same approach using Char8:
import qualified Data.ByteString.Char8 as C
readChar8 :: IO [Int]
readChar8 = map parse . C.words <$> C.getLine
where parse s = let Just (n, _) = C.readInt s in n
ran in 1.4 secs. All examples were compiled with -O2 on GHC 8.0.2.
As a comparison benchmark, a scanf-based C implementation:
/* GCC 5.4.0 w/ -O3 */
#include <stdio.h>
int main()
{
long x, acc = 0;
while (scanf(" %ld", &x) == 1) {
acc += x;
}
printf("%ld\n", acc);
return 0;
}
ran in about 2.5 secs, on par with the Text implementation.
You can squeeze a bit more performance out of the Char8 implementation with a hand-rolled parser (note that C.isSpace below comes from Data.Char, imported qualified under the same C alias as the ByteString module; see the full code at the end):
readChar8' :: IO [Int]
readChar8' = parse <$> C.getLine
where parse = unfoldr go
go s = do (n, s1) <- C.readInt s
let s2 = C.dropWhile C.isSpace s1
return (n, s2)
runs in about 0.9 secs -- I haven't tried to determine why there's a difference, but the compiler must be missing an opportunity to perform some optimization of the words-to-readInt pipeline.
Haskell Code for Reference
Make some numbers with Numbers.hs:
-- |Generate 20M 9-digit numbers:
-- ./Numbers 20000000 100000000 999999999 > data1.txt
import qualified Data.ByteString.Char8 as C
import Control.Monad
import System.Environment
import System.Random
main :: IO ()
main = do [n, a, b] <- map read <$> getArgs
nums <- replicateM n (randomRIO (a,b))
let _ = nums :: [Int]
C.putStrLn (C.unwords (map (C.pack . show) nums))
Find their sum with Sum.hs:
import Data.List
import qualified Data.Text as T
import qualified Data.Text.IO as T
import qualified Data.Text.Read as T
import qualified Data.Char as C               -- provides C.isSpace
import qualified Data.ByteString.Char8 as C   -- provides C.words, C.readInt, C.getLine
import System.Environment
-- work around https://ghc.haskell.org/trac/ghc/ticket/10992
sum' xs = sum xs
readString :: IO [Int]
readString = map read . words <$> getLine
readText :: IO [Int]
readText = map parse . T.words <$> T.getLine
where parse s = let Right (n, _) = T.decimal s in n
readChar8 :: IO [Int]
readChar8 = map parse . C.words <$> C.getLine
where parse s = let Just (n, _) = C.readInt s in n
readHand :: IO [Int]
readHand = parse <$> C.getLine
where parse = unfoldr go
go s = do (n, s1) <- C.readInt s
let s2 = C.dropWhile C.isSpace s1
return (n, s2)
main = do [method] <- getArgs
let reader = case method of
"string" -> readString
"text" -> readText
"char8" -> readChar8
"hand" -> readHand
print =<< sum' <$> reader
where:
./Sum string <data1.txt # 54.3 secs
./Sum text <data1.txt # 2.29 secs
./Sum char8 <data1.txt # 1.34 secs
./Sum hand <data1.txt # 0.91 secs
Related
With the following code:
(lazy_test.hs)
-- Testing lazy evaluation of monadically constructed lists, using State.
import Control.Monad.State
nMax = 5
foo :: Int -> State [Int] Bool
foo n = do
modify $ \st -> n : st
return (n `mod` 2 == 1)
main :: IO ()
main = do
let ress = for [0..nMax] $ \n -> runState (foo n) []
sts = map snd $ dropWhile (not . fst) ress
print $ head sts
for = flip map
I can set nMax to 5, or 50,000,000, and I get approximately the same run time:
nMax = 5:
$ stack ghc lazy_test.hs
[1 of 1] Compiling Main ( lazy_test.hs, lazy_test.o )
Linking lazy_test ...
$ time ./lazy_test
[1]
real 0m0.019s
user 0m0.002s
sys 0m0.006s
nMax = 50,000,000:
$ stack ghc lazy_test.hs
[1 of 1] Compiling Main ( lazy_test.hs, lazy_test.o )
Linking lazy_test ...
$ time ./lazy_test
[1]
real 0m0.020s
user 0m0.002s
sys 0m0.005s
which is as I expect, given my understanding of lazy evaluation mechanics.
However, if I switch from State to StateT:
(lazy_test2.hs)
-- Testing lazy evaluation of monadically constructed lists, using StateT.
import Control.Monad.State
nMax = 5
foo :: Int -> StateT [Int] IO Bool
foo n = do
modify $ \st -> n : st
return (n `mod` 2 == 1)
main :: IO ()
main = do
ress <- forM [0..nMax] $ \n -> runStateT (foo n) []
let sts = map snd $ dropWhile (not . fst) ress
print $ head sts
for = flip map
then I see an extreme difference between the respective run times:
nMax = 5:
$ stack ghc lazy_test2.hs
[1 of 1] Compiling Main ( lazy_test2.hs, lazy_test2.o )
Linking lazy_test2 ...
$ time ./lazy_test2
[1]
real 0m0.019s
user 0m0.002s
sys 0m0.004s
nMax = 50,000,000:
$ stack ghc lazy_test2.hs
[1 of 1] Compiling Main ( lazy_test2.hs, lazy_test2.o )
Linking lazy_test2 ...
$ time ./lazy_test2
[1]
real 0m29.758s
user 0m25.488s
sys 0m4.231s
And I'm assuming that's because I'm losing lazy evaluation of the monadically constructed list, when I switch to the StateT-based implementation.
Is that correct?
Can I recover lazy evaluation of a monadically constructed list, while keeping with the StateT-based implementation?
In your example, you're only running one foo action per runState, so your use of State and/or StateT is essentially irrelevant. You can replace the use of foo with the equivalent:
import Control.Monad
nMax = 50000000
main :: IO ()
main = do
ress <- forM [0..nMax] $ \n -> return (n `mod` 2 == 1, [n])
let sts = map snd $ dropWhile (not . fst) ress
print $ head sts
and it behaves the same way.
The issue is the strictness of the IO monad. If you ran this computation in the Identity monad instead:
import Control.Monad
import Data.Functor.Identity
nMax = 50000000
main :: IO ()
main = do
let ress = runIdentity $ forM [0..nMax] $ \n -> return (n `mod` 2 == 1, [n])
let sts = map snd $ dropWhile (not . fst) ress
print $ head sts
then it would run lazily.
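A quick way to see this (my addition, not part of the original answer): Identity's bind is essentially plain function application on the unwrapped value, so forM can yield the result list incrementally, even from an infinite input list.
import Control.Monad (forM)
import Data.Functor.Identity (runIdentity)
-- Prints [0,1,2]: only the first three elements are ever evaluated,
-- which would be impossible in IO.
main :: IO ()
main = print $ take 3 $ runIdentity $ forM [0 ..] $ \n -> return (n :: Int)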
If you want to run lazily in the IO monad, you need to do it explicitly with unsafeInterleaveIO, so the following would work:
import System.IO.Unsafe
import Control.Monad
nMax = 50000000
main :: IO ()
main = do
ress <- lazyForM [0..nMax] $ \n -> return (n `mod` 2 == 1, [n])
let sts = map snd $ dropWhile (not . fst) ress
print $ head sts
lazyForM :: [a] -> (a -> IO b) -> IO [b]
lazyForM (x:xs) f = do
y <- f x
ys <- unsafeInterleaveIO (lazyForM xs f)
return (y:ys)
lazyForM [] _ = return []
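One caveat, which is my addition rather than part of the original answer: unsafeInterleaveIO defers each tail's effects until that part of the list is demanded, so the order (and number) of effects now depends on evaluation order. A tiny experiment, assuming the lazyForM defined above, makes this visible:
-- Binding the list runs f on the head only; `head ys` demands nothing
-- further, so this prints "effect 1" followed by 1, and effects 2..5
-- never fire.
demo :: IO ()
demo = do
  ys <- lazyForM [1 .. 5 :: Int] $ \n -> do
    putStrLn ("effect " ++ show n)
    return n
  print (head ys)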
The other answer by K A Buhr explains why State vs. StateT is not the pertinent factor (IO is), and also points out how your example is strangely constructed (in that the State(T) part isn't actually used, since each number gets a fresh state []). But aside from those points, I'm not sure I would say this is "losing lazy evaluation of the monadically constructed list": if we understand "lazy evaluation" as "evaluated only when needed", then foo really does need to run on every element of the input list in order to perform all the effects, so lazy evaluation is not being "lost". You are getting what you asked for. (It just so happens that foo doesn't perform any IO; perhaps someone else can comment on whether a compiler/GHC could ever optimize the effects away on that basis, but you can easily see why GHC does the naive thing here.)
This is a common, well-known problem in Haskell. There are various libraries (the best known of which are streaming, pipes, and conduit) that solve it by giving you streams (basically lists) which are lazy in the effects too. If I recreate your example in a streaming style,
import Data.Function ((&))
import Control.Monad.State
import Streaming
import qualified Streaming.Prelude as S
foo :: Int -> StateT [Int] IO Bool
foo n =
(n `mod` 2 == 1) <$ modify (n:)
nMax :: Int
nMax = 5000000
main :: IO ()
main = do
mHead <- S.head_ $ S.each [0..nMax]
& S.mapM (flip runStateT [] . foo)
& S.dropWhile (not . fst)
print $ snd <$> mHead
then both versions run practically instantaneously. To make the difference more apparent, imagine that foo also called print "hi". Then the streaming version, being lazy in the effects, would print only twice, whereas your original versions would print nMax + 1 times (once per element). Because they're lazy in the effects, streams don't need to traverse the whole list in order to short-circuit and finish early.
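For concreteness, here's a hypothetical noisy variant (my illustration, not code from the question) that makes the effect-laziness observable:
fooNoisy :: Int -> StateT [Int] IO Bool
fooNoisy n = do
  lift (putStrLn ("visiting " ++ show n))  -- an observable effect per element
  modify (n :)
  return (n `mod` 2 == 1)
Swapping fooNoisy in for foo, the streaming version prints only "visiting 0" and "visiting 1" before short-circuiting, while the forM-based versions print every number up to nMax.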
I'm writing a toy implementation of a rainbow table in Haskell. The main data structure is a strict Map h c, containing a large number of pairs generated from random values c:
import qualified Data.Map as M
import System.Random
table :: (RandomGen g, Random c, Ord h) => Int -> g -> M.Map h c
table n = M.fromList . map (\c -> (chain c, c)) . take n . randoms
where chain is very expensive to compute. The part that dominates the computation time is embarrassingly parallel, so I would expect a quasi-linear speedup in the number of cores if it ran in parallel.
However, I would like the computed pairs to be added to the table straight away rather than accumulated in a list in memory, and heap profiling confirms that the list is in fact being accumulated. It should also be noted that collisions may occur, in which case the redundant chains should be dropped as soon as possible.
I've found parMap from Control.Parallel.Strategies, and tried to apply it to my table-building function:
table n = M.fromList . parMap (evalTuple2 rseq rseq) (\c -> (chain c, c)) . take n . randoms
but, running with +RTS -N, I get 1.3 cores of usage at best. Heap profiling indicates, at least, that the intermediate list does not reside in memory, but +RTS -s also reports 0 sparks created. How is this possible with my usage of parMap? What is the proper way to do this?
EDIT: chain is defined as:
chain :: (c -> h) -> [h -> c] -> c -> h
chain h = (h .) . flip (foldl' (flip (. h)))
where (c -> h) is the target hash function, from cleartext to hash,
and [h -> c] is a family of reducer functions. I want the implementation to stay generic over c and h, but for benchmarking I use strict bytestrings for both.
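For readers following along, here is a spelled-out equivalent of that definition (my reading of it, on the assumption that a chain alternates the hash with each reducer and finishes with one more hash):
import Data.List (foldl')
-- Starting from a cleartext c0, alternately apply the hash h and each
-- reducer r, then hash once more to produce the final chain endpoint.
chain' :: (c -> h) -> [h -> c] -> c -> h
chain' h rs c0 = h (foldl' (\c r -> r (h c)) c0 rs)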
Here is what I came up with. Let me know how the benchmarks work out:
#!/usr/bin/env stack
{- stack --resolver lts-14.1 script --optimize
--package scheduler
--package containers
--package random
--package splitmix
--package deepseq
-}
{-# LANGUAGE BangPatterns #-}
import Control.DeepSeq
import Control.Scheduler
import Data.Foldable as F
import Data.IORef
import Data.List (unfoldr)
import Data.Map.Strict as M
import System.Environment (getArgs)
import System.Random as R
import System.Random.SplitMix
-- for simplicity
chain :: Show a => a -> String
chain = show
makeTable :: Int -> SMGen -> (SMGen, M.Map String Int)
makeTable = go M.empty
where go !acc i gen
| i > 0 =
let (c, gen') = R.random gen
in go (M.insert (chain c) c acc) (i - 1) gen'
| otherwise = (gen, acc)
makeTablePar :: Int -> SMGen -> IO (M.Map String Int)
makeTablePar n0 gen0 = do
let gens = unfoldr (Just . splitSMGen) gen0
gensState <- initWorkerStates Par (\(WorkerId wid) -> newIORef (gens !! wid))
tables <-
withSchedulerWS gensState $ \scheduler -> do
let k = numWorkers (unwrapSchedulerWS scheduler)
(q, r) = n0 `quotRem` k
forM_ ((if r == 0 then [] else [r]) ++ replicate k q) $ \n ->
scheduleWorkState scheduler $ \genRef -> do
gen <- readIORef genRef
let (gen', table) = makeTable n gen
writeIORef genRef gen'
table `deepseq` pure table
pure $ F.foldl' M.union M.empty tables
main :: IO ()
main = do
[n] <- fmap read <$> getArgs
gen <- initSMGen
print =<< makeTablePar n gen
A few notes on the implementation:
Don't use the generator from random; it is hella slow. splitmix is about 200x faster.
In makeTable, if you want duplicate results discarded right away, a manual loop or an unfold is required. Since we also need the final generator back, I opted for the manual loop.
To minimize synchronization between threads, independent maps are built up per thread, and duplicates are removed at the end, when the resulting maps are merged together (see the sketch below).
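As a small aside (my illustration, not part of the original answer): the final merge can drop duplicates because Data.Map's union is left-biased, keeping the value from the first map whenever a key occurs in both:
import qualified Data.Map.Strict as M
-- On duplicate keys, M.union keeps the left map's value, so redundant
-- chains from later per-thread maps are silently discarded.
unionDemo :: M.Map String Int
unionDemo = M.union (M.fromList [("k", 1)]) (M.fromList [("k", 2), ("j", 3)])
-- fromList [("j",3),("k",1)]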
I'm trying to follow this tutorial: https://wiki.haskell.org/Tutorials/Programming_Haskell/String_IO.
In the last part, 7 Extension: using SMP parallelism, I copied the code, but it fails to compile with this error message:
/home/dhilst/parallelspell.hs:13:20: error:
Variable not in scope: chunk :: Int -> [String] -> t
I searched for chunk on Hoogle and found Data.Text.Internal.Lazy, but that seems to be an internal module, and I couldn't import it anyway.
Here is the code:
import Data.Set hiding (map)
import Data.Maybe
import Data.Char
import Text.Printf
import System.IO
import System.Environment
import Control.Concurrent
import Control.Monad
main = do
(f,g,n) <- readFiles
let dict = fromList (lines f)
work = chunk n (words g)
run n dict work
run n dict work = do
chan <- newChan
errs <- getChanContents chan
mapM_ (forkIO . thread chan dict) (zip [1..n] work)
wait n errs 0
wait n xs i = when (i < n) $ case xs of
Nothing : ys -> wait n ys $! i+1
Just s : ys -> putStrLn s >> wait n ys i
thread chan dict (me,xs) = do
mapM_ spellit xs
writeChan chan Nothing
where spellit w = when (spell dict w) $
writeChan chan . Just $ printf "Thread %d: %-25s" (me::Int) w
spell d w = w `notMember` d
readFiles = do
[s,n] <- getArgs
f <- readFile "/usr/share/dict/words"
g <- readFile s
return (f,g, read n)
And here is the compilation line:
ghc -O --make -threaded parallelspell.hs
--
Update: I wrote my own version of chunk based on this question: How to partition a list in Haskell?
chunk :: Int -> [a] -> [[a]]
chunk _ [] = []
chunk n xs = (take n xs) : (chunk n (drop n xs))
Still, does this mean that the tutorial I'm following is very old and out of date? Can anyone confirm whether this function ever existed, or whether I'm missing something?
Regards,
Looks like the tutorial just forgot to define chunk. I encourage you to update the wiki to include a suitable definition.
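As a side note (a suggestion of mine, not something the tutorial mentions): the split package provides chunksOf, which behaves exactly like your hand-rolled chunk, so you could also write:
import Data.List.Split (chunksOf)  -- from the "split" package
-- chunksOf 3 [1..7] == [[1,2,3],[4,5,6],[7]]
chunk :: Int -> [a] -> [[a]]
chunk = chunksOf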
I'm trying to find the sum of integers in a file. The code using the normal string is:
main = do
contents <- getContents
  print (sumFile contents)
  where sumFile = sum . map read . words
I tried to change it to use the Data.ByteString.Lazy module like this:
import Data.ByteString.Lazy as L
main = do
contents <- L.getContents
L.putStrLn (sumFile contents)
  where sumFile = sum . L.map read . words
But this was rejected, since words works on String, not ByteString. Then I tried using Data.ByteString.Char8, but it uses a strict ByteString.
How can I make this function completely lazy?
I found a slightly lengthy workaround: read the input as a ByteString and then parse it into a list of integers. Thanks to @melpomene.
import Data.ByteString.Lazy.Char8 as L
main = do
contents <- L.getContents
print (sumFile contents)
  where sumFile x = sum $ Prelude.map tups $ Prelude.map L.readInt (L.words x)
tups :: Num a => Maybe (a, b) -> a
tups (Just (a, _)) = a
tups Nothing = 0
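For what it's worth, here is a more direct version of the same idea (a sketch of mine, not from the thread): unfoldr keeps the parsing lazy, and a failed readInt simply ends the list.
import qualified Data.ByteString.Lazy.Char8 as L
import Data.Char (isSpace)
import Data.List (unfoldr)
main :: IO ()
main = do
  contents <- L.getContents
  print (sum (unfoldr step contents))
  where
    -- skip leading whitespace, then read one Int and hand back the rest
    step s = do
      (n, rest) <- L.readInt (L.dropWhile isSpace s)
      return (n, rest)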
Why isn't Data.Binary.Get lazy, as it says it is? Or am I doing something wrong here?
import Data.ByteString.Lazy (pack)
import Data.Binary.Get (runGet, isEmpty, getWord8)
getWords = do
empty <- isEmpty
if empty
then return []
else do
w <- getWord8
ws <- getWords
return $ w:ws
main = print $ take 10 $ runGet getWords $ pack $ repeat 1
This main function just hangs instead of printing 10 words.
The documentation you linked provides several examples. The first one needs to read all of the input before it can return, and it looks a lot like what you have written. The second one is a left fold, which processes the input in a streaming fashion. Here's your code rewritten in this style:
module Main where
import Data.Word (Word8)
import qualified Data.ByteString.Lazy as BL
import Data.Binary.Get (runGetState, getWord8)
getWords :: BL.ByteString -> [Word8]
getWords input
| BL.null input = []
| otherwise =
let (w, rest, _) = runGetState getWord8 input 0
in w : getWords rest
main :: IO ()
main = print . take 10 . getWords . BL.pack . repeat $ 1
Testing:
*Main> :main
[1,1,1,1,1,1,1,1,1,1]
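One caveat, and an assumption of mine worth checking against your version of binary: newer releases deprecate runGetState in favor of the incremental interface. A sketch of the same streaming loop using runGetIncremental and pushChunk:
import qualified Data.ByteString.Lazy as BL
import Data.Binary.Get (Decoder (..), getWord8, pushChunk, runGetIncremental)
import Data.Word (Word8)
-- Feed the lazy ByteString's chunks to a succession of one-Word8
-- decoders, reseeding each fresh decoder with the previous leftover,
-- so the output list is produced lazily.
getWords :: BL.ByteString -> [Word8]
getWords = go (runGetIncremental getWord8) . BL.toChunks
  where
    go (Done leftover _ w) chunks =
      w : go (runGetIncremental getWord8 `pushChunk` leftover) chunks
    go (Partial k) (c : cs) = go (k (Just c)) cs
    go (Partial k) []       = go (k Nothing) []
    go Fail {} _            = []  -- input exhausted (or a real parse error)
main :: IO ()
main = print . take 10 . getWords . BL.pack . repeat $ 1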