I have a simple command-line interface that inserts records into a DB.
Right now it writes a lot of info to stdout, like this:
record 856/1000: 85%
record 857/1000: 85%
record 858/1000: 85%
but I want a single dynamic line that is updated in place with the current values:
status    | T    | C   | A   | E
inserting | 1000 | 857 | 85% | 96
How can I achieve that?

If it's just one row, you can use \r to rewind the cursor to the beginning of the line.
Here's an example:
import Control.Concurrent
import Control.Monad
import System.IO
import Text.Printf

main :: IO ()
main = do
    forM_ [10, 9 .. 1] $ \seconds -> do
        printf "\rLaunching missiles in %2d..." (seconds :: Int)
        hFlush stdout -- stdout is line-buffered on a terminal, so flush the partial line
        threadDelay $ 1 * 1000 * 1000
    putStrLn "\nBlastoff!"

Joey Hess's concurrent-output library is designed for progress output like this (and more complex variations).
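Applied to the status line from the question, the \r approach could look roughly like the sketch below. The function name statusLine, the simulated delay, and the fixed value 96 for the E column are assumptions made for illustration; the question does not say what E stands for, so it is simply echoed.

```haskell
import Control.Concurrent (threadDelay)
import Control.Monad (forM_)
import System.IO (hFlush, stdout)
import Text.Printf (printf)

-- | Render one status line. The leading '\r' moves the cursor back to
-- column 0, so the next write overwrites the previous line.
-- Arguments: total records, records done so far, and the E column
-- (its meaning is not given in the question, so it is just echoed).
statusLine :: Int -> Int -> Int -> String
statusLine total done e =
    printf "\rinserting | %d | %d | %d%% | %d" total done percent e
  where
    percent = if total == 0 then 0 else done * 100 `div` total

main :: IO ()
main = do
    let total = 1000
    putStrLn "status    | T | C | A | E"
    forM_ [1 .. total] $ \done -> do
        -- A real program would insert a record here; the delay stands in for it.
        threadDelay 2000
        putStr (statusLine total done 96)
        hFlush stdout -- no newline is written, so flush explicitly
    putStrLn ""
```

Because the counters only grow, each new line is at least as long as the one it overwrites; if a line could get shorter, it would need padding with spaces to erase leftovers.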


Read large lines in huge file without buffering

I was wondering if there's an easy way to get lines one at a time out of a file without eventually loading the whole file into memory. I'd like to do a fold over the lines with an attoparsec parser. I tried using Data.Text.Lazy.IO with hGetLine, and that blows through my memory. I later read that it eventually loads the whole file.
I also tried using pipes-text with folds and view lines:
s <- Pipes.sum $
    folds (\i _ -> (i+1)) 0 id (view Text.lines (Text.fromHandle handle))
print s
to just count the number of lines. It seems to be doing some wonky stuff ("hGetChunk: invalid argument (invalid byte sequence)"), and it takes 11 minutes where wc -l takes 1 minute. I heard that pipes-text might have some issues with gigantic lines? (Each line is about 1GB.)
I'm really open to any suggestions, can't find much searching except for newbie readLine how-tos.
The following code uses Conduit, and will:
1. UTF8-decode standard input
2. Run the lineC combinator as long as there is more data available
3. For each line, simply yield the value 1 and discard the line content, without ever reading the entire line into memory at once
4. Sum up the 1s yielded and print it
You can replace the yield 1 code with something which will do processing on the individual lines.
#!/usr/bin/env stack
-- stack --resolver lts-8.4 --install-ghc runghc --package conduit-combinators
import Conduit

main :: IO ()
main = (runConduit
     $ stdinC
    .| decodeUtf8C
    .| peekForeverE (lineC (yield (1 :: Int)))
    .| sumC) >>= print
This is probably easiest as a fold over the decoded text stream:
{-# LANGUAGE BangPatterns #-}
import Pipes
import qualified Pipes.Prelude as P
import qualified Pipes.ByteString as PB
import qualified Pipes.Text.Encoding as PT
import qualified Control.Foldl as L
import qualified Control.Foldl.Text as LT

main = do
    n <- L.purely P.fold (LT.count '\n') $ void $ PT.decodeUtf8 PB.stdin
    print n
It takes about 14% longer than wc -l for the file I produced, which was just long lines of commas and digits. IO should properly be done with Pipes.ByteString, as the documentation says; the rest is conveniences of various sorts.
You can map an attoparsec parser over each line, distinguished by view lines, but keep in mind that an attoparsec parser can accumulate the whole text as it pleases and this might not be a great idea over a 1 gigabyte chunk of text. If there is a repeated figure on each line (e.g. word separated numbers) you can use Pipes.Attoparsec.parsed to stream them.
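For comparison, here is a minimal sketch of the same constant-memory idea without any streaming library. It uses only the bytestring package and is not taken from either answer: it reads fixed-size chunks and counts newline bytes, so a 1GB line is never held in memory at once. The names countLines, newlinesIn and chunkSize are made up for this example, and no UTF-8 decoding is done (a newline is always the single byte 10 in UTF-8, so counting bytes is enough for this task).

```haskell
import qualified Data.ByteString as B
import System.IO (Handle, stdin)

-- | Size of each read; 64 KiB is an arbitrary choice.
chunkSize :: Int
chunkSize = 64 * 1024

-- | Number of newline bytes (10) in one chunk.
newlinesIn :: B.ByteString -> Int
newlinesIn = B.count 10

-- | Count newlines on a handle, holding only one chunk at a time.
countLines :: Handle -> IO Int
countLines h = go 0
  where
    go n = do
        chunk <- B.hGetSome h chunkSize
        if B.null chunk
            then return n                       -- empty chunk means end of input
            else go $! n + newlinesIn chunk     -- force the running total

main :: IO ()
main = countLines stdin >>= print
```

Real per-line processing would need to carry state across chunk boundaries, which is exactly the bookkeeping that conduit and pipes take care of.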

Lazy ByteString : memory exploding in certain cases

Below we have two seemingly functionally equivalent programs. For the first the memory remains constant, whereas for the second the memory explodes (using ghc 7.8.2 & bytestring- in Ubuntu 14.04 64-bit):
Non-exploding :
-- ghc -O3 NoExplode.hs
module Main where

import Data.ByteString.Lazy as BL
import Data.ByteString.Lazy.Char8 as BLC

num = 1000000000
bytenull = BLC.pack ""

countDataPoint arg sum
    | arg == bytenull = sum
    | otherwise = countDataPoint (BL.tail arg) (sum+1)

test1 = BL.last $ BL.take num $ BLC.cycle $ BLC.pack "abc"
test2 = countDataPoint (BL.take num $ BLC.cycle $ BLC.pack "abc") 0

main = do
    print test1
    print test2
Exploding :
-- ghc -O3 Explode.hs
module Main where

import Data.ByteString.Lazy as BL
import Data.ByteString.Lazy.Char8 as BLC

num = 1000000000
bytenull = BLC.pack ""

countDataPoint arg sum
    | arg == bytenull = sum
    | otherwise = countDataPoint (BL.tail arg) (sum+1)

longByteStr = BL.take num $ BLC.cycle $ BLC.pack "abc"
test1 = BL.last $ longByteStr
test2 = countDataPoint (BL.take num $ BLC.cycle $ BLC.pack "abc") 0

main = do
    print test1
    print test2
Additional details :
The difference is that in Explode.hs I have taken BL.take num $ BLC.cycle $ BLC.pack "abc" out of the definition of test1 and assigned it to its own value, longByteStr.
Strangely, if we comment out either print test1 or print test2 in Explode.hs (but obviously not both), then the program does not explode.
Is there a reason memory is exploding in Explode.hs and not in NoExplode.hs? And why does the exploding program (Explode.hs) require both print test1 and print test2 in order to explode?
I reproduced it with ghc-7.8.2. It performs common subexpression elimination; you can check the output of -ddump-simpl. So in Explode.hs you are actually creating only one lazy bytestring.
In the first version you create two lazy bytestrings. print test1 forces the first one, but it is garbage collected on the fly because nobody else uses it. The same goes for print test2: it forces the second bytestring, and it is GC'ed on the fly.
In the second version you create one lazy bytestring. print test1 forces it, but it can't be GC'ed because it is still needed for print test2. As a result, after the first print you have the entire bytestring loaded into memory.
If you remove one print, the bytestring is GC'ed on the fly again, because it is not used anywhere else.
PS. "GC'ed on the fly" means: print takes the first chunk and outputs it to stdout. The chunk becomes available for GC. Then print takes the second chunk, etc.
Why does ghc perform common subexpression elimination in one case, but not in the other? Who knows. Maybe the common expressions were killed by inlining. Basically, it depends on the internal implementation.
Regarding -ddump-simpl, see this SO question: Reading GHC Core
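One way to sidestep the sharing problem, sketched here as a suggestion that is not part of the answer above, is to compute both results in a single traversal, so that no second consumer keeps the head of the bytestring alive. The names Acc and lastAndCount are invented for this example, and num is deliberately much smaller than in the question so the program also finishes quickly when interpreted.

```haskell
import qualified Data.ByteString.Lazy as BL
import qualified Data.ByteString.Lazy.Char8 as BLC
import Data.Word (Word8)

-- Strict accumulator: last byte seen so far and number of bytes seen.
data Acc = Acc !Word8 !Int

-- | Return the last byte and the length in one traversal.
-- For an empty input the "last byte" is reported as 0.
lastAndCount :: BL.ByteString -> (Word8, Int)
lastAndCount bs =
    case BL.foldl' step (Acc 0 0) bs of
        Acc w n -> (w, n)
  where
    step (Acc _ n) w = Acc w (n + 1)

main :: IO ()
main = do
    -- Smaller than the question's num; raise it to compare memory use.
    let num = 3000000
        longByteStr = BL.take num $ BL.cycle $ BLC.pack "abc"
    print (lastAndCount longByteStr)
```

Since longByteStr is consumed exactly once, each chunk can be collected as soon as the fold has moved past it, whether or not the compiler shares the expression.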

Fake key presses using XHB

I'm trying to simulate key presses using XHB and XTest, using this example code as a reference. Unfortunately, whatever I do, the resulting program has no effect. No exceptions, no warnings.
Any ideas?
I'm using XHB 0.5.2012.11.23 with GHC 7.4.1 on Ubuntu 12.04.
Here's what I've got so far:
import Control.Monad
import Control.Concurrent
import Graphics.XHB
import Graphics.XHB.Gen.Test

main = do
    Just c <- connect
    let keyCode = 38 -- A
    forever $ do
        fakeInput c $ MkFakeInput (toBit' EventMaskKeyPress) keyCode 0 (fromXid xidNone) 0 0 0
        threadDelay $ 1 * 1000
        fakeInput c $ MkFakeInput (toBit' EventMaskKeyRelease) keyCode 0 (fromXid xidNone) 0 0 0
        threadDelay $ 1 * 1000

toBit' :: (BitEnum a, Integral b) => a -> b
toBit' = fromIntegral . toBit
The issue here is a bit subtle. If you look at the XTest protocol you'll find that the FAKE_EVENT doesn't expect an EVENT_MASK but instead expects a FAKE_EVENT_TYPE. KeyPress is FAKE_EVENT_TYPE 2 whereas KeyRelease is 3. Things work as expected after these values are used in place of EventMaskKeyPress and EventMaskKeyRelease (moreover, you don't need the nasty toBit coercion, which was the smell that pointed me towards this being incorrect).

Haskell read first n lines

I'm trying to learn Haskell to get used to functional programming languages. I've decided to try a few problems at interviewstreet to start out. I'm having trouble reading from stdin, and doing IO in general, with Haskell's lazy IO.
Most of the problems have data coming from stdin in the following form:
n
data line 1
data line 2
data line 3
...
data line n
where n is the number of following lines coming from stdin and the next lines are the data.
How do I run my program on each of the n lines one at a time and return the solution to stdout?
I know the stdin input won't be very large, but I'm asking about evaluating each line one at a time, pretending the input is larger than what can fit in memory, just to learn how to use Haskell.
You can use interact, in conjunction with lines to process data from stdin one line at a time. Here's an example program that uses interact to access stdin, lines to split the data on each newline, a list comprehension to apply the function perLine to each line of the input, and unlines to put the output from perLine back together again.
main = interact processInput
processInput input = unlines [perLine line | line <- lines input]
perLine line = reverse line -- do whatever you want to 'line' here!
You don't need to worry about the size of the data you're getting over stdin; Haskell's laziness ensures that you only keep the parts you're actually working on in memory at any time.
EDIT: if you still want to work on only the first n lines, you can use the take function in the above example, like this:
processInput input = unlines [perLine line | line <- take 10 (lines input)]
This will terminate the program after the first ten lines have been read and processed.
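Since the input described in the question starts with the count n, the two ideas can be combined: read n from the first line and take exactly that many of the following lines. This is a minimal sketch under that assumption; it reuses the names processInput and perLine from above, and read will fail at runtime if the first line is not a number.

```haskell
-- | Treat the first line as the count n, then process only the next n lines.
processInput :: String -> String
processInput input =
    case lines input of
        []              -> ""
        (header : rest) -> unlines (map perLine (take (read header) rest))

-- | Placeholder per-line work, as in the example above.
perLine :: String -> String
perLine = reverse

main :: IO ()
main = interact processInput
```

For example, the input "2\nabc\ndef\nghi\n" produces "cba\nfed\n"; the third data line is never processed.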
You can also use a simple recursion:
getMultipleLines :: Int -> IO [String]
getMultipleLines n
    | n <= 0 = return []
    | otherwise = do
        x <- getLine
        xs <- getMultipleLines (n-1)
        return (x:xs)
And then use it in your main:
main :: IO ()
main = do
    line <- getLine
    let numLines = read line :: Int
    inputs <- getMultipleLines numLines
    mapM_ putStrLn inputs -- placeholder: process the collected lines here
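Note that getMultipleLines collects all n lines in a list before anything is processed. If the goal is to handle one line at a time without keeping the others around, a variant along the following lines may fit better. It is only a sketch: perLine is a placeholder for the real work, and replicateM_ from Control.Monad repeats the read-process-print step n times.

```haskell
import Control.Monad (replicateM_)

-- | Placeholder for the real per-line computation.
perLine :: String -> String
perLine = reverse

main :: IO ()
main = do
    n <- readLn :: IO Int          -- first line: number of data lines
    replicateM_ n $ do
        line <- getLine            -- read exactly one line
        putStrLn (perLine line)    -- emit its result before reading the next
```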

How to get good performance when writing a list of integers from 1 to 10 million to a file?

I want a program that will write the integers from 1 to 10,000,000, one per line, to a file. What's the simplest code one can write and still get decent performance? My intuition is that there is some lack-of-buffering problem. My C code runs at 100 MB/s, whereas, for reference, the Linux command-line utility dd runs at 3 GB/s (corrected from 9 GB/s; sorry for the imprecision, see comments; I'm more interested in the big-picture orders of magnitude, though).
One would think this would be a solved problem by now ... i.e. any modern compiler would make it immediate to write such programs that perform reasonably well ...
C code
#include <stdio.h>

int main(int argc, char **argv) {
    int len = 10000000;
    for (int a = 1; a <= len; a++) {
        printf("%d\n", a);
    }
    return 0;
}
I'm compiling with clang -O3. A performance skeleton which calls putchar('\n') 8 times gets comparable performance.
Haskell code
A naive Haskell implementation runs at 13 MiB/sec, compiling with ghc -O2 -optc-O3 -optc-ffast-math -fllvm -fforce-recomp -funbox-strict-fields. (I haven't recompiled my libraries with -fllvm; perhaps I need to do that.) Code:
import Control.Monad
main = forM [1..10000000 :: Int] $ \j -> putStrLn (show j)
My best stab with Haskell is barely faster, at 17 MiB/sec. The problem is I can't find a good way to convert Vectors into ByteStrings (perhaps there's a solution using iteratees?).
import qualified Data.Vector.Unboxed as V
import Data.Vector.Unboxed (Vector, Unbox, (!))
import qualified System.IO -- needed for the qualified System.IO.putStrLn below

writeVector :: (Unbox a, Show a) => Vector a -> IO ()
writeVector v = V.mapM_ (System.IO.putStrLn . show) v

main = writeVector (V.generate 10000000 id)
It seems that writing ByteString's is fast, as demonstrated by this code, writing an equivalent number of characters,
import Data.ByteString.Char8 as B
main = B.putStrLn (B.replicate 76000000 '\n')
This gets 1.3 GB/s, which isn't as fast as dd, but obviously much better.
Some completely unscientific benchmarking first:
All programmes have been compiled with the default optimisation level (-O3 for gcc, -O2 for GHC) and run with
time ./prog > outfile
As a baseline, the C programme took 1.07s to produce a ~76MB (78888897 bytes) file, roughly 70MB/s throughput.
1. The "naive" Haskell programme (forM [1 .. 10000000] $ \j -> putStrLn (show j)) took 8.64s, about 8.8MB/s.
2. The same with forM_ instead of forM took 5.64s, about 13.5MB/s.
3. The ByteString version from dflemstr's answer took 9.13s, about 8.3MB/s.
4. The Text version from dflemstr's answer took 5.64s, about 13.5MB/s.
5. The Vector version from the question took 5.54s, about 13.7MB/s.
6. main = mapM_ (C.putStrLn . C.pack . show) $ [1 :: Int .. 10000000], where C is Data.ByteString.Char8, took 4.25s, about 17.9MB/s.
7. putStr . unlines . map show $ [1 :: Int .. 10000000] took 3.06s, about 24.8MB/s.
8. A manual loop,
       main = putStr $ go 1
       go :: Int -> String
       go i
           | i > 10000000 = ""
           | otherwise = shows i . showChar '\n' $ go (i+1)
   took 2.32s, about 32.75MB/s.
9. main = putStrLn $ replicate 78888896 'a' took 1.15s, about 66MB/s.
10. main = C.putStrLn $ C.replicate 78888896 'a', where C is Data.ByteString.Char8, took 0.143s, about 530MB/s, roughly the same figures for lazy ByteStrings.
What can we learn from that?
First, don't use forM or mapM unless you really want to collect the results. Performancewise, that sucks.
Then, ByteString output can be very fast (10.), but if the construction of the ByteString to output is slow (3.), you end up with slower code than the naive String output.
What's so terrible about 3.? Well, all the involved Strings are very short. So you get a list of
Chunk "1234567" Empty
and between any two such, a Chunk "\n" Empty is put, then the resulting list is concatenated, which means all these Emptys are tossed away when a ... (Chunk "1234567" (Chunk "\n" (Chunk "1234568" (...)))) is built. That's a lot of wasteful construct-deconstruct-reconstruct going on. Speed comparable to that of the Text and the fixed "naive" String version can be achieved by packing to strict ByteStrings and using fromChunks (and Data.List.intersperse for the newlines). Better performance, slightly better than 6., can be obtained by eliminating the costly singletons. If you glue the newlines to the Strings, using \k -> shows k "\n" instead of show, the concatenation has to deal with half as many slightly longer ByteStrings, which pays off.
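The last suggestion (pack to strict ByteStrings, glue the newline onto each number with shows k "\n", and assemble the result with fromChunks) could be written roughly as follows. This is a reconstruction from the description, not the code that was benchmarked, and the name render is made up here.

```haskell
import qualified Data.ByteString.Char8 as C
import qualified Data.ByteString.Lazy as L

-- | One strict chunk per number, with the newline already attached,
-- so no separate single-character chunks are created.
render :: [Int] -> L.ByteString
render = L.fromChunks . map (\k -> C.pack (shows k "\n"))

main :: IO ()
main = L.putStr (render [1 .. 10000000])
```

Each chunk is built once and never taken apart again, which avoids the construct-deconstruct-reconstruct cycle described above.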
I'm not familiar enough with the internals of either text or vector to offer more than a semi-educated guess concerning the reasons for the observed performance, so I'll leave them out. Suffice it to say that the performance gain is marginal at best compared to the fixed naive String version.
Now, 6. shows that ByteString output is faster than String output, enough that in this case the additional work of packing is more than compensated. However, don't be fooled by that to believe that is always so. If the Strings to pack are long, the packing can take more time than the String output.
But ten million invocations of putStrLn, be it the String or the ByteString version, take a lot of time. It's faster to grab the stdout Handle just once and construct the output String in non-IO code. unlines already does well, but we still suffer from the construction of the list map show [1 .. 10^7]. Unfortunately, the compiler didn't manage to eliminate that (but it eliminated [1 .. 10^7], that's already pretty good). So let's do it ourselves, leading to 8. That's not too terrible, but still takes more than twice as long as the C programme.
One can make a faster Haskell programme by going low-level and directly filling ByteStrings without going through String via show, but I don't know if the C speed is reachable. Anyway, that low-level code isn't very pretty, so I'll spare you what I have, but sometimes one has to get one's hands dirty if speed matters.
Using lazy byte strings gives you some buffering, because the string will be written instantly and more numbers will only be produced as they are needed. This code shows the basic idea (there might be some optimizations that could be made):
import qualified Data.ByteString.Lazy.Char8 as ByteString

main =
    ByteString.putStrLn .
    ByteString.intercalate (ByteString.singleton '\n') .
    map (ByteString.pack . show) $
    ([1..10000000] :: [Int])
I still use Strings for the numbers here, which leads to horrible slowdowns. If we switch to the text library instead of the bytestring library, we get access to "native" show functions for ints, and can do this:
import Data.Monoid
import Data.List
import Data.Text.Lazy.IO as Text
import Data.Text.Lazy.Builder as Text
import Data.Text.Lazy.Builder.Int as Text

main :: IO ()
main =
    Text.putStrLn .
    Text.toLazyText .
    mconcat .
    intersperse (Text.singleton '\n') .
    map Text.decimal $
    ([1..10000000] :: [Int])
I don't know how you are measuring the "speed" of these programs (with the pv tool?) but I imagine that one of these procedures will be the fastest trivial program you can get.
If you are going for maximum performance, then it helps to take a holistic view; i.e., you want to write a function that maps from [Int] to a series of system calls that write chunks of memory to a file.
Lazy bytestrings are a good representation for a sequence of chunks of memory. Mapping a lazy bytestring to a series of system calls that write chunks of memory is what L.hPut is doing (assuming an import qualified Data.ByteString.Lazy as L). Hence, we just need a means to efficiently construct the corresponding lazy bytestring. This is what lazy bytestring builders are good at. With the new bytestring builder (here is the API documentation), the following code does the job.
import qualified Data.ByteString.Lazy as L
import Data.ByteString.Lazy.Builder (toLazyByteString, charUtf8)
import Data.ByteString.Lazy.Builder.ASCII (intDec)
import Data.Foldable (foldMap)
import Data.Monoid (mappend)
import System.IO (openFile, IOMode(..))

main :: IO ()
main = do
    h <- openFile "/dev/null" WriteMode
    L.hPut h $ toLazyByteString $
        foldMap ((charUtf8 '\n' `mappend`) . intDec) [1..10000000]
Note that I output to /dev/null to avoid interference by the disk driver. The effort of moving the data to the OS remains the same. On my machine, the above code runs in 0.45 seconds, which is 12 times faster than the 5.4 seconds of your original code. This implies a throughput of 168 MB/s. We can squeeze out an additional 30% speed (220 MB/s) using bounded encodings.
import qualified Data.ByteString.Lazy.Builder.BasicEncoding as E

L.hPut h $ toLazyByteString $
    E.encodeListWithB
        ((\x -> (x, '\n')) E.>$< E.intDec `E.pairB` E.charUtf8)
        [1..10000000]
Their syntax looks a bit quirky because a BoundedEncoding a specifies the conversion of a Haskell value of type a to a bounded-length sequence of bytes such that the bound can be computed at compile-time. This allows functions such as E.encodeListWithB to perform some additional optimizations for implementing the actual filling of the buffer. See the documentation of Data.ByteString.Lazy.Builder.BasicEncoding in the above link to the API documentation (phew, stupid hyperlink limit for new users) for more information.
Here is the source of all my benchmarks.
The conclusion is that we can get very good performance from a declarative solution provided that we understand the cost model of our implementation and use the right datastructures. Whenever constructing a packed sequence of values (e.g., a sequence of bytes represented as a bytestring), then the right datastructure to use is a bytestring Builder.
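In current releases of bytestring the builder lives in Data.ByteString.Builder rather than in the Data.ByteString.Lazy.Builder modules used above. The following is a minimal sketch of the same idea against that module, assuming a reasonably recent GHC and bytestring; it writes to stdout instead of /dev/null, puts the newline after each number, and the names numbers and renderNumbers are invented for this example. No timings are claimed for it.

```haskell
import qualified Data.ByteString.Builder as B
import qualified Data.ByteString.Lazy as L
import System.IO (BufferMode (..), hSetBinaryMode, hSetBuffering, stdout)

-- | Builder for "1\n2\n...\nn\n". char7 emits a single ASCII byte.
numbers :: Int -> B.Builder
numbers n = foldMap (\i -> B.intDec i <> B.char7 '\n') [1 .. n]

-- | Materialise the output as a lazy ByteString; handy for small inputs.
renderNumbers :: Int -> L.ByteString
renderNumbers = B.toLazyByteString . numbers

main :: IO ()
main = do
    -- Binary mode plus block buffering is the setup that the
    -- hPutBuilder documentation recommends for the target handle.
    hSetBinaryMode stdout True
    hSetBuffering stdout (BlockBuffering Nothing)
    B.hPutBuilder stdout (numbers 10000000)
```

Run it as ./prog > outfile to compare against the other versions; hPutBuilder fills the handle's buffer directly, so no intermediate lazy ByteString is built on that path.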
