How to show progress in Shake?

How to show progress in Shake? - haskell

I am trying to figure out how can i take the progress info from a Progress type (in Development.Shake.Progress) to output it before executing a command. The possible desired output would be:
[1/9] Compiling src/Window/Window.cpp
[2/9] Compiling src/Window/GlfwError.cpp
[3/9] Compiling src/Window/GlfwContext.cpp
[4/9] Compiling src/Util/MemTrack.cpp
...
For now i am simulating this using some IORef that keeps the total (initially set to the sum of the source files) and a count that i increase before executing each build command, but this seems like a hackish solution to me.
On top of that this solution seems to work correctly on clean builds, but misbehaves on partial builds as the sum that displayed is still the total of all the source files.
With access to a Progress data type i will be able to calculate this fraction correctly using its countSkipped, countBuild, and countTodo members (see Progress.hs:53), but i am still not sure how i can i achieve this.
Any help is appreciated.

Values of type Progress are currently only available as an argument to the function stored in shakeProgress. You can obtain the Progress whenever you want with:
{-# LANGUAGE RecordWildCards #-}
import Development.Shake
import Data.IORef
import Data.Monoid
import Control.Monad
main = do
ref <- newIORef $ return mempty
shakeArgs shakeOptions{shakeProgress = writeIORef ref} $ do
want ["test" ++ show i | i <- [1..5]]
"test*" %> \out -> do
Progress{..} <- liftIO $ join $ readIORef ref
putNormal $
"[" ++ show (countBuilt + countSkipped + 1) ++
"/" ++ show (countBuilt + countSkipped + countTodo) ++
"] " ++ out
writeFile' out ""
Here we create an IORef to squirrel away the argument passed to shakeProgress, then retrieve it later when running the rules. Running the above code I see:
[1/5] test5
[2/5] test4
[3/5] test3
[4/5] test2
[5/5] test1
Running at a higher level of parallelism gives less precise results - initially there are only 3 items in todo (Shake increments countTodo as it finds items todo, and spawns items as soon as it knows about any of them), and there are often two rules running at the same index (there is no information about how many are in progress). Given knowledge of your specific rules, you could refine the output, e.g. storing an IORef you increment to ensure the index was monotonic.
The reason this code is somewhat convoluted is that the Progress information was intended to be used for asynchronous progress messages, although your approach seems perfectly valid. It may be worth introducing a getProgress :: Action Progress function for synchronous progress messages.

Related

Haskell: initialise a list with symbols?

In Haskell, is there a way of initialising a list and declaring symbols in that list at the same time?
Currently I do this:
import Data.List
main = do
let lambda = "\x03BB"
xi = "\x926"
bol = "\x1D539"
cohomology_algebra = [ lambda, bol, xi]
putStrLn $ xi
putStrLn $ show cohomology_algebra
However I have a long list of symbols and I worry that i forget to put them all in the list (it has happened)
Ideally I would do something like:
main = do
let cohomology_algebra = [ lambda = "\x03BB", bol = "\x1D539", xi= "\x926"] -- does not compile
putStrLn $ show cohomology_algebra
Is there a way around this?

Not a perfect solution, but you could use
let cohomology_algebra#[lambda, bol, xi] = ["\x03BB", "\x926", "\x1D539"]
This will trigger a runtime error if the two lists above have different length (at the point where the names are demanded).
It's not optimal, since this check should be at compile time instead. Further, in this code style we have to separate the identifier form its value too much, making it possible to swap some definitions by mistake.

Is it possible to recover from an erroneous eval in hint?

I am trying to use hint package from hackage to create a simple environment where user can issue lines of code for evaluation (like in ghci). I expect some of the input lines to be erroneous (eval would end the session with an error). How can I create a robust session that ignores erroneous input (or better: it reports an error but can accept other input) and keeps the previously consistent state?
Also, I would like to use it in do style, i.e. let a = 3 as standalone input line makes sense.
To clarify things: I have no problem with a single eval. What I would like to do, is allow continuing evaluation even after some step failed. Also I would like to incrementally extend a monadic chain (as you do in ghci I guess).
In other words: I want something like this, except that I get to evaluate 3 and don't stop at undefined with the error.
runInterpreter $ setImports [ "Prelude" ] >> eval "undefined" >> eval "3"
More specifically I would like something like this to be possible:
runInterpreter $ setImports ... >> eval' "let a = (1, 2)" -- modifying context
>> typeOf "b" -- error but not breaking the chain
>> typeOf "a" -- (Num a, Num b) => (a, b)
I don't expect it to work this straightforwardly, this is just to show the idea. I basically would like to build up some context (as you do in ghci) and every addition to the context would modify it only if there is no failure, failures could be logged or explicitly retrieved after each attempt to modify the context.

You didn't show any code so I don't know the problem. The most straight-forward way I use hint handles errors fine:
import Language.Haskell.Interpreter
let doEval s = runInterpreter $ setImports ["Prelude"] >> eval s
has resulted in fine output for me...
Prelude Language.Haskell.Interpreter> doEval "1 + 2"
Right "3"
Prelude Language.Haskell.Interpreter> doEval "1 + 'c'"
ghc: panic! (the 'impossible' happened)
(GHC version 7.10.2 for x86_64-apple-darwin):
nameModule doEval_a43r
... Except that now the impossible happens... that's a bug. Notice you are supposed to get Left someError in cases like these:
data InterpreterError
= UnknownError String
| WontCompile [GhcError]
| NotAllowed String
| GhcException String
-- Defined in ‘hint-0.4.2.3:Hint.Base’
Have you looked through the ghchq bug list and/or submitted an issue?
EDIT:
And the correct functionality is back, at least as of GHC 7.10.3 x64 on OS X with hint version 0.4.2.3. In other words, it appears the bug went away from 7.10.2 to 7.10.3
The output is:
Left (WontCompile [GhcError {errMsg = ":3:3:\n No instance for (Num Char) arising from a use of \8216+\8217\n In the expression: 1 + 'c'\n In an equation for \8216e_11\8217: e_11 = 1 + 'c'\n In the first argument of \8216show_M439719814875238119360034\8217, namely\n \8216(let e_11 = 1 + 'c' in e_11)\8217"}])
Though executing the doEval line twice in GHCi does cause a panic, things seem to work once in the interpreter and properly regardless when compiled.

Order of execution within monads

I was learning how to use the State monad and I noticed some odd behavior in terms of the order of execution. Removing the distracting bits that involve using the actual state, say I have the following code:
import Control.Monad
import Control.Monad.State
import Debug.Trace
mainAction :: State Int ()
mainAction = do
traceM "Starting the main action"
forM [0..2] (\i -> do
traceM $ "i is " ++ show i
forM [0..2] (\j -> do
traceM $ "j is " ++ show j
someSubaction i j
)
)
Running runState mainAction 1 in ghci produces the following output:
j is 2
j is 1
j is 0
i is 2
j is 2
j is 1
j is 0
i is 1
j is 2
j is 1
j is 0
i is 0
Outside for loop
which seems like the reverse order of execution of what might be expected. I thought that maybe this is a quirk of forM and tried it with sequence which specifically states that it runs its computation sequentially from left to right like so:
mainAction :: State Int ()
mainAction = do
traceM "Outside for loop"
sequence $ map handleI [0..2]
return ()
where
handleI i = do
traceM $ "i is " ++ show i
sequence $ map (handleJ i) [0..2]
handleJ i j = do
traceM $ "j is " ++ show j
someSubaction i j
However, the sequence version produces the same output. What is the actual logic in terms of the order of execution that is happening here?

Haskell is lazy, which means things are not executed immediately. Things are executed whenever their result is needed – but no sooner. Sometimes code isn't executed at all if its result isn't needed.
If you stick a bunch of trace calls in a pure function, you will see this laziness happening. The first thing that is needed will be executed first, so that's the trace call you see first.
When something says "the computation is run from left to right" what it means is that the result will be the same as if the computation was run from left to right. What actually happens under the hood might be very different.
This is in fact why it's a bad idea to do I/O inside pure functions. As you have discovered, you get "weird" results because the execution order can be pretty much anything that produces the correct result.
Why is this a good idea? When the language doesn't enforce a specific execution order (such as the traditional "top to bottom" order seen in imperative languages) the compiler is free to do a tonne of optimisations, such as for example not executing some code at all because its result isn't needed.
I would recommend you to not think too much about execution order in Haskell. There should be no reason to. Leave that up to the compiler. Think instead about which values you want. Does the function give the correct value? Then it works, regardless of which order it executes things in.

I thought that maybe this is a quirk of forM and tried it with sequence which specifically states that it runs its computation sequentially from left to right like so: [...]
You need to learn to make the following, tricky distinction:
The order of evaluation
The order of effects (a.k.a. "actions")
What forM, sequence and similar functions promise is that the effects will be ordered from left to right. So for example, the following is guaranteed to print characters in the same order that they occur in the string:
putStrLn :: String -> IO ()
putStrLn str = forM_ str putChar >> putChar '\n'
But that doesn't mean that expressions are evaluated in this left-to-right order. The program has to evaluate enough of the expressions to figure out what the next action is, but that often does not require evaluating everything in every expression involved in earlier actions.
Your example uses the State monad, which bottoms out to pure code, so that accentuates the order issues. The only thing that a traversal functions such as forM promises in this case is that gets inside the actions mapped to the list elements will see the effect of puts for elements to their left in the list.

can xmonad's logHook be run at set intervals rather than in (merely) response to layout events?

I'm using dynamicLogWithPP from XMonad.Hooks.DynamicLog together with dzen2 as a status bar under xmonad. One of the things I'd like to have displayed in the bar is the time remaining in the currently playing track in audacious (if any). Getting this information is easy:
audStatus :: Player -> X (Maybe String)
audStatus p = do
info <- liftIO $ tryS $ withPlayer p $ do
ispaused <- paused
md <- getMetadataString
timeleftmillis <- (-) <$> (getCurrentTrack >>= songFrames) <*> time
let artist = md ! "artist"
title = md ! "title"
timeleft = timeleftmillis `quot` 1000
(minutes, seconds) = timeleft `quotRem` 60
disp = artist ++ " - " ++ title ++ " (-"++(show minutes)++":"++(show seconds)++")" -- will be wrong if seconds < 10
audcolor False = dzenColor base0 base03
audcolor True = dzenColor base1 base02
return $ wrap "^ca(1, pms p)" "^ca()" (audcolor ispaused disp)
return $ either (const Nothing) Just info
So I can stick that in ppExtras and it works fine—except it only gets run when the logHook gets run, and that happens only when a suitable event comes down the pike. So the display is potentially static for a long time, until I (e.g.) switch workspaces.
It seems like some people just run two dzen bars, with one getting output piped in from a shell script. Is that the only way to have regular updates? Or can this be done from within xmonad (without getting too crazy/hacky)?
ETA: I tried this, which seems as if it should work better than it does:
create a TChan for updates from XMonad, and another for updates from a function polling Audacious;
set the ppOutput field in the PP structure from DynamicLog to write to the first TChan;
fork the audacious-polling function and have it write to the second TChan;
fork a function to read from both TChans (checking that they aren't empty, first), and combining the output.
Updates from XMonad are read from the channel and processed in a timely fashion, but updates from Audacious are hardly registered at all—every five or so seconds at best. It seems as if some approach along these lines ought to work, though.

I know this is an old question, but I came here looking for an answer to this a few days ago, and I thought I'd share the way I solved it. You actually can do it entirely from xmonad. It's a tiny bit hacky, but I think it's much nicer than any of the alternatives I've come across.
Basically, I used the XMonad.Util.Timer library, which will send an X event after a specified time period (in this case, one second). Then I just wrote an event hook for it, which starts the timer again, and then manually runs the log hook.
I also had to use the XMonad.Util.ExtensibleState library, because Timer uses an id variable to make sure it's responding to the right event, so I have to store that variable between events.
Here's my code:
{-# LANGUAGE DeriveDataTypeable #-}
import qualified XMonad.Util.ExtensibleState as XS
import XMonad.Util.Timer
...
-- wrapper for the Timer id, so it can be stored as custom mutable state
data TidState = TID TimerId deriving Typeable
instance ExtensionClass TidState where
initialValue = TID 0
...
-- put this in your startupHook
-- start the initial timer, store its id
clockStartupHook = startTimer 1 >>= XS.put . TID
-- put this in your handleEventHook
clockEventHook e = do -- e is the event we've hooked
(TID t) <- XS.get -- get the recent Timer id
handleTimer t e $ do -- run the following if e matches the id
startTimer 1 >>= XS.put . TID -- restart the timer, store the new id
ask >>= logHook.config -- get the loghook and run it
return Nothing -- return required type
return $ All True -- return required type
Pretty straightforward. I hope this is helpful to someone.

It cannot be done from within xmonad; xmonad's current threading model is a bit lacking (and so is dzen's). However, you can start a separate process that periodically polls your music player and then use one of the dzen multiplexers (e.g. dmplex) to combine the output from the two processes.
You may also want to look into xmobar and taffybar, which both have better threading stories than dzen does.
With regards to why your proposed TChan solution doesn't work properly, you might want to read the sections "Conventions", "Foreign Imports", and "The Non-Threaded Runtime" at my crash course on the FFI and gtk, keeping in mind that xmonad currently uses GHC's non-threaded runtime. The short answer is that xmonad's main loop makes an FFI call to Xlib that waits for an X event; this call blocks all other Haskell threads from running until it returns.

How to get good performance when writing a list of integers from 1 to 10 million to a file?

question
I want a program that will write a sequence like,
1
...
10000000
to a file. What's the simplest code one can write, and get decent performance? My intuition is that there is some lack-of-buffering problem. My C code runs at 100 MB/s, whereas by reference the Linux command line utility dd runs at 9 GB/s 3 GB/s (sorry for the imprecision, see comments -- I'm more interested in the big picture orders-of-magnitude though).
One would think this would be a solved problem by now ... i.e. any modern compiler would make it immediate to write such programs that perform reasonably well ...
C code
#include <stdio.h>
int main(int argc, char **argv) {
int len = 10000000;
for (int a = 1; a <= len; a++) {
printf ("%d\n", a);
}
return 0;
}
I'm compiling with clang -O3. A performance skeleton which calls putchar('\n') 8 times gets comparable performance.
Haskell code
A naiive Haskell implementation runs at 13 MiB/sec, compiling with ghc -O2 -optc-O3 -optc-ffast-math -fllvm -fforce-recomp -funbox-strict-fields. (I haven't recompiled my libraries with -fllvm, perhaps I need to do that.) Code:
import Control.Monad
main = forM [1..10000000 :: Int] $ \j -> putStrLn (show j)
My best stab with Haskell runs even slower, at 17 MiB/sec. The problem is I can't find a good way to convert Vector's into ByteString's (perhaps there's a solution using iteratees?).
import qualified Data.Vector.Unboxed as V
import Data.Vector.Unboxed (Vector, Unbox, (!))
writeVector :: (Unbox a, Show a) => Vector a -> IO ()
writeVector v = V.mapM_ (System.IO.putStrLn . show) v
main = writeVector (V.generate 10000000 id)
It seems that writing ByteString's is fast, as demonstrated by this code, writing an equivalent number of characters,
import Data.ByteString.Char8 as B
main = B.putStrLn (B.replicate 76000000 '\n')
This gets 1.3 GB/s, which isn't as fast as dd, but obviously much better.

Some completely unscientific benchmarking first:
All programmes have been compiled with the default optimisation level (-O3 for gcc, -O2 for GHC) and run with
time ./prog > outfile
As a baseline, the C programme took 1.07s to produce a ~76MB (78888897 bytes) file, roughly 70MB/s throughput.
The "naive" Haskell programme (forM [1 .. 10000000] $ \j -> putStrLn (show j)) took 8.64s, about 8.8MB/s.
The same with forM_ instead of forM took 5.64s, about 13.5MB/s.
The ByteString version from dflemstr's answer took 9.13s, about 8.3MB/s.
The Text version from dflemstr's answer took 5.64s, about 13.5MB/s.
The Vector version from the question took 5.54s, about 13.7MB/s.
main = mapM_ (C.putStrLn . C.pack . show) $ [1 :: Int .. 10000000], where C is Data.ByteString.Char8, took 4.25s, about 17.9MB/s.
putStr . unlines . map show $ [1 :: Int .. 10000000] took 3.06s, about 24.8MB/s.
A manual loop,
main = putStr $ go 1
where
go :: Int -> String
go i
| i > 10000000 = ""
| otherwise = shows i . showChar '\n' $ go (i+1)
took 2.32s, about 32.75MB/s.
main = putStrLn $ replicate 78888896 'a' took 1.15s, about 66MB/s.
main = C.putStrLn $ C.replicate 78888896 'a' where C is Data.ByteString.Char8, took 0.143s, about 530MB/s, roughly the same figures for lazy ByteStrings.
What can we learn from that?
First, don't use forM or mapM unless you really want to collect the results. Performancewise, that sucks.
Then, ByteString output can be very fast (10.), but if the construction of the ByteString to output is slow (3.), you end up with slower code than the naive String output.
What's so terrible about 3.? Well, all the involved Strings are very short. So you get a list of
Chunk "1234567" Empty
and between any two such, a Chunk "\n" Empty is put, then the resulting list is concatenated, which means all these Emptys are tossed away when a ... (Chunk "1234567" (Chunk "\n" (Chunk "1234568" (...)))) is built. That's a lot of wasteful construct-deconstruct-reconstruct going on. Speed comparable to that of the Text and the fixed "naive" String version can be achieved by packing to strict ByteStrings and using fromChunks (and Data.List.intersperse for the newlines). Better performance, slightly better than 6., can be obtained by eliminating the costly singletons. If you glue the newlines to the Strings, using \k -> shows k "\n" instead of show, the concatenation has to deal with half as many slightly longer ByteStrings, which pays off.
I'm not familiar enough with the internals of either text or vector to offer more than a semi-educated guess concerning the reasons for the observed performance, so I'll leave them out. Suffice it to say that the performance gain is marginal at best compared to the fixed naive String version.
Now, 6. shows that ByteString output is faster than String output, enough that in this case the additional work of packing is more than compensated. However, don't be fooled by that to believe that is always so. If the Strings to pack are long, the packing can take more time than the String output.
But ten million invocations of putStrLn, be it the String or the ByteString version, take a lot of time. It's faster to grab the stdout Handle just once and construct the output String in non-IO code. unlines already does well, but we still suffer from the construction of the list map show [1 .. 10^7]. Unfortunately, the compiler didn't manage to eliminate that (but it eliminated [1 .. 10^7], that's already pretty good). So let's do it ourselves, leading to 8. That's not too terrible, but still takes more than twice as long as the C programme.
One can make a faster Haskell programme by going low-level and directly filling ByteStrings without going through String via show, but I don't know if the C speed is reachable. Anyway, that low-level code isn't very pretty, so I'll spare you what I have, but sometimes one has to get one's hands dirty if speed matters.

Using lazy byte strings gives you some buffering, because the string will be written instantly and more numbers will only be produced as they are needed. This code shows the basic idea (there might be some optimizations that could be made):
import qualified Data.ByteString.Lazy.Char8 as ByteString
main =
ByteString.putStrLn .
ByteString.intercalate (ByteString.singleton '\n') .
map (ByteString.pack . show) $
([1..10000000] :: [Int])
I still use Strings for the numbers here, which leads to horrible slowdowns. If we switch to the text library instead of the bytestring library, we get access to "native" show functions for ints, and can do this:
import Data.Monoid
import Data.List
import Data.Text.Lazy.IO as Text
import Data.Text.Lazy.Builder as Text
import Data.Text.Lazy.Builder.Int as Text
main :: IO ()
main =
Text.putStrLn .
Text.toLazyText .
mconcat .
intersperse (Text.singleton '\n') .
map Text.decimal $
([1..10000000] :: [Int])
I don't know how you are measuring the "speed" of these programs (with the pv tool?) but I imagine that one of these procedures will be the fastest trivial program you can get.

If you are going for maximum performance, then it helps to take a holistic view; i.e., you want to write a function that maps from [Int] to series of system calls that write chunks of memory to a file.
Lazy bytestrings are good representation for a sequence of chunks of memory. Mapping a lazy bytestring to a series of systems calls that write chunks of memory is what L.hPut is doing (assuming an import qualified Data.ByteString.Lazy as L). Hence, we just need a means to efficiently construct the corresponding lazy bytestring. This is what lazy bytestring builders are good at. With the new bytestring builder (here is the API documentation), the following code does the job.
import qualified Data.ByteString.Lazy as L
import Data.ByteString.Lazy.Builder (toLazyByteString, charUtf8)
import Data.ByteString.Lazy.Builder.ASCII (intDec)
import Data.Foldable (foldMap)
import Data.Monoid (mappend)
import System.IO (openFile, IOMode(..))
main :: IO ()
main = do
h <- openFile "/dev/null" WriteMode
L.hPut h $ toLazyByteString $
foldMap ((charUtf8 '\n' `mappend`) . intDec) [1..10000000]
Note that I output to /dev/null to avoid interference by the disk driver. The effort of moving the data to the OS remains the same. On my machine, the above code runs in 0.45 seconds, which is 12 times faster than the 5.4 seconds of your original code. This implies a throughput of 168 MB/s. We can squeeze out an additional 30% speed (220 MB/s) using bounded encodings].
import qualified Data.ByteString.Lazy.Builder.BasicEncoding as E
L.hPut h $ toLazyByteString $
E.encodeListWithB
((\x -> (x, '\n')) E.>$< E.intDec `E.pairB` E.charUtf8)
[1..10000000]
Their syntax looks a bit quirky because a BoundedEncoding a specifies the conversion of a Haskell value of type a to a bounded-length sequence of bytes such that the bound can be computed at compile-time. This allows functions such as E.encodeListWithB to perform some additional optimizations for implementing the actual filling of the buffer. See the the documentation of Data.ByteString.Lazy.Builder.BasicEncoding in the above link to the API documentation (phew, stupid hyperlink limit for new users) for more information.
Here is the source of all my benchmarks.
The conclusion is that we can get very good performance from a declarative solution provided that we understand the cost model of our implementation and use the right datastructures. Whenever constructing a packed sequence of values (e.g., a sequence of bytes represented as a bytestring), then the right datastructure to use is a bytestring Builder.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string