Haskell string formatting as in C

Is it possible to format strings for output the way you can in C? I mean:
printf("%.2f", number);
Is it possible to do the same formatting in Haskell?

You can use the Text.Printf module, which is part of the base package, so it is (normally) already installed. This module is documented as:
A C printf(3)-like formatter. This version has been extended by Bart Massey as per the recommendations of John Meacham and Simon Marlow.
We can make use of the printf function, for example:
Prelude> import Text.Printf
Prelude Text.Printf> number = 3.1415926
Prelude Text.Printf> printf "%.2f\n" number
3.14
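For a compiled program, here is a minimal sketch along the same lines. Besides printing, printf can also return a String when that is the type the context expects, and Numeric.showFFloat from base gives the same fixed-point rendering without printf. The names formatTwo and formatTwo' are only illustrative.

```haskell
import Numeric (showFFloat)
import Text.Printf (printf)

-- printf produces a String here because that is the declared result type.
formatTwo :: Double -> String
formatTwo = printf "%.2f"

-- The same fixed-point rendering via Numeric.showFFloat (no printf involved).
formatTwo' :: Double -> String
formatTwo' x = showFFloat (Just 2) x ""

main :: IO ()
main = do
  let number = 3.1415926 :: Double
  printf "%.2f\n" number          -- printf used directly as an IO action
  putStrLn (formatTwo number)
  putStrLn (formatTwo' number)
```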

Related

Calculate a simple math expression passed as a command-line argument

I'm a newbie in Haskell and I'm lost. I was trying to parse a math expression, but I don't really understand how Haskell programming works yet. What I'm trying to write is a program that evaluates a simple math expression, and I'm looking for ideas on how to do that by passing it as an argument.
The command line could look like: ./math "3 + 2" or ./math "5 * 8"
My code looks like this:
import System.Environment (getArgs)
import Text.Printf
main :: IO ()
main = do
  args <- getArgs
  printf "%.2f" args[1] + args[2]
Haskell has no array[index] syntax. It does have list!!index syntax (which isn't really special syntax at all: !! is just an infix function defined in the Prelude). Note that Haskell indices are 0-based and, unlike in Bash, the zeroth argument is not the command name itself, so you probably want indices 0 and 1.
Also, in Haskell, function application binds more tightly than any operator. So, if you were to write
printf "%.2f" args!!0 + args!!1
it would parse as ((printf "%.2f" args)!!0) + (args!!1), which is obviously not right. You need to make explicit what precedence you want:
printf "%.2f" (args!!0 + args!!1)
or as we like to do it, with $ instead of parens:
printf "%.2f" $ args!!0 + args!!1
That's still not right, because the arguments come in as strings, but the addition should be performed on numbers. For this, you need to read the numbers; I'd suggest you do that separately:
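import System.Environment (getArgs)
import Text.Printf (printf)
import Text.Read (readMaybe)
main :: IO ()
main = do
  args <- getArgs
  let a, b :: Double
      -- these patterns fail at runtime on missing or non-numeric arguments
      Just a = readMaybe $ args!!0
      Just b = readMaybe $ args!!1
  printf "%.2f\n" $ a + b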
$ runhaskell Argsmath.hs 3 2
5.00
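A sketch of a sturdier variant (my own, not part of the answer): matching on the argument list instead of using !!, and keeping readMaybe's Maybe instead of the partial Just a pattern, gives a usage message rather than a crash on missing or malformed arguments. parseArgs is a hypothetical helper name.

```haskell
import System.Environment (getArgs)
import Text.Printf (printf)
import Text.Read (readMaybe)

-- Exactly two arguments, both readable as Double; Nothing otherwise.
parseArgs :: [String] -> Maybe (Double, Double)
parseArgs [x, y] = (,) <$> readMaybe x <*> readMaybe y
parseArgs _      = Nothing

main :: IO ()
main = do
  args <- getArgs
  case parseArgs args of
    Just (a, b) -> printf "%.2f\n" (a + b)
    Nothing     -> putStrLn "usage: Argsmath NUMBER NUMBER"
```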
Of course, this will not allow you to do stuff like ./math "5 * 8", because you have no means of parsing the *. For that, something read-based would be awkward; I suggest you check out parser combinator libraries. There are plenty of tutorials around; this one seems to be nice and simple.
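To give an idea of what a parser-combinator solution can look like, here is a minimal sketch using Text.ParserCombinators.ReadP, which ships with base (it is not necessarily the library or tutorial the answer had in mind). It handles exactly one binary operator (+, -, * or /) between two decimal numbers; precedence and longer expressions are left out. number, operator, expr and evaluate are illustrative names.

```haskell
import Data.Char (isDigit)
import System.Environment (getArgs)
import Text.ParserCombinators.ReadP
import Text.Printf (printf)

-- A decimal number such as 3, -2 or 4.5.
number :: ReadP Double
number = do
  sign <- option "" (string "-")
  intPart <- munch1 isDigit
  fracPart <- option "" ((:) <$> char '.' <*> munch1 isDigit)
  return (read (sign ++ intPart ++ fracPart))

-- One of the four arithmetic operators.
operator :: ReadP (Double -> Double -> Double)
operator = choice
  [ (+) <$ char '+'
  , (-) <$ char '-'
  , (*) <$ char '*'
  , (/) <$ char '/'
  ]

-- "number operator number" with optional spaces, consuming all input.
expr :: ReadP Double
expr = do
  skipSpaces
  a <- number
  skipSpaces
  op <- operator
  skipSpaces
  b <- number
  skipSpaces
  eof
  return (op a b)

-- Nothing if the input does not have exactly that shape.
evaluate :: String -> Maybe Double
evaluate s = case readP_to_S expr s of
  [(v, "")] -> Just v
  _         -> Nothing

main :: IO ()
main = do
  args <- getArgs
  case args of
    [input] -> maybe (putStrLn "could not parse expression") (printf "%.2f\n") (evaluate input)
    _       -> putStrLn "usage: math \"3 + 2\""
```

With this, ./math "5 * 8" would print 40.00.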

Data.ByteString.Lazy.Internal.ByteString to string?

Trying to write a module which returns the external IP address of my computer.
Using the Network.Wreq get function and then applying a lens to obtain responseBody, the type I end up with is Data.ByteString.Lazy.Internal.ByteString. Since I want to filter out the trailing "\n" of the result body, I want to run a regular expression on it afterwards.
Problem: that seemingly very specific ByteString type is not accepted by the regex library, and I found no way to convert it to a String.
Here is my feeble attempt so far (not compiling).
{-# LANGUAGE OverloadedStrings #-}
module ExtIp (getExtIp) where
import Network.Wreq
import Control.Lens
import Data.BytesString.Lazy
import Text.Regex.Posix
getExtIp :: IO String
getExtIp = do
  r <- get "http://myexternalip.com/raw"
  let body = r ^. responseBody
  let addr = body =~ "[^\n]*\n"
  return (addr)
So my question is obviously: how do I convert that funny special ByteString to a String? An explanation of how I could approach such a problem myself is also appreciated. I tried to use unpack and toString but have no idea what to import to get those functions, if they exist.
As a very sporadic Haskell user, I also wonder if someone could show me the idiomatic Haskell way of defining such a function. The version I show here does not account for possible runtime errors/exceptions, after all.
Short answer: Use unpack from Data.ByteString.Lazy.Char8
Longer answer:
In general when you want to convert a ByteString (of any variety) to a String or Text you have to specify an encoding - e.g. UTF-8 or Latin1, etc.
When retrieving an HTML page, the encoding you are supposed to use may appear in the Content-Type header or in the response body itself as a <meta ...> tag.
Alternatively you can just guess at what the encoding of the body is.
In your case I presume you are accessing a site like http://whatsmyip.org and you only need to parse out your IP address. So without examining the headers or looking through the HTML, a safe encoding to use would be Latin1.
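To convert ByteStrings to Text via an encoding, have a look at the functions in Data.Text.Encoding (for strict ByteStrings) or Data.Text.Lazy.Encoding (for lazy ones, such as the response body here).
For instance, the decodeLatin1 function.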
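A minimal sketch of the decoding route, assuming the text and bytestring packages (both ship with GHC) and using a literal lazy ByteString as a stand-in for r ^. responseBody, so no network access is needed. Latin-1 decoding cannot fail, which is why it is a safe default for an ASCII-only body like this; the address is a made-up documentation example and the function name is illustrative.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString.Lazy as BL
import qualified Data.Text.Lazy as TL
import qualified Data.Text.Lazy.Encoding as TLE

-- Stand-in for r ^. responseBody: a lazy ByteString with a trailing newline.
body :: BL.ByteString
body = "203.0.113.7\n"

-- Decode as Latin-1, strip surrounding whitespace, convert to String.
bodyToString :: BL.ByteString -> String
bodyToString = TL.unpack . TL.strip . TLE.decodeLatin1

main :: IO ()
main = putStrLn (bodyToString body)
```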
I simply do not understand why you insist on using Strings when you already have a ByteString at hand, which is the faster, more efficient representation.
Importing a regex library gives you almost no benefit here; for parsing an IP address I would use attoparsec, which works great with ByteStrings.
Here is a version that does not use regex but returns a String. Note that I did not compile it, as I have no Haskell setup where I am right now.
{-# LANGUAGE OverloadedStrings #-}
module ExtIp (getExtIp) where
import Network.Wreq (get, responseBody)
import Control.Lens ((^.))
-- qualified, so names like reverse and dropWhile don't clash with the Prelude
import qualified Data.ByteString.Lazy.Char8 as Char8
import Data.Char (isSpace)
getExtIp :: IO String
getExtIp = do
  r <- get "http://myexternalip.com/raw"
  return $ Char8.unpack $ trim (r ^. responseBody)
  where
    -- drop leading and trailing whitespace, including the trailing "\n"
    trim = Char8.reverse . Char8.dropWhile isSpace . Char8.reverse . Char8.dropWhile isSpace
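The attoparsec suggestion could look roughly like this. It is a sketch assuming the attoparsec package is installed; it parses an IPv4 address, with optional surrounding whitespace so the trailing newline is handled, into four octets. Since parseOnly works on strict ByteStrings, the lazy body is converted with toStrict first, and the Either result gives a place to handle a malformed response instead of failing at runtime. octet, ipv4 and parseIp are illustrative names.

```haskell
import Data.Attoparsec.ByteString.Char8 (Parser, char, decimal, endOfInput, parseOnly, skipSpace)
import qualified Data.ByteString.Lazy as BL
import qualified Data.ByteString.Lazy.Char8 as BLC

-- One octet: a decimal number in 0..255.
octet :: Parser Int
octet = do
  n <- decimal
  if n <= 255 then return n else fail "octet out of range"

-- Four dot-separated octets, optionally surrounded by whitespace.
ipv4 :: Parser (Int, Int, Int, Int)
ipv4 = do
  skipSpace
  a <- octet <* char '.'
  b <- octet <* char '.'
  c <- octet <* char '.'
  d <- octet
  skipSpace
  endOfInput
  return (a, b, c, d)

-- parseOnly expects a strict ByteString, so convert the lazy body first.
parseIp :: BL.ByteString -> Either String (Int, Int, Int, Int)
parseIp = parseOnly ipv4 . BL.toStrict

main :: IO ()
main = do
  print (parseIp (BLC.pack "203.0.113.7\n"))
  print (parseIp (BLC.pack "300.1.1.1"))
```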

How to get a String from a Lazy.Builder?

I need to manipulate the binary encoding (as '0's and '1's) of simple strings given as input, using 7-bit ASCII.
For the encoding I have used the function Data.ByteString.Lazy.Builder.string7 :: String -> Builder
However, I have not found a way to convert the resulting Builder back into a string of '0's and '1's. Is it possible? Is there another way?
Subsidiary question: what if I wanted it in hexadecimal form, as text?
There's an unpackChars function in Data.ByteString.Lazy.Internal. There's also a non-lazy counterpart in Data.ByteString.Internal.
import qualified Data.ByteString.Lazy.Builder as Build
import qualified Data.ByteString.Lazy as BS
import qualified Data.ByteString.Lazy.Internal as BSI
--> BSI.unpackChars $ Build.toLazyByteString $ Build.string7 "010101"
--"010101"
You can also use map (chr . fromIntegral) . BS.unpack instead of unpackChars, but unpackChars is probably faster.
Alternatively, as Michael Snoyman commented below, you could use Data.ByteString.Char8 or its lazy version and you'll get the right conversions to begin with.
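The answer above recovers the original characters, while the question asks for the '0'/'1' rendering and a hexadecimal one. A sketch of both, assuming the Data.ByteString.Builder module that superseded Data.ByteString.Lazy.Builder in later bytestring versions: each byte is rendered as exactly 7 bits, most significant first, and lazyByteStringHex gives two hex digits per byte. The map (chr . fromIntegral) conversion mentioned above appears as decodeBack. All function names are illustrative.

```haskell
import Data.Bits (testBit)
import Data.ByteString.Builder (lazyByteStringHex, string7, toLazyByteString)
import qualified Data.ByteString.Lazy as BL
import qualified Data.ByteString.Lazy.Char8 as BLC
import Data.Char (chr)
import Data.Word (Word8)

-- 7-bit ASCII bytes of a String (string7 keeps only the low 7 bits of each Char).
encode7 :: String -> BL.ByteString
encode7 = toLazyByteString . string7

-- One byte as exactly 7 '0'/'1' characters, most significant bit first.
bits7 :: Word8 -> String
bits7 w = [if testBit w i then '1' else '0' | i <- [6, 5 .. 0]]

-- The whole encoding as a string of '0' and '1'.
toBinary :: String -> String
toBinary = concatMap bits7 . BL.unpack . encode7

-- The whole encoding in hexadecimal, two hex digits per byte.
toHex :: String -> String
toHex = BLC.unpack . toLazyByteString . lazyByteStringHex . encode7

-- Back to the characters, via map (chr . fromIntegral) as in the answer.
decodeBack :: String -> String
decodeBack = map (chr . fromIntegral) . BL.unpack . encode7

main :: IO ()
main = do
  putStrLn (toBinary "AB")
  putStrLn (toHex "AB")
  putStrLn (decodeBack "010101")
```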

How to get good performance when writing a list of integers from 1 to 10 million to a file?

question
I want a program that will write a sequence like,
1
...
10000000
to a file. What's the simplest code one can write that still gets decent performance? My intuition is that there is some lack-of-buffering problem. My C code runs at 100 MB/s, whereas, for reference, the Linux command-line utility dd runs at 3 GB/s (sorry for the imprecision; I'm more interested in the big-picture orders of magnitude, though).
One would think this would be a solved problem by now, i.e. that any modern compiler would make it straightforward to write such programs so that they perform reasonably well.
C code
#include <stdio.h>
int main(int argc, char **argv) {
    int len = 10000000;
    for (int a = 1; a <= len; a++) {
        printf("%d\n", a);
    }
    return 0;
}
I'm compiling with clang -O3. A performance skeleton which calls putchar('\n') 8 times gets comparable performance.
Haskell code
A naive Haskell implementation runs at 13 MiB/sec, compiled with ghc -O2 -optc-O3 -optc-ffast-math -fllvm -fforce-recomp -funbox-strict-fields. (I haven't recompiled my libraries with -fllvm; perhaps I need to do that.) Code:
import Control.Monad
main = forM [1..10000000 :: Int] $ \j -> putStrLn (show j)
My best stab with Haskell is only slightly faster, at 17 MiB/sec. The problem is I can't find a good way to convert Vectors into ByteStrings (perhaps there's a solution using iteratees?).
import qualified Data.Vector.Unboxed as V
import Data.Vector.Unboxed (Vector, Unbox, (!))
-- needed for the qualified name System.IO.putStrLn below
import qualified System.IO
writeVector :: (Unbox a, Show a) => Vector a -> IO ()
writeVector v = V.mapM_ (System.IO.putStrLn . show) v
main :: IO ()
main = writeVector (V.generate 10000000 id)
It seems that writing ByteStrings is fast, as demonstrated by this code, which writes an equivalent number of characters:
import qualified Data.ByteString.Char8 as B
main = B.putStrLn (B.replicate 76000000 '\n')
This gets 1.3 GB/s, which isn't as fast as dd, but obviously much better.
Some completely unscientific benchmarking first:
All programmes have been compiled with the default optimisation level (-O3 for gcc, -O2 for GHC) and run with
time ./prog > outfile
As a baseline, the C programme took 1.07s to produce a ~76MB (78888897 bytes) file, roughly 70MB/s throughput.
1. The "naive" Haskell programme (forM [1 .. 10000000] $ \j -> putStrLn (show j)) took 8.64s, about 8.8MB/s.
2. The same with forM_ instead of forM took 5.64s, about 13.5MB/s.
3. The ByteString version from dflemstr's answer took 9.13s, about 8.3MB/s.
4. The Text version from dflemstr's answer took 5.64s, about 13.5MB/s.
5. The Vector version from the question took 5.54s, about 13.7MB/s.
6. main = mapM_ (C.putStrLn . C.pack . show) $ [1 :: Int .. 10000000], where C is Data.ByteString.Char8, took 4.25s, about 17.9MB/s.
7. putStr . unlines . map show $ [1 :: Int .. 10000000] took 3.06s, about 24.8MB/s.
8. A manual loop,
       main = putStr $ go 1
         where
           go :: Int -> String
           go i
             | i > 10000000 = ""
             | otherwise = shows i . showChar '\n' $ go (i+1)
   took 2.32s, about 32.75MB/s.
9. main = putStrLn $ replicate 78888896 'a' took 1.15s, about 66MB/s.
10. main = C.putStrLn $ C.replicate 78888896 'a', where C is Data.ByteString.Char8, took 0.143s, about 530MB/s, with roughly the same figures for lazy ByteStrings.
What can we learn from that?
First, don't use forM or mapM unless you really want to collect the results. Performance-wise, that sucks.
Then, ByteString output can be very fast (10.), but if the construction of the ByteString to output is slow (3.), you end up with slower code than the naive String output.
What's so terrible about 3.? Well, all the involved Strings are very short. So you get a list of
Chunk "1234567" Empty
and between any two such, a Chunk "\n" Empty is put, then the resulting list is concatenated, which means all these Emptys are tossed away when a ... (Chunk "1234567" (Chunk "\n" (Chunk "1234568" (...)))) is built. That's a lot of wasteful construct-deconstruct-reconstruct going on. Speed comparable to that of the Text and the fixed "naive" String version can be achieved by packing to strict ByteStrings and using fromChunks (and Data.List.intersperse for the newlines). Better performance, slightly better than 6., can be obtained by eliminating the costly singletons. If you glue the newlines to the Strings, using \k -> shows k "\n" instead of show, the concatenation has to deal with half as many slightly longer ByteStrings, which pays off.
I'm not familiar enough with the internals of either text or vector to offer more than a semi-educated guess concerning the reasons for the observed performance, so I'll leave them out. Suffice it to say that the performance gain is marginal at best compared to the fixed naive String version.
Now, 6. shows that ByteString output is faster than String output, enough that in this case the additional work of packing is more than compensated. However, don't be fooled into believing that this is always so. If the Strings to pack are long, the packing can take more time than the String output.
But ten million invocations of putStrLn, be it the String or the ByteString version, take a lot of time. It's faster to grab the stdout Handle just once and construct the output String in non-IO code. unlines already does well, but we still suffer from the construction of the list map show [1 .. 10^7]. Unfortunately, the compiler didn't manage to eliminate that (but it eliminated [1 .. 10^7], that's already pretty good). So let's do it ourselves, leading to 8. That's not too terrible, but still takes more than twice as long as the C programme.
One can make a faster Haskell programme by going low-level and directly filling ByteStrings without going through String via show, but I don't know if the C speed is reachable. Anyway, that low-level code isn't very pretty, so I'll spare you what I have, but sometimes one has to get one's hands dirty if speed matters.
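The answer describes, but does not show, the variant that packs to strict ByteStrings, glues each newline to its number with shows k "\n", and assembles the chunks with fromChunks. A sketch of that idea follows; it has not been timed here, and numberLines is an illustrative name.

```haskell
import qualified Data.ByteString.Char8 as B
import qualified Data.ByteString.Lazy.Char8 as L

-- Each number and its newline become one strict chunk;
-- fromChunks strings the chunks together into a lazy ByteString.
numberLines :: Int -> L.ByteString
numberLines n = L.fromChunks [B.pack (shows k "\n") | k <- [1 .. n]]

main :: IO ()
main = L.putStr (numberLines 10000000)
```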
Using lazy byte strings gives you some buffering, because the string will be written instantly and more numbers will only be produced as they are needed. This code shows the basic idea (there might be some optimizations that could be made):
import qualified Data.ByteString.Lazy.Char8 as ByteString
main =
  ByteString.putStrLn .
  ByteString.intercalate (ByteString.singleton '\n') .
  map (ByteString.pack . show) $
  ([1..10000000] :: [Int])
I still use Strings for the numbers here, which leads to horrible slowdowns. If we switch to the text library instead of the bytestring library, we get access to "native" show functions for ints, and can do this:
import Data.Monoid
import Data.List
import Data.Text.Lazy.IO as Text
import Data.Text.Lazy.Builder as Text
import Data.Text.Lazy.Builder.Int as Text
main :: IO ()
main =
  Text.putStrLn .
  Text.toLazyText .
  mconcat .
  intersperse (Text.singleton '\n') .
  map Text.decimal $
  ([1..10000000] :: [Int])
I don't know how you are measuring the "speed" of these programs (with the pv tool?) but I imagine that one of these procedures will be the fastest trivial program you can get.
If you are going for maximum performance, then it helps to take a holistic view; i.e., you want to write a function that maps from [Int] to series of system calls that write chunks of memory to a file.
Lazy bytestrings are a good representation for a sequence of chunks of memory. Mapping a lazy bytestring to a series of system calls that write chunks of memory is what L.hPut does (assuming an import qualified Data.ByteString.Lazy as L). Hence, we just need a means to efficiently construct the corresponding lazy bytestring. This is what lazy bytestring builders are good at. With the new bytestring builder (here is the API documentation), the following code does the job.
import qualified Data.ByteString.Lazy as L
-- In current bytestring releases these names live in Data.ByteString.Builder;
-- the older Data.ByteString.Lazy.Builder(.ASCII) modules were removed in bytestring 0.11.
import Data.ByteString.Builder (toLazyByteString, charUtf8, intDec)
import System.IO (withFile, IOMode(..))
main :: IO ()
main =
  -- withFile closes (and thereby flushes) the handle when the action finishes
  withFile "/dev/null" WriteMode $ \h ->
    L.hPut h $ toLazyByteString $
      -- each number followed by its newline
      foldMap (\i -> intDec i <> charUtf8 '\n') [1 .. 10000000 :: Int]
Note that I output to /dev/null to avoid interference by the disk driver. The effort of moving the data to the OS remains the same. On my machine, the above code runs in 0.45 seconds, which is 12 times faster than the 5.4 seconds of your original code. This implies a throughput of 168 MB/s. We can squeeze out an additional 30% speed (220 MB/s) using bounded encodings.
import qualified Data.ByteString.Lazy.Builder.BasicEncoding as E
L.hPut h $ toLazyByteString $
  E.encodeListWithB
    ((\x -> (x, '\n')) E.>$< E.intDec `E.pairB` E.charUtf8)
    [1..10000000]
Their syntax looks a bit quirky because a BoundedEncoding a specifies the conversion of a Haskell value of type a to a bounded-length sequence of bytes, such that the bound can be computed at compile time. This allows functions such as E.encodeListWithB to perform some additional optimizations when filling the buffer. See the documentation of Data.ByteString.Lazy.Builder.BasicEncoding in the API documentation mentioned above for more information.
Here is the source of all my benchmarks.
The conclusion is that we can get very good performance from a declarative solution, provided that we understand the cost model of our implementation and use the right data structures. Whenever we construct a packed sequence of values (e.g., a sequence of bytes represented as a bytestring), the right data structure to use is a bytestring Builder.
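The BasicEncoding module used above was an early version of this API. As far as I know, current bytestring releases expose the same idea as Data.ByteString.Builder.Prim, with BoundedPrim in place of BoundedEncoding, primMapListBounded in place of encodeListWithB, and (>*<) for pairing two primitives. A sketch of the bounded-encoding version against that module, not benchmarked here; line and numbers are illustrative names.

```haskell
import qualified Data.ByteString.Builder as B
import Data.ByteString.Builder.Prim ((>$<), (>*<))
import qualified Data.ByteString.Builder.Prim as P
import qualified Data.ByteString.Lazy as L
import System.IO (IOMode (..), withFile)

-- A bounded primitive for one line: each Int is paired with '\n',
-- then the pair is encoded as decimal digits followed by the newline.
line :: P.BoundedPrim Int
line = (\x -> (x, '\n')) >$< (P.intDec >*< P.charUtf8)

numbers :: Int -> L.ByteString
numbers n = B.toLazyByteString (P.primMapListBounded line [1 .. n])

main :: IO ()
main = withFile "/dev/null" WriteMode $ \h -> L.hPut h (numbers 10000000)
```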

Getting list of object names in module with template haskell?

I'd like to be able to take a file with declarations such as:
test_1 = assert $ 1 == 1
test_2 = assert $ 2 == 1
and generate a basic run function like
main = runTests [test_1, test_2]
The goal is to get something like Python's nosetests.
Can I do this with Template Haskell? I cannot find a lot of documentation on it (there are many broken links on the wiki).
You might want to look into the test-framework family of packages. In particular, the test-framework-th package provides the Template Haskell function defaultMainGenerator which does exactly what you want for both QuickCheck and HUnit tests, as long as you follow the convention of prefixing HUnit test cases with case_ and QuickCheck properties with prop_.
{-# LANGUAGE TemplateHaskell #-}
import Test.Framework.Providers.HUnit
import Test.Framework.Providers.QuickCheck2
import Test.Framework.TH
import Test.HUnit
import Test.QuickCheck
main = $(defaultMainGenerator)
case_checkThatHUnitWorks =
  assert $ 1 == 1
prop_checkThatQuickCheckWorks =
  (1 == 1)
There is another way that doesn't require Template Haskell: haskell-src-exts can parse Haskell source, and you could extract the test names from the parsed module.
Or, if your purpose is practical, you can do as quickcheck does and perform a simple-minded parse, i.e. look for identifiers that start with prop_ in column 0. This is a perfectly adequate solution for real work, though it may be theoretically unsatisfying.
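A sketch of the simple-minded approach, with my own illustrative names: collect the identifiers that start with a given prefix in column 0 and render the main the question asks for. Duplicates are removed, so a type signature line such as test_1 :: Assertion does not produce a second entry. runTests is the question's runner and is not defined here; the generated text would still have to be written into a Main module, for example by a small preprocessing step.

```haskell
import Data.Char (isAlphaNum)
import Data.List (intercalate, isPrefixOf, nub)

-- Names of top-level bindings that start in column 0 with the given prefix.
namesWithPrefix :: String -> String -> [String]
namesWithPrefix prefix src =
  nub [takeWhile isIdentChar l | l <- lines src, prefix `isPrefixOf` l]
  where
    isIdentChar c = isAlphaNum c || c == '_' || c == '\''

-- Source text for a main that hands every test to runTests.
generateMain :: [String] -> String
generateMain names = "main = runTests [" ++ intercalate ", " names ++ "]"

main :: IO ()
main = putStrLn (generateMain (namesWithPrefix "test_" sample))
  where
    sample = unlines
      [ "test_1 = assert $ 1 == 1"
      , "test_2 = assert $ 2 == 1"
      , "  test_3 = indented, so ignored"
      , "helper = 3"
      ]
```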

Resources