Is this a bug in Haskell's printf "%g"? - haskell

This c++ program
#include <cstdio>
int main(void)
{
double x = 1.0;
printf("%g\n", x);
double y = 1.25;
printf("%g\n", y);
}
seems to perform correctly according to my understanding of printf "%g", as it produces the following output:
1
1.25
However, the output from this Haskell program
import Numeric (showGFloat)
import Text.Printf (printf)
main :: IO ()
main = do
  let x = 1.0 :: Double
  putStrLn $ printf "%g" x
  putStrLn $ showGFloat Nothing x ""
  let y = 1.25 :: Double
  putStrLn $ printf "%g" y
  putStrLn $ showGFloat Nothing y ""
is
1.0
1.0
1.25
1.25
My question is: Why does Haskell print "1.0" instead of "1", as I expected it to? The Haskell docs for printf suggest that the Haskell behavior should match the C behavior. Or am I missing something?

It really looks like a bug....
In the C spec, it is very clear that %g is to cut off trailing '.000...'s. See http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf, p. 313:
Finally, unless the # flag is used, any trailing zeros are removed from the
fractional portion of the result and the decimal-point character is removed if
there is no fractional portion remaining.
The bug exists in Hugs also, I tried it.
Also, awk uses printf, and there is a Linux command line version of printf, and both are compatible with the c version, so ghci and hugs are certainly the odd man out.
This isn't the only incompatibility: according to the spec, %g with no explicit precision should round to six significant digits, and Haskell's printf doesn't.
In Haskell
> printf "%g\n" 1.111111111111111111111
yields
1.1111111111111112
while, in C
#include <stdio.h>

int main(void) {
    printf("%g\n", 1.111111111111111);
}
yields
1.11111
The Haskell report doesn't ever mention the Text.Printf library, so despite it being in base, I don't think you can consider this either a Haskell, ghc, or hugs bug, but just a library bug.

Related

How to find the index of a list with XMonad.Util.Run runProcessWithInput output in haskell?

I am trying to write a custom functionality for my xmonad window manager.
I am not very comfortable with functional programming yet and still am trying to wrap my head around it. In python etc. this would be an easy task for me.
I have the light utility installed on my system.
$ light -S 40.0
for example sets the backlight to 40% brightness.
$ light -G
returns the current brightness.
I have a list with values I like to advance through (and a start index)
screenBrightnessIndex :: Int
screenBrightnessIndex = 9
screenBrightnessSteps :: [Float]
screenBrightnessSteps = [0, 0.1, 0.2, 0.4, 0.8, 2, 5, 10, 20, 40, 60, 80, 100]
I wrote the following function:
changeBrightness :: Int -> Int
changeBrightness i
  | sBI + i < 0 = 0
  | sBI + i > length screenBrightnessSteps - 1 = length screenBrightnessSteps - 1
  | otherwise = sBI + i
  where
    sBI = fromMaybe screenBrightnessIndex (elemIndex (???) screenBrightnessSteps)
It should look up the index of the current output of "light -G" in the list, then increase or decrease that index by the provided value.
Calling the functions should work like this:
spawn $ "light -S " ++ show (screenBrightnessSteps !! changeBrightness 1)
spawn $ "light -S " ++ show (screenBrightnessSteps !! changeBrightness (-1))
My problem is that I don't understand how I would get the current screen brightness as a float.
I tried the following (with XMonad.Util.Run):
read (runProcessWithInput "light" ["-G"] "") :: Float
But this gives me errors. I know it has to do with the fact that the function returns an m String (for some MonadIO m) and not a plain String. I also found several answers to similar problems. Most of them said it isn't possible to extract a String from the monad because of exceptions that may occur, and for safety reasons. But monads are still very confusing to me, and I don't believe there is a programming problem without a solution; I just don't know how to solve this one.
I hope I explained my problem sufficiently.
It would be nice if someone could help me with this.
It would be even better if there is a better/more simple approach to the problem I am not thinking of.
Thanks a lot.
The signatures can help here:
runProcessWithInput :: MonadIO m => FilePath -> [String] -> String -> m String
read :: Read a => String -> a
read expects a String, not a MonadIO m => m String. So your goal should be to get a String out of that first and then pass it to read. This might be useful: https://wiki.haskell.org/Introduction_to_Haskell_IO/Actions
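A minimal sketch of what that looks like in practice (currentBrightness is a hypothetical helper name, not from the question; it assumes xmonad-contrib's XMonad.Util.Run is available):

```haskell
import Text.Read (readMaybe)
import XMonad.Core (X)
import XMonad.Util.Run (runProcessWithInput)

-- Bind the command's output *inside* the monad, then parse it.
-- The String never leaves the monad; we work with it in there.
currentBrightness :: X Float
currentBrightness = do
  out <- runProcessWithInput "light" ["-G"] ""   -- out :: String here
  -- readMaybe tolerates the trailing newline; fall back to 0 if unparsable
  pure (maybe 0 id (readMaybe out))
```

With a function like this, elemIndex can then be applied to its result in the same do-block, rather than trying to apply read to the monadic value directly.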

Calculate simple math expression pass as a command line argument

I'm a newbie in Haskell and I'm lost. I was trying to parse a math expression, but really don't know how Haskell programming works well. So what I'm trying to write is a program to resolve a simple math expression. I'm looking for ideas on how I could resolve by giving arguments.
The command line could look like : ./math "3 + 2" or ./math "5 * 8"
My code looks like this:
import System.Environment (getArgs)
import Text.Printf
main :: IO ()
main = do
  args <- getArgs
  printf "%.2f" args[1] + args[2]
Haskell has no array[index] syntax. It does have list!!index syntax (which isn't really special syntax at all, !! is just an infix-function defined in the prelude). Note that Haskell indices are 0-based and unlike in Bash, the zeroth argument is not the command name itself, so you probably want indices 0 and 1.
Also, in Haskell function application binds more tightly than any operators. So, if you were to write
printf "%.2f" args!!0 + args!!1
it would parse as ((printf "%.2f" args)!!0) + (args!!1), which is obviously not right. You need to make explicit what precedence you want:
printf "%.2f" (args!!0 + args!!1)
or as we like to do it, with $ instead of parens:
printf "%.2f" $ args!!0 + args!!1
That's still not right, because the arguments come in as strings, but the addition should be performed on numbers. For this, you need to read the numbers; I'd suggest you do that separately:
import System.Environment (getArgs)
import Text.Printf (printf)
import Text.Read (readMaybe)

main :: IO ()
main = do
  args <- getArgs
  let a, b :: Double
      Just a = readMaybe $ args!!0
      Just b = readMaybe $ args!!1
  printf "%.2f" $ a + b
$ runhaskell Argsmath.hs 3 2
5.00
Of course this will not allow you to do stuff like ./math "5 * 8" because you have no means of parsing the *. For that, something read-based would be awkward; I suggest you check out parser combinator libraries, there are plenty of tutorials around; this one seems to be nice and simple.
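Before reaching for a full parser combinator library, a stopgap sketch that handles exactly the two-operand form of the examples might look like this (eval is a hypothetical helper, not from the question; it relies on the operands and operator being whitespace-separated, as in "3 + 2"):

```haskell
import Text.Read (readMaybe)

-- Evaluate "number operator number" expressions such as "3 + 2" or "5 * 8".
-- Anything else (missing spaces, unknown operator, bad numbers) gives Nothing.
eval :: String -> Maybe Double
eval s = case words s of
  [a, op, b] -> do
    f <- lookup op [("+", (+)), ("-", (-)), ("*", (*)), ("/", (/))]
    f <$> readMaybe a <*> readMaybe b
  _ -> Nothing
```

Here eval "3 + 2" yields Just 5.0 and eval "5 * 8" yields Just 40.0, which main could then feed to printf "%.2f". A real parser would also cope with "3+2" and operator precedence, which is where the combinator libraries come in.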

Reading Unformatted Binary file: Unexpected output - Fortran90

Preface: I needed to figure out the structure of a binary grid_data_file. From the Fortran routines I figured that the first record consists of 57 bytes and has information in the following order.
No. of the file :: integer*4
File name :: char*16
file status :: char*3 (i.e. new, old, tmp)
.... so forth (rest is clear from write statement in the program)
Now for the testing I wrote a simple program as follows: (I haven't included all the parameters)
Program testIO
  implicit none
  integer :: x, nclat, nclon
  character :: y, z
  real :: lat_gap, lon_gap, north_lat, west_lat
  integer :: gridtype
  open(11, file='filename', access='direct', form='unformatted', recl=200)
  read(11, rec=1) x, y, z, lat_gap, lon_gap, north_lat, west_lat, nclat, nclon, gridtype
  write(*,*) x, y, z, lat_gap, lon_gap, north_lat, west_lat, nclat, nclon, gridtype
  close(11)
END
To my surprise, when I change the declaration part to
integer*4 :: x, nclat, nclon
character*16 :: y
character*3 :: z
real*4 :: lat_gap, lon_gap, north_lat, west_lat
integer*2 :: gridtype
It gives me SOME correct information, albeit not all! I can't understand this. It would help me to improve my Fortran knowledge if someone explains this phenomenon.
Moreover, I can't use ACCESS=stream due to machine being old and not supported, so I conclude that above is the only possibility to figure out the file structure.
From your replies and what others have commented, I think your problem might be a misunderstanding of what a Fortran "record" is:
You say that you have a binary file where each entry (you said record, but more on that later) is 57 bytes.
The problem is that a "record" in Fortran I/O is not what you might expect if you are coming from C (or anywhere else, really). See the following document from Intel, which gives a good explanation of the different access modes:
https://software.intel.com/sites/products/documentation/hpc/composerxe/en-us/2011Update/fortran/lin/bldaps_for/common/bldaps_rectypes.htm
In short, it has extra data (a header) describing the data in each entry.
Moreover, I can't use ACCESS=stream due to machine being old and not supported, so I conclude that above is the only possibility to figure out the file structure. Any guidance would be a big help!
If you can't use stream, AFAIK there is really no simple and painless way to read binary files with no record information.
A possible solution which requires a C compiler is to do IO in a C function that you call from Fortran, "minimal" example:
main.f90:
program main
  integer, parameter :: dp = selected_real_kind(15)
  character(len=*), parameter :: filename = 'test.bin'
  real(dp) :: val
  call read_bin(filename, val)
  print *, 'Read: ', val
end program
read.c:
#include <string.h>
#include <stdio.h>
void read_bin_(const char *fname, double *ret, unsigned int len)
{
    char buf[256];
    printf("len = %d\n", len);
    strncpy(buf, fname, len);
    buf[len] = '\0'; /* Fortran strings are not 0-terminated */
    FILE* fh = fopen(buf, "rb");
    fread(ret, sizeof(double), 1, fh);
    fclose(fh);
}
Note that there is an extra parameter needed in the end and some string manipulation because of the way Fortran handles strings, which differs from C.
write.c:
#include <stdio.h>
int main(void) {
    double d = 1.234;
    FILE* fh = fopen("test.bin", "wb");
    fwrite(&d, sizeof(double), 1, fh);
    fclose(fh);
}
Compilation instructions:
gcc -o write write.c
gcc -c -g read.c
gfortran -g -o readbin main.f90 read.o
Create binary file with ./write, then see how the Fortran code can read it back with ./readbin.
This can be extended for different data types to basically emulate access=stream. In the end, if you can recompile the original Fortran code to output the data file differently, this will be the easiest solution, as this one is pretty much a crude hack.
Lastly, a tip for exploring unknown data formats: the tool od is your friend; check its manpage. It can directly convert binary representations into a variety of native datatypes. Try it on the above example (the z adds a character representation in the right-hand column; not very useful here, but often handy):
od -t fDz test.bin

Haskell triangle microbenchmark

Consider this simple "benchmark":
n :: Int
n = 1000
main = do
print $ length [(a,b,c) | a<-[1..n],b<-[1..n],c<-[1..n],a^2+b^2==c^2]
and appropriate C version:
#include <stdio.h>
int main(void)
{
    int a, b, c, N = 1000;
    int cnt = 0;
    for (a = 1; a <= N; a++)
        for (b = 1; b <= N; b++)
            for (c = 1; c <= N; c++)
                if (a*a + b*b == c*c) cnt++;
    printf("%d\n", cnt);
}
Compilation:
Haskell version is compiled as: ghc -O2 triangle.hs (ghc 7.4.1)
C version is compiled as: gcc -O2 -o triangle-c triangle.c (gcc 4.6.3)
Run times:
Haskell: 4.308s real
C: 1.145s real
Is it OK behavior even for such a simple and maybe well optimizable program that Haskell is almost 4 times slower? Where does Haskell waste time?
The Haskell version is wasting time allocating boxed integers and tuples.
You can verify this by for example running the haskell program with the flags +RTS -s. For me the outputted statistics include:
80,371,600 bytes allocated in the heap
A straightforward encoding of the C version is faster since the compiler can use unboxed integers and skip allocating tuples:
n :: Int
n = 1000

main = do
  print $ f n

f :: Int -> Int
f max = go 0 1 1 1
  where
    go cnt a b c
      | a > max = cnt
      | b > max = go cnt (a+1) 1 1
      | c > max = go cnt a (b+1) 1
      | a^2+b^2==c^2 = go (cnt+1) a b (c+1)
      | otherwise = go cnt a b (c+1)
See:
51,728 bytes allocated in the heap
The running time of this version is 1.920s vs. 1.212s for the C version.
I don't know how relevant your benchmark really is.
I agree that the list-comprehension syntax is nice to use, but if you want to compare the performance of the two languages, you should compare them on a fairer test.
Creating a list of possibly very many elements and then calculating its length is nothing like incrementing a counter in a triple loop.
So maybe Haskell has some nice optimizations that detect what you are doing and never create the list, but I wouldn't write code relying on that, and you probably shouldn't either.
I don't think you would write your program like that if you needed to count rapidly, so why do it for this benchmark?
Haskell can be optimized quite well — but you need the proper techniques, and you need to know what you're doing.
This list comprehension syntax is elegant, yet wasteful. You should read the appropriate chapter of Real World Haskell to find out more about your profiling opportunities. In this exact case, you create a lot of list spines and boxed Ints for no good reason at all.
You should definitely do something about that. EDIT: #opqdonut just posted a good answer on how to make this faster.
Just remember next time to profile your application before comparing any benchmarks. Haskell makes it easy to write idiomatic code, but it also hides a lot of complexity.

How to get good performance when writing a list of integers from 1 to 10 million to a file?

question
I want a program that will write a sequence like,
1
...
10000000
to a file. What's the simplest code one can write, and get decent performance? My intuition is that there is some lack-of-buffering problem. My C code runs at 100 MB/s, whereas by reference the Linux command-line utility dd runs at about 3 GB/s (sorry for the imprecision, see comments; I'm more interested in the big-picture orders of magnitude anyway).
One would think this would be a solved problem by now ... i.e. any modern compiler would make it immediate to write such programs that perform reasonably well ...
C code
#include <stdio.h>
int main(int argc, char **argv) {
    int len = 10000000;
    for (int a = 1; a <= len; a++) {
        printf("%d\n", a);
    }
    return 0;
}
I'm compiling with clang -O3. A performance skeleton which calls putchar('\n') 8 times gets comparable performance.
Haskell code
A naive Haskell implementation runs at 13 MiB/sec, compiling with ghc -O2 -optc-O3 -optc-ffast-math -fllvm -fforce-recomp -funbox-strict-fields. (I haven't recompiled my libraries with -fllvm; perhaps I need to do that.) Code:
import Control.Monad
main = forM [1..10000000 :: Int] $ \j -> putStrLn (show j)
My best stab with Haskell runs even slower, at 17 MiB/sec. The problem is I can't find a good way to convert Vectors into ByteStrings (perhaps there's a solution using iteratees?).
import qualified Data.Vector.Unboxed as V
import Data.Vector.Unboxed (Vector, Unbox, (!))
import qualified System.IO

writeVector :: (Unbox a, Show a) => Vector a -> IO ()
writeVector v = V.mapM_ (System.IO.putStrLn . show) v

main = writeVector (V.generate 10000000 id)
It seems that writing ByteString's is fast, as demonstrated by this code, writing an equivalent number of characters,
import Data.ByteString.Char8 as B
main = B.putStrLn (B.replicate 76000000 '\n')
This gets 1.3 GB/s, which isn't as fast as dd, but obviously much better.
Some completely unscientific benchmarking first:
All programmes have been compiled with the default optimisation level (-O3 for gcc, -O2 for GHC) and run with
time ./prog > outfile
As a baseline, the C programme took 1.07s to produce a ~76MB (78888897 bytes) file, roughly 70MB/s throughput.
1. The "naive" Haskell programme (forM [1 .. 10000000] $ \j -> putStrLn (show j)) took 8.64s, about 8.8MB/s.
2. The same with forM_ instead of forM took 5.64s, about 13.5MB/s.
3. The ByteString version from dflemstr's answer took 9.13s, about 8.3MB/s.
4. The Text version from dflemstr's answer took 5.64s, about 13.5MB/s.
5. The Vector version from the question took 5.54s, about 13.7MB/s.
6. main = mapM_ (C.putStrLn . C.pack . show) $ [1 :: Int .. 10000000], where C is Data.ByteString.Char8, took 4.25s, about 17.9MB/s.
7. putStr . unlines . map show $ [1 :: Int .. 10000000] took 3.06s, about 24.8MB/s.
8. A manual loop,
   main = putStr $ go 1
     where
       go :: Int -> String
       go i
         | i > 10000000 = ""
         | otherwise = shows i . showChar '\n' $ go (i+1)
   took 2.32s, about 32.75MB/s.
9. main = putStrLn $ replicate 78888896 'a' took 1.15s, about 66MB/s.
10. main = C.putStrLn $ C.replicate 78888896 'a', where C is Data.ByteString.Char8, took 0.143s, about 530MB/s; roughly the same figures hold for lazy ByteStrings.
What can we learn from that?
First, don't use forM or mapM unless you really want to collect the results. Performancewise, that sucks.
Then, ByteString output can be very fast (10.), but if the construction of the ByteString to output is slow (3.), you end up with slower code than the naive String output.
What's so terrible about 3.? Well, all the involved Strings are very short. So you get a list of
Chunk "1234567" Empty
and between any two such, a Chunk "\n" Empty is put, then the resulting list is concatenated, which means all these Emptys are tossed away when a ... (Chunk "1234567" (Chunk "\n" (Chunk "1234568" (...)))) is built. That's a lot of wasteful construct-deconstruct-reconstruct going on. Speed comparable to that of the Text and the fixed "naive" String version can be achieved by packing to strict ByteStrings and using fromChunks (and Data.List.intersperse for the newlines). Better performance, slightly better than 6., can be obtained by eliminating the costly singletons. If you glue the newlines to the Strings, using \k -> shows k "\n" instead of show, the concatenation has to deal with half as many slightly longer ByteStrings, which pays off.
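For what it's worth, the pack-then-fromChunks variant described in that paragraph can be sketched like this (a sketch only, not benchmarked here):

```haskell
import Data.List (intersperse)
import qualified Data.ByteString.Char8 as S
import qualified Data.ByteString.Lazy as L

-- Pack each number to a *strict* ByteString chunk, put single-newline
-- chunks between them, and assemble a lazy ByteString without the
-- wasteful Chunk/Empty concatenation of the slow version.
main :: IO ()
main =
  L.putStr . L.fromChunks
           . intersperse (S.pack "\n")
           . map (S.pack . show)
           $ [1 .. 10000000 :: Int]
```

Replacing show with \k -> shows k "\n" (and dropping the intersperse) gives the half-as-many-chunks variant mentioned above.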
I'm not familiar enough with the internals of either text or vector to offer more than a semi-educated guess concerning the reasons for the observed performance, so I'll leave them out. Suffice it to say that the performance gain is marginal at best compared to the fixed naive String version.
Now, 6. shows that ByteString output is faster than String output, enough that in this case the additional work of packing is more than compensated. However, don't be fooled by that to believe that is always so. If the Strings to pack are long, the packing can take more time than the String output.
But ten million invocations of putStrLn, be it the String or the ByteString version, take a lot of time. It's faster to grab the stdout Handle just once and construct the output String in non-IO code. unlines already does well, but we still suffer from the construction of the list map show [1 .. 10^7]. Unfortunately, the compiler didn't manage to eliminate that (but it eliminated [1 .. 10^7], that's already pretty good). So let's do it ourselves, leading to 8. That's not too terrible, but still takes more than twice as long as the C programme.
One can make a faster Haskell programme by going low-level and directly filling ByteStrings without going through String via show, but I don't know if the C speed is reachable. Anyway, that low-level code isn't very pretty, so I'll spare you what I have, but sometimes one has to get one's hands dirty if speed matters.
Using lazy byte strings gives you some buffering, because the string will be written instantly and more numbers will only be produced as they are needed. This code shows the basic idea (there might be some optimizations that could be made):
import qualified Data.ByteString.Lazy.Char8 as ByteString
main =
  ByteString.putStrLn .
  ByteString.intercalate (ByteString.singleton '\n') .
  map (ByteString.pack . show) $
  ([1..10000000] :: [Int])
I still use Strings for the numbers here, which leads to horrible slowdowns. If we switch to the text library instead of the bytestring library, we get access to "native" show functions for ints, and can do this:
import Data.Monoid
import Data.List
import Data.Text.Lazy.IO as Text
import Data.Text.Lazy.Builder as Text
import Data.Text.Lazy.Builder.Int as Text
main :: IO ()
main =
  Text.putStrLn .
  Text.toLazyText .
  mconcat .
  intersperse (Text.singleton '\n') .
  map Text.decimal $
  ([1..10000000] :: [Int])
I don't know how you are measuring the "speed" of these programs (with the pv tool?) but I imagine that one of these procedures will be the fastest trivial program you can get.
If you are going for maximum performance, then it helps to take a holistic view; i.e., you want to write a function that maps from [Int] to series of system calls that write chunks of memory to a file.
Lazy bytestrings are a good representation for a sequence of chunks of memory. Mapping a lazy bytestring to a series of system calls that write chunks of memory is what L.hPut does (assuming an import qualified Data.ByteString.Lazy as L). Hence, we just need a means to efficiently construct the corresponding lazy bytestring. This is what lazy bytestring builders are good at. With the new bytestring builder (here is the API documentation), the following code does the job.
import qualified Data.ByteString.Lazy as L
import Data.ByteString.Lazy.Builder (toLazyByteString, charUtf8)
import Data.ByteString.Lazy.Builder.ASCII (intDec)
import Data.Foldable (foldMap)
import Data.Monoid (mappend)
import System.IO (openFile, IOMode(..))
main :: IO ()
main = do
  h <- openFile "/dev/null" WriteMode
  L.hPut h $ toLazyByteString $
    foldMap ((charUtf8 '\n' `mappend`) . intDec) [1..10000000]
Note that I output to /dev/null to avoid interference by the disk driver; the effort of moving the data to the OS remains the same. On my machine, the above code runs in 0.45 seconds, which is 12 times faster than the 5.4 seconds of your original code. This implies a throughput of 168 MB/s. We can squeeze out an additional 30% of speed (220 MB/s) using bounded encodings.
import qualified Data.ByteString.Lazy.Builder.BasicEncoding as E
L.hPut h $ toLazyByteString $
  E.encodeListWithB
    ((\x -> (x, '\n')) E.>$< E.intDec `E.pairB` E.charUtf8)
    [1..10000000]
Their syntax looks a bit quirky because a BoundedEncoding a specifies the conversion of a Haskell value of type a to a bounded-length sequence of bytes, such that the bound can be computed at compile time. This allows functions such as E.encodeListWithB to perform additional optimizations when implementing the actual filling of the buffer. See the documentation of Data.ByteString.Lazy.Builder.BasicEncoding in the above link to the API documentation (phew, stupid hyperlink limit for new users) for more information.
Here is the source of all my benchmarks.
The conclusion is that we can get very good performance from a declarative solution, provided we understand the cost model of our implementation and use the right data structures. Whenever you construct a packed sequence of values (e.g., a sequence of bytes represented as a bytestring), the right data structure to use is a bytestring Builder.
