Pretty print ByteString to hex nibble-wise - haskell

What's an idiomatic way of treating a bytestring nibblewise and pretty printing its hexadecimal (0-F) representation?
putStrLn . show . B.unpack
-- [1,126]
With a bit more work:
putStrLn . show . map (\x -> N.showIntAtBase 16 (DC.intToDigit) x "") . B.unpack
["1","7e"]
But what I really want is
["1","7","e"]
Or better yet
['1','7','e']
I could munge ["1","7e"], but that's string manipulation, whereas I'd rather do numeric manipulation. Do I need to drop down to shifting and masking numeric values?

You can now use Data.ByteString.Builder. To print a ByteString to its hex equivalent (with two hex digits per byte, in the right order, and efficiently), simply use:
toLazyByteString . byteStringHex
or
toLazyByteString . lazyByteStringHex
depending on which flavor of ByteString you have as input.
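Here is a minimal sketch of that approach as reusable functions, assuming a strict ByteString as input. The names hexOf and hexChars are hypothetical. byteStringHex always emits two lowercase digits per byte, so the bytes [1,126] from the question come out as "017e" rather than "17e".

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as BL
import qualified Data.ByteString.Lazy.Char8 as BLC
import Data.ByteString.Builder (byteStringHex, toLazyByteString)

-- Two lowercase hex digits per byte, built with a Builder.
hexOf :: B.ByteString -> BL.ByteString
hexOf = toLazyByteString . byteStringHex

-- The same digits as a String, i.e. one Char per nibble.
hexChars :: B.ByteString -> String
hexChars = BLC.unpack . hexOf
```

To print the result directly, BLC.putStrLn (hexOf bytes) writes the digits without going through String.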

I'd like to elaborate on max taldykin's answer (which I upvoted), as I think it is over-complicated. There is no need for NoMonomorphismRestriction, printf or Data.List.
Here is my version:
import qualified Data.ByteString as B
import Numeric (showHex)
prettyPrint :: B.ByteString -> String
-- showHex does not zero-pad, so e.g. byte 1 becomes "1" rather than "01"
prettyPrint = concatMap (flip showHex "") . B.unpack
main :: IO ()
main = putStrLn . prettyPrint . B.pack $ [102, 117, 110] -- prints 66756e
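This sketch answers the shifting-and-masking part of the question directly by working on nibbles numerically. It assumes zero-padding to two digits per byte is acceptable. The helper names nibbles and hexDigits are hypothetical, while shiftR, (.&.) and intToDigit come from base.

```haskell
import qualified Data.ByteString as B
import Data.Bits (shiftR, (.&.))
import Data.Char (intToDigit)
import Data.Word (Word8)

-- Split one byte into its high and low nibble (each in 0..15).
nibbles :: Word8 -> [Int]
nibbles w = [fromIntegral (w `shiftR` 4), fromIntegral (w .&. 0x0f)]

-- Every nibble of the ByteString as one hex digit, high nibble first.
hexDigits :: B.ByteString -> [Char]
hexDigits = map intToDigit . concatMap nibbles . B.unpack
```

intToDigit yields lowercase 'a'..'f' for 10..15. Because every byte contributes exactly two nibbles, the digits can be split back into bytes unambiguously.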

Something like this:
{-# LANGUAGE NoMonomorphismRestriction #-}
import qualified Data.ByteString as B
import Text.Printf
import Data.List
import Numeric
hex = foldr showHex "" . B.unpack
list = printf "[%s]" . concat . intersperse "," . map show
Test:
> let x = B.pack [102,117,110]
> list . hex $ x
"['6','6','7','5','6','e']"
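Upd: since laziness is not required here, you may prefer a strict left fold (foldl') to foldr. Keep in mind that foldl' (flip showHex) prepends each byte's digits to the accumulator, so the input has to be reversed first or the bytes come out in reverse order:
hex = foldl' (flip showHex) "" . reverse . B.unpack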

You have ["1","7e"] :: [String]
concat ["1", "7e"] is "17e" :: String which is equal to [Char] and equal to ['1','7','e'] :: [Char].
Then you can split that String into pieces (splitEvery from the split package has been deprecated in favour of chunksOf):
> Data.List.Split.chunksOf 1 . concat $ ["1", "7e"]
["1","7","e"]
it :: [[Char]]
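If you'd rather not depend on the split package for this, base alone can do the same split. A minimal sketch (splitDigits is a hypothetical name):

```haskell
-- Concatenate the per-byte strings, then wrap each Char in its own String.
splitDigits :: [String] -> [String]
splitDigits = map (: []) . concat
```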

If you just want a regular hex en/decoding of ByteStrings, you can use the memory package. They call the hex encoding Base16.
>>> let input = "Is 3 > 2?" :: ByteString
>>> let convertedTo base = convertToBase base input :: ByteString
>>> convertedTo Base16
"49732033203e20323f"
Full documentation: https://hackage.haskell.org/package/memory-0.18.0/docs/Data-ByteArray-Encoding.html#t:Base

Related

Tuple initialization from IO data in Haskell

I would like to know what is the best way to get a tuple from data read from the input in Haskell. I often encounter this problem in competitive programming when the input is made up of several lines that contain space-separated integers. Here is an example:
1 3 10
2 5 8
10 11 0
0 0 0
To read lines of integers, I use the following function:
readInts :: IO [Int]
readInts = fmap (map read . words) getLine
Then, I transform these lists into tuples of the appropriate size:
readInts :: IO (Int, Int, Int, Int)
readInts = fmap ((\l -> (l !! 0, l !! 1, l !! 2, l !! 3)) . map read . words) getLine
This approach does not seem very idiomatic to me.
The following syntax is more readable but it only works for 2-tuples:
readInts :: IO (Int, Int)
readInts = fmap ((\[x, y] -> (x, y)) . map read . words) getLine
(EDIT: as noted in the comments, the solution above works for n-tuples in general).
Is there an idiomatic way to initialize tuples from lists of integers without having to use !! in Haskell? Alternatively, is there a different approach to processing this type of input?
How about this:
import Data.List (intercalate)
readInts :: Read a => IO a -- a can be any tuple of Read types, e.g. (Int, Int, Int)
readInts = read . ("(" ++) . (++ ")") . intercalate "," . words <$> getLine
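The same trick also works as a pure function, which makes it easy to try in GHCi without stdin. In this sketch, readTuple is a hypothetical name. The result type chooses the tuple size. Each field must already be in read syntax: an Int or a Bool works, but a bare word meant as a String would not.

```haskell
import Data.List (intercalate)

-- Wrap the space-separated fields as "(a,b,...)" and let read pick the tuple type.
readTuple :: Read a => String -> a
readTuple = read . ("(" ++) . (++ ")") . intercalate "," . words
```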
Given that the context is 'competitive programming' (something I'm only dimly aware of as a concept), I'm not sure that the following offers a particularly competitive alternative, but I'd consider it idiomatic to use one of several available parser combinator libraries.
The base package comes with a module called Text.ParserCombinators.ReadP. Here's how you could use it to parse the input from the question:
module Q57693986 where
import Text.ParserCombinators.ReadP
parseNumber :: ReadP Integer
parseNumber = read <$> munch1 (`elem` ['0'..'9'])
parseTriple :: ReadP (Integer, Integer, Integer)
parseTriple =
  (,,) <$> parseNumber <*> (char ' ' *> parseNumber) <*> (char ' ' *> parseNumber)
parseLine :: ReadS (Integer, Integer, Integer)
parseLine = readP_to_S (parseTriple <* eof)
parseInput :: String -> [(Integer, Integer, Integer)]
parseInput = concatMap (fmap fst . filter (null . snd)) . fmap parseLine . lines
You can run parseInput against this input file:
1 3 10
2 5 8
10 11 0
0 0 0
Here's a GHCi session that parses that file:
*Q57693986> parseInput <$> readFile "57693986.txt"
[(1,3,10),(2,5,8),(10,11,0),(0,0,0)]
Each call to parseLine produces a list of tuples that match the parser, e.g.:
*Q57693986> parseLine "11 32 923"
[((11,32,923),"")]
The second element of the tuple is any remaining String still waiting to be parsed. In the above example, parseLine has completely consumed the line, which is what I'd expect for well-formed input, so the remaining String is empty.
The parser returns a list of alternatives if there's more than one way the input could be consumed by the parser, but again, in the above example, there's only one suggested alternative, as the line has been fully consumed.
The parseInput function throws away any tuple that hasn't been fully consumed, and then picks only the first element of any remaining tuples.
This approach has often served me with puzzles such as Advent of Code, where the input files tend to be well-formed.
This is a way to generate a parser that works generically for any tuple (of reasonable size). It requires the library generics-sop.
{-# LANGUAGE DeriveGeneric, DeriveAnyClass,
FlexibleContexts, TypeFamilies, TypeApplications #-}
import GHC.Generics
import Generics.SOP
import Generics.SOP (hsequence, hcpure,Proxy,to,SOP(SOP),NS(Z),IsProductType,All)
import Data.Char
import Text.ParserCombinators.ReadP
import Text.ParserCombinators.ReadPrec
import Text.Read
componentP :: Read a => ReadP a
componentP = munch isSpace *> readPrec_to_P readPrec 1
productP :: (IsProductType a xs, All Read xs) => ReadP a
productP =
  let parserOutside = hsequence (hcpure (Proxy @Read) componentP)
  in Generics.SOP.to . SOP . Z <$> parserOutside
For example:
*Main> productP @(Int,Int,Int) `readP_to_S` " 1 2 3 "
[((1,2,3)," ")]
It allows components of different types, as long as they all have a Read instance.
It also parses records that have a Generics.SOP.Generic instance:
data Stuff = Stuff { x :: Int, y :: Bool }
  deriving (Show,GHC.Generics.Generic,Generics.SOP.Generic)
For example:
*Main> productP @Stuff `readP_to_S` " 1 True"
[(Stuff {x = 1, y = True},"")]

Print list of floats with format

In Haskell, how can I print a list of floats (or Fractional, rather) and also specify formatting? E.g. putStrLn $ magic "%.2f" [3.14159] should print [3.14].
As pointed out here, there is Text.Printf (printf), but I don't understand how to use it with lists.
One thing that 'works' is
printf' :: [Double] -> [String]
printf' l = map (\x -> printf "%.2f" x) l
with
main = do
  putStrLn $ show (printf' [3.14159])
but it's horrible, there must be another way.
The first two examples in this answer to a similar question don't work at all.
The answer here is neat enough, but as pointed out is not type-safe, plus it breaks Read/Show interop.
Are there any other alternatives? Thanks.
You can make use of intercalate :: [a] -> [[a]] -> [a] to put commas between the elements, and do some additional list processing to add a '[' in front and an ']' at the end, like:
import Data.List (intercalate)
import Text.Printf (printf) -- needed by printf' from the question
main :: IO ()
main = putStrLn ('[' : intercalate "," (printf' [3.14159]) ++ "]")
this then yields:
Prelude Text.Printf Data.List> main
[3.14]
Note that you can simplify your printf' to just:
printf' :: [Double] -> [String]
printf' = map (printf "%.2f")
You can import the Numeric module to achieve this. Here is a small example program that doesn't use printf:
module Main where
import Numeric
main :: IO ()
main = putStrLn "hello world"
a = [1.111, 2.2222222222222, 3.33333333333333333333333]
printList :: [Double] -> [String]
printList xs = map format xs
format :: Double -> String
format x = showFFloat (Just 2) x ""
where the simple call printList a will result in:
["1.11","2.22","3.33"]
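Combining the two answers, Text.Show.showListWith takes the function used to show each element and produces the usual brackets and commas. It can therefore play the role of the magic function from the question without printf. In this minimal sketch, showFixedList is a hypothetical name, and showFFloat fixes the number of decimals:

```haskell
import Numeric (showFFloat)
import Text.Show (showListWith)

-- Show a list of floating-point values with a fixed number of decimals,
-- keeping the brackets and commas that show would produce, e.g. "[3.14]".
showFixedList :: RealFloat a => Int -> [a] -> String
showFixedList decimals xs = showListWith (showFFloat (Just decimals)) xs ""
```

With it, putStrLn (showFixedList 2 [3.14159]) prints [3.14]. Unlike a printf format string, the number of decimals is an ordinary Int argument, so there is no format string to get wrong at run time.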

What is the fastest way to parse line with lots of Ints?

I've been learning Haskell for two years now, and I'm still confused about the best (fastest) way to read lots of numbers from a single input line.
To learn, I registered at hackerearth.com and am trying to solve every challenge in Haskell. But now I'm stuck on a challenge because I keep running into timeouts: my program is just too slow to be accepted by the site.
Using the profiler, I found that more than 80% of the time is spent parsing a line with lots of integers. The percentage gets even higher as the number of values in the line increases.
This is how I'm currently reading numbers from an input line:
import qualified Data.ByteString.Char8 as C8
import Data.Maybe (fromJust)
main = do
  scores <- fmap (map (fst . fromJust . C8.readInt) . C8.words) C8.getLine :: IO [Int]
Is there any way to get the data faster into the variable?
BTW: the biggest test case consists of a line with 200,000 9-digit values. Parsing takes incredibly long (> 60 s).
It's always difficult to declare a particular approach "the fastest", since there's almost always some way to squeeze out more performance. However, an approach using Data.ByteString.Char8 and the general method you suggest should be among the fastest methods for reading numbers. If you encounter a case where performance is poor, the problem likely lies elsewhere.
To give some concrete results, I generated a 191 MB file of 20 million 9-digit numbers, space-separated on a single line. I then tried several general methods of reading a line of numbers and printing their sum (which, for the record, was 10999281565534666). The obvious approach using String:
reader :: IO [Int]
reader = map read . words <$> getLine
sum' xs = sum xs -- work around GHC ticket 10992
main = print =<< sum' <$> reader
took 52secs; a similar approach using Text:
import qualified Data.Text as T
import qualified Data.Text.IO as T
import qualified Data.Text.Read as T
readText :: IO [Int]
readText = map parse . T.words <$> T.getLine
  where parse s = let Right (n, _) = T.decimal s in n
ran in 2.4secs (but note that it would need to be modified to handle negative numbers!); and the same approach using Char8:
import qualified Data.ByteString.Char8 as C
readChar8 :: IO [Int]
readChar8 = map parse . C.words <$> C.getLine
  where parse s = let Just (n, _) = C.readInt s in n
ran in 1.4secs. All examples were compiled with -O2 on GHC 8.0.2.
As a comparison benchmark, a scanf-based C implementation:
/* GCC 5.4.0 w/ -O3 */
#include <stdio.h>
int main()
{
long x, acc = 0;
while (scanf(" %ld", &x) == 1) {
acc += x;
}
printf("%ld\n", acc);
return 0;
}
ran in about 2.5secs, on par with the Text implementation.
You can squeeze a bit more performance out of the Char8 implementation. Using a hand-rolled parser:
readChar8' :: IO [Int]
-- uses unfoldr from Data.List; C.isSpace assumes Data.Char is also imported qualified as C
readChar8' = parse <$> C.getLine
  where parse = unfoldr go
        go s = do (n, s1) <- C.readInt s
                  let s2 = C.dropWhile C.isSpace s1
                  return (n, s2)
runs in about 0.9secs. I haven't tried to determine why there's a difference, but presumably the compiler is missing an opportunity to optimize the words-to-readInt pipeline.
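Here is a variation on the hand-rolled parser, which has not been benchmarked here. If the whitespace is skipped before each number instead of after it, the Maybe result of C.readInt can be passed to unfoldr directly, and leading whitespace on the line no longer ends the parse before the first number. A minimal sketch (parseInts is a hypothetical name):

```haskell
import qualified Data.ByteString.Char8 as C
import Data.Char (isSpace)
import Data.List (unfoldr)

-- Parse every Int in the input, skipping any whitespace before each number.
-- unfoldr stops as soon as C.readInt finds no number (end of input or junk).
parseInts :: C.ByteString -> [Int]
parseInts = unfoldr step
  where
    step s = C.readInt (C.dropWhile isSpace s)
```

Used as parseInts <$> C.getLine, it can stand in for readChar8' above.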
Haskell Code for Reference
Make some numbers with Numbers.hs:
-- |Generate 20M 9-digit numbers:
-- ./Numbers 20000000 100000000 999999999 > data1.txt
import qualified Data.ByteString.Char8 as C
import Control.Monad
import System.Environment
import System.Random
main :: IO ()
main = do [n, a, b] <- map read <$> getArgs
          nums <- replicateM n (randomRIO (a,b))
          let _ = nums :: [Int]
          C.putStrLn (C.unwords (map (C.pack . show) nums))
Find their sum with Sum.hs:
import Data.List
import qualified Data.Text as T
import qualified Data.Text.IO as T
import qualified Data.Text.Read as T
import qualified Data.Char as C -- shares the C alias, so C.isSpace resolves to Data.Char.isSpace
import qualified Data.ByteString.Char8 as C
import System.Environment
-- work around https://ghc.haskell.org/trac/ghc/ticket/10992
sum' xs = sum xs
readString :: IO [Int]
readString = map read . words <$> getLine
readText :: IO [Int]
readText = map parse . T.words <$> T.getLine
  where parse s = let Right (n, _) = T.decimal s in n
readChar8 :: IO [Int]
readChar8 = map parse . C.words <$> C.getLine
  where parse s = let Just (n, _) = C.readInt s in n
readHand :: IO [Int]
readHand = parse <$> C.getLine
  where parse = unfoldr go
        go s = do (n, s1) <- C.readInt s
                  let s2 = C.dropWhile C.isSpace s1
                  return (n, s2)
main = do [method] <- getArgs
          let reader = case method of
                "string" -> readString
                "text"   -> readText
                "char8"  -> readChar8
                "hand"   -> readHand
          print =<< sum' <$> reader
where:
./Sum string <data1.txt # 54.3 secs
./Sum text <data1.txt # 2.29 secs
./Sum char8 <data1.txt # 1.34 secs
./Sum hand <data1.txt # 0.91 secs

Reading numbers inline

Imagine I read an input block via stdin that looks like this:
3
12
16
19
The first number is the number of following rows. I have to process these numbers via a function and report the results separated by a space.
So I wrote this main function:
main = do
  num <- readLn
  putStrLn $ intercalate " " [ show $ myFunc $ read getLine | c <- [1..num]]
Of course that function doesn't compile because of the read getLine.
But what is the correct (read: the Haskell way) way to do this properly? Is it even possible to write this function as a one-liner?
Is it even possible to write this function as a one-liner?
Well, it is, and it's kind of concise, but see for yourself:
main = interact $ unwords . map (show . myFunc . read) . drop 1 . lines
So, how does this work?
interact :: (String -> String) -> IO () takes all contents from STDIN, passes it through the given function, and prints the output.
We use unwords . map (show . myFunc . read) . drop 1 . lines :: String -> String:
lines :: String -> [String] breaks a string at line ends.
drop 1 removes the first line, as we don't actually need the number of lines.
map (show . myFunc . read) converts each String to the correct type, applies myFunc, and then converts the result back to a String.
unwords is basically the same as intercalate " ".
However, keep in mind that interact isn't very GHCi friendly.
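One way around that is to keep the String -> String function separate from interact, so it can be called directly in GHCi. In this minimal sketch, myFunc = (* 3) and the name process are hypothetical placeholders:

```haskell
-- Hypothetical stand-in for the question's myFunc.
myFunc :: Int -> Int
myFunc = (* 3)

-- The whole program as a pure String -> String function:
-- skip the count line, transform each remaining line, join with spaces.
process :: String -> String
process = unwords . map (show . myFunc . read) . drop 1 . lines
```

main = interact process then runs it on stdin. The output has no trailing newline; composing (++ "\n") after process adds one.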
You can build a list of monadic actions with <$> (or fmap) and execute them all with sequence.
λ intercalate " " <$> sequence [show . (2*) . read <$> getLine | _ <- [1..4]]
1
2
3
4
"2 4 6 8"
Is it even possible to write this function as a one-liner?
Sure, but there is a problem with the last line of your main function: you're trying to apply intercalate " " to
[ show $ myFunc $ read getLine | c <- [1..num]]
I'm guessing you expect the latter to have type [String], but it is in fact not a well-typed expression. How can that be fixed? Let's first define
getOneInt :: IO Int
getOneInt = read <$> getLine
for convenience (we'll be using it multiple times in our code). Now, what you meant is probably something like
[ show . myFunc <$> getOneInt | c <- [1..num]]
which, if the type of myFunc aligns with the rest, has type [IO String]. You can then pass that to sequence in order to get a value of type IO [String] instead. Finally, you can "pass" that (using =<<) to
putStrLn . intercalate " "
in order to get the desired one-liner:
import Control.Monad ( replicateM )
import Data.List ( intercalate )
main :: IO ()
main = do
  num <- getOneInt
  putStrLn . intercalate " " =<< sequence [ show . myFunc <$> getOneInt | c <- [1..num]]
  where
    myFunc = (* 3) -- for example
    getOneInt :: IO Int
    getOneInt = read <$> getLine
In GHCi:
λ> main
3
45
23
1
135 69 3
Is the code idiomatic and readable, though? Not so much, in my opinion...
[...] what is the correct (read: the Haskell way) way to do this properly?
There is no "correct" way of doing it, but the following just feels more natural and readable to me:
import Control.Monad ( replicateM )
import Data.List ( intercalate )
main :: IO ()
main = do
  n <- getOneInt
  ns <- replicateM n getOneInt
  putStrLn $ intercalate " " $ map (show . myFunc) ns
  where
    myFunc = (* 3) -- replace by your own function
    getOneInt :: IO Int
    getOneInt = read <$> getLine
Alternatively, if you want to eschew the do notation:
main =
  getOneInt >>=
  flip replicateM getOneInt >>=
  putStrLn . intercalate " " . map (show . myFunc)
  where
    myFunc = (* 3) -- replace by your own function
    getOneInt :: IO Int
    getOneInt = read <$> getLine

What is the Haskell idiom for walking a file and filling a structure when only some of the data is interesting?

Often I find I need to parse a little bit of text. Usually the text is not lines of uniform data like CSV; rather, it is more unstructured. So the goal is not to turn each line into a Haskell data type but to gather up data into a structure.
In an imperative language I would write something like this.
values = {}  # could just as easily be a class or C struct
for line in input_lines:
    if line matches A:
        parse out interesting piece
        values[A] = parsed chunk
    elif line matches B:
        parse out interesting piece
        values[B] = parsed chunk
    ...
    elif line matches Z:
        parse out interesting piece
        values[Z] = parsed chunk
        break  # we know there is nothing else after this
do something with values
I wrote a bit of Haskell this morning to do the same thing using foldr.
This parses the output of rsync --stats. A sample file looks like this.
Number of files: 1
Number of files transferred: 0
Total file size: 4953701 bytes
Total transferred file size: 0 bytes
Literal data: 10 bytes
Matched data: 230 bytes
File list size: 43
File list generation time: 0.001 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 11
Total bytes received: 57
sent 11 bytes received 57 bytes 12.36 bytes/sec
total size is 4953701 speedup is 72848.54
It is small and simple, to demonstrate my problem. This particular file format is representative of a recurring style of problem where I want to quickly read 3 or 5 values from a file and do something else with the results. In an imperative language I'd just toss them into a few variables, a dictionary, something. The Haskell below is my attempt at a similar approach.
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Map as M
import qualified Data.Text as T
import Data.Text (Text)
import qualified Data.Text.IO as TIO
import Data.Text.Read (decimal)
import System.Environment (getArgs)
stats_map :: M.Map Text Int
stats_map = foldr (uncurry M.insert) M.empty [("Total file size", 1),
                                              ("Literal data", 2),
                                              ("Matched data", 3)]
getStatsMap :: Text -> M.Map Text Integer -> M.Map Text Integer
getStatsMap t rm = doMatch chunks rm
  where
    chunks = [ T.strip chunk | chunk <- T.splitOn ":" t ]
    doMatch :: [Text] -> M.Map Text Integer -> M.Map Text Integer
    doMatch (f1:f2:_) rm' =
      case M.lookup f1 stats_map of
        (Just _) -> case decimal . head . T.words $ f2 of
                      Left _ -> rm'
                      Right (x,_) -> M.insert f1 x rm'
        Nothing -> rm'
    doMatch _ rm' = rm'
parseStats :: [Text] -> M.Map Text Integer
parseStats ts = foldr getStatsMap M.empty ts
readStats :: FilePath -> IO [Text]
readStats filename = TIO.readFile filename >>= return . T.lines
main :: IO ()
main = do
  [filename] <- getArgs
  lines <- readStats filename
  putStrLn . show . parseStats $ lines
Unlike the imperative version, though, I cannot break out of the foldr early.
Laziness cannot rescue me here. Parsec, attoparsec and friends are both overkill and not quite what I'm looking for in this kind of task.
How can I approach this common imperative task in a more Haskell way?
I've gone for simple data structures to try to emphasise that the behaviour's there in the standard ones if you want it:
First version - using catMaybes and take to ignore irrelevant data and shortcut:
import Data.Maybe (catMaybes)
import Data.Char (isDigit)
import Control.Monad (msum)
-- maybe get an int if the key matches before :
get :: String -> String -> Maybe Int
get key input = let (l,r) = break (==':') input in
                if l == key then Just . read . filter isDigit $ r
                else Nothing
-- get any that match
getAny :: [String] -> String -> Maybe Int
getAny keys input = msum $ map (flip get input) keys
-- get all that match at least one
getThese :: [String] -> String -> [Int]
getThese keys = take (length keys) . catMaybes . map (getAny keys) . lines
This gives you the output you were after:
fmap (getThese ["Total file size","Literal data","Matched data"]) (readFile "example.txt") >>= print
[4953701,10,230]
and we can check that it's shortcutting by feeding it a bomb to eat:
> getThese ["a"] (unlines ["no","a: 5",undefined])
[5]
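If you'd rather end up with something closer to the dictionary of the imperative version, the same take-based shortcut can feed a Data.Map. This minimal sketch assumes that each wanted key occurs at most once and that its value is the first integer after the colon. statsOf is a hypothetical name.

```haskell
import qualified Data.Map as M
import Data.Char (isDigit)
import Data.Maybe (mapMaybe)

-- Collect the first integer after "key:" for every wanted key into a Map.
-- take (length keys) lets laziness stop reading once that many matches
-- have been found (assumes each wanted key occurs at most once).
statsOf :: [String] -> String -> M.Map String Integer
statsOf keys = M.fromList . take (length keys) . mapMaybe match . lines
  where
    match line =
      let (key, rest) = break (== ':') line
      in if key `elem` keys
           then case reads (dropWhile (not . isDigit) rest) of
                  [(n, _)] -> Just (key, n)
                  _        -> Nothing
           else Nothing
```

Like getThese, it survives the undefined bomb, because take never asks for the line after the last match.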
Sometimes recursion is simpler
Pick out one element for each predicate in order:
oneEach :: [(a->Bool)] -> [a] -> [a]
oneEach [] _ = []
oneEach _ [] = error "oneEach: run out of input while still looking"
oneEach qs@(p:ps) (i:is) | p i       = i : oneEach ps is
                         | otherwise = oneEach qs is
Compose some functions to split the string, pull out the lines we want, then read the data. This assumes you want all the digits to the right of the : as your Int.
getInOrder :: [String] -> String -> [Int]
getInOrder keys = map (read.filter isDigit.snd)
                . oneEach (map ((.fst).(==)) keys)
                . map (break (==':'))
                . lines
which works:
main = fmap (getInOrder ["Total file size","Literal data","Matched data"]) (readFile "example.txt") >>= print
[4953701,10,230]
This version is primitive in some ways (it hard-codes some things and doesn't handle ordering), but may be more readable:
import System.Environment (getArgs)
import Data.List.Utils
import Data.Char
main = do
  [filename] <- getArgs
  txt <- readFile filename
  let ls = lines txt
  let ils = filter interestingLine ls
  putStrLn $ show $ map fmt (filter (/="") ils)
interestingLine l = startswith "Literal data" l
                 || startswith "Matched data" l
                 || startswith "Total file size" l
fmt :: String -> (String,Int)
fmt l | startswith "Literal data" l = (take 14 l,(read $ filter isNumber l))
      | startswith "Matched data" l = (take 14 l,(read $ filter isNumber l))
      | startswith "Total file size" l = (take 17 l,(read $ filter isNumber l))
      | otherwise = error "fmt: unmatched line, look also at interestingLine"
