How do I read 1 byte from STDIN? - haskell

How do I read one byte from STDIN?
I tried searching for IO Word8 on Hoogle but there is nothing useful.
The closest I can find is System.IO.getChar, but it reads a Char not a Word8.

I think the easiest option is to use hGet from Data.ByteString like this:
import qualified Data.ByteString as B
import System.IO (stdin)
import Data.Word (Word8)
getByte :: IO Word8
getByte = B.head <$> B.hGet stdin 1
You can alternatively use hSetBinaryMode like this:
import System.IO
import Data.Word (Word8)
getByte :: IO Word8
getByte = do
hSetBinaryMode stdin True
c <- getChar
hSetBinaryMode stdin False
pure $! fromIntegral (fromEnum c)

Related

Efficient streaming and manipulation of a byte stream in Haskell

While writing a deserialiser for a large (<bloblength><blob>)* encoded binary file I got stuck with the various Haskell produce-transform-consume libraries. So far I'm aware of four streaming libraries:
Data.Conduit: Widely used, has very careful resource management
Pipes: Similar to conduit (Haskell Cast #6 nicely reveals the differences between conduit and pipes)
Data.Binary.Get: Offers useful functions such as getWord32be, but the streaming example is awkward
System.IO.Streams: Seems to be the easiest one to use
Here's a stripped down example of where things go wrong when I try to do Word32 streaming with conduit. A slightly more realistic example would first read a Word32 that determines the blob length and then yield a lazy ByteString of that length (which is then deserialised further).
But here I just try to extract Word32's in streaming fashion from a binary file:
module Main where
-- build-depends: bytestring, conduit, conduit-extra, resourcet, binary
import Control.Monad.Trans.Resource (MonadResource, runResourceT)
import qualified Data.Binary.Get as G
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as C
import qualified Data.ByteString.Lazy as BL
import Data.Conduit
import qualified Data.Conduit.Binary as CB
import qualified Data.Conduit.List as CL
import Data.Word (Word32)
import System.Environment (getArgs)
-- gets a Word32 from a ByteString.
getWord32 :: C.ByteString -> Word32
getWord32 bs = do
G.runGet G.getWord32be $ BL.fromStrict bs
-- should read BytesString and return Word32
transform :: (Monad m, MonadResource m) => Conduit BS.ByteString m Word32
transform = do
mbs <- await
case mbs of
Just bs -> do
case C.null bs of
False -> do
yield $ getWord32 bs
leftover $ BS.drop 4 bs
transform
True -> return ()
Nothing -> return ()
main :: IO ()
main = do
filename <- fmap (!!0) getArgs -- should check length getArgs
result <- runResourceT $ (CB.sourceFile filename) $$ transform =$ CL.consume
print $ length result -- is always 8188 for files larger than 32752 bytes
The output of the program is just the number of Word32's that were read. It turns out the stream terminates after reading the first chunk (about 32KiB). For some reason mbs is never Nothing, so I must check null bs which stops the stream when the chunk is consumed. Clearly, my conduit transform is faulty. I see two routes to a solution:
The await doesn't want to go to the second chunk of the ByteStream, so is there another function that pulls the next chunk? In examples I've seen (e.g. Conduit 101) this is not how it's done
This is just the wrong way to set up transform.
How is this done properly? Is this the right way to go? (Performance does matter.)
Update: Here's a BAD way to do it using Systems.IO.Streams:
module Main where
import Data.Word (Word32)
import System.Environment (getArgs)
import System.IO (IOMode (ReadMode), openFile)
import qualified System.IO.Streams as S
import System.IO.Streams.Binary (binaryInputStream)
import System.IO.Streams.List (outputToList)
main :: IO ()
main = do
filename : _ <- getArgs
h <- openFile filename ReadMode
s <- S.handleToInputStream h
i <- binaryInputStream s :: IO (S.InputStream Word32)
r <- outputToList $ S.connect i
print $ last r
'Bad' means: Very demanding in time and space, does not handle Decode exception.
Your immediate problem is caused by how you are using leftover. That function is used to "Provide a single piece of leftover input to be consumed by the next component in the current monadic binding", and so when you give it bs before looping with transform you are effectively throwing away the rest of the bytestring (i.e. what is after bs).
A correct solution based on your code would use the incremental input interface of Data.Binary.Get to replace your yield/leftover combination with something that consumes each chunk fully. A more pragmatic approach, though, is using the binary-conduit package, which provides that in the shape of conduitGet (its source gives a good idea of what a "manual" implementation would look like):
import Data.Conduit.Serialization.Binary
-- etc.
transform :: (Monad m, MonadResource m) => Conduit BS.ByteString m Word32
transform = conduitGet G.getWord32be
One caveat is that this will throw a parse error if the total number of bytes is not a multiple of 4 (i.e. the last Word32 is incomplete). In the unlikely case of that not being what you want, a lazy way out would be simply using \bs -> C.take (4 * truncate (C.length bs / 4)) bs on the input bytestring.
With pipes (and pipes-group and pipes-bytestring) the demo problem reduces to combinators. First we resolve the incoming undifferentiated byte stream into little 4 byte chunks:
chunksOfStrict :: (Monad m) => Int -> Producer ByteString m r -> Producer ByteString m r
chunksOfStrict n = folds mappend mempty id . view (Bytes.chunksOf n)
then we map these to Word32s and (here) count them.
main :: IO ()
main = do
filename:_ <- getArgs
IO.withFile filename IO.ReadMode $ \h -> do
n <- P.length $ chunksOfStrict 4 (Bytes.fromHandle h) >-> P.map getWord32
print n
This will fail if we have less than 4 bytes or otherwise fail to parse but we can as well map with
getMaybeWord32 :: ByteString -> Maybe Word32
getMaybeWord32 bs = case G.runGetOrFail G.getWord32be $ BL.fromStrict bs of
Left r -> Nothing
Right (_, off, w32) -> Just w32
The following program will then print the parses for the valid 4 byte sequences
main :: IO ()
main = do
filename:_ <- getArgs
IO.withFile filename IO.ReadMode $ \h -> do
runEffect $ chunksOfStrict 4 (Bytes.fromHandle h)
>-> P.map getMaybeWord32
>-> P.concat -- here `concat` eliminates maybes
>-> P.print
There are other ways of dealing with failed parses, of course.
Here, though, is something closer to the program you asked for. It takes a four byte segment from a byte stream (Producer ByteString m r) and reads it as a Word32 if it is long enough; it then takes that many of the incoming bytes and accumulates them into a lazy bytestring, yielding it. It just repeats this until it runs out of bytes. In main below, I print each yielded lazy bytestring that is produced:
module Main (main) where
import Pipes
import qualified Pipes.Prelude as P
import Pipes.Group (folds)
import qualified Pipes.ByteString as Bytes ( splitAt, fromHandle, chunksOf )
import Control.Lens ( view ) -- or Lens.Simple (view) -- or Lens.Micro ((.^))
import qualified System.IO as IO ( IOMode(ReadMode), withFile )
import qualified Data.Binary.Get as G ( runGet, getWord32be )
import Data.ByteString ( ByteString )
import qualified Data.ByteString.Lazy.Char8 as BL
import System.Environment ( getArgs )
splitLazy :: (Monad m, Integral n) =>
n -> Producer ByteString m r -> m (BL.ByteString, Producer ByteString m r)
splitLazy n bs = do
(bss, rest) <- P.toListM' $ view (Bytes.splitAt n) bs
return (BL.fromChunks bss, rest)
measureChunks :: Monad m => Producer ByteString m r -> Producer BL.ByteString m r
measureChunks bs = do
(lbs, rest) <- lift $ splitLazy 4 bs
if BL.length lbs /= 4
then rest >-> P.drain -- in fact it will be empty
else do
let w32 = G.runGet G.getWord32be lbs
(lbs', rest') <- lift $ splitLazy w32 bs
yield lbs
measureChunks rest
main :: IO ()
main = do
filename:_ <- getArgs
IO.withFile filename IO.ReadMode $ \h -> do
runEffect $ measureChunks (Bytes.fromHandle h) >-> P.print
This is again crude in that it uses runGet not runGetOrFail, but this is easily repaired. The pipes standard procedure would be to stop the stream transformation on a failed parse and return the unparsed bytestream.
If you were anticipating that the Word32s were for large numbers, so that you did not want to accumulate the corresponding stream of bytes as a lazy bytestring, but say write them to different files without accumulating, we could change the program pretty easily to do that. This would require a sophisticated use of conduit but is the preferred approach with pipes and streaming.
Here's a relatively straightforward solution that I want to throw into the ring. It's a repeated use of splitAt wrapped into a State monad that gives an interface identical to (a subset of) Data.Binary.Get. The resulting [ByteString] is obtained in main with a whileJust over getBlob.
module Main (main) where
import Control.Monad.Loops
import Control.Monad.State
import qualified Data.Binary.Get as G (getWord32be, runGet)
import qualified Data.ByteString.Lazy as BL
import Data.Int (Int64)
import Data.Word (Word32)
import System.Environment (getArgs)
-- this is going to mimic the Data.Binary.Get.Get Monad
type Get = State BL.ByteString
getWord32be :: Get (Maybe Word32)
getWord32be = state $ \bs -> do
let (w, rest) = BL.splitAt 4 bs
case BL.length w of
4 -> (Just w', rest) where
w' = G.runGet G.getWord32be w
_ -> (Nothing, BL.empty)
getLazyByteString :: Int64 -> Get BL.ByteString
getLazyByteString n = state $ \bs -> BL.splitAt n bs
getBlob :: Get (Maybe BL.ByteString)
getBlob = do
ml <- getWord32be
case ml of
Nothing -> return Nothing
Just l -> do
blob <- getLazyByteString (fromIntegral l :: Int64)
return $ Just blob
runGet :: Get a -> BL.ByteString -> a
runGet g bs = fst $ runState g bs
main :: IO ()
main = do
fname <- head <$> getArgs
bs <- BL.readFile fname
let ls = runGet loop bs where
loop = whileJust getBlob return
print $ length ls
There's no error handling in getBlob, but it's easy to extend. Time and space complexity is quite good, as long as the resulting list is used carefully. (The python script that creates some random data for consumption by the above is here).

Efficient chunking of conduit for strict bytestring

This is a followup to this earlier question. I have a conduit source (from Network.HTTP.Conduit) which is strict ByteString. I will like to recombine them into larger chunks (to send over network to another client, after another encoding and conversion to lazy bytestring). I wrote chunksOfAtLeast conduit, derived from the answer in above question which seems to work pretty well. I am wondering if there is any further scope for improving it performance-wise.
import Data.Conduit as C
import Control.Monad.IO.Class
import Control.Monad
import Data.Conduit.Combinators as CC
import Data.Conduit.List as CL
import Data.ByteString.Lazy as LBS hiding (putStrLn)
import Data.ByteString as BS hiding (putStrLn)
chunksOfAtLeast :: Monad m => Int -> Conduit BS.ByteString m BS.ByteString
chunksOfAtLeast chunkSize =
loop
where
loop = do
bs <- takeE chunkSize =$= ((BS.concat . ($ [])) <$> CL.fold (\front next -> front . (next:)) id)
unless (BS.null bs) $ do
yield bs
loop
main = do
yieldMany ["hello", "there", "world!"] $$ chunksOfAtLeast 8 =$ CL.mapM_ Prelude.print
Getting optimal performance is always a case of trying something and benchmarking it, so I can't tell you with certainty that I'm offering you something more efficient. That said, combining smaller chunks of data into larger chunks is a primary goal of builders, so leveraging them may be more efficient. Here's an example:
{-# LANGUAGE OverloadedStrings #-}
import Conduit
import Data.ByteString (ByteString)
import Data.ByteString.Builder (byteString)
import Data.Conduit.ByteString.Builder
bufferChunks :: Conduit ByteString IO ByteString
bufferChunks = mapC byteString =$= builderToByteString
main :: IO ()
main = yieldMany ["hello", "there", "world!"] $$ bufferChunks =$ mapM_C print

Converting ByteString Generated by System.Entropy to Text

When I run this code, I get a decode error from Data.Text. What am I doing wrong?
import Data.Text (Text, pack, unpack)
import Data.Text.Encoding (decodeUtf8)
import Data.ByteString (ByteString)
import System.Entropy
randBS :: IO ByteString
randBS = do
randBytes <- getEntropy 2048
return randBytes
main :: IO ()
main = do
r <- randBS
putStrLn $ unpack $ decodeUtf8 r
Runtime Error:
Cannot decode byte '\xc4': Data.Text.Internal.Encoding.Fusion.streamUtf8:
Invalid UTF-8 stream
I would like to generate some random bytes that will be used as an auth token.
I am on Mac OS X (Yosemite) and GHC Version 7.10.1
randBS returns random bytes not utf-8 encoded data!
What you have is not a representation of Text so it doesn't matter which function you use you will encounter some decoding error, and so you'll have to use something like decodeUtf8With and use an error handler to replace invalid bytes with their literal counterpart.
Something like:
import Data.Text (Text, pack, unpack)
import Data.Text.Encoding (decodeUtf8With)
import Data.ByteString (ByteString)
import Data.Char (chr)
import Control.Applicative ((<$>))
import System.Entropy
handler _ x = chr <$> fromIntegral <$> x
randBS :: IO ByteString
randBS = do
randBytes <- getEntropy 2048
return randBytes
main :: IO ()
main = do
r <- randBS
putStrLn $ unpack $ decodeUtf8With handler r
Not tested, in this moment I don't have GHC installed :s
Probably even better is to simply use hexadecimal encoding instead of utf-8 + error handler. You can do so with the base16-bytestring library. So you'd first use the encode :: ByteString -> ByteString to obtain a representation with only ASCII values:
import Data.Text (Text, pack, unpack)
import Data.ByteString (ByteString)
import Data.ByteString.Encoding (decodeUtf8)
import Data.ByteString.Base16 (encode)
import System.Entropy
--- ... randBS as before
main = do
r <- randBS
putStrLn $ unpack $ decodeUtf8 $ encode r

Idiomatic io-streams directory traversal

I was discussing some code on Reddit, and it made me curious about how this would be implemented in io-streams. Consider the following code which traverses a directory structure and prints out all of the filenames:
import Control.Exception (bracket)
import qualified Data.Foldable as F
import Data.Streaming.Filesystem (closeDirStream, openDirStream,
readDirStream)
import System.Environment (getArgs)
import System.FilePath ((</>))
printFiles :: FilePath -> IO ()
printFiles dir = bracket
(openDirStream dir)
closeDirStream
loop
where
loop ds = do
mfp <- readDirStream ds
F.forM_ mfp $ \fp' -> do
let fp = dir </> fp'
ftype <- getFileType fp
case ftype of
FTFile -> putStrLn fp
FTFileSym -> putStrLn fp
FTDirectory -> printFiles fp
_ -> return ()
loop ds
main :: IO ()
main = getArgs >>= mapM_ printFiles
Instead of simply printing the files, suppose we wanted to create some kind of streaming filepath representation. I know how this would work in enumerator, conduit, and pipes. However, since the intermediate steps require acquisition of a scarce resource (the DirStream), I'm not sure what the implementation would be for io-streams. Can someone provide an example of how that would be done?
For comparison, here's the conduit implementation, which is made possible via bracketP and MonadResource. And here's how the conduit code would be used to implemented the same file printing program as above:
import Control.Monad.IO.Class (liftIO)
import Control.Monad.Trans.Resource (runResourceT)
import Data.Conduit (($$))
import Data.Conduit.Filesystem (sourceDirectoryDeep)
import qualified Data.Conduit.List as CL
import System.Environment (getArgs)
main :: IO ()
main =
getArgs >>= runResourceT . mapM_ eachRoot
where
-- False means don't traverse dir symlinks
eachRoot root = sourceDirectoryDeep False root
$$ CL.mapM_ (liftIO . putStrLn)
Typical style would be to do something like this:
traverseDirectory :: RawFilePath -> (InputStream RawFilePath -> IO a) -> IO a
i.e. a standard "with-" function, with the obvious implementation.
Edit: added a working example implementation: https://gist.github.com/gregorycollins/00c51e7e33cf1f9c8cc0
It's not exactly complicated but it's also not as trivial as I had first suggested.

Reading in a binary file in haskell

How could I write a function with a definition something like...
readBinaryFile :: Filename -> IO Data.ByteString
I've got the functional parts of Haskell down, but the type system and monads still make my head hurt. Can someone write and explain how that function works to me?
import Data.ByteString.Lazy
readFile fp
easy as pie man. Knock off the lazy if you don't want the string to be lazy.
import Data.ByteString.Lazy as BS
import Data.Word
import Data.Bits
fileToWordList :: String -> IO [Word8]
fileToWordList fp = do
contents <- BS.readFile fp
return $ unpack contents
readBinaryFile :: Filename -> IO Data.ByteString
This is simply the Data.ByteString.readFile function, which you should never have to write, since it is in the bytestring package.

Resources