Neater binary file processing in Haskell

This is a general question inspired by a particular piece of code I wrote that I'm not happy with. I'm using Data.Binary.Get to grab some data from a binary file. Some of the code looks a bit like this:
import Data.Binary.Get
data Thing = Thing
  {
    aaa :: Integer,
    bbb :: Integer,
    ccc :: Integer
  } deriving (Show)
getThing :: Get Thing
getThing = do
  laaa <- getWord8 >>= \x -> return (toInteger x)
  lbbb <- getWord16host >>= \x -> return (toInteger x)
  lccc <- getWord32host >>= \x -> return (toInteger x)
  return $ Thing laaa lbbb lccc
The "getThing" function is really long. I am hoping there is a nice way to do something like the following pseudocode or maybe something even more concise.
[laaa, lbbb, lccc] <- MAGIC [getword8, getword16, getWord32] >>= \x -> return (toInteger x)
What have you got?

I would write:
getThing :: Get Thing
getThing = Thing <$> intFrom getWord8 <*> intFrom getWord16host <*> intFrom getWord32host
  where
    intFrom x = toInteger <$> x
The magic you are looking for is known as sequence, but you can't put Get Word8, Get Word16 and Get Word32 in the same list, so each parser has to be converted to Get Integer first:
getThing :: Get Thing
getThing = do
  [laaa, lbbb, lccc] <- sequence [toInteger <$> getWord8, toInteger <$> getWord16host, toInteger <$> getWord32host]
  return $ Thing laaa lbbb lccc
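To try the applicative version on actual bytes, something like the following sketch should do. The sample input, the Eq instance, and the switch to the big-endian readers (getWord16be, getWord32be) are my own assumptions, made only so that the result does not depend on the host's byte order.

```haskell
import Data.Binary.Get (Get, getWord8, getWord16be, getWord32be, runGet)
import qualified Data.ByteString.Lazy as BL

data Thing = Thing
  { aaa :: Integer
  , bbb :: Integer
  , ccc :: Integer
  } deriving (Show, Eq)

-- Lift any integral parser to a parser that yields an Integer.
intFrom :: Integral a => Get a -> Get Integer
intFrom = fmap toInteger

-- Big-endian readers are an assumption here; the question used the
-- host-order variants.
getThing :: Get Thing
getThing = Thing <$> intFrom getWord8 <*> intFrom getWord16be <*> intFrom getWord32be

-- Seven sample bytes: one for aaa, two for bbb, four for ccc.
sample :: BL.ByteString
sample = BL.pack [0x01, 0x00, 0x02, 0x00, 0x00, 0x00, 0x03]

-- Run the parser over the sample input.
parsed :: Thing
parsed = runGet getThing sample
```

Loading this in ghci and evaluating parsed should give a Thing whose three fields are 1, 2 and 3.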

Related

Point free version for readMaybe

I want to write a function to read an Int without do notation. It works (see below), but I was wondering if the bit around readMaybe can be written in point-free form (or cleaned up a bit in some other way)?
main :: IO ()
main = getLine >>= (\x -> return $ (readMaybe x :: Maybe Int)) >>= print

Step 1: Replace the lambda with its pointfree equivalent:
main :: IO ()
main = getLine >>= return . (readMaybe :: String -> Maybe Int) >>= print
Step 2: Replace m >>= return . f with f <$> m:
main :: IO ()
main = (readMaybe :: String -> Maybe Int) <$> getLine >>= print
Step 3: Replace f <$> m >>= g with m >>= g . f:
main :: IO ()
main = getLine >>= print . (readMaybe :: String -> Maybe Int)
Step 4: Use a type application instead of writing out a long, awkward type:
{-# LANGUAGE TypeApplications #-}
main :: IO ()
main = getLine >>= print . readMaybe @Int
As an alternative to using <$> in steps 2 and 3, you can accomplish the same with just the monad laws, like this (picking up after step 1):
Replace m >>= f >>= g with m >>= \x -> f x >>= g (associativity):
main :: IO ()
main = getLine >>= \x -> (return . (readMaybe :: String -> Maybe Int)) x >>= print
Simplify the . away:
main :: IO ()
main = getLine >>= \x -> return ((readMaybe :: String -> Maybe Int) x) >>= print
Replace return x >>= f with f x (left identity):
main :: IO ()
main = getLine >>= \x -> print ((readMaybe :: String -> Maybe Int) x)
Now just replace that new lambda with its pointfree equivalent, and you end up in the exact same place as step 3.
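None of these rewrites is specific to IO, so one way to convince yourself that steps 1 to 3 agree is to write them against an arbitrary monad and compare results in a monad whose values can be tested for equality. This is only a sketch: the names parseInt, step1, step2 and step3 are made up for illustration, m stands in for getLine and k stands in for print.

```haskell
{-# LANGUAGE TypeApplications #-}
import Text.Read (readMaybe)

-- The parser from the question, with its type fixed by a type application.
parseInt :: String -> Maybe Int
parseInt = readMaybe @Int

-- The pipeline shapes of steps 1, 2 and 3, with m in place of getLine
-- and a continuation k in place of print.
step1, step2, step3 :: Monad m => m String -> (Maybe Int -> m r) -> m r
step1 m k = m >>= return . parseInt >>= k
step2 m k = parseInt <$> m >>= k
step3 m k = m >>= k . parseInt
```

For example, in the Maybe monad, step1 (Just "42") Just, step2 (Just "42") Just and step3 (Just "42") Just should all evaluate to the same value.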

Why doesn't this simple composition work?

I was recently in need of putting head in between two monadic operations. Here's the SSCCE:
module Main where
f :: IO [Int]
f = return [1..5]
g :: Int -> IO ()
g = print
main = do
  putStrLn "g <$> head <$> f"
  g <$> head <$> f
  putStrLn "g . head <$> f"
  g . head <$> f
  putStrLn "head <$> f >>= g"
  head <$> f >>= g
This program is well-formed and compiles without warnings. However, only one version out of 3 above works [1]. Why is that?
And specifically, what would be the best way to link f and g together with head in the middle? I ended up using the 3rd one (in the form of do notation), but I don't really like it, since it should be a trivial one-liner [2].
[1] Spoiler alert: the 3rd one is the only one that prints 1; the other two are silent, both under runhaskell and repl.
[2] I do realize that those are all one-liners, but the order of operations feels really confusing in the only one that works.

Probably the best way to write this is:
f >>= g . head
or in a more verbose form:
f >>= (g . head)
Here we run f, take the head of the list it produces, and pass that element to g. Writing the same thing with fmap,
(head <$> f) >>= g
is semantically the same.
But now what happens if we use g <$> head <$> f? Let us first analyze the types:
f :: IO [Int]
g :: Int -> IO ()
(<$>) :: Functor m => (a -> b) -> m a -> m b
(I used m here to avoid confusion with the f function)
The canonical form of this is:
((<$>) ((<$>) g head) f)
The second (<$>) takes a g :: Int -> IO () and head :: [c] -> c as parameters, so that means that a ~ Int, b ~ IO (), and m ~ (->) [c]. So the result is:
(<$>) g head :: (->) [c] (IO ())
or less verbose:
g <$> head :: [c] -> IO ()
The first (<$>) function thus takes as parameters g <$> head :: [c] -> IO (), and IO [Int], so that means that m ~ IO, a ~ [Int], c ~ Int, b ~ IO (), and hence we obtain the type:
(<$>) (g <$> head) f :: IO (IO ())
We thus do not perform any real action: we fmap the [Int] list to an IO action (that is wrapped in the IO). You could see it as return (print 1): you do not "evaluate" the print 1, but you return that wrapped in an IO.
You can of course "absorb" the outer IO here, and then use the inner IO, like:
evalIO :: IO (IO f) -> IO f
evalIO res = do
  f <- res
  f
or shorter:
evalIO :: IO (IO f) -> IO f
evalIO res = res >>= id
(this can be generalized to all sorts of Monads, but this is irrelevant here).
The evalIO is also known as join :: Monad m => m (m a) -> m a.
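To see the difference without relying on what gets printed, here is a small sketch in which g appends to a log instead of calling print. That logging g, the IORef it writes to, and the names nested and demo are assumptions of mine, not part of the question.

```haskell
import Control.Monad (join)
import Data.IORef (IORef, modifyIORef, newIORef, readIORef)

f :: IO [Int]
f = return [1 .. 5]

-- Stand-in for g = print (an assumption made so the effect is observable):
-- it appends its argument to a log instead of writing to stdout.
g :: IORef [Int] -> Int -> IO ()
g ref x = modifyIORef ref (++ [x])

-- Builds the inner action but never runs it.
nested :: IORef [Int] -> IO (IO ())
nested ref = g ref . head <$> f

-- Returns the log after only building the action, after join,
-- and after the >>= form.
demo :: IO ([Int], [Int], [Int])
demo = do
  ref <- newIORef []
  _ <- nested ref            -- inner action discarded, nothing is logged
  afterFmap <- readIORef ref
  join (nested ref)          -- inner action is performed
  afterJoin <- readIORef ref
  f >>= g ref . head         -- the one-liner from this answer
  afterBind <- readIORef ref
  return (afterFmap, afterJoin, afterBind)
```

Running demo should yield ([],[1],[1,1]): the fmap-only version leaves the log empty, while join and >>= each perform g once.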

The first and second are exactly the same: <$> is left-associative, head is a function, and <$> for the function functor is just (.). Then,
g . head <$> f
  = fmap (print . head) (return [1..5] :: IO [Int])
  = do { x <- (return [1..5] :: IO [Int])
       ; return ( print (head x) ) }
  = do { let x = [1..5]
       ; return ( print (head x) ) } :: IO _whatever
  = return ( print 1 ) :: IO (IO ())
We have one too many returns there. In fact,
  = fmap (print . head) (return [1..5] :: IO [Int])
  = return (print (head [1..5]))
  = return (print 1)
is a shorter derivation.
The third one is
(head <$> f) >>= g
  = (fmap head $ return [1..5]) >>= print
  = (return (head [1..5])) >>= print
  = (return 1) >>= print
which is obviously OK.

Lazy list wrapped in IO

Suppose the code
f :: IO [Int]
f = f >>= return . (0 :)
g :: IO [Int]
g = f >>= return . take 3
When I run g in ghci, it causes a stack overflow. But I was thinking maybe it could be evaluated lazily and produce [0, 0, 0] wrapped in IO. I suspect IO is to blame here, but I really have no idea. Obviously the following works:
f' :: [Int]
f' = 0 : f'
g' :: [Int]
g' = take 3 f'
Edit: In fact I am not interested in having such a simple function f, original code looked more along the lines:
h :: a -> IO [Either b c]
h a = do
  (r, a') <- h' a
  case r of
    x@(Left _) -> h a' >>= return . (x :)
    y@(Right _) -> return [y]
h' :: a -> IO (Either b c, a)
-- something non trivial
main :: IO ()
main = mapM_ print . take 3 =<< h a
h does some IO computations and stores invalid (Left) responses in a list until a valid response (Right) is produced. The attempt is to construct the list lazily even though we are in the IO monad. So that someone reading the result of h can start consuming the list even before it is complete (because it may even be infinite). And if the one reading the results cares only for the first 3 entries no matter what, the rest of the list does not even have to be constructed. And I am getting the feeling that this will not be possible :/.

Yes, IO is to blame here. >>= for IO is strict in the "state of the world". If you write m >>= h, you'll get an action that first performs the action m, then applies h to the result, and finally performs the action h yields. It doesn't matter that your f action doesn't "do anything"; it has to be performed anyway. Thus you end up in an infinite loop starting the f action over and over.
Thankfully, there is a way around this, because IO is an instance of MonadFix. You can "magically" access the result of an IO action from within that action. Critically, that access must be sufficiently lazy, or you'll throw yourself into an infinite loop.
import Control.Monad.Fix
import Data.Functor ((<$>))
f :: IO [Int]
f = mfix (\xs -> return (0 : xs))
-- This `g` is just like yours, but prettier IMO
g :: IO [Int]
g = take 3 <$> f
There's even a bit of syntactic sugar in GHC for this letting you use do notation with the rec keyword or mdo notation.
{-# LANGUAGE RecursiveDo #-}
f' :: IO [Int]
f' = do
  rec res <- (0:) <$> (return res :: IO [Int])
  return res
f'' :: IO [Int]
f'' = mdo
  res <- f'
  return (0 : res)
For more interesting examples of ways to use MonadFix, see the Haskell Wiki.

It sounds like you want a monad that mixes the capabilities of lists and IO. Luckily, that's just what ListT is for. Here's your example in that form, with an h' that computes the Collatz sequence and asks the user how they feel about each element in the sequence (I couldn't really think of anything convincing that fit the shape of your outline).
import Control.Monad.IO.Class
import qualified ListT as L
h :: Int -> L.ListT IO (Either String ())
h a = do
  (r, a') <- liftIO (h' a)
  case r of
    x@(Left _) -> L.cons x (h a')
    y@(Right _) -> return y
h' :: Int -> IO (Either String (), Int)
h' 1 = return (Right (), 1)
h' n = do
  putStrLn $ "Say something about " ++ show n
  s <- getLine
  return (Left s, if even n then n `div` 2 else 3*n + 1)
main = readLn >>= L.traverse_ print . L.take 3 . h
Here's how it looks in ghci:
> main
2
Say something about 2
small
Left "small"
Right ()
> main
3
Say something about 3
prime
Left "prime"
Say something about 10
not prime
Left "not prime"
Say something about 5
fiver
Left "fiver"
I suppose modern approaches would use pipes or conduits or iteratees or something, but I don't know enough about them to talk about the tradeoffs compared to ListT.

I'm not sure if this is an appropriate usage, but unsafeInterleaveIO would get you the behavior you're asking for, by deferring the IO actions of f until the value inside of f is asked for:
module Tmp where
import System.IO.Unsafe (unsafeInterleaveIO)
f :: IO [Int]
f = unsafeInterleaveIO f >>= return . (0 :)
g :: IO [Int]
g = f >>= return . take 3
*Tmp> g
[0,0,0]
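The same trick can be applied to the shape of h from the question's edit. The following is only a sketch under my own assumptions: step is a hypothetical stand-in for h' that bumps a counter as its only effect and always answers Left, so the complete list would be infinite.

```haskell
import Control.Exception (evaluate)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)
import System.IO.Unsafe (unsafeInterleaveIO)

-- Hypothetical stand-in for h': bumps a counter (the observable effect)
-- and always answers Left, so the full list would never end.
step :: IORef Int -> Int -> IO (Either Int (), Int)
step counter n = do
  modifyIORef' counter (+ 1)
  return (Left n, n + 1)

-- Same shape as h, but the recursive call is deferred until the tail
-- of the list is actually demanded.
h :: IORef Int -> Int -> IO [Either Int ()]
h counter a = do
  (r, a') <- step counter a
  case r of
    Left _  -> (r :) <$> unsafeInterleaveIO (h counter a')
    Right _ -> return [r]

-- Takes three elements, forces them, and reports how many steps ran.
demo :: IO ([Either Int ()], Int)
demo = do
  counter <- newIORef 0
  xs <- take 3 <$> h counter 0
  _ <- evaluate (length xs)
  steps <- readIORef counter
  return (xs, steps)
```

Here demo should return ([Left 0,Left 1,Left 2],3), i.e. only the three steps that were actually consumed get performed. The price is that the effects now happen whenever the list is forced, which is the usual caveat with unsafeInterleaveIO.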

Mutually recursive IO definitions

I can write the following:
f :: [Int] -> [Int]
f x = 0:(map (+1) x)
g :: [Int] -> [Int]
g x = map (*2) x
a = f b
b = g a
main = print $ take 5 a
And things work perfectly fine (ideone).
However, let's say I want g to do something more complex than multiply by 2, like ask the user for a number and add that, like so:
g2 :: [Int] -> IO [Int]
g2 = mapM (\x -> getLine >>= (return . (+x) . read))
How do I then, well, tie the knot?
Clarification:
Basically I want the list of Ints from f to be the input of g2 and the list of Ints from g2 to be the input of f.

The effectful generalization of lists is ListT:
import Control.Monad
import Pipes
f :: ListT IO Int -> ListT IO Int
f x = return 0 `mplus` fmap (+ 1) x
g2 :: ListT IO Int -> ListT IO Int
g2 x = do
  n <- x
  n' <- lift (fmap read getLine)
  return (n' + n)
a = f b
b = g2 a
main = runListT $ do
  n <- a
  lift (print n)
  mzero
You can also implement take-like functionality with a little extra code:
import qualified Pipes.Prelude as Pipes
take' :: Monad m => Int -> ListT m a -> ListT m a
take' n l = Select (enumerate l >-> Pipes.take n)
main = runListT $ do
  n <- take' 5 a
  lift (print n)
  mzero
Example session:
>>> main
0
1<Enter>
2
2<Enter>
3<Enter>
7
4<Enter>
5<Enter>
6<Enter>
18
7<Enter>
8<Enter>
9<Enter>
10<Enter>
38
You can learn more about ListT by reading the pipes tutorial, specifically the section on ListT.

MonadFix instance for Rand monad

I would like to generate an infinite stream of numbers with the Rand monad from System.Random.MWC.Monad. If only there were a MonadFix instance for this monad, or an instance like this:
instance (PrimMonad m) => MonadFix m where
  ...
then one could write:
runWithSystemRandom (mfix (\ xs -> uniform >>= \x -> return (x:xs)))
There isn't one though.
I was going through MonadFix docs but I don't see an obvious way of implementing this instance.

You can write a MonadFix instance. However, the code will not generate an infinite stream of distinct random numbers. The argument to mfix is a function that calls uniform exactly once. When the code is run, it will call uniform exactly once, and create an infinite list containing the result.
You can try the equivalent IO code to see what happens:
import System.Random
import Control.Monad.Fix
main = print . take 10 =<< mfix (\xs -> randomIO >>= (\x -> return (x : xs :: [Int])))
It seems that you want to use a stateful random number generator, and you want to run the generator and collect its results lazily. That isn't possible without careful use of unsafePerformIO. Unless you need to produce many random numbers quickly, you can use a pure RNG function such as randomRs instead.
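If you don't have the random package at hand, the same behaviour can be observed with base alone. In this sketch, draw is a hypothetical stand-in for uniform or randomIO that hands out 0, 1, 2, ... from an IORef, so repeated draws would be easy to tell apart.

```haskell
import Control.Monad.Fix (mfix)
import Data.IORef (IORef, newIORef, readIORef, writeIORef)

-- Hypothetical stand-in for uniform/randomIO: hands out 0, 1, 2, ...
-- so that separate draws are distinguishable.
draw :: IORef Int -> IO Int
draw ref = do
  n <- readIORef ref
  writeIORef ref (n + 1)
  return n

-- Returns a prefix of the "stream" and the number of draws performed.
demo :: IO ([Int], Int)
demo = do
  ref <- newIORef 0
  xs <- mfix (\rest -> draw ref >>= \x -> return (x : rest))
  let prefix = take 5 xs
  draws <- readIORef ref
  return (prefix, draws)
```

demo should give ([0,0,0,0,0],1): a single draw, whose result is repeated forever.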

A question: how do you wish to generate your initial seed?
The problem is that MWC is built on the "primitive" package, which abstracts only IO and strict (Control.Monad.ST.ST s). It does not also abstract lazy (Control.Monad.ST.Lazy.ST s).
Perhaps one could make instances for "primitive" to cover lazy ST and then MWC could be lazy.
UPDATE: I can make this work using Control.Monad.ST.Lazy by using strictToLazyST:
module Main where
import Control.Monad(replicateM)
import qualified Control.Monad.ST as S
import qualified Control.Monad.ST.Lazy as L
import qualified System.Random.MWC as A
foo :: Int -> L.ST s [Int]
foo i = do rest <- foo $! succ i
           return (i:rest)
splam :: A.Gen s -> S.ST s Int
splam = A.uniformR (0,100)
getS :: Int -> S.ST s [Int]
getS n = do gen <- A.create
            replicateM n (splam gen)
getL :: Int -> L.ST s [Int]
getL n = do gen <- createLazy
            replicateM n (L.strictToLazyST (splam gen))
createLazy :: L.ST s (A.Gen s)
createLazy = L.strictToLazyST A.create
makeLots :: A.Gen s -> L.ST s [Int]
makeLots gen = do x <- L.strictToLazyST (A.uniformR (0,100) gen)
                  rest <- makeLots gen
                  return (x:rest)
main = do
  print (S.runST (getS 8))
  print (L.runST (getL 8))
  let inf = L.runST (foo 0) :: [Int]
  print (take 10 inf)
  let inf3 = L.runST (createLazy >>= makeLots) :: [Int]
  print (take 10 inf3)
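The lazy ST idea can also be tried without mwc-random. The sketch below uses only base; the toy linear congruential generator in next is my own stand-in for MWC (it is not a good generator), and the names next, stream and randoms are made up for illustration.

```haskell
import qualified Control.Monad.ST.Lazy as L
import Data.STRef.Lazy (STRef, newSTRef, readSTRef, writeSTRef)

-- Toy generator step (a stand-in for MWC, not a serious RNG):
-- advances the state and returns a value between 0 and 100.
next :: STRef s Int -> L.ST s Int
next ref = do
  s <- readSTRef ref
  let s' = (s * 1103515245 + 12345) `mod` 2147483648
  writeSTRef ref s'
  return (s' `mod` 101)

-- Infinite stream of draws; productive because lazy ST only runs
-- as many steps as the consumer demands.
stream :: STRef s Int -> L.ST s [Int]
stream ref = do
  x <- next ref
  rest <- stream ref
  return (x : rest)

-- Pure interface: an infinite list determined by the seed.
randoms :: Int -> [Int]
randoms seed = L.runST (newSTRef seed >>= stream)
```

With this, take 10 (randoms 42) terminates even though stream never does, and each element comes from a separate step of the generator.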

(This would be better suited as a comment to Heatsink's answer, but it's a bit too long.)
MonadFix instances must adhere to several laws. One of them is left shrinking (or tightening):
mfix (\x -> a >>= \y -> f x y) = a >>= \y -> mfix (\x -> f x y)
This law lets us rewrite your expression as
mfix (\xs -> uniform >>= \x -> return (x:xs))
  = uniform >>= \x -> mfix (\xs -> return (x:xs))
  = uniform >>= \x -> mfix (return . (x :))
Using another law, purity mfix (return . h) = return (fix h), we can further simplify to
  = uniform >>= \x -> return (fix (x :))
and using the standard monad laws and rewriting fix (x :) as repeat x
  = liftM (\x -> fix (x :)) uniform
  = liftM repeat uniform
Therefore, the result is indeed one invocation of uniform and then just repeating the single value indefinitely.
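The derivation can be spot-checked in any monad that already has a lawful MonadFix instance. In this sketch Maybe stands in for the Rand monad (my assumption, chosen because its results can be compared), a "draw" is just a Maybe Int, and the names viaMfix, viaFix and viaRepeat are made up for the three stages of the rewrite.

```haskell
import Control.Monad.Fix (mfix)
import Data.Function (fix)

-- Starting point of the derivation, with Maybe in place of Rand.
viaMfix :: Maybe Int -> Maybe [Int]
viaMfix draw = mfix (\xs -> draw >>= \x -> return (x : xs))

-- After applying left shrinking and purity.
viaFix :: Maybe Int -> Maybe [Int]
viaFix draw = draw >>= \x -> return (fix (x :))

-- Final form: one draw, repeated forever (fmap instead of liftM).
viaRepeat :: Maybe Int -> Maybe [Int]
viaRepeat draw = repeat <$> draw
```

Comparing finite prefixes, for instance take 4 <$> viaMfix (Just 3) against the same prefix of viaFix and viaRepeat, should give identical results at every stage.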
