How to check that two files are equal in Haskell - haskell

I'm learning Haskell and I need to compare two files.
I did not find a function that does this, so I coded it myself. Below is the function I came up with.
cmpFiles :: FilePath -> FilePath -> IO Bool
cmpFiles a b = withBinaryFile a ReadMode $ \ha ->
withBinaryFile b ReadMode $ \hb ->
fix (\loop -> do
isEofA <- hIsEOF ha
isEofB <- hIsEOF hb
if | isEofA && isEofB -> return True -- both files reached EOF
| isEofA || isEofB -> return False -- only one reached EOF
| otherwise -> do -- read content
x <- hGet ha 4028 -- TODO: How to use a constant?
y <- hGet hb 4028 -- TODO: How to use a constant?
if x /= y
then return False -- different content
else loop -- same content, contunue...
)
My questions are:
Is this code idiomatic? It looks very imperative rather than functional.
Is this code efficient (Layz IO issues with big files, performance...)?
Is there a more compact way to write it?

How about
cmpFiles a b = do
aContents <- readFile a
bContents <- readFile b
return (aContents == bContents)

You can even make a one-liner out of it:
cmpFiles a b = liftM2 (==) (readFile a) (readFile b)
This one is actually equivalent to Reid Barton's solution. Equivalent is not a weasel word here, if you take the definition of liftM2 from hackage
liftM2 f m1 m2 = do { x1 <- m1; x2 <- m2; return (f x1 x2) }
and insert (==) and the readFiles you are there immediately.
Laziness is your friend in haskell. The documentation of readFile states that the input is read lazily, i.e. only on demand. == is lazy, too. Thus the entire liftM22 ... reads the files only until it finds a difference.

cmpFiles a b = (==) <$> readFile a <*> readFile b

Related

How to get this function to be evaluated lazily

I have the following function:
main = do xs <- getContents
edLines <- ed $ lines xs
putStr $ unlines edLines
Firstly I used the working version main = interact (unlines . ed . lines) but changed the signature of ed since. Now it returns IO [String] instead of just [String] so I can't use this convenient definition any more.
The problem is that now my function ed is still getting evaluated partly but nothing is displayed till I close the stdin via CTRL + D.
Definition of ed:
ed :: Bool -> [EdCmdLine] -> IO EdLines
ed xs = concatM $ map toLinesExt $ scanl (flip $ edLine defHs) (return [Leaf ""]) xs where
toLinesExt :: IO [EdState] -> IO EdLines
toLinesExt rsIO = do
rs#(r:_) <- rsIO -- todo add fallback pattern with (error)
return $ fromEd r ++ [" "]
The scanl is definitely evaluated lazy because edLine is getting evaluated for sure (observable by the side effects).
I think it could have to do with concatM:
concatM :: (Foldable t, Monad m) => t (m [a]) -> m [a]
concatM xsIO = foldr (\accIO xIO -> do {x <- xIO; acc <- accIO; return $ acc ++ x}) (return []) xsIO
All I/O in Haskell is explicitly ordered. The last two lines of your main function desugar into something like
ed (lines xs) >>= (\edLines -> putStr $ unlines edLines)
>>= sequences all of the I/O effects on the left before all of those on the right. You're constructing an I/O action of the form generate line 1 >> ... >> generate line n >> output line 1 >> ... >> output line n.
This isn't really an evaluation order issue, it's a correctness issue. An implementation is free to evaluate in any order it wants, but it can't change the ordering of I/O actions that you specified, any more than it can reorder the elements of a list.
Here's a toy example showing what you need to do:
lineProducingActions :: [IO String]
lineProducingActions = replicate 10 getLine
wrongOrder, correctOrder :: IO ()
wrongOrder = do
xs <- sequence lineProducingActions
mapM_ putStrLn xs
correctOrder = do
let xs = [x >>= putStrLn | x <- lineProducingActions]
sequence_ xs
Note that you can decouple the producer and consumer while getting the ordering you want. You just need to avoid combining the I/O actions in the producer. I/O actions are pure values that can be manipulated just like any other values. They aren't side-effectful expressions that happen immediately as they're written. They happen, rather, in whatever order you glue them together in.
You would need to use unsafeInterleaveIO to schedule some of your IO actions for later. Beware that the IO actions may then be executed in a different order than you might first expect!
However, I strongly recommend not doing that. Change your IO [String] action to print each line as it's produced instead.
Alternately, if you really want to maintain the computation-as-pipeline view, check out one of the many streaming libraries available on Hackage (streamly, pipes, iteratees, conduit, machines, and probably half a dozen others).
Thanks to #benrg answer I was able to solve the issue with the following code:
ed :: [EdCmdLine] -> [IO EdLines]
ed cmds = map (>>= return . toLines . head) $ edHistIO where
toLines :: EdState -> EdLines
toLines r = fromEd r ++ [" "]
edHistIO = edRec defHs cmds (return [initState])
edRec :: [HandleHandler] -> [EdCmdLine] -> IO EdHistory -> [IO EdHistory]
edRec _ [] hist = [hist] -- if CTRL + D
edRec defHs (cmd:cmds) hist = let next = edLine defHs cmd hist in next : edRec defHs cmds next
main = getContents >>= mapM_ (>>= (putStr . unlines)) . ed . lines

Is this syntax as expressive as the do-notation?

The do notation allows us to express monadic code without overwhelming nestings, so that
main = getLine >>= \ a ->
getLine >>= \ b ->
putStrLn (a ++ b)
can be expressed as
main = do
a <- getLine
b <- getLine
putStrLn (a ++ b)
Suppose, though, the syntax allows ... #expression ... to stand for do { x <- expression; return (... x ...) }. For example, foo = f a #(b 1) c would be desugared as: foo = do { x <- b 1; return (f a x c) }. The code above could, then, be expressed as:
main = let a = #getLine in
let b = #getLine in
putStrLn (a ++ b)
Which would be desugared as:
main = do
x <- getLine
let a = x in
return (do
x' <- getLine
let b = x' in
return (putStrLn (a ++ b)))
That is equivalent. This syntax is appealing to me because it seems to offer the same functionality as the do-notation, while also allowing some shorter expressions such as:
main = putStrLn (#(getLine) ++ #(getLine))
So, I wonder if there is anything defective with this proposed syntax, or if it is indeed complete and equivalent to the do-notation.
putStrLn is already String -> IO (), so your desugaring ... return (... return (putStrLn (a ++ b))) ends up having type IO (IO (IO ())), which is likely not what you wanted: running this program won't print anything!
Speaking more generally, your notation can't express any do-block which doesn't end in return. [See Derek Elkins' comment.]
I don't believe your notation can express join, which can be expressed with do without any additional functions:
join :: Monad m => m (m a) -> m a
join mx = do { x <- mx; x }
However, you can express fmap constrained to Monad:
fmap' :: Monad m => (a -> b) -> m a -> m b
fmap' f mx = f #mx
and >>= (and thus everything else) can be expressed using fmap' and join. So adding join would make your notation complete, but still not convenient in many cases, because you end up needing a lot of joins.
However, if you drop return from the translation, you get something quite similar to Idris' bang notation:
In many cases, using do-notation can make programs unnecessarily verbose, particularly in cases such as m_add above where the value bound is used once, immediately. In these cases, we can use a shorthand version, as follows:
m_add : Maybe Int -> Maybe Int -> Maybe Int
m_add x y = pure (!x + !y)
The notation !expr means that the expression expr should be evaluated and then implicitly bound. Conceptually, we can think of ! as being a prefix function with the following type:
(!) : m a -> a
Note, however, that it is not really a function, merely syntax! In practice, a subexpression !expr will lift expr as high as possible within its current scope, bind it to a fresh name x, and replace !expr with x. Expressions are lifted depth first, left to right. In practice, !-notation allows us to program in a more direct style, while still giving a notational clue as to which expressions are monadic.
For example, the expression:
let y = 42 in f !(g !(print y) !x)
is lifted to:
let y = 42 in do y' <- print y
x' <- x
g' <- g y' x'
f g'
Adding it to GHC was discussed, but rejected (so far). Unfortunately, I can't find the threads discussing it.
How about this:
do a <- something
b <- somethingElse a
somethingFinal a b

Haskell memory usage and IO

I had just written a piece of Haskell code where in order to debug my code I put in a bunch of print statements in my code (so, my most important function returned IO t, when it just needed to return t) and I saw that this function, on a successful run, would take up a lot of memory (roughly 1.2GB). Once I saw that the program was working fine, I removed all the print statements from the function and ran it, only to realize that it was giving me this error:
Stack space overflow: current size 8388608 bytes.
Use `+RTS -Ksize -RTS' to increase it.
Even though it was the same exact piece of code, for some reason the print statements made it ignore stack space overflow. Can anyone enlighten me as to why this happens?
I know I haven't provided my code which might make it harder to answer this question, but I've hacked a bunch of things together and it doesn't look very pretty so I doubt it would be useful and I am fairly certain that the only difference is the print statements.
EDIT:
Since people really wanted to see the code here is the relevant part:
linkCallers :: ([Int], Int, Int, I.IntDisjointSet, IntMap Int) -> ([Int], Int, Int, I.IntDisjointSet, IntMap Int)
linkCallers ([], x, y, us, im) = ([], x, y, us, im)
linkCallers ((a:b:r), x, y, us, im) = if a == b
then (r, x, y+1, us, im)
else if sameRep
then (r, x+1, y+1, us, im)
else (r, x+1, y+1, us', im')
where
ar = fst $ I.lookup a us
br = fst $ I.lookup b us
sameRep = case ar of
Nothing -> False
_ -> ar == br
as' = ar >>= flip lookup im
bs' = br >>= flip lookup im
totalSize = do
asize <- as'
bsize <- bs'
return $ asize + bsize
maxSize = (convertMaybe as') + (convertMaybe bs')
us' = I.union a b $ I.insert a $ I.insert b $ us
newRep = fromJust $ fst $ I.lookup a us'
newRep' = fromJust $ fst $ I.lookup b us'
im'' = case ar of
Nothing -> case br of
Nothing -> im
Just bk -> delete bk im
Just ak -> delete ak $ case br of
Nothing -> im
Just bk -> delete bk im
im' = case totalSize of
Nothing -> insert newRep maxSize im''
Just t -> insert newRep t im''
startLinkingAux' (c,x,y,us,im) = let t#(_,x',_,us',im') = linkCallers (c,x,y,us,im) in
case (fst $ I.lookup primeMinister us') >>= flip lookup im' >>= return . (>=990000) of
Just True -> x'
_ -> startLinkingAux' t
startLinkingAux' used to look something like this:
startLinkingAux' (c,x,y,us,im) = do
print (c,x,y,us,im)
let t#(_,x',_,us',im') = linkCallers (c,x,y,us,im) in
case (fst $ I.lookup primeMinister us') >>= flip lookup im' >>= return . (>=990000) of
Just True -> return x'
_ -> startLinkingAux' t
There could be a memory leak in one of the arguments. Probably the first thing I'd try would be to ask the author of disjoint-set to add a RFData instance for IntDisjointSet (or do it yourself, looking at the source code, it'd fairly easy). Then try calling force on all values returned by linkCallers to see if it helps.
Second, you're not using disjoint-set right. The main idea of the algorithm is that lookups compress paths in the set. This is what gives it it's great performance! So every time you make a lookup, you should replace your old set with a new one. But this makes using a disjoint set quite clumsy in a functional language. It'd suggest to use the State monad for this and use it internally in linkCallers, as one big do block instead of where, just passing the starting set and extracting the final one. And define functions like
insertS :: (MonadState IntDisjointSet m) => Int -> m ()
insertS x = modify (insert x)
lookupS :: (MonadState IntDisjointSet m) => Int -> m (Maybe Int)
lookupS x = state (lookup x)
-- etc
to use inside State. (Perhaps they'd be a good contribution to the library as well as this will be probably a common problem.)
Finally, there are lot of small improvements that can make the code more readable:
Many times you're applying a single function to two values. I'd suggest to define something like
onPair :: (a -> b) -> (a, a) -> (b, b)
onPair f (x, y) = (f x, f y)
-- and use it like:
(ar, br) = onPair (fst . flip I.lookup us) (a, b)
Also using Applicative functions can make things shorter:
sameRep = fromMaybe False $ (==) <$> ar <*> br
totalSize = (+) <$> as' <*> bs'
then also
im'' = maybe id delete ar . maybe id delete br $ im
im' = insert newRep (fromJust maxSize totalSize) im''
Hope it helps.

How to increment a variable in functional programming?

How do you increment a variable in a functional programming language?
For example, I want to do:
main :: IO ()
main = do
let i = 0
i = i + 1
print i
Expected output:
1
Simple way is to introduce shadowing of a variable name:
main :: IO () -- another way, simpler, specific to monads:
main = do main = do
let i = 0 let i = 0
let j = i i <- return (i+1)
let i = j+1 print i
print i -- because monadic bind is non-recursive
Prints 1.
Just writing let i = i+1 doesn't work because let in Haskell makes recursive definitions — it is actually Scheme's letrec. The i in the right-hand side of let i = i+1 refers to the i in its left hand side — not to the upper level i as might be intended. So we break that equation up by introducing another variable, j.
Another, simpler way is to use monadic bind, <- in the do-notation. This is possible because monadic bind is not recursive.
In both cases we introduce new variable under the same name, thus "shadowing" the old entity, i.e. making it no longer accessible.
How to "think functional"
One thing to understand here is that functional programming with pure — immutable — values (like we have in Haskell) forces us to make time explicit in our code.
In imperative setting time is implicit. We "change" our vars — but any change is sequential. We can never change what that var was a moment ago — only what it will be from now on.
In pure functional programming this is just made explicit. One of the simplest forms this can take is with using lists of values as records of sequential change in imperative programming. Even simpler is to use different variables altogether to represent different values of an entity at different points in time (cf. single assignment and static single assignment form, or SSA).
So instead of "changing" something that can't really be changed anyway, we make an augmented copy of it, and pass that around, using it in place of the old thing.
As a general rule, you don't (and you don't need to). However, in the interests of completeness.
import Data.IORef
main = do
i <- newIORef 0 -- new IORef i
modifyIORef i (+1) -- increase it by 1
readIORef i >>= print -- print it
However, any answer that says you need to use something like MVar, IORef, STRef etc. is wrong. There is a purely functional way to do this, which in this small rapidly written example doesn't really look very nice.
import Control.Monad.State
type Lens a b = ((a -> b -> a), (a -> b))
setL = fst
getL = snd
modifyL :: Lens a b -> a -> (b -> b) -> a
modifyL lens x f = setL lens x (f (getL lens x))
lensComp :: Lens b c -> Lens a b -> Lens a c
lensComp (set1, get1) (set2, get2) = -- Compose two lenses
(\s x -> set2 s (set1 (get2 s) x) -- Not needed here
, get1 . get2) -- But added for completeness
(+=) :: (Num b) => Lens a b -> Lens a b -> State a ()
x += y = do
s <- get
put (modifyL x s (+ (getL y s)))
swap :: Lens a b -> Lens a b -> State a ()
swap x y = do
s <- get
let x' = getL x s
let y' = getL y s
put (setL y (setL x s y') x')
nFibs :: Int -> Int
nFibs n = evalState (nFibs_ n) (0,1)
nFibs_ :: Int -> State (Int,Int) Int
nFibs_ 0 = fmap snd get -- The second Int is our result
nFibs_ n = do
x += y -- Add y to x
swap x y -- Swap them
nFibs_ (n-1) -- Repeat
where x = ((\(x,y) x' -> (x', y)), fst)
y = ((\(x,y) y' -> (x, y')), snd)
There are several solutions to translate imperative i=i+1 programming to functional programming. Recursive function solution is the recommended way in functional programming, creating a state is almost never what you want to do.
After a while you will learn that you can use [1..] if you need a index for example, but it takes a lot of time and practice to think functionally instead of imperatively.
Here's a other way to do something similar as i=i+1 not identical because there aren't any destructive updates. Note that the State monad example is just for illustration, you probably want [1..] instead:
module Count where
import Control.Monad.State
count :: Int -> Int
count c = c+1
count' :: State Int Int
count' = do
c <- get
put (c+1)
return (c+1)
main :: IO ()
main = do
-- purely functional, value-modifying (state-passing) way:
print $ count . count . count . count . count . count $ 0
-- purely functional, State Monad way
print $ (`evalState` 0) $ do {
count' ; count' ; count' ; count' ; count' ; count' }
Note: This is not an ideal answer but hey, sometimes it might be a little good to give anything at all.
A simple function to increase the variable would suffice.
For example:
incVal :: Integer -> Integer
incVal x = x + 1
main::IO()
main = do
let i = 1
print (incVal i)
Or even an anonymous function to do it.

Why can't I compare result of lookup to Nothing in Haskell?

I have the following code:
import System.Environment
import System.Directory
import System.IO
import Data.List
dispatch :: [(String, [String] -> IO ())]
dispatch = [ ("add", add)
, ("view", view)
, ("remove", remove)
, ("bump", bump)
]
main = do
(command:args) <- getArgs
let result = lookup command dispatch
if result == Nothing then
errorExit
else do
let (Just action) = result
action args
errorExit :: IO ()
errorExit = do
putStrLn "Incorrect command"
add :: [String] -> IO ()
add [fileName, todoItem] = appendFile fileName (todoItem ++ "\n")
view :: [String] -> IO ()
view [fileName] = do
contents <- readFile fileName
let todoTasks = lines contents
numberedTasks = zipWith (\n line -> show n ++ " - " ++ line) [0..] todoTasks
putStr $ unlines numberedTasks
remove :: [String] -> IO ()
remove [fileName, numberString] = do
handle <- openFile fileName ReadMode
(tempName, tempHandle) <- openTempFile "." "temp"
contents <- hGetContents handle
let number = read numberString
todoTasks = lines contents
newTodoItems = delete (todoTasks !! number) todoTasks
hPutStr tempHandle $ unlines newTodoItems
hClose handle
hClose tempHandle
removeFile fileName
renameFile tempName fileName
bump :: [String] -> IO ()
bump [fileName, numberString] = do
handle <- openFile fileName ReadMode
(tempName, tempHandle) <- openTempFile "." "temp"
contents <- hGetContents handle
let number = read numberString
todoTasks = lines contents
bumpedItem = todoTasks !! number
newTodoItems = [bumpedItem] ++ delete bumpedItem todoTasks
hPutStr tempHandle $ unlines newTodoItems
hClose handle
hClose tempHandle
removeFile fileName
renameFile tempName fileName
Trying to compile it gives me the following error:
$ ghc --make todo
[1 of 1] Compiling Main ( todo.hs, todo.o )
todo.hs:16:15:
No instance for (Eq ([[Char]] -> IO ()))
arising from a use of `=='
Possible fix:
add an instance declaration for (Eq ([[Char]] -> IO ()))
In the expression: result == Nothing
In a stmt of a 'do' block:
if result == Nothing then
errorExit
else
do { let (Just action) = ...;
action args }
In the expression:
do { (command : args) <- getArgs;
let result = lookup command dispatch;
if result == Nothing then
errorExit
else
do { let ...;
.... } }
I don't get why is that since lookup returns Maybe a, which I'm surely can compare to Nothing.
The type of the (==) operator is Eq a => a -> a -> Bool. What this means is that you can only compare objects for equality if they're of a type which is an instance of Eq. And functions aren't comparable for equality: how would you write (==) :: (a -> b) -> (a -> b) -> Bool? There's no way to do it.1 And while clearly Nothing == Nothing and Just x /= Nothing, it's the case that Just x == Just y if and only if x == y; thus, there's no way to write (==) for Maybe a unless you can write (==) for a.
There best solution here is to use pattern matching. In general, I don't find myself using that many if statements in my Haskell code. You can instead write:
main = do (command:args) <- getArgs
case lookup command dispatch of
Just action -> action args
Nothing -> errorExit
This is better code for a couple of reasons. First, it's shorter, which is always nice. Second, while you simply can't use (==) here, suppose that dispatch instead held lists. The case statement remains just as efficient (constant time), but comparing Just x and Just y becomes very expensive. Second, you don't have to rebind result with let (Just action) = result; this makes the code shorter and doesn't introduce a potential pattern-match failure (which is bad, although you do know it can't fail here).
1:: In fact, it's impossible to write (==) while preserving referential transparency. In Haskell, f = (\x -> x + x) :: Integer -> Integer and g = (* 2) :: Integer -> Integer ought to be considered equal because f x = g x for all x :: Integer; however, proving that two functions are equal in this way is in general undecidable (since it requires enumerating an infinite number of inputs). And you can't just say that \x -> x + x only equals syntactically identical functions, because then you could distinguish f and g even though they do the same thing.
The Maybe a type has an Eq instance only if a has one - that's why you get No instance for (Eq ([[Char]] -> IO ())) (a function can't be compared to another function).
Maybe the maybe function is what you're looking for. I can't test this at the moment, but it should be something like this:
maybe errorExit (\action -> action args) result
That is, if result is Nothing, return errorExit, but if result is Just action, apply the lambda function on action.

Resources