Dealing with IO vs pure code in Haskell

I'm writing a shell script (my first non-example program in Haskell) which is supposed to list a directory, get every file's size, do some string manipulation (pure code), and then rename some files. I'm not sure what I'm doing wrong, so two questions:
How should I arrange the code in such a program?
I have a specific issue: I get the following error. What am I doing wrong?
error:
Couldn't match expected type `[FilePath]'
against inferred type `IO [FilePath]'
In the second argument of `mapM', namely `fileNames'
In a stmt of a 'do' expression:
files <- (mapM getFileNameAndSize fileNames)
In the expression:
do { fileNames <- getDirectoryContents;
files <- (mapM getFileNameAndSize fileNames);
sortBy cmpFilesBySize files }
code:
getFileNameAndSize fname = do (fname, (withFile fname ReadMode hFileSize))
getFilesWithSizes = do
fileNames <- getDirectoryContents
files <- (mapM getFileNameAndSize fileNames)
sortBy cmpFilesBySize files

Your second, specific, problem is with the types of your functions. However, your first issue (not really a type thing) is the do statement in getFileNameAndSize. While do is used with monads, it's not a monadic panacea; it's actually implemented as some simple translation rules. The Cliff's Notes version (which isn't exactly right, thanks to some details involving error handling, but is close enough) is:
do a ≡ a
do a ; b ; c ... ≡ a >> do b ; c ...
do x <- a ; b ; c ... ≡ a >>= \x -> do b ; c ...
In other words, getFileNameAndSize is equivalent to the version without the do block, and so you can get rid of the do. This leaves you with
getFileNameAndSize fname = (fname, withFile fname ReadMode hFileSize)
We can find the type for this: since fname is the first argument to withFile, it has type FilePath; and hFileSize returns an IO Integer, so that's the type of withFile .... Thus, we have getFileNameAndSize :: FilePath -> (FilePath, IO Integer). This may or may not be what you want; you might instead want FilePath -> IO (FilePath,Integer). To change it, you can write any of
getFileNameAndSize_do fname = do
    size <- withFile fname ReadMode hFileSize
    return (fname, size)
getFileNameAndSize_fmap fname = fmap ((,) fname) $
    withFile fname ReadMode hFileSize
-- With `import Control.Applicative ((<$>))`, which is a synonym for fmap.
getFileNameAndSize_fmap2 fname = ((,) fname)
    <$> withFile fname ReadMode hFileSize
-- With {-# LANGUAGE TupleSections #-} at the top of the file
getFileNameAndSize_ts fname = (fname,) <$> withFile fname ReadMode hFileSize
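All of these variants build the same action. As a minimal sketch (not the code above itself), the definitions below take the size-fetching action as a parameter, here called sizeOf, so they can be tried without touching the filesystem. The names pairDo, pairBind, pairFmap and fakeSize are made up for illustration; fakeSize stands in for withFile fname ReadMode hFileSize.

```haskell
-- The do version: bind the size, then return the pair.
pairDo :: Monad m => (FilePath -> m Integer) -> FilePath -> m (FilePath, Integer)
pairDo sizeOf fname = do
  size <- sizeOf fname
  return (fname, size)

-- The same thing after applying the rule  do x <- a ; b  ≡  a >>= \x -> b
pairBind :: Monad m => (FilePath -> m Integer) -> FilePath -> m (FilePath, Integer)
pairBind sizeOf fname = sizeOf fname >>= \size -> return (fname, size)

-- The fmap version: map the pairing function over the action's result.
pairFmap :: Monad m => (FilePath -> m Integer) -> FilePath -> m (FilePath, Integer)
pairFmap sizeOf fname = fmap ((,) fname) (sizeOf fname)

-- Hypothetical stand-in for a real size lookup: the "size" is the name's length.
fakeSize :: FilePath -> IO Integer
fakeSize = return . fromIntegral . length

main :: IO ()
main = do
  results <- mapM (\f -> f fakeSize "notes.txt") [pairDo, pairBind, pairFmap]
  print results  -- three identical pairs
```

Because the functions are polymorphic in the monad, they can also be checked with Maybe, e.g. pairDo (Just . fromIntegral . length) "notes.txt".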
Next, as KennyTM pointed out, you have fileNames <- getDirectoryContents; since getDirectoryContents has type FilePath -> IO [FilePath], you need to give it an argument (e.g. getFilesWithSizes dir = do fileNames <- getDirectoryContents dir ...). This is probably just a simple oversight.
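Next, we come to the heart of your error: files <- (mapM getFileNameAndSize fileNames). I'm not sure why it gives you the precise error it does, but I can tell you what's wrong. (The most likely explanation for the message: because getDirectoryContents was not applied to an argument, the do block is typed in the function monad (->) FilePath, so fileNames is bound to an IO [FilePath] where mapM expects a [FilePath].) Remember what we know about getFileNameAndSize. In your code, it returns a (FilePath, IO Integer). However, mapM is of type Monad m => (a -> m b) -> [a] -> m [b], and so mapM getFileNameAndSize is ill-typed. You want getFileNameAndSize :: FilePath -> IO (FilePath,Integer), like I implemented above.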
Finally, we need to fix your last line. First of all, although you don't give it to us, cmpFilesBySize is presumably a function of type (FilePath, Integer) -> (FilePath, Integer) -> Ordering, comparing on the second element. This is really simple, though: using Data.Ord.comparing :: Ord a => (b -> a) -> b -> b -> Ordering, you can write this as comparing snd, which has type Ord b => (a, b) -> (a, b) -> Ordering. Second, you need to return your result wrapped up in the IO monad rather than just as a plain list; the function return :: Monad m => a -> m a will do the trick.
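To see comparing snd on its own, here is a small pure sketch. The name sortBySize and the sample data are made up for illustration; only sortBy and comparing come from the standard libraries.

```haskell
import Data.List (sortBy)
import Data.Ord (comparing)

-- Sort (name, size) pairs by their second component, smallest first.
sortBySize :: [(FilePath, Integer)] -> [(FilePath, Integer)]
sortBySize = sortBy (comparing snd)

main :: IO ()
main = print (sortBySize [("a.txt", 3), ("b.txt", 1), ("c.txt", 2)])
```

Since this part involves no IO at all, it can be tested in ghci without any files on disk.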
Thus, putting this all together, you'll get
import System.IO (FilePath, withFile, IOMode(ReadMode), hFileSize)
import System.Directory (getDirectoryContents)
import Control.Applicative ((<$>))
import Data.List (sortBy)
import Data.Ord (comparing)
getFileNameAndSize :: FilePath -> IO (FilePath, Integer)
getFileNameAndSize fname = ((,) fname) <$> withFile fname ReadMode hFileSize
getFilesWithSizes :: FilePath -> IO [(FilePath,Integer)]
getFilesWithSizes dir = do
    fileNames <- getDirectoryContents dir
    files <- mapM getFileNameAndSize fileNames
    return $ sortBy (comparing snd) files
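This is all well and good, and it type-checks. (One practical caveat: getDirectoryContents returns bare entry names, including "." and "..", so a real script would filter those out and prepend dir before opening anything.) However, I might write it slightly differently. My version would probably look like this: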
{-# LANGUAGE TupleSections #-}
import System.IO (FilePath, withFile, IOMode(ReadMode), hFileSize)
import System.Directory (getDirectoryContents)
import Control.Applicative ((<$>))
import Control.Monad ((<=<))
import Data.List (sortBy)
import Data.Ord (comparing)
preservingF :: Functor f => (a -> f b) -> a -> f (a,b)
preservingF f x = (x,) <$> f x
-- Or liftM2 (<$>) (,), but I am not entirely sure why.
fileSize :: FilePath -> IO Integer
fileSize fname = withFile fname ReadMode hFileSize
getFilesWithSizes :: FilePath -> IO [(FilePath,Integer)]
getFilesWithSizes = return . sortBy (comparing snd)
                <=< mapM (preservingF fileSize)
                <=< getDirectoryContents
(<=< is the monadic equivalent of ., the function composition operator.) First off: yes, my version is longer. However, I'd probably already have preservingF defined somewhere, making the two equivalent in length.* (I might even inline fileSize if it weren't used elsewhere.) Second, I like this version better because it involves chaining together simpler pure functions we've already written. While your version is similar, mine (I feel) is more streamlined and makes this aspect of things clearer.
So this is a bit of an answer to your first question of how to structure these things. I personally tend to lock my IO down into as few functions as possible: only functions which need to touch the outside world directly (e.g. main and anything which interacts with a file) get an IO. Everything else is an ordinary pure function (and is only monadic if it's monadic for general reasons, along the lines of preservingF). I then arrange things so that main, etc., are just compositions and chains of pure functions: main gets some values from IO-land; then it calls pure functions to fold, spindle, and mutilate the data; then it gets more IO values; then it operates more; etc. The idea is to separate the two domains as much as possible, so that the more compositional non-IO code is always free, and the black-box IO is only done precisely where necessary.
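A rough sketch of that layout for a rename script might look like the following. Everything here is hypothetical: planRenames and its naming rule (rank by size, lower-cased name) are invented for illustration, and main only prints the plan instead of calling renameFile.

```haskell
import Data.Char (toLower)
import Data.List (sortBy)
import Data.Ord (comparing)

-- Pure core: decide the new name for every file from its name and size.
-- Hypothetical rule: prefix the lower-cased name with its rank by size.
planRenames :: [(FilePath, Integer)] -> [(FilePath, FilePath)]
planRenames files =
  [ (old, show rank ++ "-" ++ map toLower old)
  | (rank, (old, _)) <- zip [1 :: Int ..] (sortBy (comparing snd) files)
  ]

-- Thin IO shell: obtain the inputs, call the pure core, perform the effects.
main :: IO ()
main = do
  let files = [("B.TXT", 20), ("a.txt", 5)]  -- would come from the directory listing
  mapM_ (\(old, new) -> putStrLn (old ++ " -> " ++ new)) (planRenames files)
```

The point of the split is that planRenames can be checked with plain values, while the part that touches the disk stays a few lines long.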
Operators like <=< really help with writing code in this style, as they let you operate on functions which interact with monadic values (such as the IO-world) just as you would operate on normal functions. You should also look at Control.Applicative's function <$> liftedArg1 <*> liftedArg2 <*> ... notation, which lets you apply ordinary functions to any number of monadic (really Applicative) arguments. This is really nice for getting rid of spurious <-s and just chaining pure functions over monadic code.
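Both operators work for any monad, so a small sketch with Maybe is enough to show the shape. The functions parseInt, halve, parseThenHalve and addParsed are made-up examples; <=<, <$>, <*> and readMaybe are the real library names.

```haskell
import Control.Monad ((<=<))
import Text.Read (readMaybe)

-- Two small "effectful" steps, where the effect is possible failure.
parseInt :: String -> Maybe Int
parseInt = readMaybe

halve :: Int -> Maybe Int
halve n
  | even n    = Just (n `div` 2)
  | otherwise = Nothing

-- (<=<) composes them right to left, just as (.) does for pure functions.
parseThenHalve :: String -> Maybe Int
parseThenHalve = halve <=< parseInt

-- f <$> a <*> b applies the pure function (+) to two wrapped arguments.
addParsed :: String -> String -> Maybe Int
addParsed a b = (+) <$> parseInt a <*> parseInt b

main :: IO ()
main = do
  print (parseThenHalve "10")
  print (addParsed "1" "2")
```

The same two patterns apply unchanged when the monad is IO, for example (++) <$> readFile a <*> readFile b.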
*: I feel like preservingF, or at least its sibling preserving :: (a -> b) -> a -> (a,b), should be in a package somewhere, but I've been unable to find either.

getDirectoryContents is a function. You should supply an argument to it, e.g.
fileNames <- getDirectoryContents "/usr/bin"
Also, the type of getFileNameAndSize is FilePath -> (FilePath, IO Integer), as you can check from ghci:
Prelude> :m + System.IO
Prelude System.IO> let getFileNameAndSize fname = do (fname, (withFile fname ReadMode hFileSize))
Prelude System.IO> :t getFileNameAndSize
getFileNameAndSize :: FilePath -> (FilePath, IO Integer)
But mapM requires the input function to return a value wrapped in a monad (here, IO):
Prelude System.IO> :t mapM
mapM :: (Monad m) => (a -> m b) -> [a] -> m [b]
-- # ^^^^^^^^
You should change its type to FilePath -> IO (FilePath, Integer) to match the type.
getFileNameAndSize fname = do
    fsize <- withFile fname ReadMode hFileSize
    return (fname, fsize)

Related

Does GHC generate optimized code for `mapM_`?

I wrote this code,
import System.FilePath ((</>))
fp = "/Users/USER/Documents/Test/"
fpAcc = fp </> "acc.txt"
paths = map (fp </>) ["A.txt", "B.txt", "C.txt"]
main :: IO ()
main =
    writeFile fpAcc ""
    >> return paths
    >>= mapM_ ((appendFile fpAcc =<<) . readFile)
    >> readFile fpAcc >>= putStrLn
and this is the definition of mapM_:
mapM_ :: (Foldable t, Monad m) => (a -> m b) -> t a -> m ()
mapM_ f = foldr c (return ())
  -- See Note [List fusion and continuations in 'c']
  where c x k = f x >> k
        {-# INLINE c #-}
Is the expression mapM_ ((appendFile fpAcc =<<) . readFile) producing 3 write operations to the disk, or due to some kind of GHC optimization, only one?
It would be great if the compiler could generate code that accumulates the appended data in an intermediate buffer and then writes only once. But since mapM_ maps each element of the structure to a monadic action, perhaps each step performs one write operation.
No, this is not optimized. The code you wrote will open fpAcc, write some bytes, and close fpAcc once for each element in paths. Indeed, it would be incorrect for the compiler to optimize this: opening and closing a file is observable from outside the program, so an optimization that opened the file just once would be a behavioral change, not just a speed change.
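If a single write is what you want, you have to ask for it explicitly. A minimal sketch, assuming the output file is not one of the inputs and the inputs are small enough to hold in memory; concatFiles and the file names are made up for illustration:

```haskell
-- Read every input, then write the concatenation with one writeFile call.
-- Assumption: `out` does not appear in `ins`.
concatFiles :: FilePath -> [FilePath] -> IO ()
concatFiles out ins = mapM readFile ins >>= writeFile out . concat

main :: IO ()
main = do
  -- Hypothetical inputs, created in the current directory for the demo.
  writeFile "A.txt" "alpha\n"
  writeFile "B.txt" "beta\n"
  concatFiles "acc.txt" ["A.txt", "B.txt"]
  readFile "acc.txt" >>= putStr
```

Here the accumulator file is opened and closed once, because the program says so, not because the compiler rearranged anything.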

How to use readFile

I am having trouble reading in a level file in Haskell. The goal is to read in a simple txt file with two numbers separated by a space and then commas. The problem I keep getting is this: Couldn't match type `IO' with `[]'
If I understand correctly the do statement is supposed to pull the String out of the Monad.
readLevelFile :: FilePath -> [FallingRegion]
readLevelFile f = do
    fileContent <- readFile f
    (map lineToFallingRegion (lines fileContent))
lineToFallingRegion :: String -> FallingRegion
lineToFallingRegion s = map textShapeToFallingShape (splitOn' (==',') s)
textShapeToFallingShape :: String -> FallingShape
textShapeToFallingShape s = FallingShape (read $ head numbers) (read $ head $ tail numbers)
    where numbers = splitOn' (==' ') s
You can't pull things out of IO. You can think of IO as a container (in fact, some interpretations of IO liken it to the box containing Schrödinger's cat). You can't see what's in the container, but if you step into the container, values become visible.
So this should work:
readLevelFile f = do
    fileContent <- readFile f
    return (map lineToFallingRegion (lines fileContent))
It does not, however, have the type given in the OP. Inside the do block, fileContent is a String value, but the entire block is still inside the IO container.
This means that the return type of the function isn't [FallingRegion], but IO [FallingRegion]. So if you change the type annotation for readLevelFile to
readLevelFile :: FilePath -> IO [FallingRegion]
you should be able to get past the first hurdle.
Let's look at your first function with explicit types:
readLevelFile f = do
    (fileContent :: String) <-
        (readFile :: String -> IO String) (f :: String) :: IO String
fileContent is indeed of type String but is only available within the execution of the IO Monad under which we are evaluating. Now what?
    (map lineToFallingRegion (lines fileContent)) :: [FallingRegion]
Now you are suddenly using an expression whose type is not in IO but is instead a list. Since lists are also a monad, the type checker tries to unify IO with []. What you actually wanted is to return this value:
    return (map lineToFallingRegion (lines fileContent)) :: IO [FallingRegion]
Now, recalling that we can't ever "exit" the IO monad, the result type of readLevelFile must be in IO, an honest admission that it interacts with the outside world:
readLevelFile :: FilePath -> IO [FallingRegion]
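One way to keep this manageable is to put all the parsing in a pure function and let readLevelFile be a one-line IO wrapper around it. The sketch below is self-contained, so it makes assumptions: Shape is a stand-in for FallingShape, and splitOn' is a guess at what the question's helper does (split on characters matching a predicate).

```haskell
-- Stand-in for the question's FallingShape: just a pair of numbers.
type Shape = (Int, Int)

-- Assumed behaviour of the question's splitOn': split wherever p matches.
splitOn' :: (Char -> Bool) -> String -> [String]
splitOn' p s = case break p s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOn' p rest

-- Pure: "3 4" becomes (3, 4).
parseShape :: String -> Shape
parseShape s = case map read (words s) of
  [a, b] -> (a, b)
  _      -> error ("malformed shape: " ++ s)

-- Pure: the whole file contents become a list of regions.
parseLevel :: String -> [[Shape]]
parseLevel = map (map parseShape . splitOn' (== ',')) . lines

-- The only IO: read the file, then map the pure parser over the result.
readLevelFile :: FilePath -> IO [[Shape]]
readLevelFile path = parseLevel <$> readFile path

main :: IO ()
main = print (parseLevel "1 2,3 4\n5 6")
```

parseLevel can be exercised directly on strings, and only readLevelFile carries IO in its type.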

Haskell Turtle - split a shell

Is it possible to split a Shell in Turtle library (Haskell) and do different things to either split of the shell, such that the original Shell is only run once ?
            /---- shell2
---Shell1 --/
            \
             \-----shell3
For instance, how to do
do
    let lstmp = lstree "/tmp"
    view lstmp
    view $ do
        path <- lstmp
        x <- liftIO $ testdir path
        return x
such that lstree "/tmp" would only run once.
Specifically I would like to send Shell 2 and Shell 3 to different files using output.
You won't be able to split a Shell into two separate shells that run simultaneously, unless there's some magic I don't know. But file writing is a fold over the contents of a shell or some other succession of things. It is built into turtle that you can always combine many folds and make them run simultaneously using the Control.Foldl material - here
foldIO :: Shell a -> FoldM IO a r -> IO r -- specializing
A shell is secretly a FoldM IO a r -> IO r under the hood anyway, so this is basically runShell. To do this we need to get the right Shell and the right combined FoldM IO. The whole idea of the Fold a b and FoldM m a b types from the foldl package is simultaneous folding.
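To get a feel for simultaneous folding before involving Shell or file handles, here is a small pure sketch using the foldl package (it needs that package installed). The name summary and the sample pairs are made up; L.fold, L.premap, L.list and L.sum are functions from Control.Foldl.

```haskell
import qualified Control.Foldl as L

-- Two folds over (isDirectory, path) pairs, combined applicatively so that
-- the input is traversed only once:
--   * collect every path
--   * count the entries flagged as directories
summary :: L.Fold (Bool, String) ([String], Int)
summary = (,) <$> L.premap snd L.list
              <*> L.premap (fromEnum . fst) L.sum

main :: IO ()
main = print (L.fold summary [(True, "/tmp/a"), (False, "/tmp/a/f.txt")])
```

The FoldM version used with foldIO follows the same pattern, with each component fold performing IO instead of building a value.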
I think the easiest way to get the right shell is just to make the lstree fold return a FilePath together with the result of testdir. You basically wrote this:
withDirInfo :: FilePath -> Shell (Bool, FilePath)
withDirInfo tmp = do
    let lstmp = lstree tmp
    path <- lstmp
    bool <- liftIO $ testdir path
    return (bool, path)
So now we can get a Shell (Bool, FilePath) from /tmp. This has all the information our two folds will need, and thus all that our combined fold will need.
Next we might write a helper fold that prints the Text component of the FilePath to a given handle:
sinkFilePaths :: Handle -> FoldM IO FilePath ()
sinkFilePaths handle = L.sink (T.hPutStrLn handle . format fp)
Then we can use this Handle -> FoldM IO FilePath () to define two FoldM IO (Bool, FilePath) (). Each will write different stuff to different handles, and we can unite them into a single simultaneous fold with <*. This is an independent FoldM IO ... and can be applied e.g. to a pure list of type [(Bool, FilePath)] using L.fold and it will write different things from the list to the different handles. In our case, though, we will apply it to the Shell (Bool, FilePath) we defined.
The only subtle part of this is the use of L.handlesM to print only the second element of each pair in both folds, and, in the second fold, only for those pairs flagged as directories. This uses the _2 lens and filtered from the lens libraries. This could probably be simplified, but see what you think:
{-# LANGUAGE OverloadedStrings #-}
import Turtle
import qualified Control.Foldl as L
import qualified System.IO as IO
import Control.Lens (_2,filtered)
import qualified Data.Text.IO as T
main = IO.withFile "tmpfiles.txt" IO.WriteMode $ \h ->
    IO.withFile "tmpdirs.txt" IO.WriteMode $ \h' -> do
        foldIO (withDirInfo "/tmp") (sinkFilesDirs h h')
withDirInfo :: Turtle.FilePath -> Shell (Bool, Turtle.FilePath)
withDirInfo tmp = do
    let lstmp = lstree tmp
    path <- lstmp
    bool <- liftIO $ testdir path
    return (bool, path)
sinkFilePaths :: Handle -> FoldM IO Turtle.FilePath ()
sinkFilePaths handle = L.sink (T.hPutStrLn handle . format fp)
sinkFilesDirs :: Handle -> Handle -> FoldM IO (Bool, Turtle.FilePath) ()
sinkFilesDirs h h' = allfiles <* alldirs where
    allfiles :: L.FoldM IO (Bool, Turtle.FilePath) ()
    allfiles = L.handlesM _2 (sinkFilePaths h)
    -- handle the second element of pairs with sinkFilePaths
    alldirs :: FoldM IO (Bool, Turtle.FilePath) ()
    alldirs = L.handlesM (filtered (\(bool,file) -> bool) . _2) (sinkFilePaths h')
    -- handle the second element of pairs where the first element
    -- is true using sinkFilePaths
It sounds like you're looking for something like async to split off your shells from the first shell and then wait for them to return. async is a pretty capable library that can achieve much more than the below example, but it provides a pretty simple solution to what you're asking for:
import Control.Concurrent.Async
import Turtle.Shell
import Turtle.Prelude
main :: IO ()
main = do
    let lstmp1 = lstree "/tmp"
    let lstmp2 = lstree "/etc"
    view lstmp1
    view lstmp2
    job1 <- async $ view $ do
        path <- lstmp1
        x <- liftIO $ testdir path
        return x
    job2 <- async $ view $ do
        path <- lstmp2
        x <- liftIO $ testdir path
        return x
    wait job1
    wait job2
Is this what you're looking for?

How withFile is implemented in haskell

Following a Haskell tutorial, the author provides the following implementation of the withFile function:
withFile' :: FilePath -> IOMode -> (Handle -> IO a) -> IO a
withFile' path mode f = do
    handle <- openFile path mode
    result <- f handle
    hClose handle
    return result
But why do we need to wrap the result in a return? Doesn't the supplied function f already return an IO, as can be seen from its type Handle -> IO a?
You're right: f already returns an IO, so if the function were written like this:
withFile' path mode f = do
    handle <- openFile path mode
    f handle
there would be no need for a return. The problem is hClose handle comes in between, so we have to store the result first:
result <- f handle
and doing <- gets rid of the IO. So return puts it back.
This is one of the tricky little things that confused me when I first tried Haskell. You're misunderstanding the meaning of the <- construct in do-notation. result <- f handle doesn't mean "assign the value of f handle to result"; it means "bind result to a value 'extracted' from the monadic value of f handle" (where the 'extraction' happens in some way that's defined by the particular Monad instance that you're using, in this case the IO monad).
I.e., for some type m that is an instance of the Monad typeclass, the <- statement takes an expression of type m a on the right-hand side and a variable of type a on the left-hand side, and binds the variable to a value. Thus in your particular example, with result <- f handle, we have the types f handle :: IO a, result :: a and return result :: IO a.
P.S. do-notation also has a special form of let (without the in keyword in this case!) that works as a pure counterpart to <-. So you could rewrite your example as:
withFile' :: FilePath -> IOMode -> (Handle -> IO a) -> IO a
withFile' path mode f = do
    handle <- openFile path mode
    let result = f handle
    hClose handle
    result
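In this case, because the let is a plain binding rather than a monadic bind, the type of result is IO a. Be aware, though, that this version is not equivalent to the original: let does not run f handle, so the action only executes on the last line, after hClose handle has already closed the handle.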
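The difference between <- and let shows up in the order of effects. The sketch below makes that visible without real files: the open, close and callback steps are passed in as parameters, and the demo runs them in the pair monad ([String], a) from base, which simply accumulates a log. All names (bindVersion, letVersion, logOpen, logClose, logUse) are made up for illustration.

```haskell
-- With <-, the callback runs before close.
bindVersion :: Monad m => m h -> (h -> m ()) -> (h -> m a) -> m a
bindVersion open close f = do
  h <- open
  result <- f h      -- runs f now and binds its result
  close h
  return result

-- With let, the callback is only named; it runs on the last line.
letVersion :: Monad m => m h -> (h -> m ()) -> (h -> m a) -> m a
letVersion open close f = do
  h <- open
  let result = f h   -- nothing runs yet
  close h
  result             -- f runs here, after close

-- Hypothetical logging stand-ins for openFile, hClose and the user's callback.
logOpen :: ([String], ())
logOpen = (["open"], ())

logClose :: () -> ([String], ())
logClose _ = (["close"], ())

logUse :: () -> ([String], Int)
logUse _ = (["use"], 42)

main :: IO ()
main = do
  print (bindVersion logOpen logClose logUse)
  print (letVersion logOpen logClose logUse)
```

Both versions type-check and return the same value, but the logs differ: the first uses the handle before closing it, the second after.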

Unwrapping a monad

Given the below program, I am having issues dealing with monads.
module Main
where
import System.Environment
import System.Directory
import System.IO
import Text.CSV
--------------------------------------------------
exister :: String -> IO Bool
exister path = do
    fileexist <- doesFileExist path
    direxist <- doesDirectoryExist path
    return (fileexist || direxist )
--------------------------------------------------
slurp :: String -> IO String
slurp path = do
    withFile path ReadMode (\handle -> do
        contents <- hGetContents handle
        last contents `seq` return contents )
--------------------------------------------------
main :: IO ()
main = do
    [csv_filename] <- getArgs
    putStrLn (show csv_filename)
    csv_raw <- slurp csv_filename
    let csv_data = parseCSV csv_filename csv_raw
    printCSV csv_data -- unable to compile.
csv_data has type Either ParseError CSV, and printCSV takes only a CSV.
Here's the ediff between the working version and the broken version.
***************
*** 27,30 ****
csv_raw <- slurp csv_filename
let csv_data = parseCSV csv_filename csv_raw
! printCSV csv_data -- unable to compile.
\ No newline at end of file
--- 27,35 ----
csv_raw <- slurp csv_filename
let csv_data = parseCSV csv_filename csv_raw
! case csv_data of
! Left error -> putStrLn $ show error
! Right csv_data -> putStrLn $ printCSV csv_data
!
! putStrLn "done"
!
reference: http://hackage.haskell.org/packages/archive/csv/0.1.2/doc/html/Text-CSV.html
Regarding monads:
Yes, Either a is a monad. So simplifying the problem, you are basically asking for this:
main = print $ magicMonadUnwrap v
v :: Either String Int
v = Right 3
magicMonadUnwrap :: (Monad m) => m a -> a
magicMonadUnwrap = undefined
How do you define magicMonadUnwrap? Well, you see, it's different for each monad. Each one needs its own unwrapper. Many of these have the word "run" in them, for example, runST, runCont, or runEval. However, for some monads, it might not be safe to unwrap them (hence the need for differing unwrappers).
One implementation for lists would be head. But what if the list is empty? An unwrapper for Maybe is fromJust, but what if it's Nothing?
Similarly, the unwrapper for the Either monad would be something like:
fromRight :: Either a b -> b
fromRight (Right x) = x
But this unwrapper isn't safe: what if you had a Left value instead? (Left usually represents an error state; in your case, a parse error.) So the best way to act upon an Either value is to use the either function, or else use a case statement matching Right and Left, as Daniel Wagner illustrated.
tl;dr: there is no magicMonadUnwrap. If you're inside that same monad, you can use <-, but to truly extract the value from a monad...well...how you do it depends on which monad you're dealing with.
Use case.
main = do
    ...
    case csv_data of
        Left err -> {- whatever you're going to do with an error -- print it, throw it as an exception, etc. -}
        Right csv -> putStrLn (printCSV csv)
The either function is shorter (syntax-wise), but boils down to the same thing.
main = do
    ...
    either ({- error condition function -}) (putStrLn . printCSV) csv_data
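To make the two forms concrete without depending on the csv package, here is a small sketch over Either String Int. The functions reportCase and reportEither are made up for illustration; either is the real Prelude function.

```haskell
-- Handling both outcomes with case.
reportCase :: Either String Int -> String
reportCase v = case v of
  Left err -> "error: " ++ err
  Right n  -> "value: " ++ show n

-- The same thing with either: one function per constructor.
reportEither :: Either String Int -> String
reportEither = either ("error: " ++) (\n -> "value: " ++ show n)

main :: IO ()
main = do
  putStrLn (reportCase (Right 3))
  putStrLn (reportEither (Left "no parse"))
```

In both forms the Left case has to be handled, which is exactly what a hypothetical unsafe fromRight would let you skip.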
You must unlearn what you have learned.
Master Yoda.
Instead of thinking about, or searching for ways to "free", "liberate", "release", "unwrap" or "extract" normal Haskell values from effect-centric (usually monadic) contexts, learn how to use one of Haskell's more distinctive features - functions are first-class values:
you can use functions like values of other types e.g. like Bool, Char, Int, Integer etc:
arithOps :: [(String, Int -> Int -> Int)]
arithOps = zip ["PLUS","MINUS", "MULT", "QUOT", "REM"]
               [(+), (-), (*), quot, rem]
For your purposes, what's more important is that functions can also be used as arguments e.g:
map :: (a -> b) -> [a] -> [b]
map f xs = [ f x | x <- xs ]
filter :: (a -> Bool) -> [a] -> [a]
filter p xs = [ x | x <- xs, p x ]
These higher-order functions are even available for use in effect-bearing contexts e.g:
import Control.Monad
liftM :: Monad m => (a -> b) -> (m a -> m b)
liftM2 :: Monad m => (a -> b -> c) -> (m a -> m b -> m c)
liftM3 :: Monad m => (a -> b -> c -> d) -> (m a -> m b -> m c -> m d)
...etc, which you can use to lift your regular Haskell functions:
do  .
    .
    .
    val <- liftM3 calculate this_M that_M other_M
    .
    .
    .
Of course, the direct approach also works:
do  .
    .
    .
    x <- this_M
    y <- that_M
    z <- other_M
    let val = calculate x y z
    .
    .
    .
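Both styles can be tried with Maybe as the effect. In this sketch calculate is an arbitrary made-up function, and lifted and direct are hypothetical names for the two approaches; liftM3 is the real function from Control.Monad.

```haskell
import Control.Monad (liftM3)

-- An ordinary pure function that knows nothing about effects.
calculate :: Int -> Int -> Int -> Int
calculate x y z = x * y + z

-- Lifted: apply calculate to three effectful arguments in one go.
lifted :: Maybe Int -> Maybe Int -> Maybe Int -> Maybe Int
lifted = liftM3 calculate

-- Direct: bind each argument, then call calculate on the plain values.
direct :: Maybe Int -> Maybe Int -> Maybe Int -> Maybe Int
direct this that other = do
  x <- this
  y <- that
  z <- other
  let val = calculate x y z
  return val

main :: IO ()
main = do
  print (lifted (Just 2) (Just 3) (Just 4))
  print (direct (Just 2) Nothing (Just 4))
```

Either way, calculate itself stays a regular function; only the thin layer around it deals with the monad.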
As your skills develop, you'll find yourself delegating more and more code to ordinary functions and leaving the effects to a vanishingly-small set of entities defined in terms of functors, applicatives, monads, arrows, etc as you progress towards Haskell mastery.
You're not convinced? Well, here's a brief note of how effects used to be handled in Haskell - there's also a longer description of how Haskell arrived at the monadic interface. Alternately, you could look at Standard ML, OCaml, and other similar languages - who knows, maybe you'll be happier with using them...
