I'm learning Haskell, and writing a short parsing script as an exercise. Most of my script consists of pure functions, but I have two, nested IO components:
Read a list of files from a path.
Read the contents of each file, which, in turn, will be the input for most of the rest of the program.
What I have works, but the nested IO and layers of fmap "feel" clunky, like I should either be avoiding nested IO (somehow), or more skillfully using do notation to avoid all the fmaps. I'm wondering if I'm over-complicating things, doing it wrong, etc. Here's some relevant code:
getPaths :: FilePath -> IO [String]
getPaths folder = do
allFiles <- listDirectory folder
let txtFiles = filter (isInfixOf ".txt") allFiles
paths = map ((folder ++ "/") ++) txtFiles
return paths
getConfig :: FilePath -> IO [String]
getConfig path = do
config <- readFile path
return $ lines config
main = do
paths = getPaths "./configs"
let flatConfigs = map getConfigs paths
blockConfigs = map (fmap chunk) flatConfigs
-- Parse and do stuff with config data.
return
I end up dealing with IO [IO String] from using listDirectory as input for readFile. Not unmanageable, but if I use do notation to unwrap the [IO String] to send to some parser function, I still end up either using nested fmap or pollute my supposedly pure functions with IO awareness (fmap, etc). The latter seems worse, so I'm doing the former. Example:
type Block = [String]
getTrunkBlocks :: [Block] -> [Block]
getTrunkBlocks = filter (liftM2 (&&) isInterface isMatchingInt)
where isMatchingInt line = isJust $ find predicate line
predicate = isInfixOf "switchport mode trunk"
main = do
paths <- getPaths "./configs"
let flatConfigs = map getConfig paths
blockConfigs = map (fmap chunk) flatConfigs
trunks = fmap (fmap getTrunkBlocks) blockConfigs
return $ "Trunk count: " ++ show (length trunks)
fmap, fmap, fmap... I feel like I've inadvertently made this more complicated than necessary, and can't imagine how convoluted this could get if I had deeper IO nesting.
Suggestions?
Thanks in advance.
I think you want something like this for your main:
main = do
paths <- getPaths "./configs"
flatConfigs <- traverse getConfig paths
let blockConfigs = fmap chunk flatConfigs
-- Parse and do stuff with config data.
return ()
Compare
fmap :: Functor f => (a -> b) -> f a -> f b
and
traverse :: (Applicative f, Traversable t) => (a -> f b) -> t a -> f (t b)
They are quite similar, but traverse lets you use effects like IO.
Here are the types again specialized a little for comparison:
fmap :: (a -> b) -> [a] -> [b]
traverse :: (a -> IO b) -> [a] -> IO [b]
(traverse is also known as mapM)
Your idea of 'nestedness' is actually a pretty good insight into what monads are. Monads can be seen as Functors with two additional operations, return with type a -> m a and join with type m (m a) -> m a. We can then make functions of type a -> m b composable:
fmap :: (a -> m b) -> m a -> m (m b)
f =<< v = join (fmap f v) :: (a -> m b) -> m a -> m b
So we want to use join here but have m [m a] at the moment so our monad combinators won't help directly. Lets search for m [m a] -> m (m [a]) using hoogle and our first result looks promising. It is sequence:: [m a] -> m [a].
If we look at the related function we also find traverse :: (a -> IO b) -> [a] -> IO [b] which is similarly sequence (fmap f v).
Armed with that knowledge we can just write:
readConfigFiles path = traverse getConfig =<< getPaths path
Related
Working with Maybe seems really difficult in Haskell. I was able to implement the function I need after many frustrating compile errors, but it's still completely disorganized and I don't know how else can I improve it.
I need to:
extract multiple nested ... Maybes into one, final Maybe ...
Do a -> b -> IO () with Just a and Just b or nothing (*)
Here is example with IO part removed. I need a -> b -> IO (), not (a,b) -> IO () later but I couldn't figure out how to pass both arguments otherwise (I can mapM_ with one argument only).
import Network.URI
type URL = String
type Prefix = String
fubar :: String -> Maybe (Prefix, URL)
fubar url = case parseURI url of
Just u -> (flip (,) $ url)
<$> (fmap ((uriScheme u ++) "//" ++ ) ((uriRegName <$> uriAuthority u)))
_ -> Nothing
Result:
> fubar "https://hackage.haskell.org/package/base-4.9.0.0/docs/src/Data.Foldable.html#mapM"
Just ("https://hackage.haskell.org"
,"https://hackage.haskell.org/package/base-4.9.0.0/docs/src/Data.Foldable.html#mapM"
)
(*) printing what failed to parse wrong would be nice
This is pretty simple written with do notation:
fubar :: String -> Maybe (Prefix, URL)
fubar url = do
u <- parseURI url
scheme <- uriScheme u
authority <- uriAuthority u
return (scheme ++ "//" ++ uriRegName authority, url)
Monads in general (and Maybe in particular) are all about combining m (m a) into m a. Each <- binding is an alternate syntax for a call to >>=, the operator responsible for aborting if it sees a Nothing, and otherwise unwrapping the Just for you.
First note that you're just stacking multiple fmaps there, with α <$> (fmap β (γ <$> uriAuthority u)). This can (functor laws!) be rewritten α . β . γ <$> uriAuthority u, i.e.
{-# LANGUAGE TupleSections #-}
...
Just u -> (,url) . ((uriScheme u++"//") ++ ) . uriRegName <$> uriAuthority u
It might be better for legibility to actually keep the layers separate, but then you should also give them names as amalloy suggests.
Further, more strongly:
Extract multiple nested M into one, final M
Well, sounds like monads, doesn't it?
fubar url = do
u <- parseURI url
(,url) . ((uriScheme u++"//") ++ ) . uriRegName <$> uriAuthority u
I'm not entirely clear on precisely what you're asking, but I'll do my best to answer the questions you have presented.
To extract multiple nested Maybes into a single final Maybe is taken care of by Maybe's monad-nature (also applicative-nature). How specifically to do it depends on how they are nested.
Simplest example:
Control.Monad.join :: (Monad m) => m (m a) -> m a
-- thus
Control.Monad.join :: Maybe (Maybe a) -> Maybe a
A tuple:
squishTuple :: (Maybe a, Maybe b) -> Maybe (a,b)
squishTuple (ma, mb) = do -- do in Maybe monad
a <- ma
b <- mb
return (a,b)
-- or
squishTuple (ma, mb) = liftA2 (,) ma mb
A list:
sequenceA :: (Applicative f, Traversable t) => t (f a) -> f (t a)
-- thus
sequenceA :: [Maybe a] -> Maybe [a]
-- (where t = [], f = Maybe)
Other structures can be flattened by composing these and following the types. For example:
flattenComplexThing :: (Maybe a, [Maybe (Maybe b)]) -> Maybe (a, [b])
flattenComplexThing (ma, mbs) = do
a <- ma
bs <- (join . fmap sequenceA . sequenceA) mbs
return (a, bs)
That join . fmap sequenceA . sequenceA line is a bit complex, and it takes some getting used to to know how to construct things like this. My brain works in a very type-directed way (read the composition right-to-left):
[Maybe (Maybe b)]
|
sequenceA :: [Maybe _] -> Maybe [_]
↓
Maybe [Maybe b]
|
-- sequenceA :: [Maybe b] -> Maybe [b]
-- fmap f makes the function f work "inside" the Maybe, so
fmap sequenceA :: Maybe [Maybe b] -> Maybe (Maybe [b])
↓
Maybe (Maybe [b])
|
join :: Maybe (Maybe _) -> Maybe _
↓
Maybe [b]
As for the second question, how to do an a -> b -> IO () when you have Maybe a and Maybe b, presuming you don't want to do the action at all in the case that either one is Nothing, you just do some gymnastics:
conditional :: (a -> IO ()) -> Maybe a -> IO ()
conditional = maybe (return ())
conditional2 :: (a -> b -> IO ()) -> Maybe a -> Maybe b -> IO ()
conditional2 f ma mb = conditional (uncurry f) (liftA2 (,) ma mb)
Again I found conditional2 in a mainly type-directed way in my mind.
It takes some time to develop your type gymnastics, but then it starts to get really fun. To make code like this readable, it's important to use helper functions, e.g. conditional above, and name them well (which is arguable about conditional :-). You'll gradually get familiar with the standard library's helpers. There's no magic bullet here, you just have to get used to it -- it's a language. Work with it, strive for clarity, if something is too ugly try your best to make it prettier. And ask more specific questions :-)
Why does this function have the type:
deleteAllMp4sExcluding :: [Char] -> IO (IO ())
instead of deleteAllMp4sExcluding :: [Char] -> IO ()
Also, how could I rewrite this so that it would have a simpler definition?
Here is the function definition:
import System.FilePath.Glob
import qualified Data.String.Utils as S
deleteAllMp4sExcluding videoFileName =
let dirGlob = globDir [compile "*"] "."
f = filter (\s -> S.endswith ".mp4" s && (/=) videoFileName s) . head . fst
lst = f <$> dirGlob
in mapM_ removeFile <$> lst
<$> when applied to IOs has type (a -> b) -> IO a -> IO b. So since mapM_ removeFile has type [FilePath] -> IO (), b in this case is IO (), so the result type becomes IO (IO ()).
To avoid nesting like this, you should not use <$> when the function you're trying to apply produces an IO value. Rather you should use >>= or, if you don't want to change the order of the operands, =<<.
Riffing on sepp2k's answer, this is an excellent example to show the difference between Functor and Monad.
The standard Haskell definition of Monad goes something like this (simplified):
class Monad m where
return :: a -> m a
(>>=) :: m a -> (a -> m b) -> m b
However, this is not the only way the class could have been defined. An alternative runs like this:
class Functor m => Monad m where
return :: a -> m a
join :: m (m a) -> m a
Given that, you can define >>= in terms of fmap and join:
(>>=) :: Monad m => m a -> (a -> m b) -> m b
ma >>= f = join (f <$> ma)
We'll look at this in a simplified sketch of the problem you're running into. What you're doing can be schematized like this:
ma :: IO a
f :: a -> IO b
f <$> ma :: IO (IO b)
Now you're stuck because you need an IO b, and the Functor class has no operation that will get you there from IO (IO b). The only way to get where you want is to dip into Monad, and the join operation is precisely what solves it:
join (f <$> ma) :: IO b
But by the join/<$> definition of >>=, this is the same as:
ma >>= f :: IO a
Note that the Control.Monad library comes with a version of join (written in terms of return and (>>=)); you could put that in your function to get the result you want. But the better thing to do is to recognize that what you're trying to do is fundamentally monadic, and thus that <$> is not the right tool for the job. You're feeding the result of one action to another; that intrinsically requires you to use Monad.
I have a function which signature is String -> String -> IO (Maybe String)
Now, I use this function to build values for a dictionary and I end up with: [(String,IO (Maybe String))]
I have to analyze the values in the dictionary and based on the result return the appropriate key. I was hoping to just use filter on the Map to step through it but I can't think of a way to extract that IO action on the fly. So how do I map/filter through the dictionary running the IO action and based on the result of the IO action's value return appropriate key of the dictionary?? Is there an easy way of doing it or I just got myself in a mess?
Thanks.
Perhaps the solution is to use sequence with something like
sequence . map (\(a,mb) -> (mb >>= \b -> return (a,b)))
then you can simply use liftM to apply your filter to the resulting IO [(String,Maybe String)].
liftM is in Control.Monad. Alternatively, in do notation
myFilter :: (String,(Maybe String)) -> Bool)
-> [(String,IO (Maybe String))]
-> IO [(String,(Maybe String))]
myFilter f ml =
do
l <- sequence . map (\(a,mb) -> (mb >>= \b -> return (a,b))) $ ml
return $ filter f ml
Perhaps some refactoring is in order. Often when working with monads you want to use mapM instead of map.
There is also a filterM function in Control.Monad. It might be what you need.
Edit: it was pointed out in comments that
sequence . map (\(a,mb) -> (mb >>= \b -> return (a,b))) $ ml
is equivalent to
mapM (\(a,mb) -> fmap ((,) a) mb) ml
so
myFiter' f = (liftM $ filter f) . mapM (\(a,mb) -> fmap ((,) a) mb)
I have a function which signature is String -> String -> IO (Maybe String) Now, I use this function to build values for a dictionary and I end up with: [(String,IO (Maybe String))]
This part doesn't add up for me. How do you input only 2 strings, and end up with a meaningful result of type [(String, IO (Maybe String))]?
Lacking the particulars, lets assume your situation is something like this:
f :: String -> String -> IO (Maybe String)
magic :: String -> (String, String)
magic takes some key, and somehow splits the key into the two input strings needed by f. So, assuming you have a list of keys, you would wind up with a [(String, IO (Maybe String))] like this:
-- ks :: [String]
myMap = map (\k -> let (a,b) = magic k in (k, f a b)) ks
But...wouldn't it be so much nicer if we had a [(String, Maybe String)] instead? Assuming we're inside an IO action...
someIOAction = do
...
myMap <- monadicMagicks (\k -> let (a,b) = magic k in (k, f a b)) ks
But what monadicMagicks could we use to make this work out right? Let's look at the types that we expect from these expressions
monadicMagicks (\k -> ...) ks :: IO [(String, Maybe String)]
(\k -> let (a,b) = magic k in (k, f a b)) :: String -> (String, IO (Maybe String))
ks :: [String]
-- therefore
monadicMagicks :: (String -> (String, IO (Maybe String)))
-> [String]
-> IO [(String, Maybe String)]
Stop...Hoogle time. Let's generalize what we're looking for a bit, first. We have two basic data types here: String, which appears to be the "input", and Maybe String, which appears to be the "output". So let's replace String with a and Maybe String with b. So we're looking for (a -> (a, IO b)) -> [a] -> IO [(a, b)]. Hrm. No results found. Well, if we could get a result type of just IO [b], then in a do block we could get it out of IO, and then zip it with the original key list. That also means the function we put in wouldn't have to perform the pairing of key with result. So let's simplify what we're looking for and try again: (a -> IO b) -> [a] -> IO [b]. Hey, check it out! mapM matches that type signature perfectly.
monadicMagicks :: (a -> IO b) -> [a] -> IO [(a, b)]
monadicMagicks f xs = do
ys <- mapM f xs
return $ zip xs ys
This could actually be written more succinctly, and with a more general type signature:
monadicMagicks :: Monad m => (a -> m b) -> [a] -> m [(a, b)]
monadicMagicks f xs = zip xs `liftM` mapM f xs
someIOAction = do
...
-- Like we said, the lambda no longer has to pair up the key with the result
myMap <- monadicMagicks (\k -> let (a,b) = magic k in f a b) ks
This last line could also be rewritten a bit shorter, if you're comfortable with function composition:
myMap <- monadicMagicks (uncurry f . magic) ks
Note that monadicMagicks might not make much sense for some monads, which is probably why it isn't in the standard libs. (For example, using mapM in the list monad usually means that the result will have a different length than the input):
ghci> mapM (\x -> [x, -x]) [1,2]
[[1,2],[1,-2],[-1,2],[-1,-2]]
I'm writing a program that reads from a list of files. The each file either contains a link to the next file or marks that it's the end of the chain.
Being new to Haskell, it seemed like the idiomatic way to handle this is is a lazy list of possible files to this end, I have
getFirstFile :: String -> DataFile
getNextFile :: Maybe DataFile -> Maybe DataFile
loadFiles :: String -> [Maybe DataFile]
loadFiles = iterate getNextFile . Just . getFirstFile
getFiles :: String -> [DataFile]
getFiles = map fromJust . takeWhile isJust . loadFiles
So far, so good. The only problem is that, since getFirstFile and getNextFile both need to open files, I need their results to be in the IO monad. This gives the modified form of
getFirstFile :: String -> IO DataFile
getNextFile :: Maybe DataFile -> IO (Maybe DataFile)
loadFiles :: String -> [IO Maybe DataFile]
loadFiles = iterate (getNextFile =<<) . Just . getFirstFile
getFiles :: String -> IO [DataFile]
getFiles = liftM (map fromJust . takeWhile isJust) . sequence . loadFiles
The problem with this is that, since iterate returns an infinite list, sequence becomes an infinite loop. I'm not sure how to proceed from here. Is there a lazier form of sequence that won't hit all of the list elements? Should I be rejiggering the map and takeWhile to be operating inside the IO monad for each list element? Or do I need to drop the whole infinite list process and write a recursive function to terminate the list manually?
A step in the right direction
What puzzles me is getNextFile. Step into a simplified world with me, where we're not dealing with IO yet. The type is Maybe DataFile -> Maybe DataFile. In my opinion, this should simply be DataFile -> Maybe DataFile, and I will operate under the assumption that this adjustment is possible. And that looks like a good candidate for unfoldr. The first thing I am going to do is make my own simplified version of unfoldr, which is less general but simpler to use.
import Data.List
-- unfoldr :: (b -> Maybe (a,b)) -> b -> [a]
myUnfoldr :: (a -> Maybe a) -> a -> [a]
myUnfoldr f v = v : unfoldr (fmap tuplefy . f) v
where tuplefy x = (x,x)
Now the type f :: a -> Maybe a matches getNextFile :: DataFile -> Maybe DataFile
getFiles :: String -> [DataFile]
getFiles = myUnfoldr getNextFile . getFirstFile
Beautiful, right? unfoldr is a lot like iterate, except once it hits Nothing, it terminates the list.
Now, we have a problem. IO. How can we do the same thing with IO thrown in there? Don't even think about The Function Which Shall Not Be Named. We need a beefed up unfoldr to handle this. Fortunately, the source for unfoldr is available to us.
unfoldr :: (b -> Maybe (a, b)) -> b -> [a]
unfoldr f b =
case f b of
Just (a,new_b) -> a : unfoldr f new_b
Nothing -> []
Now what do we need? A healthy dose of IO. liftM2 unfoldr almost gets us the right type, but won't quite cut it this time.
An actual solution
unfoldrM :: Monad m => (b -> m (Maybe (a, b))) -> b -> m [a]
unfoldrM f b = do
res <- f b
case res of
Just (a, b') -> do
bs <- unfoldrM f b'
return $ a : bs
Nothing -> return []
It is a rather straightforward transformation; I wonder if there is some combinator that could accomplish the same.
Fun fact: we can now define unfoldr f b = runIdentity $ unfoldrM (return . f) b
Let's again define a simplified myUnfoldrM, we just have to sprinkle in a liftM in there:
myUnfoldrM :: Monad m => (a -> m (Maybe a)) -> a -> m [a]
myUnfoldrM f v = (v:) `liftM` unfoldrM (liftM (fmap tuplefy) . f) v
where tuplefy x = (x,x)
And now we're all set, just like before.
getFirstFile :: String -> IO DataFile
getNextFile :: DataFile -> IO (Maybe DataFile)
getFiles :: String -> IO [DataFile]
getFiles str = do
firstFile <- getFirstFile str
myUnfoldrM getNextFile firstFile
-- alternatively, to make it look like before
getFiles' :: String -> IO [DataFile]
getFiles' = myUnfoldrM getNextFile <=< getFirstFile
By the way, I typechecked all of these with data DataFile = NoClueWhatGoesHere, and the type signatures for getFirstFile and getNextFile, with their definitions set to undefined.
[edit] changed myUnfoldr and myUnfoldrM to behave more like iterate, including the initial value in the list of results.
[edit] Additional insight on unfolds:
If you have a hard time wrapping your head around unfolds, the Collatz sequence is possibly one of the simplest examples.
collatz :: Integral a => a -> Maybe a
collatz 1 = Nothing -- the sequence ends when you hit 1
collatz n | even n = Just $ n `div` 2
| otherwise = Just $ 3 * n + 1
collatzSequence :: Integral a => a -> [a]
collatzSequence = myUnfoldr collatz
Remember, myUnfoldr is a simplified unfold for the cases where the "next seed" and the "current output value" are the same, as is the case for collatz. This behavior should be easy to see given myUnfoldr's simple definition in terms of unfoldr and tuplefy x = (x,x).
ghci> collatzSequence 9
[9,28,14,7,22,11,34,17,52,26,13,40,20,10,5,16,8,4,2,1]
More, mostly unrelated thoughts
The rest has absolutely nothing to do with the question, but I just couldn't resist musing. We can define myUnfoldr in terms of myUnfoldrM:
myUnfoldr f v = runIdentity $ myUnfoldrM (return . f) v
Look familiar? We can even abstract this pattern:
sinkM :: ((a -> Identity b) -> a -> Identity c) -> (a -> b) -> a -> c
sinkM hof f = runIdentity . hof (return . f)
unfoldr = sinkM unfoldrM
myUnfoldr = sinkM myUnfoldrM
sinkM should work to "sink" (opposite of "lift") any function of the form
Monad m => (a -> m b) -> a -> m c.
since the Monad m in those functions can be unified with the Identity monad constraint of sinkM. However, I don't see anything that sinkM would actually be useful for.
sequenceWhile :: Monad m => (a -> Bool) -> [m a] -> m [a]
sequenceWhile _ [] = return []
sequenceWhile p (m:ms) = do
x <- m
if p x
then liftM (x:) $ sequenceWhile p ms
else return []
Yields:
getFiles = liftM (map fromJust) . sequenceWhile isJust . loadFiles
As you have noticed, IO results can't be lazy, so you can't (easily) build an infinite list using IO. There is a way out, however, in unsafeInterleaveIO; with this, you can do something like:
ioList startFile = do
v <- processFile startFile
continuation <- unsafeInterleaveIO (nextFile startFile >>= ioList)
return (v:continuation)
It's important to be careful here, though - you've just deferred the results of ioList to some unpredictable time in the future. It may never be run at all, in fact. So be very careful when you're being Clever™ like this.
Personally, I would just build a manual recursive function.
Laziness and I/O are a tricky combination. Using unsafeInterleaveIO is one way to produce lazy lists in the IO monad (and this is the technique used by the standard getContents, readFile and friends). However, as convenient as this is, it exposes pure code to possible I/O errors and makes makes releasing resources (such as file handles) non-deterministic. This is why most "serious" Haskell applications (especially those concerned with efficiency) nowadays use things called Enumerators and Iteratees for streaming I/O. One library in Hackage that implements this concept is enumerator.
You are probably fine with using lazy I/O in your application, but I thought I'd still give this as an example of another way to approach these kind of problems. You can find more in-depth tutorials about iteratees here and here.
For example, your stream of DataFiles could be implemented as an Enumerator like this:
import Data.Enumerator
import Control.Monad.IO.Class (liftIO)
iterFiles :: String -> Enumerator DataFile IO b
iterFiles s = first where
first (Continue k) = do
file <- liftIO $ getFirstFile s
k (Chunks [file]) >>== next file
first step = returnI step
next prev (Continue k) = do
file <- liftIO $ getNextFile (Just prev)
case file of
Nothing -> k EOF
Just df -> k (Chunks [df]) >>== next df
next _ step = returnI step
Is there a traditional way to map over a function that uses IO? Specifically, I'd like to map over a function that returns a random value of some kind. Using a normal map will result in an output of type ([IO b]), but to unpack the values in the list from IO, I need a something of type (IO [b]). So I wrote...
mapIO :: (a -> IO b) -> [a] -> [b] -> IO [b]
mapIO f [] acc = do return acc
mapIO f (x:xs) acc = do
new <- f x
mapIO f xs (new:acc)
... which works fine. But it seems like there ought to be a solution for this built into Haskell. For instance, an example use case:
getPercent :: Int -> IO Bool
getPercent x = do
y <- getStdRandom (randomR (1,100))
return $ y < x
mapIO (\f -> getPercent 50) [0..10] []
The standard way is via:
Control.Monad.mapM :: (Monad m) => (a -> m b) -> [a] -> m [b]
which is implemented in terms of sequence:
sequence :: (Monad m) => [m a] -> m [a]
Just to add to Don's answer, hake a look to the mapM_ function as well, which does exactly what mapM does but discards all the results so you get only side-effects.
This is useful if you want the have computations executed (for example IO computations) but are not interested in the result (for example, unlinking files).
And also see forM and forM_.