Help with Haskell IO typing - haskell

I need some help with types in Haskell...
Here's the code I'm working on:
loadManyPeople :: [FilePath] -> IO [Person]
loadManyPeople fs = do
  return $ concat $ map loadPeople fs
loadPeople :: FilePath -> IO [Person]
loadPeople file = do
  lines <- getLines file
  return $ map parsePerson lines
loadPeople is fine. I want loadManyPeople to load all the Persons from each file, then concat them into one list of Persons.
I'm new to Haskell and need help with getting the types to work out.
Thanks for the help.
Alex

loadPeople gives you an IO [Person], so map loadPeople fs gives you a [IO [Person]]. To use concat, however, you'd need a [[Person]].
To fix this, you can use sequence, which (specialized to lists and IO) has type [IO a] -> IO [a], so you can do the following:
loadManyPeople fs = do
  manyPeople <- sequence $ map loadPeople fs
  return $ concat manyPeople
However, there's a shortcut for using map and then sequence: mapM, which has type (a -> IO b) -> [a] -> IO [b]. With mapM your code looks like this:
loadManyPeople fs = do
  manyPeople <- mapM loadPeople fs
  return $ concat manyPeople
This can be written more succinctly using Applicative:
import Control.Applicative
loadManyPeople fs = concat <$> mapM loadPeople fs
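To compare the three variants side by side without needing real files, here is a small sketch. fakeLoad is a hypothetical stand-in for loadPeople (it splits its argument into words instead of reading a file), so only the shape of the types carries over to the original code.

```haskell
-- Hypothetical stand-in for loadPeople: "loads" the words of a string
-- instead of reading a file, so the example needs no files on disk.
fakeLoad :: String -> IO [String]
fakeLoad = return . words

-- map, then sequence, then concat on the plain list of lists.
viaSequence :: [String] -> IO [String]
viaSequence srcs = do
  chunks <- sequence (map fakeLoad srcs)  -- chunks :: [[String]]
  return (concat chunks)

-- mapM fuses the map and the sequence.
viaMapM :: [String] -> IO [String]
viaMapM srcs = do
  chunks <- mapM fakeLoad srcs
  return (concat chunks)

-- (<$>) applies concat to the result of the IO action.
viaFmap :: [String] -> IO [String]
viaFmap srcs = concat <$> mapM fakeLoad srcs

demo :: IO ()
demo = viaFmap ["alice bob", "carol"] >>= print
```

Loading this in GHCi and running demo should print ["alice","bob","carol"]; the other two functions are meant to return the same list.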

Related

Haskell - Inserting a thread delay between the results of Prelude.sequence function

I have some code that looks like this
listOfIOByteStrings = simpleHttp <$> apiLinks :: [IO ByteString]
where apiLinks is a list of links which calls some API function.
I also have this function
z = sequence listOfIOByteStrings
sequence has this type: sequence :: (Traversable t, Monad m) => t (m a) -> m (t a)
What I want to do is add a thread delay between each ByteString's evaluation.
I'm thinking of using threadDelay, which has type threadDelay :: Int -> IO ()
This is what I'm doing ultimately
listOfContent <- z
pPrint $ filteredTitles . onlyElems . parseXML <$> listOfContent
where
parseXML :: ByteString -> [Content]
onlyElems :: [Content] -> [Element]
and
filteredTitles :: [Element] -> [String]
Applying thread-delay between result of sequence would be something like this
printing (filteredTitles . onlyElems . parseXML (bytestring of link1))...
delay of 1 sec...
printing (filteredTitles . onlyElems . parseXML (bytestring of link2))...
delay of 1 sec...
printing (filteredTitles . onlyElems . parseXML (bytestring of link3))...
delay of 1 sec...
I'm not sure how I should go about doing that.
One way to do that is to use forM_:
...
do listOfContent <- z
   forM_ listOfContent $
     \content -> do pPrint $ (filteredTitles . onlyElems . parseXML) content
                    threadDelay 1000000
I don't quite follow all the types in your pipeline, so consider this a rough sketch that can be fleshed out. First, you don't want to call sequence too early. Keep your list of IO ByteString values for now.
Next, you want some function (defined in terms of filteredTitles . onlyElems . parseXML) that takes a single IO ByteString and returns an IO (). If pPrint is the right type, this might simply be
process :: IO ByteString -> IO ()
process ibs = do
  bs <- ibs
  pPrint (filteredTitles . onlyElems . parseXML $ bs)
map process (map simpleHttp apiLinks) should result in a list of type [IO ()]. That could probably be rewritten in a less clunky fashion, but now we can get to the heart of the answer, which is using intersperse to insert your thread delays before finally sequencing the [IO ()] to get IO [()].
import Data.List
let results = map process (map simpleHttp apiLinks)
    actions = intersperse (threadDelay 1000000) results -- threadDelay takes microseconds
in sequence actions
intersperse :: a -> [a] -> [a] works by inserting its first argument between each element of its second. A simple example using strings:
> intersperse '-' "abc"
"a-b-c"
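The same trick applied to IO actions can be sketched as follows. withPauses and demoLog are hypothetical names, and the "jobs" only append to a log instead of fetching URLs, so the order of execution can be observed without network access.

```haskell
import Control.Concurrent (threadDelay)
import Data.IORef (modifyIORef, newIORef, readIORef)
import Data.List (intersperse)

-- Run the actions in order, executing `pause` between consecutive ones
-- (not before the first, not after the last).
withPauses :: IO () -> [IO ()] -> IO ()
withPauses pause = sequence_ . intersperse pause

-- Hypothetical demo: each job appends its name to a log, and the pause
-- logs itself and sleeps for one millisecond (threadDelay counts microseconds).
demoLog :: IO [String]
demoLog = do
  ref <- newIORef []
  let say msg = modifyIORef ref (msg :)
      pause   = say "pause" >> threadDelay 1000
  withPauses pause (map say ["link1", "link2", "link3"])
  reverse <$> readIORef ref
```

Because the list still holds unexecuted actions when intersperse runs, each pause happens between two jobs rather than after everything has finished. sequence_ is used here instead of sequence since the list of () results is not needed.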

Unwrapping a from IO (a)

I've been learning Haskell in the last 2 weeks and decided to try challenges at places such as HackerRank. This has required learning IO. I have read many answers on StackExchange and the general gist is you don't unwrap IO a, you just manipulate that data inside the IO function. That being the case what is the point of all the pure functions, if I'm not allowed to send data from main out to them? Here is some code that reads how many test cases, then for each test case reads N ordered pairs.
main = do
  test <- getLine
  replicateM (read test) doTest
doTest = do
  query <- getLine
  rs <- replicateM (read query) readPair
  return rs -- just here to make the file compile
readPair :: IO (Int, Int)
readPair = do
  input <- getLine
  let a = words input in return (read (a!!0) :: Int, read (a!!1) :: Int)
At this point I have a IO [(Int, Int)] inside of rs. I would like to send that data to this function:
validFunction :: [(Int,Int)]->Bool
validFunction [] = True
validFunction (x:[]) = True
validFunction (x:xs) = (not $ elem (snd x) (fmap snd xs)) && validFunction xs
But I can't seem to figure out how to do that. Any help or suggestions about how to call this function with the data I've read from the user would be appreciated. Or if I'm going about it from the wrong angle, any pointers on what I should be doing would also work.
Edit: From reading lots of other questions on here I now have the general idea that once you're in IO you're stuck there. But what I can't seem to find is the syntax to call a pure function with IO data and get back IO data. I've tried some of the following :
fmap validFunction [rs] :: IO Bool -- tried it with just rs without [] as well
mapM validFunction [rs] :: IO Bool
validFunction rs :: IO Bool
I was able to get this to work:
putStrLn . f . validFunction $ rs
though I'm still not clear on why this lets you pass the IO [(Int, Int)] to validFunction.
First of all, if you use x <- act in do, you essentially have a value. Unless you did something very suspicious, x isn't an IO something, but a something. So it's perfectly fine to use
foo :: Int -> Char
foo = …
bar :: IO Int
bar = …
fooDo :: IO Char
fooDo = do
  number <- bar
  return (foo number) -- apply foo directly on number
However, IO is an instance of Functor, so we can use fmap to lift foo:
liftedFoo :: IO Int -> IO Char
liftedFoo = fmap foo
So we could have written fooDo like this:
fooDo = fmap foo bar
Although its name is now misleading, it still does the same as before. But let's leave this naming voodoo aside, how would you tackle this? Well, your doTest has the correct type:
doTest :: IO [(Int, Int)]
doTest = do
  query <- getLine
  rs <- replicateM (read query) readPair
  return rs
So all that's missing is calling validFunction. We can do that like in fooDo:
doTest :: IO Bool
doTest = do
  query <- getLine
  rs <- replicateM (read query) readPair
  return (validFunction rs)
--       ^^^^^^^^^^^^^^^^^^  no IO inside here
--^^^^^^                     back to IO
Or we can fmap over another IO value, like replicateM (read query) readPair:
doTest :: IO Bool
doTest = do
  query <- getLine
  fmap validFunction (replicateM (read query) readPair)
The latter is harder to read, though. But you can write your doTest whichever way you prefer.
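Both styles can be tried out in isolation with the sketch below. checkWithDo and checkWithFmap are hypothetical names, validFunction is a condensed version of the one in the question, and the input is supplied with return rather than read from stdin so the example runs without user interaction.

```haskell
-- The pure check from the question: no two pairs share a second component.
validFunction :: [(Int, Int)] -> Bool
validFunction []       = True
validFunction (x : xs) = snd x `notElem` map snd xs && validFunction xs

-- Style 1: bind the result with <-, then call the pure function on it.
checkWithDo :: IO [(Int, Int)] -> IO Bool
checkWithDo readPairs = do
  pairs <- readPairs            -- pairs :: [(Int, Int)], no IO here
  return (validFunction pairs)  -- return wraps the Bool back into IO

-- Style 2: lift the pure function over the action with fmap.
checkWithFmap :: IO [(Int, Int)] -> IO Bool
checkWithFmap = fmap validFunction

demo :: IO ()
demo = do
  ok  <- checkWithDo   (return [(1, 2), (3, 4)])
  bad <- checkWithFmap (return [(1, 2), (3, 2)])
  print (ok, bad)
```

Running demo in GHCi should print (True,False): the second list repeats the value 2 in the second position. In neither style does validFunction itself ever see an IO value.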

Haskell Turtle - split a shell

Is it possible to split a Shell in Turtle library (Haskell) and do different things to either split of the shell, such that the original Shell is only run once ?
             /---- shell2
---Shell1 --/
            \
             \-----shell3
For instance, how to do
do
  let lstmp = lstree "/tmp"
  view lstmp
  view $ do
    path <- lstmp
    x <- liftIO $ testdir path
    return x
such that lstree "/tmp" would only run once.
Specifically I would like to send Shell 2 and Shell 3 to different files using output.
You won't be able to split a Shell into two separate shells that run simultaneously, unless there's some magic I don't know. But file writing is a fold over the contents of a shell or some other succession of things. It is built into turtle that you can always combine many folds and make them run simultaneously using the Control.Foldl material - here
foldIO :: Shell a -> FoldM IO a r -> IO r -- specializing
A shell is secretly a FoldM IO a r -> IO r under the hood anyway, so this is basically runShell. To do this we need to get the right Shell and the right combined FoldM IO. The whole idea of the Fold a b and FoldM m a b types from the foldl package is simultaneous folding.
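To illustrate what "simultaneous folding" means, here is a simplified, base-only re-implementation of the idea behind Fold. It is a sketch rather than the foldl package's actual code, and runFold, sumF, lengthF and sumAndLength are names chosen for this example.

```haskell
{-# LANGUAGE ExistentialQuantification #-}
import Data.List (foldl')

-- A fold packaged as: a step function, a starting accumulator, and a
-- final extraction step. The accumulator type x is hidden.
data Fold a b = forall x. Fold (x -> a -> x) x (x -> b)

instance Functor (Fold a) where
  fmap f (Fold step begin done) = Fold step begin (f . done)

instance Applicative (Fold a) where
  pure b = Fold (\() _ -> ()) () (const b)
  -- Combining two folds pairs up their accumulators, so each input
  -- element is fed to both folds in a single traversal.
  (Fold stepL beginL doneL) <*> (Fold stepR beginR doneR) =
    Fold step (beginL, beginR) done
    where
      step (l, r) a = (stepL l a, stepR r a)
      done (l, r)   = doneL l (doneR r)

-- Run a fold over a list with one pass.
runFold :: Fold a b -> [a] -> b
runFold (Fold step begin done) xs = done (foldl' step begin xs)

sumF :: Num a => Fold a a
sumF = Fold (+) 0 id

lengthF :: Fold a Int
lengthF = Fold (\n _ -> n + 1) 0 id

-- Two folds combined into one; the list is traversed only once.
sumAndLength :: Fold Int (Int, Int)
sumAndLength = (,) <$> sumF <*> lengthF
```

With these definitions, runFold sumAndLength [1..10] evaluates to (55,10). The answer's allfiles <* alldirs relies on the same kind of Applicative combination, only with monadic folds that write to handles instead of pure folds that compute numbers.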
I think the easiest way to get the right shell is just to make the lstree fold return a FilePath together with the result of testdir. You basically wrote this:
withDirInfo :: FilePath -> Shell (Bool, FilePath)
withDirInfo tmp = do
  let lstmp = lstree tmp
  path <- lstmp
  bool <- liftIO $ testdir path
  return (bool, path)
So now we can get a Shell (Bool, FilePath) from /tmp. This has all the information our two folds will need, and thus that our combined fold will need.
Next we might write a helper fold that prints the Text component of the FilePath to a given handle:
sinkFilePaths :: Handle -> FoldM IO FilePath ()
sinkFilePaths handle = L.sink (T.hPutStrLn handle . format fp)
Then we can use this Handle -> FoldM IO FilePath () to define two FoldM IO (Bool, FilePath) (). Each will write different stuff to different handles, and we can unite them into a single simultaneous fold with <*. This is an independent FoldM IO ... and can be applied e.g. to a pure list of type [(Bool, FilePath)] using L.fold and it will write different things from the list to the different handles. In our case, though, we will apply it to the Shell (Bool, FilePath) we defined.
The only subtle part of this is the use of L.handlesM to print only the second element, in both cases, and only those filtered as directories in the other. This uses the _2 lens and filtered from the lens libraries. This could probably be simplified, but see what you think:
{-# LANGUAGE OverloadedStrings #-}
import Turtle
import qualified Control.Foldl as L
import qualified System.IO as IO
import Control.Lens (_2,filtered)
import qualified Data.Text.IO as T
main = IO.withFile "tmpfiles.txt" IO.WriteMode $ \h ->
  IO.withFile "tmpdirs.txt" IO.WriteMode $ \h' -> do
    foldIO (withDirInfo "/tmp") (sinkFilesDirs h h')
withDirInfo :: Turtle.FilePath -> Shell (Bool, Turtle.FilePath)
withDirInfo tmp = do
  let lstmp = lstree tmp
  path <- lstmp
  bool <- liftIO $ testdir path
  return (bool, path)
sinkFilePaths :: Handle -> FoldM IO Turtle.FilePath ()
sinkFilePaths handle = L.sink (T.hPutStrLn handle . format fp)
sinkFilesDirs :: Handle -> Handle -> FoldM IO (Bool, Turtle.FilePath) ()
sinkFilesDirs h h' = allfiles <* alldirs where
  allfiles :: L.FoldM IO (Bool, Turtle.FilePath) ()
  allfiles = L.handlesM _2 (sinkFilePaths h)
  -- handle the second element of pairs with sinkFilePaths
  alldirs :: FoldM IO (Bool, Turtle.FilePath) ()
  alldirs = L.handlesM (filtered (\(bool,file) -> bool) . _2) (sinkFilePaths h')
  -- handle the second element of pairs where the first element
  -- is true using sinkFilePaths
It sounds like you're looking for something like async to split off your shells from the first shell and then wait for them to return. async is a pretty capable library that can achieve much more than the below example, but it provides a pretty simple solution to what you're asking for:
import Control.Concurrent.Async
import Control.Monad.IO.Class (liftIO)
import Turtle.Shell
import Turtle.Prelude
main :: IO ()
main = do
  let lstmp1 = lstree "/tmp"
  let lstmp2 = lstree "/etc"
  view lstmp1
  view lstmp2
  job1 <- async $ view $ do
    path <- lstmp1
    x <- liftIO $ testdir path
    return x
  job2 <- async $ view $ do
    path <- lstmp2
    x <- liftIO $ testdir path
    return x
  wait job1
  wait job2
Is this what you're looking for?

Haskell IO code doesn't typecheck

I'm a beginner with Haskell and am having trouble figuring out some code. What do I need to do to get the types right on this IO section of my code?
Thanks in advance.
loadPeople :: FilePath -> IO [Person]
loadPeople file = do
  lines <- getLines file
  map parsePerson lines
getLines :: FilePath -> IO [String]
getLines = liftM lines . readFile
parsePerson :: String -> Person
parsePerson line = ...........
map is underlined in red in Leksah, and the compile error I am receiving is:
src\Main.hs:13:3:
Couldn't match expected type `IO [Person]'
against inferred type `[Person]'
In the expression: map parsePerson lines
In the expression:
do { lines <- getLines file;
map parsePerson lines }
In the definition of `loadPeople':
loadPeople file
= do { lines <- getLines file;
map parsePerson lines }
map parsePerson lines has type [Person], but since the result type of loadPeople is IO [Person], you need to wrap it in IO using return:
return $ map parsePerson lines

dealing with IO vs pure code in haskell

I'm writing a shell script (my first non-example program in Haskell) which is supposed to list a directory, get every file size, do some string manipulation (pure code) and then rename some files. I'm not sure what I'm doing wrong, so two questions:
How should I arrange the code in such a program?
I have a specific issue: I get the following error. What am I doing wrong?
error:
Couldn't match expected type `[FilePath]'
against inferred type `IO [FilePath]'
In the second argument of `mapM', namely `fileNames'
In a stmt of a 'do' expression:
files <- (mapM getFileNameAndSize fileNames)
In the expression:
do { fileNames <- getDirectoryContents;
files <- (mapM getFileNameAndSize fileNames);
sortBy cmpFilesBySize files }
code:
getFileNameAndSize fname = do (fname, (withFile fname ReadMode hFileSize))
getFilesWithSizes = do
  fileNames <- getDirectoryContents
  files <- (mapM getFileNameAndSize fileNames)
  sortBy cmpFilesBySize files
Your second, specific, problem is with the types of your functions. However, your first issue (not really a type thing) is the do statement in getFileNameAndSize. While do is used with monads, it's not a monadic panacea; it's actually implemented as some simple translation rules. The Cliff's Notes version (which isn't exactly right, thanks to some details involving error handling, but is close enough) is:
do a ≡ a
do a ; b ; c ... ≡ a >> do b ; c ...
do x <- a ; b ; c ... ≡ a >>= \x -> do b ; c ...
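As a rough check of these rules, the sketch below writes the same function once with do and once with the bind operator, and also shows a one-expression do block whose type is not monadic at all. pairDo, pairBind and plainTuple are hypothetical names used only for this illustration.

```haskell
-- Written with do-notation.
pairDo :: Monad m => m a -> m b -> m (a, b)
pairDo ma mb = do
  x <- ma
  y <- mb
  return (x, y)

-- The same function after applying the third rule twice and the
-- first rule once.
pairBind :: Monad m => m a -> m b -> m (a, b)
pairBind ma mb = ma >>= \x -> mb >>= \y -> return (x, y)

-- A do block holding a single expression is just that expression
-- (first rule), so this is an ordinary tuple, not a monadic action.
plainTuple :: (Int, String)
plainTuple = do (1, "one")
```

Both pair functions should agree on any monad, for example pairDo (Just 1) (Just 'a') and pairBind (Just 1) (Just 'a') both give Just (1,'a').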
In other words, getFileNameAndSize is equivalent to the version without the do block, and so you can get rid of the do. This leaves you with
getFileNameAndSize fname = (fname, withFile fname ReadMode hFileSize)
We can find the type for this: since fname is the first argument to withFile, it has type FilePath; and hFileSize returns an IO Integer, so that's the type of withFile .... Thus, we have getFileNameAndSize :: FilePath -> (FilePath, IO Integer). This may or may not be what you want; you might instead want FilePath -> IO (FilePath,Integer). To change it, you can write any of
getFileNameAndSize_do fname = do
  size <- withFile fname ReadMode hFileSize
  return (fname, size)
getFileNameAndSize_fmap fname = fmap ((,) fname) $
  withFile fname ReadMode hFileSize
-- With `import Control.Applicative ((<$>))`, which is a synonym for fmap.
getFileNameAndSize_fmap2 fname = ((,) fname)
  <$> withFile fname ReadMode hFileSize
-- With {-# LANGUAGE TupleSections #-} at the top of the file
getFileNameAndSize_ts fname = (fname,) <$> withFile fname ReadMode hFileSize
Next, as KennyTM pointed out, you have fileNames <- getDirectoryContents; since getDirectoryContents has type FilePath -> IO [FilePath], you need to give it an argument (e.g. getFilesWithSizes dir = do fileNames <- getDirectoryContents dir ...). This is probably just a simple oversight.
Next, we come to the heart of your error: files <- (mapM getFileNameAndSize fileNames). I'm not sure why it gives you the precise error it does, but I can tell you what's wrong. Remember what we know about getFileNameAndSize. In your code, it returns a (FilePath, IO Integer). However, mapM is of type Monad m => (a -> m b) -> [a] -> m [b], and so mapM getFileNameAndSize is ill-typed. You want getFileNameAndSize :: FilePath -> IO (FilePath,Integer), like I implemented above.
Finally, we need to fix your last line. First of all, although you don't give it to us, cmpFilesBySize is presumably a function of type (FilePath, Integer) -> (FilePath, Integer) -> Ordering, comparing on the second element. This is really simple, though: using Data.Ord.comparing :: Ord a => (b -> a) -> b -> b -> Ordering, you can write this as comparing snd, which has type Ord b => (a, b) -> (a, b) -> Ordering. Second, you need to return your result wrapped up in the IO monad rather than just as a plain list; the function return :: Monad m => a -> m a will do the trick.
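The sorting step is pure, so it can be tried on its own. In this small sketch, sortBySize and sample are hypothetical names, and the file names and sizes are made up.

```haskell
import Data.List (sortBy)
import Data.Ord (comparing)

-- Sort (name, size) pairs by size, smallest first.
sortBySize :: [(FilePath, Integer)] -> [(FilePath, Integer)]
sortBySize = sortBy (comparing snd)

-- Hypothetical sample data; the names and sizes are made up.
sample :: [(FilePath, Integer)]
sample = [("b.txt", 300), ("a.txt", 10), ("c.txt", 42)]
```

Evaluating sortBySize sample gives [("a.txt",10),("c.txt",42),("b.txt",300)]. Keeping this as a standalone pure function means only the final return has to touch IO.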
Thus, putting this all together, you'll get
import System.IO (FilePath, withFile, IOMode(ReadMode), hFileSize)
import System.Directory (getDirectoryContents)
import Control.Applicative ((<$>))
import Data.List (sortBy)
import Data.Ord (comparing)
getFileNameAndSize :: FilePath -> IO (FilePath, Integer)
getFileNameAndSize fname = ((,) fname) <$> withFile fname ReadMode hFileSize
getFilesWithSizes :: FilePath -> IO [(FilePath,Integer)]
getFilesWithSizes dir = do
  fileNames <- getDirectoryContents dir
  files <- mapM getFileNameAndSize fileNames
  return $ sortBy (comparing snd) files
This is all well and good, and will work fine. However, I might write it slightly differently. My version would probably look like this:
{-# LANGUAGE TupleSections #-}
import System.IO (FilePath, withFile, IOMode(ReadMode), hFileSize)
import System.Directory (getDirectoryContents)
import Control.Applicative ((<$>))
import Control.Monad ((<=<))
import Data.List (sortBy)
import Data.Ord (comparing)
preservingF :: Functor f => (a -> f b) -> a -> f (a,b)
preservingF f x = (x,) <$> f x
-- Or liftM2 (<$>) (,), but I am not entirely sure why.
fileSize :: FilePath -> IO Integer
fileSize fname = withFile fname ReadMode hFileSize
getFilesWithSizes :: FilePath -> IO [(FilePath,Integer)]
getFilesWithSizes = return . sortBy (comparing snd)
                <=< mapM (preservingF fileSize)
                <=< getDirectoryContents
(<=< is the monadic equivalent of ., the function composition operator.) First off: yes, my version is longer. However, I'd probably already have preservingF defined somewhere, making the two equivalent in length.* (I might even inline fileSize if it weren't used elsewhere.) Second, I like this version better because it involves chaining together simpler pure functions we've already written. While your version is similar, mine (I feel) is more streamlined and makes this aspect of things clearer.
So this is a bit of an answer to your first question of how to structure these things. I personally tend to lock my IO down into as few functions as possible: only functions which need to touch the outside world directly (e.g. main and anything which interacts with a file) get an IO. Everything else is an ordinary pure function (and is only monadic if it's monadic for general reasons, along the lines of preservingF). I then arrange things so that main, etc., are just compositions and chains of pure functions: main gets some values from IO-land; then it calls pure functions to fold, spindle, and mutilate the data; then it gets more IO values; then it operates more; etc. The idea is to separate the two domains as much as possible, so that the more compositional non-IO code is always free, and the black-box IO is only done precisely where necessary.
Operators like <=< really help with writing code in this style, as they let you operate on functions which interact with monadic values (such as the IO-world) just as you would operate on normal functions. You should also look at Control.Applicative's function <$> liftedArg1 <*> liftedArg2 <*> ... notation, which lets you apply ordinary functions to any number of monadic (really Applicative) arguments. This is really nice for getting rid of spurious <-s and just chaining pure functions over monadic code.
*: I feel like preservingF, or at least its sibling preserving :: (a -> b) -> a -> (a,b), should be in a package somewhere, but I've been unable to find either.
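The <=< operator and the function <$> arg1 <*> arg2 notation mentioned above can be tried out with a small sketch. It uses Maybe instead of IO as the monad, purely so that it runs without input or files; parseInt, positive, parsePositive and addParsed are hypothetical names.

```haskell
import Control.Monad ((<=<))
import Text.Read (readMaybe)

-- Two small steps that can fail, each of type a -> Maybe b.
parseInt :: String -> Maybe Int
parseInt = readMaybe

positive :: Int -> Maybe Int
positive n
  | n > 0     = Just n
  | otherwise = Nothing

-- (<=<) composes them right to left, just like (.) does for pure functions.
parsePositive :: String -> Maybe Int
parsePositive = positive <=< parseInt

-- f <$> a <*> b applies a pure two-argument function to two wrapped values.
addParsed :: String -> String -> Maybe Int
addParsed a b = (+) <$> parsePositive a <*> parsePositive b
```

Here addParsed "2" "3" is Just 5, while addParsed "2" "-3" is Nothing because the second argument fails the positive check. The same operators work unchanged when the monad is IO.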
getDirectoryContents is a function. You should supply an argument to it, e.g.
fileNames <- getDirectoryContents "/usr/bin"
Also, the type of getFileNameAndSize is FilePath -> (FilePath, IO Integer), as you can check from ghci:
Prelude> :m + System.IO
Prelude System.IO> let getFileNameAndSize fname = do (fname, (withFile fname ReadMode hFileSize))
Prelude System.IO> :t getFileNameAndSize
getFileNameAndSize :: FilePath -> (FilePath, IO Integer)
But mapM requires the input function to return an IO stuff:
Prelude System.IO> :t mapM
mapM :: (Monad m) => (a -> m b) -> [a] -> m [b]
-- # ^^^^^^^^
You should change its type to FilePath -> IO (FilePath, Integer) to match the type.
getFileNameAndSize fname = do
  fsize <- withFile fname ReadMode hFileSize
  return (fname, fsize)
