Following on from my last question, I am now able to get the result I was after, but in a totally evil way: using unsafePerformIO. I understand this is not the right way to approach this (though in my defense, I got the idea from checking types on Hoogle, and then from an ag search across a hundred or so of kmett's repos to see when he used unsafePerformIO). I read the warnings on Hackage; I know it is bad.
What I'd like now is a way to do this without unsafePerformIO.
Here is the code:
module Main where
import Control.Monad (liftM)
import Data.List (isSubsequenceOf)
import qualified Data.Text as T
import System.Directory (listDirectory)
import System.FilePath ((</>), takeExtension)
import System.IO.Unsafe (unsafePerformIO)
import Text.PDF.Info
title :: FilePath -> IO String
title path = do
  result <- pdfInfo path
  case result of
    Left someError -> do
      return "no title"
    Right info -> do
      case (pdfInfoTitle info) of
        Nothing -> return "no title"
        Just title -> return (T.unpack title)
titleString :: FilePath -> String
titleString s = unsafePerformIO (title s)
{-# NOINLINE titleString #-}
dir = "/some/path"
main :: IO ()
main = do
  print =<<
    liftM
      (filter
         (\path ->
            (isSubsequenceOf "annotated" (titleString (dir </> path))) &&
            (takeExtension path == ".pdf")))
      (listDirectory dir)
Along the way I tried to use typed holes and lots of Hoogle to get help from the tools (teach a man to fish...). I need mentoring to get my process of discovery using tools and docs more dialed in. If you have tips on how you approach such things, please share them; or at least imagine what you would do if you lost all your long-term memory of Haskell except typed holes and Hoogle, and let me know how you would proceed. I plan to watch Brian McKenna's data61 videos soon, but until then, thanks in advance!
First, let's split out your filtering function:
isAnnotatedPdf :: FilePath -> Bool
isAnnotatedPdf path = (isSubsequenceOf "annotated" (titleString (dir </> path))) && (takeExtension path == ".pdf")
main :: IO ()
main = do
  print =<<
    liftM
      (filter isAnnotatedPdf)
      (listDirectory dir)
Now, use some syntactic sugar to clean up main:
main :: IO ()
main = do
  dirList <- listDirectory dir
  let filteredList = filter isAnnotatedPdf dirList
  print filteredList
Next, change isAnnotatedPdf to return its result inside of IO, and then modify main so that it's okay to do that, by replacing filter with filterM (from Control.Monad):
isAnnotatedPdf :: FilePath -> IO Bool
isAnnotatedPdf path = do
  return $ (isSubsequenceOf "annotated" (titleString (dir </> path))) && (takeExtension path == ".pdf")
main :: IO ()
main = do
  dirList <- listDirectory dir
  filteredList <- filterM isAnnotatedPdf dirList
  print filteredList
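The switch from filter to filterM can be tried in isolation. This is a minimal sketch, assuming a hypothetical fakeTitle that looks titles up in a fixed table instead of calling pdfInfo, so it runs without any PDFs or extra packages:

```haskell
import Control.Monad (filterM)
import Data.List (isSubsequenceOf)
import Data.Maybe (fromMaybe)

-- Hypothetical stand-in for `title`: a fixed lookup table instead of pdfInfo.
fakeTitle :: FilePath -> IO String
fakeTitle path = return (fromMaybe "no title" (lookup path table))
  where
    table = [("a.pdf", "annotated notes"), ("b.pdf", "plain notes")]

-- The predicate now lives in IO, so `filter` no longer fits; `filterM` does.
isAnnotated :: FilePath -> IO Bool
isAnnotated path = do
  t <- fakeTitle path
  return ("annotated" `isSubsequenceOf` t)

main :: IO ()
main = do
  kept <- filterM isAnnotated ["a.pdf", "b.pdf", "c.pdf"]
  print kept -- ["a.pdf"]
```

filterM has type Applicative m => (a -> m Bool) -> [a] -> m [a], which is why the predicate is allowed to return IO Bool here.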
Extract a variable pdfTitle inside isAnnotatedPdf to make the next step more clear:
isAnnotatedPdf :: FilePath -> IO Bool
isAnnotatedPdf path = do
  let pdfTitle = titleString (dir </> path)
  return $ (isSubsequenceOf "annotated" pdfTitle) && (takeExtension path == ".pdf")
Finally, change isAnnotatedPdf to use its new IO context instead of using your unsafePerformIO wrapper:
isAnnotatedPdf :: FilePath -> IO Bool
isAnnotatedPdf path = do
  pdfTitle <- title (dir </> path)
  return $ (isSubsequenceOf "annotated" pdfTitle) && (takeExtension path == ".pdf")
And you're done! Now you can get rid of titleString and all of your references to unsafePerformIO.
As a bonus, you can now easily avoid the need to call pdfInfo on things that aren't PDFs, by moving the pure takeExtension check to before the monadic title check, like this:
isAnnotatedPdf :: FilePath -> IO Bool
isAnnotatedPdf path = if takeExtension path == ".pdf"
  then do
    pdfTitle <- title (dir </> path)
    return $ isSubsequenceOf "annotated" pdfTitle
  else return False
Or using <$> instead of do:
isAnnotatedPdf :: FilePath -> IO Bool
isAnnotatedPdf path = if takeExtension path == ".pdf"
  then isSubsequenceOf "annotated" <$> title (dir </> path)
  else return False
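To see that the reordering really does skip the expensive call for non-PDFs, here is a small sketch. It assumes a hypothetical countingTitle that bumps a counter instead of calling pdfInfo, and uses isSuffixOf as a stand-in for the takeExtension check so that only base is needed:

```haskell
import Control.Monad (filterM)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)
import Data.List (isSubsequenceOf, isSuffixOf)

-- Hypothetical stand-in for `title`: counts its calls and returns the
-- path itself as the "title".
countingTitle :: IORef Int -> FilePath -> IO String
countingTitle calls path = do
  modifyIORef' calls (+ 1)
  return path

-- Same shape as the final isAnnotatedPdf: pure check first, IO only if needed.
-- `isSuffixOf ".pdf"` stands in for `takeExtension path == ".pdf"`.
isAnnotatedPdf :: IORef Int -> FilePath -> IO Bool
isAnnotatedPdf calls path =
  if ".pdf" `isSuffixOf` path
    then isSubsequenceOf "annotated" <$> countingTitle calls path
    else return False

main :: IO ()
main = do
  calls <- newIORef 0
  kept <- filterM (isAnnotatedPdf calls) ["annotated.pdf", "annotated.txt", "other.pdf"]
  n <- readIORef calls
  print (kept, n) -- (["annotated.pdf"],2)
```

The counter ends at 2 rather than 3 because the .txt entry is rejected before the title lookup runs.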
From the following code from a simple server using Spock:
{-# LANGUAGE OverloadedStrings #-}
module Main where
import Web.Spock
import Web.Spock.Config
import Data.Time.Clock
import Control.Concurrent
import Network.HTTP.Types.Status
import Network.HTTP.Types.URI
import Control.Monad.Trans
import Control.Concurrent.STM
import qualified Data.Text as T
app :: MyApp ()
app =
  do get root $ redirect' "https://google.no"
     -- Store params
     get ("oauth2" <//> var) $ \path' ->
       do ...
The majority of the imports don't relate to the question.
This is the (<//>) :: Path as Open -> Path bs ps -> Path (Append as bs) ps function. As the documentation says:
Combine two path components
In the source code [GitHub], we see that it is implemented as:
(<//>) :: Path as 'Open -> Path bs ps -> Path (Append as bs) ps
(<//>) = (</>)
This (</>) function originates from the (</>) :: Path as Open -> Path bs ps -> Path (Append as bs) ps in the reroute package. It is implemented as [GitHub]:
(</>) :: Path as 'Open -> Path bs ps2 -> Path (Append as bs) ps2
(</>) Empty xs = xs
(</>) (StaticCons pathPiece xs) ys = StaticCons pathPiece (xs </> ys)
(</>) (VarCons xs) ys = VarCons (xs </> ys)
It thus basically appends path pieces together; you can think of a Path as a kind of linked list. With the OverloadedStrings extension, a string literal (like "oauth2") can be converted into a Path, since Path is an instance of the IsString class [GitHub]:
instance (a ~ '[], pathState ~ 'Open) => IsString (Path a pathState) where
    fromString = static
This generates one StaticCons per piece of the path (since "oauth2" does not contain any slashes, it yields just one piece):
static :: String -> Path '[] 'Open
static s =
    let pieces = filter (not . T.null) $ T.splitOn "/" $ T.pack s
    in foldr StaticCons Empty pieces
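The splitting and appending behaviour can be imitated with a much simpler type. The sketch below is a deliberately simplified, untyped model: Piece, staticModel and joinModel are hypothetical names, and reroute's real Path is a GADT that also tracks the types of captured variables.

```haskell
-- Simplified model of a route; not reroute's actual Path type.
data Piece = Static String | Var
  deriving (Eq, Show)

-- Split on '/', dropping empty pieces, mirroring what `static` does with Text.
splitPieces :: String -> [String]
splitPieces s = filter (not . null) (go s)
  where
    go str = case break (== '/') str of
      (piece, [])       -> [piece]
      (piece, _ : rest) -> piece : go rest

-- Model of `static`: one Static piece per path segment.
staticModel :: String -> [Piece]
staticModel = map Static . splitPieces

-- In this model, joining two paths is just list append.
joinModel :: [Piece] -> [Piece] -> [Piece]
joinModel = (++)

main :: IO ()
main = do
  print (staticModel "oauth2")                   -- [Static "oauth2"]
  print (staticModel "api/v1" `joinModel` [Var]) -- [Static "api",Static "v1",Var]
```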
I finished reading the Pipes tutorial, and I wanted to write a function to list all the files in a directory, recursively. I tried with the following code:
enumFiles :: FilePath -> Producer' FilePath (PS.SafeT IO) ()
enumFiles path =
    PS.bracket (openDirStream path) (closeDirStream) loop
  where
    loop :: DirStream -> Producer' FilePath (PS.SafeT IO) ()
    loop ds = PS.liftBase (readDirStream ds) >>= checkName
      where
        checkName :: FilePath -> Producer' FilePath (PS.SafeT IO) ()
        checkName "" = return ()
        checkName "." = loop ds
        checkName ".." = loop ds
        checkName name = PS.liftBase (getSymbolicLinkStatus newPath)
                         >>= checkStat newPath
          where newPath = path </> name
        checkStat path stat
          | isRegularFile stat = yield path >> loop ds
          | isDirectory stat = enumFiles path
          | otherwise = loop ds
However this producer will terminate as soon as the return () is reached. I guess I'm not composing it in the right way, but I fail to see what is the correct way of doing this.
Simply change this line:
| isDirectory stat = enumFiles path
to
| isDirectory stat = enumFiles path >> loop ds
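The directory case never resumed the loop: after enumFiles finished with the subdirectory, nothing went on to read the remaining entries of the current directory.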
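The same mistake is easy to reproduce without pipes or a file system. This is only an illustration, assuming a hypothetical in-memory Entry tree; listBuggy mirrors the original directory case and listFixed mirrors the version with `>> loop ds`:

```haskell
-- Hypothetical in-memory directory tree, so no real file system is needed.
data Entry = File String | Dir String [Entry]

-- Mirrors the original bug: after descending into a directory,
-- the remaining entries of the current directory are dropped.
listBuggy :: [Entry] -> [String]
listBuggy [] = []
listBuggy (File name : rest) = name : listBuggy rest
listBuggy (Dir _ children : _) = listBuggy children

-- Mirrors the fix: descend, then carry on with the rest.
listFixed :: [Entry] -> [String]
listFixed [] = []
listFixed (File name : rest) = name : listFixed rest
listFixed (Dir _ children : rest) = listFixed children ++ listFixed rest

tree :: [Entry]
tree = [File "a", Dir "sub" [File "b"], File "c"]

main :: IO ()
main = do
  print (listBuggy tree) -- ["a","b"]
  print (listFixed tree) -- ["a","b","c"]
```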
You can also break this producer up into a composition of smaller producers and pipes:
{-# LANGUAGE RankNTypes #-}
module Main where
import qualified Pipes.Prelude as P
import qualified Pipes.Safe as PS
import Control.Monad
import Pipes
import System.FilePath.Posix
import System.Posix.Directory
import System.Posix.Files
readDirStream' :: FilePath -> Producer' FilePath (PS.SafeT IO) ()
readDirStream' dirpath =
    PS.bracket (openDirStream dirpath) closeDirStream (forever . loop)
  where
    loop stream =
      liftIO (readDirStream stream) >>= yield
enumFiles :: FilePath -> Producer' FilePath (PS.SafeT IO) ()
enumFiles path =
  readDirStream' path
    >-> P.takeWhile (/= "")
    >-> P.filter (not . flip elem [".", ".."])
    >-> P.map (path </>)
    >-> forever (do
          entry <- await
          status <- liftIO $ getSymbolicLinkStatus entry
          when (isDirectory status) (enumFiles entry)
          when (isRegularFile status) (yield entry))
main :: IO ()
main =
  PS.runSafeT $ runEffect (enumFiles "/tmp" >-> P.stdoutLn)
I find it's often helpful to use forever from Control.Monad or one of the combinators from Pipes.Prelude instead of manual recursion; it helps cut down on small slips like this one. However, as the kids say, your mileage may very well vary.
Is it possible to split a Shell in the Turtle library (Haskell) and do different things to each branch, such that the original Shell is only run once?
             /---- shell2
---Shell1 --/
            \
             \-----shell3
For instance, how to do
do
  let lstmp = lstree "/tmp"
  view lstmp
  view $ do
    path <- lstmp
    x <- liftIO $ testdir path
    return x
such that lstree "/tmp" would only run once.
Specifically I would like to send Shell 2 and Shell 3 to different files using output.
You won't be able to split a Shell into two separate shells that run simultaneously, unless there's some magic I don't know. But file writing is a fold over the contents of a shell or some other succession of things. Turtle lets you combine many folds and run them simultaneously using the Control.Foldl machinery, via foldIO:
foldIO :: Shell a -> FoldM IO a r -> IO r -- specializing
A shell is secretly a FoldM IO a r -> IO r under the hood anyway, so this is basically runShell. To do this we need to get the right Shell and the right combined FoldM IO. The whole idea of the Fold a b and FoldM m a b types from the foldl package is simultaneous folding.
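To make "simultaneous folding" concrete, here is a pared-down sketch of the idea behind the foldl package's Fold type. It is a hand-rolled approximation using only base, not the library's actual definition (which, among other things, uses a strict pair), but the Applicative instance shows how two folds end up sharing a single pass over the input:

```haskell
{-# LANGUAGE ExistentialQuantification #-}

import Data.List (foldl')

-- A fold is a step function, an initial state, and a final extraction step.
data Fold a b = forall x. Fold (x -> a -> x) x (x -> b)

instance Functor (Fold a) where
  fmap f (Fold step begin done) = Fold step begin (f . done)

-- Combining two folds pairs up their states, so each input element
-- is fed to both step functions during the same traversal.
instance Applicative (Fold a) where
  pure b = Fold (\() _ -> ()) () (const b)
  Fold stepL beginL doneL <*> Fold stepR beginR doneR =
    Fold step (beginL, beginR) done
    where
      step (l, r) a = (stepL l a, stepR r a)
      done (l, r) = doneL l (doneR r)

runFold :: Fold a b -> [a] -> b
runFold (Fold step begin done) = done . foldl' step begin

sumF :: Fold Int Int
sumF = Fold (+) 0 id

lengthF :: Fold a Int
lengthF = Fold (\n _ -> n + 1) 0 id

main :: IO ()
main = print (runFold ((,) <$> sumF <*> lengthF) [1, 2, 3, 4]) -- (10,4)
```

FoldM follows the same pattern with a monadic step function, which is what lets two file-writing folds consume one Shell.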
I think the easiest way to get the right shell is just to make the lstree fold return a FilePath together with the result of testdir. You basically wrote this:
withDirInfo :: FilePath -> Shell (Bool, FilePath)
withDirInfo tmp = do
  let lstmp = lstree tmp
  path <- lstmp
  bool <- liftIO $ testdir path
  return (bool, path)
So now we can get a Shell (Bool, FilePath) from /tmp. This has all the information our two folds will need, and thus everything our combined fold will need.
Next we might write a helper fold that prints the Text component of the FilePath to a given handle:
sinkFilePaths :: Handle -> FoldM IO FilePath ()
sinkFilePaths handle = L.sink (T.hPutStrLn handle . format fp)
Then we can use this Handle -> FoldM IO FilePath () to define two FoldM IO (Bool, FilePath) (). Each will write different stuff to different handles, and we can unite them into a single simultaneous fold with <*. This is an independent FoldM IO ... and can be applied e.g. to a pure list of type [(Bool, FilePath)] using L.fold and it will write different things from the list to the different handles. In our case, though, we will apply it to the Shell (Bool, FilePath) we defined.
The only subtle part of this is the use of L.handlesM to pick out the second element of each pair: unconditionally in one fold, and only for pairs flagged as directories in the other. This uses the _2 lens and filtered from the lens libraries. This could probably be simplified, but see what you think:
{-# LANGUAGE OverloadedStrings #-}
import Turtle
import qualified Control.Foldl as L
import qualified System.IO as IO
import Control.Lens (_2,filtered)
import qualified Data.Text.IO as T
main = IO.withFile "tmpfiles.txt" IO.WriteMode $ \h ->
  IO.withFile "tmpdirs.txt" IO.WriteMode $ \h' -> do
    foldIO (withDirInfo "/tmp") (sinkFilesDirs h h')
withDirInfo :: Turtle.FilePath -> Shell (Bool, Turtle.FilePath)
withDirInfo tmp = do
  let lstmp = lstree tmp
  path <- lstmp
  bool <- liftIO $ testdir path
  return (bool, path)
sinkFilePaths :: Handle -> FoldM IO Turtle.FilePath ()
sinkFilePaths handle = L.sink (T.hPutStrLn handle . format fp)
sinkFilesDirs :: Handle -> Handle -> FoldM IO (Bool, Turtle.FilePath) ()
sinkFilesDirs h h' = allfiles <* alldirs where
  allfiles :: L.FoldM IO (Bool, Turtle.FilePath) ()
  allfiles = L.handlesM _2 (sinkFilePaths h)
  -- handle the second element of pairs with sinkFilePaths
  alldirs :: FoldM IO (Bool, Turtle.FilePath) ()
  alldirs = L.handlesM (filtered (\(bool,file) -> bool) . _2) (sinkFilePaths h')
  -- handle the second element of pairs where the first element
  -- is true using sinkFilePaths
It sounds like you're looking for something like async to split off your shells from the first shell and then wait for them to return. async is a pretty capable library that can achieve much more than the below example, but it provides a pretty simple solution to what you're asking for:
import Control.Concurrent.Async
import Control.Monad.IO.Class (liftIO)
import Turtle.Shell
import Turtle.Prelude
main :: IO ()
main = do
  let lstmp1 = lstree "/tmp"
  let lstmp2 = lstree "/etc"
  view lstmp1
  view lstmp2
  job1 <- async $ view $ do
    path <- lstmp1
    x <- liftIO $ testdir path
    return x
  job2 <- async $ view $ do
    path <- lstmp2
    x <- liftIO $ testdir path
    return x
  wait job1
  wait job2
Is this what you're looking for?
I'm writing a shell script (my first non-example program in Haskell) which is supposed to list a directory, get every file size, do some string manipulation (pure code) and then rename some files. I'm not sure what I'm doing wrong, so two questions:
How should I arrange the code in such a program?
I have a specific issue: I get the following error. What am I doing wrong?
error:
Couldn't match expected type `[FilePath]'
       against inferred type `IO [FilePath]'
In the second argument of `mapM', namely `fileNames'
In a stmt of a 'do' expression:
    files <- (mapM getFileNameAndSize fileNames)
In the expression:
    do { fileNames <- getDirectoryContents;
         files <- (mapM getFileNameAndSize fileNames);
         sortBy cmpFilesBySize files }
code:
getFileNameAndSize fname = do (fname, (withFile fname ReadMode hFileSize))
getFilesWithSizes = do
  fileNames <- getDirectoryContents
  files <- (mapM getFileNameAndSize fileNames)
  sortBy cmpFilesBySize files
Your second, specific, problem is with the types of your functions. However, your first issue (not really a type thing) is the do statement in getFileNameAndSize. While do is used with monads, it's not a monadic panacea; it's actually implemented as some simple translation rules. The Cliff's Notes version (which isn't exactly right, thanks to some details involving error handling, but is close enough) is:
do a ≡ a
do a ; b ; c ... ≡ a >> do b ; c ...
do x <- a ; b ; c ... ≡ a >>= \x -> do b ; c ...
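The translation rules above can be checked on a small example. This sketch uses the list monad rather than IO so the two forms can be compared with ==; the names withDo, desugared and plainTuple are only for illustration:

```haskell
-- The same computation written with do-notation...
withDo :: [(Int, Char)]
withDo = do
  x <- [1, 2]
  y <- "ab"
  return (x, y)

-- ...and with the binds written out by hand, following the third rule.
desugared :: [(Int, Char)]
desugared = [1, 2] >>= \x -> "ab" >>= \y -> return (x, y)

-- By the first rule, a do block holding a single expression is just that
-- expression, so it can have a non-monadic type such as a plain tuple.
plainTuple :: (Int, String)
plainTuple = do (1, "one")

main :: IO ()
main = do
  print (withDo == desugared) -- True
  print plainTuple            -- (1,"one")
```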
In other words, getFileNameAndSize is equivalent to the version without the do block, and so you can get rid of the do. This leaves you with
getFileNameAndSize fname = (fname, withFile fname ReadMode hFileSize)
We can find the type for this: since fname is the first argument to withFile, it has type FilePath; and hFileSize returns an IO Integer, so that's the type of withFile .... Thus, we have getFileNameAndSize :: FilePath -> (FilePath, IO Integer). This may or may not be what you want; you might instead want FilePath -> IO (FilePath,Integer). To change it, you can write any of
getFileNameAndSize_do fname = do
  size <- withFile fname ReadMode hFileSize
  return (fname, size)
getFileNameAndSize_fmap fname = fmap ((,) fname) $
                                  withFile fname ReadMode hFileSize
-- With `import Control.Applicative ((<$>))`, which is a synonym for fmap.
getFileNameAndSize_fmap2 fname = ((,) fname)
                                 <$> withFile fname ReadMode hFileSize
-- With {-# LANGUAGE TupleSections #-} at the top of the file
getFileNameAndSize_ts fname = (fname,) <$> withFile fname ReadMode hFileSize
Next, as KennyTM pointed out, you have fileNames <- getDirectoryContents; since getDirectoryContents has type FilePath -> IO [FilePath], you need to give it an argument. (e.g. getFilesWithSizes dir = do fileNames <- getDirectoryContents dir ...). This is probably just a simple oversight.
Next, we come to the heart of your error: files <- (mapM getFileNameAndSize fileNames). I'm not sure why it gives you the precise error it does, but I can tell you what's wrong. Remember what we know about getFileNameAndSize. In your code, it returns a (FilePath, IO Integer). However, mapM is of type Monad m => (a -> m b) -> [a] -> m [b], and so mapM getFileNameAndSize is ill-typed. You want getFileNameAndSize :: FilePath -> IO (FilePath,Integer), like I implemented above.
Finally, we need to fix your last line. First of all, although you don't give it to us, cmpFilesBySize is presumably a function of type (FilePath, Integer) -> (FilePath, Integer) -> Ordering, comparing on the second element. This is really simple, though: using Data.Ord.comparing :: Ord a => (b -> a) -> b -> b -> Ordering, you can write this as comparing snd, which has type Ord b => (a, b) -> (a, b) -> Ordering. Second, you need to return your result wrapped up in the IO monad rather than just as a plain list; the function return :: Monad m => a -> m a will do the trick.
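As a quick check of those two points, here is a sketch with made-up file names and sizes (the sizes list and the bySize helper are hypothetical, chosen only so that nothing touches the disk):

```haskell
import Data.List (sortBy)
import Data.Ord (comparing)

-- Made-up data standing in for the result of mapM getFileNameAndSize.
sizes :: [(FilePath, Integer)]
sizes = [("big.bin", 4096), ("empty.txt", 0), ("notes.md", 120)]

-- `comparing snd` orders pairs by their second component.
bySize :: [(FilePath, Integer)] -> [(FilePath, Integer)]
bySize = sortBy (comparing snd)

-- The sorted list is a pure value; `return` lifts it into IO.
sortedInIO :: IO [(FilePath, Integer)]
sortedInIO = return (bySize sizes)

main :: IO ()
main = sortedInIO >>= mapM_ print
```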
Thus, putting this all together, you'll get
import System.IO (FilePath, withFile, IOMode(ReadMode), hFileSize)
import System.Directory (getDirectoryContents)
import Control.Applicative ((<$>))
import Data.List (sortBy)
import Data.Ord (comparing)
getFileNameAndSize :: FilePath -> IO (FilePath, Integer)
getFileNameAndSize fname = ((,) fname) <$> withFile fname ReadMode hFileSize
getFilesWithSizes :: FilePath -> IO [(FilePath,Integer)]
getFilesWithSizes dir = do
  fileNames <- getDirectoryContents dir
  files <- mapM getFileNameAndSize fileNames
  return $ sortBy (comparing snd) files
This is all well and good, and will work fine. However, I might write it slightly differently. My version would probably look like this:
{-# LANGUAGE TupleSections #-}
import System.IO (FilePath, withFile, IOMode(ReadMode), hFileSize)
import System.Directory (getDirectoryContents)
import Control.Applicative ((<$>))
import Control.Monad ((<=<))
import Data.List (sortBy)
import Data.Ord (comparing)
preservingF :: Functor f => (a -> f b) -> a -> f (a,b)
preservingF f x = (x,) <$> f x
-- Or liftM2 (<$>) (,), but I am not entirely sure why.
fileSize :: FilePath -> IO Integer
fileSize fname = withFile fname ReadMode hFileSize
getFilesWithSizes :: FilePath -> IO [(FilePath,Integer)]
getFilesWithSizes = return . sortBy (comparing snd)
                <=< mapM (preservingF fileSize)
                <=< getDirectoryContents
(<=< is the monadic equivalent of ., the function composition operator.) First off: yes, my version is longer. However, I'd probably already have preservingF defined somewhere, making the two equivalent in length.* (I might even inline fileSize if it weren't used elsewhere.) Second, I like this version better because it involves chaining together simpler pure functions we've already written. While your version is similar, mine (I feel) is more streamlined and makes this aspect of things clearer.
So this is a bit of an answer to your first question of how to structure these things. I personally tend to lock my IO down into as few functions as possible: only functions which need to touch the outside world directly (e.g. main and anything which interacts with a file) get an IO. Everything else is an ordinary pure function (and is only monadic if it's monadic for general reasons, along the lines of preservingF). I then arrange things so that main, etc., are just compositions and chains of pure functions: main gets some values from IO-land; then it calls pure functions to fold, spindle, and mutilate the data; then it gets more IO values; then it operates more; etc. The idea is to separate the two domains as much as possible, so that the more compositional non-IO code is always free, and the black-box IO is only done precisely where necessary.
Operators like <=< really help with writing code in this style, as they let you operate on functions which interact with monadic values (such as the IO-world) just as you would operate on normal functions. You should also look at Control.Applicative's function <$> liftedArg1 <*> liftedArg2 <*> ... notation, which lets you apply ordinary functions to any number of monadic (really Applicative) arguments. This is really nice for getting rid of spurious <-s and just chaining pure functions over monadic code.
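Both styles can be tried in a few lines. This sketch uses Maybe instead of IO so that it stays pure and easy to check; parseInt, halve, parseThenHalve and addParsed are hypothetical helpers, not part of the code above:

```haskell
import Control.Monad ((<=<))
import Text.Read (readMaybe)

parseInt :: String -> Maybe Int
parseInt = readMaybe

halve :: Int -> Maybe Int
halve n
  | even n = Just (n `div` 2)
  | otherwise = Nothing

-- Kleisli composition: reads right to left, like (.)
parseThenHalve :: String -> Maybe Int
parseThenHalve = halve <=< parseInt

-- Applicative style: apply a pure function to two effectful arguments.
addParsed :: String -> String -> Maybe Int
addParsed a b = (+) <$> parseInt a <*> parseInt b

main :: IO ()
main = do
  print (parseThenHalve "10") -- Just 5
  print (parseThenHalve "7")  -- Nothing
  print (addParsed "2" "40")  -- Just 42
```

The same operators work unchanged when the functions return IO values, which is how getFilesWithSizes above is put together.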
*: I feel like preservingF, or at least its sibling preserving :: (a -> b) -> a -> (a,b), should be in a package somewhere, but I've been unable to find either.
getDirectoryContents is a function. You should supply an argument to it, e.g.
fileNames <- getDirectoryContents "/usr/bin"
Also, the type of getFileNameAndSize is FilePath -> (FilePath, IO Integer), as you can check from ghci:
Prelude> :m + System.IO
Prelude System.IO> let getFileNameAndSize fname = do (fname, (withFile fname ReadMode hFileSize))
Prelude System.IO> :t getFileNameAndSize
getFileNameAndSize :: FilePath -> (FilePath, IO Integer)
But mapM requires the input function to return a monadic value (here, something in IO):
Prelude System.IO> :t mapM
mapM :: (Monad m) => (a -> m b) -> [a] -> m [b]
-- #                  ^^^^^^^^
You should change its type to FilePath -> IO (FilePath, Integer) to match the type.
getFileNameAndSize fname = do
  fsize <- withFile fname ReadMode hFileSize
  return (fname, fsize)