Haskell Turtle - split a shell

Haskell Turtle - split a shell - haskell

Is it possible to split a Shell in Turtle library (Haskell) and do different things to either split of the shell, such that the original Shell is only run once ?
/---- shell2
---Shell1 --/
\
\-----shell3
For instance, how to do
do
let lstmp = lstree "/tmp"
view lstmp
view $ do
path <- lstmp
x <- liftIO $ testdir path
return x
such that lstree "/tmp" would only run once.
Specifically I would like to send Shell 2 and Shell 3 to different files using output.

You won't be able to split a Shell into two separate shells that run simultaneously, unless there's some magic I don't know. But file writing is a fold over the contents of a shell or some other succession of things. It is built into turtle that you can always combine many folds and make them run simultaneously using the Control.Foldl material - here
foldIO :: Shell a -> FoldM IO a r -> IO r -- specializing
A shell is secretly a FoldM IO a r -> IO r under the hood anyway, so this is basically runShell. To do this we need to get the right Shell and the right combined FoldM IO. The whole idea of the Fold a b and FoldM m a b types from the foldl package is simultaneous folding.
I think the easiest way to get the right shell is just to make the lstree fold return a FilePath together with the result of testdir. You basically wrote this:
withDirInfo :: FilePath -> Shell (Bool, FilePath)
withDirInfo tmp = do
let lstmp = lstree tmp
path <- lstmp
bool <- liftIO $ testdir path
return (bool, path)
So now we can get a Shell (Bool, FilePath) from /tmp This has all the information our two folds will need, and thus that our combined fold will need.
Next we might write a helper fold that prints the Text component of the FilePath to a given handle:
sinkFilePaths :: Handle -> FoldM IO FilePath ()
sinkFilePaths handle = L.sink (T.hPutStrLn handle . format fp)
Then we can use this Handle -> FoldM IO FilePath () to define two FoldM IO (Bool, FilePath) (). Each will write different stuff to different handles, and we can unite them into a single simultaneous fold with <*. This is an independent FoldM IO ... and can be applied e.g. to a pure list of type [(Bool, FilePath)] using L.fold and it will write different things from the list to the different handles. In our case, though, we will apply it to the Shell (Bool, FilePath) we defined.
The only subtle part of this is the use of L.handlesM to print only the second element, in both cases, and only those filtered as directories in the other. This uses the _2 lens and filtered from the lens libraries. This could probably be simplified, but see what you think:
{-#LANGUAGE OverloadedStrings #-}
import Turtle
import qualified Control.Foldl as L
import qualified System.IO as IO
import Control.Lens (_2,filtered)
import qualified Data.Text.IO as T
main = IO.withFile "tmpfiles.txt" IO.WriteMode $ \h ->
IO.withFile "tmpdirs.txt" IO.WriteMode $ \h' -> do
foldIO (withDirInfo "/tmp") (sinkFilesDirs h h')
withDirInfo :: Turtle.FilePath -> Shell (Bool, Turtle.FilePath)
withDirInfo tmp = do
let lstmp = lstree tmp
path <- lstmp
bool <- liftIO $ testdir path
return (bool, path)
sinkFilePaths :: Handle -> FoldM IO Turtle.FilePath ()
sinkFilePaths handle = L.sink (T.hPutStrLn handle . format fp)
sinkFilesDirs :: Handle -> Handle -> FoldM IO (Bool, Turtle.FilePath) ()
sinkFilesDirs h h' = allfiles <* alldirs where
allfiles :: L.FoldM IO (Bool, Turtle.FilePath) ()
allfiles = L.handlesM _2 (sinkFilePaths h)
-- handle the second element of pairs with sinkFilePaths
alldirs :: FoldM IO (Bool, Turtle.FilePath) ()
alldirs = L.handlesM (filtered (\(bool,file) -> bool) . _2) (sinkFilePaths h')
-- handle the second element of pairs where the first element
-- is true using sinkFilePaths

It sounds like you're looking for something like async to split off your shells from the first shell and then wait for them to return. async is a pretty capable library that can achieve much more than the below example, but it provides a pretty simple solution to what you're asking for:
import Control.Concurrent.Async
import Turtle.Shell
import Turtle.Prelude
main :: IO ()
main = do
let lstmp1 = lstree "/tmp"
let lstmp2 = lstree "/etc"
view lstmp1
view lstmp2
job1 <- async $ view $ do
path <- lstmp1
x <- liftIO $ testdir path
return x
job2 <- async $ view $ do
path <- lstmp2
x <- liftIO $ testdir path
return x
wait job1
wait job2
Is this what you're looking for?

Related

Does GHC generate optimized code for `mapM_`?

I wrote this code,
import System.FilePath ((</>))
fp = "/Users/USER/Documents/Test/"
fpAcc = fp </> "acc.txt"
paths = map (fp </>) ["A.txt", "B.txt", "C.txt"]
main :: IO ()
main =
writeFile fpAcc ""
>> return paths
>>= mapM_ ((appendFile fpAcc =<<) . readFile)
>> readFile fpAcc >>= putStrLn
and this is the definition of mapM_:
mapM_ :: (Foldable t, Monad m) => (a -> m b) -> t a -> m ()
mapM_ f = foldr c (return ())
-- See Note [List fusion and continuations in 'c']
where c x k = f x >> k
{-# INLINE c #-}
Is the expression mapM_ ((appendFile fpAcc =<<) . readFile) producing 3 write operations to the disk, or due to some kind of GHC optimization, only one?
Would be great if the compiler could generate code that uses an intermediate memory to hold the appended data and then write only once. But, since mapM_ maps each element of the structure to a monadic action, perhaps each step performs one write operation.

No, this is not optimized. The code you wrote will open fpAcc, write some bytes, and close fpAcc once for each element in paths. Indeed, it would be incorrect for the compiler to optimize this: opening and closing a file is observable from outside the program, so an optimization that opened the file just once would be a behavioral change, not just a speed change.

Convert IO callback to infinite list

I am using a library that I can provide with a function a -> IO (), which it will call occasionally.
Because the output of my function depends not only on the a it receives as input, but also on the previous a's, it would be much easier for me to write a function [a] -> IO (), where [a] is infinite.
Can I write a function:
magical :: ([a] -> IO ()) -> (a -> IO ())
That collects the a's it receives from the callback and passes them to my function as a lazy infinite list?

The IORef solution is indeed the simplest one. If you'd like to explore a pure (but more complex) variant, have a look at conduit. There are other implementations of the same concept, see Iteratee I/O, but I found myself conduit to be very easy to use.
A conduit (AKA pipe) is an abstraction of of program that can accept input and/or produce output. As such, it can keep internal state, if needed. In your case, magical would be a sink, that is, a conduit that accepts input of some type, but produces no output. By wiring it into a source, a program that produces output, you complete the pipeline and then ever time the sink asks for an input, the source is run until it produces its output.
In your case you'd have roughly something like
magical :: Sink a IO () -- consumes a stream of `a`s, no result
magical = go (some initial state)
where
go state = do
m'input <- await
case m'input of
Nothing -> return () -- finish
Just input -> do
-- do something with the input
go (some updated state)

This is not exactly what you asked for, but it might be enough for your purposes, I think.
magical :: ([a] -> IO ()) -> IO (a -> IO ())
magical f = do
list <- newIORef []
let g x = do
modifyIORef list (x:)
xs <- readIORef list
f xs -- or (reverse xs), if you need FIFO ordering
return g
So if you have a function fooHistory :: [a] -> IO (), you can use
main = do
...
foo <- magical fooHistory
setHandler foo -- here we have foo :: a -> IO ()
...
As #danidaz wrote above, you probably do not need magical, but can play the same trick directly in your fooHistory, modifying a list reference (IORef [a]).
main = do
...
list <- newIORef []
let fooHistory x = do
modifyIORef list (x:)
xs <- readIORef list
use xs -- or (reverse xs), if you need FIFO ordering
setHandler fooHistory -- here we have fooHistory :: a -> IO ()
...

Control.Concurrent.Chan does almost exactly what I wanted!
import Control.Monad (forever)
import Control.Concurrent (forkIO)
import Control.Concurrent.Chan
setHandler :: (Char -> IO ()) -> IO ()
setHandler f = void . forkIO . forever $ getChar >>= f
process :: String -> IO ()
process ('h':'i':xs) = putStrLn "hi" >> process xs
process ('a':xs) = putStrLn "a" >> process xs
process (x:xs) = process xs
process _ = error "Guaranteed to be infinite"
main :: IO ()
main = do
c <- newChan
setHandler $ writeChan c
list <- getChanContents c
process list

This seems like a flaw in the library design to me. You might consider an upstream patch so that you could provide something more versatile as input.

Haskell Turtle get out of Shell Monad

could you please help me with Turtle library.
I want to write simple program, that calculates disk space usage.
Here is the code:
getFileSize :: FilePath -> IO Size
getFileSize f = do
status <- stat f
return $ fileSize status
main = sh $ do
let sizes = fmap getFileSize $ find (suffix ".hs") "."
so now I have sizes bind of type Shell (IO Size). But I can't just sum it, with sum fold, cause there is IO Size in there. If it was something like [IO Size] I could pull IO monad out of there by using sequence to transform it to IO [Size]. But I can't do this with Shell monad since it is not Traversable. So I wrote something like this
import qualified Control.Foldl as F
main = sh $ do
let sizes = fmap getFileSize $ find (suffix ".hs") "."
lst <- fold sizes F.list
let cont = sequence lst
sz <- liftIO $ cont
liftIO $ putStrLn (show (sum sz))
First I folded Shell (IO Size) to [IO Size] and then to IO [Size] to sum list afterwards.
But I wonder if there is more canonical or elegant solution to this, because here I created two lists to accomplish my task. And I throught that Shell monad is for manipulating entities in constant space. Maybe there is some fold to make IO (Shell Size) from Shell (IO Size)?
Thanks.

You have an IO action, and you really want a Shell action. The usual way to handle that is with the liftIO method, which is available because Shell is an instance of MonadIO.
file <- find (suffix ".hs") "."
size <- liftIO $ getFileSize file
or even
size <- liftIO . getFileSize =<< find (suffix ".hs") "."
Fortunately, the Turtle package itself offers some size functions you can use directly with MonadIO instances like Shell in Turtle.Prelude so you don't need to use liftIO yourself.
Now you actually have to sum these up, but you can do that with fold and sum.
I would recommend that you avoid breaking open the Shell type itself. That should be reserved for adding totally new functionality to the API. That certainly isn't necessary in this case.

Actually I've managed to get rid of IO here by using helper transformation
sio :: Shell (IO a) -> Shell a
sio s = Shell (\(FoldShell step begin done) ->
let step' x a = do
a' <- a
step x a'
in
_foldShell s (FoldShell step' begin done))
But now I wonder is there any simpler solution to this task...

Using mapM f [list] where f is defined with do notation

I currently have this code which will perform the main' function on each of the filenames in the list files.
Ideally I have been trying to combine main and main' but I haven't made much progress. Is there a better way to simplify this or will I need to keep them separate?
{- Start here -}
main :: IO [()]
main = do
files <- getArgs
mapM main' files
{- Main's helper function -}
main' :: FilePath -> IO ()
main' file = do
contents <- readFile file
case (runParser parser 0 file $ lexer contents) of Left err -> print err
Right xs -> putStr xs
Thanks!
Edit: As most of you are suggesting; I was trying a lambda abstraction for this but wasn't getting it right. - Should've specified this above. With the examples I see this better.

The Control.Monad library defines the function forM which is mapM is reverse arguments. That makes it easier to use in your situation, i.e.
main :: IO ()
main = do
files <- getArgs
forM_ files $ \file -> do
contents <- readFile file
case (runParser f 0 file $ lexer contents) of
Left err -> print err
Right xs -> putStr xs
The version with the underscore at the end of the name is used when you are not interested in the resulting list (like in this case), so main can simply have the type IO (). (mapM has a similar variant called mapM_).

You can use forM, which equals flip mapM, i.e. mapM with its arguments flipped, like this:
forM_ files $ \file -> do
contents <- readFile file
...
Also notice that I used forM_ instead of forM. This is more efficient when you are not interested in the result of the computation.

dealing with IO vs pure code in haskell

I'm writing a shell script (my 1st non-example in haskell) which is supposed to list a directory, get every file size, do some string manipulation (pure code) and then rename some files. I'm not sure what i'm doing wrong, so 2 questions:
How should i arrange the code in such program?
I have a specific issue, i get the following error, what am i doing wrong?
error:
Couldn't match expected type `[FilePath]'
against inferred type `IO [FilePath]'
In the second argument of `mapM', namely `fileNames'
In a stmt of a 'do' expression:
files <- (mapM getFileNameAndSize fileNames)
In the expression:
do { fileNames <- getDirectoryContents;
files <- (mapM getFileNameAndSize fileNames);
sortBy cmpFilesBySize files }
code:
getFileNameAndSize fname = do (fname, (withFile fname ReadMode hFileSize))
getFilesWithSizes = do
fileNames <- getDirectoryContents
files <- (mapM getFileNameAndSize fileNames)
sortBy cmpFilesBySize files

Your second, specific, problem is with the types of your functions. However, your first issue (not really a type thing) is the do statement in getFileNameAndSize. While do is used with monads, it's not a monadic panacea; it's actually implemented as some simple translation rules. The Cliff's Notes version (which isn't exactly right, thanks to some details involving error handling, but is close enough) is:
do a ≡ a
do a ; b ; c ... ≡ a >> do b ; c ...
do x <- a ; b ; c ... ≡ a >>= \x -> do b ; c ...
In other words, getFileNameAndSize is equivalent to the version without the do block, and so you can get rid of the do. This leaves you with
getFileNameAndSize fname = (fname, withFile fname ReadMode hFileSize)
We can find the type for this: since fname is the first argument to withFile, it has type FilePath; and hFileSize returns an IO Integer, so that's the type of withFile .... Thus, we have getFileNameAndSize :: FilePath -> (FilePath, IO Integer). This may or may not be what you want; you might instead want FilePath -> IO (FilePath,Integer). To change it, you can write any of
getFileNameAndSize_do fname = do size <- withFile fname ReadMode hFileSize
return (fname, size)
getFileNameAndSize_fmap fname = fmap ((,) fname) $
withFile fname ReadMode hFileSize
-- With `import Control.Applicative ((<$>))`, which is a synonym for fmap.
getFileNameAndSize_fmap2 fname = ((,) fname)
<$> withFile fname ReadMode hFileSize
-- With {-# LANGUAGE TupleSections #-} at the top of the file
getFileNameAndSize_ts fname = (fname,) <$> withFile fname ReadMode hFileSize
Next, as KennyTM pointed out, you have fileNames <- getDirectoryContents; since getDirectoryContents has type FilePath -> IO FilePath, you need to give it an argument. (e.g. getFilesWithSizes dir = do fileNames <- getDirectoryContents dir ...). This is probably just a simple oversight.
Mext, we come to the heart of your error: files <- (mapM getFileNameAndSize fileNames). I'm not sure why it gives you the precise error it does, but I can tell you what's wrong. Remember what we know about getFileNameAndSize. In your code, it returns a (FilePath, IO Integer). However, mapM is of type Monad m => (a -> m b) -> [a] -> m [b], and so mapM getFileNameAndSize is ill-typed. You want getFileNameAndSize :: FilePath -> IO (FilePath,Integer), like I implemented above.
Finally, we need to fix your last line. First of all, although you don't give it to us, cmpFilesBySize is presumably a function of type (FilePath, Integer) -> (FilePath, Integer) -> Ordering, comparing on the second element. This is really simple, though: using Data.Ord.comparing :: Ord a => (b -> a) -> b -> b -> Ordering, you can write this comparing snd, which has type Ord b => (a, b) -> (a, b) -> Ordering. Second, you need to return your result wrapped up in the IO monad rather than just as a plain list; the function return :: Monad m => a -> m a will do the trick.
Thus, putting this all together, you'll get
import System.IO (FilePath, withFile, IOMode(ReadMode), hFileSize)
import System.Directory (getDirectoryContents)
import Control.Applicative ((<$>))
import Data.List (sortBy)
import Data.Ord (comparing)
getFileNameAndSize :: FilePath -> IO (FilePath, Integer)
getFileNameAndSize fname = ((,) fname) <$> withFile fname ReadMode hFileSize
getFilesWithSizes :: FilePath -> IO [(FilePath,Integer)]
getFilesWithSizes dir = do fileNames <- getDirectoryContents dir
files <- mapM getFileNameAndSize fileNames
return $ sortBy (comparing snd) files
This is all well and good, and will work fine. However, I might write it slightly differently. My version would probably look like this:
{-# LANGUAGE TupleSections #-}
import System.IO (FilePath, withFile, IOMode(ReadMode), hFileSize)
import System.Directory (getDirectoryContents)
import Control.Applicative ((<$>))
import Control.Monad ((<=<))
import Data.List (sortBy)
import Data.Ord (comparing)
preservingF :: Functor f => (a -> f b) -> a -> f (a,b)
preservingF f x = (x,) <$> f x
-- Or liftM2 (<$>) (,), but I am not entirely sure why.
fileSize :: FilePath -> IO Integer
fileSize fname = withFile fname ReadMode hFileSize
getFilesWithSizes :: FilePath -> IO [(FilePath,Integer)]
getFilesWithSizes = return . sortBy (comparing snd)
<=< mapM (preservingF fileSize)
<=< getDirectoryContents
(<=< is the monadic equivalent of ., the function composition operator.) First off: yes, my version is longer. However, I'd probably already have preservingF defined somewhere, making the two equivalent in length.* (I might even inline fileSize if it weren't used elsewhere.) Second, I like this version better because it involves chaining together simpler pure functions we've already written. While your version is similar, mine (I feel) is more streamlined and makes this aspect of things clearer.
So this is a bit of an answer to your first question of how to structure these things. I personally tend to lock my IO down into as few functions as possible—only functions which need to touch the outside world directly (e.g. main and anything which interacts with a file) get an IO. Everything else is an ordinary pure function (and is only monadic if it's monadic for general reasons, along the lines of preservingF). I then arrange things so that main, etc., are just compositions and chains of pure functions: main gets some values from IO-land; then it calls pure functions to fold, spindle, and mutilate the date; then it gets more IO values; then it operates more; etc. The idea is to separate the two domains as much as possible, so that the more compositional non-IO code is always free, and the black-box IO is only done precisely where necessary.
Operators like <=< really help with writing code in this style, as they let you operate on functions which interact with monadic values (such as the IO-world) just as you would operate on normal functions. You should also look at Control.Applicative's function <$> liftedArg1 <*> liftedArg2 <*> ... notation, which lets you apply ordinary functions to any number of monadic (really Applicative) arguments. This is really nice for getting rid of spurious <-s and just chaining pure functions over monadic code.
*: I feel like preservingF, or at least its sibling preserving :: (a -> b) -> a -> (a,b), should be in a package somewhere, but I've been unable to find either.

getDirectoryContents is a function. You should supply an argument to it, e.g.
fileNames <- getDirectoryContents "/usr/bin"
Also, the type of getFileNameAndSize is FilePath -> (FilePath, IO Integer), as you can check from ghci:
Prelude> :m + System.IO
Prelude System.IO> let getFileNameAndSize fname = do (fname, (withFile fname ReadMode hFileSize))
Prelude System.IO> :t getFileNameAndSize
getFileNameAndSize :: FilePath -> (FilePath, IO Integer)
But mapM requires the input function to return an IO stuff:
Prelude System.IO> :t mapM
mapM :: (Monad m) => (a -> m b) -> [a] -> m [b]
-- # ^^^^^^^^
You should change its type to FilePath -> IO (FilePath, Integer) to match the type.
getFileNameAndSize fname = do
fsize <- withFile fname ReadMode hFileSize
return (fname, fsize)

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

Haskell Turtle - split a shell - haskell

Related

Does GHC generate optimized code for `mapM_`?

Convert IO callback to infinite list

Haskell Turtle get out of Shell Monad

Using mapM f [list] where f is defined with do notation

dealing with IO vs pure code in haskell

Categories

Resources