How to print paths using Haskell Turtle library?

To learn a bit about Turtle, I thought it would be nice to modify an example from the tutorial. I chose to remove the redundant "FilePath" from each line of the output, thinking it would be a simple exercise.
And yet, despite the author's efforts to make his library easy to use, I nearly failed to solve this simple problem with it.
I tried everything I saw that looked like it would allow me to somehow lift >>= from IO into Shell: MonadIO, FoldM, liftIO, _foldIO, with no success. I grew frustrated, and only through reading the Turtle source code was I able to find something that seems to work ("no obvious defects" comes to mind).
Why is this so hard? How does one logically arrive at a solution using the API of this library?
#!/usr/bin/env stack
-- stack --resolver lts-8.17 --install-ghc runghc --package turtle --package lens
{-# LANGUAGE OverloadedStrings #-}
import Turtle
import Control.Lens
import Control.Foldl as Foldl
import Filesystem.Path.CurrentOS
import Data.Text.IO as T
import Data.Text as T
main = do
    homedir <- home
    let paths = lstree $ homedir </> "projects"
    let t = fmap (Control.Lens.view _Right . toText) paths
    customView t
customView s = sh (do
    x <- s
    liftIO $ T.putStrLn x)

You don't lift >>= from IO into Shell. Shell already has a Monad instance that comes with its own >>= function. Instead you either lift IO actions into Shell with liftIO or run the shell with fold or foldM. Use sh to run the Shell when you don't care about the results.
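To make the two options concrete, here is a minimal sketch. It assumes a small stand-in stream (the hypothetical names) instead of lstree, and the helpers printAll and collect are invented for illustration: the first lifts an IO action into Shell and runs it with sh, the second runs the same Shell with fold and keeps the results.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Turtle
import qualified Control.Foldl as Fold
import qualified Data.Text.IO as T

-- Hypothetical stand-in for a stream such as lstree: three fixed values.
names :: Shell Text
names = select ["a", "b", "c"]

-- Lift an IO action into Shell with liftIO, then run the Shell with sh,
-- discarding any results.
printAll :: IO ()
printAll = sh $ do
    name <- names
    liftIO (T.putStrLn name)

-- Run the same Shell with fold to collect every element into a list.
collect :: IO [Text]
collect = fold names Fold.list
```

In both cases the bind inside the do block is Shell's own >>=; only the individual IO action is lifted.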
I believe your example can be simplified to
main = sh $ do
    homedir <- home
    filepath <- lstree $ homedir </> "projects"
    case (toText filepath) of
        Right path -> liftIO $ T.putStrLn path
        Left approx -> return () -- This shouldn't happen
As for the difficulty with getting a string back from a FilePath, I don't think that can be blamed on the Turtle author. I think it can be simplified to
stringPath :: FilePath -> String
stringPath filepath =
    case (toText filepath) of -- try to use the human readable version
        Right path -> T.unpack path
        Left _ -> encodeString filepath -- fall back on the machine readable one
Combined this would simplify the example to
main = sh $ do
    homedir <- home
    filepath <- lstree $ homedir </> "projects"
    liftIO $ putStrLn (stringPath filepath)
or
main = view $ do
    homedir <- home
    filepath <- lstree $ homedir </> "projects"
    return $ stringPath filepath

Related

how to list the functions exported by a Haskell module from an .hs script?

I am aware of this thread and the agreed-upon ghci :browse command, but I am looking for something similar to run from a script.hs file:
Say I have a module that I can import into my script.hs. How do I then view the list of functions I have just gained access to?
What I've settled on for now
Adapting this thread that suggests the now-deprecated ghc-mod command-line program, I am calling the terminal command ghc -e ':browse <module, e.g. Data.List>' from my script.hs using Shelly.
My full script:
#!/usr/bin/env runghc
{-# LANGUAGE OverloadedStrings #-}
import Safe (headDef)
import Shelly
import System.Environment (getArgs)
import qualified Data.Text as T
mdl :: IO String
mdl = getArgs >>= return . headDef "Data.List"
runShelly :: String -> IO ()
runShelly mdl = shelly $ silently $ do
    out <- run "ghc" ["-e", T.pack (":browse " ++ mdl)]
    let lns = T.lines out
    liftIO $ mapM_ (putStrLn . T.unpack) $ lns
main :: IO ()
main = mdl >>= runShelly
This way I can pass the module name on the command line as <script> <module> and get back the functions, one per line. It defaults to Data.List if I pass no arguments.
So that's a solution, but surely there must be handier introspection facilities than this?

Haskell: interaction between withCurrentDirectory and runConcurrently

I'm trying to automate some file management in Haskell using System.Directory. My script works synchronously, but in my use case, I have about twenty directories, for each of which I'd like to start a long-running process, so I am also using Control.Concurrent.Async, which seems to be causing problems.
Minimal Example:
#!/usr/bin/env stack
-- stack --resolver lts-10.3 --install-ghc runghc --package async
import Control.Concurrent.Async (Concurrently(..), runConcurrently)
import Control.Monad (filterM)
import System.Directory as Dir
import System.Process (callCommand)
dirs :: IO [FilePath]
dirs = do
    prefix <- (++ "/Desktop/dirs/") <$> Dir.getHomeDirectory
    paths <- fmap (prefix ++) <$> Dir.listDirectory prefix
    filterM Dir.doesDirectoryExist paths
pullDir :: FilePath -> IO ()
pullDir dir = Dir.withCurrentDirectory dir $ callCommand "pwd"
main :: IO ()
main = dirs >>= runConcurrently . traverse (Concurrently . pullDir) >> pure ()
Expected output:
/Users/daniel/Desktop/dirs/1
/Users/daniel/Desktop/dirs/2
/Users/daniel/Desktop/dirs/3
/Users/daniel/Desktop/dirs/4
/Users/daniel/Desktop/dirs/5
Actual output (varies!):
/Users/daniel/Desktop/dirs/3
/Users/daniel/Desktop/dirs/4
/Users/daniel/Desktop/dirs/3
/Users/daniel/Desktop/dirs/5
/Users/daniel/Desktop/dirs/5
We see that the actual output runs pwd for the same directory more than once and fails to run pwd for some of the directories entirely. I'm almost positive this has to do with withCurrentDirectory.
How can I implement this correctly while still preserving the concurrency?

This isn't possible with withCurrentDirectory. The current directory is a process-wide setting. Whenever something changes it, it's changed for everything in the process. This isn't a Haskell issue - it's just how the concept of "current directory" works.
To get this to work concurrently, you'll need to use full paths for everything, instead of changing the current directory.
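For the pwd example from the question, one way to avoid changing the parent's current directory is to give each child process its own working directory through the cwd field of CreateProcess. The following is only a sketch under assumptions: it assumes a Unix-like system where pwd is on the PATH, and the names pwdIn and pwdAll are invented for illustration.

```haskell
import Control.Concurrent.Async (mapConcurrently)
import System.Process (CreateProcess (cwd), proc, readCreateProcess)

-- Run pwd with the child's working directory set to dir. The parent's
-- process-wide current directory is never touched.
pwdIn :: FilePath -> IO String
pwdIn dir = readCreateProcess (proc "pwd" []) { cwd = Just dir } ""

-- Run pwdIn for every directory concurrently; the results come back in
-- the same order as the input list.
pwdAll :: [FilePath] -> IO [String]
pwdAll = mapConcurrently pwdIn
```

Because each child gets its own working directory, the concurrent threads no longer race on shared state. Any file operations done in Haskell itself would still need full paths.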

Haskell Turtle - split a shell

Is it possible to split a Shell in the Turtle library (Haskell) and do different things with each branch, such that the original Shell is only run once?
             /---- shell2
---Shell1 --/
            \
             \-----shell3
For instance, how to do
do
    let lstmp = lstree "/tmp"
    view lstmp
    view $ do
        path <- lstmp
        x <- liftIO $ testdir path
        return x
such that lstree "/tmp" would only run once.
Specifically I would like to send Shell 2 and Shell 3 to different files using output.

You won't be able to split a Shell into two separate shells that run simultaneously, unless there's some magic I don't know. But file writing is a fold over the contents of a shell or some other succession of things. It is built into turtle that you can always combine many folds and make them run simultaneously using the Control.Foldl material - here
foldIO :: Shell a -> FoldM IO a r -> IO r -- specializing
A shell is secretly a FoldM IO a r -> IO r under the hood anyway, so this is basically runShell. To do this we need to get the right Shell and the right combined FoldM IO. The whole idea of the Fold a b and FoldM m a b types from the foldl package is simultaneous folding.
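As a small illustration of simultaneous folding, independent of turtle, here is a sketch using only the foldl package. The names sumAndLength and summarize are made up for this example; the combined fold computes two results in a single pass over its input.

```haskell
import qualified Control.Foldl as L

-- Two independent folds combined applicatively. The combined fold
-- consumes its input once and produces both results.
sumAndLength :: L.Fold Int (Int, Int)
sumAndLength = (,) <$> L.sum <*> L.length

-- Apply the combined fold to a pure list.
summarize :: [Int] -> (Int, Int)
summarize = L.fold sumAndLength
```

For instance, summarize [1..10] evaluates to (55,10). The same Applicative combination works for FoldM IO, which is what lets two file-writing folds share one run of a Shell.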
I think the easiest way to get the right shell is just to make the lstree fold return a FilePath together with the result of testdir. You basically wrote this:
withDirInfo :: FilePath -> Shell (Bool, FilePath)
withDirInfo tmp = do
    let lstmp = lstree tmp
    path <- lstmp
    bool <- liftIO $ testdir path
    return (bool, path)
So now we can get a Shell (Bool, FilePath) from /tmp. This has all the information our two folds will need, and thus everything our combined fold will need.
Next we might write a helper fold that prints the Text component of the FilePath to a given handle:
sinkFilePaths :: Handle -> FoldM IO FilePath ()
sinkFilePaths handle = L.sink (T.hPutStrLn handle . format fp)
Then we can use this Handle -> FoldM IO FilePath () to define two FoldM IO (Bool, FilePath) (). Each will write different stuff to different handles, and we can unite them into a single simultaneous fold with <*. This is an independent FoldM IO ... and can be applied e.g. to a pure list of type [(Bool, FilePath)] using L.fold and it will write different things from the list to the different handles. In our case, though, we will apply it to the Shell (Bool, FilePath) we defined.
The only subtle part of this is the use of L.handlesM: one fold prints the second element of every pair, while the other prints the second element only for pairs flagged as directories. This uses the _2 lens and filtered from the lens library. This could probably be simplified, but see what you think:
{-# LANGUAGE OverloadedStrings #-}
import Turtle
import qualified Control.Foldl as L
import qualified System.IO as IO
import Control.Lens (_2,filtered)
import qualified Data.Text.IO as T
main = IO.withFile "tmpfiles.txt" IO.WriteMode $ \h ->
    IO.withFile "tmpdirs.txt" IO.WriteMode $ \h' -> do
        foldIO (withDirInfo "/tmp") (sinkFilesDirs h h')
withDirInfo :: Turtle.FilePath -> Shell (Bool, Turtle.FilePath)
withDirInfo tmp = do
    let lstmp = lstree tmp
    path <- lstmp
    bool <- liftIO $ testdir path
    return (bool, path)
sinkFilePaths :: Handle -> FoldM IO Turtle.FilePath ()
sinkFilePaths handle = L.sink (T.hPutStrLn handle . format fp)
sinkFilesDirs :: Handle -> Handle -> FoldM IO (Bool, Turtle.FilePath) ()
sinkFilesDirs h h' = allfiles <* alldirs where
    allfiles :: L.FoldM IO (Bool, Turtle.FilePath) ()
    allfiles = L.handlesM _2 (sinkFilePaths h)
    -- handle the second element of pairs with sinkFilePaths
    alldirs :: FoldM IO (Bool, Turtle.FilePath) ()
    alldirs = L.handlesM (filtered (\(bool,file) -> bool) . _2) (sinkFilePaths h')
    -- handle the second element of pairs where the first element
    -- is true using sinkFilePaths

It sounds like you're looking for something like async to split off your shells from the first shell and then wait for them to return. async is a pretty capable library that can achieve much more than the below example, but it provides a pretty simple solution to what you're asking for:
{-# LANGUAGE OverloadedStrings #-}
import Control.Concurrent.Async
import Control.Monad.IO.Class (liftIO)
import Turtle.Shell
import Turtle.Prelude
main :: IO ()
main = do
    let lstmp1 = lstree "/tmp"
    let lstmp2 = lstree "/etc"
    view lstmp1
    view lstmp2
    -- start each job in its own thread
    job1 <- async $ view $ do
        path <- lstmp1
        x <- liftIO $ testdir path
        return x
    job2 <- async $ view $ do
        path <- lstmp2
        x <- liftIO $ testdir path
        return x
    -- block until both jobs have finished
    wait job1
    wait job2
Is this what you're looking for?

What is the haskell way to copy a directory

I find myself doing more and more scripting in haskell. But there are some cases where I'm really not sure of how to do it "right".
e.g. copy a directory recursively (a la unix cp -r).
Since I mostly use linux and Mac Os I usually cheat:
import System.Cmd
import System.Exit
copyDir :: FilePath -> FilePath -> IO ExitCode
copyDir src dest = system $ "cp -r " ++ src ++ " " ++ dest
But what is the recommended way to copy a directory in a platform independent fashion?
I didn't find anything suitable on hackage.
This is the rather naive implementation I use so far:
import System.Directory
import System.FilePath((</>))
import Control.Applicative((<$>))
import Control.Exception(throw)
import Control.Monad(when,forM_)
copyDir :: FilePath -> FilePath -> IO ()
copyDir src dst = do
    whenM (not <$> doesDirectoryExist src) $
        throw (userError "source does not exist")
    whenM (doesFileOrDirectoryExist dst) $
        throw (userError "destination already exists")
    createDirectory dst
    content <- getDirectoryContents src
    let xs = filter (`notElem` [".", ".."]) content
    forM_ xs $ \name -> do
        let srcPath = src </> name
        let dstPath = dst </> name
        isDirectory <- doesDirectoryExist srcPath
        if isDirectory
            then copyDir srcPath dstPath
            else copyFile srcPath dstPath
  where
    doesFileOrDirectoryExist x = orM [doesDirectoryExist x, doesFileExist x]
    orM xs = or <$> sequence xs
    whenM s r = s >>= flip when r
Any suggestions of what really is the way to do it?
I updated this with the suggestions of hammar and FUZxxl.
...but still it feels kind of clumsy to me for such a common task!

It's possible to use the Shelly library in order to do this, see cp_r:
cp_r "sourcedir" "targetdir"
Shelly first tries to use native cp -r if available. If not, it falls back to a native Haskell IO implementation.
For further details on the type semantics of cp_r, see this post I wrote describing how to use cp_r with String and/or Text.
Shelly is not platform independent, since it relies on the Unix package, which is not supported under Windows.

I couldn't find anything that does this on Hackage.
Your code looks pretty good to me. Some comments:
dstExists <- doesDirectoryExist dst
This does not take into account that a file with the destination name might exist.
if or [not srcExists, dstExists] then print "cannot copy"
You might want to throw an exception or return a status instead of printing directly from this function.
paths <- forM xs $ \name -> do
[...]
return ()
Since you're not using paths for anything, you can change this to
forM_ xs $ \name -> do
[...]

The filesystem-trees package provides the means for a very simple implementation:
import System.File.Tree (getDirectory, copyTo_)
copyDirectory :: FilePath -> FilePath -> IO ()
copyDirectory source target = getDirectory source >>= copyTo_ target

The MissingH package provides recursive directory traversals, which you might be able to use to simplify your code.

I assume that copyDirRecur in Path.IO, which has variants that include or exclude symlinks, may be a newer and maintained solution. It requires converting the file path to Path x Dir, which is done with parseRelDir or parseAbsDir respectively, but I think having a more precise data type than FilePath is worthwhile to avoid hard-to-track errors at run time.
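A minimal sketch of that approach, assuming the path and path-io packages and absolute input paths. The wrapper name copyAbsDir is invented here. Parsing throws an exception in IO if either argument is not a valid absolute directory path.

```haskell
import Path (parseAbsDir)
import Path.IO (copyDirRecur)

-- Copy a directory tree, given two absolute paths as plain strings.
-- parseAbsDir rejects relative or malformed paths before anything is copied.
copyAbsDir :: FilePath -> FilePath -> IO ()
copyAbsDir src dst = do
    srcDir <- parseAbsDir src
    dstDir <- parseAbsDir dst
    copyDirRecur srcDir dstDir
```

For relative inputs, parseRelDir would take the place of parseAbsDir.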

There are also some functions for copying files and directories in the core Haskell library Cabal modules, specifically Distribution.Simple.Utils in package Cabal. copyDirectoryRecursive is one, and there are other functions near this one in that module.

How to programmatically retrieve GHC package information?

More specifically, given an arbitrary package name, I need to retrieve, from inside a running Haskell program, the same library-dirs field that can be obtained with the ghc-pkg describe command.

Here's what I could come up with by peeking into the ghc-pkg source code.
The getPkgInfos function returns the package definitions for all installed packages (hopefully including user-installed packages). With this in your hands, you can retrieve the library directories and other package information. See the documentation for details.
The GHC_PKGCONF variable needs to point to the global package config file for systems where it isn't located at the usual place. ghc-pkg solves this problem by receiving a command line flag via a wrapper script in Ubuntu, for instance.
import qualified Config
import qualified System.Info
import Data.List
import Distribution.InstalledPackageInfo
import GHC.Paths
import System.Directory
import System.Environment
import System.FilePath
import System.IO.Error
getPkgInfos :: IO [InstalledPackageInfo]
getPkgInfos = do
    global_conf <-
        catch (getEnv "GHC_PKGCONF")
              (\err -> if isDoesNotExistError err
                           then do let dir = takeDirectory $ takeDirectory ghc_pkg
                                       path1 = dir </> "package.conf"
                                       path2 = dir </> ".." </> ".." </> ".."
                                                   </> "inplace-datadir"
                                                   </> "package.conf"
                                   exists1 <- doesFileExist path1
                                   exists2 <- doesFileExist path2
                                   if exists1 then return path1
                                      else if exists2 then return path2
                                      else ioError $ userError "Can't find package.conf"
                           else ioError err)
    let global_conf_dir = global_conf ++ ".d"
    global_conf_dir_exists <- doesDirectoryExist global_conf_dir
    global_confs <-
        if global_conf_dir_exists
            then do files <- getDirectoryContents global_conf_dir
                    return [ global_conf_dir ++ '/' : file
                           | file <- files
                           , isSuffixOf ".conf" file]
            else return []
    user_conf <-
        try (getAppUserDataDirectory "ghc") >>= either
            (\_ -> return [])
            (\appdir -> do
                let subdir = currentArch ++ '-':currentOS ++ '-':ghcVersion
                    user_conf = appdir </> subdir </> "package.conf"
                user_exists <- doesFileExist user_conf
                return (if user_exists then [user_conf] else []))
    let pkg_dbs = user_conf ++ global_confs ++ [global_conf]
    return.concat =<< mapM ((>>= return.read).readFile) pkg_dbs
currentArch = System.Info.arch
currentOS = System.Info.os
ghcVersion = Config.cProjectVersion
I wrote this code myself, but it was largely inspired by ghc-pkg (with some pieces copied verbatim). The original code was licensed under a BSD-style license. I think this can be distributed under the cc-wiki license that all Stack Overflow content is under, but I'm not really sure. In any case, I did some initial testing and it seems to work, but use it at your own risk.

The format of the installed packages database is Distribution.InstalledPackageInfo.
import Distribution.InstalledPackageInfo
import Distribution.Package
import Distribution.Text
import GHC.Paths
import System
import System.FilePath
main = do
    name:_ <- getArgs
    packages <- fmap read $ readFile $ joinPath [libdir, "package.conf"]
    let matches = filter ((PackageName name ==) . pkgName . package) packages
    mapM_ (print . libraryDirs) (matches :: [InstalledPackageInfo_ String])
This doesn't obey the user's package configuration, but should be a start.

Ask Duncan Coutts on the haskell-cafe@ or cabal mailing lists. (I'm serious. That is a better forum for Cabal questions than Stack Overflow.)
Sometimes you just have to point people at a different forum.

If you're using cabal to configure and build your program/library, you can use the autogenerated Paths_* module.
For example, if you have a foo.cabal file, cabal will generate a Paths_foo module (see its source under dist/build/autogen) which you can import. This module exports a function getLibDir :: IO FilePath which has the value you're looking for.
