What is the Haskell way to copy a directory?

I find myself doing more and more scripting in Haskell. But there are some cases where I'm really not sure how to do it "right",
e.g. copying a directory recursively (à la Unix cp -r).
Since I mostly use Linux and Mac OS, I usually cheat:
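import System.Process (rawSystem)
import System.Exit (ExitCode)

copyDir :: FilePath -> FilePath -> IO ExitCode
-- rawSystem passes the paths as separate arguments, so paths containing
-- spaces need no shell quoting (System.Cmd is deprecated).
copyDir src dest = rawSystem "cp" ["-r", src, dest]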
But what is the recommended way to copy a directory in a platform-independent fashion?
I didn't find anything suitable on Hackage.
This is the rather naive implementation I use so far:
import System.Directory
import System.FilePath ((</>))
import Control.Applicative ((<$>))
import Control.Exception (throwIO)
import Control.Monad (when, forM_)

copyDir :: FilePath -> FilePath -> IO ()
copyDir src dst = do
  whenM (not <$> doesDirectoryExist src) $
    throwIO (userError "source does not exist")
  whenM (doesFileOrDirectoryExist dst) $
    throwIO (userError "destination already exists")
  createDirectory dst
  content <- getDirectoryContents src
  let xs = filter (`notElem` [".", ".."]) content
  forM_ xs $ \name -> do
    let srcPath = src </> name
    let dstPath = dst </> name
    isDirectory <- doesDirectoryExist srcPath
    if isDirectory
      then copyDir srcPath dstPath
      else copyFile srcPath dstPath
  where
    doesFileOrDirectoryExist x = orM [doesDirectoryExist x, doesFileExist x]
    orM xs = or <$> sequence xs
    whenM s r = s >>= flip when r
Any suggestions for what really is the right way to do it?
I updated this with the suggestions of hammar and FUZxxl.
...but still it feels kind of clumsy to me for such a common task!

You can use the Shelly library to do this; see cp_r:
cp_r "sourcedir" "targetdir"
Shelly first tries to use the native cp -r if available. If not, it falls back to a native Haskell IO implementation.
For further details on the type semantics of cp_r, see this post I wrote describing how to use cp_r with String and/or Text.
Shelly is not platform independent, since it relies on the unix package, which is not supported under Windows.

I couldn't find anything that does this on Hackage.
Your code looks pretty good to me. Some comments:
dstExists <- doesDirectoryExist dst
This does not take into account that a file with the destination name might exist.
if or [not srcExists, dstExists] then print "cannot copy"
You might want to throw an exception or return a status instead of printing directly from this function.
paths <- forM xs $ \name -> do
    [...]
return ()
Since you're not using paths for anything, you can change this to
forM_ xs $ \name -> do
    [...]
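Putting those comments together, a revised version might look like the sketch below. It assumes a reasonably recent directory package, which provides listDirectory (already omits "." and "..") and doesPathExist (true for files and directories alike, so it also catches a file sitting at the destination name). Like the original, it follows symbolic links, so a symlinked directory is copied as a real directory rather than as a link.

```haskell
import Control.Exception (throwIO)
import Control.Monad (forM_, unless, when)
import System.Directory
import System.FilePath ((</>))

-- Recursively copy the directory tree at src to a new directory dst.
-- Failures are raised as IOError exceptions instead of being printed.
copyDir :: FilePath -> FilePath -> IO ()
copyDir src dst = do
  srcIsDir <- doesDirectoryExist src
  unless srcIsDir $
    throwIO (userError ("source is not a directory: " ++ src))
  -- doesPathExist is True for both files and directories.
  dstExists <- doesPathExist dst
  when dstExists $
    throwIO (userError ("destination already exists: " ++ dst))
  createDirectory dst
  -- listDirectory never returns "." or "..".
  names <- listDirectory src
  forM_ names $ \name -> do
    let srcPath = src </> name
        dstPath = dst </> name
    -- doesDirectoryExist follows symlinks, so linked dirs are recursed into.
    isDir <- doesDirectoryExist srcPath
    if isDir
      then copyDir srcPath dstPath
      else copyFile srcPath dstPath
```

Throwing from the function (rather than printing) lets the caller decide whether to catch the error, report it, or abort.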

The filesystem-trees package provides the means for a very simple implementation:
import System.File.Tree (getDirectory, copyTo_)
copyDirectory :: FilePath -> FilePath -> IO ()
copyDirectory source target = getDirectory source >>= copyTo_ target

The MissingH package provides recursive directory traversals, which you might be able to use to simplify your code.

I assume that copyDirRecur from Path.IO (package path-io), with its variants that include or exclude symlinks, may be a newer and maintained solution. It requires converting each FilePath to a Path x Dir, which is done with parseRelDir or parseAbsDir respectively, but I think having a more precise data type than FilePath is worthwhile to avoid hard-to-track errors at run time.
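A minimal sketch of that approach, assuming the path and path-io packages are installed and that both arguments are relative directory paths (parseAbsDir would be needed for absolute ones). copyDirectory is a hypothetical wrapper name, and the sketch relies on copyDirRecur creating the destination directory as needed, which is worth checking against the path-io version you use.

```haskell
import Path
import Path.IO

-- Hypothetical wrapper: copy the tree at src to dst, both relative paths.
-- parseRelDir throws in IO if a string is not a valid relative directory.
copyDirectory :: FilePath -> FilePath -> IO ()
copyDirectory src dst = do
  srcDir <- parseRelDir src
  dstDir <- parseRelDir dst
  copyDirRecur srcDir dstDir
```

Invalid paths are rejected up front by the parse step, before any copying starts.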

There are also some functions for copying files and directories in the core Haskell library Cabal, specifically in the Distribution.Simple.Utils module. copyDirectoryRecursive is one, and there are other related functions near it in that module.
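For example, a thin wrapper might look like this. It assumes copyDirectoryRecursive has the Verbosity -> FilePath -> FilePath -> IO () signature (check it against the Cabal version you build with); copyDir is a hypothetical name, and the destination root is created up front as a precaution.

```haskell
import Distribution.Simple.Utils (copyDirectoryRecursive)
import Distribution.Verbosity (silent)
import System.Directory

-- Hypothetical wrapper around Cabal's helper; `silent` suppresses
-- Cabal's informational log output.
copyDir :: FilePath -> FilePath -> IO ()
copyDir src dst = do
  -- Ensure the destination root exists before copying into it.
  createDirectoryIfMissing True dst
  copyDirectoryRecursive silent src dst
```

Depending on Cabal just for this pulls in a large library, so this mostly makes sense in code (such as Setup.hs scripts) that already uses it.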

Related

getAllFiles (but not symlinks)

I have a directory traversal function in Haskell, but I want it to ignore symlinks. I figured out how to filter out the files alone, albeit with a slightly inelegant secondary filterM. But after some diagnosis I realize that I'm failing to filter symlinked directories.
I'd like to be able to write something like this:
-- Lazily return (normal) files from rootdir
getAllFiles :: FilePath -> IO [FilePath]
getAllFiles root = do
  nodes <- pathWalkLazy root
  -- get file paths from each node
  let files = [dir </> file | (dir, _, files) <- nodes,
                              file <- files,
                              not . pathIsSymbolicLink dir]
  normalFiles <- filterM (liftM not . pathIsSymbolicLink) files
  return normalFiles
However, all the variations I have tried get some version of the "Couldn't match expected type ‘Bool’ with actual type ‘IO Bool’" message (without the filter clause in the comprehension it works, but fails to filter those linked dirs).
I've found partial hints online at ways I might completely restructure the function, but I'm pretty sure that every such variation will run into some similar issue. The list comprehension would certainly be the most straightforward way... if I could just somehow exclude those dirs that are links.
Followup: Unfortunately, the solution kindly provided by ChrisB behaves (almost?!) identically to my existing version. I defined three functions, and run them within a test program:
-- XXX: debugging
files <- getAllFilesRaw rootdir
putStrLn ("getAllFilesRaw: " ++ show (length files))
files' <- getAllFilesNoSymFiles rootdir
putStrLn ("getAllFilesNoSymFiles: " ++ show (length files'))
files'' <- getAllFilesNoSymDirs rootdir
putStrLn ("getAllFilesNoSymDirs: " ++ show (length files''))
The first is my version with the normalFiles filter removed. The second is my original version (minus the type error in the list comprehension). The final one is ChrisB's suggestion.
Running that, then also the system find utility:
% find $CONDA_PREFIX -type f | wc -l
449667
% find -L $CONDA_PREFIX -type f | wc -l
501153
% haskell/find-dups $CONDA_PREFIX
getAllFilesRaw : 501153
getAllFilesNoSymFiles: 464553
getAllFilesNoSymDirs: 464420
Moreover, this question came up because, for my own self-education, I've implemented the same application in a bunch of languages: Python, Golang, Rust, Julia, TypeScript, Bash, and (except for this glitch) Haskell; others are planned. The programs actually do something more with the files, but that's not the point of this question.
The point of this is that ALL other languages report the same number as the system find tool. Moreover, the specific issue is things like this:
% ls -l /home/dmertz/miniconda3/pkgs/ncurses-6.2-he6710b0_1/lib/terminfo
lrwxrwxrwx 1 dmertz dmertz 17 Apr 29 2020 /home/dmertz/miniconda3/pkgs/ncurses-6.2-he6710b0_1/lib/terminfo -> ../share/terminfo
There are about 16k examples like this on my system currently, and looking at some of them in the other versions of the tool, I see that all the other languages exclude the contents of that symlinked directory.
EDIT:
Instead of just fixing a Bool / IO Bool issue, we now want to match find's behavior.
After looking at the documentation, this seems to be quite hard to implement with reasonable performance using the PathWalk library, so I just hand-rolled it (using do-notation, as requested in the comments).
In my quick and dirty tests the results match those of find:
import System.FilePath
import System.Directory

getAllFiles' :: FilePath -> IO [FilePath]
getAllFiles' path = do
  isSymlink <- pathIsSymbolicLink path
  if isSymlink
    -- if this is a symlink, return the empty list,
    -- even if this was the original root (matches find's behavior)
    then return []
    else do
      isFile <- doesFileExist path
      if isFile
        then return [path] -- if this is a file, return it
        else do
          -- if it's not a file, we assume it to be a directory
          dirContents <- listDirectory path
          -- run this function recursively on all the children
          -- and accumulate the results
          fmap concat $ mapM (getAllFiles' . (path </>)) dirContents
Original Answer solving the IO Bool / Bool issue
import Control.Monad (filterM)
import System.Directory (pathIsSymbolicLink)
import System.Directory.PathWalk (pathWalkLazy)
import System.FilePath ((</>))

getAllFiles :: FilePath -> IO [FilePath]
getAllFiles root = pathWalkLazy root
  -- remove dirs that are symlinks
  >>= filterM (\(dir, _, _) -> fmap not $ pathIsSymbolicLink dir)
  -- flatten to list of files
  >>= return . concat . map (\(dir, _, files) -> map (\f -> dir </> f) files)
  -- remove files that are symlinks
  >>= filterM (fmap not . pathIsSymbolicLink)

`mapM_`, `copyFile`, and a ghost file

I'm having a strange problem when copying a list of files in Haskell. If I run the following code:
copy :: [FilePath] -> FilePath -> IO ()
-- Precondition: dir must be a directory.
copy fs dir = do
  isDir <- doesDirectoryExist dir
  if (isDir)
    then do
      mapM_ putStrLn fs -- Poor man's debug.
      mapM_ (`copyFile` dir) fs
    else ioError (userError $ dir ++ " is not a directory.")
The output of mapM_ putStrLn fs is a single file, which exists; however, the second mapM_ fails with the following message:
./.copyFile4363.tmp: copyFile: inappropriate type (Is a directory)
I'm really puzzled, since the same list fs is passed to both uses of mapM_.
Am I overlooking something?
From the System.Directory Haddock documentation (the key sentence is "Neither path may refer to an existing directory."):
copyFile :: FilePath -> FilePath -> IO ()
copyFile old new copies the existing file from old to new. If the new file already exists, it is atomically replaced by the old file. Neither path may refer to an existing directory. The permissions of old are copied to new, if possible.
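So the second argument of copyFile must be the full path of the new file, not the directory to copy into. One way to fix the loop is to build that path from the target directory and each source file's name, as in this sketch (takeFileName comes from the filepath package; copy and its precondition are the question's own).

```haskell
import Control.Monad (forM_, unless)
import System.Directory
import System.FilePath (takeFileName, (</>))

-- Copy each file in fs into the existing directory dir, keeping its name.
-- copyFile needs a destination *file* path, hence dir </> takeFileName f.
copy :: [FilePath] -> FilePath -> IO ()
copy fs dir = do
  isDir <- doesDirectoryExist dir
  unless isDir $ ioError (userError (dir ++ " is not a directory."))
  forM_ fs $ \f -> copyFile f (dir </> takeFileName f)
```

Note that two source files with the same name in different directories would overwrite each other in dir.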

Check if two directories are on the same filesystem in haskell

If I have two directories A and B, how do I tell whether they are on the same filesystem (e.g. on the same hard drive) in Haskell on OS X and Linux?
I checked System.Directory and System.FilePath.Posix which don't seem to have any thing for doing this.
The getFileStatus and deviceID functions from the unix package should help you with that.
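A minimal sketch of that approach (POSIX only, since it uses System.Posix.Files from the unix package; sameFileSystem is a hypothetical name). It compares the device numbers reported by stat(2), the same value that stat's %d format prints.

```haskell
import System.Posix.Files (deviceID, getFileStatus)

-- Hypothetical helper: True when both paths live on the same device.
-- getFileStatus follows symbolic links, like stat(2).
sameFileSystem :: FilePath -> FilePath -> IO Bool
sameFileSystem a b = do
  sa <- getFileStatus a
  sb <- getFileStatus b
  return (deviceID sa == deviceID sb)
```

Unlike shelling out to stat, this avoids the GNU/BSD difference in stat's command-line flags.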
One way would be to use the stat utility and write a wrapper for it yourself. stat can report the device number of a file. I tested the following code on Linux and it works for different disks (but I'm not sure about Mac OS):
import Control.Applicative ((<$>))
import System.Process
statDeviceID :: FilePath -> IO String
statDeviceID fp = readProcess "stat" ["--printf=%d", fp] ""
-- for mac which has a different version of stat
-- statDeviceID fp = readProcess "stat" ["-f", "%d", fp] ""
checkSameDevice :: [FilePath] -> IO Bool
checkSameDevice xs = (\x -> all (== head x) x) <$> (sequence $ map statDeviceID xs)
paths = ["/mnt/Books", "/home/sibi"]
main = checkSameDevice paths >>= print
In ghci:
λ> main
False -- False since /mnt is a different hard disk

How to write a zip file using Haskell LibZip?

I'm trying to figure out a dead-simple task using LibZip in Haskell: how do I open an archive foo.zip, decompress it, recompress it, and save it to a new archive bar.zip? With the Zip library, this is easy:
{-# LANGUAGE OverloadedStrings #-}
import Codec.Archive.Zip (toArchive, fromArchive)
import qualified Data.ByteString.Lazy as B
import System.Environment
saveZipAs :: FilePath -> FilePath -> IO ()
saveZipAs source dest = do
  arch <- fmap toArchive $ B.readFile source
  putStrLn "Archive info: " >> print arch
  B.writeFile dest $ fromArchive arch
LibZip, on the other hand, provides no clear way to do this (that I can see). It only seems to be able to instantiate a zip file with withArchive (which is an issue in and of itself, because a file you want to open might not be on disk), and I don't see a way to do any kind of "save as" operation, nor to extract the compressed bytes as a ByteString or otherwise (as in Zip). LibZip is supposedly faster than Zip, so I want to at least give it a try, but it seems much more obscure (and also impure, carrying around an IO everywhere it goes, where it is really only needed at the beginning and the end, if ever). Can anyone give me some tips?
Side note: it really boggles the mind how people can spend such huge amounts of time writing a library, only to document it so poorly that no one can use it. Library writers, please don't do this!
Your link points to an old version of the library, and the latest version of the library seems to have Haddock compilation bugs.
Here are the file-reading functions in a newer version:
http://hackage.haskell.org/package/LibZip-0.10.2/docs/Codec-Archive-LibZip.html#g:3
The reverse process seems to be addFile/sourceBuffer and related functions.
Here is the full source code for repacking a zip:
import Codec.Archive.LibZip
import Codec.Archive.LibZip.Types

main = readZip "foo.zip" >>= writeZip "bar.zip"

readZip :: FilePath -> IO [(FilePath, ZipSource)]
readZip zipName = withArchive [] zipName $ do
  nn <- fileNames []
  ss <- mapM (\n -> sourceFile n 0 (-1)) nn
  return $ zip nn ss

writeZip :: FilePath -> [(FilePath, ZipSource)] -> IO ()
writeZip zipName zipContent = withArchive [CreateFlag] zipName $ do
  mapM_ (uncurry addFile) zipContent
A few refactorings are still possible: liftM2 zip could be used in readZip, and function composition (.) in writeZip.

How to programmatically retrieve GHC package information?

More specifically, given an arbitrary package name, I need to retrieve, from inside a running Haskell program, the same library-dirs field that the ghc-pkg describe command shows.
Here's what I could come up with by peeking into the ghc-pkg source code.
The getPkgInfos function returns the package definitions for all installed packages (hopefully including user-installed packages). With this in hand, you can retrieve the library directories and other package information. See the documentation for details.
The GHC_PKGCONF variable needs to point to the global package config file for systems where it isn't located at the usual place. ghc-pkg solves this problem by receiving a command line flag via a wrapper script in Ubuntu, for instance.
import qualified Config
import qualified System.Info
import Data.List
import Distribution.InstalledPackageInfo
import GHC.Paths
import System.Directory
import System.Environment
import System.FilePath
import System.IO.Error

getPkgInfos :: IO [InstalledPackageInfo]
getPkgInfos = do
  global_conf <-
    catchIOError (getEnv "GHC_PKGCONF")
      (\err -> if isDoesNotExistError err
                 then do let dir = takeDirectory $ takeDirectory ghc_pkg
                             path1 = dir </> "package.conf"
                             path2 = dir </> ".." </> ".." </> ".."
                                         </> "inplace-datadir"
                                         </> "package.conf"
                         exists1 <- doesFileExist path1
                         exists2 <- doesFileExist path2
                         if exists1 then return path1
                           else if exists2 then return path2
                           else ioError $ userError "Can't find package.conf"
                 else ioError err)
  let global_conf_dir = global_conf ++ ".d"
  global_conf_dir_exists <- doesDirectoryExist global_conf_dir
  global_confs <-
    if global_conf_dir_exists
      then do files <- getDirectoryContents global_conf_dir
              return [ global_conf_dir ++ '/' : file
                     | file <- files
                     , isSuffixOf ".conf" file]
      else return []
  user_conf <-
    tryIOError (getAppUserDataDirectory "ghc") >>= either
      (\_ -> return [])
      (\appdir -> do
         let subdir = currentArch ++ '-':currentOS ++ '-':ghcVersion
             user_conf = appdir </> subdir </> "package.conf"
         user_exists <- doesFileExist user_conf
         return (if user_exists then [user_conf] else []))
  let pkg_dbs = user_conf ++ global_confs ++ [global_conf]
  return . concat =<< mapM ((>>= return . read) . readFile) pkg_dbs

currentArch = System.Info.arch
currentOS = System.Info.os
ghcVersion = Config.cProjectVersion
I wrote this code myself, but it was largely inspired by ghc-pkg (with some pieces copied verbatim). The original code was licensed under a BSD-style license; I think this can be distributed under the cc-wiki license that all Stack Overflow content is under, but I'm not really sure. Anyway, as with anything else, I did some initial testing and it seems to work, but use it at your own risk.
The format of the installed packages database is Distribution.InstalledPackageInfo.
import Distribution.InstalledPackageInfo
import Distribution.Package
import Distribution.Text
import GHC.Paths
import System.Environment (getArgs)
import System.FilePath

main = do
  name:_ <- getArgs
  packages <- fmap read $ readFile $ joinPath [libdir, "package.conf"]
  let matches = filter ((PackageName name ==) . pkgName . package) packages
  mapM_ (print . libraryDirs) (matches :: [InstalledPackageInfo_ String])
This doesn't obey the user's package configuration, but should be a start.
Ask Duncan Coutts on the haskell-cafe or cabal mailing lists. (I'm serious. That is a better forum for Cabal questions than Stack Overflow.)
Sometimes you just have to point people at a different forum.
If you're using Cabal to configure and build your program/library, you can use the auto-generated Paths_* module.
For example, if you have a foo.cabal file, cabal will generate a Paths_foo module (see its source under dist/build/autogen) which you can import. This module exports a function getLibDir :: IO FilePath which has the value you're looking for.
