`mapM_`, `copyFile`, and a ghost file - haskell

I'm having a strange problem when copying a list of files in Haskell. If I run the following code:
copy :: [FilePath] -> FilePath -> IO ()
-- Precondition: dir must be a directory.
copy fs dir = do
  isDir <- doesDirectoryExist dir
  if isDir
    then do
      mapM_ putStrLn fs -- Poor man's debug.
      mapM_ (`copyFile` dir) fs
    else ioError (userError $ dir ++ " is not a directory.")
The output of mapM_ putStrLn fs shows a single file, which exists, but the second mapM_ fails with the following message:
./.copyFile4363.tmp: copyFile: inappropriate type (Is a directory)
I'm really puzzled, since in both uses of mapM_ the list fs is passed as parameter.
Am I overlooking something?

From System.Directory Haddock (emphasis mine):
copyFile :: FilePath -> FilePath -> IO () Source
copyFile old new copies the existing file from old to new. If the new file already exists, it is atomically replaced by the old file. Neither path may refer to an existing directory. The permissions of old are copied to new, if possible.
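Applied to the question's code, this means the second argument of copyFile has to be a full destination file path, not the directory itself. A minimal sketch, assuming each file should keep its base name inside dir (copyInto is a name chosen here for illustration, not something from the question):

```haskell
import Control.Monad
import System.Directory
import System.FilePath

-- Copy every file in fs into the existing directory dir,
-- keeping each file's base name (an assumption about the intent).
copyInto :: [FilePath] -> FilePath -> IO ()
copyInto fs dir = do
  isDir <- doesDirectoryExist dir
  unless isDir $
    ioError (userError $ dir ++ " is not a directory.")
  forM_ fs $ \f ->
    -- copyFile needs a file path as its destination, not a directory.
    copyFile f (dir </> takeFileName f)
```

With the original (`copyFile` dir), every file would be copied onto the directory path itself, which is what the "Is a directory" error reports.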

Related

haskell: cd command does not work in shake/command library

For some reason I cannot make the cd command work in the shake/command Haskell library. It thinks the directory I pass to cd does not exist, even though it is present in the filesystem.
Here is an excerpt of my code:
dataDir = "data"

data Settings = Settings {
  url :: String
} deriving (Eq, Show, Generic, JSON.ToJSON, JSON.FromJSON)

settings :: String -> Handler Settings
settings subfolder = let
    gitPath = dataDir ++ "/" ++ subfolder ++ "/git/.git"
  in do
    pathExists <- liftIO $ doesPathExist gitPath
    -- Stdout pwdOut <- liftIO $ cmd ("pwd" :: String)
    -- liftIO $ putStrLn $ pwdOut
    if not pathExists
      then do
        liftIO $ (cmd_ ("mkdir -p" :: String) [gitPath] :: IO ())
        liftIO $ (cmd_ ("cd" :: String) [gitPath] :: IO ())
        liftIO $ (cmd_ ("git init" :: String) :: IO ())
        return $ Settings { url = ""}
      else do
        liftIO $ (cmd_ (Cwd ".") ("cd" :: String) [gitPath] :: IO ())
        Stdout out <- liftIO $ (cmd ("git config --get remote.origin.url" :: String))
        return $ Settings {url = out}
It fails with the error cd: createProcess: runInteractiveProcess: exec: does not exist (No such file or directory) in both cases: when the directory already exists and when the mkdir command is executed.
I cannot wrap my head around it. Before I submit a bug on Shake's GitHub page, I want to make sure I am not doing anything stupid that might cause this kind of behavior.
Thanks in advance for help.
As described in the other answer, cd is not an executable, so if you wanted to run it, you would have to pass Shell to cmd.
However, it is almost certainly the case that you don't want to call cd in a command, as it does not change the directory for any subsequent command. Each cmd is a separate process, with a separate environment, so the subsequent command will be in a fresh environment, and the same working directory as before the cd. The solution is to pass (Cwd gitPath) to each command you want to operate with the given directory.
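The same principle can be shown without Shake. The sketch below uses the process package's cwd field instead of Shake's Cwd option (a deliberate swap, so that it depends only on libraries shipped with GHC); runIn and initRepo are hypothetical names introduced here. Each call starts a child process in the given directory, and the parent's working directory is never changed.

```haskell
import System.Directory
import System.Process (CreateProcess (..), proc, readCreateProcess)

-- Run an executable with the given arguments inside dir and return its stdout.
-- Only the child process sees dir as its working directory.
runIn :: FilePath -> FilePath -> [String] -> IO String
runIn dir exe args =
  readCreateProcess ((proc exe args) { cwd = Just dir }) ""

-- Hypothetical usage: create repoDir if needed and run "git init" inside it.
-- Assumes a git executable is on the PATH.
initRepo :: FilePath -> IO ()
initRepo repoDir = do
  createDirectoryIfMissing True repoDir
  _ <- runIn repoDir "git" ["init"]
  return ()
```

In Shake, passing (Cwd gitPath) to each cmd or cmd_ call plays the role that the cwd field plays here.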
Shake's Haddock page describes cmd_, and links to its source. There we can see that cmd_ eventually calls commandExplicitIO, which constructs a ProcessOpts with RawCommand and passes it to process. process then takes that ProcessOpts, pattern-matches it as a RawCommand (via cmdSpec), and calls proc. We have now entered the well-documented zone: you must give proc an executable, and cd is not an executable. (Why? Since processes cannot change the working directory of their parent, cd must be a shell builtin.)

getAllFiles (but not symlinks)

I have a directory traversal function in Haskell, but I want it to ignore symlinks. I figured out how to filter out the files alone, albeit with a slightly inelegant secondary filterM. But after some diagnosis I realize that I'm failing to filter symlinked directories.
I'd like to be able to write something like this:
-- Lazily return (normal) files from rootdir
getAllFiles :: FilePath -> IO [FilePath]
getAllFiles root = do
  nodes <- pathWalkLazy root
  -- get file paths from each node
  let files = [dir </> file | (dir, _, files) <- nodes,
                              file <- files,
                              not . pathIsSymbolicLink dir]
  normalFiles <- filterM (liftM not . pathIsSymbolicLink) files
  return normalFiles
However, all the variations I have tried get some version of the "Couldn't match expected type ‘Bool’ with actual type ‘IO Bool’" message (without the filter clause in the comprehension it works, but fails to filter those linked dirs).
Various hints at ways I might completely restructure the function are in partial form at online resources, but I'm pretty sure that every such variation will run into some similar issue. The list comprehension would certainly be the most straightforward way... if I could just somehow exclude those dirs that are links.
Followup: Unfortunately, the solution kindly provided by ChrisB behaves (almost?!) identically to my existing version. I defined three functions, and run them within a test program:
-- XXX: debugging
files <- getAllFilesRaw rootdir
putStrLn ("getAllFilesRaw: " ++ show (length files))
files' <- getAllFilesNoSymFiles rootdir
putStrLn ("getAllFilesNoSymFiles: " ++ show (length files'))
files'' <- getAllFilesNoSymDirs rootdir
putStrLn ("getAllFilesNoSymDirs: " ++ show (length files''))
The first is my version with the normalFiles filter removed. The second is my original version (minus the type error in the listcomp). The final one is ChrisB's suggestion.
Running that, then also the system find utility:
% find $CONDA_PREFIX -type f | wc -l
449667
% find -L $CONDA_PREFIX -type f | wc -l
501153
% haskell/find-dups $CONDA_PREFIX
getAllFilesRaw : 501153
getAllFilesNoSymFiles: 464553
getAllFilesNoSymDirs: 464420
Moreover, this question came up because, for my own self-education, I've implemented the same application in a bunch of languages: Python; Golang; Rust; Julia; TypeScript; Bash; and, except for this glitch, Haskell; others are planned. The programs actually do something more with the files, but that's not the point of this question.
The point of this is that ALL other languages report the same number as the system find tool. Moreover, the specific issue is things like this:
% ls -l /home/dmertz/miniconda3/pkgs/ncurses-6.2-he6710b0_1/lib/terminfo
lrwxrwxrwx 1 dmertz dmertz 17 Apr 29 2020 /home/dmertz/miniconda3/pkgs/ncurses-6.2-he6710b0_1/lib/terminfo -> ../share/terminfo
There are about 16k examples here (on my system currently), but looking at some in the other version of the tool, I see specifically that all the other languages are excluding the contents of that symlink directory.
EDIT:
Instead of just fixing a Bool / IO Bool issue, we now want to match find's behavior.
After looking at the documentation, this seems quite hard to implement with reasonable performance using the PathWalk library, so I just hand-rolled it (using do-notation, as requested in the comments).
In my quick and dirty tests the results match those of find:
import System.FilePath
import System.Directory

getAllFiles' :: FilePath -> IO [FilePath]
getAllFiles' path = do
  isSymlink <- pathIsSymbolicLink path
  if isSymlink
    -- if this is a symlink, return the empty list.
    -- even if this was the original root. (matches find's behavior)
    then return []
    else do
      isFile <- doesFileExist path
      if isFile
        then return [path] -- if this is a file, return it
        else do
          -- if it's not a file, we assume it to be a directory
          dirContents <- listDirectory path
          -- run this function recursively on all the children
          -- and accumulate the results
          fmap concat $ mapM (getAllFiles' . (path </>)) dirContents
Original Answer solving the IO Bool / Bool issue
getAllFiles :: FilePath -> IO [FilePath]
getAllFiles root = pathWalkLazy root
  -- remove dirs that are symlinks
  >>= filterM (\(dir, _, _) -> fmap not $ pathIsSymbolicLink dir)
  -- flatten to list of files
  >>= return . concat . map (\(dir, _, files) -> map (\f -> dir </> f) files)
  -- remove files that are symlinks
  >>= filterM (fmap not . pathIsSymbolicLink)
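The type error from the question comes down to one point: a guard in a list comprehension needs a plain Bool, while pathIsSymbolicLink returns IO Bool. A minimal sketch of the filterM pattern on its own (dropSymlinks is a name introduced here for illustration):

```haskell
import Control.Monad (filterM)
import System.Directory

-- pathIsSymbolicLink :: FilePath -> IO Bool, so it cannot serve as a guard
-- in a pure list comprehension. filterM runs the IO predicate for each
-- element and keeps those for which it returns True.
dropSymlinks :: [FilePath] -> IO [FilePath]
dropSymlinks = filterM (fmap not . pathIsSymbolicLink)
```

Note that pathIsSymbolicLink throws if the path does not exist, so this sketch assumes every path in the list is present on disk.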

How to write a zip file using Haskell LibZip?

I'm trying to figure out a dead-simple task using LibZip in Haskell: how do I open an archive foo.zip, decompress it, recompress it, and save it to a new archive bar.zip? With the Zip library, this is easy:
{-# LANGUAGE OverloadedStrings #-}
import Codec.Archive.Zip (toArchive, fromArchive)
import qualified Data.ByteString.Lazy as B
import System.Environment

saveZipAs :: FilePath -> FilePath -> IO ()
saveZipAs source dest = do
  arch <- fmap toArchive $ B.readFile source
  putStrLn "Archive info: " >> print arch
  B.writeFile dest $ fromArchive arch
LibZip, on the other hand, provides no clear way to do this (that I can see). It only seems to be able to instantiate a zip file with withArchive (which is an issue in and of itself, because a file you want to open might not be on disk), and I don't see a way to do any kind of "save as" operation, nor to extract the compressed bytes as a ByteString or otherwise (as in Zip). LibZip is supposedly faster than Zip, so I want to at least give it a try, but it seems much more obscure (and also impure, carrying around an IO everywhere it goes, where it is really only needed at the beginning and the end, if ever). Can anyone give me some tips?
Side note: it really boggles the mind how people can spend such huge amounts of time writing a library, only to document it so poorly that no one can use it. Library writers, please don't do this!
Your link is somehow to an old version of the library, and the very last version of the library seems to have haddock compilation bugs.
Here are file reading functions in a newer version:
http://hackage.haskell.org/package/LibZip-0.10.2/docs/Codec-Archive-LibZip.html#g:3
The reverse process seems to be addFile/sourceBuffer and related functions.
Here is full source code of zip repacking:
import Codec.Archive.LibZip
import Codec.Archive.LibZip.Types

main = readZip "foo.zip" >>= writeZip "bar.zip"

readZip :: FilePath -> IO [(FilePath, ZipSource)]
readZip zipName = withArchive [] zipName $ do
  nn <- fileNames []
  ss <- mapM (\n -> sourceFile n 0 (-1)) nn
  return $ zip nn ss

writeZip :: FilePath -> [(FilePath, ZipSource)] -> IO ()
writeZip zipName zipContent = withArchive [CreateFlag] zipName $ do
  mapM_ (uncurry addFile) zipContent
A few refactorings are still possible: liftM2 zip can be used in readZip, and function composition (.) in writeZip.

What is the haskell way to copy a directory

I find myself doing more and more scripting in haskell. But there are some cases where I'm really not sure of how to do it "right".
e.g. copy a directory recursively (a la unix cp -r).
Since I mostly use linux and Mac Os I usually cheat:
import System.Cmd
import System.Exit
copyDir :: FilePath -> FilePath -> IO ExitCode
copyDir src dest = system $ "cp -r " ++ src ++ " " ++ dest
But what is the recommended way to copy a directory in a platform independent fashion?
I didn't find anything suitable on hackage.
This is the rather naive implementation I use so far:
import System.Directory
import System.FilePath((</>))
import Control.Applicative((<$>))
import Control.Exception(throw)
import Control.Monad(when,forM_)

copyDir :: FilePath -> FilePath -> IO ()
copyDir src dst = do
  whenM (not <$> doesDirectoryExist src) $
    throw (userError "source does not exist")
  whenM (doesFileOrDirectoryExist dst) $
    throw (userError "destination already exists")

  createDirectory dst
  content <- getDirectoryContents src
  let xs = filter (`notElem` [".", ".."]) content
  forM_ xs $ \name -> do
    let srcPath = src </> name
    let dstPath = dst </> name
    isDirectory <- doesDirectoryExist srcPath
    if isDirectory
      then copyDir srcPath dstPath
      else copyFile srcPath dstPath

  where
    doesFileOrDirectoryExist x = orM [doesDirectoryExist x, doesFileExist x]
    orM xs = or <$> sequence xs
    whenM s r = s >>= flip when r
Any suggestions of what really is the way to do it?
I updated this with the suggestions of hammar and FUZxxl.
...but still it feels kind of clumsy to me for such a common task!
It's possible to use the Shelly library in order to do this, see cp_r:
cp_r "sourcedir" "targetdir"
Shelly first tries to use native cp -r if available. If not, it falls back to a native Haskell IO implementation.
For further details on the type semantics of cp_r, see this post, which I wrote to describe how to use cp_r with String and/or Text.
Shelly is not platform independent, since it relies on the Unix package, which is not supported under Windows.
I couldn't find anything that does this on Hackage.
Your code looks pretty good to me. Some comments:
dstExists <- doesDirectoryExist dst
This does not take into account that a file with the destination name might exist.
if or [not srcExists, dstExists] then print "cannot copy"
You might want to throw an exception or return a status instead of printing directly from this function.
paths <- forM xs $ \name -> do
[...]
return ()
Since you're not using paths for anything, you can change this to
forM_ xs $ \name -> do
[...]
The filesystem-trees package provides the means for a very simple implementation:
import System.File.Tree (getDirectory, copyTo_)
copyDirectory :: FilePath -> FilePath -> IO ()
copyDirectory source target = getDirectory source >>= copyTo_ target
The MissingH package provides recursive directory traversals, which you might be able to use to simplify your code.
I assume that the function copyDirRecur in Path.IO, which has variants to include or exclude symlinks, may be a newer and maintained solution. It requires converting the FilePath to Path x Dir, which is done with parseRelDir or parseAbsDir respectively, but I think having a more precise data type than FilePath is worthwhile to avoid hard-to-track errors at run time.
There are also some functions for copying files and directories in the core Haskell library Cabal modules, specifically Distribution.Simple.Utils in package Cabal. copyDirectoryRecursive is one, and there are other functions near this one in that module.

Faster alternatives to hFileSize to retrieve the size of a file in Haskell?

I am wondering how to get the size of a file in haskell with the least amount of overhead. Right now I have the following code:
getFileSize :: FilePath -> IO Integer
getFileSize x = do
  handle <- openFile x ReadMode
  size <- hFileSize handle
  hClose handle
  return size
This seems to be quite slow. I have stumbled across getFileStatus in System.Posix.Files but don't know how it works - at least I only get errors when playing around with it in ghci. Also, I am not sure if this would work on Windows (probably not).
So to reiterate: What is the best (and platform independent) approach to get the size of a file in Haskell?
What you want are indeed getFileStatus and fileSize, both from System.Posix (which will work just fine under Windows, if you use the unix-compat package instead of unix). Usage is as follows, leaving error handling up to you:
getFileSize :: String -> IO Integer
getFileSize path = do
  stat <- getFileStatus path
  return $ fromIntegral (fileSize stat)
For what it's worth, and though I think it's less readable, you could shorten this form to:
getFileSize path = getFileStatus path >>= \s -> return $ fileSize s
I don't know if there is a better way. RWH supplies its own wrapper to hFileSize:
getFileSize path = handle (\_ -> return Nothing) $
  bracket (openFile path ReadMode) hClose $ \h -> do
    size <- hFileSize h
    return (Just size)
It also notes that the unix-compat package is available, which "provides portable implementations of parts of the unix package."
It seems that System.Posix.Files doesn't work on Windows (except in Cygwin). Have you tried unix-compat?
https://hackage.haskell.org/package/unix-compat-0.4.1.4/docs/System-PosixCompat-Files.html
This worked for me on my Windows 10 machine:
> cabal install unix-compat
Resolving dependencies...
... lots of output, plus I had to put Cygwin on my path to make it build ...
> ghci
Prelude> import System.PosixCompat.Files
Prelude System.PosixCompat.Files> getFileStatus ".bashrc">>= \s -> return $ fileSize s
5764
import System.Posix.Files
import System.Posix.Types
getFileSize :: FilePath -> IO FileOffset
getFileSize path = fmap fileSize $ getFileStatus path
https://hackage.haskell.org/package/directory-1.3.6.0/docs/System-Directory.html#v:getFileSize
getFileSize :: FilePath -> IO Integer
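A short usage sketch of that function, assuming a version of the directory package that exports getFileSize (as the linked documentation does); totalSize is a hypothetical helper added here for illustration:

```haskell
import System.Directory

-- Sum the sizes (in bytes) of the given files using
-- System.Directory.getFileSize, with no explicit handle management.
totalSize :: [FilePath] -> IO Integer
totalSize paths = sum <$> mapM getFileSize paths
```

Because this comes from the directory package rather than unix, it avoids the Windows question raised above about System.Posix.Files.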
