Check if two directories are on the same filesystem in haskell - linux

If I have two directories A and B How do I tell if they are on the same filesystem (e.g on same hardrive) in Haskell on OS X and linux ?
I checked System.Directory and System.FilePath.Posix which don't seem to have any thing for doing this.

The getFileStatus and deviceID functions from the unix package should help you with that.

One way would be to exploit the stat utility and write a wrapper for it yourself. stat has the ability to give device number for your file. I tested this following code in Linux and it works for different disks (but I'm not sure for Mac OS):
import Control.Applicative ((<$>))
import System.Process
statDeviceID :: FilePath -> IO String
statDeviceID fp = readProcess "stat" ["--printf=%d", fp] ""
-- for mac which has a different version of stat
-- statDeviceID fp = readProcess "stat" ["-f", "%d", fp] ""
checkSameDevice :: [FilePath] -> IO Bool
checkSameDevice xs = (\x -> all (== head x) x) <$> (sequence $ map statDeviceID xs)
paths = ["/mnt/Books", "/home/sibi"]
main = checkSameDevice paths >>= print
In ghci:
λ> main
False -- False since /mnt is a different hard disk

Related

Haskell Turtle get out of Shell Monad

could you please help me with Turtle library.
I want to write simple program, that calculates disk space usage.
Here is the code:
getFileSize :: FilePath -> IO Size
getFileSize f = do
status <- stat f
return $ fileSize status
main = sh $ do
let sizes = fmap getFileSize $ find (suffix ".hs") "."
so now I have sizes bind of type Shell (IO Size). But I can't just sum it, with sum fold, cause there is IO Size in there. If it was something like [IO Size] I could pull IO monad out of there by using sequence to transform it to IO [Size]. But I can't do this with Shell monad since it is not Traversable. So I wrote something like this
import qualified Control.Foldl as F
main = sh $ do
let sizes = fmap getFileSize $ find (suffix ".hs") "."
lst <- fold sizes F.list
let cont = sequence lst
sz <- liftIO $ cont
liftIO $ putStrLn (show (sum sz))
First I folded Shell (IO Size) to [IO Size] and then to IO [Size] to sum list afterwards.
But I wonder if there is more canonical or elegant solution to this, because here I created two lists to accomplish my task. And I throught that Shell monad is for manipulating entities in constant space. Maybe there is some fold to make IO (Shell Size) from Shell (IO Size)?
Thanks.
You have an IO action, and you really want a Shell action. The usual way to handle that is with the liftIO method, which is available because Shell is an instance of MonadIO.
file <- find (suffix ".hs") "."
size <- liftIO $ getFileSize file
or even
size <- liftIO . getFileSize =<< find (suffix ".hs") "."
Fortunately, the Turtle package itself offers some size functions you can use directly with MonadIO instances like Shell in Turtle.Prelude so you don't need to use liftIO yourself.
Now you actually have to sum these up, but you can do that with fold and sum.
I would recommend that you avoid breaking open the Shell type itself. That should be reserved for adding totally new functionality to the API. That certainly isn't necessary in this case.
Actually I've managed to get rid of IO here by using helper transformation
sio :: Shell (IO a) -> Shell a
sio s = Shell (\(FoldShell step begin done) ->
let step' x a = do
a' <- a
step x a'
in
_foldShell s (FoldShell step' begin done))
But now I wonder is there any simpler solution to this task...

Haskell Turtle - split a shell

Is it possible to split a Shell in Turtle library (Haskell) and do different things to either split of the shell, such that the original Shell is only run once ?
/---- shell2
---Shell1 --/
\
\-----shell3
For instance, how to do
do
let lstmp = lstree "/tmp"
view lstmp
view $ do
path <- lstmp
x <- liftIO $ testdir path
return x
such that lstree "/tmp" would only run once.
Specifically I would like to send Shell 2 and Shell 3 to different files using output.
You won't be able to split a Shell into two separate shells that run simultaneously, unless there's some magic I don't know. But file writing is a fold over the contents of a shell or some other succession of things. It is built into turtle that you can always combine many folds and make them run simultaneously using the Control.Foldl material - here
foldIO :: Shell a -> FoldM IO a r -> IO r -- specializing
A shell is secretly a FoldM IO a r -> IO r under the hood anyway, so this is basically runShell. To do this we need to get the right Shell and the right combined FoldM IO. The whole idea of the Fold a b and FoldM m a b types from the foldl package is simultaneous folding.
I think the easiest way to get the right shell is just to make the lstree fold return a FilePath together with the result of testdir. You basically wrote this:
withDirInfo :: FilePath -> Shell (Bool, FilePath)
withDirInfo tmp = do
let lstmp = lstree tmp
path <- lstmp
bool <- liftIO $ testdir path
return (bool, path)
So now we can get a Shell (Bool, FilePath) from /tmp This has all the information our two folds will need, and thus that our combined fold will need.
Next we might write a helper fold that prints the Text component of the FilePath to a given handle:
sinkFilePaths :: Handle -> FoldM IO FilePath ()
sinkFilePaths handle = L.sink (T.hPutStrLn handle . format fp)
Then we can use this Handle -> FoldM IO FilePath () to define two FoldM IO (Bool, FilePath) (). Each will write different stuff to different handles, and we can unite them into a single simultaneous fold with <*. This is an independent FoldM IO ... and can be applied e.g. to a pure list of type [(Bool, FilePath)] using L.fold and it will write different things from the list to the different handles. In our case, though, we will apply it to the Shell (Bool, FilePath) we defined.
The only subtle part of this is the use of L.handlesM to print only the second element, in both cases, and only those filtered as directories in the other. This uses the _2 lens and filtered from the lens libraries. This could probably be simplified, but see what you think:
{-#LANGUAGE OverloadedStrings #-}
import Turtle
import qualified Control.Foldl as L
import qualified System.IO as IO
import Control.Lens (_2,filtered)
import qualified Data.Text.IO as T
main = IO.withFile "tmpfiles.txt" IO.WriteMode $ \h ->
IO.withFile "tmpdirs.txt" IO.WriteMode $ \h' -> do
foldIO (withDirInfo "/tmp") (sinkFilesDirs h h')
withDirInfo :: Turtle.FilePath -> Shell (Bool, Turtle.FilePath)
withDirInfo tmp = do
let lstmp = lstree tmp
path <- lstmp
bool <- liftIO $ testdir path
return (bool, path)
sinkFilePaths :: Handle -> FoldM IO Turtle.FilePath ()
sinkFilePaths handle = L.sink (T.hPutStrLn handle . format fp)
sinkFilesDirs :: Handle -> Handle -> FoldM IO (Bool, Turtle.FilePath) ()
sinkFilesDirs h h' = allfiles <* alldirs where
allfiles :: L.FoldM IO (Bool, Turtle.FilePath) ()
allfiles = L.handlesM _2 (sinkFilePaths h)
-- handle the second element of pairs with sinkFilePaths
alldirs :: FoldM IO (Bool, Turtle.FilePath) ()
alldirs = L.handlesM (filtered (\(bool,file) -> bool) . _2) (sinkFilePaths h')
-- handle the second element of pairs where the first element
-- is true using sinkFilePaths
It sounds like you're looking for something like async to split off your shells from the first shell and then wait for them to return. async is a pretty capable library that can achieve much more than the below example, but it provides a pretty simple solution to what you're asking for:
import Control.Concurrent.Async
import Turtle.Shell
import Turtle.Prelude
main :: IO ()
main = do
let lstmp1 = lstree "/tmp"
let lstmp2 = lstree "/etc"
view lstmp1
view lstmp2
job1 <- async $ view $ do
path <- lstmp1
x <- liftIO $ testdir path
return x
job2 <- async $ view $ do
path <- lstmp2
x <- liftIO $ testdir path
return x
wait job1
wait job2
Is this what you're looking for?

How to write a zip file using Haskell LibZip?

I'm trying to figure out a dead-simple task using LibZip in Haskell: how do I open an archive foo.zip, decompress it, recompress it, and save it to a new archive bar.zip? With the Zip library, this is easy:
{-# LANGUAGE OverloadedStrings #-}
import Codec.Archive.Zip (toArchive, fromArchive)
import qualified Data.ByteString.Lazy as B
import System.Environment
saveZipAs :: FilePath -> FilePath -> IO ()
saveZipAs source dest = do
arch <- fmap toArchive $ B.readFile source
putStrLn "Archive info: " >> print arch
B.writeFile dest $ fromArchive arch
LibZip, on the other hand, provides no clear way to do this (that I can see). It only seems to be able to instantiate a zip file with withArchive (which is an issue in and of itself, because a file you want to open might not be on disk), and I don't see a way to do any kind of "save as" operation, nor to extract the compressed bytes as a ByteString or otherwise (as in Zip). LibZip is supposedly faster than Zip, so I want to at least give it a try, but it seems much more obscure (and also impure, carrying around an IO everywhere it goes, where it is really only needed at the beginning and the end, if ever). Can anyone give me some tips?
Side note: it really boggles the mind how people can spend such huge amounts of time writing a library, only to document it so poorly that no one can use it. Library writers, please don't do this!
Your link is somehow to an old version of the library, and the very last version of the library seems to have haddock compilation bugs.
Here are file reading functions in a newer version:
http://hackage.haskell.org/package/LibZip-0.10.2/docs/Codec-Archive-LibZip.html#g:3
The reverse process seems to be addFile/sourceBuffer and related functions.
Here is full source code of zip repacking:
import Codec.Archive.LibZip
import Codec.Archive.LibZip.Types
main = readZip "foo.zip" >>= writeZip "bar.zip"
readZip :: FilePath -> IO [(FilePath, ZipSource)]
readZip zipName = withArchive [] zipName $ do
nn <- fileNames []
ss <- mapM (\n -> sourceFile n 0 (-1)) nn
return $ zip nn ss
writeZip :: FilePath -> [(FilePath, ZipSource)] -> IO ()
writeZip zipName zipContent = withArchive [CreateFlag] zipName $ do
mapM_ (uncurry addFile) zipContent
Few refactorings still can be done: liftM2 zip can be used in readZip, and function composition . in writeZip.

What is the haskell way to copy a directory

I find myself doing more and more scripting in haskell. But there are some cases where I'm really not sure of how to do it "right".
e.g. copy a directory recursively (a la unix cp -r).
Since I mostly use linux and Mac Os I usually cheat:
import System.Cmd
import System.Exit
copyDir :: FilePath -> FilePath -> IO ExitCode
copyDir src dest = system $ "cp -r " ++ src ++ " " ++ dest
But what is the recommended way to copy a directory in a platform independent fashion?
I didn't find anything suitable on hackage.
This is my rather naiv implementation I use so far:
import System.Directory
import System.FilePath((</>))
import Control.Applicative((<$>))
import Control.Exception(throw)
import Control.Monad(when,forM_)
copyDir :: FilePath -> FilePath -> IO ()
copyDir src dst = do
whenM (not <$> doesDirectoryExist src) $
throw (userError "source does not exist")
whenM (doesFileOrDirectoryExist dst) $
throw (userError "destination already exists")
createDirectory dst
content <- getDirectoryContents src
let xs = filter (`notElem` [".", ".."]) content
forM_ xs $ \name -> do
let srcPath = src </> name
let dstPath = dst </> name
isDirectory <- doesDirectoryExist srcPath
if isDirectory
then copyDir srcPath dstPath
else copyFile srcPath dstPath
where
doesFileOrDirectoryExist x = orM [doesDirectoryExist x, doesFileExist x]
orM xs = or <$> sequence xs
whenM s r = s >>= flip when r
Any suggestions of what really is the way to do it?
I updated this with the suggestions of hammar and FUZxxl.
...but still it feels kind of clumsy to me for such a common task!
It's possible to use the Shelly library in order to do this, see cp_r:
cp_r "sourcedir" "targetdir"
Shelly first tries to use native cp -r if available. If not, it falls back to a native Haskell IO implementation.
For further details on type semantics of cp_r, see this post written by me to described how to use cp_r with String and or Text.
Shelly is not platform independent, since it relies on the Unix package, which is not supported under Windows.
I couldn't find anything that does this on Hackage.
Your code looks pretty good to me. Some comments:
dstExists <- doesDirectoryExist dst
This does not take into account that a file with the destination name might exist.
if or [not srcExists, dstExists] then print "cannot copy"
You might want to throw an exception or return a status instead of printing directly from this function.
paths <- forM xs $ \name -> do
[...]
return ()
Since you're not using paths for anything, you can change this to
forM_ xs $ \name -> do
[...]
The filesystem-trees package provides the means for a very simple implementation:
import System.File.Tree (getDirectory, copyTo_)
copyDirectory :: FilePath -> FilePath -> IO ()
copyDirectory source target = getDirectory source >>= copyTo_ target
The MissingH package provides recursive directory traversals, which you might be able to use to simplify your code.
I assume that the function in Path.IO copyDirRecur with variants to include/exclude symlinks may be a newer and maintained solution. It requires to convert the filepath to Path x Dir which is achieved with parseRelDir respective parseAbsDir, but I think to have a more precise date type than FilePath is worthwile to avoid hard to track errors at run-time.
There are also some functions for copying files and directories in the core Haskell library Cabal modules, specifically Distribution.Simple.Utils in package Cabal. copyDirectoryRecursive is one, and there are other functions near this one in that module.

Faster alternatives to hFileSize to retrieve the size of a file in Haskell?

I am wondering how to get the size of a file in haskell with the least amount of overhead. Right now I have the following code:
getFileSize :: FilePath -> IO Integer
getFileSize x = do
handle <- openFile x ReadMode
size <- hFileSize handle
hClose handle
return size
This seems to be quite slow. I have stumbled across getFileStatus in System.Posix.Files but don't know how it works - at least I only get errors when playing around with it in ghci. Also, I am not sure if this would work on Windows (probably not).
So to reiterate: What is the best (and platform independent) approach to get the size of a file in Haskell?
What you want are indeed getFileStatus and fileSize, both from System.Posix (which will work just fine under Windows, if you use the unix-compat package instead of unix). Usage is as follows, leaving error handling up to you:
getFileSize :: String -> IO Integer
getFileSize path = do
stat <- getFileStatus path
return $ fromIntegral (fileSize stat)
For what it's worth, and though I think it's less readable, you could shorten this form to:
getFileSize path = getFileStatus path >>= \s -> return $ fileSize s
I don't know if there is a better way. RWH supplies its own wrapper to hFileSize:
getFileSize path = handle (\_ -> return Nothing) $
bracket (openFile path ReadMode) hClose $ \h -> do
size <- hFileSize h
return (Just size)
It also notes that the unix-compat is available, which "provides portable implementations of parts of the unix package."
It seems like System.Posix.Files doesn't work in Windows (except in Cygwin), have you tried unix-compat ?
https://hackage.haskell.org/package/unix-compat-0.4.1.4/docs/System-PosixCompat-Files.html
This worked for me on my Windows 10 machine:
> cabal install unix-compat
Resolving dependencies...
... lots of output, plus I had to put Cygwin on my path to make it build ...
> ghci
Prelude> import System.PosixCompat.Files
Prelude System.PosixCompat.Files> getFileStatus ".bashrc">>= \s -> return $ fileSize s
5764
import System.Posix.Files
import System.Posix.Types
getFileSize :: FilePath -> IO FileOffset
getFileSize path = fmap fileSize $ getFileStatus path
https://hackage.haskell.org/package/directory-1.3.6.0/docs/System-Directory.html#v:getFileSize
getFileSize :: FilePath -> IO Integer

Resources