Haskell: interaction between withCurrentDirectory and runConcurrently

I'm trying to automate some file management in Haskell using System.Directory. My script works synchronously, but in my use case, I have about twenty directories, for each of which I'd like to start a long-running process, so I am also using Control.Concurrent.Async, which seems to be causing problems.
Minimal Example:
#!/usr/bin/env stack
-- stack --resolver lts-10.3 --install-ghc runghc --package async
import Control.Concurrent.Async (Concurrently(..), runConcurrently)
import Control.Monad (filterM)
import System.Directory as Dir
import System.Process (callCommand)
dirs :: IO [FilePath]
dirs = do
  prefix <- (++ "/Desktop/dirs/") <$> Dir.getHomeDirectory
  paths <- fmap (prefix ++) <$> Dir.listDirectory prefix
  filterM Dir.doesDirectoryExist paths
pullDir :: FilePath -> IO ()
pullDir dir = Dir.withCurrentDirectory dir $ callCommand "pwd"
main :: IO ()
main = dirs >>= runConcurrently . traverse (Concurrently . pullDir) >> pure ()
Expected output:
/Users/daniel/Desktop/dirs/1
/Users/daniel/Desktop/dirs/2
/Users/daniel/Desktop/dirs/3
/Users/daniel/Desktop/dirs/4
/Users/daniel/Desktop/dirs/5
Actual output (varies!):
/Users/daniel/Desktop/dirs/3
/Users/daniel/Desktop/dirs/4
/Users/daniel/Desktop/dirs/3
/Users/daniel/Desktop/dirs/5
/Users/daniel/Desktop/dirs/5
We see that the actual output runs pwd for the same directory more than once and fails to run pwd for some of the directories entirely. I'm almost positive this has to do with withCurrentDirectory.
How can I implement this correctly while still preserving the concurrency?

This isn't possible with withCurrentDirectory. The current directory is a process-wide setting. Whenever something changes it, it's changed for everything in the process. This isn't a Haskell issue - it's just how the concept of "current directory" works.
To get this to work concurrently, you'll need to use full paths for everything, instead of changing the current directory.
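For the question's example, one way to follow this advice is to give each child process its own working directory instead of changing the directory of the Haskell process. The following is a minimal sketch, not the only option: it relies on the cwd field of CreateProcess from System.Process and on mapConcurrently from the async package already used in the question. The helper name pwdIn and the example directories are made up for illustration.

```haskell
import Control.Concurrent.Async (mapConcurrently)
import System.Process (CreateProcess (cwd), readCreateProcess, shell)

-- Run `pwd` in a child process whose working directory is `dir`.
-- The `cwd` field only affects the child, so the current directory of
-- the Haskell process itself is never changed.
pwdIn :: FilePath -> IO String
pwdIn dir = readCreateProcess ((shell "pwd") { cwd = Just dir }) ""

main :: IO ()
main = do
  -- Example directories, assumed to exist; substitute the result of `dirs`.
  outputs <- mapConcurrently pwdIn ["/", "/tmp"]
  mapM_ putStr outputs
```

Because the output of each child is captured and printed afterwards, the lines appear in the order of the input list even though the processes run concurrently.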

Related

Data.Binary encodeFile does not seem to be thread safe - corrupted file?

I'm trying to replicate a situation where a binary file written via encodeFile in a real-world application was essentially corrupted, with a file size of 0, after a hard reboot.
Although I've not been able to replicate this behavior exactly, I have gotten it to produce a corrupted(?) file with the code below.
When we first run it (some text is garbled due to multiple threads printing):
"New valid file written"
Example "hmm" [0]
"Testing..."
"Donenn"oo
tt een#no~ouGugHghCh I bDby-ytSteTesAs
R
CTCa~al#lllSSttaacckk ((ffrroomm HHaassCCaallllStS#at~caGkcH)kC:)I
:D
- Fe IreNrrIorSroH,r- ,5c ~ac#la
llelde #da~ tGa HtsC rIscDr/-cMS/aTMiAanRi.Tnh~.s#h:s5:35:31:51 5i ni nm amiani:nM:aMiani
n
"d"ideiien#ig~n.Gg.H..C..I..D..-..S."T.
Command "cabal v2-repl app" exited unexpectedly
After a few runs eventually we get an error of:
*** Exception: not enough bytes
CallStack (from HasCallStack):
error, called at src/Main.hs:53:15 in main:Main
What is the cause of this error? Is it just the case that encodeFile is not safe when used from multiple threads? That would be kind of odd, as there is no mention of threads on https://hackage.haskell.org/package/binary-0.10.0.0/docs/Data-Binary.html.
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE LambdaCase #-}
module Main where
import System.PosixCompat.Files
import System.Process
import System.Process.Internals
import System.Posix.Signals
import System.Posix.Process
import Control.Concurrent
import Control.Monad
import Data.Binary
import GHC.Generics (Generic)
import Control.Exception
data Example = Example String [Int] deriving (Generic, Show)
instance Binary Example
main :: IO ()
main = do
  checkFile
  encodeFile "output.txt" $ Example "hmm" [0]
  checkFile
  print "New valid file written"
  decodeFileOrFail "output.txt" >>= \case
    Right v@(Example s z) -> print v
    Left (e,e') -> do
      error $ e'
      rip
  print "Testing..."
  forM_ [1..3] (const $ forkIO $ catch (do
      checkFile
      somethingIO
      checkFile) (\e -> do
        print (e :: SomeException)
        rip
      )
    )
  print "Done"
checkFile :: IO ()
checkFile = do
  fileExist "output.txt" >>= \case
    True -> do
      x <- getFileSize "output.txt"
      if x == 0 then
        rip
      else
        pure ()
      decodeFileOrFail "output.txt" >>= \case
        Right (Example s z) -> pure ()
        Left (e,e') -> do
          error $ e'
          rip
    False -> pure ()
rip :: IO ()
rip = do
  print "dieing......."
  getProcessID >>= signalProcess sigKILL
somethingIO :: IO ()
somethingIO = do
  let v = 10 :: Int
  decodeFileOrFail "output.txt" >>= \case
    Right (Example s z) -> encodeFile "output.txt" $ z ++ [v]
    Left (e,e') -> do
      error $ e'
      rip
getFileSize :: String -> IO Int
getFileSize path = getFileStatus path >>= return . fromIntegral . fileSize
With a cabal file of:
cabal-version: 1.12
name: HaskellNixCabalStarter
version: 0.1.0.0
author: HaskellNixCabalStarter
maintainer: HaskellNixCabalStarter
license: MIT
build-type: Simple
executable app
  main-is: Main.hs
  other-modules:
      Paths_HaskellNixCabalStarter
  hs-source-dirs:
      src
  build-depends:
      base >=4.12 && <4.13
    , binary
    , process
    , random
    , unix
    , unix-compat
  default-language: Haskell2010
There's nothing particularly mysterious going on here. Reading and writing files simply aren't atomic operations, and this is biting you. If you have one thread writing output.txt and another reading output.txt, it is completely normal and expected for the reader to occasionally see only part of the file that the writer would eventually produce.
This is not particularly special to the binary package, nor even to the language -- this is, to a first approximation, true of nearly every library and language that deals with a filesystem. Guaranteeing atomicity of the appropriate kind is quite hard, indeed; but many, many engineering years have gone into providing this kind of thing for databases, so if that's a need for you, you might consider using one of them.
Alternately, a significantly simpler solution is to have a single thread that is responsible for reading and writing the appropriate file, and to communicate with it via one of Haskell's excellent inter-thread communication tools.
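A rough sketch of that single-owner idea, using only Chan and MVar from base together with Data.Binary. The Request type, the function names, and the file name output.bin are all hypothetical, and the file holds a plain [Int] to keep the example short.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.Chan (Chan, newChan, readChan, writeChan)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM_, forever, replicateM_)
import Data.Binary (decodeFile, decodeFileOrFail, encodeFile)

-- A request to append one value; the MVar is filled once it has been handled.
data Request = Append Int (MVar ())

-- The only thread that ever opens the file, so reads and writes cannot overlap.
fileOwner :: FilePath -> Chan Request -> IO ()
fileOwner path requests = forever $ do
  Append v done <- readChan requests
  result <- decodeFileOrFail path
  case result of
    Left (_, err) -> putStrLn ("decode failed: " ++ err)
    Right xs      -> encodeFile path (xs ++ [v] :: [Int])
  putMVar done ()

-- Safe to call from any thread; blocks until the owner has handled the request.
appendValue :: Chan Request -> Int -> IO ()
appendValue requests v = do
  done <- newEmptyMVar
  writeChan requests (Append v done)
  takeMVar done

main :: IO ()
main = do
  let path = "output.bin" -- hypothetical file name
  encodeFile path ([0] :: [Int])
  requests <- newChan
  _ <- forkIO (fileOwner path requests)
  finished <- newEmptyMVar
  forM_ [1 .. 3] $ \i ->
    forkIO (appendValue requests i >> putMVar finished ())
  replicateM_ 3 (takeMVar finished)
  values <- decodeFile path :: IO [Int]
  print (length values)
```

The three worker threads never touch the file themselves, so the final list contains all four values, although the order of the appended ones depends on scheduling.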
Some OSs do offer an atomic file-rename operation. In such a situation, one could also consider writing to a temporary file, then using an atomic rename to overwrite the filename you actually care about. (Thanks to a commenter who I will leave anonymous because they chose to delete their comment for suggesting this.)
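The write-then-rename approach could look roughly like this. It is a sketch under assumptions: the helper name atomicEncodeFile and the ".tmp" suffix are invented here, the temporary file is assumed to live on the same filesystem as the target, and whether the rename is truly atomic depends on the OS (on POSIX systems renameFile maps to rename).

```haskell
import Data.Binary (Binary, decodeFile, encodeFile)
import System.Directory (renameFile)

-- Encode to a temporary file next to the target, then rename it into place.
-- Where the rename is atomic, a reader sees either the old complete file
-- or the new complete file, never a partially written one.
-- Assumption: the ".tmp" suffix is a made-up naming scheme, and only one
-- writer at a time uses it; concurrent writers would need distinct names.
atomicEncodeFile :: Binary a => FilePath -> a -> IO ()
atomicEncodeFile path value = do
  let tmp = path ++ ".tmp"
  encodeFile tmp value
  renameFile tmp path

main :: IO ()
main = do
  atomicEncodeFile "output.bin" ([0, 10] :: [Int])
  values <- decodeFile "output.bin" :: IO [Int]
  print values
```

This only protects readers from partial writes. A read-modify-write cycle such as somethingIO in the question still needs to be serialized, since two threads could read the same old contents and overwrite each other's update.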

Get a list of what is exported by a module

Similar question (Is there a way to see the list of functions in a module, in GHCI?), though not the result that I seek.
Is there a way to get a list of what is exported by a module?
Of course in GHCi you can import it then type Some.Module., hit tab for auto-completion and it will show what I seek. But I want to capture that stuff. Roughly speaking, String -> [String].
Purpose? Suppose that I have a source file with a naked import Some.Module. Question: What belongs to Some.Module in that file? A simple way would be to output the list of what the module exports, feed that to grep and return the contenders, without the need to load that source file in GHCi (might be complicated or not possible). And everything becomes a lot clearer.
If there's a smarter approach to that, I'm listening. I heard of solutions involving GOA and lambdabot. No idea if applicable or how to make use of this.
As @HTNW mentioned in a comment, if you can run ghc on your actual file, you can use -ddump-minimal-imports. Otherwise, if you want to actually get the list of exports from another module, assuming that you're using GHC, the easiest way to do this is probably to look at the .hi interface files. ghc has some built-in support for printing human-readable representations of interface files once you know the path to one, but as the wiki page notes, "This textual format is not particularly designed for machine parsing". You can also access the information you might want via the GHC API. A small example of doing something like that follows.
We start with a bunch of imports for doing IO and for the GHC API:
import Control.Monad.IO.Class
import System.IO
import System.Environment
import GHC
import GHC.Paths (libdir)
import DynFlags
import Outputable
import Name
import Pretty (Mode(..))
With that bureaucracy out of the way, main starts by firing up the GHC Monad:
main :: IO ()
main = defaultErrorHandler defaultFatalMessager defaultFlushOut $ do
  runGhc (Just libdir) $ do
We're not actually generating any code so we can set hscTarget = HscNothing during the DynFlags setup boilerplate:
    dflags <- getSessionDynFlags
    let dflags' = dflags { hscTarget = HscNothing }
    setSessionDynFlags dflags'
With that out of the way we can find the module we want from the package database (using the first command-line argument as the name):
    mn <- head <$> (liftIO $ getArgs)
    m <- lookupModule (mkModuleName mn) Nothing
We can use getModuleInfo to get a module info structure:
    mmi <- getModuleInfo m
    case mmi of
      Nothing -> liftIO $ putStrLn "Could not find module interface"
If we did find the interface, everything we need for this is in the modInfoExports. If we needed more, we could also get the actual ModIface:
      Just mi -> mapM_ (printExport dflags') (modInfoExports mi)
Actually printing out an export is a bit tedious, as it requires working with Names; a simple example printExport might just use the pretty-printing functions, but these are intended more for human-readable output than for machine-readable output:
printExport :: DynFlags -> Name -> Ghc ()
printExport dflags n =
  liftIO $ printSDocLn PageMode dflags stdout (defaultUserStyle dflags)
         $ pprNameUnqualified n
A particularly simple way for interactive use is :browse. Load up a ghci that has access to the appropriate package, then
> :browse Some.Module
class Some.Module.Foo a where
  Some.Module.foo :: a -> a
  {-# MINIMAL foo #-}
Some.Module.bar :: Int
All the qualification can get a bit much, especially if there are many functions that operate on types defined in the same module. To reduce the clutter, you can bring Some.Module into scope first:
> :m + Some.Module
> :browse Some.Module
class Foo a where
  foo :: a -> a
  {-# MINIMAL foo #-}
bar :: Int
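To capture that output programmatically, roughly as the String -> [String] the question asks for, one option is to run :browse through ghc -e and post-process the text. This is only a sketch: it assumes ghc is on the PATH, that it can see the package providing the module, and that ghc -e accepts GHCi commands such as :browse. The helper names are invented, and the parsing is deliberately crude: it keeps only lines of the form "name :: type", so type, class, and data declarations are dropped.

```haskell
import System.Process (readProcess)

-- Keep lines of the form "name :: type" and return the name part.
signatureNames :: String -> [String]
signatureNames out = [name | l <- lines out, (name : "::" : _) <- [words l]]

-- Assumption: `ghc` is on the PATH and can see the package providing the module.
browseModule :: String -> IO [String]
browseModule modName =
  signatureNames <$> readProcess "ghc" ["-e", ":browse " ++ modName] ""

main :: IO ()
main = browseModule "Data.Maybe" >>= mapM_ putStrLn
```

The resulting list can then be fed to grep, or matched against the source file directly, without loading that file into GHCi.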

How to print paths using Haskell Turtle library?

To learn a bit about Turtle, I thought it would be nice to modify an example from the tutorial. I chose to remove the redundant "FilePath" from each line of the output, thinking it would be a simple exercise.
And yet, despite the author's efforts to make his library easy to use, I nearly failed to use it to solve this simple problem.
I tried everything I saw that looked like it would allow me to somehow lift >>= from IO into Shell: MonadIO, FoldM, liftIO, _foldIO, with no success. I grew frustrated, and only through reading the Turtle source code was I able to find something that seems to work ("no obvious defects" comes to mind).
Why is this so hard? How does one logically arrive at a solution using the API of this library?
#!/usr/bin/env stack
-- stack --resolver lts-8.17 --install-ghc runghc --package turtle --package lens
{-# LANGUAGE OverloadedStrings #-}
import Turtle
import Control.Lens
import Control.Foldl as Foldl
import Filesystem.Path.CurrentOS
import Data.Text.IO as T
import Data.Text as T
main = do
  homedir <- home
  let paths = lstree $ homedir </> "projects"
  let t = fmap (Control.Lens.view _Right . toText) paths
  customView t
customView s = sh (do
  x <- s
  liftIO $ T.putStrLn x)
You don't lift >>= from IO into Shell. Shell already has a Monad instance that comes with its own >>= function. Instead you either lift IO actions into Shell with liftIO or run the shell with fold or foldM. Use sh to run the Shell when you don't care about the results.
I believe your example can be simplified to
main = sh $ do
  homedir <- home
  filepath <- lstree $ homedir </> "projects"
  case (toText filepath) of
    Right path -> liftIO $ T.putStrLn path
    Left approx -> return () -- This shouldn't happen
As for the difficulty with getting a string back from a FilePath, I don't think that can be blamed on the Turtle author. I think it can be simplified to
stringPath :: FilePath -> String
stringPath filepath =
  case (toText filepath) of -- try to use the human readable version
    Right path -> T.unpack path
    Left _ -> encodeString filepath -- fall back on the machine readable one
Combined this would simplify the example to
main = sh $ do
  homedir <- home
  filepath <- lstree $ homedir </> "projects"
  liftIO $ putStrLn (stringPath filepath)
or
main = view $ do
  homedir <- home
  filepath <- lstree $ homedir </> "projects"
  return $ stringPath filepath
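To make the difference between lifting an IO action into a Shell and running a Shell concrete, here is a small sketch that avoids the filesystem entirely. The stream numbers is a made-up example built with select; sh, liftIO, and fold come from Turtle, and Foldl.sum comes from the foldl package that Turtle builds on.

```haskell
import qualified Control.Foldl as Foldl
import Turtle

-- A small Shell stream, built with `select` so the example needs no files.
numbers :: Shell Int
numbers = select [1, 2, 3]

main :: IO ()
main = do
  -- `sh` runs the stream for its effects; `liftIO` embeds an IO action in it.
  sh $ do
    n <- numbers
    liftIO (print n)
  -- `fold` runs the stream and reduces its elements to a single result.
  total <- fold numbers Foldl.sum
  print total
```

The same pattern applies to lstree: bind each path inside the Shell, lift whatever IO action should run per element, and pick sh or fold depending on whether a result is needed.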

How to write a zip file using Haskell LibZip?

I'm trying to figure out a dead-simple task using LibZip in Haskell: how do I open an archive foo.zip, decompress it, recompress it, and save it to a new archive bar.zip? With the Zip library, this is easy:
{-# LANGUAGE OverloadedStrings #-}
import Codec.Archive.Zip (toArchive, fromArchive)
import qualified Data.ByteString.Lazy as B
import System.Environment
saveZipAs :: FilePath -> FilePath -> IO ()
saveZipAs source dest = do
  arch <- fmap toArchive $ B.readFile source
  putStrLn "Archive info: " >> print arch
  B.writeFile dest $ fromArchive arch
LibZip, on the other hand, provides no clear way to do this (that I can see). It only seems to be able to instantiate a zip file with withArchive (which is an issue in and of itself, because a file you want to open might not be on disk), and I don't see a way to do any kind of "save as" operation, nor to extract the compressed bytes as a ByteString or otherwise (as in Zip). LibZip is supposedly faster than Zip, so I want to at least give it a try, but it seems much more obscure (and also impure, carrying around an IO everywhere it goes, where it is really only needed at the beginning and the end, if ever). Can anyone give me some tips?
Side note: it really boggles the mind how people can spend such huge amounts of time writing a library, only to document it so poorly that no one can use it. Library writers, please don't do this!
Your link is somehow to an old version of the library, and the very last version of the library seems to have haddock compilation bugs.
Here are file reading functions in a newer version:
http://hackage.haskell.org/package/LibZip-0.10.2/docs/Codec-Archive-LibZip.html#g:3
The reverse process seems to be addFile/sourceBuffer and related functions.
Here is full source code of zip repacking:
import Codec.Archive.LibZip
import Codec.Archive.LibZip.Types
main = readZip "foo.zip" >>= writeZip "bar.zip"
readZip :: FilePath -> IO [(FilePath, ZipSource)]
readZip zipName = withArchive [] zipName $ do
  nn <- fileNames []
  ss <- mapM (\n -> sourceFile n 0 (-1)) nn
  return $ zip nn ss
writeZip :: FilePath -> [(FilePath, ZipSource)] -> IO ()
writeZip zipName zipContent = withArchive [CreateFlag] zipName $ do
  mapM_ (uncurry addFile) zipContent
A few refactorings can still be done: liftM2 zip can be used in readZip, and function composition (.) in writeZip.

What is the haskell way to copy a directory

I find myself doing more and more scripting in haskell. But there are some cases where I'm really not sure of how to do it "right".
e.g. copy a directory recursively (a la unix cp -r).
Since I mostly use linux and Mac Os I usually cheat:
import System.Cmd
import System.Exit
copyDir :: FilePath -> FilePath -> IO ExitCode
copyDir src dest = system $ "cp -r " ++ src ++ " " ++ dest
But what is the recommended way to copy a directory in a platform independent fashion?
I didn't find anything suitable on hackage.
This is the rather naive implementation I use so far:
import System.Directory
import System.FilePath((</>))
import Control.Applicative((<$>))
import Control.Exception(throw)
import Control.Monad(when,forM_)
copyDir :: FilePath -> FilePath -> IO ()
copyDir src dst = do
  whenM (not <$> doesDirectoryExist src) $
    throw (userError "source does not exist")
  whenM (doesFileOrDirectoryExist dst) $
    throw (userError "destination already exists")
  createDirectory dst
  content <- getDirectoryContents src
  let xs = filter (`notElem` [".", ".."]) content
  forM_ xs $ \name -> do
    let srcPath = src </> name
    let dstPath = dst </> name
    isDirectory <- doesDirectoryExist srcPath
    if isDirectory
      then copyDir srcPath dstPath
      else copyFile srcPath dstPath
  where
    doesFileOrDirectoryExist x = orM [doesDirectoryExist x, doesFileExist x]
    orM xs = or <$> sequence xs
    whenM s r = s >>= flip when r
Any suggestions of what really is the way to do it?
I updated this with the suggestions of hammar and FUZxxl.
...but still it feels kind of clumsy to me for such a common task!
It's possible to use the Shelly library in order to do this, see cp_r:
cp_r "sourcedir" "targetdir"
Shelly first tries to use native cp -r if available. If not, it falls back to a native Haskell IO implementation.
For further details on the type semantics of cp_r, see this post, which I wrote to describe how to use cp_r with String and/or Text.
Shelly is not platform independent, since it relies on the Unix package, which is not supported under Windows.
I couldn't find anything that does this on Hackage.
Your code looks pretty good to me. Some comments:
dstExists <- doesDirectoryExist dst
This does not take into account that a file with the destination name might exist.
if or [not srcExists, dstExists] then print "cannot copy"
You might want to throw an exception or return a status instead of printing directly from this function.
paths <- forM xs $ \name -> do
    [...]
    return ()
Since you're not using paths for anything, you can change this to
forM_ xs $ \name -> do
    [...]
The filesystem-trees package provides the means for a very simple implementation:
import System.File.Tree (getDirectory, copyTo_)
copyDirectory :: FilePath -> FilePath -> IO ()
copyDirectory source target = getDirectory source >>= copyTo_ target
The MissingH package provides recursive directory traversals, which you might be able to use to simplify your code.
I assume that copyDirRecur in Path.IO, which has variants that include or exclude symlinks, may be a newer and maintained solution. It requires converting the file path to Path x Dir, which is done with parseRelDir or parseAbsDir respectively, but I think having a more precise data type than FilePath is worthwhile to avoid hard-to-track errors at run time.
There are also some functions for copying files and directories in the core Haskell library Cabal modules, specifically Distribution.Simple.Utils in package Cabal. copyDirectoryRecursive is one, and there are other functions near this one in that module.
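As an illustration of the Cabal route, here is a small sketch. It assumes the signature Verbosity -> FilePath -> FilePath -> IO () for copyDirectoryRecursive, which may differ between Cabal versions, and the directory names demo-src and demo-dst are made up. How empty directories and symlinks are treated is not checked here.

```haskell
import Distribution.Simple.Utils (copyDirectoryRecursive)
import Distribution.Verbosity (silent)
import System.Directory (createDirectoryIfMissing, doesFileExist)
import System.FilePath ((</>))

-- Thin wrapper that fixes the verbosity so callers only pass the two paths.
copyDir :: FilePath -> FilePath -> IO ()
copyDir = copyDirectoryRecursive silent

main :: IO ()
main = do
  -- Build a tiny source tree (hypothetical names) and copy it.
  createDirectoryIfMissing True ("demo-src" </> "sub")
  writeFile ("demo-src" </> "sub" </> "a.txt") "hello\n"
  copyDir "demo-src" "demo-dst"
  doesFileExist ("demo-dst" </> "sub" </> "a.txt") >>= print
```

Since the Cabal library ships with GHC, this avoids an extra dependency, at the cost of relying on a module that is primarily meant for build tooling.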
