I have many Haskell packages and I have enabled a flag so that they generate Haddock documentation. These documents now sit under directories like /usr/share/doc/{package-name}-{version}/html/.
Is there a tool to organize them? I want something like the "all packages by name" page on Hackage,
so that local links to all these installed packages can be found on one page.
It would be even better if Hoogle could be told to use these documents. Right now my Hoogle search results all point to the corresponding pages on Hackage.
Since my question has not yet been answered, I wrote a quick and dirty program to answer my first question:
import System.Directory
import System.IO
import System.Environment
import System.Exit
import System.Path
import System.FilePath.Posix
import Control.Applicative
import Control.Monad
import Data.Maybe
import Data.List
import Text.Printf
-- | make markdown table row
makeTableRow :: String -> FilePath -> String
makeTableRow dirName htmlPath = intercalate "|" [ dirName
, link "frames"
, link "index"
, link "doc-index"]
where
link s = printf "[%s](%s)" s $ htmlPath </> s ++ ".html"
scanAndMakeTable :: String -> IO [String]
scanAndMakeTable relDocPath = do
(Just docPath) <- absNormPath' <$> getCurrentDirectory <*> pure relDocPath
dirs <- getDirectoryContents docPath
items <- liftM catMaybes
. mapM (asHaskellPackage docPath)
. sort $ dirs
return $ headers1:headers2:map (uncurry makeTableRow) items
where
headers1 = "| " ++ intercalate " | " (words "Package Frames Contents Index") ++ " |"
headers2 = intercalate " --- " $ replicate 5 "|"
absNormPath' a p = addMissingRoot <$> absNormPath a p
-- sometimes the leading '/' is missing in absNormPath results
addMissingRoot s@('/':_) = s
addMissingRoot s = '/' : s
asHaskellPackage :: String -> String -> IO (Maybe (String,FilePath))
asHaskellPackage docPath dirName = do
-- a valid haskell package has a "haddock dir"
-- in which we can at least find a file with ".haddock" as extension name
b1 <- doesDirectoryExist haddockFileDir
if b1
then do
b2 <- any ((== ".haddock") . takeExtension)
<$> getDirectoryContents haddockFileDir
return $ if b2 then Just (dirName,haddockFileDir) else Nothing
else return Nothing
where
-- guess haddock dir
haddockFileDir = docPath </> dirName </> "html"
main :: IO ()
main = do
args <- getArgs
case args of
[docPath'] -> scanAndMakeTable docPath' >>= putStrLn . unlines
_ -> help
where
help = hPutStrLn stderr "Usage: <program> <path-to-packages>"
>> exitFailure
Based on the structure of these haddock directories, I recognize a haddock directory by testing:
whether there is a subdirectory called html;
whether that html subdirectory contains at least one file with the .haddock extension.
Running the program with runghc <source-file> /usr/share/doc/ >document-nav.md should generate a markdown file containing links to the documents. Afterwards, just pipe it through pandoc or some other markdown-to-HTML converter and open the resulting HTML file in a browser to navigate through the package documents.
To learn a bit about Turtle, I thought it would be nice to modify an example from the tutorial. I chose to remove the redundant "FilePath" from each line of the output, thinking it would be a simple exercise.
And yet, despite the author's efforts to make his library easy to use, I nearly failed to apply it to this simple problem.
I tried everything I saw that looked like it would allow me to somehow lift >>= from IO into Shell: MonadIO, FoldM, liftIO, _foldIO, with no success. I grew frustrated, and only by reading the Turtle source code was I able to find something that seems to work ("no obvious defects" comes to mind).
Why is this so hard? How does one logically arrive at a solution using the API of this library?
#!/usr/bin/env stack
-- stack --resolver lts-8.17 --install-ghc runghc --package turtle --package lens
{-# LANGUAGE OverloadedStrings #-}
import Turtle
import Control.Lens
import Control.Foldl as Foldl
import Filesystem.Path.CurrentOS
import Data.Text.IO as T
import Data.Text as T
main = do
homedir <- home
let paths = lstree $ homedir </> "projects"
let t = fmap (Control.Lens.view _Right . toText) paths
customView t
customView s = sh (do
x <- s
liftIO $ T.putStrLn x)
You don't lift >>= from IO into Shell. Shell already has a Monad instance that comes with its own >>= function. Instead you either lift IO actions into Shell with liftIO or run the shell with fold or foldM. Use sh to run the Shell when you don't care about the results.
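If you do want the results back in IO, here is a minimal sketch (not taken from the question; it only assumes a ~/projects directory exists) that runs the Shell with fold, collecting every produced path into a list via Control.Foldl.list:
{-# LANGUAGE OverloadedStrings #-}
import Turtle
import qualified Control.Foldl as Foldl

main :: IO ()
main = do
  homedir <- home
  -- run the Shell and gather every FilePath it produces into a plain list
  paths <- fold (lstree (homedir </> "projects")) Foldl.list
  print (length paths)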
I believe your example can be simplified to
main = sh $ do
homedir <- home
filepath <- lstree $ homedir </> "projects"
case (toText filepath) of
Right path -> liftIO $ T.putStrLn path
Left approx -> return () -- This shouldn't happen
As for the difficulty with getting a string back from a FilePath, I don't think that can be blamed on the Turtle author. I think it can be simplified to
stringPath :: FilePath -> String
stringPath filepath =
case (toText filepath) of -- try to use the human readable version
Right path -> T.unpack path
Left _ -> encodeString filepath -- fall back on the machine readable one
Combined this would simplify the example to
main = sh $ do
homedir <- home
filepath <- lstree $ homedir </> "projects"
liftIO $ putStrLn (stringPath filepath)
or
main = view $ do
homedir <- home
filepath <- lstree $ homedir </> "projects"
return $ stringPath filepath
I have a file with strings which represent directories. Some of those strings have a tilde (~) in them.
I want to join the home directory (~) of a user to the rest of the string.
What I have so far:
import Data.List (isPrefixOf)
import System.Directory (doesDirectoryExist, getHomeDirectory)
import System.FilePath (joinPath)
getFullPath s
| "~" `isPrefixOf` s = joinPath [getHomeDirectory, tail s]
| otherwise = s
But I get the following error:
Couldn't match type `IO FilePath' with `[Char]'
Expected type: FilePath
  Actual type: IO FilePath
In the expression: getHomeDirectory
In the first argument of `joinPath', namely `[getHomeDirectory, tail s]'
In the expression: joinPath [getHomeDirectory, tail s]
I don't know, and I can't find, how to convert the types so they match and can be joined together.
A more idiomatic solution than the one @user2720372 suggests is to separate non-monadic code from monadic code. IO actions are values in the IO monad.
If you only need getFullPath locally it makes sense to cache home directory:
fullPath homePath s
| "~" `isPrefixOf` s = joinPath [homePath, tail s]
| otherwise = s
main = do
homePath <- getHomeDirectory
let getFullPath = fullPath homePath
print $ getFullPath "~/foo"
If you still need full global getFullPath then it can be implemented like this:
getFullPath p = do
homePath <- getHomeDirectory
return $ fullPath homePath p
And it's considered a good style to keep fullPath and getFullPath separated.
Also you don't need isPrefixOf and tail in the first place for such a simple case:
fullPath homePath ('~' : t) = joinPath [homePath, t]
fullPath _ s = s
If you want just a monolithic getFullPath, then @user2720372's variant can be simplified:
getFullPath s = do
homeDir <- getHomeDirectory
return $ case s of
('~' : t) -> joinPath [homeDir, t]
_ -> s
Note that the code above is just refactorings of your code preserving its wrong behavior: you should compare ~ with the first path component, not with the first path character. Use splitPath from System.FilePath:
getFullPath s = do
homeDir <- getHomeDirectory
return $ case splitPath s of
("~" : t) -> joinPath $ homeDir : t
_ -> s
Also, do-notation is really only needed for more complicated cases. If you use do-notation for a simple two-liner, it is almost certainly reducible to an application of fmap/<$>/>>=/>=>/liftM2 or other functions from Control.Monad and Control.Applicative.
Here is another version:
import Control.Applicative ((<$>))
import System.Directory (getHomeDirectory)
import System.FilePath (joinPath, splitPath)
getFullPath s = case splitPath s of
"~/" : t -> joinPath . (: t) <$> getHomeDirectory
_ -> return s
main = getFullPath "~/foo" >>= print
Here is yet another more modular, but less readable version:
import Control.Applicative ((<$>), (<*>))
import System.Directory (getHomeDirectory)
import System.FilePath (joinPath, splitPath)
main = getFullPath "~/foo" >>= print
withPathComponents f = joinPath . f . splitPath
replaceHome p ("~/" : t) = p : t
replaceHome _ s = s
getFullPath path = withPathComponents . replaceHome <$> getHomeDirectory <*> return path
Haskell gurus are invited to rewrite it to preserve modularity but improve readability :)
getHomeDirectory :: IO FilePath
getHomeDirectory is not a function but an IO action so you have to unpack it within another IO action first.
getFullPath :: String -> IO FilePath
getFullPath s = do
homeDir <- getHomeDirectory
if "~" `isPrefixOf` s
then return (joinPath [homeDir, tail s])
else return s
I'm writing a program that creates a shell script containing one command for each image file in a directory. There are 667,944 images in the directory, so I need to handle the strictness/laziness issue properly.
Here's a simple example that gives me Stack space overflow. It does work if I give it more space using +RTS -Ksize -RTS, but it should be able to run with little memory, producing output immediately. So I've been reading the material about strictness in the Haskell wiki and the Haskell wikibook, trying to figure out how to fix the problem. I think it's one of the mapM commands that is giving me grief, but I still don't understand enough about strictness to sort the problem out.
I've found some other questions on SO that seem relevant (Is mapM in Haskell strict? Why does this program get a stack overflow? and Is Haskell's mapM not lazy?), but enlightenment still eludes me.
import System.Environment (getArgs)
import System.Directory (getDirectoryContents)
genCommand :: FilePath -> FilePath -> FilePath -> IO String
genCommand indir outdir file = do
let infile = indir ++ '/':file
let angle = 0 -- have to actually read the file to calculate this for real
let outfile = outdir ++ '/':file
return $! "convert " ++ infile ++ " -rotate " ++ show angle ++
" -crop 143x143+140+140 " ++ outfile
main :: IO ()
main = do
putStrLn "#!/bin/sh"
(indir:outdir:_) <- getArgs
files <- getDirectoryContents indir
let imageFiles = filter (`notElem` [".", ".."]) files
commands <- mapM (genCommand indir outdir) imageFiles
mapM_ putStrLn commands
EDIT: TEST #1
Here's the newest version of the example.
import System.Environment (getArgs)
import System.Directory (getDirectoryContents)
import Control.Monad ((>=>))
genCommand :: FilePath -> FilePath -> FilePath -> IO String
genCommand indir outdir file = do
let infile = indir ++ '/':file
let angle = 0 -- have to actually read the file to calculate this for real
let outfile = outdir ++ '/':file
return $! "convert " ++ infile ++ " -rotate " ++ show angle ++
" -crop 143x143+140+140 " ++ outfile
main :: IO ()
main = do
putStrLn "TEST 1"
(indir:outdir:_) <- getArgs
files <- getDirectoryContents indir
putStrLn $ show (length files)
let imageFiles = filter (`notElem` [".", ".."]) files
-- mapM_ (genCommand indir outdir >=> putStrLn) imageFiles
mapM_ (\filename -> genCommand indir outdir filename >>= putStrLn) imageFiles
I compile it with the command ghc --make -O2 amy2.hs -rtsopts. If I run it with the command ./amy2 ~/nosync/GalaxyZoo/table2/images/ wombat, I get
TEST 1
Stack space overflow: current size 8388608 bytes.
Use `+RTS -Ksize -RTS' to increase it.
If I instead run it with the command ./amy2 ~/nosync/GalaxyZoo/table2/images/ wombat +RTS -K20M, I get the correct output...eventually:
TEST 1
667946
convert /home/amy/nosync/GalaxyZoo/table2/images//587736546846572812.jpeg -rotate 0 -crop 143x143+140+140 wombat/587736546846572812.jpeg
convert /home/amy/nosync/GalaxyZoo/table2/images//587736542558617814.jpeg -rotate 0 -crop 143x143+140+140 wombat/587736542558617814.jpeg
...and so on.
This isn't really a strictness issue(*), but an order of evaluation issue. Unlike lazily evaluated pure values, monadic effects must happen in deterministic order. mapM executes every action in the given list and gathers the results, but it cannot return until the whole list of actions is executed, so you don't get the same streaming behavior as with pure list functions.
The easy fix in this case is to run both genCommand and putStrLn inside the same mapM_. Note that mapM_ doesn't suffer from the same issue since it is not building an intermediate list.
mapM_ (genCommand indir outdir >=> putStrLn) imageFiles
The above uses the "Kleisli composition operator" >=> from Control.Monad, which is like the function composition operator . except for monadic functions. You can also use the normal bind and a lambda.
mapM_ (\filename -> genCommand indir outdir filename >>= putStrLn) imageFiles
For more complex I/O applications where you want better composability between small, monadic stream processors, you should use a library such as conduit or pipes.
Also, make sure you are compiling with either -O or -O2.
(*) To be exact, it is also a strictness issue, because in addition to building a large, intermediate list in memory, laziness causes mapM to build unnecessary thunks and use up stack.
EDIT: So it seems the main culprit might be getDirectoryContents. Looking at the function's source code, it essentially does the same kind of list accumulation internally as mapM.
In order to do streaming directory listing, we need to use System.Posix.Directory which unfortunately makes the program incompatible with non-POSIX systems (like Windows). You can stream the directory contents by e.g. using continuation passing style
import System.Environment (getArgs)
import Control.Monad ((>=>))
import System.Posix.Directory (openDirStream, readDirStream, closeDirStream)
import Control.Exception (bracket)
genCommand :: FilePath -> FilePath -> FilePath -> IO String
genCommand indir outdir file = do
let infile = indir ++ '/':file
let angle = 0 -- have to actually read the file to calculate this for real
let outfile = outdir ++ '/':file
return $! "convert " ++ infile ++ " -rotate " ++ show angle ++
" -crop 143x143+140+140 " ++ outfile
streamingDirContents :: FilePath -> (FilePath -> IO ()) -> IO ()
streamingDirContents root cont = do
let loop stream = do
fp <- readDirStream stream
case fp of
[] -> return ()
_ | fp `notElem` [".", ".."] -> cont fp >> loop stream
| otherwise -> loop stream
bracket (openDirStream root) closeDirStream loop
main :: IO ()
main = do
putStrLn "TEST 1"
(indir:outdir:_) <- getArgs
streamingDirContents indir (genCommand indir outdir >=> putStrLn)
Here's how you could do the same thing using conduit:
import System.Environment (getArgs)
import System.Posix.Directory (openDirStream, readDirStream, closeDirStream)
import Data.Conduit
import qualified Data.Conduit.List as L
import Control.Monad.IO.Class (liftIO, MonadIO)
genCommand :: FilePath -> FilePath -> FilePath -> IO String
genCommand indir outdir file = do
let infile = indir ++ '/':file
let angle = 0 -- have to actually read the file to calculate this for real
let outfile = outdir ++ '/':file
return $! "convert " ++ infile ++ " -rotate " ++ show angle ++
" -crop 143x143+140+140 " ++ outfile
dirSource :: (MonadResource m, MonadIO m) => FilePath -> Source m FilePath
dirSource root = do
bracketP (openDirStream root) closeDirStream $ \stream -> do
let loop = do
fp <- liftIO $ readDirStream stream
case fp of
[] -> return ()
_ -> yield fp >> loop
loop
main :: IO ()
main = do
putStrLn "TEST 1"
(indir:outdir:_) <- getArgs
let files = dirSource indir $= L.filter (`notElem` [".", ".."])
commands = files $= L.mapM (liftIO . genCommand indir outdir)
runResourceT $ commands $$ L.mapM_ (liftIO . putStrLn)
The nice thing about conduit is that you regain the ability to compose pieces of functionality with things like conduit versions of filter and mapM. The $= operator streams stuff forward in the chain and $$ connects the stream to a consumer.
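For readers unfamiliar with those operators, here is a tiny, self-contained illustration (not part of the program above, and assuming an older conduit version that still exports $= and $$):
import Data.Conduit
import qualified Data.Conduit.List as L

main :: IO ()
main = do
  -- stream a list through a mapping conduit and collect the results
  xs <- L.sourceList [1 .. 10 :: Int] $= L.map (* 2) $$ L.consume
  print xs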
The not-so-nice thing is that real world is complicated and writing efficient and robust code requires us to jump through some hoops with resource management. That's why all the operations work in the ResourceT monad transformer which keeps track of e.g. open file handles and cleans them up promptly and deterministically when they are no longer needed or e.g. if the computation gets aborted by an exception (this is in contrast to using lazy I/O and relying on the garbage collector to eventually release any scarce resources).
However, this means that a) we need to run the final resulting conduit operation with runResourceT, and b) we need to explicitly lift I/O operations into the transformed monad using liftIO instead of being able to directly write e.g. L.mapM_ putStrLn.
I find myself doing more and more scripting in Haskell. But there are some cases where I'm really not sure how to do it "right",
e.g. copying a directory recursively (à la Unix cp -r).
Since I mostly use Linux and Mac OS, I usually cheat:
import System.Cmd
import System.Exit
copyDir :: FilePath -> FilePath -> IO ExitCode
copyDir src dest = system $ "cp -r " ++ src ++ " " ++ dest
But what is the recommended way to copy a directory in a platform-independent fashion?
I didn't find anything suitable on Hackage.
This is my rather naive implementation, which I use so far:
import System.Directory
import System.FilePath((</>))
import Control.Applicative((<$>))
import Control.Exception(throw)
import Control.Monad(when,forM_)
copyDir :: FilePath -> FilePath -> IO ()
copyDir src dst = do
whenM (not <$> doesDirectoryExist src) $
throw (userError "source does not exist")
whenM (doesFileOrDirectoryExist dst) $
throw (userError "destination already exists")
createDirectory dst
content <- getDirectoryContents src
let xs = filter (`notElem` [".", ".."]) content
forM_ xs $ \name -> do
let srcPath = src </> name
let dstPath = dst </> name
isDirectory <- doesDirectoryExist srcPath
if isDirectory
then copyDir srcPath dstPath
else copyFile srcPath dstPath
where
doesFileOrDirectoryExist x = orM [doesDirectoryExist x, doesFileExist x]
orM xs = or <$> sequence xs
whenM s r = s >>= flip when r
Any suggestions on what the right way to do this really is?
I updated this with the suggestions from hammar and FUZxxl,
...but it still feels kind of clumsy to me for such a common task!
It's possible to use the Shelly library in order to do this, see cp_r:
cp_r "sourcedir" "targetdir"
Shelly first tries to use native cp -r if available. If not, it falls back to a native Haskell IO implementation.
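A minimal, self-contained sketch of that usage, assuming the shelly package is installed; the directory names are just placeholders:
{-# LANGUAGE OverloadedStrings #-}
import Shelly

main :: IO ()
main = shelly $
  -- recursively copy sourcedir into targetdir
  cp_r "sourcedir" "targetdir"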
For further details on the type semantics of cp_r, see this post I wrote describing how to use cp_r with String and/or Text.
Shelly is not platform independent, since it relies on the Unix package, which is not supported under Windows.
I couldn't find anything that does this on Hackage.
Your code looks pretty good to me. Some comments:
dstExists <- doesDirectoryExist dst
This does not take into account that a file with the destination name might exist.
if or [not srcExists, dstExists] then print "cannot copy"
You might want to throw an exception or return a status instead of printing directly from this function.
paths <- forM xs $ \name -> do
[...]
return ()
Since you're not using paths for anything, you can change this to
forM_ xs $ \name -> do
[...]
The filesystem-trees package provides the means for a very simple implementation:
import System.File.Tree (getDirectory, copyTo_)
copyDirectory :: FilePath -> FilePath -> IO ()
copyDirectory source target = getDirectory source >>= copyTo_ target
The MissingH package provides recursive directory traversals, which you might be able to use to simplify your code.
I assume that copyDirRecur in Path.IO, with variants to include/exclude symlinks, may be a newer and maintained solution. It requires converting the FilePath to a Path x Dir, which is done with parseRelDir or parseAbsDir respectively, but I think having a more precise data type than FilePath is worthwhile to avoid hard-to-track errors at run-time.
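A minimal sketch of that approach, assuming the path and path-io packages; the directory paths are made-up examples:
import Path (parseAbsDir)
import Path.IO (copyDirRecur)

main :: IO ()
main = do
  -- parse plain FilePaths into the more precise Path Abs Dir type
  src <- parseAbsDir "/tmp/source-dir"
  dst <- parseAbsDir "/tmp/target-dir"
  copyDirRecur src dst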
There are also some functions for copying files and directories in the Cabal library itself, specifically in Distribution.Simple.Utils from the Cabal package. copyDirectoryRecursive is one, and there are other related functions near it in that module.
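A minimal sketch of that route, assuming the Cabal library is available as a dependency; the directory names are placeholders:
import Distribution.Simple.Utils (copyDirectoryRecursive)
import Distribution.Verbosity (normal)

main :: IO ()
main = copyDirectoryRecursive normal "sourcedir" "targetdir"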
More specifically, given an arbitrary package name, I need to retrieve, from inside a running Haskell program, the same library-dirs field that can be obtained with the ghc-pkg describe command.
Here's what I could come up with by peeking into the ghc-pkg source code.
The getPkgInfos function returns the package definitions for all installed packages (hopefully including user-installed packages). With this in your hands, you can retrieve the library directories and other package information. See the documentation for details.
The GHC_PKGCONF variable needs to point to the global package config file for systems where it isn't located at the usual place. ghc-pkg solves this problem by receiving a command line flag via a wrapper script in Ubuntu, for instance.
import qualified Config
import qualified System.Info
import Data.List
import Distribution.InstalledPackageInfo
import GHC.Paths
import System.Directory
import System.Environment
import System.FilePath
import System.IO.Error
getPkgInfos :: IO [InstalledPackageInfo]
getPkgInfos = do
global_conf <-
catch (getEnv "GHC_PKGCONF")
(\err -> if isDoesNotExistError err
then do let dir = takeDirectory $ takeDirectory ghc_pkg
path1 = dir </> "package.conf"
path2 = dir </> ".." </> ".." </> ".."
</> "inplace-datadir"
</> "package.conf"
exists1 <- doesFileExist path1
exists2 <- doesFileExist path2
if exists1 then return path1
else if exists2 then return path2
else ioError $ userError "Can't find package.conf"
else ioError err)
let global_conf_dir = global_conf ++ ".d"
global_conf_dir_exists <- doesDirectoryExist global_conf_dir
global_confs <-
if global_conf_dir_exists
then do files <- getDirectoryContents global_conf_dir
return [ global_conf_dir ++ '/' : file
| file <- files
, isSuffixOf ".conf" file]
else return []
user_conf <-
try (getAppUserDataDirectory "ghc") >>= either
(\_ -> return [])
(\appdir -> do
let subdir = currentArch ++ '-':currentOS ++ '-':ghcVersion
user_conf = appdir </> subdir </> "package.conf"
user_exists <- doesFileExist user_conf
return (if user_exists then [user_conf] else []))
let pkg_dbs = user_conf ++ global_confs ++ [global_conf]
return.concat =<< mapM ((>>= return.read).readFile) pkg_dbs
currentArch = System.Info.arch
currentOS = System.Info.os
ghcVersion = Config.cProjectVersion
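As a hypothetical usage sketch for getPkgInfos (not part of the original code, assuming the same Cabal-era API as above plus an extra import of Distribution.Package), you could look up the library-dirs of a named package like this:
-- extra import needed: import Distribution.Package (PackageName (..), pkgName)
lookupLibraryDirs :: String -> IO [[FilePath]]
lookupLibraryDirs name = do
  infos <- getPkgInfos
  -- keep only the packages whose name matches, then read their library-dirs
  return [ libraryDirs info
         | info <- infos
         , pkgName (package info) == PackageName name ]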
I wrote this code myself, but it was largely inspired by ghc-pkg (with some pieces copied verbatim). The original code was licensed under a BSD-style license; I think this can be distributed under the cc-wiki license that all Stack Overflow content is under, but I'm not really sure. Anyway, as with anything else, I did some initial testing and it seems to work, but use it at your own risk.
The format of the installed packages database is Distribution.InstalledPackageInfo.
import Distribution.InstalledPackageInfo
import Distribution.Package
import Distribution.Text
import GHC.Paths
import System
import System.FilePath
main = do
name:_ <- getArgs
packages <- fmap read $ readFile $ joinPath [libdir, "package.conf"]
let matches = filter ((PackageName name ==) . pkgName . package) packages
mapM_ (print . libraryDirs) (matches :: [InstalledPackageInfo_ String])
This doesn't obey the user's package configuration, but should be a start.
Ask Duncan Coutts on the haskell-cafe or cabal mailing lists. (I'm serious. That is a better forum for Cabal questions than Stack Overflow.)
Sometimes you just have to point people at a different forum.
If you're using cabal to configure and build your program/library, you can use the autogenerated Paths_* module.
For example, if you have a foo.cabal file, cabal will generate a Paths_foo module (see its source under dist/build/autogen) which you can import. This module exports a function getLibDir :: IO FilePath which has the value you're looking for.
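A minimal sketch of that, reusing the hypothetical package name foo from above (so Cabal generates Paths_foo; remember to list the module under other-modules in the .cabal file):
import Paths_foo (getLibDir)

main :: IO ()
main = do
  -- print the library directory Cabal configured for this package
  libDir <- getLibDir
  putStrLn libDir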