Unsafe IO Or: Haskeline and Directories - haskell

DISCLAIMER: I am somewhat new to Haskell.
I am writing an interpreter, or, in this context, a REPL. For that purpose I am using haskeline, which is nice for REPLs. It has the capability of storing the command line history within a file, which is also nice.
One problem I came across while working with it, though, is that it does not seem to expand "~" to the home directory, which means that I have to retrieve the home directory manually.
I could do it like this (and currently do):
-- | returns a fresh settings variable
addSettings :: Env -> Settings IO
addSettings env = Settings { historyFile = Just getDir
, complete = completeWord Nothing " \t" $
return . completionSearch env
, autoAddHistory = True
}
where
getDir :: FilePath
getDir = unsafePerformIO getHomeDirectory ++ "/.zepto_history"
But that uses unsafePerformIO, which makes me cringe. Do you know of a good and clean workaround that does not involve rewriting the whole function? This can be a haskeline feature I do not know of or something I just did not see.
Telling me there is no way around rewriting and rethinking it all is fine, too.
EDIT:
I know unsafePerformIO is bad, that's why it makes me cringe. If you are new to Haskell and reading this question right now: Just pretend it is not there.

A better approach would be to generate the Settings object inside IO, instead of the other way around, so to speak:
addSettings :: Env -> IO (Settings IO)
addSettings = do
getDir <- fmap (++ "/.zepto_history") getHomeDirectory
return $ Settings
{ historyFile = Just getDir
, complete = completeWord Nothing " \t" $ return . completionSearch env
, autoAddHistory = True
}
This will no doubt require some changes in your current software, but this would be considered the "right" way to go about this.

Related

Recursively search directories for all files matching name criteria in Haskell

I'm relatively inexperienced in Haskell and I wanted to improve, so for a learning project of mine I have the following requirements:
I want to search starting from a specified top directory, not necessarily an absolute path.
I want to find all files of a given extension, say .md.
I want to not search hidden directories, say toplevel/.excluded.
I want to be able to ignore hidden files like gedit produces .filename.md.swp.
I want to end up with a complete list of files as the result of my function.
I searched all over SO. Here's what I have so far:
import qualified System.FilePath.Find as SFF
import qualified Filesystem.Path.CurrentOS as FP
srcFolderName = "src"
outFolderName = "output"
resFolderName = "res"
ffNotHidden :: SFF.FindClause Bool
ffNotHidden = SFF.fileName SFF./~? ".?*"
ffIsMD :: SFF.FindClause Bool
ffIsMD = SFF.extension SFF.==? ".md" SFF.&&? SFF.fileName SFF./~? ".?*"
findMarkdownSources :: FilePath -> IO [FilePath]
findMarkdownSources filePath = do
paths <- SFF.find ffNotHidden ffIsMD filePath
return paths
This doesn't work. printf-style debugging in "findMarkdownSources", I can verify that filePath is correct e.g. "/home/user/testdata" (print includes the ", in case that tells you something). The list paths is always empty. I'm absolutely certain there are markdown files in the directory I have specified (find /path/to/dir -name "*.md" finds them).
I therefore have some specific questions.
Is there a reason (filters incorrect) for example, why this code should not work?
There are a number of ways to do this in haskell. It seems there are at least six packages (fileman, system.directory, system.filepath.find) dedicated to this. Here's some questions where something like this is answered:
Streaming recursive descent of a directory in Haskell
Is there some directory walker in Haskell?
avoid recursion into specifc folder using filemanip
Each one has about three unique ways to achieve what I want to achieve, so, we're nearly at 10 ways to do it...
Is there a specific way I should be doing this? If so why? If it helps, once I have my file list, I'm going to walk the entire thing, open and parse each file.
If it helps, I'm reasonably comfortable with basic haskell, but you'll need to slow down if we start getting too heavy with monads and applicative functors (I don't use haskell enough for this to stay in my head). I find the haskell docs on hackage incomprehensible, though.
so, we're nearly at 10 ways to do it...
Here's yet another way to do it, using functions from the directory, filepath and extra packages, but not too much monad wizardry:
import Control.Monad (foldM)
import System.Directory (doesDirectoryExist, listDirectory) -- from "directory"
import System.FilePath ((</>), FilePath) -- from "filepath"
import Control.Monad.Extra (partitionM) -- from the "extra" package
traverseDir :: (FilePath -> Bool) -> (b -> FilePath -> IO b) -> b -> FilePath -> IO b
traverseDir validDir transition =
let go state dirPath =
do names <- listDirectory dirPath
let paths = map (dirPath </>) names
(dirPaths, filePaths) <- partitionM doesDirectoryExist paths
state' <- foldM transition state filePaths -- process current dir
foldM go state' (filter validDir dirPaths) -- process subdirs
in go
The idea is that the user passes a FilePath -> Bool function to filter unwanted directories; also an initial state b and a transition function b -> FilePath -> IO b that processes file names, updates the b state and possibly has some side effects. Notice that the type of the state is chosen by the caller, who might put useful things there.
If we only want to print file names as they are produced, we can do something like this:
traverseDir (\_ -> True) (\() path -> print path) () "/tmp/somedir"
We are using () as a dummy state because we don't really need it here.
If we want to accumulate the files into a list, we can do it like this:
traverseDir (\_ -> True) (\fs f -> pure (f : fs)) [] "/tmp/somedir"
And what if we want to filter some files? We would need to tweak the transition function we pass to traverseDir so that it ignores them.
I tested you code on my machine, and it seems to work fine. Here is some example data:
$ find test/data
test/data
test/data/look-a-md-file.md
test/data/another-dir
test/data/another-dir/shown.md
test/data/.not-shown.md
test/data/also-not-shown.md.bkp
test/data/.hidden
test/data/some-dir
test/data/some-dir/shown.md
test/data/some-dir/.ahother-hidden
test/data/some-dir/.ahother-hidden/im-hidden.md
Running your function will result in:
ghci> findMarkdownSources "test"
["test/data/another-dir/shown.md","test/data/look-a-md-file.md","test/data/some-dir/shown.md"]
I've tested this with an absolute path, and it also works. Are you sure you have passed a valid path? You'll get an empty list if that is the case (although you also get a warning).
Note that your code could be simplified as follows:
module Traversals.FileManip where
import Data.List (isPrefixOf)
import System.FilePath.Find (always, extension, fileName, find, (&&?),
(/~?), (==?))
findMdSources :: FilePath -> IO [FilePath]
findMdSources fp = find isVisible (isMdFile &&? isVisible) fp
where
isMdFile = extension ==? ".md"
isVisible = fileName /~? ".?*"
And you can even remove the fp parameter, but I'm leaving it here for the sake of clarity.
I prefer to import explicitly so that I know where each function comes from (since I don't know of any Haskell IDE with advanced symbol navigation).
However, note that this solution uses uses unsafe interleave IO, which is not recommended.
So regarding your questions 2 and 3, I would recommend a streaming solution, like pipes or conduits. Sticking to these kind of solutions will reduce your options (just like sticking to pure functional programming languages reduced my options for programming languages ;)). Here you have an example on how pipes can be used to walk a directory.
Here is the code in case you want to try this out.

Is there a (Template) Haskell library that would allow me to print/dump a few local bindings with their respective names?

For instance:
let x = 1 in putStrLn [dump|x, x+1|]
would print something like
x=1, (x+1)=2
And even if there isn't anything like this currently, would it be possible to write something similar?
TL;DR There is this package which contains a complete solution.
install it via cabal install dump
and/or
read the source code
Example usage:
{-# LANGUAGE QuasiQuotes #-}
import Debug.Dump
main = print [d|a, a+1, map (+a) [1..3]|]
where a = 2
which prints:
(a) = 2 (a+1) = 3 (map (+a) [1..3]) = [3,4,5]
by turnint this String
"a, a+1, map (+a) [1..3]"
into this expression
( "(a) = " ++ show (a) ++ "\t " ++
"(a+1) = " ++ show (a + 1) ++ "\t " ++
"(map (+a) [1..3]) = " ++ show (map (+ a) [1 .. 3])
)
Background
Basically, I found that there are two ways to solve this problem:
Exp -> String The bottleneck here is pretty-printing haskell source code from Exp and cumbersome syntax upon usage.
String -> Exp The bottleneck here is parsing haskell to Exp.
Exp -> String
I started out with what #kqr put together, and tried to write a parser to turn this
["GHC.Classes.not x_1627412787 = False","x_1627412787 = True","x_1627412787 GHC.Classes.== GHC.Types.True = True"]
into this
["not x = False","x = True","x == True = True"]
But after trying for a day, my parsec-debugging-skills have proven insufficient to date, so instead I went with a simple regular expression:
simplify :: String -> String
simplify s = subRegex (mkRegex "_[0-9]+|([a-zA-Z]+\\.)+") s ""
For most cases, the output is greatly improved.
However, I suspect this to likely mistakenly remove things it shouldn't.
For example:
$(dump [|(elem 'a' "a.b.c", True)|])
Would likely return:
["elem 'a' \"c\" = True","True = True"]
But this could be solved with proper parsing.
Here is the version that works with the regex-aided simplification: https://github.com/Wizek/kqr-stackoverflow/blob/master/Th.hs
Here is a list of downsides / unresolved issues I've found with the Exp -> String solution:
As far as I know, not using Quasi Quotation requires cumbersome syntax upon usage, like: $(d [|(a, b)|]) -- as opposed to the more succinct [d|a, b|]. If you know a way to simplify this, please do tell!
As far as I know, [||] needs to contain fully valid Haskell, which pretty much necessitates the use of a tuple inside further exacerbating the syntactic situation. There is some upside to this too, however: at least we don't need to scratch our had where to split the expressions since GHC does that for us.
For some reason, the tuple only seemed to accept Booleans. Weird, I suspect this should be possible to fix somehow.
Pretty pretty-printing Exp is not very straight-forward. A more complete solution does require a parser after all.
Printing an AST scrubs the original formatting for a more uniform looks. I hoped to preserve the expressions letter-by-letter in the output.
The deal-breaker was the syntactic over-head. I knew I could get to a simpler solution like [d|a, a+1|] because I have seen that API provided in other packages. I was trying to remember where I saw that syntax. What is the name...?
String -> Exp
Quasi Quotation is the name, I remember!
I remembered seeing packages with heredocs and interpolated strings, like:
string = [qq|The quick {"brown"} $f {"jumps " ++ o} the $num ...|]
where f = "fox"; o = "over"; num = 3
Which, as far as I knew, during compile-time, turns into
string = "The quick " ++ "brown" ++ " " ++ $f ++ "jumps " ++ o ++ " the" ++ show num ++ " ..."
where f = "fox"; o = "over"; num = 3
And I thought to myself: if they can do it, I should be able to do it too!
A bit of digging in their source code revealed the QuasiQuoter type.
data QuasiQuoter = QuasiQuoter {quoteExp :: String -> Q Exp}
Bingo, this is what I want! Give me the source code as string! Ideally, I wouldn't mind returning string either, but maybe this will work. At this point I still know quite little about Q Exp.
After all, in theory, I would just need to split the string on commas, map over it, duplicate the elements so that first part stays string and the second part becomes Haskell source code, which is passed to show.
Turning this:
[d|a+1|]
into this:
"a+1" ++ " = " ++ show (a+1)
Sounds easy, right?
Well, it turns out that even though GHC most obviously is capable to parse haskell source code, it doesn't expose that function. Or not in any way we know of.
I find it strange that we need a third-party package (which thankfully there is at least one called haskell-src-meta) to parse haskell source code for meta programming. Looks to me such an obvious duplication of logic, and potential source of mismatch -- resulting in bugs.
Reluctantly, I started looking into it. After all, if it is good enough for the interpolated-string folks (those packaged did rely on haskell-src-meta) then maybe it will work okay for me too for the time being.
And alas, it does contain the desired function:
Language.Haskell.Meta.Parse.parseExp :: String -> Either String Exp
Language.Haskell.Meta.Parse
From this point it was rather straightforward, except for splitting on commas.
Right now, I do a very simple split on all commas, but that doesn't account for this case:
[d|(1, 2), 3|]
Which fails unfortunatelly. To handle this, I begun writing a parsec parser (again) which turned out to be more difficult than anticipated (again). At this point, I am open to suggestions. Maybe you know of a simple parser that handles the different edge-cases? If so, tell me in a comment, please! I plan on resolving this issue with or without parsec.
But for the most use-cases: it works.
Update at 2015-06-20
Version 0.2.1 and later correctly parses expressions even if they contain commas inside them. Meaning [d|(1, 2), 3|] and similar expressions are now supported.
You can
install it via cabal install dump
and/or
read the source code
Conclusion
During the last week I've learnt quite a bit of Template Haskell and QuasiQuotation, cabal sandboxes, publishing a package to hackage, building haddock docs and publishing them, and some things about Haskell too.
It's been fun.
And perhaps most importantly, I now am able to use this tool for debugging and development, the absence of which has been bugging me for some time. Peace at last.
Thank you #kqr, your engagement with my original question and attempt at solving it gave me enough spark and motivation to continue writing up a full solution.
I've actually almost solved the problem now. Not exactly what you imagined, but fairly close. Maybe someone else can use this as a basis for a better version. Either way, with
{-# LANGUAGE TemplateHaskell, LambdaCase #-}
import Language.Haskell.TH
dump :: ExpQ -> ExpQ
dump tuple =
listE . map dumpExpr . getElems =<< tuple
where
getElems = \case { TupE xs -> xs; _ -> error "not a tuple in splice!" }
dumpExpr exp = [| $(litE (stringL (pprint exp))) ++ " = " ++ show $(return exp)|]
you get the ability to do something like
λ> let x = True
λ> print $(dump [|(not x, x, x == True)|])
["GHC.Classes.not x_1627412787 = False","x_1627412787 = True","x_1627412787 GHC.Classes.== GHC.Types.True = True"]
which is almost what you wanted. As you see, it's a problem that the pprint function includes module prefixes and such, which makes the result... less than ideally readable. I don't yet know of a fix for that, but other than that I think it is fairly usable.
It's a bit syntactically heavy, but that is because it's using the regular [| quote syntax in Haskell. If one wanted to write their own quasiquoter, as you suggest, I'm pretty sure one would also have to re-implement parsing Haskell, which would suck a bit.

Storing metadata for rules

I have a function that rebuilds a target whenever the associated command changes:
target :: FilePath -> [FilePath] -> String -> Rules ()
target dst deps cline = do
let dcmd = dst <.> "x"
dcmd %> \out -> do
alwaysRerun
writeFileChanged out cline
return ()
dst %> \out -> do
c <- readFile' dcmd
need deps
() <- cmd $ "../dumpdeps/dumpdeps " ++ out ++ " " ++ c
needMakefileDependencies $ out <.> "d"
return ()
I would prefer not to touch the filesystem for this task, is there any way to store the associated command line and trigger the final rule when that command changes?
I'd personally do this using the file system. File systems have great debuggers (ls and cat) and a large array of manipulation tools (echo, rm, touch). They also tend to be very fast for small files, living mostly in the cache. If you avoid the files, you tend to have less visibility of what's happening.
That said, there can be very good reasons to avoid the files. The closest thing Shake has to your pattern above is to use an Oracle. Note that doesn't quite match the pattern you are doing, as it assumes the cline can be computed from dst, which might be possible in your case or might not.
If Oracle doesn't suit you, you can define your own Rule instance which can operate much the same way as a file, but store the values in the Shake database instead.
(If either of those cause any difficulties, let me know which is the relevant one to your question, and I'll expand the right one.)

How to override function in Codec.Archive.Tar

Haskell noob here. I have a question specifically regarding how to use an existing library that may lead to some more fundamental aspects of the proper use of Haskell.
I'm learning Haskell and have a small project in mind to work on while I learn. The script will need to find all the tarballs in a given directory and unpack them in parallel. At this point, I'm working on the basic functionality of unpacking. So, using the Codec.Archive.Tar package, how can I override its behavior regarding tarballs with fully qualified paths?
Here's some example code:
module Main where
import qualified Codec.Archive.Tar as Tar
import qualified Codec.Compression.GZip as GZip
import Control.Monad (liftM, unless)
import qualified Data.ByteString.Lazy as BS
import System.Directory (doesDirectoryExist, getDirectoryContents)
import System.Exit (exitWith, ExitCode(..))
import System.FilePath.Posix (takeExtension)
searchPath = "/home/someuser/tarball/dir"
exit = exitWith ExitSuccess
die = exitWith (ExitFailure 1)
processFile :: String -> IO ()
processFile file = do
putStrLn $ "Unpacking " ++ file ++ " to " ++ searchPath
Tar.unpack searchPath . Tar.read . GZip.decompress =<< BS.readFile filePath
where filePath = searchPath ++ "/" ++ file
main = do
dirExists <- doesDirectoryExist searchPath
unless dirExists $ (putStrLn $ "Error: Search path not found: " ++ searchPath) >> die
files <- targetFiles `liftM` getDirectoryContents searchPath
mapM_ processFile files
exit
where targetFiles = filter (\f -> f /= "." && f /= ".." && takeExtension f == ".tgz")
When I run this in a directory with tarballs that were packed with:
tar czvPf myfile.tgz /tarball_testing/myfile
I get the following output:
Unpacking myfile.tgz to /tarball_testing
unpacker.hs: Absolute file name in tar archive: "/tarball_testing/myfile"
The second line is the issue. Reading the docs for Codec.Archive.Tar I don't see a way to disable this functionality (not interested in discussions of why I want to use full paths in tarballs, or the relative security implications of doing so).
The first thing that comes to mind is that I somehow need to override the function but that doesn't "feel" like the way a pro Haskeller would do it. Can I get a pointer in the right direction?
You cannot monkey patch or otherwise override a function from a Haskell module, and therefore no workaround will let you avoid the safety measures of the library. What you can do, however, is use the functionality in Codec.Archive.Tar to modify the tar entry paths before unpacking so that they won't be absolute any more. Specifically, there is a mapEntriesNoFail function with type
mapEntriesNoFail :: (Entry -> Entry) -> Entries e -> Entries e
Entries is the type of the argument to Tar.unpack, while Entry is the type of an individual entry. Thanks to mapEntriesNoFail, our problem becomes writing an Entry -> Entry function to adjust the paths. For that, first we will need some extra imports:
import qualified Codec.Archive.Tar.Entry as Tar
import System.FilePath.Posix (takeExtension, dropDrive, hasTrailingPathSeparator)
import Data.Either (either)
The function can look like this:
dropDriveFromEntry :: Tar.Entry -> Tar.Entry
dropDriveFromEntry entry =
either (error "Resulting tar path is somehow too long")
(\tp -> entry { Tar.entryTarPath = tp })
drivelessTarPath
where
tarPath = Tar.entryTarPath entry
path = Tar.fromTarPath tarPath
toTarPath' p = Tar.toTarPath (hasTrailingPathSeparator p) p
drivelessTarPath = toTarPath' $ dropDrive path
This may seem a little long-winded; however, the hoops we jump through are there to ensure the resulting tar paths are sane. You can read about the gory details of tar handling on the Codec.Archive.Tar.Entry documentation. The key function in this definition is dropDrive, which makes an absolute path relative (in Linux, it strips the leading slash of an absolute path).
It is worth spending a few words on the use of either. toTarPath produces a value of type Either String TarPath to account for the possibility of failure. Specifically, the conversion to a tar path fails if the provided path is too long. In our case, however, the path cannot be too long, as it is a path which already was in a tar file, perhaps with a removed leading slash. That being so, it is good enough to eliminate the Either wrapping with either, passing an error instead of the function to handle the (impossible) Left case.
With dropDriveFromEntry in hand, we just have to map it over the entries before unpacking. The relevant line of your program would become:
Tar.unpack searchPath . Tar.mapEntriesNoFail dropDriveFromEntry
. Tar.read . GZip.decompress =<< BS.readFile filePath
Note that if there were relevant errors to be accounted for in dropDriveFromEntry, we would make it return Either String TarPath, and then use mapEntries instead of mapEntriesNoFail.
With these changes, the entry in your tar file will be extracted to /home/someuser/tarball/dir/tarball_testing/myfile. If that is not what you intended, you can modify dropDriveFromEntry so that it performs whatever extra path processing you need.
P.S.: Regarding the alternate title of your question, and considering the sensible little program you have shown us, I do not think you should be worried :)

Haskell: Need Enlightenment with Calculator program

I have an assignment which is to create a calculator program in Haskell. For example, users will be able to use the calculator by command lines like:
>var cola =5; //define a random variable
>cola*2+1;
(print 11)
>var pepsi = 10
>coca > pepsi;
(print false)
>def coke(x,y) = x+y; //define a random function
>coke(cola,pepsi);
(print 15)
//and actually it's more complicated than above
I have no clue how to program this in Haskell. All I can think of right now is to read the command line as a String, parse it into an array of tokens. Maybe go through the array, detect keywords such "var", "def" then call functions var, def which store variables/functions in a List or something like that. But then how do I store data so that I can use them later in my computation?
Also am I on the right track because I am actually very confused what to do next? :(
*In addition, I am not allowed to use Parsec!*
It looks like you have two distinct kinds of input: declarations (creating new variables and functions) and expressions (calculating things).
You should first define some data structures so you can work out what sort of things you are going to be dealing with. Something like:
data Command = Define Definition | Calculate Expression | Quit
type Name = String
data Definition = DefVar Name Expression | DefFunc Name [Name] Expression
-- ^ alternatively, implement variables as zero-argument functions
-- and merge these cases
data Expression = Var Name | Add Expression Expression | -- ... other stuff
type Environment = [Definition]
To start off with, just parse (tokenise and then parse the tokens, perhaps) the stuff into a Command, and then decide what to do with it.
Expressions are comparatively easy. You assume you already have all the definitions you need (an Environment) and then just look up any variables or do additions or whatever.
Definitions are a bit trickier. Once you've decided what new definition to make, you need to add it to the environment. How exactly you do this depends on how exactly you iterate through the lines, but you'll need to pass the new environment back from the interpreter to the thing which fetches the next line and runs the interpreter on it. Something like:
main :: IO ()
main = mainLoop emptyEnv
where
emptyEnv = []
mainLoop :: Environment -> IO ()
mainLoop env = do
str <- getLine
case parseCommnad str of
Nothing -> do
putStrLn "parse failed!"
mainLoop env
Just Quit -> do
return ()
Just (Define d) -> do
mainLoop (d : env)
Just (Calculate e) -> do
putStrLn (calc env e)
mainLoop env
-- the real meat:
parseCommand :: String -> Maybe Command
calc :: Environment -> Expression -> String -- or Integer or some other appropriate type
calc will need to look stuff up in the environment you create as you go along, so you'll probably also need a function for finding which Definition corresponds to a given Name (or complaining that there isn't one).
Some other decisions you should make:
What do I do when someone tries to redefine a variable?
What if I used one of those variables in the definition of a function? Do I evaluate a function definition when it is created or when it is used?
These questions may affect the design of the above program, but I'll leave it up to you to work out how.
First, you can learn a lot from this tutorial for haskell programming
You need to write your function in another doc with .hs
And you can load the file from you compiler and use all the function you create
For example
plus :: Int -> Int -- that mean the function just work with a number of type int and return Int
plus x y = x + y -- they receive x and y and do the operation

Resources