Flexible number of arguments to haskell program - haskell

I am using the System.FilePath.Find module of filemanip to recursively find all files I need to process (here I will just print to the console as the action to perform, so as not to confuse things). Now, this code:
import System.Environment (getArgs)
import System.FilePath (FilePath)
import System.Directory (doesDirectoryExist, getDirectoryContents, doesFileExist)
import Control.Monad
import System.FilePath.Find (find, always, fileType, (==?), FileType(..), (&&?), extension)
main = do
    [dbFile, input] <- getArgs
    files <- findFiles input
    mapM_ putStrLn files
    return ()
searchExtension :: String
searchExtension = ".hs"
findFiles :: FilePath -> IO [String]
findFiles = find always (fileType ==? RegularFile &&? extension ==? searchExtension)
works well with this call
./myprog tet .
In this case, the first argument is ignored (it will be the output database file later) and the second argument is searched recursively for matching files. It also allows me to specify just a single file, which is just perfect!
BUT, I would like to be able to specify
./myprog tet path1 path2 path4 file1
but this of course fails in the pattern matching:
./myprog tet . .
myprogt: user error (Pattern match failure in do expression at myprog.hs:11:9-22)
Now, how do I make this program more flexible, so that I can take more than two arguments?
Sorry for asking this, actually, but my Haskell knowledge is limited but increasing for every new thing I have to do in my first project.

Well, you can use a different pattern like:
(dbFile:inputs) <- getArgs
where dbFile will match the first argument passed, while inputs will match any number of file names (even 0; if you want at least one path name, use inputs@(_:_) instead of the simple inputs).
Then you can use mapM to call findFiles for each path in inputs:
files <- mapM findFiles inputs
mapM_ putStrLn $ concat files
Instead of mapM you could modify findFiles to accept a [FilePath] argument instead of a simple FilePath.
Note that for parsing command-line arguments you could consider using a module such as System.Console.GetOpt. You should also read this page about argument handling.
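A failed pattern match in a do block only produces the terse "Pattern match failure" error shown in the question, so one option is to match with case and print a usage message instead. The following is a minimal sketch using only base; splitArgs is a hypothetical helper name, not something from the question's code, and the file search itself is left out.

```haskell
import System.Environment (getArgs, getProgName)
import System.Exit (exitFailure)
import System.IO (hPutStrLn, stderr)

-- Hypothetical helper: split the argument list into the database file
-- and a non-empty list of input paths. Any other shape yields Nothing.
splitArgs :: [String] -> Maybe (FilePath, [FilePath])
splitArgs (dbFile : inputs@(_ : _)) = Just (dbFile, inputs)
splitArgs _                         = Nothing

main :: IO ()
main = do
  args <- getArgs
  case splitArgs args of
    Just (dbFile, inputs) -> do
      putStrLn ("database file: " ++ dbFile)
      mapM_ (putStrLn . ("input: " ++)) inputs
    Nothing -> do
      -- Report the expected shape instead of failing inside the pattern match.
      prog <- getProgName
      hPutStrLn stderr ("usage: " ++ prog ++ " DBFILE PATH [PATH ...]")
      exitFailure
```

In the real program, the branch that prints the inputs would be the place to call mapM findFiles inputs.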

Related

main: Prelude.!!: index too large

I am trying to write a program that will copy a file and allow us to rename it, like a cp command in unix.
import System.IO
import System.Environment
import qualified Data.Text as T
import qualified Data.Text.IO as TI
main :: IO ()
main = do
    args <- getArgs
    let source = args !! 0
    let dest = args !! 1
    input <- TI.readFile source
    TI.writeFile dest input
I tried this and got main: Prelude.!!: index too large
As the error says, the index is too large, which means that getArgs returned a list with fewer than two elements.
Using (!!) is however often not a good idea: there is no guarantee that the element exists, and it runs in 𝓞(k) to obtain the k-th element. You can work with:
main :: IO ()
main = do
    args <- getArgs
    case args of
        (source:dest:_) -> do
            input <- TI.readFile source
            TI.writeFile dest input
        _ -> putStrLn "You should provide a source and destination"
Loading the entire content into memory is also not a good idea, since the file can be larger than the available memory, which can crash the program. Copying is often done through two file handles instead: one that reads the source file and one that writes to the target file, so that only a small amount of memory is used. Some file systems can also make copying more efficient, for example by using two references to the same file and only making an actual copy when one of the two files is modified.
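The handle-based copy described above could look roughly like this. It is a sketch under assumptions rather than a recommended implementation: copyChunked, pump and the 64 KiB chunk size are names and values chosen here for illustration, and it uses strict ByteString chunks (from the bytestring package that ships with GHC) instead of Text.

```haskell
import qualified Data.ByteString as BS
import System.Environment (getArgs)
import System.IO (Handle, IOMode (..), withBinaryFile)

-- Arbitrary chunk size for this sketch.
chunkSize :: Int
chunkSize = 64 * 1024

-- Copy src to dst through two handles, holding one chunk in memory at a time.
copyChunked :: FilePath -> FilePath -> IO ()
copyChunked src dst =
  withBinaryFile src ReadMode $ \hIn ->
    withBinaryFile dst WriteMode $ \hOut ->
      pump hIn hOut

-- Read a chunk and write it out; hGetSome returns an empty chunk at end of file.
pump :: Handle -> Handle -> IO ()
pump hIn hOut = do
  chunk <- BS.hGetSome hIn chunkSize
  if BS.null chunk
    then pure ()
    else BS.hPut hOut chunk >> pump hIn hOut

main :: IO ()
main = do
  args <- getArgs
  case args of
    (src : dst : _) -> copyChunked src dst
    _ -> putStrLn "You should provide a source and destination"
```

If the goal is only to copy a file, System.Directory.copyFile from the directory package already does that; the loop above is mainly useful for seeing how the two handles interact.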

Recursively search directories for all files matching name criteria in Haskell

I'm relatively inexperienced in Haskell and I wanted to improve, so for a learning project of mine I have the following requirements:
I want to search starting from a specified top directory, not necessarily an absolute path.
I want to find all files of a given extension, say .md.
I want to not search hidden directories, say toplevel/.excluded.
I want to be able to ignore hidden files, like the .filename.md.swp files that gedit produces.
I want to end up with a complete list of files as the result of my function.
I searched all over SO. Here's what I have so far:
import qualified System.FilePath.Find as SFF
import qualified Filesystem.Path.CurrentOS as FP
srcFolderName = "src"
outFolderName = "output"
resFolderName = "res"
ffNotHidden :: SFF.FindClause Bool
ffNotHidden = SFF.fileName SFF./~? ".?*"
ffIsMD :: SFF.FindClause Bool
ffIsMD = SFF.extension SFF.==? ".md" SFF.&&? SFF.fileName SFF./~? ".?*"
findMarkdownSources :: FilePath -> IO [FilePath]
findMarkdownSources filePath = do
    paths <- SFF.find ffNotHidden ffIsMD filePath
    return paths
This doesn't work. With printf-style debugging in "findMarkdownSources", I can verify that filePath is correct, e.g. "/home/user/testdata" (the print includes the ", in case that tells you something). The list paths is always empty. I'm absolutely certain there are markdown files in the directory I have specified (find /path/to/dir -name "*.md" finds them).
I therefore have some specific questions.
Is there a reason (filters incorrect) for example, why this code should not work?
There are a number of ways to do this in haskell. It seems there are at least six packages (fileman, system.directory, system.filepath.find) dedicated to this. Here's some questions where something like this is answered:
Streaming recursive descent of a directory in Haskell
Is there some directory walker in Haskell?
avoid recursion into specifc folder using filemanip
Each one has about three unique ways to achieve what I want to achieve, so, we're nearly at 10 ways to do it...
Is there a specific way I should be doing this? If so why? If it helps, once I have my file list, I'm going to walk the entire thing, open and parse each file.
If it helps, I'm reasonably comfortable with basic haskell, but you'll need to slow down if we start getting too heavy with monads and applicative functors (I don't use haskell enough for this to stay in my head). I find the haskell docs on hackage incomprehensible, though.
so, we're nearly at 10 ways to do it...
Here's yet another way to do it, using functions from the directory, filepath and extra packages, but not too much monad wizardry:
import Control.Monad (foldM)
import System.Directory (doesDirectoryExist, listDirectory) -- from "directory"
import System.FilePath ((</>), FilePath) -- from "filepath"
import Control.Monad.Extra (partitionM) -- from the "extra" package
traverseDir :: (FilePath -> Bool) -> (b -> FilePath -> IO b) -> b -> FilePath -> IO b
traverseDir validDir transition =
    let go state dirPath =
            do names <- listDirectory dirPath
               let paths = map (dirPath </>) names
               (dirPaths, filePaths) <- partitionM doesDirectoryExist paths
               state' <- foldM transition state filePaths -- process current dir
               foldM go state' (filter validDir dirPaths) -- process subdirs
     in go
The idea is that the user passes a FilePath -> Bool function to filter unwanted directories; also an initial state b and a transition function b -> FilePath -> IO b that processes file names, updates the b state and possibly has some side effects. Notice that the type of the state is chosen by the caller, who might put useful things there.
If we only want to print file names as they are produced, we can do something like this:
traverseDir (\_ -> True) (\() path -> print path) () "/tmp/somedir"
We are using () as a dummy state because we don't really need it here.
If we want to accumulate the files into a list, we can do it like this:
traverseDir (\_ -> True) (\fs f -> pure (f : fs)) [] "/tmp/somedir"
And what if we want to filter some files? We would need to tweak the transition function we pass to traverseDir so that it ignores them.
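One way to tweak the transition function is sketched below. The names isHidden, visibleDir and collectMarkdown are hypothetical and chosen to match the question's requirements (skip hidden entries, keep only .md files). To keep the block runnable with only base and filepath, main folds over a fixed list of sample paths as a stand-in for the real traversal.

```haskell
import Control.Monad (foldM)
import Data.List (isPrefixOf)
import System.FilePath (takeExtension, takeFileName)

-- True for entries whose last path component starts with a dot,
-- such as ".excluded" or ".filename.md.swp".
isHidden :: FilePath -> Bool
isHidden path = "." `isPrefixOf` takeFileName path

-- Directory predicate, suitable as the first argument of traverseDir.
visibleDir :: FilePath -> Bool
visibleDir = not . isHidden

-- Transition function: prepend visible .md files to the accumulator
-- and return the accumulator unchanged for everything else.
collectMarkdown :: [FilePath] -> FilePath -> IO [FilePath]
collectMarkdown acc path
  | not (isHidden path) && takeExtension path == ".md" = pure (path : acc)
  | otherwise = pure acc

main :: IO ()
main = do
  -- Stand-in for the directory walk: fold over sample paths.
  found <- foldM collectMarkdown [] ["docs/a.md", "docs/.b.md", "docs/c.txt", "docs/.a.md.swp"]
  mapM_ putStrLn (reverse found)
```

With the traverseDir defined above, the equivalent call would be traverseDir visibleDir collectMarkdown [] "/tmp/somedir".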
I tested your code on my machine, and it seems to work fine. Here is some example data:
$ find test/data
test/data
test/data/look-a-md-file.md
test/data/another-dir
test/data/another-dir/shown.md
test/data/.not-shown.md
test/data/also-not-shown.md.bkp
test/data/.hidden
test/data/some-dir
test/data/some-dir/shown.md
test/data/some-dir/.ahother-hidden
test/data/some-dir/.ahother-hidden/im-hidden.md
Running your function will result in:
ghci> findMarkdownSources "test"
["test/data/another-dir/shown.md","test/data/look-a-md-file.md","test/data/some-dir/shown.md"]
I've tested this with an absolute path, and it also works. Are you sure you have passed a valid path? You'll get an empty list if that is not the case (although you also get a warning).
Note that your code could be simplified as follows:
module Traversals.FileManip where
import Data.List (isPrefixOf)
import System.FilePath.Find (always, extension, fileName, find, (&&?),
                             (/~?), (==?))
findMdSources :: FilePath -> IO [FilePath]
findMdSources fp = find isVisible (isMdFile &&? isVisible) fp
    where
      isMdFile = extension ==? ".md"
      isVisible = fileName /~? ".?*"
And you can even remove the fp parameter, but I'm leaving it here for the sake of clarity.
I prefer to import explicitly so that I know where each function comes from (since I don't know of any Haskell IDE with advanced symbol navigation).
However, note that this solution uses unsafeInterleaveIO, which is not recommended.
So regarding your questions 2 and 3, I would recommend a streaming solution, like pipes or conduits. Sticking to this kind of solution will reduce your options (just like sticking to pure functional programming languages reduced my options for programming languages ;)). Here you have an example of how pipes can be used to walk a directory.
Here is the code in case you want to try this out.

Haskell write a list in file and read later

I am trying to write a list into a file and later on I want to read the file contents into the list as well.
So I have a list like this ["ABC","DEF"]
I have tried things like
hPrint fileHandle listName
This just prints into file "["ABC","DEF"]"
I have tried unlines, but that prints "ABC\nDEF\n".
Now in both cases I can't read the contents back into a proper list. The output file has quotes, so when I read it I get ["["ABC","DEF"]""], i.e. a single string in a list.
As that did not succeed, I tried to write the list line by line by mapping a write function over the list: k = map (\x -> hPrint fileSLC x) fieldsBefore. It does not do anything; the file is blank. I think if I write every item on a separate line, I will be able to read it back later with (lines src).
I know whatever I am doing is wrong, but this is only the second time I am writing Haskell code; last time I just wrote a very small file-reading program. Moving from imperative to functional is not that easy. :(
Try using hPutStr and unlines instead of hPrint. hPrint internally calls show, which causes Strings to be quoted and escaped.
hPutStr fileHandle (unlines listName)
Alternatively, use mapM_ or forM_. A verbose example is:
forM_ listName $ \string ->
    hPutStrLn fileHandle string
This can be simplified ("eta-contracted", in lambda-calculus terminology) to
forM_ listName (hPutStrLn fileHandle)
As you have seen, when you read from a file, you get a String. In order to convert this String into a list, you will need to parse it.
For k = map (\x -> hPrint fileSLC x) fieldsBefore to work, you need to use mapM or mapM_ instead of map.
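To make the write-then-read round trip concrete, here is a small sketch of two options: one item per line (read back with lines), and the show format that hPrint produces (read back with read). The function and file names are made up for this example. The line-based format assumes no item contains a newline, and read throws an exception on malformed input (readMaybe from Text.Read is the safer variant).

```haskell
-- Option 1: one item per line; parsing back is just `lines`.
saveLines :: FilePath -> [String] -> IO ()
saveLines path = writeFile path . unlines

loadLines :: FilePath -> IO [String]
loadLines path = lines <$> readFile path

-- Option 2: the `show` representation (what hPrint writes); parse it with `read`.
saveShown :: FilePath -> [String] -> IO ()
saveShown path = writeFile path . show

loadShown :: FilePath -> IO [String]
loadShown path = read <$> readFile path

main :: IO ()
main = do
  let items = ["ABC", "DEF"]
  saveLines "items-lines.txt" items
  fromLines <- loadLines "items-lines.txt"
  print fromLines
  saveShown "items-shown.txt" items
  fromShown <- loadShown "items-shown.txt"
  print fromShown
```

Both print calls should show the original list again. Note that readFile is lazy, so writing to a file while its contents are still being read can fail; the sketch avoids this by using separate files and consuming each result before moving on.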

Adding the possibility to write a AST-file to my (rail-)compiler

I'm writing a Rail compiler (Rail is an esoteric language) in Haskell, and I have some problems with the main function of my main module.
1) I want my program to ask whether I want to run the whole compilation pipeline or simply stop after the lexer and write the AST (Abstract Syntax Tree) to a file, so that another compiler can work with it. Here is my program:
module Main (
main -- main function to run the program
)
where
-- imports --
import InterfaceDT as IDT
import qualified Testing as Test
import qualified Preprocessor as PreProc
import qualified Lexer
import qualified SyntacticalAnalysis as SynAna
import qualified SemanticalAnalysis as SemAna
import qualified IntermediateCode as InterCode
import qualified CodeOptimization as CodeOpt
import qualified Backend
-- functions --
main :: IO()
main = do putStr "Enter inputfile (path): "
          inputfile <- getLine
          input <- readFile inputfile
          putStr "Enter outputfile (path): "
          outputfile <- getLine
          input <- readFile inputfile
          putStr "Only create AST (True/False): "
          onlyAST <- getLine
          when (onlyAST=="True") do putStrLn "Building AST..."
                                    writeFile outputfile ((Lexer.process . PreProc.process) input)
          when (onlyAST=="False") do putStrLn ("Compiling "++inputfile++" to "++outputfile)
                                     writeFile outputfile ((Backend.process . CodeOpt.process . InterCode.process . SemAna.process . SynAna.process . Lexer.process . PreProc.process) input)
I get an error in Line 21 (input <- readFile inputfile) caused by the <-. Why?
How should I do it?
2) Next, I want to refactor the program so that I can call it from the terminal with parameters, like runhaskell Main(AST) (which should just create the AST) or runhaskell Main.hs (which should run the whole pipeline).
I hope for your help!
For your error in (1), your program doesn't look syntactically incorrect at line 21 to me. However an error at <- would happen if that line were indented differently from the previous one. I suspect that you are having an indentation error due to mixing tabs and spaces in a way that looks correct in your editor but disagrees with Haskell's interpretation of tabs. The simplest recommendation is to always use spaces and never tabs.
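You also have an extra copy of that line later, which you might want to remove. Note as well that when (cond) do ... needs a $ before the do (or the BlockArguments extension), and that when has to be imported from Control.Monad.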
I also suspect you may need to use hFlush stdout after your putStr calls for them to work as prompts.
For (2), I'd suggest using a library for proper command line argument and option parsing, such as System.Console.GetOpt which is included with GHC, or one of the fancier ones which you can find on Hackage.
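As a rough illustration of the System.Console.GetOpt route, the sketch below parses an optional flag plus input and output paths. The --ast flag, the Flag type and parseArgs are assumptions made for this example (the question does not fix a command-line syntax), and the compiler stages themselves are replaced by a message.

```haskell
import System.Console.GetOpt
  ( ArgDescr (NoArg), ArgOrder (Permute), OptDescr (Option), getOpt, usageInfo )
import System.Environment (getArgs)

-- Hypothetical flag type for this sketch.
data Flag = OnlyAST
  deriving (Eq, Show)

options :: [OptDescr Flag]
options =
  [ Option ['a'] ["ast"] (NoArg OnlyAST) "stop after the lexer and write the AST" ]

-- Either an error/usage message, or (only the AST?, input path, output path).
parseArgs :: [String] -> Either String (Bool, FilePath, FilePath)
parseArgs argv =
  case getOpt Permute options argv of
    (flags, [input, output], []) -> Right (OnlyAST `elem` flags, input, output)
    (_, _, errs) -> Left (concat errs ++ usageInfo header options)
  where
    header = "Usage: Main [--ast] INPUT OUTPUT"

main :: IO ()
main = do
  argv <- getArgs
  case parseArgs argv of
    Left msg -> putStr msg
    Right (onlyAST, input, output) ->
      -- Placeholder for the real pipeline: only report what would run.
      putStrLn ((if onlyAST then "AST only: " else "Full pipeline: ") ++ input ++ " -> " ++ output)
```

With this shape, runhaskell Main.hs --ast in.rail out.ast would select the AST-only branch, and leaving out --ast would select the full pipeline.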

How to override function in Codec.Archive.Tar

Haskell noob here. I have a question specifically regarding how to use an existing library that may lead to some more fundamental aspects of the proper use of Haskell.
I'm learning Haskell and have a small project in mind to work on while I learn. The script will need to find all the tarballs in a given directory and unpack them in parallel. At this point, I'm working on the basic functionality of unpacking. So, using the Codec.Archive.Tar package, how can I override its behavior regarding tarballs with fully qualified paths?
Here's some example code:
module Main where
import qualified Codec.Archive.Tar as Tar
import qualified Codec.Compression.GZip as GZip
import Control.Monad (liftM, unless)
import qualified Data.ByteString.Lazy as BS
import System.Directory (doesDirectoryExist, getDirectoryContents)
import System.Exit (exitWith, ExitCode(..))
import System.FilePath.Posix (takeExtension)
searchPath = "/home/someuser/tarball/dir"
exit = exitWith ExitSuccess
die = exitWith (ExitFailure 1)
processFile :: String -> IO ()
processFile file = do
    putStrLn $ "Unpacking " ++ file ++ " to " ++ searchPath
    Tar.unpack searchPath . Tar.read . GZip.decompress =<< BS.readFile filePath
  where filePath = searchPath ++ "/" ++ file
main = do
    dirExists <- doesDirectoryExist searchPath
    unless dirExists $ (putStrLn $ "Error: Search path not found: " ++ searchPath) >> die
    files <- targetFiles `liftM` getDirectoryContents searchPath
    mapM_ processFile files
    exit
  where targetFiles = filter (\f -> f /= "." && f /= ".." && takeExtension f == ".tgz")
When I run this in a directory with tarballs that were packed with:
tar czvPf myfile.tgz /tarball_testing/myfile
I get the following output:
Unpacking myfile.tgz to /tarball_testing
unpacker.hs: Absolute file name in tar archive: "/tarball_testing/myfile"
The second line is the issue. Reading the docs for Codec.Archive.Tar I don't see a way to disable this functionality (not interested in discussions of why I want to use full paths in tarballs, or the relative security implications of doing so).
The first thing that comes to mind is that I somehow need to override the function but that doesn't "feel" like the way a pro Haskeller would do it. Can I get a pointer in the right direction?
You cannot monkey patch or otherwise override a function from a Haskell module, and therefore no workaround will let you avoid the safety measures of the library. What you can do, however, is use the functionality in Codec.Archive.Tar to modify the tar entry paths before unpacking so that they won't be absolute any more. Specifically, there is a mapEntriesNoFail function with type
mapEntriesNoFail :: (Entry -> Entry) -> Entries e -> Entries e
Entries is the type of the argument to Tar.unpack, while Entry is the type of an individual entry. Thanks to mapEntriesNoFail, our problem becomes writing an Entry -> Entry function to adjust the paths. For that, first we will need some extra imports:
import qualified Codec.Archive.Tar.Entry as Tar
import System.FilePath.Posix (takeExtension, dropDrive, hasTrailingPathSeparator)
import Data.Either (either)
The function can look like this:
dropDriveFromEntry :: Tar.Entry -> Tar.Entry
dropDriveFromEntry entry =
    either (error "Resulting tar path is somehow too long")
        (\tp -> entry { Tar.entryTarPath = tp })
        drivelessTarPath
    where
        tarPath = Tar.entryTarPath entry
        path = Tar.fromTarPath tarPath
        toTarPath' p = Tar.toTarPath (hasTrailingPathSeparator p) p
        drivelessTarPath = toTarPath' $ dropDrive path
This may seem a little long-winded; however, the hoops we jump through are there to ensure the resulting tar paths are sane. You can read about the gory details of tar handling on the Codec.Archive.Tar.Entry documentation. The key function in this definition is dropDrive, which makes an absolute path relative (in Linux, it strips the leading slash of an absolute path).
It is worth spending a few words on the use of either. toTarPath produces a value of type Either String TarPath to account for the possibility of failure. Specifically, the conversion to a tar path fails if the provided path is too long. In our case, however, the path cannot be too long, as it is a path which already was in a tar file, perhaps with a removed leading slash. That being so, it is good enough to eliminate the Either wrapping with either, passing an error instead of the function to handle the (impossible) Left case.
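The two ingredients, dropDrive and either, can be tried in isolation without the tar package. In the sketch below, checkLength is a hypothetical stand-in for Tar.toTarPath, and its 255-character limit is an arbitrary value chosen for illustration, not the tar format's actual rule.

```haskell
import System.FilePath.Posix (dropDrive, hasTrailingPathSeparator)

-- Hypothetical stand-in for Tar.toTarPath: reject over-long paths.
-- The limit is arbitrary and only serves the example.
checkLength :: FilePath -> Either String FilePath
checkLength p
  | length p > 255 = Left "path too long"
  | otherwise      = Right p

-- Strip the leading slash, then unwrap the Either, treating Left as a bug
-- in the same way dropDriveFromEntry does.
relativize :: FilePath -> FilePath
relativize path = either error id (checkLength (dropDrive path))

main :: IO ()
main = do
  putStrLn (relativize "/tarball_testing/myfile")
  print (hasTrailingPathSeparator "/tarball_testing/")
```

Here either error id plays the same role as in dropDriveFromEntry: the Left handler is only reached if the assumption that the path fits is violated.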
With dropDriveFromEntry in hand, we just have to map it over the entries before unpacking. The relevant line of your program would become:
Tar.unpack searchPath . Tar.mapEntriesNoFail dropDriveFromEntry
    . Tar.read . GZip.decompress =<< BS.readFile filePath
Note that if there were relevant errors to be accounted for in dropDriveFromEntry, we would make it return Either String Tar.Entry, and then use mapEntries instead of mapEntriesNoFail.
With these changes, the entry in your tar file will be extracted to /home/someuser/tarball/dir/tarball_testing/myfile. If that is not what you intended, you can modify dropDriveFromEntry so that it performs whatever extra path processing you need.
P.S.: Regarding the alternate title of your question, and considering the sensible little program you have shown us, I do not think you should be worried :)

Resources