Listing a TAR archive in Haskell

I'm currently trying to figure out how to list a (gzipped) TAR archive in Haskell. Codec.Archive.Tar seems to be the right choice for the task, but I can't figure out how to map entryPath over the Entries monoid.
Let's assume the TAR contains the entries (only files) a.txt, b.txt, c.txt and is named foo.tar.gz. Here's my code to read the file:
import qualified Codec.Archive.Tar as Tar
import qualified Data.ByteString.Lazy as BS
import qualified Codec.Compression.GZip as GZip

foldEntryToPath :: Tar.Entry -> [String] -> [String]
foldEntryToPath entry list = list ++ [show $ Tar.entryPath entry]

-- Converts TAR errors to a string.
entryFailMapper :: String -> [String]
entryFailMapper err = [err]

main = do
    fileContent <- fmap GZip.decompress $ BS.readFile "foo.tar.gz"
    entries <- fmap Tar.read fileContent :: Tar.Entries
    -- Here I don't know how to correctly apply fmap
    entryPaths <- Tar.foldEntries foldEntryToPath [] entryFailMapper entries :: [String]
    -- This should print ["a.txt", "b.txt", "c.txt"]
    print entryPaths
Here's the error printed by runghc:
readtar.hs:14:49:
    Expecting one more argument to `Tar.Entries'
    In an expression type signature: Tar.Entries
    In a stmt of a 'do' block:
      entries <- fmap Tar.read fileContent :: Tar.Entries
    In the expression:
      do { fileContent <- fmap GZip.decompress
                          $ BS.readFile "foo.tar.gz";
           entries <- fmap Tar.read fileContent :: Tar.Entries;
           entryPaths <- Tar.foldEntries
                           foldEntryToPath [] (\ x -> [...]) entries ::
                           [String];
           print entryPaths }
So far I have little knowledge of Haskell, but by reading the docs I don't know why Tar.Entries is a typeclass (is that the correct term when it says expecting n more arguments to <type>?) or what's the correct type to use.
Any help will be appreciated!

I think foldEntryToPath needs to be fixed:
foldEntryToPath :: Tar.Entry -> [String] -> [String]
foldEntryToPath entry list = (show $ Tar.entryPath entry) : list
And in main:
    fileContent <- fmap GZip.decompress $ BS.readFile "foo.tar.gz"
    let entries = Tar.read fileContent
    let entryPaths = Tar.foldEntries foldEntryToPath [] entryFailMapper entries
    print entryPaths
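The error message hints at the underlying issue: Tar.Entries is not a typeclass but a type constructor that takes the error type as a parameter, so on its own it is not a complete type. The following is only a sketch of that idea using a simplified stand-in, so that it runs without the tar package. The names Entries, foldEntries, sample and samplePaths are hypothetical local definitions that mirror the shape of the library's API; the real Next constructor carries a Tar.Entry rather than a plain path. With the real library the failure handler presumably receives a Tar.FormatError rather than a String, so a handler like entryFailMapper would need something along the lines of \e -> [show e].

```haskell
-- Stand-in for Tar.Entries: a sequence of entries ending in Done or Fail e.
-- Assumption: entries are plain paths here, not Tar.Entry values.
data Entries e
  = Next FilePath (Entries e)
  | Done
  | Fail e

-- Same argument order as Tar.foldEntries:
-- step function, result for Done, handler for Fail, then the entries.
foldEntries :: (FilePath -> a -> a) -> a -> (e -> a) -> Entries e -> a
foldEntries step done failed = go
  where
    go (Next x rest) = step x (go rest)
    go Done          = done
    go (Fail err)    = failed err

-- `Entries` alone is not a type; `Entries String` is.
sample :: Entries String
sample = Next "a.txt" (Next "b.txt" (Next "c.txt" Done))

-- A pure result: bind it with `let` inside `do`, never with `<-`.
samplePaths :: [FilePath]
samplePaths = foldEntries (:) [] (\err -> [err]) sample
```

Using (:) as the step function keeps the entries in archive order, which is why the answer above replaces list ++ [x] with x : list.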

With a bit of fiddling around, I now have a full working example.
One of the main problems was the foldr-like behaviour of Tar.foldEntries. In reality, I have a ~25GB TAR file containing a few million entries, and a foldr-style fold over that many entries is a bad idea; see the HaskellWiki for the reasons. (Note: being efficient wasn't part of the question, but I think the foldEntries-free solution is better for this specific use case.)
Therefore I wrote my own recursive Tar.Entries -> [String] mapping function. Even if errors currently aren't handled particularly well, it should provide a good starting point.
import qualified Codec.Archive.Tar as Tar
import qualified Data.ByteString.Lazy as BS
import qualified Codec.Compression.GZip as GZip

entriesToPaths :: Tar.Entries Tar.FormatError -> [String]
entriesToPaths (Tar.Next entry entries) = Tar.entryPath entry : entriesToPaths entries
entriesToPaths Tar.Done = []
entriesToPaths (Tar.Fail _) = ["Error"]

main :: IO ()
main = do
    fileContent <- fmap GZip.decompress $ BS.readFile "foo.tar.gz"
    let entries = Tar.read fileContent
    let entryPaths = entriesToPaths entries
    -- This should print ["a.txt", "b.txt", "c.txt"]
    print entryPaths
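One possible way to improve the error handling without giving up laziness is to keep the failure as a value instead of the literal string "Error". The following is a minimal sketch under that assumption. It uses a simplified stand-in for Tar.Entries (the hypothetical local type Entries, whose Next holds a path directly) so that it runs without the tar package, and entriesToResults, endless and firstTwo are illustrative names, not part of the library.

```haskell
-- Simplified stand-in for Tar.Entries (assumption: paths instead of Tar.Entry).
data Entries e
  = Next FilePath (Entries e)
  | Done
  | Fail e

-- One result per entry; a failure becomes a final Left element.
-- The list is produced lazily, so a consumer can stop early.
entriesToResults :: Entries e -> [Either e FilePath]
entriesToResults (Next path rest) = Right path : entriesToResults rest
entriesToResults Done             = []
entriesToResults (Fail err)       = [Left err]

-- An endless "archive", only to show that the whole structure is never forced.
endless :: Entries String
endless = Next "chunk.bin" endless

firstTwo :: [Either String FilePath]
firstTwo = take 2 (entriesToResults endless)
```

Because the result is an ordinary lazy list, the caller decides what to do with a Left: stop, log it, or ignore it.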

Related

No instance for (Show (FilePath -> IO [FilePath])) arising from a use of ‘print’

I am trying to fix and run every example in the Real World Haskell book and learn something in the process, and I got stuck at chapter 9. By reading the comments I got the following code to compile:
FoldDir.hs:
import ControlledVisit
import Data.Char (toLower)
import Data.Time.Clock (UTCTime(..))
import System.Directory (Permissions(..))
import System.FilePath ((</>), takeExtension, takeFileName)
data Iterate seed
    = Done { unwrap :: seed }
    | Skip { unwrap :: seed }
    | Continue { unwrap :: seed }
    deriving (Show)

type Iterator seed = seed -> Info -> Iterate seed

foldTree :: Iterator a -> a -> FilePath -> IO a
foldTree iter initSeed path = do
    endSeed <- fold initSeed path
    return (unwrap endSeed)
  where
    fold seed subpath = getUsefulContents subpath >>= walk seed

    walk seed (name : names) = do
        let path' = path </> name
        info <- getInfo path'
        case iter seed info of
            done@(Done _) -> return done
            Skip seed' -> walk seed' names
            Continue seed'
                | isDirectory info -> do
                    next <- fold seed' path'
                    case next of
                        done@(Done _) -> return done
                        seed'' -> walk (unwrap seed'') names
                | otherwise -> walk seed' names
    walk seed _ = return (Continue seed)

atMostThreePictures :: Iterator [FilePath]
atMostThreePictures paths info
    | length paths == 3
      = Done paths
    | isDirectory info && takeFileName path == ".svn"
      = Skip paths
    | extension `elem` [".jpg", ".png"]
      = Continue (path : paths)
    | otherwise
      = Continue paths
  where
    extension = map toLower (takeExtension path)
    path = infoPath info

countDirectories count info =
    Continue (if isDirectory info then count + 1 else count)
ControlledVisit.hs:
module ControlledVisit where

import Control.Monad (forM, liftM)
import Data.Time.Clock (UTCTime(..))
import System.FilePath ((</>))
import System.Directory
    ( Permissions(..)
    , getModificationTime
    , getPermissions
    , getDirectoryContents
    )
import Control.Exception
    ( bracket
    , handle
    , SomeException(..)
    )
import System.IO
    ( IOMode(..)
    , hClose
    , hFileSize
    , openFile
    )

data Info = Info
    { infoPath :: FilePath
    , infoPerms :: Maybe Permissions
    , infoSize :: Maybe Integer
    , infoModTime :: Maybe UTCTime
    } deriving (Eq, Ord, Show)

getInfo :: FilePath -> IO Info
getInfo path = do
    perms <- maybeIO (getPermissions path)
    size <- maybeIO (bracket (openFile path ReadMode) hClose hFileSize)
    modified <- maybeIO (getModificationTime path)
    return (Info path perms size modified)

traverseDirs :: ([Info] -> [Info]) -> FilePath -> IO [Info]
traverseDirs order path = do
    names <- getUsefulContents path
    contents <- mapM getInfo (path : map (path </>) names)
    liftM concat $ forM (order contents) $ \ info -> do
        if isDirectory info && infoPath info /= path
            then traverseDirs order (infoPath info)
            else return [info]

getUsefulContents :: FilePath -> IO [String]
getUsefulContents path = do
    names <- getDirectoryContents path
    return (filter (`notElem` [".", ".."]) names)

isDirectory :: Info -> Bool
isDirectory = maybe False searchable . infoPerms

maybeIO :: IO a -> IO (Maybe a)
maybeIO act = handle (\ (SomeException _) -> return Nothing) (Just `liftM` act)

traverseVerbose order path = do
    names <- getDirectoryContents path
    let usefulNames = filter (`notElem` [".", ".."]) names
    contents <- mapM getEntryName ("" : usefulNames)
    recursiveContents <- mapM recurse (order contents)
    return (concat recursiveContents)
  where
    getEntryName name = getInfo (path </> name)
    isDirectory info = case infoPerms info of
        Nothing -> False
        Just perms -> searchable perms
    recurse info = do
        if isDirectory info && infoPath info /= path
            then traverseVerbose order (infoPath info)
            else return [info]
But when I try to run it in GHCi as explained in the book it fails with a weird error that as far as I understand is about GHCi itself:
Prelude> :l FoldDir.hs
[1 of 2] Compiling ControlledVisit ( ControlledVisit.hs, interpreted )
[2 of 2] Compiling Main ( FoldDir.hs, interpreted )
Ok, two modules loaded.
*Main> foldTree atMostThreePictures []
<interactive>:2:1: error:
• No instance for (Show (FilePath -> IO [FilePath]))
arising from a use of ‘print’
(maybe you haven't applied a function to enough arguments?)
• In a stmt of an interactive GHCi command: print it
I think I understand the No instance for (Show (FilePath -> IO [FilePath])) part but I am clueless about the print it. I know it is a special variable in GHCi that stores the evaluation result of the last expression and I guess the code is trying to print a function or a monad, but I don't get where it is happening.
Put as simply as possible: the signature of your function foldTree is:
foldTree :: Iterator a -> a -> FilePath -> IO a
You are supplying it with two arguments, one of type Iterator [FilePath] and a second of type [FilePath] (the initial seed). Because of partial application, such a call returns a function with the signature:
FilePath -> IO [FilePath]
GHCi wants to display the result of your call but cannot, because this type has no instance of the Show typeclass, and the error tells you exactly that. Supplying the missing FilePath argument, for example foldTree atMostThreePictures [] ".", gives GHCi an IO [FilePath] action that it can run and whose result it can print.
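The same situation can be reproduced in a few lines. This is a sketch with a hypothetical function foldChars that only shares the argument shape of foldTree (a step function, a seed, and a final input); partial and full are illustrative names as well.

```haskell
-- Hypothetical function with the same argument shape as foldTree.
foldChars :: (a -> Char -> a) -> a -> String -> IO a
foldChars step seed input = return (foldl step seed input)

-- Two of three arguments supplied: this is still a function.
-- Functions have no Show instance, so typing this at the GHCi prompt
-- would give the same "No instance for (Show ...)" error.
partial :: String -> IO Int
partial = foldChars (\n _ -> n + 1) 0

-- All three arguments supplied: an IO action whose Int result can be shown.
full :: IO Int
full = partial "abc"
```

Typing full at the GHCi prompt runs the action and prints its result, whereas typing partial triggers the Show error, because GHCi's implicit print it has nothing it knows how to display.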

How to convert list to string?

I wrote a function which reads a file and removes from every line all the words that were already encountered earlier in the same line.
{-# OPTIONS_GHC -Wall #-}
module Main where

import System.Environment
import System.IO()

main :: IO ()
main = do
    args <- getArgs
    if (length args > 0)
        then do
            f <- get args
            putStrLn (seqWord $ head f)
        else do
            f <- getContents
            putStrLn (seqWord f)

get :: [String] -> IO [String]
get [] = return []
get (file:xs) = do
    contents <- readFile file
    fs <- get xs
    return (contents:fs)

seqWord :: String -> String
seqWord s = show (map (filterWord . words) (lines s))

filterWord :: [String] -> [String]
filterWord [] = []
filterWord (x:xs) = x : filterWord (filter (/=x) xs)
As the result I get a list of lists, like this:
[["1","12","5","8","13","145","85"],["546","822","1","12","58","8","9"]]
Please help me fix this problem. Thank you.
Use the unwords function to undo the effect of words. You may also want to replace show with unlines.
seqWord s = unlines (map (unwords . filterWord . words) (lines s))
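As a worked example of the round trip (lines, words, then unwords and unlines), here is a small sketch. It swaps in Data.List.nub for the hand-written filterWord, since nub also keeps only the first occurrence of each element; that substitution and the name example are assumptions for illustration, not part of the original code.

```haskell
import Data.List (nub)

-- lines/words split the text; unwords/unlines put it back together.
-- nub drops repeated words within a line, keeping the first occurrence.
seqWord :: String -> String
seqWord = unlines . map (unwords . nub . words) . lines

-- Two input lines with repeated words.
example :: String
example = seqWord "1 12 1 5\n8 8 9\n"
```

Here example evaluates to "1 12 5\n8 9\n", so putStr example prints two plain lines instead of the list-of-lists rendering produced by show.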

Read a list of integers lazily as a bytestring

I'm trying to find the sum of integers in a file. The code using the normal string is:
main = do
    contents <- getContents
    L.putStrLn (sumFile contents)
  where sumFile = sum . map read. words
I tried to change it to use the Data.ByteString.Lazy module like this:
import Data.ByteString.Lazy as L
main = do
    contents <- L.getContents
    L.putStrLn (sumFile contents)
  where sumFile = sum . L.map read. words
But this was rejected, because words works on a String rather than a ByteString. Then I tried using Data.ByteString.Char8, but it uses a strict ByteString.
How can I make this function completely lazy?
I found a slightly lengthy workaround: reading the file as a ByteString and then as a list of integers. Thanks to @melpomene.
import Data.ByteString.Lazy.Char8 as L

main = do
    contents <- L.getContents
    print (sumFile contents)
  where
    sumFile x = sum $ Prelude.map tups $ Prelude.map L.readInt (L.words x)
      where read' = tups.(L.readInt)

tups :: (Num a) => (Maybe (a, b)) -> a
tups (Just (a,b)) = a
tups Nothing = 0
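A somewhat shorter variant of the same idea is sketched below, assuming a qualified import so that Prelude names do not clash, and assuming that tokens which do not start with a number should simply be skipped rather than counted as 0. The name sumInts is illustrative.

```haskell
import qualified Data.ByteString.Lazy.Char8 as L
import Data.List (foldl')
import Data.Maybe (mapMaybe)

-- Split the lazy ByteString on whitespace, parse each token with readInt,
-- drop tokens that do not parse, and add up the rest with a strict fold.
-- Note: readInt only parses a leading number, so "12abc" counts as 12.
sumInts :: L.ByteString -> Int
sumInts = foldl' (+) 0 . map fst . mapMaybe L.readInt . L.words
```

In a program this would be used as L.getContents >>= print . sumInts. Since the input is a lazy ByteString and the accumulator is forced at each step, the whole file should not need to be held in memory at once.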

Couldn't match type `[]' with `IO' -- Haskell

I'm a beginner in Haskell. In this task I'm performing a split operation, but I'm facing a problem because of a type mismatch. I'm reading data from a text file in a table format, for example 1|2|Rahul|13.25. Here | is the delimiter, so I want to split the data on | and print the 2nd and 4th columns, but I'm getting this error:
"Couldn't match type `[]' with `IO'
Expected type: IO [Char]
Actual type: [[Char]]
In the return type of a call of `splitOn'"
Here is my code..
module Main where
import Data.List.Split
main = do
    list <- readFile("src/table.txt")
    putStrLn list
    splitOn "|" list
Any help with this would be appreciated. Thanks.
The problem is that the last statement of main, splitOn "|" list, is a plain list ([[Char]]), while every statement in main's do block must be an IO action; that is why GHC reports that it cannot match [] with IO.
What you probably want to do is print the result.
main = do
    list <- readFile("src/table.txt")
    putStrLn list
    print $ splitOn "|" list
Not Haskell, but it looks like a typical awk task.
cat src/table.txt | awk -F'|' '{print $2, $4}'
Back to Haskell, the best I could find is:
module Main where

import Data.List.Split (splitOn)
import Data.List (intercalate)

project :: [Int] -> [String] -> [String]
project indices l = foldl (\acc i -> acc ++ [l !! i]) [] indices

fromString :: String -> [[String]]
fromString = map (splitOn "|") . lines

toString :: [[String]] -> String
toString = unlines . map (intercalate "|")

main :: IO ()
main = do
    putStrLn =<<
        return . toString . map (project [1, 3]) . fromString =<<
        readFile("table.txt")
If not reading from a file, but from stdin, the interact function could be useful.
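For the original goal of picking the 2nd and 4th columns, here is a sketch that needs only base. The helper splitOnChar is a hypothetical replacement for Data.List.Split.splitOn restricted to a single-character delimiter, and columns and secondAndFourth are illustrative names; out-of-range column indices are silently skipped, which is an assumption about the desired behaviour.

```haskell
-- Split a string on a single delimiter character (hypothetical helper,
-- standing in for splitOn from the split package).
splitOnChar :: Char -> String -> [String]
splitOnChar sep s = case break (== sep) s of
  (field, [])       -> [field]
  (field, _ : rest) -> field : splitOnChar sep rest

-- Pick the fields at the given zero-based positions, skipping any
-- position that does not exist in this row.
columns :: [Int] -> [String] -> [String]
columns indices fields = [fields !! i | i <- indices, i < length fields]

-- 2nd and 4th column of one '|'-separated row.
secondAndFourth :: String -> [String]
secondAndFourth = columns [1, 3] . splitOnChar '|'
```

For stdin input this could be combined with interact, for instance interact (unlines . map (unwords . secondAndFourth) . lines).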

Find a given string in given path

As the title states, I am trying to find a given string within a given path. Here is what I have come up with so far:
getRecursiveContents :: FilePath -> IO [FilePath]
getRecursiveContents topdir = do
    names <- getDirectoryContents topdir
    let properNames = filter (`notElem` [".", ".."]) names
    paths <- forM properNames $ \name -> do
        let path = topdir </> name
        isDirectory <- doesDirectoryExist path
        if isDirectory
            then getRecursiveContents path
            else return [path]
    return (concat paths)

findInFile :: String -> FilePath -> IO Bool
findInFile needle filePath = do
    content <- readFile filePath
    return (needle `L.isInfixOf` content)

findInFolder :: (String -> Bool) -> FilePath -> String -> IO [IO Bool]
findInFolder p path needle = do
    files <- getRecursiveContents path
    return (map (findInFile needle) (filter p files))

find = findInFolder (\p -> takeExtension p `elem` [".py", ".xml", ".html"])
I can search a single file:
*Main> findInFile "search_string" "./path/to/a/file"
True
Which is perfect but I cannot do the same search for a folder:
*Main> find "./path/to/a/folder" "search_string"
*Main>
In my file system ./path/to/a/file is located under ./path/to/a/folder. Thus I was expecting the same result.
What am I doing wrong?
Note: getRecursiveContents is from Real World Haskell.
It does indeed work. The only issue is with how things are printed. When you type some expressions into ghci, it will call print on that expression. If the value has type IO x, it will execute the IO action and print x only if it has a Show instance; otherwise it prints no additional information.
find "./path/to/a/folder" "search_string" produces a list of IO actions, which have no Show instance. You can get the result of find, which is again a list of IO actions, and then execute them:
> x <- find "./path/to/a/folder" "search_string"
> sequence x
[True, False ...
Likely you wanted to do this originally in your function. Simply make the following changes:
findInFolder :: (String -> Bool) -> FilePath -> String -> IO [Bool]
findInFolder p path needle = do
    files <- getRecursiveContents path
    mapM (findInFile needle) (filter p files)
Now findInFolder will work as you expect.
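The difference between map and mapM here can be seen in isolation. The sketch below uses a hypothetical stand-in, containsNeedle, that receives the "file contents" directly instead of reading a file, so it runs without touching the file system; checks, results and matching are illustrative names. The filterM variant is an additional suggestion, not part of the answer above: it returns the matching inputs rather than a list of booleans.

```haskell
import Control.Monad (filterM)
import Data.List (isInfixOf)

-- Hypothetical stand-in for findInFile: the content is passed in directly.
containsNeedle :: String -> String -> IO Bool
containsNeedle needle content = return (needle `isInfixOf` content)

-- map only builds a list of actions; nothing has been run yet.
checks :: String -> [String] -> [IO Bool]
checks needle = map (containsNeedle needle)

-- mapM builds the actions and runs them, collecting the results.
results :: String -> [String] -> IO [Bool]
results needle = mapM (containsNeedle needle)

-- filterM keeps the inputs for which the action returned True.
matching :: String -> [String] -> IO [String]
matching needle = filterM (containsNeedle needle)
```

Applying sequence to the output of checks gives the same result as results, which is what the two GHCi commands above do by hand.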
