IO woes when seeking sizes of directory contents?


I'm learning Haskell, and my goal today is to write a function sizeOf :: FilePath -> IO Integer (which calculates the size of a file or folder) with the following logic:
If path is a file, System.Directory.getFileSize path
If path is a directory, get a list of its contents, recursively run this function on them, and sum the results
If it's something other than a file or directory, return 0
Here's how I'd implement it in Ruby, to illustrate (Ruby notes: map's argument is the equivalent of \d -> size_of d, reduce(0, :+) is foldl (+) 0, any method ending in ? returns a bool, and returns are implicit):
def size_of path
  if File.file? path
    File.size path
  elsif File.directory? path
    Dir.glob(path + '/*').map { |d| size_of d }.reduce(0, :+)
  else
    0
  end
end
Here's my crack at it in Haskell:
sizeOf :: FilePath -> IO Integer
sizeOf path = do
  isFile <- doesFileExist path
  if isFile then
    getFileSize path
  else do
    isDir <- doesDirectoryExist path
    if isDir then
      sum $ map sizeOf $ listDirectory path
    else
      return 0
I know where my problem is: sum $ map sizeOf $ listDirectory path, where listDirectory path returns an IO [FilePath] rather than a plain [FilePath]. But I can't see how to get around this. Using <$> instead of $ was the first thing that came to mind, since I understood <$> to be something that turns a function a -> b into a function Context a -> Context b. But I guess IO isn't like that?
I spent about two hours puzzling over the logic there and tried it on other examples. Here's a relevant discovery that threw me: if double = (*) 2, then map double [1,2,3] == [2,4,6], but map double <$> [return 1, return 2, return 3] == [[2],[4],[6]]; it wraps each result in a list. I think that's what's happening to me, but I'm way out of my depth.

You'd need
sum <$> (listDirectory path >>= mapM sizeOf)
Explanation:
Applying sum to an IO [Integer] (via <$>) is fine, so we need to produce such a value.
listDirectory path gives us IO [FilePath], so we need to pass each file path to sizeOf. This is what >>= together with mapM does.
Note that map alone would give us [IO Integer]; that is why we need mapM, which turns a list of actions' worth of work into a single IO [Integer].
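Putting the pieces together, a minimal sketch of the full function might look like this. It assumes the directory and filepath packages (both ship with GHC) and joins each entry onto its parent with (</>), because listDirectory returns bare names. Note that doesDirectoryExist follows symbolic links, so a symlink cycle could make this recurse indefinitely.

```haskell
import System.Directory
import System.FilePath ((</>))

-- Size in bytes of a regular file, or the total size of every file below a
-- directory. Anything else (for example a path that does not exist) counts as 0.
sizeOf :: FilePath -> IO Integer
sizeOf path = do
  isFile <- doesFileExist path
  if isFile
    then getFileSize path
    else do
      isDir <- doesDirectoryExist path
      if isDir
        -- listDirectory gives bare entry names, so join each onto the parent
        -- path; mapM runs the recursive IO actions in order and collects the
        -- results, and sum is applied inside IO with <$>.
        then sum <$> (listDirectory path >>= mapM (sizeOf . (path </>)))
        else return 0
```

Load it in GHCi and try, for example, sizeOf "." to get the size of the current directory.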

How about this (using ifM from Control.Monad.Extra, in the extra package):
import Control.Monad.Extra (ifM)
import System.Directory (doesDirectoryExist, doesFileExist, getFileSize, listDirectory)
import System.FilePath (addTrailingPathSeparator)

du :: FilePath -> IO Integer
du path = ifM (doesFileExist path)
  (getFileSize path) $
  ifM (doesDirectoryExist path)
    (sum <$> (listDirectory path >>= mapM (du . (addTrailingPathSeparator path ++))))
    (return 0)
You need to prepend the directory path to each name returned by listDirectory for the recursive descent to work, because listDirectory returns only the entry names, without the leading path that the recursive calls to du require.
The type of ifM is probably obvious, but for reference it is ifM :: Monad m => m Bool -> m a -> m a -> m a.
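If you would rather not depend on the extra package, an equivalent combinator is only a few lines. This is a sketch; the name ifM' is hypothetical, chosen so it doesn't clash with the real ifM.

```haskell
-- Hand-rolled stand-in for Control.Monad.Extra.ifM: run the monadic
-- condition first, then run exactly one of the two branches.
ifM' :: Monad m => m Bool -> m a -> m a -> m a
ifM' cond thenAct elseAct = do
  b <- cond
  if b then thenAct else elseAct
```

Because only the chosen branch is ever run, the recursive du call in the directory branch never happens for plain files.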

Related

Haskell: Read a data file as array for computations

Set-up
I have a data file in the following format, with 2D coordinates separated by a space on each line:
1.23 4.0
23.7 23.1
60.4 4.2
To parse this, I made the following function (feedback welcome): it takes the file's path as a String and parses the lines and columns as Floats:
fileData :: String -> IO [[Float]]
fileData str = readFile str >>=
  \file -> return $ map ((map $ \x -> read x :: Float) . words $) $ lines file
The output for the example file above is an IO [[Float]] containing:
[[1.23, 4.0],
[23.7, 23.1],
[60.4, 4.2]]
Questions
What is the best way to handle this array, the output of fileData? For example, how should one go about doing computations with values in this array? Eventually, I will want to manipulate these values using hMatrix.
To obtain the zeroth element of the array, [1.23, 4.0], I tried running
main :: IO ()
main = fileData "file.txt" >>=
  \file -> print $ (flip (!!) 0) <$> file
but it returns the zeroth element of each sub-array, [1.23, 23.7, 60.4], which is the same as print $ map (flip (!!) 0) xs, where xs is the array treated as a plain [[Float]].
Update
I thought using fmap, as in (flip (!!) 0) <$> file in main, was necessary because of the type of file there, but it turns out that file !! 0 works as intended, so main becomes:
main :: IO ()
main = fileData "file.txt" >>=
  \file -> print $ file !! 0
However, now I am confused about the type of file in fileData "file.txt" >>= \file -> print $ file !! 0. I thought it was IO [[Float]], so applying (!!) directly would not work because of the IO monad. Is there a better way to think about this?
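A sketch of how the types line up, using only the Prelude (parseRows, firstColumn and printFirstRow are hypothetical names). (>>=) hands its continuation the unwrapped value, so inside \file -> ... the argument is a plain [[Float]], and <$> applied to it is the list Functor's fmap, i.e. map. That explains both observations in the question.

```haskell
-- (>>=) :: IO a -> (a -> IO b) -> IO b
-- With a = [[Float]], the lambda in  fileData path >>= \file -> ...  receives
-- a plain [[Float]], so list functions such as (!!) apply to it directly.

-- Pure parsing step, kept separate from the IO.
parseRows :: String -> [[Float]]
parseRows = map (map read . words) . lines

fileData :: FilePath -> IO [[Float]]
fileData path = parseRows <$> readFile path

-- On a [[Float]], (<$>) is the list Functor's fmap, i.e. map, so this returns
-- the first element of every row (the first column), not the first row.
firstColumn :: [[Float]] -> [Float]
firstColumn rows = (!! 0) <$> rows

-- do-notation version of  fileData path >>= \file -> print (file !! 0)
printFirstRow :: FilePath -> IO ()
printFirstRow path = do
  rows <- fileData path -- rows :: [[Float]]
  case rows of
    (firstRow : _) -> print firstRow
    [] -> putStrLn "empty file"
```

Keeping the parsing pure also means the resulting lists can later be handed to whatever conversion hmatrix expects without touching IO again.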

Iteratively printing every integer in a List

Say I have a list of integers l = [1,2] which I want to print to stdout.
Doing print l produces [1,2].
Say I want to print the list without the brackets.
map print l produces
No instance for (Show (IO ())) arising from a use of `print'
Possible fix: add an instance declaration for (Show (IO ()))
In a stmt of an interactive GHCi command: print it
:t print
print :: Show a => a -> IO ()
So I went ahead and tried:
map putStr $ map show l
since I suspected a type mismatch between Integer and String was to blame. This produced the same error message as above.
I realize that I could do something like concatenating the list into a string, but I would like to avoid that if possible.
What's going on? How can I do this without constructing a string from the elements of the List?

The problem is that
map :: (a -> b) -> [a] -> [b]
So we end up with [IO ()]. This is a pure value, a list of IO actions. It won't actually print anything. Instead we want
mapM_ :: (a -> IO ()) -> [a] -> IO ()
The naming convention *M means that it operates over monads and *_ means we throw away the value. This is like map except it sequences each action with >> to return an IO action.
As an example mapM_ print [1..10] will print each element on a new line.
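A minimal sketch of both outputs the question asks about (printEach and printSpaced are hypothetical names): one element per line, and all elements on one line without brackets, still without building one big String first.

```haskell
-- mapM_ runs one IO action per element and discards the results.
printEach :: Show a => [a] -> IO ()
printEach = mapM_ print

-- Space-separated on a single line: each element is shown and written on its
-- own, then a final newline ends the line.
printSpaced :: Show a => [a] -> IO ()
printSpaced xs = do
  mapM_ (\x -> putStr (show x ++ " ")) xs
  putStrLn ""
```

With l = [1,2], printEach l writes 1 and 2 on separate lines, while printSpaced l writes both on one line.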

Suppose you're given a list xs :: [a] and a function f :: Monad m => a -> m b. You want to apply the function f to each element of xs, yielding a list of actions, then sequence these actions. Here is how I would go about constructing a function, call it mapM, that does this. In the base case, xs = [] is the empty list, and we simply return []. In the recursive case, xs has the form x : xs. First, we want to apply f to x, giving the action f x :: m b. Next, we want to recursively call mapM on xs. The result of performing the first step is a value, say y; the result of performing the second step is a list of values, say ys. So we collect y and ys into a list, then return them in the monad:
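-- hide the Prelude versions so these definitions don't clash with them
import Prelude hiding (mapM, mapM_)

mapM :: Monad m => (a -> m b) -> [a] -> m [b]
mapM f [] = return []
mapM f (x : xs) = f x >>= \y -> mapM f xs >>= \ys -> return (y : ys)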
Now we can map a function like print, which returns an action in the IO monad, over a list of values to print: mapM print [1..10] does precisely this for the list of integers from one through ten. There is a problem, however: we aren't particularly concerned about collecting the results of printing operations; we're primarily concerned about their side effects. Instead of returning y : ys, we simply return ().
mapM_ :: Monad m => (a -> m b) -> [a] -> m ()
mapM_ f [] = return ()
mapM_ f (x : xs) = f x >> mapM_ f xs
Note that mapM and mapM_ can also be defined without explicit recursion, using the sequence and sequence_ functions from the standard library, which do precisely what their names imply. Older versions of the standard library defined them exactly that way; in current versions of base, mapM is a method of the Traversable class and mapM_ is defined in Data.Foldable in terms of foldr.
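The sequence-based definitions can be sketched as follows. They shadow the Prelude names and are specialised to lists; base's current definitions are more general (Traversable and Foldable), so treat this as an illustration rather than the library source.

```haskell
import Prelude hiding (mapM, mapM_)

-- map builds the list of actions [m b]; sequence runs them left to right and
-- collects their results into m [b].
mapM :: Monad m => (a -> m b) -> [a] -> m [b]
mapM f = sequence . map f

-- sequence_ runs the same actions in order but throws the results away.
mapM_ :: Monad m => (a -> m b) -> [a] -> m ()
mapM_ f = sequence_ . map f
```

Because they work in any monad, you can try them on Maybe in GHCi: one Nothing anywhere in the list makes the whole mapM result Nothing.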

Everything in Haskell is very strongly typed, including code to perform IO!
When you write print [1, 2], this is just a convenience wrapper for putStrLn (show [1, 2]), where show is a function that turns a Show-able value into a string. print itself doesn't do anything (in the side-effect sense of "do"); instead it returns an IO () action, which is sort of like a mini, unrun "program" (if you'll excuse the sloppy language) that isn't run when it is created but can be passed around for later execution. You can verify the type in GHCi:
> :t print [1, 2]
print [1, 2] :: IO ()
This is just an object of type IO (). You could throw it away right now and nothing would ever happen. More likely, if you use this object in main, the IO code will run, side effects and all.
When you map putStrLn (or print) over a list, you still get an object whose type you can view in GHCi:
> :t map print [1, 2]
map print [1, 2] :: [IO ()]
Like before, this is just an object that you can pass around, and by itself it will not do anything. But unlike before, the type is wrong for use in main, which expects an IO () value. To use it, you need to convert it to that type.
There are many ways to do this conversion. One way that I like is the sequence function:
sequence $ map print [1, 2]
which takes a list of IO actions (i.e. mini "programs" with side effects, if you will forgive the sloppy language) and sequences them together into one IO action. This code alone will now do what you want.
As jozefg pointed out, although sequence works, sequence_ is a better choice here.
sequence not only chains the actions together into one IO action, but also collects their return values in a list. Since print returns IO (), that list is a useless list of ()s, wrapped in IO. :)
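The difference between the two is visible in the types. A small sketch (printCollect and printDiscard are hypothetical names):

```haskell
-- sequence keeps every action's result: here a list of () values.
printCollect :: [Int] -> IO [()]
printCollect xs = sequence (map print xs)

-- sequence_ runs the same actions but discards the results, giving plain
-- IO (), which is the type main expects.
printDiscard :: [Int] -> IO ()
printDiscard xs = sequence_ (map print xs)
```

Both print 1 and 2 on separate lines; only the result type differs.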

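Using the lens library (in recent versions of lens, the action combinators ^! and act have moved to the separate lens-action package, module Control.Lens.Action):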
[1,2,3] ^! each . act print

You might write your own function, too:
Prelude> let l = [1,2]
Prelude> let f [] = return (); f (x:xs) = do print x; f xs
Prelude> f l
1
2

Create List of Strings in Haskell

I am making the pilgrimage from Java to Haskell. Broadly speaking, I get the main concepts behind Haskell. Reading all the tutorials and books 'makes sense' but I am getting stuck writing my own code from scratch.
I want to create 1000 files on the file system with names
"myfile_1.txt" ... "myfile_1000.txt"
and each containing some dummy text.
So far I have worked out the whole IO thing, and I realise I need to build a list of Strings 1000 elements long. So I have:
buildNamesList :: [] -> []
buildNamesList ???
Once I have the list I can call writeFile on each element. What I can't figure out is how to add a number to the end of a String to get each file name, because I can't have an int i = 0; i++ construct in Haskell.
I am a bit out of my depth here and would appreciate some guidance, thanks.

One possible solution:
buildNamesList = map buildName [1..1000]
  where buildName n = "myfile_" ++ show n ++ ".txt"

import Control.Applicative
fileNames = ("myfile_" ++) <$> (++ ".txt") <$> show <$> [1..1000]

How do I then traverse over it, pluck out the String at element n, and then pass it into another function?
No! "Plucking out" something from a list is inefficient. You don't want to worry about how to get to each element and then do something with it. That's necessary in imperative languages because they don't have a proper abstraction over what "sequencing actions" means: it's just something magical built into the language. Haskell has much more well-specified, mathematically sound and type-safe magic for that; as a result you don't need loops and suchlike.
You know what to do with each element (String -> IO ()), and you know where the data comes from ([String]). You also know what should eventually happen (IO ()). So the combinator you're looking for has type (String -> IO ()) -> [String] -> IO (), though it doesn't really depend on the data being Strings, so let's generalize that to (a -> IO ()) -> [a] -> IO (). You can look that up on Hoogle, which offers, amongst some rubbish, mapM_ and forM_, both of which do what you want:
mapM_ (\filename -> writeFile filename "bla") filenamesList
or
forM_ filenamesList $ \filename ->
  writeFile filename "bla"
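Combining the name list with forM_, a complete sketch for the original task might look like this. The helper names fileNames and writeDummyFiles, the target-directory parameter, and the dummy text are assumptions for illustration; it relies on the directory and filepath packages that ship with GHC.

```haskell
import Control.Monad (forM_)
import System.Directory
import System.FilePath ((</>))

-- The file names from the question, built without any counter variable.
fileNames :: Int -> [FilePath]
fileNames n = ["myfile_" ++ show i ++ ".txt" | i <- [1 .. n]]

-- Write the same dummy text into each file inside dir. forM_ is mapM_ with
-- its arguments flipped, which reads a bit like a for loop.
writeDummyFiles :: FilePath -> Int -> IO ()
writeDummyFiles dir n = do
  createDirectoryIfMissing True dir
  forM_ (fileNames n) $ \name ->
    writeFile (dir </> name) "some dummy text\n"
```

writeDummyFiles "." 1000 would create myfile_1.txt through myfile_1000.txt in the current directory.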

Sometimes I think of foldr as bearing some resemblance to a for loop. Here's something kind of like the i++ construct, applying i inside a loop:
foldr (\i accum -> ("myfile_" ++ show i ++ ".txt") : accum) [] [1..1000]
Another way could be zipWith, which applies a function to combine two lists:
zipWith (\a b -> a ++ show b ++ ".txt") (repeat "myfile_") [1..1000]
or
zipWith ($) (repeat (("myfile_" ++) . (++ ".txt") . show)) [1..1000]
And here's a recursive example, too, applied as fileList "myfile_" ".txt" [1..1000]:
fileList _ _ [] = []
fileList fName ext (x:xs) = (fName ++ show x ++ ext) : fileList fName ext xs
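All of these variants build the same list. A quick way to convince yourself is to wrap them as functions of n and compare the results; the via* names are hypothetical, and n is an Int here rather than the Integer that [1..1000] would default to.

```haskell
-- The approaches from the answers above, parameterised by the count n.
viaMap, viaFoldr, viaZipWith, viaRecursion :: Int -> [String]
viaMap n = map (\i -> "myfile_" ++ show i ++ ".txt") [1 .. n]
viaFoldr n = foldr (\i accum -> ("myfile_" ++ show i ++ ".txt") : accum) [] [1 .. n]
viaZipWith n = zipWith (\a b -> a ++ show b ++ ".txt") (repeat "myfile_") [1 .. n]
viaRecursion n = fileList "myfile_" ".txt" [1 .. n]

-- Explicit recursion: prefix, number and extension for each element.
fileList :: String -> String -> [Int] -> [String]
fileList _ _ [] = []
fileList fName ext (x : xs) = (fName ++ show x ++ ext) : fileList fName ext xs
```

In practice the map (or list comprehension) version is the most common choice; the others mainly show that a "loop counter" is just a list of numbers being consumed.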

Using lookup with an IO list?

I am getting the contents of a file and transforming it into a list of form:
[("abc", 123), ("def", 456)]
with readFile, lines, and words.
Right now, I can manage to transform the resulting list into type IO [(String, Int)].
My problem is, when I try to make a function like this:
check x = lookup x theMap
I get this error, which I'm not too sure how to resolve:
Couldn't match expected type `[(a0, b0)]'
with actual type `IO [(String, Int)]'
In the second argument of `lookup', namely `theMap'
theMap is essentially this:
import Control.Monad (liftM)

getLines :: String -> IO [String]
getLines = liftM lines . readFile

tuplify [x,y] = (x, read y :: Int)

theMap = do
  list <- getLines "./test.txt"
  let l = map tuplify (map words list)
  return l
And the file contents are:
abc 123
def 456
Can anyone explain what I'm doing wrong and or show me a better solution? I just started toying around with monads a few hours ago and am running into a few bumps along the way.
Thanks

You will have to "unwrap" theMap from IO. Notice how you're already doing this with getLines:
do
  list <- getLines "./test.txt"
  [...]
  return (some computation on list)
So you could have:
check x = do
  m <- theMap
  return . lookup x $ m
This is, in fact, an antipattern (albeit an illustrative one), and you would be better off using the Functor instance, i.e. check x = fmap (lookup x) theMap. Either way, check x has type IO (Maybe Int), so its caller has to bind the result as well.
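A minimal sketch of the fmap approach with the parsing kept pure. parsePairs is a hypothetical name; unlike tuplify, it skips lines that don't split into exactly two words, though read still throws if the number itself is malformed.

```haskell
-- Pure parsing: one (key, value) pair per well-formed line.
parsePairs :: String -> [(String, Int)]
parsePairs contents = [(k, read v) | [k, v] <- map words (lines contents)]

-- The parsed list only exists inside IO.
theMap :: IO [(String, Int)]
theMap = parsePairs <$> readFile "./test.txt"

-- lookup x :: [(String, Int)] -> Maybe Int is lifted into IO with <$> (fmap),
-- so the result is an IO (Maybe Int) that callers bind like any other action.
check :: String -> IO (Maybe Int)
check x = lookup x <$> theMap
```

With the test.txt from the question, main = check "abc" >>= print prints Just 123.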

Having my cereal and parsing it too

I'm using Data.Serialize.Get and am trying to define the following combinator:
getConsumed :: Get a -> Get (ByteString, a)
which should act like the passed-in Get action, but also return the ByteString that the Get consumed. The use case is that I have a binary structure that I need to both parse and hash, and I don't know the length before parsing it.
This combinator, despite its simple semantics, is proving surprisingly tricky to implement.
Without delving into the internals of Get, my instinct was to use this monstrosity:
import qualified Data.ByteString as B
import Data.Serialize.Get

getConsumed :: Get a -> Get (B.ByteString, a)
getConsumed g = do
  (len, r) <- lookAhead $ do
    before <- remaining
    res <- g
    after <- remaining
    return (before - after, res)
  bs <- getBytes len
  return (bs, r)
This uses lookAhead to peek at the number of remaining bytes before and after running the action, returns the action's result, and then consumes that many bytes. It shouldn't duplicate any work, but it occasionally fails with:
*** Exception: GetException "Failed reading: getBytes: negative length requested\nEmpty call stack\n"
so I must be misunderstanding something about cereal somewhere.
Does anyone see what's wrong with my definition of getConsumed, or have a better idea for how to implement it?
Edit: Dan Doel points out that remaining only returns the remaining length of the current chunk, which isn't very useful if you cross a chunk boundary. I'm not sure what the point of that action is, then, but it explains why my code wasn't working! Now I just need to find a viable alternative.
Edit 2: after thinking about it some more, it seems like the fact that remaining gives me the length of the current chunk can work to my advantage if I feed the Get manually with individual chunks (remaining >>= getBytes) in a loop and keep track of what it consumes as I go. I haven't managed to get this approach working yet either, but it seems more promising than the original one.
Edit 3: if anyone's curious, here's the code for the approach from Edit 2:
{-# LANGUAGE BangPatterns #-}

getChunk :: Get B.ByteString
getChunk = remaining >>= getBytes

getConsumed :: Get a -> Get (B.ByteString, a)
getConsumed g = do
  (len, res) <- lookAhead $ measure g
  bs <- getBytes len
  return (bs, res)
  where
    measure :: Get a -> Get (Int, a)
    measure g' = do
      chunk <- getChunk
      measure' (B.length chunk) (runGetPartial g' chunk)

    -- cereal's Fail constructor carries the error message and the leftover input
    measure' :: Int -> Result a -> Get (Int, a)
    measure' !_ (Fail e _) = fail e
    measure' !n (Done r bs) = return (n - B.length bs, r)
    measure' !n (Partial f) = do
      chunk <- getChunk
      measure' (n + B.length chunk) (f chunk)
Unfortunately, it still seems to fail after a while on my sample input with:
*** Exception: GetException "Failed reading: too few bytes\nFrom:\tdemandInput\n\n\nEmpty call stack\n"

EDIT: Another solution, which does no extra computation!
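import Control.Monad (liftM2)

getConsumed :: Get a -> Get (B.ByteString, a)
getConsumed g = do
  (len, r) <- lookAhead $ do
    -- inner lookAhead: run g so every chunk it needs gets loaded, and record
    -- how many bytes remain after it
    (res, after) <- lookAhead $ liftM2 (,) g remaining
    -- back at the start, remaining now sees those same loaded chunks
    total <- remaining
    return (total - after, res)
  bs <- getBytes len
  return (bs, r)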
One solution is to call lookAhead twice. The first time makes sure that all necessary chunks are loaded, and the second performs the actual length computation (along with returning the deserialized data).
getConsumed :: Get a -> Get (B.ByteString, a)
getConsumed g = do
  _ <- lookAhead g -- Make sure all necessary chunks are preloaded
  (len, r) <- lookAhead $ do
    before <- remaining
    res <- g
    after <- remaining
    return (before - after, res)
  bs <- getBytes len
  return (bs, r)

The Cereal package does not store enough information to simply implement what you want. I expect that your idea of using chunks might work, or perhaps a special runGet. Forking Cereal and using the internals is probably your easiest path.
Writing your own can work; this is what I did when making the protocol-buffers library. My custom Text.ProtocolBuffers.Get module implements enough machinery to do what you want:
import Text.ProtocolBuffers.Get
import Control.Applicative
import qualified Data.ByteString as B

getConsumed :: Get a -> Get (B.ByteString, a)
getConsumed thing = do
  start <- bytesRead
  (a, stop) <- lookAhead ((,) <$> thing <*> bytesRead)
  bs <- getByteString (fromIntegral (stop - start))
  return (bs, a)
This is straightforward because my library tracks the number of bytes read. Otherwise, the API is quite similar to cereal's.
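If switching from cereal to the binary package is an option (an assumption; none of the answers above suggests it), the same bytesRead-based approach carries over, because binary's Data.Binary.Get also exports bytesRead, lookAhead and getByteString. A sketch; consumedPrefix is a hypothetical helper for running it.

```haskell
import Data.Binary.Get
import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as L

-- Run g, and also return exactly the bytes it consumed.
getConsumed :: Get a -> Get (B.ByteString, a)
getConsumed g = do
  start <- bytesRead
  -- Run g inside lookAhead to learn its result and the offset where it
  -- stopped; lookAhead then rewinds the input to start.
  (a, stop) <- lookAhead ((,) <$> g <*> bytesRead)
  -- Consume exactly the bytes g used, so later parsers continue after them.
  bs <- getByteString (fromIntegral (stop - start))
  return (bs, a)

-- Example runner: parse a lazy ByteString with g and keep the consumed bytes.
consumedPrefix :: Get a -> L.ByteString -> (B.ByteString, a)
consumedPrefix g = runGet (getConsumed g)
```

Since bytesRead is an absolute offset rather than the length of the current chunk, the computation does not depend on where chunk boundaries fall.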
