Using Data.Binary.decodeFile, encountered error "demandInput: not enough bytes" - haskell

I'm attempting to use the encodeFile and decodeFile functions in Data.Binary to save a very large datastructure so that I don't have to recompute it every time I run this program. The relevant encoding- and decoding-functions are as follows:
writePlan :: IO ()
writePlan = do (d, _, bs) <- return subjectDomain
               outHandle <- openFile "outputfile" WriteMode
               ((ebsP, aP), cacheData) <- preplanDomain d bs
               putStrLn "Calculated."
               let toWrite = ((map pseudofyOverEBS ebsP, aP),
                              pseudofyOverMap cacheData) :: WrittenData
                 in do encodeFile preplanFilename $ encode toWrite
                       putStrLn "Done."

readPlan :: IO (([EvaluatedBeliefState], [Action]), MVar HeuCache)
readPlan = do (d, _, _) <- return subjectDomain
              inHandle <- openFile "outputfile" ReadMode
              ((ebsP, aP), cacheData) <- decodeFile preplanFilename :: IO WrittenData
              fancyCache <- newMVar (M.empty, depseudofyOverMap cacheData)
              return $! ((map depseudofyOverEBS ebsP, aP), fancyCache)
The program to calculate and write the file (using writePlan) executes without error, outputting a gigantic binary file. However, when I run the program which takes in this file, executing readPlan results in the error (the program name is "Realtime"):
Realtime: demandInput: not enough bytes
I can't make head nor tail of this, and scouring Google has turned up no substantial documentation or discussion of this message. Any insight would be appreciated!

I am very late to the party, but I found this while looking for help with a similar issue; I'm working with the incremental interface of Data.Binary.Get. The function that raises this error, demandInput, is defined in Data.Binary.Get.Internal. The parser calls it when it needs more input, and it fails with "demandInput: not enough bytes" when the input has already run out. So, although I am partly guessing, your decodeFile call most likely runs the Get parser for your type and the file ends before the parser has everything it expects (i.e. the parser thinks there must be something else in the file, but it has already reached EOF).
Hope that helps anyone with this/similar issues!
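One thing worth checking in the question's writePlan: encodeFile already applies encode to its argument. So encodeFile preplanFilename $ encode toWrite stores the Binary encoding of a lazy ByteString (an 8-byte length prefix followed by the payload) rather than the WrittenData value itself. decodeFile ... :: IO WrittenData then reads that prefix as if it were the start of a WrittenData, which plausibly makes the parser ask for more bytes than the file contains. The sketch below uses a small stand-in type, Sample (hypothetical, in place of WrittenData), to show the plain round trip and the extra prefix. The file names are arbitrary.

```haskell
import Data.Binary (decodeFile, encode, encodeFile)
import qualified Data.ByteString.Lazy as BL

-- Stand-in for the question's WrittenData; any type with a Binary
-- instance behaves the same way.
type Sample = ([Int], [String])

main :: IO ()
main = do
  let x = ([1, 2, 3], ["a", "b"]) :: Sample

  -- Correct round trip: encodeFile applies encode itself,
  -- so it is given the value, not an already-encoded ByteString.
  encodeFile "sample.bin" x
  y <- decodeFile "sample.bin" :: IO Sample
  print (y == x)

  -- Double encoding: this writes the Binary encoding of a lazy
  -- ByteString, i.e. an 8-byte length prefix followed by the payload.
  encodeFile "double.bin" (encode x)
  raw <- BL.readFile "double.bin"
  print (BL.length raw - BL.length (encode x))
```

Run as-is, it should print True and then 8.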

Related

Why do I get parse error on input "isFileEnd" in Haskell?

Below is my Haskell code.
readTableFile :: String -> (Handle -> IO a) -> IO [a]
readTableFile file func = do
    fileHandle <- withFile file ReadMode (\handle -> do
        contents <- readDataFrom handle
        putStr contents)
    where readDataFrom fileHandle = do
        isFileEnd <- hIsEOF fileHandle
        if isFileEnd
            then
                return ("")
            else
                do
                    info <- hGetLine fileHandle
                    putStrLn $ func info
                    readDataFrom fileHandle
But I get an error:
error: parse error on input ‘isFileEnd’
    |
270 |         isFileEnd <- hIsEOF fileHandle
    |         ^^^^^^^^^
I don't know why. Please help me
You've got a couple of things going on here. As the commenter above pointed out, when you get parse errors that look surprising, spacing is always the first thing to check. But a couple of other things are contributing as well:
Your readTableFile is really just one statement long: a do block whose only statement binds fileHandle to the result of withFile. That can't work on its own, because the last statement of a do block must be an expression rather than a binding. Also, withFile returns whatever your handler returns (here the () from putStr), not the file handle that the name fileHandle implies. Let's clean that up:
readTableFile file func = do
    withFile file ReadMode (\handle -> do
        contents <- readDataFrom handle
        putStr contents)
    where readDataFrom fileHandle = do
        isFileEnd <- hIsEOF fileHandle
        [...]
Now the do block ends in an expression, but you're still going to get a parse error from the isFileEnd <- line. With the rest cleaned up, you can get past the parse error by moving that line (and the lines after it) to the right of the first character of the readDataFrom declaration:
    where readDataFrom fileHandle = do
              isFileEnd <- hIsEOF fileHandle
              [...]
Your top level do is still redundant, but you'll be past your immediate problems.
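For reference, here is a sketch of a version that compiles. It assumes two things the question does not say: that the per-line function is pure (String -> a) rather than Handle -> IO a, and that the goal is to return the converted lines instead of printing them. The sample file name is also just an assumption. Note where the body of readDataFrom's do block starts relative to readDataFrom itself.

```haskell
import System.IO

-- Hypothetical variant of the question's function: the per-line function is
-- pure (String -> a) instead of (Handle -> IO a), and every converted line is
-- returned rather than printed.
readTableFile :: FilePath -> (String -> a) -> IO [a]
readTableFile file func = withFile file ReadMode readDataFrom
  where
    -- The do-block body starts to the right of 'readDataFrom',
    -- so the layout rule keeps it inside this binding.
    readDataFrom fileHandle = do
      isFileEnd <- hIsEOF fileHandle
      if isFileEnd
        then return []
        else do
          info <- hGetLine fileHandle
          rest <- readDataFrom fileHandle
          return (func info : rest)

main :: IO ()
main = do
  writeFile "table.txt" "1\n2\n3\n"   -- hypothetical sample file
  xs <- readTableFile "table.txt" (read :: String -> Int)
  print (sum xs)
```

Every line is read inside the withFile callback before it returns, so the result does not depend on lazy IO. The program prints 6.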

How to reuse efficiently input from stdin in Haskell

I understand that I should not try to re-read from stdin, because doing so fails with a "handle is closed" error.
For example, in the code below:
main = do
    x <- getContents
    putStrLn $ map id x
    x <- getContents --problem line
    putStrLn x
the second call x <- getContents will cause the error:
test: <stdin>: hGetContents: illegal operation (handle is closed)
Of course, I can drop the second call to getContents and reuse x:
main = do
    x <- getContents
    putStrLn $ map id x
    putStrLn x
But will this become a performance/memory issue? Will GHC have to keep all of the contents read from stdin in the main memory?
I imagine the first time around when x is consumed, GHC can throw away the portions of x that are already processed. So theoretically, GHC could only use a small amount of constant memory for the processing. But since we are going to use x again (and again), it seems that GHC cannot throw away anything. (Nor can it read again from stdin).
Is my understanding about the memory implications here correct? And if so, is there a fix?
Yes, your understanding is correct: If you reuse x, ghc has to keep it all in memory.
I think a possible fix is to consume it lazily (once).
Let's say you want to output x to several output handles hdls :: [Handle]. The naive approach is:
main :: IO ()
main = do
    x <- getContents
    forM_ hdls $ \hdl -> do
        hPutStr hdl x
This will read stdin into x as the first hPutStr traverses the string (at least for unbuffered handles, hPutStr is simply a loop that calls hPutChar for each character in the string). From then on it'll be kept in memory for all following hdls.
Alternatively:
main :: IO ()
main = do
    x <- getContents
    forM_ x $ \c -> do
        forM_ hdls $ \hdl -> do
            hPutChar hdl c
Here we've transposed the loops: Instead of iterating over the handles (and for each handle iterating over the input characters), we iterate over the input characters, and for each character, we print it to each handle.
I haven't tested it, but this form should guarantee that we don't need a lot of memory because each input character c is used once and then discarded.
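Here is a self-contained version of the transposed loop, with the pair of loops pulled out into a function. The name fanOut and the choice of stdout and stderr as the target handles are assumptions for illustration. As in the answer, this has not been profiled, so the small memory footprint is expected rather than measured.

```haskell
import Control.Monad (forM_)
import System.IO

-- The transposed loop from the answer: write each input character to every
-- handle before moving on, so a character can be discarded as soon as it has
-- been written everywhere.
fanOut :: [Handle] -> String -> IO ()
fanOut hdls input =
  forM_ input $ \c ->
    forM_ hdls $ \hdl ->
      hPutChar hdl c

main :: IO ()
main = do
  x <- getContents
  -- Hypothetical choice of output handles; the answer leaves hdls abstract.
  fanOut [stdout, stderr] x
```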

Where is the space leak in this code?

I am trying to split a file into two separate files by alternating lines. (i.e. lines 1,3,5,7.. written to file 1 and lines 2,4,6,8... written to file 2).
The file I am working with is ~700MB, so when I saw the memory usage balloon to over 6GB, I knew something was wrong.
main :: IO ()
main = withFile splitFile ReadMode splitData
  where
    splitData h = do
      dataSet <- lines <$> hGetContents h
      let (s1,s2) = foldl' (\(l,r) x -> (x:r,l)) ([],[]) dataSet
      writeFile testFile $ unlines s1
      writeFile trainingFile $ unlines s2
I initially was using the lazy version of foldl, but after some research it seemed that using the strict version would help. But alas, it made no noticeable difference. I also tried compiling with -O2, but that did nothing either.
I am using GHC 7.10.2
I'm not getting a stack overflow, so what is it using all that memory for?
As mentioned in a comment by @dfeuer, writeFile forces the entire string it writes to be computed, which in turn forces the entire input to be read. The space leak comes from the fact that the entire second file must be kept in memory while the first file is being written, when only one line at a time actually needs to be in memory. (The left fold makes this worse: foldl or foldl' cannot return the pair until it has consumed the whole list, and foldl' only forces the pair to weak head normal form, so every line is held in memory before anything is written.) The solution is to write line by line:
import Control.Monad
import System.IO

main :: IO ()
main =
    withFile splitFile ReadMode $ \hIn ->
    withFile testFile WriteMode $ \hOdd ->
    withFile trainingFile WriteMode $ \hEven ->
    zipWithM_ hPutStrLn (cycle [hOdd, hEven]) . lines =<< hGetContents hIn
This program runs in constant space.
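It may also help to see what the question's fold actually produces. Tracing foldl' (\(l,r) x -> (x:r,l)) ([],[]) over [a,b,c] gives ([c,a],[b]). Each half comes out reversed, and which half ends up in s1 depends on whether the number of lines is odd or even. Below is a sketch of a lazy, order-preserving split; alternate is a hypothetical helper, not from the question or the answer. With it, writeFile can start on the first half immediately. As the answer explains, though, the even-numbered lines are still retained until the second writeFile consumes them, so memory still grows with the input, and writing line by line remains the fix.

```haskell
-- A lazy, order-preserving split (hypothetical helper):
-- odd-numbered elements go left, even-numbered ones go right.
alternate :: [a] -> ([a], [a])
alternate (x : y : rest) = let (xs, ys) = alternate rest in (x : xs, y : ys)
alternate [x] = ([x], [])
alternate [] = ([], [])

main :: IO ()
main = print (alternate (lines "l1\nl2\nl3\nl4\nl5"))
```

This prints (["l1","l3","l5"],["l2","l4"]).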

Catching/hijacking stdout in haskell

How can I define 'catchOutput' so that running main outputs only 'bar'?
That is, how can I access both the output stream (stdout) and the actual output of an io action separately?
catchOutput :: IO a -> IO (a,String)
catchOutput = undefined

doSomethingWithOutput :: IO a -> IO ()
doSomethingWithOutput io = do
    (_ioOutp, stdOutp) <- catchOutput io
    if stdOutp == "foo"
        then putStrLn "bar"
        else putStrLn "fail!"

main = doSomethingWithOutput (putStr "foo")
The best hypothetical "solution" I've found so far involves diverting stdout to a file stream (inspired by this) and then reading from that file. Besides being super-ugly, I haven't been able to read from the file directly after writing to it. Is it possible to create a "custom buffer stream" that doesn't have to be stored in a file? Still, that feels 'a bit' like a side track.
Another angle seems to be to use 'hGetContents stdout', if that does what I think it should. But I'm not given permission to read from stdout, although googling suggests that it has been used.
I used the following function for a unit test of a function that prints to stdout.
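import Control.Exception (evaluate)
import GHC.IO.Handle
import System.IO
import System.Directory

catchOutput :: IO () -> IO String
catchOutput f = do
    tmpd <- getTemporaryDirectory
    (tmpf, tmph) <- openTempFile tmpd "haskell_stdout"
    stdout_dup <- hDuplicate stdout    -- save the real stdout
    hDuplicateTo tmph stdout           -- stdout now writes to the temp file
    hClose tmph
    f
    hFlush stdout                      -- push any buffered output into the file
    hDuplicateTo stdout_dup stdout     -- restore the real stdout
    hClose stdout_dup
    str <- readFile tmpf
    _ <- evaluate (length str)         -- finish the lazy read before deleting
    removeFile tmpf
    return str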
I am not sure about the in-memory file approach, but it works okay for a small amount of output with a temporary file.
There are some packages on Hackage that promise to do that: io-capture and silently. silently seems to be maintained and works on Windows too (io-capture only works on Unix). With silently, you use capture:
import System.IO.Silently

main = do
    (output, _) <- capture $ putStr "hello"
    putStrLn $ output ++ " world"
Note that it works by redirecting output to a temporary file and then reading it back, but as long as it works!
Why not just use a writer monad instead? For example,
import Control.Monad.Writer

doSomethingWithOutput :: WriterT String IO a -> IO ()
doSomethingWithOutput io = do
    (_, res) <- runWriterT io
    if res == "foo"
        then putStrLn "bar"
        else putStrLn "fail!"

main = doSomethingWithOutput (tell "foo")
Alternatively, you could modify your inner action to take a Handle to write to instead of stdout. You can then use something like knob to make an in-memory file handle which you can pass to the inner action, and check its contents afterward.
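Here is a sketch of the handle-passing alternative. The inner action (greet, a hypothetical name) takes the Handle it writes to. For the check, it is run against a temporary file instead of a knob handle, to avoid the extra dependency. captureVia is likewise a hypothetical helper, not part of any library.

```haskell
import System.Directory (getTemporaryDirectory, removeFile)
import System.IO

-- Hypothetical inner action, rewritten to take the Handle it writes to
-- instead of writing to stdout directly.
greet :: Handle -> IO ()
greet h = hPutStr h "foo"

-- Run a handle-taking action against a temporary file and return both its
-- result and everything it wrote.
captureVia :: (Handle -> IO a) -> IO (a, String)
captureVia act = do
  tmpDir <- getTemporaryDirectory
  (path, h) <- openTempFile tmpDir "capture.txt"
  a <- act h
  hClose h
  out <- readFile path
  length out `seq` removeFile path  -- finish the lazy read before deleting
  return (a, out)

main :: IO ()
main = do
  (_, out) <- captureVia greet
  putStrLn (if out == "foo" then "bar" else "fail!")
  -- In normal use the same action is simply given stdout: greet stdout
```

Running main should print bar.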
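As @hammar pointed out, you can use a knob to create an in-memory file, but you can also use hDuplicate and hDuplicateTo to redirect stdout to the in-memory file and back again. Something like the following completely untested code (note that GHC's hDuplicateTo may refuse to duplicate between handles backed by different kinds of device, such as a knob handle and the real stdout, in which case the temporary-file approach above is the fallback):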
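import Control.Exception (bracket)
import qualified Data.ByteString.Char8 as B
import Data.Knob (newKnob, newFileHandle)
import qualified Data.Knob
import GHC.IO.Handle (hDuplicate, hDuplicateTo)
import System.IO

catchOutput :: IO a -> IO (a, String)
catchOutput io = do
    knob <- newKnob (B.pack [])
    let before = do
            h <- newFileHandle knob "<stdout>" WriteMode
            stdout' <- hDuplicate stdout
            hDuplicateTo h stdout
            hClose h
            return stdout'
        after stdout' = do
            hDuplicateTo stdout' stdout
            hClose stdout'
    -- bracket (not bracket_) so that 'after' receives the saved stdout handle
    a <- bracket before after (const io)
    bytes <- Data.Knob.getContents knob
    return (a, B.unpack bytes)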

In Haskell, I want to read a file and then write to it. Do I need strictness annotation?

Still quite new to Haskell.
I want to read the contents of a file, do something with it possibly involving IO (using putStrLn for now) and then write new contents to the same file.
I came up with:
doit :: String -> IO ()
doit file = do
    contents <- withFile file ReadMode $ \h -> hGetContents h
    putStrLn contents
    withFile file WriteMode $ \h -> hPutStrLn h "new content"
However this doesn't work due to laziness. The file contents are not printed. I found this post which explains it well.
The solution proposed there is to include putStrLn within the withFile:
doit :: String -> IO ()
doit file = do
    withFile file ReadMode $ \h -> do
        contents <- hGetContents h
        putStrLn contents
    withFile file WriteMode $ \h -> hPutStrLn h "new content"
This works, but it's not what I want to do. The operation I will eventually replace putStrLn with might be long, and I don't want to keep the file open the whole time. In general, I just want to be able to get the file content out and then close the file before working with that content.
The solution I came up with is the following:
doit :: String -> IO ()
doit file = do
    c <- newIORef ""
    withFile file ReadMode $ \h -> do
        a <- hGetContents h
        writeIORef c $! a
    d <- readIORef c
    putStrLn d
    withFile file WriteMode $ \h -> hPutStrLn h "Test"
However, I find this long and a bit obfuscated. I don't think I should need an IORef just to get a value out, but I needed a "place" to put the file contents. Also, it still didn't work without the strictness annotation $! for writeIORef. I guess IORefs are not strict by nature?
Can anyone recommend a better, shorter way to do this while keeping my desired semantics?
Thanks!
The reason your first program does not work is that withFile closes the file after executing the IO action passed to it. In your case, the IO action is hGetContents, which does not read the file right away, but only as its contents are demanded. By the time you try to print the file's contents, withFile has already closed the file, so the read fails: depending on the GHC version, you either silently get nothing or an error such as "hGetContents: illegal operation (delayed read on closed handle)".
You can fix this issue by not reinventing the wheel and simply using readFile and writeFile:
doit file = do
    contents <- readFile file
    putStrLn contents
    writeFile file "new content"
But suppose you want the new content to depend on the old content. Then you cannot, generally, simply do
doit file = do
    contents <- readFile file
    writeFile file $ process contents
because the writeFile may affect what the readFile returns (remember, it has not actually read the file yet). Moreover, GHC's runtime enforces multiple-reader/single-writer locking on files, so opening the file for writing while the lazy read still holds it open will typically fail with "openFile: resource busy (file is locked)". The simple but ugly workaround is
doit file = do
    contents <- readFile file
    length contents `seq` (writeFile file $ process contents)
which will force readFile to read the entire file and close it before the writeFile action can begin.
I think the easiest way to solve this problem is to use strict IO (System.IO.Strict from the strict package):
import qualified System.IO.Strict as S

main = do
    file <- S.readFile "filename"
    writeFile "filename" file
You can duplicate the file Handle, do a lazy write with the original one (at the end of the file) and a lazy read with the duplicate. That way, no strictness annotation is needed when you are appending to the file.
import System.IO
import GHC.IO.Handle

main :: IO ()
main = do
    h <- openFile "filename" ReadWriteMode
    h2 <- hDuplicate h
    hSeek h2 AbsoluteSeek 0
    originalFileContents <- hGetContents h2
    putStrLn originalFileContents
    hSeek h SeekFromEnd 0
    hPutStrLn h $ concatMap ("{new_contents}" ++) (lines originalFileContents)
    hClose h2
    hClose h
The hDuplicate function is provided by GHC.IO.Handle module.
Returns a duplicate of the original handle, with its own buffer. The two Handles will share a file pointer, however. The original handle's buffer is flushed, including discarding any input data, before the handle is duplicated.
With hSeek you can set the position of the handle before reading or writing.
But I'm not sure how reliable it would be to use "AbsoluteSeek 0" instead of "SeekFromEnd 0" for writing, i.e. for overwriting the contents. In general, I would suggest writing to a temporary file first, for example using openTempFile (from System.IO), and then replacing the original.
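The temporary-file route can be sketched like this. rewriteFile is a hypothetical helper. It reads the whole original first, writes the new contents to a temporary file created with openTempFile in the same directory, and then replaces the original using renameFile from the directory package. takeDirectory comes from the filepath package. One caveat: the replacement is a new file, so it may not keep the original's permissions.

```haskell
import Control.Exception (evaluate)
import System.Directory (renameFile)
import System.FilePath (takeDirectory)
import System.IO

-- Rewrite 'file' by writing the new contents to a temporary file in the same
-- directory and then renaming it over the original.
rewriteFile :: FilePath -> (String -> String) -> IO ()
rewriteFile file f = do
  contents <- readFile file
  _ <- evaluate (length contents)  -- read to EOF, which closes the input handle
  (tmpPath, hTmp) <- openTempFile (takeDirectory file) "rewrite.tmp"
  hPutStr hTmp (f contents)
  hClose hTmp
  renameFile tmpPath file

main :: IO ()
main = do
  writeFile "demo.txt" "hello\n"   -- hypothetical sample file
  rewriteFile "demo.txt" (++ "world\n")
  readFile "demo.txt" >>= putStr
```

Running main should leave demo.txt containing hello followed by world, each on its own line.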
It's ugly, but you can force the contents to be read by asking for the length of the input and seq'ing it with the next statement in your do block. But really, the solution is to use a strict version of hGetContents, such as hGetContents' from System.IO (available since base 4.15) or hGetContents from System.IO.Strict in the strict package.
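Here is a sketch of the forcing approach applied to the question's doit. readWhole is a hypothetical helper that forces the length of the contents inside withFile. That makes hGetContents read to end of file before the handle is closed. It also explains why writeIORef c $! a is not guaranteed to read the whole file: $! only evaluates to weak head normal form, which for a String means just the outermost constructor (the first cons cell).

```haskell
import Control.Exception (evaluate)
import System.IO

-- Read the whole file inside withFile, forcing every character before the
-- handle is closed, so the String stays valid after withFile returns.
readWhole :: FilePath -> IO String
readWhole file = withFile file ReadMode $ \h -> do
  contents <- hGetContents h
  _ <- evaluate (length contents)  -- forces the full spine, i.e. reads to EOF
  return contents

doit :: FilePath -> IO ()
doit file = do
  contents <- readWhole file       -- the file is closed once this returns
  putStrLn contents                -- long-running work could go here
  withFile file WriteMode $ \h -> hPutStrLn h "new content"

main :: IO ()
main = do
  writeFile "demo.txt" "old content\n"   -- hypothetical sample file
  doit "demo.txt"
  readWhole "demo.txt" >>= putStr
```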
