Haskell hGetContents error

Is the following error due to lazy evaluation?
epubParsing :: FilePath -> IO [String]
epubParsing f = do
    h <- openFile f ReadMode
    hSetEncoding h utf8
    content <- hGetContents h
    hClose h
    return . fromJust $ scrapeStringLike content paragraphS
I get an error: hGetContents: illegal operation (delayed read on closed handle)
Why?

Calling hGetContents puts the handle into a special "semi-closed" state. After that, the only explicit operation you may still perform on it is hClose, and closing it before the string has been consumed cuts the lazy read short, which is exactly the error you see. You normally don't close it manually; it gets closed automatically in the background when you read to the end of the string. You can just remove that hClose and it will work.
This is one of the pitfalls of lazy I/O, and one of the reasons people advise to avoid it; it makes the timing of your I/O operations kind of unpredictable.
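If you do want the handle closed at a definite point, one option is to force the whole string before the handle goes away. A minimal sketch, assuming the file fits in memory; the name readUtf8Strict is made up for illustration, and forcing with length is just one of several ways to do it:

```haskell
import Control.Exception (evaluate)
import System.IO

-- Read a whole UTF-8 file eagerly, so that closing the handle is safe.
readUtf8Strict :: FilePath -> IO String
readUtf8Strict path = withFile path ReadMode $ \h -> do
    hSetEncoding h utf8
    content <- hGetContents h
    -- Walking the list to its end forces every character to be read now,
    -- while the handle is still open.
    _ <- evaluate (length content)
    return content
```

With this, epubParsing could call readUtf8Strict f and pass the result to scrapeStringLike without touching the handle at all.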

Related

Why do I get parse error on input "isFileEnd" in Haskell?

Below is my haskell code.
readTableFile :: String -> (Handle -> IO a) -> IO [a]
readTableFile file func = do
    fileHandle <- withFile file ReadMode (\handle -> do
        contents <- readDataFrom handle
        putStr contents)
    where readDataFrom fileHandle = do
        isFileEnd <- hIsEOF fileHandle
        if isFileEnd
            then
                return ("")
            else
                do
                    info <- hGetLine fileHandle
                    putStrLn $ func info
                    readDataFrom fileHandle
But I get an error:
error: parse error on input ‘isFileEnd’
    |
270 |         isFileEnd <- hIsEOF fileHandle
    |         ^^^^^^^^^
I don't know why. Please help me
A couple of things are contributing here. As the commenter above pointed out, when a parse error looks surprising, spacing is always the first thing to check. Before getting to that, though, there is a structural problem:
Your readTableFile is really just one statement long. You've got a do block in which the only thing you do is bind fileHandle to the result of withFile. Aside from the fact that withFile returns whatever your handler returns (and not the file handle that your naming might imply), a do block cannot end with a <- binding; its last statement has to be an expression. Let's clean up some:
readTableFile file func = do
    withFile file ReadMode (\handle -> do
        contents <- readDataFrom handle
        putStr contents)
    where readDataFrom fileHandle = do
        isFileEnd <- hIsEOF fileHandle
        [...]
Now the do block ends in an expression, but you're still going to get a parse error from the isFileEnd <- line, because it is indented less than readDataFrom. You can get past the parse error by moving that line (and the subsequent lines) to the right of the first character of the readDataFrom declaration:
    where readDataFrom fileHandle = do
            isFileEnd <- hIsEOF fileHandle
            [...]
Your top level do is still redundant, but you'll be past your immediate problems.
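For reference, here is a sketch of how the whole function might look once the layout is fixed. It makes assumptions the question doesn't confirm: func is taken to be a pure function on each line (String -> a) rather than Handle -> IO a, and the results are collected into a list instead of being printed:

```haskell
import System.IO

-- Assumed variant of the question's function: apply `func` to every line
-- of the file and return the results in order.
readTableFile :: FilePath -> (String -> a) -> IO [a]
readTableFile file func = withFile file ReadMode readDataFrom
  where
    -- The body is indented further than the name `readDataFrom`,
    -- which is what the layout rule requires.
    readDataFrom h = do
        isFileEnd <- hIsEOF h
        if isFileEnd
            then return []
            else do
                info <- hGetLine h
                rest <- readDataFrom h
                return (func info : rest)
```

Because hGetLine reads each line immediately, all of the reading is finished before withFile closes the handle.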

Interacting with a subprocess while capturing stderr

So I have a Haskell program that interacts with a subprocess using the System.Process.Typed library. I am trying to capture the stderr of the subprocess during the entire duration of the subprocess's lifespan. The current method doesn't work if the subprocess finishes before I get to line *. I think to do this I need to use STM but I don't know anything about STM and so was wondering if there was a simpler way.
fun :: MyType -> IO MyOtherType
fun inparam = withProcessWait config $ \process -> do
    hPutStrLn (getStdin process) (getStr1 inparam)
    hFlush (getStdin process)
    response1 <- hGetLine (getStdout process)
    hPutStrLn (getStdin process) (getStr2 inparam)
    hFlush (getStdin process)
    response2 <- hGetLine (getStdout process)
    err <- hGetContents (getStderr process) -- this is line *
    hClose (getStdin process)
    exitCode <- timedWaitExitCode 100 process
    return $ MyOtherType response1 response2 err
  where
    config = setStdin createPipe
           $ setStdout createPipe
           $ setStderr createPipe
           $ fromString (fp inparam)
Thank you in advance.
Edit 1: Fixed the * label
Edit 2: When I try to run the code I get Exception: [..] hGetContents: illegal operation (delayed read on closed handle)
You did not specify what exactly “doesn’t work” in your code, so I’ll try to guess. One potential issue that I can immediately see is that you are returning a value that was read lazily from a file handle. hGetLine reads its line right away, so response1 and response2 are fine, but hGetContents is lazy: err is not actually read from the handle until it is really needed. And by the time it is needed, the child process has exited and the handle is closed, so it is impossible to read from it.
The simplest fix would be to force the entire string to be read before you “return” from your function. One standard recipe for this is to use force (from Control.DeepSeq) followed by evaluate (from Control.Exception). This will make your program actually read the value and remember it, so the handle can be closed.
So, instead of:
value <- hGetContents handle
you should do:
value' <- hGetContents handle
value <- evaluate $ force value'
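Wrapped up as a helper, the recipe could look like the sketch below. The name hGetContentsForced is hypothetical; force comes from Control.DeepSeq (the deepseq package) and evaluate from Control.Exception:

```haskell
import Control.DeepSeq (force)
import Control.Exception (evaluate)
import System.IO

-- Read everything that is left on a handle before returning.
hGetContentsForced :: Handle -> IO String
hGetContentsForced h = do
    lazyText <- hGetContents h
    -- `force` builds a value that is fully evaluated once demanded;
    -- `evaluate` demands it right here, inside the IO sequence.
    evaluate (force lazyText)
```

In fun above, err <- hGetContentsForced (getStderr process) would then replace line *. Note that this blocks until the subprocess closes its stderr, so it may need to come after the hClose (getStdin process) if the subprocess only exits once its stdin is closed.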

How to reuse efficiently input from stdin in Haskell

I understand that I should not try to re-read from stdin, because of the errors described in "Haskell IO - handle closed".
For example, in the code below:
main = do
    x <- getContents
    putStrLn $ map id x
    x <- getContents --problem line
    putStrLn x
the second call x <- getContents will cause the error:
test: <stdin>: hGetContents: illegal operation (handle is closed)
Of course, I can omit the second call to getContents:
main = do
    x <- getContents
    putStrLn $ map id x
    putStrLn x
But will this become a performance/memory issue? Will GHC have to keep all of the contents read from stdin in the main memory?
I imagine the first time around when x is consumed, GHC can throw away the portions of x that are already processed. So theoretically, GHC could only use a small amount of constant memory for the processing. But since we are going to use x again (and again), it seems that GHC cannot throw away anything. (Nor can it read again from stdin).
Is my understanding about the memory implications here correct? And if so, is there a fix?
Yes, your understanding is correct: If you reuse x, ghc has to keep it all in memory.
I think a possible fix is to consume it lazily (once).
Let's say you want to output x to several output handles hdls :: [Handle]. The naive approach is:
main :: IO ()
main = do
    x <- getContents
    forM_ hdls $ \hdl -> do
        hPutStr hdl x
This will read stdin into x as the first hPutStr traverses the string (at least for unbuffered handles, hPutStr is simply a loop that calls hPutChar for each character in the string). From then on it'll be kept in memory for all following hdls.
Alternatively:
main :: IO ()
main = do
    x <- getContents
    forM_ x $ \c -> do
        forM_ hdls $ \hdl -> do
            hPutChar hdl c
Here we've transposed the loops: Instead of iterating over the handles (and for each handle iterating over the input characters), we iterate over the input characters, and for each character, we print it to each handle.
I haven't tested it, but this form should guarantee that we don't need a lot of memory because each input character c is used once and then discarded.
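A self-contained version of the transposed loop, as a sketch; teeChars is a made-up name, and the handle list is passed in as an argument rather than being a global hdls:

```haskell
import Control.Monad (forM_)
import System.IO

-- Send each character to every handle before moving on to the next one,
-- so no reference to the already-processed input has to be kept.
teeChars :: [Handle] -> String -> IO ()
teeChars hdls input =
    forM_ input $ \c ->
        forM_ hdls $ \hdl ->
            hPutChar hdl c
```

For example, getContents >>= teeChars [stdout, stderr] would copy stdin to both handles.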

Understanding `withFile` with Example

I implemented withFile in Haskell:
withFile' :: FilePath -> IOMode -> (Handle -> IO a) -> IO a
withFile' path iomode f = do
    handle <- openFile path iomode
    result <- f handle
    hClose handle
    return result
When I ran the main provided by Learn You a Haskell, it printed out the content of "girlfriend.txt," as expected:
import System.IO
main = do
    withFile' "girlfriend.txt" ReadMode (\handle -> do
        contents <- hGetContents handle
        putStr contents)
I wasn't sure if my withFile' would've worked with its last 2 lines: (1) closing the handle and (2) returning the result as an IO a.
Why didn't the following happen?
result gets lazily bound to f handle
hClose handle closes the file handle
result gets return'd, which results in the actual evaluation of f handle. Since handle was closed, an error gets thrown.
Lazy IO is popularly known as confusing.
It depends on whether putStr executes before hClose or not.
Notice the difference between the first and second uses (the brackets are unnecessary but clarifying in the second example).
ghci> withFile' "temp.hs" ReadMode (hGetContents >=> putStr) -- putStr
import System.IO
import Control.Monad
withFile' :: FilePath -> IOMode -> (Handle -> IO a) -> IO a
withFile' path iomode f = do
    handle <- openFile path iomode
    result <- f handle
    hClose handle
    return result
ghci> (withFile' "temp.hs" ReadMode hGetContents) >>= putStr
ghci>
In both cases, the f passed in gets a chance to run before the handle is closed. Because of lazy evaluation, hGetContents only reads the file if it needs to, i.e. is forced to in order to produce output for some other function.
In the first example, since f is (hGetContents >=> putStr), the full contents of the file must be read in order to execute putStr.
In the second example, nothing needs to be evaluated after hGetContents in order to return result, which is a lazy list. (I can quite happily return (show [1..]) which will only fail to terminate if I choose to use the entire output.) This is seen as a problem for lazy IO, which is fixed by alternatives such as strict IO, pipes or conduit.
Maybe returning the empty string for a file when the handle was closed prematurely is a bug, but certainly running the entirety of f before closing it is not.
Equational reasoning means that you can reason about Haskell code by just inlining and substituting things (with certain caveats, but they don't apply here).
This means that all I need to do to understand your code is to take the withFile' here:
import System.IO
main = do
    withFile' "girlfriend.txt" ReadMode (\handle -> do
        contents <- hGetContents handle
        putStr contents)
... and inline its definition:
main = do
    handle <- openFile "girlfriend.txt" ReadMode
    contents <- hGetContents handle
    result <- putStr contents
    hClose handle
    return result
Once you inline its definition, it's easier to see what is going on. putStr evaluates the entire contents of the file before you close the handle, so there is no error. Also, result is not what you think it is: it's the return value of putStr, which is just (), not the contents of the file.
Most IO actions are not lazily executed.
IO action execution is different from normal Haskell evaluation of values. IO execution is only ever carried out by the outer driver that is trying to execute all the effects of main; it does so in the correct order implied by the monadic sequencing of IO actions.
The driver's need to know what the next IO action is ultimately triggers all evaluation of lazy values in Haskell; if it were happy with an unevaluated lazy value and moved on to the next thing without fully evaluating and executing it, then it would just leave main unevaluated and no Haskell program could ever do anything.
The Haskell value resulting from executing an IO action may of course be an unevaluated lazy value, but each IO action itself is evaluated and executed by the driver (including all sub-actions sequenced with do blocks or binds).
So result doesn't get lazily bound to f handle completely unevaluated; f handle is evaluated to come up with the sub actions hGetContents handle and putStr contents. These are both fully executed before the outer driver moves on to hClose handle, so everything's okay.
Note however that hGetContents is special. Quoting from the documentation:
Computation hGetContents hdl returns the list of characters corresponding to the unread portion of the channel or file managed by hdl, which is put into an intermediate state, semi-closed. In this state, hdl is effectively closed, but items are read from hdl on demand and accumulated in a special list returned by hGetContents hdl.
Any operation that fails because a handle is closed, also fails if a handle is semi-closed. The only exception is hClose. A semi-closed handle becomes closed:
if hClose is applied to it;
if an I/O error occurs when reading an item from the handle;
or once the entire contents of the handle has been read.
Once a semi-closed handle becomes closed, the contents of the associated list becomes fixed. The contents of this final list is only partially specified: it will contain at least all the items of the stream that were evaluated prior to the handle becoming closed.
So executing hGetContents handle actually results in a partially evaluated list, whose lazy evaluation is tied to further IO operations under the hood. This is impossible to do yourself without using the unsafe family of operations (such as unsafeInterleaveIO), since it essentially bypasses the type system and can result in exactly the sort of problem you were concerned about. Suppose you had attempted the following code:
main = do
    text <- withFile' "girlfriend.txt" ReadMode (\handle -> do
        contents <- hGetContents handle
        return contents)
    putStr text
(where the function passed to withFile' tries to return the file contents, and they are passed to putStr after the withFile' call), then the putStr would be executed after hClose, and the file may well not have been fully read before it was closed.

In Haskell, I want to read a file and then write to it. Do I need strictness annotation?

Still quite new to Haskell..
I want to read the contents of a file, do something with it possibly involving IO (using putStrLn for now) and then write new contents to the same file.
I came up with:
doit :: String -> IO ()
doit file = do
    contents <- withFile file ReadMode $ \h -> hGetContents h
    putStrLn contents
    withFile file WriteMode $ \h -> hPutStrLn h "new content"
However this doesn't work due to laziness. The file contents are not printed. I found this post which explains it well.
The solution proposed there is to include putStrLn within the withFile:
doit :: String -> IO ()
doit file = do
    withFile file ReadMode $ \h -> do
        contents <- hGetContents h
        putStrLn contents
    withFile file WriteMode $ \h -> hPutStrLn h "new content"
This works, but it's not what I want to do. The operation that will eventually replace putStrLn might be long, and I don't want to keep the file open the whole time. In general I just want to be able to get the file content out and then close the file before working with that content.
The solution I came up with is the following:
doit :: String -> IO ()
doit file = do
    c <- newIORef ""
    withFile file ReadMode $ \h -> do
        a <- hGetContents h
        writeIORef c $! a
    d <- readIORef c
    putStrLn d
    withFile file WriteMode $ \h -> hPutStrLn h "Test"
However, I find this long and a bit obfuscated. I don't think I should need an IORef just to get a value out, but I needed a "place" to put the file contents. Also, it still didn't work without the strictness annotation $! for writeIORef. I guess IORefs are not strict by nature?
Can anyone recommend a better, shorter way to do this while keeping my desired semantics?
Thanks!
The reason your first program does not work is that withFile closes the file after executing the IO action passed to it. In your case, the IO action is hGetContents which does not read the file right away, but only as its contents are demanded. By the time you try to print the file's contents, withFile has already closed the file, so the read fails (silently).
You can fix this issue by not reinventing the wheel and simply using readFile and writeFile:
doit file = do
    contents <- readFile file
    putStrLn contents
    writeFile file "new content"
But suppose you want the new content to depend on the old content. Then you cannot, generally, simply do
doit file = do
    contents <- readFile file
    writeFile file $ process contents
because the writeFile may affect what the readFile returns (remember, it has not actually read the file yet). Or, depending on your operating system, you might not be able to open the same file for reading and writing on two separate handles. The simple but ugly workaround is
doit file = do
    contents <- readFile file
    length contents `seq` (writeFile file $ process contents)
which will force readFile to read the entire file and close it before the writeFile action can begin.
I think the easiest way to solve this problem is using strict IO (System.IO.Strict comes from the strict package):
import qualified System.IO.Strict as S
main = do
    file <- S.readFile "filename"
    writeFile "filename" file
You can duplicate the file Handle, do a lazy write with the original one (to the end of the file) and a lazy read with the other. So no strictness annotation is involved when appending to the file.
import System.IO
import GHC.IO.Handle
main :: IO ()
main = do
    h <- openFile "filename" ReadWriteMode
    h2 <- hDuplicate h
    hSeek h2 AbsoluteSeek 0
    originalFileContents <- hGetContents h2
    putStrLn originalFileContents
    hSeek h SeekFromEnd 0
    hPutStrLn h $ concatMap ("{new_contents}" ++) (lines originalFileContents)
    hClose h2
    hClose h
The hDuplicate function is provided by the GHC.IO.Handle module. Its documentation says:
Returns a duplicate of the original handle, with its own buffer. The two Handles will share a file pointer, however. The original handle's buffer is flushed, including discarding any input data, before the handle is duplicated.
With hSeek you can set the position of the handle before reading or writing.
But I'm not sure how reliable it would be to use "AbsoluteSeek 0" instead of "SeekFromEnd 0" for writing, i.e. overwriting the contents. Generally I would suggest writing to a temporary file first, for example using openTempFile (from System.IO), and then replacing the original.
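The temporary-file approach could be sketched like this. rewriteViaTemp and the "rewrite.tmp" template are names invented for the example; it assumes the directory and filepath packages are available, and that the temporary file can live in the same directory as the original so that renameFile stays on one filesystem:

```haskell
import Control.Exception (evaluate)
import System.Directory (renameFile)
import System.FilePath (takeDirectory)
import System.IO

-- Replace a file's contents with `process` applied to the old contents,
-- going through a temporary file in the same directory.
rewriteViaTemp :: FilePath -> (String -> String) -> IO ()
rewriteViaTemp path process = do
    contents <- readFile path
    -- Force the lazy read to the end so the original file is closed
    -- before anything tries to replace it.
    _ <- evaluate (length contents)
    (tmpPath, tmpHandle) <- openTempFile (takeDirectory path) "rewrite.tmp"
    hPutStr tmpHandle (process contents)
    hClose tmpHandle
    -- Swap the finished temporary file into place.
    renameFile tmpPath path
```

The original file is only replaced after the new contents have been written out completely, so a failure part-way through leaves the old file untouched.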
It's ugly but you can force the contents to be read by asking for the length of the input and seq'ing it with the next statement in your do-block. But really the solution is to use a strict version of hGetContents. I'm not sure what it's called.

Resources