When I try to run this code...
module Main where
import qualified Data.Text.Lazy.IO as LTIO
import qualified Data.Text.Lazy as LT
import System.IO (IOMode(..), withFile)
getFirstLine :: FilePath -> IO String
getFirstLine path =
  withFile path ReadMode (\f -> do
    contents <- LTIO.hGetContents f
    return ("-- " ++ (LT.unpack . head $ LT.lines contents) ++ " --"))
main :: IO ()
main = do
  firstLine <- getFirstLine "/tmp/foo.csv"
  print firstLine
I get
"-- *** Exception: Prelude.head: empty list
... where I would expect it to print the first line of "/tmp/foo.csv". Could you please explain why? Ultimately, I'm trying to figure out how to create a lazy list of Texts from file input.
As Daniel Lyons mentions in a comment, this is due to IO and laziness interacting.
Imagine, if you will:
withFile opens the file as handle f.
A thunk that uses the contents of f is returned.
withFile closes the file.
The thunk is evaluated. There are no contents in a closed file.
This trap is mentioned on the HaskellWiki / Maintaining laziness page.
To fix, you can either read the whole file contents within withFile (possibly by forcing it with seq) or lazily close the file instead of using withFile.
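Here is a minimal sketch of the first fix. It assumes the text package that the question already uses. The idea is to force the part of the lazy Text you need while withFile still holds the handle. The demo file name foo-demo.csv is hypothetical. The case expression replaces head, so an empty file yields an empty line instead of a crash.

```haskell
module Main where

import Control.Exception (evaluate)
import qualified Data.Text.Lazy as LT
import qualified Data.Text.Lazy.IO as LTIO
import System.IO (IOMode (..), withFile)

-- Read the first line, forcing it before withFile closes the handle.
getFirstLine :: FilePath -> IO String
getFirstLine path =
  withFile path ReadMode $ \f -> do
    contents <- LTIO.hGetContents f
    let firstLine = case LT.lines contents of
          (l : _) -> LT.unpack l
          []      -> ""  -- empty file: there is no first line
    -- Walking the whole String makes the runtime read every chunk this
    -- line depends on while the handle is still open.
    _ <- evaluate (length firstLine)
    return ("-- " ++ firstLine ++ " --")

main :: IO ()
main = do
  -- Hypothetical demo file so the example is self-contained.
  writeFile "foo-demo.csv" "a,b,c\n1,2,3\n"
  getFirstLine "foo-demo.csv" >>= putStrLn
```

Here evaluate is used rather than seq so that the forcing happens as an IO step at a well-defined point, before withFile returns.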
I think it's like this: withFile closes the file after executing the function. hGetContents reads the contents lazily (lazy IO), and by the time the contents are actually needed, the file is already closed.
Instead of using withFile, try just using openFile and not closing it. hGetContents puts the handle into a semi-closed state and closes it once the whole file has been read. Or, better, just read the contents directly using readFile.
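The openFile and readFile routes from this answer might look like the sketch below. It again assumes the text package. Nothing closes the handle explicitly, so the lazy Text stays readable. The handle is closed once the stream has been read to the end, or possibly later when the unread remainder is garbage-collected. The second function, linesOf, also addresses the "lazy list of Texts" part of the question. Both function names and the file name are hypothetical.

```haskell
import qualified Data.Text.Lazy as LT
import qualified Data.Text.Lazy.IO as LTIO
import System.IO (IOMode (..), openFile)

-- openFile without withFile: hGetContents semi-closes the handle, and it is
-- closed once the whole file has been read (or the list is collected).
firstLineNoClose :: FilePath -> IO String
firstLineNoClose path = do
  h <- openFile path ReadMode
  contents <- LTIO.hGetContents h
  let firstLine = case LT.lines contents of
        (l : _) -> LT.unpack l
        []      -> ""
  return ("-- " ++ firstLine ++ " --")

-- Handle-free alternative: a lazily read list of lines, one Text per line.
linesOf :: FilePath -> IO [LT.Text]
linesOf path = LT.lines <$> LTIO.readFile path

main :: IO ()
main = do
  writeFile "foo-demo-b.csv" "a,b,c\n1,2,3\n"  -- hypothetical demo file
  firstLineNoClose "foo-demo-b.csv" >>= putStrLn
  ls <- linesOf "foo-demo-b.csv"
  mapM_ (putStrLn . LT.unpack) ls
```

The trade-off is that the file stays open until it is fully consumed. Programs that open many files this way can run out of file descriptors.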
Here I'm back again with a (for me) really strange behaviour of my newest masterpiece...
This code should read a file, but it doesn't:
readCsvContents :: String -> IO String
readCsvContents fileName = do
  withFile fileName ReadMode (\handle -> do
    contents <- hGetContents handle
    return contents)
main = do
  contents <- readCsvContents "src\\EURUSD60.csv"
  putStrLn ("Read " ++ show (length contents) ++ " Bytes input data.")
The result is
Read 0 Bytes input data.
Now I changed the first function and added a putStrLn:
readCsvContents :: String -> IO String
readCsvContents fileName = do
  withFile fileName ReadMode (\handle -> do
    contents <- hGetContents handle
    putStrLn ("hGetContents gave " ++ show (length contents) ++ " Bytes of input data.")
    return contents)
and the result is
hGetContents gave 3479360 Bytes of input data.
Read 3479360 Bytes input data.
WTF ??? Well, I know, Haskell is lazy. But I didn't know I had to kick it in the butt like this.
You're right, this is a pain. For this reason, avoid the old standard file IO module, except to simply read an entire file that won't change, as you did; that can be done just fine with readFile.
readCsvContents :: FilePath -> IO String
readCsvContents fileName = do
  contents <- readFile fileName
  return contents
Note that, by the monad laws, this is exactly the same [1] as
readCsvContents = readFile
The problem with what you tried is that the handle is closed unconditionally when the monad exits withFile, without checking whether lazy evaluation of contents has actually forced the file reads. That is of course horrible; I would never bother to use handles myself. readFile avoids the problem by linking the closing of the handle to garbage collection of the original result thunk; this isn't altogether nice either, but it often works quite well.
For proper work with file IO, check out either the conduit or pipes library. The former focuses a bit more on performance, the latter more on elegance (but really, the difference isn't that big).
[1] And your first try is the same as readCsvContents fn = withFile fn ReadMode hGetContents.
This is a problem with lazy IO. What happens in your code is that withFile opens the file, passes the handle to the lambda. This lambda returns a lazy list containing the contents of the file. Then withFile notices that the callback finished and closes the file.
Since the returned list is lazy, the file contents will only be read when the list is evaluated. This happens in the call to length. However, at this point the file handle is already closed and therefore you can't read anything from the file.
The modified version of your call forces the file contents in the withFile argument, at which point the file is still available, and therefore it works.
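A small, self-contained comparison may make this concrete. In the sketch, lazyRead mirrors the first attempt. forcedRead forces the contents with evaluate instead of with a putStrLn. Both names and the file name eurusd-demo.csv are hypothetical.

What the lazy version produces may depend on your GHC version. Older releases silently yield an empty string, hence "Read 0 Bytes". Newer ones may instead raise an IOException about a delayed read on a closed handle. For that reason the sketch catches it with try rather than assuming either outcome.

```haskell
import Control.Exception (IOException, evaluate, try)
import System.IO

-- Mirrors the first attempt: the unread lazy String escapes withFile.
lazyRead :: FilePath -> IO String
lazyRead fileName = withFile fileName ReadMode hGetContents

-- Forces the whole String (length walks every cons cell) before
-- withFile closes the handle, instead of relying on a putStrLn.
forcedRead :: FilePath -> IO String
forcedRead fileName =
  withFile fileName ReadMode $ \handle -> do
    contents <- hGetContents handle
    _ <- evaluate (length contents)
    return contents

main :: IO ()
main = do
  let path = "eurusd-demo.csv"  -- hypothetical stand-in for src\EURUSD60.csv
  writeFile path "1.10,1.11\n1.12,1.13\n"
  r <- try (lazyRead path >>= evaluate . length)
  case r of
    Left e  -> putStrLn ("lazy version failed: " ++ show (e :: IOException))
    Right n -> putStrLn ("lazy version read " ++ show n ++ " characters")
  contents <- forcedRead path
  putStrLn ("forced version read " ++ show (length contents) ++ " characters")
```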
I want to wait until user input terminates with EOF and then output it all at once. Isn't that what getContents is supposed to do? The following code outputs each time the user hits Enter; what am I doing wrong?
import System.IO
main = do
  hSetBuffering stdin NoBuffering
  contents <- getContents
  putStrLn contents
The fundamental problem is that getContents is an instance of lazy IO. This means that getContents produces a thunk that can be evaluated like a normal Haskell value, and only does the relevant IO when it's forced.
contents is a lazy list that putStr tries to print, which forces the list and causes getContents to read as much as it can. putStr then prints everything that's forced, and continues trying to force the rest of the list until it hits []. As getContents can read more and more of the stream—the exact behavior depends on buffering—putStr can print more and more of it immediately, giving you the behavior you see.
While this behavior is useful for very simple scripts, it ties Haskell's evaluation order to observable effects, something it was never meant to do. This means that controlling exactly when parts of contents get printed is awkward, because you have to break the normal Haskell abstraction and understand exactly how things are getting evaluated.
This leads to some potentially unintuitive behavior. For example, if you try to get the length of the input—and actually use it—the list is forced before you get to printing it, giving you the behavior you want:
main = do
  contents <- getContents
  let n = length contents
  print n
  putStr contents
but if you move the print n after the putStr, you go back to the original behavior because n does not get forced until after printing the input (even though n still got defined before putStr was used):
main = do
  contents <- getContents
  let n = length contents
  putStr contents
  print n
Normally, this sort of thing is not a problem because it won't change the behavior of your code (although it can affect performance). Lazy IO just brings it into the realm of correctness by piercing the abstraction layer.
This also gives us a hint on how we can fix your issue: we need some way of forcing contents before printing it. As we saw, we can do this with length, because length needs to traverse the whole list before computing its result. Instead of printing it, we can use seq, which evaluates its left-hand argument (to weak head normal form, which for an Int like n means completely) whenever the whole expression is evaluated, and then throws that value away:
main = do
  contents <- getContents
  let n = length contents
  n `seq` putStr contents
At the same time, this is still a bit ugly because we're using length just to traverse the list, not because we actually care about it. What we would really like is a function that just traverses the list enough to evaluate it, without doing anything else. Happily, this is exactly what deepseq does (for many data structures, not just lists):
import Control.DeepSeq
import System.IO
main = do
  contents <- getContents
  contents `deepseq` putStr contents
This is a problem of lazy I/O. One simple solution is to use strict I/O, such as via ByteStrings:
import qualified Data.ByteString as S
main :: IO ()
main = S.getContents >>= S.putStr
You can use the replacement functions from the strict package:
import qualified System.IO.Strict as S
main = do
  contents <- S.getContents
  putStrLn contents
Note that for reading there isn't a need to set buffering. Buffering really only helps when writing to files. See this answer for more details.
The definition of the strict version of hGetContents in System.IO.Strict is pretty simple:
hGetContents :: IO.Handle -> IO.IO String
hGetContents h = IO.hGetContents h >>= \s -> length s `seq` return s
I.e., it forces everything to read into memory by calling length on the string returned by the standard/lazy version of hGetContents.
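To see why that definition fixes the withFile problems from the earlier questions, here is a sketch that reimplements it under a hypothetical name, hGetContentsStrict. It is not the strict package's own export. It uses evaluate in place of seq, and it lets the result escape withFile safely. The scratch file name is also hypothetical.

```haskell
import Control.Exception (evaluate)
import System.IO

-- Same idea as System.IO.Strict.hGetContents: force the lazy String
-- completely (length walks to the end) before returning it.
hGetContentsStrict :: Handle -> IO String
hGetContentsStrict h = do
  s <- hGetContents h
  _ <- evaluate (length s)
  return s

main :: IO ()
main = do
  writeFile "strict-demo.txt" "line one\nline two\n"  -- hypothetical scratch file
  -- Safe: the contents were fully read before withFile closed the handle.
  contents <- withFile "strict-demo.txt" ReadMode hGetContentsStrict
  putStr contents
```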
I implemented withFile in Haskell:
withFile' :: FilePath -> IOMode -> (Handle -> IO a) -> IO a
withFile' path iomode f = do
  handle <- openFile path iomode
  result <- f handle
  hClose handle
  return result
When I ran the main provided by Learn You a Haskell, it printed out the content of "girlfriend.txt," as expected:
import System.IO
main = do
  withFile' "girlfriend.txt" ReadMode (\handle -> do
    contents <- hGetContents handle
    putStr contents)
I wasn't sure whether my withFile' would work, given its last two lines: (1) closing the handle and (2) returning the result as an IO a.
Why didn't the following happen?
result gets lazily bound to f handle
hClose handle closes the file handle
result gets return'd, which triggers the actual evaluation of f handle. Since handle was closed, an error gets thrown.
Lazy IO is popularly known as confusing.
It depends on whether putStr executes before hClose or not.
Notice the difference between the first and second uses (the brackets are unnecessary but clarifying in the second example).
ghci> withFile' "temp.hs" ReadMode (hGetContents >=> putStr) -- putStr
import System.IO
import Control.Monad
withFile' :: FilePath -> IOMode -> (Handle -> IO a) -> IO a
withFile' path iomode f = do
  handle <- openFile path iomode
  result <- f handle
  hClose handle
  return result
ghci> (withFile' "temp.hs" ReadMode hGetContents) >>= putStr
ghci>
In both cases, the f passed in gets a chance to run before the handle is closed. Because of lazy evaluation, hGetContents only reads the file if it needs to, i.e. is forced to in order to produce output for some other function.
In the first example, since f is (hGetContents >=> putStr), the full contents of the file must be read in order to execute putStr.
In the second example, nothing needs to be evaluated after hGetContents in order to return result, which is a lazy list. (I can quite happily return (show [1..]) which will only fail to terminate if I choose to use the entire output.) This is seen as a problem for lazy IO, which is fixed by alternatives such as strict IO, pipes or conduit.
Maybe returning the empty string for a file when the handle was closed prematurely is a bug, but certainly running the entirety of f before closing it is not.
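The two GHCi calls above can be reproduced as a standalone program. This sketch uses the question's withFile' unchanged and a hypothetical scratch file named temp-demo.txt. The second case is wrapped in try, because the outcome depends on the GHC version. Forcing the escaped string after hClose either yields an empty string (as in the session above) or raises an IOException.

```haskell
import Control.Exception (IOException, try)
import Control.Monad ((>=>))
import System.IO

-- The question's implementation, unchanged.
withFile' :: FilePath -> IOMode -> (Handle -> IO a) -> IO a
withFile' path iomode f = do
  handle <- openFile path iomode
  result <- f handle
  hClose handle
  return result

main :: IO ()
main = do
  let path = "temp-demo.txt"
  writeFile path "hello\nworld\n"
  -- Case 1: putStr consumes the string inside f, while the handle is open.
  withFile' path ReadMode (hGetContents >=> putStr)
  -- Case 2: only the unread lazy string escapes; it is forced after hClose.
  r <- try (withFile' path ReadMode hGetContents >>= \s -> length s `seq` return s)
  case r of
    Left e  -> putStrLn ("case 2 raised: " ++ show (e :: IOException))
    Right s -> putStrLn ("case 2 printed " ++ show (length s) ++ " characters")
```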
Equational reasoning means that you can reason about Haskell code by just inlining and substituting things (with certain caveats, but they don't apply here).
This means that all I need to do to understand your code is to take the withFile' here:
import System.IO
main = do
  withFile' "girlfriend.txt" ReadMode (\handle -> do
    contents <- hGetContents handle
    putStr contents)
... and inline its definition:
main = do
  handle <- openFile "girlfriend.txt" ReadMode
  contents <- hGetContents handle
  result <- putStr contents
  hClose handle
  return result
Once you inline its definition, it's easier to see what is going on. putStr evaluates the entire contents of the file before you close the handle, so there is no error. Also, result is not what you think it is: it's the return value of putStr, which is just (), not the contents of the file.
Most IO actions are not lazily executed.
IO action execution is different from normal Haskell evaluation of values. IO execution is only ever carried out by the outer driver that is trying to execute all the effects of main; it does so in the correct order implied by the monadic sequencing of IO actions.
The driver's need to know what the next IO action is ultimately triggers all evaluation of lazy values in Haskell; if it were happy with an unevaluated lazy value and moved on to the next thing without fully evaluating and executing it, then it would just leave main unevaluated and no Haskell program could ever do anything.
The Haskell value resulting from executing an IO action may of course be an unevaluated lazy value, but each IO action itself is evaluated and executed by the driver (including all sub-actions sequenced with do blocks or binds).
So result doesn't get lazily bound to f handle completely unevaluated; f handle is evaluated to come up with the sub actions hGetContents handle and putStr contents. These are both fully executed before the outer driver moves on to hClose handle, so everything's okay.
Note however that hGetContents is special. Quoting from the documentation:
Computation hGetContents hdl returns the list of characters corresponding to the unread portion of the channel or file managed by hdl, which is put into an intermediate state, semi-closed. In this state, hdl is effectively closed, but items are read from hdl on demand and accumulated in a special list returned by hGetContents hdl.
Any operation that fails because a handle is closed, also fails if a handle is semi-closed. The only exception is hClose. A semi-closed handle becomes closed:
if hClose is applied to it;
if an I/O error occurs when reading an item from the handle;
or once the entire contents of the handle has been read.
Once a semi-closed handle becomes closed, the contents of the associated list becomes fixed. The contents of this final list is only partially specified: it will contain at least all the items of the stream that were evaluated prior to the handle becoming closed.
So executing hGetContents handle actually results in a partially evaluated list, whose lazy evaluation is tied to further IO operations under the hood. This is impossible to do yourself without using the Unsafe family of operations, since it is essentially bypassing the type system and can result in exactly the sort of problem you were concerned about; if you had attempted the following code:
main = do
  text <- withFile' "girlfriend.txt" ReadMode (\handle -> do
    contents <- hGetContents handle
    return contents)
  putStr text
(where the function passed to withFile' tries to return the file contents, and they are passed to putStr after the withFile' call), then the putStr would be executed after hClose, and the file may well not have been fully read before it was closed.
This may be a stupid question, but I couldn't find an answer anywhere. I'm a Haskell newbie and I'm having trouble with I/O.
I have this structure:
data SrcFile = SrcFile (IO Handle) String
srcFileHandle :: SrcFile -> IO Handle
srcFileHandle (SrcFile handle _) = handle
srcFileLine :: SrcFile -> String
srcFileLine (SrcFile _ string) = string
Now the problem is that I have no idea how to assign stdin/stderr/stdout to it, because stdin etc. are Handles, not IO Handles. And if I make the structure have Handle attributes instead of IO Handle, then I won't be able to add any other file handles to it.
Judging from your definition of SrcFile, it seems as though you may be trying to write a C program in Haskell. Language shapes the way we think, and the good news is Haskell is a much more powerful language!
The excellent book Real World Haskell has a section on lazy I/O. Consider an excerpt:
One novel way to approach I/O is the hGetContents function. hGetContents has the type Handle -> IO String. The String it returns represents all of the data in the file given by the Handle.
In a strictly-evaluated language, using such a function is often a bad idea. It may be fine to read the entire contents of a 2KB file, but if you try to read the entire contents of a 500GB file, you are likely to crash due to lack of RAM to store all that data. In these languages, you would traditionally use mechanisms such as loops to process the file's entire data.
Here's the radical part.
But hGetContents is different. The String it returns is evaluated lazily. At the moment you call hGetContents, nothing is actually read. Data is only read from the Handle as the elements (characters) of the list are processed. As elements of the String are no longer used, Haskell's garbage collector automatically frees that memory. All of this happens completely transparently to you. And since you have what looks like—and, really, is—a pure String, you can pass it to pure (non-IO) code.
Further down is a section on readFile and writeFile that shows you how to forget about handles entirely.
For example, say you want to grab all the import lines from a source file:
module Main where
import Control.Monad (liftM, mapM_)
import Data.List (isPrefixOf)
import System.Environment (getArgs, getProgName)
import System.IO (hPutStrLn, stderr)
main :: IO ()
main = getArgs >>= go
  where go [path] = collectImports `liftM` readFile path >>= mapM_ putStrLn
        go _ = getProgName >>=
                 hPutStrLn stderr . ("Usage: " ++) . (++ " source-file")
collectImports :: String -> [String]
collectImports = filter ("import" `isPrefixOf`)
               . takeWhile (\l -> null l
                               || "module" `isPrefixOf` l
                               || "import" `isPrefixOf` l)
               . lines
Even though the definition of main uses readFile, the program reads only as much of the named source-file as necessary, not the whole thing! There's nothing magic going on: note that collectImports uses takeWhile to examine only those lines it needs to rather than, say, filter that would have to read all lines.
When fed its own source, the program outputs
import Control.Monad (liftM, mapM_)
import Data.List (isPrefixOf)
import System.Environment (getArgs, getProgName)
import System.IO (hPutStrLn, stderr)
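One way to check the "reads only as much as necessary" claim without a file is to feed collectImports a string whose tail never ends. The sketch below repeats the answer's definition, and the input text is made up for illustration. If collectImports used filter alone, this program would never finish. With takeWhile in front, it stops at the first line that is neither blank nor a module or import line.

```haskell
import Data.List (isPrefixOf)

-- The answer's definition, unchanged.
collectImports :: String -> [String]
collectImports = filter ("import" `isPrefixOf`)
               . takeWhile (\l -> null l
                               || "module" `isPrefixOf` l
                               || "import" `isPrefixOf` l)
               . lines

main :: IO ()
main = do
  -- A header followed by an infinite body: only the header is ever examined.
  let source = "module Demo where\nimport Data.List\n\nimport System.IO\nmain = pure ()\n"
               ++ cycle "x = 1\n"
  mapM_ putStrLn (collectImports source)
```

This prints the two import lines and terminates, even though source is infinite.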
So embrace laziness. Laziness is your friend! Enjoy the rest of the wonderful journey with Haskell.
I'm not sure what you're really attempting to do, but you can convert a Handle to an IO Handle by using the return function. So,
stdin :: Handle
return stdin :: IO Handle
In fact, return is a polymorphic function. Its type is a -> m a, where m can be IO, Maybe, [] and other monads. Don't confuse it with return in C: it's a normal function, not a keyword that is used to exit prematurely.
In your code, you can use record syntax. The following is equivalent and automatically declares srcFileHandle and srcFileLine as functions:
data SrcFile = SrcFile { srcFileHandle :: IO Handle,
                         srcFileLine   :: String }
I don't quite get what you're trying to achieve.
An IO a means: An interaction with the outside world that, when run, will yield an a.
It therefore doesn't make sense to store an IO Handle in a data structure. You just store the handle and you can do IO with the handle, but for storing/loading it, you have no IO interaction involved.
Hence your structure is:
data SrcFile = SrcFile Handle String
If you want to change/add/manipulate the contents, you can use an IORef which you can use like a pointer from IO code.
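Here is a minimal sketch of that suggestion. The record stores a plain Handle, so stdin, stdout and handles returned by openFile all fit the same field. An IORef holds the current, mutable record. The field names follow the question. The helper advanceLine and the file name are hypothetical.

```haskell
import Data.IORef
import System.IO

-- Store a plain Handle, not an IO Handle: stdin/stdout/stderr and
-- handles returned by openFile all have this type.
data SrcFile = SrcFile
  { srcFileHandle :: Handle
  , srcFileLine   :: String
  }

-- Hypothetical helper: read the next line from the handle into the record.
advanceLine :: IORef SrcFile -> IO ()
advanceLine ref = do
  src <- readIORef ref
  l <- hGetLine (srcFileHandle src)
  writeIORef ref (src { srcFileLine = l })

main :: IO ()
main = do
  writeFile "srcfile-demo.txt" "first\nsecond\n"  -- hypothetical input file
  h <- openFile "srcfile-demo.txt" ReadMode
  ref <- newIORef (SrcFile h "")
  advanceLine ref
  advanceLine ref
  readIORef ref >>= putStrLn . srcFileLine  -- prints "second"
  hClose h
  -- The standard handles fit the same field type directly.
  let console = SrcFile stdout "writing to stdout"
  hPutStrLn (srcFileHandle console) (srcFileLine console)
```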
Still quite new to Haskell..
I want to read the contents of a file, do something with it possibly involving IO (using putStrLn for now) and then write new contents to the same file.
I came up with:
doit :: String -> IO ()
doit file = do
  contents <- withFile file ReadMode $ \h -> hGetContents h
  putStrLn contents
  withFile file WriteMode $ \h -> hPutStrLn h "new content"
However this doesn't work due to laziness. The file contents are not printed. I found this post which explains it well.
The solution proposed there is to include putStrLn within the withFile:
doit :: String -> IO ()
doit file = do
  withFile file ReadMode $ \h -> do
    contents <- hGetContents h
    putStrLn contents
  withFile file WriteMode $ \h -> hPutStrLn h "new content"
This works, but it's not what I want to do. The operation I will eventually replace putStrLn with might be long, and I don't want to keep the file open the whole time. In general, I just want to be able to get the file contents out and then close the file before working with them.
The solution I came up with is the following:
doit :: String -> IO ()
doit file = do
  c <- newIORef ""
  withFile file ReadMode $ \h -> do
    a <- hGetContents h
    writeIORef c $! a
  d <- readIORef c
  putStrLn d
  withFile file WriteMode $ \h -> hPutStrLn h "Test"
However, I find this long and a bit obfuscated. I don't think I should need an IORef just to get a value out, but I needed "place" to put the file contents. Also, it still didn't work without the strictness annotation $! for writeIORef. I guess IORefs are not strict by nature?
Can anyone recommend a better, shorter way to do this while keeping my desired semantics?
Thanks!
The reason your first program does not work is that withFile closes the file after executing the IO action passed to it. In your case, the IO action is hGetContents which does not read the file right away, but only as its contents are demanded. By the time you try to print the file's contents, withFile has already closed the file, so the read fails (silently).
You can fix this issue by not reinventing the wheel and simply using readFile and writeFile:
doit file = do
  contents <- readFile file
  putStrLn contents
  writeFile file "new content"
But suppose you want the new content to depend on the old content. Then you cannot, generally, simply do
doit file = do
  contents <- readFile file
  writeFile file $ process contents
because the writeFile may affect what the readFile returns (remember, it has not actually read the file yet). Or, depending on your operating system, you might not be able to open the same file for reading and writing on two separate handles. The simple but ugly workaround is
doit file = do
  contents <- readFile file
  length contents `seq` (writeFile file $ process contents)
which will force readFile to read the entire file and close it before the writeFile action can begin.
I think the easiest way to solve this problem is using strict IO:
import qualified System.IO.Strict as S
main = do
  file <- S.readFile "filename"
  writeFile "filename" file
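You may prefer not to depend on the strict package. In that case, Data.Text.IO.readFile from the text package also reads the whole file before returning, so the same read-then-overwrite pattern works with Text. Note that this is a swapped-in technique, not the answer's own. The transformation process (upper-casing here), the function name rewrite and the file name are all hypothetical.

```haskell
import qualified Data.Text as T
import qualified Data.Text.IO as TIO

-- Hypothetical transformation standing in for the real work.
process :: T.Text -> T.Text
process = T.toUpper

-- Data.Text.IO.readFile is strict: the file is fully read and closed
-- before TIO.writeFile opens it again for writing.
rewrite :: FilePath -> IO ()
rewrite path = do
  contents <- TIO.readFile path
  TIO.writeFile path (process contents)

main :: IO ()
main = do
  writeFile "rewrite-demo.txt" "new content\n"  -- hypothetical scratch file
  rewrite "rewrite-demo.txt"
  TIO.readFile "rewrite-demo.txt" >>= TIO.putStr
```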
You can duplicate the file handle, write lazily with the original one (at the end of the file) and read lazily with the duplicate. That way, no strictness annotation is involved when appending to the file.
import System.IO
import GHC.IO.Handle
main :: IO ()
main = do
  h <- openFile "filename" ReadWriteMode
  h2 <- hDuplicate h
  hSeek h2 AbsoluteSeek 0
  originalFileContents <- hGetContents h2
  putStrLn originalFileContents
  hSeek h SeekFromEnd 0
  hPutStrLn h $ concatMap ("{new_contents}" ++) (lines originalFileContents)
  hClose h2
  hClose h
The hDuplicate function is provided by the GHC.IO.Handle module:
Returns a duplicate of the original handle, with its own buffer. The two Handles will share a file pointer, however. The original handle's buffer is flushed, including discarding any input data, before the handle is duplicated.
With hSeek you can set the position of the handle before reading or writing.
But I'm not sure how reliable it would be to use "AbsoluteSeek 0" instead of "SeekFromEnd 0" for writing, i.e. overwriting the contents. Generally, I would suggest writing to a temporary file first, for example using openTempFile (from System.IO), and then replacing the original.
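Here is a sketch of the temporary-file approach. It assumes the directory package for renameFile and the filepath package for takeDirectory; both ship with GHC. The transformation mirrors the answer's {new_contents} example. The function name rewriteViaTemp, the temp-file template and the file names are hypothetical.

The temporary file is created in the same directory as the target, so the final rename stays on one file system. One caveat is that the replacement may end up with openTempFile's restrictive (user-only) permissions rather than the original file's.

```haskell
import Control.Exception (evaluate)
import System.Directory (renameFile)
import System.FilePath (takeDirectory)
import System.IO

-- Rewrite `path` by writing the new version to a temporary file in the
-- same directory, then renaming it over the original.
rewriteViaTemp :: (String -> String) -> FilePath -> IO ()
rewriteViaTemp f path = do
  old <- readFile path
  _ <- evaluate (length old)  -- finish the lazy read so the source handle is closed
  (tmpPath, tmpH) <- openTempFile (takeDirectory path) "rewrite.tmp"
  hPutStr tmpH (f old)
  hClose tmpH
  renameFile tmpPath path

main :: IO ()
main = do
  writeFile "temp-rewrite-demo.txt" "a\nb\n"  -- hypothetical target file
  rewriteViaTemp (unlines . map ("{new_contents}" ++) . lines) "temp-rewrite-demo.txt"
  readFile "temp-rewrite-demo.txt" >>= putStr
```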
It's ugly but you can force the contents to be read by asking for the length of the input and seq'ing it with the next statement in your do-block. But really the solution is to use a strict version of hGetContents. I'm not sure what it's called.
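For reference, newer versions of base (4.15 and later, shipped with GHC 9.0) export strict variants directly from System.IO: readFile', getContents' and hGetContents'. On older compilers, the strict package or the length/seq trick fills the same role.

Below is a sketch of the question's doit using readFile', assuming a new enough GHC. The scratch file name is hypothetical.

```haskell
import System.IO (readFile')

-- The question's doit with a strict read: readFile' reads the whole file
-- and closes it before returning, so writeFile can reopen it afterwards.
doit :: FilePath -> IO ()
doit file = do
  contents <- readFile' file
  putStrLn contents
  writeFile file "new content"

main :: IO ()
main = do
  writeFile "doit-demo.txt" "old content"  -- hypothetical scratch file
  doit "doit-demo.txt"
  readFile' "doit-demo.txt" >>= putStrLn
```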