I've so far avoided ever needing unsafePerformIO, but this might have to change today.... I would like to see if the community agrees, or if someone has a better solution.
I have a library which needs to use some config data stored in a bunch of files. This data is guaranteed static (during the run), but needs to be in files that can (on very rare occasions) be edited by an end user who can not compile Haskell programs. (The details are uninportant, but think of "/etc/mime.types" as a pretty good approximation. It is a large almost static data file used throughout many programs).
If this weren't a library I would just use the IO monad.... But because it is a library which is called throughout my code, it literally forces a bubbling up of the IO monad through pretty much everything I have written in multiple modules! Although I need to do a one time read of the data files, this low level call is effetively pure, so this is a pretty unacceptable outcome.
FYI, I plan to also wrap the call in unsafeInterleaveIO, so that only files that are needed will be loaded. My code will look something like this....
dataDir="<path to files>"
datafiles::[FilePath]
datafiles =
unsafePerformIO $
unsafeInterleaveIO $
map (dataDir </>)
<$> filter (not . ("." `isPrefixOf`))
<$> getDirectoryContents dataDir
fileData::[String]
fileData = unsafePerformIO $ unsafeInterleaveIO $ sequence $ readFile <$> datafiles
Given that the data read is referentially transparent, I am pretty sure that unsafePerformIO is safe (this has been discussed in many place, such as "Use of unsafePerformIO appropriate?"). Still, though, if there is a better way, I would love to hear about it.
UPDATE-
In response to Anupam's comment....
There are two reasons why I can't break up the lib into IO and non IO parts.
First, the amount of data is large, and I don't want to read it all into memory at once. Remember that IO is always read strictly.... This is the reason that I need to put in the unsafeInterleaveIO call, to make it lazy. IMHO, once you use unsafeInterleaveIO, you might as well use unsafePerformIO, as the risk is already there.
Second, breaking out the IO specific parts just substitutes the bubbling up of the IO monad with the bubbling up of the IO read code, as well as the passing around of the data (I might actually choose to pass around the data using the state monad anyway, so it really isn't an improvement to substitute the IO monad for the state monad everywhere). This wouldn't be so bad if the low level function itself wasn't effectively pure (ie- think of my /etc/mime.types example above, and imagine a Haskell extensionToMimeType function, which is basically pure, but needs to get the database data from the file.... Suddenly everything from low to high in the stack needs to call or pass through a readMimeData::IO String. Why should each main even need to care about the library choice of a submodule many levels deep?).
I agree with Anupam Jain, you would be better off reading these data files at a somewhat higher level, in IO, and then passing the data in them through the rest of your program purely.
You could, for example, put the functions that need the results of fileData into Reader [String], so that they can just ask for the results as needed (or some Reader Config, where Config holds these strings and whatever else you need).
A sketch of what I'm suggesting follows:
type AppResult = String
fileData :: IO [String]
fileData = undefined -- read the files
myApp :: String -> Reader [String] AppResult
myApp s = do
files <- ask
return undefined -- do whatever with s and config
main = do
config <- fileData
return $ runReader (myApp "test") config
I gather that you don't want to read all the data at once, because that would be costly. And maybe you don't really know up-front what files you will need to load, so loading all of them at the start would be wasteful.
Here's an attempt at a solution. It requires you to work inside a free monad and relegate the side-effecting operations to an interpreter. Some preliminary imports:
{-# LANGUAGE OverloadedStrings #-}
module Main where
import qualified Data.ByteString as B
import Data.Monoid
import Data.List
import Data.Functor.Compose
import Control.Applicative
import Control.Monad
import Control.Monad.Free
import System.IO
We define a functor for the free monad. It will offer a value p do the interpreter and continue the computation after receiving a value b:
type LazyLoad p b = Compose ((,) p) ((->) b)
A convenience function to request the loading of a file:
lazyLoad :: FilePath -> Free (LazyLoad FilePath B.ByteString) B.ByteString
lazyLoad path = liftF $ Compose (path,id)
A dummy interpreter function that reads "file contents" from stdin:
interpret :: Free (LazyLoad FilePath B.ByteString) a -> IO a
interpret = iterM $ \(Compose (path,next)) -> do
putStrLn $ "Enter the contents for file " <> path <> ":"
B.hGetLine stdin >>= next
Some silly example functions:
someComp :: B.ByteString -> B.ByteString
someComp b = "[" <> b <> "]"
takesAwhile :: Int
takesAwhile = foldl' (+) 0 $ take 400000000 $ intersperse (negate 1) $ repeat 1
An example program:
main :: IO ()
main = do
r <- interpret $ do
r1 <- someComp <$> lazyLoad "file1"
r2 <- return takesAwhile
if (r2 == 1)
then return r1
else someComp <$> lazyLoad "file2"
putStrLn . show $ r
When executed, this program will request a line, spend some time computing takesAwhile and only then request another line.
If want to allow different kinds of "requests", this solution could be extended with something like Data types à la carte so that each function only needs to know about about the precise effects it requires.
If you are content with allowing only one type of request, you could also use Clients and Servers from Pipes.Core instead of the free monad.
Related
Intention: Small application to learn Haskell: Downloads a wikipedia-article, then downloads all articles linked from it, then downloads all articles linked from them, and so on... until a specified recursion depth is reached. The result is saved to a file.
Approach: Use a StateT to keep track of the download queue, to download an article and to update the queue. I build a list IO [WArticle] recursively and then print it.
Problem: While profiling I find that total memory in use is proportional to number of articles downloaded.
Analysis: By literature I'm lead to believe this is a laziness and/or strictness issue. BangPatterns reduced the memory consumed but didn't solve proportionality. Furthermore, I know all articles are downloaded before the file output is started.
Possible solutions:
1) The function getNextNode :: StateT CrawlState IO WArticle (below) already has IO. One solution would be to just do the file writing in it and only return the state. It would mean the file is written to in very small chunks though. Doesn't feel very Haskell..
2) Have the function buildHelper :: CrawlState -> IO [WArticle] (below) return [IO WArticle]. Though I wouldn't know how to rewrite that code and have been advised against it in the comments.
Are any of these proposed solutions better than I think they are or are there better alternatives?
import GetArticle (WArticle, getArticle, wa_links, wiki2File) -- my own
type URL = Text
data CrawlState =
CrawlState ![URL] ![(URL, Int)]
-- [Completed] [(Queue, depth)]
-- Called by user
buildDB :: URL -> Int -> IO [WArticle]
buildDB startURL recursionDepth = buildHelper cs
where cs = CrawlState [] [(startURL, recursionDepth)]
-- Builds list recursively
buildHelper :: CrawlState -> IO [WArticle]
buildHelper !cs#(CrawlState _ queue) = {-# SCC "buildHelper" #-}
if null queue
then return []
else do
(!article, !cs') <- runStateT getNextNode cs
rest <- buildHelper cs'
return (article:rest)
-- State manipulation
getNextNode :: StateT CrawlState IO WArticle
getNextNode = {-# SCC "getNextNode" #-} do
CrawlState !parsed !queue#( (url, depth):queueTail ) <- get
article <- liftIO $ getArticle url
put $ CrawlState (url:parsed) (queueTail++ ( if depth > 1
then let !newUrls = wa_links article \\ parsed
!newUrls' = newUrls \\ map fst queue
in zip newUrls' (repeat (depth-1))
else []))
return article
startUrl = pack "https://en.wikipedia.org/wiki/Haskell_(programming_language)"
recursionDepth = 3
main :: IO ()
main = {-# SCC "DbMain" #-}
buildDB startUrl recursionDepth
>>= return . wiki2File
>>= writeFile "savedArticles.txt"
Full code at https://gitlab.com/mattias.br/sillyWikipediaSpider. Current version limited to only download the first eight links from each page to save time. Without changing it download 55 pages at ~600 MB heap usage.
Thanks for any help!
2) Is [IO WArticle] want I want in this case?
Not quite. The problem is that some of the IO WArticle actions depend on the result of a previous action: the links to future pages reside in previously obtained pages. [IO Warticle] can't provide that: it is pure in the sense that you can always find an action in the list without executing the previous actions.
What we need is a kind of "effectful list" that lets us extract articles one by one, progressively performing the neccessary effects, but not forcing us to completely generate the list in one go.
There are several libraries that provide these kinds of "effectful lists": streaming, pipes, conduit. They define monad transformers that extend a base monad with the ability to yield intermediate values before returning a final result. Usually the final result is of a type different from the values that are yielded; it might be simply unit ().
Note: The Functor, Applicative and Monad instances for these libraries differ from the corresponding instances for pure lists. The Functor instances map over the resulting final value, not over the intermediate values which are yielded. To map over the yielded values, they provide separate functions. And The Monad instances sequence effectful lists, instead of trying all combinations. To try all combinations, they provide separate functions.
Using the streaming library, we could modify buildHelper to something like this:
import Streaming
import qualified Streaming.Prelude as S
buildHelper :: CrawlState -> Stream (Of WArticle) IO ()
buildHelper !cs#(CrawlState _ queue) =
if null queue
then return []
else do (article, cs') <- liftIO (runStateT getNextNode cs)
S.yield article
buildHelper cs'
And then we could use functions like mapM_ (from Streaming.Prelude, not the one from Control.Monad!) to process the articles one by one, as they are generated.
Adding a further explaination and code building upon the answer of danidiaz. Here's the final code:
import Streaming
import qualified Streaming.Prelude as S
import System.IO (IOMode (WriteMode), hClose, openFile)
buildHelper :: CrawlState -> Stream (Of WArticle) IO ()
buildHelper cs#( CrawlState _ queue ) =
if null queue
then return ()
else do
(article, cs') <- liftIO (runStateT getNextNode cs)
S.yield article
buildHelper cs'
main :: IO ()
main = do outFileHandle <- openFile filename WriteMode
S.toHandle outFileHandle . S.show . buildHelper $
CrawlState [] [(startUrl, recursionDepth)]
hClose outFileHandle
outFileHandle is a usual file output handle.
S.toHandle takes a stream of String and writes them to the specified handle.
S.show maps show :: WArticle -> String over the stream.
An elegant solution that creates a lazy stream even though it is produced by a series of IO actions (namely downloading websites) and writes it to a file as results become available. On my machine it still uses a lot of memory (relative to the task) during execution but never exceeds 450 MB.
Consider the code below taken from a working example I've built to help me learn Haskell. This code parses a CSV file containing stock quotes downloaded from Yahoo into a nice simple list of bars with which I can then work.
My question: how can I write a function that will take a file name as its parameter and return an OHLCBarList so that the first four lines inside main can be properly encapsulated?
In other words, how can I implement (without getting all sorts of errors about IO stuff) the function whose type would be
getBarsFromFile :: Filename -> OHLCBarList
so that the grunt work that was being done in the first four lines of main can be properly encapsulated?
I've tried to do this myself but with my limited Haskell knowledge, I'm failing miserably.
import qualified Data.ByteString as BS
type Filename = String
getContentsOfFile :: Filename -> IO BS.ByteString
barParser :: Parser Bar
barParser = do
time <- timeParser
char ','
open <- double
char ','
high <- double
char ','
low <- double
char ','
close <- double
char ','
volume <- decimal
char ','
return $ Bar Bar1Day time open high low close volume
type OHLCBar = (UTCTime, Double, Double, Double, Double)
type OHLCBarList = [OHLCBar]
barsToBarList :: [Either String Bar] -> OHLCBarList
main :: IO ()
main = do
contents :: C.ByteString <- getContentsOfFile "PriceData/Daily/yhoo1.csv" --PriceData/Daily/Yhoo.csv"
let lineList :: [C.ByteString] = C.lines contents -- Break the contents into a list of lines
let bars :: [Either String Bar] = map (parseOnly barParser) lineList -- Using the attoparsec
let ohlcBarList :: OHLCBarList = barsToBarList bars -- Now I have a nice simple list of tuples with which to work
--- Now I can do simple operations like
print $ ohlcBarList !! 0
If you really want your function to have type Filename -> OHLCBarList, it can't be done.* Reading the contents of a file is an IO operation, and Haskell's IO monad is specifically designed so that values in the IO monad can never leave. If this restriction were broken, it would (in general) mess with a lot of things. Instead of doing this, you have two options: make the type of getBarsFromFile be Filename -> IO OHLCBarList — thus essentially copying the first four lines of main — or write a function with type C.ByteString -> OHLCBarList that the output of getContentsOfFile can be piped through to encapsulate lines 2 through 4 of main.
* Technically, it can be done, but you really, really, really shouldn't even try, especially if you're new to Haskell.
Others have explained that the correct type of your function has to be Filename -> IO OHLCBarList, I'd like to try and give you some insight as to why the compiler imposes this draconian measure on you.
Imperative programming is all about managing state: "do certain operations to certain bits of memory in sequence". When they grow large, procedural programs become brittle; we need a way of limiting the scope of state changes. OO programs encapsulate state in classes but the paradigm is not fundamentally different: you can call the same method twice and get different results. The output of the method depends on the (hidden) state of the object.
Functional programming goes all the way and bans mutable state entirely. A Haskell function, when called with certain inputs, will always produce the same output. Simple examples of
pure functions are mathematical operators like + and *, or most of the list-processing functions like map. Pure functions are all about the inputs and outputs, not managing internal state.
This allows the compiler to be very smart in optimising your program (for example, it can safely collapse duplicated code for you), and helps the programmer not to make mistakes: you can't put the system in an invalid state if there is none! We like pure functions.
The exception to the rule is IO. Code that performs IO is impure by definition: you could call getLine a hundred times and never get the same result, because it depends on what the user typed. Haskell handles this using the type system: all impure functions are marred with the IO type. IO can be thought of as a dependency on the state of the real world, sort of like World -> (NewWorld, a)
To summarise: pure functions are good because they are easy to reason about; this is why Haskell makes functions pure by default. Any impure code has to be labelled as such with an IO type signature; this tells the compiler and the reader to be careful with this function. So your function which reads from a file (a fundamentally impure action) but returns a pure value can't exist.
Addendum in response to your comment
You can still write pure functions to operate on data that was obtained impurely. Consider the following straw-man:
main :: IO ()
main = do
putStrLn "Enter the numbers you want me to process, separated by spaces"
line <- getLine
let numberStrings = words line
let numbers = map read numberStrings
putStrLn $ "The result of the calculation is " ++ (show $ foldr1 (*) numbers + 10)
Lots of code inside IO here. Let's extract some functions:
main :: IO ()
main = do
putStrLn "Enter the numbers you want me to process, separated by spaces"
result <- fmap processLine getLine -- fmap :: (a -> b) -> IO a -> IO b
-- runs an impure result through a pure function
-- without leaving IO
putStrLn $ "The result of the calculation is " ++ result
processLine :: String -> String -- look ma, no IO!
processLine = show . calculate . readNumbers
readNumbers :: String -> [Int]
readNumbers = map read . words
calculate :: [Int] -> Int
calculate numbers = product numbers + 10
product :: [Int] -> Int
product = foldr1 (*)
I've pulled logic out of main into pure functions which are easier to read, easier for the compiler to optimise, and more reusable (and so more testable). The program as a whole still lives inside IO because the data is obtained impurely (see the last part of this answer for a more thorough treatment of this argument). Impure data can be piped through pure functions using fmap and other combinators; you should try to put as little logic in main as possible.
Your code does seem to be most of the way there; as others have suggested you could extract lines 2-4 of your main into another function.
In other words, how can I implement (without getting all sorts of errors about IO stuff) the function whose type would be
getBarsFromFile :: Filename -> OHLCBarList
so that the grunt work that was being done in the first four lines of main can be properly encapsulated?
You cannot do this without getting all sorts of errors about IO stuff because this type for getBarsFromFile misses an IO. Probably that's what the errors about IO stuff are trying to tell you. Did you try understanding and fixing the errors?
In your situation, I would start by abstracting over the second to fourth line of your main in a function:
parseBars :: ByteString -> OHLCBarList
And then I would combine this function with getContentsOfFile to get:
getBarsFromFile :: FilePath -> IO OHLCBarList
This I would call in main.
Here's an example Haskell FRP program using the reactive-banana library. I'm only just starting to feel my way with Haskell, and especially haven't quite got my head around what FRP means. I'd really appreciate some critique of the code below
{-# LANGUAGE DeriveDataTypeable #-}
module Main where
{-
Example FRP/zeromq app.
The idea is that messages come into a zeromq socket in the form "id state". The state is of each id is tracked until it's complete.
-}
import Control.Monad
import Data.ByteString.Char8 as C (unpack)
import Data.Map as M
import Data.Maybe
import Reactive.Banana
import System.Environment (getArgs)
import System.ZMQ
data Msg = Msg {mid :: String, state :: String}
deriving (Show, Typeable)
type IdMap = Map String String
-- | Deserialize a string to a Maybe Msg
fromString :: String -> Maybe Msg
fromString s =
case words s of
(x:y:[]) -> Just $ Msg x y
_ -> Nothing
-- | Map a message to a partial operation on a map
-- If the 'state' of the message is "complete" the operation is a delete
-- otherwise it's an insert
toMap :: Msg -> IdMap -> IdMap
toMap msg = case msg of
Msg id_ "complete" -> delete id_
_ -> insert (mid msg) (state msg)
main :: IO ()
main = do
(socketHandle,runSocket) <- newAddHandler
args <- getArgs
let sockAddr = case args of
[s] -> s
_ -> "tcp://127.0.0.1:9999"
putStrLn ("Socket: " ++ sockAddr)
network <- compile $ do
recvd <- fromAddHandler socketHandle
let
-- Filter out the Nothings
justs = filterE isJust recvd
-- Accumulate the partially applied toMap operations
counter = accumE M.empty $ (toMap . fromJust <$> justs)
-- Print the contents
reactimate $ fmap print counter
actuate network
-- Get a socket and kick off the eventloop
withContext 1 $ \ctx ->
withSocket ctx Sub $ \sub -> do
connect sub sockAddr
subscribe sub ""
linkSocketHandler sub runSocket
-- | Recieve a message, deserialize it to a 'Msg' and call the action with the message
linkSocketHandler :: Socket a -> (Maybe Msg -> IO ()) -> IO ()
linkSocketHandler s runner = forever $ do
receive s [] >>= runner . fromString . C.unpack
There's a gist here: https://gist.github.com/1099712.
I'd particularly welcome any comments around whether this is a "good" use of accumE, (I'm unclear of this function will traverse the whole event stream each time although I'm guessing not).
Also I'd like to know how one would go about pulling in messages from multiple sockets - at the moment I have one event loop inside a forever. As a concrete example of this how would I add second socket (a REQ/REP pair in zeromq parlance) to query to the current state of the IdMap inside counter?
(Author of reactive-banana speaking.)
Overall, your code looks fine to me. I don't actually understand why you are using reactive-banana in the first place, but you'll have your reasons. That said, if you are looking for something like Node.js, remember that Haskell's leightweight threads make it unnecessary to use an event-based architecture.
Addendum: Basically, functional reactive programming is useful when you have a variety of different inputs, states and output that must work together with just the right timing (think GUIs, animations, audio). In contrast, it's overkill when you are dealing with many essentially independent events; these are best handled with ordinary functions and the occasional state.
Concerning the individual questions:
"I'd particularly welcome any comments around whether this is a "good" use of accumE, (I'm unclear of this function will traverse the whole event stream each time although I'm guessing not)."
Looks fine to me. As you guessed, the accumE function is indeed real-time; it will only store the current accumulated value.
Judging from your guess, you seem to be thinking that whenever a new event comes in, it will travel through the network like a firefly. While this does happen internally, it is not how you should think about functional reactive programming. Rather, the right picture is this: the result of fromAddHandler is the complete list of input events as they will happen. In other words, you should think that recvd contains the ordered list of each and every event from the future. (Of course, in the interest of your own sanity, you shouldn't try to look at them before their time has come. ;-)) The accumE function simply transforms one list into another by traversing it once.
I will need to make this way of thinking more clear in the documentation.
"Also I'd like to know how one would go about pulling in messages from multiple sockets - at the moment I have on event loop inside a forever. As a concrete example of this how would I add second socket (a REQ/REP pair in zeromq parlance) to query to the current state of the IdMap inside counter?"
If the receive function does not block, you can simply call it twice on different sockets
linkSocketHandler s1 s2 runner1 runner2 = forever $ do
receive s1 [] >>= runner1 . fromString . C.unpack
receive s2 [] >>= runner2 . fromString . C.unpack
If it does block, you will need to use threads, see also the section Handling Multiple TCP Streams in the book Real World Haskell. (Feel free to ask a new question on this, as it is outside the scope of this one.)
Earlier today I wrote a small test app for iteratees that composed an iteratee for writing progress with an iteratee for actually copying data. I wound up with values like these:
-- NOTE: this snippet is with iteratees-0.8.5.0
-- side effect: display progress on stdout
displayProgress :: Iteratee ByteString IO ()
-- side effect: copy the bytestrings of Iteratee to Handle
fileSink :: Handle -> Iteratee ByteString IO ()
writeAndDisplayProgress :: Handle -> Iteratee ByteString IO ()
writeAndDisplayProgress handle = sequence_ [fileSink handle, displayProgress]
In looking at the enumerator library, I don't see an analog of sequence_ or enumWith. All I want to do is compose two iteratees so they act as one. I could discard the result (it's going to be () anyway) or keep it, I don't care. (&&&) from Control.Arrow is what I want, only for iteratees rather than arrows.
I tried these two options:
-- NOTE: this snippet is with enumerator-0.4.10
run_ $ enumFile source $$ sequence_ [iterHandle handle, displayProgress]
run_ $ enumFile source $$ sequence_ [displayProgress, iterHandle handle]
The first one copies the file, but doesn't show progress; the second one shows progress, but doesn't copy the file, so obviously the effect of the built-in sequence_ on enumerator's iteratees is to run the first iteratee until it terminates and then run the other, which is not what I want. I want to be running the iteratees in parallel rather than serially. I feel like I'm missing something obvious, but in reading the wc example for the enumerator library, I see this curious comment:
-- Exactly matching wc's output is too annoying, so this example
-- will just print one line per file, and support counting at most
-- one statistic per run
I wonder if this remark indicates that combining or composing iteratees within the enumerations framework isn't possible out of the box. What's the generally-accepted right way to do this?
Edit:
It seems as though there is no built-in way to do this. There's discussion on the Haskell mailing list about adding combinators like enumSequence and manyToOne but so far, there doesn't seem to be anything actually in the enumerator package that furnishes this capability.
It seems to me like rather than trying to have two Iteratees consume the sequence in parallel, it would be better to feed the stream through an identity Enumeratee that simply counts the bytes passing it.
Here's a simple example that copies a file and prints the number of bytes copied after each chunk.
import System.Environment
import System.IO
import Data.Enumerator
import Data.Enumerator.Binary (enumFile, iterHandle)
import Data.Enumerator.List (mapAccumM)
import qualified Data.ByteString as B
printBytes :: Enumeratee B.ByteString B.ByteString IO ()
printBytes = flip mapAccumM 0 $ \total bytes -> do
let total' = total + B.length bytes
print total'
return (total', bytes)
copyFile s t = withBinaryFile t WriteMode $ \h -> do
run_ $ (enumFile s $= printBytes) $$ iterHandle h
main = do
[source, target] <- getArgs
copyFile source target
this may be a stupid question but i couldn't find answer anywhere. I'm a Haskell newbie and i'm having trouble with I/O.
I have this structure:
data SrcFile = SrcFile (IO Handle) String
srcFileHandle :: SrcFile -> IO Handle
srcFileHandle (SrcFile handle _) = handle
srcFileLine :: SrcFile -> String
srcFileLine (SrcFile _ string) = string
Now the problem is that i have no idea how to assign stdin/stderr/stdout into it, because the stdin etc are Handlers, no IO Handlers. And if i make the structure have Handle attributes insted of IO Handle, then i won't be able to add any other file handles into it.
Judging from your definition of SrcFile, it seems as though you may be trying to write a C program in Haskell. Language shapes the way we think, and the good news is Haskell is a much more powerful language!
The excellent book Real World Haskell has a section on lazy I/O. Consider an excerpt:
One novel way to approach I/O is the hGetContents function. hGetContents has the type Handle -> IO String. The String it returns represents all of the data in the file given by the Handle.
In a strictly-evaluated language, using such a function is often a bad idea. It may be fine to read the entire contents of a 2KB file, but if you try to read the entire contents of a 500GB file, you are likely to crash due to lack of RAM to store all that data. In these languages, you would traditionally use mechanisms such as loops to process the file's entire data.
Here's the radical part.
But hGetContents is different. The String it returns is evaluated lazily. At the moment you call hGetContents, nothing is actually read. Data is only read from the Handle as the elements (characters) of the list are processed. As elements of the String are no longer used, Haskell's garbage collector automatically frees that memory. All of this happens completely transparently to you. And since you have what looks like—and, really, is—a pure String, you can pass it to pure (non-IO) code.
Further down is a section on readFile and writeFile that shows you how to forget about handles entirely.
For example, say you want to grab all the import lines from a source file:
module Main where
import Control.Monad (liftM, mapM_)
import Data.List (isPrefixOf)
import System.Environment (getArgs, getProgName)
import System.IO (hPutStrLn, stderr)
main :: IO ()
main = getArgs >>= go
where go [path] = collectImports `liftM` readFile path >>= mapM_ putStrLn
go _ = getProgName >>=
hPutStrLn stderr . ("Usage: " ++) . (++ " source-file")
collectImports :: String -> [String]
collectImports = filter ("import" `isPrefixOf`)
. takeWhile (\l -> null l
|| "module" `isPrefixOf` l
|| "import" `isPrefixOf` l)
. lines
Even though the definition of main uses readFile, the program reads only as much of the named source-file as necessary, not the whole thing! There's nothing magic going on: note that collectImports uses takeWhile to examine only those lines it needs to rather than, say, filter that would have to read all lines.
When fed its own source, the program outputs
import Control.Monad (liftM, mapM_)
import Data.List (isPrefixOf)
import System.Environment (getArgs, getProgName)
import System.IO (hPutStrLn, stderr)
So embrace laziness. Laziness is your friend! Enjoy the rest of the wonderful journey with Haskell.
I'm not sure what you're really attempting to do, but you can convert a Handle to IO Handle by using return function. So,
stdin :: Handle
return stdin :: IO Handle
In fact, return is a polymorphic function. It's type is a -> m a where m can be IO, Maybe, [] and others. Don't confuse it with return in C - it's a normal function, not a keyword that is used to exit prematurely.
In your code, you can use record syntax. The following is equivalent and automatically declares srcFileHandle and srcFileLine as functions:
data SrcFile = SrcFile { srcFileHandle :: IO Handle,
srcFileLine :: String }
I don't quite get what you're trying to achieve.
An IO a means: An interaction with the outside world that, when run, will yield an a.
It therefore doesn't make sense to store an IO Handle in a data structure. You just store the handle and you can do IO with the handle, but for storing/loading it, you have no IO interaction involved.
Hence your structure is:
data SrcFile = SrcFile Handle String
If you want to change/add/manipulate the contents, you can use an IORef which you can use like a pointer from IO code.