What is the right way of doing this in Haskell?
if exists "foo.txt" then delete "foo.txt"
doSomethingElse
So far I have:
import System.Directory
main = do
filename <- getFileNameSomehow
fileExists <- doesFileExist filename
if fileExists
then removeFile filename
???
doSomethingElse
You would be better off removing the file and simply recovering if it does not exist:
import Prelude hiding (catch)
import System.Directory
import Control.Exception
import System.IO.Error hiding (catch)
removeIfExists :: FilePath -> IO ()
removeIfExists fileName = removeFile fileName `catch` handleExists
where handleExists e
| isDoesNotExistError e = return ()
| otherwise = throwIO e
This avoids the race condition of someone deleting the file between your code checking whether it exists and deletes it. It might not matter in your case, but it's good practice anyway.
Note the import Prelude hiding (catch) line — this is because the Prelude contains older functions from exception handling which are now deprecated in favour of Control.Exception, which also has a function named catch; the import line simply hides the Prelude's catch in favour of Control.Exception's.
However, that still leaves your more fundamental underlying question: how do you write conditionals in IO?
Well, in this case, it would suffice to simply do
when fileExists $ removeFile filename
(using Control.Monad.when). But it's helpful here, as it usually is in Haskell, to look at the types.
Both branches of a conditional must have the same type. So to fill in
if fileExists
then removeFile filename
else ???
we should look at the type of removeFile filename; whatever ??? is, it has to have the same type.
System.Directory.removeFile has the type FilePath -> IO (), so removeFile filename has the type IO (). So what we want is an IO action with a result of type () that does nothing.
Well, the purpose of return is to construct an action that has no effects, and just returns a constant value, and return () has the right type for this: IO () (or more generally, (Monad m) => m ()). So ??? is return () (which you can see I used in my improved snippet above, to do nothing when removeFile fails because the file doesn't exist).
(By the way, you should now be able to implement when with the help of return (); it's really simple :))
Don't worry if you find it hard to get into the Haskell way of things at first — it'll come naturally in time, and when it does, it's very rewarding. :)
(Note: ehird's answer makes a very good point regarding a race condition. It should be kept in mind when reading my answer, which ignores the issue. Do note also that the imperative pseudo-code presented in the question also suffers from the same problem.)
What defines the filename? Is it given in the program, or supplied by the user? In your imperative pseudo-code, it's a constant string in the program. I'll assume you want the user to supply it by passing it as the first command line argument to the program.
Then I suggest something like this:
import Control.Monad
import System.Directory
import System.Environment
doSomethingElse :: IO ()
main = do
args <- getArgs
fileExists <- doesFileExist (head args)
when fileExists (removeFile (head args))
doSomethingElse
(As you can see, I added the type signature of doSomethingElse to avoid confusion).
I import System.Environment for the getArgs function. In case the file in question is simply given by a constant string (such as in your imperative pseudo-code), just remove all the args stuff and fill in the constant string wherever I have head args.
Control.Monad is imported to get the when function. Note that this useful function is not a keyword (like if), but an ordinary function. Let's look at its type:
when :: Monad m => Bool -> m () -> m ()
In your case m is IO, so you can think of when as a function that takes a Bool and an IO action and performs the action only if the Bool is True. Of course you could solve your problem with ifs, but in your case when reads a lot clearer. At least I think so.
Addendum: If you, like I did at first, get the feeling that when is some magical and difficult machinery, it's very instructive to try to define the function yourself. I promise you, it's dead simple...
Related
I've so far avoided ever needing unsafePerformIO, but this might have to change today.... I would like to see if the community agrees, or if someone has a better solution.
I have a library which needs to use some config data stored in a bunch of files. This data is guaranteed static (during the run), but needs to be in files that can (on very rare occasions) be edited by an end user who can not compile Haskell programs. (The details are uninportant, but think of "/etc/mime.types" as a pretty good approximation. It is a large almost static data file used throughout many programs).
If this weren't a library I would just use the IO monad.... But because it is a library which is called throughout my code, it literally forces a bubbling up of the IO monad through pretty much everything I have written in multiple modules! Although I need to do a one time read of the data files, this low level call is effetively pure, so this is a pretty unacceptable outcome.
FYI, I plan to also wrap the call in unsafeInterleaveIO, so that only files that are needed will be loaded. My code will look something like this....
dataDir="<path to files>"
datafiles::[FilePath]
datafiles =
unsafePerformIO $
unsafeInterleaveIO $
map (dataDir </>)
<$> filter (not . ("." `isPrefixOf`))
<$> getDirectoryContents dataDir
fileData::[String]
fileData = unsafePerformIO $ unsafeInterleaveIO $ sequence $ readFile <$> datafiles
Given that the data read is referentially transparent, I am pretty sure that unsafePerformIO is safe (this has been discussed in many place, such as "Use of unsafePerformIO appropriate?"). Still, though, if there is a better way, I would love to hear about it.
UPDATE-
In response to Anupam's comment....
There are two reasons why I can't break up the lib into IO and non IO parts.
First, the amount of data is large, and I don't want to read it all into memory at once. Remember that IO is always read strictly.... This is the reason that I need to put in the unsafeInterleaveIO call, to make it lazy. IMHO, once you use unsafeInterleaveIO, you might as well use unsafePerformIO, as the risk is already there.
Second, breaking out the IO specific parts just substitutes the bubbling up of the IO monad with the bubbling up of the IO read code, as well as the passing around of the data (I might actually choose to pass around the data using the state monad anyway, so it really isn't an improvement to substitute the IO monad for the state monad everywhere). This wouldn't be so bad if the low level function itself wasn't effectively pure (ie- think of my /etc/mime.types example above, and imagine a Haskell extensionToMimeType function, which is basically pure, but needs to get the database data from the file.... Suddenly everything from low to high in the stack needs to call or pass through a readMimeData::IO String. Why should each main even need to care about the library choice of a submodule many levels deep?).
I agree with Anupam Jain, you would be better off reading these data files at a somewhat higher level, in IO, and then passing the data in them through the rest of your program purely.
You could, for example, put the functions that need the results of fileData into Reader [String], so that they can just ask for the results as needed (or some Reader Config, where Config holds these strings and whatever else you need).
A sketch of what I'm suggesting follows:
type AppResult = String
fileData :: IO [String]
fileData = undefined -- read the files
myApp :: String -> Reader [String] AppResult
myApp s = do
files <- ask
return undefined -- do whatever with s and config
main = do
config <- fileData
return $ runReader (myApp "test") config
I gather that you don't want to read all the data at once, because that would be costly. And maybe you don't really know up-front what files you will need to load, so loading all of them at the start would be wasteful.
Here's an attempt at a solution. It requires you to work inside a free monad and relegate the side-effecting operations to an interpreter. Some preliminary imports:
{-# LANGUAGE OverloadedStrings #-}
module Main where
import qualified Data.ByteString as B
import Data.Monoid
import Data.List
import Data.Functor.Compose
import Control.Applicative
import Control.Monad
import Control.Monad.Free
import System.IO
We define a functor for the free monad. It will offer a value p do the interpreter and continue the computation after receiving a value b:
type LazyLoad p b = Compose ((,) p) ((->) b)
A convenience function to request the loading of a file:
lazyLoad :: FilePath -> Free (LazyLoad FilePath B.ByteString) B.ByteString
lazyLoad path = liftF $ Compose (path,id)
A dummy interpreter function that reads "file contents" from stdin:
interpret :: Free (LazyLoad FilePath B.ByteString) a -> IO a
interpret = iterM $ \(Compose (path,next)) -> do
putStrLn $ "Enter the contents for file " <> path <> ":"
B.hGetLine stdin >>= next
Some silly example functions:
someComp :: B.ByteString -> B.ByteString
someComp b = "[" <> b <> "]"
takesAwhile :: Int
takesAwhile = foldl' (+) 0 $ take 400000000 $ intersperse (negate 1) $ repeat 1
An example program:
main :: IO ()
main = do
r <- interpret $ do
r1 <- someComp <$> lazyLoad "file1"
r2 <- return takesAwhile
if (r2 == 1)
then return r1
else someComp <$> lazyLoad "file2"
putStrLn . show $ r
When executed, this program will request a line, spend some time computing takesAwhile and only then request another line.
If want to allow different kinds of "requests", this solution could be extended with something like Data types à la carte so that each function only needs to know about about the precise effects it requires.
If you are content with allowing only one type of request, you could also use Clients and Servers from Pipes.Core instead of the free monad.
I' ve got a problem with Haskell. I have text file looking like this:
5.
7.
[(1,2,3),(4,5,6),(7,8,9),(10,11,12)].
I haven't any idea how can I get the first 2 numbers (2 and 7 above) and the list from the last line. There are dots on the end of each line.
I tried to build a parser, but function called 'readFile' return the Monad called IO String. I don't know how can I get information from that type of string.
I prefer work on a array of chars. Maybe there is a function which can convert from 'IO String' to [Char]?
I think you have a fundamental misunderstanding about IO in Haskell. Particularly, you say this:
Maybe there is a function which can convert from 'IO String' to [Char]?
No, there isn't1, and the fact that there is no such function is one of the most important things about Haskell.
Haskell is a very principled language. It tries to maintain a distinction between "pure" functions (which don't have any side-effects, and always return the same result when give the same input) and "impure" functions (which have side effects like reading from files, printing to the screen, writing to disk etc). The rules are:
You can use a pure function anywhere (in other pure functions, or in impure functions)
You can only use impure functions inside other impure functions.
The way that code is marked as pure or impure is using the type system. When you see a function signature like
digitToInt :: String -> Int
you know that this function is pure. If you give it a String it will return an Int and moreover it will always return the same Int if you give it the same String. On the other hand, a function signature like
getLine :: IO String
is impure, because the return type of String is marked with IO. Obviously getLine (which reads a line of user input) will not always return the same String, because it depends on what the user types in. You can't use this function in pure code, because adding even the smallest bit of impurity will pollute the pure code. Once you go IO you can never go back.
You can think of IO as a wrapper. When you see a particular type, for example, x :: IO String, you should interpret that to mean "x is an action that, when performed, does some arbitrary I/O and then returns something of type String" (note that in Haskell, String and [Char] are exactly the same thing).
So how do you ever get access to the values from an IO action? Fortunately, the type of the function main is IO () (it's an action that does some I/O and returns (), which is the same as returning nothing). So you can always use your IO functions inside main. When you execute a Haskell program, what you are doing is running the main function, which causes all the I/O in the program definition to actually be executed - for example, you can read and write from files, ask the user for input, write to stdout etc etc.
You can think of structuring a Haskell program like this:
All code that does I/O gets the IO tag (basically, you put it in a do block)
Code that doesn't need to perform I/O doesn't need to be in a do block - these are the "pure" functions.
Your main function sequences together the I/O actions you've defined in an order that makes the program do what you want it to do (interspersed with the pure functions wherever you like).
When you run main, you cause all of those I/O actions to be executed.
So, given all that, how do you write your program? Well, the function
readFile :: FilePath -> IO String
reads a file as a String. So we can use that to get the contents of the file. The function
lines:: String -> [String]
splits a String on newlines, so now you have a list of Strings, each corresponding to one line of the file. The function
init :: [a] -> [a]
Drops the last element from a list (this will get rid of the final . on each line). The function
read :: (Read a) => String -> a
takes a String and turns it into an arbitrary Haskell data type, such as Int or Bool. Combining these functions sensibly will give you your program.
Note that the only time you actually need to do any I/O is when you are reading the file. Therefore that is the only part of the program that needs to use the IO tag. The rest of the program can be written "purely".
It sounds like what you need is the article The IO Monad For People Who Simply Don't Care, which should explain a lot of your questions. Don't be scared by the term "monad" - you don't need to understand what a monad is to write Haskell programs (notice that this paragraph is the only one in my answer that uses the word "monad", although admittedly I have used it four times now...)
Here's the program that (I think) you want to write
run :: IO (Int, Int, [(Int,Int,Int)])
run = do
contents <- readFile "text.txt" -- use '<-' here so that 'contents' is a String
let [a,b,c] = lines contents -- split on newlines
let firstLine = read (init a) -- 'init' drops the trailing period
let secondLine = read (init b)
let thirdLine = read (init c) -- this reads a list of Int-tuples
return (firstLine, secondLine, thirdLine)
To answer npfedwards comment about applying lines to the output of readFile text.txt, you need to realize that readFile text.txt gives you an IO String, and it's only when you bind it to a variable (using contents <-) that you get access to the underlying String, so that you can apply lines to it.
Remember: once you go IO, you never go back.
1 I am deliberately ignoring unsafePerformIO because, as implied by the name, it is very unsafe! Don't ever use it unless you really know what you are doing.
As a programming noob, I too was confused by IOs. Just remember that if you go IO you never come out. Chris wrote a great explanation on why. I just thought it might help to give some examples on how to use IO String in a monad. I'll use getLine which reads user input and returns an IO String.
line <- getLine
All this does is bind the user input from getLine to a value named line. If you type this this in ghci, and type :type line it will return:
:type line
line :: String
But wait! getLine returns an IO String
:type getLine
getLine :: IO String
So what happened to the IOness from getLine? <- is what happened. <- is your IO friend. It allows you to bring out the value that is tainted by the IO within a monad and use it with your normal functions. Monads are easily identified because they begin with do. Like so:
main = do
putStrLn "How much do you love Haskell?"
amount <- getLine
putStrln ("You love Haskell this much: " ++ amount)
If you're like me, you'll soon discover that liftIO is your next best monad friend, and that $ help reduce the number of parenthesis you need to write.
So how do you get the information from readFile? Well if readFile's output is IO String like so:
:type readFile
readFile :: FilePath -> IO String
Then all you need is your friendly <-:
yourdata <- readFile "samplefile.txt"
Now if type that in ghci and check the type of yourdata you'll notice it's a simple String.
:type yourdata
text :: String
As people already say, if you have two functions, one is readStringFromFile :: FilePath -> IO String, and another is doTheRightThingWithString :: String -> Something, then you don't really need to escape a string from IO, since you can combine this two functions in various ways:
With fmap for IO (IO is Functor):
fmap doTheRightThingWithString readStringFromFile
With (<$>) for IO (IO is Applicative and (<$>) == fmap):
import Control.Applicative
...
doTheRightThingWithString <$> readStringFromFile
With liftM for IO (liftM == fmap):
import Control.Monad
...
liftM doTheRightThingWithString readStringFromFile
With (>>=) for IO (IO is Monad, fmap == (<$>) == liftM == \f m -> m >>= return . f):
readStringFromFile >>= \string -> return (doTheRightThingWithString string)
readStringFromFile >>= \string -> return $ doTheRightThingWithString string
readStringFromFile >>= return . doTheRightThingWithString
return . doTheRightThingWithString =<< readStringFromFile
With do notation:
do
...
string <- readStringFromFile
-- ^ you escape String from IO but only inside this do-block
let result = doTheRightThingWithString string
...
return result
Every time you will get IO Something.
Why you would want to do it like that? Well, with this you will have pure and
referentially transparent programs (functions) in your language. This means that every function which type is IO-free is pure and referentially transparent, so that for the same arguments it will returns the same values. For example, doTheRightThingWithString would return the same Something for the same String. However readStringFromFile which is not IO-free can return different strings every time (because file can change), so that you can't escape such unpure value from IO.
If you have a parser of this type:
myParser :: String -> Foo
and you read the file using
readFile "thisfile.txt"
then you can read and parse the file using
fmap myParser (readFile "thisfile.txt")
The result of that will have type IO Foo.
The fmap means myParser runs "inside" the IO.
Another way to think of it is that whereas myParser :: String -> Foo, fmap myParser :: IO String -> IO Foo.
just curious how to rewrite the following function to be called only once during program's lifetime ?
getHeader :: FilePath -> IO String
getHeader fn = readFile fn >>= return . take 13
Above function is called several times from various functions.
How to prevent reopening of the file if function gets called with the same parameter, ie. file name ?
I would encourage you to seek a more functional solution, for example by loading the headers you need up front and passing them around in some data structure like for example a Map. If explicitly passing it around is inconvenient, you can use a Reader or State monad transformer to handle that for you.
That said, you can accomplish this the way you wanted using by using unsafePerformIO to create a global mutable reference to hold your data structure.
import Control.Concurrent.MVar
import qualified Data.Map as Map
import System.IO.Unsafe (unsafePerformIO)
memo :: MVar (Map.Map FilePath String)
memo = unsafePerformIO (newMVar Map.empty)
{-# NOINLINE memo #-}
getHeader :: FilePath -> IO String
getHeader fn = modifyMVar memo $ \m -> do
case Map.lookup fn m of
Just header -> return (m, header)
Nothing -> do header <- take 13 `fmap` readFile fn
return (Map.insert fn header m, header)
I used an MVar here for thread safety. If you don't need that, you might be able to get away with using an IORef instead.
Also, note the NOINLINE pragma on memo to ensure that the reference is only created once. Without this, the compiler might inline it into getHeader, giving you a new reference each time.
The simplest thing is to just call it once at the beginning of main and pass the resulting String around to all the other functions that need it:
main = do
header <- getHeader
bigOldThingOne header
bigOldThingTwo header
You can use monad-memo package to wrap any monad into MemoT transformer. The memo table will be passed implicitly thoughout your monadic functions. Then use startEvalMemoT to convert memoized monad into ordinary IO:
{-# LANGUAGE NoMonomorphismRestriction #-}
import Control.Monad.Memo
getHeader :: FilePath -> IO String
getHeader fn = readFile fn >>= return . take 13
-- | 'memoized' version of getHeader
getHeaderm :: FilePath -> MemoT String String IO String
getHeaderm fn = memo (lift . getHeader) fn
-- | 'memoized' version of Prelude.print
printm a = memo (lift . print) a
-- | This will not print the last "Hello"
test = do
printm "Hello"
printm "World"
printm "Hello"
main :: IO ()
main = startEvalMemoT test
You should not use unsafePerformIO to solve this. The correct way to do exactly what you describe is to create an IORef that holds a Maybe, initially containing Nothing. Then you create an IO function which checks the value, and performs the computation if it is Nothing and stores the result as a Just. If it finds a Just it reuses the value.
All of this requires passing around the IORef reference, which is just as cumbersome as passing around the string itself, which is why everybody directly recommends just passing around the string itself, either explicitly or implicitly using the Reader monad.
There are incredibly few legitimate uses for unsafePerformIO and this is not one of them. Don't go down that path, otherwise you'll find yourself fighting Haskell when it keeps doing unexpected things. Every solution that uses unsafePerformIO as a "clever trick" always ends catastrophically (and that includes readFile).
Side note - you can simplify your getHeader function:
getHeader path = fmap (take 13) (readFile path)
Or
getHeader path = take 13 <$> readFile path
this may be a stupid question but i couldn't find answer anywhere. I'm a Haskell newbie and i'm having trouble with I/O.
I have this structure:
data SrcFile = SrcFile (IO Handle) String
srcFileHandle :: SrcFile -> IO Handle
srcFileHandle (SrcFile handle _) = handle
srcFileLine :: SrcFile -> String
srcFileLine (SrcFile _ string) = string
Now the problem is that i have no idea how to assign stdin/stderr/stdout into it, because the stdin etc are Handlers, no IO Handlers. And if i make the structure have Handle attributes insted of IO Handle, then i won't be able to add any other file handles into it.
Judging from your definition of SrcFile, it seems as though you may be trying to write a C program in Haskell. Language shapes the way we think, and the good news is Haskell is a much more powerful language!
The excellent book Real World Haskell has a section on lazy I/O. Consider an excerpt:
One novel way to approach I/O is the hGetContents function. hGetContents has the type Handle -> IO String. The String it returns represents all of the data in the file given by the Handle.
In a strictly-evaluated language, using such a function is often a bad idea. It may be fine to read the entire contents of a 2KB file, but if you try to read the entire contents of a 500GB file, you are likely to crash due to lack of RAM to store all that data. In these languages, you would traditionally use mechanisms such as loops to process the file's entire data.
Here's the radical part.
But hGetContents is different. The String it returns is evaluated lazily. At the moment you call hGetContents, nothing is actually read. Data is only read from the Handle as the elements (characters) of the list are processed. As elements of the String are no longer used, Haskell's garbage collector automatically frees that memory. All of this happens completely transparently to you. And since you have what looks like—and, really, is—a pure String, you can pass it to pure (non-IO) code.
Further down is a section on readFile and writeFile that shows you how to forget about handles entirely.
For example, say you want to grab all the import lines from a source file:
module Main where
import Control.Monad (liftM, mapM_)
import Data.List (isPrefixOf)
import System.Environment (getArgs, getProgName)
import System.IO (hPutStrLn, stderr)
main :: IO ()
main = getArgs >>= go
where go [path] = collectImports `liftM` readFile path >>= mapM_ putStrLn
go _ = getProgName >>=
hPutStrLn stderr . ("Usage: " ++) . (++ " source-file")
collectImports :: String -> [String]
collectImports = filter ("import" `isPrefixOf`)
. takeWhile (\l -> null l
|| "module" `isPrefixOf` l
|| "import" `isPrefixOf` l)
. lines
Even though the definition of main uses readFile, the program reads only as much of the named source-file as necessary, not the whole thing! There's nothing magic going on: note that collectImports uses takeWhile to examine only those lines it needs to rather than, say, filter that would have to read all lines.
When fed its own source, the program outputs
import Control.Monad (liftM, mapM_)
import Data.List (isPrefixOf)
import System.Environment (getArgs, getProgName)
import System.IO (hPutStrLn, stderr)
So embrace laziness. Laziness is your friend! Enjoy the rest of the wonderful journey with Haskell.
I'm not sure what you're really attempting to do, but you can convert a Handle to IO Handle by using return function. So,
stdin :: Handle
return stdin :: IO Handle
In fact, return is a polymorphic function. It's type is a -> m a where m can be IO, Maybe, [] and others. Don't confuse it with return in C - it's a normal function, not a keyword that is used to exit prematurely.
In your code, you can use record syntax. The following is equivalent and automatically declares srcFileHandle and srcFileLine as functions:
data SrcFile = SrcFile { srcFileHandle :: IO Handle,
srcFileLine :: String }
I don't quite get what you're trying to achieve.
An IO a means: An interaction with the outside world that, when run, will yield an a.
It therefore doesn't make sense to store an IO Handle in a data structure. You just store the handle and you can do IO with the handle, but for storing/loading it, you have no IO interaction involved.
Hence your structure is:
data SrcFile = SrcFile Handle String
If you want to change/add/manipulate the contents, you can use an IORef which you can use like a pointer from IO code.
I have a function that returns type ErrorT String IO (). While the function works, liftIO's litter every line that does IO. It makes for a mess. Is there any way to get around this and still have the ability to abort on error?
I assume this is the context of the question, so I'll repost the comment I left over there in case you didn't notice it:
If you use a few particular functions a lot, you could write a wrapper around them, e.g. liftedPutStr = liftIO . putStr. You could even import the originals qualified and make your lifted version use the same name, if you wanted. Also, a group of IO actions that won't raise errors can be pulled out into a single, separate function that can then be liftIOd just once. Does that help?
In case you're not familiar with qualified imports, here's putStr again as an example:
import Prelude hiding (putStr)
import qualified Prelude as P
import Control.Monad.Trans
putStr x = liftIO $ P.putStr x
That should let you use the altered putStr in a transformed IO the same way you'd normally use the real putStr in plain IO.