Why does the main function in Haskell not have any parameters? - haskell

I'm somewhat new to Haskell. I've worked through one or two tutorials, but I don't have much experience.
Every function in Haskell is pure, which is why we can't have any I/O without the IO monad.
What I don't understand is, why do the program parameters have to be an IO action as well?
The parameters are passed to the program like to a function.
Why can't the parameters be accessed like in a function?
To make things clear, I don't understand why the main function has to look like this:
import System.Environment (getArgs)
main :: IO ()
main = do
  args <- getArgs
  print args
instead of this:
main :: [String] -> IO ()
main args = do
  print args
I can't see any reason for it, and I haven't found an answer googling around.

It's a language design choice. Neither approach is strictly better than the other one.
Haskell could have been designed to have a main of either kind.
When one does need the program arguments, it would be more convenient to have them passed as function arguments to main.
When one does not need the program arguments, having them passed to main is slightly cumbersome, since we need to write a longer type, and an additional _ to discard the [String] argument.
Further, getArgs lets one access the program arguments anywhere in the program (inside IO), while having them passed only to main would force one to thread them through the rest of the program by hand.
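To illustrate the last point, here is a minimal sketch of a helper that looks up a flag by itself, without main handing the argument list down to it. The names isVerbose and logDebug and the --verbose flag are made up for this example; getArgs and withArgs are the real functions from System.Environment.

```haskell
import Control.Monad (when)
import System.Environment (getArgs, withArgs)

-- Hypothetical helper: checks the program arguments on its own.
isVerbose :: IO Bool
isVerbose = elem "--verbose" <$> getArgs

-- Hypothetical logger that may live far away from main.
logDebug :: String -> IO ()
logDebug msg = do
  verbose <- isVerbose
  when verbose (putStrLn ("[debug] " ++ msg))

main :: IO ()
main = do
  -- Printed only if the program was started with --verbose.
  logDebug "started"
  -- withArgs replaces the argument list for the enclosed action,
  -- so this message is printed regardless of the real arguments.
  withArgs ["--verbose"] (logDebug "inside withArgs")
```

With a main :: [String] -> IO () design, logDebug would need an extra parameter (or some other plumbing) to see the arguments.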
(Short digression) For what it's worth, I had a similar reaction to yours a long time ago when I discovered that in Java we have void main() instead of int main() as in C. Then I realized that in most programs I always wrote return 0; at the end, so it makes little sense to always require that. In Java that's the implicit default, and when we really need to return something else, we use System.exit(). Even if that is the way it's done in a previous language (C, in this case), new languages can choose a new way to make available the same functionality.

I tend to agree with chi's answer, that there's no clearly compelling reason it has to be done either way, so it really comes down to a somewhat subjective judgement call that was made by a small group of people a long time ago. There's no guarantee that there is going to be any particularly satisfying reason behind it.
Here is some reasoning that comes to my mind (which may or may not have been something the original designers thought of at the time, or even would agree with).
What we'd really love to be able to do is something like:
main :: Integer -> String -> Set Flag -> IO ()
(for some hypothetical program that takes as command line arguments an integer, a string, and a set of flags)
Being able to write small command line programs as if they were just a function of their command line arguments would be great! But that would need the operating system (or at least the shell) to understand the types used in a Haskell program and know how to parse them (and what to do if parsing fails, or if there aren't enough arguments, and so on), which isn't going to happen.
Perhaps we could write a wrapper to do that. It could take care of parsing the raw string command line arguments into Haskell types and generating error messages (if needed), and then call main for us. But wait, we can do exactly that! We just have to call the wrapper main (and rename what we were previously calling main)!
The point is this: if you want to think of your program as a simple function of external inputs, that makes a lot of sense, but main is not that function. main works much better as a wrapper that takes care of the ugly details of receiving input over an untyped interface and calling the function that "really is" your program.
Forcing you to include a call to getArgs in your set up code makes it more apparent there's more to handling command line arguments than just getting access to them, and possibly nudges you to writing some of that extra handling code rather than just writing main (arg1 : arg2 : _) = do stuffWith arg1 arg2.
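A minimal sketch of that wrapper idea, assuming a made-up program that takes a count and a string. The names realMain and parseArgs and the usage text are hypothetical; readMaybe (Text.Read) and die (System.Exit) come from base.

```haskell
import System.Environment (getArgs)
import System.Exit (die)
import Text.Read (readMaybe)

-- The function that "really is" the program: it only sees typed inputs.
realMain :: Integer -> String -> IO ()
realMain n word = putStrLn (concat (replicate (fromIntegral n) word))

-- Turn the raw, untyped argument list into typed values,
-- or produce an error message explaining what went wrong.
parseArgs :: [String] -> Either String (Integer, String)
parseArgs [nStr, word] =
  case readMaybe nStr of
    Just n  -> Right (n, word)
    Nothing -> Left ("not an integer: " ++ nStr)
parseArgs _ = Left "usage: prog COUNT WORD"

-- main is only the wrapper around the untyped interface.
main :: IO ()
main = do
  args <- getArgs
  case parseArgs args of
    Right (n, word) -> realMain n word
    Left err        -> die err
```

The pattern match in parseArgs is where missing or surplus arguments get handled, which is exactly the part that main (arg1 : arg2 : _) would silently skip.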
Also, it is super trivial to convert the interface we have to the one you want:
import System.Environment
main = real_main =<< getArgs
real_main :: [String] -> IO ()
real_main args = print args
So you can have it whichever way you prefer!

Related

How does lazy evaluation interplay with MVars?

Let's say I have multiple threads that are reading from a file and I want to make sure that only a single thread is reading from the file at any point in time.
One way to implement this is to use an mvar :: MVar () and ensure mutual exclusion as follows:
thread = do
  ...
  _ <- takeMVar mvar
  x <- readFile "somefile" -- critical section
  putMVar mvar ()
  ...
  -- do something that evaluates x.
The above should work fine in strict languages, but unless I'm missing something, I might run into problems with this approach in Haskell. In particular, since x is evaluated only after the thread exits the critical section, it seems to me that the file will only be read after the thread has executed putMVar, which defeats the point of using MVars in the first place, as multiple threads may read the file at the same time.
Is the problem that I'm describing real and, if so, how do I get around it?
Yes, it's real. You get around it by avoiding all the base functions that are implemented using unsafeInterleaveIO. I don't have a complete list, but that's at least readFile, getContents, hGetContents. IO actions that don't do lazy IO -- like hGet or hGetLine -- are fine.
If you must use lazy IO, then fully evaluate its results in an IO action inside the critical section, e.g. by combining rnf and evaluate.
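A minimal sketch of that workaround. To stay within base, it forces the contents with evaluate (length contents) instead of evaluate (rnf contents); for a String, walking the whole list is enough to make the read happen, while rnf from the deepseq package is the more general tool. The name readFileLocked is made up, and withMVar replaces the manual takeMVar/putMVar pair so the lock is released even if an exception is thrown.

```haskell
import Control.Concurrent.MVar (MVar, newMVar, withMVar)
import Control.Exception (evaluate)

-- Hypothetical helper: read a whole file while holding the lock.
readFileLocked :: MVar () -> FilePath -> IO String
readFileLocked lock path = withMVar lock $ \_ -> do
  contents <- readFile path        -- lazy: nothing has been read yet
  _ <- evaluate (length contents)  -- walks the list, so the file is read here
  return contents

main :: IO ()
main = do
  lock <- newMVar ()
  writeFile "somefile" "hello\n"   -- scratch file so the sketch is runnable
  x <- readFileLocked lock "somefile"
  putStr x                         -- x is already fully read at this point
```

By the time withMVar returns, the file has been consumed, so evaluating x later no longer touches the file.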
Some other commentary on related things that doesn't directly answer this question:
Laziness and lazy IO are really separate concepts. They happen to share a name because humans are lazy at naming. Most IO actions do not involve lazy IO and do not run into this problem.
There is a related problem about stuffing unevaluated pure computations into your MVar and accidentally evaluating it on a different thread than you were expecting, but if you avoid lazy IO then evaluating on the wrong thread is merely a performance bug rather than an actual semantics bug.
readFile should be named unsafeReadFile because it's unsafe in the same way as unsafeInterleaveIO. If you stay away from functions that have, or should have, the unsafe prefix then you won't have this problem.
Haskell isn't a lazily evaluated language. It's a language in which, as in mathematics, evaluation order doesn't matter (except that you mustn't spend an unbounded amount of time trying to evaluate a function's argument before evaluating the function body). Compilers are free to reorder computations for efficiency reasons, and GHC does, so programs compiled with GHC aren't lazily evaluated as a rule.
readFile (along with getContents and hGetContents) is one of a small number of standard Haskell functions without the unsafe prefix that violate Haskell's value semantics. GHC has to specially disable its optimizations when it encounters such functions because they make program transformations observable that aren't supposed to be observable.
These functions are convenient hacks that can make some toy programs easier to write. You shouldn't use them in threaded code, or, in my opinion, at all. I think they shouldn't even be used in introductory programming courses (which is probably what they were meant for) because they give beginners a totally wrong impression of how evaluation in Haskell is supposed to work.

Use of unsafePerformIO in programming language interpreter runtime

To add IO functions to a programming language interpreter written in Haskell, I have basically two options:
Modify the entire interpreter to run inside the IO monad
Have the runtime functions that can be invoked by interpreted programs use unsafePerformIO.
The former feels like a bad idea to me -- this effectively negates any purity benefits by having IO reach practically everywhere in the program. I also currently use ST heavily, and would have to modify large quantities of the program to achieve this, as there is no way I can see to use both ST and IO at the same time (?).
The latter makes me nervous -- as the function name states, it is unsafe, but I think in this situation it may be justified. Particularly:
The amount of code touched by this change would be very small.
The points at which IO may be performed are explicitly sequenced already by the use of seq at control points during evaluation of interpreted expressions.
Perhaps more importantly, values returned by IO actions would only be used within interpreted sections of code, where I can guarantee referential transparency by the fact that the interpreter cannot be called multiple times with the same arguments, as an operation counter will be threaded through the entire system as part of the same change, and is always passed with a unique value to every function that would use unsafePerformIO.
In this circumstance, is there a good reason not to use unsafePerformIO?
Update
I was asked why I want to retain purity in the interpreter. There are a number of reasons, but perhaps the most pressing is that I intend to later build a compiler for this language, and the language will include a variety of metaprogramming techniques that will require the compiler to include the interpreter, but I want to be able to guarantee purity of the results of compilation. The language will have a pure subset for this purpose, and I would like the interpreter to be pure when executing that subset.
If I understand it correctly, you want to add IO actions to the interpreted language (impure primops), while the interpreter itself stays pure.
The first option is to abstract the primops away from the interpreter. For example, the interpreter could run in some unspecified monad, while the primops are injected:
data Primops m = Primops
  { putChar :: Char -> m ()
  , getChar :: m Char
  , ...
  }
interpret :: Monad m => Primops m -> Program -> m ()
Now the interpreter can't perform any IO action except the closed list of primops. (You can achieve a similar result using a custom monad instead of passing the primops as an argument.)
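A minimal sketch of how the injection could look end to end. Everything about the toy language here (Instr, Emit, Echo) is invented for the example, and the fields are renamed to primPutChar and primGetChar to avoid clashing with the Prelude. The pure instance uses the writer-style monad ((,) String) from base to collect output and returns a fixed character as input, which is only a stand-in for a proper state monad.

```haskell
-- Hypothetical toy language: a program is a list of instructions.
data Instr = Emit Char | Echo  -- Echo reads one character and writes it back
type Program = [Instr]

data Primops m = Primops
  { primPutChar :: Char -> m ()
  , primGetChar :: m Char
  }

-- The interpreter only knows about the primops it is given.
interpret :: Monad m => Primops m -> Program -> m ()
interpret ops = mapM_ step
  where
    step (Emit c) = primPutChar ops c
    step Echo     = primGetChar ops >>= primPutChar ops

-- Primops backed by real I/O.
ioPrimops :: Primops IO
ioPrimops = Primops { primPutChar = putChar, primGetChar = getChar }

-- Pure primops: output is accumulated in the first tuple component,
-- and every read yields the placeholder character '?'.
purePrimops :: Primops ((,) String)
purePrimops = Primops
  { primPutChar = \c -> ([c], ())
  , primGetChar = ("", '?')
  }

-- Run a program purely and return everything it would have printed.
runPure :: Program -> String
runPure = fst . interpret purePrimops

main :: IO ()
main = do
  putStrLn (runPure [Emit 'h', Emit 'i', Echo])
  interpret ioPrimops [Emit 'o', Emit 'k', Emit '\n']
```

The same interpret function serves both cases; only the record passed in decides whether any real I/O happens.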
But I'd consider it over-engineering until you say exactly why you need a pure interpreter. Probably you don't? If you just want to make the pure parts of the interpreter easy to test, then it is probably better to extract those parts into separate pure functions. That way the top-level entry point will be impure but small, and all of the interpreter's logic will still be testable.

Using main in a Haskell file

I've done a fair bit of programming in Haskell using GHCi; however, our next assignment requires us to use just GHC to compile and test our code.
Because of how GHCi works compared to GHC, you apparently need to define a main function, and GHC looks for this function in your code.
My question is, if Haskell promotes type safety and no side effects unless within an IO action, why does the main part of any Haskell program have to be an IO action?
Forgive me if I'm not understanding something fundamental, I just couldn't find any resources which ultimately explain this.
If your main function is not an IO action, then all it can do is produce a result. Since Haskell is lazy, this (usually) means that it just produces a promise that the computation will be performed, but it won't compute it until it needs to be used. Since the usual way to ensure something is computed is to print that value out, or send it over a network, save it to disk, or use it for some other kind of IO, your result won't be computed and the program would simply exit. Imagine a fake program like
main :: Int
main = 1 + 1
Suppose you could compile and run this. What would you expect to happen? Nothing gets printed and nothing is asking for the result of main, so all that Haskell can do with this is create the promise that 1 + 1 will be computed at some point and then exit the program. Basically, you can't do anything interesting at the top level without IO, and since we want programs to do interesting things, we need our top level to be an IO action.
Put simply, running a program is a side-effect. This is why the top-level function is an I/O action.
An ideal Haskell program is a large chunk of pure code with a thin I/O "skin" around it.
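A minimal sketch of that shape, with a made-up pure function summarize standing in for the "large chunk of pure code":

```haskell
import System.Environment (getArgs)

-- Pure core: no IO in the type, so it can be tested without running a program.
-- 'summarize' is a hypothetical example function.
summarize :: [String] -> String
summarize args = show (length args) ++ " argument(s): " ++ unwords args

-- Thin I/O skin: fetch the input, call the pure core, print the result.
main :: IO ()
main = do
  args <- getArgs
  putStrLn (summarize args)
```

Only main touches the outside world; everything it calls can be reasoned about as an ordinary function.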
I think it's pretty straightforward - ultimately, you need every program to do IO at the top level - otherwise how would you know if it did anything at all, or - how useful would it be? So you must have either console I/O, network I/O, disk I/O, or something similar. I don't see how you could get around that.

IO FileOffset seems pretty much useless

I'm sure it's not, but I've received the type IO FileOffset from System.Posix functions, and I can't figure out what I can do with it. It seems like it's just a rename of type COff, which seems to be just a wrapper for Int64, and in fact when I get it in GHCi, I can see the number that the IO FileOffset corresponds to. However, I can't add it to anything else, print it out (except through the interpreter), or even convert it to another type. It seems to be immune to show.
How can I actually use this type? I'm new to Haskell so I'm sure I'm missing something fundamental about types and possibly the documentation.
As discussed in numerous other questions, like this, there is never anything you can do with an IO a value as such, except bind it in another IO computation, which eventually has to be invoked from either main or GHCi. And this is not some stupid arbitrary restriction of Haskell, but reflects the fact that something like a file offset cannot possibly be known without the program first going "out into the world", doing the file operation, and coming back with the result. In impure languages, this kind of thing just suddenly happens when you try to evaluate an IO "function", but just because half a century of imperative programming has done it this way doesn't mean it's a good idea. In fact it's a cause of quite a lot of bugs in non-purely functional languages, but more importantly it makes it way harder to understand what some library function will actually do. In Haskell you only need to look at the signature, and when there's no IO in it, you can be utterly sure[1] it won't do any!
Question remains: how do you actually get any “real” work done? Well, it's pretty clever. For beginners, it's probably helpful to keep to this guideline:
An IO action always needs to be evaluated in a do block.
To retrieve results from such actions, use the val <- action syntax. This can stand anywhere in a do block except at the end. It is equivalent to what procedural languages write as var val = action() or similar. If action had a type IO T, then val will have simply type T!
The value obtained this way can be used anywhere in the same do block below the line you've obtained it from. Quite like in procedural languages.
So if your action was, say,
findOffsetOfFirstChar :: Handle -> Char -> IO FileOffset
you can use it like this:
printOffsetOfQ :: Handle -> IO ()
printOffsetOfQ h = do
  offset <- findOffsetOfFirstChar h 'Q'
  print offset
Later on you'll learn that many of these dos aren't really necessary, but for the time being it's probably easiest to use them everywhere there's IO going on.
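As a rough sketch of both styles, here is the same kind of action written with and without do. Since findOffsetOfFirstChar is only a hypothetical example, this uses hFileSize from System.IO as a stand-in for "an IO action that returns a number"; the file name example.txt is made up as well.

```haskell
import System.IO (Handle, IOMode (ReadMode), hFileSize, withFile)

-- With do-notation: bind the result, then use it as an ordinary Integer.
printSizePlusOne :: Handle -> IO ()
printSizePlusOne h = do
  size <- hFileSize h
  print (size + 1)

-- The same action without do, using >>= directly.
printSizePlusOne' :: Handle -> IO ()
printSizePlusOne' h = hFileSize h >>= \size -> print (size + 1)

-- fmap (<$>) converts the eventual result without leaving IO.
sizeInKiB :: Handle -> IO Double
sizeInKiB h = (\n -> fromIntegral n / 1024) <$> hFileSize h

main :: IO ()
main = do
  writeFile "example.txt" "QQQQ"  -- scratch file so the sketch is runnable
  withFile "example.txt" ReadMode $ \h -> do
    printSizePlusOne h
    printSizePlusOne' h
    sizeInKiB h >>= print
```

In all three cases the number is only ever used inside an IO computation, which is the whole point: adding to it, converting it, and printing it all work once it has been bound.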
[1] Some people will now object that there is a thing called unsafePerformIO which allows you to do IO without the signature telling so, but apart from being, well, unsafe, this does not actually belong to the Haskell language but to its foreign function interface.

A way to avoid a common use of unsafePerformIO

I often find this pattern in Haskell code:
options :: MVar OptionRecord
options = unsafePerformIO $ newEmptyMVar
...
doSomething :: Foo -> Bar
doSomething = unsafePerformIO $ do
  opt <- readMVar options
  doSomething' where ...
Basically, one has a record of options or something similar that is set once at the program's beginning. As the programmer is lazy, he doesn't want to carry the options record all over the program. He defines an MVar to keep it, defined by an ugly use of unsafePerformIO. The programmer ensures that the state is set only once and before any operation has taken place. Now each part of the program has to use unsafePerformIO again, just to extract the options.
In my opinion, such a variable can be considered pragmatically pure (don't beat me). Is there a library that abstracts this concept away and ensures that the variable is set only once, i.e. that no call is done before that initialization, and that one doesn't have to write unsafeFireZeMissilesAndMakeYourCodeUglyAndDisgustingBecauseOfThisLongFunctionName?
Those who would trade essential referential transparency for a little temporary convenience deserve neither purity nor convenience.
This is a bad idea. The code that you're finding this in is bad code.*
There's no way to fully wrap this pattern up safely, because it is not a safe pattern. Do not do this in your code. Do not look for a safe way to do this. There is not a safe way to do this. Put the unsafePerformIO down on the floor, slowly, and back away from the console...
*There are legitimate reasons that people do use top level MVars, but those reasons have to do with bindings to foreign code for the most part, or a few other things where the alternative is very messy. In those instances, as far as I know, however, the top level MVars are not accessed from behind unsafePerformIO.
If you are using an MVar for holding settings or something similar, why don't you try the Reader monad?
foo :: ReaderT OptionRecord IO ()
foo = do
  options <- ask
  fireMissiles
main = runReaderT foo (OptionRecord "foo")
(And regular Reader if you don't require IO :P)
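A slightly fuller sketch of the ReaderT approach, assuming the transformers package (which ships with GHC) is available. The fields optName and optVerbose and the functions greeting and bar are invented for the example; fireMissiles is replaced by harmless printing.

```haskell
import Control.Monad (when)
import Control.Monad.IO.Class (liftIO)
import Control.Monad.Trans.Reader (ReaderT, ask, asks, runReaderT)

-- Hypothetical options record for this sketch.
data OptionRecord = OptionRecord
  { optName    :: String
  , optVerbose :: Bool
  }

-- Pure code just takes the record (or the fields it needs) as an argument.
greeting :: OptionRecord -> String
greeting opts = "hello, " ++ optName opts

-- Code in ReaderT can ask for the options without receiving them explicitly.
foo :: ReaderT OptionRecord IO ()
foo = do
  opts <- ask
  liftIO (putStrLn (greeting opts))
  bar

-- Called from foo; the options travel along implicitly.
bar :: ReaderT OptionRecord IO ()
bar = do
  verbose <- asks optVerbose
  when verbose (liftIO (putStrLn "verbose mode is on"))

-- The options are supplied exactly once, at the top.
main :: IO ()
main = runReaderT foo (OptionRecord "foo" True)
```

Nothing here needs unsafePerformIO, and the type of each function says whether it depends on the options.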
Use implicit parameters. They're slightly less heavyweight than making every function have Reader or ReaderT in its type. You do have to change the type signatures of your functions, but I think such a change can be scripted. (Would make a nice feature for a Haskell IDE.)
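A minimal sketch of the implicit-parameter variant, using GHC's ImplicitParams extension. The record field optName and the functions greet and shout are made up for the example.

```haskell
{-# LANGUAGE ImplicitParams #-}

-- Hypothetical options record for this sketch.
data OptionRecord = OptionRecord { optName :: String }

-- The constraint (?options :: OptionRecord) says "I need the options",
-- but callers do not pass them as an explicit argument.
greet :: (?options :: OptionRecord) => String
greet = "hello, " ++ optName ?options

-- Calling greet propagates the constraint to the caller's signature.
shout :: (?options :: OptionRecord) => String
shout = greet ++ "!"

-- The parameter is bound once, at the top.
main :: IO ()
main =
  let ?options = OptionRecord { optName = "foo" }
  in putStrLn shout
```

The type signatures still change, as the answer says, but the function bodies and call sites stay free of explicit option passing.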
There is an important reason for not using this pattern. As far as I know, in
options :: MVar OptionRecord
options = unsafePerformIO $ newEmptyMVar
Haskell gives no guarantees that options will be evaluated only once. Since options is a pure value, it can be memoized and reused, but it can also be recomputed at every use (i.e. inlined), and the meaning of the program must not change (contrary to your case).
If you still decide to use this pattern, be sure to add {-# NOINLINE options #-}, otherwise it might get inlined and your program will fail! (And by this we're getting out of the guarantees given by the language and the type system and relying solely on the implementation of a particular compiler.)
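For reference, a sketch of where the pragma goes, shown only to make the caveat concrete and not as a recommendation. The record field optName is made up; the behaviour relies on GHC honouring NOINLINE, which is an implementation detail rather than a language guarantee.

```haskell
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, readMVar)
import System.IO.Unsafe (unsafePerformIO)

-- Hypothetical options record for this sketch.
data OptionRecord = OptionRecord { optName :: String }

-- Without NOINLINE, GHC may inline 'options' at each use site,
-- creating a fresh, empty MVar every time.
{-# NOINLINE options #-}
options :: MVar OptionRecord
options = unsafePerformIO newEmptyMVar

main :: IO ()
main = do
  putMVar options (OptionRecord "foo")  -- set once, at startup
  opts <- readMVar options              -- read in IO, no unsafePerformIO needed
  putStrLn (optName opts)
```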
This topic has been widely discussed and possible solutions are nicely summarized on Haskell Wiki in Top level mutable state. Currently it's not possible to safely abstract this pattern without some additional compiler support.
"I often find this pattern in Haskell code:"
Read different code.
"As the programmer is lazy, he doesn't want to carry the options record all over the program. He defines an MVar to keep it - defined by an ugly use of unsafePerformIO. The programmer ensures, that the state is set only once and before any operation has taken place. Now each part of the program has to use unsafePerformIO again, just to extract the options."
Sounds like literally exactly what the reader monad accomplishes, except that the reader monad does it in a safe way. Instead of accommodating your own laziness, just write actual good code.
