Why/how does recursive IO work? - haskell

Haskell IO is often explained in terms of the entire program being a pure function (main) that returns an IO value (often described as an imperative IO program), which is then executed by the runtime.
This mental model works fine for simple examples, but fell over for me as soon as I saw a recursive main in Learn You A Haskell. For example:
main = do
  line <- getLine
  putStrLn line
  main
Or, if you prefer:
main = getLine >>= putStrLn >> main
Since main never terminates, it never actually returns an IO value, yet the program endlessly reads and echoes back lines just fine - so the simple explanation above doesn't quite work. Am I missing something simple or is there a more complete explanation (or is it 'simply' compiler magic) ?

In this case, main is a value of type IO () rather than a function. You can think of it as a sequence of IO a values:
main = getLine >>= putStrLn >> main
This makes it a recursive value, not unlike infinite lists:
foo = 1 : 2 : foo
We can return a value like this without needing to evaluate the whole thing. In fact, it's a reasonably common idiom.
foo will loop forever if you try to use the whole thing. But that's true of main too: unless you use some external method to break out of it, it will never stop looping! But you can start getting elements out of foo, or executing parts of main, without evaluating all of it.
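A minimal sketch of that last point: a consumer only unfolds as much of foo as it asks for. The name firstFive is made up for illustration.

```haskell
-- A self-referential list: finite in memory, infinite when unfolded.
foo :: [Integer]
foo = 1 : 2 : foo

-- Taking a finite prefix terminates, because only five cells are demanded.
firstFive :: [Integer]
firstFive = take 5 foo -- [1,2,1,2,1]
```

Asking for something like length foo, by contrast, would never finish, just as running main to completion never does.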

The value main denotes is an infinite program:
main = do
  line <- getLine
  putStrLn line
  line <- getLine
  putStrLn line
  line <- getLine
  putStrLn line
  line <- getLine
  putStrLn line
  line <- getLine
  putStrLn line
  line <- getLine
  putStrLn line
  ...
But it's represented in memory as a recursive structure that references itself. That representation is finite, unless someone tries to unfold the entire thing to get a non-recursive representation of the entire program - that would never finish.
But just as you can probably figure out how to start executing the infinite program I wrote above without waiting for me to tell you "all" of it, so can Haskell's runtime system figure out how to execute main without unfolding the recursion up-front.
Haskell's lazy evaluation is actually interleaved with the runtime system's execution of the main IO program, so this works even for a function that returns an IO action which recursively invokes the function, like:
main = foo 1
foo :: Integer -> IO ()
foo x = do
  print x
  foo (x + 1)
Here foo 1 is not a recursive value (it contains foo 2, not foo 1), but it's still an infinite program. However this works just fine, because the program denoted by foo 1 is only generated lazily on-demand; it can be produced as the runtime system's execution of main goes along.
By default Haskell's laziness means that nothing is evaluated until it's needed, and then only "just enough" to get past the current block. Ultimately the source of all the "need" in "until it's needed" comes from the runtime system needing to know what the next step in the main program is so it can execute it. But it's only ever the next step; the rest of the program after that can remain unevaluated until after the next step has been fully executed. So infinite programs can be executed and do useful work so long as it's always only a finite amount of work to generate "one more step".
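Here is a small sketch of "only a finite amount of work per step", under some assumptions: steps and runFirst are hypothetical names, and an IORef log stands in for printing so the result can be inspected. The list of actions is infinite, but only the actions that actually get executed are ever generated.

```haskell
import Data.IORef (IORef, modifyIORef, newIORef, readIORef)

-- An infinite list of actions; the n-th action appends n to the log.
steps :: IORef [Integer] -> [IO ()]
steps ref = [modifyIORef ref (++ [n]) | n <- [1 ..]]

-- Execute only the first k steps of the infinite program and return the log.
runFirst :: Int -> IO [Integer]
runFirst k = do
  ref <- newIORef []
  sequence_ (take k (steps ref))
  readIORef ref
```

For example, runFirst 3 returns [1,2,3]; the fourth action is never even constructed.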

Related

Endless loop and a break for TUI in Haskell

I want to listen for keypresses and, depending on those, use commands from the System.Console.ANSI package to manipulate the console interface for my program.
In Python I would do this:
while True:
    read_from_console()
    if condition:
        print_stuff_into_console()
        break
How do I approach such task in Haskell, in simplest possible way?
Thanks
The equivalent abstract pseudo-ish code in Haskell would look like:
loop = do
  line <- readFromConsole
  if condition line
    then do
      printStuffToConsole
      loop -- Recurse - i.e. repeat the same thing again
    else
      pure () -- Don't recurse - the function execution ends
main = loop
But of course the devil would be in how readFromConsole and printStuffToConsole look. And these really depend on what exactly you'd like to do.
I will offer the dumbest possible implementation, just to illustrate how everything works and to build a complete program.
Let's say that "read from console" just means having the user enter a line of text and press Enter. For that, you can use the getLine function:
readFromConsole = getLine
And let's say you want to print the same thing every time. For printing, you can use the putStrLn function:
printStuffToConsole = putStrLn "Give me another!"
And then let's say that the condition for stopping is that the user enters "STOP". This can be expressed with a string comparison:
condition line = line /= "STOP"
If you put all of that together, you get a complete program:
loop = do
  line <- readFromConsole
  if condition line
    then do
      printStuffToConsole
      loop -- Recurse - i.e. repeat the same thing again
    else
      pure () -- Don't recurse - the function execution ends
  where
    readFromConsole = getLine
    printStuffToConsole = putStrLn "Give me another!"
    condition line = line /= "STOP"
main = loop
Of course, while it's nice to have parts of the program semantically named, you don't strictly speaking have to do it if you wanted to make the whole thing shorter:
main = do
  line <- getLine
  if line /= "STOP"
    then do
      putStrLn "Give me another!"
      main
    else
      pure ()
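Since the question asks about keypresses rather than whole lines, here is a hedged sketch of the same recursive loop reading one character at a time. The names react, keyLoop and runKeyLoop are hypothetical, 'q' as the quit key is an assumption, and how unbuffered input behaves can differ by terminal and platform.

```haskell
import System.IO (BufferMode (NoBuffering), hSetBuffering, hSetEcho, stdin)

-- Decide what to do for one key: Nothing means "break out of the loop".
react :: Char -> Maybe String
react 'q' = Nothing
react c = Just ("pressed " ++ [c])

-- Read a key, act on it, and recurse unless the handler asked to stop.
keyLoop :: IO ()
keyLoop = do
  c <- getChar
  case react c of
    Nothing -> pure ()
    Just msg -> putStrLn msg >> keyLoop

-- Intended to be used as main (main = runKeyLoop).
runKeyLoop :: IO ()
runKeyLoop = do
  hSetBuffering stdin NoBuffering -- deliver keys without waiting for Enter
  hSetEcho stdin False -- don't echo typed keys back
  keyLoop
```

Keeping the decision in a pure function like react means the loop itself stays tiny, and calls into System.Console.ANSI could replace the putStrLn.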
Fyodor Soikin already provided the simple way to do it.
Here I'll comment on a general way to "break" a loop: using continuations and callCC.
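-- forever and lift are imported explicitly; newer mtl versions may not re-export them
import Control.Monad (forever)
import Control.Monad.Cont
import Control.Monad.Trans.Class (lift)
main :: IO ()
main = do
  putStrLn "start"
  flip runContT return $ callCC $ \break -> forever $ do
    l <- lift $ getLine
    if l == "quit"
      then break ()
      else lift $ putStrLn $ "not a quit command " ++ l
    lift $ putStrLn "next iteration"
  putStrLn "end"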
Continuations are infamously hard to grasp, but the above code is not too complex. A rough intuition is as follows.
The forever library function repeats an action indefinitely; it is the Haskell equivalent of while true.
The flip runContT return $ callCC $ \f -> .... part means "define f to be a break-like function, which will exit the block .... immediately". In the code, I call that function break to make that clear. The call break () interrupts the forever (and returns the () outside -- we could use that value if we wrote x <- flip runContT .... to bind it to x).
There is a downside, though. In the .... part we no longer work inside the IO monad, but in the ContT () IO monad. That is what lets us call break (). In order to use regular IO there, we need to lift the IO actions. So, we can't use putStrLn ".." but we need to use lift $ putStrLn ".." instead.
The rest should be more or less straightforward to follow.
Here's a small demo in GHCi.
> main
start
1 (typed by the user)
not a quit command 1
next iteration
2 (typed by the user)
not a quit command 2
next iteration
3 (typed by the user)
not a quit command 3
next iteration
4 (typed by the user)
not a quit command 4
next iteration
quit (typed by the user)
end
Is it a good idea to use continuations just for break? Maybe. If you are not familiar with this technique, it is probably not worth it. The plain recursive approach looks much simpler.
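To illustrate the remark that the value passed to the break-like function comes out of the block, here is a small pure sketch using the same callCC from mtl. The function firstNegative and its exit parameter are hypothetical names, and it runs in plain Cont rather than ContT over IO.

```haskell
import Control.Monad (forM_, when)
import Control.Monad.Cont (callCC, evalCont)

-- Walk the list and "break" with the first negative element, if any.
firstNegative :: [Int] -> Maybe Int
firstNegative xs = evalCont $ callCC $ \exit -> do
  forM_ xs $ \x ->
    when (x < 0) (exit (Just x)) -- leaves the whole block immediately
  pure Nothing -- reached only if exit was never called
```

Here firstNegative [3, 1, -4, -2] gives Just (-4): the loop stops at the first hit and the argument of exit becomes the result of the block.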

What's the use of main returning IO Something rather than IO()?

I'm reading http://learnyouahaskell.com/ ... And something surprised me:
Because of that, main always has a type signature of main :: IO something, where something is some concrete type.
So main doesn't have to be of type IO(), but rather can be IO(String) or IO(Int)? But what's the use of this?
I did some playing...
m@m-X555LJ:~$ cat wtf.hs
main :: IO Int
main = fmap (read :: String -> Int) getLine
m@m-X555LJ:~$ runhaskell wtf.hs
1
m@m-X555LJ:~$ echo $?
0
m@m-X555LJ:~$
Hmm. So my first hypothesis is disproven. I thought this was a way for a Haskell program to return exit status to the shell, much like a C program starts from int main() and reports the exit status with return 0 or return 1.
But nope: the above program consumes the 1 from input and then does nothing, and in particular doesn't seem to return this 1 to the shell.
One more test:
m@m-X555LJ:~$ cat wtf.hs
main = getContents
m@m-X555LJ:~$ runhaskell wtf.hs
m@m-X555LJ:~$
Wow. This time I tried returning IO String. For reasons unknown to me, this time Haskell doesn't even wait for input, as it did when I was returning IO Int. The program seems to simply do nothing.
This hints that the value is really not returned anywhere: apparently, since the results of getContents are nowhere used, the whole instruction was skipped due to laziness. But if this was the case, why was returning IO Int not skipped? Well yes: I did fmap read on the IO action; but same stuff seems to apply, computing the read is only necessary if the result of the action is used, which - as the main = getContents example seems to hint - is not used, so laziness should also skip the read and hence also the getLine, right? Well, wrong - but I'm confused why.
What's the use of returning IO Something from main rather than only IO ()?
This is actually multiple questions, but in order:
The 'result' of main doesn't have a meaning, that is why it can be () or anything else. It's not used at all.
The reason types other than IO () are allowed for main is convenience: otherwise you'd always have to write something like main = void $ realMain to discard results (you may well want the last thing that happens to be an action whose result you don't care about), which is a bit tedious. IMHO silently discarding things is bad, so I'd prefer it if main was forced to be :: IO (), but you can always get that effect by supplying the type signature yourself, so it's not really a problem in practice.
Side point: If you want to exit with a specific exit code, use System.Exit
The reason fmap read getLine consumes input and getContents doesn't is that getContents is lazy and getLine isn't: getLine reads a line of text where you'd expect it to, whereas getContents only does any actual IO when its result is "needed" in the Haskell world. Since the result of main isn't used for anything, nothing is read if getContents is your whole main.
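As a sketch of the System.Exit side point, this is roughly what the asker's experiment would look like if the number read should become the exit status. The names toExitCode and exitWithInput are hypothetical; the mapping of 0 to ExitSuccess is there because ExitFailure 0 is not a valid argument to exitWith.

```haskell
import System.Exit (ExitCode (..), exitWith)

-- Translate a number into an exit status.
toExitCode :: Int -> ExitCode
toExitCode 0 = ExitSuccess
toExitCode n = ExitFailure n

-- Intended to be used as main: read a number and exit with it.
exitWithInput :: IO ()
exitWithInput = do
  n <- fmap read getLine
  exitWith (toExitCode n)
```

With main = exitWithInput, entering 1 and then running echo $? in the shell should show 1 instead of 0.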

Does the main-function in Haskell always start with main = do?

In java we always write:
public static void main(String[] args){...}
when we want to start writing a program.
My question is, is it the same for Haskell, IE: can I always be sure to declare:
main = do, when I want to write code for a program in Haskell?
for example:
main = do
  putStrLn "What's your name?"
  name <- getLine
  putStrLn ("Hello " ++ name)
This program is going to ask the user "What's your name?"
the user input will then be stored inside of the name-variable, and
"Hello" ++ name will be displayed before the program terminates.
Short answer: No, we have to declare a main =, but not a do.
The main must be an IO monad type (so IO a) where a is arbitrary (since it is ignored), as is written here:
The use of the name main is important: main is defined to be the entry point of a Haskell program (similar to the main function in C), and must have an IO type, usually IO ().
But you do not necessarily need do notation. Actually, do is syntactic sugar. Your main is in fact:
main =
  putStrLn "What's your name?" >> getLine >>= \n -> putStrLn ("Hello " ++ n)
Or more elegantly:
main = putStrLn "What's your name?" >> getLine >>= putStrLn . ("Hello " ++)
So here we have written a main without do notation. For more about desugaring do notation, see here.
Yes, if you have more than one line in your do block, and if you are even using the do notation.
The full do-notation syntax also includes explicit separators -- curly braces and semicolons:
main = do { putStrLn "What's your name?"
          ; name <- getLine
          ; putStrLn ("Hello " ++ name)
          }
With them, indentation plays no role other than in coding style (good indentation improves readability; explicit separators ensure code robustness, remove white-space related brittleness). So when you have only one line of IO-code, like
main = do { print "Hello!" }
there are no semicolons, no indentation to pay attention to, and the curly braces and do keyword itself become redundant:
main = print "Hello!"
So, no, not always. But very often it does, and uniformity in code goes a long way towards readability.
do blocks translate into monadic code, but you can view this fact as implementational detail, at first. In fact, you should. You can treat the do notation axiomatically, as an embedded language, mentally. Besides, it is that, anyway.
The simplified do-syntax is:
do { pattern1 <- action1
   ; pattern2 <- action2
   .....................
   ; return (.....)
   }
Each actioni is a Haskell value of type M ai for some monad M and some result type ai. Each action produces its own result type ai while all actions must belong to the same monad type M.
Each patterni receives the previously "computed" result from the corresponding action.
Wildcards _ can be used to ignore it. If this is the case, the _ <- part can be omitted altogether.
"Monad" is a scary and non-informative word, but it is really nothing more than EDSL, conceptually. Embedded domain-specific language means that we have native Haskell values standing for (in this case) I/O computations. We write our I/O programs in this language, which become a native Haskell value(s), which we can operate upon as on any other native Haskell value -- collect them in lists, compose them into more complex computation descriptions (programs), etc.
The main value is one such value computed by our Haskell program. The compiler sees it, and performs the I/O program that it stands for, at run time.
The point to it is that we can now have a "function" getCurrentTime (impossible, on the face of it, in functional paradigm since it must return different results on separate invocations), because it is not returning the current time -- the action it describes will do so, when the I/O program it describes is run by the run-time system.
On the type level this is reflected by such values not having just some plain Haskell type a, but a parameterized type, IO a, "tagged" by IO as belonging to this special world of I/O programming.
See also: Why does haskell's bind function take a function from non-monadic to monadic.

Haskell getContents wait for EOF

I want to wait until user input terminates with EOF and then output it all whole. Isn't that what getContents supposed to do? The following code outputs each time user hits enter, what am I doing wrong?
import System.IO
main = do
  hSetBuffering stdin NoBuffering
  contents <- getContents
  putStrLn contents
The fundamental problem is that getContents is an instance of lazy IO. This means that getContents produces a thunk that can be evaluated like a normal Haskell value, and only does the relevant IO when it's forced.
contents is a lazy list that putStr tries to print, which forces the list and causes getContents to read as much as it can. putStr then prints everything that's forced, and continues trying to force the rest of the list until it hits []. As getContents can read more and more of the stream—the exact behavior depends on buffering—putStr can print more and more of it immediately, giving you the behavior you see.
While this behavior is useful for very simple scripts, it ties in Haskell's evaluation order into observable effects—something it was never meant to do. This means that controlling exactly when parts of contents get printed is awkward because you have to break the normal Haskell abstraction and understand exactly how things are getting evaluated.
This leads to some potentially unintuitive behavior. For example, if you try to get the length of the input—and actually use it—the list is forced before you get to printing it, giving you the behavior you want:
main = do
  contents <- getContents
  let n = length contents
  print n
  putStr contents
but if you move the print n after the putStr, you go back to the original behavior because n does not get forced until after printing the input (even though n still got defined before putStr was used):
main = do
  contents <- getContents
  let n = length contents
  putStr contents
  print n
Normally, this sort of thing is not a problem because it won't change the behavior of your code (although it can affect performance). Lazy IO just brings it into the realm of correctness by piercing the abstraction layer.
This also gives us a hint on how we can fix your issue: we need some way of forcing contents before printing it. As we saw, we can do this with length, because length needs to traverse the whole list before computing its result. Instead of printing it, we can use seq, which evaluates its left-hand argument (to weak head normal form, which for an Int like n means fully) before returning its right-hand argument, and throws away the left-hand value:
main = do
  contents <- getContents
  let n = length contents
  n `seq` putStr contents
At the same time, this is still a bit ugly because we're using length just to traverse the list, not because we actually care about it. What we would really like is a function that just traverses the list enough to evaluate it, without doing anything else. Happily, this is exactly what deepseq does (for many data structures, not just lists):
import Control.DeepSeq
import System.IO
main = do
  contents <- getContents
  contents `deepseq` putStr contents
This is a problem of lazy I/O. One simple solution is to use strict I/O, such as via ByteStrings:
import qualified Data.ByteString as S
main :: IO ()
main = S.getContents >>= S.putStr
You can use the replacement functions from the strict package (link):
import qualified System.IO.Strict as S
main = do
  contents <- S.getContents
  putStrLn contents
Note that for reading there isn't a need to set buffering. Buffering really only helps when writing to files. See this answer (link) for more details.
The definition of the strict version of hGetContents in System.IO.Strict is pretty simple:
hGetContents :: IO.Handle -> IO.IO String
hGetContents h = IO.hGetContents h >>= \s -> length s `seq` return s
I.e., it forces everything to read into memory by calling length on the string returned by the standard/lazy version of hGetContents.

Newbie: understanding main and IO()

While learning Haskell I am wondering when an IO action will be performed. In several places I found descriptions like this:
"What’s special about I/O actions is that if they fall into the main function, they are performed."
But in the following example, 'greet' never returns and therefore nothing should be printed.
import Control.Monad
main = greet
greet = forever $ putStrLn "Hello World!"
Or maybe I should ask: what does it mean to "fall into the main function"?
First of all, main is not a function. It is indeed just a regular value and its type is IO (). The type can be read as: An action that, when performed, produces a value of type ().
Now the run-time system plays the role of an interpreter that performs the actions that you have described. Let's take your program as example:
main = forever (putStrLn "Hello world!")
Notice that I have performed a transformation. That one is valid, since Haskell is a referentially transparent language. The run-time system resolves the forever and finds this:
main = putStrLn "Hello world!" >> MORE1
It doesn't yet know what MORE1 is, but it now knows that it has a composition with one known action, which is executed. After executing it, it resolves the second action, MORE1 and finds:
MORE1 = putStrLn "Hello world!" >> MORE2
Again it executes the first action in that composition and then keeps on resolving.
Of course this is a high level description. The actual code is not an interpreter. But this is a way to picture how a Haskell program gets executed. Let's take another example:
main = forever (getLine >>= putStrLn)
The RTS sees this:
main = forever MORE1
<< resolving forever >>
MORE1 = getLine >>= MORE2
<< executing getLine >>
MORE2 result = putStrLn result >> MORE1
<< executing putStrLn result (where 'result' is the line read)
and starting over >>
When understanding this you understand how an IO String is not "a string with side effects" but rather the description of an action that would produce a string. You also understand why laziness is crucial for Haskell's I/O system to work.
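The "run-time system as interpreter" picture can be made concrete with a toy model. This is only a sketch: Program, echoForever and run are made-up names, the data type is not how GHC actually represents IO, and the interpreter is fed a list of canned input lines instead of real stdin so that it stays pure.

```haskell
-- A toy description of console programs (NOT GHC's real IO type).
data Program
  = PutLine String Program -- print a line, then continue
  | GetLine (String -> Program) -- read a line, then continue with it
  | Done

-- The echo program: a finite, self-referential value describing an infinite program.
echoForever :: Program
echoForever = GetLine (\line -> PutLine line echoForever)

-- A toy "runtime": looks at the next step only, consuming canned input
-- and producing the list of printed lines. Stops when input runs out.
run :: Program -> [String] -> [String]
run Done _ = []
run (PutLine s next) ins = s : run next ins
run (GetLine k) (i : ins) = run (k i) ins
run (GetLine _) [] = []
```

For instance, take 3 (run echoForever (cycle ["a", "b"])) yields ["a","b","a"]: the interpreter never needs the whole program, only the constructor at the front of it.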
In my opinion the point of the statement "What’s special about I/O actions is that if they fall into the main function, they are performed." is that IO actions are first class citizens. That is, IO-actions can occur at all places where values of other data types like Int can occur. For example, you can define a list that contains IO actions as follows.
actionList = [putStr "Hello", putStr "World"]
The list actionList has type [IO ()]. That is, the list contains actions that interact with the world, for example, print on the console or read in input from the user. But, in defining this list we do not execute the actions, we simply put them in a list for later use.
If an IO action can occur anywhere in your program, the question arises when these actions are performed, and here main comes into play. Consider the following definition of main.
main = do
  actionList !! 0
  actionList !! 1
This main function projects to the first and the second component of the list and "executes" the corresponding actions by using them within its definition. Note that it does not necessarily have to be the main function itself that executes an IO action. Any function that is called from the main function can execute actions as well. For example, we can define a function that calls the actions from actionList and let main call this function as follows.
main = do
  caller
  putStr "!"
caller = do
  actionList !! 0
  actionList !! 1
To highlight that it does not have to be a simple renaming like in main = caller I have added an action that prints an exclamation mark after it has performed the actions from the list.
Simple IO actions can be combined into more advanced ones by using do notation.
main = do
  putStrLn "Hello"
  putStrLn "World"
combines the IO action putStrLn "Hello" with the IO action putStrLn "World". main is now an IO action that first prints a line that says "Hello" and then a line that says "World". Written without do notation (which is just syntactic sugar), it looks like this:
main = putStrLn "Hello" >> putStrLn "World"
Here you can see the >> function combining the two actions.
You can create an IO action that reads a line, passes it to a function (that does awesome stuff to it :)) and then prints the result like this:
main = do
  input <- getLine
  let result = doAwesomeStuff input
  putStrLn result
or without binding the result to a variable:
main = do
  input <- getLine
  putStrLn (doAwesomeStuff input)
This can of course also be written as IO actions and functions that combine them like this:
main = getLine >>= (\input -> putStrLn (doAwesomeStuff input))
When you run the program, the main IO action is executed. This is the only time any IO actions are actually executed. (Well, technically you can also execute them within your program, but it is not safe. The function that does this is called unsafePerformIO.)
You can read more here: http://www.haskell.org/haskellwiki/Introduction_to_Haskell_IO/Actions
(This link is probably a better explanation than mine, but I only found it after I had written nearly everything. It is also quite a bit longer.)
launchAMissile :: IO ()
launchAMissile = do
  openASilo
  loadCoordinates
  launchAMissile
main = do
  let launch3missiles = launchAMissile >> launchAMissile >> launchAMissile
  putStrLn "Not actually launching any missiles"
forever isn't a loop like C's while (true). It is a function that produces an IO value (which contains an infinitely repeated sequence of actions), which is consumed by the caller. (In this case, the caller is main, which means that the actions get executed by the runtime system).
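The missile example can be checked with something harmless. In this sketch, bump, launchThree and demo are hypothetical names, and a counter in an IORef stands in for the side effect: building launchThree changes nothing, and only the action that is actually sequenced into the do block runs.

```haskell
import Data.IORef (modifyIORef, newIORef, readIORef)

demo :: IO Int
demo = do
  counter <- newIORef (0 :: Int)
  let bump = modifyIORef counter (+ 1)
      launchThree = bump >> bump >> bump -- built, but never executed
  bump -- the only action that actually runs
  readIORef counter
```

Running demo gives 1, not 4: defining an IO value and executing it are separate things.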

Resources