Understanding I/O monad and the use of "do" notation

Understanding I/O monad and the use of "do" notation - haskell

I am still struggling with Haskell and now I have encountered a problem with wrapping my mind around the Input/Output monad from this example:
main = do
line <- getLine
if null line
then return ()
else do
putStrLn $ reverseWords line
main
reverseWords :: String -> String
reverseWords = unwords . map reverse . words
I understand that because functional language like Haskell cannot be based on side effects of functions, some solution had to be invented. In this case it seems that everything has to be wrapped in a do block. I get simple examples, but in this case I really need someone's explanation:
Why isn't it enough to use one, single do block for I/O actions?
Why do you have to open completely new one in if/else case?
Also, when does the -- I don't know how to call it -- "scope" of the do monad ends, i.e. when can you just use standard Haskell terms/functions?

The do block concerns anything on the same indentation level as the first statement. So in your example it's really just linking two things together:
line <- getLine
and all the rest, which happens to be rather bigger:
if null line
then return ()
else do
putStrLn $ reverseWords line
main
but no matter how complicated, the do syntax doesn't look into these expressions. So all this is exactly the same as
main :: IO ()
main = do
line <- getLine
recurseMain line
with the helper function
recurseMain :: String -> IO ()
recurseMain line
| null line = return ()
| otherwise = do
putStrLn $ reverseWords line
main
Now, obviously the stuff in recurseMain can't know that the function is called within a do block from main, so you need to use another do.

do doesn't actually do anything, it's just syntactic sugar for easily combining statements. A dubious analogy is to compare do to []:
If you have multiple expressions you can combine them into lists using ::
(1 + 2) : (3 * 4) : (5 - 6) : ...
However, this is annoying, so we can instead use [] notation, which compiles to the same thing:
[1+2, 3*4, 5-6, ...]
Similarly, if you have multiple IO statments, you can combine them using >> and >>=:
(putStrLn "What's your name?") >> getLine >>= (\name -> putStrLn $ "Hi " ++ name)
However, this is annoying, so we can instead use do notation, which compiles to the same thing:
do
putStrLn "What's your name?"
name <- getLine
putStrLn $ "Hi " ++ name
Now the answer to why you need multiple do blocks is simple:
If you have multiple lists of values, you need multiple []s (even if they're nested).
If you have multiple sequences of monadic statements, you need multiple dos (even if they're nested).

Related

Does the main-function in Haskell always start with main = do?

In java we always write:
public static void main(String[] args){...}
when we want to start writing a program.
My question is, is it the same for Haskell, IE: can I always be sure to declare:
main = do, when I want to write code for a program in Haskell?
for example:
main = do
putStrLn "What's your name?"
name <- getLine
putStrLn ("Hello " ++ name)
This program is going to ask the user "What's your name?"
the user input will then be stored inside of the name-variable, and
"Hello" ++ name will be displayed before the program terminates.

Short answer: No, we have to declare a main =, but not a do.
The main must be an IO monad type (so IO a) where a is arbitrary (since it is ignored), as is written here:
The use of the name main is important: main is defined to be the entry point of a Haskell program (similar to the main function in C), and must have an IO type, usually IO ().
But you do not necessary need do notation. Actually do is syntactical sugar. Your main is in fact:
main =
putStrLn "What's your name?" >> getLine >>= \n -> putStrLn ("Hello " ++ n)
Or more elegantly:
main = putStrLn "What's your name?" >> getLine >>= putStrLn . ("Hello " ++)
So here we have written a main without do notation. For more about desugaring do notation, see here.

Yes, if you have more than one line in your do block, and if you are even using the do notation.
The full do-notation syntax also includes explicit separators -- curly braces and semicolons:
main = do { putStrLn "What's your name?"
; name <- getLine
; putStrLn ("Hello " ++ name)
}
With them, indentation plays no role other than in coding style (good indentation improves readability; explicit separators ensure code robustness, remove white-space related brittleness). So when you have only one line of IO-code, like
main = do { print "Hello!" }
there are no semicolons, no indentation to pay attention to, and the curly braces and do keyword itself become redundant:
main = print "Hello!"
So, no, not always. But very often it does, and uniformity in code goes a long way towards readability.
do blocks translate into monadic code, but you can view this fact as implementational detail, at first. In fact, you should. You can treat the do notation axiomatically, as an embedded language, mentally. Besides, it is that, anyway.
The simplified do-syntax is:
do { pattern1 <- action1
; pattern2 <- action2
.....................
; return (.....)
}
Each actioni is a Haskell value of type M ai for some monad M and some result type ai. Each action produces its own result type ai while all actions must belong to the same monad type M.
Each patterni receives the previously "computed" result from the corresponding action.
Wildcards _ can be used to ignore it. If this is the case, the _ <- part can be omitted altogether.
"Monad" is a scary and non-informative word, but it is really nothing more than EDSL, conceptually. Embedded domain-specific language means that we have native Haskell values standing for (in this case) I/O computations. We write our I/O programs in this language, which become a native Haskell value(s), which we can operate upon as on any other native Haskell value -- collect them in lists, compose them into more complex computation descriptions (programs), etc.
The main value is one such value computed by our Haskell program. The compiler sees it, and performs the I/O program that it stands for, at run time.
The point to it is that we can now have a "function" getCurrentTime (impossible, on the face of it, in functional paradigm since it must return different results on separate invocations), because it is not returning the current time -- the action it describes will do so, when the I/O program it describes is run by the run-time system.
On the type level this is reflected by such values not having just some plain Haskell type a, but a parameterized type, IO a, "tagged" by IO as belonging to this special world of I/O programming.
See also: Why does haskell's bind function take a function from non-monadic to monadic.

A little confused on the '$' character in Haskell

I'm very new to Haskell and I'm trying to understand these basic lines of code. I have a main module that's very simple:
main = do
words <- readFile "test.txt"
putStrLn $ reverseCharacters words
where reverseCharacters is defined in another module that I have:
reverseCharacters :: String -> String
reverseCharacters x = reverse x
What I am having trouble understanding is why the $ needs to be there. I've read previous posts and looked it up and I'm still having difficulty understanding this. Any help would be greatly appreciated.

$ is an operator, just like +. What it does is treat its first argument (the expression on the left) as a function, and apply it to its second argument (the expression on the right).
So in this case putStrLn $ reverseCharacters words is equivalent to putStrLn (reverseCharacters words). It needs to be there because function application is left associative, so using no $ or parentheses like putStrLn reverseCharacters words would be equivalent to parenthesising this way (putStrLn reverseCharacters) words, which doesn't work (we can't apply putStrLn to reverseCharacters [something of type String -> String], and even if we could we can't apply the result of putStrLn to words [something of type String]).
The $ operator is just another way of explicitly "grouping" the words than using parentheses; because it's an infix operator it forces a "split" in the expression (and because it's a very low precedence infix operator, it works even if the things on the left or right are using other infix operators).

Haskell - loop over user input

I have a program in haskell that has to read arbitrary lines of input from the user and when the user is finished the accumulated input has to be sent to a function.
In an imperative programming language this would look like this:
content = ''
while True:
line = readLine()
if line == 'q':
break
content += line
func(content)
I find this incredibly difficult to do in haskell so I would like to know if there's an haskell equivalent.

The Haskell equivalent to iteration is recursion. You would also need to work in the IO monad, if you have to read lines of input. The general picture is:
import Control.Monad
main = do
line <- getLine
unless (line == "q") $ do
-- process line
main
If you just want to accumulate all read lines in content, you don't have to do that. Just use getContents which will retrieve (lazily) all user input. Just stop when you see the 'q'. In quite idiomatic Haskell, all reading could be done in a single line of code:
main = mapM_ process . takeWhile (/= "q") . lines =<< getContents
where process line = do -- whatever you like, e.g.
putStrLn line
If you read the first line of code from right to left, it says:
get everything that the user will provide as input (never fear, this is lazy);
split it in lines as it comes;
only take lines as long as they're not equal to "q", stop when you see such a line;
and call process for each line.
If you didn't figure it out already, you need to read carefully a Haskell tutorial!

It's reasonably simple in Haskell. The trickiest part is that you want to accumulate the sequence of user inputs. In an imperative language you use a loop to do this, whereas in Haskell the canonical way is to use a recursive helper function. It would look something like this:
getUserLines :: IO String -- optional type signature
getUserLines = go ""
where go contents = do
line <- getLine
if line == "q"
then return contents
else go (contents ++ line ++ "\n") -- add a newline
This is actually a definition of an IO action which returns a String. Since it is an IO action, you access the returned string using the <- syntax rather than the = assignment syntax. If you want a quick overview, I recommend reading The IO Monad For People Who Simply Don't Care.
You can use this function at the GHCI prompt like this
>>> str <- getUserLines
Hello<Enter> -- user input
World<Enter> -- user input
q<Enter> -- user input
>>> putStrLn str
Hello -- program output
World -- program output

Using pipes-4.0, which is coming out this weekend:
import Pipes
import qualified Pipes.Prelude as P
f :: [String] -> IO ()
f = ??
main = do
contents <- P.toListM (P.stdinLn >-> P.takeWhile (/= "q"))
f contents
That loads all the lines into memory. However, you can also process each line as it is being generated, too:
f :: String -> IO ()
main = runEffect $
for (P.stdinLn >-> P.takeWhile (/= "q")) $ \str -> do
lift (f str)
That will stream the input and never load more than one line into memory.

You could do something like
import Control.Applicative ((<$>))
input <- unlines . takeWhile (/= "q") . lines <$> getContents
Then input would be what the user wrote up until (but not including) the q.

Keep variable inside another function in Haskell

I am lost in this concept. This is my code and what it does is just asking what is your name.
askName = do
name <- getLine
return ()
main :: IO ()
main = do
putStrLn "Greetings once again. What is your name?"
askName
However, How can I access in my main the variable name taken in askName?
This is my second attempt:
askUserName = do
putStrLn "Hello, what's your name?"
name <- getLine
return name
sayHelloUser name = do
putStrLn ("Hey " ++ name ++ ", you rock!")
main = do
askUserName >>= sayHelloUser
I can now re-use the name in a callback way, however if in main I want to call name again, how can I do that? (avoiding to put name <- getLine in the main, obviously)

We can imagine that you ask the name in order to print it, then let's rewrite it.
In pseudo code, we have
main =
print "what is your name"
bind varname with userReponse
print varname
Your question then concern the second instruction.
Take a look about the semantic of this one.
userReponse is a function which return the user input (getLine)
varname is a var
bind var with fun : is a function which associate a var(varname) to the output of a function(getLine)
Or as you know in haskell everything is a function then our semantic is not well suited.
We need to revisit it in order to respect this idiom. According to the later reflexion the semantic of our bind function become bind fun with fun
As we cannot have variable, to pass argument to a function, we need, at a first glance, to call another function, in order to produce them. Thus we need a way to chain two functions, and it's exactly what's bind is supposed to do. Furthermore, as our example suggest, an evaluation order should be respected and this lead us to the following rewriting with fun bind fun
That's suggest that bind is more that a function it's an operator.
Then for all function f and g we have with f bind g.
In haskell we note this as follow
f >>= g
Furthermore, as we know that a function take 0, 1 or more argument and return 0, 1 or more argument.
We could refine our definition of our bind operator.
In fact when f doesn't return any result we note >>= as >>
Applying, theses reflexions to our pseudo code lead us to
main = print "what's your name" >> getLine >>= print
Wait a minute, How the bind operator differ from the dot operator use for the composition of two function ?
It's differ a lot, because we have omit an important information, bind doesn't chain two function but it's chain two computations unit. And that's the whole point to understand why we have define this operator.
Let's write down a global computation as a sequence of computation unit.
f0 >>= f1 >> f2 >> f3 ... >>= fn
As this stage a global computation could be define as a set of computation unit with two operator >>=, >>.
How do we represent set in computer science ?
Usually as container.
Then a global computation is a container which contain some computation unit. On this container we could define some operator allowing us to move from a computation unit to the next one, taking into account or not the result of the later, this is ours >>= and >> operator.
As it's a container we need a way to inject value into it, this is done by the return function. Which take an object and inject it into a computation, you could check it through is signature.
return :: a -> m a -- m symbolize the container, then the global computation
As it's a computation we need a way to manage a failure, this done by the fail function.
In fact the interface of a computation is define by a class
class Computation
return -- inject an objet into a computation
>>= -- chain two computation
>> -- chain two computation, omitting the result of the first one
fail -- manage a computation failure
Now we can refine our code as follow
main :: IO ()
main = return "What's your name" >>= print >> getLine >>= print
Here I have intentionally include the signature of the main function, to express the fact that we are in the global IO computation and the resulting output with be () (as an exercise enter $ :t print in ghci).
If we take more focus on the definition for >>=, we can emerge the following syntax
f >>= g <=> f >>= (\x -> g) and f >> g <=> f >>= (\_ -> g)
And then write
main :: IO ()
main =
return "what's your name" >>= \x ->
print x >>= \_ ->
getLine >>= \x ->
print x
As you should suspect, we certainly have a special syntax to deal with bind operator in computational environment. You're right this is the purpose of do syntax
Then our previous code become, with do syntax
main :: IO ()
main = do
x <- return "what's your name"
_ <- print x
x <- getLine
print x
If you want to know more take a look on monad
As mentioned by leftaroundabout, my initial conclusion was a bit too enthusiastic
You should be shocked, because we have break referential transparency law (x take two different value inside our sequence of instruction), but it doesn't matter anymore,because we are into a computation, and a computation as defined later is a container from which we can derive an interface and this interface is designed to manage, as explain, the impure world which correspond to the real world.

Return the name from askname. In Haskell its not idiomatic to access "global" variables:
askName :: IO String
askName = do
name <- getLine
return name
main :: IO ()
main = do
putStrLn "Greetings once again. What is your name?"
name <- askName
putStrLn name
Now the only problem is tht the askName function is kind of pointless, since its now just an alias to getLine. We could "fix" that by putting the question inside askName:
askName :: IO String
askName = do
putStrLn "Greetings once again. What is your name?"
name <- getLine
return name
main :: IO ()
main = do
name <- askName
putStrLn name
Finally, just two little points: its usually a good idea to put type declarations when you are learning, to make things explicit and help compiler error messages. Another thing is to remember that the "return" function is only used for monadic code (it is not analogous to a traditional return statement!) and sometimes we could have ommited some intermediary variables:
askName :: IO String
askName = do
putStrLn "Greetings once again. What is your name?"
getLine

Convert a "do" notation with more than two actions to use the bind function

I know that the following "do" notation's "bind" function is equivalent to getLine >>= \line -> putStrLn
do line <- getLine
putStrLn line
But how is the following notation equivalent to bind function?
do line1 <- getLine
putStrLn "enter second line"
line2 <- getLine
return (line1,line2)

I take it you are trying to see how to bind the result of "putStrLn". The answer is in the type of putStrLn:
putStrLn :: String -> IO ()
Remember that "()" is the unit type, which has a single value (also written "()"). So you can bind this in exactly the same way. But since you don't use it you bind it to a "don't care" value:
getLine >>= \line1 ->
putStrLn "enter second line" >>= \_ ->
getline >>= \line2 ->
return (line1, line2)
As it happens, there is an operator already defined for ignoring the return value, ">>". So you could just rewrite this as
getLine >>= \line1 ->
putStrLn "enter second line" >>
getline >>= \line2 ->
return (line1, line2)
I'm not sure if you are also trying to understand how bind operators are daisy-chained. To see this, let me put the implicit brackets and extra indentation in the example above:
getLine >>= (\line1 ->
putStrLn "enter second line" >> (
getline >>= (\line2 ->
return (line1, line2))))
Each bind operator links the value to the left with a function to the right. That function consists of all the rest of the lines in the "do" clause. So the variable being bound through the lambda ("line1" in the first line) is in scope for the whole of the rest of the clause.

For this specific example you can actually avoid both do and >>= by using combinators from Control.Applicative:
module Main where
import Control.Applicative ((<$>), (<*>), (<*))
getInput :: IO (String, String)
getInput = (,) <$> getLine <* putStrLn "enter second line" <*> getLine
main = print =<< getInput
Which works as expected:
travis#sidmouth% ./Main
hello
enter second line
world
("hello","world")
It looks a little weird at first, but in my opinion the applicative style feels very natural once you're used to it.

I would strongly suggest you to read the chapter Desugaring of Do-blocks in the book Real-World haskell. It tells you, that you all are wrong. For a programmer, it's the natural way to use a lambda, but the do-block is implemented using functions which - if a pattern maching failuire occurs - will call the fail implementation of the according monad.
For instance, your case is like:
let f x =
putStrLn "enter second line" >>
let g y = return (x,y)
g _ = fail "Pattern mismatched"
in getLine >>= g
f _ = fail "Pattern mismatched"
in getLine >>= f
In a case like this, this may be completely irrelevant. But consider some expression that involves pattern-matching. Also, you can use this effect for some special stuff, eg, you can do something like this:
oddFunction :: Integral a => [a] -> [a]
oddFunctiond list = do
(True,y) <- zip (map odd list) list
return y
What will this function do? You can read this statement as a rule for working with the elements of the list. The first statement binds an element of the list to the var y, but only if y is odd. If y is even, a pattern matching failure occurs and fail will be called. In the monad instance for Lists, fail is simply []. Thus, the function strips all even elements from the list.
(I know, oddFunction = filter odd would do this better, but this is just an example)

getLine >>= \line1 ->
putStrLn "enter second line" >>
getLine >>= \line2 ->
return (line1, line2)
Generally foo <- bar becomes bar >>= \foo -> and baz becomes baz >> (unless it's the last line of the do-block, in which case it just stays baz).

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

Understanding I/O monad and the use of "do" notation - haskell

Related

Does the main-function in Haskell always start with main = do?

A little confused on the '$' character in Haskell

Haskell - loop over user input

Keep variable inside another function in Haskell

Convert a "do" notation with more than two actions to use the bind function

Categories

Resources