Haskell State monadic function using recursion - haskell

TL:DR: Is there a way to do example 3 without passing an argument
I'm trying to understand the state monad in haskell (Control.Monad.State). I made an extremely simple function:
Example 1
example :: State Int Int
example = do
e <- get
put (e*5)
return e
This example works in ghci...
runState example 3
(3,15)
I modified it to be able to take arguments....
Example 2
example :: Int -> State Int Int
example n = do
e <- get
put (e*n)
return e
also works in ghci...
runState (example 5) 3
(3,15)
I made it recursive, counting the number of steps it takes for a computation to satisfy some condition
Example 3
example :: Int -> State Int Int
example n = do
e <- get
if (n /= 1)
then do
put (succ e)
example (next n)
else return (succ e)
next :: Int -> Int
next n
| even n = div n 2
| otherwise = 3*n+1
ghci
evalState (example 13) 0
10
My question is, is there a way to do the previous example without explicitly passing a value?

You can store n in the state along side of e, for example, something like:
example = do
(e,n) <- get
if n /= 1
then do put (succ e, next n); example
else return e
There is some overhead to using the State monad, so you should compare this with the alternatives.
For instance, a more Haskelly way of approaching this problem is compose list operations to compute the answer, e.g.:
collatz :: Int -> [Int]
collatz n = iterate next n
collatzLength n = length $ takeWhile (/= 1) $ collatz n

Related

Alternative way to write a Haskell function of single input and output of this kind

Good morning everyone!
I'm using the following function as a fitting example of a function that needs to have a simple input and output. In this case it's a function that converts a number from decimal to binary form, as a list of digits no less, just because it is convenient later on.
I chose to write it like this, because even though a number goes in and a list comes out, another structure is needed as an intermediate step, that will hold the digits found so far and hold the quotient of the division, as input for the next step of the loop. I will clean up the necessary mess before outputing anything, though, by selecting the part of the structure that I'm interested in, in this case the second one , and not counters or other stuff, that I'm done with. (As I mentioned this is an example only, and it's not unusual in other cases to initialize the until loop with a triplet like (a,b,c), only to pick one of them at the end, as I see fit, with the help of additional function, like pickXof3.)
So there,
dec2Bin :: Int -> [Int]
dec2Bin num = snd $ until
(\(n,l) -> n <=0) -- test
(\(n,l) -> (fst $ division n, (snd $ division n):l)) -- main function
(num,[]) -- initialization
where division a = divMod a 2
I find it very convenient that Haskell, although lacking traditional for/while loops has a function like until, which reminds me very much of Mathematica's NextWhile, that I'm familiar with.
In the past I would write sth even uglier, like two functions, a "helper" one and a "main" one, like so
dec2BinHelper :: (Int,[Int]) -> (Int,[Int])
dec2BinHelper (n,l)
| n <= 0 = (n,l)
| otherwise = dec2BinHelper (fst $ division n, (snd $ division n):l)
where division a = divMod a 2
-- a function with the sole purpose to act as a front-end to the helper function, initializing its call parameters and picking up its output
dec2Bin :: Int -> [Int]
dec2Bin n = snd $ dec2BinHelper (n,[])
which I think is unnecessarily bloated.
Still, while the use of until allows me to define just one function, I get the feeling that it could be done even simpler/easier to read, perhaps in a way more fitting to functional programming. Is that so? How would you write such a function differently, while keeping the input and output at the absolutely essential values?
I strongly prefer your second solution. I'd start a clean-up with two things: use pattern matching, and use where to hide your helper functions. So:
dec2Bin :: Int -> [Int]
dec2Bin n = snd $ dec2BinHelper (n, []) where
dec2BinHelper (n, l)
| n <= 0 = (n, l)
| otherwise = dec2BinHelper (d, m:l)
where (d, m) = divMod n 2
Now, in the base case, you return a tuple; but then immediately call snd on it. Why not fuse the two?
dec2Bin :: Int -> [Int]
dec2Bin n = dec2BinHelper (n, []) where
dec2BinHelper (n, l)
| n <= 0 = l
| otherwise = dec2BinHelper (d, m:l)
where (d, m) = divMod n 2
There's no obvious reason why you should pass these arguments in a tuple, rather than as separate arguments, which is more idiomatic and saves some allocation/deallocation noise besides.
dec2Bin :: Int -> [Int]
dec2Bin n = dec2BinHelper n [] where
dec2BinHelper n l
| n <= 0 = l
| otherwise = dec2BinHelper d (m:l)
where (d, m) = divMod n 2
You can swap the arguments to dec2BinHelper and eta-reduce; that way, you will not be shadowing the definition of n.
dec2Bin :: Int -> [Int]
dec2Bin = dec2BinHelper [] where
dec2BinHelper l n
| n <= 0 = l
| otherwise = dec2BinHelper (m:l) d
where (d, m) = divMod n 2
Since you know that n > 0 in the recursive call, you can use the slightly faster quotRem in place of divMod. You could also consider using bitwise operations like (.&. 1) and shiftR 1; they may be even better, but you should benchmark to know for sure.
dec2Bin :: Int -> [Int]
dec2Bin = dec2BinHelper [] where
dec2BinHelper l n
| n <= 0 = l
| otherwise = dec2BinHelper (r:l) q
where (q, r) = quotRem n 2
When you don't have a descriptive name for your helper function, it's traditional to name it go or loop.
dec2Bin :: Int -> [Int]
dec2Bin = go [] where
go l n
| n <= 0 = l
| otherwise = go (r:l) q
where (q, r) = quotRem n 2
At this point, the two sides of the conditional are short enough that I'd be tempted to put them on their own line, though this is something of an aesthetic choice.
dec2Bin :: Int -> [Int]
dec2Bin = go [] where
go l n = if n <= 0 then l else go (r:l) q
where (q, r) = quotRem n 2
Finally, a comment on the name: the input isn't really in decimal in any meaningful sense. (Indeed, it's much more physically accurate to think of the input as already being in binary!) Perhaps int2Bin or something like that would be more accurate. Or let the type speak for itself, and just call it toBin.
toBin :: Int -> [Int]
toBin = go [] where
go l n = if n <= 0 then l else go (r:l) q
where (q, r) = quotRem n 2
At this point I'd consider this code quite idiomatic.

How to use "bind" keyword instead of ">>=" operator in Haskell

I'm trying to understand Haskell monads and wrote this test program, which compiles and works as expected:
divide :: Int -> Int -> Either String Int
divide _ 0 = Left "Divide by zero error
divide numerator denom = Right (numerator `div` denom)
processNumsMonadically :: Int -> Int -> Either String Int
processNumsMonadically n d = divide n d >>= \q -> return (q+1)
When I try using the word bind instead of the >>= operator in the latter function definition:
processNumsMonadically n d = bind (divide n d) (\q -> return (q+1))
I get the error:
Not in scope: 'bind'
What is the correct way to use the word bind?
This isn't a part of Prelude; it resides in Control.Monad.Extra, a part of monad-extras package.
However, you can call operators in prefix manner (like named functions) easily:
processNumsMonadically n d = (>>=) (divide n d) (\q -> return (q+1))
You could also just use do notation:
processNumsMonadically n d = do
q <- divide n d
return (q+1)
But while we're at it, I'd write using a Functor:
processNumsMonadically n d = fmap (+1) (divide n d)
or Applicative syntax:
processNumsMonadically n d = (+1) <$> divide n d
You could also lift the +1 to avoid the need for return and the lambda.
As a personal style remark, bind used as a word isn't idiomatic, and IMHO you shouldn't use it.

Implementing factorial and fibonacci using State monad (as a learning exercise)

I worked my way through Mike Vanier's monad tutorial (which is excellent) and I'm working on a few of the exercises in his post on how to use a "State" monad.
In particular, he suggests an exercise which consists of writing functions for factorial and fibonacci using a State monad. I gave it a shot and came up with the answers below. (I find do notation pretty confusing, hence my choice of syntax).
Neither of my implementations look particularly "Haskell-y" and, in the interest of not internalizing bad practices, I thought I'd ask folks for input on how they would've gone about implementing these functions (using the state monad). Is it possibly to write this code far more simply (aside from switching to do notation)? I strongly suspect this is the case.
I'm aware that it's a bit impractical to use a state monad for this purpose but this is purely a learning exercise - pun most certainly intended.
That said, the performance is not that much worse: in order to calc the factorial of 100000 (the answer is ~21k digits long), the unfoldr version took ~1.2 sec (in GHCi) vs. ~1.5 sec for the state monad version.
import Control.Monad.State (State, get, put, evalState)
import Data.List (unfoldr)
fibonacci :: Integer -> Integer
fibonacci 0 = 0
fibonacci n = evalState fib_state (1,0,1,n)
fib_state :: State (Integer,Integer,Integer,Integer) Integer
fib_state = get >>=
\s ->
let (p1,p2,ctr,n) = s
in case compare ctr n of
LT -> put (p1+p2, p1, ctr+1, n) >> fib_state
_ -> return p1
factorial :: Integer -> Integer
factorial n = evalState fact_state (n,1)
fact_state :: State (Integer,Integer) Integer
fact_state = get >>=
\s ->
let (n,f) = s
in case n of
0 -> return f
_ -> put (n-1,f*n) >> fact_state
-------------------------------------------------------------------
--Functions below are used only to test output of functions above
factorial' :: Integer -> Integer
factorial' n = product [1..n]
fibonacci' :: Int -> Integer
fibonacci' 0 = 1
fibonacci' 1 = 1
fibonacci' n =
let getFst (a,b,c) = a
in getFst
$ last
$ unfoldr (\(p1,p2,cnt) ->
if cnt == n
then Nothing
else Just ((p1,p2,cnt)
,(p1+p2,p1,cnt+1))
) (1,1,1)
Your functions seem to be a bit more complicated than they need to be, but you have the right idea. For the factorial, all you need to keep track of is the current number you're multiplying by and the number that you've accumulated so far. So, we'll say that State Int Int is a computation that operates on the current number on the state and returns the number that you've multiplied up until now:
fact_state :: State Int Int
fact_state = get >>= \x -> if x <= 1
then return 1
else (put (x - 1) >> fmap (*x) fact_state)
factorial :: Int -> Int
factorial = evalState fact_state
Prelude Control.Monad.State.Strict Control.Applicative> factorial <$> [1..10]
[1,2,6,24,120,720,5040,40320,362880,3628800]
The fibonacci sequence is similar. You need to keep the last two numbers in order to know what you're going to be adding together, and how far you've gone so far:
fibs_state :: State (Int, Int, Int) Int
fibs_state = get >>= \(x1, x2, n) -> if n == 0
then return x1
else (put (x2, x1+x2, n-1) >> fibs_state)
fibonacci n = evalState fibs_state (0, 1, n)
Prelude Control.Monad.State.Strict Control.Applicative> fibonacci <$> [1..10]
[1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
Two stylistic suggestions:
\s ->
let (p1,p2,ctr,n) = s
in ...
is equivalent to:
\(p1,p2,ctr,n) -> ...
and your case statement for fib_state may be written with an if statement:
if ctr < n
then put (p1+p2, p1, ctr+1, n) >> fib_state
else return p1

improvement to my code for Haskell monads

I am working with Haskell and maybe monads but I am a little bit confused with them
here is my code but I am getting error and I do not know how to improve my code.
doAdd :: Int -> Int -> Maybe Int
doAdd x y = do
result <- x + y
return result
Let's look critically at the type of the function that you're writing:
doAdd :: Int -> Int -> Maybe Int
The point of the Maybe monad is to work with types that are wrapped with a Maybe type constructor. In your case, the two Int arguments are just plain Ints, and the + function always produces an Int so there is no need for the monad.
If instead, your function took Maybe Int as its arguments, then you could use do notation to handle the Nothing case behind the scenes:
doAdd :: Maybe Int -> Maybe Int -> Maybe Int
doAdd mx my = do x <- mx
y <- my
return (x + y)
example1 = doAdd (Just 1) (Just 3) -- => Just 4
example2 = doAdd (Just 1) Nothing -- => Nothing
example3 = doAdd Nothing (Just 3) -- => Nothing
example4 = doAdd Nothing Nothing -- => Nothing
But we can extract a pattern from this: what you are doing, more generically, is taking a function ((+) :: Int -> Int -> Int) and adapting it to work in the case where the arguments it wants are "inside" a monad. We can abstract away from the specific function (+) and the specific monad (Maybe) and get this generic function:
liftM2 :: Monad m => (a -> b -> c) -> m a -> m b -> m c
liftM2 f ma mb = do a <- ma
b <- mb
return (f a b)
Now with liftM2 you can write:
doAdd :: Maybe Int -> Maybe Int -> Maybe Int
doAdd = liftM2 (+)
The reason why I chose the name liftM2 is because this is actually a library function—you don't need to write it, you can import the Control.Monad module and you'll get it for free.
What would be a better example of using the Maybe monad? When you have an operation that, unlike +, can intrinsically can produce a Maybe result. One idea would be if you wanted to catch division by 0 mistakes. You could write a "safe" version of the div function:
-- | Returns `Nothing` if second argument is zero.
safeDiv :: Int -> Int -> Maybe Int
safeDiv _ 0 = Nothing
safeDiv x y = Just (x `div` y)
Now in this case the monad does become more useful:
-- | This function tests whether `x` is divisible by `y`. Returns `Nothing` if
-- division by zero.
divisibleBy :: Int -> Int -> Maybe Bool
divisibleBy x y = do z <- safeDiv x y
let x' = z * y
return (x == x')
Another more interesting monad example is if you have operations that return more than one value—for example, positive and negative square roots:
-- Compute both square roots of x.
allSqrt x = [sqrt x, -(sqrt x)]
-- Example: add the square roots of 5 to those of 7.
example = do x <- allSqrt 5
y <- allSqrt 7
return (x + y)
Or using liftM2 from above:
example = liftM2 (+) (allSqrt 5) (allSqrt 7)
So anyway, a good rule of thumb is this: never "pollute" a function with a monad type if it doesn't really need it. Your original doAdd—and even my rewritten version—are a violation of this rule of thumb, because what the function does is adding, but adding has nothing to do with Maybe—the Nothing handling is just a behavior that we add on top of the core function (+). The reason for this rule of thumb is that any function that does not use monads can be generically adapted to add the behavior of any monad you want, using utility functions like liftM2 (and many other similar utility functions).
On the other hand, safeDiv and allSqrt are examples where you can't really write the function you want without using Maybe or []; if you are dealing with a function like that, then monads are often a convenient abstraction for eliminating boilerplate code.
A better example might be
justPositive :: Num a => a -> Maybe a
justPositive x
| x <= 0 = Nothing
| otherwise = Just x
addPositives x y = do
x' <- justPositive x
y' <- justPositive y
return $ x' + y'
This will filter out any non-positive values passed into the function using do notation
That isn't how you'd write that code. The <- operator is for getting a value out of a monad. The result of x + y is just a number, not a monad wrapping a number.
Do notation is actually completely wasteful here. If you were bound and determined to write it that way, it would have to look like this:
doAdd x y = do
let result = x + y
return result
But that's just a longwinded version of this:
doAdd x y = return $ x + y
Which is in turn equivalent to
doAdd x y = Just $ x + y
Which is how you'd actually write something like this.
The use case you give doesn't justify do notation, but this is a more common use case- You can chain functions of this type together.
func::Int->Int->Maybe Int -- func would be a function like divide, which is undefined for division by zero
main = do
result1 <- func 1 2
result2 <- func 3 4
result3 <- func result1 result2
return result3
This is the whole point of monads anyway, chaining together functions of type a->m a.
When used this way, the Maybe monad acts much like exceptions in Java (you can use Either if you want to propagate a message up).

Summing a large list of numbers is too slow

Task: "Sum the first 15,000,000 even numbers."
Haskell:
nats = [1..] :: [Int]
evens = filter even nats :: [Int]
MySum:: Int
MySum= sum $ take 15000000 evens
...but MySum takes ages. More precisely, about 10-20 times slower than C/C++.
Many times I've found, that a Haskell solution coded naturally is something like 10 times slower than C. I expected that GHC was a very neatly optimizing compiler and task such this don't seem that tough.
So, one would expect something like 1.5-2x slower than C. Where is the problem?
Can this be solved better?
This is the C code I'm comparing it with:
long long sum = 0;
int n = 0, i = 1;
for (;;) {
if (i % 2 == 0) {
sum += i;
n++;
}
if (n == 15000000)
break;
i++;
}
Edit 1: I really know, that it can be computed in O(1). Please, resist.
Edit 2: I really know, that evens are [2,4..] but the function even could be something else O(1) and need to be implemented as a function.
Lists are not loops
So don't be surprised if using lists as a loop replacement, you get slower code if the loop body is small.
nats = [1..] :: [Int]
evens = filter even nats :: [Int]
dumbSum :: Int
dumbSum = sum $ take 15000000 evens
sum is not a "good consumer", so GHC is not (yet) able to eliminate the intermediate lists completely.
If you compile with optimisations (and don't export nat), GHC is smart enough to fuse the filter with the enumeration,
Rec {
Main.main_go [Occ=LoopBreaker]
:: GHC.Prim.Int# -> GHC.Prim.Int# -> [GHC.Types.Int]
[GblId, Arity=1, Caf=NoCafRefs, Str=DmdType L]
Main.main_go =
\ (x_aV2 :: GHC.Prim.Int#) ->
let {
r_au7 :: GHC.Prim.Int# -> [GHC.Types.Int]
[LclId, Str=DmdType]
r_au7 =
case x_aV2 of wild_Xl {
__DEFAULT -> Main.main_go (GHC.Prim.+# wild_Xl 1);
9223372036854775807 -> n_r1RR
} } in
case GHC.Prim.remInt# x_aV2 2 of _ {
__DEFAULT -> r_au7;
0 ->
let {
wild_atm :: GHC.Types.Int
[LclId, Str=DmdType m]
wild_atm = GHC.Types.I# x_aV2 } in
let {
lvl_s1Rp :: [GHC.Types.Int]
[LclId]
lvl_s1Rp =
GHC.Types.:
# GHC.Types.Int wild_atm (GHC.Types.[] # GHC.Types.Int) } in
\ (m_aUL :: GHC.Prim.Int#) ->
case GHC.Prim.<=# m_aUL 1 of _ {
GHC.Types.False ->
GHC.Types.: # GHC.Types.Int wild_atm (r_au7 (GHC.Prim.-# m_aUL 1));
GHC.Types.True -> lvl_s1Rp
}
}
end Rec }
but that's as far as GHC's fusion takes it. You are left with boxing Ints and constructing list cells. If you give it a loop, like you give it to the C compiler,
module Main where
import Data.Bits
main :: IO ()
main = print dumbSum
dumbSum :: Int
dumbSum = go 0 0 1
where
go :: Int -> Int -> Int -> Int
go sm ct n
| ct >= 15000000 = sm
| n .&. 1 == 0 = go (sm + n) (ct+1) (n+1)
| otherwise = go sm ct (n+1)
you get the approximate relation of running times between the C and the Haskell version you expected.
This sort of algorithm is not what GHC has been taught to optimise well, there are bigger fish to fry elsewhere before the limited manpower is put into these optimisations.
The problem why list fusion can't work here is actually rather subtle. Say we define the right RULE to fuse the list away:
import GHC.Base
sum2 :: Num a => [a] -> a
sum2 = sum
{-# NOINLINE [1] sum2 #-}
{-# RULES "sum" forall (f :: forall b. (a->b->b)->b->b).
sum2 (build f) = f (+) 0 #-}
(The short explanation is that we define sum2 as an alias of sum, which we forbid GHC to inline early, so the RULE has a chance to fire before sum2 gets eliminated. Then we look for sum2 directly next to the list-builder build (see definition) and replace it by direct arithmetic.)
This has mixed success, as it yields the following Core:
Main.$wgo =
\ (w_s1T4 :: GHC.Prim.Int#) ->
case GHC.Prim.remInt# w_s1T4 2 of _ {
__DEFAULT ->
case w_s1T4 of wild_Xg {
__DEFAULT -> Main.$wgo (GHC.Prim.+# wild_Xg 1);
15000000 -> 0
};
0 ->
case w_s1T4 of wild_Xg {
__DEFAULT ->
case Main.$wgo (GHC.Prim.+# wild_Xg 1) of ww_s1T7 { __DEFAULT ->
GHC.Prim.+# wild_Xg ww_s1T7
};
15000000 -> 15000000
}
}
Which is nice, completely fused code - with the sole problem that we have a call to $wgo in a non-tail-call position. This means that we aren't looking at a loop, but actually at a deeply recursive function, with predictable program results:
Stack space overflow: current size 8388608 bytes.
The root problem here is that the Prelude's list fusion can only fuse right folds, and computing the sum as a right fold directly causes the excessive stack consumption.
The obvious fix would be to use a fusion framework that can actually deal with left folds, such as Duncan's stream-fusion package, which actually implements sum fusion.
Another solution would be to hack around it - and implement the left fold using a right fold:
main = print $ foldr (\x c -> c . (+x)) id [2,4..15000000] 0
This actually produces close-to-perfect code with current versions of GHC. On the other hand, this is generally not a good idea as it relies on GHC being smart enough to eliminate the partially applied functions. Already adding a filter into the chain will break that particular optimization.
Sum first 15,000,000 even numbers:
{-# LANGUAGE BangPatterns #-}
g :: Integer -- 15000000*15000001 = 225000015000000
g = go 1 0 0
where
go i !a c | c == 15000000 = a
go i !a c | even i = go (i+1) (a+i) (c+1)
go i !a c = go (i+1) a c
ought to be the fastest.
If you want to be sure to traverse the list only once, you can write the traversal explicitly:
nats = [1..] :: [Int]
requiredOfX :: Int -> Bool -- this way you can write a different requirement
requiredOfX x = even x
dumbSum :: Int
dumbSum = dumbSum' 0 0 nats
where dumbSum' acc 15000000 _ = acc
dumbSum' acc count (x:xs)
| requiredOfX x = dumbSum' (acc + x) (count + 1) xs
| otherwise = dumbSum' acc (count + 1) xs
First, you can be clever as young Gauss was and compute the sum in O(1).
Fun stuff aside, your Haskell solution uses lists. I'm quite sure your C/C++ solution doesn't. (Haskell lists are very easy to use so one is tempted to use them even in cases where it might not be appropriate.) Try benchmarking this:
sumBy2 :: Integer -> Integer
sumBy2 = f 0
where
f result n | n <= 1 = result
| otherwise = f (n + result) (n - 2)
Compile it using GHC with -O2 argument. This function is tail-recursive so compiler can implement it very efficiently.
Update: If you want it using even function, it's possible:
sumBy2 :: Integer -> Integer
sumBy2 = f 0
where
f result n | n <= 0 = result
| even n = f (n + result) (n - 1)
| otherwise = f result (n - 1)
You can also easily make the filtering function a parameter:
sumFilter :: (Integral a) => (a -> Bool) -> a -> a
sumFilter filtfn = f 0
where
f result n | n <= 0 = result
| filtfn n = f (n + result) (n - 1)
| otherwise = f result (n - 1)
Strict version works much faster:
foldl' (+) 0 $ take 15000000 [2, 4..]
Another thing to note is that nats and evens are so-called Constant Applicative Forms, or CAFs for short. Basically, those correspond to top-level definitions without any arguments. CAFs are a bit of an odd duck, for instance being the reason for the Dreaded Monomorphism Restriction; I'm not sure the language definition even allows CAFs to be inlined.
In my mental model of how Haskell executes, by the time dumbSum returns a value, evens will be evaluated to look something like 2:4: ... : 30000000 : <thunk> and nats to 1:2: ... : 30000000 : <thunk>, where the <thunk>s represent something that's not been looked at yet. If my understanding is correct, these allocations of : do have to happen and can't be optimized away.
So one way of speeding things up without altering your code too much would be to simply write:
dumbSum :: Int
dumbSum = sum . take 15000000 . filter even $ [1..]
or
dumbSum = sum $ take 15000000 evens where
nats = [1..]
evens = filter even nats
On my machine, compiled with -O2, that alone seems to result in a roughly 30% speedup.
I'm no GHC connaisseur (I've never even profiled a Haskell program!), so I could be wildly off the mark, though.

Resources