Haskell - Function which returns a different value each time it iterates

Say that I have a function:
doesAThing :: Int -> ChangeState
For the purpose of this question it's not especially important what ChangeState is, only that doesAThing needs to take an Int as a parameter and that doesAThing iterates infinitely.
What I want to do is take such a function:
addNum :: [Int] -> Int
addNum [n] = foldl (+) 0 ([n] ++ [n + 1])
and use it for doesAThing. Currently, the function works fine; however, it doesn't do what I want it to do - it always returns the same number. The idea is that each time doesAThing iterates, addNum takes whatever its previous output was and uses that as its parameter. I.e.:
First iteration: say that n was set at 0. addNum returns 1 (0 + 0 + 1) and doesAThing uses that to modify ChangeState.
Second iteration: now addNum takes 1 as its parameter, so that it returns 3 (0 + 1 + 2) and doesAThing uses that to modify ChangeState.
Third iteration: now addNum takes 3 as its parameter, returns 7 (0 + 3 + 4) and doesAThing uses 7 to modify ChangeState.
Etc.
Apologies if this is a really noobish question; teaching yourself Haskell can be hard sometimes.

If you want to change something, that means you need mutable state. There are two options. First, you can make the current state one of the arguments of your function, and return the new state as part of the result, like addNum :: [Int] -> ([Int], Int). There is a State monad that helps with this, giving the illusion that there actually is some mutable state.
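Here is a minimal sketch of option one, simplifying the state to a single Int (the previous result) and using the step from the question (add n and n + 1):
addNum :: Int -> (Int, Int)
addNum prev = (next, next)          -- new state and result happen to coincide
  where next = prev + (prev + 1)    -- 0 -> 1, 1 -> 3, 3 -> 7, ...

main :: IO ()
main = do
  let (s1, r1) = addNum 0    -- r1 == 1
      (s2, r2) = addNum s1   -- r2 == 3
      (_,  r3) = addNum s2   -- r3 == 7
  print [r1, r2, r3]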
Second, you can store your state in an IORef and make your function use the IO monad, like addNum :: IORef [Int] -> IO Int. That way you can actually modify the state kept in the IORef. There are other monads that allow the same thing, like ST, which is great if your mutable state is used only locally, or STM, which helps if your application is highly concurrent.
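And a sketch of option two with an IORef, again shrinking the state to a single Int; replicateM is just a convenient way to call it a few times:
import Data.IORef
import Control.Monad (replicateM)

addNum :: IORef Int -> IO Int
addNum ref = do
  prev <- readIORef ref            -- read the previous result
  let next = prev + (prev + 1)
  writeIORef ref next              -- overwrite it with the new one
  return next

main :: IO ()
main = do
  ref <- newIORef 0
  results <- replicateM 3 (addNum ref)
  print results   -- [1,3,7]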
I would strongly recommend the first option though. Don't use IO (or other imperative things) until you absolutely need it.

You have encountered a situation where a programmer who is comfortable with Haskell would likely turn to monads, in particular the State monad, for which there are a few reasonable tutorials online:
https://acm.wustl.edu/functional/state-monad.php
http://brandon.si/code/the-state-monad-a-tutorial-for-the-confused/
https://www.schoolofhaskell.com/school/starting-with-haskell/basics-of-haskell/12-State-Monad
I didn't quite understand the "addNum" operation you are trying to describe, and I think there are some flaws in your attempts to define it. For example, the fact that your code always expects a list of exactly one element suggests that there shouldn't be a list or a foldl at all: just take n as an argument and add.
But from your description, I think the following approximates it as best as I can. (This won't be understandable without studying monads and the state monad a little bit, but hopefully it gives you an example to work with in conjunction with other materials.)
import Control.Monad.State
-- Add the previous `addNum` result to the argument.
addNum :: Int -> State Int Int
addNum n = do
  -- Precondition: the previous call to `addNum` used `put`
  -- (a few lines below here) to record its result as the
  -- implicit state for the `State` monad.
  previous <- get
  let newResult = previous + n
  -- Here we fulfill the precondition above.
  put newResult
  return newResult
The idea of the State monad is that you have get and put actions such that executing get retrieves the value that was most recently given as argument to put. To actually use addNum you need to do it within the context of a call to a function like evalState :: State s a -> s -> a, which forces you to specify the initial state that will be seen by the very first get in the very first use of addNum.
So, for example, here we use the traverse function to chain calls to addNum on consecutive elements of the list [1..10]. Each call to addNum for each list element will get the newResult value that was put by the call for the previous element. And the 0 argument to evalState means that the very first addNum call gets a 0:
>>> evalState (traverse addNum [1..10]) 0
[1,3,6,10,15,21,28,36,45,55]
If this feels overwhelming, well, for better or worse that's how Haskell feels at first. Keep at it and build up slowly to examples like this one.

What you want is impossible in Haskell, for a good reason explained below.
There is, however, a way to iterate a function by feeding it its own output in the next iteration.
Here is an example:
iterate (\c -> c + 2) 0
This creates the infinite list
[0,2,4,6,...]
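Applied to the step from the question, where each iteration maps n to n + (n + 1), the same trick gives:
iterate (\n -> n + (n + 1)) 0
which creates
[0,1,3,7,15,31,...]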
Haskell is a pure language, which means a function can access only its arguments, constants, and other functions; in particular, there is no hidden state a function could read or write. Therefore, given the same input, a Haskell function will compute the same output every time.

Related

Infinite list of Fibonacci numbers in Haskell

Sorry for a question like this. I'm a beginner programmer, and I've just started to learn Haskell. I recently ran into an exercise to implement a function in Haskell that returns an infinite list of Fibonacci numbers. The following code was the answer to the exercise:
fibs :: [Int]
fibs = fibs2 0
  where
    fibs2 :: Int -> [Int]
    fibs2 x = fib2 x : fibs2 (x + 1)
Can someone explain to me why we should declare another function (fibs2) here and what "where" does in this case?
Can someone explain to me why we should declare another function (fibs2) here?
You certainly aren't obligated to declare another function. However, this particular pattern is quite common. Think of it a bit like loop initialization in other languages. If you want to iterate some process, the easiest way to do that is to write a function that takes some information describing where you are in the iteration, does one step of the "loop", then calls itself with a suitably modified description. For example, if you wanted to sum up all the numbers from 0 to n, you might write:
sumTo :: Int -> Int
sumTo 0 = 0
sumTo n = n + sumTo (n-1)
BUT frequently the function or value you want is actually the one that starts at a specific value. It's annoying to force all callers of your loop to specify that starting value; and the fact that you've implemented your loop as a recursive function with an argument is an implementation detail they shouldn't have to worry about anyway. So what to do? Well, you define something that calls the loop with the right starting value.
gauss :: Int
gauss = sumTo 100
This way, users can just use gauss and not have to know that 100 is the right starting value for your internal function.
Can someone explain to me what "where" does in this case?
Well, there's one more thing that's a bit unfortunate about our previous sumTo/gauss values: we aren't really interested in sumTo itself, only in gauss, and the fact that it's visible outside of gauss is a violation of an abstraction barrier! If it's easy to call, it may be that somebody else tries to use it; then, if we need to change it to improve what gauss does, we are improving gauss but potentially breaking what that other user is using sumTo for. So we'd like to hide its existence.
That is the purpose of where here: it allows you to define a new thing that's accessible only locally. So:
gauss :: Int
gauss = sumTo 100
  where
    sumTo 0 = 0
    sumTo n = n + sumTo (n-1)
In this variant, gauss can be called, but outside of the implementation of gauss, it isn't possible to call sumTo, maintaining a nice abstraction boundary.

GHC: Are there consistent rules for memoization for calls with fixed values?

In my quest to understand and harness GHC's automatic memoization, I've hit a wall: when pure functions are called with fixed values like fib 42, they are sometimes fast and sometimes slow when called again. It varies depending on whether they're called plainly, like fib 42, or implicitly through some math, e.g. (\x -> fib (x - 1)) 43. The cases seem to have no rhyme or reason, so I'll present them with the intention of asking what the logic is behind the behavior.
Consider a slow Fibonacci implementation, which makes it obvious when the memoization is working:
slow_fib :: Int -> Integer
slow_fib n = if n < 2 then 1 else (slow_fib (n - 1)) + (slow_fib (n - 2))
I tested three basic questions to see if GHC (version 8.2.2) will memoize calls with fixed args:
Can slow_fib access previous top-level calls to slow_fib?
Are previous results memoized for later non-trivial (e.g. math) top-level expressions?
Are previous results memoized for later identical top-level expressions?
The answers seem to be:
No
No
Yes [??]
The fact that the last case works is very confusing to me: if I can re-print a result, for example, then I should expect to be able to add such results together too. Here's the code that shows this:
main = do
  -- 1. All three of these are slow, even though `slow_fib 37` is
  -- just the sum of the other two results. Definitely no memoization.
  putStrLn $ show $ slow_fib 35
  putStrLn $ show $ slow_fib 36
  putStrLn $ show $ slow_fib 37
  -- 2. Also slow; definitely no memoization here either.
  putStrLn $ show $ (slow_fib 35) + (slow_fib 36) + (slow_fib 37)
  putStrLn $ show $ (slow_fib 35) + 1
  -- 3. All three of these are instant. Huh?
  putStrLn $ show $ slow_fib 35
  putStrLn $ show $ slow_fib 36
  putStrLn $ show $ slow_fib 37
Stranger yet, doing math on the results does work when it's embedded in a recursive function, as in this Fibonacci variant that starts at Fib(40):
let fib_plus_40 n = if n <= 0
      then slow_fib 40
      else (fib_plus_40 (n - 1)) + (fib_plus_40 (n - 2))
Shown by the following:
main = do
  -- slow, as expected
  putStrLn $ show $ fib_plus_40 0
  -- instant. Why?!
  putStrLn $ show $ fib_plus_40 1
I can't find any reasoning for this in any explanations of GHC memoization, which typically attribute sharing to explicitly named variables (e.g. here, here, and here). This is why I expected fib_plus_40 to fail to memoize.
To elaborate, in case it wasn't clear from @amalloy's answer: the problem is that you're conflating two things here -- the implicit memoization-like behavior (what people mean when they talk about Haskell's "automatic memoization", though it is not true memoization!) that results directly from thunk-based lazy evaluation, and a compiler optimization technique that's basically a form of common subexpression elimination. The former is predictable, more or less; the latter is at the whim of the compiler.
Recall that real memoization is a property of the implementation of a function: the function "remembers" results calculated for certain combinations of arguments, and may reuse those results instead of recalculating them from scratch when called multiple times with the same arguments. When GHC generates code for functions, it does not automatically generate code to perform this kind of memoization.
Instead, the code GHC generates to implement function application is unusual. Rather than applying the function to its arguments to produce the final result as a value, a "result" is immediately constructed in the form of a thunk, which you can view as a suspended function call or a "promise" to deliver a value at a later time.
When, at some future point, the actual value is needed, the thunk is forced (which actually causes the original function call to take place), and the thunk is updated with the value. If that same value is needed again later, the value is already available, so the thunk doesn't need to be forced a second time. This is the "automatic memoization". Note that it takes place at the "result" level rather than the "function" level -- the result of a function application remembers its value; a function does not remember the results it previously produced.
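A minimal sketch of that result-level caching: binding a thunk to a name and demanding it twice only runs the computation once.
main :: IO ()
main = do
  let x = product [1 .. 20000 :: Integer]  -- just a thunk; nothing computed yet
  print (x > 0)   -- forces the thunk here (this is the slow part)
  print (x > 1)   -- the thunk already holds its value; this is cheap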
Now, normally the concept of the result of a function application remembering its value would be ridiculous. In strict languages, we don't worry that after x = sqrt(10), reusing x will cause multiple sqrt calls because x hasn't "memoized" its own value. That is, in strict languages, all function application results are "automatically memoized" in the same sense they are in Haskell.
The difference is lazy evaluation, which allows us to write something like:
stuff = map expensiveComputation [1..10000]
which returns a thunk immediately without performing any expensive computations. Afterwards:
f n = stuff !! n
magically creates a memoized function, not because GHC generates code in the implementation of f to somehow memoize the call f 1000, but because f 1000 forces (a bunch of list constructor thunks and then) a single expensiveComputation whose return value is "memoized" as the value at index 1000 in the list stuff -- it was a thunk, but after being forced, it remembers its own value, just like any value in a strict language would.
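For instance, in a hypothetical GHCi session (assuming expensiveComputation is some slow pure function):
>>> f 1000   -- slow: forces list cells and the computation at index 1000
>>> f 1000   -- instant: that element is now a plain value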
So, given your definition of slow_fib, none of your examples are actually making use of Haskell's automatic memoization, in the usual sense people mean. Any speedups you're seeing are the result of various compiler optimizations that are (or aren't) recognizing common subexpressions or inlining / unwrapping short loops.
To write a memoized fib, you need to do it as explicitly as you would in a strict language, by creating a data structure to hold the memoized values, though lazy evaluation and mutually recursive definitions can sometimes make it seem like it's "automatic":
import qualified Data.Vector as V
import Data.Vector (Vector, (!))

fibv :: Vector Integer
fibv = V.generate 1000000 getfib
  where
    getfib 0 = 1
    getfib 1 = 1
    getfib i = fibv ! (i-1) + fibv ! (i-2)

fib :: Int -> Integer
fib n = fibv ! n
All of the examples you link at the end exploit the same technique: instead of implementing function f directly, they first introduce a list whose contents are all the calls to f that could ever be made. That list is computed only once, lazily; and then a simple lookup in that list is used as the implementation of the user-facing function. So, they are not relying on any caching from GHC.
Your question is different: you hope that the result of calling some function will be automatically cached for you, and in general that does not happen. The real question is why any of your results are fast. I'm not sure, but I think it has to do with Constant Applicative Forms (CAFs), which GHC may share between multiple use sites, at its discretion.
The most relevant feature of a CAF here is the "Constant" part: GHC will only introduce such a cache for an expression whose value is constant throughout the entire run of the program, not just for some particular scope. So, you can be sure that f x <> f x will never reuse the result of f x (at least not due to CAF folding; maybe GHC can find some other excuse to memoize this for some functions, but typically it does not).
The two things in your program that are not CAFs are the implementation of slow_fib, and the recursive case of fib_plus_40. GHC definitely cannot introduce any caching of the results of those expressions. The base case for fib_plus_40 is a CAF, as are all of the expressions and subexpressions in main. So, GHC can choose to cache/share any of those subexpressions, and not share any of them, as it pleases. Perhaps it sees that slow_fib 40 is "obviously" simple enough to save, but it's not so sure about whether the slow_fib 35 expressions in main should be shared. Meanwhile, it sounds like it does decide to share the IO action putStrLn $ show $ slow_fib 35 for whatever reason. Seems like a weird choice to you and me, but we're not compilers.
The moral here is that you cannot count on this at all: if you want to ensure you compute a value only once, you need to save it in a variable somewhere, and refer to that variable instead of recomputing it.
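Concretely, reusing slow_fib from the question, a sketch of the moral:
main :: IO ()
main = do
  let x = slow_fib 35   -- named once, so computed at most once
  print x               -- slow: this forces the thunk
  print (x + 1)         -- instant: reuses the already-forced value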
To confirm this, I took luqui's advice and looked at the -ddump-simpl output. Here are some snippets showing the explicit caching:
-- RHS size: {terms: 2, types: 0, coercions: 0}
lvl1_r4ER :: Integer
[GblId, Str=DmdType]
lvl1_r4ER = $wslow_fib_r4EP 40#

Rec {
-- RHS size: {terms: 21, types: 4, coercions: 0}
Main.main_fib_plus_40 [Occ=LoopBreaker] :: Integer -> Integer
[GblId, Arity=1, Str=DmdType <S,U>]
Main.main_fib_plus_40 =
  \ (n_a1DF :: Integer) ->
    case integer-gmp-1.0.0.1:GHC.Integer.Type.leInteger#
           n_a1DF Main.main7
    of wild_a2aQ { __DEFAULT ->
      case GHC.Prim.tagToEnum# @ Bool wild_a2aQ of _ [Occ=Dead] {
        False ->
          integer-gmp-1.0.0.1:GHC.Integer.Type.plusInteger
            (Main.main_fib_plus_40
               (integer-gmp-1.0.0.1:GHC.Integer.Type.minusInteger
                  n_a1DF Main.main4))
            (Main.main_fib_plus_40
               (integer-gmp-1.0.0.1:GHC.Integer.Type.minusInteger
                  n_a1DF lvl_r4EQ));
        True -> lvl1_r4ER
      }
    }
end Rec }
This doesn't tell us why GHC is choosing to introduce this cache - remember, it's allowed to do what it wants. But it does confirm the mechanism, that it introduces a variable to hold the repeated calculation. I can't show you core for your longer main involving smaller numbers, because when I compile it I get more sharing: the expressions in section 2 are cached for me as well.

What's the terminology for when a boolean/tagged union determines whether the current execution of a recursive function is at its first iteration?

I remember seeing in Erlang that a wrapper function of a recursive function will sometimes pass an atom that determines whether the recursion is at its first iteration (n = 1) or some successive iteration (n > 1). This is useful when a recursive function needs to change its behaviour after the first iteration. What is this pattern called?
Furthermore, is this pattern also appropriate in Haskell? I wrote a small snippet using it; look at the first boolean:
import Data.Char (digitToInt, isDigit)

data Token = Num Integer deriving (Show)

tokeniseNumber :: String -> (String, Maybe Token)
tokeniseNumber input = accumulateNumber input 0 True
  where
    accumulateNumber :: String -> Integer -> Bool -> (String, Maybe Token)
    accumulateNumber [] value True  = ([], Nothing)
    accumulateNumber [] value False = ([], Just (Num value))
    accumulateNumber input@(peek:tail) value first =
      case isDigit peek of
        False | first     -> (input, Nothing)
        False | not first -> (input, Just (Num value))
        True -> accumulateNumber tail (value * 10 + (toInteger . digitToInt) peek) False
-- Edit --
zxq9 posted an answer and later deleted it, but I actually think the answer has merit.
This is cleaner to define as a set of separate functions that each behave in some specific way, together with a function-head match that determines which of those functions to dispatch to (Haskell provides a broader array of type-based function-matching tools here). In other words, what you are looking for is a certain style of "finite state machine".
The states can be styled as function names or as a state argument; which to use depends on the context and language, and this can extend to the state argument being a function name and that itself being a sort of match.
What is best for Haskell is usually not what works best for Erlang. Many one-off tasks are delegated to separate processes in Erlang, and even process instantiation in Erlang goes through an "init state" when it calls init, which is essentially the same thing as your "recursive function needs to change its behaviour after the first iteration". OTOH, Haskell provides more ways to match on a function head. In either case, taking an approach where a named function defines an operating state is cleaner. The result is code that is not nested, doesn't require procedural conditionals, and can be called from anywhere more easily (and so is more flexible to deal with when you rewrite your program later).
FSMs are a general way of determining what code to execute based on state, and initialization of a function (or process) is a special case of that. I've heard this called "pass-through initialization": the entry function defines the interface, does one-time processing to set up the main procedure, and passes execution through to it:
init(Args) ->
    {ok, NewArgs} = one_time_processing(Args),
    loop(NewArgs).

loop(Args) ->
    {ok, NewArgs} = do_stuff(Args),
    loop(NewArgs).
Of course, the above is an infinite loop, so it's more common to either have a check for exit at the end of the loop/1 function, or (more often) a match in the function head of loop:
loop(done, Args) ->
    Args;
loop(cont, Args) ->
    {Cond, NewArgs} = do_stuff(Args),
    loop(Cond, NewArgs).
But in either case it is better to have the initialization of a process be its own procedure, defined separately from whatever the body of the loop is. Other languages with looping constructs handle this differently, with some combination of conditional checks applied in a special way depending on which style of loop the programmer chooses, but the effect is the same. Very often the most obvious way to implement this procedurally is the same: wrap the whole loop behind a function call, where the steps that precede the loop are the "one-time" initialization parts. In this case it's not that the loop is "wrapped" in a function call, but that you write an interface function to access it, one which does some one-time initialization on the way to calling it.
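In Haskell, the same shape is the familiar wrapper-plus-local-loop idiom; here is a sketch with made-up names:
sumSquares :: [Int] -> Int
sumSquares = go 0                         -- one-time initialization: acc = 0
  where
    go acc []     = acc                   -- exit clause
    go acc (x:xs) = go (acc + x * x) xs   -- continue clause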
To expand on my comment about boolean blindness: I don't just mean using another type isomorphic to 2, but rather using the right type to encode the reason your recursive function cares about which iteration it is on.
Compare your code to the following version, which I'd say is cleaner and more succinct. It hinges on passing a Maybe Integer instead of an (Integer, Bool) to accumulateNumber.
import Data.Char (digitToInt, isDigit)
import Data.Maybe
import Control.Applicative

data Token = Num Integer deriving (Show)

tokeniseNumber :: String -> (String, Maybe Token)
tokeniseNumber input = accumulateNumber input Nothing
  where
    accumulateNumber :: String -> Maybe Integer -> (String, Maybe Token)
    accumulateNumber input@(peek:tail) value
      | isDigit peek = accumulateNumber tail (Just $ toNum (fromMaybe 0 value) peek)
    accumulateNumber input value = (input, Num <$> value)

    toNum value peek = value * 10 + (toInteger . digitToInt) peek
Also wanted to point out that I discovered an academic paper that discusses this exact technique.
It's called "The worker/wrapper transformation", by Andy Gill & Graham Hutton (2009).
Link: http://www.cs.nott.ac.uk/~gmh/wrapper.pdf

How to write the type declaration of a Haskell function with no arguments?

How to write the type declaration of a Haskell function without arguments?
There is no such thing as a function without arguments, that would be just a value. Sure, you can write such a declaration:
five :: Int
five = 5
It might look more like what you asked for if I make it
five' :: () -> Int
five' () = 5
but that's completely equivalent (unless you write something ridiculous like five' undefined) and superfluous1.
If what you mean is something like, in C
void scream() {
    printf("Aaaah!\n");
}
then that's again not a function but an action. (C programmers do call it a function, but you might better say procedure; everybody would understand.) What I said above holds in pretty much the same way; you'd use
scream :: IO()
scream = putStrLn "Aaaah!"
Note that in this case the empty () does not have anything to do with the absence of arguments (that already follows from the absence of -> arrows); instead it means there is no interesting return value, it's just a "side-effect-only" action.
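For contrast, a sketch of an action (hypothetical name askLine) that does produce a value puts something other than () after IO:
askLine :: IO Int
askLine = do
  putStrLn "Enter a number:"
  fmap read getLine   -- read the reply and parse it as an Int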
1Actually, it differs in one relevant way: five is a constant applicative form, which sort of means it's memoised. If I had defined such a constant in some roundabout way (e.g. sum $ 5 : replicate 1000000 0) then the lengthy calculation would be carried out only once, even if five is evaluated multiple times during a program run. OTOH, wherever you would have written out five' (), the calculation would have been done anew.
Since functions in Haskell are pure (their result only depends on their arguments), the equivalent of a function with no arguments is just a value. For example, one = 1.

Evaluation of nullary functions in Haskell

Suppose you have a nullary function in Haskell, which is used several times in the code. Is it always evaluated only once? I already tested the following code:
import System.IO.Unsafe (unsafePerformIO)

sayHello :: Int
sayHello = unsafePerformIO $ do
  putStr "Hello"
  return 42
test :: Int -> [Int]
test 0 = []
test n = (sayHello:(test (n-1)))
When I call test 10, it writes "Hello" only once, indicating that the result of the function is stored after the first evaluation. My question is: is this guaranteed? Will I get the same result across different compilers?
Edit
The reason I used unsafePerformIO is to check whether sayHello is evaluated more than once; I don't use it in my program. Normally I expect sayHello to have exactly the same result every time it's evaluated. But it's a time-consuming operation, so I wanted to know whether it can be accessed this way, or whether it should be passed as an argument wherever it's needed, to ensure it is not evaluated multiple times, i.e.:
test _ 0 = []
test s n = s : test s (n-1)
...
test sayHello 10
According to the answers this should be used.
There is no such thing as a nullary function. A function in Haskell has exactly one argument, and always has type ... -> .... sayHello is a value -- an Int -- but not a function. See this article for more.
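For example, a function of two arguments is really a one-argument function that returns another function (currying):
plus :: Int -> Int -> Int
plus x y = x + y

-- is the same as

plus' :: Int -> (Int -> Int)
plus' = \x -> \y -> x + y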
On guarantees: No, you don't really get any guarantees. The Haskell report specifies that Haskell is non-strict -- so you know what value things will eventually reduce to -- but not any particular evaluation strategy. The evaluation strategy GHC generally uses is lazy evaluation, i.e. non-strict evaluation with sharing, but it doesn't make strong guarantees about that -- the optimizer could shuffle your code around so that things are evaluated more than once.
There are also various exceptions -- for example, foo :: Num a => a is polymorphic, so it probably won't be shared (it's compiled to an actual function). Sometimes a pure value might be evaluated by more than one thread at the same time (that won't happen in this case because unsafePerformIO explicitly uses noDuplicate to avoid it). So when you program, you can generally expect laziness, but if you want any sort of guarantees you'll have to be very careful. The Report itself won't really give you anything on how your program is evaluated.
unsafePerformIO gives you even less in the way of guarantees, of course. There's a reason it's called "unsafe".
Top-level no-argument functions like sayHello are called Constant Applicative Forms and are always memoised (at least in GHC; see http://www.haskell.org/ghc/docs/7.2.1/html/users_guide/profiling.html). You would have to resort to tricks like passing in dummy arguments and turning optimisations off to stop a CAF from being shared globally.
Edit: quote from the link above -
Haskell is a lazy language, and certain expressions are only ever evaluated once. For example, if we write:
x = nfib 25
then x will only be evaluated once (if at all), and subsequent demands for x will immediately get to see the cached result. The definition x is called a CAF (Constant Applicative Form), because it has no arguments.
If you do want "Hello" printed n times, you need to remove the unsafePerformIO, so the runtime knows it can't optimize away repeated calls to putStr. I'm not clear on whether you want to return the list of Ints, so I've written two versions of test, one of which returns () and one [Int].
sayHello2 :: IO Int
sayHello2 = do
  putStr "Hello"
  return 42

test2 :: Int -> IO ()
test2 0 = return ()
test2 n = do
  sayHello2
  test2 (n-1)

test3 :: Int -> IO [Int]
test3 0 = return []
test3 n = do
  r <- sayHello2
  l <- test3 (n-1)
  return $ r : l
