Call by need: When is it used in Haskell? - haskell

http://en.wikipedia.org/wiki/Evaluation_strategy#Call_by_need says:
"Call-by-need is a memoized version of call-by-name where, if the function argument is evaluated, that value is stored for subsequent uses. [...] Haskell is the most well-known language that uses call-by-need evaluation."
However, the value of a computation is not always stored for faster access (for example consider a recursive definition of fibonacci numbers). I asked someone on #haskell and the answer was that this memoization is done automatically "only in one instance, e.g. if you have `let foo = bar baz', foo will be evaluated once".
My questions is: What does instance exactly mean, are there other cases than let in which memoization is done automatically?

Describing this behavior as "memoization" is misleading. "Call by need" just means that a given input to a function will be evaluated somewhere between 0 and 1 times, never more than once. (It could be partially evaluated as well, which means the function only needed part of that input.) In contrast, "call by name" is simply expression substitution, which means if you give the expression 2 + 3 as an input to a function, it may be evaluated multiple times if the input is used more than once. Both call by need and call by name are non-strict: if the input is not used, then it is never evaluated. Most programming languages are strict, and use a "call by value" approach, which means that all inputs are evaluated before you begin evaluating the function, whether or not the inputs are used. This all has nothing to do with let expressions.
Haskell does not perform any automatic memoization. Let expressions are not an example of memoization. However, most compilers will evaluate let bindings in a call-by-need-esque fashion. If you model a let expression as a function, then the "call by need" mentality does apply:
let foo = expression one in expression two that uses foo
==>
(\foo -> expression two that uses foo) (expression one)
This doesn't correctly model recursive bindings, but you get the idea.

The haskell language definition does not define when, or how often, code is invoked. Infinite loops are defined in terms of 'the bottom' (written ⊥), which is a value (which exists within all types) that represents an error condition. The compiler is free to make its own decisions regarding when and how often to evaluate things as long as the program (and presence/absence of error conditions, including infinite loops!) behaves according to spec.
That said, the usual way of doing this is that most expressions generate 'thunks' - basically a pointer to some code and some context data. The first time you attempt to examine the result of the expression (ie, pattern match it), the thunk is 'forced'; the pointed-to code is executed, and the thunk overwritten with real data. This in turn can recursively evaluate other thunks.
Of course, doing this all the time is slow, so the compiler usually tries to analyze when you'd end up forcing a thunk right away anyway (ie, when something is 'strict' on the value in question), and if it finds this, it'll skip the whole thunk thing and just call the code right away. If it can't prove this, it can still make this optimization as long as it makes sure that executing the thunk right away can't crash or cause an infinite loop (or it handles these conditions somehow).

If you don't want to have to get very technical about this, the essential point is that when you have an expression like some_expensive_computation of all these arguments, you can do whatever you want with it; store it in a data structure, create a list of 53 copies of it, pass it to 6 other functions, etc, and then even return it to your caller for the caller to do whatever it wants with it.
What Haskell will (mostly) do is evaluate it at most once; if it the program ever needs to know something about what that expression returned in order to make a decision, then it will be evaluated (at least enough to know which way the decision should go). That evaluation will affect all the other references to the same expression, even if they are now scattered around in data structures and other not-yet-evaluated expressions all throughout your program.

Related

Saving the result of applying a function to a variable in the same variable

I was remembering the haskell programming I learnt the last year and suddenly I had a little problem.
ghci> let test = [1,2,3,4]
ghci> test = drop 1 test
ghci> test
^CInterrupted.
I do not remember if it is possible.
Thanks!
test on the first line and test on the second line are not, in fact, the same variable. They're two different, separate, unrelated variables that just happen to have the same name.
Further, the concept of "saving in a variable" does not apply to Haskell. In Haskell, variables cannot be interpreted as "memory cells", which can hold values. Haskell's variables are more like mathematical variables - just names that you give to complex expressions for easier reasoning (well, this is a bit of an oversimplification, but good enough for now)
Consequently, variables in Haskell are immutable. You cannot change the value of a variable by assignment, like you can in many other languages. This property follows from interpreting the concept of "variable" in the mathematical sense, as described above.
Furthermore, definitions (aka "bindings") in Haskell are recursive. This means that the right side (the body) of a binding may refer to its left side. This is very handy for constructing infinite data structures, for example:
x = 42 : x
An infinite list of 42s
In your example, when you write test = drop 1 test, you're defining a list named test, which is completely unrelated to the list defined on the previous line, and which is equal to itself without the first element. It's only natural that trying to print such a list results in an infinite loop.
The bottom line is: you cannot do what you're trying to do. You cannot create a new binding, which shadows an existing binding, while at the same time references it. Just give it a different name.

Guarantee of sameness of output after switching order in functional programming

I started reading some of Haskell's documentation, and there's a fundamental concept I just don't understand. I read about it in other places as well, but I want to understand it once and for all.
In many places discussing functional programing, I keep reading that if the functions you're using are pure (have no side effects, and give same response for the same input at every call) then you can switch the order in which they are called when composing them, with it being guaranteed that the output of this composed call will remain the same regardless of the order.
For example, here is an entry from the Haskell Wiki:
Haskell is a pure language, which means that the result of any
function call is fully determined by its arguments. Pseudo-functions
like rand() or getchar() in C, which return different results on each
call, are simply impossible to write in Haskell. Moreover, Haskell
functions can't have side effects, which means that they can't effect
any changes to the "real world", like changing files, writing to the
screen, printing, sending data over the network, and so on. These two
restrictions together mean that any function call can be replaced by
the result of a previous call with the same parameters, and the
language guarantees that all these rearrangements will not change the
program result!
But when I fiddle with this idea I can quickly think of examples that contradict the statement above. For instance, let's say I have two functions (I will use pseudo code rather than Haskell):
x(a)->a+3
y(a)->a*3
z(a)->x(y(a))
w(a)->y(x(a))
Now, if we execute z and w, we get:
z(5) //gives 3*5+3=18
w(5) //gives (5+3)*3=24
That being so, I think I misunderstood the promised guarantee they speak about. Can anybody explain it to me?
When you compare x(y(a)) to y(x(a)), those two expressions are not equivalent because x and y aren't called with the same arguments in each. In the first expression x is called with the argument y(a) and y is called with the argument a. Whereas in the second y is called with x(a), not a, as its argument and x is called with a, not y(a). So: different arguments, (possibly) different results.
When people say that the order does not matter, they mean that in the following code:
a = f(x)
b = g(y)
you can switch the definition of a and b without affecting their values. That is it makes no difference whether f is called before g or vice versa. This is clearly not true for the following code:
a = getchar()
b = getchar()
If you switch a and b here, their values are switched as well, because getchar returns a (possibly) different character each time that it's called. So a purely functional language can't have a function exactly like getchar.

Accumulator factory in Haskell

Now, at the start of my adventure with programming I have some problems understanding basic concepts. Here is one related to Haskell or perhaps generally functional paradigm.
Here is a general statement of accumulator factory problem, from
http://rosettacode.org/wiki/Accumulator_factory
[Write a function that]
Takes a number n and returns a function (lets call it g), that takes a number i, and returns n incremented by the accumulation of i from every call of function g(i).
Works for any numeric type-- i.e. can take both ints and floats and returns functions that can take both ints and floats. (It is not enough simply to convert all input to floats. An accumulator that has only seen integers must return integers.) (i.e., if the language doesn't allow for numeric polymorphism, you have to use overloading or something like that)
Generates functions that return the sum of every number ever passed to them, not just the most recent. (This requires a piece of state to hold the accumulated value, which in turn means that pure functional languages can't be used for this task.)
Returns a real function, meaning something that you can use wherever you could use a function you had defined in the ordinary way in the text of your program. (Follow your language's conventions here.)
Doesn't store the accumulated value or the returned functions in a way that could cause them to be inadvertently modified by other code. (No global variables or other such things.)
with, as I understand, a key point being:
"[...] creating a function that [...]
Generates functions that return the sum of every number ever passed to them, not just the most recent. (This requires a piece of state to hold the accumulated value, which in turn means that pure functional languages can't be used for this task.)"
We can find a Haskell solution on the same website and it seems to do just what the quote above says.
Here
http://rosettacode.org/wiki/Category:Haskell
it is said that Haskell is purely functional.
What is then the explanation of the apparent contradiction? Or maybe there is no contradiction and I simply lack some understanding? Thanks.
The Haskell solution does not actually quite follow the rules of the challenge. In particular, it violates the rule that the function "Returns a real function, meaning something that you can use wherever you could use a function you had defined in the ordinary way in the text of your program." Instead of returning a real function, it returns an ST computation that produces a function that itself produces more ST computations. Within the context of an ST "state thread", you can create and use mutable references (STRef), arrays, and vectors. However, it's impossible for this mutable state to "leak" outside the state thread to contaminate pure code.

How to run several operations back to back in a lambda in Haskell?

I'm teaching myself Haskell, and I am having difficulty understanding how I might pipeline a number of operations in the body of a lambda function without using a do block. Take the following for example:
par = (\c->
c + 1
c + 2
)
In Ruby and other imperative languages, I am used to being able to run serveral expressions in a lambda block by adding a linebreak between them. In Haskell, I looked for a similar construct and found that Haskell doesn't respect linebreaks in pure expressions.
So, syntactically, if I wanted to run the second line after the first without using a do block, what could I do?
Unless it's side-effectful code, having multiple independent expressions with no side effects makes no sense in Haskell (nor any other languages for that matter). You can use let to store the value of an expression in a name and then use it in the subsequent expression (the returm value of the lambda), or a do-block, but anything else is meaningless in Haskell and thus not a valid program.
In other words: it makes little sense to ask this question solely about syntax as Haskell's syntax is all about its semantics.

How to find the optimal processing order?

I have an interesting question, but I'm not sure exactly how to phrase it...
Consider the lambda calculus. For a given lambda expression, there are several possible reduction orders. But some of these don't terminate, while others do.
In the lambda calculus, it turns out that there is one particular reduction order which is guaranteed to always terminate with an irreducible solution if one actually exists. It's called Normal Order.
I've written a simple logic solver. But the trouble is, the order in which it processes the constraints seems to have a huge effect on whether it finds any solutions or not. Basically, I'm wondering whether something like a normal order exists for my logic programming language. (Or wether it's impossible for a mere machine to deterministically solve this problem.)
So that's what I'm after. Presumably the answer critically depends on exactly what the "simple logic solver" is. So I will attempt to briefly describe it.
My program is closely based on the system of combinators in chapter 9 of The Fun of Programming (Jeremy Gibbons & Oege de Moor). The language has the following structure:
The input to the solver is a single predicate. Predicates may involve variables. The output from the solver is zero or more solutions. A solution is a set of variable assignments which make the predicate become true.
Variables hold expressions. An expression is an integer, a variable name, or a tuple of subexpressions.
There is an equality predicate, which compares expressions (not predicates) for equality. It is satisfied if substituting every (bound) variable with its value makes the two expressions identical. (In particular, every variable equals itself, bound or not.) This predicate is solved using unification.
There are also operators for AND and OR, which work in the obvious way. There is no NOT operator.
There is an "exists" operator, which essentially creates local variables.
The facility to define named predicates enables recursive looping.
One of the "interesting things" about logic programming is that once you write a named predicate, it typically works fowards and backwards (and sometimes even sideways). Canonical example: A predicate to concatinate two lists can also be used to split a list into all possible pairs.
But sometimes running a predicate backwards results in an infinite search, unless you rearrange the order of the terms. (E.g., swap the LHS and RHS of an AND or an OR somehwere.) I'm wondering whether there's some automated way to detect the best order to run the predicates in, to ensure prompt termination in all cases where the solution set is exactually finite.
Any suggestions?
Relevant paper, I think: http://www.cs.technion.ac.il/~shaulm/papers/abstracts/Ledeniov-1998-DCS.html
Also take a look at this: http://en.wikipedia.org/wiki/Constraint_logic_programming#Bottom-up_evaluation

Resources