Can all iterative algorithms be expressed recursively? - programming-languages

Can all iterative algorithms be expressed recursively?
If not, is there a good counter example that shows an iterative algorithm for which there exists no recursive counterpart?
If it is the case that all iterative algorithms can be expressed recursively, are there cases in which this is more difficult to do?
Also, what role does the programming language play in all this? I can imagine that Scheme programmers have a different take on iteration (= tail-recursion) and stack usage than Java-only programmers.

There's a simple ad hoc proof for this. Since you can build a Turing complete language using strictly iterative structures and a Turing complete language using only recursive structures, then the two are therefore equivalent.

Can all iterative algorithms be expressed recursively?
Yes, but the proof is not interesting:
Transform the program with all its control flow into a single loop containing a single case statement in which each branch is straight-line control flow possibly including break, return, exit, raise, and so on. Introduce a new variable (call it the "program counter") which the case statement uses to decide which block to execute next.
This construction was discovered during the great "structured-programming wars" of the 1960s when people were arguing the relative expressive power of various control-flow constructs.
Replace the loop with a recursive function, and replace every mutable local variable with a parameter to that function. Voilà! Iteration replaced by recursion.
This procedure amounts to writing an interpreter for the original function. As you may imagine, it results in unreadable code, and it is not an interesting thing to do. However, some of the techniques can be useful for a person with background in imperative programming who is learning to program in a functional language for the first time.

Like you say, every iterative approach can be turned into a "recursive" one, and with tail calls, the stack will not explode either. :-) In fact, that's actually how Scheme implements all common forms of looping. Example in Scheme:
(define (fib n)
(do ((x 0 y)
(y 1 (+ x y))
(i 1 (+ i 1)))
((> i n) x)))
Here, although the function looks iterative, it actually recurses on an internal lambda that takes three parameters, x, y, and i, and calling itself with new values at each iteration.
Here's one way that function could be macro-expanded:
(define (fib n)
(letrec ((inner (lambda (x y i)
(if (> i n) x
(inner y (+ x y) (+ i 1))))))
(inner 0 1 1)))
This way, the recursive nature becomes more visually apparent.

Defining iterative as:
function q(vars):
while X:
do Y
Can be translated as:
function q(vars):
if X:
do Y
call q(vars)
Y in most cases would include incrementing a counter that is tested by X. This variable will have to be passed along in 'vars' in some way when going the recursive route.

As noted by plinth in the their answer we can construct proofs showing how recursion and iteration are equivalent and can both be used to solve the same problem; however, even though we know the two are equivalent there are drawbacks to use one over the other.
In languages that are not optimized for recursion you may find that an algorithm using iteration preforms faster than the recursive one and likewise, even in optimized languages you may find that an algorithm using iteration written in a different language runs faster than the recursive one. Furthermore, there may not be an obvious way of written a given algorithm using recursion versus iteration and vice versa. This can lead to code that is difficult to read which leads to maintainability issues.

Prolog is recursive only language and you can do pretty much everything in it (I don't suggest you do, but you can :))

Recursive Solutions are usually relatively inefficient when compared to iterative solutions.
However, it is noted that there are some problems that can be solved only through recursion and equivalent iterative solution may not exist or extremely complex to program easily (Example of such is The Ackermann function cannot be expressed without recursion)Though recursions are elegant,easy to write and understand.

Related

Does Prolog use lazy evaluation like Haskell? [duplicate]

Because Prolog uses chronological backtracking(from the Prolog Wikipedia page) even after an answer is found(in this example where there can only be one solution), would this justify Prolog as using eager evaluation?
mother_child(trude, sally).
father_child(tom, sally).
father_child(tom, erica).
father_child(mike, tom).
sibling(X, Y) :- parent_child(Z, X), parent_child(Z, Y).
parent_child(X, Y) :- father_child(X, Y).
parent_child(X, Y) :- mother_child(X, Y).
With the following output:
?- sibling(sally, erica).
true ;
false.
To summarize the discussion with #WillNess below, yes, Prolog is strict. However, Prolog's execution model and semantics are substantially different from the languages that are usually labelled strict or non-strict. For more about this, see below.
I'm not sure the question really applies to Prolog, because it doesn't really have the kind of implicit evaluation ordering that other languages have. Where this really comes into play in a language like Haskell, you might have an expression like:
f (g x) (h y)
In a strict language like ML, there is a defined evaluation order: g x will be evaluated, then h y, and f (g x) (h y) last. In a language like Haskell, g x and h y will only be evaluated as required ("non-strict" is more accurate than "lazy"). But in Prolog,
f(g(X), h(Y))
does not have the same meaning, because it isn't using a function notation. The query would be broken down into three parts, g(X, A), h(Y, B), and f(A,B,C), and those constituents can be placed in any order. The evaluation strategy is strict in the sense that what comes earlier in a sequence will be evaluated before what comes next, but it is non-strict in the sense that there is no requirement that variables be instantiated to ground terms before evaluation can proceed. Unification is perfectly content to complete without having given you values for every variable. I am bringing this up because you have to break down a complex, nested expression in another language into several expressions in Prolog.
Backtracking has nothing to do with it, as far as I can tell. I don't think backtracking to the nearest choice point and resuming from there precludes a non-strict evaluation method, it just happens that Prolog's is strict.
That Prolog pauses after giving each of the several correct answers to a problem has nothing to do with laziness; it is a part of its user interaction protocol. Each answer is calculated eagerly.
Sometimes there will be only one answer but Prolog doesn't know that in advance, so it waits for us to press ; to continue search, in hopes of finding another solution. Sometimes it is able to deduce it in advance and will just stop right away, but only sometimes.
update:
Prolog does no evaluation on its own. All terms are unevaluated, as if "quoted" in Lisp.
Prolog will unfold your predicate definitions as written and is perfectly happy to keep your data structures full of unevaluated uninstantiated holes, if so entailed by your predicate definitions.
Haskell does not need any values, a user does, when requesting an output.
Similarly, Prolog produces solutions one-by-one, as per the user requests.
Prolog can even be seen to be lazier than Haskell where all arithmetic is strict, i.e. immediate, whereas in Prolog you have to explicitly request the arithmetic evaluation, with is/2.
So perhaps the question is ill-posed. Prolog's operations model is just too different. There are no "results" nor "functions", for one; but viewed from another angle, everything is a result, and predicates are "multi"-functions.
As it stands, the question is not correct in what it states. Chronological backtracking does not mean that Prolog will necessarily backtrack "in an example where there can be only one solution".
Consider this:
foo(a, 1).
foo(b, 2).
foo(c, 3).
?- foo(b, X).
X = 2.
?- foo(X, 2).
X = b.
So this is an example that does have only one solution and Prolog recognizes that, and does not attempt to backtrack. There are cases in which you can implement a solution to a problem in a way that Prolog will not recognize that there is only one logical solution, but this is due to the implementation and is not inherent to Prolog's execution model.
You should read up on Prolog's execution model. From the Wikipedia article which you seem to cite, "Operationally, Prolog's execution strategy can be thought of as a generalization of function calls in other languages, one difference being that multiple clause heads can match a given call. In that case, [emphasis mine] the system creates a choice-point, unifies the goal with the clause head of the first alternative, and continues with the goals of that first alternative." Read Sterling and Shapiro's "The Art of Prolog" for a far more complete discussion of the subject.
from Wikipedia I got
In eager evaluation, an expression is evaluated as soon as it is bound to a variable.
Then I think there are 2 levels - at user level (our predicates) Prolog is not eager.
But it is at 'system' level, because variables are implemented as efficiently as possible.
Indeed, attributed variables are implemented to be lazy, and are rather 'orthogonal' to 'logic' Prolog variables.

Is there a way to "preserve" results in Haskell?

I'm new to Haskell and understand that it is (basically) a pure functional language, which has the advantage that results to functions will not change across multiple evaluations. Given this, I'm puzzled by why I can't easily mark a function in such a way that its remembers the results of its first evaluation, and does not have to be evaluated again each time its value is required.
In Mathematica, for example, there is a simple idiom for accomplishing this:
f[x_]:=f[x]= ...
but in Haskell, the closest things I've found is something like
f' = (map f [0 ..] !!)
where f 0 = ...
f n = f' ...
which in addition to being far less clear (and apparently limited to Int arguments?) does not (seem to) preserve results within an interactive session.
Admittedly (and clearly), I don't understand exactly what's going on here; but naively, it seems like Haskel should have some way, at the function definition level, of
taking advantage of the fact that its functions are functions and skipping re-computation of their results once they have been computed, and
indicating a desire to do this at the function definition level with a simple and clean idiom.
Is there a way to accomplish this in Haskell that I'm missing? I understand (sort of) that Haskell can't store the evaluations as "state", but why can't it simply (in effect) redefine evaluated functions to be their computed value?
This grows out of this question, in which lack of this feature results in terrible performance.
Use a suitable library, such as MemoTrie.
import Data.MemoTrie
f' = memo f
where f 0 = ...
f n = f' ...
That's hardly less nice than the Mathematica version, is it?
Regarding
“why can't it simply (in effect) redefine evaluated functions to be their computed value?”
Well, it's not so easy in general. These values have to be stored somewhere. Even for an Int-valued function, you can't just allocate an array with all possible values – it wouldn't fit in memory. The list solution only works because Haskell is lazy and therefore allows infinite lists, but that's not particularly satisfying since lookup is O(n).
For other types it's simply hopeless – you'd need to somehow diagonalise an over-countably infinite domain.
You need some cleverer organisation. I don't know how Mathematica does this, but it probably uses a lot of “proprietary magic”. I wouldn't be so sure that it does really work the way you'd like, for any inputs.
Haskell fortunately has type classes, and these allow you to express exactly what a type needs in order to be quickly memoisable. HasTrie is such a class.

Difference between logic programming and functional programming

I have been reading many articles trying to understand the difference between functional and logic programming, but the only deduction I have been able to make so far is that logic programming defines programs through mathematical expressions. But such a thing is not associated with logic programming.
I would really appreciate some light being shed on the difference between functional and logic programming.
I wouldn't say that logic programming defines programs through mathematical expressions; that sounds more like functional programming. Logic programming uses logic expressions (well, eventually logic is math).
In my opinion, the major difference between functional and logic programming is the "building blocks": functional programming uses functions while logic programming uses predicates. A predicate is not a function; it does not have a return value. Depending on the value of it's arguments it may be true or false; if some values are undefined it will try to find the values that would make the predicate true.
Prolog in particular uses a special form of logic clauses named Horn clauses that belong to first order logic; Hilog uses clauses of higher order logic.
When you write a prolog predicate you are defining a horn clause:
foo :- bar1, bar2, bar3. means that foo is true if bar1, bar2 and bar3 is true.
note that I did not say if and only if; you can have multiple clauses for one predicate:
foo:-
bar1.
foo:-
bar2.
means that foo is true if bar1 is true or if bar2 is true
Some say that logic programming is a superset of functional programming since each function could be expressed as a predicate:
foo(x,y) -> x+y.
could be written as
foo(X, Y, ReturnValue):-
ReturnValue is X+Y.
but I think that such statements are a bit misleading
Another difference between logic and functional is backtracking. In functional programming once you enter the body of the function you cannot fail and move to the next definition. For example you can write
abs(x) ->
if x>0 x else -x
or even use guards:
abs(x) x>0 -> x;
abs(x) x=<0 -> -x.
but you cannot write
abs(x) ->
x>0,
x;
abs(x) ->
-x.
on the other hand, in Prolog you could write
abs(X, R):-
X>0,
R is X.
abs(X, R):-
R is -X.
if then you call abs(-3, R), Prolog would try the first clause, and fail when the execution reaches the -3 > 0 point but you wont get an error; Prolog will try the second clause and return R = 3.
I do not think that it is impossible for a functional language to implement something similar (but I haven't used such a language).
All in all, although both paradigms are considered declarative, they are quite different; so different that comparing them feels like comparing functional and imperative styles. I would suggest to try a bit of logic programming; it should be a mind-boggling experience. However, you should try to understand the philosophy and not simply write programs; Prolog allows you to write in functional or even imperative style (with monstrous results).
In a nutshell:
In functional programming, your program is a set of function definitions. The return value for each function is evaluated as a mathematical expression, possibly making use of passed arguments and other defined functions. For example, you can define a factorial function, which returns a factorial of a given number:
factorial 0 = 1 // a factorial of 0 is 1
factorial n = n * factorial (n - 1) // a factorial of n is n times factorial of n - 1
In logic programming, your program is a set of predicates. Predicates are usually defined as sets of clauses, where each clause can be defined using mathematical expressions, other defined predicates, and propositional calculus. For example, you can define a 'factorial' predicate, which holds whenever second argument is a factorial of first:
factorial(0, 1). // it is true that a factorial of 0 is 1
factorial(X, Y) :- // it is true that a factorial of X is Y, when all following are true:
X1 is X - 1, // there is a X1, equal to X - 1,
factorial(X1, Z), // and it is true that factorial of X1 is Z,
Y is Z * X. // and Y is Z * X
Both styles allow using mathematical expressions in the programs.
First, there are a lot of commonalities between functional and logic programming. That is, a lot of notions developed in one community can also be used in the other. Both paradigms started with rather crude implementations and strive towards purity.
But you want to know the differences.
So I will take Haskell on the one side and Prolog with constraints on the other. Practically all current Prolog systems offer constraints of some sort, like B, Ciao, ECLiPSe, GNU, IF, Scryer, SICStus, SWI, YAP, XSB. For the sake of the argument, I will use a very simple constraint dif/2 meaning inequality, which was present even in the very first Prolog implementation - so I will not use anything more advanced than that.
What functional programming is lacking
The most fundamental difference revolves around the notion of a variable. In functional programming a variable denotes a concrete value. This value must not be entirely defined, but only those parts that are defined can be used in computations. Consider in Haskell:
> let v = iterate (tail) [1..3]
> v
[[1,2,3],[2,3],[3],[],*** Exception: Prelude.tail: empty list
After the 4th element, the value is undefined. Nevertheless, you can use the first 4 elements safely:
> take 4 v
[[1,2,3],[2,3],[3],[]]
Note that the syntax in functional programs is cleverly restricted to avoid that a variable is left undefined.
In logic programming, a variable does not need to refer to a concrete value. So, if we want a list of 3 elements, we might say:
?- length(Xs,3).
Xs = [_A,_B,_C].
In this answer, the elements of the list are variables. All possible instances of these variables are valid solutions. Like Xs = [1,2,3]. Now, lets say that the first element should be different to the remaining elements:
?- length(Xs,3), Xs = [X|Ys], maplist(dif(X), Ys).
Xs = [X,_A,_B], Ys = [_A,_B], dif(X,_B), dif(X,_A).
Later on, we might demand that the elements in Xs are all equal. Before I write it out, I will try it alone:
?- maplist(=(_),Xs).
Xs = []
; Xs = [_A]
; Xs = [_A,_A]
; Xs = [_A,_A,_A]
; Xs = [_A,_A,_A,_A]
; ... .
See that the answers contain always the same variable? Now, I can combine both queries:
?- length(Xs,3), Xs = [X|Ys], maplist(dif(X), Ys), maplist(=(_),Xs).
false.
So what we have shown here is that there is no 3 element list where the first element is different to the other elements and all elements are equal.
This generality has permitted to develop several constraint languages which are offered as libraries to Prolog systems, the most prominent are CLPFD and CHR.
There is no straight forward way to get similar functionality in functional programming. You can emulate things, but the emulation isn't quite the same.
What logic programming is lacking
But there are many things that are lacking in logic programming that make functional programming so interesting. In particular:
Higher-order programming: Functional programming has here a very long tradition and has developed a rich set of idioms. For Prolog, the first proposals date back to the early 1980s, but it is still not very common. At least ISO Prolog has now the homologue to apply called call/2, call/3 ....
Lambdas: Again, it is possible to extend logic programming in that direction, the most prominent system is Lambda Prolog. More recently, lambdas have been developed also for ISO Prolog.
Type systems: There have been attempts, like Mercury, but it has not caught on that much. And there is no system with functionality comparable to type classes.
Purity: Haskell is entirely pure, a function Integer -> Integer is a function. No fine print lurking around. And still you can perform side effects. Comparable approaches are very slowly evolving.
There are many areas where functional and logic programming more or less overlap. For example backtracking and lazyness and list comprehensions, lazy evaluation and freeze/2, when/2, block. DCGs and monads. I will leave discussing these issues to others...
Logic programming and functional programming use different "metaphors" for computation. This often affects how you think about producing a solution, and sometimes means that different algorithms come naturally to a functional programmer than a logic programmer.
Both are based on mathematical foundations that provide more benefits for "pure" code; code that doesn't operate with side effects. There are languages for both paradigms that enforce purity, as well as languages that allow unconstrained side effects, but culturally the programmers for such languages tend to still value purity.
I'm going to consider append, a fairly basic operation in both logical and functional programming, for appending a list on to the end of another list.
In functional programming, we might consider append to be something like this:
append [] ys = ys
append (x:xs) ys = x : append xs ys
While in logic programming, we might consider append to be something like this:
append([], Ys, Ys).
append([X|Xs], Ys, [X|Zs]) :- append(Xs, Ys, Zs).
These implement the same algorithm, and even work basically the same way, but they "mean" something very different.
The functional append defines the list that results from appending ys onto the end of xs. We think of append as a function from two lists to another list, and the runtime system is designed to calculate the result of the function when we invoke it on two lists.
The logical append defines a relationship between three lists, which is true if the third list is the elements of the first list followed by the elements of the second list. We think of append as a predicate that is either true or false for any 3 given lists, and the runtime system is designed to find values that will make this predicate true when we invoke it with some arguments bound to specific lists and some left unbound.
The thing that makes logical append different is you can use it to compute the list that results from appending one list onto another, but you can also use it to compute the list you'd need to append onto the end of another to get a third list (or whether no such list exists), or to compute the list to which you need to append another to get a third list, or to give you two possible lists that can be appended together to get a given third (and to explore all possible ways of doing this).
While equivalent in that you can do anything you can do in one in the other, they lead to different ways of thinking about your programming task. To implement something in functional programming, you think about how to produce your result from the results of other function calls (which you may also have to implement). To implement something in logic programming, you think about what relationships between your arguments (some of which are input and some of which are output, and not necessarily the same ones from call to call) will imply the desired relationship.
Prolog, being a logical language, gives you free backtracking, it's pretty noticeable.
To elaborate, and I precise that I'm in no way expert in any of the paradigms, it looks to me like logical programming is way better when it comes to solving things. Because that's precisely what the language does (that appears clearly when backtracking is needed for example).
I think the difference is this:
imperative programming=modelling actions
function programming=modelling reasoning
logic programming =modelling knowledge
choose what fits your mind best
functional programming:
when 6PM, light on.
logic programming:
when dark, light on.

Non-Trivial Lazy Evaluation

I'm currently digesting the nice presentation Why learn Haskell? by Keegan McAllister. There he uses the snippet
minimum = head . sort
as an illustration of Haskell's lazy evaluation by stating that minimum has time-complexity O(n) in Haskell. However, I think the example is kind of academic in nature. I'm therefore asking for a more practical example where it's not trivially apparent that most of the intermediate calculations are thrown away.
Have you ever written an AI? Isn't it annoying that you have to thread pruning information (e.g. maximum depth, the minimum cost of an adjacent branch, or other such information) through the tree traversal function? This means you have to write a new tree traversal every time you want to improve your AI. That's dumb. With lazy evaluation, this is no longer a problem: write your tree traversal function once, to produce a huge (maybe even infinite!) game tree, and let your consumer decide how much of it to consume.
Writing a GUI that shows lots of information? Want it to run fast anyway? In other languages, you might have to write code that renders only the visible scenes. In Haskell, you can write code that renders the whole scene, and then later choose which pixels to observe. Similarly, rendering a complicated scene? Why not compute an infinite sequence of scenes at various detail levels, and pick the most appropriate one as the program runs?
You write an expensive function, and decide to memoize it for speed. In other languages, this requires building a data structure that tracks which inputs for the function you know the answer to, and updating the structure as you see new inputs. Remember to make it thread safe -- if we really need speed, we need parallelism, too! In Haskell, you build an infinite data structure, with an entry for each possible input, and evaluate the parts of the data structure that correspond to the inputs you care about. Thread safety comes for free with purity.
Here's one that's perhaps a bit more prosaic than the previous ones. Have you ever found a time when && and || weren't the only things you wanted to be short-circuiting? I sure have! For example, I love the <|> function for combining Maybe values: it takes the first one of its arguments that actually has a value. So Just 3 <|> Nothing = Just 3; Nothing <|> Just 7 = Just 7; and Nothing <|> Nothing = Nothing. Moreover, it's short-circuiting: if it turns out that its first argument is a Just, it won't bother doing the computation required to figure out what its second argument is.
And <|> isn't built in to the language; it's tacked on by a library. That is: laziness allows you to write brand new short-circuiting forms. (Indeed, in Haskell, even the short-circuiting behavior of (&&) and (||) aren't built-in compiler magic: they arise naturally from the semantics of the language plus their definitions in the standard libraries.)
In general, the common theme here is that you can separate the production of values from the determination of which values are interesting to look at. This makes things more composable, because the choice of what is interesting to look at need not be known by the producer.
Here's a well-known example I posted to another thread yesterday. Hamming numbers are numbers that don't have any prime factors larger than 5. I.e. they have the form 2^i*3^j*5^k. The first 20 of them are:
[1,2,3,4,5,6,8,9,10,12,15,16,18,20,24,25,27,30,32,36]
The 500000th one is:
1962938367679548095642112423564462631020433036610484123229980468750
The program that printed the 500000th one (after a brief moment of computation) is:
merge xxs#(x:xs) yys#(y:ys) =
case (x`compare`y) of
LT -> x:merge xs yys
EQ -> x:merge xs ys
GT -> y:merge xxs ys
hamming = 1 : m 2 `merge` m 3 `merge` m 5
where
m k = map (k *) hamming
main = print (hamming !! 499999)
Computing that number with reasonable speed in a non-lazy language takes quite a bit more code and head-scratching. There are a lot of examples here
Consider generating and consuming the first n elements of an infinite sequence. Without lazy evaluation, the naive encoding would run forever in the generation step, and never consume anything. With lazy evaluation, only as many elements are generated as the code tries to consume.

Explain concatenative languages to me like I'm an 8-year-old

I've read the Wikipedia article on concatenative languages, and I am now more confused than I was when I started. :-)
What is a concatenative language in stupid people terms?
In normal programming languages, you have variables which can be defined freely and you call methods using these variables as arguments. These are simple to understand but somewhat limited. Often, it is hard to reuse an existing method because you simply can't map the existing variables into the parameters the method needs or the method A calls another method B and A would be perfect for you if you could only replace the call to B with a call to C.
Concatenative language use a fixed data structure to save values (usually a stack or a list). There are no variables. This means that many methods and functions have the same "API": They work on something which someone else left on the stack. Plus code itself is thought to be "data", i.e. it is common to write code which can modify itself or which accepts other code as a "parameter" (i.e. as an element on the stack).
These attributes make this languages perfect for chaining existing code to create something new. Reuse is built in. You can write a function which accepts a list and a piece of code and calls the code for each item in the list. This will now work on any kind of data as long it's behaves like a list: results from a database, a row of pixels from an image, characters in a string, etc.
The biggest problem is that you have no hint what's going on. There are only a couple of data types (list, string, number), so everything gets mapped to that. When you get a piece of data, you usually don't care what it is or where it comes from. But that makes it hard to follow data through the code to see what is happening to it.
I believe it takes a certain set of mind to use the languages successfully. They are not for everyone.
[EDIT] Forth has some penetration but not that much. You can find PostScript in any modern laser printer. So they are niche languages.
From a functional level, they are at par with LISP, C-like languages and SQL: All of them are Turing Complete, so you can compute anything. It's just a matter of how much code you have to write. Some things are more simple in LISP, some are more simple in C, some are more simple in query languages. The question which is "better" is futile unless you have a context.
First I'm going to make a rebuttal to Norman Ramsey's assertion that there is no theory.
Theory of Concatenative Languages
A concatenative language is a functional programming language, where the default operation (what happens when two terms are side by side) is function composition instead of function application. It is as simple as that.
So for example in the SKI Combinator Calculus (one of the simplest functional languages) two terms side by side are equivalent to applying the first term to the second term. For example: S K K is equivalent to S(K)(K).
In a concatenative language S K K would be equivalent to S . K . K in Haskell.
So what's the big deal
A pure concatenative language has the interesting property that the order of evaluation of terms does not matter. In a concatenative language (S K) K is the same as S (K K). This does not apply to the SKI Calculus or any other functional programming language based on function application.
One reason this observation is interesting because it reveals opportunities for parallelization in the evaluation of code expressed in terms of function composition instead of application.
Now for the real world
The semantics of stack-based languages which support higher-order functions can be explained using a concatenative calculus. You simply map each term (command/expression/sub-program) to be a function that takes a function as input and returns a function as output. The entire program is effectively a single stack transformation function.
The reality is that things are always distorted in the real world (e.g. FORTH has a global dictionary, PostScript does weird things where the evaluation order matters). Most practical programming languages don't adhere perfectly to a theoretical model.
Final Words
I don't think a typical programmer or 8 year old should ever worry about what a concatenative language is. I also don't find it particularly useful to pigeon-hole programming languages as being type X or type Y.
After reading http://concatenative.org/wiki/view/Concatenative%20language and drawing on what little I remember of fiddling around with Forth as a teenager, I believe that the key thing about concatenative programming has to do with:
viewing data in terms of values on a specific data stack
and functions manipulating stuff in terms of popping/pushing values on the same the data stack
Check out these quotes from the above webpage:
There are two terms that get thrown
around, stack language and
concatenative language. Both define
similar but not equal classes of
languages. For the most part though,
they are identical.
Most languages in widespread use today
are applicative languages: the central
construct in the language is some form
of function call, where a function is
applied to a set of parameters, where
each parameter is itself the result of
a function call, the name of a
variable, or a constant. In stack
languages, a function call is made by
simply writing the name of the
function; the parameters are implicit,
and they have to already be on the
stack when the call is made. The
result of the function call (if any)
is then left on the stack after the
function returns, for the next
function to consume, and so on.
Because functions are invoked simply
by mentioning their name without any
additional syntax, Forth and Factor
refer to functions as "words", because
in the syntax they really are just
words.
This is in contrast to applicative languages that apply their functions directly to specific variables.
Example: adding two numbers.
Applicative language:
int foo(int a, int b)
{
return a + b;
}
var c = 4;
var d = 3;
var g = foo(c,d);
Concatenative language (I made it up, supposed to be similar to Forth... ;) )
push 4
push 3
+
pop
While I don't think concatenative language = stack language, as the authors point out above, it seems similar.
I reckon the main idea is 1. We can create new programs simply by joining other programs together.
Also, 2. Any random chunk of the program is a valid function (or sub-program).
Good old pure RPN Forth has those properties, excluding any random non-RPN syntax.
In the program 1 2 + 3 *, the sub-program + 3 * takes 2 args, and gives 1 result. The sub-program 2 takes 0 args and returns 1 result. Any chunk is a function, and that is nice!
You can create new functions by lumping two or more others together, optionally with a little glue. It will work best if the types match!
These ideas are really good, we value simplicity.
It is not limited to RPN Forth-style serial language, nor imperative or functional programming. The two ideas also work for a graphical language, where program units might be for example functions, procedures, relations, or processes.
In a network of communicating processes, every sub-network can act like a process.
In a graph of mathematical relations, every sub-graph is a valid relation.
These structures are 'concatenative', we can break them apart in any way (draw circles), and join them together in many ways (draw lines).
Well, that's how I see it. I'm sure I've missed many other good ideas from the concatenative camp. While I'm keen on graphical programming, I'm new to this focus on concatenation.
My pragmatic (and subjective) definition for concatenative programming (now, you can avoid read the rest of it):
-> function composition in extreme ways (with Reverse Polish notation (RPN) syntax):
( Forth code )
: fib
dup 2 <= if
drop 1
else
dup 1 - recurse
swap 2 - recurse +
then ;
-> everything is a function, or at least, can be a function:
( Forth code )
: 1 1 ; \ define a function 1 to push the literal number 1 on stack
-> arguments are passed implicitly over functions (ok, it seems to be a definition for tacit-programming), but, this in Forth:
a b c
may be in Lisp:
(c a b)
(c (b a))
(c (b (a)))
so, it's easy to generate ambiguous code...
you can write definitions that push the xt (execution token) on stack and define a small alias for 'execute':
( Forth code )
: <- execute ; \ apply function
so, you'll get:
a b c <- \ Lisp: (c a b)
a b <- c <- \ Lisp: (c (b a))
a <- b <- c <- \ Lisp: (c (b (a)))
To your simple question, here's a subjective and argumentative answer.
I looked at the article and several related web pages. The web pages say themselves that there isn't a real theory, so it's no wonder that people are having a hard time coming up with a precise and understandable definition. I would say that at present, it is not useful to classify languages as "concatenative" or "not concatenative".
To me it looks like a term that gives Manfred von Thun a place to hang his hat but may not be useful for other programmers.
While PostScript and Forth are worth studying, I don't see anything terribly new or interesting in Manfred von Thun's Joy programming language. Indeed, if you read Chris Okasaki's paper on Techniques for Embedding Postfix Languages in Haskell you can try out all this stuff in a setting that, relative to Joy, is totally mainstream.
So my answer is there's no simple explanation because there's no mature theory underlying the idea of a concatenative language. (As Einstein and Feynman said, if you can't explain your idea to a college freshman, you don't really understand it.) I'll go further and say although studying some of these languages, like Forth and PostScript, is an excellent use of time, trying to figure out exactly what people mean when they say "concatenative" is probably a waste of your time.
You can't explain a language, just get one (Factor, preferably) and try some tutorials on it. Tutorials are better than Stack Overflow answers.

Resources