Pros / Cons of Tacit Programming in J - implicit

As a beginner in J I am often confronted with tacit programs which seem quite byzantine compared to the more familiar explicit form.
Now just because I find interpretation hard does not mean that the tacit form is incorrect or wrong. Very often the tacit form is considerably shorter than the explicit form, and thus easier to visually see all at once.
Question to the experts : Do these tacit forms convey a better sense of structure, and maybe distil out the underlying computational mechanisms ? Are there other benefits ?
I'm hoping the answer is yes, and true for some non-trivial examples...

Tacit programming is usually faster and more efficient, because you can tell J exactly what you want to do, instead of making it find out as it goes along your sentence. But as someone loving the hell out of tacit programming, I can also say that tacit programming encourages you to think about things in the J way.
To spoil the ending and answer your question: yes, tacit programming can and does convey information about structure. Technically, it emphasizes meaning above all else, but many of the operators that feature prominently in the less-trivial expressions you'll encounter (#: & &. ^: to name a few) have very structure-related meanings.
The canonical example of why it pays to write tacit code is the special code for modular exponentiation, along with the assurance that there are many more shortcuts like it:
ts =: 6!:2, 7!:2#] NB. time and space
100 ts '2 (1e6&| # ^) 8888x'
2.3356e_5 16640
100 ts '1e6 | 2 ^ 8888x'
0.00787232 8.496e6
The other major thing you'll hear said is that when J sees an explicit definition, it has to parse and eval it every single time it applies it:
NB. use rank 0 to apply the verb a large number of times
100 ts 'i (4 : ''x + y + x * y'')"0 i=.i.100 100' NB. naive
0.0136254 404096
100 ts 'i (+ + *)"0 i=.i.100 100' NB. tacit
0.00271868 265728
NB. J is spending the time difference reinterpreting the definition each time
100 ts 'i (4 : ''x (+ + *) y'')"0 i=.i.100 100'
0.0136336 273024
But both of these reasons take a backseat to the idea that J has a very distinct style of solving problems. There is no if, there is ^:. There is no looping, there is rank. Likewise, Ken saw beauty in the fact that in calculus, f+g was the pointwise sum of functions—indeed, one defines f+g to be the function where (f+g)(x) = f(x) + g(x)—and since J was already so good at pointwise array addition, why stop there?
Just as a language like Haskell revels in the pleasure of combining higher-order functions together instead of "manually" syncing them up end to end, so does J. Semantically, take a look at the following examples:
h =: 3 : '(f y) + g y' – h is a function that grabs its argument y, plugs it into f and g, and funnels the results into a sum.
h =: f + g – h is the sum of the functions f and g.
(A < B) +. (A = B) – "A is less than B or A is equal to B."
A (< +. =) B – "A is less than or equal to B."
It's a lot more algebraic. And I've only talked about trains thus far; there's a lot to be said about the handiness of tools like ^: or &.. The lesson is fairly clear, though: J wants it to be easy to talk about your functions algebraically. If you had to wrap all your actions in a 3 :'' or 4 :''—or worse, name them on a separate line!—every time you wanted to apply them interestingly (like via / or ^: or ;.) you'd probably be very turned off from J.
Sure, I admit you will be hard-pressed to find examples as elegant as these as your expressions get more complex. The tacit style just takes some getting used to. The vocab has to be familiar (if not second nature) to you, and even then sometimes you have the pleasure of slogging through code that is simply inexcusable. This can happen with any language.

Not an expert, but the biggest positive aspects of coding in tacit for me are 1) that it makes it a little easier to write programs that write programs and 2) it is a little easier for me to grasp the J way of approaching problems (which is a big part of why like to program with J). Explicit feels more like procedural programming, especially if I am using control words such as if., while. or select. .
The challenges are that 1) explicit code sometimes runs faster than tacit, but this is dependent on the task and the algorithm and 2) tacit code is interpreted as it is parsed and this means that there are times when explicit code is cleaner because you can leave the code waiting for variable values that are only defined at run time.

Related

what's the meaning of "you do computations in Haskell by declaring what something is instead of declaring how you get it"?

Recently I am trying to learn a functional programming language and I choosed Haskell.
Now I am reading learn you a haskell and here is a description seems like Haskell's philosophy I am not sure I understand it exactly: you do computations in Haskell by declaring what something is instead of declaring how you get it.
Suppose I want to get the sum of a list.
In a declaring how you get it way:
get the total sum by add all the elements, so the code will be like this(not haskell, python):
sum = 0
for i in l:
sum += i
print sum
In a what something is way:
the total sum is the sum of the first element and the sum of the rest elements, so the code will be like this:
sum' :: (Num a) => [a] -> a
sum' [] = 0
sum' (x:xs) = x + sum' xs
But I am not sure I get it or not. Can some one help? Thanks.
Imperative and functional are two different ways to approach problem solving.
Imperative (Python) gives you actions which you need to use to get what you want. For example, you may tell the computer "knead the dough. Then put it in the oven. Turn the oven on. Bake for 10 minutes.".
Functional (Haskell, Clojure) gives you solutions. You'd be more likely to tell the computer "I have flour, eggs, and water. I need bread". The computer happens to know dough, but it doesn't know bread, so you tell it "bread is dough that has been baked". The computer, knowing what baking is, knows now how to make bread. You sit at the table for 10 minutes while the computer does the work for you. Then you enjoy delicious bread fresh from the oven.
You can see a similar difference in how engineers and mathematicians work. The engineer is imperative, looking at the problem and giving workers a blueprint to solve it. The mathematician defines the problem (solve for x) and the solution (x = -----) and may use any number of tried and true solutions to smaller problems (2x - 1 = ----- => 2x = ----- + 1) until he finally finds the desired solution.
It is not a coincidence that functional languages are used largely by people in universities, not because it is difficult to learn, but because there are not many mathematical thinkers outside of universities. In your quotation, they tried to define this difference in thought process by cleverly using how and what. I personally believe that everybody understands words by turning them into things they already understand, so I'd imagine my bread metaphor should clarify the difference for you.
EDIT: It is worth noting that when you imperatively command the computer, you don't know if you'll have bread at the end (maybe you cooked it too long and it's burnt, or you didn't add enough flour). This is not a problem in functional languages where you know exactly what each solution gives you. There is no need for trial and error in a functional language because everything you do will be correct (though not always useful, like accidentally solving for t instead of x).
The missing part of the explanations is the following.
The imperative example shows you step by step how to compute the sum. At no stage you can convince yourself that it is indeed a sum of elements of a list. For example, there is no knowing why sum=0 at first; should it be 0 at all; do you loop through the right indices; what sum+=i gives you.
sum=0 -- why? it may become clear if you consider what happens in the loop,
-- but not on its own
for i in l:
sum += i -- what do we get? it will become clear only after the loop ends
-- at no step of iteration you have *the sum of the list*
-- so the step on its own is not meaningful
The declarative example is very different in this respect. In this particular case you start with declaring that the sum of an empty list is 0. This is already part of the answer of what the sum is. Then you add a statement about non-empty lists - a sum for a non-empty list is the sum of the tail with the head element added to it. This is the declaration of what the sum is. You can demonstrate inductively that it finds the solution for any list.
Note this proof part. In this case it is obvious. In more complex algorithms it is not obvious, so the proof of correctness is a substantial part - and remember that the imperative case only makes sense as a whole.
Another way to compute sum, where, hopefully, declarativeness and proovability become clearer:
sum [] = 0 -- the sum of the empty list is 0
sum [x] = x -- the sum of the list with 1 element is that element
sum xs = sum $ p xs where -- the sum of any other list is
-- the sum of the list reduced with p
p (x:y:xs) = x+y : p xs -- p reduces the list by replacing a pair of elements
-- with their sum
p xs = xs -- if there was no pair of elements, leave the list as is
Here we can convince ourselves that: 1. p makes the list ever shorter, so the computation of the sum will terminate; 2. p produces a list of sums, so by summing ever shorter lists we get a list of just one element; 3. because (+) is associative, the value produced by repeatedly applying p is the same as the sum of all elements in the original list; 4. we can demonstrate the number of applications of (+) is smaller than in the straightforward implementation.
In other words, the order of adding the elements doesn't matter, so we can sum the elements ([a,b,c,d,e]) in pairs first (a+b, c+d), which gives us a shorter list [a+b,c+d,e], whose sum is the same as the sum of the original list, and which now can be reduced in the same way: [(a+b)+(c+d),e], then [((a+b)+(c+d))+e].
Robert Harper claims in his blog that "declarative" has no meaning. I suppose
he is talking about a clear definition there, which I usually think of as more narrow
then meaning, but the post is still worth checking out and hints that you might not
find as clear an answer as you would wish.
Still, everybody talks about "declarative" and it feels like when we do we usually
talk about the same thing. i.e. Give a number of people two different apis/languages/programs
and ask them which is the most declarative one and they will usually pick the same.
The confusing part to me at first was that your declarative sum
sum' [] = 0
sum' (x:xs) = x + sum' xs
can also be seen as an instruction on how to get the result. It's just a different one.
It's also worth noting that the function sum in the prelude isn't actually defined like that
since that particular way of calculating the sum is inefficient. So clearly something is
fishy.
So, the "what, not how" explanation seem unsatisfactory to me. I think of it instead as
declarative being a "how" which in addition have some nice properties. My current intuition
about what those properties are is something similar to:
A thing is more declarative if it doesn't mutate any state.
A thing is more declarative if you can do mathy transformations on it and the meaning of
the thing sort of remains intact. So given your declarative sum again, if we knew that
+ is commutative there is some justification for thinking that writing it like
sum' xs + x should yield the same result.
A declarative thing can be decomposed into smaller thing and still have some meaning. Like
x and sum' xs still have the same meaning when taken separately, but trying to do the
same with the sum += x of python doesn't work as well.
A thing is more declarative if it's independent of the flow of time. For example css
doesn't describe the styling of a web page at page load. It describes the styling of the
web page at any time, even if the page would change.
A thing is more declarative if you don't have to think about program flow.
Other people might have different intuitions, or even a definition that I'm not aware of,
but hopefully these are somewhat helpful regardless.

Can compilers deduce/prove mathematically?

I'm starting to learn functional programming language like Haskell, ML and most of the exercises will show off things like:
foldr (+) 0 [ 1 ..10]
which is equivalent to
sum = 0
for( i in [1..10] )
sum += i
So that leads me to think why can't compiler know that this is Arithmetic Progression and use O(1) formula to calculate?
Especially for pure FP languages without side effect?
The same applies for
sum reverse list == sum list
Given a + b = b + a
and definition of reverse, can compilers/languages prove it automatically?
Compilers generally don't try to prove this kind of thing automatically, because it's hard to implement.
As well as adding the logic to the compiler to transform one fragment of code into another, you have to be very careful that it only tries to do it when it's actually safe - i.e. there are often lots of "side conditions" to worry about. For example in your example above, someone might have written an instance of the type class Num (and hence the (+) operator) where the a + b is not b + a.
However, GHC does have rewrite rules which you can add to your own source code and could be used to cover some relatively simple cases like the ones you list above, particularly if you're not too bothered about the side conditions.
For example, and I haven't tested this, you might use the following rule for one of your examples above:
{-# RULES
"sum/reverse" forall list . sum (reverse list) = sum list
#-}
Note the parentheses around reverse list - what you've written in your question actually means (sum reverse) list and wouldn't typecheck.
EDIT:
As you're looking for official sources and pointers to research, I've listed a few.
Obviously it's hard to prove a negative but the fact that no-one has given an example of a general-purpose compiler that does this kind of thing routinely is probably quite strong evidence in itself.
As others have pointed out, even simple arithmetic optimisations are surprisingly dangerous, particularly on floating point numbers, and compilers generally have flags to turn them off - for example Visual C++, gcc. Even integer arithmetic isn't always clear-cut and people occasionally have big arguments about how to deal with things like overflow.
As Joachim noted, integer variables in loops are one place where slightly more sophisticated optimisations are applied because there are actually significant wins to be had. Muchnick's book is probably the best general source on the topic but it's not that cheap. The wikipedia page on strength reduction is probably as good an introduction as any to one of the standard optimisations of this kind, and has some references to the relevant literature.
FFTW is an example of a library that does all kinds of mathematical optimization internally. Some of its code is generated by a customised compiler the authors wrote specifically for the purpose. It's worthwhile because the authors have domain-specific knowledge of optimizations that in the specific context of the library are both worth the effort and safe
People sometimes use template metaprogramming to write "self-optimising libraries" that again might rely on arithmetic identities, see for example Blitz++. Todd Veldhuizen's PhD dissertation has a good overview.
If you descend into the realms of toy and academic compilers all sorts of things go. For example my own PhD dissertation is about writing inefficient functional programs along with little scripts that explain how to optimise them. Many of the examples (see Chapter 6) rely on applying arithmetic rules to justify the underlying optimisations.
Also, it's worth emphasising that the last few examples are of specialised optimisations being applied only to certain parts of the code (e.g. calls to specific libraries) where it is expected to be worthwhile. As other answers have pointed out, it's simply too expensive for a compiler to go searching for all possible places in an entire program where an optimisation might apply. The GHC rewrite rules that I mentioned above are a great example of a compiler exposing a generic mechanism for individual libraries to use in a way that's most appropriate for them.
The answer
No, compilers don’t do that kind of stuff.
One reason why
And for your examples, it would even be wrong: Since you did not give type annotations, the Haskell compiler will infer the most general type, which would be
foldr (+) 0 [ 1 ..10] :: Num a => a
and similar
(\list -> sum (reverse list)) :: Num a => [a] -> a
and the Num instance for the type that is being used might well not fulfil the mathematical laws required for the transformation you suggest. The compiler should, before everything else, avoid to change the meaning (i.e. the semantics) of your program.
More pragmatically: The cases where the compiler could detect such large-scale transformations rarely occur in practice, so it would not be worth it to implement them.
An exception
Note notable exceptions are linear transformations in loops. Most compilers will rewrite
for (int i = 0; i < n; i++) {
... 200 + 4 * i ...
}
to
for (int i = 0, j = 200; i < n; i++, j += 4) {
... j ...
}
or something similar, as that pattern does often occur in code working on array.
The optimizations you have in mind will probably not be done even in the presence of monomorphic types, because there are so many possibilities and so much knowledge required. For example, in this example:
sum list == sum (reverse list)
The compiler would need to know or take into account the following facts:
sum = foldl (+) 0
(+) is commutative
reverse list is a permutation of list
foldl x c l, where x is commutative and c is a constant, yields the same result for all permutations of l.
This all seems trivial. Sure, the compiler can most probably look up the definition of sumand inline it. It could be required that (+) be commutative, but remember that +is just another symbol without attached meaning to the compiler. The third point would require the compiler to prove some non trivial properties about reverse.
But the point is:
You don't want to perform the compiler to do those calculations with each and every expression. Remember, to make this really useful, you'd have to heap up a lot of knowledge about many, many standard functions and operators.
You still can't replace the expression above with True unless you can rule out the possibility that list or some list element is bottom. Usually, one cannot do this. You can't even do the following "trivial" optimization of f x == f x in all cases
f x `seq` True
For, consider
f x = (undefined :: Bool, x)
then
f x `seq` True ==> True
f x == f x ==> undefined
That being said, regarding your first example slightly modified for monomorphism:
f n = n * foldl (+) 0 [1..10] :: Int
it is imaginable to optimize the program by moving the expression out of its context and replace it with the name of a constant, like so:
const1 = foldl (+) 0 [1..10] :: Int
f n = n * const1
This is because the compiler can see that the expression must be constant.
What you're describing looks like super-compilation. In your case, if the expression had a monomorphic type like Int (as opposed to polymorphic Num a => a), the compiler could infer that the expression foldr (+) 0 [1 ..10] has no external dependencies, therefore it could be evaluated at compile time and replaced by 55. However, AFAIK no mainstream compiler currently does this kind of optimization.
(In functional programming "proving" is usually associated with something different. In languages with dependent types types are powerful enough to express complex proposition and then through the Curry-Howard correspondence programs become proofs of such propositions.)
As others have noted, it's unclear that your simplifications even hold in Haskell. For instance, I can define
newtype NInt = N Int
instance Num NInt where
N a + _ = N a
N b * _ = N b
... -- etc
and now sum . reverse :: Num [a] -> a does not equal sum :: Num [a] -> a since I can specialize each to [NInt] -> NInt where sum . reverse == sum clearly does not hold.
This is one general tension that exists around optimizing "complex" operations—you actually need quite a lot of information in order to successfully prove that it's okay to optimize something. This is why the syntax-level compiler optimization which do exist are usually monomorphic and related to the structure of programs---it's usually such a simplified domain that there's "no way" for the optimization to go wrong. Even that is often unsafe because the domain is never quite so simplified and well-known to the compiler.
As an example, a very popular "high-level" syntactic optimization is stream fusion. In this case the compiler is given enough information to know that stream fusion can occur and is basically safe, but even in this canonical example we have to skirt around notions of non-termination.
So what does it take to have \x -> sum [0..x] get replaced by \x -> x*(x + 1)/2? The compiler would need a theory of numbers and algebra built-in. This is not possible in Haskell or ML, but becomes possible in dependently typed languages like Coq, Agda, or Idris. There you could specify things like
revCommute :: (_+_ :: a -> a -> a)
-> Commutative _+_
-> foldr _+_ z (reverse as) == foldr _+_ z as
and then, theoretically, tell the compiler to rewrite according to revCommute. This would still be difficult and finicky, but at least we'd have enough information around. To be clear, I'm writing something very strange above, a dependent type. The type not only depends on the ability to introduce both a type and a name for the argument inline, but also the existence of the entire syntax of your language "at the type level".
There are a lot of differences between what I just wrote and what you'd do in Haskell, though. First, in order to form a basis where such promises can be taken seriously, we must throw away general recursion (and thus we already don't have to worry about questions of non-termination like stream-fusion does). We also must have enough structure around to create something like the promise Commutative _+_---this likely depends upon there being an entire theory of operators and mathematics built into the language's standard library else you would need to create that yourself. Finally, the richness of type system required to even express these kinds of theories adds a lot of complexity to the entire system and tosses out type inference as you know it today.
But, given all that structure, I'd never be able to create an obligation Commutative _+_ for the _+_ defined to work on NInts and so we could be certain that foldr (+) 0 . reverse == foldr (+) 0 actually does hold.
But now we'd need to tell the compiler how to actually perform that optimization. For stream-fusion, the compiler rules only kick in when we write something in exactly the right syntactic form to be "clearly" an optimization redex. The same kinds of restrictions would apply to our sum . reverse rule. In fact, already we're sunk because
foldr (+) 0 . reverse
foldr (+) 0 (reverse as)
don't match. They're "obviously" the same due to some rules we could prove about (.), but that means that now the compiler must invoke two built-in rules in order to perform our optimization.
At the end of the day, you need a very smart optimization search over the sets of known laws in order to achieve the kinds of automatic optimizations you're talking about.
So not only do we add a lot of complexity to the entire system, require a lot of base work to build-in some useful algebraic theories, and lose Turing completeness (which might not be the worst thing), we also only get a finicky promise that our rule would even fire unless we perform an exponentially painful search during compilation.
Blech.
The compromise that exists today tends to be that sometimes we have enough control over what's being written to be mostly certain that a certain obvious optimization can be performed. This is the regime of stream fusion and it requires a lot of hidden types, carefully written proofs, exploitations of parametricity, and hand-waving before it's something the community trusts enough to run on their code.
And it doesn't even always fire. For an example of battling that problem take a look at the source of Vector for all of the RULES pragmas that specify all of the common circumstances where Vector's stream-fusion optimizations should kick in.
All of this is not at all a critique of compiler optimizations or dependent type theories. Both are really incredible. Instead it's just an amplification of the tradeoffs involved in introducing such an optimization. It's not to be done lightly.
Fun fact: Given two arbitrary formulas, do they both give the same output for the same inputs? The answer to this trivial question is not computable! In other words, it is mathematically impossible to write a computer program that always gives the correct answer in finite time.
Given this fact, it's perhaps not surprising that nobody has a compiler that can magically transform every possible computation into its most efficient form.
Also, isn't this the programmer's job? If you want the sum of an arithmetic sequence commonly enough that it's a performance bottleneck, why not just write some more efficient code yourself? Similarly, if you really want Fibonacci numbers (why?), use the O(1) algorithm.

Can all iterative algorithms be expressed recursively?

Can all iterative algorithms be expressed recursively?
If not, is there a good counter example that shows an iterative algorithm for which there exists no recursive counterpart?
If it is the case that all iterative algorithms can be expressed recursively, are there cases in which this is more difficult to do?
Also, what role does the programming language play in all this? I can imagine that Scheme programmers have a different take on iteration (= tail-recursion) and stack usage than Java-only programmers.
There's a simple ad hoc proof for this. Since you can build a Turing complete language using strictly iterative structures and a Turing complete language using only recursive structures, then the two are therefore equivalent.
Can all iterative algorithms be expressed recursively?
Yes, but the proof is not interesting:
Transform the program with all its control flow into a single loop containing a single case statement in which each branch is straight-line control flow possibly including break, return, exit, raise, and so on. Introduce a new variable (call it the "program counter") which the case statement uses to decide which block to execute next.
This construction was discovered during the great "structured-programming wars" of the 1960s when people were arguing the relative expressive power of various control-flow constructs.
Replace the loop with a recursive function, and replace every mutable local variable with a parameter to that function. Voilà! Iteration replaced by recursion.
This procedure amounts to writing an interpreter for the original function. As you may imagine, it results in unreadable code, and it is not an interesting thing to do. However, some of the techniques can be useful for a person with background in imperative programming who is learning to program in a functional language for the first time.
Like you say, every iterative approach can be turned into a "recursive" one, and with tail calls, the stack will not explode either. :-) In fact, that's actually how Scheme implements all common forms of looping. Example in Scheme:
(define (fib n)
(do ((x 0 y)
(y 1 (+ x y))
(i 1 (+ i 1)))
((> i n) x)))
Here, although the function looks iterative, it actually recurses on an internal lambda that takes three parameters, x, y, and i, and calling itself with new values at each iteration.
Here's one way that function could be macro-expanded:
(define (fib n)
(letrec ((inner (lambda (x y i)
(if (> i n) x
(inner y (+ x y) (+ i 1))))))
(inner 0 1 1)))
This way, the recursive nature becomes more visually apparent.
Defining iterative as:
function q(vars):
while X:
do Y
Can be translated as:
function q(vars):
if X:
do Y
call q(vars)
Y in most cases would include incrementing a counter that is tested by X. This variable will have to be passed along in 'vars' in some way when going the recursive route.
As noted by plinth in the their answer we can construct proofs showing how recursion and iteration are equivalent and can both be used to solve the same problem; however, even though we know the two are equivalent there are drawbacks to use one over the other.
In languages that are not optimized for recursion you may find that an algorithm using iteration preforms faster than the recursive one and likewise, even in optimized languages you may find that an algorithm using iteration written in a different language runs faster than the recursive one. Furthermore, there may not be an obvious way of written a given algorithm using recursion versus iteration and vice versa. This can lead to code that is difficult to read which leads to maintainability issues.
Prolog is recursive only language and you can do pretty much everything in it (I don't suggest you do, but you can :))
Recursive Solutions are usually relatively inefficient when compared to iterative solutions.
However, it is noted that there are some problems that can be solved only through recursion and equivalent iterative solution may not exist or extremely complex to program easily (Example of such is The Ackermann function cannot be expressed without recursion)Though recursions are elegant,easy to write and understand.

Pattern matching identical values

I just wondered whether it's possible to match against the same values for multiple times with the pattern matching facilities of functional programming languages (Haskell/F#/Caml).
Just think of the following example:
plus a a = 2 * a
plus a b = a + b
The first variant would be called when the function is invoked with two similar values (which would be stored in a).
A more useful application would be this (simplifying an AST).
simplify (Add a a) = Mult 2 a
But Haskell rejects these codes and warns me of conflicting definitions for a - I have to do explicit case/if-checks instead to find out whether the function got identical values. Is there any trick to indicate that a variable I want to match against will occur multiple times?
This is called a nonlinear pattern. There have been several threads on the haskell-cafe mailing list about this, not long ago. Here are two:
http://www.mail-archive.com/haskell-cafe#haskell.org/msg59617.html
http://www.mail-archive.com/haskell-cafe#haskell.org/msg62491.html
Bottom line: it's not impossible to implement, but was decided against for sake of simplicity.
By the way, you do not need if or case to work around this; the (slightly) cleaner way is to use a guard:
a `plus` b
| a == b = 2*a
| otherwise = a+b
You can't have two parameters with the same name to indicate that they should be equal, but you can use guards to distinguish cases like this:
plus a b
| a == b = 2 * a
| otherwise = a + b
This is more flexible since it also works for more complicated conditions than simple equality.
I have just looked up the mailing list threads given in Thomas's answer, and the very first reply in one of them makes good sense, and explains why such a "pattern" would not make much sense in general: what if a is a function? (It is impossible in general to check it two functions are equal.)
I have implemented a new functional programming language that can handle non-linear patterns in Haskell.
https://github.com/egison/egison
In my language, your plus function in written as follow.
(define $plus
(match-lambda [integer integer]
{[[$a ,a] (* a 2)]
[[$a $b] (+ a b)]}))

Explain concatenative languages to me like I'm an 8-year-old

I've read the Wikipedia article on concatenative languages, and I am now more confused than I was when I started. :-)
What is a concatenative language in stupid people terms?
In normal programming languages, you have variables which can be defined freely and you call methods using these variables as arguments. These are simple to understand but somewhat limited. Often, it is hard to reuse an existing method because you simply can't map the existing variables into the parameters the method needs or the method A calls another method B and A would be perfect for you if you could only replace the call to B with a call to C.
Concatenative language use a fixed data structure to save values (usually a stack or a list). There are no variables. This means that many methods and functions have the same "API": They work on something which someone else left on the stack. Plus code itself is thought to be "data", i.e. it is common to write code which can modify itself or which accepts other code as a "parameter" (i.e. as an element on the stack).
These attributes make this languages perfect for chaining existing code to create something new. Reuse is built in. You can write a function which accepts a list and a piece of code and calls the code for each item in the list. This will now work on any kind of data as long it's behaves like a list: results from a database, a row of pixels from an image, characters in a string, etc.
The biggest problem is that you have no hint what's going on. There are only a couple of data types (list, string, number), so everything gets mapped to that. When you get a piece of data, you usually don't care what it is or where it comes from. But that makes it hard to follow data through the code to see what is happening to it.
I believe it takes a certain set of mind to use the languages successfully. They are not for everyone.
[EDIT] Forth has some penetration but not that much. You can find PostScript in any modern laser printer. So they are niche languages.
From a functional level, they are at par with LISP, C-like languages and SQL: All of them are Turing Complete, so you can compute anything. It's just a matter of how much code you have to write. Some things are more simple in LISP, some are more simple in C, some are more simple in query languages. The question which is "better" is futile unless you have a context.
First I'm going to make a rebuttal to Norman Ramsey's assertion that there is no theory.
Theory of Concatenative Languages
A concatenative language is a functional programming language, where the default operation (what happens when two terms are side by side) is function composition instead of function application. It is as simple as that.
So for example in the SKI Combinator Calculus (one of the simplest functional languages) two terms side by side are equivalent to applying the first term to the second term. For example: S K K is equivalent to S(K)(K).
In a concatenative language S K K would be equivalent to S . K . K in Haskell.
So what's the big deal
A pure concatenative language has the interesting property that the order of evaluation of terms does not matter. In a concatenative language (S K) K is the same as S (K K). This does not apply to the SKI Calculus or any other functional programming language based on function application.
One reason this observation is interesting because it reveals opportunities for parallelization in the evaluation of code expressed in terms of function composition instead of application.
Now for the real world
The semantics of stack-based languages which support higher-order functions can be explained using a concatenative calculus. You simply map each term (command/expression/sub-program) to be a function that takes a function as input and returns a function as output. The entire program is effectively a single stack transformation function.
The reality is that things are always distorted in the real world (e.g. FORTH has a global dictionary, PostScript does weird things where the evaluation order matters). Most practical programming languages don't adhere perfectly to a theoretical model.
Final Words
I don't think a typical programmer or 8 year old should ever worry about what a concatenative language is. I also don't find it particularly useful to pigeon-hole programming languages as being type X or type Y.
After reading http://concatenative.org/wiki/view/Concatenative%20language and drawing on what little I remember of fiddling around with Forth as a teenager, I believe that the key thing about concatenative programming has to do with:
viewing data in terms of values on a specific data stack
and functions manipulating stuff in terms of popping/pushing values on the same the data stack
Check out these quotes from the above webpage:
There are two terms that get thrown
around, stack language and
concatenative language. Both define
similar but not equal classes of
languages. For the most part though,
they are identical.
Most languages in widespread use today
are applicative languages: the central
construct in the language is some form
of function call, where a function is
applied to a set of parameters, where
each parameter is itself the result of
a function call, the name of a
variable, or a constant. In stack
languages, a function call is made by
simply writing the name of the
function; the parameters are implicit,
and they have to already be on the
stack when the call is made. The
result of the function call (if any)
is then left on the stack after the
function returns, for the next
function to consume, and so on.
Because functions are invoked simply
by mentioning their name without any
additional syntax, Forth and Factor
refer to functions as "words", because
in the syntax they really are just
words.
This is in contrast to applicative languages that apply their functions directly to specific variables.
Example: adding two numbers.
Applicative language:
int foo(int a, int b)
{
return a + b;
}
var c = 4;
var d = 3;
var g = foo(c,d);
Concatenative language (I made it up, supposed to be similar to Forth... ;) )
push 4
push 3
+
pop
While I don't think concatenative language = stack language, as the authors point out above, it seems similar.
I reckon the main idea is 1. We can create new programs simply by joining other programs together.
Also, 2. Any random chunk of the program is a valid function (or sub-program).
Good old pure RPN Forth has those properties, excluding any random non-RPN syntax.
In the program 1 2 + 3 *, the sub-program + 3 * takes 2 args, and gives 1 result. The sub-program 2 takes 0 args and returns 1 result. Any chunk is a function, and that is nice!
You can create new functions by lumping two or more others together, optionally with a little glue. It will work best if the types match!
These ideas are really good, we value simplicity.
It is not limited to RPN Forth-style serial language, nor imperative or functional programming. The two ideas also work for a graphical language, where program units might be for example functions, procedures, relations, or processes.
In a network of communicating processes, every sub-network can act like a process.
In a graph of mathematical relations, every sub-graph is a valid relation.
These structures are 'concatenative', we can break them apart in any way (draw circles), and join them together in many ways (draw lines).
Well, that's how I see it. I'm sure I've missed many other good ideas from the concatenative camp. While I'm keen on graphical programming, I'm new to this focus on concatenation.
My pragmatic (and subjective) definition for concatenative programming (now, you can avoid read the rest of it):
-> function composition in extreme ways (with Reverse Polish notation (RPN) syntax):
( Forth code )
: fib
dup 2 <= if
drop 1
else
dup 1 - recurse
swap 2 - recurse +
then ;
-> everything is a function, or at least, can be a function:
( Forth code )
: 1 1 ; \ define a function 1 to push the literal number 1 on stack
-> arguments are passed implicitly over functions (ok, it seems to be a definition for tacit-programming), but, this in Forth:
a b c
may be in Lisp:
(c a b)
(c (b a))
(c (b (a)))
so, it's easy to generate ambiguous code...
you can write definitions that push the xt (execution token) on stack and define a small alias for 'execute':
( Forth code )
: <- execute ; \ apply function
so, you'll get:
a b c <- \ Lisp: (c a b)
a b <- c <- \ Lisp: (c (b a))
a <- b <- c <- \ Lisp: (c (b (a)))
To your simple question, here's a subjective and argumentative answer.
I looked at the article and several related web pages. The web pages say themselves that there isn't a real theory, so it's no wonder that people are having a hard time coming up with a precise and understandable definition. I would say that at present, it is not useful to classify languages as "concatenative" or "not concatenative".
To me it looks like a term that gives Manfred von Thun a place to hang his hat but may not be useful for other programmers.
While PostScript and Forth are worth studying, I don't see anything terribly new or interesting in Manfred von Thun's Joy programming language. Indeed, if you read Chris Okasaki's paper on Techniques for Embedding Postfix Languages in Haskell you can try out all this stuff in a setting that, relative to Joy, is totally mainstream.
So my answer is there's no simple explanation because there's no mature theory underlying the idea of a concatenative language. (As Einstein and Feynman said, if you can't explain your idea to a college freshman, you don't really understand it.) I'll go further and say although studying some of these languages, like Forth and PostScript, is an excellent use of time, trying to figure out exactly what people mean when they say "concatenative" is probably a waste of your time.
You can't explain a language, just get one (Factor, preferably) and try some tutorials on it. Tutorials are better than Stack Overflow answers.

Resources