Understanding STG

The design of GHC is based on something called STG, which stands for "spineless, tagless G-machine".
Now G-machine is apparently short for "graph reduction machine", which defines how laziness is implemented. Unevaluated thunks are stored as an expression tree, and executing the program involves reducing these down to normal form. (A tree is an acyclic graph, but Haskell's pervasive recursion means that Haskell expressions form general graphs, hence graph-reduction and not tree-reduction.)
What is less clear are the terms "spineless" and "tagless".
I think that "spineless" refers to the fact that function applications do not have a "spine" of function application nodes. Instead, you have an object that names the function called and points to all of its arguments. Is that correct?
I thought that "tagless" referred to constructor nodes not being "tagged" with a constructor ID, and instead case-expressions are resolved using a jump instruction. But now I'm not sure that's correct. Instead, it seems to refer to the fact that nodes aren't tagged with their evaluation state. Can anyone clarify which (if any) of these interpretations is correct?

GHC wiki contains an introductory article about STG written by Max Bolingbroke:
The STG machine is an essential part of GHC, the world's leading
Haskell compiler. It defines how the Haskell evaluation model should
be efficiently implemented on standard hardware. Despite this key
role, it is generally poorly understood amongst GHC users. This
document aims to provide an overview of the STG machine in its modern,
eval/apply-based, pointer-tagged incarnation by a series of simple
examples showing how Haskell source code is compiled.

You are right about the "Spineless" part, that is it, if I recall correctly. It is basically described in the 1988 article by Burn, Peyton Jones and Robson, "The Spineless G-Machine". I've read it, but it is not so fresh in my mind.
Basically, on the G-machine, all stack entries point to an application node except the one on the top, which points to the head of the expression. Those application nodes make access to the arguments indirect, and in some G-machine descriptions, before applying a function the stack is rearranged so that the last n entries on the stack point to the arguments instead of the application nodes.
If I am not mistaken, the "Spineless" part is about avoiding having these application nodes (which are called the spine of the graph) on the stack altogether, thus avoiding that rearrangement before each reduction.
As to the "Tagless" part, you are more correct now that you used to be, but... Using tags on nodes is a very, very old thing. Can you think on how a dynamically-typed language such as LISP was implemented? Every cell must have its value and a tag which says the type. If you want something you must examine the tag and act accordingly. In the case of Haskell, the evaluation state is more important than type, Haskell is statically typed.
In the STG machine, tags are not used. Tags were replaced, perhaps inspired by OO languages, by a set of function pointers. When you want the value of a node that has not been computed yet, the function computes it. When it has already been computed, the function returns it. This allows for a lot of creativity in what this function can do without making client code any more complex.
This "Tagless" part yes, is described in the "implementation of functional languages on stock hardware" article by SPJ.
There are also objections to this "tagless" approach. Basically, it relies on function pointers, which are indirect jumps in computer-architecture terms, and indirect jumps are an obstacle to branch prediction and hence to pipelining in general: either the architecture treats the jump target as a data dependency and stalls the pipeline, or it simply does not know the destination and stalls the pipeline anyway.
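To make the contrast concrete, here is a toy model in Haskell (entirely illustrative; real STG closures carry machine-level entry code, and all names below are made up). In the tagged style, client code must inspect a tag; in the tagless style, a node simply carries its own entry function and clients enter it unconditionally:

-- Tagged style: callers pattern-match on the evaluation-state tag.
data Tagged a
  = Evaluated a
  | Thunk (() -> a)

forceTagged :: Tagged a -> a
forceTagged (Evaluated x) = x
forceTagged (Thunk f)     = f ()

-- Tagless style: a node *is* its entry function; callers just call it,
-- and the node itself decides whether to compute or return a cached value.
newtype Tagless a = Tagless { enter :: () -> a }

evaluated :: a -> Tagless a
evaluated x = Tagless (\_ -> x)

thunk :: (() -> a) -> Tagless a
thunk = Tagless

forceTagless :: Tagless a -> a
forceTagless n = enter n ()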

The answer by migle is exactly what spinelessness and taglessness of the STGM mean. Today it is not worth trying to understand the names of the two features, because the names stem from the history of graph reduction technologies: from the G-machine, to the Spineless G-machine, to the Spineless Tagless G-machine.
The G-machine uses both a spine and tags. A spine is the chain of edges from the root node of a function application down to the node of the function. For example, a function application "f e1 e2 ... en" is represented as
root   = AP left_n en
left_n = AP left_n-1 en-1
...
left_2 = AP left_1 e1
left_1 = FUN f
in the G-machine, and so the spine is the chain of edges left_n -> left_n-1 -> ... -> left_2 -> left_1. It is literally the spine of the function application!
In the same function application, there are tags AP and FUN.
In the next, more advanced G-machine, the so-called Spineless G-machine, there is no such spine: the function application is represented as a contiguous block whose first slot points to f, whose second slot points to e1, ..., and whose (n+1)-th slot points to en. In this representation we do not need a spine, but the block starts with a special tag giving the number of slots and so on.
In the most advanced G-machine, the so-called Spineless Tagless G-machine, such a tag is replaced with a function pointer. To evaluate a function application is to jump to the code referenced by that function pointer.
It is unfortunate that Simon Peyton Jones's STGM paper does not give compilation/evaluation rules at some abstract level, so it is natural that people find it hard to grasp the essence of the STGM.

You want to read SPJ's book about functional PL implementation:
http://research.microsoft.com/en-us/um/people/simonpj/papers/slpj-book-1987/index.htm

Related

Use type classes to implement dependency inversion in a Haskell application?

One major architectural goal when designing large applications is to reduce coupling and dependencies. By dependencies, I mean source-code dependencies, when one function or data type uses another function or another type. A high-level architecture guideline seems to be the Ports & Adapters architecture, with slight variations also referred to as Onion Architecture, Hexagonal Architecture, or Clean Architecture: Types and functions that model the domain of the application are at the center, then come use cases that provide useful services on the basis of the domain, and in the outermost ring are technical aspects like persistence, networking and UI.
The dependency rule says that dependencies must point inwards only. E.g., persistence may depend on functions and types from use cases, and use cases may depend on functions and types from the domain. But the domain is not allowed to depend on the outer rings. How should I implement this kind of architecture in Haskell? To make it concrete: How can I implement a use case module that does not depend on (= import) functions and types from a persistence module, even though it needs to retrieve and store data?
Say I want to implement a use case order placement via a function U.placeOrder :: D.Customer -> [D.LineItem] -> IO U.OrderPlacementResult, which creates an order from line items and attempts to persist the order. Here, U indicates the use case module and D the domain module. The function returns an IO action because it somehow needs to persist the order. However, the persistence itself is in the outermost architectural ring - implemented in some module P; so, the above function must not depend on anything exported from P.
I can imagine two generic solutions:
Higher order functions: The function U.placeOrder takes an additional function argument, say U.OrderDto -> U.PersistenceResult. This function is implemented in the persistence (P) module, but it depends on types of the U module, whereas the U module does not need to declare a dependency on P.
Type classes: The U module defines a Persistence type class that declares the above function. The P module depends on this type class and provides an instance for it.
Variant 1 is quite explicit but not very general. Potentially it results in functions with many arguments. Variant 2 is less verbose (see, for example, here). However, Variant 2 results in many unprincipled type classes, something considered bad practice in most modern Haskell textbooks and tutorials.
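To make the two variants concrete, here is a minimal sketch with made-up stand-in types (module prefixes dropped; none of these names come from a real library):

-- Toy stand-ins for the domain and DTO types from the question.
data Customer   = Customer
data LineItem   = LineItem
data OrderDto   = OrderDto
data PersistenceResult    = Saved | NotSaved
data OrderPlacementResult = Placed | Rejected deriving Show

toDto :: Customer -> [LineItem] -> OrderDto
toDto _ _ = OrderDto

toResult :: PersistenceResult -> OrderPlacementResult
toResult Saved    = Placed
toResult NotSaved = Rejected

-- Variant 1: the persistence action is passed in as a function argument.
placeOrderHOF :: (OrderDto -> IO PersistenceResult)
              -> Customer -> [LineItem] -> IO OrderPlacementResult
placeOrderHOF persist customer items =
  toResult <$> persist (toDto customer items)

-- Variant 2: the use case module declares a type class (the "port")...
class Monad m => Persistence m where
  persistOrder :: OrderDto -> m PersistenceResult

placeOrderTC :: Persistence m => Customer -> [LineItem] -> m OrderPlacementResult
placeOrderTC customer items =
  toResult <$> persistOrder (toDto customer items)

-- ...and the persistence module provides the instance (the "adapter").
instance Persistence IO where
  persistOrder _ = return Saved   -- stand-in for real database code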
So, I am left with two questions:
Am I missing other alternatives?
Which approach is the generally recommended one, if any?
There are, indeed, other alternatives (see below).
While you can use partial application as dependency injection, I don't consider it a proper functional architecture, because it makes everything impure.
With your current example, it doesn't seem to matter too much, because U.placeOrder is already impure, but in general, you'd want your Haskell code to consist of as much referentially transparent code as possible.
You sometimes see a suggestion involving the Reader monad, where the 'dependencies' are passed to the function as the reader context instead of as straight function arguments, but as far as I can tell, these are just (isomorphic?) variations of the same idea, with the same problems.
Better alternatives are functional core, imperative shell, and free monads. There may be other alternatives as well, but these are the ones I'm aware of.
Functional core, imperative shell
You can often factor your code so that your domain model is defined as a set of pure functions. This is often easier to do in languages like Haskell and F# because you can use sum types to communicate decisions. The U.placeOrder function might, for example, look like this:
U.placeOrder :: D.Customer -> [D.LineItem] -> U.OrderPlacementDecision
Notice that this is a pure function, where U.OrderPlacementDecision might be a sum type that enumerates all the possible outcomes of the use case.
That's your functional core. You'd then compose your imperative shell (e.g. your main function) in an impureim sandwich:
main :: IO ()
main = do
  customer  <- fetchCustomer   -- call the persistence module code here (fetchCustomer is a hypothetical helper)
  lineItems <- fetchLineItems  -- ditto
  let decision = U.placeOrder customer lineItems
  _ <- persist decision
  return ()
(I've obviously not tried to type-check that code, but I hope it's sufficiently correct to get the point across.)
Free monads
The functional core, imperative shell is by far the simplest way to achieve the desired architectural outcome, and it's conspicuously often possible to get away with. Still, there are cases where that's not possible. In those cases, you can instead use free monads.
With free monads, you can define data structures that are roughly equivalent to object-oriented interfaces. Like in the functional core, imperative shell case, these data structures are sum types, which means that you can keep your functions pure. You can then run an impure interpreter over the generated expression tree.
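For illustration, here is a minimal sketch of that approach using the free package; the instruction type and every name below (PersistenceF, saveOrder, runIO, ...) are made up for this example, not taken from the articles:

import Control.Monad.Free (Free, liftF, foldFree)

data Order = Order deriving Show
data PlacementResult = Placed | Rejected deriving Show

-- The "interface" as a data structure: one constructor per operation.
data PersistenceF next = SaveOrder Order (PlacementResult -> next)

instance Functor PersistenceF where
  fmap g (SaveOrder o k) = SaveOrder o (g . k)

type Persistence = Free PersistenceF

saveOrder :: Order -> Persistence PlacementResult
saveOrder o = liftF (SaveOrder o id)

-- Pure use-case code: it only builds a description of the program.
placeOrder :: Order -> Persistence PlacementResult
placeOrder = saveOrder

-- The impure interpreter lives in the outermost ring.
runIO :: Persistence a -> IO a
runIO = foldFree step
  where
    step :: PersistenceF x -> IO x
    step (SaveOrder o k) = do
      putStrLn ("persisting " ++ show o)   -- stand-in for real persistence code
      return (k Placed)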
I've written an article series about how to think about dependency injection in F# and Haskell. I've also recently published an article that (among other things) showcases this technique. Most of my articles are accompanied by GitHub repositories.

How do I serialize or save to a file a Thunk?

In Haskell, you can have infinite lists, because it doesn't completely compute them, it uses thunks. I am wondering if there is a way to serialize or otherwise save to a file a piece of data's thunk. For example let us say you have a list [0..]. Then you do some processing on it (I am mostly interested in tail and (:), but it should support doing filter or map as well.) Here is an example of sort of what I am looking for.
serial :: SerialThunk a => a -> SerThunk
serialized = serial ([0..] :: [Int])
main = writeToFile "foo.txt" serialized
And
deserial :: SerialThunk a => SerThunk -> a
main = do
  deserialized <- readFromFile "foo.txt" :: IO [Int]
  print $ take 10 deserialized
No. There is no way to serialize a thunk in Haskell. Once code is compiled it is typically represented as assembly (for example, this is what GHC does) and there is no way to recover a serializable description of the function, let alone the function and environment that you'd like to make a thunk.
Yes. You could build custom solutions, such as describing and serializing a Haskell expression. Deserialization and execution could happen by way of interpretation (e.g. using the hint package).
Maybe. Someone (you?) could make a compiler, or modify an existing compiler, to maintain more information in a platform-agnostic manner such that things could be serialized without the user manually leveraging hint. I imagine this is an area under exploration by the Cloud Haskell (aka distributed-haskell) developers.
Why? I have also wanted an ability to serialize functions so that I could pass closures around in a flexible manner. Most of the time, though, that flexibility isn't actually needed and instead people want to pass certain types of computations that can be easily expressed as a custom data type and interpretation function.
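To illustrate the "Yes" option, here is a minimal sketch using the hint package: instead of the thunk itself, we store the source text of the expression and re-interpret it later (assuming hint's Language.Haskell.Interpreter API):

import Language.Haskell.Interpreter (InterpreterError, runInterpreter, setImports, interpret, as)

-- "Serialize" by writing the source of the expression, not the thunk.
saveExpr :: FilePath -> String -> IO ()
saveExpr = writeFile

-- "Deserialize" by interpreting the stored source text at run time.
loadList :: FilePath -> IO (Either InterpreterError [Int])
loadList path = do
  src <- readFile path
  runInterpreter $ do
    setImports ["Prelude"]
    interpret src (as :: [Int])

main :: IO ()
main = do
  saveExpr "foo.txt" "take 10 [0..]"
  r <- loadList "foo.txt"
  print r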
packman: "Evaluation-orthogonal serialisation of Haskell data, as a library" (thanks to a reddit link) -- is exactly what we have been looking for!
...this serialisation is orthogonal to evaluation: the argument is
serialised in its current state of evaluation, it might be
entirely unevaluated (a thunk) or only partially evaluated (containing
thunks).
...The library enables sending and receiving data between different nodes
of a distributed Haskell system. This is where the code originated:
the Eden runtime system.
...Apart from this obvious application, the functionality can be used to
optimise programs by memoisation (across different program runs), and
to checkpoint program execution in selected places. Both uses are
exemplified in the slide set linked above.
...Another limitation is that serialised data can only be used by the
very same binary. This is however common for many approaches to
distributed programming using functional languages.
...
Cloud Haskell supports serialization of function closures. http://www.haskell.org/haskellwiki/Cloud_Haskell
Apart from the work in Cloud Haskell and HdpH on "closures", and apart from the answers stating that thunks are not analyzable at runtime, I've found that:
:sprint in GHCi seems to have access to the internal thunk representation. Perhaps GHCi works with some special, non-optimized code. So, in principle, one could use this representation and the implementation of :sprint if one wants to serialize thunks, isn't that true?
http://hackage.haskell.org/package/ghc-heap-view-0.5.3/docs/GHC-HeapView.html -- "With this module, you can investigate the heap representation of Haskell values, i.e. to investigate sharing and lazy evaluation."
I'd be very curious to know what kind of working solutions for serializing closures can be made out of this stuff...

Use of Haskell state monad a code smell?

God I hate the term "code smell", but I can't think of anything more accurate.
I'm designing a high-level language & compiler to Whitespace in my spare time to learn about compiler construction, language design, and functional programming (compiler is being written in Haskell).
During the code generation phase of the compiler, I have to maintain "state"-ish data as I traverse the syntax tree. For example, when compiling flow-control statements I need to generate unique names for the labels to jump to (labels generated from a counter that's passed in, updated, & returned, and the old value of the counter must never be used again). Another example is when I come across in-line string literals in the syntax tree, they need to be permanently converted into heap variables (in Whitespace, strings are best stored on the heap). I'm currently wrapping the entire code generation module in the state monad to handle this.
I've been told that writing a compiler is a problem well suited to the functional paradigm, but I find that I'm designing this in much the same way I would design it in C (you really can write C in any language - even Haskell w/ state monads).
I want to learn how to think in Haskell (rather, in the functional paradigm) - not in C with Haskell syntax. Should I really try to eliminate/minimize use of the state monad, or is it a legitimate functional "design pattern"?
I've written multiple compilers in Haskell, and a state monad is a reasonable solution to many compiler problems. But you want to keep it abstract---don't make it obvious you're using a monad.
Here's an example from the Glasgow Haskell Compiler (which I did not write; I just work around a few edges), where we build control-flow graphs. Here are the basic ways to make graphs:
emptyGraph :: Graph
mkLabel :: Label -> Graph
mkAssignment :: Assignment -> Graph -- modify a register or memory
mkTransfer :: ControlTransfer -> Graph -- any control transfer
(<*>) :: Graph -> Graph -> Graph
But as you've discovered, maintaining a supply of unique labels is tedious at best, so we provide these functions as well:
withFreshLabel :: (Label -> Graph) -> Graph
mkIfThenElse :: (Label -> Label -> Graph)  -- branch condition
             -> Graph                      -- code in the 'then' branch
             -> Graph                      -- code in the 'else' branch
             -> Graph                      -- resulting if-then-else construct
The whole Graph thing is an abstract type, and the translator just merrily constructs graphs in purely functional fashion, without being aware that anything monadic is going on. Then, when the graph is finally constructed, in order to turn it into an algebraic datatype we can generate code from, we give it a supply of unique labels, run the state monad, and pull out the data structure.
The state monad is hidden underneath; although it's not exposed to the client, the definition of Graph is something like this:
type Graph = RealGraph -> [Label] -> (RealGraph, [Label])
or a bit more accurately
type Graph = RealGraph -> State [Label] RealGraph
-- a Graph is a monadic function from a successor RealGraph to a new RealGraph
With the state monad hidden behind a layer of abstraction, it's not smelly at all!
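For a concrete feel, here is a tiny, self-contained sketch of the same trick with toy types (this is not GHC's code; RealGraph, runGraph, and the label supply below are made up):

import Control.Monad.State (State, evalState, state)

type Label = Int
data RealGraph = Empty | LabelNode Label RealGraph deriving Show

-- A Graph is a monadic function from a successor RealGraph to a new RealGraph.
type Graph = RealGraph -> State [Label] RealGraph

mkLabel :: Label -> Graph
mkLabel l next = return (LabelNode l next)

-- Clients get a fresh label without ever seeing the monad.
withFreshLabel :: (Label -> Graph) -> Graph
withFreshLabel f next = do
  l <- state (\(x:xs) -> (x, xs))   -- draw from the hidden, infinite label supply
  f l next

-- Only at the very end do we supply the labels, run the state monad,
-- and pull out the data structure.
runGraph :: Graph -> RealGraph
runGraph g = evalState (g Empty) [0..]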
I'd say that state in general is not a code smell, so long as it's kept small and well controlled.
This means that using monads such as State, ST or custom-built ones, or just having a data structure containing state data that you pass around to a few places, is not a bad thing. (Actually, monads are just assistance in doing exactly this!) However, having state that goes all over the place (yes, this means you, IO monad!) is a bad smell.
A fairly clear example of this was when my team was working on our entry for the ICFP Programming Contest 2009 (the code is available at git://git.cynic.net/haskell/icfp-contest-2009). We ended up with several different modular parts to this:
VM: the virtual machine that ran the simulation program
Controllers: several different sets of routines that read the output of the simulator and generated new control inputs
Solution: generation of the solution file based on the output of the controllers
Visualizers: several different sets of routines that read both the input and output ports and generated some sort of visualization or log of what was going on as the simulation progressed
Each of these has its own state, and they all interact in various ways through the input and output values of the VM. We had several different controllers and visualizers, each of which had its own different kind of state.
The key point here was that the internals of any particular state were limited to their own particular modules, and each module knew nothing about even the existence of state for other modules. Any particular set of stateful code and data was generally only a few dozen lines long, with a handful of data items in the state.
All this was glued together in one small function of about a dozen lines which had no access to the internals of any of the states, and which merely called the right things in the proper order as it looped through the simulation, and passed a very limited amount of outside information to each module (along with the module's previous state, of course).
When state is used in such a limited way, and the type system is preventing you from inadvertently modifying it, it's quite easy to handle. It's one of the beauties of Haskell that it lets you do this.
One answer says, "Don't use monads." From my point of view, this is exactly backwards. Monads are a control structure that, among other things, can help you minimize the amount of code that touches state. If you look at monadic parsers as an example, the state of the parse (i.e., the text being parsed, how far one has gotten into it, any warnings that have accumulated, etc.) must run through every combinator used in the parser. Yet there will only be a few combinators that actually manipulate the state directly; anything else uses one of these few functions. This allows you to see clearly and in one place all of a small amount of code that can change the state, and more easily reason about how it can be changed, again making it easier to deal with.
Have you looked at Attribute grammars (AG)? (More info on wikipedia and an article in the Monad Reader)?
With AG you can add attributes to a syntax tree. These attributes are separated in synthesized and inherited attributes.
Synthesized attributes are things you generate (or synthesize) from your syntax tree; this could be the generated code, or all comments, or whatever else you're interested in.
Inherited attributes are input to your syntax tree; this could be the environment, or a list of labels to use during code generation.
At Utrecht University we use the Attribute Grammar System (UUAGC) to write compilers. This is a pre-processor which generates Haskell code (.hs files) from the provided .ag files.
Although, if you're still learning Haskell, then maybe this is not the time to start learning yet another layer of abstraction over that.
In that case, you could manually write the sort of code that attributes grammars generate for you, for example:
data AbstractSyntax = Literal Int
                    | Block AbstractSyntax
                    | Comment String AbstractSyntax
compile :: AbstractSyntax -> [Label] -> (Code, Comments)
compile (Literal x) _      = (generateCode x, [])
compile (Block ast) (l:ls) = let (code', comments) = compile ast ls
                             in (labelCode l code', comments)
compile (Comment s ast) ls = let (code, comments') = compile ast ls
                             in (code, s : comments')
generateCode :: Int -> Code
labelCode :: Label -> Code -> Code
It's possible that you may want an applicative functor instead of a
monad:
http://www.haskell.org/haskellwiki/Applicative_functor
I think the original paper explains it better than the wiki, however:
http://www.soi.city.ac.uk/~ross/papers/Applicative.html
I don't think using the State Monad is a code smell when it used to model state.
If you need to thread state through your functions,
you can do this explicitly, taking the state as an argument and returning it in each function.
The State Monad offers a good abstraction: it passes the state along for you and
provides lots of useful functions to combine functions that require state.
In this case, using the State Monad (or Applicatives) is not a code smell.
However, if you use the State Monad to emulate an imperative style of programming
while a functional solution would suffice, you are just making things complicated.
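For the label-generation case from the question, the two styles look roughly like this (a small sketch with made-up names):

import Control.Monad.State (State, get, put, evalState)

type Label = String

-- Explicit threading: take the counter, return the new one alongside the result.
freshLabelExplicit :: Int -> (Label, Int)
freshLabelExplicit n = ("L" ++ show n, n + 1)

-- With the State monad, the counter is threaded for you.
freshLabel :: State Int Label
freshLabel = do
  n <- get
  put (n + 1)
  return ("L" ++ show n)

-- evalState (sequence [freshLabel, freshLabel, freshLabel]) 0 == ["L0","L1","L2"]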
In general you should try to avoid state wherever possible, but that's not always practical. Applicative makes effectful code look nicer and more functional, especially tree traversal code can benefit from this style. For the problem of name generation there is now a rather nice package available: value-supply.
Well, don't use monads. The power of functional programming is function purity and reuse. There's this paper a professor of mine once wrote; he's one of the guys who helped build Haskell.
The paper is called "Why functional programming matters", I suggest you read through it. It's a good read.
Let's be careful about the terminology here. State is not per se bad; functional languages have state. What is a "code smell" is when you find yourself wanting to assign variables values and then change them.
Of course, the Haskell state monad is there for just that reason -- as with I/O, it's letting you do unsafe and un-functional things in a constrained context.
So, yes, it's probably a code smell.

Creative uses of monads

I'm looking for creative uses of monads to learn from. I've read somewhere that monads have been used for example in AI, but being a monad newbie, I fail to see how.
Please include a link to the source code and sample usages. No standard monads please.
Phil Wadler has written many papers on monads, but the one to read first is a lot of fun and will be accessible to any programmer; it's called The essence of functional programming. The paper includes source code and sample usages.
A personal favorite of mine is the probability monad; if you can find Sungwoo Park's PhD thesis, it has a number of interesting example codes from robotics.
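The core idea fits in a few lines; here is a minimal sketch (not the thesis code) where a distribution is a list of outcomes paired with probabilities, and bind multiplies probabilities along each path:

import Data.Ratio ((%))

newtype Dist a = Dist { runDist :: [(a, Rational)] }

instance Functor Dist where
  fmap f (Dist xs) = Dist [ (f x, p) | (x, p) <- xs ]

instance Applicative Dist where
  pure x = Dist [(x, 1)]
  Dist fs <*> Dist xs = Dist [ (f x, p * q) | (f, p) <- fs, (x, q) <- xs ]

instance Monad Dist where
  Dist xs >>= f = Dist [ (y, p * q) | (x, p) <- xs, (y, q) <- runDist (f x) ]

-- A fair die, and the distribution of the sum of two dice.
die :: Dist Int
die = Dist [ (n, 1 % 6) | n <- [1..6] ]

twoDice :: Dist Int
twoDice = do
  a <- die
  b <- die
  return (a + b)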
There's also LogicT (backtracking monad transformer with fair operations and pruning).
It is valuable for AI search algorithms because of its constructs for fair disjunction: for example, it easily enables computations that succeed an infinite number of times to be combined (interleaved).
Its usage is described in the ICFP'05 paper Backtracking, Interleaving, and Terminating Monad Transformers.
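A small sketch of the fair interleaving it provides, using the logict package's Control.Monad.Logic (the example itself is made up):

import Control.Monad (msum)
import Control.Monad.Logic (Logic, interleave, observeMany)

-- Two infinite streams of successes.
evens, odds :: Logic Int
evens = msum (map return [0, 2 ..])
odds  = msum (map return [1, 3 ..])

-- Fair disjunction: neither stream starves the other.
-- observeMany 6 (evens `interleave` odds) == [0,1,2,3,4,5]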
You can find interesting and advanced monads in the blog A Neighborhood of Infinity. I can note the Vector Space Monad and its use for describing rational tangles. Unfortunately, I don't think I understand this well enough to explain it here.
One of my favorite monads is Martin Escardo's search monad. It can be found on hackage in infinite-search package.
It is the monad of "search functions" for a set of elements of type a, namely (a -> Bool) -> Maybe a (finding an element in the set matching a given predicate).
One interesting use of monads is in parsing. Parsec is the standard example.
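For example, here is a tiny Parsec parser for a made-up key=value format; each step runs in the parser monad and can use the results of earlier steps:

import Text.Parsec (char, digit, letter, many1, parse)
import Text.Parsec.String (Parser)

pair :: Parser (String, Int)
pair = do
  key <- many1 letter
  _   <- char '='
  val <- many1 digit
  return (key, read val)

-- parse pair "" "answer=42" == Right ("answer",42)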
Read the series of articles on monads used to model probability and probabilistic processes here: http://www.randomhacks.net/articles/2007/03/03/smart-classification-with-haskell (follow links to prev/next parts)
Harpy, a package for run-time generation of x86 machine code, uses a code generation monad. From the description:
This is a combined reader-state-exception monad which handles all the details of handling code buffers, emitting binary data, relocation etc.
All the code generation functions in module Harpy.X86CodeGen live in this monad and use its error reporting facilities as well as the internal state maintained by the monad.
The library user can pass a user environment and user state through the monad. This state is independent from the internal state and may be used by higher-level code generation libraries to maintain their own state across code generation operations.
I found this a particularly interesting example because I think that this pattern is not uncommon: I'd invented something quite similar myself for generating a set of internal messages for my application based on messages received from a (stock) market data feed. It turns out to be an extremely comfortable way to have a framework keep track of various "global" things whilst composing simple operations that in and of themselves keep no state.
I took his idea of having a user state (which I call a "substate") that could also be passed through the monad one step further: I have a mechanism for switching out and restoring state during the monad run:
-- | Given a generator that uses different substate type, convert it
-- to a generator that runs with our substate type. As well as the
-- other-substate-type generator, the caller must provide an initial
-- substate for that generator and a function taking the final substate
-- of the generator and producing a new substate of our type. This
-- preserves all other (non-substate) parts of the master state touched
-- by the generator.
--
mgConvertSubstate :: MsgGen msg st' a -> st' -> (st' -> st) -> MsgGen msg st a
This is used for subgroups of combinators that need their own state for a short period. These run with just their own state, not knowing anything about the state of the generator that invoked them (which helps make things more modular), and yet this preserves any non-user-specific state, such as the current list of messages generated and the current set of warnings or errors, as well as the control flow (i.e., allowing total aborts to flow upwards).
I'd like to list a couple of monads not yet mentioned in other answers.
Enumeration and weighted search monads
The Omega monad can be used to productively traverse infinite lists of results. Compare:
>>> take 10 $ liftM2 (,) [0..] [0..]
[(0,0),(0,1),(0,2),(0,3),(0,4),(0,5),(0,6),(0,7),(0,8),(0,9)]
>>> take 10 $ runOmega $ liftM2 (,) (each' [0..]) (each' [0..])
[(0,0),(0,1),(1,0),(0,2),(1,1),(2,0),(0,3),(1,2),(2,1),(3,0)]
With the somewhat more advanced WeightedSearch monad it is also possible to assign weights to computations, so that results of computations with lower weights appear first in the output.
Accumulating errors monad
The useful These data type forms a Monad similar to Either, but it is able to accumulate errors rather than stopping at the first one. The package also defines the MonadChronicle class as well as the ChronicleT monad transformer based on These.
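A self-contained sketch of the idea (the real these package offers more, including ChronicleT); note that These is both the type and one of its constructors:

data These e a = This e | That a | These e a deriving Show

instance Functor (These e) where
  fmap _ (This e)    = This e
  fmap f (That a)    = That (f a)
  fmap f (These e a) = These e (f a)

instance Semigroup e => Applicative (These e) where
  pure = That
  This e    <*> _          = This e
  That f    <*> t          = fmap f t
  These e _ <*> This e'    = This (e <> e')
  These e f <*> That a     = These e (f a)
  These e f <*> These e' a = These (e <> e') (f a)

instance Semigroup e => Monad (These e) where
  This e    >>= _ = This e
  That a    >>= f = f a
  These e a >>= f = case f a of
    This e'    -> This (e <> e')
    That b     -> These e b
    These e' b -> These (e <> e') b

-- Warnings accumulate while the computation carries on:
-- These ["late delivery"] 3 >>= \n -> These ["low stock"] (n * 2)
--   == These ["late delivery","low stock"] 6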

Explain concatenative languages to me like I'm an 8-year-old

I've read the Wikipedia article on concatenative languages, and I am now more confused than I was when I started. :-)
What is a concatenative language in stupid people terms?
In normal programming languages, you have variables which can be defined freely and you call methods using these variables as arguments. These are simple to understand but somewhat limited. Often it is hard to reuse an existing method, either because you simply can't map the existing variables onto the parameters the method needs, or because method A calls another method B and A would be perfect for you if you could only replace the call to B with a call to C.
Concatenative languages use a fixed data structure to save values (usually a stack or a list). There are no variables. This means that many methods and functions have the same "API": they work on whatever someone else left on the stack. Plus, code itself is treated as "data", i.e. it is common to write code which can modify itself or which accepts other code as a "parameter" (i.e. as an element on the stack).
These attributes make these languages perfect for chaining existing code together to create something new. Reuse is built in. You can write a function which accepts a list and a piece of code and calls the code for each item in the list. This will now work on any kind of data as long as it behaves like a list: results from a database, a row of pixels from an image, characters in a string, etc.
The biggest problem is that you have no hint what's going on. There are only a couple of data types (list, string, number), so everything gets mapped to that. When you get a piece of data, you usually don't care what it is or where it comes from. But that makes it hard to follow data through the code to see what is happening to it.
I believe it takes a certain mindset to use these languages successfully. They are not for everyone.
[EDIT] Forth has some penetration but not that much. You can find PostScript in any modern laser printer. So they are niche languages.
From a functional level, they are on par with LISP, C-like languages and SQL: all of them are Turing complete, so you can compute anything. It's just a matter of how much code you have to write. Some things are simpler in LISP, some are simpler in C, some are simpler in query languages. The question of which is "better" is futile unless you have a context.
First I'm going to make a rebuttal to Norman Ramsey's assertion that there is no theory.
Theory of Concatenative Languages
A concatenative language is a functional programming language, where the default operation (what happens when two terms are side by side) is function composition instead of function application. It is as simple as that.
So for example in the SKI Combinator Calculus (one of the simplest functional languages) two terms side by side are equivalent to applying the first term to the second term. For example: S K K is equivalent to S(K)(K).
In a concatenative language S K K would be equivalent to S . K . K in Haskell.
So what's the big deal
A pure concatenative language has the interesting property that the order of evaluation of terms does not matter. In a concatenative language (S K) K is the same as S (K K). This does not apply to the SKI Calculus or any other functional programming language based on function application.
One reason this observation is interesting is that it reveals opportunities for parallelization in the evaluation of code expressed in terms of function composition instead of application.
Now for the real world
The semantics of stack-based languages which support higher-order functions can be explained using a concatenative calculus. You simply map each term (command/expression/sub-program) to be a function that takes a function as input and returns a function as output. The entire program is effectively a single stack transformation function.
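A minimal Haskell sketch of the stack-transformation view (toy code, not a real embedding): each word is a function from stacks to stacks, and a program is just the composition of its words.

type Stack = [Int]

push :: Int -> Stack -> Stack
push x s = x : s

add :: Stack -> Stack
add (x:y:s) = (x + y) : s
add _       = error "stack underflow"

mul :: Stack -> Stack
mul (x:y:s) = (x * y) : s
mul _       = error "stack underflow"

-- The program "1 2 + 3 *": note that Haskell's (.) reads right to left,
-- while the concatenative program reads left to right. Any contiguous chunk,
-- e.g. "+ 3 *" (that is, mul . push 3 . add), is itself a valid word.
program :: Stack -> Stack
program = mul . push 3 . add . push 2 . push 1

-- program [] == [9]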
The reality is that things are always distorted in the real world (e.g. FORTH has a global dictionary, PostScript does weird things where the evaluation order matters). Most practical programming languages don't adhere perfectly to a theoretical model.
Final Words
I don't think a typical programmer or 8 year old should ever worry about what a concatenative language is. I also don't find it particularly useful to pigeon-hole programming languages as being type X or type Y.
After reading http://concatenative.org/wiki/view/Concatenative%20language and drawing on what little I remember of fiddling around with Forth as a teenager, I believe that the key thing about concatenative programming has to do with:
viewing data in terms of values on a specific data stack
and functions manipulating stuff in terms of popping/pushing values on that same data stack
Check out these quotes from the above webpage:
There are two terms that get thrown around, stack language and concatenative language. Both define similar but not equal classes of languages. For the most part though, they are identical.
Most languages in widespread use today are applicative languages: the central construct in the language is some form of function call, where a function is applied to a set of parameters, where each parameter is itself the result of a function call, the name of a variable, or a constant. In stack languages, a function call is made by simply writing the name of the function; the parameters are implicit, and they have to already be on the stack when the call is made. The result of the function call (if any) is then left on the stack after the function returns, for the next function to consume, and so on. Because functions are invoked simply by mentioning their name without any additional syntax, Forth and Factor refer to functions as "words", because in the syntax they really are just words.
This is in contrast to applicative languages that apply their functions directly to specific variables.
Example: adding two numbers.
Applicative language:
int foo(int a, int b)
{
return a + b;
}
var c = 4;
var d = 3;
var g = foo(c,d);
Concatenative language (I made it up, supposed to be similar to Forth... ;) )
push 4
push 3
+
pop
While I don't think concatenative language = stack language, as the authors point out above, it seems similar.
I reckon the main idea is 1. We can create new programs simply by joining other programs together.
Also, 2. Any random chunk of the program is a valid function (or sub-program).
Good old pure RPN Forth has those properties, excluding any random non-RPN syntax.
In the program 1 2 + 3 *, the sub-program + 3 * takes 2 args, and gives 1 result. The sub-program 2 takes 0 args and returns 1 result. Any chunk is a function, and that is nice!
You can create new functions by lumping two or more others together, optionally with a little glue. It will work best if the types match!
These ideas are really good, we value simplicity.
It is not limited to RPN Forth-style serial language, nor imperative or functional programming. The two ideas also work for a graphical language, where program units might be for example functions, procedures, relations, or processes.
In a network of communicating processes, every sub-network can act like a process.
In a graph of mathematical relations, every sub-graph is a valid relation.
These structures are 'concatenative', we can break them apart in any way (draw circles), and join them together in many ways (draw lines).
Well, that's how I see it. I'm sure I've missed many other good ideas from the concatenative camp. While I'm keen on graphical programming, I'm new to this focus on concatenation.
My pragmatic (and subjective) definition of concatenative programming (once you've read it, you can skip the rest):
-> function composition in extreme ways (with Reverse Polish notation (RPN) syntax):
( Forth code )
: fib
  dup 2 <= if
    drop 1
  else
    dup 1 - recurse
    swap 2 - recurse +
  then ;
-> everything is a function, or at least, can be a function:
( Forth code )
: 1 1 ; \ define a function 1 to push the literal number 1 on stack
-> arguments are passed implicitly to functions (ok, this seems to be a definition of tacit programming), but this in Forth:
a b c
may be in Lisp:
(c a b)
(c (b a))
(c (b (a)))
so, it's easy to generate ambiguous code...
you can write definitions that push the xt (execution token) on the stack, and define a small alias for 'execute':
( Forth code )
: <- execute ; \ apply function
so, you'll get:
a b c <- \ Lisp: (c a b)
a b <- c <- \ Lisp: (c (b a))
a <- b <- c <- \ Lisp: (c (b (a)))
To your simple question, here's a subjective and argumentative answer.
I looked at the article and several related web pages. The web pages say themselves that there isn't a real theory, so it's no wonder that people are having a hard time coming up with a precise and understandable definition. I would say that at present, it is not useful to classify languages as "concatenative" or "not concatenative".
To me it looks like a term that gives Manfred von Thun a place to hang his hat but may not be useful for other programmers.
While PostScript and Forth are worth studying, I don't see anything terribly new or interesting in Manfred von Thun's Joy programming language. Indeed, if you read Chris Okasaki's paper on Techniques for Embedding Postfix Languages in Haskell you can try out all this stuff in a setting that, relative to Joy, is totally mainstream.
So my answer is there's no simple explanation because there's no mature theory underlying the idea of a concatenative language. (As Einstein and Feynman said, if you can't explain your idea to a college freshman, you don't really understand it.) I'll go further and say although studying some of these languages, like Forth and PostScript, is an excellent use of time, trying to figure out exactly what people mean when they say "concatenative" is probably a waste of your time.
You can't explain a language; just get one (Factor, preferably) and try some tutorials on it. Tutorials are better than Stack Overflow answers.
