Need help parsing the meaning of "i.~" - j

I'm trying to understand the solution for day 1 part 2: https://code.jsoftware.com/wiki/Essays/Advent_Of_Code#Part_2
PART2=: >: _1 i.~ +/\ 1 _1 mp '()'=/read'input'
I feel like I understand most of what's going on here but I'm not sure how to interpret the
i.~
part. I know what "i." generally does but I'm confused by the "~" here. My understanding is that "~" duplicates the right args to also be on the left. But here we already have a "_1" so I'm not sure how to interpret the semantics of this.
Also any tips on how to track this down myself are greatly appreciated.

It seems like my comments discouraged others from providing the typical official answers in answer format. So I’m reproducing them as an answer proper.
The adverb commute ~ simply flips the arguments to the dyadic verb it’s attached to. So 3 % 5 is three-fifths but 3 %~ 5 is five-thirds (aka 5 % 3).
So, in your verb, the dyad i. is the one you’re familiar with (index of), but the ~ makes the constant _1 its right argument, even though, textually, it appears to its left. Thus, the i. is looking for the first _1 in the result of +/\ 1 _1 mp ….
You can find it spelled out in the NuVoc here or in the original J Vocabulary here. The NuVoc is designed to be more accessible, the Vocabulary canonical.
In the NuVoc, it's expressed as: "Thus, x u~ y is the same as y u x." The Vocabulary states it more mathematically, with the equivalence assertion x u~ y ↔ y u x (on the right half of the page, since we're talking about the dyad i. here, rather than the monad).
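If a comparison with Haskell (which comes up in the later questions here) helps: J's ~ plays the same role as Haskell's flip. A tiny sketch mirroring the % example above (my illustration, not part of the original answer):
main :: IO ()
main = do
  print (3 / 5)          -- 0.6                (like J's 3 % 5)
  print (flip (/) 3 5)   -- 1.6666666666666667 (like J's 3 %~ 5, i.e. 5 % 3)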

Related

In Haskell, what is the result of an infinite multiplication of 1 by itself?

In the following example, why would any optimization that evaluates it to 1 be considered incorrect?
foldl (*) 1 (repeat 1)
^CInterrupted.
The intention of optimizations is to make a program faster without changing what answer it gets at the end of the (now faster) calculation.
Since foldl (*) 1 (repeat 1) is an infinite loop, an optimized version may go through each iteration of the loop body more quickly, but it must still go through infinitely many iterations; otherwise it would change what answer the program gets.
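To see why, it helps to unfold foldl by hand (a sketch using the standard definition of foldl; GHC's actual implementation differs in details but not in meaning):
foldl f z []     = z
foldl f z (x:xs) = foldl f (f z x) xs

foldl (*) 1 (repeat 1)
  = foldl (*) (1*1) (repeat 1)
  = foldl (*) ((1*1)*1) (repeat 1)
  = ...
The base case for the empty list is never reached, so no answer is ever produced, no matter how fast each step runs.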
Daniel Wagner gave a great answer, yet I wanted to add a little bit on infinite multiplication.
In a way, what you are trying to do corresponds to the following expression (the mechanism is different, of course - see the definition of fix below: there is no accumulated product, only a sequence of applications - but I wanted to offer this other perspective, which has been missing so far):
fix (1*)
Data.Function.fix's description:
fix f is the least fixed point of the function f, i.e. the least defined x such that f x = x.
And here is the definition:
fix f = let x = f x in x
So every Int is a fixed point of (1*), and fix (1*) reduces to the least defined one, that is, to bottom (an infinite loop in this case).
The same happens in your definition: the program stands ready to put a 1 (a well-defined fixed point) at the end of the multiplication, but that never happens, because evaluation reaches bottom - the least defined fixed point - instead.
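Here is a minimal runnable illustration of that bottom (my sketch, not part of the original answer):
import Data.Function (fix)

-- fix (1*) = let x = 1 * x in x. (*) on Int is strict in both arguments,
-- so evaluating x demands x again and the program loops forever.
main :: IO ()
main = print (fix (1 *) :: Int)   -- never prints; interrupt with Ctrl-C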

How to quickly read the do notation without translating to >>= compositions?

This question is related to this post: Understanding do notation for simple Reader monad: a <- (*2), b <- (+10), return (a+b)
I don't care if a language is hard to understand if it promises to solve some problems that easy to understand languages give us. I've been promised that the impossibility of changing state in Haskell (and other functional languages) is a game changer and I do believe that. I've had too many bugs in my code related to state and I totally agree with this post that reasoning about the interaction of objects in OOP languages is near impossible because they can change states, and thus in order to reason about code we should consider all the possible permutations of these states.
However, I've been finding that reasoning about Haskell monads is also very hard. As you can see in the answers to the question I linked, we need a big diagram to understand 3 lines of the do notation. I always end up opening stackedit.io to desugar the do notation by hand and write step by step the >>= applications of the do notation in order to understand the code.
The problem is more or less like this: in the majority of cases, when we have S a >>= f we have to unwrap a from S and apply f to it. However, f is actually another thing, more or less of the form S a >>= g, which we also have to unwrap, and so on. The human brain doesn't work like that: we can't easily apply these things in our head, pause, keep them on a mental stack, and keep applying the rest of the >>= until we reach the end, then pop everything off that stack and glue it together.
Therefore, I must be doing something wrong. There must be an easy way to understand '>>= composition' in the brain. I know that do notation is very simple, but I can only think of that as a way to easily write >>= compositions. When I see the do notation I simply translate it to a bunch of >>=. I don't see it as a separate way of understanding code. If there is a way, I'd like someone to tell me.
So the question is: how to read the do notation?
Given a simple code like
foo :: Monad m => m Int -> m Int -> m Int
foo x y = do
    a <- y -- I'm intentionally doing y first; see the Either example
    b <- x
    return (a + b)
you can't say much about <- except that it "gets" an Int value from x or y. What "get" means depends very much on what m is.
Some examples:
m ~ Maybe
foo (Just 3) (Just 5) evaluates to Just 8; replace either argument with Nothing, and you get Nothing. <- tries to get a value out of the Maybe Int value, but aborts the rest of the block if it fails.
m ~ Either a
Pretty much the same as Maybe, but replacing Nothing with the first Left value that it encounters. foo (Right 3) (Right 5) returns Right 8. foo x (Left "foo") returns Left "foo", whether x is a Right or Left value.
m ~ []
Now, instead of getting an Int, <- gets every Int from among the given choices. It does so nondeterministically; you can imagine that the function "forks" into multiple parallel copies, each one having chosen a different value from its list. In the end, the final result is a list of all the results that were computed.
foo [1,2] [3,4] returns [4, 5, 5, 6] ([3 + 1, 3 + 2, 4 + 1, 4 + 2]).
m ~ IO
This one is tricky, because unlike the previous monads we've looked at, there isn't necessarily a value yet to get. foo readLn readLn will return whatever the sum of the two numbers read from standard input is, with the possibility of a run-time error should the strings so read not be parseable as Int values.
You might think of it as working like the Maybe monad, but with run-time exceptions replacing Nothing.
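If you want to check the pure examples above (Maybe, Either, lists) yourself, here is a small runnable transcription of the question's foo together with the results claimed:
foo :: Monad m => m Int -> m Int -> m Int
foo x y = do
  a <- y
  b <- x
  return (a + b)

main :: IO ()
main = do
  print (foo (Just 3) (Just 5))                            -- Just 8
  print (foo (Just 3) Nothing)                             -- Nothing
  print (foo (Right 3) (Left "foo") :: Either String Int)  -- Left "foo"
  print (foo [1,2] [3,4])                                  -- [4,5,5,6]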
Part 1: no need to go into the weeds
There is actually a very simple, easy to grasp, intuition behind monads: they encode the order of stuff happening. Like, first do this thing, then do the other thing, then do the third thing. For example:
executeMadDoctrine = do
    wait oneYear
    s <- evaluatePoliticalSituation
    case s of
        Stable -> do
            printInNewspapers "We're going to live another day"
            executeMadDoctrine -- recursive call
        Unstable -> do
            printInNewspapers "Run for your lives"
            launchMissiles
            return ()
Or a slightly more realistic (and also compilable and executable) example:
main = do
    putStrLn "What's your name?"
    name <- getLine
    if name == "EXIT" then
        return ()
    else do
        putStrLn $ "Hi, " <> name
        main
Simple. Just like Python. The human brain does, indeed, work exactly like this.
You see, you don't need to know how it all works inside, unless you start doing more advanced things. After all, you're probably not thinking about the order in which the cylinders fire every time you start your car, are you? You just hit the gas and it goes. It's the same with do.
Part 2: you picked a bad example
The example you picked in your previous question is not the best candidate for this stuff. The Monad instance for functions is indeed a bit brain-wrecking. Even I have to make a little effort to understand what's going on - and I've been doing Haskell professionally for quite some time.
The trouble here is mathematics. The bloody thing turns out to be unreasonably effective time after time, especially when nobody asks it to be.
Think about this: first we had perfectly good natural numbers that we could very well understand. I have two eyes, and you have one sword, I better run. But then it turned out that we need zero. Why the bloody hell do we need it? It's sacrilege! You can't write down something that isn't! But it turns out you have to have it. It unambiguously follows from the other stuff we know is true. And then we got irrational numbers. WTF is that? How do I even understand it? I can't have π oranges after all, can I? But they too must exist. It just follows. No way around it. And then complex numbers, transcendental, hypercomplex, unconstructible... My brain is boiling at this point.
It's sort of the same with monads: there is this peculiar mathematical object, and at some point somebody noticed that it's very good for expressing the order of computation, so we appropriated monads for that. But then it turns out that all kinds of things can be made to look like monads, mathematically speaking. No way around it, it just is.
And so we have all these funny instances. And the do notation still works for them, because they're monads (mathematically speaking), but it's no longer about order. Like, did you know that lists were monads too? But just like with functions, the interpretation for lists is not "order", it's nested loops. And if you combine lists with something else, you get non-determinism. Fun stuff.
But just like with different kinds of numbers, you can learn. You can build up intuition over time. Do you absolutely have to? See part 1.
Any long do chains can be re-arranged into the equivalent binary do, by the associativity law of monads, grouping everything on the right, as
do { A ; B ; C ; ... }
===
do { A ; r <- do { B ; C ; ... } ; return r }.
So we only need to understand this binary do form to understand everything else. And that is expressed as single >>= combination.
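As a concrete instance of that grouping (my example, not the answer's):
do { x <- getLine ; y <- getLine ; return (x ++ y) }
===
do { x <- getLine ; r <- do { y <- getLine ; return (x ++ y) } ; return r }
Both desugar to getLine >>= \x -> getLine >>= \y -> return (x ++ y), up to the right-identity law.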
Then, treat do code's interpretation (for a particular monad) axiomatically instead, as a bunch of re-write rules. Convince yourself about the validity of those rules for a particular monad just once (yes, using possibly extensive >>=-based re-writes, once).
So, for the Reader monad from the question's linked entry,
(do { S f }) x  ===  f x

(do { a <- S f ;
      return (h a) }) x  ===  let {a = f x} in h a
                         ===  h (f x)

(do { a <- S f ;
      b <- S g ;
      return (h a b) }) x  ===  let {a = f x ; b = g x} in h a b
                           ===  h (f x) (g x)
and any longer chain of lets is expressible as nested binary lets, equivalently.
The last one is liftM2 actually, so an argument could be made that understanding a particular monad means understanding its particular liftM2 (*), really.
And those Ss we end up just ignoring as noise, forced on us by Haskell's syntax (well, that question didn't use them at all, but it could have).
(*) more precisely, liftBind, (do { a <- S f ; b <- k a ; return (h a b) }) x === let {a = f x ; b = g x } in h a b where (S g) x === k a x. (specifically, this, after the words "the long version")
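A runnable check of that liftM2 reading for the function (reader) monad, using the (*2) and (+10) functions from the linked question, with the S wrapper dropped since Haskell's instance is defined directly on functions (my sketch):
import Control.Monad (liftM2)

addBoth :: Int -> Int
addBoth = liftM2 (+) (*2) (+10)   -- same as \x -> (x*2) + (x+10)

main :: IO ()
main = print (addBoth 3)          -- 19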
And so, your attitude of "When I see the do notation I simply translate it to a bunch of >>=. I don't see it as a separate way of understanding code" could actually be the problem.
do notation is your friend. Personally, I first hated it, then learned to love it, and now I see the >>=-based re-writes as its (low-level) implementation, more and more.
And, even more abstractly, do can equivalently be written as Monad Comprehensions, looking just like list comprehensions!
@chepner has already included in his answer quite a lot of what I would have said, but I wish to emphasise another aspect, which I feel is quite pertinent to this question: the fact that do notation is, for most developers, a much easier and more easily understandable way to work with any monadic expression which is at least moderately complex.
The reason for this is that, in an almost miraculous way, do blocks end up very much resembling code written in an imperative language. Imperative style code is much easier to understand for the majority of developers, and not only because it's by far the most common paradigm: it gives an explicit "recipe" for what a piece of code is doing, whereas more typical Haskell expressions, particularly monadic ones involving nested lambdas and >>= everywhere, very easily become difficult to comprehend.
In saying this I certainly do not mean that one should code in an imperative language as opposed to Haskell. The advantages of the pure functional style are well documented and seemingly well understood by the OP, so I will not go into them here. But Haskell's do notation allows one to write code in an imperative-looking "style", which therefore is explicit and easier to comprehend - at least on a small scale - while sacrificing none of the advantages of using a pure functional language.
This "imperative style" of do notation is, I feel, more visible with some monads than others, and I wish to illustrate my point with examples from a couple of monads which I find suit the "imperative style" well. First, IO, where I can give this simple example:
greet :: IO ()
greet = do
    putStrLn "Hello, what is your name?"
    name <- getLine
    putStrLn $ "Pleased to meet you, " ++ name ++ "!"
I hope it's immediately obvious what this code does, when executed by the Haskell runtime. What I wish to emphasise is how similar it is to imperative code, for example, this Python translation (which is not the most idiomatic, but has been chosen to exactly match the Haskell code line for line).
def greet():
    print("Hello, what is your name?")
    name = input()
    print("Pleased to meet you, " + name + "!")
Now ask yourself, how easy would the code be to understand in its desugared form, without do?
greet = putStrLn "Hello, what is your name?" >> getLine >>= \name -> putStrLn $ "Pleased to meet you, " ++ name ++ "!"
It's not particularly difficult, granted - but I hope you agree that it's much more "noisy" than the do block above. I can't speak for others, but I very much doubt I'm alone in saying that the latter version might take me 10-20 seconds or so to fully comprehend, whereas the do block is instantly comprehensible. And this of course is an extremely simple action - anything more complicated, as found in many real-world applications, makes the difference in comprehensibility much greater.
I have chosen IO for a reason, of course - I think it's in dealing with IO in particular that it's most natural to think in terms of "do this action, then if the result is that then do the next action, otherwise...". While the semantics of the IO monad fits this perfectly, it's much easier to translate the code into something like that when written in quasi-imperative notation than it is to use >>= directly. And the do notation is easier to write, too.
But although IO is the clearest example of this, it's certainly not the only one. Another great example is the State monad. Here's a simple example of using it to find the sum of a list of integers (and I know you wouldn't actually do that this way, but it's just a very simple example of some not-totally-trivial code that used this monad):
import Control.Monad.State

sumList :: State [Int] Int
sumList = go 0
    where go subtotal = do
              remaining <- get
              case remaining of
                  [] -> return subtotal
                  (x:xs) -> do
                      put xs
                      go $ subtotal + x
Here, in my opinion, the steps are very clear - the auxiliary function go successively adds the first element of the list to the running total, while updating the internal state with the tail of the list. When there is no more list, it returns the running total. (Given the above, the function evalState sumList will take an actual list and sum it.)
One can probably come up with better examples (particularly ones where the calculation involved isn't trivial to do in other ways), but my point is hopefully still clear: rewriting the above with >>= and lambdas would make it much less comprehensible.
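To make that concrete, here is roughly what the same function looks like once desugared into >>= and lambdas (my sketch, assuming the same import; evalState sumList [1,2,3,4] still evaluates to 10):
sumList :: State [Int] Int
sumList = go 0
    where go subtotal =
              get >>= \remaining ->
                  case remaining of
                      [] -> return subtotal
                      (x:xs) -> put xs >> go (subtotal + x)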
do notation is, in my opinion, why the often-quoted quip about Haskell being "the world's finest imperative language" has more than a grain of truth. By using and defining different monads one can write easily understandable "imperative" code in a wide variety of situations - while still having guarantees that various functions can't, for example, change global state. It's in many ways the best of both worlds.

Haskell: Get a prefix operator that works without parentheses

One big reason prefix operators are nice is that they can avoid the need for parentheses so that + - 10 1 2 unambiguously means (10 - 1) + 2. The infix expression becomes ambiguous if parens are dropped, which you can do away with by having certain precedence rules but that's messy, blah, blah, blah.
I'd like to make Haskell use prefix operations but the only way I've seen to do that sort of trades away the gains made by getting rid of parentheses.
Prelude> (-) 10 1
uses two parens.
It gets even worse when you try to compose functions because
Prelude> (+) (-) 10 1 2
yields an error, I believe because it's trying to feed the minus operation into the plus operation rather than first evaluating the minus and then feeding--so now you need even more parens!
Is there a way to make Haskell intelligently evaluate prefix notation? I think if I made functions like
Prelude> let p x y = x+y
Prelude> let m x y = x-y
I would recover the initial gains on fewer parens but function composition would still be a problem. If there's a clever way to join this with $ notation to make it behave at least close to how I want, I'm not seeing it. If there's a totally different strategy available I'd appreciate hearing it.
I tried reproducing what the accepted answer did here:
Haskell: get rid of parentheses in liftM2
but in both a Prelude console and a Haskell script, the import command didn't work. And besides, this is more advanced Haskell than I'm able to understand, so I was hoping there might be some other simpler solution anyway before I do the heavy lifting to investigate whatever this is doing.
It gets even worse when you try to compose functions because
Prelude> (+) (-) 10 1 2
yields an error, I believe because it's trying to feed the minus operation into the plus operation rather than first evaluating the minus and then feeding--so now you need even more parens!
Here you raise exactly the key issue that's a blocker for getting what you want in Haskell.
The prefix notation you're talking about is unambiguous for basic arithmetic operations (more generally, for any set of functions of statically known arity). But you have to know that + and - each accept 2 arguments for + - 10 1 2 to be unambiguously resolved as +(-(10, 1), 2) (where I've used explicit argument lists to denote every call).
But ignoring the specific meaning of + and -, the first function taking the second function as an argument is a perfectly reasonable interpretation! For Haskell, rather than arithmetic, we need to support higher order functions like map. You would want not not x to turn into not(not(x)), but map not x to turn into map(not, x).
And what if I had f g x? How is that supposed to work? Do I need to know what f and g are bound to so that I know whether it's a case like not not x or a case like map not x, just to know how to parse the call structure? Even assuming I have all the code available to inspect, how am I supposed to figure out what things are bound to if I can't know what the call structure of any expression is?
You'd end up needing to invent disambiguation syntax like map (not) x, wrapping not in parentheses to disable its ability to act like an arity-1 function (much like Haskell's actual syntax lets you wrap operators in parentheses to disable their ability to act like an infix operator). Or use the fact that all Haskell functions are arity-1, but then you have to write (map not) x and your arithmetic example has to look like (+ ((- 10) 1)) 2. Back to the parentheses!
The truth is that the prefix notation you're proposing isn't unambiguous. Haskell's normal function syntax (without operators) is; the rule is you always interpret a sequence of terms like foo bar baz qux etc as ((((foo) bar) baz) qux) etc (where each of foo, bar, etc can be an identifier or a sub-term in parentheses). You use parentheses not to disambiguate that rule, but to group terms to impose a different call structure than that hard rule would give you.
Infix operators do complicate that rule, and they are ambiguous without knowing something about the operators involved (their precedence and associativity, which unlike arity is associated with the name not the actual value referred to). Those complications were added to help make the code easier to understand; particularly for the arithmetic conventions most programmers are already familiar with (that + is lower precedence than *, etc).
If you don't like the additional burden of having to memorise the precedence and associativity of operators (not an unreasonable position), you are free to use a notation that is unambiguous without needing precedence rules, but it has to be Haskell's prefix notation, not Polish prefix notation. And whatever syntactic convention you're using, in any language, you'll always have to use something like parentheses to indicate grouping where the call structure you need is different from what the standard convention would indicate. So:
(+) ((-) 10 1) 2
Or:
plus (minus 10 1) 2
if you define non-operator function names.
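A small side-by-side sketch of those two options (my example):
plus, minus :: Int -> Int -> Int
plus  = (+)
minus = (-)

main :: IO ()
main = do
  print ((+) ((-) 10 1) 2)      -- 11: operators used prefix, parenthesised
  print (plus (minus 10 1) 2)   -- 11: named prefix functions, same grouping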

Pros / Cons of Tacit Programming in J

As a beginner in J I am often confronted with tacit programs which seem quite byzantine compared to the more familiar explicit form.
Now just because I find interpretation hard does not mean that the tacit form is incorrect or wrong. Very often the tacit form is considerably shorter than the explicit form, and thus easier to visually see all at once.
Question to the experts : Do these tacit forms convey a better sense of structure, and maybe distil out the underlying computational mechanisms ? Are there other benefits ?
I'm hoping the answer is yes, and true for some non-trivial examples...
Tacit programming is usually faster and more efficient, because you can tell J exactly what you want to do, instead of making it find out as it goes along your sentence. But as someone loving the hell out of tacit programming, I can also say that tacit programming encourages you to think about things in the J way.
To spoil the ending and answer your question: yes, tacit programming can and does convey information about structure. Technically, it emphasizes meaning above all else, but many of the operators that feature prominently in the less-trivial expressions you'll encounter (#: & &. ^: to name a few) have very structure-related meanings.
The canonical example of why it pays to write tacit code is the special code for modular exponentiation, along with the assurance that there are many more shortcuts like it:
ts =: 6!:2, 7!:2@] NB. time and space
100 ts '2 (1e6&| # ^) 8888x'
2.3356e_5 16640
100 ts '1e6 | 2 ^ 8888x'
0.00787232 8.496e6
The other major thing you'll hear said is that when J sees an explicit definition, it has to parse and eval it every single time it applies it:
NB. use rank 0 to apply the verb a large number of times
100 ts 'i (4 : ''x + y + x * y'')"0 i=.i.100 100' NB. naive
0.0136254 404096
100 ts 'i (+ + *)"0 i=.i.100 100' NB. tacit
0.00271868 265728
NB. J is spending the time difference reinterpreting the definition each time
100 ts 'i (4 : ''x (+ + *) y'')"0 i=.i.100 100'
0.0136336 273024
But both of these reasons take a backseat to the idea that J has a very distinct style of solving problems. There is no if, there is ^:. There is no looping, there is rank. Likewise, Ken saw beauty in the fact that in calculus, f+g was the pointwise sum of functions—indeed, one defines f+g to be the function where (f+g)(x) = f(x) + g(x)—and since J was already so good at pointwise array addition, why stop there?
Just as a language like Haskell revels in the pleasure of combining higher-order functions together instead of "manually" syncing them up end to end, so does J. Semantically, take a look at the following examples:
h =: 3 : '(f y) + g y' – h is a function that grabs its argument y, plugs it into f and g, and funnels the results into a sum.
h =: f + g – h is the sum of the functions f and g.
(A < B) +. (A = B) – "A is less than B or A is equal to B."
A (< +. =) B – "A is less than or equal to B."
It's a lot more algebraic. And I've only talked about trains thus far; there's a lot to be said about the handiness of tools like ^: or &.. The lesson is fairly clear, though: J wants it to be easy to talk about your functions algebraically. If you had to wrap all your actions in a 3 :'' or 4 :''—or worse, name them on a separate line!—every time you wanted to apply them interestingly (like via / or ^: or ;.) you'd probably be very turned off from J.
Sure, I admit you will be hard-pressed to find examples as elegant as these as your expressions get more complex. The tacit style just takes some getting used to. The vocab has to be familiar (if not second nature) to you, and even then sometimes you have the pleasure of slogging through code that is simply inexcusable. This can happen with any language.
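To make the Haskell comparison from a few paragraphs up concrete: the train h =: f + g has a rough Haskell analogue in the function Applicative, which also combines two functions pointwise without ever naming the argument (my sketch, not from the answer):
import Control.Applicative (liftA2)

h :: Double -> Double
h = liftA2 (+) sin cos   -- h x == sin x + cos x

main :: IO ()
main = print (h 0)       -- 1.0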
Not an expert, but the biggest positive aspects of coding in tacit for me are 1) that it makes it a little easier to write programs that write programs and 2) it is a little easier for me to grasp the J way of approaching problems (which is a big part of why I like to program with J). Explicit feels more like procedural programming, especially if I am using control words such as if., while. or select. .
The challenges are that 1) explicit code sometimes runs faster than tacit, but this is dependent on the task and the algorithm and 2) tacit code is interpreted as it is parsed and this means that there are times when explicit code is cleaner because you can leave the code waiting for variable values that are only defined at run time.

what's the meaning of "you do computations in Haskell by declaring what something is instead of declaring how you get it"?

Recently I have been trying to learn a functional programming language, and I chose Haskell.
Now I am reading Learn You a Haskell, and here is a description that seems to capture Haskell's philosophy; I am not sure I understand it exactly: you do computations in Haskell by declaring what something is instead of declaring how you get it.
Suppose I want to get the sum of a list.
In a "declaring how you get it" way:
get the total sum by adding all the elements, so the code will be like this (not Haskell, Python):
sum = 0
for i in l:
    sum += i
print sum
In a "what something is" way:
the total sum is the first element plus the sum of the rest of the elements, so the code will be like this:
sum' :: (Num a) => [a] -> a
sum' [] = 0
sum' (x:xs) = x + sum' xs
But I am not sure whether I get it or not. Can someone help? Thanks.
Imperative and functional are two different ways to approach problem solving.
Imperative (Python) gives you actions which you need to use to get what you want. For example, you may tell the computer "knead the dough. Then put it in the oven. Turn the oven on. Bake for 10 minutes.".
Functional (Haskell, Clojure) gives you solutions. You'd be more likely to tell the computer "I have flour, eggs, and water. I need bread". The computer happens to know dough, but it doesn't know bread, so you tell it "bread is dough that has been baked". The computer, knowing what baking is, knows now how to make bread. You sit at the table for 10 minutes while the computer does the work for you. Then you enjoy delicious bread fresh from the oven.
You can see a similar difference in how engineers and mathematicians work. The engineer is imperative, looking at the problem and giving workers a blueprint to solve it. The mathematician defines the problem (solve for x) and the solution (x = -----) and may use any number of tried and true solutions to smaller problems (2x - 1 = ----- => 2x = ----- + 1) until he finally finds the desired solution.
It is not a coincidence that functional languages are used largely by people in universities, not because it is difficult to learn, but because there are not many mathematical thinkers outside of universities. In your quotation, they tried to define this difference in thought process by cleverly using how and what. I personally believe that everybody understands words by turning them into things they already understand, so I'd imagine my bread metaphor should clarify the difference for you.
EDIT: It is worth noting that when you imperatively command the computer, you don't know if you'll have bread at the end (maybe you cooked it too long and it's burnt, or you didn't add enough flour). This is not a problem in functional languages where you know exactly what each solution gives you. There is no need for trial and error in a functional language because everything you do will be correct (though not always useful, like accidentally solving for t instead of x).
The missing part of the explanations is the following.
The imperative example shows you step by step how to compute the sum. At no stage can you convince yourself that it is indeed the sum of the elements of a list. For example, there is no knowing why sum = 0 at first, whether it should be 0 at all, whether you loop through the right indices, or what sum += i gives you.
sum = 0      -- why? it may become clear if you consider what happens in the loop,
             -- but not on its own
for i in l:
    sum += i -- what do we get? it will become clear only after the loop ends
             -- at no step of the iteration do you have *the sum of the list*,
             -- so the step on its own is not meaningful
The declarative example is very different in this respect. In this particular case you start with declaring that the sum of an empty list is 0. This is already part of the answer of what the sum is. Then you add a statement about non-empty lists - a sum for a non-empty list is the sum of the tail with the head element added to it. This is the declaration of what the sum is. You can demonstrate inductively that it finds the solution for any list.
Note this proof part. In this case it is obvious. In more complex algorithms it is not obvious, so the proof of correctness is a substantial part - and remember that the imperative case only makes sense as a whole.
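For instance, unfolding sum' from the question on a two-element list shows what the declaration amounts to (just an illustration):
sum' [2,3] = 2 + sum' [3]
           = 2 + (3 + sum' [])
           = 2 + (3 + 0)
           = 5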
Another way to compute the sum, where, hopefully, the declarativeness and provability become clearer:
sum [] = 0                 -- the sum of the empty list is 0
sum [x] = x                -- the sum of the list with 1 element is that element
sum xs = sum $ p xs where  -- the sum of any other list is
                           -- the sum of the list reduced with p
  p (x:y:xs) = x+y : p xs  -- p reduces the list by replacing a pair of elements
                           --   with their sum
  p xs = xs                -- if there was no pair of elements, leave the list as is
Here we can convince ourselves that: 1. p makes the list ever shorter, so the computation of the sum will terminate; 2. p produces a list of sums, so by summing ever shorter lists we get a list of just one element; 3. because (+) is associative, the value produced by repeatedly applying p is the same as the sum of all elements in the original list; 4. the nesting of the applications of (+) is only logarithmically deep, rather than linear as in the straightforward implementation (the total number of additions stays the same).
In other words, the order of adding the elements doesn't matter, so we can sum the elements ([a,b,c,d,e]) in pairs first (a+b, c+d), which gives us a shorter list [a+b,c+d,e], whose sum is the same as the sum of the original list, and which now can be reduced in the same way: [(a+b)+(c+d),e], then [((a+b)+(c+d))+e].
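Concretely, with the sum and p defined above (my trace):
p [1,2,3,4,5]   == [3,7,5]
sum [1,2,3,4,5] == sum [3,7,5] == sum [10,5] == sum [15] == 15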
Robert Harper claims in his blog that "declarative" has no meaning. I suppose he is talking about a clear definition there, which I usually think of as more narrow than meaning, but the post is still worth checking out, and it hints that you might not find as clear an answer as you would wish.

Still, everybody talks about "declarative", and it feels like when we do we usually talk about the same thing. I.e. give a number of people two different apis/languages/programs and ask them which is the most declarative one, and they will usually pick the same.
The confusing part to me at first was that your declarative sum
sum' [] = 0
sum' (x:xs) = x + sum' xs
can also be seen as an instruction on how to get the result. It's just a different one.

It's also worth noting that the function sum in the prelude isn't actually defined like that, since that particular way of calculating the sum is inefficient. So clearly something is fishy.

So, the "what, not how" explanation seems unsatisfactory to me. I think of it instead as declarative being a "how" which in addition has some nice properties. My current intuition about what those properties are is something similar to:
A thing is more declarative if it doesn't mutate any state.
A thing is more declarative if you can do mathy transformations on it and the meaning of the thing sort of remains intact. So given your declarative sum again, if we knew that + is commutative there is some justification for thinking that writing it like sum' xs + x should yield the same result.

A declarative thing can be decomposed into smaller things and still have some meaning. Like x and sum' xs still have the same meaning when taken separately, but trying to do the same with the sum += x of Python doesn't work as well.

A thing is more declarative if it's independent of the flow of time. For example, css doesn't describe the styling of a web page at page load. It describes the styling of the web page at any time, even if the page would change.

A thing is more declarative if you don't have to think about program flow.

Other people might have different intuitions, or even a definition that I'm not aware of, but hopefully these are somewhat helpful regardless.

Resources