Haskell function definition and caching arrays

I have a question about implementing caching (memoization) using arrays in Haskell. The following pattern works:
f = (fA !)
  where fA = listArray ...
But this does not (the speed of the program suggests that the array is getting recreated each call or something):
f n = (fA ! n)
  where fA = listArray ...
Defining fA outside of a where clause (in "global scope") also works with either pattern.
I was hoping that someone could point me towards a technical explanation of what the difference between the above two patterns is.
Note that I am using the latest GHC, and I'm not sure if this is just a compiler peculiarity or part of the language itself.
EDIT: ! is used for array access, so fA ! 5 means fA[5] in C++ syntax. My understanding of Haskell is that (fA !) n would be the same as (fA ! n)...also it would have been more conventional for me to have written "f n = fA ! n" (without the parentheses). Anyway, I get the same behaviour no matter how I parenthesize.

The difference in behavior is not specified by the Haskell standard. All it has to say is that the functions are the same (will result in the same output given the same input).
However in this case there is a simple way to predict time and memory performance that most compilers adhere to. Again I stress that this is not essential, only that most compilers do it.
First rewrite your two examples as pure lambda expressions, expanding the section:
f = let fA = listArray ... in \n -> fA ! n
f' = \n -> let fA = listArray ... in fA ! n
Compilers use let binding to indicate sharing. The guarantee is that in a given environment (set of local variables, lambda body, something like that), the right side of a let binding with no parameters will be evaluated at most once. The environment of fA in the former is the whole program since it is not under any lambda, but the environment of the latter is smaller since it is under a lambda.
What this means is that in the latter, fA may be evaluated once for each different n, whereas in the former this is forbidden.
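For instance, here is a self-contained sketch of the working pattern: a memoized Fibonacci in which fA is bound outside the implicit lambda over n, so the array is shared across all calls (the bound of 90 is just illustrative):

```haskell
import Data.Array

-- fib = (fA !) means fA is evaluated at most once for the whole program;
-- writing fib n = fA ! n instead would allow fA to be rebuilt per call
fib :: Int -> Integer
fib = (fA !)
  where
    fA :: Array Int Integer
    fA = listArray (0, 90) [ f i | i <- [0..90] ]
    f 0 = 0
    f 1 = 1
    f i = fA ! (i - 1) + fA ! (i - 2)

main :: IO ()
main = print (fib 50)  -- 12586269025
```

Each entry of fA refers back to earlier entries of the same shared array, which is exactly what makes the memoization work.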
We can see this pattern in effect even with multi argument functions:
g x y = (a !! y) where a = [ x ^ y' | y' <- [0..] ]
g' x = (\y -> a !! y) where a = [ x ^ y' | y' <- [0..] ]
Then in:
let k = g 2 in k 100 + k 100
We might compute 2^100 more than once, but in:
let k = g' 2 in k 100 + k 100
We will only compute it once.
If you are doing work with memoization, I recommend data-memocombinators on Hackage, which is a library of memo tables of different shapes, so you don't have to roll your own.

The best way to find what is going on is to tell the compiler to output its intermediate representation with -v4. The output is voluminous and a bit hard to read, but should allow you to find out exactly what the difference in the generated code is, and how the compiler arrived there.
You will probably notice that fA is being moved outside the function (to the "global scope") on your first example. On your second example, it probably is not (meaning it will be recreated on each call).
One possible reason for it not being moved outside the function would be because the compiler is thinking it depends on the value of n. On your working example, there is no n for fA to depend on.
But the reason I think the compiler is avoiding moving fA outside on your second example is because it is trying to avoid a space leak. Consider what would happen if fA, instead of your array, were an infinite list (on which you used the !! operator). Imagine you called it once with a large number (for instance f 10000), and later only called it with small numbers (f 2, f 3, f 12...). The 10000 elements from the earlier call are still on memory, wasting space. So, to avoid this, the compiler creates fA again every time you call your function.
The space leak avoidance probably does not happen on your first example because in that case f is in fact only called once, returning a closure (we are now at the frontier of the pure functional and the imperative worlds, so things get a bit more subtle). This closure replaces the original function, which will never be called again, so fA is only called once (and thus the optimizer feels free to move it outside the function). On your second example, f does not get replaced by a closure (since its value depends on the argument), and thus will get called again.
If you want to try to understand more of this (which will help reading the -v4 output), you could take a look at the Spineless Tagless G-Machine paper (citeseer link).
As to your final question, I think it is a compiler peculiarity (but I could be wrong). However, it would not surprise me if all compilers did the same thing, even if it were not part of the language.

Cool, thank you for your answers which helped a lot, and I will definitely check out data-memocombinators on Hackage. Coming from a C++-heavy background, I've been struggling with understanding exactly what Haskell will do (mainly in terms of complexity) with a given program, which tutorials don't seem to get in to.

Related

What is the difference between normal and lambda Haskell functions?

I'm a beginner to Haskell and I've been following the e-book Get Programming with Haskell
I'm learning about closures with lambda functions, but I fail to see the difference in the following code:
genIfEven :: Integral p => p -> (p -> p) -> p
genIfEven x = (\f -> isEven f x)
genIfEven2 :: Integral p => (p -> p) -> p -> p
genIfEven2 f x = isEven f x
It would be great if anyone could explain what the precise difference here is
At a basic level [1], there isn't really a difference between "normal" functions and ones created with lambda syntax. What made you think there was a difference to ask what it is? (In the particular example you've shown, the functions take their parameters in a different order, but are otherwise the same; either of them could be defined with lambda syntax or "normal" syntax.)
Functions are first class values in Haskell. Which means you can pass them to other functions, return them as results, store and retrieve them in data structures, etc, etc. Just like you can with numbers, or strings, or any other value.
Just like with numbers, strings, etc, it's helpful to have syntax for denoting a function value, because you might want to make a simple one right in the middle of other code. It would be pretty horrible if you, say, needed to pass x + 1 to some function and you couldn't just write the literal 1 for the number one, you had to instead go elsewhere in the file and add a one = 1 binding so that you could come back and write x + one. In exactly the same way, you might need to pass to some other function a function for adding 1; it would be annoying to go elsewhere in the file and add a separate definition plusOne x = x + 1, so lambda syntax gives us a way of writing "function literals": \x -> x + 1. [2]
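For instance, a tiny runnable illustration of passing a function literal directly, instead of naming a helper first:

```haskell
main :: IO ()
main = do
  print (map (\x -> x + 1) [1, 2, 3 :: Int])  -- [2,3,4]
  print (map (+1) [1, 2, 3 :: Int])           -- the section form, same result
```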
Considering "normal" function definition syntax, like this:
incrementAllBy _ [] = []
incrementAllBy n (x:xs) = (x + n) : incrementAllBy n xs
Here we don't have any bit of source code that just represents the function value that incrementAllBy is a name for. The function is implied in this syntax, spread over (possibly) multiple "rules" that say what value our function returns given that it is applied to arguments of a certain form. This syntax also fundamentally forces us to bind the function to a name. All of that is in contrast to lambda syntax which just directly represents the function itself, with no bundled case analysis or name.
However they are both merely different ways of writing down functions. Once defined, there is no difference between the functions, whichever syntax you used to express them.
You mentioned that you were learning about closures. It's not really clear how that relates to the question, but I can guess it's being a bit confusing.
I'm going to say something slightly controversial here: you don't need to learn about closures. [3]
A closure is what is involved in making something like incrementAllBy n xs = map (\x -> x + n) xs work. The function created here \x -> x + n depends on n, which is a parameter so it can be different every time incrementAllBy is called, and there can be multiple such calls running at the same time. So this \x -> x + n function can't end up as just a chunk of compiled code at a particular address in the program's binary, the way top-level functions are. The structure in memory that is passed to map has to either store a copy of n or store a reference to it. Such a structure is called a "closure", and is said to have "closed over" n, or "captured" it.
In Haskell, you don't need to know any of that. I view the expression \x -> x + n as simply creating a new function value, depending on the value n (and also the value +, which is a first-class value too!) that happens to be in scope. I don't think you need to think about this any differently than you would think about the expression x + n creating a new numeric value depending on a local n. Closures in Haskell only matter when you're trying to understand how the language is implemented, not when you're programming in Haskell.
Closures do matter in imperative languages. There the question of whether (the equivalent of) \x -> x + n stores a reference to n or a copy of n (and when the copy is taken, and how deeply) is critical to understanding how code using this function works, because n isn't just a way of referring to a value, it's a variable that has (potentially) different values over time.
But in Haskell I don't really think we should teach beginners about the term or concept of closures. It vastly over-complicates "you can make new functions out of existing values, even ones that are only locally in scope".
So if you have been given these two functions as examples to try and illustrate the concept of closures, and it isn't making sense to you what difference this "closure" makes, you can probably ignore the whole issue and move on to something more important.
[1] Sometimes which choice of "equivalent" syntax you use to write your code does affect operational behaviour, like performance. Usually (but not always) the effect is negligible. As a beginner, I highly recommend ignoring such issues for now, so I haven't mentioned them. It's much easier to learn the principles involved in reasoning about how your code might be executed once you've got a thorough understanding of what all the language elements are supposed to mean.
It can sometimes also affect the way GHC infers types (mostly not actually the fact that they're lambdas, but if you bind function names without syntactic parameters as in plusOne = \x -> x + 1 you can trip up the monomorphism restriction, but that's another topic covered in many Stack Overflow questions, so I won't address it here).
[2] In this case you could also use an operator section to write an even simpler function literal as (+1).
[3] Now I'm going to teach you about closures so I can explain why you don't need to know about closures. :P
There is no difference whatsoever, except one: lambda expressions don't need a name. For that reason, they are sometimes called "anonymous functions" in other languages.
If you plan to use a function often, you'll want to give it a name, if you only need it once, a lambda will usually do, as you can define it at the site where you use it.
You can, of course, name an anonymous function after it is born:
genIfEven2 = \f x -> isEven f x
That would be completely equivalent to your definition.
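To make that equivalence runnable, here's a self-contained sketch. Note that isEven is not shown in the question, so the definition below is a plausible stand-in, not the book's actual code:

```haskell
-- hypothetical stand-in for the book's isEven:
-- apply f only when x is even, otherwise return x unchanged
isEven :: (Int -> Int) -> Int -> Int
isEven f x = if even x then f x else x

genIfEven2 :: (Int -> Int) -> Int -> Int
genIfEven2 f x = isEven f x          -- "normal" definition syntax

genIfEven2' :: (Int -> Int) -> Int -> Int
genIfEven2' = \f x -> isEven f x     -- lambda syntax; the very same function

main :: IO ()
main = print (genIfEven2 (*3) 4, genIfEven2' (*3) 4)  -- (12,12)
```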

How to quickly read the do notation without translating to >>= compositions?

This question is related to this post: Understanding do notation for simple Reader monad: a <- (*2), b <- (+10), return (a+b)
I don't care if a language is hard to understand if it promises to solve some problems that easy-to-understand languages give us. I've been promised that the impossibility of changing state in Haskell (and other functional languages) is a game changer, and I do believe that. I've had too many bugs in my code related to state, and I totally agree with this post that reasoning about the interaction of objects in OOP languages is near impossible because they can change state, and thus in order to reason about code we should consider all the possible permutations of these states.
However, I've been finding that reasoning about Haskell monads is also very hard. As you can see in the answers to the question I linked, we need a big diagram to understand 3 lines of the do notation. I always end up opening stackedit.io to desugar the do notation by hand and write step by step the >>= applications of the do notation in order to understand the code.
The problem is more or less like this: in the majority of cases, when we have S a >>= f, we have to unwrap a from S and apply f to it. However, f is actually another thing, more or less of the form S a >>= g, which we also have to unwrap, and so on. The human brain doesn't work like that: we can't easily apply these things in the head and stop, keep them on the brain's stack, and continue applying the rest of the >>= until we reach the end. When the end is reached, we get all those things stored on the brain's stack and glue them together.
Therefore, I must be doing something wrong. There must be an easy way to understand '>>= composition' in the brain. I know that do notation is very simple, but I can only think of that as a way to easily write >>= compositions. When I see the do notation I simply translate it to a bunch of >>=. I don't see it as a separate way of understanding code. If there is a way, I'd like someone to tell me.
So the question is: how to read the do notation?
Given a simple code like
foo :: Monad m => m Int -> m Int -> m Int
foo x y = do
  a <- y -- I'm intentionally doing y first; see the Either example
  b <- x
  return (a + b)
you can't say much about <- except that it "gets" an Int value from x or y. What "get" means depends very much on what m is.
Some examples:
m ~ Maybe
foo (Just 3) (Just 5) evaluates to Just 8; replace either argument with Nothing, and you get Nothing. <- tries to get a value out of the Maybe Int value, but aborts the rest of the block if it fails.
m ~ Either a
Pretty much the same as Maybe, but replacing Nothing with the first Left value that it encounters. foo (Right 3) (Right 5) returns Right 8. foo x (Left "foo") returns Left "foo", whether x is a Right or Left value.
m ~ []
Now, instead of getting an Int, <- gets every Int from among the given choices. It does so nondeterministically; you can imagine that the function "forks" into multiple parallel copies, each one having chosen a different value from its list. In the end, the final result is a list of all the results that were computed.
foo [1,2] [3,4] returns [4, 5, 5, 6] ([3 + 1, 3 + 2, 4 + 1, 4 + 2]).
m ~ IO
This one is tricky, because unlike the previous monads we've looked at, there isn't necessarily a value yet to get. foo readLn readLn will return whatever the sum of the two numbers read from standard input is, with the possibility of a run-time error should the strings so read not be parseable as Int values.
You might think of it as working like the Maybe monad, but with run-time exceptions replacing Nothing.
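To make the examples above concrete, here is the same foo run at several of the monads just described, in one self-contained sketch:

```haskell
foo :: Monad m => m Int -> m Int -> m Int
foo x y = do
  a <- y      -- y is "gotten" first, as in the question
  b <- x
  return (a + b)

main :: IO ()
main = do
  print (foo (Just 3) (Just 5))                            -- Just 8
  print (foo Nothing (Just 5))                             -- Nothing
  print (foo (Right 3) (Left "foo") :: Either String Int)  -- Left "foo"
  print (foo [1,2] [3,4])                                  -- [4,5,5,6]
```

The same three lines of do notation "get" values in four completely different ways, depending only on the choice of m.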
Part 1: no need to go into the weeds
There is actually a very simple, easy to grasp, intuition behind monads: they encode the order of stuff happening. Like, first do this thing, then do the other thing, then do the third thing. For example:
executeMadDoctrine = do
  wait oneYear
  s <- evaluatePoliticalSituation
  case s of
    Stable -> do
      printInNewspapers "We're going to live another day"
      executeMadDoctrine -- recursive call
    Unstable -> do
      printInNewspapers "Run for your lives"
      launchMissiles
      return ()
Or a slightly more realistic (and also compilable and executable) example:
main = do
  putStrLn "What's your name?"
  name <- getLine
  if name == "EXIT" then
    return ()
  else do
    putStrLn $ "Hi, " <> name
    main
Simple. Just like Python. Human brain does, indeed, work exactly like this.
You see, you don't need to know how it all works inside, unless you start doing more advanced things. After all, you're probably not thinking about the order the cylinders fire in every time you start your car, are you? You just hit the gas and it goes. It's the same with do.
Part 2: you picked a bad example
The example you picked in your previous question is not the best candidate for this stuff. The Monad instance for functions is indeed a bit brain-wrecking. Even I have to make a little effort to understand what's going on - and I've been doing Haskell professionally for quite some time.
The trouble here is mathematics. The bloody thing turns out unreasonably effective time after time, especially when nobody is asking it to.
Think about this: first we had perfectly good natural numbers that we could very well understand. I have two eyes, and you have one sword, I better run. But then it turned out that we need zero. Why the bloody hell do we need it? It's sacrilege! You can't write down something that isn't! But it turns out you have to have it. It unambiguously follows from the other stuff we know is true. And then we got irrational numbers. WTF is that? How do I even understand it? I can't have π oranges after all, can I? But they too must exist. It just follows. No way around it. And then complex numbers, transcendental, hypercomplex, unconstructible... My brain is boiling at this point.
It's sort of the same with monads: there is this peculiar mathematical object, and at some point somebody noticed that it's very good for expressing the order of computation, so we appropriated monads for that. But then it turns out that all kinds of things can be made to look like monads, mathematically speaking. No way around it, it just is.
And so we have all these funny instances. And the do notation still works for them, because they're monads (mathematically speaking), but it's no longer about order. Like, did you know that lists were monads too? But just like with functions, the interpretation for lists is not "order", it's nested loops. And if you combine lists with something else, you get non-determinism. Fun stuff.
But just like with different kinds of numbers, you can learn. You can build up intuition over time. Do you absolutely have to? See part 1.
Any long do chains can be re-arranged into the equivalent binary do, by the associativity law of monads, grouping everything on the right, as
do { A ; B ; C ; ... }
===
do { A ; r <- do { B ; C ; ... } ; return r }.
So we only need to understand this binary do form to understand everything else. And that is expressed as a single >>= combination.
Then, treat do code's interpretation (for a particular monad) axiomatically instead, as a bunch of re-write rules. Convince yourself about the validity of those rules for a particular monad just once (yes, using possibly extensive >>=-based re-writes, once).
So, for the Reader monad from the question's linked entry,
(do { S f }) x === f x

(do { a <- S f ; return (h a) }) x === let { a = f x } in h a
                                   === h (f x)

(do { a <- S f ;
      b <- S g ;
      return (h a b) }) x === let { a = f x ; b = g x } in h a b
                          === h (f x) (g x)
and any longer chain of lets is expressible as nested binary lets, equivalently.
The last one is liftM2 actually, so an argument could be made that understanding a particular monad means understanding its particular liftM2 (*), really.
And those Ss, we end up just ignoring them as noise, forced on us by Haskell syntax (well, that question didn't use them at all, but it could).
(*) more precisely, liftBind, (do { a <- S f ; b <- k a ; return (h a b) }) x === let {a = f x ; b = g x } in h a b where (S g) x === k a x. (specifically, this, after the words "the long version")
And so, your attitude of "When I see the do notation I simply translate it to a bunch of >>=. I don't see it as a separate way of understanding code" could actually be the problem.
do notation is your friend. Personally, I first hated it, then learned to love it, and now I see the >>=-based re-writes as its (low-level) implementation, more and more.
And, even more abstractly, do can equivalently be written as Monad Comprehensions, looking just like list comprehensions!
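As a concrete check of those rewrite rules, here is the do block from the linked question run at the function monad, next to its hand-desugared form (a sketch; modern GHC exports the Monad ((->) r) instance from the Prelude, so no S wrapper is needed):

```haskell
-- the function (Reader-like) monad: each "<-" applies the function on
-- its right to the same shared argument x
foo :: Int -> Int
foo = do
  a <- (*2)      -- a = x * 2
  b <- (+10)     -- b = x + 10
  return (a + b)

-- the manually desugared form, following the rewrite rules:
-- (do { a <- f ; b <- g ; return (h a b) }) x === h (f x) (g x)
foo' :: Int -> Int
foo' x = let a = x * 2; b = x + 10 in a + b

main :: IO ()
main = print (foo 5, foo' 5)  -- (25,25)
```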
@chepner has already included in his answer quite a lot of what I would have said, but I wish to emphasise another aspect, which I feel is quite pertinent to this question: the fact that do notation is, for most developers, a much easier and more understandable way to work with any monadic expression which is at least moderately complex.
The reason for this is that, in an almost miraculous way, do blocks end up very much resembling code written in an imperative language. Imperative style code is much easier to understand for the majority of developers, and not only because it's by far the most common paradigm: it gives an explicit "recipe" for what a piece of code is doing, whereas more typical Haskell expressions, particularly monadic ones involving nested lambdas and >>= everywhere, very easily become difficult to comprehend.
In saying this I certainly do not mean that one should code in an imperative language as opposed to Haskell. The advantages of the pure functional style are well documented and seemingly well understood by the OP, so I will not go into them here. But Haskell's do notation allows one to write code in an imperative-looking "style", which therefore is explicit and easier to comprehend - at least on a small scale - while sacrificing none of the advantages of using a pure functional language.
This "imperative style" of do notation is, I feel, more visible with some monads than others, and I wish to illustrate my point with examples from a couple of monads which I find suit the "imperative style" well. First, IO, where I can give this simple example:
greet :: IO ()
greet = do
  putStrLn "Hello, what is your name?"
  name <- getLine
  putStrLn $ "Pleased to meet you, " ++ name ++ "!"
I hope it's immediately obvious what this code does, when executed by the Haskell runtime. What I wish to emphasise is how similar it is to imperative code, for example, this Python translation (which is not the most idiomatic, but has been chosen to exactly match the Haskell code line for line).
def greet():
    print("Hello, what is your name?")
    name = input()
    print("Pleased to meet you, " + name + "!")
Now ask yourself, how easy would the code be to understand in its desugared form, without do?
greet = putStrLn "Hello, what is your name?" >> getLine >>= \name -> putStrLn $ "Pleased to meet you, " ++ name ++ "!"
It's not particularly difficult, granted - but I hope you agree that it's much more "noisy" than the do block above. I can't speak for others, but I very much doubt I'm alone in saying that the latter version might take me 10-20 seconds or so to fully comprehend, whereas the do block is instantly comprehensible. And this of course is an extremely simple action - anything more complicated, as found in many real-world applications, makes the difference in comprehensibility much greater.
I have chosen IO for a reason, of course - I think it's in dealing with IO in particular that it's most natural to think in terms of "do this action, then if the result is that then do the next action, otherwise...". While the semantics of the IO monad fits this perfectly, it's much easier to translate the code into something like that when written in quasi-imperative notation than it is to use >>= directly. And the do notation is easier to write, too.
But although IO is the clearest example of this, it's certainly not the only one. Another great example is the State monad. Here's a simple example of using it to find the sum of a list of integers (and I know you wouldn't actually do that this way, but it's just a very simple example of some not-totally-trivial code that used this monad):
sumList :: State [Int] Int
sumList = go 0
  where
    go subtotal = do
      remaining <- get
      case remaining of
        []     -> return subtotal
        (x:xs) -> do
          put xs
          go $ subtotal + x
Here, in my opinion, the steps are very clear - the auxiliary function go successively adds the first element of the list to the running total, while updating the internal state with the tail of the list. When there is no more list, it returns the running total. (Given the above, the function evalState sumList will take an actual list and sum it.)
One can probably come up with better examples (particularly ones where the calculation involved isn't trivial to do in other ways), but my point is hopefully still clear: rewriting the above with >>= and lambdas would make it much less comprehensible.
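The State type used above lives in the mtl/transformers packages; to make the example fully self-contained, here is a minimal hand-rolled State monad together with sumList, as a sketch:

```haskell
-- a minimal State monad, standing in for Control.Monad.State
newtype State s a = State { runState :: s -> (a, s) }

instance Functor (State s) where
  fmap f (State g) = State $ \s -> let (a, s') = g s in (f a, s')

instance Applicative (State s) where
  pure a = State $ \s -> (a, s)
  State f <*> State g =
    State $ \s -> let (h, s') = f s; (a, s'') = g s' in (h a, s'')

instance Monad (State s) where
  State g >>= k = State $ \s -> let (a, s') = g s in runState (k a) s'

get :: State s s             -- read the current state
get = State $ \s -> (s, s)

put :: s -> State s ()       -- replace the current state
put s = State $ \_ -> ((), s)

evalState :: State s a -> s -> a
evalState m s = fst (runState m s)

sumList :: State [Int] Int
sumList = go 0
  where
    go subtotal = do
      remaining <- get
      case remaining of
        []     -> return subtotal
        (x:xs) -> do
          put xs
          go (subtotal + x)

main :: IO ()
main = print (evalState sumList [1..10])  -- 55
```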
do notation is, in my opinion, why the often-quoted quip about Haskell being "the world's finest imperative language" has more than a grain of truth. By using and defining different monads, one can write easily understandable "imperative" code in a wide variety of situations - while still having guarantees that various functions can't, for example, change global state. It's in many ways the best of both worlds.

How does the <<loop>> error "work" in detail?

I'm working on this tool where the user can define-and-include in [config files | content text-files | etc] their own "templates" (like mustache etc) and these can reference others so they can induce a loop. Just when I was about to create a "max-loops" setting I realized with runghc the program after a while just quits with farewell message of just <<loop>>. That's actually good enough for me but raises a few ponderations:
how does GHC or the runtime actually detect it's stuck in a loop, and how would it distinguish between a wanted long-running operation and an accidental infinite loop? The halting problem is still a thing last I checked..
any (time or iteration) limits that can be custom-set to the compiler or the runtime?
is this runghc-only or does it exist in all final compile outputs?
will any -o (optimization) flags set much later when building releases disable this apparent built-in loop detection?
All stuff I can figure out the hard way of course, but who knows maybe someone already looked into this in more detail.. (hard to google/ddg for "haskell" "<<loop>>" because they strip the angle brackets and then show results for "how to loop in Haskell" etc..)
This is a simple "improvement" of the STG runtime which was implemented in GHC. I'll share what I have understood, but GHC experts can likely provide more useful and accurate information.
GHC compiles to an intermediate language called Core, after having done several optimizations. You can see it using ghc -ddump-simpl ...
Very roughly, in Core, an unevaluated binding (like let x = 1+y+x in f x) creates a thunk. Some memory is allocated somewhere to represent the closure, and x is made to point at it.
When (and if) x is forced by f, then the thunk is evaluated. Here's the improvement: before the evaluation starts, the thunk of x is overwritten with a special value called BLACKHOLE. After x is evaluated (to WHNF) then the black hole is again overwritten with the actual value (so we don't recompute it if e.g. f x = x+x).
If the black hole is ever forced, <<loop>> is triggered. This is actually an IO exception (those can be raised in pure code, too, so this is fine).
Examples:
let x = 1+x in 2*x -- <<loop>>
let g x = g (x+1) in g 0 -- diverges
let h x = h (10-x) in h 0 -- diverges, even if h 0 -> h 10 -> h 0 -> ...
let g0 = g10 ; g10 = g0 in g0 -- <<loop>>
Note that each call of h 0 is considered a distinct thunk, hence no black hole is forced there.
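Here is a runnable sketch of that behaviour. In a non-optimised build, forcing a thunk that refers to itself makes the RTS hit its own BLACKHOLE, which raises the NonTermination exception (whose Show instance is the familiar <<loop>>) - and since it is an ordinary exception, it can even be caught:

```haskell
import Control.Exception (SomeException, evaluate, try)

-- a thunk that forces itself while being evaluated; the RTS detects the
-- re-entered BLACKHOLE and throws NonTermination instead of spinning
knot :: Int
knot = 1 + knot

main :: IO ()
main = do
  r <- try (evaluate knot) :: IO (Either SomeException Int)
  case r of
    Left e  -> putStrLn ("caught: " ++ show e)
    Right v -> print v
```

As the answer notes below, this is best regarded as a bonus: with optimisations on, GHC may transform the thunk so that no catchable <<loop>> is raised.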
The tricky part is that it's not completely trivial to understand which thunks are actually created in Core since GHC can perform several optimizations before emitting Core. Hence, we should regard <<loop>> as a bonus, not as a given / hard guarantee by GHC. New optimizations in the future might replace some <<loop>>s with actual non-termination.
If you want to google something, "GHC, blackhole, STG" should be good keywords.

Tail optimization guarantee - loop encoding in Haskell

So the short version of my question is: how are we supposed to encode loops in Haskell, in general? There is no tail call optimization guarantee in Haskell, bang patterns aren't even a part of the standard (right?), and the fold/unfold paradigm is not guaranteed to work in all situations. Here's a case in point where only bang patterns did the trick for me of making it run in constant space (not even using $! helped ... although the testing was done at Ideone.com, which uses ghc-6.8.2).
It is basically about a nested loop, which in list-paradigm can be stated as
prod (sum,concat) . unzip $
  [ (c, [r | t]) | k <- [0..kmax], j <- [0..jmax], let (c,r,t) = ... ]

prod (f,g) x = (f.fst $ x, g.snd $ x)
Or in pseudocode:
let list_store = [] in
for k from 0 to kmax
for j from 0 to jmax
if test(k,j)
list_store += [entry(k,j)]
count += local_count(k,j)
result = (count, list_store)
Until I added the bang-pattern to it, I got either a memory blow-out or even a stack overflow. But bang patterns are not part of the standard, right? So the question is, how is one to code the above, in standard Haskell, to run in constant space?
Here is the test code. The calculation is fake, but the problems are the same. EDIT: The foldr-formulated code is:
testR m n = foldr f (0,[])
    [ (c, [(i,j) | (i+j) == d ])
    | i <- [0..m], j <- [0..n],
      let c = if (rem j 3) == 0 then 2 else 1 ]
  where
    d = m + n - 3
    f (!c1, [])    (!c, h) = (c1+c, h)
    f (!c1, (x:_)) (!c, h) = (c1+c, x:h)
Trying to run print $ testR 1000 1000 produces stack overflow. Changing to foldl only succeeds if using bang-patterns in f, but it builds the list in reversed order. I'd like to build it lazily, and in the right order. Can it be done with any kind of fold, for the idiomatic solution?
EDIT: to sum up the answer I got from @ehird: there's nothing to fear in using bang patterns. Though not part of standard Haskell itself, they are easily encoded in it as f ... c ... = case (seq c False) of { True -> undefined; _ -> ... }. The lesson is, only a pattern match forces a value, and seq does NOT force anything by itself, but rather arranges that when seq x y is forced - by a pattern match - x will be forced too, and y will be the answer. Contrary to what I could understand from the online Report, $! does NOT force anything by itself, though it is called a "strict application operator".
And the point from @stephentetley - strictness is very important in controlling the space behaviour. So it is perfectly OK to encode loops in Haskell with proper usage of strictness annotations with bang patterns, where needed, to write any kind of special folding (i.e. structure-consuming) function that is needed - like I ended up doing in the first place - and rely on GHC to optimize the code.
Thank you very much to all for your help.
Bang patterns are simply sugar for seq — whenever you see let !x = y in z, that can be translated into let x = y in x `seq` z. seq is standard, so there's no issue with translating programs that use bang patterns into a portable form.
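For example, here is the same strict accumulator loop written once with a bang pattern and once desugared into seq; both force the accumulator on every iteration, so no chain of (+) thunks builds up:

```haskell
{-# LANGUAGE BangPatterns #-}

-- bang-pattern version: !acc forces the accumulator on each call
sumToBang :: Int -> Int
sumToBang n = go 0 1
  where
    go !acc i
      | i > n     = acc
      | otherwise = go (acc + i) (i + 1)

-- portable version: the same strictness, spelled with seq
sumToSeq :: Int -> Int
sumToSeq n = go 0 1
  where
    go acc i = acc `seq`
      if i > n then acc else go (acc + i) (i + 1)

main :: IO ()
main = print (sumToBang 1000000, sumToSeq 1000000)  -- (500000500000,500000500000)
```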
It is true that Haskell makes no guarantees about performance — the report does not even define an evaluation order (only that it must be non-strict), let alone the existence or behaviour of a runtime stack. However, while the report doesn't specify a specific method of implementation, you can certainly optimise for one.
For example, call-by-need (and thus sharing) is used by all Haskell implementations in practice, and is vital for optimising Haskell code for memory usage and speed. Indeed, the pure memoisation trick relies on sharing (without it, it'll just slow things down).
This basic structure lets us see, for example, that stack overflows are caused by building up too-large thunks. Since you haven't posted your entire code, I can't tell you how to rewrite it without bang patterns, but I suspect [ (c, [r | t]) | ... ] should become [ c `seq` r `seq` t `seq` (c, [r | t]) | ... ]. Of course, bang patterns are more convenient; that's why they're such a common extension! (On the other hand, you probably don't need to force all of those; knowing what to force is entirely dependent on the specific structure of the code, and wildly adding bang patterns to everything usually just slows things down.)
Indeed, "tail recursion" per se does not mean all that much in Haskell: if your accumulator parameters aren't strict, you'll overflow the stack when you later try to force them, and indeed, thanks to laziness, many non-tail-recursive programs don't overflow the stack; printing repeat 1 won't ever overflow the stack, even though the definition — repeat x = x : repeat x — clearly has recursion in a non-tail position. This is because (:) is lazy in its second argument; if you traverse the list, you'll have constant space usage, as the repeat x thunks are forced, and the previous cons cells are thrown away by the garbage collector.
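The repeat example is easy to check directly (a sketch; repeat' is a local re-definition to avoid clashing with the Prelude's repeat):

```haskell
-- Guarded recursion: the recursive call sits under the lazy (:),
-- so consuming the list runs in constant stack space even though
-- the recursion is not in tail position.
repeat' :: a -> [a]
repeat' x = x : repeat' x

main :: IO ()
main = print (take 5 (repeat' (1 :: Int)))  -- prints [1,1,1,1,1]
```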
On a more philosophical note, tail-recursive loops are generally considered suboptimal in Haskell. In general, rather than iteratively computing a result in steps, we prefer to generate a structure with all the step-equivalents at the leaves, and do a transformation (like a fold) on it to produce the final result. This is a much higher-level view of things, made efficient by laziness (the structure is built up and garbage-collected as it's processed, rather than all at once).2
This can take some getting used to at first, and it certainly doesn't work in all cases — extremely complicated loop structures might be a pain to translate efficiently3 — but directly translating tail-recursive loops into Haskell can be painful precisely because it isn't really all that idiomatic.
As far as the paste you linked to goes, id $! x doesn't work to force anything because it's the same as x `seq` id x, which is the same as x `seq` x, which is the same as x. Basically, whenever x `seq` y is forced, x is forced, and the result is y. You can't use seq to just force things at arbitrary points; you use it to cause the forcing of thunks to depend on other thunks.
In this case, the problem is that you're building up a large thunk in c, so you probably want to make auxk and auxj force it; a simple method would be to add a clause like auxj _ _ c _ | seq c False = undefined to the top of the definition. (The guard is always checked, forcing c to be evaluated, but always results in False, so the right-hand side is never evaluated.)
Personally, I would suggest keeping the bang pattern you have in the final version, as it's more readable, but f c _ | seq c False = undefined would work just as well too.
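For reference, here is the guard trick in a self-contained toy loop (sumTo is a made-up example, not the asker's code):

```haskell
-- The first clause's guard forces acc on every call, then always
-- evaluates to False, so control falls through to the real clauses.
sumTo :: Int -> Int -> Int
sumTo acc _ | seq acc False = undefined
sumTo acc 0 = acc
sumTo acc n = sumTo (acc + n) (n - 1)

main :: IO ()
main = print (sumTo 0 100)  -- prints 5050
```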
1 See Elegant memoization with functional memo tries and the data-memocombinators library.
2 Indeed, GHC can often even eliminate the intermediate structure entirely using fusion and deforestation, producing machine code similar to how the computation would be written in a low-level imperative language.
3 Although if you have such loops, it's quite possible that this style of programming will help you simplify them — laziness means that you can easily separate independent parts of a computation out into separate structures, then filter and combine them, without worrying that you'll be duplicating work by making intermediate computations that will later be thrown away.
OK let's work from the ground up here.
You have a list of entries
entries = [(k,j) | j <- [0..jmax], k <- [0..kmax]]
And based on those indexes, you have tests and counts
tests m n = map (\(k,j) -> j + k == m + n - 3) entries
counts = map (\(_,j) -> if (rem j 3) == 0 then 2 else 1) entries
Now you want to build up two things: a "total" count, and the list of entries that "pass" the test. The problem, of course, is that you want to generate the latter lazily, while the former (to avoid exploding the stack) should be evaluated strictly.
If you evaluate these two things separately, then you must either 1) prevent sharing entries (generate it twice, once for each calculation), or 2) keep the entire entries list in memory. If you evaluate them together, then you must either 1) evaluate strictly, or 2) have a lot of stack space for the huge thunk created for the count. Option #2 for both cases is rather bad. Your imperative solution deals with this problem simply by evaluating simultaneously and strictly. For a solution in Haskell, you could take Option #1 for either the separate or the simultaneous evaluation. Or you could show us your "real" code and maybe we could help you find a way to rearrange your data dependencies; it may turn out you don't need the total count, or something like that.
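The "evaluate simultaneously and strictly" option can be written as a single strict fold. This is only a sketch with made-up small bounds; it assumes the total is the sum of all counts and that the "passing" entries are those satisfying the test:

```haskell
{-# LANGUAGE BangPatterns #-}
import Data.List (foldl')

entries :: [(Int, Int)]
entries = [(k, j) | j <- [0 .. 9], k <- [0 .. 9]]

-- One strict pass computing the total count and the passing entries
-- together, so no large thunk builds up in the accumulator.
countAndPass :: Int -> Int -> (Int, [(Int, Int)])
countAndPass m n = foldl' step (0, []) entries
  where
    step (!total, passed) e@(k, j)
      | j + k == m + n - 3 = (total + w, e : passed)
      | otherwise          = (total + w, passed)
      where
        w = if rem j 3 == 0 then 2 else 1

main :: IO ()
main = print (countAndPass 3 3)
```

Note that this trades away laziness in the passing list: the whole list is built before either result is usable, which is exactly the trade-off discussed above.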

Is everything in Haskell stored in thunks, even simple values?

What do the thunks for the following value/expression/function look like in the Haskell heap?
val = 5 -- is `val` a pointer to a box containing 5?
add x y = x + y
result = add 2 val
main = print $ result
Would be nice to have a picture of how these are represented in Haskell, given its lazy evaluation mode.
Official answer
It's none of your business. It is strictly an implementation detail of your compiler.
Short answer
Yes.
Longer answer
To the Haskell program itself, the answer is always yes, but the compiler can and will do things differently if it finds out that it can get away with it, for performance reasons.
For example, for add x y = x + y, a compiler might generate code that works with thunks for x and y and constructs a thunk as a result.
But consider the following:
foo :: Int -> Int -> Int
foo x y = x * x + y * y
Here, an optimizing compiler will generate code that first takes x and y out of their boxes, then does all the arithmetic, and then stores the result in a box.
Advanced answer
This paper describes how GHC switched from one way of implementing thunks to another that was actually both simpler and faster:
http://research.microsoft.com/en-us/um/people/simonpj/papers/eval-apply/
In general, even primitive values in Haskell (e.g. of type Int and Float) are represented by thunks. This is indeed required by the non-strict semantics; consider the following fragment:
bottom :: Int
bottom = div 1 0
This definition will generate a div-by-zero exception only if the value of bottom is inspected, but not if the value is never used.
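A tiny demonstration (sketch):

```haskell
bottom :: Int
bottom = div 1 0

main :: IO ()
main =
  -- bottom sits unevaluated inside the pair; only snd is demanded,
  -- so the division by zero never happens.
  print (snd (bottom, 42 :: Int))  -- prints 42
```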
Consider now the add function:
add :: Int -> Int -> Int
add x y = x+y
A naive implementation of add must force the thunk for x, force the thunk for y, add the values and create a thunk for the (boxed) result. This is a huge overhead for arithmetic compared to strict functional languages (not to mention imperative ones).
However, an optimizing compiler such as GHC can mostly avoid this overhead; this is a simplified view of how GHC translates the add function:
add :: Int -> Int -> Int
add (I# x) (I# y) = case x +# y of z -> I# z
Internally, basic types like Int are seen as datatypes with a single constructor, roughly data Int = I# Int#. The type Int# is the "raw" machine type for integers and +# is the primitive addition on raw values.
Operations on raw types are implemented directly on bit patterns (e.g. in registers), not thunks. Boxing and unboxing are then translated as constructor application and pattern matching.
The advantage of this approach (not visible in this simple example) is that the compiler is often capable of inlining such definitions and removing intermediate boxing/unboxing operations, leaving only the outermost ones.
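This translation can even be written by hand using GHC's (non-portable) MagicHash extension; the following sketch mirrors the worker shown above:

```haskell
{-# LANGUAGE MagicHash #-}
import GHC.Exts (Int (I#), (+#))

-- Hand-written worker: unbox both arguments, add the raw machine
-- integers with the primitive (+#), and re-box the result.
addUnboxed :: Int -> Int -> Int
addUnboxed (I# x) (I# y) = I# (x +# y)

main :: IO ()
main = print (addUnboxed 2 3)  -- prints 5
```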
It would be absolutely correct to wrap every value in a thunk. But since Haskell is non-strict, compilers can choose when to evaluate thunks/expressions. In particular, compilers can choose to evaluate an expression earlier than strictly necessary, if it results in better code.
Optimizing Haskell compilers (e.g. GHC) perform strictness analysis to figure out which values will always be computed.
To start with, the compiler has to assume that none of a function's arguments is ever used. Then it goes over the body of the function and tries to find function applications that 1) are known to be strict in (at least some of) their arguments and 2) always have to be evaluated to compute the function's result.
In your example, we have the function (+), which is strict in both its arguments. Thus the compiler knows that both x and y are always required to be evaluated at this point.
Now it just so happens, that the expression x+y is always necessary to compute the function's result, therefore the compiler can store the information that the function add is strict in both x and y.
The generated code for add* will thus expect integer values as parameters and not thunks. The algorithm becomes much more complicated when recursion is involved (a fixed point problem), but the basic idea remains the same.
Another example:
mkList x y =
if x then y : []
else []
This function will take x in evaluated form (as a boolean) and y as a thunk. The expression x needs to be evaluated on every possible execution path through mkList, thus we can have the caller evaluate it. The expression y, on the other hand, is never used in any function application that is strict in its arguments. The cons function (:) never looks at y; it just stores it in a list. Thus y needs to be passed as a thunk in order to satisfy the lazy Haskell semantics.
mkList False undefined -- absolutely legal
*: add is of course polymorphic and the exact type of x and y depends on the instantiation.
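The legality claim is easy to check (a small sketch):

```haskell
mkList :: Bool -> a -> [a]
mkList x y = if x then y : [] else []

main :: IO ()
main = do
  -- y is never forced on the False path, so undefined is harmless:
  print (mkList False (undefined :: Int))  -- prints []
  print (mkList True 'a')                  -- prints "a"
```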
Short answer: Yes.
Long answer:
val = 5
This has to be stored in a thunk, because imagine if we wrote this anywhere in our code (like, in a library we imported or something):
val = undefined
If this has to be evaluated when our program starts, it would crash, right? If we actually use that value for something, that would be what we want, but if we don't use it, it shouldn't be able to influence our program so catastrophically.
For your second example, let me change it a little:
div x y = x / y  -- note: this shadows the Prelude's div (import Prelude hiding (div))
This value has to be stored in a thunk as well, because imagine some code like this:
average list =
  if null list
    then 0
    else div (sum list) (fromIntegral (length list))
If div were strict here, it would be evaluated even when the list is null (i.e. empty), so the function would never get the chance to return 0 for the empty list, even though that is exactly what we want in this case.
Your final example is just a variation of example 1, and it has to be lazy for the same reasons.
All this being said, it is possible to force the compiler to make some values strict, but that goes beyond the scope of this question.
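For completeness, one common way to request strictness explicitly, using only the standard $! operator (a sketch; strictSum is a made-up name):

```haskell
-- ($!) forces the accumulator before each recursive call, so the
-- accumulator stays evaluated instead of growing a chain of thunks.
strictSum :: [Int] -> Int
strictSum = go 0
  where
    go acc []     = acc
    go acc (x:xs) = (go $! acc + x) xs

main :: IO ()
main = print (strictSum [1 .. 1000000])  -- prints 500000500000
```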
I think the others answered your question nicely, but just for completeness's sake let me add that GHC offers you the possibility of using unboxed values directly as well. This is what Haskell Wiki says about it:
When you are really desperate for speed, and you want to get right down to the “raw bits.” Please see GHC Primitives for some information about using unboxed types.
This should be a last resort, however, since unboxed types and primitives are non-portable. Fortunately, it is usually not necessary to resort to using explicit unboxed types and primitives, because GHC's optimiser can do the work for you by inlining operations it knows about, and unboxing strict function arguments. Strict and unpacked constructor fields can also help a lot. Sometimes GHC needs a little help to generate the right code, so you might have to look at the Core output to see whether your tweaks are actually having the desired effect.
One thing that can be said for using unboxed types and primitives is that you know you're writing efficient code, rather than relying on GHC's optimiser to do the right thing, and being at the mercy of changes in GHC's optimiser down the line. This may well be important to you, in which case go for it.
As mentioned, it's non-portable, so you need a GHC language extension. See here for their docs.
