How does the <<loop>> error "work" in detail? - haskell

I'm working on a tool where users can define and include their own "templates" (in config files, content text-files, etc., much like mustache and friends), and these templates can reference one another, so they can induce a loop. Just when I was about to add a "max-loops" setting, I noticed that under runghc the program, after a while, simply quits with the farewell message <<loop>>. That's actually good enough for me, but it raises a few questions:
how does GHC or the runtime actually detect it's stuck in a loop, and how would it distinguish between a wanted long-running operation and an accidental infinite loop? The halting problem is still a thing last I checked..
any (time or iteration) limits that can be custom-set to the compiler or the runtime?
is this runghc-only or does it exist in all final compile outputs?
will any -O (optimization) flags set much later when building releases disable this apparent built-in loop detection?
All stuff I can figure out the hard way of course, but who knows maybe someone already looked into this in more detail.. (hard to google/ddg for "haskell" "<<loop>>" because they strip the angle brackets and then show results for "how to loop in Haskell" etc..)

This is a simple "improvement" of the STG runtime which was implemented in GHC. I'll share what I have understood, but GHC experts can likely provide more useful and accurate information.
GHC compiles to an intermediate language called Core, after having done several optimizations. You can see it using ghc -ddump-simpl ...
Very roughly, in Core, an unevaluated binding (like let x = 1+y+x in f x) creates a thunk. Some memory is allocated somewhere to represent the closure, and x is made to point at it.
When (and if) x is forced by f, then the thunk is evaluated. Here's the improvement: before the evaluation starts, the thunk of x is overwritten with a special value called BLACKHOLE. After x is evaluated (to WHNF) then the black hole is again overwritten with the actual value (so we don't recompute it if e.g. f x = x+x).
If the black hole is ever forced, <<loop>> is triggered. This is actually an IO exception (those can be raised in pure code, too, so this is fine).
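As an aside on the sharing part ("so we don't recompute it"), here is a tiny way to observe it, using Debug.Trace purely as an instrumentation trick (the message text is made up):
import Debug.Trace (trace)

-- The message is printed only once: after the first force, the thunk for x
-- is overwritten with its value, so the second use just reads that value.
main :: IO ()
main = do
  let x = trace "evaluating x" (1 + 2 :: Int)
  print (x + x)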
Examples:
let x = 1+x in 2*x -- <<loop>>
let g x = g (x+1) in g 0 -- diverges
let h x = h (10-x) in h 0 -- diverges, even if h 0 -> h 10 -> h 0 -> ...
let g0 = g10 ; g10 = g0 in g0 -- <<loop>>
Note that each call of h 0 is considered a distinct thunk, hence no black hole is forced there.
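Since <<loop>> arrives as an ordinary exception (NonTermination), it can even be caught. Here is a minimal sketch; whether the runtime actually detects the loop and delivers the exception can depend on how the program is compiled and run:
import Control.Exception (SomeException, evaluate, try)

main :: IO ()
main = do
  -- Forcing a self-referential thunk makes the thread enter its own black
  -- hole; the runtime then raises an exception that prints as <<loop>>.
  r <- try (evaluate (let x = 1 + x in x :: Int))
  case r of
    Left e  -> putStrLn ("caught: " ++ show (e :: SomeException))
    Right v -> print v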
The tricky part is that it's not completely trivial to understand which thunks are actually created in Core since GHC can perform several optimizations before emitting Core. Hence, we should regard <<loop>> as a bonus, not as a given / hard guarantee by GHC. New optimizations in the future might replace some <<loop>>s with actual non-termination.
If you want to google something, "GHC, blackhole, STG" should be good keywords.

Related

Robust haskell without errors

I'm currently learning Haskell. One of the motivations for picking this language was to write software with a very high degree of robustness, i.e. functions that are fully defined, mathematically deterministic, and never crash or produce errors. I don't mean failures caused by the condition of the system ("system out of memory", "computer on fire", etc.); those are not interesting and can simply crash the entire process. I also don't mean erroneous behaviour caused by invalid declarations (pi = 4).
Instead I refer to errors caused by erroneous states, which I want to remove by making those states unrepresentable and non-compilable (in some functions) via strict static typing. In my mind I called these functions "pure" and figured that the strong type system would allow me to accomplish this. However, Haskell does not define "pure" in this way and allows programs to crash via error in any context.
Why is catching an exception non-pure, but throwing an exception is pure?
This is completely acceptable and not strange at all. The disappointing thing, however, is that Haskell does not appear to provide any functionality to forbid function definitions that could lead to branches calling error.
Here's a contrived example of why I find this disappointing:
module Main where

import Data.Maybe

data Fruit = Apple | Banana | Orange Int | Peach
  deriving (Show)

readFruit :: String -> Maybe Fruit
readFruit x =
  case x of
    "apple"  -> Just Apple
    "banana" -> Just Banana
    "orange" -> Just (Orange 4)
    "peach"  -> Just Peach
    _        -> Nothing

showFruit :: Fruit -> String
showFruit Apple      = "An apple"
showFruit Banana     = "A Banana"
showFruit (Orange x) = show x ++ " oranges"

printFruit :: Maybe Fruit -> String
printFruit x = showFruit $ fromJust x

main :: IO ()
main = do
  line <- getLine
  let fruit = readFruit line
  putStrLn $ printFruit fruit
  main
Let's say that I'm paranoid and want to be sure that the pure functions readFruit and printFruit really don't fail due to unhandled states. You could imagine that the code is for launching a rocket full of astronauts which, in an absolutely critical routine, needs to serialize and deserialize fruit values.
The first danger is naturally that we made a mistake in our pattern matching, as this gives us the dreaded erroneous states that can't be handled. Thankfully Haskell provides built-in ways to prevent those: we simply compile our program with -Wall, which includes -fwarn-incomplete-patterns, and AHA:
src/Main.hs:17:1: Warning:
    Pattern match(es) are non-exhaustive
    In an equation for ‘showFruit’: Patterns not matched: Peach
We forgot to serialize Peach fruits and showFruit would have thrown an error. That's an easy fix, we simply add:
showFruit Peach = "A peach"
The program now compiles without warnings, danger averted! We launch the rocket but suddenly the program crashes with:
Maybe.fromJust: Nothing
The rocket is doomed and hits the ocean caused by the following faulty line:
printFruit x = showFruit $ fromJust x
Essentially, fromJust has a branch where it raises an error, so we didn't even want the program to compile if we tried to use it, since printFruit absolutely had to be "super" pure. We could have fixed that, for example, by replacing the line with:
printFruit x = maybe "Unknown fruit!" (\y -> showFruit y) x
I find it strange that Haskell decides to implement strict typing and incomplete-pattern detection, all in the noble pursuit of preventing invalid states from being representable, but falls just in front of the finish line by not giving programmers a way to detect branches to error when they are not allowed. In a sense this makes Haskell less robust than Java, which forces you to declare the exceptions your functions are allowed to raise.
The most trivial way to implement this would be to somehow make error undefined (out of scope), locally for a function and for any function used by its equations, via some form of associated declaration. However, this does not appear to be possible.
The wiki page about errors vs exceptions mentions an extension called "Extended Static Checking" for this purpose via contracts but it just leads to a broken link.
It basically boils down to: How do I get the program above to not compile because it uses fromJust? All ideas, suggestions and solutions are welcome.
Haskell allows arbitrary general recursion, so it's not as though all Haskell programs would necessarily be total, if only error in its various forms were removed. That is, you could just define error a = error a to take care of any case you don't want to handle anyways. A runtime error is just more helpful than an infinite loop.
You should not think of error as similar to a garden-variety Java exception. error is an assertion failure representing a programming error, and a function like fromJust which can call error is an assertion. You should not attempt to catch the exception produced by error except in special cases like a server that must continue to run even when the handler for a request encounters a programming error.
Your main complaint is about fromJust, but given that it is fundamentally at odds with your goals, why are you using it?
Remember that the biggest reason for error is that not everything can be defined in a way that the type system can guarantee, so having a "This can't happen, trust me" makes sense. fromJust comes from this mindset of "sometimes I know better than the compiler". If you don't agree with that mindset don't use it. Alternatively don't abuse it like this example does.
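To make the "don't reach for it" approach concrete, here is a small sketch: import only total helpers and keep printFruit total by handling Nothing explicitly (the "Unknown fruit!" default is simply reused from the question's own fix):
import Data.Maybe (fromMaybe)

data Fruit = Apple | Banana | Orange Int | Peach
  deriving (Show)

showFruit :: Fruit -> String
showFruit Apple      = "An apple"
showFruit Banana     = "A Banana"
showFruit (Orange x) = show x ++ " oranges"
showFruit Peach      = "A peach"

-- Total: the Nothing case is handled with a default instead of fromJust,
-- and fromJust is not even in scope to be reached for.
printFruit :: Maybe Fruit -> String
printFruit = maybe "Unknown fruit!" showFruit

-- An equivalent spelling using fromMaybe over a mapped Maybe.
printFruit' :: Maybe Fruit -> String
printFruit' = fromMaybe "Unknown fruit!" . fmap showFruit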
EDIT:
To expand on my point, think of how you would implement what you are asking for (warnings on using partial functions). Where would the warning apply in the following code?
a = fromJust $ Just 1
b = Just 2
c = fromJust $ c
d = fromJust
e = d $ Just 3
f = d b
All of these are statically not partial after all. (Sorry if this next one's syntax is off, it is late)
g x = fromJust x + 1
h x = g x + 1
i = h $ Just 3
j = h $ Nothing
k x y = if x > 0 then fromJust y else x
l = k 1 $ Just 2
m = k (-1) $ Just 3
n = k (-1) $ Nothing
i here is again safe, but j is not. Which function should trigger the warning? What do we do when some or all of these are defined outside of our own code? What do you do in the case of k, where it is conditionally partial? Should some or all of these definitions fail to compile?
By making the compiler make this call you are conflating a lot of complicated problems, especially when you consider that careful library selection avoids the issue entirely.
In either case the solution IMO to these kinds of problems is to use a tool after or during compilation to look for problems rather than modifying the compiler. Such a tool can more easily understand "your code" vs "their code" and better determine where best to warn you upon inappropriate use. You could also tune it without having to add a bunch of annotations to your source file to avoid false positives. I don't know a tool offhand or I would suggest one.
The answer is that what I wanted was a level of totality checking that Haskell unfortunately does not provide (yet?).
Alternatively I want dependent types (e.g. Idris), or a static verifier (e.g. Liquid Haskell) or syntax lint detection (e.g. hlint).
I'm really looking into Idris now, it seems like an amazing language. Here's a talk by the founder of Idris that I can recommend watching.
Credit for these answers goes to #duplode, #chi, #user3237465 and #luqui.

Why is there no implicit parallelism in Haskell?

Haskell is functional and pure, so basically it has all the properties needed for a compiler to be able to tackle implicit parallelism.
Consider this trivial example:
f = do
  a <- Just 1
  b <- Just $ Just 2
  -- ^ The above line does not utilize an `a` variable, so it can be safely
  --   executed in parallel with the preceding line
  c <- b
  -- ^ The above line references a `b` variable, so it can only be executed
  --   sequentially after it
  return (a, c)
  -- On the exit from a monad scope we wait for all computations to finish
  -- and gather the results
Schematically the execution plan can be described as:
do
 |
 +-----------+-----------+
 |                       |
a <- Just 1          b <- Just $ Just 2
 |                       |
 |                      c <- b
 |                       |
 +-----------+-----------+
             |
        return (a, c)
Why is there no such functionality implemented in the compiler with a flag or a pragma yet? What are the practical reasons?
This is a long studied topic. While you can implicitly derive parallelism in Haskell code, the problem is that there is too much parallelism, at too fine a grain, for current hardware.
So you end up spending effort on bookkeeping, not on running things faster. Since we don't have infinite parallel hardware, it is all about picking the right granularity -- too coarse and there will be idle processors, too fine and the overheads will be unacceptable.
What we have is more coarse grained parallelism (sparks) suitable for generating thousands or millions of parallel tasks (so not at the instruction level), which map down onto the mere handful of cores we typically have available today.
Note that for some subsets (e.g. array processing) there are fully automatic parallelization libraries with tight cost models.
For background on this see Feedback Directed Implicit Parallelism, where they introduce an automated approach to the insertion of par in arbitrary Haskell programs.
While your code block may not be the best example due to the implicit data dependence between the a and b, it is worth noting that these two bindings commute, in that
f = do
  a <- Just 1
  b <- Just $ Just 2
  ...
will give the same results as
f = do
  b <- Just $ Just 2
  a <- Just 1
  ...
so this could still be parallelized in a speculative fashion. It is worth noting that
this does not need to have anything to do with monads. We could, for instance, evaluate
all independent expressions in a let-block in parallel or we could introduce a
version of let that would do so. The lparallel library for Common Lisp does this.
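In GHC's ecosystem, the closest off-the-shelf analogue is the Eval monad from Control.Parallel.Strategies (in the parallel package). A rough sketch of such a "parallel let", with placeholder computations expensiveA and expensiveB, might look like this:
import Control.Parallel.Strategies (runEval, rpar, rseq)

-- Spark the two independent computations, then wait for both before
-- returning the pair -- roughly a let that evaluates its bindings in parallel.
parPair :: (Int, Integer)
parPair =
  let expensiveA = sum [1 .. 10000000]   -- placeholder work
      expensiveB = product [1 .. 20]     -- placeholder work
  in runEval $ do
       a <- rpar expensiveA
       b <- rpar expensiveB
       _ <- rseq a
       _ <- rseq b
       return (a, b)
Note that this still has to be asked for explicitly by the programmer, which is exactly the point made in the answers here.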
Now, I am by no means an expert on the subject, but this is my understanding
of the problem.
A major stumbling block is determining when it is advantageous to parallelize the
evaluation of multiple expressions. There is overhead associated with starting
the separate threads for evaluation, and, as your example shows, it may result
in wasted work. Some expressions may be too small to make parallel evaluation
worth the overhead. As I understand it, coming up with a fully accurate metric
of the cost of an expression would amount to solving the halting problem, so
you are relegated to using a heuristic approach to determining what to
evaluate in parallel.
Then it is not always faster to throw more cores at a problem. Even when
explicitly parallelizing a problem with the many Haskell libraries available,
you will often not see much speedup just by evaluating expressions in parallel
due to heavy memory allocation and usage and the strain this puts on the garbage
collector and CPU cache. You end up needing a nice compact memory layout and
to traverse your data intelligently. Having 16 threads traverse linked lists will
just bottleneck you at your memory bus and could actually make things slower.
At the very least, what expressions can be effectively parallelized is something that is
not obvious to many programmers (at least it isn't to this one), so getting a compiler to
do it effectively is non-trivial.
Short answer: Sometimes running stuff in parallel turns out to be slower, not faster. And figuring out when it is and when it isn't a good idea is an unsolved research problem.
However, you still can be "suddenly utilizing all those cores without ever bothering with threads, deadlocks and race conditions". It's not automatic; you just need to give the compiler some hints about where to do it! :-D
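For completeness, the kind of hint being referred to is the par combinator from the parallel package (Control.Parallel); here is a minimal sketch, where nfib is just placeholder work:
import Control.Parallel (par, pseq)

nfib :: Int -> Integer
nfib n | n < 2     = 1
       | otherwise = nfib (n - 1) + nfib (n - 2)

main :: IO ()
main =
  let x = nfib 32
      y = nfib 33
  -- par sparks x so it may be evaluated on another core, while pseq makes
  -- sure this thread finishes y before the two results are combined.
  in x `par` (y `pseq` print (x + y))
-- Build with something like: ghc -threaded -rtsopts Main.hs
-- and run with:              ./Main +RTS -N2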
One of the reasons is that Haskell is non-strict and does not evaluate anything by default. In general the compiler does not know whether the computation of a and b terminates, hence trying to compute them would be a waste of resources:
x :: Maybe ([Int], [Int])
x = Just undefined
y :: Maybe ([Int], [Int])
y = Just (undefined, undefined)
z :: Maybe ([Int], [Int])
z = Just ([0], [1..])
a :: Maybe ([Int], [Int])
a = undefined
b :: Maybe ([Int], [Int])
b = Just ([0], map fib [0..])
  where fib 0 = 1
        fib 1 = 1
        fib n = fib (n - 1) + fib (n - 2)
Consider these values with the following functions:
main1 x = case x of
            Just _  -> putStrLn "Just"
            Nothing -> putStrLn "Nothing"
The (a, b) part does not need to be evaluated. As soon as you know that x = Just _ you can proceed to the branch - hence it will work for all the values above except a.
main2 x = case x of
            Just (_, _) -> putStrLn "Just"
            Nothing     -> putStrLn "Nothing"
This function forces evaluation of the tuple. Hence x will terminate with an error, while the rest will work.
main3 x = case x of
            Just (a, b) -> print a >> print b
            Nothing     -> putStrLn "Nothing"
This function will first print the first list and then the second. It will work for z (resulting in an infinite stream of numbers being printed, but Haskell can deal with it). b will eventually run out of memory.
Now, in general, you don't know whether a computation terminates or how many resources it will consume. Infinite lists are perfectly fine in Haskell:
main = maybe (return ()) (print . take 5 . snd) b -- Prints the first 5 Fibonacci numbers
Hence spawning threads to evaluate expressions in Haskell might try to evaluate something which is not meant to be evaluated fully - say, the list of all primes - yet which programmers use as part of a structure. The above examples are very simple, and you may argue that the compiler could notice them - however, it is not possible in general due to the halting problem (you cannot write a program which takes an arbitrary program and its input and checks whether it terminates) - therefore it is not a safe optimization.
In addition - as mentioned in other answers - it is hard to predict whether the overhead of an additional thread is worth it. Even though GHC doesn't spawn new OS threads for sparks, relying on green threads (with a fixed number of kernel threads - setting aside a few exceptions), you still need to move data from one core to another and synchronize between them, which can be quite costly.
However, Haskell does have guided parallelization, without breaking the purity of the language, via par and similar functions.
Actually there was such an attempt, though not on common hardware, due to the low number of cores available. The project is called Reduceron. It runs Haskell code with a high level of parallelism. If it were ever released as a proper 2 GHz ASIC core, we'd have a serious breakthrough in Haskell execution speed.

Does Haskell optimizer utilize memoization for repeated function calls in a scope?

Consider this function:
f as = if length as > 100 then length as else 100
Since the function is pure, it's obvious that the length will be the same in both calls. My question is: does the Haskell optimizer turn the code above into the equivalent of the following?
f as =
  let l = length as
  in if l > 100 then l else 100
If it does, then which optimization level enables it? If it doesn't, then why not? In this scenario the memory waste explained in this answer can't be the reason, because the introduced variable gets released as soon as the function execution is finished.
Please note that this is not a duplicate of this question because of the local scope, and thus it may get a radically different answer.
GHC now does some CSE by default, as the -fcse flag is on.
On by default. Enables the common-sub-expression elimination
optimisation. Switching this off can be useful if you have some
unsafePerformIO expressions that you don't want commoned-up.
However, it is conservative, due to the problems with introducing sharing (and thus space leaks).
The CSE pass is getting a bit better, though.
Finally, note that there is a plugin for full CSE, if you have code that could benefit from it:
http://hackage.haskell.org/package/cse-ghc-plugin
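If you want to check what GHC actually did for a definition like the one above, one approach (a sketch; the exact Core output varies by GHC version) is to dump the simplified Core and look at whether length as appears once or twice:
-- Compile with:  ghc -O2 -ddump-simpl -dsuppress-all CSE.hs
-- If the common subexpression was shared, the dumped Core contains a single
-- binding for the result of `length as`, used by both the comparison and
-- the returned value; otherwise `length` is applied twice.
module CSE where

f :: [Int] -> Int
f as = if length as > 100 then length as else 100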
Even in such a local setting, it is not obvious that introducing sharing is always an optimization. Consider this example definition
f = if length [1 .. 1000000] > 0 then head [1 .. 1000000] else 0
vs. this one
f = let xs = [1 .. 1000000] in if length xs > 0 then head xs else 0
and you'll find that in this case, the first behaves much better, as each of the computations performed on the list is cheap, whereas the second version will cause the list to be unfolded completely in memory by length, and it can only be discarded after head has been reduced.
The case you are describing has more to do with common subexpression elimination than with memoization; however, it seems that GHC currently doesn't do that either, because unintended sharing might lead to space leaks.

Tail optimization guarantee - loop encoding in Haskell

So the short version of my question is: how are we supposed to encode loops in Haskell, in general? There is no tail-call optimization guarantee in Haskell, bang patterns aren't even part of the standard (right?), and the fold/unfold paradigm is not guaranteed to work in every situation. Here's a case in point where only bang patterns did the trick for making it run in constant space (not even using $! helped ... although the testing was done at Ideone.com, which uses ghc-6.8.2).
It is basically about a nested loop, which in the list paradigm can be stated as
prod (sum,concat) . unzip $
[ (c, [r | t]) | k<-[0..kmax], j<-[0..jmax], let (c,r,t)=...]
prod (f,g) x = (f.fst $ x, g.snd $ x)
Or in pseudocode:
let list_store = [] in
for k from 0 to kmax
    for j from 0 to jmax
        if test(k,j)
            list_store += [entry(k,j)]
        count += local_count(k,j)
result = (count, list_store)
Until I added the bang-pattern to it, I got either a memory blow-out or even a stack overflow. But bang patterns are not part of the standard, right? So the question is, how is one to code the above, in standard Haskell, to run in constant space?
Here is the test code. The calculation is fake, but the problems are the same. EDIT: The foldr-formulated code is:
testR m n = foldr f (0,[])
              [ (c, [(i,j) | (i+j) == d ])
              | i <- [0..m], j <- [0..n],
                let c = if (rem j 3) == 0 then 2 else 1 ]
  where
    d = m + n - 3
    f (!c1, [])    (!c, h) = (c1+c, h)
    f (!c1, (x:_)) (!c, h) = (c1+c, x:h)
Trying to run print $ testR 1000 1000 produces stack overflow. Changing to foldl only succeeds if using bang-patterns in f, but it builds the list in reversed order. I'd like to build it lazily, and in the right order. Can it be done with any kind of fold, for the idiomatic solution?
EDIT: to sum up the answer I got from #ehird: there's nothing to fear in using bang patterns. Though not part of standard Haskell itself, they are easily encoded in it as f ... c ... = case (seq c False) of {True -> undefined; _ -> ...}. The lesson is that only a pattern match forces a value; seq does NOT force anything by itself, but rather arranges that when seq x y is forced - by a pattern match - x will be forced too, and y will be the answer. Contrary to what I could understand from the Online Report, $! does NOT force anything by itself either, though it is called a "strict application operator".
And the point from #stephentetley - strictness is very important in controlling the space behaviour. So it is perfectly OK to encode loops in Haskell using strictness annotations (bang patterns) where needed, writing whatever special folding (i.e. structure-consuming) function is called for - like I ended up doing in the first place - and relying on GHC to optimize the code.
Thank you very much to all for your help.
Bang patterns are simply sugar for seq — whenever you see let !x = y in z, that can be translated into let x = y in x `seq` z. seq is standard, so there's no issue with translating programs that use bang patterns into a portable form.
It is true that Haskell makes no guarantees about performance — the report does not even define an evaluation order (only that it must be non-strict), let alone the existence or behaviour of a runtime stack. However, while the report doesn't specify a specific method of implementation, you can certainly optimise for one.
For example, call-by-need (and thus sharing) is used by all Haskell implementations in practice, and is vital for optimising Haskell code for memory usage and speed. Indeed, the pure memoisation trick1 relies on sharing (without it, it'll just slow things down).
This basic structure lets us see, for example, that stack overflows are caused by building up too-large thunks. Since you haven't posted your entire code, I can't tell you how to rewrite it without bang patterns, but I suspect [ (c, [r | t]) | ... ] should become [ c `seq` r `seq` t `seq` (c, [r | t]) | ... ]. Of course, bang patterns are more convenient; that's why they're such a common extension! (On the other hand, you probably don't need to force all of those; knowing what to force is entirely dependent on the specific structure of the code, and wildly adding bang patterns to everything usually just slows things down.)
Indeed, "tail recursion" per se does not mean all that much in Haskell: if your accumulator parameters aren't strict, you'll overflow the stack when you later try to force them, and indeed, thanks to laziness, many non-tail-recursive programs don't overflow the stack; printing repeat 1 won't ever overflow the stack, even though the definition — repeat x = x : repeat x — clearly has recursion in a non-tail position. This is because (:) is lazy in its second argument; if you traverse the list, you'll have constant space usage, as the repeat x thunks are forced, and the previous cons cells are thrown away by the garbage collector.
On a more philosophical note, tail-recursive loops are generally considered suboptimal in Haskell. In general, rather than iteratively computing a result in steps, we prefer to generate a structure with all the step-equivalents at the leaves, and do a transformation (like a fold) on it to produce the final result. This is a much higher-level view of things, made efficient by laziness (the structure is built up and garbage-collected as it's processed, rather than all at once).2
This can take some getting used to at first, and it certainly doesn't work in all cases — extremely complicated loop structures might be a pain to translate efficiently3 — but directly translating tail-recursive loops into Haskell can be painful precisely because it isn't really all that idiomatic.
As far as the paste you linked to goes, id $! x doesn't work to force anything because it's the same as x `seq` id x, which is the same as x `seq` x, which is the same as x. Basically, whenever x `seq` y is forced, x is forced, and the result is y. You can't use seq to just force things at arbitrary points; you use it to cause the forcing of thunks to depend on other thunks.
In this case, the problem is that you're building up a large thunk in c, so you probably want to make auxk and auxj force it; a simple method would be to add a clause like auxj _ _ c _ | seq c False = undefined to the top of the definition. (The guard is always checked, forcing c to be evaluated, but always results in False, so the right-hand side is never evaluated.)
Personally, I would suggest keeping the bang pattern you have in the final version, as it's more readable, but f c _ | seq c False = undefined would work just as well too.
1 See Elegant memoization with functional memo tries and the data-memocombinators library.
2 Indeed, GHC can often even eliminate the intermediate structure entirely using fusion and deforestation, producing machine code similar to how the computation would be written in a low-level imperative language.
3 Although if you have such loops, it's quite possible that this style of programming will help you simplify them — laziness means that you can easily separate independent parts of a computation out into separate structures, then filter and combine them, without worrying that you'll be duplicating work by making intermediate computations that will later be thrown away.
OK let's work from the ground up here.
You have a list of entries
entries = [(k,j) | j <- [0..jmax], k <- [0..kmax]]
And based on those indexes, you have tests and counts
tests m n = map (\(k,j) -> j + k == m + n - 3) entries
counts = map (\(_,j) -> if (rem j 3) == 0 then 2 else 1) entries
Now you want to build up two things: a "total" count, and the list of entries that "pass" the test. The problem, of course, is that you want to generate the latter lazily, while the former (to avoid exploding the stack) should be evaluated strictly.
If you evaluate these two things separately, then you must either 1) prevent sharing entries (generate it twice, once for each calculation), or 2) keep the entire entries list in memory. If you evaluate them together, then you must either 1) evaluate strictly, or 2) have a lot of stack space for the huge thunk created for the count. Option #2 for both cases is rather bad. Your imperative solution deals with this problem simply by evaluating simultaneously and strictly. For a solution in Haskell, you could take Option #1 for either the separate or the simultaneous evaluation. Or you could show us your "real" code and maybe we could help you find a way to rearrange your data dependencies; it may turn out you don't need the total count, or something like that.

Haskell function definition and caching arrays

I have a question about implementing caching (memoization) using arrays in Haskell. The following pattern works:
f = (fA !)
  where fA = listArray...
But this does not (the speed of the program suggests that the array is getting recreated each call or something):
f n = (fA ! n)
  where fA = listArray...
Defining fA outside of a where clause (in "global scope") also works with either pattern.
I was hoping that someone could point me towards a technical explanation of what the difference between the above two patterns is.
Note that I am using the latest GHC, and I'm not sure if this is just a compiler peculiarity or part of the language itself.
EDIT: ! is used for array access, so fA ! 5 means fA[5] in C++ syntax. My understanding of Haskell is that (fA !) n would be the same as (fA ! n)...also it would have been more conventional for me to have written "f n = fA ! n" (without the parentheses). Anyway, I get the same behaviour no matter how I parenthesize.
The difference in behavior is not specified by the Haskell standard. All it has to say is that the functions are the same (will result in the same output given the same input).
However in this case there is a simple way to predict time and memory performance that most compilers adhere to. Again I stress that this is not essential, only that most compilers do it.
First rewrite your two examples as pure lambda expressions, expanding the section:
f = let fA = listArray ... in \n -> fA ! n
f' = \n -> let fA = listArray ... in fA ! n
Compilers use let binding to indicate sharing. The guarantee is that in a given environment (set of local variables, lambda body, something like that), the right side of a let binding with no parameters will be evaluated at most once. The environment of fA in the former is the whole program since it is not under any lambda, but the environment of the latter is smaller since it is under a lambda.
What this means is that in the latter, fA may be evaluated once for each different n, whereas in the former this is forbidden.
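Here is a self-contained illustration of the two shapes with a concrete (made-up) table, so the only difference is where fA sits relative to the binding of n:
import Data.Array (Array, listArray, (!))

-- fA is bound outside the \n (fShared is a partial application of (!)),
-- so the array is built once and shared by every call.
fShared :: Int -> Int
fShared = (fA !)
  where
    fA :: Array Int Int
    fA = listArray (0, 1000) [ i * i | i <- [0 .. 1000] ]

-- fA sits under the binding of n, so a compiler is free to rebuild the
-- array on every call (and, as discussed below, may deliberately do so).
fRebuilt :: Int -> Int
fRebuilt n = fA ! n
  where
    fA :: Array Int Int
    fA = listArray (0, 1000) [ i * i | i <- [0 .. 1000] ]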
We can see this pattern in effect even with multi argument functions:
g x y = (a !! y) where a = [ x ^ y' | y' <- [0..] ]
g' x = (\y -> a !! y) where a = [ x ^ y' | y' <- [0..] ]
(Here a is a list rather than an array, so the list-indexing operator !! is what type-checks.)
Then in:
let k = g 2 in k 100 + k 100
We might compute 2^100 more than once, but in:
let k = g' 2 in k 100 + k 100
We will only compute it once.
If you are doing work with memoization, I recommend data-memocombinators on Hackage, which is a library of memo tables of different shapes, so you don't have to roll your own.
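For reference, typical use of data-memocombinators looks roughly like this (a sketch from memory; check the package documentation for the exact combinator names):
import qualified Data.MemoCombinators as Memo

-- The memo table is attached outside the lambda over the argument -- the
-- same shape as the working pattern above -- so results are shared
-- across calls.
fib :: Integer -> Integer
fib = Memo.integral fib'
  where
    fib' 0 = 0
    fib' 1 = 1
    fib' n = fib (n - 1) + fib (n - 2)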
The best way to find what is going on is to tell the compiler to output its intermediate representation with -v4. The output is voluminous and a bit hard to read, but should allow you to find out exactly what the difference in the generated code is, and how the compiler arrived there.
You will probably notice that fA is being moved outside the function (to the "global scope") on your first example. On your second example, it probably is not (meaning it will be recreated on each call).
One possible reason for it not being moved outside the function would be because the compiler is thinking it depends on the value of n. On your working example, there is no n for fA to depend on.
But the reason I think the compiler is avoiding moving fA outside on your second example is because it is trying to avoid a space leak. Consider what would happen if fA, instead of your array, were an infinite list (on which you used the !! operator). Imagine you called it once with a large number (for instance f 10000), and later only called it with small numbers (f 2, f 3, f 12...). The 10000 elements from the earlier call are still on memory, wasting space. So, to avoid this, the compiler creates fA again every time you call your function.
The space leak avoidance probably does not happen on your first example because in that case f is in fact only called once, returning a closure (we are now at the frontier of the pure functional and the imperative worlds, so things get a bit more subtle). This closure replaces the original function, which will never be called again, so fA is only called once (and thus the optimizer feels free to move it outside the function). On your second example, f does not get replaced by a closure (since its value depends on the argument), and thus will get called again.
If you want to try to understand more of this (which will help reading the -v4 output), you could take a look at the Spineless Tagless G-Machine paper (citeseer link).
As to your final question, I think it is a compiler peculiarity (but I could be wrong). However, it would not surprise me if all compilers did the same thing, even if it were not part of the language.
Cool, thank you for your answers which helped a lot, and I will definitely check out data-memocombinators on Hackage. Coming from a C++-heavy background, I've been struggling with understanding exactly what Haskell will do (mainly in terms of complexity) with a given program, which tutorials don't seem to get into.
