ghci compiler optimization: calling a function with same parameter twice - haskell

In the simple code below, part of the definition of a function that deletes an element from a binary search tree:
deleteB x (Node n l r) | x == n = Node (leastB r) l (deleteB (leastB r) r)
does the compiler optimize the code so that it calls (leastB r) only once, as if it were:
deleteB x (Node n l r) | x == n = Node k l (deleteB k r)
where k = leastB r
?
In other words, is the compiler able to understand that since parameter r isn't changed within the body of the function deleteB, the result of the call of the same function (leastB) on it can't give different results, hence it is useless to compute it twice?
More generally, how would I be able to tell whether the compiler does this optimization or not, in case amazing Stack Overflow did not exist? Thanks.

If you want to know what GHC "really did", you want to look at the "Core" output.
GHC takes your Haskell source code, which is extremely high-level, and transforms it into a sequence of lower and lower-level languages:
Haskell ⇒ Core ⇒ STG ⇒ C−− ⇒ assembly language ⇒ machine code
Almost all of the high-level optimisations happen in Core. The one you're asking about is basically "common subexpression elimination" (CSE). If you think about it, this is a time / space tradeoff; by saving the previous result, you're using less CPU time, but also using more RAM. If the result you're trying to store is tiny (e.g., an integer), this can be worth it. If the result is huge (e.g., the entire contents of that 17 GB text file you just loaded), this is probably a Very Bad Idea.
As I understand it (⇒ not very well!), GHC tends not to do CSE. But if you want to know for sure, in your specific case, you want to look at the Core that your program has actually been compiled into. I believe the switch you want is -ddump-prep.
http://www.haskell.org/ghc/docs/7.0.2/html/users_guide/options-debugging.html
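For example, here is a self-contained sketch (the Tree type, leastB and the other cases of deleteB are minimal assumptions of mine, not taken from the question) in which the repeated call is shared explicitly with a let, so nothing depends on GHC performing CSE:
-- Sketch: share the repeated call yourself instead of relying on CSE.
data Tree a = Leaf | Node a (Tree a) (Tree a)
leastB :: Tree a -> a
leastB (Node n Leaf _) = n
leastB (Node _ l _)    = leastB l
leastB Leaf            = error "leastB: empty tree"
deleteB :: Ord a => a -> Tree a -> Tree a
deleteB _ Leaf = Leaf
deleteB x (Node n l r)
  | x < n     = Node n (deleteB x l) r
  | x > n     = Node n l (deleteB x r)
  | otherwise = case r of
      Leaf -> l
      _    -> let k = leastB r in Node k l (deleteB k r)
-- To see what GHC really produced, compile with e.g.
--   ghc -O2 -ddump-simpl -dsuppress-all Delete.hs   (the file name is illustrative)
-- and check whether "leastB r" is let-bound once or computed twice in the Core.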

GHC does not perform this optimization because it is not always an optimization space-wise.
For instance, consider
n = 1000000
x = (length $ map succ [1..n], length $ map pred [1..n])
In a lazy language such as Haskell, one would expect this to run in constant space. Indeed, the list-generating expression [1..n] should lazily produce one element at a time, which would be passed through succ/pred by the maps and then counted by length. (Even better, succ and pred are not computed at all, since length does not force the list elements.) After an element has been counted, it can be garbage collected, and the list generator can produce the next element, and so on. In real implementations one would not expect every single element to be garbage collected immediately, but if the garbage collector is good, only a constant number of them should be in memory at any time.
By comparison, the "optimized" code
n = 1000000
l = [1..n]
x = (length $ map succ l, length $ map pred l)
does not allow the elements of l to be garbage collected until both components of x are evaluated. So, while it produces the list only once, it uses O(n) words of memory to store the full list. This is likely to lead to lower performance than the unoptimized code.
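One way to check this claim yourself (a sketch of mine; the exact numbers depend on GHC version and flags): compile each variant and compare the RTS memory statistics.
-- Build with   ghc -O0 Main.hs   and run with   ./Main +RTS -s
-- then compare "maximum residency" between this version and the one that
-- names l = [1..n] once and reuses it: the shared version peaks at O(n) words.
main :: IO ()
main = print x
  where
    n = 1000000 :: Int
    x = (length $ map succ [1..n], length $ map pred [1..n])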

Related

haskell length runtime O(1) or O(n)

I was working on a Haskell assignment and I was trying to think of ways to make my code faster.
For example, my factors function below finds the number of divisors of some integer.
factors :: Int -> Int
factors x = length [n | n <- [1..x], mod x n == 0]
However, it occurred to me that I could make my code faster by avoiding usage of "length".
factors :: Int -> Int
factors x = go 1
  where
    go :: Int -> Int
    go i
      | i == x       = 1
      | mod x i == 0 = 1 + go (i + 1)
      | otherwise    = go (i + 1)
I was wondering if Haskell's length function is O(n) like strlen() in C or O(1) like String.length() in Java.
Also, is there a better or more efficient way of writing my code?
In my estimation, contrary to the accepted answer, you can in fact infer the complexity of length (and many other functions) just by looking at the definition of [a]:
Prelude> :info []
data [] a = [] | a : [a] -- Defined in ‘GHC.Types’
Lists are inductively defined; you can see from that definition (which is almost just regular Haskell) that at the top level a list is either the constructor [] or :. Clearly length must recurse n times on this structure and so would have to be O(n).
It's very important to be able to reason at least intuitively in this way, in particular about lists, which are ubiquitous. E.g., quick: what's the complexity of (!!)?
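For intuition, here is roughly the recursion that definition forces on length (a naive sketch; GHC's real definition uses an accumulator but is still O(n)):
length' :: [a] -> Int
length' []     = 0               -- one step for the [] constructor
length' (_:xs) = 1 + length' xs  -- one step per (:) cell, so O(n) overall
-- (!!) is analogous: it has to walk k cons cells to reach index k, so it is O(k).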
If you want to do a deep dive into formally reasoning about time complexity in the presence of laziness then you'll need to pick up "Purely Functional Data Structures" by Okasaki.
Also, is there a better or more efficient way of writing my code?
Integer factorization is one of the most famous problems. A lot of algorithms have been proposed for it, even if I am not expert enough to make a recommendation (CS.SE is around the corner and can help with that, if needed). None of those proposals is polynomial time, but that doesn't stop them from being faster than the trivial approach.
Even without looking at the literature, a few simple optimizations can be found.
The original code scans the whole list [1..x], but this is not needed. We could stop at sqrt x, since every divisor larger than that pairs up with one smaller than it.
Even more: after we find a divisor m, we could divide x by m (as many times as possible) and recurse on this new number. E.g. if x = 1000, after we try m = 2 we compute 1000 -> 500 -> 250 -> 125, and then look for the new divisors (larger than 2) of 125. Note how this made the number much smaller.
I will leave implementing these strategies in Haskell as an exercise :-P
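(If you want to peek at one possible answer, here is a sketch of mine, not a reference implementation: it counts divisors via the prime factorisation, stopping at sqrt x and dividing each factor out as it is found.)
numDivisors :: Int -> Int
numDivisors = go 2
  where
    go p x
      | p * p > x      = if x > 1 then 2 else 1  -- a leftover prime factor has exponent 1
      | x `mod` p == 0 = let (e, x') = divideOut p x 0
                         in (e + 1) * go (p + 1) x'
      | otherwise      = go (p + 1) x
    divideOut p x e
      | x `mod` p == 0 = divideOut p (x `div` p) (e + 1)
      | otherwise      = (e, x)
-- e.g. numDivisors 1000 == 16, since 1000 = 2^3 * 5^3 and (3+1) * (3+1) = 16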
From a theoretical perspective, we cannot know whether length is Θ(n); we know that it is O(n), but it is technically possible that Haskell implements it faster for known lists, since a Haskell compiler is free to implement lists in whatever way it wants. Nevertheless it does not matter, because in that case generating the list in the first place will take Θ(n).
Note that even if the compiler uses a more dedicated data structure, Haskell is lazy, so your list comprehension does not result in a complete list, but rather in a computation that can generate the list lazily.
Finally, if we were to evaluate the list comprehension eagerly, then it would again require O(n) to generate the list. So even if obtaining the length were very fast, generating the list would require O(n) as a lower bound. So regardless of the efficiency of length, the algorithm will still scale linearly with the input.
Your own implementation again runs in O(n) (and is not very safe, to be honest). Nevertheless, you can easily speed up the factorization of a number to O(sqrt n):
factors :: Int -> Int
factors x = go 1
  where
    go :: Int -> Int
    go i | i2 > x       = 0
         | i2 == x      = 1
         | mod x i == 0 = 2 + go (i + 1)
         | otherwise    = go (i + 1)
      where i2 = i * i
Here we enumerate from 1 to sqrt(x). Each time we find a factor a, we know that there is a co-factor b = x/a. As long as a is not equal to sqrt(x), we know that these are different. In case a is equal to sqrt(x), we know that a is equal to b and thus we count this as one.
That being said, there are definitely faster ways to do it. It is a topic with a lot of research that has yielded more efficient algorithms. I'm not suggesting that the above is the fastest, but it is definitely a huge improvement in terms of time complexity.
Building the list with the list comprehension already takes O(n). Therefore there is not much overhead when using a length function which should have complexity O(n) in the worst case.

Universal memoization in Haskell [duplicate]

I can't figure out why m1 is apparently memoized while m2 is not in the following:
m1 = ((filter odd [1..]) !!)
m2 n = ((filter odd [1..]) !! n)
m1 10000000 takes about 1.5 seconds on the first call, and a fraction of that on subsequent calls (presumably it caches the list), whereas m2 10000000 always takes the same amount of time (rebuilding the list with each call). Any idea what's going on? Are there any rules of thumb as to if and when GHC will memoize a function? Thanks.
GHC does not memoize functions.
It does, however, compute any given expression in the code at most once per time that its surrounding lambda-expression is entered, or at most once ever if it is at top level. Determining where the lambda-expressions are can be a little tricky when you use syntactic sugar like in your example, so let's convert these to equivalent desugared syntax:
m1' = (!!) (filter odd [1..]) -- NB: See below!
m2' = \n -> (!!) (filter odd [1..]) n
(Note: The Haskell 98 report actually describes a left operator section like (a %) as equivalent to \b -> (%) a b, but GHC desugars it to (%) a. These are technically different because they can be distinguished by seq. I think I might have submitted a GHC Trac ticket about this.)
Given this, you can see that in m1', the expression filter odd [1..] is not contained in any lambda-expression, so it will only be computed once per run of your program, while in m2', filter odd [1..] will be computed each time the lambda-expression is entered, i.e., on each call of m2'. That explains the difference in timing you are seeing.
Actually, some versions of GHC, with certain optimization options, will share more values than the above description indicates. This can be problematic in some situations. For example, consider the function
f = \x -> let y = [1..30000000] in foldl' (+) 0 (y ++ [x])
GHC might notice that y does not depend on x and rewrite the function to
f = let y = [1..30000000] in \x -> foldl' (+) 0 (y ++ [x])
In this case, the new version is much less efficient because it will have to read about 1 GB from memory where y is stored, while the original version would run in constant space and fit in the processor's cache. In fact, under GHC 6.12.1, the function f is almost twice as fast when compiled without optimizations than it is compiled with -O2.
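If this kind of over-sharing bites in practice, one workaround (a sketch; -fno-full-laziness is a real GHC flag that switches off the let-floating described above, and the module name here is illustrative) is to disable the transformation for the affected module:
{-# OPTIONS_GHC -fno-full-laziness #-}
module BigFold where
import Data.List (foldl')
-- With full laziness disabled, y stays inside the lambda and is rebuilt
-- (and garbage collected) on every call, keeping the space usage constant.
f :: Int -> Int
f = \x -> let y = [1 .. 30000000] in foldl' (+) 0 (y ++ [x])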
m1 is computed only once because it is a Constant Applicative Form, while m2 is not a CAF, and so is computed for each evaluation.
See the GHC wiki on CAFs: http://www.haskell.org/haskellwiki/Constant_applicative_form
There is a crucial difference between the two forms: the monomorphism restriction applies to m1 but not m2, because m2 has explicitly given arguments. So m2's type is general but m1's is specific. The types they are assigned are:
m1 :: Int -> Integer
m2 :: (Integral a) => Int -> a
Most Haskell compilers and interpreters (all of them that I know of actually) do not memoize polymorphic structures, so m2's internal list is recreated every time it's called, where m1's is not.
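A common workaround, given that (a hypothetical variant of mine, not from the question): give the shared structure a monomorphic top-level name of its own, so it becomes a CAF regardless of how the function around it is written.
oddsList :: [Integer]
oddsList = filter odd [1..]   -- a CAF: built once and retained for the whole run
m3 :: Int -> Integer
m3 n = oddsList !! n          -- every call walks the same shared list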
I'm not sure, because I'm quite new to Haskell myself, but it appears that it's because the second function is parametrized and the first one is not. The nature of a function is that its result depends on the input value, and in the functional paradigm especially it depends ONLY on the input. The obvious implication is that a function with no parameters always returns the same value, over and over, no matter what.
Apparently there's an optimizing mechanism in the GHC compiler that exploits this fact to compute the value of such a function only once for the whole program runtime. It does it lazily, to be sure, but does it nonetheless. I noticed it myself when I wrote the following function:
primes = filter isPrime [2..]
  where isPrime n = null [factor | factor <- [2..n-1], factor `divides` n]
          where f `divides` n = (n `mod` f) == 0
Then to test it, I entered GHCi and wrote: primes !! 1000. It took a few seconds, but finally I got the answer: 7927. Then I called primes !! 1001 and got the answer instantly. Similarly, I got the result for take 1000 primes in an instant, because Haskell had already computed the whole thousand-element list in order to return the 1001st element earlier.
Thus if you can write your function such that it takes no parameters, you probably want to do so. ;)
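The classic illustration of that rule of thumb (a textbook sketch, not part of the original answer) is the memoised Fibonacci list: the table is shared across calls because it hangs off a top-level binding with no arguments.
memoFib :: Int -> Integer
memoFib = (table !!)
  where
    table = map fib [0..]          -- built once and shared by every call
    fib 0 = 0
    fib 1 = 1
    fib n = memoFib (n - 1) + memoFib (n - 2)   -- recursive calls hit the table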

tail recursion recognition

I'm trying to learn Haskell and I stumbled upon the following:
myAdd (x:xs) = x + myAdd xs
myAdd null = 0
f = let n = 10000000 in myAdd [1 .. n]
main = do
  putStrLn (show f)
When compiling with GHC, this yields a stack overflow. As a C/C++ programmer, I would have expected the compiler to do tail call optimization.
I don't like that I would have to "help" the compiler in simple cases like these, but what options are there? I think it is reasonable to require that the calculation given above be done without using O(n) memory, and without deferring to specialized functions.
If I cannot state my problem naturally (even on a toy problem such as this), and expect reasonable performance in terms of time & space, much of the appeal of Haskell would be lost.
Firstly, make sure you're compiling with -O2. It makes a lot of performance problems just go away :)
The first problem I can see is that null is just a variable name there. You want []. It's equivalent here because the only options are x:xs and [], but it won't always be.
The issue here is simple: when you call myAdd [1,2,3,4], it looks like this:
1 + (2 + (3 + (4 + 0)))
without ever reducing any of these additions to a number, because of Haskell's non-strict semantics. The solution is simple:
myAdd = myAdd' 0
  where myAdd' !total []     = total
        myAdd' !total (x:xs) = myAdd' (total + x) xs
(You'll need {-# LANGUAGE BangPatterns #-} at the top of your source file to compile this.)
This accumulates the addition in another parameter, and is actually tail recursive (yours isn't; + is in tail position rather than myAdd). But in fact, it's not quite tail recursion we care about in Haskell; that distinction is mainly relevant in strict languages. The secret here is the bang pattern on total: it forces it to be evaluated every time myAdd' is called, so no unevaluated additions build up, and it runs in constant space. In this case, GHC can actually figure this out with -O2 thanks to its strictness analysis, but I think it's usually best to be explicit about what you want strict and what you don't.
Note that if addition was lazy, your myAdd definition would work fine; the problem is that you're doing a lazy traversal of the list with a strict operation, which ends up causing the stack overflow. This mostly comes up with arithmetic, which is strict for the standard numeric types (Int, Integer, Float, Double, etc.).
This is quite ugly, and it would be a pain to write something like this every time we want to write a strict fold. Thankfully, Haskell has an abstraction ready for this!
myAdd = foldl' (+) 0
(You'll need to add import Data.List to compile this.)
foldl' (+) 0 [a, b, c, d] is just like (((0 + a) + b) + c) + d, except that at each application of (+) (which is how we refer to the binary operator + as a function value), the value is forced to be evaluated. The resulting code is cleaner, faster, and easier to read (once you know how the list folds work, you can understand any definition written in terms of them easier than a recursive definition).
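Roughly, foldl' behaves like the following hand-written sketch (the real Data.List version differs in details, but it forces the accumulator in the same way):
foldl'' :: (b -> a -> b) -> b -> [a] -> b
foldl'' _ acc []     = acc
foldl'' f acc (x:xs) = let acc' = f acc x
                       in acc' `seq` foldl'' f acc' xs   -- force the accumulator before recursing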
Basically, the problem here is not that the compiler can't figure out how to make your program efficient — it's that making it as efficient as you like could change its semantics, which an optimisation should never do. Haskell's non-strict semantics certainly pose a learning curve to programmers in more "traditional" languages like C, but it gets easier over time, and once you see the power and abstraction that Haskell's non-strictness offers, you'll never want to go back :)
Expanding the example ehird hinted at in the comments:
data Peano = Z | S Peano
  deriving (Eq, Show)
instance Ord Peano where
  compare (S a) (S b) = compare a b
  compare Z     Z     = EQ
  compare Z     _     = LT
  compare _     _     = GT
instance Num Peano where
  Z     + n = n
  (S a) + n = S (a + n)
  -- omit others
  fromInteger 0 = Z
  fromInteger n
    | n < 0     = error "Peano: fromInteger requires non-negative argument"
    | otherwise = S (fromInteger (n - 1))
instance Enum Peano where
  succ = S
  pred (S a) = a
  pred _     = error "Peano: no predecessor"
  toEnum n
    | n < 0     = error "toEnum: invalid argument"
    | otherwise = fromInteger (toInteger n)
  fromEnum Z     = 0
  fromEnum (S a) = 1 + fromEnum a
  enumFrom = iterate S
  enumFromTo a b = takeWhile (<= b) $ enumFrom a
  -- omit others
infinity :: Peano
infinity = S infinity
result :: Bool
result = 3 < myAdd [1 .. infinity]
result is True by the definition of myAdd, but if the compiler transformed into a tail-recursive loop, it wouldn't terminate. So that transformation is not only a change in efficiency, but also in semantics, hence a compiler must not do it.
A little funny example regarding "The issue is why the compiler is unable to optimize something that appears to be rather trivial to optimize."
Let's say I'm coming from Haskell to C++. I used to write foldr because in Haskell foldr is usually more effective than foldl because of laziness and list fusion.
So I'm trying to write a foldr for a singly-linked list in C and complaining about why it's grossly inefficient:
int foldr(int (*f)(int, int), int base, node *list)
{
    /* assumes node has an int "value" field and a "next" pointer */
    return list == NULL
        ? base
        : f(list->value, foldr(f, base, list->next));
}
It is inefficient not because the C compiler in question is an unrealistic toy tool developed by ivory tower theorists for their own satisfaction, but because the code in question is grossly non-idiomatic for C.
It is not the case that you cannot write an efficient foldr in C: you just need a doubly-linked list. In Haskell, similarly, you can write an efficient foldl; you just need strictness annotations for foldl to be efficient. The standard library provides both foldl (without annotations) and foldl' (with annotations).
The idea of left folding a list in Haskell is the same kind of perversion as a desire to iterate a singly-linked list backwards using recursion in C. The compiler is there to help normal people, not perverts lol.
Just as your C++ projects probably don't have code iterating singly-linked lists backwards, my HNC project contains only one foldl, which I wrote incorrectly before I had mastered Haskell. You hardly ever need foldl in Haskell.
You must unlearn the idea that forward iteration is natural and fastest, and learn that backward iteration is. Forward iteration (left folding) does not do what you intend until you annotate: it does three passes - list creation, thunk chain buildup and thunk evaluation - instead of two (list creation and list traversal). Note that in an immutable world lists can only be created efficiently backwards: a : b is O(1), while a ++ [b] is O(N).
And backward iteration doesn't do what you intend either: it takes one pass, not the several you might expect from your C background. It doesn't create a list, traverse it to the end and then walk it backwards - it traverses the list as it creates it, in a single pass. With optimizations on, it is just a loop: no actual list elements are created. With optimizations off, it is still an O(1) space operation with a bigger constant overhead, but the explanation is a bit longer.
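To make the one-pass point concrete, here is a small illustration of my own (not from the answer): a right fold with a combining function that is lazy in its second argument can stop early and never even forces the rest of the list, while a left fold has to reach the end before returning anything.
import Data.List (foldl')
anyEven :: [Int] -> Bool
anyEven = foldr (\x rest -> even x || rest) False
-- anyEven (3 : 5 : 4 : error "tail never forced") == True
sumAll :: [Int] -> Int
sumAll = foldl' (+) 0   -- must consume the entire list before producing a result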
So there are two things I will address about your problem: firstly the performance problem, and secondly the expressiveness problem, that of having to help the compiler with something that seems trivial.
The performance
The thing is that your program is in fact not tail recursive, that is, there is no single call to a function that can replace the recursion. Let's have a look at what happens when we expand myAdd [1..3]:
myAdd [1,2,3]
1 + myAdd [2,3]
1 + (2 + myAdd [3])
1 + (2 + (3 + myAdd []))
As you can see, at any given step we cannot replace the recursion with a single function call: the pending additions pile up around it and cannot be reduced until the recursion reaches the end of the list. That is exactly the build-up that tail recursion avoids.
So here is a version that is tail recursive:
myAdd2 = go 0
  where go a []     = a
        go a (x:xs) = go (a + x) xs
Let's have a look at how go 0 [1,2,3] is evaluated:
go 0 [1,2,3]
go (0 + 1) [2,3]
go ((0 + 1) + 2) [3]
As you can see, at every step we only need to keep track of one function call, and as long as the first parameter is evaluated strictly we should not get a space blow-up from accumulated thunks; in fact, if you compile with optimization (-O1 or -O2), GHC is smart enough to figure that out on its own.
Expressiveness
Alright, so it is a bit harder to reason about performance in Haskell, but most of the time you don't have to. The thing is that you can use combinators that ensure efficiency. This particular pattern above is captured by foldl (and its strict cousin foldl'), so myAdd can be written as:
myAdd = foldl (+) 0
and if you compile that with optimization it will not give you a space blowup!

Extent of GHC's optimization

I am not very familiar with the degree to which Haskell/GHC can optimize code. Below I have a pretty "brute-force" (in the declarative sense) implementation of the n queens problem. I know it can be written more efficiently, but that's not my question. It's that this got me thinking about GHC's optimization capabilities and limits.
I have expressed it in what I consider a pretty straightforward declarative sense: filter the permutations of [1..n] that fulfil the predicate "for all indices i, j with i < j, abs(v_i - v_j) /= j - i". I would hope this is the kind of thing that can be optimized, but it also kind of feels like asking a lot of the compiler.
validQueens x = and [abs (x!!i - x!!j) /= j-i | i<-[0..length x - 2], j<-[i+1..length x - 1]]
queens n = filter validQueens (permutations [1..n])
oneThru x = [1..x]
pointlessQueens = filter validQueens . permutations . oneThru
main = do
  n <- getLine
  print $ pointlessQueens $ (read :: String -> Int) n
This runs fairly slowly and grows quickly. n=10 takes about a second and n=12 takes forever. Without optimization I can tell the growth is factorial (# of permutations) multiplied by quadratic (# of differences in the predicate to check). Is there any way this code can perform better through intelligent compilation? I tried the basic GHC options such as -O2 and didn't notice a significant difference, but I don't know the finer points (I just added the flags).
My impression is that the function I call queens cannot be optimized and must generate all permutations before filtering. Does the point-free version have a better chance? On the one hand I feel like a smart fusion of the filter with the predicate might be able to knock off some obviously undesired elements before they are even fully generated, but on the other hand it kind of feels like a lot to ask.
Sorry if this seems rambling; I guess my question is:
Is the point-free version of the above function more capable of being optimized?
What steps could I take at make/compile/link time to encourage optimization?
Can you briefly describe some possible (and contrast with the impossible!) means of optimization for the above code? At what point in the process do these occur?
Is there any particular part of the ghc --make queensN -O2 -v output I should be paying attention to? Nothing stands out to me. I don't even see much difference in the output due to the optimization flags.
I am not overly concerned with this code example, but I thought writing it got me thinking and it seems to me like a decent vehicle for discussing optimization.
PS - permutations is from Data.List and looks like this:
permutations :: [a] -> [[a]]
permutations xs0 = xs0 : perms xs0 []
  where
    perms []     _  = []
    perms (t:ts) is = foldr interleave (perms ts (t:is)) (permutations is)
      where interleave    xs     r = let (_,zs) = interleave' id xs r in zs
            interleave' _ []     r = (ts, r)
            interleave' f (y:ys) r = let (us,zs) = interleave' (f . (y:)) ys r
                                     in  (y:us, f (t:y:us) : zs)
At a more general level regarding "what kind of optimizations can GHC do", it may help to break the idea of an "optimization" apart a little bit. There are conceptual distinctions that can be drawn between aspects of a program that can be optimized. For instance, consider:
The intrinsic logical structure of the algorithm: You can safely assume in almost every case that this will never be optimized. Outside of experimental research, you're not likely to find a compiler that will replace a bubble sort with a merge sort, or even an insertion sort, and extremely unlikely to find one that would replace a bogosort with something sensible.
Non-essential logical structure of the algorithm: For instance, in the expression g (f x) (f x), how many times will f x be computed? What about an expression like g (f x 2) (f x 5)? These aren't intrinsic to the algorithm, and different variations can be interchanged without impacting anything other than performance. The difficulties in performing optimization here are essentially recognizing when a substitution can in fact be done without changing the meaning, and predicting which version will have the best results. A lot of manual optimizations fall into this category, along with a great deal of GHC's cleverness.
This is also the part that trips a lot of people up, because they see how clever GHC is and expect it to do even more. And because of the reasonable expectation that GHC should never make things worse, it's not uncommon to have potential optimizations that seem obvious (and are, to the programmer) that GHC can't apply because it's nontrivial to distinguish cases where the same transformation would significantly degrade performance. This is, for instance, why memoization and common subexpression elimination aren't always automatic.
This is also the part where GHC has a huge advantage, because laziness and purity make a lot of things much easier, and is I suspect what leads to people making tongue-in-cheek remarks like "Optimizing compilers are a myth (except perhaps in Haskell).", but also to unrealistic optimism about what even GHC can do.
Low-level details: Things like memory layout and other aspects of the final code. These tend to be somewhat arcane and highly dependent on implementation details of the runtime, the OS, and the processor. Optimizations of this sort are essentially why we have compilers, and usually not something you need to worry about unless you're writing code that is very computationally demanding (or are writing a compiler yourself).
As far as your specific example here goes: GHC isn't going to significantly alter the intrinsic time complexity of your algorithm. It might be able to remove some constant factors. What it can't do is apply constant-factor improvements that it can't be sure are correct, particularly ones that technically change the meaning of the program in ways that you don't care about. Case in point here is sclv's answer, which explains how your use of print creates unnecessary overhead; there's nothing GHC could do about that, and in fact the current form would possibly inhibit other optimizations.
There's a conceptual problem here. permutations generates its results in a streaming fashion, and filter is streaming too. What's forcing everything prematurely is the "show" implicit in "print". Change your last line to:
mapM print $ pointlessQueens $ (read :: String -> Int) n
and you'll see that results are generated in a streaming fashion much more rapidly. That fixes, for large result sets, a potential space leak, and other than that just lets things be printed as computed rather than all at once at the end.
However, you shouldn't expect any order of magnitude improvements from ghc optimizations (there are a few, obvious ones that you do get, mostly having to do with strictness and folds, but its irritating to rely on them). What you'll get are constant factors, generally.
Edit: As luqui points out below, show is also streaming (or at least show of [Int] is), but the line buffering nonetheless makes it harder to see the genuine speed of computation...
It should be noted, although you do express that it is not part of your question, that the big problem with your code is that you do not do any pruning.
In the case of your question, it feels foolish to talk about possible/impossible optimization, compiler flags, and how to best formulate it etc. when an improvement of the algorithm is staring us so blatantly in the face.
One of the first things that will be tried is the permutations starting with the first queen in position 1 and the second queen in position 2 ([1,2...]). This is of course not a solution and we will have to move one of the queens. However, in your implementation, all permutations involving this combination of the two first queens will be tested! The search should stop there and instantly move to the permutations involving [1,3,...].
Here is a version that does this sort of pruning:
import Data.List
import Control.Monad
main = getLine >>= mapM print . queens . read
queens :: Int -> [[Int]]
queens n = queens' [] n
queens' xs n
  | length xs == n = return xs
  | otherwise = do
      x <- [1..n] \\ xs
      guard (validQueens (x:xs))
      queens' (x:xs) n
validQueens x =
  and [abs (x!!i - x!!j) /= j-i | i <- [0..length x - 2], j <- [i+1..length x - 1]]
I understand that your question was about compiler optimization but as the discussion has shown pruning is necessary.
The first paper that I know of about how to do this for the n queens problem in a lazy functional language is Turner's paper "Recursion Equations as a Programming Language"; you can read it on Google Books.
In terms of your comment about a pattern worth remembering, this problem introduces a very powerful pattern. A great paper on this idea is Philip Wadler's paper "How to Replace Failure by a List of Successes", which can also be read on Google Books.
Here is a pure, non-monadic, implementation based on Turner's Miranda implementation. In the case of n = 12 (queens 12 12) it returns the first solution in .01 secs and will compute all 14,200 solutions in under 6 seconds. Of course printing those takes much longer.
queens :: Int -> Int -> [[Int]]
queens n boardsize =
  queensi n
  where
    -- given a safe arrangement of queens in the first n - 1 rows,
    -- "queensi n" returns a list of all the safe arrangements of queens
    -- in the first n rows
    queensi :: Int -> [[Int]]
    queensi 0 = [[]]
    queensi n = [ x : y | y <- queensi (n-1), x <- [1..boardsize], safe x y 1]
    -- "safe x y n" tests whether a queen at column x would be safe from previous
    -- queens in y, where the first element of y is n rows away from x, the second
    -- element is (n+1) rows away from x, etc.
    safe :: Int -> [Int] -> Int -> Bool
    safe _ []    _ = True
    safe x (c:y) n = and [ x /= c, x /= c + n, x /= c - n, safe x y (n+1)]
    -- we only need to check for queens in the same column and on the same diagonals;
    -- queens in the same row are not possible, by the fact that we only pick one
    -- queen per row
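A minimal driver for the function above (my addition, not the answer's): printing just the count keeps the output from dominating the measurement.
main :: IO ()
main = print (length (queens 12 12))   -- expected: 14200 solutions on a 12x12 board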


Resources