haskell length runtime O(1) or O(n)

haskell length runtime O(1) or O(n) - haskell

I was working on a Haskell assignment and I was trying think of ways to make my code faster.
For example, my factors function below finds the number of divisors of some integer.
factors :: Int -> Int
factors x = length [n | n <- [1..x], mod x n == 0]
However, it occurred to me that I could make my code faster by avoiding usage of "length".
factors :: Int -> Int
factors x = go 1
where
go :: Int -> Int
go i
| i == x = 1
| mod x i == 0 = 1 + go (i + 1)
| otherwise = go (i + 1)
I was wondering if Haskell's length function is O(n) like strlen() in C or O(1) like String.length() in Java.
Also, is there a better or more efficient of writing my code?

In my estimation, contrary to the accepted answer, you can in fact infer the complexity of length (and many other functions) just by looking at the definition of [a]:
Prelude> :info []
data [] a = [] | a : [a] -- Defined in ‘GHC.Types’
Lists are inductively-defined; you can see from that definition (which is almost just regular haskell) that at the top level a list is either the constructor [] or :. Clearly length must recurse n times on this structure and so would have to be O(n).
It's very important to be able to reason at least intuitively in this way, in particular about lists which are ubiquitous. e.g. quick what's the complexity of (!!)?
If you want to do a deep dive into formally reasoning about time complexity in the presence of laziness then you'll need to pick up "Purely Functional Data Structures" by Okasaki.

Also, is there a better or more efficient of writing my code?
Integer factorization is one of the most famous problems. There surely have been proposed a lot of algorithms for that, even if I am not expert enough to make a recommendation (CS.SE is around the corner, and can help on that, if needed). None of such proposals is polynomial time, but this doesn't stop them to be faster than the trivial approach.
Even without looking at the literature, a few simple optimizations can be found.
The original code scans the whole list [1..x], but this is not needed. We could stop at sqrt x, since after that there are no longer divisors.
Even more: after we find a divisor m, we could divide x by m (as many times as possible), and recurse with this new number. E.g. if x = 1000 after we try m=2, we compute 1000 -> 500 -> 250 -> 125, and then find the new divisors (larger than 2) in 125. Note how this made the number much smaller.
I will leave implementing this strategies in Haskell as an exercise :-P

From a theoretical perspective, we can not know whether length is θ(n), we know that it is O(n), but it is technically possible that Haskell implements it faster for known lists.
Since a Haskell compiler could be free to implement a list whatever way they want to. But nevertheless it does not matter, since in that case generating the list in the first place will take θ(n).
Note that even if the compiler uses a more dedicated datastructure, Haskell is lazy, so your list comprehension does not result in a complete list, but more in a function that can generate a list lazily.
Finally if we would evaluate the list comprehension eagerly, then it would again require O(n) to first generate the list in the first place. So even if obtaining the length was very fast, then generating the list would require O(n) as a lower bound. So regardless what the efficiency of length is, the algorithm will still scale linearly with the input.
Your own implementation again uses O(n) (and is not very safe to be honest). Nevertheless, you can easily speedup the factorization of a number to O(sqrt n):
factors :: Int -> Int
factors x = go 1
where
go :: Int -> Int
go i | i2 > x = 0
| i2 == x = 1
| mod x i == 0 = 2 + go (i+1)
| otherwise = go (i + 1)
where i2 = i*i
Here we enumerate from 1 to sqrt(n). Each time we find a factor a, we know that there is a co-factor b = x/a. As long as a is not equal to sqrt(x), we know that those are different. In case a is equal to sqrt(x), we know that a is equal to b and thus we count this as one.
That being said, there are definitely faster ways to do it. It is a topic with a lot of research that has yielded more efficient algorithms. I'm not suggesting that the above is the fastest, but it is definitely a huge improvement in terms of time complexity.

Building the list with the list comprehension already takes O(n). Therefore there is not much overhead when using a length function which should have complexity O(n) in the worst case.

Related

ghci compiler optimization: calling a function with same parameter twice

In the simple code below, part of the definition of a function that deletes an element from a binary search tree:
deleteB x (Node n l r) | x == n = Node (leastB r) l (deleteB (leastB r) r)
does the compiler optimize the code so that it calls (least B r) only once as if it were:
deleteB x (Node n l r) | x == n = Node k l (deleteB k r)
where k = leastB r
?
In other words, is the compiler able to understand that since parameter r isn't changed within the body of the function deleteB, the result of the call of the same function (leastB) on it can't give different results, hence it is useless to compute it twice?
More generally, how would I be able to understand if the compiler does this optimization or not in case amazing stackoverflow did not exist? thanks

If you want to know what GHC "really did", you want to look at the "Core" output.
GHC takes your Haskell source code, which is extremely high-level, and transforms it into a sequence of lower and lower-level languages:
Haskell ⇒ Core ⇒ STG ⇒ C−− ⇒ assembly language ⇒ machine code
Almost all of the high-level optimisations happen in Core. The one you're asking about is basically "common subexpression elimination" (CSE). If you think about it, this is a time / space tradeoff; by saving the previous result, you're using less CPU time, but also using more RAM. If the result you're trying to store is tiny (i.e., an integer), this can be worth it. If the result is huge (i.e., the entire contents of that 17GB text file you just loaded), this is probably a Very Bad Idea.
As I understand it (⇒ not very well!), GHC tends not to do CSE. But if you want to know for sure, in your specific case, you want to look at the Core that your program has actually been compiled into. I believe the switch you want is --ddump-prep.
http://www.haskell.org/ghc/docs/7.0.2/html/users_guide/options-debugging.html

GHC does not perform this optimization because it is not always an optimization space-wise.
For instance, consider
n = 1000000
x = (length $ map succ [1..n], length $ map pred [1..n])
On a lazy language such as Haskell, one would expect this to run in constant space. Indeed, the list generating expression [1..n] should lazily produce an element at a time, which would be affected by succ/pred because of the maps, and then counted by length. (Even better, succ and pred are not computed at all since length does not force the list elements). After this, the produced element can be garbage collected, and the list generator can produce the next element, and so on. In real implementations, one would not expect every single element to be garbage collected immediately, but if the garbage collector is good, only a constant amount of them should be in memory at any time.
By comparison, the "optimized" code
n = 1000000
l = [1..n]
x = (length $ map succ l, length $ map pred l)
does not allow to garbage collect the elements of l until both components of x are evaluated. So, while it produces the list only once, it uses O(n) words of memory to store the full list. This is likely to lead to a lower performance than the unoptimized code.

Haskell checking if number is from Fibonacci sequence

I'm Haskell beginner. Last time I have learnt about Fibonacci sequences, so I can create Fib sequence. Now I'm wondering how to write a function which checks if number belongs to Fib sequence.
I mean function:
belongToFib :: Int -> Bool
I don't really need code. Some hints how to handle with this would be enough. Thanks in advance.

I will give you some hints for a solution involving lazy evaluation:
Define the list of all fibonacci numbers.
Check whether your input number belongs to the sequence.
These are the signatures for the two things you'll need to define:
fib :: [Int]
belongToFib :: Int -> Bool
Of course you will need some tricks to make this work. Even though your list has a (theoretically) infinite sequence of numbers, if you make sure that you only need to work on a finite subsequence, thanks to its laziness, Haskell will generate only the strictly needed part, and your function will not loop forever. So, when checking for the membership of your number to fib, make sure you return False at some point.
Another possible solution is to try to find out whether your number is in the fibonacci sequence without actually generating it up to the input, but rather by relying on arithmetic only. As a hint for this, have a look at this thread.
On Wikipedia you'll find a number of other ways to check membership to the fibonacci sequence.
edit: by the way, beware of overflows with Int. You may wish to switch to Integer instead.

Here is a skeleton of a function that tests if a number occurs in an increasing list of numbers:
contains _ [] = False
contains n (x:xs)
| n == x = True
| n < x = ???
| otherwise = ???
Think about what should happen in the cases I left open...
Or, if you are both lazy and allowed to use Prelude functions, you may have a look at dropWhile instead.

Counting number of elements in a list that satisfy the given predicate

Does Haskell standard library have a function that given a list and a predicate, returns the number of elements satisfying that predicate? Something like with type (a -> Bool) -> [a] -> Int. My hoogle search didn't return anything interesting. Currently I am using length . filter pred, which I don't find to be a particularly elegant solution. My use case seems to be common enough to have a better library solution that that. Is that the case or is my premonition wrong?

The length . filter p implementation isn't nearly as bad as you suggest. In particular, it has only constant overhead in memory and speed, so yeah.
For things that use stream fusion, like the vector package, length . filter p will actually be optimized so as to avoid creating an intermediate vector. Lists, however, use what's called foldr/build fusion at the moment, which is not quite smart enough to optimize length . filter p without creating linearly large thunks that risk stack overflows.
For details on stream fusion, see this paper. As I understand it, the reason that stream fusion is not currently used in the main Haskell libraries is that (as described in the paper) about 5% of programs perform dramatically worse when implemented on top of stream-based libraries, while foldr/build optimizations can never (AFAIK) make performance actively worse.

No, there is no predefined function that does this, but I would say that length . filter pred is, in fact, an elegant implementation; it's as close as you can get to expressing what you mean without just invoking the concept directly, which you can't do if you're defining it.
The only alternatives would be a recursive function or a fold, which IMO would be less elegant, but if you really want to:
foo :: (a -> Bool) -> [a] -> Int
foo p = foldl' (\n x -> if p x then n+1 else n) 0
This is basically just inlining length into the definition. As for naming, I would suggest count (or perhaps countBy, since count is a reasonable variable name).

Haskell is a high-level language. Rather than provide one function for every possible combination of circumstances you might ever encounter, it provides you with a smallish set of functions that cover all of the basics, and you then glue these together as required to solve whatever problem is currently at hand.
In terms of simplicity and conciseness, this is as elegant as it gets. So yes, length . filter pred is absolutely the standard solution. As another example, consider elem, which (as you may know) tells you whether a given item is present in a list. The standard reference implementation for this is actually
elem :: Eq x => x -> [x] -> Bool
elem x = foldr (||) False . map (x ==)
In order words, compare every element in the list to the target element, creating a new list of Bools. Then fold the logical-OR function over this new list.
If this seems inefficient, try not to worry about it. In particular,
The compiler can often optimise away temporary data structures created by code like this. (Remember, this is the standard way to write code in Haskell, so the compiler is tuned to deal with it.)
Even if it can't be optimised away, laziness often makes such code fairly efficient anyway.
(In this specific example, the OR function will terminate the loop as soon as a match is seen - just like what would happen if you hand-coded it yourself.)
As a general rule, write code by gluing together pre-existing functions. Change this only if performance isn't good enough.

This is my amateurish solution to a similar problem. Count the number of negative integers in a list l
nOfNeg l = length(filter (<0) l)
main = print(nOfNeg [0,-1,-2,1,2,3,4] ) --2

No, there isn't!
As of 2020, there is indeed no such idiom in the Haskell standard library yet! One could (and should) however insert an idiom howMany (resembling good old any)
howMany p xs = sum [ 1 | x <- xs, p x ]
-- howMany=(length.).filter
main = print $ howMany (/=0) [0..9]
Try howMany=(length.).filter

I'd do manually
howmany :: (a -> Bool) -> [a] -> Int
howmany _ [ ] = 0
howmany pred (x:xs) = if pred x then 1 + howmany pred xs
else howmany pred xs

tail recursion recognition

I'm trying to learn Haskell and I stumbled upon the following:
myAdd (x:xs) = x + myAdd xs
myAdd null = 0
f = let n = 10000000 in myAdd [1 .. n]
main = do
putStrLn (show f)
When compiling with GHC, this yields a stack overflow. As a C/C++ programmer, I would have expected the compiler to do tail call optimization.
I don't like that I would have to "help" the compiler in simple cases like these, but what options are there? I think it is reasonable to require that the calculation given above be done without using O(n) memory, and without deferring to specialized functions.
If I cannot state my problem naturally (even on a toy problem such as this), and expect reasonable performance in terms of time & space, much of the appeal of Haskell would be lost.

Firstly, make sure you're compiling with -O2. It makes a lot of performance problems just go away :)
The first problem I can see is that null is just a variable name there. You want []. It's equivalent here because the only options are x:xs and [], but it won't always be.
The issue here is simple: when you call sum [1,2,3,4], it looks like this:
1 + (2 + (3 + (4 + 0)))
without ever reducing any of these additions to a number, because of Haskell's non-strict semantics. The solution is simple:
myAdd = myAdd' 0
where myAdd' !total [] = total
myAdd' !total (x:xs) = myAdd' (total + x) xs
(You'll need {-# LANGUAGE BangPatterns #-} at the top of your source file to compile this.)
This accumulates the addition in another parameter, and is actually tail recursive (yours isn't; + is in tail position rather than myAdd). But in fact, it's not quite tail recursion we care about in Haskell; that distinction is mainly relevant in strict languages. The secret here is the bang pattern on total: it forces it to be evaluated every time myAdd' is called, so no unevaluated additions build up, and it runs in constant space. In this case, GHC can actually figure this out with -O2 thanks to its strictness analysis, but I think it's usually best to be explicit about what you want strict and what you don't.
Note that if addition was lazy, your myAdd definition would work fine; the problem is that you're doing a lazy traversal of the list with a strict operation, which ends up causing the stack overflow. This mostly comes up with arithmetic, which is strict for the standard numeric types (Int, Integer, Float, Double, etc.).
This is quite ugly, and it would be a pain to write something like this every time we want to write a strict fold. Thankfully, Haskell has an abstraction ready for this!
myAdd = foldl' (+) 0
(You'll need to add import Data.List to compile this.)
foldl' (+) 0 [a, b, c, d] is just like (((0 + a) + b) + c) + d, except that at each application of (+) (which is how we refer to the binary operator + as a function value), the value is forced to be evaluated. The resulting code is cleaner, faster, and easier to read (once you know how the list folds work, you can understand any definition written in terms of them easier than a recursive definition).
Basically, the problem here is not that the compiler can't figure out how to make your program efficient — it's that making it as efficient as you like could change its semantics, which an optimisation should never do. Haskell's non-strict semantics certainly pose a learning curve to programmers in more "traditional" languages like C, but it gets easier over time, and once you see the power and abstraction that Haskell's non-strictness offers, you'll never want to go back :)

Expanding the example ehird hinted at in the comments:
data Peano = Z | S Peano
deriving (Eq, Show)
instance Ord Peano where
compare (S a) (S b) = compare a b
compare Z Z = EQ
compare Z _ = LT
compare _ _ = GT
instance Num Peano where
Z + n = n
(S a) + n = S (a + n)
-- omit others
fromInteger 0 = Z
fromInteger n
| n < 0 = error "Peano: fromInteger requires non-negative argument"
| otherwise = S (fromInteger (n-1))
instance Enum Peano where
succ = S
pred (S a) = a
pred _ = error "Peano: no predecessor"
toEnum n
| n < 0 = error "toEnum: invalid argument"
| otherwise = fromInteger (toInteger n)
fromEnum Z = 0
fromEnum (S a) = 1 + fromEnum a
enumFrom = iterate S
enumFromTo a b = takeWhile (<= b) $ enumFrom a
-- omit others
infinity :: Peano
infinity = S infinity
result :: Bool
result = 3 < myAdd [1 .. infinity]
result is True by the definition of myAdd, but if the compiler transformed into a tail-recursive loop, it wouldn't terminate. So that transformation is not only a change in efficiency, but also in semantics, hence a compiler must not do it.

A little funny example regarding "The issue is why the compiler is unable to optimize something that appears to be rather trivial to optimize."
Let's say I'm coming from Haskell to C++. I used to write foldr because in Haskell foldr is usually more effective than foldl because of laziness and list fusion.
So I'm trying to write a foldr for a (single-linked) list in C and complaining why it's grossly inefficient:
int foldr(int (*f)(int, node*), int base, node * list)
{
return list == NULL
? base
: f(a, foldr(f, base, list->next));
}
It is inefficient not because the C compiler in question is an unrealistic toy tool developed by ivory tower theorists for their own satisfaction, but because the code in question is grossly non-idiomatic for C.
It is not the case that you cannot write an efficient foldr in C: you just need a doubly-linked list. In Haskell, similarly, you can write an efficient foldl, you need strictness annotations for foldl to be efficient. The standard library provides both foldl (without annotations) and foldl' (with annotations).
The idea of left folding a list in Haskell is the same kind of perversion as a desire to iterate a singly-linked list backwards using recursion in C. Compiler is there to help normal people, not perverts lol.
As your C++ projects probably don't have code iterating singly-linked lists backwards, my HNC project contains only 1 foldl I incorrectly wrote before I mastered Haskell enough. You hardly ever need to foldl in Haskell.
You must unlearn that forward iteration is natural and fastest, and learn that backward iteration is. The forward iteration (left folding) does not do what you intend, until you annotate: it does three passes - list creation, thunk chain buildup and thunk evaluation, instead of two (list creation and list traversal). Note that in immutable world lists can be only efficiently created backwards: a : b is O(1), and a ++ [b] is O(N).
And the backward iteration doesn't do what you intend either. It does one pass instead of three as you might expect from your C background. It doesn't create a list, traverse it to the bottom and then traverse it backwards (2 passes) - it traverses the list as it creates it, that is, 1 pass. With optimizations on, it is just a loop - no actual list elements are created. With optimizations off, it is still O(1) space operation with bigger constant overhead, but explanation is a bit longer.

So there are two things I will address about your problem, firstly the performance problem, and then secondly the expressive problem, that of having to help the compiler with something that seems trivial.
The performance
The thing is that your program is in fact not tail recursive, that is, there is no single call to a function that can replace the recursion. Lets have a look at what happens when we expand myAdd [1..3]:
myAdd [1,2,3]
1 + myAdd [2,3]
1 + 2 + myAdd [3]
As you can see, at any given step, we cannot replace the recursion with a function call, we could simplify the expression by reducing 1 + 2 to 3, but that is not what tail recursion is about.
So here is a version that is tail recursive:
myAdd2 = go 0
where go a [] = a
go a (x:xs) = go (a + x) xs
Lets have a look at how go 0 [1,2,3] is evaluated:
go 0 [1,2,3]
go (1+0) [2,3]
go (2 + 1 + 0) [3]
As you see, at every step, we only need to keep track of
one function call, and as long the first parameter is
evaluated strictly we should not get an exponential space
blow up, and in fact, if you compile with optimization (-O1 or -O2)
ghc is smart enough to figure that out on its own.
Expressiveness
Alright so it is a bit harder to reason about performance in haskell, but most of the time you don't have to. The thing is that you can use combinators that ensure efficiency. This particular pattern above is captured by foldl (and its strict cousin foldl') so myAdd can be written as:
myAdd = foldl (+) 0
and if you compile that with optimization it will not give you an exponential space blowup!

Extent of GHC's optimization

I am not very familiar with the degree that Haskell/GHC can optimize code. Below I have a pretty "brute-force" (in the declarative sense) implementation of the n queens problem. I know it can be written more efficiently, but thats not my question. Its that this got me thinking about the GHC optimizations capabilities and limits.
I have expressed it in what I consider a pretty straightforward declarative sense. Filter permutations of [1..n] that fulfill the predicate For all indices i,j s.t j<i, abs(vi - vj) != j-i I would hope this is the kind of thing that can be optimized, but it also kind of feels like asking a lot of compiler.
validQueens x = and [abs (x!!i - x!!j) /= j-i | i<-[0..length x - 2], j<-[i+1..length x - 1]]
queens n = filter validQueens (permutations [1..n])
oneThru x = [1..x]
pointlessQueens = filter validQueens . permutations . oneThru
main = do
n <- getLine
print $ pointlessQueens $ (read :: String -> Int) n
This runs fairly slow and grows quickly. n=10 takes about a second and n=12 takes forever. Without optimization I can tell the growth is factorial (# of permutations) multiplied by quadratic (# of differences in the predicate to check). Is there any way this code can perform better thru intelligent compilation? I tried the basic ghc options such has -O2 and didn't notice a significant difference, but I don't know the finer points (just added the flagS)
My impression is that the function i call queens can not be optimized and must generate all permutations before filter. Does the point-free version have a better chance? On the one hand I feel like a smart function comprehension between a filter and a predicate might be able to knock off some obviously undesired elements before they are even fully generated, but on the other hand it kind of feels like a lot to ask.
Sorry if this seems rambling, i guess my question is
Is the pointfree version of above function more capable of being optimized?
What steps could I take at make/compile/link time to encourage optimization?
Can you briefly describe some possible (and contrast with the impossible!) means of optimization for the above code? At what point in the process do these occur?
Is there any particular part of ghc --make queensN -O2 -v output I should be paying attention to? Nothing stands out to me. Don't even see much difference in output due to optimization flags
I am not overly concerned with this code example, but I thought writing it got me thinking and it seems to me like a decent vehicle for discussing optimization.
PS - permutations is from Data.List and looks like this:
permutations :: [a] -> [[a]]
permutations xs0 = xs0 : perms xs0 []
where
perms [] _ = []
perms (t:ts) is = foldr interleave (perms ts (t:is)) (permutations is)
where interleave xs r = let (_,zs) = interleave' id xs r in zs
interleave' _ [] r = (ts, r)
interleave' f (y:ys) r = let (us,zs) = interleave' (f . (y:)) ys r
in (y:us, f (t:y:us) : zs)

At a more general level regarding "what kind of optimizations can GHC do", it may help to break the idea of an "optimization" apart a little bit. There are conceptual distinctions that can be drawn between aspects of a program that can be optimized. For instance, consider:
The intrinsic logical structure of the algorithm: You can safely assume in almost every case that this will never be optimized. Outside of experimental research, you're not likely to find a compiler that will replace a bubble sort with a merge sort, or even an insertion sort, and extremely unlikely to find one that would replace a bogosort with something sensible.
Non-essential logical structure of the algorithm: For instance, in the expression g (f x) (f x), how many times will f x be computed? What about an expression like g (f x 2) (f x 5)? These aren't intrinsic to the algorithm, and different variations can be interchanged without impacting anything other than performance. The difficulties in performing optimization here are essentially recognizing when a substitution can in fact be done without changing the meaning, and predicting which version will have the best results. A lot of manual optimizations fall into this category, along with a great deal of GHC's cleverness.
This is also the part that trips a lot of people up, because they see how clever GHC is and expect it to do even more. And because of the reasonable expectation that GHC should never make things worse, it's not uncommon to have potential optimizations that seem obvious (and are, to the programmer) that GHC can't apply because it's nontrivial to distinguish cases where the same transformation would significantly degrade performance. This is, for instance, why memoization and common subexpression elimination aren't always automatic.
This is also the part where GHC has a huge advantage, because laziness and purity make a lot of things much easier, and is I suspect what leads to people making tongue-in-cheek remarks like "Optimizing compilers are a myth (except perhaps in Haskell).", but also to unrealistic optimism about what even GHC can do.
Low-level details: Things like memory layout and other aspects of the final code. These tend to be somewhat arcane and highly dependent on implementation details of the runtime, the OS, and the processor. Optimizations of this sort are essentially why we have compilers, and usually not something you need to worry about unless you're writing code that is very computationally demanding (or are writing a compiler yourself).
As far as your specific example here goes: GHC isn't going to significantly alter the intrinsic time complexity of your algorithm. It might be able to remove some constant factors. What it can't do is apply constant-factor improvements that it can't be sure are correct, particularly ones that technically change the meaning of the program in ways that you don't care about. Case in point here is #sclv's answer, which explains how your use of print creates unnecessary overhead; there's nothing GHC could do about that, and in fact the current form would possibly inhibit other optimizations.

There's a conceptual problem here. Permutations is generating streaming permutations, and filter is streaming too. What's forcing everything prematurely is the "show" implicit in "print". Change your last line to:
mapM print $ pointlessQueens $ (read :: String -> Int) n
and you'll see that results are generated in a streaming fashion much more rapidly. That fixes, for large result sets, a potential space leak, and other than that just lets things be printed as computed rather than all at once at the end.
However, you shouldn't expect any order of magnitude improvements from ghc optimizations (there are a few, obvious ones that you do get, mostly having to do with strictness and folds, but its irritating to rely on them). What you'll get are constant factors, generally.
Edit: As luqui points out below, show is also streaming (or at least show of [Int] is), but the line buffering nonetheless makes it harder to see the genuine speed of computation...

It should be noted, although you do express that it is not part of your question, that the big problem with your code is that you do not do any pruning.
In the case of your question, it feels foolish to talk about possible/impossible optimization, compiler flags, and how to best formulate it etc. when an improvement of the algorithm is staring us so blatantly in the face.
One of the first things that will be tried is the permutations starting with the first queen in position 1 and the second queen in position 2 ([1,2...]). This is of course not a solution and we will have to move one of the queens. However, in your implementation, all permutations involving this combination of the two first queens will be tested! The search should stop there and instantly move to the permutations involving [1,3,...].
Here is a version that does this sort of pruning:
import Data.List
import Control.Monad
main = getLine >>= mapM print . queens . read
queens :: Int -> [[Int]]
queens n = queens' [] n
queens' xs n
| length xs == n = return xs
| otherwise = do
x <- [1..n] \\ xs
guard (validQueens (x:xs))
queens' (x:xs) n
validQueens x =
and [abs (x!!i - x!!j) /= j-i | i<-[0..length x - 2], j<-[i+1..length x - 1]]

I understand that your question was about compiler optimization but as the discussion has shown pruning is necessary.
The first paper that I know of about how to do this for the n queens problem in a lazy functional language is Turner's paper "Recursion Equations as a Programming Language" You can read it in Google Books here.
In terms of your comment about a pattern worth remembering, this problem introduces a very powerful pattern. A great paper on this idea is Philip Wadler's paper, "How to Replace Failure by a List of Successes", which can be read in Google Books here
Here is a pure, non-monadic, implementation based on Turner's Miranda implementation. In the case of n = 12 (queens 12 12) it returns the first solution in .01 secs and will compute all 14,200 solutions in under 6 seconds. Of course printing those takes much longer.
queens :: Int -> Int -> [[Int]]
queens n boardsize =
queensi n
where
-- given a safe arrangement of queens in the first n - 1 rows,
-- "queensi n" returns a list of all the safe arrangements of queens
-- in the first n rows
queensi :: Int -> [[Int]]
queensi 0 = [[]]
queensi n = [ x : y | y <- queensi (n-1) , x <- [1..boardsize], safe x y 1]
-- "safe x y n" tests whether a queen at column x would be safe from previous
-- queens in y where the first element of y is n rows away from x, the second
-- element is (n+1) rows away from x, etc.
safe :: Int -> [Int] -> Int -> Bool
safe _ [] _ = True
safe x (c:y) n = and [ x /= c , x /= c + n , x /= c - n , safe x y (n+1)]
-- we only need to check for queens in the same column, and the same diagonals;
-- queens in the same row are not possible by the fact that we only pick one
-- queen per row

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string