I see go a lot when reading Haskell material or source, but I've never been really comfortable about it - (I guess it has the negative connotation of "goto" in my mind). I started learning Haskell with LYAH, and that's where I picked up the tendency to use acc and step when writing folds. Where does the convention for writing go come from?
Most importantly, what exactly is the name go supposed to imply?
Hmm! Some archaeology!
Since around 2004 I've used go as the generic name for tail-recursive worker loops, when doing a worker/wrapper transformation of a recursive function. I started using it widely in bytestring, e.g.
foldr :: (Word8 -> a -> a) -> a -> ByteString -> a
foldr k v (PS x s l) = inlinePerformIO $ withForeignPtr x $ \ptr ->
go v (ptr `plusPtr` (s+l-1)) (ptr `plusPtr` (s-1))
where
STRICT3(go)
go z p q | p == q = return z
| otherwise = do c <- peek p
go (c `k` z) (p `plusPtr` (-1)) q -- tail recursive
{-# INLINE foldr #-}
was from bytestring in August 2005.
This got written up in RWH, and probably was popularized from there. Also, in the stream fusion library, Duncan Coutts and I started doing it a lot.
From the GHC sources
The idiom goes back further though. foldr in GHC.Base is given as:
foldr k z = go
where
go [] = z
go (y:ys) = y `k` go ys
which is probably where I picked up the trick (I'd thought this was from Andy Gill's thesis, but can't find any use of go there). It isn't given in this form in Gofer, so I think this first appeared in the GHC code base.
By 2001, Simon Marlow was using go in some of the systems-level code, so we might place the blame somewhere in GHC, and this clue leads us to the GHC source, where go is widely used in worker functions:
myCollectBinders expr
= go [] expr
where
go bs (Lam b e) = go (b:bs) e
go bs e#(Note (SCC _) _) = (reverse bs, e)
go bs (Cast e _) = go bs e
go bs (Note _ e) = go bs e
go bs e = (reverse bs, e)
GHC 3.02 and Glasgow
Digging up old versions of GHC, we see that in GHC 0.29 this idiom does not appear, but by GHC 3.02 series (1998), the go idiom appears everywhere. An example, in Numeric.lhs, in the definition of showInt, dated to 1996-1997:
showInt n r
| n < 0 = error "Numeric.showInt: can't show negative numbers"
| otherwise = go n r
where
go n r =
case quotRem n 10 of { (n', d) ->
case chr (ord_0 + fromIntegral d) of { C# c# -> -- stricter than necessary
let
r' = C# c# : r
in
if n' == 0 then r' else go n' r'
}}
this is a different implementation to the one given in the H98 report. Digging into the implementation of "Numeric.lhs", however, we find that it isn't the same as the version that was added to GHC 2.06 in 1997, and a very interesting patch from Sigbjorne Finne appears, in April 1998, adding a go loop to Numeric.lhs.
This says that at least by 1998, Sigbjorne was adding go loops to the GHC "std" library, while simultaneously, many modules in the GHC compiler core had go loops. Digging further, this very interesting commit from Will Partain in July 1996 adds a "go" loop into GHC -- the code comes from Simon PJ though!
So I'm going to call this as a Glasgow idiom invented by people at Glasgow who worked on GHC in the mid 90s, such as Simon Marlow, Sigbjorn Finne, Will Partain and Simon Peyton Jones.
Obviously Don's answer is the correct one. Let me just add in a small detail (since it seems to be my writing that you're directly referring to): go is nice because it's only two letters.
Oh, and the reason the Yesod book devotes so much content to the enumerator package is because I'd already written up the three-part tutorial of enumerator as a blog post series, so decided I may as well include it in the book. The enumerator package is used in a number of places throughout Yesod, so it is relevant.
I'd expect this idiom to be applicable not just to linear structures (and hence "loops"), but also to branching (tree-like) structures.
I wonder how often the go pattern corresponds to accumulation parameters and, more generally, with the continuation-encoding strategites that Mitch Wand explored in the paper Continuation-Based Program Transformation Strategies (one of my all-time favorite papers).
In these cases, the go function has a particular meaning, which can then be used to derive efficient code from an elegant specification.
Related
So far I have seen numerous "Maybe" versions of certain partial functions that would potentially result in ⊥, like readMaybe for read and listToMaybe for head; sometimes I wonder if we can generalise the idea and work out such a function safe :: (a -> b) -> (a -> Maybe b) to convert any partial function into their safer total alternative that returns Nothing on any instances where error stack would have been called in the original function. As till now I have not found a way to implement such safe function or existing implementations of a similar kind, and I come to doubt if this idea is truly viable.
There are two kinds of bottom actually, non-termination and error. You cannot catch non-termination, for obvious reasons, but you can catch errors. Here is a quickly thrown-together version (I am not an expert so there are probably better ways)
{-# LANGUAGE ScopedTypeVariables #-}
import Control.Exception
import System.IO.Unsafe
safe f = unsafePerformIO $ do
z <- try (evaluate f)
let r = case z of
Left (e::SomeException) -> Nothing
Right k -> Just k
return r
Here are some examples
*Main > safe (head [42])
Just 42
*Main > safe (head [])
Nothing
*Main λ safe (1 `div` 0)
Nothing
*Main λ safe (1 `div` 2)
Just 0
No, it's not possible. It violates a property called "monotonicity", which says that a value cannot become more defined as you process it. You can't branch on bottoms - attempting to process one always results in bottom.
Or at least, that's all true of the domain theory Haskell evaluation is based on. But Haskell has a few extra features domain theory doesn't... Like executing IO actions being a different thing than evaluation, and unsafePerformIO letting you hide execution inside evaluation. The spoon library packages all of these ideas together as well as can be done. It's not perfect. It has holes, because this isn't something you're supposed to be able to do. But it does the job in a bunch of common cases.
Consider the function
collatz :: Integer -> ()
collatz 1 = ()
collatz n
| even n = collatz $ n`div`2
| otherwise = collatz $ 3*n + 1
(Let's pretend Integer is the type of positive whole numbers for simplicity)
Is this a total function? Nobody knows! For all we know, it could be total, so your proposed safe-guard can't ever yield Nothing. But neither has anybody found a proof that it is total, so if safe just always gives back Just (collatz n) then this may still be only partial.
This is the usual definition of the fixed-point combinator in Haskell:
fix :: (a -> a) -> a
fix f = let x = f x in x
On https://wiki.haskell.org/Prime_numbers, they define a different fixed-point combinator:
_Y :: (t -> t) -> t
_Y g = g (_Y g) -- multistage, non-sharing, g (g (g (g ...)))
-- g (let x = g x in x) -- two g stages, sharing
_Y is a non-sharing fixpoint combinator, here arranging for a recursive "telescoping" multistage primes production (a tower of producers).
What exactly does this mean? What is "sharing" vs. "non-sharing" in that context? How does _Y differ from fix?
"Sharing" means f x re-uses the x that it creates; but with _Y g = g . g . g . g . ..., each g calculates its output anew (cf. this and this).
In that context, the sharing version has much worse memory usage, leads to a space leak.1
The definition of _Y mirrors the usual lambda calculus definition's effect for the Y combinator, which emulates recursion by duplication, while true recursion refers to the same (hence, shared) entity.
In
x = f x
(_Y g) = g (_Y g)
both xs refer to the same entity, but each of (_Y g)s refer to equivalent, but separate, entity. That's the intention of it, anyway.
Of course thanks to referential transparency there's no guarantee in Haskell the language for any of this. But GHC the compiler does behave this way.
_Y g is a common sub-expression and it could be "eliminated" by a compiler by giving it a name and reusing that named entity, subverting the whole purpose of it. That's why the GHC has the "no common sub-expressions elimination" -fno-cse flag which prevents this explicitly. It used to be that you had to use this flag to achieve the desired behaviour here, but not anymore. GHC won't be as aggressive at common sub-expressions elimination anymore, with the more recent (read: several years now) versions.
disclaimer: I'm the author of that part of the page you're referring to. Was hoping for the back-and-forth that's usual on wiki pages, but it never came, so my work didn't get reviewed like that. Either no-one bothered, or it is passable (lacking major errors). The wiki seems to be largely abandoned for many years now.
1 The g function involved,
(3:) . minus [5,7..] . foldr (\ (x:xs) ⟶ (x:) . union xs) []
. map (\ p ⟶ [p², p² + 2p..])
produces an increasing stream of all odd primes given an increasing stream of all odd primes. To produce a prime N in value, it consumes its input stream up to the first prime above sqrt(N) in value, at least. Thus the production points are given roughly by repeated squaring, and there are ~ log (log N) of such g functions in total in the chain (or "tower") of these primes producers, each immediately garbage collectible, the lowest one producing its primes given just the first odd prime, 3, known a priori.
And with the two-staged _Y2 g = g x where { x = g x } there would be only two of them in the chain, but only the top one would be immediately garbage collectible, as discussed at the referenced link above.
_Y is translated to the following STG:
_Y f = let x = _Y f in f x
fix is translated identically to the Haskell source:
fix f = let x = f x in x
So fix f sets up a recursive thunk x and returns it, while _Y is a recursive function, and importantly it’s not tail-recursive. Forcing _Y f enters f, passing a new call to _Y f as an argument, so each recursive call sets up a new thunk; forcing the x returned by fix f enters f, passing x itself as an argument, so each recursive call is into the same thunk—this is what’s meant by “sharing”.
The sharing version usually has better memory usage, and also lets the GHC RTS detect some kinds of infinite loop. When a thunk is forced, before evaluation starts, it’s replaced with a “black hole”; if at any point during evaluation of a thunk a black hole is reached from the same thread, then we know we have an infinite loop and can throw an exception (which you may have seen displayed as Exception: <<loop>>).
I think you already received excellent answers, from a GHC/Haskell perspective. I just wanted to chime in and add a few historical/theoretical notes.
The correspondence between unfolding and cyclic views of recursion is rigorously studied in Hasegawa's PhD thesis: https://www.springer.com/us/book/9781447112211
(Here's a shorter paper that you can read without paying Springer: https://link.springer.com/content/pdf/10.1007%2F3-540-62688-3_37.pdf)
Hasegawa assumes a traced monoidal category, a requirement that is much less stringent than the usual PCPO assumption of domain theory, which forms the basis of how we think about Haskell in general. What Hasegawa showed was that one can define these "sharing" fixed point operators in such a setting, and established that they correspond to the usual unfolding view of fixed points from Church's lambda-calculus. That is, there is no way to tell them apart by making them produce different answers.
Hasegawa's correspondence holds for what's known as central arrows; i.e., when there are no "effects" involved. Later on, Benton and Hyland extended this work and showed that the correspondence holds for certain cases when the underlying arrow can perform "mild" monadic effects as well: https://pdfs.semanticscholar.org/7b5c/8ed42a65dbd37355088df9dde122efc9653d.pdf
Unfortunately, Benton and Hyland only allow effects that are quite "mild": Effects like the state and environment monads fit the bill, but not general effects like exceptions, lists, or IO. (The fixed point operators for these effectful computations are known as mfix in Haskell, with the type signature (a -> m a) -> m a, and they form the basis of the recursive-do notation.)
It's still an open question how to extend this work to cover arbitrary monadic effects. Though it doesn't seem to be receiving much attention these days. (Would make a great PhD topic for those interested in the correspondence between lambda-calculus, monadic effects, and graph-based computations.)
I'm reading a Introdution to Haskell course and they're introducing well known Tower of Hanoi problem as a homework for the first class. I was tempted and wrote a solution:
type Peg = String
type Move = (Peg, Peg)
hanoi :: Int -> Peg -> Peg -> Peg -> [Move]
hanoi n b a e
| n == 1 = [(b, e)]
| n > 1 = hanoi (n - 1) b e a ++ hanoi 1 b a e ++ hanoi (n - 1) a b e
| otherwise = []
I've played with it a little and saw that it obviously using Tail Call Optimization since it works in constant memory.
Clojure is language that I work with most of the time and hence I was challenged to write a Clojure solution. Naive ones are discarded since I want to write it to use TCO:
(defn hanoi-non-optimized
[n b a e]
(cond
(= n 1) [[b e]]
(> n 1) (concat (hanoi-non-optimized (dec n) b e a)
(hanoi-non-optimized 1 b a e)
(hanoi-non-optimized (dec n) a b e))
:else []))
Well, Clojure is JVM hosted and thus don't have TCO by default and one should use recur to get it (I know the story...). In the other hand, recur imposes some syntactic constraints since it have to be last expression - have to be the tail. I feel a bit bad because I still can't write a solution that is short/expressive as that one in Haskell and use TCO at the same time.
Is there a simple solution for this which I can't see at the moment?
I have a great respect for both languages and already know that this is rather problem with my approach than with Clojure itself.
No, the Haskell code isn't tail-recursive. It is guarded-recursive, with recursion guarded by a lazy data constructor, : (to which the ++ calls are ultimately transformed), where because of the laziness only one part of the recursion call tree (a ++ b ++ c) is explored in its turn, so the stack's depth never exceeds n, the number of disks. Which is very small, like 7 or 8.
So Haskell code explores a, putting the c part aside. Your Clojure code on the other hand calculates the two parts (a and c, as b doesn't count) before concatenating them, so is double recursive, i.e. computationally heavy.
What you're looking for is not TCO, but TRMCO -- tail recursion modulo cons optimization, -- i.e. building the list in a top-down manner from inside a loop with a simulated stack. Clojure is especially well-suited for this, with its tail-appending conj (right?) instead of Lisp's and Haskell's head-prepending cons.
Or just print the moves instead of building the list of all of them.
edit: actually, TRMCO means we're allowed to reuse the call frame if we maintain the "continuation stack" ourselves, so the stack depth becomes exactly 1. Haskell in all likelihood builds a left-deepening tree of nested ++ thunk nodes in this case, as explained here, but in Clojure we're allowed to rearrange it into the right-nested list ourselves, when we maintain our own stack of to-do-next invocation descriptions (for the b and c parts of the a ++ b ++ c expression).
According to this article,
Enumerations don't count as single-constructor types as far as GHC is concerned, so they don't benefit from unpacking when used as strict constructor fields, or strict function arguments. This is a deficiency in GHC, but it can be worked around.
And instead the use of newtypes is recommended. However, I cannot verify this with the following code:
{-# LANGUAGE MagicHash,BangPatterns #-}
{-# OPTIONS_GHC -O2 -funbox-strict-fields -rtsopts -fllvm -optlc --x86-asm-syntax=intel #-}
module Main(main,f,g)
where
import GHC.Base
import Criterion.Main
data D = A | B | C
newtype E = E Int deriving(Eq)
f :: D -> Int#
f z | z `seq` False = 3422#
f z = case z of
A -> 1234#
B -> 5678#
C -> 9012#
g :: E -> Int#
g z | z `seq` False = 7432#
g z = case z of
(E 0) -> 2345#
(E 1) -> 6789#
(E 2) -> 3535#
f' x = I# (f x)
g' x = I# (g x)
main :: IO ()
main = defaultMain [ bench "f" (whnf f' A)
, bench "g" (whnf g' (E 0))
]
Looking at the assembly, the tags for each constructor of the enumeration D is actually unpacked and directly hard-coded in the instruction. Furthermore, the function f lacks error-handling code, and more than 10% faster than g. In a more realistic case I have also experienced a slowdown after converting a enumeration to a newtype. Can anyone give me some insight about this? Thanks.
It depends on the use case. For the functions you have, it's expected that the enumeration performs better. Basically, the three constructors of D become Ints resp. Int#s when the strictness analysis allows that, and GHC knows it's statically checked that the argument can only have one of the three values 0#, 1#, 2#, so it needs not insert error handling code for f. For E, the static guarantee of only one of three values being possible isn't given, so it needs to add error handling code for g, that slows things down significantly. If you change the definition of g so that the last case becomes
E _ -> 3535#
the difference vanishes completely or almost completely (I get a 1% - 2% better benchmark for f still, but I haven't done enough testing to be sure whether that's a real difference or an artifact of benchmarking).
But this is not the use case the wiki page is talking about. What it's talking about is unpacking the constructors into other constructors when the type is a component of other data, e.g.
data FooD = FD !D !D !D
data FooE = FE !E !E !E
Then, if compiled with -funbox-strict-fields, the three Int#s can be unpacked into the constructor of FooE, so you'd basically get the equivalent of
struct FooE {
long x, y, z;
};
while the fields of FooD have the multi-constructor type D and cannot be unpacked into the constructor FD(1), so that would basically give you
struct FooD {
long *px, *py, *pz;
}
That can obviously have significant impact.
I'm not sure about the case of single-constructor function arguments. That has obvious advantages for types with contained data, like tuples, but I don't see how that would apply to plain enumerations, where you just have a case and splitting off a worker and a wrapper makes no sense (to me).
Anyway, the worker/wrapper transformation isn't so much a single-constructor thing, constructor specialisation can give the same benefit to types with few constructors. (For how many constructors specialisations would be created depends on the value of -fspec-constr-count.)
(1) That might have changed, but I doubt it. I haven't checked it though, so it's possible the page is out of date.
I would guess that GHC has changed quite a bit since that page was last updated in 2008. Also, you're using the LLVM backend, so that's likely to have some effect on performance as well. GHC can (and will, since you've used -O2) strip any error handling code from f, because it knows statically that f is total. The same cannot be said for g. I would guess that it's the LLVM backend that then unpacks the constructor tags in f, because it can easily see that there is nothing else used by the branching condition. I'm not sure of that, though.
I am not very familiar with the degree that Haskell/GHC can optimize code. Below I have a pretty "brute-force" (in the declarative sense) implementation of the n queens problem. I know it can be written more efficiently, but thats not my question. Its that this got me thinking about the GHC optimizations capabilities and limits.
I have expressed it in what I consider a pretty straightforward declarative sense. Filter permutations of [1..n] that fulfill the predicate For all indices i,j s.t j<i, abs(vi - vj) != j-i I would hope this is the kind of thing that can be optimized, but it also kind of feels like asking a lot of compiler.
validQueens x = and [abs (x!!i - x!!j) /= j-i | i<-[0..length x - 2], j<-[i+1..length x - 1]]
queens n = filter validQueens (permutations [1..n])
oneThru x = [1..x]
pointlessQueens = filter validQueens . permutations . oneThru
main = do
n <- getLine
print $ pointlessQueens $ (read :: String -> Int) n
This runs fairly slow and grows quickly. n=10 takes about a second and n=12 takes forever. Without optimization I can tell the growth is factorial (# of permutations) multiplied by quadratic (# of differences in the predicate to check). Is there any way this code can perform better thru intelligent compilation? I tried the basic ghc options such has -O2 and didn't notice a significant difference, but I don't know the finer points (just added the flagS)
My impression is that the function i call queens can not be optimized and must generate all permutations before filter. Does the point-free version have a better chance? On the one hand I feel like a smart function comprehension between a filter and a predicate might be able to knock off some obviously undesired elements before they are even fully generated, but on the other hand it kind of feels like a lot to ask.
Sorry if this seems rambling, i guess my question is
Is the pointfree version of above function more capable of being optimized?
What steps could I take at make/compile/link time to encourage optimization?
Can you briefly describe some possible (and contrast with the impossible!) means of optimization for the above code? At what point in the process do these occur?
Is there any particular part of ghc --make queensN -O2 -v output I should be paying attention to? Nothing stands out to me. Don't even see much difference in output due to optimization flags
I am not overly concerned with this code example, but I thought writing it got me thinking and it seems to me like a decent vehicle for discussing optimization.
PS - permutations is from Data.List and looks like this:
permutations :: [a] -> [[a]]
permutations xs0 = xs0 : perms xs0 []
where
perms [] _ = []
perms (t:ts) is = foldr interleave (perms ts (t:is)) (permutations is)
where interleave xs r = let (_,zs) = interleave' id xs r in zs
interleave' _ [] r = (ts, r)
interleave' f (y:ys) r = let (us,zs) = interleave' (f . (y:)) ys r
in (y:us, f (t:y:us) : zs)
At a more general level regarding "what kind of optimizations can GHC do", it may help to break the idea of an "optimization" apart a little bit. There are conceptual distinctions that can be drawn between aspects of a program that can be optimized. For instance, consider:
The intrinsic logical structure of the algorithm: You can safely assume in almost every case that this will never be optimized. Outside of experimental research, you're not likely to find a compiler that will replace a bubble sort with a merge sort, or even an insertion sort, and extremely unlikely to find one that would replace a bogosort with something sensible.
Non-essential logical structure of the algorithm: For instance, in the expression g (f x) (f x), how many times will f x be computed? What about an expression like g (f x 2) (f x 5)? These aren't intrinsic to the algorithm, and different variations can be interchanged without impacting anything other than performance. The difficulties in performing optimization here are essentially recognizing when a substitution can in fact be done without changing the meaning, and predicting which version will have the best results. A lot of manual optimizations fall into this category, along with a great deal of GHC's cleverness.
This is also the part that trips a lot of people up, because they see how clever GHC is and expect it to do even more. And because of the reasonable expectation that GHC should never make things worse, it's not uncommon to have potential optimizations that seem obvious (and are, to the programmer) that GHC can't apply because it's nontrivial to distinguish cases where the same transformation would significantly degrade performance. This is, for instance, why memoization and common subexpression elimination aren't always automatic.
This is also the part where GHC has a huge advantage, because laziness and purity make a lot of things much easier, and is I suspect what leads to people making tongue-in-cheek remarks like "Optimizing compilers are a myth (except perhaps in Haskell).", but also to unrealistic optimism about what even GHC can do.
Low-level details: Things like memory layout and other aspects of the final code. These tend to be somewhat arcane and highly dependent on implementation details of the runtime, the OS, and the processor. Optimizations of this sort are essentially why we have compilers, and usually not something you need to worry about unless you're writing code that is very computationally demanding (or are writing a compiler yourself).
As far as your specific example here goes: GHC isn't going to significantly alter the intrinsic time complexity of your algorithm. It might be able to remove some constant factors. What it can't do is apply constant-factor improvements that it can't be sure are correct, particularly ones that technically change the meaning of the program in ways that you don't care about. Case in point here is #sclv's answer, which explains how your use of print creates unnecessary overhead; there's nothing GHC could do about that, and in fact the current form would possibly inhibit other optimizations.
There's a conceptual problem here. Permutations is generating streaming permutations, and filter is streaming too. What's forcing everything prematurely is the "show" implicit in "print". Change your last line to:
mapM print $ pointlessQueens $ (read :: String -> Int) n
and you'll see that results are generated in a streaming fashion much more rapidly. That fixes, for large result sets, a potential space leak, and other than that just lets things be printed as computed rather than all at once at the end.
However, you shouldn't expect any order of magnitude improvements from ghc optimizations (there are a few, obvious ones that you do get, mostly having to do with strictness and folds, but its irritating to rely on them). What you'll get are constant factors, generally.
Edit: As luqui points out below, show is also streaming (or at least show of [Int] is), but the line buffering nonetheless makes it harder to see the genuine speed of computation...
It should be noted, although you do express that it is not part of your question, that the big problem with your code is that you do not do any pruning.
In the case of your question, it feels foolish to talk about possible/impossible optimization, compiler flags, and how to best formulate it etc. when an improvement of the algorithm is staring us so blatantly in the face.
One of the first things that will be tried is the permutations starting with the first queen in position 1 and the second queen in position 2 ([1,2...]). This is of course not a solution and we will have to move one of the queens. However, in your implementation, all permutations involving this combination of the two first queens will be tested! The search should stop there and instantly move to the permutations involving [1,3,...].
Here is a version that does this sort of pruning:
import Data.List
import Control.Monad
main = getLine >>= mapM print . queens . read
queens :: Int -> [[Int]]
queens n = queens' [] n
queens' xs n
| length xs == n = return xs
| otherwise = do
x <- [1..n] \\ xs
guard (validQueens (x:xs))
queens' (x:xs) n
validQueens x =
and [abs (x!!i - x!!j) /= j-i | i<-[0..length x - 2], j<-[i+1..length x - 1]]
I understand that your question was about compiler optimization but as the discussion has shown pruning is necessary.
The first paper that I know of about how to do this for the n queens problem in a lazy functional language is Turner's paper "Recursion Equations as a Programming Language" You can read it in Google Books here.
In terms of your comment about a pattern worth remembering, this problem introduces a very powerful pattern. A great paper on this idea is Philip Wadler's paper, "How to Replace Failure by a List of Successes", which can be read in Google Books here
Here is a pure, non-monadic, implementation based on Turner's Miranda implementation. In the case of n = 12 (queens 12 12) it returns the first solution in .01 secs and will compute all 14,200 solutions in under 6 seconds. Of course printing those takes much longer.
queens :: Int -> Int -> [[Int]]
queens n boardsize =
queensi n
where
-- given a safe arrangement of queens in the first n - 1 rows,
-- "queensi n" returns a list of all the safe arrangements of queens
-- in the first n rows
queensi :: Int -> [[Int]]
queensi 0 = [[]]
queensi n = [ x : y | y <- queensi (n-1) , x <- [1..boardsize], safe x y 1]
-- "safe x y n" tests whether a queen at column x would be safe from previous
-- queens in y where the first element of y is n rows away from x, the second
-- element is (n+1) rows away from x, etc.
safe :: Int -> [Int] -> Int -> Bool
safe _ [] _ = True
safe x (c:y) n = and [ x /= c , x /= c + n , x /= c - n , safe x y (n+1)]
-- we only need to check for queens in the same column, and the same diagonals;
-- queens in the same row are not possible by the fact that we only pick one
-- queen per row