Having trouble understanding list comprehensions - haskell

I've just started learning Haskell (literally, tonight!) and I'm having a little trouble understanding the logic of list comprehensions, more specifically the <- operator. A little example from Learn You a Haskell finds all triples whose sides are each at most 10:
ghci> let triangles = [ (a,b,c) | c <- [1..10], b <- [1..10], a <- [1..10] ]
My initial understanding was that these would all increment together, but after seeing the output I really don't understand how these lists are incremented. Another example that confuses me is:
ghci> let rightTriangles = [ (a,b,c) | c <- [1..10], b <- [1..c], a <- [1..b], a^2 + b^2 == c^2]
I would really appreciate a little explanation of these. Thanks for your patience with my lack of Haskell intelligence.

Read "[" as "list of", "|" as "for", "<-" as "in", and "," as "and".
The enumerations are done in nested fashion. [ (a,b,c) | c <- [1..10], b <- [1..c], a <- [1..b], a^2 + b^2 == c^2] is really:
for c from 1 to 10 step 1:
    for b from 1 to c step 1:
        for a from 1 to b step 1:
            if (a^2 + b^2 == c^2):
                emit (a,b,c)
In Haskell, though, the above is achieved by the following translation:
[1..10] >>= (\c->      -- (a function of 'c', producing ...
  [1..c] >>= (\b->     -- (a function of 'b', producing ...
    [1..b] >>= (\a->   -- (a function of 'a', producing ...
      if a^2+b^2==c^2 then [(a,b,c)] else []
      -- or: [(a,b,c) | a^2+b^2==c^2]
    )))
So you can really see the nested structure here. (>>=) is nothing mysterious either. Read >>= as "fed into" or "pushed through", although its official name is "bind". For lists it is defined as
(xs >>= f) = concatMap f xs = concat (map f xs)
f here is called (by map) on each element of xs, in order. It must produce lists so that they can be combined with concat. Since empty lists [] vanish under concat (e.g. concat [[1], [], [3]] == [1,3]), all the elements that do not pass the test are eliminated from the final output.
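A minimal sketch that checks the translation above: the comprehension and a hand-written chain of (>>=) on lists produce the same list. The names viaComprehension and viaBind are made up for this illustration.

```haskell
-- Right triangles written as a list comprehension.
viaComprehension :: [(Int, Int, Int)]
viaComprehension =
  [ (a, b, c) | c <- [1 .. 10], b <- [1 .. c], a <- [1 .. b], a ^ 2 + b ^ 2 == c ^ 2 ]

-- The same thing spelled out with (>>=) on lists. A failing test
-- contributes [], which the concat inside (>>=) drops.
viaBind :: [(Int, Int, Int)]
viaBind =
  [1 .. 10] >>= \c ->
  [1 .. c]  >>= \b ->
  [1 .. b]  >>= \a ->
  if a ^ 2 + b ^ 2 == c ^ 2 then [(a, b, c)] else []

main :: IO ()
main = do
  print viaComprehension               -- [(3,4,5),(6,8,10)]
  print (viaComprehension == viaBind)  -- True
```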
For the full translation see section 3.11, List Comprehensions, of the Haskell 98 Report. In general a list comprehension may contain a pattern, not just a variable name. The comprehension
[e | pat <- ls, ...]
is translated as
ls >>= (\x -> case x of pat -> [e | ...]
                        _   -> [] )
where pat is some pattern and x is a fresh variable. When there's a pattern mismatch, an empty list is produced (instead of a run-time error), and that element x of ls is skipped. This is useful for additional pattern-based filtering, e.g. [x | Just x <- ls, even x], where all the Nothings in ls are quietly ignored.
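A small sketch of the pattern-filtering behaviour, assuming a list of Maybe Int values. evenJusts uses the comprehension directly; evenJusts' is a hand-written version of roughly the translation described above. Both names are hypothetical.

```haskell
-- Keep the even payloads of the Just values. Nothing entries fail the
-- pattern and are skipped instead of causing a run-time error.
evenJusts :: [Maybe Int] -> [Int]
evenJusts ls = [x | Just x <- ls, even x]

-- Roughly the translation: match each element, fall back to [] on mismatch.
evenJusts' :: [Maybe Int] -> [Int]
evenJusts' ls = ls >>= \m -> case m of
  Just x -> [x | even x]  -- the remaining qualifiers
  _      -> []            -- pattern mismatch: skip this element

main :: IO ()
main = do
  print (evenJusts [Just 1, Nothing, Just 4, Just 6])   -- [4,6]
  print (evenJusts' [Just 1, Nothing, Just 4, Just 6])  -- [4,6]
```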

[ (a,b,c) | c <- [1..10], b <- [1..10], a <- [1..10] ] means: all combinations (a,b,c) where a is in [1..10], b is in [1..10], and c is in [1..10], with c changing slowest and a fastest.
If you want (1,1,1), (2,2,2), and so on, use zip: zip [1..10] [1..10], or for three lists, zip3 [1..10] [1..10] [1..10].
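A short sketch contrasting the two, under the same bounds as the question: the comprehension yields every combination, while zip3 pairs elements in lock-step. The names allCombos and diagonal are illustrative only.

```haskell
-- Every combination: 10 * 10 * 10 = 1000 triples, c varying slowest, a fastest.
allCombos :: [(Int, Int, Int)]
allCombos = [ (a, b, c) | c <- [1 .. 10], b <- [1 .. 10], a <- [1 .. 10] ]

-- Lock-step pairing: only the 10 "diagonal" triples.
diagonal :: [(Int, Int, Int)]
diagonal = zip3 [1 .. 10] [1 .. 10] [1 .. 10]

main :: IO ()
main = do
  print (length allCombos)  -- 1000
  print (take 3 allCombos)  -- [(1,1,1),(2,1,1),(3,1,1)]
  print (take 3 diagonal)   -- [(1,1,1),(2,2,2),(3,3,3)]
```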

I think of list comprehension syntax as Haskell's attempt to get set-builder notation into the language. We use '[' rather than '{' and '<-' rather than '∈'. List comprehension syntax can even be generalized to arbitrary monads (GHC's MonadComprehensions extension).
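As a sketch of that generalization, assuming GHC with the MonadComprehensions extension enabled, the same bracket syntax can be used with Maybe instead of lists. The names sumBoth and bigOnly are illustrative.

```haskell
{-# LANGUAGE MonadComprehensions #-}

-- With MonadComprehensions, comprehension syntax works for other monads;
-- here the "generators" draw from Maybe values instead of lists.
sumBoth :: Maybe Int
sumBoth = [ x + y | x <- Just 1, y <- Just 2 ]

-- A failing guard gives Nothing, the "empty" value for Maybe.
bigOnly :: Maybe Int
bigOnly = [ x | x <- Just 5, x > 10 ]

main :: IO ()
main = do
  print sumBoth  -- Just 3
  print bigOnly  -- Nothing
```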

Related

List comprehensions with arguments drawn from the same list in Haskell

I've a question regarding list comprehensions in Haskell.
I have an exam later this week and therefore did some old exams where I found this question:
"Write a function that given a positive integer n returns a list of positive integers m ≤ n such that there are two positive integers x and y, such that x^2 + y^3 = m. The list needs to be sorted"
There were two possible answers,
either
squareCube::Int->[Int]
squareCube n =[a|a<-[1..n],x<-[1..n],y<-[1..n],x^2+y^3==a]
or
import Data.List
squareCube::Int->[Int]
squareCube n =
  sort [a|x<-[1..n],y<-[1..n],a<-[1..n],x^2+y^3==a]
I wonder why I need to use the sort function when a comes after x and y in my comprehension. Why does the order between the arguments matter?
This list is sorted:
[ 1, 1, 1
, 2, 2, 2
, 3, 3, 3
, 4, 4, 4 ]
This one isn't:
[ 1, 2, 3, 4
, 1, 2, 3, 4
, 1, 2, 3, 4 ]
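A minimal sketch reproducing those two shapes: putting the value's generator first repeats each value before moving on, while putting it last restarts the value's loop for every step of the outer generator. The names valueFirst and valueLast are made up for this illustration.

```haskell
-- Value generator first: each value repeats before the next one starts.
valueFirst :: [Int]
valueFirst = [ a | a <- [1 .. 4], _ <- [1 .. 3 :: Int] ]

-- Value generator last: [1..4] restarts for every outer step.
valueLast :: [Int]
valueLast = [ a | _ <- [1 .. 3 :: Int], a <- [1 .. 4] ]

main :: IO ()
main = do
  print valueFirst  -- [1,1,1,2,2,2,3,3,3,4,4,4]
  print valueLast   -- [1,2,3,4,1,2,3,4,1,2,3,4]
```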
This is only vaguely related to the question: it addresses the programming challenge, but does not answer the question as asked about why the existing approaches work. But it was too fun to avoid writing a snippet about it, so here goes.
With appropriate imports, you can very efficiently generate even the infinite list of square-cube sums. The basic idea is to make an infinite list of infinite lists; we will maintain the invariant that the outer infinite list is sorted by the heads of the inner infinite lists. Then it's easy and efficient to merge all of these. With the appropriate package (data-ordlist, which provides Data.List.Ordered) it's a one-liner, and very succinctly matches the problem description:
import Data.List.Ordered
squareCubes = unionAll [[x^2+y^3 | x <- [1..]] | y <- [1..]]
We can compare the efficiency of this to the existing two approaches. Here's the test program, which I compiled with -O2:
import Data.List
import Data.List.Ordered
import System.Environment

squareCubes = unionAll [[x^2+y^3 | x <- [1..]] | y <- [1..]]
squareCube n = takeWhile (<=n) squareCubes
squareCube' n = [a|a<-[1..n],x<-[1..n],y<-[1..n],x^2+y^3==a]
squareCube'' n = sort [a|x<-[1..n],y<-[1..n],a<-[1..n],x^2+y^3==a]

main = do
  [kind, limit] <- getArgs
  let f = case kind of
        "inf"      -> squareCube
        "unsorted" -> squareCube'
        "sorted"   -> squareCube''
  print . sum . f . read $ limit
And here are the timings, which are quite stark indeed:
% /usr/bin/time ./test unsorted 700
57465
9.60user 0.01system 0:09.63elapsed 99%CPU (0avgtext+0avgdata 4156maxresident)k
% /usr/bin/time ./test sorted 700
57465
1.87user 0.00system 0:01.87elapsed 99%CPU (0avgtext+0avgdata 4056maxresident)k
% /usr/bin/time ./test inf 700
50895
0.00user 0.00system 0:00.00elapsed 100%CPU (0avgtext+0avgdata 3616maxresident)k
The others take seconds (aeons in computer time), while the one that is in some ways more capable than the others doesn't even register on the timer! I also experimented to find how large an input we could give before reaching the timings of the other two implementations. An input of 500000000 takes 8.88 seconds: almost six orders of magnitude larger in roughly the same time.
Wait, wait, you say: those outputs are different. So what gives? Well, it turns out that the slow implementations have what I consider to be a bug: they will spit out a single number multiple times if there are multiple ways to construct it as the sum of squares and cubes. For example,
> squareCube' 17
[2,5,9,10,12,17,17]
> squareCube 17
[2,5,9,10,12,17]
because 3^2 + 2^3 = 4^2 + 1^3. On the other hand, if this is the intended behavior, one can easily achieve it in the efficient, lazy one-liner by replacing unionAll with mergeAll.
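If you would rather not depend on data-ordlist, here is a minimal sketch of a duplicate-free version using only Data.List: sort the matches, then collapse runs of equal values with group. It is still a brute-force search (it only drops the innermost a loop by computing a directly), so it will not approach the timings of the lazy one-liner. The name squareCubeUnique is hypothetical.

```haskell
import Data.List (group, sort)

-- Brute force over x and y, computing a directly, then removing duplicates:
-- 17 = 3^2 + 2^3 = 4^2 + 1^3 is reported only once.
squareCubeUnique :: Int -> [Int]
squareCubeUnique n =
  map head . group . sort $
    [ a | x <- [1 .. n], y <- [1 .. n], let a = x ^ 2 + y ^ 3, a <= n ]

main :: IO ()
main = print (squareCubeUnique 17)  -- [2,5,9,10,12,17]
```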
We need a sort when a comes after x and y in the comprehension because of the order of evaluation. If a <- [1..n] is the first generator, each of the subsequent generators is run in full for each a in turn, so the a's already form an increasing list:
a = 1
    x <- [1..n]
        y <- [1..n]
            ...return a if there's a valid match
a = 2
    x <- [1..n]
        y <- [1..n]
            ...return a if there's a valid match
etc.
However, if a <- [1..n] is evaluated last, we may not get an ordered sequence of a's:
x = 1
    y <- [1..n]
    ...
    y = 1
        a <- [1..n]
        ...
        a = 2      -- emits 2
    y = 2
        a <- [1..n]
        ...
        a = 9      -- emits 9
x = 2
    y <- [1..n]
    ...
    y = 1
        a <- [1..n]
        ...
        a = 5      -- emits 5
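A small sketch that runs both orderings for n = 10, which reproduces the trace above: with a drawn first the results come out increasing, with a drawn last they come out in the order the (x, y) pairs are visited. aFirst and aLast are illustrative names.

```haskell
-- a drawn first: matches come out in increasing order of a.
aFirst :: Int -> [Int]
aFirst n = [ a | a <- [1 .. n], x <- [1 .. n], y <- [1 .. n], x ^ 2 + y ^ 3 == a ]

-- a drawn last: matches come out in the order the (x, y) pairs are visited.
aLast :: Int -> [Int]
aLast n = [ a | x <- [1 .. n], y <- [1 .. n], a <- [1 .. n], x ^ 2 + y ^ 3 == a ]

main :: IO ()
main = do
  print (aFirst 10)  -- [2,5,9,10]
  print (aLast 10)   -- [2,9,5,10]
```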
To see clearly what's going on with list comprehensions, try
do { print [ (x,[10..13]) | x <- [1,2]]
; print [ [(x,y) | y <- [10..13]] | x <- [1,2]]
; print [ r | x <- [1,2], r <- [(x,y) | y <- [10..13]]]
; print [ (x,y) | x <- [1,2], y <- [10..13] ]
}
=>
[ (1,[10,11,12,13]), (2,[10,11,12,13]) ]
[[(1,10),(1,11),(1,12),(1,13)], [(2,10),(2,11),(2,12),(2,13)]]
[ (1,10),(1,11),(1,12),(1,13), (2,10),(2,11),(2,12),(2,13) ]
[ (1,10),(1,11),(1,12),(1,13), (2,10),(2,11),(2,12),(2,13) ]
On the other hand,
do { print [ ([1,2],y) | y <- [10..13] ]
; print [ (x,y) | y <- [10..13], x <- [1,2] ]
}
=>
[ ([1,2],10), ([1,2],11), ([1,2],12), ([1,2],13) ]
[ (1,10),(2,10), (1,11),(2,11), (1,12),(2,12), (1,13),(2,13) ]
List comprehensions work in the nested fashion.
You could rewrite your first code as (with liftA3 imported from Control.Applicative)
map (\(a,_,_) -> a) $
filter (\(a,x,y) -> x^2+y^3==a) $
liftA3 (,,) [1..n] [1..n] [1..n] -- (1)
and the second as
map (\(_,_,a) -> a) $
filter (\(x,y,a) -> x^2+y^3==a) $
liftA3 (,,) [1..n] [1..n] [1..n] -- (2)
You might be tempted to see (1) and (2) as a very general "all combinations of three elements drawn from the same list" thing. But Haskell is a deterministic language: it produces the same results for the same inputs, so it imposes a certain order on the resulting list.
And this is what it means when we say that list comprehensions work in a nested fashion — the leftmost list's element changes the slowest, and the rightmost's the fastest, in the resulting combination — like in an odometer.
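A minimal sketch of the odometer order, using small made-up inputs: liftA3 (,,) on lists and the equivalent comprehension enumerate the combinations in exactly the same order, with the last list changing fastest.

```haskell
import Control.Applicative (liftA3)

-- All combinations via the list Applicative.
viaLift :: [(Char, Char, Int)]
viaLift = liftA3 (,,) "ab" "xy" [1, 2]

-- The same combinations via a comprehension.
viaComp :: [(Char, Char, Int)]
viaComp = [ (p, q, r) | p <- "ab", q <- "xy", r <- [1, 2] ]

main :: IO ()
main = do
  print (take 3 viaLift)      -- [('a','x',1),('a','x',2),('a','y',1)]
  print (viaLift == viaComp)  -- True
```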
To solve your problem, you could write
sort [a | x2 <- takeWhile (<= n) [x^2 | x <- [1..]]
        , a  <- takeWhile (<= n) [x2+y^3 | y <- [1..]] ]
but this requires careful thought, the code doesn't express the original intent as clearly, and it still isn't as optimal as the one using Data.List.Ordered.mergeAll (from the data-ordlist package), as seen in the answer by Daniel Wagner:
takeWhile (<= n) . mergeAll $ [[x^2+y^3 | x <- [1..]] | y <- [1..]]
although both have the same time complexity, more or less. mergeAll merges the ordered non-decreasing lists it is presented with, by using pairwise merges arranged in a tree slanted to the right.
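To see why merging an infinite list of lists can work at all, here is a minimal hand-written stand-in, not the library's implementation: where mergeAll uses a right-slanted tree of merges, this uses a simpler linear chain. It assumes the same invariant as above (each inner list is non-decreasing, and so are the heads of the inner lists), which guarantees the first head is the overall minimum and can be emitted before anything else is inspected. merge and mergeAllLinear are hypothetical local names.

```haskell
-- Merge two non-decreasing lists, keeping duplicates.
merge :: Ord a => [a] -> [a] -> [a]
merge xs [] = xs
merge [] ys = ys
merge xs@(x : xt) ys@(y : yt)
  | y < x     = y : merge xs yt
  | otherwise = x : merge xt ys

-- Merge a (possibly infinite) list of non-decreasing lists whose heads are
-- non-decreasing. Emitting the first head immediately keeps this lazy.
mergeAllLinear :: Ord a => [[a]] -> [a]
mergeAllLinear [] = []
mergeAllLinear ([] : xss) = mergeAllLinear xss
mergeAllLinear ((x : xs) : xss) = x : merge xs (mergeAllLinear xss)

squareCubesWithDups :: [Integer]
squareCubesWithDups = mergeAllLinear [ [ x ^ 2 + y ^ 3 | x <- [1 ..] ] | y <- [1 ..] ]

main :: IO ()
main = print (takeWhile (<= 17) squareCubesWithDups)  -- [2,5,9,10,12,17,17]
```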
Come to think of it, we too could write the more natural-looking
sort . concat . map (takeWhile (<= n))
  $ [[x^2+y^3 | x <- [1..n]] | y <- [1..n]]
This doesn't work with the infinite list of lists. To fix this, we could write
-- sort . concat . takeWhile (not . null) . map (takeWhile (<= n))
sort . concat . map (takeWhile (<= n)) . takeWhile ((<= n) . head)
  $ [[x^2+y^3 | x <- [1..]] | y <- [1..]]
In Haskell it is quite often better not to try too hard to figure it all out ourselves, but to leave it to lazy evaluation to take care of things. Here, unfortunately, that didn't quite work, and we had to take special care to be able to deal with infinite lists¹, with those explicit takeWhiles that are superfluous to the task's logic.
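A quick sketch checking, for small n, that the takeWhile-based construction above gives the same result (duplicates included) as the brute-force sorted comprehension from the question. viaTakeWhile and bruteForce are names made up for this check, and n is fixed to Int.

```haskell
import Data.List (sort)

-- The takeWhile-based version from above, working on infinite lists of lists.
viaTakeWhile :: Int -> [Int]
viaTakeWhile n =
  sort . concat . map (takeWhile (<= n)) . takeWhile ((<= n) . head) $
    [ [ x ^ 2 + y ^ 3 | x <- [1 ..] ] | y <- [1 ..] ]

-- The brute-force "sorted" answer from the question, for comparison.
bruteForce :: Int -> [Int]
bruteForce n = sort [ a | x <- [1 .. n], y <- [1 .. n], a <- [1 .. n], x ^ 2 + y ^ 3 == a ]

main :: IO ()
main = do
  print (viaTakeWhile 17)                                        -- [2,5,9,10,12,17,17]
  print (all (\n -> viaTakeWhile n == bruteForce n) [1 .. 40])   -- True
```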
Indeed it is true, for any n, that
under n (union a b) == nub . sort $ under n a ++ under n b
under n . unionAll . take m == under n . foldl union [] . take m
under n . unionAll == nub . sort . concat
                        . takeWhile (not . null) . map (under n)
and
under n (merge a b) == sort $ under n a ++ under n b
under n . mergeAll . take m == under n . foldl merge [] . take m
under n . mergeAll == sort . concat
                        . takeWhile (not . null) . map (under n)
using under n = takeWhile (<= n), with ordered increasing lists.
¹ Data.List.Ordered.mergeAll takes care of the infinite lists all by itself, and is better in that it is online: it starts producing its output much earlier than our tortured construction. The point wasn't that the library function isn't needed, just to see what can be done without it.

How does listx2 = [x * 2 | x<- numberList] work?

So I'm watching a very basic tutorial, and I'm at list comprehensions, where this comes up:
listx2 = [x * 2 | x<- numberList]
with numberList being a list of numbers
So this takes every number in the list and doubles it, so numberList = [1,2] results in [2,4].
But HOW does the whole syntax come together?
I know that x * 2 is the doubling, but the rest just doesn't make sense to me.
| is the "or" symbol as far as I know, so what does it do there?
x <- numberList gives x a number from the list, but why does it take just one number? And why so nicely one after the other? There is no recursion or anything that tells it to do one element at a time...
I learn stuff by understanding it, so is that even possible here, or do I just have to accept this as "that's how it goes" and memorize the pattern?
List comprehensions use their own special syntax, which is
[ e | q1, q2, ..., qn ]
The | is not an "or", it's part of the syntax, just as [ and ].
Each qi can be of the following forms.
x <- list chooses x from the list
condition is a boolean expression, which discards the xs chosen before if the condition is false
let y = expression defines variable y accordingly
Finally, e is an expression which can involve all the variables defined in the qi, and which forms the elements in the resulting list.
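A minimal sketch using all three kinds of qualifier in one comprehension, with made-up bounds: a generator, a boolean guard, and a let binding. The qualifiers are processed left to right, so the guard discards even x before the let is reached. oddSquares is an illustrative name.

```haskell
-- Generator, then guard, then let binding; e is the pair (x, y).
oddSquares :: [(Int, Int)]
oddSquares = [ (x, y) | x <- [1 .. 5], odd x, let y = x * x ]

main :: IO ()
main = print oddSquares  -- [(1,1),(3,9),(5,25)]
```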
What you see is syntactic sugar. Haskell does not interpret the pipe (|) by itself as an "or", a guard, etc.; it sees the list comprehension as a whole.
This, however, does not mean that the elements drawn with <- are picked at random. In fact, list comprehensions map nicely onto the list monad. What you see is syntactic sugar for:
listx2 = do
  x <- numberList
  return (x*2)
Since the list type [] is a monad, this means that we have written:
listx2 = numberList >>= \x -> return (x*2)
Or even shorter:
listx2 = numberList >>= return . (*2)
Now the list monad is defined, up to equivalence, as:
instance Monad [] where
  return x = [x]
  xs >>= k = concat $ fmap k xs
So this means that it is equivalent to:
listx2 = numberList >>= return . (*2)
listx2 = concat (fmap (return . (*2)) numberList)
listx2 = concat (fmap (\x -> [2*x]) numberList)
Now for a list fmap is equal to map, so:
listx2 = concat $ map (\x -> [2*x]) numberList
listx2 = concatMap (\x -> [2*x]) numberList
so that means that for every element x in the numberList we will generate a singleton list [2*x] and concatenate all these singleton lists into the result.
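A small sketch that checks the chain of rewrites above: the comprehension, the do block, the (>>=) form and the concatMap form all produce the same list. numberList = [1,2,3] is an example value chosen for this check.

```haskell
numberList :: [Int]
numberList = [1, 2, 3]

-- Four spellings of the same list, from most sugared to least.
viaComprehension, viaDo, viaBind, viaConcatMap :: [Int]
viaComprehension = [ x * 2 | x <- numberList ]
viaDo = do
  x <- numberList
  return (x * 2)
viaBind = numberList >>= return . (* 2)
viaConcatMap = concatMap (\x -> [2 * x]) numberList

main :: IO ()
main = print [viaComprehension, viaDo, viaBind, viaConcatMap]
-- [[2,4,6],[2,4,6],[2,4,6],[2,4,6]]
```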

How are dependent ranges computed in a list comprehension?

I'm currently making my way through Learn You a Haskell for Great Good!, and I'm confused on the penultimate example in Chapter 2.
As a way of generating triples representing all right triangles with all sides that are whole numbers less than or equal to 10, he gives this definition:
rightTriangles = [ (a,b,c) | c <- [1..10], b <- [1..c], a <- [1..b], a^2 + b^2 == c^2]
What I'm specifically confused about is the fact that b is bound to a list that ranges from 1 to c, and similarly with a. If my understanding is correct, c will be evaluated to all values in the list it is bound to, but I still don't see which value is being used for c in the range (e.g. all values of c, only the first c, etc.)
If it's not too much, a step by step explanation of how this evaluates would be great. :)
Thanks in advance!
Let's consider two simpler list comprehensions:
ex1 = [(a,b) | a <- [1..3], b <- [1..3]]
ex2 = [(a,b) | a <- [1..3], b <- [1..a]]
They're almost the same, but in the second case, b ranges from 1 to a, not 1 to 3. Let's consider what they're equal to; I've formatted their values in such a way as to make a point.
ex1 = [ (1,1), (1,2), (1,3)
      , (2,1), (2,2), (2,3)
      , (3,1), (3,2), (3,3) ]
ex2 = [ (1,1)
      , (2,1), (2,2)
      , (3,1), (3,2), (3,3) ]
In the first example, the list comprehension draws every possible combination of elements from [1..3] and [1..3]. But since we're talking about lists, not sets, the order it does that in is important. Thus, in more detail, what ex1 really means is this:
Let a be equal to every possible value from its list.
For each value of a, let b be every possible value from its list.
(a,b) is an element of the output list
Or, rephrased: "for every possible value of a, compute (a,b) for every possible value of b." If you look at the order of the results, this is what happens:
For the first three elements, a is equal to 1, and we see it paired with every value of b.
For the next three elements, a is equal to 2, and we see every value of b.
And finally, for the last three elements, a is equal to 3 and we see every value of b.
In the second case, much the same thing happens. But because a is picked first, b can depend on it. Thus:
First, a is equal to 1, and we see it paired with every possible value of b. Since b <- [1..a], that means b <- [1..1], and so there's only one option.
After one element, then, a is equal to 2, and we see that paired with every possible value of b. Now that means b <- [1..2], and so we get two results.
Finally, a is equal to 3, and so we're picking b <- [1..3]; this gives us the full set of three results.
In other words, because the list comprehensions rely on an ordering, you can take advantage of that. One way to see that is to imagine translating these list comprehensions into nested list comprehensions:
ex1 = concat [ [(a,b) | b <- [1..3]] | a <- [1..3] ]
ex2 = concat [ [(a,b) | b <- [1..a]] | a <- [1..3] ]
To get the right behavior, a <- [1..3] must go on the outside; this ensures that the bs change faster than the as. And it hopefully makes it clear how b can depend on a. Another translation (basically the one used in the Haskell 2010 Report) would be:
ex1 = concatMap (\a -> [(a,b) | b <- [1..3]]) [1..3]
    = concatMap (\a -> concatMap (\b -> [(a,b)]) [1..3]) [1..3]
ex2 = concatMap (\a -> [(a,b) | b <- [1..a]]) [1..3]
    = concatMap (\a -> concatMap (\b -> [(a,b)]) [1..a]) [1..3]
Again, this makes the nesting very explicit, even if it's hard to follow. Something to keep in mind is that if the selection of a is to happen first, it must be on the outside of the translated expression, even though it's on the inside of the list comprehension. The full, formal translation of rightTriangles would then be
rightTriangles =
  concatMap (\c ->
    concatMap (\b ->
      concatMap (\a ->
        if a^2 + b^2 == c^2
          then [(a,b,c)]
          else []
      ) [1..b]
    ) [1..c]
  ) [1..10]
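For the smaller ex2 example, a minimal sketch that checks the three forms discussed here (the comprehension, the nested comprehension with concat, and the concatMap translation) agree, and that b's range really depends on the a chosen first. The names ex2Nested and ex2ConcatMap are illustrative.

```haskell
-- ex2 in the three forms discussed above.
ex2, ex2Nested, ex2ConcatMap :: [(Int, Int)]
ex2 = [ (a, b) | a <- [1 .. 3], b <- [1 .. a] ]
ex2Nested = concat [ [ (a, b) | b <- [1 .. a] ] | a <- [1 .. 3] ]
ex2ConcatMap = concatMap (\a -> concatMap (\b -> [(a, b)]) [1 .. a]) [1 .. 3]

main :: IO ()
main = do
  print ex2                                        -- [(1,1),(2,1),(2,2),(3,1),(3,2),(3,3)]
  print (ex2 == ex2Nested && ex2 == ex2ConcatMap)  -- True
```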
As a side note, another way to write rightTriangles is as follows:
import Control.Monad (guard)

rightTriangles = do c <- [1..10]
                    b <- [1..c]
                    a <- [1..b]
                    guard $ a^2 + b^2 == c^2
                    return (a,b,c)
You probably haven't used do notation yet, and certainly not for anything but IO, so I'm not saying you should necessarily understand this. But you can read the x <- list lines as saying "for each x in list", and so read this as a nested loop:
rightTriangles = do
  c <- [1..10]               -- For each `c` from `1` to `10`, ...
  b <- [1..c]                -- For each `b` from `1` to `c`, ...
  a <- [1..b]                -- For each `a` from `1` to `b`, ...
  guard $ a^2 + b^2 == c^2   -- If `a^2 + b^2 /= c^2`, then `continue` (as in C);
  return (a,b,c)             -- `(a,b,c)` is the next element of the output list.
Note that the continue only skips to the next iteration of the innermost loop in this interpretation. You could also write it as
rightTriangles = do c <- [1..10]
                    b <- [1..c]
                    a <- [1..b]
                    if a^2 + b^2 == c^2
                      then return (a,b,c)
                      else []  -- or `mzero`
Where the last lines say "if a^2 + b^2 == c^2, add (a,b,c) to the output list; otherwise, add nothing." I only mention this because I thought seeing it written this way might help make the "nested loop"-type structure that's going on clear, not because you should fully understand do-notation while reading Chapter 2 of Learn You A Haskell :-)
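A minimal sketch of why guard acts like "continue" here: in the list monad, guard True is [()], a single way to carry on, and guard False is [], no way to carry on, so the rest of that iteration contributes nothing. guardTrue and guardFalse are illustrative names.

```haskell
import Control.Monad (guard)

-- guard in the list monad: one continuation or none.
guardTrue, guardFalse :: [()]
guardTrue = guard True
guardFalse = guard False

rightTriangles :: [(Int, Int, Int)]
rightTriangles = do
  c <- [1 .. 10]
  b <- [1 .. c]
  a <- [1 .. b]
  guard (a ^ 2 + b ^ 2 == c ^ 2)
  return (a, b, c)

main :: IO ()
main = do
  print (guardTrue, guardFalse)  -- ([()],[])
  print rightTriangles           -- [(3,4,5),(6,8,10)]
```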
Since you have experience with imperative programming, a short answer would be: it is similar to this nesting (pseudocode):
for(c = 1; c <= 10; c++) {
    for(b = 1; b <= c; b++) {
        for(a = 1; a <= b; a++) {
            if(a*a + b*b == c*c) {
                list.append((a, b, c));
            }
        }
    }
}

Why can't a list be defined like this in Haskell?

[(a,b) | a <- [1..5], b <- [1..5], a+b <- [1..10] ]
Trying to define a list that follows these rules. I know it doesn't allow the way I am adding a and b, but I don't understand why.
Edit: forgot the "<-"
Maybe you want this?
[(a,b) | a <- [1..5], b <- [1..5], a + b >= 1 && a + b <= 10]
Or this?
[(a,b) | a <- [1..5], b <- [1..5], a + b `elem` [1..10]]
Haskell doesn't solve equations for you, it performs calculations.
This declares the values which a variable takes:
a <- [1..5]
This is a request for Haskell to solve an equation, which it doesn't do... the left side has to be a valid pattern.
a + b <- [1..10]   -- Not valid Haskell
Of course, patterns can be more sophisticated,
> [a | Just a <- [Just 10, Nothing, Just 20]]
[10, 20]
The "<-" in a list comprehension actually draws elements from a list.
Your third expression, "a+b <- [1..10]" is really trying to express that the sum could be drawn from a list. That's a job for (elem), a predicate, or test.
elem :: (Eq a) => a -> [a] -> Bool
A pretty good way to think about the problem is, how would it be implemented? You'd have to take elements, then test to see if they met the criteria. I'd use a predicate like
elem (a + b) [1..10]
to test that.
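A short sketch comparing the elem test with a plain range check. With a and b both in [1..5], every sum lies between 2 and 10, so both keep all 25 pairs; the narrower smallSums (bound 4, chosen for illustration) shows the predicate actually filtering. All three names are hypothetical.

```haskell
-- Membership test with elem, as suggested above.
viaElem :: [(Int, Int)]
viaElem = [ (a, b) | a <- [1 .. 5], b <- [1 .. 5], (a + b) `elem` [1 .. 10] ]

-- Equivalent range check; avoids scanning [1..10] for every pair.
viaRange :: [(Int, Int)]
viaRange = [ (a, b) | a <- [1 .. 5], b <- [1 .. 5], let s = a + b, s >= 1, s <= 10 ]

-- A narrower bound, so the test actually removes pairs.
smallSums :: [(Int, Int)]
smallSums = [ (a, b) | a <- [1 .. 5], b <- [1 .. 5], (a + b) `elem` [1 .. 4] ]

main :: IO ()
main = do
  print (length viaElem, viaElem == viaRange)  -- (25,True)
  print smallSums  -- [(1,1),(1,2),(1,3),(2,1),(2,2),(3,1)]
```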

Can you create more than one element of a list at a time with a list comprehension in haskell?

So, for example, say I had a list of numbers and I wanted to create a list that contained each number multiplied by 2 and 3. Is there any way to do something like the following, but get back a single list of numbers instead of a list of lists of numbers?
mult_nums = [ [(n*2),(n*3)] | n <- [1..5]]
-- this returns [[2,3],[4,6],[6,9],[8,12],[10,15]]
-- but we want [2,3,4,6,6,9,8,12,10,15]
I find that extending the list comprehension makes this easier to read:
[ m | n <- [1..5], m <- [2*n,3*n] ]
It might be helpful to examine exactly what this does, and how it relates to other solutions. Let's define it as a function:
mult lst = [ m | n <- lst, m <- [2*n,3*n] ]
After a fashion, this desugars to
mult' lst =
concatMap (\n -> concatMap (\m -> [m]) [2*n,3*n]) lst
The expression concatMap (\m -> [m]) is wrapping m up in a list in order to immediately flatten it—it is equivalent to map id.
Compare this to @FunctorSalad's answer:
mult1 lst = concatMap (\n -> [n*2,n*3]) lst
We've optimized away concatMap (\m -> [m]).
Now @vili's answer:
mult2 lst = concat [ [(n*2),(n*3)] | n <- lst]
This desugars to:
mult2' lst = concat (concatMap (\n -> [[2*n,3*n]]) lst)
As in the first solution above, we are unnecessarily creating a list of lists that we have to concat away.
I don't think there is a solution that uses list comprehensions but desugars to mult1. My intuition is that Haskell compilers are generally clever enough that this wouldn't matter, or alternatively that unnecessary concats are cheap thanks to lazy evaluation (whereas they're lethal in eager languages).
You could use concat:
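A quick sketch checking that the three spellings discussed above (the extended comprehension, concatMap, and concat over a comprehension of lists) produce the same flat list for [1..5].

```haskell
mult, mult1, mult2 :: [Int] -> [Int]
-- Extended comprehension: a second generator flattens the pairs.
mult lst = [ m | n <- lst, m <- [2 * n, 3 * n] ]
-- concatMap directly over a function returning two elements.
mult1 lst = concatMap (\n -> [n * 2, n * 3]) lst
-- Build a list of lists, then flatten it with concat.
mult2 lst = concat [ [n * 2, n * 3] | n <- lst ]

main :: IO ()
main = do
  print (mult [1 .. 5])  -- [2,3,4,6,6,9,8,12,10,15]
  print (mult1 [1 .. 5] == mult [1 .. 5] && mult2 [1 .. 5] == mult [1 .. 5])  -- True
```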
concat [ [(n*2),(n*3)] | n <- [1..5]]
output: [2,3,4,6,6,9,8,12,10,15]
In some similar cases concatMap can also be convenient, though here it doesn't change much:
concatMap (\n -> [n*2,n*3]) [1..5]
