Haskell: List comprehension predicate order

After reading about the Haskell syntax for List Comprehensions online, I got the feeling that predicates always come last. Eg:
[(x,y) | x <- [1..10000], y <- [1..100], x==2000, odd y]
But the following line accomplishes the same result:
[(x,y) | x <- [1..10000], x==2000, y <- [1..100], odd y]
Normally I would just take this as a hint that the order doesn't matter and be done with it. However this is a problem that comes from an old exam, and the answer to the problem says that while the results may be the same, the way in which they are computed may differ.
I'm assuming this is true but I can't find any information about it on the web. So my question is: How could the computations differ between the two list comprehensions and why? Are list comprehensions some form of syntactic sugar that I don't know about?

You can think of a list comprehension like
[(x,y) | x <- [1..10000], y <- [1..100], x==2000, odd y]
as corresponding to the imperative pseudo-code
for x in [1..10000]:
    for y in [1..100]:
        if x == 2000:
            if odd y:
                yield (x,y)
and
[(x,y) | x <- [1..10000], x==2000, y <- [1..100], odd y]
as corresponding to
for x in [1..10000]:
    if x == 2000:
        for y in [1..100]:
            if odd y:
                yield (x,y)
Specifically, passing the list comprehension to something like mapM_ print is the same operationally as replacing yield by print in the imperative version.
Obviously, it's almost always better to "float" a guard/if out of a generator/for when possible. (The rare exception is when the generator is actually an empty list, and the guard condition is expensive to compute.)
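On the syntactic-sugar part of the question: a comprehension can be read as nested concatMap calls, one per generator, with each guard turning into an if that yields an empty list on failure. The following is a sketch of that reading, in the spirit of the translation given in the Haskell Report; GHC may well compile it differently. The names lateGuard and earlyGuard are made up for illustration.

```haskell
-- Guards placed after both generators: the inner concatMap over y runs
-- for every one of the 10000 values of x.
lateGuard :: [(Int, Int)]
lateGuard =
  concatMap
    (\x ->
       concatMap
         (\y -> if x == 2000 then (if odd y then [(x, y)] else []) else [])
         [1 .. 100])
    [1 .. 10000]

-- Guard on x placed before the y generator: the inner concatMap only runs
-- for the single x that passes the test.
earlyGuard :: [(Int, Int)]
earlyGuard =
  concatMap
    (\x ->
       if x == 2000
         then concatMap (\y -> if odd y then [(x, y)] else []) [1 .. 100]
         else [])
    [1 .. 10000]

main :: IO ()
main = do
  print (lateGuard == earlyGuard) -- True
  print (length earlyGuard)       -- 50
```

Both definitions produce the same list; only the amount of work done to get there differs.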

They differ in how many intermediate results/lists are generated.
You can visualize this with trace from Debug.Trace. Note that I modified the comprehensions a bit to give reasonable results, and I replaced the returned values with () to make the output clearer:
import Debug.Trace (trace)
comprehension1 = [ () | x <- [1..3], trace' 'x' x, y <- [1..3], trace' 'y' y, x==2, odd y]
comprehension2 = [ () | x <- [1..3], trace' 'x' x, x==2, y <- [1..3], trace' 'y' y, odd y]
trace' :: Show a => Char -> a -> Bool
trace' c x = trace (c : '=' : show x) True
Here is the evaluation:
λ> comprehension1
x=1
y=1
y=2
y=3
x=2
y=1
[()y=2
y=3
,()x=3
y=1
y=2
y=3
]
λ> comprehension2
x=1
x=2
y=1
[()y=2
y=3
,()x=3
]
Now, do you notice something?
In the first example, every (x,y) pair for x=1,2,3 and y=1,2,3 is generated before the filters are applied.
In the second example, the ys are only generated when x=2, so you could say it performs better.
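To put a number on that for the ranges in the original question, here is a small sketch that only counts how many combinations each ordering walks through before the remaining guards are consulted. The names visitedLate and visitedEarly are hypothetical.

```haskell
-- Both generators come first: every (x, y) combination is visited.
visitedLate :: Int
visitedLate = length [() | _ <- [1 .. 10000 :: Int], _ <- [1 .. 100 :: Int]]

-- The guard on x sits between the generators: y is only enumerated
-- for the one x that passes.
visitedEarly :: Int
visitedEarly = length [() | x <- [1 .. 10000 :: Int], x == 2000, _ <- [1 .. 100 :: Int]]

main :: IO ()
main = do
  print visitedLate  -- 1000000
  print visitedEarly -- 100
```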

Related

Infinite loop on a simple list for two predicates

When I try to compile this line:
mult y = [x*2 | x <- [1..], x <= y]
And run it, I have an infinite loop that I must cancel with CTRL + C
*Main> mult 10
[2,4,6,8,10,12,14,16,18,20
Do you know why those predicates are not correctly interpreted?
Thank you
You're looking for
mult y = [x * 2 | x <- [1..y]]
In this version, the [1..y] gets compiled to a finite list from 1 up to y. In your original code
mult y = [x * 2 | x <- [1..], x <= y]
Haskell doesn't understand concepts like <= being an ordering or [1..] being a monotonic list. So it is determined to test every natural number, just in case some really big number out there happens to be less than or equal to y by some fluke. You and I can look at that code and see that it obviously won't find any, but Haskell doesn't understand that, so it goes looking anyway. That is why the output stops after 20 without ever printing the closing bracket.
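If you do want to keep the infinite generator, one option is to cut it off with takeWhile, which stops at the first element that fails the test instead of filtering the whole list. A minimal sketch (multUpTo is a made-up name, not from the question):

```haskell
-- takeWhile ends the list at the first x > y, so the comprehension
-- only ever sees a finite prefix of [1 ..].
multUpTo :: Int -> [Int]
multUpTo y = [x * 2 | x <- takeWhile (<= y) [1 ..]]

main :: IO ()
main = print (multUpTo 10) -- [2,4,6,8,10,12,14,16,18,20]
```

This only gives the same answer as the guard version because [1 ..] is increasing; for an unordered source list takeWhile and a guard are not interchangeable.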

Haskell: Filtering a list based on a predicate for all other elements in the list

I have a list of natural numbers [1..n] (this list is never empty) and I would like to filter each element by testing a predicate with all other elements in the list. I would like to return a list of those numbers who never fulfilled the predicate. My idea is this:
filter (\x -> 1 == length [y| y <- [1..n], pred y x]) [1..n]
I am testing if the length is equal to 1 since for x==y the predicate returns true.
This does work as intended, however, I was wondering if there is a cleaner way to do this. I'm not really looking for more performance, but rather a more simple solution.
As far as complexity, I don't think you can do better than quadratic, since, after all, the very definition of the problem is to test each element with each other. So unless there is more to be known about the structure of the problem, you're stuck there.
But you can perhaps cut down on the running time somewhat by stopping early. Calculating length every time means enumerating all elements from 1 to n, but you don't actually need that, right? You can stop enumerating once pred returns True for the first time. To do that you can use and:
filter (\x -> and [not (pred y x) | y <- [1..n], y /= x]) [1..n]
Or, alternatively, you can move the predicate to the condition part and then test the resulting list for emptiness:
filter (\x -> null [y | y <- [1..n], y /= x && pred y x]) [1..n]
But I like the former variant better, because it better describes the intent.
Finally, I think this would look cleaner as a list comprehension:
[ x
| x <- [1..n]
, and [not (pred y x) | y <- [1..n], y /= x]
]
But that's a matter of personal taste, of course.
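For something you can load and run, here is a sketch of the same idea written with any, which also stops at the first match. The question does not say what the predicate is, so divides below is purely an assumed example, and unmatched is a made-up name. The predicate is passed in as a parameter p, which also avoids shadowing the Prelude function pred.

```haskell
-- Keep every x in [1 .. n] for which p y x is False for all other y.
unmatched :: (Int -> Int -> Bool) -> Int -> [Int]
unmatched p n =
  [x | x <- [1 .. n], not (any (\y -> y /= x && p y x) [1 .. n])]

-- Assumed example predicate: y is a divisor of x other than 1.
divides :: Int -> Int -> Bool
divides y x = y > 1 && x `mod` y == 0

main :: IO ()
main = print (unmatched divides 20) -- [1,2,3,5,7,11,13,17,19]
```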

Using list comprehension with two variables in haskell

Does someone know how I can do a list comprehension with two variables in Haskell?
ex.
[ x * y | x <- [1..10] y <- [1..10]]
it should result in
[1,4,9,16,25,36,49,64,81,100]
but it actually yields in ghci
<interactive>:13:23-24: error:
parse error on input ‘<-’
Perhaps this statement should be within a 'do' block?
Well there are two problems here: a syntactical one, and a semantical one.
Towards a valid list comprehension expression
The syntactical one is that you separate the parts of list comprehension (these can be generators, filters, and let clauses) by a comma (,):
[ x * y | x <- [1..10], y <- [1..10]]
But now we will not get the desired output. Indeed:
Prelude> [ x * y | x <- [1..10], y <- [1..10]]
[1,2,3,4,5,6,7,8,9,10,2,4,6,8,10,12,14,16,18,20,3,6,9,12,15,18,21,24,27,30,4,8,12,16,20,24,28,32,36,40,5,10,15,20,25,30,35,40,45,50,6,12,18,24,30,36,42,48,54,60,7,14,21,28,35,42,49,56,63,70,8,16,24,32,40,48,56,64,72,80,9,18,27,36,45,54,63,72,81,90,10,20,30,40,50,60,70,80,90,100]
What we have here is all multiplications between two integers from 1 to 10, since for every x in the list [1..10], we iterate through the list [1..10] for y. This however does not match your requested list, hence a semantical error.
Obtaining a list of squares
What you seem to want is a list of all square numbers. In that case there is only one variable x, and for each value of x, we yield x*x:
[ x * x | x <- [1..10]]
this then yields:
Prelude> [ x * x | x <- [1..10]]
[1,4,9,16,25,36,49,64,81,100]
Enumerating lists in parallel
In case you have two lists you want to enumerate in parallel, you can do this with a zip, for example if we want to multiply the elements of [1..10] with the elements of [5..14] elementwise, we can do this with:
[ x * y | (x, y) <- zip [1..10] [5..14]]
We can also work with the ParallelListComp extension as @DanielWagner says:
{-# LANGUAGE ParallelListComp #-}
[ x * y | x <- [1..10] | y <- [5..14]]
You need to zip the two ranges together:
[ x * y | (x, y) <- zip [1..10] [1..10] ]
You can have two separate iterators, separated with a comma
[ x * y | x <- [1..10], y <-[1..10] ]
but this computes the cartesian product of the two sets, resulting in a full multiplication table rather than a list of squares.
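As a quick check of the difference, the sketch below puts the elementwise version next to the cartesian one. It uses zipWith, which is shorthand for zipping and then multiplying; the names are made up for illustration.

```haskell
-- Elementwise: the two lists are walked in lockstep.
squaresZip :: [Int]
squaresZip = zipWith (*) [1 .. 10] [1 .. 10]

-- Cartesian: every x is paired with every y.
table :: [Int]
table = [x * y | x <- [1 .. 10], y <- [1 .. 10]]

main :: IO ()
main = do
  print squaresZip     -- [1,4,9,16,25,36,49,64,81,100]
  print (length table) -- 100
```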

List comprehensions with arguments drawn from the same list in Haskell

I've a question regarding list comprehensions in Haskell.
I have an exam later this week and therefore did some old exams where I found this question:
"Write a function that given a positive integer n returns a list of positive integers m ≤ n such that there are two positive integers x and y, such that x^2 + y^3 = m. The list needs to be sorted"
There were two possible answers,
either
squareCube::Int->[Int]
squareCube n =[a|a<-[1..n],x<-[1..n],y<-[1..n],x^2+y^3==a]
or
import Data.List
squareCube::Int->[Int]
squareCube n =
sort [a|x<-[1..n],y<-[1..n],a<-[1..n],x^2+y^3==a]
I wonder why I need to use the sort function when a comes after x and y in my comprehension. Why does the order between the arguments matter?
This list is sorted:
[ 1, 1, 1
, 2, 2, 2
, 3, 3, 3
, 4, 4, 4 ]
This one isn't:
[ 1, 2, 3, 4
, 1, 2, 3, 4
, 1, 2, 3, 4 ]
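The same effect shows up with the two comprehensions from the question once the sort is left out. A small sketch for n = 17 (aFirst and aLast are hypothetical names for the two generator orders):

```haskell
-- a is the outermost generator, so results come out grouped by a,
-- in increasing order.
aFirst :: Int -> [Int]
aFirst n = [a | a <- [1 .. n], x <- [1 .. n], y <- [1 .. n], x ^ 2 + y ^ 3 == a]

-- a is the innermost generator, so results come out grouped by x, then y.
aLast :: Int -> [Int]
aLast n = [a | x <- [1 .. n], y <- [1 .. n], a <- [1 .. n], x ^ 2 + y ^ 3 == a]

main :: IO ()
main = do
  print (aFirst 17) -- [2,5,9,10,12,17,17]
  print (aLast 17)  -- [2,9,5,12,10,17,17]
```

The two lists contain the same elements; only the second needs sorting.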
This is only vaguely related to the question: it addresses the programming challenge, but does not answer the question as asked about why the existing approaches work. But it was too fun to avoid writing a snippet about it, so here goes.
With appropriate imports, you can very efficiently generate even the infinite list of square-cube sums. The basic idea is to make an infinite list of infinite lists; we will maintain the invariant that the outer infinite list is sorted by the heads of the inner infinite lists. Then it's easy and efficient to merge all of these. With the appropriate package it's a one-liner, and very succinctly matches the problem description:
import Data.List.Ordered
squareCubes = unionAll [[x^2+y^3 | x <- [1..]] | y <- [1..]]
We can compare the efficiency of this to the existing two approaches. Here's the test program, which I compiled with -O2:
import Data.List
import Data.List.Ordered
import System.Environment
squareCubes = unionAll [[x^2+y^3 | x <- [1..]] | y <- [1..]]
squareCube n = takeWhile (<=n) squareCubes
squareCube' n = [a|a<-[1..n],x<-[1..n],y<-[1..n],x^2+y^3==a]
squareCube'' n = sort [a|x<-[1..n],y<-[1..n],a<-[1..n],x^2+y^3==a]
main = do
[kind, limit] <- getArgs
let f = case kind of
"inf" -> squareCube
"unsorted" -> squareCube'
"sorted" -> squareCube''
print . sum . f . read $ limit
And here are the timings, which are quite stark indeed:
% /usr/bin/time ./test unsorted 700
57465
9.60user 0.01system 0:09.63elapsed 99%CPU (0avgtext+0avgdata 4156maxresident)k
% /usr/bin/time ./test sorted 700
57465
1.87user 0.00system 0:01.87elapsed 99%CPU (0avgtext+0avgdata 4056maxresident)k
% /usr/bin/time ./test inf 700
50895
0.00user 0.00system 0:00.00elapsed 100%CPU (0avgtext+0avgdata 3616maxresident)k
The others take seconds (aeons in computer time) while the one that's in some ways more capable than the others doesn't even register on the timer! I also experimented to find how large an input we could give before arriving at the timings of the other two implementations. I found that an input of 500000000 takes 8.88 seconds: almost six orders of magnitude larger in roughly the same time.
Wait, wait, you say: those outputs are different. So what gives? Well, it turns out that the slow implementations have what I consider to be a bug: they will spit out a single number multiple times if there are multiple ways to construct it as the sum of squares and cubes. For example,
> squareCube' 17
[2,5,9,10,12,17,17]
> squareCube 17
[2,5,9,10,12,17]
because 3^2 + 2^3 = 4^2 + 1^3. On the other hand, if this is the intended behavior, one can easily achieve it in the efficient, lazy one-liner by replacing unionAll with mergeAll.
We need a sort function when a comes after x and y in the comprehension because of the order of evaluation. If a <- [1..n] comes first, each of the subsequent statements will be evaluated against each a in turn, so the a's already form an increasing list:
a = 1
    x <- [1..n]
        y <- [1..n]
            ...return a if there's a valid match
a = 2
    x <- [1..n]
        y <- [1..n]
            ...return a if there's a valid match
etc.
However, if a <- [1..n] is evaluated last, we may not get an ordered sequence of as:
x = 1
    y <- [1..n]
    ...
    y = 1
        a <- [1..n]
        ...
        a = 2      -- yields 2
    y = 2
        a <- [1..n]
        ...
        a = 9      -- yields 9
x = 2
    y <- [1..n]
    ...
    y = 1
        a <- [1..n]
        ...
        a = 5      -- yields 5
To see clearly what's going on with list comprehensions, try
do { print [ (x,[10..13]) | x <- [1,2]]
; print [ [(x,y) | y <- [10..13]] | x <- [1,2]]
; print [ r | x <- [1,2], r <- [(x,y) | y <- [10..13]]]
; print [ (x,y) | x <- [1,2], y <- [10..13] ]
}
=>
[ (1,[10,11,12,13]), (2,[10,11,12,13]) ]
[[(1,10),(1,11),(1,12),(1,13)], [(2,10),(2,11),(2,12),(2,13)]]
[ (1,10),(1,11),(1,12),(1,13), (2,10),(2,11),(2,12),(2,13) ]
[ (1,10),(1,11),(1,12),(1,13), (2,10),(2,11),(2,12),(2,13) ]
On the other hand,
do { print [ ([1,2],y) | y <- [10..13] ]
; print [ (x,y) | y <- [10..13], x <- [1,2] ]
}
=>
[ ([1,2],10), ([1,2],11), ([1,2],12), ([1,2],13) ]
[ (1,10),(2,10), (1,11),(2,11), (1,12),(2,12), (1,13),(2,13) ]
List comprehensions work in the nested fashion.
You could re-write your first code as
map (\(a,_,_) -> a) $
filter (\(a,x,y) -> x^2+y^3==a) $
liftA3 (,,) [1..n] [1..n] [1..n] -- (1)
and the second as
map (\(_,_,a) -> a) $
filter (\(x,y,a) -> x^2+y^3==a) $
liftA3 (,,) [1..n] [1..n] [1..n] -- (2)
You could be tempted to see (1) and (2) as a very general "all combinations of three elements drawn from same list" thing. But Haskell is a deterministic language. It produces the same results for the same inputs. Thus it imposes a certain order on the resulting list.
And this is what it means when we say that list comprehensions work in a nested fashion — the leftmost list's element changes the slowest, and the rightmost's the fastest, in the resulting combination — like in an odometer.
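A two-list version of the same comparison is easy to check directly. This is only a sketch with made-up names; it relies on the list Applicative instance, where liftA2 combines every element of the first list with every element of the second, leftmost list varying slowest.

```haskell
import Control.Applicative (liftA2)

pairsComp :: [(Int, Int)]
pairsComp = [(x, y) | x <- [1, 2], y <- [10 .. 13]]

pairsLift :: [(Int, Int)]
pairsLift = liftA2 (,) [1, 2] [10 .. 13]

main :: IO ()
main = do
  print pairsLift
  -- [(1,10),(1,11),(1,12),(1,13),(2,10),(2,11),(2,12),(2,13)]
  print (pairsComp == pairsLift) -- True
```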
To solve your problem, you could write
sort [a | x2 <- takeWhile (<= n) [x^2 | x <- [1..]]
, a <- takeWhile (<= n) [x2+y^3 | y <- [1..]] ]
but this requires careful thought, the code doesn't express the original intent as clearly, and it still isn't as efficient as the one using Data.List.Ordered.mergeAll (from the data-ordlist package) as seen in the answer by Daniel Wagner,
takeWhile (<= n) . mergeAll $ [[x^2+y^3 | x <- [1..]] | y <- [1..]]
although both have the same time complexity, more or less. mergeAll merges the ordered non-decreasing lists it is presented with, by using pairwise merges arranged in a tree slanted to the right.
Come to think of it, we too could write the more natural-looking
sort . concat . map (takeWhile (<= n))
$ [[x^2+y^3 | x <- [1..n]] | y <- [1..n]]
This doesn't work with the infinite list of lists. To fix this, we could write
-- sort . concat . takeWhile (not . null) . map (takeWhile (<= n))
sort . concat . map (takeWhile (<= n)) . takeWhile ((<= n).head)
$ [[x^2+y^3 | x <- [1..]] | y <- [1..]]
In Haskell, it is quite often better not to try too hard to figure it all out ourselves, and instead leave it to lazy evaluation to take care of things. Here it unfortunately didn't quite work, and we had to take special care to be able to deal with the infinite list [1], with those explicit takeWhiles superfluous to the task's logic.
Indeed it is true, for any n, that
under n (union a b) == nub . sort $ under n a ++ under n b
under n . unionAll . take m == under n . foldl union [] . take m
under n . unionAll == nub . sort . concat
. takeWhile (not.null) . map (under n)
and
under n (merge a b) == sort $ under n a ++ under n b
under n . mergeAll . take m == under n . foldl merge [] . take m
under n . mergeAll == sort . concat
. takeWhile (not.null) . map (under n)
using under n = takeWhile (<= n), with ordered increasing lists.
[1] Data.List.Ordered.mergeAll takes care of the infinite lists all by itself, and is better in the sense that it is on-line: it starts producing its output much earlier than our tortured constructed function. The point wasn't that the library function isn't needed, but just to see what can be done without it.
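To make the two-list laws concrete without installing data-ordlist, here is a sketch with hand-written stand-ins. mergeSorted and unionSorted are hypothetical names and are not the library's implementations; they only mimic what merge and union are described as doing on ascending lists. rowY1 and rowY2 are the y = 1 and y = 2 rows of the square-cube table.

```haskell
import Data.List (nub, sort)

-- Merge two ascending lists, keeping duplicates.
mergeSorted :: Ord a => [a] -> [a] -> [a]
mergeSorted [] ys = ys
mergeSorted xs [] = xs
mergeSorted (x : xs) (y : ys)
  | x <= y = x : mergeSorted xs (y : ys)
  | otherwise = y : mergeSorted (x : xs) ys

-- Merge two ascending lists, emitting an element common to both only once.
unionSorted :: Ord a => [a] -> [a] -> [a]
unionSorted [] ys = ys
unionSorted xs [] = xs
unionSorted (x : xs) (y : ys) = case compare x y of
  LT -> x : unionSorted xs (y : ys)
  EQ -> x : unionSorted xs ys
  GT -> y : unionSorted (x : xs) ys

under :: Int -> [Int] -> [Int]
under n = takeWhile (<= n)

-- Two infinite, strictly increasing rows: x^2 + 1^3 and x^2 + 2^3.
rowY1, rowY2 :: [Int]
rowY1 = [x ^ 2 + 1 ^ 3 | x <- [1 ..]]
rowY2 = [x ^ 2 + 2 ^ 3 | x <- [1 ..]]

main :: IO ()
main = do
  print (under 20 (mergeSorted rowY1 rowY2)) -- [2,5,9,10,12,17,17]
  print (under 20 (unionSorted rowY1 rowY2)) -- [2,5,9,10,12,17]
  print (under 20 (mergeSorted rowY1 rowY2) == sort (under 20 rowY1 ++ under 20 rowY2))
  print (under 20 (unionSorted rowY1 rowY2) == nub (sort (under 20 rowY1 ++ under 20 rowY2)))
```

Because both stand-ins are lazy, takeWhile can cut off the merged result even though the inputs are infinite.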

What does this symbol | mean in Haskell

Can someone explain this code to me?
[ x*y | x <- [2,5,10], y <- [8,10,11], x*y > 50]
I don't understand the meaning of this | symbol in Haskell.
You should read it as "where" or "such that" -
-- x * y where x is from [2,5,10] and y is from [8,10,11] and x * y > 50
[ x * y | x <- [2,5,10], y <- [8,10,11], x * y > 50]
or, alternatively, if you're familiar with Python and its list comprehensions, you might read it as "for"
-- [x * y for x in [2,5,10] for y in [8,10,11] if x * y > 50]
[x * y | x <- [2,5,10], y <- [8,10,11], x * y > 50]
The symbol '|' has the same meaning as in mathematical set-builder notation; read it as 'such that'. In math, '|' is sometimes replaced by ':'.
The symbol '<-' is read as 'is drawn from'.
And the expression x <- [2,5,10] is called a generator. A list comprehension can have more than one generator, with successive generators being separated by commas.
List comprehensions can also use logical expressions called guards to filter the values produced by earlier generators. If a guard is True, then the current values are retained, and, if it is False, then they are discarded. For example, the comprehension [x | x <- [1..10], even x] produces the list [2,4,6,8,10] of all even numbers from list [1..10].
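If it helps to see generators and guards spelled out, the comprehension from the question can also be written in do-notation for lists, with guard from Control.Monad playing the role of the filter. This is a sketch for comparison; viaComprehension and viaDo are made-up names.

```haskell
import Control.Monad (guard)

viaComprehension :: [Int]
viaComprehension = [x * y | x <- [2, 5, 10], y <- [8, 10, 11], x * y > 50]

viaDo :: [Int]
viaDo = do
  x <- [2, 5, 10]    -- generator: x is drawn from [2,5,10]
  y <- [8, 10, 11]   -- generator: y is drawn from [8,10,11]
  guard (x * y > 50) -- guard: discard combinations that fail the test
  return (x * y)     -- the expression to the left of the |

main :: IO ()
main = do
  print viaComprehension -- [55,80,100,110]
  print (viaComprehension == viaDo) -- True
```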
Hope this helps you understand the meaning of the symbols '|' and '<-' in list comprehensions.
A translation to English would be something like
A list whose elements are of the form x*y, such that x is an element of [2,5,10], y is an element of [8,10,11], and x*y is greater than 50.
The | symbol here is part of the syntax for list comprehensions; it's not an operator or anything else that has independent meaning, it simply serves to separate the expression for the elements of the list being defined (the x*y part in this case) from the generators and filters (the x <- [2,5,10], y <- [8,10,11], x*y > 50 part). In the translation to English, I rendered the | symbol as "such that"; "where" is also common.
The syntax for writing list comprehensions is inspired by how set comprehensions are written in mathematics; in the examples on that page you can clearly see a vertical bar used to separate the form of set elements from the conditions on the elements.
I would prefer to think of the | here as "under these conditions":
