Why is evaluating keys to WHNFs enough to construct Sets in Haskell? - haskell

Haskell's Data.Set page says:
Key arguments are evaluated to WHNF
in its "Strictness properties" section.
However, I wonder why WHNF is enough to construct Sets. For example, to construct an instance of Set [Int], we must evaluate its elements of Lists of Ints deeply to compare them.

To expand on @chi's comment: this only means that keys are evaluated at least to WHNF. They are often evaluated further once they are compared, but not always. Let's use ghc-heap-view to look at a few examples:
Preliminaries:
~ $ ghci
GHCi, version 8.0.1: http://www.haskell.org/ghc/ :? for help
Prelude> :script /home/jojo/.cabal/share/x86_64-linux-ghc-8.0.1/ghc-heap-view-0.5.7/ghci
Prelude> import qualified Data.Set as S
singleton does not fully evaluate its argument (the _bco is a thunk):
Prelude S> let t = [True, True && True] -- a thunk in a list
Prelude S> let s1 = S.singleton t
Prelude S> s1 `seq` ()
()
Prelude S> :printHeap s1
let x1 = True()
x2 = Tip()
in Bin [x1,(_bco _fun)()] x2 x2 1
And even inserting another element will not fully evaluate the second elements in the list, as they can be distinguished already by looking at the first element:
Prelude S> let t2 = [False, False && False] -- a thunk in another list
Prelude S> let s2 = S.insert t2 s1
Prelude S> s2 `seq` ()
()
Prelude S> :printHeap s2
let x1 = True()
x2 = toArray (0 words)
f1 = _fun
x3 = []
x4 = False()
x5 = Tip()
in Bin (x1 : (_bco f1)() : x3) (Bin (x4 : (_bco f1)() : x3) x5 x5 1) x5 2
But inserting t2 again will now force the second element of that list:
Prelude S> let s3 = S.insert t2 s2
Prelude S> s3 `seq` ()
()
Prelude S> :printHeap s3
let x1 = True()
x2 = []
x3 = False()
x4 = Tip()
in Bin (x1 : (_bco _fun)() : x2) (Bin (x3 : _bh x3 : x2) x4 x4 1) x4 2
So you cannot rely on Data.Set to evaluate the keys fully as you store them. If you want that, you need to use, for example, (singleton $!! t) and (insert $!! t2), where $!! comes from Control.DeepSeq.
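The same behaviour can be observed without ghc-heap-view. The following is a minimal sketch, assuming the containers and deepseq packages are available; the names lazyKeys, deepSingleton and deepInsert are made up for this illustration. It stores keys whose second element is undefined, then contrasts that with deeply forced keys.

```haskell
import Control.DeepSeq (($!!))
import Control.Exception (SomeException, evaluate, try)
import qualified Data.Set as S

-- Two keys that already differ in their first element. Their second
-- elements are undefined, so they must never be evaluated.
lazyKeys :: S.Set [Bool]
lazyKeys = S.insert [False, undefined] (S.singleton [True, undefined])

-- Hypothetical helpers: fully evaluate a key before it is stored.
deepSingleton :: [Bool] -> S.Set [Bool]
deepSingleton t = S.singleton $!! t

deepInsert :: [Bool] -> S.Set [Bool] -> S.Set [Bool]
deepInsert t s = flip S.insert s $!! t

main :: IO ()
main = do
  -- Fine: the comparison is decided by the first elements alone.
  print (S.size lazyKeys)
  -- Inserting a key with an equal first element makes the comparison
  -- look at the second elements, which raises the exception.
  r <- try (evaluate (S.size (S.insert [False, undefined] lazyKeys)))
         :: IO (Either SomeException Int)
  putStrLn (either (const "comparison forced the second element") show r)
  -- With $!! the keys are fully evaluated before they reach the set.
  print (S.toList (deepInsert [False, False && False]
                              (deepSingleton [True, True && True])))
```

Note that deepSingleton [True, undefined] would throw as soon as the set is demanded, which is exactly the difference between forcing to WHNF and forcing to normal form.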
(If someone wants to replace the ghc-heap-view output in this answer with ghc-vis graphs, feel free to do so :-)).

There may be data types usable as keys that don't need to evaluate everything they contain to determine equality or order. Lists are no such types, of course.
However, while lists must be fully evaluated to establish equality, they do not necessarily need to be fully evaluated to establish inequality. That is, we wouldn't expect that
[1..] == [2..]
computes forever. Likewise with
[] == [(40+2)..]
Here it is enough to get WHNF of the second list to find that it is not equal to the first one. We need not bother to compute 40+2, much less the succeeding elements.
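Both claims can be checked with a small program. This is only a sketch; the names infiniteUnequal and emptyVsCons are invented here, and undefined stands in for "a computation that must not be run".

```haskell
-- Both lists are infinite, yet (==) stops at the first differing pair.
infiniteUnequal :: Bool
infiniteUnequal = ([1 ..] :: [Integer]) == [2 ..]

-- Only the outermost constructor of the second list is inspected.
-- Its head and tail are undefined and are never evaluated.
emptyVsCons :: Bool
emptyVsCons = ([] :: [Integer]) == (undefined : undefined)

main :: IO ()
main = do
  print infiniteUnequal
  print emptyVsCons
```

Both results are False, and neither evaluation touches more of the lists than is needed to decide the answer.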

Related

Unresolved top level overloading

Task is to find all two-valued numbers representable as the sum of the sqrt's of two natural numbers.
I try this:
func = [sqrt (x) + sqrt (y) | x <- [10..99], y <- [10..99], sqrt (x) `mod` 1 == 0, sqrt (y) `mod` 1 == 0]
Result:
Unresolved top-level overloading Binding : func
Outstanding context : (Integral b, Floating b)
How can I fix this?
This happens because of a conflict between these two types:
sqrt :: Floating a => a -> a
mod :: Integral a => a -> a -> a
Because you write mod (sqrt x) 1, and sqrt is constrained to return the same type as it takes, the compiler is left trying to find a type for x that simultaneously satisfies the Floating constraint of sqrt and the Integral constraint of mod. There are no types in the base library that satisfy both constraints.
A quick fix is to use mod' :: Real a => a -> a -> a:
import Data.Fixed
func = [sqrt (x) + sqrt (y) | x <- [10..99], y <- [10..99], sqrt (x) `mod'` 1 == 0, sqrt (y) `mod'` 1 == 0]
However, from the error you posted, it looks like you may not be using GHC, and mod' is probably a GHC-ism. In that case you could copy the definition (and the definition of the helper function div') from here.
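Another way to sidestep the conflict, which is not part of the answer above, is to keep x and y integral and convert explicitly with fromIntegral before calling sqrt. The sketch below assumes the inputs are small enough for the Double round trip to be exact (true for [10..99]); isPerfectSquare, exactRoot and rootSumsViaCheck are hypothetical names.

```haskell
-- Hypothetical helper: rounded square root computed via Double.
-- Only trusted for small inputs such as [10..99].
exactRoot :: Integer -> Integer
exactRoot n = round (sqrt (fromIntegral n :: Double))

-- True when n is a perfect square (same small-input assumption).
isPerfectSquare :: Integer -> Bool
isPerfectSquare n = r * r == n
  where
    r = exactRoot n

-- Same shape as the original comprehension, but x and y stay Integer,
-- so no single type has to be both Integral and Floating.
rootSumsViaCheck :: [Integer]
rootSumsViaCheck =
  [ exactRoot x + exactRoot y
  | x <- [10 .. 99]
  , y <- [10 .. 99]
  , isPerfectSquare x
  , isPerfectSquare y
  ]

main :: IO ()
main = print rootSumsViaCheck
```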
But I recommend a more involved fix. The key observation is that if x = sqrt y, then x*x = y, so we can avoid calling sqrt at all. Instead of iterating over numbers and checking if they have a clean sqrt, we can iterate over square roots; their squares will definitely have clean square roots. A straightforward application of this refactoring might look like this:
sqrts = takeWhile (\n -> n*n <= 99)
      . dropWhile (\n -> n*n < 10)
      $ [0..]
func = [x + y | x <- sqrts, y <- sqrts]
Of course, func is a terrible name (it's not even a function!), and sqrts is a constant we could compute ourselves, and is so short we should probably just inline it. So we might then simplify to:
numberSums = [x + y | x <- [4..9], y <- [4..9]]
At this point, I would be wondering whether I really wanted to write this at all, preferring just
numberSums = [8..18]
which, unlike the previous iteration, doesn't have any duplicates. It has lost all of the explanatory power of why this is an interesting constant, though, so you would definitely want a comment.
-- sums of pairs of numbers, each of whose squares lies in the range [10..99]
numberSums = [8..18]
This would be my final version.
Also, although the above definitions were not parameterized by the range to search for perfect squares in, all the proposed refactorings can be applied when that is a parameter; I leave this as a good exercise for the reader to check that they have understood each change.
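For readers who want to compare their attempt, here is one possible parameterized form of the first refactoring. It is a sketch rather than the answerer's own solution: rootsInRange and numberSumsIn are names invented here, and duplicates are removed with sort and nub from Data.List.

```haskell
import Data.List (nub, sort)

-- All non-negative integers whose square lies in [lo..hi].
rootsInRange :: Integer -> Integer -> [Integer]
rootsInRange lo hi =
  takeWhile (\n -> n * n <= hi)
    . dropWhile (\n -> n * n < lo)
    $ [0 ..]

-- Sums of pairs of such roots, sorted and without duplicates.
numberSumsIn :: Integer -> Integer -> [Integer]
numberSumsIn lo hi = nub (sort [x + y | x <- roots, y <- roots])
  where
    roots = rootsInRange lo hi

main :: IO ()
main = do
  print (rootsInRange 10 99)
  print (numberSumsIn 10 99)
```

For the range 10 to 99 this gives the roots [4..9] and the sums [8..18], matching the constants derived by hand above.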

Understanding `runEval` - Not WHNF?

Given:
Prelude> import Control.Parallel.Strategies
Prelude> import Control.Parallel
Prelude> let fact n = if (n <= 0) then 1 else n * fact (n-1) :: Integer
Prelude> let xs = map (runEval . (\x -> return x :: Eval Integer) . fact) [1..100]
Prelude> let ys = map fact [1..100]
Prelude> :sprint xs
xs = _
Prelude> :sprint ys
ys = _
As I understand, xs is in Weak Head Normal Form. Why is that? Didn't the runEval have any effect on bringing the value/computation to Normal Form?
The reason is that let just binds a name with an expression but it doesn't trigger any evaluation of the expression.
To understand better, let me use a more simple example
Main> let x = error "foobar!" in 1
1
As you can see, the error "foobar!", that should throw exception, is just ignored. The reason is that x is not used and thus Haskell doesn't evaluate it. You need something to trigger the evaluation of x
Main> let x = error "foobar!" in x `seq` 1
*** Exception: foobar!
Going back to your example, note that Eval x specifies how to evaluate an x, not when it will be evaluated in your program.
Have a look at this wiki article on Laziness for more.
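The two GHCi interactions above can also be written as a compiled program. This is a minimal sketch that uses only base; unusedBinding and forcedBinding are names chosen for this illustration, and the exception is caught with try so that both cases can be shown in one run.

```haskell
import Control.Exception (ErrorCall, evaluate, try)

-- The binding x is never demanded, so the error is never raised.
unusedBinding :: Int
unusedBinding = let x = error "foobar!" :: Int in 1

-- seq demands x before returning 1, so the error is raised.
forcedBinding :: Int
forcedBinding = let x = error "foobar!" :: Int in x `seq` 1

main :: IO ()
main = do
  print unusedBinding
  r <- try (evaluate forcedBinding) :: IO (Either ErrorCall Int)
  putStrLn (either (const "evaluation was triggered: exception") show r)
```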

Expression Evaluation In Haskell: Fixing the type of a sub-expression causes parent expression to be evaluated to different degrees

I am not able to explain the following behavior:
Prelude> let x = 1 + 2
Prelude> let y = (x,x)
Prelude> :sprint y
y = _
Now when I specify a type for x:
Prelude> let x = 1 + 2 ::Int
Prelude> let y = (x,x)
Prelude> :sprint y
y = (_,_)
Why does the specification of x's type force y to its weak head normal form (WHNF)?
I accidentally discovered this behavior while reading Simon Marlow's Parallel and Concurrent Programming In Haskell.
Here's an informed guess. In your first example,
x :: Num a => a
So
y :: Num a => (a, a)
In GHC core, this y is a function that takes a Num dictionary and gives a pair. If you were to evaluate y, then GHCi would default it for you and apply the Integer dictionary. But from what you've shown, it seems likely that doesn't happen with sprint. Thus you don't yet have a pair; you have a function that produces one.
When you specialize to Int, the dictionary is applied to x, so you get
x :: Int
y :: (Int, Int)
Instead of a function from a dictionary, x is now a thunk. Now no dictionary needs to be applied to evaluate y! y is just the application of the pair constructor to two pointers to the x thunk. Applying a constructor doesn't count as computation, so it's never delayed lazily.
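The dictionary-passing idea can be modelled by hand. The sketch below is not GHC's actual Core output. NumDict is a hypothetical record standing in for the Num dictionary, and it shows why the polymorphic y is a function while the Int version is just a constructor application.

```haskell
-- Hand-written stand-in for a Num dictionary (hypothetical, simplified).
data NumDict a = NumDict
  { plus    :: a -> a -> a
  , fromInt :: Integer -> a
  }

intDict :: NumDict Int
intDict = NumDict { plus = (+), fromInt = fromInteger }

-- Models  x :: Num a => a;  x = 1 + 2
xPoly :: NumDict a -> a
xPoly d = plus d (fromInt d 1) (fromInt d 2)

-- Models  y :: Num a => (a, a).  It is a function, so no pair exists
-- until a dictionary is supplied.
yPoly :: NumDict a -> (a, a)
yPoly d = (xPoly d, xPoly d)

-- The monomorphic versions need no dictionary argument.
xMono :: Int
xMono = 1 + 2

yMono :: (Int, Int)
yMono = (xMono, xMono)

main :: IO ()
main = do
  print (yPoly intDict)
  print yMono
```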

Pattern matching against a type with only one constructor

If I pattern-match against an expression whose type has only one constructor, will that still force the runtime to evaluate the expression to WHNF?
I did an experiment that seems to indicate it doesn't evaluate:
Prelude> data Test = Test Int Int
Prelude> let errorpr () = error "Fail"
Prelude> let makeTest f = let (x,y) = f () in Test x y
Prelude> let x = makeTest errorpr
Prelude> let Test z1 z2 = x
Prelude> :sprint z1
z1 = _
Prelude> :sprint z2
z2 = _
Prelude> :sprint x
x = _
I would have expected to either get an error or :sprint x to yield
x = Test _ _
but it didn't.
Apparently I didn't understand how "let" works. See the answers below
Prelude> let x = makeTest errorpr
Prelude> let Test z1 z2 = x
The last line does not force the evaluation of anything: patterns within let are (implicitly) lazy patterns (aka irrefutable patterns). Try instead
Prelude> let x = makeTest errorpr
Prelude> case x of Test z1 z2 -> "hello!"
Prelude> :sprint x
and you should observe something like Test _ _, since patterns in case are not lazy ones. By comparison,
Prelude> let x = makeTest errorpr
Prelude> case x of ~(Test z1 z2) -> "hello!" -- lazy pattern!
Prelude> :sprint x
should print just _, like when using let.
The above holds for data types. Instead, newtypes do not lift the internal type, but directly use the same representation. That is, newtype value construction and pattern-matching are no-ops at runtime: they are roughly erased by the compiler after type checking.
(As the possibility of having many constructors was mentioned, I presumed that the question referred to a data type. The answer is different if a newtype is used. See the comments.)
Yes, it will. Try for example:
data T = C Int
unT (C n) = 42
main = print $ unT undefined
and you will get an undefined exception, instead of 42.
But of course it depends on the pattern. If you replace the definition of unT with:
unT ~(C n) = 42
using a lazy pattern, the result will be 42. Also, note that a constructor pattern used in a let expression (e.g., let (C n) = something) is also equivalent to a lazy pattern.
Regarding the rest of your question, makeTest generates a Test value whose parameters are not used. If they had been used, you'd get an error of course (e.g., had you used print instead of :sprint). The reason why x does not give you Test _ _ is because the let expression in line 5 corresponds to a lazy pattern (section 3.12 in the Haskell 98 Report). It's as if you had written let ~(Test z1 z2) = x. Therefore, x is never evaluated.
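The three cases discussed in these answers (ordinary constructor pattern, lazy pattern, newtype pattern) can be put side by side. This is a small sketch using only base; the names strictUnT, lazyUnT, newtypeUnN and describe are invented for the example.

```haskell
import Control.Exception (SomeException, evaluate, try)

data T = C Int
newtype N = N Int

-- Ordinary constructor pattern: forces the argument to WHNF.
strictUnT :: T -> Int
strictUnT (C _) = 42

-- Lazy (irrefutable) pattern: the argument is never inspected.
lazyUnT :: T -> Int
lazyUnT ~(C _) = 42

-- Newtype pattern: matching is a no-op at runtime.
newtypeUnN :: N -> Int
newtypeUnN (N _) = 42

-- Report whether evaluating the result raised an exception.
describe :: Int -> IO String
describe v = do
  r <- try (evaluate v) :: IO (Either SomeException Int)
  return (either (const "argument was forced") show r)

main :: IO ()
main = do
  putStrLn =<< describe (strictUnT undefined)
  putStrLn =<< describe (lazyUnT undefined)
  putStrLn =<< describe (newtypeUnN undefined)
```

Only the first call forces undefined; the other two return 42.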

Haskell cartesian product of infinite lists

I want to generate a vectorspace from a basis pair, which looks something like:
genFromPair (e1, e2) = [x*e1 + y*e2 | x <- [0..], y <- [0..]]
When I examine the output though, it seems like I'm getting [0, e2, 2*e2,...] (i.e. x never gets above 0). Which sort of makes sense when I think about how I would write the code to do this list comprehension.
I wrote some code to take expanding "shells" from the origin (first the ints with norm 0, then with norm 1, then norm 2...) but this is kind of annoying and specific to Z^2 - I'd have to rewrite it for Z^3 or Z[i] etc. Is there a cleaner way of doing this?
The data-ordlist package has some functions which are extremely useful for working with sorted infinite lists. One of these is mergeAllBy, which combines an infinite list of infinite lists using some comparison function.
The idea is then to build an infinite list of lists such that y is fixed in each list, while x grows. As long as we can guarantee that each list is sorted, and that the heads of the lists are sorted, according to our ordering, we get a merged sorted list back.
Here's a quick example:
import Data.List.Ordered
import Data.Ord
genFromPair (e1, e2) = mergeAllBy (comparing norm) [[x.*e1 + y.*e2 | x <- [0..]] | y <- [0..]]
-- The rest just defines a simple vector type so we have something to play with
data Vec a = Vec a a
    deriving (Eq, Show)
instance Num a => Num (Vec a) where
    (Vec x1 y1) + (Vec x2 y2) = Vec (x1+x2) (y1+y2)
    -- ...
s .* (Vec x y) = Vec (s*x) (s*y)
norm (Vec x y) = sqrt (x^2 + y^2)
Trying this in GHCi we get the expected result:
*Main> take 5 $ genFromPair (Vec 0 1, Vec 1 0)
[Vec 0.0 0.0,Vec 0.0 1.0,Vec 1.0 0.0,Vec 1.0 1.0,Vec 0.0 2.0]
You could look at your space as a tree. At the root of the tree one picks the first element and in its child you pick the second element.
Here's your tree defined using the ListTree package:
import Control.Monad.ListT
import Data.List.Class
import Data.List.Tree
import Prelude hiding (scanl)
infiniteTree :: ListT [] Integer
infiniteTree = repeatM [0..]
spacesTree :: ListT [] [Integer]
spacesTree = scanl (\xs x -> xs ++ [x]) [] infiniteTree
twoDimSpaceTree = genericTake 3 spacesTree
It's an infinite tree, but we could enumerate over it for example in DFS order:
ghci> take 10 (dfs twoDimSpaceTree)
[[],[0],[0,0],[0,1],[0,2],[0,3],[0,4],[0,5],[0,6],[0,7]]
The order you want, in tree-speak, is a variant of best-first-search for infinite trees, where one assumes that the children of tree nodes are sorted (you can't compare all the node's children as in normal best-first-search because there are infinitely many of those). Luckily, this variant is already implemented:
ghci> take 10 $ bestFirstSearchSortedChildrenOn sum $ genericTake 3 $ spacesTree
[[],[0],[0,0],[0,1],[1],[1,0],[1,1],[0,2],[2],[2,0]]
You can use any norm you like for your expanding shells, instead of sum above.
Using the diagonal snippet from CodeCatalog:
genFromPair (e1, e2) = diagonal [[x*e1 + y*e2 | x <- [0..]] | y <- [0..]]
diagonal :: [[a]] -> [a]
diagonal = concat . stripe
  where
    stripe [] = []
    stripe ([]:xss) = stripe xss
    stripe ((x:xs):xss) = [x] : zipCons xs (stripe xss)

    zipCons [] ys = ys
    zipCons xs [] = map (:[]) xs
    zipCons (x:xs) (y:ys) = (x:y) : zipCons xs ys
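To see the difference from the original comprehension, the sketch below repeats the diagonal definition so that it is self-contained and applies it to plain coordinate pairs instead of vectors. The names naivePairs and diagonalPairs are made up for this comparison.

```haskell
diagonal :: [[a]] -> [a]
diagonal = concat . stripe
  where
    stripe [] = []
    stripe ([]:xss) = stripe xss
    stripe ((x:xs):xss) = [x] : zipCons xs (stripe xss)

    zipCons [] ys = ys
    zipCons xs [] = map (:[]) xs
    zipCons (x:xs) (y:ys) = (x:y) : zipCons xs ys

-- The nested comprehension from the question: x never advances,
-- because the inner list over y is infinite.
naivePairs :: [(Integer, Integer)]
naivePairs = [(x, y) | x <- [0 ..], y <- [0 ..]]

-- One infinite row per y, enumerated along the anti-diagonals.
diagonalPairs :: [(Integer, Integer)]
diagonalPairs = diagonal [[(x, y) | x <- [0 ..]] | y <- [0 ..]]

main :: IO ()
main = do
  print (take 5 naivePairs)    -- [(0,0),(0,1),(0,2),(0,3),(0,4)]
  print (take 6 diagonalPairs) -- [(0,0),(1,0),(0,1),(2,0),(1,1),(0,2)]
```

Every pair appears at a finite position in diagonalPairs, although the order follows x + y rather than a Euclidean norm.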
Piggybacking on hammar's reply: His approach seems fairly easy to extend to higher dimensions:
Prelude> import Data.List.Ordered
Prelude Data.List.Ordered> import Data.Ord
Prelude Data.List.Ordered Data.Ord> let norm (x,y,z) = sqrt (fromIntegral x^2+fromIntegral y^2+fromIntegral z^2)
Prelude Data.List.Ordered Data.Ord> let mergeByNorm = mergeAllBy (comparing norm)
Prelude Data.List.Ordered Data.Ord> let sorted = mergeByNorm (map mergeByNorm [[[(x,y,z)| x <- [0..]] | y <- [0..]] | z <- [0..]])
Prelude Data.List.Ordered Data.Ord> take 20 sorted
[(0,0,0),(1,0,0),(0,1,0),(0,0,1),(1,1,0),(1,0,1),(0,1,1),(1,1,1),(2,0,0),(0,2,0),(0,0,2),(2,1,0),(1,2,0),(2,0,1),(0,2,1),(1,0,2),(0,1,2),(2,1,1),(1,2,1),(1,1,2)]
