Would the ability to detect cyclic lists in Haskell break any properties of the language? - haskell

In Haskell, some lists are cyclic:
ones = 1 : ones
Others are not:
nums = [1..]
And then there are things like this:
more_ones = f 1 where f x = x : f x
This denotes the same value as ones, and certainly that value is a repeating sequence. But whether it's represented in memory as a cyclic data structure is doubtful. (An implementation could do so, but this answer explains that "it's unlikely that this will happen in practice".)
Suppose we take a Haskell implementation and hack into it a built-in function isCycle :: [a] -> Bool that examines the structure of the in-memory representation of the argument. It returns True if the list is physically cyclic and False if the argument is of finite length. Otherwise, it will fail to terminate. (I imagine "hacking it in" because it's impossible to write that function in Haskell.)
Would the existence of this function break any interesting properties of the language?

Would the existence of this function break any interesting properties of the language?
Yes it would. It would break referential transparency (see also the Wikipedia article). A Haskell expression can always be replaced by its value; in other words, its value depends only on the passed arguments and nothing else. If we had
isCycle :: [a] -> Bool
as you propose, expressions using it would no longer satisfy this property. They could depend on the internal memory representation of values. As a consequence, other laws would be violated. For example, the identity law for Functor
fmap id === id
would not hold any more: you'd be able to distinguish between ones and fmap id ones, as the latter would be acyclic. And compiler optimizations such as applying the above law would no longer preserve program properties.
A different question, however, would be having a function
isCycleIO :: [a] -> IO Bool
as IO actions are allowed to examine and change anything.
A pure solution could be to have a data type that internally distinguishes the two:
import qualified Data.Foldable as F

data SmartList a = Cyclic [a] | Acyclic [a]

instance Functor SmartList where
    fmap f (Cyclic xs)  = Cyclic  (map f xs)
    fmap f (Acyclic xs) = Acyclic (map f xs)

instance F.Foldable SmartList where
    foldr f z (Acyclic xs) = F.foldr f z xs
    foldr f _ (Cyclic xs)  = let r = F.foldr f r xs in r
Of course it wouldn't be able to recognize if a generic list is cyclic or not, but for many operations it'd be possible to preserve the knowledge of having Cyclic values.
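For instance, the Foldable instance above ties the knot for Cyclic values, so folding one behaves like folding the corresponding infinite list. A couple of illustrative GHCi runs (assuming the definitions above are loaded):
ghci> take 5 (F.foldr (:) [] (Cyclic [1, 2]))
[1,2,1,2,1]
ghci> F.foldr (+) 0 (Acyclic [1, 2, 3])
6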

In the general case, no, you can't identify a cyclic list. However, if the list is being generated by an unfold operation, then you can. Data.List contains this:
unfoldr :: (b -> Maybe (a, b)) -> b -> [a]
The first argument is a function that takes a "state" argument of type "b" and may return an element of the list and a new state. The second argument is the initial state. "Nothing" means the list ends.
If the state ever recurs, then the list will repeat from the point where that state previously occurred. So if we instead use a different unfold function that returns a list of (a, b) pairs, we can inspect the state corresponding to each element. If the same state is seen twice then the list is cyclic. Of course, this assumes that the state is an instance of Eq or something similar.
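Here is a minimal sketch of that idea; the names unfoldrWithState and isCyclicUnfold are made up for this illustration, and it requires Ord rather than plain Eq so the previously seen states can be kept in a Set:
import qualified Data.Set as Set

-- Like unfoldr, but pairs each element with the state that produced it.
unfoldrWithState :: (b -> Maybe (a, b)) -> b -> [(a, b)]
unfoldrWithState f = go
  where
    go s = case f s of
             Nothing      -> []
             Just (x, s') -> (x, s) : go s'

-- True if a state ever repeats (so the list cycles from that point on).
-- Terminates only if the unfold is finite or eventually revisits a state.
isCyclicUnfold :: Ord b => (b -> Maybe (a, b)) -> b -> Bool
isCyclicUnfold f = go Set.empty . unfoldrWithState f
  where
    go _ [] = False
    go seen ((_, s) : rest)
      | s `Set.member` seen = True
      | otherwise           = go (Set.insert s seen) rest
For example, isCyclicUnfold (\n -> Just (n, (n + 1) `mod` 3)) (0 :: Int) returns True, while an unfold whose step function eventually returns Nothing yields False.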

Related

Generate injective functions with QuickCheck?

I'm using QuickCheck to generate arbitrary functions, and I'd like to generate arbitrary injective functions (i.e. f a == f b if and only if a == b).
I thought I had it figured out:
newtype Injective = Injective (Fun Word Char) deriving Show

instance Arbitrary Injective where
  arbitrary = fmap Injective fun
    where
      fun :: Gen (Fun Word Char)
      fun = do
        a <- arbitrary
        b <- arbitrary
        arbitrary `suchThat` \(Fn f) ->
          (f a /= f b) || (a == b)
But I'm seeing cases where the generated function maps distinct inputs to the same output.
What I want:
f such that for all inputs a and b, either f a does not equal f b or a equals b.
What I think I have:
f such that there exist inputs a and b where either f a does not equal f b or a equals b.
How can I fix this?
You've correctly identified the problem: what you're generating is functions with the property ∃ a≠b. f a≠f b (which is readily true for most random functions anyway), whereas what you want is ∀ a≠b. f a≠f b. That is a much more difficult property to ensure, because you need to know about all the other function values for generating each individual one.
I don't think this is possible to ensure for general input types; however, for Word specifically, what you can do is “fake” a function by precomputing all the output values sequentially, making sure that you don't repeat one that has already been used, and then just reading off from that predetermined chart. It requires a bit of laziness fu to actually get this working:
import qualified Data.Set as Set

newtype Injective = Injective ([Char] {- simply a list without duplicates -})
  deriving Show

instance Arbitrary Injective where
  arbitrary = Injective . lazyNub <$> arbitrary

lazyNub :: Ord a => [a] -> [a]
lazyNub = go Set.empty
  where go _ [] = []
        go forbidden (x:xs)
          | x `Set.member` forbidden = go forbidden xs
          | otherwise                = x : go (Set.insert x forbidden) xs
This is not very efficient, and may well not be ok for your application, but it's probably the best you can do.
In practice, to actually use Injective as a function, you'll want to wrap the values in a suitable structure that has only O(log n) lookup time. Unfortunately, Data.Map.Lazy is not lazy enough; you may need to hand-roll something like a list of exponentially growing maps.
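For illustration only, the most naive way to read the encoded function back off the list is positional indexing (the name applyInjective is made up here; it's O(n) per call and partial if the index runs past the end of the finite generated list, so it's a sketch of the idea rather than a recommended implementation):
applyInjective :: Injective -> Word -> Char
applyInjective (Injective cs) i = cs !! fromIntegral i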
There's also the concern that for some insufficiently big result types, it is just not possible to generate injective functions because there aren't enough values available. In fact as Joseph remarked, this is the case here. The lazyNub function will go into an infinite loop in this case. I'd say for a QuickCheck this is probably ok though.

Variable scope in a higher-order lambda function

In working through a solution to the 8 Queens problem, a person used the following line of code:
sameDiag try qs = any (\(colDist,q) -> abs (try - q) == colDist) $ zip [1..] qs
try is an item; qs is a list of the same items.
Can someone explain how colDist and q in the lambda function get bound to anything?
How did try and q used in the body of lambda function find their way into the same scope?
To the degree this is a Haskell idiom, what problem does this design approach help solve?
The function any is a higher-order function that takes 2 arguments:
the 1st argument is of type a -> Bool, i.e. a function from a to Bool
the 2nd argument is of type [a], i.e. a list of items of type a;
i.e. the 1st argument is a function that takes any element of the list passed as the 2nd argument and returns a Bool based on that element. (Strictly speaking it can take any value of type a, not just the ones in that list, but any will obviously only ever invoke it on values from the list.)
You can then simplify thinking about the original snippet by doing a slight refactoring:
sameDiag :: Int -> [Int] -> Bool
sameDiag try qs = any f xs
  where
    xs = zip [1..] qs
    f = (\(colDist, q) -> abs (try - q) == colDist)
which can be transformed into
sameDiag :: Int -> [Int] -> Bool
sameDiag try qs = any f xs
  where
    xs = zip [1..] qs
    f (colDist, q) = abs (try - q) == colDist
which in turn can be transformed into
sameDiag :: Int -> [Int] -> Bool
sameDiag try qs = any f xs
  where
    xs = zip [1..] qs
    f pair = abs (try - q) == colDist  where (colDist, q) = pair
(Note that sameDiag could also have a more general type Integral a => a -> [a] -> Bool rather than the current monomorphic one)
So how does the pair in f pair = ... get bound to a value? Simple: f is just a function, and whoever calls it must pass along a value for the pair argument. When any is called with f as its first argument, it is any itself that calls f, passing individual elements of the list xs as the value of the argument pair.
And since xs is a list of pairs, it's fine to pass an individual pair from this list to f, because f expects exactly that.
EDIT: a further explanation of any to address the asker's comment:
Is this a fair synthesis? This approach to designing a higher-order function allows the invoking code to change how f behaves AND invoke the higher-order function with a list that requires additional processing prior to being used to invoke f for every element in the list. Encapsulating the list processing (in this case with zip) seems the right thing to do, but is the intent of this additional processing really clear in the original one-liner above?
There's really no additional processing done by any prior to invoking f. There is just very minimalistic bookkeeping in addition to simply iterating through the passed in list xs: invoking f on the elements during the iteration, and immediately breaking the iteration and returning True the first time f returns True for any list element.
Most of the behavior of any is "implicit", though, in that it's taken care of by Haskell's lazy evaluation, the basic language semantics, and the existing functions that any is composed of (at least my version of it below, any'; I haven't looked at the built-in Prelude version of any, but I'm sure it's not much different, just probably more heavily optimised).
In fact, any is so simple that it's almost trivial to re-implement it as a one-liner at a GHCi prompt:
Prelude> let any' f xs = or (map f xs)
let's see now what GHC computes as its type:
Prelude> :t any'
any' :: (a -> Bool) -> [a] -> Bool
— same as the built-in any. So let's give it some trial runs:
Prelude> any' odd [1, 2, 3] -- any odd values in the list?
True
Prelude> any' even [1, 3] -- any even ones?
False
Prelude> let adult = (>=18)
Prelude> any' adult [17, 17, 16, 15, 17, 18]
True
— see how you can sometimes write code that almost looks like English with higher-order functions?
zip :: [a] -> [b] -> [(a,b)] takes two lists and joins them into a list of pairs, dropping any leftover elements of the longer list.
any :: (a -> Bool) -> [a] -> Bool takes a predicate and a list of as, and returns True if the predicate returns True for any of the values in the list.
So colDist and q are the first and second elements of the pairs in the list made by zip [1..] qs, and they are bound when any applies the lambda to each pair.
q is only bound within the body of the lambda function - this is the same as with lambda calculus. Since try was bound before in the function definition, it is still available in this inner scope. If you think of lambda calculus, the term \x.\y.x+y makes sense, despite the x and the y being bound at different times.
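A stripped-down illustration of the same kind of capture (names made up for this example):
addTo :: Int -> Int -> Int
addTo x = \y -> x + y      -- the lambda closes over x, just as the one in sameDiag closes over try

-- map (addTo 10) [1, 2, 3]  ==  [11, 12, 13]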
As for the design approach, this approach is much cleaner than trying to iterate or recurse through the list manually. It seems quite clear in its intentions to me (with respect to the larger codebase it comes from).

How to implement delete with foldr in Haskell

I've been studying folds for the past few days. I can implement simple functions with them, like length, concat and filter. What I'm stuck at is trying to implement with foldr functions like delete, take and find. I have implemented these with explicit recursion but it doesn't seem obvious to me how to convert these types of functions to right folds.
I have studied the tutorials by Graham Hutton and Bernie Pope. Imitating Hutton's dropWhile, I was able to implement delete with foldr but it fails on infinite lists.
From reading Implement insert in haskell with foldr, How can this function be written using foldr? and Implementing take using foldr, it would seem that I need to use foldr to generate a function which then does something. But I don't really understand these solutions and don't have an idea how to implement for example delete this way.
Could you explain to me a general strategy for implementing lazy versions of functions like the ones I mentioned with foldr? Maybe you could also implement delete as an example, since it's probably one of the easiest.
I'm looking for a detailed explanation that a beginner can understand. I'm not interested in just solutions, I want to develop an understanding so I can come up with solutions to similar problems myself.
Thanks.
Edit: At the moment of writing there is one useful answer but it's not quite what I was looking for. I'm more interested in an approach that uses foldr to generate a function, which then does something. The links in my question have examples of this. I don't quite understand those solutions so I would like to have more information on this approach.
delete is a modal search. It has two different modes of operation - whether it's already found the result or not. You can use foldr to construct a function that passes the state down the line as each element is checked. So in the case of delete, the state can be a simple Bool. It's not exactly the best type, but it will do.
Once you have identified the state type, you can start working on the foldr construction. I'm going to walk through figuring it out the way I did. I'll be enabling ScopedTypeVariables just so I can annotate the types of subexpressions better. Once you know the state type, you know you want foldr to generate a function taking a value of that type and returning a value of the desired final type. That's enough to start sketching things.
{-# LANGUAGE ScopedTypeVariables #-}

delete :: forall a. Eq a => a -> [a] -> [a]
delete a xs = foldr f undefined xs undefined
  where
    f :: a -> (Bool -> [a]) -> (Bool -> [a])
    f x g = undefined
It's a start. The exact meaning of g is a little bit tricky here. It's actually the function for processing the rest of the list. It's accurate to look at it as a continuation, in fact. It absolutely represents performing the rest of the folding, with whatever state you choose to pass along. Given that, it's time to figure out what to put in some of those undefined places.
{-# LANGUAGE ScopedTypeVariables #-}

delete :: forall a. Eq a => a -> [a] -> [a]
delete a xs = foldr f undefined xs undefined
  where
    f :: a -> (Bool -> [a]) -> (Bool -> [a])
    f x g found | x == a && not found = g True
                | otherwise           = x : g found
That seems relatively straightforward. If the current element is the one being searched for, and it hasn't yet been found, don't output it, and continue with the state set to True, indicating it's been found. otherwise, output the current value and continue with the current state. This just leaves the rest of the arguments to foldr. The last one is the initial state. The other one is the state function for an empty list. Ok, those aren't too bad either.
{-# LANGUAGE ScopedTypeVariables #-}

delete :: forall a. Eq a => a -> [a] -> [a]
delete a xs = foldr f (const []) xs False
  where
    f :: a -> (Bool -> [a]) -> (Bool -> [a])
    f x g found | x == a && not found = g True
                | otherwise           = x : g found
No matter what the state is, produce an empty list when an empty list is encountered. And the initial state is that the element being searched for has not yet been found.
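A quick sanity check in GHCi (assuming the definition above is loaded). Because the generated state function streams, it also copes with infinite lists, which was the sticking point in the question:
ghci> delete 2 [1, 2, 3, 2]
[1,3,2]
ghci> take 5 (delete 3 [1 ..])
[1,2,4,5,6]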
This technique is also applicable in other cases. For instance, foldl can be written as a foldr this way. If you look at foldl as a function that repeatedly transforms an initial accumulator, you can guess that's the function being produced - how to transform the initial value.
{-# LANGUAGE ScopedTypeVariables #-}

foldl :: forall a b. (a -> b -> a) -> a -> [b] -> a
foldl f z xs = foldr g id xs z
  where
    g :: b -> (a -> a) -> (a -> a)
    g x cont acc = undefined
The base cases aren't too tricky to find when the problem is defined as manipulating the initial accumulator, named z there. The empty list is the identity transformation, id, and the value passed to the created function is z.
The implementation of g is trickier. It can't just be done blindly on types, because there are two different implementations that use all the expected values and type-check. This is a case where types aren't enough, and you need to consider the meanings of the functions available.
Let's start with an inventory of the values that seem like they should be used, and their types. The things that seem like they must need to be used in the body of g are f :: a -> b -> a, x :: b, cont :: (a -> a), and acc :: a. f will obviously take x as its second argument, but there's a question of the appropriate place to use cont. To figure out where it goes, remember that it represents the transformation function returned by processing the rest of the list, and that foldl processes the current element and then passes the result of that processing to the rest of the list.
{-# LANGUAGE ScopedTypeVariables #-}

foldl :: forall a b. (a -> b -> a) -> a -> [b] -> a
foldl f z xs = foldr g id xs z
  where
    g :: b -> (a -> a) -> (a -> a)
    g x cont acc = cont $ f acc x
This also suggests that foldl' can be written this way with only one tiny change:
{-# LANGUAGE ScopedTypeVariables #-}

foldl' :: forall a b. (a -> b -> a) -> a -> [b] -> a
foldl' f z xs = foldr g id xs z
  where
    g :: b -> (a -> a) -> (a -> a)
    g x cont acc = cont $! f acc x
The difference is that ($!) is used to suggest evaluation of f acc x before it's passed to cont. (I say "suggest" because there are some edge cases where ($!) doesn't force evaluation even as far as WHNF.)
delete doesn't operate on the entire list evenly. The structure of the computation isn't just considering the whole list one element at a time. It differs after it hits the element it's looking for. This tells you it can't be implemented as just a foldr. There will have to be some sort of post-processing involved.
When that happens, the general pattern is that you build a pair of values and just take one of them at completion of the foldr. That's probably what you did when you imitated Hutton's dropWhile, though I'm not sure since you didn't include code. Something like this?
delete :: Eq a => a -> [a] -> [a]
delete a = snd . foldr (\x (xs1, xs2) -> if x == a then (x:xs1, xs1) else (x:xs1, x:xs2)) ([], [])
The main idea is that xs1 is always going to be the full tail of the list, and xs2 is the result of the delete over the tail of the list. Since you only want to remove the first element that matches, you don't want to use the result of delete over the tail when you do match the value you're searching for, you just want to return the rest of the list unchanged - which fortunately is what's always going to be in xs1.
And yeah, that doesn't work on infinite lists - but only for one very specific reason. The lambda is too strict. foldr only works on infinite lists when the function it is provided doesn't always force evaluation of its second argument, and that lambda does always force evaluation of its second argument in the pattern match on the pair. Switching to an irrefutable pattern match fixes that, by allowing the lambda to produce a constructor before ever examining its second argument.
delete :: Eq a => a -> [a] -> [a]
delete a = snd . foldr (\x ~(xs1, xs2) -> if x == a then (x:xs1, xs1) else (x:xs1, x:xs2)) ([], [])
That's not the only way to get that result. Using a let-binding or fst and snd as accessors on the tuple would also do the job. But it is the change with the smallest diff.
The most important takeaway here is to be very careful with handling the second argument to the reducing function you pass to foldr. You want to defer examining the second argument whenever possible, so that the foldr can stream lazily in as many cases as possible.
If you look at that lambda, you see that the branch taken is chosen before doing anything with the second argument to the reducing function. Furthermore, you'll see that most of the time, the reducing function produces a list constructor in both halves of the result tuple before it ever needs to evaluate the second argument. Since those list constructors are what make it out of delete, they are what matter for streaming - so long as you don't let the pair get in the way. And making the pattern-match on the pair irrefutable is what keeps it out of the way.
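To see the difference concretely: with the strict lambda from the first version, take 1 (delete 3 [1 ..]) never finishes, while the irrefutable version streams (GHCi):
ghci> take 1 (delete 3 [1 ..])
[1]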
As a bonus example of the streaming properties of foldr, consider my favorite example:
dropWhileEnd :: (a -> Bool) -> [a] -> [a]
dropWhileEnd p = foldr (\x xs -> if p x && null xs then [] else x:xs) []
It streams - as much as it can. If you figure out exactly when and why it does and doesn't stream, you'll understand pretty much every detail of the streaming structure of foldr.
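A couple of runs that show both behaviours (GHCi, with Data.Char.isSpace in scope):
ghci> dropWhileEnd isSpace "hello   "
"hello"
ghci> take 5 (dropWhileEnd (< 0) [1 ..])
[1,2,3,4,5]
The second call streams because p x is False for every element, so the lambda never has to ask whether the rest of the output is null; as soon as p x is True it must look ahead, and that is exactly where streaming stalls.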
Here is a simpler delete implemented with foldr. Note that, unlike Data.List.delete (and the versions above), it removes every occurrence of the element, not just the first one:
delete :: (Eq a) => a -> [a] -> [a]
delete a xs = foldr (\x acc -> if x == a then acc else x : acc) [] xs
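A quick GHCi check of that behavioral difference (the lazy first-occurrence versions above give [1,3,2] here instead):
ghci> delete 2 [1, 2, 3, 2]
[1,3]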

Type-safe difference lists

A common idiom in Haskell, difference lists, is to represent a list xs as the value (xs ++). Then (.) becomes "(++)" and id becomes "[]" (in fact this works for any monoid or category). Since we can compose functions in constant time, this gives us a nice way to efficiently build up lists by appending.
Unfortunately the type [a] -> [a] is way bigger than the type of functions of the form (xs ++) -- most functions on lists do something other than prepend to their argument.
One approach around this (as used in dlist) is to make a special DList type with a smart constructor. Another approach (as used in ShowS) is to not enforce the constraint anywhere and hope for the best. But is there a nice way of keeping all the desired properties of difference lists while using a type that's exactly the right size?
Yes!
We can view [a] as the free monad Free ((,) a) ().
Thus we can apply the scheme described by Edward Kmett in Free Monads for Less.
The type we'll get is
newtype F a = F { runF :: forall r. (() -> r) -> ((a, r) -> r) -> r }
or simply
newtype F a = F { runF :: forall r. r -> (a -> r -> r) -> r }
So runF is nothing other than the foldr function for our list!
This is called the Boehm-Berarducci encoding and it's isomorphic to the original data type (list) — so this is as small as you can possibly get.
Will Ness says:
So this type is still too "wide", it allows more than just prefixing - doesn't constrain the g function argument.
If I understood his argument correctly, he points out that you can apply the foldr (or runF) function to something different from [] and (:).
But I never claimed that you can use foldr-encoding only for concatenation. Indeed, as this name implies, you can use it to calculate any fold — and that's what Will Ness demonstrated.
It may become more clear if you forget for a moment that there's one true list type, [a]. There may be lots of list types — e.g. I can define one by
data List a = Nil | Cons a (List a)
It'd be different from, but isomorphic to, [a].
The foldr-encoding above is just yet another encoding of lists, like List a or [a]. It is also isomorphic to [a], as evidenced by the functions \l -> F (\nil cons -> foldr cons nil l) and \x -> runF x [] (:), and the fact that their compositions (in either order) are the identity. But you are not obliged to convert to [a] — you can convert to List directly, using \x -> runF x Nil Cons.
The important point is that F a doesn't contain an element that is not the foldr function of some list — nor does it contain an element that is the foldr function of more than one list (obviously).
Thus, it doesn't contain too few or too many elements — it contains precisely as many elements as needed to exactly represent all lists.
This is not true of the difference list encoding — for example, the reverse function is not an append operation for any list.
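To make that concrete, here is a minimal sketch of the usual difference-list operations on the foldr-encoding (the function names are mine, the newtype is repeated so the snippet stands alone, and it needs RankNTypes):
{-# LANGUAGE RankNTypes #-}

newtype F a = F { runF :: forall r. r -> (a -> r -> r) -> r }

fromList :: [a] -> F a
fromList xs = F (\nil cons -> foldr cons nil xs)

toList :: F a -> [a]
toList l = runF l [] (:)

-- Building the appended value is cheap; the real work happens only when the
-- result is eventually folded (e.g. converted back to an ordinary list).
append :: F a -> F a -> F a
append l1 l2 = F (\nil cons -> runF l1 (runF l2 nil cons) cons)
For example, toList (append (fromList [1, 2]) (fromList [3])) is [1,2,3].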

Evaluation strategy

How should one reason about function evaluation in examples like the following in Haskell:
let f x = ...
    x   = ...
in map (g (f x)) xs
In GHC, sometimes (f x) is evaluated only once, and sometimes once for each element in xs, depending on what exactly f and g are. This can be important when f x is an expensive computation. It just tripped up a Haskell beginner I was helping, and I didn't know what to tell him other than that it is up to the compiler. Is there a better story?
Update
In the following example (f x) will be evaluated 4 times:
let f x = trace "!" $ zip x x
    x   = "abc"
in map (\i -> lookup i (f x)) "abcd"
With language extensions, we can create situations where f x must be evaluated repeatedly:
{-# LANGUAGE GADTs, Rank2Types #-}
module MultiEvG where

data BI where
    B :: (Bounded b, Integral b) => b -> BI

foo :: [BI] -> [Integer]
foo xs = let f :: (Integral c, Bounded c) => c -> c
             f x = maxBound - x
             g :: (forall a. (Integral a, Bounded a) => a) -> BI -> Integer
             g m (B y) = toInteger (m + y)
             x :: (Integral i) => i
             x = 3
         in map (g (f x)) xs
The crux is to have f x polymorphic even as the argument of g, and we must create a situation where the type(s) at which it is needed can't be predicted (my first stab used an Either a b instead of BI, but when optimising, that of course led to only two evaluations of f x at most).
A polymorphic expression must be evaluated at least once for each type it is used at. That's one reason for the monomorphism restriction. However, when the range of types it can be needed at is restricted, it is possible to memoise the values at each type, and in some circumstances GHC does that (needs optimising, and I expect the number of types involved mustn't be too large). Here we confront it with what is basically an inhomogeneous list, so in each invocation of g (f x), it can be needed at an arbitrary type satisfying the constraints, so the computation cannot be lifted outside the map (technically, the compiler could still build a cache of the values at each used type, so it would be evaluated only once per type, but GHC doesn't, in all likelihood it wouldn't be worth the trouble).
Monomorphic expressions need only be evaluated once, they can be shared. Whether they are is up to the implementation; by purity, it doesn't change the semantics of the programme. If the expression is bound to a name, in practice you can rely on it being shared, since it's easy and obviously what the programmer wants. If it isn't bound to a name, it's a question of optimisation. With the bytecode generator or without optimisations, the expression will often be evaluated repeatedly, but with optimisations repeated evaluation would indicate a compiler bug.
Polymorphic expressions must be evaluated at least once for every type they're used at, but with optimisations, when GHC can see that it may be used multiple times at the same type, it will (usually) still be shared for that type during a larger computation.
Bottom line: Always compile with optimisations, help the compiler by binding expressions you want shared to a name, and give monomorphic type signatures where possible.
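For example, applying that advice to the snippet from the question's update: naming the result of f x (the binding name fx is mine) makes the sharing explicit, and the trace should fire only once instead of four times:
import Debug.Trace (trace)

shared :: [Maybe Char]
shared =
  let f x = trace "!" $ zip x x
      x   = "abc"
      fx  = f x                  -- bound to a name, so it is computed at most once
  in  map (\i -> lookup i fx) "abcd"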
Your examples are indeed quite different.
In the first example, the argument to map is g (f x), and it is passed once to map, most likely as a partially applied function.
If g (f x), when applied to an argument within map, evaluates its first argument, then this will be done only once, and the thunk (f x) will be updated with the result.
Hence, in your first example, f x will be evaluated at most once.
Your second example requires a deeper analysis before the compiler can arrive at the conclusion that (f x) is always constant in the lambda expression. Perhaps it will never optimize it at all, because it may have knowledge that trace is not quite kosher. So, this may evaluate 4 times when tracing, and 4 times or 1 time when not tracing.
This is really dependent on GHC's optimizations, as you've been able to tell.
The best thing to do is to study the GHC Core that you get after optimizing the program. I would look at the generated Core and examine whether f x has its own let binding outside the map or not.
If you want to be sure, then you should factor f x out into its own variable assigned in a let, but there's not really a guaranteed way to figure it out other than reading through Core.
All that said, with the exception of things like trace that use unsafePerformIO, this will never change the semantics of your program: how it actually behaves.
In GHC without optimizations, the body of a function is evaluated every time the function is called. (A "call" means the function is applied to arguments and the result is evaluated.) In the following example, f x is inside a function, so it will execute each time the function is called.
(GHC may optimize this expression as discussed in the FAQ [1].)
let f x = trace "!" $ zip x x
    x   = "abc"
in map (\i -> lookup i (f x)) "abcd"
However, if we move f x out of the function, it will execute only once.
let f x = trace "!" $ zip x x
    x   = "abc"
in map ((\f_x i -> lookup i f_x) (f x)) "abcd"
This can be rewritten more readably as
let f x = trace "!" $ zip x x
    x   = "abc"
    g f_x i = lookup i f_x
in map (g (f x)) "abcd"
The general rule is that, each time a function is applied to an argument, a new "copy" of the function body is created. Function application is the only thing that may cause an expression to re-execute. However, be warned that some functions and function calls do not look like functions syntactically.
[1] http://www.haskell.org/haskellwiki/GHC/FAQ#Subexpression_Elimination
