Generate injective functions with QuickCheck? - haskell

I'm using QuickCheck to generate arbitrary functions, and I'd like to generate arbitrary injective functions (i.e. f a == f b if and only if a == b).
I thought I had it figured out:
newtype Injective = Injective (Fun Word Char) deriving Show
instance Arbitrary Injective where
arbitrary = fmap Injective fun
where
fun :: Gen (Fun Word Char)
fun = do
a <- arbitrary
b <- arbitrary
arbitrary `suchThat` \(Fn f) ->
(f a /= f b) || (a == b)
But I'm seeing cases where the generated function maps distinct inputs to the same output.
What I want:
f such that for all inputs a and b, either f a does not equal f b or a equals b.
What I think I have:
f such that there exist inputs a and b where either f a does not equal f b or a equals b.
How can I fix this?

You've correctly identified the problem: what you're generating is functions with the property ∃ a≠b. f a≠f b (which is readily true for most random functions anyway), whereas what you want is ∀ a≠b. f a≠f b. That is a much more difficult property to ensure, because you need to know about all the other function values for generating each individual one.
I don't think this is possible to ensure for general input types, however for word specifically what you can do is “fake” a function by precomputing all the output values sequentially, making sure that you don't repeat one that has already been done, and then just reading off from that predetermined chart. It requires a bit of laziness fu to actually get this working:
import qualified Data.Set as Set
newtype Injective = Injective ([Char] {- simply a list without duplicates -})
deriving Show
instance Arbitrary Injective where
arbitrary = Injective . lazyNub <$> arbitrary
lazyNub :: Ord a => [a] -> [a]
lazyNub = go Set.empty
where go _ [] = []
go forbidden (x:xs)
| x `Set.member` forbidden = go forbidden xs
| otherwise = x : go (Set.insert x forbidden) xs
This is not very efficient, and may well not be ok for your application, but it's probably the best you can do.
In practice, to actually use Injective as a function, you'll want to wrap the values in a suitable structure that has only O (log n) lookup time. Unfortunately, Data.Map.Lazy is not lazy enough, you may need to hand-bake something like a list of exponentially-growing maps.
There's also the concern that for some insufficiently big result types, it is just not possible to generate injective functions because there aren't enough values available. In fact as Joseph remarked, this is the case here. The lazyNub function will go into an infinite loop in this case. I'd say for a QuickCheck this is probably ok though.

Related

Haskell's (<-) in Terms of the Natural Transformations of Monad

So I'm playing around with the hasbolt module in GHCi and I had a curiosity about some desugaring. I've been connecting to a Neo4j database by creating a pipe as follows
ghci> pipe <- connect $ def {credentials}
and that works just fine. However, I'm wondering what the type of the (<-) operator is (GHCi won't tell me). Most desugaring explanations describe that
do x <- a
return x
desugars to
a >>= (\x -> return x)
but what about just the line x <- a?
It doesn't help me to add in the return because I want pipe :: Pipe not pipe :: Control.Monad.IO.Class.MonadIO m => m Pipe, but (>>=) :: Monad m => m a -> (a -> m b) -> m b so trying to desugar using bind and return/pure doesn't work without it.
Ideally it seems like it'd be best to just make a Comonad instance to enable using extract :: Monad m => m a -> a as pipe = extract $ connect $ def {creds} but it bugs me that I don't understand (<-).
Another oddity is that, treating (<-) as haskell function, it's first argument is an out-of-scope variable, but that wouldn't mean that
(<-) :: a -> m b -> b
because not just anything can be used as a free variable. For instance, you couldn't bind the pipe to a Num type or a Bool. The variable has to be a "String"ish thing, except it never is actually a String; and you definitely can't try actually binding to a String. So it seems as if it isn't a haskell function in the usual sense (unless there is a class of functions that take values from the free variable namespace... unlikely). So what is (<-) exactly? Can it be replaced entirely by using extract? Is that the best way to desugar/circumvent it?
I'm wondering what the type of the (<-) operator is ...
<- doesn't have a type, it's part of the syntax of do notation, which as you know is converted to sequences of >>= and return during a process called desugaring.
but what about just the line x <- a ...?
That's a syntax error in normal haskell code and the compiler would complain. The reason the line:
ghci> pipe <- connect $ def {credentials}
works in ghci is that the repl is a sort of do block; you can think of each entry as a line in your main function (it's a bit more hairy than that, but that's a good approximation). That's why you need (until recently) to say let foo = bar in ghci to declare a binding as well.
Ideally it seems like it'd be best to just make a Comonad instance to enable using extract :: Monad m => m a -> a as pipe = extract $ connect $ def {creds} but it bugs me that I don't understand (<-).
Comonad has nothing to do with Monads. In fact, most Monads don't have any valid Comonad instance. Consider the [] Monad:
instance Monad [a] where
return x = [x]
xs >>= f = concat (map f xs)
If we try to write a Comonad instance, we can't define extract :: m a -> a
instance Comonad [a] where
extract (x:_) = x
extract [] = ???
This tells us something interesting about Monads, namely that we can't write a general function with the type Monad m => m a -> a. In other words, we can't "extract" a value from a Monad without additional knowledge about it.
So how does the do-notation syntax do {x <- [1,2,3]; return [x,x]} work?
Since <- is actually just syntax sugar, just like how [1,2,3] actually means 1 : 2 : 3 : [], the above expression actually means [1,2,3] >>= (\x -> return [x,x]), which in turn evaluates to concat (map (\x -> [[x,x]]) [1,2,3])), which comes out to [1,1,2,2,3,3].
Notice how the arrow transformed into a >>= and a lambda. This uses only built-in (in the typeclass) Monad functions, so it works for any Monad in general.
We can pretend to extract a value by using (>>=) :: Monad m => m a -> (a -> m b) -> m b and working with the "extracted" a inside the function we provide, like in the lambda in the list example above. However, it is impossible to actually get a value out of a Monad in a generic way, which is why the return type of >>= is m b (in the Monad)
So what is (<-) exactly? Can it be replaced entirely by using extract? Is that the best way to desugar/circumvent it?
Note that the do-block <- and extract mean very different things even for types that have both Monad and Comonad instances. For instance, consider non-empty lists. They have instances of both Monad (which is very much like the usual one for lists) and Comonad (with extend/=>> applying a function to all suffixes of the list). If we write a do-block such as...
import qualified Data.List.NonEmpty as N
import Data.List.NonEmpty (NonEmpty(..))
import Data.Function ((&))
alternating :: NonEmpty Integer
alternating = do
x <- N.fromList [1..6]
-x :| [x]
... the x in x <- N.fromList [1..6] stands for the elements of the non-empty list; however, this x must be used to build a new list (or, more generally, to set up a new monadic computation). That, as others have explained, reflects how do-notation is desugared. It becomes easier to see if we make the desugared code look like the original one:
alternating :: NonEmpty Integer
alternating =
N.fromList [1..6] >>= \x ->
-x :| [x]
GHCi> alternating
-1 :| [1,-2,2,-3,3,-4,4,-5,5,-6,6]
The lines below x <- N.fromList [1..6] in the do-block amount to the body of a lambda. x <- in isolation is therefore akin to a lambda without body, which is not a meaningful thing.
Another important thing to note is that x in the do-block above does not correspond to any one single Integer, but rather to all Integers in the list. That already gives away that <- does not correspond to an extraction function. (With other monads, the x might even correspond to no values at all, as in x <- Nothing or x <- []. See also Lazersmoke's answer.)
On the other hand, extract does extract a single value, with no ifs or buts...
GHCi> extract (N.fromList [1..6])
1
... however, it is really a single value: the tail of the list is discarded. If we want to use the suffixes of the list, we need extend/(=>>)...
GHCi> N.fromList [1..6] =>> product =>> sum
1956 :| [1236,516,156,36,6]
If we had a co-do-notation for comonads (cf. this package and the links therein), the example above might get rewritten as something in the vein of:
-- codo introduces a function: x & f = f x
N.fromList [1..6] & codo xs -> do
ys <- product xs
sum ys
The statements would correspond to plain values; the bound variables (xs and ys), to comonadic values (in this case, to list suffixes). That is exactly the opposite of what we have with monadic do-blocks. All in all, as far as your question is concerned, switching to comonads just swaps which things we can't refer to outside of the context of a computation.

Would the ability to detect cyclic lists in Haskell break any properties of the language?

In Haskell, some lists are cyclic:
ones = 1 : ones
Others are not:
nums = [1..]
And then there are things like this:
more_ones = f 1 where f x = x : f x
This denotes the same value as ones, and certainly that value is a repeating sequence. But whether it's represented in memory as a cyclic data structure is doubtful. (An implementation could do so, but this answer explains that "it's unlikely that this will happen in practice".)
Suppose we take a Haskell implementation and hack into it a built-in function isCycle :: [a] -> Bool that examines the structure of the in-memory representation of the argument. It returns True if the list is physically cyclic and False if the argument is of finite length. Otherwise, it will fail to terminate. (I imagine "hacking it in" because it's impossible to write that function in Haskell.)
Would the existence of this function break any interesting properties of the language?
Would the existence of this function break any interesting properties of the language?
Yes it would. It would break referential transparency (see also the Wikipedia article). A Haskell expression can be always replaced by its value. In other words, it depends only on the passed arguments and nothing else. If we had
isCycle :: [a] -> Bool
as you propose, expressions using it would not satisfy this property any more. They could depend on the internal memory representation of values. In consequence, other laws would be violated. For example the identity law for Functor
fmap id === id
would not hold any more: You'd be able to distinguish between ones and fmap id ones, as the latter would be acyclic. And compiler optimizations such as applying the above law would not longer preserve program properties.
However another question would be having function
isCycleIO :: [a] -> IO Bool
as IO actions are allowed to examine and change anything.
A pure solution could be to have a data type that internally distinguishes the two:
import qualified Data.Foldable as F
data SmartList a = Cyclic [a] | Acyclic [a]
instance Functor SmartList where
fmap f (Cyclic xs) = Cyclic (map f xs)
fmap f (Acyclic xs) = Acyclic (map f xs)
instance F.Foldable SmartList where
foldr f z (Acyclic xs) = F.foldr f z xs
foldr f _ (Cyclic xs) = let r = F.foldr f r xs in r
Of course it wouldn't be able to recognize if a generic list is cyclic or not, but for many operations it'd be possible to preserve the knowledge of having Cyclic values.
In the general case, no you can't identify a cyclic list. However if the list is being generated by an unfold operation then you can. Data.List contains this:
unfoldr :: (b -> Maybe (a, b)) -> b -> [a]
The first argument is a function that takes a "state" argument of type "b" and may return an element of the list and a new state. The second argument is the initial state. "Nothing" means the list ends.
If the state ever recurs then the list will repeat from the point of the last state. So if we instead use a different unfold function that returns a list of (a, b) pairs we can inspect the state corresponding to each element. If the same state is seen twice then the list is cyclic. Of course this assumes that the state is an instance of Eq or something.

Type-safe difference lists

A common idiom in Haskell, difference lists, is to represent a list xs as the value (xs ++). Then (.) becomes "(++)" and id becomes "[]" (in fact this works for any monoid or category). Since we can compose functions in constant time, this gives us a nice way to efficiently build up lists by appending.
Unfortunately the type [a] -> [a] is way bigger than the type of functions of the form (xs ++) -- most functions on lists do something other than prepend to their argument.
One approach around this (as used in dlist) is to make a special DList type with a smart constructor. Another approach (as used in ShowS) is to not enforce the constraint anywhere and hope for the best. But is there a nice way of keeping all the desired properties of difference lists while using a type that's exactly the right size?
Yes!
We can view [a] as a free monad instance Free ((,) a) ().
Thus we can apply the scheme described by Edward Kmett in Free Monads for Less.
The type we'll get is
newtype F a = F { runF :: forall r. (() -> r) -> ((a, r) -> r) -> r }
or simply
newtype F a = F { runF :: forall r. r -> (a -> r -> r) -> r }
So runF is nothing else than the foldr function for our list!
This is called the Boehm-Berarducci encoding and it's isomorphic to the original data type (list) — so this is as small as you can possibly get.
Will Ness says:
So this type is still too "wide", it allows more than just prefixing - doesn't constrain the g function argument.
If I understood his argument correctly, he points out that you can apply the foldr (or runF) function to something different from [] and (:).
But I never claimed that you can use foldr-encoding only for concatenation. Indeed, as this name implies, you can use it to calculate any fold — and that's what Will Ness demonstrated.
It may become more clear if you forget for a moment that there's one true list type, [a]. There may be lots of list types — e.g. I can define one by
data List a = Nil | Cons a (List a)
It's be different from, but isomorphic to [a].
The foldr-encoding above is just yet another encoding of lists, like List a or [a]. It is also isomorphic to [a], as evidenced by functions \l -> F (\a f -> foldr a f l) and \x -> runF [] (:) and the fact that their compositions (in either order) is identity. But you are not obliged to convert to [a] — you can convert to List directly, using \x -> runF x Nil Cons.
The important point is that F a doesn't contain an element that is not the foldr functions for some list — nor does it contain an element that is the foldr functions for more than one list (obviously).
Thus, it doesn't contain too few or too many elements — it contains precisely as many elements as needed to exactly represent all lists.
This is not true of the difference list encoding — for example, the reverse function is not an append operation for any list.

Evaluation strategy

How should one reason about function evaluation in examples like the following in Haskell:
let f x = ...
x = ...
in map (g (f x)) xs
In GHC, sometimes (f x) is evaluated only once, and sometimes once for each element in xs, depending on what exactly f and g are. This can be important when f x is an expensive computation. It has just tripped a Haskell beginner I was helping and I didn't know what to tell him other than that it is up to the compiler. Is there a better story?
Update
In the following example (f x) will be evaluated 4 times:
let f x = trace "!" $ zip x x
x = "abc"
in map (\i -> lookup i (f x)) "abcd"
With language extensions, we can create situations where f x must be evaluated repeatedly:
{-# LANGUAGE GADTs, Rank2Types #-}
module MultiEvG where
data BI where
B :: (Bounded b, Integral b) => b -> BI
foo :: [BI] -> [Integer]
foo xs = let f :: (Integral c, Bounded c) => c -> c
f x = maxBound - x
g :: (forall a. (Integral a, Bounded a) => a) -> BI -> Integer
g m (B y) = toInteger (m + y)
x :: (Integral i) => i
x = 3
in map (g (f x)) xs
The crux is to have f x polymorphic even as the argument of g, and we must create a situation where the type(s) at which it is needed can't be predicted (my first stab used an Either a b instead of BI, but when optimising, that of course led to only two evaluations of f x at most).
A polymorphic expression must be evaluated at least once for each type it is used at. That's one reason for the monomorphism restriction. However, when the range of types it can be needed at is restricted, it is possible to memoise the values at each type, and in some circumstances GHC does that (needs optimising, and I expect the number of types involved mustn't be too large). Here we confront it with what is basically an inhomogeneous list, so in each invocation of g (f x), it can be needed at an arbitrary type satisfying the constraints, so the computation cannot be lifted outside the map (technically, the compiler could still build a cache of the values at each used type, so it would be evaluated only once per type, but GHC doesn't, in all likelihood it wouldn't be worth the trouble).
Monomorphic expressions need only be evaluated once, they can be shared. Whether they are is up to the implementation; by purity, it doesn't change the semantics of the programme. If the expression is bound to a name, in practice you can rely on it being shared, since it's easy and obviously what the programmer wants. If it isn't bound to a name, it's a question of optimisation. With the bytecode generator or without optimisations, the expression will often be evaluated repeatedly, but with optimisations repeated evaluation would indicate a compiler bug.
Polymorphic expressions must be evaluated at least once for every type they're used at, but with optimisations, when GHC can see that it may be used multiple times at the same type, it will (usually) still be shared for that type during a larger computation.
Bottom line: Always compile with optimisations, help the compiler by binding expressions you want shared to a name, and give monomorphic type signatures where possible.
Your examples are indeed quite different.
In the first example, the argument to map is g (f x) and is passed once to map most likely as partially applied function.
Should g (f x), when applied to an argument within map evaluate its first argument, then this will be done only once and then the thunk (f x) will be updated with the result.
Hence, in your first example, f xwill be evaluated at most 1 time.
Your second example requires a deeper analysis before the compiler can arrive at the conclusion that (f x) is always constant in the lambda expression. Perhaps it will never optimize it at all, because it may have knowledge that trace is not quite kosher. So, this may evaluate 4 times when tracing, and 4 times or 1 time when not tracing.
This is really dependent on GHC's optimizations, as you've been able to tell.
The best thing to do is to study the GHC core that you get after optimizing the program. I would look at the generated Core and examine whether f x had its own let statement outside the map or not.
If you want to be sure, then you should factor f x out into its own variable assigned in a let, but there's not really a guaranteed way to figure it out other than reading through Core.
All that said, with the exception of things like trace that use unsafePerformIO, this will never change the semantics of your program: how it actually behaves.
In GHC without optimizations, the body of a function is evaluated every time the function is called. (A "call" means the function is applied to arguments and the result is evaluated.) In the following example, f x is inside a function, so it will execute each time the function is called.
(GHC may optimize this expression as discussed in the FAQ [1].)
let f x = trace "!" $ zip x x
x = "abc"
in map (\i -> lookup i (f x)) "abcd"
However, if we move f x out of the function, it will execute only once.
let f x = trace "!" $ zip x x
x = "abc"
in map ((\f_x i -> lookup i f_x) (f x)) "abcd"
This can be rewritten more readably as
let f x = trace "!" $ zip x x
x = "abc"
g f_x i = lookup i f_x
in map (g (f x)) "abcd"
The general rule is that, each time a function is applied to an argument, a new "copy" of the function body is created. Function application is the only thing that may cause an expression to re-execute. However, be warned that some functions and function calls do not look like functions syntactically.
[1] http://www.haskell.org/haskellwiki/GHC/FAQ#Subexpression_Elimination

Applying a function that may fail to all values in a list

I want to apply a function f to a list of values, however function f might randomly fail (it is in effect making a call out to a service in the cloud).
I thought I'd want to use something like map, but I want to apply the function to all elements in the list and afterwards, I want to know which ones failed and which were successful.
Currently I am wrapping the response objects of the function f with an error pair which I could then effectively unzip afterwards
i.e. something like
g : (a->b) -> a -> [ b, errorBoolean]
f : a-> b
and then to run the code ... map g (xs)
Is there a better way to do this? The other alternative approach was to iterate over the values in the array and then return a pair of arrays, one which listed the successful values and one which listed the failures. To me, this seems to be something that ought to be fairly common. Alternatively I could return some special value. What's the best practice in dealing with this??
If f is making a call out to the cloud, than f is undoubtedly using some monad, probably the IO monad or a monad derived from the IO monad. There are monadic versions of map. Here is what you would typically do, as a first attempt:
f :: A -> IO B -- defined elsewhere
g :: [A] -> IO [B]
g xs = mapM f xs
-- or, in points-free style:
g = mapM f
This has the (possibly) undesirable property that g will fail, returning no values, if any call to f fails. We fix that by making it so f returns either an answer or an error message.
type Error = String
f :: A -> IO (Either Error B)
g :: [A] -> IO [Either Error B]
g = mapM f
If you want all of the errors to be returned together, and all of the successes clumped together, you can use the lefts and rights functions from Data.Either.
h :: [A] -> IO ([B], [Error])
h xs = do ys <- g xs
return (rights ys, lefts ys)
If you don't need the error messages, just use Maybe B instead of Either Error B.
The Either data type is the most common way to represent a value which can either result in an error or a correct value. Errors use the Left constructor, correct values use the Right constructor. As a bonus, "right" also means "correct" in English, but the reason that the correct value uses the Right constructor is actually deeper (because this means we can create a functor out of the Either type which modifies correct results, which is not possible over the Left constructor).
You could write your g to return a Maybe monad:
f: a -> b
g: (a -> b) -> a -> Maybe b
If f fails, g returns Nothing, otherwise it returns Just (f x).

Resources