Uniqueness and other restrictions for Arbitrary in QuickCheck - haskell

I'm trying to write a modified Arbitrary instance for my data type, where (in my case) a subcomponent has a type [String]. I would ideally like to bring uniqueness in the instance itself, that way I don't need ==> headers / prerequisites for every test I write.
Here's my data type:
data Foo = Vars [String]
and the trivial Arbitrary instance:
instance Arbitrary Foo where
arbitrary = Vars <$> (:[]) <$> choose ('A','z')
This instance is strange, I know. In the past, I've had difficulty when QuickCheck combinatorially explodes, so I'd like to keep these values small. Another request: how can I make an instance where the generated strings are under 4 characters, for instance?
All of this fundamentally requires (boolean) predicates to augment Arbitrary instances. Is this possible?

Definitely you want the instance to produce only values that match the intention of the data type. If you want all the variables to be distinct, the Arbitrary instance must reflect this. (Another question is whether in this case it wouldn't make more sense to store the variables in a set, e.g. newtype Foo = Vars (Set String).)
I'd suggest to check for duplicates using Set or Hashtable, as nub has O(n^2) complexity, which might slow down your test considerably for larger inputs. For example:
import Control.Applicative
import Data.List (nub)
import qualified Data.Set as Set
import Test.QuickCheck
newtype Foo = Vars [String]
-- | Checks if a given list has no duplicates in _O(n log n)_.
hasNoDups :: (Ord a) => [a] -> Bool
hasNoDups = loop Set.empty
  where
    loop _ []       = True
    loop s (x:xs) | s' <- Set.insert x s, Set.size s' > Set.size s
                    = loop s' xs
                  | otherwise
                    = False
-- | It is always worth testing that we wrote `hasNoDups` properly.
prop_hasNoDups :: [Int] -> Property
prop_hasNoDups xs = hasNoDups xs === (nub xs == xs)
Your instance then needs to create a list of lists, and each list should be randomized. So instead of (:[]), which creates just a singleton list (and just one level), you need to call listOf twice:
instance Arbitrary Foo where
    arbitrary = Vars <$> (listOf . listOf $ choose ('A','z'))
                           `suchThat` hasNoDups
Also notice that choose ('A','z') allows all characters between A and z, which includes the punctuation characters [, \, ], ^, _ and ` that sit between Z and a in ASCII. My guess is that you'd rather want something like
oneof [choose ('A','Z'), choose ('a','z')]
If you really want, you could also make hasNoDups O(n) using hash tables in the ST monad.
Concerning limiting the size: you could always write your own parametrized functions that produce different Gen Foo values, but I'd say in most cases it's not necessary. Gen has its own internal size parameter, which is increased throughout the tests (see this answer), so lists of different sizes (as generated by listOf) are covered.
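For the "under 4 characters" part of the question, one option is to pick the length explicitly instead of relying on the size parameter. This is only a minimal sketch, assuming the names should consist of letters; letter and shortName are made-up helper names, not part of QuickCheck:

```haskell
import Test.QuickCheck

-- Hypothetical helper: one letter from A-Z or a-z.
letter :: Gen Char
letter = elements (['A'..'Z'] ++ ['a'..'z'])

-- Hypothetical helper: a string of 0 to 3 letters.
shortName :: Gen String
shortName = do
    n <- choose (0, 3)
    vectorOf n letter
```

Writing resize 3 (listOf letter) should give the same length bound, since listOf draws its length from the current size, but the explicit version makes the limit easier to see.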
But I'd suggest implementing shrink, as this will give you much nicer counter-examples. For example, if we define a (deliberately wrong) test that tries to verify that no instance of Foo contains 'a' in any of its variables:
prop_Foo_hasNoDups :: Foo -> Property
prop_Foo_hasNoDups (Vars xs) = all (notElem 'a') xs === True
we'll get ugly counter-examples such as
Vars ["RhdaJytDWKm","FHHhrqbI","JVPKGTqNCN","awa","DABsOGNRYz","Wshubp","Iab","pl"]
But adding
shrink (Vars xs) = map Vars $ shrink xs
to Arbitrary Foo reduces the counter-example to just
Vars ["a"]
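One thing to watch: the default list shrinker knows nothing about the invariants of Foo, so shrinking can reintroduce duplicates (or characters outside the intended alphabet). A possible way around this, sketched under the assumption that names are distinct and letters-only, is to filter the shrink candidates with the same predicate the generator uses. The names letters and validNames are hypothetical:

```haskell
import qualified Data.Set as Set
import Test.QuickCheck

newtype Foo = Vars [String] deriving (Show)

-- Assumed alphabet for variable names.
letters :: String
letters = ['A'..'Z'] ++ ['a'..'z']

-- Hypothetical predicate: all names distinct and made of letters only.
validNames :: [String] -> Bool
validNames xs =
    Set.size (Set.fromList xs) == length xs && all (all (`elem` letters)) xs

instance Arbitrary Foo where
    arbitrary = Vars <$> (listOf (listOf (elements letters)) `suchThat` validNames)
    -- Keep only the shrink candidates that still satisfy the invariant.
    shrink (Vars xs) = [Vars ys | ys <- shrink xs, validNames ys]
```

With this, every value a property sees, generated or shrunk, satisfies validNames.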

suchThat :: Gen a -> (a -> Bool) -> Gen a is a way to embed Boolean predicates in a Gen. See the haddocks for more info.
Here's how you would make the instance unique:
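instance Arbitrary Foo where
    -- the parentheses matter: suchThat must apply to the Gen [String], not to the Gen Char
    arbitrary = Vars <$> (((:[]) <$> (:[]) <$> choose ('A','z'))
                            `suchThat` isUnique)
      where
        isUnique x = nub x == x   -- nub comes from Data.List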
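An alternative to filtering with suchThat is to make the list duplicate-free by construction, which avoids retrying rejected candidates. A minimal sketch, with uniqueNames as a made-up name and lowercase letters as an assumed alphabet:

```haskell
import qualified Data.Set as Set
import Test.QuickCheck

-- Hypothetical generator: duplicate-free by construction, no retries needed.
-- Going through a Set removes duplicates but also sorts the names.
uniqueNames :: Gen [String]
uniqueNames = Set.toList . Set.fromList <$> listOf (listOf (choose ('a', 'z')))
```

The result is always sorted, so if the order of the names matters to the code under test, this generator covers less ground than the suchThat version.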

Related

Generate injective functions with QuickCheck?

I'm using QuickCheck to generate arbitrary functions, and I'd like to generate arbitrary injective functions (i.e. f a == f b if and only if a == b).
I thought I had it figured out:
newtype Injective = Injective (Fun Word Char) deriving Show
instance Arbitrary Injective where
    arbitrary = fmap Injective fun
      where
        fun :: Gen (Fun Word Char)
        fun = do
            a <- arbitrary
            b <- arbitrary
            arbitrary `suchThat` \(Fn f) ->
                (f a /= f b) || (a == b)
But I'm seeing cases where the generated function maps distinct inputs to the same output.
What I want:
f such that for all inputs a and b, either f a does not equal f b or a equals b.
What I think I have:
f such that there exist inputs a and b where either f a does not equal f b or a equals b.
How can I fix this?
You've correctly identified the problem: what you're generating is functions with the property ∃ a≠b. f a≠f b (which is readily true for most random functions anyway), whereas what you want is ∀ a≠b. f a≠f b. That is a much more difficult property to ensure, because you need to know about all the other function values for generating each individual one.
I don't think this is possible to ensure for general input types. For Word specifically, however, what you can do is “fake” a function by precomputing all the output values sequentially, making sure that you don't repeat one that has already been used, and then just reading off from that predetermined chart. It requires a bit of laziness fu to actually get this working:
import qualified Data.Set as Set
newtype Injective = Injective ([Char] {- simply a list without duplicates -})
    deriving Show
instance Arbitrary Injective where
    arbitrary = Injective . lazyNub <$> arbitrary
lazyNub :: Ord a => [a] -> [a]
lazyNub = go Set.empty
  where go _ [] = []
        go forbidden (x:xs)
          | x `Set.member` forbidden = go forbidden xs
          | otherwise = x : go (Set.insert x forbidden) xs
This is not very efficient, and may well not be ok for your application, but it's probably the best you can do.
In practice, to actually use Injective as a function, you'll want to wrap the values in a suitable structure with only O(log n) lookup time. Unfortunately, Data.Map.Lazy is not lazy enough; you may need to hand-bake something like a list of exponentially growing maps.
There's also the concern that for some insufficiently big result types, it is simply not possible to generate injective functions because there aren't enough values available. In fact, as Joseph remarked, this is the case here. The lazyNub function will go into an infinite loop in this case. I'd say for a QuickCheck test this is probably OK, though.
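To make the "read off from a predetermined chart" idea concrete, here is a small sketch that treats a duplicate-free list as a function on indices. It uses a plain linear lookup rather than the O(log n) structure mentioned above, and applyTable is a hypothetical helper, not something from QuickCheck:

```haskell
import qualified Data.Set as Set

-- Same idea as lazyNub above: drop repeats while staying lazy,
-- so it also works on infinite input lists.
lazyNub :: Ord a => [a] -> [a]
lazyNub = go Set.empty
  where
    go _ [] = []
    go seen (x:xs)
      | x `Set.member` seen = go seen xs
      | otherwise           = x : go (Set.insert x seen) xs

-- Hypothetical helper: view a duplicate-free table as a partial function
-- from indices to values. Distinct indices give distinct results.
applyTable :: [b] -> Int -> Maybe b
applyTable table n
  | n < 0     = Nothing
  | otherwise = case drop n table of
      (y:_) -> Just y
      []    -> Nothing
```

Because lazyNub only forces as much of its input as the lookup needs, applyTable (lazyNub xs) n terminates for an infinite xs as long as xs contains more than n distinct values.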

Haskell's (<-) in Terms of the Natural Transformations of Monad

So I'm playing around with the hasbolt module in GHCi and I had a curiosity about some desugaring. I've been connecting to a Neo4j database by creating a pipe as follows
ghci> pipe <- connect $ def {credentials}
and that works just fine. However, I'm wondering what the type of the (<-) operator is (GHCi won't tell me). Most desugaring explanations describe that
do x <- a
return x
desugars to
a >>= (\x -> return x)
but what about just the line x <- a?
It doesn't help me to add in the return because I want pipe :: Pipe not pipe :: Control.Monad.IO.Class.MonadIO m => m Pipe, but (>>=) :: Monad m => m a -> (a -> m b) -> m b so trying to desugar using bind and return/pure doesn't work without it.
Ideally it seems like it'd be best to just make a Comonad instance to enable using extract :: Monad m => m a -> a as pipe = extract $ connect $ def {creds} but it bugs me that I don't understand (<-).
Another oddity is that, treating (<-) as a Haskell function, its first argument is an out-of-scope variable, but that wouldn't mean that
(<-) :: a -> m b -> b
because not just anything can be used as a free variable. For instance, you couldn't bind the pipe to a Num type or a Bool. The variable has to be a "String"ish thing, except it never is actually a String; and you definitely can't try actually binding to a String. So it seems as if it isn't a Haskell function in the usual sense (unless there is a class of functions that take values from the free variable namespace... unlikely). So what is (<-) exactly? Can it be replaced entirely by using extract? Is that the best way to desugar/circumvent it?
I'm wondering what the type of the (<-) operator is ...
<- doesn't have a type, it's part of the syntax of do notation, which as you know is converted to sequences of >>= and return during a process called desugaring.
but what about just the line x <- a ...?
That's a syntax error in normal Haskell code and the compiler would complain. The reason the line:
ghci> pipe <- connect $ def {credentials}
works in GHCi is that the REPL is a sort of do block; you can think of each entry as a line in your main function (it's a bit more hairy than that, but that's a good approximation). That's why you needed (until recently) to say let foo = bar in GHCi to declare a binding as well.
Ideally it seems like it'd be best to just make a Comonad instance to enable using extract :: Monad m => m a -> a as pipe = extract $ connect $ def {creds} but it bugs me that I don't understand (<-).
Comonad has nothing to do with Monads. In fact, most Monads don't have any valid Comonad instance. Consider the [] Monad:
instance Monad [] where
    return x = [x]
    xs >>= f = concat (map f xs)
If we try to write a Comonad instance, we can't define extract :: m a -> a:
instance Comonad [] where
    extract (x:_) = x
    extract []    = ???
This tells us something interesting about Monads, namely that we can't write a general function with the type Monad m => m a -> a. In other words, we can't "extract" a value from a Monad without additional knowledge about it.
So how does the do-notation syntax do {x <- [1,2,3]; return [x,x]} work?
Since <- is actually just syntax sugar, just like how [1,2,3] actually means 1 : 2 : 3 : [], the above expression actually means [1,2,3] >>= (\x -> return [x,x]), which in turn evaluates to concat (map (\x -> [[x,x]]) [1,2,3]), which comes out to [[1,1],[2,2],[3,3]].
Notice how the arrow transformed into a >>= and a lambda. This uses only built-in (in the typeclass) Monad functions, so it works for any Monad in general.
We can pretend to extract a value by using (>>=) :: Monad m => m a -> (a -> m b) -> m b and working with the "extracted" a inside the function we provide, like in the lambda in the list example above. However, it is impossible to actually get a value out of a Monad in a generic way, which is why the return type of >>= is m b (still in the Monad).
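The desugaring steps for the list example can be checked directly. The three definitions below (the names are made up for illustration) spell out the do block, its translation to >>=, and the expansion using the list instance; they all denote the same value:

```haskell
-- The do block from the example.
pairsDo :: [[Int]]
pairsDo = do
    x <- [1, 2, 3]
    return [x, x]

-- What the do block desugars to.
pairsBind :: [[Int]]
pairsBind = [1, 2, 3] >>= \x -> return [x, x]

-- The same thing with (>>=) and return expanded for lists.
pairsConcat :: [[Int]]
pairsConcat = concat (map (\x -> [[x, x]]) [1, 2, 3])

-- All three evaluate to [[1,1],[2,2],[3,3]].
```

Note that x is never "taken out" of the list: it only exists inside the lambda, and the lambda has to return a list again.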
So what is (<-) exactly? Can it be replaced entirely by using extract? Is that the best way to desugar/circumvent it?
Note that the do-block <- and extract mean very different things even for types that have both Monad and Comonad instances. For instance, consider non-empty lists. They have instances of both Monad (which is very much like the usual one for lists) and Comonad (with extend/=>> applying a function to all suffixes of the list). If we write a do-block such as...
import qualified Data.List.NonEmpty as N
import Data.List.NonEmpty (NonEmpty(..))
import Data.Function ((&))
alternating :: NonEmpty Integer
alternating = do
    x <- N.fromList [1..6]
    -x :| [x]
... the x in x <- N.fromList [1..6] stands for the elements of the non-empty list; however, this x must be used to build a new list (or, more generally, to set up a new monadic computation). That, as others have explained, reflects how do-notation is desugared. It becomes easier to see if we make the desugared code look like the original one:
alternating :: NonEmpty Integer
alternating =
    N.fromList [1..6] >>= \x ->
    -x :| [x]
GHCi> alternating
-1 :| [1,-2,2,-3,3,-4,4,-5,5,-6,6]
The lines below x <- N.fromList [1..6] in the do-block amount to the body of a lambda. x <- in isolation is therefore akin to a lambda without body, which is not a meaningful thing.
Another important thing to note is that x in the do-block above does not correspond to any one single Integer, but rather to all Integers in the list. That already gives away that <- does not correspond to an extraction function. (With other monads, the x might even correspond to no values at all, as in x <- Nothing or x <- []. See also Lazersmoke's answer.)
On the other hand, extract does extract a single value, with no ifs or buts...
GHCi> extract (N.fromList [1..6])
1
... however, it is really a single value: the tail of the list is discarded. If we want to use the suffixes of the list, we need extend/(=>>)...
GHCi> N.fromList [1..6] =>> product =>> sum
1956 :| [1236,516,156,36,6]
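If the comonad package is not at hand, the behaviour of extend on non-empty lists can be reproduced with a few lines of base-only code. This is a sketch of what I understand the instance to do (apply the function to every suffix); extendNE is a hypothetical stand-in, not the library's definition:

```haskell
import Data.List.NonEmpty (NonEmpty(..))
import qualified Data.List.NonEmpty as N

-- Hypothetical stand-in for extend on NonEmpty:
-- apply f to the whole list, then to each shorter suffix in turn.
extendNE :: (NonEmpty a -> b) -> NonEmpty a -> NonEmpty b
extendNE f w@(_ :| rest) = f w :| tailResults
  where
    tailResults = maybe [] (N.toList . extendNE f) (N.nonEmpty rest)

-- Reproduces the GHCi session above:
-- extendNE sum (extendNE product (N.fromList [1..6]))
--   == 1956 :| [1236,516,156,36,6]
```

The intermediate result of the product step is 720 :| [720,360,120,30,6], one product per suffix, and the sum step then adds up the suffixes of that list.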
If we had a co-do-notation for comonads (cf. this package and the links therein), the example above might get rewritten as something in the vein of:
-- codo introduces a function: x & f = f x
N.fromList [1..6] & codo xs -> do
    ys <- product xs
    sum ys
The statements would correspond to plain values; the bound variables (xs and ys), to comonadic values (in this case, to list suffixes). That is exactly the opposite of what we have with monadic do-blocks. All in all, as far as your question is concerned, switching to comonads just swaps which things we can't refer to outside of the context of a computation.

List of polymorphic functions in haskell?

Consider the code below:
t1 :: [Int] -> (Int,String)
t1 xs = (sum xs,show $ length xs)
t2 :: [Int] -> (Int,String)
t2 xs = (length xs, (\x -> '?') <$> xs)
t3 :: [Int] -> (Char,String)
t3 (x:xs) = ('Y',"1+" ++ (show $ length xs))
t3 [] = ('N',"empty")
These three functions have a type that only varies partially -- they are entirely usable without needing to know the type of the first component of the tuple they produce. This means that I can operate on them without needing to refer to that type:
fnListToStrs vs fs = (\x -> snd $ x vs) <$> fs
Loading these definitions into GHCi, all three of the functions work independently as an argument to fnListToStrs, and indeed I can pass in a list containing both t1 and t2 because they have the same type:
*Imprec> fnListToStrs [1,2] [t1,t2]
["2","??"]
*Imprec> fnListToStrs [1,2] [t3]
["1+1"]
But I can't pass all 3 at the same time, even though the divergence of types is actually irrelevant to the calculation performed:
*Imprec> fnListToStrs [1,2] [t1,t2,t3]
(rejected with a type error, because t3 returns (Char,String) rather than (Int,String))
I have the feeling that making this work has something to do with either existential or impredicative types, but neither extension has worked for me when using the type declaration I expect fnListToStrs to be able to take, namely:
fnListToStrs :: [Int] -> [forall a.[Int]->(a,String)] -> [String]
Is there some other way to make this work?
Existential is correct, not impredicative. And Haskell doesn't have existentials, except through an explicit wrapper...
{-# LANGUAGE GADTs #-}
data SomeFstRes x z where
    SFR :: (x -> (y,z)) -> SomeFstRes x z
> fmap (\(SFR f) -> snd $ f [1,2]) [SFR t1, SFR t2, SFR t3]
["2","??","1+1"]
But this is really a bit useless. Since you can't possibly do anything with the first result anyway, it's more sensible to just throw it away immediately and put the remaining function in a simple monomorphic list:
> fmap ($[1,2]) [snd . t1, snd . t2, snd . t3]
["2","??","1+1"]
Any way to put these functions into a list will require "wrapping" each of them in some fashion. The simplest wrapping is just
wrap :: (a -> (b, c)) -> a -> c
wrap f = snd . f
There are, indeed, other ways to wrap these (notably with existential types), but you've not given any information to suggest that any of those would be even slightly better in your application than this simplest version.
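Put together with the functions from the question, the simple wrapper looks like this. The type signature given to fnListToStrs here is my own choice for the monomorphic version:

```haskell
t1 :: [Int] -> (Int, String)
t1 xs = (sum xs, show (length xs))

t2 :: [Int] -> (Int, String)
t2 xs = (length xs, map (const '?') xs)

t3 :: [Int] -> (Char, String)
t3 (_:xs) = ('Y', "1+" ++ show (length xs))
t3 []     = ('N', "empty")

-- Discard the first component so that all wrapped functions share one type.
wrap :: (a -> (b, c)) -> a -> c
wrap f = snd . f

-- Monomorphic version: the list only holds [Int] -> String functions.
fnListToStrs :: [Int] -> [[Int] -> String] -> [String]
fnListToStrs vs fs = map ($ vs) fs

results :: [String]
results = fnListToStrs [1, 2] [wrap t1, wrap t2, wrap t3]
-- results == ["2","??","1+1"]
```

The cost is one wrap per list element at the call site; in exchange no language extension is needed.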
Here's an example where something more sophisticated might make sense. Suppose you have
data Blob a b = Blob [a -> b] [a]
Now imagine you want to make a list of values of type Blob a b that all have the same b type, but may have different a types. Actually applying each function to each argument could lead to a prohibitively large list of potential results, so it would make sense to write
data WrapBlob b where
    WrapBlob :: Blob a b -> WrapBlob b
Now you can make the list and postpone the decision of which function(s) to apply to which argument(s) without paying a prohibitive price.
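A small usage sketch of this wrapper follows. The consumer firstResult and the sample list blobs are invented for illustration; the point is only that a consumer can pattern match on WrapBlob and apply the hidden functions to the hidden arguments without knowing their type:

```haskell
{-# LANGUAGE GADTs #-}

data Blob a b = Blob [a -> b] [a]

data WrapBlob b where
    WrapBlob :: Blob a b -> WrapBlob b

-- Hypothetical consumer: apply the first function to the first argument,
-- if both exist. The argument type stays hidden inside the wrapper.
firstResult :: WrapBlob b -> Maybe b
firstResult (WrapBlob (Blob (f:_) (x:_))) = Just (f x)
firstResult _                             = Nothing

-- Blobs with different argument types (Int, String, Bool) in one list.
blobs :: [WrapBlob String]
blobs =
    [ WrapBlob (Blob [show] [1 :: Int, 2])
    , WrapBlob (Blob [map succ] ["abc"])
    , WrapBlob (Blob [] [True])
    ]
-- map firstResult blobs == [Just "1", Just "bcd", Nothing]
```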

Recursively merge list of lists based on shared elements

I don't know what the official technical name is for what I'm trying to do so I'll try to explain it as best I can.
Given a list of lists:
[[2,3,4,5], [1,5,6], [7,8,9]]
I want to union only the lists that have at least one common element. So basically something like this:
simUnion :: [[Int]] -> [[Int]]
simUnion list = --...
--Result
-- [[1,2,3,4,5,6], [7,8,9]]
The problem I'm running into is running a match process between each element. Basically this is like the old math class problem where each person in a room must shake the hand of each other person. Ordinarily I'd accomplish this with a nested for loop, but how can I do this using Haskell's recursion?
Any help at all would be great!
If there is a finite number of distinct elements, you can turn the task inside out: make an Ord elem => Map elem [[elem]] out of your [[elem]] and then iteratively merge the groups using the following algorithm:
while map isn't empty, take away a key, put it in the queue
get all the groups containing key popped from the queue
concat them and put into the queue (and in some accumulator, too)
if the queue got empty, the group is finished; take another key from the map
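The steps above can be sketched roughly as follows. This is one possible reading of the algorithm, not a tuned implementation; simUnionMap, buildIndex and grow are made-up names, a plain list plays the role of the queue, and a Set serves as the accumulator:

```haskell
import Data.List (foldl')
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- Index: each element maps to the input lists that contain it.
buildIndex :: Ord a => [[a]] -> Map.Map a [[a]]
buildIndex xss = Map.fromListWith (++) [(x, [xs]) | xs <- xss, x <- xs]

simUnionMap :: Ord a => [[a]] -> [[a]]
simUnionMap = go . buildIndex
  where
    -- While the map is non-empty, start a new group from some key.
    go index = case Map.lookupMin index of
      Nothing     -> []
      Just (k, _) ->
        let (group, index') = grow [k] Set.empty index
        in Set.toList group : go index'

    -- Pop keys from the queue; pull their lists out of the map and
    -- enqueue every element of those lists.
    grow [] acc index = (acc, index)
    grow (k:queue) acc index =
      case Map.lookup k index of
        Nothing     -> grow queue acc index          -- key already handled
        Just groups ->
          let members = concat groups
              acc'    = foldl' (flip Set.insert) (Set.insert k acc) members
          in grow (members ++ queue) acc' (Map.delete k index)
```

For the list in the question, simUnionMap [[2,3,4,5],[1,5,6],[7,8,9]] gives [[1,2,3,4,5,6],[7,8,9]]. Empty input lists contribute no keys and therefore disappear from the result.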
Note: The following post is written in literate Haskell. Save it as *.lhs and load it in GHCi. Also note that the discussed algorithm has runtime O(n²) and isn't optimal. A better approach would use union find or similar.
First, let us think about the tools we need if we want to group a single list x with the rest of the lists xs. We need to separate between the lists from xs that have an element in common with x, and we need to build the union of such lists. Therefore, we should import some functions from Data.List:
> import Data.List (partition, union)
Next, we need to check whether two lists are suitable to get merged:
> intersects :: Eq a => [a] -> [a] -> Bool
> intersects xs ys = any (`elem` ys) xs
Now we have all the tools at hand to define simUnion. The empty case is clear: if we don't have any lists, the result doesn't have any list either:
> simUnion :: Eq a => [[a]] -> [[a]]
> simUnion [] = []
Suppose we have at least one list. We take the first one and check whether it has any element in common with any other list. We can do so by using partition:
> simUnion (x:xs) =
>   let (common, noncommon) = partition (intersects x) xs
Now, common :: [[a]] will only contain those lists that have at least one element in common with x. There can be two cases now: either common is empty, and our list x has no element in common with any list from xs:
>   in if null common
>        then x : simUnion xs
We ignore noncommon here, since xs == noncommon in this case. In the other case, we need to build the union of all lists in common and x. This can be done with foldr union. However, this new list must be used in simUnion again, since it may have new intersections. For example, in
simUnion [[1,2], [2,3], [3,4]]
you want to end up with [[1,2,3,4]], not [[1,2,3],[3,4]]:
>        else simUnion (foldr union x common : noncommon)
Note that the result will be unsorted, but you can map sort over it as a last step.
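For readers who prefer an ordinary .hs file, here is the same function assembled in one piece, plus the sorting step mentioned above. The name simUnionSorted is my own:

```haskell
import Data.List (partition, sort, union)

-- Two lists can be merged if they share at least one element.
intersects :: Eq a => [a] -> [a] -> Bool
intersects xs ys = any (`elem` ys) xs

simUnion :: Eq a => [[a]] -> [[a]]
simUnion [] = []
simUnion (x:xs) =
  let (common, noncommon) = partition (intersects x) xs
  in if null common
       then x : simUnion xs
       -- The merged list goes back in, since it may now intersect other lists.
       else simUnion (foldr union x common : noncommon)

-- Hypothetical convenience wrapper: sort each group as a last step.
simUnionSorted :: Ord a => [[a]] -> [[a]]
simUnionSorted = map sort . simUnion
```

For example, simUnionSorted [[2,3,4,5],[1,5,6],[7,8,9]] yields [[1,2,3,4,5,6],[7,8,9]], and the chained input [[1,2],[2,3],[3,4]] collapses to [[1,2,3,4]].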
I have two main recommendations:
Don't think of it in terms of recursion! Instead, make liberal use of library utility functions.
Use appropriate data structures! Since you're talking about membership tests and unions, sets (from the Data.Set module) sound like they would be a better choice.
Applying those ideas, here's a fairly simple (though perhaps very naïve and suboptimal) solution:
import Data.Set (Set)
import qualified Data.Set as Set
simUnion :: Set (Set Int) -> Set (Set Int)
simUnion sets = Set.map outer sets
  where outer :: Set Int -> Set Int
        outer set = unionMap middle set
          where middle :: Int -> Set Int
                middle i = unionMap inner sets
                  where inner :: Set Int -> Set Int
                        inner set
                          | i `Set.member` set = set
                          | otherwise          = Set.empty
-- | Utility function analogous to the 'concatMap' list function, but
-- for sets.
unionMap :: (Ord a, Ord b) => (a -> Set b) -> Set a -> Set b
unionMap f as = Set.unions (map f (Set.toList as))
Now using your example:
-- | This evaluates to:
--
-- >>> simUnion sampleData
-- fromList [fromList [1,2,3,4,5,6],fromList [7,8,9]]
sampleData :: Set (Set Int)
sampleData = Set.fromList (map Set.fromList sampleData')
  where sampleData' :: [[Int]]
        sampleData' = [[2,3,4,5], [1,5,6], [7,8,9]]
Ordinarily I'd accomplish this with a nested for loop, but how can I do this using Haskell's recursion?
You don't use recursion directly. You use higher-order functions like Set.map and unionMap. Note that these functions are analogous to loops, and that we're using them in a nested manner. Rule of thumb: imperative for loops very often translate to functional map, filter, reduce or similar operations. Nested imperative loops correspondingly often translate to nested use of such functions.
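One caveat, as far as I can tell: a single pass of this kind only merges sets that directly share an element, so a chain such as {1,2}, {2,3}, {3,4} is not fully collapsed. A possible remedy is to repeat the pass until nothing changes. The sketch below uses made-up names (mergeOnce, simUnionFix) and is generalised from Int to any Ord type:

```haskell
import Data.Set (Set)
import qualified Data.Set as Set

-- One pass: replace every set by the union of all sets that intersect it.
mergeOnce :: Ord a => Set (Set a) -> Set (Set a)
mergeOnce sets = Set.map grow sets
  where
    grow s = Set.unions
        [t | t <- Set.toList sets, not (Set.null (Set.intersection s t))]

-- Repeat until a fixed point is reached. The sets only ever grow and there
-- are finitely many elements, so this terminates.
simUnionFix :: Ord a => Set (Set a) -> Set (Set a)
simUnionFix sets
  | next == sets = sets
  | otherwise    = simUnionFix next
  where
    next = mergeOnce sets
```

On the chain example this takes two passes and ends with the single set {1,2,3,4}. It is still naive in terms of running time.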

How to avoid spaceleak in multiple list traversals?

Is GHC intelligent enough to run multiple operations on lists in 'semi-parallel'?
Consider this (simplified) code:
findElements bigList = do
    let special = head . filter isSpecial $ bigList
    let others = filter isSpecialOrNormal $ bigList
    return (special, others)
(Monad due to original code)
I guess GHC will run the first list operation and will keep all elements in memory so that the second operation is able to work on them.
My problem is that I am running into a space leak when dealing with larger files. But I believe it should be able to run in constant space. Is there a way to achieve this?
Update 1
Having written it down like this, the solution to this problem is of course to change the order of the two lines.
But my question remains: is GHC intelligent enough to figure out this semi-parallel processing when it is not done in a monad?
I don't think GHC is smart enough to merge these two traversals, or, as is usually the case, GHC could be smart enough, but there are cases where you don't want this behavior, so GHC doesn't do it.
Here's how I would do it, using monoids and foldMap.
import Data.Monoid
import Data.Foldable
First, here's how to write special with foldMap, using the First monoid.
specialF :: a -> First a
specialF a = First $ if isSpecial a then Just a else Nothing
special :: [a] -> a
special as = let (First (Just s)) = foldMap specialF as in s
And similar for specialOrNormal, using the list monoid.
specialOrNormalF :: a -> [a]
specialOrNormalF a = if isSpecialOrNormal a then [a] else []
specialOrNormal :: [a] -> [a]
specialOrNormal = foldMap specialOrNormalF
One neat thing about monoids is that a tuple of monoids is also a monoid, which makes merging these folds easy:
findElements :: [a] -> (a, [a])
findElements bigList =
    let (First (Just s), son) =
            foldMap (\a -> (specialF a, specialOrNormalF a)) bigList
    in (s, son)
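Here is a self-contained variant of the combined fold that can be loaded as is. The two predicates are placeholders I invented so the example runs, and the result uses Maybe instead of a partial pattern match, so a list without any special element does not crash:

```haskell
import Data.Monoid (First(..))

-- Placeholder predicates, assumed only for this example.
isSpecial :: Int -> Bool
isSpecial = (> 100)

isSpecialOrNormal :: Int -> Bool
isSpecialOrNormal = even

-- One foldMap over the list, collecting both results in a pair of monoids.
findElements :: [Int] -> (Maybe Int, [Int])
findElements xs = (getFirst s, others)
  where
    (s, others) = foldMap step xs
    step a = ( First (if isSpecial a then Just a else Nothing)
             , [a | isSpecialOrNormal a] )
```

For instance, findElements [1,2,150,4,200] gives (Just 150,[2,150,4,200]). Whether this actually runs in constant space depends on how the two components are consumed, which I have not measured.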
And if you like point-free code, you can write the whole thing like this (it additionally needs first and (&&&) from Control.Arrow, mfilter from Control.Monad, and fromJust from Data.Maybe):
findElements :: [a] -> (a, [a])
findElements =
    first (fromJust . getFirst) .
    foldMap
        ( First . mfilter isSpecial . return
          &&& mfilter isSpecialOrNormal . return
        )
