QuickCheck: How to use exhaustiveness checker to prevent forgotten constructors of a sum type

I have a Haskell data type like
data Mytype
  = C1
  | C2 Char
  | C3 Int String
If I case on a Mytype and forget to handle one of the cases, GHC gives me a warning (exhaustiveness check).
I now want to write a QuickCheck Arbitrary instance to generate Mytype values, like:
instance Arbitrary Mytype where
  arbitrary = do
    n <- choose (1, 3 :: Int)
    case n of
      1 -> pure C1
      2 -> C2 <$> arbitrary
      3 -> C3 <$> arbitrary <*> someCustomGen
The problem with this is that I can add a new alternative to Mytype and forget to update the Arbitrary instance, thus having my tests not test that alternative.
I would like to find a way of using GHC's exhaustiveness checker to remind me of forgotten cases in my Arbitrary instance.
The best I've come up with is
arbitrary = do
  x <- elements [C1, C2 undefined, C3 undefined undefined]
  case x of
    C1     -> pure C1
    C2 _   -> C2 <$> arbitrary
    C3 _ _ -> C3 <$> arbitrary <*> someCustomGen
But it doesn't really feel elegant.
I intuitively feel that there's no 100% clean solution to this, but would appreciate anything that reduces the chance of forgetting such cases - especially in a big project where code and tests are separated.

I implemented a solution with TemplateHaskell, you can find a prototype at https://gist.github.com/nh2/d982e2ca4280a03364a8. With this you can write:
instance Arbitrary Mytype where
  arbitrary = oneof $(exhaustivenessCheck ''Mytype [|
      [ pure C1
      , C2 <$> arbitrary
      , C3 <$> arbitrary <*> arbitrary
      ]
    |])
It works like this: You give it a type name (like ''Mytype) and an expression (in my case a list of arbitrary style Gens). It gets the list of all constructors for that type name and checks whether the expression contains all of these constructors at least once. If you just added a constructor but forgot to add it to the Arbitrary instance, this function will warn you at compile time.
This is how it's implemented with TH:
exhaustivenessCheck :: Name -> Q Exp -> Q Exp
exhaustivenessCheck tyName qList = do
  tyInfo <- reify tyName
  let conNames = case tyInfo of
        TyConI (DataD _cxt _name _tyVarBndrs cons _derives) -> map conNameOf cons
        -- `error` rather than `fail`: in this pure list context `fail` would just yield []
        _ -> error "exhaustivenessCheck: Can only handle simple data declarations"
  list <- qList
  case list of
    input@(ListE l) -> do
      -- We could be more specific by searching for `ConE`s in `l`
      let cons = toListOf tinplate l :: [Name]
      case filter (`notElem` cons) conNames of
        [] -> return input
        missings -> fail $ "exhaustivenessCheck: missing case: " ++ show missings
    _ -> fail "exhaustivenessCheck: argument must be a list"
I'm using GHC.Generics to easily traverse the syntax tree of the Exp: With toListOf tinplate exp :: [Name] (from lens) I can easily find all Names in the whole exp.
I was surprised that the types from Language.Haskell.TH do not have Generic instances, and neither (with current GHC 7.8) do Integer or Word8 - Generic instances for these are required because they appear in Exp. So I added them as orphan instances (for most things, StandaloneDeriving does it but for primitive types like Integer I had to copy-paste instances as Int has them).
The solution is not perfect because it doesn't use the exhaustiveness checker like case does, but as we agree, that's not possible while staying DRY, and this TH solution is DRY.
One possible improvement/alternative would be to write a TH function that does this check for all Arbitrary instances in a whole module at once instead of calling exhaustivenessCheck inside each Arbitrary instance.

You want to ensure that your code behaves in a particular way; the simplest way to check the behaviour of code is to test it.
In this case, the desired behaviour is that each constructor gets reasonable coverage in tests. We can check that with a simple test:
allCons xs = length xs > 100 ==> length constructors == 3
  where constructors = nubBy eqCons xs
        eqCons C1 C1 = True
        eqCons C1 _ = False
        eqCons (C2 _) (C2 _) = True
        eqCons (C2 _) _ = False
        eqCons (C3 _ _) (C3 _ _) = True
        eqCons (C3 _ _) _ = False
This is pretty naive, but it's a good first shot. Its advantages:
eqCons will trigger an exhaustiveness warning if new constructors are added, which is what you want
It checks that your instance is handling all constructors, which is what you want
It also checks that all constructors are actually generated with some useful probability (in this case at least 1%)
It also checks that your instance is usable, eg. doesn't hang
Its disadvantages:
Requires a large amount of test data, in order to filter out those with length > 100
eqCons is quite verbose, since a catch-all eqCons _ _ = False would bypass the exhaustiveness check
Uses magic numbers 100 and 3
Not very generic
There are ways to improve this, eg. we can compute the constructors using the Data.Data module:
allCons xs = sufficient ==> length constructors == consCount
  where
    sufficient   = length xs > 100 * consCount
    constructors = nub . map toConstr $ xs
    consCount    = length . dataTypeConstrs . dataTypeOf $ head xs
This loses the compile-time exhaustiveness check, but it's redundant as long as we test regularly and our code has become more generic.
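To make the Data.Data calls concrete, here is a minimal sketch using only base. The helper names declaredCount and seenCount are made up for illustration; the point is just that dataTypeConstrs works on the DataType obtained via dataTypeOf, and that Constr values can be compared with nub.

```haskell
{-# LANGUAGE DeriveDataTypeable #-}
import Data.Data (Data, dataTypeConstrs, dataTypeOf, toConstr)
import Data.List (nub)

data Mytype
  = C1
  | C2 Char
  | C3 Int String
  deriving (Show, Data)

-- Number of constructors declared for the argument's type.
-- (Hypothetical helper name; the derived 'dataTypeOf' does not inspect its argument.)
declaredCount :: Data a => a -> Int
declaredCount = length . dataTypeConstrs . dataTypeOf

-- Number of distinct constructors that actually occur in a sample.
-- (Hypothetical helper name.)
seenCount :: Data a => [a] -> Int
seenCount = length . nub . map toConstr
```

A property along the lines of allCons then amounts to comparing seenCount on the generated sample against declaredCount.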
If we really want the exhaustiveness check, there are a few places where we could shoe-horn it back in:
allCons xs = sufficient ==> length constructors == consCount
  where
    sufficient   = length xs > 100 * consCount
    constructors = nub . map toConstr $ xs
    consCount    = length . dataTypeConstrs . dataTypeOf $ case head xs of
      x@(C1)     -> x
      x@(C2 _)   -> x
      x@(C3 _ _) -> x
Notice that we use consCount to eliminate the magic 3 completely. The magic 100 (which determined the minimum required frequency of a constructor) now scales with consCount, but that just requires even more test data!
We can solve that quite easily using a newtype:
consCount = length (dataTypeConstrs (dataTypeOf C1))
newtype MyTypeList = MTL [Mytype] deriving (Eq,Show)
instance Arbitrary MyTypeList where
  arbitrary = MTL <$> vectorOf (100 * consCount) arbitrary
  shrink (MTL xs) = MTL <$> shrink xs
allCons (MTL xs) = length constructors == consCount
  where constructors = nub . map toConstr $ xs
We can put a simple exhaustiveness check in there somewhere if we like, eg.
instance Arbitrary MyTypeList where
  arbitrary = MTL <$> vectorOf (100 * consCount) getT
    where
      getT = do
        x <- arbitrary
        return $ case x of
          C1     -> x
          C2 _   -> x
          C3 _ _ -> x
  shrink (MTL xs) = MTL <$> shrink xs

Here I exploit an unused variable _x. This is not really more elegant than your solution, though.
instance Arbitrary Mytype where
  arbitrary = do
    let _x = case _x of C1 -> _x ; C2 _ -> _x ; C3 _ _ -> _x
    n <- choose (1, 3 :: Int)
    case n of
      1 -> pure C1
      2 -> C2 <$> arbitrary
      3 -> C3 <$> arbitrary <*> someCustomGen
Of course, one has to keep the last case coherent with the dummy definition of _x, so it is not completely DRY.
Alternatively, one might exploit Template Haskell to build a compile-time assert checking that the constructors in Data.Data.dataTypeOf are the expected ones. This assert has to be kept coherent with the Arbitrary instance, so this is not completely DRY either.
If you do not need custom generators, I believe Data.Data can be exploited to generate Arbitrary instances via Template Haskell (I think I saw some code doing exactly that, but I can't remember where). In this way, there's no chance the instance can miss a constructor.
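As a rough illustration of the constructor-list assertion idea, here is a run-time variant (not Template Haskell, so it fails in the test suite rather than at compile time). The names coveredByInstance and missingFrom are hypothetical; the list still has to be kept in sync with the Arbitrary instance by hand.

```haskell
{-# LANGUAGE DeriveDataTypeable #-}
import Data.Data (Data, dataTypeConstrs, dataTypeOf, showConstr)

data Mytype
  = C1
  | C2 Char
  | C3 Int String
  deriving (Show, Data)

-- Hand-maintained list (an assumption of this sketch): the constructors
-- that the Arbitrary instance is believed to generate.
coveredByInstance :: [String]
coveredByInstance = ["C1", "C2", "C3"]

-- Names of constructors declared for the type of the first argument
-- that do not appear in the given list.
missingFrom :: Data a => a -> [String] -> [String]
missingFrom witness covered =
  [ name
  | name <- map showConstr (dataTypeConstrs (dataTypeOf witness))
  , name `notElem` covered
  ]
```

A test can then require that missingFrom C1 coveredByInstance is empty; adding a C4 to Mytype without updating the list makes that test fail.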

Here is a solution using the generic-random library:
{-# language DeriveGeneric #-}
{-# language TypeOperators #-}
import Generic.Random
import GHC.Generics
import Test.QuickCheck
data Mytype
  = C1
  | C2 Char
  | C3 Int String
  deriving Generic
instance Arbitrary Mytype where
  arbitrary = genericArbitraryG customGens uniform
    where
      customGens :: Gen String :+ ()
      customGens = someCustomGen :+ ()
someCustomGen :: Gen String
someCustomGen = undefined
genericArbitraryG takes care of generating each constructor of Mytype. In this case we use uniform to get a uniform distribution of constructors. With customGens we specify that each String field in Mytype is generated with someCustomGen.
See Generic.Random.Tutorial for more examples.

Related

Accessing a common field in sum type variants

Suppose I have a sum type (or several, in fact), that I know by design all have a common field:
data T1 a
  = C1 String a
  | C2 Int a
  | C3 Bool a
data T2 a
  = C4 Int Int a
  | C5 [String] a
Is there a way to access the a field without having to pattern match on all variants across all types?
(I ask in the context of defining ASTs & having a neat way of accessing node-specific information)

A Minor Technical Detail
At the boring super-technical level, no. There is no way to access the fields of a constructor without pattern matching. Pattern matching is the primitive operation that causes the constructor to be evaluated. Before that, the fields don't even necessarily exist (thanks to non-strict evaluation).
Some Options That Might Be Useful
But you probably didn't mean that low-level question. You probably want a way to work with these data types without constantly writing pattern matches. And that can be done. It's just a matter of writing some functions. Which functions, though? ...that can be interesting.
You can write simple accessor functions:
t1ToA :: T1 a -> a
t1ToA (C1 _ x) = x
t1ToA (C2 _ x) = x
t1ToA (C3 _ x) = x
t2ToA :: T2 a -> a
t2ToA (C4 _ _ x) = x
t2ToA (C5 _ x) = x
Don't automatically reject this approach. Sure, it's a bit hungry on namespace because you need a different function name for each type. On the other hand, it's really good for readability and type inference. There's nothing magical anywhere. You might write some matching setter and modifier functions as well.
If you find that's getting to be too namespace hungry when you have various set and modify functions added in, you could use the van Laarhoven trick:
t1A :: Functor f => (a -> f a) -> T1 a -> f (T1 a)
t1A g (C1 x y) = C1 x <$> g y
t1A g (C2 x y) = C2 x <$> g y
t1A g (C3 x y) = C3 x <$> g y
t2A :: Functor f => (a -> f a) -> T2 a -> f (T2 a)
t2A g (C4 x y z) = C4 x y <$> g z
t2A g (C5 x y) = C5 x <$> g y
This representation lets you do reading and updating from the same type, though it is awkward without some helper functions. This is the representation used by libraries like lens, which provide you a huge number of those helper functions. But maybe you don't want to worry about learning how to work with this representation. I'm going to assume this isn't really what you're looking for and not even go into the details of how those helper functions work. But at a high level, they make clever use of specific types for f like Identity and Const a.
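For the curious, here is a small sketch of what such helpers can look like when written by hand with only base. The names viewA and overA are made up for this example (lens calls its versions view and over, with more general types); t1A is the function from above.

```haskell
import Data.Functor.Const (Const (..))
import Data.Functor.Identity (Identity (..))

data T1 a
  = C1 String a
  | C2 Int a
  | C3 Bool a
  deriving (Eq, Show)

t1A :: Functor f => (a -> f a) -> T1 a -> f (T1 a)
t1A g (C1 x y) = C1 x <$> g y
t1A g (C2 x y) = C2 x <$> g y
t1A g (C3 x y) = C3 x <$> g y

-- Reading: pick f = Const a, which remembers the field and ignores the rebuild.
viewA :: ((a -> Const a a) -> s -> Const a s) -> s -> a
viewA l = getConst . l Const

-- Updating: pick f = Identity, which just rebuilds with the modified field.
overA :: ((a -> Identity a) -> s -> Identity s) -> (a -> a) -> s -> s
overA l f = runIdentity . l (Identity . f)
```

With these, viewA t1A reads the shared field and overA t1A (+ 1) modifies it, so a single definition per type serves both purposes.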
An option if you are willing to give up some type inference in order to reduce namespace use is to go for some sort of ad-hoc class:
class ToA f where
  toA :: f a -> a
instance ToA T1 where
  toA (C1 _ x) = x
  toA (C2 _ x) = x
  toA (C3 _ x) = x
instance ToA T2 where
  toA (C4 _ _ x) = x
  toA (C5 _ x) = x
You could choose to combine this with the van Laarhoven encoding, for what it's worth. This would minimize the amount of namespace you grab, but it requires some additional helpers to be convenient to use.
There are a few other options that you might be able to work with, like using less ad-hoc tools GHC provides. Data and Generic are different classes you could work with where GHC gives you a lot of the tools already. But these tend to be very complex to pick up the first time around.
But Maybe There's a Better Solution
There's one last option that is actually the one I would recommend in most cases. Refactor your data types so the shared values aren't duplicated.
data WithA t a = WithA t a
data T1
  = C1 String
  | C2 Int
  | C3 Bool
And so on. Or however you might choose to refactor it. The important part is that the shared field is lifted out of the sum type, and is just always present. I think that this often ends up working the best. It often communicates what you mean better. When you have 3 constructors which each have a field of the same type it's not immediately obvious that that field should be seen as interchangeable between the constructors, even when the datatype is polymorphic over that field's type. But if it's a single field outside of the multiple constructors it is immediately obvious that it's always the same thing. Don't underestimate the communication value that provides for all future maintainers of the code.

Carl already mentioned a few alternatives. For completeness, let me add that record syntax lets you use a common field name in all constructors, which in turn lets you get and set the common field.
data T a
  = K1 { foo :: a, oth1 :: Int }
  | K2 { foo :: a, oth2 :: String }
get :: T a -> a
get = foo
set :: T a -> a -> T a
set t x = t{foo = x}
I am not a huge fan of this approach since other fields like oth1, oth2 become partial functions. I would rather refactor the type as Carl showed (the WithA example).

Conditions on list comprehension using Haskell and SBV

I want to write a Haskell list comprehension with a condition on symbolic expressions (SBV). I reproduced the problem with the following small example.
import Data.SBV
allUs :: [SInteger]
allUs = [0,1,2]
f :: SInteger -> SBool
f 0 = sTrue
f 1 = sFalse
f 2 = sTrue
someUs :: [SInteger]
someUs = [u | u <- allUs, f u == sTrue]
with show someUs, this gives the following error
*** Data.SBV: Comparing symbolic values using Haskell's Eq class!
***
*** Received: 0 :: SInteger == 0 :: SInteger
*** Instead use: 0 :: SInteger .== 0 :: SInteger
***
*** The Eq instance for symbolic values are necessiated only because
*** of the Bits class requirement. You must use symbolic equality
*** operators instead. (And complain to Haskell folks that they
*** remove the 'Eq' superclass from 'Bits'!.)
CallStack (from HasCallStack):
error, called at ./Data/SBV/Core/Symbolic.hs:1009:23 in sbv-8.8.5-IR852OLMhURGkbvysaJG5x:Data.SBV.Core.Symbolic
Changing the condition into f u .== sTrue also gives an error
<interactive>:8:27: error:
• Couldn't match type ‘SBV Bool’ with ‘Bool’
Expected type: Bool
Actual type: SBool
• In the expression: f u .== sTrue
In a stmt of a list comprehension: f u .== sTrue
In the expression: [u | u <- allUs, f u .== sTrue]
How to get around this problem?

Neither your f nor your someUs is symbolically computable as written. Ideally, these should be type errors, rejected out of hand. The underlying issue is that symbolic values cannot meaningfully be instances of the Eq class. Why? Because determining equality of symbolic values requires a call to the underlying solver, so the result cannot be Bool; it should really be SBool. But Haskell doesn't allow generalized guards in pattern matching to allow for that possibility. (And there are good reasons for that too, so it's not really Haskell's fault here. It's just that the two styles of programming don't work all that well together.)
You can ask why SBV makes symbolic values an instance of the Eq class at all. The only reason is what the error message is telling you: we want them to be instances of the Bits class, which has Eq as a superclass requirement. But that's a whole other discussion.
Based on this, how can you write your functions in SBV? Here's how you'd code f in the symbolic style:
f :: SInteger -> SBool
f i = ite (i .== 0) sTrue
    $ ite (i .== 1) sFalse
    $ ite (i .== 2) sTrue
    $ sFalse -- arbitrarily filled to make the function total
Ugly, but this is the only way to write it unless you want to play some quasi-quoting tricks.
Regarding someUs: This isn't something you can directly write symbolically either: This is known as a spine-concrete list. And there's no way for SBV to know how long your resulting list would be without actually running the solver on individual elements. In general you cannot do filter like functions on a spine-concrete list with symbolic elements.
The solution is to use what's known as a symbolic list and a bounded-list abstraction. This isn't very satisfactory, but is the best you can do to avoid termination problems:
{-# LANGUAGE OverloadedLists #-}
import Data.SBV
import Data.SBV.List
import Data.SBV.Tools.BoundedList
f :: SInteger -> SBool
f i = ite (i .== 0) sTrue
    $ ite (i .== 1) sFalse
    $ ite (i .== 2) sTrue
    $ sFalse -- arbitrarily filled to make the function total
allUs :: SList Integer
allUs = [0,1,2]
someUs :: SList Integer
someUs = bfilter 10 f allUs
When I run this, I get:
*Main> someUs
[0,2] :: [SInteger]
But you'll ask what's that number 10 in the call to bfilter? Well, the idea is that all lists are assumed to have some sort of an upper bound on their length, and the Data.SBV.Tools.BoundedList exports a bunch of methods to deal with them easily; all taking a bound parameter. So long as the inputs are at most this length long, they'll work correctly. There's no guarantee as to what happens if your list is longer than the bound given. (In general it'll chop off your lists at the bound, but you should not rely on that behavior.)
There's a worked-out example of uses of such lists in coordination with BMC (bounded-model-checking) at https://hackage.haskell.org/package/sbv-8.12/docs/Documentation-SBV-Examples-Lists-BoundedMutex.html
To sum up, dealing with lists in a symbolic context comes with some costs in modeling and how much you can do, due to restrictions in Haskell (where Bool is a fixed type instead of a class), and underlying solvers, which cannot deal with recursively defined functions all that well. The latter is mainly due to the fact that such proofs require induction, and SMT-solvers cannot do induction out-of-the-box. But if you follow the rules of the game using BMC like ideas, you can handle practical instances of the problem up to reasonable bounds.

(.==) takes two instances of EqSymbolic, returning an SBool. Inside a list comprehension, conditionals are implemented using the guard function.
Here's what it looks like:
guard :: Alternative f => Bool -> f ()
guard False = empty
guard True = pure ()
For lists, empty is [], and pure () returns a singleton list [()]. Any member of the list that evaluates to False will return an empty list instead of a unit item, excluding it from computations down the chain.
[True, False, True] >>= guard
= concatMap guard [True, False, True]
= concat $ map guard [True, False, True]
= concat $ [[()], [], [()]]
= [(), ()]
The second branch is then excluded when the context is flattened, so it's "pruned" from the computation.
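To see the same pruning on ordinary values, here is a tiny base-only sketch (the function names are illustrative). It writes one filter twice: once as a list comprehension and once with guard spelled out. The two behave the same, which is why the condition has to be a plain Bool.

```haskell
import Control.Monad (guard)

-- The comprehension form: the condition after the comma must be a Bool.
evensComprehension :: [Int] -> [Int]
evensComprehension xs = [x | x <- xs, even x]

-- The same computation with the guard written explicitly.
-- 'guard False' is [], so that element contributes nothing to the result.
evensGuard :: [Int] -> [Int]
evensGuard xs = xs >>= \x -> guard (even x) >> pure x
```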
It seems like you have two problems here - when you pattern match in f, you're doing a comparison using the Eq class. That's where the SBV error is coming from. Since your values are close together, you could use select, which takes a list of items, a default, an expression which evaluates to an index, and attempt to take the indexth item from that list.
You could rewrite f as
f :: SInteger -> SBool
f = select [sTrue, sFalse, sTrue] sFalse
The second problem is that guards explicitly look for Bool, but (.==) still returns an SBool. Looking at Data.SBV, you should be able to coerce that into a regular Bool using unliteral, which attempts to unwrap an SBV value into an equivalent Haskell one.
fromSBool :: SBool -> Bool
fromSBool = fromMaybe False . unliteral
someUs :: [SInteger]
someUs = [u | u <- allUs, fromSBool (f u)]
-- [0 :: SInteger, 2 :: SInteger]

Printing Dynamic Data

I have a system in haskell that uses Data.Dynamic and Type.Reflection to perform inference and calculations. I would like to be able to print the results.
Printing is easy when the type is supplied e.g
foo :: Dynamic -> String
foo dyn = case tyConName . someTypeRepTyCon . dynTypeRep $ dyn of
  "Int"  -> show $ fromDyn dyn (0 :: Int)
  "Bool" -> show $ fromDyn dyn True
  _      -> "no chance"
But if I want to be able to print tuples, I would have to add a new line for each e.g (Int, Bool), (Bool, Int), (Char, Int, Banana) ....
With the addition of more primitives and larger tuples this quickly becomes impractical.
Is there an algorithmic way to generate strings for this dynamic data, specifically for tuples and lists?

I like the main idea of the other answer, but it seems to get where it's going in a fairly roundabout way. Here's how I would style the same idea:
{-# LANGUAGE ViewPatterns #-}
{-# LANGUAGE TypeApplications #-}
{-# LANGUAGE GADTs #-}
import Type.Reflection
import Data.Dynamic
showDyn :: Dynamic -> String
showDyn (Dynamic (App (App (eqTypeRep (typeRep @(,)) -> Just HRefl) ta) tb) (va, vb))
  = concat [ "DynamicPair("
           , showDyn (Dynamic ta va)
           , ","
           , showDyn (Dynamic tb vb)
           , ")"
           ]
showDyn (Dynamic (eqTypeRep (typeRep @Integer) -> Just HRefl) n) = show n
showDyn (Dynamic tr _) = show tr
That first pattern match is quite a mouthful, but after playing with a few different ways of formatting it I'm convinced that there just is no way to make that look good. You can try it in ghci:
> showDyn (toDyn ((3,4), (True, "hi")))
"DynamicPair(DynamicPair(3,4),DynamicPair(Bool,[Char]))"

I could only manage to obtain this horrible solution.
{-# LANGUAGE GADTs, ScopedTypeVariables, TypeApplications #-}
{-# OPTIONS -Wall #-}
import Type.Reflection
import Data.Dynamic
Here we define the TyCon for (,) and Int. (I'm pretty sure there must be an easier way.)
pairTyCon :: TyCon
pairTyCon = someTypeRepTyCon (someTypeRep [('a','b')])
intTyCon :: TyCon
intTyCon = someTypeRepTyCon (someTypeRep [42 :: Int])
Then we dissect the Dynamic type. First we check if it is an Int.
showDynamic :: Dynamic -> String
showDynamic x = case x of
  Dynamic tr@(Con k) v | k == intTyCon ->
    case eqTypeRep tr (typeRep @Int) of
      Just HRefl -> show (v :: Int)
      _          -> error "It really should be an int"
  -- to be continued
The above is ugly, since we first pattern match against the TyCon using == instead of pattern matching, which prevents the type refinement of v into an Int. So, we still have to resort to eqTypeRep to perform a second check which we already know has to succeed.
I think it could be made pretty by checking eqTypeRep in advance, for instance. Or fromDyn. It does not matter.
What matters is that the pair case below is even more messy, and can not be made pretty in the same way, as far as I can see.
  -- continuing from above
  Dynamic tr@(App (App t0@(Con k :: TypeRep p)
                       (t1 :: TypeRep a1))
                  (t2 :: TypeRep a2)) v | k == pairTyCon ->
    withTypeable t0 $
    withTypeable t1 $
    withTypeable t2 $
    case ( eqTypeRep tr (typeRep @(p a1 a2))
         , eqTypeRep (typeRep @p) (typeRep @(,))) of
      (Just HRefl, Just HRefl) ->
        "DynamicPair("
        ++ showDynamic (Dynamic t1 (fst v))
        ++ ", "
        ++ showDynamic (Dynamic t2 (snd v))
        ++ ")"
      _ -> error "It really should be a pair!"
  _ -> "Dynamic: not an int, not a pair"
Above we match the TypeRep so that it represents something of type p a1 a2. We require that the representation of p to be pairTyCon.
As before this does not trigger type refinement, since it is done with == instead of pattern matching. We need to perform another explicit match to force p ~ (,) and another for the final refinement v :: (a1,a2). Sigh.
Finally, we can take fst v and snd v, turn them into Dynamic once again, and pair them. Effectively, we turned the original x :: Dynamic into something like (fst x, snd x) where both components are Dynamic. Now we can recurse.
I would really like to avoid the errors, but I can not see how to do that at the moment.
The redeeming part is that the approach is very general, and can be easily adapted to other type constructors.
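As one illustration of adapting the idea to another type constructor, a list case might look like the sketch below. It uses the view-pattern style rather than the TyCon comparison, and showDynL is a made-up name; the Int and Bool cases are only there so the example has some leaves to print.

```haskell
{-# LANGUAGE GADTs #-}
{-# LANGUAGE TypeApplications #-}
{-# LANGUAGE ViewPatterns #-}
import Data.Dynamic (Dynamic (..), toDyn)
import Data.List (intercalate)
import Type.Reflection

showDynL :: Dynamic -> String
-- A list: the type is an application of [] to some element type ta.
-- Matching HRefl refines the value's type to a list, so we can recurse per element.
showDynL (Dynamic (App (eqTypeRep (typeRep @[]) -> Just HRefl) ta) xs) =
  "[" ++ intercalate "," [showDynL (Dynamic ta x) | x <- xs] ++ "]"
showDynL (Dynamic (eqTypeRep (typeRep @Int) -> Just HRefl) n) = show n
showDynL (Dynamic (eqTypeRep (typeRep @Bool) -> Just HRefl) b) = show b
-- Anything else: fall back to printing the type.
showDynL (Dynamic tr _) = show tr

-- A nested list to try it on.
example :: Dynamic
example = toDyn [[1, 2], [3 :: Int]]
```

Note that a String is a list of Char under this scheme, so it would be printed element by element unless a dedicated String case is placed before the list case.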

Uniqueness and other restrictions for Arbitrary in QuickCheck

I'm trying to write a modified Arbitrary instance for my data type, where (in my case) a subcomponent has a type [String]. I would ideally like to bring uniqueness in the instance itself, that way I don't need ==> headers / prerequisites for every test I write.
Here's my data type:
data Foo = Vars [String]
and the trivial Arbitrary instance:
instance Arbitrary Foo where
arbitrary = Vars <$> (:[]) <$> choose ('A','z')
This instance is strange, I know. In the past, I've had difficulty when quickcheck combinatorically explodes, so I'd like to keep these values small. Another request - how can I make an instance where the generated strings are under 4 characters, for instance?
All of this, fundamentally requires (boolean) predicates to augment Arbitrary instances. Is this possible?

You definitely want the instance to produce only values that match the intention of the data type. If you want all the variables to be distinct, the Arbitrary instance must reflect this. (Another question is whether it wouldn't make more sense in this case to store the variables in a set, for example data Foo = Vars (Set String).)
I'd suggest to check for duplicates using Set or Hashtable, as nub has O(n^2) complexity, which might slow down your test considerably for larger inputs. For example:
import Control.Applicative
import Data.List (nub)
import qualified Data.Set as Set
import Test.QuickCheck
newtype Foo = Vars [String]
-- | Checks if a given list has no duplicates in _O(n log n)_.
hasNoDups :: (Ord a) => [a] -> Bool
hasNoDups = loop Set.empty
  where
    loop _ [] = True
    loop s (x:xs) | s' <- Set.insert x s, Set.size s' > Set.size s
                  = loop s' xs
                  | otherwise
                  = False
-- | Always worth to test if we wrote `hasNoDups` properly.
prop_hasNoDups :: [Int] -> Property
prop_hasNoDups xs = hasNoDups xs === (nub xs == xs)
Your instance then needs to create a list of list, and each list should be randomized. So instead of (: []), which creates just a singleton list (and just one level), you need to call listOf twice:
instance Arbitrary Foo where
  arbitrary = Vars <$> (listOf . listOf $ choose ('A','z'))
                         `suchThat` hasNoDups
Also notice that choose ('A', 'z') allows all characters between A and z, which includes several punctuation characters ([, \, ], ^, _ and `). My guess is that you rather want something like
oneof [choose ('A','Z'), choose ('a','z')]
If you really want, you could also make hasNoDups O(n) using hash tables in the ST monad.
Concerning limiting the size: you could always have your own parametrized functions that produce different Gen Foo, but I'd say in most cases it's not necessary. Gen has its own internal size parameter, which is increased throughout the tests (see this answer), so different sizes of lists (as generated using listOf) are covered.
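If you do want a hard bound, such as the "under 4 characters" request in the question, one option is to pick the length explicitly instead of relying on the size parameter. A minimal sketch, assuming names of 1 to 3 letters are wanted (shortName, shortNames and sampleShortNames are illustrative names, not part of QuickCheck):

```haskell
import Test.QuickCheck (Gen, choose, elements, generate, listOf, vectorOf)

-- A name of 1 to 3 ASCII letters.
shortName :: Gen String
shortName = do
  len <- choose (1, 3)
  vectorOf len (elements (['A' .. 'Z'] ++ ['a' .. 'z']))

-- A list of such names; its length still follows QuickCheck's size parameter.
shortNames :: Gen [String]
shortNames = listOf shortName

-- Draw one sample in IO, e.g. for a quick look in ghci.
sampleShortNames :: IO [String]
sampleShortNames = generate shortNames
```

This can be combined with suchThat hasNoDups in the same way as the listOf . listOf version.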
But I'd suggest implementing shrink, as this will give you much nicer counter-examples. For example, if we define a (deliberately wrong) test that tries to verify that no Foo contains 'a' in any of its variables:
prop_Foo_hasNoDups :: Foo -> Property
prop_Foo_hasNoDups (Vars xs) = all (notElem 'a') xs === True
we'll get ugly counter-examples such as
Vars ["RhdaJytDWKm","FHHhrqbI","JVPKGTqNCN","awa","DABsOGNRYz","Wshubp","Iab","pl"]
But adding
shrink (Vars xs) = map Vars $ shrink xs
to Arbitrary Foo makes the counter-example to be just
Vars ["a"]

suchThat :: Gen a -> (a -> Bool) -> Gen a is a way to embed Boolean predicates in a Gen. See the haddocks for more info.
Here's how you would make the instance unique:
instance Arbitrary Foo where
  arbitrary = Vars <$> ((:[]) . (:[]) <$> choose ('A','z'))
                         `suchThat` isUnique
    where
      isUnique x = nub x == x  -- nub comes from Data.List

Sort by constructor ignoring (part of) value

Suppose I have
data Foo = A String Int | B Int
I want to take an xs :: [Foo] and sort it such that all the As are at the beginning, sorted by their strings, but with the ints in the order they appeared in the list, and then have all the Bs at the end, in the same order they appeared.
In particular, I want to create a new list containing the first A of each string and the first B.
I did this by defining a function taking Foos to (Int, String)s and using sortBy and groupBy.
Is there a cleaner way to do this? Preferably one that generalizes to at least 10 constructors.
Typeable, maybe? Something else that's nicer?
EDIT: This is used for processing a list of Foos that is used elsewhere. There is already an Ord instance which is the normal ordering.

You can use
sortBy (comparing foo)
where foo is a function that extracts the interesting parts into something comparable (e.g. Ints).
In the example, since you want the As sorted by their Strings, a mapping to Int with the desired properties would be too complicated, so we use a compound target type.
foo (A s _) = (0,s)
foo (B _) = (1,"")
would be a possible helper. This is more or less equivalent to Tikhon Jelvis' suggestion, but it leaves space for the natural Ord instance.
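Putting the pieces together, a sketch of the whole pipeline could look like this. The name key stands in for foo above, and sortByKey and firstOfEach are illustrative names. It relies on sortBy being a stable sort, so elements with equal keys keep their original relative order.

```haskell
import Data.Function (on)
import Data.List (groupBy, sortBy)
import Data.Ord (comparing)

data Foo = A String Int | B Int deriving (Eq, Show)

-- Constructor rank first, then the string for A; the Int payloads are ignored.
key :: Foo -> (Int, String)
key (A s _) = (0, s)
key (B _)   = (1, "")

-- Stable sort: As (by string) before Bs, ties stay in input order.
sortByKey :: [Foo] -> [Foo]
sortByKey = sortBy (comparing key)

-- First A of each string and the first B.
firstOfEach :: [Foo] -> [Foo]
firstOfEach xs = [x | (x : _) <- groupBy ((==) `on` key) (sortByKey xs)]
```

For more constructors, only key needs new equations, one per constructor.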

To make it easier to build comparison function for ADTs with large number of constructors, you can map values to their constructor index with SYB:
{-# LANGUAGE DeriveDataTypeable #-}
import Data.Generics
data Foo = A String Int | B Int deriving (Show, Eq, Typeable, Data)
cIndex :: Data a => a -> Int
cIndex = constrIndex . toConstr
Example:
*Main Data.Generics> cIndex $ A "foo" 42
1
*Main Data.Generics> cIndex $ B 0
2

Edit: After re-reading your question, I think the best option is to make Foo an instance of Ord. I do not think there is any way to do this automatically that will act the way you want (just using deriving will create different behavior).
Once Foo is an instance of Ord, you can just use sort from Data.List.
In your exact example, you can do something like this:
data Foo = A String Int | B Int deriving (Eq)
instance Ord Foo where
  (A _ _) <= (B _)    = True
  (A s _) <= (A s' _) = s <= s'
  (B _)   <= (B _)    = True
  (B _)   <= (A _ _)  = False
When something is an instance of Ord, it means the data type has some ordering. Once we know how to order something, we can use a bunch of existing functions (like sort) on it and it will behave how you want. Anything in Ord has to be part of Eq, which is what the deriving (Eq) bit does automatically.
You can also derive Ord. However, the behavior will not be exactly what you want--it will order by all of the fields if it has to (e.g. it will put As with the same string in order by their integers).
Further edit: I was thinking about it some more and realized my solution is probably semantically wrong.
An Ord instance is a statement about your whole data type. For example, I'm saying that Bs are always equal with each other when the derived Eq instance says otherwise.
If the data you're representing always behaves like this (that is, Bs are all equal and As with the same string are all equal) then an Ord instance makes sense. Otherwise, you should not actually do this.
However, you can do something almost exactly like this: write your own special compare function (Foo -> Foo -> Ordering) that encapsulates exactly what you want to do then use sortBy. This properly codifies that your particular sorting is special rather than the natural ordering of the data type.

You could use some template haskell to fill in the missing transitive cases. The mkTransitiveLt creates the transitive closure of the given cases (if you order them least to greatest). This gives you a working less-than, which can be turned into a function that returns an Ordering.
{-# LANGUAGE TemplateHaskell #-}
import MkTransitiveLt
import Data.List (sortBy)
data Foo = A String Int | B Int | C | D | E deriving(Show)
cmp a b = $(mkTransitiveLt [|
  case (a, b) of
    (A _ _, B _) -> True
    (B _, C) -> True
    (C, D) -> True
    (D, E) -> True
    (A s _, A s' _) -> s < s'
    otherwise -> False|])
lt2Ord f a b =
  case (f a b, f b a) of
    (True, _) -> LT
    (_, True) -> GT
    otherwise -> EQ
main = print $ sortBy (lt2Ord cmp) [A "Z" 1, A "A" 1, B 1, A "A" 0, C]
Generates:
[A "A" 1,A "A" 0,A "Z" 1,B 1,C]
mkTransitiveLt must be defined in a separate module:
module MkTransitiveLt (mkTransitiveLt)
where
import Language.Haskell.TH
mkTransitiveLt :: ExpQ -> ExpQ
mkTransitiveLt eq = do
  CaseE e ms <- eq
  return . CaseE e . reverse . foldl go [] $ ms
  where
    go ms m@(Match (TupP [a, b]) body decls) = (m:ms) ++
      [Match (TupP [x, b]) body decls | Match (TupP [x, y]) _ _ <- ms, y == a]
    go ms m = m:ms
