What is the best practice to generate data which satisfy specific property in QuickCheck? - haskell

When we are using QuickCheck to check our programs, we need to define generators for our data, there is some generic way to define them, but the generic way usually become useless when we need the generated data to satisfy some constraints to work.
e.g.
data Expr
= LitI Int
| LitB Bool
| Add Expr Expr
| And Expr Expr
data TyRep = Number | Boolean
typeInfer :: Expr -> Maybe TyRep
typeInfer e = case e of
LitI _ -> Number
LitB _ -> Boolean
Add e1 e2 -> case (typeInfer e1, typeInfer e2) of
(Just Number, Just Number) -> Just Number
_ -> Nothing
And e1 e2 -> case (typeInfer e1, typeInfer e2) of
(Just Boolean, Just Boolean) -> Just Boolean
_ -> Nothing
now I need to define generator of Expr (i.e. Gen Expr or instance Arbitrary Expr), but also want it generates the type correct ones (i.e. isJust (typeInfer generatedExpr))
a naive way to do that is use suchThat to filter out the invalid ones, but that is obviously inefficient when Expr and TyRep becomes complicated with more cases.
Another similar situation is about reference integrity, e.g.
data Expr
= LitI Int
| LitB Bool
| Add Expr Expr
| And Expr Expr
| Ref String -- ^ reference another Expr via it's name
type Context = Map String Expr
In this case, we want all the referenced names in the generated Expr are contained in some specific Context, now I have to generate Expr for specific Context:
arbExpr :: Context -> Gen Expr
but now shrink will be a problem, and to solve this problem, I have to define a specific version of shrink, and use forAllShrink everytime I use arbExpr, that means a lot of work.
So I want to know, is there a best practice to do such things?

For well-typed terms, a simple approach in many cases is to have one generator for each type, or, equivalently, a function TyRep -> Gen Expr. Adding variables on top of that, this usually turns into a function Context -> TyRep -> Gen Expr.
In the case of generating terms with variables (and with no or very simple types), indexing the type of terms by the context (e.g., like you would do using the bound library) should make it quite easy to derive a generator generically.
For shrinking, hedgehog's approach can work quite well, where Gen generates a value together with shrunk versions, sparing you from defining a separate shrinking function.
Note that as the well-formedness/typing relation becomes more complex, you start hitting the theoretical wall where generating terms is at least as hard as arbitrary proof search.
For more advanced techniques/related literature, with my own comments about possibly using it in Haskell:
Generating Constrained Data with Uniform Distribution, by Claessen et al., FLOPS'14 (PDF). I believe the Haskell package lazy-search has most of the machinery described by the paper, but it seems aimed at enumeration rather than random generation.
Making Random Judgments: Automatically Generating Well-Typed Terms from the Definition of a Type-System, by Fetscher et al., ESOP'15 (PDF), the title says it all. I don't know about a Haskell implementation though; you might want to ask the authors.
Beginner's Luck: A Language for Property-Based Generators, by Lampropoulos et al., POPL'17 (PDF) (disclaimer: I'm a coauthor). A language of properties (more concretely, functions T -> Bool, e.g., a typechecker) that can be interpreted as random generators (Gen T). The language's syntax is strongly inspired by Haskell, but there are still a few differences. The implementation has an interface to extract the generated values in Haskell (github repo).
Generating Good Generators for Inductive Relations, by Lampropoulos et al. POPL'18 (PDF). It's in Coq QuickChick, but tying it to Haskell QuickCheck by extraction seems reasonably feasible.

Related

Factoring out recursion in a complex AST

For a side project I am working on I currently have to deal with an abstract syntax tree and transform it according to rules (the specifics are unimportant).
The AST itself is nontrivial, meaning it has subexpressions which are restricted to some types only. (e.g. the operator A must take an argument which is of type B only, not any Expr. A drastically simplified reduced version of my datatype looks like this:
data Expr = List [Expr]
| Strange Str
| Literal Lit
data Str = A Expr
| B Expr
| C Lit
| D String
| E [Expr]
data Lit = Int Int
| String String
My goal is to factor out the explicit recursion and rely on recursion schemes instead, as demonstrated in these two excellent blog posts, which provide very powerful general-purpose tools to operate on my AST. Applying the necessary factoring, we end up with:
data ExprF a = List [a]
| Strange (StrF a)
| Literal (LitF a)
data StrF a = A a
| B a
| C (LitF a)
| D String
| E [a]
data LitF a = Int Int
| String String
If I didn't mess up, type Expr = Fix ExprF should now be isomorphic to the previously defined Expr.
However, writing cata for these cases becomes rather tedious, as I have to pattern match B a :: StrF a inside of an Str :: ExprF a for cata to be well-typed. For the entire original AST this is unfeasible.
I stumbled upon fixing GADTs, which seems to me like it is a solution to my problem, however the user-unfriendly interface of the duplicated higher-order type classes etc. is quite the unneccessary boilerplate.
So, to sum up my questions:
Is rewriting the AST as a GADT the correct way to go about this?
If yes, how could I transform the example into a well-working version? On a second note, is there better support for higher kinded Functors in GHC now?
If you've gone through the effort of to separate out the recursion in your data type, then you can just derive Functor and you're done. You don't need any fancy features to get the recursion scheme. (As a side note, there's no reason to parameterize the Lit data type.)
The fold is:
newtype Fix f = In { out :: f (Fix f) }
gfold :: (Functor f) => (f a -> a) -> Fix f -> a
gfold alg = alg . fmap (gfold alg) . out
To specify the algebra (the alg parameter), you need to do a case analysis against ExprF, but the alternative would be to have the fold have a dozen or more parameters: one for each data constructor. That wouldn't really save you much typing and would be much harder to read. If you want (and this may require rank-2 types in general), you can package all those parameters up into a record and then you could use record update to update "pre-made" records that provide "default" behavior in various circumstances. There's an old paper Dealing with Large Bananas that takes an approach like this. What I'm suggesting, to be clear, is just wrapping the gfold function above with a function that takes a record, and passes in an algebra that will do the case analysis and call the appropriate field of the record for each case.
Of course, you could use GHC Generics or the various "generic/polytypic" programming libraries like Scrap Your Boilerplate instead of this. You are basically recreating what they do.

Nearest equivalent to Prolog atom or Lisp symbol in Haskell

I'm trying to write a simple program for manipulating expressions in propositional calculus and would like a nice syntax for proposition variables (e.g. 'P or something).
Strings get the job done, but the syntax is misleading in this context and they permit inappropriate operations like ++.
Syntactically, I'd like to be able to write down something that does not "look quoted" visually (something like 'P is okay, though). In terms of the supported operations, I'd like to be able to determine whether two symbols are equal and to convert them into a string matching their name via show. I'd also like these things to be open (ADTs with only nullary constructors are similar in principle to symbols, but require all variants to be declared in advance).
Here's a toy example using strings where something symbol-like would be more appropriate.
type Var = String
data Proposition =
Primitive Var |
Negated Proposition |
Implication Proposition Proposition
instance Show Proposition where
show (Primitive p) = p
show (Negated n) = "!" ++ show n
show (Implication ant cons) =
"(" ++ show ant ++ "->" ++ show cons ++ ")"
main = putStrLn $ show $ Implication (Primitive "A") (Primitive "B")
Typically the way this is done in Haskell is by parameterizing over the type of symbols. So your example would become:
data Proposition a =
Primitive a |
Negated (Proposition a) |
Implication (Proposition a) (Proposition a)
which then leaves it up to the user to decide the best representation their symbols. This has advantages over LISP-like symbols: symbols intended for different purposes will not be mixed up, and data structures involving symbols now admit transformations over all the symbols, which are more useful than you realize. For example, Functor changes between symbol representations, and Monad models substitution.
(=<<) :: (a -> Proposition b) -> Proposition a -> Proposition b
^ ^^^^^^^^^^^^^ ^^^^^^^^^^^^^
substitute each free var with an expression in this expression
You can get a form of type-safe openness too:
implyOpen :: Proposition a -> Proposition b -> Proposition (Either a b)
implyOpen p q = Implication (Left <$> p) (Right <$> q)
Another fun trick is using a non-regular recursive type to model variable bindings in a type-safe way.
data Proposition a =
... |
ForAll (Proposition (Maybe a))
Here we have added one "free variable" to the inner proposition -- Primitive Nothing is the variable being quantified over. It may seem awkward at first, but when you get to coding it's bomb, because the types make it very hard to get it wrong.
bound is an excellent package for modelling expression languages based on this idea (and a few other tricks).

Omitting data type constructors

I'm trying to implement the following in Haskell:
0,1,2,...:N
x,y,z,...:V
+,*,-,/,...:F
F alias for Expr -> Expr -> Expr
Expr := N|V|F Expr Expr
My question is first:
Is the grammar flawed at type level? Does it make sense? All terms look like they'd type check (allowing for 0,1,... to be both Expr and N subtype, and x,y,... to be both Expr and V subtype).
And secondarily, what's the closest Haskell implementation? My current Haskell implementation is:
data F = +|-|*|...
data Expr = N|V|MakeExpr F Expr Expr
Any suggestions?
EDIT -
The key difference between the grammar and implementation is that type constructor is implicit /omitted in the grammar. Why are type constructors compulsory in Haskell?
The key difference between the grammar and implementation is that type constructor is implicit /omitted in the grammar. Why are type constructors compulsory in Haskell?
.
Is the grammar flawed at type level? Does it make sense? All terms look like they'd type check (allowing for 0,1,... to be both Expr and N subtype, and x,y,... to be both Expr and V subtype
The reason data constructors1 are compulsory in Haskell is specifically to ensure that you can't have x, y, .. be both Expr and V subtypes.
So your grammar looks like a reasonable model for how you want your language terms to work. But it doesn't make sense as a direct design for how you want to represent your language terms as Haskell data types.
Basically, Haskell deliberately does not have subtypes. It ensures that when you create a new type (with newtype or data) that all of the values of the new type are distinct from the values of all other existing types (and all types that will be created in future). It does this by having values of user-defined types always appear inside constructors (and making it impossible to "reuse" constructors; you always make new ones whenever you make a new type).
The way Haskell's type system works depends on this lack of subtyping. You could design a language that allowed subtypes (see Scala, perhaps). But it just fundamentally wouldn't be Haskell.
But what you can do instead is define something like:
data Expr
= ExprN N
| ExprV V
| ExprF Expr Expr
You still can't have a N value and just use it as an Expr. But you can just apply ExprN to it, and then you have an Expr. And it's really no more burden than if Haskell allowed you to use some n of type N as an Expr as well, but only required you to add a type annotation clarifying that that's what you meant; you just have to say ExprN n instead of n :: Expr.
Similarly when you have an Expr and you want to apply a function on N to it, the case statement to extract the N from the ExprN constructor (if it's there) isn't really any more code than you'd have to write to check if your Expr was actually an N.
1 "Type constructor" is a specific term in Haskell, which isn't what we're talking about here. I'm pretty sure what you meant by that was "the constructors for a type", but to be pedantic you accidentally referred to a different thing by using that term.
To clear it up, when you declare a type like data Maybe a = Nothing | Just a, Nothing and Just are new data constructors ("constructor" on its on is extremely likely to mean a data constructor) and Maybe is a new type constructor.

Extending algebraic data type

Note: if this question is somehow odd, this is because I was only recently exposed to Haskell and am still adapting to the functional mindset.
Considering a data type like Maybe:
data MyOwnMaybe a = MyOwnNothing | MyOwnJust a
everyone using my data type will write functions like
maybeToList :: MyOwnMaybe a -> [a]
maybeToList MyOwnNothing = []
maybeToList (MyOwnJust x) = [x]
Now, suppose that, at a later time, I wish to extend this data type
data MyOwnMaybe a = MyOwnNothing | MyOwnJust a | SuperpositionOfNothingAndJust a
how do I make sure that everyone's functions will break at compile-time?
Of course, there is the chance that somehow I'm not "getting" algebraic data types and maybe I shouldn't be doing this at all, but considering a data type Action
data Action = Reset | Send | Remove
it would seem that adding an extra Action like Add would not be so uncommon (and I wouldn't want to risk having all these functions around that possibly cannot handle my new Action)
Well, bad news first: sometimes you just can't do it. Period.
But that is language-agnostic; in any language you sometimes have to break interface. There is no way around it.
Now, good news: you can actually go a great length before you have to do that.
You just have to carefully consider what you export from your module. If, instead of exporting the internal workings of it, you export high-level functions, then there is a good chance you can rewrite those function using the new data type, and everything would go smooth.
In particular, be very careful when exporting data constructors. In this case, you don't just export functions that create your data; you are also exporting the possibility of pattern-matching; and that is not something that ties you pretty tight.
So, in your example, if you write functions like
myOwnNothing :: MyOwnMaybe a
myOwnJust :: a -> MyOwnMaybe a
and
fromMyOwnMaybe :: MyOwnMaybe a -> b -> (a -> b) -> b
fromMyOwnMaybe MyOwnNothing b _ = b
fromMyOwnMaybe (MyOwnJust a) _ f = f a
then it's reasonable to assume that you would be able to reimplement it for the updated MyOwnMaybe data type; so, just export those functions and the data type itself, but don't export constructors.
The only situation in which you would benefit from exporting constructors is when you are absolutely sure that your data type won't ever change. For example, Bool would always have only two (fully defined) values: True and False, it won't be extended by some FileNotFound or anything (although Edward Kmett might disagree). Ditto Maybe or [].
But the idea is more general: stay as high-level as you can.
You seem to know that GHC can warn about non-exhaustive pattern matches in function via the -W flag or explicitly with -fwarn-incomplete-patterns.
There is a good discussion about why these warnings are not automatically compile-time errors at this SO question:
In Haskell, why non-exhaustive patterns are not compile-time errors?
Also, consider this case where you have an ADT with a large number of constructors:
data Alphabet = A | B | C | ... | X | Y | Z
isVowel :: Alphabet -> Bool
isVowel A = True
isVowel E = True
isVowel I = True
isVowel O = True
isVowel U = True
isVowel _ = False
A default case is used as a convenience to avoid having to write out the other 21 cases.
Now if you add an addition constructor to Alphabet, should isVowel be flagged as "incomplete"?
One thing that a lot of modules do is not to export their constructors. Instead, they export functions that can be used (“smart constructors”). If you change your ADT later, you have to fix your functions in the module, but no one else's code gets broken.

What is "Scrap Your Boilerplate"?

I see people talking about Scrap Your Boilerplate and generic programming in Haskell. What do these terms mean? When would I want to use Scrap Your Boilerplate, and how do I use it?
Often when making transformations on complex data types, we only need to affect small pieces of the structure---in other words, we're targeting specific reducible expressions, redexes, alone.
The classic example is double-negation elimination over a type of integer expressions:
data Exp = Plus Exp Exp | Mult Exp Exp | Negate Exp | Pure Int
doubleNegSimpl :: Exp -> Exp
doubleNegSimpl (Negate (Negate e)) = e
...
Even in describing this example, I'd prefer not to write out the entirety of the ... part. It is completely mechanical---nothing more than the engine for continuing the recursion throughout the entirety of the Exp.
This "engine" is the boilerplate we intend to scrap.
To achieve this, Scrap Your Boilerplate suggests a mechanism by which we can construct "generic traversals" over data types. These traversals operate exactly correctly without knowing anything at all about the specific data type in question. To do this, very roughly, we have a notion of generic annotated trees. These are larger than ADTs in such a way that all ADTs can be projected into the type of annotated trees:
section :: Generic a => a -> AnnotatedTree
and "valid" annotated trees can be projected back into some brand of ADT
retract :: Generic a => AnnotatedTree -> Maybe a
Notably, I'm introducing the Generic typeclass to indicate types which have section and retract defined.
Using this generic, annotated tree representation of all data types, we can define a traversal once and for all. In particular, we provide an interface (using section and retract strategically) so that end users are never exposed to the AnnotatedTree type. Instead, it looks a bit like:
everywhere' :: Generic a => (a -> a) -> (AnnotatedTree -> AnnotatedTree)
such that, combined with final and initial section and retracts and the invariant that our annotated trees are always "valid", we have
everywhere :: Generic a => (a -> a) -> (a -> a)
everywhere f a0 = fromJust . retract . everywhere' f . section
What does everywhere f a do? It tries to apply the function f "everywhere" in the ADT a. In other words, we now write our double negation simplification as follows
doubleNegSimpl :: Exp -> Exp
doubleNegSimpl (Negate (Negate e)) = e
doubleNegSimpl e = e
In other words, it acts as id whenever the redex (Negate (Negate _)) fails to match. If we apply everywhere to this
simplify :: Exp -> Exp
simplify = everywhere doubleNegSimpl
then double negations will be eliminated "everywhere" via a generic traversal. The ... boilerplate is gone.

Resources