Extending algebraic data type

Extending algebraic data type - haskell

Note: if this question is somehow odd, this is because I was only recently exposed to Haskell and am still adapting to the functional mindset.
Considering a data type like Maybe:
data MyOwnMaybe a = MyOwnNothing | MyOwnJust a
everyone using my data type will write functions like
maybeToList :: MyOwnMaybe a -> [a]
maybeToList MyOwnNothing = []
maybeToList (MyOwnJust x) = [x]
Now, suppose that, at a later time, I wish to extend this data type
data MyOwnMaybe a = MyOwnNothing | MyOwnJust a | SuperpositionOfNothingAndJust a
how do I make sure that everyone's functions will break at compile-time?
Of course, there is the chance that somehow I'm not "getting" algebraic data types and maybe I shouldn't be doing this at all, but considering a data type Action
data Action = Reset | Send | Remove
it would seem that adding an extra Action like Add would not be so uncommon (and I wouldn't want to risk having all these functions around that possibly cannot handle my new Action)

Well, bad news first: sometimes you just can't do it. Period.
But that is language-agnostic; in any language you sometimes have to break interface. There is no way around it.
Now, good news: you can actually go a great length before you have to do that.
You just have to carefully consider what you export from your module. If, instead of exporting the internal workings of it, you export high-level functions, then there is a good chance you can rewrite those function using the new data type, and everything would go smooth.
In particular, be very careful when exporting data constructors. In this case, you don't just export functions that create your data; you are also exporting the possibility of pattern-matching; and that is not something that ties you pretty tight.
So, in your example, if you write functions like
myOwnNothing :: MyOwnMaybe a
myOwnJust :: a -> MyOwnMaybe a
and
fromMyOwnMaybe :: MyOwnMaybe a -> b -> (a -> b) -> b
fromMyOwnMaybe MyOwnNothing b _ = b
fromMyOwnMaybe (MyOwnJust a) _ f = f a
then it's reasonable to assume that you would be able to reimplement it for the updated MyOwnMaybe data type; so, just export those functions and the data type itself, but don't export constructors.
The only situation in which you would benefit from exporting constructors is when you are absolutely sure that your data type won't ever change. For example, Bool would always have only two (fully defined) values: True and False, it won't be extended by some FileNotFound or anything (although Edward Kmett might disagree). Ditto Maybe or [].
But the idea is more general: stay as high-level as you can.

You seem to know that GHC can warn about non-exhaustive pattern matches in function via the -W flag or explicitly with -fwarn-incomplete-patterns.
There is a good discussion about why these warnings are not automatically compile-time errors at this SO question:
In Haskell, why non-exhaustive patterns are not compile-time errors?
Also, consider this case where you have an ADT with a large number of constructors:
data Alphabet = A | B | C | ... | X | Y | Z
isVowel :: Alphabet -> Bool
isVowel A = True
isVowel E = True
isVowel I = True
isVowel O = True
isVowel U = True
isVowel _ = False
A default case is used as a convenience to avoid having to write out the other 21 cases.
Now if you add an addition constructor to Alphabet, should isVowel be flagged as "incomplete"?

One thing that a lot of modules do is not to export their constructors. Instead, they export functions that can be used (“smart constructors”). If you change your ADT later, you have to fix your functions in the module, but no one else's code gets broken.

Related

Writing modules in Haskell the right way

(I'm totally rewriting this question to give it a better focus; you can see the history of changes if you want to see the original.)
Let's say I have two modules:
One module defines the function inverseAndSqrt. What this function actually does is not important; what is important is that it returns none, one, or both of two things in a way that the client can distinguish which one is which;
module Module1 (inverseAndSqrt) where
type TwoOpts a = (Maybe a, Maybe a)
inverseAndSqrt :: Int -> TwoOpts Float
inverseAndSqrt x = (if x /= 0 then Just (1.0/(fromIntegral x)) else Nothing,
if x >= 0 then Just (sqrt $ fromIntegral x) else Nothing)
another module defines other functions depending on inverseAndSqrt and on its type
module Module2 where
import Module1
fun :: (Maybe Float, Maybe Float) -> Float
fun (Just x, Just y) = x + y
fun (Just x, Nothing) = x
fun (Nothing, Just y) = y
exportedFun :: Int -> Float
exportedFun = fun . inverseAndSqrt
What I want to understand from the perspective of design principle is: how should I interface Module1 with other modules (e.g. Module2) in a way that makes it well encapsulated, reusable, etc?
The problems I see are
I could one day decide that I don't want to use a pair to return the two results anymore; I could decide to use a 2 elements list; or another type which is isomorphic (I think this is the right adjective, isn't it?) to a pair; if I do this, all client code will break
Exporting the TwoOpts type synonym doesn't solve anything, as Module1 could still change its implementation thus breaking client code.
Module1 is also forcing the type of the two optionals to be the same, but I'm not sure this is really relevant to this question...
How should I design Module1 (and thus edit Module2 as well) such that the two are not tightly coupled?
One thing I can think of is that maybe I should define a typeclass expressing what "a box with two optional things in it" is, and then Module1 and Module2 would use that as a common interface. But should that be in both module? In either of them? Or in none of them, in a third module? Or maybe such a class/concept is not needed?
I'm not a computer scientist so I'm sure that this question highlights some misunderstanding of mine due to lack of experience and theoretical background. Any help filling the gaps is welcome.
Possible modifications I'd like to support
Related to what chepner suggested in a comment to his answer, at some point I might want to extend the support from 2-tuple things to both 2- and 3-tuple things, having different accessor names for them, suche as get1of2/get2of2 (let's say these are the name we use when we first design Module1) vs get1of3/get2of3/get3of3.
At some point I would also be able to complement this 2-tuple-like type with something else, for instance an optional containing Just the sum¹ of the two main contents only if they are both Justs, or a Nothing if at least one of the two main contents is a Nothing. I guess in this case the internal representation of this class would be something like ((Maybe a, Maybe a), Maybe b) (¹ The sum is really a stupid example, so I've used b here instead of a to be more general than the sum would require).

To me, Haskell design is all type-centric. The design rule for functions is just "use the most general and accurate types that do the job", and the whole problem of design in Haskell is about coming up with the best types for the job.
We would like there to be no "junk" in the types, so that they have exactly one representation for each value you want to denote. E.g. String is a bad representation for numbers, because "0", "0.0", "-0" all mean the same thing, and also because "The Prisoner" is not a number -- it is a valid representation that does not have a valid denotation. If, say for performance reasons, the same denotation can be represented multiple ways, the type's API should make that difference invisible to the user.
So in your case, (Maybe a, Maybe a) is perfect -- it means exactly what you need it to mean. Using something more complicated is unnecessary, and will just complicate matters for the user. At some point whatever you expose will have to be convertible to a Maybe a for the first thing and a Maybe a for the second thing, and there is no extra information than that, so the tuple is perfect. Whether you use a type synonym or not is a matter of style -- I prefer not use synonyms at all and only give types names when I have a more formal abstraction in mind.
Connotation is important. For example, if I had a function for finding the roots of a quadratic polynomial, I probably wouldn't use TwoOpts, even though there are at most two of them. The fact that my return values are all "the same kind of thing" in an intuitive sense makes me prefer a list (or if I'm feeling particularly picky, a Set or Bag), even if the list has at most two elements. I just have it match my best understanding of the domain at the time, so I won't change it unless my understanding of the domain has changed in a significant way, in which case the opportunity to review all its uses is exactly what I want. If you are writing your functions to be as polymorphic as possible, then often you won't need to change anything but the specific moments the meaning is used, the exact moment domain knowledge is required (such as understanding the relationship between TwoOpts and Set). You don't need to "redo the plumbing" if it's made of a sufficiently flexible, polymorphic material.
Supposing you didn't have a clean isomorphism to a standard type like (Maybe a, Maybe a), and you wanted to formalize TwoOpts. The way here is to build an API out of its constructors, combinators, and eliminators. For example:
data TwoOpts a -- abstract, not exposed
-- constructors
none :: TwoOpts a
justLeft :: a -> TwoOpts a
justRight :: a -> TwoOpts a
both :: a -> a -> TwoOpts a
-- combinators
-- Semigroup and Monoid at least
swap :: TwoOpts a -> TwoOpts a
-- eliminators
getLeft :: TwoOpts a -> Maybe a
getRight :: TwoOpts a -> Maybe a
In this case the eliminators give exactly your representation (Maybe a, Maybe a) as their final coalgebra.
-- same as the tuple in a newtype, just more conventional
data TwoOpts a = TwoOpts (Maybe a) (Maybe a)
Or if you wanted to focus on the constructors side you could use an initial algebra
data TwoOpts a
= None
| JustLeft a
| JustRight a
| Both a a
You are at liberty to change this representation as long as it still implements the combinatory API above. If you have reason to use different representations of the same API, make the API into a typeclass (typeclass design is a whole other story).
In Einstein's famous words, "make it as simple as possible, but no simpler".

Don't define a simple type alias; this exposes the details of how you implement TwoOpts.
Instead, define a new type, but don't export the data constructor, but rather functions for accessing the two components. Then you are free to change the implementation of the type all you like without changing the interface, because the user can't pattern-match on a value of type TwoOpts a.
module Module1 (TwoOpts, inverseAndSqrt, getFirstOpt, getSecondOpt) where
data TwoOpts a = TwoOpts (Maybe a) (Maybe a)
getFirstOpt, getSecondOpt :: TwoOpts a -> Maybe a
getFirstOpt (TwoOpts a _) = a
getSecondOpt (TwoOpts _ b) = b
inverseAndSqrt :: Int -> TwoOpts Float
inverseAndSqrt x = TwoOpts (safeInverse x) (safeSqrt x)
where safeInverse 0 = Nothing
safeInverse x = Just (1.0 / fromIntegral x)
safeSqrt x | x >= 0 = Just $ sqrt $ fromIntegral x
| otherwise = Nothing
and
module Module2 where
import Module1
fun :: TwoOpts Float -> Float
fun a = case (getFirstOpts a, getSecondOpt a) of
(Just x, Just y) -> x + y
(Just x, Nothing) -> x
(Nothing, Just y) -> y
exportedFun :: Int -> Float
exportedFun = fun . inverseAndSqrt
Later, when you realize that you've reimplemented the type product, you can change your definitions without affecting any user code.
newtype TwoOpts a = TwoOpts { getOpts :: (Maybe a, Maybe a) }
getFirstOpt, getSecondOpt :: TwoOpts a -> Maybe a
getFirstOpt = fst . getOpts
getSecondOpt = snd . getOpts

Haskell: Typeclass vs passing a function

To me it seems that you can always pass function arguments rather than using a typeclass. For example rather than defining equality typeclass:
class Eq a where
(==) :: a -> a -> Bool
And using it in other functions to indicate type argument must be an instance of Eq:
elem :: (Eq a) => a -> [a] -> Bool
Can't we just define our elem function without using a typeclass and instead pass a function argument that does the job?

Yes. This is called "dictionary passing style". Sometimes when I am doing some especially tricky things, I need to scrap a typeclass and turn it into a dictionary, because dictionary passing is more powerful1, yet often quite cumbersome, making conceptually simple code look quite complicated. I use dictionary passing style sometimes in languages that aren't Haskell to simulate typeclasses (but have learned that that is usually not as great an idea as it sounds).
Of course, whenever there is a difference in expressive power, there is a trade-off. While you can use a given API in more ways if it is written using DPS, the API gets more information if you can't. One way this shows up in practice is in Data.Set, which relies on the fact that there is only one Ord dictionary per type. The Set stores its elements sorted according to Ord, and if you build a set with one dictionary, and then inserted an element using a different one, as would be possible with DPS, you could break Set's invariant and cause it to crash. This uniqueness problem can be mitigated using a phantom existential type to mark the dictionary, but, again, at the cost of quite a bit of annoying complexity in the API. This also shows up in pretty much the same way in the Typeable API.
The uniqueness bit doesn't come up very often. What typeclasses are great at is writing code for you. For example,
catProcs :: (i -> Maybe String) -> (i -> Maybe String) -> (i -> Maybe String)
catProcs f g = f <> g
which takes two "processors" which take an input and might give an output, and concatenates them, flattening away Nothing, would have to be written in DPS something like this:
catProcs f g = (<>) (funcSemi (maybeSemi listSemi)) f g
We essentially had to spell out the type we're using it at again, even though we already spelled it out in the type signature, and even that was redundant because the compiler already knows all the types. Because there's only one way to construct a given Semigroup at a type, the compiler can do it for you. This has a "compound interest" type effect when you start defining a lot of parametric instances and using the structure of your types to compute for you, as in the Data.Functor.* combinators, and this is used to great effect with deriving via where you can essentially get all the "standard" algebraic structure of your type written for you.
And don't even get me started on MPTC's and fundeps, which feed information back into typechecking and inference. I have never tried converting such a thing to DPS -- I suspect it would involve passing around a lot of type equality proofs -- but in any case I'm sure it would be a lot more work for my brain than I would be comfortable with.
--
1Unless you use reflection in which case they become equivalent in power -- but reflection can also be cumbersome to use.

Yes. That (called dictionary passing) is basically what the compiler does to typeclasses anyway. For that function, done literally, it would look a bit like this:
elemBy :: (a -> a -> Bool) -> a -> [a] -> Bool
elemBy _ _ [] = False
elemBy eq x (y:ys) = eq x y || elemBy eq x ys
Calling elemBy (==) x xs is now equivalent to elem x xs. And in this specific case, you can go a step further: eq has the same first argument every time, so you can make it the caller's responsibility to apply that, and end up with this:
elemBy2 :: (a -> Bool) -> [a] -> Bool
elemBy2 _ [] = False
elemBy2 eqx (y:ys) = eqx y || elemBy2 eqx ys
Calling elemBy2 (x ==) xs is now equivalent to elem x xs.
...Oh wait. That's just any. (And in fact, in the standard library, elem = any . (==).)

Factoring out recursion in a complex AST

For a side project I am working on I currently have to deal with an abstract syntax tree and transform it according to rules (the specifics are unimportant).
The AST itself is nontrivial, meaning it has subexpressions which are restricted to some types only. (e.g. the operator A must take an argument which is of type B only, not any Expr. A drastically simplified reduced version of my datatype looks like this:
data Expr = List [Expr]
| Strange Str
| Literal Lit
data Str = A Expr
| B Expr
| C Lit
| D String
| E [Expr]
data Lit = Int Int
| String String
My goal is to factor out the explicit recursion and rely on recursion schemes instead, as demonstrated in these two excellent blog posts, which provide very powerful general-purpose tools to operate on my AST. Applying the necessary factoring, we end up with:
data ExprF a = List [a]
| Strange (StrF a)
| Literal (LitF a)
data StrF a = A a
| B a
| C (LitF a)
| D String
| E [a]
data LitF a = Int Int
| String String
If I didn't mess up, type Expr = Fix ExprF should now be isomorphic to the previously defined Expr.
However, writing cata for these cases becomes rather tedious, as I have to pattern match B a :: StrF a inside of an Str :: ExprF a for cata to be well-typed. For the entire original AST this is unfeasible.
I stumbled upon fixing GADTs, which seems to me like it is a solution to my problem, however the user-unfriendly interface of the duplicated higher-order type classes etc. is quite the unneccessary boilerplate.
So, to sum up my questions:
Is rewriting the AST as a GADT the correct way to go about this?
If yes, how could I transform the example into a well-working version? On a second note, is there better support for higher kinded Functors in GHC now?

If you've gone through the effort of to separate out the recursion in your data type, then you can just derive Functor and you're done. You don't need any fancy features to get the recursion scheme. (As a side note, there's no reason to parameterize the Lit data type.)
The fold is:
newtype Fix f = In { out :: f (Fix f) }
gfold :: (Functor f) => (f a -> a) -> Fix f -> a
gfold alg = alg . fmap (gfold alg) . out
To specify the algebra (the alg parameter), you need to do a case analysis against ExprF, but the alternative would be to have the fold have a dozen or more parameters: one for each data constructor. That wouldn't really save you much typing and would be much harder to read. If you want (and this may require rank-2 types in general), you can package all those parameters up into a record and then you could use record update to update "pre-made" records that provide "default" behavior in various circumstances. There's an old paper Dealing with Large Bananas that takes an approach like this. What I'm suggesting, to be clear, is just wrapping the gfold function above with a function that takes a record, and passes in an algebra that will do the case analysis and call the appropriate field of the record for each case.
Of course, you could use GHC Generics or the various "generic/polytypic" programming libraries like Scrap Your Boilerplate instead of this. You are basically recreating what they do.

Using different Ordering for Sets

I was reading a Chapter 2 of Purely Functional Data Structures, which talks about unordered sets implemented as binary search trees. The code is written in ML, and ends up showing a signature ORDERED and a functor UnbalancedSet(Element: ORDERED): SET. Coming from more of a C++ background, this makes sense to me; custom comparison function objects form part of the type and can be passed in at construction time, and this seems fairly analogous to the ML functor way of doing things.
When it comes to Haskell, it seems the behavior depends only on the Ord instance, so if I wanted to have a set that had its order reversed, it seems like I'd have to use a newtype instance, e.g.
newtype ReverseInt = ReverseInt Int deriving (Eq, Show)
instance Ord ReverseInt where
compare (ReverseInt a) (ReverseInt b)
| a == b = EQ
| a < b = GT
| a > b = LT
which I could then use in a set:
let x = Set.fromList $ map ReverseInt [1..5]
Is there any better way of doing this sort of thing that doesn't resort to using newtype to create a different Ord instance?

No, this is really the way to go. Yes, having a newtype is sometimes annoying but you get some big benefits:
When you see a Set a and you know a, you immediately know what type of comparison it uses (sort of the same way that purity makes code more readable by not making you have to trace execution). You don't have to know where that Set a comes from.
For many cases, you can coerce your way through multiple newtypes at once. For example, I can turn xs = [1,2,3] :: Int into ys = [ReverseInt 1, ReverseInt 2, ReverseInt 3] :: [ReverseInt] just using ys = coerce xs :: [ReverseInt]. Unfortunately, that isn't the case for Set (and it shouldn't - you'd need the coercion function to be monotonic to not screw up the data structure invariants, and there is not yet a way to express that in the type system).
newtypes end up being more composable than you expect. For example, the ReverseInt type you made already exists in a form that generalizes to reversing any type with an Ord constraint: it is called Down. To be explicit, you could use Down Int instead of ReversedInt, and you get the instance you wrote out for free!
Of course, if you still feel very strongly about this, nothing is stopping you from writing your version of Set which has to have a field which is the comparison function it uses. Something like
data Set a = Set { comparisionKey :: a -> a -> Ordering
, ...
}
Then, every time you make a Set, you would have to pass in the comparison key.

Data type design in Haskell

Learning Haskell, I write a formatter of C++ header files. First, I parse all class members into a-collection-of-class-members which is then passed to the formatting routine. To represent class members I have
data ClassMember = CmTypedef Typedef |
CmMethod Method |
CmOperatorOverload OperatorOverload |
CmVariable Variable |
CmFriendClass FriendClass |
CmDestructor Destructor
(I need to classify the class members this way because of some peculiarities of the formatting style.)
The problem that annoys me is that to "drag" any function defined for the class member types to the ClassMember level, I have to write a lot of redundant code. For example,
instance Formattable ClassMember where
format (CmTypedef td) = format td
format (CmMethod m) = format m
format (CmOperatorOverload oo) = format oo
format (CmVariable v) = format v
format (CmFriendClass fc) = format fc
format (CmDestructor d) = format d
instance Prettifyable ClassMember where
-- same story here
On the other hand, I would definitely like to have a list of ClassMember objects (at least, I think so), hence defining it as
data ClassMember a = ClassMember a
instance Formattable ClassMember a
format (ClassMember a) = format a
doesn't seem to be an option.
The alternatives I'm considering are:
Store in ClassMember not object instances themselves, but functions defined on the corresponding types, which are needed by the formatting routine. This approach breaks the modularity, IMO, as the parsing results, represented by [ClassMember], need to be aware of all their usages.
Define ClassMember as an existential type, so [ClassMember] is no longer a problem. I doubt whether this design is strict enough and, again, I need to specify all constraints in the definition, like data ClassMember = forall a . Formattable a => ClassMember a. Also, I would prefer a solution without using extensions.
Is what I'm doing a proper way to do it in Haskell or there is a better way?

First, consider trimming down that ADT a bit. Operator overloads and destructors are special kinds of methods, so it might make more sense to treat all three in CmMethod; Method will then have special ways to separate them. Alternatively, keep all three CmMethod, CmOperatorOverload, and CmDestructor, but let them all contain the same Method type.
But of course, you can reduce the complexity only so much.
As for the specific example of a Show instance: you really don't want to write that yourself except in some special cases. For your case, it's much more reasonable to have the instance derived automatically:
data ClassMember = CmTypedef Typedef
| CmMethod Method
| ...
| CmDestructor Destructor
deriving (Show)
This will give different results from your custom instance – because yours is wrong: showing a contained result should also give information about the constructor.
If you're not really interested in Show but talking about another class C that does something more specific to ClassMembers – well, then you probably shouldn't have defined C in the first place! The purpose of type classes is to express mathematical concepts that hold for a great variety of types.

A possible solution is to use records.
It can be used without extensions and preserves flexibility.
There is still some boilerplate code, but you need to type it only once for all. So if you would need to perform another set of operations over your ClassMember, it would be very easy and quick to do it.
Here is an example for your particular case (template Haskell and Control.Lens makes things easier but are not mandatory):
{-# LANGUAGE TemplateHaskell #-}
module Test.ClassMember
import Control.Lens
-- | The class member as initially defined.
data ClassMember =
CmTypedef Typedef
| CmMethod Method
| CmOperatorOverload OperatorOverload
| CmVariable Variable
| CmFriendClass FriendClass
| CmDestructor Destructor
-- | Some dummy definitions of the data types, so the code will compile.
data Typedef = Typedef
data Method = Method
data OperatorOverload = OperatorOverload
data Variable = Variable
data FriendClass = FriendClass
data Destructor = Destructor
{-|
A data type which defines one function per constructor.
Note the type a, which means that for a given Hanlder "a" all functions
must return "a" (as for a type class!).
-}
data Handler a = Handler
{
_handleType :: Typedef -> a
, _handleMethod :: Method -> a
, _handleOperator :: OperatorOverload -> a
, _handleVariable :: Variable -> a
, _handleFriendClass :: FriendClass -> a
, _handleDestructor :: Destructor -> a
}
{-|
Here I am using lenses. This is not mandatory at all, but makes life easier.
This is also the reason of the TemplateHaskell language pragma above.
-}
makeLenses ''Handler
{-|
A function acting as a dispatcher (the boilerplate code!!!), telling which
function of the handler must be used for a given constructor.
-}
handle :: Handler a -> ClassMember -> a
handle handler member =
case member of
CmTypedef a -> handler^.handleType $ a
CmMethod a -> handler^.handleMethod $ a
CmOperatorOverload a -> handler^.handleOperator $ a
CmVariable a -> handler^.handleVariable $ a
CmFriendClass a -> handler^.handleFriendClass $ a
CmDestructor a) -> handler^.handleDestructor $ a
{-|
A dummy format method.
I kept things simple here, but you could define much more complicated
functions.
You could even define some generic functions separately and... you could define
them with some extra arguments that you would only provide when building
the Handler! An (dummy!) example is the way the destructor function is
constructed.
-}
format :: Handler String
format = Handler
(\x -> "type")
(\x -> "method")
(\x -> "operator")
(\x -> "variable")
(\x -> "Friend")
(destructorFunc $ (++) "format ")
{-|
A dummy function showcasing partial application.
It has one more argument than handleDestructor. In practice you are free
to add as many as you wish as long as it ends with the expected type
(Destructor -> String).
-}
destructorFunc :: (String -> String) -> Destructor -> String
destructorFunc f _ = f "destructor"
{-|
Construction of the pretty handler which illustrates the reason why
using lens by keeping a nice and concise syntax.
The "&" is the backward operator and ".~" is the set operator.
All we do here is to change the functions of the handleType and the
handleDestructor.
-}
pretty :: Handler String
pretty = format & handleType .~ (\x -> "Pretty type")
& handleDestructor .~ (destructorFunc ((++) "Pretty "))
And now we can run some tests:
test1 = handle format (CmDestructor Destructor)
> "format destructor"
test2 = handle pretty (CmDestructor Destructor)
> "Pretty destructor"

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string