Using different Ordering for Sets - haskell

I was reading a Chapter 2 of Purely Functional Data Structures, which talks about unordered sets implemented as binary search trees. The code is written in ML, and ends up showing a signature ORDERED and a functor UnbalancedSet(Element: ORDERED): SET. Coming from more of a C++ background, this makes sense to me; custom comparison function objects form part of the type and can be passed in at construction time, and this seems fairly analogous to the ML functor way of doing things.
When it comes to Haskell, it seems the behavior depends only on the Ord instance, so if I wanted to have a set that had its order reversed, it seems like I'd have to use a newtype instance, e.g.
newtype ReverseInt = ReverseInt Int deriving (Eq, Show)
instance Ord ReverseInt where
compare (ReverseInt a) (ReverseInt b)
| a == b = EQ
| a < b = GT
| a > b = LT
which I could then use in a set:
let x = Set.fromList $ map ReverseInt [1..5]
Is there any better way of doing this sort of thing that doesn't resort to using newtype to create a different Ord instance?

No, this is really the way to go. Yes, having a newtype is sometimes annoying but you get some big benefits:
When you see a Set a and you know a, you immediately know what type of comparison it uses (sort of the same way that purity makes code more readable by not making you have to trace execution). You don't have to know where that Set a comes from.
For many cases, you can coerce your way through multiple newtypes at once. For example, I can turn xs = [1,2,3] :: Int into ys = [ReverseInt 1, ReverseInt 2, ReverseInt 3] :: [ReverseInt] just using ys = coerce xs :: [ReverseInt]. Unfortunately, that isn't the case for Set (and it shouldn't - you'd need the coercion function to be monotonic to not screw up the data structure invariants, and there is not yet a way to express that in the type system).
newtypes end up being more composable than you expect. For example, the ReverseInt type you made already exists in a form that generalizes to reversing any type with an Ord constraint: it is called Down. To be explicit, you could use Down Int instead of ReversedInt, and you get the instance you wrote out for free!
Of course, if you still feel very strongly about this, nothing is stopping you from writing your version of Set which has to have a field which is the comparison function it uses. Something like
data Set a = Set { comparisionKey :: a -> a -> Ordering
, ...
}
Then, every time you make a Set, you would have to pass in the comparison key.

Related

Writing modules in Haskell the right way

(I'm totally rewriting this question to give it a better focus; you can see the history of changes if you want to see the original.)
Let's say I have two modules:
One module defines the function inverseAndSqrt. What this function actually does is not important; what is important is that it returns none, one, or both of two things in a way that the client can distinguish which one is which;
module Module1 (inverseAndSqrt) where
type TwoOpts a = (Maybe a, Maybe a)
inverseAndSqrt :: Int -> TwoOpts Float
inverseAndSqrt x = (if x /= 0 then Just (1.0/(fromIntegral x)) else Nothing,
if x >= 0 then Just (sqrt $ fromIntegral x) else Nothing)
another module defines other functions depending on inverseAndSqrt and on its type
module Module2 where
import Module1
fun :: (Maybe Float, Maybe Float) -> Float
fun (Just x, Just y) = x + y
fun (Just x, Nothing) = x
fun (Nothing, Just y) = y
exportedFun :: Int -> Float
exportedFun = fun . inverseAndSqrt
What I want to understand from the perspective of design principle is: how should I interface Module1 with other modules (e.g. Module2) in a way that makes it well encapsulated, reusable, etc?
The problems I see are
I could one day decide that I don't want to use a pair to return the two results anymore; I could decide to use a 2 elements list; or another type which is isomorphic (I think this is the right adjective, isn't it?) to a pair; if I do this, all client code will break
Exporting the TwoOpts type synonym doesn't solve anything, as Module1 could still change its implementation thus breaking client code.
Module1 is also forcing the type of the two optionals to be the same, but I'm not sure this is really relevant to this question...
How should I design Module1 (and thus edit Module2 as well) such that the two are not tightly coupled?
One thing I can think of is that maybe I should define a typeclass expressing what "a box with two optional things in it" is, and then Module1 and Module2 would use that as a common interface. But should that be in both module? In either of them? Or in none of them, in a third module? Or maybe such a class/concept is not needed?
I'm not a computer scientist so I'm sure that this question highlights some misunderstanding of mine due to lack of experience and theoretical background. Any help filling the gaps is welcome.
Possible modifications I'd like to support
Related to what chepner suggested in a comment to his answer, at some point I might want to extend the support from 2-tuple things to both 2- and 3-tuple things, having different accessor names for them, suche as get1of2/get2of2 (let's say these are the name we use when we first design Module1) vs get1of3/get2of3/get3of3.
At some point I would also be able to complement this 2-tuple-like type with something else, for instance an optional containing Just the sum¹ of the two main contents only if they are both Justs, or a Nothing if at least one of the two main contents is a Nothing. I guess in this case the internal representation of this class would be something like ((Maybe a, Maybe a), Maybe b) (¹ The sum is really a stupid example, so I've used b here instead of a to be more general than the sum would require).
To me, Haskell design is all type-centric. The design rule for functions is just "use the most general and accurate types that do the job", and the whole problem of design in Haskell is about coming up with the best types for the job.
We would like there to be no "junk" in the types, so that they have exactly one representation for each value you want to denote. E.g. String is a bad representation for numbers, because "0", "0.0", "-0" all mean the same thing, and also because "The Prisoner" is not a number -- it is a valid representation that does not have a valid denotation. If, say for performance reasons, the same denotation can be represented multiple ways, the type's API should make that difference invisible to the user.
So in your case, (Maybe a, Maybe a) is perfect -- it means exactly what you need it to mean. Using something more complicated is unnecessary, and will just complicate matters for the user. At some point whatever you expose will have to be convertible to a Maybe a for the first thing and a Maybe a for the second thing, and there is no extra information than that, so the tuple is perfect. Whether you use a type synonym or not is a matter of style -- I prefer not use synonyms at all and only give types names when I have a more formal abstraction in mind.
Connotation is important. For example, if I had a function for finding the roots of a quadratic polynomial, I probably wouldn't use TwoOpts, even though there are at most two of them. The fact that my return values are all "the same kind of thing" in an intuitive sense makes me prefer a list (or if I'm feeling particularly picky, a Set or Bag), even if the list has at most two elements. I just have it match my best understanding of the domain at the time, so I won't change it unless my understanding of the domain has changed in a significant way, in which case the opportunity to review all its uses is exactly what I want. If you are writing your functions to be as polymorphic as possible, then often you won't need to change anything but the specific moments the meaning is used, the exact moment domain knowledge is required (such as understanding the relationship between TwoOpts and Set). You don't need to "redo the plumbing" if it's made of a sufficiently flexible, polymorphic material.
Supposing you didn't have a clean isomorphism to a standard type like (Maybe a, Maybe a), and you wanted to formalize TwoOpts. The way here is to build an API out of its constructors, combinators, and eliminators. For example:
data TwoOpts a -- abstract, not exposed
-- constructors
none :: TwoOpts a
justLeft :: a -> TwoOpts a
justRight :: a -> TwoOpts a
both :: a -> a -> TwoOpts a
-- combinators
-- Semigroup and Monoid at least
swap :: TwoOpts a -> TwoOpts a
-- eliminators
getLeft :: TwoOpts a -> Maybe a
getRight :: TwoOpts a -> Maybe a
In this case the eliminators give exactly your representation (Maybe a, Maybe a) as their final coalgebra.
-- same as the tuple in a newtype, just more conventional
data TwoOpts a = TwoOpts (Maybe a) (Maybe a)
Or if you wanted to focus on the constructors side you could use an initial algebra
data TwoOpts a
= None
| JustLeft a
| JustRight a
| Both a a
You are at liberty to change this representation as long as it still implements the combinatory API above. If you have reason to use different representations of the same API, make the API into a typeclass (typeclass design is a whole other story).
In Einstein's famous words, "make it as simple as possible, but no simpler".
Don't define a simple type alias; this exposes the details of how you implement TwoOpts.
Instead, define a new type, but don't export the data constructor, but rather functions for accessing the two components. Then you are free to change the implementation of the type all you like without changing the interface, because the user can't pattern-match on a value of type TwoOpts a.
module Module1 (TwoOpts, inverseAndSqrt, getFirstOpt, getSecondOpt) where
data TwoOpts a = TwoOpts (Maybe a) (Maybe a)
getFirstOpt, getSecondOpt :: TwoOpts a -> Maybe a
getFirstOpt (TwoOpts a _) = a
getSecondOpt (TwoOpts _ b) = b
inverseAndSqrt :: Int -> TwoOpts Float
inverseAndSqrt x = TwoOpts (safeInverse x) (safeSqrt x)
where safeInverse 0 = Nothing
safeInverse x = Just (1.0 / fromIntegral x)
safeSqrt x | x >= 0 = Just $ sqrt $ fromIntegral x
| otherwise = Nothing
and
module Module2 where
import Module1
fun :: TwoOpts Float -> Float
fun a = case (getFirstOpts a, getSecondOpt a) of
(Just x, Just y) -> x + y
(Just x, Nothing) -> x
(Nothing, Just y) -> y
exportedFun :: Int -> Float
exportedFun = fun . inverseAndSqrt
Later, when you realize that you've reimplemented the type product, you can change your definitions without affecting any user code.
newtype TwoOpts a = TwoOpts { getOpts :: (Maybe a, Maybe a) }
getFirstOpt, getSecondOpt :: TwoOpts a -> Maybe a
getFirstOpt = fst . getOpts
getSecondOpt = snd . getOpts

Haskell: Typeclass vs passing a function

To me it seems that you can always pass function arguments rather than using a typeclass. For example rather than defining equality typeclass:
class Eq a where
(==) :: a -> a -> Bool
And using it in other functions to indicate type argument must be an instance of Eq:
elem :: (Eq a) => a -> [a] -> Bool
Can't we just define our elem function without using a typeclass and instead pass a function argument that does the job?
Yes. This is called "dictionary passing style". Sometimes when I am doing some especially tricky things, I need to scrap a typeclass and turn it into a dictionary, because dictionary passing is more powerful1, yet often quite cumbersome, making conceptually simple code look quite complicated. I use dictionary passing style sometimes in languages that aren't Haskell to simulate typeclasses (but have learned that that is usually not as great an idea as it sounds).
Of course, whenever there is a difference in expressive power, there is a trade-off. While you can use a given API in more ways if it is written using DPS, the API gets more information if you can't. One way this shows up in practice is in Data.Set, which relies on the fact that there is only one Ord dictionary per type. The Set stores its elements sorted according to Ord, and if you build a set with one dictionary, and then inserted an element using a different one, as would be possible with DPS, you could break Set's invariant and cause it to crash. This uniqueness problem can be mitigated using a phantom existential type to mark the dictionary, but, again, at the cost of quite a bit of annoying complexity in the API. This also shows up in pretty much the same way in the Typeable API.
The uniqueness bit doesn't come up very often. What typeclasses are great at is writing code for you. For example,
catProcs :: (i -> Maybe String) -> (i -> Maybe String) -> (i -> Maybe String)
catProcs f g = f <> g
which takes two "processors" which take an input and might give an output, and concatenates them, flattening away Nothing, would have to be written in DPS something like this:
catProcs f g = (<>) (funcSemi (maybeSemi listSemi)) f g
We essentially had to spell out the type we're using it at again, even though we already spelled it out in the type signature, and even that was redundant because the compiler already knows all the types. Because there's only one way to construct a given Semigroup at a type, the compiler can do it for you. This has a "compound interest" type effect when you start defining a lot of parametric instances and using the structure of your types to compute for you, as in the Data.Functor.* combinators, and this is used to great effect with deriving via where you can essentially get all the "standard" algebraic structure of your type written for you.
And don't even get me started on MPTC's and fundeps, which feed information back into typechecking and inference. I have never tried converting such a thing to DPS -- I suspect it would involve passing around a lot of type equality proofs -- but in any case I'm sure it would be a lot more work for my brain than I would be comfortable with.
--
1Unless you use reflection in which case they become equivalent in power -- but reflection can also be cumbersome to use.
Yes. That (called dictionary passing) is basically what the compiler does to typeclasses anyway. For that function, done literally, it would look a bit like this:
elemBy :: (a -> a -> Bool) -> a -> [a] -> Bool
elemBy _ _ [] = False
elemBy eq x (y:ys) = eq x y || elemBy eq x ys
Calling elemBy (==) x xs is now equivalent to elem x xs. And in this specific case, you can go a step further: eq has the same first argument every time, so you can make it the caller's responsibility to apply that, and end up with this:
elemBy2 :: (a -> Bool) -> [a] -> Bool
elemBy2 _ [] = False
elemBy2 eqx (y:ys) = eqx y || elemBy2 eqx ys
Calling elemBy2 (x ==) xs is now equivalent to elem x xs.
...Oh wait. That's just any. (And in fact, in the standard library, elem = any . (==).)

Factoring out recursion in a complex AST

For a side project I am working on I currently have to deal with an abstract syntax tree and transform it according to rules (the specifics are unimportant).
The AST itself is nontrivial, meaning it has subexpressions which are restricted to some types only. (e.g. the operator A must take an argument which is of type B only, not any Expr. A drastically simplified reduced version of my datatype looks like this:
data Expr = List [Expr]
| Strange Str
| Literal Lit
data Str = A Expr
| B Expr
| C Lit
| D String
| E [Expr]
data Lit = Int Int
| String String
My goal is to factor out the explicit recursion and rely on recursion schemes instead, as demonstrated in these two excellent blog posts, which provide very powerful general-purpose tools to operate on my AST. Applying the necessary factoring, we end up with:
data ExprF a = List [a]
| Strange (StrF a)
| Literal (LitF a)
data StrF a = A a
| B a
| C (LitF a)
| D String
| E [a]
data LitF a = Int Int
| String String
If I didn't mess up, type Expr = Fix ExprF should now be isomorphic to the previously defined Expr.
However, writing cata for these cases becomes rather tedious, as I have to pattern match B a :: StrF a inside of an Str :: ExprF a for cata to be well-typed. For the entire original AST this is unfeasible.
I stumbled upon fixing GADTs, which seems to me like it is a solution to my problem, however the user-unfriendly interface of the duplicated higher-order type classes etc. is quite the unneccessary boilerplate.
So, to sum up my questions:
Is rewriting the AST as a GADT the correct way to go about this?
If yes, how could I transform the example into a well-working version? On a second note, is there better support for higher kinded Functors in GHC now?
If you've gone through the effort of to separate out the recursion in your data type, then you can just derive Functor and you're done. You don't need any fancy features to get the recursion scheme. (As a side note, there's no reason to parameterize the Lit data type.)
The fold is:
newtype Fix f = In { out :: f (Fix f) }
gfold :: (Functor f) => (f a -> a) -> Fix f -> a
gfold alg = alg . fmap (gfold alg) . out
To specify the algebra (the alg parameter), you need to do a case analysis against ExprF, but the alternative would be to have the fold have a dozen or more parameters: one for each data constructor. That wouldn't really save you much typing and would be much harder to read. If you want (and this may require rank-2 types in general), you can package all those parameters up into a record and then you could use record update to update "pre-made" records that provide "default" behavior in various circumstances. There's an old paper Dealing with Large Bananas that takes an approach like this. What I'm suggesting, to be clear, is just wrapping the gfold function above with a function that takes a record, and passes in an algebra that will do the case analysis and call the appropriate field of the record for each case.
Of course, you could use GHC Generics or the various "generic/polytypic" programming libraries like Scrap Your Boilerplate instead of this. You are basically recreating what they do.

Ordering of Bool types (i.e. True > False) - Why? [duplicate]

This question already has answers here:
Understanding Haskell's Bool Deriving an Ord
(3 answers)
Closed 7 years ago.
Can someone please explain the following output?
Prelude> compare True False
GT
it :: Ordering
Prelude> compare False True
LT
it :: Ordering
Why are Bool type values ordered in Haskell - especially, since we can demonstrate that values of True and False are not exactly 1 and 0 (unlike many other languages)?
This is how the derived instance of Ord works:
data D = A | B | C deriving Ord
Given that datatype, we get C > B > A. Bool is defined as False | True, and it kind of makes sense when you look at other examples such as:
Maybe a = Nothing | Just a
Either a b = Left a | Right b
In each of the case having "some" ("truthy") value is greater than having no values at all (or having "left" or "bad" or "falsy" value).
While Bool is not Int, it can be converted to the 0,1 fragment of Int since it is an Enum type.
fromEnum False = 0
fromEnum True = 1
Now, the Enum could have been different, reversing 0 and 1, but that would probably be surprising to most programmers thinking about bits.
Since it has an Enum type, everything else being equal, it's better to define an Ord instance which follows the same order, satisfying
compare x y = compare (fromEnum x) (fromEnum y)
In fact, each instance generated from deriving (Eq, Ord, Enum) follows such property.
On a more theoretical note, logicians tend to order propositions from the strongest to the weakest (forming a lattice). In this structure, False (as a proposition) is the bottom, i.e. the least element, while True is the top. While this is only a convention (theory would be just as nice if we picked the opposite ordering), it's a good thing to be consistent.
Minor downside: the implication boolean connective is actually p <= q expressing that p implies q, instead of the converse as the "arrow" seems to indicate.
Let me answer your question with a question: Why is there an Ord instance for ()?
Unlike Bool, () has only one possible value: (). So why the hell would you ever want to compare it? There is only one value possible!
Basically, it's useful if all or most of the standard basic types have instances for common classes. It makes it easier to derive instances for your own types. If Foo doesn't have an Ord instance, and your new type has a single Foo field, then you can't auto-derive an Ord instance.
You might, for example, have some kind of tree type where we can attach several items of information to the leaves. Something like Tree x y z. And you might want to have an Eq instance to compare trees. It would be annoying if Tree () Int String didn't have an Eq instance just because () doesn't. So that's why () has Eq (and Ord and a few others).
Similar remarks apply to Bool. It might not sound particularly useful to compare two bool values, but it would be irritating if your Ord instance vanishes as soon as you put a bool in there.
(One other complicating factor is that sometimes we want Ord because there's a logically meaningful ordering for things, and sometimes we just want some arbitrary order, typically so we can use something as a key for Data.Map or similar. Arguably there ought to be two separate classes for that… but there isn't.)
Basically, it comes from math. In set theory or category theory boolean functions are usually thought of as classifiers of subsets/subobjects. In plain terms, function f :: a -> Bool is identified with filter f :: [a] -> [a]. So, if we change one value from False to True, the resulting filtered list (subset, subobject, whatever) is going to have more elements. Therefore, True is considered "bigger" than False.

Extending algebraic data type

Note: if this question is somehow odd, this is because I was only recently exposed to Haskell and am still adapting to the functional mindset.
Considering a data type like Maybe:
data MyOwnMaybe a = MyOwnNothing | MyOwnJust a
everyone using my data type will write functions like
maybeToList :: MyOwnMaybe a -> [a]
maybeToList MyOwnNothing = []
maybeToList (MyOwnJust x) = [x]
Now, suppose that, at a later time, I wish to extend this data type
data MyOwnMaybe a = MyOwnNothing | MyOwnJust a | SuperpositionOfNothingAndJust a
how do I make sure that everyone's functions will break at compile-time?
Of course, there is the chance that somehow I'm not "getting" algebraic data types and maybe I shouldn't be doing this at all, but considering a data type Action
data Action = Reset | Send | Remove
it would seem that adding an extra Action like Add would not be so uncommon (and I wouldn't want to risk having all these functions around that possibly cannot handle my new Action)
Well, bad news first: sometimes you just can't do it. Period.
But that is language-agnostic; in any language you sometimes have to break interface. There is no way around it.
Now, good news: you can actually go a great length before you have to do that.
You just have to carefully consider what you export from your module. If, instead of exporting the internal workings of it, you export high-level functions, then there is a good chance you can rewrite those function using the new data type, and everything would go smooth.
In particular, be very careful when exporting data constructors. In this case, you don't just export functions that create your data; you are also exporting the possibility of pattern-matching; and that is not something that ties you pretty tight.
So, in your example, if you write functions like
myOwnNothing :: MyOwnMaybe a
myOwnJust :: a -> MyOwnMaybe a
and
fromMyOwnMaybe :: MyOwnMaybe a -> b -> (a -> b) -> b
fromMyOwnMaybe MyOwnNothing b _ = b
fromMyOwnMaybe (MyOwnJust a) _ f = f a
then it's reasonable to assume that you would be able to reimplement it for the updated MyOwnMaybe data type; so, just export those functions and the data type itself, but don't export constructors.
The only situation in which you would benefit from exporting constructors is when you are absolutely sure that your data type won't ever change. For example, Bool would always have only two (fully defined) values: True and False, it won't be extended by some FileNotFound or anything (although Edward Kmett might disagree). Ditto Maybe or [].
But the idea is more general: stay as high-level as you can.
You seem to know that GHC can warn about non-exhaustive pattern matches in function via the -W flag or explicitly with -fwarn-incomplete-patterns.
There is a good discussion about why these warnings are not automatically compile-time errors at this SO question:
In Haskell, why non-exhaustive patterns are not compile-time errors?
Also, consider this case where you have an ADT with a large number of constructors:
data Alphabet = A | B | C | ... | X | Y | Z
isVowel :: Alphabet -> Bool
isVowel A = True
isVowel E = True
isVowel I = True
isVowel O = True
isVowel U = True
isVowel _ = False
A default case is used as a convenience to avoid having to write out the other 21 cases.
Now if you add an addition constructor to Alphabet, should isVowel be flagged as "incomplete"?
One thing that a lot of modules do is not to export their constructors. Instead, they export functions that can be used (“smart constructors”). If you change your ADT later, you have to fix your functions in the module, but no one else's code gets broken.

Resources