I have been reading up on adjunctions during the last couple of days. While I start to understand their importance from a theoretical point of view, I wonder how and why people use them in Haskell. Data.Functor.Adjunction provides an implementation and among its instances are free functor / forgetful functor and curry / uncurry. Again, those are very interesting from the theoretical viewpoint, but I can't see how I would use them for more practical programming problems.
Are there examples of programming problems people solved using Data.Functor.Adjunction and why you would prefer this implementation over others?
Preliminary note: This answer is a bit speculative. Much like the question, it was built from studying Data.Functor.Adjunction.
I can think of three reasons why there aren't many use cases for the Adjunction class in the wild.
Firstly, all Hask/Hask adjunctions are ultimately some variation on the currying adjunction, so the spectrum of potential instances isn't all that large to begin with. Many of the adjunctions one might be interested in aren't Hask/Hask.
Secondly, while an Adjunction instance gives you a frankly awesome amount of other instances for free, in many cases those instances already exist somewhere else. To pick the ur-example, we might very easily implement StateT in terms of Control.Monad.Trans.Adjoint:
newtype StateT s m a = StateT { runStateT :: s -> m (s, a) }
    deriving (Functor, Applicative, Monad) via AdjointT ((,) s) ((->) s) m
    deriving MonadTrans via AdjointT ((,) s) ((->) s)
-- There is also a straightforward, fairly general way to implement MonadState.
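For instance, one direct way to fill that in for this particular newtype (a minimal sketch, not the fully general adjunction-based version alluded to in the comment; it assumes mtl's MonadState class and the usual MultiParamTypeClasses/FlexibleInstances extensions) would be:

instance Monad m => MonadState s (StateT s m) where
    get   = StateT $ \s -> pure (s, s)
    put s = StateT $ \_ -> pure (s, ())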
However, no one needs to actually do that, because there is a perfectly good StateT in transformers. That said, if you do have an Adjunction instance of your own, you might be in luck. One little thing I have thought of that might make sense (even if I haven't actually seen it out there) is the following pair of functors:
data Dilemma a = Dilemma { fstDil :: a, sndDil :: a }
data ChoiceF a = Fst a | Snd a
We might write an Adjunction ChoiceF Dilemma instance, which reflects how Dilemma (ChoiceF a) is a materialised version of State Bool a. Dilemma (ChoiceF a) can be thought of as a step in a decision tree: choosing one side of the Dilemma tells you, through the ChoiceF constructors, what choice is to be made next. The Adjunction instance would then give us a monad transformer for Dilemma (ChoiceF a) for free.
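Here is a sketch of what that instance might look like, assuming the two declarations above derive Functor, a TypeFamilies pragma, and imports from the adjunctions and distributive packages (Data.Functor.Adjunction, Data.Functor.Rep, Data.Distributive):

instance Distributive Dilemma where
    distribute fa = Dilemma (fstDil <$> fa) (sndDil <$> fa)

instance Representable Dilemma where
    type Rep Dilemma = Bool
    tabulate f = Dilemma (f False) (f True)
    index (Dilemma x _) False = x
    index (Dilemma _ y) True  = y

instance Adjunction ChoiceF Dilemma where
    unit a = Dilemma (Fst a) (Snd a)
    counit (Fst (Dilemma x _)) = x
    counit (Snd (Dilemma _ y)) = y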
(Another possibility might be exploiting the Free f/Cofree u adjunction. Cofree Dilemma a is an infinite tree of outcomes, while Free ChoiceF a is a path leading to an outcome. I hazard there is some mileage to get out of that.)
Thirdly, while there are many useful functions for right adjoints in Data.Functor.Adjunction, most of the functionality they provide is also available through Representable and/or Distributive, so most places where they might be used end up sticking with the superclasses instead.
Data.Functor.Adjunction, of course, also offers useful functions for left adjoints. On the one hand, left adjoints (which are isomorphic to pairs i.e. containers that hold a single element) are probably less versatile than right adjoints (which are isomorphic to functions i.e. functors with a single shape); on the other hand, there doesn't seem to be any canonical class for left adjoints (not yet, at least), so that might lead to opportunities for actually using Data.Functor.Adjunction functions. Incidentally, Chris Penner's battleship example you suggested arguably fits the bill, as it does rely on the left adjoint and how it can be used to encode the representation of the right adjoint:
zapWithAdjunction :: Adjunction f u => (a -> b -> c) -> u a -> f b -> c
zapWithAdjunction @CoordF @Board :: (a -> b -> c) -> Board a -> CoordF b -> c
checkHit :: Vessel -> Weapon -> Bool
shoot :: Board Vessel -> CoordF Weapon -> Bool
CoordF, the left adjoint, carries coordinates for the board and a payload. zapWithAdjunction makes it possible to (quite literally, in this case) target the position while using the payload.
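Given those signatures, shoot itself is then a one-liner (a sketch reusing Chris Penner's names from above):

shoot :: Board Vessel -> CoordF Weapon -> Bool
shoot = zapWithAdjunction checkHit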
In Haskell I find it difficult to completely grasp the purpose of a kind system, and what it truly adds to the language.
I understand having kinds adds safety.
For example consider fmap :: (a -> b) -> f a -> f b vs a monokinded version of it fmap2 :: (a -> b) -> p -> q.
My understanding is that, thanks to the kind system, I can specify beforehand with higher detail what the shape of the data should be. It is better, as the type checker can check more, and stricter requirements will make it less likely that users will mess up with types. This will reduce the likelihood of programming errors. I believe this is the main motivation for kinds but I might be wrong.
Now I think fmap2 can have the same implementation as fmap, since its types are unconstrained. Is it true?
Could one just replace all multi kinded types in the base/ghc library by mono kinded ones, and would it still compile correctly?
To clarify a little bit I mean that for a class like:
class Functor f where
    fmap :: (a -> b) -> f a -> f b
    (<$) :: a -> f b -> f a
I might replace it by something like
class Functor a b p q where
    fmap :: (a -> b) -> p -> q
    (<$) :: a -> q -> p
This way for a Functor instance like
instance Functor [] where fmap = map
I might replace it by
instance Functor a b p q where fmap = map
Remark: This won't work as is, because I also need to modify map and go down the dependency chain. I will think more about this later.
I'm trying to figure out whether kinds add more than just safety. Can I do something with multi kinded types that I cannot do with mono kinded ones?
Remark: Here I forgot to mention that I'm usually using some language extensions, typically to allow more flexibility in writing classes.
When writing vanilla Haskell, kinds can be a really meaningful thing to use.
But when I start using type families and a few other extensions, it becomes much less clear that I need kinds at all.
I have in mind that there is some monotraversable library which reimplements many standard library functions using single kinded types and type families to generalize signatures. That's why my intuition is that multi kinded variables might not really add that much expressive power after all, provided one uses type families. However, a drawback of doing this is that you lose type safety. My question is: do you really lose only that, or do you lose something else as well?
Let's back up. It seems like you think that because the signature
id2 :: a -> b
is more general than
id :: a -> a
that id2 could have the same implementation as id. But that's not true. In fact id2 has no total implementation (any implementation of id2 involves an infinite loop or undefined).
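For instance, the best one can do is something like this (neither option is a total function):

id2 :: a -> b
id2 x = id2 x        -- loops forever
-- or: id2 _ = undefined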
The structure of generality creates a "pressure", and this pressure is what we must navigate to find the right type for something. If f is more general than g, that means that anywhere g could be used, f could also be used; but it also means that any implementation of f is a valid implementation of g. So at the use site it goes one direction, and at the definition site it goes the other.
We can use id2 anywhere we could have used id, but the signature of id2 is so general that it is impossible to implement. The more general a signature gets, the more contexts it can be used in, but also the less likely it is to have an implementation. I would say a main goal of a good type signature is to find the level of generality where there are as few as possible implementations of your intended function, without making it impossible to write altogether.
Your proposed signature fmap2 :: (a -> b) -> p -> q is, like id2, so general that it is impossible to implement. The purpose of higher-kinded types is to give us the tools to have very general signatures like fmap, which are not too general like fmap2.
Why is guard based on Alternative?
guard :: Alternative f => Bool -> f ()
-- guard b is pure () if b is True,
-- and empty if b is False.
I ask because guard only uses the empty from Alternative. It doesn't use <|> from Alternative at all. So why bother using Alternative in the first place?
I guess this is because there is some unstated idea behind Alternative's empty that matches perfectly with what we're trying to accomplish with guard (stop on False, continue on True). If this is the case, please enlighten me about this unstated idea.
Yet at the same time, it feels that we're just ignoring <|>. It feels almost as if guard is not "fully capturing" what Alternative is all about. I hope that makes sense. To make it more concrete: Why didn't they invent another type class called something like Stoppable (or Abortable) and use that instead of Alternative?
TL;DR: Historical reasons. It was envisioned like this in MonadPlus, which got its Applicative variant Alternative later, and no one has proposed to split Alternative into AZero and AChoice or similar.
Alternative is a relatively new idea, just like Applicative. Back when guard was first envisioned, it was based on MonadPlus, a Monad that should support choice and failure, just like Alternative. Its original type was thus
guard :: MonadPlus m => Bool -> m ()
That was specified in the Haskell 98 report, which already included MonadPlus. Haskell 1.0 didn't use monads at all, by the way. When Applicative finally became a superclass of Monad, Alternative became a superclass of MonadPlus, with mzero = empty and mplus = (<|>).
Well, now we know why guard uses Alternative. Because it was based on MonadPlus beforehand. So why is MonadPlus defined like this?
One would have to write a mail to SPJ or someone else from the committee to get their rationale from 1998, because just one year later, Erik Meijer and Graham Hutton wrote their "Monadic Parsing in Haskell" paper. If you have a look at the paper, you'll notice that their MonadPlus just works like you intend:
class Monad m => MonadZero m where
    zero :: m a

class MonadZero m => MonadPlus m where
    (++) :: m a -> m a -> m a
So it's certainly possible to handle this "stoppable" behaviour the way you've described it. However, there is simply no base class that currently defines empty without Alternative. There could be one, but it hasn't been proposed yet.
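If it existed, such a class and a guard built on it might look like the following hypothetical sketch (the AZero name is borrowed from the split mentioned above):

class Applicative f => AZero f where
    azero :: f a

guard' :: AZero f => Bool -> f ()
guard' True  = pure ()
guard' False = azero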
Note that this is a recurring theme with Haskell classes. Monoid contains mappend and mempty. After its conception, someone noticed that there are certain types where mappend makes sense, but not mempty. For example
newtype Min a = Min a
combine :: Ord a => Min a -> Min a -> Min a
combine (Min x) (Min y) = Min (min x y)
Here, mappend = combine is clearly associative, whereas an empty Min isn't possible if we just use Ord; we would have to use Bounded. That's why there is now Semigroup, which isn't a superclass of Monoid yet, but gives us that associative operation.
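As a sketch, the corresponding instance for the Min defined above would simply be:

instance Ord a => Semigroup (Min a) where
    (<>) = combine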
To come back to your original question: guard uses Alternative, because Alternative provides empty, and empty "stops" the evaluation in certain Alternatives. There's no other class that contains that, yet.
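For example, here is that stopping behaviour in the Maybe and list Alternatives (a small illustration; the function names are made up):

import Control.Monad (guard)

safeDiv :: Int -> Int -> Maybe Int
safeDiv x y = do
    guard (y /= 0)        -- empty = Nothing: the whole computation stops
    pure (x `div` y)

evenHalves :: [Int] -> [Int]
evenHalves xs = do
    x <- xs
    guard (even x)        -- empty = []: this branch is dropped
    pure (x `div` 2)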
But with a proposal, such a class might exist at some point, although I'm not sure what the community's opinion on splitting Alternative is.
By the way, languages like PureScript split Alternative, although they split it the other way round…
For more information about Alternative and why I used Monoid as another example, see Confused by the meaning of the 'Alternative' type class and its relationship to other type classes.
Comonoids are mentioned, for example, in Haskell's distributive library docs:
Due to the lack of non-trivial comonoids in Haskell, we can restrict ourselves to requiring a Functor rather than some Coapplicative class.
After a little searching I found a StackOverflow answer that explains this a bit more with the laws that comonoids would have to satisfy. So I think I understand why there's only one possible instance for a hypothetical Comonoid typeclass in Haskell.
Thus, to find a nontrivial comonoid, I suppose we'd have to look in some other category. Surely, if category theorists have a name for comonoids, then there are some interesting ones. The other answers on that page seem to hint at an example involving Supply, but I couldn't figure one out that still satisfies the laws.
I also turned to Wikipedia: there's a page for monoids that doesn't reference category theory, which seems to me an adequate description of Haskell's Monoid typeclass, but "comonoid" redirects to a category-theoretic description of monoids and comonoids together that I can't understand, and there still don't seem to be any interesting examples.
So my questions are:
Can comonoids be explained in non-category-theoretic terms like monoids?
What is a simple example of an interesting comonoid, even if it's not a Haskell type? (Could one be found in a Kleisli category over a familiar Haskell monad?)
edit: I am not sure if this is actually category-theoretically correct, but what I was imagining in the parenthetical of question 2 was nontrivial definitions of delete :: a -> m () and split :: a -> m (a, a) for some specific Haskell type a and Haskell monad m that satisfy Kleisli-arrow versions of the comonoid laws in the linked answer. Other examples of comonoids are still welcome.
As Phillip JF mentioned, comonoids are interesting to talk about in substructural logics. Let's talk about linear lambda calculus. This is much like your normal typed lambda calculus except that every variable must be used exactly once.
To get a feel, let's count linear functions of given types, i.e.
a -> a
has exactly one inhabitant, id. While
(a,a) -> (a,a)
has two, id and flip. Note that in regular lambda calculus (a,a) -> (a,a) has four inhabitants
(a, b) ↦ (a, a)
(a, b) ↦ (b, b)
(a, b) ↦ (a, b)
(a, b) ↦ (b, a)
but the first two require that we use one of the arguments twice while discarding the other. This is exactly the essence of linear lambda calculus—disallowing those kinds of functions.
As a quick aside, what's the point of linear LC? Well, we can use it to model linear effects or resource usage. If, for instance, we have a file type and a few transformers it might look like
data File
open :: String -> File
close :: File -> () -- consumes a file, but we're ignoring purity right now
t1 :: File -> File
t2 :: File -> File
and then the following are valid pipelines:
close . t1 . t2 . open
close . t2 . t1 . open
close . t1 . open
close . t2 . open
but this "branching" computation isn't
let f1 = open "foo"
    f2 = t1 f1
    f3 = t2 f1
in close f3
since we used f1 twice.
Now, at this point you might be wondering which things must follow the linear rules. For instance, I decided that some pipelines don't have to include both t1 and t2 (compare the enumeration exercise from before). Further, I introduced the open and close functions, which happily create and destroy the File type despite that being a violation of linearity.
Indeed, we might posit the existence of functions which violate linearity—but not all clients may. It's much like the IO monad—all of the secrets live inside the implementation of IO so that users work in a "pure" world.
And this is where Comonoid comes in.
class Comonoid m where
    destroy :: m -> ()
    split :: m -> (m, m)
A type that instantiates Comonoid in a linear lambda calculus is a type which has carry-along destruction and duplication rules. In other words, it's a type which isn't very much bound by linear lambda calculus at all.
Since Haskell doesn't implement the linear lambda calculus rules at all, we can always instantiate Comonoid
instance Comonoid a where
    destroy a = ()
    split a = (a, a)
Or, perhaps the other way to think of it is that Haskell is a linear LC system that just happens to instantiate Comonoid for every type and applies destroy and split for you automatically.
A monoid in the usual sense is the same as a categorical monoid in the category of sets. One would expect that a comonoid in the usual sense is the same as a categorical comonoid in the category of sets. But every set in the category of sets is a comonoid in a trivial way, so apparently there is no non-categorical description of comonoids which would be parallel to that of monoids.
Just like a monad is a monoid in the category of endofunctors (what's the problem?), a comonad is a comonoid in the category of endofunctors (what's the coproblem?) So yes, any comonad in Haskell would be an example of a comonoid.
Well one way we can think of a monoid is as hooked to any particular product construction that we're using, so in Set we'd take this signature:
mul : A * A -> A
one : A
to this one:
dup : A -> A * A
one : A
but the idea of duality is that the logical statements you can make all have duals which can be applied to the dual objects. There is another way of stating what a monoid is, namely being agnostic to the choice of product construction; then, when we take the costructure, we can take the coproduct in the output, like:
div : A -> A + A
one : A
where + is a tagged sum. Here we essentially have that every single term of this type is always ready to produce a new bit, implicitly derived from the tag used to denote the left or the right instance of A. I personally think this is really damn cool. I think the cool version of the things that people were talking about above is when you don't particularly construct that for monoids, but for monoid actions.
A monoid M is said to act on a set A if there's a function
act : M * A -> A
where we have the following rules
act identity a = a
act f (act g a) = act (f * g) a
If we want a co-action, what exactly do we want?
act : A -> M * A
this generates us a stream of the type of our comonoid! I'm having a lot of trouble coming up with the laws for these systems, but I think they must be around somewhere, so I'm going to keep looking tonight. If somebody can tell me what they are, or that I'm wrong about these things in some way or another, I'd be interested in that too.
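In Haskell terms, the stream intuition can be sketched like this (the names are made up for illustration):

coactStream :: (a -> (m, a)) -> a -> [m]
coactStream coact a0 =
    let (m, a1) = coact a0
    in  m : coactStream coact a1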
As a physicist, the most common example I deal with is coalgebras, which are comonoid objects in the category of vector spaces, with the monoidal structure usually given by the tensor product.
In that case, there is a bijection between monoid and comonoid objects, since you can just take the adjoint or transpose of the product and unit maps to get a coproduct and a counit that satisfy the comonoid axioms.
In some branches of physics, it is very common to see objects that have both an algebra and a coalgebra structure with some compatibility axioms. The two most common cases are Hopf algebras and Frobenius algebras. They are very convenient for constructing states or solutions that are entangled or correlated.
In programming, the simplest nontrivial example I can think of would be reference counted pointers such as shared_ptr in C++ and Rc in Rust, along with their weak equivalents. You can copy them, which is a nontrivial operation that bumps up the refcount (so the two copies are distinct from the initial state). You can drop (call the destructor) on one, which is nontrivial because it bumps down the refcount of any other refcounted pointer that points to the same piece of data.
Furthermore, weak pointers are a great example of a comonoid action. You can use the co-action to generate a weak pointer from a shared pointer. This can be easily checked by noting that creating one from a shared pointer and immediately dropping it is a unit operation, and creating one & cloning it is equivalent to creating two from the shared pointer.
This is a general thing you see with nontrivial coproducts and their co-actions: when they don't reduce to a copying operation, they intuitively imply some form of action at a distance between the two halves, while also adding an operation that erases one half to leave the other independent.
To clarify the question: it is about the merits of the monad type class (as opposed to just its instances without the unifying class).
After having read many references (see below),
I came to the conclusion that, actually, the monad class is there to solve only one, but big and crucial, problem: the 'chaining' of functions on types with context. Hence, the famous sentence "monads are programmable semicolons".
In fact, a monad can be viewed as an array of functions with helper operations.
I insist on the difference between the monad class, understood as a general interface for other types; and these other types instantiating the class (thus, "monadic types").
I understand that the monad class by itself only solves the chaining of operations, mainly because it only mandates that its type instances have bind (>>=) and return, and tells us how they must behave. And as a bonus, the compiler greatly helps the coding by providing do notation for monadic types.
On the other hand,
it is each individual type instantiating the monad class which solves each concrete problem, not merely its being an instance of Monad. For instance, Maybe solves "how a function returns a value or an error", State solves "how to have functions with global state", IO solves "how to interact with the outside world", and so on. All these types encapsulate a value within a context.
But sooner or later, we will need to chain operations on such context-types. That is, we will need to organize calls to functions on these types in a particular sequence (for an example of such a problem, please read the example about multivalued functions in You could have invented monads).
And the problem of chaining is solved if you make each such type an instance of the Monad class.
For the chaining to work you need >>= just with the exact signature it has, no other. (See this question).
Therefore, I guess that the next time you define a context data type T for solving something, if you need to sequence calls of functions (on values of T), consider making T an instance of Monad (if you need "chaining with choice" and if you can benefit from the do notation). And to make sure you are doing it right, check that T satisfies the monad laws.
Then, I ask two questions to the Haskell experts:
A concrete question: is there any other problem that the monad class solves by itself (leaving aside the individual monadic types)? If so, how does it compare in relevance to the problem of chaining operations?
An optional general question: are my conclusions right, or am I misunderstanding something?
References
Tutorials
Monads in pictures. Definitely worth it; read this one first.
Fistful of monads
You could have invented monads
Monads are trees (pdf)
StackOverflow Questions & Answers
How to detect a monad
On the signature of >>= monad operator
You're definitely on to something in the way that you're stating this—there are many things that Monad means and you've separated them out well.
That said, I would definitely say that chaining operations is not the primary thing solved by Monads. That can be solved using plain Functors (with some trouble) or easily with Applicatives. You need to use the full monad spec when "chaining with choice". In particular, the tension between Applicative and Monad comes from Applicative needing to know the entire structure of the side-effecting computation statically. Monad can change that structure at runtime and thus sacrifices some analyzability for power.
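A tiny illustration of that "chaining with choice" (a sketch with made-up names): the second lookup depends on the value produced by the first, which a fixed Applicative structure cannot express.

lookupTwice :: Eq k => [(k, k)] -> k -> Maybe k
lookupTwice table k = lookup k table >>= \k' -> lookup k' table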
To make the point more clear, the only place you deal with a Monad but not any specific monad is if you're defining something with polymorphism constrained to be a Monad. This shows up repeatedly in the Control.Monad module, so we can examine some examples from there.
sequence :: [m a] -> m [a]
forever :: m a -> m b
foldM :: (a -> b -> m a) -> a -> [b] -> m a
Immediately, we can throw out sequence as being particular to Monad since there's a corresponding function in Data.Traversable, sequenceA which has a type slightly more general than Applicative f => [f a] -> f [a]. This ought to be a clear indicator that Monad isn't the only way to sequence things.
Similarly, we can define foreverA as follows
foreverA :: Applicative f => f a -> f b
foreverA f = flip const <$> f <*> foreverA f
So more ways to sequence non-Monad types. But we run into trouble with foldM
foldM :: (Monad m) => (a -> b -> m a) -> a -> [b] -> m a
foldM _ a [] = return a
foldM f a (x:xs) = f a x >>= \fax -> foldM f fax xs
If we try to translate this definition to Applicative style we might write
foldA :: (Applicative f) => (a -> b -> f a) -> a -> [b] -> f a
foldA _ a [] = pure a
foldA f a (x:xs) = foldA f <$> f a x <*> xs
But Haskell will rightfully complain that this doesn't typecheck--each recursive call to foldA tries to put another "layer" of f on the result. With Monad we could join those layers down, but Applicative is too weak.
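With a Monad constraint that extra layer can indeed be collapsed with join; as a sketch (foldJ is a made-up name):

import Control.Monad (join)

foldJ :: Monad m => (a -> b -> m a) -> a -> [b] -> m a
foldJ _ a []     = pure a
foldJ f a (x:xs) = join (foldJ f <$> f a x <*> pure xs)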
So how does this translate to Applicatives restricting us from runtime choices? Well, that's exactly what we express with foldM, a monadic computation (a -> b -> m a) which depends upon its a argument, a result from a prior monadic computation. That kind of thing simply doesn't have any meaning in the more purely sequential world of Applicative.
To solve the problem of chaining operations on an individual monadic type, it's not at all necessary to make it an instance of Monad and be sure the monad laws are satisfied. You could just implement a chaining operation directly on your type.
It would probably be very similar to the monadic bind, but not necessarily exactly the same (recall that bind for lists is concatMap, a function that exists anyway, but with the arguments in a different order). And you wouldn't have to worry about the monad laws, because you would have a slightly different interface for each type, so they wouldn't have any common requirements.
To ask what problem the Monad type class itself solves, look at all the functions (in Control.Monad and elsewhere) that work on values in any monadic type. The problem solved is code reuse! Monad is exactly the part of all the monadic types that is common to each and every one of them. That part is sufficient on its own to write useful computations. All of these functions could be implemented for any individual monadic type (often more directly), but they've already been implemented for all monadic types, even the ones that don't exist yet.
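A small illustration of that reuse, using replicateM from Control.Monad (the names here are made up): the same generic code enumerates coin flips in the list monad and reads lines in IO.

import Control.Monad (replicateM)

coinFlips :: [[Bool]]
coinFlips = replicateM 3 [False, True]   -- all 8 sequences of three flips

threeLines :: IO [String]
threeLines = replicateM 3 getLine        -- reads three lines from stdin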
You don't write a Monad instance so that you can chain operations on your type (often you already have a way of chaining, in fact). You write a Monad instance for all the code that automatically comes along with the Monad instance. Monad isn't about solving any problem for any single type, it's about a way of viewing many disparate types as instances of a single unifying concept.
In "Learn You a Haskell for Great Good!" author claims that Applicative IO instance is implemented like this:
instance Applicative IO where
    pure = return
    a <*> b = do
        f <- a
        x <- b
        return (f x)
I might be wrong, but it seems that both return and the do-specific constructs (some sugared binds (>>=)) come from Monad IO. Assuming that's correct, my actual question is:
Why does the Applicative IO implementation depend on Monad IO functions/combinators?
Isn't Applicative a less powerful concept than Monad?
Edit (some clarifications):
This implementation is against my intuition, because according to Typeclassopedia article it's required for a given type to be Applicative before it can be made Monad (or it should be in theory).
(...) according to Typeclassopedia article it's required for a given type to be Applicative before it can be made Monad (or it should be in theory).
Yes, your parenthetical aside is exactly the issue here. In theory, any Monad should also be an Applicative, but this is not actually required, for historical reasons (i.e., because Monad has been around longer). This is not the only peculiarity of Monad, either.
Consider the actual definitions of the relevant type classes, taken from the base package's source on Hackage.
Here's Applicative:
class Functor f => Applicative f where
    pure :: a -> f a
    (<*>) :: f (a -> b) -> f a -> f b
    (*>) :: f a -> f b -> f b
    (<*) :: f a -> f b -> f a
...about which we can observe the following:
The context is correct given currently existing type classes, i.e., it requires Functor.
It's defined in terms of function application, rather than in terms of lifting tuples (which is arguably more natural from a mathematical standpoint).
It includes technically superfluous operators equivalent to lifting constant functions.
Meanwhile, here's Monad:
class Monad m where
    (>>=) :: m a -> (a -> m b) -> m b
    (>>) :: m a -> m b -> m b
    return :: a -> m a
    fail :: String -> m a
...about which we can observe the following:
The context not only ignores Applicative, but also Functor, both of which are logically implied by Monad but not explicitly required.
It's also defined in terms of function application, rather than the more mathematically natural definition using return and join.
It includes a technically superfluous operator equivalent to lifting a constant function.
It also includes fail which doesn't really fit in at all.
In general, the ways that the Monad type class differs from the mathematical concept it's based on can be traced back through its history as an abstraction for programming. Some, like the function application bias it shares with Applicative, are a reflection of existing in a functional language; others, like fail or the lack of an appropriate class context, are historical accidents more than anything else.
What it all comes down to is that having an instance of Monad implies an instance for Applicative, which in turn implies an instance for Functor. A class context merely formalizes this explicitly; it remains true regardless. As it stands, given a Monad instance, both Functor and Applicative can be defined in a completely generic way. Applicative is "less powerful" than Monad in exactly the same sense that it is more general: Any Monad is automatically Applicative if you copy+paste the generalized instance, but there exist Applicative instances which cannot be defined as a Monad.
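Those completely generic definitions are essentially liftM and ap from Control.Monad; spelled out as a sketch (with made-up names to avoid clashing with the real ones):

fmapFromMonad :: Monad m => (a -> b) -> m a -> m b
fmapFromMonad f ma = ma >>= \a -> return (f a)

apFromMonad :: Monad m => m (a -> b) -> m a -> m b
apFromMonad mf ma = mf >>= \f -> ma >>= \a -> return (f a)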
A class context, like Functor f => Applicative f says two things: That the latter implies the former, and that a definition must exist to fulfill that implication. In many cases, defining the latter implicitly defines the former anyway, but the compiler cannot deduce that in general, and thus requires both instances to be written out explicitly. The same thing can be observed with Eq and Ord--the latter obviously implies the former, but you still need to define an Eq instance in order to define one for Ord.
The IO type is abstract in Haskell, so if you want to implement a general Applicative for IO you have to do it with the operations that are supported by IO. Since you can implement Applicative in terms of the Monad operations that seems like a good choice. Can you think of another way to implement it?
And yes, Applicative is in some sense less powerful than Monad.
Isn't Applicative a less powerful concept than Monad?
Yes, and therefore whenever you have a Monad you can always make it an Applicative. You could replace IO with any other monad in your example and it would be a valid Applicative instance.
As an analogy, while a color printer may be considered more powerful than a grayscale printer, you can still use one to print a grayscale image.
Of course, one could also base a Monad instance on an Applicative and set return = pure, but you won't be able to define >>= generally. This is what Monad being more powerful means.
In a perfect world every Monad would be an Applicative (so we'd have class Applicative a => Monad a where ...), but for historical reasons both type classes are independent. So your observation that this definition is kind of "backwards" (using the more powerful abstraction to implement the less powerful one) is correct.
You already have perfectly good answers for older versions of GHC, but in the latest version you actually do have class Applicative m => Monad m so your question needs another answer.
In terms of GHC implementation: GHC just checks what instances are defined for a given type before it tries to compile any of them.
In terms of code semantics: class Applicative m => Monad m doesn't mean the Applicative instance has to be defined "first", just that if it hasn't been defined by the end of your program then the compiler will reject it.