Haskell multiple functors - haskell

I'm making a fibonacci heap implementation in Haskell, and I'm not sure exactly what the clean way to do it.
For example, I want to order the nodes. So I can do something like:
instance Ord (FibNode e) where
f1 `compare` f2 = (key f1) `compare` (key f2)
This would be more easily done if I made FibNode a monad. But other times I want to fold across the node's siblings, or fold across their children etc. So defining a functor where f x = f $ key x won't work all the time.
Apart from defining my own fmapKey, fmapSibs, fmapKids... is there a way to do this?

You can't make a type constructor an instance of Functor in more than one way.
But you can make various newtype wrappers around your type that each has its own Functor instance. But that's not really any more convenient than defining your own fmap functions.

The (Fibonacci) heap is an inherently impure, destructively-updated data structure with specific asymptotic runtime guarantees for various operations on it. I would be surprised if you can maintain these guarantees with a straightforward translation to a pure version. My suggestion would be to to make it an impure data structure using something like the ST or IO arrays. This would make for a much more direct implementation of the classic algorithms.

Related

Why does Haskell contain so many equivalent functions

It seems like there are a lot of functions that do the same thing, particularly relating to Monads, Functors, and Applicatives.
Examples (from most to least generic):
fmap == liftA == liftM
(<*>) == ap
liftA[2345] == liftM[2345]
pure == return
(*>) == (>>)
An example not directly based on the FAM class tree:
fmap == map
(I thought there were quite a few more with List, Foldable, Traversable, but it looks like most were made more generic some time ago, as I only see the old, less generic type signatures in old stack overflow / message board questions)
I personally find this annoying, as it means that if I need to do x, and some function such as liftM allows me to do x, then I will have made my function less generic than it could have been, and I am only going to notice that kind of thing by thoroughly reasoning about the differences between types (such as FAM, or perhaps List, Foldable, Traversable combinations as well), which is not beginner friendly at all, as while simply using those types isn't all that hard, reasoning about their properties and laws requires a lot more mental effort.
I am guessing a lot of these equivalencies come from the Applicative Monad Proposal. If that is the reason for them (and not some other reason I am missing for having less generic functions available for confusion), are they going to be deprecated / deleted ever? I can understand waiting a long time to delete them, due to breaking existing code, but surely deprecation is a good idea?
The short answers are "history" and "regularity".
Originally "map" was defined for lists. Then type-classes were introduced, with the Functor type class, so the generalised version of "map" for any functor had to be called something different, otherwise existing code would be broken. Hence "fmap".
Then monads came along. Instances of monads did not need to be functors, so "liftM" was created, along with "liftM2", "liftM3" etc. Of course if a type is an instance of both Monad and Functor then fmap = liftM.
Monads also have "ap", used in expressions like f `ap` arg1 `ap` arg2. This was very handy, but then Applicative Functors were added. (<*>) did the same job for applicative functors as 'ap', but because many applicative functors are not monads it had to be called something different. Likewise liftAx versus liftMx and "pure" versus "return".
They aren't equivalent though. equivalent things in haskell can be interchanged with no difference at all in functionality. Consider for example pure and return
EDIT: I wrote some examples down, but they were really bad since they involved Maybe a, a type that is both an applicative and a monad, so the functions could be used pretty interchangeably.
There are types that are applicatives but not monads though (see this question for examples), and by studying the type of the following expression, we can see that this could lead to some roadbumps:
pure 1 >>= pure :: (Monad m, Num b) => m b
I personally find this annoying, as it means that if I need to do x, and some function such as liftM allows me to do x, then I will have made my function less generic than it could have been
This logic is backwards.
Normally you know in advance the type of the thing you want to write, be it IO String or (Foldable f, Monoid t, Monad m) => f (m t) -> m t or whatever. Let's take the first case, getLineCapitalized :: IO String. You could write it as
getLineCapitalized = liftM (map toUpper) getLine
or
getLineCapitalized = fmap (fmap toUpper) getLine
Is the former "less generic" because it uses the specialized functions liftM and map? Of course not. This is intrinsically an IO action that produces a list. It cannot become "more generic" by changing it to the second version since those fmaps will have their types fixed to IO and [] anyways. So, there is no advantage to the second version.
By writing the first version, you provide contextual information to the reader for free. In liftM (map foo) bar, the reader knows that bar is going to be an action in some monad that returns a list. In fmap (fmap foo) bar, it could be any sort of doubly-nested structure whatsoever. If bar is something complicated rather than just getLine, then this kind of information is helpful for understanding more easily what is going on in bar.
In general, you should write a function in two steps.
Decide what the type of the function should be. Make it as general or as specific as you want. The more general the type of the function, the stronger guarantees you get on its behavior from parametricity.
Once you have decided on the type of your function, implement it using the most specific available functions. By doing so, you are providing the most information to the reader of your function. You never lose any generality or parametricity guarantees by doing so, since those only depend on the type, which you already determined in step 1.
Edit in response to comments: I was reminded of the biggest reason to use the most specific function available, which is catching bugs. The type length :: [a] -> Int is essentially the entire reason that I still use GHC 7.8. It's never happened that I wanted to take the length of an unknown Foldable structure. On the other hand, I definitely do not want to ever accidentally take the length of a pair, or take the length of foo bar baz which I think has type [a], but actually has type Maybe [a].
In the use cases for Foldable that are not already covered by the rest of the Haskell standard, lens is a vastly more powerful alternative. If I want the "length" of a Maybe t, lengthOf _Just :: Maybe t -> Int expresses my intent clearly, and the compiler can check that the program actually matches my intent; and I can go on to write lengthOf _Nothing, lengthOf _Left, etc. Explicit is better than implicit.
There are some "redundant" functions like liftM, ap, and liftA that have a very real use and taking them out would cause loss of functionality --- you can use liftM, ap, and liftA to implement your Functor or Applicative instances if all you've written is a Monad instance. It lets you be lazy and do, say:
instance Monad Foo where
return = ...
(>>=) = ...
Now you've done all of the rewarding work of defining a Monad instance, but this won't compile. Why? Because you also need a Functor and Applicative instance.
So, because you're quickly prototyping, or lazy, or can't think of a better way, you can just get a free Functor and Applicative instance:
instance Functor Foo where
fmap = liftM
instance Applicative Foo where
pure = return
(<*>) = ap
In fact, you can just copy-and-paste that chunk of code everywhere you need to quickly define a Functor or Applicative instance when you already have a Monad instance defined.
The same goes for fmapDefault from Data.Traversable. If you've implemented Traversable, you can also implement Foldable and Functor:
instance Functor Bar where
fmap = fmapDefault
no extra work required!
There are some redundant functions, however, that really have no actual usage other than being historical accidents from a time when Functor was not a superclass of Monad. These have literally zero use/point in existing...and include things like the liftM2, liftM3 etc., and (>>) and friends.

Functors and Non-Inductive Types

I am working through the section on Functors in the Typeclassopedia.
A simple intuition is that a Functor represents a “container” of some sort, along with the ability to apply a function uniformly to every element in the container.
OK. So, functors appear pretty natural for inductive types like lists or trees.
Functors also appear pretty simple if the number of elements is fixed to a low number. For example, with Maybe you just have to be concerned about "Nothing" or "Just a" -- two things.
So, how would you make something like a graph, that could potentially have loops, an instance of Functor? I think a more generalized way to put it is, how do non-inductive types "fit into" Functors?
The more I think about it, the more I realize that inductive / non-inductive doesn't really matter. Inductive types are just easier to define fmap for...
If I wanted to make a graph an instance of Functor, I would have to implement a graph traversal algorithm inside fmap; for example it would probably have to use a helper function that would keep track of the visited nodes. At this point, I am now wondering why bother defining it as a Functor instead of just writing this as a function itself? E.g. map vs fmap for lists...?
I hope someone with experience, war stories, and scars can shed some light. Thanks!
Well let's assume you define a graph like this
data Graph a = Node a [Graph a]
Then fmap is just defined precisely as you would expect
instance Functor Graph where
fmap f (Node a ns) = Node (f a) (map (fmap f) ns)
Now, if there's a loop then we'd have had to do something like
foo = Node 1 [bar]
bar = Node 2 [foo]
Now fmap is sufficiently lazy that you can evaluate part of it's result without forcing the rest of the computation, so it works just as well as any knot-tied graph representation would!
In general this is the trick: fmap is lazy so you can treat it's results just as you would treat any non-inductive values in Haskell (: carefully).
Also, you should define fmap vs the random other functions since
fmap is a good, well known API with rules
Your container now places well with things expecting Functors
You can abstract away other bits of your program so they depend on Functor, not your Graph
In general when I see something is a functor I think "Ah wonderful, I know just how to use that" and when I see
superAwesomeTraversal :: (a -> b) -> Foo a -> Foo b
I get a little worried that this will do unexpected things..

What does a nontrivial comonoid look like?

Comonoids are mentioned, for example, in Haskell's distributive library docs:
Due to the lack of non-trivial comonoids in Haskell, we can restrict ourselves to requiring a Functor rather than some Coapplicative class.
After a little searching I found a StackOverflow answer that explains this a bit more with the laws that comonoids would have to satisfy. So I think I understand why there's only one possible instance for a hypothetical Comonoid typeclass in Haskell.
Thus, to find a nontrivial comonoid, I suppose we'd have to look in some other category. Surely, if category theorists have a name for comonoids, then there are some interesting ones. The other answers on that page seem to hint at an example involving Supply, but I couldn't figure one out that still satisfies the laws.
I also turned to Wikipedia: there's a page for monoids that doesn't reference category theory, which seems to me as an adequate description of Haskell's Monoid typeclass, but "comonoid" redirects to a category-theoretic description of monoids and comonoids together that I can't understand, and there still don't seem to be any interesting examples.
So my questions are:
Can comonoids be explained in non-category-theoretic terms like monoids?
What is a simple example of an interesting comonoid, even if it's not a Haskell type? (Could one be found in a Kleisli category over a familiar Haskell monad?)
edit: I am not sure if this is actually category-theoretically correct, but what I was imagining in the parenthetical of question 2 was nontrivial definitions of delete :: a -> m () and split :: a -> m (a, a) for some specific Haskell type a and Haskell monad m that satisfy Kleisli-arrow versions of the comonoid laws in the linked answer. Other examples of comonoids are still welcome.
As Phillip JF mentioned, comonoids are interesting to talk about in substructural logics. Let's talk about linear lambda calculus. This is much like your normal typed lambda calculus except that every variable must be used exactly once.
To get a feel, let's count linear functions of given types, i.e.
a -> a
has exactly one inhabitant, id. While
(a,a) -> (a,a)
has two, id and flip. Note that in regular lambda calculus (a,a) -> (a,a) has four inhabitants
(a, b) ↦ (a, a)
(a, b) ↦ (b, b)
(a, b) ↦ (a, b)
(a, b) ↦ (b, a)
but the first two require that we use one of the arguments twice while discarding the other. This is exactly the essence of linear lambda calculus—disallowing those kinds of functions.
As a quick aside, what's the point of linear LC? Well, we can use it to model linear effects or resource usage. If, for instance, we have a file type and a few transformers it might look like
data File
open :: String -> File
close :: File -> () -- consumes a file, but we're ignoring purity right now
t1 :: File -> File
t2 :: File -> File
and then the following are valid pipelines:
close . t1 . t2 . open
close . t2 . t1 . open
close . t1 . open
close . t2 . open
but this "branching" computation isn't
let f1 = open "foo"
f2 = t1 f1
f3 = t2 f1
in close f3
since we used f1 twice.
Now, you might be wondering something at this point about what things must follow the linear rules. For instance, I decided that some pipelines don't have to include both t1 and t2 (compare the enumeration exercise from before). Further, I introduced the open and close functions which happily create and destroy the File type despite that being a violation of linearity.
Indeed, we might posit the existence of functions which violate linearity—but not all clients may. It's much like the IO monad—all of the secrets live inside the implementation of IO so that users work in a "pure" world.
And this is where Comonoid comes in.
class Comonoid m where
destroy :: m -> ()
split :: m -> (m, m)
A type that instantiates Comonoid in a linear lambda calculus is a type which has carry-along destruction and duplication rules. In other words, it's a type which isn't very much bound by linear lambda calculus at all.
Since Haskell doesn't implement the linear lambda calculus rules at all, we can always instantiate Comonoid
instance Comonoid a where
destroy a = ()
split a = (a, a)
Or, perhaps the other way to think of it is that Haskell is a linear LC system that just happens to instantiate Comonoid for every type and applies destroy and split for you automatically.
A monoid in the usual sense is the same as a categorical monoid in the category of sets. One would expect that a comonoid in the usual sense is the same as a categorical comonoid in the category of sets. But every set in the category of sets is a comonoid in a trivial way, so apparently there is no non-categorical description of comonoids which would be parallel to that of monoids.
Just like a monad is a monoid in the category of endofunctors (what's the problem?), a comonad is a comonoid in the category of endofunctors (what's the coproblem?) So yes, any comonad in Haskell would be an example of a comonoid.
Well one way we can think of a monoid is as hooked to any particular product construction that we're using, so in Set we'd take this signature:
mul : A * A -> A
one : A
to this one:
dup : A -> A * A
one : A
but the idea of duality is that the logical statements that you can make all have duals which can be applied to the dual objects, and there is another way of stating what a monoid is, and that's being agnostic to the choice of product construction and then when we take the costructure we can take the coproduct in the output, like:
div : A -> A + A
one : A
where + is a tagged sum. Here we essentially have that every single term which is in this type is always ready to produce a new bit, which is implicitly derived from the tag used to denote the left or the right instance of A. I personally think this is really damn cool. I think the cool version of the things that people were talking about above is when you don't particularly construct that for monoids, but for monoid actions.
A monoid M is said to act on a set A if there's a function
act : M * A -> A
where we have the following rules
act identity a = a
act f (act g a) = act (f * g) a
If we want a co-action, what exactly do we want?
act : A -> M * A
this generates us a stream of the type of our comonoid! I'm having a lot of trouble coming up with the laws for these systems, but I think they must be around somewhere so I'm gonna keep looking tonight. If somebody can tell me them or that I'm wrong about these things in some way or another, also interested in that.
As a physicist, the most common example I deal with is coalgebras, which are comonoid objects in the category of vector spaces, with the monoidal structure usually given by the tensor product.
In that case, there is a bijection between monoid and comonoid objects, since you can just take the adjoint or transpose of the product and unit maps to get a coproduct and a counit that satisfy the comonoid axioms.
In some branches of physics, it is very common to see objects that have both an algebra and a coalgebra structure with some compatibility axioms. The two most common cases are Hopf algebras and Frobenius algebras. They are very convenient for constructing states or solution that are entangled or correlated.
In programming, the simplest nontrivial example I can think of would be reference counted pointers such as shared_ptr in C++ and Rc in Rust, along with their weak equivalents. You can copy them, which is a nontrivial operation that bumps up the refcount (so the two copies are distinct from the initial state). You can drop (call the destructor) on one, which is nontrivial because it bumps down the refcount of any other refcounted pointer that points to the same piece of data.
Furthermore, weak pointers are a great example of a comonoid action. You can use the co-action to generate a weak pointer from a shared pointer. This can be easily checked by noting that creating one from a shared pointer and immediately dropping it is a unit operation, and creating one & cloning it is equivalent to creating two from the shared pointer.
This is a general thing you see with nontrivial coproducts and their co-actions: when they don't reduce to a copying operation, they intuitively imply some form of action at a distance between the two halves, while also adding an operation that erases one half to leave the other independent.

What functionality do you get for free with Functors or other type-classes?

I read an article which said:
Providing instances for the many standard type-classes [Functors] will immediately give you a lot of functionality for practically free
My question is: what is this functionality that you get for free (for functors or other type-classes)? I know what the definition of a functor is, but what do I get for free by defining something as a functor/other type-class. Something other than a prettier syntax. Ideally this would be general and useful functions that operate on functors/other type-classes.
My imagination (could be wrong) of what free means is functions of this sort: TypeClass x => useful x y = ..
== Edit/Additition ==
I guess I'm mainly asking about the more abstract (and brain boggling) type-classes, like the ones in this image. For less abstract classes like Ord, my object oriented intuition understands.
Functors are simple and probably not the best example. Let's look at Monads instead:
liftM - if something is a Monad, it is also a Functor where liftM is fmap.
>=>, <=<: you can compose a -> m b functions for free where m is your monad.
foldM, mapM, filterM... you get a bunch of utility functions that generalize existing functions to use your monad.
when, guard* and unless -- you also get some control functions for free.
join -- this is actually fairly fundamental to the definition of a monad, but you don't need to define it in Haskell since you've defined >>=.
transformers -- ErrorT and stuff. You can bolt error handling onto your new type, for free (give or take)!
Basically, you get a wide variety of standard functions "lifted" to use your new type as soon as you make it a Monad instance. It also becomes trivial (but alas not automatic) to make it a Functor and Applicative as well.
However, these are all "symptoms" of a more general idea. You can write interesting, nontrivial code that applies to all monads. You might find some of the functions you wrote for your type--which are useful in your particular case, for whatever reason--can be generalized to all monads. Now you can suddenly take your function and use it on parsers, and lists, and maybes and...
* As Daniel Fischer helpfully pointed out, guard requires MonadPlus rather than Monad.
Functors are not very interesting by themselves, but they are a necessary stepping stone to get into applicative functors and Traversables.
The main property which makes applicative functors useful is that you can use fmap with the applicative operator <*> to "lift" any function of any arity to work with applicative values. I.e. you can turn any a -> b -> c -> d into Applicative f => f a -> f b -> f c -> f d. You can also take a look at Data.Traversable and Data.Foldable which contain several general purpose functions that involve applicative functors.
Alternative is a specialized applicative functor which supports choice between alternatives that can "fail" (the exact meaning of "empty" depends in the applicative instance). Applicative parsers are one practical example where the definitions of some and many are very intuitive (e.g. match some pattern zero-or-more times or one-or-more times).
Monads are one of the most interesting and useful type-classes, but they are already well covered by the other answers.
Monoid is another type-class that is both simple and immediately useful. It basically defines a way to add two pieces of data together, which then gives you a generic concat as well as functionality in the aforementioned Foldable module and it also enables you to use the Writer monad with the data type.
There are many of the standard functions in haskell that require that their arguments implement one or more type-classes. Doing so in your code allows other developers (or yourself) to use your data in ways they are already familiar with, without having to write additional functions.
As an example, implementing the Ord type-class will allow you to use things like sort, min, max, etc. Where otherwise, you would need sortBy and the like.
Yes, it means that implementing the type class Foo gives you all the other functions that have a Foo constraint "for free".
The Functor type class isn't too interesting in that regard, as it doesn't give you a lot.
A better example is monads and the functions in the Control.Monad module. Once you've defined the two Monad functions (>>=) and return for your type, you get another thirty or so functions that can then be used on your type.
Some of the more useful ones include: mapM, sequence, forever, join, foldM, filterM, replicateM, when, unless and liftM. These show up all the time in Haskell code.
As others have said, Functor itself doesn't actually get you much for free. Basically, the more high-level or general a typeclass is (meaning the more things fit that description), then the less "free" functionality you are going to get. So for example, Functor, and Monoid don't provide you with much, but Monad and Arrow provide you with a lot of useful functions for free.
In Haskell, it's still a good idea to write an instance for Functor and Monoid though (if your data type is indeed a functor or a monoid), because we almost always try to use the most general interface possible when writing functions. If you are writing a new function that can get away with only using fmap to operate on your data type, then there is no reason to artificially restrict that function to to Monads or Applicatives, since it might be useful later for other things.
Your object-oriented intuition carries across, if you read "interface and implementation" for "typeclass and instance". If you make your new type C an instance of a standard typeclass B, then you get for free that your type will work with all existing code A that depends on B.
As others have said, when the typeclass is something like Monad, then the freebies are the many library functions like foldM and when.

How do you identify monadic design patterns?

I my way to learn Haskell I'm starting to grasp the monad concept and starting to use the known monads in my code but I'm still having difficulties approaching monads from a designer point of view. In OO there are several rules like, "identify nouns" for objects, watch for some kind of state and interface... but I'm not able to find equivalent resources for monads.
So how do you identify a problem as monadic in nature? What are good design patterns for monadic design? What's your approach when you realize that some code would be better refactored into a monad?
A helpful rule of thumb is when you see values in a context; monads can be seen as layering "effects" on:
Maybe: partiality (uses: computations that can fail)
Either: short-circuiting errors (uses: error/exception handling)
[] (the list monad): nondeterminism (uses: list generation, filtering, ...)
State: a single mutable reference (uses: state)
Reader: a shared environment (uses: variable bindings, common information, ...)
Writer: a "side-channel" output or accumulation (uses: logging, maintaining a write-only counter, ...)
Cont: non-local control-flow (uses: too numerous to list)
Usually, you should generally design your monad by layering on the monad transformers from the standard Monad Transformer Library, which let you combine the above effects into a single monad. Together, these handle the majority of monads you might want to use. There are some additional monads not included in the MTL, such as the probability and supply monads.
As far as developing an intuition for whether a newly-defined type is a monad, and how it behaves as one, you can think of it by going up from Functor to Monad:
Functor lets you transform values with pure functions.
Applicative lets you embed pure values and express application — (<*>) lets you go from an embedded function and its embedded argument to an embedded result.
Monad lets the structure of embedded computations depend on the values of previous computations.
The easiest way to understand this is to look at the type of join:
join :: (Monad m) => m (m a) -> m a
This means that if you have an embedded computation whose result is a new embedded computation, you can create a computation that executes the result of that computation. So you can use monadic effects to create a new computation based on values of previous computations, and transfer control flow to that computation.
Interestingly, this can be a weakness of structuring things monadically: with Applicative, the structure of the computation is static (i.e. a given Applicative computation has a certain structure of effects that cannot change based on intermediate values), whereas with Monad it is dynamic. This can restrict the optimisation you can do; for instance, applicative parsers are less powerful than monadic ones (well, this isn't strictly true, but it effectively is), but they can be optimised better.
Note that (>>=) can be defined as
m >>= f = join (fmap f m)
and so a monad can be defined simply with return and join (assuming it's a Functor; all monads are applicative functors, but Haskell's typeclass hierarchy unfortunately doesn't require this for historical reasons).
As an additional note, you probably shouldn't focus too heavily on monads, no matter what kind of buzz they get from misguided non-Haskellers. There are many typeclasses that represent meaningful and powerful patterns, and not everything is best expressed as a monad. Applicative, Monoid, Foldable... which abstraction to use depends entirely on your situation. And, of course, just because something is a monad doesn't mean it can't be other things too; being a monad is just another property of a type.
So, you shouldn't think too much about "identifying monads"; the questions are more like:
Can this code be expressed in a simpler monadic form? With which monad?
Is this type I've just defined a monad? What generic patterns encoded by the standard functions on monads can I take advantage of?
Follow the types.
If you find you have written functions with all of these types
(a -> b) -> YourType a -> YourType b
a -> YourType a
YourType (YourType a) -> YourType a
or all of these types
a -> YourType a
YourType a -> (a -> YourType b) -> YourType b
then YourType may be a monad. (I say “may” because the functions must obey the monad laws as well.)
(Remember you can reorder arguments, so e.g. YourType a -> (a -> b) -> YourType b is just (a -> b) -> YourType a -> YourType b in disguise.)
Don't look out only for monads! If you have functions of all of these types
YourType
YourType -> YourType -> YourType
and they obey the monoid laws, you have a monoid! That can be valuable too. Similarly for other typeclasses, most importantly Functor.
There's the effect view of monads:
Maybe - partiality / failure short-circuiting
Either - error reporting / short-circuiting (like Maybe with more information)
Writer - write only "state", commonly logging
Reader - read-only state, commonly environment passing
State - read / write state
Resumption - pausable computation
List - multiple successes
Once you are familiar with these effects its easy to build monads combining them with monad transformers. Note that combining some monads needs special care (particularly Cont and any monads with backtracking).
One thing important to note is there aren't many monads. There are some exotic ones that aren't in the standard libraries e.g the probability monad and variations of the Cont monad like Codensity. But unless you are doing something mathematical its unlikely you will invent (or discover) a new monad, however if you use Haskell long enough you'll build many monads that are different combinations of the standard ones.
Edit - Also note that the order you stack monad transformers results in different monads:
If you add ErrorT (transformer) to a Writer monad, you get this monad Either err (log,a) - you can only access the log if you have no error.
If you add WriterT (transfomer) to an Error monad, you get this monad (log, Either err a) which always gives access to the log.
This is sort of a non-answer, but I feel it is important to say anyways. Just ask! StackOverflow, /r/haskell, and the #haskell irc channel are all great places to get quick feedback from smart people. If you are working on a problem, and you suspect that there's some monadic magic that could make it easier, just ask! The Haskell community loves to solve problems, and is ridiculously friendly.
Don't misunderstand, I'm not encouraging you to never learn for yourself. Quite the contrary, interacting with the Haskell community is one of the best ways to learn. LYAH and RWH, 2 Haskell books that are freely available online, come highly recommended as well.
Oh, and don't forget to play, play, play! As you play around with monadic code, you'll start to get the feel of what "shape" monads have, and when monadic combinators can be useful. If you're rolling your own monad, then usually the type system will guide you to an obvious, simple solution. But to be honest, you should rarely need to roll your own instance of Monad, since Haskell libraries provide tons of useful things as mentioned by other answerers.
There's a common notion that one sees in many programming languages of an "infectious function tag" -- some special behavior for a function that must extend to its callers as well.
Rust functions can be unsafe, meaning they perform operations that can potentially violate memory unsafety. unsafe functions can call normal functions, but any function that calls an unsafe function must be unsafe as well.
Python functions can be async, meaning they return a promise rather than an actual value. async functions can call normal functions, but invocation of an async function (via await) can only be done by another async function.
Haskell functions can be impure, meaning they return an IO a rather than an a. Impure functions can call pure functions, but impure functions can only be called by other impure functions.
Mathematical functions can be partial, meaning they don't map every value in their domain to an output. The definitions of partial functions can reference total functions, but if a total function maps some of its domain to a partial function, it becomes partial as well.
While there may be ways to invoke a tagged function from an untagged function, there is no general way, and doing so can often be dangerous and threatens to break the abstraction the language tries to provide.
The benefit, then, of having tags is that you can expose a set of special primitives that are given this tag and have any function that uses these primitives make that clear in its signature.
Say you're a language designer and you recognize this pattern, and you decide that you want to allow user-defined tags. Let's say the user defined a tag Err, representing computations that may throw an error. A function using Err might look like this:
function div <Err> (n: Int, d: Int): Int
if d == 0
throwError("division by 0")
else
return (n / d)
If we wanted to simplify things, we might observe that there's nothing erroneous about taking arguments - it's computing the return value where problems might arise. So we can restrict tags to functions that take no arguments, and have div return a closure rather than the actual value:
function div(n: Int, d: Int): <Err> () -> Int
() =>
if d == 0
throwError("division by 0")
else
return (n / d)
In a lazy language such as Haskell, we don't need the closure, and can just return a lazy value directly:
div :: Int -> Int -> Err Int
div _ 0 = throwError "division by 0"
div n d = return $ n / d
It is now apparent that, in Haskell, tags need no special language support - they are ordinary type constructors. Let's make a typeclass for them!
class Tag m where
We want to be able to call an untagged function from a tagged function, which is equivalent to turning an untagged value (a) into a tagged value (m a).
addTag :: a -> m a
We also want to be able to take a tagged value (m a) and apply a tagged function (a -> m b) to get a tagged result (m b):
embed :: m a -> (a -> m b) -> m b
This, of course, is precisely the definition of a monad! addTag corresponds to return, and embed corresponds to (>>=).
It is now clear that "tagged functions" are merely a type of monad. As such, whenever you spot a place where a "function tag" could apply, chances are you've got a place suitable for a monad.
P.S. Regarding the tags I've mentioned in this answer: Haskell models impurity with the IO monad and partiality with the Maybe monad. Most languages implement async/promises fairly transparently, and there seems to be a Haskell package called promise that mimics this functionality. The Err monad is equivalent to the Either String monad. I'm not aware of any language that models memory unsafety monadically, it could be done.

Resources