I'm playing around with rewriting simple functions in different ways and I clearly misunderstand some core concepts. Is there a better way to work with limited types like these?
mlength :: Monoid m => m -> Int
mlength mempty = 0
mlength (l <> r) = mlength l + mlength r
It fails compilation with the following error:
Parse error in pattern: l <> r
I can see that my usage of <> is misguided because there are multiple correct matches for l and r. Even though it looks like it doesn't matter which value is assigned, a value still has to be assigned in the end. Maybe there's a way for me to assert this decision for specific Monoid instances?
"ab" == "" <> "ab"
"ab" == "a" <> "b"
"ab" == "ab" <> ""
A monoid, in the general case, has no notion of length. Take for instance Sum Int, which is Int equipped with addition for its monoidal operation. We have
Sum 3 <> Sum 4 = Sum 7 = Sum (-100) <> Sum 7 <> Sum (100)
What should be its "length"? There is no real notion of length here, since the underlying type is Int, which is not a list-like type.
Another example: Endo Int which is Int -> Int equipped with composition. E.g.
Endo (\x -> x+1) <> Endo (\x -> x*2) = Endo (\x -> 2*x+1)
Again, no meaningful "length" can be defined here.
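Both equalities can be checked in GHCi. Here is a small sketch using Sum and Endo from Data.Monoid. The names sevenA, sevenB and incAfterDouble are only illustrative. Note that Endo composes right to left, so (* 2) runs first:

```haskell
import Data.Monoid (Endo (..), Sum (..))

-- Two different ways of building the same Sum value: the result
-- keeps no trace of how many pieces it was made from.
sevenA, sevenB :: Sum Int
sevenA = Sum 3 <> Sum 4
sevenB = Sum (-100) <> Sum 7 <> Sum 100

-- Endo composes like (.): (* 2) is applied first, then (+ 1).
incAfterDouble :: Endo Int
incAfterDouble = Endo (+ 1) <> Endo (* 2)
```

Evaluating `appEndo incAfterDouble 5` gives 11, and `sevenA == sevenB` holds. There is nothing in either value to count.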
You can browse Data.Monoid and see other examples where there is no notion of "length".
Const a is also a (boring) monoid with no length.
Now, it is true that lists [a] form a monoid (the free monoid over a), and length can indeed be defined there. Still, this is only a particular case, which does not generalize.
The Semigroup and Monoid interfaces provide a means to build up values, (<>). They don't, however, give us a way to break down or otherwise extract information from values. That being so, a length generalised beyond some specific type requires a different abstraction.
As discussed in the comments to chi's answer, while Data.Foldable offers a generalised length :: Foldable t => t a -> Int, it isn't quite what you were aiming at -- in particular, the connection between Foldable and Monoid is that foldable structures can be converted to lists/the free monoid, and not that foldables themselves are necessarily monoids.
One other possibility, which is somewhat obscure but closer to the spirit of your question, is the Factorial class from the monoid-subclasses package, a subclass of Semigroup. It is built around factors :: Factorial m => m -> [m], which splits a value into irreducible factors, undoing what sconcat or mconcat do. A generalised length :: Factorial m => m -> Int can then be defined as the length of the list of factors. In any case, note that we still end up needing a further abstraction on top of Semigroup/Monoid.
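Here is a rough sketch of a class in the same spirit, to make the idea concrete. It is a hypothetical stand-in: SplitsIntoFactors, factorsOf and factorLength are made-up names. It is not the actual Factorial class from monoid-subclasses, which has more methods and laws:

```haskell
-- Hypothetical stand-in for the idea behind monoid-subclasses' Factorial;
-- not that package's actual class definition.
class Monoid m => SplitsIntoFactors m where
  -- Split a value into irreducible pieces whose mconcat is the original value.
  factorsOf :: m -> [m]

-- For lists, the irreducible factors are the singleton lists.
instance SplitsIntoFactors [a] where
  factorsOf = map (: [])

-- A "length" defined only in terms of the factorisation.
factorLength :: SplitsIntoFactors m => m -> Int
factorLength = length . factorsOf
```

The extra method is what makes a length possible. A type like Sum Int has no canonical factorisation, so it gets no instance.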
Related
I think I have a pretty clear understanding of what a Monoid is in Haskell, but recently I heard about something called a free monoid.
What is a free monoid and how does it relate to a monoid?
Can you provide an example in Haskell?
As you already know, a monoid is a set with an element e and an operation <> satisfying
e <> x = x <> e = x (identity)
(x<>y)<>z = x<>(y<>z) (associativity)
Now, a free monoid, intuitively, is a monoid which satisfies only those equations above, and, obviously, all their consequences.
For instance, the Haskell list monoid ([a], [], (++)) is free.
By contrast, the Haskell sum monoid (Sum Int, Sum 0, \(Sum x) (Sum y) -> Sum (x+y)) is not free, since it also satisfies additional equations. For instance, it is commutative:
x<>y = y<>x
and this does not follow from the first two equations.
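Here is a quick way to see the extra equation, sketched with Data.Monoid's Sum. Commutativity holds for Sum Int but fails for lists, which satisfy only the identity and associativity laws:

```haskell
import Data.Monoid (Sum (..))

-- Sum satisfies an extra equation (commutativity) ...
sumCommutes :: Bool
sumCommutes = Sum 1 <> Sum 2 == (Sum 2 <> Sum 1 :: Sum Int)

-- ... which the list monoid does not.
listCommutes :: Bool
listCommutes = [1] <> [2] == ([2] <> [1] :: [Int])
```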
Note that it can be proved, in maths, that the free monoid on a set a is unique up to isomorphism, so every free monoid on a is isomorphic to the list monoid [a]. So, "free monoid" in programming is only a fancy term for any data structure which 1) can be converted to a list, and back, with no loss of information, and 2) vice versa, a list can be converted to it, and back, with no loss of information.
In Haskell, you can mentally substitute "free monoid" with "list-like type".
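As an illustration of a "list-like type", here is a hypothetical snoc list, where elements are added at the right end. The sketch assumes the round trip through [a] is all we care about. toList' and fromList' are inverse to each other, and concatenation is inherited from lists, so this is a free monoid in disguise:

```haskell
-- A hypothetical list-like type that grows at the right end.
data SnocList a = Lin | Snoc (SnocList a) a
  deriving (Eq, Show)

-- Convert to an ordinary list, preserving element order.
toList' :: SnocList a -> [a]
toList' = go []
  where
    go acc Lin = acc
    go acc (Snoc xs x) = go (x : acc) xs

-- Convert back from an ordinary list.
fromList' :: [a] -> SnocList a
fromList' = foldl Snoc Lin

-- Concatenation, defined through the list round trip.
instance Semigroup (SnocList a) where
  xs <> ys = fromList' (toList' xs ++ toList' ys)

instance Monoid (SnocList a) where
  mempty = Lin
```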
In a programming context, I usually translate free monoid to [a]. In his excellent series of articles about category theory for programmers, Bartosz Milewski describes free monoids in Haskell as the list monoid (assuming one ignores some problems with infinite lists).
The identity element is the empty list, and the binary operation is list concatenation:
Prelude Data.Monoid> mempty :: [Int]
[]
Prelude Data.Monoid> [1..3] <> [7..10]
[1,2,3,7,8,9,10]
Intuitively, I think of this monoid as 'free' because it is a monoid that you can always apply, regardless of the type of value you want to work with (just like the free monad is a monad you can always create from any functor).
Additionally, when more than one monoid exists for a type, the free monoid defers the decision on which specific monoid to use. For example, for integers, infinitely many monoids exist, but the most common are addition and multiplication.
If you have two (or more) integers, and you know that you may want to aggregate them, but you haven't yet decided which type of aggregation you want to apply, you can instead 'aggregate' them using the free monoid - practically, this means putting them in a list:
Prelude Data.Monoid> [3,7]
[3,7]
If you later decide that you want to add them together, then that's possible:
Prelude Data.Monoid> getSum $ mconcat $ Sum <$> [3,7]
10
If, instead, you wish to multiply them, you can do that as well:
Prelude Data.Monoid> getProduct $ mconcat $ Product <$> [3,7]
21
In these two examples, I've deliberately chosen to wrap each number in a type (Sum, Product) that embodies a more specific monoid, and then use mconcat to perform the aggregation.
For addition and multiplication, there are more succinct ways to do this, but I did it that way to illustrate how you can use a more specific monoid to interpret the free monoid.
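The same idea can be expressed with foldMap, which picks the monoid at the point of interpretation. Here is a minimal sketch. The names numbers, total, productOf and allOdd are illustrative, and All is another Data.Monoid wrapper, for conjunction:

```haskell
import Data.Monoid (All (..), Product (..), Sum (..))

-- The "free" aggregation: just keep the numbers in a list.
numbers :: [Int]
numbers = [3, 7]

-- Later, choose an interpretation by choosing the monoid given to foldMap.
total :: Int
total = getSum (foldMap Sum numbers)

productOf :: Int
productOf = getProduct (foldMap Product numbers)

allOdd :: Bool
allOdd = getAll (foldMap (All . odd) numbers)
```

Here total is 10, productOf is 21, and allOdd is True. The list itself never committed to any of these.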
A free monoid is a specific type of monoid. Specifically, it’s the monoid you get by taking some fixed set of elements as characters and then forming all possible strings from those elements. Those strings, with the underlying operation being string concatenation, form a monoid, and that monoid is called the free monoid.
A monoid (M,•,1) is a mathematical structure such that:
1. M is a set
2. 1 is a member of M
3. • : M × M -> M
4. a•1 = a = 1•a
5. Given elements a, b and c in M, we have a•(b•c) = (a•b)•c.
A free monoid on a set M is a monoid (M',•,0) together with a function e : M -> M' such that, for any monoid (N,*,1) and any (set) map f : M -> N, f extends to a unique monoid morphism f' : (M',•,0) -> (N,*,1), i.e.
f a = f' (e a)
f' 0 = 1
f' (a•b) = (f' a) * (f' b)
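Take M' = [a] and e = (:[]), and let the target monoid's <> play the role of *. In Haskell terms, the extension f' is then just foldMap f. The sketch below checks the three equations for one sample map into Sum Int. It does not prove uniqueness, and extend and f are names chosen for illustration:

```haskell
import Data.Monoid (Sum (..))

-- e: embed a generator into the free monoid [a].
e :: a -> [a]
e x = [x]

-- f': the monoid morphism extending a map on generators; foldMap does this.
extend :: Monoid n => (a -> n) -> [a] -> n
extend = foldMap

-- A sample set map f : Int -> Sum Int.
f :: Int -> Sum Int
f = Sum
```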
In other words, it is a monoid that does nothing special.
An example monoid is the integers with the operation being addition and the identity being 0. Another monoid is sequences of integers with the operation being concatenation and the identity being the empty sequence. Now the integers under addition are not a free monoid on the integers (taking e to be the identity map). Consider the map into sequences of integers taking n to (n). For the integers to be free, we would need to extend this to a monoid morphism taking n + m to (n,m); in particular, since 0 = 0 + 0 = 0 + 0 + 0, it would have to take 0 to (0) and to (0,0) and to (0,0,0) and so on, which is impossible.
On the other hand if we try to look at sequences of integers as a free monoid on the integers, we see that it seems to work in this case. The extension of the map into the integers with addition is one that takes the sum of a sequence (with the sum of () being 0).
So what is the free monoid on a set S? Well, one thing we could try is just arbitrary binary trees of S. In a Haskell type this would look like:
data T a = Unit | Single a | Conc (T a) (T a)
And it would have an identity of Unit, e = Single and (•) = Conc.
And we can write a function to show how it is free:
-- here the second argument represents a monoid structure on b
free :: (a -> b) -> (b -> b -> b, b) -> T a -> b
free f ((*),zero) = f' where
  f' (Single a) = f a
  f' Unit = zero
  f' (Conc a b) = f' a * f' b
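It should be fairly clear that this satisfies the required properties of a free monoid on a, except for one: T a is not actually a monoid, because it satisfies neither law 4 (identity) nor law 5 (associativity). For example, Conc Unit (Single a) and Single a are different values.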
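Here is a small sketch of why T a fails the laws. With derived structural equality, Conc Unit (Single 1) and Single 1 are different values, and so are the two bracketings, even though free sends each pair to the same result. The operator argument is named op here instead of (*), to avoid shadowing the Prelude:

```haskell
-- The binary-tree candidate from the text, with structural equality.
data T a = Unit | Single a | Conc (T a) (T a)
  deriving (Eq, Show)

-- Interpret a tree in any monoid structure (op, zero) on b.
free :: (a -> b) -> (b -> b -> b, b) -> T a -> b
free f (op, zero) = f'
  where
    f' (Single a) = f a
    f' Unit = zero
    f' (Conc a b) = f' a `op` f' b

-- Two trees that a lawful monoid would have to identify.
leftUnit, plain :: T Int
leftUnit = Conc Unit (Single 1)
plain = Single 1
```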
So now we should ask if we can make this into a simpler free monoid, i.e. one that is an actual monoid. The answer is yes. One way is to observe that Conc Unit (Single a), Conc (Single a) Unit and Single a should be the same. So let's make the first two forms unrepresentable:
data TInner a = Single a | Conc (TInner a) (TInner a)
data T a = Unit | Inner (TInner a)
A second observation we can make is that there should be no difference between Conc (Conc a b) c and Conc a (Conc b c). This is due to law 5 (associativity) above. We can then flatten our tree:
data TInner a = Single a | Conc (a,TInner a)
data T a = Unit | Inner (TInner a)
The strange construction with Conc forces us to only have a single way to represent Single a and Unit. But we see we can merge these all together: change the definition of Conc to Conc [a] and then we can change Single x to Conc [x], and Unit to Conc [] so we have:
data T a = Conc [a]
Or we can just write:
type T a = [a]
And the operations are:
unit = []
e a = [a]
(•) = (++)
free f ((*),zero) = f' where
  f' [] = zero
  f' (x:xs) = f x * f' xs
So in Haskell, the list type is called the free monoid.
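The simplification steps above amount to flattening the tree into a list. In the sketch below, the original tree shape is reused under the illustrative name Tree. Flattening sends trees that differ only in Unit placement or bracketing to the same list, which is exactly the identification the monoid laws demand:

```haskell
-- The original tree shape, before any simplification.
data Tree a = Unit | Single a | Conc (Tree a) (Tree a)

-- Flatten a tree into a list: this forgets exactly the bracketing
-- and the placement of Unit, which the monoid laws say should not matter.
flatten :: Tree a -> [a]
flatten Unit = []
flatten (Single x) = [x]
flatten (Conc l r) = flatten l ++ flatten r
```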
Since I had trouble googling this question I thought I'd post it here.
I'm just interested in the logic behind it, or whether it's just the creators' preference to use ++ instead. I mean, using a typeclass for strings that concatenates two strings (or rather lists) with + does not seem too hard to imagine.
Edit: I should add that in Haskell one has to suspect reasons behind it, because + and ++ are ordinary library functions (+ being a method of the Num typeclass), whereas in Java the usage of + for string concatenation is just part of the language's syntax and therefore subject only to the creators' preference/opinion. (The answers so far suggest that I was right about my suspicion.)
Also, Haskell comes from a mathematical background and is deeply influenced by mathematical syntax, so there might be deeper reasons than just preference/opinion.
typeclass for strings that concatenates two strings
Such a typeclass exists, although the operator isn't +, but <>:
Prelude> :m +Data.Monoid
Prelude Data.Monoid> "foo" <> "bar"
"foobar"
While ++ concatenates lists, the <> operator is more general, since it combines any two values of a given Monoid instance.
As other people have pointed out, + is reserved for Num instances. Why isn't the Monoid binary operator called +, then? Because addition is only one of infinitely many monoids; multiplication is another:
Prelude Data.Monoid> Sum 2 <> Sum 3
Sum {getSum = 5}
Prelude Data.Monoid> Product 2 <> Product 3
Product {getProduct = 6}
Choosing something like <> as 'the' monoidal operator is preferred exactly because it carries little semantic baggage.
Long story short, it would cause type troubles.
(+) is part of the Num typeclass:
class Num a where
  (+), (-), (*) :: a -> a -> a
  negate :: a -> a
  abs :: a -> a
  signum :: a -> a
  fromInteger :: Integer -> a
  x - y = x + negate y
  negate x = 0 - x
And (++) :: [a] -> [a] -> [a].
It's easy to see the first problem: if we wanted (+) to work on lists, we would have to implement (*), negate, abs, signum, and fromInteger for lists as well, which makes little sense.
If we decided to separate (+) from the typeclass and make a new typeclass for it, maybe called Plussable, there would be too many typeclasses to keep track of, and simple expressions like 1 + 2*(2-1) would no longer have type Num a => a; they would have type (Plussable a, Timesable a, Minusable a) => a, and so on for each operation. It would be far too complicated.
According to Harper (https://existentialtype.wordpress.com/2011/04/16/modules-matter-most/), it seems that Type Classes simply do not offer the same level of abstraction that Modules offer and I'm having a hard time exactly figuring out why. And there are no examples in that link, so it's hard for me to see the key differences. There are also other papers on how to translate between Modules and Type Classes (http://www.cse.unsw.edu.au/~chak/papers/modules-classes.pdf), but this doesn't really have anything to do with the implementation in the programmer's perspective (it just says that there isn't something one can do that the other can't emulate).
Specifically, in the first link:
The first is that they insist that a type can implement a type class in exactly one way. For example, according to the philosophy of type classes, the integers can be ordered in precisely one way (the usual ordering), but obviously there are many orderings (say, by divisibility) of interest. The second is that they confound two separate issues: specifying how a type implements a type class and specifying when such a specification should be used during type inference.
I don't understand either point. A type can implement a type class in more than one way in ML? How would you have the integers ordered by divisibility, for example, without creating a new type? In Haskell, you would have to do something like define a new type with data or newtype and give it its own Ord instance to offer an alternative ordering.
And as for the second point, aren't the two already distinct in Haskell?
Specifying "when such a specification should be used during type inference" can be done by something like this:
blah :: BlahType b => ...
where BlahType is the class being used during the type inference and NOT the implementing class. Whereas, "how a type implements a type class" is done using instance.
Can someone explain what the link is really trying to say? I'm just not quite understanding why Modules would be less restrictive than Type Classes.
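For reference, here is a sketch of the Haskell options the question alludes to. Divisibility is only a partial order, so it cannot back a lawful Ord instance, and the hypothetical ByAbs newtype below orders by absolute value instead. The alternatives are to pass a comparison function explicitly (sortBy with comparing) or to use the Down wrapper from Data.Ord:

```haskell
import Data.List (sort, sortBy, sortOn)
import Data.Ord (Down (..), comparing)

-- Hypothetical wrapper: a second ordering on Int, by absolute value.
newtype ByAbs = ByAbs Int
  deriving Show

-- Eq must agree with Ord, so equality is also up to sign.
instance Eq ByAbs where
  ByAbs a == ByAbs b = abs a == abs b

instance Ord ByAbs where
  compare (ByAbs a) (ByAbs b) = compare (abs a) (abs b)

-- Sort via the newtype's instance.
sortByAbs :: [Int] -> [Int]
sortByAbs = map unwrap . sort . map ByAbs
  where
    unwrap (ByAbs x) = x

-- Same result, passing the ordering as an explicit argument instead.
sortByAbsDirect :: [Int] -> [Int]
sortByAbsDirect = sortBy (comparing abs)

-- Reverse of the usual order, via the standard Down newtype.
descending :: [Int] -> [Int]
descending = sortOn Down
```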
To understand what the article is saying, take a moment to consider the Monoid typeclass in Haskell. A monoid is any type, T, which has a function mappend :: T -> T -> T and an identity element mempty :: T for which the following laws hold.
a `mappend` (b `mappend` c) == (a `mappend` b) `mappend` c
a `mappend` mempty == mempty `mappend` a == a
There are many Haskell types which fit this definition. One example that springs immediately to mind is the integers, for which we can define the following.
instance Monoid Integer where
  mappend = (+)
  mempty = 0
You can confirm that all of the requirements hold.
a + (b + c) == (a + b) + c
a + 0 == 0 + a == a
Indeed, those conditions hold for all numbers over addition (at least mathematically; floating-point addition is not exactly associative), so we can define the following as well.
instance Num a => Monoid a where
  mappend = (+)
  mempty = 0
So now, in GHCi, we can do the following.
> mappend 3 5
8
> mempty
0
Particularly observant readers (or those with a background in mathematics) will probably have noticed by now that we can also define a Monoid instance for numbers over multiplication.
instance Num a => Monoid a where
  mappend = (*)
  mempty = 1
a * (b * c) == (a * b) * c
a * 1 == 1 * a == a
But now the compiler encounters a problem. Which definition of mappend should it use for numbers? Does mappend 3 5 equal 8 or 15? There is no way for it to decide. This is why Haskell does not allow multiple instances of a single typeclass for the same type. However, the issue still stands. Which Monoid instance of Num should we use? Both are perfectly valid and make sense for certain circumstances. The solution is to use neither. If you look up Monoid on Hackage, you will see that there is no Monoid instance for Num types, nor for Integer, Int, Float, or Double for that matter. Instead, there are Monoid instances for Sum and Product. Sum and Product are defined as follows.
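newtype Sum a = Sum { getSum :: a }
newtype Product a = Product { getProduct :: a }
-- Since GHC 8.4, Semigroup is a superclass of Monoid: the operation
-- (<>) lives in Semigroup, and mappend defaults to (<>).
instance Num a => Semigroup (Sum a) where
  Sum a <> Sum b = Sum (a + b)
instance Num a => Monoid (Sum a) where
  mempty = Sum 0
instance Num a => Semigroup (Product a) where
  Product a <> Product b = Product (a * b)
instance Num a => Monoid (Product a) where
  mempty = Product 1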
Now, if you want to use a number as a Monoid you have to wrap it in either a Sum or a Product type. Which type you use determines which Monoid instance is used. This is the essence of what the article was trying to describe. There is nothing built into Haskell's typeclass system which allows you to choose between multiple instances. Instead you have to jump through hoops by wrapping and unwrapping values in newtype wrappers. Now, whether or not you consider this a problem is a large part of what determines whether you prefer Haskell or ML.
ML gets around this by allowing multiple "instances" of the same class and type to be defined in different modules. Then, which module you import determines which "instance" you use. (Strictly speaking, ML doesn't have classes and instances, but it does have signatures and structures, which can act almost the same. For a more in-depth comparison, read this paper.)
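To get a feel for the ML side without leaving Haskell, one can pass the "instance" around as an ordinary value. This is only an analogy, sketched with a hypothetical MonoidDict record; the names combine, identity and mconcatWith are made up. Real ML would use a signature and structures, or a functor:

```haskell
-- Hypothetical record playing the role of an ML structure that matches
-- a MONOID signature: the "instance" is just an ordinary value.
data MonoidDict a = MonoidDict
  { combine :: a -> a -> a
  , identity :: a
  }

-- Two "instances" for the same type coexist without newtype wrappers.
additive, multiplicative :: Num a => MonoidDict a
additive = MonoidDict (+) 0
multiplicative = MonoidDict (*) 1

-- mconcat, parameterised by an explicitly chosen dictionary.
mconcatWith :: MonoidDict a -> [a] -> a
mconcatWith d = foldr (combine d) (identity d)
```

The choice between 8 and 15 for 3 and 5 is made by the caller, not by type inference. That is the trade-off the article describes.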
So, I'm learning Haskell at the moment, and I would like to confirm or debunk my understanding of monoid.
What I figured out from reading the CIS194 course is that a monoid is basically an "API" for defining a custom binary operation on a custom set.
Then I went to read up some more and stumbled upon a massive amount of very confusing tutorials trying to clarify the thing, so I'm not so sure anymore.
I have decent mathematical background, but I just got confused from all the metaphors and am looking for clear yes/no answer to my understanding of monoid.
From Wikipedia:
In abstract algebra, a branch of mathematics, a monoid is an algebraic structure with a single associative binary operation and an identity element.
I think your understanding is correct. From a programming perspective, Monoid is an interface with two "methods" that must be implemented.
The only piece that seems to be missing from your description is the "identity", without which you are describing a Semigroup.
Anything that has a "zero" or an "empty" and a way of combining two values can be a Monoid. One thing to note is that it may be possible for a set/type to be made a Monoid in more than one way, for example numbers via addition with identity 0, or multiplication with identity 1.
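Here is an example of defining the "API" for your own type: a hypothetical Longest wrapper that keeps the longer of two strings, taking the left one on ties. The operation is associative and the empty string is its identity, so it is a lawful Monoid. This is a sketch, not a standard library type:

```haskell
-- Hypothetical example type: keep the longer of two strings
-- (the left one on ties).
newtype Longest = Longest String
  deriving (Eq, Show)

instance Semigroup Longest where
  Longest a <> Longest b
    | length b > length a = Longest b
    | otherwise = Longest a

-- The empty string is the identity: it never beats a non-empty string,
-- and on a tie both sides are already "".
instance Monoid Longest where
  mempty = Longest ""
```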
from Wolfram:
A monoid is a set that is closed under an associative binary operation and has an identity element I in S such that for all a in S, Ia=aI=a.
from Wiki:
In abstract algebra, a branch of mathematics, a monoid is an algebraic structure with a single associative binary operation and an identity element.
so your intuition is more or less right.
You should only keep in mind that it's not defined for a "custom set" in Haskell but a type. The distinction is small (because types in type theory are very similar to sets in set theory) but the types for which you can define a Monoid instance need not be types that represent mathematical sets.
In other words: a type describes the set of all values that are of that type. Monoid is an "interface" that states that any type that claims to adhere to that interface must provide an identity value, a binary operation combining two values of that type, and there are some equations these should satisfy in order for all generic Monoid operations to work as intended (such as the generic summation of a list of monoid values) and not produce illogical/inconsistent results.
Also, note that the existence of an identity element in that set (type) is required for a type to be an instance of the Monoid class.
For example, natural numbers form a Monoid under both addition (identity = 0):
0 + n = n
n + 0 = n
as well as multiplication (identity = 1):
1 * n = n
n * 1 = n
also lists form a monoid under ++ (identity = []):
[] ++ xs = xs
xs ++ [] = xs
also, functions of type a -> a form a monoid under composition (identity = id)
id . f = f
f . id = f
so it's important to keep in mind that Monoid isn't about types that represents sets but about types when viewed as sets, so to say.
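The identity laws for the examples above can be spot-checked on sample values. In this small sketch, Int stands in for the naturals. Endo (the newtype for a -> a under composition) has no Eq instance, so it is compared pointwise:

```haskell
import Data.Monoid (Endo (..), Product (..), Sum (..))

-- Check both identity laws on a sample value of any Eq monoid.
identityHolds :: (Eq m, Monoid m) => m -> Bool
identityHolds x = mempty <> x == x && x <> mempty == x

-- Endo has no Eq instance, so compare the functions at a sample point.
endoIdentityAt :: Eq a => Endo a -> a -> Bool
endoIdentityAt f x =
  appEndo (mempty <> f) x == appEndo f x && appEndo (f <> mempty) x == appEndo f x
```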
as an example of a malconstructed Monoid instance, consider:
import Data.Monoid
newtype MyInt = MyInt Int deriving Show
instance Monoid MyInt where
mempty = MyInt 0
mappend (MyInt a) (MyInt b) = MyInt (a * b)
if you now try to mconcat a list of MyInt values, you'll always get MyInt 0 as the result because the identity value 0 and binary operation * don't play well together:
λ> mconcat [MyInt 1, MyInt 2]
MyInt 0
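For contrast, here is a sketch of the repaired instance. Keeping multiplication but using its actual identity, MyInt 1, makes mconcat behave as a product:

```haskell
newtype MyInt = MyInt Int
  deriving (Eq, Show)

-- Same operation as before: multiplication.
instance Semigroup MyInt where
  MyInt a <> MyInt b = MyInt (a * b)

-- Paired with its actual identity, 1.
instance Monoid MyInt where
  mempty = MyInt 1
```

With this version, `mconcat [MyInt 1, MyInt 2]` evaluates to `MyInt 2`, and `mconcat []` to `MyInt 1`.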
At a basic level you're right - it's just an API for a binary operator we denote by <>.
However, the value of the monoid concept is in its relationship to other types and classes. Culturally we've decided that <> is the natural way of joining/appending two things of the same type together.
Consider this example:
{-# LANGUAGE OverloadedStrings #-}
import Data.Monoid
greet x = "Hello, " <> x
The function greet is extremely polymorphic: x can be a String, ByteString or Text, just to name a few possibilities. Moreover, in each of these cases it does basically what you expect it to: it appends x to the string "Hello, ".
Additionally, there are lots of algorithms which will work on anything that can be accumulated, and those are good candidates for generalization to a Monoid. For example consider the foldMap function from the Foldable class:
foldMap :: Monoid m => (a -> m) -> t a -> m
Not only does foldMap generalize the idea of folding over a structure, but I can generalize how the accumulation is performed by substituting the right Monoid instance.
If I have a foldable structure t containing Ints, I can use foldMap with the Sum monoid to get the sum of the Ints, or with Product to get the product, etc.
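Here is a small sketch of that idea, using a hypothetical binary tree type whose Foldable instance is derived (DeriveFoldable). The same traversal gives a sum, a product or the list of elements, depending only on the monoid passed to foldMap:

```haskell
{-# LANGUAGE DeriveFoldable #-}
import Data.Monoid (Product (..), Sum (..))

-- Hypothetical tree; the derived Foldable visits left subtree, value, right subtree.
data Tree a = Leaf | Node (Tree a) a (Tree a)
  deriving Foldable

sample :: Tree Int
sample = Node (Node Leaf 2 Leaf) 3 (Node Leaf 4 Leaf)
```

For this tree, `getSum (foldMap Sum sample)` is 9, `getProduct (foldMap Product sample)` is 24, and `foldMap (\x -> [x]) sample` is `[2,3,4]`.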
Finally, using <> affords convenience. For instance, there is an abundance of different Set implementations, but for all of them s <> t is always the union of two sets s and t (of the same type). This enables me to write code which is agnostic of the underlying implementation of the set thereby simplifying my code. The same can be said for a lot of other data structures, e.g. sequences, trees, maps, priority queues, etc.
While I've seen all kinds of weird things in Haskell sample code, I've never seen the plus operator being overloaded. Is there something special about it?
Let's say I have a type like Pair, and I want to have something like
Pair(2,4) + Pair(1,2) = Pair(3,6)
Can one do it in haskell?
I am just curious, as I know it's possible in Scala in a rather elegant way.
Yes
(+) is part of the Num typeclass, and everyone seems to feel you can't define (*) etc for your type, but I strongly disagree.
newtype Pair a b = Pair (a,b) deriving (Eq,Show)
I think Pair a b would be nicer, or we could even just use the type (a,b) directly, but...
This is very much like the cartesian product of two Monoids, groups, rings or whatever in maths, and there's a standard way of defining a numeric structure on it, which would be sensible to use.
instance (Num a,Num b) => Num (Pair a b) where
  Pair (a,b) + Pair (c,d) = Pair (a+c,b+d)
  Pair (a,b) * Pair (c,d) = Pair (a*c,b*d)
  Pair (a,b) - Pair (c,d) = Pair (a-c,b-d)
  abs (Pair (a,b)) = Pair (abs a, abs b)
  signum (Pair (a,b)) = Pair (signum a, signum b)
  fromInteger i = Pair (fromInteger i, fromInteger i)
Now we've overloaded (+) in an obvious way, but also gone the whole hog and overloaded (*) and all the other Num functions in the same, obvious, familiar way mathematics does it for a pair. I just don't see the problem with this. In fact I think it's good practice.
*Main> Pair (3,4.0) + Pair (7, 10.5)
Pair (10,14.5)
*Main> Pair (3,4.0) + 1 -- *
Pair (4,5.0)
* - Notice that fromInteger is applied to numeric literals like 1, so this was interpreted in that context as Pair (1,1.0) :: Pair Integer Double. This is also quite nice and handy.
Overloading in Haskell is only available using type classes. In this case, (+) belongs to the Num type class, so you would have to provide a Num instance for your type.
However, Num also contains other functions, and a well-behaved instance should implement all of them in a consistent way, which in general will not make sense unless your type represents some kind of number.
So unless that is the case, I would recommend defining a new operator instead. For example,
data Pair a b = Pair a b
  deriving Show
infixl 6 |+| -- optional; set same precedence and associativity as +
Pair a b |+| Pair c d = Pair (a+c) (b+d)
You can then use it like any other operator:
> Pair 2 4 |+| Pair 1 2
Pair 3 6
I'll try to come at this question very directly, since you are keen on getting a straight "yes or no" on overloading (+). The answer is yes, you can overload it. There are two ways to overload it directly, without any other changes, and one way to overload it "correctly" which requires creating an instance of Num for your datatype. The correct way is elaborated on in the other answers, so I won't go over it.
Edit: Note that I'm not recommending the way discussed below, just documenting it. You should implement the Num typeclass and not anything I write here.
The first (and most "wrong") way to overload (+) is to simply hide the Prelude.+ function, and define your own function named (+) that operates on your datatype.
import Prelude hiding ((+)) -- hide the autoimport of +
import qualified Prelude as P -- allow us to refer to Prelude functions with a P prefix
data Pair a = Pair (a,a)
(+) :: Num a => Pair a -> Pair a -> Pair a -- redefinition of (+)
(Pair (a,b)) + (Pair (c,d)) = Pair ((P.+) a c,(P.+) b d ) -- using qualified (+) from Prelude
You can see here, we have to go through some contortions to hide the regular definition of (+) from being imported, but we still need a way to refer to it, since it's the only way to do fast machine addition (it's a primitive operation).
The second (slightly less wrong) way to do it is to define your own typeclass that only includes a new operator you name (+). You'll still have to hide the old (+) so Haskell doesn't get confused.
import Prelude hiding ((+))
import qualified Prelude as P
data Pair a = Pair (a,a)
class Addable a where
  (+) :: a -> a -> a
instance Num a => Addable (Pair a) where
  (Pair (a,b)) + (Pair (c,d)) = Pair ((P.+) a c,(P.+) b d )
This is a bit better than the first option because it allows you to use your new (+) for lots of different data types in your code.
But neither of these is recommended, because as you can see, it is very inconvenient to access the regular (+) operator that is defined in the Num typeclass. Even though Haskell allows you to redefine (+), all of the Prelude and the libraries are expecting the original (+) definition. Lucky for you, (+) is defined in a typeclass, so you can just make Pair an instance of Num. This is probably the best option, and it is what the other answerers have recommended.
The issue you are running into is that there are possibly too many functions defined in the Num typeclass (+ is one of them). This is just a historical accident, and now the use of Num is so widespread, it would be hard to change it now. Instead of splitting those functionalities out into separate typeclasses for each function (so they can be overridden separately) they are all glommed together. Ideally the Prelude would have an Addable typeclass, and a Subtractable typeclass etc. that allow you to define an instance for one operator at a time without having to implement everything that Num has in it.
Be that as it may, the fact is that you will be fighting an uphill battle if you want to write a new (+) just for your Pair data type. Too much of the other Haskell code depends on the Num typeclass and its current definition.
You might look into the Numeric Prelude if you are looking for a blue-sky reimplementation of the Prelude that tries to avoid some of the mistakes of the current one. You'll notice they've reimplemented the Prelude just as a library, no compiler hacking was necessary, though it's a huge undertaking.
Overloading in Haskell is made possible through type classes. For a good overview, you might want to look at this section in Learn You a Haskell.
The (+) operator is part of the Num type class from the Prelude:
class Num a where -- before GHC 7.4, Num also had Eq and Show as superclasses
  (+), (*), (-) :: a -> a -> a
  negate :: a -> a
  ...
So if you'd like a definition for + to work for pairs, you would have to provide an instance.
If you have a type:
data Pair a = Pair (a, a) deriving (Show, Eq)
Then you might have a definition like:
instance Num a => Num (Pair a) where
  Pair (x, y) + Pair (u, v) = Pair (x+u, y+v)
  ...
Punching this into ghci gives us:
*Main> Pair (1, 2) + Pair (3, 4)
Pair (4,6)
However, if you're going to give an instance for +, you should also be providing an instance for all of the other functions in that type class too, which might not always make sense.
If you only want the (+) operator rather than all the Num operators, what you probably want is a Monoid instance. For example, the Monoid instance for pairs looks like this:
instance (Monoid a, Monoid b) => Monoid (a, b) where
  mempty = (mempty, mempty)
  (a1, b1) `mappend` (a2, b2) = (a1 `mappend` a2, b1 `mappend` b2)
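You can make (++) an alias of mappend (hiding the Prelude's list-only (++)), and then you can write code like this, wrapping numbers in Sum since plain numbers have no Monoid instance:
(Sum 1, Sum 2) ++ (Sum 3, Sum 4) == (Sum 4, Sum 6)
("hel", "wor") ++ ("lo", "ld") == ("hello", "world")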
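Here is a runnable sketch of the alias idea, assuming you hide the Prelude's list-only (++). Numbers are wrapped in Sum because plain numeric types have no Monoid instance, and the pair instance used is the one that already ships with base:

```haskell
import Prelude hiding ((++))
import Data.Monoid (Sum (..))

-- (++) generalised to any monoid; the Prelude's list-only version is hidden.
infixr 5 ++
(++) :: Monoid m => m -> m -> m
(++) = (<>)
```

Ordinary list concatenation keeps working, since lists are monoids too.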