Translated from https://gist.github.com/cscalfani/b0a263cf1d33d5d75ca746d81dac95c5
Why should programmers care about Monoids? Because Monoids are a common pattern that recurs in programming. When patterns emerge, we can abstract them away and leverage what we've done in the past. This allows us to rapidly develop solutions on top of proven stable code.
Add "commutative" to the Monoid (Commutative Monoid) and you have something that can be executed in parallel. With Moore's Law coming to an end, parallel computing is our only hope of increasing processing speed.
Here's what I learned after learning Monoids. It's not necessarily complete, but hopefully it will help introduce people to Monoids.
Monoid lineage
Monoid comes from mathematics and belongs to the lineage of algebraic structures. So it helps to start from scratch and gradually expand to Monoids. In fact, we can push further to "Groups".
Magma (metagroup)
Magma is a set and a binary operation that must be closed:
∀ a, b ∈ M : a • b ∈ M
A binary operation is closed if it produces another member of the set when applied to any 2 elements of the set. (here ·
represents a binary operation)
An example of Magma is the set of Boolean and AND
operations.
Semigroup (semigroup)
Semigroup is Magma with one additional requirement. Binary operations must be "associative" for all members of the set:
∀ a, b, c ∈ S : a · (b · c) = (a · b) · c
An example of a Semigroup is a collection of "non-empty string" and "string concatenation" operations.
Monoid
Monoid is a Semigroup with an additional condition. There is a "Neutral Element" in a set that can be combined with any member of the set using binary operations to produce members that belong to the same set.
e ∈ M : ∀ a ∈ M, a · e = e · a = a
An example of a Monoid is a collection of strings and the "string concatenation" operation. Note that the empty string added to the collection is "unitary" and makes the Semigroup known as Monoid.
Another example of a Monoid is a collection of non-negative integers and addition operations. The unitary element is 0
.
Group
A Group is a Monoid containing an additional condition. There is an "inverse" in the collection such that:
∀ a, b, e ∈ G : a · b = b · a = e
where e
is the unitary element.
An example of a Group is a collection of integers and addition operations. "Inverse" is a negative number, and the unitary element is 0
.
By allowing negative numbers, we turned the second example of Monoid above into a Group.
Quote: Math StackExchange question: What's the difference between a monoid and a group?
Monoids in Haskell
Monoid typeclass
In Haskell Prelude (based on GHC.Base
) the Monoid typeclass is defined as:
class Monoid a where
mempty :: a
-- ^ 'mappend' 的幺元
mappend :: a -> a -> a
-- ^ 一个"可结合"的操作
mconcat :: [a] -> a
-- ^ 使用 monoid 来折叠一个列表.
-- 对于大多数类型,会使用 'mconcat' 的默认定义
-- 但该函数包含在类定义中,所以可以为特定类型提供优化的版本.
mconcat = foldr mappend mempty
where mempty
is a unitary, mappend
is a binary composable operator. This is enough to be a Monoid, but mconcat
is added for convenience. It has a default implementation that uses binary operations mappend
--- to fold lists starting from univariate mempty
.
Instances can override this default implementation, as we'll see later.
Monoid instance
Monoid ()
A simple example is a collection containing only ()
:
instance Monoid () where
mempty = ()
_ `mappend` _ = ()
mconcat _ = ()
This set contains only one unitary element ()
. So mappend
doesn't really care about the parameters and just returns ()
. Meaning that the only valid parameter is always ()
since our collection only contains ()
.
Also, for efficiency, the mconcat
function is overridden to ignore the list of elements in the set, since they are both ()
, so it just returns ()
. Note that if mconcat
is omitted here, the default implementation will produce the same result due to the implementation of mappend
.
Monoid ()
use case
There's not much you can do with this Monoid by itself.
n :: ()
n = () `mappend` ()
ns :: ()
ns = mconcat [(), (), ()]
Monoid [a]
Monoid of any list:
instance Monoid [a] where
mempty = []
mappend = (++)
mconcat xss = [x | xs <- xss, x <- xs]
mappend
is a "concatenation" operation, which means that the unitary mempty
can only be an empty list, []
.
It's important to realize mconcat
to get a list of "elements" from the collection, here "list of lists". So it expects a "list of lists", hence the parameter name xss
.
I suspect that List Comprehensions are more efficient than foldr
, otherwise there is no reason to implement mconcat
.
If we think about it, foldr
will repeat the mappend
called with 2 lists, which is not efficient due to the repeated processing of the elements in the intermediate list returned by each iteration.
Using List Comprehension would be a low-level operation, likely to visit each element of each sublist only once.
Monoid [a] use case
as :: [Int]
as = [1, 2, 3]
bs :: [Int]
bs = [4, 5, 6]
asbs :: [Int]
asbs = mconcat [as, bs] -- [1, 2, 3, 4, 5, 6]
(Monoid a, Monoid b) => Monoid (a, b)
Monoid of 2-tuples of arbitrary Monoids:
instance (Monoid a, Monoid b) => Monoid (a,b) where
mempty = (mempty, mempty)
(a1,b1) `mappend` (a2,b2) = (a1 `mappend` a2, b1 `mappend` b2)
At first, the definition of mempty
seems confusing. At first glance, the definition might be misinterpreted as a recursive definition.
Actually the first one in this tuple mempty
a
of type mempty
. The second mempty
b
of type mempty
.
Imagine a
is ()
and b
is [Int]
. So mempty
will be ( (), [] )
, that is, the first one is ()
of mempty
, the second is [Int]
The mempty
.
The implementation of mappend
is very simple. It executes a mappend
for a
and b
--- and returns a (a, b)
group. Since both a
and b
are Monoids, the closure constraints of Magmas and Monoids continue.
Monoid (a, b) use case
p1 :: ((), [Int])
p1 = ((), [1, 2, 3])
p2 :: ((), [Int])
p2 = ((), [4, 5, 6])
p1p2 :: ((), [Int])
p1p2 = mconcat [p1, p2] -- ((), [1, 2, 3, 4, 5, 6])
Monoid b => Monoid (a -> b)
Monoid for "any function that takes one or more arguments, returns a Monoid,":
instance Monoid b => Monoid (a -> b) where
mempty _ = mempty
mappend f g x = f x `mappend` g x
It's not obvious how this definition handles functions with multiple arguments. Might need some reminder.
Function annotations are right-associative , i.e. they combine on the right-hand side:
f :: Int -> (Bool -> String) -- 不必要的括号
f s1 s2 = s1 ++ s2
Int -> (Bool -> String)
is equivalent to Int -> Bool -> String
, which is why we don't include the parentheses. "Right associativity" suggests this.
Remember String
equivalent to [Char]
, we know f
will eventually return to a Monoid, as we have seen above Monoid [a]
.
But not so fast. We first have to decompose the annotation as defined in the Monoid instance a -> b
:
Int -> (Bool -> String)
a -> b
Here b
must be Monoid. Thanks to Monoid (a -> b)
, it is.
Looking at b
now, we get:
(Bool -> String)
( a -> b )
So reapply Monoid (a -> b)
can handle functions with multiple arguments, for example:
Int -> (String -> (Int -> String))
a -> ( b )
a -> (a' -> ( b' ))
a -> (a' -> (a'' -> b'' )
Here b
is Monoid, because b'
is Monoid, because b''
is String
is Monoid, but also because String
is [Char]
and we saw earlier that all listings are Monoids.
Look at the definition again:
instance Monoid b => Monoid (a -> b) where
mempty _ = mempty
mappend f g x = f x `mappend` g x
mempty
the definition now makes more sense. mempty
is of type a -> b
, which is why it takes a single parameter. It ignores arguments and simply returns b
of type mempty
.
For Bool -> String
type of function, mempty
is []
, i.e. Monoid [a]
a mempty
.
For type Int -> Bool -> String
function, mempty
is recursive, that is, it is first Bool -> String
type of the return itself, which will return []
.
Note a
is irrelevant here. In fact, all input types to the function are irrelevant. The only thing that matters here is the type of the return value. That's why only b
must be Monoid.
Therefore, the following function types will have mempty
and eventually return []
, since they both return String
:
Int -> String
Int -> Int -> String
Int -> Bool -> Int -> Double -> String
Similarly, mappend
applies a single argument to both functions, then calls b
of mappend
.
For type String -> String
function, mappend
using the input String
calling all the two functions, and then to Monoid [a]
of String
--call String
mappend
ie (++)
.
For a function of type String -> String -> String
, mappend
30c5bc79fdedaa3162906483d7332eef---calls both functions with the first input parameter String
calls both functions, then String -> String
mappend
, which is Monoid (a -> b)
, which is itself.
Next, use the second input parameter String
to call all two functions, and then call --- String
87274def8b868e9 of type Monoid [a]
mappend
That is to call (++)
.
Monoid (a -> b) use case
import Data.Monoid ((<>))
parens :: String -> String
parens str = "(" ++ str ++ ")"
curlyBrackets :: String -> String
curlyBrackets str = "{" ++ str ++ "}"
squareBrackets :: String -> String
squareBrackets str = "[" ++ str ++ "]"
pstr :: String -> String
pstr = parens <> curlyBrackets <> squareBrackets
astr :: String
astr = pstr "abc"
Note that the <>
operator is used in pstr
. This operator is imported from Data.Monoid
and is an alias (infix) for the mappend
operation.
If you look back at Monoid's class
definition, you'll see that mappend
is of type a -> a -> a
.
Since parens
and curlyBrackets
has a type -> String -> String
, so parens <> curlyBrackets
having String -> String
type, parens <> curlyBrackets <> squareBrackets
will also have that type.
pstr
received String
and apply parens
, curlyBrackets
and squareBrackets
The results of these calls stitching.
So astr
is (abc){abc}[abc]
.
If the number of functions to be applied is large, using the <>
method can become cumbersome. This is why the Monoid class has a helper function mconcat
.
We can refactor the code like this:
pstr :: String -> String
pstr = mconcat [parens, curlyBrackets, squareBrackets]
astr :: String
astr = pstr "abc"
Monoid \<number-type\>
Looking back at the definition of Monoid, we have to choose a binary operation that can be combined, but for numbers it can be either addition or multiplication.
If we choose addition, we miss multiplication and vice versa.
Unfortunately, there can only be 1 Monoid of each type.
The solution to this problem is to create a new type that contains one for addition Num
and another for multiplication.
These types can be found in Data.Monoid
:
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
import GHC.Generics
newtype Sum a = Sum { getSum :: a }
deriving (Eq, Ord, Read, Show, Bounded, Generic, Generic1, Num)
newtype Product a = Product { getProduct :: a }
deriving (Eq, Ord, Read, Show, Bounded, Generic, Generic1, Num)
Now we can create Monoids for each.
Monoid Sum (and)
{-# LANGUAGE ScopedTypeVariables #-}
import Data.Coerce
instance Num a => Monoid (Sum a) where
mempty = Sum 0
mappend = coerce ((+) :: a -> a -> a)
mempty
is 0
wrapped in Sum
.
Here coerce
for safely Sum a
cast to its "Representational type", for example Sum Integer
will be cast Integer
and use the appropriate +
operation.
ScopedTypeVariables
the pragma allows us a -> a -> a
in a
equivalent to instance
range, it is equivalent to Num a
in The a
.
Monoid Sum use case
sum :: Sum Integer
sum = mconcat [Sum 1, Sum 2] -- Sum 3
Monoid Product (product)
{-# LANGUAGE ScopedTypeVariables #-}
import Data.Coerce
instance Num a => Monoid (Product a) where
mempty = Product 1
mappend = coerce ((*) :: a -> a -> a)
mempty
is 0
wrapped in Product
.
Here coerce
for securely Product a
cast its Representational type, e.g. Product Integer
is cast Integer
using The appropriate *
operation.
ScopedTypeVariables
the pragma allows us a -> a -> a
in a
equivalent to instance
range, it is equivalent to Num a
in The a
.
Monoid Product use case
product :: Product Integer
product = mconcat [Product 2, Product 3] -- Product 6
Monoid Ordering
Before looking at this Monoid, let's review the sorting and comparison:
data Ordering = LT | EQ | GT
This type is used when using class Ord
in compare
, for example:
compare :: a -> a -> Ordering
Example of its use:
compare "abcd" $ "abed" -- LT
Now Data.Ord
has a great helper function for comparison called comparing
:
comparing :: (Ord a) => (b -> a) -> b -> b -> Ordering
comparing p x y = compare (p x) (p y)
This helper function applies a function to each element before comparing. This is very useful for things like tuples:
comparing fst (1, 2) (1, 3) -- EQ
comparing snd (1, 2) (1, 3) -- LT
Now for Monoid:
-- lexicographical ordering
instance Monoid Ordering where
mempty = EQ
LT `mappend` _ = LT
EQ `mappend` y = y
GT `mappend` _ = GT
This implementation seems arbitrary. Why would anyone implement Monoid Ordering
this way?
Well, if you want to append some comparison to sortBy
, then you need this implementation.
Take a look at sortBy
:
sortBy :: (a -> a -> Ordering) -> [a] -> [a]
Note that the first parameter is the same as 7d6e6d of type compare
, comparing fst
, comparing snd
and comparing fst `mappend` comparison snd
.
Why? Because the type of ---d1604f59d8d2b31f73d239634070c299 mappend
is a -> a -> a
, here a
is (a, b) -> (a, b) -> Ordering
.
So we can combine or mappend
comparison functions and we will have an overall comparison function.
Remember, Monoid (a -> b)
requires b
as well as Monoid
.
So if we want to be able to mappend
in our comparison function, we must set Ordering
to Monoid
, as we did above.
But we still haven't answered why it has this seemingly bizarre definition.
Well, the comments have a bit of a clue, namely "lexicographical order". This essentially means "alphabetical" or "left-first", i.e. if the leftmost is GT
or LT
, then all comparisons to the right are no longer valid.
However, if the leftmost one is EQ
, then we need to look to the right to determine the final result of the combined comparison.
This is exactly what this implementation does. Here again some extra notes to illustrate this:
-- 字典序
instance Monoid Ordering where
mempty = EQ -- EQ 直到左边或直到右边, 对最终结果没有影响
LT `mappend` _ = LT -- 如果左边是 LT 则忽略右侧
EQ `mappend` y = y -- 如果左边是 EQ 则用右侧
GT `mappend` _ = GT -- 如果左边是 GT 则忽略右侧
Take a moment to understand this well. Once you do this, it will be easier to understand:
sortBy (comparing fst <> comparing snd) [(1,0),(2,1),(1,1),(2,0)]
-- [(1,0),(1,1),(2,0),(2,1)]
To understand how it works, you have to remember Monoid (a -> b)
.
We're doing ---2753b7b43e65381c66eedb80858062be (a, b) -> (a, b) -> Ordering
for a function of type mappend
. Once both functions are done, we'll return the two Ordering
in our "lexicographical order" Ordering
The value does mappend
.
This means that contrast fst
takes precedence over contrast snd
, that's why all (1, x)
will come before all (2, y)
The same is true when x > y
.
We can do a different comparison, we only care about the comparison snd
:
sortBy (comparing snd) [(1,0),(2,1),(1,1),(2,0)]
-- [(1,0),(2,0),(2,1),(1,1)]
Here fst
terms are in unpredictable order, while snd
is in ascending order.
For fun, we can control the ascending and descending order separately. First let's define some helper functions:
asc, desc :: Ord b => (a -> b) -> a -> a -> Ordering
asc = comparing
desc = flip . asc
Now we can sort fst
in descending order and snd
ascending order:
sortBy (desc fst <> asc snd) [(1,0),(2,1),(1,1),(2,0)]
-- [(2,0),(2,1),(1,0),(1,1)]
Optimization Monoid Ordering
The example sorts all use only a small number of contrasts. In fact, most sorts will only use a small number of comparisons.
Even so, ---a653e5c328bcac87569a146b4016e871--- must be executed even if the first one returns LT
or GT
mappend
. This doesn't seem like a big deal when there are only a small number of comparisons. But it may stack up into one big list.
We want our comparisons to be "short-circuited", which is usually done with boolean binary operations &&
and ||
.
The current definition of Monoid Ordering
cannot short-circuit because it relies on the default mconcat
implementation, which uses the foldr
function that accesses each list element.
If we write our own Moniod Ordering
and implement a mconcat
that returns the result early, we will have a more efficient sorting.
import Prelude hiding (Monoid, mempty, mappend, mconcat)
import Data.List
import Data.Maybe
import Control.Arrow
instance Monoid Ordering where
mempty = EQ
LT `mappend` _ = LT
EQ `mappend` y = y
GT `mappend` _ = GT
mconcat = find (/= EQ) >>> fromMaybe EQ
This implementation allows us to refactor our previous sorting:
sortBy (mconcat [desc fst, asc snd]) [(1,0),(2,1),(1,1),(2,0)]
-- [(2,0),(2,1),(1,0),(1,1)]
Same result, but anytime dest fst
returns LT
or GT
, then asc snd
will be skipped.
Note: Our implementation relies Data.List
, Data.Maybe
and Control.Arrow
, if implemented in the standard they would unnecessarily coupled Data.Monoid
. This limitation can be overcome by writing a dedicated function (not very "Don't repeat yourself").
However, the biggest problem with overriding the standard implementation is that we have to overshadow all Monoid definitions.
These are some pretty big downsides of optimizing for edge cases. But it's also a good exercise. Also, if the list we're trying to sort is large, it might be worthwhile.
Quote:
Exchangeable Monoid (Abelian Monoid)
As mentioned at the beginning, if we add one more constraint to Monoid
(or Group
), we can execute operations in parallel.
That constraint is "commutability".
∀ a, b ∈ M : a · b = b · a
By imposing this constraint, we can process the list in any order. This can be parallelized by the compiler, or even distributed to other machines with the help of class libraries.
Here is the definition:
class Monoid m => CommutativeMonoid m
It may seem strange not to write a function, but its interface is the same as Monoid
except that binary operations are required to support commutativity.
Unfortunately, there is no way in Haskell to require these constraints.
Num a => CommutativeMonoid (Sum a)
Here is the definition:
instance Num a => CommutativeMonoid (Sum a)
Sum
(or Product
) Reason for using CommutativeMonoid
instead of Monoid
:
- Better communicate how to use
Monoid
- Calling a function that requires a
CommutativeMonoid
in conclusion
Monoids are powerful abstractions for stitching together similar things that can be presented over and over again in programming.
Hope this pair Monoids
is a good introduction. There are many other types of Monoids, but once you have a general understanding, it should be a lot easier to study these other specialized Monoids.
**粗体** _斜体_ [链接](http://example.com) `代码` - 列表 > 引用
。你还可以使用@
来通知其他用户。