Haskell: How is join a natural transformation? - haskell

I can define a natural transformation in Haskell as:
h :: [a] -> Maybe a
h [] = Nothing
h (x:_) = Just x
and with a function k:
k :: Char -> Int
k = ord
the naturality condition is met due to the fact that:
h . fmap k == fmap k . h
Can the naturality condition of the List monad's join function be demonstrated in a similar way? I'm having some trouble understanding how join, say concat in particular, is a natural transformation.

Okay, let's look at concat.
First, here's the implementation:
concat :: [[a]] -> [a]
concat = foldr (++) []
This parallels the structure of your h where Maybe is replaced by [] and, more significantly, [] is replaced by--to abuse syntax for a moment--[[]].
[[]] is a functor as well, of course, but it's not a Functor instance in the way that the naturality condition uses it. Translating your example directly won't work:
concat . fmap k =/= fmap k . concat
...because both fmaps are working on only the outermost [].
And although [[]] is hypothetically a valid instance of Functor you can't make it one directly, for practical reasons that are probably obvious.
However, you can reconstruct the correct lifting as so:
concat . (fmap . fmap) k == fmap k . concat
...where fmap . fmap is equivalent to the implementation of fmap for a hypothetical Functor instance for [[]].
As a related addendum, return is awkward for the opposite reason: a -> f a is a natural transformation from an elided identity functor. Using (:[]) as return for lists, the naturality condition would be written as so:
(:[]) . ($) k == fmap k . (:[])
...where the completely superfluous ($) is standing in for what would be fmap over the elided identity functor.
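If you want to convince yourself of the two equations above, you can spot-check them on sample data. This is only a sketch that tests a few inputs, not a proof; the names concatNatural and returnNatural are made up for illustration.

```haskell
import Data.Char (ord)

-- Sample function to lift; any a -> b would do.
k :: Char -> Int
k = ord

-- Naturality of concat: map over the inner elements and then flatten,
-- or flatten first and then map. Both paths should agree.
concatNatural :: [[Char]] -> Bool
concatNatural xss = (concat . (fmap . fmap) k) xss == (fmap k . concat) xss

-- Naturality of return for lists. The fmap of the identity functor is
-- elided on the left-hand side, so k is applied directly.
returnNatural :: Char -> Bool
returnNatural c = ((:[]) . k) c == (fmap k . (:[])) c
```

For example, all concatNatural [[], ["ab", "", "c"]] should evaluate to True.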

Related

Function Composition Do Notation

Is there a "do notation" syntactic sugar for simple function composition?
(i.e. (.) :: (b -> c) -> (a -> b) -> a -> c)
I'd like to be able to store results of some compositions for later (while still continuing the chain).
I'd rather not use the RebindableSyntax extension if possible.
I'm looking for something like this:
composed :: [String] -> [String]
composed = do
    fmap (++ "!!!")
    maxLength <- maximum . fmap length
    filter ((== maxLength) . length)
composed ["alice", "bob", "david"]
-- outputs: ["alice!!!", "david!!!"]
I'm not sure something like this is possible, since the result of the earlier function essentially has to pass "through" the bind of maxLength, but I'm open to hearing of any other similarly expressive options. Basically I need to collect information as I go through the composition in order to use it later.
Perhaps I could do something like this with a state monad?
Thanks for your help!
Edit
This sort of thing kinda works:
split :: (a -> b) -> (b -> a -> c) -> a -> c
split ab bac a = bac (ab a) a
composed :: [String] -> [String]
composed = do
    fmap (++ "!!!")
    split
        (maximum . fmap length)
        (\maxLength -> (filter ((== maxLength) . length)))
One possible way to achieve something like that is with arrows. Basically, in “storing interstitial results” you're just splitting up the information flow through the composition chain. That's what the &&& (fanout) combinator does.
import Control.Arrow
composed = fmap (++ "!!!")
       >>> ((. length) . (==) . maximum . fmap length &&& id)
       >>> uncurry filter
This definitely isn't good human-comprehensible code though.
A state monad would seem to allow something related too, but the problem is that the state type is fixed throughout the do block's monadic chain. That's not really flexible enough to pick up different-typed values along the composition chain. While it is certainly possible to circumvent this (RebindableSyntax is indeed one of the ways), this too isn't a good idea IMO.
The type of (<*>) specialised to the function instance of Applicative is:
(<*>) :: (r -> a -> b) -> (r -> a) -> (r -> b)
The resulting r -> b function passes its argument to both the r -> a -> b and the r -> a functions, and then uses the a value produced by the r -> a function as the second argument of the r -> a -> b one.
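To make that description concrete, here is a small sketch. apFn is a hypothetical name for a hand-written version of (<*>) at the function type; it is only there to show the plumbing and is not how base spells it.

```haskell
-- (<*>) for functions, written out by hand (apFn is a made-up name).
-- The shared argument r goes to both functions.
apFn :: (r -> a -> b) -> (r -> a) -> (r -> b)
apFn f g = \r -> f r (g r)

-- The same thing with the real (<*>): both (+) and (* 2) see the input 10,
-- so this computes 10 + (10 * 2).
example :: Int
example = ((+) <*> (* 2)) 10
```

Here example evaluates to 30, and apFn (+) (* 2) 10 gives the same result.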
What does this have to do with your function? filter is a function of two arguments, a predicate and a list. Now, a key aspect of what you are trying to do is that the predicate is generated from the list. That means the core of your function can be expressed in terms of (<*>):
-- Using the predicate-generating function from leftaroundabout's answer.
maxLengthOnly :: Foldable t => [t a] -> [t a]
maxLengthOnly = flip filter <*> ((. length) . (==) . maximum . fmap length)
composed :: [String] -> [String]
composed = maxLengthOnly . fmap (++ "!!!")
This maxLengthOnly definition would be a quite nice one-liner if the pointfree predicate-generating function weren't so clunky.
Since the Applicative instance of functions is equivalent in power to the Monad one, maxLengthOnly can also be phrased as:
maxLengthOnly = (. length) . (==) . maximum . fmap length >>= filter
(The split you added to your question, by the way, is (>>=) for functions.)
A different way of writing it with Applicative is:
maxLengthOnly = filter <$> ((. length) . (==) . maximum . fmap length) <*> id
It is no coincidence that this looks a lot like leftaroundabout's solution: for functions, (,) <$> f <*> g = liftA2 (,) f g = f &&& g.
Finally, it is also worth noting that, while it is tempting to replace id in the latest version of maxLengthOnly with fmap (++ "!!!"), that won't work because fmap (++ "!!!") changes the length of the strings, and therefore affects the result of the predicate. With a function that doesn't invalidate the predicate, though, it would work pretty well:
nicerComposed = filter
    <$> ((. length) . (==) . maximum . fmap length) <*> fmap reverse
GHCi> nicerComposed ["alice","bob","david"]
["ecila","divad"]
As leftaroundabout mentioned, you can use Arrows to write your function. There is also a feature of the GHC Haskell compiler called proc-notation for Arrows. It is very similar to the well-known do-notation but, unfortunately, not many people are aware of it.
With proc-notation you can write your desired function in the following more readable and elegant way:
{-# LANGUAGE Arrows #-}
import Control.Arrow (returnA)
import Data.List (maximum)
composed :: [String] -> [String]
composed = proc l -> do
    bangedL <- fmap (++"!!!")        -< l
    maxLen  <- maximum . fmap length -< bangedL
    returnA -< filter ((== maxLen) . length) bangedL
And this works in ghci as expected:
ghci> composed ["alice", "bob", "david"]
["alice!!!","david!!!"]
If you are interested, you can read some tutorials with nice pictures to understand what an arrow is and how this powerful feature works, so you can dive deeper into it:
https://www.haskell.org/arrows/index.html
https://en.wikibooks.org/wiki/Haskell/Understanding_arrows
What you have is essentially a filter, but one where the filtering function changes as you iterate over the list. I would model this not as a "forked" composition, but as a fold using the following function f :: String -> (Int, [String]) -> (Int, [String]):
The return value maintains the current maximum and all strings of that length.
If the first argument is shorter than the current maximum, drop it.
If the first argument is the same as the current maximum, add it to the list.
If the first argument is longer, make its length the new maximum, and replace the current output list with a new list.
Once the fold is complete, you just extract the list from the tuple.
-- Not really a suitable name anymore, but...
composed :: [String] -> [String]
composed = snd . foldr f (0, [])
  where f curr (maxLen, result) = let currLen = length curr
                                  in case compare currLen maxLen of
                                       LT -> (maxLen, result)       -- drop
                                       EQ -> (maxLen, curr:result)  -- keep
                                       GT -> (currLen, [curr])      -- reset

Haskell Applicative idiom?

I'm new to Haskell and am puzzling over how to best express some operations in the most idiomatic and clear way. Currently (there will be more to come) I'm puzzling over <*> (I'm not even sure what to call that).
For example, if I have, say
f = (^2)
g = (+10)
as representative functions (in practice they are more complex, but the key thing here is that they are different and distinct), then
concatMap ($ [1,2,3,4,10]) [(f <$>), (g <$>) . tail . reverse]
and
concat $ [(f <$>), (g <$>) . tail . reverse] <*> [[1,2,3,4,10]]
accomplish the same thing.
Is one of these more idiomatic Haskell? Does one imply something to an experienced reader of Haskell that the other does not? Perhaps there are additional (better) ways to express exactly the same thing. Are there conceptual differences between the two approaches that a novice Haskeller like myself may be missing?
Both your functions (f <$>) and (g <$>).tail.reverse return a monoid type (list in this case) so you can use mconcat to convert them into a single function. Then you can apply this function directly to the input list instead of wrapping it in another list and using concatMap:
mconcat [(f <$>), (g <$>).tail.reverse] $ [1,2,3,4,10]
To expand on this, a function a -> b is an instance of Monoid if b is a monoid. The implementation of mappend for such functions is:
mappend f g x = f x `mappend` g x
or equivalently
mappend f g = \x -> (f x) `mappend` (g x)
so given two functions f and g which return a monoid type b, f `mappend` g returns a function which applies both f and g to its argument and combines the results using the Monoid instance of b.
mconcat has type Monoid a => [a] -> a and combines all the elements of the input list using mappend.
Lists are monoids where mappend == (++) so
mconcat [(f <$>), (g <$>).tail.reverse]
returns a function like
\x -> (fmap f x) ++ (((fmap g) . tail . reverse) x)
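As a quick sanity check of that expansion, the sketch below compares the mconcat version against the hand-written lambda on the list from the question. The names combined and byHand are made up for illustration.

```haskell
f, g :: Int -> Int
f = (^ 2)
g = (+ 10)

-- The two list-returning functions merged through the Monoid instance
-- of functions.
combined :: [Int] -> [Int]
combined = mconcat [(f <$>), (g <$>) . tail . reverse]

-- The expansion written out by hand.
byHand :: [Int] -> [Int]
byHand x = fmap f x ++ (fmap g . tail . reverse) x
```

Both combined [1,2,3,4,10] and byHand [1,2,3,4,10] should give [1,4,9,16,100,14,13,12,11].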
Personally for your example I would write
f = (^2)
g = (+10)
let xs = [1,2,3,4,10]
in (map f xs) ++ (map g . tail $ reverse xs)
In a very Applicative "mood", I would replace the part after in by
((++) <$> map f <*> map g . tail . reverse) xs
which I actually don't think is more readable in this case. If you don't directly understand what it means, spend some time on understanding the Applicative instance of ((->) a) (Reader).
I think the choice really depends on what you're trying to do, i.e. what your output is supposed to mean. In your example the task is very abstract (basically just showcasing what Applicative can do), so it's not directly obvious which version to use.
The Applicative instance of [] intuitively relates to combinations, so I would use it in a situation like this:
-- I want all pair combinations of 1 to 5
(,) <$> [1..5] <*> [1..5]
If you had many functions and wanted to try all combinations of these functions with a number of arguments, I would indeed use the [] instance of Applicative. But if what you're after is a concatenation of different transformations, I would write it as such (which I did, above).
Just my 2 cents as a medium-experience Haskeller.
I sometimes struggle with a similar problem: you have a single element but multiple functions.
Usually we have multiple elements and a single function, so we do:
map f xs
But this is not a problem in Haskell. The dual is just as easy:
map ($ x) fs
In your case x is actually a list, and you want to concat after the map, so you do
concatMap ($ xs) fs
I cannot really see directly what happens in the second expression, even though I can reason via the applicative laws that it does the same as the first one.
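For what it's worth, the two expressions from the question can be compared side by side. This is just a sketch with made-up names (viaConcatMap, viaApplicative); the point is that fs <*> [xs] applies every function in fs to the single wrapped argument, which is the same as map ($ xs) fs.

```haskell
fs :: [[Int] -> [Int]]
fs = [map (^ 2), map (+ 10) . tail . reverse]

xs :: [Int]
xs = [1, 2, 3, 4, 10]

-- One argument, many functions: apply each function to xs and flatten.
viaConcatMap :: [Int]
viaConcatMap = concatMap ($ xs) fs

-- The list Applicative pairs every function with every argument;
-- with exactly one argument this is the same traversal.
viaApplicative :: [Int]
viaApplicative = concat (fs <*> [xs])
```

Both values come out as [1,4,9,16,100,14,13,12,11].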

Counting the frequency of values in a list Using Control.Foldl

I am using the Control.Foldl library to traverse an arbitrarily long list and count all occurrences of arbitrarily many unique entities. I.e., the list may be of the form
[Just "a", Just "b", Just "aab", Nothing, Just "aab"]
and my result should be something like:
[(Just "a",1), (Just "b",1), (Just "aab",2), (Nothing,1)]
Now the issue is I do not have the name of these entities a priori, and I would like to dynamically update the results as I fold.
My problem is that I do not know how to describe this computation in terms of the Fold data type from Control.Foldl. Specifically, at each step of the fold I need to traverse the result list and ask if I have seen the current item, but I see no way of describing this using foldl.
Please note that, for future use, it's really important that I use the Control.Foldl library here, rather than folding over some other foldable data type like a map. In some sense my question is more along the lines of how to use the Foldl library, since the documentation is not too clear to me.
Edit: The example I showed is just a toy example. In reality I need to traverse an arbitrarily large list many times computing statistics, hence I'm using the foldl library, which allows me to combine the computations using applicatives, i.e. toResults <$> stat1 <*> stat2 <*> ... <*> statm $ largeList, so that foldl traverses the list just once, computing all m statistics. Please find a solution using the foldl library.
You can encode a normal foldl' pretty straightforwardly as a Fold:
foldlToFold :: (b -> a -> b) -> b -> Fold a b
foldlToFold f z = Fold f z id
I'm actually a bit puzzled that this combinator isn't in the library...
Anyways, if you have
foldl' f z
you can replace it with
fold (Fold f z id)
so here, you would normally be using
foldl' (\mp x -> M.insertWith (+) x 1 mp) M.empty
with Fold, you'd be making
countingFold :: Ord a => Fold a (Map a Int)
countingFold = Fold (\mp x -> M.insertWith (+) x 1 mp) M.empty id
and you can use it as
countUp :: Ord a => [a] -> Map a Int
countUp = fold countingFold
-- or
countUp = fold (Fold (\mp x -> M.insertWith (+) x 1 mp) M.empty id)
If you want to go back to a list at the end, you can do
M.toList . countUp
In general, if you can formulate your fold as a foldl', you can do the transformation above to be able to encode it as a Fold. Fold is a bit more expressive because for foldl', the b type is both the accumulator and the result type; for a Fold, you can have a separate accumulator and result type.
Roughly speaking, you can translate any Fold into a foldl' followed by a post-processing function:
fold (Fold f z g) = g . foldl' f z
And you can go backwards too:
foldlMapToFold :: (b -> a -> b) -> b -> (b -> c) -> Fold a c
foldlMapToFold = Fold
So if you had
g . foldl' f z
you can write
fold (Fold f z g)
If you want to use a Fold, think, "how can I describe my operation as a foldl' followed by a final function?", and then go from there.
The advantage of using the Fold type over just normal maps and folds is (apart from performance tweaks) the ability to combine and manipulate multiple Folds as objects using their Applicative instance, and other nice instances too, like Functor, Profunctor, fun stuff like that. Combining folds encoded as maps-and-foldl's is a bit tedious, but the Fold wrapper lets you do it in a cleaner first-class way using the abstractions everyone knows and loves.
For example, if I had
fold1 = g . foldl' f z
and
fold2 = g' . foldl' f' z'
and I wanted to do
fold3 = (\(x,x') -> foo (g x) (g' x'))
      . foldl' (\(x,x') y -> (f x y, f' x' y)) (z, z')
(that is, do both folds on the list in one pass, and recombine the results at the end with foo), it's a big hassle, right?
But i can also just do
fold1 = Fold f z g
fold2 = Fold f' z' g'
fold3 = foo <$> fold1 <*> fold2
(Note that, even better, using Fold actually keeps foldl' strict, because in the example above, the lazy tuples add a layer of indirection and incidentally make the foldl' lazy again.)
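To see how the pieces fit together without installing anything, here is a self-contained sketch. It defines a small stand-in for the library's Fold type (same three-field shape as used above, but this is my own simplified copy, not the library's source), so that the counting fold can be combined with a second statistic in a single pass. lengthFold and countsAndLength are hypothetical names.

```haskell
{-# LANGUAGE ExistentialQuantification #-}
import Data.List (foldl')
import qualified Data.Map.Strict as M

-- Stand-in for Control.Foldl's Fold: step function, initial state, extractor.
data Fold a b = forall x. Fold (x -> a -> x) x (x -> b)

-- Run a fold over a list in one left-to-right pass.
fold :: Fold a b -> [a] -> b
fold (Fold step begin done) = done . foldl' step begin

instance Functor (Fold a) where
  fmap h (Fold step begin done) = Fold step begin (h . done)

-- Strict pair, so that neither accumulator piles up thunks.
data Pair x y = Pair !x !y

instance Applicative (Fold a) where
  pure b = Fold (\() _ -> ()) () (const b)
  Fold stepL beginL doneL <*> Fold stepR beginR doneR =
    Fold step (Pair beginL beginR) done
    where
      step (Pair l r) a = Pair (stepL l a) (stepR r a)
      done (Pair l r) = doneL l (doneR r)

-- Count occurrences of each distinct element.
countingFold :: Ord a => Fold a (M.Map a Int)
countingFold = Fold (\mp x -> M.insertWith (+) x 1 mp) M.empty id

-- A second statistic: the number of elements seen.
lengthFold :: Fold a Int
lengthFold = Fold (\n _ -> n + 1) 0 id

-- Both statistics computed in one traversal.
countsAndLength :: Ord a => Fold a ([(a, Int)], Int)
countsAndLength = (,) <$> fmap M.toList countingFold <*> lengthFold
```

Running fold countsAndLength on the list from the question yields the counts in key order (Nothing first, since that is how Maybe is ordered) together with the total of 5. With the real library you would drop the stand-in definitions and import Control.Foldl instead.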

Difference between concatMap f xs and concat $ map f xs?

Presumably they do exactly the same thing, concatMap f xs and concat $ map f xs. Why would I choose one over another?
I imagine it may be an optimization. If so, is this still the case with GHC 7.8?
It is the case that concatMap f xs = concat (map f xs) as you suspect. Thus, for correctness purposes you should consider them interchangeable. We can examine their definitions to learn a little more, though.
concatMap :: (a -> [b]) -> [a] -> [b]
concatMap f = foldr ((++) . f) []
concat :: [[a]] -> [a]
concat = foldr (++) []
In particular, this means that concat . map f expands to foldr (++) [] . map f. Now using a thing known as the "universal property of fold" we can see that foldr g z . map f = foldr (g . f) z for any (g, z, f), such as the choice ((++), [], f) we use above. This demonstrates that concatMap f = concat . map f like we want.[0]
So why are they defined differently? Because foldr ((++) . f) [] is always going to be faster than foldr (++) [] . map f since, in a really pathological case, the latter suggests two separate recursions. Due to laziness, it's unlikely that two recursions would ever be performed, though, so what gives?
The real reason is that there are more complex fusion laws available to the compiler such as those which combine two sequential foldrs or which define interactions between foldr and unfoldr. These are kind of finicky to use as they depend upon being able to look at the surface syntax of a fragment of code and detect possible simplifications. A lot of work goes into getting consistently firing fusion laws.
But one thing we can do is encourage people to use higher order combinators with optimization laws pre-applied. Since foldr (++) [] . map f is never going to be faster than foldr ((++) . f) [] we can take a shortcut and pre-apply the universal law simplification. This will improve the likelihood of fusion laws firing elsewhere to best optimize a list production pipeline.
[0] Why does this law work? Roughly, the universal law of foldr states that if you have any function q such that q [] = z and q (a:as) = f a (q as) then that q must be and is foldr f z. Since q = foldr g z . map f can be shown to have q [] = z and q (a:as) = g (f a) (q as) then it must be a fold like foldr (g . f) z like we want.
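If you'd like to see the law in action, the two sides can be written out and compared on a sample input. This is only a spot check under made-up names (viaConcatThenMap, viaFusedFoldr), not a proof of the universal property.

```haskell
-- concat . map f, using the definition of concat quoted above.
viaConcatThenMap :: (a -> [b]) -> [a] -> [b]
viaConcatThenMap f = foldr (++) [] . map f

-- The fused form obtained from foldr g z . map f = foldr (g . f) z.
viaFusedFoldr :: (a -> [b]) -> [a] -> [b]
viaFusedFoldr f = foldr ((++) . f) []
```

With f = \n -> replicate n n and the input [0,1,2,3], both functions return [1,2,2,3,3,3], which is also what concatMap gives.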

Point-free functions in monadic binding

I've been investigating the usage of >>= with lists (when viewed as monads). In an article All about monads I found the following identity for lists: l >>= f = concatMap f l, where l is a list and f is some (unary) function. I tried the simple example of doubling each element of a list and arrived at the following:
let double :: Int -> [Int]
double = (flip (:) []) . (2*)
let monadicCombination :: [Int]
monadicCombination = [1,2,3,4,5] >>= double
I specifically wanted the double function to be written in a point-free manner. Can you think of simpler implementations of double so that it still can be used with >>=?
Sassa NF's return . (*2) is both short and demonstrates an interesting principle of your example. If we inline the whole thing we'll get
list >>= double
list >>= return . (*2)
The pattern \f l -> l >>= return . f is common enough to have its own name: liftM
liftM :: Monad m => (a -> b) -> m a -> m b
liftM f m = m >>= return . f
And in fact, liftM is equivalent to fmap, often known as just map when referring to lists:
list >>= return . (*2)
liftM (*2) list
fmap (*2) list
map (*2) list
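Putting the whole chain in one place, the sketch below evaluates each spelling on the list from the question. The name results is made up; everything else is from base.

```haskell
import Control.Monad (liftM)

-- Point-free double that still returns a list, so it works with >>=.
double :: Int -> [Int]
double = return . (* 2)

-- Each entry is a different spelling of the same computation.
results :: [[Int]]
results =
  [ [1, 2, 3, 4, 5] >>= double
  , concatMap double [1, 2, 3, 4, 5]
  , liftM (* 2) [1, 2, 3, 4, 5]
  , fmap (* 2) [1, 2, 3, 4, 5]
  , map (* 2) [1, 2, 3, 4, 5]
  ]
```

Every element of results is [2,4,6,8,10].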
