Difference between concatMap f xs and concat $ map f xs?

Presumably they do exactly the same thing, concatMap f xs and concat $ map f xs. Why would I choose one over another?
I imagine it may be an optimization. If so, is this still the case with GHC 7.8?

It is the case that concatMap f xs = concat (map f xs) as you suspect. Thus, for correctness purposes you should consider them interchangeable. We can examine their definitions to learn a little more, though.
concatMap :: (a -> [b]) -> [a] -> [b]
concatMap f = foldr ((++) . f) []
concat :: [[a]] -> [a]
concat = foldr (++) []
In particular, this means that concat . map f expands to foldr (++) [] . map f. Now using a thing known as the "universal property of fold" we can see that foldr g z . map f = foldr (g . f) z for any (g, z, f), such as the choice ((++), [], f) we use above. This demonstrates that concatMap f = concat . map f like we want.[0]
So why are they defined differently? Because foldr ((++) . f) [] is always going to be faster than foldr (++) [] . map f since, in a really pathological case, the latter suggests two separate recursions. Due to laziness, it's unlikely that two recursions would ever be performed, though, so what gives?
The real reason is that there are more complex fusion laws available to the compiler such as those which combine two sequential foldrs or which define interactions between foldr and unfoldr. These are kind of finicky to use as they depend upon being able to look at the surface syntax of a fragment of code and detect possible simplifications. A lot of work goes into getting consistently firing fusion laws.
But one thing we can do is encourage people to use higher order combinators with optimization laws pre-applied. Since foldr (++) [] . map f is never going to be faster than foldr ((++) . f) [] we can take a shortcut and pre-apply the universal law simplification. This will improve the likelihood of fusion laws firing elsewhere to best optimize a list production pipeline.
[0] Why does this law work? Roughly, the universal law of foldr states that if you have any function q such that q [] = z and q (a:as) = f a (q as) then that q must be and is foldr f z. Since q = foldr g z . map f can be shown to have q [] = z and q (a:as) = g (f a) (q as) then it must be a fold like foldr (g . f) z like we want.
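To make the law concrete, here is a small check you can load into GHCi; the function f below is an arbitrary choice for illustration, not from the question:
checkLaw :: Bool
checkLaw = lhs == rhs
  where
    f :: Int -> [Int]
    f x = [x, x * 10]                  -- an arbitrary list-producing function
    xs  = [1, 2, 3]
    lhs = (foldr (++) [] . map f) xs   -- i.e. concat (map f xs)
    rhs = foldr ((++) . f) [] xs       -- i.e. concatMap f xs
-- checkLaw == True; both sides produce [1,10,2,20,3,30]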

Haskell Applicative idiom?

I'm new to Haskell and am puzzling over how to best express some operations in the most idiomatic and clear way. Currently (there will be more to come) I'm stuck on <*> (I'm not even sure what to call that).
For example, if I have, say
f = (^2)
g = (+10)
as representative functions (in practice they are more complex, but the key thing here is that they are different and distinct), then
concatMap ($ [1,2,3,4,10]) [(f <$>), (g <$>) . tail . reverse]
and
concat $ [(f <$>), (g <$>) . tail . reverse] <*> [[1,2,3,4,10]]
accomplish the same thing.
Is one of these more idiomatic Haskell? Does one imply something to an experienced reader of Haskell that the other does not? Perhaps there are additional (better) ways to express exactly the same thing. Are there conceptual differences between the two approaches that a novice Haskeller like myself may be missing?
Both your functions (f <$>) and (g <$>) . tail . reverse return a monoid type (a list in this case), so you can use mconcat to combine them into a single function. Then you can apply this function directly to the input list instead of wrapping it in another list and using concatMap:
mconcat [(f <$>), (g <$>).tail.reverse] $ [1,2,3,4,10]
To expand on this, a function a -> b is an instance of Monoid if b is a monoid. The implementation of mappend for such functions is:
mappend f g x = f x `mappend` g x
or equivalently
mappend f g = \x -> (f x) `mappend` (g x)
so given two functions f and g which return a monoid type b, f `mappend` g returns a function which applies its argument to f and g and combines the results using the Monoid instance of b.
mconcat has type Monoid a => [a] -> a and combines all the elements of the input list using mappend.
Lists are monoids where mappend == (++) so
mconcat [(f <$>), (g <$>).tail.reverse]
returns a function like
\x -> (fmap f x) ++ (((fmap g) . tail . reverse) x)
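Putting the pieces together, here is a minimal self-contained sketch of the suggestion (the imports and names are my own, not from the original post):
import Data.Functor ((<$>))
import Data.Monoid (mconcat)

f :: Int -> Int
f = (^ 2)

g :: Int -> Int
g = (+ 10)

result :: [Int]
result = mconcat [(f <$>), (g <$>) . tail . reverse] $ [1, 2, 3, 4, 10]
-- result == [1,4,9,16,100,14,13,12,11]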
Personally for your example I would write
f = (^2)
g = (+10)
let xs = [1,2,3,4,10]
in (map f xs) ++ (map g . tail $ reverse xs)
In a very Applicative "mood", I would replace the part after in by
((++) <$> map f <*> map g . tail . reverse) xs
which I actually don't think is more readable in this case. If you don't directly understand what it means, spend some time on understanding the Applicative instance of ((->) a) (Reader).
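For reference, here is that expression written out as a standalone definition, together with how the function Applicative makes it reduce (a sketch reusing the question's f and g):
applicativeVersion :: [Int] -> [Int]
applicativeVersion = (++) <$> map f <*> map g . tail . reverse
  where
    f = (^ 2)
    g = (+ 10)
-- For functions, (h <$> u) x = h (u x) and (u <*> v) x = u x (v x),
-- so this is the same as \xs -> map f xs ++ map g (tail (reverse xs)).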
I think the choice really depends on what you're trying to do, i.e. what your output is supposed to mean. In your example the task is very abstract (basically just showcasing what Applicative can do), so it's not directly obvious which version to use.
The Applicative instance of [] intuitively relates to combinations, so I would use it in a situation like this:
-- I want all pair combinations of 1 to 5
(,) <$> [1..5] <*> [1..5]
If you would have many functions, and you would want to try all combinations of these functions with a number of arguments, I would indeed use the [] instance of Applicative. But if what you're after is a concatenation of different transformations I would write it as such (which I did, above).
Just my 2 cents as a medium-experience Haskeller.
I sometimes struggle with a similar problem: you have a single element but multiple functions.
Usually we have multiple elements and a single function, so we do:
map f xs
But that's not a problem in Haskell. The dual is just as easy:
map ($ x) fs
Your x is actually a list here, and you want to concat after the map, so you do
concatMap ($ xs) fs
I cannot really tell what happens in the second expression just by reading it, even though I can reason that it does the same thing as the first one using the Applicative laws.
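A quick sketch of both duals in action (the functions and lists here are only illustrative):
singleArg :: [Double]
singleArg = map ($ 10) [(/ 2), subtract 1, negate]
-- singleArg == [5.0, 9.0, -10.0]

manyFunctions :: [Int]
manyFunctions = concatMap ($ [1, 2, 3]) [map (* 2), reverse]
-- manyFunctions == [2, 4, 6, 3, 2, 1]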

Counting the frequency of values in a list using Control.Foldl

I am using the Control.Foldl library to traverse an arbitrarily long list and count all occurrences of arbitrarily many unique entities. I.e., the list may be of the form
[Just "a", Just "b", Just "aab", Nothing, Just "aab"]
and my result should be something like:
[(Just "a",1),(Just "b",1),(Just "aab",2),(Nothing,1)]
Now the issue is that I do not have the names of these entities a priori, and I would like to dynamically update the results as I fold.
My problem is that I do not know how to describe this computation in terms of the Fold data type from Control.Foldl. Specifically, at each step of the fold I need to traverse the result list and ask if I have seen the current item, but I see no way of describing this using the foldl library.
Please note that for future purposes it's really important that I use the Control.Foldl library here, not fold over some other foldable data type like a Map. In some sense my question is more about how to use the Foldl library, since the documentation is not too clear to me.
Edit: The example I showed is just a toy example; in reality I need to traverse an arbitrarily large list many times computing statistics, hence I'm using the foldl library, which allows me to combine the computations using Applicative, i.e. toResults <$> stat1 <*> stat2 <*> ... <*> statm $ largeList, and foldl allows me to traverse the list just once while computing all m statistics. Please find a solution using the foldl library.
You can encode a normal foldl' pretty straightforwardly as a Fold:
foldlToFold :: (b -> a -> b) -> b -> Fold a b
foldlToFold f z = Fold f z id
I'm actually a bit puzzled that this combinator isn't in the library...
Anyways, if you have
foldl' f z
you can replace it with
fold (Fold f z id)
so here, you would normally be using
foldl' (\mp x -> M.insertWith (+) x 1 mp) M.empty
with Fold, you'd be making
countingFold :: Ord a => Fold a (Map a Int)
countingFold = Fold (\mp x -> M.insertWith (+) x 1 mp) M.empty id
and you can use it as
countUp :: Ord a => [a] -> Map a Int
countUp = fold countingFold
-- or
countUp = fold (Fold (\mp x -> M.insertWith (+) x 1 mp) M.empty id)
If you want to go back to a list at the end, you can do
M.toList . countUp
In general, if you can formulate your fold as a foldl', you can do the transformation above to be able to encode it as a Fold. Fold is a bit more expressive because for foldl', the b type is both the accumulator and the result type; for a Fold, you can have a separate accumulator and result type.
Roughly speaking, you can translate any Fold into a foldl' plus a final mapping of the result:
fold (Fold f z g) = g . foldl' f z
And you can go backwards too:
foldlMapToFold :: (b -> a -> b) -> b -> (b -> c) -> Fold a c
foldlMapToFold = Fold
So if you had
g . foldl' f z
you can write
fold (Fold f z g)
If you want to use a Fold, think, "how can I describe my operation as a foldl' and a final map?", and then go from there.
The advantage of using the Fold type over just normal maps and folds is (apart from performance tweaks) the ability to combine and manipulate multiple Folds as objects using their Applicative instance, and other nice instances too, like Functor, Profunctor, fun stuff like that. Combining folds encoded as maps-and-foldl's is a bit tedious, but the Fold wrapper lets you do it in a cleaner first-class way using the abstractions everyone knows and loves.
For example, if i had
fold1 = g . foldl' f z
and
fold2 = g' . foldl' f' z'
and I wanted to do
fold3 = (\(x, x') -> foo (g x) (g' x'))
      . foldl' (\(x, x') y -> (f x y, f' x' y)) (z, z')
(that is, do both folds on the list in one pass, and recombine the results at the end with foo). It's a big hassle, right?
But i can also just do
fold1 = Fold f z g
fold2 = Fold f' z' g'
fold3 = foo <$> fold1 <*> fold2
(Note that, even better, using Fold actually keeps the fold strict, because in the hand-rolled example above the lazy tuples add a layer of indirection and incidentally make the foldl' lazy again.)
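To tie this back to the question, here is a sketch of the counting fold combined applicatively with the library's length fold, so that both statistics are computed in a single traversal (the module aliases are my own; the sample data is from the question):
import qualified Control.Foldl as L
import qualified Data.Map as M

countingFold :: Ord a => L.Fold a (M.Map a Int)
countingFold = L.Fold (\mp x -> M.insertWith (+) x 1 mp) M.empty id

countsAndLength :: Ord a => [a] -> (M.Map a Int, Int)
countsAndLength = L.fold ((,) <$> countingFold <*> L.length)

-- countsAndLength [Just "a", Just "b", Just "aab", Nothing, Just "aab"]
--   == (fromList [(Nothing,1),(Just "a",1),(Just "aab",2),(Just "b",1)], 5)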

Point-free equivalent

I have this function from another SO question,
f :: Ord a => [a] -> [(a, Int)]
f xs = zipWith (\x ys -> (x, length $ filter (< x) ys)) xs (inits xs)
I'm trying to write it in point-free style,
f = flip (zipWith (\x -> (,) x . length . filter (< x))) =<< inits
Is it possible to get rid of that x ?
It's possible, but absolutely not worth the pain. To directly answer your question, LambdaBot on FreeNode reports:
f = flip (zipWith (liftM2 (.) (,) ((length .) . filter . flip (<)))) =<< inits
At this point the function has lost whatever clarity it had, and has become unmaintainable. Here you'd do much better to introduce real names. Remember, just because we can make things point free does not mean we should.
As a general rule: if a variable turns up more than once in an expression, it's probably not a good idea to make it point-free. If you're determined, however, the least unreadable way is with the Arrow combinators, because they make it pretty clear where the data flow is "split". For the xs I'd write
uncurry (zipWith (...)) . (id &&& inits)
For x, the same method yields
zipWith ( curry $ uncurry (,) . (fst &&& length . uncurry filter . first (>)) )
This is even longer than the (->)-monad solution that you've used and lambdabot suggests, but it looks far more organised.
The point of pointfree style is not just omitting names for values, but preferring names for functions. This is significantly easier to do when you use very small definitions. Of course any code is going to become unreadable if you inline everything and don’t use good names.
So let’s start with your original function, and split it into a few smaller definitions.
f xs = zipWith combine xs (inits xs)
combine x xs = (x, countWhere (< x) xs)
countWhere f xs = length (filter f xs)
Now we can easily make these definitions pointfree in a readable way.
f = zipWith combine <*> inits
  where
    combine = compose (,) countLessThan
    compose = liftA2 (.)
    countLessThan = countWhere . flip (<)
    countWhere = length .: filter
    (.:) = (.) . (.)
Using names judiciously and preferring composition over application allows us to factor code into small, easily understood definitions. Named parameters are the equivalent of goto for data—powerful, but best used to build reusable higher-level structures that are easier to understand and use correctly. These compositional combinators such as (.) and <*> are to data flow what map, filter, and fold are to control flow.
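As a sanity check, the refactored definition agrees with the original on a sample input (assuming inits from Data.List is in scope; the input below is arbitrary):
fOriginal :: Ord a => [a] -> [(a, Int)]
fOriginal xs = zipWith (\x ys -> (x, length (filter (< x) ys))) xs (inits xs)

agrees :: Bool
agrees = f [3, 1, 2, 2, 5] == fOriginal [3, 1, 2, 2, 5]
-- Both produce [(3,0),(1,0),(2,1),(2,1),(5,4)].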
My stab at it:
f :: Ord a => [a] -> [(a, Int)]
f = zip <*> ((zipWith $ (length .) . filter . (>)) <*> inits)
Here I replaced (<) with (>) to have (length .) . filter . (>) as a function with arguments in the right order: a -> [a] -> Int. Passing it to zipWith, we get [a] -> [[a]] -> [Int].
Assuming we have [a] on input, we can see this as f ([[a]] -> [Int]) for the Applicative ((->) [a]), which can be combined with inits :: f [[a]] using <*> :: f ([[a]] -> [Int]) -> f [[a]] -> f [Int]. This gives us [a] -> [Int]; now we need to consume both [a] and [Int] in parallel. zip is already of the right type, [a] -> [Int] -> [(a,Int)], to apply with <*>.
Not saying I recommend this, but the King of Pointfree is Control.Arrow
import Control.Arrow
-- A special version of zipWith' more amenable to pointfree style
zipWith' :: ((a, b) -> c) -> ([a], [b]) -> [c]
zipWith' = uncurry . zipWith . curry
f :: Ord a => [a] -> [(a, Int)]
f = zipWith' (fst &&& (length <<< uncurry filter <<< first (>))) <<< id &&& inits
Let me reclarify here—I really don't recommend this unless your intention is to somehow generalize the kind of arrow your program is operating in (e.g. into Arrowized FRP perhaps).
With the well-known
(f .: g) x y = f (g x y)
it is a semi-readable
zipWith (curry (fst &&& uncurry (length .: (filter . flip (<))) )) <*> inits
-- \(x,ys) -> (x , length ( (filter . flip (<)) x ys) )
Using Control.Applicative (f <*> g $ x = f x (g x), the S combinator), and Control.Arrow (as others, but a little bit differently).

Haskell: How is join a natural transformation?

I can define a natural transformation in Haskell as:
h :: [a] -> Maybe a
h [] = Nothing
h (x:_) = Just x
and with a function k:
k :: Char -> Int
k = ord
the naturality condition is met due to the fact that:
h . fmap k == fmap k . h
Can the naturality condition of the List monad's join function be demonstrated in a similar way? I'm having some trouble understanding how join, say concat in particular, is a natural transformation.
Okay, let's look at concat.
First, here's the implementation:
concat :: [[a]] -> [a]
concat = foldr (++) []
This parallels the structure of your h where Maybe is replaced by [] and, more significantly, [] is replaced by--to abuse syntax for a moment--[[]].
[[]] is a functor as well, of course, but it's not a Functor instance in the way that the naturality condition uses it. Translating your example directly won't work:
concat . fmap k =/= fmap k . concat
...because both fmaps are working on only the outermost [].
And although [[]] is hypothetically a valid instance of Functor you can't make it one directly, for practical reasons that are probably obvious.
However, you can reconstruct the correct lifting as so:
concat . (fmap . fmap) k == fmap k . concat
...where fmap . fmap is equivalent to the implementation of fmap for a hypothetical Functor instance for [[]].
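A quick way to convince yourself of this square is to check it on a sample nested list (ord comes from Data.Char; the list below is arbitrary):
naturalityHolds :: Bool
naturalityHolds = (concat . (fmap . fmap) ord) xss == (fmap ord . concat) xss
  where
    xss = ["ab", "c", "", "de"]
-- Both sides evaluate to [97,98,99,100,101].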
As a related addendum, return is awkward for the opposite reason: a -> f a is a natural transformation from an elided identity functor. Using (:[]) for return, the naturality condition would be written like so:
(:[]) . ($) k == fmap k . (:[])
...where the completely superfluous ($) is standing in for what would be fmap over the elided identity functor.
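Checked concretely (again only a sketch; the argument 'x' is arbitrary and k = ord as in the question):
returnNaturality :: Bool
returnNaturality = ((:[]) . ($) k) 'x' == (fmap k . (:[])) 'x'
  where
    k = ord
-- Both sides are [120].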

Trick for "reusing" arguments in Haskell?

From time to time I stumble over the problem that I want to express "please use the last argument twice", e.g. in order to write pointfree style or to avoid a lambda. E.g.
sqr x = x * x
could be written as
sqr = doubleArgs (*) where
  doubleArgs f x = f x x
Or consider this slightly more complicated function (taken from this question):
ins x xs = zipWith (\ a b -> a ++ (x:b)) (inits xs) (tails xs)
I could write this code pointfree if there were a function like this:
ins x = dup (zipWith (\ a b -> a ++ (x:b))) inits tails where
  dup f f1 f2 x = f (f1 x) (f2 x)
But as I can't find something like doubleArgs or dup on Hoogle, I guess that I might be missing a trick or idiom here.
From Control.Monad:
join :: (Monad m) => m (m a) -> m a
join m = m >>= id
instance Monad ((->) r) where
  return = const
  m >>= f = \x -> f (m x) x
Expanding:
join :: (a -> a -> b) -> (a -> b)
join f = f >>= id
       = \x -> id (f x) x
       = \x -> f x x
So, yeah, Control.Monad.join.
Oh, and for your pointfree example, have you tried using applicative notation (from Control.Applicative):
ins x = zipWith (\a b -> a ++ (x:b)) <$> inits <*> tails
(I also don't know why people are so fond of a ++ (x:b) instead of a ++ [x] ++ b... it's not faster -- the inliner will take care of it -- and the latter is so much more symmetrical! Oh well)
What you call 'doubleArgs' is more often called dup - it is the W combinator (called warbler in To Mock a Mockingbird) - "the elementary duplicator".
What you call 'dup' is actually the 'starling-prime' combinator.
Haskell has a fairly small "combinator basis" (see Data.Function), plus some Applicative and Monadic operations add more "standard" combinators by virtue of the function instances for Applicative and Monad (<*> from Applicative is the S, or starling, combinator for the function instance; liftA2 & liftM2 are starling-prime). There doesn't seem to be much enthusiasm in the community for expanding Data.Function, so whilst combinators are good fun, pragmatically I've come to prefer long-hand in situations where a combinator is not directly available.
Here is another solution for the second part of my question: Arrows!
import Control.Arrow
ins x = inits &&& tails >>> second (map (x:)) >>> uncurry (zipWith (++))
The &&& ("fanout") distributes an argument to two functions and returns the pair of the results. >>> ("and then") reverses the function application order, which allows to have a chain of operations from left to right. second works only on the second part of a pair. Of course you need an uncurry at the end to feed the pair in a function expecting two arguments.
