Use of List Monad vs fmap - haskell

Are there any practical uses of the list monad that wouldn't just reduce to an fmap? When would you use bind over fmap with the list monad?
For example, you can do [1,2,3] >>= return . (+ 1), but that's the same as (+ 1) <$> [1,2,3]. When would you use bind without a return on a list?

Using bind with return is equivalent to using fmap. Indeed,
fmap f m = m >>= return . f
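A quick check of this law on lists, together with the reason bind can do more: fmap always returns a list of the same length as its input, while the function passed to >>= may return zero, one or several elements for each input. This is a sketch; fmapViaBind and duplicateOdds are hypothetical names.

```haskell
-- fmap expressed through bind, as in the law above
fmapViaBind :: (a -> b) -> [a] -> [b]
fmapViaBind f m = m >>= return . f

-- A bind that does not have the shape (return . f): each element
-- yields zero or two results, so the output length can change.
duplicateOdds :: [Int] -> [Int]
duplicateOdds xs = xs >>= \x -> if odd x then [x, x] else []
```

In GHCi, fmapViaBind (+1) [1,2,3] gives [2,3,4], while duplicateOdds [1,2,3] gives [1,1,3,3], a four-element result that no fmap over a three-element list can produce.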
The uses of bind that can't be reproduced with fmap are exactly those which don't involve this use of return. To provide just one (hopefully) interesting example for lists, let's talk about L-Systems.
L-systems were created by Aristid Lindenmayer in 1968. As rewriting systems, they start with a simple object and repeatedly replace parts of it using a set of rewriting rules or productions. They can be used to generate fractals and other self-similar images. A context-free, deterministic L-system (or D0L) is defined by the triple of an alphabet, an axiom, and a collection of production rules.
For our alphabet, we'll define a type:
data AB = A | B deriving Show
For our axiom, or starting state, we'll use the word [A, B].
myAxiom :: [AB]
myAxiom = [A, B]
For our rules, we need a map from a single letter to a sequence of letters. This is a function of type AB -> [AB]. Let's use this rule:
myRule :: AB -> [AB]
myRule A = [A, B]
myRule B = [A]
To apply the rule, we must rewrite each letter using its production rule. We must do this for all letters in the word at the same time. Conveniently, this is exactly what >>= does for lists:
apply rule axiom = axiom >>= rule
Now, let's apply our rule to our axiom, generating the first step in the L-System:
> apply myRule myAxiom
[A,B,A]
This is Lindenmayer's original L-system, used for modeling algae. We can iterate to see it progress:
> mapM_ print . take 7 $ iterate (>>= myRule) myAxiom
[A,B]
[A,B,A]
[A,B,A,A,B]
[A,B,A,A,B,A,B,A]
[A,B,A,A,B,A,B,A,A,B,A,A,B]
[A,B,A,A,B,A,B,A,A,B,A,A,B,A,B,A,A,B,A,B,A]
[A,B,A,A,B,A,B,A,A,B,A,A,B,A,B,A,A,B,A,B,A,A,B,A,A,B,A,B,A,A,B,A,A,B]
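To see why fmap alone is not enough here, compare it with bind on the same rule. The sketch repeats the definitions above, with Eq derived so results can be compared; generationLengths is a hypothetical helper. The lengths it reports (2, 3, 5, 8, ...) are the ones visible in the output above.

```haskell
data AB = A | B deriving (Show, Eq)

myRule :: AB -> [AB]
myRule A = [A, B]
myRule B = [A]

myAxiom :: [AB]
myAxiom = [A, B]

-- fmap alone only produces a nested list of productions ...
nested :: [[AB]]
nested = fmap myRule myAxiom          -- [[A,B],[A]]

-- ... and bind on lists is that map followed by concatenation.
flattened :: [AB]
flattened = concat nested             -- [A,B,A]

-- Lengths of the first k words in the iteration
generationLengths :: Int -> [Int]
generationLengths k = map length (take k (iterate (>>= myRule) myAxiom))
```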
In general, bind for lists is concatMap, and you use it precisely when you want to combine mapping with concatenation. Another interpretation is that lists represent non-deterministic choice and that bind functions by choosing each possibility from the list once. For example, rolling dice:
do
  d1 <- [1..6]
  d2 <- [1..6]
  return (d1, d2)
This gives all possible ways of rolling 2d6.
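A self-contained version of the dice example next to its desugaring into explicit >>= calls. waysToRoll is a hypothetical helper, included only to show the resulting list being used to count outcomes.

```haskell
-- All 36 ordered outcomes of rolling two six-sided dice
rolls :: [(Int, Int)]
rolls = do
  d1 <- [1..6]
  d2 <- [1..6]
  return (d1, d2)

-- The same computation with the binds written out
rolls' :: [(Int, Int)]
rolls' = [1..6] >>= \d1 -> [1..6] >>= \d2 -> return (d1, d2)

-- Number of outcomes whose faces add up to the given total
waysToRoll :: Int -> Int
waysToRoll total = length [ () | (d1, d2) <- rolls, d1 + d2 == total ]
```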

factors :: Int -> [Int]
factors n = do
  q <- [1..n]
  filter ((==n) . (*q)) [1..n]
...or, in desugared notation,
factors n = [1..n] >>= ($[1..n]) . filter . fmap (==n) . (*)
That's of course hardly efficient, but it works:
*Main> factors 17
[17,1]
*Main> factors 24
[24,12,8,6,4,3,2,1]
*Main> factors 34
[34,17,2,1]
For operations that aren't as simple as *, where you couldn't avoid a brute-force approach like this, this might actually be a good solution.
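The same brute-force search can also be written with guard from Control.Monad, which makes the "discard this branch" step explicit. factorsGuard is a hypothetical name for this variant; it should produce the same results as factors above.

```haskell
import Control.Monad (guard)

-- For each q, keep every p with p * q == n; failed guards prune the branch.
factorsGuard :: Int -> [Int]
factorsGuard n = do
  q <- [1..n]
  p <- [1..n]
  guard (p * q == n)
  return p
```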

For one thing, concatMap is just (=<<). And concat is just join. I've used both of these frequently in real code.
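Another thing you can do is apply a list of functions to one value (here sequence runs in the function monad rather than the list monad, so every function receives the same argument):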
λ:> applyList = sequence
λ:> applyList [(2*), (3+)] 4
[8,7]
You can also generate a list of all subsets of a list
λ:> import Control.Monad
λ:> allSubsets = filterM (const [True, False])
λ:> allSubsets "ego"
["ego","eg","eo","e","go","g","o",""]
Or even enumerate all strings that can be formed from an alphabet
λ:> import Data.List
λ:> import Control.Monad
λ:> allStrings = sequence <=< (inits . repeat)
λ:> take 100 $ allStrings ['a'..'z']
["","a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z","aa","ab","ac","ad","ae","af","ag","ah","ai","aj","ak","al","am","an","ao","ap","aq","ar","as","at","au","av","aw","ax","ay","az","ba","bb","bc","bd","be","bf","bg","bh","bi","bj","bk","bl","bm","bn","bo","bp","bq","br","bs","bt","bu","bv","bw","bx","by","bz","ca","cb","cc","cd","ce","cf","cg","ch","ci","cj","ck","cl","cm","cn","co","cp","cq","cr","cs","ct","cu"]
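To see why allStrings enumerates strings by length: inits . repeat turns an alphabet into the prefixes [], [alphabet], [alphabet, alphabet], ..., and sequence on k copies of the alphabet yields every string of length k. A sketch with a two-letter alphabet (twoLetter is a hypothetical name):

```haskell
import Control.Monad ((<=<))
import Data.List (inits)

allStrings :: [a] -> [[a]]
allStrings = sequence <=< (inits . repeat)

-- For a fixed length, sequence over copies of the alphabet is a
-- Cartesian product: one letter chosen for each position.
twoLetter :: [String]
twoLetter = sequence ["ab", "ab"]
```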
Perhaps more practically, you can use the applicative instance to combine every element of one list with every element of another (despite the name below, this is not a positional zip):
λ:> zipWith' f xs ys = f <$> xs <*> ys
λ:> zipWith' (+) [1..3] [5..8]
[6,7,8,9,7,8,9,10,8,9,10,11]
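A sketch contrasting the two, with allPairsWith as a hypothetical name for the definition above: its result equals liftA2 on lists, while the Prelude's zipWith pairs elements by position and stops at the shorter list.

```haskell
import Control.Applicative (liftA2)

-- Applicative combination: every x from xs with every y from ys
allPairsWith :: (a -> b -> c) -> [a] -> [b] -> [c]
allPairsWith f xs ys = f <$> xs <*> ys
```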

Related

String combination in Haskell

I'm writing an algorithm in Haskell that simplifies context-free grammars and I've been struggling with the removal of null productions, more specifically with the "substitution" of nullable non-terminals in the other productions.
Given a string, let's say "ASA", I would like to return a list of strings built by removing one, two, ... or all occurrences of the character "A".
To be clear, given "ASA" I'd like to return this: ["SA", "AS", "S"].
In Python I did it quite easily, but in Haskell I don't know how to iterate over the string and manipulate it the way I'd like. Probably because I'm still not used to functional programming.
A library-based approach:
A given input character may or may not be in any of the output partial strings. So it seems natural to involve the Haskell Maybe type constructor. It is similar to std::optional in C++.
We can have an expand function that associates to each input character a list of the corresponding possibilities:
$ ghci
λ>
λ> st = "ASA"
λ>
λ> expand ch = if (ch == 'A') then [ Just ch, Nothing ] else [ Just ch ]
λ>
λ> map expand st
[[Just 'A',Nothing],[Just 'S'],[Just 'A',Nothing]]
λ>
What we need is basically the Cartesian product of the above lists of possibilities. A list Cartesian product can be obtained by using the highly polymorphic sequence library function:
λ>
λ> sequence (map expand st)
[[Just 'A',Just 'S',Just 'A'],[Just 'A',Just 'S',Nothing],[Nothing,Just 'S',Just 'A'],[Nothing,Just 'S',Nothing]]
λ>
Next, we need to change for example [Just 'A', Just 'S', Nothing] into ['A', 'S'], which in Haskell is exactly the same thing as "AS". The required function would have as its type signature:
func :: [Maybe α] -> [α]
If we submit this candidate type signature to Hoogle, we readily get the library function catMaybes:
λ>
λ> import qualified Data.Maybe as Mb
λ>
λ> Mb.catMaybes [Just 'A',Just 'S',Nothing]
"AS"
λ>
λ> map Mb.catMaybes (sequence (map expand st))
["ASA","AS","SA","S"]
λ>
and we just have to remove the full string "ASA" from that last list.
Of course, there is no need to restrict this to the Char data type. Any type with a proper equality test can do. And the privileged character 'A' should be made into a variable argument. Overall, this gives us the following code:
import qualified Data.Maybe as Mb

multiSuppressor :: Eq α => α -> [α] -> [[α]]
multiSuppressor e xs =
    let expand e1 = if (e1 == e) then [ Just e1, Nothing ] else [ Just e1 ]
        maybes = sequence (map expand xs)
        res1 = map Mb.catMaybes maybes
    in
        -- final massaging as the whole list is normally unwanted:
        if (null xs) then [[]] else filter (/= xs) res1
A note on efficiency:
Function sequence is polymorphic. Being the list cartesian product is not its sole role in life. Unfortunately, this happens to have the sad side effect that its memory consumption can become quite large if you go beyond toy-sized examples.
If this becomes a problem, one can use the following replacement code instead, which is based on an idea by K. A. Buhr:
cartesianProduct :: [[α]] -> [[α]]
cartesianProduct xss =
    map reverse (helper (reverse xss))
  where
    helper [] = [[]]
    helper (ys:zss) = [y:zs | zs <- helper zss, y <- ys]
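Combining the two pieces above, a sketch of multiSuppressor with sequence swapped for cartesianProduct. The primed name and the ASCII type variable a are choices made here; whether the swap actually saves memory on your inputs is something to measure rather than assume.

```haskell
import qualified Data.Maybe as Mb

-- Cartesian product of a list of lists, in the same order as sequence
cartesianProduct :: [[a]] -> [[a]]
cartesianProduct xss = map reverse (helper (reverse xss))
  where
    helper [] = [[]]
    helper (ys:zss) = [y:zs | zs <- helper zss, y <- ys]

-- multiSuppressor with sequence replaced by cartesianProduct
multiSuppressor' :: Eq a => a -> [a] -> [[a]]
multiSuppressor' e xs =
  let expand e1 = if e1 == e then [Just e1, Nothing] else [Just e1]
      res1 = map Mb.catMaybes (cartesianProduct (map expand xs))
  in if null xs then [[]] else filter (/= xs) res1
```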

parenthesis in Haskell functions

I just want to know how we know which functions need brackets () and which ones do not. For example
replicate 100 (product (map (*3) (zipWith max [1,2,3,4,5] [4,5,6,7,8])))
works fine. But
replicate 100 (product (map (*3) (zipWith (max [1,2,3,4,5] [4,5,6,7,8]))))
does not work. It is because I put a set of brackets around zipWith's arguments. In this small example, zipWith and max do not have brackets, but replicate, product and map do. In general, is there a way to know/figure out which functions need brackets and which ones don't?
Function application is left associative. So, when you write an expression like:
f g h x
it means:
((f g) h) x
And also the type of zipWith provides a clue:
zipWith :: (a -> b -> c) -> [a] -> [b] -> [c]
it says that zipWith has 3 parameters: a function and two lists.
When you write:
zipWith (max [1,2,3,4,5] [4,5,6,7,8])
The interpreter will understand that
max [1,2,3,4,5] [4,5,6,7,8]
will be the first parameter to zipWith, which is type incorrect. Note that zipWith expects a function of two arguments as its first argument and, as pointed out by @Cubic, max [1,2,3,4,5] [4,5,6,7,8] will return the maximum between these two lists according to the usual lexicographic order, which will be of type [a], for some type a which is an instance of Ord and Num. That said, the error becomes evident, since you are trying to pass a value of type
(Num a, Ord a) => [a]
where a value of type
(a -> b -> c)
is expected.
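A sketch of both readings (the names are hypothetical): with left-associative application, zipWith max xs ys is ((zipWith max) xs) ys, whereas max applied directly to the two lists just compares them and returns one of them.

```haskell
-- max on two lists compares them lexicographically and returns a list,
-- so it cannot serve as zipWith's function argument.
maxOfLists :: [Int]
maxOfLists = max [1,2,3,4,5] [4,5,6,7,8]

-- Left-associative application: these two are the same expression.
implicitParens :: [Int]
implicitParens = zipWith max [1,2,3,4,5] [4,5,6,7,8]

explicitParens :: [Int]
explicitParens = ((zipWith max) [1,2,3,4,5]) [4,5,6,7,8]
```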
Rodrigo gave the right answer. I'll just add that it is a misconception to think that some functions need parentheses, while others don't.
This is just like in school math:
3 * (4+5)
It is simply not the case that + expressions need parentheses and * expressions don't need them in general.
In Haskell, you can always get away without parentheses at all. Whenever you need to enclose an expression in parentheses, the alternative is to introduce a local name and bind it to that expression, then use the name instead of the expression.
In your example:
replicate 100 (product (map (*3) (zipWith max [1,2,3,4,5] [4,5,6,7,8])))
let list1 = product list2
    list2 = map thrice list3
    thrice x = x*3
    list3 = zipWith max [1,2,3,4,5] [4,5,6,7,8]
in replicate 100 list1
In fact, I often write functions top down thus:
foo x y z = result
  where
    result = ...
    ...
However, as was said before, expressions that consist of function applications can often also be written without parentheses by using (.) and ($). In such cases the top-down approach above may be overly verbose, and the following would be much clearer (because there is no noise from newly introduced names):
replicate 100
  . product
  . map (*3)
  $ zipWith max [1..5] [4..8]
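All of these spellings denote the same value. A sketch that puts the parenthesised original, the let version, the (.)/($) pipeline and an all-($) variant side by side (the names are hypothetical), so they can be compared in GHCi:

```haskell
nested :: [Int]
nested = replicate 100 (product (map (*3) (zipWith max [1,2,3,4,5] [4,5,6,7,8])))

withLet :: [Int]
withLet =
  let list1 = product list2
      list2 = map thrice list3
      thrice x = x * 3
      list3 = zipWith max [1,2,3,4,5] [4,5,6,7,8]
  in replicate 100 list1

-- (.) binds tighter than ($), so the composed pipeline is applied last
pipeline :: [Int]
pipeline =
  replicate 100
    . product
    . map (*3)
    $ zipWith max [1..5] [4..8]

withDollar :: [Int]
withDollar = replicate 100 $ product $ map (*3) $ zipWith max [1..5] [4..8]
```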

Generating all combinations of 6 Xs with 3 Qs in Haskell

I am trying to generate a list of all strings that consist of 6 Xs and 3 Qs.
A subset of the list I am trying to generate is as follows:
["XXXXXXQQQ", "XQXXQXXQX", "QXQXQXXXX",...
What is a good way to go about this?
Here is a dynamic programming solution using Data.Array. mem just stores memoized values.
import Data.Array
strings :: Int -> Int -> [String]
strings n m = strings' n m
  where
    mem :: Array (Int,Int) [String]
    mem = array ((0,0),(n,m)) [ ((i,j), strings' i j) | i <- [0..n], j <- [0..m] ]

    strings' 0 m = [replicate m 'X']
    strings' n 0 = [replicate n 'Q']
    strings' n m = (('Q':) <$> mem ! (n-1,m)) ++ (('X':) <$> mem ! (n,m-1))
The naive solution is to recursively choose one of X or Q until we run out of choices to make. This is especially convenient when using the list monad to model the nondeterministic choice, and leads to quite short code:
stringsNondet m 0 = [replicate m 'X']
stringsNondet 0 n = [replicate n 'Q']
stringsNondet m n = do
  (char, m', n') <- [('X', m-1, n), ('Q', m, n-1)]
  rest <- stringsNondet m' n'
  return (char:rest)
The disadvantage of this approach is that it does a lot of extra work. If we choose an X and then choose a Q, the continuations are the same as if we had chosen a Q and then an X, but these continuations will be recomputed in the above. (And similarly for other choice paths that lead to shared continuations.)
Alec has posted a dynamic programming solution which solves this problem by introducing a recursively-defined array to share the subcomputations. I like this solution, but the recursive definition is a bit mind-bending. The following solution is also a dynamic programming solution, in which subcomputations are also shared, but it uses no hand-written recursion. It does make use of standard recursive patterns (map, zipWith, iterate, ++, and !!) but notably does not require "tying the knot" as Alec's solution does.
As a warmup, let's discuss the type of the function of interest to us:
step :: [[String]] -> [[String]]
The final result of interest to us is [String], a collection of strings with a fixed number m of 'X's and a fixed number n of 'Q's. The step function will expect a collection of results, all of the same length, and will assume that the result at index m has m copies of 'X'. It will also produce a result with these properties, and where each result is one longer than the input results.
We implement step by producing two intermediate [[String]]s, one with an extra 'X' compared to the input results and one with an extra 'Q'. These two intermediates can then be zipped together with a little "stutter" to represent the slight difference in 'X' count between them. Thus:
step css = zipWith (++)
  ([[]] ++ map (map ('X':)) css)
  (map (map ('Q':)) css ++ [[]])
The top-level function is now easy to write: we simply index into the iterated version of step by the length of the final string we want, then index into the list of results we get that way by the number of 'X's we want.
strings m n = iterate step [[[]]] !! (m+n) !! m
A bonus of this approach is the single, aesthetically pleasing base case of [[[]]].
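To compare the three approaches side by side, a sketch that collects them in one module. The names stringsArray and stringsStep are renamings introduced here so the definitions can coexist; note that stringsArray takes the number of Qs first, while the other two take the number of Xs first. All three should produce the C(9,3) = 84 strings with six Xs and three Qs, possibly in different orders.

```haskell
import Data.Array
import Data.List (sort)

-- Alec's memoised version (renamed): i Qs and j Xs
stringsArray :: Int -> Int -> [String]
stringsArray n m = strings' n m
  where
    mem :: Array (Int, Int) [String]
    mem = array ((0,0),(n,m)) [((i,j), strings' i j) | i <- [0..n], j <- [0..m]]
    strings' 0 j = [replicate j 'X']
    strings' i 0 = [replicate i 'Q']
    strings' i j = (('Q':) <$> mem ! (i-1, j)) ++ (('X':) <$> mem ! (i, j-1))

-- Naive nondeterministic version: m Xs and n Qs
stringsNondet :: Int -> Int -> [String]
stringsNondet m 0 = [replicate m 'X']
stringsNondet 0 n = [replicate n 'Q']
stringsNondet m n = do
  (char, m', n') <- [('X', m-1, n), ('Q', m, n-1)]
  rest <- stringsNondet m' n'
  return (char:rest)

-- iterate/zipWith version (renamed): m Xs and n Qs
step :: [[String]] -> [[String]]
step css = zipWith (++)
  ([[]] ++ map (map ('X':)) css)
  (map (map ('Q':)) css ++ [[]])

stringsStep :: Int -> Int -> [String]
stringsStep m n = iterate step [[[]]] !! (m+n) !! m
```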
Use permutations and nub functions from Data.List:
Prelude Data.List> nub $ permutations "XXXXXXQQQ"
["XXXXXXQQQ","QXXXXXXQQ","XQXXXXXQQ","XXQXXXXQQ","XXXQXXXQQ","XXXXQXXQQ","XXXXXQXQQ","QQXXXXXXQ","QXQXXXXXQ","QXXQXXXXQ","QXXXQXXXQ","QXXXXQXXQ","QXXXXXQXQ","XQQXXXXXQ","XQXQXXXXQ","XQXXQXXXQ","XQXXXQXXQ","XQXXXXQXQ","XXQQXXXXQ","XXQXQXXXQ","XXQXXQXXQ","XXQXXXQXQ","XXXQQXXXQ","XXXQXQXXQ","XXXQXXQXQ","XXXXQQXXQ","XXXXQXQXQ","XXXXXQQXQ","QQQXXXXXX","QQXQXXXXX","QQXXQXXXX","QQXXXQXXX","QQXXXXQXX","QQXXXXXQX","QXQQXXXXX","XQQQXXXXX","XQQXQXXXX","XQQXXQXXX","XQQXXXQXX","XQQXXXXQX","QXQXQXXXX","QXQXXQXXX","QXQXXXQXX","QXQXXXXQX","QXXQQXXXX","XQXQQXXXX","XXQQQXXXX","XXQQXQXXX","XXQQXXQXX","XXQQXXXQX","XQXQXQXXX","XQXQXXQXX","XQXQXXXQX","QXXQXQXXX","QXXQXXQXX","QXXQXXXQX","QXXXQQXXX","XQXXQQXXX","XXQXQQXXX","XXXQQQXXX","XXXQQXQXX","XXXQQXXQX","XXQXQXQXX","XXQXQXXQX","XQXXQXQXX","XQXXQXXQX","QXXXQXQXX","QXXXQXXQX","QXXXXQQXX","XQXXXQQXX","XXQXXQQXX","XXXQXQQXX","XXXXQQQXX","XXXXQQXQX","XXXQXQXQX","XQXXXQXQX","QXXXXQXQX","XXQXXQXQX","QXXXXXQQX","XQXXXXQQX","XXQXXXQQX","XXXQXXQQX","XXXXQXQQX","XXXXXQQQX"]
We can have a faster implementation as well:
insertAtEvery x [] = [[x]]
insertAtEvery x (y:ys) = (x:y:ys) : map (y:) (insertAtEvery x ys)
combinations [] = [[]]
combinations (x:xs) = nub . concatMap (insertAtEvery x) . combinations $ xs
Comparison with the previous solution in ghci:
Prelude Data.List> (sort . nub . permutations $ "XXXXXXQQQ") == (sort . combinations $ "XXXXXXQQQ")
True
Prelude Data.List> :set +s
Prelude Data.List> combinations "XXXXXXQQQ"
["XXXXXXQQQ","XXXXXQXQQ","XXXXXQQXQ","XXXXXQQQX","XXXXQXXQQ","XXXXQXQXQ","XXXXQXQQX","XXXXQQXXQ","XXXXQQXQX","XXXXQQQXX","XXXQXXXQQ","XXXQXXQXQ","XXXQXXQQX","XXXQXQXXQ","XXXQXQXQX","XXXQXQQXX","XXXQQXXXQ","XXXQQXXQX","XXXQQXQXX","XXXQQQXXX","XXQXXXXQQ","XXQXXXQXQ","XXQXXXQQX","XXQXXQXXQ","XXQXXQXQX","XXQXXQQXX","XXQXQXXXQ","XXQXQXXQX","XXQXQXQXX","XXQXQQXXX","XXQQXXXXQ","XXQQXXXQX","XXQQXXQXX","XXQQXQXXX","XXQQQXXXX","XQXXXXXQQ","XQXXXXQXQ","XQXXXXQQX","XQXXXQXXQ","XQXXXQXQX","XQXXXQQXX","XQXXQXXXQ","XQXXQXXQX","XQXXQXQXX","XQXXQQXXX","XQXQXXXXQ","XQXQXXXQX","XQXQXXQXX","XQXQXQXXX","XQXQQXXXX","XQQXXXXXQ","XQQXXXXQX","XQQXXXQXX","XQQXXQXXX","XQQXQXXXX","XQQQXXXXX","QXXXXXXQQ","QXXXXXQXQ","QXXXXXQQX","QXXXXQXXQ","QXXXXQXQX","QXXXXQQXX","QXXXQXXXQ","QXXXQXXQX","QXXXQXQXX","QXXXQQXXX","QXXQXXXXQ","QXXQXXXQX","QXXQXXQXX","QXXQXQXXX","QXXQQXXXX","QXQXXXXXQ","QXQXXXXQX","QXQXXXQXX","QXQXXQXXX","QXQXQXXXX","QXQQXXXXX","QQXXXXXXQ","QQXXXXXQX","QQXXXXQXX","QQXXXQXXX","QQXXQXXXX","QQXQXXXXX","QQQXXXXXX"]
(0.01 secs, 3,135,792 bytes)
Prelude Data.List> nub $ permutations "XXXXXXQQQ"
["XXXXXXQQQ","QXXXXXXQQ","XQXXXXXQQ","XXQXXXXQQ","XXXQXXXQQ","XXXXQXXQQ","XXXXXQXQQ","QQXXXXXXQ","QXQXXXXXQ","QXXQXXXXQ","QXXXQXXXQ","QXXXXQXXQ","QXXXXXQXQ","XQQXXXXXQ","XQXQXXXXQ","XQXXQXXXQ","XQXXXQXXQ","XQXXXXQXQ","XXQQXXXXQ","XXQXQXXXQ","XXQXXQXXQ","XXQXXXQXQ","XXXQQXXXQ","XXXQXQXXQ","XXXQXXQXQ","XXXXQQXXQ","XXXXQXQXQ","XXXXXQQXQ","QQQXXXXXX","QQXQXXXXX","QQXXQXXXX","QQXXXQXXX","QQXXXXQXX","QQXXXXXQX","QXQQXXXXX","XQQQXXXXX","XQQXQXXXX","XQQXXQXXX","XQQXXXQXX","XQQXXXXQX","QXQXQXXXX","QXQXXQXXX","QXQXXXQXX","QXQXXXXQX","QXXQQXXXX","XQXQQXXXX","XXQQQXXXX","XXQQXQXXX","XXQQXXQXX","XXQQXXXQX","XQXQXQXXX","XQXQXXQXX","XQXQXXXQX","QXXQXQXXX","QXXQXXQXX","QXXQXXXQX","QXXXQQXXX","XQXXQQXXX","XXQXQQXXX","XXXQQQXXX","XXXQQXQXX","XXXQQXXQX","XXQXQXQXX","XXQXQXXQX","XQXXQXQXX","XQXXQXXQX","QXXXQXQXX","QXXXQXXQX","QXXXXQQXX","XQXXXQQXX","XXQXXQQXX","XXXQXQQXX","XXXXQQQXX","XXXXQQXQX","XXXQXQXQX","XQXXXQXQX","QXXXXQXQX","XXQXXQXQX","QXXXXXQQX","XQXXXXQQX","XXQXXXQQX","XXXQXXQQX","XXXXQXQQX","XXXXXQQQX"]
(0.71 secs, 161,726,128 bytes)

Long working of program that count Ints

I want to write a program that takes a list of Ints and a length, and returns a list whose element at position i contains all the input elements equal to i. For example:
[0,0,0,1,3,5,3,2,2,4,4,4] 6 -> [[0,0,0],[1],[2,2],[3,3],[4,4,4],[5]]
[0,0,4] 7 -> [[0,0],[],[],[],[4],[],[]]
[] 3 -> [[],[],[]]
[2,2] 3 -> [[],[],[2,2]]
So, here's my solution:
import Data.List
import Data.Function
f :: [Int] -> Int -> [[Int]]
f ls len = g 0 ls' [] where
    ls' = group . sort $ ls
    g :: Int -> [[Int]] -> [[Int]] -> [[Int]]
    g val [] accum
        | len == val = accum
        | otherwise = g (val+1) [] (accum ++ [[]])
    g val (x:xs) accum
        | len == val = accum
        | val == head x = g (val+1) xs (accum ++ [x])
        | otherwise = g (val+1) (x:xs) (accum ++ [[]])
But the query f [] 1000000 runs for a really long time. Why?
I see we're accumulating over some data structure, so I think of foldMap and ask "Which Monoid?" It's some kind of list of accumulations, like this:
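newtype Bunch x = Bunch {bunch :: [x]}
-- Since GHC 8.4 (base 4.11), Semigroup is a superclass of Monoid,
-- so the combining operation lives in the Semigroup instance.
instance Semigroup x => Semigroup (Bunch x) where
  Bunch xss <> Bunch yss = Bunch (glom xss yss) where
    glom [] yss = yss
    glom xss [] = xss
    glom (xs : xss) (ys : yss) = (xs <> ys) : glom xss yss
instance Semigroup x => Monoid (Bunch x) where
  mempty = Bunch []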
Our underlying elements have some associative operator <>, and we can thus apply that operator pointwise to a pair of lists, just like zipWith does, except that when we run out of one of the lists, we don't truncate, rather we just take the other. Note that Bunch is a name I'm introducing for purposes of this answer, but it's not that unusual a thing to want. I'm sure I've used it before and will again.
If we can translate
0 -> Bunch [[0]] -- single 0 in place 0
1 -> Bunch [[],[1]] -- single 1 in place 1
2 -> Bunch [[],[],[2]] -- single 2 in place 2
3 -> Bunch [[],[],[],[3]] -- single 3 in place 3
...
and foldMap across the input, then we'll get the right number of each in each place. There should be no need for an upper bound on the numbers in the input to get a sensible output, as long as you are willing to interpret [] as "the rest is silence". Otherwise, like Procrustes, you can pad or chop to the length you need.
Note, by the way, that when mappend's first argument comes from our translation, we do a bunch of ([]++) operations, a.k.a. ids, then a single ([i]++), a.k.a. (i:), so if foldMap is right-nested (which it is for lists), then we will always be doing cheap operations at the left end of our lists.
Now, as the question works with lists, we might want to introduce the Bunch structure only when it's useful. That's what Control.Newtype is for. We just need to tell it about Bunch.
instance Newtype (Bunch x) [x] where
  pack = Bunch
  unpack = bunch
And then it's
groupInts :: [Int] -> [[Int]]
groupInts = ala' Bunch foldMap (basis !!) where
  basis = ala' Bunch foldMap id [iterate ([]:) [], [[[i]] | i <- [0..]]]
What? Well, without going to town on what ala' is in general, its impact here is as follows:
ala' Bunch foldMap f = bunch . foldMap (Bunch . f)
meaning that, although f is a function to lists, we accumulate as if f were a function to Bunches: the role of ala' is to insert the correct pack and unpack operations to make that just happen.
We need (basis !!) :: Int -> [[Int]] to be our translation. Hence basis :: [[[Int]]] is the list of images of our translation, computed on demand at most once each (i.e., the translation, memoized).
For this basis, observe that we need these two infinite lists
[ [] [ [[0]]
, [[]] , [[1]]
, [[],[]] , [[2]]
, [[],[],[]] , [[3]]
... ...
combined Bunchwise. As both lists have the same length (infinity), I could also have written
basis = zipWith (++) (iterate ([]:) []) [[[i]] | i <- [0..]]
but I thought it was worth observing that this also is an example of Bunch structure.
Of course, it's very nice when something like accumArray hands you exactly the sort of accumulation you need, neatly packaging a bunch of grungy behind-the-scenes mutation. But the general recipe for an accumulation is to think "What's the Monoid?" and "What do I do with each element?". That's what foldMap asks you.
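Without Control.Newtype, ala' Bunch foldMap f can be spelled out as bunch . foldMap (Bunch . f), per the equation above. A self-contained sketch along those lines, using the Semigroup-then-Monoid instance style that GHC 8.4 and later require; the padding wrapper f is a hypothetical addition that matches the question's interface.

```haskell
-- Pointwise-combining list wrapper, as described above
newtype Bunch x = Bunch {bunch :: [x]}

instance Semigroup x => Semigroup (Bunch x) where
  Bunch xss <> Bunch yss = Bunch (glom xss yss)
    where
      glom [] ys = ys
      glom xs [] = xs
      glom (x : xs) (y : ys) = (x <> y) : glom xs ys

instance Semigroup x => Monoid (Bunch x) where
  mempty = Bunch []

-- groupInts with the pack/unpack steps written by hand
groupInts :: [Int] -> [[Int]]
groupInts = bunch . foldMap (Bunch . (basis !!))
  where
    basis = zipWith (++) (iterate ([]:) []) [[[i]] | i <- [0..]]

-- Pad with empty lists (or chop) to the requested length
f :: [Int] -> Int -> [[Int]]
f ls len = take len (groupInts ls ++ repeat [])
```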
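The (++) operator copies its left-hand list. For this reason, adding to the beginning of a list is quite fast, but adding to the end of a list is very slow: each accum ++ [x] copies the whole accumulator, so building an n-element result this way takes time quadratic in n.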
In summary, avoid adding things to the end of a list. Try to always add to the beginning instead. One simple way to do that is to build the list backwards, and then reverse it at the end. A more devious trick is to use "difference lists" (Google it). Another possibility is to use Data.Sequence rather than a list.
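A sketch of the Data.Sequence suggestion, applied to the question's f with the same control flow: the accumulator becomes a Seq, and (|>) adds to the right end in amortised constant time. Data.Sequence comes from the containers package, which ships with GHC.

```haskell
import Data.Foldable (toList)
import Data.List (group, sort)
import Data.Sequence (Seq, (|>))
import qualified Data.Sequence as Seq

f :: [Int] -> Int -> [[Int]]
f ls len = g 0 (group (sort ls)) Seq.empty
  where
    -- Same structure as the question's g, but snoc onto a Seq
    g :: Int -> [[Int]] -> Seq [Int] -> [[Int]]
    g val [] accum
      | len == val = toList accum
      | otherwise  = g (val + 1) [] (accum |> [])
    g val (x:xs) accum
      | len == val    = toList accum
      | val == head x = g (val + 1) xs (accum |> x)
      | otherwise     = g (val + 1) (x:xs) (accum |> [])
```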
The first thing to note is that the most obvious way to implement this is with a data structure that allows random access, and an array is an obvious choice. Note that you need to add elements to the same array slot multiple times and somehow "join them".
accumArray is perfect for this.
So we get:
import Data.Array
f l i = elems $ accumArray (\l e -> e:l) [] (0,i-1) (map (\e -> (e,e)) l)
And we're good to go (see full code here).
This approach does involve converting the final array back into a list, but that step is very likely faster than, say, sorting the list, which often involves scanning the list at least a few times for a list of decent size.
Whenever you use ++, you have to recreate the entire left-hand list, since lists are immutable.
A simple solution would be to use :, but that builds a reversed list. However that can be fixed using reverse, which results in only building two lists (instead of 1 million in your case).
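A sketch of that fix applied to the question's own f: only the accumulator handling changes, with (:) in place of ++ [..] and a single reverse when the counter reaches len.

```haskell
import Data.List (group, sort)

f :: [Int] -> Int -> [[Int]]
f ls len = g 0 (group (sort ls)) []
  where
    -- accum is built in reverse with (:), then reversed once at the end
    g :: Int -> [[Int]] -> [[Int]] -> [[Int]]
    g val [] accum
      | len == val = reverse accum
      | otherwise  = g (val + 1) [] ([] : accum)
    g val (x:xs) accum
      | len == val    = reverse accum
      | val == head x = g (val + 1) xs (x : accum)
      | otherwise     = g (val + 1) (x:xs) ([] : accum)
```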
Your concept of glomming things onto an accumulator is a very useful one, and both MathematicalOrchid and Guvante show how you can use that concept reasonably efficiently. But in this case, there is a simpler approach that is likely also faster. You started with
group . sort $ ls
and this was a very good place to start! You get a list that's almost the one you want, except that you need to fill in some blanks. How can we figure those out? The simplest way, though probably not quite the most efficient, is to work with a list of all the numbers you want to count up to: [0 .. len-1].
So we start with
f ls len = g [0 .. len-1] (group . sort $ ls)
  where
    ?
How do we define g? By pattern matching!
f ls len = g [0 .. len-1] (group . sort $ ls)
  where
    -- We may or may not have some lists left,
    -- but we counted as high as we decided we
    -- would
    g [] _ = []
    -- We have no lists left, so the rest of the
    -- numbers are not represented
    g ns [] = map (const []) ns
    -- This shouldn't be possible, because group
    -- doesn't make empty lists.
    g _ ([]:_) = error "group isn't working!"
    -- Finally, we have some work to do!
    g (n:ns) xls@(xl@(x:_):xls')
      | n == x = xl : g ns xls'
      | otherwise = [] : g ns xls
That was nice, but making the list of numbers isn't free, so you might be wondering how you can optimize it. One method I invite you to try is using your original technique of keeping a separate counter, but following this same sort of structure.
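One possible take on that exercise, offered as a sketch rather than the answerer's own code: a counter n replaces the list [0 .. len-1], and the pattern structure stays the same. The name countGroups is hypothetical.

```haskell
import Data.List (group, sort)

countGroups :: [Int] -> Int -> [[Int]]
countGroups ls len = go 0 (group (sort ls))
  where
    -- If this guard fails, matching falls through to the clauses below.
    go n _
      | n >= len = []                      -- counted as high as requested
    go n [] = replicate (len - n) []       -- no groups left: the rest are empty
    go _ ([] : _) = error "group never produces empty lists"
    go n xls@(xl@(x:_) : xls')
      | n == x    = xl : go (n + 1) xls'
      | otherwise = [] : go (n + 1) xls
```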

Cont Monad breaks laziness in Haskell

I was trying out the Cont monad and discovered the following problem.
First, construct an infinite list and lift all of its elements into the Cont monad.
Then use sequence to get a single Cont computation over the infinite list.
When we try to run that computation, with head for example, it falls into an infinite loop while trying to expand the continuation, and head is never called.
The code looks like this:
let inff = map (return :: a -> Cont r a) [0..]
let seqf = sequence inff
runCont seqf head
So is this a limitation of the Cont monad implementation in Haskell?
If so, how do we improve this?
The reason is that even though the value of the head element of sequence someList depends only on the first element of someList, the effect of sequence someList can generally depend on all the effects of someList (and it does for most monads). Therefore, if we want to evaluate the head element, we still need to evaluate all the effects.
For example, if we have a list of Maybe values, the result of sequence someList is Just only if all the elements of someList are Just. So if we try to sequence an infinite list, we'd need to examine its infinite number of elements if they're all Just.
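A small illustration with Maybe (the names are hypothetical): in both cases the would-be head is 1, but only the all-Just list yields a result at all, so the head cannot be reported until every effect has been inspected.

```haskell
-- Every element is Just, so sequence succeeds
allJust :: Maybe [Int]
allJust = sequence [Just 1, Just 2, Just 3]

-- A later Nothing discards the whole result, including its head
lateNothing :: Maybe [Int]
lateNothing = sequence [Just 1, Just 2, Nothing]
```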
The same applies for Cont.
In the continuation monad, we can escape any time from the computation and return a result that is different from what has been computed so far.
Consider the following example:
import Control.Monad.Trans.Cont (callCC, cont, runCont)
test :: (Num a, Enum a) => a
test = flip runCont head $
    callCC $ \esc -> do
        sequence (map return [0..100] ++ [esc [-1]])
or directly using cont instead of callCC:
test' :: (Num a, Enum a) => a
test' = flip runCont head $
    sequence (map return [0..100] ++ [cont (const (-1))])
The result of test is just -1. After processing the elements 0 to 100, the final element can decide to escape all of this and return -1 instead. So in order to see what the head element of sequence someList is in Cont, we again need to compute them all.
This is not a flaw with the Cont monad so much as with sequence. You can get similar results for Either, for example:
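import Control.Monad.Instances () -- only needed with old versions of base; modern GHC has the Monad instance for Either in scope without it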
xs :: [Either a Int]
xs = map Right [0..] -- Note: return = Right, for Either
ys :: Either a [Int]
ys = sequence xs
You can't retrieve any elements of ys until it computes the entire list, which will never happen.
Also, note that: sequence (map f xs) = mapM f xs, so we can simplify this example to:
>>> import Control.Monad.Instances
>>> mapM Right [0..]
<Hangs forever>
There are a few monads where mapM will work on an infinite list of values, specifically the lazy StateT monad and Identity, but they are the exception to the rule.
Generally, mapM/sequence/replicateM (without trailing underscores) are anti-patterns and the correct solution is to use pipes, which allows you to build effectful streams that don't try to compute all the results up front. The beginning of the pipes tutorial describes how to solve this in more detail, but the general rule of thumb is that any time you write something like:
example1 = mapM f xs
example2 = sequence xs
You can transform it into a lazy Producer by just transforming it to:
example1' = each xs >-> Pipes.Prelude.mapM f
example2' = each xs >-> Pipes.Prelude.sequence
Using the above example with Either, you would write:
>>> import Pipes
>>> import qualified Pipes.Prelude
>>> let xs = each [0..] >-> Pipes.Prelude.mapM Right :: Producer Int (Either a) ()
Then you can lazily process the stream without generating all elements:
>>> Pipes.Prelude.any (> 10) xs
Right True
