Two friends, P1 and P2, send the SAME message M to a mutual friend, say P3.
However, due to some network damage, P3 only receives one character at a time, without knowing whether the character received belongs to P1 or P2.
Furthermore, P3 might receive X characters from P1, then Y characters from P2, or vice versa; but whatever the order, P3 will receive ALL the characters that both P1 and P2 sent.
Given the sequence S of characters that P3 received, help him determine the initial message M, which consists only of 0s and 1s.
Note that there might be more than one solution to the problem; getting just one is fine.
Examples:
1) S = [0,1,0,0,1,0] then M = "010"
2) S = [0,0,1,1,0,0,1,1,0,0] then M = "01010" or M = "00110"
To clarify the order and the ownership of each character:
Say M = "cat"; then S might be:
1) [c1,c2,a2,t2,a1,t1]
2) [c1,a1,t1,c2,a2,t2]
3) [c1,c2,a1,a2,t2,t1]
where xi stands for: character x belongs to person i.
Given the fact that P1 and P2 send the same message:
Each of P1 and P2 sends exactly half of the 0s in S
Each of P1 and P2 also sends exactly half of the 1s in S
The length of S will obviously be an even number (twice the length of M)
I first implemented the predicate in Prolog, using a's for 0 and b's for 1, where backtracking is fairly easy, and I applied a constraint that prunes my search tree so that my approach is not a brute-force one:
Prolog code:
countCharacters([],A,B,A,B).
countCharacters([C|T],A,B,X,Y) :- % Count A's per person and B's per person
    (C == a -> A1 is A + 1,countCharacters(T,A1,B,X,Y);
     B1 is B + 1,countCharacters(T,A,B1,X,Y)).
countCharacters(L,A,B) :-
    countCharacters(L,0,0,X,Y),
    A is X / 2,
    B is Y / 2.
rightOrder([],_) :- !.
rightOrder(_,[]) :- !.
rightOrder([C1|_],[C2|_]) :- C1 \= C2,!,false.
rightOrder([C|T1],[C|T2]) :- % Constraint that checks if two lists have the same order
    rightOrder(T1,T2).
determine([],M1,M2,_,_,_,_,M1) :- M1 == M2,!.
determine(L,M1,M2,A1,B1,A2,B2,X) :-
    A1 == 0,
    B1 == 0,
    append(M2,L,NM2),
    rightOrder(M1,NM2),
    determine([],M1,NM2,A1,B1,A2,B2,X).
determine([a|T],M1,M2,A1,B1,A2,B2,X) :-
    A1 > 0,
    NA1 is A1 - 1,
    append(M1,[a],NM1),
    determine(T,NM1,M2,NA1,B1,A2,B2,X).
determine([b|T],M1,M2,A1,B1,A2,B2,X) :-
    B1 > 0,
    NB1 is B1 - 1,
    append(M1,[b],NM1),
    determine(T,NM1,M2,A1,NB1,A2,B2,X).
determine([a|T],M1,M2,A1,B1,A2,B2,X) :-
    A2 > 0,
    NA2 is A2 - 1,
    append(M2,[a],NM2),
    rightOrder(M1,NM2),
    determine(T,M1,NM2,A1,B1,NA2,B2,X).
determine([b|T],M1,M2,A1,B1,A2,B2,X) :-
    B2 > 0,
    NB2 is B2 - 1,
    append(M2,[b],NM2),
    rightOrder(M1,NM2),
    determine(T,M1,NM2,A1,B1,A2,NB2,X).
determine(L,M) :-
    countCharacters(L,AS,BS),
    determine(L,[],[],AS,BS,AS,BS,M).
The code above is not very optimized, as I've only been studying Prolog for a few weeks. However, I need some help or insight on how to implement the same predicate in Haskell, as I have no clue how to backtrack there.
If you need more clarifications let me know.
An inefficient way to do this in Haskell would be with the list monad, which simulates nondeterminism.
One way to arrive at a solution is to consider the problem from the opposite direction: how would you generate the possible ways the message could have been interleaved? Essentially for every element in the output, there will have been a choice between taking it from one sender or the other, or all the remaining elements will come from the same sender if one has run out of elements. Expressed literally:
-- Compute all the possible interleavings of a list with itself.
interleavings :: [a] -> [[a]]
interleavings xs0 = go xs0 xs0
  where
    -- If the first list has run out,
    -- return the remainder of the second.
    go [] rs = pure rs
    -- And vice versa.
    go ls [] = pure ls
    -- If both lists are nonempty:
    go ls@(l : ls') rs@(r : rs') = do
      -- Toss a coin;
      choice <- [False, True]
      case choice of
        -- If tails, take an element from the left sender
        -- and prepend it to all possible remaining interleavings.
        False -> fmap (l :) (go ls' rs)
        -- If heads, take from the right sender.
        True -> fmap (r :) (go ls rs')
Note that this generates many duplicate entries, since it doesn’t backtrack or prune:
> interleavings "10"
["1010","1100","1100","1100","1100","1010"]
However, it does point the way to the start of a solution. You want to run the above process in reverse: given an interleaving, generate a series of choices, assume that each element came from the chosen sender, and keep track of the deinterleaved lists. If they're equal at the end, then they represent a valid deinterleaving:
import Control.Monad (guard)
-- The possible deinterleavings of a list
-- whose elements can be compared for equality.
deinterleavings :: (Eq a) => [a] -> [[a]]
-- Begin searching assuming no elements have been sent by either sender.
deinterleavings xs0 = go [] [] xs0
  where
    -- If there is an element remaining:
    go ls rs (x : xs) = do
      -- Toss a coin;
      choice <- [False, True]
      case choice of
        -- If tails, assume it came from the left sender and proceed.
        -- (Note that this accumulates in reverse, adding to the head.)
        False -> go (x : ls) rs xs
        -- If heads, assume the right sender.
        True -> go ls (x : rs) xs
    -- If there are no elements remaining:
    go ls rs [] = do
      -- Require that the accumulated messages be identical.
      guard (ls == rs)
      -- Return the (de-reversed) message.
      pure (reverse ls)
Again this is extremely inefficient:
> deinterleavings "0011001100"
["00110","00110","01100","01010","01010","01010","01010","01010","01010","01010","01010","01010","01010","01010","01010","01010","01010","01010","01010","01100","01100","01010","01010","01010","01010","01010","01010","01010","01010","01010","01010","01010","01010","01010","01010","01010","01010","01100","00110","00110"]
But I hope it illustrates the general structure of a solution that you can improve upon.
Consider how you could introduce guards earlier, or accumulate elements differently to prune the search; or use a different monad that does backtracking like Logic; or maintain a stateful set of results with State (or even IO) so that you can check during the computation which results you’ve already seen. Also consider how you could approach the problem from another angle entirely, based on the fact that the interleaved message contains the same string twice as subsequences, since there are standard efficient memoised algorithms for the “longest common subsequence” and “longest repeating subsequence”.
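As one way to introduce the checks earlier, here is a sketch (not part of the original answer) that prunes while reading. Instead of two full accumulators it keeps only the queue of characters the leading sender has sent but the other sender has not yet matched, so a character assigned to the trailing sender must equal the front of that queue, and a branch dies as soon as the queue is longer than the input left. When both senders are level, the next character is given to the leader only, which removes the mirror-image branches of the coin toss. The name deinterleavingsPruned is hypothetical, and the queue uses Data.Sequence from the containers package. It is still exponential in the worst case and can still repeat answers (for example, "0000" yields "00" twice), but take 1 gets a single answer lazily, which is all the question asks for.

```haskell
import Data.Sequence (ViewL (..), viewl, (|>))
import qualified Data.Sequence as Seq

-- Possible messages M for a received sequence S (hypothetical name).
deinterleavingsPruned :: Eq a => [a] -> [[a]]
deinterleavingsPruned xs0 = go [] Seq.empty (length xs0) xs0
  where
    -- msg:     characters sent by the leading sender so far (reversed); this is M
    -- pending: leader's characters that the trailing sender has not matched yet
    -- n:       number of characters still to read
    go msg pending _ [] = [reverse msg | Seq.null pending]
    go msg pending n (x : xs)
      -- Prune: the trailing sender can no longer catch up.
      | Seq.length pending > n = []
      | otherwise = fromLeader ++ fromFollower
      where
        -- Assume x came from the leader: it joins M and the queue.
        fromLeader = go (x : msg) (pending |> x) (n - 1) xs
        -- Assume x came from the trailing sender: it must match the queue front.
        -- With an empty queue both senders are level, so only the leader is tried.
        fromFollower = case viewl pending of
          p :< rest | p == x -> go msg rest (n - 1) xs
          _ -> []
```

For the first example, take 1 (deinterleavingsPruned "010010") evaluates to ["010"].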
Related
To improve my Haskell skills, I'm trying to solve the Advent of Code 2018. As expected, I am already stuck on day 1, specifically on part 2:
--- Part Two ---
You notice that the device repeats the same frequency change list over
and over.
To calibrate the device, you need to find the first frequency it
reaches twice.
For example, using the same list of changes above, the device would
loop as follows:
Current frequency 0, change of +1; resulting frequency 1.
Current frequency 1, change of -2; resulting frequency -1.
Current frequency -1, change of +3; resulting frequency 2.
Current frequency 2, change of +1; resulting frequency 3.
(At this point, the device continues from the start of the list.)
Current frequency 3, change of +1; resulting frequency 4.
Current frequency 4, change of -2; resulting frequency 2, which has
already been seen.
In this example, the first frequency reached twice is 2. Note that
your device might need to repeat its list of frequency changes many
times before a duplicate frequency is found, and that duplicates might
be found while in the middle of processing the list.
Here are other examples:
+1, -1 first reaches 0 twice.
+3, +3, +4, -2, -4 first reaches 10 twice.
-6, +3, +8, +5, -6 first reaches 5 twice.
+7, +7, -2, -7, -4 first reaches 14 twice.
What is the first frequency your device reaches twice?
Basically, I have a very large list vals::[Int] that includes all the frequency changes mentioned above.
Here is the function I wrote for solving this problem:
-- [1] The list of frequency changes
-- [2] The first repeat frequency
--             [1]      [2]
part2helper :: [Int] -> Int
part2helper ds = go ds []
  where go ds [] = go ds [0]
        go (d:ds) [f] = go ds $ (f+d):[f]
        go (d:ds) (f:fs) =
          if f `elem` fs then f
          else go ds $ (d+f):f:fs
I test this function with the values provided in the description in ghci:
*Main> part2helper (cycle [1, -2, 3, 1])
2
*Main> part2helper (cycle [1, -1])
0
*Main> part2helper (cycle [3, 3, 4, -2, -4])
10
*Main> part2helper (cycle [7, 7, -2, -7, -4])
14
*Main>
All results are correct, so I assume my function works correctly. The problem now is that when I compile this into a program that reads the input list from a file, the program never terminates. Here's the code:
module Main where
import System.Environment
main = do
  [input] <- getArgs
  s <- readFile input
  let n = lines $ s
      vals = map (read::String->Int) $ fmap (filter (/='+')) n
      sol = part2helper (cycle vals)
  print sol
-- [1] The list of frequency changes
-- [2] The first repeat frequency
--             [1]      [2]
part2helper :: [Int] -> Int
part2helper ds = go ds []
  where go ds [] = go ds [0]
        go (d:ds) [f] = go ds $ (f+d):[f]
        go (d:ds) (f:fs) =
          if f `elem` fs then f
          else go ds $ (d+f):f:fs
This builds with GHC correctly, but as I said, never terminates and prints no result. What am I missing? The input file can be found here.
You're trying to put everything together in a single function. It's much better if you work in a modular fashion, breaking the problem into smaller ones.
Here's an idea:
generate the sequence of frequencies,
f0, f1, f2...
generate the sequence of cumulative sets of frequencies
{}, {f0}, {f0,f1}, {f0,f1,f2}...
check repeated insertions, i.e.
fi such that fi ∈ {f0..fi-1}
To make the last point clearer, consider
f0, f1, f2, f3...
{}, {f0}, {f0,f1}, {f0,f1,f2}...
If f3 were a repetition, then f3 ∈ {f0,f1,f2}.
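This may seem terribly inefficient, but because Haskell is lazy, these lists will only be generated as needed.
We'll need to import the set operations we use, plus find for searching a list (importing all of Data.Set unqualified would clash with Prelude's map and filter in main):
import Data.Set (Set, empty, insert, member)
import Data.List (find)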
Generating the frequencies from the first frequency and a list of frequency changes can be done via scanl (+). The expression scanl (+) x xs combines the elements of xs with the operator +, starting at x, producing the list of cumulative sums.
freqs :: Int -> [Int] -> [Int]
freqs = scanl (+)
Now we can generate the list of sets. Here too we use a scanl. In each step we insert a new frequency, and we start with the empty set.
sets :: [Int] -> [Set Int]
sets = scanl (\s x -> insert x s) (empty)
Once we have the frequencies and the sets, we are pretty much done. The main function just puts everything together: it combines both lists, finds the first pair (fi, {f0,...,fi-1}) such that fi ∈ {f0,...,fi-1}, and returns the corresponding fi.
result :: Int -> [Int] -> Maybe Int
-- member is an O(log n) lookup; elem on a Set (via Foldable) may scan every element
result x xs = let fs = freqs x xs
                  ss = sets fs
                  r = find (\(y,s) -> y `member` s) (zip fs ss)
              in fmap fst r
Note that find returns a Maybe (Int, Set Int): either Nothing, or Just (x,s) for some frequency x that was already in s. We use fmap fst to turn Just (x,s) into Just x.
EDIT
Once you've got things working, you may wish to optimize a few things or play around with your style. The following is a more succinct, and possibly slightly more efficient, version.
The list of frequencies and sets can be built together in one go.
freqsets :: Int -> [Int] -> [(Int, Set Int)]
freqsets f0 = scanl (\(f,s) x -> (f+x,insert f s)) (f0,empty)
And so it's ready to use for the result function. Also we can take advantage of Maybe being a monad to make things a bit more readable.
result :: Int -> [Int] -> Maybe Int
result f0 xs = do (f,_) <- find (\(y,s) -> y `member` s) (freqsets f0 xs)
                  return f
And there you have it, a rather short solution. I like the change in the result function. I like the do notation, as well as not having it calculate the zipping of the two previous lists. I'm not so sure if "fusing" the building of both lists is worth it. It's a bit less readable. Using three functions, one for frequencies, one for sets, and one for zipping, might be best.
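For reference, here is the fused version assembled into a self-contained sketch (firstRepeat is a hypothetical name). It uses qualified Set operations so nothing clashes with Prelude in the question's main, and Set.member rather than elem, because the Foldable elem on a Set only has Eq to work with and may walk the whole set. Treating an empty change list as Nothing is an assumption (cycle [] would throw), and like the original it loops forever if no frequency ever repeats, for example with [1].

```haskell
import Data.List (find)
import Data.Set (Set)
import qualified Data.Set as Set

-- Each frequency paired with the set of frequencies reached before it.
freqsets :: Int -> [Int] -> [(Int, Set Int)]
freqsets f0 = scanl step (f0, Set.empty)
  where
    step (f, seen) x = (f + x, Set.insert f seen)

-- First frequency reached twice when the change list is repeated forever.
-- Assumption: an empty change list gives Nothing instead of crashing in cycle.
firstRepeat :: [Int] -> Maybe Int
firstRepeat [] = Nothing
firstRepeat changes =
  fst <$> find (\(f, seen) -> f `Set.member` seen) (freqsets 0 (cycle changes))
```

In the question's main, print (firstRepeat vals) would take the place of print sol.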
I am trying to get the following output for a given input:
> halve [1,2,3,4,5,6]
([1,2,3],[4,5,6])
I have solved this problem using this approach:
halve xs = ((take s xs), (drop s xs))
  where
    s = (length xs) `div` 2
I am a beginner in Haskell, and I want to learn how to solve this problem using pattern matching. Thanks
You can make use of a variant of the hare and tortoise algorithm. This algorithm basically runs over the list with two iterators: the hare taking two hops at a time, and the tortoise performing one hop at that time.
When the hare reaches the end of the list, we know that the tortoise is halfway, and thus we can split the list in half: the list seen thus far is the first half, and the list still to enumerate over is the second half.
An algorithm thus looks like:
half :: [a] -> ([a], [a])
half h = go h h
  where go (_:(_:hs)) (t:ts) = (..., ...)
          where (a, b) = go ...
        go _ (t:ts) = (..., ...)
        go _ [] = (..., ...)
with the ... parts still to fill in.
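Spoiler: one possible way to fill in the ... parts, as a sketch that keeps the three clauses of the skeleton. The first argument of go is the hare and the second the tortoise; on odd lengths it puts the extra element in the second half, the same as the take/drop version.

```haskell
half :: [a] -> ([a], [a])
half h = go h h
  where
    -- The hare can still take two hops, so the tortoise's current
    -- element belongs to the first half.
    go (_ : (_ : hs)) (t : ts) = (t : a, b)
      where
        (a, b) = go hs ts
    -- The hare cannot take two hops any more: the tortoise is at the
    -- midpoint, and everything from here on is the second half.
    go _ (t : ts) = ([], t : ts)
    -- Only reachable for an empty input list.
    go _ [] = ([], [])
```

For example, half [1,2,3,4,5,6] gives ([1,2,3],[4,5,6]), matching halve.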
I tried to implement a function that takes a limit and a string, parses the string and tests whether the parsed number exceeds the limit. The function seems to work only for strings without 0s in them, like "123"; it does not parse strings like "100" correctly, returning 1 instead.
What caused this problem?
Below is the code.
import Control.Monad (guard)
import Data.Char (digitToInt, isDigit)
reachBounded :: Int -> String -> Maybe Int
reachBounded limit str = case str of
  "" -> Nothing
  "0" -> Just 0
  _ -> foldr (\digit s -> do
                 sum <- s
                 let n = sum * 10 + digitToInt digit
                 guard (isDigit digit)
                 guard (n <= limit)
                 return n)
             (Just 0) str
Moreover, is there any way to debug this code like we normally do in imperative languages? I found the GHCi debugger only able to print the type, not the value.
This is a very imperative way of solving the problem, and if you keep thinking like that you're going to have difficulties moving forward.
Here's how you might want to re-think the problem:
Replace "I have a list of characters, but I want digits, I'll iterate and replace them one by one" with "I have a list of characters but I want digits, I'll just replace them all at once" (I'm going to assume you want to actually parse the string yourself fully manually rather than just using read or some kind of parsing tool)
So far we have:
reachBounded limit str = ... map digitToInt str
Next, you want to turn these digits into a number. Replace "I want to iterate through this list, incrementing a sum" with "I need to know the place value of each digit". We can do this by reversing the digits and multiplying them pairwise with the list [1,10,100,1000...]. We can produce the place value list by mapping (10^) over the list of non-negative integers, or by declaring that each element is 10 times the previous, starting with 1. Let's use the latter:
reachBounded limit str = ... zipWith (*) (iterate (*10) 1) $ reverse $ map digitToInt str
And we want the sum of these place values:
reachBounded limit str = ... where
  val = sum $ zipWith (*) (iterate (*10) 1) $ reverse $ map digitToInt str
Lastly, we must check if it's within the bound given:
reachBounded limit str = val <$ guard (val < limit) where
  val = sum $ zipWith (*) (iterate (*10) 1) $ reverse $ map digitToInt str
In this case a <$ b will replace the contents of b with a if b is Just something, and leave it alone if b is Nothing.
In terms of debugging, it is now trivial, as it is not some process we need to interrupt, but a series of values that we manipulate to get the desired result. You cannot run part of your process on each step and get a sensible answer, but here we can look at the result produced by any of these stages and see if we are on track.
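Assembled into a compilable sketch, the pipeline looks like this. Two things are assumptions added here rather than part of the answer: it rejects empty and non-digit input, as the question's version does, and it uses <= to match the question's guard (n <= limit) instead of <. The comments show each stage for "100", which is exactly the kind of intermediate value you can evaluate on its own in GHCi.

```haskell
import Control.Monad (guard)
import Data.Char (digitToInt, isDigit)

reachBounded :: Int -> String -> Maybe Int
reachBounded limit str = val <$ guard (validDigits && val <= limit)
  where
    -- Assumption: reject empty and non-digit input (val is never forced then).
    validDigits = not (null str) && all isDigit str
    -- Stages for "100":
    --   map digitToInt "100"              == [1,0,0]
    --   reverse                           == [0,0,1]
    --   zipWith (*) (iterate (*10) 1)     == [0,0,100]
    --   sum                               == 100
    val = sum $ zipWith (*) (iterate (* 10) 1) $ reverse $ map digitToInt str
```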
There isn't a toMaybe :: (a -> Bool) -> a -> Maybe a function. I'm not sure why, but with one and using read, the solution is merely:
bounded l = toMaybe (<l) . read
Or using the Safe library...
bounded l = toMaybe (<l) <=< readMay
This will not throw an exception if the input string does not actually represent a number.
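Since toMaybe is not provided, here is a hypothetical definition built from guard, together with readMaybe from Text.Read in base, which plays the same role as the Safe library's readMay without the extra dependency. This is a sketch; the bounded name and the strict < comparison are taken from the lines above.

```haskell
import Control.Monad (guard, (<=<))
import Text.Read (readMaybe)

-- Hypothetical helper: keep x only if it satisfies p.
toMaybe :: (a -> Bool) -> a -> Maybe a
toMaybe p x = x <$ guard (p x)

-- Parse, then check the bound; both steps can fail with Nothing.
bounded :: Int -> String -> Maybe Int
bounded l = toMaybe (< l) <=< readMaybe
```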
Now, let's say you really do want to write your algorithm iteratively; maybe you need to for performance, or it's just one of those algorithms that doesn't readily admit a declarative implementation (there aren't many of those, though). It's still going to be cleaner to use values instead of exceptions, but you need to stop and look at it sometimes. So what do you do?
Let's write our own iterator function:
data Iter a b c = Next a | Final b | Error c
iterateE :: (a -> Iter a b c) -> a -> ([a], Either c b)
iterateE f = go where
  go x = case f x of
    Next a -> let (list, final) = go a in (x:list, final)
    Final b -> ([x], Right b)
    Error c -> ([x], Left c)
This more directly encapsulates stopping the fold early and tracking the intermediate results - even though you can also just stop folds early and track the intermediate results - this is a simpler way to think about it for now. This will provide you with a complete list of all intermediate states and either a result or error that your iterator function can choose to terminate with.
Transforming your solution into this format...
reachBounded limit str = iterateE iter (Just 0, str) where
  iter (acc, []) = Final acc
  iter (acc, digit : rest) = Next (step, rest) where
    step = do
      total <- acc
      let n = total * 10 + digitToInt digit
      guard (isDigit digit)
      guard (n <= limit)
      return n
We don't announce any error in this code, but this will let us see what has happened at each step, and it also doesn't have a direction in the fold, so you can't get it backwards between left and right.
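As for what caused the original problem: foldr applies the step function starting from the rightmost character, so for "100" the last step to run is the one for '1', which computes 0 * 10 + 1 and gives 1 (and "123" actually comes out as 321). A minimal fix, sketched here, keeps the question's own step but runs it from the left with foldM over Maybe:

```haskell
import Control.Monad (foldM, guard)
import Data.Char (digitToInt, isDigit)

reachBounded :: Int -> String -> Maybe Int
reachBounded _ "" = Nothing
reachBounded limit str = foldM step 0 str
  where
    -- The question's step, accumulator first as foldM expects;
    -- characters are now consumed left to right.
    step acc digit = do
      guard (isDigit digit)
      let n = acc * 10 + digitToInt digit
      guard (n <= limit)
      pure n
```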
I want to calculate the "e" constant using Haskell's (Prelude) built-in until function. I want to do something like this:
enumber = until (>2.7) iter (1 0)
iter x k = x + (1/(fact (k + 1)))
fact k = foldr (*) 1 [1..k]
When I try to run this code, I get this error:
Occurs check: cannot construct the infinite type: a ~ a -> a
Expected type: (a -> a) -> a -> a
Actual type: a -> a -> a
Relevant bindings include enumber :: a -> a (bound at Lab2.hs:65:1)
In the second argument of ‘until’, namely ‘iter’
In the expression: until (> 2.7) iter (1 0)
By "e" I mean e = 2.71828..
The concrete mistake that causes this error is the notation (1 0). This doesn't make any sense in Haskell, it is parsed such that 1 is a function which is applied to 0, and the result then used. You apparently mean to pass both 1 and 0 as (initial) arguments. That's what we have tuples for, written (1,0).
Now, before trying to write any definitions, we should make clear what types we need and write them out. Always start with your type signatures; they guide you a lot in how the actual definitions should look!
enumber :: Double -- could also be a polymorphic number type, but let's keep it simple.
type Index = Double -- this should, perhaps, actually be an integer, but again for simplicity use only `Double`
fact :: Index -> Double
Now, if you want to do something like enumber = until (>2.7) iter (1,0), then iter would need to both add up the series expansion and increment the k index (until knows nothing about indices), i.e. something like
iter :: (Double, Index) -> (Double, Index)
But right now your iter has a signature more like
iter :: Double -> Index -> Double
i.e. it does not do the index-incrementing. Also, it's curried, i.e. doesn't accept the arguments as a tuple.
Let's try to work with a tuple signature:
iter :: (Double, Index) -> (Double, Index)
iter (x,k) = ( x + 1/(fact (k + 1)), k+1 )
If you want to use this with until, you have the problem that you're always working with tuples, not just with the accumulated results. You need to throw away the index, both in the termination condition and in the final result: this can easily be done with the fst function
enumber = fst $ until ((>2.7) . fst) iter (1,0)
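Assembled into one unit with the question's fact, the tuple version compiles and runs as sketched below. Note where it stops: the first partial sum above 2.7 is 1 + 1 + 1/2 + 1/6 + 1/24 = 65/24 ≈ 2.7083.

```haskell
type Index = Double

-- The question's fact, unchanged.
fact :: Index -> Double
fact k = foldr (*) 1 [1 .. k]

-- Add the next term 1/(k+1)! and advance the index.
iter :: (Double, Index) -> (Double, Index)
iter (x, k) = (x + 1 / fact (k + 1), k + 1)

-- Starts from the 1/0! term and stops at the first partial sum above 2.7.
enumber :: Double
enumber = fst $ until ((> 2.7) . fst) iter (1, 0)
```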
Now, while this version of the code will type-check, it's neither elegant nor efficient nor accurate (being greater than 2.7 is hardly a meaningful condition here...). As chi remarks, a good way of summing up stuff is the scanl function.
Apart from avoiding manually incrementing and passing around an index, you should also avoid calculating the entire factorial over and over again. Doing that is a pretty general code smell (there's a reason fact isn't defined in the standard libraries).
recipFacts :: [Double] -- Infinite list of reciprocal factorials, starting from 1/0!
recipFacts = go 1
  where go k = 1 : map (/k) (go (k+1))
Incidentally, this can also be written as a scan: scanl (/) 1 [1..] (courtesy of Will Ness).
Next we can use scanl to calculate the partial sums, and use some termination condition. However, because the series converges so quickly, there's actually a hack that works fine and is even simpler:
enumber :: Double
enumber = sum $ takeWhile (>0) recipFacts
-- result: 2.7182818284590455
Here I've used the fact that the fast-growing factorial quickly causes the floating-point reciprocals to underflow to zero.
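For comparison, the scanl route with an explicit termination condition could look like the following sketch: take the partial sums and stop at the first one that adding the next term no longer changes. The name partialSums is hypothetical; recipFacts uses the scanl form mentioned above.

```haskell
-- Reciprocal factorials 1/0!, 1/1!, 1/2!, ...
recipFacts :: [Double]
recipFacts = scanl (/) 1 [1 ..]

-- Running sums of the series for e.
partialSums :: [Double]
partialSums = scanl1 (+) recipFacts

-- First partial sum that stays the same when the next term is added.
enumber :: Double
enumber = case dropWhile (uncurry (/=)) (zip partialSums (drop 1 partialSums)) of
  (s, _) : _ -> s
  [] -> error "unreachable: partialSums is infinite"
```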
Of course, really there's not a need to sum anything up yourself at all here: the most to-the-point definition is
enumber = exp 1
and nothing else.
enumber = until (>2.7) iter (1 0)
--                          ^^^^^
Above you are applying "function" 1 to argument 0. This can't work.
You may want to use a pair (1, 0) instead. In that case, note that iter must be changed to accept and return a pair. The predicate (>2.7) must also be adapted to pairs.
If you don't want to use pairs, you need a different approach. Look up the scanl function, which you can use to compute partial sums. Then, you can use dropWhile to discard partial sums until some good-enough predicate is satisfied.
An example: the first ten partial sums of n^2.
> take 10 $ scanl (+) 0 [ n^2 | n<-[1..] ]
[0,1,5,14,30,55,91,140,204,285]
Note that this approach works only if you compute all the list elements independently. If you want to reuse some computed value from one element to another, you need something else. E.g.
> take 10 $ snd $ mapAccumL (\(s,p) x -> ((s+p,p*2),s+p)) (0,1) [1..]
[1,3,7,15,31,63,127,255,511,1023]
Dissected, in mapAccumL (\(s,p) x -> ((s+p,p*2),s+p)) (0,1) [1..]:
s: previous sum
p: previous power of two
x: current element of [1..]
s+p (first component of the new state): next sum
p*2: next power of two
s+p (second component of the result): element in the generated list
0 (in (0,1)): first sum
1 (in (0,1)): first power of two
Still, I am not a big fan of mapAccumL. Using iterate and pairs looks nicer.
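The iterate-and-pairs version of the same sequence, as a sketch (sumsOfPowers is a hypothetical name): the state is the (sum, power) pair, and drop 1 skips the initial (0,1) state.

```haskell
-- 1, 3, 7, 15, ...: running sums of powers of two.
sumsOfPowers :: [Integer]
sumsOfPowers = map fst (drop 1 (iterate step (0, 1)))
  where
    -- Next sum and next power of two.
    step (s, p) = (s + p, p * 2)
```

take 10 sumsOfPowers gives the same [1,3,7,15,31,63,127,255,511,1023] as the mapAccumL version.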
I want to write a program that takes a list of Ints and a length, and returns a list whose element at position i contains all the input elements equal to i. For example:
[0,0,0,1,3,5,3,2,2,4,4,4] 6 -> [[0,0,0],[1],[2,2],[3,3],[4,4,4],[5]]
[0,0,4] 7 -> [[0,0],[],[],[],[4],[],[]]
[] 3 -> [[],[],[]]
[2,2] 3 -> [[],[],[2,2]]
So, here's my solution:
import Data.List
import Data.Function
f :: [Int] -> Int -> [[Int]]
f ls len = g 0 ls' [] where
  ls' = group . sort $ ls
  g :: Int -> [[Int]] -> [[Int]] -> [[Int]]
  g val [] accum
    | len == val = accum
    | otherwise = g (val+1) [] (accum ++ [[]])
  g val (x:xs) accum
    | len == val = accum
    | val == head x = g (val+1) xs (accum ++ [x])
    | otherwise = g (val+1) (x:xs) (accum ++ [[]])
But the query f [] 1000000 takes a really long time. Why?
I see we're accumulating over some data structure. I think foldMap. I ask "Which Monoid"? It's some kind of lists of accumulations. Like this
newtype Bunch x = Bunch {bunch :: [x]}
instance Semigroup x => Semigroup (Bunch x) where
  Bunch xss <> Bunch yss = Bunch (glom xss yss) where
    glom [] yss = yss
    glom xss [] = xss
    glom (xs : xss) (ys : yss) = (xs <> ys) : glom xss yss
instance Semigroup x => Monoid (Bunch x) where
  mempty = Bunch []
Our underlying elements have some associative operator <>, and we can thus apply that operator pointwise to a pair of lists, just like zipWith does, except that when we run out of one of the lists, we don't truncate, rather we just take the other. Note that Bunch is a name I'm introducing for purposes of this answer, but it's not that unusual a thing to want. I'm sure I've used it before and will again.
If we can translate
0 -> Bunch [[0]] -- single 0 in place 0
1 -> Bunch [[],[1]] -- single 1 in place 1
2 -> Bunch [[],[],[2]] -- single 2 in place 2
3 -> Bunch [[],[],[],[3]] -- single 3 in place 3
...
and foldMap across the input, then we'll get the right number of each in each place. There should be no need for an upper bound on the numbers in the input to get a sensible output, as long as you are willing to interpret [] as "the rest is silence". Otherwise, like Procrustes, you can pad or chop to the length you need.
Note, by the way, that when mappend's first argument comes from our translation, we do a bunch of ([]++) operations, a.k.a. ids, then a single ([i]++), a.k.a. (i:), so if foldMap is right-nested (which it is for lists), then we will always be doing cheap operations at the left end of our lists.
Now, as the question works with lists, we might want to introduce the Bunch structure only when it's useful. That's what Control.Newtype is for. We just need to tell it about Bunch.
instance Newtype (Bunch x) [x] where
  pack = Bunch
  unpack = bunch
And then it's
groupInts :: [Int] -> [[Int]]
groupInts = ala' Bunch foldMap (basis !!) where
  basis = ala' Bunch foldMap id [iterate ([]:) [], [[[i]] | i <- [0..]]]
What? Well, without going to town on what ala' is in general, its impact here is as follows:
ala' Bunch foldMap f = bunch . foldMap (Bunch . f)
meaning that, although f is a function to lists, we accumulate as if f were a function to Bunches: the role of ala' is to insert the correct pack and unpack operations to make that just happen.
We need (basis !!) :: Int -> [[Int]] to be our translation. Hence basis :: [[[Int]]] is the list of images of our translation, computed on demand at most once each (i.e., the translation, memoized).
For this basis, observe that we need these two infinite lists
[ [] [ [[0]]
, [[]] , [[1]]
, [[],[]] , [[2]]
, [[],[],[]] , [[3]]
... ...
combined Bunchwise. As both lists have the same length (infinity), I could also have written
basis = zipWith (++) (iterate ([]:) []) [[[i]] | i <- [0..]]
but I thought it was worth observing that this also is an example of Bunch structure.
Of course, it's very nice when something like accumArray hands you exactly the sort of accumulation you need, neatly packaging a bunch of grungy behind-the-scenes mutation. But the general recipe for an accumulation is to think "What's the Monoid?" and "What do I do with each element?". That's what foldMap asks you.
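If you would rather not depend on the newtype package, the ala' equation above can be spelled out directly. This sketch uses a Semigroup instance (GHC 8.4 and later require Semigroup as Monoid's superclass), assumes non-negative inputs (basis !! fails on negative indices), and adds a hypothetical groupIntsN that pads or chops to the question's length:

```haskell
newtype Bunch x = Bunch {bunch :: [x]}

-- Pointwise <>, keeping the tail of the longer list instead of truncating.
instance Semigroup x => Semigroup (Bunch x) where
  Bunch xss <> Bunch yss = Bunch (glom xss yss)
    where
      glom [] rs = rs
      glom ls [] = ls
      glom (l : ls) (r : rs) = (l <> r) : glom ls rs

instance Semigroup x => Monoid (Bunch x) where
  mempty = Bunch []

-- ala' Bunch foldMap f, written out as bunch . foldMap (Bunch . f).
groupInts :: [Int] -> [[Int]]
groupInts = bunch . foldMap (Bunch . (basis !!))
  where
    -- basis !! i == replicate i [] ++ [[i]], computed at most once each.
    basis :: [[[Int]]]
    basis = zipWith (++) (iterate ([] :) []) [[[i]] | i <- [0 ..]]

-- Hypothetical: pad with [] or chop to exactly len slots.
groupIntsN :: [Int] -> Int -> [[Int]]
groupIntsN xs len = take len (groupInts xs ++ repeat [])
```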
The (++) operator copies the left-hand list. For this reason, adding to the beginning of a list is quite fast, but adding to the end of a list is very slow.
In summary, avoid adding things to the end of a list. Try to always add to the beginning instead. One simple way to do that is to build the list backwards, and then reverse it at the end. A more devious trick is to use "difference lists" (Google it). Another possibility is to use Data.Sequence rather than a list.
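Applied to the question's g, building the accumulator backwards could look like the sketch below: each step conses onto acc in O(1) and a single reverse runs at the end, so f [] 1000000 becomes linear instead of quadratic. Pattern matching replaces head, and non-negative inputs and len are assumed, as in the question.

```haskell
import Data.List (group, sort)

f :: [Int] -> Int -> [[Int]]
f ls len = g 0 (group (sort ls)) []
  where
    g :: Int -> [[Int]] -> [[Int]] -> [[Int]]
    g val groups acc
      | val == len = reverse acc
      | otherwise = case groups of
          -- The next group holds the value val: it fills this slot.
          grp@(v : _) : rest | v == val -> g (val + 1) rest (grp : acc)
          -- Otherwise this slot is empty.
          _ -> g (val + 1) groups ([] : acc)
```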
The first thing to note is that the most obvious way to implement this is to use a data structure that allows random access; an array is an obvious choice. Note that you need to add elements to the array multiple times and somehow "join them".
accumArray is perfect for this.
So we get:
f l i = elems $ accumArray (\l e -> e:l) [] (0,i-1) (map (\e -> (e,e)) l)
And we're good to go (see full code here).
This approach does involve converting the final array back into a list, but that step is very likely faster than say sorting the list, which often involves scanning the list at least a few times for a list of decent size.
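A self-contained version of the accumArray one-liner, as a sketch. accumArray throws on an index outside the bounds, so this version additionally filters out values outside [0, len-1]; that filtering is an assumption about the desired behaviour, not part of the answer.

```haskell
import Data.Array (accumArray, elems)

f :: [Int] -> Int -> [[Int]]
f xs len = elems (accumArray (flip (:)) [] (0, len - 1) pairs)
  where
    -- Each value is its own index; out-of-range values are dropped.
    pairs = [(x, x) | x <- xs, 0 <= x, x < len]
```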
Whenever you use ++, the entire left-hand list has to be copied, since lists are immutable.
A simple solution would be to use :, but that builds a reversed list. However that can be fixed using reverse, which results in only building two lists (instead of 1 million in your case).
Your concept of glomming things onto an accumulator is a very useful one, and both MathematicalOrchid and Guvante show how you can use that concept reasonably efficiently. But in this case, there is a simpler approach that is likely also faster. You started with
group . sort $ ls
and this was a very good place to start! You get a list that's almost the one you want, except that you need to fill in some blanks. How can we figure those out? The simplest way, though probably not quite the most efficient, is to work with a list of all the numbers you want to count up to: [0 .. len-1].
So we start with
f ls len = g [0 .. len-1] (group . sort $ ls)
  where
    ?
How do we define g? By pattern matching!
f ls len = g [0 .. len-1] (group . sort $ ls)
  where
    -- We may or may not have some lists left,
    -- but we counted as high as we decided we
    -- would
    g [] _ = []
    -- We have no lists left, so the rest of the
    -- numbers are not represented
    g ns [] = map (const []) ns
    -- This shouldn't be possible, because group
    -- doesn't make empty lists.
    g _ ([]:_) = error "group isn't working!"
    -- Finally, we have some work to do!
    g (n:ns) xls@(xl@(x:_):xls')
      | n == x = xl : g ns xls'
      | otherwise = [] : g ns xls
That was nice, but making the list of numbers isn't free, so you might be wondering how you can optimize it. One method I invite you to try is using your original technique of keeping a separate counter, but following this same sort of structure.
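Spoiler: one possible answer to that exercise, as a sketch with the same clauses as g but a counter n in place of the list [0 .. len-1]. Like the version above, it assumes non-negative inputs.

```haskell
import Data.List (group, sort)

f :: [Int] -> Int -> [[Int]]
f ls len = g 0 (group (sort ls))
  where
    -- We counted as high as we decided we would.
    g n _ | n >= len = []
    -- No lists left: the remaining numbers are not represented.
    g n [] = replicate (len - n) []
    -- group never produces empty lists.
    g _ ([] : _) = error "group isn't working!"
    -- The next group either fills slot n or slot n stays empty.
    g n xls@(xl@(x : _) : xls')
      | n == x = xl : g (n + 1) xls'
      | otherwise = [] : g (n + 1) xls
```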