I am struggling to translate this piece of a matrix multiplication in F# into Haskell (please ignore the parallel component):
Parallel.For(0, rowsA, (fun i->
for j = 0 to colsB - 1 do
for k = 0 to colsA - 1 do
result.[i,j] <- result.[i,j] + a.[i,k] * b.[k,j]))
|> ignore
All I managed to put together is
sum (map (\(i, j, k) -> (my.read (a,i,k)) * (my.read (b, k, j))) [ (i, j, k) | i <- [0..rowsA], j <- [0..colsB], k <- [0..colsA] ])
--my.read reads the values of the respective cells from 'my' database
The intention is to read the cells of matrix a and matrix b from my database and do a matrix multiplication that can eventually be carried out in portions by different agents. This is controlled by setting the boundaries for i, j and k, but that is not relevant here.
I have tried to translate the above F# sample into Haskell. The issue I am struggling with is that my version computes a single sum over everything, whereas there should be one result per position i, j (F# result.[i,j], the cell in the result matrix). I do not see how I could emit the right result for each (i, j). Maybe I must take this further apart?
What exactly is the original code doing? Also, what is the type signature of my.read? I assume it has a signature similar to Num b => (a, Int, Int) -> IO b, in which case this code will not even compile. If my.read (written myRead below, since a dot is not valid in a Haskell identifier) is in the IO monad, then you could write it as:
myfunc = do
    let indices = [(i, j, k) | i <- [0..rowsA],
                               j <- [0..colsB],
                               k <- [0..colsA]]
    -- Since myRead returns a value in the IO monad,
    -- we can't just multiply the values returned.
    r1 <- mapM (\(i, j, k) -> myRead (a, i, k)) indices
    r2 <- mapM (\(i, j, k) -> myRead (b, k, j)) indices
    -- We can multiply r1 and r2 together though,
    -- since they are values extracted from the IO monad
    return $ sum $ zipWith (*) r1 r2
The best advice I can give you right now is to use ghci to figure out your types.
Try to divide
idxs :: [(Int, Int, Int)]
idxs = [ (i, j, k) | i <- [0..rowsA], j <- [0..colsB], k <- [0..colsA] ]
into
idxss :: [[(Int, Int, Int)]]
idxss = [ [ (i, j, k) | k <- [0..colsA]] | i <- [0..rowsA], j <- [0..colsB] ]
so that you have one inner list per (i, j) cell of the result matrix.
The list of sums is then
listSum = map sum $ map (map (\(i,j,k) -> my_read (a,i,k) * my_read (b,k,j))) idxss
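If you want the nested shape of the F# result matrix rather than a flat list, keep the i and j dimensions as outer comprehensions and only sum over k. A sketch reusing the question's names (my_read is assumed pure and to return a Double here; note also that the F# loops are exclusive of their upper bounds, so the ranges end at rowsA-1, colsB-1 and colsA-1):
resultMatrix :: [[Double]]
resultMatrix = [ [ sum [ my_read (a, i, k) * my_read (b, k, j) | k <- [0..colsA-1] ]
                 | j <- [0..colsB-1] ]
               | i <- [0..rowsA-1] ]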
Often you want the performance of arrays over linked lists while not conforming to the requirement of rectangular arrays.
As an example, consider a hexagonal grid, here shown with the 1-distance neighbors of cell (3, 3) in medium gray and the 2-distance neighbors in light gray.
Say we want an array that contains, for each cell, the indices of every 1- and 2-distance neighbor for that cell. One slight issue is that cells have a different number of X-distance neighbors -- cells on the grid border have fewer neighbors than cells closer to the grid center.
(We want an array of neighbor indices --- instead of a function from cell coordinates to neighbor indices --- for performance reasons.)
We can work around this problem by keeping track of how many neighbors each cell has. Say you have an array
neighbors2 of size R x C x N x 2, where R is the number of grid rows, C the number of columns, and N the maximum number of 2-distance neighbors for any cell in the grid.
Then, by keeping an additional array n_neighbors2 of size R x C, we can keep track of which indices in neighbors2 are populated and which are just zero padding. For example, to retrieve the 2-distance neighbors of cell (2, 5), we simply index into the array as such:
someNeigh = neighbors2[2, 5, 0..n_neighbors2[2, 5], ..]
someNeigh will be a n_neighbors2[2, 5] x 2 array (or view) of indices, where someNeigh[0, 0] yields the row of the first neighbor, someNeigh[0, 1] yields the column of the first neighbor, and so forth.
Note that the elements at the positions
neighbors2[2, 5, n_neighbors2[2, 5]+1.., ..]
are irrelevant; this space is just padding to keep the matrix rectangular.
Provided we have a function for finding the d-distance neighbors for any cell:
import Data.Bits (shift)
(rows, cols) = (7, 7)
type Cell = (Int, Int)
generateNeighs :: Int -> Cell -> [Cell]
generateNeighs d cell1 = [ (row2, col2)
                         | row2 <- [0..rows-1]
                         , col2 <- [0..cols-1]
                         , hexDistance cell1 (row2, col2) == d]

hexDistance :: Cell -> Cell -> Int
hexDistance (r1, c1) (r2, c2) = shift (abs rd + abs (rd + cd) + abs cd) (-1)
  where
    rd = r1 - r2
    cd = c1 - c2
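Incidentally, the maximum number of 2-distance neighbors N needed to size the padded array can be computed from this function directly. A small sketch reusing the definitions above (maxNeighs2 is a name introduced here, not part of the question):
maxNeighs2 :: Int
maxNeighs2 = maximum [ length (generateNeighs 2 (r, c))
                     | r <- [0..rows-1], c <- [0..cols-1] ]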
How can we create the aforementioned arrays neighbors2 and n_neighbors2? Assume we know the maximum number of 2-distance neighbors N beforehand. Then it is possible to modify generateNeighs to always return lists of the same size, as we can fill up the remaining entries with (0, 0). That leaves, as I see it, two problems:
We need a function to populate neighbors2 which operates not on every individual index but on a slice; in our case it should fill one cell at a time.
n_neighbors2 should be populated simultaneously with neighbors2.
A solution is welcome with either repa or accelerate arrays.
Here's your picture skewed 30 degrees to the right:
As you can see your array is actually perfectly rectangular.
The indices of a neighborhood's periphery are easily found as six straight pieces around the chosen center cell, e.g. (imagine n == 2 is the distance of the periphery from the center (i,j) == (3,3) in the picture):
periphery n (i,j) =
  -- 2 (3,3)
  let
    ((i1,j1):ps1) = reverse . take (n+1) . iterate (\(i,j)->(i,j+1)) $ (i-n,j)
    -- ( 1, 3)
    ((i2,j2):ps2) = reverse . take (n+1) . iterate (\(i,j)->(i+1,j)) $ (i1,j1)
    -- ( 1, 5)
    .....
    ps6 = ....... $ (i5,j5)
  in filter isValid (ps6 ++ ... ++ ps2 ++ ps1)
The whole neighborhood is simply
neighborhood n (i,j) = (i,j) : concat [ periphery k (i,j) | k <- [1..n] ]
For each cell/distance combination, simply generate the neighborhood indices on the fly and access your array in O(1) time for each index pair.
Writing out the answer from #WillNess in full, and incorporating the proposal from #leftroundabout to store the indices in a 1D vector instead, we get this:
import qualified Data.Array.Accelerate as A
import Data.Array.Accelerate (Acc, Array, DIM1, DIM2, DIM3, Z(..), (:.)(..), (!), fromList, use)
rows = 7
cols = 7
type Cell = (Int, Int)
(neighs, nNeighs) = generateNeighs
-- Return a vector of indices of cells at distance 'd' or less from the given cell
getNeighs :: Int -> Cell -> Acc (Array DIM1 Cell)
getNeighs d (r,c) = A.take n $ A.drop start neighs
  where
    start = nNeighs ! A.constant (Z :. r :. c :. 0)
    n = nNeighs ! A.constant (Z :. r :. c :. d)
generateNeighs :: (Acc (Array DIM1 Cell), Acc (Array DIM3 Int))
generateNeighs = (neighsArr, nNeighsArr)
  where
    idxs = concat [[(r, c) | c <- [0..cols-1]] | r <- [0..rows-1]]
    (neighsLi, nNeighsLi, n) = foldl inner ([], [], 0) idxs
    neighsArr = use $ fromList (Z :. n) neighsLi
    nNeighsArr = use $ fromList (Z :. rows :. cols :. 5) nNeighsLi
    inner (neighs', nNeighs', n') idx = (neighs' ++ cellNeighs, nNeighs'', n'')
      where
        (cellNeighs, cellNNeighs) = neighborhood idx
        n'' = n' + length cellNeighs
        nNeighs'' = nNeighs' ++ n' : cellNNeighs
neighborhood :: Cell -> ([Cell], [Int])
neighborhood (r,c) = (neighs, nNeighs)
  where
    neighsO = [ periphery d (r,c) | d <- [1..4] ]
    neighs = (r,c) : concat neighsO
    nNeighs = tail $ scanl (+) 1 $ map length neighsO
periphery d (r,c) =
  -- The set of d-distance neighbors forms a hexagon shape. Traverse each of
  -- the sides of this hexagon and gather up the cell indices.
  let
    ps1 = take d . iterate (\(r,c)->(r,c+1)) $ (r-d,c)
    ps2 = take d . iterate (\(r,c)->(r+1,c)) $ (r-d,c+d)
    ps3 = take d . iterate (\(r,c)->(r+1,c-1)) $ (r,c+d)
    ps4 = take d . iterate (\(r,c)->(r,c-1)) $ (r+d,c)
    ps5 = take d . iterate (\(r,c)->(r-1,c)) $ (r+d,c-d)
    ps6 = take d . iterate (\(r,c)->(r-1,c+1)) $ (r,c-d)
  in filter isValid (ps6 ++ ps5 ++ ps4 ++ ps3 ++ ps2 ++ ps1)
isValid :: Cell -> Bool
isValid (r, c)
  | r < 0 || r >= rows = False
  | c < 0 || c >= cols = False
  | otherwise = True
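To actually run this, something along these lines should work (a sketch added here, assuming the accelerate interpreter backend is available; it is not part of the answer above):
import Data.Array.Accelerate.Interpreter (run)

-- Evaluate the neighbor indices of cell (2, 5) up to distance 2
-- with the reference interpreter backend.
main :: IO ()
main = print $ run (getNeighs 2 (2, 5))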
This can be done by using the permute function to fill the neighbors for one cell at a time.
import Data.Bits (shift)
import Data.Array.Accelerate as A
import qualified Prelude as P
import Prelude hiding ((++), (==))
rows = 7
cols = 7
channels = 70
type Cell = (Int, Int)
(neighs, nNeighs) = fillNeighs
getNeighs :: Cell -> Acc (Array DIM1 Cell)
getNeighs (r, c) = A.take (nNeighs ! sh1) $ slice neighs sh2
  where
    sh1 = constant (Z :. r :. c)
    sh2 = constant (Z :. r :. c :. All)
fillNeighs :: (Acc (Array DIM3 Cell), Acc (Array DIM2 Int))
fillNeighs = (neighs2, nNeighs2)
  where
    sh = constant (Z :. rows :. cols :. 18) :: Exp DIM3
    neighZeros = fill sh (lift (0 :: Int, 0 :: Int)) :: Acc (Array DIM3 Cell)
    -- nNeighZeros = fill (constant (Z :. rows :. cols)) 0 :: Acc (Array DIM2 Int)
    (neighs2, nNeighs2li) = foldr inner (neighZeros, []) indices
    nNeighs2 = use $ fromList (Z :. rows :. cols) nNeighs2li
    -- Generate indices by varying column fastest. This assures that fromList, which fills
    -- the array in row-major order, gets nNeighs in the correct order.
    indices = foldr (\r acc -> foldr (\c acc2 -> (r, c):acc2 ) acc [0..cols-1]) [] [0..rows-1]
inner :: Cell
      -> (Acc (Array DIM3 Cell), [Int])
      -> (Acc (Array DIM3 Cell), [Int])
inner cell (neighs, nNeighs) = (newNeighs, n : nNeighs)
  where
    (newNeighs, n) = fillCell cell neighs
-- Given a cell and a 3D array to contain cell neighbors,
-- fill in the neighbors for the given cell
-- and return the number of neighbors filled in
fillCell :: Cell -> Acc (Array DIM3 Cell) -> (Acc (Array DIM3 Cell), Int)
fillCell (r, c) arr = (permute const arr indcomb neighs2arr, nNeighs)
  where
    (ra, ca) = (lift r, lift c) :: (Exp Int, Exp Int)
    neighs2li = generateNeighs 2 (r, c)
    nNeighs = P.length neighs2li
    neighs2arr = use $ fromList (Z :. nNeighs) neighs2li
    -- Traverse the 3rd dimension of the given cell
    indcomb :: Exp DIM1 -> Exp DIM3
    indcomb nsh = index3 ra ca (unindex1 nsh)
generateNeighs :: Int -> Cell -> [Cell]
generateNeighs d cell1 = [ (row2, col2)
                         | row2 <- [0..rows-1]
                         , col2 <- [0..cols-1]
                         , hexDistance cell1 (row2, col2) P.== d]
-- Manhattan distance between two cells in a hexagonal grid with an axial coordinate system
hexDistance :: Cell -> Cell -> Int
hexDistance (r1, c1) (r2, c2) = shift (abs rd + abs (rd + cd) + abs cd) (-1)
  where
    rd = r1 - r2
    cd = c1 - c2
I want to do something like
array ((0,0), (25, 25)) [((i,j), 1) | i <- [0..25], j <- [i..25]]
which, as you can see from the index ranges, is only defined when i <= j. However, when I try to print this out in ghci I get an error, because the array bounds make it try to print elements like (1,0):
((1,0),*** Exception: (Array.!): undefined array element
I could just have the array be square and put something like 0's in those entries, but I think that would be suboptimal. Is there a way I can set up the bounds of this array to be "triangular"?
A simple upper triangular index can be defined as:
import Data.Ix (Ix, range, index, inRange)
data UpperTriagIndex = Int :. Int
  deriving (Show, Ord, Eq)

instance Ix UpperTriagIndex where
  range (a :. b, c :. d) = concatMap (\i -> (i :.) <$> [max i b..d]) [a..c]
  inRange (a :. b, c :. d) (i :. j) = a <= i && i <= c && b <= j && j <= d
  index pr@(a :. b, c :. d) ix@(i :. j)
    | inRange pr ix = f a - f i + j - i
    | otherwise = error "out of range!"
    where f x = let s = d + 1 - max x b in s * (s + 1) `div` 2
One can verify that range and index round trip even if the array is not square. For example:
> let pr = (0 :. 0, 3 :. 5) in index pr <$> range pr
[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17] -- [0..17]
and:
> import Data.Array (array, (!))
> let f i j = (i :. j, "row: " ++ show i ++ ", col: " ++ show j)
> let a = array ((0 :. 0), (3 :. 3)) [f i j | i <- [0..3], j <- [i..3]]
> a ! (2 :. 3)
"row: 2, col: 3"
While trying to solve a programming quiz
https://www.hackerrank.com/challenges/missing-numbers
I came across a space leak.
The main function is difference, which implements multiset difference.
Profiling with the -hT option shows that list cells (:) and triples (,,) are kept on the heap. However, the only big lists are difference's two arguments, and those shrink as difference tail-recurses, yet the memory consumed by lists keeps increasing as the program runs.
The triple is an ephemeral structure (count, offset, array) used to bookkeep the count of each element of the multiset. The memory consumed by triples also keeps increasing, and I cannot find out why.
Though I've browsed similar 'space leak' questions on Stack Overflow, I couldn't grasp the idea. Surely I have much to study.
I appreciate any comments. Thank you.
P.S. The executable is compiled with the -O2 switch.
$ ./difference -hT < input04.txt
Stack space overflow: current size 8388608 bytes.
$ ghc --version
The Glorious Glasgow Haskell Compilation System, version 7.6.3
import Data.List
import Data.Array
-- array (non-zero-count, start-offset, array_data)
array_size=101
myindex :: Int -> Int -> Int
myindex key offset
  | key >= offset = key - offset
  | otherwise     = key - offset + array_size

mylookup x (_,offset,arr) = arr ! idx
  where idx = myindex x offset

addOrReplace :: Int -> Int -> (Int, Int, Array Int (Int,Int)) -> (Int, Int, Array Int (Int,Int))
addOrReplace key value (count,offset,arr) = (count', offset, arr // [(idx,(key,value))])
  where idx = myindex key offset
        (_,prev_value) = arr ! idx
        count' = case (prev_value, value) of
                   (0,0) -> count
                   (0,_) -> count + 1
                   (_,0) -> count - 1
                   otherwise -> count
difference :: (Int,Int,Array Int (Int,Int)) -> [Int] -> [Int] -> [Int]
difference (count,offset,arr) [] []
  | count == 0 = []
  | otherwise  = [ k | x <- [0..array_size-1], let (k,v) = (arr ! x), v /= 0]
difference m (x:xs) y = difference new_m xs y
  where (_,v) = mylookup x m
        new_m = addOrReplace x (v + 1) m
difference m [] (y:ys) = difference new_m [] ys
  where (_,v) = mylookup y m
        new_m = if v == 0
                then m
                else addOrReplace y (v - 1) m
main = do
  n <- readLn :: IO Int
  pp <- getLine
  m <- readLn :: IO Int
  qq <- getLine
  let p = map (read :: String->Int) . words $ pp
      q = map (read :: String->Int) . words $ qq
      startArray = (0, head q, array (0,100) [(i,(0,0)) | i <- [0..100]] )
  putStrLn . unwords . map show . sort $ difference startArray q p
[EDIT]
I seq'ed the value and the Array, thanks to Carl's advice.
I attach the heap diagrams.
[original heap profiling]
[after seq'ing value v]
difference m (x:xs) y = difference new_m xs y
  where (_,v) = mylookup x m
        new_m = v `seq` addOrReplace x (v + 1) m
[after seq'ing value v and Array]
difference m (x:xs) y = new_m `seq` difference new_m xs y
  where (_,v) = mylookup x m
        new_m = v `seq` addOrReplace x (v + 1) m
I see three main problems with this code.
First (and not the cause of the memory use, but definitely the cause of generally poor performance), Array is horrible for this use case: O(1) lookups are useless when updates are O(n).
Second, the values being stored in the Array aren't forced while difference is looping over its first input. They are thunks containing pointers to an unevaluated lookup in the previous version of the array. You can ensure that the value is evaluated at the same time the array is updated, in a variety of ways. When difference loops over its second input it does this accidentally, in fact, by comparing the value against 0.
Third, difference doesn't even force the evaluation of the new arrays being created while traversing its first argument. Nothing requires the old array to be evaluated during that portion of the loop.
Both of those latter issues need to be resolved to fix the space leak. The first issue doesn't cause a space leak, just much higher overheads than needed.
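To illustrate both points, here is a sketch (not the asker's code) of the same counting idea using Data.IntMap.Strict, which evaluates the counts as they are inserted and updates in O(log n):
{-# LANGUAGE BangPatterns #-}
import qualified Data.IntMap.Strict as IM

-- First list: count occurrences. Second list: cancel them out.
-- Whatever is left with a positive count is the multiset difference.
-- The bang on m forces each intermediate map, and the strict IntMap
-- forces the counts stored inside it.
difference' :: IM.IntMap Int -> [Int] -> [Int] -> [Int]
difference' !m (x:xs) ys = difference' (IM.insertWith (+) x 1 m) xs ys
difference' !m [] (y:ys) = difference' (IM.update dec y m) [] ys
  where dec c = if c <= 1 then Nothing else Just (c - 1)
difference' !m [] []     = IM.keys m   -- ascending, so no extra sort needed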
I've got a function, in my minimum example called maybeProduceValue i j, which is only valid when i > j. Note that in my actual code the js are not uniform, so the data only resembles a triangular matrix; I don't know what the mathematical name for this is.
I'd like my code, which loops over i and j and returns essentially (where js is sorted)
[maximum [f i j | j <- js, j < i] | i <- [0..iMax]]
to not check any more j's once one has failed. In C-like languages, this is as simple as
if (j >= i) {break;}
and I'm trying to recreate this behaviour in Haskell. I've got two implementations below:
one which tries to take advantage of laziness by using takeWhile to only inspect at most one value (per i) which fails the test and returns Nothing;
one which remembers the number of js which worked for the previous i and so, for i+1, it doesn't bother doing any safety checks until it exceeds this number.
This latter function is more than twice as fast by my benchmarks, but it really is a mess - I'm trying to convince people that Haskell is more concise and safe while still reasonably performant, and here is some fast code which is dense, cluttered and does a bunch of unsafe operations.
Is there a solution, perhaps using Cont, Error or Exception, that can achieve my desired behaviour?
n.b. I've tried using Traversable.mapAccumL and Vector.unfoldrN instead of State and they end up being about the same speed and clarity. It's still a very overcomplicated way of solving this problem.
import Criterion.Config
import Criterion.Main
import Control.DeepSeq
import Control.Monad.State
import Data.Maybe
import qualified Data.Traversable as T
import qualified Data.Vector as V
main = deepseq js $ defaultMainWith (defaultConfig{cfgSamples = ljust 10}) (return ()) [
  bcompare [
    bench "whileJust" $ nf whileJust js,
    bench "memoised" $ nf memoisedSection js
  ]]
iMax = 5000
jMax = 10000
-- any sorted vector
js :: V.Vector Int
js = V.enumFromN 0 jMax
maybeProduceValue :: Int -> Int -> Maybe Float
maybeProduceValue i j | j < i = Just (fromIntegral (i+j))
                      | otherwise = Nothing
unsafeProduceValue :: Int -> Int -> Float
-- unsafeProduceValue i j | j >= i = error "you fool!"
unsafeProduceValue i j = fromIntegral (i+j)
whileJust, memoisedSection
  :: V.Vector Int -> V.Vector Float
-- mean: 389ms
-- short circuits properly
whileJust inputs' = V.generate iMax $ \i ->
    safeMax . V.map fromJust . V.takeWhile isJust $ V.map (maybeProduceValue i) inputs'
  where safeMax v = if V.null v then 0 else V.maximum v
-- mean: 116ms
-- remembers the (monotonically increasing) length of the section of
-- the vector that is safe. I have tested that this doesn't violate the condition that j < i
memoisedSection inputs' = flip evalState 0 $ V.generateM iMax $ \i -> do
    validSection <- state $ \oldIx ->
      let newIx = oldIx + V.length (V.takeWhile (< i) (V.unsafeDrop oldIx inputs'))
      in (V.unsafeTake newIx inputs', newIx)
    return $ V.foldl' max 0 $ V.map (unsafeProduceValue i) validSection
Here's a simple way of solving the problem with Applicatives, provided that you don't need to keep the rest of the list once you run into an issue:
import Control.Applicative
memoizeSections :: [(Int, Int)] -> Maybe [Float]
memoizeSections [] = Just []
memoizeSections ((x, y):xs) = (:) <$> maybeProduceValue x y <*> memoizeSections xs
This is equivalent to:
import Data.Traversable
memoizeSections :: [(Int, Int)] -> Maybe [Float]
memoizeSections = traverse (uncurry maybeProduceValue)
and will return Nothing on the first occurrence of failure. Note that I don't know how fast this is, but it's certainly concise, and arguably pretty clear (particularly the first example).
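For example, with the question's maybeProduceValue and the Int/Float signature above, a ghci session would look roughly like this (hypothetical session):
> memoizeSections [(3, 1), (3, 2)]
Just [4.0,5.0]
> memoizeSections [(3, 5), (3, 1)]
Nothing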
Some minor comments:
-- any sorted vector
js :: V.Vector Int
js = V.enumFromN 0 jMax
If you have a vector of Ints (or Floats, etc), you want to use Data.Vector.Unboxed.
maybeProduceValue :: Int -> Int -> Maybe Float
maybeProduceValue i j | j < i = Just (fromIntegral (i+j))
                      | otherwise = Nothing
Since Just is lazy in its only field, this will create a thunk for the computation fromIntegral (i+j). You almost always want to apply Just like so
maybeProduceValue i j | j < i = Just $! fromIntegral (i+j)
There are some more thunks in:
memoisedSection inputs' = flip evalState 0 $ V.generateM iMax $ \i -> do
    validSection <- state $ \oldIx ->
      let newIx = oldIx + V.length (V.takeWhile (< i) (V.unsafeDrop oldIx inputs'))
      in (V.unsafeTake newIx inputs', newIx)
    return $ V.foldl' max 0 $ V.map (unsafeProduceValue i) validSection
Namely you want to:
let !newIx = oldIx + V.length (V.takeWhile (< i) (V.unsafeDrop oldIx inputs'))
    !v = V.unsafeTake newIx inputs'
in (v, newIx)
as the pair is lazy in its fields and
return $! V.foldl' max 0 $ V.map (unsafeProduceValue i) validSection
because return in the state monad is lazy in the value.
You can use a guard in a single list comprehension:
[f i j | j <- js, i <- is, j < i]
If you're trying to get the same results as
[foo i j | i <- is, j <- js, j < i]
when you know that js is increasing, just write
[foo i j | i <- is, j <- takeWhile (< i) js]
There's no need to mess around with Maybe for this. Note that making the input list global has a likely-unfortunate effect: instead of fusing the production of the input list with its transformation(s) and ultimate consumption, it's forced to actually construct the list and then keep it in memory. It's quite possible that it will take longer to pull the list into cache from memory than to generate it piece by piece on the fly!
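Applied to the question's vector setup, the same idea looks roughly like this (a sketch reusing the question's iMax and unsafeProduceValue; not benchmarked):
takeWhileVersion :: V.Vector Int -> V.Vector Float
takeWhileVersion inputs' = V.generate iMax $ \i ->
  V.foldl' max 0 $ V.map (unsafeProduceValue i) $ V.takeWhile (< i) inputs'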
I hope someone can help figure out where my error lies. Calling g 3 4 0 2 (M.empty,0) [], I would expect [[2,1,0,1]] as a result. Instead, I'm seeing [[2,1,0,1],[2,1,0,1]].
The program is supposed to accumulate distinct digit patterns of length m by adding a different digit to the list each time, turning back down when reaching n-1 and back up when reaching 0. The apparent problem happens in the middle, when the function is called recursively for both the up and down directions.
If I comment out line 11 like so:
else g n m (digitCount + 1) (lastDigit + 1) (hash',hashCount') (lastDigit:digits)
-- g n m (digitCount + 1) (lastDigit - 1) (hash',hashCount') (lastDigit:digits)
I get the correct result []
Likewise, when commenting out line 11 and modifying line 10 to:
else g n m (digitCount + 1) (lastDigit - 1) (hash',hashCount') (lastDigit:digits)
Again, a correct result [[2,1,0,1]]
Why, when calling g twice using the ++ operator, am I getting two [2,1,0,1]'s instead of just one? In my thinking, each result of g should be distinct, because in any recursive call a different order of digits is (or should be) accumulating.
Thanks in advance.
import qualified Data.Map as M
g :: Int -> Int -> Int -> Int -> (M.Map Int Bool, Int) -> [Int] -> [[Int]]
g n m digitCount lastDigit (hash,hashCount) digits
  | digitCount == m = if test then [reverse digits] else []
  | otherwise =
      if lastDigit == 0
        then g n m (digitCount + 1) (lastDigit + 1) (hash',hashCount') (lastDigit:digits)
        else if lastDigit == n - 1
          then g n m (digitCount + 1) (lastDigit - 1) (hash',hashCount') (lastDigit:digits)
          else g n m (digitCount + 1) (lastDigit + 1) (hash',hashCount') (lastDigit:digits)
            ++ g n m (digitCount + 1) (lastDigit - 1) (hash',hashCount') (lastDigit:digits)
  where test = hashCount == n
        (hash',hashCount') =
          if test
            then (M.empty,hashCount)
            else case M.lookup lastDigit hash of
                   Just _ -> (hash,hashCount)
                   Nothing -> (M.insert lastDigit True hash,hashCount + 1)
Now that you've got it working, here's a more generic approach.
We need to walk the tree of solutions.
data S a = Solution a | Explore [S a]
Solutions are the leaves of this tree; Explore nodes are lists of options to explore.
-- this is very much unfoldr-like
generator :: [S a] -> [a]
generator [] = []
generator (Solution a: ss) = a: generator ss
generator (Explore ps: ss) = generator $ ss ++ ps
Now, given a list of "maybe-solutions", produce a list of solutions. The generator pattern-matches on Explore, and appends the list of options to explore to the end of the list. This way we are exploring the solutions breadth-first, which lets us deal with non-terminating branches. (Depth-first can't get out of non-terminating branches.) This of course comes at the expense of memory, but you can find a finite number of solutions even for problems with an infinite number of solutions.
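As a quick sanity check of that claim, here is a toy example (hypothetical values, not part of the problem) in which one branch never terminates and yet the finite solutions still come out:
toy :: [Int]
toy = take 2 $ generator [Explore loop, Solution 1, Explore [Solution 2]]
  where loop = [Explore loop]   -- an infinite branch that never yields a Solution
-- toy evaluates to [1,2]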
Now, the function that generates solutions for your problem:
import qualified Data.Set as S   -- for the set of digits already used

g :: Int -> Int -> [S [Int]]
g n m = [Explore $ g' [i] (S.singleton i) | i <- [1..n-1]] where
  g' is@(h:_) ms
    | h < 0 || h >= n || length is > m = [] --no solution, nothing to explore
    | otherwise = maybeSolution ++
        [ Explore $ g' ((h-1):is) $ S.insert (h-1) ms
        , Explore $ g' ((h+1):is) $ S.insert (h+1) ms ]
    where
      maybeSolution
        | S.size ms == n = [Solution is]
        | otherwise = []
Given n and m, g produces a list of subtrees to Explore. g' is the helper function that produces a list of subtrees, given a list of Int already produced and a Set of Int already used. So there is a definite termination condition: if we appended a number outside the needed range, or the list became too long, exploring any further cannot produce Solutions, so return []. Otherwise we are within bounds: maybeSolution checks whether the list of Ints is already a valid solution, and we suggest more subtrees to explore.
main = print $ map reverse $ generator $ g 3 6
Your problem solved.
Why, when calling g twice using the ++ operator, am I getting two [2,1,0,1]'s instead of just one? In my thinking, each result of g should be distinct, because in any recursive call a different order of digits is (or should be) accumulating.
But your pair of (Map,Int) is the same in both calls, so the recursive calls don't know what has been found by the other call. Consider the call g ... (lastDigit-1). It will also call g ... (lastDigit) (by adding 1 to (lastDigit-1) that it got), and follow the branch g ... (lastDigit+1) to produce the same result.
Also, (Map a ()) is a (Set a), and since you don't use the Bool value from the map, it is the same as ():
import qualified Data.Set as S
g :: Int -> Int -> Int -> Int -> (S.Set Int, Int) -> [Int] -> [[Int]]
g n m digitCount lastDigit (hash,hashCount) digits
  | digitCount == m = if test then [reverse digits] else []
  | lastDigit < 0 || lastDigit == n = []
  | otherwise = g n m d' (lastDigit + 1) h' (lastDigit:digits)
             ++ g n m d' (lastDigit - 1) h' (lastDigit:digits)
  where test = hashCount == n
        d' = digitCount + 1
        h'
          | test = (S.empty,hashCount)
          | S.member lastDigit hash = (hash,hashCount)
          | otherwise = (S.insert lastDigit hash,hashCount + 1)
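For reference, the asker's original call adapted to this Set-based version (a small sketch added here):
main :: IO ()
main = print $ g 3 4 0 2 (S.empty, 0) []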
In your two recursive calls to g combined with (++) in the final else branch, you are passing exactly the same parameters except for lastDigit.
The base case of your recursion doesn't look at lastDigit - it just compares m and digitCount, n and hashCount and then returns [reverse digits].
So in any situation where the (++) case is hit immediately followed by the base case returning [reverse digits], you'll get the same value repeated.
I didn't fully understand your problem specification but perhaps you need to add the "new" value for lastDigit to digits when you make the recursive calls - i.e. (lastDigit-1):digits or (lastDigit+1):digits.