Haskell space leak in implementation of BFS - haskell

I have been banging my head against a Haskell space leak (of the stack overflow kind, naturally) for a few straight days. It's frustrating because I'm attempting to mimic the BFS algorithm straight from CLR, which is not naturally recursive. NB: I have enabled BangPatterns and I have put a bang in front of every possible place where one can go, in an attempt to branch-and-bound this problem, with no effect. I have battled through space leaks before, and I am loth to give up and cry for help on this one, but at this point I'm stuck. I love coding in Haskell, and I understand the Zen of functional programming pretty well, but debugging space leaks is about as much fun as rolling around on a floor full of thumbtacks.
That said, my trouble appears to be a space leak of the typical "accumulator" kind. The stack evidently builds up around calls to bfs' in the code below. Any space-leak protips much appreciated.
import qualified Data.Map as M
import qualified Data.IntSet as IS
import qualified Data.Sequence as S
import qualified Data.List as DL
data BfsColor = White | Gray | Black deriving Show
data Node =
Node {
neighbors :: !IS.IntSet,
color :: !BfsColor,
depth :: !Int
}
type NodeID = Int
type NodeQueue = S.Seq NodeID
type Graph = M.Map NodeID Node
bfs :: Graph -> NodeID -> Graph
bfs graph start_node =
bfs' (S.singleton start_node) graph
bfs' :: NodeQueue -> Graph -> Graph
bfs' !queue !graph
| S.null queue = graph
| otherwise =
let (u,q1) = pop_left queue
Node children _ n = graph M.! u
(g2,q2) = IS.fold (enqueue_child_at_depth $ n+1) (graph,q1) children
g3 = set_color u Black g2
in bfs' q2 g3
enqueue_child_at_depth :: Int -> NodeID -> (Graph, NodeQueue)
-> (Graph, NodeQueue)
enqueue_child_at_depth depth child (graph,!queue) =
case get_color child graph of
White -> (set_color child Gray $ set_depth child depth graph,
queue S.|> child)
otherwise -> (graph,queue)
pop_left :: NodeQueue -> (NodeID, NodeQueue)
pop_left queue =
let (a,b) = S.splitAt 1 queue
in (a `S.index` 0, b)
set_color :: NodeID -> BfsColor -> Graph -> Graph
set_color node_id c graph =
M.adjust (\node -> node{color=c}) node_id graph
get_color :: NodeID -> Graph -> BfsColor
get_color node_id graph = color $ graph M.! node_id
set_depth :: NodeID -> Int -> Graph -> Graph
set_depth node_id d graph =
M.adjust (\node -> node{depth=d}) node_id graph

That looks much easier to understand. (You can still shrink the code by 1/2, though.)
Now, the nature of the space leak becomes apparent. Namely, the one thing that is never evaluated is the depth. It will pile up to a big expression 1+1+.... You can remove all the bang patterns and add a single one at
enqueue_child_at_depth !depth child (graph,queue)
to get rid of the space leak.
(Further code tips: You can replace the IS.IntSet by a simple list. The queue is best deconstructed and reconstructed along the lines of
go depth qs graph = case viewl qs of
EmptyL -> graph
q :< qs ->
let
qs' = (qs ><) . Seq.fromList
. filter (\q -> isWhite q graph)
. neighbors q $ graph
in ...
)

First of all, if would be very helpful if you could provide some simple test case (in the form of code) which demonstrates how this thing stack overflows.
Without it I, personally, can only speculate on the subject of reason for that.
As a speculation: is IS.fold strict enough? Well, for example the following simplest code stack overflows as well (GHC with -O2):
{-# LANGUAGE BangPatterns #-}
import qualified Data.IntSet as IS
test s = IS.fold it 1 s
where it !e !s = s+e
main = print $ test (IS.fromList [1..1000000])
The overflow problem with this code can be hackafixed (is there a better way?) like that:
test s = foldl' it 1 (IS.toList s)
where it !e !s = s+e
Maybe you want to look at IS.fold in your code as well.

Related

Avoiding thunks in sparsely evaluated list generated by monadic unfold

I have a simulation library that uses the FFI wrapped in a monad M, carrying a context. All the foreign functions are pure, so I've decided to make the monad lazy, which is normally convenient for flow-control. I represent my simulation as a list of simulation-frames, that I can consume by either writing to a file, or by displaying the frame graphically.
simulation :: [(Frame -> M Frame)] -> Frame -> M [Frame]
simulation [] frame = return [frame]
simulation (step:steps) frame
= step frame >>= fmap (frame:) . simulation steps
Each frame consists of a tuple of newtype-wrapped ForeignPtrs that I can lift to my Haskell representation with
lift :: Frame -> M HFrame
Since the time-steps in my simulation are quite short, I only want to look at every n frames, for which I use
takeEvery n l = foldr cons nil l 0 where
nil _ = []
cons x rest 0 = x : rest n
cons x rest n = rest (n-1)
So my code looks something like
main = consume
$ takeEvery n
$ runM
$ simulation steps initialFrame >>= mapM lift
Now, the problem is that as I increase n, a thunk builds up. I've tried a couple of different ways to try to strictly evaluate each frame in simulation, but I have yet to figure out how to do so. ForeignPtr doesn't appear to have a NFData instance, so I can't use deepseq, but all my attempts with seq, including using seq on each element in the tuple, have been without noticeable effect.
EDIT:
Upon request, I have included more specifics, that I initially excluded since I think they are probably mostly noise for this question.
The monad
newtype FT c a = FT (Context -> a)
instance Functor (FT c) where
fmap f (FT a) = FT (f.a)
instance Applicative (FT c) where
pure a = FT (\_ -> a)
(<*>) (FT a) (FT b) = FT (\c -> a c $ b c)
instance Monad (FT c) where
return = pure
(>>=) (FT a) f = FT (\c -> (\(FT b) -> b c) $ f $ a c)
runFTIn :: Context -> (forall c. FT c a) -> a
runFTIn context (FT a) = a context
runFTWith :: [ContextOption] -> (forall c. FT c a) -> a
runFTWith options a
= unsafePerformIO
$ getContext options >>= \c -> return $ runFTIn c a
runFT = runFTWith []
unsafeLiftFromIO :: (Context -> IO a) -> FT c a
unsafeLiftFromIO a = FT (\c -> unsafePerformIO $ a c)
All the foreign functions are lifted from IO with unsafeLiftFromIO
newtype Box c = Box (ForeignPtr RawBox)
newtype Coordinates c = Coordinates (ForeignPtr RawCoordinates)
type Frame c = (Box c, Coordinates c)
liftBox :: Box c -> FT c HBox
liftCoordinates :: Coordinates c -> FT c HCoordinates
liftFrame (box, coordinates) = do
box' <- liftBox box
coordinates' <- liftCoordinates coordinates
return (box', coordinates')
The steps themselves are supposed to be arbitrary (Frame c -> FT c (Frame c)), so strictness should preferably be in the higher level code.
EDIT2:
I have now tried to use Streamly, however the problem persists, so I think the issue really is finding a way to strictly evaluate ForeignPtrs.
current implementations:
import Streamly
import qualified Streamly.Prelude as S
import qualified Streamly.Internal.Data.Stream.Serial as Serial
takeEvery n = Serial.unfoldrM ((fmap.fmap) (\(h, t) -> (h, S.drop (n-1) t)) . S.uncons)
(#) = flip ($)
simulation
:: (IsStream t)
=> Frame c
-> t (FT c) (Frame c -> FT c (Frame c))
-> t (FT c) (Frame c)
simulation frame = S.scanlM' (#) frame
EDIT3:
To clarify the symptoms and how I have diagnosed the problem.
The library calls OpenCL functions running on a GPU. I am sure that the freeing of the pointers is handled correctly - the ForeignPtrs have the correct freeing functions, and memory use is independent of total number of steps as long as this number is larger than n. What I find is that memory use on the GPU is basically linearly correlated to n. The consumer I've been using for this testing is
import qualified Data.ByteString.Lazy as BL
import Data.Binary
import Data.Binary.Put
writeTrajectory fn = fmap (BL.writeFile fn . runPut) . S.foldr ((>>).putFrame) (pure ()) . serially
For my streamly implementation, and
writeTrajectory fn = BL.writeFile fn . runPut . MapM_ putFrame
For the original implementation. Both should consume the stream continuously. I've generated the steps for testing with replicate.
I am unsure of how to more precisely analyze the memory-use on the GPU. System memory use is not an issue here.
Update:
I am starting to think it's not a matter of strictness, but of GC-problems. The run-time system does not know the size of the memory allocated on the GPU and so does not know to collect the pointers, this is less of an issue when there is stuff going on CPU-side as well, as that will produce allocations too, activating the GC. This would explain the slightly non-determinstic memory usage, but linear correlation to n that I've seen. How too solve this nicely is another issue, but I suspect there will be a substantial overhaul to my code.
I think the issue really is finding a way to strictly evaluate ForeignPtrs
If that is really the issue, one way to do that is to change the second clause of simulation:
{-# LANGUAGE BangPatterns #-}
simulation :: [(Frame -> M Frame)] -> Frame -> M [Frame]
simulation [] frame = return [frame]
simulation (step:steps) frame#(!_, !_) -- Evaluate both components of the pair
= step frame >>= fmap (frame:) . simulation steps

Avoiding space leaks with `mapM` and `foldM` over `State` monad

How do I avoid space leaks while using foldM and mapM over a State monad?
Last year's Advent of Code day 20 has a puzzle of generating a map of a maze from instructions on how to walk across it. For instance, the instructions NN gives the maze
|
|
*
(a straight corridor two steps northwards), and the instructions NNN(EE|WW)S gives the maze
+-+-+
| | |
|
*
(go north a bit, then either go east then south or west then south).
The way I'm trying to solve this involves having a State monad, where the state is the Set of all the corridor sections (termed Doors below), and the value is the list of positions you could be working from.
If you're just following a corridor Path, I use foldM to walk along it, updating the current position. If you're at a junction, follow each branch of the junction and collect all the positions you end up.
This code produces the correct results on small test inputs, but there's a huge space leak when working on the full example.
Profiling indicates it's spending most of its time in includeDoor.
So, questions.
Is there a space leak? If so, where, and how can you tell.
How do I fix it?
(I think what's happening is that Haskell isn't strictly adding fully-evaluated Doors to the Set as soon as it can. In this case, I don't want any laziness anywhere.)
(I parse the input into a bunch of two-element vectors that indicate the step to take for each instruction. That code works fine, and quickly.)
import qualified Data.Set as S
import Linear (V2(..))
import Control.Monad.State.Strict
import Control.Monad.Extra (concatMapM)
type Coord = V2 Integer -- x, y, with north and east incresing values (origin a bottom left)
data Door = Door Coord Coord deriving (Show, Eq, Ord)
type Doors = S.Set Door
data MazeSection = Path [Coord] | Junction [Maze] deriving (Show, Eq)
type Maze = [MazeSection]
type Mapper = State Doors [Coord]
makeDoor :: Coord -> Coord -> Door
makeDoor !a !b
| a < b = Door a b
| otherwise = Door b a
emptyMap = S.empty
part1 maze =
do
let start = V2 0 0
let doors = execState (mapMaze [start] maze) emptyMap
print $ length doors
mapMaze :: [Coord] -> Maze -> Mapper
mapMaze !starts !sections =
foldM (\heres section -> mapMazeSection heres section) starts sections
mapMazeSection :: [Coord] -> MazeSection -> Mapper
mapMazeSection !starts (Junction mazes) =
concatMapM (\maze -> mapMaze starts maze) mazes
mapMazeSection !starts (Path steps) =
mapM mapPath starts
where mapPath start = foldM (\here step -> includeDoor here step) start steps
includeDoor :: Coord -> Coord -> State Doors Coord
includeDoor !here !step =
do let there = (here + step)
let door = there `seq` makeDoor here there
modify' (door `seq` S.insert door)
return there
Space leaks can be very hard to detect in Haskell. I am no expert, but I have heard there are many problems with the State monad and space leaks. I generally avoid State/StateT and use IORef, MVar or TVar instead, but that would change it to IO. The first thing you can try is adding ! in various let bindings and type definitions.
data Door = Door !Coord !Coord
data Maze = Path ![Coord] | Junction ![Maze]
If that doesn't resolve it, there are some tools that may help you pinpoint where it occurs in this article.
Other resources
Here are some other resources that might help.
ndmithcell on space leak detection
Space leak with nested strict StateT
Pinpointing space leaks in big programs
Turns out, it wasn't a space leak! It was me failing to deal with some pathological input. Once I sorted out how to handle that, it worked, and very quickly.

The Haskell RNG and state

As a Java person learning Haskell I was getting use to the new way of thinking about everything but I've spent half a day trying to implement something with a simple RNG and am getting nowhere. In Java I could crate a static RNG and call it with Classname.random.nextInt(10) and it would meet these criteria:
I wouldn't have to keep a reference to the RNG and I could call it ad-hoc (even from inside a loop or a recursive function)
It would produce a new random number every time it was called
It would produce a new set of random numbers every time the project executed
So far in Haskell I'm facing the classic programmers dilemma - I can have 2/3. I'm still learning and have absolutely no idea about Monads, except that they might be able to help me here.
My Most recent attempt has been this:
getRn :: (RandomGen g) => Int -> Int -> Rand g Int
getRn lo hi= getRandomR (lo,hi)
--EDIT: Trimming my questions so that it's not so long winded, replacing with a summary and then what I ended up doing instead:
After creating a bunch of random cities (for TSP), I maped over them with a function createEdges that took a city and connected it to the rest of the cities: M.mapWithKey (\x y -> (x,(createEdges y [1..3] makeCountry)))
PROBLEM:
I wanted to replace [1..3] with something random. I.e. I wanted to map randomness (IO) over pure code. This caused no end of confusion for me (see people's attempt to answer me below to get a good sense of my confusion). In fact I'm still not even sure if I'm explaining the problem correctly.
I was getting this type of error: Couldn't match expected type [Int] with actual type IO [Int]
SOLUTION:
So after finding out that what I wanted to do was fundamentally wrong in a functional environment, I decided to change my approach. Instead of generating a list of cities and then applying randomness to connect them, I instead created an [[Int]] where each inner list represented the random edges. Thereby creating my randomness at the start of the process, rather than trying to map randomness over the pure code.
(I posted the final result as my own answer, but SO won't let me accept my own answer yet. Once it does I've reached that threshold I'll come back and accept)
You can work with random numbers without any monads or IO at all if you like.
All you have to know is, that as there is state (internal state of the random-number-generator) involved you have to take this state with you.
In my opinion the easiest framework for this is Sytem.Random.
Using this your getRn function could look like this:
getRn :: (RandomGen g) => Int -> Int -> g -> (Int, g)
getRn lo hi g = randomR (lo,hi) g
here you can view g as the state I mentioned above - you put it in and you get another back like this (in ghci):
> let init = mkStdGen 11
> let (myNr, nextGen) = getRn 1 6 init
> myNr
6
> let (myNr, nextGen') = getRn 1 6 nextGen
> myNr
4
I think you can start by using just this - thread the gen around and later when you get all the monad stuff come back and make it a bit easier to write/read.
I don't know the definitions of your data but here is a simple example that uses this technique:
module StackOQuestion where
import System.Random
getRn :: (RandomGen g) => Int -> Int -> g -> (Int, g)
getRn lo hi = randomR (lo,hi)
getRnList :: (RandomGen g) => (g -> (a, g)) -> Int -> g -> ([a], g)
getRnList f n g
| n <= 0 = ([], g)
| otherwise = let (ls, g') = getRnList f (n-1) g
(a, g'') = f g'
in (a:ls, g'')
type City = (Int, Int)
randomCity :: (RandomGen g) => g -> (City, g)
randomCity g =
let (f, g') = getRn 1 6 g
(s, g'') = getRn 1 6 g'
in ((f, s), g'')
randomCities :: (RandomGen g) => (Int, Int) -> g -> ([City], g)
randomCities (minC, maxC) g =
let (count, g') = getRn minC maxC g
in getRnList randomCity count g'
and you can test it like this:
> let init = mkStdGen 23
> randomCities (2,6) init
([(4,3),(1,2)],394128088 652912057)
As you can see this creates two Cities (here simply represented as an integer-pair) - for other values of init you will get other answers.
If you look the right way at this you can see that there is already the beginning of a state-monad there (the g -> ('a, g) part) ;)
PS: mkStdGen is a bit like the Random-initialization you know from Java and co (the part where you usually put your system-clock's tick-count in) - I choose 11 because it was quick to type ;) - of course you will always get the same numbers if you stick with 11 - so you will need to initialize this with something from IO - but you can push this pack to main and keep pure otherwise if you just pass then g around
I would say if you want to work with random numbers, the easiest thing to do is to use an utility library like Control.Monad.Random.
The more educational, work intensive path is to learn to write your own monad like that. First you want to understand the State monad and get comfortable with it. I think studying this older question (disclaimer: I have an answer there) may be a good starting point for studying this. The next step I would take is to be able to write the State monad on my own.
After that, the next exercise I would try is to write a "utility" monad for random number generation. By "utility" monad what I mean is a monad that basically repackages the standard State monad with an API that makes it easier for that specific task. This is how that Control.Monad.Random package is implemented:
-- | A monad transformer which adds a random number generator to an
-- existing monad.
newtype RandT g m a = RandT (StateT g m a)
Their RandT monad is really just a newtype definition that reuses StateT and adds a few utility functions so that you can concentrate on using random numbers rather than on the state monad itself. So for this exercise, you basically design a random number generation monad with the API you'd like to have, then use the State and Random libraries to implement it.
Edit: After a lot more reading and some extra help from a friend, I finally reduced it to this solution. However I'll keep my original solution in the answer as well just in case the same approach helps another newbie like me (it was a vital part of my learning process as well).
-- Use a unique random generator (replace <$> newStdGen with mkStdGen 123 for testing)
generateTemplate = createCitiesWeighted <$> newStdGen
-- create random edges (with weight as pair) by taking a random sized sample of randoms
multiTakePair :: [Int] -> [Int] -> [Int] -> [[(Int,Int)]]
multiTakePair ws (l:ls) is = (zip chunka chunkb) : multiTakePair remaindera ls remainderb
where
(chunkb,remainderb) = splitAt l is
(chunka,remaindera) = splitAt l ws
-- pure version of utilizing multitake by passing around an RNG using "split"
createCitiesWeighted :: StdGen -> [[(Int,Int)]]
createCitiesWeighted gen = take count result
where
(count,g1) = randomR (15,20) gen
(g2,g3) = split g1
cs = randomRs (0, count - 2) g1
es = randomRs (3,7) g2
ws = randomRs (1,10) g3
result = multiTakePair ws es cs
The original solution -----
As well as #user2407038's insightful comments, my solution relied very heavily on what I read from these two questions:
Sampling sequences of random numbers in Haskell
Random Integer in Haskell
(NB. I was having an issue where I couldn't work out how to randomize how many edges each city would have, #AnrewC provided an awesome response that not only answered that question but massively reduce excess code)
module TspRandom (
generateCityTemplate
) where
import Control.Monad (liftM, liftM2) -- promote a pure function to a monad
-- #AndrewC's suggestion
multiTake :: [Int] -> [Int] -> [[Int]]
multiTake (l:ls) is = chunk : multiTake ls remainder
where (chunk,remainder) = splitAt l is
-- Create a list [[Int]] where each inner int is of a random size (3-7)
-- The values inside each inner list max out at 19 (total - 1)
createCities = liftM (take 20) $ liftM2 multiTake (getRandomRs (3,7)) (getRandomRs (0, 19))
-- Run the generator
generateCityTemplate = do
putStrLn "Calculating # Cities"
x <- createCities
print x
return ()
The state monad is actually very simple. It is just a function from a state to a value and a new state, or:
data State s a = State {getState :: s -> (s, a)}
In fact, this is exactly what the Rand monad is. It isn't necessary to understand the mechanics of State to use Rand. You shouldn't be evaluating the Rand inside of IO, just use it directly, using the same do notation you have been using for IO. do notation works for any monad.
createCities :: Rand StdGen Int
createCities = getRn minCities maxCities
x :: Cities -> X
x = ...
func :: Rand StdGen X
func = do
cities <- createCities
return (x cities)
-- also valid
func = cities <$> createCities
func = createCities >>= return . x
You can't write getConnections like you have written it. You must do the following:
getConnections :: City -> Country -> Rand StdGen [Int]
getConnections c country = do
edgeCount <- createEdgeCount
fromIndecies [] edgeCount (citiesExcludeSelf c country)
Any function which calls getConnections will have to also return a value of type Rand StdGen x. You can only get rid of it once you have written the entire algorithm and want to run it.
Then, you can run the result using evalRandIO func, or, if you want to test some algorithm and you want to give it the same inputs on every test, you can use evalRand func (mkStdGen 12345), where 12345, or any other number, is your seed value.

Pseudorandom number generators in Haskell

I'm working on solutions to the latest Programming Praxis puzzles—the first on implementing the minimal standard random number generator and the second on implementing a shuffle box to go with either that one or a different pseudorandom number generator. Implementing the math is pretty straightforward. The tricky bit for me is figuring out how to put the pieces together properly.
Conceptually, a pseudorandom number generator is a function stepRandom :: s -> (s, a) where s is the type of the internal state of the generator and a is the type of randomly chosen object produced. For a linear congruential PRNG, we could have s = a = Int64, for example, or perhaps s = Int64 and a = Double. This post on PSE does a pretty good job of showing how to use a monad to thread the PRNG state through a random computation, and finish things off with runRandom to run a computation with a certain initial state (seed).
Conceptually, a shuffle box is a function shuffle :: box -> a -> (box, a) along with a function to initialize a new box of the desired size with values from a PRNG. In practice, however, the representation of this box is a bit trickier. For efficiency, it should be represented as a mutable array, which forces it into ST or IO. Something vaguely like this:
mkShuffle :: (Integral i, Ix i, MArray a e m) => i -> m e -> m (a i e)
mkShuffle size getRandom = do
thelist <- replicateM (fromInteger.fromIntegral $ size) getRandom
newListArray (0,size-1) thelist
shuffle :: (Integral b, Ix b, MArray a b m) => a b b -> b -> m b
shuffle box n = do
(start,end) <- getBounds box
let index = start + n `quot` (end-start+1)
value <- readArray box index
writeArray box index n
return value
What I really want to do, however, is attach an (initialized?) shuffle box to a PRNG, so as to "pipe" the output from the PRNG into the shuffle box. I don't understand how to set up that plumbing properly.
I'm assuming that the goal is to implement an algorithm as follows: we have a random generator of some sort which we can think of as somehow producing a stream of random values
import Pipes
prng :: Monad m => Producer Int m r
-- produces Ints using the effects of m never stops, thus the
-- return type r is polymorphic
We would like to modify this PRNG via a shuffle box. Shuffle boxes have a mutable state Box which is an array of random integers and they modify a stream of random integers in a particular way
shuffle :: Monad m => Box -> Pipe Int Int m r
-- given a box, convert a stream of integers into a different
-- stream of integers using the effects of m without stopping
-- (polymorphic r)
shuffle works on an integer-by-integer basis by indexing into its Box by the incoming random value modulo the size of the box, storing the incoming value there, and emitting the value which was previously stored there. In some sense it's like a stochastic delay function.
So with that spec let's get to a real implementation. We want to use a mutable array so we'll use the vector library and the ST monad. ST requires that we pass around a phantom s parameter that matches throughout a particular ST monad invocation, so when we write Box it'll need to expose that parameter.
import qualified Data.Vector.Mutable as Vm
import Control.Monad.ST
data Box s = Box { sz :: Int, vc :: Vm.STVector s Int }
The sz parameter is the size of the Box's memory and the Vm.STVector s is a mutable ST Vector linked to the s ST thread. We can immediately use this to build our shuffle algorithm, now knowing that the Monad m must actually be ST s.
import Control.Monad
shuffle :: Box s -> Pipe Int Int (ST s) r
shuffle box = forever $ do -- this pipe runs forever
up <- await -- wait for upstream
next <- lift $ do let index = up `rem` sz box -- perform the shuffle
prior <- Vm.read (vc box) index -- using our mutation
Vm.write (vc box) index up -- primitives in the ST
return prior -- monad
yield next -- then yield the result
Now we'd just like to be able to attach this shuffle to some prng Producer. Since we're using vector it's nice to use the high-performance mwc-random library.
import qualified System.Random.MWC as MWC
-- | Produce a uniformly distributed positive integer
uniformPos :: MWC.GenST s -> ST s Int
uniformPos gen = liftM abs (MWC.uniform gen)
prng :: MWC.GenST s -> Int -> ST s (Box s)
prng gen = forever $ do
val <- lift (uniformPos gen)
yield val
Notice that since we're passing the PRNG seed, MWC.GenST s, along in an ST s thread we don't need to catch modifications and thread them along as well. Instead, mwc-random uses a mutable STRef s behind the scenes. Also notice that we modify MWC.uniform to return positive indices only as this is required for our indexing scheme in shuffle.
We can also use mwc-random to generate our initial box.
mkBox :: MWC.GenST s -> Int -> ST s (Box s)
mkBox gen size = do
vec <- Vm.replicateM size (uniformPos gen)
return (Box size vec)
The only trick here is the very nice Vm.replicateM function which effectively has the constrained type
Vm.replicateM :: Int -> ST s Int -> Vm.STVector s Int
where the second argument is an ST s action which generates a new element of the vector.
Finally we have all the pieces. We just need to assemble them. Fortunately, the modularity we get from using pipes makes this trivial.
import qualified Pipes.Prelude as P
run10 :: MWC.GenST s -> ST s [Int]
run10 gen = do
box <- mkBox gen 1000
P.toListM (prng gen >-> shuffle box >-> P.take 10)
Here we use (>->) to build a production pipeline and P.toListM to run that pipeline and produce a list. Finally we just need to execute this ST s thread in IO which is also where we can create our initial MWC.GenST s seed and feed it to run10 using MWC.withSystemRandom which generates the initial seed from, as it says, SystemRandom.
main :: IO ()
main = do
result <- MWC.withSystemRandom run10
print result
And we have our pipeline.
*ShuffleBox> main
[743244324568658487,8970293000346490947,7840610233495392020,6500616573179099831,1849346693432591466,4270856297964802595,3520304355004706754,7475836204488259316,1099932102382049619,7752192194581108062]
Note that the actual operations of these pieces is not terrifically complex. Unfortunately, the types in ST, mwc-random, vector, and pipes are all each individually highly generalized and thus can be quite burdensome to comprehend at first. Hopefully the above, where I've deliberately weakened and specialized nearly every type to this exact problem, will be much easier to follow and provide a little bit of intuition for how each of these wonderful libraries works individually and together.

need help writing a function candidates in Haskell

Hi take a look at this thread already processing this subject
And also this thread might be of intrest.
Im trying to write a function
candidates :: Sudoku -> Pos -> [Int]
that given a Sudoku
data Sudoku = Sudoku { rows :: [[Maybe Int]] }
deriving ( Show, Eq )
and a position (type Pos = (Int, Int))
determines what numbers that you can write there, for example in a sudoku row that already contains (1,2,4,7,9,x,x) you cant write any of the already existing numbers in the last row. Also the other problem is to check the hight as well as the width so no numbers occur more than once (ordinary sudoku rules). So any suggestions on how to start?
Example:
Sudoku> candidates example (0,2)
[4,8]
I remember doing this project in my Algorithms class in college. My best advice, particularly for someone who is writing in Haskell for learning and not for production, is to write 'top down'. First, ask yourself what do you need to do to solve this problem? Then just write it down with descriptive functions (even if they don't yet exist). Then stub in the functions you need. For example, a start might be:
candidates :: Sudoku -> Pos -> [Int]
candidates s p = union (rowCands s p) (colCands s p) (blockCands s p)
rowCands :: Sudoku -> Pos -> [Int]
rowCands = undefined
colCands :: Sudoku -> Pos -> [Int]
colCands = undefined
blockCands :: Sudoku -> Pos -> [Int]
blockCands = undefined
From this, you would simply start describing top-down how to solve the rowCands problem, until you've answered everything. Note that sometimes you'll want to write a function similar to union, but surely its already been written before. Try checking out http://haskell.org/hoogle. You can search for function names or even type signatures. Maybe there is a union somewhere already written in the standard libraries?
As an interesting question for you to answer yourself, what is the type of undefined and why does it type check? It is not a special keyword; it is merely a predefined function.
Here is a solution using Data.Set. You can use S.elems to get the list, but if you are making a sudoku solver, you might be looking for S.size.
import qualified Data.Set as S
import Data.Maybe(catMaybes)
fullSet = S.fromAscList [1..9]
fromJustL = S.fromList . concatMaybes
candidates s x =
rowSet s x `S.intersection` colSet s x `S.intersection` cellSet s x
rowSet s (i,_) = fullSet `S.difference` fromJustL (s !! i)
colSet s (_,i) = fullSet `S.difference` fromJustL (map (!!i) s)
cellSet s (i,j) = fullSet `S.difference` fromJustL (concatMap (g j) (g i s))
where
g i | i < 3 = take 3
| i < 6 = take 3 . drop 3
| otherwise = take 3 . drop 6

Resources