Haskell - Is there a way to limit the execution time for a given function?

Suppose I have a function in Haskell which may not always terminate. Is there a way to make the program halt itself if it's taking too long to compute?
Example:
import qualified Data.Map as Map
walkmap :: Int -> Map.Map Int Int -> Int
walkmap x m = case Map.lookup x m of
    Nothing -> x
    Just y  -> walkmap y m
main :: IO ()
main = do
    let ma = Map.fromList [(0,1), (1,2)]
    let mb = Map.fromList [(0,1), (1,0)]
    print $ walkmap 0 ma
    print $ walkmap 0 mb
walkmap 0 ma should return 2 right away, but walkmap 0 mb would loop forever. I know it's impossible to know for sure whether the function will halt; what I'd like to know is whether there's a way to set a time limit (say, 10 seconds) for that computation.

The answer to the question as asked looks like this:
timeout (10*1000000) (evaluate (walkmap 0 mb)) >>= print
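For context, here is a minimal, self-contained sketch of how that one-liner could be wired up. It uses timeout from System.Timeout and evaluate from Control.Exception; the helper name limited is made up for this illustration. One caveat I am fairly sure of: GHC delivers the timeout exception only when the thread reaches an allocation point, so a tight loop that never allocates may run past the limit unless the module is compiled with -fno-omit-yields.

```haskell
import Control.Exception (evaluate)
import qualified Data.Map as Map
import System.Timeout (timeout)

walkmap :: Int -> Map.Map Int Int -> Int
walkmap x m = case Map.lookup x m of
  Nothing -> x
  Just y  -> walkmap y m

-- Hypothetical helper: run a pure computation under a time limit given in
-- microseconds. evaluate forces the value to weak head normal form inside IO,
-- so the work happens while the timeout is active rather than later.
limited :: Int -> a -> IO (Maybe a)
limited micros x = timeout micros (evaluate x)
```

In GHCi, limited (10 * 1000000) (walkmap 0 ma) yields Just 2 for the terminating map, and the looping map is expected to give Nothing once the limit expires, subject to the allocation caveat above.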
But the answer to the "avoid cycles in a lookup" question that's behind it is Brent's remarkable tortoise and hare algorithm. (Beware! I have only tested this code on your exact two test cases. There could be bugs lurking in it. You should read about the algorithm behind the link and code review (or re-implement) it yourself.)
walkmap :: Ord a => a -> Map.Map a a -> Maybe a
walkmap a m = case Map.lookup a m of
    Nothing -> Just a
    Just a' -> go a a' (iterate (2*) 1)
  where
    -- p for pause
    go a a' ps | a == a' = Nothing
    go a a' (p:ps) = case Map.lookup a' m of
        Nothing -> Just a'
        Just a''
            | p == 0    -> go a' a'' ps
            | otherwise -> go a  a'' (p-1:ps)
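For comparison, a simpler (but more memory-hungry) way to detect the cycle is to remember every key visited so far. This is a different technique from the constant-space algorithm above, shown only as a sketch; the name walkmapSeen is invented for the illustration.

```haskell
import qualified Data.Map as Map
import qualified Data.Set as Set

-- Follow the chain from x, remembering every key visited so far.
-- Returns Nothing as soon as a key repeats, i.e. the walk has entered a cycle.
walkmapSeen :: Ord a => a -> Map.Map a a -> Maybe a
walkmapSeen = go Set.empty
  where
    go seen x m
      | x `Set.member` seen = Nothing
      | otherwise = case Map.lookup x m of
          Nothing -> Just x
          Just y  -> go (Set.insert x seen) y m
```

It costs memory proportional to the length of the walk, which is the trade-off the pause-doubling version avoids.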

Related

Parallelize computation of mutable vector in ST

How can computations done in ST be made to run in parallel?
I have a vector which needs to be filled in by random access, hence the use of ST, and the computation runs correctly single-threaded, but have been unable to figure out how to use more than one core.
Random access is needed because of the meaning of the indices into the vector. There are n things and every possible way of choosing among n things has an entry in the vector, as in the choice function. Each of these choices corresponds to a binary number (conceptually, a packed [Bool]) and these Int values are the indices. If there are n things, then the size of the vector is 2^n. The natural way the algorithm runs is for every entry corresponding to "n choose 1" to be filled in, then every entry for "n choose 2," etc. The entries corresponding to "n choose k" depend on the entries corresponding to "n choose (k-1)." The integers for the different choices do not occur in numerical order, and that's why random access is needed.
Here's a pointless (but slow) computation that follows the same pattern. The example function shows how I tried to break the computation up so that the bulk of the work is done in a pure world (no ST monad). In the code below, bogus is where most of the work is done, with the intent of calling that in parallel, but only one core is ever used.
import qualified Data.Vector as Vb
import qualified Data.Vector.Mutable as Vm
import qualified Data.Vector.Generic.Mutable as Vg
import qualified Data.Vector.Generic as Gg
import Control.Monad.ST as ST ( ST, runST )
import Data.Foldable(forM_)
import Data.Char(digitToInt)
main :: IO ()
main = do
    putStrLn $ show (example 9)
example :: Int -> Vb.Vector Int
example n = runST $ do
    m <- Vg.new (2^n) :: ST s (Vm.STVector s Int)
    Vg.unsafeWrite m 0 (1)
    forM_ [1..n] $ \i -> do
        p <- prev m n (i-1)
        let newEntries = (choiceList n i) :: [Int]
        forM_ newEntries $ \e -> do
            let v = bogus p e
            Vg.unsafeWrite m e v
    Gg.unsafeFreeze m
choiceList :: Int -> Int -> [Int]
choiceList _ 0 = [0]
choiceList n 1 = [ 2^k | k <- [0..(n-1) ] ]
choiceList n k
    | n == k = [2^n - 1]
    | otherwise = (choiceList (n-1) k) ++ (map ((2^(n-1)) +) $ choiceList (n-1) (k-1))
prev :: Vm.STVector s Int -> Int -> Int -> ST s Integer
prev m n 0 = return 1
prev m n i = do
    let chs = choiceList n i
    v <- mapM (\k -> Vg.unsafeRead m k ) chs
    let e = map (\k -> toInteger k ) v
    return (sum e)
bogus :: Integer -> Int -> Int
bogus prior index = do
    let f = fac prior
    let g = (f^index) :: Integer
    let d = (map digitToInt (show g)) :: [Int]
    let a = fromIntegral (head d)^2
    a
fac :: Integer -> Integer
fac 0 = 1
fac n = n * fac (n - 1)
If anyone tests this, using more than 9 or 10 in show (example 9) will take much longer than you want to wait for such a pointless sequence of numbers.

Just do it in IO. If you need to use the result in pure code, then unsafePerformIO is available.
The following version runs about 3-4 times faster with +RTS -N16 than +RTS -N1. My changes involved converting the ST vectors to IO, changing the forM_ to forConcurrently_, and adding a bang annotation to let !v = bogus ....
Full code:
{-# LANGUAGE BangPatterns #-}
import qualified Data.Vector as Vb
import qualified Data.Vector.Mutable as Vm
import qualified Data.Vector.Generic.Mutable as Vg
import qualified Data.Vector.Generic as Gg
import Control.Monad.ST as ST ( ST, runST )
import Data.Foldable(forM_)
import Data.Char(digitToInt)
import Control.Concurrent.Async
import System.IO.Unsafe
main :: IO ()
main = do
    let m = unsafePerformIO (example 9)
    putStrLn $ show m
example :: Int -> IO (Vb.Vector Int)
example n = do
    m <- Vg.new (2^n)
    Vg.unsafeWrite m 0 (1)
    forM_ [1..n] $ \i -> do
        p <- prev m n (i-1)
        let newEntries = (choiceList n i) :: [Int]
        forConcurrently_ newEntries $ \e -> do
            -- The bang forces bogus in the worker thread instead of storing a thunk.
            let !v = bogus p e
            Vg.unsafeWrite m e v
    Gg.unsafeFreeze m
choiceList :: Int -> Int -> [Int]
choiceList _ 0 = [0]
choiceList n 1 = [ 2^k | k <- [0..(n-1) ] ]
choiceList n k
    | n == k = [2^n - 1]
    | otherwise = (choiceList (n-1) k) ++ (map ((2^(n-1)) +) $ choiceList (n-1) (k-1))
prev :: Vm.IOVector Int -> Int -> Int -> IO Integer
prev m n 0 = return 1
prev m n i = do
    let chs = choiceList n i
    v <- mapM (\k -> Vg.unsafeRead m k ) chs
    let e = map (\k -> toInteger k ) v
    return (sum e)
bogus :: Integer -> Int -> Int
bogus prior index = do
    let f = fac prior
    let g = (f^index) :: Integer
    let d = (map digitToInt (show g)) :: [Int]
    let a = fromIntegral (head d)^2
    a
fac :: Integer -> Integer
fac 0 = 1
fac n = n * fac (n - 1)

I think this cannot be done in a safe way. In the general case, it seems it would break Haskell's referential transparency.
If we could perform multi-threaded computations within ST s, then we could spawn two threads that race over the same STRef s Bool. Let's say one thread is writing False and the other one True.
After we use runST on the computation, we get an expression of type Bool which is sometimes False and sometimes True. That should not be possible.
If you are absolutely certain that your parallelization does not break referential transparency, you could try using unsafe primitives like unsafeIOToST to spawn new threads. Use with extreme care.
There might be safer ways to achieve something similar. Outside ST, we do have some parallelism available in Control.Parallel.Strategies.

There are a number of ways to do parallelization in Haskell. Usually they will give comparable performance improvements; however, some are better than others, and which one fits best mostly depends on the problem being parallelized. This particular use case looked very interesting to me, so I decided to investigate a few approaches.
Approaches
vector-strategies
We are using a boxed vector, therefore we can utilize laziness and built-in spark pool for parallelization. One very simple approach is provided by vector-strategies package, which can iterate over any immutable boxed vector and evaluate all of the thunks in parallel. It is also possible to split the vector in chunks, but as it turns out the chunk size of 1 is the optimal one:
exampleParVector :: Int -> Vb.Vector Int
exampleParVector n = example n `using` parVector 1
parallel
parVector uses par underneath and requires one extra iteration over the vector. In this case we are already iterating over the vector, so it makes more sense to use par from parallel directly. This allows us to perform the computation in parallel while continuing to use the ST monad:
import Control.Parallel (par)
...
forM_ [1..n] $ \i -> do
    p <- prev m n (i-1)
    let newEntries = choiceList n i :: [Int]
    forM_ newEntries $ \e -> do
        let v = bogus p e
        v `par` Vg.unsafeWrite m e v
It is important to note that the computation of each element of the vector is expensive when compared to the total number of elements in the vector. That is why using par is a very good solution here. If it was the opposite, namely the vector was very large, but elements weren't too expensive to compute, it would be better to use an unboxed vector and switch it to a different parallelization method.
async
Another way was described by @K.A.Buhr. Switch to IO from ST and use async:
import Control.Concurrent.Async (forConcurrently_)
...
forM_ [1..n] $ \i -> do
    p <- prev m n (i-1)
    let newEntries = choiceList n i :: [Int]
    forConcurrently_ newEntries $ \e -> do
        let !v = bogus p e
        Vg.unsafeWrite m e v
The concern that @chi has raised is a valid one; however, in this particular implementation it is safe to use unsafePerformIO instead of runST, because parallelization does not violate the invariant of deterministic computation. Namely, we can promise that regardless of the input supplied to the example function, the output will always be exactly the same.
scheduler
Green threads are pretty cheap in Haskell, but they aren't free. The solution above with async package has one slight drawback: it will spin up at least as many threads as there are elements in the newEntries list each time forConcurrently_ is called. It would be better to spin up as many threads as there are capabilities (the -N RTS option) and let them do all the work. For this we can use scheduler package, which is a work stealing scheduler:
import Control.Scheduler (Comp(Par), runBatch_, scheduleWork_, withScheduler_)
...
withScheduler_ Par $ \scheduler ->
    forM_ [1..n] $ \i -> runBatch_ scheduler $ \_ -> do
        p <- prev m n (i-1)
        let newEntries = choiceList n i :: [Int]
        forM_ newEntries $ \e -> scheduleWork_ scheduler $ do
            let !v = bogus p e
            Vg.unsafeWrite m e v
Spark pool in GHC also uses a work stealing scheduler, which is built into RTS and is unrelated to the package above in any shape or form, but the idea is very similar: few threads with many units of computation.
Benchmarks
Here are some benchmarks on a 16-core machine for all of the approaches with example 7 (value 9 takes on the order of seconds, which introduces too much noise for criterion). We only get about a 5x speedup, because a significant part of the algorithm is sequential in nature and can't be parallelized.

How do I memoize?

I have written this function that computes Collatz sequences, and I see wildly varying times of execution depending on the spin I give it. Apparently it is related to something called "memoization", but I have a hard time understanding what it is and how it works, and, unfortunately, the relevant article on HaskellWiki, as well as the papers it links to, have all proven to not be easily surmountable. They discuss intricate details of the relative performance of highly layman-indifferentiable tree constructions, while what I miss must be some very basic, very trivial point that these sources neglect to mention.
This is the code. It is a complete program, ready to be built and executed.
module Main where
import Data.Function
import Data.List (maximumBy)
size :: (Integral a) => a
size = 10 ^ 6
-- Nail the basics.
collatz :: Integral a => a -> a
collatz n | even n = n `div` 2
          | otherwise = n * 3 + 1
recollatz :: Integral a => a -> a
recollatz = fix $ \f x -> if (x /= 1)
                          then f (collatz x)
                          else x
-- Now, I want to do the counting with a tuple monad.
mocollatz :: Integral b => b -> ([b], b)
mocollatz n = ([n], collatz n)
remocollatz :: Integral a => a -> ([a], a)
remocollatz = fix $ \f x -> if x /= 1
                            then f =<< mocollatz x
                            else return x
-- Trivialities.
collatzLength :: Integral a => a -> Int
collatzLength x = (length . fst $ (remocollatz x)) + 1
collatzPairs :: Integral a => a -> [(a, Int)]
collatzPairs n = zip [1..n] (collatzLength <$> [1..n])
longestCollatz :: Integral a => a -> (a, Int)
longestCollatz n = maximumBy order $ collatzPairs n
  where
    order :: Ord b => (a, b) -> (a, b) -> Ordering
    order x y = snd x `compare` snd y
main :: IO ()
main = print $ longestCollatz size
With ghc -O2 it takes about 17 seconds, without ghc -O2 -- about 22 seconds to deliver the length and the seed of the longest Collatz sequence starting at any point below size.
Now, if I make these changes:
diff --git a/Main.hs b/Main.hs
index c78ad95..9607fe0 100644
--- a/Main.hs
+++ b/Main.hs
@@ -1,6 +1,7 @@
 module Main where
 import Data.Function
+import qualified Data.Map.Lazy as M
 import Data.List (maximumBy)
 size :: (Integral a) => a
@@ -22,10 +23,15 @@ recollatz = fix $ \f x -> if (x /= 1)
 mocollatz :: Integral b => b -> ([b], b)
 mocollatz n = ([n], collatz n)
-remocollatz :: Integral a => a -> ([a], a)
-remocollatz = fix $ \f x -> if x /= 1
-                            then f =<< mocollatz x
-                            else return x
+remocollatz :: (Num a, Integral b) => b -> ([b], a)
+remocollatz 1 = return 1
+remocollatz x = case M.lookup x (table mutate) of
+        Nothing -> mutate x
+        Just y  -> y
+    where mutate x = remocollatz =<< mocollatz x
+
+table :: (Ord a, Integral a) => (a -> b) -> M.Map a b
+table f = M.fromList [ (x, f x) | x <- [1..size] ]
 -- Trivialities.
Then it will take just about 4 seconds with ghc -O2, but I would not live long enough to see it complete without ghc -O2.
Looking at the details of cost centres with ghc -prof -fprof-auto -O2 reveals that the first version enters collatz about a hundred million times, while the patched one enters it just about one and a half million times. This must be the reason for the speedup, but I have a hard time understanding the inner workings of this magic. My best idea is that we replace a portion of expensive recursive calls with O(log n) map lookups, but I don't know if it's true and why it depends so much on some godforsaken compiler flags, while, as I see it, such performance swings should all follow solely from the language.
Can I haz an explanation of what happens here, and why the performance differs so vastly between ghc -O2 and plain ghc builds?
P.S. There are two requirements for achieving automagical memoization highlighted elsewhere on Stack Overflow:
Make a function to be memoized a top-level name.
Make a function to be memoized a monomorphic one.
In line with these requirements, I rebuilt remocollatz as follows:
remocollatz :: Int -> ([Int], Int)
remocollatz 1 = return 1
remocollatz x = mutate x
mutate :: Int -> ([Int], Int)
mutate x = remocollatz =<< mocollatz x
Now it's as top level and as monomorphic as it gets. Running time is about 11 seconds, versus the similarly monomorphized table version:
remocollatz :: Int -> ([Int], Int)
remocollatz 1 = return 1
remocollatz x = case M.lookup x (table mutate) of
    Nothing -> mutate x
    Just y  -> y
mutate :: Int -> ([Int], Int)
mutate = \x -> remocollatz =<< mocollatz x
table :: (Int -> ([Int], Int)) -> M.Map Int ([Int], Int)
table f = M.fromList [ (x, f x) | x <- [1..size] ]
-- Running in less than 4 seconds.
I wonder why the memoization ghc is supposedly performing in the first case here is almost 3 times slower than my dumb table.

Can I haz an explanation of what happens here, and why the performance differs so vastly between ghc -O2 and plain ghc builds?
Disclaimer: this is a guess, not verified by viewing GHC core output. A careful answer would do so to verify the conjectures outlined below. You can try peering through it yourself: add -ddump-simpl to your compilation line and you will get copious output detailing exactly what GHC has done to your code.
You write:
remocollatz x = {- ... -} table mutate {- ... -}
    where mutate x = remocollatz =<< mocollatz x
The expression table mutate in fact does not depend on x; but it appears on the right-hand side of an equation that takes x as an argument. Consequently, without optimizations, this table is recomputed each time remocollatz is called (presumably even from inside the computation of table mutate).
With optimizations, GHC notices that table mutate does not depend on x, and floats it to its own definition, effectively producing:
fresh_variable_name = table mutate
    where mutate x = remocollatz =<< mocollatz x
remocollatz x = case M.lookup x fresh_variable_name of
    {- ... -}
The table is therefore computed just once for the entire program run.
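To make the sharing explicit rather than relying on the optimizer, one option is to write the floated-out table by hand as a monomorphic top-level constant. The following is a minimal sketch under that assumption, not the asker's program: the names limit, step, lengths and slowLength are invented here, and it counts sequence terms directly instead of using the tuple monad.

```haskell
import qualified Data.Map.Lazy as M

limit :: Int
limit = 10000

step :: Int -> Int
step n
  | even n    = n `div` 2
  | otherwise = 3 * n + 1

-- A monomorphic top-level constant: it is built at most once per run,
-- whether or not optimizations are on. Data.Map.Lazy keeps the values as
-- thunks, so each entry is computed on first lookup and then reused.
lengths :: M.Map Int Int
lengths = M.fromList [ (n, slowLength n) | n <- [1 .. limit] ]

-- Number of terms in the Collatz sequence starting at n.
collatzLength :: Int -> Int
collatzLength n = case M.lookup n lengths of
  Just len -> len
  Nothing  -> slowLength n   -- values above limit are not cached

slowLength :: Int -> Int
slowLength 1 = 1
slowLength n = 1 + collatzLength (step n)
```

For instance, collatzLength 13 evaluates to 10, matching the ten terms 13, 40, 20, 10, 5, 16, 8, 4, 2, 1.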
don't know why it [the performance] depends so much on some godforsaken compiler flags, while, as I see it, such performance swings should all follow solely from the language.
Sorry, but Haskell doesn't work that way. The language definition tells clearly what the meaning of a given Haskell term is, but does not say anything about the runtime or memory performance needed to compute that meaning.

Another approach to memoization that works in some situations, like this one, is to use a boxed vector, whose elements are computed lazily. The function used to initialize each element can use other elements of the vector in its calculation. As long as the evaluation of an element of the vector doesn't loop and refer to itself, just the elements it recursively depends on will be evaluated. Once evaluated, an element is effectively memoized, and this has the further benefit that elements of the vector that are never referenced are never evaluated.
The Collatz sequence is a nearly ideal application for this technique, but there is one complication. The next Collatz value(s) in sequence from a value under the limit may be outside the limit, which would cause a range error when indexing the vector. I solved this by just iterating through the sequence until back under the limit and counting the steps to do so.
The following program takes 0.77 seconds to run unoptimized and 0.30 when optimized:
import qualified Data.Vector as V
limit = 10 ^ 6 :: Int
-- The Collatz function, which given a value returns the next in the sequence.
nextCollatz val
    | odd val = 3 * val + 1
    | otherwise = val `div` 2
-- Given a value, return the next Collatz value in the sequence that is less
-- than the limit and the number of steps to get there. For example, the
-- sequence starting at 13 is: [13, 40, 20, 10, 5, 16, 8, 4, 2, 1], so if
-- limit is 100, then (nextCollatzWithinLimit 13) is (40, 1), but if limit is
-- 15, then (nextCollatzWithinLimit 13) is (10, 3).
nextCollatzWithinLimit val = (firstInRange, stepsToFirstInRange)
  where
    firstInRange = head rest
    stepsToFirstInRange = 1 + (length biggerThanLimit)
    (biggerThanLimit, rest) = span (>= limit) (tail collatzSeqStartingWithVal)
    collatzSeqStartingWithVal = iterate nextCollatz val
-- A boxed vector holding Collatz length for each index. The collatzFn used
-- to generate the value for each element refers back to other elements of
-- this vector, but since the vector elements are only evaluated as needed and
-- there aren't any loops in the Collatz sequences, the values are calculated
-- only as needed.
collatzVec :: V.Vector Int
collatzVec = V.generate limit collatzFn
  where
    collatzFn :: Int -> Int
    collatzFn index
        | index <= 1 = 1
        | otherwise = (collatzVec V.! nextWithinLimit) + stepsToGetThere
      where
        (nextWithinLimit, stepsToGetThere) = nextCollatzWithinLimit index
main :: IO ()
main = do
    -- Use a fold through the vector to find the longest Collatz sequence under
    -- the limit, and keep track of both the maximum length and the initial
    -- value of the sequence, which is the index.
    let (maxLength, maxIndex) = V.ifoldl' accMaxLen (0, 0) collatzVec
        accMaxLen acc@(accMaxLen, accMaxIndex) index currLen
            | currLen <= accMaxLen = acc
            | otherwise = (currLen, index)
    putStrLn $ "Max Collatz length below " ++ show limit ++ " is "
               ++ show maxLength ++ " at index " ++ show maxIndex

Writing an assembler in Haskell - mapM with state?

I'm writing a very simple two-pass assembler in Haskell and I've come across a scenario that I don't yet have the experience to solve. I think the solution is likely to involve monad transformers, which I don't really understand.
The assembler parses the assembly code into a list of Statements, which are either instructions or labels. Some Statements may refer to labels. The assembler needs to convert the Statements into Instructions, which involves eliminating the labels and substituting the label references with an appropriate value.
I have written the first pass of the assembler, which produces a [(String, Int)] representing a map from labels to addresses. I have also written the following function for translating a Statement into an Instruction:
stmtToInstruction :: Int -> [(String, Int)] -> Statement -> Either String [I.Instruction]
stmtToInstruction addr labels stmt = case stmt of
    ADD d s1 s2  -> Right [I.ADD d s1 s2]
    BEQL s1 s2 l -> case do label <- find (\e -> fst e == l) labels
                            let labelAddr = snd label
                            let relativeAddr = I.ImmS $ fromIntegral (labelAddr - addr)
                            return (I.BEQ s1 s2 relativeAddr) of
                        Just i  -> Right [i]
                        Nothing -> Left $ "Label " ++ l ++ " not defined"
    LABEL _      -> Right []
I've omitted several cases for brevity, but you can see all the possible results here:
ADD always succeeds and produces an instruction
BEQL can either succeed or fail, depending on whether a label is found
LABEL always succeeds, even though it produces no actual instructions
This works as expected. The problem I now have is writing this function:
replaceLabels :: [Statement] -> Either String [I.Instruction]
replaceLabels takes a list of statements, and runs stmtToInstruction on each one. The addr argument to stmtToInstruction must be the length of the [Instruction] accumulated so far. The output may either be a Left String, if one of the label references was invalid, or a Right [I.Instruction], if there were no errors.
mapM :: Monad m => (a -> m b) -> [a] -> m [b] gets us some of the way there, but provides no way to inject the current address into the (a -> m b) function. How do I make this work?

You're right: the StateT monad transformer will do the trick:
imapM :: (Traversable t, Monad m)
      => (Int -> a -> m b) -> t a -> m (t b)
imapM f = flip evalStateT 0 .
  mapM (\a ->
    do
      count <- get
      put $! count + 1
      lift (f count a))
But writing the specialized version for lists might be better:
itraverse :: Applicative f
          => (Int -> a -> f b) -> [a] -> f [b]
itraverse f = go 0 where
  go !_ [] = pure []
  go !count (x:xs) = (:) <$> f count x <*> go (count + 1) xs
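As a quick illustration of how the list version behaves with Either, here is a self-contained sketch. The checkAt function is a made-up stand-in for a per-statement translation that can fail; only itraverse itself comes from the answer.

```haskell
{-# LANGUAGE BangPatterns #-}

-- Index-aware traversal for lists, as defined in the answer.
itraverse :: Applicative f => (Int -> a -> f b) -> [a] -> f [b]
itraverse f = go 0
  where
    go !_ []         = pure []
    go !count (x:xs) = (:) <$> f count x <*> go (count + 1) xs

-- Hypothetical per-element check: reject negative values, reporting the index.
checkAt :: Int -> Int -> Either String (Int, Int)
checkAt i x
  | x < 0     = Left ("negative value at index " ++ show i)
  | otherwise = Right (i, x)
```

With these definitions, itraverse checkAt [5, 6, 7] is Right [(0,5),(1,6),(2,7)], while itraverse checkAt [5, -1, 7] stops at the first failure with Left "negative value at index 1". Note that the index here counts input elements, not emitted instructions, so it would still need adapting to the question's address rule.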

I've implemented a recursive solution that I'm sure is very inefficient. I'd still be interested to see the 'proper' way of doing this.
replaceLabels :: [Statement] -> Either String [I.Instruction]
replaceLabels [] = Right []
replaceLabels stmts@(s:ss) = replaceLabels' labels stmts 0
    where labels = process stmts
replaceLabels' :: [(String, Int)] -> [Statement] -> Int -> Either String [I.Instruction]
replaceLabels' _ [] _ = Right []
replaceLabels' labels (s:ss) addr = do
    instructions <- stmtToInstruction addr labels s
    restInstructions <- replaceLabels' labels ss (addr + length instructions)
    return (instructions ++ restInstructions)

I would start by changing
stmtToInstruction :: Int -> [(String, Int)] -> Statement -> Either String [I.Instruction]
into
stmtToInstruction :: [(String, Int)] -> Statement -> Either String (Int -> [I.Instruction])
That is, moving the function that takes the address into the Right branch of the Either. The reason is that label reference errors seem to be independent of addresses, so it's better to handle reference errors first and then worry about the address stuff in isolation.
This function resolves the references:
resolveRefs :: [(String,Int)] -> [Statement] -> Either String [Int -> [Instruction]]
resolveRefs environment = traverse (stmtToInstruction environment)
(traverse is equivalent to mapM but it only requires an Applicative constraint. They are different functions merely for historical reasons.)
OK, after having handled the errors, let's now focus on the [Int -> [Instruction]] list. It seems that we have to map over it from the left while carrying an accumulated address that we must supply to each function. The mapAccumL function is perfect for this:
resolveAddrs :: [Int -> [Instruction]] -> [Instruction]
resolveAddrs funcs = mconcat . snd $ accumulate funcs
    where
    accumulate :: [Int -> [Instruction]] -> (Int,[[Instruction]])
    accumulate = mapAccumL step 0
    step address func = let is = func address in (address + length is,is)
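Putting the two passes together, a toy end-to-end version might look like the sketch below. The Statement and Instruction types here are simplified stand-ins invented for the example (the real ones are not shown in the question), and assemble is a hypothetical name.

```haskell
import Data.List (mapAccumL)

-- Hypothetical stand-ins for the question's Statement and I.Instruction types.
data Statement = Add | BranchTo String | Label String

data Instruction = IAdd | IBranch Int   -- IBranch carries a relative offset
  deriving (Eq, Show)

-- Step 1: resolve label references, deferring the address-dependent part.
resolve :: [(String, Int)] -> Statement -> Either String (Int -> [Instruction])
resolve _      Add          = Right (const [IAdd])
resolve _      (Label _)    = Right (const [])
resolve labels (BranchTo l) = case lookup l labels of
  Just target -> Right (\addr -> [IBranch (target - addr)])
  Nothing     -> Left ("Label " ++ l ++ " not defined")

-- Step 2: thread the running address through the deferred functions.
assemble :: [(String, Int)] -> [Statement] -> Either String [Instruction]
assemble labels stmts = layout <$> traverse (resolve labels) stmts
  where
    layout = concat . snd . mapAccumL step 0
    step addr f = let is = f addr in (addr + length is, is)
```

For example, assemble [("loop", 0)] [Label "loop", Add, BranchTo "loop"] gives Right [IAdd, IBranch (-1)], because the branch sits at address 1 and the label at address 0.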

Conduit: Join two sources into one

I have two conduit sources A and B which I want to merge into one, yielding:
data Result = Left Int | Value Int | Right Int
merge :: Monad m => Source m Int -> Source m Int -> Source m Result
merge a b = undefined
such that it:
Consumes values from both a and b
Performs some computation to produce Value Int
As a result of computation either a or b could have a leftover
When one of the sequences is exhausted the result source should keep producing either Left or Right values, (depending on which original source still has values) until both sources are exhausted
I tried implementing it with ZipSource such as:
getZipSource (ZipSource (a =$= CL.map Left) <* ZipSource (b =$= CL.map Right))
but I couldn't figure out how to make it alternate between sources (when I do two awaits) and how to deal with leftovers in a way that I described above.
I also looked at sequenceSources but it doesn't seem to help either.
Can something like that be built with Conduit?
A concrete example would be:
Have two (assumed sorted) Int sources
Get values from both and compare
Yield the min value, subtract it from the biggest and put the remaining back to its stream
Repeat.
The expected output would be:
runConduit $ merge (CL.sourceList [10, 20, 30]) (CL.sourceList [6, 4, 20]) $$ CL.take 10
Value 6 -- 10-6 = 4, 6 yielded, 4 goes back to "a"
Value 4 -- 4-4 = 0, both values are fully consumed
Value 20 -- 20-20 = 0, both values are fully consumed
Left 30 -- "b" has no values, but "a" still yielding
[UPDATE]
The best way I found so far is to write something similar to zipSources tweaking its internals as:
go (Done ()) (HaveOutput src close y) = HaveOutput (go (Done ()) src) close (Nothing, Just y)
go (HaveOutput src close x) (Done ()) = HaveOutput (go src (Done ())) close (Just x, Nothing)
Would it be the right way to go?

I ended up doing it this way:
data MergedValue a v b = BackL a v | MergedValue v | BackR v b
data JoinResult a v b = LeftoverL a | JoinValue v | LeftoverR b
joinSources :: Monad m
            => (a -> b -> MergedValue a v b)
            -> Source m a
            -> Source m b
            -> Source m (JoinResult a v b)
joinSources f as bs =
  go (newResumableSource as) (newResumableSource bs)
  where
    go ras rbs = do
      (ras', ma) <- lift $ ras $$++ await
      (rbs', mb) <- lift $ rbs $$++ await
      case (ma, mb) of
        (Nothing, Nothing) -> pure ()
        (Nothing, Just b)  -> yield (LeftoverR b) >> go ras' rbs'
        (Just a, Nothing)  -> yield (LeftoverL a) >> go ras' rbs'
        (Just a, Just b)   -> case f a b of
          BackL x v -> do
            yield (JoinValue v)
            (nxt, _) <- lift $ ras' $$++ leftover x
            go nxt rbs'
          BackR v x -> do
            yield (JoinValue v)
            (nxt, _) <- lift $ rbs' $$++ leftover x
            go ras' nxt
          MergedValue v -> yield (JoinValue v) >> go ras' rbs'
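The merging rule itself can be checked without conduit by modelling the two sources as plain lists. This is only an illustrative sketch of the logic, with pushing a value back onto a list standing in for leftover; joinLists and minDiff are names invented for the example.

```haskell
-- A pure list model of the merging logic (an illustration only; no conduit involved).
data MergedValue a v b = BackL a v | MergedValue v | BackR v b

data JoinResult a v b = LeftoverL a | JoinValue v | LeftoverR b
  deriving (Eq, Show)

joinLists :: (a -> b -> MergedValue a v b) -> [a] -> [b] -> [JoinResult a v b]
joinLists _ []     bs     = map LeftoverR bs
joinLists _ as     []     = map LeftoverL as
joinLists f (a:as) (b:bs) = case f a b of
  BackL x v     -> JoinValue v : joinLists f (x:as) bs   -- push x back on the left
  BackR v x     -> JoinValue v : joinLists f as (x:bs)   -- push x back on the right
  MergedValue v -> JoinValue v : joinLists f as bs

-- The rule from the question: yield the minimum, push the difference back.
minDiff :: Int -> Int -> MergedValue Int Int Int
minDiff a b
  | a > b     = BackL (a - b) b
  | a < b     = BackR a (b - a)
  | otherwise = MergedValue a
```

On the question's inputs, joinLists minDiff [10, 20, 30] [6, 4, 20] produces [JoinValue 6, JoinValue 4, JoinValue 20, LeftoverL 30], which lines up with the expected output listed there.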

Haskell: put in State monad seems to be elided

I'm writing a program to allocate pizzas to people; each person will get one pizza, ideally of their favorite type, unless stock has run out, in which case they are given their next favorite type recursively.
My approach is to compute a ((User, Pizza), Int) for the amount a person would like said pizza, sort those, and recurse through using a state monad to keep inventory counts.
The program is written and type checks:
allocatePizzasImpl :: [((User, Pizza), Int)]
                   -> State [(Pizza, Int)] [(User, Pizza)]
allocatePizzasImpl [] = return []
allocatePizzasImpl ((user, (flavor, _)):ranks) =
    do inventory <- get
       -- this line is never hit
       put $ updateWith inventory (\i -> if i <= 0
                                         then Nothing
                                         else Just $ i - 1) flavor
       next <- allocatePizzasImpl $ filter ((/= user) . fst) ranks
       return $ (user, flavor) : next
and I have a helper function to extract the result:
allocatePizzas :: [Pizza]
               -> [((User, Pizza), Int)]
               -> [(User, Pizza)]
allocatePizzas pizzas rank = fst
                           . runState (allocatePizzasImpl rank)
                           $ buildQuotas pizzas
but the line indicated by -- this line is never hit is... never hit by any GHCi breakpoints; furthermore, if I break on the return call, GHCi says inventory isn't in scope.
When run, the result is assigning the same pizza (with one inventory count) to all users. Something is going wrong, but I have absolutely no idea how to proceed. I'm new to Haskell, so any comments on style would be appreciated as well =)
Thanks!
PS: For completeness, updateWith is defined as:
updateWith :: (Eq a, Eq b)
           => [(a, b)]        -- inventory
           -> (b -> Maybe b)  -- update function; Nothing removes it
           -> a               -- key to update
           -> [(a, b)]
updateWith set update key =
    case lookup key set of
      Just b  -> replace set
                         (unwrapPair (key, update b))
                         (fromMaybe 0 $ elemIndex (key, b) set)
      Nothing -> set
  where replace :: [a] -> Maybe a -> Int -> [a]
        replace [] _ _ = []
        replace (_:xs) (Just val) 0 = val:xs
        replace (_:xs) Nothing 0 = xs
        replace (x:xs) val i = x : (replace xs val $ i - 1)
        unwrapPair :: Monad m => (a, m b) -> m (a, b)
        unwrapPair (a, mb) = do b <- mb
                                return (a, b)

I think your function replace is broken:
replace (_:xs) (Just val) 0 = val:xs
This doesn't pay any attention to the value it's replacing. Wasn't your intention to replace just the pair corresponding to key?
I think you want
updateWith [] e k = []
updateWith ((k', v):kvs) e k
    | k' == k = case e v of
        Just v' -> (k, v'):kvs
        Nothing -> kvs
    | otherwise = (k', v) : updateWith kvs e k

The issue (ignoring other conceptual things mentioned by the commenters) turned out to be that using fst to extract the result from the State would, for some reason, not cause the State to actually be computed. Running the result through seq fixed it.
I'd be interested in knowing why this is the case, though!
Edit: As Daniel Wagner pointed out in the comments, I wasn't actually using inventory, which turned out to be the real bug. Marking this as accepted.
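To show what actually consulting the inventory could look like, here is a simplified sketch. It is not the original program: the data model is changed so that each user comes with a best-first list of preferences, User and Pizza are assumed to be plain strings, and State is imported from Control.Monad.Trans.State in the transformers package.

```haskell
import Control.Monad.Trans.State (State, evalState, get, put)

-- Assumed representations; the question does not show the real types.
type User = String
type Pizza = String

-- Give each user their highest-ranked pizza that is still in stock.
-- Preferences are assumed to be listed best-first per user.
allocate :: [(User, [Pizza])] -> State [(Pizza, Int)] [(User, Maybe Pizza)]
allocate = mapM pick
  where
    pick (user, prefs) = do
      inventory <- get
      -- Consult the inventory read from the state before choosing.
      case [p | p <- prefs, maybe False (> 0) (lookup p inventory)] of
        []      -> pure (user, Nothing)
        (p : _) -> do
          put [(q, if q == p then n - 1 else n) | (q, n) <- inventory]
          pure (user, Just p)
```

Running evalState (allocate users) stock, with one margherita and one funghi in stock, gives the first user who prefers margherita that pizza, the next one funghi, and Nothing to anyone whose choices have all run out.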
