I'm using IO to encapsulate randomness. I'm trying to write a function that iterates a next function n times, where next produces its result wrapped in IO because of the randomness.
Basically, my next function has this signature:
next :: IO Frame -> IO Frame
and I want to start with an initial Frame, then use the same pattern as iterate to get a list [Frame] with length n. Essentially, I'd like to be able to write the following:
runSimulation :: {- parameters -} -> IO [Frame]
runSimulation {- parameters -} = do
{- some setup -}
sequence . take n . iterate next $ firstFrame
Where firstFrame :: IO Frame is formed by doing something like let firstFrame = return (Frame x y z).
The problem I am encountering is that when I run this function, it never exits, so it seems to be running on an infinite loop (since iterate produces an infinite list).
I'm quite new to Haskell, so I'm not sure where I'm going wrong here, or whether my supposition above is correct, namely that the entire infinite list is being executed.
(Update) In case it's helpful, here are the full definitions of Frame, next, and runSimulation:
-- A simulation Frame encapsulates the state of the simulation at some
-- point in "time". That means it contains a list of Agents in that
-- Frame, and a list of the Interactions that occurred in it as well. It
-- also contains the state of the World, as well as an AgentID counter
-- (so we can easily increment for generating new Agents).
data Frame = Frame AgentID [Agent] [Interaction]
deriving Show
-- Generate the next Frame from the current one, including scoring the
-- Agents based on the outcomes *in this Frame*.
-- TODO: add in reproduction.
nextFrame :: Reactor -> World -> IO Frame -> IO Frame
nextFrame react w inp = do
(Frame i agents history) <- inp
interactions <- interactAll react history agents
let scoredAgents = scoreAgents (rewards w) interactions agents
return (Frame i scoredAgents interactions)
-- Run a simulation for a number of iterations
runSimulation :: World -> Reactor -> (Dist, Dist) -> IO [Frame]
runSimulation world react (gen_dist, sel_dist) = do
startingAgents <- spawnAgents (initial_size world) (agentCreatorFactory gen_dist sel_dist)
let firstFrame = return (Frame (length startingAgents) startingAgents [])
next = nextFrame react world
sequence . take (iterations world) . iterate next $ firstFrame
I don't know how much time computing each Frame takes, but I suspect you are doing more work than necessary. The cause is a bit subtle. iterate produces a list of repeated applications of a function. For each element in the list, the previous value is reused. Your list is composed of IO actions. The IO action at position n is computed from the already obtained IO action at position n-1 by applying next.
Alas, when executing those actions, we are not so lucky. Executing the action at position n in the list will repeat all the work of the previous actions! We shared work when building the actions themselves (which are values, like almost everything in Haskell) but not when executing them, which is a different thing.
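To make the repeated work visible, here is a small sketch of my own (not code from the question) that counts how many times the underlying step really runs when sequencing the first 10 iterated actions:

import Data.IORef

-- A toy step in the same shape as `next :: IO Frame -> IO Frame`.
-- Executing the action at position k re-runs all k earlier steps, so
-- sequencing the first n actions performs about n^2/2 step executions.
main :: IO ()
main = do
  counter <- newIORef (0 :: Int)
  let step :: IO Int -> IO Int
      step prev = do
        x <- prev
        modifyIORef' counter (+ 1)
        return (x + 1)
  _ <- sequence . take 10 . iterate step $ return 0
  readIORef counter >>= print   -- prints 45, not the 9 one might expect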
The simplest solution could be to define this auxiliary function with a baked-in limit:
iterateM :: Monad m => (a -> m a) -> a -> Int -> m [a]
iterateM step = go
where
go _ 0 = return []
go current limit =
do next <- step current
(current:) <$> go next (pred limit)
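With this helper, the simulation from the question could be written roughly as follows (a sketch that reuses the question's definitions and assumes nextFrame is changed to take a plain Frame instead of an IO Frame):

-- Hypothetical rewrite, assuming nextFrame :: Reactor -> World -> Frame -> IO Frame
runSimulation :: World -> Reactor -> (Dist, Dist) -> IO [Frame]
runSimulation world react (gen_dist, sel_dist) = do
  startingAgents <- spawnAgents (initial_size world)
                                (agentCreatorFactory gen_dist sel_dist)
  let firstFrame = Frame (length startingAgents) startingAgents []
  iterateM (nextFrame react world) firstFrame (iterations world)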
While simple, it's a bit inelegant, for two reasons:
It conflates the iteration process with the limiting of that process. In the pure list world we didn't have to do that: we could create infinite lists and take from them. But now, in the effectful world, that nice compositionality seems to be lost.
What if we want to do something with each value as it is being produced, without having to wait for all of them? Our function returns everything at the end, in one go.
As mentioned in the comments, streaming libraries like "conduit", "streamly" or "streaming" try to solve this problem in a better way, regaining some of the compositionality of pure lists. These libraries have types that represent effectful processes whose results are yielded piecewise.
For example, consider the function Streaming.Prelude.iterateM from "streaming", specialized to IO:
iterateM :: (a -> IO a) -> IO a -> Stream (Of a) IO r
It returns a Stream that we can "limit" using Streaming.Prelude.take:
take :: Int -> Stream (Of a) IO r -> Stream (Of a) IO ()
after limiting it we can get back to IO [a] with Streaming.Prelude.toList_ which accumulates all results:
toList_ :: Stream (Of a) IO r -> IO [a]
But instead of that we could process each element as it is being produced, with functions like Streaming.Prelude.mapM_:
mapM_ :: (a -> IO x) -> Stream (Of a) IO r -> IO r
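Putting those pieces together, a streaming version of the simulation loop might look like this (a sketch that assumes a step of type Frame -> IO Frame and the question's Frame type):

import qualified Streaming.Prelude as S

-- Stream n frames from an initial IO Frame, collecting them into a list.
runSimulationStream :: Int -> IO Frame -> (Frame -> IO Frame) -> IO [Frame]
runSimulationStream n firstFrame step =
  S.toList_ . S.take n $ S.iterateM step firstFrame

-- Or, to act on each frame as soon as it is produced instead of collecting:
-- S.mapM_ print . S.take n $ S.iterateM step firstFrame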
An elementary solution:
As an alternative to @danidiaz's answer, it is possible to solve the problem without resorting to extra libraries such as Streaming, assuming the role of IO can be minimized.
Most of the required code can be written in terms of the MonadRandom class, of which IO is just one instance. It is not necessary to use IO in order to generate pseudo-random numbers.
The required iteration function can be written like this, in do notation:
import System.Random
import Control.Monad.Random.Lazy
iterateM1 :: MonadRandom mr => (a -> mr a) -> a -> mr [a]
iterateM1 fn x0 =
do
y <- fn x0
ys <- iterateM1 fn y
return (x0:ys)
Unfortunately, the text of the question does not define exactly what a Frame object is, or what the next stepping function does; so I have to somehow fill in the blanks. Also the next name gets defined in the libraries involved, so I will have to use nextFrame instead of just next.
Let's assume that a Frame object is just a point in 3-dimensional space, and that at each random step, one and only one of the 3 dimensions is chosen at random, and the corresponding coordinate is bumped by an amount of either +1 or -1, with equal probabilities.
This gives this code:
data Frame = Frame Int Int Int deriving Show
nextFrame :: MonadRandom mr => Frame -> mr Frame
nextFrame (Frame x y z) =
do
-- 3 dimensions times 2 possible steps: 1 & -1, hence 6 possibilities
n <- getRandomR (0::Int, 5::Int)
let fr = case n of
0 -> Frame (x-1) y z
1 -> Frame (x+1) y z
2 -> Frame x (y-1) z
3 -> Frame x (y+1) z
4 -> Frame x y (z-1)
5 -> Frame x y (z+1)
_ -> Frame x y z
return fr
At that point, it is not difficult to write code that builds an unlimited list of Frame objects representing the simulation history. Creating that list does not cause the code to loop forever, and the usual take function can be used to select the first few elements of such a list.
Putting all the code together:
import System.Random
import Control.Monad.Random.Lazy
iterateM1 :: MonadRandom mr => (a -> mr a) -> a -> mr [a]
iterateM1 fn x0 =
do
y <- fn x0
ys <- iterateM1 fn y
return (x0:ys)
data Frame = Frame Int Int Int deriving Show
nextFrame :: MonadRandom mr => Frame -> mr Frame
nextFrame (Frame x y z) =
do
-- 3 dimensions times 2 possible steps: 1 & -1, hence 6 possibilities
n <- getRandomR (0::Int, 5::Int)
let fr = case n of
0 -> Frame (x-1) y z
1 -> Frame (x+1) y z
2 -> Frame x (y-1) z
3 -> Frame x (y+1) z
4 -> Frame x y (z-1)
5 -> Frame x y (z+1)
_ -> Frame x y z
return fr
runSimulation :: MonadRandom mr => Int -> Int -> Int -> mr [Frame]
runSimulation x y z = let fr0 = Frame x y z in iterateM1 nextFrame fr0
main = do
rng0 <- getStdGen -- PRNG hosted in IO monad
-- Could use mkStdGen or MkTFGen instead
let
sim = runSimulation 0 0 0
allFrames = evalRand sim rng0 -- unlimited list of frames !
frameCount = 10
frames = take frameCount allFrames
mapM_ (putStrLn . show) frames
Program execution:
$ ./frame
Frame 0 0 0
Frame 0 1 0
Frame 0 0 0
Frame 0 (-1) 0
Frame 1 (-1) 0
Frame 1 (-2) 0
Frame 1 (-1) 0
Frame 1 (-1) 1
Frame 1 0 1
Frame 2 0 1
$
For large values of frameCount, execution time is a quasi-linear function of frameCount, as expected.
More on monadic actions for random number generation here.
Related
I have a random number generator
rand :: Int -> Int -> IO Int
rand low high = getStdRandom (randomR (low,high))
and a helper function to remove an element from a list
removeItem _ [] = []
removeItem x (y:ys) | x == y = removeItem x ys
| otherwise = y : removeItem x ys
I want to shuffle a given list by randomly picking an item from the list, removing it and adding it to the front of the list. I tried
shuffleList :: [a] -> IO [a]
shuffleList [] = []
shuffleList l = do
y <- rand 0 (length l)
return( y:(shuffleList (removeItem y l) ) )
But can't get it to work. I get
hw05.hs:25:33: error:
* Couldn't match expected type `[Int]' with actual type `IO [Int]'
* In the second argument of `(:)', namely
....
Any idea ?
Thanks!
Since shuffleList :: [a] -> IO [a], we have shuffleList (xs :: [a]) :: IO [a].
Obviously, we can't cons (:) :: a -> [a] -> [a] an a element onto an IO [a] value, but instead we want to cons it onto the list [a], the computation of which that IO [a] value describes:
do
y <- rand 0 (length l)
-- return ( y : (shuffleList (removeItem y l) ) )
shuffled <- shuffleList (removeItem y l)
return (y : shuffled)
In do notation, values to the right of <- have types M a, M b, etc., for some monad M (here, IO), and values to the left of <- have the corresponding types a, b, etc..
The x :: a in x <- mx gets bound to the pure value of type a produced / computed by the M-type computation which the value mx :: M a denotes, when that computation is actually performed, as a part of the combined computation represented by the whole do block, when that combined computation is performed as a whole.
And if e.g. the next line in that do block is y <- foo x, it means that a pure function foo :: a -> M b is applied to x and the result is calculated which is a value of type M b, denoting an M-type computation which then runs and produces / computes a pure value of type b to which the name y is then bound.
The essence of Monad is thus this slicing of the pure in between the (potentially) impure: two timelines going on, one of pure calculations and one of potentially impure computations, with the pure world safely separated and isolated from the impurities of the real world. Or, seen from the other side, the pure code is being run by the real, impure code interacting with the real world (in case M is IO). Which is what computer programs must do, after all.
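As a side note, that do block is just sugar for nested >>= calls; here is a sketch of the desugaring (monomorphic, since rand produces Int, and still using the by-value removeItem, which the next point addresses):

shuffleList :: [Int] -> IO [Int]
shuffleList [] = return []
shuffleList l =
  rand 0 (length l) >>= \y ->                    -- y <- rand 0 (length l)
  shuffleList (removeItem y l) >>= \shuffled ->  -- shuffled <- ...
  return (y : shuffled)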
Your removeItem is wrong. You should pick and remove items positionally, i.e. by index, not by value; and in any case not remove more than one item after having picked one item from the list.
The y in y <- rand 0 (length l) is indeed an index. Treat it as such. Rename it to i, too, as a simple mnemonic.
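Putting both fixes together, an index-based version might look like this (my own sketch; splitAt does the positional removal):

shuffleList :: [a] -> IO [a]
shuffleList [] = return []
shuffleList l = do
  i <- rand 0 (length l - 1)            -- i is an index into l
  let (before, x : after) = splitAt i l -- remove the element at position i
  rest <- shuffleList (before ++ after)
  return (x : rest)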
Generally, with Haskell it works better to maximize the amount of functional code at the expense of non-functional (IO or randomness-related) code.
In your situation, your “maximum” functional component is not removeItem but rather a version of shuffleList that takes the input list and (as mentioned by Will Ness) a deterministic integer position. The list function splitAt :: Int -> [a] -> ([a], [a]) can come in handy here. Like this:
funcShuffleList :: Int -> [a] -> [a]
funcShuffleList _ [] = []
funcShuffleList pos ls =
if (pos <=0) || (length(take (pos+1) ls) < (pos+1))
then ls -- pos is zero or out of bounds, so leave list unchanged
else let (left,right) = splitAt pos ls
in (head right) : (left ++ (tail right))
Testing:
λ>
λ> funcShuffleList 4 [0,1,2,3,4,5,6,7,8,9]
[4,0,1,2,3,5,6,7,8,9]
λ>
λ> funcShuffleList 5 "#ABCDEFGH"
"E#ABCDFGH"
λ>
Once you've got this, you can introduce the randomness concerns in a simpler fashion. And you do not need to involve IO explicitly, as any randomness-friendly monad will do:
shuffleList :: MonadRandom mr => [a] -> mr [a]
shuffleList [] = return []
shuffleList ls =
do
let maxPos = (length ls) - 1
pos <- getRandomR (0, maxPos)
return (funcShuffleList pos ls)
... IO being just one instance of MonadRandom.
You can run the code using the default IO-hosted random number generator:
main = do
let inpList = [0,1,2,3,4,5,6,7,8]::[Integer]
putStrLn $ "inpList = " ++ (show inpList)
-- mr automatically instantiated to IO:
outList1 <- shuffleList inpList
putStrLn $ "outList1 = " ++ (show outList1)
outList2 <- shuffleList outList1
putStrLn $ "outList2 = " ++ (show outList2)
Program output:
$ pickShuffle
inpList = [0,1,2,3,4,5,6,7,8]
outList1 = [6,0,1,2,3,4,5,7,8]
outList2 = [8,6,0,1,2,3,4,5,7]
$
$ pickShuffle
inpList = [0,1,2,3,4,5,6,7,8]
outList1 = [4,0,1,2,3,5,6,7,8]
outList2 = [2,4,0,1,3,5,6,7,8]
$
The output is not reproducible here, because the default generator is seeded by its launch time in nanoseconds.
If what you need is a full random permutation, you could have a look here and there - Knuth a.k.a. Fisher-Yates algorithm.
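For completeness, here is a minimal Fisher-Yates sketch in the same MonadRandom style (my own sketch, not the implementation behind those links), using a Map so that swapping by index stays cheap:

import qualified Data.Map as Map
import Control.Monad.Random.Lazy (MonadRandom, getRandomR)

-- Walk the indices from the end, swapping each with a random index at or
-- below it. Map.elems returns the values back in index order at the end.
fisherYates :: MonadRandom mr => [a] -> mr [a]
fisherYates xs = go (Map.fromList (zip [0 ..] xs)) (length xs - 1)
  where
    go m i
      | i <= 0    = return (Map.elems m)
      | otherwise = do
          j <- getRandomR (0, i)
          let xi = m Map.! i
              xj = m Map.! j
          go (Map.insert j xi (Map.insert i xj m)) (i - 1)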
I'm trying to write a function that, for a given n, produces an n*n matrix with unique rows and columns (a Latin square).
I have a function that gives me a list of strings "1" .. "2" .. "n":
numSymbol :: Int -> [String]
I tried generating all permutations of this, then all n-length tuples of permutations, and then checking whether the rows and columns are unique. The (n!)^2 complexity works fine for 2 and 3, but with n > 3 it takes forever. Is it possible to build a Latin square from permutations directly, for example from
permutation ( numSymbol 3) = [["1","2","3"],["1","3","2"],["2","1","3"],["2","3","1"],["3","1","2"],["3","2","1"]]
get
[[["1","2","3",],["2","1","3"],["3","1","2"]] , ....]
without generating list like [["1",...],["1",...],...], when we know first element disqualify it ?
Note: since we can easily take a Latin square that's been filled with numbers from 1 to n and re-label it with anything we want, we can write code that uses integer symbols without giving anything away, so let's stick with that.
Anyway, the stateful backtracking/nondeterministic monad:
type StateList s = StateT s []
is helpful for this sort of problem.
Here's the idea. We know that every symbol s is going to appear exactly once in each row r, so we can represent this with an urn of all possible ordered pairs (r,s):
my_rs_urn = [(r,s) | r <- [1..n], s <- [1..n]]
Similarly, as every symbol s appears exactly once in each column c, we can use a second urn:
my_cs_urn = [(c,s) | c <- [1..n], s <- [1..n]]
Creating a Latin square is a matter of filling in each position (r,c) with a symbol s by removing matching balls (r,s) and (c,s) (i.e., removing two balls, one from each urn) so that every ball is used exactly once. Our state will be the contents of the urns.
We need backtracking because we might reach a point where for a particular position (r,c), there is no s such that (r,s) and (c,s) are both still available in their respective urns. Also, a pleasant side-effect of list-based backtracking/nondeterminism is that it'll generate all possible Latin squares, not just the first one it finds.
Given this, our state will look like:
type Urn = [(Int,Int)]
data S = S
{ size :: Int
, rs :: Urn
, cs :: Urn }
I've included the size in the state for convenience. It won't ever be modified, so it actually ought to be in a Reader instead, but this is simpler.
We'll represent a square by a list of cell contents in row-major order (i.e., the symbols in positions [(1,1),(1,2),...,(1,n),(2,1),...,(n,n)]):
data Square = Square
Int -- square size
[Int] -- symbols in row-major order
deriving (Show)
Now, the monadic action to generate Latin squares will look like this:
type M = StateT S []
latin :: M Square
latin = do
n <- gets size
-- for each position (r,c), get a valid symbol `s`
cells <- forM (pairs n) (\(r,c) -> getS r c)
return $ Square n cells
pairs :: Int -> [(Int,Int)]
pairs n = -- same as [(x,y) | x <- [1..n], y <- [1..n]]
(,) <$> [1..n] <*> [1..n]
The worker function getS picks an s so that (r,s) and (c,s) are available in the respective urns, removing those pairs from the urns as a side effect. Note that getS is written non-deterministically, so it'll try every possible way of picking an s and associated balls from the urns:
getS :: Int -> Int -> M Int
getS r c = do
-- try each possible `s` in the row
s <- pickSFromRow r
-- can we put `s` in this column?
pickCS c s
-- if so, `s` is good
return s
Most of the work is done by the helpers pickSFromRow and pickCS. The first, pickSFromRow picks an s from the given row:
pickSFromRow :: Int -> M Int
pickSFromRow r = do
balls <- gets rs
-- "lift" here non-determinstically picks balls
((r',s), rest) <- lift $ choices balls
-- only consider balls in matching row
guard $ r == r'
-- remove the ball
modify (\st -> st { rs = rest })
-- return the candidate "s"
return s
It uses a choices helper which generates every possible way of pulling one element out of a list:
choices :: [a] -> [(a,[a])]
choices = init . (zipWith f <$> inits <*> tails)
where f a (x:b) = (x, a++b)
f _ _ = error "choices: internal error"
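For instance, a quick check of what choices produces (the result I would expect in GHCi, given the definition above):
λ> choices [1,2,3]
[(1,[2,3]),(2,[1,3]),(3,[1,2])]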
The second, pickCS checks if (c,s) is available in the cs urn, and removes it if it is:
pickCS :: Int -> Int -> M ()
pickCS c s = do
balls <- gets cs
-- only continue if the required ball is available
guard $ (c,s) `elem` balls
-- remove the ball
modify (\st -> st { cs = delete (c,s) balls })
With an appropriate driver for our monad:
runM :: Int -> M a -> [a]
runM n act = evalStateT act (S n p p)
where p = pairs n
this can generate all 12 Latin squares of size 3:
λ> runM 3 latin
[Square 3 [1,2,3,2,3,1,3,1,2],Square 3 [1,2,3,3,1,2,2,3,1],...]
or the 576 Latin squares of size 4:
λ> length $ runM 4 latin
576
Compiled with -O2, it's fast enough to enumerate all 161280 squares of size 5 in a couple seconds:
main :: IO ()
main = print $ length $ runM 5 latin
The list-based urn representation above isn't very efficient. On the other hand, because the lengths of the lists are pretty small, there's not that much to be gained by finding more efficient representations.
Nonetheless, here's complete code that uses efficient Map/Set representations tailored to the way the rs and cs urns are used. Compiled with -O2, it runs in constant space. For n=6, it can process about 100000 Latin squares per second, but that still means it'll need to run for a few hours to enumerate all 800 million of them.
{-# OPTIONS_GHC -Wall #-}
module LatinAll where
import Control.Monad.State
import Data.List
import Data.Set (Set)
import qualified Data.Set as Set
import Data.Map (Map, (!))
import qualified Data.Map as Map
data S = S
{ size :: Int
, rs :: Map Int [Int]
, cs :: Set (Int, Int) }
data Square = Square
Int -- square size
[Int] -- symbols in row-major order
deriving (Show)
type M = StateT S []
-- Get Latin squares
latin :: M Square
latin = do
n <- gets size
cells <- forM (pairs n) (\(r,c) -> getS r c)
return $ Square n cells
-- All locations in row-major order [(1,1),(1,2)..(n,n)]
pairs :: Int -> [(Int,Int)]
pairs n = (,) <$> [1..n] <*> [1..n]
-- Get a valid `s` for position `(r,c)`.
getS :: Int -> Int -> M Int
getS r c = do
s <- pickSFromRow r
pickCS c s
return s
-- Get an available `s` in row `r` from the `rs` urn.
pickSFromRow :: Int -> M Int
pickSFromRow r = do
urn <- gets rs
(s, rest) <- lift $ choices (urn ! r)
modify (\st -> st { rs = Map.insert r rest urn })
return s
-- Remove `(c,s)` from the `cs` urn.
pickCS :: Int -> Int -> M ()
pickCS c s = do
balls <- gets cs
guard $ (c,s) `Set.member` balls
modify (\st -> st { cs = Set.delete (c,s) balls })
-- Return all ways of removing one element from list.
choices :: [a] -> [(a,[a])]
choices = init . (zipWith f <$> inits <*> tails)
where f a (x:b) = (x, a++b)
f _ _ = error "choices: internal error"
-- Run an action in the M monad.
runM :: Int -> M a -> [a]
runM n act = evalStateT act (S n rs0 cs0)
where rs0 = Map.fromAscList $ zip [1..n] (repeat [1..n])
cs0 = Set.fromAscList $ pairs n
main :: IO ()
main = do
print $ runM 3 latin
print $ length (runM 4 latin)
print $ length (runM 5 latin)
Somewhat remarkably, modifying the program to produce only reduced Latin squares (i.e., with symbols [1..n] in order in both the first row and the first column) requires changing only two functions:
-- All locations in row-major order, skipping first row and column
-- i.e., [(2,2),(2,3)..(n,n)]
pairs :: Int -> [(Int,Int)]
pairs n = (,) <$> [2..n] <*> [2..n]
-- Run an action in the M monad.
runM :: Int -> M a -> [a]
runM n act = evalStateT act (S n rs0 cs0)
where -- skip balls [(1,1)..(n,n)] for first row
rs0 = Map.fromAscList $ map (\r -> (r, skip r)) [2..n]
-- skip balls [(1,1)..(n,n)] for first column
cs0 = Set.fromAscList $ [(c,s) | c <- [2..n], s <- skip c]
skip i = [1..(i-1)]++[(i+1)..n]
With these modifications, the resulting Square will include symbols in row-major order but skipping the first row and column. For example:
λ> runM 3 latin
[Square 3 [3,1,1,2]]
means:
1 2 3 fill in question marks 1 2 3
2 ? ? =====================> 2 3 1
3 ? ? in row-major order 3 1 2
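If you want to recover the full square, a small helper can re-attach the fixed first row and column (my own sketch, not part of the original answer):

-- Expand a reduced Square (cells for rows/columns 2..n in row-major order)
-- back into full rows, re-attaching the first row [1..n] and first column.
expandReduced :: Square -> [[Int]]
expandReduced (Square n cells) =
  [1 .. n] : zipWith (:) [2 .. n] (chunksOf (n - 1) cells)
  where
    chunksOf _ [] = []
    chunksOf k xs = let (a, b) = splitAt k xs in a : chunksOf k b

For the Square 3 [3,1,1,2] above this yields [[1,2,3],[2,3,1],[3,1,2]], matching the diagram.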
This is fast enough to enumerate all 16,942,080 reduced Latin squares of size 7 in a few minutes:
$ stack ghc -- -O2 -main-is LatinReduced LatinReduced.hs && time ./LatinReduced
[1 of 1] Compiling LatinReduced ( LatinReduced.hs, LatinReduced.o )
Linking LatinReduced ...
16942080
real 3m9.342s
user 3m8.494s
sys 0m0.848s
I have written this function that computes Collatz sequences, and I see wildly varying times of execution depending on the spin I give it. Apparently it is related to something called "memoization", but I have a hard time understanding what it is and how it works, and, unfortunately, the relevant article on HaskellWiki, as well as the papers it links to, have all proven to not be easily surmountable. They discuss intricate details of the relative performance of highly layman-indifferentiable tree constructions, while what I miss must be some very basic, very trivial point that these sources neglect to mention.
This is the code. It is a complete program, ready to be built and executed.
module Main where
import Data.Function
import Data.List (maximumBy)
size :: (Integral a) => a
size = 10 ^ 6
-- Nail the basics.
collatz :: Integral a => a -> a
collatz n | even n = n `div` 2
| otherwise = n * 3 + 1
recollatz :: Integral a => a -> a
recollatz = fix $ \f x -> if (x /= 1)
then f (collatz x)
else x
-- Now, I want to do the counting with a tuple monad.
mocollatz :: Integral b => b -> ([b], b)
mocollatz n = ([n], collatz n)
remocollatz :: Integral a => a -> ([a], a)
remocollatz = fix $ \f x -> if x /= 1
then f =<< mocollatz x
else return x
-- Trivialities.
collatzLength :: Integral a => a -> Int
collatzLength x = (length . fst $ (remocollatz x)) + 1
collatzPairs :: Integral a => a -> [(a, Int)]
collatzPairs n = zip [1..n] (collatzLength <$> [1..n])
longestCollatz :: Integral a => a -> (a, Int)
longestCollatz n = maximumBy order $ collatzPairs n
where
order :: Ord b => (a, b) -> (a, b) -> Ordering
order x y = snd x `compare` snd y
main :: IO ()
main = print $ longestCollatz size
With ghc -O2 it takes about 17 seconds, without ghc -O2 -- about 22 seconds to deliver the length and the seed of the longest Collatz sequence starting at any point below size.
Now, if I make these changes:
diff --git a/Main.hs b/Main.hs
index c78ad95..9607fe0 100644
--- a/Main.hs
+++ b/Main.hs
@@ -1,6 +1,7 @@
module Main where
import Data.Function
+import qualified Data.Map.Lazy as M
import Data.List (maximumBy)
size :: (Integral a) => a
@@ -22,10 +23,15 @@ recollatz = fix $ \f x -> if (x /= 1)
mocollatz :: Integral b => b -> ([b], b)
mocollatz n = ([n], collatz n)
-remocollatz :: Integral a => a -> ([a], a)
-remocollatz = fix $ \f x -> if x /= 1
- then f =<< mocollatz x
- else return x
+remocollatz :: (Num a, Integral b) => b -> ([b], a)
+remocollatz 1 = return 1
+remocollatz x = case M.lookup x (table mutate) of
+ Nothing -> mutate x
+ Just y -> y
+ where mutate x = remocollatz =<< mocollatz x
+
+table :: (Ord a, Integral a) => (a -> b) -> M.Map a b
+table f = M.fromList [ (x, f x) | x <- [1..size] ]
-- Trivialities.
Then it will take just about 4 seconds with ghc -O2, but I would not live long enough to see it complete without ghc -O2.
Looking at the details of cost centres with ghc -prof -fprof-auto -O2 reveals that the first version enters collatz about a hundred million times, while the patched one enters it only about one and a half million times. This must be the reason for the speedup, but I have a hard time understanding the inner workings of this magic. My best guess is that we replace a portion of expensive recursive calls with O(log n) map lookups, but I don't know whether that's true, nor why it depends so much on some godforsaken compiler flags, while, as I see it, such performance swings should follow solely from the language.
Can I haz an explanation of what happens here, and why the performance differs so vastly between ghc -O2 and plain ghc builds?
P.S. There are two requirements for achieving the automagical memoization highlighted elsewhere on Stack Overflow:
Make a function to be memoized a top-level name.
Make a function to be memoized a monomorphic one.
In line with these requirements, I rebuilt remocollatz as follows:
remocollatz :: Int -> ([Int], Int)
remocollatz 1 = return 1
remocollatz x = mutate x
mutate :: Int -> ([Int], Int)
mutate x = remocollatz =<< mocollatz x
Now it's as top level and as monomorphic as it gets. Running time is about 11 seconds, versus the similarly monomorphized table version:
remocollatz :: Int -> ([Int], Int)
remocollatz 1 = return 1
remocollatz x = case M.lookup x (table mutate) of
Nothing -> mutate x
Just y -> y
mutate :: Int -> ([Int], Int)
mutate = \x -> remocollatz =<< mocollatz x
table :: (Int -> ([Int], Int)) -> M.Map Int ([Int], Int)
table f = M.fromList [ (x, f x) | x <- [1..size] ]
This one runs in less than 4 seconds.
I wonder why the memoization ghc is supposedly performing in the first case here is almost 3 times slower than my dumb table.
Can I haz an explanation of what happens here, and why the performance differs so vastly between ghc -O2 and plain ghc builds?
Disclaimer: this is a guess, not verified by viewing GHC core output. A careful answer would do so to verify the conjectures outlined below. You can try peering through it yourself: add -ddump-simpl to your compilation line and you will get copious output detailing exactly what GHC has done to your code.
You write:
remocollatz x = {- ... -} table mutate {- ... -}
where mutate x = remocollatz =<< mocollatz x
The expression table mutate in fact does not depend on x; but it appears on the right-hand side of an equation that takes x as an argument. Consequently, without optimizations, this table is recomputed each time remocollatz is called (presumably even from inside the computation of table mutate).
With optimizations, GHC notices that table mutate does not depend on x, and floats it to its own definition, effectively producing:
fresh_variable_name = table mutate
where mutate x = remocollatz =<< mocollatz x
remocollatz x = case M.lookup x fresh_variable_name of
{- ... -}
The table is therefore computed just once for the entire program run.
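If you would rather not depend on the optimizer performing that floating, you can make the sharing explicit by giving the table a top-level name yourself (a sketch in the spirit of the question's monomorphized version; memoTable is a name I am introducing):

memoTable :: M.Map Int ([Int], Int)
memoTable = M.fromList [ (x, mutate x) | x <- [1 .. size] ]

remocollatz :: Int -> ([Int], Int)
remocollatz 1 = return 1
remocollatz x = case M.lookup x memoTable of
  Nothing -> mutate x
  Just y  -> y

mutate :: Int -> ([Int], Int)
mutate x = remocollatz =<< mocollatz x

Being a top-level constant (a CAF), memoTable is computed at most once per program run regardless of the optimization level.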
don't know why it [the performance] depends so much on some godforsaken compiler flags, while, as I see it, such performance swings should all follow solely from the language.
Sorry, but Haskell doesn't work that way. The language definition tells clearly what the meaning of a given Haskell term is, but does not say anything about the runtime or memory performance needed to compute that meaning.
Another approach to memoization that works in some situations, like this one, is to use a boxed vector, whose elements are computed lazily. The function used to initialize each element can use other elements of the vector in its calculation. As long as the evaluation of an element of the vector doesn't loop and refer to itself, just the elements it recursively depends on will be evaluated. Once evaluated, an element is effectively memoized, and this has the further benefit that elements of the vector that are never referenced are never evaluated.
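To see the pattern in isolation before the full Collatz program below, here is a minimal sketch of the same lazily-filled-vector idea applied to Fibonacci (my example, not part of the answer):

import qualified Data.Vector as V

-- Each element of the boxed vector is a thunk that may read other elements
-- of the same vector; it is evaluated (and thereby memoized) on first use.
fibTable :: Int -> V.Vector Integer
fibTable n = table
  where
    table = V.generate n fib
    fib 0 = 0
    fib 1 = 1
    fib i = table V.! (i - 1) + table V.! (i - 2)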
The Collatz sequence is a nearly ideal application for this technique, but there is one complication. The next Collatz value(s) in sequence from a value under the limit may be outside the limit, which would cause a range error when indexing the vector. I solved this by just iterating through the sequence until back under the limit and counting the steps to do so.
The following program takes 0.77 seconds to run unoptimized and 0.30 when optimized:
import qualified Data.Vector as V
limit = 10 ^ 6 :: Int
-- The Collatz function, which given a value returns the next in the sequence.
nextCollatz val
| odd val = 3 * val + 1
| otherwise = val `div` 2
-- Given a value, return the next Collatz value in the sequence that is less
-- than the limit and the number of steps to get there. For example, the
-- sequence starting at 13 is: [13, 40, 20, 10, 5, 16, 8, 4, 2, 1], so if
-- limit is 100, then (nextCollatzWithinLimit 13) is (40, 1), but if limit is
-- 15, then (nextCollatzWithinLimit 13) is (10, 3).
nextCollatzWithinLimit val = (firstInRange, stepsToFirstInRange)
where
firstInRange = head rest
stepsToFirstInRange = 1 + (length biggerThanLimit)
(biggerThanLimit, rest) = span (>= limit) (tail collatzSeqStartingWithVal)
collatzSeqStartingWithVal = iterate nextCollatz val
-- A boxed vector holding Collatz length for each index. The collatzFn used
-- to generate the value for each element refers back to other elements of
-- this vector, but since the vector elements are only evaluated as needed and
-- there aren't any loops in the Collatz sequences, the values are calculated
-- only as needed.
collatzVec :: V.Vector Int
collatzVec = V.generate limit collatzFn
where
collatzFn :: Int -> Int
collatzFn index
| index <= 1 = 1
| otherwise = (collatzVec V.! nextWithinLimit) + stepsToGetThere
where
(nextWithinLimit, stepsToGetThere) = nextCollatzWithinLimit index
main :: IO ()
main = do
-- Use a fold through the vector to find the longest Collatz sequence under
-- the limit, and keep track of both the maximum length and the initial
-- value of the sequence, which is the index.
let (maxLength, maxIndex) = V.ifoldl' accMaxLen (0, 0) collatzVec
accMaxLen acc@(accMaxLen, accMaxIndex) index currLen
| currLen <= accMaxLen = acc
| otherwise = (currLen, index)
putStrLn $ "Max Collatz length below " ++ show limit ++ " is "
++ show maxLength ++ " at index " ++ show maxIndex
Given the following:
integralB :: Num a => Behavior t a -> Behavior t a -- definite integral of a behaviour
eJump :: Event t a -- tells the player to jump
bYAccel = pure 4000 -- y acceleration
bYVel = integralB bYAccel -- y velocity
bY = integralB bYVel -- y position
How do I make the player jump (probably by setting its y velocity) when a jump event arrives?
You'll need to be able to apply an impulse to the Y velocity for the jump. From your own answer, you've come up with a way to do so by summing all the impulses from the jumps and adding them to the integral of the acceleration.
Your acceleration is also constant. If you don't want the player falling constantly, you'd need something like:
bYAccel = (ifB airborne) 4000 0
airborne = fmap (>0) bY
ifB :: Behavior t Bool -> a -> a -> Behavior t a
ifB boolBehavior yes no = fmap (\bool -> if bool then yes else no) boolBehavior
One possible reason the height of your jumps varies is you aren't resetting the velocity when the player lands. If you have rules that hold the player above some position (like the floor), and are somehow stopping acceleration when the player hits the floor, you will also need to set the velocity to 0 if it is in the direction of the floor. (If you also set it to 0 when it's not in the direction of the floor, the player can never get the velocity to leave the ground.)
The reason this would cause erratic jumping heights is that the final velocity when the player lands will be close to the impulse you applied for them to take off. Using your numbers, if a jump started with a velocity of -5000, and ended with a velocity of 4800, the next jump will add an impulse of -5000, taking the jump to a starting velocity of only -200. That might have an ending velocity of 300, so the next jump will be an almost full -4700 jump.
Here's a complete working example. It uses the gloss library for input and display. The gameDefinition corresponds to the components introduced in your question. integrateDeltas is equivalent to your integralB, but produces events that are impulses, which are easy to generate in a clocked framework like gloss, and easy to use mixed with other events that cause impulses, like jumping.
{-# LANGUAGE RankNTypes #-}
module Main where
import Reactive.Banana
import Reactive.Banana.Frameworks.AddHandler
import Reactive.Banana.Frameworks
import Data.IORef
import qualified Graphics.Gloss.Interface.IO.Game as Gloss
gameDefinition :: GlossGameEvents t -> Behavior t Gloss.Picture
gameDefinition events = renderBehavior
where
bY = accumB 0 (fmap sumIfPositive yShifts)
yShifts = integrateDeltas bYVel
bYVel = accumB 0 yVelChanges
yVelChanges = apply ((ifB airborne) (+) sumIfPositive) yVelShifts
yVelShifts = union (integrateDeltas bYAccel) (fmap (const 3) eJump)
bYAccel = (ifB airborne) (-10) 0
airborne = fmap (>0) bY
eJump = filterE isKeyEvent (event events)
integrateDeltas = integrateDeltaByTimeStep (timeStep events)
renderBehavior = (liftA3 render) bY bYVel bYAccel
render y yVel yAccel =
Gloss.Pictures [
Gloss.Translate 0 (20+y*100) (Gloss.Circle 20),
Gloss.Translate (-50) (-20) (readableText (show y)),
Gloss.Translate (-50) (-40) (readableText (show yVel)),
Gloss.Translate (-50) (-60) (readableText (show yAccel))
]
readableText = (Gloss.Scale 0.1 0.1) . Gloss.Text
-- Utilities
sumIfPositive :: (Ord n, Num n) => n -> n -> n
sumIfPositive x y = max 0 (x + y)
ifB :: Behavior t Bool -> a -> a -> Behavior t a
ifB boolBehavior yes no = fmap (\bool -> if bool then yes else no) boolBehavior
integrateDeltaByTimeStep :: (Num n) => Event t n -> Behavior t n -> Event t n
integrateDeltaByTimeStep timeStep derivative = apply (fmap (*) derivative) timeStep
isKeyEvent :: Gloss.Event -> Bool
isKeyEvent (Gloss.EventKey _ _ _ _) = True
isKeyEvent _ = False
-- Main loop to run it
main :: IO ()
main = do
reactiveGame (Gloss.InWindow "Reactive Game Example" (400, 400) (10, 10))
Gloss.white
100
gameDefinition
-- Reactive gloss game
data GlossGameEvents t = GlossGameEvents {
event :: Event t Gloss.Event,
timeStep :: Event t Float
}
makeReactiveGameNetwork :: Frameworks t
=> IORef Gloss.Picture
-> AddHandler Gloss.Event
-> AddHandler Float
-> (forall t. GlossGameEvents t -> Behavior t Gloss.Picture)
-> Moment t ()
makeReactiveGameNetwork latestFrame glossEvent glossTime game = do
eventEvent <- fromAddHandler glossEvent
timeStepEvent <- fromAddHandler glossTime
let
events = GlossGameEvents { event = eventEvent, timeStep = timeStepEvent }
pictureBehavior = game events
pictureChanges <- changes pictureBehavior
reactimate (fmap (writeIORef latestFrame) pictureChanges)
reactiveGame :: Gloss.Display
-> Gloss.Color
-> Int
-> (forall t. GlossGameEvents t -> Behavior t Gloss.Picture)
-> IO ()
reactiveGame display color steps game = do
latestFrame <- newIORef Gloss.Blank
(glossEvent, fireGlossEvent) <- newAddHandler
(glossTime, addGlossTime) <- newAddHandler
network <- compile (makeReactiveGameNetwork latestFrame glossEvent glossTime game)
actuate network
Gloss.playIO
display
color
steps
()
(\world -> readIORef latestFrame)
(\event world -> fireGlossEvent event)
(\time world -> addGlossTime time)
In this example, bY checks for collision with a floor at 0 by accumulating the impulses, but constraining the accumulated value to be above 0.
The velocity, bYVel, accumulates all impulses while airborne, but only those impulses that are directed away from the floor while not airborne. If you change
yVelChanges = apply ((ifB airborne) (+) sumIfPositive) yVelShifts
to
yVelChanges = fmap (+) yVelShifts
it recreates the erratic jumping bug.
The acceleration, bYAccel, is only present while airborne.
I used a coordinate system with a +Y axis in the up direction (opposite the acceleration).
The code at the end is a small framework to hook reactive-banana up to gloss.
Solved it! I feel a little silly for not thinking of this earlier, but I just increment a counter every eJump and add that counter on to bYVel.
bJumpVel = sumB $ (-5000) <$ eJump
bYVel = (+) <$> bJumpVel <*> integralB bYAccel
-- gives the sum of the events
sumB :: Num a => Event t a -> Behavior t a
sumB e = accumB 0 $ (+) <$> e
For some reason the height of the jump always varies quite a bit, but that's probably an unrelated problem to do with my timing of things.
I won't mark this question as answered yet in case someone wants to share a better one.
How do you increment a variable in a functional programming language?
For example, I want to do:
main :: IO ()
main = do
let i = 0
i = i + 1
print i
Expected output:
1
A simple way is to introduce shadowing of the variable name:
main :: IO ()
main = do
  let i = 0
  let j = i
  let i = j+1
  print i

-- another way, simpler, specific to monads:
main :: IO ()
main = do
  let i = 0
  i <- return (i+1)   -- because monadic bind is non-recursive
  print i
Prints 1.
Just writing let i = i+1 doesn't work because let in Haskell makes recursive definitions — it is actually Scheme's letrec. The i in the right-hand side of let i = i+1 refers to the i in its left hand side — not to the upper level i as might be intended. So we break that equation up by introducing another variable, j.
Another, simpler way is to use monadic bind, <- in the do-notation. This is possible because monadic bind is not recursive.
In both cases we introduce a new variable under the same name, thus "shadowing" the old entity, i.e. making it no longer accessible.
How to "think functional"
One thing to understand here is that functional programming with pure — immutable — values (like we have in Haskell) forces us to make time explicit in our code.
In an imperative setting, time is implicit. We "change" our vars — but any change is sequential. We can never change what that var was a moment ago — only what it will be from now on.
In pure functional programming this is just made explicit. One of the simplest forms this can take is with using lists of values as records of sequential change in imperative programming. Even simpler is to use different variables altogether to represent different values of an entity at different points in time (cf. single assignment and static single assignment form, or SSA).
So instead of "changing" something that can't really be changed anyway, we make an augmented copy of it, and pass that around, using it in place of the old thing.
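A tiny sketch of that idea: instead of mutating a counter, the current value is passed along as an argument, so every step gets a fresh variable:

-- "Increment i until it reaches n", with explicit state passing.
countTo :: Int -> Int -> Int
countTo n i
  | i >= n    = i
  | otherwise = countTo n (i + 1)   -- the "new i" is simply a new argument

main :: IO ()
main = print (countTo 10 0)   -- prints 10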
As a general rule, you don't (and you don't need to). However, in the interests of completeness:
import Data.IORef
main = do
i <- newIORef 0 -- new IORef i
modifyIORef i (+1) -- increase it by 1
readIORef i >>= print -- print it
However, any answer that says you need to use something like MVar, IORef, STRef etc. is wrong. There is a purely functional way to do this, which in this small rapidly written example doesn't really look very nice.
import Control.Monad.State
type Lens a b = ((a -> b -> a), (a -> b))
setL = fst
getL = snd
modifyL :: Lens a b -> a -> (b -> b) -> a
modifyL lens x f = setL lens x (f (getL lens x))
lensComp :: Lens b c -> Lens a b -> Lens a c
lensComp (set1, get1) (set2, get2) = -- Compose two lenses
(\s x -> set2 s (set1 (get2 s) x) -- Not needed here
, get1 . get2) -- But added for completeness
(+=) :: (Num b) => Lens a b -> Lens a b -> State a ()
x += y = do
s <- get
put (modifyL x s (+ (getL y s)))
swap :: Lens a b -> Lens a b -> State a ()
swap x y = do
s <- get
let x' = getL x s
let y' = getL y s
put (setL y (setL x s y') x')
nFibs :: Int -> Int
nFibs n = evalState (nFibs_ n) (0,1)
nFibs_ :: Int -> State (Int,Int) Int
nFibs_ 0 = fmap snd get -- The second Int is our result
nFibs_ n = do
x += y -- Add y to x
swap x y -- Swap them
nFibs_ (n-1) -- Repeat
where x = ((\(x,y) x' -> (x', y)), fst)
y = ((\(x,y) y' -> (x, y')), snd)
There are several ways to translate imperative i=i+1 code into functional programming. A recursive function is the recommended approach in functional programming; creating state is almost never what you want to do.
After a while you will learn that you can use [1..] if you need an index, for example, but it takes a lot of time and practice to think functionally instead of imperatively.
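For example, where imperative code would keep an index counter, zipping with [1..] usually does the job; a quick illustration (mine, not the answerer's):

-- Number the lines of some output without any mutable counter.
main :: IO ()
main = mapM_ putStrLn [ show i ++ ": " ++ w | (i, w) <- zip [1 :: Int ..] ["foo", "bar", "baz"] ]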
Here's another way to do something similar to i=i+1 (not identical, because there aren't any destructive updates). Note that the State monad example is just for illustration; you probably want [1..] instead:
module Count where
import Control.Monad.State
count :: Int -> Int
count c = c+1
count' :: State Int Int
count' = do
c <- get
put (c+1)
return (c+1)
main :: IO ()
main = do
-- purely functional, value-modifying (state-passing) way:
print $ count . count . count . count . count . count $ 0
-- purely functional, State Monad way
print $ (`evalState` 0) $ do {
count' ; count' ; count' ; count' ; count' ; count' }
Note: This is not an ideal answer, but hey, sometimes giving anything at all might be a little better than nothing.
A simple function to increase the variable would suffice.
For example:
incVal :: Integer -> Integer
incVal x = x + 1
main::IO()
main = do
let i = 1
print (incVal i)
Or even an anonymous function to do it.