Debug memory issue in Haskell

Debug memory issue in Haskell - haskell

I'm trying to solve the whole Advent of Code series in Haskell.
I'm encountering a memory issue while solving the 2015/06 exercise where there is a bunch of instructions to turn on, off and toggle lights on a grid. The goal is to count the number of lit lights at the end.
Given instructions are parsed and stored in a Instruction type, this is the type definition:
data Instruction = Instruction Op Range deriving Show
data Op = Off | On | Toggle | Nop deriving Show
data Range = Range Start End deriving Show
type Start = Point
type End = Start
data Point = Point Int Int deriving Show
This is the code that calculates the result. I'm trying to abstract over the fact that a light is a Boolean by using a typeclass
gridWidth, gridHeight :: Int
gridWidth = 1000
gridHeight = 1000
initialGrid :: Togglable a => Matrix a
initialGrid = matrix gridWidth gridHeight (const initialState)
instance Monoid Op where
mempty = Nop
instance Semigroup Op where
_ <> On = On
_ <> Off = Off
x <> Nop = x
Off <> Toggle = On
On <> Toggle = Off
Toggle <> Toggle = Nop
Nop <> Toggle = Toggle
class Togglable a where
initialState :: a
apply :: Op -> a -> a
instance Togglable Bool where
initialState = False
apply On = const True
apply Off = const False
apply Toggle = not
apply Nop = id
-- Does the Range of the instruction apply to this matrix coordinate?
(<?) :: Range -> (Int, Int) -> Bool
(<?) (Range start end) (x, y) = let
(Point x1 y1) = start
(Point x2 y2) = end
(mx, my) = (x-1, y-1) -- translate from matrix coords (they start from 1!)
in and [
mx >= min x1 x2, mx <= max x1 x2,
my >= min y1 y2, my <= max y1 y2
]
stepGenerator :: Instruction -> Matrix Op
stepGenerator (Instruction op r) = let
g coord = if r <? coord then op else Nop
in matrix gridWidth gridHeight g
allStepsMatrix :: [Instruction] -> Matrix Op
allStepsMatrix = mconcat.map stepGenerator
finalGrid :: Togglable a => Matrix a -> Matrix Op -> Matrix a
finalGrid z op = fmap apply op <*> z
countOn :: Matrix Bool -> Integer
countOn = toInteger.foldr (\x -> if x then (+1) else id) 0
partA :: Challenge (String -> Integer)
partA = Challenge $ countOn.finalGrid initialGrid.allStepsMatrix.parse
The solution will be the Integer returned by what's inside partA. parse works and has type parse :: String -> [Instruction]
The code compiles and runs with small matrices (e.g. 10x10), as soon as I turn gridWidth and gridHeight to 1000 I'm faced with a out of memory error, apparently generating from the allStepsMatrix function.
Any hint of what could be wrong here? Full code is on GitHub

I would strongly suggest not using a typeclass. Typeclasses are supposed to have laws, and they should be "rare", in the sense that each type has only a few valid implementations. I would suggest taking initialState and toggle as arguments, but even that is overkill, because the given instructions simply do not make sense with any type that isn't Bool. Just operate on a Matrix Bool directly and you can cut out a good chunk of the code you've written. However, I won't change anything for my answer.
In any case, I think the issue may be laziness. 1000 * 1000 = 1000000, so each Matrix will be several megabytes in size. On a 64-bit machine, a pointer is 8 bytes, so each Matrix is at least 8 MB, plus a few more for the data behind it. You are mconcating 300 of them (that's what I get from the site) together, but, because you are doing it lazily, all 300 matrices are resident simultaneously, so it's at least 2.4 GB, just for the structures themselves. The cost of filling each of those 300 million pointers with thunks also makes itself known—a thunk is at least one pointer (8 bytes, pointing to code in static memory, making another 2.4 GB), plus its payload, which, here, means more pointers, each one bestowing your computer with another 2.4 GB of memory pressure. I suggest deepseq:
instance NFData Op where
rnf Off = ()
rnf On = ()
rnf Toggle = ()
rnf Nop = ()
-- rnf x = x `seq` () but I like to be explicit
allStepsMatrix :: [Instruction] -> Matrix Op
allStepsMatrix = foldl' (\x y -> force (x <> y)) mempty . map stepGenerator
Usnig foldl' lets this operate in constant stack space, but foldl or foldr would also work, because a stack depth on the order of 300 is nothing. The force means that all the elements of each Matrix are evaluated. Before, each matrix kept the previous one alive by holding references to it, but now the references are removed when the elements are evaluated, so the GC can throw them out in a timely manner. I have tested this and it terminates in reasonable time with much better space usage.

Related

How to implement lazy iterate when IO is involved in Haskell

I'm using IO to encapsulate randomness. I am trying to write a method which iterates a next function n times, but the next function produces a result wrapped in IO because of the randomness.
Basically, my next function has this signature:
next :: IO Frame -> IO Frame
and I want to start with an initial Frame, then use the same pattern as iterate to get a list [Frame] with length n. Essentially, I'd like to be able to write the following:
runSimulation :: {- parameters -} -> IO [Frame]
runSimulation {- parameters -} = do
{- some setup -}
sequence . take n . iterate next $ firstFrame
Where firstFrame :: IO Frame formed by doing something like let firstFrame = return Frame x y z.
The problem I am encountering is that when I run this function, it never exits, so it seems to be running on an infinite loop (since iterate produces an infinite list).
I'm quite new to haskell so not sure where I'm going wrong here, or if my supposition above is correct that it seems that the entire infinite list is being executed.
(Update) In case it's helpful, here are the full definitions of Frame, next, and runSimulation:
-- A simulation Frame encapsulates the state of the simulation at some
-- point in "time". That means it contains a list of Agents in that
-- Frame, and a list of the Interactions that occurred in it as well. It
-- also contains the state of the World, as well as an AgentID counter
-- (so we can easily increment for generating new Agents).
data Frame = Frame AgentID [Agent] [Interaction]
deriving Show
-- Generate the next Frame from the current one, including scoring the
-- Agents based on the outcomes *in this Frame*.
-- TODO: add in reproduction.
nextFrame :: Reactor -> World -> IO Frame -> IO Frame
nextFrame react w inp = do
(Frame i agents history) <- inp
interactions <- interactAll react history agents
let scoredAgents = scoreAgents (rewards w) interactions agents
return (Frame i scoredAgents interactions)
-- Run a simulation for a number of iterations
runSimulation :: World -> Reactor -> (Dist, Dist) -> IO [Frame]
runSimulation world react (gen_dist, sel_dist) = do
startingAgents <- spawnAgents (initial_size world) (agentCreatorFactory gen_dist sel_dist)
let firstFrame = return (Frame (length startingAgents) startingAgents [])
next = nextFrame react world
sequence . take (iterations world) . iterate next $ firstFrame

I don't know how much time computing each Frame takes, but I suspect you are doing more work than necessary. The cause is a bit subtle. iterate produces a list of repeated applications of a function. For each element in the list, the previous value is reused. Your list is composed of IO actions. The IO action at position n is computed from the already obtained IO action at position n-1 by applying next.
Alas, when executing those actions, we are not so lucky. Executing the action at position n in the list will repeat all the work of the previous actions! We shared work when building the actions themselves (which are values, like almost everything in Haskell) but not when executing them, which is a different thing.
The simplest solution could be to define this auxiliary function with a baked-in limit:
iterateM :: Monad m => (a -> m a) -> a -> Int -> m [a]
iterateM step = go
where
go _ 0 = return []
go current limit =
do next <- step current
(current:) <$> go next (pred limit)
While simple, it's a bit inelegant, for two reasons:
It conflates the iteration process with the limiting of such process. In the pure list world we didn't have to do that, we could create infinite lists and take from then. But now in the effectful world that nice compositionality seems to be lost.
What if we want to do something with each value as it is being produced, without having to wait for all of them? Out function returns everything at the end, in one go.
As mentioned in the comments, streaming libraries like "conduit", "streamly" or "streaming" try to solve this problem in a better way, regaining some of the compositionality of pure lists. These libraries have types that represent effectful processes whose results are yielded piecewise.
For example, consider the function Streaming.Prelude.iterateM from "streaming", specialized to IO:
iterateM :: (a -> IO a) -> IO a -> Stream (Of a) IO r
It returns a Stream that we can "limit" using Streaming.Prelude.take:
take :: Int -> Stream (Of a) IO r -> Stream (Of a) IO ()
after limiting it we can get back to IO [a] with Streaming.Prelude.toList_ which accumulates all results:
toList_ :: Stream (Of a) IO r -> IO [a]
But instead of that we could process each element as it is being produced, with functions like Streaming.Prelude.mapM_:
mapM_ :: (a -> IO x) -> Stream (Of a) IO r -> IO r

An elementary solution:
As an alternative to #danidiaz's answer, it is possible to solve the problem without resorting to extra libraries such as Streaming, assuming the role of IO can be minimized.
Most of the required code can be written in terms of the MonadRandom class, of which IO is just one instance. It is not necessary to use IO in order to generate pseudo-random numbers.
The required iteration function can be written like this, in do notation:
import System.Random
import Control.Monad.Random.Lazy
iterateM1 :: MonadRandom mr => (a -> mr a) -> a -> mr [a]
iterateM1 fn x0 =
do
y <- fn x0
ys <- iterateM1 fn y
return (x0:ys)
Unfortunately, the text of the question does not define exactly what a Frame object is, or what the next stepping function does; so I have to somehow fill in the blanks. Also the next name gets defined in the libraries involved, so I will have to use nextFrame instead of just next.
Let's assume that a Frame object is just a point in 3-dimensional space, and that at each random step, one and only one of the 3 dimensions is chosen at random, and the corresponding coordinate is bumped by an amount of either +1 or -1, with equal probabilities.
This gives this code:
data Frame = Frame Int Int Int deriving Show
nextFrame :: MonadRandom mr => Frame -> mr Frame
nextFrame (Frame x y z) =
do
-- 3 dimensions times 2 possible steps: 1 & -1, hence 6 possibilities
n <- getRandomR (0::Int, 5::Int)
let fr = case n of
0 -> Frame (x-1) y z
1 -> Frame (x+1) y z
2 -> Frame x (y-1) z
3 -> Frame x (y+1) z
4 -> Frame x y (z-1)
5 -> Frame x y (z+1)
_ -> Frame x y z
return fr
At that point, it is not difficult to write code that builds an unlimited list of Frame objects representing the simulation history. Creating that list does not cause the code to loop forever, and the usual take function can be used to select the first few elements of such a list.
Putting all the code together:
import System.Random
import Control.Monad.Random.Lazy
iterateM1 :: MonadRandom mr => (a -> mr a) -> a -> mr [a]
iterateM1 fn x0 =
do
y <- fn x0
ys <- iterateM1 fn y
return (x0:ys)
data Frame = Frame Int Int Int deriving Show
nextFrame :: MonadRandom mr => Frame -> mr Frame
nextFrame (Frame x y z) =
do
-- 3 dimensions times 2 possible steps: 1 & -1, hence 6 possibilities
n <- getRandomR (0::Int, 5::Int)
let fr = case n of
0 -> Frame (x-1) y z
1 -> Frame (x+1) y z
2 -> Frame x (y-1) z
3 -> Frame x (y+1) z
4 -> Frame x y (z-1)
5 -> Frame x y (z+1)
_ -> Frame x y z
return fr
runSimulation :: MonadRandom mr => Int -> Int -> Int -> mr [Frame]
runSimulation x y z = let fr0 = Frame x y z in iterateM1 nextFrame fr0
main = do
rng0 <- getStdGen -- PRNG hosted in IO monad
-- Could use mkStdGen or MkTFGen instead
let
sim = runSimulation 0 0 0
allFrames = evalRand sim rng0 -- unlimited list of frames !
frameCount = 10
frames = take frameCount allFrames
mapM_ (putStrLn . show) frames
Program execution:
$ ./frame
Frame 0 0 0
Frame 0 1 0
Frame 0 0 0
Frame 0 (-1) 0
Frame 1 (-1) 0
Frame 1 (-2) 0
Frame 1 (-1) 0
Frame 1 (-1) 1
Frame 1 0 1
Frame 2 0 1
$
For large values of frameCount, execution time is a quasi-linear function of frameCount, as expected.
More on monadic actions for random number generation here.

Generate a random list of custom data type, where values provided to data constructor are somehow bounded within a range

I have defined a Point data type, with a single value constructor like so:
data Point = Point {
x :: Int,
y :: Int,
color :: Color
} deriving (Show, Eq)
data Color = None
| Black
| Red
| Green
| Blue
deriving (Show, Eq, Enum, Bounded)
I have found an example of making a Bounded Enum an instance of the Random class and have made Color an instance of it like so:
instance Random Color where
random g = case randomR (1, 4) g of
(r, g') -> (toEnum r, g')
randomR (a, b) g = case randomR (fromEnum a, fromEnum b) g of
(r, g') -> (toEnum r, g')
I was then able to find out how to make Point an instance of the Random class also:
instance Random Point where
randomR (Point xl yl cl, Point xr yr cr) g =
let (x, g1) = randomR (xl, xr) g
(y, g2) = randomR (yl, yr) g1
(c, g3) = randomR (cl, cr) g2
in (Point x y c, g3)
random g =
let (x, g1) = random g
(y, g2) = random g1
(c, g3) = random g2
in (Point x y c, g3)
So, this let's me make random point values. But, what I'd like to do is be able to create a list of random Point values, where the x and the y properties are bounded within some range whilst leaving the color property to be an unbounded random value. Is this possible with the way I am currently modelling the code, or do I need to rethink how I construct Point values? For instance, instead of making Point an instance of the Random class, should I just create a random list of Int in the IO monad and then have a pure function that creates n Points, using values from the random list to construct each Point value?
Edit, I think I have found out how to do it:
Without changing the above code, in the IO monad I can do the following:
solved :: IO ()
solved = do
randGen <- getStdGen
let x = 2
let randomPoints = take x $ randomRs (Point 0 0 None, Point 200 200 Blue) randGen
putStrLn $ "Random points: " ++ show randomPoints
This seems to work, randomRs appears to let me specify a range...
Presumably because my Point data type derives Eq?
Or
Is it because my x and y properties are Int (guessing here, but may be "bounded" by default) and I have Color derive bounded?

It works because of the properties of the Int and Color types, not because of the properties of Point. If one suppresses the Eq clause of Point, it still works.
Your code is overall quite good, however I would mention a few minor caveats.
In the Random instance for Point, you are chaining the generator states manually; this is a bit error prone, and monadic do notation is supposed to make it unnecessary. The Color instance could be simplified.
You are using IO where it is not really required. IO is just one instance of the MonadRandom class. If g is an instance of RandomGen, any Rand g is an instance of MonadRandom.
The random values you're getting are not reproducible from a program execution to the next one; this is because getStdGen implicitly uses the launch time as a random number generation seed. It may do that because it is IO-hosted. In many situations, this is a problem, as one might want to vary the choice of random sequence and the system parameters independently of each other.
Using monadic style, the basics of your code could be rewritten for example like this:
import System.Random
import System.Random.TF -- Threefish random number generator
import Control.Monad.Random
data Point = Point {
x :: Int,
y :: Int,
color :: Color
} deriving (Show, Eq)
data Color = None
| Black
| Red
| Green
| Blue
deriving (Show, Eq, Enum, Bounded)
instance Random Color where
randomR (a, b) g = let (r,g') = randomR (fromEnum a, fromEnum b) g
in (toEnum r, g')
random g = randomR (minBound::Color, maxBound::Color) g
singleRandomPoint :: -- monadic action for just one random point
MonadRandom mr => Int -> Int -> Color -> Int -> Int -> Color -> mr Point
singleRandomPoint xmin ymin cmin xmax ymax cmax =
do
-- avoid manual chaining of generator states:
x <- getRandomR (xmin, xmax)
y <- getRandomR (ymin, ymax)
c <- getRandomR (cmin, cmax)
return (Point x y c)
And then we can derive an expression returning an unlimited list of random points:
-- monadic action for an unlimited list of random points:
seqRandomPoints :: MonadRandom mr =>
Int -> Int -> Color -> Int -> Int -> Color -> mr [Point]
seqRandomPoints xmin ymin cmin xmax ymax cmax =
sequence (repeat (singleRandomPoint xmin ymin cmin xmax ymax cmax))
-- returns an unlimited list of random points:
randomPoints :: Int -> Int -> Int -> Color -> Int -> Int -> Color -> [Point]
randomPoints seed xmin ymin cmin xmax ymax cmax =
let
-- get random number generator:
-- using Threefish algorithm (TF) for better statistical properties
randGen = mkTFGen seed
action = seqRandomPoints xmin ymin cmin xmax ymax cmax
in
evalRand action randGen
Finally we can print the first few random points on stdout:
-- Small printing utility:
printListAsLines :: Show t => [t] -> IO()
printListAsLines xs = mapM_ (putStrLn . show) xs
solved01 :: IO ()
solved01 = do
let
seed = 42 -- for random number generator setup
-- unlimited list of random points:
allRandomPoints = randomPoints seed 0 0 None 200 200 Blue
count = 5
someRandomPoints = take count allRandomPoints
-- IO not used at all so far
putStrLn $ "Random points: "
printListAsLines someRandomPoints
main = solved01
Program execution (reproducible with constant seed):
$ randomPoints
Random points:
Point {x = 187, y = 56, color = Green}
Point {x = 131, y = 28, color = Black}
Point {x = 89, y = 135, color = Blue}
Point {x = 183, y = 190, color = Red}
Point {x = 27, y = 161, color = Green}
$
Should you prefer to just get a finite number of points and also get back the updated state of your random number generator, you would have to use replicate n instead of repeat, and runRand instead of evalRand.
Bit more details about the monadic approach here.

How do I memoize?

I have written this function that computes Collatz sequences, and I see wildly varying times of execution depending on the spin I give it. Apparently it is related to something called "memoization", but I have a hard time understanding what it is and how it works, and, unfortunately, the relevant article on HaskellWiki, as well as the papers it links to, have all proven to not be easily surmountable. They discuss intricate details of the relative performance of highly layman-indifferentiable tree constructions, while what I miss must be some very basic, very trivial point that these sources neglect to mention.
This is the code. It is a complete program, ready to be built and executed.
module Main where
import Data.Function
import Data.List (maximumBy)
size :: (Integral a) => a
size = 10 ^ 6
-- Nail the basics.
collatz :: Integral a => a -> a
collatz n | even n = n `div` 2
| otherwise = n * 3 + 1
recollatz :: Integral a => a -> a
recollatz = fix $ \f x -> if (x /= 1)
then f (collatz x)
else x
-- Now, I want to do the counting with a tuple monad.
mocollatz :: Integral b => b -> ([b], b)
mocollatz n = ([n], collatz n)
remocollatz :: Integral a => a -> ([a], a)
remocollatz = fix $ \f x -> if x /= 1
then f =<< mocollatz x
else return x
-- Trivialities.
collatzLength :: Integral a => a -> Int
collatzLength x = (length . fst $ (remocollatz x)) + 1
collatzPairs :: Integral a => a -> [(a, Int)]
collatzPairs n = zip [1..n] (collatzLength <$> [1..n])
longestCollatz :: Integral a => a -> (a, Int)
longestCollatz n = maximumBy order $ collatzPairs n
where
order :: Ord b => (a, b) -> (a, b) -> Ordering
order x y = snd x `compare` snd y
main :: IO ()
main = print $ longestCollatz size
With ghc -O2 it takes about 17 seconds, without ghc -O2 -- about 22 seconds to deliver the length and the seed of the longest Collatz sequence starting at any point below size.
Now, if I make these changes:
diff --git a/Main.hs b/Main.hs
index c78ad95..9607fe0 100644
--- a/Main.hs
+++ b/Main.hs
## -1,6 +1,7 ##
module Main where
import Data.Function
+import qualified Data.Map.Lazy as M
import Data.List (maximumBy)
size :: (Integral a) => a
## -22,10 +23,15 ## recollatz = fix $ \f x -> if (x /= 1)
mocollatz :: Integral b => b -> ([b], b)
mocollatz n = ([n], collatz n)
-remocollatz :: Integral a => a -> ([a], a)
-remocollatz = fix $ \f x -> if x /= 1
- then f =<< mocollatz x
- else return x
+remocollatz :: (Num a, Integral b) => b -> ([b], a)
+remocollatz 1 = return 1
+remocollatz x = case M.lookup x (table mutate) of
+ Nothing -> mutate x
+ Just y -> y
+ where mutate x = remocollatz =<< mocollatz x
+
+table :: (Ord a, Integral a) => (a -> b) -> M.Map a b
+table f = M.fromList [ (x, f x) | x <- [1..size] ]
-- Trivialities.
-- Then it will take just about 4 seconds with ghc -O2, but I would not live long enough to see it complete without ghc -O2.
Looking at the details of cost centres with ghc -prof -fprof-auto -O2 reveals that the first version enters collatz about a hundred million times, while the patched one -- just about one and a half million times. This must be the reason of the speedup, but I have a hard time understanding the inner workings of this magic. My best idea is that we replace a portion of expensive recursive calls with O(log n) map lookups, but I don't know if it's true and why it depends so much on some godforsaken compiler flags, while, as I see it, such performance swings should all follow solely from the language.
Can I haz an explanation of what happens here, and why the performance differs so vastly between ghc -O2 and plain ghc builds?
P.S. There are two requirements to the achieving of automagical memoization highlighted elsewhere on Stack Overflow:
Make a function to be memoized a top-level name.
Make a function to be memoized a monomorphic one.
In line with these requirements, I rebuilt remocollatz as follows:
remocollatz :: Int -> ([Int], Int)
remocollatz 1 = return 1
remocollatz x = mutate x
mutate :: Int -> ([Int], Int)
mutate x = remocollatz =<< mocollatz x
Now it's as top level and as monomorphic as it gets. Running time is about 11 seconds, versus the similarly monomorphized table version:
remocollatz :: Int -> ([Int], Int)
remocollatz 1 = return 1
remocollatz x = case M.lookup x (table mutate) of
Nothing -> mutate x
Just y -> y
mutate :: Int -> ([Int], Int)
mutate = \x -> remocollatz =<< mocollatz x
table :: (Int -> ([Int], Int)) -> M.Map Int ([Int], Int)
table f = M.fromList [ (x, f x) | x <- [1..size] ]
-- Running in less than 4 seconds.
I wonder why the memoization ghc is supposedly performing in the first case here is almost 3 times slower than my dumb table.

Can I haz an explanation of what happens here, and why the performance differs so vastly between ghc -O2 and plain ghc builds?
Disclaimer: this is a guess, not verified by viewing GHC core output. A careful answer would do so to verify the conjectures outlined below. You can try peering through it yourself: add -ddump-simpl to your compilation line and you will get copious output detailing exactly what GHC has done to your code.
You write:
remocollatz x = {- ... -} table mutate {- ... -}
where mutate x = remocollatz =<< mocollatz x
The expression table mutate in fact does not depend on x; but it appears on the right-hand side of an equation that takes x as an argument. Consequently, without optimizations, this table is recomputed each time remocollatz is called (presumably even from inside the computation of table mutate).
With optimizations, GHC notices that table mutate does not depend on x, and floats it to its own definition, effectively producing:
fresh_variable_name = table mutate
where mutate x = remocollatz =<< mocollatz x
remocollatz x = case M.lookup x fresh_variable_name of
{- ... -}
The table is therefore computed just once for the entire program run.
don't know why it [the performance] depends so much on some godforsaken compiler flags, while, as I see it, such performance swings should all follow solely from the language.
Sorry, but Haskell doesn't work that way. The language definition tells clearly what the meaning of a given Haskell term is, but does not say anything about the runtime or memory performance needed to compute that meaning.

Another approach to memoization that works in some situations, like this one, is to use a boxed vector, whose elements are computed lazily. The function used to initialize each element can use other elements of the vector in its calculation. As long as the evaluation of an element of the vector doesn't loop and refer to itself, just the elements it recursively depends on will be evaluated. Once evaluated, an element is effectively memoized, and this has the further benefit that elements of the vector that are never referenced are never evaluated.
The Collatz sequence is a nearly ideal application for this technique, but there is one complication. The next Collatz value(s) in sequence from a value under the limit may be outside the limit, which would cause a range error when indexing the vector. I solved this by just iterating through the sequence until back under the limit and counting the steps to do so.
The following program takes 0.77 seconds to run unoptimized and 0.30 when optimized:
import qualified Data.Vector as V
limit = 10 ^ 6 :: Int
-- The Collatz function, which given a value returns the next in the sequence.
nextCollatz val
| odd val = 3 * val + 1
| otherwise = val `div` 2
-- Given a value, return the next Collatz value in the sequence that is less
-- than the limit and the number of steps to get there. For example, the
-- sequence starting at 13 is: [13, 40, 20, 10, 5, 16, 8, 4, 2, 1], so if
-- limit is 100, then (nextCollatzWithinLimit 13) is (40, 1), but if limit is
-- 15, then (nextCollatzWithinLimit 13) is (10, 3).
nextCollatzWithinLimit val = (firstInRange, stepsToFirstInRange)
where
firstInRange = head rest
stepsToFirstInRange = 1 + (length biggerThanLimit)
(biggerThanLimit, rest) = span (>= limit) (tail collatzSeqStartingWithVal)
collatzSeqStartingWithVal = iterate nextCollatz val
-- A boxed vector holding Collatz length for each index. The collatzFn used
-- to generate the value for each element refers back to other elements of
-- this vector, but since the vector elements are only evaluated as needed and
-- there aren't any loops in the Collatz sequences, the values are calculated
-- only as needed.
collatzVec :: V.Vector Int
collatzVec = V.generate limit collatzFn
where
collatzFn :: Int -> Int
collatzFn index
| index <= 1 = 1
| otherwise = (collatzVec V.! nextWithinLimit) + stepsToGetThere
where
(nextWithinLimit, stepsToGetThere) = nextCollatzWithinLimit index
main :: IO ()
main = do
-- Use a fold through the vector to find the longest Collatz sequence under
-- the limit, and keep track of both the maximum length and the initial
-- value of the sequence, which is the index.
let (maxLength, maxIndex) = V.ifoldl' accMaxLen (0, 0) collatzVec
accMaxLen acc#(accMaxLen, accMaxIndex) index currLen
| currLen <= accMaxLen = acc
| otherwise = (currLen, index)
putStrLn $ "Max Collatz length below " ++ show limit ++ " is "
++ show maxLength ++ " at index " ++ show maxIndex

improvement to my code for Haskell monads

I am working with Haskell and maybe monads but I am a little bit confused with them
here is my code but I am getting error and I do not know how to improve my code.
doAdd :: Int -> Int -> Maybe Int
doAdd x y = do
result <- x + y
return result

Let's look critically at the type of the function that you're writing:
doAdd :: Int -> Int -> Maybe Int
The point of the Maybe monad is to work with types that are wrapped with a Maybe type constructor. In your case, the two Int arguments are just plain Ints, and the + function always produces an Int so there is no need for the monad.
If instead, your function took Maybe Int as its arguments, then you could use do notation to handle the Nothing case behind the scenes:
doAdd :: Maybe Int -> Maybe Int -> Maybe Int
doAdd mx my = do x <- mx
y <- my
return (x + y)
example1 = doAdd (Just 1) (Just 3) -- => Just 4
example2 = doAdd (Just 1) Nothing -- => Nothing
example3 = doAdd Nothing (Just 3) -- => Nothing
example4 = doAdd Nothing Nothing -- => Nothing
But we can extract a pattern from this: what you are doing, more generically, is taking a function ((+) :: Int -> Int -> Int) and adapting it to work in the case where the arguments it wants are "inside" a monad. We can abstract away from the specific function (+) and the specific monad (Maybe) and get this generic function:
liftM2 :: Monad m => (a -> b -> c) -> m a -> m b -> m c
liftM2 f ma mb = do a <- ma
b <- mb
return (f a b)
Now with liftM2 you can write:
doAdd :: Maybe Int -> Maybe Int -> Maybe Int
doAdd = liftM2 (+)
The reason why I chose the name liftM2 is because this is actually a library function—you don't need to write it, you can import the Control.Monad module and you'll get it for free.
What would be a better example of using the Maybe monad? When you have an operation that, unlike +, can intrinsically can produce a Maybe result. One idea would be if you wanted to catch division by 0 mistakes. You could write a "safe" version of the div function:
-- | Returns `Nothing` if second argument is zero.
safeDiv :: Int -> Int -> Maybe Int
safeDiv _ 0 = Nothing
safeDiv x y = Just (x `div` y)
Now in this case the monad does become more useful:
-- | This function tests whether `x` is divisible by `y`. Returns `Nothing` if
-- division by zero.
divisibleBy :: Int -> Int -> Maybe Bool
divisibleBy x y = do z <- safeDiv x y
let x' = z * y
return (x == x')
Another more interesting monad example is if you have operations that return more than one value—for example, positive and negative square roots:
-- Compute both square roots of x.
allSqrt x = [sqrt x, -(sqrt x)]
-- Example: add the square roots of 5 to those of 7.
example = do x <- allSqrt 5
y <- allSqrt 7
return (x + y)
Or using liftM2 from above:
example = liftM2 (+) (allSqrt 5) (allSqrt 7)
So anyway, a good rule of thumb is this: never "pollute" a function with a monad type if it doesn't really need it. Your original doAdd—and even my rewritten version—are a violation of this rule of thumb, because what the function does is adding, but adding has nothing to do with Maybe—the Nothing handling is just a behavior that we add on top of the core function (+). The reason for this rule of thumb is that any function that does not use monads can be generically adapted to add the behavior of any monad you want, using utility functions like liftM2 (and many other similar utility functions).
On the other hand, safeDiv and allSqrt are examples where you can't really write the function you want without using Maybe or []; if you are dealing with a function like that, then monads are often a convenient abstraction for eliminating boilerplate code.

A better example might be
justPositive :: Num a => a -> Maybe a
justPositive x
| x <= 0 = Nothing
| otherwise = Just x
addPositives x y = do
x' <- justPositive x
y' <- justPositive y
return $ x' + y'
This will filter out any non-positive values passed into the function using do notation

That isn't how you'd write that code. The <- operator is for getting a value out of a monad. The result of x + y is just a number, not a monad wrapping a number.
Do notation is actually completely wasteful here. If you were bound and determined to write it that way, it would have to look like this:
doAdd x y = do
let result = x + y
return result
But that's just a longwinded version of this:
doAdd x y = return $ x + y
Which is in turn equivalent to
doAdd x y = Just $ x + y
Which is how you'd actually write something like this.

The use case you give doesn't justify do notation, but this is a more common use case- You can chain functions of this type together.
func::Int->Int->Maybe Int -- func would be a function like divide, which is undefined for division by zero
main = do
result1 <- func 1 2
result2 <- func 3 4
result3 <- func result1 result2
return result3
This is the whole point of monads anyway, chaining together functions of type a->m a.
When used this way, the Maybe monad acts much like exceptions in Java (you can use Either if you want to propagate a message up).

How to increment a variable in functional programming?

How do you increment a variable in a functional programming language?
For example, I want to do:
main :: IO ()
main = do
let i = 0
i = i + 1
print i
Expected output:
1

Simple way is to introduce shadowing of a variable name:
main :: IO () -- another way, simpler, specific to monads:
main = do main = do
let i = 0 let i = 0
let j = i i <- return (i+1)
let i = j+1 print i
print i -- because monadic bind is non-recursive
Prints 1.
Just writing let i = i+1 doesn't work because let in Haskell makes recursive definitions — it is actually Scheme's letrec. The i in the right-hand side of let i = i+1 refers to the i in its left hand side — not to the upper level i as might be intended. So we break that equation up by introducing another variable, j.
Another, simpler way is to use monadic bind, <- in the do-notation. This is possible because monadic bind is not recursive.
In both cases we introduce new variable under the same name, thus "shadowing" the old entity, i.e. making it no longer accessible.
How to "think functional"
One thing to understand here is that functional programming with pure — immutable — values (like we have in Haskell) forces us to make time explicit in our code.
In imperative setting time is implicit. We "change" our vars — but any change is sequential. We can never change what that var was a moment ago — only what it will be from now on.
In pure functional programming this is just made explicit. One of the simplest forms this can take is with using lists of values as records of sequential change in imperative programming. Even simpler is to use different variables altogether to represent different values of an entity at different points in time (cf. single assignment and static single assignment form, or SSA).
So instead of "changing" something that can't really be changed anyway, we make an augmented copy of it, and pass that around, using it in place of the old thing.

As a general rule, you don't (and you don't need to). However, in the interests of completeness.
import Data.IORef
main = do
i <- newIORef 0 -- new IORef i
modifyIORef i (+1) -- increase it by 1
readIORef i >>= print -- print it
However, any answer that says you need to use something like MVar, IORef, STRef etc. is wrong. There is a purely functional way to do this, which in this small rapidly written example doesn't really look very nice.
import Control.Monad.State
type Lens a b = ((a -> b -> a), (a -> b))
setL = fst
getL = snd
modifyL :: Lens a b -> a -> (b -> b) -> a
modifyL lens x f = setL lens x (f (getL lens x))
lensComp :: Lens b c -> Lens a b -> Lens a c
lensComp (set1, get1) (set2, get2) = -- Compose two lenses
(\s x -> set2 s (set1 (get2 s) x) -- Not needed here
, get1 . get2) -- But added for completeness
(+=) :: (Num b) => Lens a b -> Lens a b -> State a ()
x += y = do
s <- get
put (modifyL x s (+ (getL y s)))
swap :: Lens a b -> Lens a b -> State a ()
swap x y = do
s <- get
let x' = getL x s
let y' = getL y s
put (setL y (setL x s y') x')
nFibs :: Int -> Int
nFibs n = evalState (nFibs_ n) (0,1)
nFibs_ :: Int -> State (Int,Int) Int
nFibs_ 0 = fmap snd get -- The second Int is our result
nFibs_ n = do
x += y -- Add y to x
swap x y -- Swap them
nFibs_ (n-1) -- Repeat
where x = ((\(x,y) x' -> (x', y)), fst)
y = ((\(x,y) y' -> (x, y')), snd)

There are several solutions to translate imperative i=i+1 programming to functional programming. Recursive function solution is the recommended way in functional programming, creating a state is almost never what you want to do.
After a while you will learn that you can use [1..] if you need a index for example, but it takes a lot of time and practice to think functionally instead of imperatively.
Here's a other way to do something similar as i=i+1 not identical because there aren't any destructive updates. Note that the State monad example is just for illustration, you probably want [1..] instead:
module Count where
import Control.Monad.State
count :: Int -> Int
count c = c+1
count' :: State Int Int
count' = do
c <- get
put (c+1)
return (c+1)
main :: IO ()
main = do
-- purely functional, value-modifying (state-passing) way:
print $ count . count . count . count . count . count $ 0
-- purely functional, State Monad way
print $ (`evalState` 0) $ do {
count' ; count' ; count' ; count' ; count' ; count' }

Note: This is not an ideal answer but hey, sometimes it might be a little good to give anything at all.
A simple function to increase the variable would suffice.
For example:
incVal :: Integer -> Integer
incVal x = x + 1
main::IO()
main = do
let i = 1
print (incVal i)
Or even an anonymous function to do it.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string