Managing memory usage with infinite lists

So I am trying to find ways to make the number 5 using the number 15
import Data.List
import Control.Monad
import Control.Monad.Omega
import System.IO
data Expr = X | Sub Expr Expr | Div Expr Expr | Pow Expr Expr
  deriving Show
eval x X = x
eval x (Sub a b) = eval x a - eval x b
eval x (Div a b) = eval x a / eval x b
eval x (Pow a b) = eval x a ** eval x b
exprs | v <- each exprs
  = (X:) . concat $ transpose
      [ runOmega $ liftM2 Div v v
      , runOmega $ liftM2 Sub v v
      , runOmega $ liftM2 Pow v v
      ]
main = do
  hSetBuffering stdout LineBuffering
  mapM_ print $ filter ((==5).eval 15) exprs
However, the first result is the 118588079th element in the list, and Haskell runs out of memory long before getting there. How do I know the first one is 118588079? Because if I just calculate it and ask for the index instead, Haskell uses no memory at all:
run x | v <- each $ run x
  = (x:) . concat $ transpose
      [ runOmega $ liftM2 (/ ) v v
      , runOmega $ liftM2 (- ) v v
      , runOmega $ liftM2 (**) v v
      ]
main = print . map snd . filter ((==5).fst) $ zip (run 15) [0..]
Where exactly is my memory going in the first case, and how do I get around it?

A top level binding like
list = [1..1000000]
will lazily produce the list once and keep it in memory for subsequent use. Because the list is referred to by a top-level binding, it will not be garbage collected while that binding is still reachable.
A function, instead,
fun x = [1..1000000]
will allocate a new list at every call, recomputing it (lazily) from scratch every time, so each list can be collected as soon as its consumer is done with it.
Note that this is not mandated by Haskell -- it's just GHC which works in this way.
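To make the difference concrete, here is a minimal sketch that uses each kind of binding twice. The names sharedList and freshList are made up for illustration, and the behaviour described in the comments assumes GHC without optimizations (with -O, full laziness may float the list out of freshList and turn it into a shared constant as well).

```haskell
import Data.List (foldl')

-- Top-level constant (a CAF): built lazily once, then shared by every use.
sharedList :: [Int]
sharedList = [1 .. 1000000]

-- Function with a dummy argument: each call can build a fresh list.
freshList :: () -> [Int]
freshList () = [1 .. 1000000]

main :: IO ()
main = do
  -- Two traversals of the CAF: the whole list stays alive between them.
  print (foldl' (+) 0 sharedList)
  print (length sharedList)
  -- Two traversals of separately built lists: each list can be
  -- collected while it is being consumed.
  print (foldl' (+) 0 (freshList ()))
  print (length (freshList ()))
```

Running this with +RTS -s should show the effect in the maximum residency figure, although the exact numbers depend on the GHC version and flags.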
For comparison, try this variant:
run x | v <- each $ run x
  = (x:) . concat $ transpose
      [ runOmega $ liftM2 (/ ) v v
      , runOmega $ liftM2 (- ) v v
      , runOmega $ liftM2 (**) v v
      ]
run15 = run 15
main = print . map snd . filter ((==5).fst) $ zip run15 [0..]
You should see a lot of memory being consumed, since no garbage collection can happen. Instead,
main = print . map snd . filter ((==5).fst) $ zip run15 [0..]
  where
    run15 = run 15
should allow for garbage collection and run in a small amount of memory.
(By the way, that use of a pattern binding instead of let/where puzzled me for a while.)

@chi is exactly right. A function with no arguments is not really a function: it is a constant, and its value will be kept in memory regardless of how recursive it is or whether it produces an infinite data structure. This will cause you to run out of memory:
run' | v <- each run'
  = (15:) . concat $ transpose
      [ runOmega $ liftM2 (/ ) v v
      , runOmega $ liftM2 (- ) v v
      , runOmega $ liftM2 (**) v v
      ]
main = print $ run' !! 1000000000
while the first run function will not. On the same note, despite looking silly, this will not run out of memory:
exprs' x | v <- each $ exprs' x
  = (x:) . concat $ transpose
      [ runOmega $ liftM2 Div v v
      , runOmega $ liftM2 Sub v v
      , runOmega $ liftM2 Pow v v
      ]
main = print $ exprs' X !! 1000000000
PS
If anyone is curious, the 118588079th element in the list was ((((x**x)/x)-(x**(x-(x/x))))-((x-x)-(x/x)))

Related

Lazy state transformer consumes lazy list eagerly in 2D recursion

I'm using a state transformer to randomly sample a dataset at every point of a 2D recursive walk, which outputs a list of 2D grids of samples that together succeed a condition. I'd like to pull from the results lazily, but my approach instead exhausts the whole dataset at every point before I can pull the first result.
To be concrete, consider this program:
import Control.Monad ( sequence, liftM2 )
import Data.Functor.Identity
import Control.Monad.State.Lazy ( StateT(..), State(..), runState )
import System.Random ( getStdGen ) -- needed for getStdGen in main
walk :: Int -> Int -> [State Int [Int]]
walk _ 0 = [return [0]]
walk 0 _ = [return [0]]
walk x y =
  let st :: [State Int Int]
      st = [StateT (\s -> Identity (s, s + 1)), undefined]
      unst :: [State Int Int] -- degenerate state tf
      unst = [return 1, undefined]
  in map (\m_z -> do
       z <- m_z
       fmap concat $ sequence [
           liftM2 (zipWith (\x y -> x + y + z)) a b -- for 1D: map (+z) <$> a
           | a <- walk x (y - 1) -- depth
           , b <- walk (x - 1) y -- breadth -- comment out for 1D
         ]
     ) st -- vs. unst
main :: IO ()
main = do
  std <- getStdGen
  putStrLn $ show $ head $ fst $ (`runState` 0) $ head $ walk 2 2
The program walks the rectangular grid from (x, y) to (0, 0) and sums all the results, including the value of one of the lists of State monads: either the non-trivial transformers st that read and advance their state, or the trivial transformers unst. Of interest is whether the algorithm explores past the heads of st and unst.
In the code as presented, it throws undefined. I chalked this up to a misdesign of my order of chaining the transformations, and in particular, a problem with the state handling, as using unst instead (i.e. decoupling the result from state transitions) does produce a result. However, I then found that a 1D recursion also preserves laziness even with the state transformer (remove the breadth step b <- walk... and swap the liftM2 block for fmap).
If we trace (show (x, y)), we also see that it does walk the whole grid before triggering:
$ cabal run
Build profile: -w ghc-8.6.5 -O1
...
(2,2)
(2,1)
(1,2)
(1,1)
(1,1)
sandbox: Prelude.undefined
I suspect that my use of sequence is at fault here, but as the choice of monad and the dimensionality of the walk affect its success, I can't say broadly that sequencing the transformations is the source of strictness by itself.
What's causing the difference in strictness between 1D and 2D recursion here, and how can I achieve the laziness I want?

Consider the following simplified example:
import Control.Monad.State.Lazy
st :: [State Int Int]
st = [state (\s -> (s, s + 1)), undefined]
action1d = do
  a <- sequence st
  return $ map (2*) a
action2d = do
  a <- sequence st
  b <- sequence st
  return $ zipWith (+) a b
main :: IO ()
main = do
  print $ head $ evalState action1d 0
  print $ head $ evalState action2d 0
Here, in both the 1D and 2D calculations, the head of the result depends explicitly only on the heads of the inputs (just head a for the 1D action and both head a and head b for the 2D action). However, in the 2D calculation, there's an implicit dependency of b (even just its head) on the current state, and that state depends on the evaluation of the entirety of a, not just its head.
You have a similar dependency in your example, though it's obscured by the use of lists of state actions.
Let's say we wanted to run the action walk22_head = head $ walk 2 2 manually and inspect the first integer in the resulting list:
main = print $ head $ evalState walk22_head 0
Writing the elements of the state action list st explicitly:
st1, st2 :: State Int Int
st1 = state (\s -> (s, s+1))
st2 = undefined
we can write walk22_head as:
walk22_head = do
  z <- st1
  a <- walk21_head
  b <- walk12_head
  return $ zipWith (\x y -> x + y + z) a b
Note that this depends only on the defined state action st1 and the heads of walk 2 1 and walk 1 2. Those heads, in turn, can be written:
walk21_head = do
  z <- st1
  a <- return [0] -- walk20_head
  b <- walk11_head
  return $ zipWith (\x y -> x + y + z) a b
walk12_head = do
  z <- st1
  a <- walk11_head
  b <- return [0] -- walk02_head
  return $ zipWith (\x y -> x + y + z) a b
Again, these depend only on the defined state action st1 and the head of walk 1 1.
Now, let's try to write down a definition of walk11_head:
walk11_head = do
  z <- st1
  a <- return [0]
  b <- return [0]
  return $ zipWith (\x y -> x + y + z) a b
This depends only on the defined state action st1, so with these definitions in place, if we run main, we get a defined answer:
> main
10
But these definitions aren't accurate! In each of walk 1 2 and walk 2 1, the head action is a sequence of actions, starting with the action that invokes walk11_head, but continuing with actions based on walk11_tail. So, more accurate definitions would be:
walk21_head = do
  z <- st1
  a <- return [0] -- walk20_head
  b <- walk11_head
  _ <- walk11_tail -- side effect of the sequence
  return $ zipWith (\x y -> x + y + z) a b
walk12_head = do
  z <- st1
  a <- walk11_head
  b <- return [0] -- walk02_head
  _ <- walk11_tail -- side effect of the sequence
  return $ zipWith (\x y -> x + y + z) a b
with:
walk11_tail = do
  z <- undefined
  a <- return [0]
  b <- return [0]
  return [zipWith (\x y -> x + y + z) a b]
With these definitions in place, there's no problem running walk12_head and walk21_head in isolation:
> head $ evalState walk12_head 0
1
> head $ evalState walk21_head 0
1
The state side effects here are not needed to calculate the answer and so never invoked. But, it's not possible to run them both in sequence:
> head $ evalState (walk12_head >> walk21_head) 0
*** Exception: Prelude.undefined
CallStack (from HasCallStack):
error, called at libraries/base/GHC/Err.hs:78:14 in base:GHC.Err
undefined, called at Lazy2D_2.hs:41:8 in main:Main
Therefore, trying to run main fails for the same reason:
> main
*** Exception: Prelude.undefined
CallStack (from HasCallStack):
error, called at libraries/base/GHC/Err.hs:78:14 in base:GHC.Err
undefined, called at Lazy2D_2.hs:41:8 in main:Main
because, in calculating walk22_head, even the very beginning of walk21_head's calculation depends on the state side effect walk11_tail initiated by walk12_head.
Your original walk definition behaves the same way as these mockups:
> head $ evalState (head $ walk 1 2) 0
1
> head $ evalState (head $ walk 2 1) 0
1
> head $ evalState (head (walk 1 2) >> head (walk 2 1)) 0
*** Exception: Prelude.undefined
CallStack (from HasCallStack):
error, called at libraries/base/GHC/Err.hs:78:14 in base:GHC.Err
undefined, called at Lazy2D_0.hs:15:49 in main:Main
> head $ evalState (head (walk 2 2)) 0
*** Exception: Prelude.undefined
CallStack (from HasCallStack):
error, called at libraries/base/GHC/Err.hs:78:14 in base:GHC.Err
undefined, called at Lazy2D_0.hs:15:49 in main:Main
It's hard to say how to fix this. Your toy example was excellent for the purposes of illustrating the problem, but it's not clear how the state is used in your "real" problem and if head $ walk 2 1 really has a state dependency on the sequence of walk 1 1 actions induced by head $ walk 1 2.
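As one possible direction only, and assuming the sub-computations do not need to see each other's state changes (which may well not hold for the real problem), the simplified action2d example can be made lazy again by running each part on a copy of the current state. The helper isolated below is hypothetical, not part of mtl, and it changes the semantics: the state is no longer threaded from a to b.

```haskell
import Control.Monad.State.Lazy

st :: [State Int Int]
st = [state (\s -> (s, s + 1)), undefined]

-- Hypothetical helper: run an action on a copy of the current state and
-- discard its state changes, so later actions do not depend on them.
isolated :: State s a -> State s a
isolated m = gets (evalState m)

action2dIsolated :: State Int [Int]
action2dIsolated = do
  a <- isolated (sequence st)
  b <- isolated (sequence st)
  return $ zipWith (+) a b

main :: IO ()
main = print $ head $ evalState action2dIsolated 0
```

Here the head of b no longer waits for all of a to be sequenced, because b starts from the state that was current before a ran rather than from the state a leaves behind.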

The accepted answer by K.A. Buhr is right: while getting the head of one step in each direction is fine (try walk with either x < 2 or y < 2) the combination of the implicit >>= in liftM2, the sequence in the value of a and the state dependency in the value of b makes b depend on all side effects of a. As he also pointed out, a working solution depends on what dependencies are actually wanted.
I'll share a solution for my particular case: each walk call depends on the state of the caller at least, and perhaps some other states, based on a pre-order traversal of the grid and alternatives in st. In addition, as the question suggests, I want to try to make a full result before testing any unneeded alternatives in st. This is a little difficult to explain visually, but here's the best I could do: the left shows the variable number of st alternatives at each coordinate (which is what I have in my actual use case) and the right shows a [rather messy] map of the desired dependency order of the state: we see it traverses x-y first in a 3D DFS, with "x" as depth (fastest axis), "y" as breadth (middle axis), then finally alternatives as the slowest axis (shown in dashed lines with open circles).
The central issue in the original implementation came from sequencing lists of state transitions to accommodate the non-recursive return type. Let's replace the list type altogether with a type that's recursive in the monad parameter, so the caller can better control the dependency order:
data ML m a = MCons a (MML m a) | MNil -- recursive monadic list
newtype MML m a = MML (m (ML m a)) -- base case wrapper
An example of [1, 2]:
MCons 1 (MML (return (MCons 2 (MML (return MNil)))))
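To get a feel for the shape of this type, here is a small sketch that converts between ordinary lists and the monadic list, using Identity as the monad. The helpers fromList and toListM are illustrative names, not part of the solution below.

```haskell
import Data.Functor.Identity (Identity (..))

data ML m a = MCons a (MML m a) | MNil -- recursive monadic list
newtype MML m a = MML (m (ML m a))     -- base case wrapper

-- Wrap every tail of an ordinary list in the monad (hypothetical helper).
fromList :: Applicative m => [a] -> MML m a
fromList = foldr (\x rest -> MML (pure (MCons x rest))) (MML (pure MNil))

-- Run the effects one cell at a time and collect the elements
-- (hypothetical helper).
toListM :: Monad m => MML m a -> m [a]
toListM (MML m) = do
  cell <- m
  case cell of
    MNil         -> pure []
    MCons a rest -> (a :) <$> toListM rest

main :: IO ()
main = print (runIdentity (toListM (fromList [1, 2 :: Int])))
```

The point of the representation is that the effect producing each tail sits inside the previous cell, so a consumer decides how far the effects are run.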
Functor and Monoid behaviors are used often, so here are the relevant implementations (coerce comes from Data.Coerce):
instance Functor m => Functor (ML m) where
  fmap f (MCons a m) = MCons (f a) (MML $ (fmap f) <$> coerce m)
  fmap _ MNil = MNil
instance Monad m => Semigroup (MML m a) where
  (MML l) <> (MML r) = MML $ l >>= mapper where
    mapper (MCons la lm) = return $ MCons la (lm <> (MML r))
    mapper MNil = r
instance Monad m => Monoid (MML m a) where
  mempty = MML (pure MNil)
There are two critical operations: combining steps in two different axes, and combining lists from different alternatives at the same coordinate. Respectively:
Based on the diagram, we want to get a single full result from the x step first, then a full result from the y step. Each step returns a list of results from all combinations of viable alternatives from inner coordinates, so we take a Cartesian product over both lists, also biased in one direction (in this case y fastest). First we define a "concatenation" that applies a base case wrapper MML at the end of a bare list ML:
nest :: Functor m => MML m a -> ML m a -> ML m a
nest ma (MCons a mb) = MCons a (MML $ nest ma <$> coerce mb)
then a Cartesian product:
prodML :: Monad m => (a -> a -> a) -> ML m a -> ML m a -> ML m a
prodML f x (MCons ya ym) = (MML $ prodML f x <$> coerce ym) `nest` ((f ya) <$> x)
prodML _ MNil _ = MNil
We want to smash the lists from different alternatives into one list and we don't care that this introduces dependencies between alternatives. This is where we use mconcat from the Monoid instance.
All in all, it looks like this:
walk :: Int -> Int -> MML (State Int) Int
-- base cases
walk _ 0 = MML $ return $ MCons 1 (MML $ return MNil)
walk 0 _ = walk 0 0
walk x y =
  let st :: [State Int Int]
      st = [StateT (\s -> Identity (s, s + 1)), undefined]
      xstep = coerce $ walk (x-1) y
      ystep = coerce $ walk x (y-1)
  -- point 2: smash lists with mconcat
  in mconcat $ map (\mz -> MML $ do
       z <- mz
       -- point 1: product over results
       liftM2 ((fmap (z+) .) . prodML (+)) xstep ystep
     ) st
headML (MCons a _) = a
headML _ = undefined
main :: IO ()
main = putStrLn $ show $ headML $ fst $ (`runState` 0) $ (\(MML m) -> m) $ walk 2 2
Note that the results have changed along with the semantics. This doesn't matter to me, since my goal only needed to pull random numbers from the state, and whatever dependency order is needed can be controlled with the right shepherding of list elements into the final result.
(I'll also warn that without memoization or attention to strictness, this implementation is very inefficient for large x and y.)

Parsec permutation parsing

I wrote the following permutation parsing example:
data Entry = Entry {
    first_name :: String
  , last_name :: String
  , date_of_birth :: Maybe String
  , nationality :: Maybe String
  , parentage :: Maybe String
  } deriving (Show)
nameParser :: Parser (String, String)
nameParser = do
  first_name <- many1 upper
  endOfLine
  last_name <- many1 letter
  endOfLine
  return $ (first_name, last_name)
attributeParser :: String -> Parser String
attributeParser field = do
  string $ field ++ ": "
  value <- many1 (noneOf "\n")
  endOfLine
  return value
entryParser :: Parser Entry
entryParser = do
  (f, l) <- nameParser
  (d, n, p) <- permute ((,,)
    <$?> (Nothing, liftM Just (try $ attributeParser "Date of Birth"))
    <|?> (Nothing, liftM Just (try $ attributeParser "Nationality"))
    <|?> (Nothing, liftM Just (try $ attributeParser "Parentage"))
    )
  return $ Entry f l d n p
main = do
  mapM_ putStrLn . map (show . parse entryParser "") $ goodTests
goodTests =
  "AAKVAAG\nTorvild\nDate of Birth: 1 July\nNationality: Norwegian\nParentage: business executive\n" :
  "AAKVAAG\nTorvild\nNationality: Norwegian\nParentage: business executive\n" :
  "AAKVAAG\nTorvild\nParentage: business executive\nNationality: Norwegian\n" :
  "AAKVAAG\nTorvild\nParentage: business executive\n" :
  "AAKVAAG\nTorvild\nNationality: Norwegian\n" : []
It would be good to extend the Entry data type with new fields in the future, but doing that will require putting even more repetitive code in the entryParser function. Is there a way to make this function accept a list of parsers?
I started with this:
attributeParsers =
  map attributeParser ["Date of Birth", "Nationality", "Parentage"]
permuteParams =
  map (\p -> (Nothing, liftM Just (try p))) attributeParsers
But I could not come up with a correct way to fold permuteParams together with the <|?> operator (I guess it would require something smarter than the (,,) tuple constructor).

As a first step, you can abstract the stuff you do for every component:
attr txt = (Nothing, liftM Just (try $ attributeParser txt))
With this, you can go to:
entryParser :: Parser Entry
entryParser = do
  (f, l) <- nameParser
  (d, n, p) <- permute ((,,)
    <$?> attr "Date of Birth"
    <|?> attr "Nationality"
    <|?> attr "Parentage"
    )
  return $ Entry f l d n p
Then, if you want, you can combine the infix combinators and the attr calls:
f .$ x = f <$?> attr x
f .| x = f <|?> attr x
infixl 2 .$
infixl 2 .|
This gives you:
entryParser :: Parser Entry
entryParser = do
  (f, l) <- nameParser
  (d, n, p) <- permute ((,,)
    .$ "Date of Birth"
    .| "Nationality"
    .| "Parentage"
    )
  return $ Entry f l d n p
Then you can further simplify by getting rid of the intermediate triple. All you're doing is to build it and then apply its components to Entry f l, so you can as well apply the result of the permutation parser to Entry f l directly:
entryParser :: Parser Entry
entryParser = do
  (f, l) <- nameParser
  permute (Entry f l
    .$ "Date of Birth"
    .| "Nationality"
    .| "Parentage"
    )
I think this is compact enough. If you really want some kind of fold, you have two options. Either you introduce an intermediate list and collect the permutation results in it; this only works as long as all the permutable attributes are of the same type (they currently are), and it is not so nice because you'll make assumptions about the number of elements in this list. Or you use a heterogeneous list or some type class magic, which will lead to more complexity with the types and is, I think, not worth it here.

(<|?>) does not play nicely with folding because the type of the StreamPermParser you pass as its first argument isn't the same as that of the StreamPermParser result. For a simpler yet analogous issue, you would run into similar problems if you were trying to use (,,) with (<$>) and (<*>) in applicative style (e.g. (,,) <$> foo <*> bar <*> baz).
If you want to cut down some of the repetition, my prosaic suggestion would be using a local definition:
entryParser :: Parser Entry
entryParser = do
  (f, l) <- nameParser
  (d, n, p) <- permute ((,,)
    <$?> optField "Date of Birth"
    <|?> optField "Nationality"
    <|?> optField "Parentage"
    )
  return $ Entry f l d n p
  where
    optField fieldName = (Nothing, liftM Just (try $ attributeParser fieldName))
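The applicative analogy mentioned above can be spelled out with Maybe standing in for the parser type (an assumption made purely to keep the sketch self-contained). Writing each intermediate step with its own signature shows that the type changes after every application, which is what rules out a plain fold with a single accumulator type.

```haskell
-- Each application of <*> consumes one argument, so every intermediate
-- value has a different type.
step0 :: Maybe (Int -> Char -> Bool -> (Int, Char, Bool))
step0 = pure (,,)

step1 :: Maybe (Char -> Bool -> (Int, Char, Bool))
step1 = step0 <*> Just 1

step2 :: Maybe (Bool -> (Int, Char, Bool))
step2 = step1 <*> Just 'x'

step3 :: Maybe (Int, Char, Bool)
step3 = step2 <*> Just True

main :: IO ()
main = print step3
```

The permutation combinators behave the same way: each <|?> removes one argument from the function type carried by the parser.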

Using force vs time / space efficiency

Hi, I am generating a sparse DAG of 1000 x 1000 nodes, each having ~4 edges (directions). Here is the relevant code: Full Code with imports
The problem I am solving has values in the range [0, 1500]; I have hardcoded 1501 as the upper value for now. I am trying to calculate the longest path of edges in the DAG. However, these details are not directly part of my question.
My question is about how to judge when to use force or similar constructs while writing algorithms in Haskell:
type OutGoingEdges = Map.Map NodeId [ NodeId ]
type NodesData = Map.Map NodeId Node
type NodeId = Int
data DAG = AdjList
  { outGoingEdges :: OutGoingEdges
  , nodesData :: NodesData
  } deriving (Eq, Show)
makeDAG :: DAGDataPath -> IO (DAG, SourceNodes)
makeDAG filepath = do
  listOfListOfInts <- makeInteger <$> readLines filepath
  let [width, height] = head listOfListOfInts
      numNodes = width * height
      rows = (replicate width 1501) : (drop 1 listOfListOfInts) ++ [(replicate width 1501)]
      heightsWithNodeIdsRows = force . fmap (\ (row, rowId) -> fmap (\ (height, colId) -> (height, rowId * width + colId)) $ zip row [1..]) $ zip rows [1..]
      emptyGraph = AdjList Map.empty $ Map.fromList (fmap (\(h, nid) -> (nid, Node h)) . concat . tail . init $ heightsWithNodeIdsRows)
      emptyNodesWithEdges = Set.empty
      threeRowsInOneGo = zip3 heightsWithNodeIdsRows (drop 1 heightsWithNodeIdsRows) (drop 2 heightsWithNodeIdsRows)
      (graph, nodesWithInEdges) = DL.foldl' makeGraph (emptyGraph, emptyNodesWithEdges) threeRowsInOneGo
      sourceNodes = Set.difference (Set.fromList . Map.keys . nodesData $ graph) nodesWithInEdges
  -- traceShow [take 10 . Map.keys . nodesData $ graph] (return (Set.toList sourceNodes))
  -- traceShow graph (return (Set.toList sourceNodes))
  -- traceShow sourceNodes (return (Set.toList sourceNodes))
  return (graph, force $ Set.toList sourceNodes)
  where
    makeGraph (graphTillNow, nodesWithInEdges) (prevRow, row, nextRow) =
      let updownEdges = zip3 prevRow row nextRow
          (graph', nodesInEdges') = addEdges (graphTillNow, nodesWithInEdges) updownEdges
          leftRightEdges = zip3 ((1501, 0) : row) (drop 1 row) (drop 2 row)
          (graph'', nodesInEdges'') = addEdges (graph', nodesInEdges') leftRightEdges
      -- Next line is interesting... graph'' is DAG and nodesInEdges'' is a Set NodeId
      in (graph'', nodesInEdges'')
    addEdges (g, n) edges =
      DL.foldl' (\ (!g', !n') ((p, pId), (c, cId), (n, nId)) ->
                   let (g'', n'') = if c > p
                                      then (makeEdge cId pId g', Set.insert pId n')
                                      else (g', n')
                       (g''', n''') = if c > n
                                        then (makeEdge cId nId g'', Set.insert nId n'')
                                        else (g'', n'')
                   in (g''', n'''))
                (g, n)
                edges
While profiling I found that if I use (force graph'', force nodesInEdges'') instead of (graph'', nodesInEdges''), my memory usage drops from 3 GB to 600 MB, but the running time of the program increases from 37 secs to 69 secs. These numbers are from the time command and from looking at Activity Monitor. I also checked with profiling and the results were similar.
I am compiling profile builds with :
stack build --executable-profiling --library-profiling --ghc-options="-fprof-auto -auto-all -caf-all -fforce-recomp -rtsopts" --file-watch
I have ghc-7.10.3 and stack 1.1.2.
I think that force goes over the data structure again and again.
Can force be told not to go over the graph if it is already fully evaluated?
Can I use some other strategy?
Sample Input:
2 2 -- width height
1 2
3 4
Output:
3
Output is the length of the longest path in the graph: [4 -> 2 -> 1], i.e. [(1,1), (0,1), (0,0)]. Just to remind, the correctness of the program is not the question; space/time efficiency is. Thanks
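On the suspicion that force keeps re-walking the graph: force from Control.DeepSeq does traverse the whole structure each time it is demanded, because it has no record of which parts were already evaluated. One alternative, sketched here under the assumption that the graph is a Map from node to successor list, is to keep the accumulator strict as it is built (strict map operations plus bang patterns), so no separate deep traversal is needed. The function names addEdge and buildGraph are hypothetical simplifications, not the makeEdge from the question.

```haskell
{-# LANGUAGE BangPatterns #-}
import Data.List (foldl')
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

type NodeId = Int
type Graph = Map.Map NodeId [NodeId]

-- Hypothetical simplified step: record one edge and its target node.
-- The bangs force both halves of the pair at every step, and
-- Data.Map.Strict forces each stored value to weak head normal form.
addEdge :: (Graph, Set.Set NodeId) -> (NodeId, NodeId) -> (Graph, Set.Set NodeId)
addEdge (!g, !seen) (from, to) =
  let !g'    = Map.insertWith (++) from [to] g
      !seen' = Set.insert to seen
  in (g', seen')

buildGraph :: [(NodeId, NodeId)] -> (Graph, Set.Set NodeId)
buildGraph = foldl' addEdge (Map.empty, Set.empty)

main :: IO ()
main = do
  let (g, seen) = buildGraph [(1, 2), (1, 3), (2, 3)]
  print (Map.toList g)
  print (Set.toList seen)
```

This does only a constant amount of forcing per edge, whereas calling force on the whole graph after every row costs time proportional to the graph built so far.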

Short-circuiting a function over a lower triangular(ish) array in Haskell: speed leads to ugly code

I've got a function, in my minimum example called maybeProduceValue i j, which is only valid when i > j. Note that in my actual code, the js are not uniform and so the data only resembles a triangular matrix, I don't know what the mathematical name for this is.
I'd like my code, which loops over i and j and returns essentially (where js is sorted)
[maximum [f i j | j <- js, j < i] | i <- [0..iMax]]
to not check any more j's once one has failed. In C-like languages, this is as simple as
if (j >= i) {break;}
and I'm trying to recreate this behaviour in Haskell. I've got two implementations below:
one which tries to take advantage of laziness by using takeWhile to only inspect at most one value (per i) which fails the test and returns Nothing;
one which remembers the number of js which worked for the previous i and so, for i+1, it doesn't bother doing any safety checks until it exceeds this number.
This latter function is more than twice as fast by my benchmarks but it really is a mess - I'm trying to convince people that Haskell is more concise and safe while still reasonably performant and here is some fast code which is dense, cluttered and does a bunch of unsafe operations.
Is there a solution, perhaps using Cont, Error or Exception, that can achieve my desired behaviour?
n.b. I've tried using Traversable.mapAccumL and Vector.unfoldrN instead of State and they end up being about the same speed and clarity. It's still a very overcomplicated way of solving this problem.
import Criterion.Config
import Criterion.Main
import Control.DeepSeq
import Control.Monad.State
import Data.Maybe
import qualified Data.Traversable as T
import qualified Data.Vector as V
main = deepseq js $ defaultMainWith (defaultConfig{cfgSamples = ljust 10}) (return ()) [
  bcompare [
    bench "whileJust" $ nf whileJust js,
    bench "memoised" $ nf memoisedSection js
  ]]
iMax = 5000
jMax = 10000
-- any sorted vector
js :: V.Vector Int
js = V.enumFromN 0 jMax
maybeProduceValue :: Int -> Int -> Maybe Float
maybeProduceValue i j | j < i = Just (fromIntegral (i+j))
                      | otherwise = Nothing
unsafeProduceValue :: Int -> Int -> Float
-- unsafeProduceValue i j | j >= i = error "you fool!"
unsafeProduceValue i j = fromIntegral (i+j)
whileJust, memoisedSection
  :: V.Vector Int -> V.Vector Float
-- mean: 389ms
-- short circuits properly
whileJust inputs' = V.generate iMax $ \i ->
  safeMax . V.map fromJust . V.takeWhile isJust $ V.map (maybeProduceValue i) inputs'
  where safeMax v = if V.null v then 0 else V.maximum v
-- mean: 116ms
-- remembers the (monotonically increasing) length of the section of
-- the vector that is safe. I have tested that this doesn't violate the condition that j < i
memoisedSection inputs' = flip evalState 0 $ V.generateM iMax $ \i -> do
  validSection <- state $ \oldIx ->
    let newIx = oldIx + V.length (V.takeWhile (< i) (V.unsafeDrop oldIx inputs'))
    in (V.unsafeTake newIx inputs', newIx)
  return $ V.foldl' max 0 $ V.map (unsafeProduceValue i) validSection

Here's a simple way of solving the problem with Applicatives, provided that you don't need to keep the rest of the list once you run into an issue:
import Control.Applicative
memoizeSections :: Ord t => [(t, t)] -> Maybe [t]
memoizeSections [] = Just []
memoizeSections ((x, y):xs) = (:) <$> maybeProduceValue x y <*> memoizeSections xs
This is equivalent to:
import Data.Traversable
memoizeSections :: Ord t => [(t, t)] -> Maybe [t]
memoizeSections = traverse (uncurry maybeProduceValue)
and will return Nothing on the first occurrence of failure. Note that I don't know how fast this is, but it's certainly concise, and arguably pretty clear (particularly the first example).
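A small self-contained sketch of that behaviour, reusing maybeProduceValue from the question with its concrete Int and Float types (the name allValues is made up for this example):

```haskell
maybeProduceValue :: Int -> Int -> Maybe Float
maybeProduceValue i j
  | j < i     = Just (fromIntegral (i + j))
  | otherwise = Nothing

-- Hypothetical name: succeeds only if every (i, j) pair is valid.
allValues :: [(Int, Int)] -> Maybe [Float]
allValues = traverse (uncurry maybeProduceValue)

main :: IO ()
main = do
  print (allValues [(2, 1), (3, 0)]) -- every pair has j < i
  print (allValues [(2, 1), (1, 5)]) -- second pair fails
```

Note that this discards the values produced before the failure, which is different from the takeWhile version in the question that keeps the valid prefix.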

Some minor comments:
-- any sorted vector
js :: V.Vector Int
js = V.enumFromN 0 jMax
If you have a vector of Ints (or Floats, etc), you want to use Data.Vector.Unboxed.
maybeProduceValue :: Int -> Int -> Maybe Float
maybeProduceValue i j | j < i = Just (fromIntegral (i+j))
                      | otherwise = Nothing
Since Just is lazy in its only field, this will create a thunk for the computation fromIntegral (i+j). You almost always want to apply Just like so
maybeProduceValue i j | j < i = Just $! fromIntegral (i+j)
There are some more thunks in:
memoisedSection inputs' = flip evalState 0 $ V.generateM iMax $ \i -> do
  validSection <- state $ \oldIx ->
    let newIx = oldIx + V.length (V.takeWhile (< i) (V.unsafeDrop oldIx inputs'))
    in (V.unsafeTake newIx inputs', newIx)
  return $ V.foldl' max 0 $ V.map (unsafeProduceValue i) validSection
Namely you want to:
let !newIx = oldIx + V.length (V.takeWhile (< i) (V.unsafeDrop oldIx inputs'))
    !v = V.unsafeTake newIx inputs'
in (v, newIx)
as the pair is lazy in its fields and
return $! V.foldl' max 0 $ V.map (unsafeProduceValue i) validSection
because return in the state monad is lazy in the value.
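The laziness of Just can be observed directly. In this sketch (function names are illustrative), the lazy version happily wraps a computation that would fail if evaluated, while the strict version evaluates its payload before the constructor is returned.

```haskell
import Data.Maybe (isJust)

-- Lazy: the division is stored as an unevaluated thunk inside Just.
lazyJust :: Int -> Maybe Int
lazyJust n = Just (n `div` 0)

-- Strict: n + 1 is evaluated before Just is applied.
strictJust :: Int -> Maybe Int
strictJust n = Just $! n + 1

main :: IO ()
main = do
  -- Only the constructor is inspected, so the bad division never runs.
  print (isJust (lazyJust 1))
  print (strictJust 1)
```

In a tight numeric loop the thunk created by the lazy version costs an allocation per element, which is why the strict form is usually preferable there.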

You can use a guard in a single list comprehension:
[f i j | j <- js, i <- is, j < i]

If you're trying to get the same results as
[foo i j | i <- is, j <- js, j < i]
when you know that js is increasing, just write
[foo i j | i <- is, j <- takeWhile (< i) js]
There's no need to mess around with Maybe for this. Note that making the input list global has a likely-unfortunate effect: instead of fusing the production of the input list with its transformation(s) and ultimate consumption, it's forced to actually construct the list and then keep it in memory. It's quite possible that it will take longer to pull the list into cache from memory than to generate it piece by piece on the fly!
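As a sketch of how the takeWhile version could produce the per-i maxima from the question, assuming the same value function i + j and a default of 0 for an empty row (maxPerRow is an illustrative name):

```haskell
import Data.List (foldl')

-- For each i, take the maximum of (i + j) over the sorted js below i.
-- takeWhile stops at the first j >= i, so js may even be infinite.
maxPerRow :: [Int] -> [Int] -> [Float]
maxPerRow is js =
  [ foldl' max 0 [fromIntegral (i + j) | j <- takeWhile (< i) js]
  | i <- is
  ]

main :: IO ()
main = print (maxPerRow [0 .. 3] [0 ..])
```

Because js is passed as an argument here rather than defined globally, the compiler is free to fuse or regenerate it instead of keeping it in memory.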

Project Euler #4 using Haskell

I hope this works by just pasting and running it with "runghc euler4.hs 1000". Since I am having a hard time learning Haskell, can someone perhaps tell me how I could improve here? Especially all those "fromIntegral" are a mess.
module Main where
import System.Environment
main :: IO ()
main = do
  args <- getArgs
  let
    hBound = read (args !! 0)::Int
    squarePal = pal hBound
    lBound = floor $ fromIntegral squarePal /
      (fromIntegral hBound / fromIntegral squarePal)
    euler = maximum $ takeWhile (>squarePal) [ x | y <- [lBound..hBound],
                                                   z <- [y..hBound],
                                                   let x = y * z,
                                                   let s = show x,
                                                   s == reverse s ]
  putStrLn $ show euler
pal :: Int -> Int
pal n
  | show pow == reverse (show pow) = n
  | otherwise = pal (n-1)
  where
    pow = n^2

If what you want is integer division, you should use div instead of converting to a fractional type with fromIntegral and back with floor in order to use ordinary /.
module Main where
import System.Environment
main :: IO ()
main = do
  (arg:_) <- getArgs
  let
    hBound = read arg :: Int
    squarePal = pal hBound
    lBound = squarePal * squarePal `div` hBound
    euler = maximum $ takeWhile (>squarePal) [ x | y <- [lBound..hBound],
                                                   z <- [y..hBound],
                                                   let x = y * z,
                                                   let s = show x,
                                                   s == reverse s ]
  print euler
pal :: Int -> Int
pal n
  | show pow == reverse (show pow) = n
  | otherwise = pal (n - 1)
  where
    pow = n * n
(I've rewritten the lBound expression, which used two /, and fixed some styling issues highlighted by hlint.)
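The two ways of computing the lower bound can be compared side by side. This is only a sketch with made-up function names; the sample values 836 and 1000 correspond to the case where the program is run with 1000, since 836 is the largest n up to 1000 whose square is a palindrome.

```haskell
-- Original style: convert to Double, divide twice, then floor.
lowerBoundFloat :: Int -> Int -> Int
lowerBoundFloat squarePal hBound =
  floor (fromIntegral squarePal / (fromIntegral hBound / fromIntegral squarePal) :: Double)

-- Integer style: `div` has the same precedence as * and both associate
-- to the left, so this parses as (squarePal * squarePal) `div` hBound.
lowerBoundDiv :: Int -> Int -> Int
lowerBoundDiv squarePal hBound = squarePal * squarePal `div` hBound

main :: IO ()
main = do
  print (lowerBoundFloat 836 1000)
  print (lowerBoundDiv 836 1000)
```

The integer version avoids floating-point rounding altogether, at the cost of possible overflow if squarePal * squarePal exceeds the range of Int.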

Okay, couple of things:
First, it might be better to pass in a lower bound and an upper bound for this question, it makes it a little bit more expandable.
If you're only going to use the first two (one in your previous case) arguments from the CL, we can handle this with pattern matching easily and avoid yucky statements like (args !! 0):
(arg0:arg1:_) <- getArgs
Let's convert these to Ints:
let [a, b] = map (\x -> read x :: Int) [arg0,arg1]
Now we can reference a and b, our lower and upper bounds.
Next, let's make a function that runs through all of the numbers between an upper and lower bound and gets a list of their products:
products a b = [x*y | x <- [a..b], y <- [x..b]]
We do not have to run over each pair twice, so we start y at our current x to get all of the different products.
From here, we'll want a function that filters out the non-palindromes in some data set:
palindromes xs = filter palindrome xs
  where palindrome x = show x == reverse (show x)
Finally, in our main function:
print . maximum . palindromes $ products a b
Here's the full code if you would like to review it:
import System.Environment
main = do
  (arg0:arg1:_) <- getArgs
  let [a, b] = map (\x -> read x :: Int) [arg0,arg1]
  print . maximum . palindromes $ products a b
products a b = [x*y | x <- [a..b], y <- [x..b]]
palindromes = filter palindrome
  where palindrome x = (show x) == (reverse $ show x)
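For checking the logic without command-line arguments, the same pipeline can be packaged as a pure function. This is a sketch; the name largestPalindrome is an illustrative choice, and like maximum in the code above it fails on an empty result, for example when the range contains no palindromic product.

```haskell
isPalindrome :: Int -> Bool
isPalindrome n = s == reverse s
  where s = show n

-- Largest palindromic product x * y with lo <= x <= y <= hi.
largestPalindrome :: Int -> Int -> Int
largestPalindrome lo hi =
  maximum [p | x <- [lo .. hi], y <- [x .. hi], let p = x * y, isPalindrome p]

main :: IO ()
main = do
  print (largestPalindrome 10 99)   -- two-digit factors
  print (largestPalindrome 100 999) -- three-digit factors (Project Euler #4)
```

For two-digit factors this gives 9009 (91 * 99), and for three-digit factors 906609 (913 * 993).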
