Speed up List.fold with Threads and Channels - multithreading

What I am trying to achieve is my own version of List.fold_right or List.fold_left that uses the Event and Thread modules in order to speed up the process.
(I'm aware that OCaml doesn't support parallel multithreading, but I'm here for the concept.)
My attempt (although I'm not sure it gains any time):
open Thread
open Event
let rec tapply f start = function
  | [] -> start
  | h :: t ->
    let c = new_channel () in
    let _ = create (fun _ -> sync (send c (f h (tapply f start t)))) () in
    sync (receive c)
Calling tapply:
#tapply ( * ) 1 [1;2;3;4];;
- : int = 24

It is impossible to parallelize fold functions in the general case, as every intermediate value is dependent on all the previous ones. See Cipher Block Chaining, for example. Parallelization is possible if certain additional conditions are met.
Example 1: If it's possible to treat each list element as a function (example: group actions), then you can exploit the associativity of function composition using a divide and conquer approach: first calculate the function represented by the whole list, then apply it to the initial element.
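To make Example 1 concrete, here is a small sketch in Haskell (used for all added examples on this page). It assumes each list element denotes an affine map sending x to a*x + b. This representation is chosen only for illustration. Composition of such maps is associative and can be computed directly on the coefficients, so the composite of the whole list can be built by divide and conquer and then applied to the initial element. The sketch itself runs sequentially. The two recursive calls in composeAll are independent of each other, and that is where parallelism could be introduced.

```haskell
-- Each list element is viewed as an affine map x |-> a*x + b
-- (an illustrative representation of "an element as a function").
type Affine = (Integer, Integer)

-- Apply an affine map to a value.
apply :: Affine -> Integer -> Integer
apply (a, b) x = a * x + b

-- Concrete composition: compose f g represents (apply f . apply g).
compose :: Affine -> Affine -> Affine
compose (a1, b1) (a2, b2) = (a1 * a2, a1 * b2 + b1)

-- Divide and conquer: the two halves do not depend on each other,
-- so they could be computed on separate threads.
composeAll :: [Affine] -> Affine
composeAll [] = (1, 0) -- identity map
composeAll [g] = g
composeAll gs = compose (composeAll l) (composeAll r)
  where
    (l, r) = splitAt (length gs `div` 2) gs

-- Same result as foldr apply z gs, obtained by first building the composite map.
foldrAffine :: [Affine] -> Integer -> Integer
foldrAffine gs z = apply (composeAll gs) z

main :: IO ()
main = print (foldrAffine [(2, 1), (3, 0), (1, 5)] 10)
```

For instance, foldrAffine [(2, 1), (3, 0), (1, 5)] 10 evaluates to 91, the same as foldr apply 10 [(2, 1), (3, 0), (1, 5)].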
Example 2: If the function is associative and you're using reduce instead of a fold operation, then you can split the list into sublists and distribute the work for the sublists, for example:
open Batteries
let par_reduce n f l =
  (* worker thread *)
  let worker (ch, l) =
    List.reduce f l |> Event.(sync % send ch) in
  (* split [l] in [n] sublists *)
  let k = max (List.length l / n) 1 in
  let sublists = List.ntake k l in
  (* create [n] worker threads *)
  List.map (fun sublist ->
      let ch = Event.new_channel () in
      let _ = Thread.create worker (ch, sublist) in
      ch) sublists
  (* collect results *)
  |> List.map Event.(sync % receive)
  (* and combine those into the final result *)
  |> List.reduce f
let factorial n =
  par_reduce 4 Num.mult_num (List.init n (fun k -> Num.of_int (k+1)))
let () =
  Printf.printf "%s\n" Num.(to_string @@ factorial 1000)
Note 1: This uses the "batteries included" library for some convenience functions.
Note 2: There are faster ways to calculate large factorials (in fact, the code doesn't even split the work evenly); this is just an example that actually uses some CPU time.
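For comparison, here is a rough Haskell analogue of par_reduce. It uses forkIO and one MVar per worker in place of OCaml's Thread and Event channels. The sketch rests on a few assumptions: the operator is associative, the list is non-empty (like List.reduce), and n is at least 1. chunksOf is defined locally rather than taken from a library. As with OCaml's threads, whether the workers really run in parallel depends on the runtime. In GHC, that means compiling with -threaded and running with +RTS -N.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)

-- Split a list into consecutive chunks of at most k elements (k >= 1).
chunksOf :: Int -> [a] -> [[a]]
chunksOf _ [] = []
chunksOf k xs = h : chunksOf k t
  where
    (h, t) = splitAt k xs

-- Reduce a non-empty list with an associative operator f,
-- using one forked thread per chunk; n is the requested number of chunks.
parReduce :: Int -> (a -> a -> a) -> [a] -> IO a
parReduce n f xs = do
  let k = max (length xs `div` n) 1
  vars <- mapM spawn (chunksOf k xs)
  results <- mapM takeMVar vars -- collect results in chunk order
  return (foldr1 f results)
  where
    spawn chunk = do
      v <- newEmptyMVar
      -- ($!) makes the worker evaluate its partial result (to WHNF) itself
      _ <- forkIO (putMVar v $! foldr1 f chunk)
      return v

main :: IO ()
main = do
  r <- parReduce 4 (*) [1 .. 20 :: Integer]
  print r
```

The results are combined in chunk order, which is why associativity of the operator is enough here.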

Related

Haskell - Values do not save after a ForM loop

I want to create a program in Haskell that can find an increasing subsequence in a list of numbers. It is not yet complete: this part computes, for each sublist, the longest increasing subsequence within that sublist. The program takes input such as
1
5
1 2 9 6 8
where the first line is the number of test cases, the second line is the number of numbers in the specific test case, and the third line is the test case itself. It looks for the increasing subsequences within the test case. Here is what I have so far:
main = do
  inputCases <- getLine
  let cases = (read inputCases :: Int)
  forM [1..cases] $ \num -> do
    inputNumbers <- getLine
    let numbers = (read inputNumbers :: Int)
    something <- getLine
    let singlewords = words something
        list = f singlewords
    let hello = grid numbers numbers 0
    let second = hello
    print list
    forM [0..numbers] $ \check -> do
      forM [check..numbers] $ \check2 -> do
        let val = 1
        let keeper = val
        forM [check..check2] $ \check3 -> do
          let val = getVal hello list keeper check check2 check3
          let keeper = val
          return()
        print check
        print check2
        print val
        let hello = updateMatrix second val (check, check2)
        let second = hello
f :: [String] -> [Int]
f = map read
grid :: Int -> Int -> a -> [[a]]
grid x y = replicate y . replicate x
getVal :: [[Int]] -> [Int] -> Int -> Int -> Int -> Int -> Int
getVal m b second x y z =
  if b!!z < b!!y && 1+m!!x!!z > second then 1+m!!x!!z
  else second
updateMatrix :: [[a]] -> a -> (Int, Int) -> [[a]]
updateMatrix m x (r,c) =
  take r m ++
  [take c (m !! r) ++ [x] ++ drop (c + 1) (m !! r)] ++
  drop (r + 1) m
However, my problem is that once the program exits the forM loop, it does not keep the variable "hello" or anything else that was declared in the loop. Is there a better way of doing this? Would recursion work in this case? I am not quite sure how that would be implemented.
lis[i][j] will hold the length of the longest increasing subsequence in {a[i], ..., a[j]}.
Here is the Python code that I am trying to translate. Given this code, is there a better way of doing this than the way I am currently trying to do it?
T = int(input())
for t in range(0, T):
    n = int(input())
    a = list(map(int, input().split()))
    lis = [[0 for j in range(0, n)] for i in range(0, n)]
    for i in range(0, n):
        for j in range(i, n):
            val = 1
            for k in range(i, j):
                if(a[k] < a[j] and 1 + lis[i][k] > val):
                    val = 1 + lis[i][k]
            lis[i][j] = val
In my other answer, I discussed what you asked: how to store information for later retrieval when in a forM loop. In this answer, I will discuss the idiomatic translation of for loops from other languages; generally, that translation is not a forM loop in Haskell.
Because this is an excellent programming exercise, I don't want to give away the answer; there's a lot of joy and learning to be had from solving the problem yourself. But I do want to illustrate an alternative approach. To keep all the interesting bits of the translation I cooked up of your Python code, I will solve a slightly easier problem in a slightly stylized way: instead of lis[i][j] giving the length of the longest increasing subsequence between indices i and j in the original list, we will have lis[i][j] give the largest value between indices i and j in the original list.
The idea will go like this: instead of iterating over indices i and j, we'll iterate over suffixes starting at i, then over prefixes of suffixes starting at i and ending at j. To begin with, we'll do the naive thing of just calling maximum on each infix expression. So:
import Data.List
maxes0 a =
  [ [ maximum inf
    | inf <- tail (inits suff)
    ]
  | suff <- init (tails a)
  ]
For example, we can try it on your example list in ghci:
> maxes0 [1,2,9,6,8]
[[1,2,9,9,9],[2,9,9,9],[9,9,9],[6,8],[8]]
Note right away that there's a difference in shape here: where in Python we produced a square result, here we produce a triangular one, omitting the useless entries that do not correspond to actual infix chunks of the original list. (It's easy to reintroduce dummy values if you actually need a square result for some reason.)
This is already pretty good, and quite idiomatic; however, there is one part of the Python code that it does not capture well yet: the Python code reuses previously computed values to do some dynamic programming. This can be done to the above code, as well, though it does require a bit of mental gymnastics the first few times you see it. We will use laziness and recursion to make available earlier results when computing later ones.
The idea here will be to keep a rolling max as we traverse the suffix, merging as we go the list of maximums of infixes with the new values we see in the suffix. So:
maxes1 a =
  [ let row = head suff : zipWith max row (tail suff)
     in row
  | suff <- init (tails a)
  ]
We can see in ghci that this works just the same:
> maxes1 [1,2,9,6,8]
[[1,2,9,9,9],[2,9,9,9],[9,9,9],[6,8],[8]]
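One way to convince yourself that the self-referential row in maxes1 really is a rolling maximum is to compare it with scanl1 max. That function threads the running maximum along each suffix explicitly. A small sketch follows; maxes2 is a name introduced here only for illustration.

```haskell
import Data.List (tails)

-- Rolling max via a self-referential (lazily defined) row, as in maxes1.
maxes1 :: Ord a => [a] -> [[a]]
maxes1 a =
  [ let row = head suff : zipWith max row (tail suff)
     in row
  | suff <- init (tails a)
  ]

-- The same rows via scanl1, which carries the running maximum along each suffix.
maxes2 :: Ord a => [a] -> [[a]]
maxes2 a = [scanl1 max suff | suff <- init (tails a)]

main :: IO ()
main = print (maxes2 [1, 2, 9, 6, 8 :: Int])
```

Both functions produce [[1,2,9,9,9],[2,9,9,9],[9,9,9],[6,8],[8]] for the example list.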
You can combine these two ideas (making the already-computed bits available via laziness+recursion, and making the infix list available by nested list comprehensions) to produce an idiomatic translation of the Python code which is completely pure, does not mention list indices anywhere, and does not use forM.
forM returns a list of values, one per element of the list it's handed, each being whatever you compute in the body of the function you give to forM. So you can extract information from the loop body with the usual do-notation binding syntax. Here's a simple example that asks the user whether to double each number in a list:
import Control.Monad
vals = [1..5]
main = do
  vals' <- forM vals $ \val -> do
    v <- getLine
    return (if v == "yes" then val*2 else val)
  print vals'
An example of running it:
> main
yes
yes
no
no
yes
[2,4,3,4,10]
Though this example returned numbers for simplicity, you may return arbitrary information of interest from each loop iteration in this way.

How to avoid recomputation of a pure function with the same parameters?

I have the following function that converts a list of counts to a discrete probability density function:
freq2prob l = [ (curr / (sum l)) | curr <- l ]
Unfortunately (sum l) is computed for each of the l elements making the computational complexity unnecessarily high.
What is the most concise, elegant, "haskellic" way to deal with this?
It's simple:
freq2prob l = [ curr / s | let s = sum l, curr <- l ]
You can put it outside the list comprehension as well: freq2prob l = let s = sum l in [ curr / s | curr <- l ] (notice the in). This is effectively the same computation.
That is because the first is essentially translated into
freq2prob :: (Fractional a) => [a] -> [a]
freq2prob l = [ curr / s | let s = sum l, curr <- l ]
            = do
                let s = sum l
                curr <- l
                return (curr / s)
            = let s = sum l in
                l >>= (\curr -> [curr / s])
                -- concatMap (\curr -> [curr / s]) l
                -- map (\curr -> curr / s) l
and the second, obviously, to the same code,
freq2prob l = let s = sum l in [ curr / s | curr <- l ]
            = let s = sum l in
                do
                  curr <- l
                  return (curr / s)
            = let s = sum l in
                l >>= (\curr -> [curr / s])
We can use a let statement or a where clause for this:
freq2prob l = let s = sum l in
              [ curr / s | curr <- l ]
or
freq2prob l = [ curr / s | curr <- l ]
  where s = sum l
but it would be more idiomatic to use a higher-order function than a list comprehension, since you're doing the same thing to each element:
freq2prob l = map (/sum l) l
The sum l in the dividing function (/sum l) will only be evaluated once.
This is because when evaluating map f xs, the compiler doesn't make the elementary mistake of creating multiple copies of the function f to be evaluated separately; it's a thunk that will be pointed to by every occurrence where it's needed.
As a simple and blunt test, we can use crude timing stats in ghci to see whether it's noticeably faster to use the same function repeatedly or a slightly different function each time. First I'll check whether the results of sums are usually cached in ghci:
ghci> sum [2..10000000]
50000004999999
(8.31 secs, 1533723640 bytes)
ghci> sum [2..10000000]
50000004999999
(8.58 secs, 1816661888 bytes)
So you can see it wasn't cached, and that there's a little variance in these crude stats.
Now let's multiply by the same complicated thing every time:
ghci> map (* sum [2..10000000]) [1..10]
[50000004999999,100000009999998,150000014999997,200000019999996,250000024999995,300000029999994,350000034999993,400000039999992,450000044999991,500000049999990]
(8.30 secs, 1534499200 bytes)
So (allowing for a little variance) it took almost exactly the same time to multiply ten numbers by sum [2..10000000] using map as it took to compute that sum once. Multiplying ten pairs of numbers takes hardly any time at all. So ghci (an interpreter, not even an optimising compiler) didn't introduce multiple copies of the same calculation.
This isn't because ghci is clever; it's because lazy evaluation, a nice feature of pure functional programming, never does more work than necessary. In most programming languages it would be hard to optimise away passing a lengthy calculation around all over the place instead of saving its result in a variable.
Now let's compare that with doing a slightly different calculation each time, where we add up slightly fewer numbers as we go.
ghci> map (\x -> sum [x..10000000]) [1..10]
[50000005000000,50000004999999,50000004999997,50000004999994,50000004999990,50000004999985,50000004999979,50000004999972,50000004999964,50000004999955]
(77.98 secs, 16796207024 bytes)
Well, that took roughly ten times as long, as we expected, because now we're asking it to do a different thing every time. I can verify for you that this paused for each number, whereas when we didn't change the expensive-to-calculate number, it was only evaluated once, and the pause was before the first number and the rest appeared rapidly.
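You can also watch the sharing directly rather than through timings. Debug.Trace.trace prints a message (to stderr) whenever the traced expression is actually evaluated. In the sketch below, GHC appears to bind the operand of a section outside the lambda, which is consistent with the timings above. So "summing" should be printed once, rather than once per element. If you would rather have the sharing spelled out in the code itself, bind the sum with let or where as in the other answers.

```haskell
import Debug.Trace (trace)

-- Normalise counts into probabilities. The divisor is the operand of the
-- section (/ ...); trace reports (on stderr) each time that operand is evaluated.
freq2prob :: [Double] -> [Double]
freq2prob l = map (/ trace "summing" (sum l)) l

main :: IO ()
main = print (freq2prob [1, 1, 2])
```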

Haskell: to fix or not to fix

I recently learned about Data.Function.fix, and now I want to apply it everywhere. For example, whenever I see a recursive function I want to "fix" it. So basically my question is where and when should I use it.
To make it more specific:
1) Suppose I have the following code for factorization of n:
f n = f' n primes
  where
    f' n (p:ps) = ...
    -- if p^2<=n: returns (p,k):f' (n `div` p^k) ps for k = maximum power of p in n
    -- if n<=1: returns []
    -- otherwise: returns [(n,1)]
If I rewrite it in terms of fix, will I gain something? Lose something? Is it possible, that by rewriting an explicit recursion into fix-version I will resolve or vice versa create a stack overflow?
2) When dealing with lists, there are several solutions: recursion/fix, foldr/foldl/foldl', and probably something else. Is there any general guide/advice on when to use each? For example, would you rewrite the above code using foldr over the infinite list of primes?
There are, probably, other important questions not covered here. Any additional comments related to the usage of fix are welcome as well.
One thing that can be gained by writing in an explicitly fixed form is that the recursion is left "open".
factOpen :: (Integer -> Integer) -> Integer -> Integer
factOpen recur 0 = 1
factOpen recur n = n * recur (pred n)
We can use fix to get regular fact back
fact :: Integer -> Integer
fact = fix factOpen
This works because fix effectively passes the function itself as its first argument. By leaving the recursion open, however, we can modify which function gets "passed back". The best example of using this property is to use something like memoFix from the memoize package.
factM :: Integer -> Integer
factM = memoFix factOpen
And now factM has built-in memoization.
Effectively, open-style recursion requires us to pass the recursive bit in explicitly, as an ordinary argument. Recursive bindings are one way that Haskell allows for recursion at the language level, but we can build other, more specialized forms.
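Here is a self-contained sketch of the open-recursion idea. It does not use the memoize package's memoFix, so that the example stays dependency-free. Instead, memoFixNat is a hypothetical helper. Rather than passing back the function itself, it passes back a lookup into a lazily built list, so each non-negative argument is computed at most once (with a linear-time list lookup, which is fine for a sketch). fibOpen is added because factorial never reuses a result within one call, so memoization is easier to observe on Fibonacci.

```haskell
import Data.Function (fix)

-- Open recursion: the recursive call is passed in as the first argument.
factOpen :: (Integer -> Integer) -> Integer -> Integer
factOpen _ 0 = 1
factOpen recur n = n * recur (pred n)

fibOpen :: (Integer -> Integer) -> Integer -> Integer
fibOpen _ 0 = 0
fibOpen _ 1 = 1
fibOpen recur n = recur (n - 1) + recur (n - 2)

-- Closing the recursion with fix gives the ordinary function back.
fact :: Integer -> Integer
fact = fix factOpen

-- Hypothetical helper (NOT memoize's memoFix): the "recursive call" handed
-- to open is a lookup into a lazily built table of results for 0, 1, 2, ...
memoFixNat :: ((Integer -> a) -> Integer -> a) -> Integer -> a
memoFixNat open = lookupTable
  where
    table = map (open lookupTable) [0 ..]
    lookupTable n = table !! fromIntegral n

fibM :: Integer -> Integer
fibM = memoFixNat fibOpen

main :: IO ()
main = do
  print (fact 20)
  print (fibM 80)
```

fibM 80 returns immediately, whereas fix fibOpen 80 would take exponential time.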
I'd like to mention another usage of fix. Suppose you have a simple language consisting of addition, negation, and integer literals. Perhaps you have written a parser that takes a String and outputs a Tree:
data Tree = Leaf String | Node String [Tree]
parse :: String -> Tree
-- parse "-(1+2)" == Node "Neg" [Node "Add" [Node "Lit" [Leaf "1"], Node "Lit" [Leaf "2"]]]
Now you would like to evaluate your tree to a single integer:
fromTree (Node "Lit" [Leaf n]) = case reads n of {[(x,"")] -> Just x; _ -> Nothing}
fromTree (Node "Neg" [e]) = liftM negate (fromTree e)
fromTree (Node "Add" [e1,e2]) = liftM2 (+) (fromTree e1) (fromTree e2)
Suppose someone else decides to extend the language; they want to add multiplication. They will have to have access to the original source code. They could try the following:
fromTree' (Node "Mul" [e1, e2]) = ...
fromTree' e = fromTree e
But then Mul can only appear once, at the top level of the expression, since the call to fromTree will not be aware of the Node "Mul" case. Node "Neg" [Node "Mul" [a, b]] will not work, since the original fromTree has no pattern for "Mul". However, if the same function is written using fix:
fromTreeExt :: (Tree -> Maybe Int) -> (Tree -> Maybe Int)
fromTreeExt self (Node "Neg" [e]) = liftM negate (self e)
fromTreeExt .... -- other cases
fromTree = fix fromTreeExt
Then extending the language is possible:
fromTreeExt' self (Node "Mul" [e1, e2]) = ...
fromTreeExt' self e = fromTreeExt self e
fromTree' = fix fromTreeExt'
Now, the extended fromTree' will evaluate the tree properly, since self in fromTreeExt' refers to the entire function, including the "Mul" case.
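Here is the extension example filled in as a runnable sketch. The "Lit", "Neg" and "Add" cases follow the fromTree definition given earlier. The catch-all Nothing case and the lit helper are additions for illustration.

```haskell
import Control.Monad (liftM, liftM2)
import Data.Function (fix)

data Tree = Leaf String | Node String [Tree]

-- Original evaluator with the recursion left open: recursive calls go through self.
fromTreeExt :: (Tree -> Maybe Int) -> (Tree -> Maybe Int)
fromTreeExt _ (Node "Lit" [Leaf n]) = case reads n of
  [(x, "")] -> Just x
  _ -> Nothing
fromTreeExt self (Node "Neg" [e]) = liftM negate (self e)
fromTreeExt self (Node "Add" [e1, e2]) = liftM2 (+) (self e1) (self e2)
fromTreeExt _ _ = Nothing -- unknown node

fromTree :: Tree -> Maybe Int
fromTree = fix fromTreeExt

-- Extension adding multiplication; everything else is delegated, passing
-- the extended self along so nested "Mul" nodes are handled too.
fromTreeExt' :: (Tree -> Maybe Int) -> (Tree -> Maybe Int)
fromTreeExt' self (Node "Mul" [e1, e2]) = liftM2 (*) (self e1) (self e2)
fromTreeExt' self e = fromTreeExt self e

fromTree' :: Tree -> Maybe Int
fromTree' = fix fromTreeExt'

-- Illustrative helper for building literal nodes.
lit :: Int -> Tree
lit n = Node "Lit" [Leaf (show n)]

main :: IO ()
main = do
  let expr = Node "Neg" [Node "Mul" [lit 2, lit 3]]
  print (fromTree expr)
  print (fromTree' expr)
```

For Node "Neg" [Node "Mul" [lit 2, lit 3]], fromTree gives Nothing, while fromTree' gives Just (-6), because the nested "Mul" is reached through self.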
This approach is used here (the above example is a closely adapted version of the usage in the paper).
Beware the difference between _Y f = f (_Y f) (recursion, value-copying) and fix f = x where x = f x (corecursion, reference-sharing).
Haskell's let and where bindings are recursive: same name on the LHS and RHS refer to the same entity. The reference is shared.
In the definition of _Y there's no sharing (unless a compiler performs aggressive common subexpression elimination). This means it describes recursion, where repetition is achieved by applying a copy of an original, as in the classic metaphor of a recursive function creating its own copies. Corecursion, on the other hand, relies on sharing, on referring to the same entity.
An example, primes calculated by
2 : _Y ((3:) . gaps 5 . _U . map (\p-> [p*p, p*p+2*p..]))
-- gaps 5 == ([5,7..] \\)
-- _U == sort . concat
either reusing its own output (with fix, let g = ((3:)...) ; ps = g ps in 2 : ps) or creating a separate primes supply for itself (with _Y, let g () = ((3:)...) (g ()) in 2 : g ()).
See also:
double stream feed to prevent unneeded memoization?
How to implement an efficient infinite generator of prime numbers in Python?
Or, with the usual example of factorial function,
gen rec n = n<2 -> 1 ; n * rec (n-1) -- "if" notation
facrec = _Y gen
facrec 4 = gen (_Y gen) 4
         = let {rec=_Y gen} in (\n-> ...) 4
         = let {rec=_Y gen} in (4<2 -> 1 ; 4*rec 3)
         = 4*_Y gen 3
         = 4*gen (_Y gen) 3
         = 4*let {rec2=_Y gen} in (3<2 -> 1 ; 3*rec2 2)
         = 4*3*_Y gen 2                              -- (_Y gen) recalculated
         .....
fac = fix gen
fac 4 = (let f = gen f in f) 4
      = (let f = (let {rec=f} in (\n-> ...)) in f) 4
      = let {rec=f} in (4<2 -> 1 ; 4*rec 3)         -- f binding is created
      = 4*f 3
      = 4*let {rec=f} in (3<2 -> 1 ; 3*rec 2)
      = 4*3*f 2                                     -- f binding is reused
      .....
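Here is a runnable sketch of the primes example using both combinators. gaps and _U are only specified above, via \\ and sort . concat, which cannot be used on infinite lists. So they are implemented here as merges of increasing lists; union, unionAll and oddPrimesFrom are names chosen for this sketch. Both versions produce the same primes. The difference is that primesY builds a separate supply of odd primes at each level instead of reading back its own output.

```haskell
import Data.Function (fix)

-- Y combinator without sharing: every unfolding rebuilds the value.
_Y :: (a -> a) -> a
_Y f = f (_Y f)

-- Merge two increasing lists, dropping duplicates.
union :: [Integer] -> [Integer] -> [Integer]
union a@(x : xs) b@(y : ys) = case compare x y of
  LT -> x : union xs b
  EQ -> x : union xs ys
  GT -> y : union a ys
union a [] = a
union [] b = b

-- Merge a list of increasing lists whose heads are increasing
-- (lazy enough to work on an infinite list of infinite lists).
unionAll :: [[Integer]] -> [Integer]
unionAll ((x : xs) : t) = x : union xs (unionAll t)
unionAll ([] : t) = unionAll t
unionAll [] = []

-- Odd numbers from k upward that are absent from the increasing odd list s.
gaps :: Integer -> [Integer] -> [Integer]
gaps k s@(c : cs)
  | k < c = k : gaps (k + 2) s
  | otherwise = gaps (k + 2) cs -- here k == c, a composite: skip it
gaps k [] = [k, k + 2 ..]

-- Odd primes computed from a supply of odd primes.
oddPrimesFrom :: [Integer] -> [Integer]
oddPrimesFrom = (3 :) . gaps 5 . unionAll . map (\p -> [p * p, p * p + 2 * p ..])

primesFix, primesY :: [Integer]
primesFix = 2 : fix oddPrimesFrom -- feeds on its own, shared output
primesY = 2 : _Y oddPrimesFrom -- each level builds a separate supply

main :: IO ()
main = do
  print (take 10 primesFix)
  print (take 10 primesY)
```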
1) fix is just a function; it can improve your code when you use some recursion, and it can make your code prettier. For example usage, see: Haskell Wikibook - Fix and recursion.
2) Do you know what foldr does? It seems foldr isn't useful for factorization (or I didn't understand what you meant).
Here is a prime factorization without fix:
fact xs = map (\x->takeWhile (\y->y/=[]) x) . map (\x->factIt x) $ xs
  where factIt n = map (\x->getFact x n []) [2..n]
        getFact i n xs
          | n `mod` i == 0 = getFact i (div n i) xs++[i]
          | otherwise = xs
and with fix (this works exactly like the previous one):
fact xs = map (\x->takeWhile (\y->y/=[]) x) . map (\x->getfact x) $ xs
  where getfact n = map (\x->defact x n) [2..n]
        defact i n =
          fix (\rec j k xs->if(mod k j == 0)then (rec j (div k j) xs++[j]) else xs ) i n []
This isn't pretty because in this case fix isn't a good choice (but there is always somebody who can write it better).
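For comparison, here is a sketch of the same with/without fix comparison using a trial-division loop. It divides each factor out as soon as it is found, so only prime factors are reported (for example, 12 gives [2,2,3]). The two versions differ only in how the recursive call is named: go refers to itself, while step receives the recursive call as its first argument.

```haskell
import Data.Function (fix)

-- Explicit recursion: d is the current trial divisor, m what is left to factor.
factorize :: Integer -> [Integer]
factorize = go 2
  where
    go d m
      | m < 2 = []
      | d * d > m = [m] -- no divisor up to sqrt m: m itself is prime
      | m `mod` d == 0 = d : go d (m `div` d)
      | otherwise = go (d + 1) m

-- The same loop written with fix: the recursive call is passed in as rec.
factorizeFix :: Integer -> [Integer]
factorizeFix = fix step 2
  where
    step rec d m
      | m < 2 = []
      | d * d > m = [m]
      | m `mod` d == 0 = d : rec d (m `div` d)
      | otherwise = rec (d + 1) m

main :: IO ()
main = print (map factorizeFix [12, 14, 97, 360])
```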

Converting Haskell Polymorphic Cosine function to F#

I'm trying to convert some Haskell code to F#, but I'm having some trouble since Haskell is lazy by default and F# is not. I'm also still learning my way around F#. Below is a polymorphic cosine function in Haskell with pretty good performance. I want to keep the same or better performance characteristics in F#. I would like to see an F# List version and an F# Seq version, since the Seq version would be closer to the lazy Haskell but the List version would probably perform better. Thanks for any help.
Efficiency: number of arithmetic operations used proportional to number of terms in series
Space: uses constant space, independent of number of terms
takeThemTwoByTwo xs =
  takeWhile (not . null) [take 2 ys | ys <- iterate (drop 2) xs]
products xss = [product xs | xs <- xss]
pairDifferences xs =
  [foldr (-) 0 adjacentPair | adjacentPair <- takeThemTwoByTwo xs]
harmonics x = [x/(fromIntegral k) | k <- [1 ..]]
cosineTerms = scanl (*) 1 . products . takeThemTwoByTwo . harmonics
cosine = foldl (+) 0 . pairDifferences .
  take numberOfTerms . cosineTerms
Here is my attempt, in case you're too lazy to read the rest:
let harmonics x =
    Seq.initInfinite(fun i -> - x*x/(float ((2*i+1)*(2*i+2))))
let cosineTerms = Seq.scan (*) 1.0 << harmonics
let cosine numberOfTerms = Seq.sum << Seq.take numberOfTerms << cosineTerms
It took me a while to figure out that you're calculating the cosine (in radians) using its Taylor series:
cosine(x) = 1 - x²/2! + x⁴/4! - x⁶/6! + ...
Let me describe what you're doing:
Create an infinite sequence of x/k where k is an integer starting from 1.
Split the above sequence into chunks of two and multiply within each chunk to get x²/((2k-1)*(2k)); then scan by multiplying with a seed of 1 to get the sequence x^(2k)/(2k)! (starting with 1).
Split the new sequence into blocks of two again to get differences of the form x^(4k-4)/(4k-4)! - x^(4k-2)/(4k-2)!, and sum all of them to get the final result.
Because splitting sequences in F# is likely to be inefficient and the takeThemTwoByTwo function is not essential, I chose another approach:
Create an infinite sequence of -x²/((2k-1)*(2k)) where k is an integer starting from 1.
Scan the sequence by multiplying with a seed of 1; we get a sequence of (-1)^k * x^(2k)/(2k)!.
Sum all elements to obtain the final result.
The above program is a direct translation of my description, succinct and simple. Computing cosine with numberOfTerms = 200000 iterations takes 0.15 seconds on my machine; I suppose it is efficient enough for your purpose.
Furthermore, a List version should be easy to translate from this one.
UPDATE:
OK, my mistake was underestimating the polymorphism part of the question; I focused more on the performance part. Here is a polymorphic version (keeping closely to the float version):
let inline cosine n (x: ^a) =
    let one: ^a = LanguagePrimitives.GenericOne
    Seq.initInfinite(fun i -> LanguagePrimitives.DivideByInt (- x*x) ((2*i+1)*(2*i+2)))
    |> Seq.scan (*) one
    |> Seq.take n
    |> Seq.sum
Seq.initInfinite is less powerful than Seq.unfold in @kvb's answer. I kept it to keep things simple, because n is within int range anyway.
Pad's answer is good, but not polymorphic. In general, it's significantly less common to create such definitions in F# than in Haskell (and a bit of a pain). Here's one approach:
module NumericLiteralG =
    let inline FromZero() = LanguagePrimitives.GenericZero
    let inline FromOne() = LanguagePrimitives.GenericOne
module ConstrainedOps =
    let inline (~-) (x:^a) : ^a = -x
    let inline (+) (x:^a) (y:^a) : ^a = x + y
    let inline (*) (x:^a) (y:^a) : ^a = x * y
    let inline (/) (x:^a) (y:^a) : ^a = x / y
open ConstrainedOps
let inline cosine n x =
    let two = 1G + 1G
    Seq.unfold (fun (twoIp1, t) -> Some(t, (twoIp1+two, -t*x*x/(twoIp1*(twoIp1+1G))))) (1G,1G)
    |> Seq.take n
    |> Seq.sum
As Pad wrote, this appears to be the Taylor series expansion of cos(x) about x=0:
cosine(x) = 1 - x²/2! + x⁴/4! - x⁶/6! + ...
So your question is an XY question: you presented a solution rather than posing the problem. Presenting the problem instead makes it much easier to solve it differently.
Let's start by writing a float-specific version in F#:
let cosine n x =
    let rec loop i q t c =
        if i=n then c else
            loop (i + 1) (q + 10 + 8*i) (-t * x * x / float q) (c + t)
    loop 0 2 1.0 0.0
For example, we can compute 1M terms of the expansion of x=0.1:
cosine 1000000 0.1
The best way to make this polymorphic in F# is to parameterize the function over the operators it uses and mark it as inline in order to remove the performance overhead of this parameterization:
let inline cosine zero one ofInt ( ~-. ) ( +. ) ( *. ) ( /. ) n x =
    let rec loop i q t c =
        if i=n then c else
            loop (i + 1) (q + 10 + 8*i) (-.t *. x *. x /. ofInt q) (c +. t)
    loop 0 2 one zero
Now we can compute 1M terms using float like this, which is just as fast as before:
cosine 0.0 1.0 float ( ~- ) (+) (*) (/) 1000000 0.1
But we can also do single-precision float:
cosine 0.0f 1.0f float32 ( ~- ) (+) (*) (/) 1000000 0.1f
And arbitrary-precision rational:
cosine 0N 1N BigNum.FromInt (~-) (+) (*) (/) 10 (1N / 10N)
And even symbolic:
type Expr =
    | Int of int
    | Var of string
    | Add of Expr * Expr
    | Mul of Expr * Expr
    | Pow of Expr * Expr
    static member (~-) f = Mul(Int -1, f)
    static member (+) (f, g) = Add(f, g)
    static member (*) (f, g) = Mul(f, g)
    static member (/) (f, g) = Mul(f, Pow(g, Int -1))
cosine (Int 0) (Int 1) Int (~-) (+) (*) (/) 3 (Var "x")
To make it faster, hoist the common subexpression -x*x out of the loop.
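Since the question started from Haskell, here is the same accumulator loop written back in Haskell as a sketch. Polymorphism comes from the Fractional constraint rather than from passing the operators explicitly. Bang patterns keep the accumulators evaluated, and -x*x is hoisted out of the loop as suggested. It mirrors the float loop above, with the same q update, but it is an illustration rather than a transcription of any particular answer.

```haskell
{-# LANGUAGE BangPatterns #-}

-- Sum of the first n terms of the Taylor series of cos x about 0, for any
-- Fractional type. i counts the terms summed so far, q is (2i+1)(2i+2),
-- t is the current term and c the running sum.
cosine :: Fractional a => Int -> a -> a
cosine n x = loop 0 (2 :: Int) 1 0
  where
    mx2 = negate (x * x) -- the common subexpression -x*x, hoisted out of the loop
    loop !i !q !t !c
      | i == n = c
      | otherwise = loop (i + 1) (q + 10 + 8 * i) (t * mx2 / fromIntegral q) (c + t)

main :: IO ()
main = do
  print (cosine 20 (0.1 :: Double))
  print (cosine 3 (1 / 10 :: Rational))
```

cosine 3 (1/10 :: Rational) gives exactly 1 - 1/200 + 1/240000, and cosine 20 (0.1 :: Double) agrees with cos 0.1 to well within 1e-12.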

Haskell: repeat a function a large number of times without stackoverflow

As a newbie to Haskell I am trying to iterate a function (e.g., the logistic map) a large number of times. In an imperative language this would be a simple loop, however in Haskell I end up with stack overflow. Take for example this code:
main = print $ iter 1000000
f x = 4.0*x*(1.0-x)
iter :: Int -> Double
iter 0 = 0.3
iter n = f $ iter (n-1)
For a small number of iterations the code works, but for a million iterations I get a stack space overflow:
Stack space overflow: current size 8388608 bytes.
Use `+RTS -Ksize -RTS' to increase it.
I cannot understand why this does happen. The tail recursion should be fine here.
Maybe the problem is lazy evaluation. I experimented with several ways to force strict evaluation, by inserting $! or seq at various positions, but with no success.
What would be the Haskell way to iterate a function a huge number of times?
I have tried suggestions from related posts: here or here, but I always ended up with stackoverflow for a large number of iterations, e.g., main = print $ iterate f 0.3 !! 1000000.
The problem is that your definition
iter :: Int -> Double
iter 0 = 0.3
iter n = f $ iter (n-1)
tries to evaluate in the wrong direction. Unfolding it for a few steps, we obtain
iter n = f (iter (n-1))
= f (f (iter (n-2)))
= f (f (f (iter (n-3))))
...
and the entire call stack from iter 1000000 to iter 0 has to be built before anything can be evaluated. It would be the same in a strict language. You have to organise it so that part of the evaluation can take place before recurring. The usual way is to have an accumulation parameter, like
iter n = go n 0.3
  where
    go 0 x = x
    go k x = go (k-1) (f x)
Then adding strictness annotations (in case the compiler doesn't already add them) will make it run smoothly without consuming stack.
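Here is a sketch of the accumulator version with the strictness made explicit through a bang pattern (this needs the BangPatterns extension); seq would work just as well.

```haskell
{-# LANGUAGE BangPatterns #-}

-- The logistic map from the question.
f :: Double -> Double
f x = 4.0 * x * (1.0 - x)

-- Accumulator version of iter. The bang pattern forces x at every step,
-- so no chain of f (f (f ...)) thunks builds up and the loop runs in constant stack.
iter :: Int -> Double
iter n = go n 0.3
  where
    go :: Int -> Double -> Double
    go 0 !x = x
    go k !x = go (k - 1) (f x)

main :: IO ()
main = print (iter 1000000)
```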
The iterate variant has the same problem as your iter, except that the call stack is built inside-out rather than outside-in as in yours. But since iterate builds its call stack inside-out, a stricter version of iterate (or a consumption pattern that forces earlier iterations first) solves the problem:
iterate' :: (a -> a) -> a -> [a]
iterate' f x = x `seq` (x : iterate' f (f x))
calculates iterate' f 0.3 !! 1000000 without problem.
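Here are both remedies side by side, as a sketch. iterate' is the definition above. nthForced is a hypothetical helper illustrating the other option: consume the ordinary iterate list while forcing each element on the way to the one you want.

```haskell
-- The logistic map from the question.
f :: Double -> Double
f x = 4.0 * x * (1.0 - x)

-- Stricter iterate from the answer: each element is forced before its cons
-- cell is produced, so walking the spine also evaluates the elements.
iterate' :: (a -> a) -> a -> [a]
iterate' g x = x `seq` (x : iterate' g (g x))

-- Hypothetical helper: walk an ordinary list to index k, forcing every
-- element passed, so each thunk f x_(k-1) only ever sees an evaluated x_(k-1).
nthForced :: Int -> [a] -> a
nthForced _ [] = error "nthForced: index too large"
nthForced 0 (x : _) = x
nthForced k (x : xs) = x `seq` nthForced (k - 1) xs

main :: IO ()
main = do
  print (iterate' f 0.3 !! 1000000)
  print (nthForced 1000000 (iterate f 0.3))
```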
