Haskell: to fix or not to fix

I recently learned about Data.Function.fix, and now I want to apply it everywhere. For example, whenever I see a recursive function I want to "fix" it. So basically my question is where and when should I use it.
To make it more specific:
1) Suppose I have the following code for factorization of n:
f n = f' n primes
  where
    f' n (p:ps) = ...
    -- if p^2<=n: returns (p,k):f' (n `div` p^k) ps for k = maximum power of p in n
    -- if n<=1: returns []
    -- otherwise: returns [(n,1)]
If I rewrite it in terms of fix, will I gain or lose anything? Could rewriting explicit recursion into a fix-based version remove a stack overflow, or conversely introduce one?
2) When dealing with lists, there are several solutions: recursion/fix, foldr/foldl/foldl', and probably something else. Is there any general guide/advice on when to use each? For example, would you rewrite the above code using foldr over the infinite list of primes?
There are, probably, other important questions not covered here. Any additional comments related to the usage of fix are welcome as well.

One thing that can be gained by writing in an explicitly fixed form is that the recursion is left "open".
factOpen :: (Integer -> Integer) -> Integer -> Integer
factOpen recur 0 = 1
factOpen recur n = n * recur (pred n)
We can use fix to get regular fact back
fact :: Integer -> Integer
fact = fix factOpen
This works because fix effectively passes a function itself as its first argument. By leaving the recursion open, however, we can modify which function gets "passed back". The best example of using this property is to use something like memoFix from the memoize package.
factM :: Integer -> Integer
factM = memoFix factOpen
And now factM has built-in memoization.
In effect, open-style recursion requires us to treat the recursive call as an ordinary, first-class argument. Recursive bindings are one way that Haskell allows for recursion at the language level, but with the recursion left open we can build other, more specialized forms.
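To make the "open recursion" idea concrete without depending on the memoize package, here is a minimal sketch using only base. The names fibOpen, fibSlow, memoListFix and fibFast are made up for this illustration; memoListFix is a hypothetical stand-in for memoFix, not its actual implementation, and it only handles non-negative arguments.

```haskell
import Data.Function (fix)

-- Open-recursive Fibonacci: every recursive call goes through `recur`.
fibOpen :: (Int -> Integer) -> Int -> Integer
fibOpen _     0 = 0
fibOpen _     1 = 1
fibOpen recur n = recur (n - 1) + recur (n - 2)

-- Ordinary recursion: fix passes the function to itself.
fibSlow :: Int -> Integer
fibSlow = fix fibOpen

-- Hypothetical helper (an assumption, not the memoize API): tie the knot
-- through a lazily built list so each result is computed at most once.
-- Lookups with (!!) are linear, so this is a sketch, not a tuned memoizer.
memoListFix :: ((Int -> Integer) -> Int -> Integer) -> Int -> Integer
memoListFix open = memo
  where
    table  = map (open memo) [0 ..]
    memo n = table !! n

fibFast :: Int -> Integer
fibFast = memoListFix fibOpen
```

The same fibOpen is reused unchanged in both versions; only the function that closes the recursion differs, which is the point of leaving it open.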

I'd like to mention another usage of fix; suppose you have a simple language consisting of addition, negation, and integer literals. Perhaps you have written a parser which takes a String and outputs a Tree:
data Tree = Leaf String | Node String [Tree]
parse :: String -> Tree
-- parse "-(1+2)" == Node "Neg" [Node "Add" [Node "Lit" [Leaf "1"], Node "Lit" [Leaf "2"]]]
Now you would like to evaluate your tree to a single integer:
fromTree (Node "Lit" [Leaf n]) = case reads n of {[(x,"")] -> Just x; _ -> Nothing}
fromTree (Node "Neg" [e]) = liftM negate (fromTree e)
fromTree (Node "Add" [e1,e2]) = liftM2 (+) (fromTree e1) (fromTree e2)
Suppose someone else decides to extend the language; they want to add multiplication. They will have to have access to the original source code. They could try the following:
fromTree' (Node "Mul" [e1, e2]) = ...
fromTree' e = fromTree e
But then Mul can only appear once, at the top level of the expression, since the call to fromTree will not be aware of the Node "Mul" case. Node "Neg" [Node "Mul" [a, b]] will not work, since the original fromTree has no pattern for "Mul". However, if the same function is written using fix:
fromTreeExt :: (Tree -> Maybe Int) -> (Tree -> Maybe Int)
fromTreeExt self (Node "Neg" [e]) = liftM negate (self e)
fromTreeExt .... -- other cases
fromTree = fix fromTreeExt
Then extending the language is possible:
fromTreeExt' self (Node "Mul" [e1, e2]) = ...
fromTreeExt' self e = fromTreeExt self e
fromTree' = fix fromTreeExt'
Now, the extended fromTree' will evaluate the tree properly, since self in fromTreeExt' refers to the entire function, including the "Mul" case.
This approach is used here (the above example is a closely adapted version of the usage in the paper).
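For anyone who wants to run this, here is a self-contained sketch of the same idea. It skips the parser and builds the Tree by hand; the "..." cases from above are filled in with my own guesses (readMaybe for literals, a catch-all Nothing), so treat those details as assumptions rather than the original code.

```haskell
import Data.Function (fix)
import Text.Read (readMaybe)

data Tree = Leaf String | Node String [Tree]

-- Base evaluator with the recursion left open through `self`.
fromTreeExt :: (Tree -> Maybe Int) -> Tree -> Maybe Int
fromTreeExt _    (Node "Lit" [Leaf n])  = readMaybe n
fromTreeExt self (Node "Neg" [e])       = negate <$> self e
fromTreeExt self (Node "Add" [e1, e2])  = (+) <$> self e1 <*> self e2
fromTreeExt _    _                      = Nothing  -- assumed fallback

-- Extension: handle "Mul", delegate everything else to the base evaluator.
fromTreeExt' :: (Tree -> Maybe Int) -> Tree -> Maybe Int
fromTreeExt' self (Node "Mul" [e1, e2]) = (*) <$> self e1 <*> self e2
fromTreeExt' self e                     = fromTreeExt self e

fromTree, fromTree' :: Tree -> Maybe Int
fromTree  = fix fromTreeExt
fromTree' = fix fromTreeExt'

-- Represents -(2*3): a "Mul" nested below a "Neg".
nestedMul :: Tree
nestedMul = Node "Neg" [Node "Mul" [Node "Lit" [Leaf "2"], Node "Lit" [Leaf "3"]]]
```

Here fromTree' nestedMul evaluates the nested multiplication, while fromTree nestedMul gives Nothing because the base evaluator never learns about "Mul".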

Beware the difference between _Y f = f (_Y f) (recursion, value-copying) and fix f = x where x = f x (corecursion, reference-sharing).
Haskell's let and where bindings are recursive: same name on the LHS and RHS refer to the same entity. The reference is shared.
In the definition of _Y there's no sharing (unless a compiler performs an aggressive common subexpression elimination). This means it describes recursion, where repetition is achieved by applying a copy of the original, as in the classic metaphor of a recursive function creating its own copies. Corecursion, on the other hand, relies on sharing, on referring to the same entity.
An example, primes calculated by
2 : _Y ((3:) . gaps 5 . _U . map (\p-> [p*p, p*p+2*p..]))
-- gaps 5 == ([5,7..] \\)
-- _U == sort . concat
either reusing its own output (with fix, let g = ((3:)...) ; ps = g ps in 2 : ps) or creating separate primes supply for itself (with _Y, let g () = ((3:)...) (g ()) in 2 : g ()).
See also:
double stream feed to prevent unneeded memoization?
How to implement an efficient infinite generator of prime numbers in Python?
Or, with the usual example of factorial function,
gen rec n = n<2 -> 1 ; n * rec (n-1) -- "if" notation
facrec = _Y gen
facrec 4 = gen (_Y gen) 4
= let {rec=_Y gen} in (\n-> ...) 4
= let {rec=_Y gen} in (4<2 -> 1 ; 4*rec 3)
= 4*_Y gen 3
= 4*gen (_Y gen) 3
= 4*let {rec2=_Y gen} in (3<2 -> 1 ; 3*rec2 2)
= 4*3*_Y gen 2 -- (_Y gen) recalculated
.....
fac = fix gen
fac 4 = (let f = gen f in f) 4
= (let f = (let {rec=f} in (\n-> ...)) in f) 4
= let {rec=f} in (4<2 -> 1 ; 4*rec 3) -- f binding is created
= 4*f 3
= 4*let {rec=f} in (3<2 -> 1 ; 3*rec 2)
= 4*3*f 2 -- f binding is reused
.....
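A small runnable sketch of the same contrast, using a Fibonacci stream instead of primes. The names fibStep, fibsShared and fibsCopied are made up here; both definitions produce the same list, but the _Y version rebuilds its input stream at every level instead of referring back to the one list being defined, so it does noticeably more work as you go deeper.

```haskell
import Data.Function (fix)

-- Recursion without sharing: each unfolding applies f to a fresh (_Y f).
_Y :: (a -> a) -> a
_Y f = f (_Y f)

-- One step of the stream definition, with the "rest" left open.
fibStep :: [Integer] -> [Integer]
fibStep xs = 0 : 1 : zipWith (+) xs (tail xs)

fibsShared, fibsCopied :: [Integer]
fibsShared = fix fibStep  -- xs refers back to the very list being built
fibsCopied = _Y fibStep   -- xs is a separately computed copy at each level
```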

1) fix is just a function. It can improve code that uses recursion and make it prettier. For example usage, see the Haskell Wikibook: Fix and recursion.
2) Do you know what foldr does? It doesn't seem useful for factorization (or perhaps I didn't understand what you meant by that).
Here is a prime factorization without fix:
fact xs = map (\x->takeWhile (\y->y/=[]) x) . map (\x->factIt x) $ xs
  where factIt n = map (\x->getFact x n []) [2..n]
        getFact i n xs
          | n `mod` i == 0 = getFact i (div n i) xs++[i]
          | otherwise = xs
and with fix (this works exactly like the previous version):
fact xs = map (\x->takeWhile (\y->y/=[]) x) . map (\x->getfact x) $ xs
  where getfact n = map (\x->defact x n) [2..n]
        defact i n =
          fix (\rec j k xs->if(mod k j == 0)then (rec j (div k j) xs++[j]) else xs ) i n []
This isn't pretty, because in this case fix isn't a good choice (but there is always somebody who can write it better).

Related

Explanation of the findIndices function

I said in this question that I didn't understand the source code of findIndices.
In fact I didn't pay enough attention and I didn't see that there are two definitions of this function:
findIndices :: (a -> Bool) -> [a] -> [Int]
#if defined(USE_REPORT_PRELUDE)
findIndices p xs = [ i | (x,i) <- zip xs [0..], p x]
#else
-- Efficient definition, adapted from Data.Sequence
{-# INLINE findIndices #-}
findIndices p ls = build $ \c n ->
  let go x r k | p x       = I# k `c` r (k +# 1#)
               | otherwise = r (k +# 1#)
  in foldr go (\_ -> n) ls 0#
#endif /* USE_REPORT_PRELUDE */
I understand the first definition, the one I didn't see. I don't understand the second one. I have a couple of questions:
What is #if defined(USE_REPORT_PRELUDE)?
Can someone explain the second definition? What are build, I#, +#, 1#?
Why is the second definition inlined, and not the first one?
The CPP extension enables the C preprocessor, the same one used for the C programming language. Here, it is used to test whether the flag USE_REPORT_PRELUDE was set during compilation. Depending on that flag, the compiler uses either the #if or the #else variant of the code.
build is a function which could be defined as
build f = f (:) []
So, using build (\c n -> ... essentially binds c to the "cons" (:), and n to the "nil" [].
This is not used for convenience: it is not convenient at all! However, the compiler optimizer works great with build and foldr combined, so the code is written here in a weird way to take advantage of that.
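To see what writing a list producer against build looks like, here is a small sketch. upTo and sumUpTo are names invented for this example; build itself is imported from GHC.Exts. Whether the intermediate list really disappears depends on optimization flags and on GHC's rewrite rules firing, so take the fusion remark as the expected behaviour, not a guarantee.

```haskell
import GHC.Exts (build)

-- The numbers 1..limit, written so that the producer never mentions (:) or []
-- directly; it only uses the c and n that build hands to it.
upTo :: Int -> [Int]
upTo limit = build (\c n ->
  let go i | i > limit = n
           | otherwise = i `c` go (i + 1)
  in go 1)

-- A foldr consumer. With foldr/build fusion this can become a plain loop,
-- since  foldr k z (build g) = g k z.
sumUpTo :: Int -> Int
sumUpTo limit = foldr (+) 0 (upTo limit)
```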
Further, I# ... is the low-level constructor for integers. When we normally write
x :: Int
x = 4+2
GHC implements x (very roughly) with a pointer to some memory that reads as unevaluated: 4+2. After x is forced the first time, this memory gets overwritten with evaluated: I# 6#. This is needed to implement laziness.
The "boxing" here refers to the indirection through a pointer.
Instead, the type Int# is a plain machine integer, with no pointers, no indirection, no unevaluated expressions. It is strict (instead of lazy), but being more low-level it is more efficient. One creates a value as in
x' :: Int#
x' = 6#
x :: Int
x = I# x'
Indeed, Int is defined as data Int = I# Int#.
Keep in mind that this is not standard Haskell, but GHC-specific low-level details. In normal code, you should not need to use such unboxed types. In libraries, the authors do that to achieve a little more performance, but that's it.
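If you want to poke at these primitives yourself, a minimal sketch looks like the following. It needs the MagicHash extension so that names ending in # are accepted; addBoxed and six are made-up names for illustration.

```haskell
{-# LANGUAGE MagicHash #-}
import GHC.Exts (Int (I#), (+#))

-- Unwrap both boxed Ints, add the raw machine integers, and box the result.
addBoxed :: Int -> Int -> Int
addBoxed (I# a) (I# b) = I# (a +# b)

-- An unboxed literal wrapped back into an ordinary Int.
six :: Int
six = I# 6#
```

This behaves like ordinary (+) on Int; the only difference is that the boxing and unboxing are spelled out by hand.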
Sometimes, even if in our code we only use Ints, GHC is smart enough to automatically convert our code to using Int# and achieve more efficiency, avoiding the boxing. This can be observed if we ask GHC to "dump Core" so that we can see the result of the optimization.
For instance, compiling
f :: Int -> Int
f 0 = 0
f n = n + f (n-1)
GHC produces a lower level version (this is GHC Core, not Haskell, but it is similar enough to be understood):
Main.$wf :: GHC.Prim.Int# -> GHC.Prim.Int#
Main.$wf = \ (ww_s4un :: GHC.Prim.Int#) ->
case ww_s4un of ds_X210 {
__DEFAULT ->
case Main.$wf (GHC.Prim.-# ds_X210 1#) of ww1_s4ur { __DEFAULT ->
GHC.Prim.+# ds_X210 ww1_s4ur
};
0# -> 0#
}
Notice the number of arguments to go. go x r k = ... === go x r = \k -> .... This is the standard trick to arrange for left-to-right information flow while folding the list (go is used as the reducer function, in foldr go (\_ -> n) ls 0#). Here, it's the counting of [0..], explicated as the initial k=0 and the (k + 1) on each step (k is an unfortunate naming choice, i seems better; k is overloaded with the irrelevant "constant" and "continuation", not just "counter" which was probably the intended meaning here).
The foldr/build (sic) fusion (linked to by luqui in the comments) turns foldr c n $ findIndices p [a1,a2,...,an] into a loop, exposing the inner foldr of the findIndices definition, avoiding building the actual list structure of the result of the findIndices call:
build g = g (:) []
foldr c n $ build g = g c n
foldr c n $ findIndices p [a1,a2,...,an]
==
foldr c n $ build g where {g c n = ...}
=
g c n where {g c n = ...}
=
foldr go (const n) [a1,a2,...,an] 0 where {go x r k = ...}
=
go a1 (foldr go (const n) [a2,...,an]) 0
=
let { x=a1, r=foldr go (const n) [a2,...,an], k=0 }
in
if | p x -> c (I# k) (r (k +# 1#)) -- no 'cons' (`:`), only 'c'
| otherwise -> r (k +# 1#)
=
....
So you see, it's a standard trick to have foldr define a function which expects one more input argument, to arrange the left-to-right information flow while processing the input list.
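Stripped of build and the unboxed integers, the trick can be sketched with ordinary Ints. findIndices' is a name made up for this example and is not the library definition; the counter is called i here instead of k.

```haskell
-- foldr builds a function that still expects the running index; the index is
-- then threaded left-to-right even though foldr itself associates to the right.
findIndices' :: (a -> Bool) -> [a] -> [Int]
findIndices' p xs = foldr go (const []) xs 0
  where
    go x r i
      | p x       = i : r (i + 1)
      | otherwise = r (i + 1)
```

Because the result is produced lazily, this also works on infinite lists, for example take 3 (findIndices' even [0 ..]).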
Everything with a hash sign is a "primitive" or "closer-to-machine-level" entity. I# is a primitive Int constructor; 0# is a machine-level 0; etc. This may or may not be exactly correct, but it should be close enough.
foldr/build fusion seems a particular case of transducers-based code transformation, which is based on the fact that nested folds are fused by composing their reducers' transformers (aka transducers), as in
foldr c n $
foldr (tr2 c2) n2 $
foldr (tr3 c3) n3 xs
=
foldr (tr2 c) n $ -- fold "replaces" the constructor nodes with its reducer
foldr (tr3 c3) n3 xs -- so just use the outer reducer in the first place!
=
foldr (tr3 (tr2 c)) n xs
=
foldr ((tr3 . tr2) c) n xs
and build g === foldr . tr for some appropriate choice of tr for a given g, so that
build g = g c n = (foldr . tr) c n = foldr (tr c) n
As for USE_REPORT_PRELUDE, again, I can't say this with any authority, but I always assumed that it is the compilation flag which is enabled when the mock definitions from the Haskell Report are used as actual code, even though they were intended as an executable specification.

Using fold* to grow a list in Haskell

I'm trying to solve the following problem in Haskell: given an integer return the list of its digits. The constraint is I have to only use one of the fold* functions (* = {r,l,r1,l1}).
Without such constraint, the code is simple:
list_digits :: Int -> [Int]
list_digits 0 = []
list_digits n = list_digits r ++ [n-10*r]
where
r = div n 10
But how do I use fold* to, essentially grow a list of digits from an empty list?
Thanks in advance.
Is this a homework assignment? It's pretty strange for the assignment to require you to use foldr, because this is a natural use for unfoldr, not foldr. unfoldr :: (b -> Maybe (a, b)) -> b -> [a] builds a list, whereas foldr :: (a -> b -> b) -> b -> [a] -> b consumes a list. An implementation of this function using foldr would be horribly contorted.
listDigits :: Int -> [Int]
listDigits = unfoldr digRem
  where digRem x
          | x <= 0 = Nothing
          | otherwise = Just (x `mod` 10, x `div` 10)
In the language of imperative programming, this is basically a while loop. Each iteration of the loop appends x `mod` 10 to the output list and passes x `div` 10 to the next iteration. In, say, Python, this'd be written as
def list_digits(x):
    output = []
    while x > 0:
        output.append(x % 10)
        x = x // 10
    return output
But unfoldr allows us to express the loop at a much higher level. unfoldr captures the pattern of "building a list one item at a time" and makes it explicit. You don't have to think through the sequential behaviour of the loop and realise that the list is being built one element at a time, as you do with the Python code; you just have to know what unfoldr does. Granted, programming with folds and unfolds takes a little getting used to, but it's worth it for the greater expressiveness.
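One thing worth noticing when trying this out: peeling digits off with mod 10 yields the least significant digit first, whereas the recursive list_digits in the question returns the most significant digit first. A minimal sketch that shows both orders (digitsLSF and digitsMSF are names made up here):

```haskell
import Data.List (unfoldr)

-- Least significant digit first: exactly what unfoldr produces.
digitsLSF :: Int -> [Int]
digitsLSF = unfoldr step
  where
    step x
      | x <= 0    = Nothing
      | otherwise = Just (x `mod` 10, x `div` 10)

-- Most significant digit first, matching the order used in the question.
digitsMSF :: Int -> [Int]
digitsMSF = reverse . digitsLSF
```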
If your assignment is marked by machine and it really does require you to type the word foldr into your program text, (you should ask your teacher why they did that and) you can play a sneaky trick with the following "id[]-as-foldr" function:
obfuscatedId = foldr (:) []
listDigits = obfuscatedId . unfoldr digRem
Though unfoldr is probably what the assignment meant, you can write this using foldr if you use foldr as a hylomorphism, that is, building up one list while it tears another down.
digits :: Int -> [Int]
digits n = snd $ foldr go (n, []) places where
  places = replicate num_digits ()
  -- length (show n) counts digits exactly; floor (logBase 10 n) can be off by
  -- one for exact powers of ten such as 1000 because of floating-point rounding.
  num_digits | n > 0     = length (show n)
             | otherwise = 0
  go () (n, ds) = let (q,r) = n `quotRem` 10 in (q, r : ds)
Effectively, what we're doing here is using foldr as "map-with-state". We know ahead of time how many digits we need to output (by counting the characters of show n), just not what those digits are, so we use unit (()) values as stand-ins for those digits.
If your teacher's a stickler for just having a foldr at the top-level, you can get
away with making go partial:
digits' :: Int -> [Int]
digits' n = foldr go [n] places where
  places = replicate num_digits ()
  -- one fewer split than there are digits; counted via show to avoid
  -- floating-point rounding in logBase
  num_digits | n > 0     = length (show n) - 1
             | otherwise = 0
  go () (n:ds) = let (q,r) = n `quotRem` 10 in (q:r:ds)
This has slightly different behaviour on non-positive numbers:
>>> digits 1234567890
[1,2,3,4,5,6,7,8,9,0]
>>> digits' 1234567890
[1,2,3,4,5,6,7,8,9,0]
>>> digits 0
[]
>>> digits' 0
[0]
>>> digits (negate 1234567890)
[]
>>> digits' (negate 1234567890)
[-1234567890]

Recursion With Tuples in Haskell

I want to define a simple function in Haskell:
nzp :: [Int] -> (Int,Int,Int)
that accepts a list of integers as input and returns a triple (a,b,c) where a is the amount of numbers in the list less than 0, b is the amount equal to 0 and c is the amount higher than zero. For example,
nzp [3,0,-2,0,4,5] = (1,2,3)
I have to define this function recursively and I can only traverse the list once. How can I do this? I can't seem to grasp the concept of recursively creating a tuple.
Most Regards
Here are some pointers:
To use recursion to build up a value, you need to pass the previous version of the value as an argument, so write
nzp_helper :: [Int] -> (Int,Int,Int) -> (Int, Int, Int)
Decide what the answer is when the list is empty
nzp_helper [] runningTotals = -- what's the answer if the list is empty?
Decide what the answer is when there's something in the list
nzp_helper (i:is) (negatives, zeros, positives)
  | i < 0 = -- write stuff here
  | i == 0 = -- I hope you're getting the idea
Kick the whole thing off by defining nzp using nzp_helper.
nzp is = nzp_helper is -- one final argument - what?
Instead of thinking of trying to create a single tuple recursively, you should think about updating an existing tuple containing the counts based on given value. This function would look something like:
update :: (Int, Int, Int) -> Int -> (Int, Int, Int)
update (l,e,g) x | x < 0 = (l+1, e, g)
update (l,e,g) x | x == 0 = (l, e+1, g)
update (l,e,g) x | x > 0 = (l, e, g+1)
Then you can traverse the input list using foldl and accumulate the output tuple:
nzp :: [Int] -> (Int, Int, Int)
nzp = foldl update (0,0,0)
EDIT: As @leftroundabout points out in the comments, you should avoid using foldl here, since it builds up unevaluated thunks and can lead to space leaks (Real World Haskell has an explanation). You can use the strict version of foldl, foldl', from Data.List:
import Data.List
nzp = foldl' update (0,0,0)
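One caveat, sketched below: foldl' only forces the accumulator to weak head normal form, which for a tuple means the tuple constructor and not the three counters inside it. A common way to keep the counters evaluated as well is to add bang patterns; this is my own variation on the code above, with step as a made-up name.

```haskell
{-# LANGUAGE BangPatterns #-}
import Data.List (foldl')

nzp :: [Int] -> (Int, Int, Int)
nzp = foldl' step (0, 0, 0)
  where
    -- The bangs force each counter when the tuple is taken apart, so no
    -- chain of (+1) thunks builds up inside the tuple.
    step (!l, !e, !g) x
      | x < 0     = (l + 1, e, g)
      | x == 0    = (l, e + 1, g)
      | otherwise = (l, e, g + 1)
```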
I can't seem to grasp the concept of recursively creating a tuple.
I shouldn't think so – in fact, it's impossible! (At least not without GHC extension havoc.)
No, you need to create a tuple in one go. The important thing needed for your function: since you may only traverse the list once, you need to pull through the entire tuple at once as well. As this is apparently homework, I shall instead show how it works with a fold (that is in fact preferable, but translates quite directly to recursion):
nzp = foldr switchIncr (0,0,0)
  where switchIncr x (negatives, zeroes, positives)
          | x<0 = (succ negatives, zeroes, positives)
          | x==0 = (negatives, succ zeroes, positives)
          | otherwise = (negatives, zeroes, succ positives)
New to Haskell too! Perhaps not proper but here is my solution which works.
Define an auxiliary function that accumulates the n,z, and p values
let f (x:xs, (n, z, p)) | x < 0     = f (xs, (n+1, z, p))
                        | x == 0    = f (xs, (n, z+1, p))
                        | otherwise = f (xs, (n, z, p+1))
    f ([], (n, z, p)) = ([], (n, z, p))
and define nzp in terms of the auxiliary function
let nzp x = snd $ f (x,(0,0,0))
to verify
Prelude> nzp [-1,1,1,-1,0]
(2,1,2)
I am a little over a couple of months into Haskell. A helper/auxiliary function would make running this solution easier.
s3 [] (l,g,z) = (l,g,z)
s3 (x:xs) (l,g,z) = if x<0
                      then (s3 xs (l+1,g,z))
                      else if x>0
                        then (s3 xs (l,g+1,z))
                        else (s3 xs (l,g,z+1))
This is run with s3 [1,0,-2,3,4,-7,0,-8] (0,0,0) producing
(3,3,2).
Just about any primitive recursive function can be translated into a fold.

Recursive state monad for accumulating a value while building a list?

I'm totally new to Haskell so apologies if the question is silly.
What I want to do is recursively build a list while at the same time building up an accumulated value based on the recursive calls. This is for a problem I'm doing for a Coursera course, so I won't post the exact problem but something analogous.
Say for example I wanted to take a list of ints and double each one (ignoring for the purpose of the example that I could just use map), but I also wanted to count up how many times the number '5' appears in the list.
So to do the doubling I could do this:
foo [] = []
foo (x:xs) = x * 2 : foo xs
So far so easy. But how can I also maintain a count of how many times x is a five? The best solution I've got is to use an explicit accumulator like this, which I don't like as it reverses the list, so you need to do a reverse at the end:
foo total acc [] = (total, reverse acc)
foo total acc (x:xs) = foo (if x == 5 then total + 1 else total) (x*2 : acc) xs
But I feel like this should be able to be handled nicer by the State monad, which I haven't used before, but when I try to construct a function that will fit the pattern I've seen I get stuck because of the recursive call to foo. Is there a nicer way to do this?
EDIT: I need this to work for very long lists, so any recursive calls need to be tail-recursive too. (The example I have here manages to be tail-recursive thanks to Haskell's 'tail recursion modulo cons').
Using the State monad, it could look something like this:
import Control.Monad.State
foo :: [Int] -> State Int [Int]
foo [] = return []
foo (x:xs) = do
  i <- get
  put $ if x==5 then (i+1) else i
  r <- foo xs
  return $ (x*2):r
main = do
  let (lst,count) = runState (foo [1,2,5,6,5,5]) 0
  putStr $ show count
This is a simple fold:
foo :: [Integer] -> ([Integer], Int)
foo [] = ([], 0)
foo (x : xs) = let (rs, n) = foo xs
               in (2 * x : rs, if x == 5 then n + 1 else n)
or expressed using foldr
foo' :: [Integer] -> ([Integer], Int)
foo' = foldr f ([], 0)
  where
    f x (rs, n) = (2 * x : rs, if x == 5 then n + 1 else n)
The accumulated value is a pair holding the results of both operations: the doubled list and the count of fives.
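Another option from base for this "map while threading an accumulator" shape is mapAccumL from Data.List. A minimal sketch, with doubleAndCountFives and step as made-up names; note that the count comes first in the result pair here.

```haskell
import Data.List (mapAccumL)

-- Doubles every element and counts how many inputs were equal to 5.
-- The accumulator (the count) is threaded left to right through the list.
doubleAndCountFives :: [Int] -> (Int, [Int])
doubleAndCountFives = mapAccumL step 0
  where
    step n x = (if x == 5 then n + 1 else n, 2 * x)
```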
Notes:
Have a look at Beautiful folding. It shows a nice way to make such computations composable.
You can use State for the same thing as well, by viewing each element as a stateful computation. This is a bit overkill, but certainly possible. In fact, any fold can be expressed as a sequence of State computations:
import Control.Monad
import Control.Monad.State
-- I used a slightly non-standard signature for a left fold
-- for simplicity.
foldl' :: (b -> a -> a) -> a -> [b] -> a
foldl' f z xs = execState (mapM_ (modify . f) xs) z
Function mapM_ first maps each element of xs to a stateful computation by modify . f :: b -> State a (). Then it combines a list of such computations into one of type State a () (it discards the results of the monadic computations, just keeps the effects). Finally we run this stateful computation on z.

Performance of "all" in haskell

I have almost no knowledge of Haskell and have been trying to solve some Project Euler problems.
While solving Number 5 I wrote this solution (for 1..10)
--Check if n can be divided by 1..max
canDivAll :: Integer -> Integer -> Bool
canDivAll max n = all (\x -> n `mod` x == 0) [1..max]
main = print $ head $ filter (canDivAll 10) [1..]
Now I found out, that all is implemented like this:
all p = and . map p
Doesn't this mean, the condition is checked for every element? Wouldn't it be much faster to break upon the first False-Result of the condition? This would make the execution of the code above faster.
Thanks
and itself short-circuits, and since map (and therefore all) is evaluated lazily, only as many elements as needed are examined, no more.
You can verify that with a GHCi session:
Prelude Debug.Trace> and [(trace "first" True), (trace "second" True)]
first
second
True
Prelude Debug.Trace> and [(trace "first" False), (trace "second" False)]
first
False
map does not evaluate its whole argument before and starts running, and and itself is short-circuited.
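A quick way to convince yourself without Debug.Trace is to run the Report-style definition on an infinite list. This is a small sketch; allViaMap and stopsEarly are names made up for the example.

```haskell
-- The Report-style definition from the question.
allViaMap :: (a -> Bool) -> [a] -> Bool
allViaMap p = and . map p

-- Terminates even though the list is infinite: and stops at the first False,
-- and map only produces elements as and asks for them.
stopsEarly :: Bool
stopsEarly = allViaMap (< 10) [1 :: Int ..]
```

If map had to finish before and started, evaluating stopsEarly would never return.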
Notice that in GHC all isn't really defined like this.
-- | Applied to a predicate and a list, 'all' determines if all elements
-- of the list satisfy the predicate.
all :: (a -> Bool) -> [a] -> Bool
#ifdef USE_REPORT_PRELUDE
all p = and . map p
#else
all _ [] = True
all p (x:xs) = p x && all p xs
{-# RULES
"all/build" forall p (g::forall b.(a->b->b)->b->b) .
all p (build g) = g ((&&) . p) True
#-}
#endif
We see that all p (x:xs) = p x && all p xs, so whenever p x is false, the evaluation will stop.
Moreover, there is a simplification rule all/build, which effectively transforms your all p [1..max] into a simple fail-fast loop*, so I don't think you can improve much from modifying all.
*. The simplified code should look like:
eftIntFB c n x0 y | x0 ># y = n
                  | otherwise = go x0
  where
    go x = I# x `c` if x ==# y then n else go (x +# 1#)
eftIntFB ((&&) . p) True 1# max#
This is a good program for the fusion optimization, as all your loops are expressed as fusible combinators. Thus you can write it using, e.g. Data.Vector, and get better performance than with lists.
From N=20, with lists as in your program:
52.484s
Also, use rem instead of mod.
15.712s
Where the list functions become vector operations:
import qualified Data.Vector.Unboxed as V
canDivAll :: Int -> Int -> Bool
canDivAll max n = V.all (\x -> n `rem` x == 0) (V.enumFromN 1 max)
main = print . V.head $ V.filter (canDivAll 20) $ V.unfoldr (\a -> Just (a, a+1)) 1
You're assuming that and is not short-circuiting. and will stop execution on the first false result it sees, so it is "fast" as one might expect.
