I have almost no knowledge of Haskell and tried to solve some Project Euler problems.
While solving Number 5 I wrote this solution (for 1..10):
--Check if n can be divided by 1..max
canDivAll :: Integer -> Integer -> Bool
canDivAll max n = all (\x -> n `mod` x == 0) [1..max]
main = print $ head $ filter (canDivAll 10) [1..]
Then I found out that all is implemented like this:
all p = and . map p
Doesn't this mean that the condition is checked for every element? Wouldn't it be much faster to break on the first False result of the condition? That would make the code above run faster.
Thanks
and itself short-circuits, and since both map and all are evaluated lazily, only as many elements are examined as needed, not more.
You can verify that with a GHCi session:
Prelude Debug.Trace> and [(trace "first" True), (trace "second" True)]
first
second
True
Prelude Debug.Trace> and [(trace "first" False), (trace "second" False)]
first
False
map does not evaluate its whole argument before and runs, and and itself short-circuits.
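A small sketch of the same point without Debug.Trace (the names stopsEarly and skipsRest are made up for illustration): if all really checked every element, the first definition would never terminate and the second would crash on undefined.

```haskell
-- 'all' only inspects elements up to the first one that fails the predicate,
-- so it terminates even on an infinite list.
stopsEarly :: Bool
stopsEarly = all (< 5) [1 ..]

-- The element after the first failure is never forced,
-- so the 'undefined' at the end is harmless here.
skipsRest :: Bool
skipsRest = all even [2, 3, undefined]
```

Both evaluate to False in GHCi.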
Notice that in GHC all isn't really defined like this.
-- | Applied to a predicate and a list, 'all' determines if all elements
-- of the list satisfy the predicate.
all :: (a -> Bool) -> [a] -> Bool
#ifdef USE_REPORT_PRELUDE
all p = and . map p
#else
all _ [] = True
all p (x:xs) = p x && all p xs
{-# RULES
"all/build" forall p (g::forall b.(a->b->b)->b->b) .
all p (build g) = g ((&&) . p) True
#-}
#endif
We see that all p (x:xs) = p x && all p xs, so whenever p x is false, the evaluation will stop.
Moreover, there is a simplification rule all/build, which effectively transforms your all p [1..max] into a simple fail-fast loop*, so I don't think you can improve much from modifying all.
*. The simplified code should look like:
eftIntFB c n x0 y | x0 ># y   = n
                  | otherwise = go x0
  where
    go x = I# x `c` if x ==# y then n else go (x +# 1#)

eftIntFB ((&&) . p) True 1# max#
This is a good program for the fusion optimization, as all your loops are expressed as fusible combinators. Thus you can write it using, e.g. Data.Vector, and get better performance than with lists.
For N=20, with lists as in your program:
52.484s
Also, use rem instead of mod.
15.712s
Where the list functions become vector operations:
import qualified Data.Vector.Unboxed as V
canDivAll :: Int -> Int -> Bool
canDivAll max n = V.all (\x -> n `rem` x == 0) (V.enumFromN 1 max)
main = print . V.head $ V.filter (canDivAll 20) $ V.unfoldr (\a -> Just (a, a+1)) 1
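On the rem versus mod advice above: the two only differ when the operands have different signs, so swapping one for the other is safe in this program, where everything is positive. As far as I know, rem corresponds more directly to the machine's division instruction, while mod needs an extra sign adjustment. A quick sketch to check the first claim (the names below are made up for illustration):

```haskell
-- For non-negative dividends and positive divisors, mod and rem agree.
agreeOnPositives :: Bool
agreeOnPositives =
  and [n `mod` x == n `rem` x | n <- [0 .. 100 :: Int], x <- [1 .. 20]]

-- With a negative dividend they differ: mod takes the sign of the divisor,
-- rem takes the sign of the dividend.
negativeExample :: (Int, Int)
negativeExample = ((-7) `mod` 2, (-7) `rem` 2)
```

Here agreeOnPositives is True and negativeExample is (1,-1).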
You're assuming that and is not short-circuiting. and will stop execution on the first false result it sees, so it is "fast" as one might expect.
I said in this question that I didn't understand the source code of findIndices.
In fact I didn't pay enough attention and I didn't see that there are two definitions of this function:
findIndices :: (a -> Bool) -> [a] -> [Int]
#if defined(USE_REPORT_PRELUDE)
findIndices p xs = [ i | (x,i) <- zip xs [0..], p x]
#else
-- Efficient definition, adapted from Data.Sequence
{-# INLINE findIndices #-}
findIndices p ls = build $ \c n ->
  let go x r k | p x       = I# k `c` r (k +# 1#)
               | otherwise = r (k +# 1#)
  in foldr go (\_ -> n) ls 0#
#endif /* USE_REPORT_PRELUDE */
I understand the first definition, the one I didn't see. I don't understand the second one. I have a couple of questions:
What is #if defined(USE_REPORT_PRELUDE)?
Can someone explain the second definition? What are build, I#, +#, 1#?
Why is the second definition inlined, and not the first one?
The CPP extension enables the C preprocessor, as used for the C programming language. Here, it tests whether the flag USE_REPORT_PRELUDE was set during compilation. Depending on that flag, the compiler uses the #if or the #else variant of the code.
build is a function which could be defined as
build f = f (:) []
So, using build (\c n -> ... essentially binds c to the "cons" (:), and n to the "nil" [].
This is not used for convenience: it is not convenient at all! However, the compiler optimizer works great with build and foldr combined, so the code is written here in a weird way to take advantage of that.
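To make the build idea concrete, here is a minimal sketch using the build exported from GHC.Exts. The producer upTo3 and the other names are invented for this example; the point is only that a list written against abstract c and n can either be materialised with build or consumed directly, which is what the foldr/build rewrite does.

```haskell
import GHC.Exts (build)

-- A list producer abstracted over its "cons" (c) and "nil" (n).
upTo3 :: (Int -> b -> b) -> b -> b
upTo3 c n = c 1 (c 2 (c 3 n))

-- Plugging in (:) and [] yields an ordinary list.
asList :: [Int]
asList = build upTo3

-- Plugging in (+) and 0 instead sums the elements without building any list;
-- this is what foldr (+) 0 (build upTo3) is rewritten to.
summed :: Int
summed = upTo3 (+) 0
```

In GHCi, asList is [1,2,3] and summed is 6, the same as foldr (+) 0 asList.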
Further, I# ... is the low-level constructor for integers. When we normally write
x :: Int
x = 4+2
GHC implements x (very roughly) with a pointer to some memory that reads as unevaluated: 4+2. After x is forced the first time, this memory gets overwritten with evaluated: I# 6#. This is needed to implement laziness.
The "boxing" here refers to the indirection through a pointer.
Instead, the type Int# is a plain machine integer, with no pointers, no indirection, no unevaluated expressions. It is strict (instead of lazy), but being more low-level it is more efficient. One creates a value as in
x' :: Int#
x' = 6#
x :: Int
x = I# x'
Indeed, Int is defined as data Int = I# Int#.
Keep in mind that this is not standard Haskell, but GHC-specific low-level details. In normal code, you should not need to use such unboxed types. In libraries, the authors do that to achieve a little more performance, but that's it.
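For the curious, a small sketch of what working with the unboxed representation looks like. It needs the MagicHash extension, and the function names (plusInt, doubleRaw, boxedDouble) are made up for illustration; this is roughly the shape of code GHC's own libraries contain, not something you would normally write yourself.

```haskell
{-# LANGUAGE MagicHash #-}
import GHC.Exts (Int (I#), Int#, (*#), (+#))

-- Unwrap both boxes, add the raw machine integers, and box the result again.
plusInt :: Int -> Int -> Int
plusInt (I# a) (I# b) = I# (a +# b)

-- A function working directly on unboxed values; there are no thunks involved.
doubleRaw :: Int# -> Int#
doubleRaw n = n *# 2#

-- Boxed wrapper around the unboxed worker.
boxedDouble :: Int -> Int
boxedDouble (I# n) = I# (doubleRaw n)
```

plusInt 4 2 gives 6 and boxedDouble 21 gives 42.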
Sometimes, even if in our code we only use Ints, GHC is smart enough to automatically convert our code to using Int# and achieve more efficiency, avoiding the boxing. This can be observed if we ask GHC to "dump Core" so that we can see the result of the optimization.
For instance, compiling
f :: Int -> Int
f 0 = 0
f n = n + f (n-1)
GHC produces a lower level version (this is GHC Core, not Haskell, but it is similar enough to be understood):
Main.$wf :: GHC.Prim.Int# -> GHC.Prim.Int#
Main.$wf = \ (ww_s4un :: GHC.Prim.Int#) ->
case ww_s4un of ds_X210 {
__DEFAULT ->
case Main.$wf (GHC.Prim.-# ds_X210 1#) of ww1_s4ur { __DEFAULT ->
GHC.Prim.+# ds_X210 ww1_s4ur
};
0# -> 0#
}
Notice the number of arguments to go. go x r k = ... === go x r = \k -> .... This is the standard trick to arrange for left-to-right information flow while folding the list (go is used as the reducer function, in foldr go (\_ -> n) ls 0#). Here, it's the counting of [0..], explicated as the initial k=0 and the (k + 1) on each step (k is an unfortunate naming choice, i seems better; k is overloaded with the irrelevant "constant" and "continuation", not just "counter" which was probably the intended meaning here).
The foldr/build (sic) fusion (linked to by luqui in the comments) turns foldr c n $ findIndices p [a1,a2,...,an] into a loop, exposing the inner foldr of the findIndices definition, avoiding building the actual list structure of the result of the findIndices call:
build g = g (:) []
foldr c n $ build g = g c n
foldr c n $ findIndices p [a1,a2,...,an]
==
foldr c n $ build g where {g c n = ...}
=
g c n where {g c n = ...}
=
foldr go (const n) [a1,a2,...,an] 0 where {go x r k = ...}
=
go a1 (foldr go (const n) [a2,...,an]) 0
=
let { x=a1, r=foldr go (const n) [a2,...,an], k=0 }
in
if | p x -> c (I# k) (r (k +# 1#)) -- no 'cons' (`:`), only 'c'
| otherwise -> r (k +# 1#)
=
....
So you see, it's a standard trick to have foldr define a function which expects one more input argument, to arrange the left-to-right information flow while processing the input list.
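The same trick can be written in plain Haskell, without build or unboxed integers. This is only a sketch (findIndices' is my own name, and it omits the fusion machinery), but it shows the foldr building a function that still expects the running index:

```haskell
-- foldr produces a function of type Int -> [Int]; applying it to 0
-- starts the left-to-right count.
findIndices' :: (a -> Bool) -> [a] -> [Int]
findIndices' p xs = foldr go (const []) xs 0
  where
    -- x: current element, r: the rest of the fold (still awaiting an index),
    -- i: index of the current element
    go x r i
      | p x       = i : r (i + 1)
      | otherwise = r (i + 1)
```

For example, findIndices' even [1,2,3,4,6] gives [1,3,4], and because the result is produced lazily, take 2 (findIndices' (> 2) [1 ..]) gives [2,3].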
All the stuff with the hash sign are "primitive" or "closer-to-machine-level" entities. I# is a primitive Int constructor; 0# is a machine-level 0; etc. This may or may not be exactly correct, but it should be close enough.
foldr/build fusion seems a particular case of transducers-based code transformation, which is based on the fact that nested folds are fused by composing their reducers' transformers (aka transducers), as in
foldr c n $
foldr (tr2 c2) n2 $
foldr (tr3 c3) n3 xs
=
foldr (tr2 c) n $ -- fold "replaces" the constructor nodes with its reducer
foldr (tr3 c3) n3 xs -- so just use the outer reducer in the first place!
=
foldr (tr3 (tr2 c)) n xs
=
foldr ((tr3 . tr2) c) n xs
and build g === foldr . tr for some appropriate choice of tr for a given g, so that
build g = g c n = (foldr . tr) c n = foldr (tr c) n
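As a concrete sketch of composing reducer transformers (the names mapping and filtering are my own choice, not from any library used above): map and filter can each be written as a function that transforms a reducer, and composing them gives a single foldr that does the work of both passes.

```haskell
-- Transform a reducer so that it sees f x instead of x.
mapping :: (a -> b) -> (b -> r -> r) -> (a -> r -> r)
mapping f c = \x r -> c (f x) r

-- Transform a reducer so that it skips elements failing p.
filtering :: (a -> Bool) -> (a -> r -> r) -> (a -> r -> r)
filtering p c = \x r -> if p x then c x r else r

-- One pass over the input, no intermediate list.
fused :: [Int] -> [Int]
fused = foldr ((filtering even . mapping (* 10)) (:)) []

-- The two-pass version it corresponds to.
unfused :: [Int] -> [Int]
unfused = map (* 10) . filter even
```

Both fused [1..6] and unfused [1..6] give [20,40,60]. Note that the transformer for the innermost fold (the filter) ends up outermost in the composition, as in the (tr3 . tr2) derivation above.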
As for USE_REPORT_PRELUDE, again, I can't say this with any authority, but I always assumed that it is the compilation flag which is enabled when the mock definitions from the Haskell Report are used as actual code, even though they were intended as an executable specification.
I'm writing (literate) Haskell code that brute-forces the lcm (least common multiple) of a list of Ints.
I have already thought about a strategy, but I'm not that good with Haskell syntax and don't know many functions.
This is the function so far:
> bruteforceLCM :: [Int] -> Int -> Int
> bruteforceLCM xs n = if EVERYELEMENTOFTHELIST `mod` n == 0
> then n
> else (bruteforceLCM xs (n+1))
Where xs is the list of all Ints and n is the current Int that gets checked for being the lcm.
The first call would be bruteforceLCM xs 2, because n=0 would be not divisible and n=1 would always return true, these cases are solved with pattern matching earlier.
What would I have to replace "EVERYELEMENTOFTHELIST" with to achieve my goal?
Greeting, Joe
EDIT: Here is the whole code now, thanks to dfeuer!
> bruteforceKGV :: [Int] -> Int -> Int
> bruteforceKGV xs n = if all p xs then n else (bruteforceKGV xs (n+1))
> where p x = n `mod` x == 0
Can you write down a function f :: Int -> Bool that checks if an Int is 0 modulo n? I'll leave this first step to you.
So now you have a function f :: Int -> Bool and a list of Ints, and you want to see if f x is True for every x in the list. We ask Hoogle, and it tells us about all. You'll use f as the first argument of all to do what you want.
You're starting out with
bruteforceLCM :: [Int] -> Int -> Int
bruteforceLCM xs n = if EVERYELEMENTOFTHELIST `mod` n == 0
then n
else (bruteforceLCM xs (n+1))
When you say EVERYELEMENTOFTHELIST `mod` n == 0, what you really mean is "For each element, x, of xs, x `mod` n == 0".
Let's write a predicate expressing what that says about an element of the list:
p x = x `mod` n == 0
Now we can use all, which takes our predicate and tells us if it's true for all elements of the list.
But now we might want to clean things up a bit at a higher level. Because Haskell is lazy, we don't need to be so explicit about the recursion. We can do something like this instead:
bfLCM xs = fromJust $ find SOMETHING [2..]
Unfortunately, running this leads to a lot of infinite loops, because your math actually turns out to be a little bit wrong. Can you figure out where your mistake is?
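For reference, one possible completion, using the operand order from the EDIT in the question (the candidate n is divided by each list element, not the other way round). This is a sketch that assumes every element of the list is positive; a 0 in the list would cause a division-by-zero error. Starting the search at 1 instead of 2 is my own simplification.

```haskell
import Data.List (find)
import Data.Maybe (fromJust)

-- Try candidates 1, 2, 3, ... and return the first one that every
-- element of xs divides. Assumes all elements of xs are positive.
bfLCM :: [Int] -> Int
bfLCM xs = fromJust $ find (\n -> all (\x -> n `mod` x == 0) xs) [1 ..]
```

For example, bfLCM [2,3,4] and bfLCM [4,6] both give 12.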
I recently learned about Data.Function.fix, and now I want to apply it everywhere. For example, whenever I see a recursive function I want to "fix" it. So basically my question is where and when should I use it.
To make it more specific:
1) Suppose I have the following code for factorization of n:
f n = f' n primes
where
f' n (p:ps) = ...
-- if p^2<=n: returns (p,k):f' (n `div` p^k) ps for k = maximum power of p in n
-- if n<=1: returns []
-- otherwise: returns [(n,1)]
If I rewrite it in terms of fix, will I gain something? Lose something? Is it possible, that by rewriting an explicit recursion into fix-version I will resolve or vice versa create a stack overflow?
2) When dealing with lists, there are several solutions: recursion/fix, foldr/foldl/foldl', and probably something else. Is there any general guide/advice on when to use each? For example, would you rewrite the above code using foldr over the infinite list of primes?
There are, probably, other important questions not covered here. Any additional comments related to the usage of fix are welcome as well.
One thing that can be gained by writing in an explicitly fixed form is that the recursion is left "open".
factOpen :: (Integer -> Integer) -> Integer -> Integer
factOpen recur 0 = 1
factOpen recur n = n * recur (pred n)
We can use fix to get regular fact back
fact :: Integer -> Integer
fact = fix factOpen
This works because fix effectively passes a function itself as its first argument. By leaving the recursion open, however, we can modify which function gets "passed back". The best example of using this property is to use something like memoFix from the memoize package.
factM :: Integer -> Integer
factM = memoFix factOpen
And now factM has built-in memoization.
Effectively, open-style recursion requires us to treat the recursive call as an explicit, first-class argument. Recursive bindings are one way that Haskell allows for recursion at the language level, but we can build other, more specialized forms.
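If you don't want to pull in the memoize package, here is a rough sketch of the same idea using only base. memoList is a hypothetical helper I made up for this example (it is not memoFix, and it only works for non-negative Int arguments); it ties the recursive knot through a lazily built list instead of through fix.

```haskell
import Data.Function (fix)

-- Fibonacci with the recursion left open.
fibOpen :: (Int -> Integer) -> Int -> Integer
fibOpen _ 0 = 0
fibOpen _ 1 = 1
fibOpen self n = self (n - 1) + self (n - 2)

-- Closing the recursion with fix gives the usual exponential-time function.
fibSlow :: Int -> Integer
fibSlow = fix fibOpen

-- Closing it through a lazy table instead: every recursive call goes via
-- 'memo', so each entry is computed at most once.
memoList :: ((Int -> Integer) -> Int -> Integer) -> Int -> Integer
memoList f = memo
  where
    memo n = table !! n
    table = map (f memo) [0 ..]

fibFast :: Int -> Integer
fibFast = memoList fibOpen
```

fibSlow 10 gives 55, and fibFast 50 gives 12586269025 quickly, whereas fibSlow 50 would take a very long time. The open definition fibOpen is shared by both; only the way the knot is tied changes.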
I'd like to mention another usage of fix; suppose you have a simple language consisting of addition, negation, and integer literals. Perhaps you have written a parser which takes a String and outputs a Tree:
data Tree = Leaf String | Node String [Tree]
parse :: String -> Tree
-- parse "-(1+2)" == Node "Neg" [Node "Add" [Node "Lit" [Leaf "1"], Node "Lit" [Leaf "2"]]]
Now you would like to evaluate your tree to a single integer:
fromTree (Node "Lit" [Leaf n]) = case reads n of {[(x,"")] -> Just x; _ -> Nothing}
fromTree (Node "Neg" [e]) = liftM negate (fromTree e)
fromTree (Node "Add" [e1,e2]) = liftM2 (+) (fromTree e1) (fromTree e2)
Suppose someone else decides to extend the language; they want to add multiplication. Normally they would need access to the original source code. Without it, they could try the following:
fromTree' (Node "Mul" [e1, e2]) = ...
fromTree' e = fromTree e
But then Mul can only appear once, at the top level of the expression, since the call to fromTree will not be aware of the Node "Mul" case. Node "Neg" [Node "Mul" [a, b]] will not work, since the original fromTree has no pattern for "Mul". However, if the same function is written using fix:
fromTreeExt :: (Tree -> Maybe Int) -> (Tree -> Maybe Int)
fromTreeExt self (Node "Neg" [e]) = liftM negate (self e)
fromTreeExt .... -- other cases
fromTree = fix fromTreeExt
Then extending the language is possible:
fromTreeExt' self (Node "Mul" [e1, e2]) = ...
fromTreeExt' self e = fromTreeExt self e
fromTree' = fix fromTreeExt'
Now, the extended fromTree' will evaluate the tree properly, since self in fromTreeExt' refers to the entire function, including the "Mul" case.
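Here is a complete, compilable sketch of the idea. The names evalBase, evalMul, evalOld and eval are my own stand-ins for fromTreeExt, fromTreeExt', fromTree and fromTree', and I added a catch-all case returning Nothing, which the fragments above leave out.

```haskell
import Data.Function (fix)

data Tree = Leaf String | Node String [Tree]

-- The original evaluator, with the recursion left open via 'self'.
evalBase :: (Tree -> Maybe Int) -> Tree -> Maybe Int
evalBase _ (Node "Lit" [Leaf n]) =
  case reads n of
    [(x, "")] -> Just x
    _         -> Nothing
evalBase self (Node "Neg" [e])    = negate <$> self e
evalBase self (Node "Add" [a, b]) = (+) <$> self a <*> self b
evalBase _ _                      = Nothing

-- The extension: handle "Mul", delegate everything else to the base cases.
evalMul :: (Tree -> Maybe Int) -> Tree -> Maybe Int
evalMul self (Node "Mul" [a, b]) = (*) <$> self a <*> self b
evalMul self e                   = evalBase self e

-- Tying the knot: 'self' is the whole extended evaluator.
eval :: Tree -> Maybe Int
eval = fix evalMul

-- The unextended evaluator, for comparison.
evalOld :: Tree -> Maybe Int
evalOld = fix evalBase
```

With t = Node "Neg" [Node "Mul" [Node "Lit" [Leaf "2"], Node "Lit" [Leaf "3"]]], eval t gives Just (-6), while evalOld t gives Nothing because it knows nothing about "Mul".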
This approach is used here (the above example is a closely adapted version of the usage in the paper).
Beware the difference between _Y f = f (_Y f) (recursion, value-copying) and fix f = x where x = f x (corecursion, reference-sharing).
Haskell's let and where bindings are recursive: same name on the LHS and RHS refer to the same entity. The reference is shared.
In the definition of _Y there's no sharing (unless a compiler performs an aggressive common subexpression elimination optimization). This means it describes recursion, where repetition is achieved by application of a copy of an original, like in a classic metaphor of a recursive function creating its own copies. Corecursion, on the other hand, relies on sharing, on referring to the same entity.
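A tiny sketch of the two definitions side by side, on a deliberately simple stream (the powers of two; step and the two list names are made up for this example). Both produce the same values; the difference is only in how much work is repeated.

```haskell
import Data.Function (fix)

-- Recursion by re-application: every use of the argument is a fresh (_Y f).
_Y :: (a -> a) -> a
_Y f = f (_Y f)

-- Each element is twice the previous one; the list is defined via its own tail.
step :: [Integer] -> [Integer]
step xs = 1 : map (* 2) xs

-- With fix, 'xs' inside step is the very list being defined (shared).
powersShared :: [Integer]
powersShared = fix step

-- With _Y, 'xs' inside step is a separate copy that is rebuilt from scratch
-- at each level, unless the compiler happens to introduce sharing.
powersCopied :: [Integer]
powersCopied = _Y step
```

take 6 of either list gives [1,2,4,8,16,32]; the shared version does a constant amount of work per element, while the copied version redoes the work of all earlier elements for each new one.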
An example, primes calculated by
2 : _Y ((3:) . gaps 5 . _U . map (\p-> [p*p, p*p+2*p..]))
-- gaps 5 == ([5,7..] \\)
-- _U == sort . concat
either reusing its own output (with fix, let g = ((3:)...) ; ps = g ps in 2 : ps) or creating separate primes supply for itself (with _Y, let g () = ((3:)...) (g ()) in 2 : g ()).
See also:
double stream feed to prevent unneeded memoization?
How to implement an efficient infinite generator of prime numbers in Python?
Or, with the usual example of factorial function,
gen rec n = n<2 -> 1 ; n * rec (n-1) -- "if" notation
facrec = _Y gen
facrec 4 = gen (_Y gen) 4
= let {rec=_Y gen} in (\n-> ...) 4
= let {rec=_Y gen} in (4<2 -> 1 ; 4*rec 3)
= 4*_Y gen 3
= 4*gen (_Y gen) 3
= 4*let {rec2=_Y gen} in (3<2 -> 1 ; 3*rec2 2)
= 4*3*_Y gen 2 -- (_Y gen) recalculated
.....
fac = fix gen
fac 4 = (let f = gen f in f) 4
= (let f = (let {rec=f} in (\n-> ...)) in f) 4
= let {rec=f} in (4<2 -> 1 ; 4*rec 3) -- f binding is created
= 4*f 3
= 4*let {rec=f} in (3<2 -> 1 ; 3*rec 2)
= 4*3*f 2 -- f binding is reused
.....
1) fix is just a function; it can improve your code when you use recursion by making it prettier. For example usage, see: Haskell Wikibook - Fix and recursion.
2) Do you know what foldr does? It seems like foldr isn't useful for factorization (or I didn't understand what you mean by that).
Here is a prime factorization without fix:
fact xs = map (\x->takeWhile (\y->y/=[]) x) . map (\x->factIt x) $ xs
  where factIt n = map (\x->getFact x n []) [2..n]
        getFact i n xs
          | n `mod` i == 0 = getFact i (div n i) xs++[i]
          | otherwise = xs
and with fix (this works exactly like the previous one):
fact xs = map (\x->takeWhile (\y->y/=[]) x) . map (\x->getfact x) $ xs
  where getfact n = map (\x->defact x n) [2..n]
        defact i n =
          fix (\rec j k xs->if(mod k j == 0)then (rec j (div k j) xs++[j]) else xs ) i n []
This isn't pretty because in this case fix isn't a good choice (but there is always somebody who can write it better).
I'm totally new to Haskell so apologies if the question is silly.
What I want to do is recursively build a list while at the same time building up an accumulated value based on the recursive calls. This is for a problem I'm doing for a Coursera course, so I won't post the exact problem but something analogous.
Say for example I wanted to take a list of ints and double each one (ignoring for the purpose of the example that I could just use map), but I also wanted to count up how many times the number '5' appears in the list.
So to do the doubling I could do this:
foo [] = []
foo (x:xs) = x * 2 : foo xs
So far so easy. But how can I also maintain a count of how many times x is a five? The best solution I've got is to use an explicit accumulator like this, which I don't like as it reverses the list, so you need to do a reverse at the end:
foo total acc [] = (total, reverse acc)
foo total acc (x:xs) = foo (if x == 5 then total + 1 else total) (x*2 : acc) xs
But I feel like this should be able to be handled nicer by the State monad, which I haven't used before, but when I try to construct a function that will fit the pattern I've seen I get stuck because of the recursive call to foo. Is there a nicer way to do this?
EDIT: I need this to work for very long lists, so any recursive calls need to be tail-recursive too. (The example I have here manages to be tail-recursive thanks to Haskell's 'tail recursion modulo cons').
Using the State monad, it could look something like this:
import Control.Monad.State

foo :: [Int] -> State Int [Int]
foo [] = return []
foo (x:xs) = do
  i <- get
  put $ if x == 5 then i + 1 else i  -- bump the counter on every 5
  r <- foo xs
  return $ (x*2) : r

main = do
  let (lst, count) = runState (foo [1,2,5,6,5,5]) 0
  putStr $ show count
This is a simple fold
foo :: [Integer] -> ([Integer], Int)
foo [] = ([], 0)
foo (x : xs) = let (rs, n) = foo xs
               in (2 * x : rs, if x == 5 then n + 1 else n)
or expressed using foldr
foo' :: [Integer] -> ([Integer], Int)
foo' = foldr f ([], 0)
  where
    f x (rs, n) = (2 * x : rs, if x == 5 then n + 1 else n)
The accumulated value is a pair holding the results of both operations.
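A different way to get the same pair, not used in the answer above, is mapAccumL from Data.List, which threads an accumulator from left to right while producing an output list. A minimal sketch (the function name doubleAndCountFives is invented; note that the result order is (count, list), the reverse of foo):

```haskell
import Data.List (mapAccumL)

-- Thread the running count of fives through the list while doubling
-- each element.
doubleAndCountFives :: [Int] -> (Int, [Int])
doubleAndCountFives = mapAccumL step 0
  where
    -- n: fives seen so far, x: current element
    step n x = (if x == 5 then n + 1 else n, 2 * x)
```

doubleAndCountFives [1,2,5,6,5,5] gives (3,[2,4,10,12,10,10]).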
Notes:
Have a look at Beautiful folding. It shows a nice way to make such computations composable.
You can use State for the same thing as well, by viewing each element as a stateful computation. This is a bit overkill, but certainly possible. In fact, any fold can be expressed as a sequence of State computations:
import Control.Monad
import Control.Monad.State
-- I used a slightly non-standard signature for a left fold
-- for simplicity.
foldl' :: (b -> a -> a) -> a -> [b] -> a
foldl' f z xs = execState (mapM_ (modify . f) xs) z
Function mapM_ first maps each element of xs to a stateful computation by modify . f :: b -> State a (). Then it combines a list of such computations into one of type State a () (it discards the results of the monadic computations, just keeps the effects). Finally we run this stateful computation on z.
I'm rewriting my Prolog program in Haskell and I have a small problem: how can I do something like this
myFunc(Field, Acc, Acc) :-
% some "ending" condition
!.
myFunc(Field, Acc, Result) :-
nextField(Field, Field2),
test1(Field2,...),
myFunc(Field2, Acc, Result).
myFunc(Field, Acc, Result) :-
nextField(Field, Field2),
test2(Ak, (X1, Y1)),
myFunc(Field2, [Field2|Acc], Result).
in Haskell? The code above checks some conditions and recursively calls itself, so in the end I get a list of specific fields. The whole point is that if some condition (test1 or test2) fails, it returns to the last point where it could make a different choice and makes it. How do I implement something like that in Haskell?
To model Prolog computations as expressively in Haskell, you need a backtracking monad. This is trivially done using the LogicT monad. Your example as it stands translates to the following:
import Control.Monad.Logic
myFunc :: Int -> [Int] -> Logic [Int]
myFunc field acc = ifte (exitCond field acc) (\_ -> return acc) $
  (do f <- nextField field
      guard $ test1 f
      myFunc f acc)
  `mplus`
  (do f <- nextField field
      guard $ test2 f
      myFunc f (f:acc))
Assuming the following implementations for the functions and predicates:
nextField i = return (i+1)
test1 f = f < 10
test2 f = f < 20
exitCond f a = guard (f > 15)
You use mplus to combine two Logic computations so that if one fails it backtracks and tries the other one. ifte is just a soft cut (there's no hard cut in logict, although I believe it's trivial to implement since logict is based on continuations) to exit when the exiting condition is true. You run your computation as follows:
Main> runLogic (myFunc 1 []) (\a r -> a:r) []
[[16,15,14,13,12,11,10],[16,15,14,13,12,11,10,9],[16,15,14,13,12,11,10,8]...
runLogic takes the Logic computation, a continuation and an initial value for the output of the continuation. Here I just passed a continuation which will accumulate all results in a list. The above will backtrack and get all solutions, unlike the Prolog example, since we used a soft cut instead of a hard cut. To stop backtracking after getting the first solution you can use once:
Main> runLogic (once $ myFunc 1 []) (\a r -> a:r) []
[[16,15,14,13,12,11,10]]
You can also use observe to observe the first solution only, without having to pass a continuation:
Main> observe (myFunc 1 [])
[16,15,14,13,12,11,10]
or even observeMany and observeAll:
observeMany 5 (myFunc 1 []) --returns the first 5 solutions
observeAll (myFunc 1 []) --returns a list of all solutions
Finally, you will need to install the logict package to get the above code to work. Use cabal install logict to install it.
Edit: Answering your question in the comments
Yes, you can do something similar without having to install logict. Although a dedicated backtracking monad makes things less complicated and makes clear what you are trying to do.
To model the logict example above you only need the [] monad
myFunc :: Int -> [Int] -> [[Int]]
myFunc field acc | exitCond field acc = return acc
myFunc field acc = do
  let m1 = do
        f <- nextField field
        guard $ test1 f
        myFunc f acc
      m2 = do
        f <- nextField field
        guard $ test2 f
        myFunc f (f:acc)
    in m1 `mplus` m2
nextField i = return $ i + 1
exitCond i a = i > 15
test1 i = i < 10
test2 i = i < 20
You can run it as follows:
Main> myFunc 1 []
[[16,15,14,13,12,11,10],[16,15,14,13,12,11,10,9],[16,15,14,13,12,11,10,8]...
You can also choose how many solutions you want as before:
Main> head $ myFunc 1 []
[16,15,14,13,12,11,10]
Main> take 3 $ myFunc 1 []
[[16,15,14,13,12,11,10],[16,15,14,13,12,11,10,9],[16,15,14,13,12,11,10,8]]
However, you will need the Cont monad, and thus the ListT monad, to implement a hard cut as in the Prolog example, which was not available in the logict example above:
import Control.Monad.Cont
import Control.Monad.Trans.List
myFunc :: Int -> ListT (Cont [[Int]]) [Int]
myFunc field = callCC $ \exit -> do
  let g field acc | exitCond field acc = exit acc
      g field acc =
        let m1 = do
              f <- nextField field
              guard $ test1 f
              g f acc
            m2 = do
              f <- nextField field
              guard $ test2 f
              g f (f:acc)
        in m1 `mplus` m2
  g field []
Like Prolog, this last example will not backtrack again after exitCond is satisfied:
*Main> runCont (runListT (myFunc 1)) id
[[16,15,14,13,12,11,10]]
Your comment helped clarify things somewhat, but I still have some questions about what you are looking for, so here is an example of using the list monad and guard.
import Control.Monad
myFunc lst = do
  e <- lst
  guard $ even e                  -- only allow even elements
  guard . not $ e `elem` [4,6,8]  -- do not allow 4, 6, or 8
  return e                        -- accumulate results
used in ghci:
> myFunc [1..20]
[2,10,12,14,16,18,20]
I've never programmed in Haskell, so I would need your help there, but I can offer a hint about that Prolog fragment (where I think you have a typo: the last call should be myFunc(Field2, [(X1,Y1)|Acc], Result).): it could be compiled, by hand, into a continuation-passing scheme.
Let's google it (haskell continuation passing prolog). I would look at the Wikipedia page first: under Haskell we find the continuation monad.
Now we can try to translate that Prolog into executable Haskell. Does this make sense?
Textual translation of your code to Haskell is:
myFunc field acc = take 1 $                    -- a cut
    g field acc
  where
    g f a | ending_condition_holds = [a]
    g f a =
      ( nextField f >>= (\f2 ->
          (if test1 ...                        -- test1 a predicate function
             then [()]
             else [] ) >>= (\_ ->
          g f2 a )))
      ++
      ( nextField f >>= (\f2 ->
          test2 ... >>= (\_ ->                 -- test2 producing some results
          g f2 (f2:a) )))