Haskell cyclic structure [duplicate]
I found this statement while studying Functional Reactive Programming, in "Plugging a Space Leak with an Arrow" by Hai Liu and Paul Hudak (page 5):
Suppose we wish to define a function that repeats its argument indefinitely:
repeat x = x : repeat x
or, in lambdas:
repeat = λx → x : repeat x
This requires O(n) space. But we can achieve O(1) space by writing instead:
repeat = λx → let xs = x : xs
in xs
The difference here seems small, but it hugely improves space efficiency. Why and how does this happen? My best guess is to evaluate them by hand:
r = \x -> x: r x
r 3
-> 3: r 3
-> 3: 3: 3: ........
-> [3,3,3,......]
As above, we need to create infinitely many new thunks for this recursion. Then I try to evaluate the second one:
r = \x -> let xs = x:xs in xs
r 3
-> let xs = 3:xs in xs
-> xs, according to the definition above:
-> 3:xs, where xs = 3:xs
-> 3:3:xs, where xs = 3:xs (the same xs again)
In the second form, the single xs is shared among all the places where it occurs, so I guess that's why it requires only O(1) space rather than O(n). But I'm not sure whether I'm right or not.
BTW: the keyword "shared" comes from page 4 of the same paper:
The problem here is that the standard call-by-need evaluation rules
are unable to recognize that the function:
f = λdt → integralC (1 + dt) (f dt)
is the same as:
f = λdt → let x = integralC (1 + dt) x in x
The former definition causes work to be repeated in the recursive call
to f, whereas in the latter case the computation is shared.
It's easiest to understand by picturing the heap:
The first version
repeat x = x : repeat x
creates a chain of (:) constructors ending in a thunk which will replace itself with more constructors as you demand them. Thus, O(n) space.
The second version
repeat x = let xs = x : xs in xs
uses let to "tie the knot", creating a single (:) constructor which refers to itself.
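The same knot-tying trick is not specific to repeat. For instance, cycle can be written this way too (a minimal sketch; cycle' is my name, and like the real cycle it diverges on []):

cycle' :: [a] -> [a]
cycle' xs = let ys = xs ++ ys in ys  -- the tail of ys eventually points back to ys itself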
Put simply, variables are shared, but function applications are not. In
repeat x = x : repeat x
it is a coincidence (from the language's perspective) that the (co)recursive call to repeat has the same argument. So, without an additional optimization (known as the static argument transformation), the function will be called again and again.
But when you write
repeat x = let xs = x : xs in xs
there are no recursive function calls. You take an x, and construct a cyclic value xs using it. All sharing is explicit.
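To illustrate, the static argument transformation would rewrite repeat into something like this (my sketch, not actual GHC output): once the argument x no longer varies between calls, the recursion becomes a self-referential value and sharing falls out.

repeatSAT :: a -> [a]
repeatSAT x = go
  where go = x : go  -- go is a value binding now, so the cons cell is shared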
If you want to understand it more formally, you need to familiarize yourself with the semantics of lazy evaluation, such as A Natural Semantics for Lazy Evaluation.
Your intuition about xs being shared is correct. To restate the authors' example in terms of repeat instead of integralC: when you write
repeat x = x : repeat x
the language does not recognize that the repeat x on the right is the same as the value produced by the expression x : repeat x. Whereas if you write
repeat x = let xs = x : xs in xs
you're explicitly creating a structure that when evaluated looks like this:
{ hd: x, tl: * }
  ^          |
  +----------+
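If you want to see the difference for yourself, here is a small experiment (repeatApp and repeatKnot are my names); compiled without optimizations and run with +RTS -s, the first should show total allocation growing with the index, the second essentially none:

repeatApp, repeatKnot :: a -> [a]
repeatApp x = x : repeatApp x           -- a fresh application, and a fresh cons cell, per element
repeatKnot x = let xs = x : xs in xs    -- a single cons cell whose tail points to itself

main :: IO ()
main = print (repeatKnot (3 :: Int) !! 10000000)  -- swap in repeatApp to compare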
Related
How to create a Infinite List in Haskell where the new value consumes all the previous values
If I create an infinite list like this:

let t xs = xs ++ [sum xs]
let xs = [1,2] : map t xs
take 10 xs

I will get this result:

[ [1,2],
  [1,2,3],
  [1,2,3,6],
  [1,2,3,6,12],
  [1,2,3,6,12,24],
  [1,2,3,6,12,24,48],
  [1,2,3,6,12,24,48,96],
  [1,2,3,6,12,24,48,96,192],
  [1,2,3,6,12,24,48,96,192,384],
  [1,2,3,6,12,24,48,96,192,384,768] ]

This is pretty close to what I am trying to do. The current code uses the last value to define the next. But, instead of a list of lists, I would like an infinite list that uses all the previous values to define the new one, so the output would be only

[1,2,3,6,12,24,48,96,192,384,768,1536,...]

I have the definition of the first element, [1]. I have the rule for getting a new element: sum all the previous elements. But I could not express this in Haskell to create the infinite list. Using my current code, I can take the list that I need with:

xs !! 10
> [1,2,3,6,12,24,48,96,192,384,768,1536]

But it seems to me that it should be possible to do this in a more efficient way.

Some notes: I understand that, for this particular example, which was intentionally oversimplified, we could create a function that uses only the last value to define the next. But I am asking whether it is possible to read all the previous values in an infinite list definition. I am sorry if the example created some confusion. Here is another example, which cannot be fixed by reading only the last value:

isMultipleByList :: Integer -> [Integer] -> Bool
isMultipleByList _ [] = False
isMultipleByList v (x:xs) = if mod v x == 0
                            then True
                            else isMultipleByList v xs

nextNotMultipleLoop :: Integer -> Integer -> [Integer] -> Integer
nextNotMultipleLoop step v xs = if not (isMultipleByList v xs)
                                then v
                                else nextNotMultipleLoop step (v + step) xs

nextNotMultiple :: [Integer] -> Integer
nextNotMultiple xs = if xs == [2]
                     then nextNotMultipleLoop 1 (maximum xs) xs
                     else nextNotMultipleLoop 2 (maximum xs) xs

addNextNotMultiple xs = xs ++ [nextNotMultiple xs]
infinitePrimeList = [2] : map addNextNotMultiple infinitePrimeList

take 10 infinitePrimeList
[ [2,3],
  [2,3,5],
  [2,3,5,7],
  [2,3,5,7,11],
  [2,3,5,7,11,13],
  [2,3,5,7,11,13,17],
  [2,3,5,7,11,13,17,19],
  [2,3,5,7,11,13,17,19,23],
  [2,3,5,7,11,13,17,19,23,29],
  [2,3,5,7,11,13,17,19,23,29,31] ]

infinitePrimeList !! 10
[2,3,5,7,11,13,17,19,23,29,31,37]
You can think of it like this: you want to create a list (call it a) which starts with [1,2]:

a = [1,2] ++ ???

and which has this property: each next element of a is the sum of all previous elements of a. So you can write

scanl1 (+) a

and get a new list of running sums, in which the element at (1-based) index n is the sum of the first n elements of a; it is [1, 3, 6, ...]. All you need is to take every element except the first:

tail (scanl1 (+) a)

So you can define a as:

a = [1,2] ++ tail (scanl1 (+) a)

You can apply this way of thinking to other, similar problems of defining a list in terms of its own elements.
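Putting that together into a runnable sketch:

a :: [Integer]
a = [1,2] ++ tail (scanl1 (+) a)

main :: IO ()
main = print (take 12 a)
-- [1,2,3,6,12,24,48,96,192,384,768,1536]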
If we already had the final result, calculating the list of previous elements for a given element would be easy: a simple application of the inits function. So let's assume we already have the final result xs, and use it to compute xs itself:

import Data.List (inits)

main :: IO ()
main = do
  let is = drop 2 $ inits xs
      xs = 1 : 2 : map sum is
  print $ take 10 xs

This produces the list

[1,2,3,6,12,24,48,96,192,384]

(Note: this is less efficient than SergeyKuz1001's solution, because the sum is re-calculated each time.)
unfoldr has quite nice flexibility for adapting to various "create a list from initial conditions" problems, so I think it is worth mentioning. It is a little less elegant for this specific case, but it shows how unfoldr can be used.

import Data.List

nextVal as = Just (s, as ++ [s])
  where s = sum as

initList = [1,2]
myList = initList ++ unfoldr nextVal initList

main = putStrLn . show . take 12 $ myList

This yields

[1,2,3,6,12,24,48,96,192,384,768,1536]

in the end. As pointed out in the comments, one should think a little when using unfoldr. The way I've written it above, the code mimics the code in the original question. However, this means that the accumulator is updated with as ++ [s], thus constructing a new list at every iteration. A quick run at https://repl.it/languages/haskell suggests it becomes quite memory-intensive and slow (4.5 seconds to access the 2000th element of myList). Simply swapping the accumulator update to s : as produced a seven-fold speed increase, since the same list can be reused as the accumulator in every step. However, the accumulator list is then in reverse order, so one needs to think a little. For sum this makes no difference, but if the order of the list matters, one must think a little bit extra.
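For reference, here is a sketch of the faster reversed-accumulator variant described above (my adaptation of the answer's code):

import Data.List (unfoldr)

nextVal :: [Integer] -> Maybe (Integer, [Integer])
nextVal as = Just (s, s : as)  -- prepend in O(1); the accumulator is now reversed
  where s = sum as             -- sum ignores order, so the reversal is harmless here

myList :: [Integer]
myList = [1,2] ++ unfoldr nextVal [2,1]  -- the accumulator starts as reverse [1,2]

main :: IO ()
main = print (take 12 myList)
-- [1,2,3,6,12,24,48,96,192,384,768,1536]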
You could define it like this:

xs = 1 : 2 : iterate (*2) 3

For example:

Prelude> take 12 xs
[1,2,3,6,12,24,48,96,192,384,768,1536]
So here's my take. I tried not to create O(n) extra lists.

import Data.List (genericLength)

explode :: Integral i => (i -> [a] -> a) -> [a] -> [a]
explode fn init = as
  where as = init ++ [fn i as | i <- [l, l+1 ..]]
        l = genericLength init

This convenience function does create additional lists (via take). Hopefully they can be optimised away by the compiler.

explode' f = explode (\x as -> f $ take x as)

Usage examples:

myList = explode' sum [1,2]

sum' 0 xs = 0
sum' n (x:xs) = x + sum' (n-1) xs

myList2 = explode sum' [1,2]

In my tests there's little performance difference between the two functions; explode' is often slightly better.
The solution from @LudvigH is very nice and clear, but it was not faster. I am still working on the benchmark to compare the other options. For now, this is the best solution that I could find:

-- infinite sum of the previous, using fuse
recursiveSum xs = [nextValue] ++ recursiveSum nextList
  where nextValue = sum xs
        nextList = xs ++ [nextValue]

initialSumValues = [1]
infiniteSumFuse = initialSumValues ++ recursiveSum initialSumValues

-- infinite prime list, using fuse:
-- calculate the next value based on the current list,
-- then call the same function with the new combined list
recursivePrimeList xs = [nextValue] ++ recursivePrimeList nextList
  where nextValue = nextNotMultiple xs
        nextList = xs ++ [nextValue]

initialPrimes = [2]
infiniteFusePrimeList = initialPrimes ++ recursivePrimeList initialPrimes

This approach is fast and makes good use of multiple cores. Maybe there is a faster solution, but I decided to post this to share my current progress on the subject so far.
In general, define

xs = x1 : zipWith f xs (inits xs)

Then

xs == x1 : f x1 [] : f x2 [x1] : f x3 [x1, x2] : ...

and so on. Here's one example of using inits in the context of computing the infinite list of primes, which pairs them up as

ps = 2 : f p1 [p1] : f p2 [p1,p2] : f p3 [p1,p2,p3] : ...

(in the definition of primes5 there).
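As a concrete sketch, here is that scheme instantiated with the question's "sum of all previous" rule (seeded with a single 1 rather than [1,2], which keeps the pattern clean):

import Data.List (inits)

sums :: [Integer]
sums = 1 : zipWith (\x prev -> x + sum prev) sums (inits sums)
-- x + sum prev is the sum of everything up to and including x

main :: IO ()
main = print (take 10 sums)
-- [1,1,2,4,8,16,32,64,128,256]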
Is it possible to define patterns of composition in functions or functors?
Consider the following situation. I define a function to process a list of elements in the typical way: do an operation on the head and recurse over the rest of the list. But under a certain condition on the element (being negative, being a special character, ...) I change the sign of the rest of the list before continuing. Like this:

f [] = []
f (x : xs) | x >= 0    = g x : f xs
           | otherwise = h x : f (opposite xs)

opposite [] = []
opposite (y : ys) = negate y : opposite ys

Since opposite (opposite xs) = xs, I end up with redundant opposite operations, accumulating opposite . opposite . opposite .... This happens with other operations instead of opposite too: any operation whose composition with itself is the identity, like reverse. Is it possible to overcome this situation using functors / monads / applicatives / arrows? (I don't understand those concepts well.) What I would like is to be able to state a property, or a composition pattern, like

opposite . opposite = id -- or, opposite (opposite y) = y

so that the compiler or interpreter avoids calculating the opposite of the opposite (this is possible, and simple (native), in some concatenative languages).
You can solve this without any monads, since the logic is quite simple:

f g h = go False
  where
    go _ [] = []
    go b (x':xs) | x >= 0    = g x : go b xs
                 | otherwise = h x : go (not b) xs
      where x = (if b then negate else id) x'

The body of the go function is almost identical to that of your original f. The only difference is that go decides whether the element should be negated or not based on the boolean value passed to it from previous calls.
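A quick sanity check of this go-based f against the original f-plus-opposite version, with hypothetical g = (*2) and h = abs (and f as defined just above in scope):

main :: IO ()
main = print (f (*2) abs [1, -2, 3, -4, 5])
-- [2,2,3,4,5], the same output the question's f would produce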
Sure, just keep a bit of state telling whether to apply negate to the current element or not (gets and modify come from Control.Monad.State). Thus:

f = mapM $ \x_ -> do
  x <- gets (\b -> if b then x_ else negate x_)
  if x >= 0
    then return (g x)
    else modify not >> return (h x)
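For completeness, a runnable sketch of the above (my adaptation: g and h become parameters, and the state starts as True, meaning "take the element as-is"):

import Control.Monad.State (evalState, gets, modify)

f :: (Int -> b) -> (Int -> b) -> [Int] -> [b]
f g h xs = evalState (mapM step xs) True
  where
    step x_ = do
      x <- gets (\b -> if b then x_ else negate x_)
      if x >= 0
        then return (g x)
        else modify not >> return (h x)

main :: IO ()
main = print (f (*2) abs [1, -2, 3, -4, 5])
-- [2,2,3,4,5], matching the non-monadic version above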
Why is this tail-recursive Haskell function slower?
I was trying to implement a Haskell function that takes as input an array of integers A and produces another array B = [A[0], A[0]+A[1], A[0]+A[1]+A[2], ...]. I know that scanl from Data.List can be used for this with the function (+). I wrote the second implementation (which performs faster) after seeing the source code of scanl. Why is the first implementation slower than the second one, despite being tail-recursive?

-- This function works slowly.
ps s x [] = x
ps s x y = ps s' x' y'
  where s' = s + head y
        x' = x ++ [s']
        y' = tail y

-- This function works fast.
ps' s [] = []
ps' s y = [s'] ++ ps' s' y'
  where s' = s + head y
        y' = tail y

Some details about the above code:

Implementation 1: it should be called as ps 0 [] a, where a is your array.
Implementation 2: it should be called as ps' 0 a, where a is your array.
You are changing the way that ++ associates. In your first function you are computing ((([a0] ++ [a1]) ++ [a2]) ++ ...), whereas in the second function you are computing [a0] ++ ([a1] ++ ([a2] ++ ...)). Appending a few elements to the start of a list is O(1), whereas appending a few elements to the end of a list is O(n) in the length of the list. This leads to a linear versus quadratic algorithm overall.

You can fix the first example by building the list up in reverse order and then reversing it at the end, or by using something like dlist. However, the second will still be better for most purposes. While tail calls do exist and can be important in Haskell, if you are familiar with a strict functional language like Scheme or ML, your intuition about how and when to use them is completely wrong.

The second example is also better, in large part, because it's incremental: it immediately starts returning data that the consumer might be interested in. If you just fixed the first example using the double-reverse or dlist tricks, your function would traverse the entire input before it returned anything at all.
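For the record, here is a sketch of the "build in reverse, then reverse at the end" fix mentioned above (psRev is my name for it); each prepend is O(1), so the whole thing is linear:

psRev :: Num a => a -> [a] -> [a]
psRev s0 = reverse . go s0 []
  where
    go _ acc []     = acc
    go s acc (y:ys) = let s' = s + y in go s' (s' : acc) ys

main :: IO ()
main = print (psRev 0 [1, 2, 3, 4])
-- [1,3,6,10]

Note that, as said above, psRev still has to consume the entire input before returning anything, unlike ps'.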
I would like to mention that your function can be more easily expressed as

drop 1 . scanl (+) 0

Usually, it is a good idea to use predefined combinators like scanl instead of writing your own recursion schemes; it improves readability and makes it less likely that you needlessly squander performance.

However, in this case, both my scanl version and your original ps and ps' can sometimes lead to stack overflows due to lazy evaluation: Haskell does not necessarily evaluate the additions immediately (this depends on strictness analysis). One case where you can see this is

last (ps' 0 [1..100000000])

which leads to a stack overflow. You can solve the problem by forcing Haskell to evaluate the additions immediately, for instance by defining your own, strict scanl:

myscanl :: (b -> a -> b) -> b -> [a] -> [b]
myscanl f q [] = []
myscanl f q (x:xs) = q `seq` let q' = f q x in q' : myscanl f q' xs

ps' = myscanl (+) 0

Then calling last (ps' [1..100000000]) works.
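As an aside, modern versions of Data.List export a strict scanl' that forces the accumulator as each element is produced, so the same idea can be written without a hand-rolled loop (a sketch; it should avoid the overflow for the same reason as myscanl):

import Data.List (scanl')

ps'' :: Num a => [a] -> [a]
ps'' = drop 1 . scanl' (+) 0

main :: IO ()
main = print (last (ps'' [1..100000000 :: Int]))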
Avoid pattern matching in recursion
Consider this code, which I used to solve Euler Problem 58:

diagNums = go skips 2
  where go (s:skips) x = let x' = x+s in x' : go skips (x'+1)

squareDiagDeltas = go diagNums
  where go xs = let (h,r) = splitAt 4 xs in h : go r

I don't like the pattern matching in the second function; it looks more complicated than necessary. This is something that arises pretty frequently for me. Here, splitAt returns a tuple, so I have to destructure it before I can recurse. The same pattern arises, perhaps even more annoyingly, when my recursion itself returns a tuple I want to modify. Consider:

f n = go [1..n]
  where go [] = (0,0)
        go (x:xs) = let (y,z) = go xs in (y+x, z-x)

compared to the nice and simple recursion:

f n = go [1..n]
  where go [] = 0
        go (x:xs) = x + go xs

Of course the functions here are pure nonsense and could be written in a wholly different and better way. But my point is that the need for pattern matching arises every time I need to thread more than one value back through the recursion. Are there any ways to avoid this, perhaps by using Applicative or anything similar? Or would you consider this style idiomatic?
First of all, that style is actually rather idiomatic. Since you're doing two things to two different values, there is some irreducible complexity; the actual pattern match does not introduce much on its own. Besides, I personally find the explicit style very readable most of the time.

However, there is an alternative. Control.Arrow has a bunch of functions for working with tuples. Since the function arrow -> is an Arrow as well, they all work for normal functions. So you could rewrite your second example using (***) to combine two functions to work over tuples. This operator has the following type:

(***) :: a b c -> a b' c' -> a (b, b') (c, c')

If we replace a with ->, we get:

(***) :: (b -> c) -> (b' -> c') -> ((b, b') -> (c, c'))

So you could combine (+ x) and (- x) into a single function with (+ x) *** (- x). This would be equivalent to:

\ (a, b) -> (a + x, b - x)

Then you could use it in your recursion. Unfortunately, the - operator is stupid and doesn't work in sections, so you would have to write it with a lambda:

(+ x) *** (\ a -> a - x) $ go xs

You can obviously imagine using any other operator, all of which aren't quite as stupid :).

Honestly, I think this version is less readable than the original. However, in other cases the *** version can be more readable, so it's useful to know about. In particular, if you were passing (+ x) *** (- x) into a higher-order function instead of applying it immediately, I think the *** version would be better than an explicit lambda.
I agree with Tikhon Jelvis that there is nothing wrong with your version. Like he said, using combinators from Control.Arrow can be useful with higher-order functions. You can write f using a fold:

f n = foldr (\x -> (+ x) *** subtract x) (0,0) [1..n]

And if you really want to get rid of the let in squareDiagDeltas (I'm not sure I would), you can use second, because you are only modifying the second element of the tuple:

squareDiagDeltas = go diagNums
  where go = uncurry (:) . second go . splitAt 4
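To see that point-free go in isolation, here is the same chunking trick as a standalone sketch (chunksOf4 is my name); laziness makes it safe on infinite lists:

import Control.Arrow (second)

chunksOf4 :: [a] -> [[a]]
chunksOf4 = uncurry (:) . second chunksOf4 . splitAt 4

main :: IO ()
main = print (take 3 (chunksOf4 [1..]))
-- [[1,2,3,4],[5,6,7,8],[9,10,11,12]]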
I agree with hammar, unfoldr is the way to go here. You can also get rid of the pattern matching in diagNums:

diagNums = go skips 2
  where go (s:skips) x = let x' = x+s in x' : go skips (x'+1)

The recursion makes it a little difficult to tell what's going on, so let's examine it in depth. Suppose skips = s0 : s1 : s2 : s3 : ...; then we have:

diagNums = go skips 2
         = go (s0 : s1 : s2 : s3 : ...) 2
         = s0+2 : go (s1 : s2 : s3 : ...) (s0+3)
         = s0+2 : s0+s1+3 : go (s2 : s3 : ...) (s0+s1+4)
         = s0+2 : s0+s1+3 : s0+s1+s2+4 : go (s3 : ...) (s0+s1+s2+5)
         = s0+2 : s0+s1+3 : s0+s1+s2+4 : s0+s1+s2+s3+5 : go (...) (s0+s1+s2+s3+6)

This makes it much clearer what's going on: we've got the sum of two sequences, which is easy to compute using zipWith (+):

diagNums = zipWith (+) [2,3,4,5,...] [s0, s0+s1, s0+s1+s2, s0+s1+s2+s3, ...]

So now we just need a better way to compute the partial sums of skips, which is a great use for scanl1:

scanl1 (+) skips = s0 : s0+s1 : s0+s1+s2 : s0+s1+s2+s3 : ...

leaving an (IMO) much easier to understand definition of diagNums:

diagNums = zipWith (+) [2..] $ scanl1 (+) skips
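A quick equivalence check between the original definition and the zipWith/scanl1 version (skips here is just a placeholder; in Euler 58 it comes from the spiral layout):

skips :: [Integer]
skips = [2,2..]  -- placeholder; any infinite list will do for the check

diagNumsOrig, diagNumsNew :: [Integer]
diagNumsOrig = go skips 2
  where go (s:rest) x = let x' = x + s in x' : go rest (x' + 1)

diagNumsNew = zipWith (+) [2..] (scanl1 (+) skips)

main :: IO ()
main = print (take 8 diagNumsOrig == take 8 diagNumsNew)
-- True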
Does Haskell allow a let expression for multiple pattern matchings?
Let's say I have a function which does some computation, with several patterns, implemented in the form of pattern matching. Most of these patterns do (along with other things, different from one to another) a treatment on a parameter, for which I use an intermediary variable in a let expression. But I find it really redundant to have the same let in many patterns, and I wonder if there is a way to define one let for several patterns. Here is an example of my duplicated let:

data MyType a = Something a | Another Int [a]

myFunc (Something x) = -- return something, this isn't the point here
myFunc (Another 0 xs) =
  let intermediary = some $ treatment xs
  in doSthg intermediary 1
myFunc (Another 1 (x:xs)) =
  let intermediary = some $ treatment xs
  in doSthg1 intermediary 1 x
myFunc (Another 2 (x:x':xs)) =
  let intermediary = some $ treatment xs
  in doSthg2 intermediary 2 x x'

You can see that the parameter xs is always present when I use it for intermediary, and this could be factored out. It could easily be achieved with a helper function, but I was wondering whether what I am asking is possible without one. Please try to keep it simple for a beginner; I hope my example is clear enough.
This particular problem can be worked around as follows:

myFunc2 (Something x) = returnSomething x
myFunc2 (Another n ys) =
  let xs = drop n ys
      x = head ys
      x' = head (tail ys)
      intermediate = some $ treatment xs
  in case n of
       0 -> doSomething intermediate n
       1 -> doSomething1 intermediate n x
       2 -> doSomething2 intermediate n x x'

Thanks to lazy evaluation, x and x' will only be evaluated if their values are needed. However - and this is a big however! - your code will give a runtime error when you try to call myFunc2 (Another 2 []) (if doSomething2 actually uses x!), because to find out what x is, we need to evaluate head ys - and that'll crash for an empty list.

The code you gave as an example also won't work (another runtime error) for Another 2 [], since there's no matching pattern, but there it's easier to supply a fall-back case. This might not be a problem if you control the input and always make sure that the list in Another is long enough, but it's important to be aware of the issue!
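If you want to keep a single shared let but avoid the partial head and tail calls, one option is to match the list shape only in the branches that need it (a sketch reusing the question's hypothetical doSthg functions, plus an explicit fall-back):

myFunc3 (Something x) = returnSomething x
myFunc3 (Another n ys) =
  let intermediary = some $ treatment (drop n ys)
  in case (n, ys) of
       (0, _)      -> doSthg  intermediary 0
       (1, x:_)    -> doSthg1 intermediary 1 x
       (2, x:x':_) -> doSthg2 intermediary 2 x x'
       _           -> error "Another: list too short for n"

Here a too-short list fails with an explicit error instead of an obscure head crash, and x and x' only ever come from successful pattern matches.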