Hakell: 'rememberMap' is non lazy

Hakell: 'rememberMap' is non lazy - haskell

I am a newbie to Haskell and I wrote the following code
module RememberMap (rememberMap) where
rememberMap :: (a -> b -> (a, c)) -> a -> [b] -> [c]
rememberMap f acc xs = go acc xs []
where
go acc [x] carry = carry <> [step]
where
(_, step) = f acc x
go acc (x:xs) carry = go accStep xs (carry <> [step])
where
(accStep, step) = f acc x
I wrote this contaminator with the Intent to help me with the most Common difficulty that i have when writing my Haskell code, That is that I recurrently find myself willing to map something (specially in CodeWarrior's Katas) like to map something, but that something required knowledge of the elements before it. But it had the problem of being non-streaming, ergo, it does no allow me to use lazy proprieties of Haskell with it, thus I would like to know if (a) there is already a solution to this problem (preferably Arrows) or (b) how to make it lazy.

To make the function stream you need to have the cons operator outside the recursive call, so a caller can see the first element without the whole recursion needing to happen. So you expect it to look something like:
rememberMap f acc (x:xs) = element : ... recursion ...
Once you understand this there is not much more to do:
rememberMap _ _ [] = []
rememberMap f acc (x:xs) = y : rememberMap f acc' xs
where
(acc', y) = f acc x
You can make an auxiliary function to avoid passing f around if you want, but there's no reason for it to have the extra list that you called carry.

There are mapAccumL and traverse with the lazy State monad.

Related

How to avoid infinite loop in zipWith a self reference?

I'd like to create a list data structure that can zipWith that has a better behavior with self reference. This is for an esoteric language that will rely on self reference and laziness to be Turing complete using only values (no user functions). I've already created it, called Atlas but it has many built ins, I'd like to reduce that and be able to compile/interpret in Haskell.
The issue is that zipWith checks if either list is empty and returns empty. But in the case that this answer depends on the result of zipWith then it will loop infinitely. Essentially I'd like it to detect this case and have faith that the list won't be empty. Here is an example using DList
import Data.DList
import Data.List (uncons)
zipDL :: (a->b->c) -> DList a -> DList b -> DList c
zipDL f a b = fromList $ zipL f (toList a) (toList b)
zipL :: (a->b->c) -> [a] -> [b] -> [c]
zipL _ [] _ = []
zipL _ _ [] = []
zipL f ~(a:as) ~(b:bs) = f a b : zipL f as bs
a = fromList [5,6,7]
main=print $ dh where
d = zipDL (+) a $ snoc (fromList dt) 0
~(Just (dh,dt)) = uncons $ toList d
This code would sum the list 5,6,7 except for the issue. It can be fixed by removing zipL _ _ [] = [] because then it assumes that the result won't be empty and then it in fact turns out not to be empty. But this is a bad solution because we can't always assume that it is the second list that could have the self reference.
Another way of explaining it is if we talk about the sizes of these list.
The size of zip a b = min (size a) (size b)
So in this example: size d = min (size a) (size d-1+1)
But there in lies the problem, if the size of d is 0, then the size of d = 0, but if size of d is 1 the size is 1, however once the size of d is said to be greater than size of a, then the size would be a, which is a contradiction. But any size 0-a works which means it is undefined.
Essentially I want to detect this case and make the size of d = a.
So far the only thing I have figured out is to make all lists lists of Maybe, and terminate lists with a Nothing value. Then in the application of the zipWith binary function return Nothing if either value is Nothing. You can then take out both of the [] checks in zip, because you can think of all lists as being infinite. Finally to make the summation example work, instead of doing a snoc, do a map, and replace any Nothing value with the snoc value. This works because when checking the second list for Nothing, it can lazily return true, since no value of the second list can be nothing.
Here is that code:
import Data.Maybe
data L a = L (Maybe a) (L a)
nil :: L a
nil = L Nothing nil
fromL :: [a] -> L a
fromL [] = nil
fromL (x:xs) = L (Just x) (fromL xs)
binOpMaybe :: (a->b->c) -> Maybe a -> Maybe b -> Maybe c
binOpMaybe f Nothing _ = Nothing
binOpMaybe f _ Nothing = Nothing
binOpMaybe f (Just a) (Just b) = Just (f a b)
zip2W :: (a->b->c) -> L a -> L b -> L c
zip2W f ~(L a as) ~(L b bs) = L (binOpMaybe f a b) (zip2W f as bs)
unconsL :: L a -> (Maybe a, Maybe (L a))
unconsL ~(L a as) = (a, Just as)
mapOr :: a -> L a -> L a
mapOr v ~(L a as) = L (Just $ fromMaybe v a) $ mapOr v as
main=print $ h
where
a = fromL [4,5,6]
b = zip2W (+) a (mapOr 0 (fromJust t))
(h,t) = unconsL $ b
The downside to this approach is it needs this other operator to map with Just . fromMaybe initialvalue. This is a less intuitive operator than ++. And without it the language could be built entirely on ++ uncons and (:[]) which would be pretty neat.
The other thing I've figured out is in the current ruby implementation to throw an error when a value depends on itself, and catch it in the empty list detection. But this is vary hacky and not entirely sound, although it does work for cases like this. I don't think this can work in Haskell since I don't think you can detect self dependence?
Sorry for the long description and the very odd use case. I've spent tons of time thinking about this, but haven't solved it yet and can't explain it any more succinctly! Not expecting an answer but figured it is worth a shot, thanks for considering.
EDIT:
After seeing it framed as a greatest fixed point question, it seems like a poor question because there is no efficient general solution to such a problem. For example, suppose the code was b = zipWith (+) a (if length b < 1 then [1] else []).
For my purposes it could still be nice to handle some cases correctly - the example provided does have a solution. So I could reframe the question as: when can we find the greatest fixed point efficiently and what is that fixed point? But I believe there is no simple answer to such a question, and so it would be a poor basis for a programming language to rely on ad hoc rules.

Sounds like you want a greatest fixed point. I'm not sure I've seen this done before, but maybe it's possible to make a sensible type class for types that support those.
class GF a where gfix :: (a -> a) -> a
instance GF a => GF [a] where
gfix f = case (f (repeat undefined), f []) of
(_:_, _) -> b:bs where
b = gfix (\a' -> head (f (a':bs)))
bs = gfix (\as' -> tail (f (b:as')))
([], []) -> []
_ -> error "no fixed point greater than bottom exists"
-- use the usual least fixed point. this ain't quite right, but
-- it works for this example, and maybe it's Good Enough
instance GF Int where gfix f = let x = f x in x
Try it out in ghci:
> gfix (\xs -> zipWith (+) [5,6,7] (tail xs ++ [0])) :: [Int]
[18,13,7]
This implementation isn't particularly efficient; e.g. replacing [5,6,7] with [1..n] results in a runtime that's quadratic in n. Perhaps with some cleverness that can be improved, but it's not immediately obvious to me how that would go.

I have an answer for this specific case, not general.
appendRepeat :: a -> [a] -> [a]
appendRepeat v a = h : appendRepeat v t
where
~(h,t) =
if null a
then (v,[])
else (head a,tail a)
a = [4,5,6]
main=print $ head b
where
b = zipWith (+) a $ appendRepeat 0 (tail b)
appendRepeat adds a an infinite list of a repeated value to the end of a list. But the key thing about it is it doesn't check if list is empty or not when deciding that it is returning a non empty list where the tail is a recursive call. This way laziness never ends up in an infinite loop checking the zipWith _ [] case.
So this code works, and for the purposes of the original question, it can be used to convert the language to just using 2 simple functions (++ and :[]). But the interpreter would need to do some static analysis for appending a repeated value and replace it to using this special appendRepeat function (which can easily be done in Atlas). It seems hacky to only make this one implementation switcharoo, but that is all that is needed.

Why does foldr use a helper function?

In explaining foldr to Haskell newbies, the canonical definition is
foldr :: (a -> b -> b) -> b -> [a] -> b
foldr _ z [] = z
foldr f z (x:xs) = f x (foldr f z xs)
But in GHC.Base, foldr is defined as
foldr k z = go
where
go [] = z
go (y:ys) = y `k` go ys
It seems this definition is an optimization for speed, but I don't see why using the helper function go would make it faster. The source comments (see here) mention inlining, but I also don't see how this definition would improve inlining.

I can add some important details about GHC's optimization system.
The naive definition of foldr passes around a function. There's an inherent overhead in calling a function - especially when the function isn't known at compile time. It'd be really nice to able to inline the definition of the function if it's known at compile time.
There are tricks available to perform that inlining in GHC - and this is an example of them. First, foldr needs to be inlined (I'll get to why later). foldr's naive implementation is recursive, so cannot be inlined. So a worker/wrapper transformation is applied to the definition. The worker is recursive, but the wrapper is not. This allows foldr to be inlined, despite the recursion over the structure of the list.
When foldr is inlined, it creates a copy of all of its local bindings, too. It's more or less a direct textual inlining (modulo some renaming, and happening after the desugaring pass). This is where things get interesting. go is a local binding, and the optimizer gets to look inside it. It notices that it calls a function in the local scope, which it names k. GHC will often remove the k variable entirely, and will just replace it with the expression k reduces to. And then afterwards, if the function application is amenable to inlining, it can be inlined at this time - removing the overhead of calling a first-class function entirely.
Let's look at a simple, concrete example. This program will echo a line of input with all trailing 'x' characters removed:
dropR :: Char -> String -> String
dropR x r = if x == 'x' && null r then "" else x : r
main :: IO ()
main = do
s <- getLine
putStrLn $ foldr dropR "" s
First, the optimizer will inline foldr's definition and simplify, resulting in code that looks something like this:
main :: IO ()
main = do
s <- getLine
-- I'm changing the where clause to a let expression for the sake of readability
putStrLn $ let { go [] = ""; go (x:xs) = dropR x (go xs) } in go s
And that's the thing the worker-wrapper transformation allows.. I'm going to skip the remaining steps, but it should be obvious that GHC can now inline the definition of dropR, eliminating the function call overhead. This is where the big performance win comes from.

GHC cannot inline recursive functions, so
foldr :: (a -> b -> b) -> b -> [a] -> b
foldr _ z [] = z
foldr f z (x:xs) = f x (foldr f z xs)
cannot be inlined. But
foldr k z = go
where
go [] = z
go (y:ys) = y `k` go ys
is not a recursive function. It is a non-recursive function with a local recursive definition!
This means that, as #bheklilr writes, in map (foldr (+) 0) the foldr can be inlined and hence f and z replaced by (+) and 0 in the new go, and great things can happen, such as unboxing of the intermediate value.

As the comments say:
-- Inline only in the final stage, after the foldr/cons rule has had a chance
-- Also note that we inline it when it has *two* parameters, which are the
-- ones we are keen about specialising!
In particular, note the "we inline it when it has two parameters, which are the ones we are keen about specialising!"
What this is saying is that when foldr gets inlined, it's getting inlined only for the specific choice of f and z, not for the choice of the list getting folded. I'm not expert, but it would seem it would make it possible to inline it in situations like
map (foldr (+) 0) some_list
so that the inline happens in this line and not after map has been applied. This makes it optimizable in more situations and more easily. All the helper function does is mask the 3rd argument so {-# INLINE #-} can do its thing.

One tiny important detail not mentioned in other answers is that GHC, given a function definition like
f x y z w q = ...
cannot inline f until all of the arguments x, y, z, w, and q are applied. This means that it's often advantageous to use the worker/wrapper transformation to expose a minimal set of function arguments which must be applied before inlining can occur.

writing a recursive function using foldr

I am new in Haskell programming.
While practicing I was asked to make a recursive function that looks like this:
repeat1 5 [1,2,3] = [[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3]]
which is
repeat1 :: Int -> a -> [a]
repeat1 0 x = []
repeat1 num x = x : repeat1 (num-1) x
I want to convert it into a foldr function but I can't :(
I have read about the lambda functions and the folding(foldr and foldl) functions from http://en.wikibooks.org/wiki/Haskell/List_processing
Can anybody help please?
Thanks in advance

foldr is for functions that consume lists. For producing lists, unfoldr is a more natural choice:
repeat1 :: Int -> a -> [a]
repeat1 n x = unfoldr f n
where f 0 = Nothing
f n = Just (x, n-1)
That said, I think writing it as a plain recursion is more clear in this case.

As hammar pointed out, foldr isn't the right tool here, as you first need a list to work on. Why not simply...
repeat1 n = take n . repeat

If you really want to use foldr, you could do something like that:
repeat' n x = foldr (\_ acc -> x:acc) [] [1..n]
You basically create a list of size n with [1..n] and for each element of that list, you append x to your accumulator (base value []). In the end you have a n-elements list of x.

How can this function be written using foldr?

I have this simple function which returns a list of pairs with the adjacents elements of a list.
adjacents :: [a] -> [(a,a)]
adjacents (x:y:xs) = [(x,y)] ++ adjacents (y:xs)
adjacents (x:xs) = []
I'm having problems trying to write adjacents using foldr. I've done some research but nothing seems to give me a hint. How can it be done?

Tricky folds like this one can often be solved by having the fold build up a function rather than try to build the result directly.
adjacents :: [a] -> [(a, a)]
adjacents xs = foldr f (const []) xs Nothing
where f curr g (Just prev) = (prev, curr) : g (Just curr)
f curr g Nothing = g (Just curr)
Here, the idea is to let the result be a function of type Maybe a -> [(a, a)] where the Maybe contains the previous element, or Nothing if we're at the beginning of the list.
Let's take a closer look at what's going on here:
If we have both a previous and a current element, we make a pair and pass the current element to the result of the recursion, which is the function which will generate the tail of the list.
f curr g (Just prev) = (prev, curr) : g (Just curr)
If there is no previous element, we just pass the current one to the next step.
f curr g Nothing = g (Just curr)
The base case const [] at the end of the list just ignores the previous element.
By doing it this way, the result is as lazy as your original definition:
> adjacents (1 : 2 : 3 : 4 : undefined)
[(1,2),(2,3),(3,4)*** Exception: Prelude.undefined

I don't think your function is a good fit for a fold, because it looks at two elements rather than one.
I think the best solution to the problem is
adjacents [] = []
adjacents xs = zip xs (tail xs)
But we can shoehorn it into a travesty of a fold if you like. First an auxilliary function.
prependPair :: a -> [(a,a)] -> [(a,a)]
prependPair x [] = [(x,b)] where b = error "I don't need this value."
prependPair x ((y,z):ys) = ((x,y):(y,z):ys)
adjacents' xs = init $ foldr prependPair [] xs
I feel like I've cheated slightly by making and throwing
away the last element with the error value, but hey ho, I already said I don't think
foldr is a good way of doing this, so I guess this hack is an example of it not being a fold.

You can also try unfoldr instead of foldr.
import Data.List
adjacents xs = unfoldr f xs where
f (x:rest#(y:_)) = Just ((x,y), rest)
f _ = Nothing

Is my rewritten foldl function optimised?

I just started Haskell 2 days ago so I'm not yet sure about how to optimise my code.
As an exercise, I have rewritten foldl and foldr ( I will give foldl here but foldr is the same, replacing last with head and init with tail).
The code is:
module Main where
myFoldl :: ( a -> ( b -> a ) ) -> a -> ( [b] -> a )
myFoldl func = ( \x -> (\theList
-> if (length theList == 0) then
x
else
myFoldl func (func x (last theList) ) (init theList)
) )
My only concern is that I suspect Haskell can't apply tail call optimisation here because the recursive call is not made at the end of the function.
How can I make this tail call optimised? Is Haskell's built-in implementation of foldl implemented differently to mine?

Your use of parentheses in your code sample and your emphasis on tail recursion suggests you're coming to Haskell from Lisp or Scheme. If you're coming to Haskell from an eager language like Scheme, be warned: tail calls are not nearly as predictive of performance in Haskell as they are in an eager language. You can have tail-recursive functions that execute in linear space because of laziness, and you can have non-tail recursive functions that execute in constant space because of laziness. (Confused already?)
First flaw in your definition is the use of the length theList == 0. This forces evaluation of the whole spine of the list, and is O(n) time. It's better to use pattern matching, like in this naïve foldl definition in Haskell:
foldl :: (b -> a -> b) -> b -> [a] -> b
foldl f z [] = z
foldl f z (x:xs) = foldl f (f z x) xs
This, however, performs infamously badly in Haskell, because we don't actually compute the f z x part until the caller of foldl demands the result; so this foldl accumulates unevaluated thunks in memory for each list element, and gains no benefit from being tail recursive. In fact, despite being tail-recursive, this naïve foldl over a long list can lead to a stack overflow! (The Data.List module has a foldl' function that doesn't have this problem.)
As a converse to this, many Haskell non-tail recursive functions perform very well. For example, take this definition of find, based on the accompanying non-tail recursive definition of foldr:
find :: (a -> Boolean) -> [a] -> Maybe a
find pred xs = foldr find' Nothing xs
where find' elem rest = if pred elem then Just elem else rest
foldr :: (a -> b -> b) -> b -> [a] -> b
foldr f z [] = z
foldr f z (x:xs) = f x (subfold xs)
where subfold = foldr f z
This actually executes in linear time and constant space, thanks to laziness. Why?
Once you find an element that satisfies the predicate, there is no need to traverse the rest of the list to compute the value of rest.
Once you look at an element and decide that it doesn't match, there's no need to keep any data about that element.
The lesson I'd impart right now is: don't bring in your performance assumptions from eager languages into Haskell. You're just two days in; concentrate first on understanding the syntax and semantics of the language, and don't contort yourself into writing optimized versions of functions just yet. You're going to get hit with the foldl-style stack overflow from time to time at first, but you'll master it in time.
EDIT [9/5/2012]: Simpler demonstration that lazy find runs in constant space despite not being tail recursive. First, simplified definitions:
foldr :: (a -> b -> b) -> b -> [a] -> b
foldr f z [] = z
foldr f z (x:xs) = f x (foldr f z xs)
find :: (a -> Bool) -> [a] -> Maybe a
find p xs = let step x rest = if p x then Just x else rest
in foldr step Nothing xs
Now, using equational reasoning (i.e., substituting equals with equals, based on the definition above), and evaluating in a lazy order (outermost first), let's calculate find (==400) [1..]:
find (==400) [1..]
-- Definition of `find`:
=> let step x rest = if x == 400 then Just x else rest
in foldr step Nothing [1..]
-- `[x, y, ...]` is the same as `x:[y, ...]`:
=> let step x rest = if x == 400 then Just x else rest
in foldr step Nothing (1:[2..])
-- Using the second equation in the definition of `foldr`:
=> let step x rest = if x == 400 then Just x else rest
in step 1 (foldr step Nothing [2..])
-- Applying `step`:
=> let step x rest = if x == 400 then Just x else rest
in if 1 == 400 then Just 1 else foldr step Nothing [2..]
-- `1 == 400` is `False`
=> let step x rest = if x == 400 then Just x else rest
in if False then Just 1 else foldr step Nothing [2..]
-- `if False then a else b` is the same as `b`
=> let step x rest = if x == 400 then Just x else rest
in foldr step Nothing [2..]
-- Repeat the same reasoning steps as above
=> let step x rest = if x == 400 then Just x else rest
in foldr step Nothing (2:[3..])
=> let step x rest = if x == 400 then Just x else rest
in step 2 (foldr step Nothing [3..])
=> let step x rest = if x == 400 then Just x else rest
in if 2 == 400 then Just 2 else foldr step Nothing [3..]
=> let step x rest = if x == 400 then Just x else rest
in if False then Just 2 else foldr step Nothing [3..]
=> let step x rest = if x == 400 then Just x else rest
in foldr step Nothing [3..]
.
.
.
=> let step x rest = if x == 400 then Just x else rest
in foldr step Nothing [400..]
=> let step x rest = if x == 400 then Just x else rest
in foldr step Nothing (400:[401..])
=> let step x rest = if x == 400 then Just x else rest
in step 400 (foldr step Nothing [401..])
=> let step x rest = if x == 400 then Just x else rest
in if 400 == 400 then Just 400 else foldr step Nothing [401..]
=> let step x rest = if x == 400 then Just x else rest
in if True then Just 400 else foldr step Nothing [401..]
-- `if True then a else b` is the same as `a`
=> let step x rest = if x == 400 then Just x else rest
in Just 400
-- We can eliminate the `let ... in ...` here:
=> Just 400
Note that the expressions in the successive evaluation steps don't get progressively more complex or longer as we proceed through the list; the length or depth of the expression at step n is not proportional to n, it's basically fixed. This in fact demonstrates how find (==400) [1..] can be lazily executed in constant space.

Idiomatic Haskell looks very different to this, eschewing if-then-else, nested lambdas, parentheses, and destructuring functions (head, tail). Instead, you'd write it something like:
foldl :: (a -> b -> a) -> a -> [b] -> a
foldl f z0 xs0 = go z0 xs0
where
go z [] = z
go z (x:xs) = go (f z x) xs
Relying instead on pattern matching, a where clause, tail recursion, guarded declarations.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

Hakell: 'rememberMap' is non lazy - haskell

There are mapAccumL and traverse with the lazy State monad.

Related

How to avoid infinite loop in zipWith a self reference?

Why does foldr use a helper function?

writing a recursive function using foldr

How can this function be written using foldr?

Is my rewritten foldl function optimised?

Categories

Resources