How to avoid infinite loop in zipWith a self reference? - haskell

I'd like to create a list data structure whose zipWith behaves better under self reference. This is for an esoteric language that will rely on self reference and laziness to be Turing complete using only values (no user functions). I've already created it (it is called Atlas), but it has many built-ins; I'd like to reduce that number and be able to compile or interpret it in Haskell.
The issue is that zipWith checks if either list is empty and returns empty. But in the case that this answer depends on the result of zipWith then it will loop infinitely. Essentially I'd like it to detect this case and have faith that the list won't be empty. Here is an example using DList
import Data.DList
import Data.List (uncons)
zipDL :: (a->b->c) -> DList a -> DList b -> DList c
zipDL f a b = fromList $ zipL f (toList a) (toList b)
zipL :: (a->b->c) -> [a] -> [b] -> [c]
zipL _ [] _ = []
zipL _ _ [] = []
zipL f ~(a:as) ~(b:bs) = f a b : zipL f as bs
a = fromList [5,6,7]
main = print $ dh where
  d = zipDL (+) a $ snoc (fromList dt) 0
  ~(Just (dh,dt)) = uncons $ toList d
This code would sum the list 5,6,7 except for the issue. It can be fixed by removing zipL _ _ [] = [] because then it assumes that the result won't be empty and then it in fact turns out not to be empty. But this is a bad solution because we can't always assume that it is the second list that could have the self reference.
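To make the point about dropping one empty check concrete, here is a minimal sketch on plain lists rather than DList (a simplification assumed here for brevity; zipSecondLazy, inputs and sums are hypothetical names). Only the first list decides when the result ends:

```haskell
-- Like zipWith, but never inspects the second list to decide whether to stop.
-- Assumption: the second list is at least as long as the first.
zipSecondLazy :: (a -> b -> c) -> [a] -> [b] -> [c]
zipSecondLazy _ [] _ = []
zipSecondLazy f (x:xs) ~(y:ys) = f x y : zipSecondLazy f xs ys

inputs :: [Int]
inputs = [5, 6, 7]

-- Self-referential definition: each element is the sum of a suffix of inputs.
sums :: [Int]
sums = zipSecondLazy (+) inputs (tail sums ++ [0])
```

Here head sums is the total of inputs. Passing the self-referential list as the first argument instead brings the loop back, which is the asymmetry described above.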
Another way of explaining it is if we talk about the sizes of these list.
The size of zip a b = min (size a) (size b)
So in this example: size d = min (size a) (size d-1+1)
But therein lies the problem: if the size of d is 0, the equation gives 0; if the size of d is 1, the equation gives 1; and so on. Only once the size of d is assumed to be greater than the size of a does the equation give the size of a, which is a contradiction. So any size from 0 to the size of a is consistent, which means the size is undefined.
Essentially I want to detect this case and make the size of d = a.
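The size argument can be checked mechanically. The following is only an illustrative sketch (consistentSizes and greatestSize are hypothetical names, and the search bound is arbitrary): it lists every candidate size n that satisfies n = min (size a) (n-1+1) and then picks the largest one.

```haskell
-- Candidate sizes n for d that satisfy n == min sizeA (n - 1 + 1).
-- The bound sizeA + 2 is arbitrary; no n above sizeA can satisfy the equation.
consistentSizes :: Int -> [Int]
consistentSizes sizeA = [n | n <- [0 .. sizeA + 2], n == min sizeA (n - 1 + 1)]

-- The greatest fixed point is the largest consistent size.
greatestSize :: Int -> Int
greatestSize = maximum . consistentSizes
```

For a list a of size 3, every size from 0 to 3 is consistent, and the greatest of them is the size of a, which is the answer the question is after.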
So far the only thing I have figured out is to make all lists lists of Maybe, and terminate lists with a Nothing value. Then in the application of the zipWith binary function return Nothing if either value is Nothing. You can then take out both of the [] checks in zip, because you can think of all lists as being infinite. Finally to make the summation example work, instead of doing a snoc, do a map, and replace any Nothing value with the snoc value. This works because when checking the second list for Nothing, it can lazily return true, since no value of the second list can be nothing.
Here is that code:
import Data.Maybe
data L a = L (Maybe a) (L a)
nil :: L a
nil = L Nothing nil
fromL :: [a] -> L a
fromL [] = nil
fromL (x:xs) = L (Just x) (fromL xs)
binOpMaybe :: (a->b->c) -> Maybe a -> Maybe b -> Maybe c
binOpMaybe f Nothing _ = Nothing
binOpMaybe f _ Nothing = Nothing
binOpMaybe f (Just a) (Just b) = Just (f a b)
zip2W :: (a->b->c) -> L a -> L b -> L c
zip2W f ~(L a as) ~(L b bs) = L (binOpMaybe f a b) (zip2W f as bs)
unconsL :: L a -> (Maybe a, Maybe (L a))
unconsL ~(L a as) = (a, Just as)
mapOr :: a -> L a -> L a
mapOr v ~(L a as) = L (Just $ fromMaybe v a) $ mapOr v as
main=print $ h
a = fromL [4,5,6]
b = zip2W (+) a (mapOr 0 (fromJust t))
(h,t) = unconsL $ b
The downside to this approach is it needs this other operator to map with Just . fromMaybe initialvalue. This is a less intuitive operator than ++. And without it the language could be built entirely on ++ uncons and (:[]) which would be pretty neat.
The other thing I've figured out, in the current Ruby implementation, is to throw an error when a value depends on itself and catch it in the empty list detection. But this is very hacky and not entirely sound, although it does work for cases like this. I don't think this can work in Haskell, since I don't think you can detect self dependence?
Sorry for the long description and the very odd use case. I've spent tons of time thinking about this, but haven't solved it yet and can't explain it any more succinctly! Not expecting an answer but figured it is worth a shot, thanks for considering.
After seeing it framed as a greatest fixed point question, it seems like a poor question because there is no efficient general solution to such a problem. For example, suppose the code was b = zipWith (+) a (if length b < 1 then [1] else []).
For my purposes it could still be nice to handle some cases correctly - the example provided does have a solution. So I could reframe the question as: when can we find the greatest fixed point efficiently and what is that fixed point? But I believe there is no simple answer to such a question, and so it would be a poor basis for a programming language to rely on ad hoc rules.

Sounds like you want a greatest fixed point. I'm not sure I've seen this done before, but maybe it's possible to make a sensible type class for types that support those.
class GF a where gfix :: (a -> a) -> a
instance GF a => GF [a] where
  gfix f = case (f (repeat undefined), f []) of
    (_:_, _) -> b:bs where
      b = gfix (\a' -> head (f (a':bs)))
      bs = gfix (\as' -> tail (f (b:as')))
    ([], []) -> []
    _ -> error "no fixed point greater than bottom exists"
-- use the usual least fixed point. this ain't quite right, but
-- it works for this example, and maybe it's Good Enough
instance GF Int where gfix f = let x = f x in x
Try it out in ghci:
> gfix (\xs -> zipWith (+) [5,6,7] (tail xs ++ [0])) :: [Int]
This implementation isn't particularly efficient; e.g. replacing [5,6,7] with [1..n] results in a runtime that's quadratic in n. Perhaps with some cleverness that can be improved, but it's not immediately obvious to me how that would go.

I have an answer for this specific case, not general.
appendRepeat :: a -> [a] -> [a]
appendRepeat v a = h : appendRepeat v t
  where
    ~(h,t) =
      if null a
        then (v,[])
        else (head a,tail a)
a = [4,5,6]
main=print $ head b
b = zipWith (+) a $ appendRepeat 0 (tail b)
appendRepeat adds an infinite list of a repeated value to the end of a list. The key point is that it never checks whether its argument is empty before committing to a non-empty result whose tail is a recursive call. This way laziness never ends up in an infinite loop checking the zipWith _ [] case.
So this code works, and for the purposes of the original question, it can be used to convert the language to just using 2 simple functions (++ and :[]). But the interpreter would need to do some static analysis for appending a repeated value and replace it to using this special appendRepeat function (which can easily be done in Atlas). It seems hacky to only make this one implementation switcharoo, but that is all that is needed.


Missing Haskell primitive to apply a function to each element of a list successively?

In Haskell, it is well known that the map primitive can be used to apply a given function to all elements of a list:
λ> map toUpper "abcd"
While trying to generate all partitions of a finite set (list), the following, similar primitive would be handy:
λ> sap toUpper "abcd"
with sap standing for successive applications.
The type signature would be:
sap :: (a -> a) -> [a] -> [[a]]
For example, part of the partitions of set "abcd" can be obtained from the partitions of "bcd" by sap'ing them with ('a':).
λ> pbcd = [["b","c","d"],["b","cd"],["bc","d"],["c","bd"],["bcd"]]
λ> concatMap (sap ('a':)) pbcd
and the 5 missing partitions can then be obtained by adding 'a' as its own separate singleton.
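The construction just described can be written out as follows. This is a sketch only: partitions is a hypothetical name, and sap is given the direct-recursion definition so that the block stands alone.

```haskell
-- Apply f to each element in turn, leaving the other elements untouched.
sap :: (a -> a) -> [a] -> [[a]]
sap _ [] = []
sap f (x:xs) = (f x : xs) : map (x:) (sap f xs)

-- All set partitions of a list, assuming its elements are distinct.
-- For each partition p of the tail, x either forms its own singleton block
-- or is added to exactly one existing block (that is the sap step).
partitions :: [a] -> [[[a]]]
partitions [] = [[]]
partitions (x:xs) = concatMap extend (partitions xs)
  where
    extend p = ([x] : p) : sap (x:) p
```

With this, partitions "abcd" has 15 elements: the 10 produced by sap'ing the 5 partitions of "bcd" that contain more than zero blocks each, plus the 5 where 'a' is a singleton.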
My problem is that I have been unable to locate such a primitive in the language libraries, and that Hoogle, given the type signature, returns nothing of interest.
Does such a primitive as sap exist somewhere in the Haskell language libraries?
Or is there a way to write it that is so short and simple that it does not even deserve to be a separate function, putting it below the so-called Fairbairn threshold?
It is possible to write sap like this:
sap :: (a -> a) -> [a] -> [[a]]
sap fn ls = fst $ foldr op ([], []) ls
  where op x (ll,tl) = ( ((fn x):tl) : map (x:) ll , x:tl )
Essentially you start with [[fn (last ls)]] as a seed and then progress leftwards. But this seems pedestrian rather than simple.
It seems like the simplest version of this is direct recursion:
sap :: (a -> a) -> [a] -> [[a]]
sap _ [] = []
sap f (x:xs) = (f x : xs) : map (x:) (sap f xs)
One possible exploration of this is as a paramorphism, which gives access to the recursive result and the unprocessed remainder together.
sap f = para step where
  step Nil = []
  step (Cons x (xs, rest)) = (f x : xs) : map (x:) rest
(Not checked, might have silly errors)
I don't see that as a huge improvement though. I don't see any deep insights in that decomposition of recursion from the problem itself.
For that, well... I've used holesOf for a generalized version of this in the past.
sap :: Traversable t => (a -> a) -> t a -> [t a]
sap f = map (peeks f) . holesOf traverse
Now that definitely says something. It has generalized the type to work on all instances of Traversable. On the other hand, the theoretical chunks involved were so overpowered for the end result that I'm not sure what it actually is that it says. On the third(?) hand, it looks pretty.
Or is there a way to write it that is so short and simple that it does not even deserve to be a separate function, putting it below the so-called Fairbairn threshold?
This. The functionality is rarely needed, and the (a -> a) argument doesn't make for a very generic application.
A short and simple implementation can be achieved with list recursion:
sap :: (a -> a) -> [a] -> [[a]]
sap _ [] = []
sap f (x:xs) = (f x:xs):((x:) <$> sap f xs)
I don't think it exists anywhere, although proving a negative is of course impossible. Another way to write sap, which I would probably prefer over using foldr:
sap f ls = zipWith (alterWith f) [0 .. length ls - 1] (repeat ls)
  where alterWith f i ls = take i ls ++ f (ls !! i) : drop (i+1) ls
alterWith is available as adjust in, but I would very much not bring something so heavyweight in for that function. I often have something like alterWith defined in a project already, though, and if so that allows sap to be elided in favor of the call to zipWith above.
Exploiting Data.List.HT.splitEverywhere:
import Data.List.HT
sap :: (a -> a) -> [a] -> [[a]]
sap f xs = [ pre ++ f x : post | (pre,x,post) <- splitEverywhere xs]
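If pulling in Data.List.HT is not wanted, the same splitting idea can be sketched with inits and tails from base. The name sapSplits is hypothetical; the pattern (pre, x:post) silently skips the final pair, whose tail is empty.

```haskell
import Data.List (inits, tails)

-- Pair every prefix with the matching suffix, then rewrite the first
-- element of each non-empty suffix.
sapSplits :: (a -> a) -> [a] -> [[a]]
sapSplits f xs = [pre ++ f x : post | (pre, x:post) <- zip (inits xs) (tails xs)]
```

Like the splitEverywhere version, this rebuilds each prefix with ++, so it is quadratic in the length of the list.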

List based on right Kan extension

In "Kan Extensions for Program Optimisation" by Ralf Hinze there is a definition of a List type based on the right Kan extension of the forgetful functor from the category of monoids along itself (section 7.4). The paper gives a Haskell implementation as follows:
newtype List a = Abstr {
  apply :: forall z . (Monoid z) => (a -> z) -> z
  }
I was able to define usual nil and cons constructors:
nil :: List a
nil = Abstr (\f -> mempty)
cons :: a -> List a -> List a
cons x (Abstr app) = Abstr (\f -> mappend (f x) (app f))
With the following instance of Monoid class for Maybe functor, I managed to define head function:
instance Monoid (Maybe a) where
  mempty = Nothing
  mappend Nothing m = m
  mappend (Just a) m = Just a
head :: List a -> Maybe a
head (Abstr app) = app Just
Question: How can one define tail function?
Here is a rather principled solution to implementing head and tail in one go (full gist):
First of all, we know how to append lists (it will be useful later on):
append :: List a -> List a -> List a
append (Abstr xs) (Abstr ys) = Abstr (\ f -> xs f <> ys f)
Then we introduce a new type Split which we will use to detect whether a List is empty or not (and get, in the case it's non empty, a head and a tail):
newtype Split a = Split { outSplit :: Maybe (a, List a) }
This new type forms a monoid: indeed we know how to append two lists.
instance Monoid (Split a) where
  mempty = Split Nothing
  mappend (Split Nothing) (Split nns) = Split nns
  mappend (Split mms) (Split Nothing) = Split mms
  mappend (Split (Just (m, ms))) (Split (Just (n, ns))) =
    Split $ Just (m, append ms (cons n ns))
Which means that we can get a function from List a to Split a using List a's apply:
split :: List a -> Split a
split xs = apply xs $ \ a -> Split $ Just (a, nil)
head and tail can finally be trivially derived from split:
head :: List a -> Maybe a
head = fmap fst . outSplit . split
tail :: List a -> Maybe (List a)
tail = fmap snd . outSplit . split
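For reference, here is the whole construction collected into one file as a sketch. Two things are assumptions on my part rather than part of the answer: recent versions of base require a Semigroup instance alongside Monoid, so the append logic lives in (<>) here, and the observers are renamed headL, tailL and toListL (hypothetical names) to avoid clashing with the Prelude.

```haskell
{-# LANGUAGE RankNTypes #-}

newtype List a = Abstr { apply :: forall z. Monoid z => (a -> z) -> z }

nil :: List a
nil = Abstr (const mempty)

cons :: a -> List a -> List a
cons x (Abstr app) = Abstr (\f -> f x <> app f)

append :: List a -> List a -> List a
append (Abstr xs) (Abstr ys) = Abstr (\f -> xs f <> ys f)

-- Nothing for an empty list, otherwise the first element and the rest.
newtype Split a = Split { outSplit :: Maybe (a, List a) }

instance Semigroup (Split a) where
  Split Nothing <> s = s
  s <> Split Nothing = s
  Split (Just (m, ms)) <> Split (Just (n, ns)) =
    Split (Just (m, append ms (cons n ns)))

instance Monoid (Split a) where
  mempty = Split Nothing

split :: List a -> Split a
split xs = apply xs (\a -> Split (Just (a, nil)))

headL :: List a -> Maybe a
headL = fmap fst . outSplit . split

tailL :: List a -> Maybe (List a)
tailL = fmap snd . outSplit . split

-- Observe a List by interpreting it in the ordinary list monoid.
toListL :: List a -> [a]
toListL xs = apply xs (\a -> [a])
```

toListL is included only so the result of tailL can be inspected; it is the usual way to observe such a list, by choosing [a] as the monoid.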
This implementation of lists as free monoids is provided in the package fmlist, which notes some interesting properties of it (unlike most implementations of lists, which are right-biased, this one is truly unbiased; you can make an arbitrary tree, and although of course the monoid laws force you to see it as flattened, you can still observe some differences in the infinite case. This is almost a Haskell quirk -- usually, free monoids). It also has an implementation of tail, so that's sort of an answer to your question (but see below).
With these sorts of representations (not just this particular one, but also e.g. forall r. (a -> r -> r) -> r -> r lists), there are usually some operations (e.g. appending) that become easier, and some (e.g. zip and tail) that become more difficult. This is discussed a bit in various places, e.g. How to take the tail of a functional stream.
Looking more closely at fmlist, though, its solution is pretty unsatisfactory: It just converts the nice balanced tree that you give it to a right-biased list using foldr, which allows it to do regular list operations, but loses the monoidal structure. The tail of a "middle-infinite" list is no longer "middle-infinite", it's just right-infinite like a regular list.
It should be possible to come up with a clever Monoid instance to compute the tail while disturbing the rest of the structure as little as possible, but an obvious one doesn't come to mind off-hand. I can think of a non-clever "brute force" solution, though: Cheat and reify the "list" into a tree using an invalid Monoid instance, inspect the tree, and then fold it back up so the end result is valid. Here's what it would look like with my nonfree package and fmlist:
nail :: FM.FMList a -> FM.FMList a
nail (FM.FM k) = FM.FM $ \f -> foldMap f (nail' (k N))
nail' :: N a -> N a
nail' NEmpty = error "nail' NEmpty"
nail' (N x) = NEmpty
nail' (NAppend l r) =
  case normalize l of
    NEmpty -> nail' r
    N x -> r
    l' -> NAppend (nail' l') r
-- Normalize a tree so that the left side of a root NAppend isn't an empty
-- subtree of any shape. If the tree is infinite in a particular way, this
-- won't terminate, so in that sense taking the tail of a list can make it
-- slightly worse (but you were already in pretty bad shape as far as
-- operations on the left side are concerned, and this is a pathological case
-- anyway).
normalize :: N a -> N a
normalize (NAppend l r) =
  case normalize l of
    NEmpty -> normalize r
    l' -> NAppend l' r
normalize n = n

Infinite lazy bitmap

I am trying to construct a lazy data structure that holds an infinite bitmap. I would like to support the following operations:
true :: InfBitMap
Returns an infinite bitmap of True, i.e. all positions should have value True.
falsify :: InfBitMap -> [Int] -> InfBitMap
Set all positions in the list to False. The list is possibly infinite. For example, falsify true [0,2..] will return a bitmap where all (and only) odd positions are True.
check :: InfBitMap -> Int -> Bool
Check the value of the index.
Here is what I could do so far.
-- InfBitMap will look like [(#), (#, #), (#, #, #, #)..]
type InfBitMap = [Seq Bool]
true :: InfBitMap
true = iterate (\x -> x >< x) $ singleton True
-- O(L * log N) where N is the biggest index in the list checked for later
-- and L is the length of the index list. It is assumed that the list is
-- sorted and unique.
falsify :: InfBitMap -> [Int] -> InfBitMap
falsify ls is = map (falsify' is) ls
-- Update each sequence with all indices within its length
-- Basically composes a list of (update pos False) for all positions
-- within the length of the sequence and then applies it.
falsify' is l = foldl' (.) id
                  (map ((flip update) False)
                       (takeWhile (< length l) is))
                $ l
-- O(log N) where N is the index.
check :: InfBitMap -> Int -> Bool
check ls i = index (fromJust $ find ((> i) . length) ls) i
I am wondering if there is some Haskellish concept/data-structure that I am missing that would make my code more elegant / more efficient (constants do not matter to me, just order). I tried looking at Zippers and Lenses but they do not seem to help. I would like to keep the complexities of updates and checks logarithmic (maybe just amortized logarithmic).
Note: before someone suspects it, no this is not a homework problem!
It just occurred to me that check can be improved to:
-- O(log N) where N is the index.
-- Returns "collapsed" bitmap for later more efficient checks.
check :: InfBitMap -> Int -> (Bool, InfBitMap)
check ls i = (index l i, ls')
  where ls'@(l:_) = dropWhile ((<= i) . length) ls
Which can be turned into a Monad for code cleanliness.
A slight variation on the well-known integer trie seems to be applicable here.
{-# LANGUAGE DeriveFunctor #-}
data Trie a = Trie a (Trie a) (Trie a) deriving (Functor)
true :: Trie Bool
true = Trie True true true
-- O(log(index))
check :: Trie a -> Int -> a
check t i | i < 0 = error "negative index"
check t i = go t (i + 1) where
  go (Trie a _ _) 1 = a
  go (Trie _ l r) i = go (if even i then l else r) (div i 2)
modify :: Trie a -> Int -> (a -> a) -> Trie a
modify t i f | i < 0 = error "negative index"
modify t i f = go t (i + 1) where
  go (Trie a l r) 1 = Trie (f a) l r
  go (Trie a l r) i | even i = Trie a (go l (div i 2)) r
  go (Trie a l r) i = Trie a l (go r (div i 2))
Unfortunately we can't use modify to implement falsify because we can't handle infinite lists of indices that way (all modifications have to be performed before an element of the trie can be inspected). Instead, we should do something more like a merge:
ascIndexModify :: Trie a -> [(Int, a -> a)] -> Trie a
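ascIndexModify t is = go 1 1 t is where
  -- i is the 1-based position stored at this node and w is 2^depth.
  -- check reads the low bits of the position first, so the children
  -- hold positions i + w (left) and i + 2*w (right).
  go _ _ t [] = t
  go i w t@(Trie a l r) ((i', f):is) = case compare i (i' + 1) of
    LT -> Trie a (go (i+w) (2*w) l ((i', f):is)) (go (i+2*w) (2*w) r ((i', f):is))
    GT -> go i w t is
    EQ -> Trie (f a) (go (i+w) (2*w) l is) (go (i+2*w) (2*w) r is)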
falsify :: Trie Bool -> [Int] -> Trie Bool
falsify t is = ascIndexModify t [(i, const False) | i <- is]
We assume strictly ascending indices in is, since otherwise we would skip places in the trie or even get non-termination, for example in check (falsify t (repeat 0)) 1.
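As a usage sketch, here is a self-contained version that runs the merge against an infinite index list. One detail is an adjustment assumed here rather than taken from the text: the merge tracks each node's position as n with children n + w and n + 2*w (w being 2^depth), so that positions agree with the low-bits-first order that check follows. The name oddsOnly is hypothetical.

```haskell
data Trie a = Trie a (Trie a) (Trie a)

true :: Trie Bool
true = Trie True true true

-- Lookup: the low bits of (i + 1) choose the branch at each level.
check :: Trie a -> Int -> a
check t i
  | i < 0 = error "negative index"
  | otherwise = go t (i + 1)
  where
    go (Trie a _ _) 1 = a
    go (Trie _ l r) n = go (if even n then l else r) (div n 2)

-- Merge a strictly ascending list of modifications into the trie.
-- n is the 1-based position stored at the current node, w is 2^depth.
ascIndexModify :: Trie a -> [(Int, a -> a)] -> Trie a
ascIndexModify t0 = go 1 1 t0
  where
    go _ _ t [] = t
    go n w t@(Trie a l r) ms@((i, f) : rest) = case compare n (i + 1) of
      -- The pending index may still lie deeper in this subtree.
      LT -> Trie a (go (n + w) (2 * w) l ms) (go (n + 2 * w) (2 * w) r ms)
      -- The pending index is smaller than anything in this subtree: drop it.
      GT -> go n w t rest
      EQ -> Trie (f a) (go (n + w) (2 * w) l rest) (go (n + 2 * w) (2 * w) r rest)

falsify :: Trie Bool -> [Int] -> Trie Bool
falsify t is = ascIndexModify t [(i, const False) | i <- is]

-- Only odd positions stay True, even though the index list is infinite.
oddsOnly :: Trie Bool
oddsOnly = falsify true [0, 2 ..]
```

Because every descendant holds a larger position than its ancestor, dropping indices smaller than the current position is safe, and each lookup only forces the nodes on its own path.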
The time complexities are a bit complicated by laziness. In check (falsify t is) index, we pay an additional cost of about log2(index) comparisons, plus a further length (filter (<index) is) comparisons (i.e. the cost of stepping over all indices smaller than the one we're looking up). You could say it's O(max(log(index), length (filter (<index) is))). Anyway, it's definitely better than the O(length is * log (index)) that we would get for a falsify implemented for finite is-es using modify.
We must keep in mind that tree nodes are evaluated once, and subsequent check-s for the same index after the first check are not paying any extra cost for falsify. Again, laziness makes this a bit complicated.
This falsify is also pretty well-behaved when we want to traverse a prefix of a trie. Take this toList function:
trieToList :: Trie a -> [a]
trieToList t = go [t] where
  go ts = [a | Trie a _ _ <- ts]
          ++ go (do {Trie _ l r <- ts; [l, r]})
It's a standard breadth-first traversal, in linear time. The traversal time remains linear when we compute take n $ trieToList (falsify t is), since falsify incurs at most n + length (filter (<n) is) extra comparisons, which is at most 2 * n, assuming strictly increasing is.
(side note: the space requirement of breadth-first traversal is rather painful, but I can't see a simple way to help it, since iterative deepening is even worse here, because there the whole tree must be held in memory, while bfs only has to remember the bottom level of the tree).
One way to represent this is as a function.
true = const True
falsify ls is = \i -> not (i `elem` is) && ls i
check ls i = ls i
The true and falsify functions are nice and efficient. The check function can be as bad as linear. It's possible to improve the efficiency of the same basic idea. I like its elegance.
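One way the function representation might be tightened is sketched below, under the assumption (mine, not the answer's) that the index list is ascending. Plain elem never returns on an infinite list when the index is absent; cutting the list off at the queried index with takeWhile avoids that. The names FunBitMap, trueF, falsifyAsc and checkF are hypothetical.

```haskell
type FunBitMap = Int -> Bool

trueF :: FunBitMap
trueF = const True

-- Assumes is is sorted ascending, so nothing after the first element
-- greater than i can be equal to i.
falsifyAsc :: FunBitMap -> [Int] -> FunBitMap
falsifyAsc ls is = \i -> i `notElem` takeWhile (<= i) is && ls i

checkF :: FunBitMap -> Int -> Bool
checkF ls i = ls i
```

Each check is still linear in the number of listed indices up to i, so this keeps the elegance but not the logarithmic bound the question asks for.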

Is there an indexed list in Haskell and is it good or bad?

I am a newcomer to the Haskell world and I am wondering if there is something like this:
data IndexedList a = IList Int [a]
findIndex::(Int->Int)->IndexedList a->(a,IndexedList a)
findIndex f (IList x l) = (l!!(f x), IList (f x) l)
next::IndexedList a->(a,IndexedList a)
next x = findIndex (+1) x
I've noticed that this kind of list is not purely functional but kind of useful for some applications. Should it be considered harmful?
It's certainly useful to have a list that comes equipped with a pointer to a particular location in the list. However, the way it's usually done in Haskell is somewhat different - rather than using an explicit pointer, we tend to use a zipper.
The list zipper looks like this
data ListZipper a = LZ [a] a [a] deriving (Show)
You should think of the middle field a as being the element that is currently pointed to, the first field [a] as being the elements before the current position, and the final field [a] as being the elements after the current position.
Usually we store the elements before the current one in reverse order, for efficiency, so that the list [0, 1, 2, *3*, 4, 5, 6] with a pointer to the middle element, would be stored as
LZ [2,1,0] 3 [4,5,6]
You can define functions that move the pointer to the left or right
left (LZ (a:as) b bs) = LZ as a (b:bs)
right (LZ as a (b:bs)) = LZ (a:as) b bs
If you want to move to the left or right n times, then you can do that with the help of a function that takes another function, and applies it n times to its argument
times n f = (!!n) . iterate f
so that to move left three times, you could use
>> let lz = LZ [2,1,0] 3 [4,5,6]
>> (3 `times` left) lz
LZ [] 0 [1,2,3,4,5,6]
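Note that left and right as written are partial: moving past either end of the list is a pattern-match failure. A total variant might look like the sketch below, where leftMay, rightMay and timesMay are hypothetical names and Maybe signals running off the end.

```haskell
data ListZipper a = LZ [a] a [a] deriving (Show, Eq)

-- Move the focus one step left, if there is anything to the left.
leftMay :: ListZipper a -> Maybe (ListZipper a)
leftMay (LZ (a:as) b bs) = Just (LZ as a (b:bs))
leftMay _ = Nothing

-- Move the focus one step right, if there is anything to the right.
rightMay :: ListZipper a -> Maybe (ListZipper a)
rightMay (LZ as a (b:bs)) = Just (LZ (a:as) b bs)
rightMay _ = Nothing

-- Apply a partial move n times, failing as soon as one step fails.
timesMay :: Int -> (a -> Maybe a) -> a -> Maybe a
timesMay n f x = iterate (>>= f) (Just x) !! n
```

With lz as above, timesMay 3 leftMay lz gives the same zipper as before wrapped in Just, while a fourth step yields Nothing instead of crashing.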
Your two functions findIndex and next can be written as
next :: ListZipper a -> (a, ListZipper a)
next = findIndex 1
findIndex :: Int -> ListZipper a -> (a, ListZipper a)
findIndex n x = let y@(LZ _ a _) = (n `times` right) x in (a, y)
Contrary to what you think this list is in fact purely functional. The reason is that IList (f x) l creates a new list (and does not, as you may think, modify the current IndexedList). It is in general not that easy to create non-purely functional data structures or functions in Haskell, as long as you stay away from unsafePerformIO.
The reason I would recommend against using the IndexedList is that there is no assurance that the index is less than the length of the list. In this case the lookup l!!(f x) will fail with an exception, which is generally considered bad style in Haskell. An alternative could be to use a safe lookup, which returns a Maybe a like the following:
findIndex :: (Int -> Int) -> IndexedList a -> (Maybe a, IndexedList a)
findIndex f (IList i l) = (maybe_x, IList new_i l)
  where
    new_i = f i
    maybe_x = if new_i >= 0 && new_i < length l
                then Just (l !! new_i)
                else Nothing
I can also not think of a usecase where such a list would be useful, but I guess I am limited by my creativity ;)

Inverting a fold

Suppose for a minute that we think the following is a good idea:
data Fold x y = Fold {start :: y, step :: x -> y -> y}
fold :: Fold x y -> [x] -> y
Under this scheme, functions such as length or sum can be implemented by calling fold with the appropriate Fold object as argument.
Now, suppose you want to do clever optimisation tricks. In particular, suppose you want to write
unFold :: ([x] -> y) -> Fold x y
It should be relatively easy to write a RULES pragma such that fold . unFold = id. But the interesting question is... can we actually implement unFold?
Obviously you can use RULES to apply arbitrary code transformations, whether or not they preserve the original meaning of the code. But can you really write an unFold implementation which actually does what its type signature suggests?
No, it's not possible. Proof: let
f :: [()] -> Bool
f[] = False
f[()] = False
f _ = True
First we must, for f' = unFold f, have start f' = False, because when folding over the empty list we directly get the start value. Then we must require step f' () False = False to achieve fold f' [()] = False. But when now evaluating fold f' [(),()], we would again only get a call step f' () False, which we had to define as False, leading to fold f' [(),()] ≡ False, whereas f[(),()] ≡ True. So there exists no unFold f that fulfills fold $ unFold f ≡ f. □
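The argument can also be checked by brute force, since Fold () Bool has only a handful of total inhabitants. The sketch below assumes fold is implemented with foldr (the question does not give a definition) and uses the hypothetical names candidates and agrees.

```haskell
data Fold x y = Fold { start :: y, step :: x -> y -> y }

-- Assumption: fold is a plain right fold over the list.
fold :: Fold x y -> [x] -> y
fold (Fold s k) = foldr k s

f :: [()] -> Bool
f [] = False
f [()] = False
f _ = True

-- Every total Fold () Bool: two start values times four Bool -> Bool steps.
candidates :: [Fold () Bool]
candidates =
  [ Fold s (\() y -> g y)
  | s <- [False, True]
  , g <- [id, not, const False, const True]
  ]

-- Does a candidate reproduce f on lists of length 0, 1 and 2?
agrees :: Fold () Bool -> Bool
agrees c = all (\xs -> fold c xs == f xs) [replicate n () | n <- [0 .. 2]]
```

None of the eight candidates agrees with f on these three inputs, which matches the proof: the first two inputs force start and step, and the third then comes out wrong.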
You can, but you need to make a slight modification to Fold in order to pull it off.
All functions on lists can be expressed as a fold, but sometimes to accomplish this, extra bookkeeping is needed. Suppose we add an additional type parameter to your Fold type, which passes along this additional contextual information.
data Fold a c r = Fold { _start :: (c, r), _step :: a -> (c,r) -> (c,r) }
Now we can implement fold like so
fold :: Fold a c r -> [a] -> r
fold (Fold start step) = snd . foldr step start
Now what happens when we try to go the other way?
unFold :: ([a] -> r) -> Fold a c r
Where does the c come from? Functions are opaque values, so it's hard to know how to inspect a function and know which contextual information it relies on. So, let's cheat a little. We're going to have the "contextual information" be the entire list, so then when we get to the leftmost element, we can just apply the function to the original list, ignoring the prior cumulative results.
unFold :: ([a] -> r) -> Fold a [a] r
unFold f = Fold { _start = ([], f [])
                , _step = \a (c, _r) -> let c' = a:c in (c', f c') }
Now, sadly, this does not necessarily compose with fold, because it requires that c must be [a]. Let's fix that by hiding c with existential quantification.
{-# LANGUAGE ExistentialQuantification #-}
data Fold a r = forall c. Fold
  { _start :: (c,r)
  , _step :: a -> (c,r) -> (c,r) }
fold :: Fold a r -> [a] -> r
fold (Fold start step) = snd . foldr step start
unFold :: ([a] -> r) -> Fold a r
unFold f = Fold start step where
  start = ([], f [])
  step a (c, _r) = let c' = a:c in (c', f c')
Now, it should always be true that fold . unFold = id. And, given a relaxed notion of equality for the Fold data type, you could also say that unFold . fold = id. You can even provide a smart constructor that acts like the old Fold constructor:
makeFold :: r -> (a -> r -> r) -> Fold a r
makeFold start step = Fold start' step' where
  start' = ((), start)
  step' a ((), r) = ((), step a r)
Conclusion 1: you can't
What you asked for originally isn't possible, at least not by any version of what you wanted I can come up with. (See below.)
If you change your data type to allow me to store intermediate calculations, I think I'll be fine, but even then,
the function unFold would be rather inefficient, which seems to run counter to your clever optimisation tricks agenda!
Conclusion 2: I don't think it achieves what you want, even if you work around it by changing the types
Any optimisation of the list algorithm would be subject to the problem that you've calculated the step function using the original unoptimised function, and quite probably in a complicated way.
Since there's no equality on functions, optimising step to something efficient isn't possible. I think you need a human to do unFold, not a compiler.
Anyway, back to the original question:
Could fold . unFold = id ?
No. Suppose we have
isSingleton :: [a] -> Bool
isSingleton [x] = True
isSingleton _ = False
Then if we had unFold :: ([x] -> y) -> Fold x y, and foldSingleton were the same as unFold isSingleton, we would need to have
foldSingleton = Fold {start = False , step = ???}
Where step takes an element of the list and updates the result.
Now isSingleton "a" == True, we need
step False = True
and because isSingleton "ab" == False, we need
step True = False
so step = not would do so far, but also isSingleton "abc" == False so we also need
step False = False
Since there are functions ([x] -> y) that cannot be represented by a value of type Fold x y, there cannot exist a function unFold :: ([x] -> y) -> Fold x y such that fold . unFold = id, because id is a total function.
It turns out you're not convinced by this, because you only expected unFold to work on functions that had a representation as a fold, so maybe you meant unFold.fold = id.
Could unFold . fold = id ?
Even if you just want unFold to work on functions ([x] -> y) that can be obtained using fold :: Fold x y -> ([x] -> y), I don't think it's possible. Let's address the question by assuming now we have defined
combine :: X -> Y -> Y
initial :: Y
folded :: [X] -> Y
folded = fold $ Fold initial combine
Recovering the value initial is trivial: initial = folded [].
Recovery of the original combine is not, because there's no way to go from a function that gives you some values of Y to one which combines arbitrary values of Y.
For an example, if we had X = Y = Int and I defined
combine x y | y < 0 = -10
            | otherwise = y + 1
initial = 0
then since combine just adds one to y every time you use it on positive y, and the initial value is 0, folded is indistinguishable from length in terms of its output. Notice that since folded xs is never negative, it's also impossible to define a function unFold :: ([x] -> y) -> Fold x y that ever recovers our combine function. This boils down to the fact that fold is not injective; it carries different values of type Fold x y to the same value of type [x] -> y.
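A small sketch makes the non-injectivity concrete. It assumes fold is a foldr over the list and fixes X = Y = Int as in the example; plusOne, folded and folded' are names introduced here for illustration.

```haskell
-- The step function from the example: it misbehaves on negative
-- accumulators, but a fold starting at 0 never produces one.
combine :: Int -> Int -> Int
combine _ y
  | y < 0 = -10
  | otherwise = y + 1

-- A different step function that agrees with combine on non-negative input.
plusOne :: Int -> Int -> Int
plusOne _ y = y + 1

-- Both folds compute the length of the list.
folded, folded' :: [Int] -> Int
folded = foldr combine 0
folded' = foldr plusOne 0
```

The two step functions differ, for example on an accumulator of -1, yet the folded functions agree on every list, so nothing that only sees the folded function can tell which step it came from.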
Thus I've proved two things: if unFold :: ([x] -> y) -> Fold x y then both fold.unFold /= id and now also unFold.fold /= id
I bet you're not convinced by this either, because you don't really care whether you got Fold 0 (\_ y -> y+1) or Fold 0 combine back from unFold folded, seeing as they have the same value when refolded! Let's move the goalposts one more time. Perhaps you want unFold to work whenever the function is obtainable via fold, and you're happy for it to give you inconsistent answers as long as when you fold the result again, you get the same function. I can summarise that with this next question:
Could fold . unFold . fold = fold ?
i.e. Could you define unFold so that fold.unFold is the identity on the set of functions obtainable via fold?
I'm really convinced this isn't possible, because it's not a tractable problem to calculate the step function without retaining extra information about intermediate values on sublists.
Suppose we had
unFold f = Fold {start = f [], step = recoverstep f}
we need
recoverstep f x1 initial == f [x1]
so if there's an Eq instance for y (ring the alarm bells!), then recoverstep must have the same effect as
recoverstep f x1 y | y == initial = f [x1]
also we need
recoverstep f x2 (f [x1]) == f [x1,x2]
so if there's an Eq instance for y, then recoverstep must have the same effect as
recoverstep f x2 y | y == (f [x1]) = f [x1,x2]
but there's a massive problem here: the variable x1 is free in the right hand side of this equation.
This means that logically, we can't tell what value the step function should have on an x unless we already
know what values it has been used on. We would need to store the values of f [x1], f [x1,x2] etc in the Fold
data type to make it work, and this is the clincher as to why we can't define unFold. If you change the data type Fold
to allow us to store information about intermediate lists, I can see it would work, but as it stands it's impossible
to recover the context.
Similar to Dan's answer, but using a slightly different approach. Instead of pairing the accumulator with partial results which will be thrown away at the end, we add a "post-processing" function which will convert from the accumulator type to the final result.
The same "cheat" for unFold just does all the work in the post-processing step:
{-# LANGUAGE ExistentialQuantification #-}
data Fold a r = forall c. Fold
  { _start :: c
  , _step :: a -> c -> c
  , _result :: c -> r }
fold :: Fold a r -> [a] -> r
fold (Fold start step result) = result . foldr step start
unFold :: ([a] -> r) -> Fold a r
unFold f = Fold [] (:) f
makeFold :: r -> (a -> r -> r) -> Fold a r
makeFold start step = Fold start step id
