A recursion scheme from Int -> Int? - haskell

The foldr identity is
foldr (:) []
More generally, with folds you can either destroy structure and end up with a summary value or inject structure in such a way that you end up with the same output structure.
[Int] -> [Int]
or
[Int] -> Int
or
[Int] -> ?
I'm wondering if there a similar identity with unfoldr/l.
I know how to get
Int -> [Int]
with unfold/ana.
I'm looking for some kind of way to go from
Int -> Int
with a recursion scheme.

Taking a cue from your remark about factorials, we can note that natural numbers can be treated as a recursive data structure:
data Nat = Zero | Succ Nat
In terms of the recursion-schemes machinery, the corresponding base functor would be:
data NatF a = ZeroF | SuccF a
deriving (Functor)
NatF, however, is isomorphic to Maybe. That being so, recursion-schemes conveniently makes Maybe the base functor of the Natural type from base. For instance, here is the type of ana specialised to Natural:
ana #Natural :: (a -> Maybe a) -> a -> Natural
We can use it to write the identity unfold for Natural:
{-# LANGUAGE LambdaCase #-}
import Numeric.Natural
import Data.Functor.Foldable
idNatAna :: Natural -> Natural
idNatAna = ana $ \case
0 -> Nothing
x -> Just (x - 1)
The coalgebra we just gave to ana is project for Natural, project being the function that unwraps one layer of the recursive structure. In terms of the recursion-schemes vocabulary, ana project is the identity unfold, and cata embed is the identity fold. (In particular, project for lists is uncons from Data.List, except that it is encoded with ListF instead of Maybe.)
By the way, the factorial function can be expressed as a paramorphism on naturals (as pointed out in the note at the end of this question). We can also implement that in terms of recursion-schemes:
fact :: Natural -> Natural
fact = para $ \case
Nothing -> 1
Just (predec, prod) -> prod * (predec + 1)
para makes available, at each recursive step, the rest of the structure to be folded (if we were folding a list, that would be its tail). In this case, I have called the value thus provided predec because at the n-th recursive step from bottom to top predec is n - 1.
Note that user11228628's hylomorphism is probably a more efficient implementation, if you happen to care about that. (I haven't benchmarked them, though.)

The kind of recursion scheme that deals with building up an intermediate structure and tearing it down, so that the structure doesn't appear in the input or output, is a hylomorphism, spelled hylo in recursion-schemes.
To use a hylomorphism, you need to specify an algebra (something that consumes one step of a recursive structure) and a coalgebra (something that produces one step of a recursive structure), and you need to have a data type for the kind of structure you're using, of course.
You suggested factorial, so let's look into how to write that as a hylomorphism.
One way to look at factorial is as the product of a list of numbers counting down from the initial n. In this framing, we can think of the product as our algebra, tearing down the list one cons at a time, and the count-down as our coalgebra, building up the list as n is decremented.
recursion-schemes gives us ListF as a handy base functor for lists, so we'll use that as the data type produced by the coalgebra and consumed by the algebra. Its constructors are Nil and Cons, which of course resemble the constructors for full lists, except that a ListF, like any base structure in a recursion scheme, uses a type parameter in the place that lists would use actual recursion (meaning that Cons :: a -> b -> ListF a b instead of (:) :: a -> [a] -> [a]).
So that determines our types. Now defining fact is a rather fill-in-the-blanks exercise:
import Prelude hiding (product)
import Data.Functor.Foldable
product :: ListF Int Int -> Int
product Nil = 1
product (Cons a b) = a * b
countDown :: Int -> ListF Int Int
countDown 0 = Nil
countDown n = Cons n (n - 1)
fact :: Int -> Int
fact = hylo product countDown

Related

One-line implementation of split in Haskell

What I want is the following one (which I think should be included in prelude since it is very useful in text processing):
split :: Eq a => [a] -> [a] -> [[a]]
e.g:
split ";;" "hello;;world" = ["hello", "world"]
split from Data.List.Utils isn't in base. I feel there should be a short-and-sweet implementation by composing a few base functions, but I can't figure it out. Am I missing something?
Arguably, the best way to check how feasible a short-and-sweet splitOn (or split, as you and MissingH call it -- here I will stick to the name used by the split and extra packages) is trying to write it [note 1].
(By the way, I will use recursion-schemes functions and concepts in this answer, as I find systematising things a bit helps me think about this kind of problem. Do let me know if anything is unclear.)
The type of splitOn is [note 2]:
splitOn :: Eq a => [a] -> [a] -> [[a]]
One way to write a recursive function that builds one data structure from another, like splitOn does, begins by asking whether to do it by walking the original structure in a bottom-up or a top-down way (for lists, that amounts to right-to-left and left-to-right respectively). A bottom-up walk is more naturally expressed as some kind of fold:
foldr #[] :: (a -> b -> b) -> b -> [a] -> b
cata #[_] :: (ListF a b -> b) -> [a] -> b
(cata, short for catamorphism, is how recursion-schemes expresses a vanilla fold. The ListF a b -> b function, called an algebra in the jargon, specifies what happens in each fold step. data ListF a b = Nil | Cons a b, and so, in the case of lists, the algebra amounts to the two first arguments of foldr rolled into one -- the binary function corresponds to the Cons case, and the seed of the fold, to the Nil one.)
A top-down walk, on the other hand, lends itself to an unfold:
unfoldr :: (b -> Maybe (a, b)) -> b -> [a] -- found in Data.List
ana #[_] :: (b -> ListF a b) -> b -> [a]
(ana, short for anamorphism, is the vanilla unfold in recursion-schemes. The b -> ListF a b function is a coalgebra; it specifies what happens in each unfold step. For a list, the possibilities are either emitting a list element and an updated seed or generating an empty list and terminating the unfold.)
Should splitOn be bottom-up or top-down? To implement it, we need to, at any given position in the list, look ahead in order to check whether the current list segment starts with the delimiter. That being so, it makes sense to reach for a top-down solution i.e. an unfold/anamorphism.
Playing with ways to write splitOn as an unfold shows another thing to consider: you want each individual unfold step to generate a fully-formed list chunk. Not doing so will, at best, lead you to unnecessarily walk the original list twice [note 3]; at worst, catastrophic memory usage and stack overflows on long list chunks await [note 4]. One way to achieve that is through a breakOn function, like the one in Data.List.Extra...
breakOn :: Eq a => [a] -> [a] -> ([a], [a])
... which is like break from the Prelude, except that, instead of applying a predicate to each element, it checks whether the remaining list segment has the first argument as a prefix [note 5].
With breakOn at hand, we can write a proper splitOn implementation -- one that, compiled with optimisations, matches in performance the library ones mentioned at the beginning:
splitOnAtomic :: Eq a => [a] -> [a] -> [[a]]
splitOnAtomic delim
| null delim = error "splitOnAtomic: empty delimiter"
| otherwise = apo coalgSplit
where
delimLen = length delim
coalgSplit = \case
[] -> Cons [] (Left [])
li ->
let (ch, xs) = breakOn (delim `isPrefixOf`) li
in Cons ch (Right (drop delimLen xs))
(apo, short for apomorphism, is an unfold that can be short-circuited. That is done by emitting from an unfold step, rather than the usual updated seed -- signaled by Right -- a final result -- signaled by Left. Short-circuiting is needed here because, in the empty list case, we want neither to produce an empty list by returning Nil -- which would wrongly result in splitOn delim [] = [] -- nor to resort to Cons [] [] -- which would generate an infinite tail of []. This trick corresponds directly to the additional splitOn _ [] = [[]] case added to the Data.List.Extra implementation.)
After a few slight detours, we can now address your actual question. splitOn is tricky to write in a short way because, firstly, the recursion pattern it uses isn't entirely trivial; secondly, a good implementation requires a few details that are inconvenient for golfing; and thirdly, what appears to be the best implementation relies crucially on breakOn, which is not in base.
Notes:
[note 1]: Here are the imports and pragmas needed to run the snippets in this answer:
{-# LANGUAGE LambdaCase #-}
{-# LANGUAGE DeriveFunctor #-}
{-# LANGUAGE DeriveFoldable #-}
{-# LANGUAGE DeriveTraversable #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE TemplateHaskell #-}
import Data.Functor.Foldable
import Data.Functor.Foldable.TH
import Data.List
import Data.Maybe
[note 2]: An alternative type might be Eq a => NonEmpty a -> [a] -> NonEmpty [a], if one wants to put precision above all else. I won't bother with that here to avoid unnecessary distractions.
[note 3]: As in this rather neat implementation, which uses two unfolds -- the first one (ana coalgMark) replaces the delimiters with Nothing, so that the second one (apo coalgSplit) can split in a straightforward way:
splitOnMark :: Eq a => [a] -> [a] -> [[a]]
splitOnMark delim
| null delim = error "splitOnMark: empty delimiter"
| otherwise = apo coalgSplit . ana coalgMark
where
coalgMark = \case
[] -> Nil
li#(x:xs) -> case stripPrefix delim li of
Just ys -> Cons Nothing ys
Nothing -> Cons (Just x) xs
coalgSplit = \case
[] -> Cons [] (Left [])
mxs ->
let (mch, mys) = break isNothing mxs
in Cons (catMaybes mch) (Right (drop 1 mys))
(What apo is and what Left and Right are doing here will be covered a little further in the main body of the answer.)
This implementation has fairly acceptable performance, though with optimisations it is a slower than the one in the main body of the answer by a (modest) constant factor. It might be a little easier to golf this one, though...
[note 4]: As in this single unfold implementation, which uses a coalgebra that calls itself recursively to build each chunk as a (difference) list:
splitOnNaive :: Eq a => [a] -> [a] -> [[a]]
splitOnNaive delim
| null delim = error "splitOn: empty delimiter"
| otherwise = apo coalgSplit . (,) id
where
coalgSplit = \case
(ch, []) -> Cons (ch []) (Left [])
(ch, li#(x:xs)) -> case stripPrefix delim li of
Just ys -> Cons (ch []) (Right (id, ys))
Nothing -> coalg (ch . (x :), xs)
Having to decide at each element whether to grow the current chunk or to start a new one is in itself problematic, as it breaks laziness.
[note 5]: Here is how Data.List.Extra implements breakOn. If we want to achieve that using a recursion-schemes unfold, one good strategy is defining a data structure that encodes exactly what we are trying to build:
data BrokenList a = Broken [a] | Unbroken a (BrokenList a)
deriving (Eq, Show, Functor, Foldable, Traversable)
makeBaseFunctor ''BrokenList
A BrokenList is just like a list, except that the empty list is replaced by the (non-recursive) Broken constructor, which marks the break point and holds the remainder of the list. Once generated by an unfold, a BrokenList can be easily folded into a pair of lists: the elements in the Unbroken values are consed into one list, and the list in Broken becomes the other one:
breakOn :: ([a] -> Bool) -> [a] -> ([a], [a])
breakOn p = hylo algPair coalgBreak
where
coalgBreak = \case
[] -> BrokenF []
li#(x:xs)
| p li -> BrokenF li
| otherwise -> UnbrokenF x xs
algPair = \case
UnbrokenF x ~(xs, ys) -> (x : xs, ys)
BrokenF ys -> ([], ys)
(hylo, short for hylomorphism, is simply an ana followed by a cata, i.e. an unfold followed by a fold. hylo, as implemented in recursion-schemes, takes advantage of the fact that the intermediate data structure, created by the unfold and then immediately consumed by the fold, can be fused away, leading to significant performance gains.)
It is worth mentioning that the lazy pattern match in algPair is crucial to preserve laziness. The Data.List.Extra implementation linked to above achieves that by using first from Control.Arrow, which also matches the pair given to it lazily.

How to "iterate" over a function whose type changes among iteration but the formal definition is the same

I have just started learning Haskell and I come across the following problem. I try to "iterate" the function \x->[x]. I expect to get the result [[8]] by
foldr1 (.) (replicate 2 (\x->[x])) $ (8 :: Int)
This does not work, and gives the following error message:
Occurs check: cannot construct the infinite type: a ~ [a]
Expected type: [a -> a]
Actual type: [a -> [a]]
I can understand why it doesn't work. It is because that foldr1 has type signature foldr1 :: Foldable t => (a -> a -> a) -> a -> t a -> a, and takes a -> a -> a as the type signature of its first parameter, not a -> a -> b
Neither does this, for the same reason:
((!! 2) $ iterate (\x->[x]) .) id) (8 :: Int)
However, this works:
(\x->[x]) $ (\x->[x]) $ (8 :: Int)
and I understand that the first (\x->[x]) and the second one are of different type (namely [Int]->[[Int]] and Int->[Int]), although formally they look the same.
Now say that I need to change the 2 to a large number, say 100.
My question is, is there a way to construct such a list? Do I have to resort to meta-programming techniques such as Template Haskell? If I have to resort to meta-programming, how can I do it?
As a side node, I have also tried to construct the string representation of such a list and read it. Although the string is much easier to construct, I don't know how to read such a string. For example,
read "[[[[[8]]]]]" :: ??
I don't know how to construct the ?? part when the number of nested layers is not known a priori. The only way I can think of is resorting to meta-programming.
The question above may not seem interesting enough, and I have a "real-life" case. Consider the following function:
natSucc x = [Left x,Right [x]]
This is the succ function used in the formal definition of natural numbers. Again, I cannot simply foldr1-replicate or !!-iterate it.
Any help will be appreciated. Suggestions on code styles are also welcome.
Edit:
After viewing the 3 answers given so far (again, thank you all very much for your time and efforts) I realized this is a more general problem that is not limited to lists. A similar type of problem can be composed for each valid type of functor (what if I want to get Just Just Just 8, although that may not make much sense on its own?).
You'll certainly agree that 2 :: Int and 4 :: Int have the same type. Because Haskell is not dependently typed†, that means foldr1 (.) (replicate 2 (\x->[x])) (8 :: Int) and foldr1 (.) (replicate 4 (\x->[x])) (8 :: Int) must have the same type, in contradiction with your idea that the former should give [[8]] :: [[Int]] and the latter [[[[8]]]] :: [[[[Int]]]]. In particular, it should be possible to put both of these expressions in a single list (Haskell lists need to have the same type for all their elements). But this just doesn't work.
The point is that you don't really want a Haskell list type: you want to be able to have different-depth branches in a single structure. Well, you can have that, and it doesn't require any clever type system hacks – we just need to be clear that this is not a list, but a tree. Something like this:
data Tree a = Leaf a | Rose [Tree a]
Then you can do
Prelude> foldr1 (.) (replicate 2 (\x->Rose [x])) $ Leaf (8 :: Int)
Rose [Rose [Leaf 8]]
Prelude> foldr1 (.) (replicate 4 (\x->Rose [x])) $ Leaf (8 :: Int)
Rose [Rose [Rose [Rose [Leaf 8]]]]
†Actually, modern GHC Haskell has quite a bunch of dependently-typed features (see DaniDiaz' answer), but these are still quite clearly separated from the value-level language.
I'd like to propose a very simple alternative which doesn't require any extensions or trickery: don't use different types.
Here is a type which can hold lists with any number of nestings, provided you say how many up front:
data NestList a = Zero a | Succ (NestList [a]) deriving Show
instance Functor NestList where
fmap f (Zero a) = Zero (f a)
fmap f (Succ as) = Succ (fmap (map f) as)
A value of this type is a church numeral indicating how many layers of nesting there are, followed by a value with that many layers of nesting; for example,
Succ (Succ (Zero [['a']])) :: NestList Char
It's now easy-cheesy to write your \x -> [x] iteration; since we want one more layer of nesting, we add one Succ.
> iterate (\x -> Succ (fmap (:[]) x)) (Zero 8) !! 5
Succ (Succ (Succ (Succ (Succ (Zero [[[[[8]]]]])))))
Your proposal for how to implement natural numbers can be modified similarly to use a simple recursive type. But the standard way is even cleaner: just take the above NestList and drop all the arguments.
data Nat = Zero | Succ Nat
This problem indeed requires somewhat advanced type-level programming.
I followed #chi's suggestion in the comments, and searched for a library that provided inductive type-level naturals with their corresponding singletons. I found the fin library, which is used in the answer.
The usual extensions for type-level trickery:
{-# language DataKinds, PolyKinds, KindSignatures, ScopedTypeVariables, TypeFamilies #-}
Here's a type family that maps a type-level natural and an element type to the type of the corresponding nested list:
import Data.Type.Nat
type family Nested (n::Nat) a where
Nested Z a = [a]
Nested (S n) a = [Nested n a]
For example, we can test from ghci that
*Main> :kind! Nested Nat3 Int
Nested Nat3 Int :: *
= [[[[Int]]]]
(Nat3 is a convenient alias defined in Data.Type.Nat.)
And here's a newtype that wraps the function we want to construct. It uses the type family to express the level of nesting
newtype Iterate (n::Nat) a = Iterate { runIterate :: (a -> [a]) -> a -> Nested n a }
The fin library provides a really nifty induction1 function that lets us compute a result by induction on Nat. We can use it to compute the Iterate that corresponds to every Nat. The Nat is passed implicitly, as a constraint:
iterate' :: forall n a. SNatI n => Iterate (n::Nat) a
iterate' =
let step :: forall m. SNatI m => Iterate m a -> Iterate (S m) a
step (Iterate recN) = Iterate (\f a -> [recN f a])
in induction1 (Iterate id) step
Testing the function in ghci (using -XTypeApplications to supply the Nat):
*Main> runIterate (iterate' #Nat3) pure True
[[[[True]]]]

How to implement delete with foldr in Haskell

I've been studying folds for the past few days. I can implement simple functions with them, like length, concat and filter. What I'm stuck at is trying to implement with foldr functions like delete, take and find. I have implemented these with explicit recursion but it doesn't seem obvious to me how to convert these types of functions to right folds.
I have studied the tutorials by Graham Hutton and Bernie Pope. Imitating Hutton's dropWhile, I was able to implement delete with foldr but it fails on infinite lists.
From reading Implement insert in haskell with foldr, How can this function be written using foldr? and Implementing take using foldr, it would seem that I need to use foldr to generate a function which then does something. But I don't really understand these solutions and don't have an idea how to implement for example delete this way.
Could you explain to me a general strategy for implementing with foldr lazy versions of functions like the ones I mentioned. Maybe you could also implement delete as an example since this probably is one of the easiest.
I'm looking for a detailed explanation that a beginner can understand. I'm not interested in just solutions, I want to develop an understanding so I can come up with solutions to similar problems myself.
Thanks.
Edit: At the moment of writing there is one useful answer but it's not quite what I was looking for. I'm more interested in an approach that uses foldr to generate a function, which then does something. The links in my question have examples of this. I don't quite understand those solutions so I would like to have more information on this approach.
delete is a modal search. It has two different modes of operation - whether it's already found the result or not. You can use foldr to construct a function that passes the state down the line as each element is checked. So in the case of delete, the state can be a simple Bool. It's not exactly the best type, but it will do.
Once you have identified the state type, you can start working on the foldr construction. I'm going to walk through figuring it out the way I did. I'll be enabling ScopedTypeVariables just so I can annotate the type of subexpressions better. One you know the state type, you know you want foldr to generate a function taking a value of that type, and returning a value of the desired final type. That's enough to start sketching things.
{-# LANGUAGE ScopedTypeVariables #-}
delete :: forall a. Eq a => a -> [a] -> [a]
delete a xs = foldr f undefined xs undefined
where
f :: a -> (Bool -> [a]) -> (Bool -> [a])
f x g = undefined
It's a start. The exact meaning of g is a little bit tricky here. It's actually the function for processing the rest of the list. It's accurate to look at it as a continuation, in fact. It absolutely represents performing the rest of the folding, with your whatever state you choose to pass along. Given that, it's time to figure out what to put in some of those undefined places.
{-# LANGUAGE ScopedTypeVariables #-}
delete :: forall a. Eq a => a -> [a] -> [a]
delete a xs = foldr f undefined xs undefined
where
f :: a -> (Bool -> [a]) -> (Bool -> [a])
f x g found | x == a && not found = g True
| otherwise = x : g found
That seems relatively straightforward. If the current element is the one being searched for, and it hasn't yet been found, don't output it, and continue with the state set to True, indicating it's been found. otherwise, output the current value and continue with the current state. This just leaves the rest of the arguments to foldr. The last one is the initial state. The other one is the state function for an empty list. Ok, those aren't too bad either.
{-# LANGUAGE ScopedTypeVariables #-}
delete :: forall a. Eq a => a -> [a] -> [a]
delete a xs = foldr f (const []) xs False
where
f :: a -> (Bool -> [a]) -> (Bool -> [a])
f x g found | x == a && not found = g True
| otherwise = x : g found
No matter what the state is, produce an empty list when an empty list is encountered. And the initial state is that the element being searched for has not yet been found.
This technique is also applicable in other cases. For instance, foldl can be written as a foldr this way. If you look at foldl as a function that repeatedly transforms an initial accumulator, you can guess that's the function being produced - how to transform the initial value.
{-# LANGUAGE ScopedTypeVariables #-}
foldl :: forall a b. (a -> b -> a) -> a -> [b] -> a
foldl f z xs = foldr g id xs z
where
g :: b -> (a -> a) -> (a -> a)
g x cont acc = undefined
The base cases aren't too tricky to find when the problem is defined as manipulating the initial accumulator, named z there. The empty list is the identity transformation, id, and the value passed to the created function is z.
The implementation of g is trickier. It can't just be done blindly on types, because there are two different implementations that use all the expected values and type-check. This is a case where types aren't enough, and you need to consider the meanings of the functions available.
Let's start with an inventory of the values that seem like they should be used, and their types. The things that seem like they must need to be used in the body of g are f :: a -> b -> a, x :: b, cont :: (a -> a), and acc :: a. f will obviously take x as its second argument, but there's a question of the appropriate place to use cont. To figure out where it goes, remember that it represents the transformation function returned by processing the rest of the list, and that foldl processes the current element and then passes the result of that processing to the rest of the list.
{-# LANGUAGE ScopedTypeVariables #-}
foldl :: forall a b. (a -> b -> a) -> a -> [b] -> a
foldl f z xs = foldr g id xs z
where
g :: b -> (a -> a) -> (a -> a)
g x cont acc = cont $ f acc x
This also suggests that foldl' can be written this way with only one tiny change:
{-# LANGUAGE ScopedTypeVariables #-}
foldl' :: forall a b. (a -> b -> a) -> a -> [b] -> a
foldl' f z xs = foldr g id xs z
where
g :: b -> (a -> a) -> (a -> a)
g x cont acc = cont $! f acc x
The difference is that ($!) is used to suggest evaluation of f acc x before it's passed to cont. (I say "suggest" because there are some edge cases where ($!) doesn't force evaluation even as far as WHNF.)
delete doesn't operate on the entire list evenly. The structure of the computation isn't just considering the whole list one element at a time. It differs after it hits the element it's looking for. This tells you it can't be implemented as just a foldr. There will have to be some sort of post-processing involved.
When that happens, the general pattern is that you build a pair of values and just take one of them at completion of the foldr. That's probably what you did when you imitated Hutton's dropWhile, though I'm not sure since you didn't include code. Something like this?
delete :: Eq a => a -> [a] -> [a]
delete a = snd . foldr (\x (xs1, xs2) -> if x == a then (x:xs1, xs1) else (x:xs1, x:xs2)) ([], [])
The main idea is that xs1 is always going to be the full tail of the list, and xs2 is the result of the delete over the tail of the list. Since you only want to remove the first element that matches, you don't want to use the result of delete over the tail when you do match the value you're searching for, you just want to return the rest of the list unchanged - which fortunately is what's always going to be in xs1.
And yeah, that doesn't work on infinite lists - but only for one very specific reason. The lambda is too strict. foldr only works on infinite lists when the function it is provided doesn't always force evaluation of its second argument, and that lambda does always force evaluation of its second argument in the pattern match on the pair. Switching to an irrefutable pattern match fixes that, by allowing the lambda to produce a constructor before ever examining its second argument.
delete :: Eq a => a -> [a] -> [a]
delete a = snd . foldr (\x ~(xs1, xs2) -> if x == a then (x:xs1, xs1) else (x:xs1, x:xs2)) ([], [])
That's not the only way to get that result. Using a let-binding or fst and snd as accessors on the tuple would also do the job. But it is the change with the smallest diff.
The most important takeaway here is to be very careful with handling the second argument to the reducing function you pass to foldr. You want to defer examining the second argument whenever possible, so that the foldr can stream lazily in as many cases as possible.
If you look at that lambda, you see that the branch taken is chosen before doing anything with the second argument to the reducing function. Furthermore, you'll see that most of the time, the reducing function produces a list constructor in both halves of the result tuple before it ever needs to evaluate the second argument. Since those list constructors are what make it out of delete, they are what matter for streaming - so long as you don't let the pair get in the way. And making the pattern-match on the pair irrefutable is what keeps it out of the way.
As a bonus example of the streaming properties of foldr, consider my favorite example:
dropWhileEnd :: (a -> Bool) -> [a] -> [a]
dropWhileEnd p = foldr (\x xs -> if p x && null xs then [] else x:xs) []
It streams - as much as it can. If you figure out exactly when and why it does and doesn't stream, you'll understand pretty much every detail of the streaming structure of foldr.
here is a simple delete, implemented with foldr:
delete :: (Eq a) => a -> [a] -> [a]
delete a xs = foldr (\x xs -> if x == a then (xs) else (x:xs)) [] xs

Type-safe difference lists

A common idiom in Haskell, difference lists, is to represent a list xs as the value (xs ++). Then (.) becomes "(++)" and id becomes "[]" (in fact this works for any monoid or category). Since we can compose functions in constant time, this gives us a nice way to efficiently build up lists by appending.
Unfortunately the type [a] -> [a] is way bigger than the type of functions of the form (xs ++) -- most functions on lists do something other than prepend to their argument.
One approach around this (as used in dlist) is to make a special DList type with a smart constructor. Another approach (as used in ShowS) is to not enforce the constraint anywhere and hope for the best. But is there a nice way of keeping all the desired properties of difference lists while using a type that's exactly the right size?
Yes!
We can view [a] as a free monad instance Free ((,) a) ().
Thus we can apply the scheme described by Edward Kmett in Free Monads for Less.
The type we'll get is
newtype F a = F { runF :: forall r. (() -> r) -> ((a, r) -> r) -> r }
or simply
newtype F a = F { runF :: forall r. r -> (a -> r -> r) -> r }
So runF is nothing else than the foldr function for our list!
This is called the Boehm-Berarducci encoding and it's isomorphic to the original data type (list) — so this is as small as you can possibly get.
Will Ness says:
So this type is still too "wide", it allows more than just prefixing - doesn't constrain the g function argument.
If I understood his argument correctly, he points out that you can apply the foldr (or runF) function to something different from [] and (:).
But I never claimed that you can use foldr-encoding only for concatenation. Indeed, as this name implies, you can use it to calculate any fold — and that's what Will Ness demonstrated.
It may become more clear if you forget for a moment that there's one true list type, [a]. There may be lots of list types — e.g. I can define one by
data List a = Nil | Cons a (List a)
It's be different from, but isomorphic to [a].
The foldr-encoding above is just yet another encoding of lists, like List a or [a]. It is also isomorphic to [a], as evidenced by functions \l -> F (\a f -> foldr a f l) and \x -> runF [] (:) and the fact that their compositions (in either order) is identity. But you are not obliged to convert to [a] — you can convert to List directly, using \x -> runF x Nil Cons.
The important point is that F a doesn't contain an element that is not the foldr functions for some list — nor does it contain an element that is the foldr functions for more than one list (obviously).
Thus, it doesn't contain too few or too many elements — it contains precisely as many elements as needed to exactly represent all lists.
This is not true of the difference list encoding — for example, the reverse function is not an append operation for any list.

Creating a list type using functions

For a silly challenge I am trying to implement a list type using as little of the prelude as possible and without using any custom types (the data keyword).
I can construct an modify a list using tuples like so:
import Prelude (Int(..), Num(..), Eq(..))
cons x = (x, ())
prepend x xs = (x, xs)
head (x, _) = x
tail (_, x) = x
at xs n = if n == 0 then xs else at (tail xs) (n-1)
I cannot think of how to write an at (!!) function. Is this even possible in a static language?
If it is possible could you try to nudge me in the right direction without telling me the answer.
There is a standard trick known as Church encoding that makes this easy. Here's a generic example to get you started:
data Foo = A Int Bool | B String
fooValue1 = A 3 False
fooValue2 = B "hello!"
Now, a function that wants to use this piece of data must know what to do with each of the constructors. So, assuming it wants to produce some result of type r, it must at the very least have two functions, one of type Int -> Bool -> r (to handle the A constructor), and the other of type String -> r (to handle the B constructor). In fact, we could write the type that way instead:
type Foo r = (Int -> Bool -> r) -> (String -> r) -> r
You should read the type Foo r here as saying "a function that consumes a Foo and produces an r". The type itself "stores" a Foo inside a closure -- so that it will effectively apply one or the other of its arguments to the value it closed over. Using this idea, we can rewrite fooValue1 and fooValue2:
fooValue1 = \consumeA consumeB -> consumeA 3 False
fooValue2 = \consumeA consumeB -> consumeB "hello!"
Now, let's try applying this trick to real lists (though not using Haskell's fancy syntax sugar).
data List a = Nil | Cons a (List a)
Following the same format as before, consuming a list like this involves either giving a value of type r (in case the constructor was Nil) or telling what to do with an a and another List a, so. At first, this seems problematic, since:
type List a r = (r) -> (a -> List a -> r) -> r
isn't really a good type (it's recursive!). But we can instead demand that we first reduce all the recursive arguments to r first... then we can adjust this type to make something more reasonable.
type List a r = (r) -> (a -> r -> r) -> r
(Again, we should read the type List a r as being "a thing that consumes a list of as and produces an r".)
There's one final trick that's necessary. What we would like to do is to enforce the requirement that the r that our List a r returns is actually constructed from the arguments we pass. That's a little abstract, so let's give an example of a bad value that happens to have type List a r, but which we'd like to rule out.
badList = \consumeNil consumeCons -> False
Now, badList has type List a Bool, but it's not really a function that consumes a list and produces a Bool, since in some sense there's no list being consumed. We can rule this out by demanding that the type work for any r, no matter what the user wants r to be:
type List a = forall r. (r) -> (a -> r -> r) -> r
This enforces the idea that the only way to get an r that gets us off the ground is to use the (user-supplied) consumeNil function. Can you see how to make this same refinement for our original Foo type?
If it is possible could you try and nudge me in the right direction without telling me the answer.
It's possible, in more than one way. But your main problem here is that you've not implemented lists. You've implemented fixed-size vectors whose length is encoded in the type.
Compare the types from adding an element to the head of a list vs. your implementation:
(:) :: a -> [a] -> [a]
prepend :: a -> b -> (a, b)
To construct an equivalent of the built-in list type, you'd need a function like prepend with a type resembling a -> b -> b. And if you want your lists to be parameterized by element type in a straightforward way, you need the type to further resemble a -> f a -> f a.
Is this even possible in a static language?
You're also on to something here, in that the encoding you're using works fine in something like Scheme. Languages with "dynamic" systems can be regarded as having a single static type with implicit conversions and metadata attached, which obviously solves the type mismatch problem in a very extreme way!
I cannot think of how to write an at (!!) function.
Recalling that your "lists" actually encode their length in their type, it should be easy to see why it's difficult to write functions that do anything other than increment/decrement the length. You can actually do this, but it requires elaborate encoding and more advanced type system features. A hint in this direction is that you'll need to use type-level numbers as well. You'd probably enjoy doing this as an exercise as well, but it's much more advanced than encoding lists.
Solution A - nested tuples:
Your lists are really nested tuples - for example, they can hold items of different types, and their type reveals their length.
It is possible to write indexing-like function for nested tuples, but it is ugly, and it won't correspond to Prelude's lists. Something like this:
class List a b where ...
instance List () b where ...
instance List a b => List (b,a) b where ...
Solution B - use data
I recommend using data construct. Tuples are internally something like this:
data (,) a b = Pair a b
so you aren't avoiding data. The division between "custom types" and "primitive types" is rather artificial in Haskell, as opposed to C.
Solution C - use newtype:
If you are fine with newtype but not data:
newtype List a = List (Maybe (a, List a))
Solution D - rank-2-types:
Use rank-2-types:
type List a = forall b. b -> (a -> b -> b) -> b
list :: List Int
list = \n c -> c 1 (c 2 n) -- [1,2]
and write functions for them. I think this is closest to your goal. Google for "Church encoding" if you need more hints.
Let's set aside at, and just think about your first four functions for the moment. You haven't given them type signatures, so let's look at those; they'll make things much clearer. The types are
cons :: a -> (a, ())
prepend :: a -> b -> (a, b)
head :: (a, b) -> a
tail :: (a, b) -> b
Hmmm. Compare these to the types of the corresponding Prelude functions1:
return :: a -> [a]
(:) :: a -> [a] -> [a]
head :: [a] -> a
tail :: [a] -> [a]
The big difference is that, in your code, there's nothing that corresponds to the list type, []. What would such a type be? Well, let's compare, function by function.
cons/return: here, (a,()) corresponds to [a]
prepend/(:): here, both b and (a,b) correspond to [a]
head: here, (a,b) corresponds to [a]
tail: here, (a,b) corresponds to [a]
It's clear, then, that what you're trying to say is that a list is a pair. And prepend indicates that you then expect the tail of the list to be another list. So what would that make the list type? You'd want to write type List a = (a,List a) (although this would leave out (), your empty list, but I'll get to that later), but you can't do this—type synonyms can't be recursive. After all, think about what the type of at/!! would be. In the prelude, you have (!!) :: [a] -> Int -> a. Here, you might try at :: (a,b) -> Int -> a, but this won't work; you have no way to convert a b into an a. So you really ought to have at :: (a,(a,b)) -> Int -> a, but of course this won't work either. You'll never be able to work with the structure of the list (neatly), because you'd need an infinite type. Now, you might argue that your type does stop, because () will finish a list. But then you run into a related problem: now, a length-zero list has type (), a length-one list has type (a,()), a length-two list has type (a,(a,())), etc. This is the problem: there is no single "list type" in your implementation, and so at can't have a well-typed first parameter.
You have hit on something, though; consider the definition of lists:
data List a = []
| a : [a]
Here, [] :: [a], and (:) :: a -> [a] -> [a]. In other words, a list is isomorphic to something which is either a singleton value, or a pair of a value and a list:
newtype List' a = List' (Either () (a,List' a))
You were trying to use the same trick without creating a type, but it's this creation of a new type which allows you to get the recursion. And it's exactly your missing recursion which allows lists to have a single type.
1: On a related note, cons should be called something like singleton, and prepend should be cons, but that's not important right now.
You can implement the datatype List a as a pair (f, n) where f :: Nat -> a and n :: Nat, where n is the length of the list:
type List a = (Int -> a, Int)
Implementing the empty list, the list operations cons, head, tail, and null, and a function convert :: List a -> [a] is left as an easy exercise.
(Disclaimer: stole this from Bird's Introduction to Functional Programming in Haskell.)
Of course, you could represent tuples via functions as well. And then True and False and the natural numbers ...

Resources