Type-safe difference lists - haskell

A common idiom in Haskell, difference lists, is to represent a list xs as the value (xs ++). Then (.) becomes "(++)" and id becomes "[]" (in fact this works for any monoid or category). Since we can compose functions in constant time, this gives us a nice way to efficiently build up lists by appending.
Unfortunately the type [a] -> [a] is way bigger than the type of functions of the form (xs ++) -- most functions on lists do something other than prepend to their argument.
One approach around this (as used in dlist) is to make a special DList type with a smart constructor. Another approach (as used in ShowS) is to not enforce the constraint anywhere and hope for the best. But is there a nice way of keeping all the desired properties of difference lists while using a type that's exactly the right size?

Yes!
We can view [a] as a free monad instance Free ((,) a) ().
Thus we can apply the scheme described by Edward Kmett in Free Monads for Less.
The type we'll get is
newtype F a = F { runF :: forall r. (() -> r) -> ((a, r) -> r) -> r }
or simply
newtype F a = F { runF :: forall r. r -> (a -> r -> r) -> r }
So runF is nothing else than the foldr function for our list!
This is called the Boehm-Berarducci encoding and it's isomorphic to the original data type (list) — so this is as small as you can possibly get.
Will Ness says:
So this type is still too "wide", it allows more than just prefixing - doesn't constrain the g function argument.
If I understood his argument correctly, he points out that you can apply the foldr (or runF) function to something different from [] and (:).
But I never claimed that you can use foldr-encoding only for concatenation. Indeed, as this name implies, you can use it to calculate any fold — and that's what Will Ness demonstrated.
It may become more clear if you forget for a moment that there's one true list type, [a]. There may be lots of list types — e.g. I can define one by
data List a = Nil | Cons a (List a)
It's be different from, but isomorphic to [a].
The foldr-encoding above is just yet another encoding of lists, like List a or [a]. It is also isomorphic to [a], as evidenced by functions \l -> F (\a f -> foldr a f l) and \x -> runF [] (:) and the fact that their compositions (in either order) is identity. But you are not obliged to convert to [a] — you can convert to List directly, using \x -> runF x Nil Cons.
The important point is that F a doesn't contain an element that is not the foldr functions for some list — nor does it contain an element that is the foldr functions for more than one list (obviously).
Thus, it doesn't contain too few or too many elements — it contains precisely as many elements as needed to exactly represent all lists.
This is not true of the difference list encoding — for example, the reverse function is not an append operation for any list.

Related

Maybe monad and a list

Ok, so I am trying to learn how to use monads, starting out with maybe. I've come up with an example that I can't figure out how to apply it to in a nice way, so I was hoping someone else could:
I have a list containing a bunch of values. Depending on these values, my function should return the list itself, or a Nothing. In other words, I want to do a sort of filter, but with the consequence of a hit being the function failing.
The only way I can think of is to use a filter, then comparing the size of the list I get back to zero. Is there a better way?
This looks like a good fit for traverse:
traverse :: (Traversable t, Applicative f) => (a -> f b) -> t a -> f (t b)
That's a bit of a mouthful, so let's specialise it to your use case, with lists and Maybe:
GHCi> :set -XTypeApplications
GHCi> :t traverse #[] #Maybe
traverse #[] #Maybe :: (a -> Maybe b) -> [a] -> Maybe [b]
It works like this: you give it an a -> Maybe b function, which is applied to all elements of the list, just like fmap does. The twist is that the Maybe b values are then combined in a way that only gives you a modified list if there aren't any Nothings; otherwise, the overall result is Nothing. That fits your requirements like a glove:
noneOrNothing :: (a -> Bool) -> [a] -> Maybe [a]
noneOrNothing p = traverse (\x -> if p x then Nothing else Just x)
(allOrNothing would have been a more euphonic name, but then I'd have to flip the test with respect to your description.)
There are a lot of things we might discuss about the Traversable and Applicative classes. For now, I will talk a bit more about Applicative, in case you haven't met it yet. Applicative is a superclass of Monad with two essential methods: pure, which is the same thing as return, and (<*>), which is not entirely unlike (>>=) but crucially different from it. For the Maybe example...
GHCi> :t (>>=) #Maybe
(>>=) #Maybe :: Maybe a -> (a -> Maybe b) -> Maybe b
GHCi> :t (<*>) #Maybe
(<*>) #Maybe :: Maybe (a -> b) -> Maybe a -> Maybe b
... we can describe the difference like this: in mx >>= f, if mx is a Just-value, (>>=) reaches inside of it to apply f and produce a result, which, depending on what was inside mx, will turn out to be a Just-value or a Nothing. In mf <*> mx, though, if mf and mx are Just-values you are guaranteed to get a Just value, which will hold the result of applying the function from mf to the value from mx. (By the way: what will happen if mf or mx are Nothing?)
traverse involves Applicative because the combining of values I mentioned at the beginning (which, in your example, turns a number of Maybe a values into a Maybe [a]) is done using (<*>). As your question was originally about monads, it is worth noting that it is possible to define traverse using Monad rather than Applicative. This variation goes by the name mapM:
mapM :: (Traversable t, Monad m) => (a -> m b) -> t a -> m (t b)
We prefer traverse to mapM because it is more general -- as mentioned above, Applicative is a superclass of Monad.
On a closing note, your intuition about this being "a sort of filter" makes a lot of sense. In particular, one way to think about Maybe a is that it is what you get when you pick booleans and attach values of type a to True. From that vantage point, (<*>) works as an && for these weird booleans, which combines the attached values if you happen to supply two of them (cf. DarthFennec's suggestion of an implementation using any). Once you get used to Traversable, you might enjoy having a look at the Filterable and Witherable classes, which play with this relationship between Maybe and Bool.
duplode's answer is a good one, but I think it is also helpful to learn to operate within a monad in a more basic way. It can be a challenge to learn every little monad-general function, and see how they could fit together to solve a specific problem. So, here's a DIY solution that shows how to use do notation and recursion, tools which can help you with any monadic question.
forbid :: (a -> Bool) -> [a] -> Maybe [a]
forbid _ [] = Just []
forbid p (x:xs) = if p x
then Nothing
else do
remainder <- forbid p xs
Just (x : remainder)
Compare this to an implementation of remove, the opposite of filter:
remove :: (a -> Bool) -> [a] -> [a]
remove _ [] = []
remove p (x:xs) = if p x
then remove p xs
else
let remainder = remove p xs
in x : remainder
The structure is the same, with just a couple differences: what you want to do when the predicate returns true, and how you get access to the value returned by the recursive call. For remove, the returned value is a list, and so you can just let-bind it and cons to it. With forbid, the returned value is only maybe a list, and so you need to use <- to bind to that monadic value. If the return value was Nothing, bind will short-circuit the computation and return Nothing; if it was Just a list, the do block will continue, and cons a value to the front of that list. Then you wrap it back up in a Just.

Example of deep understanding of currying

Reading https://wiki.haskell.org/Currying
it states :
Much of the time, currying can be ignored by the new programmer. The
major advantage of considering all functions as curried is
theoretical: formal proofs are easier when all functions are treated
uniformly (one argument in, one result out). Having said that, there
are Haskell idioms and techniques for which you need to understand
currying.
What is a Haskell technique/idiom that a deeper understanding of currying is required ?
Partial function application isn't really a distinct feature of Haskell; it is just a consequence of curried functions.
map :: (a -> b) -> [a] -> [b]
In a language like Python, map always takes two arguments: a function of type a -> b and a list of type [a]
map(f, [x, y, z]) == [f(x), f(y), f(z)]
This requires you to pretend that the -> syntax is just for show, and that the -> between (a -> b) and [a] is not really the same as the one between [a] -> [b]. However, that is not the case; it's the exact same operator, and it is right-associative. The type of map can be explicitly parenthesized as
map :: (a -> b) -> ([a] -> [b])
and suddenly it seems much less interesting that you might give only one argument (the function) to map and get back a new function of type [a] -> [b]. That is all partial function application is: taking advantage of the fact that all functions are curried.
In fact, you never really give more than one argument to a function. To go along with -> being right-associative, function application is left-associative, meaning a "multi-argument" call like
map f [1,2,3]
is really two function applications, which becomes clearer if we parenthesize it.
(map f) [1,2,3]
map is first "partially" applied to one argument f, which returns a new function. This function is then applied to [1,2,3] to get the final result.

Are there useful applications for the Divisible Type Class?

I've lately been working on an API in Elm where one of the main types is contravariant. So, I've googled around to see what one can do with contravariant types and found that the Contravariant package in Haskell defines the Divisible type class.
It is defined as follows:
class Contravariant f => Divisible f where
divide :: (a -> (b, c)) -> f b -> f c -> f a
conquer :: f a
It turns out that my particular type does suit the definition of the Divisible type class. While Elm does not support type classes, I do look at Haskell from time to time for some inspiration.
My question: Are there any practical uses for this type class? Are there known APIs out there in Haskell (or other languages) that benefit from this divide-conquer pattern? Are there any gotchas I should be aware of?
Thank you very much for your help.
One example:
Applicative is useful for parsing, because you can turn Applicative parsers of parts into a parser of wholes, needing only a pure function for combining the parts into a whole.
Divisible is useful for serializing (should we call this coparsing now?), because you can turn Divisible serializers of parts into a serializer of wholes, needing only a pure function for splitting the whole into parts.
I haven't actually seen a project that worked this way, but I'm (slowly) working on an Avro implementation for Haskell that does.
When I first came across Divisible I wanted it for divide, and had no idea what possible use conquer could be other than cheating (an f a out of nowhere, for any a?). But to make the Divisible laws check out for my serializers conquer became a "serializer" that encodes anything to zero bytes, which makes a lot of sense.
Here's a possible use case.
In streaming libraries, one can have fold-like constructs like the ones from the foldl package, that are fed a sequence of inputs and return a summary value when the sequence is exhausted.
These folds are contravariant on their inputs, and can be made Divisible. This means that if you have a stream of elements where each element can be somehow decomposed into b and c parts, and you also happen to have a fold that consumes bs and another fold that consumes cs, then you can build a fold that consumes the original stream.
The actual folds from foldl don't implement Divisible, but they could, using a newtype wrapper. In my process-streaming package I have a fold-like type that does implement Divisible.
divide requires the return values of the constituent folds to be of the same type, and that type must be an instance of Monoid. If the folds return different, unrelated monoids, a workaround is to put each return value in a separate field of a tuple, leaving the other field as mempty. This works because a tuple of monoids is itself a Monoid.
I'll examine the example of the core data types in Fritz Henglein's generalized radix sort techniques as implemented by Edward Kmett in the discrimination package.
While there's a great deal going on there, it largely focuses around a type like this
data Group a = Group (forall b . [(a, b)] -> [[b]])
If you have a value of type Group a you essentially must have an equivalence relationship on a because if I give you an association between as and some type b completely unknown to you then you can give me "groupings" of b.
groupId :: Group a -> [a] -> [[a]]
groupId (Group grouper) = grouper . map (\a -> (a, a))
You can see this as a core type for writing a utility library of groupings. For instance, we might want to know that if we can Group a and Group b then we can Group (a, b) (more on this in a second). Henglein's core idea is that if you can start with some basic Groups on integers—we can write very fast Group Int32 implementations via radix sort—and then use combinators to extend them over all types then you will have generalized radix sort to algebraic data types.
So how might we build our combinator library?
Well, f :: Group a -> Group b -> Group (a, b) is pretty important in that it lets us make groups of product-like types. Normally, we'd get this from Applicative and liftA2 but Group, you'll notice, is Contravaiant, not a Functor.
So instead we use Divisible
divided :: Group a -> Group b -> Group (a, b)
Notice that this arises in a strange way from
divide :: (a -> (b, c)) -> Group b -> Group c -> Group a
as it has the typical "reversed arrow" character of contravariant things. We can now understand things like divide and conquer in terms of their interpretation on Group.
Divide says that if I want to build a strategy for equating as using strategies for equating bs and cs, I can do the following for any type x
Take your partial relation [(a, x)] and map over it with a function f :: a -> (b, c), and a little tuple manipulation, to get a new relation [(b, (c, x))].
Use my Group b to discriminate [(b, (c, x))] into [[(c, x)]]
Use my Group c to discriminate each [(c, x)] into [[x]] giving me [[[x]]]
Flatten the inner layers to get [[x]] like we need
instance Divisible Group where
conquer = Group $ return . fmap snd
divide k (Group l) (Group r) = Group $ \xs ->
-- a bit more cleverly done here...
l [ (b, (c, d)) | (a,d) <- xs, let (b, c) = k a] >>= r
We also get interpretations of the more tricky Decidable refinement of Divisible
class Divisible f => Decidable f where
lose :: (a -> Void) -> f a
choose :: (a -> Either b c) -> f b -> f c -> f a
instance Decidable Group where
lose :: (a -> Void) -> Group a
choose :: (a -> Either b c) -> Group b -> Group c -> Group a
These read as saying that for any type a of which we can guarantee there are no values (we cannot produce values of Void by any means, a function a -> Void is a means of producing Void given a, thus we must not be able to produce values of a by any means either!) then we immediately get a grouping of zero values
lose _ = Group (\_ -> [])
We also can go a similar game as to divide above except instead of sequencing our use of the input discriminators, we alternate.
Using these techniques we build up a library of "Groupable" things, namely Grouping
class Grouping a where
grouping :: Group a
and note that nearly all the definitions arise from the basic definition atop groupingNat which uses fast monadic vector manipuations to achieve an efficient radix sort.

Would the ability to detect cyclic lists in Haskell break any properties of the language?

In Haskell, some lists are cyclic:
ones = 1 : ones
Others are not:
nums = [1..]
And then there are things like this:
more_ones = f 1 where f x = x : f x
This denotes the same value as ones, and certainly that value is a repeating sequence. But whether it's represented in memory as a cyclic data structure is doubtful. (An implementation could do so, but this answer explains that "it's unlikely that this will happen in practice".)
Suppose we take a Haskell implementation and hack into it a built-in function isCycle :: [a] -> Bool that examines the structure of the in-memory representation of the argument. It returns True if the list is physically cyclic and False if the argument is of finite length. Otherwise, it will fail to terminate. (I imagine "hacking it in" because it's impossible to write that function in Haskell.)
Would the existence of this function break any interesting properties of the language?
Would the existence of this function break any interesting properties of the language?
Yes it would. It would break referential transparency (see also the Wikipedia article). A Haskell expression can be always replaced by its value. In other words, it depends only on the passed arguments and nothing else. If we had
isCycle :: [a] -> Bool
as you propose, expressions using it would not satisfy this property any more. They could depend on the internal memory representation of values. In consequence, other laws would be violated. For example the identity law for Functor
fmap id === id
would not hold any more: You'd be able to distinguish between ones and fmap id ones, as the latter would be acyclic. And compiler optimizations such as applying the above law would not longer preserve program properties.
However another question would be having function
isCycleIO :: [a] -> IO Bool
as IO actions are allowed to examine and change anything.
A pure solution could be to have a data type that internally distinguishes the two:
import qualified Data.Foldable as F
data SmartList a = Cyclic [a] | Acyclic [a]
instance Functor SmartList where
fmap f (Cyclic xs) = Cyclic (map f xs)
fmap f (Acyclic xs) = Acyclic (map f xs)
instance F.Foldable SmartList where
foldr f z (Acyclic xs) = F.foldr f z xs
foldr f _ (Cyclic xs) = let r = F.foldr f r xs in r
Of course it wouldn't be able to recognize if a generic list is cyclic or not, but for many operations it'd be possible to preserve the knowledge of having Cyclic values.
In the general case, no you can't identify a cyclic list. However if the list is being generated by an unfold operation then you can. Data.List contains this:
unfoldr :: (b -> Maybe (a, b)) -> b -> [a]
The first argument is a function that takes a "state" argument of type "b" and may return an element of the list and a new state. The second argument is the initial state. "Nothing" means the list ends.
If the state ever recurs then the list will repeat from the point of the last state. So if we instead use a different unfold function that returns a list of (a, b) pairs we can inspect the state corresponding to each element. If the same state is seen twice then the list is cyclic. Of course this assumes that the state is an instance of Eq or something.

Creating a list type using functions

For a silly challenge I am trying to implement a list type using as little of the prelude as possible and without using any custom types (the data keyword).
I can construct an modify a list using tuples like so:
import Prelude (Int(..), Num(..), Eq(..))
cons x = (x, ())
prepend x xs = (x, xs)
head (x, _) = x
tail (_, x) = x
at xs n = if n == 0 then xs else at (tail xs) (n-1)
I cannot think of how to write an at (!!) function. Is this even possible in a static language?
If it is possible could you try to nudge me in the right direction without telling me the answer.
There is a standard trick known as Church encoding that makes this easy. Here's a generic example to get you started:
data Foo = A Int Bool | B String
fooValue1 = A 3 False
fooValue2 = B "hello!"
Now, a function that wants to use this piece of data must know what to do with each of the constructors. So, assuming it wants to produce some result of type r, it must at the very least have two functions, one of type Int -> Bool -> r (to handle the A constructor), and the other of type String -> r (to handle the B constructor). In fact, we could write the type that way instead:
type Foo r = (Int -> Bool -> r) -> (String -> r) -> r
You should read the type Foo r here as saying "a function that consumes a Foo and produces an r". The type itself "stores" a Foo inside a closure -- so that it will effectively apply one or the other of its arguments to the value it closed over. Using this idea, we can rewrite fooValue1 and fooValue2:
fooValue1 = \consumeA consumeB -> consumeA 3 False
fooValue2 = \consumeA consumeB -> consumeB "hello!"
Now, let's try applying this trick to real lists (though not using Haskell's fancy syntax sugar).
data List a = Nil | Cons a (List a)
Following the same format as before, consuming a list like this involves either giving a value of type r (in case the constructor was Nil) or telling what to do with an a and another List a, so. At first, this seems problematic, since:
type List a r = (r) -> (a -> List a -> r) -> r
isn't really a good type (it's recursive!). But we can instead demand that we first reduce all the recursive arguments to r first... then we can adjust this type to make something more reasonable.
type List a r = (r) -> (a -> r -> r) -> r
(Again, we should read the type List a r as being "a thing that consumes a list of as and produces an r".)
There's one final trick that's necessary. What we would like to do is to enforce the requirement that the r that our List a r returns is actually constructed from the arguments we pass. That's a little abstract, so let's give an example of a bad value that happens to have type List a r, but which we'd like to rule out.
badList = \consumeNil consumeCons -> False
Now, badList has type List a Bool, but it's not really a function that consumes a list and produces a Bool, since in some sense there's no list being consumed. We can rule this out by demanding that the type work for any r, no matter what the user wants r to be:
type List a = forall r. (r) -> (a -> r -> r) -> r
This enforces the idea that the only way to get an r that gets us off the ground is to use the (user-supplied) consumeNil function. Can you see how to make this same refinement for our original Foo type?
If it is possible could you try and nudge me in the right direction without telling me the answer.
It's possible, in more than one way. But your main problem here is that you've not implemented lists. You've implemented fixed-size vectors whose length is encoded in the type.
Compare the types from adding an element to the head of a list vs. your implementation:
(:) :: a -> [a] -> [a]
prepend :: a -> b -> (a, b)
To construct an equivalent of the built-in list type, you'd need a function like prepend with a type resembling a -> b -> b. And if you want your lists to be parameterized by element type in a straightforward way, you need the type to further resemble a -> f a -> f a.
Is this even possible in a static language?
You're also on to something here, in that the encoding you're using works fine in something like Scheme. Languages with "dynamic" systems can be regarded as having a single static type with implicit conversions and metadata attached, which obviously solves the type mismatch problem in a very extreme way!
I cannot think of how to write an at (!!) function.
Recalling that your "lists" actually encode their length in their type, it should be easy to see why it's difficult to write functions that do anything other than increment/decrement the length. You can actually do this, but it requires elaborate encoding and more advanced type system features. A hint in this direction is that you'll need to use type-level numbers as well. You'd probably enjoy doing this as an exercise as well, but it's much more advanced than encoding lists.
Solution A - nested tuples:
Your lists are really nested tuples - for example, they can hold items of different types, and their type reveals their length.
It is possible to write indexing-like function for nested tuples, but it is ugly, and it won't correspond to Prelude's lists. Something like this:
class List a b where ...
instance List () b where ...
instance List a b => List (b,a) b where ...
Solution B - use data
I recommend using data construct. Tuples are internally something like this:
data (,) a b = Pair a b
so you aren't avoiding data. The division between "custom types" and "primitive types" is rather artificial in Haskell, as opposed to C.
Solution C - use newtype:
If you are fine with newtype but not data:
newtype List a = List (Maybe (a, List a))
Solution D - rank-2-types:
Use rank-2-types:
type List a = forall b. b -> (a -> b -> b) -> b
list :: List Int
list = \n c -> c 1 (c 2 n) -- [1,2]
and write functions for them. I think this is closest to your goal. Google for "Church encoding" if you need more hints.
Let's set aside at, and just think about your first four functions for the moment. You haven't given them type signatures, so let's look at those; they'll make things much clearer. The types are
cons :: a -> (a, ())
prepend :: a -> b -> (a, b)
head :: (a, b) -> a
tail :: (a, b) -> b
Hmmm. Compare these to the types of the corresponding Prelude functions1:
return :: a -> [a]
(:) :: a -> [a] -> [a]
head :: [a] -> a
tail :: [a] -> [a]
The big difference is that, in your code, there's nothing that corresponds to the list type, []. What would such a type be? Well, let's compare, function by function.
cons/return: here, (a,()) corresponds to [a]
prepend/(:): here, both b and (a,b) correspond to [a]
head: here, (a,b) corresponds to [a]
tail: here, (a,b) corresponds to [a]
It's clear, then, that what you're trying to say is that a list is a pair. And prepend indicates that you then expect the tail of the list to be another list. So what would that make the list type? You'd want to write type List a = (a,List a) (although this would leave out (), your empty list, but I'll get to that later), but you can't do this—type synonyms can't be recursive. After all, think about what the type of at/!! would be. In the prelude, you have (!!) :: [a] -> Int -> a. Here, you might try at :: (a,b) -> Int -> a, but this won't work; you have no way to convert a b into an a. So you really ought to have at :: (a,(a,b)) -> Int -> a, but of course this won't work either. You'll never be able to work with the structure of the list (neatly), because you'd need an infinite type. Now, you might argue that your type does stop, because () will finish a list. But then you run into a related problem: now, a length-zero list has type (), a length-one list has type (a,()), a length-two list has type (a,(a,())), etc. This is the problem: there is no single "list type" in your implementation, and so at can't have a well-typed first parameter.
You have hit on something, though; consider the definition of lists:
data List a = []
| a : [a]
Here, [] :: [a], and (:) :: a -> [a] -> [a]. In other words, a list is isomorphic to something which is either a singleton value, or a pair of a value and a list:
newtype List' a = List' (Either () (a,List' a))
You were trying to use the same trick without creating a type, but it's this creation of a new type which allows you to get the recursion. And it's exactly your missing recursion which allows lists to have a single type.
1: On a related note, cons should be called something like singleton, and prepend should be cons, but that's not important right now.
You can implement the datatype List a as a pair (f, n) where f :: Nat -> a and n :: Nat, where n is the length of the list:
type List a = (Int -> a, Int)
Implementing the empty list, the list operations cons, head, tail, and null, and a function convert :: List a -> [a] is left as an easy exercise.
(Disclaimer: stole this from Bird's Introduction to Functional Programming in Haskell.)
Of course, you could represent tuples via functions as well. And then True and False and the natural numbers ...

Resources