Why isn't zip generic with respect to argument count? - haskell

Haskell newbie here. It is my observation that:
zip and zip3 are important functions - they are included in the Prelude, implemented by many other languages, and represent a common operation in mathematics (transposition)
they are not generic with respect to parameter structure
they are easy to implement in traditional languages such as C or C++ (say 20 hours' work); Python already has it as a built-in
Why is zip so restricted? Is there an abstraction, generalizing it? Something wrong with n-sized tuples?

Because the suggested duplicates answer most of this, I will focus on the questions in your follow-up comment.
1) why is the standard implementation for fixed n = 2
zipWith is for 2 arguments, and repeat is for 0 arguments. This is enough to get arbitrary-arity zips. For example, the 1 argument version (also called map) can be implemented as
map f = zipWith ($) (repeat f)
and the 3 argument version as
zipWith3 f = (.) (zipWith ($)) . zipWith f
and so on. There is a pretty pattern to the implementations of larger zips (admittedly not obvious from this small sample size), as the sketch below illustrates. This result is analogous to the one in category theory which says that any category with 0-ary and 2-ary products has all finitary products.
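To make the pattern concrete, here is a sketch (the primed names are mine, not Prelude functions) of how each extra list costs exactly one more zipWith ($):
-- Each additional list is handled by one more application of
-- zipWith ($) to the list of partially applied functions.
zipWith3' :: (a -> b -> c -> d) -> [a] -> [b] -> [c] -> [d]
zipWith3' f as bs = zipWith ($) (zipWith f as bs)

zipWith4' :: (a -> b -> c -> d -> e) -> [a] -> [b] -> [c] -> [d] -> [e]
zipWith4' f as bs cs = zipWith ($) (zipWith3' f as bs cs)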
The other half of the answer, I suppose, is that type-level numbers (which are the most frequent implementation technique for arbitrary-arity zips) are possible but annoying to use, and avoiding them tends to reduce both term- and type-level noise.
2) I need to pass the number of lists, that's unwieldy
Use ZipList. You don't need to pass the number of lists (though you do need to write one infix operator per list -- a very light requirement, I think, as even in Python you need a comma between each list).
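For instance, a small sketch using ZipList from Control.Applicative (the example lists are arbitrary):
import Control.Applicative (ZipList (..))

-- One <*> per extra list, instead of one zipN per arity.
zipped3 :: [(Int, Char, Bool)]
zipped3 = getZipList $
  (,,) <$> ZipList [1, 2, 3]
       <*> ZipList "abc"
       <*> ZipList [True, False, True]
-- zipped3 == [(1,'a',True),(2,'b',False),(3,'c',True)]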
Empirically: I have not found arbitrary-arity zips such a common need that I would label it "unwieldy".
3) even if I define my own zip, there will be collisions with Prelude.zip.
So pick another name...?

Because the type signatures would have to differ. Compare the signatures of zip and zip3:
zip :: [a] -> [b] -> [(a, b)]
zip3 :: [a] -> [b] -> [c] -> [(a, b, c)]
zip3 takes one more argument than zip, and its result type changes accordingly. Because of currying, Haskell does not allow a single name to be polymorphic over the number of arguments. (There are good explanations of currying elsewhere on SO.)
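To see why currying rules this out, consider what a partial application of zip already means (pairs and partially are illustrative names):
-- zip applied to one list is already a complete, well-typed function;
-- there is no point at which a "variadic" zip could decide to stop
-- accepting lists and return a result instead.
pairs :: [(Int, Char)]
pairs = zip [1, 2, 3] "abc"

partially :: [Char] -> [(Int, Char)]
partially = zip [1, 2, 3]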

Related

Are Lists Inductive or Coinductive in Haskell?

So I've been reading about coinduction a bit lately, and now I'm wondering: are Haskell lists inductive or coinductive? I've also heard that Haskell doesn't distinguish the two, but if so, how is that formalized?
Lists are defined inductively, data [a] = [] | a : [a], yet can be used coinductively, ones = 1 : ones. We can create infinite lists. Yet, we can create finite lists. So which are they?
Related is in Idris, where the type List a is strictly an inductive type, and is thus only finite lists. It's defined akin to how it is in Haskell. However, Stream a is a coinductive type, modeling an infinite list. It's defined as (or rather, the definition is equivalent to) codata Stream a = a :: (Stream a). It's impossible to create an infinite List or a finite Stream. However, when I write the definition
codata HList : Type -> Type where
  Nil : HList a
  Cons : a -> HList a -> HList a
I get the behavior that I expect from Haskell lists, namely that I can make both finite and infinite structures.
So let me boil them down to a few core questions:
Does Haskell not distinguish between inductive and coinductive types? If so, what's the formalization for that? If not, then which is [a]?
Is HList coinductive? If so, how can a coinductive type contain finite values?
What about if we defined data HList' a = L (List a) | R (Stream a)? What would that be considered and/or would it be useful over just HList?
Due to laziness, Haskell types are both inductive and coinductive; that is, there is no formal distinction between data and codata. All recursive types can contain an infinite nesting of constructors. In languages such as Idris, Coq, and Agda, a definition like ones = 1 : ones is rejected by the termination checker. Laziness means that ones can be evaluated in one step to 1 : ones, whereas the other languages evaluate to normal form only, and ones does not have a normal form.
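For example, a direct transcription of the ones example (the helper name firstFive is mine):
-- Accepted by GHC, but rejected by the termination checkers of
-- Idris, Coq, and Agda:
ones :: [Int]
ones = 1 : ones

-- Laziness evaluates ones one constructor at a time, so any finite
-- observation of the infinite list terminates.
firstFive :: [Int]
firstFive = take 5 ones -- [1,1,1,1,1]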
'Coinductive' does not mean 'necessarily infinite'; it means 'defined by how it is deconstructed', whereas inductive means 'defined by how it is constructed'. I think this is an excellent explanation of the subtle difference. Surely you would agree that the type
codata A : Type where MkA : A
cannot be infinite.
This is an interesting one. As opposed to HList, for which you can never 'know' whether it is finite or infinite (specifically, you can discover in finite time that a list is finite, but you can't compute that it is infinite), HList' gives you a simple way to decide in constant time whether your list is finite or infinite.
In a total language like Coq or Agda, inductive types are those whose values can be torn down in finite time. Inductive functions must terminate. Coinductive types, on the other hand, are those whose values can be built up in finite time. Coinductive functions must be productive.
Systems that are intended to be useful as proof assistants (like Coq and Agda) must be total, because non-termination causes a system to be logically inconsistent. But requiring all functions to be total and inductive makes it impossible to work with infinite structures, thus, coinduction was invented.
So the purpose of inductive and coinductive types is to reject possibly non-terminating programs. Here's an example in Agda of a function which is rejected because of the productivity condition. (The function you pass to filter could reject every element, so you could be waiting forever for the next element of the resulting stream.)
filter : {A : Set} -> (A -> Bool) -> Stream A -> Stream A
filter f xs with f (head xs)
... | true = head xs :: filter f (tail xs)
... | false = filter f (tail xs) -- unguarded recursion
Now, Haskell has no notion of inductive or coinductive types. The question "Is this type inductive or coinductive?" is not a meaningful one. How does Haskell get away without making the distinction? Well, Haskell was never intended to be consistent as a logic in the first place. It's a partial language, which means that you're allowed to write non-terminating and non-productive functions - there's no termination checker and no productivity checker. One can debate the wisdom of this design decision, but it certainly renders the distinction between induction and coinduction redundant.
Instead, Haskell programmers are used to reasoning informally about a program's termination/productivity. Laziness lets us work with infinite data structures, but we don't get any help from the machine to ensure that our functions are total.
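For contrast, a sketch of the same filter hazard in Haskell, which type-checks without complaint (the names here are mine):
-- The predicate rejects every element, so filter never produces one;
-- demanding even the first element of the result diverges.
noneMatch :: [Int]
noneMatch = filter (> 10) (repeat 0)

hangs :: Int
hangs = head noneMatch -- well-typed, but never finishes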
To interpret type-level recursion one needs to find a "fixed point" for the
CPO-valued list functor
F X = (1 + A_bot * X)_bot
If we reason inductively, we want the fixed point to be "least". If coinductively, "greatest".
Technically, this is done by working in the embedding-projection subcategory of CPO_bot, taking e.g. for the "least" the colimit of the diagram of embeddings
0_bot |-> F 0_bot |-> F (F 0_bot) |-> ...
generalizing the Kleene's fixed point theorem. For the "greatest" we would take the limit of the diagram of the projections
0_bot <-| F 0_bot <-| F (F 0_bot) <-| ...
It however turns out that the "least" is isomorphic to the "greatest", for any F. This is the "bilimit" theorem (see e.g. Abramsky's "Domain Theory" survey paper).
Perhaps surprisingly, it turns out that the inductive or coinductive flavor comes from the liftings applied by F, rather than from the choice between least and greatest fixed points. For instance, if x is the smash product and # is the smash sum,
F X = 1_bot # (A_bot x X)
would have as a bilimit the set of finite lists (up to iso).
[I hope I got the liftings right -- these are tricky ;-) ]

Is there significance in the order of Haskell function parameters?

I've been learning Haskell and I noticed that many of the built in functions accept parameters in an order counter intuitive to what I would expect. For example:
replicate :: Int -> a -> [a]
If I want to replicate 7 twice, I would write replicate 2 7. But when read out loud in English, the function call feels like it is saying "Replicate 2, 7 times". If I would have written the function myself, I would have swapped the first and second arguments so that replicate 7 2 would read "replicate 7, 2 times".
Some other examples appeared when I was going through 99 Haskell Problems. I had to write a function:
dropEvery :: [a] -> Int -> [a]
It takes a list as its first argument and an Int as its second. Intuitively, I would have written the header as dropEvery :: Int -> [a] -> [a] so that dropEvery 3 [1..100] would read as: "drop every third element in the list [1..100]". But in the question's example, it would look like: dropEvery [1..100] 3.
I've also seen this with other functions that I cannot find right now. Is it common to write functions in such a way due to a practical reason or is this all just in my head?
It's common practice in Haskell to order function parameters so that parameters which "configure" an operation come first, and the "main thing being operated on" comes last. This is often counterintuitive coming from other languages, since it tends to mean you end up passing the "least important" information first. It's especially jarring coming from OO, where the "main" argument is usually the object on which the method is being invoked, occurring so early in the call that it's out of the parameter list entirely!
There's a method to our madness though. The reason we do this is that partial application (through currying) is so easy and so widely used in Haskell. Say I have functions like foo :: Some -> Config -> Parameters -> DataStructure -> DataStructure and bar :: Different -> Config -> DataStructure -> DataStructure. When you're not used to higher-order thinking you just see these as things you call to transform a data structure. But you can also use either of them as a factory for "DataStructure transformers": functions of the type DataStructure -> DataStructure.
It's very likely that there are other operations that are configured by such DataStructure -> DataStructure functions; at the very least there's fmap for turning transformers of DataStructures into transformers of functors of DataStructures (lists, Maybes, IOs, etc).
We can take this a bit further sometimes too. Consider foo :: Some -> Config -> Parameters -> DataStructure -> DataStructure again. If I expect that callers of foo will often call it many times with the same Some and Config, but varying Parameters, then even-more-partial applications become useful.
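A hypothetical sketch of this usage pattern (DataStructure, foo, and the concrete types are invented for illustration):
type DataStructure = [Int]

-- "Configuring" parameters first, the structure being operated on last.
foo :: Int -> Bool -> DataStructure -> DataStructure
foo factor keepEvens =
  (if keepEvens then filter even else id) . map (* factor)

-- Partial application gives a reusable transformer...
transform :: DataStructure -> DataStructure
transform = foo 2 True

-- ...which slots directly into fmap, composition, and friends.
transformAll :: Maybe DataStructure -> Maybe DataStructure
transformAll = fmap transform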
Of course, even if the parameters are in the "wrong" order for my partial application I can still do it, using combinators like flip and/or creating wrapper functions/lambdas. But this results in a lot of "noise" in my code, meaning that a reader has to be able to puzzle out what is the "important" thing being done and what's just adapting interfaces.
So the basic theory is for a function writer to try to anticipate the usage patterns of the function, and list its arguments in order from "most stable" to "least stable". This isn't the only consideration of course, and often there are conflicting patterns and no clear "best" order.
But "the order the parameters would be listed in an English sentence describing the function call" would not be something I would give much weight to in designing a function (and not in other languages either). Haskell code just does not read like English (nor does code in most other programming languages), and trying to make it closer in a few cases doesn't really help.
For your specific examples:
For replicate, it seems to me like the a parameter is the "main" argument, so I would put it last, as the standard library does. There's not a lot in it though; it doesn't seem very much more useful to choose the number of replications first and have an a -> [a] function than it would be to choose the replicated element first and have an Int -> [a] function.
dropEvery indeed seems to take its arguments in a wonky order, but not because we say in English "drop every Nth element in a list". Functions that take a data structure and return a "modified version of the same structure" should almost always take the data structure as their last argument, with the parameters that configure the "modification" coming first.
One of the reasons functions are written this way is because their curried forms turn out to be useful.
For example, consider the functions map and filter:
map :: (a -> b) -> [a] -> [b]
filter :: (a -> Bool) -> [a] -> [a]
If I wanted to keep the even numbers in a list and then divide them by 2, I could write:
myfunc :: [Int] -> [Int]
myfunc as = map (`div` 2) (filter even as)
which may also be written this way:
myfunc = map (`div` 2) . filter even
         \____ 2 ____/   \___ 1 ___/
Envision this as a pipeline going from right to left:
first we keep the even numbers (step 1)
then we divide each number by 2 (step 2)
The . operator acts as a way of joining pipeline segments together - much like how the | operator works in the Unix shell.
This is all possible because the list arguments of map and filter are the last parameters of those functions.
If you write your dropEvery with this signature:
dropEvery :: Int -> [a] -> [a]
then we can include it in one of these pipelines, e.g.:
myfunc2 = dropEvery 3 . map (`div` 2) . filter even
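One possible implementation with that signature (a sketch; the standard library has no dropEvery):
-- Tag each element with its 1-based index, drop every n-th, untag.
dropEvery :: Int -> [a] -> [a]
dropEvery n = map snd . filter ((/= 0) . (`mod` n) . fst) . zip [1 ..]
-- dropEvery 3 [1..9] == [1,2,4,5,7,8]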
To add to the other answers, there's also often an incentive to make the last argument be the one whose construction is likely to be most complicated and/or to be a lambda abstraction. This way one can write
f some little bits $
big honking calculation
over several lines
rather than having the big calculation surrounded by parentheses and a few little arguments trailing off at the end.
If you wish to flip arguments, just use the flip function from the Prelude:
replicate' = flip replicate
> :t replicate'
replicate' :: a -> Int -> [a]

Can all recursive structures be replaced by a non recursive solution?

For example, could you define a list in Haskell without defining a recursive structure? Or replace all lists by some function(s)?
data List a = Empty | Cons a (List a) -- <- recursive definition
EDIT
I gave the list as an example, but I was really asking about all data structures in general.
Maybe we only need one recursive data structure for all cases where recursion is needed? Like the Y combinator being the only recursive function needed. @TikhonJelvis's answer made me think about that.
Now I'm pretty sure this post is better suited for cs.stackexchange.
About the currently selected answer
I was really looking for answers that looked more like the ones given by @DavidYoung and @TikhonJelvis, but they only give a partial answer, and I appreciate them.
So, if anyone has an answer that uses functional concepts, please share.
That's a bit of an odd question. I think the answer is not really, but the definition of the data type does not have to be directly recursive.
Ultimately, lists are recursive data structures. You can't define them without having some sort of recursion somewhere. It's core to their essence.
However, we don't have to make the actual definition of List recursive. Instead, we can factor out recursion into a single data type Fix and then define all other recursive types with it. In a sense, Fix just captures the essence of what it means for a data structure to be recursive. (It's the type-level version of the fix function, which does the same thing for functions.)
data Fix f = Roll (f (Fix f))
The idea is that Fix f corresponds to f applied to itself repeatedly. To make it work with Haskell's algebraic data types, we have to throw in a Roll constructor at every level, but this does not change what the type represents.
Essentially, f applied to itself repeatedly like this is the essence of recursion.
Now we can define a non-recursive analog to List that takes an extra type argument f that replaces our earlier recursion:
data ListF a f = Empty | Cons a f
This is a straightforward data type that is not recursive.
If we combine the two, we get our old List type except with some extra Roll constructors at each recursive step.
type List a = Fix (ListF a)
A value of this type looks like this:
Roll (Cons 1 (Roll (Cons 2 (Roll Empty))))
It carries the same information as (Cons 1 (Cons 2 Empty)) or even just [1, 2], but with a few extra constructors sprinkled throughout.
So if you were given Fix, you could define List without using recursion. But this isn't particularly special because, in a sense, Fix is recursion.
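To see that nothing is lost, here is a sketch of the standard fold (catamorphism) over Fix, reusing the answer's definitions (cata and sumList are conventional names, not Prelude functions):
data Fix f = Roll (f (Fix f))
data ListF a f = Empty | Cons a f

instance Functor (ListF a) where
  fmap _ Empty         = Empty
  fmap g (Cons x rest) = Cons x (g rest)

-- The generic fold: peel one Roll, fold the inside, apply the algebra.
cata :: Functor f => (f a -> a) -> Fix f -> a
cata alg (Roll x) = alg (fmap (cata alg) x)

sumList :: Fix (ListF Int) -> Int
sumList = cata alg
  where
    alg Empty      = 0
    alg (Cons x s) = x + s
-- sumList (Roll (Cons 1 (Roll (Cons 2 (Roll Empty))))) == 3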
I'm not sure if all recursive structures can be replaced by a non-recursive version, but some certainly can, including lists. One possible way to do this is with what is called a Boehm-Berarducci encoding. This is a way to represent a structure as a function, specifically the fold over that structure (foldr in the case of a list):
{-# LANGUAGE RankNTypes #-}
type List a = forall x . (a -> x -> x) -> x -> x
--                       ^^^^^^^^^^^^^    ^
--                        Cons branch     Nil branch
(From the above link with slightly different formatting)
This type is also something like a case analysis over the list. The first argument represents the cons case and the second argument represents the nil case.
In general, the branches of a sum type become different arguments to the function and fields of a product type become function types with an argument for each field. Note that in the encoding above, the nil branch is (in general) a non-function because the nil constructor takes no arguments, while the cons branch has two arguments since the cons constructor takes two arguments. The recursion parts of the definition are "replaced" with a Rank N type (called x here).
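A small sketch of the encoding in use (nil, cons, and toList are my names):
{-# LANGUAGE RankNTypes #-}

type List a = forall x . (a -> x -> x) -> x -> x

nil :: List a
nil _ n = n

cons :: a -> List a -> List a
cons x xs c n = c x (xs c n)

-- A value of this type is its own foldr:
toList :: List a -> [a]
toList xs = xs (:) []
-- toList (cons 1 (cons 2 nil)) == [1,2]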
I think this question breaks down into considering three distinct feature subsets that Haskell provides:
Facilities for defining new data types.
A repertoire of built-in types.
A foreign function interface that allows interfacing with functionality external to the language.
Looking only at (1), the native type definition facilities don't really provide for defining any infinitely-large types other than by recursion.
Looking at (2), however, Haskell 2010 provides the Data.Array module, which provides array types that together with (1) can be used to build non-recursive definitions of many different structures.
And even if the language did not provide arrays, (3) means that we could bolt them onto the language as an FFI extension. Haskell implementations are also allowed to provide extra functionality that can be used for this instead of the FFI, and many libraries for GHC exploit those (e.g., vector).
So I'd say that the best answer is that Haskell allows you to define non-recursive collection types only to the extent that it provides you with basic built-in ones that you can use as building blocks for more complex ones.
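As a sketch of that idea (Seq and its helpers are invented names; Data.Array ships with the standard array package):
import Data.Array (Array, listArray, (!))

-- A list-like structure whose definition contains no recursion:
-- just a size paired with a flat array.
data Seq a = Seq Int (Array Int a)

fromList :: [a] -> Seq a
fromList xs = Seq n (listArray (0, n - 1) xs)
  where n = length xs

index :: Seq a -> Int -> a
index (Seq _ arr) i = arr ! i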

Are haskell data types co-algebras by default?

I'm trying to get my head around F-algebras, and this article does a pretty good job. I understand the notion of a dual in category theory, but I'm having a hard time understanding how F-coalgebras (the dual of F-algebras) relate to lazy data structures in Haskell.
F-algebras are described by an endofunctor F together with a function F a -> a, which makes sense if you think of F a as an expression and a as the result of evaluating that expression, as the linked article explains it.
Being the dual of F-algebras, the corresponding function for an F-coalgebra would be a -> F a. Wikipedia says that F-coalgebras can be used to create infinite, lazy data structures. How does the a -> F a function allow one to create infinite, lazy data structures? Also, with that in mind, since Haskell is at its core lazy, are most data types in Haskell F-coalgebras instead of F-algebras? Are F-algebras not lazily evaluated?
If data types (or at least the ones that are capable of infinite data) are based on F-coalgebras in Haskell, what is the a -> F a function for lists, for example? What is the terminal F-coalgebra for lists?
Making an infinite list [1,2,3,4...] might look like this in Haskell:
list = 1 : map (+ 1) list
Does this use F-coalgebras somehow? Do infinite data structures require a notion of lazy evaluation and recursion alongside the use of F-coalgebras? Am I missing something here?
A coalgebra A -> F A can be used to peel away the outer layer of a (possibly infinite) data structure. For lists of X, the functor is F a = Maybe (X, a), the same as in the algebraic view. In Haskell the function for the coalgebra is
headView :: [a] -> Maybe (a, [a])
headView [] = Nothing
headView (x:xs) = Just (x,xs)
unfoldr is the unfold corresponding to this coalgebra, just like foldr is the fold corresponding to this algebra.
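For instance (countdown and nats are my names; unfoldr lives in Data.List):
import Data.List (unfoldr)

-- Finite: the coalgebra eventually answers Nothing.
countdown :: Int -> [Int]
countdown = unfoldr (\n -> if n <= 0 then Nothing else Just (n, n - 1))
-- countdown 3 == [3,2,1]

-- Infinite: the coalgebra always produces another layer, which is
-- fine under lazy evaluation but loops if you fold the whole result.
nats :: [Integer]
nats = unfoldr (\n -> Just (n, n + 1)) 0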
If you consider [a] not as the type of lists, but as the type of descriptions of lists or programs, then this allows you to construct (seemingly) infinite values, just with a necessarily finite description.
As you can see, a Haskell list looks like both an F-algebra and an F-coalgebra. This is possible because Haskell is not actually consistent. You can fold an unfold, and get an infinite loop. Languages like Coq and Agda make the distinction between data types (F-algebras) and codata types (F-coalgebras) explicit. In those languages you have two list types, an algebraic List and a coalgebraic Colist.

Closures and list comprehensions in Haskell

I'm playing around with Haskell at the moment and thus stumbled upon the list comprehension feature.
Naturally, I would have used a closure to do this kind of thing:
Prelude> [x|x<-[1..7],x>4] -- list comprehension
[5,6,7]
Prelude> filter (\x->x>4) [1..7] -- closure
[5,6,7]
I still don't have a feel for this language yet, so which way would a Haskell programmer go?
What are the differences between these two solutions?
Idiomatic Haskell would be filter (> 4) [1..7]
Note that you are not capturing any of the lexical scope in your closure, and are instead making use of a sectioned operator. That is to say, you want a partial application of >, which operator sections give you immediately. List comprehensions are sometimes attractive, but the usual perception is that they do not scale as nicely as the usual suite of higher order functions ("scale" with respect to more complex compositions). That kind of stylistic decision is, of course, largely subjective, so YMMV.
List comprehensions come in handy if the elements are somewhat complex and one needs to filter them by pattern matching, or the mapping part feels too complex for a lambda abstraction, which should be short (or so I feel), or if one has to deal with nested lists. In the latter case, a list comprehension is often more readable than the alternatives (to me, anyway).
For example something like:
[ (f b, (g . fst) a) | (Just a, Right bs) <- somelist, a `notElem` bs, (_, b) <- bs ]
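A smaller, more typical instance of that pattern-match filtering (catMaybes' is my name; Data.Maybe.catMaybes does the same job):
-- Elements that fail the Just pattern are silently skipped.
catMaybes' :: [Maybe a] -> [a]
catMaybes' xs = [x | Just x <- xs]
-- catMaybes' [Just 1, Nothing, Just 3] == [1,3]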
But for your example, the section (> 4) is a really nice way to write (\a -> a > 4), and because you use it only for filtering, most people would prefer Anthony's solution.
