Haskell Multidimensional Arrays with Compiler-enforced lengths

I've been trying out some Haskell because I was intrigued by the strong typing, and I'm confused about the best way to tackle this:
The Vector datatype defined in Data.Vector allows for multidimensional arrays by way of nested arrays. However, these are constructed from lists, and lists of varying lengths are considered the same datatype (unlike tuples of varying lengths).
How might I extend this datatype (or write an analogous one) that functions in the same way, except that vectors of different lengths are considered to be different datatypes, so any attempt to create a multidimensional array/matrix with rows of differing lengths (for example) would result in a compile-time error?
It seems that tuples manage this by way of writing out 63 different definitions (one for each valid length), but I would like to be able to handle vectors of arbitrary length, if possible.

I see two ways of doing this:
1) The "typed" way: using dependent types. This is, to some extent, possible in Haskell with the recent DataKinds extension for GHC*. Even better, use a language with a really advanced type system, like Agda.
2) The other way: encode your vectors like
data Vec a = Vec { values :: [a], len :: Int }
Then, export only
buildVec :: [a] -> Vec a
buildVec as = Vec as (length as)
and check for correct lengths in the other functions that require vectors of the same length, e.g. ensure same-length rows in a matrix function or equal lengths in Vec addition. Or even better: provide another custom builder/constructor for matrices, as sketched below.
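For example, a checked matrix builder on top of Vec and buildVec might look like this (a sketch; Matrix and buildMatrix are illustrative names I'm introducing, not part of any library):

newtype Matrix a = Matrix [Vec a]

-- Reject rows of differing lengths at construction time.
buildMatrix :: [[a]] -> Maybe (Matrix a)
buildMatrix rows
  | allSameLength = Just (Matrix (map buildVec rows))
  | otherwise     = Nothing
  where
    allSameLength = case map length rows of
      []       -> True
      (n : ns) -> all (== n) ns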
*I just saw: exactly what you're wanting is the standard example for DataKinds.

This form of typing, where the type depends on the value, is often called dependently typed programming, and, as luck would have it, Wolfgang Jeltsch wrote a blog post about dependent types in Haskell using GADTs and TypeFamilies.
The gist of the blogpost is that if we have two types representing natural numbers:
data Zero
data Succ nat
one can build lists with type enforced lengths in the following way:
data List el len where
  Empty :: List el Zero
  Cons  :: el -> List el nat -> List el (Succ nat)
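To see the compile-time guarantee in action, here is a minimal sketch building on those definitions (safeHead is an illustrative name of mine; it needs the GADTs extension):

-- safeHead only accepts lists whose type proves they are non-empty,
-- so there is no runtime failure case to handle.
safeHead :: List el (Succ nat) -> el
safeHead (Cons x _) = x

ok :: Int
ok = safeHead (Cons 1 (Cons 2 Empty))

-- Rejected at compile time: Empty :: List el Zero does not match
-- List el (Succ nat).
-- bad = safeHead Empty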


Finding the number of elements in a matrix

I'm a Haskell newcomer, so cut me a bit of slack :P
I need to write a Haskell function that goes through a matrix and outputs a list of all elements matching a given element (like using filter), and then matches that list against another to check if they are the same.
checkMatrix :: Matrix a -> a -> [a] -> Bool
I have tried variations of using filter, and using the !! operator and I can't figure it out. I don't really want to get the answer handed to me, just need some pointers for getting me on the right path
checkMatrix :: Matrix a -> a -> [a] -> Bool
checkMatrix matr a lst = case matr of
  x:xs | [] -> (i don't really know what to put for the base case)
       | filter (== True) (x:xs !! 0) -> checkMatrix xs a lst
That's all I've got; I'm really very lost as to what to do next.
tl;dr You want something to the effect of filter someCondition (toList matrix) == otherList, with minor details varying depending on your matrix type and your specific needs.
The Full Answer
I don't know what Matrix type you're using, but the approach is going to be similar for any reasonably defined matrix type.
For this answer, I'll assume you're using the Data.Matrix module from the matrix package on Hackage.
You are right to think you should use filter. Thinking functionally, you want to eliminate some elements from the matrix and keep others, based on a condition. However, a matrix does not provide a natural way to perform filter on it, as the idea is not really well-defined. So, instead, we want to extract the elements from our matrix into a list first. The matrix package provides the following function, which does just that.
toList :: Matrix a -> [a]
Once you have a list representation, you can very easily use filter to get the elements that you want.
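Putting those two steps together, here is a minimal sketch (assuming the matrix package; the example values are mine):

import Data.Matrix (Matrix, fromLists, toList)

-- Collect every element equal to the target, then compare the
-- result against the expected list.
checkMatrix :: Eq a => Matrix a -> a -> [a] -> Bool
checkMatrix matr a lst = filter (== a) (toList matr) == lst

main :: IO ()
main = print (checkMatrix (fromLists [[1, 2], [2, 3]]) 2 [2, 2])  -- True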
A few caveats and notes.
If the matrix package that you're using doesn't define toList itself, check if it defines a Foldable instance for the matrix type. If it does, then Data.Foldable has a general-purpose toList that works for all Foldable types.
Be careful with the ordering here. It's not entirely clear what order the elements should be put into the list in, since matrices are two-dimensional and lists are inherently one-dimensional. If the ordering matters for whatever you're doing, you might have to put some additional effort into guaranteeing the desired order. If it does not matter, consider using Data.Set or some other unordered collection instead of lists.
I don't see any constraints in your checkMatrix implementation. Remember that comparing elements of lists adds an Eq a constraint, and if you want to use an unordered collection then that's going to add Ord a instead.

Type constraints on dimensionality of vectors in F# and Haskell (Dependent Types)

I'm new to F# and Haskell and am implementing a project in order to determine which language I would prefer to devote more time to.
I have numerous situations where I expect a given numerical type to have given dimensions based on parameters given to a top-level function (i.e., at runtime). For example, in this F# snippet, I have
type DataStreamItem = LinearAlgebra.Vector<float32>

type Ball =
    { R : float32
      X : DataStreamItem }
and I expect all instances of type DataStreamItem to have D dimensions.
My question is in the interests of algorithm development and debugging, since such shape-mismatch bugs can be a headache to pin down but should be a non-issue when the algorithm is up and running:
Is there a way, in either F# or Haskell, to constrain DataStreamItem and / or Ball to have dimensions of D? Or do I need to resort to pattern matching on every calculation?
If the latter is the case, are there any good, light-weight paradigms to catch such constraint violations as soon as they occur (and that can be removed when performance is critical)?
Edit:
To clarify the sense in which D is constrained:
D is defined such that if you expressed the algorithm of the function main(DataStream) as a computation graph, all of the intermediate calculations would depend on the dimension D for the execution of main(DataStream). The simplest example I can think of would be a dot product of M with DataStreamItem: the dimension of DataStream would determine the dimension parameters of M.
Another Edit:
A week later, I found the following blog post outlining precisely what I was looking for in dependent types in Haskell:
https://blog.jle.im/entry/practical-dependent-types-in-haskell-1.html
And Another:
This Reddit thread contains some discussion on Dependent Types in Haskell and a link to the quite interesting dissertation proposal of R. Eisenberg.
Neither Haskell's nor F#'s type system is rich enough to (directly) express statements of the sort "N nested instances of a recursive type T, where N is between 2 and 6" or "a string of characters exactly 6 long". Not in those exact terms, at least.
I mean, sure, you can always express such a 6-long string type as type String6 = String6 of char*char*char*char*char*char or some variant of the sort (which technically should be enough for your particular example with vectors, unless you're not telling us the whole example), but you can't say something like type String6 = s:string{s.Length=6} and, more importantly, you can't define functions of the form concat: String<n> -> String<m> -> String<n+m>, where n and m represent string lengths.
But you're not the first person asking this question. This research direction does exist, and is called "dependent types", and I can express the gist of it most generally as "having higher-order, more powerful operations on types" (as opposed to just union and intersection, as we have in ML languages) - notice how in the example above I parametrize the type String with a number, not another type, and then do arithmetic on that number.
The most prominent language prototypes (that I know of) in this direction are Agda, Idris, F*, and Coq (not really the full deal AFAIK). Check them out, but beware: this is kind of the edge of tomorrow, and I wouldn't advise starting a big project based on those languages.
(edit: apparently you can do certain tricks in Haskell to simulate dependent types, but it's not very convenient, and you have to enable UndecidableInstances)
Alternatively, you could go with a weaker solution of doing the checks at runtime. The general gist is: wrap your vector types in a plain wrapper, don't allow direct construction of it, but provide constructor functions instead, and make those constructor functions ensure the desired property (i.e. length). Something like:
type Stream4 = private Stream4 of DataStreamItem with
    static member create (item: DataStreamItem) =
        if item.Length = 4 then Some (Stream4 item)
        else None

    // Alternatively, fail loudly instead of returning an option
    // (createOrFail is an illustrative name):
    static member createOrFail (item: DataStreamItem) =
        if item.Length <> 4 then failwith "Expected a 4-long vector."
        Stream4 item
Here is a fuller explanation of the approach from Scott Wlaschin: constrained strings.
So if I understood correctly, you're actually not doing any type-level arithmetic, you just have a “length tag” that's shared in a chain of function calls.
This has long been possible to do in Haskell; one way that I consider quite elegant is to annotate your arrays with a standard fixed-length type of the desired length:
import qualified Data.Vector.Unboxed as VU

newtype FixVect v s = FixVect { getFixVect :: VU.Vector s }
To ensure the correct length, you only provide (polymorphic) smart constructors that construct from the fixed-length type – perfectly safe, though the actual dimension number is nowhere mentioned!
-- VectorSpace and Scalar as in the vector-space package:
class VectorSpace v => FiniteDimensional v where
  asFixVect :: v -> FixVect v (Scalar v)

instance FiniteDimensional Float where
  asFixVect s = FixVect $ VU.singleton s

instance (FiniteDimensional a, FiniteDimensional b, Scalar a ~ Scalar b) => FiniteDimensional (a,b) where
  asFixVect (a,b) = case (asFixVect a, asFixVect b) of
    (FixVect av, FixVect bv) -> FixVect $ av <> bv
This construction from unboxed tuples is really inefficient; however, that doesn't mean you can't write efficient programs with this paradigm: if the dimension always stays constant, you only need to wrap and unwrap once, and can do all the critical operations through safe yet runtime-unchecked zips, folds and linear-algebra combinations.
Regardless, this approach isn't really widely used. Perhaps the single constant dimension is in fact too limiting for most relevant operations, and if you need to unwrap to tuples often, it's way too inefficient. Another approach that is taking off these days is to actually tag the vectors with type-level numbers. Such numbers have become available in a usable form with the introduction of data kinds in GHC 7.4. Up until now, they're still rather unwieldy and not fit for proper arithmetic, but the upcoming GHC 8.0 will greatly improve many aspects of this dependently-typed programming in Haskell.
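As a taste of that type-level-numbers approach, here's a sketch against GHC.TypeLits (SVec, fromVector and zipWithSV are illustrative names of mine, not a published API):

{-# LANGUAGE DataKinds, KindSignatures, ScopedTypeVariables #-}

import Data.Proxy (Proxy (..))
import GHC.TypeLits (KnownNat, Nat, natVal)
import qualified Data.Vector.Unboxed as VU

-- A vector carrying its length as a type-level natural.
newtype SVec (n :: Nat) a = SVec (VU.Vector a)

-- Smart constructor: check the runtime length against the type-level one.
fromVector :: forall n a. (KnownNat n, VU.Unbox a)
           => VU.Vector a -> Maybe (SVec n a)
fromVector v
  | VU.length v == fromIntegral (natVal (Proxy :: Proxy n)) = Just (SVec v)
  | otherwise = Nothing

-- Once the lengths agree by type, zipping needs no runtime check.
zipWithSV :: (VU.Unbox a, VU.Unbox b, VU.Unbox c)
          => (a -> b -> c) -> SVec n a -> SVec n b -> SVec n c
zipWithSV f (SVec u) (SVec v) = SVec (VU.zipWith f u v)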
A library that offers efficient length-indexed arrays is linear.

Can all recursive structures be replaced by a non-recursive solution?

For example, could you define a list in Haskell without defining a recursive structure? Or replace all lists by some function(s)?
data List a = Empty | Cons a (List a) -- <- recursive definition
EDIT
I gave the list as an example, but I was really asking about all data structures in general.
Maybe we only need one recursive data structure for all cases where recursion is needed? Like the Y combinator being the only recursive function needed. @TikhonJelvis's answer made me think about that.
Now I'm pretty sure this post is better suited for cs.stackexchange.
About the currently selected answer
I was really looking for answers that looked more like the ones given by @DavidYoung and @TikhonJelvis, but they only give partial answers, and I appreciate them.
So, if any has an answer that uses functional concepts, please share.
That's a bit of an odd question. I think the answer is not really, but the definition of the data type does not have to be directly recursive.
Ultimately, lists are recursive data structures. You can't define them without having some sort of recursion somewhere. It's core to their essence.
However, we don't have to make the actual definition of List recursive. Instead, we can factor out recursion into a single data type Fix and then define all other recursive types with it. In a sense, Fix just captures the essence of what it means for a data structure to be recursive. (It's the type-level version of the fix function, which does the same thing for functions.)
data Fix f = Roll (f (Fix f))
The idea is that Fix f corresponds to f applied to itself repeatedly. To make it work with Haskell's algebraic data types, we have to throw in a Roll constructor at every level, but this does not change what the type represents.
Essentially, f applied to itself repeatedly like this is the essence of recursion.
Now we can define a non-recursive analog to List that takes an extra type argument f that replaces our earlier recursion:
data ListF a f = Empty | Cons a f
This is a straightforward data type that is not recursive.
If we combine the two, we get our old List type except with some extra Roll constructors at each recursive step.
type List a = Fix (ListF a)
A value of this type looks like this:
Roll (Cons 1 (Roll (Cons 2 (Roll Empty))))
It carries the same information as (Cons 1 (Cons 2 Empty)) or even just [1, 2], but with a few extra constructors sprinkled throughout.
So if you were given Fix, you could define List without using recursion. But this isn't particularly special because, in a sense, Fix is recursion.
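To see that this factoring buys you something, here is a sketch of the matching value-level story: a generic fold (catamorphism). The unroll accessor, cata, and sumList are additions of mine, not part of the answer above.

{-# LANGUAGE DeriveFunctor #-}

newtype Fix f = Roll { unroll :: f (Fix f) }

data ListF a f = Empty | Cons a f deriving Functor

type List a = Fix (ListF a)

-- All value-level recursion lives in cata, just as all type-level
-- recursion lives in Fix.
cata :: Functor f => (f r -> r) -> Fix f -> r
cata alg = alg . fmap (cata alg) . unroll

sumList :: List Int -> Int
sumList = cata alg
  where
    alg Empty      = 0
    alg (Cons x s) = x + s

example :: List Int
example = Roll (Cons 1 (Roll (Cons 2 (Roll Empty))))

main :: IO ()
main = print (sumList example)  -- prints 3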
I'm not sure if all recursive structures can be replaced by a non-recursive version, but some certainly can, including lists. One possible way to do this is with what is called a Boehm-Berarducci encoding. This is a way to represent a structure as a function, specifically the fold over that structure (foldr in the case of a list):
{-# LANGUAGE RankNTypes #-}
type List a = forall x. (a -> x -> x) -> x -> x
--                      ^^^^^^^^^^^^^    ^
--                      Cons branch      Nil branch
(From the above link with slightly different formatting)
This type is also something like a case analysis over the list. The first argument represents the cons case and the second argument represents the nil case.
In general, the branches of a sum type become different arguments to the function and fields of a product type become function types with an argument for each field. Note that in the encoding above, the nil branch is (in general) a non-function because the nil constructor takes no arguments, while the cons branch has two arguments since the cons constructor takes two arguments. The recursion parts of the definition are "replaced" with a Rank N type (called x here).
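A sketch of how such encoded lists are built and consumed, using the List type above (nil, cons and toList' are illustrative names of mine):

nil :: List a
nil = \_cons z -> z

cons :: a -> List a -> List a
cons x xs = \consF z -> consF x (xs consF z)

-- Recovering an ordinary list just instantiates the fold at [a].
toList' :: List a -> [a]
toList' xs = xs (:) []

-- toList' (cons 1 (cons 2 nil))  ==  [1, 2]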
I think this question breaks down into considering three distinct feature subsets that Haskell provides:
Facilities for defining new data types.
A repertoire of built-in types.
A foreign function interface that allows interfacing with functionality external to the language.
Looking only at (1), the native type definition facilities don't really provide for defining any infinitely-large types other than by recursion.
Looking at (2), however, Haskell 2010 provides the Data.Array module, which provides array types that together with (1) can be used to build non-recursive definitions of many different structures.
And even if the language did not provide arrays, (3) means that we could bolt them onto the language as an FFI extension. Haskell implementations are also allowed to provide extra functionality that can be used for this instead of the FFI, and many libraries for GHC exploit that (e.g., vector).
So I'd say that the best answer is that Haskell allows you to define non-recursive collection types only to the extent that it provides you with basic built-in ones that you can use as building blocks for more complex ones.
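For instance, here's a sketch of a sequence type whose own definition contains no recursion, built on the standard Data.Array module (Seq and the helpers are illustrative names of mine, not Data.Sequence; note that fromList still consumes a recursive list, which is rather the point of the conclusion above):

import Data.Array (Array, bounds, elems, listArray, (!))

-- The definition contains no recursion; the "shape" lives in the array.
newtype Seq a = Seq (Array Int a)

fromList :: [a] -> Seq a
fromList xs = Seq (listArray (0, length xs - 1) xs)

headSeq :: Seq a -> a
headSeq (Seq a) = a ! fst (bounds a)

toPlainList :: Seq a -> [a]
toPlainList (Seq a) = elems a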

When to expose constructors of a data type when designing data structures?

When designing data structures in functional languages, there are two options:
Expose their constructors and pattern match on them.
Hide their constructors and use higher-level functions to examine the data structures.
In what cases, what is appropriate?
Pattern matching can make code much more readable or simpler. On the other hand, if we need to change something in the definition of a data type then all places where we pattern-match on them (or construct them) need to be updated.
I've been asking this question myself for some time. Often it happens to me that I start with a simple data structure (or even a type alias) and it seems that constructors + pattern matching will be the easiest approach and produce a clean and readable code. But later things get more complicated, I have to change the data type definition and refactor a big part of the code.
The essential factor for me is the answer to the following question:
Is the structure of my datatype relevant to the outside world?
For example, the internal structure of the list datatype is very much relevant to the outside world - it has an inductive structure that is certainly very useful to expose to consumers, because they construct functions that proceed by induction on the structure of the list. If the list is finite, then these functions are guaranteed to terminate. Also, defining functions in this way makes it easy to provide properties about them, again by induction.
By contrast, it is best for the Set datatype to be kept abstract. Internally, it is implemented as a tree in the containers package. However, it might as well have been implemented using arrays, or (more usefully in a functional setting) as a tree with a slightly different structure and respecting different invariants (balanced or unbalanced, branching factor, etc.). The need to enforce invariants over and above those that the constructors already enforce through their types, by the way, precludes letting the datatype be concrete.
The essential difference between the list example and the set example is that the Set datatype is only relevant for the operations that are possible on Sets, whereas lists are relevant not only because the standard library already provides many functions to act on them, but also because their structure itself is relevant.
As a sidenote, one might object that actually the inductive structure of lists, which is so fundamental for writing functions whose termination and behaviour are easy to reason about, is captured abstractly by two functions that consume lists: foldr and foldl. Given these two basic list operators, most functions do not need to inspect the structure of a list at all, and so it could be argued that lists too could be kept abstract. This argument generalizes to many other similar structures, such as all Traversable structures, all Foldable structures, etc. However, it is nigh impossible to capture all possible recursion patterns on lists, and in fact many functions aren't recursive at all. Given only foldr and foldl, writing head, for example, would still be possible, though quite tedious:
import Data.Maybe (fromJust)

head xs = fromJust $ foldl (\b x -> maybe (Just x) Just b) Nothing xs
We're much better off just giving away the internal structure of the list.
One final point is that sometimes the actual representation of a datatype isn't relevant to the outside world, because, say, it is optimised in some way and might not be the canonical representation, or there isn't a single "canonical" representation. In these cases, you'll want to keep your datatype abstract, but offer "views" of it, which do provide concrete representations that can be pattern matched on.
One example would be if you wanted to define a Complex datatype of complex numbers, where both cartesian and polar forms can be considered canonical. In this case, you would keep Complex abstract, but export two views, i.e. functions polar and cartesian that return a pair of a length and an angle, or a coordinate in the cartesian plane, respectively.
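A sketch of that idea (storing cartesian form internally; the constructor would be kept hidden, and the names are illustrative):

-- Internally cartesian, but neither representation is privileged
-- at the interface: both are mere views.
data Complex = Complex Double Double  -- constructor not exported

mkCartesian :: Double -> Double -> Complex
mkCartesian = Complex

cartesian :: Complex -> (Double, Double)
cartesian (Complex x y) = (x, y)

polar :: Complex -> (Double, Double)
polar (Complex x y) = (sqrt (x * x + y * y), atan2 y x)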
Well, the rule is pretty simple: If it's easy to construct wrong values by using the actual constructors, then don't allow them to be used directly, but instead provide smart constructors. This is the path followed by some data structures like Map and Set, which are easy to get wrong.
Then there are the types for which it's impossible or hard to construct inconsistent/wrong values either because the type doesn't allow that at all or because you would need to introduce bottoms. The length-indexed list type (commonly called Vec) and most monads are examples of that.
Ultimately this is your own decision. Put yourself into the user's perspective and make the tradeoff between convenience and safety. If there is no tradeoff, then always expose the constructors. Otherwise your library users will hate you for the unnecessary opacity.
If the data type serves a simple purpose (like Maybe a) and no (explicit or implicit) assumptions about the data type can be violated by directly constructing a value via the data constructors, I would expose the constructors.
On the other hand, if the data type is more complex (like a balanced tree) and/or its internal representation is likely to change, I usually hide the constructors.
When using a package, there's an unwritten rule that the interface exposed by a non-internal module should be "safe" to use on the given data type. Considering the balanced-tree example, exposing the data constructors allows one to (accidentally) construct an unbalanced tree, and so the assumed runtime guarantees for searching the tree etc. might be violated.
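Concretely, hiding constructors in Haskell is done through the module export list. A sketch (the tree below is not actually self-balancing; it only illustrates the mechanism, and the names are mine):

module SearchTree
  ( Tree   -- exported abstractly: no (..) after it, so no constructors
  , empty
  , insert
  , member
  ) where

data Tree a = Leaf | Node (Tree a) a (Tree a)

empty :: Tree a
empty = Leaf

insert :: Ord a => a -> Tree a -> Tree a
insert x Leaf = Node Leaf x Leaf
insert x t@(Node l y r)
  | x < y     = Node (insert x l) y r
  | x > y     = Node l y (insert x r)
  | otherwise = t

member :: Ord a => a -> Tree a -> Bool
member _ Leaf = False
member x (Node l y r)
  | x < y     = member x l
  | x > y     = member x r
  | otherwise = True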
If the type is used to represent values with a canonical definition and representation (many mathematical objects fall into this category), and it's not possible to construct "invalid" values using the type, then you should expose the constructors.
For example, if you're representing something like two dimensional points with your own type (including a newtype), you might as well expose the constructor. The reality is that a change to this datatype is not going to be a change in how 2d points are represented, it's going to be a change in your need to use 2d points (maybe you're generalising to 3d space, maybe you're adding a concept of layers, or whatever), and is almost certain to need attention in the parts of the code using values of this type no matter what you do.[1]
A complex type representing something specific to your application or field is quite likely to undergo changes to the representation while continuing to support similar operations. Therefore you only want other modules depending on the operations, not on the internal structure. So you shouldn't expose the constructors.
Other types represent things with canonical definitions but not canonical representations. Everyone knows the properties expected of maps and sets, but there are lots of different ways of representing values that support those properties. So you again only want other modules depending on the operations they support, not on the particular representations.
Some types, whether or not they are simple with canonical representations, allow the construction of values in the program which don't represent a valid value of the abstract concept the type is supposed to represent. A simple example would be a type representing a self-balancing binary search tree; client code with access to the constructors could easily construct invalid trees. Exposing the constructors either means you need to assume that such values passed in from outside may be invalid and therefore you need to make something sensible happen even for bizarre values, or means that it's the responsibility of the programmers working with your interface to ensure they don't violate any assumptions. It's usually better to just keep such types from being constructed directly outside your module.
Basically it comes down to the concept your type is supposed to represent. If your concept maps in a very simple and obvious[2] way directly to values in some data type which isn't "more inclusive" than the concept due to the compiler being unable to check needed invariants, then the concept is pretty much "the same" as the data type, and exposing its structure is fine. If not, then you probably need to keep the structure hidden.
[1] A likely change though would be to change which numeric type you're using for the coordinate values, so you probably do have to think about how to minimise the impact of such changes. That's pretty orthogonal to whether or not you expose the constructors though.
[2] "Obvious" here meaning that if you asked 10 people independently to come up with a data type representing the concept they would all come back with the same thing, modulo changing the names.
I would propose a different, noticeably more restrictive rule than most people. The central criterion would be:
Do you guarantee that this type will never, ever change? If so, exposing the constructors might be a good idea. Good luck with that, though!
But the types for which you can make that guarantee tend to be very simple, generic "foundation" types like Maybe, Either or [], which one could arguably write once and then never revisit again.
Though even those can be questioned, because they do get revisited from time to time; there's people who have used Church-encoded versions of Maybe and List in various contexts for performance reasons, e.g.:
{-# LANGUAGE RankNTypes #-}

newtype Maybe' a = Maybe' { elimMaybe' :: forall r. r -> (a -> r) -> r }

nothing = Maybe' $ \z k -> z
just x  = Maybe' $ \z k -> k x

newtype List' a = List' { elimList' :: forall r. (a -> r -> r) -> r -> r }

nil       = List' $ \k z -> z
cons x xs = List' $ \k z -> k x (elimList' xs k z)
These two examples highlight something important: you can replace the Maybe' type's implementation shown above with any other implementation as long as it supports the following three functions:
nothing :: Maybe' a
just :: a -> Maybe' a
elimMaybe' :: Maybe' a -> r -> (a -> r) -> r
...and the following laws:
elimMaybe' nothing z f == z
elimMaybe' (just x) z f == f x
And this technique can be applied to any algebraic data type. Which to me says that pattern matching against concrete constructors is just insufficiently abstract; it doesn't really gain you anything that you can't get out of the abstract constructors + destructor pattern, and it loses implementation flexibility.
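For instance, here's a sketch of a drop-in alternative backed by an ordinary Maybe, which satisfies the same interface and laws (the primed names are mine):

newtype Maybe'' a = Maybe'' (Maybe a)

nothing'' :: Maybe'' a
nothing'' = Maybe'' Nothing

just'' :: a -> Maybe'' a
just'' = Maybe'' . Just

-- Same signature and laws as elimMaybe', different representation.
elimMaybe'' :: Maybe'' a -> r -> (a -> r) -> r
elimMaybe'' (Maybe'' m) z f = maybe z f m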

Facilities for generating Haskell types in Haskell ("second order Haskell")?

Apologies in advance if this question is a bit vague. It's the result of some weekend daydreaming.
With Haskell's wonderful type system, it's delightfully pleasing to express mathematical (especially algebraic) structure as typeclasses. I mean, just have a look at numeric-prelude! But taking advantage of such wonderful type structure in practice has always seemed difficult to me.
You have a nice, type-system way of expressing that v1 and v2 are elements of a vector space V and that w is an element of a vector space W. The type system lets you write a program adding v1 and v2, but not v1 and w. Great! But in practice you might want to play with potentially hundreds of vector spaces, and you certainly don't want to create types V1, V2, ..., V100 and declare them instances of the vector space typeclass! Or maybe you read some data from the real world resulting in symbols a, b and c - you may want to express that the free vector space over these symbols really is a vector space!
So you're stuck, right? In order to do many of the things you'd like to do with vector spaces in a scientific computing setting, you have to give up your typesystem by foregoing a vector space typeclass and having functions do run-time compatibility checks instead. Should you have to? Shouldn't it be possible to use the fact that Haskell is purely functional to write a program that generates all the types you need and inserts them into the real program? Does such a technique exist? By all means do point out if I'm simply overlooking something basic here (I probably am) :-)
Edit: Just now did I discover fundeps. I'll have to think a bit about how they relate to my question (enlightening comments with regards to this are appreciated).
Template Haskell allows this. The wiki page has some useful links; particularly Bulat's tutorials.
The top-level declaration syntax is the one you want. By typing:
mkFoo = [d| data Foo = Foo Int |]
you create a Template Haskell quotation (a compile-time value of type Q [Dec]) that will generate a declaration for data Foo = Foo Int wherever you insert the splice $(mkFoo).
While this small example isn't too useful, you could provide an argument to mkFoo to control how many different declarations you want. Now a $(mkFoo 100) will produce 100 new data declarations for you. You can also use TH to generate type class instances. My adaptive-tuple package is a very small project that uses Template Haskell to do something similar.
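A sketch of such a parameterized generator, built with the TH AST directly since declaration quotations can't invent fresh type names (the AST constructors below match recent template-haskell releases; they have shifted across GHC versions, so treat this as illustrative):

{-# LANGUAGE TemplateHaskell #-}
module MkFoos (mkFoos) where

import Language.Haskell.TH

-- Generate declarations data Foo1 = Foo1 Int, ..., data FooN = FooN Int.
mkFoos :: Int -> Q [Dec]
mkFoos n = mapM mkOne [1 .. n]
  where
    mkOne i = do
      let name = mkName ("Foo" ++ show i)
          bang = Bang NoSourceUnpackedness NoSourceStrictness
      pure (DataD [] name [] Nothing
              [NormalC name [(bang, ConT ''Int)]]
              [])

-- In another module:  $(mkFoos 100)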
An alternative approach would be to use Derive, which will automatically derive type class instances. This might be simpler if you only need the instances.
Also there are some simple type-level programming techniques in Haskell. A canonical example follows:
-- A family of types for the natural numbers
data Zero
data Succ n

-- A family of vectors parameterized over the naturals (using the GADTs extension)
data Vector :: * -> * -> * where
  -- Empty is a vector with length zero
  Empty :: Vector Zero a
  -- given a vector of length n and an a, produce a vector of length n+1
  Cons :: a -> Vector n a -> Vector (Succ n) a

-- A type-level adder for natural numbers (using the TypeFamilies extension)
type family Plus n m :: *
type instance Plus Zero n = n
type instance Plus (Succ m) n = Succ (Plus m n)

-- Type-safe concatenation of vectors:
concatV :: Vector n a -> Vector m a -> Vector (Plus n m) a
concatV Empty ys = ys
concatV (Cons x xs) ys = Cons x (concatV xs ys)
Take a moment to take that in. I think it is pretty magical that it works.
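For instance, the length of a concatenation is computed by the Plus family, and a wrong annotation is rejected (v and bad are illustrative names):

-- The result length is computed at the type level:
v :: Vector (Succ (Succ (Succ Zero))) Int
v = concatV (Cons 1 Empty) (Cons 2 (Cons 3 Empty))

-- Rejected at compile time: the annotated length does not match
-- the reduction of Plus.
-- bad :: Vector (Succ Zero) Int
-- bad = concatV (Cons 1 Empty) (Cons 2 Empty)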
However, type-level programming in Haskell is in the feature-uncanny-valley -- just enough to draw attention to how much you can't do. Dependently-typed languages like Agda, Coq, and Epigram take this style to its limit and full power.
Template Haskell is much more like the usual LISP-macro style of code generation. You write some code to write some code, then you say "ok insert that generated code here". Unlike the above technique, you can write any computably-specified code that way, but you don't get the very general typechecking as is seen in concatV above.
So you have a few options to do what you want. I think metaprogramming is a really interesting space, and in some ways still quite young. Have fun exploring. :-)
