What's the recommended way of handling complexly composed POD (plain old data, as in OO) in Haskell?

I'm a Haskell newbie.
In statically typed OO languages (for instance, Java), complex data structures are represented as classes and instances. An object can have many attributes (fields), and another object can be the value of a field. Those fields are accessed by name and are statically typed by their class. Ultimately, these objects form a huge graph of objects linked to each other, and most programs work with a data graph like this.
How can I achieve this functionality in Haskell?

If you really do have data without behavior, this maps nicely to a Haskell record:
data Person = Person { name    :: String
                     , address :: String }
  deriving (Eq, Read, Show)

data Department = Management | Accounting | IT | Programming
  deriving (Eq, Read, Show)

data Employee = Employee   { identity   :: Person
                           , idNumber   :: Int
                           , department :: Department }
              | Contractor { identity :: Person
                           , company  :: String }
  deriving (Eq, Read, Show)
This says that a Person is a Person who has a name and address (both Strings); a Department is either Management, Accounting, IT, or Programming; and an Employee is either an Employee who has an identity (a Person), an idNumber (an Int), and a department (a Department), or is a Contractor who has an identity (a Person) and a company (a String). The deriving (Eq, Read, Show) lines enable you to compare these objects for equality, read them in, and convert them to strings.
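For instance, constructing values of these types looks like this (the names and values here are invented purely for illustration):

alice :: Employee
alice = Employee { identity   = Person { name = "Alice", address = "12 Oak St." }
                 , idNumber   = 1001
                 , department = IT }

bob :: Employee
bob = Contractor { identity = Person "Bob" "9 Elm St.", company = "Acme Ltd." }

Note that record syntax and positional application (as in Person "Bob" "9 Elm St.") can be mixed freely.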
In general, a Haskell data type is a combination of unions (also called sums) and tuples (also called products).1 The |s denote choice (a union): an Employee is either an Employee or a Contractor, a Department is one of four things, etc. In general, tuples are written something like the following:
data Process = Process String Int
This says that Process (in addition to being a type name) is a data constructor with type String -> Int -> Process. Thus, for instance, Process "init" 1, or Process "ls" 57300. A Process has to have both a String and an Int to exist. The record notation used above is just syntactic sugar for these products; I could also have written data Person = Person String String, and then defined
name :: Person -> String
name (Person n _) = n
address :: Person -> String
address (Person _ a) = a
Record notation, however, can be nice for complex data structures.
Also note that you can parametrize a Haskell type over other types; for instance, a three-dimensional point could be data Point3 a = Point3 a a a. This means that Point3 :: a -> a -> a -> Point3 a, so that one could write Point3 (3 :: Int) (4 :: Int) (5 :: Int) to get a Point3 Int, or Point3 (1.1 :: Double) (2.2 :: Double) (3.3 :: Double) to get a Point3 Double. (Or Point3 1 2 3 to get a Num a => Point3 a, if you've seen type classes and overloaded numeric literals.)
This is what you need to represent a data graph. However, take note: one problem for people transitioning from imperative languages to functional ones—or, really, between any two different paradigms (C to Python, Prolog to Ruby, Erlang to Java, whatever)—is to continue to try to solve problems the old way. The solution you're trying to model may not be constructed in a way amenable to easy functional programming techniques, even if the problem is. For instance, in Haskell, thinking about types is very important, in a way that's different from, say, Java. At the same time, implementing behaviors for those types is done very differently: higher-order functions capture some of the abstractions you've seen in Java, but also some which aren't easily expressible (map :: (a -> b) -> [a] -> [b], filter :: (a -> Bool) -> [a] -> [a], and foldr :: (a -> b -> b) -> b -> [a] -> b come to mind). So keep your options open, and consider addressing your problems in a functional way. Of course, maybe you are, in which case, full steam ahead. But do keep this in mind as you explore a new language. And have fun :-)
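To give a feel for those three signatures, here they are applied to small lists (results shown in the comments):

doubled = map (* 2) [1, 2, 3]       -- [2, 4, 6]
evens   = filter even [1 .. 10]     -- [2, 4, 6, 8, 10]
total   = foldr (+) 0 [1, 2, 3, 4]  -- 10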
1: And recursion: you can represent a binary tree, for instance, with data Tree a = Leaf a | Branch a (Tree a) (Tree a).
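A small illustration of that recursive type: a concrete tree, plus a function that recurses the same way the type does:

exampleTree :: Tree Int
exampleTree = Branch 1 (Leaf 2) (Leaf 3)

size :: Tree a -> Int
size (Leaf _)       = 1
size (Branch _ l r) = 1 + size l + size r   -- size exampleTree == 3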

Haskell has algebraic data types, which can describe structures or unions of structures, such that something of a given type can hold one of a number of different sets of fields. These fields can be set and accessed either positionally or by name, using record syntax.
See here: http://learnyouahaskell.com/making-our-own-types-and-typeclasses

Related

Redundancy regarding product types and tuples in Haskell

In Haskell you have product types and you have tuples.
You use tuples if you don't want to associate a dedicated type with the value, and you can use product types if you wish to do so.
However I feel there is redundancy in the notation of product types
data Foo = Foo (String, Int, Char)
data Bar = Bar String Int Char
Why are there both kinds of notation? Is there any case where you would prefer one over the other?
I guess you can't use record notation when using tuples, but that's just a matter of convenience. Another difference might be the notion of order in tuples, as opposed to product types, but I think that's just due to the naming of the functions fst and snd.
@chi's answer covers the technical differences in terms of Haskell's evaluation model. I hope to give you some insight into the philosophy of this sort of typed programming.
In category theory we generally work with objects "up to isomorphism". Your Bar is of course isomorphic to (String, Int, Char), so from a categorical perspective they're the same thing.
bar_tuple :: Iso' Bar (String, Int, Char)
bar_tuple = iso to from
  where to (Bar s i c) = (s, i, c)
        from (s, i, c) = Bar s i c
In some sense tuples are a Platonic form of product type, in that they have no meaning beyond being a collection of disparate values. All the other product types can be mapped to and from a plain old tuple.
So why not use tuples everywhere, when all Haskell types ultimately boil down to a sum of products? It's about communication. As Martin Fowler says,
Any fool can write code that a computer can understand. Good programmers write code that humans can understand.
Names are important! Writing down a custom product type like
data Customer = Customer { name :: String, address :: String }
imbues the type Customer with meaning to the person reading the code, unlike (String, String) which just means "two strings".
Custom types are particularly useful when you want to enforce invariants by hiding the representation of your data and using smart constructors:
newtype NonEmpty a = NonEmpty [a]
nonEmpty :: [a] -> Maybe (NonEmpty a)
nonEmpty [] = Nothing
nonEmpty xs = Just (NonEmpty xs)
Now, if you don't export the NonEmpty constructor, you can force people to go through the nonEmpty smart constructor. If someone hands you a NonEmpty value you may safely assume that it has at least one element.
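For example, inside the defining module you can then write total functions that lean on the invariant (a sketch; the second equation is unreachable as long as clients can only build values via nonEmpty):

safeHead :: NonEmpty a -> a
safeHead (NonEmpty (x:_)) = x
safeHead (NonEmpty [])    = error "impossible: nonEmpty never constructs this"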
You can of course represent Customer as a tuple under the hood and expose evocatively-named field accessors,
newtype Customer = Customer (String, String)
name, address :: Customer -> String
name (Customer (n, _)) = n
address (Customer (_, a)) = a
but this doesn't really buy you much, except that it's now cheaper to convert Customer to a tuple (if, say, you're writing performance-sensitive code that works with a tuple-oriented API).
If your code is intended to solve a particular problem - which of course is the whole point of writing code - it pays to not just solve the problem, but make it look like you've solved it too. Someone - maybe you in a couple of years - is going to have to read this code and understand it with no a priori knowledge of how it works. Custom types are a very important communication tool in this regard.
The type
data Foo = Foo (String, Int, Char)
represents a double-lifted tuple. Its values comprise
undefined
Foo undefined
Foo (undefined, undefined, undefined)
etc.
This is usually troublesome. Because of this, it's rare to see such definitions in actual code. We either have plain data types
data Foo = Foo String Int Char
or newtypes
newtype Foo = Foo (String, Int, Char)
The newtype can be just as inconvenient to use, but at least it
does not double-lift the tuple: undefined and Foo undefined are now equal values.
The newtype also provides zero-cost conversion between a plain tuple and Foo, in both directions.
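Concretely, Data.Coerce makes both directions free at runtime (a sketch, assuming the newtype Foo above with its constructor in scope):

import Data.Coerce (coerce)

toTuple :: Foo -> (String, Int, Char)
toTuple = coerce

fromTuple :: (String, Int, Char) -> Foo
fromTuple = coerce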
You can see such newtypes in use e.g. when the programmer needs a different instance for some type class, than the one already associated with the tuple. Or, perhaps, it is used in a "smart constructor" idiom.
I would not expect the pattern used in Foo to be frequent. There is a slight difference in what the constructor acts like: Foo :: (String, Int, Char) -> Foo as opposed to Bar :: String -> Int -> Char -> Bar. Then Foo undefined and Foo (undefined, ..., ...) are, strictly speaking, different things, whereas you miss one level of undefinedness in Bar.

Practical applications of Rank 2 polymorphism?

I'm covering polymorphism and I'm trying to see the practical uses of such a feature.
My basic understanding of Rank 2 is:
type MyType = ∀ a. a -> a
subFunction :: a -> a
subFunction el = el
mainFunction :: MyType -> Int
mainFunction func = func 3
I understand that this is allowing the user to use a polymorphic function (subFunction) inside mainFunction and strictly specify its output (Int). This seems very similar to GADTs:
data Example a where
  ExampleInt  :: Int  -> Example Int
  ExampleBool :: Bool -> Example Bool
1) Given the above, is my understanding of Rank 2 polymorphism correct?
2) What are the general situations where Rank 2 polymorphism can be used, as opposed to GADTs, for example?
If you pass a polymorphic function as an argument to a Rank2-polymorphic function, you're essentially passing not just one function but a whole family of functions – for all possible types that fulfill the constraints.
Typically, those forall quantifiers come with a class constraint. For example, I might wish to do number arithmetic with two different types simultaneously (for comparing precision or whatever).
data FloatCompare = FloatCompare
  { singlePrecision :: Float
  , doublePrecision :: Double
  }
Now I might want to modify those numbers through some maths operation. Something like
modifyFloat :: (Num -> Num) -> FloatCompare -> FloatCompare
But Num is not a type, only a type class. I could of course pass a function that would modify any particular number type, but I couldn't use that to modify both a Float and a Double value, at least not without some ugly (and possibly lossy) converting back and forth.
Solution: Rank-2 polymorphism!
modifyFloat :: (∀ n . Num n => n -> n) -> FloatCompare -> FloatCompare
modifyFloat f (FloatCompare single double)
      = FloatCompare (f single) (f double)
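Usage then looks like this: the single polymorphic argument (* 2) is instantiated once at Float and once at Double:

example :: FloatCompare
example = modifyFloat (* 2) (FloatCompare 1.5 1.5)
-- singlePrecision = 3.0, doublePrecision = 3.0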
The best single example of how this is useful in practice are probably lenses. A lens is a “smart accessor function” to a field in some larger data structure. It allows you to access fields, update them, gather results... while at the same time composing in a very simple way. How it works: Rank2-polymorphism; every lens is polymorphic, with the different instantiations corresponding to the “getter” / “setter” aspects, respectively.
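Here is a minimal sketch of that trick (just the underlying rank-2 idea, not the real lens library): the same polymorphic function is instantiated at Const to read a field and at Identity to write it:

{-# LANGUAGE RankNTypes #-}
import Data.Functor.Const (Const (..))
import Data.Functor.Identity (Identity (..))

-- A lens is rank-2 polymorphic over the functor f:
type Lens' s a = forall f. Functor f => (a -> f a) -> s -> f s

view :: Lens' s a -> s -> a
view l = getConst . l Const              -- instantiate f ~ Const a

set :: Lens' s a -> a -> s -> s
set l a = runIdentity . l (const (Identity a))  -- instantiate f ~ Identity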
The go-to example of an application of rank-2 types is runST as Benjamin Hodgson mentioned in the comments. This is a rather good example and there are a variety of examples using the same trick. For example, branding to maintain abstract data type invariants across multiple types, avoiding confusion of differentials in ad, a region-based version of ST.
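For concreteness, here is the classic runST pattern: the forall s in runST :: (forall s. ST s a) -> a is what prevents the mutable reference from escaping its scope:

import Control.Monad.ST (runST)
import Data.STRef (newSTRef, modifySTRef', readSTRef)

sumST :: [Int] -> Int
sumST xs = runST $ do
  ref <- newSTRef 0                          -- local mutable state
  mapM_ (\x -> modifySTRef' ref (+ x)) xs
  readSTRef ref                              -- the pure result escapes; ref cannot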
But I'd actually like to talk about how Haskell programmers are implicitly using rank-2 types all the time. Every type class whose methods have universally quantified types desugars to a dictionary with a field with a rank-2 type. In practice, this is virtually always a higher-kinded type class* like Functor or Monad. I'll use a simplified version of Alternative as an example. The class declaration is:
class Alternative f where
  empty :: f a
  (<|>) :: f a -> f a -> f a
The dictionary representing this class would be:
data AlternativeDict f = AlternativeDict {
  empty :: forall a. f a,
  (<|>) :: forall a. f a -> f a -> f a }
Sometimes such an encoding is nice as it allows one to use different "instances" for the same type, perhaps only locally. For example, Maybe has two obvious instances of Alternative depending on whether Just a <|> Just b is Just a or Just b. Languages without type classes, such as Scala, do indeed use this encoding.
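To make that concrete, here is a runnable sketch of two local "instances" for Maybe (the field names are mine, since Haskell record fields can't actually be operators like (<|>)):

{-# LANGUAGE RankNTypes #-}
data AltDict f = AltDict
  { altEmpty :: forall a. f a
  , altOr    :: forall a. f a -> f a -> f a
  }

leftBiased, rightBiased :: AltDict Maybe
leftBiased = AltDict Nothing orL
  where orL (Just x) _ = Just x
        orL Nothing  y = y
rightBiased = AltDict Nothing orR
  where orR _ (Just y) = Just y
        orR x Nothing  = x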
To connect to leftaroundabout's reference to lenses, you can view the hierarchy there as a hierarchy of type classes and the lens combinators as simply tools for explicitly building the relevant type class dictionaries. Of course, the reason it isn't actually a hierarchy of type classes is that we usually will have multiple "instances" for the same type. E.g. _head and _head . _tail are both "instances" of Traversal' s a.
* A higher-kinded type class doesn't necessarily lead to this, and it can happen for a type class of kind *. For example:
-- Higher-kinded but doesn't require universal quantification.
class Sum c where
  sum :: c Int -> Int

-- Not higher-kinded but does require universal quantification.
class Length l where
  length :: [a] -> l
If you are using modules in Haskell, you are already using Rank-2 types. Theoretically speaking, modules are records with rank-2 type properties.
For example, the Foo module below in Haskell ...
-- Foo.hs
module Foo (id) where

import Prelude hiding (id)  -- avoid clashing with the Prelude's id

id :: forall a. a -> a
id x = x

-- Main.hs
import qualified Foo

main = do
  putStrLn (Foo.id "hello")
  return ()
... can actually be thought of as a record as follows:
data FooType = FooType {
  id :: forall a. a -> a
}

foo :: FooType
foo = FooType {
  id = \x -> x
}
P.S. (unrelated to this question): from a language-design perspective, if you are going to support a module system, then you might as well support higher-rank types (i.e., allow arbitrary quantification of type variables at any level) to reduce duplication of effort (i.e., type checking a module should be almost the same as type checking a record with higher-rank types).

Why aren't there existentially quantified type variables in GHC Haskell

There are universally quantified type variables, and there are existentially quantified data types. However, despite the fact that people sometimes give pseudocode of the form exists a. Int -> a to help explain concepts, it doesn't seem like a compiler extension that there's any real interest in. Is this just a "there isn't much value in adding this" kind of thing (because it does seem valuable to me), or is there a problem like undecidability that makes it truly impossible?
EDIT:
I've marked viorior's answer as correct because it seems like it is probably the actual reason why this was not included. I'd like to add some additional commentary though just in case anyone would want to help clarify this more.
As requested in the comments, I'll give an example of why I would consider this useful. Suppose we have a data type as follows:
data Person a = Person
  { age    :: Int
  , height :: Double
  , weight :: Int
  , name   :: a
  }
So we choose to parameterize over a, which is a naming convention (I know that it probably makes more sense in this example to make a NamingConvention ADT with appropriate data constructors for the American "first, middle, last", the Hispanic "name, paternal name, maternal name", etc. But for now, just go with this).
So, there are several functions we see that basically ignore the type that Person is parameterized over. Examples would be
age :: Person a -> Int
height :: Person a -> Double
weight :: Person a -> Int
And any function built on top of these could similarly ignore the a type. For example:
atRiskForDiabetes :: Person a -> Bool
atRiskForDiabetes p = age p + weight p > 200
--Clearly, I am not actually a doctor
Now, if we have a heterogeneous list of people (of type [exists a. Person a]), we would like to be able to map some of our functions over the list. Of course, there are some useless ways to map:
heteroList :: [exists a. Person a]
heteroList = [Person 20 30.0 170 "Bob Jones", Person 50 32.0 140 3451115332]
extractedNames = map name heteroList
In this example, extractedNames is of course useless because it has type [exists a. a]. However, if we use our other functions:
totalWeight :: [exists a. Person a] -> Int
totalWeight = sum . map weight
numberAtRisk :: [exists a. Person a] -> Int
numberAtRisk = length . filter id . map atRiskForDiabetes
Now, we have something useful that operates over a heterogeneous collection (And, we didn't even involve typeclasses). Notice that we were able to reuse our existing functions. Using an existential data type would go as follows:
data SomePerson = forall a. SomePerson (Person a) --fixed, thanks viorior
But now, how can we use age and atRiskForDiabetes? We can't. I think that you would have to do something like this:
someAge :: SomePerson -> Int
someAge (SomePerson p) = age p
Which is really lame because you have to rewrite all of your combinators for a new type. It gets even worse if you want to do this with a data type that's parameterized over several type variables. Imagine this:
somewhatHeteroPipeList :: forall a b. [exists c d. Pipe a b c d]
I won't explain this line of thought any further, but just notice that you'd be rewriting a lot of combinators to do anything like this using just existential data types.
That being said, I hope I've given a mildly convincing case that this could be useful. If it doesn't seem useful (or if the example seems too contrived), feel free to let me know. Also, since I am firstly a programmer and have no training in type theory, it's a little difficult for me to see how to use Skolem's theorem (as posted by viorior) here. If anyone could show me how to apply it to the Person a example I gave, I would be very grateful. Thanks.
It is unnecessary.
By Skolem's theorem, we can convert an existential quantifier into a universal quantifier with higher-rank types:
(∃b. F(b)) -> Int <===> ∀b. (F(b) -> Int)
Every existentially quantified type of rank n+1 can be encoded as a universally quantified type of rank n
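As a sketch of what that equivalence buys you in practice, the question's exists a. Person a can be encoded today using only a rank-2 forall (a Church/CPS encoding; the names here are mine):

{-# LANGUAGE RankNTypes #-}
newtype SomePersonCPS = SomePersonCPS (forall r. (forall a. Person a -> r) -> r)

hidePerson :: Person a -> SomePersonCPS
hidePerson p = SomePersonCPS (\k -> k p)

-- Existing combinators like age are reusable directly:
someAge :: SomePersonCPS -> Int
someAge (SomePersonCPS k) = k age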
Existentially quantified types are available in GHC, so the question is predicated on a false assumption.

Haskell generic data structure

I want to create a type to store some generic information; for me, this type is Molecule, where I store a chemical graph and molecular properties.
data Molecule = Molecule
  { name     :: Maybe String
  , graph    :: Gr Atom Bond
  , property :: Maybe [Property] -- that's the question
  } deriving (Show)
I want to represent properties as tuples
type Property a = (String, a)
because a property may have any type: Float, Int, String, etc.
The question is how to form the Molecule data structure so that I will be able to collect any number of properties of any types in a Molecule. If I do
data Molecule a = Molecule
  { name     :: Maybe String
  , graph    :: Gr Atom Bond
  , property :: Maybe [Property a]
  } deriving (Show)
I have to directly assign one type when I create a molecule.
If you know in advance the set of properties a molecule might have, you could define a sum type:
data Property = Mass Float | CatalogNum Int | Comment String
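Pattern matching then recovers each payload at its precise type, for example:

describe :: Property -> String
describe (Mass m)       = "mass: " ++ show m
describe (CatalogNum n) = "catalog number: " ++ show n
describe (Comment c)    = c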
If you want this type to be extensible, you could use Data.Dynamic as another answer suggests. For instance:
data Molecule = Molecule
  { name     :: Maybe String
  , graph    :: Gr Atom Bond
  , property :: [(String, Dynamic)]
  } deriving (Show)

mass :: Molecule -> Maybe Float
mass m = case lookup "mass" (property m) of
  Nothing -> Nothing
  Just i  -> fromDynamic i
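A quick check of how that lookup behaves (a sketch): fromDynamic only succeeds at the type that was originally stored:

import Data.Dynamic (toDyn, fromDynamic)

checks :: (Maybe Float, Maybe Float)
checks = ( fromDynamic (toDyn (18.015 :: Float)) -- Just 18.015
         , fromDynamic (toDyn "not a mass")      -- Nothing
         )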
You could also get rid of the "stringly-typed" (String,a) pairs, say:
-- in Molecule:
-- property :: [Dynamic]
data Mass = Mass Float
mass :: Molecule -> Maybe Mass
mass m = ...
Neither of these attempts gives much type safety over just parsing out of (String,String) pairs since there is no way to enforce the invariant that the user creates well-formed properties (short of wrapping properties in a new type and hiding the constructors in another module, which again breaks extensibility).
What you might want are OCaml-style polymorphic variants. You could look at Vinyl, which provides type-safe extensible records.
As an aside, you might want to get rid of the Maybe wrapper around the list of properties, since the empty list already encodes the case of no properties.
You might want to look at Data.Dynamic for a pseudo-dynamic typing solution.

Relationship between TypeRep and "Type" GADT

In Scrap your boilerplate reloaded, the authors describe a new presentation of Scrap Your Boilerplate, which is supposed to be equivalent to the original.
However, one difference is that they assume a finite, closed set of "base" types, encoded with a GADT
data Type :: * -> * where
  Int  :: Type Int
  List :: Type a -> Type [a]
  ...
In the original SYB, type-safe cast is used, implemented using the Typeable class.
My questions are:
What is the relationship between these two approaches?
Why was the GADT representation chosen for the "SYB Reloaded" presentation?
[I am one of the authors of the "SYB Reloaded" paper.]
TL;DR We really just used it because it seemed more beautiful to us. The class-based Typeable approach is more practical. The Spine view can be combined with the Typeable class and does not depend on the Type GADT.
The paper states this in its conclusions:
Our implementation handles the two central ingredients of generic programming differently from the original SYB paper: we use overloaded functions with explicit type arguments instead of overloaded functions based on a type-safe cast [1] or a class-based extensible scheme [20]; and we use the explicit spine view rather than a combinator-based approach. Both changes are independent of each other, and have been made with clarity in mind: we think that the structure of the SYB approach is more visible in our setting, and that the relations to PolyP and Generic Haskell become clearer. We have revealed that while the spine view is limited in the class of generic functions that can be written, it is applicable to a very large class of data types, including GADTs.
Our approach cannot be used easily as a library, because the encoding of overloaded functions using explicit type arguments requires the extensibility of the Type data type and of functions such as toSpine. One can, however, incorporate Spine into the SYB library while still using the techniques of the SYB papers to encode overloaded functions.
So, the choice of using a GADT for type representation is one we made mainly for clarity. As Don states in his answer, there are some obvious advantages in this representation, namely that it maintains static information about what type a type representation is for, and that it allows us to implement cast without any further magic, and in particular without the use of unsafeCoerce. Type-indexed functions can also be implemented directly by using pattern matching on the type, and without falling back to various combinators such as mkQ or extQ.
Fact is that I (and I think the co-authors) simply were not very fond of the Typeable class. (In fact, I'm still not, although it is finally becoming a bit more disciplined now in that GHC adds auto-deriving for Typeable, makes it kind-polymorphic, and will ultimately remove the possibility to define your own instances.) In addition, Typeable wasn't quite as established and widely known as it is perhaps now, so it seemed appealing to "explain" it by using the GADT encoding. And furthermore, this was the time when we were also thinking about adding open datatypes to Haskell, thereby alleviating the restriction that the GADT is closed.
So, to summarize: If you actually need dynamic type information only for a closed universe, I'd always go for the GADT, because you can use pattern matching to define type-indexed functions, and you do not have to rely on unsafeCoerce nor advanced compiler magic. If the universe is open, however, which is quite common, certainly for the generic programming setting, then the GADT approach might be instructive, but isn't practical, and using Typeable is the way to go.
However, as we also state in the conclusions of the paper, the choice of Type over Typeable isn't a prerequisite for the other choice we're making, namely to use the Spine view, which I think is more important and really the core of the paper.
The paper itself shows (in Section 8) a variation inspired by the "Scrap your Boilerplate with Class" paper, which uses a Spine view with a class constraint instead. But we can also do a more direct development, which I show in the following. For this, we'll use Typeable from Data.Typeable, but define our own Data class which, for simplicity, just contains the toSpine method:
class Typeable a => Data a where
  toSpine :: a -> Spine a
The Spine datatype now uses the Data constraint:
data Spine :: * -> * where
  Constr :: a -> Spine a
  (:<>:) :: (Data a) => Spine (a -> b) -> a -> Spine b
The function fromSpine is as trivial as with the other representation:
fromSpine :: Spine a -> a
fromSpine (Constr x) = x
fromSpine (c :<>: x) = fromSpine c x
Instances for Data are trivial for flat types such as Int:
instance Data Int where
  toSpine = Constr
And they're still entirely straightforward for structured types such as binary trees:
data Tree a = Empty | Node (Tree a) a (Tree a)

instance Data a => Data (Tree a) where
  toSpine Empty        = Constr Empty
  toSpine (Node l x r) = Constr Node :<>: l :<>: x :<>: r
The paper then goes on and defines various generic functions, such as mapQ. These definitions hardly change. We only get class constraints for Data a => where the paper has function arguments of Type a ->:
mapQ :: Query r -> Query [r]
mapQ q = mapQ' q . toSpine
mapQ' :: Query r -> (forall a. Spine a -> [r])
mapQ' q (Constr c) = []
mapQ' q (f :<>: x) = mapQ' q f ++ [q x]
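(The Query abbreviation used here isn't defined in this excerpt; in this class-based variant it would be something like the following, mirroring SYB's GenericQ:)

{-# LANGUAGE RankNTypes #-}
type Query r = forall a. Data a => a -> r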
Higher-level functions such as everything also just lose their explicit type arguments (and then actually look exactly the same as in original SYB):
everything :: (r -> r -> r) -> Query r -> Query r
everything op q x = foldl op (q x) (mapQ (everything op q) x)
As I said above, if we now want to define a generic sum function summing up all Int occurrences, we cannot pattern match anymore, but have to fall back to mkQ, but mkQ is defined purely in terms of Typeable and completely independent of Spine:
mkQ :: (Typeable a, Typeable b) => r -> (b -> r) -> a -> r
(r `mkQ` br) a = maybe r br (cast a)
And then (again exactly as in original SYB):
sum :: Query Int
sum = everything (+) sumQ
sumQ :: Query Int
sumQ = mkQ 0 id
For some of the stuff later in the paper (e.g., adding constructor information), a bit more work is needed, but it can all be done. So using Spine really does not depend on using Type at all.
Well, obviously the Typeable use is open -- new variants can be added after the fact, and without modifying the original definitions.
The important change, though, is that TypeRep is untyped: there is no connection between the runtime type representation, TypeRep, and the static type it encodes. With the GADT approach we can encode the mapping between a type a and its Type, given by the GADT Type a.
We thus bake in evidence for the type rep being statically linked to its origin type, and can write statically typed dynamic application (for example) using Type a as evidence that we have a runtime a.
In the older TypeRep case, we have no such evidence, and it comes down to runtime string equality and a coerce-and-hope-for-the-best through fromDynamic.
Compare the signatures:
toDyn :: Typeable a => a -> TypeRep -> Dynamic
versus GADT style:
toDyn :: a -> Type a -> Dynamic
I can't fake my type evidence, and I can use that later when reconstructing things, e.g. to look up the type class instances for a when all I have is a Type a.
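A minimal sketch of that GADT-style Dynamic, reusing the question's Type GADT (my names; a complete version would also need a type-equality test to handle List):

{-# LANGUAGE GADTs #-}
data Dynamic where
  Dyn :: Type a -> a -> Dynamic

-- Pattern matching on the evidence refines the type; no unsafeCoerce needed:
fromDyn :: Type a -> Dynamic -> Maybe a
fromDyn Int (Dyn Int n) = Just n
fromDyn _   _           = Nothing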
