What is the difference between value constructors and tuples?

What is the difference between value constructors and tuples? - haskell

It's written that Haskell tuples are simply a different syntax for algebraic data types. Similarly, there are examples of how to redefine value constructors with tuples.
For example, a Tree data type in Haskell might be written as
data Tree a = EmptyTree | Node a (Tree a) (Tree a)
which could be converted to "tuple form" like this:
data Tree a = EmptyTree | Node (a, Tree a, Tree a)
What is the difference between the Node value constructor in the first example, and the actual tuple in the second example? i.e. Node a (Tree a) (Tree a) vs. (a, Tree a, Tree a) (aside from just the syntax)?
Under the hood, is Node a (Tree a) (Tree a) just a different syntax for a 3-tuple of the appropriate types at each position?
I know that you can partially apply a value constructor, such as Node 5 which will have type: (Node 5) :: Num a => Tree a -> Tree a -> Tree a
You sort of can partially apply a tuple too, using (,,) as a function ... but this doesn't know about the potential types for the un-bound entries, such as:
Prelude> :t (,,) 5
(,,) 5 :: Num a => b -> c -> (a, b, c)
unless, I guess, you explicitly declare a type with ::.
Aside from syntactical specialties like this, plus this last example of the type scoping, is there a material difference between whatever a "value constructor" thing actually is in Haskell, versus a tuple used to store positional values of the same types are the value constructor's arguments?

Well, coneptually there indeed is no difference and in fact other languages (OCaml, Elm) present tagged unions exactly that way - i.e., tags over tuples or first class records (which Haskell lacks). I personally consider this to be a design flaw in Haskell.
There are some practical differences though:
Laziness. Haskell's tuples are lazy and you can't change that. You can however mark constructor fields as strict:
data Tree a = EmptyTree | Node !a !(Tree a) !(Tree a)
Memory footprint and performance. Circumventing intermediate types reduces the footprint and raises the performance. You can read more about it in this fine answer.
You can also mark the strict fields with the the UNPACK pragma to reduce the footprint even further. Alternatively you can use the -funbox-strict-fields compiler option. Concerning the last one, I simply prefer to have it on by default in all my projects. See the Hasql's Cabal file for example.
Considering the stated above, if it's a lazy type that you're looking for, then the following snippets should compile to the same thing:
data Tree a = EmptyTree | Node a (Tree a) (Tree a)
data Tree a = EmptyTree | Node {-# UNPACK #-} !(a, Tree a, Tree a)
So I guess you can say that it's possible to use tuples to store lazy fields of a constructor without a penalty. Though it should be mentioned that this pattern is kinda unconventional in the Haskell's community.
If it's the strict type and footprint reduction that you're after, then there's no other way than to denormalize your tuples directly into constructor fields.

They're what's called isomorphic, meaning "to have the same shape". You can write something like
data Option a = None | Some a
And this is isomorphic to
data Maybe a = Nothing | Just a
meaning that you can write two functions
f :: Maybe a -> Option a
g :: Option a -> Maybe a
Such that f . g == id == g . f for all possible inputs. We can then say that (,,) is a data constructor isomorphic to the constructor
data Triple a b c = Triple a b c
Because you can write
f :: (a, b, c) -> Triple a b c
f (a, b, c) = Triple a b c
g :: Triple a b c -> (a, b, c)
g (Triple a b c) = (a, b, c)
And Node as a constructor is a special case of Triple, namely Triple a (Tree a) (Tree a). In fact, you could even go so far as to say that your definition of Tree could be written as
newtype Tree' a = Tree' (Maybe (a, Tree' a, Tree' a))
The newtype is required since you can't have a type alias be recursive. All you have to do is say that EmptyLeaf == Tree' Nothing and Node a l r = Tree' (Just (a, l, r)). You could pretty simply write functions that convert between the two.
Note that this is all from a mathematical point of view. The compiler can add extra metadata and other information to be able to identify a particular constructor making them behave slightly differently at runtime.

Related

Generic Pattern in Haskell

I was hoping that Haskell's compiler would understand that f v Is type-safe given Unfold v f (Although that is a tall order).
data Sequence a = FirstThen a (Sequence a) | Repeating a | UnFold b (b -> b) (b -> a)
Is there some way that I can encapsulate a Generic pattern for a datatype without adding extra template parameters.
(I am aware of a solution for this specific case using lazy maps but I am after a more general solution)

You can use existential quantification to get there:
data Sequence a = ... | forall b. UnFold b (b -> b) (b -> a)
I'm not sure this buys you much over the simpler solution of storing the result of unfolding directly, though:
data Stream a = Cons a (Stream a)
data Sequence' a = ... | Explicit (Stream a)
In particular, if somebody hands you a Sequence a, you can't pattern match on the b's contained within, even if you think you know what type they are.

Can trees be generalized to allow any traversable sub-tree?

Data.Tree uses a list to represent the subtree rooted at a particular node. Is it possible to have two tree types, for example one which uses a list and another which uses a vector? I want to be able to write functions which don't care how the sub-tree is represented concretely, only that the subtree is traversable, as well as functions which take advantage of a particular subtree type, e.g. fast indexing into vectors.
It seems like type families would be the right tool for the job, though I've never used them before and I have no idea how to actually define the right family.
If it matters, I'm not using the containers library tree instance, but instead I have types
data Tree a b = Node a b [Tree a b] deriving (Show, Foldable, Generic)
and
data MassivTree a b = V a b (Array B Ix1 (MassivTree a b))
where the latter uses vectors from massiv.

You could use a typeclass - in fact the typeclass you need probably already exists.
Consider this:
data Tree t a = Tree a (t (Tree t a))
Argument t is a higher-kinded type which represents a container of as.
Now define a set of Tree operations, constrained on Traversable like so:
:: (Foldable t) => Tree t a -> b
And you can now create and manipulate trees that use any Foldable. You would need to choose the right typeclass for the set of operations you want - Functor may be enough, or you may want Traversable if you are doing anything with monadic actions. You can choose the typeclass on a per-function basis, depending on what it does.
You can now define Tree types like so:
type ListTree a = Tree [] a
type MassivTree r ix a = Tree (Array r ix) a
You can also define instance-specific functions, with access to a full range of functionality:
:: ListTree a -> b
-- or
:: Tree [] a -> b
Happy Haskelling!

The simplest way to generically traverse a tree in haskell

Suppose I used language-javascript library to build AST in Haskell. The AST has nodes of different types, and each node can have fields of those different types.
And each type can have numerous constructors. (All the types instantiate Data, Eq and Show).
I would like to count each type's constructor occurrence in the tree. I could use toConstr to get the constructor, and ideally I'd make a Tree -> [Constr] function fisrt (then counting is easy).
There are different ways to do that. Obviously pattern matching is too verbose (imagine around 3 types with 9-28 constructors).
So I'd like to use a generic traversal, and I tried to find the solution in SYB library.
There is an everywhere function, which doesn't suit my needs since I don't need a Tree -> Tree transformation.
There is gmapQ, which seems suitable in terms of its type, but as it turns out it's not recursive.
The most viable option so far is everywhereM. It still does the useless transformation, but I can use a Writer to collect toConstr results. Still, this way doesn't really feel right.
Is there an alternative that will not perform a useless (for this task) transformation and still deliver the list of constructors? (The order of their appearance in the tree doesn't matter for now)

Not sure if it's the simplest, but:
> data T = L | B T T deriving Data
> everything (++) (const [] `extQ` (\x -> [toConstr (x::T)])) (B L (B (B L L) L))
[B,L,B,B,L,L,L]
Here ++ says how to combine the results from subterms.
const [] is the base case for subterms who are not of type T. For those of type T, instead, we apply \x -> [toConstr (x::T)].
If you have multiple tree types, you'll need to extend the query using
const [] `extQ` (handleType1) `extQ` (handleType2) `extQ` ...
This is needed to identify the types for which we want to take the constructors. If there are a lot of types, probably this can be made shorter in some way.
Note that the code above is not very efficient on large trees since using ++ in this way can lead to quadratic complexity. It would be better, performance wise, to return a Data.Map.Map Constr Int. (Even if we do need to define some Ord Constr for that)

universe from the Data.Generics.Uniplate.Data module can give you a list of all the sub-trees of the same type. So using Ilya's example:
data T = L | B T T deriving (Data, Show)
tree :: T
tree = B L (B (B L L) L)
λ> import Data.Generics.Uniplate.Data
λ> universe tree
[B L (B (B L L) L),L,B (B L L) L,B L L,L,L,L]
λ> fmap toConstr $ universe tree
[B,L,B,B,L,L,L]

Constraints for instances of typeclasses not mentioning the constrained type variable

Let's say we're writing a simple bitset that uses a field of type c (think Word) to signify the presence of different values of type a:
data BitSet c a = BitSet c
deriving (Eq, Ord, Data, Typeable, Generic, NFData)
isSet :: (Bits c, Enum a) => a -> BitSet c a -> Bool
isSet e (BitSet c) = testBit c $ fromEnum e
a is assumed to be Enum so that it could be converted to an integer corresponding to the bit index.
Let's now try writing a Foldable instance for BitSet c, starting with the following:
instance FiniteBits c => Foldable (BitSet c) where
foldr f b0 (BitSet c) = foldr f b0 list
where (list :: [a]) = toEnum <$> filter (testBit c) [0..finiteBitSize c - 1]
But this won't work: it doesn't have the constraint that a should satisfy Enum a, and I don't see any way to provide it (and I have a gut feel that it doesn't make much sense from more theoretical point of view if a isn't listed in the instance declaration, but that's probably another story).
The most voted answer here suggests that using GADTs might help, but GADTs have their own shortcomings:
One would need to use standalone deriving, which gets particularly ugly for Data and Typeable, and, what's worse,
GADTs seem to break derivation of Generic for this type at all.
Is there any other way to write a Foldable instance while still keeping autoderived Generic instance?

Binding together data, types and functions

I want to model a large tree (or forest) of some regular structure - tree can be decomposed to small tree (the irregular part) and (i.e.) large list of params, each of them with each of nodes make a node of big tree.
So, I want a data structure, where each node in a tree is representing many nodes. And real node is of type (node,param).
For algorithms that work on this kind of trees type of that param does not mattter. They are just placeholders. But some data should be possible to extract from the plain param or combination of node and param, and all possible params should be iterable. All that kinds of data is known apriori, they reflect semantic of that tree.
So, actual type, semantics and stuff of param is up to implementation of tree.
I model it in C++ using nested typedefs for params type, fixed method names for all kind of stuff that should be available to algorithm (this two together making a concept) and templates for algorithm itself.
I.e. if I want to associate with each node of big tree an integer, I would provide a function int data(const node& n, const param& p), where param is available as nested typedef, and algorithm could get list of all available params, and call data with nodes of interest and each of params
I have some plain data type, i.e. tree data, like this
data Tree = Node [Tree] | Leaf
Now I want to package up:
concrete tree
some type
some values of that type
some functions operating on (that concrete) tree nodes and (that) values
So one can write some function that use this packaged up types and functions, like, generic way.
How to achieve that?
With type families I came to
class PackagedUp t where
type Value t
tree :: Tree t
values :: [Value t]
f :: Tree t -> Value t -> Int
Tree now become Tree t because type families want type of their members to depend on typeclass argument.
Also, as in https://stackoverflow.com/a/16927632/1227578 type families to deal with injectivity will be needed.
With this I can
instance PackagedUp MyTree where
type Value MyTree = (Int,Int)
tree = Leaf
values = [(0,0),(1,1)]
f t v = fst v
And how to write such a function now? I.e. a function that will take root of a tree, all of values and make a [Int] of all f tree value.

First of all, your tree type should be defined like this:
data Tree a = Node a [Tree a] | Leaf
The type above is polymorphic. As far as semantics go that resembles what we would call a generic type in OO parlance (in C# or Java we might write Tree<A> instead). A node of a Tree a holds a value of type a and a list of subtrees.
Next, we come to PackagedUp. Classes in Haskell have little to do with the OO concept of the same name; they are not meant to package data and behaviour together. Things are actually much simpler: all you need to do is defining the appropriate functions for your tree type
getRoot :: Tree a -> Maybe a
getRoot Leaf = Nothing
getRoot (Node x _) = Just x
(Returning Maybe a is a simple way to handle failure with type safety. Think of the Nothing value as a polite cousin of null that doesn't explode with null reference exceptions.)
One thing that type classes are good at is in expressing data structure algorithm interfaces such as the ones you allude to. One of the most common classes is Functor, which provides a general interface for mapping over data structures.
instance Functor Tree where
fmap f Leaf = Leaf
fmap f (Node x ts) = Node (f x) (fmap f ts)
fmap has the following polymorphic type:
fmap :: Functor f => (a -> b) -> f a -> f b
With your tree, it specialises to
fmap :: (a -> b) -> Tree a -> Tree b
and with lists (as in fmap f ts) it becomes
fmap :: (a -> b) -> [a] -> [b]
Finally, the Data.Tree module provides a data structure which looks a lot like what you want to define.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string