Basic question: what design principles should one follow when choosing between using a class or using a record (with polymorphic fields)?
First, we know that classes and records are essentially equivalent (since in Core, classes get desugared to dictionaries, which are just records). Nevertheless, there are differences: class dictionaries are passed implicitly, whereas records must be passed explicitly.
Looking a little deeper, classes are really useful when:
we have many different representations of 'the same thing', and
in actual usage, which representation is used can be inferred.
Classes are awkward when we have (up to parametric polymorphism) only one representation of our data, but we have multiple instances. This leads to the syntactic noise of having to use newtype to add extra tags (which exist only in our code, as we know such tags get erased at run time) if we don't want to turn on all sorts of troublesome extensions (i.e. overlapping and/or undecidable instances).
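To make the newtype noise concrete, here is a minimal sketch. All names in it (Combine, Additive, Multiplicative, CombineDict and friends) are made up for illustration. It puts two structures on Int, once through a class with newtype tags and once through a plain record:

```haskell
-- Class route: one instance per type, so a second structure on Int needs a newtype tag.
class Combine a where
  unit    :: a
  combine :: a -> a -> a

newtype Additive       = Additive Int       deriving (Eq, Show)
newtype Multiplicative = Multiplicative Int deriving (Eq, Show)

instance Combine Additive where
  unit = Additive 0
  combine (Additive x) (Additive y) = Additive (x + y)

instance Combine Multiplicative where
  unit = Multiplicative 1
  combine (Multiplicative x) (Multiplicative y) = Multiplicative (x * y)

combineAll :: Combine a => [a] -> a
combineAll = foldr combine unit

-- Record route: the dictionary is an ordinary value, so no tags are needed.
data CombineDict a = CombineDict
  { unitD    :: a
  , combineD :: a -> a -> a
  }

additive, multiplicative :: CombineDict Int
additive       = CombineDict 0 (+)
multiplicative = CombineDict 1 (*)

combineAllWith :: CombineDict a -> [a] -> a
combineAllWith d = foldr (combineD d) (unitD d)
```

With the class, callers write combineAll (map Additive xs) and unwrap afterwards; with the record, they write combineAllWith additive xs and have to name the dictionary themselves.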
Of course, things get muddier: what if I want to have constraints on my types? Let's pick a real example:
class (Bounded i, Enum i) => Partition a i where
  index :: a -> i
I could just as easily have done
data Partition a i = Partition { index :: a -> i}
But now I've lost my constraints, and I will have to add them to specific functions instead.
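As a rough sketch of what "adding them to specific functions" could look like: cellCounts and parity below are hypothetical names, and the extra Eq constraint is only there because this particular function compares indices.

```haskell
data Partition a i = Partition { index :: a -> i }

-- The constraints from the class head move to the functions that need them.
cellCounts :: (Bounded i, Enum i, Eq i) => Partition a i -> [a] -> [(i, Int)]
cellCounts p xs =
  [ (i, length (filter ((== i) . index p) xs)) | i <- [minBound .. maxBound] ]

-- A partition of the integers into odd (False) and even (True).
parity :: Partition Int Bool
parity = Partition even
```

Functions that never enumerate the index type, such as one that only calls index, need no constraints at all.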
Are there design guidelines that would help me out?
I tend to see no issue with only requiring constraints on functions. The issue is, I suppose, that your data structure no longer models precisely what you intend it to. On the other hand, if you think of it as a data structure first and foremost, then that should matter less.
I'm not sure I have a good grasp on the question yet, and this is about as vague as can be, but my rule of thumb is that typeclasses are things that obey laws (or model meaning), and datatypes are things that encode a certain quantity of information.
When we want to layer behavior in complex ways, I've found that typeclasses start off enticingly, but can get painful quickly and switching to dictionary-passing makes things more straightforward. Which is to say that when we want implementations to be interoperable, then we should fall back to a uniform dictionary type.
This is take two, expanding a bit on a concrete example, but still just sort of spinning ideas...
Suppose we want to model probability distributions over the reals. Two natural representations come to mind.
A) Typeclass-driven
class PDist a where
  sample :: a -> Gen -> Double
B) Dictionary-driven
data PDist = PDist (Gen -> Double)
The former lets us do
data NormalDist = NormalDist Double Double -- mean, var
instance PDist NormalDist where...
data LognormalDist = LognormalDist Double Double
instance PDist LognormalDist where...
The latter lets us do
mkNormalDist :: Double -> Double -> PDist...
mkLognormalDist :: Double -> Double -> PDist...
In the former, we can write
data SumDist a b = SumDist a b
instance (PDist a, PDist b) => PDist (SumDist a b)...
in the latter we can simply write
sumDist :: PDist -> PDist -> PDist
So what are the tradeoffs? Typeclass-driven lets us specify what distributions we're given. The tradeoff is that we have to construct an algebra of distributions explicitly, including new types for their combinations. Data-driven doesn't let us restrict the distributions we're given (or even if they're well-formed) but in return we can do whatever the heck we want.
Furthermore, we can write a parseDist :: String -> PDist relatively easily, but we have to go through some angst to do the equivalent for the typeclass approach.
So this is, in a sense, the typed/untyped, static/dynamic tradeoff at another level. We can give it a twist though, and argue that the typeclass, along with associated algebraic laws, specifies the semantics of a probability distribution. And the PDist type can indeed be made an instance of the PDist typeclass. Meanwhile, we can resign ourselves to using the PDist type (rather than typeclass) nearly everywhere, while thinking of it as isomorphic to the tower of instances and datatypes necessary to use the typeclass more "richly."
In fact, we can even define basic PDist functions in terms of typeclass functions, e.g. mkNormalPDist m v = PDist (sample $ NormalDist m v). So there's lots of room in the design space to slide between the two representations as necessary...
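Here is one way the two representations might sit side by side. Several things are assumptions of this sketch rather than part of the answer: Gen is a stand-in (a list of uniform draws from (0, 1]), the record type is renamed PDistD because a class and a type cannot share a name within one module, and splitGen is a deliberately naive way of giving two samplers different draws.

```haskell
-- Hypothetical random source: a supply of uniform draws from (0, 1].
newtype Gen = Gen [Double]

class PDist a where
  sample :: a -> Gen -> Double

data NormalDist = NormalDist Double Double -- mean, variance

instance PDist NormalDist where
  -- Box-Muller transform on the first two uniform draws.
  sample (NormalDist m v) (Gen (u1 : u2 : _)) =
    m + sqrt v * sqrt (-2 * log u1) * cos (2 * pi * u2)
  -- Fallback when the supply runs dry: return the mean.
  sample (NormalDist m _) _ = m

-- Dictionary-driven representation (renamed from PDist for this sketch).
newtype PDistD = PDistD (Gen -> Double)

-- The dictionary type is itself an instance of the class.
instance PDist PDistD where
  sample (PDistD f) = f

mkNormalPDist :: Double -> Double -> PDistD
mkNormalPDist m v = PDistD (sample (NormalDist m v))

-- Naive split: even-positioned draws go left, odd-positioned draws go right.
splitGen :: Gen -> (Gen, Gen)
splitGen (Gen us) = (Gen (everyOther us), Gen (everyOther (drop 1 us)))
  where
    everyOther (x : rest) = x : everyOther (drop 1 rest)
    everyOther []         = []

-- Works on anything with a PDist instance and always returns the uniform type.
sumDist :: (PDist a, PDist b) => a -> b -> PDistD
sumDist d1 d2 = PDistD $ \g ->
  let (g1, g2) = splitGen g
  in sample d1 g1 + sample d2 g2
```

Note that sumDist accepts any mix of class instances but erases them to PDistD on the way out, which is the "slide between the two representations" idea in miniature.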
Note: I'm not sure that I understand the OP exactly. Suggestions/comments for improvement appreciated!
Background:
When I first learned about typeclasses in Haskell, the general rule-of-thumb I picked up was that, in comparison to Java-like languages:
typeclasses are similar to interfaces
data are similar to classes
Here's another SO question and answer that describe guidelines for using interfaces (also some drawbacks of interface over-use). My interpretation:
records/Java-classes are what something is
interfaces/typeclasses are roles that a concretion can fulfil
multiple, unrelated concretions can fulfil the same role
I bet you already know all this.
The guidelines I try to follow for my own code are:
typeclasses are for abstractions
records are for concretions
So in practice this means:
let the needs of the data determine the records
let the client code determine what the interfaces are -- clients should depend on abstractions, and thereby drive the creation and design of typeclasses
Example:
typeclass Show, with function show :: (Show s) => s -> String: for data that can be represented as a String.
clients just want to turn data into strings
clients don't care what the data (concretion) is -- only care that it can be represented as a string
role of implementing data: can be string-ified
this could not be achieved without a typeclass -- each datatype would require a conversion function with a different name, what a pain to deal with!
Type-classes can sometimes provide additional type-safety (An example would be Ord with Data.Map.union). If you have similar circumstances where choosing type-classes may help your type-safety - then use type-classes.
I'll present a different example where I think type-classes would not provide additional safety:
class Drawing a where
  drawAsHtml :: a -> Html
  drawOpenGL :: a -> IO ()
exampleFunctionA :: Drawing a => a -> a -> Something
exampleFunctionB :: (Drawing a, Drawing b) => a -> b -> Something
There is nothing exampleFunctionA could do and exampleFunctionB could not do (I find it hard to explain why, insights are welcome).
In this case I see no benefit of using a type-class.
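One possible way to see why: every method of Drawing only consumes an a, so a value is worth no more than the results of its methods. A record of those results carries the same information. The sketch below assumes Html is just a String and invents circle, square and renderPage for illustration:

```haskell
-- Assumption for this sketch: Html is plain text.
type Html = String

-- The "methods" become fields of an ordinary value.
data Drawing = Drawing
  { drawAsHtml :: Html
  , drawOpenGL :: IO ()
  }

circle :: Double -> Drawing
circle r = Drawing
  { drawAsHtml = "<circle r=\"" ++ show r ++ "\"/>"
  , drawOpenGL = putStrLn ("drawing a circle of radius " ++ show r)
  }

square :: Double -> Drawing
square s = Drawing
  { drawAsHtml = "<rect width=\"" ++ show s ++ "\" height=\"" ++ show s ++ "\"/>"
  , drawOpenGL = putStrLn ("drawing a square of side " ++ show s)
  }

-- Different shapes share one type, so they fit in a plain list.
renderPage :: [Drawing] -> Html
renderPage = concatMap drawAsHtml
```

With the class version, a list mixing circles and squares would need an existential wrapper; here it is just [Drawing].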
(Edited following feedback from Jacques and question from missingo)
Many articles about Haskell say that it lets you move some checks from run time to compile time. So, I want to implement the simplest check possible: allow a function to be called only on integers greater than zero. How can I do it?
module Positive (toPositive, getPositive, Positive) where
newtype Positive = Positive { unPositive :: Int }
toPositive :: Int -> Maybe Positive
toPositive n = if (n <= 0) then Nothing else Just (Positive n)
-- We can't export unPositive, because unPositive can be used
-- to update the field. Trivially renaming it to getPositive
-- ensures that getPositive can only be used to access the field
getPositive :: Positive -> Int
getPositive = unPositive
The above module doesn't export the constructor, so the only way to build a value of type Positive is to supply toPositive with a positive integer, which you can then unwrap using getPositive to access the actual value.
You can then write a function that only accepts positive integers using:
positiveInputsOnly :: Positive -> ...
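As a small usage sketch, here is the same idea flattened into a single file (in real code Positive would live in its own module with the constructor unexported). divideBy and checkedDivide are hypothetical names:

```haskell
newtype Positive = Positive Int

-- The only sanctioned way to build a Positive: one check, at run time.
toPositive :: Int -> Maybe Positive
toPositive n = if n <= 0 then Nothing else Just (Positive n)

getPositive :: Positive -> Int
getPositive (Positive n) = n

-- No zero check needed here: the argument type rules it out.
divideBy :: Positive -> Int -> Int
divideBy d n = n `div` getPositive d

-- The check happens once, at the boundary where raw Ints enter.
checkedDivide :: Int -> Int -> Maybe Int
checkedDivide d n = fmap (\p -> divideBy p n) (toPositive d)
```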
Haskell can perform some checks at compile time that other languages perform at runtime. Your question seems to imply you are hoping for arbitrary checks to be lifted to compile time, which isn't possible without a large potential for proof obligations (which could mean you, the programmer, would need to prove the property is true for all uses).
In the below, I don't feel like I'm saying anything more than what pigworker touched on while mentioning the very cool sounding Inch tool. Hopefully the additional words on each topic will clarify some of the solution space for you.
What People Mean (when speaking of Haskell's static guarantees)
Typically when I hear people talk about the static guarantees provided by Haskell they are talking about Hindley-Milner style static type checking. This means one type cannot be confused for another: any such misuse is caught at compile time (e.g. let x = "5" in x + 1 is invalid). Obviously, this only scratches the surface and we can discuss some more aspects of static checking in Haskell.
Smart Constructors: Check once at runtime, ensure safety via types
Gabriel's solution is to have a type, Positive, that can only be positive. Building positive values still requires a check at runtime but once you have a positive there are no checks required by consuming functions - the static (compile time) type checking can be leveraged from here.
This is a good solution for many, many problems. I recommended the same thing when discussing golden numbers. Nevertheless, I don't think this is what you are fishing for.
Exact Representations
dflemstr commented that you can use a type, Word, which is unable to represent negative numbers (a slightly different issue than representing positives). In this manner you really don't need to use a guarded constructor (as above) because there is no inhabitant of the type that violates your invariant.
A more common example of using proper representations is non-empty lists. If you want a type that can never be empty then you could just make a non-empty list type:
data NonEmptyList a = Single a | Cons a (NonEmptyList a)
This is in contrast to the traditional list definition using Nil instead of Single a.
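A short sketch of what this buys you: with the type above, taking the head is a total function. The helper names neHead, fromList and toList are my own choices here.

```haskell
data NonEmptyList a = Single a | Cons a (NonEmptyList a)

-- Total: every NonEmptyList has a first element.
neHead :: NonEmptyList a -> a
neHead (Single x) = x
neHead (Cons x _) = x

-- The emptiness check happens once, when converting from an ordinary list.
fromList :: [a] -> Maybe (NonEmptyList a)
fromList []       = Nothing
fromList [x]      = Just (Single x)
fromList (x : xs) = Cons x <$> fromList xs

toList :: NonEmptyList a -> [a]
toList (Single x)  = [x]
toList (Cons x xs) = x : toList xs
```

The base library also ships a type along these lines in Data.List.NonEmpty.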
Going back to the positive example, you could use a form of Peano numbers:
data Positive = One | S Positive
Or use GADTs to build unsigned binary numbers (and you can add Num, and other instances, allowing functions like +):
{-# LANGUAGE GADTs #-}
data Zero
data NonZero
data Binary a where
  I :: Binary a -> Binary NonZero
  O :: Binary a -> Binary a
  Z :: Binary Zero
  N :: Binary NonZero
instance Show (Binary a) where
  show (I x) = "1" ++ show x
  show (O x) = "0" ++ show x
  show (Z) = "0"
  show (N) = "1"
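To show how the index could be used, here is a sketch of a division that only accepts a Binary NonZero divisor. The functions toInt and divideBy are hypothetical additions; the bit order (most significant bit first) is inferred from the Show instance above.

```haskell
{-# LANGUAGE GADTs #-}

data Zero
data NonZero

data Binary a where
  I :: Binary a -> Binary NonZero
  O :: Binary a -> Binary a
  Z :: Binary Zero
  N :: Binary NonZero

-- Read the bits most significant first, as the Show instance prints them.
toInt :: Binary a -> Integer
toInt = go 0
  where
    go :: Integer -> Binary b -> Integer
    go acc (I x) = go (2 * acc + 1) x
    go acc (O x) = go (2 * acc) x
    go acc Z     = 2 * acc
    go acc N     = 2 * acc + 1

-- The divisor's type index rules out zero, so no run-time check is needed.
divideBy :: Binary NonZero -> Integer -> Integer
divideBy d n = n `div` toInt d
```

Something like divideBy (O Z) 10 is rejected by the type checker, because O Z has type Binary Zero.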
External Proofs
While not part of the Haskell universe, it is possible to generate Haskell using alternate systems (such as Coq) that allow richer properties to be stated and proven. In this manner the Haskell code can simply omit checks like x > 0 but the fact that x will always be greater than 0 will be a static guarantee (again: the safety is not due to Haskell).
From what pigworker said, I would classify Inch in this category. Haskell has not grown sufficiently to perform your desired tasks, but tools to generate Haskell (in this case, very thin layers over Haskell) continue to make progress.
Research on More Descriptive Static Properties
The research community that works with Haskell is wonderful. While too immature for general use, people have developed tools to do things like statically check function partiality and contracts. If you look around you'll find it's a rich field.
I would be failing in my duty as his supervisor if I failed to plug Adam Gundry's Inch preprocessor, which manages integer constraints for Haskell.
Smart constructors and abstraction barriers are all very well, but they push too much testing to run time and don't allow for the possibility that you might actually know what you're doing in a way that checks out statically, with no need for Maybe padding. (A pedant writes. The author of another answer appears to suggest that 0 is positive, which some might consider contentious. Of course, the truth is that we have uses for a variety of lower bounds, 0 and 1 both occurring often. We also have some use for upper bounds.)
In the tradition of Xi's DML, Adam's preprocessor adds an extra layer of precision on top of what Haskell natively offers but the resulting code erases to Haskell as is. It would be great if what he's done could be better integrated with GHC, in coordination with the work on type level natural numbers that Iavor Diatchki has been doing. We're keen to figure out what's possible.
To return to the general point, Haskell is currently not sufficiently dependently typed to allow the construction of subtypes by comprehension (e.g., elements of Integer greater than 0), but you can often refactor the types to a more indexed version which admits static constraint. Currently, the singleton type construction is the cleanest of the available unpleasant ways to achieve this. You'd need a kind of "static" integers, then inhabitants of kind Integer -> * capture properties of particular integers such as "having a dynamic representation" (that's the singleton construction, giving each static thing a unique dynamic counterpart) but also more specific things like "being positive".
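For readers who have not seen the singleton construction, a minimal sketch follows. It makes a simplifying assumption: it uses Peano naturals promoted with DataKinds rather than the static integers discussed above, and the names SNat, Positive, IsPositive and predecessor are illustrative only.

```haskell
{-# LANGUAGE DataKinds, GADTs, KindSignatures #-}

-- Promoted by DataKinds: Nat is also a kind, with types 'Z and 'S n.
data Nat = Z | S Nat

-- The singleton: exactly one run-time value for each static n.
data SNat (n :: Nat) where
  SZ :: SNat 'Z
  SS :: SNat n -> SNat ('S n)

-- A more specific property of a static number: "being positive".
data Positive (n :: Nat) where
  IsPositive :: Positive ('S n)

toInt :: SNat n -> Int
toInt SZ     = 0
toInt (SS m) = 1 + toInt m

-- The evidence excludes the SZ case statically, so there is no Maybe padding.
predecessor :: Positive n -> SNat n -> Int
predecessor IsPositive (SS m) = toInt m
```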
Inch represents an imagining of what it would be like if you didn't need to bother with the singleton construction in order to work with some reasonably well behaved subsets of the integers. Dependently typed programming is often possible in Haskell, but is currently more complicated than necessary. The appropriate sentiment toward this situation is embarrassment, and I for one feel it most keenly.
I know that this was answered a long time ago and I already provided an answer of my own, but I wanted to draw attention to a new solution that became available in the interim: Liquid Haskell, which you can read an introduction to here.
In this case, you can specify that a given value must be positive by writing:
{-@ myValue :: {v: Int | v > 0} @-}
myValue = 5
Similarly, you can specify that a function f requires only positive arguments like this:
{-@ f :: {v: Int | v > 0 } -> Int @-}
Liquid Haskell will verify at compile-time that the given constraints are satisfied.
This (or rather, the similar desire for a type of natural numbers, including 0) is a common complaint about Haskell's numeric class hierarchy, which makes it impossible to provide a really clean solution.
Why? Look at the definition of Num:
class (Eq a, Show a) => Num a where
  (+) :: a -> a -> a
  (*) :: a -> a -> a
  (-) :: a -> a -> a
  negate :: a -> a
  abs :: a -> a
  signum :: a -> a
  fromInteger :: Integer -> a
Unless you revert to using error (which is a bad practice), there is no way you can provide definitions for (-), negate and fromInteger.
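To illustrate the problem, here is what a Num instance for a hypothetical Nat wrapper might end up looking like. The partial methods are marked, and minus is a total alternative that has to live outside the class:

```haskell
newtype Nat = Nat Integer deriving (Eq, Show)

instance Num Nat where
  Nat a + Nat b  = Nat (a + b)
  Nat a * Nat b  = Nat (a * b)
  abs            = id
  signum (Nat a) = Nat (signum a)
  -- The remaining methods are partial: the class gives them no way to fail.
  fromInteger n
    | n >= 0    = Nat n
    | otherwise = error "Nat.fromInteger: negative literal"
  Nat a - Nat b
    | a >= b    = Nat (a - b)
    | otherwise = error "Nat.(-): negative result"
  negate n
    | n == Nat 0 = n
    | otherwise  = error "Nat.negate: negative result"

-- A total subtraction cannot have the type that Num demands for (-).
minus :: Nat -> Nat -> Maybe Nat
minus (Nat a) (Nat b)
  | a >= b    = Just (Nat (a - b))
  | otherwise = Nothing
```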
Type-level natural numbers are planned for GHC 7.6.1: https://ghc.haskell.org/trac/ghc/ticket/4385
Using this feature it's trivial to write a "natural number" type, and it gives a performance you could never achieve with, e.g., a manually written Peano number type.
When designing data structures in functional languages there are 2 options:
Expose their constructors and pattern match on them.
Hide their constructors and use higher-level functions to examine the data structures.
Which is appropriate in which cases?
Pattern matching can make code much more readable or simpler. On the other hand, if we need to change something in the definition of a data type then all places where we pattern-match on them (or construct them) need to be updated.
I've been asking this question myself for some time. Often it happens to me that I start with a simple data structure (or even a type alias) and it seems that constructors + pattern matching will be the easiest approach and produce a clean and readable code. But later things get more complicated, I have to change the data type definition and refactor a big part of the code.
The essential factor for me is the answer to the following question:
Is the structure of my datatype relevant to the outside world?
For example, the internal structure of the list datatype is very much relevant to the outside world - it has an inductive structure that is certainly very useful to expose to consumers, because they construct functions that proceed by induction on the structure of the list. If the list is finite, then these functions are guaranteed to terminate. Also, defining functions in this way makes it easy to prove properties about them, again by induction.
By contrast, it is best for the Set datatype to be kept abstract. Internally, it is implemented as a tree in the containers package. However, it might as well have been implemented using arrays, or (more usefully in a functional setting) with a tree with a slightly different structure and respecting different invariants (balanced or unbalanced, branching factor, etc). The need to enforce any invariants over and above those that the constructors already enforce through their types, by the way, precludes letting the datatype be concrete.
The essential difference between the list example and the set example is that the Set datatype is only relevant for the operations that are possible on Sets, whereas lists are relevant not only because the standard library already provides many functions to act on them, but also because their structure is relevant.
As a sidenote, one might object that the inductive structure of lists, which is so fundamental to writing functions whose termination and behaviour are easy to reason about, is captured abstractly by two functions that consume lists: foldr and foldl. Given these two basic list operators, most functions do not need to inspect the structure of a list at all, and so it could be argued that lists too could be kept abstract. This argument generalizes to many other similar structures, such as all Traversable structures, all Foldable structures, etc. However, it is nigh impossible to capture all possible recursion patterns on lists, and in fact many functions aren't recursive at all. Given only foldr and foldl, writing head, for example, would still be possible, though quite tedious:
head xs = fromJust $ foldl (\b x -> maybe (Just x) Just b) Nothing xs
We're much better off just giving away the internal structure of the list.
One final point is that sometimes the actual representation of a datatype isn't relevant to the outside world, because, say, it is an optimised representation rather than the canonical one, or because there isn't a single "canonical" representation. In these cases, you'll want to keep your datatype abstract, but offer "views" of your datatype, which do provide concrete representations that can be pattern matched on.
One example would be a Complex datatype of complex numbers, where both the cartesian and the polar form can be considered canonical. In this case, you would keep Complex abstract but export two views, i.e. functions polar and cartesian that return a pair of a length and an angle, or a coordinate in the cartesian plane, respectively.
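A minimal sketch of such a type with two views, written as a single file. The choice to store cartesian coordinates internally is an assumption of the sketch, as are the constructor-function names fromCartesian and fromPolar:

```haskell
-- In a real module the Complex constructor would not be exported.
-- Stored as (real part, imaginary part); clients never see this choice.
data Complex = Complex Double Double

fromCartesian :: Double -> Double -> Complex
fromCartesian = Complex

fromPolar :: Double -> Double -> Complex
fromPolar r theta = Complex (r * cos theta) (r * sin theta)

-- View 1: a coordinate in the cartesian plane.
cartesian :: Complex -> (Double, Double)
cartesian (Complex re im) = (re, im)

-- View 2: a length and an angle.
polar :: Complex -> (Double, Double)
polar (Complex re im) = (sqrt (re * re + im * im), atan2 im re)

-- Clients pattern match on a view instead of on the constructor.
magnitudeOf :: Complex -> Double
magnitudeOf z = case polar z of
  (r, _) -> r
```

Switching the internal storage to polar form would only change the bodies of these five functions, not their clients.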
Well, the rule is pretty simple: If it's easy to construct wrong values by using the actual constructors, then don't allow them to be used directly, but instead provide smart constructors. This is the path followed by some data structures like Map and Set, which are easy to get wrong.
Then there are the types for which it's impossible or hard to construct inconsistent/wrong values either because the type doesn't allow that at all or because you would need to introduce bottoms. The length-indexed list type (commonly called Vec) and most monads are examples of that.
Ultimately this is your own decision. Put yourself into the user's perspective and make the tradeoff between convenience and safety. If there is no tradeoff, then always expose the constructors. Otherwise your library users will hate you for the unnecessary opacity.
If the data type serves a simple purpose (like Maybe a) and no (explicit or implicit) assumptions about the data type can be violated by directly constructing a value via the data constructors, I would expose the constructors.
On the other hand, if the data type is more complex (like a balanced tree) and/or its internal representation is likely to change, I usually hide the constructors.
When using a package, there's an unwritten rule that the interface exposed by a non-internal module should be "safe" to use on the given data type. Considering the balanced tree example, exposing the data constructors allows one to (accidentally) construct an unbalanced tree, and so the assumed runtime guarantees for searching the tree etc might be violated.
If the type is used to represent values with a canonical definition and representation (many mathematical objects fall into this category), and it's not possible to construct "invalid" values using the type, then you should expose the constructors.
For example, if you're representing something like two dimensional points with your own type (including a newtype), you might as well expose the constructor. The reality is that a change to this datatype is not going to be a change in how 2d points are represented, it's going to be a change in your need to use 2d points (maybe you're generalising to 3d space, maybe you're adding a concept of layers, or whatever), and is almost certain to need attention in the parts of the code using values of this type no matter what you do.[1]
A complex type representing something specific to your application or field is quite likely to undergo changes to the representation while continuing to support similar operations. Therefore you only want other modules depending on the operations, not on the internal structure. So you shouldn't expose the constructors.
Other types represent things with canonical definitions but not canonical representations. Everyone knows the properties expected of maps and sets, but there are lots of different ways of representing values that support those properties. So you again only want other modules depending on the operations they support, not on the particular representations.
Some types, whether or not they are simple with canonical representations, allow the construction of values in the program which don't represent a valid value in the abstract concept the type is supposed to represent. A simple example would be a type representing a self-balancing binary search tree; client code with access to the constructors could easily construct invalid trees. Exposing the constructors either means you need to assume that such values passed in from outside may be invalid and therefore you need to make something sensible happen even for bizarre values, or means that it's the responsibility of the programmers working with your interface to ensure they don't violate any assumptions. It's usually better to just keep such types from being constructed directly outside your module.
Basically it comes down to the concept your type is supposed to represent. If your concept maps in a very simple and obvious[2] way directly to values in some data type which isn't "more inclusive" than the concept due to the compiler being unable to check needed invariants, then the concept is pretty much "the same" as the data type, and exposing its structure is fine. If not, then you probably need to keep the structure hidden.
[1] A likely change though would be to change which numeric type you're using for the coordinate values, so you probably do have to think about how to minimise the impact of such changes. That's pretty orthogonal to whether or not you expose the constructors though.
[2] "Obvious" here meaning that if you asked 10 people independently to come up with a data type representing the concept they would all come back with the same thing, modulo changing the names.
I would propose a different, noticeably more restrictive rule than most people. The central criterion would be:
Do you guarantee that this type will never, ever change? If so, exposing the constructors might be a good idea. Good luck with that, though!
But the types for which you can make that guarantee tend to be very simple, generic "foundation" types like Maybe, Either or [], which one could arguably write once and then never revisit again.
Though even those can be questioned, because they do get revisited from time to time; there's people who have used Church-encoded versions of Maybe and List in various contexts for performance reasons, e.g.:
{-# LANGUAGE RankNTypes #-}
newtype Maybe' a = Maybe' { elimMaybe' :: forall r. r -> (a -> r) -> r }
nothing = Maybe' $ \z k -> z
just x = Maybe' $ \z k -> k x
newtype List' a = List' { elimList' :: forall r. (a -> r -> r) -> r -> r }
nil = List' $ \k z -> z
cons x xs = List' $ \k z -> k x (elimList' xs k z)
These two examples highlight something important: you can replace the Maybe' type's implementation shown above with any other implementation as long as it supports the following three functions:
nothing :: Maybe' a
just :: a -> Maybe' a
elimMaybe' :: Maybe' a -> r -> (a -> r) -> r
...and the following laws:
elimMaybe' nothing z f == z
elimMaybe' (just x) z f == f x
And this technique can be applied to any algebraic data type. Which to me says that pattern matching against concrete constructors is just insufficiently abstract; it doesn't really gain you anything that you can't get out of the abstract constructors + destructor pattern, and it loses implementation flexibility.
I'm currently trying to come up with a data structure that fits the needs of two automata learning algorithms I'd like to implement in Haskell: RPNI and EDSM.
Intuitively, something close to what zippers are to trees would be perfect: those algorithms are state merging algorithms that maintain some sort of focus (the Blue Fringe) on states and therefore would benefit from some kind of zipper to reach interesting points quickly. But I'm kinda lost because a DFA (Deterministic Finite Automaton) is more a graph-like structure than a tree-like structure: transitions can make you go back in the structure, which is not likely to make zippers ok.
So my question is: how would you go about representing a DFA (or at least its transitions) so that you could manipulate it in a fast fashion?
Let me begin with the usual opaque representation of automata in Haskell:
newtype Auto a b = Auto (a -> (b, Auto a b))
This represents a function that takes some input and produces some output along with a new version of itself. For convenience it's a Category as well as an Arrow. It's also a family of applicative functors. Unfortunately this type is opaque. There is no way to analyze the internals of this automaton. However, if you replace the opaque function by a transparent expression type you should get automata that you can analyze and manipulate:
data Expr :: * -> * -> * where
    -- Stateless
    Id :: Expr a a
    -- Combinators
    Connect :: Expr a b -> Expr b c -> Expr a c
    -- Stateful
    Counter :: (Enum b) => b -> Expr a b
This gives you access to the structure of the computation. It is also a Category, but not an arrow. Once it becomes an arrow you have opaque functions somewhere.
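To connect the two representations, here is a sketch of how a transparent Expr could be both analysed and compiled down to the opaque Auto. The functions compile, connect, runAuto and size are hypothetical helpers, and Type from Data.Kind is used in place of * for the kind signature:

```haskell
{-# LANGUAGE GADTs, KindSignatures #-}
import Data.Kind (Type)

newtype Auto a b = Auto (a -> (b, Auto a b))

data Expr :: Type -> Type -> Type where
  Id      :: Expr a a
  Connect :: Expr a b -> Expr b c -> Expr a c
  Counter :: Enum b => b -> Expr a b

-- Analysis: possible only because the structure is visible.
size :: Expr a b -> Int
size Id            = 1
size (Connect f g) = 1 + size f + size g
size (Counter _)   = 1

idAuto :: Auto a a
idAuto = Auto (\x -> (x, idAuto))

-- Feed the output of the first automaton into the second.
connect :: Auto a b -> Auto b c -> Auto a c
connect (Auto f) (Auto g) = Auto $ \x ->
  let (y, f') = f x
      (z, g') = g y
  in (z, connect f' g')

-- Interpretation: forget the structure and obtain a runnable automaton.
compile :: Expr a b -> Auto a b
compile Id            = idAuto
compile (Connect f g) = connect (compile f) (compile g)
compile (Counter n)   = Auto (\_ -> (n, compile (Counter (succ n))))

-- Run an automaton over a list of inputs, collecting the outputs.
runAuto :: Auto a b -> [a] -> [b]
runAuto _        []       = []
runAuto (Auto f) (x : xs) = let (y, next) = f x in y : runAuto next xs
```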
Can you just use a graph to get started? I think the fgl package is part of the Haskell Platform.
Otherwise you can try defining your own structure with 'deriving (Data)' and use the "Scrap Your Zipper" library to get the Zipper.
If you don't need any fancy graph algorithms you can represent your DFA as a Map State State. This gives you fast access and manipulation. You also get focus by keeping track of the current state.
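As a sketch of the Map-based approach: the version below keys the transition map on a (state, symbol) pair rather than on the state alone, since a DFA's next state depends on the input symbol as well. That change, and all the names (DFA, step, accepts, evenAs), are my own assumptions; Data.Map.Strict comes from the containers package.

```haskell
import qualified Data.Map.Strict as Map
import Data.Foldable (foldlM)

type State  = Int
type Symbol = Char

data DFA = DFA
  { start     :: State
  , accepting :: [State]
  , delta     :: Map.Map (State, Symbol) State
  }

-- One transition; Nothing means the automaton is stuck.
step :: DFA -> State -> Symbol -> Maybe State
step dfa q c = Map.lookup (q, c) (delta dfa)

-- The "focus" is simply the current state threaded through the fold.
accepts :: DFA -> String -> Bool
accepts dfa = maybe False (`elem` accepting dfa) . foldlM (step dfa) (start dfa)

-- Example: words over {a} with an even number of letters.
evenAs :: DFA
evenAs = DFA 0 [0] (Map.fromList [((0, 'a'), 1), ((1, 'a'), 0)])
```

Redirecting or merging states then becomes a matter of Map.insert and Map.delete on delta.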
Take a look at the regex-tdfa package: http://hackage.haskell.org/package/regex-tdfa
The source is pretty complex, but it's an implementation of regexes with tagged DFAs tuned for performance, so it should illustrate some good practices for representing DFAs efficiently.
What it says in the title. If I write a type signature, is it possible to algorithmically generate an expression which has that type signature?
It seems plausible that it might be possible to do this. We already know that if the type is a special-case of a library function's type signature, Hoogle can find that function algorithmically. On the other hand, many simple problems relating to general expressions are actually unsolvable (e.g., it is impossible to know if two functions do the same thing), so it's hardly implausible that this is one of them.
It's probably bad form to ask several questions all at once, but I'd like to know:
Can it be done?
If so, how?
If not, are there any restricted situations where it becomes possible?
It's quite possible for two distinct expressions to have the same type signature. Can you compute all of them? Or even some of them?
Does anybody have working code which does this stuff for real?
Djinn does this for a restricted subset of Haskell types, corresponding to intuitionistic propositional logic. It can't manage recursive types or types that require recursion to implement, though; so, for instance, it can't write a term of type (a -> a) -> a (the type of fix), which corresponds to the proposition "if a implies a, then a", which is clearly false: if it held, you could use it to prove anything. Indeed, this is why fix gives rise to ⊥.
If you do allow fix, then writing a program to give a term of any type is trivial; the program would simply print fix id for every type.
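Spelled out as a tiny sketch (the names anything and ignoreSecond are made up here):

```haskell
import Data.Function (fix)

-- Type-checks at every type, but evaluating it never produces a value.
anything :: a
anything = fix id

-- Laziness lets the term be passed around as long as nobody forces it.
ignoreSecond :: Int
ignoreSecond = fst (42, anything :: String)
```

So the "term of any type" exists as far as the type checker is concerned, but it is only ever ⊥.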
Djinn is mostly a toy, but it can do some fun things, like deriving the correct Monad instances for Reader and Cont given the types of return and (>>=). You can try it out by installing the djinn package, or using lambdabot, which integrates it as the #djinn command.
Oleg at okmij.org has an implementation of this. There is a short introduction here but the literate Haskell source contains the details and the description of the process. (I'm not sure how this corresponds to Djinn in power, but it is another example.)
There are cases where there is no unique function:
fst', snd' :: (a, a) -> a
fst' (a,_) = a
snd' (_,b) = b
Not only this; there are cases where there are an infinite number of functions:
list0, list1, list2 :: [a] -> a
list0 l = l !! 0
list1 l = l !! 1
list2 l = l !! 2
-- etc.
-- Or
mkList0, mkList1, mkList2 :: a -> [a]
mkList0 _ = []
mkList1 a = [a]
mkList2 a = [a,a]
-- etc.
(If you only want total functions, then consider [a] as restricted to infinite lists for list0, list1 etc, i.e. data List a = Cons a (List a))
In fact, if you have recursive types, any types involving these correspond to an infinite number of functions. However, at least in the case above, there is a countable number of functions, so it is possible to create an (infinite) list containing all of them. But, I think the type [a] -> [a] corresponds to an uncountably infinite number of functions (again restrict [a] to infinite lists) so you can't even enumerate them all!
(Summary: there are types that correspond to a finite, countably infinite and uncountably infinite number of functions.)
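For the countable cases, the infinite lists of functions mentioned above can be written down directly. A small sketch (mkLists and indexers are hypothetical names, and they enumerate only the two families shown earlier, not every function of those types):

```haskell
-- mkList0, mkList1, mkList2, ... as one lazy, infinite list.
mkLists :: [a -> [a]]
mkLists = [ \x -> replicate n x | n <- [0 ..] ]

-- list0, list1, list2, ... likewise (each one is partial on short lists).
indexers :: [[a] -> a]
indexers = [ (!! n) | n <- [0 ..] ]
```

For example, take 3 mkLists gives the three functions mkList0, mkList1 and mkList2 from above.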
This is impossible in general (especially for languages like Haskell, which does not even have the strong normalization property), and only possible in some (very) special cases (and for more restricted languages), such as when the codomain type has only one constructor (for example, a function f :: forall a. a -> () can be determined uniquely). In order to reduce the set of possible definitions for a given signature to a singleton set with just one definition, you need to give more restrictions (in the form of additional properties, for example; it is still difficult to imagine how this can be helpful without an example of use).
From the (n-)categorical point of view, types correspond to objects, terms correspond to arrows (constructors also correspond to arrows), and function definitions correspond to 2-arrows. The question is analogous to the question of whether one can construct a 2-category with the required properties by specifying only a set of objects. It's impossible, since you need either an explicit construction for arrows and 2-arrows (i.e., writing terms and definitions), or a deductive system which allows one to deduce the necessary structure using a certain set of properties (that still need to be defined explicitly).
There is also an interesting question: given an ADT (i.e., a subcategory of Hask), is it possible to automatically derive instances for Typeable, Data (yes, using SYB), Traversable, Foldable, Functor, Pointed, Applicative, Monad, etc.? In this case, we have the necessary signatures as well as additional properties (for example, the monad laws; these properties cannot be expressed in Haskell, but they can be expressed in a language with dependent types). There are some interesting constructions:
http://ulissesaraujo.wordpress.com/2007/12/19/catamorphisms-in-haskell
which shows what can be done for the list ADT.
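For a few of the classes listed above, GHC can already derive instances mechanically through the DeriveFunctor, DeriveFoldable and DeriveTraversable extensions. The following is only a small sketch of that; the Tree type and the names example, doubled, total and checked are made up for illustration and do not come from the linked post.

```haskell
{-# LANGUAGE DeriveFunctor, DeriveFoldable, DeriveTraversable #-}

-- Hypothetical example type: a binary tree with values at the nodes.
data Tree a = Leaf | Node (Tree a) a (Tree a)
  deriving (Show, Eq, Functor, Foldable, Traversable)

example :: Tree Int
example = Node (Node Leaf 1 Leaf) 2 (Node Leaf 3 Leaf)

-- Uses the derived Functor instance.
doubled :: Tree Int
doubled = fmap (* 2) example

-- Uses the derived Foldable instance.
total :: Int
total = sum example

-- Uses the derived Traversable instance: the result is Just only if
-- every element passes the check.
checked :: Maybe (Tree Int)
checked = traverse (\x -> if x > 0 then Just x else Nothing) example
```

Applicative and Monad have no such stock deriving mechanism, since a type can admit several lawful instances and the compiler has no way to pick one from the signatures alone.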
The question is actually rather deep and I'm not sure of the answer, if you're asking about the full glory of Haskell types including type families, GADTs, etc.
What you're asking is whether a program can automatically prove that an arbitrary type is inhabited (contains a value) by exhibiting such a value. A principle called the Curry-Howard correspondence says that types can be interpreted as mathematical propositions, and a type is inhabited if the proposition is constructively provable. So you're asking if there is a program that can prove a certain class of propositions to be theorems. In a language like Agda, the type system is powerful enough to express arbitrary mathematical propositions, and proving arbitrary ones is undecidable by Gödel's incompleteness theorem. On the other hand, if you drop down to (say) pure Hindley-Milner, you get a much weaker and (I think) decidable system. With Haskell 98, I'm not sure, because type classes are supposed to be equivalent in power to GADTs.
With GADTs, I don't know if it's decidable or not, though maybe some more knowledgeable folks here would know right away. For example, it might be possible to encode the halting problem for a given Turing machine as a GADT, so that there is a value of that type iff the machine halts. In that case, inhabitation is clearly undecidable. But maybe such an encoding isn't quite possible, even with type families. I'm not currently fluent enough in this subject for it to be obvious to me either way, though as I said, maybe someone else here knows the answer.
(Update:) Oh, a much simpler interpretation of your question occurs to me: you may be asking if every Haskell type is inhabited. The answer is obviously not. Consider the polymorphic type
a -> b
There is no total function with that signature: the only definitions that typecheck are ones that never return a value, such as undefined or an infinite loop (and not counting something like unsafeCoerce, which makes the type system inconsistent).
I commonly run into a problem when writing larger programs in Haskell: I often want multiple distinct types that share an internal representation and several core operations.
There are two relatively obvious approaches to solving this problem.
One is using a type class and the GeneralizedNewtypeDeriving extension. Put enough logic into a type class to support the shared operations that the use case desires. Create a type with the desired representation, and create an instance of the type class for that type. Then, for each use case, create wrappers for it with newtype, and derive the common class.
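As a rough sketch of what this first approach can look like (the class Quantity, the representation Raw and the wrappers Apples and Oranges are all hypothetical names chosen for illustration):

```haskell
{-# LANGUAGE GeneralizedNewtypeDeriving #-}

-- The shared operations live in a class.
class Quantity q where
  zero  :: q
  plus  :: q -> q -> q
  toInt :: q -> Int

-- The one real representation, with the only hand-written instance.
newtype Raw = Raw Int deriving (Show, Eq)

instance Quantity Raw where
  zero = Raw 0
  plus (Raw a) (Raw b) = Raw (a + b)
  toInt (Raw a) = a

-- One wrapper per use case; the Quantity instance is derived from Raw's.
newtype Apples  = Apples Raw  deriving (Show, Eq, Quantity)
newtype Oranges = Oranges Raw deriving (Show, Eq, Quantity)

basket :: Apples
basket = Apples (Raw 2) `plus` Apples (Raw 3)

-- plus basket (Oranges (Raw 1))  -- rejected: Apples is not Oranges
```

Each new use case costs one newtype line, but every shared operation has to be a class method (or be written against the class) before the wrappers can use it.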
The other is to declare the type with a phantom type variable, and then use EmptyDataDecls to create distinct types for each different use case.
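A comparable sketch of the phantom type approach, again with hypothetical names (Count, AppleTag, OrangeTag):

```haskell
{-# LANGUAGE EmptyDataDecls #-}

-- Tag types with no constructors; they exist only at the type level.
data AppleTag
data OrangeTag

-- The tag parameter does not appear on the right-hand side.
newtype Count tag = Count Int deriving (Show, Eq)

-- Written once, usable at every tag, but both arguments must share a tag.
plus :: Count tag -> Count tag -> Count tag
plus (Count a) (Count b) = Count (a + b)

type Apples  = Count AppleTag
type Oranges = Count OrangeTag

apples :: Apples
apples = Count 2 `plus` Count 3

oranges :: Oranges
oranges = Count 1

-- plus apples oranges  -- rejected: AppleTag does not match OrangeTag
```

Here the shared operations are ordinary functions that are polymorphic in the tag, so no class is needed, and each new use case costs one empty data declaration.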
My main concern is not mixing up values that share internal representation and operations, but have different meanings in my code. Both of those approaches solve that problem, but feel significantly clumsy. My second concern is reducing the amount of boilerplate required, and both approaches do well enough at that.
What are the advantages and disadvantages of each approach? Is there a technique that comes closer to doing what I want, providing type safety without boilerplate code?
There's another straightforward approach.
data MyGenType = Foo | Bar
op :: MyGenType -> MyGenType
op x = ...
op2 :: MyGenType -> MyGenType -> MyGenType
op2 x y = ...
newtype MySpecialType = MySpecialType {unMySpecial :: MyGenType}
inMySpecial f = MySpecialType . f . unMySpecial
inMySpecial2 f x y = ...
somefun = ... inMySpecial op x ...
someOtherFun = ... inMySpecial2 op2 x y ...
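With the elided parts filled in by placeholders, this approach might look like the following. The bodies of op and op2 are arbitrary stand-ins chosen only so the sketch compiles, and the definition of inMySpecial2 is one plausible reading of the elided line, not necessarily what was intended.

```haskell
data MyGenType = Foo | Bar deriving (Show, Eq)

-- Placeholder bodies; the real operations are not given above.
op :: MyGenType -> MyGenType
op Foo = Bar
op Bar = Foo

op2 :: MyGenType -> MyGenType -> MyGenType
op2 Foo y = y
op2 Bar _ = Bar

newtype MySpecialType = MySpecialType { unMySpecial :: MyGenType }
  deriving (Show, Eq)

-- Lift a unary function on the general type to the tagged type.
inMySpecial :: (MyGenType -> MyGenType) -> MySpecialType -> MySpecialType
inMySpecial f = MySpecialType . f . unMySpecial

-- Assumed binary lifting: unwrap both arguments, apply, rewrap.
inMySpecial2 :: (MyGenType -> MyGenType -> MyGenType)
             -> MySpecialType -> MySpecialType -> MySpecialType
inMySpecial2 f x y = MySpecialType (f (unMySpecial x) (unMySpecial y))

someFun :: MySpecialType -> MySpecialType
someFun = inMySpecial op

someOtherFun :: MySpecialType -> MySpecialType -> MySpecialType
someOtherFun = inMySpecial2 op2
```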
Alternately,
newtype MySpecial a = MySpecial a
instance Functor MySpecial where...
instance Applicative MySpecial where...
somefun = ... fmap op x ...
someOtherFun = ... liftA2 op2 x y ...
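Spelled out, the instances for this wrapper are the usual identity-functor ones. In the sketch below, negate and (+) on Int stand in for op and op2, which is purely an assumption made to keep the example short.

```haskell
import Control.Applicative (liftA2)

newtype MySpecial a = MySpecial a deriving (Show, Eq)

instance Functor MySpecial where
  fmap f (MySpecial a) = MySpecial (f a)

instance Applicative MySpecial where
  pure = MySpecial
  MySpecial f <*> MySpecial a = MySpecial (f a)

-- Unary operations on the naked type are lifted with fmap ...
someFun :: MySpecial Int -> MySpecial Int
someFun = fmap negate

-- ... and binary ones with liftA2.
someOtherFun :: MySpecial Int -> MySpecial Int -> MySpecial Int
someOtherFun = liftA2 (+)
```

Because the wrapper is parametric, the same MySpecial tag can be reused over any underlying type without writing new lifting functions.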
I think these approaches are nicer if you want to use your general type "naked" with any frequency, and only sometimes want to tag it. If, on the other hand, you generally want to use it tagged, then the phantom type approach more directly expresses what you want.
I've benchmarked toy examples and not found a performance difference between the two approaches, but usage does typically differ a bit.
For instance, in some cases you have a generic type whose constructors are exposed and you want to use newtype wrappers to indicate a more semantically specific type. Using newtypes then leads to call sites like,
s1 = Specific1 $ General "Bob" 23
s2 = Specific2 $ General "Joe" 19
Where the fact that the internal representations are the same between the different specific newtypes is transparent.
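The type definitions behind those call sites are not shown; one set of definitions consistent with them (an assumption, including the helper age1) would be:

```haskell
-- Hypothetical definitions matching the call sites above.
data General = General String Int deriving (Show, Eq)

newtype Specific1 = Specific1 General deriving (Show, Eq)
newtype Specific2 = Specific2 General deriving (Show, Eq)

s1 :: Specific1
s1 = Specific1 $ General "Bob" 23

s2 :: Specific2
s2 = Specific2 $ General "Joe" 19

-- Client code can pattern match straight through the wrapper,
-- because both constructors are exposed.
age1 :: Specific1 -> Int
age1 (Specific1 (General _ age)) = age
```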
The type tag approach almost always goes along with representation constructor hiding,
data General2 a = General2 String Int
and the use of smart constructors, leading to a data type definition and call sites like,
mkSpecific1 "Bob" 23
Part of the reason is that you want some syntactically light way of indicating which tag you want. If you didn't provide smart constructors, then client code would often pick up type annotations to narrow things down, e.g.,
myValue = General2 "Bob" 23 :: General2 Specific1
Once you adopt smart constructors, you can easily add extra validation logic to catch misuses of the tag. A nice aspect of the phantom type approach is that pattern matching isn't changed at all for internal code that has access to the representation.
internalFun :: General2 a -> General2 a -> Int
internalFun (General2 _ age1) (General2 _ age2) = age1 + age2
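Putting the pieces of the phantom type version together gives something like the sketch below. The tag declarations, mkSpecific2 and its age check are invented for illustration; in a real module the export list would expose the smart constructors and hide the General2 constructor.

```haskell
-- Tag types with no constructors.
data Specific1
data Specific2

data General2 a = General2 String Int deriving (Show, Eq)

-- Smart constructor that only fixes the tag.
mkSpecific1 :: String -> Int -> General2 Specific1
mkSpecific1 = General2

-- Smart constructor with a made-up validation rule attached to the tag.
mkSpecific2 :: String -> Int -> Maybe (General2 Specific2)
mkSpecific2 name age
  | age >= 18 = Just (General2 name age)
  | otherwise = Nothing

-- Internal code matches on the representation regardless of the tag,
-- but both arguments must carry the same tag.
internalFun :: General2 a -> General2 a -> Int
internalFun (General2 _ age1) (General2 _ age2) = age1 + age2
```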
Of course you can use the newtypes with smart constructors and an internal class for accessing the shared representation, but I think a key decision point in this design space is whether you want to keep your representation constructors exposed. If the sharing of representation should be transparent, and client code should be free to use whatever tag it wishes with no extra validation, then newtype wrappers with GeneralizedNewtypeDeriving work fine. But if you are going to adopt smart constructors for working with opaque representations, then I usually prefer phantom types.
Put enough logic into a type class to support the shared operations that the use case desires. Create a type with the desired representation, and create an instance of the type class for that type. Then, for each use case, create wrappers for it with newtype, and derive the common class.
This presents some pitfalls, depending on the nature of the type and what kind of operations are involved.
First, it forces a lot of functions to be unnecessarily polymorphic--even if in practice every instance does the same thing for different wrappers, the open world assumption for type classes means the compiler has to account for the possibility of other instances. While GHC is definitely smarter than the average compiler, the more information you can give it the more it can do to help you.
Second, this can create a bottleneck for more complicated data structures. Any generic function on the wrapped types will be constrained to the interface presented by the type class, so unless that interface is exhaustive in terms of both expressivity and efficiency, you run the risk of either hobbling algorithms that use the type or altering the type class repeatedly as you find missing functionality.
On the other hand, if the wrapped type is already kept abstract (i.e., it doesn't export constructors) the bottleneck issue is irrelevant, so a type class might make good sense. Otherwise, I'd probably go with the phantom type tags (or possibly the identity Functor approach that sclv described).