I'm trying to reduce my confusion about Haskell's syntax and would like to find out what the separate namespaces are in Haskell.
By namespaces I mean syntactical namespaces corresponding to the various symbol tables the compiler manages, not name scopes defined in code.
For example:
Value names (like function names)
Data constructors
Type constructors
Type parameters (in type definitions)
instances ?
...?
I'm interested because I'm having trouble reading Haskell code (definitely more than with any other language) because I frequently have a hard time figuring out what exactly I'm looking at (especially with data/type constructors/type declarations).
Haskell seems to reuse a handful of syntactic constructs (esp. `<name> <name> ...`) in many places and relies on context; it just turns out that the compiler is a lot better at this than I am...
The Haskell Report §1.4 says
There are six kinds of names in Haskell: those for variables and
constructors denote values; those for type variables, type
constructors, and type classes refer to entities related to the type
system; and module names refer to modules. There are two constraints
on naming:
Names for variables and type variables are identifiers beginning with lowercase letters or underscore; the other four kinds of names
are identifiers beginning with uppercase letters.
An identifier must not be used as the name of a type constructor and a class in the same scope.
These are the only constraints; for example, Int may simultaneously be
the name of a module, class, and constructor within a single scope.
The confusion can be avoided if you make sure you understand what you are reading:
an expression: here every uppercase name is a data constructor (or part of a qualified variable or constructor), whereas lowercase names are variables bound to values
a type: here every uppercase name is a type constructor or class name, whereas lowercase names are type variables.
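A minimal sketch of this overlap: in the declaration below the name `Pair` lives in both namespaces at once, and only its position in the code tells you which one is meant.

```haskell
-- 'Pair' is declared twice: once as a type constructor, once as a data constructor
data Pair = Pair Int Int

-- In the signature, 'Pair' is the type constructor;
-- in the expression and pattern, it is the data constructor.
swap :: Pair -> Pair
swap (Pair x y) = Pair y x
```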
Related
As I learn Haskell, I can't help but try to understand everything from a formal point of view. After all, this is the theoretical coherence I came looking for as a Scala programmer.
One thing that does not compute well in my mind is fitting the meaning of a type declaration into the overarching theory of "lambda calculus, where everything is an expression". Yes, there is binding, but binding does not account for type declarations either.
Example
data DataType a = Data a | Datum
Question:
What is the meaning of = here? If this were a function declaration, then on the RHS we would have an expression reducible to an irreducible value, or to another function. That is not what we have above.
My confusion
We have a type function in DataType a, and data constructors (a.k.a. value constructors) in Data a and Datum. Clearly a type is not equal to a value: one lives at the type level and the other at the term level, not the same space at all. It might work to follow the pronunciation provided at https://wiki.haskell.org/Pronunciation, where = is pronounced "is", as in "a DataType is these values". But that is a stretch, because it is not the same semantics as for a function declaration. Hence I'm puzzled: a type-level function being equal to value-level functions makes no sense to me.
My question, reformulated to explain what I am looking for
So, in a sense, I would like to understand the semantics of a data declaration and where it fits into "everything is an expression" (Haskell's theoretical framework).
Also, clarifying the difference with binding: generally speaking, when talking about a binding, can we say it is an expression of type Unit? If not, what is it, what is its type, and where does it fit in the lambda calculus (or whatever we should call the theoretical framework that backs Haskell)?
I think the simplest answer here is that = doesn’t have any one meaning in particular — it is simply the syntax which the Haskell designers decided to use for data types. There is no particular relationship between the = used in function definitions and the = used in ADT definitions; both constructions are used to declare something on the LHS using some sort of specification on the RHS, but that’s about it. The two constructions have more differences than similarities.
As it happens, modern Haskell doesn't necessarily require the use of = when defining an ADT. The GADTSyntax and GADTs GHC language extensions enable 'GADT syntax', which is an alternative way to define ADTs. It looks like this:
data DataType a where
  Data :: a -> DataType a
  Datum :: DataType a
This illustrates that the use of = is not necessarily a mandatory part of type declarations: it is simply one convention amongst many.
(As to why the Haskell designers chose the convention with =, I'm not entirely sure. Possibly it was by analogy with type synonyms, like type Synonym = Maybe Int, where = does resemble its use in function definitions, in that the RHS expresses a type which is then assigned the name on the LHS.)
Since you mentioned you are a Scala programmer, here is what DataType might look like in Scala:
sealed trait DataType[+A]
case class Data[A](a: A) extends DataType[A]
object Datum extends DataType[Nothing]
I read your Haskell code as "A DataType a can be either a Data a or Datum". DataType is a type constructor, whereas Data is a value constructor and Datum is a value. You can think of Data as a function Data :: a -> DataType a, and Datum as something like Datum :: forall a. DataType a (imagine the names were lowercase). The RHS tells you which functions you can use to get a value of type LHS.
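To make the "constructors are functions" view concrete, here is a small sketch: `Data` can be passed around like any function of type `a -> DataType a`, and pattern matching inverts what the constructors built.

```haskell
data DataType a = Data a | Datum

-- 'Data' really is a function: it can be mapped over a list like any other
wrapAll :: [a] -> [DataType a]
wrapAll = map Data

-- Pattern matching deconstructs what the constructors built
describe :: Show a => DataType a -> String
describe (Data x) = "Data " ++ show x
describe Datum    = "Datum"
```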
I saw that, in the book, Programming Language Design Concepts by John Wiley, 2004, there is a definition for bindables:
"A bindable entity is one that may be bound to an identifier.
Programming languages vary in the kinds of entity that are bindable:
• C’s bindable entities are types, variables, and function procedures.
• JAVA’s bindable entities are values, local variables, instance and
class variables, methods, classes, and packages.
• ADA’s bindable entities include types, values, variables,
procedures, exceptions, packages, and tasks."
I'm curious, which bindable entities are in Haskell?
Haskell has three namespaces, one each for runtime computations, types, and modules.
Any term representing a runtime computation may be named in the computation namespace. data and newtype declarations create new names in the computation namespace for constructing values of their new type and, if record syntax is used, for selecting fields from the new type. class declarations create new names in the computation namespace for their methods.
Any monomorphic type may be named in the type namespace with a type declaration (see comments below for my predictions on confusing subtleties in this statement). data and newtype declarations create new names in the type namespace for constructing the type they declare. class declarations create new names in the type namespace for the constraint they create.
module declarations create new names in the module namespace.
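A small illustration of how each declaration form adds names, assuming the three-namespace view above: the type and class names land in the type namespace, while the constructor, the record selector, and the class method all land in the computation namespace.

```haskell
-- 'Person' enters the type namespace; 'Person' (again) and 'name'
-- enter the computation namespace (constructor and record selector).
data Person = Person { name :: String }

-- 'Greet' enters the type namespace (as a constraint);
-- 'greet' enters the computation namespace (as a method).
class Greet a where
  greet :: a -> String

instance Greet Person where
  greet p = "hello, " ++ name p
```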
GHC extends Haskell, adding a variety of new ways to bind names (almost all in the type namespace); a comprehensive list is probably too large for this format but the manual is excellent and covers them all.
Now, to the subtleties of type. One confusion that I predict will arise is this: I say only monomorphic types may be named. But one may object that I can certainly write, e.g.
type TypeOfIdMono a = a -> a
id :: TypeOfIdMono a
and that looks like it has named a polymorphic type. I claim that Haskell's penchant for making foralls implicit has instead confused the issue, and that TypeOfIdMono a is in fact monomorphic. With explicit foralls, this would be written:
type TypeOfIdMono a = a -> a
id :: forall a. TypeOfIdMono a
That is: we have not actually named id's type here, but rather the type of a monomorphic function which only operates on `a`s. id says additionally that the caller gets to choose a, that is, that the function is polymorphic. Compare this declaration, which is not allowed in standard Haskell (though it is available via GHC extensions, as alluded to above):
type TypeOfIdPoly = forall a. a -> a
id :: TypeOfIdPoly
Here we really have named a polymorphic type.
In short: one can and should distinguish between three orthogonal concepts: "parameterized types" (e.g. TypeOfIdMono which expects an additional argument), types which mention a type variable (e.g. TypeOfIdMono a), and polymorphic types (e.g. TypeOfIdPoly) which necessarily have a forall.
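The three notions can be put side by side in one file; a sketch, requiring GHC's RankNTypes extension for the polymorphic alias:

```haskell
{-# LANGUAGE RankNTypes #-}

-- A parameterized type: it expects a type argument before it names a type.
type TypeOfIdMono a = a -> a

-- A polymorphic type: the forall is part of the named type itself.
type TypeOfIdPoly = forall a. a -> a

-- Both signatures are accepted, but for different reasons:
idA :: TypeOfIdMono b   -- an implicit 'forall b.' is added outside the alias
idA x = x

idB :: TypeOfIdPoly     -- the forall comes from inside the alias
idB x = x
```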
Suppose I have the following class:
class P a where
  nameOf :: a -> String
I would like to declare that all instances of this class are automatically instances of Show. My first attempt would be the following:
instance P a => Show a where
  show = nameOf
My first attempt to go this way yesterday resulted in a rabbit warren of language extensions: I was first told to switch on flexible instances, then undecidable instances, then overlapping instances, and finally getting an error about overlapping instance declarations. I gave up and returned to repeating the code. However, this fundamentally seems like a very simple demand, and one that should be easily satisfied.
So, two questions:
Is there a trivially easy way to do this that I've just missed?
Why do I get an overlapping instances problem? I can see why I might need UndecidableInstances, since I seem to be violating the Paterson condition, but there are no overlapping instances around here: there are no instances of P, even. Why does the typechecker believe there are multiple instances for Show Double (as seems to be the case in this toy example)?
You get the overlapping instances error because some instances of P may also have their own instances of Show, and then the compiler won't be able to decide which one to use. If you have an instance of P for Double, there you go: you get two instances of Show for Double, your general one and the one already declared in Haskell's base library. How this error is triggered is correctly stated by @augustss in the comments to your question. For more info see the specs.
As you already know, there is no way to achieve what you're trying without UndecidableInstances. When you enable that flag you must understand that you're taking over the compiler's responsibility to ensure that no conflicting instances arise. This means that, of course, there must not be any other instances of Show in your library. It also means that your library must not export the P class, which removes the possibility of users of the library declaring conflicting instances.
If your case somehow conflicts with what was said above, it's a reliable sign that there is something wrong with it. And in fact there is...
What you're trying to achieve is incorrect in the first place. You are missing several important points about the Show typeclass that distinguish it from constructs like the toString method of popular OO languages:
From Show's haddock:
The result of show is a syntactically correct Haskell expression containing only constants, given the fixity declarations in force at the point where the type is declared. It contains only the constructor names defined in the data type, parentheses, and spaces. When labelled constructor fields are used, braces, commas, field names, and equal signs are also used.
In other words, declaring an instance of Show, which does not produce a valid Haskell expression, is incorrect per se.
Given the above, it just doesn't make sense to declare a custom instance of Show when the type allows you to simply derive it.
When a type does not allow deriving (e.g., a GADT), you'll generally still have to stick to type-specific instances to produce correct results.
So, if you need a custom representation function, you shouldn't use Show for that. Just declare a custom class, e.g.:
class Repr a where
  repr :: a -> String
and approach the instance declarations responsibly.
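A sketch of what "approaching the instance declarations responsibly" might look like: per-type instances, written one by one, with no blanket instance that could conflict with anything else.

```haskell
class Repr a where
  repr :: a -> String

-- Type-specific instances rather than a single catch-all
-- 'instance C a => Repr a' that would overlap with everything.
instance Repr Bool where
  repr True  = "yes"
  repr False = "no"

instance Repr a => Repr [a] where
  repr xs = "[" ++ concatMap repr xs ++ "]"
```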
Some time ago in one of the Haskell extensions (I can't find the link), and recently in Ur, I found that names (e.g., of record fields) form a kind. Can somebody explain why abstraction at the type level is not enough for them?
The answer is simple: because they can appear in types. Consequently, they have to live on the type level (otherwise you would need dependent types). And because they live on the type level, they are classified by a kind.
Record systems define rules for values, types and (maybe) kinds. What rules are used depends on the type system being designed, and what the designer wishes to achieve.
E.g. in Haskell, record labels are:
values (the accessor functions)
those values have types (e.g. Record -> Int)
those types have kinds (*)
Other record systems can use the type or kind system for different purposes.
By putting labels in a separate kind, the type checker can treat them specially, with special rules for e.g. automatic lenses, or proofs about record construction (totality, perhaps) that do not hold for general-purpose functions.
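Modern GHC illustrates this design choice directly: with the DataKinds extension, type-level strings inhabit their own kind, `Symbol`, and can serve as labels that the type checker tracks separately from ordinary types. A sketch, assuming GHC with the listed extensions:

```haskell
{-# LANGUAGE DataKinds, KindSignatures, ScopedTypeVariables #-}
import GHC.TypeLits (Symbol, KnownSymbol, symbolVal)
import Data.Proxy (Proxy (..))

-- The label is a type-level string of kind 'Symbol', not a value and
-- not an ordinary type of kind '*'.
newtype Tagged (label :: Symbol) a = Tagged a

-- The label can be recovered at runtime from the type alone.
labelOf :: forall label a. KnownSymbol label => Tagged label a -> String
labelOf _ = symbolVal (Proxy :: Proxy label)
```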
An example of using the kind system in Haskell is the use of "unboxed types". These are types that have:
different runtime representations from regular values
different binding forms (e.g. can't be allocated on the heap)
To keep unboxed types from mixing in with regular types, they are given a different kind, which allows the compiler to track their separation.
So, there is nothing magic about record label names that means you have to use a different kind to represent them -- it is just a choice a language designer can make - and in a dependently-typed language such as Ur or Twelf, that can be a useful distinction.
At first glance, there are obvious distinctions between the two kinds of "class". However, I believe there are more similarities:
Both have different kinds of constructors.
Both define a group of operations that could be applied to a particular type of data, in other words, they both define an Interface.
I can see that "class" is much more concise in Haskell and it's also more efficient. But, I have a feeling that, theoretically, "class" and "abstract class" are identical.
What's your opinion?
Er, not really, no.
For one thing, Haskell's type classes don't have constructors; data types do.
Also, a type class instance isn't really attached to the type it's defined for, it's more of a separate entity. You can import instances and data definitions separately, and usually it doesn't really make sense to think about "what class(es) does this piece of data belong to". Nor do functions in a type class have any special access to the data type an instance is defined for.
What a type class actually defines is a collection of identifiers that can be shared to do conceptually equivalent things (in some sense) to different data types, on an explicit per-type basis. This is why it's called ad-hoc polymorphism, in contrast to the standard parametric polymorphism that you get from regular type variables.
It's much, much closer to "overloaded functions" in some languages, where different functions are given the same name, and dispatch is done based on argument types (for some reason, other languages don't typically allow overloading based on return type, though this poses no problem for type classes).
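The return-type dispatch mentioned above can be sketched in a few lines: nothing in `def`'s arguments mentions `a`, so the instance is chosen purely by the type the caller demands, something OO-style overloading cannot do.

```haskell
class Default a where
  def :: a   -- no argument mentions 'a': dispatch happens on the result type

instance Default Int where
  def = 0

instance Default Bool where
  def = False

-- The caller picks the instance by demanding a result type:
zeroInt :: Int
zeroInt = def
```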
Apart from the implementation differences, one major conceptual difference is regarding when the classes / type classes as declared.
If you create a new class, MyClass, in e.g. Java or C#, you need to specify all the interfaces it provides at the time you develop the class. Now, if you bundle your code up into a library and provide it to a third party, they are limited by the interfaces you decided the class should have. If they want additional interfaces, they'd have to create a derived class, TheirDerivedClass. Unfortunately, your library might make copies of MyClass without knowledge of the derived class, and might return new instances through its interfaces that they'd then have to wrap. So, to really add new interfaces to the class, they'd have to add a whole new layer on top of your library. Not elegant, nor really practical either.
With type classes, you specify interfaces that a type provides separate from the type definition. If a third party library now contained YourType, I can just instantiate YourType to belong to the new interfaces (that you did not provide when you created the type) within my own code.
Thus, type classes allow the user of the type to be in control of what interfaces the type adheres to, while with 'normal' classes, the developer of the class is in control (and has to have the crystal ball needed to see all the possible things for what the user might want to use the class).
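A sketch of this retroactive extension: giving an existing, "third-party" type (here `Maybe` from the standard library) a brand-new interface without touching its definition.

```haskell
-- A new interface defined in *our* code...
class Pretty a where
  pretty :: a -> String

-- ...attached after the fact to a type we did not define.
instance Pretty a => Pretty (Maybe a) where
  pretty Nothing  = "nothing"
  pretty (Just x) = "just " ++ pretty x

instance Pretty Int where
  pretty = show
```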
From: http://www.haskell.org/tutorial/classes.html
Before going on to further examples of the use of type classes, it is worth pointing out two other views of Haskell's type classes. The first is by analogy with object-oriented programming (OOP). In the following general statement about OOP, simply substituting type class for class, and type for object, yields a valid summary of Haskell's type class mechanism:
"Classes capture common sets of operations. A particular object may be an instance of a class, and will have a method corresponding to each operation. Classes may be arranged hierarchically, forming notions of superclasses and sub classes, and permitting inheritance of operations/methods. A default method may also be associated with an operation."
In contrast to OOP, it should be clear that types are not objects, and in particular there is no notion of an object's or type's internal mutable state. An advantage over some OOP languages is that methods in Haskell are completely type-safe: any attempt to apply a method to a value whose type is not in the required class will be detected at compile time instead of at runtime. In other words, methods are not "looked up" at runtime but are simply passed as higher-order functions.
This slideshow may help you understand the similarities and differences between OO abstract classes and Haskell type classes: Classes, Jim, But Not As We Know Them.