Syntax for nested signatures?

In my ML program I am using nested structures to organize my code. I'm defining signatures for these structures, but I can't get the signatures to nest.
structure Example =
struct
  structure Code =
  struct
    datatype mytype = Mycons of string
  end
end
For this, I'd like to write something like:
signature EXAMPLE =
sig
  signature CODE = (* or structure Code - doesn't matter *)
  sig
    datatype mytype
  end
end
Now this doesn't work; I get syntax errors. My questions:
Is this a bad idea? If so, why?
How do I do it? How do I apply the nested signature to the nested structure?

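The syntax for nested structures in signatures takes some getting used to. In Standard ML, signature declarations can only appear at the top level, so you cannot put a signature inside a signature; what you can specify inside a signature is a structure, together with that structure's own signature.
To give the signature of a structure within a signature, write it like this: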
signature JSON =
sig
  type t
  (* .. some signature stuff *)
  structure Converter : sig
    type json
    type 'a t
    (* ... Converter specification stuff
       ... using type json as the parent signature's type t *)
  end where type json = t
end
See these Hoffman[.sml][.sig] files for simple examples of this, and have a look at the Tree[.sig] file for a slightly more complex one.
Remember that you also need to ascribe the signature to your structure (e.g. structure Example : EXAMPLE = struct ... end); otherwise there is little point in writing the signature in the first place.


A meaningful field name for the ternary relation between program, data, and result?

I have a signature for representing software programs:
sig Program {
  ???: Data -> Result
}
Each program maps input data to output result. So, there is a ternary relation (Program -> Data -> Result).
Notice the question marks for the field name. What field name do you suggest?
The name IO seems nice:
sig Program {
  IO: Data -> Result
}
Then I can write elegant expressions such as:
all p: Program | p.IO ...
However, the name IO is meaningful only for (Data -> Result) not (Program -> Data -> Result).
I am stuck. What do you suggest?
IMHO, field names are usually contextual to the signature they are declared in, and that's perfectly fine.
If you look at a random sample module in Alloy (e.g. module examples/puzzle/farmer), you'll see that fields don't always have meaning outside their respective signatures:
sig State {
  near: set Object,
  far: set Object
}
Here, near and far don't really hint at their "temporal" nature.
Long story short, I'd stick to io for conciseness' sake.
Indeed, I prefer the names of:
fields, facts, preds, asserts, parameters, ... to be in lowercase
signatures to be Capitalized
enumerations (outer let) and singleton signatures to be in UPPERCASE

How does Haskell know whether a data type declaration is a variable or a named type?

Take a data type declaration like
data myType = Null | Container TypeA v
As I understand it, Haskell would read this as myType coming in two different flavors. One of them is Null which Haskell interprets just as some name of a ... I guess you'd call it an instance of the type? Or a subtype? Factor? Level? Anyway, if we changed Null to Nubb it would behave in basically the same way--Haskell doesn't really know anything about null values.
The other flavor is Container and I would expect Haskell to read this as saying that the Container flavor takes two fields, TypeA and v. I expect this is because, when making this type definition, the first word is always read as the name of the flavor and everything that follows is another field.
My question (besides: did I get any of that wrong?) is, how does Haskell know that TypeA is a specific named type rather than an un-typed variable? Am I wrong to assume that it reads v as an un-typed variable, and if that's right, is it because of the lower-case initial letter?
By un-typed I mean how the types appear in the following type-declaration for a function:
func :: a -> a
func a = a
First of all, terminology: "flavors" are called "cases" or "constructors". Your type has two cases: Null and Container.
Second, what you call "untyped" is not really untyped, and that's not the right way to think about it. The a in the declaration func :: a -> a does not mean "untyped" the way variables are "untyped" in JavaScript or Python (though even that is not quite true); rather, it means "whoever calls this function chooses the type". So if I call func "abc", I have chosen a to be String, and the compiler now knows that the result of this call must also be a String, since that's what func's signature says: "I take any type you choose, and I return the same type". The Haskell term for this is "parametric polymorphism"; in other languages it is usually called "generic".
The difference between "untyped" and "generic" is that with "untyped" anything goes: the type is only known at runtime, with no guarantees whatsoever. Generic types, even though not precisely known yet, still have relationships between them. For example, your func says that it returns the same type it takes, not something random. Or, for another example:
mkList :: a -> [a]
mkList a = [a]
This function says "I take some type that you choose, and I will return a list of that same type - never a list of something else".
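A minimal sketch of the "caller chooses the type" idea, reusing func and mkList from above; greeting and flags are illustrative names, not from the question. It should load as-is in GHCi:

```haskell
-- `a` is a type variable: each caller picks a concrete type for it.
func :: a -> a
func a = a

mkList :: a -> [a]
mkList a = [a]

-- Here the caller picks a = String, so the result must be a String too.
greeting :: String
greeting = func "abc"

-- Here the caller picks a = Bool, so the result must be a [Bool].
flags :: [Bool]
flags = mkList True

-- By contrast, a definition such as
--   bad :: a -> a
--   bad _ = "x"
-- is rejected: it promised to return the caller's type,
-- not specifically a String.
```

Asking GHCi for :type mkList 'c' should report [Char], because the argument fixed a to Char.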
Finally, your myType declaration is actually illegal. In Haskell, type names (and data constructors) must start with an uppercase letter, while values and type variables must start with a lowercase letter (conventionally written in camelCase). So first, you have to rename the type to satisfy this:
data MyType = Null | Container TypeA v
If you try to compile this now, you'll still get an error saying that the type variable v is not in scope. Haskell has decided that v must be a type variable rather than a concrete type, because it's lowercase. It's that simple.
If you want to use a type variable, you have to declare it somewhere. In a function's type signature, type variables can just "appear" out of nowhere, and the compiler considers them implicitly declared. In a data declaration, however, you have to declare your type variables explicitly as parameters of the type, e.g.:
data MyType v = Null | Container TypeA v
This requirement exists to avoid confusion and ambiguity in cases where you have several type variables, or when type variables come from another context, such as a type class instance.
Declared this way, you'll have to specify something in place of v every time you use MyType, for example:
n :: MyType Int
n = Null
mkStringContainer :: TypeA -> String -> MyType String
mkStringContainer ta s = Container ta s
-- Or make the function generic
mkContainer :: TypeA -> a -> MyType a
mkContainer ta a = Container ta a
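To see the parameter in use, here is a self-contained sketch. TypeA is not defined in the question, so a hypothetical one-constructor stand-in is used; isNull and contents are illustrative helpers, and the Functor instance is an extra showing that MyType on its own is now a type constructor waiting for one argument:

```haskell
-- Hypothetical stand-in for the question's TypeA.
data TypeA = TypeA
  deriving (Show, Eq)

-- v is declared as a parameter of the type, so it is in scope on the right.
data MyType v = Null | Container TypeA v
  deriving (Show, Eq)

-- Works for every choice of v, because it never looks inside the v field.
isNull :: MyType v -> Bool
isNull Null            = True
isNull (Container _ _) = False

-- Extracting the field: the result type is whatever v was chosen to be.
contents :: MyType v -> Maybe v
contents Null            = Nothing
contents (Container _ x) = Just x

-- MyType alone is a type constructor awaiting one argument,
-- which is exactly the shape a Functor instance needs.
instance Functor MyType where
  fmap _ Null             = Null
  fmap f (Container ta x) = Container ta (f x)
```

For example, fmap length (Container TypeA "abc") turns a MyType String into a MyType Int holding 3.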
Haskell uses a critically important distinction between variables and constructors. Variables begin with a lower-case letter; constructors begin with an upper-case letter.[1]
So data myType = Null | Container TypeA v is actually incorrect; the first symbol after the data keyword is the name of the new type constructor you're introducing, so it must start with a capital letter.
Assuming you've fixed that to data MyType = Null | Container TypeA v, each of the alternatives separated by | must consist of a data constructor name (here you've chosen Null and Container) followed by a type expression for each of that constructor's fields.
The Null constructor has no fields. The Container constructor has two fields:
TypeA, which starts with a capital letter so it must be a type constructor; therefore the field is of that concrete type.
v, which starts with a lowercase letter and is therefore a type variable. Normally this variable would be declared as a type parameter of the MyType type being defined, as in data MyType v = Null | Container TypeA v. You cannot normally use free type variables, so this was another error in your original example.[2]
Your data declaration showed how the distinction between constructors and variables matters at the type level. This distinction between variables and constructors is also present at the value level. It's how the compiler can tell (when you're writing pattern matches) which terms are patterns it should be checking the data against, and which terms are variables that should be bound to whatever the data contains. For example:
lookAtMaybe :: Show a => Maybe a -> String
lookAtMaybe Nothing = "Nothing to see here"
lookAtMaybe (Just x) = "I found: " ++ show x
If Haskell didn't have the first-letter rule, then there would be two possible interpretations of the first clause of the function:
Nothing could be a reference to the externally-defined Nothing constructor, saying I want this function rule to apply when the argument matches that constructor. This is the interpretation the first-letter rule mandates.
Nothing could be a definition of an (unused) variable, representing the function's argument. This would be equivalent to lookAtMaybe x = "Nothing to see here".
Both of those interpretations are valid Haskell code producing different behaviour (try changing the capital N to a lowercase n and see what the function does). So Haskell needs a rule to choose between them. The designers chose the first-letter rule because it disambiguates constructors from variables in a way that is simple for both the compiler and human readers, without requiring any additional syntactic noise.
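A small experiment for the suggestion above; lookAtMaybe' is a hypothetical copy of the function with the capital N lowered. Because nothing is now a variable pattern, the first clause matches every argument and the Just clause can never run (GHC will typically warn that it is redundant):

```haskell
lookAtMaybe :: Show a => Maybe a -> String
lookAtMaybe Nothing  = "Nothing to see here"
lookAtMaybe (Just x) = "I found: " ++ show x

-- Same clauses with a lowercase n: `nothing` is now a variable pattern
-- that matches any argument, so the Just clause below is unreachable.
lookAtMaybe' :: Show a => Maybe a -> String
lookAtMaybe' nothing  = "Nothing to see here"
lookAtMaybe' (Just x) = "I found: " ++ show x
```

So lookAtMaybe (Just 3) reports the 3, while lookAtMaybe' (Just 3) still says "Nothing to see here".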
[1] The rule about the case of the first letter applies to alphanumeric names, which consist of letters, digits, underscores, and primes ('). Haskell also has symbolic names, which consist only of symbol characters like +, *, :, etc. For these, the rule is that names beginning with the : character are constructors, while names beginning with any other symbol are variables. This is how the list constructor : is distinguished from a function name like +.
[2] With the ExistentialQuantification extension turned on, it is possible to write data MyType = Null | forall v. Container TypeA v, so that the constructor has a field of a variable type without the variable appearing as a parameter of the overall type. I'm not going to explain how this works here; it's generally considered an advanced feature and isn't part of standard Haskell (which is why it requires an extension).
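To illustrate the symbolic-name rule from footnote 1, here is a sketch with a hypothetical list type MyList (not from the question): :> starts with a colon, so it is a data constructor and can be used in patterns, while +++ starts with another symbol, so it is an ordinary function defined by matching on those patterns:

```haskell
-- Hypothetical list type. `:>` begins with ':', so it is a data constructor.
infixr 5 :>
data MyList a = Empty | a :> MyList a
  deriving Show

-- `+++` begins with another symbol, so it is an ordinary (variable) name:
-- here, a function defined by pattern matching on the :> constructor.
infixr 5 +++
(+++) :: MyList a -> MyList a -> MyList a
Empty     +++ ys = ys
(x :> xs) +++ ys = x :> (xs +++ ys)

-- Convert to a built-in list, whose cons constructor is likewise ':'.
toPlainList :: MyList a -> [a]
toPlainList Empty     = []
toPlainList (x :> xs) = x : toPlainList xs
```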

What is the purpose of including the type in its definition in haskell?

I'm a beginner in haskell and I wonder about the right way to define a new type. Suppose I want to define a Point type. In an imperative language, it's usually the equivalent of:
data Point = Int Int
However in haskell I usually see definitions such as:
data Point = Point Int Int
What are the differences and when should each approach be used?
In OO languages you can define a class with something like this:
class Point {
    int x, y;
    Point(int x, int y) { ... }
}
It's similar in Haskell:
data Point = ...
is the type definition (similar to class Point above), and
... = Point Int Int
is the constructor. You can also give the constructor a name different from the type's, but it needs a name regardless:
data Point = P Int Int
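A short sketch of the point above, using the P-constructor version of Point: the type name appears only in type signatures, while the constructor name appears in expressions and patterns. The origin and translate functions are illustrative, not from the question:

```haskell
-- The type is named Point; its only data constructor is named P.
data Point = P Int Int
  deriving (Show, Eq)

-- Type signatures mention the type name...
origin :: Point
-- ...while expressions use the constructor name.
origin = P 0 0

-- Patterns also use the constructor name to take a Point apart.
translate :: Int -> Int -> Point -> Point
translate dx dy (P x y) = P (x + dx) (y + dy)
```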
The data definitions are, ultimately, tagged unions. For example:
data Maybe a = Nothing | Just a
Now how would you write this type using your syntax?
Moreover, in Haskell you can pattern match on these values to see which constructor was used to build a value. The constructor's name is needed for pattern matching, and if the type has just one constructor, that constructor often reuses the type's name.
For example:
let x = someOperationReturningMaybe
in case x of
     Nothing -> 0
     Just y  -> y + 5
This is different from a plain union type, such as C's union, where you can say "this thing is either an int or a float" but have no way to know which one it actually is (except by keeping track of that state by hand).
If you wrote the code above with a C union, you could not use a case to choose an action based on the constructor used; you would have to track explicitly which type x currently holds and branch on that with an if.
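A runnable version of the case example, under one assumption: someOperationReturningMaybe is only a placeholder in the answer, so a hypothetical parseNumber (a thin wrapper around readMaybe from Text.Read) stands in for it here:

```haskell
import Text.Read (readMaybe)

-- Hypothetical stand-in for someOperationReturningMaybe:
-- parsing a String as an Int may fail, so it returns Maybe Int.
parseNumber :: String -> Maybe Int
parseNumber = readMaybe

-- The case expression checks which constructor built x
-- and binds the payload y only in the Just branch.
addFive :: String -> Int
addFive s =
  let x = parseNumber s
  in case x of
       Nothing -> 0
       Just y  -> y + 5
```

Here addFive "10" takes the Just branch, while addFive "abc" falls into the Nothing branch; the tag that a C programmer would track by hand is carried by the constructor itself.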

How do I use a Haskell type constructor as an enumeration?

I am writing a program in Haskell that makes use of a lookup table.
type Table = [(Object, FilePath)]
data Object = Player { pName :: String }
I want to construct this in such a way that Player can be a lookup key:
[(Player, "data/players"), ...]
If I added another Object type Monster, my table might look like:
[(Player, "data/players"), (Monster, "data/mons"), ...]
However, my type definition of a Table suggests that I am looking up instantiated objects when, really, I just want to check if it's one type constructor or the other.
How do I go about doing this?
I suppose I want something like:
data ObjectType = Player | Monster | ...
but is there a way to avoid duplication of the data constructor and type constructor?
You can't really do this in the way you describe. Because Player takes an argument (pName), the type of Player itself is String -> Object, so it won't fit in your Table type properly.
As suggested in your edit, you should probably make a separate enumeration type without arguments specifically for Table:
data ObjectType = PlayerType | MonsterType | ...
Depending on how the other constructors of Object will be defined, you might be able to avoid duplication, e.g.
data Object = Object { objectType :: ObjectType, name :: String }
but that does assume that every kind of Object will have exactly one name argument and nothing else.
On reflection, I wonder if having a lookup table structure makes sense in the first place. You could replace the table with this:
lookupPath :: Object -> String
lookupPath (Player {}) = "data/players"
lookupPath (Monster {}) = "data/mons"
This format will make it harder to do things like persisting the table to disk, but does exactly capture your intention of wanting to match on the object without its parameters.
(The Player {} format for the match is the best way to match on constructors that may acquire more arguments in future, as it saves you from having to update the matching code when this happens.)
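A sketch combining the two suggestions, assuming a Monster constructor with made-up mName and hp fields (the question only says one might be added): the field-free ObjectType keys the table, and a function using the Constructor {} pattern maps each Object to its key, so the table never needs an instantiated object. typeOfObject and this version of lookupPath are hypothetical names:

```haskell
-- Object as in the question, plus an assumed Monster constructor.
data Object
  = Player  { pName :: String }
  | Monster { mName :: String, hp :: Int }

-- Field-free enumeration used only as a lookup key.
data ObjectType = PlayerType | MonsterType
  deriving (Show, Eq)

type Table = [(ObjectType, FilePath)]

table :: Table
table = [(PlayerType, "data/players"), (MonsterType, "data/mons")]

-- `C {}` matches constructor C whatever its fields are, so adding
-- fields later does not break these clauses.
typeOfObject :: Object -> ObjectType
typeOfObject (Player {})  = PlayerType
typeOfObject (Monster {}) = MonsterType

lookupPath :: Object -> Maybe FilePath
lookupPath obj = lookup (typeOfObject obj) table
```

Returning Maybe FilePath reflects that a list-based table might lack an entry; the pattern-matching lookupPath from the answer avoids that case entirely, which is part of its appeal.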

Why can't I access my structure's internal ORD_SET structure?

This exercise I made up is intended to help me understand signatures, structures, and functors in Standard ML. I can't seem to get it to work. Just for reference, I'm using
Standard ML of New Jersey v110.75 [built: Sun Jan 20 21:55:21 2013]
I have the following ML signature for an "object for which you can compute magnitude":
signature MAG_OBJ =
sig
  type object
  val mag : object -> int
end
If I want to give a structure of "int sets with magnitude," I might have a structure for an ordered int to use with the SML/NJ library's ORD_SET signature as follows:
structure OrderedInt : ORD_KEY =
struct
  type ord_key = int
  val compare =
end
Then I can create a functor to give me a structure with the desired types and properties:
functor MakeMagSet(structure ELT : ORD_KEY) : MAG_OBJ =
struct
  structure Set : ORD_SET = RedBlackSetFn(ELT)
  type object = Set.set
  val mag = Set.numItems
end
So far so good (everything compiles, at least). Now I create an instance of the structure for my OrderedInt structure I made above:
structure IntMagSet = MakeMagSet(structure ELT = OrderedInt)
But when I try to use it (create a set and compute its magnitude), I get an error:
val X = IntMagSet.Set.addList(IntMagSet.Set.empty, [0,1,2,3,4,5,6,7,8,9])
gives the error:
Error: unbound structure: Set in path IntMagSet.Set.empty.addList
From what I understand, ascribing a signature opaquely using :> makes it so one can't access any structure internals which are not defined explicitly in the signature, but I ascribed MAG_OBJ transparently, so I should be able to access the Set structure, right? What am I missing here?
Even rewriting the functor to specifically bind the functions I want to the struct is no good:
functor MakeMagSet(structure ELT : ORD_KEY) : MAG_OBJ =
struct
  structure Set : ORD_SET = RedBlackSetFn(ELT)
  type object = Set.set
  val mag = Set.numItems
  val empty = Set.empty
  val addList = Set.addList
end
Trying to access "empty" and "addList" gives unbound variable errors.
On the other hand, trying to explicitly define the Set structure outside of the struct and use its functions gives a type error upon calling mag:
Error: operator and operand don't agree [tycon mismatch]
operator domain: IntMagSet.object
operand: Set.set
in expression:
IntMagSet.mag X
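I think it's because you explicitly declared that MakeMagSet produces a MAG_OBJ, and that signature does not contain Set. Transparent ascription (:) still hides every component that the signature does not mention; the only difference from opaque ascription (:>) is that the definitions of the types the signature does mention (here object) stay visible. If you remove the : MAG_OBJ, or add a specification such as structure Set : ORD_SET to MAG_OBJ, it will work.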
