I noticed I can declare tuples in Nimrod without giving names for each field. For example:
type T1 = tuple[string, age: int]
type T2 = tuple[char, string, age: int]
But this doesn't apply to the last field:
type T3 = tuple[string, int] # compilation error
Why is that? Is this intended? Why should the last field always be named?
The compiler actually interprets T1 as a tuple with fields named string and age, both of type int, and T2 as a tuple with fields named char, string, and age, all of type int. In short, the standalone "types" in the comma-separated list are interpreted as field names.
This is likely a compiler bug (as you can't use the field names for constructors) in that it doesn't validate the field names. But it's not that you have to provide a type for the last element only: the type will apply to all the elements in the comma-separated list preceding the colon.
I had previously asked where the Winery types are indexed. I noticed that in the serialization for the schema for Bool, which is [4,6], the 4 is the version number, and 6 is the index of SBool in SchemaP. I verified the hypothesis using other "primitive" types like Integer (serialization: 16), Double (18), Text (20). Also, [Bool] will be SVector SBool, serialized to [4,2,6], which makes sense: the 2 is for SVector, the 6 is for SBool.
But SchemaP also contains constructors that I don't intuitively see how are used: SFix, SVar, STag and SLet. What are they, and which type would I need to construct the schema for, to see them used? Why is SLet at the end, but SFix at the beginning?
SFix looks like a µ quantifier for a recursive type. The type µx. T is the type T where x refers to the whole type µx. T. For example, a list data List a = Nil | Cons a (List a) can be represented as L(a) = µr. 1 + a × r, where the recursive occurrence of the type is replaced with the variable r. You could probably see this with a recursive user-defined type like data BinTree a = Leaf | Branch a (BinTree a) (BinTree a).
This encoding doesn’t explicitly include a variable name, because the next constructor SVar specifies that “SVar n refers to the nth innermost fixpoint”, where n is an Int in the synonym type Schema = SchemaP Int. This is a De Bruijn index. If you had some nested recursive types like µx. µy. … = SFix (SFix …), then the inner variable would be referenced as SVar 0 and the outer one as SVar 1 within the body …. This “relative” notation means you can freely reorganise terms without worrying about having to rename variables for capture-avoiding substitution.
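To make the De Bruijn reading concrete, here is a minimal sketch using a toy schema type. It is not winery's real SchemaP: the constructors SInt, SProduct and SVariant, their field types, and the helper isClosed are assumptions made purely for illustration.

```haskell
-- A toy schema language, NOT winery's real SchemaP: the constructor set is
-- trimmed down and the field types are assumptions made for illustration.
data Schema
  = SFix Schema                  -- binds a fixpoint; the bound variable has no name
  | SVar Int                     -- De Bruijn index: 0 = innermost enclosing SFix
  | SInt                         -- a primitive leaf
  | SProduct [Schema]            -- fields of one constructor
  | SVariant [(String, Schema)]  -- alternatives of a sum type
  deriving (Eq, Show)

-- data IntList = Nil | Cons Int IntList, i.e. µr. 1 + Int × r
intList :: Schema
intList = SFix (SVariant
  [ ("Nil",  SProduct [])
  , ("Cons", SProduct [SInt, SVar 0])   -- SVar 0 points back at the SFix
  ])

-- A schema is closed when every SVar n sits under more than n SFix binders.
isClosed :: Schema -> Bool
isClosed = go 0
  where
    go :: Int -> Schema -> Bool
    go depth s = case s of
      SFix body   -> go (depth + 1) body
      SVar n      -> n >= 0 && n < depth
      SInt        -> True
      SProduct fs -> all (go depth) fs
      SVariant cs -> all (go depth . snd) cs
```

In this toy encoding the recursive occurrence of the list is just SVar 0, and a bare SVar 0 outside any SFix is rejected by isClosed because there is no binder for it to point at.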
SLet is a let binding, and since it’s specified as SLet !(SchemaP a) !(SchemaP a), I presume that SLet e1 e2 is akin to let x = e1 in e2, where the variable is again implicit. So I suspect this may be a deficiency of the docs, and SVar can also refer to Let-bound variables. I don’t know how they use this constructor, but it could be used to make sharing explicit in the schema.
Finally, STag appears to be a way to attach extra “tag” metadata within the schema, in some way that’s specific to the library.
The ordering of these constructors might be maintained for compatibility with earlier versions, so adding new constructors at the end would avoid disturbing the encoding, and I figure the STag and SLet constructors at the end were simply added later.
Take a data type declaration like
data myType = Null | Container TypeA v
As I understand it, Haskell would read this as myType coming in two different flavors. One of them is Null which Haskell interprets just as some name of a ... I guess you'd call it an instance of the type? Or a subtype? Factor? Level? Anyway, if we changed Null to Nubb it would behave in basically the same way--Haskell doesn't really know anything about null values.
The other flavor is Container and I would expect Haskell to read this as saying that the Container flavor takes two fields, TypeA and v. I expect this is because, when making this type definition, the first word is always read as the name of the flavor and everything that follows is another field.
My question (besides: did I get any of that wrong?) is, how does Haskell know that TypeA is a specific named type rather than an un-typed variable? Am I wrong to assume that it reads v as an un-typed variable, and if that's right, is it because of the lower-case initial letter?
By un-typed I mean how the types appear in the following type-declaration for a function:
func :: a -> a
func a = a
First of all, terminology: "flavors" are called "cases" or "constructors". Your type has two cases - Null and Container.
Second, what you call "untyped" is not really "untyped". That's not the right way to think about it. The a in declaration func :: a -> a does not mean "untyped" the same way variables are "untyped" in JavaScript or Python (though even that is not really true), but rather "whoever calls this function chooses the type". So if I call func "abc", then I have chosen a to be String, and now the compiler knows that the result of this call must also be String, since that's what the func's signature says - "I take any type you choose, and I return the same type". The proper term for this is "generic".
The difference between "untyped" and "generic" is that "untyped" is free-for-all, the type will only be known at runtime, no guarantees whatsoever; whereas generic types, even though not precisely known yet, still have some sort of relationship between them. For example, your func says that it returns the same type it takes, and not something random. Or for another example:
mkList :: a -> [a]
mkList a = [a]
This function says "I take some type that you choose, and I will return a list of that same type - never a list of something else".
Finally, your myType declaration is actually illegal. In Haskell, concrete types have to be Capitalized, while values and type variables are javaCase. So first, you have to change the name of the type to satisfy this:
data MyType = Null | Container TypeA v
If you try to compile this now, you'll still get an error saying that "Type variable v is unknown". See, Haskell has decided that v must be a type variable, and not a concrete type, because it's lower case. That simple.
If you want to use a type variable, you have to declare it somewhere. In function declaration, type variables can just sort of "appear" out of nowhere, and the compiler will consider them "declared". But in a type declaration you have to declare your type variables explicitly, e.g.:
data MyType v = Null | Container TypeA v
This requirement exists to avoid confusion and ambiguity in cases where you have several type variables, or when type variables come from another context, such as a type class instance.
Declared this way, you'll have to specify something in place of v every time you use MyType, for example:
n :: MyType Int
n = Null
mkStringContainer :: TypeA -> String -> MyType String
mkStringContainer ta s = Container ta s
-- Or make the function generic
mkContainer :: TypeA -> a -> MyType a
mkContainer ta a = Container ta a
Haskell uses a critically important distinction between variables and constructors. Variables begin with a lower-case letter; constructors begin with an upper-case letter (see footnote 1).
So data myType = Null | Container TypeA v is actually incorrect; the first symbol after the data keyword is the name of the new type constructor you're introducing, so it must start with a capital letter.
Assuming you've fixed that to data MyType = Null | Container TypeA v, then each of the alternatives separated by | is required to consist of a data constructor name (here you've chosen Null and Container) followed by a type expression for each of the fields of that constructor.
The Null constructor has no fields. The Container constructor has two fields:
TypeA, which starts with a capital letter so it must be a type constructor; therefore the field is of that concrete type.
v, which starts with a lowercase letter and is therefore a type variable. Normally this variable would be defined as a type parameter on the MyType type being defined, like data MyType v = Null | Container TypeA v. You cannot normally use free variables, so this was another error in your original example (see footnote 2).
Your data declaration showed how the distinction between constructors and variables matters at the type level. This distinction between variables and constructors is also present at the value level. It's how the compiler can tell (when you're writing pattern matches) which terms are patterns it should be checking the data against, and which terms are variables that should be bound to whatever the data contains. For example:
lookAtMaybe :: Show a => Maybe a -> String
lookAtMaybe Nothing = "Nothing to see here"
lookAtMaybe (Just x) = "I found: " ++ show x
If Haskell didn't have the first-letter rule, then there would be two possible interpretations of the first clause of the function:
Nothing could be a reference to the externally-defined Nothing constructor, saying I want this function rule to apply when the argument matches that constructor. This is the interpretation the first-letter rule mandates.
Nothing could be a definition of an (unused) variable, representing the function's argument. This would be the equivalent of lookAtMaybe x = "Nothing to see here"
Both of those interpretations are valid Haskell code producing different behaviour (try changing the capital N to a lower case n and see what the function does). So Haskell needs a rule to choose between them. The designers chose the first-letter rule as a way of simply disambiguating constructors from variables (that is simple to both the compiler and to human readers) without requiring any additional syntactic noise.
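The two interpretations can be put side by side in a small sketch. The name lookAtMaybe' is made up here for the lower-case variant; everything else is taken from the example above.

```haskell
-- Capital N: the first clause matches only the Nothing constructor.
lookAtMaybe :: Show a => Maybe a -> String
lookAtMaybe Nothing  = "Nothing to see here"
lookAtMaybe (Just x) = "I found: " ++ show x

-- Lower-case n: `nothing` is a fresh variable that matches any argument,
-- so the second clause can never be reached (GHC reports it as redundant
-- when warnings are enabled).
lookAtMaybe' :: Show a => Maybe a -> String
lookAtMaybe' nothing  = "Nothing to see here"
lookAtMaybe' (Just x) = "I found: " ++ show x
```

Both versions compile, but lookAtMaybe' answers "Nothing to see here" even for a Just value, because its first clause binds the whole argument to the variable nothing.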
1 The rule about the case of the first letter applies to alphanumeric names, which can only consist of letters, numbers, and underscores. Haskell also has symbolic names, which consist only of symbol characters like +, *, :, etc. For these, the rule is that names beginning with the : character are constructors, while names beginning with another character are variables. This is how the list constructor : is distinguished from a function name like +.
2 With the ExistentialQuantification extension turned on it is possible to write data MyType = Null | forall v. Container TypeA v, so that the constructor has a field with a variable type and the variable does not appear as a parameter to the overall type. I'm not going to explain how this works here; it's generally considered an advanced feature, and isn't part of standard Haskell code (which is why it requires an extension).
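As a small illustration of footnote 1, the sketch below declares one symbolic constructor and one symbolic function. The names Pair, :& and <+> are invented for this example and do not come from any library.

```haskell
-- A symbolic name starting with ':' is a constructor, so it can be declared
-- in a data type and used in patterns. (Pair and :& are made-up names.)
data Pair a b = a :& b
  deriving (Eq, Show)

-- A symbolic name not starting with ':' is a variable, here an ordinary
-- operator defined like any other function. (<+> is a made-up name.)
(<+>) :: Pair Int Int -> Pair Int Int -> Pair Int Int
(a :& b) <+> (c :& d) = (a + c) :& (b + d)
```

Trying to name the function :+: instead of <+> would be rejected, since the compiler would read it as a constructor that has not been declared.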
I'm a beginner in Haskell playing around with parsing and building an AST. I wonder how one would go about defining types like the following:
A Value can either be an Identifier or a Literal. Right now, I simply have a type Value with two constructors (taking the name of the identifier and the value of the string literal respectively):
data Value = Id String
| Lit String
However, then I wanted to create a type representing an assignment in an AST, so I need something like
data Assignment = Asgn Value Value
But clearly, I want the first part of an Assignment to always be an Identifier! So I guess I should make Identifier and Literal separate types to better distinguish things:
data Identifier = Id String
data Literal = Lit String
But how do I define Value now? I thought of something like this:
-- this doesn't actually work...
data Value = (Id String) -- How to make Value be either an Identifier
| (Lit String) -- or a Literal?
I know I can simply do
data Value = ValueId Identifier
| ValueLit Literal
but this struck me as sort of inelegant and got me wondering if there was a better solution?
I first tried to restructure my types so that I would be able to do it with GADTs, but in the end the simpler solution was to go with leftroundabout's suggestion. I guess it's not that "inelegant" anyway.
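For reference, a minimal sketch of how the wrapper approach fits together, assuming Assignment is changed to take an Identifier on the left. The example value and the render helper are hypothetical additions for illustration only.

```haskell
data Identifier = Id String  deriving (Eq, Show)
data Literal    = Lit String deriving (Eq, Show)

-- A Value wraps either an Identifier or a Literal.
data Value
  = ValueId Identifier
  | ValueLit Literal
  deriving (Eq, Show)

-- The left-hand side is an Identifier, so a literal there is a type error.
data Assignment = Asgn Identifier Value
  deriving (Eq, Show)

-- Represents: x = "hello"
example :: Assignment
example = Asgn (Id "x") (ValueLit (Lit "hello"))

-- Hypothetical helper: render an assignment back to source-like text.
render :: Assignment -> String
render (Asgn (Id name) rhs) = name ++ " = " ++ renderValue rhs
  where
    renderValue (ValueId (Id n))   = n
    renderValue (ValueLit (Lit s)) = show s
```

With these definitions, Asgn (Lit "oops") (ValueId (Id "x")) does not type check, which is the guarantee the separate Identifier type was meant to give.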
I am attempting to create a function in Haskell returning the Resp type illustrated below in a strange mix between BNF and Haskell types.
elem ::= String | (String, String, Resp)
Resp ::= [elem]
My question is (a) how to define this type in Haskell, and (b) if there is a way of doing so without being forced to use custom constructors, e.g., Node, rather using only tuples and arrays.
You said that "the variety of keywords (data, type, newtype) has been confusing for me". Here's a quick primer on the data construction keywords in Haskell.
Data
The canonical way to create a new type is with the data keyword. A general type in Haskell is a union of product types, each of which is tagged with a constructor. For example, an Employee might be a line worker (with a name and a salary) or a manager (with a name, salary and a list of reports).
We use the String type to represent an employee's name, and the Int type to represent a salary. A list of reports is just a list of Employees.
data Employee = Worker String Int
| Manager String Int [Employee]
Type
The type keyword is used to create type synonyms, i.e. alternate names for the same type. This is typically used to make the source more immediately understandable. For example, we could declare a type Name for employee names (which is really just a String) and Salary for salaries (which are just Ints), and Reports for a list of reports.
type Name = String
type Salary = Int
type Reports = [Employee]
data Employee = Worker Name Salary
| Manager Name Salary Reports
Newtype
The newtype keyword is similar to the type keyword, but it adds an extra dash of type safety. One problem with the previous block of code is that, although a worker is a combination of a Name and a Salary, there is nothing to stop you using any old String in the Name field (for example, an address). The compiler doesn't distinguish between Names and plain old Strings, which introduces a class of potential bugs.
With the newtype keyword we can make the compiler enforce that the only Strings that can be used in a Name field are the ones explicitly tagged as Names:
newtype Name = Name String
newtype Salary = Salary Int
newtype Reports = Reports [Employee]
data Employee = Worker Name Salary
| Manager Name Salary Reports
Now if we try to enter a String in the Name field without explicitly tagging it, we get a type error:
>>> let kate = Worker (Name "Kate") (Salary 50000) -- this is ok
>>> let fred = Worker "18 Tennyson Av." (Salary 40000) -- this will fail
<interactive>:10:19:
Couldn't match expected type `Name' with actual type `[Char]'
In the first argument of `Worker', namely `"18 Tennyson Av."'
In the expression: Worker "18 Tennyson Av." (Salary 40000)
In an equation for `fred':
fred = Worker "18 Tennyson Av." (Salary 40000)
What's great about this is that because the compiler knows that a Name is really just a String, it optimizes away the extra constructor, so this is just as efficient as using a type declaration -- the extra type safety comes "for free". This requires an important restriction -- a newtype has exactly one constructor with exactly one field. Otherwise the compiler wouldn't know which constructor or field was the correct synonym!
One disadvantage of using a newtype declaration is that, because a Salary is no longer just an Int, you can't directly add two of them together. For example:
>>> let kate'sSalary = Salary 50000
>>> let fred'sSalary = Salary 40000
>>> kate'sSalary + fred'sSalary
<interactive>:14:14:
No instance for (Num Salary)
arising from a use of `+'
Possible fix: add an instance declaration for (Num Salary)
In the expression: kate'sSalary + fred'sSalary
In an equation for `it': it = kate'sSalary + fred'sSalary
The somewhat complicated error message is telling you that a Salary isn't a numeric type, so you can't add them together (or at least, you haven't told the compiler how to add them together). One option would be to define a function that gets the underlying Int from the Salary:
getSalary :: Salary -> Int
getSalary (Salary sal) = sal
but in fact Haskell will write these for you if you use record syntax when declaring your newtypes:
newtype Salary = Salary { getSalary :: Int }
Now you can write
>>> getSalary kate'sSalary + getSalary fred'sSalary
90000
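Putting the pieces together, here is one possible sketch of the Employee type built from record-syntax newtypes. The helpers addSalary and payroll are hypothetical, and the reports are kept as a plain list of Employee rather than a Reports newtype to keep the example short.

```haskell
newtype Name   = Name   { getName   :: String } deriving (Eq, Show)
newtype Salary = Salary { getSalary :: Int }    deriving (Eq, Show)

data Employee
  = Worker  Name Salary
  | Manager Name Salary [Employee]
  deriving (Eq, Show)

-- Hypothetical helper: add two salaries by unwrapping and re-wrapping.
addSalary :: Salary -> Salary -> Salary
addSalary a b = Salary (getSalary a + getSalary b)

-- Hypothetical helper: total salary of an employee plus everyone reporting
-- to them, directly or indirectly.
payroll :: Employee -> Salary
payroll (Worker _ s)          = s
payroll (Manager _ s reports) = foldr (addSalary . payroll) s reports
```

Keeping the arithmetic inside addSalary means the rest of the code never handles a bare Int, so a salary still cannot be confused with some other number.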
Part 1:
data Elem = El String | Node String String Resp
type Resp = [Elem]
Part 2: Well... kinda. The unsatisfying answer is: You shouldn't want to because doing so is less type safe. The more direct answer is Elem needs its own constructor but Resp is easily defined as a type synonym as above. However, I would recommend
newtype Resp = Resp { getElems :: [Elem] }
so that you can't mix up some random list of Elems with a Resp. This also gives you the function getElems so you don't have to do as much pattern matching on a single constructor. The newtype basically lets Haskell know that it should get rid of the overhead of the constructor at runtime, so there's no extra indirection, which is nice.
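A short sketch of the newtype version in use follows. The value sample and the helper allStrings are made up for illustration; only Elem, Resp and getElems come from the definitions above.

```haskell
data Elem
  = El String
  | Node String String Resp
  deriving (Eq, Show)

newtype Resp = Resp { getElems :: [Elem] }
  deriving (Eq, Show)

-- Hypothetical example value: one plain string and one node with a nested Resp.
sample :: Resp
sample = Resp
  [ El "a"
  , Node "b" "c" (Resp [El "d"])
  ]

-- Hypothetical helper: collect every String in a Resp, depth first.
allStrings :: Resp -> [String]
allStrings = concatMap elemStrings . getElems
  where
    elemStrings (El s)         = [s]
    elemStrings (Node s t sub) = s : t : allStrings sub
```

Here getElems does the unwrapping, so allStrings never has to pattern match on the Resp constructor itself.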
When I have a result of type Set(Integer), the numbers are not ordered. We have an operation usable on collections called sortedBy ( expr : OclExpression ) : Sequence(T), but when there are only integers in this set, what's the expression to use?
You can just use the asOrderedSet operation (if your collection is in the variable X, then that would be X->asOrderedSet()).
From the OCL Standard
asOrderedSet() : OrderedSet(T)
An OrderedSet that contains all the elements from self, with duplicates removed, in an order dependent on the particular concrete collection type.