Is there a language where types can take content of fields into account?

I had this crazy idea and was wondering if such a thing exists:
Usually, in a strongly typed language, types are mainly concerned with memory layout, or membership in an abstract 'class'. So class Foo {int a;} and class Bar {int a; int b;} are distinct, but so is class Baz {int a; int b;} (although it has the same layout, it's a different type). So far, so good.
I was wondering if there is a language that allows one to specify more fine-grained constraints as to what makes a type. For example, I'd like to have:
class Person {
    //...
    int height;
}
class RollercoasterSafe: Person (where .height>140) {}
void ride(RollercoasterSafe p) { //... }
and the compiler would make sure that it's impossible to have p.height < 140 in ride. This is just a stupid example, but I'm sure there are use cases where this could really help. Is there such a thing?

It depends on whether the predicate is checked statically or dynamically. In either case the answer is yes, but the resulting systems look different.
On the static end: PL researchers have proposed the notion of a refinement type, which consists of a base type together with a predicate: http://en.wikipedia.org/wiki/Program_refinement. I believe the idea of refinement types is that the predicates are checked at compile time, which means that you have to restrict the language of predicates to something tractable.
It's also possible to express constraints using dependent types, which are types parameterized by run-time values (as opposed to polymorphic types, which are parameterized by other types).
There are other tricks that you can play with powerful type systems like Haskell's, but IIUC you would have to change height from int to something whose structure the type checker could reason about.
On the dynamic end: SQL has something called domains, as in CREATE DOMAIN: http://developer.postgresql.org/pgdocs/postgres/sql-createdomain.html (see the bottom of the page for a simple example), which again consist of a base type and a constraint. The domain's constraint is checked dynamically whenever a value of that domain is created. In general, you can solve the problem by creating a new abstract data type and performing the check whenever you create a new value of the abstract type. If your language allows you to define automatic coercions from and to your new type, then you can use them to essentially implement SQL-like domains; if not, you just live with plain old abstract data types instead.
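To make that concrete, here is a minimal sketch of the abstract-data-type approach (in Rust, with made-up names, tied to the question's height example): the field is private, so the checked constructor is the only way to produce a value of the type, and the check runs exactly once, at creation.

mod safe {
    // The inner field is private: code outside this module can only
    // obtain a Height through the checked constructor below.
    pub struct Height(u32);

    impl Height {
        pub fn new(cm: u32) -> Option<Height> {
            if cm > 140 { Some(Height(cm)) } else { None }
        }

        pub fn get(&self) -> u32 {
            self.0
        }
    }
}

fn ride(h: &safe::Height) {
    // Any Height that reaches this point has already passed the check.
    let _ = h.get();
}

fn main() {
    if let Some(h) = safe::Height::new(150) {
        ride(&h);
    }
}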
Then there are contracts, which are not thought of as types per se but can be used in some of the same ways, such as constraining the arguments and results of functions/methods. Simple contracts include predicates (e.g., "accepts a Person object with height > 140"), but contracts can also be higher-order (e.g., "accepts a Person object whose makeSmallTalk() method never returns null"). Higher-order contracts cannot be checked immediately, so they generally involve creating some kind of proxy. Contract checking does not create a new type of value or tag existing values, so the dynamic check will be repeated every time the contracted function is called. Consequently, programmers often put contracts along module boundaries to minimize redundant checks.
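As a rough sketch of a simple first-order contract (here just an assert! in Rust, far less machinery than real contract systems provide, but it shows the check being repeated at the boundary on every call):

struct Person {
    height: u32,
}

// The precondition is re-checked dynamically on every call; no new type
// of value is created and no existing value is tagged.
fn ride(p: &Person) {
    assert!(p.height > 140, "contract violated: height must exceed 140");
    // ...
}

fn main() {
    let p = Person { height: 150 };
    ride(&p);
}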

An example of a language with such capabilities is Spec#. From the tutorial documentation available on the project site:
Consider the method ISqrt in Fig. 1, which computes the integer square root of a given integer x. It is possible to implement the method only if x is non-negative, so
int ISqrt(int x)
    requires 0 <= x;
    ensures result*result <= x && x < (result+1)*(result+1);
{
    int r = 0;
    while ((r+1)*(r+1) <= x)
        invariant r*r <= x;
    {
        r++;
    }
    return r;
}
In your case, you could probably do something like (note that I haven't tried this, I'm just reading docs):
void ride(Person p)
    requires p.height > 140;
{
    //...
}
There may be a way to roll up that requires clause into a type declaration such as the RollercoasterSafe type you suggested.

Your idea sounds somewhat like C++0x's concepts, though not identical. However, concepts have been removed from the C++0x standard.

I don't know of any language that supports that kind of thing, but I don't find it really necessary.
I'm pretty convinced that simply applying validation in the setters of the properties can give you all the necessary restrictions.
In your RollercoasterSafe class example, you may throw an exception when the height property is set to a value less than 140. It's runtime checking, but polymorphism can make compile-time checking impossible.
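A minimal sketch of that setter-validation idea, in Rust (which panics rather than throwing; the names are made up):

struct RollercoasterSafe {
    height: u32,
}

impl RollercoasterSafe {
    // The setter enforces the invariant at run time.
    fn set_height(&mut self, height: u32) {
        if height < 140 {
            panic!("height {} is below the rollercoaster-safe minimum", height);
        }
        self.height = height;
    }
}

fn main() {
    let mut p = RollercoasterSafe { height: 150 };
    p.set_height(180); // fine
    p.set_height(120); // fails at run time
}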

Related

Why are new programming languages shifting types to the other side?

If you look at Rust, Go, Swift, TypeScript and a few others, and compare them to C/C++, the first thing you notice is how the types have moved position.
int one = 1;
In comparison to:
let one:int = 1;
My question: Why?
To me, personally, it is weird reading type specifiers that far into the line, since I am very used to them being on the left. So I am interested in why the type specifiers are being moved, and why this is the case not just in one, but in many modern/new languages that are on the table.
To me, personally, it is weird reading type specifiers that far into the line, since I am very used to them being on the left
And English is the best language because it is the only language where the words are spoken in the same order I think them. One wonders why anyone speaks French at all, with the words all in the wrong order!
So I am interested in why the type specifiers are being moved, and why this is the case not just in one, but in many modern/new languages that are on the table.
I note that you ignore the existence of the many older languages which use this pattern. Visual Basic (mid 1990s) immediately comes to mind.
Function F(x As String) As Object
Pascal, 1970s:
var
    Set1 : set of 1..10;
Simply-typed lambda calculus, a programming language invented before computers, in the 1940s:
λx:S.λy:T.x : S --> T --> S
The whole ML family. I could go on. There are plenty of very old languages that use the types on the right convention.
But we can get far older than the 1940s. When you say in mathematics f : Q --> R, you are putting the name of the function on the left and the type -- a map from Q to R -- on the right. When you say x∈R to indicate that x is a real, you're putting the type on the right. "Type on the right" predates type on the left in C by literally centuries. This is not anything new!
In fact the "types on the left" syntax is the weird one! It just seems natural to you because you happen to have used a language that uses this convention in your formative years.
The types on the right syntax is much superior, for numerous reasons. Just a few:
var x : int = 1;
function y(z : int) : string { ... }
emphasizes that x is a variable and y is a function. If the type comes on the left and you see int y then you don't know whether it is a function or a variable until later. This makes programs harder for humans to read, which is bad enough. As a compiler developer, let me tell you it is quite inconvenient that the type comes on the left in C#. (I could point out numerous inconsistencies in how C# syntax deals with the positions of types.)
Another reason: In the "type on the right" syntax you can make types optional. If you have
var x : int = 1;
then you can easily say "well, we can infer the int, and so eliminate it"
var x = 1;
but if the int is on the left, then what do you do?
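Rust, one of the languages named in the question, works exactly this way; the annotation slot after the name is simply dropped when the compiler can infer it:

fn main() {
    let x: i32 = 1; // explicit annotation on the right
    let y = 1;      // annotation omitted; the compiler infers it
    println!("{} {}", x, y);
}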
Inverting this: you mention TypeScript. TypeScript is a gradually-typed JavaScript. The convention in JavaScript is already
var x = 1;
function f(y) { }
Given that, plainly it is easier to modify both existing code, and the language as a whole, to introduce optional type elements on the right than it would be to replace the "var" and "function" keywords with a type.
Consider also the positioning. When you say:
int x = 1;
then the two things that must be consistent -- the type and the initializer -- are as far apart as they possibly can be. With var x : int = 1; they are side by side. And in
int f() {
    ...
    ...
    return 123;
}
what have we got? The return is logically as far to the right as possible, so why does the function declaration move the type of the return as far to the left as possible? With the type-on-the-right syntax we have this nice flow:
function f(x : string) : int
{ ... ... ... return 123; }
What happens in a function call? The flow of the declaration is now the same as the flow of control: the things on the left -- initialization of formal parameters -- happen first, and the things on the right -- production of a return value -- happen last.
I could go on at some additional length pointing out how the C style gets it completely backwards, but it is late. Summing up: first, type on the right is superior in almost every possible way, and second, it is very, very old. New languages which use this convention are the ones that are being consistent with traditional practice.
If you do a web search, it is not hard to find the developers of newer languages answering this question in their own words. For example, the Go developers have a FAQ entry on this, as well as an entire article on their language blog. Many programmers are so used to C-like languages that any alternative seems weird, so this question tends to come up a lot...
However, you could argue that the C type declaration syntax itself is odd at best. The pattern-like features for pointers and function types become awkward and unintuitive very quickly, and were never developed as part of, or into, any kind of more general pattern-matching facility. For the sake of familiarity, they were adopted to a greater or lesser degree by many successive C-like languages, but the feature itself sticks out as more of a failed experiment that we have to live with for the sake of backwards compatibility.
One advantage of extricating yourself from C type syntax is that it makes it easier to use types in more places than just declarations. If you can place types conveniently wherever they make sense, you can use your types as annotation, as described in the Swift documentation.

Why must overloaded operators be declared public?

I wanted to overload an operator in a class, and have it private so that it could only be used from within the class.
However, when I tried to compile I got the error message "User-defined operator ... must be declared static and public".
Why do they have to be public?
To answer one part of your question, you may see the blog post by Eric Lippert:
Why are overloaded operators always static in C#?
Rather, the question we should be asking ourselves when faced with a potential language feature is "does the compelling benefit of the feature justify all the costs?" And costs are considerably more than just the mundane dollar costs of designing, developing, testing, documenting and maintaining a feature. There are more subtle costs, like, will this feature make it more difficult to change the type inferencing algorithm in the future? Does this lead us into a world where we will be unable to make changes without introducing backwards compatibility breaks? And so on.
In this specific case, the compelling benefit is small. If you want to have a virtual dispatched overloaded operator in C# you can build one out of static parts very easily. For example:
public class B {
    public static B operator+(B b1, B b2) { return b1.Add(b2); }
    protected virtual B Add(B b2) { /* ... */ }
}
And there you have it. So, the benefits are small. But the costs are large. C++-style instance operators are weird. For example, they break symmetry. If you define an operator+ that takes a C and an int, then c+2 is legal but 2+c is not, and that badly breaks our intuition about how the addition operator should behave.
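To make the symmetry point concrete: where operators are defined outside any instance, both orders can be supported explicitly. A sketch in Rust (not C#, purely as an illustration of the design point):

use std::ops::Add;

struct C(i32);

// C + int
impl Add<i32> for C {
    type Output = C;
    fn add(self, rhs: i32) -> C { C(self.0 + rhs) }
}

// int + C: with instance operators this direction could not be written.
impl Add<C> for i32 {
    type Output = C;
    fn add(self, rhs: C) -> C { C(self + rhs.0) }
}

fn main() {
    let a = C(1) + 2;
    let b = 2 + C(1);
    println!("{} {}", a.0, b.0);
}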
For the other part of your question, why they should be public:
I was not able to find an authoritative source, but IMO if an operator were not public and could only be used inside the class, then the same thing could be done with a simple private method rather than an overloaded operator.
You may also see the blog post by RB Whitaker:
Operator Overloading
All operator overloads must be public and static, which should make sense, since we want to have access to the operator throughout the program, and since it belongs to the class as a whole, rather than any specific instance of the class.

UML association and dependency

What is the difference between association and dependency? Can you give code examples?
What is the relationship between class A and B?
class A
{
    B *b;

    void f()
    {
        b = new B();
        b->f();
        delete b;
    }
};
The short answer is: how any specific source language construct should be represented in UML is not strictly defined. This would be part of a standardized UML profile for the language in question, but these are sadly few and far between. Long answer follows.
In your example, I'm afraid I would have to say "neither", just to be difficult. A has a member variable of type B, so the relationship is actually an aggregation or a composition... Or a directed association. In UML, a directed association with a named target role is semantically equivalent to an attribute with the corresponding name.
As a rule of thumb, it's an aggregation if b gets initialized in A's constructor; it's a composition if it also gets destroyed in A's destructor (shared lifecycle). If neither applies, it's an attribute / directed association.
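One way to make the lifecycle distinction concrete is a sketch in Rust, where ownership spells out who destroys what (my own illustration, not part of the UML spec):

struct B;

// Composition: Composite owns its B; the B is created with the whole
// and destroyed with the whole (shared lifecycle).
struct Composite {
    b: B,
}

// Aggregation: Aggregate merely borrows a B that is created and
// destroyed elsewhere.
struct Aggregate<'a> {
    b: &'a B,
}

fn main() {
    let shared = B;
    let _agg = Aggregate { b: &shared };
    let _comp = Composite { b: B };
}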
If b were not a member variable of A, and the local variable b were not operated on (no methods were called on it), then I would represent that as a dependency: A needs B, but it doesn't have an attribute of that type.
But f() actually calls a method defined in B. This to me makes the correct relationship a <<use>>, which is a more specialized form of dependency.
Finally, an (undirected) association is the weakest form of link between two classes, and for that very reason I tend not to use them when describing source constructs. When I do, I usually use them when there are no direct source code relationships, but the two classes are still somehow related. An example of this might be a situation where the two are responsible for different parts of the same larger algorithm, but a third class uses them both.
It may be useful to see this question I asked: does an association imply a dependency in UML
My understanding is:
Association
public class SchoolClass {
    /**
     * This field, of type Student[], represents an association, a conceptual
     * link between SchoolClass and Student. (Yes, this should probably be
     * a List<Student>, but the array notation is clearer for the explanation.)
     */
    private Student[] students;
}
Dependency
public class SchoolClass {
    private Timetable classTimetable;

    public void generateTimetable() {
        /*
         * Here, SchoolClass depends on TimetableGenerator to function,
         * but this doesn't represent a conceptual relationship. It's more of
         * a logical implementation detail.
         */
        TimetableGenerator timetableGen = new TimetableGenerator();

        /*
         * Timetable, however, is an association, as it is a conceptual
         * relationship that describes some aspect of the data that the
         * class holds. (Remember OOP101? Objects consist of data and
         * operations upon that data; associations are UML's way of
         * representing that data.)
         */
        classTimetable = timetableGen.generateTimetable();
    }
}
If you want to see the difference at the "code level", then in an association between A and B, the implementation of A (or B, or both, depending on cardinalities, navigability, ...) in an OO language would include an attribute of type B.
In a dependency, by contrast, A would probably have a method where one of the parameters is of type B. So A and B are not structurally linked, but changing B can still affect the dependent class A, since the way A's method manipulates the B object may no longer be valid (e.g. B changes the signature of a method and this induces a compile error in class A).
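Sketched in code (Rust here, but the shape is the same in any OO language; the names are mine):

struct B;

impl B {
    fn method(&self) {}
}

// Association: A holds an attribute of type B.
struct A {
    b: B,
}

// Dependency: C only uses a B transiently, as a method parameter;
// changing B's interface can still break this code.
struct C;

impl C {
    fn f(&self, b: &B) {
        b.method();
    }
}

fn main() {
    let a = A { b: B };
    C.f(&a.b);
}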
From Wikipedia: Dependency is a weaker form of relationship which indicates that one class depends on another because it uses it at some point in time. One class depends on another if the latter is a parameter variable or local variable of a method of the former. This is different from an association, where an attribute of the former is an instance of the latter.
So I think the case here is an association; if B were a parameter variable or local variable of a method of A, then it would be a dependency.
A dependency really is very loosely defined. So there would be no code representation.
Wiki: A dependency is a semantic relationship where a change to the influent or independent modeling element may affect the semantics of the dependent modeling element.[1]
From the OMG Spec: A dependency is a relationship that signifies that a single or a set of model elements requires other model elements for their specification or implementation. This means that the complete semantics of the depending elements is either semantically or structurally dependent on the definition of the supplier element(s).

What is typestate?

What does TypeState refer to with respect to language design? I saw it mentioned in some discussions regarding a new language by Mozilla called Rust.
Note: Typestate was dropped from Rust; only a limited version (tracking uninitialized and moved-from variables) is left. See my note at the end.
The motivation behind TypeState is that types are immutable; however, some of their properties are dynamic, on a per-variable basis.
The idea is therefore to create simple predicates about a type, and use the control-flow analysis that the compiler executes for many other reasons to statically decorate the type with those predicates.
Those predicates are not actually checked by the compiler itself, which could be too onerous; instead the compiler simply reasons in terms of graphs.
As a simple example, you create a predicate even, which returns true if a number is even.
Now, you create two functions:
halve, which only acts on even numbers
double, which takes any number and returns an even number.
Note that the type number is not changed; you do not create an evennumber type and duplicate all those functions that previously acted on number. You just compose number with a predicate called even.
Now, let's build some graphs:
a: number -> halve(a) #! error: `a` is not `even`
a: number, even -> halve(a) # ok
a: number -> b = double(a) -> b: number, even
Simple, isn't it?
Of course it gets a bit more complicated when you have several possible paths:
a: number --+--> a = double(a) --> a: number, even --+--> halve(a) #! error: `a` is not `even`
            \________________________________________/
              (second path, where the double is skipped)
This shows that you reason in terms of sets of predicates:
when joining two paths, the new set of predicates is the intersection of the sets of predicates given by those two paths
This can be augmented by the generic rule of a function:
to call a function, the set of predicates it requires must be satisfied
after a function is called, only the set of predicates it established is satisfied (note: arguments taken by value are not affected)
And thus the building block of TypeState in Rust:
check: checks that the predicate holds; if it does not, the program fails at run time; otherwise the predicate is added to the set of predicates
Note that since Rust requires that predicates are pure functions, it can eliminate redundant check calls if it can prove that the predicate already holds at this point.
What typestate lacks is simple: composability.
If you read the description carefully, you will note this:
after a function is called, only the set of predicates it established is satisfied (note: arguments taken by value are not affected)
This means that predicates for a type are useless in themselves; the utility comes from annotating functions. Therefore, introducing a new predicate in an existing codebase is a chore, as the existing functions need to be reviewed and tweaked to state whether or not they need/preserve the invariant.
And this may lead to duplicating functions at an exponential rate when new predicates pop up: predicates are not, unfortunately, composable. The very design issue they were meant to address (proliferation of types, and thus of functions) does not seem to be addressed.
It's basically an extension of types, where you don't just check whether some operation is allowed in general, but in this specific context. All that at compile time.
The original paper is actually quite readable.
There's a typestate checker written for Java, and Adam Warski's explanatory page gives some useful information. I'm only just figuring this material out myself, but if you are familiar with QuickCheck for Haskell, the application of QuickCheck to monadic state seems similar: categorise the states and explain how they change when they are mutated through the interface.
Typestate is explained as:
leverage type system to encode state changes
Implemented by creating a type for each state
Use move semantics to invalidate a state
Return the next state from the previous state
Optionally drop the state (close files, connections, ...)
Compile time enforcement of logic
struct Data;
struct Signed;

impl Data {
    // `sign` takes `self` by value: calling it moves the Data,
    // invalidating the previous state.
    fn sign(self) -> Signed {
        Signed
    }
}

fn main() {
    let data = Data;
    let signed = data.sign();
    data.sign(); // Compile error: `data` has already been moved
}

Does D have 'newtype'?

Does D have 'newtype' (as in Haskell)?
It's a naive question, as I'm just skimming D, but Google didn't turn up anything useful.
In Haskell this is a way of making different types of the same thing distinct at compile time, but without incurring any runtime performance penalties.
e.g. you could make newtypes (doubles) for metres, seconds and kilograms. This would error at compile time if your program added a quantity in metres to a quantity in seconds, but would be just as fast at runtime as if both were doubles (which they are at runtime).
If D doesn't have something analogous to 'newtype', what are the accepted methods for dealing with dimensioned quantities?
Thanks,
Chris.
In D 1.0 there is typedef, which gives strong typing from a predefined type to a 'newtype.'
D 2.0 has removed this and only alias remains (which is what typedef is in C). There is talk of having a wrapper template that can create a strong new type.
The issue with typedef was that there were good arguments for making the newtype a sub-type of the predefined type, and also good arguments for making it a super-type.
The semantics of typedef are that the base type is implicitly converted to the newtype, but the newtype is not converted to the base type or to other types with the same base type. I am using "base type" here since:
typedef int Fish;
typedef Fish Cat;

Fish gold = 1;
Cat fluff = gold;
The last line will fail to compile.
And as of right now, DMD 2.048 still allows the use of typedef (but don't use it).
Having the base type convert to the newtype is useful so you don't have to write
meters = cast(meters) 12.7;
Funny, as he_the_great mentions, D1 had a strong typedef but no one used it, possibly because it was impossible to customize the exact semantics for each case. Possibly the simplest way to handle this situation, at least for primitive types, is to include a mixin template somewhere in Phobos that forwards all operators, with the boilerplate generated automatically via the mixin. Then you'd just create a wrapper struct and be all set.
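For comparison, that wrapper-struct idea is what other languages call the newtype pattern; here is a minimal sketch in Rust (not D, just to show what the mixin would automate) of the metres/seconds example from the question:

// Zero-cost wrappers: each is just a double at run time, but the
// types are distinct at compile time.
struct Metres(f64);
struct Seconds(f64);

fn main() {
    let d = Metres(100.0);
    let t = Seconds(9.58);
    // let oops = Metres(1.0) + Seconds(2.0); // compile error: no such `+`
    let speed = d.0 / t.0; // mixing units requires explicit unwrapping
    println!("{} m/s", speed);
}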
