OCaml: Type Checking Objects

If I have an object, how can I determine its type? (Is there an OCaml equivalent to Java's instanceof operator?)

OCaml has structural typing for objects rather than nominal typing as in Java, so the type of an object is determined entirely by its methods. Objects in OCaml can be created directly, without going through anything like a class.
You can write functions which require that their object arguments have certain methods (and that those methods have certain types); for example, the following function takes as its argument any object with a method bar:
let foo x = x#bar   (* inferred: val foo : < bar : 'a; .. > -> 'a *)

There's a discussion of "Matching Objects With Patterns" on Lambda the Ultimate (the paper uses Scala as its language, so it won't directly answer your question). A more relevant OCaml mailing list thread indicates that there's no RTTI/safe-downcasting for objects.
For algebraic (non-object) types you obviously have pattern matching:
match expr with
| Type1 x -> x
| Type2 (x, y) -> y
Someone did write an extension that allows down/up-casting OCaml objects.

In short, you have to encode your own RTTI mechanism. OCaml provides no RTTI or up/down casting (the latter in part because inheritance and subtyping are orthogonal in OCaml rather than unified as in Java).
You could do something with strings or polymorphic variants to encode type information in your classes and objects. I believe that LablGTK does some of this, and provides a utility library to support object tagging and up/down casting.

Somewhat off-topic, but the OPA language (which draws heavily on some aspects of OCaml) allows the equivalent of pattern matching on objects, so it's quite feasible.

Type erasure in Haskell?

I was reading a lecture note on Haskell when I came across this paragraph:
This “not caring” is what the “parametric” in parametric polymorphism means. All Haskell functions must be parametric in their type parameters; the functions must not care or make decisions based on the choices for these parameters. A function can't do one thing when a is Int and a different thing when a is Bool. Haskell simply provides no facility for writing such an operation. This property of a language is called parametricity.
There are many deep and profound consequences of parametricity. One consequence is something called type erasure. Because a running Haskell program can never make decisions based on type information, all the type information can be dropped during compilation. Despite how important types are when writing Haskell code, they are completely irrelevant when running Haskell code. This property gives Haskell a huge speed boost when compared to other languages, such as Python, that need to keep types around at runtime. (Type erasure is not the only thing that makes Haskell faster, but Haskell is sometimes clocked at 20x faster than Python.)
What I don't understand is: how are "all Haskell functions" parametric? Aren't types explicit/static in Haskell? Also, I don't really understand how type erasure improves runtime performance.
Sorry if these questions are really basic, I'm new to Haskell.
EDIT:
One more question: why does the author say that "Despite how important types are when writing Haskell code, they are completely irrelevant when running Haskell code"?
What I don't understand is how are "all Haskell functions" parametric?
It doesn't say all Haskell functions are parametric, it says:
All Haskell functions must be parametric in their type parameters.
A Haskell function need not have any type parameters.
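For instance (a small sketch; the names are made up for illustration): a function with a concrete type has no type parameters to be parametric in, while a polymorphic one must treat its parameter uniformly.

isZero :: Int -> Bool    -- monomorphic: no type parameters, free to inspect n
isZero n = n == 0

idPair :: a -> (a, a)    -- parametric in a: can only pass the value along
idPair x = (x, x)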
One more question: why does the author say that "Despite how important types are when writing Haskell code, they are completely irrelevant when running Haskell code"?
Unlike a dynamically typed language where you need to check at run time if (for example) two things are numbers before trying to add them together, your running Haskell program knows that if you're trying to add them together, then they must be numbers because the compiler made sure of it beforehand.
Aren't types explicit/static in Haskell?
Types in Haskell can often be inferred, in which case they don't need to be explicit. But you're right that they're static, and that is actually why they don't matter at run time, because static means that the compiler makes sure everything has the type that it should before your program ever executes.
Types can be erased in Haskell because the type of an expression is either known at compile time (like True) or does not matter at runtime (like []).
There's a caveat to this, though: it assumes that all values have some kind of uniform representation. Most Haskell implementations use pointers for everything, so the actual type of what a pointer points to doesn't matter (except to the garbage collector), but you could imagine a Haskell implementation that uses a non-uniform representation, in which case some type information would have to be kept.
Others have already answered, but perhaps some examples can help.
Python, for instance, retains type information until runtime:
>>> def f(x):
...     if type(x) == type(0):
...         return (x+1, x)
...     else:
...         return (x, x)
...
>>> f("hello")
('hello', 'hello')
>>> f(10)
(11, 10)
The function above, given any argument x, returns the pair (x, x), except when x is of type int. The function tests for that type at runtime, and if x is found to be an int it behaves in a special way, returning (x+1, x) instead.
To realize the above, the Python runtime must keep track of types. That is, when we do
>>> x = 5
Python cannot just store the byte representation of 5 in memory. It also needs to mark that representation with the type tag int, so that when we do type(x) the tag can be recovered.
Further, before performing any operation such as x+1, Python needs to check the type tag to ensure we are really working on ints. If x is, for instance, a string, Python will raise an exception.
Statically checked languages such as Java do not need such checks at runtime. For instance, when we run
SomeClass x = new SomeClass(42);
x.foo();
the compiler has already checked there's indeed a method foo for x at compile time, so there's no need to do that again. This can improve performance, in principle. (Actually, the JVM does some runtime checks at class load time, but let's ignore those for the sake of simplicity)
In spite of the above, Java has to store type tags like Python does, since it has an analogue of type(-):
if (x instanceof SomeClass) { ...
Hence, Java allows one to write functions which can behave "specially" on some types.
// this is a "generic" function, using a type parameter A
<A> A foo(A x) {
if (x instanceof B) { // B is some arbitrary class
B b = (B) x;
return (A) new B(b.get()+1);
} else {
return x;
}
}
The above function foo() just returns its argument, except when it's of type B, for which a new object is created instead. This is a consequence of using instanceof, which requires every object to carry a tag at runtime.
To be honest, such a tag is already present to be able to implement virtual methods, so it does not cost anything more. Yet, the presence of instanceof makes it possible to cause the above non-uniform behaviour on types -- some types can be handled differently.
Haskell, instead, has no such type/instanceof operator. A parametric Haskell function having type
foo :: a -> (a,a)
must behave in the same way at all types. There's no way to trigger "special" behaviour for particular types. Concretely, foo x must return (x, x), and we can see this just by looking at the type annotation above. To stress the point: there's no need to look at the code at all to prove this property. This is what parametricity guarantees, from the type alone.
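To make that concrete, here is a small sketch; the commented-out lines are hypothetical, shown only to illustrate what the compiler rejects:

foo :: a -> (a, a)
foo x = (x, x)           -- essentially the only total implementation

-- foo x = (x + 1, x)    -- rejected: nothing says a supports (+)
-- foo x = (not x, x)    -- rejected: nothing says a is Bool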
Implementations of dynamically typed languages typically need to store type information with each value in memory. This isn't too bad for Lisp-like languages that have just a few types and can reasonably identify them with a few tag bits (although such limited types lead to other efficiency issues). It's much worse for a language with lots of types. Haskell lets you carry type information to runtime, but it forces you to be explicit about it, so you can pay attention to the cost. For example, adding the context Typeable a to a type gives a value of that type access, at runtime, to a representation of the type a. More subtly, typeclass instance dictionaries are usually specialized away at compile time, but in sufficiently polymorphic or complex cases may survive to runtime. In a compiler like the apparently abandoned JHC, and one likely possibility for the as-yet barely started compiler THC, this could lead to some type information leaking to runtime in the form of pointer tagging. But these situations are fairly easy to identify and only rarely cause serious performance problems.
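For example, here is a sketch of the Typeable mechanism mentioned above, reconstructing the Python function from earlier (fooT is a made-up name); the point is that the runtime type information now shows up as an explicit constraint:

{-# LANGUAGE ScopedTypeVariables #-}

import Data.Typeable (Typeable, cast)

-- With a Typeable constraint we can test types at runtime, much like
-- Python's type(x), but the cost is visible in the signature.
fooT :: forall a. Typeable a => a -> (a, a)
fooT x = case (cast x :: Maybe Int) of
  Just n  -> case (cast (n + 1) :: Maybe a) of
    Just y  -> (y, x)    -- the Int case: behave "specially"
    Nothing -> (x, x)    -- unreachable: the first cast proved a is Int
  Nothing -> (x, x)      -- every other type: plain (x, x)

Here fooT (10 :: Int) gives (11, 10) while fooT "hello" gives ("hello", "hello"), mirroring the Python f.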

Typeclasses in Common Lisp

I'm wondering if there's a way to emulate Haskell's typeclasses in Common Lisp.
Generic functions allow overloading, and it's possible to define types using deftype (which could be defined by membership in some list of instances, for example).
But I can't dispatch on a type. Is there a way to make a class a subclass (and a subtype) of some other class after its definition (e.g. making the cons class a subclass of a sequence class, without redefining cons)?
Thanks.
Type classes in Haskell are a means to statically look up implementations of "interfaces" in the form of dictionaries (similar to how vtables are used in e.g. C++, but (almost) fully statically, whereas C++ does dynamic dispatch at runtime). Common Lisp, however, is a dynamically typed language, so such a lookup would make no sense there. You can, however, implement your own lookup of "type class" implementations (instances) at runtime; such a design is not too hard to imagine in a language as expressive as Common Lisp.
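For reference, a rough sketch of what that dictionary story looks like on the Haskell side (the names are invented, and GHC's actual elaboration differs in its details):

class Pretty a where
  pretty :: a -> String

-- The class is compiled to an ordinary record of functions ...
data PrettyDict a = PrettyDict { prettyD :: a -> String }

-- ... and a constrained function receives the record as a hidden
-- argument; this is the lookup one could reimplement dynamically
-- in Common Lisp.
prettyTwice :: PrettyDict a -> a -> String
prettyTwice dict x = prettyD dict x ++ prettyD dict x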
P.S. Python's Zope has an adaptation mechanism with very similar characteristics, if you feel like referring to an existing solution in a dynamic setting.
You cannot modify the class hierarchy in the way you envision, but you can achieve pretty much the same effect.
Suppose that your definition of a sequence is that it has a method for the function sequence-length.
(defclass sequence ...)
(defmethod sequence-length ((s sequence)) ...)
Then you can easily extend your sequence-length method to conses:
(defmethod sequence-length ((s cons))
  (length s))
Did that create a new class that includes cons? Not really. You can express the type of things that have a sequence-length method by saying (or sequence cons), but that's not really useful.

In what way is Haskell's type system more helpful than the type system of another statically typed language?

I have been using Haskell for a while now. I understand most, or at least some, of the concepts, but I still do not understand what exactly Haskell's type system allows me to do that I cannot do in another statically typed language. I just intuitively know that Haskell's type system is better in every imaginable way than the type systems in C, C++ or Java, but I can't explain it logically, primarily because of a lack of in-depth knowledge about the differences between Haskell's type system and those of other statically typed languages.
Could someone give me examples of how Haskell's type system is more helpful than that of another language with a static type system? Examples that are terse and can be succinctly expressed would be nice.
The Haskell type system has a number of features which all exist in other languages, but are rarely combined within a single, consistent language:
it is a sound, static type system, meaning that a number of errors are guaranteed not to happen at runtime without needing runtime type checks (this is also the case in Caml, SML and almost the case in Java, but not in, say, Lisp, Python, C, or C++);
it performs static type reconstruction, meaning that the programmer doesn't need to write types unless he wants to; the compiler will reconstruct them on its own (this is also the case in Caml and SML, but not in Java or C);
it supports impredicative polymorphism (type variables), even at higher kinds, unlike Caml and SML, or any other production-ready language known to me (a sketch of the higher-kinded case follows this list);
it has good support for overloading (type classes) (unlike Caml and SML).
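As a sketch of the higher-kinded point above (Container is an invented class, not a standard library one), the variable f below ranges over type constructors such as [] or Maybe rather than over ordinary types:

class Container f where
  empty  :: f a
  insert :: a -> f a -> f a

instance Container [] where
  empty  = []
  insert = (:)

instance Container Maybe where
  empty  = Nothing
  insert x _ = Just x

Caml and SML can emulate something similar with functors over modules, but cannot abstract directly over a type constructor like this.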
Whether any of those make Haskell a better language is open to discussion — for example, while I happen to like type classes a lot, I know quite a few Caml programmers who strongly dislike overloading and prefer to use the module system.
On the other hand, the Haskell type system lacks a few features that other languages support elegantly:
it has no support for runtime dispatch (unlike Java, Lisp, and Julia);
it has no support for existential types and GADTs (these are both GHC extensions);
it has no support for dependent types (unlike Coq, Agda and Idris).
Again, whether any of these are desirable features in a general-purpose programming language is open to discussion.
In addition to what others have answered, it is also Haskell's type system that makes the language pure, i.e. which distinguishes between values of a certain type and effectful computations that produce a result of that type.
One major difference between Haskell's type system and that of most OO languages is that the ability for a function to have side effects is represented by a data type (a monad such as IO). This allows you to write pure functions that the compiler can verify are side-effect-free and referentially transparent, which generally means that they're easier to understand and less prone to bugs. It's possible to write side-effect-free code in other languages, but you don't have the compiler's help in doing so. Haskell makes you think more carefully about which parts of your program need to have side effects (such as I/O or mutable variables) and which parts should be pure.
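A minimal illustration of that separation (the function names are made up): the pure function below cannot perform I/O, and the compiler enforces this, while the effectful one carries IO in its type.

double :: Int -> Int      -- pure: the type rules out side effects
double x = 2 * x

readAndDouble :: IO Int   -- effectful: IO is part of the type
readAndDouble = do
  line <- getLine
  return (2 * read line)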
Also, although it's not quite part of the type system itself, the fact that function definitions in Haskell are expressions (rather than lists of statements) means that more of the code is subject to type-checking. In languages like C++ and Java, it's often possible to introduce logic errors by writing statements in the wrong order, since the compiler doesn't have a way to determine that one statement must precede another. For example, you might have one line that modifies an object's state, and another line that does something important based on that state, and it's up to you to ensure that these things happen in the correct order. In Haskell, this kind of ordering dependency tends to be expressed through function composition — e.g. f (g x) means that g must run first — and the compiler can check the return type of g against the argument type of f to make sure you haven't composed them the wrong way.
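As a sketch of that last point (parse and render are hypothetical names):

parse :: String -> Int
parse = read

render :: Int -> String
render = show

roundTrip :: String -> String
roundTrip = render . parse   -- parse must run first; composing the other
                             -- way around simply doesn't type-check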

What programming languages have something like Haskell’s `newtype`

The Haskell programming language has a concept of newtypes: if I write newtype Foo = Foo Bar, then a new type Foo is created that is isomorphic to Bar, i.e. there are bijective conversions between the two. Properties of this construct are (a small Haskell sketch follows the list):
The two types are completely separate (i.e. the compiler will not allow you to use one where the other is expected, without using the explicit conversions).
They share the same representation. In particular, the conversion functions have a run-time cost of zero and return "the same object" on the heap.
Conversion is only possible between such types and cannot be mis-used, i.e. type safety is preserved.
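For reference, a minimal sketch of the Haskell construct in question (birthday is a made-up example):

newtype Age = Age Int       -- same representation as Int, but a distinct type

birthday :: Age -> Age
birthday (Age n) = Age (n + 1)

-- bad :: Age -> Age
-- bad n = n + 1            -- rejected: an Age is not an Int, despite the
--                          -- two sharing a runtime representation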
What other programming languages provide this feature?
One example seems to be single-member structs in C, when used with record accessors/constructors only. Invalid candidates would be single-member structs in C when used with casts (as the casts are not checked by the compiler), or objects with a single member in Java (as these would not share the same representation).
Related questions: Does F# have 'newtype' of Haskell? (No) and Does D have 'newtype'? (not any more).
Frege has this, though, unlike in Haskell, there is no extra keyword. Instead, every product type with just one component is a newtype.
Example:
data Age = Age Int
Also, all languages that have nominal typing and allow you to define a type in terms of another should have this feature, for example Oberon, Modula-2 or Ada. So after
type Age is new Integer;  -- Ada syntax; kindly forgive any errors
one couldn't confuse an Age and some other quantity.
I believe Scala's value classes satisfy these conditions.
For example:
case class Kelvin(k: Double) extends AnyVal
Edit: actually, I'm not sure that conversions have zero overhead in all cases. This documentation describes some cases where allocation of objects on the heap is necessary, so I assume in those cases there would be some runtime overhead in accessing the underlying value from the object.
Go has this:
If we declare
type MyInt int
var i int
var j MyInt
then i has type int and j has type MyInt. The variables i and j have distinct static types and, although they have the same underlying type, they cannot be assigned to one another without a conversion.
"The same underlying type" means that the representation in memory of a MyInt is exactly that of an int. Passing a MyInt to a function expecting an int is a compile-time error. The same is true for composite types, e.g. after
type foo struct { x int }
type bar struct { x int }
you can't pass a bar to a function expecting a foo.
Mercury is a pure logic programming language, with a type system similar to Haskell's.
Evaluation in Mercury is strict rather than lazy, so there would be no semantic difference between Mercury's equivalents of newtype and data. Consequently, any type which happens to have only one constructor with only one argument is represented identically to the type of that argument, but is still treated as a distinct type; effectively, "newtype" is a transparent optimisation in Mercury. Example:
:- type wrapped
    --->    foo(int)
    ;       bar(string).
:- type wrapper ---> wrapper(wrapped).
:- type synonym == wrapped.
The representation of wrapper will be identical to that of wrapped but is a distinct type, as opposed to synonym which is simply another name for the type wrapped.
Mercury uses tagged pointers in its representations.[1] Being strict, and being allowed to have different representations for different types, Mercury generally tries to do away with boxing where possible. For example:
To refer to a value of an "enum-like" type (all nullary constructors), you don't need to point to any memory at all, so you can use a whole word's worth of tag bits to say which constructor it is, and inline that in the reference.
To reference a list, you can use a tagged pointer to a cons cell (rather than a pointer to a structure which itself contains the information about whether it is nil or a cons cell).
etc.
The "newtype" optimization is really just one particular application of that general idea. The "wrapper" type doesn't need any memory cells allocated above what is already holding the "wrapped" type. And since it needs zero tag bits, it can also fit any tags in the reference to the "wrapped" type. Therefore the whole reference to the "wrapped" type can be inlined into the reference to the wrapper type, which ends up being indistinguishable at runtime.
[1] The details here may only apply to the low-level C compilation grades. Mercury can also compile to "high-level" C or to Java. There's obviously no bit fiddling going on in Java (though as far as I know the "newtype" optimization still applies), and I'm just less familiar with the implementation details in the high-level C grades.
Rust has always allowed you to define single-field types, but with the recently stabilized repr(transparent) attribute you can now be confident that the new type will have exactly the same data layout as the wrapped type, even across FFI and such.
#[repr(transparent)]
pub struct FooWrapper(Foo);

Haskell's TypeClasses and Go's Interfaces

What are the similarities and the differences between Haskell's TypeClasses and Go's Interfaces? What are the relative merits / demerits of the two approaches?
It looks like Go interfaces resemble single-parameter type classes (constructor classes) in Haskell in only superficial ways:
Methods are associated with an interface type
Objects (particular types) may have implementations of that interface
It is unclear to me whether Go in any way supports bounded polymorphism via interfaces, which is the primary purpose of type classes. That is, in Haskell, the interface methods may be used at different types:
class I a where
  put :: a -> IO ()
  get :: IO a

instance I Int where
  ...

instance I Double where
  ...
So my question is whether Go supports type polymorphism. If not, they're not really like type classes at all. And they're not really comparable.
Haskell's type classes allow powerful reuse of code via "generics" (higher-kinded polymorphism); a good reference for cross-language support for such forms of generic programming is this paper.
Ad-hoc (bounded) polymorphism via type classes is well described here. This is the primary purpose of type classes in Haskell, and one not addressed by Go interfaces, meaning the two are not really very similar at all. Interfaces are strictly less powerful: a kind of zeroth-order type class.
I will add to Don Stewart's excellent answer that one of the surprising consequences of Haskell's type classes is that you can use logic programming at compile time to generate arbitrarily many instances of a class. (Haskell's type-class system includes what is effectively a cut-free subset of Prolog, very similar to Datalog.) This system is exploited to great effect in the QuickCheck library. Or, for a very simple example, you can see how to define a version of Boolean complement (not) that works on predicates of arbitrary arity, sketched below. I suspect this ability was an unintended consequence of the type-class system, but it has proven incredibly powerful.
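Here is a sketch of that arbitrary-arity trick (Complementable is an invented name). Instance selection recurses on the shape of the type, which is exactly the Prolog-style resolution described above:

class Complementable p where
  complement :: p -> p

-- Base case: complementing a Bool is just negation.
instance Complementable Bool where
  complement = not

-- Recursive case: to complement a function, complement its result.
instance Complementable p => Complementable (a -> p) where
  complement f = complement . f

Now complement even :: Int -> Bool is the odd test, and complement (<) complements a two-argument predicate, with the right chain of instances assembled at compile time.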
Go has nothing like it.
In Haskell, typeclass instantiation is explicit (i.e. you have to say instance Foo Bar for Bar to be an instance of Foo), while in Go implementing an interface is implicit (i.e. when you define a type with the right methods, it automatically implements the interface, without your having to say something like implement InterfaceName).
An interface can only describe methods where the instance of the interface is the receiver. In a typeclass, the instantiating type can appear at any argument position, or as the return type of a function (i.e. you can say: if Foo is an instance of class Bar, there must be a function named baz which takes an Int and returns a Foo; you can't say that with interfaces).
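For instance, this (hypothetical) class uses the instantiating type only in return position, something a Go interface method cannot express because there is no receiver of that type:

class Default a where
  def :: a              -- a appears only as the result, never as a receiver

instance Default Int where
  def = 0

instance Default Bool where
  def = False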
Very superficial similarities, Go's interfaces are more like structural sub-typing in OCaml.
C++ Concepts (that didn't make it into C++0x) are like Haskell type classes. There were also "axioms" which aren't present in Haskell at all. They let you formalize things like the monad laws.
