Does haskell erase types? - haskell

Does Haskell erase types, and if so, in what ways is this similar/dissimilar to the type erasure that occurs in Java?

Warning: experience+inference. Consult someone who works on both compilers for The Truth.
In the sense that type checking is done at compile time, and several complex features of the type system are reduced to much simpler language constructs, yes, but in a rather different way to Java.
A type signature creates no runtime overhead. The Haskell compiler is good at program transformation (it has more leeway, because the running order is in many cases not specified by the programmer), and automatically inlines appropriate definitions and specialises haskell-polymorhpic (=java-generic) functions to a particular type etc, as it sees fit, if it helps. That's a similar to Java type erasure, but more-so aspect.
There are in essence no type casts needed in Haskell to ensure type safety, because Haskell is designed to be type-safe from the ground up. We don't resort to turning everything into an Object, and we don't cast them back, because a polymorphic(generic) function genuinely does work on any data type, no matter what, pointer types or unboxed integers, it just works, without trickery. So unlike Java, casting is not a feature of compiling polymorphic(generic) code. Haskell folk tend to feel that if you're doing type casting, you said goodbye to type safety anyway.
For a lovely example of how ensuring the code's static type-correctness at compile time can avoid runtime overhead, there's a newtype construct in Haskell which is a type-safe wrapper for an existing type, and it's completely compiled away - all the construction and destruction simply doesn't happen at runtime. The type system ensures at compile time it's used correctly, it can't be got at at runtime except using (type-checked) accessor functions.
Polymorphic(generic) functions don't have polymorphic overheads. Haskell-overloaded functions (Java-interface-instance methods) have a data overhead in the sense that there's an implicit dictionary of functions used for what appears to be late binding to Java programmers, but is in fact, again, determined at compile time.
Summary: yes, even more so than in Java, and no, they were never there at runtime to erase anyway.

C and Pascal have type erasure. Java lets you inspect classes at run-time - even dynamically loaded ones!
What Haskell does is much closer to Pascal than to Java.

Related

Runtime "type terms" in LiquidHaskell vs. Idris

I have been playing around with LiquidHaskell and Idris lately and i got a pretty specific question, that i could not find a definitive answer anywhere.
Idris is a dependently typed language which is great for the most part. However i read that some type terms during type checking can "leak" from compile-time to run-time, even tough Idris does its best to eliminate those terms (this is a special feature even..). However, this elimination is not perfect and sometimes it does happen. If, why and when this happens is not immediately clear from the code and sometimes has an effect on run-time performance.
I have seen people prefering Haskells' type system, because it cannot happen there. When type checking is done, it is done. Types are "thrown away" and not used during run-time.
What is the story for LiquidHaskell? It enhances the type system capabilities quite a bit over traditional Haskell. Does LiquidHaskell also inject run-time bits for certain type "constellations" or (as i suspect) just adds another layer of "better" types over Haskell that do not affect run-time in any shape or form.
Meaning, if one removes the special LiquidHaskell type annotations and compiles it with standard GHC, is the code produced always the same? In other words: Is the LiquidHaskell extension compile-time only?
If yes, this seems like the best of both worlds, or is LiquidHaskell just not as expressive in the type system as Idris and therefore manages without run-time terms?
To answer your question as asked: Liquid Haskell allows you to provide annotations that a separate tool from the compiler verifies. Code is still compiled exactly the same way.
However, I have quibbles with your question as asked. It can be argued that there are cases in which some residue of the type must survive at run time in Haskell - especially when polymorphic recursion is involved. Consider this function:
lots :: Show a => Int -> a -> String
lots 0 x = show x
lots n x = lots (n-1) (x,x)
There is no way to statically determine the exact type involved in the use of show. Something derived from the types must survive to runtime. In practice, this is easy to do by using type class dictionaries. The theoretically important detail is that there's still type-directed behavior being chosen at runtime.

Haskell compiler magic: what requires a special treatment from the compiler?

When trying to learn Haskell, one of the difficulties that arise is the ability when something requires special magic from the compiler. One exemple that comes in mind is the seq function which can't be defined i.e. you can't make a seq2 function behaving exactly as the built-in seq. Consequently, when teaching someone about seq, you need to mention that seq is special because it's a special symbol for the compiler.
Another example would be the do-notation which only works with instances of the Monad class.
Sometimes, it's not always obvious. For instance, continuations. Does the compiler knows about Control.Monad.Cont or is it plain old Haskell that you could have invented yourself? In this case, I think nothing special is required from the compiler even if continuations are a very strange kind of beast.
Language extensions set aside, what other compiler magic Haskell learners should be aware of?
Nearly all the ghc primitives that cannot be implemented in userland are in the ghc-prim package. (it even has a module called GHC.Magic there!)
So browsing it will give a good sense.
Note that you should not use this module in userland code unless you know exactly what you are doing. Most of the usable stuff from it is exported in downstream modules in base, sometimes in modified form. Those downstream locations and APIs are considered more stable, while ghc-prim makes no guarantees as to how it will act from version to version.
The GHC-specific stuff is reexported in GHC.Exts, but plenty of other things go into the Prelude (such as basic data types, as well as seq) or the concurrency libraries, etc.
Polymorphic seq is definitely magic. You can implement seq for any specific type, but only the compiler can implement one function for all possible types [and avoid optimising it away even though it looks no-op].
Obviously the entire IO monad is deeply magic, as is everything to with concurrency and parallelism (par, forkIO, MVar), mutable storage, exception throwing and catching, querying the garbage collector and run-time stats, etc.
The IO monad can be considered a special case of the ST monad, which is also magic. (It allows truly mutable storage, which requires low-level stuff.)
The State monad, on the other hand, is completely ordinary user-level code that anybody can write. So is the Cont monad. So are the various exception / error monads.
Anything to do with syntax (do-blocks, list comprehensions) is hard-wired into the language definition. (Note, though, that some of these respond to LANGUAGE RebindableSyntax, which lets you change what functions it binds to.) Also the deriving stuff; the compiler "knows about" a handful of special classes and how to auto-generate instances for them. Deriving for newtype works for any class though. (It's just copying an instance from one type to another identical copy of that type.)
Arrays are hard-wired. Much like every other programming language.
All of the foreign function interface is clearly hard-wired.
STM can be implemented in user code (I've done it), but it's currently hard-wired. (I imagine this gives a significant performance benefit. I haven't tried actually measuring it.) But, conceptually, that's just an optimisation; you can implement it using the existing lower-level concurrency primitives.

In what way is haskells type system more helpful than the type system of another statically typed language

I have been using haskell for a while now. I understand most/some of the concepts but I still do not understand, what exactly does haskells type system allow me to do that I cannot do in another statically typed language. I just intuitively know that haskells type system is better in every imaginable way compared to the type system in C,C++ or java, but I can't explain it logically, primarily because of a lack of in depth knowledge about the differences in type systems between haskell and other statically typed languages.
Could someone give me examples of how haskells type system is more helpful compared to a language with a static type system. Examples, that are terse and can be succinctly expressed would be nice.
The Haskell type system has a number of features which all exist in other languages, but are rarely combined within a single, consistent language:
it is a sound, static type system, meaning that a number of errors are guaranteed not to happen at runtime without needing runtime type checks (this is also the case in Caml, SML and almost the case in Java, but not in, say, Lisp, Python, C, or C++);
it performs static type reconstruction, meaning that the programmer doesn't need to write types unless he wants to, the compiler will reconstruct them on its own (this is also the case in Caml and SML, but not in Java or C);
it supports impredicative polymorphism (type variables), even at higher kinds (unlike Caml and SML, or any other production-ready language known to me);
it has good support for overloading (type classes) (unlike Caml and SML).
Whether any of those make Haskell a better language is open to discussion — for example, while I happen to like type classes a lot, I know quite a few Caml programmers who strongly dislike overloading and prefer to use the module system.
On the other hand, the Haskell type system lacks a few features that other languages support elegantly:
it has no support for runtime dispatch (unlike Java, Lisp, and Julia);
it has no support for existential types and GADTs (these are both GHC extensions);
it has no support for dependent types (unlike Coq, Agda and Idris).
Again, whether any of these are desirable features in a general-purpose programming language is open to discussion.
In addition to what others have answered, it is also Haskell's type system that makes the language pure, i.e. which distinguishes between values of a certain type and effectful computations that produce a result of that type.
One major difference between Haskell's type system and that of most OO languages is that the ability for a function to have side effects is represented by a data type (a monad such as IO). This allows you to write pure functions that the compiler can verify are side-effect-free and referentially transparent, which generally means that they're easier to understand and less prone to bugs. It's possible to write side-effect-free code in other languages, but you don't have the compiler's help in doing so. Haskell makes you think more carefully about which parts of your program need to have side effects (such as I/O or mutable variables) and which parts should be pure.
Also, although it's not quite part of the type system itself, the fact that function definitions in Haskell are expressions (rather than lists of statements) means that more of the code is subject to type-checking. In languages like C++ and Java, it's often possible to introduce logic errors by writing statements in the wrong order, since the compiler doesn't have a way to determine that one statement must precede another. For example, you might have one line that modifies an object's state, and another line that does something important based on that state, and it's up to you to ensure that these things happen in the correct order. In Haskell, this kind of ordering dependency tends to be expressed through function composition — e.g. f (g x) means that g must run first — and the compiler can check the return type of g against the argument type of f to make sure you haven't composed them the wrong way.

The nature of Haskell type system: static/dynamic, manual/inferred?

I'm learning Haskell and trying to grasp how exactly Haskell type system works re working out what is the type of the thing: dynamic, static, set manually, inferred?
Languages I know a bit:
C, Java: set manually by a programmer, verified at compile time, like int i;, strong typing (subtracting integer from a string is a compile error). Typical static type system.
Python: types inferred automatically by runtime (dynamic typing),
strong typing (subtracting int from a str raises exception).
Perl, PHP: types inferred automatically at runtime (dynamic typing), weak typing.
Haskell: types often inferred automatically at compile time (either this or type is set explicitly by a programmer before compile time), strong typing.
Does Haskell's type system really deserve description "static"? I mean automatic type inference is not (classic) static typing.
Does Haskell's type system really deserve description "static"? I mean automatic type inference is not (classic) static typing.
Type inference is done at compile time. All types are checked at compile time. Haskell implementations may erase types at runtime, as they have a compile-time proof of type safety.
So it is correct to say that Haskell has a "static" type system. "Static" refers to one side of the phase distinction between compile-time and runtime.
To quote Robert Harper:
Most programming languages exhibit a phase distinction between the
static and dynamic phases of processing. The static phase consists of
parsing and type checking to ensure that the program is well-formed;
the dynamic phase consists of execution of well-formed programs. A
language is said to be safe exactly when well-formed programs are well
behaved when executed.
From Practical Foundations for Programming Languages, 2014.
Under this description Haskell is a safe language with a static type system.
As a side note, I'd strongly recommend the above book for those interested in learning the essential skills for understanding about programming languages and their features.
The static-dynamic axis and the manual-inferred (or manifest-inferred) scales are not orthogonal. A static type system can be manifest or inferred, the distinction doesn't apply to dynamic typing. Python, Perl, PHP don't infer types because type inference is the deduction of static types via static analysis (i.e., at compile time).
Dynamic languages don't deduce types like that, they just compute the types of values alongside the actual computation. Haskell does deduce types statically, it is statically typed, you just don't have to write the static types manually. Its type system indeed differs from mainstream type systems, but it differs by being inferred rather than manifest (and many other features), not by being not static.
As for strong/weak typing: Stop using that term, it is overloaded to the point of being useless. If by "the type system is strong/weak" you mean "the type system allows/forbids X" by, then say that, because if you call it strong/weak typing a large portion of your audience will have a different definition and disagree with your use of the terms. Moreover, as you see, for most choices of X it's rather independent of the two distinctions you mention in the title (but then there's the sizable group that uses strong as synonym for static and weak as synonym for dynamic, oh my!).
Before going to SO it would be good to check e.g. Wikipedia, which says that "Haskell has a strong, static type system based on Hindley–Milner type inference." Static refers to when type checking is done, so whether types are inferred or not doesn't matter.

Why does GHC typecheck before desugaring?

Is there a good reason to run the typechecker first? It would seem that the typechecker would be vastly simpler if it ran on a smaller syntax, especially because with the current system every syntax extension needs to touch the typechecker. This question applies especially to arrow syntax, the typechecking of which as described in comments here is known to be bogus.
I imagine one reason for this would be not emitting errors that mention generated code, but this situation is already covered in cases where a deriving clause fails to typecheck; GHC knows that code was generated.
There is a section on this question in the GHC article found in volume 2 of the book "The Architecture of Open Source Application":
Type Checking the Source Language
One interesting design decision is
whether type checking should be done before or after desugaring. The
trade-offs are these:
Type checking before desugaring means that the type checker must deal
directly with Haskell's very large syntax, so the type checker has
many cases to consider. If we desugared into (an untyped variant of)
Core first, one might hope that the type checker would become much
smaller.
On the other hand, type checking after desugaring would
impose a significant new obligation: that desugaring does not affect
which programs are type-correct. After all, desugaring implies a
deliberate loss of information. It is probably the case that in 95% of
the cases there is no problem, but any problem here would force some
compromise in the design of Core to preserve some extra information.
Most seriously of all, type checking a desugared program would make it
much harder to report errors that relate to the original program text,
and not to its (sometimes elaborate) desugared version.
Most compilers type check after desugaring, but for GHC we made the opposite choice:
we type check the full original Haskell syntax, and then desugar the
result. It sounds as if adding a new syntactic construct might be
complicated, but (following the French school) we have structured the
type inference engine in a way that makes it easy. Type inference is
split into two parts:
Constraint generation: walk over the source syntax tree, generating a
collection of type constraints. This step deals with the full syntax
of Haskell, but it is very straightforward code, and it is easy to add
new cases.
Constraint solving: solve the gathered constraints. This is
where the subtlety of the type inference engine lies, but it is
independent of the source language syntax, and would be the same for a
much smaller or much larger language.
On the whole, the
type-check-before-desugar design choice has turned out to be a big
win. Yes, it adds lines of code to the type checker, but they are
simple lines. It avoids giving two conflicting roles to the same data
type, and makes the type inference engine less complex, and easier to
modify. Moreover, GHC's type error messages are pretty good.

Resources