How does one avoid creating an ad-hoc type system in dynamically typed languages? - programming-languages

In every project I've started in languages without type systems, I eventually begin to invent a runtime type system. Maybe the term "type system" is too strong; at the very least, I create a set of type/value-range validators when I'm working with complex data types, and then I feel the need to be paranoid about where data types can be created and modified.
I hadn't thought twice about it until now. As an independent developer, my methods have been working in practice on a number of small projects, and there's no reason they'd stop working now.
Nonetheless, this must be wrong. I feel as if I'm not using dynamically-typed languages "correctly". If I must invent a type system and enforce it myself, I may as well use a language that has types to begin with.
So, my questions are:
Are there existing programming paradigms (for languages without types) that avoid the necessity of using or inventing type systems?
Are there otherwise common recommendations on how to solve the problems that static typing solves in dynamically-typed languages (without sheepishly reinventing types)?
Here is a concrete example for you to consider. I'm working with datetimes and timezones in Erlang (a dynamic, strongly typed language). This is a common datatype I work with:
{{Y,M,D},{tztime, {time, HH,MM,SS}, Flag}}
... where {Y,M,D} is a tuple representing a valid date (all entries are integers), tztime and time are atoms, HH,MM,SS are integers representing a sane 24-hr time, and Flag is one of the atoms u,d,z,s,w.
This datatype is commonly parsed from input, so to ensure valid input and a correct parser, the values need to be checked for type correctness and for valid ranges. Later on, instances of this datatype are compared to each other, making the type of their values all the more important, since in Erlang any two terms can be compared. From the Erlang reference manual:
number < atom < reference < fun < port < pid < tuple < list < bit string

Aside from the confusion of static vs. dynamic and strong vs. weak typing:
What you want to implement in your example isn't really solved by most existing static type systems. Range checks and complications like February 31st, and especially parsed input, are usually checked at runtime no matter what type system you have.
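To make that concrete, here is a minimal Haskell sketch of what a static encoding of the question's datatype might look like (the names TZFlag, TZTime and validTZTime are illustrative, not from the question). The shape of the data is captured statically, but the range checks still end up in an ordinary function that runs at runtime:

data TZFlag = U | D | Z | S | W deriving (Eq, Ord, Show)

data TZTime = TZTime
  { hour   :: Int
  , minute :: Int
  , second :: Int
  , flag   :: TZFlag
  } deriving (Eq, Ord, Show)

-- The static type guarantees the shape, but not the ranges:
validTZTime :: TZTime -> Bool
validTZTime t =
  hour t >= 0 && hour t < 24 &&
  minute t >= 0 && minute t < 60 &&
  second t >= 0 && second t < 60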
Since your example is in Erlang, I have a few recommendations:
Use records. Besides being useful and helpful for a whole bunch of reasons, they give you easy runtime type checking without a lot of effort, e.g.:
is_same_day(#datetime{year=Y1, month=M1, day=D1},
            #datetime{year=Y2, month=M2, day=D2}) -> ...
This effortlessly matches only two datetime records. You could even add guards to check the ranges if the source is untrusted. And it conforms to Erlang's let-it-crash approach to error handling: if no match is found you get a badmatch error, and can handle it at the level where it is appropriate (usually the supervisor level).
Generally, write your code so that it crashes when its assumptions are not valid.
If this doesn't feel statically checked enough: use TypEr and Dialyzer to find the kinds of errors that can be found statically; whatever remains will be checked at runtime.
Don't be too restrictive about which "types" your functions accept; sometimes the added functionality of just doing something useful even for different inputs is worth more than checking the types and ranges in every function. If you do it where it matters, you will usually catch errors early enough for them to be easily fixable. This is especially true in a functional language, where you always know where every value comes from.

A lot of good answers, let me add:
Are there existing programming paradigms (for languages without types) that avoid the necessity of using or inventing type systems?
The most important paradigm, especially in Erlang, is this: assume the type is right; otherwise, let it crash. Don't write excessively paranoid checking code, but assume that the input you get is of the right type or the right pattern. Don't write (there are exceptions to this rule, but in general):
foo({tag, ...}) -> do_something(..);
foo({tag2, ...}) -> do_something_else(..);
foo(Otherwise) ->
    report_error(Otherwise),
    %% try to fix the problem here...
Kill the last clause and have it crash right away. Let a supervisor and other processes do the cleanup (you can use monitors to let janitorial processes know when a crash has occurred).
Do be precise however. Write
bar(N) when is_integer(N) -> ...
baz([]) -> ...
baz(L) when is_list(L) -> ...
if the function is known only to work with integers or lists, respectively. Yes, these are runtime checks, but the goal is to convey information to the programmer. Also, HiPE tends to use such guards as optimization hints and eliminates the type check where possible. Hence, the price may be lower than you think.
You chose an untyped/dynamically-typed language, so the price you pay is that type checking and errors from clashes will happen at runtime. As other posts hint, a statically typed language is not exempt from doing some checks as well - the type system is (usually) an approximation of a proof of correctness. In most static languages you often get input which you can't trust. This input is transformed at the "border" of the application and then converted to an internal format. The conversion serves to mark trust: from now on, the thing has been validated and we can assume certain things about it. The power and correctness of this assumption is directly tied to its type signature and how good the programmer is at juggling the static types of the language.
Are there otherwise common recommendations on how to solve the problems that static typing solves in dynamically-typed languages (without sheepishly reinventing types)?
Erlang has the dialyzer, which can be used to statically analyze and infer the types of your programs. It will not find as many type errors as a type checker in, e.g., OCaml, but it won't "cry wolf" either: an error from the dialyzer is provably an error in the program, and it won't reject a program which may be working OK. A simple example is:
my_and(true, true) -> true;
my_and(true, _)    -> false;
my_and(false, _)   -> false.
The invocation my_and(true, greatmistake) will return false, yet a static type system will reject the program, because it will infer from the first line that the type signature takes a boolean() value as the second parameter. The dialyzer, in contrast, will accept this function and give it the signature (boolean(), term()) -> boolean(). It can do this because there is no need to protect against the error a priori: if there is a mistake, the runtime system has a type check that will capture it.

In order for a statically-typed language to match the flexibility of a dynamically-typed one, I think it would need a lot of features, perhaps infinitely many.
In the Haskell world, one hears a lot of sophisticated, sometimes to the point of being scary, terminology. Type classes. Parametric polymorphism. Generalized algebraic data types. Type families. Functional dependencies. The Ωmega programming language takes it even further, with the website listing "type-level functions" and "level polymorphism", among others.
What are all these? Features added to static typing to make it more flexible. These features can be really cool, and tend to be elegant and mind-blowing, but are often difficult to understand. Learning curve aside, type systems often fail to model real-world problems elegantly. A particularly good example of this is interacting with other languages (a major motivation for C# 4's dynamic feature).
Dynamically-typed languages give you the flexibility to implement your own framework of rules and assumptions about data, rather than be constrained by the ever-limited static type system. However, "your own framework" won't be machine-checked, meaning the onus is on you to ensure your "type system" is safe and your code is well-"typed".
One thing I've found from learning Haskell is that I can carry lessons learned about strong typing and sound reasoning over to weaker-typed languages, such as C and even assembly, and do the "type checking" myself. Namely, I can prove that sections of code are correct in and of themselves, by bearing in mind the rules my functions and values are supposed to follow, and the assumptions I am allowed to make about other functions and values. When debugging, I go through and check things again, and think through whether or not my approach is sound.
The bottom line: dynamic typing puts more flexibility at your fingertips. On the other hand, statically-typed languages tend to be more efficient (by orders of magnitude), and good static type systems drastically cut down on debugging time by letting the computer do much of it for you. If you want the benefits of both, install a static type checker in your brain by learning decent, strongly-typed languages.

Sometimes data need validation. Validating any data received from the network is almost always a good idea — especially data from a public network. Being paranoid here is only good. If something resembling a static type system helps this in the least painful way, so be it. There's a reason why Erlang allows type annotations. Even pattern matching can be seen as just a kind of dynamic type checking; nevertheless, it's a central feature of the language. The very structure of data is its 'type' in Erlang.
The good thing is that you can custom-tailor your 'type system' to your needs, making it flexible and smart, while the type systems of OO languages typically have fixed features. When the data structures you use are immutable, then once you've validated such a structure, you're safe to assume it conforms to your restrictions, just as with static typing.
There's no point in being ready to process any kind of data at any point of a program, dynamically-typed or not. A 'dynamic type' is essentially a union of all possible types; limiting it to a useful subset is a valid way to program.

A statically typed language detects type errors at compile time. A dynamically typed language detects them at runtime. There are some modest restrictions on what one can write in a statically typed language such that all type errors can be caught at compile time.
But yes, you still have types even in a dynamically typed language, and that's a good thing. The problem is that you end up with lots of runtime checks to ensure that you have the types you think you do, since the compiler hasn't taken care of that for you.
Erlang has a very nice tool for specifying and statically verifying lots of types -- dialyzer; see the Erlang type system documentation for references.
So don't reinvent types, use the typing tools that Erlang already provides, to handle the types that already exist in your program (but which you haven't yet specified).
And this on its own won't eliminate range checks, unfortunately. Without lots of special sauce, you really have to enforce those on your own by convention (with smart constructors, etc. to help), or fall back to runtime checks, or both.
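To illustrate the smart-constructor idea, here is a minimal Haskell sketch (the module and names are hypothetical). The data constructor stays private, so the range check in mkHour runs exactly once; afterwards the type itself carries the guarantee:

module Hour (Hour, getHour, mkHour) where

-- The Hour data constructor is not exported, so the only way to
-- build a value is through mkHour, which performs the range check.
newtype Hour = Hour Int deriving (Eq, Ord, Show)

getHour :: Hour -> Int
getHour (Hour h) = h

mkHour :: Int -> Maybe Hour
mkHour h
  | h >= 0 && h < 24 = Just (Hour h)
  | otherwise        = Nothing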

Related

Is Haskell a strongly typed programming language?

Is Haskell strongly typed? I.e., is it possible to change the type of a variable after you have assigned it a value? I can't seem to find the answer on the internet.
Static — types are known at compile time. Java and Haskell have static typing. Also C/C++, C#, Go, Scala, Rust, Kotlin, Pascal to list a few more.
A statically typed language might or might not have type inference. Java almost completely lacks type inference (but it's very slowly changing just a little bit); Haskell has full type inference (except with certain very advanced extensions).
(Type inference is when you only have to declare a minimal amount of types by hand, e.g. var isFoo = true and var person = new Person(), instead of bool isFoo = ... and Person person = ....)
Dynamic — Python, JavaScript, Ruby, PHP, Clojure (and Lisps in general), Prolog, Erlang, Groovy etc. Can also be called "unityped"; dynamic typing can be "emulated" in a static setting, but the reverse is not true except by using external static analysis tools. Some languages make it possible to mix dynamic and static (see gradual typing, e.g. https://typedclojure.org/).
Some languages enable static typing for one or more modules, applied at import time, for example: Python+Mypy, Typed Clojure, JavaScript+Flow, PHP+Hack to name a few.
Strong — values that are intended to be treated as Cat always are; trying to treat them like a Dog will cause a loud meeewww... I mean error.
Weak — this effectively boils down to 2 similar but distinct things: type coercion (e.g. "5"+3 equals 8 in PHP — or does it!) and memory reinterpretation (e.g. (int) someCharValue or (bool) somePtr in C, and C++ as well, but C++ wants you to explicitly say reinterpret_cast). So there's really coercion-weak and reinterpretation-weak, and different languages are weak in one or both of these ways.
Interestingly, note that coercion is implicit by nature and memory reinterpretation is explicit (except in Assembly) — so weak typing consists of an implicit and an explicit behavior. Maybe that's even more of a reason to refer to 2 distinct subcategories under weak typing.
There are languages with all 4 possible combinations, and variations/gradations thereof.
Haskell is static+strong; of course it has unsafeCoerce, so it can be static+a bit reinterpret-weak at times, but unsafeCoerce is very much frowned upon except in extreme situations where you are sure about something being the case but just can't seem to persuade the compiler without going all the way back and retelling the entire story in a different way (a small sketch of such a case follows these language examples).
C is static+weak, because all memory can freely be reinterpreted as something it originally was not meant to contain, hence weak. But all of those reinterpretations are kept track of by the type checker, so it is still fully static too. And since C's implicit coercions are essentially limited to the usual arithmetic conversions, it's mainly reinterpret-weak.
Python is dynamic+almost entirely strong — there are no types known on any given line of code prior to reaching that line during execution, however values that live at runtime do have types associated with them and it's impossible to reinterpret memory. Implicit coercions are also kept to a meaningful minimum, so one might say Python is 99.9% strong and 0.01% coercion-weak.
PHP and JavaScript are dynamic+mostly weak — dynamic, in that nothing has type until you execute and introspect its contents, and also weak in that coercions happen all the time and with things you'd never really expect to be coerced, unless you are only calling methods and functions and not using built-in operations. These coercions are a source of a lot of humor on the internet. There are no memory reinterpretations so PHP and JS are coercion-weak.
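Returning to the unsafeCoerce escape hatch mentioned above, here is a small sketch of the legitimate kind of use (Age is an illustrative newtype; Unsafe.Coerce is the real module). The programmer knows the two representations coincide, but cannot easily convince the checker:

import Unsafe.Coerce (unsafeCoerce)

newtype Age = Age Int

-- Representationally safe: Age and Int share a runtime representation,
-- so coercing the whole list avoids rebuilding it, but the type checker
-- cannot verify this on its own.
ages :: [Int] -> [Age]
ages = unsafeCoerce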
Furthermore, some people like to think that static typing is about variables having type, and strong typing is about values having type — this is a very useful way to go about understanding the full picture, but it's not quite true: some dynamically typed languages also allow variables/parameters to be annotated with types/constraints that are enforced at runtime.
In static typing, it's expressions that have a type; the fact of variables having type is only a consequence of variables being used as a means to glue bigger expressions together from smaller ones, so it's not variables per se that have types.
Similarly, in dynamic typing, it's not the variables that lack statically known type — it's all expressions! Variables lacking type is merely a consequence of the expressions they store lacking type.
One final illustration
In dynamic typing, all the cats, dogs and even elephants (in fact entire zoos!) are packaged up in identically sized boxes.
In static typing these boxes look different and have stickers on them saying what's inside.
Some people like it because they can just use a single box form factor and don't have to put any labels on the boxes — it's only the arrangement of boxes with regards to each other that implicitly (and hopefully) provides type sanity.
Some people also like it because it allows them to do all sorts of tricks with tigers temporarily being transported in boxes that smell like lions, and bears put in the same array of interconnected boxes as wolves or deer.
In such a label-free setting of transport boxes, all the possible logistics scenarios need to be played out or simulated in order to detect misalignments in the implicit arrangement, as in a stage rehearsal. No reliable guarantees can be given based on reasoning alone, generally speaking (ad-hoc test cases that require the entire system to be started up before any partial conclusions about its soundness can be drawn).
With labels, and explicit rules on how to deal with boxes of various labels, automated/mechanized logical reasoning can be used to draw conclusions about what the logistics system will or won't do for sure (static verification, formal proof, or at least pseudo-proof like QuickCheck). Some aspects of the logistics still need to be verified with trial runs, such as whether the logistics team even understood the client correctly (integration testing, acceptance testing, end-user sanity checks).
Moreover, in weak typing, dogs can be sliced up and reassembled as Frankenstein cats, whether they like it or not, and whether the result is ugly or not.
But if you add labels to the boxes, it still matters that Frankenstein cats be put in cat boxes. (static+weak typing)
In strong typing, you can put a cat in a dog's box, but you can only keep pretending it's a dog until you try to humiliate it by feeding it something only dogs would eat; if that happens, it will scream out loud. Until that time, under dynamic typing it will silently accept its place (in a static world it would refuse to be put in a dog's box before you can say "kitty").
You seem to mix up dynamic/static and weak/strong typing.
Dynamic or static typing is about whether the type of a variable can be changed during execution.
Weak or strong typing is about being able to predict type errors just from function signatures.
Haskell is both statically and strongly typed.
However, there is no such thing as a variable in Haskell, so talking about dynamic vs. static typing in that sense makes little sense: an identifier, once bound to a value, cannot be changed during execution.
EDIT: But like goldenbull said, those typing notions are not clearly defined.
It is strongly typed. See section 2.3 here: Why Haskell matters
I think you are talking about two different things.
First, Haskell, and most functional programming (FP) languages, do NOT have the concept of a "variable". Instead, they use the concepts of "name" and "value": they just "bind" a value to a name. Once the value is bound, you cannot bind another value to the same name; this is a key feature of FP.
Strong typing is another topic. Yes, Haskell is strongly typed, and so are most FP languages. Strong typing gives FP the ability to do "type inference", which is powerful for eliminating hidden bugs at compile time and helps reduce the size of the source code.
Maybe you are comparing Haskell with Python? Python is also strongly typed. The difference between Haskell and Python is "statically typed" vs. "dynamically typed". The actual meanings of the terms "strong typing" and "weak typing" are ambiguous and fuzzy. That is another long story...

What's so bad about Template Haskell?

It seems that Template Haskell is often viewed by the Haskell community as an unfortunate convenience. It's hard to put into words exactly what I have observed in this regard, but consider these few examples
Template Haskell listed under "The Ugly (but necessary)" in response to the question Which Haskell (GHC) extensions should users use/avoid?
Template Haskell considered a temporary/inferior solution in Unboxed Vectors of newtype'd values thread (libraries mailing list)
Yesod is often criticized for relying too much on Template Haskell (see the blog post in response to this sentiment)
I've seen various blog posts where people do pretty neat stuff with Template Haskell, enabling prettier syntax that simply wouldn't be possible in regular Haskell, as well as tremendous boilerplate reduction. So why is it that Template Haskell is looked down upon in this way? What makes it undesirable? Under what circumstances should Template Haskell be avoided, and why?
One reason for avoiding Template Haskell is that it as a whole isn't type-safe, at all, thus going against much of "the spirit of Haskell." Here are some examples of this:
You have no control over what kind of Haskell AST a piece of TH code will generate, beyond where it will appear; you can have a value of type Exp, but you don't know if it is an expression that represents a [Char] or a (a -> (forall b . b -> c)) or whatever. TH would be more reliable if one could express that a function may only generate expressions of a certain type, or only function declarations, or only data-constructor-matching patterns, etc.
You can generate expressions that don't compile. You generated an expression that references a free variable foo that doesn't exist? Tough luck, you'll only see that when actually using your code generator, and only under the circumstances that trigger the generation of that particular code. It is very difficult to unit test, too.
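A tiny sketch of the first point above, using the standard Language.Haskell.TH constructors (the binding names are illustrative):

import Language.Haskell.TH

intExp, strExp :: Exp
intExp = LitE (IntegerL 42)     -- splices to an Integer literal
strExp = LitE (StringL "oops")  -- splices to a String literal
-- Both are just Exp; nothing in the types says what they denote,
-- so a generator can return either one where the other was expected.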
TH is also outright dangerous:
Code that runs at compile time can do arbitrary IO, including launching missiles or stealing your credit card. You don't want to have to look through every cabal package you ever download in search of TH exploits.
TH can access "module-private" functions and definitions, completely breaking encapsulation in some cases.
Then there are some problems that make TH functions less fun to use as a library developer:
TH code isn't always composable. Let's say someone makes a generator for lenses; more often than not, that generator will be structured in such a way that it can only be called directly by the "end user," and not by other TH code, for example by taking the list of type constructors to generate lenses for as its parameter. It is tricky to generate that list in code, whereas the user only has to write generateLenses [''Foo, ''Bar].
Developers don't even know that TH code can be composed. Did you know that you can write forM_ [''Foo, ''Bar] generateLens? Q is just a monad, so you can use all of the usual functions on it. Some people don't know this, and because of that, they create multiple overloaded versions of essentially the same functions with the same functionality, and these functions lead to a certain bloat effect. Also, most people write their generators in the Q monad even when they don't have to, which is like writing bla :: IO Int; bla = return 3; you are giving a function more "environment" than it needs, and clients of the function are then required to provide that environment.
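A sketch of that composition (generateLens here stands in for a hypothetical per-type generator; Q and forM are the ordinary monad machinery):

import Control.Monad (forM)
import Language.Haskell.TH

-- Hypothetical single-type generator, as a library might expose it.
generateLens :: Name -> Q [Dec]
generateLens _name = return []  -- a real version would build declarations

-- Because Q is just a monad, generators compose with ordinary combinators:
generateLenses :: [Name] -> Q [Dec]
generateLenses names = fmap concat (forM names generateLens)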
Finally, there are some things that make TH functions less fun to use as an end-user:
Opacity. When a TH function has type Q Dec, it can generate absolutely anything at the top-level of a module, and you have absolutely no control over what will be generated.
Monolithism. You can't control how much a TH function generates unless the developer allows it; if you find a function that generates a database interface and a JSON serialization interface, you can't say "No, I only want the database interface, thanks; I'll roll my own JSON interface"
Run time. TH code takes a relatively long time to run. The code is interpreted anew every time a file is compiled, and often a ton of packages required by the running TH code have to be loaded. This slows down compile time considerably.
This is solely my own opinion.
It's ugly to use. $(fooBar ''Asdf) just does not look nice. Superficial, sure, but it contributes.
It's even uglier to write. Quoting works sometimes, but a lot of the time you have to do manual AST grafting and plumbing. The API is big and unwieldy, there's always a lot of cases you don't care about but still need to dispatch, and the cases you do care about tend to be present in multiple similar but not identical forms (data vs. newtype, record-style vs. normal constructors, and so on). It's boring and repetitive to write and complicated enough to not be mechanical. The reform proposal addresses some of this (making quotes more widely applicable).
The stage restriction is hell. Not being able to splice functions defined in the same module is the smaller part of it: the other consequence is that if you have a top-level splice, everything after it in the module will be out of scope to anything before it. Other languages with this property (C, C++) make it workable by allowing you to forward declare things, but Haskell doesn't. If you need cyclic references between spliced declarations or their dependencies and dependents, you're usually just screwed.
It's undisciplined. What I mean by this is that most of the time when you express an abstraction, there is some kind of principle or concept behind that abstraction. For many abstractions, the principle behind them can be expressed in their types. For type classes, you can often formulate laws which instances should obey and clients can assume. If you use GHC's new generics feature to abstract the form of an instance declaration over any datatype (within bounds), you get to say "for sum types, it works like this, for product types, it works like that". Template Haskell, on the other hand, is just macros. It's not abstraction at the level of ideas, but abstraction at the level of ASTs, which is better, but only modestly, than abstraction at the level of plain text.*
It ties you to GHC. In theory another compiler could implement it, but in practice I doubt this will ever happen. (This is in contrast to various type system extensions which, though they might only be implemented by GHC at the moment, I could easily imagine being adopted by other compilers down the road and eventually standardized.)
The API isn't stable. When new language features are added to GHC and the template-haskell package is updated to support them, this often involves backwards-incompatible changes to the TH datatypes. If you want your TH code to be compatible with more than just one version of GHC you need to be very careful and possibly use CPP.
There's a general principle that you should use the right tool for the job, and the smallest one that will suffice, and by that measure Template Haskell is a very heavy tool indeed. If there's a way to do it that's not Template Haskell, it's generally preferable.
The advantage of Template Haskell is that you can do things with it that you couldn't do any other way, and it's a big one. Most of the time the things TH is used for could otherwise only be done if they were implemented directly as compiler features. TH is extremely beneficial to have both because it lets you do these things, and because it lets you prototype potential compiler extensions in a much more lightweight and reusable way (see the various lens packages, for example).
To summarize why I think there are negative feelings towards Template Haskell: it solves a lot of problems, but for any given problem that it solves, it feels like there should be a better, more elegant, more disciplined solution suited to that problem, one which doesn't solve the problem by automatically generating the boilerplate, but by removing the need for the boilerplate.
* Though I often feel that CPP has a better power-to-weight ratio for those problems that it can solve.
EDIT 23-04-14: What I was frequently trying to get at above, and have only recently gotten at exactly, is that there's an important distinction between abstraction and deduplication. Proper abstraction often results in deduplication as a side effect, and duplication is often a telltale sign of inadequate abstraction, but that's not why abstraction is valuable. Proper abstraction is what makes code correct, comprehensible, and maintainable. Deduplication only makes it shorter. Template Haskell, like macros in general, is a tool for deduplication.
I'd like to address a few of the points dflemstr brings up.
I don't find the fact that you can't typecheck TH to be that worrying. Why? Because even if there is an error, it will still be caught at compile time. I'm not sure if this strengthens my argument, but it is similar in spirit to the errors you receive when using templates in C++. I think TH's errors are more understandable than C++'s, though, as you'll get a pretty-printed version of the generated code.
If a TH expression / quasi-quoter does something that's so advanced that tricky corners can hide, then perhaps it's ill-advised?
I break this rule quite a bit with quasi-quoters I've been working on lately (using haskell-src-exts / meta) - https://github.com/mgsloan/quasi-extras/tree/master/examples . I know this introduces some bugs such as not being able to splice in the generalized list comprehensions. However, I think that there's a good chance that some of the ideas in http://hackage.haskell.org/trac/ghc/blog/Template%20Haskell%20Proposal will end up in the compiler. Until then, the libraries for parsing Haskell to TH trees are a nearly perfect approximation.
Regarding compilation speed / dependencies, we can use the "zeroth" package to inline the generated code. This is at least nice for the users of a given library, but we can't do much better for the case of editing the library. Can TH dependencies bloat generated binaries? I thought it left out everything that's not referenced by the compiled code.
The staging restriction / splitting of compilation steps of the Haskell module does suck.
RE Opacity: This is the same for any library function you call. You have no control over what Data.List.groupBy will do. You just have a reasonable "guarantee" / convention that the version numbers tell you something about the compatibility. The difference is merely one of when the change can surface.
This is where using zeroth pays off - you're already versioning the generated files - so you'll always know when the form of the generated code has changed. Looking at the diffs might be a bit gnarly, though, for large amounts of generated code, so that's one place where a better developer interface would be handy.
RE Monolithism: You can certainly post-process the results of a TH expression, using your own compile-time code. It wouldn't be very much code to filter on top-level declaration type / name. Heck, you could imagine writing a function that does this generically. For modifying / de-monolithisizing quasiquoters, you can pattern match on "QuasiQuoter" and extract out the transformations used, or make a new one in terms of the old.
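A sketch of that post-processing idea (onlyDataDecs is hypothetical; DataD is the real TH constructor for data declarations):

import Language.Haskell.TH

-- Run another generator and keep only its data declarations,
-- discarding anything else it wanted to emit.
onlyDataDecs :: Q [Dec] -> Q [Dec]
onlyDataDecs gen = do
  decs <- gen
  return [ d | d@(DataD {}) <- decs ]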
This answer is in response to the issues brought up by illissius, point by point:
It's ugly to use. $(fooBar ''Asdf) just does not look nice. Superficial, sure, but it contributes.
I agree. I feel like $( ) was chosen to look like it was part of the language - using the familiar symbol palette of Haskell. However, that's exactly what you /don't/ want in the symbols used for your macro splicing. They definitely blend in too much, and this cosmetic aspect is quite important. I like the look of {{ }} for splices, because they are quite visually distinct.
It's even uglier to write. Quoting works sometimes, but a lot of the time you have to do manual AST grafting and plumbing. The API is big and unwieldy, there's always a lot of cases you don't care about but still need to dispatch, and the cases you do care about tend to be present in multiple similar but not identical forms (data vs. newtype, record-style vs. normal constructors, and so on). It's boring and repetitive to write and complicated enough to not be mechanical. The reform proposal addresses some of this (making quotes more widely applicable).
I also agree with this, however, as some of the comments in "New Directions for TH" observe, the lack of good out-of-the-box AST quoting is not a critical flaw. In this WIP package, I seek to address these problems in library form: https://github.com/mgsloan/quasi-extras . So far I allow splicing in a few more places than usual and can pattern match on ASTs.
The stage restriction is hell. Not being able to splice functions defined in the same module is the smaller part of it: the other consequence is that if you have a top-level splice, everything after it in the module will be out of scope to anything before it. Other languages with this property (C, C++) make it workable by allowing you to forward declare things, but Haskell doesn't. If you need cyclic references between spliced declarations or their dependencies and dependents, you're usually just screwed.
I've run into the issue of cyclic TH definitions being impossible before... It's quite annoying. There is a solution, but it's ugly - wrap the things involved in the cyclic dependency in a TH expression that combines all of the generated declarations. One of these declarations generators could just be a quasi-quoter that accepts Haskell code.
It's unprincipled. What I mean by this is that most of the time when you express an abstraction, there is some kind of principle or concept behind that abstraction. For many abstractions, the principle behind them can be expressed in their types. When you define a type class, you can often formulate laws which instances should obey and clients can assume. If you use GHC's new generics feature to abstract the form of an instance declaration over any datatype (within bounds), you get to say "for sum types, it works like this, for product types, it works like that". But Template Haskell is just dumb macros. It's not abstraction at the level of ideas, but abstraction at the level of ASTs, which is better, but only modestly, than abstraction at the level of plain text.
It's only unprincipled if you do unprincipled things with it. The only difference is that with the compiler implemented mechanisms for abstraction, you have more confidence that the abstraction isn't leaky. Perhaps democratizing language design does sound a bit scary! Creators of TH libraries need to document well and clearly define the meaning and results of the tools they provide. A good example of principled TH is the derive package: http://hackage.haskell.org/package/derive - it uses a DSL such that the example of many of the derivations /specifies/ the actual derivation.
It ties you to GHC. In theory another compiler could implement it, but in practice I doubt this will ever happen. (This is in contrast to various type system extensions which, though they might only be implemented by GHC at the moment, I could easily imagine being adopted by other compilers down the road and eventually standardized.)
That's a pretty good point - the TH API is pretty big and clunky. Re-implementing it seems like it could be tough. However, there are really only a few ways to slice the problem of representing Haskell ASTs. I imagine that copying the TH ADTs and writing a converter to the internal AST representation would get you a good deal of the way there. This would be equivalent to the (not insignificant) effort of creating haskell-src-meta. It could also simply be re-implemented by pretty-printing the TH AST and using the compiler's internal parser.
While I could be wrong, I don't see TH as being that complicated of a compiler extension, from an implementation perspective. This is actually one of the benefits of "keeping it simple" and not having the fundamental layer be some theoretically appealing, statically verifiable templating system.
The API isn't stable. When new language features are added to GHC and the template-haskell package is updated to support them, this often involves backwards-incompatible changes to the TH datatypes. If you want your TH code to be compatible with more than just one version of GHC you need to be very careful and possibly use CPP.
This is also a good point, but somewhat dramatized. While there have been API additions lately, they haven't caused extensive breakage. Also, I think that with the superior AST quoting I mentioned earlier, the API that actually needs to be used could be very substantially reduced. If construction / matching need no distinct functions, and are instead expressed as literals, then most of the API disappears. Moreover, the code you write would port more easily to AST representations for languages similar to Haskell.
In summary, I think that TH is a powerful, semi-neglected tool. Less hate could lead to a more lively ecosystem of libraries, encouraging the implementation of more language-feature prototypes. It's been observed that TH is an overpowered tool that can let you /do/ almost anything. Anarchy! Well, it's my opinion that this power can allow you to overcome most of its limitations and construct systems capable of quite principled meta-programming approaches. It's worth using ugly hacks to simulate the "proper" implementation, as this way the design of the "proper" implementation will gradually become clear.
In my personal ideal version of nirvana, much of the language would actually move out of the compiler, into libraries of these variety. The fact that the features are implemented as libraries does not heavily influence their ability to faithfully abstract.
What's the typical Haskell answer to boilerplate code? Abstraction. What're our favorite abstractions? Functions and typeclasses!
Typeclasses let us define a set of methods that can then be used in all manner of functions generic over that class. However, other than this, the only way classes help avoid boilerplate is by offering "default definitions". Now here is an example of an unprincipled feature!
Minimal binding sets are not declarable / compiler checkable. This could lead to inadvertent definitions that yield bottom due to mutual recursion.
Despite the great convenience and power they would yield, you cannot specify superclass defaults, due to orphan instances (http://lukepalmer.wordpress.com/2009/01/25/a-world-without-orphans/). These would let us fix the numeric hierarchy gracefully!
Going after TH-like capabilities for method defaults led to http://www.haskell.org/haskellwiki/GHC.Generics . While this is cool stuff, my one experience debugging code using these generics was nigh impossible, due to the size of the type induced for an ADT as complicated as an AST: https://github.com/mgsloan/th-extra/commit/d7784d95d396eb3abdb409a24360beb03731c88c
In other words, this went after the features provided by TH, but it had to lift an entire domain of the language, the construction language, into a type system representation. While I can see it working well for common problems, for complex ones it seems prone to yielding a pile of symbols far more terrifying than TH hackery.
TH gives you value-level compile-time computation of the output code, whereas generics forces you to lift the pattern matching / recursion part of the code into the type system. While this does restrict the user in a few fairly useful ways, I don't think the complexity is worth it.
I think that the rejection of TH and Lisp-like metaprogramming led to the preference for things like method defaults instead of more flexible, macro-expansion-like declarations of instances. The discipline of avoiding things that could lead to unforeseen results is wise; however, we should not ignore that Haskell's capable type system allows for more reliable metaprogramming than in many other environments (by checking the generated code).
One rather pragmatic problem with Template Haskell is that it only works when GHC's bytecode interpreter is available, which is not the case on all architectures. So if your program uses Template Haskell or relies on libraries that use it, it will not run on machines with an ARM, MIPS, S390 or PowerPC CPU.
This is relevant in practice: git-annex is a tool written in Haskell that makes sense to run on machines that deal with storage, and such machines often have non-i386 CPUs. Personally, I run git-annex on an NSLU2 (32 MB of RAM, 266 MHz CPU; did you know Haskell works fine on such hardware?). If it used Template Haskell, this would not be possible.
(The situation about GHC on ARM is improving these days a lot and I think 7.4.2 even works, but the point still stands).
Why is TH bad? For me, it comes down to this:
If you need to produce so much repetitive code that you find yourself trying to use TH to auto-generate it, you're doing it wrong!
Think about it. Half the appeal of Haskell is that its high-level design allows you to avoid huge amounts of useless boilerplate code that you have to write in other languages. If you need compile-time code generation, you're basically saying that either your language or your application design has failed you. And we programmers don't like to fail.
Sometimes, of course, it's necessary. But sometimes you can avoid needing TH by just being a bit more clever with your designs.
(The other thing is that TH is quite low-level. There's no grand high-level design; a lot of GHC's internal implementation details are exposed. And that makes the API prone to change...)

Haskell without types

Is it possible to disable or work around the type system in Haskell? There are situations where it is convenient to have everything untyped as in Forth and BCPL or monotyped as in Mathematica. I'm thinking along the lines of declaring everything as the same type or of disabling type checking altogether.
Edit: In conformance with SO principles, this is a narrow technical question, not a request for discussion of the relative merits of different programming approaches. To rephrase the question, "Can Haskell be used in a way such that avoidance of type conflicts is entirely the responsibility of the programmer?"
Also look at Data.Dynamic, which allows you to have dynamically typed values in parts of your code without disabling type checking throughout.
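A minimal sketch of Data.Dynamic in use (toDyn and fromDynamic are the library's real entry points):

import Data.Dynamic (Dynamic, toDyn, fromDynamic)

-- A heterogeneous list: the static types are erased on the way in...
bag :: [Dynamic]
bag = [toDyn (42 :: Int), toDyn "hello", toDyn True]

-- ...and recovered (or not) at runtime.
firstInt :: [Dynamic] -> Maybe Int
firstInt ds = case [ i | d <- ds, Just i <- [fromDynamic d] ] of
  (i:_) -> Just i
  []    -> Nothing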
GHC 7.6 (not released yet) has a similar feature, -fdefer-type-errors:
http://hackage.haskell.org/trac/ghc/wiki/DeferErrorsToRuntime
It will defer all type errors until runtime. It's not really untyped but it allows almost as much freedom.
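A minimal sketch of the flag in use (OPTIONS_GHC is the standard way to enable it per module):

{-# OPTIONS_GHC -fdefer-type-errors #-}

broken :: Int
broken = "not an int"  -- compiles with a warning instead of an error

main :: IO ()
main = putStrLn "fine, as long as broken is never evaluated"
-- Forcing broken (e.g. print broken) would raise the deferred
-- type error at runtime.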
Even with -fdefer-type-errors one wouldn't be avoiding the type system, nor does it really allow type independence. The point of the flag is to allow code with type errors to compile, so long as the erroneous code is never actually executed. In particular, any code with a type error, when actually evaluated, will still fail.
While the prospect of untyped functions in Haskell might be tempting, it's worth noting that the type system is really at the heart of the language. The code proves its own functionality in compilation, and the rigidity of the type system prevents a large number of errors.
Perhaps if you gave a specific example of the problem you're having, the community could address it. Interconverting between number types is something that I've asked about before, and there are a number of good tricks.
Perhaps -fdefer-type-errors combined with https://hackage.haskell.org/package/base-4.14.1.0/docs/Unsafe-Coerce.html offers what you need.

Basic Concepts of Language Type Systems

Could someone please explain clearly and succinctly the concepts of language type systems?
I've read a post or two here on type systems, but have trouble finding one that answers all my questions below.
I've heard/read that there are 3 type categorizations: dynamic vs static, strong vs weak, safe vs unsafe.
Some questions:
Are there any others?
What do each of these mean?
If a language allows you to change the type of a variable in runtime (e.g. a variable that used to store an int is later used to store a string), what category does that fall in?
How does Python fit into each of these categories?
Is there anything else I should know about type systems?
Thanks very much!
1) Apparently, there are others: http://en.wikipedia.org/wiki/Type_system
2)
Dynamic => Type checking is done during runtime (program execution) e.g. Python.
Static (as opposed to Dynamic) => Type checking is done during compile time e.g. C++
Strong => Once the type system decides that a particular object is of a type, it doesn't allow it to be used as another type, e.g. Python.
Weak (as opposed to Strong) => The type system allows an object's type to change, e.g. Perl lets you read a number as a string, then use it again as a number.
Type safety => I can best describe this with a C statement like:
x = (int *) malloc (...);
malloc returns a (void *) and we simply type-cast it to (int *). At compile time there is no check that the pointer returned by malloc will actually point to storage the size of an integer => some C operations aren't type safe.
I am told that some 'purely functional' languages are inherently type safe, but I do not know any of these languages. I think Standard ML or Haskell would be type safe.
3) "If a language allows you to change the type of a variable in runtime (e.g. a variable that used to store an int is later used to store a string), what category does that fall in?":
This may be dynamic - variables are untyped, values may carry implicit or explicit type information; alternatively, the type system may be able to cope with variables that change type, and be a static type system.
4) Python: It's dynamically and strongly typed. Type safety is something I don't know Python (or type safety itself) well enough to say anything about.
5) "Is there anything else I should know about type systems?": Maybe read the book @BasileStarynkevitch suggests?
You are asking a lot here :) Type systems are a dedicated field of computer science!
Starting from the beginning, "a type system is a method for proving the absence of certain program behaviors" (see B. Pierce's Types and Programming Languages, also referenced in the other answer). Programs that pass type checking are a subset of what would be valid programs. For instance, the method
int answer() {
    if (true) { return 42; } else { return "wrong"; }
}
would actually behave well at run time. The else branch is never executed, and answer always returns 42. The static type system is a conservative analysis that will reject this program, because it cannot prove the absence of a type error, that is, that "wrong" is never returned.
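The same program rendered in Haskell, for concreteness; GHC rejects it even though the else branch is dead:

answer :: Int
answer = if True then 42 else "wrong"
-- Rejected at compile time: both branches of an if must have the
-- same type, even though the else branch can never be taken.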
Of course, you could improve the type system to actually detect that the else branch never executes. You want to improve the type system to reject as few programs as possible; this is why type systems have been enriched over the years to support more and more refinement (e.g. generics, etc.).
The point of a type system is to prove the absence of type errors. In practice, type systems support operations like downcasting that inherently imply runtime type checks and might lead to type errors. Again, the goal is to make the type system as flexible as possible, so that we don't need to resort to these operations that weaken type safety (e.g. generics).
You can read chapter 1 of the aforementioned book for a really nice introduction. For the rest, I will refer you to What To Know Before Debating Type Systems, which is an awesome blog post about the basic concepts.
Is there anything else I should know about type systems?
Oh, yes! :)
Happy immersion in the world of type systems!
I suggest reading B. Pierce's Types and Programming Languages book. And I also suggest learning a bit of a statically typed language with type inference, like OCaml or Haskell.
A type system is a mechanism which controls the functions that access values. Compile-time checking is one aspect of this, which rejects programs during compilation if an attempt is made to use a function on values it is not designed to handle. However, another aspect is the converse: the selection of functions to handle some values, for example overloading. Another example is specialisation of polymorphic functions (e.g. templates in C++). Inference and deduction are other aspects, where the types of functions are deduced from usage rather than specified by the programmer.
Parts of the checking and selection can be deferred until run time. Dispatch of methods based on variant tags, or by indirection or specialised tables as for C++ virtual functions or Haskell typeclass dictionaries, are two examples provided even in extremely strongly typed languages.
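A sketch of such deferred selection in Haskell, using an existential box that carries a typeclass dictionary (the names are illustrative):

{-# LANGUAGE ExistentialQuantification #-}

class Shape a where
  area :: a -> Double

data Circle = Circle Double
instance Shape Circle where
  area (Circle r) = pi * r * r

data Square = Square Double
instance Shape Square where
  area (Square s) = s * s

-- The box erases the concrete type; the appropriate area method is
-- selected at run time through the captured dictionary.
data AnyShape = forall a. Shape a => AnyShape a

totalArea :: [AnyShape] -> Double
totalArea shapes = sum [ area s | AnyShape s <- shapes ]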
The key concept of type systems is called soundness. A type system is sound if it guarantees no value can be used by an inappropriate function. Roughly speaking an unsound type system has "holes" and is useless. The type system of ISO C89 is sound if you remove casts (and void* conversions), and unsound if you allow them. The type system of ISO C++ is unsound.
A second vital concept of type systems is called expressiveness. Sound type systems for polymorphic programming prevent programmers from writing valid code: they're universally too restrictive (and, I believe, inescapably so). Making type systems more expressive, so they allow a wider set of valid programs, is the key academic challenge.
Another concept of typing is strength. A strong type system can find more errors earlier. For example many languages have type systems too weak to detect array bounds violations using the type system and have to resort to run time checks. Somehow strength is the opposite of expressiveness: we want to allow more valid programs (expressiveness) but also catch even more invalid ones (strength).
Here's a key question: explain why OO typing is too weak to permit OO to be used as a general development paradigm. [Hint: OO cannot handle relations]

Does case sensitivity have anything to do with strongly typed languages (or loosely typed languages)?

(I admit this may be a n00b question - I know very little about CS theory, mostly a hands-on/hobby sort.)
I was googling strongly-typed language for the official definition, and one of the top links I found was from Yahoo Answers, which suggested that case sensitivity was part of whether a language is loosely/strongly typed.
I had always thought the simple answer to the difference between a strongly typed and weakly typed language is that the former requires explicit type declarations, while the latter is more open, even "dynamic".
The two S/O threads (here and here) I found so far seem to suggest that (more or less), but they don't mention anything about case sensitivity. Is there a relation at all between case sensitivity and strong/weak typing?
A couple of clarifications:
Case sensitivity has nothing to do with strong vs. weak typing, static vs. dynamic typing or any other property of the type system. I don't know why the answer on yahoo answers has gotten its one upvote, but it's completely wrong. Just ignore it.
Strong typing isn't a well-defined term, but it is often used to refer to languages with few implicit type conversions, i.e. languages where it is an error to perform operations on types that do not support that operation.
As an example, multiplying the strings "foo" and "bar" gives 0 as the result in Perl, while it causes a type error in Ruby, Python, Java, Haskell, ML and many other languages. Thus those languages are more strongly typed than Perl.
Strong typing is also sometimes used as a synonym for static typing.
A statically typed language is a language in which the types of variables, functions and expressions are known at compile time (or before runtime anyway - a statically typed language need not be compiled per se, though in practice it usually is). This means that if a statically typed program contains a type error, it will not run. If a dynamically typed program contains a type error it will run up to the point where the error happens and then crash.
Whether a language requires type annotations is (somewhat) independent of whether its type system is strong or weak, or static or dynamic. In theory, a dynamically typed language could require (or at least allow) type annotations and then throw runtime errors when those annotations are violated (though I don't know of any dynamically typed language that actually does this).
More importantly there are many statically and strongly typed languages (e.g. Haskell, ML) that don't require type annotations, but instead use type inference algorithms to infer the types.
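A minimal Haskell sketch of that (nothing here is annotated by the programmer; the comments show what GHC infers):

isFoo = True                  -- inferred: Bool
addOne x = x + 1              -- inferred: Num a => a -> a
greet name = "hi, " ++ name   -- inferred: [Char] -> [Char]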
In theory, case sensitivity is completely unrelated to type strictness. Case sensitivity is about whether the identifiers foo, FOO, and fOo refer to the same variable, function, or what-have-you. Type strictness is about whether variables have types or just values do, how easy it is to convert among types, and so on.
In practice, there might be a correlation between case sensitivity and type strictness, but I can't think of enough case-insensitive languages right now to make an assessment. My impression is that most languages commonly used today are case sensitive — possibly because C was case sensitive and very influential, possibly because it was the only way to force people to stop PROGRAMMING IN ALL CAPS after a couple decades of FORTRAN, COBOL, and BASIC.
No - they're not connected. Strongly typed languages force you to specify the type of data that a variable may hold, such as a real number, an integer, a textual string, or some programmer-defined object. Then you can't accidentally assign another type of data to that variable unless it is implicitly convertible: an example is that you can generally put an integer into a real number (i.e. double x = 3.14; x = 3; is OK, but int x = 3; x = 3.14; might not be, depending on how strongly typed the language is). Weakly typed languages just store whatever they're asked to, without doing these sanity checks. In strongly typed languages like C++, you can still create a type that can store data of any of a specific set of types (e.g. C++'s boost::variant), but such types are sometimes a bit more limited in what you can do and how convenient they are to use.
Case insensitivity means that the uppercase and lowercase versions of the same letter are considered equivalent for some purposes, normally in a string comparison or regular expression match. It is unusual, but not unheard of, for modern computer languages to ignore the case of letters in variable names (identifiers).
