What does type level programming mean at runtime? - haskell

I am very new to Haskell, so sorry if this is a basic question, or a question founded on shaky understanding
Type level programming is a fascinating idea to me. I think I get the basic premise, but I feel like there is another side to it that is fuzzy to me. I get that the idea is to bring logic and computation into the compiletime instead of runtime, using types. This way you turn what is normally runtime logic/state/data into static logic, e.g. the size of collections.
So I get that for example you can have type level natural numbers, and do type level arithmetic on those natural numbers, and all this calculation and type safety is going on at compile time.
But what does such arithmetic imply at runtime? Especially since Haskell has full type erasure. So for example
If I concatenate two type level lists, then does the type level safety imply something about the behavior or performance of that concatenation at runtime? Or does the type level programming aspect only have meaning at compile time, when the programmer is grappling the code and putting things together?
Or if I have two type level numbers, and then multiply them, what does that mean at runtime? If these operations on large numbers are slow at compile time, are they instantaneous at runtime?
Or if we implemented type level RSA and then use it, what does that even mean at runtime?
Is it purely a compiletime safety/coherence tool? or does type level programming buy us anything for the runtime too? Is the logic and arithmetic 'paid for at compile time' or merely 'assured at compile time' (if that even makes sense)?

As you rightly say, Haskell [without weird extensions] has full type erasure. So that means anything computed purely at the type level is erased at runtime.
However, to do useful stuff, you connect the type-level stuff with your value-level stuff to provide useful properties.
Suppose, for example, you want to write a function that takes a pair of lists, treats them as mathematical vectors, and performs a vector dot-product with them. Now the dot-product is only defined on pairs of vectors of the same size. So if the size of the vectors doesn't match, you can't return a sensible answer.
Without type-level programming, your options are:
Require that the caller always supplies vectors of the same dimension, and cheerfully return gibberish if that requirement is not met. (I.e., ignore the problem.)
Perform an explicit check at run-time, and throw an exception or return Nothing or similar if the dimension don't match.
With type-level programming, you can make it so that if the dimensions don't match, the code does not compile! So that means at run-time you don't need to care about mismatched dimension, because... well, if your code is running, then the dimension cannot be mismatched.
The types have all been erased by this point, but you are still guaranteed that your code cannot crash / return gibberish, because the compiler has checked that that cannot happen.
It's really the same as the ordinary checks the compiler does to make sure you don't try to multiply an integer by a string or something. The types are all erased before runtime, and yet the code does not crash.
Of course, to do a dot-product, we merely have to check that two numbers are equal. We don't need any arithmetic yet. But it should be clear that to check whether the dimensions of our vectors match, we need to know the dimensions of our vectors. And that means that any operations that change the dimension of our vectors needs to do compile-time calculations, so the compiler can know the result size and check it satisfies the requirements.
You can also do more elaborate stuff. Somewhere I saw a library that lets you define a client/server communications protocol, but because it encodes the protocol into ludicrously complicated type signatures [which the compiler automatically infers], it can statically prove that the client and server implement exactly the same protocol (i.e., no bugs with the server not handling one of the messages the client can send). The types get erased at runtime, but we still know the wire protocol can't go wrong.

Related

How do you approach creating a complete new datatype on the "bit-level"?

I would like to create a new data type in Rust on the "bit-level".
For example, a quadruple-precision float. I could create a structure that has two double-precision floats and arbitrarily increase the precision by splitting the quad into two doubles, but I don't want to do that (that's what I mean by on the "bit-level").
I thought about using a u8-array or a bool-array but in both cases, I waste 7 bits of memory (because also bool is a byte large). I know there are several crates that implement something like bit-arrays or bit-vectors, but looking through their source code didn't help me to understand their implementation.
How would I create such a bit-array without wasting memory, and is this the way I would want to choose when implementing something like a quad-precision type?
I don't know how to implement new data types that don't use the basic types or are structures that combine the basic types, and I haven't been able to find a solution on the internet yet; maybe I'm not searching with the right keywords.
The question you are asking has no direct answer: Just like any other programming language, Rust has a basic set of rules for type layouts. This is due to the fact that (most) real-world CPUs can't address individual bits, need certain alignments when referencing memory, have rules regarding how pointer arithmetic works etc. etc.
For instance, if you create a type of just two bits, you'll still need an 8-bit byte to represent that type, because there is simply no way to address two individual bits on most CPU's opcodes; there is also no way to take the address of such a type because addressing works at least on the byte-level. More useful information regarding this can be found here, section 2, The Anatomy of a Type. Be aware that the non-wasting bit-level type you are thinking about needs to fulfill all the rules mentioned there.
It's a perfectly reasonable approach to represent what you want to do e.g. either as a single, wrapped u128 and implement all arithmetic on top of that type. Another, more generic, approach would be to use a Vec<u8>. You'll always do a relatively large amount of bit-masking, indirecting and such.
Having a look at rust_decimal or similar crates might also be a good idea.

Is Haskell a strongly typed programming language?

Is Haskell strongly typed? I.e. is it possible to change the type of a variable after you assigned one? I can't seem to find the answer on the internet.
Static — types are known at compile time. Java and Haskell have static typing. Also C/C++, C#, Go, Scala, Rust, Kotlin, Pascal to list a few more.
A statically typed language might or might not have type inference. Java almost completely lacks type inference (but it's very slowly changing just a little bit); Haskell has full type inference (except with certain very advanced extensions).
(Type inference is when you only have to declare a minimal amount of types by hand, e.g. var isFoo = true and var person = new Person(), instead of bool isFoo = ... and Person person = ....)
Dynamic — Python, JavaScript, Ruby, PHP, Clojure (and Lisps in general), Prolog, Erlang, Groovy etc. Can also be called "unityped"; dynamic typing can be "emulated" in a static setting, but the reverse is not true except by using external static analysis tools. Some languages make it possible to mix dynamic and static (see gradual typing, e.g. https://typedclojure.org/).
Some languages enable static typing for one or more modules, applied at import time, for example: Python+Mypy, Typed Clojure, JavaScript+Flow, PHP+Hack to name a few.
Strong — values that are intended to be treated as Cat always are; trying to treat them like a Dog will cause a loud meeewww... I mean error.
Weak — this effectively boils down to 2 similar but distinct things: type coercion (e.g. "5"+3 equals 8 in PHP — or does it!) and memory reinterpretation (e.g. (int) someCharValue or (bool) somePtr in C, and C++ as well, but C++ wants you to explicitly say reinterpret_cast). So there's really coercion-weak and reinterpretation-weak, and different languages are weak in one or both of these ways.
Interestingly, note that coercion is implicit by nature and memory reinterpretation is explicit (except in Assembly) — so weak typing consists of an implicit and an explicit behavior. Maybe that's even more of a reason to refer to 2 distinct subcategories under weak typing.
There are languages with all 4 possible combinations, and variations/gradations thereof.
Haskell is static+strong; of course it has unsafeCoerce so it can be static+a bit reinterpret-weak at times, but unsafeCoerce is very much frowned upon except in extreme situations where you are sure about something being the case but just can't seem to persuade the compiler without going all the way back and retelling the entire story in a different way.
C is static+weak because all memory can freely be reinterpreted as something it originally was not meant to contain, hence weak. But all of those reinterpretations are kept track of by the type checker, so still fully static too. But C does not do implicit coercions, so it's only reinterpret-weak.
Python is dynamic+almost entirely strong — there are no types known on any given line of code prior to reaching that line during execution, however values that live at runtime do have types associated with them and it's impossible to reinterpret memory. Implicit coercions are also kept to a meaningful minimum, so one might say Python is 99.9% strong and 0.01% coercion-weak.
PHP and JavaScript are dynamic+mostly weak — dynamic, in that nothing has type until you execute and introspect its contents, and also weak in that coercions happen all the time and with things you'd never really expect to be coerced, unless you are only calling methods and functions and not using built-in operations. These coercions are a source of a lot of humor on the internet. There are no memory reinterpretations so PHP and JS are coercion-weak.
Furthermore, some people like to think that static typing is about variables having type, and strong typing is about values having type — this is a very useful way to go about understanding the full picture, but it's not quite true: some dynamically typed languages also allow variables/parameters to be annotated with types/constraints that are enforced at runtime.
In static typing, it's expressions that have a type; the fact of variables having type is only a consequence of variables being used as a means to glue bigger expressions together from smaller ones, so it's not variables per se that have types.
Similarly, in dynamic typing, it's not the variables that lack statically known type — it's all expressions! Variables lacking type is merely a consequence of the expressions they store lacking type.
One final illustration
In dynamic typing, all the cats, dogs and even elephants (in fact entire zoos!) are packaged up in identically sized boxes.
In static typing these boxes look different and have stickers on them saying what's inside.
Some people like it because they can just use a single box form factor and don't have to put any labels on the boxes — it's only the arrangement of boxes with regards to each other that implicitly (and hopefully) provides type sanity.
Some people also like it because it allows them to do all sorts of tricks with tigers temporarily being transported in boxes that smell like lions, and bears put in the same array of interconnected boxes as wolves or deer.
In such label-free setting of transport boxes, all the possible logicistics scenarios need to be played or simulated in order to detect misalignment in the implicit arrangement, like in a stage performance. No reliable guarantees can be given based on reasoning only, generally speaking. (ad-hoc test cases that need for the entire system to be started up for any partial conclusions to be obtained of its soundness)
With labels and explicit rules on how to deal with boxes of various labels, automated/mechanized logical reasoning can be used to draw up conclusions on what the logistics system won't do or will do for sure (static verification, formal proof, or at least pseudo-proof like QuickCheck), Some aspects of the logistics still need to be verified with trial runs, such as whether the logistics team even got the client right. (integration testing, acceptance testing, end user sanity checks).
Moreover, in weak typing dogs can be sliced up and reassembled as frankenstein cats. Whether they like it or not, and whether the result is ugly or not. (weak typing)
But if you add labels to the boxes, it still matters that Frankenstein cats be put in cat boxes. (static+weak typing)
In strong typing, while you can put a cat in the box of a dog, but you can only keep pretending it's a dog until you try to humiliate it by feeding it something only dogs would eat — if that happens, it will scream out loud, but until that time, if you're in dynamic typing, it will silently accept its place (in a static world it would refuse to be put in a dog's box before you can say "kitty").
You seem to mix up dynamic/static and weak/strong typing.
Dynamic or static typing is about whether the type of a variable can be changed during execution.
Weak or strong typing is about being able to predict type errors just from function signatures.
Haskell is both statically and strongly typed.
However, there is no such thing as variable in Haskell so talking about dynamic or static typing makes no sense since every identifier assigned with a value cannot be changed at execution.
EDIT: But like goldenbull said, those typing notions are not clearly defined.
It is strongly typed. See section 2.3 here: Why Haskell matters
I think you are talking about two different things.
First, haskell, and most functional programming (FP) languages, do NOT have the concept "variable". Instead, they use the concept "name" and "value", they just "bind" a value to a name. Once the value is bound, you can not bind another value to the same name, this is the key feature of FP.
Strong typing is another topic. Yes, haskell is strongly typed, and so are most FP languages. Strong typing gives FP the ability of "type inference" which is powerful to eliminate hidden bugs in compile time and help reduce the size of the source code.
Maybe you are comparing haskell with python? Python is also strongly typed. The difference between haskell and python is "static typed" and "dynamic typed". The actual meaning of term "Strong type" and "Weak Type" are ambiguous and fuzzy. That is another long story...

Basic Concepts of Language Type Systems

Could someone please explain clearly and succinctly the concepts of language type systems?
I've read a post or two here on type systems, but have trouble finding one that answers all my questions below.
I've heard/read that there are 3 type categorizations: dynamic vs static, strong vs weak, safe vs unsafe.
Some questions:
Are there any others?
What do each of these mean?
If a language allows you to change the type of a variable in runtime (e.g. a variable that used to store an int is later used to store a string), what category does that fall in?
How does Python fit into each of these categories?
Is there anything else I should know about type systems?
Thanks very much!
1) Apparently, there are others: http://en.wikipedia.org/wiki/Type_system
2)
Dynamic => Type checking is done during runtime (program execution) e.g. Python.
Static (as opposed to Dynamic) => Type checking is done during compile time e.g. C++
Strong => Once the type system decides that a particular object is of a type, it doesn't allow it to be used as another type. e.g. Python
Weak (as opposed to Strong) => The type system allows objects types to change. e.g. perl lets you read a number as a string, then use it again as a number
Type safety => I can only best describe with a 'C' statement like:
x = (int *) malloc (...);
malloc returns a (void *) and we simply type-cast it to (int *). At compile time there is no check that the pointer returned by the function malloc will actually be the size of an integer => Some C operations aren't type safe.
I am told that some 'purely functional' languages are inherently type safe, but I do not know any of these languages. I think Standard ML or Haskell would be type safe.
3) "If a language allows you to change the type of a variable in runtime (e.g. a variable that used to store an int is later used to store a string), what category does that fall in?":
This may be dynamic - variables are untyped, values may carry implicit or explicit type information; alternatively, the type system may be able to cope with variables that change type, and be a static type system.
4) Python: It's dynamically and strongly typed. Type safety is something I don't know python (and type safety itself) enough to say anything about.
5) "Is there anything else I should know about type systems?": Maybe read the book #BasileStarynkevitch suggests?
You are asking a lot here :) Type system is a dedicated field of computer science!
Starting from the begining, "a type system is method for proving the absence of certain program behavior" (See B.Pierce's Types and Programming Languages, also referred in the other answer). Programs that pass the type checking is a subset of what would be valid programs. For instance, the method
int answer() {
if(true) { return 42; } else { return "wrong"; }
}
would actually behave well at run-time. The else branch is never executed, and the answer always return 42. The static type system is a conservative analysis that will reject this program, because it can not prove the absence of a type error, that is, that "wrong" is never returned.
Of course, you could improve the type system to actually detect that the else branch never happens. You want to improve the type system to reject as few program as possible. This is why type system have been enriched over the years to support more and more refinement (e.g. generic, etc.)
The point of a type system is to prove the absence of type errors. In practice, they support operations like downcasting that inherently imply run-time type checks, and might lead to type errors. Again, the goal is to make the type system as flexible as possible, so that we don't need to resort to these operations that weaken type safety (e.g. generic).
You can read chapter 1 of the aforementionned book for a really nice introduction. For the rest, I will refer you to What To Know Before Debating Type Systems, which is awesome blog post about the basic concepts.
Is there anything else I should know about type systems?
Oh, yes! :)
Happy immersion in the world of type systems!
I suggest reading B.Pierce's Types and Programming Languages book. And I also suggest learning a bit of a statically-typed, with type inference, language like Ocaml or Haskell.
A type system is a mechanism which controls the functions which access values. Compile time checking is one aspect of this, which rejects programs during compilation if an attempt is made to use a function on values it is not designed to handle. However another aspect is the converse, the selection of functions to handle some values, for example overloading. Another example is specialisation of polymorphic functions (e.g. templates in C++). Inference and deduction are other aspects where the type of functions is deduced by usage rather than specified by the programmer.
Parts of the checking and selection can be deferred until run time. Dispatch of methods based on variant tags or by indirection or specialised tables as for C++ virtual functions or Haskell typeclass dictionaries are two examples provided even in extremely strongly typed languages.
The key concept of type systems is called soundness. A type system is sound if it guarantees no value can be used by an inappropriate function. Roughly speaking an unsound type system has "holes" and is useless. The type system of ISO C89 is sound if you remove casts (and void* conversions), and unsound if you allow them. The type system of ISO C++ is unsound.
A second vital concept of types systems is called expressiveness. Sound type systems for polymorphic programming prevent programmers writing valid code: they're universally too restrictive (and I believe inescapably so). Making type systems more expressive so they allow a wider set of valid programs is the key academic challenge.
Another concept of typing is strength. A strong type system can find more errors earlier. For example many languages have type systems too weak to detect array bounds violations using the type system and have to resort to run time checks. Somehow strength is the opposite of expressiveness: we want to allow more valid programs (expressiveness) but also catch even more invalid ones (strength).
Here's a key question: explain why OO typing is too weak to permit OO to be used as a general development paradigm. [Hint: OO cannot handle relations]

Is there an object-identity-based, thread-safe memoization library somewhere?

I know that memoization seems to be a perennial topic here on the haskell tag on stack overflow, but I think this question has not been asked before.
I'm aware of several different 'off the shelf' memoization libraries for Haskell:
The memo-combinators and memotrie packages, which make use of a beautiful trick involving lazy infinite data structures to achieve memoization in a purely functional way. (As I understand it, the former is slightly more flexible, while the latter is easier to use in simple cases: see this SO answer for discussion.)
The uglymemo package, which uses unsafePerformIO internally but still presents a referentially transparent interface. The use of unsafePerformIO internally results in better performance than the previous two packages. (Off the shelf, its implementation uses comparison-based search data structures, rather than perhaps-slightly-more-efficient hash functions; but I think that if you find and replace Cmp for Hashable and Data.Map for Data.HashMap and add the appropraite imports, you get a hash based version.)
However, I'm not aware of any library that looks answers up based on object identity rather than object value. This can be important, because sometimes the kinds of object which are being used as keys to your memo table (that is, as input to the function being memoized) can be large---so large that fully examining the object to determine whether you've seen it before is itself a slow operation. Slow, and also unnecessary, if you will be applying the memoized function again and again to an object which is stored at a given 'location in memory' 1. (This might happen, for example, if we're memoizing a function which is being called recursively over some large data structure with a lot of structural sharing.) If we've already computed our memoized function on that exact object before, we can already know the answer, even without looking at the object itself!
Implementing such a memoization library involves several subtle issues and doing it properly requires several special pieces of support from the language. Luckily, GHC provides all the special features that we need, and there is a paper by Peyton-Jones, Marlow and Elliott which basically worries about most of these issues for you, explaining how to build a solid implementation. They don't provide all details, but they get close.
The one detail which I can see which one probably ought to worry about, but which they don't worry about, is thread safety---their code is apparently not threadsafe at all.
My question is: does anyone know of a packaged library which does the kind of memoization discussed in the Peyton-Jones, Marlow and Elliott paper, filling in all the details (and preferably filling in proper thread-safety as well)?
Failing that, I guess I will have to code it up myself: does anyone have any ideas of other subtleties (beyond thread safety and the ones discussed in the paper) which the implementer of such a library would do well to bear in mind?
UPDATE
Following #luqui's suggestion below, here's a little more data on the exact problem I face. Let's suppose there's a type:
data Node = Node [Node] [Annotation]
This type can be used to represent a simple kind of rooted DAG in memory, where Nodes are DAG Nodes, the root is just a distinguished Node, and each node is annotated with some Annotations whose internal structure, I think, need not concern us (but if it matters, just ask and I'll be more specific.) If used in this way, note that there may well be significant structural sharing between Nodes in memory---there may be exponentially more paths which lead from the root to a node than there are nodes themselves. I am given a data structure of this form, from an external library with which I must interface; I cannot change the data type.
I have a function
myTransform : Node -> Node
the details of which need not concern us (or at least I think so; but again I can be more specific if needed). It maps nodes to nodes, examining the annotations of the node it is given, and the annotations its immediate children, to come up with a new Node with the same children but possibly different annotations. I wish to write a function
recursiveTransform : Node -> Node
whose output 'looks the same' as the data structure as you would get by doing:
recursiveTransform Node originalChildren annotations =
myTransform Node recursivelyTransformedChildren annotations
where
recursivelyTransformedChildren = map recursiveTransform originalChildren
except that it uses structural sharing in the obvious way so that it doesn't return an exponential data structure, but rather one on the order of the same size as its input.
I appreciate that this would all be easier if say, the Nodes were numbered before I got them, or I could otherwise change the definition of a Node. I can't (easily) do either of these things.
I am also interested in the general question of the existence of a library implementing the functionality I mention quite independently of the particular concrete problem I face right now: I feel like I've had to work around this kind of issue on a few occasions, and it would be nice to slay the dragon once and for all. The fact that SPJ et al felt that it was worth adding not one but three features to GHC to support the existence of libraries of this form suggests that the feature is genuinely useful and can't be worked around in all cases. (BUT I'd still also be very interested in hearing about workarounds which will help in this particular case too: the long term problem is not as urgent as the problem I face right now :-) )
1 Technically, I don't quite mean location in memory, since the garbage collector sometimes moves objects around a bit---what I really mean is 'object identity'. But we can think of this as being roughly the same as our intuitive idea of location in memory.
If you only want to memoize based on object identity, and not equality, you can just use the existing laziness mechanisms built into the language.
For example, if you have a data structure like this
data Foo = Foo { ... }
expensive :: Foo -> Bar
then you can just add the value to be memoized as an extra field and let the laziness take care of the rest for you.
data Foo = Foo { ..., memo :: Bar }
To make it easier to use, add a smart constructor to tie the knot.
makeFoo ... = let foo = Foo { ..., memo = expensive foo } in foo
Though this is somewhat less elegant than using a library, and requires modification of the data type to really be useful, it's a very simple technique and all thread-safety issues are already taken care of for you.
It seems that stable-memo would be just what you needed (although I'm not sure if it can handle multiple threads):
Whereas most memo combinators memoize based on equality, stable-memo does it based on whether the exact same argument has been passed to the function before (that is, is the same argument in memory).
stable-memo only evaluates keys to WHNF.
This can be more suitable for recursive functions over graphs with cycles.
stable-memo doesn't retain the keys it has seen so far, which allows them to be garbage collected if they will no longer be used. Finalizers are put in place to remove the corresponding entries from the memo table if this happens.
Data.StableMemo.Weak provides an alternative set of combinators that also avoid retaining the results of the function, only reusing results if they have not yet been garbage collected.
There is no type class constraint on the function's argument.
stable-memo will not work for arguments which happen to have the same value but are not the same heap object. This rules out many candidates for memoization, such as the most common example, the naive Fibonacci implementation whose domain is machine Ints; it can still be made to work for some domains, though, such as the lazy naturals.
Ekmett just uploaded a library that handles this and more (produced at HacPhi): http://hackage.haskell.org/package/intern. He assures me that it is thread safe.
Edit: Actually, strictly speaking I realize this does something rather different. But I think you can use it for your purposes. It's really more of a stringtable-atom type interning library that works over arbitrary data structures (including recursive ones). It uses WeakPtrs internally to maintain the table. However, it uses Ints to index the values to avoid structural equality checks, which means packing them into the data type, when what you want are apparently actually StableNames. So I realize this answers a related question, but requires modifying your data type, which you want to avoid...

How does one avoid creating an ad-hoc type system in dynamically typed languages?

In every project I've started in languages without type systems, I eventually begin to invent a runtime type system. Maybe the term "type system" is too strong; at the very least, I create a set of type/value-range validators when I'm working with complex data types, and then I feel the need to be paranoid about where data types can be created and modified.
I hadn't thought twice about it until now. As an independent developer, my methods have been working in practice on a number of small projects, and there's no reason they'd stop working now.
Nonetheless, this must be wrong. I feel as if I'm not using dynamically-typed languages "correctly". If I must invent a type system and enforce it myself, I may as well use a language that has types to begin with.
So, my questions are:
Are there existing programming paradigms (for languages without types) that avoid the necessity of using or inventing type systems?
Are there otherwise common recommendations on how to solve the problems that static typing solves in dynamically-typed languages (without sheepishly reinventing types)?
Here is a concrete example for you to consider. I'm working with datetimes and timezones in erlang (a dynamic, strongly typed language). This is a common datatype I work with:
{{Y,M,D},{tztime, {time, HH,MM,SS}, Flag}}
... where {Y,M,D} is a tuple representing a valid date (all entries are integers), tztime and time are atoms, HH,MM,SS are integers representing a sane 24-hr time, and Flag is one of the atoms u,d,z,s,w.
This datatype is commonly parsed from input, so to ensure valid input and a correct parser, the values need to be checked for type correctness, and for valid ranges. Later on, instances of this datatype are compared to each other, making the type of their values all the more important, since all terms compare. From the erlang reference manual
number < atom < reference < fun < port < pid < tuple < list < bit string
Aside from the confsion of static vs. dynamic and strong vs. weak typing:
What you want to implement in your example isn't really solved by most existing static typing systems. Range checks and complications like February 31th and especially parsed input are usually checked during runtime no matter what type system you have.
Your example being in Erlang I have a few recommendations:
Use records. Besides being usefull and helpfull for a whole bunch of reasons, the give you easy runtime type checking without a lot of effort e.g.:
is_same_day(#datetime{year=Y1, month=M1, day=D1},
#datetime{year=Y2, month=M2, day=D2}) -> ...
Effortless only matches for two datetime records. You could even add guards to check for ranges if the source is untrusted. And it conforms to erlangs let it crash method of error handling: if no match is found you get a badmatch, and can handle this on the level where it is apropriate (usually the supervisor level).
Generally write your code that it crashes when the assumptions are not valid
If this doesn't feel static checked enough: use typer and dialyzer to find the kind of errors that can be found statically, whatever remains will be checkd at runtime.
Don't be too restrictive in your functions what "types" you accept, sometimes the added functionality of just doing someting useful even for different inputs is worth more than checking the types and ranges on every function. If you do it where it matters usually you will catch the error early enough for it to be easy fixable. This is especially true for a functionaly language where you allways know where every value comes from.
A lot of good answers, let me add:
Are there existing programming paradigms (for languages without types) that avoid the necessity of using or inventing type systems?
The most important paradigm, especially in Erlang, is this: Assume the type is right, otherwise let it crash. Don't write excessively checking paranoid code, but assume that the input you get is of the right type or the right pattern. Don't write (there are exceptions to this rule, but in general)
foo({tag, ...}) -> do_something(..);
foo({tag2, ...}) -> do_something_else(..);
foo(Otherwise) ->
report_error(Otherwise),
try to fix problem here...
Kill the last clause and have it crash right away. Let a supervisor and other processes do the cleanup (you can use monitors() for janitorial processes to know when a crash has occurred).
Do be precise however. Write
bar(N) when is_integer(N) -> ...
baz([]) -> ...
baz(L) when is_list(L) -> ...
if the function is known only to work with integers or lists respectively. Yes, it is a runtime check but the goal is to convey information to the programmer. Also, HiPE tend to utilize the hint for optimization and eliminate the type check if possible. Hence, the price may be less than what you think it is.
You choose an untyped/dynamically-typed language so the price you have to pay is that type checking and errors from clashes will happen at runtime. As other posts hint, a statically typed language is not exempt from doing some checks as well - the type system is (usually) an approximation of a proof of correctness. In most static languages you often get input which you can't trust. This input is transformed at the "border" of the application and then converted to an internal format. The conversion serves to mark trust: From now on, the thing has been validated and we can assume certain things about it. The power and correctness of this assumption is directly tied to its type signature and how good the programmer is with juggling the static types of the language.
Are there otherwise common recommendations on how to solve the problems that static typing solves in dynamically-typed languages (without sheepishly reinventing types)?
Erlang has the dialyzer which can be used to statically analyze and infer types of your programs. It will not come up with as many type errors as a type checker in e.g., Ocaml, but it won't "cry wolf" either: An error from the dialyzer is provably an error in the program. And it won't reject a program which may be working ok. A simple example is:
and(true, true) -> true;
and(true, _) -> false;
and(false, _) -> false.
The invocation and(true, greatmistake) will return false, yet a static type system will reject the program because it will infer from the first line that the type signature takes a boolean() value as the 2nd parameter. The dialyzer will accept this function in contrast and give it the signature (boolean(), term()) -> boolean(). It can do this, because there is no need to protect a priori for an error. If there is a mistake, the runtime system has a type check that will capture it.
In order for a statically-typed language to match the flexibility of a dynamically-typed one, I think it would need a lot, perhaps infinitely many, features.
In the Haskell world, one hears a lot of sophisticated, sometimes to the point of being scary, teminology. Type classes. Parametric polymorphism. Generalized algebraic data types. Type families. Functional dependencies. The Ωmega programming language takes it even further, with the website listing "type-level functions" and "level polymorphism", among others.
What are all these? Features added to static typing to make it more flexible. These features can be really cool, and tend to be elegant and mind-blowing, but are often difficult to understand. Learning curve aside, type systems often fail to model real-world problems elegantly. A particularly good example of this is interacting with other languages (a major motivation for C# 4's dynamic feature).
Dynamically-typed languages give you the flexibility to implement your own framework of rules and assumptions about data, rather than be constrained by the ever-limited static type system. However, "your own framework" won't be machine-checked, meaning the onus is on you to ensure your "type system" is safe and your code is well-"typed".
One thing I've found from learning Haskell is that I can carry lessons learned about strong typing and sound reasoning over to weaker-typed languages, such as C and even assembly, and do the "type checking" myself. Namely, I can prove that sections of code are correct in and of themselves, by bearing in mind the rules my functions and values are supposed to follow, and the assumptions I am allowed to make about other functions and values. When debugging, I go through and check things again, and think through whether or not my approach is sound.
The bottom line: dynamic typing puts more flexibility at your fingertips. On the other hand, statically-typed languages tend to be more efficient (by orders of magnitude), and good static type systems drastically cut down on debugging time by letting the computer do much of it for you. If you want the benefits of both, install a static type checker in your brain by learning decent, strongly-typed languages.
Sometimes data need validation. Validating any data received from the network is almost always a good idea — especially data from a public network. Being paranoid here is only good. If something resembling a static type system helps this in the least painful way, so be it. There's a reason why Erlang allows type annotations. Even pattern matching can be seen as just a kind of dynamic type checking; nevertheless, it's a central feature of the language. The very structure of data is its 'type' in Erlang.
The good thing is that you can custom-tailor your 'type system' to your needs, make it flexible and smart, while type systems of OO languages typically have fixed features. When data structures you use are immutable, once you've validated such a structure, you're safe to assume it conforms your restrictions, just like with static typing.
There's no point in being ready to process any kind of data at any point of a program, dynamically-typed or not. A 'dynamic type' is essentially a union of all possible types; limiting it to a useful subset is a valid way to program.
A statically typed language detects type errors at compile time. A dynamically typed language detects them at runtime. There are some modest restrictions on what one can write in a statically typed language such that all type errors can be caught at compile time.
But yes, you still have types even in a dynamically typed language, and that's a good thing. The problem is you wander into lots of runtime checks to ensure that you have the types you think you do, since the compiler hasn't taken care of that for you.
Erlang has a very nice tool for specifying and statically verifying lots of types -- dialyzer: Erlang type system, for references.
So don't reinvent types, use the typing tools that Erlang already provides, to handle the types that already exist in your program (but which you haven't yet specified).
And this on its own won't eliminate range checks, unfortunately. Without lots of special sauce you really have to enforce this on your own by convention (and smart constructors, etc. to help), or fall back to runtime checks, or both.

Resources