It is a well-known fact that C++ templates are Turing-complete, CSS is Turing-complete (!), and C# overload resolution is NP-hard (even without generics).
But is C# 4.0 (with co/contravariance, generics, etc.) Turing-complete at compile time?
Unlike templates in C++, generics in C# (and other .NET languages) are a runtime-instantiated feature. The compiler does some checking to verify how the types are used, but the actual substitution happens at runtime. The same goes for co- and contravariance, if I'm not mistaken, and even for the preprocessor directives. Lots of CLR magic.
(At the implementation level, the primary difference is that C# generic type substitutions are performed at runtime and generic type information is thereby preserved for instantiated objects)
See MSDN
http://msdn.microsoft.com/en-us/library/c6cyy67b(v=vs.110).aspx
Update:
The CLR does perform type checking via information stored in the metadata associated with compiled assemblies (vis-à-vis JIT compilation). It does this as one of its many services (ShuggyCoUk's answer on this question explains it in detail); others include memory management and exception handling. From that I would infer that the compiler has an understanding both of state as progression and of the machine's internal state (Turing completeness, in part, means being able to read symbols, refer back to previously read symbols, branch conditionally, and evaluate). I hesitate to state the exact definition of Turing completeness, as I am not sure I have fully grasped it myself, so feel free to fill in the blanks and correct me where applicable. With that said, and with a bit of trepidation: yes, yes it can be.
Related
I'm a bit confused when it comes to a compiled language (compilation to native code) with dynamic typing.
Dynamic typing means that the types in a program are only known at runtime.
Now if a language is compiled, there's no interpreter running at runtime; it's just your CPU reading instructions off memory and executing them. In such a scenario, if any instruction violating the type semantics of the language happens to execute at runtime, there's no interpreter to intercept the execution of the program and throw any errors. How does the system work then?
What happens when an instruction violating the type semantics of a dynamically typed compiled language is executed at runtime?
PS: Some of the dynamically typed compiled languages I know of include Scheme, Lua and Common Lisp.
A compiler for a dynamically typed language would simply generate instructions that check the type where necessary. In fact, even for some statically typed languages this is sometimes necessary, say, in the case of an object-oriented language with checked casts (like dynamic_cast in C++). In dynamically typed languages it is simply necessary more often.
For these checks to be possible, each value needs to be represented in a way that keeps track of its type. A common representation in dynamically typed languages is a pointer to a struct that contains the type tag as well as the value. As an optimization, this indirection is often avoided for (sufficiently small) integers by storing the integer directly as an invalid pointer, i.e. by shifting the integer one bit to the left and setting its least significant bit.
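For illustration, here is a minimal sketch of the tagged-value idea in Haskell (all names here are made up; a real implementation would use the pointer-tagging scheme described above rather than an algebraic data type):

data Value
  = VInt  Int
  | VStr  String
  | VBool Bool
  deriving Show

-- The kind of check a compiler for a dynamic language would emit inline
-- around an addition: inspect the tags, then either compute or fail.
addValues :: Value -> Value -> Either String Value
addValues (VInt a) (VInt b) = Right (VInt (a + b))
addValues x y               = Left ("type error: cannot add "
                                    ++ show x ++ " and " ++ show y)

main :: IO ()
main = do
  print (addValues (VInt 2) (VInt 3))        -- Right (VInt 5)
  print (addValues (VInt 2) (VStr "three"))  -- Left "type error: ..."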
The best way to do it would be to get the representation of the function (if it can be recovered somehow). Binary serialization is preferred for efficiency reasons.
I think there must be a way to do it in Clean, because otherwise it would be impossible to implement iTask, which relies on tasks (and thus functions) being saved and resumed when the server is running again.
This must be important for distributed Haskell computations.
I'm not looking for parsing Haskell code at runtime as described here: Serialization of functions in Haskell.
I also need to serialize, not just deserialize.
Unfortunately, it's not possible with the current GHC runtime system.
Serialization of functions, and other arbitrary data, requires some low-level runtime support that the GHC implementors have been reluctant to add.
Serializing functions requires that you can serialize anything, since arbitrary data (evaluated and unevaluated) can be part of a function (e.g., a partial application).
No. However, the CloudHaskell project is driving home the need for explicit closure serialization support in GHC. The closest thing CloudHaskell has to explicit closures is the distributed-static package. Another attempt is the HdpH closure representation. However, both use Template Haskell in the way Thomas describes below.
The limitation is a lack of static support in GHC, for which there is a currently unactioned GHC ticket. (Any takers?). There has been a discussion on the CloudHaskell mailing list about what static support should actually look like, but nothing has yet progressed as far as I know.
The closest anyone has come to a design and implementation is Jost Berthold, who has implemented function serialisation in Eden. See his IFL 2010 paper "Orthogonal Serialisation for Haskell". The serialisation support is baked into the Eden runtime system. (Now available as a separate library: packman. Not sure whether it can be used with GHC or needs a patched GHC as in the Eden fork...) Something similar would be needed for GHC. This is the serialisation support in Eden, in the version forked from GHC 7.4:
data Serialized a = Serialized { packetSize :: Int , packetData :: ByteArray# }
serialize :: a -> IO (Serialized a)
deserialize :: Serialized a -> IO a
So: one can serialize functions and data structures. There is a Binary instance for Serialized a, allowing you to checkpoint a long-running computation to file! (See Section 4.1).
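For concreteness, here is a hedged sketch of what such a checkpoint might look like against the API quoted above (serialize, deserialize and the Binary instance for Serialized a are assumed to come from the Eden/packman support; the helper names are made up):

import Data.Binary (encodeFile, decodeFile)

-- Write a (possibly unevaluated) value to disk as a checkpoint.
checkpoint :: FilePath -> a -> IO ()
checkpoint path x = serialize x >>= encodeFile path

-- Read the checkpoint back in; only valid within the same compiled program.
restore :: FilePath -> IO a
restore path = decodeFile path >>= deserialize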
Support for such a simple serialization API in the GHC base libraries would surely be the Holy Grail for distributed Haskell programming. It would likely simplify the composability between the distributed Haskell flavours (CloudHaskell, MetaPar, HdpH, Eden and so on...)
Check out Cloud Haskell. It has a concept called Closure which is used to send code to be executed on remote nodes in a type safe manner.
Eden probably comes closest and deserves a separate answer: (de-)serialization of unevaluated thunks is possible, see https://github.com/jberthold/packman.
Deserialization is, however, limited to the same program (where a program is a "compilation result"). Since functions are serialized as code pointers, previously unknown functions cannot be deserialized.
Possible usage:
storing unevaluated work for later
distributing work (but no sharing of new code)
A pretty simple and practical, but maybe not as elegant, solution would be to (preferably have GHC automatically) compile each function into a separate module of machine-independent bytecode, serialize that bytecode whenever serialization of that function is required, and use the dynamic-loader or plugins packages to load them dynamically, so that even previously unknown functions can be used.
Since a module notes all its dependencies, those could then be (de)serialized and loaded too. In practice, serializing index numbers and attaching an indexed list of the bytecode blobs would probably be the most efficient.
I think as long as you compile the modules yourself, this is already possible right now.
As I said, it would not be very pretty though. Not to mention the generally huge security risk of de-serializing code from insecure sources to run in an unsecured environment. :-)
(No problem if it is trustworthy, of course.)
I’m not going to code it up right here, right now though. ;-)
Could someone please explain clearly and succinctly the concepts of language type systems?
I've read a post or two here on type systems, but have trouble finding one that answers all my questions below.
I've heard/read that there are 3 type categorizations: dynamic vs static, strong vs weak, safe vs unsafe.
Some questions:
Are there any others?
What do each of these mean?
If a language allows you to change the type of a variable in runtime (e.g. a variable that used to store an int is later used to store a string), what category does that fall in?
How does Python fit into each of these categories?
Is there anything else I should know about type systems?
Thanks very much!
1) Apparently, there are others: http://en.wikipedia.org/wiki/Type_system
2)
Dynamic => Type checking is done during runtime (program execution) e.g. Python.
Static (as opposed to Dynamic) => Type checking is done during compile time e.g. C++
Strong => Once the type system decides that a particular object is of a type, it doesn't allow it to be used as another type. e.g. Python
Weak (as opposed to Strong) => The type system allows an object's type to change. e.g. Perl lets you read a number as a string, then use it again as a number
Type safety => I can best describe it with a C statement like:
x = (int *) malloc (...);
malloc returns a (void *) and we simply type-cast it to (int *). At compile time there is no check that the memory returned by malloc is actually suitable to hold an integer => some C operations aren't type safe.
I am told that some 'purely functional' languages are inherently type safe, but I do not know any of these languages. I think Standard ML or Haskell would be type safe.
3) "If a language allows you to change the type of a variable in runtime (e.g. a variable that used to store an int is later used to store a string), what category does that fall in?":
This may be dynamic - variables are untyped, values may carry implicit or explicit type information; alternatively, the type system may be able to cope with variables that change type, and be a static type system.
4) Python: It's dynamically and strongly typed. Type safety is something I don't know Python (or type safety itself) well enough to say anything about.
5) "Is there anything else I should know about type systems?": Maybe read the book #BasileStarynkevitch suggests?
You are asking a lot here :) Type systems are a dedicated field of computer science!
Starting from the beginning, "a type system is a method for proving the absence of certain program behaviors" (see B. Pierce's Types and Programming Languages, also referred to in the other answer). Programs that pass type checking are a subset of the valid programs. For instance, the method
int answer() {
    if (true) { return 42; } else { return "wrong"; }
}
would actually behave well at run-time. The else branch is never executed, and the answer always returns 42. The static type system is a conservative analysis that will reject this program, because it cannot prove the absence of a type error, that is, that "wrong" is never returned.
Of course, you could improve the type system to actually detect that the else branch never happens. You want to improve the type system to reject as few programs as possible. This is why type systems have been enriched over the years to support more and more refinements (e.g. generics, etc.)
The point of a type system is to prove the absence of type errors. In practice, they support operations like downcasting that inherently imply run-time type checks, and might lead to type errors. Again, the goal is to make the type system as flexible as possible, so that we don't need to resort to these operations that weaken type safety (e.g. generics).
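To make the downcasting point concrete, here is a hedged Haskell sketch using Data.Typeable's cast (the function name is made up for illustration); the check happens at run time, and a mismatch yields Nothing rather than a compile-time error:

import Data.Typeable (Typeable, cast)

-- A "downcast": succeeds or fails at runtime, returning Maybe.
describeIfInt :: Typeable a => a -> String
describeIfInt x =
  case cast x :: Maybe Int of
    Just n  -> "an Int: " ++ show n
    Nothing -> "not an Int"

main :: IO ()
main = do
  putStrLn (describeIfInt (42 :: Int))   -- "an Int: 42"
  putStrLn (describeIfInt "hello")       -- "not an Int"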
You can read chapter 1 of the aforementioned book for a really nice introduction. For the rest, I will refer you to What To Know Before Debating Type Systems, which is an awesome blog post about the basic concepts.
Is there anything else I should know about type systems?
Oh, yes! :)
Happy immersion in the world of type systems!
I suggest reading B. Pierce's Types and Programming Languages book. And I also suggest learning a bit of a statically typed language with type inference, like OCaml or Haskell.
A type system is a mechanism which controls the functions which access values. Compile-time checking is one aspect of this, which rejects programs during compilation if an attempt is made to use a function on values it is not designed to handle. However, another aspect is the converse: the selection of functions to handle some values, for example overloading. Another example is specialisation of polymorphic functions (e.g. templates in C++). Inference and deduction are further aspects, where the type of a function is deduced from usage rather than specified by the programmer.
Parts of the checking and selection can be deferred until run time. Dispatch of methods based on variant tags, or by indirection through specialised tables as for C++ virtual functions or Haskell typeclass dictionaries, are two examples provided even in extremely strongly typed languages.
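As a hedged sketch of the Haskell case: a class constraint is compiled into an extra, implicitly passed record of functions (a "dictionary"); the names below are illustrative, not GHC's internal representation:

class Pretty a where
  pretty :: a -> String

instance Pretty Bool where
  pretty b = if b then "yes" else "no"

-- Roughly what the compiler passes behind the scenes for a "Pretty a" constraint:
newtype PrettyDict a = PrettyDict { prettyVia :: a -> String }

prettyBoolDict :: PrettyDict Bool
prettyBoolDict = PrettyDict (\b -> if b then "yes" else "no")

-- A constrained function ...
greet :: Pretty a => a -> String
greet x = "value: " ++ pretty x

-- ... corresponds to one taking the dictionary explicitly:
greetVia :: PrettyDict a -> a -> String
greetVia d x = "value: " ++ prettyVia d x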
The key concept of type systems is called soundness. A type system is sound if it guarantees no value can be used by an inappropriate function. Roughly speaking, an unsound type system has "holes" and is useless. The type system of ISO C89 is sound if you remove casts (and void* conversions), and unsound if you allow them. The type system of ISO C++ is unsound.
A second vital concept of type systems is called expressiveness. Sound type systems for polymorphic programming prevent programmers from writing some valid code: they're universally too restrictive (and I believe inescapably so). Making type systems more expressive, so they allow a wider set of valid programs, is the key academic challenge.
Another concept of typing is strength. A strong type system can find more errors earlier. For example, many languages have type systems too weak to detect array bounds violations using the type system and have to resort to run-time checks. In a way, strength is the opposite of expressiveness: we want to allow more valid programs (expressiveness) but also catch more invalid ones (strength).
Here's a key question: explain why OO typing is too weak to permit OO to be used as a general development paradigm. [Hint: OO cannot handle relations]
I recently read a paper that compared Design-by-Contract to Test-Driven Development. There seems to be a lot of overlap, some redundancy, and a little bit of synergy between DbC and TDD. For example, there are systems for automatically generating tests based on contracts.
In what way does DbC overlap with modern type system (such as in haskell, or one of those dependently typed languages) and are there points where using both is better than either?
The paper Typed Contracts for Functional Programming by Ralf Hinze, Johan Jeuring, and Andres Löh had this handy table that illustrates whereabouts contracts sit in the design spectrum of "checking":
                   | static checking      | dynamic checking
-------------------+----------------------+-----------------------
simple properties  | static type checking | dynamic type checking
complex properties | theorem proving      | contract checking
It seems most answers assume that contracts are checked dynamically. Note that in some systems contracts are checked statically. In such systems you can think of contracts as a restricted form of dependent types which can be checked automatically. Contrast this with richer dependent types, which are checked interactively, such as in Coq.
See the "Specification Checking" section on Dana Xu's page for papers on static and hybrid checking (static followed by dynamic) of contracts for Haskell and OCaml. The contract system of Xu includes refinement types and dependent arrows, both of which are dependent types. Early languages with restricted dependent types that are automatically checked include the DML and ATS of Pfenning and Xi. In DML, unlike in Xu's work, the dependent types are restricted so that automatic checking is complete.
The primary difference is that testing is dynamic and incomplete, relying on measurement to give evidence that you have fully validated whatever property you are testing, while types and typechecking are a formal system that guarantees all possible code paths have been validated against whatever property you are stating in types.
Testing for a property can only approach, in the limit, the level of assurance that a type check for the same property provides out of the box. Contracts raise the baseline for dynamic checking.
DbC is valuable as long as you can't express all the assumptions in the type system itself. Simple Haskell example:
take _ []     = []
take 0 _      = []
take n (a:as) = a : take (n-1) as
The type would be:
take :: Int -> [a] -> [a]
Yet only values greater than or equal to 0 are OK for n. This is where DbC could step in and, for example, generate appropriate QuickCheck properties.
(Of course, in this case it is easy enough to also check for negative values and fix an outcome other than undefined - but there are more complex cases.)
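As a hedged sketch, the contract "n must be non-negative" could become a QuickCheck property like this (assumes the QuickCheck package; the property name is made up):

import Test.QuickCheck

-- The precondition n >= 0 guards the check; the postcondition relates
-- the length of the result to the inputs.
prop_takeLength :: Int -> [Int] -> Property
prop_takeLength n xs =
  n >= 0 ==> length (take n xs) == min n (length xs)

main :: IO ()
main = quickCheck prop_takeLength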
I think DbC and type systems are not comparable. There is confusion between DbC and type systems (or even verification systems). For example, we find comparisons of DbC with the Liquid Haskell tool, or of DbC with the QuickCheck framework. IMHO this is not correct: type systems, as well as formal proof systems, assert only one thing: you have some algorithm in mind and you declare properties of that algorithm; then you implement it. Type systems and formal proof systems verify that the implementation corresponds to the declared properties.
DbC verifies not internal things (implementation/code/algorithm) but external things: expected properties of things which are external to your code. That can be the state of the environment, of the file system, of a database, of callers, and of course your own state - anything. Typical contracts work at runtime, not at compile time or in a special verification phase.
The canonical example of DbC shows how a bug in an HP chip was found. That happened because DbC declares properties of an external component: of the chip, its state, its transitions, etc. If your application hits an unexpected external-world state, it reports the case as an exception. The magic here is that: 1) you define contracts in one place and don't repeat yourself; 2) contracts can easily be turned off (dropped from the compiled binary). IMHO they are closer to aspects, since they are not "linear" like subroutine calls.
My recap is that DbC is more helpful for avoiding real-world bugs than type systems, because most real bugs happen due to a misunderstanding of the behavior of the external world/environment/frameworks/OS components/etc. Types and proof assistants help only to avoid your own simple errors, which can also be found with tests or modeling (in MATLAB, Mathematica, etc.).
In short: you cannot find a bug in an HP chip with a type system. Sure, there is the illusion that it's possible with something like indexed monads, but real code attempting this will look super complex, unmaintainable, and impractical. Still, I think some hybrid schemes are possible.
In every project I've started in languages without type systems, I eventually begin to invent a runtime type system. Maybe the term "type system" is too strong; at the very least, I create a set of type/value-range validators when I'm working with complex data types, and then I feel the need to be paranoid about where data types can be created and modified.
I hadn't thought twice about it until now. As an independent developer, my methods have been working in practice on a number of small projects, and there's no reason they'd stop working now.
Nonetheless, this must be wrong. I feel as if I'm not using dynamically-typed languages "correctly". If I must invent a type system and enforce it myself, I may as well use a language that has types to begin with.
So, my questions are:
Are there existing programming paradigms (for languages without types) that avoid the necessity of using or inventing type systems?
Are there otherwise common recommendations on how to solve the problems that static typing solves in dynamically-typed languages (without sheepishly reinventing types)?
Here is a concrete example for you to consider. I'm working with datetimes and timezones in Erlang (a dynamic, strongly typed language). This is a common datatype I work with:
{{Y,M,D},{tztime, {time, HH,MM,SS}, Flag}}
... where {Y,M,D} is a tuple representing a valid date (all entries are integers), tztime and time are atoms, HH,MM,SS are integers representing a sane 24-hr time, and Flag is one of the atoms u,d,z,s,w.
This datatype is commonly parsed from input, so to ensure valid input and a correct parser, the values need to be checked for type correctness and for valid ranges. Later on, instances of this datatype are compared to each other, making the type of their values all the more important, since all terms compare. From the Erlang reference manual:
number < atom < reference < fun < port < pid < tuple < list < bit string
Aside from the confusion of static vs. dynamic and strong vs. weak typing:
What you want to implement in your example isn't really solved by most existing static typing systems. Range checks and complications like February 31st, and especially parsed input, are usually checked at runtime no matter what type system you have.
Your example being in Erlang, I have a few recommendations:
Use records. Besides being useful and helpful for a whole bunch of reasons, they give you easy runtime type checking without a lot of effort, e.g.:
is_same_day(#datetime{year=Y1, month=M1, day=D1},
            #datetime{year=Y2, month=M2, day=D2}) -> ...
This effortlessly matches only two datetime records. You could even add guards to check ranges if the source is untrusted. And it conforms to Erlang's "let it crash" approach to error handling: if no clause matches you get a function_clause error, and you can handle this at the level where it is appropriate (usually the supervisor level).
Generally, write your code so that it crashes when the assumptions are not valid.
If this doesn't feel statically checked enough: use typer and dialyzer to find the kinds of errors that can be found statically; whatever remains will be checked at runtime.
Don't be too restrictive in your functions about which "types" you accept; sometimes the added functionality of just doing something useful even for different inputs is worth more than checking the types and ranges in every function. If you do it where it matters, you will usually catch the error early enough for it to be easily fixable. This is especially true for a functional language where you always know where every value comes from.
A lot of good answers, let me add:
Are there existing programming paradigms (for languages without types) that avoid the necessity of using or inventing type systems?
The most important paradigm, especially in Erlang, is this: assume the type is right, otherwise let it crash. Don't write excessively paranoid checking code, but assume that the input you get is of the right type or the right pattern. Don't write (there are exceptions to this rule, but in general)
foo({tag, ...})  -> do_something(..);
foo({tag2, ...}) -> do_something_else(..);
foo(Otherwise) ->
    report_error(Otherwise),
    %% try to fix the problem here...
Kill the last clause and have it crash right away. Let a supervisor and other processes do the cleanup (you can use monitors() for janitorial processes to know when a crash has occurred).
Do be precise however. Write
bar(N) when is_integer(N) -> ...
baz([]) -> ...
baz(L) when is_list(L) -> ...
if the function is known only to work with integers or lists, respectively. Yes, it is a runtime check, but the goal is to convey information to the programmer. Also, HiPE tends to utilize the hint for optimization and eliminate the type check if possible. Hence, the price may be less than what you think it is.
You choose an untyped/dynamically-typed language so the price you have to pay is that type checking and errors from clashes will happen at runtime. As other posts hint, a statically typed language is not exempt from doing some checks as well - the type system is (usually) an approximation of a proof of correctness. In most static languages you often get input which you can't trust. This input is transformed at the "border" of the application and then converted to an internal format. The conversion serves to mark trust: From now on, the thing has been validated and we can assume certain things about it. The power and correctness of this assumption is directly tied to its type signature and how good the programmer is with juggling the static types of the language.
Are there otherwise common recommendations on how to solve the problems that static typing solves in dynamically-typed languages (without sheepishly reinventing types)?
Erlang has the dialyzer, which can be used to statically analyze and infer the types of your programs. It will not come up with as many type errors as a type checker in, e.g., OCaml, but it won't "cry wolf" either: an error from the dialyzer is provably an error in the program. And it won't reject a program which may be working OK. A simple example is:
'and'(true, true) -> true;
'and'(true, _)    -> false;
'and'(false, _)   -> false.
(The function name is quoted because and is a reserved word in Erlang.) The invocation 'and'(true, greatmistake) will return false, yet a static type system would reject the program because it would infer from the first line that the signature takes a boolean() value as the second parameter. The dialyzer, in contrast, will accept this function and give it the signature (boolean(), term()) -> boolean(). It can do this because there is no need to protect a priori against an error: if there is a mistake, the runtime system has a type check that will capture it.
In order for a statically-typed language to match the flexibility of a dynamically-typed one, I think it would need a lot, perhaps infinitely many, features.
In the Haskell world, one hears a lot of sophisticated, sometimes to the point of being scary, terminology. Type classes. Parametric polymorphism. Generalized algebraic data types. Type families. Functional dependencies. The Ωmega programming language takes it even further, with the website listing "type-level functions" and "level polymorphism", among others.
What are all these? Features added to static typing to make it more flexible. These features can be really cool, and tend to be elegant and mind-blowing, but are often difficult to understand. Learning curve aside, type systems often fail to model real-world problems elegantly. A particularly good example of this is interacting with other languages (a major motivation for C# 4's dynamic feature).
Dynamically-typed languages give you the flexibility to implement your own framework of rules and assumptions about data, rather than be constrained by the ever-limited static type system. However, "your own framework" won't be machine-checked, meaning the onus is on you to ensure your "type system" is safe and your code is well-"typed".
One thing I've found from learning Haskell is that I can carry lessons learned about strong typing and sound reasoning over to weaker-typed languages, such as C and even assembly, and do the "type checking" myself. Namely, I can prove that sections of code are correct in and of themselves, by bearing in mind the rules my functions and values are supposed to follow, and the assumptions I am allowed to make about other functions and values. When debugging, I go through and check things again, and think through whether or not my approach is sound.
The bottom line: dynamic typing puts more flexibility at your fingertips. On the other hand, statically-typed languages tend to be more efficient (by orders of magnitude), and good static type systems drastically cut down on debugging time by letting the computer do much of it for you. If you want the benefits of both, install a static type checker in your brain by learning decent, strongly-typed languages.
Sometimes data need validation. Validating any data received from the network is almost always a good idea — especially data from a public network. Being paranoid here is only good. If something resembling a static type system helps this in the least painful way, so be it. There's a reason why Erlang allows type annotations. Even pattern matching can be seen as just a kind of dynamic type checking; nevertheless, it's a central feature of the language. The very structure of data is its 'type' in Erlang.
The good thing is that you can custom-tailor your 'type system' to your needs, make it flexible and smart, while the type systems of OO languages typically have fixed features. When the data structures you use are immutable, once you've validated such a structure, you're safe to assume it conforms to your restrictions, just like with static typing.
There's no point in being ready to process any kind of data at any point of a program, dynamically-typed or not. A 'dynamic type' is essentially a union of all possible types; limiting it to a useful subset is a valid way to program.
A statically typed language detects type errors at compile time. A dynamically typed language detects them at runtime. There are some modest restrictions on what one can write in a statically typed language such that all type errors can be caught at compile time.
But yes, you still have types even in a dynamically typed language, and that's a good thing. The problem is that you end up with lots of runtime checks to ensure that you have the types you think you do, since the compiler hasn't taken care of that for you.
Erlang has a very nice tool for specifying and statically verifying lots of types -- dialyzer: see Erlang type system for references.
So don't reinvent types, use the typing tools that Erlang already provides, to handle the types that already exist in your program (but which you haven't yet specified).
And this on its own won't eliminate range checks, unfortunately. Without lots of special sauce you really have to enforce this on your own by convention (and smart constructors, etc. to help), or fall back to runtime checks, or both.
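As a sketch of the smart-constructor idea (written in Haskell for brevity; the module and names are made up), the range check is done once at the boundary and the rest of the program can then rely on it:

module Hour (Hour, mkHour, getHour) where

-- The constructor is not exported, so the only way to obtain an Hour
-- is through mkHour, which enforces the range 0..23 once.
newtype Hour = Hour Int

mkHour :: Int -> Maybe Hour
mkHour h
  | h >= 0 && h < 24 = Just (Hour h)
  | otherwise        = Nothing

getHour :: Hour -> Int
getHour (Hour h) = h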