What is the difference between statically typed and dynamically typed languages? - programming-languages

What does it mean when we say a language is dynamically typed versus statically typed?

Statically typed languages
A language is statically typed if the type of a variable is known at compile time. For some languages this means that you as the programmer must specify what type each variable is; other languages (e.g.: Java, C, C++) offer some form of type inference, the capability of the type system to deduce the type of a variable (e.g.: OCaml, Haskell, Scala, Kotlin).
The main advantage here is that all kinds of checking can be done by the compiler, and therefore a lot of trivial bugs are caught at a very early stage.
Examples: C, C++, Java, Rust, Go, Scala
Dynamically typed languages
A language is dynamically typed if the type is associated with run-time values, and not named variables/fields/etc. This means that you as a programmer can write a little quicker because you do not have to specify types every time (unless using a statically-typed language with type inference).
Examples: Perl, Ruby, Python, PHP, JavaScript, Erlang
Most scripting languages have this feature as there is no compiler to do static type-checking anyway, but you may find yourself searching for a bug that is due to the interpreter misinterpreting the type of a variable. Luckily, scripts tend to be small so bugs have not so many places to hide.
Most dynamically typed languages do allow you to provide type information, but do not require it. One language that is currently being developed, Rascal, takes a hybrid approach allowing dynamic typing within functions but enforcing static typing for the function signature.

Type checking is the process of verifying and enforcing the constraints of types.
Statically typed programming languages do type checking at compile-time.
Examples: Java, C, C++.
Dynamically typed programming languages do type checking at run-time.
Examples:
Perl, Ruby, Python, PHP, JavaScript.

Here is an example contrasting how Python (dynamically typed) and Go (statically typed) handle a type error:
def silly(a):
if a > 0:
print 'Hi'
else:
print 5 + '3'
Python does type checking at run time, and therefore:
silly(2)
Runs perfectly fine, and produces the expected output Hi. Error is only raised if the problematic line is hit:
silly(-1)
Produces
TypeError: unsupported operand type(s) for +: 'int' and 'str'
because the relevant line was actually executed.
Go on the other hand does type-checking at compile time:
package main
import ("fmt"
)
func silly(a int) {
if (a > 0) {
fmt.Println("Hi")
} else {
fmt.Println("3" + 5)
}
}
func main() {
silly(2)
}
The above will not compile, with the following error:
invalid operation: "3" + 5 (mismatched types string and int)

Simply put it this way: in a statically typed language variables' types are static, meaning once you set a variable to a type, you cannot change it. That is because typing is associated with the variable rather than the value it refers to.
For example in Java:
String str = "Hello"; // variable str statically typed as string
str = 5; // would throw an error since str is
// supposed to be a string only
Where on the other hand: in a dynamically typed language variables' types are dynamic, meaning after you set a variable to a type, you CAN change it. That is because typing is associated with the value it assumes rather than the variable itself.
For example in Python:
some_str = "Hello" # variable some_str is linked to a string value
some_str = 5 # now it is linked to an integer value; perfectly OK
So, it is best to think of variables in dynamically typed languages as just generic pointers to typed values.
To sum up, type describes (or should have described) the variables in the language rather than the language itself. It could have been better used as a language with statically typed variables versus a language with dynamically typed variables IMHO.
Statically typed languages are generally compiled languages, thus, the compilers check the types (make perfect sense right? as types are not allowed to be changed later on at run time).
Dynamically typed languages are generally interpreted, thus type checking (if any) happens at run time when they are used. This of course brings some performance cost, and is one of the reasons dynamic languages (e.g., python, ruby, php) do not scale as good as the typed ones (java, c#, etc.). From another perspective, statically typed languages have more of a start-up cost: makes you usually write more code, harder code. But that pays later off.
The good thing is both sides are borrowing features from the other side. Typed languages are incorporating more dynamic features, e.g., generics and dynamic libraries in c#, and dynamic languages are including more type checking, e.g., type annotations in python, or HACK variant of PHP, which are usually not core to the language and usable on demand.
When it comes to technology selection, neither side has an intrinsic superiority over the other. It is just a matter of preference whether you want more control to begin with or flexibility. just pick the right tool for the job, and make sure to check what is available in terms of the opposite before considering a switch.

http://en.wikipedia.org/wiki/Type_system
Static typing
A programming language is said to use
static typing when type checking is
performed during compile-time as
opposed to run-time. In static typing,
types are associated with variables
not values. Statically typed languages
include Ada, C, C++, C#, JADE, Java,
Fortran, Haskell, ML, Pascal, Perl
(with respect to distinguishing
scalars, arrays, hashes and
subroutines) and Scala. Static typing
is a limited form of program
verification (see type safety):
accordingly, it allows many type
errors to be caught early in the
development cycle. Static type
checkers evaluate only the type
information that can be determined at
compile time, but are able to verify
that the checked conditions hold for
all possible executions of the
program, which eliminates the need to
repeat type checks every time the
program is executed. Program execution
may also be made more efficient (i.e.
faster or taking reduced memory) by
omitting runtime type checks and
enabling other optimizations.
Because they evaluate type information
during compilation, and therefore lack
type information that is only
available at run-time, static type
checkers are conservative. They will
reject some programs that may be
well-behaved at run-time, but that
cannot be statically determined to be
well-typed. For example, even if an
expression always
evaluates to true at run-time, a
program containing the code
if <complex test> then 42 else <type error>
will be rejected as ill-typed, because
a static analysis cannot determine
that the else branch won't be
taken.[1] The conservative behaviour
of static type checkers is
advantageous when
evaluates to false infrequently: A
static type checker can detect type
errors in rarely used code paths.
Without static type checking, even
code coverage tests with 100% code
coverage may be unable to find such
type errors. Code coverage tests may
fail to detect such type errors
because the combination of all places
where values are created and all
places where a certain value is used
must be taken into account.
The most widely used statically typed
languages are not formally type safe.
They have "loopholes" in the
programming language specification
enabling programmers to write code
that circumvents the verification
performed by a static type checker and
so address a wider range of problems.
For example, Java and most C-style
languages have type punning, and
Haskell has such features as
unsafePerformIO: such operations may
be unsafe at runtime, in that they can
cause unwanted behaviour due to
incorrect typing of values when the
program runs.
Dynamic typing
A programming language is said to be
dynamically typed, or just 'dynamic',
when the majority of its type checking
is performed at run-time as opposed to
at compile-time. In dynamic typing,
types are associated with values not
variables. Dynamically typed languages
include Groovy, JavaScript, Lisp, Lua,
Objective-C, Perl (with respect to
user-defined types but not built-in
types), PHP, Prolog, Python, Ruby,
Smalltalk and Tcl. Compared to static
typing, dynamic typing can be more
flexible (e.g. by allowing programs to
generate types and functionality based
on run-time data), though at the
expense of fewer a priori guarantees.
This is because a dynamically typed
language accepts and attempts to
execute some programs which may be
ruled as invalid by a static type
checker.
Dynamic typing may result in runtime
type errors—that is, at runtime, a
value may have an unexpected type, and
an operation nonsensical for that type
is applied. This operation may occur
long after the place where the
programming mistake was made—that is,
the place where the wrong type of data
passed into a place it should not
have. This makes the bug difficult to
locate.
Dynamically typed language systems,
compared to their statically typed
cousins, make fewer "compile-time"
checks on the source code (but will
check, for example, that the program
is syntactically correct). Run-time
checks can potentially be more
sophisticated, since they can use
dynamic information as well as any
information that was present during
compilation. On the other hand,
runtime checks only assert that
conditions hold in a particular
execution of the program, and these
checks are repeated for every
execution of the program.
Development in dynamically typed
languages is often supported by
programming practices such as unit
testing. Testing is a key practice in
professional software development, and
is particularly important in
dynamically typed languages. In
practice, the testing done to ensure
correct program operation can detect a
much wider range of errors than static
type-checking, but conversely cannot
search as comprehensively for the
errors that both testing and static
type checking are able to detect.
Testing can be incorporated into the
software build cycle, in which case it
can be thought of as a "compile-time"
check, in that the program user will
not have to manually run such tests.
References
Pierce, Benjamin (2002). Types and Programming Languages. MIT Press.
ISBN 0-262-16209-1.

Compiled vs. Interpreted
"When source code is translated"
Source Code: Original code (usually typed by a human into a computer)
Translation: Converting source code into something a computer can read (i.e. machine code)
Run-Time: Period when program is executing commands (after compilation, if compiled)
Compiled Language: Code translated before run-time
Interpreted Language: Code translated on the fly, during execution
Typing
"When types are checked"
5 + '3' is an example of a type error in strongly typed languages such as Go and Python, because they don't allow for "type coercion" -> the ability for a value to change type in certain contexts such as merging two types. Weakly typed languages, such as JavaScript, won't throw a type error (results in '53').
Static: Types checked before run-time
Dynamic: Types checked on the fly, during execution
The definitions of "Static & Compiled" and "Dynamic & Interpreted" are quite similar...but remember it's "when types are checked" vs. "when source code is translated".
You'll get the same type errors irrespective of whether the language is compiled or interpreted! You need to separate these terms conceptually.
Python Example
Dynamic, Interpreted
def silly(a):
if a > 0:
print 'Hi'
else:
print 5 + '3'
silly(2)
Because Python is both interpreted and dynamically typed, it only translates and type-checks code it's executing on. The else block never executes, so 5 + '3' is never even looked at!
What if it was statically typed?
A type error would be thrown before the code is even run. It still performs type-checking before run-time even though it is interpreted.
What if it was compiled?
The else block would be translated/looked at before run-time, but because it's dynamically typed it wouldn't throw an error! Dynamically typed languages don't check types until execution, and that line never executes.
Go Example
Static, Compiled
package main
import ("fmt"
)
func silly(a int) {
if (a > 0) {
fmt.Println("Hi")
} else {
fmt.Println("3" + 5)
}
}
func main() {
silly(2)
}
The types are checked before running (static) and the type error is immediately caught! The types would still be checked before run-time if it was interpreted, having the same result. If it was dynamic, it wouldn't throw any errors even though the code would be looked at during compilation.
Performance
A compiled language will have better performance at run-time if it's statically typed (vs. dynamically); knowledge of types allows for machine code optimization.
Statically typed languages have better performance at run-time intrinsically due to not needing to check types dynamically while executing (it checks before running).
Similarly, compiled languages are faster at run time as the code has already been translated instead of needing to "interpret"/translate it on the fly.
Note that both compiled and statically typed languages will have a delay before running for translation and type-checking, respectively.
More Differences
Static typing catches errors early, instead of finding them during execution (especially useful for long programs). It's more "strict" in that it won't allow for type errors anywhere in your program and often prevents variables from changing types, which further defends against unintended errors.
num = 2
num = '3' // ERROR
Dynamic typing is more flexible, which some appreciate. It typically allows for variables to change types, which can result in unexpected errors.

The terminology "dynamically typed" is unfortunately misleading. All languages are statically typed, and types are properties of expressions (not of values as some think). However, some languages have only one type. These are called uni-typed languages. One example of such a language is the untyped lambda calculus.
In the untyped lambda calculus, all terms are lambda terms, and the only operation that can be performed on a term is applying it to another term. Hence all operations always result in either infinite recursion or a lambda term, but never signal an error.
However, were we to augment the untyped lambda calculus with primitive numbers and arithmetic operations, then we could perform nonsensical operations, such adding two lambda terms together: (λx.x) + (λy.y). One could argue that the only sane thing to do is to signal an error when this happens, but to be able to do this, each value has to be tagged with an indicator that indicates whether the term is a lambda term or a number. The addition operator will then check that indeed both arguments are tagged as numbers, and if they aren't, signal an error. Note that these tags are not types, because types are properties of programs, not of values produced by those programs.
A uni-typed language that does this is called dynamically typed.
Languages such as JavaScript, Python, and Ruby are all uni-typed. Again, the typeof operator in JavaScript and the type function in Python have misleading names; they return the tags associated with the operands, not their types. Similarly, dynamic_cast in C++ and instanceof in Java do not do type checks.

Statically typed languages: each variable and expression is already known at compile time.
(int a; a can take only integer type values at runtime)
Examples: C, C++, Java
Dynamically typed languages: variables can receive different values at runtime and their type is defined at run time.
(var a; a can take any kind of values at runtime)
Examples: Ruby, Python.

In Programming, Data Type is a Classification which tells what type of value a variable will hold and what are the mathematical, relational and logical operations can be done on those values without getting error.
In each programming language, to minimize the chance of getting error, type checking is done either before or during program execution. Depending on the Timing of Type Checking, programming languages are 2 types : Statically Typed and Dynamically Typed languages.
Also depending on whether Implicit Type Conversion happens or not, programming languages are 2 types : Strongly Typed and Weakly Typed languages.
Statically Typed :
Type checking is done at compile time
In source code, at the time of variable declaration, data type of that variable must be explicitly specified. Because if data type is specified in source code then at compile time that source code will be converted to machine code and type checking can happen
Here data type is associated with variable like, int count. And this association is static or fixed
If we try to change data type of an already declared variable (int count) by assigning a value of other data type (int count = "Hello") into it, then we will get error
If we try to change data type by redeclaring an already declared variable (int count) using other data type (boolean count) then also we will get error
int count; /* count is int type, association between data type
and variable is static or fixed */
count = 10; // no error
count = 'Hello'; // error
boolean count; // error
As type checking and type error detection is done at compile time that's why during runtime no further type checking is needed. Thus program becomes more optimized, results in faster execution
If we want more rigid code then choosing this type of language is better option
Example : Java, C, C++, Go, Swift etc.
Dynamically Typed :
Type checking is done at runtime
In source code, at the time of variable declaration, no need to explicitly specify data type of that variable. Because during type checking at runtime, the language system determines variable type from data type of the assigned value to that variable
Here data type is associated with the value assigned to the variable like, var foo = 10, 10 is a Number so now foo is of Number data type. But this association is dynamic or flexible
we can easily change data type of an already declared variable (var foo = 10), by assigning a value of other data type (foo = "Hi") into it, no error
we can easily change data type of an already declared variable (var foo = 10), by redeclaring it using value of other data type (var foo = true), no error
var foo; // without assigned value, variable holds undefined data type
var foo = 10; // foo is Number type now, association between data
// type and value is dynamic / flexible
foo = 'Hi'; // foo is String type now, no error
var foo = true; // foo is Boolean type now, no error
As type checking and type error detection is done at runtime, that's why program becomes less optimized, results in slower execution. Although execution of these type of languages can be faster if they implement JIT Compiler
If we want to write and execute code easily then this type of language is better option, but here we can get runtime error
Example : Python, JavaScript, PHP, Ruby etc.

Statically typed languages type-check at compile time and the type can NOT change. (Don't get cute with type-casting comments, a new variable/reference is created).
Dynamically typed languages type-check at run-time and the type of an variable CAN be changed at run-time.

Sweet and simple definitions, but fitting the need:
Statically typed languages binds the type to a variable for its entire scope (Seg: SCALA)
Dynamically typed languages bind the type to the actual value referenced by a variable.

In a statically typed language, a variable is associated with a type which is known at compile time, and that type remains unchanged throughout the execution of a program. Equivalently, the variable can only be assigned a value which is an instance of the known/specified type.
In a dynamically typed language, a variable has no type, and its value during execution can be anything of any shape and form.

Static typed languages (compiler resolves method calls and compile references):
usually better performance
faster compile error feedback
better IDE support
not suited for working with undefined data formats
harder to start a development when model is not defined when
longer compilation time
in many cases requires to write more code
Dynamic typed languages (decisions taken in running program):
lower performance
faster development
some bugs might be detected only later in run-time
good for undefined data formats (meta programming)

Static Type: Type checking performed at compile time.
What actually mean by static type language:
type of a variable must be specified
a variable can reference only a particular type of object*
type check for the value will be performed at the compile time and any type checking will be reported at that time
memory will be allocated at compile time to store the value of that particular type
Example of static type language are C, C++, Java.
Dynamic Type: Type checking performed at runtime.
What actually mean by dynamic type language:
no need to specify type of the variable
same variable can reference to different type of objects
Python, Ruby are examples of dynamic type language.
* Some objects can be assigned to different type of variables by typecasting it (a very common practice in languages like C and C++)

Statically typed languages like C++, Java and Dynamically typed languages like Python differ only in terms of the execution of the type of the variable.
Statically typed languages have static data type for the variable, here the data type is checked during compiling so debugging is much simpler...whereas Dynamically typed languages don't do the same, the data type is checked which executing the program and hence the debugging is bit difficult.
Moreover they have a very small difference and can be related with strongly typed and weakly typed languages. A strongly typed language doesn't allow you to use one type as another eg. C and C++ ...whereas weakly typed languages allow eg.python

Statically Typed
The types are checked before run-time so mistakes can be caught earlier.
Examples = c++
Dynamically Typed
The types are checked during execution.
Examples = Python

Dynamically typed programming that allows the program to change the type of the variable at runtime.
dynamic typing languages : Perl, Ruby, Python, PHP, JavaScript, Erlang
Statically typed, means if you try to store a string in an integer variable, it would not accept it.
Statically typed languages :C, C++, Java, Rust, Go, Scala, Dart

dynamically typed language helps to quickly prototype algorithm concepts without the overhead of about thinking what variable types need to be used (which is a necessity in statically typed language).

Static Typing:
The languages such as Java and Scala are static typed.
The variables have to be defined and initialized before they are used in a code.
for ex. int x; x = 10;
System.out.println(x);
Dynamic Typing:
Perl is an dynamic typed language.
Variables need not be initialized before they are used in code.
y=10; use this variable in the later part of code

Related

Is Haskell a strongly typed programming language?

Is Haskell strongly typed? I.e. is it possible to change the type of a variable after you assigned one? I can't seem to find the answer on the internet.
Static — types are known at compile time. Java and Haskell have static typing. Also C/C++, C#, Go, Scala, Rust, Kotlin, Pascal to list a few more.
A statically typed language might or might not have type inference. Java almost completely lacks type inference (but it's very slowly changing just a little bit); Haskell has full type inference (except with certain very advanced extensions).
(Type inference is when you only have to declare a minimal amount of types by hand, e.g. var isFoo = true and var person = new Person(), instead of bool isFoo = ... and Person person = ....)
Dynamic — Python, JavaScript, Ruby, PHP, Clojure (and Lisps in general), Prolog, Erlang, Groovy etc. Can also be called "unityped"; dynamic typing can be "emulated" in a static setting, but the reverse is not true except by using external static analysis tools. Some languages make it possible to mix dynamic and static (see gradual typing, e.g. https://typedclojure.org/).
Some languages enable static typing for one or more modules, applied at import time, for example: Python+Mypy, Typed Clojure, JavaScript+Flow, PHP+Hack to name a few.
Strong — values that are intended to be treated as Cat always are; trying to treat them like a Dog will cause a loud meeewww... I mean error.
Weak — this effectively boils down to 2 similar but distinct things: type coercion (e.g. "5"+3 equals 8 in PHP — or does it!) and memory reinterpretation (e.g. (int) someCharValue or (bool) somePtr in C, and C++ as well, but C++ wants you to explicitly say reinterpret_cast). So there's really coercion-weak and reinterpretation-weak, and different languages are weak in one or both of these ways.
Interestingly, note that coercion is implicit by nature and memory reinterpretation is explicit (except in Assembly) — so weak typing consists of an implicit and an explicit behavior. Maybe that's even more of a reason to refer to 2 distinct subcategories under weak typing.
There are languages with all 4 possible combinations, and variations/gradations thereof.
Haskell is static+strong; of course it has unsafeCoerce so it can be static+a bit reinterpret-weak at times, but unsafeCoerce is very much frowned upon except in extreme situations where you are sure about something being the case but just can't seem to persuade the compiler without going all the way back and retelling the entire story in a different way.
C is static+weak because all memory can freely be reinterpreted as something it originally was not meant to contain, hence weak. But all of those reinterpretations are kept track of by the type checker, so still fully static too. But C does not do implicit coercions, so it's only reinterpret-weak.
Python is dynamic+almost entirely strong — there are no types known on any given line of code prior to reaching that line during execution, however values that live at runtime do have types associated with them and it's impossible to reinterpret memory. Implicit coercions are also kept to a meaningful minimum, so one might say Python is 99.9% strong and 0.01% coercion-weak.
PHP and JavaScript are dynamic+mostly weak — dynamic, in that nothing has type until you execute and introspect its contents, and also weak in that coercions happen all the time and with things you'd never really expect to be coerced, unless you are only calling methods and functions and not using built-in operations. These coercions are a source of a lot of humor on the internet. There are no memory reinterpretations so PHP and JS are coercion-weak.
Furthermore, some people like to think that static typing is about variables having type, and strong typing is about values having type — this is a very useful way to go about understanding the full picture, but it's not quite true: some dynamically typed languages also allow variables/parameters to be annotated with types/constraints that are enforced at runtime.
In static typing, it's expressions that have a type; the fact of variables having type is only a consequence of variables being used as a means to glue bigger expressions together from smaller ones, so it's not variables per se that have types.
Similarly, in dynamic typing, it's not the variables that lack statically known type — it's all expressions! Variables lacking type is merely a consequence of the expressions they store lacking type.
One final illustration
In dynamic typing, all the cats, dogs and even elephants (in fact entire zoos!) are packaged up in identically sized boxes.
In static typing these boxes look different and have stickers on them saying what's inside.
Some people like it because they can just use a single box form factor and don't have to put any labels on the boxes — it's only the arrangement of boxes with regards to each other that implicitly (and hopefully) provides type sanity.
Some people also like it because it allows them to do all sorts of tricks with tigers temporarily being transported in boxes that smell like lions, and bears put in the same array of interconnected boxes as wolves or deer.
In such label-free setting of transport boxes, all the possible logicistics scenarios need to be played or simulated in order to detect misalignment in the implicit arrangement, like in a stage performance. No reliable guarantees can be given based on reasoning only, generally speaking. (ad-hoc test cases that need for the entire system to be started up for any partial conclusions to be obtained of its soundness)
With labels and explicit rules on how to deal with boxes of various labels, automated/mechanized logical reasoning can be used to draw up conclusions on what the logistics system won't do or will do for sure (static verification, formal proof, or at least pseudo-proof like QuickCheck), Some aspects of the logistics still need to be verified with trial runs, such as whether the logistics team even got the client right. (integration testing, acceptance testing, end user sanity checks).
Moreover, in weak typing dogs can be sliced up and reassembled as frankenstein cats. Whether they like it or not, and whether the result is ugly or not. (weak typing)
But if you add labels to the boxes, it still matters that Frankenstein cats be put in cat boxes. (static+weak typing)
In strong typing, while you can put a cat in the box of a dog, but you can only keep pretending it's a dog until you try to humiliate it by feeding it something only dogs would eat — if that happens, it will scream out loud, but until that time, if you're in dynamic typing, it will silently accept its place (in a static world it would refuse to be put in a dog's box before you can say "kitty").
You seem to mix up dynamic/static and weak/strong typing.
Dynamic or static typing is about whether the type of a variable can be changed during execution.
Weak or strong typing is about being able to predict type errors just from function signatures.
Haskell is both statically and strongly typed.
However, there is no such thing as variable in Haskell so talking about dynamic or static typing makes no sense since every identifier assigned with a value cannot be changed at execution.
EDIT: But like goldenbull said, those typing notions are not clearly defined.
It is strongly typed. See section 2.3 here: Why Haskell matters
I think you are talking about two different things.
First, haskell, and most functional programming (FP) languages, do NOT have the concept "variable". Instead, they use the concept "name" and "value", they just "bind" a value to a name. Once the value is bound, you can not bind another value to the same name, this is the key feature of FP.
Strong typing is another topic. Yes, haskell is strongly typed, and so are most FP languages. Strong typing gives FP the ability of "type inference" which is powerful to eliminate hidden bugs in compile time and help reduce the size of the source code.
Maybe you are comparing haskell with python? Python is also strongly typed. The difference between haskell and python is "static typed" and "dynamic typed". The actual meaning of term "Strong type" and "Weak Type" are ambiguous and fuzzy. That is another long story...

Compiled Language with Dynamic Typing

I'm a bit confused when it comes to a compiled language (compilation to native code) with dynamic typing.
Dynamic typing says that the types in a program are only inferred at runtime.
Now if a language is compiled, there's no interpreter running at runtime; it's just your CPU reading instructions off memory and executing them. In such a scenario, if any instruction violating the type semantics of the language happens to execute at runtime, there's no interpreter to intercept the execution of the program and throw any errors. How does the system work then?
What happens when an instruction violating the type semantics of a dynamically typed compiled language is executed at runtime?
PS: Some of the dynamically typed compiled languages I know of include Scheme, Lua and Common Lisp.
A compiler for a dynamically typed language would simply generate instructions that check the type where necessary. In fact even for some statically typed languages this is sometimes necessary, say, in case of an object oriented language with checked casts (like dynamic_cast in C++). In dynamically types languages it is simply necessary more often.
For these checks to be possible each value needs to be represented in a way that keeps track of its type. A common way to represent values in dynamically typed languages is to represent them as pointers to a struct that contains the type as well as the value (as an optimization this is often avoided in case of (sufficiently small) integers by storing the integer directly as an invalid pointer (by shifting the integer one to the left and setting its least significant bit)).

Basic Concepts of Language Type Systems

Could someone please explain clearly and succinctly the concepts of language type systems?
I've read a post or two here on type systems, but have trouble finding one that answers all my questions below.
I've heard/read that there are 3 type categorizations: dynamic vs static, strong vs weak, safe vs unsafe.
Some questions:
Are there any others?
What do each of these mean?
If a language allows you to change the type of a variable in runtime (e.g. a variable that used to store an int is later used to store a string), what category does that fall in?
How does Python fit into each of these categories?
Is there anything else I should know about type systems?
Thanks very much!
1) Apparently, there are others: http://en.wikipedia.org/wiki/Type_system
2)
Dynamic => Type checking is done during runtime (program execution) e.g. Python.
Static (as opposed to Dynamic) => Type checking is done during compile time e.g. C++
Strong => Once the type system decides that a particular object is of a type, it doesn't allow it to be used as another type. e.g. Python
Weak (as opposed to Strong) => The type system allows objects types to change. e.g. perl lets you read a number as a string, then use it again as a number
Type safety => I can only best describe with a 'C' statement like:
x = (int *) malloc (...);
malloc returns a (void *) and we simply type-cast it to (int *). At compile time there is no check that the pointer returned by the function malloc will actually be the size of an integer => Some C operations aren't type safe.
I am told that some 'purely functional' languages are inherently type safe, but I do not know any of these languages. I think Standard ML or Haskell would be type safe.
3) "If a language allows you to change the type of a variable in runtime (e.g. a variable that used to store an int is later used to store a string), what category does that fall in?":
This may be dynamic - variables are untyped, values may carry implicit or explicit type information; alternatively, the type system may be able to cope with variables that change type, and be a static type system.
4) Python: It's dynamically and strongly typed. Type safety is something I don't know python (and type safety itself) enough to say anything about.
5) "Is there anything else I should know about type systems?": Maybe read the book #BasileStarynkevitch suggests?
You are asking a lot here :) Type system is a dedicated field of computer science!
Starting from the begining, "a type system is method for proving the absence of certain program behavior" (See B.Pierce's Types and Programming Languages, also referred in the other answer). Programs that pass the type checking is a subset of what would be valid programs. For instance, the method
int answer() {
if(true) { return 42; } else { return "wrong"; }
}
would actually behave well at run-time. The else branch is never executed, and the answer always return 42. The static type system is a conservative analysis that will reject this program, because it can not prove the absence of a type error, that is, that "wrong" is never returned.
Of course, you could improve the type system to actually detect that the else branch never happens. You want to improve the type system to reject as few program as possible. This is why type system have been enriched over the years to support more and more refinement (e.g. generic, etc.)
The point of a type system is to prove the absence of type errors. In practice, they support operations like downcasting that inherently imply run-time type checks, and might lead to type errors. Again, the goal is to make the type system as flexible as possible, so that we don't need to resort to these operations that weaken type safety (e.g. generic).
You can read chapter 1 of the aforementionned book for a really nice introduction. For the rest, I will refer you to What To Know Before Debating Type Systems, which is awesome blog post about the basic concepts.
Is there anything else I should know about type systems?
Oh, yes! :)
Happy immersion in the world of type systems!
I suggest reading B.Pierce's Types and Programming Languages book. And I also suggest learning a bit of a statically-typed, with type inference, language like Ocaml or Haskell.
A type system is a mechanism which controls the functions which access values. Compile time checking is one aspect of this, which rejects programs during compilation if an attempt is made to use a function on values it is not designed to handle. However another aspect is the converse, the selection of functions to handle some values, for example overloading. Another example is specialisation of polymorphic functions (e.g. templates in C++). Inference and deduction are other aspects where the type of functions is deduced by usage rather than specified by the programmer.
Parts of the checking and selection can be deferred until run time. Dispatch of methods based on variant tags or by indirection or specialised tables as for C++ virtual functions or Haskell typeclass dictionaries are two examples provided even in extremely strongly typed languages.
The key concept of type systems is called soundness. A type system is sound if it guarantees no value can be used by an inappropriate function. Roughly speaking an unsound type system has "holes" and is useless. The type system of ISO C89 is sound if you remove casts (and void* conversions), and unsound if you allow them. The type system of ISO C++ is unsound.
A second vital concept of types systems is called expressiveness. Sound type systems for polymorphic programming prevent programmers writing valid code: they're universally too restrictive (and I believe inescapably so). Making type systems more expressive so they allow a wider set of valid programs is the key academic challenge.
Another concept of typing is strength. A strong type system can find more errors earlier. For example many languages have type systems too weak to detect array bounds violations using the type system and have to resort to run time checks. Somehow strength is the opposite of expressiveness: we want to allow more valid programs (expressiveness) but also catch even more invalid ones (strength).
Here's a key question: explain why OO typing is too weak to permit OO to be used as a general development paradigm. [Hint: OO cannot handle relations]

Does case sensitivity have anything to do with strongly typed languages (or loosely typed languages)?

(I admit this may be a n00b question - I know very little about CS theory, mostly a hands-on/hobby sort.)
I was googling up strongly-typed language for the official definition, and one of the top links I found was from Yahoo Answers, which suggested that case sensitive was a part of whether a language is loosely/strongly typed.
I had always thought the simple answer to the difference between a strongly typed/weakly typed language is that the first requires explicit type declarations, while the later is more open, even "dynamic".
The two S/O threads (here and here) I found so far seem to suggest that (more or less), but they don't mention anything about case sensitivity. Is there a relation at all between case sensitive and strong/weak?
A couple of clarifications:
Case sensitivity has nothing to do with strong vs. weak typing, static vs. dynamic typing or any other property of the type system. I don't know why the answer on yahoo answers has gotten its one upvote, but it's completely wrong. Just ignore it.
Strong typing isn't a well-defined term, but it is often used to refer to languages with few implicit type conversions, i.e. languages where it is an error to perform operations on types that do not support that operation.
As an example multiplying the strings "foo" and "bar" gives 0 as the result in perl, while it causes a type error in ruby, python, java, haskell, ml and many other languages. Thus those languages are more strongly typed than perl.
Strong typing is also sometimes used as a synonym for static typing.
A statically typed language is a language in which the types of variables, functions and expressions are known at compile time (or before runtime anyway - a statically typed language need not be compiled per se, though in practice it usually is). This means that if a statically typed program contains a type error, it will not run. If a dynamically typed program contains a type error it will run up to the point where the error happens and then crash.
Whether a language requires type annotations is (somewhat) independent of whether its type system is strong or weak or static or dynamic. In theory a dynamically typed language could require (or at least allow) type annotations and then throw runtime errors when those annotations are broken (though I don't know of any dynamically that actually does this).
More importantly there are many statically and strongly typed languages (e.g. Haskell, ML) that don't require type annotations, but instead use type inference algorithms to infer the types.
In theory, case sensitivity is completely unrelated to type strictness. Case sensitivity is about whether the identifiers foo, FOO, and fOo refer to the same variable, function, or what-have-you. Type strictness is about whether variables have types or just values do, how easy it is to convert among types, and so on.
In practice, there might be a correlation between case sensitivity and type strictness, but I can't think of enough case-insensitive languages right now to make an assessment. My impression is that most languages commonly used today are case sensitive — possibly because C was case sensitive and very influential, possibly because it was the only way to force people to stop PROGRAMMING IN ALL CAPS after a couple decades of FORTRAN, COBOL, and BASIC.
No - they're not connected. Strongly type languages force you to specify the type of data that a variable may hold - such as a real number, an integer, a textual string, or some programmer-defined object. You they can't accidentally assign another type of data into that variable unless it is implicitly convertible: examples of this are that you can generally put a integer into a real number (i.e. double x = 3.14; x = 3; is ok but int x = 3; x = 3.14; might not be, depending on how strongly typed the langauge is). Weakly typed languages just store whatever they're asked to without doing these sanity checks. In strongly typed languages like C++, you can still create type that can store data that can be any of a specific set of types (e.g. C++'s boost::variant), but sometimes they're a bit more limited in how much you can do and how convenient it is to use.
Case sensitivity is means that the uppercase and lowercase versions of the same letter are considered equivalent for some purposes... normally in a string comparison or regular expression match. It is unusual but not unheard of for modern computer languages to ignore the case of letters in variable names (identifiers).

Why do a lot of programming languages put the type *after* the variable name?

I just came across this question in the Go FAQ, and it reminded me of something that's been bugging me for a while. Unfortunately, I don't really see what the answer is getting at.
It seems like almost every non C-like language puts the type after the variable name, like so:
var : int
Just out of sheer curiosity, why is this? Are there advantages to choosing one or the other?
There is a parsing issue, as Keith Randall says, but it isn't what he describes. The "not knowing whether it is a declaration or an expression" simply doesn't matter - you don't care whether it's an expression or a declaration until you've parsed the whole thing anyway, at which point the ambiguity is resolved.
Using a context-free parser, it doesn't matter in the slightest whether the type comes before or after the variable name. What matters is that you don't need to look up user-defined type names to understand the type specification - you don't need to have understood everything that came before in order to understand the current token.
Pascal syntax is context-free - if not completely, at least WRT this issue. The fact that the variable name comes first is less important than details such as the colon separator and the syntax of type descriptions.
C syntax is context-sensitive. In order for the parser to determine where a type description ends and which token is the variable name, it needs to have already interpreted everything that came before so that it can determine whether a given identifier token is the variable name or just another token contributing to the type description.
Because C syntax is context-sensitive, it very difficult (if not impossible) to parse using traditional parser-generator tools such as yacc/bison, whereas Pascal syntax is easy to parse using the same tools. That said, there are parser generators now that can cope with C and even C++ syntax. Although it's not properly documented or in a 1.? release etc, my personal favorite is Kelbt, which uses backtracking LR and supports semantic "undo" - basically undoing additions to the symbol table when speculative parses turn out to be wrong.
In practice, C and C++ parsers are usually hand-written, mixing recursive descent and precedence parsing. I assume the same applies to Java and C#.
Incidentally, similar issues with context sensitivity in C++ parsing have created a lot of nasties. The "Alternative Function Syntax" for C++0x is working around a similar issue by moving a type specification to the end and placing it after a separator - very much like the Pascal colon for function return types. It doesn't get rid of the context sensitivity, but adopting that Pascal-like convention does make it a bit more manageable.
the 'most other' languages you speak of are those that are more declarative. They aim to allow you to program more along the lines you think in (assuming you aren't boxed into imperative thinking).
type last reads as 'create a variable called NAME of type TYPE'
this is the opposite of course to saying 'create a TYPE called NAME', but when you think about it, what the value is for is more important than the type, the type is merely a programmatic constraint on the data
If the name of the variable starts at column 0, it's easier to find the name of the variable.
Compare
QHash<QString, QPair<int, QString> > hash;
and
hash : QHash<QString, QPair<int, QString> >;
Now imagine how much more readable your typical C++ header could be.
In formal language theory and type theory, it's almost always written as var: type. For instance, in the typed lambda calculus you'll see proofs containing statements such as:
x : A y : B
-------------
\x.y : A->B
I don't think it really matters, but I think there are two justifications: one is that "x : A" is read "x is of type A", the other is that a type is like a set (e.g. int is the set of integers), and the notation is related to "x ε A".
Some of this stuff pre-dates the modern languages you're thinking of.
An increasing trend is to not state the type at all, or to optionally state the type. This could be a dynamically typed langauge where there really is no type on the variable, or it could be a statically typed language which infers the type from the context.
If the type is sometimes given and sometimes inferred, then it's easier to read if the optional bit comes afterwards.
There are also trends related to whether a language regards itself as coming from the C school or the functional school or whatever, but these are a waste of time. The languages which improve on their predecessors and are worth learning are the ones that are willing to accept input from all different schools based on merit, not be picky about a feature's heritage.
"Those who cannot remember the past are condemned to repeat it."
Putting the type before the variable started innocuously enough with Fortran and Algol, but it got really ugly in C, where some type modifiers are applied before the variable, others after. That's why in C you have such beauties as
int (*p)[10];
or
void (*signal(int x, void (*f)(int)))(int)
together with a utility (cdecl) whose purpose is to decrypt such gibberish.
In Pascal, the type comes after the variable, so the first examples becomes
p: pointer to array[10] of int
Contrast with
q: array[10] of pointer to int
which, in C, is
int *q[10]
In C, you need parentheses to distinguish this from int (*p)[10]. Parentheses are not required in Pascal, where only the order matters.
The signal function would be
signal: function(x: int, f: function(int) to void) to (function(int) to void)
Still a mouthful, but at least within the realm of human comprehension.
In fairness, the problem isn't that C put the types before the name, but that it perversely insists on putting bits and pieces before, and others after, the name.
But if you try to put everything before the name, the order is still unintuitive:
int [10] a // an int, ahem, ten of them, called a
int [10]* a // an int, no wait, ten, actually a pointer thereto, called a
So, the answer is: A sensibly designed programming language puts the variables before the types because the result is more readable for humans.
I'm not sure, but I think it's got to do with the "name vs. noun" concept.
Essentially, if you put the type first (such as "int varname"), you're declaring an "integer named 'varname'"; that is, you're giving an instance of a type a name. However, if you put the name first, and then the type (such as "varname : int"), you're saying "this is 'varname'; it's an integer". In the first case, you're giving an instance of something a name; in the second, you're defining a noun and stating that it's an instance of something.
It's a bit like if you were defining a table as a piece of furniture; saying "this is furniture and I call it 'table'" (type first) is different from saying "a table is a kind of furniture" (type last).
It's just how the language was designed. Visual Basic has always been this way.
Most (if not all) curly brace languages put the type first. This is more intuitive to me, as the same position also specifies the return type of a method. So the inputs go into the parenthesis, and the output goes out the back of the method name.
I always thought the way C does it was slightly peculiar: instead of constructing types, the user has to declare them implicitly. It's not just before/after the variable name; in general, you may need to embed the variable name among the type attributes (or, in some usage, to embed an empty space where the name would be if you were actually declaring one).
As a weak form of pattern-matching, it is intelligable to some extent, but it doesn't seem to provide any particular advantages, either. And, trying to write (or read) a function pointer type can easily take you beyond the point of ready intelligability. So overall this aspect of C is a disadvantage, and I'm happy to see that Go has left it behind.
Putting the type first helps in parsing. For instance, in C, if you declared variables like
x int;
When you parse just the x, then you don't know whether x is a declaration or an expression. In contrast, with
int x;
When you parse the int, you know you're in a declaration (types always start a declaration of some sort).
Given progress in parsing languages, this slight help isn't terribly useful nowadays.
Fortran puts the type first:
REAL*4 I,J,K
INTEGER*4 A,B,C
And yes, there's a (very feeble) joke there for those familiar with Fortran.
There is room to argue that this is easier than C, which puts the type information around the name when the type is complex enough (pointers to functions, for example).
What about dynamically (cheers #wcoenen) typed languages? You just use the variable.

Resources