Why is there such a clear cut between interpreted and compiled languages? - programming-languages

When learning a compiled language like C or C++, you get to know the compiler. In order to run your code, you have to compile it first. Compiling your code translates it from a textual representation into something that can be executed. The resulting code is very fast and can make use of preprocessors and the like.
When learning a dynamic language like Python, Matlab, or Ruby, you get to know the interpreter. In order to run your code, you just type it into the interpreter. Thus, you can play with your code at runtime and change the behavior of your program on the fly. The downside of this seem to be that interpreted languages are rather slow and the lack of a clear compilation time seems to makes preprocessors impossible.
Then there are just-in-time compilers which are used like interpreted languages but with less of a performance deficit compared to compiled languages. But they generally do not sport preprocessors and do not output ready-to-run binaries.
And then I learned Lisp, which can be compiled, interpreted and what have you, all the while being both fast and having a powerful preprocessing system (macros). This seems to be common sense in the Lisp world, but not anywhere else.
Why are there no popular interpreters for C or compilers for Python? Why the strong divide between interpreted and compiled languages? (I know some projects exist that can compile Python or interpret C, but in general they seem to not be very popular).

Most popular compiled languages were designed from the ground up to be compiled: they tend to avoid features that would make it difficult to produce efficient compiled code. These language features include the convenient "dynamic" ones such as dynamic typing, nonuniform containers, and ad hoc object namespaces.
So, an interpreter for a compiled language can't take advantage of the dynamic features that are available to a interpreted language, but lacks the performance advantages of a compiled implementation.
Conversely, a compiler must duplicate all features and behavior of an interpreted language, regardless of expense. In general, this means that the compiled program for an interpreted language will carry much of the overhead of the interpreter. As one example, any kind of eval() functionality effectively requires inclusion of the interpreter.
Finally, these effects are amplified by the mutually reinforcing advantages of large user base, good support, and robust implementation.

Related

What qualifies a programming language as dynamic?

What qualifies a programming language to be called dynamic language? What sort of problems should I use a dynamic programming language to solve? What is the main difference between static programming languages and dynamic programming languages?
I don't think there is black and white here - there is a whole spectrum between dynamic and static.
Let's take two extreme examples for each side of the spectrum, and see where that takes us.
Haskell is an extreme in the static direction.
It has a powerful type system that is checked at compile time: If your program compiles it is free from common and not so common errors.
The compiled form is very different from the haskell program (it is a binary). Consequently runtime reflection and modification is hard, unless you have foreseen it. In comparison to interpreting the original, the result is potentially more efficient, as the compiler is free to do funky optimizations.
So for static languages I usually think: fairly lengthy compile-time analysis needed, type system will prevent me from making silly mistakes but also from doing some things that are actually valid, and if I want to do any manipulation of a program at runtime, it's going to be somewhat of a pain because the runtime representation of a program (i.e. its compiled form) is different from the actual language itself. Also it could be a pain to modify things later on if I have not foreseen it.
Clojure is an extreme in the dynamic direction.
It too has a type system, but at compile time there is no type checking. Many common errors can only be discovered by running the program.
Clojure programs are essentially just Clojure lists (the data structure) and can be manipulated as such. So when doing runtime reflection, you are actually processing a Clojure program more or less as you would type it - the runtime form is very close to the programming language itself. So you can basically do the same things at runtime as you could at "type time". Consequently, runtime performance may suffer because the compiler can't do many up-front optimizations.
For dynamic languages I usually think: short compilation step (basically just reading syntax), so fast and incremental development, practically no limits to what it will allow me to do, but won't prevent me from silly mistakes.
As other posts have indicated, other languages try to take more of a middle ground - e.g. static languages like F# and C# offer reflection capabilities through a separate API, and of course can offer incremental development by using clever tools like F#'s REPL. Dynamic languages sometimes offer optional typing (like Racket, Strongtalk), and generally, it seems, have more advanced testing frameworks to offset the lack of any sanity checking at compile time. Also type hints, while not checked at compile time, are useful hints to generate more efficient code (e.g. Clojure).
If you are looking to find the right tool for a given problem, then this is certainly one of the dimensions you can look at - but by itself is not likely to force a decision either way. Have a think about the other properties of the languages you are considering - is it a functional or OO or logic or ... language? Does it have a good framework for the things I need? Do I need stability and binary backwards compatibility, or can I live with some churn in the compiler? Do I need extensive tooling?Etc.
Dynamic language does many tasks at runtime where a static language would do them at compile-time.
The tasks in question are usually one or more of: type system, method dispatch and code generation.
Which also pretty much answers the questions about their usage.
There are a lot of different definitions in use, but one possible difference is:
A dynamic language typically uses dynamic typing.
A static language typically uses static typing.
Some languages are difficult to classify as either static or dynamically typed. For example, C# is traditionally regarded as a statically typed language, but C# 4.0 introduced a static type called dynamic which behaves in some ways more like a dynamic type than a static type.
What qualifies a programming language to be called dynamic language.
Dynamic languages are generally considered to be those that offer flexibility at run-time. Note that this does not necessarily conflict with static type systems. For example, F# was recently voted "favorite dynamic language on .NET" at a conference even though it is statically typed. Many people consider F# to be a dynamic language because it offers run-time features like meta-circular evaluation, a Read-Evaluate-Print-Loop (REPL) and dynamic typing (of sorts). Also, type inference means that F# code is not littered with type declarations like most statically typed languages (e.g. C, C++, Java, C# 2, Scala).
What are the problems for which I should go for dynamic language to solve.
In general, provided time and space are not of critical importance you probably always want to use languages with run-time flexibility and capabilities like run-time compilation.
This thread covers the issue pretty well:
Static/Dynamic vs Strong/Weak
The question is asked during Dynamic Languages Wizards Series - Panel on Language Design (at 24m 04s).
Answer from Jonathan Rees:
You know one when you see one
Answer from Guy Steele:
A dynamic language is one that defers as many decisions as possible to run time.
For example about array size, the number of data objects to allocate, decisions like that.
The concept is deferring until runtime, that's what I understand to be dynamic.

Looking for a new language that supports both interpreted and native compilation modes

I currently program in Perl, Python, C#, C, C++, Java, and a few other languages, and I'm looking for a new language to use as a primary when doing personal projects.
My current criteria are:
can be run as an interpreted language (i.e., run without having to wait to compile it);
can be compiled to native code;
are strongly typed (even if optionally);
support macros/templating/code morphing/wtf you want to call it;
has a decent number of libraries for it, or easily accessible to it;
Ideas? Suggestions?
I would suggest that Haskell suits your criteria.
Can be run as an interpreted language? Yes, via GHCI.
Can be compiled to native code? Yes.
Is strongly typed? Very much so. Perhaps even the most strongly typed language today, with the exception of some theorem provers like Agda.
Support macros/templating/morphing? If you use template haskell. This is an optional extension of the language however, so most libraries don't use macros. I haven't used template haskell myself so i can't comment on if it's any good.
Has decent library support? The standard library is not bad. There is also Hackage, an open repository of Haskell libraries a bit in the style of CPAN.
Additionally, it sounds like you already know a lot of imperative/object oriented languages. IMHO if you learn another one of those langs. it will probably be a slightly different permutation of features you've already seen somewhere else. Adding another programming paradigm like functional programming to your toolbox will probably be a better learning experience. Though I guess whether that's an advantage or not depends on if you want to learn new things or be productive quickly.
Common Lisp fits: there is an optional typing, efficient native compilation is available, powerful REPL makes it a perfect choice for scripting, and there is a powerful macro metaprogramming.
OCaml fits as well, with CamlP4 for metaprogramming.
Scala? It does run scripts, although they are compiled (transparently) first. I'm not sure what you mean by code morphing etc, but it's pretty good for DSLs. It meets all your other requirements - compiled as much as Java is, strongly typed, and has a reasonable number of its own libraries as well as all of Java's. I'm still a beginner with it, but I like it so far.

Functional programming languages introspection

I'm sketching a design of something (machine learning of functions) that will preferably want a functional programming language, and also introspection, specifically the ability to examine the program's own code in some nicely tractable format, and preferably also the ability to get machine generated code compiled at runtime, and I'm wondering what's the best language to write it in. Lisp of course has strong introspection capabilities, but the statically typed languages also have advantages; the ones I'm considering are:
F# - the .Net platform has a good story here, you can read byte code at run time and also emit byte code and get it compiled; I assume there's no problem accessing these facilities from F#.
Haskell, Ocaml - do these have similar facilities, either via byte code or parse tree?
Are there other languages I should also be looking at?
Haskell's introspection mechanism is Template Haskell, which supports compile time metaprogramming, and when combined with e.g. llvm, provides runtime metaprogramming facilities.
Ocaml has:
Camlp4 to manipulate Ocaml concrete syntax trees in Ocaml. The maintained implementation of Camlp4 is Camlp5.
MetaOCaml for full-scale multi-stage programming.
Ocamljit to generate native code at run time, but I don't think it's been maintained recently.
Ocaml-Java to compile Ocaml code for the Java virtual machine. I don't know if there are nice reflection capabilities.
Not really an answer, but note also the F# Quotations feature and library, for more homoiconicity stuff.
You might check out the typed variant of Racket (previously known as PLT Scheme). It retains most of the syntactic simplicity of Scheme, but provides a static type system. Since Racket is a Scheme, metaprogramming is par for the course, and the runtime can emit native code by way of a JIT.
The Haskell approach would be more along the lines of parsing the source. The Haskell Platform includes a complete source parser, or you can use the GHC API to get access that way.
I'd also look at Scala or Clojure which come with them all the libraries that have been developed for Java. You'll never need to worry if a library does not exist. But more to the point of your question, these languages give you the same reflection (or more powerful types) that you will find within Java.
I'm sketching a design of something (machine learning of functions) that will preferably want a functional programming language, and also introspection, specifically the ability to examine the program's own code in some nicely tractable format, and preferably also the ability to get machine generated code compiled at runtime, and I'm wondering what's the best language to write it in. Lisp of course has strong introspection capabilities, but the statically typed languages also have advantages; the ones I'm considering are:
Can you not just parse the source code like an ordinary interpreter or compiler? Why do you need introspection?
F# - the .Net platform has a good story here, you can read byte code at run time and also emit byte code and get it compiled; I assume there's no problem accessing these facilities from F#.
F# has a rudimentary quotation mechanism but you can only quote some expressions and not other kinds of code, most notably type definitions. Also, its evaluation mechanism is orders of magnitude slower than genuine compilation so it is basically completely useless. You can use reflection to analyze type definitions but, again, it is quite rudimentary.
You can read byte code but that has been compiled so a lot of information and structure has been lost.
F# also has lexing and parsing technology (most notably fslex, fsyacc and FParsec) but it is not as mature as OCaml's.
Haskell, Ocaml - do these have similar facilities, either via byte code or parse tree?
Haskell has Template Haskell but I've never heard of anyone using it (abandonware?).
OCaml has its Camlp4 macro system and a few people do use it but it is poorly documented.
As for lexing and parsing, Haskell has a few libraries (most notably Parsec) and OCaml has many libraries.
Are there other languages I should also be looking at?
Term rewrite languages like Mathematica would be an obvious choice because they make it trivial to manipulate code. The Pure language might be of interest.
You might also consider MetaOCaml for its run-time compilation capabilities.

How can a programming language be "implemented"?

maybe this is just a little misunderstanding but how can a programming language be implemented?
I'm not talking about how to implement my own programming language but about the word "implemented"?
I mean, you can implement a compiler or an interpreter, but a programming language?
What does it mean if I read "C++ is implemented in C" or "Python was implemented in C"?
I think a language is more sth. like a protocol of how someone thinks about things should be implemented. For example, if he wants do display a messagebox he can say the command for this is ShowMessageBox(string) and implement a compiler who will translate this into something that works on a computer (aside from the selected programming paradigms he imagines).
I think this question leads to the question "what is a programming language in reality"? A compiler, an interpreter or just a documented language standard about how things should be implemented in a language?
[EDIT]
Answer: Languages are never implemented, only compilers/interpreters etc. It's this simple.
Here's a very academic answer (from a longtime academic).
First I'll reframe the question:
What does it mean for a programming language to be implemented?
I'll start with "what is a programming language":
A programming language is a formal language (a set of utterances we can characterize precisely through algorithmic rules) such that a sentence in the language has a computational meaning. There are a variety of ways to give computation meaning; two of the most popular are that a computation stands for a function (from values to values, or from machine states to machine states) and that a computation stands for a machine that makes "state transitions" and interacts with the outside world.
A language is implemented when a means is provided to read in an utterance and perform the computation, that is, calculate the function or perform the behavior. The means is the implementation.
Typical implementations include
Direct interpretation of the language syntax. This model is rare but FORTH probably comes closest to it.
Translation of the syntax into virtual-machine code, also called bytecode, which is itself another language and which is interpreted. It is popular to write bytecode interpreters in C. Lua, Perl, Python, and Ruby are implemented more or less this way.
Translation of the syntax into hardware machine instructions, which is itself another language, and which is interpreted by your CPU. C and C++ are typically (but not always) implemented this way.
Direct interpretation of the language in hardware. IA-32 machine code and AMD64 machine code are implemented this way.
When a person says "Language X is implemented in Y", they are usually saying that a translator for X or an interpreter for X's bytecode is written in language Y.
One of the great secrets of compiler writers is the ability to write the compiler for language X in language X itself. If this interests you, get Andrew Appel's paper Axiomatic Bootstrapping: A Guide for Compiler Hackers.
Sometimes the answer to this question is not obvious. Squeak Smalltalk writes both a translator and a bytecode interpreter in Smalltalk, then translates the interpreter to C, which is translated to machine code. What is Squeak implemented in? Smalltalk.
Poke a professor; get a lecture.
You are right, those statements don't make any sense. It's pretty obvious that whoever made those statements doesn't understand the difference between a programming language and a compiler (or interpreter).
This is a surprisingly common problem. For example, sometimes people talk about interpreted languages or compiled languages. That's the same thing: languages aren't interpreted or compiled, they just are. Interpretation and compilation are traits of the implementation not the language.
Another goodie: Python has a GIL. No, it doesn't: one implementation of Python has a GIL, all the other implementations don't, and the Python Language itself certainly doesn't. Or: Ruby has green threads. Again, not true: Ruby has threads. Period. Whether any particular language implementation chooses to implement them as green threads, native threads, platform threads or whatever, is a trait of that particular implementation, not of Ruby. And of course my favorite: Ruby 1.9 is faster than Ruby 1.8. This doesn't even make sense: Ruby 1.9 and Ruby 1.8 are programming languages, i.e. a bunch of abstract mathematical rules. You cannot run a programming language, therefore a programming language can never be "faster" or "slower" than another one.
The most blatant confusion about the difference between programming languages and implementations is the Computer Language Benchmark Game, which claims to benchmark languages against each other but in fact benchmarks implementations.
All of these are just different expressions of the fact that apparently some people seem to be fundamentally incapable of grasping the concept of abstraction. Or at least the concept of having an abstract language and a concrete implementation of that language.
If we go back to the statement that "Python is implemented in C", it should now be obvious that that statement is not just wrong. If the statement were wrong that would imply that the statement even makes sense, i.e. that there is some possible world out there, in which it could at least theoretically be right. But that's not the case. The statement is neither wrong nor right, it simply doesn't make sense. If English were a typed language, it would be a type error.
Python is a programming language. Programming languages aren't implemented in anything. They are just implemented. Compilers and interpreters are implemented in languages. But even if you interpret the statement this way, it isn't true: Jython is implemented in Java, IronPython is implemented in C#, PyPy is implemented in RPython and Python, Pynie is implemented in PGE, NQP and PIR. (Oh, and all of those implementations have compilers, so there goes your "Python is an interpreted language".) Similar with Ruby: Rubinius is implemented in Ruby and C++, JRuby and XRuby are implemented in Java, IronRuby and Ruby.NET are implemented in C#, HotRuby is implemented in ECMAScript, Red Sun is implemented in ActionScript, RubyGoLightly is implemented in Go, Cardinal is implemented in PGE, NQP and PIR, SmallRuby is implemented in Smalltalk/X, MagLev is implemented in GemStone Smalltalk and Ruby, YARI is implemented in Io. And for C++: Clang (which is the C, C++ and Objective-C front-end for LLVM) is implemented in C++ (all three front-ends are implemented in C++).
"C++ is implemented in C". I understand this as "C++ compiler is written in C language". Quite simple, without too much philosophy.
Generally, C++ compiler can be written in any language, including C++ itself (except of the first compiler version).
"Python was implemented in C" means that at least one Python compiler (in this case the most commonly used one) is written using C. The developers of that implementation of Python made a deliberate decision not to use C++. As a statement it is incomplete as Python has also been implemented in Java, in C# and in Python.
The main relevance is that it gives you some idea of the systems you might be able to port the language onto: anything targeted by a C compiler should (at least in theory) be capable of running the C implementation of Python, but if they'd chosen to use C++ there would be a smaller set of systems that could run it.
C++ usually isn't implemented in C these days: I believe it is usually implemented in C++. It is quite common for languages to be implemented in the same language (or a subset of the language) as it means you are no longer dependent on some other unrelated language being available for the target. To bootstrap onto a new system you cross compile from some other system.
If you compile gcc for a new platform the build process involves compiling the source code once with whatever compiler is already available (perhaps an older gcc), then compiling it a second time with the newly compiled compiler, then compiling it a third time with the output from the second compilation. If the second and third versions aren't identical you get a build error. If they are identical then you've got a pretty good indication that it compiled correctly.
A programming language is a standard. Its interpreter or compiler is an implementation of this standard.
To build a new language, you don't necessarily needs to do in in low level machine code (assembly for instance). So, using another language to accomplish your goal (creating a new language here) is perfectly normal. So, when we say: Python was implemented in C, it just means that C was used to create that language. For instance, C can be complied on many different architecture, so the programmers doesn't have to take care of the different type of computers (portable).
A language is just a way to express yourself to the computer. Today, it can be done in various ways. But when you use the same syntax as the language and create your own framework, it's called a library or framework. A programming language is just a notation for writing program. If the notation change, you have a different language. Like French or Spanish comes from Latin. (French is implemented in Latin ;)
Why is there so many different languages? Because the goal of a language is to solve complex problems. So, depending on what you want to try yo accomplish, choosing the appropriate language can be an important decision.
The statement "Language X is implemented in Language Y" makes sense and is true if and only if there exists a canonical implementation of Language X and that implementation is written in Language Y. In common usage, either the first or the most popular implementation is often assumed to be canonical.
For example, Perl is one of the few languages with a definitive canon. "Python is implemented in C" makes sense if CPython is taken to be the canonical implementation of Python, and "C++ is implemented in C" is true for CFront, the original implementation of "C with classes" by Bjarne Stroustrup.
The direct answer:
Implementation in the context you are talking about just means written and language actually means compiler.
The original C++ compiler was as I understand it written in C. There is nothing (apart from knowledge and time) to stop you from writing a C++ compiler in another language.
Implementation is the code that makes software work. Often we talk about the implementation of a function as in: "the function has not been implemented yet."
eg
void foo()
{
//function has not been implemented yet
throw();
}
This often happens during the design phase of a program because the call needs to be there in order to write/debug/concept test the calling code but we haven't got round to implementing (writing the code to go insde the function)

Which languages are dynamically typed and compiled (and which are statically typed and interpreted)?

In my reading on dynamic and static typing, I keep coming up against the assumption that statically typed languages are compiled, while dynamically typed languages are interpreted. I know that in general this is true, but I'm interested in the exceptions.
I'd really like someone to not only give some examples of these exceptions, but try to explain why it was decided that these languages should work in this way.
Here's a list of a few interesting systems. It is not exhaustive!
Dynamically typed and compiled
The Gambit Scheme compiler, Chez Scheme, Will Clinger's Larceny Scheme compiler, the Bigloo Scheme compiler, and probably many others.
Why?
Lots of people really like Scheme. Programs as data, good macro system, 35 years of development, big community. But they want performance. Hence, a number of good native-code compilers—Chez Scheme is even a successful commercial product (interpreted bytecodes are free; native codes you pay for).
The LuaJIT just-in-time compiler for Lua.
Why?
To show it could be done. And then, people started to like getting 3x speedup on their Lua programs. Lua is in a lot of games, where performance matters, plus it's creeping into other products too. 70% of the code in Adobe Lightroom is Lua.
The iconc Icon-to-C compiler.
Why?
The fifty people who used it loved Icon. Totally unusual evaluation model, the most innovative (and in my opinion, best) string-processing system ever designed. But that evaluation model was really expensive, especially on late-1980s computers. By compiling Icon to C, the Icon Project made it possible for big Icon programs to run in fewer hours.
Conclusion: people first develop an attachment to a dynamically typed language, and probably a significant code base. Eventually, the community spits out a native-code compiler so that you can get better performance and solve bigger problems.
Statically Typed and Interpreted
This category is less common, but...
Objective Caml. Dialect of ML, vehicle for lots of innovative experiments in language design.
Why?
Very portable system and very fast compilation times. People like both properties, so the new language-design ideas are desseminated widely.
Moscow ML. Standard ML with a few extra features of the modules system.
Why?
Portable, fast compilation times, easy to make an interactive read/eval/print loop. Became a popular teaching compiler.
C-Terp. An old product, I think maybe from Gimpel Software. Saber C—a product I don't think you can buy any more.
Why?
Debugging. Especially, debugging on 1980s hardware under MS-DOS. For very little resources, you could get really good help debugging C code on very limited hardware (think: 4.77MHz processor with an 8-bit bus, 640K of RAM fully loaded). Nearly impossible to get a good visual debugger for native-compiled code, but with the interpreter, fairly easy.
UCSD Pascal—the system that made "P-code" a household word.
Why?
Teachers liked Niklaus Wirth's language design, and the compiler could run on very small machines. Wirth's clean design and the UCSD P-system made an unbeatable combination, and Pascal was the standard teaching language of the 1970s. Younger people may find it hard to appreciate that in the 1970s there was no debate over what language to teach in the first course. Today I know of programs using C, C++, Haskell, Java, ML, and Scheme. In the 1970s it was always Pascal, and the UCSD P-system was a big reason way.
In case you are wondering, P stood for portable.
Summary: Interpreting a statically typed language is a great way to get an implementation into everybody's hands quickly. (It also had advantages for debugging on Bronze Age hardware.)
Objective-C is compiled and supports dynamic typing (certainly when calling methods via [target doSomething] syntax). That is, you can send any message to a target (using ordinary language syntax, without programming against a reflection API), receive only a warning at compile time that it might not be handled, and receive an exception only at runtime if the target doesn't respond to that selector (which is like a method signature); and you can ask any object (which can all be of static type id if your code doesn't know any better or doesn't care) whether it respondsToSelector: to probe its capabilities.
Java (a statically typed language) is compiled to JVM bytecode, which was interpreted on older versions of the JVM, whereas it now uses Just In Time (JIT) compilation, meaning machine code is generated at runtime. I also believe ML and its dialects can be interpreted, and ML is definitely statically typed.
Python is a dynamic language that has compilers.
See this SO question - Python - why compile?, for instance.
In general, compiling makes the program run much faster.
Actionscript has dynamic typing and compiles to bytecode.
And it even compiles right down to native machine code if you want to release a Flash app on the iPhone.

Resources