Is it possible to create a universal intermediate programming language? - programming-languages

What I mean is, is there a language or could one be designed, such that all high level programming languages could be compiled into this intermediate language?
This is excluding machine languages.

Every general-purpose language that is Turing-complete is a universal programming language.
Two languages (or machines) are considered to be Turing-equivalent if any program for one can be compiled into a program for the other. A language is Turing-complete if it is Turing-equivalent to a Turing machine.
There were several early efforts to formalize the notion of a computation; a Turing machine was one, the lambda calculus another, and the class of general recursive functions a third. Alonzo Church and Alan Turing proved that all three of these formalizations were Turing-equivalent; any program for a Turing machine could be compiled to the lambda calculus, and vice versa, as could any general recursive function be implemented by either the lambda calculus or a Turing machine, and again vice versa.
The Church-Turing thesis hypothesizes that any computation that can be expressed in any formal system can be converted into a program that can run on a Turing machine; or equivalently, can be expressed in the untyped lambda calculus, or is general recursive, based on the equivalence described above.
It is merely a hypothesis and cannot be formally proven, as there is no way to formally characterize the class of computations that are subject to it (without circular reasoning by defining them as the class of computations that can be performed by a Turing machine), but there has never been a proposed model of computation that a Turing machine cannot simulate.
Because you can write a simulator of a Turing machine (or implementation of lambda calculus) in almost any general purpose language, and likewise those languages can be compiled to a program running on a Turing machine, pretty much all general purpose languages are Turing complete.
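To make this concrete, here is a minimal sketch of such a simulator, written here in Haskell though almost any general-purpose language would do; the transition table, state names and tape encoding are purely illustrative. The example machine flips every bit on its tape and halts at the first blank.

    import qualified Data.Map as Map

    data Move = L | R

    -- Transition table: (state, symbol) -> (symbol to write, move, next state).
    -- A missing entry means the machine halts.
    type Rules = Map.Map (String, Char) (Char, Move, String)

    -- Tape: symbols left of the head (nearest first), the head symbol,
    -- and symbols to the right. Blank cells are '_'.
    type Tape = ([Char], Char, [Char])

    step :: Rules -> String -> Tape -> Maybe (String, Tape)
    step rules state (ls, c, rs) = do
      (c', move, state') <- Map.lookup (state, c) rules
      pure $ case move of
        R -> (state', (c' : ls, headOr '_' rs, drop 1 rs))
        L -> (state', (drop 1 ls, headOr '_' ls, c' : rs))
      where headOr d xs = if null xs then d else head xs

    run :: Rules -> String -> Tape -> Tape
    run rules state tape = maybe tape (uncurry (run rules)) (step rules state tape)

    -- Illustrative machine: invert a binary string, scanning right until a blank.
    flipBits :: Rules
    flipBits = Map.fromList
      [ (("scan", '0'), ('1', R, "scan"))
      , (("scan", '1'), ('0', R, "scan")) ]

    main :: IO ()
    main = print (run flipBits "scan" ("", '1', "011"))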
There are, however, some languages which are not Turing complete; regular expressions are an example. They can be simulated by a Turing machine, but they cannot in turn simulate a Turing machine.
Note that none of this addresses efficiency or access to host system resources; it merely says that the same computation can be expressed and will eventually produce the same answer. Some Turing-complete languages cannot compute certain problems at the same asymptotic efficiency as other languages. And some languages provide access to external resources like the filesystem, I/O, and networking, while others only allow computation in memory; but in any Turing-complete language it would be possible to add an API or a way of manipulating memory that provides access to those external resources, so lack of access to system resources isn't a fundamental limitation, just a limitation of the implementation.
As a more practical matter, there are several languages that have been designed to be portable, intermediate languages that are targets of compilation. The LLVM IR is one commonly used example, C-- is another. Also, any bytecode for a language runtime acts this way, the JVM is a compilation target for many languages, the CLR is another. Finally, many languages compile to C, as C compilers are widely available and the code is more portable than machine code.
And more recently, with the advent of the web and JavaScript being a language that is available in every web browser, JavaScript has become a popular target for compilation, both for languages that were designed to compile down to JavaScript, like CoffeeScript and Dart, and for existing languages that were originally designed to compile to machine code, via projects like Emscripten. Recognizing this usage, there has been an effort to specify a stricter subset of JavaScript, known as asm.js, that makes a better target for compilation while still allowing the same code to run, backwards-compatibly, on regular JavaScript engines that don't know anything about asm.js.

Related

Is natural language Turing complete?

I'm pretty sure a human language (e.g. English) is powerful enough to simulate a Turing machine, which would make it Turing complete. However, that would imply natural languages are no more or less expressive than programming languages, which seems questionable.
Is natural language Turing complete?
First of all "Is language X Turing complete" is only a well-defined question given a well-defined semantics for language X. It is nearly impossible to define one for natural languages due to natural languages' complex nature and reliance on context and intuition. Most (all?) natural languages don't even have a well-defined syntax.
That aside, your main confusion is based on the assumption that it's not possible for a computational model to be strictly more powerful than a Turing machine, i.e. be able to simulate a Turing machine, but also to express computations that a Turing machine can not. This is not true. For example we can extend Turing machines with oracles and we get a computational model that's strictly more powerful than plain Turing machines.
In the same vein we could define a programming language MagicLang that can do everything an ordinary programming language can do plus solve the halting problem. Defining a semantics for such a language is easy: just take the semantics of the language we used as a basis and add a function bool halts(string src, string input) with the semantics "returns true if the program described by the source code src successfully terminates after a finite amount of time when given the input input". So that's easy. What's hard, or rather impossible, is implementing this language.
Now one may argue that natural language can also describe the halting problem and our brain can "execute" natural language, i.e. it can answer the question "does this program halt". So if we could build a computer that could do everything our brain can do, it should be able to do this as well. But the thing is our brain can't solve the halting problem with 100% accuracy. Our brain can't even execute regular programs with 100% accuracy. Just remember how often you've stepped through a program in your head and came up with a different result than reality. Our brain is very good at learning, making intuitive connections and applying heuristics, but those things always come with the risk of giving the wrong result.
So could a computer do the same thing? Yes, we can use heuristics and machine learning to approach otherwise unsolvable problems and with that normal programming languages can attempt to solve every problem that can be described in natural language (even the undecidable ones). But just like the brain, those programs will sometimes give wrong results. In fact they will give wrong results much more often as our machine learning algorithms and heuristics aren't nearly as advanced as those of the human brain.
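As a toy illustration of that heuristic approach (my own example in Haskell, not from the answer above): instead of the impossible total halts, we can test whether the Collatz iteration reaches 1 within a step budget. A "True" answer is reliable; a "False" answer may simply mean we gave up too early, which is exactly the kind of wrong result described above.

    -- Approximate "does this computation halt?" with a step budget.
    haltsWithin :: Int -> Integer -> Bool
    haltsWithin budget n
      | n == 1      = True                               -- reached 1: it halts
      | budget <= 0 = False                              -- gave up: possibly wrong
      | even n      = haltsWithin (budget - 1) (n `div` 2)
      | otherwise   = haltsWithin (budget - 1) (3 * n + 1)

    main :: IO ()
    main = print (map (haltsWithin 1000) [1 .. 10])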
If a software language is sufficiently complex that it can be used to define arbitrary extensions to itself (such as defining arbitrary new functions), then it's clearly Turing-complete.
Using natural language I can, given sufficient time, teach another human terminology and concepts to extend their understanding and ability to discuss arbitrary subjects that they previously couldn't -- I could teach them copyright law, or astrophysics, for example (if they didn't already know them). So, while this may be more of an analogy than an exact identity, there does seem to be a Turing-completeness-like property to natural languages: they can be used to define and transmit arbitrary extensions to themselves. (Admittedly, not every human is really cut out to learn astrophysics -- but then any non-idealized Turing machine has only some finite amount of memory, so it's always possible to define a program that it can't run because it doesn't have enough memory.)

Languages specifically designed to make static verification easier

A lot of languages (perhaps all of them) are designed to make writing programs easier. They all have different domains, and aim to simplify developing programs in those domains (C makes developing low-level programs easier, Java makes developing complex business logic easier, and so on). Perhaps other goals are sacrificed for the sake of writing and maintaining programs in an easier, more natural, less error-prone way.
Are there any languages specifically designed to make verification of source code--i.e. static analysis--easier? Of course, capability to write common programs for modern machines should also persist.
One of the design goals of Ada was to support a certain amount of formal verification. It was moderately successful, but verification didn't exactly take off like they were hoping. Luckily Ada is good for far more than that. Sadly, that hasn't helped it much either...
There's an Ada subset called Spark that keeps this alive today. Praxis sells a development suite built around it.
Are there any languages specifically designed to make verification of source code easier?
This was a vague goal of both the CLU and ML languages, but the only language design I know that takes static verification really seriously is Spark Ada.
Dijkstra's language of guarded commands (as described in A Discipline of Programming) was designed to support static verification, but it was explicitly not supposed to be implemented.
Gerard Holzmann's Promela language was also designed for static analysis by the model checker SPIN, but again it's not executable.
Auditors in the E language provide a built-in means of writing code analyses within the language itself and requiring that some section of code pass some static check. You might also be interested in the related-work part of the paper.
I haven’t used it myself, so I can’t speak with any authority, but I understand that the Eiffel programming language was designed around Design by Contract, which would make static analysis much easier. I don’t know if that counts or not.
There is SAIL, the Static Analysis Intermediate Language or Flexibo
There are two problems in "making verification of source code easier". One is having a language in which you can't do gross things such as arbitrary casts (as you can in C).
The other is specifying what you want to verify; for that you need a good assertion language.
While many languages have proposed such assertion languages, I think the EDA community has been pushing the envelope most effectively with temporal specifications. The "Property Specification Language" is a standard; you can learn more from the P1850 Standard for PSL: Property Specification Language (IEEE-1850). One idea behind PSL is that you can add it to existing EDA languages, and I think the EDA community has been incorporating it into those languages as time goes by.
I've often wished for something like PSL to embed in conventional computer software.
Static verification is a bad place to start for this task. It's based on the assumption that it's possible to verify the correctness of a program automatically. That isn't feasible in the real world, and expecting a tool to check arbitrarily complex code without any hints is just plain dumb. Usually software for static verification ends up requiring hints all over the source code, and in the end generates lots of false positives and false negatives. It has its niche, but that's it. (See the introduction to "Types and Programming Languages" by Pierce.)
While these kinds of tools were developed by engineers for their own simple purposes, the real solution has been baking in academia. It was found that types in statically typed programming languages are equivalent to logical statements, provided everything goes smoothly and the language doesn't have certain kinds of bad behaviour. This is called the "Curry-Howard correspondence", and the underlying reading of logic is the "Brouwer-Heyting-Kolmogorov interpretation". The most powerful proofs are possible only in languages with powerful types: dependent types. If we forget all this terminology for a while, it means we can write programs that carry proofs of their own correctness, and these proofs are checked while the program is compiled, with no executable file produced in case of failure.
The positive side of this approach is that you never get any false negatives, i.e. a compiled program is guaranteed to work properly according to its specification. Even without extra proofs about the specification, programs in dependently-typed languages are less prone to mistakes, because divisions by zero, unhandled exceptions and overflows just never end up in an executable program.
Always writing such proofs by hand is tedious. For that there are "tactics", i.e. programs that generate proofs of correctness. These are almost equivalent to programs for static verification, but, unlike them, are required to generate a formal proof.
There is a range of dependently-typed languages for different purposes: Coq, Agda, Idris, Epigram, Cayenne etc.
Tactics are implemented in Coq and probably several more languages. Also Coq is the most mature of them all, with infrastructure including libraries like Bedrock.
In case C code extraction from Coq is not enough for your requirements, you can use ATS, which is on par in performance with C.
Haskell employs a weak form of the Curry-Howard correspondence: it works fine unless you start writing failing or forever-looping programs. If your requirements are not so strict that you need to write formal proofs, consider using Haskell.
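As a small illustration of that weak form (a sketch of my own, not tied to any particular library): even without full dependent types, GHC Haskell can track a list's length in its type, so taking the head of an empty vector is rejected at compile time rather than failing at run time.

    {-# LANGUAGE DataKinds, GADTs, KindSignatures #-}

    -- Type-level natural numbers.
    data Nat = Zero | Succ Nat

    -- A vector whose length is part of its type.
    data Vec (n :: Nat) a where
      Nil  :: Vec 'Zero a
      Cons :: a -> Vec n a -> Vec ('Succ n) a

    -- Only non-empty vectors are accepted, so the failure case cannot occur.
    safeHead :: Vec ('Succ n) a -> a
    safeHead (Cons x _) = x

    main :: IO ()
    main = print (safeHead (Cons (1 :: Int) Nil))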

Non-deterministic programming languages

I know in Prolog you can do something like
someFunction(List) :-
    someOtherFunction(X, List),
    doSomethingWith(X).
    % and so on
This will not iterate over every element in List; instead, it will branch off into different "machines" (by using multiple threads, backtracking on a single thread, creating parallel universes or what have you), with a separate execution for every possible value of X that causes someOtherFunction(X, List) to return true!
(I have no idea how it does this, but that's not important to the question)
My question is: What other non-deterministic programming languages are out there? It seems like non-determinism is the simplest and most logical way to implement multi-threading in a language with immutable variables, but I've never seen this done before - Why isn't this technique more popular?
Prolog is actually deterministic—the order of evaluation is prescribed, and order matters.
Why isn't nondeterminism more popular?
Nondeterminism is unpopular because it makes it harder to reason about the outcomes of your programs, and truly nondeterministic executions (as opposed to semantics) are hard to implement.
The only nondeterministic languages I'm aware of are
Dijkstra's calculus of guarded commands, which he wanted never to be implemented
Concurrent ML, in which communications may be synchronized nondeterministically
Gerard Holzmann's Promela language, which is the language of the model checker SPIN
SPIN does actually use the nondeterminism and explores the entire state space when it can.
And of course any multithreaded language behaves nondeterministically if the threads are not synchronized, but that's exactly the sort of thing that's difficult to reason about—and why it's so hard to implement efficient, correct lock-free data structures.
Incidentally, if you are looking to achieve parallelism, you can achieve the same thing by a simple map function in a pure functional language like Haskell. There's a reason Google MapReduce is based on functional languages.
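A small sketch of that idea in Haskell (assuming the parallel package is available; the fib function and the inputs are just for illustration): because the mapped function is pure, its applications can be evaluated in parallel without changing the result.

    import Control.Parallel.Strategies (parMap, rdeepseq)

    -- A deliberately slow, pure function.
    fib :: Int -> Integer
    fib n = if n < 2 then fromIntegral n else fib (n - 1) + fib (n - 2)

    -- parMap evaluates each element of the result list in parallel.
    main :: IO ()
    main = print (sum (parMap rdeepseq fib [25 .. 32]))

(Compile with GHC's -threaded flag and run with +RTS -N to actually spread the work over multiple cores.)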
The Wikipedia article points to Amb, a Scheme derivative with capabilities for non-deterministic programming.
As far as I understand, the main reason why programming languages do not do that is because running a non-deterministic program on a deterministic machine (as are all existing computers) is inherently expensive. Basically, a non-deterministic Turing machine can solve complex problems in polynomial time, for which no polynomial algorithm for a deterministic Turing machine is known. In other words, non-deterministic programming fails to capture the essence of algorithmics in the context of existing computers.
The same problem impacts Prolog. Any efficient, or at least not-awfully-inefficient, Prolog application must use the "cut" operator to avoid exploring an exponential number of paths. That operator works only as long as the programmer has a good mental model of how the Prolog interpreter will explore the possible paths, in a deterministic and very procedural way. Things which are very procedural do not mix well with functional programming, since the latter is mostly an effort not to think procedurally at all.
As a side note, in between deterministic and non-deterministic Turing machines, there is the "quantum computing" model. A quantum computer, assuming that one exists, does not do everything that a non-deterministic Turing machine can do, but it can do more than a deterministic Turing machine. There are people who are currently designing programming languages for the quantum computer (assuming that a quantum computer will ultimately be built). Some of those new languages are functional. You may find a host of useful links on this Wikipedia page. Apparently, designing a quantum programming language, functional or not, and using it, is not easy and certainly not "simple".
One example of a non-deterministic language is Occam, based on CSP theory. The combination of the PAR and ALT constructs can give rise to non-deterministic behaviour in multiprocessor systems, implementing fine grain parallel programs.
When using soft channels, i.e. channels between processes on the same processor, the implementation of ALT will make the behaviour close to deterministic†, but as soon as you start using hard channels (physical off-processor communication links) any illusion of determinism vanishes. Different remote processors are not expected to be synchronised in any way and they may not even have the same core or clock speed.
†The ALT construct is often implemented with a PRI ALT, so you have to explicitly code in fairness if you need it to be fair.
Non-determinism is seen as a disadvantage when it comes to reasoning about and proving programs correct, but in many ways once you've accepted it, you are freed from many of the constraints that determinism forces on your reasoning.
As long as the sequencing of communication doesn't lead to deadlock, which can be done by applying CSP techniques, then the precise order in which things are done should matter much less than whether you get the results that you want in time.
It was arguably this lack of determinism which was a major factor in preventing the adoption of Occam and Transputer systems in military projects, dominated by Ada at the time, where knowing precisely what a CPU was doing at every clock cycle was considered essential to proving a system correct. Without this constraint, Occam and the Transputer systems it ran on (the only CPUs at the time with a formally proven IEEE floating point implementation) would have been a perfect fit for hard real-time military systems needing high levels of processing functionality in a small space.
In Prolog you can have both non-determinism and concurrency. Non-determinism is what you described in your question concerning the example code. You can imagine that a Prolog clause is full of implicit amb statements. It is less known that concurrency is also supported by logic-programming.
History says:
The first concurrent logic programming language was the Relational Language of Clark and Gregory, which was an offshoot of IC-Prolog. Later versions of concurrent logic programming include Shapiro's Concurrent Prolog and Ueda's Guarded Horn Clause language GHC.
https://en.wikipedia.org/wiki/Concurrent_logic_programming
But today we might just go with threads inside logic programming. Here is an example of implementing a findall via threads. This can also be modified to perform all kinds of tasks on the collection, or maybe even to produce agent networks for distributed artificial intelligence.
I believe Haskell has the capability to construct a non-deterministic machine. Haskell at first may seem too difficult and abstract for practical use, but it's actually very powerful.
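One concrete way to see this (a minimal sketch; the Pythagorean-triple example is mine, not from the answer): Haskell's list monad models non-deterministic choice, with each bind exploring every alternative, much like Prolog's backtracking or Scheme's amb.

    import Control.Monad (guard)

    -- All Pythagorean triples with components up to the given bound.
    pythagoreanTriples :: Int -> [(Int, Int, Int)]
    pythagoreanTriples bound = do
      a <- [1 .. bound]                 -- "choose" a non-deterministically
      b <- [a .. bound]                 -- then choose b
      c <- [b .. bound]                 -- then choose c
      guard (a * a + b * b == c * c)    -- prune the branches that fail
      return (a, b, c)

    main :: IO ()
    main = print (pythagoreanTriples 20)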
There is a programming language for non-deterministic problems called "control network programming". If you want more information, go to http://controlnetworkprogramming.com. This site is still in progress, but you can read some info about it.
Java 2K
Note: before you click the link and end up disappointed: this is an esoteric language and has nothing to do with parallelism.
The Sly programming language under development at IBM Research is an attempt to include the non-determinism inherent in multi-threaded execution in the execution of certain types of algorithms. Looks to be very much a work in progress though.

What is a programming language?

Wikipedia says:
A programming language is a machine-readable artificial language designed to express computations that can be performed by a machine, particularly a computer. Programming languages can be used to create programs that specify the behavior of a machine, to express algorithms precisely, or as a mode of human communication.
But is this true? It occurred to me in the shower this morning that a programming language might just be a set of conventions, something that both a human and an appropriately arranged compiler can interpret. If that's the case, then isn't this definition of a programming language misleading? If that isn't the case, then what's the difference between a compiler and the language it compiles?
Thanks!
z.
A programming language is exactly that set of conventions, but I don't see why that makes the Wikipedia entry misleading, really. If it makes you feel better, you might edit it to read something like:
A programming language is a machine-readable artificial language designed to express computations that can be performed by a machine, particularly a computer. Programming languages can be used to define programs that specify the behavior of a machine, to express algorithms precisely, or as a mode of human communication.
I understand what you are saying, and you are right. Describing a programming language as a "machine-readable artificial language designed to express computations that can be performed by a machine" is unnecessarily specific. Programming languages can be more broadly generalized as established descriptions of tasks (or "a set of conventions") that allow one entity to control the behavior of another. What we traditionally identify as programming languages are just a layer of abstraction between machine code and programmers, and are specifically designed for electronic computers.
Programming languages are not limited to traditional computers (see the K'NEX Computer), and aren't even necessarily limited to computational devices at all. For example, when I am pleased with my dog's behavior, he gets a treat. When I am displeased, he gets nothing. Over time the dog learns the treat/no treat programming and I can use the treats to control his behavior (to an extent).
I don't see what is different between what you are asking...
It occurred to me in the shower this morning that a programming language might just be a set of conventions, something that both a human and an appropriately arranged compiler can interpret.
... and the Wikipedia definition.
The key is that a programming language is just "a machine-readable artificial language".
A compiler does indeed act as an effective specification of a language in terms of a reduction to machine code - however, as it's generally difficult to understand a language by reading the compiler's source, one generally considers a programming language in terms of an abstract processing model that the compiler implements. This abstract model is what one means when one refers to the programming language.
That said, there are indeed many languages (Hi there, PHP!) in which the compiler is the only specification of the language in existence. These languages tend to change unpredictably at times as compiler bugs are fixed or introduced.
Programming languages are an abstraction layer that helps insulate the programmer from having to talk in electrical signals to the computer. The creators of the language have done all the hard work in creating a structure (language) or standard (grammar, conjugation, etc.) that then can be interpreted by a compiler in terms that the computer understands.
All programming languages are really nothing more than domain specific languages for machine code or manipulating the registers and memory of a processing entity.
This is probably the true explanation of what a programming language really is:
Step 1: Think of a language and its grammar, which is a set of rules for making syntactically valid statements using the language. For example, a language called GRID has tiles {0,1} as its alphabet and grammar rules that make sure every GRID statement has equal length and height.
Step 2 (definition of program): GRID, so far, is useless. I'd dare to think of any valid statement of GRID as just data. We need to add something else to GRID: a successor function. So GRID = {grammar, alphabet, successor function}. To make this clear, let's use the rules of "The Game of Life" as the successor function.
Step 3: The Game of Life is actually Turing Complete, so GRID={Grammar, alphabet, successor function = GOL} can perform any computation that is computable.
A programming language is nothing but a language with a successor function. The environment that evaluates a valid statement of the language (a program) does nothing but follow those successor functions. Variables, for example, are things whose successor function is (STAY THE SAME).
Computers are just very fast environments ;)
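For what it's worth, here is a rough sketch (in Haskell, with names of my own choosing) of such a successor function: one application of the Game of Life rules to a set of live cells.

    import qualified Data.Set as Set

    type Cell = (Int, Int)
    type Grid = Set.Set Cell            -- the set of live cells

    neighbours :: Cell -> [Cell]
    neighbours (x, y) =
      [ (x + dx, y + dy) | dx <- [-1, 0, 1], dy <- [-1, 0, 1], (dx, dy) /= (0, 0) ]

    -- The successor function: apply the standard Game of Life rules once.
    step :: Grid -> Grid
    step grid = Set.fromList
      [ cell
      | cell <- Set.toList grid ++ concatMap neighbours (Set.toList grid)
      , let n = length (filter (`Set.member` grid) (neighbours cell))
      , if cell `Set.member` grid then n == 2 || n == 3 else n == 3 ]

    main :: IO ()
    main = print (step (Set.fromList [(0, 0), (0, 1), (0, 2)]))   -- a blinker rotates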
Wikipedia's definition might have been taken out of context. For one thing, only programs written in machine code are machine-readable. Otherwise, you need a compiler to convert C++, Java or even assembly code to machine code so the computer can carry out your instructions. Unless you include comments that are only readable to humans, or unless you are strictly discussing a topic within the realm of your program, programming is insufficient for human communication.

What are some pros/cons to various functional languages?

I know of several functional languages - F#, Lisp and its dialects, R, and more. However, as I've never used any of them (although the three I mentioned are on my "to-learn" list), I was wondering about the pros/cons of the various functional languages out there. Are there significant pros/cons, both in learning the language and in any real-world applications of said language?
Haskell is "extreme" (lazy, pure), has active users, lots of documentation, and makes runnable applications.
SML is "less extreme" (strict, impure), has active users, formal specification, many implementations (SML/NJ, Mlton, Moscow ML, etc.). Implementations vary on how applications are deployed wrt the runtimes.
OCaml is ML with attitude. It has an object orientation, active users, documentation, add ons, and makes runnable applications.
Erlang is concurrent, strict, pure (mostly), and supports distributed apps. It needs a runtime installed separately, so deployment is different from the languages that make runnable binaries.
F# is similar to OCaml with Microsoft backing and .NET libraries.
Scala runs on the JVM and can be used as a functional language with advanced features, or as simply a souped-up Java, or both. The flexibility is cited as a drawback for learning a functional language because it's easy to slip back into imperative Java ways. Of course it is also an advantage if you want to use existing JVM libraries.
I'm not sure if your question is to functional languages in general, or differences between them. For general info on why functional:
http://paulspontifications.blogspot.com/2007/08/no-silver-bullet-and-functional.html
Why Functional Programming Matters
As far as differences between functional languages:
Distinctive traits of the functional languages
The awesome thing about functional languages is that they base themselves on the lambda calculus and other mathematics. This results in being able to use similar algorithms and ideas across languages more easily.
As far as which one you should learn: Pick one that will have a comfortable environment for you. For example, if you're using .NET and Visual Studio, F# is an excellent fit. (Actually, the VS integration makes F# a strong contender, period.) The book "How to Design Programs" (full text, free, online) with PLT Scheme is also a good choice.
I'm biased, but F# looks to have the biggest "real-world" potential. This is mainly because of the nice IDE/.NET integration, allowing you to fully tap .NET and OO, while keeping a lot of functional power (and extending it in ways too). Scala might be a possible contender, but it's more of an OO language that has some functional features; hence Scala won't be as big a productivity gain.
Edit: Just to note JavaScript and Ruby, before someone comments on that :). Ruby is something else you could take a look at if you're doing that type of web dev, as it has a lot of functional concepts in it, although they're not as polished as in other languages.
The biggest downside is that once you see the power you can have, you won't be happy using lesser languages. This becomes a problem if you're forced to deal with people who haven't yet understood.
One final note, the only "con" is that "it's so complicated". This isn't actually true -- functional languages are often simpler -- but if you have years of C or whatnot in your brain, it can be a significant hurdle to "get" the functional concept. After it clicks, it should be relatively smooth sailing.
Lisp has a gentle learning curve. You can learn the basics in an hour, though of course it takes longer to learn idioms etc. On the down side, there are many dialects of Lisp, and it's difficult to interact with mainstream environments like Java or .NET.
I would not recommend R unless you need to do statistics. It's a strange language, and not exactly functional. You can do functional programming in R, but most people don't.
If you're familiar with the Microsoft tool stack, F# might be easy to get into. And it has a huge, well-tested library behind it, i.e. the CLR.
You can use a functional programming style in any language, though some make it easier than others. As far as that goes, you might try Python.
ML family (SML/OCaml/F#):
Pros:
Fairly simple
Have effective implementations (on the level with Java/C#)
Easily predictable resource consumption (compared to lazy languages)
Readable syntax
Strong module system
(For F#): large .Net library available
Has mutable variables
Cons:
Sometimes too simple (no typeclasses => problems with overloading)
(Except F#): standard libraries are missing some useful things
Has mutable variables :)
Cannot have infinite data structures (not lazy language)
I haven't mentioned features common to most static-typed functional languages: type inference, parametric polymorphism, higher-order functions, algebraic data types & pattern matching.
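A short Haskell sketch of those shared features (my own example): an algebraic data type, pattern matching, and a higher-order, parametrically polymorphic function; all of the type signatures below could be omitted and inferred.

    -- An algebraic data type.
    data Shape
      = Circle Double
      | Rectangle Double Double

    -- Pattern matching over the constructors.
    area :: Shape -> Double
    area (Circle r)      = pi * r * r
    area (Rectangle w h) = w * h

    -- 'map' is a higher-order, parametrically polymorphic function.
    totalArea :: [Shape] -> Double
    totalArea = sum . map area

    main :: IO ()
    main = print (totalArea [Circle 1, Rectangle 2 3])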
I learnt Haskell at university as a pure functional language, and I can say it's really powerful, but I couldn't find a practical use for it.
However, I found this: Haskell in practice. Check it out; it's amazing.
The characteristics of functional paradigms sometimes are pros, and sometimes cons, depending on the situation / context.
Some of them are:
high level
lambda functions
lazy evaluation
higher-order functions
recursion
type inference
Quoting Wikipedia on efficiency issues:

Functional programming languages have been perceived as less efficient in their use of CPU and memory than imperative languages such as C and Pascal.[26] However, for programs that perform intensive numerical computations, functional languages such as OCaml and Clean are similar in speed to C. For programs that handle large matrices and multidimensional databases, array functional languages (such as J and K) were designed with speed optimization in mind.

Purely functional languages have a reputation for being slower than imperative languages. However, immutability of data can, in many cases, lead to execution efficiency by allowing the compiler to make assumptions that are unsafe in an imperative language, vastly increasing opportunities for inlining.

Lazy evaluation may also speed up the program, even asymptotically, whereas it may slow it down at most by a constant factor (however, it may introduce memory leaks when used improperly).

Resources