Beyond the syntax of each language (e.g. print v. echo), what are some key distinctive characteristics to look out for to distinguish a programming language?
As a beginner in programming, I'm still confused about the strengths and weaknesses of each programming language and how to distinguish them beyond their aliases for common native functions. I think it's much easier to classify languages based on a set of distinctive characteristics, e.g. OOP vs. functional.
There are many things that define a PL; here I'll list a few:
Is it procedural, OO, or imperative?
Does it have strong type checking (C#, C++, Delphi) or dynamic typing (PHP, Python, JS)?
How are references handled? (Does it hide pointers, like C#?)
Does it require a runtime (C#, Java) or is it native to the OS (C, C++)?
Does it support threads? (e.g. Eiffel needs extra libraries for it)
There are many others, like the presence of a garbage collector, the handling of parameters, etc. The Eiffel language has an interesting feature, Design by Contract; I haven't seen this in any other language (I think C# 4.0 has it now), but it can be pretty useful if used well.
I recommend taking a look at Bertrand Meyer's work to get a deeper understanding of how PLs work and the things that define them. Another thing that can define a PL is its level of interaction with the system; this is what makes the difference between low-level and high-level languages.
Hope this helps.
Within a domain (imperative, functional, concatenative, term rewriting), it's sometimes best to look at the presence or absence of a particular set of functionality. For example, for mainstream imperative languages (a few of these are sketched in code after the list):
First-class functions
Closures
Built in classes, prototypical inheritance, or toolkit (Example: C++, Self/JavaScript, Lua/Perl)
Complex data types (more than array)
In-built concurrency primitives
Futures
Pass by value, pass by name, pass by reference, or a combination thereof
Garbage collected or not? What kind?
Event-based
Interface based types, class based types, or no user types (Go, Java, Lua)
etc
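To make a couple of these concrete, here is a minimal C# sketch, using only the standard library, of first-class functions, a closure, and a built-in future (Task):

    // Minimal C# sketch of a few checklist items: first-class functions,
    // closures, and a built-in "future" (Task).
    using System;
    using System.Threading.Tasks;

    class FeatureDemo
    {
        static void Main()
        {
            // First-class function: a function stored in a variable.
            Func<int, int> square = x => x * x;

            // Closure: the lambda captures the local variable `offset`.
            int offset = 10;
            Func<int, int> squarePlus = x => square(x) + offset;

            // Future: Task<int> represents a value that will exist later.
            Task<int> future = Task.Run(() => squarePlus(5));

            Console.WriteLine(future.Result); // 35
        }
    }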
You can consider things like (a couple of these are answered for C# in the sketch after the list):
Can you call functions?
Can you pass functions to other functions?
Can you create new functions? (In C you can pass function pointers to functions, but you cannot create new functions)
Can you create new data types?
Can you create new data types with functions that operate on them? (the typical basis for "OO" languages)
Can you execute code that was not available at compile-time (using an eval function, maybe)?
Must all types be known at compile-time?
Are types available at run-time?
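As a rough illustration, here is how two of these questions come out for C#. MakeAdder is a made-up name, but the technique is standard:

    // Sketch: two of the checklist questions, answered for C#.
    using System;

    class Checklist
    {
        // "Can you create new functions?" Yes: this builds a brand-new
        // function at run time, which C function pointers cannot do.
        static Func<int, int> MakeAdder(int n) => x => x + n;

        static void Main()
        {
            var addFive = MakeAdder(5);
            Console.WriteLine(addFive(3)); // 8

            // "Are types available at run-time?" Yes, via reflection.
            Console.WriteLine(addFive.GetType());
        }
    }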
The difference between low-level and high-level languages. (Even though "low" and "high" are relative terms.)
A high-level language will use an abstraction to hide details that low-level languages would expose to the user. For example, in Matlab or Python, you can initialize an N-dimensional array in a single command. Not so in C or assembly.
IMHO the strength of a language is given by how many things you can do with it, and how fast and how easily you can accomplish your goals.
The weaknesses of a language are the sum of constraints (of various types) that you encounter while you try to achieve your goal.
There are many features that a programming language may support, and these features aren't always mutually exclusive; for example, OCaml and F# are both functional and object-oriented. Writing a list here of all the paradigms a language can support would be exhausting, but the book Programming Language Pragmatics is a comprehensive treatment of many paradigms found in programming languages.
However, for me the important things I need to know when working with a language are the following (a couple of them are illustrated in code after the list):
Is it dynamically or statically typed?
Is it a typed language, and if so, is the typing strong or weak?
Is it garbage collected?
Does it support pass-by-value semantics, pass-by-reference semantics, or both?
Does it support first-class functions (i.e. can functions be treated as values)?
Is it object-oriented?
Polymorphism: is it parametric or ad hoc?
How expressive is the type system (i.e. can I create non-leaky abstractions)?
Overloaded methods
Generics (templates)
Exception handling.
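On the polymorphism point, here is a small C# sketch contrasting parametric polymorphism (generics) with ad hoc polymorphism (overloading); the method names are invented for illustration:

    // Sketch: parametric vs ad hoc polymorphism in C#.
    using System;

    class Polymorphism
    {
        // Parametric: one definition works uniformly for every T (generics).
        static T First<T>(T[] items) => items[0];

        // Ad hoc: separate overloads are chosen by argument type.
        static string Describe(int x) => $"int {x}";
        static string Describe(double x) => $"double {x}";

        static void Main()
        {
            Console.WriteLine(First(new[] { 1, 2, 3 })); // 1
            Console.WriteLine(Describe(42));             // int 42
            Console.WriteLine(Describe(3.14));           // double 3.14
        }
    }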
Type system (typed vs untyped, statically vs dynamically typed, weakly and strongly typed).
Supported paradigms (procedural, object-oriented, functional, logic, multi).
Default implementation (compiler vs interpreter vs JIT-compiler).
Memory management (manual vs automatic (reference counting or GC)).
Intended domain of use (number crunching, prototyping, scripting, DSL, ...).
Generation (1GL, 2GL, 3GL, 4GL, 5GL).
The natural language it is based on (English vs non-English-based). However, this is mostly about syntax.
General remark: many of these classification schemes are not comprehensive and are not that good, and the links are mostly to Wikipedia, so be aware.
You can consider other characteristics such as:
Strong vs weak and static vs dynamic typing, support for generic typing
How memory is handled (is it abstracted or do you have direct control over your data, pass by ref vs pass by value)
Compiled vs interpreted vs a bit of both
The forms of user-defined types available... classes, structures, tuples, lists etc.
Whether threading facilities are inbuilt or you need to turn to external libraries
Facility for generative coding... C++ template metaprogramming is a form of this
In the case of OOP, single vs multiple inheritance, interfaces, anonymous/inner classes, etc.
Whether a language is multi-paradigm (i.e. C# and its support for functional programming)
Availability of reflection
The verbosity of a language or the amount of 'syntactic sugar'... e.g. C++ is quite verbose when it comes to iterating over a vector. Java is quite succinct when anonymous inner classes are used for event-handling. Python's list comprehensions save a lot of typing.
I have planned to develop a tool that converts a program written in a programming language (e.g. Java) to a common markup language (e.g. XML), and then converts that markup code to another language (e.g. C#).
In simple words, it is a programming language converter that converts a program written in one language to another language.
I think it is possible, but I don't know where to start. I want to know whether it can be done and learn about some existing systems.
What you are trying to do is extremely hard, but if you want to know what you are up for I've listed the steps you need to follow below:
First the hard bit:
1. First you obtain or derive an operational semantics for your source and target languages.
2. Then you enhance the semantics to capture your source and target memory models.
3. Then you need to unify the two enhanced semantics within a common operational model.
4. Then you need to define a mapping from your source language onto the common operational model.
5. Then you need to define a mapping from your operational model to your target language.
Step 4, as you pointed out in your question, is trivial.
Step 1 is difficult, as most languages do not have sufficiently formal semantics specified; but I recommend checking out http://lucacardelli.name/TheoryOfObjects.html as this is the best starting point for building a traditional OO semantics.
Step 2 is almost certainly impossible in general, but may be merely obscenely difficult if you are willing to sacrifice some efficiency.
Step 3 will depend on how clean the result of step 1 turned out, but is going to be anything from delicate and tricky to impossible.
Step 5 is not going to be trivial, it is effectively writing a compiler.
Ultimately, what you propose to do is impossible in general, due to the difficulties inherent in steps 1 and 2. However, it should be difficult but doable if you are willing to: severely restrict the source language constructs supported; pretty much forget about handling threads correctly; and pick two languages with sufficiently similar semantics (i.e. Java and C# are OK, but C++ and anything-else is not).
It depends on what languages you want to support, but in general this is a huge & difficult task unless you plan to only support a very small subset of each language.
The real problem is that each programming language has different features (with some areas that overlap and others that don't) and different ways of solving the same problems -- and it's pretty tricky to detect the problem the programmer is trying to solve and convert it to a new idiom. :) And think about the differences between GUIs created in different languages....
See http://xmlvm.org/ as an example (a project aimed at converting between the source code of many different languages, with an XML middle-point) -- the site covers in some depth the challenges they are tackling and the compromises they make, and (if you still have any interest in this kind of project...) you can ask more specific follow-up questions.
Notice specifically what the output source code looks like -- it's not at all readable, maintainable, efficient, etc..
It is "technically easy" to produce XML for any single langauge: build a parser, construct and abstract syntax tree, and dump out that tree as XML. (I build tools that do this off-the-shelf for many languages). By technically easy, I mean that the community knows how to do this (see any compiler textbook, e.g., Aho&Ullman Dragon book). I do not mean this is a trivial exercise in terms of effort, because real languages are complicated and messy; there have been many attempts to build C++ parsers and few successes. (I have one of the successes, and it was expensive to get right).
What is really hard (and I don't try to do) is produce XML according to a single schema in which the language semantics are exposed. And without that, it will be essentially impossible to write a translator from a generic XML to an arbitrary target language. This is known as the UNCOL problem and people have been looking since 1958 for the answer. I note that the Wikipedia article seems to indicate the problem is solved, but you can't find many references to UNCOL in the literature since 1961.
The closest attempt I've seen to this is the OMG's "ASTM" model (http://www.omg.org/spec/ASTM/1.0/Beta1/); it exports XMI, which is XML. But the ASTM model has lots of escapes built into it to allow languages that it doesn't model perfectly (AFAIK, that means every language) to extend the XMI in arbitrary ways so that the language-specific information can be encoded. Consequently each language parser produces a custom version of the XMI, and thus each reader pretty much has to know about the extensions, and full generality vanishes.
I hear a lot about functional languages, and how they scale well because there is no state around a function; and therefore that function can be massively parallelized.
However, this makes little sense to me because almost all real-world practical programs need/have state to take care of. I also find it interesting that most major scaling libraries, e.g. MapReduce, are typically written in imperative languages like C or C++.
I'd like to hear from the functional camp where this hype I'm hearing is coming from.
It's important to add one word: "there's no shared state".
Any meaningful program (in any language) changes the state of the world. But (some) functional languages make it impossible to access the same resource from multiple threads simultaneously. The absence of shared state makes multithreading safe.
Functional languages such as Haskell, Scheme and others have what are called "pure functions". A pure function is a function with no side effects. It doesn't modify any other state in the program. This is by definition threadsafe.
Of course you can write pure functions in imperative languages. You also find multi-paradigm languages like Python, Ruby and even C# where you can do imperative programming, functional programming or both.
But the point of Haskell (etc.) is that you can't write a non-pure function. Well, that's not strictly true, but it's mostly true.
Similarly, many imperative languages have immutable objects for much the same reason. An immutable object is one whose state doesn't change once created. Again by definition an immutable object is threadsafe.
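A minimal C# sketch of both ideas, using only the standard library: a pure function over an immutable record:

    // Sketch: a pure function and an immutable object in C#.
    using System;

    // Immutable: no way to change a Point's state after construction.
    record Point(double X, double Y);

    class Purity
    {
        // Pure: the result depends only on the inputs and nothing is
        // modified, so calling this from many threads at once is safe.
        static Point Translate(Point p, double dx, double dy) =>
            new Point(p.X + dx, p.Y + dy);

        static void Main()
        {
            var p = new Point(1, 2);
            var q = Translate(p, 3, 4);
            Console.WriteLine(p); // Point { X = 1, Y = 2 } -- unchanged
            Console.WriteLine(q); // Point { X = 4, Y = 6 }
        }
    }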
You're talking about two different things and don't realize it.
Yes, most real-world programs have state somewhere, but if you want to do multithreading, that state should not be everywhere, and in fact, the fewer places it's in, the better. In functional programs, the default is not to have state, and you can introduce state exactly where you need it and nowhere else. Those parts that are dealing with state will not be as easily multithreaded, but since all the rest of your program is free of side-effects and thus it doesn't matter what order those parts are executed in, it removes a huge barrier to parallelization.
However, this makes little sense to me because almost all real-world practical programs need/have state to take care of.
You'd be surprised! Yes, all programs need some state (I/O in particular) but often you don't need much more. Just because most programs have heaps of state doesn't mean they need it.
Programming in a functional language encourages you to use less state, and thus your programs become easier to parallelise.
Many functional languages are "impure", which means they allow some state. Haskell doesn't, but Haskell has monads, which basically let you get something from nothing: you get state using stateless constructs. Monads are a bit fiddly to work with, which is why Haskell gives you a strong incentive to restrict state to as small a part of your program as possible.
I also find it interesting that most major scaling libraries, e.g. MapReduce, are typically written in imperative languages like C or C++.
Programming concurrent applications is "hard" in C/C++. That's why it's best to do all the dangerous stuff in a library which is heavily tested and inspected. But you still get the flexibility and performance of C/C++.
Higher order functions. Consider a simple reduction operation, summing the elements of an array. In an imperative language, programmers typically write themselves a loop and perform reductions one element at a time.
But that code isn't easy to make multi-threaded. When you write a loop you're assuming an order of operations, and you have to spell out how to get from one element to the next. You'd really like to just say "sum the array" and have the compiler, or runtime, or whatever, decide how to work through the array, dividing up the task as necessary between multiple cores and combining those results together. So instead of writing a loop with some addition code embedded inside it, an alternative is to pass something representing "addition" into a function that can do the divvying. As soon as you do that, you're writing functionally: you're passing a function (addition) into another function (the reducer). Written this way, it not only makes for more readable code, but when you change architecture, or want to write for a heterogeneous architecture, you don't have to change the summing code, just the reducer. In practice you might have many different algorithms that all share one reducer, so this is a big payoff.
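Here is a rough C# rendering of that idea using LINQ; only the reducer changes when you go parallel (the operation passed to PLINQ's Aggregate is assumed to be associative):

    // Sketch: the same reduction as an explicit loop and as a
    // higher-order reduce; only the reducer changes to go parallel.
    using System;
    using System.Linq;

    class Reduction
    {
        static void Main()
        {
            int[] data = Enumerable.Range(1, 1000).ToArray();

            // Imperative: the loop fixes the order of operations.
            int sum = 0;
            foreach (var x in data) sum += x;

            // Functional: pass "addition" into a reducer...
            int sum2 = data.Aggregate((a, b) => a + b);

            // ...and swap in a reducer that divides work across cores.
            int sum3 = data.AsParallel().Aggregate((a, b) => a + b);

            Console.WriteLine($"{sum} {sum2} {sum3}"); // 500500 500500 500500
        }
    }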
This is just a simple example. You may want to build on this: functions to apply other functions to 2D arrays, functions to apply functions to tree structures, functions to combine functions to apply functions (e.g. if you have a hierarchical structure with trees above and arrays below), and so on.
I've got a bit of a fetish for language design and I'm currently playing around with my own hobby language. (http://rogeralsing.com/2010/04/14/playing-with-plastic/)
One thing that really makes my mind bleed is "generators" and the "yield" keyword.
I know C# uses AST transformation to transform enumerator methods into state machines.
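For concreteness, here is roughly the shape of that transformation (a hand-written sketch, not actual compiler output):

    // Sketch: approximately what the C# compiler does to a yield method.
    using System;
    using System.Collections;
    using System.Collections.Generic;

    class Generators
    {
        // The method you write:
        static IEnumerable<int> Countdown()
        {
            yield return 3;
            yield return 2;
            yield return 1;
        }

        // Roughly what gets generated: a state machine whose MoveNext
        // resumes where the previous yield left off.
        class CountdownStateMachine : IEnumerator<int>
        {
            int state;
            public int Current { get; private set; }
            object IEnumerator.Current => Current;

            public bool MoveNext()
            {
                switch (state)
                {
                    case 0: state = 1; Current = 3; return true;
                    case 1: state = 2; Current = 2; return true;
                    case 2: state = 3; Current = 1; return true;
                    default: return false;
                }
            }

            public void Reset() => throw new NotSupportedException();
            public void Dispose() { }
        }

        static void Main()
        {
            foreach (var n in Countdown()) Console.Write(n); // 321
            var sm = new CountdownStateMachine();
            while (sm.MoveNext()) Console.Write(sm.Current); // 321
        }
    }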
But how does it work in other languages?
Is there any way to get generator support in a language without AST transformation?
E.g. do languages like Python or Ruby resort to AST transformations to solve this too?
(The question is how generators are implemented under the hood in different languages, not how to write a generator in one of them)
Generators are basically semi-coroutines with some annoying limitations. So, obviously, you can implement them using semi-coroutines (and full coroutines, of course).
If you don't have coroutines, you can use any of the other universal control flow constructs. There are a lot of control flow constructs that are "universal" in the sense that every control flow construct (including all the other universal control flow constructs), including coroutines and thus generators, can be (more or less) trivially transformed into only that universal construct.
The most well-known of those is probably GOTO. With just GOTO, you can build any other control flow construct: IF-THEN-ELSE, WHILE, FOR, REPEAT-UNTIL, FOREACH, exceptions, threads, subroutine calls, method calls, function calls and so on, and of course also coroutines and generators.
Almost all CPUs support GOTO (although in a CPU, they usually call it jmp). In fact, in many CPUs, GOTO is the only control flow construct, although today native support for at least subroutine calls (call) and maybe some primitive form of exception handling and/or concurrency primitive (compare-and-swap) are usually also built in.
Another well-known control flow primitive is the continuation. Continuations are basically a more structured, more manageable and less evil variant of GOTO, especially popular in functional languages. But there are also some low-level languages that base their control flow on continuations; for example, the Parrot Virtual Machine uses continuations for control flow, and I believe there are even some continuation-based CPUs in a research lab somewhere.
C has a sort-of "crappy" form of continuations (setjmp and longjmp), which are much less powerful and less easy to use than "real" continuations, but they are plenty powerful enough to implement generators (and, in fact, can be used to implement full continuations).
On a Unix platform, setcontext can be used as a more powerful and higher level alternative to setjmp/longjmp.
Another control flow construct that is well known, but probably doesn't spring to mind as a low-level substrate to build other control flow constructs on top of, is exceptions. There is a paper that shows that exceptions can be more powerful than continuations, thus making exceptions essentially equivalent to GOTO and thus universally powerful. And, in fact, exceptions are sometimes used as universal control flow constructs: the Microsoft Volta project, which compiled .NET bytecode to JavaScript, used JavaScript exceptions to implement .NET threads and generators.
Not universal, but probably powerful enough to implement generators, is plain tail call optimization. (I might be wrong, though; I unfortunately don't have a proof.) I think you can transform a generator into a set of mutually tail-recursive functions. I know that state machines can be implemented using tail calls, so I'm pretty sure generators can be, too, since, after all, C# implements generators as state machines. (I think this works especially well together with lazy evaluation.)
Last but not least, in a language with a reified call stack (like most Smalltalks for example), you can build pretty much any kind of control flow constructs you want. (In fact, a reified call stack is basically the procedural low-level equivalent to the functional high-level continuation.)
So, what do other implementations of generators look like?
Lua doesn't have generators per se, but it has full asymmetric coroutines. The main C implementation uses setjmp/longjmp to implement them.
Ruby also doesn't have generators per se, but it has Enumerators, that can be used as generators. Enumerators are not part of the language, they are a library feature. MRI implements Enumerators using continuations, which in turn are implemented using setjmp/longjmp. YARV implements Enumerators using Fibers (which is how Ruby spells "coroutines"), and those are implemented using setjmp/longjmp. I believe JRuby currently implements Enumerators using threads, but they want to switch to something better as soon as the JVM gains some better control flow constructs.
Python has generators that are actually more or less full-blown coroutines. CPython implements them using setjmp/longjmp.
I have been using F# and Haskell to learn functional programming for a while now. Until I can get F# approved at our company I must still use C#. I am still trying, however, to stay in the functional style, as I have noticed several benefits.
Here is a typical problem.
There is a key-set table in the database with 3 keys (6.5 million rows).
There are 4 other supporting tables of small to medium size.
There are complex formulas based on several inputs.
I have to use data from all of the above to calculate a value, associate it with each key-set row, and send it back to the database. There are a lot of lookups into the other 4 tables. For performance's sake it is all done in memory.
I know exactly how I would do this in OO with static dictionaries, object models, strategy patterns and so forth, but done functionally I cannot get rid of the bad smell of using some of these constructs.
I am currently making the following assumptions for a functional solution.
1. Static dictionaries are bad. It seems the function could have side effects.
2. I need a Calculate function that takes immutable object(s) and returns an immutable object with the three keys and the calculated value (sketched in code after this list). Inside this function there could be another function in the same style.
3. Traditional OO patterns are probably not going to work.
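To sketch assumption 2 in C# (all type names invented for illustration):

    // Sketch of assumption 2: immutable input, immutable result,
    // and a pure Calculate function. The types are made up.
    using System;

    record KeySet(int Key1, int Key2, int Key3);
    record CalcResult(KeySet Keys, decimal Value);

    class Program
    {
        // Pure: reads its inputs, returns a new immutable result,
        // and mutates nothing.
        static CalcResult Calculate(KeySet keys, decimal rate) =>
            new CalcResult(keys, (keys.Key1 + keys.Key2 + keys.Key3) * rate);

        static void Main()
        {
            var result = Calculate(new KeySet(1, 2, 3), 0.5m);
            Console.WriteLine(result.Value); // 3.0
        }
    }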
How would you design this at a high level?
Am I wrong? Have I missed anything?
No, you are not wrong. Both OOP and functional programming have their benefits and their drawbacks.
A developer needs to know how and when to use each development style. It's fortunate that C# supports, in a way, both development styles.
In my opinion (and I use both functional and OOP styles on a daily basis), OOP is best when dealing with complex interactions and interdependencies between various abstract artifacts (entities, nouns, etc.). Functional programming is best when dealing with algorithms, data transformations and the like: situations where the complexity of the statements needed to solve a given problem is great.
I generally use object oriented programming on my domain (entities, aggregates, value objects, repositories and events) and reserve functional programming for my service objects.
Most of the time it comes down to a smell, or a feeling of which is best, since in software development there aren't clear-cut cases either way, and experience and practice are often the best judge for a given choice.
If you're looking for speed you may want to consider the underlying data structures you're using. Dictionary<> in C# is a hash table, while SortedDictionary<> in C# is a binary search tree.
F# and Haskell both do a good job of representing tree data structures. You may want to consider using a more specific data structure over the default ones C# provides.
At a high level I would figure out what performance characteristics your formulas display and compare them against different data structures (Wikipedia is a good source if you need a refresher). Once you figure out which data structures to use, then I'd worry about which implementations to use.
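A minimal sketch of the two structures mentioned above, using only the standard collections:

    // Sketch: the two lookup structures mentioned above.
    using System;
    using System.Collections.Generic;

    class Lookups
    {
        static void Main()
        {
            // Hash table: O(1) average lookup, unordered iteration.
            var hash = new Dictionary<string, int> { ["b"] = 2, ["a"] = 1 };

            // Binary search tree: O(log n) lookup, sorted iteration.
            var tree = new SortedDictionary<string, int> { ["b"] = 2, ["a"] = 1 };

            Console.WriteLine(hash["a"]);                   // 1
            Console.WriteLine(string.Join(",", tree.Keys)); // a,b
        }
    }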
How would you design this at a high level?
Basically, you use higher-order functions to factor the work into reusable components with low syntactic overhead. Then you might like to migrate from imperative data structures to purely functional data structures (purely functional computation wrapped in side effects for IO like database writes). Finally, you might even track side effects (completely purely functional).
As a rough guide, these three gradations to complete purity are seen firstly in Lisp (largely impure), Standard ML (much heavier use of purely functional data structures) and Haskell (complete purity).
I cannot give more specifics without knowing the exact problem but you can rest assured many people are doing this on a daily basis now and it works extremely well.
Functional programming in an OO language tends to be wrong. It produces overly verbose code that doesn't perform well and is more error prone (such as writing deeply recursive functions in a language that doesn't support tail calls.)
1. Static dictionaries are bad. It seems the function could have side effects.
Either it does or does not have side effects. A static dictionary can be a good way to implement memoization in an OO language.
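For example, a minimal sketch of memoization via a static dictionary (classic Fibonacci; the question's actual domain would of course differ):

    // Sketch: memoization with a static dictionary in C#.
    using System;
    using System.Collections.Generic;

    class Memo
    {
        // The static dictionary is hidden state, but the observable
        // behaviour stays that of a pure function.
        static readonly Dictionary<int, long> cache = new Dictionary<int, long>();

        static long Fib(int n)
        {
            if (n < 2) return n;
            if (cache.TryGetValue(n, out var hit)) return hit;
            long result = Fib(n - 1) + Fib(n - 2);
            cache[n] = result;
            return result;
        }

        static void Main() => Console.WriteLine(Fib(50)); // 12586269025
    }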
3. Traditional OO patterns are probably not going to work.
OO patterns work well in an OO language; trying to shoehorn FP techniques into an OO language will produce verbose and brittle code. It is rather like trying to use a screwdriver with hammer techniques: sure, it produces a result, but there are better ways. Try to use your tools in the best way possible. Certain FP techniques can be useful, but completely ignoring the language isn't going to make for good quality code.