Regular expression-like languages to search and replace in *tree-like* structures or even graphs

The first such example is XSLT. A second example might be a hypothetical language that is basically the well-known regular expression language, but additionally has a special construct that matches any number of _balanced_ parentheses. (Note the difference in approach between the first and second examples: the first transforms trees, while the second transforms strings that are treated as trees. Also, this hypothetical language seems very useful to me.)
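As a concrete illustration, here is a minimal sketch of that balanced-parentheses construct using Python's third-party regex module, whose recursive (?R) pattern approximates it (the module and pattern are my illustration, not part of the question):

import regex  # third-party module (pip install regex); supports recursion

# (?R) recursively matches the entire pattern, so this matches any
# number of nested, balanced parentheses.
balanced = regex.compile(r'\((?:[^()]|(?R))*\)')

print(balanced.findall('f(a, g(b, (c))) plus (unbalanced('))
# -> ['(a, g(b, (c)))']  - the unbalanced fragment is not matched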
I do know that there are a lot of such languages, but the importance of tree-like structures in a programmer's job and the convenience of the match-and-replace approach justify such a broad question.
Please do not mention languages whose implementations are closed source or severely restricted in other ways. However, if you know of a really nice language without a working implementation, it may still be worth mentioning. Thanks.

Another language, which is an alternative to XSLT for searching, transforming, and generating XML data, is XQuery (with XQuery Update).

Related

Creating source to source translator

I want to know what the strategies are for creating a source-to-source translator, i.e. translating from one high-level language to another. The two approaches that come to mind are:
1- Transforming the syntax tree of one language into the syntax tree of the other
2- Converting to an intermediate language and then converting that to the other high-level language
My question is: is it possible to do the conversion using both strategies, and which is more feasible? Can anyone give references to theory, or to an implementation of a converter using either of the above methods? Also, is there any standard XML-based intermediate language? I know that XMLVM uses XML as its intermediate language, but it does not provide a proper specification of that language.
Any compiler is, roughly, a source-to-source translator. The target language can be an assembly language (or directly a binary machine code language), or C, or whatever high-level language you fancy. So the general theory of compilers is applicable.
And just as a word of advice - one intermediate language is normally not nearly enough. Use more. Use dozens of intermediate languages, each differing from the previous one in just one tiny aspect. That way, every step of a language-to-language translation becomes trivial.
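As an illustration of that many-tiny-passes style, here is a minimal sketch using Python's own ast module; the single pass shown is a hypothetical example and assumes the assignment target is a simple name:

import ast

class DesugarAugAssign(ast.NodeTransformer):
    # One tiny lowering step: rewrite `x += y` into `x = x + y`.
    def visit_AugAssign(self, node):
        load = ast.Name(id=node.target.id, ctx=ast.Load())
        return ast.copy_location(
            ast.Assign(targets=[node.target],
                       value=ast.BinOp(left=load, op=node.op, right=node.value)),
            node)

tree = ast.parse("total += 1")
for p in [DesugarAugAssign()]:  # in practice: a long list of tiny passes
    tree = ast.fix_missing_locations(p.visit(tree))
print(ast.unparse(tree))  # -> total = total + 1

A real translator would chain dozens of such passes, each one a trivial rewrite, until the tree is in a shape the target language can express directly.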
Another word of advice (anticipating downvotes here) - stay away from XML, especially as a representation for ASTs.
I would look at LLVM, which can do source to source. Although the output isn't pretty, it might provide some good ideas.
The converters are usually based on constructing the semantic tree of one program and then re-shaping it to the target PL. As an example, take a look at a C# to Java converter.
The second approach is also possible, but the organization of your code may change completely after conversion. So, it is better to keep the intermediate common structure (IL, ST, etc), as high level as possible.
Try Clang! It is powerful for source-to-source translation. As of now it fully supports C, C++, Objective-C and Objective-C++.
You may also want to look at the ROSE compiler infrastructure.

Is it possible to mark up all programming languages under the object-oriented paradigm using a common markup schema?

I have planned to develop a tool that converts a program written in one programming language (e.g. Java) to a common markup language (e.g. XML), and that markup code is then converted to another language (e.g. C#).
In simple words, it is a programming language converter that converts a program written in one language to another language.
I think it is possible, but I don't know where to start. I want to know the possibilities for doing so, and about any existing systems.
What you are trying to do is extremely hard, but if you want to know what you are up against, I've listed the steps you need to follow below:
First the hard bit:
1. Obtain or derive an operational semantics for your source and target languages.
2. Enhance the semantics to capture your source and target memory models.
3. Unify the two enhanced semantics within a common operational model.
4. Define a mapping from your source languages onto the common operational model.
5. Define a mapping from your operational model to your target language.
Step 4, as you pointed out in your question, is trivial.
Step 1 is difficult, as most languages do not have sufficiently formal semantics specified; but I recommend checking out http://lucacardelli.name/TheoryOfObjects.html as this is the best starting point for building a traditional OO semantics.
Step 2 is almost certainly impossible in general, but may be merely obscenely difficult if you are willing to sacrifice some efficiency.
Step 3 will depend on how clean the result of step 1 turned out, but is going to be anything from delicate and tricky to impossible.
Step 5 is not going to be trivial; it is effectively writing a compiler.
Ultimately, what you propose to do is impossible in general, due to the difficulties inherent in steps 1 and 2. However, it should be difficult but doable if you are willing to: severely restrict the source language constructs supported; pretty much forget about handling threads correctly; and pick two languages with sufficiently similar semantics (i.e. Java and C# are OK, but C++ and anything else is not).
It depends on what languages you want to support, but in general this is a huge & difficult task unless you plan to only support a very small subset of each language.
The real problem is that each programming language has different features (with some areas that overlap and others that don't) and different ways of solving the same problems - and it's pretty tricky to detect the problem the programmer is trying to solve and convert that to a new idiom. :) And think about the differences between GUIs created in different languages...
See http://xmlvm.org/ as an example (a project aimed at converting between source code of many different languages, with an XML middle point) - the site covers in some depth the challenges they are tackling and the compromises they make, and (if you still have any interest in this kind of project...) ask more specific follow-up questions.
Notice specifically what the output source code looks like -- it's not at all readable, maintainable, efficient, etc..
It is "technically easy" to produce XML for any single language: build a parser, construct an abstract syntax tree, and dump out that tree as XML. (I build tools that do this off-the-shelf for many languages.) By technically easy, I mean that the community knows how to do this (see any compiler textbook, e.g., the Aho & Ullman Dragon book). I do not mean this is a trivial exercise in terms of effort, because real languages are complicated and messy; there have been many attempts to build C++ parsers and few successes. (I have one of the successes, and it was expensive to get right.)
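For instance, here is a minimal sketch of that parse-then-dump step in Python, using the standard ast and xml.etree modules (the helper function is illustrative, not one of the off-the-shelf tools mentioned above):

import ast
import xml.etree.ElementTree as ET

def ast_to_xml(node):
    # One XML element per AST node; child nodes become child elements,
    # plain values become attributes.
    elem = ET.Element(type(node).__name__)
    for name, value in ast.iter_fields(node):
        if isinstance(value, ast.AST):
            value = [value]  # treat a single child like a list of one
        if isinstance(value, list):
            for item in value:
                if isinstance(item, ast.AST):
                    child = ast_to_xml(item)
                    child.set("field", name)
                    elem.append(child)
        elif value is not None:
            elem.set(name, str(value))
    return elem

print(ET.tostring(ast_to_xml(ast.parse("x = 1 + 2")), encoding="unicode"))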
What is really hard (and which I don't try to do) is to produce XML according to a single schema in which the language semantics are exposed. And without that, it will be essentially impossible to write a translator from a generic XML to an arbitrary target language. This is known as the UNCOL problem, and people have been looking for the answer since 1958. I note that the Wikipedia article seems to indicate the problem is solved, but you can't find many references to UNCOL in the literature since 1961.
The closest attempt I've seen to this is the OMG's "ASTM" model (http://www.omg.org/spec/ASTM/1.0/Beta1/); it exports XMI, which is XML. But the ASTM model has lots of escapes built into it to allow languages that it doesn't model perfectly (AFAIK, that means every language) to extend the XMI in arbitrary ways so that the language-specific information can be encoded. Consequently, each language parser produces a custom version of the XMI, and thus each reader pretty much has to know about the extensions, and full generality vanishes.

What is the purpose of case sensitivity in languages? [duplicate]

Possible Duplicates:
Is there any advantage of being a case-sensitive programming language?
Why are many languages case sensitive?
Something I have always wondered is: why are languages designed to be case-sensitive?
My pea brain can't fathom any possible reason why it is helpful.
But I'm sure there is one out there. And before anyone says it, having variables called dog and Dog differentiated only by case is really, really bad practice, right?
Any comments appreciated, along with perhaps any history on the matter! I'm insensitive about case sensitivity generally, but sensitive about sensitivity around case sensitivity so let's keep all answers and comments civil!
It's not necessarily bad practice to have two members which are only differentiated by case, in languages which support it. For example, here's a fairly common bit of C#:
private readonly string name;
public string Name { get { return name; } }
Personally I'm quite happy with case sensitivity - particularly as it allows code like the above, where the member variable and property follow conventions anyway, avoiding confusion.
Note that case-sensitivity has a culture aspect too... not all cultures will deem the same characters to be equivalent...
One of the biggest reasons for case-sensitivity in programming languages is readability. Things that mean the same should also look the same.
I found the following interesting example by M. Sandin in a related discussion:
I used to believe case sensitivity was a mistake, until I did this in the case-insensitive language PL/SQL (syntax now entirely forgotten):

function IsValidUserLogin(user: string, password: string): bool
begin
    result = select * from USERS
             where USER_NAME = user and PASSWORD = password;
    return not is_empty(result);
end

This passed unnoticed for several months on a low-volume production system, and no harm came of it. But it is a nasty bug, sprung from case insensitivity, coding conventions, and the way humans read code. The lesson for me was: things that are the same should look the same.
Can you see the problem immediately? I couldn't... (In a case-insensitive language, PASSWORD and password are the same identifier, and inside the query it resolves to the column, so the WHERE clause compares the PASSWORD column with itself and is always true: any existing user name logs in with any password.)
I like case sensitivity in order to differentiate between class and instance.
Form form = new Form();
If you can't do that, you end up with variables called myForm or form1 or f, which are not as clean and descriptive as plain old form.
Case sensitivity also means that you don't have references to form, FORM and Form which all mean the same thing. I find it difficult to read such code. I find it much easier to scan code where all references to the same variable look exactly the same.
Something I have always wondered is: why are languages designed to be case-sensitive?
Ultimately, it's because it is easier to implement a case-sensitive comparison correctly; you just compare bytes/characters without any conversions. You can also do other things like hashing really easily.
Why is this an issue? Well, case-insensitivity is rather hard to add unless you're in a tiny domain of supported characters (notably, US-ASCII). Case conversion rules vary by locale (the Turkish rules are not the same as those in the rest of the world) and there's no guarantee that flipping a single bit will do the right thing, or that it is always the same bit under the same preconditions. (IIRC, there are some really complex rules in some languages for throwing away diacritics when converting vowels to upper case, and reintroducing them when converting to lower case. I forget exactly what the details are.)
If you're case-sensitive, you just ignore all that; it's just simpler. (Mind you, you still ought to pay attention to Unicode normalization forms, but that's another story, and it applies whatever case rules you're using.)
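A small illustration of the point, in Python (the specific examples are mine):

print('ß'.upper())                                  # 'SS': one char becomes two
print('straße'.casefold() == 'STRASSE'.casefold())  # True: full case folding
print('I'.lower())  # 'i': right for English, but wrong for Turkish, where
                    # the lowercase of 'I' is dotless 'ı'; str.lower() is not
                    # locale-aware, so Turkish needs special handling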
Imagine you have an object called dog, which has a method called Bark(). Also you have defined a class called Dog, which has a static method called Bark(). You write dog.Bark(). So what's it going to do? Call the object's method or the static method from the class? (in a language where :: doesn't exist)
I'm sure originally it was a performance consideration. Converting a string to upper or lower case for caseless comparison isn't an expensive operation exactly, but it's not free either, and on old systems it may have added complexity that the systems of the day weren't ready to handle.
And now, of course, languages like to be compatible with each other (VB for example can't distinguish between C# classes or functions that differ only in case), people are used to naming things the same text but with different cases (See Jon Skeet's answer - I do that a lot), and the value of caseless languages wasn't really enough to outweigh these two.
The reason you can't understand why case-sensitivity is a good idea, is because it is not. It is just one of the weird quirks of C (like 0-based arrays) that now seem "normal" because so many languages copied what C did.
C uses case sensitivity in identifiers, but from a language design perspective that was a weird choice. Most languages that were designed from scratch (with no consideration given to being "like C" in any way) were made case-insensitive. This includes Fortran, Cobol, Lisp, and almost the entire Algol family of languages (Pascal, Modula-2, Oberon, Ada, etc.).
Scripting languages are a mixed bag. Many were made case-sensitive because the Unix filesystem was case-sensitive and they had to interact sensibly with it. C kind of grew up organically in the Unix environment, and probably picked up the case-sensitive philosophy from there.
Case-sensitive comparison is (from a naive point of view that ignores canonical equivalence) trivial (simply compare code points), but case-insensitive comparison is not well defined and is extremely complex, and the rules are impossible to remember. Implementing it is possible, but will inadvertently lead to unexpected and surprising behavior. BTW, some languages like Fortran and Basic have always been case-insensitive.

Why are there not more control structures in most programming languages?

Why do most languages seem to exhibit only fairly basic control structures from a logic point of view? Stuff like if...then, else, loops, for each, switch statements, etc. The standard list seems fairly basic from a logic point of view.
Why is there not much more in the way of logical syntactic sugar? Perhaps something like a proposition engine, where you could feed in an array of premises or functions that return complicated, self-referential, interdependent functions and results. Something where you could chain together a complex array of conditions, but represented in a way that is easy and clear to read in the code:
Premise 1
Premise 2 if and only if Premise 1
Premise 3
Premise 4 if Premise 2 and Premise 3
Premise 5 if and only if Premise 4
etc...
Conclusion
I realize that this kind of logic can be constructed with functions and/or nested conditional statements, for example as in the sketch below. But why are there not generally more syntax options for structuring these kinds of logical propositions without resulting in hairy-looking conditional statements that can be hard to read and debug?
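For instance, a minimal sketch of such an engine built from plain Python functions (all names hypothetical, with "if and only if" evaluated in the forward direction only):

rules = {
    "p1": lambda v: True,
    "p3": lambda v: True,
    "p2": lambda v: v("p1"),              # premise 2 iff premise 1
    "p4": lambda v: v("p2") and v("p3"),  # premise 4 if premises 2 and 3
    "p5": lambda v: v("p4"),              # premise 5 iff premise 4
}

def value(name):
    # Evaluate a premise by recursively evaluating what it depends on.
    return rules[name](value)

print(value("p5"))  # True: the conclusion follows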
Is there an explanation for the kinds of control structures we typically see in mainstream programming languages? Are there specific control structures you would like to see directly supported by a language's syntax? Does this just add unnecessary complexity to the language?
Have you looked at Prolog? A Prolog program is basically a set of rules that is turned into one big evaluation engine.
From my personal experience Prolog is a bit too weird and I actually prefer ifs, whiles and so on but YMMV.
Boolean algebra is not difficult, and provides a solution for any conditionals you can think of, plus an infinite number of other variants.
You might as well ask for special syntax for "commonly-used" arithmetic expressions. Who is to say what qualifies as commonly-used? And where do you stop adding special-case syntax?
Adding to the complexity of a language parser is not preferable to using constructive expression syntax, combined with extensibility through defining functions.
It's been a long time since my Logic class in college but I would guess it's a mixture of difficulty in writing them into the language vs. the frequency with which they'd be used. I can't say I've ever had the need for them (not that I can recall). For those times that you would require something of that ilk the language designers probably figure you can work out the logic yourself using just the basic structures.
Just my wild guess though.
Because most programming languages don't provide sufficient tools for users to implement them, it is not seen as an important enough feature for the implementer to provide as an extension, and it isn't demanded enough or used enough to be added to the standard.
If you really want it, use a language that provides it, or provides the tools to implement it (for instance, lisp macros).
It sounds as though you are describing a rules engine.
The basic control structures we use mirror what the processor can do efficiently. Basically, this boils down to simple test-and-branch instructions.
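For instance, Python's dis module makes this visible: an if statement compiles down to a comparison followed by a conditional jump (a small sketch; exact opcode names vary between CPython versions):

import dis

def sign(x):
    if x > 0:
        return 1
    return 0

dis.dis(sign)  # shows a COMPARE_OP followed by a conditional-jump opcode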
It may seem limiting to you, but many people don't like the idea of writing a simple-looking line of code that requires hundreds or thousands (or millions) of processor cycles to complete. Among these people are systems software folks, who write things like operating systems and compilers. Naturally, most compilers are going to reflect their writers' concerns.
It relates to the concern regarding atomicity. If you can express A, B, C, D in terms of simpler structures Y and Z, why supply A, B, C, D at all instead of just Y and Z?
The existing languages reflect 60 years of the tension between atomicity and usability. The modern approach is "small language, large libraries". (C#, Java, C++, etc).
Because computers are binary, all decisions must come down to a 1/0, yes/no, true/false, etc.
To be efficient, the language constructs must reflect this.
Eventually all your code goes down to a micro-code that is executed one instruction at a time. Until the micro-code and accompanying CPU can describe something more colorful, we are stuck with a very plain language.

What does "expressive" mean when referring to programming languages?

I hear this word a lot in sentences like "javascript is a very expressive language". Does it just mean there aren't a lot of rules, or does "expressive" have a more specific meaning?
'Expressive' means that it's easy to write code that's easy to understand, both for the compiler and for a human reader.
Two factors that make for expressiveness:
intuitively readable constructs
lack of boilerplate code
Compare this expressive Groovy with the less expressive Java equivalent:
3.times {
    println 'Hip hip hooray'
}

vs

for (int i = 0; i < 3; i++) {
    System.out.println("Hip hip hooray");
}
Sometimes you trade precision for expressiveness - the Groovy example works because it assumes stuff that Java makes you specify explicitly.
I take it to mean that it's capable of expressing ideas/algorithms/tasks in an easy-to-read and succinct way.
Usually I associate a language being expressive with syntactic sugar, although that's not always the case. Examples in C# of it being expressive would be:
foreach (instead of explicitly writing the iteration)
the using statement (instead of explicitly writing the try/finally)
query expressions (simpler syntax for writing LINQ queries)
extension methods (allowing chaining of method calls, again primarily for LINQ)
anonymous methods and lambda expressions (allowing easier delegate and expression tree construction)
A different example would be generics: before C# got generics, you couldn't express the idea of "an ArrayList containing only strings" in code. (You could document it, of course, or write your own StringList type, but that's not quite the same.)
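As a rough analogue in another language (my example, not from the answer above): Python's type annotations let you state the same "a list containing only strings" idea in code, checkable with an external type checker such as mypy:

def shout(names: list[str]) -> list[str]:
    # The annotation states "only strings" in code, not just in a comment.
    return [n.upper() for n in names]

print(shout(["ada", "grace"]))  # ['ADA', 'GRACE']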
Neal Gafter has a blog post with a good quote on the subject:
In my mind, a language construct is expressive if it enables you to write (and use) an API that can't be written (and used) without the construct.
I'd say that it means you can more naturally express your thoughts in code.
That's a tough one.
For me, it has to do with the ease at which you can express your intent. This is different in different languages, and also depends a lot on what you want to do, so this is an area where generalizations are common. It's also subjective and personal, of course.
It's easy to think that a more high-level language is always more expressive, but I don't think that is true. It depends on what you're trying to express, i.e. on the problem domain.
If you wanted to print the floating-point number that has the binary pattern 0xdeadbeef, that is far easier to do in C than in Bash, for instance. Yet Bash is, compared to C, an ultra-high-level language. On the other hand, if you want to run a program and collect its output into a text file, that is so simple it's almost invisible in Bash, yet would require at least a page of code in C (assuming a POSIX environment).
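For comparison, here is that first task sketched in Python, which sits somewhere between the two:

import struct

# Reinterpret the 32-bit pattern 0xdeadbeef as an IEEE-754 single-precision
# float by packing the bytes and unpacking them as a big-endian float.
bits = 0xDEADBEEF
(value,) = struct.unpack('>f', bits.to_bytes(4, 'big'))
print(value)  # roughly -6.26e18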
Here is a very controversial comparison:
http://redmonk.com/dberkholz/2013/03/25/programming-languages-ranked-by-expressiveness/
So, what are the best languages by these metrics? If you pick the top 10 based on ranking by median and by IQR, then take the intersection of them, here's what's left. The median and IQR are listed immediately after the names:
Augeas (48, 28): A domain-specific language for configuration files
Puppet (52, 65): Another DSL for configuration
REBOL (57, 47): A language designed for distributed computing
eC (75, 75): Ecere C, a C derivative with object orientation
CoffeeScript (100, 23): A higher-level language that transcompiles to JavaScript
Clojure (101, 51): A Lisp dialect for functional, concurrent programming
Vala (123, 61): An object-oriented language used by GNOME
Haskell (127, 71): A purely functional, compiled language with strong static typing
http://en.wikipedia.org/wiki/Expressive_power
Maybe this site http://gafter.blogspot.com/2007/03/on-expressive-power-of-programming.html can help you
In short, he says: "In my mind, a language construct is expressive if it enables you to write (and use) an API that can't be written (and used) without the construct." In the context of the Closures for Java proposed language extension, control abstraction APIs are the kind of thing that don't seem to be supported by the competing proposals.
I'd make a distinction between expressivity and expressiveness.
Expressivity - expressive power: the breadth of ideas that can be represented and communicated in a language (with reasonable effort).
Expressiveness - the ability to express complex things in a compact way without having to spell out details; the opposite of wordiness (this comes down to "easier to write or understand", or compactness of expressions). This is the definition used by the controversial article already mentioned.
The qualifier "with reasonable effort" serves to avoid the sharp edge (and contrived stretches) of "at all" (people "proving" that "everything" can be written in language X even though it's clearly not meant for "that" - for example, mad "proofs" that an imperative/iterative algorithm can be written in XSLT).
With these definitions we can see how expressiveness and expressivity can be antagonists. So-called "higher"/declarative languages usually have high expressiveness (compact expressions denote functionality of hundreds or thousands of lines of code) but substantially decreased expressivity. They achieve compactness of expression by restricting the domain (the things they can work with, the ideas one can express in them).
Strictly functional languages have to do huge acrobatics to express very simple things (like counting), if they can at all. When they can't, they are incomplete and relegated to rather narrow, specialized applications.
One thing we didn't touch on is performance. A language that can't give fast results gets relegated to academic, sketching, or experimental use. Would you call a language "more expressive" if the same algorithm runs 100 times slower in it? You'd call it a waste of time :-)
High expressiveness (easier to write or understand) tends to cost a lot of performance; high expressivity usually lets you choose whether to do (approximately) the same algorithm with "lower" (faster) or "higher" (slower) constructs.
Python is a good example, since it mixes constructs with high expressivity and high expressiveness (it's no accident that it's so beloved) - as long as they are not mixed, that is :-) You'll see articles (including here on Stack Overflow) comparing how using very different constructs for the same problem can result in huge performance differences. But it's the fact that you do have a choice (high expressivity) that gives you reasonable confidence that you will find (and measure) the fastest way - eventually :-)
A quite recent debate: Gremlin vs Cypher (the latter on its way to being enshrined as the GQL standard). Cypher is praised for being simple, easy to learn, and declarative. But it can't express algorithms/tactics (in graph crawling) that Gremlin can, even in theory, and it is 100-200 times slower - by the admission of the team/company that writes and popularizes it.
This is why it's important to be aware whether you are talking about expressivity or expressiveness and not reduce it to a vague "expressive".
The high expressivity of Gremlin lets you use a declarative or imperative "way" as needed and write the whole crawler as an "engine" (shall we say an FSA :-) When I was writing a system with very complex graph crawling, the crawlers were in strict C++ (modern - lambdas, higher-order templates, packs) based on the style/concepts of Gremlin (you have to think in terms of a crawler being active, 'live', and how far (in the future :-) it can look, if you want any chance of being fast).
The Gremlin vs Cypher situation is very interesting exactly because they are almost diametric opposites - Cypher all expressiveness (all the way down to simple, easy, declarative), Gremlin all expressivity. If you are writing missile navigation (or algorithmic trading), which one would you choose? And how would you know where to look if you call both "expressive"? :-)
