machine representation of natural text - nlp

I'm currently working on high-level machine representation of natural text.
For example,
"I had one dog but I gave it to Danny who didn't have any"
would be
I.have.dog = 1
I.have.dog -= 1
Danny.have.dog = 0
Danny.have.dog += 1
something like this....
I'm trying to find resources, but can't really find matching topics..
Is there a valid subject name for this type of research? Any library of resources?
Natural logic sounds like something related but it's not really the same thing I'm working on. Please help me out!

Representing natural language's meaning is the domain of computational semantics. Within that area, lots of frameworks have been developed, though the basic one is still first-order logic.
Specifically, your problem seems to be that of recognizing discourse semantics, which deals with information change brought about by language use. This is pretty much an open area of research, so expect to find a lot of research papers and PhD positions, but little readily-usable software.

As larsmans already said, this is pretty much a really open field of research, called computational semantics (a subfield of computational linguistics.)
There's one important thing that you'll need to understand before starting off in the comp-sem world: most people there use fancy high-level languages. By high-level I don't mean C, but more something like LISP, Prolog, or, as of late, Haskell. Computational semantics is very close to logic, which is why people researching the topic are more comfortable with functional and logical languages — they're closer to what they actually use all day long.
It will also be very useful for you to first look at some foundational course in predicate logic, since that's what the underlying literature usually takes for granted.
A good introduction to the connection between logic and language is L.T.F. Gamut, Logic, Language, and Meaning, Volume I. This deals with the linguistic side of semantics, which won't help you implement anything, but it will help you understand the following literature. That said, there are at least some books that will explain predicate logic as they go, but if you ask me, any person really interested in the representation of language as a formal system should take a course in predicate and possibly intuitionistic and intensional logic.
To give you a bit of a peek, your example is rather difficult to treat for
current comp-sem approaches. Not impossible, but already pretty high up the
scale of difficulty. What makes it difficult is the tense for one part (dealing
with tense and aspect will typically bring you into event semantics), but also
that you'd have to define the give and have relations in a way that
works for this example. (An easier example to work with would be, say "I had
a dog, but I gave it to Danny who didn't have any." Can you see why?)
Let's translate "I have a dog."
∃x[dog(x) ∧ have(I,x)]
(There is an object x, such that x is a dog and the have-relation holds between
"I" and x.)
These sentences would then be evaluated against a model, where the "I"
constant might already be defined. By evaluating multiple sentences in sequence,
you could then alter that model so that it keeps track of a conversation.
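To make that concrete, here is a minimal toy sketch in Python (my own illustration, not from any existing library) of the "evaluate against a model and update it" idea: the model is just a table of relation counts, and each sentence is treated as an update to that table. All of the names (DiscourseModel, update, holds) are made up for this example.
from collections import defaultdict

class DiscourseModel:
    def __init__(self):
        # (owner, relation, object) -> count, e.g. ("I", "have", "dog") -> 1
        self.facts = defaultdict(int)

    def update(self, owner, relation, obj, delta):
        # apply an information change; "gave it to Danny" is a -1 here, +1 there
        self.facts[(owner, relation, obj)] += delta

    def holds(self, owner, relation, obj):
        return self.facts[(owner, relation, obj)] > 0

# "I had one dog but I gave it to Danny who didn't have any"
m = DiscourseModel()
m.update("I", "have", "dog", +1)       # I had one dog
m.update("Danny", "have", "dog", 0)    # Danny didn't have any
m.update("I", "have", "dog", -1)       # ...but I gave it away
m.update("Danny", "have", "dog", +1)   # ...to Danny

print(m.holds("I", "have", "dog"))     # False
print(m.holds("Danny", "have", "dog")) # True
Real discourse-semantics systems replace that dictionary with a logical model and theorem proving, but the sentence-by-sentence update loop is the same idea.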
Let's give you some suggestions to start you off.
The classic comp-sem system is
SHRDLU, which places geometric figures of a certain color in a virtual environment. You can play around with it, since there's a Windows-compatible demo online at the page I linked to.
The best modern book on the topic is probably Blackburn and Bos
(2005). It's written in Prolog, but
there are sources linked on the page to learn Prolog
(now!)
Van Eijck and Unger give a good course on computational semantics in Haskell, which is a bit more recent, but in my eyes not quite as educational in terms of raw computational semantics as Blackburn and Bos.

Related

Recursion schemes for dummies?

I'm looking for some really simple, easy-to-grasp explanations of recursion schemes and corecursion schemes (catamorphisms, anamorphisms, hylomorphisms etc.) which do not require following lots of links, or opening a category theory textbook. I'm sure I've reinvented many of these schemes unconsciously and "applied" them in my head during the process of coding (I'm sure many of us have), but I have no clue what the (co)recursion schemes I use are called. (OK, I lied. I've just been reading about a few of them, which prompted this question. But before today, I had no clue.)
I think diffusion of these concepts within the programming community has been hindered by the forbidding explanations and examples one tends to come across - for example on Wikipedia, but also elsewhere.
It's also probably been hindered by their names. I think there are some alternative, less mathematical names (something about bananas and barbed wire?) but I have no clue what the cutesier names are for the recursion schemes that I use, either.
I think it would help to use examples with datatypes representing simple real-world problems, rather than abstract data types such as binary trees.
Extremely loosely speaking, a catamorphism is just a slight generalization of fold, and an anamorphism is a slight generalization of unfold. (And a hylomorphism is just an unfold followed by a fold.) They're usually presented in a more rigorous form, to make the connection to category theory clearer. The denser form lets us distinguish data (the necessarily finite product of an initial algebra) and codata (the possibly infinite product of a final coalgebra). This distinction lets us guarantee that a fold is never called on an infinite list. The other reason for the funny way that catamorphisms and anamorphisms are generally written is that by operating over F-algebras and F-coalgebras (generated from functors) we can write them once and for all, rather than once over a list, once over a binary tree, etc. This in turn helps make clear exactly why they're all the same thing.
But from a pure intuition standpoint, you can think of cata and ana as reducing and producing, and that's about it.
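To make that intuition concrete, here is a small Python sketch of that reading (my own illustration, not the F-algebra formulation you'll find in the literature): cata reduces a list to a value, ana grows a list from a seed, and hylo is the unfold followed by the fold, fused so the intermediate list is never built.
def cata(combine, empty, xs):
    """Fold a list down to a single value (a list-specific catamorphism)."""
    result = empty
    for x in reversed(xs):
        result = combine(x, result)
    return result

def ana(next_step, seed):
    """Unfold a seed into a list; next_step returns (element, new_seed) or None."""
    out = []
    while True:
        step = next_step(seed)
        if step is None:
            return out
        element, seed = step
        out.append(element)

def hylo(combine, empty, next_step, seed):
    """Unfold then fold, without ever building the intermediate list."""
    step = next_step(seed)
    if step is None:
        return empty
    element, new_seed = step
    return combine(element, hylo(combine, empty, next_step, new_seed))

print(ana(lambda k: None if k == 0 else (k, k - 1), 5))   # [5, 4, 3, 2, 1]
print(cata(lambda x, acc: x + acc, 0, [1, 2, 3]))         # 6

# factorial as a hylomorphism: unfold n into n, n-1, ..., 1 and fold with (*)
fact = lambda n: hylo(lambda x, acc: x * acc, 1,
                      lambda k: None if k == 0 else (k, k - 1), n)
print(fact(5))  # 120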
Edit: a bit more
A metamorphism (Gibbons) is like an inside-out hylo -- it's a fold followed by an unfold. So you can use it to tear down a stream and build up a new one with a potentially different structure.
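A hedged illustration of that shape (again my own toy Python, with made-up names): fold a list of decimal digits down to a number, then unfold that number into binary digits.
def digits_to_int(ds):           # the fold
    n = 0
    for d in ds:
        n = n * 10 + d
    return n

def int_to_bits(n):              # the unfold
    bits = []
    while n > 0:
        bits.append(n % 2)
        n //= 2
    return list(reversed(bits)) or [0]

def metamorphism(ds):
    # tear one structure down to a value, build a different structure back up
    return int_to_bits(digits_to_int(ds))

print(metamorphism([1, 3]))  # 13 -> [1, 1, 0, 1]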
Ekmett posted a nice "field guide" to the various schemes in the literature: http://comonad.com/reader/2009/recursion-schemes/
However, while the "intuitive" explanations are straightforward, the linked code is less so, and the blog posts on some of these might be a tad on the complex/forbidding side.
That said, except perhaps for histomorphisms I don't think the rest of the zoo is necessarily something you'd want to think with directly most of the time. If you "get" hylo and meta, you can express nearly anything in terms of them alone. Typically the other morphisms are more restrictive, not less (but therefore give you more properties "for free").
A few references, from the most category-theoretic (but relevant to give a "territory map" that will let you avoid "clicking lots of links") to the simpler & more self-contained:
As far as the "bananas & barbed wire" vocabulary goes, this comes from the original paper of Meijer, Fokkinga & Paterson (and its sequel by other authors), and it is in sum just as notation-heavy as the less cute alternatives: the "names" (bananas, etc.) are just a shortcut for the graphical appearance of the ASCII notation of the constructions they are pegged to. For example, catamorphisms (i.e. folds) are written (| _ |), and the bracket-with-bars looks like a "banana", hence the name. This is the paper that is most often called "impenetrable", hence not the first thing I'd look up if I were you.
The basic reference for those recursion schemes (or more precisely, for a relational approach to those recursion schemes) is Bird & de Moor's Algebra of Programming (the book is unavailable except as print-on-demand, but there are copies available second-hand and it should be in libraries). It contains a more gradual and detailed explanation of point-free programming, if still an "academic" one: the book introduces some category-theoretic vocabulary, though in a self-contained manner. The exercises (which you wouldn't find in a paper) also help.
Sorting morphisms by Lex Augusteijn uses sorting algorithms on various data structures to explain recursion schemes. It is pretty much "recursion schemes for dummies" by construction:
This presentation gives the opportunity to introduce the various morphisms in
a simple way, namely as patterns of recursion that are useful in functional programming, instead of the usual approach via category theory, which tends to be needlessly intimidating for the average programmer.
Another approach to making a symbols-free presentation is Jeremy Gibbons' chapter Origami Programming in The Fun of Programming, with some overlap with the previous one. Its bibliography gives a tour of the introductions to the topic.
Edit: Jeremy Gibbons just let me know he has added a link to the bibliography of the whole book on the book's webpage after reading this question. Enjoy!
I'm afraid these last two references only give a solid explanation of (cata|ana|hylo|para)morphisms, but my hope is that this would be enough to tear through the algebraic formalism you can find in more notation-heavy publications. I don't know of any strictly non-category-theoretic explanation of (co-)recursion schemes other than those four.
Tim Williams gave a brilliant talk at the London Haskell User Group last night about recursion schemes with a motivating example of each of the ones you mention. Check out the slides:
http://www.timphilipwilliams.com/slides.html
There are references to all the usual suspects (lenses, bananas, barbed wire, à la carte, etc.) at the end of the slides, and you could also google "Origami Programming", which is a nice intro that I hadn't come across before.
and the video will be here when it's uploaded:
http://www.youtube.com/user/LondonHaskell
edit Most of the links in question are in huitseeker's answer above.

Is similarity to "natural language" a convincing selling point for a programming language? [closed]

Look, for example, at AppleScript (and there are plenty of others, some admittedly quite good), which advertise their use of the natural language metaphor. Code is apparently more readable because it can be, or is intended to be, constructed in English-like sentences, or so they say. I'm sure there are people who would like nothing better than to program using only English sentences. However, I have doubts about the viability of a language that takes that paradigm too far (excepting niche cases).
So, after a certain reasonable point, is natural-languaginess a benefit or a misfeature? What if the concept is carried to an extreme -- will code necessarily be more readable? Or might it be unnecessarily long, difficult to work with, and just as capable of producing hilarity on the scale of obfuscated Perl, obfuscated C, and eye-twisting Bash script logorrhea?
I am aware of some specialty cases like "Inform" that are almost pure English, but these have a niche that they're not likely to venture out from. I hear and read about how great it would be for code to read more like English sentences, but are there discussions of the possible disadvantages? If everyday language is so clear, simple, clean, lovely, concise, understandable, why did we invent mathematical notation in the first place?
Is it really easier to describe complex instructions accurately and precisely to a machine in natural language, or isn't something closer to mathematical markup a much better choice? Where should that line be drawn? And finally, are you attracted to languages that are touted as resembling English sentences? Should this whole question have just been a one liner:
naturalLanguage > computerishLanguage ? booAndHiss : cheerLoudly;
Well, of course, natural languages are rarely clear, simple, clean, lovely, concise, or understandable, which is one of the reasons that most programming is done in languages far from natural.
My answer to this would be that the ideal programming language lies somewhere between a natural language and a very formal language.
On the one extreme, there's the formal, minimal, mathematical languages. Take for example Brainfuck:
,>++++++[<-------->-],[<+>-]<. // according to Wikipedia, this means addition
Or, what's somewhat preferable to the above mess, any type of lambda calculus.
λxy.x
λxy.y
This is one possible way of expressing the Boolean truth values in lambda calculus. Doesn't look very neat, especially when you build logical operators (such as AND being e.g. λpq.pqp) around them.
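If you want to play with that encoding without a lambda-calculus interpreter, here is the same thing written as Python lambdas (a toy sketch of my own; the names are mine):
TRUE  = lambda x: lambda y: x        # λxy.x, selects its first argument
FALSE = lambda x: lambda y: y        # λxy.y, selects its second argument
AND   = lambda p: lambda q: p(q)(p)  # λpq.pqp

def to_bool(b):
    # apply the Church boolean to real booleans to read the result back out
    return b(True)(False)

print(to_bool(AND(TRUE)(FALSE)))  # False
print(to_bool(AND(TRUE)(TRUE)))   # True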
I claim that most people could not write production code in such a minimalistic, hard-to-grasp language.
The problem on the other end of the spectrum, namely natural languages as they are spoken by humans, is that languages with too much complexity and flexibility allow the programmer to express vague and indefinite things that can mean nothing to today's computers. Let's take this sample program:
MAYBE IT WILL RAIN CATS AND DOGS LATER ON. WOULD YOU LIKE THIS, DEAR COMPUTER?
IF SO, PRINT "HELLO" ON THE SCREEN.
IF YOU HATE RAIN MORE THAN GEORGE DOES, PRINT SOME VAGUE GARBAGE INSTEAD.
(IN THE LATTER CASE, IT IS UP TO YOU WHERE YOU OUTPUT THAT GARBAGE.)
Now this is an obvious case of vagueness. But sometimes you would get things wrong with more reasonable natural language programs, such as:
READ AN INTEGER NUMBER FROM THE TERMINAL.
READ ANOTHER INTEGER NUMBER FROM THE TERMINAL.
IF IT IS LARGER THAN ZERO, PRINT AN ERROR.
Which number is IT referring to? And what kind of error should be printed (you forgot to specify it)? You would have to be really careful and extremely explicit about what you mean.
It's already too easy to misunderstand other humans. How do you expect a computer to do better?
Thus, a computer language's syntax and grammar has to be strict enough so that it doesn't allow ambiguity. A statement must evaluate in a deterministic way. (There are maybe corner cases; I'm talking about the general case here.)
I personally prefer languages with a very limited set of keywords. You can quickly learn such a language, and you don't have to choose between 10,000 ways of achieving one goal simply because there's 10,000 keywords for doing the same thing (as in: GO/WALK/RUN/TROD/SLEEPWALK/etc. TO THE FRIDGE AND GET ME A BEER!). It means if you need to think about 10,000 different ways of doing something, it won't be due to the language, but due to the fact that there are 9,999 stupid ways to do it, and 1 elegant solution that just shines more than all the others.
Note that I wrote all natural language examples in upper-case. That's because I sort of had good old GW-BASIC and COBOL in mind while I wrote this. There've been some examples of programming languages that lean on natural language, and I think history has shown that they are, in general, somewhat less widespread than e.g. terse C-style languages.
I recently read that according to Gartner there are over 400 billion lines of COBOL source code in active use worldwide today.
That doesn't prove anything other than that banks and governments are fond of their legacy code, but you could construe it as a testament to the success of English-like programming languages. I'm not aware of any other programming language that is so close to English and so verbose.
Aside from that, I tend to agree with the other respondents: Programmers prefer not to type so much, and in general a language based on mathematics-like shorthand is both more expressive and more precise than one based on English.
There's a point where terse, expressive code looks like line noise. Perl, APL and J come to mind as examples with "illegible one-liners." Programmers are humans, and it may be beneficial to leave them with some similarity to natural language to give their brains something familiar to hold on to. Thus, I propagate a happy medium that's reminiscent of but not too close to natural language.
"When a programming language is created that allows programmers to program in simple English, it will be discovered that programmers cannot speak English." ~ Unknown
In my (not so) humble opinion, no.
Natural language is full of ambiguities. Normally we do not think of them because humans can easily disambiguate them, based on many criteria often unavailable to the computer. First off we have knowledge about the world (elephants don't fit in pajamas), but we also use more senses than just hearing when we speak to each other, body language to name one. The intonation and manner in which things are said also help a lot to disambiguate. It is harder to catch irony or sarcasm in written text, which is more or less a transcription of what we would say, more so in the case of IM, less so in the case of well-written articles. In general there are loads and loads of ambiguities in natural language, for instance where PPs (prepositional phrases) attach:
"Workers [dumped [sacks [with flour]]]"
"Workers [dumped [sacks] [with a fork-lift]]]"
Any human immediately tells where the PP will attach: it's reasonable to have sacks with flour in them, and it's reasonable to use a fork-lift to dump something. Another very troublesome area is the word "and", which messes up the grammar horrendously, as do all the references we use, pronouns in general, but also more complex references, e.g. "Bill bought a Dodge Viper; sadly the car was a lemon".
So we have three options. We can keep the ambiguities in and try to deal with them, accepting very many disambiguation errors and very, very slow parsing (no LALR or LL parser will work here). Or we can try to make an artificial grammar resembling natural language while keeping it deterministic, which is more reasonable but still horrible: we then have a language that falsely resembles English but isn't, which is confusing. We get none of the benefits of a proper syntax and none of the benefits of natural language, but an oversized, overly wordy monstrosity with a difficult and unintuitive grammar, difficult to learn and slow to write.
The third way is realizing that we need a succinct way of expressing ourselves, which can also be processed by a computer, not resembling any natural language, but focusing on being an unambiguous description of an algorithm. This increases readability, especially compared to a very precise natural-language counterpart. This is why many people prefer to read the pseudo-code when dealing with difficult problems or advanced algorithms: it relieves us of the trouble of dealing with ambiguities, and is better suited to expressing computer instructions.
The issue isn't so much that it's easier to describe complex ideas using one approach or the other, but it certainly is easier to understand machine languages (at least for machines). The biggest issue is, as always, ambiguity. Computers are terrible at understanding it, so most grammars for programming languages need to be constructed to either remove all ambiguity, or the general language must be constructed so that ambiguity isn't actually a problem (this is tricky).
Any programming language that allows for ambiguity would be terribly error prone; and any natural language that doesn't allow ambiguity would be terribly verbose and convoluted (I'm looking at you, Lojban [ok, maybe Lojban isn't so bad, still…]).
The propensity some people show for preferring natural languages as programming languages might essentially be rooted in the desire to eventually be able to feed a physics textbook into a parser, whereupon it'll do your homework when asked.
Of course, that's not to say that programming languages shouldn't have hints of natural language: Especially for OOP it makes good sense to have calling grammar resemble natural grammar, like in Obj-C, which is sort of a game of mad libs:
[pot makeCoffee:strong withSugar:NO];
Doing the same in BrainFuck would be, well, a brainfuck, three full pages of code to flip a switch will do that to you.
In essence, the best languages are (probably) the ones that resemble natural languages without pretending to be one. (Avoiding the uncanny valley of programming languages, if there is such a thing, if you will. [Subclauses! Yay!])
A natural language is too ambiguous to be used as programming language. It has to be artificially constrained to eliminate ambiguities.
But that defeats the purpose of having a "natural" programming language, because you get its verbosity and none of its advantages in expressiveness.
I think the fourth language I coded professionally in (after Fortran, Pascal and COBOL) was Natural, a pretty obscure 4GL of 1980s vintage for developing mainframe systems against an ADABAS database.
It was called Natural, I believe, because it had pretensions to be so: supposedly management-readable like COBOL, but minus the fluff.
Which should tell you that attempts at 'natural' programming languages have a commercial history of over 30 years now (more if you count COBOL), but they have pretty much lost out to languages that don't pretend to be 'natural' but do allow the programmer to define the problem succinctly. When I first started coding, the 1GL -> 2GL -> 3GL evolution wasn't that old, and the progression to 4GL (defined then as more English-like programming languages) for mainstream work seemed an obvious next step. It hasn't worked out that way. If anything, getting up to speed with coding has gotten harder because there are more abstract concepts to learn.
SQL was designed with natural language in mind originally. Fortunately it hasn't held on too tightly to this and advances since its conception are less "naturalistic".
But anybody who has tried to write a complicated query in SQL will tell you that it's not that easy. You have to worry about the scope of some keywords over your query. You end up with an incredibly hard-to-understand query that does some crazy shit, but you rewrite it every time you need to change something, because that's easier.
Natural language programming is a bad idea. The further you get from assembly, the more mistakes you can make, not in terms of logical errors or anything like that, but in terms of having the wrong assumption about how the script interpreter/bytecode interpreter/compiler makes your code run on the CPU.
It seems to be a great feature for beginners, or people who program as a "secondary activity". But I doubt you could reach the complexity and versatility of actual programming languages with natural language.
If there was a programming language that actually adhered to all of the conventions of the natural language it mimics, then that would be fantastic.
In reality, however, a lot of so-called "natural" programming languages have far stricter syntax than English, which means that although they are easily readable, it is debatable whether they are actually all that easy to write.
What makes sense in English is often a syntax error in AppleScript.
Everyday language isn't so clear, simple, clean, lovely, concise and understandable - to a computer. However, to a human, readability counts for a lot, and the closer you get to a natural language, the easier it is to read. That's why we're not all using assembly language.
If you have a completely natural language, there are a lot of things that need to be handled - the sentence needs to be parsed, each word must be understood - and there is plenty of room for ambiguity. That's generally not a good thing for a programming language, because then we're venturing into psychic programming - the computer has to figure out what you were thinking, which is not at all easy to get.
However, if you can make something sufficiently close to natural language - and yes, Inform 7 is probably the best example - so sentences look natural, but still have some structure you need to follow - then the code is almost instantly readable, even to people that don't know the language. There's usually also less specialized syntax to remember - because you're really just talking (a slightly modified form of) English - but if you have to do something out of the ordinary, then you might have to jump through some hoops to do that.
In practice, most languages don't bother with this, because that makes it easier for them to allow you to be precise. However, some will still hover closer to the "natural language". This can be a good thing: if you have to translate some pseudocode algorithm to a language, you don't need to manipulate it as much to make it work, reducing the risk that you make an error in the translation.
As an example, let's compare C and Pascal. This Pascal code:
for i := 1 to 10 do begin
    j := j + 1;
end;
is equivalent to this C code:
for (i = 1; i <= 10; i++) {
    j = j + 1;
}
If you had no prior knowledge of either syntax, the Pascal version is generally going to be simpler to read, if only because it's not as complex as a C for.
Let's also consider operators. Pascal and C both share +, - and *. They also both have /, but with different semantics: In C, / does an integer division if both operands are integers; in Pascal, it always does a "real" division and uses div for integer division. That means that you have to take the types into account when figuring out what actually happens in that line of code.
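As an aside (my own Python illustration, not part of the original C/Pascal comparison): Python resolves the same issue the way Pascal does, by giving the two divisions two different operators, so the operand types never change what the symbol means.
print(7 / 2)     # 3.5 -- "real" division, like Pascal's /
print(7 // 2)    # 3   -- integer division, like Pascal's div
print(7.0 // 2)  # 3.0 -- still floor division, regardless of operand types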
C also has a bunch of other operators: &&, ||, &, |, ^, <<, >> - in Pascal, those operators are instead named and, or, and, or, xor, shl, shr. Instead of relying on some semi-arbitrary sequence of characters, it's spelled out more. It's instantly obvious that xor is - well, XOR - unlike the C version, where there's no obvious correlation between ^ and XOR.
Of course, this is to some degree a matter of opinion: I much prefer a Pascal-like syntax to a C-like syntax, because I think it's more readable, but that doesn't mean everyone else does: A more natural language is usually going to be more verbose, and some people simply dislike that extra level of verbosity.
Basically, it's a matter of choosing what makes the most sense for the problem domain: if the problem domain is very limited (like with Inform), then a natural language makes perfect sense. If it's a very generic domain (like with C), then you either need far more advanced processing than we are currently capable of, or a lot of verbosity to fill in the details - and in that case, you have to choose a balance depending on what sort of users will be using the languages (for regular people, you need more naturalness, for people who know programming, they're usually comfortable enough with less natural languages and will prefer something closer to that end).
I think the question is, who reads and who writes the application code in question? I think, regardless of the language or architecture, a trained software developer should be writing the code, and analyze the code as bugs arise.

What does "powerful" mean, when discussing programming languages?

In the context of programming language discussion/comparison, what does the term "power" mean?
Does it have a well defined meaning? Even a poorly defined meaning?
Say if someone says "language X is more powerful than language Y" or asks the same as a question, what do they mean - or what information are they trying to find out?
It does not have a well-defined meaning. In these types of discussions, "language X is more powerful than language Y" usually means little more than "I like language X more than language Y." On the other end of the spectrum, you'll also usually have someone chime in about how any Turing-complete language can accomplish the same tasks as any other Turing-complete language, so that neither is strictly more powerful than the other.
I think a good meaning for it is expressivity. When a language is highly expressive, it means less code is required to express concepts. To me, this doesn't just mean that you have to write less code to accomplish the same tasks, but also that the code is easily readable by humans. Of course, generally (to a point), having fewer lines of code to read makes the task of reading and understanding easier for humans.
Having a "powerful" standard library comes into play here along the same lines. If a language comes equipped with thorough, complete libraries, then idiomatic code in that language will be able to benefit from the existing library code and not have to repeat or reinvent common functionality in application code. The end result is, again, having to write and read less code to accomplish the same tasks.
I keep saying "generally" and "to a point", because once a language gets too terse, it gets more difficult for humans to decipher. I suppose at this extreme, a language may still be considered "more powerful" (or even "too powerful"). So I guess I'm saying my personal interpretation of "powerful" includes some aspects of "useful" and "readable" in it as well.
C is powerful, because it is low level and gives you access to hardware. Python is powerful because you can prototype quickly. Lisp is powerful because its REPL gives you fantastic debugging opportunities. SQL is powerful because you say what you want and the DMBS will figure out the best way to do it for you. Haskell is powerful because each function can be tested in isolation. C++ is powerful because it has ten times the number of syntactic constructs that any one person ever needs or uses. APL is powerful since it can squeeze a ten-screen program into ten characters. Hell, COBOL is powerful because... why else would all the banks be using it? :)
"Powerful" has no real technical meaning, but lots of people have made proposals.
A couple of the more interesting ones:
Paul Graham wants to call a language "more powerful" if you can write the same programs in fewer lines of code (or some other sane, sensible measure of program size).
Matthias Felleisen has written a very serious theoretical study called On the Expressive Power of Programming Languages.
As someone who knows and uses many programming languages, I believe that there are real differences between languages, and that "power" can be a convenient shorthand to describe ways in which one language might be better than another. Nevertheless, whenever I hear a discussion or claim that one language is more powerful than another, I tend to keep one hand firmly on my wallet.
The only meaningful way to describe "power" in a programming language is "can do what I require with the least amount of resources" where "resources" is defined as "whatever costs I'd rather not pay" and could, thus, be development time, CPU time, memory space, money, etc.
So basically the definition of "power" is purely subjective and rendered meaningless in any objective discussion.
Powerful means "high in power". "Power" is something that increases your ability to do things, and "things" vary in shape, size and much else. Loosely speaking, therefore, "powerful" when applied to a programming language means that it helps you perform your tasks quickly and efficiently.
This makes "powerful" somewhat well defined but not constant across domains. A language powerful in one domain might be crippling in another eg. C is very powerful if you want to do systems level programming since it gives you direct access to the machine and hardware and structures that let you code much faster than you would in assembly. C compilers also produce tight code that runs fast. However, once you move to web applications, C can become very "unpowerful" and crippling since it's so much effort to get something up and running and you have to worry about a lot of extraneous details like memory etc.
Sometimes, languages are "powerful" in multiple domains. This gives them a general "powerful" tag (or badge, since we are on SO here). PG's claim is that with LISP, this is the case. That might be true, or might not be.
At the end of the day, "powerful" is a loaded word, so you should evaluate who is saying it, why he's saying it, and what it means for your work.
There are really only two meanings people are worried about:
"Powerful" in the sense of "takes less resources (time, money, programmers, LOC, etc.) to achieve the same/better result", and "powerful" in the sense of "is capable of doing a wide range of tasks".
Some languages are extremely resource-effective for a small range of tasks. Others are not so resource-effective but can be applied to a wide range of tasks (e.g. C, which is often used in OS development, creation of compilers and runtime libraries, and work with microcontrollers).
Which of these two meanings someone has in mind when they use the term "powerful" depends on the context (and even then is not always clear). Indeed often it is a bit of both.
Typically there are two distinct meanings:
Expressive, meaning the code tends to be very short and understandable
Low level, meaning you have very fine-grained control over the hardware.
For most languages, these two definitions are at opposite ends of the spectrum: Python is very expressive but not very low level; C is very low level but not very expressive. Depending on which definition you pick, either language is powerful or not powerful.
Nothing, absolutely nothing.
To high-level programmers it might mean a lot of built-in datatypes, or maybe abstractions that make it easy to create or follow Design Patterns.
Paul Graham is a very high-level guy; here is what he has to say:
http://www.paulgraham.com/avg.html
Java guys might tell you something about portability, the power to reach every platform.
C/UNIX programmers may tell you that it's speed and efficiency, complete control over every inch of memory.
VHDL/Verilog programmers will tell you it's complete control over every clock and gate, so as not to waste any electricity or time.
But in my opinion a "powerful language" supports all of the features you need to complete your task. Documentation may be important, or perhaps portability, or the ability to do graphics. It could be anything: writing a GUI in Assembly is just stupid, as is trying to design an embedded processor in Flash.
Choosing a language that suits your needs perfectly will always feel like power.
I view the term as marketing fluff, with no one well-defined meaning.
Consider, say, Assembler, C, and C++. On occasion one drops from C++ "down" to C for particular needs, and in turn from C down to assembler. So that makes assembler the most powerful, because it's the only language that can do everything. Or, to argue the other way, a single line of C++ can replace several lines of C (hiding polymorphic dispatch via function pointers, for example), and a single line of C replaces many lines of assembler. So C++ is more powerful, because one line does "more".
I think the term had some currency when products such as early databases and spreadsheets had in-built languages, some quite restricted. So vendors would tout their language as being "powerful" because it was less restricted.
It can have several meanings. In the very basic sense there's power as far as what is computable. In that sense the most powerful languages are Turing Complete which includes pretty much every general purpose programming language (as opposed to most markup languages and domain specific languages which are often not Turing complete).
In a more pragmatic sense it often refers to how concisely (and readably) you can do certain things. Basically how easy is it to do certain tasks in one language compared to another.
What language is more powerful (besides being somewhat subjective) depends heavily on what you're trying to do. If your requirements are to get something running on a small device with 64k of memory you're likely not going to be using Java. Most likely the right language would be C or C++ (or if you're really hard core assembly). If you need a very simple CRUD app done in 1 day, maybe something like Ruby On Rails would be the way to go (I know Rails is a framework and Ruby is the language, but these days what libraries and frameworks are available factor greatly into picking a language)
I think that, perhaps coincidentally, the physics definition of power is relevant here: "The rate at which work is performed."
Of course, a toaster does not perform very quickly the work of putting out fires. Similarly, the power of a programming language is not universal, but specific to the domain or task to which it is being applied. C is a powerful language for writing device drivers or implementations of higher-level languages; Python is a powerful language for writing general-purpose applications; XPath is a powerful language for writing queries on structured data sets.
So given a problem domain, the power of a language can be said to be the rate at which a competent programmer is able to use it to solve problems in that domain.
You can only attempt a precise answer if you stop assuming that the elements that define "powerful" (in the context of languages) come from so many dimensions.
See how many there could be, and a lot will still be missing:
runtime speed
code size
expressiveness
supported paradigms
development / debugging time
domain specialization
standard libs
codebase
toolchain ecosystem
portability
community
support / documentation
popularity
(add more here)
These and more parameters together draw a picture of what "programming in some language" would be like at some level. That is only the definition, though; the only real knowledge comes with the actual practice of using the language, but I digress.
The question comes down to which parameter represents the intrinsic quality of a language. If you refer to a language in itself, its ultimate, intrinsic purpose is to "express things", and thus the most representative parameter is rightfully expressiveness; it is also the one that comes up most often when someone talks about how powerful a language is.
The moment you try to widen the question/answer to cover more than the expressiveness of the language "as a language, as a tongue", you are really talking about different kinds of "environment": social environment, development environment, commercial environment, etc.
Depending on the complexity of the environment to be defined, you'll have to mix in more parameters that come from multiple, vast, overlapping and sometimes contradictory dimensions, and eventually the point of the definition will be lost or the question will have to be narrowed.
This approximation still won't answer "what is an expressive language", but, again, a common understanding is given by the definitions that Vineet points out well in his answer, and that Forest remarks on in the comments. I agree; for me "expression" is "conveying meaning".
I remember many instructors in college calling whatever language they were teaching "powerful".
Leads me to think:
Powerful = a relative term comparing the latest way to code something vs. the original or previous way.
I find it useless to use the word "powerful" when discussing anything software related. Every time my professor in college would introduce a new concept such as polymorphism, he would say "so this is a really powerful feature". After a while I got annoyed. If everything is powerful then nothing is. It's all the same. You can write code to do anything. Does it really matter how much code is required to do it? You can say it's short or efficient, but "powerful" is just useless. Nuclear energy is powerful. Code is words.
I think that power would normally refer to how quickly a language can process data; for example, I found that in Python, as soon as a list exceeds a length of approx. 2000, it becomes unbearably slow, whereas in C++ a list can easily contain 20,000 entries without doing so.

What is a computer programming language?

At the risk of sounding naive, I ask this question in search of a deeper understanding of the concept of programming languages in general. I write this question for my own edification and the edification of others.
What is a useful definition of a computer programming language and what are its basic and necessary components? What are the key features that differentiate languages (functional, imperative, declarative, object oriented, scripting, etc...)?
One way to think about this question. Imagine you are looking at the hardware of a modern desktop or laptop computer. Assume, that the C language or any of its variants do not exist. How would you describe to others all the things needed to make the computer expressive and functional in terms of what we expect of personal computers today?
Tangentially related, what is it about computer languages that allow other languages to exist? For example take a scripting language like Javascript, Perl, or PHP. I assume part of the definition of these is that there is an interpreter most likely implemented in C or C++ at some level. Is it possible to write an interpreter for Javascript in Javascript? Is this a requirement for a complete language? Same for Perl, PHP, etc?
I would be satisfied with a list of concepts that can be looked up or researched further.
Like any language, programming languages are simply a communication tool for expressing and conveying ideas. In this case, we're translating our ideas of how software should work into a structured and methodical form that computers (as well as other humans who know the language, in most cases) can read and understand.
What is a useful definition of a computer programming language and what are its basic and necessary components?
I would say the defining characteristic of a programming language is as follows: things written in that language are intended to eventually be transformed into something that is executed. Thus, pseudocode, while perhaps having the structure and rigor of a programming language, is not actually a programming language. Likewise, UML can express many powerful ideas in an abstract manner just like a programming language can, but it falls short because people don't generally write UML to be executed.
How would you describe to others all the things needed to make the computer expressive and functional in terms of what we expect of personal computers today?
Even if the word "programming language" wasn't part of the shared vocabulary of the group I was talking to, I think it would be obvious to the others that we'd need a way to communicate with the computer. Just as no one expects a car to drive itself (yet!) without external instructions in the form of interaction with the steering wheel and pedals, no one could expect the hardware to function without being told what to do. As noted above, a programming language is the conduit through which we can make that communication happen.
Tangentially related, what is it about computer languages that allow other languages to exist?
All useful programming languages have a property called Turing completeness. If one language in the Turing-complete set can do something, then any of them can; they are said to be computationally equivalent.
However, just because they're equally "powerful" doesn't mean they're equally nice to work with for humans. This is why many people are willing to sacrifice the unparalleled micromanagement you get from writing assembly code in exchange for the expressiveness and power you get with higher-level languages, like Ruby, Python, or C#.
Is it possible to write an interpreter for Javascript in Javascript? Is this a requirement for a complete language? Same for Perl, PHP, etc?
Since there is a Javascript interpreter written in C, it follows that it must be possible to write a Javascript interpreter in Javascript, since both are Turing-complete. However, again, note that Turing-completeness says nothing about how hard it is to do something in one language versus another -- only whether it is possible to begin with. Your Javascript-interpreter-inside-Javascript might well be horrendously inefficient, consume absurd amounts of memory, require enormous processing power, and be a hideously ugly hack. But Turing-completeness guarantees it can be done!
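To make the "interpreter written in another language" idea concrete, here is a toy sketch in Python (not a real Javascript interpreter, just a tiny made-up expression language): a self-hosting language pulls the same trick with itself as the implementation language.
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def evaluate(expr):
    """Evaluate nested tuples like ("+", 1, ("*", 2, 3))."""
    if isinstance(expr, (int, float)):
        return expr                      # a literal evaluates to itself
    op, left, right = expr               # otherwise: (operator, left, right)
    return OPS[op](evaluate(left), evaluate(right))

print(evaluate(("+", 1, ("*", 2, 3))))   # 7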
While this doesn't directly answer your question, I am reminded of the Revenge of the Nerds essay by Paul Graham about the evolution of programming languages. It's certainly an interesting place to start your investigation.
Not a definition, but I think there are essentially two strands of development in programming languages:
Those working their way up from what the machine can do to something more expressive and less tied to the machine (Assembly, Fortran, C, C++, Java, ...)
Those going down from some mathematical or theoretical computer science concept of computation to something implementable on a real machine (Lisp, Prolog, ML, Haskell, ...)
Of course, in reality the picture is not as neat, and both strands influence each other by borrowing the best ideas.
Slightly long rant ahead.
A computer language is actually not all that different from a human language. Both are used to express ideas and concepts in commonly understood terms. Among different human languages there are syntactic differences, but you can express the same thing in every language (does that make human languages Turing complete? :)). Some languages are better suited for expressing certain things than others.
For example, although technically not completely correct, the Inuit language seems quite suited to describing various kinds of snow. Japanese, in my experience, is very suitable for expressing one's feelings and state of mind, thanks to a large, concise vocabulary in that area. German is pretty good for being very precise, thanks to a largely unambiguous grammar.
Different programming languages have different specialities as well, but they mostly differ in the level of detail required to express things. The big difference between human and programming languages is mostly that programming languages lack a lot of vocabulary and have very few "grammatical" rules. With libraries you can extend the vocabulary of a language though.
For example:
Make me coffee.
Very easy to understand for a human, but only because we know what each of the words mean.
coffee : a drink made from the roasted and ground beanlike seeds of a tropical shrub
drink : a liquid that can be swallowed
swallow : cause or allow to pass down the throat
... and so on and so on
We know all these definitions by heart, but we had to learn them at some point.
In the same way, a computer can be "taught" to "understand" words as well.
Coffee::make()->giveTo($me);
This could be a perfectly valid expression in a computer language. If the computer "knows" what Coffee, make() and giveTo() means and if $me is defined. It expresses the same idea as the English sentence, just with a different, more rigorous syntax.
In a different environment you'd have to say slightly different things to get the same outcome. In Japanese for example you'd probably say something like:
コーヒーを作ってもらっても良いですか?
Kōhī o tsukuttemoratte mo ii desu ka?
Which would roughly translate to:
if ($Person->isAgreeable('Coffee::make()')) {
return $Person->return(Coffee::make());
}
Same idea, same outcome, but the $me is implied and if you don't check for isAgreeable first you may get a runtime error. In computer terms that would be somewhat analogous to Ruby's implied behaviour of returning the result of the last expression ("grammatical feature") and checking for available memory first (environmental necessity).
If you're talking to a really slow person with little vocabulary, you probably have to explain things in a lot more detail:
Go to the kitchen.
Take a pot.
Fill the pot with water.
...
Just like Assembler. :o)
Anyway, the point being, a programming language is actually a language just like a human language. Their syntax is different and specialized for the problem domain (logic/math) and the "listener" (computers), but they're just ways to transport ideas and concepts.
EDIT:
Another point about "optimization for the listener" is that programming languages try to eliminate ambiguity. The "make me coffee" example could, technically, be understood as "turn me into coffee". A human can tell what's meant intuitively, a computer can't. Hence in programming languages everything usually has one and one meaning only. Where it doesn't you can run into problems, the "+" operator in Javascript being a common example.
1 + 1 -> 2
'1' + '1' -> '11'
See "Programming Considered as a Human Activity." EWD 117.
http://www.cs.utexas.edu/~EWD/transcriptions/EWD01xx/EWD117.html
Also See http://www.csee.umbc.edu/331/current/notes/01/01introduction.pdf
Human expression which:
describes mathematical functions
makes the computer turn switches on and off
This question is very broad. My favorite definition is that a programming language is a means of expressing computations:
Precisely
At a high level
In ways we can reason about them
By computation I mean what Turing and Church meant: the Turing machine and the lambda calculus have equivalent expressive power (which is a theorem), and the Church-Turing hypothesis (which is a conjecture) says roughly that there's no more powerful notion of computation out there. In other words, the kinds of computations that can be expressed in any programming languages are at best the kinds that can be expressed using Turing machines or lambda-calculus programs—and some languages will be able to express only a subset of those calculations.
This definition of computation also encompasses your friendly neighborhood hardware, which is pretty easy to simulate using a Turing machine and even easier to simulate using the lambda calculus.
Expressing computations precisely means the computer can't wiggle out of its obligations: if we have a particular computation in mind, we can use a programming language to force the computer to perform that computation. (Languages with "implementation defined" or "undefined" constructs make this task more difficult. Programmers using these languages are often willing to settle for—or may be unknowingly settling for—some computation that is only closely related to the computation they had in mind.)
Expressing computation at a high level is what programming languages are all about. An important reason that there are so many different programming languages out there is that there are so many different high-level ways of thinking about problems. Often, if you have an important new class of problems to solve, you may be best off creating a new programming language. For example, Larry Wall's writing suggests that solving a class of problems called "systems administration" was a motivation for him to create Perl.
(Another reason there are so many different programming languages out there is that creating a new language is a lot of fun, and anyone can learn to do it.)
Finally, many programmers want languages that make it easy to reason about programs. For example, today a student of mine implemented a new algorithm that made his program run over six times faster. He had to reason very carefully about the contents of C arrays to make sure that the new algorithm would do the same job the old one did. Luckily C has decent tools for reasoning about programs, for example:
A change in a[i] cannot affect the value of a[i-1].
My student also applied a reasoning principle that isn't valid in C:
The sum of a sequence of unsigned integers will be at least as large as any integer in the sequence.
This isn't true in C because the sum might overflow. One reason some programmers prefer languages like Standard ML is that in SML, this reasoning principle is always valid. Of languages in wide use, Haskell probably has the strongest reasoning principles; Richard Bird has developed equational reasoning about programs into a high art.
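A small sketch of why that principle fails under overflow, simulating C's 32-bit unsigned wrap-around in Python (the helper name is made up for illustration):
def c_uint32_sum(xs):
    total = 0
    for x in xs:
        total = (total + x) & 0xFFFFFFFF  # wrap around like a 32-bit unsigned int
    return total

xs = [4_000_000_000, 1_000_000_000]
print(c_uint32_sum(xs))  # 705032704 -- smaller than either element
print(sum(xs))           # 5000000000 -- with unbounded ints the principle holds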
I will not attempt to address all the tangential details that follow your opening question. But I hope you will get something out of an answer that aims to give a deeper understanding, as you asked, of a fundamental question about programming languages.
One thing a lot of "IT" types forget is that there are 2 types of computer programming languages:
Software programming languages: C, Java, Perl, COBOL, etc.
Hardware programming languages: VHDL, Verilog, System Verilog, etc.
Interesting.
I'd say the defining feature of a programming language is the ability to make decisions based on input. Effectively, if and goto. Everything else is lots and lots of syntactic sugar. This is the idea that spawned Brainfuck, which is actually remarkably fun to (try to) use.
There are places where the line blurs; for example, I doubt people would consider XSLT to really be a programming language, but it's Turing-complete. I've even solved a Project Euler problem with it. (Very, very slowly.)
Three main properties of languages come to mind:
How is it run? Is it compiled to bare metal (C), compiled to mostly bare metal with some runtime lookup (C++), run on a JIT virtual machine (Java, .NET), bytecode-interpreted (Perl), or purely interpreted (uhh..)? This doesn't comment much on the language itself, but speaks to how portable the code may be, what sort of speed I might expect (and thus what broad classes of tasks would work well), and sometimes how flexible the language is.
What paradigms does it support? Procedural? Functional? Is the standard library built with classes or functions? Is there reflection? Is there, ideally, support for pretty much whatever I want to do?
How can I represent my data? Are there arrays, and are they fixed-size or not? How easy is it to use strings? Are there structs or hashes built in? What's the type system like? Are there objects? Are they class-based or prototype-based? Is everything an object, or are there primitives? Can I inherit from built-in objects?
I realize the last one is a very large collection of potential questions, but it's all related in my mind.
I imagine rebuilding the programming language landscape entirely from scratch would work pretty much how it did the first time: iteratively. Start with assembly, the list of direct commands the processor understands, and wrap it with something a bit easier to use. Repeat until you're happy.
Yes, you can write a Javascript interpreter in Javascript, or a Python interpreter in Python (see: PyPy), or a Python interpreter in Javascript. Such languages are called self-hosting. Have a look at Perl 6; this has been a goal for its main implementation from the start.
Ultimately, everything just has to translate to machine code, not necessarily C. You can write D or Fortran or Haskell or Lisp if you want. C just happens to be an old standard. And if you write a compiler for language Foo that can ultimately spit out machine code, by whatever means, then you can rewrite that compiler in Foo and skip the middleman. Of course, if your language is purely interpreted, this will probably result in a stack overflow...
As a friend taught me about computer languages, a language is a world: a world of communication with that machine. It is a world for implementing ideas, algorithms and functionality, as Alonzo and Alan described. It is the technical equivalent of the mathematical structures that the aforementioned scientists built. It is a language with expressions and also limits. However, as Ludwig Wittgenstein said, "The limits of my language mean the limits of my world"; there are always limitations, and that's how one chooses the language that best fits one's needs.
This is a generic answer... some thoughts, actually, more than an answer.
There are many definitions to this but what I prefer is:
Computer programming is programming that helps to solve a particular technical task/problem.
There are 3 key phrases to look out for:
You: Computer will do what you (Programmer) told it to do.
Instruct: Instruction is given to the computer in a language that it can understand. We will discuss that below.
Problem: At the end of the day computers are tools (complex ones). They are there to make our life simpler.
The answer can be lengthy but you can find more about computer programming

Flexibility or Consistency, which is Correct for a good programming language?

Recently I'm pondering the characteristics of a good programming language. It's true that "There are a thousand Hamlets in a thousand people's eyes", like simplification, explicitness, readability and more.
But I'm wondering whether a good programming language should be consistent, meaning that everyone writes the same (or at least very similar) code to achieve the same functionality, so that everyone can read, understand and maintain others' code easily and effectively. Not so flexible that everyone writes code that differs in readability and quality for the same functionality, depending mostly on the programmer's experience and skill.
So far I haven't found a language that matches my definition of consistency, but I'm still looking. Does anyone know of any?
I'm not sure whether you agree with me or not. Please point out if there is anything wrong with my point, and leave your argument whether you agree with me or not. Just leave your points and arguments.
Regards.
Both, or either. Take the difference between Perl and Python. In Perl, tell someone to read data from a file and display it to stdout and you can get roughly 85,000 different permutations. In Python, you're likely to have 5-10. However, both are widely used, extremely powerful, highly efficient and capable of nearly any task.
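For a feel of what that looks like in practice, here is the idiomatic Python version (a sketch of my own; the filename is hypothetical), and it is hard to write it very differently once you know the idioms:
import sys

with open("data.txt") as f:      # hypothetical input file
    for line in f:
        sys.stdout.write(line)   # echo each line to stdout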
Language choice comes down to personal preference and a bit of what you want to do. OO paradigms will lead to drastically different program structures from program to program, imperative programming paradigms will lead to very similar program structures from programmer to programmer, functional programming paradigms will be somewhere in between.
As an aside, I personally prefer the language to be flexible enough to let me design the application in the manner I see fit, as there are times when the performance of certain components is an absolute necessity and others when it doesn't matter at all. A strict "One True Way" can place limits on what you do, but "There's More Than One Way To Do It" requires you to know which way performs best and to make the tradeoff of readability vs. compactness vs. performance (not always mutually exclusive).
There's this great cartoon with a tree swing in every panel. One panel has a regular swing and says "what the customer needed". The next panel shows two seats - one on top of the other and says "what the customer asked for". Next is a swing with three stacked seats that says "what the programmer's built". Everyone has a mental picture of their expectations. And everyone's picture is different.
You can't achieve consistency in that kind of environment. If we can't achieve consistency, can we improve upon readability? Because the end goal is having code that other programmers can maintain.
I like literate programming. It arranges code in a human context - what makes sense for a person to read. A literate style more closely follows your mental model.
Even when I can't use the tools, that style has vastly improved the clarity and effectiveness of comments.
The Zen of Python contains:
There should be one-- and preferably only one --obvious way to do it.
