NLP with greatly constrained input and abilities - nlp

I don't have time to read or digest long intricate discussions on theoretical concepts around NLP (or go get my PHD). That said, I have read a few and it's a damn interesting field. The problem is I need real world solutions, for real world products, in real world time frames.
The problem I'm having is right now I'm not sure what the right questions are to ask to get started implementing. I believe this is mostly related to vocabulary. I'll read somewhere, a blog post, a forum post, a whitepaper, and it says, I'm doing flooping with the blargy blarg method, and I go google flooping and blargy blarg, and I get references to more obscurity. It seemingly never ends.
So, my question is multiphased.
First, more generally, how do I become passingly educated on this quickly? Just in time educated. I only need to know what I need to know to take the next step. I've spent 20 years writing code. Explain quick. I'll get it. (I mean provide a reference to something that explains quickly of course).
I'm happy to read the right book, but I don't want to read a book where I read the chapter introduction that explains what floopy floop is and then skip over the rest of the chapter with examples of floopy flooping (because now I get what it is). I also don't want to read a book that goes into too much detail with theoretical underpinnings or history. For example, the Jurafsky book seems like way more than I need: http://www.amazon.com/gp/product/0131873210. But I will read it if this is the right book to read. (It's also dang expensive!)
I need the root node of the expedited learning tree here, if you will. Point me in the right direction and I'll be quite grateful. I'm expecting quite a lot of firehose drinking - I just need the right firehose.
Second, what I need to do is take a single sentence, with a very reduced vocabulary, and get a grammar tree (sorry if this is the wrong terminology) that I can do something with.
I know I could easily write this command line input style in c in a more conventional manner, but I need it to be way better than that. But I don't need a chatterbot either.
What I'm doing needs to live in a constrained environment. I can't use Python (unfortunately). I can't ship with gigabytes of corpuses. I need any libraries I use to be in c/c++. If I have to write this myself, I will. Hopefully, it will be achievable considering the reduced problem set. Maybe, probably, that's just naive. If so, let me know. :-)

Parsing is a fairly hard problem. Learning to write a good parser from scratch is on the order of a masters degree.
If you wanted to build a tagger or a classifier, there are relatively straightforward algorithms that you can learn without a complete grip on all the math. Parsing for trees is another story.
To get some idea, have a look here. There's a relatively state-of-the-art parser system here, and relevant links to the data resources. If you want to build a parser for a language that doesn't have a Penn TreeBank, for example, you have a rather large data problem to solve.
It might also help if you told us what real-world problem you want to solve. There are a lot of very useful NLP things that can be done without a parser.
Edit
I've been struggling to formulate a useful response to your, ahem, existential situation here.
You'd like to extract information from free-form input typed by people, a-la Zork.
You'd heard that there's this field, NLP, in which people have had some success in 'Natural Language Processing.' You're hoping to be able to find implementations or algorithms accessible to a competent professional program without a gigantic self-education problem.
Here's a biased historical explanation of the problem. Once upon a time, there was a field called AI. As part of AI, smart people set out to try to solve the problem of understanding natural language.
The most important early discovery was this that was a very, very, very, hard problem. What happened next was the natural thing. People set out to split the big, impossible, problem into smaller, less impossible problems. Several levels of splitting ensued.
Now, many years later, some of those smaller problems have really well-understood solutions. Some of them even have open source implementations that are relatively easy for anyone to deploy. See Weka or Apache Mahout for an example. Unfortunately for you, these subproblems are not what you are looking for.
The original, really big, 'AI' problem is far from solved, and the problem you want to tackle is fairly far up the food chain.
As I understand the state of the field, the best way to tackle the problem you have set yourself does not involve parse trees. The most successful systems that have done this have used relatively simple template patterns to identify things they are interested in. Think 'regular expressions'. Think, sadly, 'hacks.' Google 'Eliza.' A major reason for this is that humans, invited to type at a system, do not type in full, grammatical sentences. They leave things out, mistype other things, and generally make it as hard as possible.
I'm sorry to be so discouraging, and I'd be very happy to read someone else's answer giving you a less pessimistic reading. On the other hand, I'd also encourage you to find an online copy of the paper entitled 'Artificial Intelligence Meets Natural Stupidity.'

The Jurafsky & Martin book is the right book to read, or atleast have. It has a good index and a good bibliography and explains most of the concepts you need to know. There might also be a softcover version available which is considerably cheaper.

What you are seeking to do is often referred to as Natural Language Understanding, and solutions can run the gamut from simple pattern matching to methods that include statistical based parsing.
I guess it is important to identify how much flexibility you want to give the user in your dialogue interface. If you want to allow an open-ended, unrestricted natural language, then you will definitely need to start with Jurafsky and Martin. In particular I would read the chapters on parsing, semantics, and dialogue systems.
If the syntax and vocabulary is fairly restricted for your interface, I would suggest treating this problem more like computer language parsing. To do this I would read up on Lex and Yacc (or the GNU variants Flex and Bison) to get a feel for how you can lexicalize input and then place it into some sort of hierarchical structure. Either way, I suggest you read the wikipedia entries on Earley and CYK parsers (I would link to them, but the spam prevention won't allow me to post more than one link). It should shed some light on the basics of natural language parsing.
With either approach, there is a possibility that you will need to craft your own grammar.
A simple more, semantic looking grammar might look something like this:
Sentence := CommandSentence
CommandSentence := Command DT Object PP DT Location
CommandSentence := Command DT Object PP Receiver
Command := give
Command := place
DT := a
DT := an
DT := the
Object := pencil
Object := sword
PP := on
PP := in
Location := cave
Location := table
Receiver := Bob
Receiver := Sally
A a parse for a command like "Place the sword in the cave" would have a simple parse like this:
(Sentence
(CommandSentence
((Command (place))
(DT (the))
(Object (sword))
(PP (in))
(DT (the))
(Location (Cave))
)
)
)
You might also just try playing with a readily available parser like the Collins Parser to see if you can extract useful information from its output.
Good luck!

Related

machine representation of natural text

I'm currently working on high-level machine representation of natural text.
For example,
"I had one dog but I gave it to Danny who didn't have any"
would be
I.have.dog =1
I.have.dog -=1
Danny.have.dog = 0
Danny.have.dog +=1
something like this....
I'm trying to find resources, but can't really find matching topics..
Is there a valid subject name for this type of research? Any library of resources?
Natural logic sounds like something related but it's not really the same thing I'm working on. Please help me out!
Representing natural language's meaning is the domain of computational semantics. Within that area, lots of frameworks have been developed, though the basic one is still first-order logic.
Specifically, your problem seems to be that of recognizing discourse semantics, which deals with information change brought about by language use. This is pretty much an open area of research, so expect to find a lot of research papers and PhD positions, but little readily-usable software.
As larsmans already said, this is pretty much a really open field of research, called computational semantics (a subfield of computational linguistics.)
There's one important thing that you'll need to understand before starting off in the comp-sem world: most people there use fancy high-level languages. By high-level I don't mean C, but more something like LISP, Prolog, or, as of late, Haskell. Computational semantics is very close to logic, which is why people researching the topic are more comfortable with functional and logical languages — they're closer to what they actually use all day long.
It will also be very useful for you to first look at some foundational course in predicate logic, since that's what the underlying literature usually takes for granted.
A good introduction to the connection between logic and language is L.T.F. Gamut — Logic, Language, and Meaning, volume I. This deals with the linguistic side of semantics, which won't help you implement anything, but it will help you understand the following literature. That said, there are at least some books that will explain predicate logic as they go, but if you ask me, any person really interested in the representation of language as a formal system should take a course in predicate and possibly intuitionist and intensional logic.
To give you a bit of a peek, your example is rather difficult to treat for
current comp-sem approaches. Not impossible, but already pretty high up the
scale of difficulty. What makes it difficult is the tense for one part (dealing
with tense and aspect will typically bring you into even semantics,) but also
that you'd have to define the give and have relations in a way that
works for this example. (An easier example to work with would be, say "I had
a dog, but I gave it to Danny who didn't have any." Can you see why?)
Let's translate "I have a dog."
∃x[dog(x) ∧ have(I,x)]
(There is an object x, such that x is a dog and the have-relation holds between
"I" and x.)
These sentences would then be evaluated against a model, where the "I"
constant might already be defined. By evaluating multiple sentences in sequence,
you could then alter that model so that it keeps track of a conversation.
Let's give you some suggestions to start you off.
The classic comp-sem system is
SHRDLU, which places geometric
figures of certain color in a virtual environment. You can play around with it, since there's a Windows-compatible demo online at that page I linked you to.
The best modern book on the topic is probably Blackburn and Bos
(2005). It's written in Prolog, but
there are sources linked on the page to learn Prolog
(now!)
Van Eijck and Unger give a good course on computational semantics in Haskell, which is a bit more recent, but in my eyes not quite as educational in terms of raw computational semantics as Blackburn and Bos.

Resources for learning a new language quickly?

The title may seem slightly self-contradictory, and I accept that you can't really learn a language quickly. However, an experienced programmer that already has knowledge of a few languagues and different styles (functional, OO, imperative etc.) often wants to get started quickly. I've seen a few websites doing effective "translations" in the form of "just show me syntax equivalence". I can't remember the sites now, but for related languages (e.g. Perl/PHP) it's quite common.
Is there a better resource that covers more languages? Is there a resource that covers idioms as well as syntax? I think this would be incredibly useful for doing small amounts of work on existing code bases where you are not familiar with the language. Looking at the existing code, as we know, is not always a good indicator of quality. Likewise, for "learn by doing" weekend project I always have the urge to write reasonably idiomatic, clean code from the start. Such a resource could also link to known good example projects of varying sizes for those that prefer to learn by reading. Reading a well-written medium sized code base can also be much more practical when access to development environments might be limited.
I think it's possible to find tutorials and summaries for individual languages that provide some of this functionality in disparate web locations but I'm hoping there is a good, centralised, comparative place that the busy programmer can turn to.
You generally have two main things to overcome:
Syntax
Reference
Syntax you can pick up fairly quickly with a language tutorial and a stack of samplecode.
Reference (library/API calls) you need to find a proper guide to; perhaps the language reference, or perhaps google...
With those two in place, following a walkthrough (to get you used to using the development environment) will have you pretty much ready - you'll be able to look up what you want to say (reference), and know how to say it (syntax).
This, of course, applies principally to procedural/oop languages; languages that require a paradigm switch (ML/Haskell) you should go to lectures for ;)
(and for the weirder moments, there's SO!)
In the past my favour was "learning by doing". So e.g. I know a little bit of C++ and a lot of C#.Net but I must write a FTP Tool in Python.
So I sit for an hour and so the syntax differences by a tutorial, than I develop the form itself and look at the generated code. Then I search a open source Python FTP Client and get pieces of code (Not copy and paste, write it self to see, feel and remember the code!)
After a few hours I get it.
So: The mix is the best. A book, a piece of good code, the willing to learn and a free night with much coffee.
At the risk of sounding cheesy, I would start with the language's website tutorial and/or FAQ, followed by asking more specific questions here. SO is my centralized location for programming knowledge.
I remember when I learned Perl. I was asked to modify some Perl code at work and I'd never seen the language before. I had experience with several other languages, however, so it wasn't hard to figure out the syntax with the online Perl docs in one window and the code in another, side-by-side. I don't know that solely reading existing code is necessarily the best way to learn. In my case, I didn't know Perl but I could tell that the person who originally wrote the code didn't know Perl either. I'm not sure I could've distinguished between good Perl and really confusing Perl. It would've been nice to be able to ask questions here at the time.
Language isn't important. What is important is learning your ways around designing algorithms and the proper application of design patterns. Focus on the technique, not the language that implements a certain technique. Once you understand the proper development techniques, any programming language will just become real easy, no matter how obscure they are...
When you put a focus on a language, you're restricting your own knowledge.
http://devcheatsheet.com/ seems to be a step in the right direction: it aggregates cheat sheets/quick references and they are (somewhat) manually reviewed. It's also wide-ranging. It still comes up short a bit in terms of "idiomatic" quick reference: for example, the page on Ruby doesn't mention yield.
Rosetta Code appears to be an excellent resource that includes hints on coding idiomatically and moves from simple (like for-loops) to things like drawing. I haven't checked out how comprehensive it is, but there are a large number of languages and tasks listed. The drawbacks re: original question are:
Some of the linking is not accurate
(navigating Python->ForLoop will
take you to the top of the ForLoop
page, not the Python section). It's a
wiki, this can be improved.
Ideally you could "slice" the wiki
however you chose to see e.g. the top
20 tasks for two languages
side-by-side.
http://hyperpolyglot.org/ seems to be an almost perfect match for what I was looking for. The quality is not always there, or idiom can be lacking, but it has the same intention and is pretty comprehensive.

What is a good example to show to a non-programmer to explain what programming "looks like"? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed last month.
The community reviewed whether to reopen this question last month and left it closed:
Original close reason(s) were not resolved
Improve this question
A friend of mine asked me the other day if I'm just looking at lists of numbers when I'm programming, or how it works. I tried to explain that it's generally more like math formulae, with the odd english word tossed in, and that it's generally mostly readable. But that's a very vague explanation, and it doesn't really explain much to a non-programmer.
But it got me to thinking about what would make a good example. Not because I want to teach her programming or anything, but simply to give her an idea of what program code "looks like".
And that got me to wonder about what would actually work as a good example. And that's turning out to be surprisingly difficult.
My first thought was obviously a simple "Hello World" program. But it really doesn't show anything useful. It doesn't really show how we use functions, or variables, or control flow structures like if or while to make a program that actually does something. There's no logic to it. The program doesn't react to anything.
So perhaps something like computing prime numbers would be a better example. But then again, that might be too theoretical and impractical... (What good is that? What does it have to do with writing "real" programs?) And again, there's no significant control flow logic in it. It's just a straight sequence of maths.
And also, which language should be used?
Ideally, I don't think it has to be a very "clean" language. But rather, it should probably make the structure clear. If it does that, then a certain amount of noise and clutter might be fine. Perhaps something like C++ would actually be a better example than Python for that reason. The explicit curly braces and type specifiers are obvious "hooks" to help explain how the program is structured, or to highlight that it's not just simple statements that can almost be read out as english.
But with C++ we also get into some downright weird syntax. Why is std::cout << x used to print out x? Why not a "normal" function call syntax? And printf isn't much better, with its arcane format string, and lack of extensibility (do I want to complicate the program by using char* for strings? Or do I use std::string and settle for calling the seemingly unnecessary s.c_str() to get a string that can be printed with printf?
Perhaps a higher level language would be better after all. But which one? And why?
I know there are plenty of similar questions here about which language/example program to use to teach programming. But I think the requirements here are different. When teaching programming, we want simplicity more than anything. We want to avoid anything that hasn't been taught yet. We want to make sure that the student can understand everything on the screen.
I'm not interested in simplicity per se. But rather in giving an "outsider" an idea of "what a program looks like". And programs aren't simple. But they do generally exhibit a certain structure and method to the madness. What language/program would best highlight that?
Edit
Thanks for all the suggestions so far. Some of you have had a somewhat different angle on it than I'd intended.
Perhaps an example is in order. I can't fly an airplane, but I've got a basic understanding of what the cockpit looks like, and what a pilot "does" while flying.
And I'm not a trained carpenter, but I know a saw or a hammer when I see one.
But when you see anything to do with programming in movies, for example, it's usually just screens filled with garbage (as in the green text in the Matrix). It doesn't look like something a normal human being can actually do. There's nothing recognizable in it. Someone who isn't a programmer simply thinks it's black magic.
I don't want to teach her to fly, or to program software.
But I'd like to give her a basic frame of reference. Just an idea of "ah, so that's what you're working with. So it's not just random symbols and numbers on the screen". Even just showing a simple if-statement would be a revelation compared to the Matrix-style random symbols and numbers.
Some of you have suggested explaining an algorithm, or using pseudocode, but that's what I want to avoid. I'd like something that simply shows what actual code looks like, in the same way that you don't have to be a carpenter to look at a saw and get a basic idea of what it is and how it works.
When I was a kid, we once went on vacation in Italy. On the way down, the pilot let me into the cockpit of the plane. Of course, I didn't learn how to fly the plane. But I did get a peek into the pilot's world. I got an idea of how they make the plane go, what the pilot actually does.
That's really all I want to do. My friend has no interest in learning programming, and I don't want to force her to understand source code. But she was curious about what kind of tools or entities I work with. Is it Matrix-style symbols scrolling across the screen? Pure mathematics? English in prose form?
All I'm interested in conveying is that very high-level understanding of "What does it look like when I work".
BASIC
10 PRINT "Sara is the best"
20 GOTO 10
Update: when I was 12, my cousin (he was 14) brought Turbo Pascal 7.0 and installed it in my computer.
He programmed a tic tac toe game from scratch (in BGI mode, for those who know).
I watched/observed step by step how a program evolves until it becomes a complete application.
Until then, I knew only how to print strings in BASIC :-B
You can do a similar thing. Pair programming. Well, actually your friend will be an observer but she'll get an idea ;)
Why not consider a language that doesn't exist (or does, if you so believe) and use Pseudo Code? I think, depending upon what you want to achieve - I'd consider the example of task familiar to the person, but carved up into a pseudo code example.
I generally find the idea of "cooking" or "recipes" to be a great fit when explaining things to non-programmers.
I ask the person to imagine they had a recipe that was fairly complex - e.g. a curry & rice dish. I then suggest that they should try and write it down for someone who has absolutely no idea what they are doing, so that they can cook it.
There is a very definite few stages involved:
Gather the ingredients and tools for the job.
Prepare the ingredients. This is complex. e.g.
get 3 Small Red Peppers.
for each red pepper you have, chop it into chunks about 1cm square.
place the red pepper chunks into a bowl for later.
Seperately to this, call the prepare rice function and have this working asynchronously in the background while you continue on with the cooking.
I'm sure you can see where this is going... ;)
There are a lot of similarities with Cooking and Programming (as there are with many things, but more people have an understanding of cooking than of building a house).
There stages / similarities (as I see it) are:
Gathering: (declaration of what is required to achieve the goal and getting them together).
Preperation: Chopping the ingredients or readying the data connection objects etc for first use.
Asynchronous: The ability to have one thing going while another thing going.
Functions: The rice making, the chicken cooking and the curry cooking all require separate processes and only at the end can you have the makeCurry(chickenMeat, rice) function.
Testing: Ensure that as you are going along, you aren't missing any bits and that everything is going smoothly - e.g. ensuring chicken is cooked before you move to the next stage.
Garbage: Once you've done, you must ensure that you tidy up. ;)
Principals of Best Practice: There are efficient ways of doing things that like cooking, beginning programmers have to learn in addition to the code - sometimes it can be hard to get your head around. e.g. D.R.Y, how to chop efficiently with a knife & don't eat raw chicken ;)
Basically, I think for teaching programming as a general topic - I wouldn't necessarily teach from a language unless you had a compelling reason to do so. Instead, teach initially from the abstract until they understand at least the fundamentals of how things might fall together. Then they may find it easier when sat in front of a monitor and keyboard.
I think there may not be one "right answer" for this one. But I think maybe a few really good ideas you could maybe take bits from all of.
I would explain that programming is giving detailed instructions so the computer can make complex tasks.
How to make two cups of coffee?
Fill the kettle
Boil water
Coffee in cup
Pour on water
Add sugar
Add milk
Do 3 to 6 again
To answer your question directly - what programming “looks like”, I'd show them a print out of a large application. Toy apps or cute things like qsort in haskell really give the wrong idea.
It looks a bit like this. Sometimes.
Maybe everyone is concentrating too much on the code rather than tools. Possibly it's best to show her a project in an IDE, and how it includes various source files and maybe some diagrammatic things like a database schema or a visual user interface designer too. Visual Studio, Eclipse or Xcode are quite far away from most people's mental image of rapidly scrolling glowing green symbols on a black background.
I think you should download some big win32 application, written in AT&T assembly language, and show it to her in notepad, and tell her, "As you can see, it takes a superhuman like myself to program."
Code something that has any comprehensible value to a non-programmer. If I'd demonstrate Quicksort to my mother, it wouldn't be of any use.
Ask the person about his interests. If he/she's into stock exchange for example, hack together a script that reads some stock statistics from a appropriate web page and compiles them into an excel sheet (use csv, to avoid heavy brain-damage ^^) or maybe into a nice graph.
If the person uses Twitter, code something that counts the followers of his followers or something like that.
These tasks are simple enough to be done in a very short time and they already utilize a lot of the basic tools that we programmers use, like loops, libraries (for all the http stuff involved), maybe recursion.
After you're finished or while you're coding (even better), you can explain how your program does its magic.
Just keep it simple and talk in human language. If you show them megabytes of code and talk about things like prototypal inheritance, you just intimidate them and they will lose interest immediately.
To give my wife an idea of what I do to bring in a paycheck (It's real work! I promise! we don't just browse the web all day!) I sat down with her one evening with Python and showed her a couple of the basic concepts: calling a function (print), assigning and reading a variable, and how an if statement works. Since she's a teacher, I likened the concept of conditionals to working with preschoolers :)
IF you hit Jonny THEN you're in time out OTHERWISE you can have a snack.
After reviewing a couple of the very high level concepts, I then showed her the code to a simple number guessing game and let her play it while looking though the code.
# Guessing Game
import random
print("Guess a number between 1 and 100: ")
target = random.randint(1, 100)
guess = 0
guess_count = 1
while guess != target:
guess_count += 1
guess = int(input())
if guess == target:
print("Correct!")
if guess < target:
print("Higher...")
if guess > target:
print("Lower...")
print("Congratulations! You guessed the number in " + str(guess_count) + " guesses!")
Aside from the somewhat abstract concept of "import", this is a very straightforward example that is easy to follow and "connect" to what's happening on the screen, not to mention it actually does something interesting and interactive. I think my wife walked away from the experience a little less mystified by the whole concept without really needing to know much in the way of code.
I think the key is being able to have someone see the code AND it's end result side by side.
There was a CLI graphics package called LOGO, and best known for Turtle Graphics, used to draw shapes on screen using commands like LT 90, RT 105 etc. See if you can find that, it would be a nice experience to try and draw something of medium complexity.
LOGO - Logic Oriented Graphic Oriented programming language.
REPEAT 360 [FD 1 RT 1] -- draws a circle, etc.
See more at logothings or Wikipedia which also has links to modern logo interpreters.
The computer programmer writes programs.
While not programming, the computer programmer annoys attractive women in his workplace.
Then:
(source: markharrison.net)
Now:
When my 5 y.o. daughter asked me the question I made her "develop" the program for a little arrow "robot" that will get him into the upper-left-corner of the board using flowchart-like pieces of paper signifying moves, turns and conditions. I think it applies to grown-ups as well.
I do not claim the invention of this answer, though.
About your edit: I'm afraid, programmers have even less idea of the idea others have about programming. ;-) People think that programming is a matrix-like green video card corruption about as much as they think that spies are all equipped with James Bond's hi tech toys. And any professional in any field is normally irritated when watching the movie concerning his job. Because the movie maker has no idea! Do we know how to properly depict programming in the movie on the other hand? ;-)
I've found that the best approach to "teach someone what programming is without teaching them programming" is actually to just drop anything related to a specific programming language.
Instead (assuming they're actually interested), I would talk them through implementing a function in a program, like a simple bank loan application (most people have had to deal with loans at some stage, if they're above a certain age), and then poking holes in all the assumptions.
Like, what should happen if the user types in a negative loan amount? What if the user cannot afford the loan? How would the loan application know that? How would the loan application know which bank account to check and which payment history to check (ie. who is the user actually)? What if the user tries to type in his name in the loan amount field? What if the user tries to take the loan over 75 years? Should we limit the choices to a list of available lengths?
And then in the end: Programming is taking all of those rules, and writing them in a language that the computer understands, so that it follows those rules to the letter. At this point, if it is felt necessary, I would pull out some simple code so that the overall language can be looked at, and then perhaps written out one of the rules in that language.
Bonus points if you can get your friend to then react with: But what if we forgot something? Well, then we have bugs, and now you know why no software program is bugfree too :)
Definitively something either with graphics, or windows, in a higher level language.
Why? A non-programmer is usually a non-matematician too, that's why he won't get the beauty of sorting. However showing something drawn on a screen ("look, a window!", "look, so little typing and we have a 3D box rotating!") can work wonders ;).
What does it look like when you work?
It looks like typing.
Seriously though, programming is kind of like if Legos were text, and to build a big Lego house, you had to type a lot of text in, just right, hooking up the right pegs with the right holes. So that is how I generally describe it.
It's really hard to understand what programming is like just from a source code example, because it is so abstract.
There is nothing wrong with starting on hello world, as long as you can show what the computer actually does with it. You can then introduce one construct at a time. That's what programming is like- Making incremental changes, and seeing the results.
So you have a hello world program. Now change it to
string Name = getLine();
printf("Hello, %s", name);
then the if construct
printf("Do you like cake?");
string answer = getLine();
if(answer == "yes") {
printf("Yeay! I like cake too!");
} else if(answer == "no") {
printf("Filthy cake hating pig!");
}
then demonstrate that the last program fails when it recieves an answer other than either "yes" or "no", and how you would go about fixing it....
and so on. I don't think you need to go into deep concepts like recursion, or even functions really.
It doesn't really matter what programming you use for this, as long as you're able to show, on a computer, the result of these different programs. (though these psuedocode examples are probably pretty close to being valid python)
Robotics is great for explaining programming, I think, because even simple, contrived examples are practical. The robotics equivalent of Hello World or printing a list of numbers might be having the robot move in a line or spin in a circle. It's easy for a non-programmer to understand that for a robot to do ANYTHING useful it must first move and position itself. This lets you explain simple program structure and flow control.
You want the robot to move forward, but only while there is nothing blocking its path. Then you want it to turn and move again. That's a simple routine using basic flow control, and the functions that you're calling are pretty easy to understand (if your platform has decent abstractions anyway).
Graphics might also work. Anything that has immediate results. jQuery even, because everyone is familiar with rotating pictures and web animations. Even contrived examples like pushing elements around in the DOM has an easy to see effect, and most people will understand what and why the statements in the program do.
Things like Robocode and LOGO are probably really good for this.
(source: wikimedia.org)
{
wait for 6/8;
play F (5), sustain it for 1/4 and a half;
play E flat (5), sustain it for 1/8;
play D flat (5), sustain it for 1/8 and a half;
play F (4), sustain it for 1/16;
// ...
}
Perhaps a metaphor could be that of a composer writing a musical score. The job of a composer is the intellectual activity of creating music. With a score, the composer is telling the pianist what to play, and he does it by means of precise instructions (notes, pauses and so on). If the "instructions" are not precise enough, the pianist will play something different.
The job of a software developer is the intellectual activity of solving problems (problems that have to do with automated processing of data). With source code, the developer is telling the computer what to do, and he does it by means of precise instructions. If the instructions are not precise enough, the computer will do something different and will not solve the problem correctly.
I would just write something in pseudocode that demonstrates how to use a computer to solve an everyday problem. Perhaps determining which store is cheaper to buy a particular grocery list from or some such.
Why not just show the timelapse video A Day in the Life of a Scrum Team?
A programmer writes instructions for the computer to perform.
Running the program results in the computer actually following those instructions.
An example is a cook will follow a recipe in order to bake a loaf of bread. (even if it's in their head)... that's programming. Unlike my wife, the computer follows the recipe exactly every time. My wife, does it in her head and it turns out different but delicious every time ;-)
If you want to go ahead and teach this in more detail then pseudo-code is easy to understand.
e.g.
IF today's date is the 1st of may then
print to screen "Happy Birthday"
ELSE
print to screen "It's not your birthday yet"
The beauty of psuedo code is nearly anyone can understand it and this is the point of it.
Want to show her what programming looks like? Just pop a terminal and
find /
Surprised this is still open, and surprised no one has already given this answer. (I think. I might have accidentally skipped one of the 40 questions that no one is going to read anyway.)
Your answer is in your question
When I was a kid, we once went on vacation in Italy. On the way down,
the pilot let me into the cockpit of the plane. Of course, I didn't
learn how to fly the plane. But I did get a peek into the pilot's
world. I got an idea of how they make the plane go, what the pilot
actually does.
That's really all I want to do.
That's all you have to do. Pick a short exercise out of a tutorial. A moderately longer GUI one could also be beneficial due to the added visuals. (Games might be pushing the length a bit.) And let her watch you code. That's it. It's the same as your pilot example.
Also there are a number of online REPLs that will make watching you code even more immediate.
I say show him bubble sort.
It's an easy, understandable trick, converted to a formal language.
That's what our job is about. Expressing our ideas in a strict, formal language, such that even a machine can understand. A little similar to designing procedures for organizational design.
Code up something quick that reads stock quotes and writes them to an excel spreadsheet. This is easy enough to do with a few minutes and impresses non technical types very quickly as they see the practical value of it.
My usual choice is to retrieve a set of customer records from a database. Using C# and LINQ in Visual Studio, it takes maybe 10 minutes at most to build a web page and dump out the "Northwind" database customers into a grid. The nice thing is that a "list of customers" is something that almost anybody can understand.
Totally depends on the level of her interest (or your level of interest in her). Most people ask that question as idle conversation, and don't really want to know.
Programming is more than algorithms (like "How to make a cup of coffee), it's also fundamentally rooted in mathematics. Most people will be quickly tripped up by the subtle use of mathematical terms (what's a "function"?).
In order to really teach programming, it may help to think back to your own first programming experiences, your first programming teacher, your first programming language. How did you learn? when you were learning, what skills did you already have fresh in your mind (i.e., calculus)? What motivated you to want to understand what a variable is or why there are three different kinds of loops?
Language-wise: Use something like python. Really high level, non-curly-bracket probably better.
Alice which was developed at Carnegie Mellon.
Alice is an innovative 3D programming environment that makes it easy to create an animation for telling a story, playing an interactive game, or a video to share on the web. Alice is a teaching tool for introductory computing. It uses 3D graphics and a drag-and-drop interface to facilitate a more engaging, less frustrating first programming experience.
In pseudo-code:
function dealWithPerson(person){
if(ILike(person)){
getCookie().giveTo(person);
}
else{
person.tell("You shall receive no cookies!");
}
}
dealWithPerson(Person.fromName("Nick"));
dealWithPerson(Person.fromName("John"));
This demonstrates the concept of functions, object-orientation and strings, in a C-like syntax(when I say C-like syntax I refer to the weird characters).
It also shows how code can be reused.
Note that although it is pseudo-code, I wouldn't be surprised if there was some language that accepted this syntax(perhaps JavaScript allows this?).
You could also adapt this example to have loops.
Hope this helps show that person how a program looks like(since it is a realistic syntax and it is relatively easy to understand).
I have been teaching programming for many years and found out that the number of ways you need to explain things is equal to the number of students you have. But one method that works most of the time is as follows:
Present a flow chart that shown the flow of logic of a simple application
Write the instructions in full human language (e.g. English)
Abbreviate each instruction to the short-hand used in the programming language
Choose a less cryptic language like Basic or Pascal for teaching purposes
All code is simply shorthand for natural language. Written in full English most programs seem trivial.
As for a good algorithm, that is another story. It is sad to see many Computer Science courses no longer teach algorithms or brush over it.

Expert system Basics

I need do write an expert systems that should aid user in picking up best mobile phone operator. It should be very simple and not based on languages/libaries such as CLISP or JESS. So I need to write it all from the ground up.
Do you know some books or online tutorials that explains how this can be done?
What I really need to get to know is how to represent knowledge and facts.
Any help would be much appreciated.
If you get any of the good texts on AI, there will be a section on expert systems; you can, if forced, work it out from there and implement your own.
The basic idea is really fairly simple: you have a collection of rules in "if-then" form that represent inferences, or4 implications. Like, for example:
IF blood temperature > 41°C
THEN patient.has-fever := TRUE
IF patient has wet-sounding breathing
THEN patient.has-pneumonia
IF patient.has-fever AND patient.has-pneumonia
THEN CONCLUDE bacterial pneumonia. ACTION prescribe Augmentin
In other words, you have a bunch of rules, and you evaluate the rules until you get to a conclusion. There's a lot more to is (forward or backward chaing and that kind of thing) which you can read about in thed pretty decent Wikipedia article.
I'm puzzled why you can't use an existing rule engine though -- there are a number of them, for most languages, usually under pretty liberal licenses. That's really an easier route unless this is a homework problem or something.
Prolog is well suited to writing rule-based systems (a pretty standard approach to expert systems development). P# compiles to C#, which may meet your needs - and it's free.
More information on P#.
The basis rationale, and mathematical proof, for the PROLOG language, should help you understand most of the concepts you will need to address, if not provide the final language you need to use to implement it.
I couldn't find a link to the original implementation, but it would not help you much anyway. Alain Colmerauer's early work on logic programming should be helpfull.
[EDIT] Sorry, duplicate...
I would vote for some implementation of Prolog or CLIPS, depending if backward or forward chaining logic best suits the problem. Instead of re-implementing either of these, spend the time working out how to integrate them with your environment.
Jess is a good choice but you should read the book "Jess in action" as a first step.

What are good starting points for someone interested in natural language processing? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 9 years ago.
Question
So I've recently came up with some new possible projects that would have to deal with deriving 'meaning' from text submitted and generated by users.
Natural language processing is the field that deals with these kinds of issues, and after some initial research I found the OpenNLP Hub and university collaborations like the attempto project. And stackoverflow has this.
If anyone could link me to some good resources, from reseach papers and introductionary texts to apis, I'd be happier than a 6 year-old kid opening his christmas presents!
Update
Through one of your recommendations I've found opencyc ('the world's largest and most complete general knowledge base and commonsense reasoning engine'). Even more amazing still, there's a project that is a distilled version of opencyc called UMBEL. It features semantic data in rdf/owl/skos n3 syntax.
I've also stumbled upon antlr, a parser generator for 'constructing recognizers, interpreters, compilers, and translators from grammatical descriptions'.
And there's a question on here by me, that lists tons of free and open data.
Thanks stackoverflow community!
Tough call, NLP is a much wider field than most people think it is. Basically, language can be split up into several categories, which will require you to learn totally different things.
Before I start, let me tell you that I doubt you'll have any notable success (as a professional, at least) without having a degree in some (closely related) field. There is a lot of theory involved, most of it is dry stuff and hard to learn. You'll need a lot of endurance and most of all: time.
If you're interested in the meaning of text, well, that's the Next Big Thing. Semantic search engines are predicted as initiating Web 3.0, but we're far from 'there' yet. Extracting logic from a text is dependant on several steps:
Tokenization, Chunking
Disambiguation on a lexical level (Time flies like an arrow, but fruit flies like a banana.)
Syntactic Parsing
Morphological analysis (tense, aspect, case, number, whatnot)
A small list, off the top of my head. There's more :-), and many more details to each point. For example, when I say "parsing", what is this? There are many different parsing algorithms, and there are just as many parsing formalisms. Among the most powerful are Tree-adjoining grammar and Head-driven phrase structure grammar. But both of them are hardly used in the field (for now). Usually, you'll be dealing with some half-baked generative approach, and will have to conduct morphological analysis yourself.
Going from there to semantics is a big step. A Syntax/Semantics interface is dependant both, on the syntactic and semantic framework employed, and there is no single working solution yet. On the semantic side, there's classic generative semantics, then there is Discourse Representation Theory, dynamic semantics, and many more. Even the logical formalism everything is based on is still not well-defined. Some say one should use first-order logic, but that hardly seems sufficient; then there is intensional logic, as used by Montague, but that seems overly complex, and computationally unfeasible. There also is dynamic logic (Groenendijk and Stokhof have pioneered this stuff. Great stuff!) and very recently, this summer actually, Jeroen Groenendijk presented a new formalism, Inquisitive Semantics, also very interesting.
If you want to get started on a very simple level, read Blackburn and Bos (2005), it's great stuff, and the de-facto introduction to Computational Semantics! I recently extended their system to cover the partition-theory of questions (question answering is a beast!), as proposed by Groenendijk and Stokhof (1982), but unfortunately, the theory has a complexity of O(n²) over the domain of individuals. While doing so, I found B&B's implementation to be a bit, erhm… hackish, at places. Still, it is going to really, really help you dive into computational semantics, and it is still a very impressive showcase of what can be done. Also, they deserve extra cool-points for implementing a grammar that is settled in Pulp Fiction (the movie).
And while I'm at it, pick up Prolog. A lot of research in computational semantics is based on Prolog. Learn Prolog Now! is a good intro. I can also recommend "The Art of Prolog" and Covington's "Prolog Programming in Depth" and "Natural Language Processing for Prolog Programmers", the former of which is available for free online.
Chomsky is totally the wrong source to look to for NLP (and he'd say as much himself, emphatically)--see: "Statistical Methods and Linguistics" by Abney.
Jurafsky and Martin, mentioned above, is a standard reference, but I myself prefer Manning and Schütze. If you're serious about NLP you'll probably want to read both. There are videos of one of Manning's courses available online.
If you get through Prolog until the DCG chapter in Learn Prolog Now! mentioned by Mr. Dimitrov above, you'll have a good beginning at getting some semantics into your system, since Prolog gives you a very simple way of maintaining a database of knowledge and belief, which can be updated through question-answering.
As regards the literature, I have one major recommendation for you: run out and buy Speech and Language Processing by Jurafsky & Martin. It is pretty much the book on NLP (the first chapter is available online); used in a frillion university courses but also very readable for the non-linguist and practically oriented, while at the same time going fairly deep into the linguistics problems. I really cannot recommend it enough. Chapters 17, 18 and 21 seem to be what you're looking for (14, 15 and 18 in the first edition); they show you simple lambda notation which translates pretty well to Prolog DCG's with features.
Oh, btw, on getting the masters in linguistics; if NL semantics is what you're into, I'd rather recommend taking all the AI-related courses you can find (although any courses on "plain" linguistic semantics, logic, logical semantics, DRT, LFG/HPSG/CCG, NL parsing, formal linguistic theory, etc. wouldn't hurt...)
Reading Chomsky's original literature is not really useful; as far as I know there are no current implementations that directly correspond to his theories, all the useful stuff of his is pretty much subsumed by other theories (and anyone who stays near linguists for any matter of time will absorb knowledge of Chomsky by osmosis).
I'd highly recommend playing around with the NLTK and reading the NLTK Book. The NLTK is very powerful and easy to get into.
You could try reading up a bit on phrase structured grammers, which is basically the mathematics behind much language processessing. It's actually not that heavy, being largely based on set and graph theory. I studied it many moons ago as part of a discrete math course, and I guess there are many good references available at this stage.
Edit:Not as much as I expected on google, although this one looks like a good learning source.
One of the early explorers into NLP is Noam Chomsky; he wrote small books on the subject in the 50s through the 70s. You may find that engaging reading.
Cycorp have a short description of how their Cyc knowledge base derives meaning from sentences.
By utilising a massive knowledge base of common facts, the system can determine the most logical parse of a sentence.
A simpler place to begin with the building blocks is the look at the documentation for a package that attempts to do it. I'd recommend the Python [Natural Language Toolkit (NLTK)1, particularly because of their well-written, free book, which is filled with examples. It won't get you all the way to what you want (which is an AI-hard problem), but it will give you a good footing. NLTK has parsers, chunkers, context-free grammars, and more.
This is really hard stuff. I'd start off by getting at least a Masters in Linguistics, and then work towards my PhD in computer science, concentrating on NLP.
The problem is that most of us don't have the understanding of what language is. And without that understanding, it's bloody tough to implement a solution.
Other comments give some readings, which are probably fine if you want to get started playing around with a small subset of the problem, but in order to come up with a really robust solution, then there are no shortcuts. You need the academic background in both disciplines.
A very enjoyable readable introduction is The Language Instinct by Steven Pinker. It goes into the Chomsky stuff and also tells interesting stories from the evolutionary biology angle. Might be worth starting with something like that before diving into Chomsky's papers and related work, if you're new to the subject.

Resources