Checking English Grammar with NLTK [closed] - nlp

I'm starting to use the NLTK library, and I want to check whether a sentence in English is correct or not.
Example:
"He see Bob" - not correct
"He sees Bob" - correct
I read this, but it's quite hard for me.
I need an easier example.

Grammar checking is an active area of NLP research, so there isn't a 100% answer (maybe not even an 80% answer) at this time. The simplest approach, or at least a reasonable baseline, would be an n-gram language model: normalize the LM probability for utterance length and set a heuristic threshold separating 'grammatical' from 'ungrammatical'.
You could use Google's n-gram corpus, or train your own on in-domain data. You might be able to do that with NLTK; you definitely could with LingPipe, the SRI Language Modeling Toolkit, or OpenGRM.
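If you have a recent NLTK (3.4 or later), the nltk.lm module is enough to build such a baseline. Here is a minimal sketch; the Brown corpus, the trigram order, and the perplexity threshold are illustrative assumptions, not recommendations:

```python
# Minimal n-gram grammaticality baseline with nltk.lm (NLTK 3.4+).
# Corpus, order, and threshold below are illustrative assumptions.
import nltk
from nltk.corpus import brown
from nltk.lm import Laplace
from nltk.lm.preprocessing import padded_everygram_pipeline, pad_both_ends
from nltk.util import ngrams

nltk.download("brown", quiet=True)
nltk.download("punkt", quiet=True)

N = 3
train_sents = [[w.lower() for w in s] for s in brown.sents()[:10000]]
train_data, vocab = padded_everygram_pipeline(N, train_sents)

lm = Laplace(N)  # simple add-one smoothing; Kneser-Ney would do better
lm.fit(train_data, vocab)

def perplexity(sentence):
    """Length-normalized score: lower perplexity ~ more fluent."""
    tokens = [w.lower() for w in nltk.word_tokenize(sentence)]
    padded = list(pad_both_ends(tokens, n=N))
    return lm.perplexity(ngrams(padded, N))

THRESHOLD = 1000.0  # purely heuristic; tune on held-out data
for s in ["He sees Bob", "He see Bob"]:
    pp = perplexity(s)
    print(f"{s!r}: perplexity={pp:.1f} ->",
          "grammatical" if pp < THRESHOLD else "ungrammatical?")
```

With add-one smoothing the absolute numbers aren't very meaningful; the point is only that the ungrammatical variant should score worse than the grammatical one.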
That said, an n-gram model won't perform all that well. If it meets your needs, great, but if you want to do better, you'll have to train a machine-learning classifier. A grammaticality classifier would generally use features from syntactic and/or semantic processing (POS tags, dependency and constituency parses, and so on). You might look at some of the work from Joel Tetreault and the team he worked with at ETS, or Jennifer Foster and her team in Dublin.
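To make the feature side concrete, here is a hedged sketch of extracting POS-tag-pair features with NLTK for such a classifier. The labeled training data is assumed to exist, and real systems use much richer parse-based features:

```python
# Sketch: POS-tag bigrams as features for a grammaticality classifier.
# Requires nltk.download("punkt") and nltk.download("averaged_perceptron_tagger").
import nltk

def pos_features(sentence):
    tags = [t for _, t in nltk.pos_tag(nltk.word_tokenize(sentence))]
    return {f"tagpair={a}+{b}": True for a, b in nltk.bigrams(tags)}

# Hypothetical labeled data; a real corpus of grammatical/ungrammatical
# sentences would be needed.
# labeled = [("He sees Bob", "ok"), ("He see Bob", "bad"), ...]
# train = [(pos_features(s), y) for s, y in labeled]
# clf = nltk.NaiveBayesClassifier.train(train)
# clf.classify(pos_features("She walk home"))
```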
Sorry there isn't an easy and straightforward answer...

Related

Programming Wavelets for Audio Identification [closed]

How exactly is a wavelet used digitally?
Wikipedia states
"a wavelet could be created to have a frequency of Middle C and a
short duration of roughly a 32nd note"
Would this be a data structure holding e.g. {sampleNumber, frequency} pairs?
If a wavelet is an array of these pairs, how is it applied to the audio data?
How does this wavelet apply to the analysis when using an FFT?
What is actually being compared to identify the signal?
I feel like you've conflated a few different concepts here. The first confusing part is this:
Would this be a data structure holding e.g. {sampleNumber, frequency} pairs?
It's a continuous function, so pick your favourite way of representing continuous functions in a discrete computer memory, and that might be a sensible way to represent it.
The wavelet is applied to the audio signal by convolution (this is actually the next paragraph in the Wikipedia article you referenced...), as is relatively standard in most DSP applications (particularly audio-based applications). Wavelets are really just a particular kind of filter in the broader signal-processing sense, in that they have particular properties that are desirable in some applications, but they are still fundamentally just filters!
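To make "wavelet as filter" concrete, here is a small numerical sketch in plain NumPy: build a complex Morlet-style wavelet tuned to middle C and convolve it with an audio buffer. The sample rate, duration, and test signal are illustrative assumptions:

```python
# Sketch: a complex Morlet-style wavelet applied to audio by convolution.
import numpy as np

fs = 44100                 # sample rate in Hz (assumed)
f0 = 261.63                # middle C
dur = 0.05                 # ~a 32nd note at a brisk tempo (assumed)
t = np.arange(-dur / 2, dur / 2, 1 / fs)

# Complex sinusoid at f0 under a Gaussian envelope, normalized.
sigma = dur / 6
wavelet = np.exp(2j * np.pi * f0 * t) * np.exp(-t**2 / (2 * sigma**2))
wavelet /= np.sum(np.abs(wavelet))

# Test signal: middle C plus an unrelated 880 Hz tone.
ts = np.arange(0.0, 1.0, 1 / fs)
signal = np.sin(2 * np.pi * f0 * ts) + np.sin(2 * np.pi * 880.0 * ts)

# Applying the wavelet IS convolution; |response| is large wherever the
# signal has energy near f0.
response = np.convolve(signal, wavelet, mode="same")
print("mean |response|:", np.mean(np.abs(response)))
```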
As for the comparison being performed - it's the presence or absence of a particular frequency in the input signal corresponding to the frequency (or frequencies) that the wavelet is designed to identify.

Language detection for very short text [closed]

I'm creating an application for detecting the language of short texts, averaging under 100 characters and containing slang (e.g. tweets, user queries, SMS).
All the libraries I tested work well for normal web pages but not for very short text. The library giving the best results so far is Chromium's Compact Language Detector (CLD), which I had to build as a shared library.
CLD fails when the text is made of very short words. After looking at the source code of CLD, I see that it uses 4-grams so that could be the reason.
The approach I'm thinking of right now to improve the accuracy is:
Remove brand names, numbers, URLs, and words like "software", "download", "internet"
Use a dictionary when the text contains a number of short words above a threshold, or when it contains too few words
The dictionary is created from Wikipedia news articles + hunspell dictionaries (a sketch of this fallback appears below).
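For illustration, a rough sketch of what that fallback could look like; the word lists, thresholds, and names here are placeholders, not a tested design:

```python
# Sketch of the proposed fallback: strip noisy tokens, then vote with
# per-language word lists when the text is too short for n-gram detection.
import re

NOISE_WORDS = {"software", "download", "internet"}     # from the list above
WORDLISTS = {                                          # tiny placeholders;
    "en": {"the", "you", "see", "and"},                # really built from
    "es": {"el", "que", "los", "ver"},                 # Wikipedia + hunspell
}

def clean(text):
    text = re.sub(r"https?://\S+|\d+", " ", text.lower())  # urls, numbers
    return [w for w in re.findall(r"[^\W\d_]+", text) if w not in NOISE_WORDS]

def detect_short(text, short_len=3, short_ratio=0.5, min_words=4):
    words = clean(text)
    short = sum(len(w) <= short_len for w in words)
    if len(words) < min_words or short / max(len(words), 1) > short_ratio:
        # Dictionary fallback: the language whose list covers the most tokens.
        scores = {lang: sum(w in vocab for w in words)
                  for lang, vocab in WORDLISTS.items()}
        return max(scores, key=scores.get)
    return None  # long enough: defer to an n-gram detector such as CLD

print(detect_short("ver el que los"))  # -> 'es' with these toy lists
```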
What dataset is most suitable for this task? And how can I improve this approach?
So far I'm using EUROPARL and Wikipedia articles. I'm using NLTK for most of the work.
Language detection for very short texts is the topic of current research, so no conclusive answer can be given. An algorithm for Twitter data can be found in Carter, Tsagkias and Weerkamp 2011. See also the references there.
Yes, this is a topic of research, and some progress has been made.
For example, the author of "language-detection" at http://code.google.com/p/language-detection/ has created new profiles for short messages. Currently, it supports 17 languages.
I have compared it with Bing Language Detector on a collection of about 500 tweets which are mostly in English and Spanish. The accuracy is as follows:
Bing = 71.97%
Language-Detection Tool with new profiles = 89.75%
For more information, you can check his blog out:
http://shuyo.wordpress.com/2011/11/28/language-detection-supported-17-language-profiles-for-short-messages/
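The library itself is Java, but a Python port (langdetect on PyPI) exposes the same detector; note that the short-message profiles were a separate download for the Java original and may not ship with the port:

```python
# pip install langdetect  (Python port of the Java language-detection library)
from langdetect import DetectorFactory, detect

DetectorFactory.seed = 0  # the detector is probabilistic; fix the seed
print(detect("He sees Bob"))                  # -> 'en'
print(detect("¿Dónde está la biblioteca?"))   # -> 'es'
```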
Also omit scientific names, names of medicines, etc. Your approach seems quite fine to me. I think Wikipedia is the best option for creating a dictionary, as it contains standard language. If time permits, you can also use newspapers.

Best turnkey relation detection library? [closed]

What is the best turnkey (ready to use, industrial-strength) relation detection library?
I have been playing around with NLTK and the results I get are not very satisfactory.
http://nltk.googlecode.com/svn/trunk/doc/book/ch07.html
http://nltk.googlecode.com/svn/trunk/doc/howto/relextract.html
Ideally, I would like a library that can take sentences like:
"Sarah killed a wolf that was eating a child"
and turn it into a data structure that means something like:
killed(Sarah, wolf) AND eating(wolf,child)
I know that this is the subject of a large body of research and that it is not an easy task. That said, is anyone aware of a reasonably robust ready-to-use library for detecting relations?
Update: Extractiv is no longer available.
Extractiv's On-Demand REST service:
http://rest.extractiv.com/extractiv/?url=https://stackoverflow.com/questions/4732686/best-turnkey-relation-detection-library&output_format=html_viewer will process this page, extract and display the two semantic triples you desire in the bottom left corner under "GENERIC". (It throws away some of the text from the page in the html viewer, but this text is not thrown away if you utilize json or rdf output).
This assumes you're open to a commercial, industrial-strength solution, though limited free usage is allowed. It's a web service, but open-source libraries can be used to access it, or the technology can be purchased from Language Computer Corporation.
These relations can be read fairly easily out of the output of a dependency parser. For instance, if you put your example into the online Stanford Parser, you can see both subject-verb-object triples in the collapsed typed-dependencies representation:
nsubj(killed-2, Sarah-1)
dobj(killed-2, wolf-4)
nsubj(eating-7, wolf-4)
dobj(eating-7, child-9)
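Turning that text output into the triples the question asks for is mechanical. A small sketch, using a regex over the collapsed-dependency lines as an illustrative shortcut rather than the Stanford tools' own API:

```python
# Sketch: join nsubj/dobj relations sharing a governor into verb(subj, obj).
import re
from collections import defaultdict

deps = """nsubj(killed-2, Sarah-1)
dobj(killed-2, wolf-4)
nsubj(eating-7, wolf-4)
dobj(eating-7, child-9)"""

pattern = re.compile(r"(\w+)\(([^-]+)-\d+, ([^-]+)-\d+\)")
by_governor = defaultdict(dict)
for rel, gov, dep in pattern.findall(deps):
    if rel in ("nsubj", "dobj"):
        by_governor[gov][rel] = dep

for verb, args in by_governor.items():
    if "nsubj" in args and "dobj" in args:
        print(f"{verb}({args['nsubj']}, {args['dobj']})")
# killed(Sarah, wolf)
# eating(wolf, child)
```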

Learning a new language project [closed]

Does anyone have a standard project that they use when learning a new language? Something like a specification document for a project that covers all aspects of programming. I guess it also depends on the type of language and what it's capable of.
Contributing something to an open source project seems to work for me. In addition to getting exposed to some coding habits in the language, you get to work on something useful.
Going through the first few problems of Project Euler is a very good way to get a handle on topics like I/O, recursion, iteration, and basic data structures. I'd highly recommend it.
A friend of mine had a coworker who coded a Minesweeper clone every time he wanted to learn a new language with a GUI.
I like making simple websites for learning.
Pro: you can put it online and show it to people.
Con: the language has to be suitable for web development.
Writing a simple ray tracer:
math functions (pow, sqrt, your own intersection routines)
recursion (because a Whitted-style ray tracer is recursive)
iteration (over all pixels)
how to write custom types (rays, possibly vectors)
pixel-wise graphics
something to play with the compiler's (optimization) flags
optional:
a simple GUI
file reading/writing
I've also done so with metatrace.
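To give a flavour of the exercise, here is the core ray-sphere intersection routine such a tracer is built around; a dependency-free sketch, not a full renderer:

```python
# Sketch: nearest ray-sphere intersection via the quadratic formula.
import math

def ray_sphere(origin, direction, center, radius):
    """Return the nearest positive hit distance t, or None on a miss."""
    oc = tuple(o - c for o, c in zip(origin, center))
    a = sum(d * d for d in direction)
    b = 2.0 * sum(o * d for o, d in zip(oc, direction))
    c = sum(o * o for o in oc) - radius * radius
    disc = b * b - 4 * a * c
    if disc < 0:
        return None                       # ray misses the sphere
    t = (-b - math.sqrt(disc)) / (2 * a)  # nearer root
    return t if t > 0 else None

# Ray from the origin along +z toward a unit sphere centered at z = 5.
print(ray_sphere((0, 0, 0), (0, 0, 1), (0, 0, 5), 1.0))  # -> 4.0
```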

Good starting book to learn fractal programming [closed]

What is a good book to learn fractal programming? I am a programmer, and I am looking for a book that is more algorithmic than mathematical (at least in the beginning chapters). It should teach me the basics of fractals and different ways to generate them.
You might find the electric sheep project interesting.
It's an open source, distributed programming project that generates fractal animations.
Scott Draves's original paper on the electric sheep algorithm is a nice introduction, and it concentrates mostly on the algorithmic aspect of creating the fractal image:
http://flam3.com/flame.pdf
When trying to learn the Mandelbrot set, I found this link useful.
http://warp.povusers.org/Mandelbrot/
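The escape-time iteration behind most Mandelbrot renderers fits in a few lines. An ASCII-output sketch, with arbitrary grid resolution and iteration cap:

```python
# Sketch: Mandelbrot escape-time; iterate z -> z*z + c until |z| > 2.
MAX_ITER = 50

def escape_count(c):
    z = 0j
    for n in range(MAX_ITER):
        z = z * z + c
        if abs(z) > 2.0:        # provably diverges once |z| exceeds 2
            return n
    return MAX_ITER             # (probably) inside the set

for im in range(12, -13, -2):
    print("".join("#" if escape_count(complex(re / 12, im / 12)) == MAX_ITER
                  else " "
                  for re in range(-24, 9)))
```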
For more than just Mandelbrot material, try to get hold of "The Science of Fractal Images". It's an old book now (I read it when it was first published in 1988), but it's full of bits and pieces to try out.
A "golden oldie" site is Fractint: http://en.wikipedia.org/wiki/Fractint. Fractint is 20 years old and pioneered some of the ethos in collaborative computing. Some of the algorithms needed to increase precision are not trivial and this group developed integer arithmetic to support fractals.
Worth visiting to get the feel if nothing else.
But also visit http://en.wikipedia.org/wiki/Fractal-generating_software. There's a huge variety.
Note that some fractals such as the Mandelbrot set are "pixel-based" while others such as the "Snowflake curve" can use vector graphics. You'll need both approaches.
Some links
Fractal Programming
Fractal Geometry
