Maximum Entropy for Natural Language Processing [closed] - statistics

Closed. This question is off-topic. It is not currently accepting answers.
Want to improve this question? Update the question so it's on-topic for Stack Overflow.
Closed 11 years ago.
Can anyone explain simply how maximum entropy models work when used in Natural Language Processing? I need to statistically parse simple words and phrases to figure out the likelihood of specific words, what objects they refer to, and what phrases they are contained within.

I recommend the NLTK Python package. You can also use MALLET or WEKA.
For a theoretical background, you should ask at https://stats.stackexchange.com/ or http://metaoptimize.com/qa/.
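
To give a rough idea of what such a model does, here is a minimal sketch in plain Python (standard library only, not NLTK's own implementation). It assumes a conditional maximum entropy model of the form P(label | features) proportional to exp(sum of weights for each (feature, label) pair), trained by simple gradient ascent on the log-likelihood. The toy data (deciding what the word "bank" refers to from a context word), the function names, and the learning rate are all made up for illustration.

```python
import math
from collections import defaultdict

def predict_proba(weights, features, labels):
    """P(label | features) for a log-linear (maximum entropy) model."""
    scores = {y: sum(weights[(f, y)] for f in features) for y in labels}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    z = sum(exps.values())      # normalizer Z(x)
    return {y: e / z for y, e in exps.items()}

def train(data, labels, iterations=200, rate=0.5):
    """Gradient ascent: gradient = observed feature counts - expected counts."""
    weights = defaultdict(float)
    for _ in range(iterations):
        grad = defaultdict(float)
        for features, gold in data:
            probs = predict_proba(weights, features, labels)
            for f in features:
                grad[(f, gold)] += 1.0          # observed count
                for y in labels:
                    grad[(f, y)] -= probs[y]    # expected count under the model
        for key, g in grad.items():
            weights[key] += rate * g / len(data)
    return weights

# Hypothetical toy data: context words around "bank" and what it refers to.
LABELS = ["river", "finance"]
DATA = [
    (["water", "shore"], "river"),
    (["fishing", "shore"], "river"),
    (["money", "loan"], "finance"),
    (["account", "money"], "finance"),
]

model = train(DATA, LABELS)
print(predict_proba(model, ["loan"], LABELS))
print(predict_proba(model, ["water"], LABELS))
```

Each weight says how strongly a feature votes for a label. Training pushes the model's expected feature counts toward the counts seen in the data, which is the constraint that defines the maximum entropy solution. A real system would add regularization and richer features.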

Related

Programming language features [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 6 years ago.
Which features need to be present in a programming language so that it can express any sequential computation that a computer can execute today? And what if the language is Haskell specifically?
Haskell is Turing complete.
My current beliefs have high weight on the outcome that any sound and complete description of "feature sets that guarantee Turing completeness" is either infinite or includes a non-terminating algorithm; so I believe it is not reasonable to expect an answer to your other question.

What does ''arbitrary shape'' mean? [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 9 years ago.
I have started some work on SVG graphics and I constantly come across the term ''arbitrary shapes''.
What exactly is an arbitrary shape?
An arbitrary shape is just that: an arbitrary shape.
The word "arbitrary" in this context means "any", as in: not a specified, or specific, kind of shape.
This is not really a programming question, though, but rather an English language question.

The idea of Boyer-Moore's good suffix search [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 10 years ago.
One of the key features of the Boyer-Moore algorithm is the good-suffix rule. It requires building a table of shifts for each possible suffix. But how is this shift table built? I don't understand it. Thank you!
There is an excellent explanation in the German Wikipedia. I know this might not help, but with a bit of luck you can follow the example, which is very clear.
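
For readers who cannot follow the German article, here is a minimal Python sketch of one common way to build the good-suffix shift table (the border-based preprocessing for the strong good-suffix rule). The function names `good_suffix_table` and `search` are made up for illustration, and the search uses only the good-suffix rule, without the bad-character rule that a full Boyer-Moore implementation would add.

```python
def good_suffix_table(pattern):
    """Return shift[], where shift[j + 1] is the shift after a mismatch at
    pattern[j], and shift[0] is the shift after a full match."""
    m = len(pattern)
    shift = [0] * (m + 1)
    border = [0] * (m + 1)  # border[i]: start of the widest border of pattern[i:]

    # Case 1: the matched suffix occurs elsewhere in the pattern,
    # preceded by a different character.
    i, j = m, m + 1
    border[i] = j
    while i > 0:
        while j <= m and pattern[i - 1] != pattern[j - 1]:
            if shift[j] == 0:
                shift[j] = j - i
            j = border[j]
        i -= 1
        j -= 1
        border[i] = j

    # Case 2: only a part of the matched suffix occurs as a prefix of the pattern.
    j = border[0]
    for i in range(m + 1):
        if shift[i] == 0:
            shift[i] = j
        if i == j:
            j = border[j]
    return shift


def search(text, pattern):
    """Find all start positions of pattern in text using the good-suffix shifts."""
    m, n = len(pattern), len(text)
    shift = good_suffix_table(pattern)
    positions = []
    s = 0
    while s <= n - m:
        j = m - 1
        while j >= 0 and pattern[j] == text[s + j]:  # compare right to left
            j -= 1
        if j < 0:
            positions.append(s)
            s += shift[0]
        else:
            s += shift[j + 1]
    return positions


print(good_suffix_table("abab"))
print(search("ABAAABCDBBABCDDEBCABC", "ABC"))
```

The table has one entry per possible mismatch position plus one for a full match. Entry `j + 1` answers the question: given that `pattern[j + 1:]` has already matched the text, how far can the pattern safely slide to the right?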

bayes net open source [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 11 years ago.
Can anyone recommend a good open-source or free Bayes net software program?
I have been using BayesiaLab with a class, but my account will expire and I'd like to continue building and using BNs.
If you have access to MATLAB:
BNT is great
If you prefer Python:
NetworkX or Orange
Or for Java:
Weka API
Weka's API is not the best documented, but it offers a rich set of algorithms.
Hope this helps.

Is this the correct definition of a "corpus"? [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 11 years ago.
I have a huge string of raw text that is about 200,000 words long. It's a book.
I want to use these words to analyze the word relationships, so that I can apply those relationships to other applications.
Is this called a "corpus"?
A corpus, in linguistics, is any coherent body of real-life(*) text or speech being studied. So yes, a book is a corpus. The fact that it's in one string doesn't matter, as long as you don't randomly shuffle the characters.
(*) As opposed to a bunch of made up phrases being shown to test subjects to measure their responses, as is commonly done in psycholinguistics.
Yes.
http://en.wikipedia.org/wiki/Text_corpus
Specifically, because it's used for statistics.
Usually "corpus" is used to refer to a structured collection, but linguists would know what you're talking about.
