Using NLTK or similar, how can I conjugate a verb in Python once I have the base word (lemma)?

Using NLTK or similar, how can I take a base word (lemma) and get other parts of speech for that word, especially other forms of a verb?
NLTK does not directly support this.
I tried "pattern" with Python 3.9 and 3.10 and could not get it working.
NodeBox and MontyLingua just lead to dead links these days.
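One workaround (a sketch, assuming the lemminflect package is acceptable; it's a separate pip install and not part of NLTK) is to inflect the lemma by Penn Treebank tag, which still works on recent Python versions:
>>> # pip install lemminflect
>>> from lemminflect import getInflection
>>> getInflection('run', tag='VBD')   # past tense
('ran',)
>>> getInflection('run', tag='VBG')   # present participle
('running',)
>>> getInflection('run', tag='VBZ')   # third-person singular present
('runs',)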

Related

Is there any solution if spacy can't be located on my system?

pip reports that spacy is installed, but I still can't import it: Python says spacy can't be located on my system (win32).
I want to know if there is a solution to this. In fact, I want to use spaCy to process a French corpus, so if I can't get it to work, is there any other similar tool I can use to lemmatize French and so on?
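One common cause (an assumption here, since the original screenshots aren't available) is that pip installed spacy into a different Python interpreter than the one you're running. A quick check:
>>> import sys
>>> print(sys.executable)  # the interpreter you are actually running
Then, from a shell, install spacy against that same interpreter and fetch a French model:
python -m pip install spacy
python -m spacy download fr_core_news_sm
>>> import spacy
>>> nlp = spacy.load('fr_core_news_sm')
>>> [token.lemma_ for token in nlp('Les chats mangent les souris')]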

Meaningful word detection

I'm trying to filter only meaningful words from a list of words. Some of the words will be gibberish, and I want to filter them out. I'm curious whether there is a library for this in a common language like Python or Node.js. It would be great if the library supported different languages (Turkish in this case).
Are you looking for a stopwords list? If so, you can refer to this post: NLTK available languages for stopwords
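If by "meaningful" you mean "is a real word" rather than "is not a stopword", a dictionary-membership check is a simple starting point. A minimal sketch with NLTK's English wordlist (an assumption: NLTK does not ship a Turkish wordlist, so for Turkish you would substitute your own):
>>> import nltk
>>> nltk.download('words')
>>> from nltk.corpus import words
>>> vocab = set(w.lower() for w in words.words())  # English-only wordlist
>>> [w for w in ['house', 'asdfgh', 'tree'] if w.lower() in vocab]
['house', 'tree']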

How to detect sentence stress by python NLP packages (spaCy or NLTK)?

Can we detect the sentence stress (the stress on some words or pauses between words in a sentence) using common NLP packages such as spaCy or NLTK?
How can we tell content words from structure words using spaCy or NLTK?
Since common NLP tools can detect dependencies, it should be possible to identify which words are stressed in natural speech.
I don't think NLTK or spaCy supports this directly. You can find content words with either tool, sure, but that's only part of the picture. You want to look for software related to prosody or intonation, which you might find as a component of a text-to-speech system.
Here's a very recently published research paper with code that might be a good place to start: https://github.com/Helsinki-NLP/prosody/ . The annotated data and the references could be useful even if the code might not be exactly the kind of approach you're looking for.
I assume you don't have a training set labeled for which words to stress. In that case, the simplest approach would be to assume that stressed words share certain parts of speech; nouns and verbs would be a good start, excluding modal verbs for example.
NLTK comes with POS taggers.
But since natural speech depends a lot on context, it's probably difficult even for humans to agree on a single answer for what to stress in a sentence.
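A minimal sketch of that heuristic with NLTK, assuming Penn Treebank tags and treating nouns, verbs, adjectives, and adverbs as the candidate stressed words (modal verbs are tagged MD, so they fall out automatically):
>>> import nltk
>>> nltk.download('punkt'); nltk.download('averaged_perceptron_tagger')
>>> tagged = nltk.pos_tag(nltk.word_tokenize('You should really read this paper'))
>>> [w for w, t in tagged if t.startswith(('NN', 'VB', 'JJ', 'RB'))]
['really', 'read', 'paper']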

Natural Language Processing Libraries

I'm having a hard time figuring out what library and datasets go together.
Toolkits / Libraries I've found:
CoreNLP - Java
NLTK - Python
OpenNLP - Java
ClearNLP - Java
Out of all of these, some are missing features. For example, OpenNLP doesn't have dependency parsing.
I need to find a fast library that does both dependency parsing and part-of-speech tagging.
The next hurdle is where to get data sets. I've found a lot of things out there, but nothing full and comprehensive.
Data I've found:
NLTK Corpora
English Web Treebank (looks to be the best but is paid)
OpenNLP
Penn Treebank
I'm confused as to which data sets I need for which features and what's actually available publicly. From my research it seems ClearNLP will work best for me, but it has very little data.
Thank you
Stanford CoreNLP provides both POS tagging and dependency parsing out of the box (plus many other features!). It already ships with trained models, so you don't need any data sets for it to work!
Please let me know if you have any more questions about the toolkit!
http://nlp.stanford.edu/software/corenlp.shtml
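For what it's worth, CoreNLP can also be driven from Python through NLTK's client. A sketch, assuming you've started the CoreNLP server separately on port 9000 (the example sentence is mine):
>>> # First, in the CoreNLP directory:
>>> #   java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000
>>> from nltk.parse.corenlp import CoreNLPDependencyParser
>>> parser = CoreNLPDependencyParser(url='http://localhost:9000')
>>> parse, = parser.parse('The quick brown fox jumps over the lazy dog'.split())
>>> print(parse.to_conll(4))  # columns: word, POS tag, head index, relation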

WORDNET database access

I have downloaded WordNet (2.1) but I don't know how to access the WordNet database.
There are both libraries and file formats documented in the WordNet 3.0 Reference Manual. By the way, is there a reason you aren't using WordNet 3.0?
You should check out NLTK. It's the easiest way to access WordNet. It's written in Python.
Just to show you how simple it can be:
>>> from nltk.corpus import wordnet as wn
>>> wn.synsets('motorcar')
[Synset('car.n.01')]
You can find further documentation here:
http://nltk.googlecode.com/svn/trunk/doc/book/ch02.html
(scroll down for WordNet)
Oh, and don't forget to actually download the WordNet data:
>>> import nltk
>>> nltk.download()
then just choose WordNet in the downloader.
I highly recommend the MySQL build at http://wnsqlbuilder.sourceforge.net/. You can also search for a SQL Server version. It gives a big speedup for direct database access.
If you are using C++, WordNet comes with an interface of its own; you should find it in your WordNet distribution.
If you are using C#, then sharpnlp.codeplex.com is the place for you; they have a WordNet interface.
WordNet also has a Perl distribution, but I don't know whether it can be used directly, because I don't use Perl.
Install NLTK and then use:
>>> from nltk.corpus import wordnet
Then, by using synsets, you can compare words.
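For example, a minimal sketch comparing two words through their synsets (path_similarity is one of several similarity measures WordNet exposes in NLTK):
>>> from nltk.corpus import wordnet as wn
>>> dog = wn.synset('dog.n.01')
>>> cat = wn.synset('cat.n.01')
>>> dog.path_similarity(cat)  # based on shortest hypernym-path distance
0.2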
