Stemming: I need to write code for this (Python 3.x)

If you search for something in Google and use a word like "running", Google is smart enough to match "run" or "runs" as well. That's because search engines do what's called stemming before matching words.
In English, stemming involves removing common endings from words to produce a base word. It's hard to come up with a complete set of rules that work for all words, but this simplified set does a pretty good job:
If the word starts with a capital letter, output it without changes.
If the word ends in 's', 'ed', or 'ing' remove those letters, but if the resulting stemmed word is only 1 or 2 letters long (e.g. chopping the ing from sing), use the original word.
Your program should read one word of input and print out the corresponding stemmed word. For example:
Enter the word: states
state
Another example interaction with your program is:
Enter the word: rowed
row
Remember that capitalised words should not be stemmed:
Enter the word: James
James
and nor should words that become too short after stemming:
Enter the word: sing
sing
Here is the code:
word = input("Enter the word:")
x = 'ing'
y = 'ed'
z = 's'
first = word[:1]
last = word[-1:]
uppercase = first.upper
if word == uppercase:
    print("")
elif (x in word) == True:
    word = (word.replace('ing',''))
    print(word)
elif (y in word) == True:
    word = (word.replace('ed',''))
    print(word)
elif (z in word) == True:
    word = (word.replace('s',''))
    print(word)

I see two options. Either this is a homework question, in which case - please try to solve your own homework.
The other case - you need this in real life. If so, please look at NLTK for Python natural language processing needs. In particular see http://nltk.org/api/nltk.stem.html

Install the NLTK toolkit
and try this:
from nltk.stem.porter import PorterStemmer
PorterStemmer().stem(word)
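For the simplified rules stated in the question (leave capitalised words alone, strip a trailing 's', 'ed' or 'ing', and fall back to the original word if the result is 1 or 2 letters long), a minimal sketch could look like the following. The function name `stem` is my own choice, and this only covers the exercise's rules, not real-world stemming. It checks the end of the word with `endswith` instead of using `in` and `replace`, which would also match and delete those letters in the middle of a word.

```python
def stem(word):
    """Apply the simplified stemming rules from the exercise."""
    # Rule 1: capitalised words are returned unchanged.
    if word[:1].isupper():
        return word
    # Rule 2: strip one of the known endings, but only from the end of the word.
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix):
            stemmed = word[:-len(suffix)]
            # Keep the original word if the stem is only 1 or 2 letters long.
            return stemmed if len(stemmed) > 2 else word
    return word


for example in ("states", "rowed", "James", "sing"):
    print(example, "->", stem(example))
```

To match the exercise's interaction, the function can be called as `print(stem(input("Enter the word: ")))`.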

Related

(Python) Why doesn't my program run anything while my CPU usage skyrockets?

I was following a tutorial on YouTube, but when I run the program nothing shows up in the terminal while my CPU usage doubles. I am using VS Code to run this program. PS: I am very new to this and often have trouble with coding in general, so I decided to ask.
import random
from words import words
import string
def get_valid_word(words):
    word = random.choice(words) #randomly chooses something from the list
    while '-' in word or '' in word: #helps pick words that have a dash or space in it
        word = random.choice(words)
    return word.upper() #returns in uppercase
def hangman():
    word = get_valid_word(words)
    word_letters = set(word) # saves all the letters in the word
    alphabet = set(string.ascii_uppercase) #imports predetermined list from english dictionary
    used_letters = set() #keeps track of what user guessed
    #getting user input
    while len(word_letters) > 0: #keep guessing till you have no letters to guess the word
        #letters used
        #'' .join (['a', 'b', 'cd']) --> 'a, b cd'
        print('you have used these letters: ',' '.join(used_letters))
        #what the current word is (ie W - R D)
        word_list = [letter if letter in used_letters else '-' for letter in word]
        print('current word: ',' '.join(word_list))
        user_letter = input('Guess a letter: ')
        if user_letter in alphabet - used_letters: #letters haven't used
            used_letters.add(user_letter) #add the used letters
            if user_letter in word_letters: #if the letter that you've just guessed is in the word then you remove that letter from word_letters. So everytime you've guessed correctly from the word letters which is keeping track of all the letters in the word decreases in size.
                word_letters.remove(user_letter)
        elif user_letter in used_letters:
            print('You have used this')
        else:
            print('Invalid')
    #gets here when len(word_letters) == 0
hangman()
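A likely cause of the hang is the loop condition in get_valid_word: the empty string '' is a substring of every string, so `'' in word` is always True and the while loop never exits, which would explain the busy CPU and the silent terminal. Below is a minimal sketch of that function, assuming the intent was to skip words containing a dash or a space. `SAMPLE_WORDS` is a hypothetical stand-in for the tutorial's `words` module.

```python
import random

# Hypothetical stand-in for the list imported from the tutorial's words module.
SAMPLE_WORDS = ["apple", "well-known", "ice cream", "banana"]


def get_valid_word(words):
    """Pick a random word that contains neither a dash nor a space."""
    word = random.choice(words)
    # ' ' (one space) instead of '' (empty string): the empty string is
    # contained in every string, so the original condition was always True.
    while '-' in word or ' ' in word:
        word = random.choice(words)
    return word.upper()


print('' in "apple")   # the empty string is found in any string
print(get_valid_word(SAMPLE_WORDS))
```

Separately, the alphabet set in hangman() holds uppercase letters only, so a lowercase guess would fall through to 'Invalid' unless the input is converted with `.upper()` first.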

splitting words by syllable with CMU Pronunciation Dictionary, NLTK, and Python3

I am working on a natural language processing project and am stuck on splitting words into syllables (using nltk and cmudict.dict() in python 3).
I currently count syllables by looking a word in my corpus up in the CMU Pronunciation Dictionary and counting the number of stresses in its list of phonemes. This appears to work pretty well.
What I am stuck on is how to use this information to split the accompanying grapheme after counting, as I do not understand how to either translate the phonemes back to the graphemes (seems error prone) or use the list of phonemes to somehow split the grapheme.
Here is the function I wrote to do this (word tokenization happens elsewhere):
def getSyllables(self):
    pronunciation = cmudict.dict() # get the pronunciation dictionary
    syllableThreshold = 1 # we dont care about 1 syllable words right now
    for word in self.tokens:
        for grapheme, phonemes in pronunciation.items():
            if grapheme == word.lower(): # all graphemes are lowercase, we have to word.lower() to match
                syllableCounter = 0
                for x in phonemes[0]:
                    for y in x:
                        if y[-1].isdigit(): # an item ending in a number is a stress (syllable)
                            syllableCounter += 1
                if syllableCounter > syllableThreshold:
                    output = ' '.join([word, "=", str(syllableCounter)])
                    print(output)
                    print(phonemes)
                else:
                    print(word)
Just as an example, my current output is:
Once
an
angry = 2
[['AE1', 'NG', 'G', 'R', 'IY0']]
man
How can I split the word angry, for example, into an - gry?
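The counting step described above can be sketched with a direct dictionary lookup instead of a loop over every entry. This is only an illustration of the counting, not a solution to the grapheme split. `SAMPLE_PRONUNCIATIONS` is a hypothetical two-entry stand-in shaped like the output shown in the question (word mapped to a list of phoneme lists); in the real project it would be the dictionary returned by cmudict.dict(). The function names are my own.

```python
# Hypothetical stand-in with the same shape as the pronunciation dictionary:
# lowercase word -> list of pronunciations, each a list of phoneme strings.
SAMPLE_PRONUNCIATIONS = {
    "angry": [["AE1", "NG", "G", "R", "IY0"]],
    "man": [["M", "AE1", "N"]],
}


def count_syllables(phonemes):
    """Count phonemes that end in a stress digit (one per syllable)."""
    return sum(1 for phoneme in phonemes if phoneme[-1].isdigit())


def syllable_counts(tokens, pronunciations):
    """Map each token found in the dictionary to its syllable count."""
    counts = {}
    for token in tokens:
        entries = pronunciations.get(token.lower())  # direct lookup, no scan
        if entries:
            counts[token] = count_syllables(entries[0])  # first pronunciation
    return counts


print(syllable_counts(["Once", "an", "angry", "man"], SAMPLE_PRONUNCIATIONS))
```

Tokens missing from the dictionary are skipped here; whether to skip or report them is a design choice left open.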

Words containing given letters

I'm trying to solve a task with the following content:
Create a function that retrieves the word and string of required letters and returns True if the word has all the required letters at least once.
My code looks like that:
def uses_only(letters, word):
    letters = str(input("Enter letters : "))
    word = str(input("Enter word : "))
    if letters in word:
        print("T")
    else:
        print("F")
uses_only(input, input)
But it doesn't work properly, because it returns F if the letter occurs more than once in the word. I searched the internet, but I didn't find anything that would help me. Can somebody explain to me how to solve this task correctly?
I'm not sure I understood what you're trying to do.
This is a possible solution: the function checks whether each letter exists at least once in the word.
def f(word, letters):
    return all(l in word for l in letters)
For example:
f("information", "oat") # True
f("information", "zfa") # False
You can also do it this way, which is probably closer to what you were trying:
def uses_only(letters, word):
    letters = str(input("Enter letters : "))
    word = str(input("Enter word : "))
    for letter in letters:
        if letter not in word:
            return False
    return True
print (uses_only(input, input))
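As a further variation on the answers above, the same check can be written as a set comparison: the required letters must form a subset of the letters in the word. This is a minimal sketch; the name `has_all_letters` is my own, and it takes its inputs as arguments instead of reading them with input(), which keeps it easy to test.

```python
def has_all_letters(word, letters):
    """Return True if every required letter occurs at least once in word."""
    # set(letters) <= set(word) is the subset test; repeated letters in
    # either string make no difference.
    return set(letters) <= set(word)


print(has_all_letters("information", "oat"))
print(has_all_letters("information", "zfa"))
```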

Compressing a sentence in python

So I have this task for school, and it states:
Develop a program that identifies individual words in a sentence, stores these in a list and replaces each word with the position of that word in the list. The sentence has to be entered by the user of the program.
The example we have to use is:
ask not what your country can do for you ask what you can do for your country
should become:
1,2,3,4,5,6,7,8,9,1,3,9,6,7,8,4,5
I have no idea how to even start this task.
You could do something like this:
sentence = input().split(' ')
word_list = []
for i, word in enumerate(sentence, start=1):
    if word not in word_list:
        word_list.append(word)
        to_print = str(len(word_list)) # 1-based position of a newly seen word
    else:
        to_print = str(word_list.index(word) + 1) # 1-based position of a repeated word
    print(to_print, end=',' if i < len(sentence) else '') # no comma after the last number
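The same idea can be sketched as a function that returns both the list of unique words and the list of positions, which is closer to the task's wording about storing the words. This is only one possible shape; the name `compress` is my own, and it relies on dictionaries keeping insertion order (Python 3.7 and later).

```python
def compress(sentence):
    """Return (unique_words, positions) using 1-based positions."""
    positions = {}   # word -> 1-based position of its first occurrence
    encoded = []
    for word in sentence.split():
        if word not in positions:
            positions[word] = len(positions) + 1
        encoded.append(positions[word])
    return list(positions), encoded


example = ("ask not what your country can do for you "
           "ask what you can do for your country")
unique_words, encoded = compress(example)
print(",".join(map(str, encoded)))
```

Using a dictionary avoids the repeated list.index searches, though for a single sentence the difference is negligible.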

Finding the position of two words that are the same in a list [duplicate]

So I created this code to ask a person to input a sentence.
After that they type a word in that sentence.
Then the code will output the position that word is in.
print("Type a sentence here")
sentence = input("")
sentence = sentence.split()
print("Now type a word in that sentence.")
word = input('')
if word in sentence:
    print("I found",word,"in your sentence.")
else:
    print("That word is not in your sentence.")
print(sentence.index(word))
The problem I am having is that if they put two of the same word in the sentence it only outputs the first one. Please can you help.
You could use the built-in enumerate to associate every word in your list sentence with its corresponding position. Then use a list comprehension to get every occurrence of the word in the list.
print([i for i, j in enumerate(sentence) if j == word])
Some further considerations would be that maybe you want to convert your sentence to lower case and remove punctuation before attempting to match your word, such that proper punctuation and capitalization will not trip up your matching. Further, you don't need the '' in input() in order for it to be valid - an empty input() without a prompt is fine.
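The normalisation suggested above can be sketched as follows, assuming that lowercasing and deleting ASCII punctuation is enough for the sentences involved. The name `find_positions` is my own, and positions are 0-based, as with enumerate.

```python
import string


def find_positions(sentence, word):
    """Return the 0-based positions of word in sentence, ignoring case
    and ASCII punctuation."""
    # Translation table that deletes every ASCII punctuation character.
    table = str.maketrans("", "", string.punctuation)
    tokens = sentence.lower().translate(table).split()
    target = word.lower().translate(table)
    return [i for i, token in enumerate(tokens) if token == target]


print(find_positions("The cat saw the other cat.", "cat"))
```

Note that deleting punctuation also turns a word such as "don't" into "dont", so both the sentence and the search word are normalised the same way.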
This problem can be solved with this script:
import re
print("Type a sentence here")
sentence = input("")
print("Now type a word in that sentence.")
word = input('')
words = re.sub(r"[^\w]", " ", sentence).split() # replace non-word characters with spaces, then split
position = 1
list_pos = []
for w in words:
    if w == word:
        print(position)
        list_pos.append(position) # remember every 1-based position of the word
    position += 1
if list_pos:
    print("I found", word, "in your sentence.")
else:
    print("That word is not in your sentence.")
print(list_pos)

Resources