Python 3 - Replacing words in a sentence with its index value - string

I am writing a program that takes a sentence input from the user as a string (str1Input) and then writes that sentence to a file. After splitting the sentence into a list (str1), the program identifies the unique words and writes them to the same file. I then need to replace each word in the sentence (str1Input) with its index value, by reading from the file containing the sentence.
For instance, if I had "i like to code in python and code things", this would be str1Input. I would then use "str1Input.split()", which changes it to ['i', 'like', 'to', 'code', 'in', 'python', 'and', 'code', 'things']. After finding the unique values, I would get: {'and', 'code', 'like', 'i', 'things', 'python', 'to', 'in'}.
My problem is that I am unsure how I would read from the file to replace each word in the sentence with the index value of that word. Here is the code so far:
str1Input = input("Please enter a sentence: ")
str1 = str1Input.split()
with open('str1.txt', 'w') as str1Write:  # the with block closes the file automatically
    str1Write.write(str1Input)
print("The words in that sentence are: ",str1)
unique_words = set(str1)
print("Here are the unique words in that sentence: " ,unique_words)
If anyone could help me with this that would be greatly appreciated, thanks!
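A minimal sketch of one way to do this, assuming "index value" means the 0-based position of each word in the list of unique words, ordered by first appearance (use `enumerate(unique_words, start=1)` for 1-based numbering). Because a `set` has no stable order, the sketch builds the unique-word list with `dict.fromkeys` instead. The function name `sentence_to_indices` is hypothetical, and the file name `str1.txt` is taken from the question.

```python
def sentence_to_indices(sentence, path="str1.txt"):
    # Write the sentence to the file, as the question's code does
    with open(path, "w") as f:
        f.write(sentence)

    # Read the sentence back from the file and split it into words
    with open(path) as f:
        words = f.read().split()

    # dict.fromkeys keeps first-seen order (Python 3.7+); set() does not
    unique_words = list(dict.fromkeys(words))

    # Look up each word's position in the unique-word list
    index_of = {word: i for i, word in enumerate(unique_words)}
    return unique_words, [index_of[word] for word in words]


unique_words, indices = sentence_to_indices("i like to code in python and code things")
print(unique_words)
print(" ".join(str(i) for i in indices))  # → 0 1 2 3 4 5 6 3 7
```

The second "code" maps to 3, the index of its first appearance, so repeated words share one number.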

Related

How to check whether any word in a string starts with any of the words in a list using Python

I am trying to filter sentences from my pandas DataFrame, which has 50 million records, using a keyword search: I want to keep a sentence if any word in it starts with any of these keywords.
WordsToCheck=['hi','she', 'can']
text_string1="my name is handhit and cannary"
text_string2="she can play!"
If I do something like this:
if any(key in text_string1 for key in WordsToCheck):
    print(text_string1)
I get a false positive for "handhit", because "hi" matches the "hit" at the end of the word.
How can I avoid such false positives in my result set?
Secondly, is there a faster way to do this in Python? I am currently using the apply function.
I have already looked at this question, so mine is not a duplicate: How to check if a string contains an element from a list in Python
If case matters, you can do something like this:
def any_word_starts_with_one_of(sentence, keywords):
    for kw in keywords:
        match_words = [word for word in sentence.split(" ") if word.startswith(kw)]
        if match_words:
            return kw
    return None
keywords = ["hi", "she", "can"]
sentences = ["Hi, this is the first sentence", "This is the second"]
for sentence in sentences:
    if any_word_starts_with_one_of(sentence, keywords):
        print(sentence)
If case does not matter, replace line 3 with something like this:
match_words = [word for word in sentence.split(" ") if word.lower().startswith(kw.lower())]
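A different technique, sketched here as an alternative rather than as the answer's method: a single regular expression with a word boundary (`\b`) in front of the keyword alternatives only matches when a keyword begins a word, so "hi" no longer matches inside "handhit". The helper names are hypothetical, and the sketch assumes the keywords start with letters or digits (that is where `\b` behaves as a "start of word" anchor).

```python
import re


def build_prefix_pattern(keywords, ignore_case=False):
    # re.escape treats each keyword literally; \b requires the match to begin
    # at a word boundary, i.e. at the start of a word
    alternatives = "|".join(re.escape(kw) for kw in keywords)
    flags = re.IGNORECASE if ignore_case else 0
    return re.compile(r"\b(?:" + alternatives + r")", flags)


def any_word_starts_with(sentence, pattern):
    # search() scans the whole sentence; None means no word starts with a keyword
    return pattern.search(sentence) is not None


pattern = build_prefix_pattern(["hi", "she", "can"])
for text in ["my name is handhit", "she can play!", "Hi there"]:
    print(text, "->", any_word_starts_with(text, pattern))
```

For the pandas part, the same expression string (`pattern.pattern`) can be passed to `Series.str.contains`, which applies it to the whole column instead of calling a Python function per row through `apply`; whether that is noticeably faster on 50 million rows is worth measuring.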

Define a list of words and check whether any of those words exist in the body of text

Can you please help me with the logic of the following question? I would like to define a list of different words and check whether those words exist in the text. If so, I would like the word returned; if the words are not part of the text, I would like a message returned.
The code I have is the following:
def search_word():
    with open('testing.txt', 'r') as textFile:
        word_list = ['account', 'earn', 'free', 'links', 'click', 'cash', 'extra', 'win', 'bonus', 'card']
        for line in textFile.read():
            for word in word_list:
                if word in line:
                    print(word)
                else:
                    print('The text does not include any predefined words.')
search_word()
The output I get is always the else message. I know the issue is with the "for line in textFile.read()" code, but I am trying to understand why the logic in the code above does not work.
I get the right result by changing the code to the following, moving "fileText = textObject.read()" before the for loop:
def search_word():
    with open('email.txt', 'r') as textObject:
        fileText = textObject.read()
        word_list = ['account', 'earn', 'free', 'links', 'click', 'cash', 'Extra', 'win', 'bonus', 'card']
        for word in word_list:
            if word in fileText:
                print(word)
            else:
                print(word, '- the email does not include any predefined spam words.')
search_word()
I would appreciate your help understanding the difference in the logic.
thanks.
Lois
Assume testing.txt contains only the word account. read() returns your text file as a single string, like 'account\n'.
for line in textFile.read(): iterates over every character (including spaces and newlines) in that string, something like ['a', 'c', 'c', 'o', 'u', 'n', 't', '\n']. Then
for word in word_list: compares each word in word_list against every single character: 10 words against 'a', then 10 words against 'c', ... and finally 10 words against '\n'. No comparison matches, because a multi-letter word can never be contained in a single character, so the else branch runs 80 times (10 words x 8 characters).
When you use fileText = textObject.read() without looping over it, for word in word_list: compares each word against the whole string: 'account' in 'account\n', 'earn' in 'account\n' ... 'card' in 'account\n'. This time you get exactly 10 results, one for each word in your list.
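A small sketch of the point above, plus a version that returns the found words or a message as the question asks. Note that it swaps the substring test (`word in text`) for a whole-word comparison, an assumption on my part: with `in`, "win" would also match inside "window". The names `find_words` and `search_text` are hypothetical, and punctuation is only stripped from the ends of each word.

```python
import string

WORD_LIST = ['account', 'earn', 'free', 'links', 'click', 'cash',
             'extra', 'win', 'bonus', 'card']

# Iterating over a string yields one character at a time
print(list("account\n"))  # ['a', 'c', 'c', 'o', 'u', 'n', 't', '\n']


def find_words(text, word_list):
    # Split into words, lower-case them and strip surrounding punctuation,
    # so "FREE," counts as "free" but "window" does not count as "win"
    words_in_text = {w.strip(string.punctuation).lower() for w in text.split()}
    return [w for w in word_list if w.lower() in words_in_text]


def search_text(text, word_list=WORD_LIST):
    # Return the matching words, or a message if none were found
    found = find_words(text, word_list)
    if found:
        return found
    return 'The text does not include any predefined words.'


print(search_text("Click here to win a FREE prize!"))  # ['free', 'click', 'win']
```

To use it with the file, read the whole text once, e.g. `with open('testing.txt') as f: print(search_text(f.read()))`.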

I have a question about my program that reads 2 sentences entered by a user, then compares them in various ways

I am brand new to Python, and I am trying to write a program that reads two sentences provided by a user, then compares them in the following ways:
I. It has to display a list of all the unique words contained in both sentences.
II. It has to display a list of the words that appear in both sentences.
III. It has to display a list of the words that appear in the first sentence but not the second.
IV. It has to display a list of the words that appear in the second sentence but not the first.
V. It has to display a list of the words that appear in either the first or second sentence but not both.
I tried searching for my issue on this site, but I could not find anything that exactly describes my problem. I have tried reading my book (Learning Python - 5th Ed), and even searched the web for pointers on how to get my program to function correctly.
Here is my code. I apologize in advance, as I know this is not the most efficient way to approach this type of program, but I am brand new to Python and could not think of a better way:
def sentenceDisplay():
    userSentence1 = input('Please enter your first sentence: ')
    userSentence2 = input('Please enter your second sentence: ')
    sentDisplay1 = userSentence1.split()
    sentDisplay2 = userSentence2.split()
    print('The words contained within sentence #1 is: ', sentDisplay1)
    print('The words contained within sentence #2 is: ', sentDisplay2)
sentenceDisplay()
def sent1And2Display():
    userSent1 = input('Please enter your first sentence: ')
    userSent2 = input('Please enter your second sentence: ')
    displaySent1And2 = set(userSent1).union(set(userSent2))
    displaySent1And2 = (userSent1.split() + userSent2.split())
    print('The words contained within both sentence #1 and #2 is: ' + str(displaySent1And2))
sent1And2Display()
def diffOfUnion():
    uSent1 = input('Please enter your first sentence: ')
    uSent2 = input('Please enter your second sentence: ')
    set1And2 = [set(uSent1), set(uSent2)]
    nonRepSet = []
    for sentSetElm in set1And2:
        nonRepSet.append(len(sentSetElm.difference(set.union(*[i for i in set1And2 if i is not sentSetElm]))))
    print('The unique words contained within both sentences is: ', (nonRepSet))
diffOfUnion()
def symmDiff():
    symmSent1 = input('Please enter your first sentence: ')
    symmSent2 = input('Please enter your second sentence: ')
    symmDiffSent1And2 = set(symmSent1).symmetric_difference(set(symmSent2))
    symmDiffSent1And2 = (symmSent1.split()+ symmSent2.split())
    print('The words contained in either sentence #1 and #2, but not both is: ' + str(symmDiffSent1And2))
symmDiff()
I know that set operations are what I need, but my functions #3 and #4 are not behaving as they should: function #3 shows the answer as a list of integers, but I need it to display strings like the rest of my functions. Also, function #4 is not finding the symmetric difference of the two sentences, and I cannot figure out why.
Any help as to what I am doing wrong would be greatly appreciated,
Thank you.
Short and sweet.
sent1 = "A random sentence.".split()
sent2 = "Some other random sentence.".split()
union = set(sent1) | set(sent2)         # words in either sentence
common = set(sent1) & set(sent2)        # words in both sentences
one_not_two = set(sent1) - set(sent2)
two_not_one = set(sent2) - set(sent1)
diff = set(sent1) ^ set(sent2)          # in one sentence but not both
# sorted() gives a stable order; plain set iteration order is arbitrary
print(f"I. All unique word(s) from either sentence: {', '.join(sorted(union))}")
print(f"II. The word(s) contained in both sentence #1 and #2: {', '.join(sorted(common))}")
print(f"III. The word(s) contained in sentence #1 only: {', '.join(sorted(one_not_two))}")
print(f"IV. The word(s) contained in sentence #2 only: {', '.join(sorted(two_not_one))}")
print(f"V. The word(s) contained in either sentence #1 or #2, but not both: {', '.join(sorted(diff))}")
Output:
I. All unique word(s) from either sentence: A, Some, other, random, sentence.
II. The word(s) contained in both sentence #1 and #2: random, sentence.
III. The word(s) contained in sentence #1 only: A
IV. The word(s) contained in sentence #2 only: Some, other
V. The word(s) contained in either sentence #1 or #2, but not both: A, Some, other
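As for why the original functions #3 and #4 misbehave: `set(uSent1)` on a raw string builds a set of characters, not words; function #3 then appends `len(...)` of each difference, which is where the integers come from; and in function #4 the second assignment overwrites the symmetric difference with a plain concatenation of the two word lists. A minimal sketch of corrected versions follows. The names are hypothetical, and the sentences are passed as arguments instead of read with `input()`, which makes the functions easier to test.

```python
def unique_to_each(sentence1, sentence2):
    # Build sets of words with split(); set(sentence) would be a set of characters
    sets = [set(sentence1.split()), set(sentence2.split())]
    result = []
    for s in sets:
        # Union of every *other* sentence's words, as in the original function #3
        others = set().union(*(o for o in sets if o is not s))
        # Keep the set of words itself; len(...) is what produced the integers
        result.append(s - others)
    return result


def symmetric_diff(sentence1, sentence2):
    # Return the symmetric difference instead of overwriting it with a concatenation
    return set(sentence1.split()).symmetric_difference(sentence2.split())


only1, only2 = unique_to_each("A random sentence.", "Some other random sentence.")
print(sorted(only1), sorted(only2))  # ['A'] ['Some', 'other']
print(sorted(symmetric_diff("A random sentence.", "Some other random sentence.")))  # ['A', 'Some', 'other']
```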

Extracting acronyms from each string in list index

I have a list of strings (the other posts only had single words or ints) that are imported from a file, and I am having trouble using nested loops to split the words in each element into their own list and then take the first letter of each word to create acronyms.
I have tried picking apart each element and processing it through another loop to get the first letter of each word, but the closest I got was pulling the first letter of each element of the outer list.
text = (infile.read()).splitlines()
acronym = []
separator = "."
for i in range(len(text)):
    substring = [text[i]]
    for j in range(len(substring)):
        substring2 = [substring[j][:1]]
        acronym.append(substring2)
print("The Acronym is: ", separator.join(acronym))
Happy path: the list of multi-word strings will be translated into acronyms, listed one per line.
Example of the expected output: D.O.D. \n N.S.A. \n etc.
What's happened so far: before, I had gotten it to take the first letter of the first word of every element, but I haven't figured out how to nest these loops to get to the individual words of each element.
Useful knowledge: after splitlines(), the starting format is a list whose elements look like this: ['Department of Defense', 'National Security Agency', ...]
What you have is kind of a mess. If you are going to reuse code, it is often better to make it into a function. Try this out:
def get_acronym(the_string):
    words = the_string.split()  # split() without an argument also ignores repeated spaces
    return_string = ""
    for word in words:
        return_string += word[0]
    return return_string
text = ['Department of Defense', 'National Security Agency']
for agency in text:
    print("The acronym is: " + get_acronym(agency))
I figured out how to do it from a file. The file format was like this:
['This is Foo', 'Coming from Bar', 'Bring Your Own Device', 'Department of Defense']
So if this also helps anyone, enjoy~
with open(iname, 'r') as infile:  # iname holds the path of the input file
    text = infile.read().splitlines()
print("The Strings To Become Acronyms Are As Follows: \n", text, "\n")
acronyms = []
for string in text:
    words = string.split()
    letters = [word[0] for word in words]
    acronyms.append(".".join(letters).upper())
print("The Acronyms For These Strings Are: \n", acronyms)
This code outputs the following:
The Strings To Become Acronyms Are As Follows:
['This is Foo', 'Coming from Bar', 'Bring Your Own Device', 'Department of Defense']
The Acronyms For These Strings Are:
['T.I.F', 'C.F.B', 'B.Y.O.D', 'D.O.D']
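Neither version above prints exactly the format described in the question (a dot after every letter, one acronym per line). A minimal sketch that does, assuming every non-blank line of the file is one phrase; the helper names are hypothetical.

```python
def make_acronym(phrase):
    # First letter of every word, upper-cased and followed by a dot: "D.O.D."
    return "".join(word[0].upper() + "." for word in phrase.split())


def acronyms_from_lines(lines):
    # Skip blank lines, which would otherwise produce empty acronyms
    return [make_acronym(line) for line in lines if line.strip()]


text = ['Department of Defense', 'National Security Agency']
print("\n".join(acronyms_from_lines(text)))
```

With a file, the lines would come from `infile.read().splitlines()` as in the answer above.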

Expected str instance, int found. How do I change an int to str to make this code work?

I'm trying to write code that analyses a sentence containing multiple words and no punctuation. I need it to identify the individual words in the entered sentence and store them in a list. My example sentence is 'ask not what your country can do for you ask what you can do for your country'. I then need the original position of each word to be written to a text file. This is my current code, with parts taken from other questions I've found, but I just can't get it to work:
myFile = open("cat2numbers.txt", "wt")
list = [] # An empty list
sentence = "" # Sentence is equal to the sentence that will be entered
print("Writing to the file: ", myFile) # Telling the user what file they will be writing to
sentence = input("Please enter a sentence without punctuation ") # Asking the user to enter a sentence
sentence = sentence.lower() # Turns everything entered into lower case
words = sentence.split() # Splitting the sentence into single words
positions = [words.index(word) + 1 for word in words]
for i in range(1,9):
    s = repr(i)
    print("The positions are being written to the file")
    d = ', '.join(positions)
    myFile.write(positions) # write the places to myFile
    myFile.write("\n")
    myFile.close() # closes myFile
    print("The positions are now in the file")
The error I've been getting is TypeError: sequence item 0: expected str instance, int found. Could someone please help me? It would be much appreciated.
The error stems from .join: it only accepts strings, but positions is a list of ints.
So the simple fix would be using:
d = ", ".join(map(str, positions))
which maps the str function on all the elements of the positions list and turns them to strings before joining.
That won't solve all your problems, though. You have used a for loop for some reason, and inside it you .close the file after writing. On subsequent iterations you'll get an error for attempting to write to a file that has been closed. Also, myFile.write(positions) passes the list itself rather than the joined string d; write() needs a str.
There are other things: list = [] is unnecessary, and the name list should be avoided because it shadows the built-in; the initialization of sentence is unnecessary too. Additionally, if you want to ask for 8 sentences (what range(1, 9) in your for loop does), start the loop before asking for input so the work happens once per sentence.
All in all, try something like this:
with open("cat2numbers.txt", "wt") as f:
    print("Writing to the file: ", f.name) # Telling the user what file they will be writing to
    for i in range(8):
        sentence = input("Please enter a sentence without punctuation ").lower() # Asking the user to enter a sentence
        words = sentence.split() # Splitting the sentence into single words
        positions = [words.index(word) + 1 for word in words]
        f.write(", ".join(map(str, positions))) # write the positions to the file
        f.write("\n")
print("The positions are now in the file")
This uses the with statement, which handles closing the file for you behind the scenes.
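`words.index(word)` rescans the list from the start for every word, so the comprehension above does quadratic work on long sentences. A minimal sketch of an equivalent approach (my own variation, not the answer's code) records each word's first position in a dict while walking the sentence once, then converts the ints to strings before joining. The function names are hypothetical.

```python
def first_positions(sentence):
    # 1-based position at which each word first appears, e.g. the second "ask" -> 1
    first_seen = {}
    positions = []
    for pos, word in enumerate(sentence.lower().split(), start=1):
        first_seen.setdefault(word, pos)  # only records the first occurrence
        positions.append(first_seen[word])
    return positions


def positions_line(sentence):
    # join() needs strings, so convert every int with str()
    return ", ".join(str(p) for p in first_positions(sentence))


sentence = "ask not what your country can do for you ask what you can do for your country"
print(positions_line(sentence))
# 1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 3, 9, 6, 7, 8, 4, 5
```

Inside the with block, `f.write(positions_line(sentence) + "\n")` would write one line per sentence.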
As I see it, in the for loop you write to the file, then close it, and then WRITE TO THE CLOSED FILE again on the next iteration. Couldn't this be the problem?
