Finding average number of words and sentences in paragraph - python-3.x

I have a text file from which I need to find the average number of words per sentence and the average number of sentences per paragraph, without using regex. A sentence is a sequence of words followed by a full stop, comma or exclamation mark, which in turn must be followed either by a quotation mark (so the sentence is the end of a quote or spoken utterance) or by white space (space, tab or new-line character). A paragraph is any number of sentences followed by a blank line or by the end of the text.
I created a list of delimiters, i.e. [".", ",", "!", "\n", "\t", " "], as the problem says, and then checked them against the entire text of the file.
with open("/Users/abhishekabhishek/downloads/l.txt") as f:
    text_lis = f.read()
# print(text_lis)
sentence_count = 0
ens_sentence = [".", ",", "!", "\n", "\t", " "]
for word in ens_sentence:
    if word in text_lis:
        sentence_count += 1
# print(sentence_count)
# sentence_count gave me the wrong output so I tried splitting it
# using text_lis.split(".") so that I can count the sentences
s = text_lis.split(".")
# then for the average number of words per sentence
char_len = 0
for line in s:
    words = line.split(" ")
    for word in words:
        char_len += len(word.split())
average_number_of_words = char_len/len(words)
The actual output must be the average number of sentences per paragraph and the average number of words per sentence. The approach I tried gave the wrong output because certain words in the file, for example "Dr.", also contain such punctuation, and when I used text_lis.split(".") those words were also counted as the end of a sentence.
Here is the sample text:
I would love to try or hear the sample audio your app can produce. I do not want to purchase, because I've purchased so many apps that say they do something and do not deliver.
Can you please add audio samples with text you've converted? I'd love to see the end results.
Thanks!
THE AUTHOR.
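One possible approach, sketched here without regex, is to scan each paragraph character by character and only accept a terminator as a sentence end when it is followed by white space, a quotation mark, or the end of the text, and when the word it closes is not a known abbreviation. The names split_paragraphs, split_sentences, averages and the ABBREVIATIONS set are hypothetical, and the abbreviation list is an assumption that would need extending for real text. The default terminators follow the definition in the question (full stop, comma, exclamation mark); pass a different string, e.g. ".!?", if question marks should count.

```python
# Hypothetical abbreviation list: extend it for the text at hand.
ABBREVIATIONS = {"Dr.", "Mr.", "Mrs.", "Ms.", "etc."}
QUOTES = "\"'"


def split_paragraphs(text):
    """Group non-blank lines into paragraphs separated by blank lines."""
    paragraphs, current = [], []
    for line in text.splitlines():
        if line.strip():
            current.append(line.strip())
        elif current:
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs


def split_sentences(paragraph, terminators=".,!"):
    """Split one paragraph into sentences by scanning characters."""
    sentences = []
    start = 0
    n = len(paragraph)
    for i, ch in enumerate(paragraph):
        if ch not in terminators:
            continue
        # The terminator must be followed by white space, a quote,
        # or the end of the paragraph.
        nxt = paragraph[i + 1] if i + 1 < n else " "
        if not (nxt.isspace() or nxt in QUOTES):
            continue
        # Skip terminators that only close an abbreviation such as "Dr.".
        tokens = paragraph[start:i + 1].split()
        if tokens and tokens[-1] in ABBREVIATIONS:
            continue
        # Keep a closing quote with the sentence it ends.
        end = i + 2 if nxt in QUOTES and i + 1 < n else i + 1
        sentence = paragraph[start:end].strip()
        if sentence:
            sentences.append(sentence)
        start = end
    # Text after the last terminator is not a sentence under the
    # question's definition, so it is ignored here.
    return sentences


def averages(text):
    """Return (words per sentence, sentences per paragraph)."""
    paragraphs = split_paragraphs(text)
    per_paragraph = [split_sentences(p) for p in paragraphs]
    n_sentences = sum(len(s) for s in per_paragraph)
    n_words = sum(len(s.split()) for group in per_paragraph for s in group)
    words_per_sentence = n_words / n_sentences if n_sentences else 0.0
    sentences_per_paragraph = n_sentences / len(paragraphs) if paragraphs else 0.0
    return words_per_sentence, sentences_per_paragraph
```

For the file in the question, this could be called as averages(text_lis) after reading the file.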

Related

Need to use a 'for loop' for this one. The user has to enter a sentence and any spaces must be replaced with "%"

The input:
sentence = input("Please enter a sentence:")
The for loop (incorrect here):
for i in sentence:
    print(sentence)
space_loc = sentence.index(" ")
for c in sentence:
    print(space_loc)
for b in range(space_loc):
    print("%")
I am confused about how to get the answer out.
You can try using string concatenation and slicing for this one.
sentence = input()
After taking the input, store the length of your string:
length = len(sentence)
Then iterate through every character in the string, and when you find a " ", break the string into two halves using slicing so that each half holds one side of the string around the " ". Then join the halves with a "%":
for i in range(length):
    if sentence[i] == " ":
        sentence = sentence[:i] + "%" + sentence[i+1:]
Here, sentence[:i] is the part of the string before the space and sentence[i+1:] is the part of the string after the space. Because one character replaces one character, the length of the string does not change and the indices stay valid during the loop.
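A variant of the same for-loop idea builds a new string one character at a time instead of slicing the original. This is only a sketch; the function name replace_spaces and its replacement parameter are hypothetical.

```python
def replace_spaces(sentence, replacement="%"):
    """Return a copy of sentence with every space replaced."""
    result = ""
    for ch in sentence:
        # Copy each character, swapping spaces for the replacement.
        result += replacement if ch == " " else ch
    return result


print(replace_spaces("Hello there coders!"))
```

Unlike a split-and-join approach, this keeps one "%" per space, so two consecutive spaces become "%%".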
One way of solving your query:
Code
sentence = input("Please enter a sentence:")
ls=sentence.split() #Creating a list of words present in sentence
new_sentence='%'.join(ls) #Joining the list with '%'
print(new_sentence)
Output
Please enter a sentence:Hello there coders!
Hello%there%coders!
EDIT
I do not understand how exactly you want to use the for loop here. If you just want to include a for loop (no restrictions), then you can do this:
Code
ls=[]
a=0
sentence = input("Please enter a sentence:")
# This loop will find the words in the sentence and store them in a list.
# Words are determined by checking the white space. Each space is replaced with '%'
for i in range(0,len(sentence)):
    if sentence[i]==' ':
        ls.append(sentence[a:i])
        a=i
        ls.append('%')
ls.append(sentence[a:]) # This is to save the last word
ls1=[]
for i in ls: # Removing any white space inside the list
    j=i.replace(' ','')
    ls1.append(j)
print(''.join(ls1)) # Displaying final output
Again, your question is very open ended and this is just one way of using a for loop to get the desired result!

How can I get the sum of all combined word lengths to print as a single value?

I am having trouble getting the total of all word counts. I have tried various methods, and the length per word is correct, but I just can't get a total.
file=open(r"sheSaid.txt","r+")
from collections import Counter
wordcount = Counter(file.read().split())
for word in file.read().split(' '):
    word = word.rstrip(".""',?!")
    if word not in wordcount:
        wordcount[word] = 1
    else:
        wordcount[word] += 1
    #print (word.rstrip(".""..',?!"),wordcount)
for item in wordcount.items(): print("{}\t{}".format(*item))
wordCount = len(wordcount)
#can count all word lengths just fine
print ("The total word count is:", word, wordCount) # when I use Len() Or #Count I cannot get the sum of all wordCounts?
print ("The total length of all words are:","total_of_all_word_counts?")
#I need the sum to complete this
print ("The avg length is:", wordcount/total_of_all_word_counts?)
file.close();
Something like this will do. It is not clear exactly what you are looking for, but this counts the words in the text file, calculates the total character count across all words, and then computes the average number of characters per word:
file = open(r"sheSaid.txt","r+")
file_contents = file.read().split() # Reads the file and split by spaces to create a list of words
file.close() # Always good idea to close the file after done with it
words_count = len(file_contents)
characters_count = len(''.join(file_contents))
average_characters = characters_count / words_count
You may want to make an extra effort to handle other cases, such as removing characters like ! and . from the words. That should be straightforward. This is just a starting point to build upon; only you know the corner cases and what will be in the text file.
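As a rough sketch of that extra step, leading and trailing punctuation can be stripped from each word before counting. The function name average_word_length is hypothetical, and using string.punctuation as the set of characters to strip is an assumption about what should not count as part of a word.

```python
import string


def average_word_length(text):
    """Return (word count, total characters, average characters per word)."""
    # Strip punctuation from both ends of each word, e.g. "said," -> "said".
    words = [w.strip(string.punctuation) for w in text.split()]
    # Drop tokens that were punctuation only.
    words = [w for w in words if w]
    if not words:
        return 0, 0, 0.0
    total_chars = sum(len(w) for w in words)
    return len(words), total_chars, total_chars / len(words)


count, total, average = average_word_length("She said, 'hello!'")
print("The total word count is:", count)
print("The total length of all words is:", total)
print("The avg length is:", average)
```

Punctuation inside a word, such as the apostrophe in "don't", is left in place because strip only touches the ends.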

Removing a string that startswith a specific char Python

text='I miss Wonderland #feeling sad #omg'
prefix=('#','#')
for line in text:
    if line.startswith(prefix):
        text=text.replace(line,'')
print(text)
The output should be:
'I miss Wonderland'
But my output is the original string with the prefix removed.
So it seems that you do not in fact want to remove the whole "string" or "line", but rather the word? Then you'll want to split your string into words:
words = text.split(' ')
Now iterate through each element in words, checking whether it starts with the prefix, and combine the elements that pass the check back into one string:
result = ""
for word in words:
    if not word.startswith(prefix):
        result += (word + " ")
result = result.rstrip()  # drop the trailing space left by the loop
In your case, for line in text iterates over each character in the text, not each word. So when it gets to, e.g., the '#' in '#feeling', it removes the '#', but 'feeling' remains because none of the other characters in that string match the prefix. You can confirm that your code is going character by character by doing:
for line in text:
    print(line)
Try the following instead, which does the filtering in a single line:
text = 'I miss Wonderland #feeling sad #omg'
prefix = ('#','#')
words = text.split() # Split the text into a list of its individual words.
# Join only those words that don't start with prefix
print(' '.join([word for word in words if not word.startswith(prefix)]))

How to find words that contain none of the characters of a string?

My question is about Python. I have a .txt file that contains some text. I want code that prompts the user to enter a string of forbidden letters and then prints the number of words in the file that don't contain any of them. I wrote this:
letters=raw_input("Please enter forbidden letters")
text=open("p.txt", "r").read().split()
for word in text:
    for i in letters:
        if not i in word:
            print word
and also have this one:
letters=raw_input("Please enter forbidden letters")
text=open("p.txt", "r").read().split()
for word in text:
    for i in letters:
        for j in i:
            if not j in word:
                print word
But both versions print words that lack any one of the characters, rather than words that lack all of them. For example, if my file contains "Hello good friends" and the user enters "oh", the code prints:
Hello
good
friends
friends
But I expect just "friends", which contains neither "o" nor "h", and I expect "friends" to be printed once, not twice.
How can I fix my problem?
Put the words in a set to remove duplicates. Then print each word only if none of the letters appear in it. Something like this:
letters = raw_input("Please enter forbidden letters")
words = set(open("p.txt", "r").read().split())
for word in words:
    if all(letter not in word for letter in letters):
        print word
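Since the question asks for the number of such words, here is a small Python 3 sketch that returns both the count and the matching words. The function name count_words_without is hypothetical; it keeps duplicates and the original order, so wrap the input in set() if duplicates should be dropped as in the snippet above. The check is case-sensitive, which is an assumption.

```python
def count_words_without(words, forbidden):
    """Return (count, words) for words containing no forbidden letter."""
    forbidden = set(forbidden)
    # isdisjoint is True only when the word shares no character
    # with the forbidden set.
    allowed = [w for w in words if forbidden.isdisjoint(w)]
    return len(allowed), allowed


count, allowed = count_words_without("Hello good friends".split(), "oh")
print(count, allowed)
```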

parsing words in a document using specific delimiters

I have a document that I'm parsing words from, but I want to consider anything that is not a-z, A-Z, 0-9, or an apostrophe to be white space. How could I do this if I am currently using the following bit of code:
ifstream file;
file.open(filePath);
while(file >> word){
    listOfWords.push_back(word); // I want to make sure only words with the stated
                                 // range of characters exist in my list.
}
So, for example, the word hor.se would be two elements in my list, "hor" and "se".
Create a list of "whitespace characters" and then, each time you encounter a character, check whether it is in the list; if so, the current word has ended and a new one starts. This example is written in Python, but the concept is the same.
def get_words(whitespace_chars, string):
    words = []
    current_word = ""
    for ch in string:
        if ch in whitespace_chars:
            # We hit a delimiter: close the current word, if there is one.
            if current_word != "":
                words.append(current_word)
                current_word = ""
        else:
            # Add the current letter to the current word.
            current_word += ch
    # If the last letter isn't whitespace then the last word won't be added, so add it here.
    if current_word != "":
        words.append(current_word)
    return words
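The rule in the question is stated the other way round: everything that is not a letter, a digit or an apostrophe counts as white space. A sketch of that inverted check, again in Python and with the hypothetical names WORD_CHARS and split_on_non_word_chars, could look like this. It assumes only ASCII letters and digits are word characters.

```python
import string

# Characters allowed inside a word: a-z, A-Z, 0-9 and the apostrophe.
WORD_CHARS = set(string.ascii_letters + string.digits + "'")


def split_on_non_word_chars(text):
    """Split text, treating every character outside WORD_CHARS as a delimiter."""
    words = []
    current = []
    for ch in text:
        if ch in WORD_CHARS:
            current.append(ch)
        elif current:
            # A delimiter ends the word collected so far.
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words


print(split_on_non_word_chars("hor.se don't  stop!"))
```

The same loop translates directly to C++: read the file character by character and test each one with a check such as isalnum(c) || c == '\''.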
