Summing characters in lines from a text file?

I have a text file like this:
This is just
an example of
a textfile
and would like to find, for each line, the total number of characters in the words that don't contain an "e". This total should be printed once per line.
Currently I have this:
with open(sys.argv[1], "r") as f:
    count = 0
    for line in f:
        words = line.split()
        for word in words:
            if "e" not in word:
                for char in word:
                    count += 1
                print(count)
and the output I get is:
4
6
10
12
14
15
when it should be:
10
4
1

You can use the len builtin to get the length of a string. The reason you're getting larger numbers than you expect is that you're not resetting the count variable for each line, and also you're printing after every word, not each line.
import sys

with open(sys.argv[1], "r") as f:
    for line in f:
        count = 0  # reset the total for each line
        words = line.split()
        for word in words:
            if "e" not in word:
                count += len(word)
        print(count)  # print once per line, not once per word
You can write this more compactly as
with open(sys.argv[1], "r") as f:
    for line in f:
        print(sum(len(word) for word in line.split() if 'e' not in word))
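To check the per-line logic without a file or command-line argument, the same expression can be wrapped in a small function and run on the three sample lines from the question. This is only a sketch; the function name chars_without_e is made up for illustration.

```python
def chars_without_e(line):
    """Total length of the words in line that do not contain an 'e'."""
    return sum(len(word) for word in line.split() if "e" not in word)


# The three lines of the example text file from the question.
lines = ["This is just\n", "an example of\n", "a textfile\n"]
totals = [chars_without_e(line) for line in lines]
for total in totals:
    print(total)  # prints 10, 4 and 1 on separate lines
```

Keeping the counting in a function means the total starts from zero on every call, which is the reset the original loop was missing.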

Related

How does tell() in python file handling work

f = open("test.txt","w")
s= "This\nThis\nThis"
f.write(s)
f.close()
f= open("test.txt","r")
w=''
for i in f:
    for j in i:
        w = w+j
print(w)
print("Number of Characters",len(w))
print("Current Position of handler",f.tell())
f.close()
The output of the above is
This
This
This
Number of Characters 14
Current Position of handler 16
The file contains 12 letters and 2 newline characters, so the number of characters is 14; that part I understand. What I do not understand is why tell() returns 16.

Just an assumption.
I think that, in your case, each '\n' takes up two positions in the file. The runs below use a Windows-style path (.\test.py), and on Windows a file opened in text mode writes every '\n' as the two-byte sequence '\r\n'. When the file is read back, each pair is translated to a single '\n' again, so len(w) counts one character per newline while tell() reflects the two bytes on disk. This does not happen for other escape characters such as '\t'.
I ran some examples on my system. You can notice the results one by one.
Example 1
f = open("test.txt","w")
s= "\t"
f.write(s)
f.close()
f= open("test.txt","r")
w=''
for i in f:
    for j in i:
        w = w+j
print(w)
print("Number of Characters",len(w))
print("Current Position of handler",f.tell())
f.close()
Output
>>>python .\test.py
Number of Characters 1
Current Position of handler 1
Example 2
f = open("test.txt","w")
s= "\n"
f.write(s)
f.close()
f= open("test.txt","r")
w=''
for i in f:
    for j in i:
        w = w+j
print(w)
print("Number of Characters",len(w))
print("Current Position of handler",f.tell())
f.close()
Output
>>>python .\test.py
Number of Characters 1
Current Position of handler 2
Example 3
f = open("test.txt","w")
s= "ThisThisThis"
f.write(s)
f.close()
f= open("test.txt","r")
w=''
for i in f:
    for j in i:
        w = w+j
print(w)
print("Number of Characters",len(w))
print("Current Position of handler",f.tell())
f.close()
Output
>>>python .\test.py
ThisThisThis
Number of Characters 12
Current Position of handler 12
Example 4
f = open("test.txt","w")
s= "ThisThisThis\n"
f.write(s)
f.close()
f= open("test.txt","r")
w=''
for i in f:
    for j in i:
        w = w+j
print(w)
print("Number of Characters",len(w))
print("Current Position of handler",f.tell())
f.close()
Output
>>>python .\test.py
ThisThisThis
Number of Characters 13
Current Position of handler 14
Example 5
f = open("test.txt","w")
s= "ThisThisThis\t"
f.write(s)
f.close()
f= open("test.txt","r")
w=''
for i in f:
    for j in i:
        w = w+j
print(w)
print("Number of Characters",len(w))
print("Current Position of handler",f.tell())
f.close()
Output
>>>python .\test.py
ThisThisThis
Number of Characters 13
Current Position of handler 13
In your case the string contains \n twice, so count 2 instead of 1 for every \n when predicting the result of tell(): 4 + 2 + 4 + 2 + 4 = 16.
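The two-positions-per-newline effect can be reproduced on any platform by asking open() for Windows-style line endings explicitly. The sketch below assumes the extra count comes from newline translation; the helper name written_size is made up, and it writes to a temporary file rather than test.txt.

```python
import os
import tempfile


def written_size(text, newline):
    """Write text in text mode with the given newline setting; return the file size in bytes."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "w", newline=newline) as f:
            f.write(text)
        return os.path.getsize(path)
    finally:
        os.remove(path)


s = "This\nThis\nThis"
size_crlf = written_size(s, "\r\n")  # each '\n' is written as '\r\n'
size_lf = written_size(s, "\n")      # each '\n' is written unchanged
print(len(s), size_crlf, size_lf)    # 14 16 14
```

With "\r\n" line endings the file is two bytes longer than the string, which matches the 16 reported by tell() in the question.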

could someone instruct me to understand this code

def count_char(text, char):
    count = 0
    for c in text:
        if c == char:
            count += 1
    return count
filename = input("Enter a filename: ")
with open(filename) as f:
    text = f.read()
for char in "abcdefghijklmnopqrstuvwxyz":
    perc = 100 * count_char(text, char) / len(text)
    print("{0} - {1}%".format(char, round(perc, 2)))

It's a script that counts the relative occurrence of letters abcdefghijklmnopqrstuvwxyz in the given text file.
The first block defines a function that counts how many times the character char is present in the text:
def count_char(text, char):
    count = 0
    for c in text:
        if c == char:
            count += 1
    return count
The second block asks you to input the name of the file:
filename = input("Enter a filename: ")
and saves the contents of that file as a string in the variable text:
with open(filename) as f:
    text = f.read()
The third block displays the relative occurrence of each of the characters a to z in text.
For each character, it divides the number of occurrences count_char(text, char) by the total length of the text len(text) and multiplies the result by 100 to convert it to a percentage:
perc = 100 * count_char(text, char) / len(text)
and displays the results as a formatted string. The numbers in curly brackets are replaced by the character char and the percentage of its occurrence, rounded to two decimals round(perc, 2):
print("{0} - {1}%".format(char, round(perc, 2)))
You can read more about string formatting in the Python documentation for str.format.
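For comparison, the same percentages can be computed in a single pass with collections.Counter instead of rescanning the text once per letter. This is a sketch of an alternative, not the script from the question; the function name letter_percentages is made up, and the empty-text guard is an addition to avoid dividing by zero.

```python
import string
from collections import Counter


def letter_percentages(text):
    """Map each letter a-z to its share of all characters in text, in percent."""
    if not text:
        return {}  # avoid dividing by zero for an empty file
    counts = Counter(text)  # one pass over the text
    return {
        char: round(100 * counts[char] / len(text), 2)
        for char in string.ascii_lowercase
    }


result = letter_percentages("abca")
for char in "abc":
    print("{0} - {1}%".format(char, result[char]))
```

As in the original script, the percentage is taken over every character in the text, including spaces and newlines, and uppercase letters are not counted.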

Python 3, count letter frequency in file

The line "for char in word:" raises the error "TypeError: 'int' object is not iterable".
How did it become an integer and not a string?
How do I iterate it?
# Write a program that reads a file and prints the letters in decreasing
# order of frequency. Your program should convert all the input to lower
# case and only count the letters a-z. Your program should not count
# spaces, digits, punctuation, or anything other than the letters a-z.
import string
from string import digits
fname = input("Enter file:")
if len(fname) < 1 : fname = "romeo-full.txt"
fhand = open(fname)
charcount = dict()
for line in fhand:
    line = line.rstrip()
    line = line.lower()
    line = line.translate(str.maketrans('','',string.punctuation))
    line = str.maketrans("", "", digits)
    for word in line:
        for char in word:
            charcount[char] = charcount.get(char, 0) + 1
print(charcount)
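One likely explanation, offered as a sketch rather than a confirmed answer: str.maketrans returns a translation table (a dict keyed by integer code points), so the assignment line = str.maketrans("", "", digits) replaces the string with that dict, and iterating over it yields integers. The version below builds one table for punctuation and digits and applies it with translate. The function name letter_frequencies and the sample lines are made up for illustration, and it takes any iterable of lines (an open file works too).

```python
import string
from collections import Counter


def letter_frequencies(lines):
    """Count the letters a-z in lines, returned in decreasing order of frequency."""
    # Build the table once; translate() applies it, maketrans() only creates it.
    table = str.maketrans("", "", string.punctuation + string.digits)
    counts = Counter()
    for line in lines:
        cleaned = line.lower().translate(table)
        # Keep only a-z, which also drops spaces and newlines.
        counts.update(ch for ch in cleaned if ch in string.ascii_lowercase)
    return counts.most_common()


freq = letter_frequencies(["Hello, World 42!\n", "Hello\n"])
for letter, count in freq:
    print(letter, count)
```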

Counting the frequency distribution of letters in a text file

I'm writing a program that will count how many of each letter there are.
Currently, it's working but it counts upper and lower case letters separately. I tried to convert all of the characters to upper case but it didn't work.
myFile = open('textFile.txt', 'r+')
with open('textFile.txt', 'r') as fileinput:
    for line in fileinput:
        line = line.upper()
d = {}
for i in myFile.read():
    d[i] = d.get(i,0) + 1
for k,v in sorted(d.items()):
    print("{}: {}".format(k,v))
If my text file consists of:
abc
ABC
it will print:
(space) : 1
A: 1
B: 1
C: 1
a: 1
b: 1
c: 1
I would like it to print:
A: 2
B: 2
C: 2

The result of line = line.upper() is not used anywhere. Move the counting code into the loop that performs the uppercase conversion, so that the characters of each uppercased line are counted.

You convert each line to upper case, but then count the characters from a separate, unconverted read of the file (myFile.read()), so the converted text is never used. Count the characters of the uppercased line instead, something like this:
d = {}  # create the dictionary once, before the loop
with open('textFile.txt', 'r') as fileinput:
    for line in fileinput:
        line = line.upper()
        # change is here: count the uppercased line
        for i in line:
            d[i] = d.get(i,0) + 1
for k,v in sorted(d.items()):
    print("{}: {}".format(k,v))

In Python, indentation is critical. You are converting the input to uppercase, but then throwing it away.
Try rearranging it like this:
d = {}
#myFile = open('textFile.txt', 'r+') - removed as not needed due to "with" variant of file processing below.
with open('textFile.txt', 'r') as fileinput:
    for line in fileinput:
        line = line.upper()
        for i in line:
            d[i] = d.get(i,0) + 1
for k,v in sorted(d.items()):
    print("{}: {}".format(k,v))

This will do it.
chars = []
with open('textFile.txt', 'r') as fileinput:
    for line in fileinput:
        for c in line:
            chars.append(c.upper())
d = {}
for i in chars:
    d[i] = d.get(i, 0) + 1
for k,v in sorted(d.items()):
    print("{}: {}".format(k,v))
Or this:
d = {}
with open('textFile.txt', 'r') as fileinput:
    for line in fileinput:
        line = line.upper()
        for i in line:
            d[i] = d.get(i,0) + 1
for k,v in sorted(d.items()):
    print("{}: {}".format(k,v))
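The versions above still count newlines and spaces, which the desired output does not show. A possible variation, sketched here with collections.Counter and str.isalpha, keeps only letters. The function name count_letters is made up, and the two sample lines stand in for the contents of textFile.txt.

```python
from collections import Counter


def count_letters(lines):
    """Count letters case-insensitively, ignoring whitespace, digits and punctuation."""
    counts = Counter()
    for line in lines:
        # Uppercase first, then keep alphabetic characters only.
        counts.update(ch for ch in line.upper() if ch.isalpha())
    return counts


counts = count_letters(["abc\n", "ABC\n"])
for letter, n in sorted(counts.items()):
    print("{}: {}".format(letter, n))  # A: 2, B: 2, C: 2
```

Passing an open file object instead of the list works the same way, since both yield one line at a time.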

Print Word & Line Number Where Word Occurs in File Python

I am trying to print each word and the line number(s) where it occurs in a file in Python. Currently I get the correct line numbers for the second word, but the first word I look up does not print the right line numbers. I must iterate through infile, use a dictionary to store the line numbers, remove newline characters, remove any punctuation, and skip over blank lines when collecting the numbers. Each dictionary value needs to be a list, so that I can append line numbers to it when the word appears on multiple lines.
Adjusted code:
def index(f,wordf):
    infile = open(filename, 'r')
    dct = {}
    count = 0
    for line in infile:
        count += 1
        newLine = line.replace('\n', ' ')
        if newLine == ' ':
            continue
        for word in wordf:
            if word in split_line:
                if word in dct:
                    dct[word] += 1
                else:
                    dct[word] = 1
    for word in word_list:
        print('{:12} {},'.format(word,dct[word]))
    infile.close()
Current Output:
>>> index('leaves.txt',['cedars','countenance'])
pines [9469, 9835, 10848, 10883],
counter [792, 2092, 2374],
Desired output:
>>> index2('f.txt',['pines','counter','venison'])
pines [530, 9469, 9835, 10848, 10883]
counter [792, 2092, 2374]

There is some ambiguity about how your file is set up, but I think I understand.
Try this:
import numpy as np # add this import
...
        for word in word_f:
            if word in split_line:
                np_array = np.array(split_line)
                item_index_list = np.where(np_array == word)
                dct[word] = item_index_list # note, you might want the 'index + 1' instead of the 'index'
    for word in word_f:
        print('{:12} {},'.format(word,dct[word]))
...
btw, as far as I can tell, you're not using your 'increment' variable.
I think that'll work, let me know if it doesn't and I'll fix it

Per request, here is an additional answer (that I think works) without importing another library:
def index2(f,word_f):
    infile = open(f, 'r')
    dct = {}
    # deleted line
    for line in infile:
        newLine = line.replace('\n', ' ')
        if newLine == ' ':
            continue
        # deleted line
        newLine2 = removePunctuation(newLine)
        split_line = newLine2.split()
        for word in word_f:
            count = 0 # you might want to start at 1 instead, if you're going for 'word number'
            # important note: you need to have 'word2', not 'word' here, and on the next line
            for word2 in split_line: # changed to looping through data
                if word2 == word:
                    if word2 in dct:
                        temp = dct[word]
                        temp.append(count)
                        dct[word] = temp
                    else:
                        temp = []
                        temp.append(count)
                        dct[word] = temp
                count += 1
    for word in word_f:
        print('{:12} {},'.format(word,dct[word]))
    infile.close()
Be aware that this code does not handle words that are not in the file. If you pass in a word that never occurs, the final loop will raise a KeyError when it looks up dct[word].

Note: I took this code from my other post to see if it works, and it seems that it does
def index2():
    word_list = ["work", "many", "lots", "words"]
    infile = ["lots of words","many many work words","how come this picture lots work","poem poem more words that rhyme"]
    dct = {}
    # deleted line
    for line in infile:
        newLine = line.replace('\n', ' ') # shouldn't do anything, because I have no newlines
        if newLine == ' ':
            continue
        # deleted line
        newLine2 = newLine # ignoring punctuation
        split_line = newLine2.split()
        for word in word_list:
            count = 0 # you might want to start at 1 instead, if you're going for 'word number'
            # important note: you need to have 'word2', not 'word' here, and on the next line
            for word2 in split_line: # changed to looping through data
                if word2 == word:
                    if word2 in dct:
                        temp = dct[word]
                        temp.append(count)
                        dct[word] = temp
                    else:
                        temp = []
                        temp.append(count)
                        dct[word] = temp
                count += 1
    for word in word_list:
        print('{:12} {}'.format(word, ", ".join(map(str, dct[word])))) # edited output so it's comma separated list without a trailing comma
def main():
    index2()
if __name__ == "__main__":main()
and the output:
work 2, 5
many 0, 1
lots 0, 4
words 2, 3, 3
and the explanation:
infile = [
    "lots of words", # lots at index 0, words at index 2
    "many many work words", # many at index 0, many at index 1, work at index 2, words at index 3
    "how come this picture lots work", # lots at index 4, work at index 5
    "poem poem more words that rhyme" # words at index 3
]
Because the positions are appended in that order, each word ends up with the correct list of word positions.

My biggest error was that I was not properly adding the line number to the counter. I completely used the wrong call, and did nothing to increment the line number as the word was found in the file. The proper format was dct[word] += [count] not dct[word] += 1
def index(filename,word_list):
    infile = open(filename, 'r')
    dct = {}
    count = 0
    for line in infile:
        count += 1
        newLine = line.replace('\n', ' ')
        if newLine == ' ':
            continue
        newLine2 = removePunctuation(newLine)
        split_line = newLine2.split()
        for word in word_list:
            if word in split_line:
                if word in dct:
                    dct[word] += [count]
                else:
                    dct[word] = [count]
    for word in word_list:
        print('{:12} {}'.format(word,dct[word]))
    infile.close()
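The function above depends on a removePunctuation helper that is not shown, and it raises a KeyError for any word that never occurs. The sketch below is one self-contained way to cover both points. The names index_lines and remove_punctuation are hypothetical stand-ins (the real helper may behave differently), and it takes an iterable of lines so it can be tried without a file.

```python
import string
from collections import defaultdict


def remove_punctuation(text):
    # Hypothetical stand-in for the removePunctuation helper used above.
    return text.translate(str.maketrans("", "", string.punctuation))


def index_lines(lines, word_list):
    """Map each word in word_list to the list of line numbers where it occurs."""
    line_numbers = defaultdict(list)
    # enumerate counts every line, so blank lines still advance the number.
    for number, line in enumerate(lines, start=1):
        words = remove_punctuation(line).split()
        if not words:
            continue  # skip blank lines
        for word in word_list:
            if word in words:
                line_numbers[word].append(number)
    # Words that never occur get an empty list instead of a KeyError.
    return {word: line_numbers.get(word, []) for word in word_list}


sample = ["tall pines here\n", "\n", "The counter, and pines.\n"]
result = index_lines(sample, ["pines", "counter", "venison"])
for word, numbers in result.items():
    print("{:12} {}".format(word, numbers))
```

With a real file, passing the open file object as lines (inside a with block) gives the same behaviour as iterating over infile.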
