Python 3, count letter frequency in file

The line
for char in word:
raises the error "TypeError: 'int' object is not iterable".
How did word become an integer rather than a string, and how do I iterate over it?
# Write a program that reads a file and prints the letters in decreasing
# order of frequency. Your program should convert all the input to lower
# case and only count the letters a-z. Your program should not count
# spaces, digits, punctuation, or anything other than the letters a-z.
import string
from string import digits
fname = input("Enter file:")
if len(fname) < 1 : fname = "romeo-full.txt"
fhand = open(fname)
charcount = dict()
for line in fhand:
    line = line.rstrip()
    line = line.lower()
    line = line.translate(str.maketrans('','',string.punctuation))
    line = str.maketrans("", "", digits)
    for word in line:
        for char in word:
            charcount[char] = charcount.get(char, 0) + 1
print(charcount)
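
On the cause of the error: str.maketrans() returns a translation table (a dict keyed by integer code points), not a string. The statement line = str.maketrans("", "", digits) therefore replaces the line with that dict, iterating over it yields integer keys, and for char in word then fails on an int. Below is a minimal sketch of one possible fix. The function name letter_frequencies and the choice to pass lines as an iterable are assumptions made for illustration, and it filters on a-z directly instead of stripping punctuation and digits separately.

```python
import string
from collections import Counter

def letter_frequencies(lines):
    """Count only the letters a-z (case-insensitive) across an iterable of lines."""
    counts = Counter()
    for line in lines:
        for char in line.lower():
            if char in string.ascii_lowercase:
                counts[char] += 1
    return counts

# The source of the TypeError: the translation table is a dict with int keys
table = str.maketrans("", "", string.digits)
print(type(table).__name__, all(isinstance(key, int) for key in table))

# Hypothetical sample input standing in for the file contents
sample = ["Hello, World! 123", "Hola"]
freq = letter_frequencies(sample)
for letter, n in freq.most_common():  # decreasing order of frequency
    print(letter, n)
```

Because an open file is itself an iterable of lines, the same function should work as letter_frequencies(fhand) with the file handle from the question.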

Related

Summing characters in lines from a text file?

I have a text file like this:
This is just
an example of
a textfile
and would like to find, for every line, the total number of characters in the words that don't contain an "e". This total should be printed once per line.
Currently I have this:
with open(sys.argv[1], "r") as f:
    count = 0
    for line in f:
        words = line.split()
        for word in words:
            if "e" not in word:
                for char in word:
                    count += 1
                print(count)
and the output I get is:
4
6
10
12
14
15
when it should be:
10
4
1

You can use the len builtin to get the length of a string. The reason you're getting larger numbers than you expect is that you're not resetting the count variable for each line, and also you're printing after every word, not each line.
with open(sys.argv[1], "r") as f:
    for line in f:
        count = 0
        words = line.split()
        for word in words:
            if "e" not in word:
                count += len(word)
        print(count)
You can write this more compactly as:
with open(sys.argv[1], "r") as f:
    for line in f:
        print(sum(len(word) for word in line.split() if 'e' not in word))
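
To check the approach against the expected output from the question without needing a file on disk, the same expression can be wrapped in a small function. This is only a sketch: the function name non_e_char_counts is an assumption, and the sample lines are the three lines from the question.

```python
def non_e_char_counts(lines):
    """For each line, total the lengths of the words that contain no 'e'."""
    return [
        sum(len(word) for word in line.split() if "e" not in word)
        for line in lines
    ]

sample = ["This is just", "an example of", "a textfile"]
print(non_e_char_counts(sample))  # [10, 4, 1]
```

Note that the test is case-sensitive, so a word containing only an uppercase "E" would still be counted.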

Palindrome and as well as anagrammatism?

How can we write a program that checks whether one input (for example, EXAMPLE) has the same length and the same letters as another (XAMPLEE)?

Here is an efficient Python implementation:
# Assumes every character has a code point below 256
NO_OF_CHARS = 256
def areAnagram(str1, str2):
    # Create two count arrays and initialize all values as 0
    count1 = [0] * NO_OF_CHARS
    count2 = [0] * NO_OF_CHARS
    # For each character in input strings, increment count
    # in the corresponding count array
    for i in str1:
        count1[ord(i)] += 1
    for i in str2:
        count2[ord(i)] += 1
    # Strings of different lengths cannot be anagrams
    # (for example "aaca" and "aca")
    if len(str1) != len(str2):
        return 0
    # Compare count arrays
    for i in range(NO_OF_CHARS):
        if count1[i] != count2[i]:
            return 0
    return 1
str1 = "EXAMPLE"
str2 = "XAMPLEE"
if areAnagram(str1, str2):
    print("The two strings are anagrams of each other")
else:
    print("The two strings are not anagrams of each other")

import array
# Assumes both words consist only of the uppercase letters A-Z
word1 = "EXAMPLE"
word2 = "XAMPLEE"
letters = array.array('i',(0,)*26)
# Count letters of word1
for c in word1:
    charIndex = ord(c) - ord('A')
    letters[charIndex] = letters[charIndex]+1
# Count letters of word2 and subtract them from the word1 count
for c in word2:
    charIndex = ord(c) - ord('A')
    letters[charIndex] = letters[charIndex]-1
# letters contains only 0 if the words use the same letters
if letters.count(0) < 26:
    print("Words are different")
else:
    print("Words are anagrams")
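
Both answers above count characters by hand. A shorter alternative, offered here as a sketch and not as either answerer's method, is to compare two collections.Counter objects from the standard library. The function name are_anagrams is an assumption; unlike the fixed-size arrays, this works for any characters and needs no separate length check, because differing lengths always produce differing counts.

```python
from collections import Counter

def are_anagrams(first, second):
    """Return True when both strings contain the same characters with the same counts."""
    return Counter(first) == Counter(second)

print(are_anagrams("EXAMPLE", "XAMPLEE"))  # True
print(are_anagrams("aaca", "aca"))         # False
```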

Return value of function gets ignored

VOWELS = ['a', 'e', 'i', 'o', 'u']
BEGINNING = ["th", "st", "qu", "pl", "tr"]
def pig_latin2(word):
    # word is a string to convert to pig-latin
    string = word
    string = string.lower()
    # get first letter in string
    test = string[0]
    if test not in VOWELS:
        # remove first letter from string skip index 0
        string = string[1:] + string[0]
        # add characters to string
        string = string + "ay"
    if test in VOWELS:
        string = string + "hay"
    print(string)
def pig_latin(word):
    string = word
    transfer_word = word
    string.lower()
    test = string[0] + string[1]
    if test not in BEGINNING:
        pig_latin2(transfer_word)
    if test in BEGINNING:
        string = string[2:] + string[0] + string[1] + "ay"
        print(string)
When I un-comment the code below and replace print(string) with return string in the two functions above, it only works for words handled in pig_latin(). As soon as a word should be passed to pig_latin2(), I get a value of None for all such words and the program crashes.
# def start_program():
#     print("Would you like to convert words or sentence into pig latin?")
#     answer = input("(y/n) >>>")
#     print("Only have words with spaces, no punctuation marks!")
#     word_list = ""
#     if answer == "y":
#         words = input("Provide words or sentence here: \n>>>")
#         new_words = words.split()
#         for word in new_words:
#             word = pig_latin(word)
#             word_list = word_list + " " + word
#         print(word_list)
#     elif answer == "n":
#         print("Goodbye")
#         quit()
# start_program()

You're not capturing the return value of the pig_latin2 function, so whatever that function returns is discarded.
Fix this line in the pig_latin function:
if test not in BEGINNING:
    string = pig_latin2(transfer_word) # <----------- forgot 'string =' here
With that fix it works for me, provided the final return string sits at the function's top level so that this branch reaches it. That said, there is still plenty to clean up.
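
As a sketch of what a cleaned-up version might look like (an illustration under assumptions, not the answerer's code): every path through both functions returns a string, so the caller never receives None. The names pig_latin and pig_latin2 and the conversion rules follow the question; convert_sentence is a hypothetical helper added for the example.

```python
VOWELS = "aeiou"
BEGINNING = ("th", "st", "qu", "pl", "tr")

def pig_latin2(word):
    """Handle words that do not start with one of the BEGINNING pairs."""
    word = word.lower()
    if word[0] in VOWELS:
        return word + "hay"
    return word[1:] + word[0] + "ay"

def pig_latin(word):
    """Convert one word; every branch returns a string."""
    word = word.lower()  # keep the result: str.lower() does not modify in place
    if word[:2] in BEGINNING:
        return word[2:] + word[:2] + "ay"
    return pig_latin2(word)  # pass the helper's result back to the caller

def convert_sentence(sentence):
    """Hypothetical helper: convert each whitespace-separated word."""
    return " ".join(pig_latin(word) for word in sentence.split())

print(convert_sentence("The quick apple"))  # ethay ickquay applehay
```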

Caesar Cipher shift on new lines

I am having a problem with my code, which attempts an advanced Caesar cipher shift. I replace a certain letter with another and then add a string to certain parts of each line before encoding, but I am now having problems with the shifting. This is my code:
import string
import sys
count = 1
cont_cipher = "Y"
#User inputs text
while cont_cipher == "Y":
    if count == 1:
        file = str(input("Enter input file:" ""))
        k = str(input("Enter shift amount: "))
        purpose = str(input("Encode (E) or Decode (D) ?: "))
    #Steps to encode the message, replace words/letter then shift
    if purpose == "E":
        my_file = open(file, "r")
        file_contents = my_file.read()
        #change all "e"s to "zw"s
        for letter in file_contents:
            if letter == "e":
                file_contents = file_contents.replace(letter, "zw")
        #add "hokie" to beginning, middle, and end of each line
        lines = file_contents.split('\n')
        def middle_message(lines, position, word_to_insert):
            lines = lines[:position] + word_to_insert + lines[position:]
            return lines
        new_lines = ["hokie" + middle_message(lines[x], len(lines[x])//2, "hokie") + "hokie" for x in range(len(lines))]
        #math to do the actual encryption
        def caesar(word, shifts):
            word_list = list(new_lines)
            result = []
            s = 0
            for c in word_list:
                next_ord = ord(c) + s + 2
                if next_ord > 122:
                    next_ord = 97
                result.append(chr(next_ord))
                s = (s + 1) % shifts
            return "".join(result)
        if __name__ == "__main__":
            print(caesar(my_file, 5))
        #close file and add to count
        my_file.close()
        count = count + 1
The error I am getting is:
TypeError: ord() expected a character, but string of length 58 found
I know that I need to split it into individual characters, but am not sure how to do this. I need to use the updated message and not the original file in this step...
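
The TypeError comes from list(new_lines): new_lines is already a list of whole lines, so each c in the loop is an entire line, while ord() accepts exactly one character. Looping over the lines and then over the characters of each line avoids this. The sketch below illustrates that nesting under stated assumptions: it uses a plain fixed shift over a-z with modulo wraparound instead of the rolling shift in the question, it leaves all other characters unchanged, and the names shift_char and caesar_lines are hypothetical.

```python
def shift_char(char, shift):
    """Shift one lowercase letter within a-z; leave every other character unchanged."""
    if "a" <= char <= "z":
        return chr((ord(char) - ord("a") + shift) % 26 + ord("a"))
    return char

def caesar_lines(lines, shift):
    """Apply the shift to each character of each line, keeping the line structure."""
    return ["".join(shift_char(char, shift) for char in line) for line in lines]

# Hypothetical stand-in for the already modified lines from the question
new_lines = ["hokieabchokiexyzhokie", "hokiehokiehokie"]
for encoded in caesar_lines(new_lines, 5):
    print(encoded)
```

Passing the negated shift, for example caesar_lines(encoded_lines, -5), reverses the encoding in this sketch.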

Print Word & Line Number Where Word Occurs in File Python

I am trying to print the word and line number(s) where the word occurs in the file in Python. Currently I am getting the correct numbers for second word, but the first word I look up does not print the right line numbers. I must iterate through infile, use a dictionary to store the line numbers, remove new line chars, remove any punctuation & skip over blank lines when pulling the number. I need to add a value that is actually a list, so that I may add the line numbers to the list if the word is contained on multiple lines.
Adjusted code:
def index(f,wordf):
    infile = open(filename, 'r')
    dct = {}
    count = 0
    for line in infile:
        count += 1
        newLine = line.replace('\n', ' ')
        if newLine == ' ':
            continue
        for word in wordf:
            if word in split_line:
                if word in dct:
                    dct[word] += 1
                else:
                    dct[word] = 1
    for word in word_list:
        print('{:12} {},'.format(word,dct[word]))
    infile.close()
Current Output:
>>> index('leaves.txt',['cedars','countenance'])
pines [9469, 9835, 10848, 10883],
counter [792, 2092, 2374],
Desired output:
>>> index2('f.txt',['pines','counter','venison'])
pines [530, 9469, 9835, 10848, 10883]
counter [792, 2092, 2374]

There is some ambiguity about how your file is set up, but I think I understand.
Try this:
import numpy as np # add this import
...
for word in word_f:
    if word in split_line:
        np_array = np.array(split_line)
        item_index_list = np.where(np_array == word)
        dct[word] = item_index_list # note, you might want the 'index + 1' instead of the 'index'
for word in word_f:
    print('{:12} {},'.format(word,dct[word]))
...
By the way, as far as I can tell, you're not using your 'increment' variable.
I think that'll work; let me know if it doesn't and I'll fix it.

Per request, here is an additional answer (which I think works) that does not import another library:
def index2(f,word_f):
    infile = open(f, 'r')
    dct = {}
    # deleted line
    for line in infile:
        newLine = line.replace('\n', ' ')
        if newLine == ' ':
            continue
        # deleted line
        newLine2 = removePunctuation(newLine)
        split_line = newLine2.split()
        for word in word_f:
            count = 0 # you might want to start at 1 instead, if you're going for 'word number'
            # important note: you need to have 'word2', not 'word' here, and on the next line
            for word2 in split_line: # changed to looping through data
                if word2 == word:
                    if word2 in dct:
                        temp = dct[word]
                        temp.append(count)
                        dct[word] = temp
                    else:
                        temp = []
                        temp.append(count)
                        dct[word] = temp
                count += 1
    for word in word_f:
        print('{:12} {},'.format(word,dct[word]))
    infile.close()
Be aware that this code does not handle words that are not in the file. I'm not sure about the file you're reading from, but dct[word] in the final loop raises a KeyError for any word that never appears.

Note: I took this code from my other post to see if it works, and it seems that it does.
def index2():
    word_list = ["work", "many", "lots", "words"]
    infile = ["lots of words","many many work words","how come this picture lots work","poem poem more words that rhyme"]
    dct = {}
    # deleted line
    for line in infile:
        newLine = line.replace('\n', ' ') # shouldn't do anything, because I have no newlines
        if newLine == ' ':
            continue
        # deleted line
        newLine2 = newLine # ignoring punctuation
        split_line = newLine2.split()
        for word in word_list:
            count = 0 # you might want to start at 1 instead, if you're going for 'word number'
            # important note: you need to have 'word2', not 'word' here, and on the next line
            for word2 in split_line: # changed to looping through data
                if word2 == word:
                    if word2 in dct:
                        temp = dct[word]
                        temp.append(count)
                        dct[word] = temp
                    else:
                        temp = []
                        temp.append(count)
                        dct[word] = temp
                count += 1
    for word in word_list:
        print('{:12} {}'.format(word, ", ".join(map(str, dct[word])))) # edited output so it's comma separated list without a trailing comma
def main():
    index2()
if __name__ == "__main__":
    main()
and the output:
work         2, 5
many         0, 1
lots         0, 4
words        2, 3, 3
and the explanation:
infile = [
    "lots of words", # lots at index 0, words at index 2
    "many many work words", # many at index 0, many at index 1, work at index 2, words at index 3
    "how come this picture lots work", # lots at index 4, work at index 5
    "poem poem more words that rhyme" # words at index 3
]
When they get appended in that order, each word ends up with its correct positions within the lines.

My biggest error was that I was not adding the line number to the dictionary entry. I was incrementing a per-word counter instead of recording the line on which the word was found. The correct statement is dct[word] += [count], not dct[word] += 1.
def index(filename,word_list):
    infile = open(filename, 'r')
    dct = {}
    count = 0
    for line in infile:
        count += 1
        newLine = line.replace('\n', ' ')
        if newLine == ' ':
            continue
        newLine2 = removePunctuation(newLine)
        split_line = newLine2.split()
        for word in word_list:
            if word in split_line:
                if word in dct:
                    dct[word] += [count]
                else:
                    dct[word] = [count]
    for word in word_list:
        print('{:12} {}'.format(word,dct[word]))
    infile.close()
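
The same idea can be sketched in a self-contained form that also copes with words that never occur. This is an illustration under assumptions, not the asker's final code: the function name index_lines is hypothetical, punctuation is stripped with str.translate in place of the undefined removePunctuation helper, and the lines are passed in as an iterable so the example runs without a file.

```python
import string
from collections import defaultdict

def index_lines(lines, word_list):
    """Map each requested word to the 1-based line numbers on which it appears."""
    strip_punct = str.maketrans("", "", string.punctuation)
    found = defaultdict(list)
    for line_number, line in enumerate(lines, start=1):
        # Blank lines still advance the line number but contain no words
        words = set(line.translate(strip_punct).split())
        for word in word_list:
            if word in words:
                found[word].append(line_number)
    # Words that never occur map to an empty list instead of raising KeyError
    return {word: found[word] for word in word_list}

# Hypothetical sample lines standing in for the file
sample = ["Tall pines, tall cedars.", "", "A counter of pines!"]
result = index_lines(sample, ["pines", "counter", "venison"])
for word, numbers in result.items():
    print('{:12} {}'.format(word, numbers))
```

Since an open file is an iterable of lines, something like index_lines(open(filename), word_list) should behave the same way on real input; matching is case-sensitive in this sketch.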
