How do I find character frequency from a text file through iteration? (python3) - python-3.x

I'm trying to find a way to iterate through a text file and a list to find character frequency. I understand that I could use Count() for this, but Count() counts everything, including spaces, periods and so on. It also does not show the character frequency in alphabetical order. I found a way to do it, and it works, but not really; I'll explain below. Also, when I try to store the frequency I get a KeyError, which I'll explain as well.
I don't want to put my whole project on here, so I'll explain a few things first. I have a separate list called alphabet_list which contains the alphabet. The text file has already been read and converted to uppercase; the result is called new_text.
Character frequency code:
for i in range(len(alphabet_list)):
    for c in new_text:
        if c == alphabet_list[i]:
            count += 1
        else:
            count = 0
        print(alphabet_list[i] + " " + str(count))
    i += 1
Output
A 0
A 0
.
.
.
A 1
A 0
.
.
.
B 0
.
.
.
B 1
B 2
B 0
.
.
.
Z 0
P.S. The str(count) is only there temporarily because I wanted to see what the printout looks like; I need to store the result in a dictionary.
That is my output. Like I said, it works but not really. It iterates, but it prints a result for every character it visits instead of going through the whole text file and printing only the final result. The count only goes up when the same letter appears twice in a row. For example, with (... bb ...) it prints B 1, B 2, as shown in my output. And for some reason, when I use return it doesn't work: it returns nothing and just ends the program.
Second code, with KeyError:
I skipped the problem above because I couldn't find the answer and didn't want to waste my time, but then I ran into another problem.
for i in range(len(alphabet_list)):
    for c in new_text:
        if c == alphabet_list[i]:
            count += 1
        else:
            count = 0
        c_freq[alphabet_list[i]] == count
        print(c_freq)
    i += 1
This one was pretty simple I got a KeyError: 'A'.
I tried only doing the
i = 3 #just random number to test
count = 50
c_freq[alphabet_list[i]] == count
print(c_freq)
and it works, so I'm thinking that problem is also related to the problem above(? maybe). Anyways any help would be great. Thanks!
Sorry for long question but I really needed help.

This should help you:
# Initial list. Note: the list also includes characters such as commas and full stops.
lst = ['A', 'Z', 'H', 'A', 'B', 'N', 'H', 'Y', '.', ',', 'Z']
alpha_dict = {}
for ch in lst:
    if ch.isalpha():  # Checks if the character is a letter
        if ch in alpha_dict:
            alpha_dict[ch] += 1  # If the key already exists, its value is incremented by 1
        else:
            alpha_dict[ch] = 1  # If the key does not exist, it is created with value 1
print(alpha_dict)
Output:
{'A': 2, 'Z': 2, 'H': 2, 'B': 1, 'N': 1, 'Y': 1}
Since you want the output to be sorted in alphabetical order, add these lines to your code:
key_list = list(alpha_dict.keys())  # Creates a list of all the keys in the dict
key_list.sort()  # Sorts the list in alphabetical order
final_dict = {}
for key in key_list:
    final_dict[key] = alpha_dict[key]
print(final_dict)
Output:
{'A': 2, 'B': 1, 'H': 2, 'N': 1, 'Y': 1, 'Z': 2}
Thus, here is the final code:
lst = ['A', 'Z', 'H', 'A', 'B', 'N', 'H', 'Y', '.', ',', 'Z']
alpha_dict = {}
for ch in lst:
    if ch.isalpha():
        if ch in alpha_dict.keys():
            alpha_dict[ch] += 1
        else:
            alpha_dict[ch] = 1
key_list = list(alpha_dict.keys())
key_list.sort()
final_dict = {}
for key in key_list:
    final_dict[key] = alpha_dict[key]
print(final_dict)
Output:
{'A': 2, 'B': 1, 'H': 2, 'N': 1, 'Y': 1, 'Z': 2}
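As for the two loops in the question, here is a rough sketch of how they could be repaired. It assumes alphabet_list holds the uppercase letters and c_freq starts out as an empty dict; the sample text is only a stand-in for the file contents. Two things change: count is reset once per letter rather than on every non-matching character, and the result is stored with = (the == in the question is a comparison, so it tries to read a key that does not exist yet, which is what raises the KeyError).

```python
import string

alphabet_list = list(string.ascii_uppercase)
new_text = "Hello, World. Hello again!".upper()  # stand-in for the file contents

c_freq = {}
for letter in alphabet_list:
    count = 0  # reset once per letter, not once per character
    for c in new_text:
        if c == letter:
            count += 1
    c_freq[letter] = count  # "=" assigns; "==" would only compare

# Print the final result once, after all counting is done.
for letter in alphabet_list:
    print(letter, c_freq[letter])
```

Because the outer loop walks alphabet_list in order, the dictionary ends up in alphabetical order and also contains the letters that never occur, with a count of 0. If zero counts are not needed, collections.Counter from the standard library does the same counting in one step.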

Related

Calculating average of specific dict key values Python 3.7.4

I am trying to calculate the overall average for my grades dict. Subject level and subject code does not matter.These are my dicts and lists.
grades = {'INFO100' : 'C','INFO102' : 'B', \
          'INFO125' : 'B','INFO132' : 'A', \
          'INFO180' : '' ,'INFO216' : 'A', \
          'INFO282' : 'C','INFO284' : '' , \
          'ECON100' : 'C','ECON110' : 'C', \
          'ECON218' : '' , 'GEO100' : '' , \
          'GEO113' : 'D', 'GEO124' : 'D',}
subjects = ['INFO100','INFO102','INFO125',\
            'INFO132','INFO180','INFO216',\
            'INFO282','INFO284','ECON100',\
            'ECON110','ECON218','GEO100' ,\
            'GEO113' ,'GEO124']
subject_code = {'Informatics' : 'INFO',\
                'Economy' : 'ECON',\
                'Geografi' : 'GEO'}
This is what I have. I'm not sure where and how to iterate from here on.
convert_grade = {"A": 5, "B": 4, "C": 3, "D": 2, "E": 1, "F": 0}
convert_grade.update({v: k for k,v in convert_grade.items()})
grade_sum = sum(['convert_grade['something'] for 'something' in 'something if 'something' is not None])
average_grade = round(grade_sum / len(['something' for 'something' in 'something' if 'something' is not None]))
print("Average grade:", convert_grade[average_grade])
I got help constructing the calculation (I think, at least; it might be too fancy a way), but I don't really know how to iterate from there. I am fairly new to Python, so I am still learning how to iterate properly. Any help or guidance is more than welcome. I do not need "my" way of calculating to work, I just need a push in the right direction.
grades = {'INFO100' : 'C','INFO102' : 'B',
          'INFO125' : 'B','INFO132' : 'A',
          'INFO180' : '' ,'INFO216' : 'A',
          'INFO282' : 'C','INFO284' : '' ,
          'ECON100' : 'C','ECON110' : 'C',
          'ECON218' : '' , 'GEO100' : '' ,
          'GEO113' : 'D', 'GEO124' : 'D',}
total = 0
count = 0
for k, v in grades.items():
    if(v == "A"):
        total += 5
    elif(v == "B"):
        total += 4
    elif(v == "C"):
        total += 3
    elif(v == "D"):
        total += 2
    elif(v == "E"):
        total += 1
    else:
        total += 0
    count += 1
print(total / count)
Yields: 2.4285714285714284
Which is the average of those grades (assuming that blanks count, if you don't want them to count, we can filter them out)
I don't understand why you are adding the reverse relation to the dict with that convert_grade.update line. If you're going to need that (you don't, at least for this specific task), you should have two dictionaries. One for the letter grade -> score mapping and one for score -> letter grade. You should also rename it from convert_grade to grade_to_score or something more descriptive.
First, if '' means that you haven't taken the class yet and it shouldn't count toward your average, you have to filter those entries out. You were on the right track, but don't check for None; check for the empty string instead. You can do that by treating the string as a boolean, since an empty string evaluates to false:
grades = {course: grade for course, grade in grades.items() if grade}
Then you need to convert the letter grades to a list of numbers using a dictionary, and then you can find the average of that list using either sum(x) / len(x) or statistics.mean:
from statistics import mean
grade_to_score = {"A": 5, "B": 4, "C": 3, "D": 2, "E": 1, "F": 0}
average_grade = mean(grade_to_score[grade] for grade in grades.values())
print("Average grade:", average_grade)
If a '' means that you got an F, you can skip the first step and just add an extra '': 0 entry to your definition of grade_to_score.
If you want to get averages by course, just filter the dictionary right after the first step:
info_grades = {course: grade for course, grade in grades.items() if course.startswith('INFO')}
and then you can pass info_grades.values() instead of grades.values() to statistics.mean() to get the average of all your Informatics courses. If you want to do every one, have a for loop over ['INFO', 'ECON', 'GEO'].
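The per-subject loop mentioned above could look roughly like this. It is only a sketch: it assumes blank grades are skipped rather than counted as F, it reuses the subject_code dict from the question for the prefixes, and the averages dict is a name introduced here for illustration.

```python
from statistics import mean

grades = {'INFO100': 'C', 'INFO102': 'B', 'INFO125': 'B', 'INFO132': 'A',
          'INFO180': '', 'INFO216': 'A', 'INFO282': 'C', 'INFO284': '',
          'ECON100': 'C', 'ECON110': 'C', 'ECON218': '', 'GEO100': '',
          'GEO113': 'D', 'GEO124': 'D'}
grade_to_score = {"A": 5, "B": 4, "C": 3, "D": 2, "E": 1, "F": 0}
subject_code = {'Informatics': 'INFO', 'Economy': 'ECON', 'Geografi': 'GEO'}

averages = {}  # hypothetical name: subject -> average score
for subject, prefix in subject_code.items():
    # Keep only this subject's courses, and skip blank (not yet taken) grades.
    scores = [grade_to_score[grade]
              for course, grade in grades.items()
              if course.startswith(prefix) and grade]
    if scores:  # mean() raises StatisticsError on an empty sequence
        averages[subject] = mean(scores)

for subject, average in averages.items():
    print(subject, average)
```

With the data from the question this gives an average of 4 for Informatics, 3 for Economy and 2 for Geografi.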

Printing a list where values are in a specific distance from each other using Python

I have a list,
A = ['A','B','C','D','E','F','G','H']
If the user inputs x = 4, I need an output that shows every pair of values that are 4 positions apart.
Starting from 'A', after printing the values that are 4 positions apart, i.e. {'A', 'E'}, the code should go back and start from 'B' to print the values from there, i.e. {'B', 'F'}, and so on.
No value can be in more than one group.
Any help is appreciated since I am very new to Python.
This is what I have done:
x = input("enter the number to divide with: ")
A = ['A','B','C','D','E','F','G','H']
print("Team A is divided by " +x+ " groups")
print("---------------------")
out = [A[i] for i in range(0, len(A), int(x))]
print(out)
My code is printing only the following when user input x =4
{'A', 'E'}
But I need it to look like the following
{'A', 'E'}
{'B', 'F'}
{'C', 'G'}
{'D', 'H'}
what am I doing wrong?
Use zip:
out = list(zip(A, A[x:]))
For example:
x = 4 # int(input("enter the number to divide with: "))
A = ['A','B','C','D','E','F','G','H']
print(f"Team A is divided by {x} groups")
print("---------------------")
out = list(zip(A, A[x:]))
print(out)
Outputs:
[('A', 'E'), ('B', 'F'), ('C', 'G'), ('D', 'H')]
If you want to keep the comprehension:
out = [(A[i], A[i+x]) for i in range(0, len(A)-x)]
You can find my answer below.
def goutham(alist):
    for passchar in range(0,len(alist)-4):
        i = alist[passchar]
        j = alist[passchar+4]
        print("{"+i+","+j+"}")
        j = 0
alist = ['a','b','c','d','e','f','g','h']
goutham(alist)
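One thing to keep in mind: pairing A[i] with A[i+x] works for this 8-element list, but with a longer list the same element would show up in two pairs (for example 'E' in both ('A', 'E') and ('E', 'I')). If the requirement that no value is in more than one group matters, a possible alternative is slicing with a step. This is a sketch under that assumption, not what the answers above do:

```python
A = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H']
x = 4  # in the question this comes from int(input(...))

# A[start::x] takes every x-th element beginning at position start,
# so each element lands in exactly one group.
groups = [A[start::x] for start in range(x)]

for group in groups:
    print(group)
```

For the list in the question this prints ['A', 'E'], ['B', 'F'], ['C', 'G'] and ['D', 'H'], one group per line.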

Getting index of duplicate letters in a string

from graphics import *
import random

def hangman(word):
    returnStuff = {'again':0, '1st':1}
    alphabet = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o',
                'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
    win = GraphWin("Hangman", 800, 550)
    win.setBackground("yellow")
    titleText = Text(Point(400,50), 'HANGMAN')
    titleText.setSize(24)
    titleText.setStyle('bold')
    titleText.draw(win)
    #Building the hangman base
    base = Line(Point(120,350),Point(230,350))
    base.draw(win)
    stand = Line(Point(175,350),Point(175,150))
    stand.draw(win)
    stand2 = Line(Point(175,150),Point(250,150))
    stand2.draw(win)
    stand3 = Line(Point(250,150),Point(250,180))
    stand3.draw(win)
    #drawing the empty lines for the word
    x1 = 150
    x2 = 180
    l = 0
    print(word)
    while l< len(word):
        wordLine = Line(Point(x1, 420),Point(x2,420))
        wordLine.draw(win)
        l+=1
        x1+=40
        x2+=40
    guessCounter = 0
    textCheck = 0
    invalidText = Text(Point(600,100), 'You did not enter a valid letter.')
    invalidText.setTextColor('red')
    indexes = []
    while guessCounter < 6:
        #text entry box
        textEntry = Entry(Point(600,180),10)
        textEntry.draw(win)
        guessText = Text(Point(600,150), 'Guess a letter:')
        guessText.draw(win)
        #user has to click this box to confirm the letter
        enterBox = Rectangle(Point(580,200), Point(620,220))
        enterBox.setFill('white')
        enterBox.draw(win)
        clickText = Text(Point(600,210), 'Enter')
        clickText.draw(win)
        click = win.getMouse()
        x = click.getX()
        y = click.getY()
        if 580 < x < 620 and 200 < y < 220:
            guess = textEntry.getText().lower().strip()
            if guess not in alphabet:
                if textCheck == 0:
                    invalidText.draw(win)
                    textCheck = 1
            else:
                if textCheck == 1:
                    invalidText.undraw()
                    textCheck = 0
                for letter in word:
                    if letter == guess:
                        indexes.append(word.index(guess))
                print(indexes)
    win.getMouse()
    win.close()
    return returnStuff
#list with various words pertaining to nanotechnology
words = ['nanotechnology', 'science', 'nanometre' , 'strength', 'chemistry',
         'small', 'molecule', 'light' , 'weight', 'technology', 'materials',
         'property', 'physics', 'engineering', 'matter', 'waterloo', 'nanobot',
         'reaction', 'structure', 'cells']
#picks a random word from the list
word = random.choice(words)
#this variable ensures it opens the game the first time
initialCall = 1
#stores the returnValue for the first call
returnValue = hangman(word)
#sets the initialCall to 0 after first call
if returnValue['1st']==1:
    initialCall=0
#Calls the game function again if user wishes
#(returnStuff only exists inside hangman(), so the returned dict is checked here)
while initialCall == 1 or returnValue['again'] == 1:
    returnValue = hangman(word)
I am making Hangman in Python graphics. I apologize for the huge block of code, it all works fine, I just thought it must be useful. The part of the code that I'm concerned about is this:
else:
    if textCheck == 1:
        invalidText.undraw()
        textCheck = 0
    for letter in word:
        if letter == guess:
            indexes.append(word.index(guess))
    print(indexes)
This block of code is executed when the user's guess is a letter of the alphabet. I then run through each letter in the chosen word, and if at any point a letter in the word is the same as the guessed letter, I store the index of that letter in an empty list, so I can use it to tell the computer where to draw the letters on the empty lines.
It works fine, except when there is a duplicate letter in the word. For example, engineering has 3 es. Unfortunately, .index() only returns the index of the first appearance of the letter and disregards the others. What is the workaround for this, so I can get the indexes of all 3 es in that word instead of 3 copies of the index of the first e? For testing purposes, I print the chosen word and the index list on the console so I can see what's going on and don't actually have to guess a letter.
you can do something like this
def indexes(word,letter):
    for i,x in enumerate(word):
        if x == letter:
            yield i
test
>>> list( indexes("engineering","e") )
[0, 5, 6]
>>>
This function is a generator, that is, a lazy function that only produces a result when asked for one. To get an individual result you use next: the function executes until the first yield, returns that result, then pauses and remembers where it was. When next is called again, execution resumes from that point and runs until the next yield. If there are no more results, it raises StopIteration. For example:
>>> word="engineering"
>>> index_e = indexes(word,"e")
>>> next(index_e)
0
>>> print(word)
engineering
>>> next(index_e)
5
>>> next(index_e)
6
>>> next(index_e)
Traceback (most recent call last):
File "<pyshell#13>", line 1, in <module>
next(index_e)
StopIteration
>>>
to update a list with the result of this function, you can use the extend method
>>> my_list=list()
>>> index_e = indexes(word,"e")
>>> my_list.extend( index_e )
>>> my_list
[0, 5, 6]
>>>
Generators are used in cases where the result is only an intermediate step toward something else, because they use a minimal amount of memory. In this case you probably want the whole result at once, so either use it as in the first example or rewrite the function to return a list in the first place, like this:
def indexes(word,letter):
    result = list()
    for i,x in enumerate(word):
        if x == letter:
            result.append(i)
    return result
sorry if I confuse you with the yield thing.
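To tie this back to the loop in the question, here is a small side-by-side sketch (plain strings only, no graphics window; the names first_only and indexes are used just for this illustration). It shows why word.index(guess) keeps repeating the first position and how enumerate avoids that:

```python
word = "engineering"
guess = "e"

# What the loop in the question effectively does: str.index always returns
# the first match, so the same position is appended once per occurrence.
first_only = [word.index(guess) for letter in word if letter == guess]
print(first_only)  # [0, 0, 0]

# enumerate pairs every character with its own position.
indexes = [i for i, letter in enumerate(word) if letter == guess]
print(indexes)  # [0, 5, 6]
```

Inside the game loop, the list comprehension could replace the inner for loop, assuming indexes should hold only the positions of the current guess.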

How to get value for each string index matching key in dictionary in Python

str = 'strings'
new_D = {'r': 1, 's': 1, 't': 1, 'r' : 3, 'i' : 4 }
How can I get each letter in the string assigned to the value in the dictionary by match 'letter-key' and then summarize the values?
Thanks
s = 'strings' #Don't name a variable str, that shadows the builtin str
new_D = {'r': 1, 's': 1, 't': 1, 'r' : 3, 'i' : 4 }
sum_of_chars = sum([new_D.get(k,0) for k in s]) #assuming 0 as default for "not in dictionary"
This takes advantage of the fact that:
Strings are iterable. for i in s: print(i) would print each character separately.
Dictionaries have a .get(key[,default]) method that can take an optional argument for "return this value if the key doesn't exist".
I'm using the built-in sum on a list comprehension for the sake of brevity. Brevity can be a virtue or a vice, but, hey, one list comprehension is usually still pretty readable once you know what they are.
string = 'strings'
new_D = {'r': 1, 's': 1, 't': 1, 'r' : 3, 'i' : 4 }
sum_of_chars = 0
for character in string:
    if character in new_D:
        sum_of_chars += new_D[character]
    else:
        sum_of_chars += 1 # Default?
print(sum_of_chars)
btw, you should not use the name str because it shadows the builtin str and there's a mistake in your dictionary. It contains the entry r two times which doesn't make sense.
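To make the point about the duplicate 'r' concrete, here is a short sketch. It assumes characters missing from the dictionary should count as 0, which is one of the two defaults used in the answers above; the variable total is just an illustrative name.

```python
s = 'strings'
new_D = {'r': 1, 's': 1, 't': 1, 'r': 3, 'i': 4}

# A dict literal keeps only the last value given for a repeated key.
print(new_D)  # {'r': 3, 's': 1, 't': 1, 'i': 4}

# Look up every character of the string; characters without an entry count as 0.
total = sum(new_D.get(ch, 0) for ch in s)
print(total)  # 10
```

The 10 is made up of s=1, t=1, r=3, i=4, n=0, g=0 and the second s=1.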

Musical note string (C#-4, F-3, etc.) to MIDI note value, in Python

The code in my answer below converts musical notes in strings, such as C#-4 or F-3, to their corresponding MIDI note values.
I am posting this because I am tired of trying to dig it up online every time I need it. I'm sure I'm not the only one who can find a use for it. I just wrote this up; it is tested and correct. It's in Python, but I feel that it is pretty close to universally understandable.
#Input is a string in the form C#-4, Db-4, or F-3. If your implementation doesn't use the hyphen,
#just replace the line:
#    letter = midstr.split('-')[0]
#with:
#    letter = midstr[:-1]
def MidiStringToInt(midstr):
    Notes = [["C"],["C#","Db"],["D"],["D#","Eb"],["E"],["F"],["F#","Gb"],["G"],["G#","Ab"],["A"],["A#","Bb"],["B"]]
    answer = 0
    i = 0
    #Note: capitalise only the first character, so that flats such as "Db" still match
    letter = midstr.split('-')[0]
    letter = letter[0].upper() + letter[1:]
    for note in Notes:
        for form in note:
            if letter == form:
                answer = i
                break
        i += 1
    #Octave (a single digit, read from the last character)
    answer += int(midstr[-1]) * 12
    return answer
NOTES_FLAT = ['C', 'Db', 'D', 'Eb', 'E', 'F', 'Gb', 'G', 'Ab', 'A', 'Bb', 'B']
NOTES_SHARP = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']

def NoteToMidi(KeyOctave):
    # KeyOctave is formatted like 'C#3'
    key = KeyOctave[:-1]  # eg C, Db
    octave = KeyOctave[-1]  # eg 3, 4
    answer = -1
    try:
        if 'b' in key:
            pos = NOTES_FLAT.index(key)
        else:
            pos = NOTES_SHARP.index(key)
    except:
        print('The key is not valid', key)
        return answer
    answer += pos + 12 * (int(octave) + 1) + 1
    return answer
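Both functions above read the octave from the last character only, and they differ on whether a hyphen is expected. A possible variation that accepts both spellings is sketched below. Several things here are assumptions rather than part of either answer: the function name note_to_midi, the convention that C4 maps to 60 (the same as NoteToMidi), and the rule that a hyphen is always a separator and never a minus sign, so negative octaves are not supported.

```python
import re

# Semitone offset of each natural note within an octave.
SEMITONES = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}
ACCIDENTALS = {"": 0, "#": 1, "b": -1}

def note_to_midi(name):
    """Convert names such as 'C#-4', 'Db4' or 'F-3' to a MIDI note number.

    Assumes C4 == 60 and treats '-' purely as a separator.
    """
    match = re.fullmatch(r"([A-Ga-g])([#b]?)-?(\d+)", name)
    if match is None:
        raise ValueError(f"not a valid note name: {name!r}")
    letter, accidental, octave = match.groups()
    semitone = SEMITONES[letter.upper()] + ACCIDENTALS[accidental]
    return 12 * (int(octave) + 1) + semitone

print(note_to_midi("C#-4"))  # 61
print(note_to_midi("Db4"))   # 61
print(note_to_midi("F-3"))   # 53
```

Raising ValueError for malformed input, instead of printing a message and returning -1, is a design choice of this sketch; callers that prefer the -1 convention can catch the exception.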
