File input frequency sorting - python-3.x

So I have to write a program that:
Takes the filename as an argument.
Reads the file and counts, for each band, how many albums of that band are listed in the file. (http://vlm1.uta.edu/~cconly/teaching/cse1310_spring2015/assignments/assignment7/albums.txt)
Prints on the screen, in descending order of number of albums, a line for each band. Each line should contain the name of the band, followed by a colon and space, and then the number of albums for that band. This would look like this:
band1: number1
band2: number2
band3: number3
So my code is below, but I keep getting lots of errors telling me that things aren't defined when they are, and I also get this one: TypeError: 'NoneType' object is not iterable. Any help would be great!
import fileinput
import os
filename = open("albums.txt", "r") # open album.txt file
def process_line(line):
    line = line.lower()
    new_line = ""
    for letter in line:
        if letter in (""",.!"'()"""):
            continue
        elif letter == '-':
            letter = ' '
        new_line = new_line + letter
    words = new_line.split()
    return words
def count_words(filename):
    if (os.path.isfile(filename) == False):
        print("\nError: file " + filename + " does not exist.\n")
        return
    #in_file = open(filename, "r")
    result = {}
    for line in filename:
        words = process_line(line)
        for word in words:
            if (word in result):
                result[word] += 1
            else:
                result[word] = 1
def print_word_frequencies(dictionary):
    print()
    inverse = inverse_dictionary(dictionary)
    frequencies = inverse.keys()
    frequencies = list(frequencies) # convert frequencies to a list, so that we can sort it.
    frequencies.sort() # sorting the list
    frequencies.reverse() # reverse the sorting of the list
    for frequency in frequencies: # for words with the same frequency, we want them sorted in
        list_of_words = inverse[frequency]
        list_of_words.sort() # sorting in alphabetical order
        for word in list_of_words:
            print(word + ":", frequency)
def inverse_dictionary(in_dictionary):
    out_dictionary = {}
    for key in in_dictionary:
        value = in_dictionary[key]
        if (value in out_dictionary):
            list_of_keys = out_dictionary[value]
            list_of_keys.append(key)
        else:
            out_dictionary[value] = [key]
    return out_dictionary
def main():
    filename = "albums.txt"
    dictionary = count_words(filename)
    print_word_frequencies(dictionary)
main()

Since this is an assignment, I will not give you the full code, but just point out some errors.
First, your indentation is all wrong, and indentation is important in Python! This may just have happened when you pasted your code into the question editor, but maybe not. In particular, make sure you are not mixing tabs and spaces!
Your count_words method does not return anything, so dictionary is None and you get TypeError: 'NoneType' object is not iterable in inverse_dictionary.
When you do for line in filename, you are iterating over the characters of the file name, not the lines of the file, because the global variable filename is shadowed by the filename parameter. Open the file inside that method using with open(filename) as the_file:
Your process_line method seems odd. It removes all the special characters, but then how do you plan to separate the band name from the album name? You seem to just count words, not albums per band. Try line.split(" - ")[0] to get the band.
All that dictionary inversion is not needed at all. In print_word_frequencies, just sort the items of the dictionary using a custom key function that sorts by the count.
With those hints, you should be able to fix your program. (In case you want to know, I got your program down to about ten lines of code.)
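For reference, a minimal sketch of how those hints might fit together, assuming each line of the file has the form "Band - Album" (the actual albums.txt is not reproduced here, and the sample lines below are made up). The function names count_bands and band_lines are hypothetical. The file is opened inside the function, the dict is returned (avoiding the NoneType error), the band is taken with split(" - "), and sorting uses a key function instead of an inverted dictionary.

```python
import os
import tempfile


def count_bands(path):
    """Return a dict mapping band name -> number of album lines in the file at `path`."""
    counts = {}
    with open(path) as the_file:  # iterate over the file's lines, not the path string
        for line in the_file:
            if not line.strip():
                continue  # skip blank lines
            band = line.split(" - ")[0].strip()  # assumes 'Band - Album' lines
            counts[band] = counts.get(band, 0) + 1
    return counts  # returning the dict avoids the NoneType error later


def band_lines(counts):
    """Return 'band: count' strings, most albums first, ties sorted alphabetically."""
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ["{}: {}".format(band, n) for band, n in ordered]


# Hypothetical sample data standing in for albums.txt.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("Band A - First Album\nBand B - Debut\nBand A - Second Album\n")
    path = tmp.name

for row in band_lines(count_bands(path)):
    print(row)
# Band A: 2
# Band B: 1
os.remove(path)
```

The (-count, name) key sorts by count in descending order and keeps the alphabetical tie-breaking that the original inverse-dictionary code was aiming for.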

Related

Python 3 split string multiple times

Text file input:
10G/Host_IP,UID,PWD,Host-Name,15-2-7
10G/Host_IP,UID,PWD,Host-Name,12-2-7
import tkinter as tk
import tkinter.filedialog
root = tk.Tk()
root.attributes("-topmost", True)
root.withdraw()
file = tkinter.filedialog.askopenfilename()
def _10g_script (params):
    print (type(params)) ## says params is a str
    for items in params:
        params1 = items.split(",")
        ## print(IP, UID, PWD, TID, SH_SL_PT) ## is what I am wanting here,
        ##then I will split the SH_SL_PT
        print (type(params1)) ## says params is a list
with open(file,"r") as fh:
    for lines in fh:
        rate, param = lines.strip().split("/")
        if rate == "10G":
            _10g_script(param)
            print (type(param)) ## says param is a str
What I am trying to do is split each line from the text file into the rate and the rest of the parameters, as separate variables. Then check the rate, pass the rest into the function, and split the variable params further into more variables (Host_IP, UID, PWD, Host-Name, SH_SL_PT).
The first split is done on a str, but when I try the second split, it says it is a list.
I have tried join, but it puts every character into its own string with a "," between characters.
Any help would be appreciated.
Let's walk through the code. Your code starts here:
with open(file,"r") as fh:
    for lines in fh:
        rate, param = lines.strip().split("/")
        if rate == "10G":
            _10g_script(param)
            print (type(param)) ## says param is a str
We first open the file and then jump into the for loop. This loop reads the file one line at a time, so the variable lines holds a single line of the document as a string, and every iteration moves on to the next line.
Next we split our line using "/". This split creates a list containing two elements; for the second line, lines.strip().split("/") = ["10G","Host_IP,UID,PWD,Host-Name,12-2-7"]. Since you put two variables, rate and param, on the left side, Python unpacks the list so that rate = "10G" and param = "Host_IP,UID,PWD,Host-Name,12-2-7".
Going into your function, params is, as you saw, a string. When you loop over a string, each iteration gives you a single character, so items is one character, and items.split(",") returns a list built from that character, which is why type(params1) reports a list.
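A small illustration of that difference, using the parameter string from the sample input: looping over the string visits characters, while a single split(",") produces the fields you actually want.

```python
params = "Host_IP,UID,PWD,Host-Name,15-2-7"

# Looping over a string visits one character at a time.
first_chars = [ch for ch in params][:4]
print(first_chars)  # ['H', 'o', 's', 't']

# Splitting the string once gives the comma-separated fields.
fields = params.split(",")
print(fields)  # ['Host_IP', 'UID', 'PWD', 'Host-Name', '15-2-7']
```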
So, instead of writing the function _10g_script, what you can do is:
with open(file,"r") as fh:
    for lines in fh:
        rate, param = lines.strip().split("/")
        if rate == "10G":
            #IP = "Host_IP", UID = "UID", TID = "TID", SH_SL_PT
            IP, UID, PWD, TID, SH_SL_PT = param.split(",")
            print(IP,UID,TID,SH_SL_PT)
Then you would do the same for SH_SL_PT, writing:
SH,SL,PT = SH_SL_PT.split("-")
Wherever you needed that.
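Putting both splits together, here is a minimal sketch that reads from a list of lines instead of the file chosen through the file dialog, so it runs without tkinter. The helper name parse_line is hypothetical, and it assumes every line follows the "RATE/IP,UID,PWD,NAME,SH-SL-PT" shape of the sample input.

```python
def parse_line(line):
    """Split 'RATE/IP,UID,PWD,NAME,SH-SL-PT' into its parts (shape taken from the sample input)."""
    rate, param = line.strip().split("/")
    ip, uid, pwd, tid, sh_sl_pt = param.split(",")
    # Only the last field is split on "-", so the "-" in "Host-Name" is left alone.
    sh, sl, pt = sh_sl_pt.split("-")
    return rate, ip, uid, pwd, tid, sh, sl, pt


sample = [
    "10G/Host_IP,UID,PWD,Host-Name,15-2-7\n",
    "10G/Host_IP,UID,PWD,Host-Name,12-2-7\n",
]
for line in sample:
    rate, ip, uid, pwd, tid, sh, sl, pt = parse_line(line)
    if rate == "10G":
        print(ip, uid, tid, sh, sl, pt)
# Host_IP UID Host-Name 15 2 7
# Host_IP UID Host-Name 12 2 7
```

Returning a tuple keeps the parsing in one place; in the real script the loop would iterate over fh instead of sample.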

Check for non-floats in a csv file python3

I'm trying to read a csv file, and create a 2 dimensional list from the values stored inside.
However I'm running into trouble when I try to check whether or not the values stored can be converted into floats.
Here is the function I have written, which reads the file and creates a list.
def readfile(amount, name):
    tempfile = open(name).readlines()[1:] #First value in line is never a float, hence the [1:]
    rain_list = []
    count = 0.0
    for line in tempfile:
        line = line.rstrip()
        part = line.split(",")
        try:
            part = float(part)
        except ValueError:
            print("ERROR: invalid float in line: {}".format(line))
        rain_list.append(part[amount])
        count += 1
    if count == 0:
        print("ERROR in reading the file.")
    tempfile.close()
    return rain_list
It might be a little messy, since it's essentially a patchwork of different possible solutions I have tried.
The values it gets are the name of the file (name) and the amount of values it reads from the file (amount).
Has anyone got an idea why this does not work as I expect it to work?
part is a list of strings, so float(part) raises a TypeError, which your except ValueError does not catch. To check and convert all the values to floats, you'd have to do:
part = [float(x) for x in part]
(wrapped in your exception block)
BTW, you should use the csv module to read comma-separated files; it's built in. Also, using enumerate lets you print the number of the line where the error occurs, not only the data:
import csv

with open(name, newline="") as tempfile:
    reader = csv.reader(tempfile)  # better: pass the file handle directly
    next(reader)  # discard the title line
    for lineno, line in enumerate(reader, 2):  # lineno starts at 2 because of the title line
        try:
            line = [float(x) for x in line]
        except ValueError:
            print("ERROR: invalid float in line {}: {}".format(lineno, line))
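Building the 2-dimensional list the question asks for could look like the following minimal sketch. It assumes the first row is a header and that every other field should be numeric; the function name read_float_rows and the sample data are hypothetical, and rows with a bad value are reported and skipped rather than kept.

```python
import csv
import os
import tempfile


def read_float_rows(path):
    """Read a CSV file with one header row and return a 2-D list of floats.

    Rows containing a value that cannot be converted are reported and skipped.
    """
    rows = []
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # discard the header row (assumed to exist)
        # enumerate counts CSV records; they match file lines unless a field has an embedded newline
        for lineno, line in enumerate(reader, 2):
            if not line:
                continue  # csv.reader yields [] for blank lines
            try:
                rows.append([float(x) for x in line])
            except ValueError:
                print("ERROR: invalid float in line {}: {}".format(lineno, line))
    return rows


# Hypothetical sample file with one bad value on line 3.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="") as tmp:
    tmp.write("a,b\n1.5,2\n3,oops\n4,5.25\n")
    path = tmp.name

print(read_float_rows(path))  # reports line 3, then prints [[1.5, 2.0], [4.0, 5.25]]
os.remove(path)
```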

Why, when I sort a list of 13 numbers, does python only return the last number?

I have a text file where each line holds a student's first name, last name, and test score. When I sort the scores so I can find the median, sort() only gives me the last grade in the list. I need it to return all of them, sorted in order. Here is the part of my code in question:
def main():
    #open file in read mode
    gradeFile = open("grades.txt","r")
    #read the column names and assign them to line1 variable
    line1 = gradeFile.readline()
    #tell it to look at lines 2 to the end
    lines = gradeFile.readlines()
    #separate the list of lines into separate lines
    for line in lines:
        #initialize grades list as an empty list
        gradeList = []
        #iterate through each line and take off the \n
        line = line.rstrip()
        line = line.split(",")
        grades = line[-1]
        try:
            gradeList.append(float(grades))
        except ValueError:
            pass
    #print(gradeList)
    #sort the grades
    gradeList.sort(reverse=False)
    print(gradeList)
You clear the list each time the loop runs, so only the last grade survives. Move the assignment gradeList = [] above the for loop.
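A minimal sketch of the fix, with the list created once before the loop, plus the median the question was after. The function name sorted_grades and the sample lines are hypothetical; it assumes the header line has already been read separately, and it uses the standard library's statistics.median.

```python
import statistics


def sorted_grades(lines):
    """Collect the last comma-separated field of each line as a float and sort them."""
    grade_list = []  # created once, before the loop, so it keeps every grade
    for line in lines:
        fields = line.rstrip().split(",")
        try:
            grade_list.append(float(fields[-1]))
        except ValueError:
            pass  # skip lines whose last field is not a number
    grade_list.sort()
    return grade_list


# Hypothetical data lines (header line assumed to be read separately).
lines = ["Ann,Lee,88\n", "Bo,Kim,72.5\n", "Cy,Ng,95\n"]
grades = sorted_grades(lines)
print(grades)                     # [72.5, 88.0, 95.0]
print(statistics.median(grades))  # 88.0
```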

Python 3.6.1: Code does not execute after a for loop

I've been learning Python and I wanted to write a script to count the number of characters in a text and calculate their relative frequencies. But first, I wanted to know the length of the file. My intention is that, while the script goes from line to line counting all the characters, it would print the current line and the total number of lines, so I could see how long it is going to take.
I wrote a simple for loop to count the number of lines, and then another for loop to count the characters and put them in a dictionary. However, when I run the script with the first for loop, it stops early; as far as I can tell, it doesn't even go into the second for loop. If I remove this loop, the rest of the code runs fine. What is causing this?
Excuse my code. It's rudimentary, but I'm proud of it.
My code:
import string
fname = input ('Enter a file name: ')
try:
    fhand = open(fname)
except:
    print ('Cannot open file.')
    quit()
#Problematic bit. If this part is present, the script ends abruptly.
#filelength = 0
#for lines in fhand:
#    filelength = filelength + 1
counts = dict()
currentline = 1
for line in fhand:
    if len(line) == 0: continue
    line = line.translate(str.maketrans('','',string.punctuation))
    line = line.translate(str.maketrans('','',string.digits))
    line = line.translate(str.maketrans('','',string.whitespace))
    line = line.translate(str.maketrans('','',""" '"’‘“” """))
    line = line.lower()
    index = 0
    while index < len(line):
        if line[index] not in counts:
            counts[line[index]] = 1
        else:
            counts[line[index]] += 1
        index += 1
    print('Currently at line: ', currentline, 'of', filelength)
    currentline += 1
listtosort = list()
totalcount = 0
for (char, number) in list(counts.items()):
    listtosort.append((number,char))
    totalcount = totalcount + number
listtosort.sort(reverse=True)
for (number, char) in listtosort:
    frequency = number/totalcount*100
    print ('Character: %s, count: %d, Frequency: %g' % (char, number, frequency))
It looks fine the way you are doing it. However, to reproduce your problem, I downloaded and saved a Gutenberg text book, and it's a Unicode issue. There are two ways to resolve it: open the file in binary mode, or specify the encoding. As it's text, I'd go with the UTF-8 option.
I'd also suggest structuring it differently; below is a basic structure that closes the file after opening it.
filename = "GutenbergBook.txt"
try:
    #fhand = open(filename, 'rb')
    #open read only and utf-8 encoding
    fhand = open(filename, 'r', encoding = 'utf-8')
except IOError:
    print("couldn't find the file")
else:
    try:
        for line in fhand:
            #put your code here
            print(line)
    except UnicodeDecodeError:  # catch decoding problems only, so other bugs still surface
        print("Error reading the file")
    finally:
        fhand.close()
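Separately from the encoding point, one more thing worth checking in the question's script: a file object can only be iterated once. After the commented-out line-counting loop runs, the file position is at the end, so the following `for line in fhand:` loop may run zero times, which would look like the script stopping early. A minimal sketch, using a small temporary file as a stand-in for the user's text, of counting the lines and then rewinding with seek(0):

```python
import os
import tempfile

# Small stand-in file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as tmp:
    tmp.write("one\ntwo\nthree\n")
    path = tmp.name

with open(path, encoding="utf-8") as fhand:
    filelength = sum(1 for _ in fhand)   # first pass leaves the position at end of file
    second_pass = sum(1 for _ in fhand)  # 0: nothing left to read
    fhand.seek(0)                        # rewind to the beginning
    seen = 0
    for currentline, line in enumerate(fhand, 1):
        seen += 1
        print("Currently at line:", currentline, "of", filelength)

os.remove(path)
```

Reopening the file for the second pass would work just as well; seek(0) simply avoids a second open call.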
For the OP, this is a specific case. For other visitors, though: if the code after your for statement does not execute, it is not a Python built-in issue; most likely an exception is being swallowed by error handling in a parent caller.
If your loop is inside a function that is called inside a try/except block in the caller, any error that occurs during the loop is caught there, and the code after the loop silently never runs.
This issue can be hard to find, especially when you are dealing with an intricate architecture.
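A minimal sketch of that failure mode, with hypothetical function names: the loop raises partway through, the caller's except block swallows the error, and the line after the loop never runs, with no message at all.

```python
def process(items):
    """Loop over items; the line after the loop only runs if the loop finishes."""
    results = []
    for item in items:
        results.append(10 / item)  # raises ZeroDivisionError when item == 0
    print("after the loop")  # never reached if the loop raised
    return results


def caller():
    try:
        return process([5, 2, 0, 1])
    except Exception:
        # Swallowing the error here hides why the code after the loop never ran.
        return None


print(caller())  # prints None; "after the loop" is never printed
```

Printing the traceback in the except block (for example with traceback.print_exc()) or re-raising the exception makes the hidden error visible.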

text file reading and writing, ValueError: need more than 1 value to unpack

I need to make a program in a single def that opens a text file 'grades', where first name, last name, and grade are separated by commas. Each line is a separate student. The program displays the students and grades as well as the class average. Then it adds another student and grade and saves them to the text file while keeping the old students.
I guess I just don't understand the way Python goes through the text file. If I comment out 'lines', I see it print old_names, but it's as if everything is gone after that. When lines is not commented out, 'old_names' is not printed, which makes me think the file is closed? Or empty? However, everything is still in the txt file as it should be.
Currently I get this error, which I'm pretty sure is telling me there's no information in 'line':
File "D:\Dropbox\Dropbox\1Python\Batch Processinga\grades.py", line 45, in main
first_name[i], last_name[i], grades[i] = line.split(',')
ValueError: need more than 1 value to unpack
End goal is to get it to give me the current student names and grades, average. Then add one student, save that student and grade to file. Then be able to pull the file back up with all the students including the new one and do it all over again.
I apologize for being a nub.
def main():
    #Declare variables
    #List of strings: first_name, last_name
    first_name = []
    last_name = []
    #List of floats: grades
    grades = []
    #Float grade_avg, new_grade
    grade_avg = new_grade = 0.0
    #string new_student
    new_student = ''
    #Intro
    print("Program displays information from a text file to")
    print("display student first name, last name, grade and")
    print("class average then allows user to enter another")
    print("student.\t")
    #Open file “grades.txt” for reading
    infile = open("grades.txt","r")
    lines = infile.readlines()
    old_names = infile.read()
    print(old_names)
    #Write for loop for each line creating a list
    for i in len(lines):
        #read in line
        line = infile.readline()
        #Split data
        first_name[i], last_name[i], grades[i] = line.split(',')
        #convert grades to floats
        grades[i] = float(grades[i])
        print(first_name, last_name, grades)
    #close the file
    infile.close()
    #perform calculations for average
    grade_avg = float(sum(grades)/len(grades))
    #display results
    print("Name\t\t Grade")
    print("----------------------")
    for n in range(5):
        print(first_name[n], last_name[n], "\t", grades[n])
    print('')
    print('Average Grade:\t% 0.1f'%grade_avg)
    #Prompt user for input of new student and grade
    new_student = input('Please enter the First and Last name of new student:\n').title()
    new_grade = eval(input("Please enter {}'s grade:".format(new_student)))
    #Write new student and grade to grades.txt in same format as other records
    new_student = new_student.split()
    new_student = str(new_student[1] + ',' + new_student[0] + ',' + str(new_grade))
    outfile = open("grades.txt","w")
    print(old_names, new_student ,file=outfile)
    outfile.close()
File objects in Python have a "file pointer", which keeps track of what data you've already read from the file. It uses this to know where to start reading when you call read, readline, or readlines. Calling readlines moves the file pointer all the way to the end of the file; subsequent read and readline calls will return an empty string. This explains why you're getting a ValueError on the line.split(',') line: line is an empty string, so line.split(",") returns [''], a list of length 1, but you need a list of length 3 to do the triple assignment you're attempting.
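A small demonstration of that file-pointer behaviour, using a temporary two-line file as a stand-in for grades.txt: once readlines has consumed everything, read and readline both return an empty string, and splitting that empty string yields a one-element list, hence "need more than 1 value to unpack".

```python
import os
import tempfile

with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("Ann,Lee,88\nBo,Kim,72\n")
    path = tmp.name

with open(path) as infile:
    lines = infile.readlines()  # reads everything; the pointer is now at end of file
    rest = infile.read()        # '' because there is nothing left
    line = infile.readline()    # also ''

print(len(lines), repr(rest), repr(line))  # 2 '' ''
print(line.split(","))                     # [''], a list of length 1
os.remove(path)
```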
Once you get the lines list, you don't need to interact with the infile object any more. You already have all the lines; you may as well simply iterate through them directly.
    #Write for loop for each line creating a list
    for line in lines:
        columns = line.split(",")
        first_name.append(columns[0])
        last_name.append(columns[1])
        grades.append(float(columns[2]))
Note that I'm using append instead of listName[i] = whatever. This is necessary because Python lists will not automatically resize themselves when you try to assign to an index that doesn't exist yet; you'll just get an IndexError. append, on the other hand, will resize the list as desired.
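For the rest of the assignment (reading the records, computing the average, and saving a new student without losing the old ones), here is a minimal sketch. The function names read_grades and add_student are hypothetical, and it swaps in append mode ("a") in place of rewriting the whole file with "w". It assumes each record is "first,last,grade" and that the existing file ends with a newline.

```python
import os
import tempfile


def read_grades(path):
    """Return parallel lists (first_names, last_names, grades) from 'first,last,grade' lines."""
    first_names, last_names, grades = [], [], []
    with open(path) as infile:
        for line in infile:
            line = line.strip()
            if not line:
                continue  # ignore blank lines such as a trailing empty line
            first, last, grade = line.split(",")
            first_names.append(first)
            last_names.append(last)
            grades.append(float(grade))
    return first_names, last_names, grades


def add_student(path, first, last, grade):
    """Append one 'first,last,grade' record; mode 'a' keeps the existing lines."""
    with open(path, "a") as outfile:
        outfile.write("{},{},{}\n".format(first, last, grade))


# Hypothetical sample file standing in for grades.txt.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("Ann,Lee,88\nBo,Kim,72\n")
    path = tmp.name

add_student(path, "Cy", "Ng", 95.0)
firsts, lasts, grades = read_grades(path)
print(firsts)                     # ['Ann', 'Bo', 'Cy']
print(sum(grades) / len(grades))  # 85.0
os.remove(path)
```

Because the new record is appended, the next run of the program reads all the students, including the one just added.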
