I'm trying to read a CSV file and create a two-dimensional list from the values stored inside.
However, I'm running into trouble when I try to check whether the stored values can be converted into floats.
Here is the function I have written, which reads the file and creates a list.
def readfile(amount, name):
    tempfile = open(name).readlines()[1:]  #First value in line is never a float, hence the [1:]
    rain_list = []
    count = 0.0
    for line in tempfile:
        line = line.rstrip()
        part = line.split(",")
        try:
            part = float(part)
        except ValueError:
            print("ERROR: invalid float in line: {}".format(line))
        rain_list.append(part[amount])
        count += 1
    if count == 0:
        print("ERROR in reading the file.")
    tempfile.close()
    return rain_list
It might be a little messy, since it's essentially a patchwork of different possible solutions I have tried.
The values it gets are the name of the file (name) and the amount of values it reads from the file (amount).
Has anyone got an idea why this does not work as I expect it to work?
part is a list of strings, so float(part) raises a TypeError, which your except ValueError does not catch. To check and convert all the values to floats, you'd have to do:
part = [float(x) for x in part]
(wrapped in your try/except block)
BTW, you should use the csv module to read comma-separated files. It's built in. Also, using enumerate would let you print the number of the line where the error occurs, not only the data:
import csv

reader = csv.reader(tempfile)  # better: pass the file handle directly
                               # and use next(reader) to discard the title line
for lineno, line in enumerate(reader, 2):  # lineno starts at 2 because of the title line
    try:
        line = [float(x) for x in line]
    except ValueError:
        print("ERROR: invalid float in line {}: {}".format(lineno, line))
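Putting the pieces above together, here is a minimal, self-contained sketch. The function name read_float_rows, its has_header parameter, and the sample values are assumptions made for illustration; they are not from the question. Rows that fail to convert are collected with their line number instead of being printed:

```python
import csv
import io


def read_float_rows(handle, has_header=True):
    """Return (rows, errors) parsed from an open CSV file handle."""
    reader = csv.reader(handle)
    first_lineno = 1
    if has_header:
        next(reader, None)  # discard the title line
        first_lineno = 2
    rows, errors = [], []
    for lineno, raw in enumerate(reader, first_lineno):
        try:
            rows.append([float(x) for x in raw])
        except ValueError:
            # keep the line number and the unconverted strings for reporting
            errors.append((lineno, raw))
    return rows, errors


# In-memory stand-in for a real file; the values are made up.
sample = io.StringIO("jan,feb\n1.5,2\n3,oops\n4,5.25\n")
rows, errors = read_float_rows(sample)
print(rows)
print(errors)
```

With a real file you would pass the handle from open(name, newline="") instead of the StringIO object.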
For the code below, I am supposed to write three separate try/except-blocks, which react to the three following errors, respectively:
1) File is not available
2) A line in the file contains fewer elements than expected
3) The user's entry cannot be found in the dictionary as a key
Also, the second error should NOT end the program. Incorrect lines in the text file are supposed to be skipped, so the dictionary should end up with three key-value pairs.
The content of the text-file looks like this. In the end, the program is supposed to be able to print both the left word(original word) and the right word (translation):
dog Hund
cat Katze
questionmark
snow Schnee
And this is the code:
with open(filename, "r", encoding="utf8") as dictionaryfile:
    for line in dictionaryfile:
        elements_from_line = line.split()
        word = elements_from_line[0]
        translation = elements_from_line[1]
        translationdictionary[word] = translation
inputword = input("Welches Wort soll übersetzt werden? >")
correct_translation = translationdictionary[inputword]
print("Das Eingabewort:\t{}\nDie Übersetzung:\t{}".format(inputword, correct_translation))
If you have to do it all in try/except blocks, it could look like this:
if the file does not exist: catch OSError
if a line in the file contains fewer elements than expected: catch IndexError
if the user input is not in the dictionary: catch KeyError
So, editing your code, we can try something like this:
filename = 'ala.txt'
translationdictionary = {}
try:
    with open(filename, "r", encoding="utf8") as dictionaryfile:
        for line in dictionaryfile:
            elements_from_line = line.split()
            try:
                word = elements_from_line[0]
                translation = elements_from_line[1]
                translationdictionary[word] = translation
            except IndexError:
                continue  # skip line
except OSError:  # can not find file
    print('File not found')
inputword = input("Welches Wort soll übersetzt werden? >")
try:
    correct_translation = translationdictionary[inputword]
    print("Das Eingabewort:\t{}\nDie Übersetzung:\t{}".format(inputword, correct_translation))
except KeyError:  # lack of key in dict
    print('Can not find input in dict')
read = open('700kLine.txt')
# use readline() to read the first line
line = read.readline()
aList = []
for line in read:
    try:
        num = int(line.strip())
        aList.append(num)
    except:
        print ("Not a number in line " + line)
read.close()
print(aList)
There are 700k lines in that file (every line holds a number of at most 2 digits).
I can only get ~280k lines of that file into my aList.
So, how can I expand the capacity of aList from 280k to 700k or more? (Is there a different solution for this case?)
Hello, I just solved that problem. Thanks for all your help. It was an output buffer problem.
The solution is just to increase the size of the buffer.
See the linked question:
Increase output buffer when running or debugging in PyCharm
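Since the fix was on the console side, one way to confirm that the list itself is complete is to check its length instead of printing every element. A minimal sketch, assuming one integer per line: read_ints and the generated numbers.txt are hypothetical names used only for this illustration, and unlike the code in the question it does not skip the first line.

```python
import os
import tempfile


def read_ints(path):
    """Return (numbers, bad_lines): parsed integers and the line numbers that failed."""
    numbers, bad_lines = [], []
    with open(path) as f:
        for lineno, line in enumerate(f, 1):
            try:
                numbers.append(int(line.strip()))
            except ValueError:
                bad_lines.append(lineno)
    return numbers, bad_lines


# Generate a throwaway 700k-line file so the sketch is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "numbers.txt")
    with open(path, "w") as f:
        for i in range(700_000):
            f.write(f"{i % 100}\n")
    numbers, bad_lines = read_ints(path)

# Print a summary rather than the whole list, so no console buffer can truncate it.
print(len(numbers), numbers[:5], numbers[-5:], len(bad_lines))
```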
Please try this.
filename = '700kLine.txt'
with open(filename) as f:
    data = f.readlines()
print(data)
print(type(data))  #stores the data in a list
Yes, you can.
Once a list is defined, you can add, edit or delete its elements. To add more elements at the end, use the append method:
MyList.append(data)
Where MyList is the name of the list and data is the element you want to add.
I tried to re-create your problem:
# creating 700kLine file
with open('700kLine.txt', 'w') as f:
    for i in range(700000):
        f.write(str(i+1) + '\n')
# creating list from file entries
aList = []
with open('700kLine.txt', 'r') as f:
    for line in f:
        num = int(line.strip())
        aList.append(num)
# print(aList)
print(aList[:30])
Jupyter Notebook throws an error when printing all 700k values because the output uses too much memory. If you really want to print all 700k values, run the Python script from a terminal.
Could it be that your computer ran out of memory while processing the file? I tried running an infinite loop that appends a number to a list, and I ended up with a list of about 47 million elements (len(numbers) was 47119572). The code I used for the test is below.
I tried this code on an online REPL and it reached a significantly lower len(numbers).
numbers = []
while True:
    try:
        if len(numbers) > 0:
            numbers.append(numbers[-1] + 1)
        else:
            numbers.append(1)
    except MemoryError:
        print("memory error, last count is: ", numbers[-1])
        raise  # re-raise so the loop stops
Maybe try saving the data in chunks as you read it instead of reading the whole file at once?
Just my assumption.
I've been learning Python and I wanted to write a script to count the number of characters in a text and calculate their relative frequencies. But first, I wanted to know the length of the file. My intention is that, while the script goes from line to line counting all the characters, it would print the current line and the total number of lines, so I could know how much it is going to take.
I executed a simple for loop to count the number of lines, and then another for loop to count the characters and put them in a dictionary. However, when I run the script with the first for loop, it stops early. It doesn't even go into the second for loop as far as I know. If I remove this loop, the rest of the code goes on fine. What is causing this?
Excuse my code. It's rudimentary, but I'm proud of it.
My code:
import string

fname = input ('Enter a file name: ')
try:
    fhand = open(fname)
except:
    print ('Cannot open file.')
    quit()

#Problematic bit. If this part is present, the script ends abruptly.
#filelength = 0
#for lines in fhand:
#    filelength = filelength + 1

counts = dict()
currentline = 1
for line in fhand:
    if len(line) == 0: continue
    line = line.translate(str.maketrans('','',string.punctuation))
    line = line.translate(str.maketrans('','',string.digits))
    line = line.translate(str.maketrans('','',string.whitespace))
    line = line.translate(str.maketrans('','',""" '"’‘“” """))
    line = line.lower()
    index = 0
    while index < len(line):
        if line[index] not in counts:
            counts[line[index]] = 1
        else:
            counts[line[index]] += 1
        index += 1
    print('Currently at line: ', currentline, 'of', filelength)
    currentline += 1

listtosort = list()
totalcount = 0
for (char, number) in list(counts.items()):
    listtosort.append((number,char))
    totalcount = totalcount + number
listtosort.sort(reverse=True)
for (number, char) in listtosort:
    frequency = number/totalcount*100
    print ('Character: %s, count: %d, Frequency: %g' % (char, number, frequency))
Your approach looks fine. However, to reproduce your problem, I downloaded and saved a Gutenberg text book. It's a Unicode issue. There are two ways to resolve it: open the file as binary, or specify the encoding. As it's text, I'd go with the utf-8 option.
I'd also suggest you structure the code differently. Below is a basic structure that closes the file after opening it.
filename = "GutenbergBook.txt"
try:
    #fhand = open(filename, 'rb')
    #open read only and utf-8 encoding
    fhand = open(filename, 'r', encoding = 'utf-8')
except IOError:
    print("couldn't find the file")
else:
    try:
        for line in fhand:
            #put your code here
            print(line)
    except:
        print("Error reading the file")
    finally:
        fhand.close()
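Separately from the encoding suggestion, one more thing may be worth checking when two for loops run over the same open file: a file object is an iterator, so after the first loop has consumed it, a second loop over the same handle sees no lines unless the position is reset with seek(0). A minimal sketch of that behaviour, using io.StringIO as an in-memory stand-in for the opened file (the sample text is made up):

```python
import io

# Stand-in for the handle returned by open(fname).
fhand = io.StringIO("first line\nsecond line\nthird line\n")

# First pass: count the lines. This consumes the iterator.
filelength = sum(1 for _ in fhand)

# A second loop over the same handle now yields nothing.
leftover = [line for line in fhand]

# Rewind to the start, and the lines can be read again.
fhand.seek(0)
second_pass = [line for line in fhand]

print(filelength, leftover, len(second_pass))
```

Under that assumption, calling fhand.seek(0) between the counting loop and the character-counting loop would let the second loop see the lines again.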
For the OP, this is a specific case. For other visitors: if the code after your for statement does not execute, it is usually not a Python built-in issue. Most likely an exception is being handled in a parent caller.
If your iteration is inside a function that is called from within a try/except block in the caller, then any error raised during the loop escapes to that handler, and the rest of the function is skipped.
This issue can be hard to find, especially when you are dealing with an intricate architecture.
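A small sketch of that situation. The names process and caller are hypothetical, and the division is just a convenient way to trigger an error partway through the loop:

```python
def process(items, log):
    for item in items:
        log.append(1 / item)  # raises ZeroDivisionError when item == 0
    log.append("done")        # only reached if the loop finished without an error


def caller(items):
    log = []
    try:
        process(items, log)
    except Exception:
        # The error is swallowed here, so nothing is reported and the code
        # after the loop in process() silently never runs.
        pass
    return log


print(caller([1, 2]))
print(caller([1, 2, 0, 4]))
```

In the second call the loop stops at the zero, "done" is never appended, and no traceback appears. Logging or re-raising inside the except block makes this kind of failure visible.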
I can't seem to pull each individual line from a .txt file into a tuple. The 'city-data.txt' file is just a list of the 50 states, capitals, and their lat/longs. I need to create a tuple of all the states.
This is my code so far -
def read_cities(file_name):
    file_name = open('city-data.txt' , 'r')
    for line in file_name:
        road_map = ((line.split('\t')))
        return road_map
    file_name.close()

print(read_cities('city-data.txt'))
When it's run, it only prints the very first line from the .txt file, as such:
['Alabama', 'Montgomery', '32.361538', '-86.279118\n']
The reason it prints only the very first line is this:
for line in file_name:
    road_map = ((line.split('\t')))
    return road_map
You are returning immediately after you consume the first line. This is why it only prints the very first line.
Instead, you need to store these in a list, and return that list in the end.
def read_cities(file_name):
    file_name = open('city-data.txt' , 'r')
    road_maps = []
    for line in file_name:
        road_map = ((line.split('\t')))
        road_maps.append(road_map)
    file_name.close()
    # road_maps is a list, since you wanted a tuple we convert it to that
    return tuple(road_maps)

print(read_cities('city-data.txt'))
"I need to create a tuple of all the states."
Does this mean you only want the first column from each line? If so, modify it to:
def read_cities(file_name):
    # notice I've changed this to use file_name instead of
    # the hard-coded filename string
    file_name = open(file_name , 'r')
    # if you need uniqueness, use states=set() and use .add instead
    # of .append
    states = []
    for line in file_name:
        line_split = line.split('\t')
        # line_split is a list and the 0th index is the state column
        state = line_split[0]
        # use states.add if you used a set instead of a list
        states.append(state)
    file_name.close()
    return tuple(states)

print(read_cities('city-data.txt'))
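As a variation on the same idea, here is a sketch that uses a with block, so the file is closed even if an error occurs, and the csv module with a tab delimiter. The function name read_states and the sample file are assumptions for illustration; the second sample line is made up.

```python
import csv
import os
import tempfile


def read_states(path):
    """Return a tuple with the first tab-separated field of every non-empty line."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter="\t")
        return tuple(row[0] for row in reader if row)


# Throwaway sample in the same layout as the output shown in the question.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "city-sample.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("Alabama\tMontgomery\t32.361538\t-86.279118\n")
        f.write("StateB\tCapitalB\t1.0\t-2.0\n")  # hypothetical entry
    states = read_states(path)

print(states)
```

The csv reader also strips the line ending, so the last field of each row no longer carries a trailing '\n'.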
I have to write a program that:
Takes the filename as an argument.
Reads the file and counts, for each band, how many albums of that band are listed in the file. (http://vlm1.uta.edu/~cconly/teaching/cse1310_spring2015/assignments/assignment7/albums.txt)
Prints on the screen, in descending order of number of albums, a line for each band. Each line should contain the name of the band, followed by a colon and space, and then the number of albums for that band. This would look like this:
band1: number1
band2: number2
band3: number3
My code is below, but I keep getting a lot of errors telling me that things aren't defined when they are, and I also get this one: TypeError: 'NoneType' object is not iterable. Any help would be great!
import fileinput
import os
filename = open("albums.txt", "r") # open album.txt file
def process_line(line):
line = line.lower()
new_line = ""
for letter in line:
if letter in (""",.!"'()"""):
continue
elif letter == '-':
letter = ' '
new_line = new_line + letter
words = new_line.split()
return words
def count_words(filename):
if (os.path.isfile(filename) == False):
print("\nError: file " + filename + " does not exist.\n")
return
#in_file = open(filename, "r")
result = {}
for line in filename:
words = process_line(line)
for word in words:
if (word in result):
result[word] += 1
else:
result[word] = 1
def print_word_frequencies(dictionary):
print()
inverse = inverse_dictionary(dictionary)
frequencies = inverse.keys()
frequencies = list(frequencies) # convert frequencies to a list, so that we can sort it.
frequencies.sort() # sorting the list
frequencies.reverse() # reverse the sorting of the list
for frequency in frequencies: # for words with the same frequency, we want them sorted in
list_of_words = inverse[frequency]
list_of_words.sort() # sorting in alphabetical order
for word in list_of_words:
print(word + ":", frequency)
def inverse_dictionary(in_dictionary):
out_dictionary = {}
for key in in_dictionary:
value = in_dictionary[key]
if (value in out_dictionary):
list_of_keys = out_dictionary[value]
list_of_keys.append(key)
else:
out_dictionary[value] = [key]
return out_dictionary
def main():
filename = "albums.txt"
dictionary = count_words(filename)
print_word_frequencies(dictionary)
main()
Since this is an assignment, I will not give you the full code, but just point out some errors.
First, your indentation is all wrong, and indentation is important in Python! This may just have happened when you pasted your code into the question editor, but maybe not. In particular, make sure you are not mixing tabs and spaces!
Your count_words method does not return anything, thus dictionary is None and you get TypeError: 'NoneType' object is not iterable in inverse_dictionary.
When you do for line in filename, you are iterating over the characters in the file name, not the lines in the file, as the global variable filename is shadowed by the filename parameter. Open the file in that method using with open(filename) as the_file:
Your process_line method seems odd. It seems like you remove all the special characters, but then how do you plan to separate band name and album name? You seem to just count words, not albums per band. Try line.split(" - ")[0] to get the band.
All that dictionary inverting is not needed at all. In print_word_frequencies, just sort the items from the dictionary using a custom key function to sort by the count.
With those hints, you should be able to fix your program. (In case you want to know, I got your program down to about ten lines of code.)
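To illustrate just the last two hints, without the file handling, here is a small sketch on made-up data. It assumes each line looks like "band - album", as the split(" - ") hint suggests; the function names and sample lines are hypothetical.

```python
def count_albums(lines):
    """Count how many lines start with each band name (text before ' - ')."""
    counts = {}
    for line in lines:
        band = line.split(" - ")[0].strip()
        if band:
            counts[band] = counts.get(band, 0) + 1
    return counts


def by_count_desc(counts):
    """Sort (band, count) pairs by descending count, then by band name."""
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Made-up sample lines, not taken from albums.txt.
sample = ["Band A - First\n", "Band B - Only\n", "Band A - Second\n"]
ranked = by_count_desc(count_albums(sample))
for band, number in ranked:
    print(f"{band}: {number}")
```

The key function returns a tuple, so ties in the count fall back to alphabetical order of the band name.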