Hi, I have a CSV file with two columns, one with numbers and one with letters, in the following format:
1234 k
343 o
5687 uiuuo
All I want to do is fill the blank rows with the values from the previous row. I have written the code below, which saves the result in a new CSV file, but I get an error that says:
b = w[1]
IndexError: list index out of range
This is my code
import csv
with open('col.csv', 'r') as f:
    reader = csv.reader(f)
    my_list = list(reader)
#print my_list[1]
#x = my_list[1]
#print x[0]
x = 0
for count in my_list:
    w = my_list[x]
    a = w[0]
    b = w[1]
    print (a, b)
    #print 'a', a , 'b', b
    if a == '' and b == '' and x < 3044:
        h = x - 1
        my_list[x] = my_list[h]
        #print 'my_list[x]', my_list[x]
        x = x + 1
        #print my_list[x]
    elif a != '' and b != '' and x < 3044:
        my_list[x] = (a,b)
        x = x + 1
        # print my_list[x]
writer = csv.writer(open('C:/Users/user/Desktop/col2.csv', 'wb'))
#for count in my_list:
data = my_list
for row in data:
    writer.writerow(row)
    #print count
When you say "blank lines with previous values", I'm assuming that you want to turn:
1234 k

343 o

5687 uiuuo
into:
1234 k
1234 k
343 o
343 o
5687 uiuuo
You have quite a lot of problems with your code:
import csv
with open('col.csv', 'r') as f:
    reader = csv.reader(f)
    my_list = list(reader)
If you've commented it out, you don't need to include it in your question.
#print my_list[1]
#x = my_list[1]
#print x[0]
x = 0
for count in my_list:
You do know that your list doesn't contain counts, right? This is just code that lies. Don't do that. Also, if you want to iterate over a list and get the index along with the value, that's what enumerate is for. It should be for x, value in enumerate(my_list).
    w = my_list[x]
    a = w[0]
    b = w[1]
Your second row doesn't actually have two elements in it. That's why your code fails. Oops.
    print (a, b)
    #print 'a', a , 'b', b
This code here is a hot mess. Why are you limiting to x < 3044? h is some random variable name that has no meaning. Don't do that either.
    if a == '' and b == '' and x < 3044:
        h = x - 1
        my_list[x] = my_list[h]
        #print 'my_list[x]', my_list[x]
        x = x + 1
        #print my_list[x]
    elif a != '' and b != '' and x < 3044:
        my_list[x] = (a,b)
        x = x + 1
        # print my_list[x]
Don't open files like this; it's possible that they'll never get flushed to disk, or that only part of the file will be. Always use a with block!
writer = csv.writer(open('C:/Users/user/Desktop/col2.csv', 'wb'))
#for count in my_list:
data = my_list
for row in data:
    writer.writerow(row)
    #print count
So... there's an interesting assumption here - that your first row must not be empty. I mean, I guess it could, but then you're going to be writing empty rows, and maybe you don't want that. Also your provided input doesn't seem to match what you're doing, since you're not using a \t delimiter.
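To make the enumerate point and the IndexError concrete, here is a minimal sketch. The sample rows are made up for illustration (they stand in for list(reader)), and short_rows is just a name I picked; the idea is only that you check the length of a row before indexing into it.

```python
# Hypothetical sample rows standing in for list(csv.reader(f));
# csv.reader returns an empty list for a blank line.
my_list = [["1234", "k"], [], ["343", "o"]]

short_rows = []
for x, row in enumerate(my_list):
    # Check the length before indexing, so row[1] can never raise IndexError
    if len(row) < 2:
        short_rows.append(x)
        continue
    a, b = row[0], row[1]
    print(x, a, b)

print("rows without two columns:", short_rows)
```

Here x is the index and row is the value, so there is no separate counter to keep in sync by hand.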
If you think about what you want to do you can come up with the steps pretty easily:
for each row in the input file
write out that row to the output file
if it's blank/empty, write out the previous row
So that's pretty straightforward then.
import csv
with open('input.csv') as infile, open('output.csv', 'w') as outfile:
    reader = csv.reader(infile, delimiter='\t')
    writer = csv.writer(outfile)
    for row in reader:
        writer.writerow(row)
This works - but it doesn't write the previous row if we've got a blank row. Hm. So how can we do that? Well, why not store the previous row? And if the current row is empty, we can write the previous one instead.
    previous_row = []  # If the first row is empty we need an empty list
                       # or whatever you want.
    for row in reader:
        if not row:
            writer.writerow(previous_row)
        else:
            writer.writerow(row)
            # Only remember non-empty rows, so several blank rows in a
            # row all get the last real values
            previous_row = row
If you want to treat ['', ''] as an empty row too, you just have to tweak the condition:
        if not any(row):
            ...
Now if the row is empty, or contains only false-y items such as empty strings, it is treated as blank and the previous row is written in its place.
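Putting those pieces together, a self-contained sketch might look like this. fill_blank_rows is a name I made up, and io.StringIO with tab-delimited sample text stands in for the real input file, so treat it as an illustration under those assumptions rather than a drop-in script.

```python
import csv
import io

def fill_blank_rows(rows):
    """Yield each row, substituting the last non-blank row for blank ones."""
    previous_row = []  # used if the very first row is blank
    for row in rows:
        if any(row):  # at least one non-empty cell
            previous_row = row
        yield previous_row

# Hypothetical tab-delimited input with two blank lines
sample = "1234\tk\n\n343\to\n\n5687\tuiuuo\n"
reader = csv.reader(io.StringIO(sample), delimiter="\t")
filled = list(fill_blank_rows(reader))
print(filled)
```

Because the function only needs an iterable of rows, the same code works whether the rows come from a file, a string, or a list.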
Try not to index into an empty list or assign its (nonexistent) elements to variables.
The easiest way in your case is simply to copy the complete previous row.
import csv
with open('col.csv', 'r') as f:
    reader = csv.reader(f)
    my_list = list(reader)
for i in range(0,len(my_list)):
    currentLine = my_list[i]
    #Make sure it's not the first line and it's empty, else continue
    if not currentLine and i > 0:
        my_list[i] = my_list[i-1]
# 'wb' is the Python 2 mode; on Python 3 use open(path, 'w', newline='')
with open('C:/Users/user/Desktop/col2.csv','wb') as f:
    writer = csv.writer(f)
    for row in my_list:
        writer.writerow(row)
Related
I need to filter a txt file on specific words. Words that end with 'd', words that are less than 10 letters long, and words that have duplicate letters should be filtered out of the txt file. Then they should be returned as a pair: the list of words and the number of words. So far I have this:
exclude = 'd'
f = open('nameofthefile.txt', 'r')
I'm not sure I understood everything you said, but here is what I made:
nameofthefile.txt:
abcd
abcdefghijk
0123456789d
aabbccdd
main.py:
def filtering(filename: str):
    # split() never yields empty strings, so e[-1] is always safe
    with open(filename, 'r') as f:
        words = f.read().split()
    lst = []
    for e in words:  # For each word in the file
        if e[-1] == "d" and len(e) < 10 and sorted([*set(e)]) == sorted(e):
            lst += [e]
    return lst
print(filtering('nameofthefile.txt'))  # ['abcd']
You can refactor it like this:
def filtering(filename: str):
    with open(filename, 'r') as f:
        words = f.read().split()
    return [e for e in words if e[-1] == "d" and len(e) < 10 and sorted([*set(e)]) == sorted(e)]
print(filtering('nameofthefile.txt'))  # ['abcd']
If the last letter is d:
e[-1] == 'd'
If the word is less than 10 letters:
len(e) < 10
If the word doesn't have duplicate letters:
sorted([*set(e)]) == sorted(e)
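The question also asks for the words and their count as a pair. Here is a small sketch of that, working on a list of words instead of a file so it runs on its own. filter_words is a made-up name, and I used len(set(w)) == len(w) as an equivalent way to say "no duplicate letters"; like the code above, it keeps the words that pass all three checks.

```python
def filter_words(words):
    """Return (kept_words, count) for words passing all three checks."""
    kept = [
        w for w in words
        if w.endswith("d")          # last letter is d
        and len(w) < 10             # fewer than 10 letters
        and len(set(w)) == len(w)   # no letter appears twice
    ]
    return kept, len(kept)

sample = ["abcd", "abcdefghijk", "0123456789d", "aabbccdd", ""]
result = filter_words(sample)
print(result)  # (['abcd'], 1)
```

Using endswith instead of e[-1] also means an empty string is simply rejected rather than raising an IndexError.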
This is my input, a CSV file. I cannot run my code in IPython because of an "invalid syntax" error, and I do not know what I should do.
mandana,5,7,3,15
hamid,3,9,4,20,9,1,8,16,0,5,2,4,7,2,1
sina,0,5,20,14
soheila,13,2,5,1,3,10,12,4,13,17,7,7
ali,1,9
sarvin,0,16,16,13,19,2,17,8
def calculate_sorted_averages('C:\Users\Y A S H E L\Desktop\in.csv','C:\Users\Y A S H E L\Desktop\o.csv'):
    averages = {}
    with open('C:\Users\Y A S H E L\Desktop\in.csv') as csv_file:
        csvfile = csv.reader(csv_file, delimiter=',')
        for row in csvfile:
            scores = []
            for i in range(1, len(row)):
                scores.append (float(row[i]))
            avg = mean(scores)
            averages [row[0]] = avg
    averages_ord = OrderedDict (sorted (averages.items(), key=lambda x:(x[1], x[0])))
    with open ('C:\Users\Y A S H E L\Desktop\o.csv', 'w') as out:
        count = 0
        for person in averages_ord:
            count += 1
            if count == 1:
                out.write(person+ ","+ str(averages_ord[person]))
            else:
                out.write("\n"+ person+ ","+ str(averages_ord[person]))
When I copy your function to a python session I get:
def calculate_sorted_averages('C:\Users\Y A S H E L\Desktop\in.csv','C:\Users\Y A S H E L\Desktop\o.csv'):
^
SyntaxError: invalid syntax
You can define a function with
def foo(filename1, filename2):
You cannot define it with a literal string in place of a parameter name, as in def foo('test.txt'):
A syntax error means your code is wrong at a basic Python syntax level. It doesn't even try to run your code.
This corrects that syntax error. I haven't tried to run it.
# Imports the function relies on. The question does not show them;
# mean is assumed to come from the statistics module.
import csv
from collections import OrderedDict
from statistics import mean
def calculate_sorted_averages(file1, file2):
    averages = {}
    with open(file1) as csv_file:
        csvfile = csv.reader(csv_file, delimiter=",")
        for row in csvfile:
            scores = []
            for i in range(1, len(row)):
                scores.append(float(row[i]))
            avg = mean(scores)
            averages[row[0]] = avg
    averages_ord = OrderedDict(sorted(averages.items(), key=lambda x: (x[1], x[0])))
    with open(file2, "w") as out:
        count = 0
        for person in averages_ord:
            count += 1
            if count == 1:
                out.write(person + "," + str(averages_ord[person]))
            else:
                out.write("\n" + person + "," + str(averages_ord[person]))
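The file names then go in the call, not in the def line. Here is a rough, self-contained sketch of such a call. It uses a compact variant of the function so it runs on its own, assumes mean comes from statistics (the question doesn't show its imports), and uses temporary files in place of the Desktop paths. On Windows you would probably want to pass the real paths as raw strings such as r"C:\Users\me\Desktop\in.csv" (a made-up path), because in Python 3 a plain string literal containing \U is itself a syntax error.

```python
import csv
import os
import tempfile
from statistics import mean  # assumption: the question does not show its imports

def calculate_sorted_averages(file1, file2):
    # Compact variant of the corrected function, for illustration only
    averages = {}
    with open(file1, newline="") as csv_file:
        for row in csv.reader(csv_file):
            averages[row[0]] = mean(float(s) for s in row[1:])
    # Sort by average, then by name
    ordered = sorted(averages.items(), key=lambda x: (x[1], x[0]))
    with open(file2, "w") as out:
        out.write("\n".join(f"{name},{avg}" for name, avg in ordered))

# Temporary files stand in for the real input and output paths
folder = tempfile.mkdtemp()
in_path = os.path.join(folder, "in.csv")
out_path = os.path.join(folder, "o.csv")
with open(in_path, "w") as f:
    f.write("mandana,5,7,3,15\nsina,0,5,20,14\nali,1,9\n")

# The file names are arguments of the call, not part of the definition
calculate_sorted_averages(in_path, out_path)
with open(out_path) as f:
    result = f.read()
print(result)
```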
I am currently doing the CS50 DNA pset. I have written all of my code, but it is slow on large files, which results in check50 considering it wrong. My code and the error check50 shows are below.
import sys
import csv
def main():
    argc = len(sys.argv)
    if (argc != 3):
        print("Usage: python dna.py [database] [sequence]")
        exit()
    # Sets variable name for each argv argument
    arg_database = sys.argv[1]
    arg_sequence = sys.argv[2]
    # Converts sequence csv file to string, and returns as thus
    sequence = get_sequence(arg_sequence)
    seq_len = len(sequence)
    # Returns STR patterns as list
    STR_array = return_STRs(arg_database)
    STR_array_len = len(STR_array)
    # Counts highest instance of consecutively reoccurring STRs
    STR_values = STR_count(sequence, seq_len, STR_array, STR_array_len)
    DNA_match(STR_values, arg_database, STR_array_len)
# Reads argv2 (sequence), and returns text within as a string
def get_sequence(arg_sequence):
    with open(arg_sequence, 'r') as csv_sequence:
        sequence = csv_sequence.read()
    return sequence
# Reads STR headers from arg1 (database) and returns as list
def return_STRs(arg_database):
    with open(arg_database, 'r') as csv_database:
        database = csv.reader(csv_database)
        STR_array = []
        for row in database:
            for column in row:
                STR_array.append(column)
            break
        # Removes first column header (name)
        del STR_array[0]
        return STR_array
def STR_count(sequence, seq_len, STR_array, STR_array_len):
    # Creates a list to store max recurrence values for each STR
    STR_count_values = [0] * STR_array_len
    # Temp value to store current count of STR recurrence
    temp_value = 0
    # Iterates over each STR in STR_array
    for i in range(STR_array_len):
        STR_len = len(STR_array[i])
        # Iterates over each sequence element
        for j in range(seq_len):
            # Ensures it's still physically possible for STR to be present in sequence
            while (seq_len - j >= STR_len):
                # Gets sequence substring of length STR_len, starting from jth element
                sub = sequence[j:(j + (STR_len))]
                # Compares current substring to current STR
                if (sub == STR_array[i]):
                    temp_value += 1
                    j += STR_len
                else:
                    # Ensures current STR_count_value is highest
                    if (temp_value > STR_count_values[i]):
                        STR_count_values[i] = temp_value
                    # Resets temp_value to break count, and pushes j forward by 1
                    temp_value = 0
                    j += 1
        i += 1
    return STR_count_values
# Searches database file for DNA matches
def DNA_match(STR_values, arg_database, STR_array_len):
    with open(arg_database, 'r') as csv_database:
        database = csv.reader(csv_database)
        name_array = [] * (STR_array_len + 1)
        next(database)
        # Iterates over one row of database at a time
        for row in database:
            name_array.clear()
            # Copies entire row into name_array list
            for column in row:
                name_array.append(column)
            # Converts name_array number strings to actual ints
            for i in range(STR_array_len):
                name_array[i + 1] = int(name_array[i + 1])
            # Checks if a row's STR values match the sequence's values, prints the row name if match is found
            match = 0
            for i in range(0, STR_array_len, + 1):
                if (name_array[i + 1] == STR_values[i]):
                    match += 1
            if (match == STR_array_len):
                print(name_array[0])
                exit()
        print("No match")
        exit()
main()
Check50 error link:
https://submit.cs50.io/check50/fd890301a0dc9414cd29c2b4dcb27bd47e6d0a48
If you wait long enough, you do get the answer, but since my program runs slowly, check50 considers it wrong.
Well, I solved it just by adding a break statement.
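The post doesn't say where the break went, so I won't guess. For anyone hitting the same slowdown, the general idea is to scan the sequence once per STR instead of restarting the scan at every position. Below is a minimal sketch of that idea; longest_run is a hypothetical helper name and is not taken from the code above.

```python
def longest_run(sequence, pattern):
    """Return the longest number of back-to-back repeats of pattern."""
    best = 0
    step = len(pattern)
    i = 0
    while i + step <= len(sequence):
        run = 0
        # Count consecutive copies of pattern starting at position i
        while sequence[i + run * step : i + (run + 1) * step] == pattern:
            run += 1
        best = max(best, run)
        # Jump past the run just counted, or move one character forward
        i += run * step if run else 1
    return best

print(longest_run("AGATCAGATCAGATCTT", "AGATC"))  # 3
```

Each character is looked at only a small number of times, so the running time grows roughly with the length of the sequence rather than with its square.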
I have a txt file which has x, y values listed as:
20
80
70.....
I wrote code to read x and y, but I am not sure what I am doing wrong.
def readTruth():
    with open("Truth.txt") as f:
        for line in f:
            x_truth, y_truth = line.split("\n")
            return x_truth,y_truth
def main():
    x,y = readTruth()
    print(x)
if __name__ == "__main__":
    main()
I only see one value getting printed in x.
You are reading one line at a time. So you cannot access the values in the 2nd line while reading the first line. Splitting the line by the newline character "\n" will do nothing in this instance.
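To see what the split actually produces, here is a tiny sketch. io.StringIO stands in for the file and the values are made up; the point is that each line splits into its own value plus an empty string, never into the value from the next line.

```python
import io

# io.StringIO stands in for open("Truth.txt"); the values are hypothetical
f = io.StringIO("20\n80\n70\n")

# Each line still ends with "\n", so splitting on it gives [value, '']
pieces = [line.split("\n") for line in f]
print(pieces)  # [['20', ''], ['80', ''], ['70', '']]
```

So x_truth gets the number and y_truth gets an empty string, which matches only one value showing up.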
If you only have 2 lines in your text file, you could do something like this:
# Note here that I am lazy and used a string here instead of text file
a_string = "1\n2"
def readTruth():
    x_truth, y_truth = a_string.split("\n")
    return x_truth,y_truth
x,y = readTruth()
print(x) # 1
print(y) # 2
But I suspect you have more than just 2 values. You could refactor your text file to hold the 2 values on the same line, separated by a space or comma. If you do so, your solution will work. You would just need to split by the comma or space, whichever delimiter you choose to use.
If you must have each number on a separate line, then your solution won't work. You would need to add the results to a list of X values and a list of Y values:
# truth.txt:
# 1
# 2
# 3
# 4
#
f = open("truth.txt", "r")
def readTruth():
    counter = 1
    X_vals = []
    Y_vals = []
    for line in f.readlines():
        # If it is an even numbered line, add to Y_vals
        if counter % 2 == 0:
            Y_vals.append(line.strip("\n"))
        # Otherwise it is an odd numbered line, so add to X_vals
        else:
            X_vals.append(line.strip("\n"))
        counter+=1
    return X_vals, Y_vals
x,y = readTruth()
print(x) # ['1', '3']
print(y) # ['2', '4']
Based on comments from the question poster, I assume they have a blank line between each number in their text file. This means each number is on an odd numbered line. The quick solution, added onto my previous example, is to skip blank lines:
# truth.txt:
# 1
#
# 2
#
# 3
#
# 4
#
f = open("truth.txt", "r")
def readTruth():
    counter = 1
    X_vals = []
    Y_vals = []
    for line in f.readlines():
        # Skip blank lines
        if line.strip("\n") != "":
            # If it is an even numbered value, add to Y_vals
            if counter % 2 == 0:
                Y_vals.append(line.strip("\n"))
            # Otherwise it is an odd numbered value, so add to X_vals
            else:
                X_vals.append(line.strip("\n"))
            # Only count non-blank lines, so the values keep alternating
            counter+=1
    return X_vals, Y_vals
x,y = readTruth()
print(x) # ['1', '3']
print(y) # ['2', '4']
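If you would rather not keep a counter at all, the same alternating split can be sketched with list slicing. read_pairs is a made-up name and io.StringIO stands in for the text file, so this is just one possible shape under those assumptions.

```python
import io

def read_pairs(lines):
    """Split alternating non-blank lines into X values and Y values."""
    values = [line.strip() for line in lines if line.strip()]
    # Even positions (0, 2, ...) are X values, odd positions are Y values
    return values[0::2], values[1::2]

# io.StringIO stands in for open("truth.txt"); the contents are hypothetical
sample = io.StringIO("1\n\n2\n\n3\n\n4\n")
x, y = read_pairs(sample)
print(x)  # ['1', '3']
print(y)  # ['2', '4']
```

This works the same whether or not there are blank lines between the numbers, since blank lines are dropped before the slicing.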
We obtain the values of X and Y by reading every line into a list first:
def readTruth():
    with open("Truth.txt") as f:
        # Keep each non-blank line as one value
        return [line.strip() for line in f if line.strip()]
def main():
    x = readTruth()
    print("Var_X = "+ str(x[0]))
    print("Var_Y = "+ str(x[1]))
main()
You can also keep the values in two lists, one for X and one for Y.
I'm trying to put my list "Grid" into the board so whenever the user gives what column and row they want to fire at, it will update the guess onto the board. Any help on this?
def displayGrid(Rows,Columns):
    output = ' |'
    if (Rows >= 10) or (Columns >= 27):
        print("Please pick a Row less than 10 or a number less than 27")
    else:
        for title in range(97,97+Columns):
            output = output + chr(title)
            output = output + ' |'
        print(output.upper())
        for row in range(Rows):
            output = str(row+1) + '| '
            for col in range(Columns):
                output = output + ' | '
            print(output)
displayGrid(Rows, Columns)
GuessRow = int(input("What row do you guess? \n"))
GuessColumn = int(input("What column do you guess? \n"))
def userGuess(GuessRow, GuessColumn):
    grid = []
    for row in range(Rows):
        grid.append([])
        for col in range(Columns):
            grid[row].append('')
    grid[GuessRow-1][GuessColumn-1] = 'X'
    print(grid)
userGuess(GuessRow, GuessColumn)
Here are a couple of example functions. create_grid creates an empty grid of zeros. update_grid sets the cell at the specified index to an X. I find pprint helpful for nicely formatting nested tables. Also check out the tabulate library when you are working on output.
from pprint import pprint as pp
def create_grid(numRows,numColumns):
    grid = []
    for _ in range(numRows):
        row = []
        for _ in range(numColumns):
            row.append(0)
        grid.append(row)
    return grid
def update_grid(grid, guessRow, guessColumn):
    # grid is a list of rows, so index the row first, then the column
    grid[guessRow][guessColumn] = 'X'
numRows = 7
numColumns = 7
grid = create_grid(numRows,numColumns)
pp(grid)
guessRow = 5
guessColumn = 2
update_grid(grid, guessRow, guessColumn)
pp(grid)
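To get the guesses onto the lettered board from the question, one option is to build the board text from the grid itself rather than printing empty cells. This is only a sketch: render_grid is a name I made up, and it assumes empty cells hold a false-y value such as '' or 0.

```python
def render_grid(grid):
    """Return the board as text: lettered columns, numbered rows."""
    columns = len(grid[0])
    header = "  |" + "".join(f" {chr(65 + c)} |" for c in range(columns))
    lines = [header]
    for number, row in enumerate(grid, start=1):
        # Empty cells ('' or 0) are drawn as a single space
        cells = "".join(f" {cell or ' '} |" for cell in row)
        lines.append(f"{number:>2}|{cells}")
    return "\n".join(lines)

grid = [["" for _ in range(3)] for _ in range(2)]
grid[1][2] = "X"  # row 2, column C (zero-based indices)
board = render_grid(grid)
print(board)
```

Calling render_grid again after each guess redraws the board with every X placed so far.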