cs50 Pset 6 DNA - Issue creating list - python-3.x

I have a code which iterates through the text, and tells me which is the maximum amount of times each dna STR is found. The only step missing to be able to match these values with the CSV file, is to store them into a list, BUT I AM NOT ABLE TO DO SO. When I run the code, the maximum values are printed independently for each STR sequence.
I have tried to "append" the values into a list, but I was not successful, thus, I cannot match it with the dna sequences of the CSV (large nor small).
Any help or advcise is greatly appreciated!
Here is my code, and the results I get with using "text 1" and "small csv":
`
import cs50
import sys
import csv
import os
if len(sys.argv) != 3:
print("Usage: python dna.py data.csv sequence.txt")
csv_db = sys.argv[1]
file_seq = sys.argv[2]
with open(csv_db, newline='') as csvfile: #with open(csv_db) as csv_file
csv_reader = csv.reader(csvfile, delimiter=',')
header = next(csv_reader)
i = 1
while i < len(header):
STR = header[i]
len_STR = len(STR)
with open(file_seq, 'r') as my_file:
file_reader = my_file.read()
counter = 0
a = 0
b = len_STR
list = []
for text in file_reader:
if file_reader[a:b] != STR:
a += 1
b += 1
else:
counter += 1
a += len_STR
b += len_STR
list.append(counter)
print(list)
i += 1
`

The problem is in place of variable "list" declaration. Every time you iterates through STRs in variable "header" you declares:
list = []
Thus, you create absolutely new variable, which stores only the length of current STR. To make a list with all STRs appended you need to declare variable "list" before the while loop and operator "print" after the while loop:
list = []
while i < len(header):
<your loop code>
print(list)
This should solve your problem.
P.S. Avoid to use "list" as a variable declaration. The "list" is a python built-in function and it is automatically declared. So, when you redeclare it, you will not be able to use list() function in your code.

Related

Never resets list

I am trying to create a calorie counter the standard input goes like this:
python3 calories.txt < test.txt
Inside calories the food is the following format: apples 500
The problem I am having is that whenever I calculate the values for the person it seems to never return to an empty list..
import sys
food = {}
eaten = {}
finished = {}
total = 0
#mappings
def calories(x):
with open(x,"r") as file:
for line in file:
lines = line.strip().split()
key = " ".join(lines[0:-1])
value = lines[-1]
food[key] = value
def calculate(x):
a = []
for keys,values in x.items():
for c in values:
try:
a.append(int(food[c]))
except:
a.append(100)
print("before",a)
a = []
total = sum(a) # Problem here
print("after",a)
print(total)
def main():
calories(sys.argv[1])
for line in sys.stdin:
lines = line.strip().split(',')
for c in lines:
values = lines[0]
keys = lines[1:]
eaten[values] = keys
calculate(eaten)
if __name__ == '__main__':
main()
Edit - forgot to include what test.txt would look like:
joe,almonds,almonds,blue cheese,cabbage,mayonnaise,cherry pie,cola
mary,apple pie,avocado,broccoli,butter,danish pastry,lettuce,apple
sandy,zuchini,yogurt,veal,tuna,taco,pumpkin pie,macadamia nuts,brazil nuts
trudy,waffles,waffles,waffles,chicken noodle soup,chocolate chip cookie
How to make it easier on yourself:
When reading the calories-data, convert the calories to int() asap, no need to do it every time you want to sum up somthing that way.
Dictionary has a .get(key, defaultvalue) accessor, so if food not found, use 100 as default is a 1-liner w/o try: ... except:
This works for me, not using sys.stdin but supplying the second file as file as well instead of piping it into the program using <.
I modified some parsings to remove whitespaces and return a [(name,cal),...] tuplelist from calc.
May it help you to fix it to your liking:
def calories(x):
with open(x,"r") as file:
for line in file:
lines = line.strip().split()
key = " ".join(lines[0:-1])
value = lines[-1].strip() # ensure no whitespaces in
food[key] = int(value)
def getCal(foodlist, defValueUnknown = 100):
"""Get sum / total calories of a list of ingredients, unknown cost 100."""
return sum( food.get(x,defValueUnknown ) for x in foodlist) # calculate it, if unknown assume 100
def calculate(x):
a = []
for name,foods in x.items():
a.append((name, getCal(foods))) # append as tuple to list for all names/foods eaten
return a
def main():
calories(sys.argv[1])
with open(sys.argv[2]) as f: # parse as file, not piped in via sys.stdin
for line in f:
lines = line.strip().split(',')
for c in lines:
values = lines[0].strip()
keys = [x.strip() for x in lines[1:]] # ensure no whitespaces in
eaten[values] = keys
calced = calculate(eaten) # calculate after all are read into the dict
print (calced)
Output:
[('joe', 1400), ('mary', 1400), ('sandy', 1600), ('trudy', 1000)]
Using sys.stdin and piping just lead to my console blinking and waiting for manual input - maybe VS related...

Counting the occurrences of all letters in a txtfile [duplicate]

This question already has answers here:
I'm trying to count all letters in a txt file then display in descending order
(4 answers)
Closed 6 years ago.
I'm trying to open a file and count the occurrences of letters.
So far this is where I'm at:
def frequencies(filename):
infile=open(filename, 'r')
wordcount={}
content = infile.read()
infile.close()
counter = {}
invalid = "ā€˜'`,.?!:;-_\nā€”' '"
for word in content:
word = content.lower()
for letter in word:
if letter not in invalid:
if letter not in counter:
counter[letter] = content.count(letter)
print('{:8} appears {} times.'.format(letter, counter[letter]))
Any help would be greatly appreciated.
best way is using numpy packages, the example would be like this
import numpy
text = "xvasdavawdazczxfawaczxcaweac"
text = list(text)
a,b = numpy.unique(text, return_counts=True)
x = sorted(zip(b,a), reverse=True)
print(x)
in your case, you can combine all your words into single string, then convert the string into list of character
if you want to remove all except character, you can use regex to clean it
#clean all except character
content = re.sub(r'[^a-zA-Z]', r'', content)
#convert to list of char
content = list(content)
a,b = numpy.unique(content, return_counts=True)
x = sorted(zip(b,a), reverse=True)
print(x)
If you are looking for a solution not using numpy:
invalid = set([ch for ch in "ā€˜'`,.?!:;-_\nā€”' '"])
def frequencies(filename):
counter = {}
with open(filename, 'r') as f:
for ch in (char.lower() for char in f.read()):
if ch not in invalid:
if ch not in counter:
counter[ch] = 0
counter[ch] += 1
results = [(counter[ch], ch) for ch in counter]
return sorted(results)
for result in reversed(frequencies(filename)):
print result
I would suggest using collections.Counter instead.
Compact Solution
from collections import Counter
from string import ascii_lowercase # a-z string
VALID = set(ascii_lowercase)
with open('in.txt', 'r') as fin:
counter = Counter(char.lower() for line in fin for char in line if char.lower() in VALID)
print(counter.most_common()) # print values in order of most common to least.
More readable solution.
from collections import Counter
from string import ascii_lowercase # a-z string
VALID = set(ascii_lowercase)
with open('in.txt', 'r') as fin:
counter = Counter()
for char in (char.lower() for line in fin for char in line):
if char in VALID:
counter[char] += 1
print(counter)
If you don't want to use a Counter then you can just use a dict.
from string import ascii_lowercase # a-z string
VALID = set(ascii_lowercase)
with open('test.txt', 'r') as fin:
counter = {}
for char in (char.lower() for line in fin for char in line):
if char in VALID:
# add the letter to dict
# dict.get used to either get the current count value
# or default to 0. Saves checking if it is in the dict already
counter[char] = counter.get(char, 0) + 1
# sort the values by occurrence in descending order
data = sorted(counter.items(), key = lambda t: t[1], reverse = True)
print(data)

how to update contents of file in python

def update():
global mylist
i = j = 0
mylist[:]= []
key = input("enter student's tp")
myf = open("data.txt","r+")
ml = myf.readlines()
#print(ml[1])
for line in ml:
words = line.split()
mylist.append(words)
print(mylist)
l = len(mylist)
w = len(words)
print(w)
print(l)
for i in range(l):
for j in range(w):
print(mylist[i][j])
## if(key == mylist[i][j]):
## print("found at ",i,j)
## del mylist[i][j]
## mylist[i].insert((j+1), "xxx")
below is the error
print(mylist[i][j])
IndexError: list index out of range
I am trying to update contents in a file. I am saving the file in a list as lines and each line is then saved as another list of words. So "mylist" is a 2D list but it is giving me error with index
Your l variable is the length of the last line list. Others could be shorter.
A better idiom is to use a for loop to iterate over a list.
But there is an even better way.
It appears you want to replace a "tp" (whatever that is) with the string xxx everywhere. A quicker way to do that would be to use regular expressions.
import re
with open('data.txt') as myf:
myd = myf.read()
newd = re.sub(key, 'xxx', myd)
with open('newdata.txt', 'w') ad newf:
newf.write(newd)

How to print results from this function

I'm new to Python and programming in general and need a little help with this (partially finished) function. It's calling a text file with a bunch of rows of comma delimited data (age, salary, education and so on). However, I've run into a problem from the outset. I don't know how to return the results.
My aim is to create dictionaries for each category and for each row to be sorted and tallied.
e.g. 100 people over 50, 200 people under 50 and so on.
Am I in the correct ball park?
file = "adultdata.txt"
def make_data(file):
try:
f = open(file, "r")
except IOError as e:
print(e)
return none
large_list = []
avg_age = 0
row_count_under50 = 0
row_count_over50 = 0
#create 2 dictionaries per category
employ_dict_under50 = {}
employ_dict_over50 = {}
for row in f:
edited_row = row.strip()
my_list = edited_row.split(",")
try:
#Age Category
my_list[0] = int(my_list[0])
#Work Category
if my_list[-1] == " <=50K":
if my_list[1] in employ_dict_under50:
employ_dict_under50[my_list[1]] += 1
else:
employ_dict_under50[my_list[1]] = 1
row_count_u50 += 1
else:
if my_list[1] in emp_dict_o50:
employ_dict_over50[my_list[1]] += 1
else:
employ_dict_over50[my_list[1]] = 1
row_count_o50 += 1
# Other categories here
print(my_list)
#print(large_list)
#return
# Ignored categories here - e.g. my_list[insert my list numbers here] = None
I do not have access to your file but I had a go at correcting most of the errors you had in your code.
These are a list of the mistakes I found in your code:
your function make_data is essentially useless and is out of scope. You need to remove it entirely
When using a file object f, you need to use readline to extract data from the file.
It is also best to use a with statement when using IO resources like files
You had numerous variables which were badly named in the inner loop and did not exist
You declared a try in the inner loop without a catch. You can remove the try because you are not trying to catch any Error
You have some very basic errors which are related to general programming, can I assume your new to this? If thats the case then you should probably follow some more beginner tutorials online until you get a grasp of what commands you need to use to perform basic tasks.
Try compare your code to this and see if you can understand what i'm trying to say:
file = "adultdata.txt"
large_list = []
avg_age = 0
row_count_under50 = 0
row_count_over50 = 0
#create 2 dictionaries per category
employ_dict_under50 = {}
employ_dict_over50 = {}
with open(file, "r") as f:
row = f.readline()
edited_row = row.strip()
my_list = edited_row.split(",")
#Age Category
my_list[0] = int(my_list[0])
#Work Category
if my_list[-1] == " <=50K":
if my_list[1] in employ_dict_under50:
employ_dict_under50[my_list[1]] += 1
else:
employ_dict_under50[my_list[1]] = 1
row_count_under50 += 1
else:
if my_list[1] in employ_dict_over50:
employ_dict_over50[my_list[1]] += 1
else:
employ_dict_over50[my_list[1]] = 1
row_count_over50 += 1
# Other categories here
print(my_list)
#print(large_list)
#return
I cannot say for certain if this code will work or not without your file but it should give you a head start.

How to merge two lists at a delimited token in python3

I am a CS major at the University of Alabama, we have a project in our python class and I am stuck...probably for some stupid reason, but I cant seem to find the answer.
here is the link to the project, as it would be a pain to try and explain on here.
http://beastie.cs.ua.edu/cs150/projects/project1.html
here is my code:
import sys
from scanner import scan
def clInput():
#Gets command line input
log1 = sys.argv[1]
log2 = sys.argv[2]
name = sys.argv[3]
if len(sys.argv) != 4:
print('Incorrect number of arguments, should be 3')
sys.exit(1)
return log1,log2,name
def openFiles(log1,log2):
#Opens sys.argv[1]&[2] for reading
f1 = open(log1, 'r')
f2 = open(log2, 'r')
return f1, f2
def merge(log1,log2):
#Merges parsed logs into list without '---'
log1Parse = [[]]
log2Parse = [[]]
log1Count = 0
log2Count = 0
for i in log1:
if i != ['---']:
log1Parse[log1Count].append(i)
else:
log1Count += 1
log1Parse.append([])
for i in log2:
if i != ['---']:
log2Parse[log2Count].append(i)
else:
log2Count += 1
log2Parse.append([])
return(log1Parse[0] + log2Parse[0] + log1Parse[1] + log2Parse[1])
def searchMerge(name,merged):
#Searches Merged list for sys.argv[3]
for i in range(len(merged)):
if (merged[i][1] == name):
print(merged[i][0],merged[i][1]," ".join(merged[i][2:]))
def main():
log1,log2,name = clInput()
f1,f2 = openFiles(log1,log2)
#Sets the contents of the two scanned files to variables
tokens1 = scan(f1)
tokens2 = scan(f2)
#Call to merge and search
merged = merge(tokens1,tokens2)
searchMerge(name,merged)
main()
ok. so heres the problem. We are to merge two lists together into a sorted master list, delimited at the ---'s
my two log files match the ones posted on the website i linked to above. This code works, however if there are more than two instances of the ---'s in each list, it will not jump to the next list to get the other tokens, and so forth. I have it working for two with the merge function. at the end of that function i return
return(log1Parse[0] + log2Parse[0] + log1Parse[1] + log2Parse[1])
but this only works for two instances of ---. Is there anyway i can change my return to look at all of the indexes instead of having to manually put in [0],[1],[2], etc.? I need it to delimit and merge for an arbitrary amount. Please help!!
p.s. disregard the noobness...im a novice, we all gotta start somewhere
p.p.s. - the from scanner import scan is a scanner i wrote to take in all of the tokens in a given list
so.py:
import sys
def main():
# check and load command line arguments
# your code
if len(sys.argv) != 4:
print('Incorrect number of arguments, should be 3')
sys.exit(1)
# open files using file io
# your code
f1 = open(log1, 'r')
f2 = open(log2, 'r')
# list comprehension to process and filter log files
l1 = [ x.strip().split(" ",2) for x in f1.readlines() if x.strip() != "---" ]
l2 = [ x.strip().split(" ",2) for x in f2.readlines() if x.strip() != "---" ]
f1.close()
f2.close()
sorted_merged_lists = sorted(l1 + l2)
results = [ x for x in sorted_merged_lists if x[1] == name ]
for result in results:
print result
main()
CLI:
$ python so.py log1.txt log2.txt Matt
['12:06:12', 'Matt', 'Logged In']
['13:30:07', 'Matt', 'Opened Terminal']
['15:02:00', 'Matt', 'Opened Evolution']
['15:31:16', 'Matt', 'Logged Out']
docs:
http://docs.python.org/release/3.0.1/tutorial/datastructures.html#list-comprehensions
http://docs.python.org/release/3.0.1/library/stdtypes.html?highlight=strip#str.strip
http://docs.python.org/release/3.0.1/library/stdtypes.html?highlight=split#str.split
http://docs.python.org/release/3.0.1/library/functions.html?highlight=sorted#sorted

Resources