Never resets list - python-3.x

I am trying to create a calorie counter. The standard input goes like this:
python3 calories.txt < test.txt
Inside calories.txt the food is listed in the following format: apples 500
The problem I am having is that whenever I calculate the values for a person, the list never seems to reset to empty.
import sys

food = {}
eaten = {}
finished = {}
total = 0

# mappings
def calories(x):
    with open(x, "r") as file:
        for line in file:
            lines = line.strip().split()
            key = " ".join(lines[0:-1])
            value = lines[-1]
            food[key] = value

def calculate(x):
    a = []
    for keys, values in x.items():
        for c in values:
            try:
                a.append(int(food[c]))
            except:
                a.append(100)
        print("before", a)
        a = []
        total = sum(a)  # Problem here
        print("after", a)
        print(total)

def main():
    calories(sys.argv[1])
    for line in sys.stdin:
        lines = line.strip().split(',')
        for c in lines:
            values = lines[0]
            keys = lines[1:]
            eaten[values] = keys
        calculate(eaten)

if __name__ == '__main__':
    main()
Edit - forgot to include what test.txt would look like:
joe,almonds,almonds,blue cheese,cabbage,mayonnaise,cherry pie,cola
mary,apple pie,avocado,broccoli,butter,danish pastry,lettuce,apple
sandy,zuchini,yogurt,veal,tuna,taco,pumpkin pie,macadamia nuts,brazil nuts
trudy,waffles,waffles,waffles,chicken noodle soup,chocolate chip cookie

How to make it easier on yourself:
When reading the calories data, convert the calories with int() right away; that way you don't need to convert every time you want to sum something.
A dict has a .get(key, defaultvalue) accessor, so "if food not found, use 100 as default" is a one-liner without try: ... except:.
This works for me when supplying the second file as a file argument as well, instead of piping it into the program using <.
I modified some of the parsing to strip whitespace, and calculate now returns a [(name, cal), ...] list of tuples.
May it help you to fix it to your liking:
import sys

food = {}
eaten = {}

def calories(x):
    with open(x, "r") as file:
        for line in file:
            lines = line.strip().split()
            key = " ".join(lines[0:-1])
            value = lines[-1].strip()  # ensure no whitespace in value
            food[key] = int(value)

def getCal(foodlist, defValueUnknown=100):
    """Get sum / total calories of a list of ingredients; unknown ones cost 100."""
    return sum(food.get(x, defValueUnknown) for x in foodlist)

def calculate(x):
    a = []
    for name, foods in x.items():
        a.append((name, getCal(foods)))  # append as (name, calories) tuple for each person
    return a

def main():
    calories(sys.argv[1])
    with open(sys.argv[2]) as f:  # parse as file, not piped in via sys.stdin
        for line in f:
            lines = line.strip().split(',')
            values = lines[0].strip()
            keys = [x.strip() for x in lines[1:]]  # ensure no whitespace in names
            eaten[values] = keys
    calced = calculate(eaten)  # calculate after all are read into the dict
    print(calced)

if __name__ == '__main__':
    main()
Output:
[('joe', 1400), ('mary', 1400), ('sandy', 1600), ('trudy', 1000)]
Using sys.stdin and piping just led to my console blinking and waiting for manual input - maybe VS related...
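To address the original bug directly: in calculate, the list is cleared with a = [] before total = sum(a) runs, so the sum is always 0. A minimal sketch of a fix (variable names are illustrative), keeping one total per person by building a fresh list each iteration and summing it before it is discarded:

def calculate(x):
    # Build a fresh list per person, sum it, then let it go out of
    # scope instead of reusing and manually clearing one shared list.
    for name, foods in x.items():
        cals = [int(food.get(c, 100)) for c in foods]
        print(name, sum(cals))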

Related

Read data from txt file, store it, use it for analyzing, write it to the txt file

The task is to read the data from a given txt file and add the numbers in it to a list, so that every number in a row becomes an element/object in this list. After reading the file, the created list is sent to main().
This list of objects is then the parameter for the def Analyze part, where the min, max, average and sum are found at the same time.
def lueTiedosto(data):
    Tiedosto = open("L07T4D1.txt", 'r', encoding="UTF-8")
    Rivi = Tiedosto.readline()
    while (len(Rivi) > 0):
        data.append(int(Rivi))
        Rivi = Tiedosto.readline()
    for element in data:
        print(element)
    print(f"Tiedosto L07T4D1.txt luettu.")  # "File L07T4D1.txt read."
    Tiedosto.close()
    return element
The fixed code which works:
def lueTiedosto(data):
    Lue = input("Luettavan tiedoston nimi on ''.\n")  # "The name of the file to read is ''."
    print(f"Anna uusi nimi, enter säilyttää nykyisen: ", end='')  # "Give a new name, enter keeps the current one: "
    Tiedosto = open(Lue, 'r', encoding="UTF-8")
    Rivi = Tiedosto.readline()
    while (len(Rivi) > 0):
        data.append(int(Rivi))
        Rivi = Tiedosto.readline()
    print(f"Tiedosto '{Lue}' luettu.")  # "File '{Lue}' read."
    Tiedosto.close()
    return data
Making an assumption that your input file is similar to the following:
10000
12345
10008
12000
I would do the following:
filepath = r".......\L07T4D1.txt"  # Path to file being loaded

def readData(filepath: str) -> list[int]:
    # Returns a list of integers from file
    rslt = []
    with open(filepath, 'r') as f:
        data = f.readline().strip()
        while data:
            data = data.split(' ')
            rslt.append(int(data[0]))
            data = f.readline().strip()
    return rslt
def analyze(data: list[int]) -> None:
    # prints results of data analysis
    print(f'Max Value = {max(data)}')
    print(f'Min Value = {min(data)}')
    print(f'Sum Value = {sum(data)}')
    print(f'Avg Value = {sum(data)/len(data)}')
Running analyze(readData(filepath)) yields:
Max Value = 12345
Min Value = 10000
Sum Value = 44353
Avg Value = 11088.25
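For comparison, a more compact sketch of the same read step using a list comprehension; it assumes the same one-number-per-line file, that blank lines should be skipped, and uses a hypothetical name read_data:

def read_data(filepath: str) -> list[int]:
    # Take the first whitespace-separated token of each non-blank line as an int.
    with open(filepath, encoding="UTF-8") as f:
        return [int(line.split()[0]) for line in f if line.strip()]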

pd.rename key KeyError: 'New_Name'

Edit 12/07/19: The problem was not in fact with the pd.rename function but the fact that I did not return the pandas DataFrame from the function, and as a result the column change did not exist when printing. i.e.
def change_column_names(as_pandas, old_name, new_name):
    as_pandas.rename(columns={old_name: new_name}, inplace=True)
    return as_pandas  # <- This was missing
Please see the user comment below and upvote them for finding this error for me.
Alternatively, you can continue reading.
The data can be downloaded from this link, yet I have added a sample dataset. The file is not formatted like a typical CSV; I believe this may have been an assessment piece related to the Hidden Decision Tree article. I have included the portion of the code that handles the format of the text file, as mentioned above, and allows the user to rename a column.
The problem occurred when I tried to create a renaming function:
def change_column_names(as_pandas, old_name, new_name):
    as_pandas.rename(columns={old_name: new_name}, inplace=True)
However, it seemed to work when I hard-coded the column names inside the rename function:
def change_column_names(as_pandas):
    as_pandas.rename(columns={'Unique Pageviews': 'Page_Views'}, inplace=True)
    return as_pandas
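For reference, a minimal sketch of the two rename idioms (the column names here are made up). With inplace=True, rename() mutates the frame and returns None, so the result must not be assigned; without it, rename() returns a new frame that must be kept:

import pandas as pd

df = pd.DataFrame({'Unique Pageviews': [5608, 360]})

# Idiom 1: mutate in place; rename() returns None here.
df.rename(columns={'Unique Pageviews': 'Page_Views'}, inplace=True)

# Idiom 2: return a new, renamed frame; keep the result.
df2 = df.rename(columns={'Page_Views': 'Views'})
print(df2.columns.tolist())  # ['Views']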
Sample Dataset
Title URL Date Unique Pageviews
"An Exclusive Interview with Data Expert, John Bottega" http://www.datasciencecentral.com/forum/topics/an-exclusive-interview-with-data-expert-john-bottega?groupUrl=announcements 10-Jun-14 360
Announcing Composable Analytics http://www.datasciencecentral.com/forum/topics/announcing-composable-analytics 15-Jun-14 367
Announcing the release of Spark 1.5 http://www.datasciencecentral.com/forum/topics/announcing-the-release-of-spark-1-5 12-Sep-15 156
Are Extreme Weather Events More Frequent? The Data Science Answer http://www.datasciencecentral.com/forum/topics/are-extreme-weather-events-more-frequent-the-data-science-answer 5-Oct-15 204
Are you interested in joining the University of California for an empiricalstudy on 'Big Data'? http://www.datasciencecentral.com/forum/topics/are-you-interested-in-joining-the-university-of-california-for-an 7-Feb-13 204
Are you smart enough to work at Google? http://www.datasciencecentral.com/forum/topics/are-you-smart-enough-to-work-at-google 11-Oct-15 3625
"As a software engineer, what's the best skill set to have for the next 5-10years?" http://www.datasciencecentral.com/forum/topics/as-a-software-engineer-what-s-the-best-skill-set-to-have-for-the- 12-Feb-16 2815
A Statistician's View on Big Data and Data Science (Updated) http://www.datasciencecentral.com/forum/topics/a-statistician-s-view-on-big-data-and-data-science-updated-1 21-May-14 163
A synthetic variance designed for Hadoop and big data http://www.datasciencecentral.com/forum/topics/a-synthetic-variance-designed-for-hadoop-and-big-data?groupUrl=research 26-May-14 575
A Tough Calculus Question http://www.datasciencecentral.com/forum/topics/a-tough-calculus-question 10-Feb-16 937
Attribution Modeling: Key Analytical Strategy to Boost Marketing ROI http://www.datasciencecentral.com/forum/topics/attribution-modeling-key-concept 24-Oct-15 937
Audience expansion http://www.datasciencecentral.com/forum/topics/audience-expansion 6-May-13 223
Automatic use of insights http://www.datasciencecentral.com/forum/topics/automatic-use-of-insights 27-Aug-15 122
Average length of dissertations by higher education discipline. http://www.datasciencecentral.com/forum/topics/average-length-of-dissertations-by-higher-education-discipline 4-Jun-15 1303
This is the full code that produces the Key Error:
import csv
import pandas as pd

def change_column_names(as_pandas):
    as_pandas.rename(columns={'Unique Pageviews': 'Page_Views'}, inplace=True)

def change_column_names(as_pandas, old_name, new_name):
    as_pandas.rename(columns={old_name: new_name}, inplace=True)

def open_as_dataframe(file_name_in):
    reader = pd.read_csv(file_name_in, encoding='windows-1251')
    return reader

# Get each column of data including the heading and separate each element
# i.e. Title, URL, Date, Page Views
# and save to string_of_rows with comma separator for storage as a csv
# file.
def get_columns_of_data(*args):
    # Function that accepts variable length arguments
    string_of_rows = str()
    num_cols = len(args)
    try:
        if num_cols > 0:
            for number, element in enumerate(args):
                if number == (num_cols - 1):
                    string_of_rows = string_of_rows + element + '\n'
                else:
                    string_of_rows = string_of_rows + element + ','
    except UnboundLocalError:
        print('Empty file \'or\' No arguments received, cannot be zero')
    return string_of_rows

def open_file(file_name):
    try:
        with open(file_name) as csv_file_in, open('HDT_data5.txt', 'w') as csv_file_out:
            csv_read = csv.reader(csv_file_in, delimiter='\t')
            for row in csv_read:
                try:
                    row[0] = row[0].replace(',', '')
                    csv_file_out.write(get_columns_of_data(*row))
                except TypeError:
                    continue
        print("The file name '{}' was successfully opened and read".format(file_name))
    except IOError:
        print('File not found \'OR\' Not in current directory\n')

# All acronyms used in variable naming correspond to the function at time
# of return from function.
# csv_list being a list of the csv file contents; the remainder i.e. 'st' of
# csv_list_st = split_title().
def main():
    open_file('HDTdata3.txt')
    multi_sets = open_as_dataframe('HDT_data5.txt')
    # change_column_names(multi_sets)
    change_column_names(multi_set, 'Old_Name', 'New_Name')
    print(multi_sets)

main()
I cleaned up your code so it would run. You were changing the column names but not returning the result. Try the following:
import pandas as pd
import numpy as np
import math

def set_new_columns(as_pandas):
    titles_list = ['Year > 2014', 'Forum', 'Blog', 'Python', 'R',
                   'Machine_Learning', 'Data_Science', 'Data',
                   'Analytics']
    for number, word in enumerate(titles_list):
        as_pandas.insert(len(as_pandas.columns), titles_list[number], 0)

def title_length(as_pandas):
    # Insert new column header then count the number of letters in 'Title'
    as_pandas.insert(len(as_pandas.columns), 'Title_Length', 0)
    as_pandas['Title_Length'] = as_pandas['Title'].map(str).apply(len)

# Although it is a log, the difference logX1 - logX2 is an inverse linear
# comparison, so you could think of it as the percentage change in Page
# Views. The map function allows a function to be performed on all rows
# in column 'Page_Views'.
def log_page_view(as_pandas):
    # Insert new column header
    as_pandas.insert(len(as_pandas.columns), 'Log_Page_Views', 0)
    as_pandas['Log_Page_Views'] = as_pandas['Page_Views'].map(lambda x: math.log(1 + float(x)))

def change_to_numeric(as_pandas):
    # Check for missing values then convert the column to numeric.
    as_pandas = as_pandas.replace(r'^\s*$', np.nan, regex=True)
    as_pandas['Page_Views'] = pd.to_numeric(as_pandas['Page_Views'],
                                            errors='coerce')

def change_column_names(as_pandas):
    as_pandas.rename(columns={'Unique Pageviews': 'Page_Views'}, inplace=True)
    return as_pandas

def open_as_dataframe(file_name_in):
    reader = pd.read_csv(file_name_in, encoding='windows-1251')
    return reader

# Get each column of data including the heading and separate each element
# i.e. Title, URL, Date, Page Views
# and save to string_of_rows with comma separator for storage as a csv
# file.
def get_columns_of_data(*args):
    # Function that accepts variable length arguments
    string_of_rows = str()
    num_cols = len(args)
    try:
        if num_cols > 0:
            for number, element in enumerate(args):
                if number == (num_cols - 1):
                    string_of_rows = string_of_rows + element + '\n'
                else:
                    string_of_rows = string_of_rows + element + ','
    except UnboundLocalError:
        print('Empty file \'or\' No arguments received, cannot be zero')
    return string_of_rows

def open_file(file_name):
    import csv
    try:
        with open(file_name) as csv_file_in, open('HDT_data5.txt', 'w') as csv_file_out:
            csv_read = csv.reader(csv_file_in, delimiter='\t')
            for row in csv_read:
                try:
                    row[0] = row[0].replace(',', '')
                    csv_file_out.write(get_columns_of_data(*row))
                except TypeError:
                    continue
        print("The file name '{}' was successfully opened and read".format(file_name))
    except IOError:
        print('File not found \'OR\' Not in current directory\n')

# All acronyms used in variable naming correspond to the function at time
# of return from function.
# csv_list being a list of the csv file contents; the remainder i.e. 'st' of
# csv_list_st = split_title().
def main():
    open_file('HDTdata3.txt')
    multi_sets = open_as_dataframe('HDT_data5.txt')
    multi_sets = change_column_names(multi_sets)
    change_to_numeric(multi_sets)
    log_page_view(multi_sets)
    title_length(multi_sets)
    set_new_columns(multi_sets)
    print(multi_sets)

main()
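One clarification of the log comment above: the difference of logs is the log of the ratio, log(x1) - log(x2) = log(x1/x2), and for small relative changes log(x1/x2) ≈ (x1 - x2)/x2. That is why a difference in Log_Page_Views approximates the percentage change in page views: for example, log(110) - log(100) = log(1.1) ≈ 0.0953, close to the 10% change.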

Creating a dictionary to count the number of occurrences of Sequence IDs

I'm trying to write a function to count the number of times each sequence ID occurs in a file (it's a sample BLAST tabular output file).
def count_seq(input):
    dic1 = {}
    count = 0
    for line in input:
        if line.startswith('#'):
            continue
        if line.find('hits found'):
            line = line.split('\t')
            if line[1] in dic1:
                dic1[line] += 1
            else:
                dic1[line] = 1
    return dic1
Above is my code, which when called just returns empty brackets {}.
I'm trying to count how many times each sequence ID (the second element of the last 13 lines) occurs, e.g. FO203510.1 occurs 4 times.
Any help would be appreciated immensely, thanks!
Maybe this is what you're after:
def count_seq(input_file):
    dic1 = {}
    with open(input_file, "r") as f:
        for line in f:
            line = line.strip()
            if not line.startswith('#'):
                line = line.split()
                seq_id = line[1]
                if not seq_id in dic1:
                    dic1[seq_id] = 1
                else:
                    dic1[seq_id] += 1
    return dic1

print(count_seq("blast_file"))
This is a fitting case for collections.defaultdict. Let f be the file object. Assuming the sequences are in the second column, it's only a few lines of code as shown.
from collections import defaultdict

d = defaultdict(int)
seqs = (line.split()[1] for line in f if not line.strip().startswith("#"))
for seq in seqs:
    d[seq] += 1
See if it works!
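A collections.Counter condenses this even further; a sketch under the same assumptions (f is an open file object, sequence IDs in the second whitespace-separated column):

from collections import Counter

# Counter tallies the generator of sequence IDs directly.
counts = Counter(line.split()[1] for line in f if not line.strip().startswith("#"))
print(counts.most_common())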

How to print results from this function

I'm new to Python and programming in general and need a little help with this (partially finished) function. It reads a text file with a bunch of rows of comma-delimited data (age, salary, education and so on). However, I've run into a problem from the outset: I don't know how to return the results.
My aim is to create dictionaries for each category, with each row sorted and tallied,
e.g. 100 people over 50, 200 people under 50 and so on.
Am I in the correct ballpark?
file = "adultdata.txt"

def make_data(file):
    try:
        f = open(file, "r")
    except IOError as e:
        print(e)
        return none

    large_list = []
    avg_age = 0
    row_count_under50 = 0
    row_count_over50 = 0

    # create 2 dictionaries per category
    employ_dict_under50 = {}
    employ_dict_over50 = {}

    for row in f:
        edited_row = row.strip()
        my_list = edited_row.split(",")
        try:
            # Age Category
            my_list[0] = int(my_list[0])
            # Work Category
            if my_list[-1] == " <=50K":
                if my_list[1] in employ_dict_under50:
                    employ_dict_under50[my_list[1]] += 1
                else:
                    employ_dict_under50[my_list[1]] = 1
                row_count_u50 += 1
            else:
                if my_list[1] in emp_dict_o50:
                    employ_dict_over50[my_list[1]] += 1
                else:
                    employ_dict_over50[my_list[1]] = 1
                row_count_o50 += 1
            # Other categories here
            print(my_list)
            # print(large_list)
            # return
            # Ignored categories here - e.g. my_list[insert my list numbers here] = None
I do not have access to your file, but I had a go at correcting most of the errors in your code.
This is the list of mistakes I found:
Your function make_data is essentially unused and out of scope; you can remove it entirely.
When using a file object f, you can iterate over it directly (or call readline) to extract data from the file.
It is also best to use a with statement when using IO resources like files.
You had several badly named variables in the inner loop (row_count_u50, emp_dict_o50) which did not exist.
You declared a try in the inner loop without an except. You can remove the try because you are not trying to catch any error.
These are very basic errors related to general programming. Can I assume you're new to this? If that's the case, then you should probably follow some more beginner tutorials online until you get a grasp of what commands you need to use to perform basic tasks.
Try comparing your code to this and see if you can understand what I'm trying to say:
file = "adultdata.txt"

large_list = []
avg_age = 0
row_count_under50 = 0
row_count_over50 = 0

# create 2 dictionaries per category
employ_dict_under50 = {}
employ_dict_over50 = {}

with open(file, "r") as f:
    for row in f:
        edited_row = row.strip()
        my_list = edited_row.split(",")

        # Age Category
        my_list[0] = int(my_list[0])

        # Work Category
        if my_list[-1] == " <=50K":
            if my_list[1] in employ_dict_under50:
                employ_dict_under50[my_list[1]] += 1
            else:
                employ_dict_under50[my_list[1]] = 1
            row_count_under50 += 1
        else:
            if my_list[1] in employ_dict_over50:
                employ_dict_over50[my_list[1]] += 1
            else:
                employ_dict_over50[my_list[1]] = 1
            row_count_over50 += 1

        # Other categories here
        print(my_list)

# print(large_list)
I cannot say for certain if this code will work or not without your file but it should give you a head start.
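As a further simplification (not from the answer above, just a sketch): collections.Counter removes the if/else bookkeeping, assuming rows shaped like "39, State-gov, ..., <=50K" with age first, employment second and the income label last:

from collections import Counter

employ_under50 = Counter()
employ_over50 = Counter()

with open("adultdata.txt") as f:
    for row in f:
        fields = [part.strip() for part in row.strip().split(",")]
        if len(fields) < 2:
            continue  # skip blank or malformed rows
        # Pick the tally by income label, then count the employment field.
        bucket = employ_under50 if fields[-1] == "<=50K" else employ_over50
        bucket[fields[1]] += 1

print(employ_under50.most_common())
print(employ_over50.most_common())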

How to merge two lists at a delimited token in python3

I am a CS major at the University of Alabama. We have a project in our Python class and I am stuck... probably for some stupid reason, but I can't seem to find the answer.
Here is the link to the project, as it would be a pain to try and explain on here:
http://beastie.cs.ua.edu/cs150/projects/project1.html
Here is my code:
import sys
from scanner import scan

def clInput():
    # Gets command line input
    log1 = sys.argv[1]
    log2 = sys.argv[2]
    name = sys.argv[3]
    if len(sys.argv) != 4:
        print('Incorrect number of arguments, should be 3')
        sys.exit(1)
    return log1, log2, name

def openFiles(log1, log2):
    # Opens sys.argv[1]&[2] for reading
    f1 = open(log1, 'r')
    f2 = open(log2, 'r')
    return f1, f2

def merge(log1, log2):
    # Merges parsed logs into list without '---'
    log1Parse = [[]]
    log2Parse = [[]]
    log1Count = 0
    log2Count = 0
    for i in log1:
        if i != ['---']:
            log1Parse[log1Count].append(i)
        else:
            log1Count += 1
            log1Parse.append([])
    for i in log2:
        if i != ['---']:
            log2Parse[log2Count].append(i)
        else:
            log2Count += 1
            log2Parse.append([])
    return (log1Parse[0] + log2Parse[0] + log1Parse[1] + log2Parse[1])

def searchMerge(name, merged):
    # Searches merged list for sys.argv[3]
    for i in range(len(merged)):
        if (merged[i][1] == name):
            print(merged[i][0], merged[i][1], " ".join(merged[i][2:]))

def main():
    log1, log2, name = clInput()
    f1, f2 = openFiles(log1, log2)
    # Sets the contents of the two scanned files to variables
    tokens1 = scan(f1)
    tokens2 = scan(f2)
    # Call to merge and search
    merged = merge(tokens1, tokens2)
    searchMerge(name, merged)

main()
OK, so here's the problem: we are to merge two lists together into a sorted master list, delimited at the ---'s.
My two log files match the ones posted on the website I linked to above. This code works; however, if there are more than two instances of --- in each list, it will not jump to the next list to get the other tokens, and so forth. I have it working for two with the merge function. At the end of that function I return
return(log1Parse[0] + log2Parse[0] + log1Parse[1] + log2Parse[1])
but this only works for two instances of ---. Is there any way I can change my return to look at all of the indexes instead of having to manually put in [0], [1], [2], etc.? I need it to delimit and merge for an arbitrary amount. Please help!
p.s. Disregard the noobness... I'm a novice, we all gotta start somewhere.
p.p.s. The from scanner import scan is a scanner I wrote to take in all of the tokens in a given list.
so.py:
import sys

def main():
    # check and load command line arguments
    if len(sys.argv) != 4:
        print('Incorrect number of arguments, should be 3')
        sys.exit(1)
    log1, log2, name = sys.argv[1], sys.argv[2], sys.argv[3]

    # open files using file io
    f1 = open(log1, 'r')
    f2 = open(log2, 'r')

    # list comprehension to process and filter log files
    l1 = [x.strip().split(" ", 2) for x in f1.readlines() if x.strip() != "---"]
    l2 = [x.strip().split(" ", 2) for x in f2.readlines() if x.strip() != "---"]
    f1.close()
    f2.close()

    sorted_merged_lists = sorted(l1 + l2)
    results = [x for x in sorted_merged_lists if x[1] == name]
    for result in results:
        print(result)

main()
CLI:
$ python so.py log1.txt log2.txt Matt
['12:06:12', 'Matt', 'Logged In']
['13:30:07', 'Matt', 'Opened Terminal']
['15:02:00', 'Matt', 'Opened Evolution']
['15:31:16', 'Matt', 'Logged Out']
docs:
http://docs.python.org/release/3.0.1/tutorial/datastructures.html#list-comprehensions
http://docs.python.org/release/3.0.1/library/stdtypes.html?highlight=strip#str.strip
http://docs.python.org/release/3.0.1/library/stdtypes.html?highlight=split#str.split
http://docs.python.org/release/3.0.1/library/functions.html?highlight=sorted#sorted
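To answer the original question about an arbitrary number of --- groups directly: a hedged sketch, assuming log1Parse and log2Parse are built exactly as in the question's merge function, that interleaves however many groups exist instead of hand-writing [0], [1], ...:

from itertools import chain, zip_longest

def interleave(groups1, groups2):
    # Pair group i of each log (padding the shorter side with empty lists),
    # then flatten: g1[0] + g2[0] + g1[1] + g2[1] + ...
    paired = zip_longest(groups1, groups2, fillvalue=[])
    return list(chain.from_iterable(a + b for a, b in paired))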
