I wrote a small python script to read from a CSV, check for certain values, then write results to another CSV file.
Here is my code
`
import pandas as pd
import os
current_directory = os.getcwd()
pf = pd.read_csv(current_directory + '\Appointments.csv')
total_kept = {}
for index, row in pf.iterrows():
if total_kept.get(row['Acct#']) not in total_kept:
values = {"row['Acct#']": 0}
total_kept.update(values) //this line and the above one are supposed to initalize values to 0
if row['Status'] == 'K':
total_kept[row['Acct#']] = total_kept.get(row['Acct#'],0) + 1
else:
if row['Status'] == 'K':
total_kept[row['Acct#']] = total_kept.get(row['Acct#'],0) + 1
with open("results.csv", 'w') as f:
for key in total_kept.keys():
f.write("%s, %s\n" % (key, total_kept[key]))
`
For some reason, it is not adding ID's with no 'K' to the dictionary when they should have a value of 0. I am trying to count how many 'K' statuses from each ID. My thought was to create a dictionary with the key being the ID and the value being the number of 'K' instances.
Related
I am trying to figure out how to make my pandas DataFrame group data together. Currently, if you input data, for example: 1, 1, 2, 2, 3 - it will come out like this:
column1 column2
1: 2
2: 2
3: 1
I would like it to just group the data if they have the same value in column2 and for this example, just show 2 : 2, meaning 2 of the numbers inserted were both inserted twice, so we will just count that. Here is my current code:
from collections import Counter
import time
filename = input("Enter name to save this file as:\n\n")
print("Your file will be saved as: " + filename + ".csv\n\n")
time.sleep(0.5)
print("Please enter information")
contents = []
def amount():
while True:
try:
line = input()
except EOFError:
break
contents.append(line)
return
amount()
count = Counter(contents)
print(count)
import pandas as pd
d = count
df = pd.DataFrame.from_dict(d, orient='index').reset_index()
df.columns =['column1', 'column2']
df.sort_values(by=['column2'].value_counts())
df.to_csv(filename + ".csv", encoding='utf-8', index=False)
Any help would be appreciated.
I have tried in this using value_counts() in Pandas, not sure if it'll work for you!
values = [1, 1, 2, 2, 3]
df = pd.DataFrame(values, columns=['values'])
res = df['values'].value_counts().to_frame().reset_index().sort_values('index')
# renaming the columns
res.columns = ['Values', 'Count']
display(res)
I have a code which iterates through the text, and tells me which is the maximum amount of times each dna STR is found. The only step missing to be able to match these values with the CSV file, is to store them into a list, BUT I AM NOT ABLE TO DO SO. When I run the code, the maximum values are printed independently for each STR sequence.
I have tried to "append" the values into a list, but I was not successful, thus, I cannot match it with the dna sequences of the CSV (large nor small).
Any help or advcise is greatly appreciated!
Here is my code, and the results I get with using "text 1" and "small csv":
`
import cs50
import sys
import csv
import os
if len(sys.argv) != 3:
print("Usage: python dna.py data.csv sequence.txt")
csv_db = sys.argv[1]
file_seq = sys.argv[2]
with open(csv_db, newline='') as csvfile: #with open(csv_db) as csv_file
csv_reader = csv.reader(csvfile, delimiter=',')
header = next(csv_reader)
i = 1
while i < len(header):
STR = header[i]
len_STR = len(STR)
with open(file_seq, 'r') as my_file:
file_reader = my_file.read()
counter = 0
a = 0
b = len_STR
list = []
for text in file_reader:
if file_reader[a:b] != STR:
a += 1
b += 1
else:
counter += 1
a += len_STR
b += len_STR
list.append(counter)
print(list)
i += 1
`
The problem is in place of variable "list" declaration. Every time you iterates through STRs in variable "header" you declares:
list = []
Thus, you create absolutely new variable, which stores only the length of current STR. To make a list with all STRs appended you need to declare variable "list" before the while loop and operator "print" after the while loop:
list = []
while i < len(header):
<your loop code>
print(list)
This should solve your problem.
P.S. Avoid to use "list" as a variable declaration. The "list" is a python built-in function and it is automatically declared. So, when you redeclare it, you will not be able to use list() function in your code.
I am learning Python. I tried to get the keys of a dictionary. But I only get the last key. In my understanding, method keys() is used to get all keys in the dictionary.
Following are my questions?
1. Why I cannot get all keys?
2. If I have a dictionary, how can I get the value if I know the key? e.g. dict = {'Ben':8, 'Joe':7, 'Mary' : 9}. How can I input the key = "Ben", so the program can output the value 8? The tutorial shows that the key must be immutable. This constraint is very inconvenient when trying to get a value with a given key.
Any suggestion would be highly appreciated.
Here are my code.
import os, tarfile, urllib
work_path = os.getcwd()
input_control_file = "input_control"
import os, tarfile, urllib
work_path = os.getcwd()
input_control_file = "input_control"
input_control= work_path + "/" + input_control_file
#open control file if file exist
#read setting info
try:
#if the file does not exist,
#then it would throw an IOError
f = open(input_control, 'r')
#define dictionary/hash table
for LINE in f:
LINE = LINE.strip() #remove leading and trailing whitespace
lst = LINE.split() #split string into lists
lst[0] = lst[0].split(":")[0]
dic = {lst[0].strip():lst[1].strip()}
except IOError:
# print(os.error) will <class 'OSError'>
print("Reading file error. File " + input_control + " does not exist.")
#get keys
def getkeys(dict):
return list(dict.keys())
print("l39")
print(getkeys(dic))
print("end")
Below are the outputs.
l39
['source_type']
end
The reason is that you are reassigning variable dic again in for loop. You are not updating or adding the dictionary, instead you are reassigning the variable. In that case, dic will have only the last entry. You can change your for loop to:
dic = {}
for LINE in f:
LINE = LINE.strip() #remove leading and trailing whitespace
lst = LINE.split() #split string into lists
lst[0] = lst[0].split(":")[0]
dic.update({lst[0].strip():lst[1].strip()}) # update the dictionary with new values.
For your other question, if you have the dictionary dic = {'Ben':8, 'Joe':7, 'Mary' : 9}, then you can get the value by: dic['Ben']. It will return the value 8 or will raise KeyError if key Ben is not found in the dictionary. To avoid KeyError, you can use the get() method of dictionary. It will return None if provided key is not found in the dictionary.
val = dic['Ben'] # returns 8
val = dic['Hen'] # will raise KeyError
val = dic.get('Hen') # will return None
In your for loop, you are re-initializing the dictionary value, while you need to update the dictionary, i.e., append the key-value pair to the pre-existing dictionary. For this, use
dic.update({lst[0].strip() : lst[1].strip()})
This will update the key-value pair to the dictionary. Now, when you use dic.keys(), you will get all the keys of dic, as a list.
As for your second question, access the dictionary, just like accessing a list, except that list is accessed with indices, and dictionary will be accessed by keys. Say, you have a list and a dictionary as
lst = [1, 2, 3, 4, 5]
dic = {'a' : 1, 'b' : 2, 'c' : 3, 'd' : 4, 'e' : 5}
To get value 2 from list, you do lst[1], i.e., value at index 1. Similarly, if you want to get the value 2 from dictionary, do dic['b'], i.e., value of key 'b'. It is as simple as that.
I am using python3 and pandas to create a script that will:
Be dynamic across different dataset lengths(rows) and unique values - completed
Take unique values from column A and create separate dataframes as variables for each unique entry - completed
Add totals to the bottom of each dataframe - completed
Concatenate the separate dataframes back together - incomplete
The issue is I am unable to formulate a way to create a list of the variables in use and apply them as arg in to the command pd.concat.
The sample dataset. The dataset may have more unique BrandFlavors or less which is why the script must be flexible and dynamic.
Script:
import pandas as pd
import warnings
warnings.simplefilter(action='ignore')
excel_file = ('testfile.xlsx')
df = pd.read_excel(excel_file)
df = df.sort_values(by='This', ascending=False)
colarr = df.columns.values
arr = df[colarr[0]].unique()
for i in range(len(arr)):
globals()['var%s' % i] = df.loc[df[colarr[0]] == arr[i]]
for i in range(len(arr)):
if globals()['var%s' % i].empty:
''
else:
globals()['var%s' % i] = globals()['var%s' % i].append({'BrandFlavor':'Total',
'This':globals()['var%s' % i]['This'].sum(),
'Last':globals()['var%s' % i]['Last'].sum(),
'Diff':globals()['var%s' % i]['Diff'].sum(),
'% Chg':globals()['var%s' % i]['Diff'].sum()/globals()['var%s' % i]['Last'].sum() * 100}, ignore_index=True)
globals()['var%s' % i]['% Chg'].fillna(0, inplace=True)
globals()['var%s' % i].fillna(' ', inplace=True)
I have tried this below, however the list is a series of strings
vararr = []
count = 0
for x in range(len(arr)):
vararr.append('var' + str(count))
count = count + 1
df = pd.concat([vararr])
pd.concat does not recognize a string. I tired to build a class with an arg defined but had the same issue.
The desired outcome would be a code snippet that generated a list of variables that matched the ones created by lines 9/10 and could be referenced by pd.concat([list, of, vars, here]). It must be dynamic. Thank you
Just fixing the issue at hand, you shouldn't use globals to make variables, that is not considered good practice. Your code should work with some minor modifications.
import pandas as pd
import warnings
warnings.simplefilter(action='ignore')
excel_file = ('testfile.xlsx')
df = pd.read_excel(excel_file)
df = df.sort_values(by='This', ascending=False)
def good_dfs(dataframe):
if dataframe.empty:
pass
else:
this = dataframe.This.sum()
last = dataframe.Last.sum()
diff = dataframe.Diff.sum()
data = {
'BrandFlavor': 'Total',
'This': this,
'Last': last,
'Diff': diff,
'Pct Change': diff / last * 100
}
dataframe.append(data, ignore_index=True)
dataframe['Pct Change'].fillna(0.0, inplace=True)
dataframe.fillna(' ', inplace=True)
return dataframe
colarr = df.columns.values
arr = df[colarr[0]].unique()
dfs = []
for i in range(len(arr)):
temp = df.loc[df[colarr[0]] == arr[i]]
dfs.append(temp)
final_dfs = [good_dfs(d) for d in dfs]
final_df = pd.concat(final_dfs)
Although I will say, there are far easier ways to accomplish what you want without doing all of this, however that can be a separate question.
I have data in csv - 2 columns, 1st column contains member id and second contains characteristics in Key-Value pairs (nested one under another).
I have seen online codes which convert a simple Key-value pairs but not able to transform data like what i have shown above
I want to transform this data into a excel table as below
I did it with this XlsxWriter package, so first you have to install it by running pip install XlsxWriter command.
import csv # to read csv file
import xlsxwriter # to write xlxs file
import ast
# you can change this names according to your local ones
csv_file = 'data.csv'
xlsx_file = 'data.xlsx'
# read the csv file and get all the JSON values into data list
data = []
with open(csv_file, 'r') as csvFile:
# read line by line in csv file
reader = csv.reader(csvFile)
# convert every line into list and select the JSON values
for row in list(reader)[1:]:
# csv are comma separated, so combine all the necessary
# part of the json with comma
json_to_str = ','.join(row[1:])
# convert it to python dictionary
str_to_dict = ast.literal_eval(json_to_str)
# append those completed JSON into the data list
data.append(str_to_dict)
# define the excel file
workbook = xlsxwriter.Workbook(xlsx_file)
# create a sheet for our work
worksheet = workbook.add_worksheet()
# cell format for merge fields with bold and align center
# letters and design border
merge_format = workbook.add_format({
'bold': 1,
'border': 1,
'align': 'center',
'valign': 'vcenter'})
# other cell format to design the border
cell_format = workbook.add_format({
'border': 1,
})
# create the header section dynamically
first_col = 0
last_col = 0
for index, value in enumerate(data[0].items()):
if isinstance(value[1], dict):
# this if mean the JSON key has something else
# other than the single value like dict or list
last_col += len(value[1].keys())
worksheet.merge_range(first_row=0,
first_col=first_col,
last_row=0,
last_col=last_col,
data=value[0],
cell_format=merge_format)
for k, v in value[1].items():
# this is for go in deep the value if exist
worksheet.write(1, first_col, k, merge_format)
first_col += 1
first_col = last_col + 1
else:
# 'age' has only one value, so this else section
# is for create normal headers like 'age'
worksheet.write(1, first_col, value[0], merge_format)
first_col += 1
# now we know how many columns exist in the
# excel, and set the width to 20
worksheet.set_column(first_col=0, last_col=last_col, width=20)
# filling values to excel file
for index, value in enumerate(data):
last_col = 0
for k, v in value.items():
if isinstance(v, dict):
# this is for handle values with dictionary
for k1, v1 in v.items():
if isinstance(v1, list):
# this will capture last 'type' list (['Grass', 'Hardball'])
# in the 'conditions'
worksheet.write(index + 2, last_col, ', '.join(v1), cell_format)
else:
# just filling other values other than list
worksheet.write(index + 2, last_col, v1, cell_format)
last_col += 1
else:
# this is handle single value other than dict or list
worksheet.write(index + 2, last_col, v, cell_format)
last_col += 1
# finally close to create the excel file
workbook.close()
I commented out most of the line to get better understand and reduce the complexity because you are very new to Python. If you didn't get any point let me know, I'll explain as much as I can. Additionally I used enumerate() python Built-in Function. Check this small example which I directly get it from original documentation. This enumerate() is useful when numbering items in the list.
Return an enumerate object. iterable must be a sequence, an iterator, or some other object which supports iteration. The __next__() method of the iterator returned by enumerate() returns a tuple containing a count (from start which defaults to 0) and the values obtained from iterating over iterable.
>>> seasons = ['Spring', 'Summer', 'Fall', 'Winter']
>>> list(enumerate(seasons))
[(0, 'Spring'), (1, 'Summer'), (2, 'Fall'), (3, 'Winter')]
>>> list(enumerate(seasons, start=1))
[(1, 'Spring'), (2, 'Summer'), (3, 'Fall'), (4, 'Winter')]
Here is my csv file,
and here is the final output of the excel file. I just merged the duplicate header values (matchruns and conditions).