Python adding lists - python-3.5

I am importing data from a file, which is working correctly. I have appended the data from this file into three different lists: name, mark, and mark2. What I don't understand is how (or whether) I can make a new list called total_marks and append the result of mark + mark2 into it. I tried looking around for help on this but couldn't find much relating to it. The plan is to add the two lists together element by element and then work out a percentage, where the combined total is out of 150 marks.

To add the two lists item by item:
combined = []
for m1, m2 in zip(mark, mark2):
    combined.append(m1 + m2)
The zip function pairs up the items of the two lists, yielding one (m1, m2) pair for each position:
https://docs.python.org/3/library/functions.html#zip
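For example, with some made-up values:

mark = [70, 45]
mark2 = [65, 60]
print(list(zip(mark, mark2)))  # [(70, 65), (45, 60)]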
Then you can perform the final operation this way:
final = []
for m in combined:
    final.append(m / 150 * 100)
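The two steps can also be collapsed into a single list comprehension (same logic, just more compact):

final = [(m1 + m2) / 150 * 100 for m1, m2 in zip(mark, mark2)]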
As I said in my comment, I highly recommend that once you've gotten past the basics you take the time to learn two libraries: pandas and xlwings. They will greatly help your ability to move data between Python and Excel. An operation like the one you have here becomes much simpler once you learn pandas DataFrames.

Here is a better way, using pandas.
import pandas
df = pandas.read_csv('Classmarks.csv', index_col = 'student_name', names = ('student_name', 'mark1', 'mark2'), header = None)
df['combined'] = df['mark1'] + df['mark2']
df['final'] = df['combined'] / 150 * 100
print(df)
You don't have to write any loops with pandas. And you can then write the result back to a CSV file:
df.to_csv('Classmarksout.csv')
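For illustration, if Classmarks.csv contained rows like these (made-up names and marks, purely to show the shape of the result):

Alice,70,65
Bob,45,60

then the printed DataFrame would look roughly like:

              mark1  mark2  combined  final
student_name
Alice            70     65       135   90.0
Bob              45     60       105   70.0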


Trying to merge 2 dataframes but receiving a ValueError about merging object and int32 columns

I have been trying to address an issue mentioned here
I had been trying to use a list of dates to filter a dataframe, and a very gracious person was helping me, but now with the current code, I am receiving these errors.
# Assign a sequential number to each trading day
df_melt_test_percent = df_melt_test_percent.sort_index().assign(DayNumber=lambda x: range(len(x)))

# Find the indices of the FOMC_dates
tmp = pd.merge(
    df_FOMC_dates, df_melt_test_percent[['DayNumber']],
    left_on='FOMC_date', right_on='DayNumber'
)

# For each row, get the FOMC_dates ± 3 days
tmp['delta'] = tmp.apply(lambda _: range(-3, 4), axis=1)
tmp = tmp.explode('delta')
tmp['DayNumber'] += tmp['delta']

# Assemble the result
result = pd.merge(tmp, df_melt_test_percent, on='DayNumber')
(Screenshots of the dataframes were attached as images.)
If anyone has any advice on how to fix this, it would be greatly appreciated.
EDIT #1:
The columns on which you want to merge are not the same type in both dataframes. Likely one is a string and the other an integer. You should convert them to the same type before merging. Going by the little you showed, before your merge, run:
tmp['DayNumber'] = tmp['DayNumber'].astype(int)
Alternatively:
df_melt_test_percent['DayNumber'] = df_melt_test_percent['DayNumber'].astype(str)
NB: this might not work, as you did not provide a full example. Either work out the right types yourself or provide a reproducible example.
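If you are not sure which side needs converting, a minimal check (dataframe and column names taken from your snippet) is to print the dtypes before each merge and then align the key columns:

print(tmp.dtypes)
print(df_melt_test_percent.dtypes)

# make both merge keys the same type, e.g. integers
tmp['DayNumber'] = tmp['DayNumber'].astype(int)
df_melt_test_percent['DayNumber'] = df_melt_test_percent['DayNumber'].astype(int)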

Does anyone know how to pull averages from a text file for several different people?

I have a text file (Player_hits.text) that I am trying to pull player batting averages from. Similar to lines 179-189 I want to find an average. However, I do not want to find the average for the entire team. Instead, I want to find the average of every individual player on the team.
For instance, the text file is set up as such:
Player_hits.txt
In this file a 1 defines a hit and a 0 means the player did not get a hit. I am trying to pull an individual average for both players. (Alex = 0.500, Riley = 0.666)
If someone could help, that would be greatly appreciated!
Thanks!
Link to original code on repl.it: Baseball Stat-Tracking
(A screenshot of the JSONDecodeError traceback was attached as an image.)
The json.decoder.JSONDecodeError is raised because json.loads() doesn't interpret each line (e.g. [1, 'Riley']) as valid JSON. You can use ast to read each line in as a literal evaluation instead, storing it as a list element like [1, 'Riley'] in your list p_hits.
Then the second part is that you can convert to a DataFrame and group by the 'name' column. So jim has the right idea, but there are errors in that too (i.e. colmuns should be columns, and the items in the list need to be strings, ['hit', 'name'], not undeclared variables).
import pandas as pd
import ast

p_hits = []
with open('Player_hits.txt') as hits:
    for line in hits:
        l = ast.literal_eval(line)
        p_hits.append(l)

df = pd.DataFrame(p_hits, columns=['hit', 'name'])
Output (with an example dataset I made):

print(df.groupby(['name']).mean())

            hit
name
Matt   0.714286
Riley  0.285714
Todd   0.500000
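If you'd rather not bring in pandas at all, the same per-player averages can be computed in plain Python. This is just a sketch, assuming each line of Player_hits.txt is a Python literal like [1, 'Riley'], as described above:

import ast
from collections import defaultdict

totals = defaultdict(lambda: [0, 0])  # name -> [hits, at-bats]
with open('Player_hits.txt') as hits:
    for line in hits:
        hit, name = ast.literal_eval(line)
        totals[name][0] += hit
        totals[name][1] += 1

for name, (h, ab) in totals.items():
    print(name, round(h / ab, 3))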
import pandas as pd
import json

p_hits = []
with open('Player_hits.txt') as hits:
    for line in hits:
        l = json.loads(line)
        p_hits.append(l)

df = pd.DataFrame.from_records(p_hits, colmuns=[hit, name])
df.groupby(['name']).mean()

Stuck using pandas to build RPG item generator

I am trying to build a simple random item generator for a game I am working on.
So far I am stuck trying to figure out how to store and access all of the data. I went with pandas using .csv files to store the data sets.
I want to add weighted probabilities to what items are generated so I tried to read the csv files and compile each list into a new set.
I got the program to pick a random set but got stuck when trying to pull a random row from that set.
I am getting an error when I use .sample() to pull the item row which makes me think I don't understand how pandas works. I think I need to be creating new lists so I can later index and access the various statistics of the items once one is selected.
Once I pull the item, I was intending to add effects that would change the displayed damage, armor, and so on. So I was thinking of having the new item be its own list, then using damage = item[2] + 3 or whatever I need.
The error is: AttributeError: 'list' object has no attribute 'sample'
Can anyone help with this problem? Maybe there is a better way to set up the data?
here is my code so far:
import pandas as pd
import random

df = [pd.read_csv('weapons.csv'), pd.read_csv('armor.csv'), pd.read_csv('aether_infused.csv')]

def get_item():
    item_class = [random.choices(df, weights=(45, 40, 15), k=1)]  # this part seemed to work: when I printed item_class it printed one of the entire lists at the correct odds
    item = item_class.sample()
    print(item)  # to see if the program is working

get_item()
I think you are getting slightly confused between lists and list elements. This should work; I stubbed your dataframes with simple ones:
import pandas as pd
import random

# Actual data. Comment it out if you do not have the csv files
df = [pd.read_csv('weapons.csv'), pd.read_csv('armor.csv'), pd.read_csv('aether_infused.csv')]

# My stubs -- uncomment and use this instead of the line above if you want to run this specific example
# df = [pd.DataFrame({'weapons': ['w1', 'w2']}), pd.DataFrame({'armor': ['a1', 'a2', 'a3']}), pd.DataFrame({'aether': ['e1', 'e2', 'e3', 'e4']})]

def get_item():
    # I removed [] from the line below -- choices() already returns a list of length 1
    item_class = random.choices(df, weights=(45, 40, 15), k=1)
    # I added [0] to choose the first element of item_class, which is a list of length 1 from the line above
    item = item_class[0].sample()
    print(item)  # to see if the program is working

get_item()
This prints random rows from the random dataframes that I set up, such as:

  weapons
1      w2
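Once you have the sampled row, you can pull individual stats from it by column name rather than by list index. A minimal sketch, assuming your weapons.csv has a 'damage' column (that column name is an assumption, not something from your files):

def get_item():
    item_class = random.choices(df, weights=(45, 40, 15), k=1)
    item = item_class[0].sample()  # a one-row DataFrame
    row = item.iloc[0]             # that row as a Series
    if 'damage' in row:            # 'damage' is a hypothetical column name
        print(row['damage'] + 3)   # e.g. apply a +3 damage effect
    return row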

How to split compound word in pandas?

I have a document that consists of many compound (or sometimes combined) words, like this:
document.csv

index  text
0      my first java code was helloworld
1      my cardoor is totally broken
2      I will buy a screwdriver to fix my bike
As seen above, some words are combined or compound. I am using the compound word splitter from here to fix this issue; however, I am having trouble applying it to each row of my document (as a pandas Series) and converting the document into a clean form like:
cleanDocument.csv

index  text
0      my first java code was hello world
1      my car door is totally broken
2      I will buy a screw driver to fix my bike
(I am aware that a word such as screwdriver should stay together, but my goal is cleaning the document.) If you have a better idea for splitting only the combined words, please let me know.
The splitter code may work like this:
import pandas as pd
import splitter  # this uses the enchant dict (pip install enchant required)

data = pd.read_csv('document.csv')
Then, it should be used like:
splitter.split(data) ## ???
I already looked into something like this, but it does not work in my case. Thanks.
You can use apply with axis=1. Can you try the following:

data.apply(lambda x: [splitter.split(j) for j in x['text'].split()], axis=1)
I do not have splitter installed on my system, but going by the link you provided, I have the following code. Can you try:
def handle_list(m):
    ret_lst = []
    L = m['text'].split()
    for wrd in L:
        g = splitter.split(wrd)
        if g:
            ret_lst.extend(g)
        else:
            ret_lst.append(wrd)
    return ret_lst

data.apply(handle_list, axis=1)
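If the end goal is a cleaned text column rather than lists of tokens, you can join the split pieces back into a string. A small sketch building on the function above (still assuming splitter.split() returns a list of parts, or something falsy when it cannot split the word):

def clean_text(row):
    out = []
    for wrd in row['text'].split():
        g = splitter.split(wrd)
        out.extend(g if g else [wrd])
    return ' '.join(out)

data['clean_text'] = data.apply(clean_text, axis=1)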

Iterate over 4 pandas data frame columns and store them into 4 lists with one for loop instead of 4 for loops

I am currently working with pandas structures in Python. I wrote a function that extracts data from a pandas data frame and stores it in lists. The code works, but I feel like there is a part that I could write with one for loop instead of four. I will give you an example below. The idea of this part of the code is to extract four columns from a pandas data frame into four lists. I did it with four separate for loops, but I want one loop that does the same thing.
col1, col2, col3, col4 = [], [], [], []
for j in abc['col1']:
    col1.append(j)
for k in abc['col2']:
    col2.append(k)
for l in abc['col3']:
    col3.append(l)
for n in abc['col4']:
    col4.append(n)
My idea is to write one for loop that does all of it. I tried something like this, but it doesn't work:
col1, col2, col3, col4 = [], [], [], []
for j, k, l, n in abc[['col1', 'col2', 'col3', 'col4']]:
    col1.append(j)
    col2.append(k)
    col3.append(l)
    col4.append(n)
Can you help me wrap the four for loops into one? I would appreciate your help!
You don't need to use loops at all; you can just convert each column into a list directly.
list_1 = df["col1"].to_list()
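Applied to all four columns at once (a small sketch using the dataframe and column names from the question):

col1, col2, col3, col4 = (abc[c].to_list() for c in ['col1', 'col2', 'col3', 'col4'])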
Have a look at this previous question.
Treating a pandas DataFrame like a list usually works, but it is very bad for performance. I'd consider using the iterrows() function instead.
This would work as in the following example:
col1, col2, col3, col4 = [], [], [], []
for index, row in df.iterrows():
    col1.append(row['col1'])
    col2.append(row['col2'])
    col3.append(row['col3'])
    col4.append(row['col4'])
It's probably easier to use pandas .values and then numpy.ndarray.tolist():

col = ['col1', 'col2', 'col3', 'col4']
data = [None] * len(col)
for i in range(len(col)):
    data[i] = df[col[i]].values.tolist()
