Replacing series values pandas - python-3.x

I have the following dataframe:
And I have the following list:
I want to replace the values of the series team_stat['First Half']['W'] with the values of the list first_half_win_result.

Try the following code, which converts the list to a pandas Series:
team_stat['First Half']['W'] = pd.Series(first_half_win_result)

Well, I found the solution:
team_stat = team_stat.transpose()
team_stat.loc['First Half', 'W'] = first_half_win_result
team_stat = team_stat.transpose()
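The original dataframe is not shown, so the following is only a minimal sketch under the assumption that team_stat has two-level (MultiIndex) columns such as ('First Half', 'W'). The team names and numbers are made up for illustration. Under that assumption, a single .loc assignment with a tuple column key may avoid the double transpose:

```python
import pandas as pd

# Hypothetical stand-in for team_stat: two-level columns, made-up values.
columns = pd.MultiIndex.from_product([["First Half", "Second Half"], ["W", "L"]])
team_stat = pd.DataFrame(
    [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]],
    index=["A", "B", "C"],
    columns=columns,
)
first_half_win_result = [10, 20, 30]

# Select every row and the single column ('First Half', 'W'), then assign the list.
# The list must have exactly one value per row.
team_stat.loc[:, ("First Half", "W")] = first_half_win_result

print(team_stat)
```

Assigning through one .loc call writes into team_stat itself, whereas team_stat['First Half']['W'] = ... is a chained assignment that may end up writing to a temporary copy.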

Related

Pandas object - save to .csv

I have a pandas object df and I would like to save that to .csv:
df.to_csv('output.csv', index = False)
Although the data frame is displayed correctly in the terminal when printed, in the *.csv some lines are shifted several columns forward. I do not know how to demonstrate this in a minimal working example. I tried with only the one problematic column, but that single column came out correct in the *.csv. What should I check, please? The whole column contains strings.
Following the advice, I tried:
selected['SpType'] = selected['SpType'].str.replace('\t', '')
I obtained a warning:
SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
selected['SpType'] = selected['SpType'].str.replace('\t', '')
If the tabs are the problem, you could just replace all of them.
If the tabs occur in the column column_name, you could do something like:
df['column_name'] = df['column_name'].str.replace('\t', '')
If the problem is in several columns, you could loop over all columns (this assumes every column holds strings), e.g.:
for col in df.columns:
    df[col] = df[col].str.replace('\t', '')
df.to_csv('output.csv', index=False)
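The SettingWithCopyWarning quoted in the question usually means that selected was created by slicing another dataframe. A minimal sketch of one common way around it follows, assuming selected comes from a boolean filter. The Vmag column and all values are hypothetical; only SpType is taken from the question.

```python
import pandas as pd

# Hypothetical data: 'SpType' is from the question, 'Vmag' and the values are made up.
df = pd.DataFrame(
    {"SpType": ["G2V\t", "K\t0III", "M1"], "Vmag": [4.8, 3.1, 9.9]}
)

# Take an explicit copy of the filtered rows so later writes go to an
# independent dataframe rather than to a slice of df.
selected = df[df["Vmag"] < 9].copy()

# regex=False treats the pattern as a literal tab character.
selected["SpType"] = selected["SpType"].str.replace("\t", "", regex=False)

print(selected)
```

With the explicit .copy(), the assignment modifies selected only and df keeps its original values.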

Replace items like A2 with AA in the dataframe

I have a list of items like "A2BCO6" and "ABC2O6". I want to replace them as follows: A2BCO6 --> AABCO6 and ABC2O6 --> ABCCO6. There are many more items than presented here.
My dataframe looks like this:
listAB:
Finctional_Group
0 Ba2NbFeO6
1 Ba2ScIrO6
3 MnPb2WO6
I created a replacement array and tried to replace the values in the following way:
B = ["Ba2", "Pb2"]
C = ["BaBa", "PbPb"]
for i,j in range(len(B)), range(len(C)):
    listAB["Finctional_Group"]= listAB["Finctional_Group"].str.strip().str.replace(B[i], C[j])
But it does not produce the correct output. The output is:
listAB:
Finctional_Group
0 PbPbNbFeO6
1 PbPbScIrO6
3 MnPb2WO6
Please suggest the necessary correction in the code.
Many thanks in advance.
For simplicity I used the chemparse package, which seems to suit your needs.
As always, we first import the required packages, in this case chemparse and pandas.
import chemparse
import pandas as pd
Then we create a pandas.DataFrame object with your example data.
df = pd.DataFrame(
    columns=["Finctional_Group"], data=["Ba2NbFeO6", "Ba2ScIrO6", "MnPb2WO6"]
)
Our parser function uses chemparse.parse_formula, which returns a dict mapping each element to its count in a molecular formula.
def parse_molecule(molecule: str) -> str:
    # start with an empty string
    molecule_in_string = ""
    # iterate over all element/count pairs in the parsed formula
    for key, value in chemparse.parse_formula(molecule).items():
        # repeat the element symbol as many times as it occurs
        molecule_in_string += key * int(value)
    return molecule_in_string
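molecule_in_string now contains the molecular formula with every count expanded into repeated element symbols. We just need to map this function over all elements in our dataframe column (from pandas 2.1 on, DataFrame.applymap is deprecated in favor of DataFrame.map). For that we can do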
df = df.applymap(parse_molecule)
print(df)
which returns:
0 BaBaNbFeOOOOOO
1 BaBaScIrOOOOOO
2 MnPbPbWOOOOOO
dtype: object
Source code for chemparse: https://gitlab.com/gmboyer/chemparse
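If only the listed fragments should be expanded (so that O6 stays as it is, as in the question), the original loop can be repaired without an extra package. In the question's loop, `for i,j in range(len(B)), range(len(C))` iterates over the two range objects and unpacks each of them into i=0, j=1, so "Ba2" is replaced by "PbPb" on both passes. A minimal sketch that pairs the lists with zip instead, reusing the names from the question:

```python
import pandas as pd

listAB = pd.DataFrame(
    {"Finctional_Group": ["Ba2NbFeO6", "Ba2ScIrO6", "MnPb2WO6"]}
)
B = ["Ba2", "Pb2"]
C = ["BaBa", "PbPb"]

column = listAB["Finctional_Group"].str.strip()
# zip pairs B[0] with C[0], B[1] with C[1], and so on.
for old, new in zip(B, C):
    # regex=False replaces the literal substring.
    column = column.str.replace(old, new, regex=False)
listAB["Finctional_Group"] = column

print(listAB)
```

This keeps the counts that are not in B untouched, while the chemparse approach above expands every count in the formula.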

Remove duplicates from a list in pandas

I have a list like this:
['35UP\nPLx', '35UP']
I need a list of unique elements:
['PLx', '35UP']
I have tried this:
veh_line = list(dict.fromkeys(filter['p_Mounting_Location'].replace('\n',',', regex=True).tolist()))
This is one approach using str.splitlines with set.
Ex:
data = ['35UP\nPLx', '35UP']
result = list(set(j for i in data for j in i.splitlines()))
print(result)
Output:
['35UP', 'PLx']
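A set does not guarantee any particular order, so the printed order can differ between runs. If the order of first appearance matters, a small variation using dict.fromkeys (the same tool the question already tried) may help. This is a sketch on the example data only:

```python
data = ['35UP\nPLx', '35UP']

# Split every entry on line breaks, then keep the first occurrence of each part.
# dict keys are unique and preserve insertion order.
result = list(dict.fromkeys(part for item in data for part in item.splitlines()))

print(result)  # ['35UP', 'PLx']
```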

How to save tuple output from a for loop to a DataFrame in Python

I have a data set of 33k rows x 57 columns.
Some columns contain data that I want to translate using a dictionary.
I have done the translation, but now I want to write the translated data back to my data set.
I have a problem saving the tuple output from the for loop.
I am using tuples to build the translation. .join and .append do not work in my case; I have tried many variants without success.
I would appreciate any advice.
data = pd.read_csv(filepath, engine="python", sep=";", keep_default_na=False)
for index, row in data.iterrows():
    row["translated"] = tuple(slownik.get(znak) for znak in row["1st_service"])
I just want print(data["1st_service"]) to show the translated data, not the data from before the for loop.
First of all, if your csv doesn't already have a 'translated' column, you'll have to add it:
import numpy as np
data['translated'] = np.nan
The problem is that the row object you're trying to write to is generally a copy of that row's data, not the dataframe itself, so writing to it has no effect on data. I've also put square brackets around the comprehension, although tuple() accepts a bare generator expression as well. So change your last line to:
data.loc[index, "translated"] = tuple([slownik.get(znak) for znak in row["1st_service"]])
and you'll get a tuple written into that one cell.
In future, posting the exact error message you're getting is very helpful!
I have managed it; the working code is below:
data = pd.read_csv(filepath, engine="python", sep=";", keep_default_na=False)
data.columns = []
slownik = dict([ ])
trans = ' '
for index, row in data.iterrows():
    trans += str(tuple([slownik.get(znak) for znak in row["1st_service"]]))
data['1st_service'] = trans.split(')(')
data.to_csv("out.csv", index=False)
Can you tell me whether this is done well?
Maybe there is a faster way to do it?
I am doing it for 12 columns in one for loop, as shown above.
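One possibly faster and simpler alternative is to skip iterrows and apply the translation to the whole column at once. The sketch below assumes each cell of "1st_service" is a string whose characters are looked up in slownik. The dictionary contents and sample rows are hypothetical, since the real ones are not shown.

```python
import pandas as pd

# Hypothetical dictionary and data; the real slownik and file are not shown.
slownik = {"a": "alpha", "b": "beta"}
data = pd.DataFrame({"1st_service": ["ab", "ba", "a"]})


def translate(text: str) -> tuple:
    # Look up every character; unknown characters become None via dict.get.
    return tuple(slownik.get(znak) for znak in text)


# Series.apply calls translate once per cell and returns a new column of tuples.
data["translated"] = data["1st_service"].apply(translate)

print(data)
```

For several columns, the same apply call could be repeated in a loop over a list of column names, which avoids building one long string and splitting it again.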

How to format string to place variable in python [duplicate]

I am trying to read the unique values of the columns named in a list, but I am unable to insert the variable so that the string becomes a command. If I run c_data.ABC.unique() directly, I get the list of unique values in the ABC column. Please suggest what is going wrong.
import pandas as pd
c_data=pd.read_csv("/home/fileName.csv")
list=['ABC','DEF']
for f in list:
    cl="c_data.{}.unique()".format(f)
    print(cl)
Output:
c_data.ABC.unique()
c_data.DEF.unique()
You should definitely check the indexing basics in pandas. As for your question, you can use the most basic indexing with brackets [] and a string column name, for example c_data['ABC'], so you can iterate like this:
c_data = pd.read_csv("/home/fileName.csv")
list = ['ABC', 'DEF']
for f in list:
    print(c_data[f].unique())
If you want or need to use the format method, you can replace the column name with a formatted string:
c_data = pd.read_csv("/home/fileName.csv")
list = ['ABC', 'DEF']
for f in list:
    print(c_data['{0}'.format(f)].unique())
You can also use bracket indexing with a list of strings, which gives you another DataFrame. Iterating over that DataFrame then yields its column names:
c_data = pd.read_csv("/home/fileName.csv")
f_data = c_data[['ABC', 'DEF']]
for f in f_data:
    print(f_data[f].unique())
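If the unique values are needed later rather than just printed, they can be collected in a dict keyed by column name. The sketch below uses a small made-up dataframe in place of the CSV file, and the variable is named column_names to avoid shadowing the built-in list.

```python
import pandas as pd

# Made-up stand-in for the CSV contents.
c_data = pd.DataFrame({"ABC": [1, 2, 2], "DEF": ["x", "x", "y"]})
column_names = ["ABC", "DEF"]

# Bracket indexing accepts the column name as an ordinary string variable.
unique_values = {name: c_data[name].unique().tolist() for name in column_names}

print(unique_values)  # {'ABC': [1, 2], 'DEF': ['x', 'y']}
```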
