How to get ranges of one column grouped by a class column in Pandas? - python-3.x

I'm practicing with Pandas and I want to get the ranges of one column of a dataframe grouped by the values of another column.
An example dataset:
Points Grade
1 7.5 C
2 9.3 A
3 NaN A
4 1.3 F
5 8.7 B
6 9.5 A
7 7.9 C
8 4.5 F
9 8.0 B
10 6.8 D
11 5.0 D
I want to group the ranges of points for each grade so I can fill in the missing values.
To do that I need to get something like this:
Grade Points
A [9.5, 9.3]
B [8.7, 8.0]
C [7.5, 7.9]
D [6.8, 5.0]
F [1.3, 4.5]
I can get it with a for loop and that kind of thing, but is it possible with pandas in some easy way?
I tried all the groupby combinations I know and got nothing. Any suggestions?
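For reference, a minimal sketch to reconstruct the sample frame above (column order and the NaN position are taken from the table; pandas' default 0-based index is used here):
import pandas as pd
import numpy as np
df = pd.DataFrame({'Points': [7.5, 9.3, np.nan, 1.3, 8.7, 9.5, 7.9, 4.5, 8.0, 6.8, 5.0],
                   'Grade': list('CAAFBACFBDD')})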

You can first filter df with notnull, then groupby and convert each group to a list with tolist, and finally reset_index:
print(df)
Points Grade
0 7.5 C
1 9.3 A
2 NaN A
3 1.3 F
4 8.7 B
5 9.5 A
6 7.9 C
7 4.5 F
8 8.0 B
9 6.8 D
10 5.0 D
print(df['Points'].notnull())
0 True
1 True
2 False
3 True
4 True
5 True
6 True
7 True
8 True
9 True
10 True
Name: Points, dtype: bool
print(df.loc[df['Points'].notnull()])
Points Grade
0 7.5 C
1 9.3 A
3 1.3 F
4 8.7 B
5 9.5 A
6 7.9 C
7 4.5 F
8 8.0 B
9 6.8 D
10 5.0 D
print(df.loc[df['Points'].notnull()]
        .groupby('Grade')['Points']
        .apply(lambda x: x.tolist())
        .reset_index())
Grade Points
0 A [9.3, 9.5]
1 B [8.7, 8.0]
2 C [7.5, 7.9]
3 D [6.8, 5.0]
4 F [1.3, 4.5]
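In more recent pandas versions the same result can be written a bit more compactly by dropping the NaN rows with dropna and aggregating straight to lists with agg (a sketch of the same idea, not from the original answer):
print(df.dropna(subset=['Points'])
        .groupby('Grade')['Points']
        .agg(list)
        .reset_index())
This gives the same Grade/Points pairs as the apply(lambda x: x.tolist()) version above.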

Related

Groupby count of non NaN of another column and a specific calculation of the same columns in pandas

I have a data frame as shown below
ID Class Score1 Score2 Name
1 A 9 7 Xavi
2 B 7 8 Alba
3 A 10 8 Messi
4 A 8 10 Neymar
5 A 7 8 Mbappe
6 C 4 6 Silva
7 C 3 2 Pique
8 B 5 7 Ramos
9 B 6 7 Serge
10 C 8 5 Ayala
11 A NaN 4 Casilas
12 A NaN 4 De_Gea
13 B NaN 2 Seaman
14 C NaN 7 Chilavert
15 B NaN 3 Courtous
From the above, I would like to calculate the number of players with Score1 less than or equal to 6 in each Class, along with the count of non-NaN rows (class-wise).
Expected output:
Class Total_Number Count_Non_NaN Score1_less_than_6_# Avg_score1
A 6 4 0 8.5
B 5 3 2 6
C 4 3 2 5
I tried the code below:
df2 = df.groupby('Class').agg(Total_Number=('Score1', 'size'),
                              Score1_less_than_6=('Score1', lambda x: x.between(0, 6).sum()),
                              Avg_score1=('Score1', 'mean'))
df2 = df2.reset_index()
df2
Groupby and aggregate using a dictionary:
df['s'] = df['Score1'].le(6)
df.groupby('Class').agg(**{'total_number': ('Score1', 'size'),
                           'count_non_nan': ('Score1', 'count'),
                           'score1_less_than_six': ('s', 'sum'),
                           'avg_score1': ('Score1', 'mean')})
total_number count_non_nan score1_less_than_six avg_score1
Class
A 6 4 0 8.5
B 5 3 2 6.0
C 4 3 2 5.0
Try:
x = df.groupby("Class", as_index=False).agg(
    Total_Number=("Class", "count"),
    Count_Non_NaN=("Score1", lambda x: x.notna().sum()),
    Score1_less_than_6=("Score1", lambda x: (x <= 6).sum()),
    Avg_score1=("Score1", "mean"),
)
print(x)
Prints:
Class Total_Number Count_Non_NaN Score1_less_than_6 Avg_score1
0 A 6 4.0 0.0 8.5
1 B 5 3.0 2.0 6.0
2 C 4 3.0 2.0 5.0
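A variant that combines the two answers without mutating df is to add the boolean flag with assign; the le6 column name below is just an illustrative helper, not part of either answer:
out = (df.assign(le6=df['Score1'].le(6))
         .groupby('Class')
         .agg(Total_Number=('Score1', 'size'),
              Count_Non_NaN=('Score1', 'count'),
              Score1_less_than_6=('le6', 'sum'),
              Avg_score1=('Score1', 'mean'))
         .reset_index())
print(out)
Because comparisons against NaN evaluate to False, the missing scores are not counted as less than or equal to 6, which matches the expected output above.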

Rearrange columns in DataFrame

Having a DataFrame structured as follows:
country A B C D
0 Albany 5.2 4.7 253.75 4
1 China 7.5 3.4 280.72 3
2 Portugal 4.6 7.5 320.00 6
3 France 8.4 3.6 144.00 3
4 Greece 2.1 10.0 331.00 6
I wanted to get something like this:
cost A B
country C D C D
Albany 2.05 4 1.85 4
China 2.67 3 1.21 3
Portugal 1.44 6 2.34 6
France 5.83 3 2.50 3
Greece 0.63 6 3.02 6
I mean: get columns A and B as headers over C and D, keep D the same with its original value, and replace C with the header column as a percentage of the original C. Example for Albany:
value C in A: (5.2/253.75)*100 = 2.05
value C in B: (4.7/253.75)*100 = 1.85
Is there any way to do it?
Thanks!
You can divide multiple columns (here A and B) by column C with DataFrame.div, then reindex by a MultiIndex created with MultiIndex.from_product, and finally set the D columns from the original DataFrame with MultiIndex slicers:
cols = ['A','B']
mux = pd.MultiIndex.from_product([cols, ['C', 'D']])
df1 = df[cols].div(df['C'], axis=0).mul(100).reindex(mux, axis=1, level=0)
idx = pd.IndexSlice
df1.loc[:, idx[:, 'D']] = df[['D'] * len(cols)].to_numpy()
#pandas below 0.24
#df1.loc[:, idx[:, 'D']] = df[['D'] * len(cols)].values
print (df1)
A B
C D C D
0 2.049261 4 1.852217 4
1 2.671701 3 1.211171 3
2 1.437500 6 2.343750 6
3 5.833333 3 2.500000 3
4 0.634441 6 3.021148 6
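An alternative sketch, assuming the same df: build one C/D block per header column with a dict comprehension and let pd.concat turn the keys into the first column level:
out = pd.concat(
    {c: df[['C', 'D']].assign(C=df[c].div(df['C']).mul(100)) for c in ['A', 'B']},
    axis=1,
)
print(out)
Each key ('A', 'B') becomes the top level, D keeps its original values, and C is recomputed per block.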

How to replace the missing values with average of ffill() and bfill() in pandas?

This is a sample dataframe and it contains NA values:
x y z datetime
0 2 3 4 02-02-2019
1 NA NA NA 03-02-2019
2 3 5 7 04-02-2019
3 NA NA NA 05-02-2019
4 4 7 9 06-02-2019
Now, I want to fill these NA values, and I can do that using either ffill() or bfill(). But what if I want to apply the average of ffill() and bfill()? How can I do that?
The direct average df = (df.ffill() + df.bfill()) / 2 didn't work because of the datetime column.
The end dataframe should look like this:
x y z datetime
0 2 3 4 02-02-2019
1 2.5 4 5.5 03-02-2019
2 3 5 7 04-02-2019
3 3.5 6 8 05-02-2019
4 4 7 9 06-02-2019
Check with df.interpolate:
df.interpolate()
x y z datetime
0 2.0 3.0 4.0 02-02-2019
1 2.5 4.0 5.5 03-02-2019
2 3.0 5.0 7.0 04-02-2019
3 3.5 6.0 8.0 05-02-2019
4 4.0 7.0 9.0 06-02-2019
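If you specifically want the (ffill + bfill) / 2 from the question, a sketch that sidesteps the datetime column is to restrict the arithmetic to the numeric columns (this assumes the NA entries are real NaN values, not the literal string 'NA'):
num_cols = df.select_dtypes('number').columns
df[num_cols] = (df[num_cols].ffill() + df[num_cols].bfill()) / 2
For single-row gaps this matches interpolate(); for longer gaps the two differ, since interpolate spreads values linearly while the average assigns the same midpoint to every row in the gap.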

Working with two data frames with different size in python

I am working with two data frames.
The sample data is as follow:
DF = ['A','B','C','D','E','A','C','B','B']
DF1 = pd.DataFrame({'Team':DF})
DF2 = pd.DataFrame({'Team':['A','B','C','D','E'],'Rating':[1,2,3,4,5]})
I want to add a new column to DF1 as follows:
Team Rating
A 1
B 2
C 3
D 4
E 5
A 1
C 3
B 2
B 2
How can I add a new column?
I used
DF1['Rating'] = np.where(DF1['Team'] == DF2['Team'], DF2['Rating'], 0)
which raises: ValueError: Can only compare identically-labeled Series objects
Thanks
I think you need map with a Series created by set_index; values with no match become NaN, so fillna is added to replace them with 0:
DF1['Rating']= DF1['Team'].map(DF2.set_index('Team')['Rating']).fillna(0)
print (DF1)
Team Rating
0 A 1
1 B 2
2 C 3
3 D 4
4 E 5
5 A 1
6 C 3
7 B 2
8 B 2
DF = ['A','B','C','D','E','A','C','B','B', 'G']
DF1 = pd.DataFrame({'Team':DF})
DF2 = pd.DataFrame({'Team':['A','B','C','D','E'],'Rating':[1,2,3,4,5]})
DF1['Rating']= DF1['Team'].map(DF2.set_index('Team')['Rating']).fillna(0)
print (DF1)
Team Rating
0 A 1.0
1 B 2.0
2 C 3.0
3 D 4.0
4 E 5.0
5 A 1.0
6 C 3.0
7 B 2.0
8 B 2.0
9 G 0.0 <- G not in DF2['Team']
Detail:
print (DF1['Team'].map(DF2.set_index('Team')['Rating']))
0 1.0
1 2.0
2 3.0
3 4.0
4 5.0
5 1.0
6 3.0
7 2.0
8 2.0
9 NaN
Name: Team, dtype: float64
You can use:
In [54]: DF1['new_col'] = DF1.Team.map(DF2.set_index('Team').Rating)
In [55]: DF1
Out[55]:
Team new_col
0 A 1
1 B 2
2 C 3
3 D 4
4 E 5
5 A 1
6 C 3
7 B 2
8 B 2
I think you can use pd.merge:
DF1=pd.merge(DF1,DF2,how='left',on='Team')
DF1
Team Rating
0 A 1
1 B 2
2 C 3
3 D 4
4 E 5
5 A 1
6 C 3
7 B 2
8 B 2
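One difference worth noting between the answers: the plain left merge leaves NaN for teams missing from DF2, while the map answer fills them with 0. A sketch that makes the merge behave the same way:
DF1 = pd.merge(DF1, DF2, how='left', on='Team')
DF1['Rating'] = DF1['Rating'].fillna(0)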

Multiple columns difference of 2 Pandas DataFrame

I am new to Python and Pandas. Can someone help me with the report below?
I want to report the difference of N columns and create new columns holding the difference values. Is it possible to make this dynamic, as I have more than 30 columns? (The set of columns is fixed; row values can change.)
A and B can be alphanumeric.
Use join with sub for difference of DataFrames:
#if columns are strings, cast them first
df1 = df1.astype(int)
df2 = df2.astype(int)
#if first columns are not indices
#df1 = df1.set_index('ID')
#df2 = df2.set_index('ID')
df = df1.join(df2.sub(df1).add_prefix('sum'))
print (df)
A B sumA sumB
ID
0 10 2.0 5 3.0
1 11 3.0 6 5.0
2 12 4.0 7 5.0
Or similar:
df = df1.join(df2.sub(df1), rsuffix='sum')
print (df)
A B Asum Bsum
ID
0 10 2.0 5 3.0
1 11 3.0 6 5.0
2 12 4.0 7 5.0
Detail:
print (df2.sub(df1))
A B
ID
0 5 3.0
1 6 5.0
2 7 5.0
IIUC
df1[['C','D']]=(df2-df1)[['A','B']]
df1
Out[868]:
ID A B C D
0 0 10 2.0 5 3.0
1 1 11 3.0 6 5.0
2 2 12 4.0 7 5.0
df1.assign(B=0)
Out[869]:
ID A B C D
0 0 10 0 5 3.0
1 1 11 0 6 5.0
2 2 12 0 7 5.0
The 'ID' column should really be an index. See the Pandas tutorial on indexing for why this is a good idea.
df1 = df1.set_index('ID')
df2 = df2.set_index('ID')
df = df1.copy()
df[['C', 'D']] = df2 - df1
df['B'] = 0
print(df)
outputs
A B C D
ID
0 10 0 5 3.0
1 11 0 6 5.0
2 12 0 7 5.0
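Since the question mentions 30+ columns, a sketch of a dynamic version is to take the difference of all shared columns at once and name the new columns programmatically (the _diff suffix is just an illustrative naming choice):
df = df1.join(df2.sub(df1).add_suffix('_diff'))
print(df)
This scales to any number of columns as long as df1 and df2 share the same index and column names.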
