Python: Summing every five rows of column b data and creating a new column

I have a dataframe like the one below. I would like to sum rows 0 to 4 (every 5 rows) and create another column with the summed value ("new column"). My real dataframe has 263 rows, so the last three rows (rows 10 to 12 of each repeat shown below) are summed as a group of three only. How can I do this using Pandas/Python? I have started to learn Python recently. Thanks in advance for any advice!
My data pattern is more complex, because I am using the index as one of my column values and it repeats like this:
Row Data "new column"
0 5
1 1
2 3
3 3
4 2 14
5 4
6 8
7 1
8 2
9 1 16
10 0
11 2
12 3 5
0 3
1 1
2 2
3 3
4 2 11
5 2
6 6
7 2
8 2
9 1 13
10 1
11 0
12 1 2
...
259 50 89
260 1
261 4
262 5 10
I tried iterrows and groupby but can't make it work so far.

Use this:
df['new col'] = df.groupby(df.index // 5)['Data'].transform('sum')[lambda x: ~(x.duplicated(keep='last'))]
Output:
Data new col
0 5 NaN
1 1 NaN
2 3 NaN
3 3 NaN
4 2 14.0
5 4 NaN
6 8 NaN
7 1 NaN
8 2 NaN
9 1 16.0
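One caveat with the one-liner above: duplicated(keep='last') is applied to the sums themselves, so if two different blocks happen to have the same total, only the later one keeps its value. A minimal sketch that marks the last row of each block by position instead, assuming blocks are simply consecutive runs of five rows (the names block, sums and is_last are illustrative):

```python
import numpy as np
import pandas as pd

# First 13 rows of the question's sample data.
df = pd.DataFrame({"Data": [5, 1, 3, 3, 2, 4, 8, 1, 2, 1, 0, 2, 3]})

# Block number by row position: 0,0,0,0,0,1,1,1,1,1,2,2,2
block = np.arange(len(df)) // 5

# Broadcast each block's sum back onto every row of that block.
sums = df.groupby(block)["Data"].transform("sum")

# True only on the last row of each block (independent of the sum values).
is_last = ~pd.Series(block, index=df.index).duplicated(keep="last")

# Keep the sum on the last row of each block, NaN elsewhere.
df["new col"] = sums.where(is_last)
print(df)
```

Because the position is taken from np.arange(len(df)) rather than from df.index, this sketch does not depend on the index being a default 0..n-1 range.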
Edit to handle updated question:
g = df.groupby(df.Row).cumcount()
df['new col'] = df.groupby([g, df.Row // 5])['Data']\
.transform('sum')[lambda x: ~(x.duplicated(keep='last'))]
Output:
Row Data new col
0 0 5 NaN
1 1 1 NaN
2 2 3 NaN
3 3 3 NaN
4 4 2 14.0
5 5 4 NaN
6 6 8 NaN
7 7 1 NaN
8 8 2 NaN
9 9 1 16.0
10 10 0 NaN
11 11 2 NaN
12 12 3 5.0
13 0 3 NaN
14 1 1 NaN
15 2 2 NaN
16 3 3 NaN
17 4 2 11.0
18 5 2 NaN
19 6 6 NaN
20 7 2 NaN
21 8 2 NaN
22 9 1 13.0
23 10 1 NaN
24 11 0 NaN
25 12 1 2.0
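For the repeating Row pattern, the same idea can be written without relying on duplicated sums. This is a sketch under the assumption that a new cycle starts whenever Row decreases; the names cycle, block, sums and is_last are illustrative and not part of the original answer:

```python
import pandas as pd

# Two full cycles of the question's sample data (Row runs 0..12 twice).
df = pd.DataFrame({
    "Row": list(range(13)) * 2,
    "Data": [5, 1, 3, 3, 2, 4, 8, 1, 2, 1, 0, 2, 3,
             3, 1, 2, 3, 2, 2, 6, 2, 2, 1, 1, 0, 1],
})

# A new cycle begins each time Row drops (e.g. from 12 back to 0).
cycle = df["Row"].diff().lt(0).cumsum().rename("cycle")

# Within a cycle, rows 0-4, 5-9 and 10-12 form blocks 0, 1 and 2.
block = (df["Row"] // 5).rename("block")

keys = [cycle, block]
sums = df.groupby(keys)["Data"].transform("sum")

# cumcount counted from the end is 0 on the last row of each group.
is_last = df.groupby(keys).cumcount(ascending=False).eq(0)

df["new col"] = sums.where(is_last)
print(df)
```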

Related

How to get the last row with null value

I have a table:
a b c
1 11 21
2 12 22
3 3 3
NaN 14 24
NaN 15 NaN
4 4 4
5 15 25
6 6 6
7 17 27
I want to remove all the rows before the last row where column a is null. The output that I want is:
a b c
NaN 15 NaN
4 4 4
5 15 25
6 6 6
7 17 27
The only things I could find for this are first_valid_index and last_valid_index, and I don't think they are what I need.
BONUS
I also want to add a new column in the dataframe if all the values in a row are the same. The following rows should have the same value:
new a b c
NaN NaN 15 NaN
4 4 4 4
4 5 15 25
6 6 6 6
6 7 17 27
Thank you!
Use isna with idxmax on the reversed column, so that the last NaN is found rather than the first:
new_df = df.loc[df["a"].isna()[::-1].idxmax():].copy()
Output:
a b c
4 NaN 15 NaN
5 4.0 4 4.0
6 5.0 15 25.0
7 6.0 6 6.0
8 7.0 17 27.0
Then use pandas.Series.where with nunique:
new_df["new"] = new_df["a"].where(new_df.nunique(axis=1).eq(1)).ffill()
print(new_df)
Final output:
a b c new
4 NaN 15 NaN NaN
5 4.0 4 4.0 4.0
6 5.0 15 25.0 4.0
7 6.0 6 6.0 6.0
8 7.0 17 27.0 6.0
Find the rows that contain a NaN in column a:
nanrows = df['a'].isnull()
Find the index of the last of them:
nanmax = df[nanrows].index.max()
Do slicing:
df.loc[nanmax:]
# a b c
#4 NaN 15 NaN
#5 4.0 4 4.0
#6 5.0 15 25.0
#7 6.0 6 6.0
#8 7.0 17 27.0
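Putting both parts together, here is a minimal sketch that locates the last NaN by position (so it does not depend on the index labels) and then builds the bonus column. The names mask, last_pos and same are illustrative, and the sketch assumes column a contains at least one NaN:

```python
import numpy as np
import pandas as pd

# Sample table from the question.
df = pd.DataFrame({
    "a": [1, 2, 3, np.nan, np.nan, 4, 5, 6, 7],
    "b": [11, 12, 3, 14, 15, 4, 15, 6, 17],
    "c": [21, 22, 3, 24, np.nan, 4, 25, 6, 27],
})

# Positions of the NaNs in column a; the last one marks where to start.
mask = df["a"].isna()
last_pos = np.flatnonzero(mask.to_numpy())[-1]  # IndexError if there is no NaN

# Keep everything from the last NaN row onward; copy to allow adding columns.
new_df = df.iloc[last_pos:].copy()

# Rows whose non-null values are all identical (nunique ignores NaN).
same = new_df.nunique(axis=1).eq(1)

# Take a where the row is uniform, then carry that value forward.
new_df["new"] = new_df["a"].where(same).ffill()
print(new_df)
```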

Pandas: Combine pandas columns that have the same column name

If we have the following df,
df
A A B B B
0 10 2 0 3 3
1 20 4 19 21 36
2 30 20 24 24 12
3 40 10 39 23 46
How can I combine the content of the columns with the same names?
e.g.
A B
0 10 0
1 20 19
2 30 24
3 40 39
4 2 3
5 4 21
6 20 24
7 10 23
8 Na 3
9 Na 36
10 Na 12
11 Na 46
I tried groupby and merge, but neither does the job.
Any help is appreciated.
If column names are duplicated, you can use DataFrame.melt with concat:
df = pd.concat([df['A'].melt()['value'], df['B'].melt()['value']], axis=1, keys=['A','B'])
print (df)
A B
0 10.0 0
1 20.0 19
2 30.0 24
3 40.0 39
4 2.0 3
5 4.0 21
6 20.0 24
7 10.0 23
8 NaN 3
9 NaN 36
10 NaN 12
11 NaN 46
EDIT: A more general version that loops over the unique column names:
uniq = df.columns.unique()
df = pd.concat([df[c].melt()['value'] for c in uniq], axis=1, keys=uniq)
print (df)
A B
0 10.0 0
1 20.0 19
2 30.0 24
3 40.0 39
4 2.0 3
5 4.0 21
6 20.0 24
7 10.0 23
8 NaN 3
9 NaN 36
10 NaN 12
11 NaN 46
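Note that df[c].melt() only works when c is duplicated, because selecting a unique name returns a Series, which has no melt method. The following sketch is one possible variant that also tolerates unique names by selecting columns with a boolean mask and flattening them column by column; stack_duplicates is a hypothetical helper name, not a pandas function:

```python
import pandas as pd

# Sample frame from the question, with duplicated column names.
df = pd.DataFrame(
    [[10, 2, 0, 3, 3],
     [20, 4, 19, 21, 36],
     [30, 20, 24, 24, 12],
     [40, 10, 39, 23, 46]],
    columns=["A", "A", "B", "B", "B"],
)


def stack_duplicates(frame):
    """Stack same-named columns on top of each other, one output column per name."""
    pieces = {}
    for name in frame.columns.unique():
        # A boolean mask always yields a DataFrame, even for a unique name.
        block = frame.loc[:, frame.columns == name]
        # order="F" flattens column by column: first A, then second A, ...
        pieces[name] = pd.Series(block.to_numpy().ravel(order="F"))
    # Shorter columns are padded with NaN when the pieces are aligned.
    return pd.concat(pieces, axis=1)


out = stack_duplicates(df)
print(out)
```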

Groupby and if condition on a data frame in pandas

I have the data frame below:
df=
city  code  qty1  qty2  month  type
hyd   1     10    12    1      x
hyd   2     12    21           y
hyd   2     15    36           x
hyd   4     25    44    3      z
pune  1     10    1            x
pune  3     12    2     2      y
pune  1     15    3            x
pune  2     25    4            x
ban   2     10    1     1      X
ban   4     10    2            x
ban   2     12    3            x
ban   1     15    4     3      y
I want to groupby(city and code) and find both res1 and res2 based on the below conditions.
The result data frame is
result=
city code res1 res2
hyd 1 Nan 12
hyd 2 27 Nan
hyd 4 Nan Nan
pune 1 25 Nan
pune 3 Nan Nan
pune 2 25 Nan
ban 2 12 10
ban 4 10 Nan
ban 1 Nan Nan
I have tried grouping and iterating over the result of the groupby with the conditions, but got no result. Any help would be appreciated. Thanks
You can group by, calculate what you need one piece at a time, then concat the pieces back together:
g = df.groupby(['city', 'code'])
# assumes missing months are stored as empty strings
res1 = g.apply(lambda x: x['qty1'][x['month'] == ''].sum())
res2 = g.apply(lambda x: x['qty2'][(x['month'] != '') & (x['type'] == 'x')].sum())
pd.concat([res1, res2], axis=1)
Out[135]:
0 1
city code
ban 1 0 0
2 12 0
4 10 0
hyd 1 0 12
2 27 0
4 0 0
pune 1 25 0
2 25 0
3 0 0
IIUC
df = df.set_index(['city', 'code'])
cond1 = df.month.isnull()
df['res1'] = df[cond1].groupby(['city', 'code']).qty1.sum()
cond2 = df.month.notnull() & (df.type=='x')
df['res2'] = df[cond2].groupby(['city', 'code']).qty2.sum()
qty1 qty2 month type res1 res2
city code
hyd 1 10 12 1.0 x NaN 12.0
2 12 21 NaN y 27.0 NaN
2 15 36 NaN x 27.0 NaN
4 25 44 3.0 z NaN NaN
pune 1 10 1 NaN x 25.0 NaN
3 12 2 2.0 y NaN NaN
1 15 3 NaN x 25.0 NaN
2 25 4 NaN x 25.0 NaN
ban 2 10 1 1.0 x 12.0 1.0
4 10 2 NaN x 10.0 NaN
2 12 3 NaN x 12.0 1.0
1 15 4 3.0 y NaN NaN
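To get one row per (city, code) pair as in the requested result, the conditional sums can also be computed in a single groupby. This is only a sketch: the conditions are inferred from the answers above (res1 sums qty1 where month is missing, res2 sums qty2 where month is present and type is 'x'), the data is taken from the output above (so the type values are all lowercase), and min_count=1 is used so that groups with no matching rows give NaN rather than 0:

```python
import numpy as np
import pandas as pd

nan = np.nan
df = pd.DataFrame({
    "city": ["hyd"] * 4 + ["pune"] * 4 + ["ban"] * 4,
    "code": [1, 2, 2, 4, 1, 3, 1, 2, 2, 4, 2, 1],
    "qty1": [10, 12, 15, 25, 10, 12, 15, 25, 10, 10, 12, 15],
    "qty2": [12, 21, 36, 44, 1, 2, 3, 4, 1, 2, 3, 4],
    "month": [1, nan, nan, 3, nan, 2, nan, nan, 1, nan, nan, 3],
    "type": ["x", "y", "x", "z", "x", "y", "x", "x", "x", "x", "x", "y"],
})

# Inferred conditions (assumptions, see the text above).
no_month = df["month"].isna()
month_and_x = df["month"].notna() & df["type"].eq("x")

result = (
    # Blank out the quantities that do not satisfy each condition.
    df.assign(res1=df["qty1"].where(no_month),
              res2=df["qty2"].where(month_and_x))
      # sort=False keeps the groups in order of first appearance.
      .groupby(["city", "code"], sort=False)[["res1", "res2"]]
      # min_count=1: a group with only NaN sums to NaN instead of 0.
      .sum(min_count=1)
      .reset_index()
)
print(result)
```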

Transposing multi index dataframe in pandas

HID gen views
1 1 20
1 2 2532
1 3 276
1 4 1684
1 5 779
1 6 200
1 7 545
2 1 20
2 2 7478
2 3 750
2 4 7742
2 5 2643
2 6 208
2 7 585
3 1 21
3 2 4012
3 3 2019
3 4 1073
3 5 3372
3 6 8
3 7 1823
3 8 22
This is a sample section of a data frame, where HID and gen are the index levels.
How can it be transformed like this?
HID 1 2 3 4 5 6 7 8
1 20 2532 276 1684 779 200 545 nan
2 20 7478 750 7742 2643 208 585 nan
3 21 4012 2019 1073 3372 8 1823 22
It's called pivoting, i.e.:
df.reset_index().pivot(index='HID', columns='gen', values='views')
gen 1 2 3 4 5 6 7 8
HID
1 20.0 2532.0 276.0 1684.0 779.0 200.0 545.0 NaN
2 20.0 7478.0 750.0 7742.0 2643.0 208.0 585.0 NaN
3 21.0 4012.0 2019.0 1073.0 3372.0 8.0 1823.0 22.0
Use unstack:
df = df['views'].unstack()
If you also need HID as a column, add reset_index and rename_axis:
df = df['views'].unstack().reset_index().rename_axis(None, axis=1)
print (df)
HID 1 2 3 4 5 6 7 8
0 1 20.0 2532.0 276.0 1684.0 779.0 200.0 545.0 NaN
1 2 20.0 7478.0 750.0 7742.0 2643.0 208.0 585.0 NaN
2 3 21.0 4012.0 2019.0 1073.0 3372.0 8.0 1823.0 22.0
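Both approaches can be checked side by side on a small subset of the sample data. This sketch uses keyword arguments throughout, since recent pandas versions no longer accept positional arguments for pivot or a positional axis for rename_axis; the variable names are illustrative:

```python
import pandas as pd

# Subset of the question's data: gen 1, 2 for every HID, gen 8 only for HID 3.
index = pd.MultiIndex.from_tuples(
    [(1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2), (3, 8)],
    names=["HID", "gen"],
)
df = pd.DataFrame({"views": [20, 2532, 20, 7478, 21, 4012, 22]}, index=index)

# Option 1: move the index levels to columns, then pivot.
wide_pivot = df.reset_index().pivot(index="HID", columns="gen", values="views")

# Option 2: unstack the inner index level directly.
wide_unstack = df["views"].unstack()

# Flat version with HID as an ordinary column and no name on the column axis.
flat = wide_unstack.reset_index().rename_axis(None, axis=1)
print(flat)
```

Missing (HID, gen) combinations become NaN in both results, which is why the values are shown as floats.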

How to merge two dataframes with MultiIndex?

I have a frame that looks like this:
2015-12-30 2015-12-31
300100 am 1 3
pm 3 2
300200 am 5 1
pm 4 5
300300 am 2 6
pm 3 7
and the other frame looks like
2016-1-1 2016-1-2 2016-1-3 2016-1-4
300100 am 1 3 5 1
pm 3 2 4 5
300200 am 2 5 2 6
pm 5 1 3 7
300300 am 1 6 3 2
pm 3 7 2 3
300400 am 3 1 1 3
pm 2 5 5 2
300500 am 1 6 6 1
pm 5 7 7 5
Now I want to merge the two frames so that the result looks like this:
           2015-12-30  2015-12-31  2016-1-1  2016-1-2  2016-1-3  2016-1-4
300100 am  1           3           1         3         5         1
       pm  3           2           3         2         4         5
300200 am  5           1           2         5         2         6
       pm  4           5           5         1         3         7
300300 am  2           6           1         6         3         2
       pm  3           7           3         7         2         3
300400 am                          3         1         1         3
       pm                          2         5         5         2
300500 am                          1         6         6         1
       pm                          5         7         7         5
I tried pd.merge(frame1,frame2,right_index=True,left_index=True), but what it returned was not the desired format. Can anyone help? Thanks!
You can use concat:
print (pd.concat([frame1, frame2], axis=1))
2015-12-30 2015-12-31 1.1.2016 2.1.2016 3.1.2016 4.1.2016
300100 am 1.0 3.0 1 3 5 1
pm 3.0 2.0 3 2 4 5
300200 am 5.0 1.0 2 5 2 6
pm 4.0 5.0 5 1 3 7
300300 am 2.0 6.0 1 6 3 2
pm 3.0 7.0 3 7 2 3
300400 am NaN NaN 3 1 1 3
pm NaN NaN 2 5 5 2
300500 am NaN NaN 1 6 6 1
pm NaN NaN 5 7 7 5
The values in the first and second columns are converted to float, because introducing NaN forces integer columns to float (see the docs).
One possible solution is to replace NaN with some int, e.g. 0, and then convert to int:
print (pd.concat([frame1, frame2], axis=1)
.fillna(0)
.astype(int))
2015-12-30 2015-12-31 1.1.2016 2.1.2016 3.1.2016 4.1.2016
300100 am 1 3 1 3 5 1
pm 3 2 3 2 4 5
300200 am 5 1 2 5 2 6
pm 4 5 5 1 3 7
300300 am 2 6 1 6 3 2
pm 3 7 3 7 2 3
300400 am 0 0 3 1 1 3
pm 0 0 2 5 5 2
300500 am 0 0 1 6 6 1
pm 0 0 5 7 7 5
You can also use join:
frame1.join(frame2, how='outer')
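A small runnable sketch comparing the two suggestions on a subset of the question's data. The index level names code and session are made up for the example. As an alternative to fillna(0), the last step uses the nullable Int64 dtype, which keeps the missing entries as missing values while still displaying integers:

```python
import pandas as pd

# Subset of the question's frames; level names are hypothetical.
idx1 = pd.MultiIndex.from_product(
    [[300100, 300200], ["am", "pm"]], names=["code", "session"]
)
frame1 = pd.DataFrame(
    {"2015-12-30": [1, 3, 5, 4], "2015-12-31": [3, 2, 1, 5]}, index=idx1
)

idx2 = pd.MultiIndex.from_product(
    [[300100, 300200, 300400], ["am", "pm"]], names=["code", "session"]
)
frame2 = pd.DataFrame(
    {"2016-1-1": [1, 3, 2, 5, 3, 2], "2016-1-2": [3, 2, 5, 1, 1, 5]}, index=idx2
)

# Column-wise concat aligns on the union of both MultiIndexes.
merged = pd.concat([frame1, frame2], axis=1)

# An outer join on the index gives the same table.
joined = frame1.join(frame2, how="outer")

# Nullable integers: missing cells stay missing instead of becoming 0.
merged_int = merged.astype("Int64")
print(merged_int)
```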
