How to swap two rows of a Pandas DataFrame? - python-3.x

Suppose I have this dataframe :
0 1 2 3 4
0 0 1 2 3 4
1 5 6 7 8 9
2 10 11 12 13 14
3 15 16 17 18 19
4 20 21 22 23 24
I want to swap the positions of rows 1 and 2.
Is there a native Pandas function that can do this?
Thanks!

Use rename with a custom dict and sort_index:
d = {1: 2, 2: 1}
df_final = df.rename(d).sort_index()
Out[27]:
0 1 2 3 4
0 0 1 2 3 4
1 10 11 12 13 14
2 5 6 7 8 9
3 15 16 17 18 19
4 20 21 22 23 24
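If the two row labels aren't known in advance, the same idea can be wrapped in a small helper (a sketch; swap_by_label is just a hypothetical name, and it assumes the index labels are unique):
def swap_by_label(df, a, b):
    # rename a -> b and b -> a, then restore the original order by sorting the index
    return df.rename(index={a: b, b: a}).sort_index()

swap_by_label(df, 1, 2)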

As far as I am aware there is no native Pandas function for this.
But here is a custom function:
import numpy as np
import pandas as pd

# Input
df = pd.DataFrame(np.arange(25).reshape(5, -1))

def swap_rows(df, i1, i2):
    # copy both rows first so the second assignment doesn't read already-overwritten values
    a, b = df.iloc[i1, :].copy(), df.iloc[i2, :].copy()
    df.iloc[i1, :], df.iloc[i2, :] = b, a
    return df

print(swap_rows(df, 1, 2))
Output:
0 1 2 3 4
0 0 1 2 3 4
1 10 11 12 13 14
2 5 6 7 8 9
3 15 16 17 18 19
4 20 21 22 23 24
Cheers!

Try numpy flip:
df.iloc[1:3] = np.flip(df.to_numpy()[1:3], axis=0)
df
0 1 2 3 4
0 0 1 2 3 4
1 10 11 12 13 14
2 5 6 7 8 9
3 15 16 17 18 19
4 20 21 22 23 24
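If the rows are not adjacent, the same positional-assignment idea should also work with a list of positions; converting the right-hand side to a NumPy array avoids index alignment on assignment (a sketch, not from the original answer):
i, j = 1, 3  # any two row positions
df.iloc[[i, j]] = df.iloc[[j, i]].to_numpy()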

Make a copy first so the right-hand side still reads the original values, then assign the rows crosswise:
df1 = df.copy()
df1.iloc[1, :], df1.iloc[2, :] = df.iloc[2, :], df.iloc[1, :]
df1

Related

Map a transit capacity request from transit (nodal) points using pandas dataframes

Input DataFrame with TerminalID, TName, xy coordinate, and Pcode:
import pandas as pd
data = {
'TerminalID': ['5','21','21','2','21','2','5','22','22','22','2','32','41','41','42','50','50'],
'TName': ['AD','AMBO','AMBO','PS','AMBO','PS','AD','AM','AM','AM','PS','BO','BA','BA','BB','AZ','AZ'],
'xy': ['1.12731,1.153756','0.12731,0.153757','0.12731,0.153757','1.989385,1.201941','0.12731,0.153757','1.989385,1.201941','1.12731,1.153756','2.12731,1.153756','2.12731,1.153756','2.12731,1.153756','1.989385,1.201941','1.989385,1.201941','2.989385,1.201941','2.989385,1.201941','2.989385,3.201941','3.989385,3.201941','3.989385,3.201941'],
'Pcode': [ 'None','Z014','Z015','Z016','Z017','Z018','None','Z020','Z021','Z022','Z023','Z024','Z025','Z026','Z027','Z028','Z029']
}
df = pd.DataFrame.from_dict(data)
Out[55]:
Output of DF1:
TerminalID TName xy Pcode
0 5 AD 1.12731,1.153756 None
1 21 AMBO 0.12731,0.153757 Z014
2 21 AMBO 0.12731,0.153757 Z015
3 2 PS 1.989385,1.201941 Z016
4 21 AMBO 0.12731,0.153757 Z017
5 2 PS 1.989385,1.201941 Z018
6 5 AD 1.12731,1.153756 None
7 22 AM 2.12731,1.153756 Z020
8 22 AM 2.12731,1.153756 Z021
9 22 AM 2.12731,1.153756 Z022
10 2 PS 1.989385,1.201941 Z023
11 32 BO 1.989385,1.201941 Z024
12 41 BA 2.989385,1.201941 Z025
13 41 BA 2.989385,1.201941 Z026
14 42 BB 2.989385,3.201941 Z027
15 50 AZ 3.989385,3.201941 Z028
16 50 AZ 3.989385,3.201941 Z029
DF2:
Tcap is the running capacity count at each TerminalID, and T_Load is the actual load requested at that terminal.
The 0 at the start and the end are padding for the solution.
data2= {
'BusID': ['18','18','18','18','18','18','18','18','18'],
'Tcap': ['0','2','3','6','7','8','10','12','12'],
'T_Load': ['0','2','1','2','2','1','2','2','0'],
'TerminalID': [ '5','21','33','2','32','42','41','50','5'],
'TName':['AD','AMBO','AM','PS','BO','BB','BA','AZ','AD']
}
df2 = pd.DataFrame.from_dict(data2)
Out[59]:
BusID Tcap T_Load TerminalID TName
0 18 0 0 5 AD
1 18 2 2 21 AMBO
2 18 3 1 33 AM
3 18 6 2 2 PS
4 18 7 2 32 BO
5 18 8 1 42 BB
6 18 10 2 41 BA
7 18 12 2 50 AZ
8 18 12 0 5 AD
DF3, the final output requested:
The output is based on the T_Load constraints.
data3 = {
'BusID': ['18','18','18','18','18','18','18','18','18'],
'Tcap': ['0','2','3','6','7','8','10','12','12'],
'T_Load': ['0','2','1','3','1','1','2','2','0'],
'TerminalID': [ '5','21','33','2','32','42','41','50','5'],
'TName':['AD','AMBO','AM','PS','BO','BB','BA','AZ','AD'],
'Pcode':['None','Z013,Z019','Z020','Z016,Z018,Z023','Z024','Z027','Z025,Z026','Z028,Z029','None']
}
df3 = pd.DataFrame.from_dict(data3)
Out[61]:
BusID Tcap T_Load TerminalID TName Pcode
0 18 0 0 5 AD None
1 18 2 2 21 AMBO Z013,Z019
2 18 3 1 33 AM Z020
3 18 6 3 2 PS Z016,Z018,Z023
4 18 7 1 32 BO Z024
5 18 8 1 42 BB Z027
6 18 10 2 41 BA Z025,Z026
7 18 12 2 50 AZ Z028,Z029
8 18 12 0 5 AD None
Thank you.
My solution: aggregate Pcode into a list per TerminalID and TName, join it to df2 as a new column, then filter the values by position in a list comprehension with join:
s = df.groupby(['TerminalID','TName'])['Pcode'].agg(list).rename('P_list')
df = df2.join(s, on=['TerminalID','TName'])
df['P_list'] = [','.join(x[:int(y)]) if int(y) != 0 else None
                for x, y in zip(df['P_list'], df['T_Load'])]
print (df)
BusID Tcap T_Load TerminalID TName P_list
0 18 0 0 5 AD None
1 18 2 2 21 AMBO Z014,Z015
2 18 3 1 22 AM Z020
3 18 6 3 2 PS Z016,Z018,Z023
4 18 7 1 32 BO Z024
5 18 8 1 42 BB Z027
6 18 10 2 41 BA Z025,Z026
7 18 12 2 50 AZ Z028,Z029
8 18 12 0 5 AD None
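One caveat (my addition, not in the original answer): if a (TerminalID, TName) pair in df2 has no match in df, the join leaves NaN in P_list and the slicing in the list comprehension raises. A defensive variant could check that the value is really a list:
df['P_list'] = [','.join(x[:int(y)]) if int(y) != 0 and isinstance(x, list) else None
                for x, y in zip(df['P_list'], df['T_Load'])]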
You can map the aggregated strings per TName:
df2['Plist'] = df2['TName'].map(df.groupby('TName')['Pcode'].agg(','.join))
or, if you want to collapse the multiple 'None' strings into a single one:
df2['Plist'] = df2['TName'].map(df.groupby('TName')['Pcode']
                                  .agg(lambda x: ','.join(e for e in x if e != 'None'))
                                  .replace('', 'None'))
output:
BusID Tcap T_Load TerminalID TName Plist
0 18 0 0 5 AD None
1 18 2 2 21 AMBO Z014,Z015,Z017
2 18 3 1 22 AM Z020,Z021,Z022
3 18 6 3 2 PS Z016,Z018,Z023
4 18 7 1 32 BO Z024
5 18 8 1 42 BB Z027
6 18 10 2 41 BA Z025,Z026
7 18 12 2 50 AZ Z028,Z029
8 18 12 0 5 AD None
Update: limiting the output.
You can then trim the column with a regex; a groupby lets us benefit from vectorized string operations within each group (this is mostly useful when there are few groups and many rows):
df2['P_list'] = (df2.groupby('T_Load')['P_list']
                    .apply(lambda c: c.str.extract(rf'((?:[^,]+,?){{,{str(c.name)}}})',
                                                   expand=False)
                                      .str.strip(','))
                    .replace('', 'None')
                 )
output:
BusID Tcap T_Load TerminalID TName P_list
0 18 0 0 5 AD None
1 18 2 2 21 AMBO Z014,Z015
2 18 3 1 22 AM Z020
3 18 6 3 2 PS Z016,Z018,Z023
4 18 7 1 32 BO Z024
5 18 8 1 42 BB Z027
6 18 10 2 41 BA Z025,Z026
7 18 12 2 50 AZ Z028,Z029
8 18 12 0 5 AD None

Efficient way to populate missing indexes from pandas group by

I grouped a column of a pandas dataframe by hour of the day and counted the number of occurrences of an event, like so:
df_sep.hour.groupby(df_sep.time.dt.hour).size()
Which gives the following result:
time
2 31
3 6
4 7
5 4
6 38
7 9
8 5
9 31
10 8
11 2
12 5
13 30
14 1
15 1
16 28
18 1
20 4
21 29
Name: hour, dtype: int64
For plotting, I would like to complete the series for every hour of the day. E.g., there are no occurrences at midnight (0), so for each missing hour I would like to create that index and set the corresponding value to zero.
To solve this I created two lists (x and y) using the following loop, but it feels a bit hacky... is there a better way to solve this?
x = []
y = []
for i in range(24):
    if i not in df_sep.hour.groupby(df_sep.time.dt.hour).size().index:
        x.append(i)
        y.append(0)
    else:
        x.append(i)
        y.append(df_sep.hour.groupby(df_sep.time.dt.hour).size().loc[i])
result:
for i, j in zip(x, y):
    print(i, j)
0 0
1 0
2 31
3 6
4 7
5 4
6 38
7 9
8 5
9 31
10 8
11 2
12 5
13 30
14 1
15 1
16 28
17 0
18 1
19 0
20 4
21 29
22 0
23 0
Use Series.reindex with range(24):
df_sep.hour.groupby(df_sep.time.dt.hour).size().reindex(range(24), fill_value=0)
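Since the goal is plotting, here is a minimal sketch of how the reindexed series could be used directly (assuming matplotlib is available; the zero-filled hours show up as empty bars):
import matplotlib.pyplot as plt

counts = df_sep.hour.groupby(df_sep.time.dt.hour).size().reindex(range(24), fill_value=0)
counts.plot.bar()
plt.xlabel('hour of day')
plt.ylabel('occurrences')
plt.show()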

How to shift column labels to the left in Python

I have a DataFrame and I want to shift the column names to the left, starting from a specific column. The original DataFrame has many columns, so I cannot do this by renaming the columns by hand.
df=pd.DataFrame({'A':[1,3,4,7,8,11,1,15,20,15,16,87],
'H':[1,3,4,7,8,11,1,15,78,15,16,87],
'N':[1,3,4,98,8,11,1,15,20,15,16,87],
'p':[1,3,4,9,8,11,1,15,20,15,16,87],
'B':[1,3,4,6,8,11,1,19,20,15,16,87],
'y':[0,0,0,0,1,1,1,0,0,0,0,0]})
print((df))
A H N p B y
0 1 1 1 1 1 0
1 3 3 3 3 3 0
2 4 4 4 4 4 0
3 7 7 98 9 6 0
4 8 8 8 8 8 1
5 11 11 11 11 11 1
6 1 1 1 1 1 1
7 15 15 15 15 19 0
8 20 78 20 20 20 0
9 15 15 15 15 15 0
10 16 16 16 16 16 0
11 87 87 87 87 87 0
Here I want to remove the label N. The first DataFrame after removing label N:
A H p B y
0 1 1 1 1 1 0
1 3 3 3 3 3 0
2 4 4 4 4 4 0
3 7 7 98 9 6 0
4 8 8 8 8 8 1
5 11 11 11 11 11 1
6 1 1 1 1 1 1
7 15 15 15 15 19 0
8 20 78 20 20 20 0
9 15 15 15 15 15 0
10 16 16 16 16 16 0
11 87 87 87 87 87 0
Required output:
A H P B y
0 1 1 1 1 1 0
1 3 3 3 3 3 0
2 4 4 4 4 4 0
3 7 7 98 9 6 0
4 8 8 8 8 8 1
5 11 11 11 11 11 1
6 1 1 1 1 1 1
7 15 15 15 15 19 0
8 20 78 20 20 20 0
9 15 15 15 15 15 0
10 16 16 16 16 16 0
11 87 87 87 87 87 0
Here the last column can be ignored.
Note: the original DataFrame has many columns and I cannot rename them by hand, so I need an automatic method to shift the column names left.
You can do:
df.columns = sorted(df.columns.str.replace('N', ''), key=lambda x: x == '')
df
A H p B y
0 1 1 1 1 1 0
1 3 3 3 3 3 0
2 4 4 4 4 4 0
3 7 7 98 9 6 0
4 8 8 8 8 8 1
5 11 11 11 11 11 1
6 1 1 1 1 1 1
7 15 15 15 15 19 0
8 20 78 20 20 20 0
9 15 15 15 15 15 0
10 16 16 16 16 16 0
11 87 87 87 87 87 0
Replace the columns with your own custom list.
>>> cols = list(df.columns)
>>> cols.remove('N')
>>> df.columns = cols + ['']
Output
>>> df
A H p B y
0 1 1 1 1 1 0
1 3 3 3 3 3 0
2 4 4 4 4 4 0
3 7 7 98 9 6 0
4 8 8 8 8 8 1
5 11 11 11 11 11 1
6 1 1 1 1 1 1
7 15 15 15 15 19 0
8 20 78 20 20 20 0
9 15 15 15 15 15 0
10 16 16 16 16 16 0
11 87 87 87 87 87 0
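Both answers hard-code the label to drop. A small generic helper along the same lines (a sketch; shift_labels_left is a hypothetical name, and it pads with empty names on the right):
def shift_labels_left(df, col):
    # drop the given label and pad with '' so the number of names still matches the columns
    cols = [c for c in df.columns if c != col]
    df.columns = cols + [''] * (len(df.columns) - len(cols))
    return df

shift_labels_left(df, 'N')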

Sum two dataframes for equal entries

I have two dataframes with the same entries in column A, but different entries in columns B and C.
One dataframe has multiple entries for each entry in A.
df1
A B C
0 this 3 4
1 is 4 6
2 an 7 9
3 example 12 20
df2
A B C
0 this 11 11
1 this 5 9
2 this 18 7
3 is 12 14
4 an 1 4
5 an 8 12
6 example 3 17
7 example 9 5
8 example 19 6
9 example 7 1
I want to sum the two dataframes for the same entries in column A. The result should look like this:
df3
A B C
0 this 14 15
1 this 8 13
2 this 21 11
3 is 16 20
4 an 8 13
5 an 15 21
6 example 15 37
7 example 21 25
8 example 31 26
9 example 19 21
How can I calculate this in a fast way in pandas?
Use DataFrame.merge to left-merge df1 onto df2 on column A, then add columns B and C of df2 to columns B and C of df3:
df3 = df2[['A']].merge(df1, on='A', how='left')
df3[['B', 'C']] += df2[['B', 'C']]
Result:
print(df3)
A B C
0 this 14 15
1 this 8 13
2 this 21 11
3 is 16 20
4 an 8 13
5 an 15 21
6 example 15 37
7 example 21 25
8 example 31 26
9 example 19 21
Or another possible idea, if order is not important:
df3 = df2.set_index('A').add(df1.set_index('A')).reset_index()
print(df3)
A B C
0 an 8 13
1 an 15 21
2 example 15 37
3 example 21 25
4 example 31 26
5 example 19 21
6 is 16 20
7 this 14 15
8 this 8 13
9 this 21 11
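A third variant that keeps df2's row order without a merge (a sketch; it assumes column A is unique in df1, as in the example):
df3 = df2.copy()
df3[['B', 'C']] += df1.set_index('A').loc[df2['A'], ['B', 'C']].to_numpy()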

Getting the quarter number from a numeric week number, and the week number within that quarter, in Python?

I have a list of numbers from 1 to 53. I am trying to calculate 1) the quarter of a week and 2) the number of that week within that quarter, from the numeric week number (week 53 needs to be qtr 4 wk 14; week 27 needs to be qtr 3 wk 1). I got this working in Excel, but not in Python. Any thoughts?
I tried the following, but each attempt has an issue with weeks like 13 or 27, depending on the method I'm using:
13 -> should be qtr 1, 27 -> should be qtr 3.
df['qtr1'] = df['wk']//13
df['qtr2']=(np.maximum((df['wk']-1),1)/13)+1
df['qtr3']=((df1['wk']-1)//13)
df['qtr4'] = df['qtr2'].astype(int)
The results are awkward:
wk qtr qtr2 qtr3 qtr4
1.0 0 1.076923 -1.0 1
13.0 1(wrong) 1.923077 0.0 1
14.0 1 2.000000 1.0 2
27.0 2 3.000000 1.0 2 (wrong)
28.0 2 3.076923 2.0 3
You can convert your weeks to integers by using astype:
df['wk'] = df['wk'].astype(int)
You should subtract one first, like:
df['qtr'] = ((df['wk']-1) // 13) + 1
df['weekinqtr'] = (df['wk']-1) % 13 + 1
since 13//13 will be 1, not zero. This gives us:
>>> df
wk qtr weekinqtr
0 1 1 1
1 13 1 13
2 14 2 1
3 26 2 13
4 27 3 1
5 28 3 2
If you want extra columns per quarter, you can use get_dummies(..) to obtain a one-hot encoding:
>>> df.join(pd.get_dummies(df['qtr'], prefix='qtr'))
wk qtr weekinqtr qtr_1 qtr_2 qtr_3
0 1 1 1 1 0 0
1 13 1 13 1 0 0
2 14 2 1 0 1 0
3 26 2 13 0 1 0
4 27 3 1 0 0 1
5 28 3 2 0 0 1
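The question also asks for week 53 to land in quarter 4 as week 14, which the formula above maps to a fifth quarter instead. A possible tweak for that edge case (a sketch, simply clipping the quarter at 4):
df['qtr'] = ((df['wk'] - 1) // 13 + 1).clip(upper=4)
df['weekinqtr'] = df['wk'] - (df['qtr'] - 1) * 13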
Floor division // and modulo % work for what you want, I think:
In [254]: df = pd.DataFrame({'week': range(1, 52)})
In [255]: df['qtr'] = (df['week'] // 13) + 1
In [256]: df['qtr_week'] = df['week'] % 13
In [257]: df.loc[df['qtr_week'] == 0, 'qtr_week'] = 13
In [258]: df
Out[258]:
week qtr qtr_week
0 1 1 1
1 2 1 2
2 3 1 3
3 4 1 4
4 5 1 5
5 6 1 6
6 7 1 7
7 8 1 8
8 9 1 9
9 10 1 10
10 11 1 11
11 12 1 12
12 13 2 13
13 14 2 1
14 15 2 2
15 16 2 3
16 17 2 4
17 18 2 5
18 19 2 6
19 20 2 7
20 21 2 8
21 22 2 9
22 23 2 10
23 24 2 11
24 25 2 12
25 26 3 13
26 27 3 1
27 28 3 2
28 29 3 3
29 30 3 4
30 31 3 5
31 32 3 6
32 33 3 7
33 34 3 8
34 35 3 9
35 36 3 10
36 37 3 11
37 38 3 12
38 39 4 13
39 40 4 1
40 41 4 2
41 42 4 3
42 43 4 4
43 44 4 5
44 45 4 6
45 46 4 7
46 47 4 8
47 48 4 9
48 49 4 10
49 50 4 11
50 51 4 12
