Rolling Sum in Pandas - python-3.x

I need to find the sum of (each row + 3) subsequent rows. Here is the code I'm using to get this accomplished: df_sum_row_and_3_subsequent_rows = df.iloc[::-1].rolling(4, min_periods=1).sum().sum(axis=1).iloc[::-1]
Is there a better way of doing this?
Here is the DataFrame that I'm using:
NBL NBT NBR SBL SBT SBR EBL EBT EBR WBL WBT WBR
DATETIME
2020-01-01 10:00:00 9 2 28 8 6 5 4 92 9 21 124 5
2020-01-01 10:15:00 13 0 24 12 2 7 5 91 7 20 123 11
2020-01-01 10:30:00 5 1 16 10 2 4 9 115 12 21 118 9
2020-01-01 10:45:00 10 5 25 9 2 6 5 114 6 25 128 13
2020-01-01 11:00:00 11 4 28 11 3 7 7 110 8 30 126 10
2020-01-01 11:15:00 0 0 0 0 0 0 0 0 0 0 0 0
2020-01-01 11:30:00 0 0 0 0 0 0 0 0 0 0 0 0
2020-01-01 11:45:00 8 5 24 12 10 12 14 130 18 42 154 17
2020-01-01 12:00:00 14 5 29 15 1 17 4 138 17 44 141 9
2020-01-01 12:15:00 12 4 45 13 3 13 13 147 27 47 134 13

Related

Pandas: Combine pandas columns that have the same column name

If we have the following df,
df
A A B B B
0 10 2 0 3 3
1 20 4 19 21 36
2 30 20 24 24 12
3 40 10 39 23 46
How can I combine the content of the columns with the same names?
e.g.
A B
0 10 0
1 20 19
2 30 24
3 40 39
4 2 3
5 4 21
6 20 24
7 10 23
8 Na 3
9 Na 36
10 Na 12
11 Na 46
I tried groupby and merge and both are not doing this job.
Any help is appreciated.
If columns names are duplicated you can use DataFrame.melt with concat:
df = pd.concat([df['A'].melt()['value'], df['B'].melt()['value']], axis=1, keys=['A','B'])
print (df)
A B
0 10.0 0
1 20.0 19
2 30.0 24
3 40.0 39
4 2.0 3
5 4.0 21
6 20.0 24
7 10.0 23
8 NaN 3
9 NaN 36
10 NaN 12
11 NaN 46
EDIT:
uniq = df.columns.unique()
df = pd.concat([df[c].melt()['value'] for c in uniq], axis=1, keys=uniq)
print (df)
A B
0 10.0 0
1 20.0 19
2 30.0 24
3 40.0 39
4 2.0 3
5 4.0 21
6 20.0 24
7 10.0 23
8 NaN 3
9 NaN 36
10 NaN 12
11 NaN 46

how to shift column labels to left python

I have dataframe i want to move column name to left from specific column. original dataframe have many columns can not do this by rename columns
df=pd.DataFrame({'A':[1,3,4,7,8,11,1,15,20,15,16,87],
'H':[1,3,4,7,8,11,1,15,78,15,16,87],
'N':[1,3,4,98,8,11,1,15,20,15,16,87],
'p':[1,3,4,9,8,11,1,15,20,15,16,87],
'B':[1,3,4,6,8,11,1,19,20,15,16,87],
'y':[0,0,0,0,1,1,1,0,0,0,0,0]})
print((df))
A H N p B y
0 1 1 1 1 1 0
1 3 3 3 3 3 0
2 4 4 4 4 4 0
3 7 7 98 9 6 0
4 8 8 8 8 8 1
5 11 11 11 11 11 1
6 1 1 1 1 1 1
7 15 15 15 15 19 0
8 20 78 20 20 20 0
9 15 15 15 15 15 0
10 16 16 16 16 16 0
11 87 87 87 87 87 0
Here i want to remove label N first dataframe after removing label N
A H p B y
0 1 1 1 1 1 0
1 3 3 3 3 3 0
2 4 4 4 4 4 0
3 7 7 98 9 6 0
4 8 8 8 8 8 1
5 11 11 11 11 11 1
6 1 1 1 1 1 1
7 15 15 15 15 19 0
8 20 78 20 20 20 0
9 15 15 15 15 15 0
10 16 16 16 16 16 0
11 87 87 87 87 87 0
Rrquired output:
A H P B y
0 1 1 1 1 1 0
1 3 3 3 3 3 0
2 4 4 4 4 4 0
3 7 7 98 9 6 0
4 8 8 8 8 8 1
5 11 11 11 11 11 1
6 1 1 1 1 1 1
7 15 15 15 15 19 0
8 20 78 20 20 20 0
9 15 15 15 15 15 0
10 16 16 16 16 16 0
11 87 87 87 87 87 0
Here last column can be ignore
Note: in original dataframe have many columns , can not rename columns , so need some auto method to shift column names lef
You can do
df.columns=sorted(df.columns.str.replace('N',''),key=lambda x : x=='')
df
A H p B y
0 1 1 1 1 1 0
1 3 3 3 3 3 0
2 4 4 4 4 4 0
3 7 7 98 9 6 0
4 8 8 8 8 8 1
5 11 11 11 11 11 1
6 1 1 1 1 1 1
7 15 15 15 15 19 0
8 20 78 20 20 20 0
9 15 15 15 15 15 0
10 16 16 16 16 16 0
11 87 87 87 87 87 0
Replace the columns with your own custom list.
>>> cols = list(df.columns)
>>> cols.remove('N')
>>> df.columns = cols + ['']
Output
>>> df
A H p B y
0 1 1 1 1 1 0
1 3 3 3 3 3 0
2 4 4 4 4 4 0
3 7 7 98 9 6 0
4 8 8 8 8 8 1
5 11 11 11 11 11 1
6 1 1 1 1 1 1
7 15 15 15 15 19 0
8 20 78 20 20 20 0
9 15 15 15 15 15 0
10 16 16 16 16 16 0
11 87 87 87 87 87 0

Getting a number of quarter from numeric week number and the week number within the quarter in python?

I've a list of number from 1 to 53. I am trying to calculate 1) the quarter of a week and 2) the number of that week within that quarter using numeric week numbers. (if 53, needs to be qtr 4 wk 14, if 27 needs to be 3rd quarter wk 1). Got this working in excel, but not in python? Any thoughts?
tried the following, but at each try I've an issue with the wk's like 13 or 27 depending on the method I'm using.
13 -> should be qtr 1 , 27 -> should be 3 qtr.
df['qtr1'] = df['wk']//13
df['qtr2']=(np.maximum((df['wk']-1),1)/13)+1
df['qtr3']=((df1['wk']-1)//13)
df['qtr4'] = df['qtr2'].astype(int)
Results are awkward
wk qtr qtr2 qtr3 qtr4
1.0 0 1.076923 -1.0 1
13.0 1(wrong) 1.923077 0.0 1
14.0 1 2.000000 1.0 2
27.0 2 3.000000 1.0 2 (wrong)
28.0 2 3.076923 2.0 3
You can convert your weeks to integers, by using astype:
df['wk'] = df['wk'].astype(int)
You should subtract it with one first, like:
df['qtr'] = ((df['wk']-1) // 13) + 1
df['weekinqtr'] = (df['wk']-1) % 13 + 1
since 13//13 will be 1, not zero. This gives us:
>>> df
wk qtr weekinqtr
0 1 1 1
1 13 1 13
2 14 2 1
3 26 2 13
4 27 3 1
5 28 3 2
If you want extra columns per quarter, you can use get_dummies(..) [pandas-doc] to obtain a one-hot encoding per quarter:
>>> df.join(pd.get_dummies(df['qtr'], prefix='qtr'))
wk qtr weekinqtr qtr_1 qtr_2 qtr_3
0 1 1 1 1 0 0
1 13 1 13 1 0 0
2 14 2 1 0 1 0
3 26 2 13 0 1 0
4 27 3 1 0 0 1
5 28 3 2 0 0 1
Using div // and modulo % work for what you want I think
In [254]: df = pd.DataFrame({'week':range(52)})
In [255]: df['qtr'] = (df['week'] // 13) + 1
In [256]: df['qtr_week'] = df['week'] % 13
In [257]: df.loc[(df['qtr_week'] ==0),'qtr_week']=13
In [258]: df
Out[258]:
week qtr qtr_week
0 1 1 1
1 2 1 2
2 3 1 3
3 4 1 4
4 5 1 5
5 6 1 6
6 7 1 7
7 8 1 8
8 9 1 9
9 10 1 10
10 11 1 11
11 12 1 12
12 13 2 13
13 14 2 1
14 15 2 2
15 16 2 3
16 17 2 4
17 18 2 5
18 19 2 6
19 20 2 7
20 21 2 8
21 22 2 9
22 23 2 10
23 24 2 11
24 25 2 12
25 26 3 13
26 27 3 1
27 28 3 2
28 29 3 3
29 30 3 4
30 31 3 5
31 32 3 6
32 33 3 7
33 34 3 8
34 35 3 9
35 36 3 10
36 37 3 11
37 38 3 12
38 39 4 13
39 40 4 1
40 41 4 2
41 42 4 3
42 43 4 4
43 44 4 5
44 45 4 6
45 46 4 7
46 47 4 8
47 48 4 9
48 49 4 10
49 50 4 11
50 51 4 12

How to update a DataFrame with a another DataFrame by its index?

I have a origin data frame as below:
0 0
1 6.19
2 6.19
3 16.19
4 16.19
5 179.33
6 179.33
7 179.33
8 179.33
9 179.33
10 179.33
11 0
12 179.33
13 179.33
14 0
15 0
16 0
17 0
18 0
19 11.49
20 11.49
21 7.15
22 7.15
23 16.19
24 16.19
25 17.85
26 17.85
And the second Data Frame is:
2 3.19
4 16.19
6 179.33
8 179.33
10 179.33
13 179.33
20 11.49
22 7.15
24 16.19
26 17.85
You can see the first column is index, and I want it to be updated according the second data list.
For example:
0 0
1 6.19
2 6.19
Since my second data Frame is 3.19 with index at 2. so my expect output should be like:
0 0
1 6.19
2 3.19
How to I reach that? BTW, I have try to do that like:
for i in df.index:
new = df2.aa[i]
if new:
df.loc['aa'][i]=new
But it should have a good way to do that in pandas, plz help me.
Use Series.combine_first or Series.update:
df1['col'] = df2['col'].combine_first(df1['col'])
Or:
df1['col'].update(df2['col'])
print (df1)
col
0 0.00
1 6.19
2 3.19
3 16.19
4 16.19
5 179.33
6 179.33
7 179.33
8 179.33
9 179.33
10 179.33
11 0.00
12 179.33
13 179.33
14 0.00
15 0.00
16 0.00
17 0.00
18 0.00
19 11.49
20 11.49
21 7.15
22 7.15
23 16.19
24 16.19
25 17.85
26 17.85

Concatenate two dataframes by column

I have 2 dataframes. First dataframe contain number of year and count with 0:
year count
0 1890 0
1 1891 0
2 1892 0
3 1893 0
4 1894 0
5 1895 0
6 1896 0
7 1897 0
8 1898 0
9 1899 0
10 1900 0
11 1901 0
12 1902 0
13 1903 0
14 1904 0
15 1905 0
16 1906 0
17 1907 0
18 1908 0
19 1909 0
20 1910 0
21 1911 0
22 1912 0
23 1913 0
24 1914 0
25 1915 0
26 1916 0
27 1917 0
28 1918 0
29 1919 0
.. ... ...
90 1980 0
91 1981 0
92 1982 0
93 1983 0
94 1984 0
95 1985 0
96 1986 0
97 1987 0
98 1988 0
99 1989 0
100 1990 0
101 1991 0
102 1992 0
103 1993 0
104 1994 0
105 1995 0
106 1996 0
107 1997 0
108 1998 0
109 1999 0
110 2000 0
111 2001 0
112 2002 0
113 2003 0
114 2004 0
115 2005 0
116 2006 0
117 2007 0
118 2008 0
119 2009 0
[120 rows x 2 columns]
Second dataframe have similar columns but filled with smaller number of years and filled count:
year count
0 1970 1
1 1957 7
2 1947 19
3 1987 12
4 1979 7
5 1940 1
6 1950 19
7 1972 4
8 1954 15
9 1976 15
10 2006 3
11 1963 16
12 1980 6
13 1956 13
14 1967 5
15 1893 1
16 1985 5
17 1964 6
18 1949 11
19 1945 15
20 1948 16
21 1959 16
22 1958 12
23 1929 1
24 1965 12
25 1969 15
26 1946 12
27 1961 1
28 1988 1
29 1918 1
30 1999 3
31 1986 3
32 1981 2
33 1960 2
34 1974 4
35 1953 9
36 1968 11
37 1916 2
38 1955 5
39 1978 1
40 2003 1
41 1982 4
42 1984 3
43 1966 4
44 1983 3
45 1962 3
46 1952 4
47 1992 2
48 1973 4
49 1993 10
50 1975 2
51 1900 1
52 1991 1
53 1907 1
54 1977 4
55 1908 1
56 1998 2
57 1997 3
58 1895 1
I want to create third dataframe df3. For each row, if year in df1 and df2 are equal, then df3["count"] = df2["count"] else df3["count"] = df1["count"].
I tried to use join to do this:
df_new = df2.join(df1, on='year', how='left')
df_new['count'] = df_new['count'].fillna(0)
print(df_new)
But got an error:
ValueError: columns overlap but no suffix specified: Index(['year'], dtype='object')
I found the solution to this error(Pandas join issue: columns overlap but no suffix specified) But after I run code with those changes:
df_new = df2.join(df1, on='year', how='left', lsuffix='_left', rsuffix='_right')
df_new['count'] = df_new['count'].fillna(0)
print(df_new)
But output is not what I want:
count year
0 NaN 1890
1 NaN 1891
2 NaN 1892
3 NaN 1893
4 NaN 1894
5 NaN 1895
6 NaN 1896
7 NaN 1897
8 NaN 1898
9 NaN 1899
10 NaN 1900
11 NaN 1901
12 NaN 1902
13 NaN 1903
14 NaN 1904
15 NaN 1905
16 NaN 1906
17 NaN 1907
18 NaN 1908
19 NaN 1909
20 NaN 1910
21 NaN 1911
22 NaN 1912
23 NaN 1913
24 NaN 1914
25 NaN 1915
26 NaN 1916
27 NaN 1917
28 NaN 1918
29 NaN 1919
.. ... ...
29 1.0 1918
30 3.0 1999
31 3.0 1986
32 2.0 1981
33 2.0 1960
34 4.0 1974
35 9.0 1953
36 11.0 1968
37 2.0 1916
38 5.0 1955
39 1.0 1978
40 1.0 2003
41 4.0 1982
42 3.0 1984
43 4.0 1966
44 3.0 1983
45 3.0 1962
46 4.0 1952
47 2.0 1992
48 4.0 1973
49 10.0 1993
50 2.0 1975
51 1.0 1900
52 1.0 1991
53 1.0 1907
54 4.0 1977
55 1.0 1908
56 2.0 1998
57 3.0 1997
58 1.0 1895
[179 rows x 2 columns]
Desired output is:
year count
0 1890 0
1 1891 0
2 1892 0
3 1893 1
4 1894 0
5 1895 1
6 1896 0
7 1897 0
8 1898 0
9 1899 0
10 1900 1
11 1901 0
12 1902 0
13 1903 0
14 1904 0
15 1905 0
16 1906 0
17 1907 1
18 1908 1
19 1909 0
20 1910 0
21 1911 0
22 1912 0
23 1913 0
24 1914 0
25 1915 0
26 1916 2
27 1917 0
28 1918 1
29 1919 0
.. ... ...
90 1980 6
91 1981 2
92 1982 4
93 1983 3
94 1984 3
95 1985 5
96 1986 3
97 1987 12
98 1988 1
99 1989 0
100 1990 0
101 1991 1
102 1992 2
103 1993 10
104 1994 0
105 1995 0
106 1996 0
107 1997 3
108 1998 2
109 1999 3
110 2000 0
111 2001 0
112 2002 0
113 2003 1
114 2004 0
115 2005 0
116 2006 3
117 2007 0
118 2008 0
119 2009 0
[120 rows x 2 columns]
The issue if because you should place year as index. In addition, if you don't want to lose data, you should join on outer instead of left.
This is my code:
df = pd.DataFrame({
"year" : np.random.randint(1850, 2000, size=(100,)),
"qty" : np.random.randint(0, 10, size=(100,)),
})
df2 = pd.DataFrame({
"year" : np.random.randint(1850, 2000, size=(100,)),
"qty" : np.random.randint(0, 10, size=(100,)),
})
df = df.set_index("year")
df2 = df2.set_index("year")
df3 = df.join(df2["qty"], how = "outer", lsuffix='_left', rsuffix='_right')
df3 = df3.fillna(0)
At this step you have 2 columns with values from df1 or df2. In you merge rule, I don't get what you want. You said :
if df1["qty"] == df2["qty"] => df3["qty"] = df2["qty"]
if df1["qty"] != df2["qty"] => df3["qty"] = df1["qty"]
That means you want everytime df1["qty"] because of df1["qty"] == df2["qty"]. Am I right ?
Just in case. If you want a code to adjust you can use apply as follow :
def foo(x1, x2):
if x1 == x2:
return x2
else:
return x1
df3["count"] = df3.apply(lambda row: foo(row["qty_left"], row["qty_left"]), axis=1)
df3.drop(["qty_left","qty_right"], axis = 1, inplace = True)
I hope it helps,
Nicolas

Resources