How to apply rolling mean function by axis 1 python - python-3.x

We can simply calculate a mean along an axis:
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1],
                   'b': [1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1],
                   'c': [1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1]})
# max of the three column means
mean = np.max(df.mean(axis=1))
How can I do the same with a rolling mean?
I tried 1:
# max of the three column means
mean = df.rolling(2).mean(axis=1)
and got this error:
UnsupportedFunctionCall: numpy operations are not valid with window objects. Use .rolling(...).mean() instead
I tried 2:
def tt(x):
    x = pd.DataFrame(x)
    b1 = np.max(x.mean(axis=1))
    return b1

# max of the three column means
mean = df.rolling(2).apply(tt, raw=True)
But this gives three columns in the result, when there should be one value for each moving window.
Where is my mistake? Or is there a more efficient way of doing this?

You can use the axis argument of rolling:
df.rolling(2, axis=0).mean()
>>> A b c
0 NaN NaN NaN
1 1.0 1.0 1.0
2 0.5 0.5 0.5
3 0.5 0.5 0.5
4 0.5 0.5 0.5
5 0.5 0.5 0.5
6 1.0 1.0 1.0
7 0.5 0.5 0.5
8 0.5 0.5 0.5
9 1.0 1.0 1.0
10 1.0 1.0 1.0
r = df.rolling(2, axis=1).mean()
r
>>> A b c
0 NaN 1.0 1.0
1 NaN 1.0 1.0
2 NaN 0.0 0.0
3 NaN 1.0 1.0
4 NaN 0.0 0.0
5 NaN 1.0 1.0
6 NaN 1.0 1.0
7 NaN 0.0 0.0
8 NaN 1.0 1.0
9 NaN 1.0 1.0
10 NaN 1.0 1.0
r.max()
>>> A NaN
b 1.0
c 1.0
dtype: float64
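Note: in recent pandas versions the axis argument of rolling is deprecated (and slated for removal), so the same computation is better written by transposing, rolling over the rows, and transposing back. A minimal sketch, assuming the df defined above:
# roll across the columns without axis=1: transpose, roll, transpose back
r = df.T.rolling(2).mean().T
r.max()  # same per-column maxima as above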

Related

Splitting a dataframe when NaN rows are found

I'm trying to split a dataframe when all-NaN rows are found, using grps = dfs.isnull().all(axis=1).cumsum().
But this is not working when some of the rows have a NaN entry in only a single column.
import pandas as pd
from pprint import pprint
import numpy as np

d = {
    't': [0, 1, 2, 0, 2, 0, 1],
    'input': [2, 2, 2, 2, 2, 2, 4],
    'type': ['A', 'A', 'A', 'B', 'B', 'B', 'A'],
    'value': [0.1, 0.2, 0.3, np.nan, 2, 3, 1],
}
df = pd.DataFrame(d)
dup = df['t'].diff().lt(0).cumsum()
dfs = (
    df.groupby(dup, as_index=False, group_keys=False)
      .apply(lambda x: pd.concat([x, pd.Series(index=x.columns, name='').to_frame().T]))
)
pprint(dfs)
grps = dfs.isnull().all(axis=1).cumsum()
temp = [dfs.dropna() for _, dfs in dfs.groupby(grps)]
i = 0
dfm = pd.DataFrame()
for df in temp:
    df["name"] = f'name{i}'
    i = i + 1
    df = df.append(pd.Series(dtype='object'), ignore_index=True)
    dfm = dfm.append(df, ignore_index=True)
print(dfm)
Input df:
t input type value
0 0.0 2.0 A 0.1
1 1.0 2.0 A 0.2
2 2.0 2.0 A 0.3
NaN NaN NaN NaN
3 0.0 2.0 B NaN
4 2.0 2.0 B 2.0
NaN NaN NaN NaN
5 0.0 2.0 B 3.0
6 1.0 4.0 A 1.0
Output obtained:
t input type value name
0 0.0 2.0 A 0.1 name0
1 1.0 2.0 A 0.2 name0
2 2.0 2.0 A 0.3 name0
3 NaN NaN NaN NaN NaN
4 2.0 2.0 B 2.0 name1
5 NaN NaN NaN NaN NaN
6 0.0 2.0 B 3.0 name2
7 1.0 4.0 A 1.0 name2
8 NaN NaN NaN NaN NaN
9 NaN NaN NaN NaN NaN
Expected:
t input type value name
0 0.0 2.0 A 0.1 name0
1 1.0 2.0 A 0.2 name0
2 2.0 2.0 A 0.3 name0
3 NaN NaN NaN NaN NaN
4 0.0 2.0 B NaN name1
5 2.0 2.0 B 2.0 name1
6 NaN NaN NaN NaN NaN
7 0.0 2.0 B 3.0 name2
8 1.0 4.0 A 1.0 name2
9 NaN NaN NaN NaN NaN
I am basically doing this to append names in a new last column after splitting df using
dfs = (
    df.groupby(dup, as_index=False, group_keys=False)
      .apply(lambda x: pd.concat([x, pd.Series(index=x.columns, name='').to_frame().T]))
)
and appending NaN rows.
Then I use the NaN rows to split the df into a list and add the new column. But dfs.isnull().all(axis=1).cumsum() isn't working for me, and I also get an additional NaN row in the last row of the output obtained.
Suggestions on how to get the expected output would be really helpful.
Setup
df = pd.DataFrame(d)
print(df)
t input type value
0 0 2 A 0.1
1 1 2 A 0.2
2 2 2 A 0.3
3 0 2 B NaN
4 2 2 B 2.0
5 0 2 B 3.0
6 1 4 A 1.0
Simplify your approach
# assign name column before splitting
m = df['t'].diff().lt(0)
df['name'] = 'name' + m.cumsum().astype(str)
# Create null dataframes to concat
nan_rows = pd.DataFrame(index=m[m].index)
last_nan_row = pd.DataFrame(index=df.index[[-1]])
# Concat and sort index
df_out = pd.concat([nan_rows, df, last_nan_row]).sort_index(ignore_index=True)
Result
t input type value name
0 0.0 2.0 A 0.1 name0
1 1.0 2.0 A 0.2 name0
2 2.0 2.0 A 0.3 name0
3 NaN NaN NaN NaN NaN
4 0.0 2.0 B NaN name1
5 2.0 2.0 B 2.0 name1
6 NaN NaN NaN NaN NaN
7 0.0 2.0 B 3.0 name2
8 1.0 4.0 A 1.0 name2
9 NaN NaN NaN NaN NaN
Alternatively, if you still want to start from the intermediate dfs above, here is another approach:
dfs = dfs.reset_index(drop=True)
m = dfs.isna().all(axis=1)
dfs.loc[~m, 'name'] = 'name' + m.cumsum().astype(str)
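From there, a small sketch (reusing dfs and m from above) of splitting back into a list of per-group frames:
groups = m.cumsum()
# how='all' drops only the all-NaN separator rows, so rows with a NaN
# in a single column (like the B row with NaN in value) survive;
# the trailing separator yields an empty piece that is filtered out
parts = [g.dropna(how='all') for _, g in dfs.groupby(groups)]
parts = [p for p in parts if not p.empty]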

Drop NaN containing rows in pandas DataFrame with column condition

I have a dataframe with columns A, B, C and D. I would like to drop all NaN-containing rows, but only where the D and C columns contain the value 0.
Would anyone be able to help me with this issue?
Use boolean indexing with the mask inverted by ~:
import numpy as np
import pandas as pd

np.random.seed(2021)
df = pd.DataFrame(np.random.choice([1, 0, np.nan], size=(10, 4)), columns=list('ABCD'))
print (df)
A B C D
0 1.0 0.0 0.0 1.0
1 0.0 NaN NaN 1.0
2 NaN 0.0 0.0 0.0
3 1.0 1.0 NaN NaN
4 NaN NaN 0.0 0.0
5 0.0 NaN 0.0 1.0
6 0.0 NaN NaN 1.0
7 0.0 1.0 NaN NaN
8 1.0 0.0 1.0 0.0
9 0.0 NaN NaN NaN
To remove rows where both D and C are 0 and any other column contains NaN, use DataFrame.all to test whether both values are 0, chained by & (bitwise AND) with DataFrame.any to test whether at least one value is NaN via DataFrame.isna:
m = df[['D','C']].eq(0).all(axis=1) & df.isna().any(axis=1)
df1 = df[~m]
print (df1)
A B C D
0 1.0 0.0 0.0 1.0
1 0.0 NaN NaN 1.0
3 1.0 1.0 NaN NaN
5 0.0 NaN 0.0 1.0
6 0.0 NaN NaN 1.0
7 0.0 1.0 NaN NaN
8 1.0 0.0 1.0 0.0
9 0.0 NaN NaN NaN
Another alternative without ~ for inverting: negate each condition and change & to | (bitwise OR), per De Morgan's laws:
m = df[['D','C']].ne(0).any(axis=1) | df.notna().all(axis=1)
df1 = df[m]
print (df1)
A B C D
0 1.0 0.0 0.0 1.0
1 0.0 NaN NaN 1.0
3 1.0 1.0 NaN NaN
5 0.0 NaN 0.0 1.0
6 0.0 NaN NaN 1.0
7 0.0 1.0 NaN NaN
8 1.0 0.0 1.0 0.0
9 0.0 NaN NaN NaN
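As a quick sanity check (a small sketch reusing the df above), the two masks are exact complements, so both approaches keep the same rows:
# De Morgan: keeping the complement of the drop mask
m_drop = df[['D', 'C']].eq(0).all(axis=1) & df.isna().any(axis=1)
m_keep = df[['D', 'C']].ne(0).any(axis=1) | df.notna().all(axis=1)
assert (m_keep == ~m_drop).all()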

How to read data from excel and concatenate columns vertically?

I'm reading this data from an excel file:
a b
0 x y x y
1 0 1 2 3
2 0 1 2 3
3 0 1 2 3
4 0 1 2 3
5 0 1 2 3
For each of the a and b categories (a.k.a. samples), there are two columns of x and y values. I want to convert this excel data into a dataframe that looks like this (concatenating the data from samples a and b vertically):
sample x y
0 a 0.0 1.0
1 a 0.0 1.0
2 a 0.0 1.0
3 a 0.0 1.0
4 a 0.0 1.0
5 b 2.0 3.0
6 b 2.0 3.0
7 b 2.0 3.0
8 b 2.0 3.0
9 b 2.0 3.0
I've written the following code:
x = np.arange(0, 4, 2)  # selects the first (even) column of each sample
sample_df = pd.DataFrame()  # empty dataFrame to collect the results
for i in x:  # loop through the excel data
    sample = pd.read_excel(xls2, usecols=[i], nrows=0, header=0)
    values_df = pd.read_excel(xls2, usecols=[i, i + 1], nrows=5, header=1)
    values_df.insert(loc=0, column='sample', value=sample.columns[0])
    sample_df = pd.concat([sample_df, values_df], ignore_index=True)
display(sample_df)
But this is the output I obtain:
sample x y x.1 y.1
0 a 0.0 1.0 NaN NaN
1 a 0.0 1.0 NaN NaN
2 a 0.0 1.0 NaN NaN
3 a 0.0 1.0 NaN NaN
4 a 0.0 1.0 NaN NaN
5 b NaN NaN 2.0 3.0
6 b NaN NaN 2.0 3.0
7 b NaN NaN 2.0 3.0
8 b NaN NaN 2.0 3.0
9 b NaN NaN 2.0 3.0
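A likely cause is that pandas mangles the duplicated header names, so sample b comes back with columns x.1 and y.1 that do not align with sample a's x and y when concatenating. A possible fix, sketched under the assumption that xls2 is the open workbook from the question, is to normalize the column names before concatenating:
sample_df = pd.DataFrame()
for i in np.arange(0, 4, 2):
    sample = pd.read_excel(xls2, usecols=[i], nrows=0, header=0)
    values_df = pd.read_excel(xls2, usecols=[i, i + 1], nrows=5, header=1)
    values_df.columns = ['x', 'y']  # undo the x.1/y.1 mangling so the frames stack
    values_df.insert(loc=0, column='sample', value=sample.columns[0])
    sample_df = pd.concat([sample_df, values_df], ignore_index=True)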

Create multiple new columns based on multiple conditions in Pandas

I am trying to get new columns a and b based on the following dataframe:
a_x b_x a_y b_y
0 13.67 0.0 13.67 0.0
1 13.42 0.0 13.42 0.0
2 13.52 1.0 13.17 1.0
3 13.61 1.0 13.11 1.0
4 12.68 1.0 13.06 1.0
5 12.70 1.0 12.93 1.0
6 13.60 1.0 NaN NaN
7 12.89 1.0 NaN NaN
8 11.68 1.0 NaN NaN
9 NaN NaN 8.87 0.0
10 NaN NaN 8.77 0.0
11 NaN NaN 7.97 0.0
If b_x or b_y is 0.0 (in this case they have the same value whenever both exist), then a_x and a_y also share the same value, so I take either of them as the new columns a and b; if b_x or b_y is 1.0, the values differ, so I take the mean of a_x and a_y as a, and either b_x or b_y as b.
If only one of the pairs a_x, b_x or a_y, b_y is non-null, I take the existing values as a and b.
My expected results will like this:
a_x b_x a_y b_y a b
0 13.67 0.0 13.67 0.0 13.670 0
1 13.42 0.0 13.42 0.0 13.420 0
2 13.52 1.0 13.17 1.0 13.345 1
3 13.61 1.0 13.11 1.0 13.360 1
4 12.68 1.0 13.06 1.0 12.870 1
5 12.70 1.0 12.93 1.0 12.815 1
6 13.60 1.0 NaN NaN 13.600 1
7 12.89 1.0 NaN NaN 12.890 1
8 11.68 1.0 NaN NaN 11.680 1
9 NaN NaN 8.87 0.0 8.870 0
10 NaN NaN 8.77 0.0 8.770 0
11 NaN NaN 7.97 0.0 7.970 0
How can I get the result above? Thank you.
Use:
#filter all a and b columns
b = df.filter(like='b')
a = df.filter(like='a')
#test if at least one 0 or 1 value
m1 = b.eq(0).any(axis=1)
m2 = b.eq(1).any(axis=1)
#get means of a columns
a1 = a.mean(axis=1)
#forward fill missing values and select the last column
b1 = b.ffill(axis=1).iloc[:, -1]
a2 = a.ffill(axis=1).iloc[:, -1]
#new Dataframe with 2 conditions
df1 = pd.DataFrame(np.select([m1, m2], [[a2, b1], [a1, b1]]), index=['a','b']).T
#join to original
df = df.join(df1)
print (df)
a_x b_x a_y b_y a b
0 13.67 0.0 13.67 0.0 13.670 0.0
1 13.42 0.0 13.42 0.0 13.420 0.0
2 13.52 1.0 13.17 1.0 13.345 1.0
3 13.61 1.0 13.11 1.0 13.360 1.0
4 12.68 1.0 13.06 1.0 12.870 1.0
5 12.70 1.0 12.93 1.0 12.815 1.0
6 13.60 1.0 NaN NaN 13.600 1.0
7 12.89 1.0 NaN NaN 12.890 1.0
8 11.68 1.0 NaN NaN 11.680 1.0
9 NaN NaN 8.87 0.0 8.870 0.0
10 NaN NaN 8.77 0.0 8.770 0.0
11 NaN NaN 7.97 0.0 7.970 0.0
But I think the solution can be simplified, because the mean can be used for both conditions (the mean of identical values is just that value):
b = df.filter(like='b')
a = df.filter(like='a')
a1 = a.mean(axis=1)
b1 = b.ffill(axis=1).iloc[:, -1]
df['a'] = a1
df['b'] = b1
print (df)
a_x b_x a_y b_y a b
0 13.67 0.0 13.67 0.0 13.670 0.0
1 13.42 0.0 13.42 0.0 13.420 0.0
2 13.52 1.0 13.17 1.0 13.345 1.0
3 13.61 1.0 13.11 1.0 13.360 1.0
4 12.68 1.0 13.06 1.0 12.870 1.0
5 12.70 1.0 12.93 1.0 12.815 1.0
6 13.60 1.0 NaN NaN 13.600 1.0
7 12.89 1.0 NaN NaN 12.890 1.0
8 11.68 1.0 NaN NaN 11.680 1.0
9 NaN NaN 8.87 0.0 8.870 0.0
10 NaN NaN 8.77 0.0 8.770 0.0
11 NaN NaN 7.97 0.0 7.970 0.0
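The same simplification can also be written as one chained expression; a small sketch, assuming df still holds only the original four columns:
out = df.assign(
    a=df.filter(like='a').mean(axis=1),               # row-wise mean of a_x, a_y
    b=df.filter(like='b').ffill(axis=1).iloc[:, -1],  # last non-null of b_x, b_y
)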

'Error: Can't assign to function call' while using eval() function in python

I am trying to run 2 nested loops to separate data from 1 huge dataframe (say, data) into 12 separate dataframes. data has columns (leaf1, leaf2, leaf3, leaf4, ..., leaf12). I created 12 different dataframes with the names leaf1, leaf2, leaf3, ..., leaf12. I am checking each row of the main dataframe, and if that row is not NaN, I am appending it to one of the newly created dataframes using the following code:
leaf1 = pd.DataFrame()
leaf2 = pd.DataFrame()
.
.
.
leaf12 = pd.DataFrame()

list1 = ['leaf1', 'leaf2', ..., 'leaf12']
for i in list1:
    temp1 = data[[i]]
    if temp1.isnull().any().any() == False:
        eval(i) = eval(i).append(temp1)
On the last line, I need to convert the string into a variable name and then append the dataframe to that variable. However, I am getting an error. Please help.
I think it is better to convert this to a dictionary of DataFrames:
import numpy as np
import pandas as pd

np.random.seed(1997)
df = pd.DataFrame(np.random.choice([np.nan, 1, 5], size=(10, 12)))
df.columns = ['leaf{}'.format(x + 1) for x in df.columns]
print (df)
leaf1 leaf2 leaf3 leaf4 leaf5 leaf6 leaf7 leaf8 leaf9 leaf10 \
0 1.0 1.0 NaN NaN 5.0 5.0 5.0 5.0 1.0 NaN
1 1.0 5.0 NaN 1.0 1.0 NaN 1.0 NaN 5.0 1.0
2 1.0 5.0 5.0 1.0 NaN 1.0 1.0 NaN 5.0 NaN
3 1.0 5.0 NaN 1.0 1.0 5.0 5.0 1.0 1.0 1.0
4 NaN 5.0 5.0 5.0 NaN 1.0 1.0 1.0 1.0 1.0
5 NaN NaN NaN 1.0 NaN 5.0 5.0 1.0 1.0 1.0
6 5.0 1.0 1.0 1.0 NaN 1.0 1.0 5.0 5.0 1.0
7 5.0 1.0 5.0 NaN NaN 5.0 NaN 1.0 1.0 5.0
8 5.0 5.0 1.0 NaN 1.0 1.0 5.0 1.0 5.0 1.0
9 5.0 1.0 5.0 NaN 5.0 NaN NaN 5.0 1.0 NaN
leaf11 leaf12
0 NaN 1.0
1 NaN 5.0
2 1.0 5.0
3 1.0 1.0
4 NaN NaN
5 5.0 1.0
6 5.0 NaN
7 NaN 1.0
8 NaN 1.0
9 NaN 5.0
dfs = {c: df[[c]] if df[c].notnull().all()
       else pd.DataFrame(columns=[c]) for c in df.columns}
It is the same as:
dfs = {}
for c in df.columns:
    if df[c].notnull().all():
        dfs[c] = df[[c]]
    else:
        dfs[c] = pd.DataFrame(columns=[c])
And then select by key, here the column name. With the seed above, leaf9 is the only column with no NaN:
print (dfs['leaf9'])
leaf9
0 1.0
1 5.0
2 5.0
3 1.0
4 1.0
5 1.0
6 5.0
7 1.0
8 5.0
9 1.0
print (dfs['leaf3'])
Empty DataFrame
Columns: [leaf3]
Index: []
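If the empty placeholder frames are not needed at all, a shorter sketch keeps only the fully non-null columns:
complete = df.dropna(axis=1)  # drop every column that contains a NaN
dfs2 = {c: complete[[c]] for c in complete.columns}
print (list(dfs2))  # with the seed above, only ['leaf9'] survives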
