How to make each bin of data a column of a dataframe - python-3.x

I have a dataframe with column A. I want to divide the column into bins and add the count of each bin as a column of the dataframe, for example how many points fall into the bin from 0 to 0.5, and so on.
I used this code for binning, but I am not sure how to insert the count columns into df.
import pandas as pd

df = pd.DataFrame({'max': [0.2, 0.3, 1, 1.5, 2.5, 0.2]})
print(df)
max
0 0.2
1 0.3
2 1.0
3 1.5
4 2.5
5 0.2
bins = [0, 0.5, 1, 1.5, 2, 2.5]
x = pd.cut(df['max'], bins)
Desired output:
print(df)
0_0.5_count 0.5_1_count
0 3 1

First add the parameter labels to cut, then count with Series.value_counts, and for a DataFrame use Series.to_frame with a transpose by DataFrame.T:
bins = [0, 0.5, 1, 1.5, 2, 2.5]
labels = ['{}_{}_count'.format(i, j) for i, j in zip(bins[:-1], bins[1:])]
x = pd.cut(df['max'], bins, labels=labels).value_counts().sort_index().to_frame(0).T
print (x)
0_0.5_count 0.5_1_count 1_1.5_count 1.5_2_count 2_2.5_count
0 3 1 1 0 1
Details:
print (pd.cut(df['max'], bins, labels=labels))
0 0_0.5_count
1 0_0.5_count
2 0.5_1_count
3 1_1.5_count
4 2_2.5_count
5 0_0.5_count
Name: max, dtype: category
Categories (5, object): [0_0.5_count < 0.5_1_count < 1_1.5_count < 1.5_2_count < 2_2.5_count]
print (pd.cut(df['max'], bins, labels=labels).value_counts())
0_0.5_count 3
2_2.5_count 1
1_1.5_count 1
0.5_1_count 1
1.5_2_count 0
Name: max, dtype: int64
Alternative solution with GroupBy.size:
bins = [0, 0.5, 1, 1.5, 2, 2.5]
labels = ['{}_{}_count'.format(i, j) for i, j in zip(bins[:-1], bins[1:])]
x = df.groupby(pd.cut(df['max'], bins, labels=labels)).size().rename_axis(None).to_frame().T
print (x)
0_0.5_count 0.5_1_count 1_1.5_count 1.5_2_count 2_2.5_count
0 3 1 1 0 1
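If the goal is literally to attach these counts as columns of the original df (as the question's wording suggests), one option is to broadcast the one-row frame of counts onto every row. This is a minimal sketch, assuming x is the result of either solution above:
# each count becomes a constant column alongside the original data
for col in x.columns:
    df[col] = x.at[0, col]
print(df)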

Related

Count positive, negative or zero values numbers for multiple columns in Python

Given a dataset as follows:
[{'id': 1, 'ltp': 2, 'change': nan},
{'id': 2, 'ltp': 5, 'change': 1.5},
{'id': 3, 'ltp': 3, 'change': -0.4},
{'id': 4, 'ltp': 0, 'change': 2.0},
{'id': 5, 'ltp': 5, 'change': -0.444444},
{'id': 6, 'ltp': 16, 'change': 2.2}]
Or
id ltp change
0 1 2 NaN
1 2 5 1.500000
2 3 3 -0.400000
3 4 0 2.000000
4 5 5 -0.444444
5 6 16 2.200000
I would like to count the number of positive, negative and zero values for the columns ltp and change; the result may look like this:
columns positive negative zero
0 ltp 5 0 1
1 change 3 2 0
How could I do that with Pandas or Numpy? Thanks.
Updated: what if I need to group by type and count following the logic above?
id ltp change type
0 1 2 NaN a
1 2 5 1.500000 a
2 3 3 -0.400000 a
3 4 0 2.000000 b
4 5 5 -0.444444 b
5 6 16 2.200000 b
The expected output:
type columns positive negative zero
0 a ltp 3 0 0
1 a change 1 1 0
2 b ltp 2 0 1
3 b change 2 1 0
Use np.sign on the selected columns first, then count values with value_counts, transpose, replace missing values, rename the column names with a dictionary, and finally convert the index to a columns column:
import numpy as np

d = {-1: 'negative', 1: 'positive', 0: 'zero'}
df = (np.sign(df[['ltp', 'change']])
        .apply(pd.value_counts)
        .T
        .fillna(0)
        .astype(int)
        .rename(columns=d)
        .rename_axis('columns')
        .reset_index())
print (df)
columns negative zero positive
0 ltp 0 1 5
1 change 2 0 3
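If the exact column order from the question's expected output matters, the columns of this result can simply be reordered (a small optional step):
df = df[['columns', 'positive', 'negative', 'zero']]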
EDIT: Another solution for the type column, using DataFrame.melt, mapping the sign with np.sign, and counting values with crosstab:
d = {-1: 'negative', 1: 'positive', 0: 'zero'}
df1 = df.melt(id_vars='type', value_vars=['ltp', 'change'], var_name='columns')
df1['value'] = np.sign(df1['value']).map(d)
df1 = (pd.crosstab([df1['type'], df1['columns']], df1['value'])
         .rename_axis(columns=None)
         .reset_index())
print (df1)
type columns negative positive zero
0 a change 1 1 0
1 a ltp 0 3 0
2 b change 1 2 0
3 b ltp 0 2 1

How to check when a column changes from 0 to 1 and count after how many rows another column changes from 0 to 1

I have a dataframe with columns x and y. I want to detect when the value of column x changes from 0 to 1 and count after how many rows column y changes from 0 to 1 following each such change in x.
Here is my dataframe:
df1=pd.DataFrame({'x':[0,0,0,0,0,1,1,1,1,0,0,0,1,1,1,1,1,1,0,0,1,1,1,1],'y':[0,0,0,0,0,0,1,1,1,0,0,0,0,0,0,1,1,1,0,0,1,1,1,1]})
Desired output:
df_out=pd.DataFrame({'count_delay':[1,3,0]})
You can try with diff:
id1 = df1.index[df1.x.diff().eq(1)]
id2 = df1.index[df1.y.diff().eq(1)]
id2-id1
Int64Index([1, 3, 0], dtype='int64')
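To get this as a DataFrame matching the desired output, the differences can be wrapped up (a minimal sketch; count_delay is the column name from the question, and it assumes x and y change from 0 to 1 the same number of times):
# one delay value per 0 -> 1 transition of x
df_out = pd.DataFrame({'count_delay': (id2 - id1).to_numpy()})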
For a groupby approach:
df1.groupby(df1.x.diff().eq(1).cumsum()).y.apply(lambda x : x.index[x.diff().eq(1)]-x.index.min())
x
0 Int64Index([], dtype='int64')
1 Int64Index([1], dtype='int64')
2 Int64Index([3], dtype='int64')
3 Int64Index([], dtype='int64')
Name: y, dtype: object

Python Pandas Conditional Sum and subtract previous row

I am new here and I need some help with python pandas.
I need help creating a new column where I get the sum of other columns plus the previous row of this calculated column.
This is my example:
df = pd.DataFrame({
'column0': ['x', 'x', 'y', 'x', 'y', 'y', 'x'],
'column1': [50, 100, 30, 0, 30, 80, 0],
'column2': [0, 0, 0, 10, 0, 0, 30],
})
print(df)
column0 column1 column2
0 x 50 0
1 x 100 0
2 y 30 0
3 x 0 10
4 y 30 0
5 y 80 0
6 x 0 30
I have used loc to filter this DataFrame like this:
df = df.loc[df['column0'] == 'x']
df = df.reset_index(drop=True)
Now, when I try to get the output, I don't get the correct result:
df['Result'] = df['column1'] + df['column2']
df['Result'] = df['column1'] + df['column2'] + df['Result'].shift(1)
print(df)
column0 column1 column2 Result
0 x 50 0 NaN
1 x 100 0 100.0
2 x 0 10 10.0
3 x 0 30 30.0
I just want this output:
column0 column1 column2 Result
0 x 50 0 50
1 x 100 0 150.0
2 x 0 10 160.0
3 x 0 30 190.0
Thank you very much!
You can use .cumsum() to calculate a cumulative sum of the column:
df = pd.DataFrame({
'column1': [50, 100, 30, 0, 30, 80, 0],
'column2': [0, 0, 0, 10, 0, 0, 30],
})
df['column3'] = df['column1'].cumsum() - df['column2'].cumsum()
This results in:
column1 column2 column3
0 50 0 50
1 100 0 150
2 30 0 180
3 0 10 170
4 30 0 200
5 80 0 280
6 0 30 250
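Note that the desired output in the question is computed on the rows filtered to column0 == 'x' and adds column2 rather than subtracting it. A minimal sketch of that variant, working from the original df in the question (this is an assumption about the intended logic, not part of the answer above):
# keep only the 'x' rows, as in the question
dfx = df.loc[df['column0'] == 'x'].reset_index(drop=True)
# running total of column1 + column2 gives 50, 150, 160, 190
dfx['Result'] = (dfx['column1'] + dfx['column2']).cumsum()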

Dataframe sequence detection: Find groups where three rows in a row have negative values

Let's say I have a column df['test']:
-1, -2, -3, 2, -4, 3, -5, -4, -3, -7
So I would like to filter out the groups which have at least three negative values in a row. So
groups = my_grouping_function_by_sequence()
groups[0] = [-1, -2, -3]
groups[1] = [-5, -4, -3, -7]
Are there any pre-defined checks for testing sequences in numerical data in pandas? It does not need to be pandas, but I am searching for a fast and adaptable solution. Any advice would be helpful. Thanks!
Using GroupBy and cumsum to create groups of consecutive negative numbers.
grps = df['test'].gt(0).cumsum()
dfs = [d.dropna() for _, d in df.mask(df['test'].gt(0)).groupby(grps) if d.shape[0] >= 3]
Output
for df in dfs:
print(df)
test
0 -1.0
1 -2.0
2 -3.0
test
6 -5.0
7 -4.0
8 -3.0
9 -7.0
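If plain lists like the question's groups example are preferred, the resulting frames can be converted (a small optional step; the values come back as floats because of the NaN mask):
groups = [d['test'].tolist() for d in dfs]
# groups[0] == [-1.0, -2.0, -3.0]; groups[1] == [-5.0, -4.0, -3.0, -7.0]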
Explanation
Let's go through this step by step:
The first line creates groups for consecutive negative numbers:
print(grps)
0 0
1 0
2 0
3 1
4 1
5 2
6 2
7 2
8 2
9 2
Name: test, dtype: int32
But as we can see, it also includes the positive numbers, which we don't want to consider in our output. So we use DataFrame.mask to convert these values to NaN:
df.mask(df['test'].gt(0))
# same as df.mask(df['test'] > 0)
test
0 -1.0
1 -2.0
2 -3.0
3 NaN
4 -4.0
5 NaN
6 -5.0
7 -4.0
8 -3.0
9 -7.0
Then we groupby on this dataframe and only keep the groups which have >= 3 rows:
for _, d in df.mask(df['test'].gt(0)).groupby(grps):
    if d.shape[0] >= 3:
        print(d.dropna())
test
0 -1.0
1 -2.0
2 -3.0
test
6 -5.0
7 -4.0
8 -3.0
9 -7.0
I acknowledge @Erfan's answer is elegant, but I didn't easily understand it. My attempt is below.
import numpy as np
import pandas as pd

df = pd.DataFrame({'test': [-1, -2, -3, 2, -4, 3, -5, -4, -3, -7]})
Conditionally select rows with negatives:
df['j'] = np.where(df['test'] < 0, 1, -1)
df['k'] = df['j'].rolling(3, min_periods=1).sum()
df2 = df[df['k'] == 3]
Iteratively slice the dataframe, getting the third row and the two consecutive rows above it:
for index, row in df2.iterrows():
    print(df.loc[index - 2 : index + 0, 'test'])
@Erfan, your answer is brilliant and I'm still trying to understand the second line. Your first line got me started on writing it in my own, less efficient way.
import pandas as pd
df = pd.DataFrame({'test': [-1, -2, -3, 2, -4, 3, -5, -4, -3, -7]})
df['+ or -'] = df['test'].gt(0)
df['group'] = df['+ or -'].cumsum()
df_gb = df.groupby('group').count().reset_index().drop('+ or -', axis=1)
df_new = pd.merge(df, df_gb, how='left', on='group').drop('+ or -', axis=1)
df_new = df_new[(df_new['test_x'] < 0) & (df_new['test_y'] >= 3)].drop('test_y', axis=1)

for i in df_new['group'].unique():
    j = pd.DataFrame(df_new.loc[df_new['group'] == i, 'test_x'])
    print(j)

Drop a column in pandas if all values equal 1?

How do I drop columns in pandas where all values in that column are equal to a particular number? For instance, consider this dataframe:
df = pd.DataFrame({'A': [1, 1, 1, 1],
'B': [0, 1, 2, 3],
'C': [1, 1, 1, 1]})
print(df)
Output:
A B C
0 1 0 1
1 1 1 1
2 1 2 1
3 1 3 1
How would I drop the 1 columns so that the output is:
B
0 0
1 1
2 2
3 3
Use DataFrame.loc and test whether each column has at least one non-1 value with DataFrame.ne and DataFrame.any:
df1 = df.loc[:, df.ne(1).any()]
Or test for 1 with DataFrame.eq and DataFrame.all to check for all Trues per column, then invert the mask with ~:
df1 = df.loc[:, ~df.eq(1).all()]
print (df1)
B
0 0
1 1
2 2
3 3
EDIT:
One consideration is what you want to happen if you have a column with NaN and 1 only.
Then replace NaNs with 0 using DataFrame.fillna and use the same solutions as before:
df1 = df.loc[:, df.fillna(0).ne(1).any()]
df1 = df.loc[:, ~df.fillna(0).eq(1).all()]
You can use any:
df.loc[:, df.ne(1).any()]
One consideration is what you want to happen if you have a column with NaN and 1 only.
If you want to drop the column under this condition as well, you will need to either fillna with 1 or add a new condition.
df = pd.DataFrame({'A': [1, 1, 1, 1],
'B': [0, 1, 2, 3],
'C': [1, 1, 1, np.nan]})
print(df)
A B C
0 1 0 1.0
1 1 1 1.0
2 1 2 1.0
3 1 3 NaN
Both of these leave that column, with its NaN and 1s, in place:
df.loc[:, df.ne(1).any()]
df.loc[:, ~df.eq(1).all()]
So, you can add this condition to drop that column as well:
df.loc[:, ~(df.eq(1) | df.isna()).all()]
Output:
B
0 0
1 1
2 2
3 3
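The other option mentioned above, filling NaN with 1 first, gives the same result (a minimal sketch):
# NaN is treated as 1, so a column containing only 1s and NaN is dropped too
df.loc[:, df.fillna(1).ne(1).any()]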
