How to extract value of column based on value change in other column (Python, pandas)

I have a dataframe with two columns. I want to extract the value of the first column based on the second column: if, within the last 3 rows, the value of column 2 changes from 0 to any other value, then extract the value of column 1.
df=pd.DataFrame({'column1':[1,5,6,7,8,11,12,14,18,20],'column2':[0,0,1,1,0,0,0,256,256,0]})
print(df)
column1 column2
0 1 0
1 5 0
2 6 1
3 7 1
4 8 0
5 11 0
6 12 0
7 14 256
8 18 256
9 20 0
out_put=pd.DataFrame({'column1':[20],'column2':[0]})
print(out_put)
column1 column2
0 20 0

I believe you need to check the difference between consecutive values in the last 3 rows of the second column:
df1 = df.tail(3)
df2 = df1[df1['column2'].eq(0).astype('int8').diff().eq(1)]
print (df2)
column1 column2
9 20 0
Details:
#last 3 rows
print (df1)
column1 column2
7 14 256
8 18 256
9 20 0
#compare second column for equality with 0
print (df1['column2'].eq(0))
7 False
8 False
9 True
Name: column2, dtype: bool
#convert mask to integers
print (df1['column2'].eq(0).astype('int8'))
7 0
8 0
9 1
Name: column2, dtype: int8
#get difference
print (df1['column2'].eq(0).astype('int8').diff())
7 NaN
8 0.0
9 1.0
Name: column2, dtype: float64
#compare by 1
print (df1['column2'].eq(0).astype('int8').diff().eq(1))
7 False
8 False
9 True
Name: column2, dtype: bool
Finally, filter by boolean indexing.

Related

Python create a column based on the values of each row of another column

I have a pandas dataframe as below:
import pandas as pd
df = pd.DataFrame({'ORDER':["A", "A", "A", "B", "B","B"], 'GROUP': ["A_2018_1B1", "A_2018_1B1H", "A_2018_1M1", "B_2018_I000_1C1", "B_2018_I000_1B1", "B_2018_I000_1C1H"], 'VAL':[1,3,8,5,8,10]})
df
ORDER GROUP VAL
0 A A_2018_1B1 1
1 A A_2018_1B1H 3
2 A A_2018_1M1 8
3 B B_2018_I000_1C1 5
4 B B_2018_I000_1B1 8
5 B B_2018_I000_1C1H 10
I want to create a column "CAL" as the sum of 'VAL' over all rows whose GROUP name is the same except for an H character at the end. For example, 'VAL' for the first two rows will be added because the only difference between the 'GROUP' values is that the 2nd row has an H at the end. Row 3 will remain as it is, rows 4 and 6 will be added, and row 5 will remain the same.
My expected output
ORDER GROUP VAL CAL
0 A A_2018_1B1 1 4
1 A A_2018_1B1H 3 4
2 A A_2018_1M1 8 8
3 B B_2018_I000_1C1 5 15
4 B B_2018_I000_1B1 8 8
5 B B_2018_I000_1C1H 10 15
Try str.replace, then transform:
df.groupby(df.GROUP.str.replace('H','')).VAL.transform('sum')
0 4
1 4
2 8
3 15
4 8
5 15
Name: VAL, dtype: int64
df['CAL'] = df.groupby(df.GROUP.str.replace('H','')).VAL.transform('sum')
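Note that str.replace('H','') removes every H in the name, not only one at the end. If other parts of a GROUP value might contain an H, a variant that strips only a trailing H could look like the sketch below. The anchored pattern and the explicit regex=True are my additions, not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    "ORDER": ["A", "A", "A", "B", "B", "B"],
    "GROUP": ["A_2018_1B1", "A_2018_1B1H", "A_2018_1M1",
              "B_2018_I000_1C1", "B_2018_I000_1B1", "B_2018_I000_1C1H"],
    "VAL": [1, 3, 8, 5, 8, 10],
})

# Strip only a trailing "H". regex=True is spelled out because the default
# of Series.str.replace differs between pandas versions.
key = df["GROUP"].str.replace(r"H$", "", regex=True)

# Sum VAL within each normalized group and broadcast it back to every row.
df["CAL"] = df.groupby(key)["VAL"].transform("sum")
print(df)
```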

How to count value changes to zero in a column (Python)

From a dataframe, I want to count how many times the value changes to zero in each column.
Here is the input df:
pd.DataFrame({'value1':[3,4,7,0,11,20,0,20,15,16],
'value2':[2,2,0,8,8,2,2,2,5,5],
'value3':[7,10,20,4008,0,1,4820,1,1,1]})
value1 value2 value3
0 3 2 7
1 4 2 10
2 7 0 20
3 0 8 4008
4 11 8 0
5 20 2 1
6 0 2 4820
7 20 2 1
8 15 5 1
9 16 5 1
desired output:
df_out=pd.DataFrame({'value1_count':[2],
'value2_count':[1],
'value3_count':[1]})
value1_count value2_count value3_count
0 2 1 1
Try this
df.eq(0).astype(int).diff().eq(-1).sum()
Out[77]:
value1 2
value2 1
value3 1
dtype: int64
To get exactly your output, just add the following
df.eq(0).astype(int).diff().eq(-1).sum().to_frame().T.add_suffix('_count')
Out[85]:
value1_count value2_count value3_count
0 2 1 1
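One detail worth checking against your own data: in the expression above, eq(-1) on the diffed zero mask marks a change away from zero (zero followed by non-zero), while eq(1) marks a change to zero. On the sample data both give the same counts, but they differ when a column ends in zero. A small sketch of the difference (the extra Series at the end is made-up data for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "value1": [3, 4, 7, 0, 11, 20, 0, 20, 15, 16],
    "value2": [2, 2, 0, 8, 8, 2, 2, 2, 5, 5],
    "value3": [7, 10, 20, 4008, 0, 1, 4820, 1, 1, 1],
})

# +1 where a value becomes zero, -1 where a zero is followed by a non-zero
step = df.eq(0).astype(int).diff()
to_zero = step.eq(1).sum()
from_zero = step.eq(-1).sum()
print(to_zero.tolist(), from_zero.tolist())

# Made-up column that ends in zero: the two counts no longer agree
s = pd.Series([5, 0, 3, 0])
s_step = s.eq(0).astype(int).diff()
s_to_zero = int(s_step.eq(1).sum())
s_from_zero = int(s_step.eq(-1).sum())
print(s_to_zero, s_from_zero)
```

Neither variant sees a zero in the very first row as a change, because diff() has nothing to compare it with.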
Here's something you can do
df_out=pd.DataFrame({'value1_count':[df['value1'].value_counts()[0]],'value2_count':[df['value2'].value_counts()[0]],'value3_count':[df['value3'].value_counts()[0]]})
Output
value1_count value2_count value3_count
0 2 1 1
.value_counts() returns a pandas.Series object with the frequency of all the values, the index being the value. So at index [0] you find the frequency of zeros in the column.
>>> columns_name = ['value1_count','value2_count','value3_count']
>>> df_out = pd.DataFrame((df==0).sum().values.reshape(1,-1), columns=columns_name )
>>> df_out
value1_count value2_count value3_count
0 2 1 1

How to check value change in column

My dataframe has three columns: value, ID and distance. In the ID column, whenever the value changes from 2 to any other value, I want to count the rows of that run, record the first and last entries of value for it, and also save the corresponding first entry of the distance column.
df=pd.DataFrame({'value':[3,4,7,8,11,20,15,20,15,16],'ID':[2,2,8,8,8,2,2,2,5,5],'distance':[0,0,1,0,0,0,0,0,0,0]})
print(df)
value ID distance
0 3 2 0
1 4 2 0
2 7 8 1
3 8 8 0
4 11 8 0
5 20 2 0
6 15 2 0
7 20 2 0
8 15 5 0
9 16 5 0
required results:
df_out=pd.DataFrame({'rows_Count':[3,2],'value_first':[7,15],'value_last':[11,16],'distance_first':[1,0]})
print(df_out)
rows_Count value_first value_last distance_first
0 3 7 11 1
1 2 15 16 0
Use:
#compare by 2
m = df['ID'].eq(2)
#filter out rows before the first 2 (not needed for the sample data, but possible in real data)
df = df[m.cumsum().ne(0)]
#create unique groups for non 2 groups, add missing values by reindex
s = m.ne(m.shift()).cumsum()[~m].reindex(df.index)
#aggregate with helper s Series
df1 = df.groupby(s).agg({'ID':'size', 'value':['first','last'], 'distance':'first'})
#flatten MultiIndex
df1.columns = df1.columns.map('_'.join)
df1 = df1.reset_index(drop=True)
print (df1)
ID_size value_first value_last distance_first
0 3 7 11 1
1 2 15 16 0
Verify with changed data (the ID column no longer starts with 2):
df=pd.DataFrame({'value':[3,4,7,8,11,20,15,20,15,16],
'ID':[1,7,8,8,8,2,2,2,5,5],
'distance':[0,0,1,0,0,0,0,0,0,0]})
print(df)
value ID distance
0 3 1 0 <- changed ID
1 4 7 0 <- changed ID
2 7 8 1
3 8 8 0
4 11 8 0
5 20 2 0
6 15 2 0
7 20 2 0
8 15 5 0
9 16 5 0
#compare by 2
m = df['ID'].eq(2)
#filter out rows before the first 2 (not needed for the sample data, but possible in real data)
df = df[m.cumsum().ne(0)]
#create unique groups for non 2 groups, add missing values by reindex
s = m.ne(m.shift()).cumsum()[~m].reindex(df.index)
#aggregate with helper s Series
df1 = df.groupby(s).agg({'ID':'size', 'value':['first','last'], 'distance':'first'})
#flatten MultiIndex
df1.columns = df1.columns.map('_'.join)
df1 = df1.reset_index(drop=True)
print (df1)
ID_size value_first value_last distance_first
0 2 15 16 0
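The grouping step m.ne(m.shift()).cumsum() is the part that tends to be hard to read, so here it is in isolation on the ID column from the first sample. It is only an illustration of the run-labelling idea; the variable names run_id and non_two_sizes are mine.

```python
import pandas as pd

ids = pd.Series([2, 2, 8, 8, 8, 2, 2, 2, 5, 5], name="ID")

# True where ID equals 2
m = ids.eq(2)

# A new run starts wherever the mask differs from the previous row;
# the cumulative sum of those start flags numbers the runs 1, 2, 3, ...
run_id = m.ne(m.shift()).cumsum()
print(run_id.tolist())

# Keep only the runs that are not 2 and count their rows
non_two_sizes = run_id[~m].value_counts().sort_index()
print(non_two_sizes.tolist())
```

Consecutive non-2 values that differ from each other (for example 8 followed by 5 with no 2 in between) would fall into the same run here, because the mask only distinguishes 2 from everything else.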

Selective multiplication of a pandas dataframe

I have a pandas Dataframe and Series of the form
df = pd.DataFrame({'Key':[2345,2542,5436,2468,7463],
'Segment':[0] * 5,
'Values':[2,4,6,6,4]})
print (df)
Key Segment Values
0 2345 0 2
1 2542 0 4
2 5436 0 6
3 2468 0 6
4 7463 0 4
s = pd.Series([5436, 2345])
print (s)
0 5436
1 2345
dtype: int64
In the original df, I want to multiply the third column (Values) by 7, except for the rows whose keys are present in the series.
What would be the best way to achieve this in Python 3.x?
Use DataFrame.loc with Series.isin to select the Values column for the rows whose Key is not in the series (inverted condition), and multiply by the scalar:
df.loc[~df['Key'].isin(s), 'Values'] *= 7
print (df)
Key Segment Values
0 2345 0 2
1 2542 0 28
2 5436 0 6
3 2468 0 42
4 7463 0 28
Another method could be using numpy.where() (this needs import numpy as np):
df['Values'] *= np.where(~df['Key'].isin([5436, 2345]), 7, 1)
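Both snippets above modify df in place. If the original frame should stay untouched, a possible variant (a sketch, not taken from either answer) builds the new column with Series.where and returns a copy through assign:

```python
import pandas as pd

df = pd.DataFrame({
    "Key": [2345, 2542, 5436, 2468, 7463],
    "Segment": [0] * 5,
    "Values": [2, 4, 6, 6, 4],
})
s = pd.Series([5436, 2345])

# True for the rows whose key must NOT be multiplied
excluded = df["Key"].isin(s)

# where() keeps the original value where the condition is True
# and takes the replacement (value * 7) everywhere else
scaled = df["Values"].where(excluded, df["Values"] * 7)

# assign() returns a new frame, so df itself is unchanged
result = df.assign(Values=scaled)
print(result)
```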

Unstacking a pandas dataframe

Suppose I have a dataframe with two columns called 'column' and 'value' that looks like this:
Dataframe 1:
column value
0 column1 1
1 column2 1
2 column3 1
3 column4 1
4 column5 2
5 column6 1
6 column7 1
7 column8 1
8 column9 8
9 column10 2
10 column1 1
11 column2 1
12 column3 1
13 column4 3
14 column5 2
15 column6 1
16 column7 1
17 column8 1
18 column9 1
19 column10 2
20 column1 5
.. ... ...
I want to transform this dataframe so that it looks like this:
Dataframe 2:
column1 column2 column3 column4 column5 column6 column7 column8 column9 column10
0 1 1 1 1 2 1 1 1 8 2
1 1 1 1 3 2 1 1 1 1 2
2 5 .. .. .. .. .. .. .. .. ..
.. .. .. .. .. .. .. .. .. .. ..
Now I know how to do it the other way around. If you have a dataframe called df that looks like dataframe 2 you can stack it with the following code:
df = (df.stack().reset_index(level=0, drop=True).rename_axis(['column']).reset_index(name='value'))
Unfortunately, I don't know how to go back!
Question: How do I manipulate dataframe 1 (unstack it, if that's a word) so that it looks like dataframe 2?
Create a MultiIndex with set_index, using a counter Series built by cumcount, and reshape with unstack:
g = df.groupby('column').cumcount()
df1 = df.set_index([g, 'column'])['value'].unstack(fill_value=0)
print (df1)
column column1 column10 column2 column3 column4 column5 column6 \
0 1 2 1 1 1 2 1
1 1 2 1 1 3 2 1
2 5 0 0 0 0 0 0
column column7 column8 column9
0 1 1 8
1 1 1 1
2 0 0 0
Finally, if you need the columns sorted by the numeric part of their names, use extract to get the integers, convert them, get the column positions with argsort, and reorder with iloc:
df1 = df1.iloc[:, df1.columns.str.extract(r'(\d+)', expand=False).astype(int).argsort()]
print (df1)
column column1 column2 column3 column4 column5 column6 column7 \
0 1 1 1 1 2 1 1
1 1 1 1 3 2 1 1
2 5 0 0 0 0 0 0
column column8 column9 column10
0 1 8 2
1 1 1 2
2 0 0 0
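The same reshape can also be written with pivot, which some readers find easier to follow. The sketch below runs on a small made-up frame (three column labels instead of ten) and is an alternative formulation rather than the answer's method; the names row, wide and order are mine. Unlike unstack(fill_value=0), pivot leaves missing cells as NaN.

```python
import pandas as pd

# Made-up miniature of Dataframe 1: two full rows and the start of a third
df = pd.DataFrame({
    "column": ["column1", "column2", "column10"] * 2 + ["column1"],
    "value": [1, 1, 2, 1, 3, 2, 5],
})

# Position of each value within its own label = target row number
row = df.groupby("column").cumcount()

# One row per counter value, one column per label
wide = df.assign(row=row).pivot(index="row", columns="column", values="value")

# Order the columns by the number at the end of their names
order = sorted(wide.columns, key=lambda name: int(name.replace("column", "")))
wide = wide[order]
print(wide)
```

Cells that have no value yet (the incomplete last row) come out as NaN; chaining .fillna(0) would reproduce the zeros shown above.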
