Suppose I have a dataframe with two columns called 'column' and 'value' that looks like this:
Dataframe 1:
column value
0 column1 1
1 column2 1
2 column3 1
3 column4 1
4 column5 2
5 column6 1
6 column7 1
7 column8 1
8 column9 8
9 column10 2
10 column1 1
11 column2 1
12 column3 1
13 column4 3
14 column5 2
15 column6 1
16 column7 1
17 column8 1
18 column9 1
19 column10 2
20 column1 5
.. ... ...
I want to transform this dataframe so that it looks like this:
Dataframe 2:
column1 column2 column3 column4 column5 column6 column7 column8 column9 column10
0 1 1 1 1 2 1 1 1 8 2
1 1 1 1 3 2 1 1 1 1 2
2 5 .. .. .. .. .. .. .. .. ..
.. .. .. .. .. .. .. .. .. .. ..
Now I know how to do it the other way around. If you have a dataframe called df that looks like dataframe 2 you can stack it with the following code:
df = (df.stack().reset_index(level=0, drop=True).rename_axis(['column']).reset_index(name='value'))
Unfortunately, I don't know how to go back!
Question: How do I manipulate dataframe 1 (unstack it, if that's a word) so that it looks like dataframe 2?
Create a MultiIndex with set_index, using a counter Series created by cumcount, and reshape with unstack:
g = df.groupby('column').cumcount()
df1 = df.set_index([g, 'column'])['value'].unstack(fill_value=0)
print (df1)
column column1 column10 column2 column3 column4 column5 column6 \
0 1 2 1 1 1 2 1
1 1 2 1 1 3 2 1
2 5 0 0 0 0 0 0
column column7 column8 column9
0 1 1 8
1 1 1 1
2 0 0 0
Last, if you need sorting by the numeric part of the column names, use extract to pull out the integers, convert them to int, and get the column positions with argsort - then reorder with iloc:
df1 = df1.iloc[:, df1.columns.str.extract(r'(\d+)', expand=False).astype(int).argsort()]
print (df1)
column column1 column2 column3 column4 column5 column6 column7 \
0 1 1 1 1 2 1 1
1 1 1 1 3 2 1 1
2 5 0 0 0 0 0 0
column column8 column9 column10
0 1 8 2
1 1 1 2
2 0 0 0
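A self-contained sketch of the same approach on a reduced frame (the data here is a small subset of the asker's, for illustration):

```python
import pandas as pd

# Long-format frame: repeated column names, one value per row
df = pd.DataFrame({
    'column': ['column1', 'column2', 'column1', 'column2', 'column1'],
    'value':  [1, 1, 1, 3, 5],
})

# cumcount numbers each repeat of a column name, giving the row index
g = df.groupby('column').cumcount()
df1 = df.set_index([g, 'column'])['value'].unstack(fill_value=0)
print(df1)
# column  column1  column2
# 0             1        1
# 1             1        3
# 2             5        0
```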
Given the following dataframe:
user_id col1 col2
1 A 4
1 A 22
1 A 112
1 B -0.22222
1 B 9
1 C 0
2 A -1
2 A -5
2 K NA
I want to group by user_id and col1 and count, then sort the counts within each group in descending order.
Here is what I'm trying, but I don't get the right output:
df[["user_id", "col1"]]. \
groupby(["user_id", "col1"]). \
agg(counts=("col1","count")). \
reset_index(). \
sort_values(["user_id", "col1", "counts"], ascending=False)
Please advise what should I change to make it work.
Expected output:
user_id col1 counts
1 A 3
B 2
C 1
2 A 2
K 1
Use GroupBy.size:
In [199]: df.groupby(['user_id', 'col1']).size()
Out[199]:
user_id col1
1 A 3
B 2
C 1
2 A 2
K 1
OR:
In [201]: df.groupby(['user_id', 'col1']).size().reset_index(name='counts')
Out[201]:
user_id col1 counts
0 1 A 3
1 1 B 2
2 1 C 1
3 2 A 2
4 2 K 1
EDIT:
In [206]: df.groupby(['user_id', 'col1']).agg({'col2': 'size'})
Out[206]:
col2
user_id col1
1 A 3
B 2
C 1
2 A 2
K 1
EDIT-2: For sorting, use:
In [213]: df.groupby(['user_id', 'col1'])['col2'].size().sort_values(ascending=False)
Out[213]:
user_id col1
1 A 3
2 A 2
1 B 2
2 K 1
1 C 1
Name: col2, dtype: int64
Using the main idea from Mayank's answer:
df.groupby(["user_id", "col1"]).size().reset_index(name="counts").sort_values(["user_id", "counts"], ascending=[True, False])
Solved my issue.
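Putting the pieces together, a minimal sketch that reproduces the expected ordering, counts descending within each user_id (sample data mirrors the question):

```python
import pandas as pd

df = pd.DataFrame({
    'user_id': [1, 1, 1, 1, 1, 1, 2, 2, 2],
    'col1':    ['A', 'A', 'A', 'B', 'B', 'C', 'A', 'A', 'K'],
})

# size() counts rows per (user_id, col1); sort user_id ascending
# and counts descending so the biggest groups come first per user
out = (df.groupby(['user_id', 'col1'])
         .size()
         .reset_index(name='counts')
         .sort_values(['user_id', 'counts'], ascending=[True, False]))
print(out)
#    user_id col1  counts
# 0        1    A       3
# 1        1    B       2
# 2        1    C       1
# 3        2    A       2
# 4        2    K       1
```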
I have the following data frame:
Col1 Col2 Col3 Type
0 1 2 3 1
1 4 5 6 1
2 7 8 9 2
and I would like to have a shuffled output like :
Col3 Col1 Col2 Type
0 3 1 2 1
1 6 4 5 1
2 9 7 8 2
How to achieve this?
Use DataFrame.sample with axis=1:
df = df.sample(frac=1, axis=1)
If the last column should keep its position:
a = df.columns[:-1].to_numpy()
np.random.shuffle(a)
print (a)
['Col3' 'Col1' 'Col2']
df = df[np.append(a, ['Type'])]
print (df)
Col3 Col1 Col2 Type
0 3 1 2 1
1 6 4 5 1
2 9 7 8 2
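A minimal sketch that keeps the last column fixed and makes the shuffle reproducible with a seeded generator (the seed value is an arbitrary choice, not from the question):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Col1': [1, 4, 7], 'Col2': [2, 5, 8],
                   'Col3': [3, 6, 9], 'Type': [1, 1, 2]})

# Seeded generator so the shuffled order is the same on every run
rng = np.random.default_rng(0)
a = rng.permutation(df.columns[:-1].to_numpy())

# Reattach the untouched last column and reindex
df = df[np.append(a, ['Type'])]
print(df.columns.tolist())
```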
I have a dataframe with columns 'PK', 'Column1', 'Column2'.
I want to update Column1 and Column2 as follows:
If Column1 > Column2 then (Column1 = Column1 - Column2) and at the same time Column2 = 0
Similarly
If Column1 < Column2 then (Column2 = Column2 - Column1) and at the same time Column1 = 0
I have tried the following, but it is not giving the expected result:
df["Column1"] = np.where(df['Column1'] > df['Column2'], df['Column1'] - df['Column2'], 0)
df["Column2"] = np.where(df['Column1'] < df['Column2'], df['Column2'] - df['Column1'], 0)
Use DataFrame.assign to avoid testing the already-overwritten Column1 in the second line of your code:
df = pd.DataFrame({
'Column1':[4,5,4,5,5,4],
'Column2':[7,8,9,4,2,3],
})
print (df)
Column1 Column2
0 4 7
1 5 8
2 4 9
3 5 4
4 5 2
5 4 3
a = np.where(df['Column1'] > df['Column2'], df['Column1'] - df['Column2'], 0)
b = np.where(df['Column1'] < df['Column2'], df['Column2'] - df['Column1'], 0)
df = df.assign(Column1 = a, Column2 = b)
print (df)
Column1 Column2
0 0 3
1 0 3
2 0 5
3 1 0
4 3 0
5 1 0
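A small sketch of why the sequential version misbehaves, using one row from the sample data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Column1': [4], 'Column2': [7]})

# First line overwrites Column1: 4 > 7 is False, so Column1 becomes 0
df['Column1'] = np.where(df['Column1'] > df['Column2'],
                         df['Column1'] - df['Column2'], 0)

# Second line now compares the NEW Column1 (0), not the original 4:
# 0 < 7 is True, so Column2 = 7 - 0 = 7 instead of the expected 3
df['Column2'] = np.where(df['Column1'] < df['Column2'],
                         df['Column2'] - df['Column1'], 0)
print(df)  # Column2 is 7, not 3
```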
From a dataframe, I want to check how many times the values in each column change to zero.
Here is the input df:
pd.DataFrame({'value1':[3,4,7,0,11,20,0,20,15,16],
'value2':[2,2,0,8,8,2,2,2,5,5],
'value3':[7,10,20,4008,0,1,4820,1,1,1]})
value1 value2 value3
0 3 2 7
1 4 2 10
2 7 0 20
3 0 8 4008
4 11 8 0
5 20 2 1
6 0 2 4820
7 20 2 1
8 15 5 1
9 16 5 1
desired output:
df_out=pd.DataFrame({'value1_count':[2],
'value2_count':[1],
'value3_count':[1]})
value1_count value2_count value3_count
0 2 1 1
Try this
df.eq(0).astype(int).diff().eq(-1).sum()
Out[77]:
value1 2
value2 1
value3 1
dtype: int64
To get exactly your output, just add the following:
df.eq(0).astype(int).diff().eq(-1).sum().to_frame().T.add_suffix('_count')
Out[85]:
value1_count value2_count value3_count
0 2 1 1
Here's something you can do:
df_out = pd.DataFrame({'value1_count': [df['value1'].value_counts()[0]],
                       'value2_count': [df['value2'].value_counts()[0]],
                       'value3_count': [df['value3'].value_counts()[0]]})
Output
value1_count value2_count value3_count
0 2 1 1
.value_counts() returns a pandas.Series object with the frequency of all the values, the index being the value. So at index [0] you find the frequency of zeros in the column.
>>> columns_name = ['value1_count','value2_count','value3_count']
>>> df_out = pd.DataFrame((df==0).sum().values.reshape(1,-1), columns=columns_name)
>>> df_out
value1_count value2_count value3_count
0 2 1 1
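One caveat, sketched below: (df == 0).sum() counts zero entries, while the diff-based approach counts transitions into zero; they agree on the question's data only because no column contains consecutive zeros. The data here is made up to show the difference:

```python
import pandas as pd

# A column with a run of two consecutive zeros
df = pd.DataFrame({'v': [1, 0, 0, 2, 0]})

zeros = df['v'].eq(0).sum()                          # counts zero entries: 3
runs = df['v'].eq(0).astype(int).diff().eq(1).sum()  # counts entries *into* zero: 2
print(zeros, runs)
```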
I have a dataframe with two columns and want to extract the value of the first column based on the second: if, within the last 3 rows, the value of column2 changes from a non-zero value to 0, extract the value of column1 for that row.
df=pd.DataFrame({'column1':[1,5,6,7,8,11,12,14,18,20],'column2':[0,0,1,1,0,0,0,256,256,0]})
print(df)
column1 column2
0 1 0
1 5 0
2 6 1
3 7 1
4 8 0
5 11 0
6 12 0
7 14 256
8 18 256
9 20 0
out_put=pd.DataFrame({'column1':[20],'column2':[0]})
print(out_put)
column1 column2
0 20 0
I believe you need to check, within the last 3 rows of the second column, where the value changes to 0:
df1 = df.tail(3)
df2 = df1[df1['column2'].eq(0).view('i1').diff().eq(1)]
print (df2)
column1 column2
9 20 0
Details:
#last 3 rows
print (df1)
column1 column2
7 14 256
8 18 256
9 20 0
#compare second column for equality with 0
print (df1['column2'].eq(0))
7 False
8 False
9 True
Name: column2, dtype: bool
#convert mask to integers
print (df1['column2'].eq(0).view('i1'))
7 0
8 0
9 1
Name: column2, dtype: int8
#get difference
print (df1['column2'].eq(0).view('i1').diff())
7 NaN
8 0.0
9 1.0
Name: column2, dtype: float64
#compare by 1
print (df1['column2'].eq(0).view('i1').diff().eq(1))
7 False
8 False
9 True
Name: column2, dtype: bool
Finally, filter by boolean indexing.
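The same filter can be sketched with astype instead of view, since Series.view is deprecated in recent pandas releases (data reproduced from the question):

```python
import pandas as pd

df = pd.DataFrame({'column1': [1, 5, 6, 7, 8, 11, 12, 14, 18, 20],
                   'column2': [0, 0, 1, 1, 0, 0, 0, 256, 256, 0]})

# Restrict to the last 3 rows, then flag where column2 changes to 0:
# diff of the 0/1 zero-mask equals 1 exactly at a non-zero -> 0 step
df1 = df.tail(3)
mask = df1['column2'].eq(0).astype('int8').diff().eq(1)
df2 = df1[mask]
print(df2)
#    column1  column2
# 9       20        0
```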