Unstacking a pandas dataframe


Suppose I have a dataframe with two columns called 'column' and 'value' that looks like this:
Dataframe 1:
column value
0 column1 1
1 column2 1
2 column3 1
3 column4 1
4 column5 2
5 column6 1
6 column7 1
7 column8 1
8 column9 8
9 column10 2
10 column1 1
11 column2 1
12 column3 1
13 column4 3
14 column5 2
15 column6 1
16 column7 1
17 column8 1
18 column9 1
19 column10 2
20 column1 5
.. ... ...
I want to transform this dataframe so that it looks like this:
Dataframe 2:
column1 column2 column3 column4 column5 column6 column7 column8 column9 column10
0 1 1 1 1 2 1 1 1 8 2
1 1 1 1 3 2 1 1 1 1 2
2 5 .. .. .. .. .. .. .. .. ..
.. .. .. .. .. .. .. .. .. .. ..
Now I know how to do it the other way around. If you have a dataframe called df that looks like dataframe 2 you can stack it with the following code:
df = (df.stack().reset_index(level=0, drop=True).rename_axis(['column']).reset_index(name='value'))
Unfortunately, I don't know how to go back!
Question: How do I manipulate dataframe 1 (unstack it, if that's a word) so that it looks like dataframe 2?

Build a counter Series with cumcount, use it together with 'column' in set_index to create a MultiIndex, then reshape with unstack:
g = df.groupby('column').cumcount()
df1 = df.set_index([g, 'column'])['value'].unstack(fill_value=0)
print (df1)
column column1 column10 column2 column3 column4 column5 column6 \
0 1 2 1 1 1 2 1
1 1 2 1 1 3 2 1
2 5 0 0 0 0 0 0
column column7 column8 column9
0 1 1 8
1 1 1 1
2 0 0 0
Finally, if the columns need to be sorted by the numeric part of their names, use str.extract to pull out the integers, convert them, get the column positions with argsort, and reorder with iloc:
df1 = df1.iloc[:, df1.columns.str.extract(r'(\d+)', expand=False).astype(int).argsort()]
print (df1)
column column1 column2 column3 column4 column5 column6 column7 \
0 1 1 1 1 2 1 1
1 1 1 1 3 2 1 1
2 5 0 0 0 0 0 0
column column8 column9 column10
0 1 8 2
1 1 1 2
2 0 0 0
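The steps above can be tried end to end on a small frame. This is a minimal sketch, assuming invented sample data with three column names repeated twice; the variable names occurrence, wide and order are illustrative and not part of the original answer.

```python
import pandas as pd

# Small long-format frame (sample data invented for this sketch).
df = pd.DataFrame({
    "column": ["column1", "column2", "column10"] * 2,
    "value": [1, 2, 3, 4, 5, 6],
})

# cumcount numbers each repeat of a name: 0 for the first, 1 for the second, ...
occurrence = df.groupby("column").cumcount()

# Index by (occurrence, column), then move 'column' into the column axis.
wide = df.set_index([occurrence, "column"])["value"].unstack()

# unstack sorts labels alphabetically (column1, column10, column2),
# so reorder by the numeric suffix instead.
order = wide.columns.str.extract(r"(\d+)", expand=False).astype(int).argsort()
wide = wide.iloc[:, order]

print(wide)
```

The counter is what makes each (row, column) pair unique; without it, unstack would fail on the duplicated column names.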

Related

How to group a dataframe by multiple columns, sum and sort the totals in descending order?

Given the following dataframe:
user_id col1 col2
1 A 4
1 A 22
1 A 112
1 B -0.22222
1 B 9
1 C 0
2 A -1
2 A -5
2 K NA
I want to group by user_id and col1 and count the rows, then sort the counts within each group in descending order.
Here is what I'm trying, but I don't get the right output:
df[["user_id", "col1"]]. \
groupby(["user_id", "col1"]). \
agg(counts=("col1","count")). \
reset_index(). \
sort_values(["user_id", "col1", "counts"], ascending=False)
Please advise what I should change to make it work.
Expected output:
user_id col1 counts
1 A 3
B 2
C 1
2 A 2
K 1

Use GroupBy.size:
In [199]: df.groupby(['user_id', 'col1']).size()
Out[199]:
user_id col1
1 A 3
B 2
C 1
2 A 2
K 1
OR:
In [201]: df.groupby(['user_id', 'col1']).size().reset_index(name='counts')
Out[201]:
user_id col1 counts
0 1 A 3
1 1 B 2
2 1 C 1
3 2 A 2
4 2 K 1
EDIT:
In [206]: df.groupby(['user_id', 'col1']).agg({'col2': 'size'})
Out[206]:
col2
user_id col1
1 A 3
B 2
C 1
2 A 2
K 1
EDIT-2: For sorting, use:
In [213]: df.groupby(['user_id', 'col1'])['col2'].size().sort_values(ascending=False)
Out[213]:
user_id col1
1 A 3
2 A 2
1 B 2
2 K 1
1 C 1
Name: col2, dtype: int64

Building on the main idea from Mayank's answer:
df.groupby(["user_id","col1"]).size().reset_index(name="counts").sort_values(["user_id", "col1"], ascending=False)
This solved my issue.
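If the goal is the layout shown in the expected output (user_id ascending, counts descending within each user), one option is to pass a per-key list to ascending. This is a minimal sketch, assuming the sample data from the question with col2 omitted because size() does not need it.

```python
import pandas as pd

# Sample rows from the question (col2 left out; size() counts rows regardless).
df = pd.DataFrame({
    "user_id": [1, 1, 1, 1, 1, 1, 2, 2, 2],
    "col1": ["A", "A", "A", "B", "B", "C", "A", "A", "K"],
})

counts = (
    df.groupby(["user_id", "col1"])
      .size()
      .reset_index(name="counts")
      # user_id ascending, counts descending inside each user_id
      .sort_values(["user_id", "counts"], ascending=[True, False])
      .reset_index(drop=True)
)

print(counts)
```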

Shuffle pandas columns

I have the following data frame:
Col1 Col2 Col3 Type
0 1 2 3 1
1 4 5 6 1
2 7 8 9 2
and I would like to get a shuffled output like this:
Col3 Col1 Col2 Type
0 3 1 2 1
1 6 4 5 1
2 9 7 8 2
How to achieve this?

Use DataFrame.sample with axis=1:
df = df.sample(frac=1, axis=1)
If the last column needs to keep its position:
a = df.columns[:-1].to_numpy()
np.random.shuffle(a)
print (a)
['Col3' 'Col1' 'Col2']
df = df[np.append(a, ['Type'])]
print (df)
Col3 Col1 Col2 Type
0 3 1 2 1
1 6 4 5 1
2 9 7 8 2
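A self-contained variant of the same idea, using NumPy's Generator API instead of np.random.shuffle. This is a sketch only: the seed and the names rng, shuffled_labels and shuffled are assumptions made for repeatability and illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Col1": [1, 4, 7],
    "Col2": [2, 5, 8],
    "Col3": [3, 6, 9],
    "Type": [1, 1, 2],
})

rng = np.random.default_rng(0)  # seed chosen only to make the sketch repeatable

# Permute every column label except the last, then append the last one back.
shuffled_labels = rng.permutation(df.columns[:-1].to_numpy())
shuffled = df[list(shuffled_labels) + [df.columns[-1]]]

print(shuffled)
```

Unlike np.random.shuffle, permutation returns a new array, so the original column labels are left untouched.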

Update two dataframe column based on condition

I have a dataframe with the columns 'PK', 'Column1' and 'Column2'.
I want to update Column1 and Column2 as follows:
If Column1 > Column2, then Column1 = Column1 - Column2 and, at the same time, Column2 = 0.
Similarly,
if Column1 < Column2, then Column2 = Column2 - Column1 and, at the same time, Column1 = 0.
I have tried the following, but it does not give the expected result:
df["Column1"] = np.where(df['Column1'] > df['Column2'], df['Column1'] - df['Column2'], 0)
df["Column2"] = np.where(df['Column1'] < df['Column2'], df['Column2'] - df['Column1'], 0)

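Compute both new columns first and set them together with DataFrame.assign, so that the second condition is not tested against the already overwritten Column1, as happens in the second line of your code: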
df = pd.DataFrame({
'Column1':[4,5,4,5,5,4],
'Column2':[7,8,9,4,2,3],
})
print (df)
Column1 Column2
0 4 7
1 5 8
2 4 9
3 5 4
4 5 2
5 4 3
a = np.where(df['Column1'] > df['Column2'], df['Column1'] - df['Column2'], 0)
b = np.where(df['Column1'] < df['Column2'], df['Column2'] - df['Column1'], 0)
df = df.assign(Column1 = a, Column2 = b)
print (df)
Column1 Column2
0 0 3
1 0 3
2 0 5
3 1 0
4 3 0
5 1 0
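The same result can also be reached from a single difference, which avoids writing the two conditions separately. This is an alternative sketch, not the method from the answer above; the names diff and result are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Column1": [4, 5, 4, 5, 5, 4],
    "Column2": [7, 8, 9, 4, 2, 3],
})

# Positive where Column1 is larger, negative where Column2 is larger.
diff = df["Column1"] - df["Column2"]

result = df.assign(
    Column1=diff.clip(lower=0),     # keep the surplus of Column1, else 0
    Column2=(-diff).clip(lower=0),  # keep the surplus of Column2, else 0
)

print(result)
```

Rows where the two columns are equal end up as 0 in both columns, which matches the behaviour of the np.where version.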

How to count consecutive value change to zero in column python

From a dataframe, I want to count how many times the value changes to zero in each column.
Here is the input df:
pd.DataFrame({'value1':[3,4,7,0,11,20,0,20,15,16],
'value2':[2,2,0,8,8,2,2,2,5,5],
'value3':[7,10,20,4008,0,1,4820,1,1,1]})
value1 value2 value3
0 3 2 7
1 4 2 10
2 7 0 20
3 0 8 4008
4 11 8 0
5 20 2 1
6 0 2 4820
7 20 2 1
8 15 5 1
9 16 5 1
desired output:
df_out=pd.DataFrame({'value1_count':[2],
'value2_count':[1],
'value3_count':[1]})
value1_count value2_count value3_count
0 2 1 1

Try this
df.eq(0).astype(int).diff().eq(-1).sum()
Out[77]:
value1 2
value2 1
value3 1
dtype: int64
To get exactly your output, append the following:
df.eq(0).astype(int).diff().eq(-1).sum().to_frame().T.add_suffix('_count')
Out[85]:
value1_count value2_count value3_count
0 2 1 1
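One detail worth checking against your own data: in the 0/1 mask, a diff of 1 marks a change to zero, while a diff of -1 marks a change away from zero. The two counts agree on the sample frame but can differ, for example when a column ends in a zero. A minimal sketch, where the extra Series tail_zero is invented to show the difference:

```python
import pandas as pd

df = pd.DataFrame({
    "value1": [3, 4, 7, 0, 11, 20, 0, 20, 15, 16],
    "value2": [2, 2, 0, 8, 8, 2, 2, 2, 5, 5],
    "value3": [7, 10, 20, 4008, 0, 1, 4820, 1, 1, 1],
})

# 1 where the value is zero, 0 otherwise; diff() compares each row with the previous one.
step = df.eq(0).astype(int).diff()
counts = step.eq(1).sum()  # rows where a non-zero value is followed by zero

# Invented example: the column ends in a zero that is never left again.
tail_zero = pd.Series([5, 0, 3, 0])
tail_step = tail_zero.eq(0).astype(int).diff()
to_zero = int(tail_step.eq(1).sum())     # changes to zero
from_zero = int(tail_step.eq(-1).sum())  # changes away from zero

print(counts)
print(to_zero, from_zero)
```

Because the first row of diff() is NaN, a zero in the very first row is not counted as a change in either direction.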

Here's something you can do
df_out=pd.DataFrame({'value1_count':[df['value1'].value_counts()[0]],'value2_count':[df['value2'].value_counts()[0]],'value3_count':[df['value3'].value_counts()[0]]})
Output
value1_count value2_count value3_count
0 2 1 1
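.value_counts() returns a pandas.Series with the frequency of each value, indexed by the value itself, so the entry at label 0 is the number of zeros in the column. Note that this counts zeros rather than changes to zero; the two agree here only because no column starts with a zero or has two zeros in a row.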

>>> columns_name = ['value1_count','value2_count','value3_count']
>>> df_out = pd.DataFrame((df==0).sum().values.reshape(1,-1), columns=columns_name )
>>> df_out
value1_count value2_count value3_count
0 2 1 1

How to extract value of column based on value change in other column python

I have a dataframe with two columns, and I want to extract the value of the first column based on the second column: if the value of column 2 changes from 0 to any value within the last 3 rows, then extract the value of column 1.
df=pd.DataFrame({'column1':[1,5,6,7,8,11,12,14,18,20],'column2':[0,0,1,1,0,0,0,256,256,0]})
print(df)
column1 column2
0 1 0
1 5 0
2 6 1
3 7 1
4 8 0
5 11 0
6 12 0
7 14 256
8 18 256
9 20 0
out_put=pd.DataFrame({'column1':[20],'column2':[0]})
print(out_put)
column1 column2
0 20 0

I believe you need to compare consecutive values within the last 3 rows of the second column:
df1 = df.tail(3)
df2 = df1[df1['column2'].eq(0).view('i1').diff().eq(1)]
print (df2)
column1 column2
9 20 0
Details:
#last 3 rows
print (df1)
column1 column2
7 14 256
8 18 256
9 20 0
#compare second column for equality
print (df1['column2'].eq(0))
7 False
8 False
9 True
Name: column2, dtype: bool
#convert mask to integers
print (df1['column2'].eq(0).view('i1'))
7 0
8 0
9 1
Name: column2, dtype: int8
#get difference
print (df1['column2'].eq(0).view('i1').diff())
7 NaN
8 0.0
9 1.0
Name: column2, dtype: float64
#compare by 1
print (df1['column2'].eq(0).view('i1').diff().eq(1))
7 False
8 False
9 True
Name: column2, dtype: bool
Finally, filter by boolean indexing.
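As far as I know, Series.view is deprecated in recent pandas releases, so the same steps can be written with astype instead. This is a sketch under that assumption; the names last3, is_zero and result are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "column1": [1, 5, 6, 7, 8, 11, 12, 14, 18, 20],
    "column2": [0, 0, 1, 1, 0, 0, 0, 256, 256, 0],
})

# Restrict to the last 3 rows.
last3 = df.tail(3)

# 1 where column2 is zero, 0 otherwise (astype replaces view('i1') here).
is_zero = last3["column2"].eq(0).astype(int)

# diff() == 1 marks rows where column2 became zero compared with the previous row.
result = last3[is_zero.diff().eq(1)]

print(result)
```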
