pandas how to convert a two-dimension dataframe to a one-dimension dataframe - python-3.x

suppose I have a dataframe with multi columns.
a b c
1
2
3
How to convert it to a single columns dataframe
1 a
2 a
3 a
1 b
2 b
3 b
1 c
2 c
3 c
please note that the former is a Dataframe other than Panel

Use melt:
df = df.reset_index().melt('index', var_name='col').set_index('index')[['col']]
print (df)
col
index
1 a
2 a
3 a
1 b
2 b
3 b
1 c
2 c
3 c
Or numpy.repeat and numpy.tile with DataFrame constructor::
a = np.repeat(df.columns, len(df))
b = np.tile(df.index, len(df.columns))
df = pd.DataFrame(a, index=b, columns=['col'])
print (df)
col
1 a
2 a
3 a
1 b
2 b
3 b
1 c
2 c
3 c

another way is,
pd.DataFrame(list(itertools.product(df.index, df.columns.values))).set_index([0])
Output:
1
0
1 a
1 b
1 c
2 a
2 b
2 c
3 a
3 b
3 c
For exact output:
use sort_values
print pd.DataFrame(list(itertools.product(df.index, df.columns.values))).set_index([0]).sort_values(by=[1])
1
0
1 a
2 a
3 a
1 b
2 b
3 b
1 c
2 c
3 c

Related

Complete DataFrame with missing steps python

I have a pandas data frame which misses some rows. It actually has the following format:
id step var1 var2
1 1 a h
2 1 b i
3 1 c g
1 3 d k
2 2 e l
5 2 f m
6 1 g n
...
An observation should pass through every steps. I mean id ==1 has step 1 and 3 but misses step 2 (which I don't want). id==2 has step 1 and 2 and there is no step 3 and this is fine because there is no gap. id ==5 has step 2 but doesn't have step 1 so I am missing a line there.
I need to add some rows to complete the steps, I would keep var1 var2 and id as the same.
I would like to obtain this df :
id step var1 var2
1 1 a h
2 1 b i
3 1 c g
1 3 d k
2 2 e l
5 2 f m
6 1 g n
1 2 a h
5 1 f m
...
It would be awesome if anyone could help with a smooth solution
You can try pivot the table then ffill and bfill:
(df.pivot(index='id', columns='step')
.groupby(level=0, axis=1)
.apply(lambda x: x.ffill().bfill())
.stack()
.reset_index()
)
Output:
id step var1 var2
0 1 1 a h
1 1 2 e l
2 1 3 d k
3 2 1 b i
4 2 2 e l
5 2 3 d k
6 3 1 c g
7 3 2 e l
8 3 3 d k
9 5 1 c g
10 5 2 f m
11 5 3 d k
12 6 1 g n
13 6 2 f m
14 6 3 d k

pandas transform one row into multiple rows

I have a dataframe as below.
My dataframe as below.
ID list
1 a, b, c
2 a, s
3 NA
5 f, j, l
I need to break each items in the list column(String) into independent row as below:
ID item
1 a
1 b
1 c
2 a
2 s
3 NA
5 f
5 j
5 l
Thanks.
Use str.split to separate your items then explode:
print (df.assign(list=df["list"].str.split(", ")).explode("list"))
ID list
0 1 a
0 1 b
0 1 c
1 2 a
1 2 s
2 3 NaN
3 5 f
3 5 j
3 5 l
A beginners approach : Just another way of doing the same thing using pd.DataFrame.stack
df['list'] = df['list'].map(lambda x : str(x).split(','))
dfOut = pd.DataFrame(df['list'].values.tolist())
dfOut.index = df['ID']
dfOut = dfOut.stack().reset_index()
del dfOut['level_1']
dfOut.rename(columns = {0 : 'list'}, inplace = True)
Output:
ID list
0 1 a
1 1 b
2 1 c
3 2 a
4 2 s
5 3 nan
6 5 f
7 5 j
8 5 l

Python Pandas: copy several columns at specific row from one dataframe to another with different names

I have dataframe1 with columns a,b,c,d with 5 rows.
I also have another dataframe2 with columns e,f,g,h
Let's say I want to copy columns a,b in row 3 from dataframe1 to columns f,g in row 3 at dataframe2.
I tried to use this code:
dataframe2.loc[3,['f','g']] = dataframe1.loc[3,['a','b']].
The results was NaN in dataframe2.
Any ideas how can I solve it?
One idea is convert to numpy array for avoid alignment data by columns names:
dataframe2.loc[3,['f','g']] = dataframe1.loc[3,['a','b']].values
Sample:
dataframe1 = pd.DataFrame({'a':list('abcdef'),
'b':[4,5,4,5,5,4],
'c':[7,8,9,4,2,3]})
print (dataframe1)
a b c
0 a 4 7
1 b 5 8
2 c 4 9
3 d 5 4
4 e 5 2
5 f 4 3
dataframe2 = pd.DataFrame({'f':list('HIJK'),
'g':[0,0,7,1],
'h':[0,1,0,1]})
print (dataframe2)
f g h
0 H 0 0
1 I 0 1
2 J 7 0
3 K 1 1
dataframe2.loc[3,['f','g']] = dataframe1.loc[3,['a','b']].values
print (dataframe2)
f g h
0 H 0 0
1 I 0 1
2 J 7 0
3 d 5 1

Deleting the first instance in a data frame

I was wondering whats the best way to delete the first instance of a particular index in a Pandas dataframe?
In the example below, I want to delete row 0,5 and 9
Use boolean indexing with Index.duplicated:
df = pd.DataFrame({'A':list('abcdef'),
'B':[4,5,4,5,5,4],
'C':[7,8,9,4,2,3],
'D':[1,3,5,7,1,0],
'E':[5,3,6,9,2,4],
'F':list('aaabbb')}, index=[0,0,1,2,2,2])
print (df)
A B C D E F
0 a 4 7 1 5 a
0 b 5 8 3 3 a
1 c 4 9 5 6 a
2 d 5 4 7 9 b
2 e 5 2 1 2 b
2 f 4 3 0 4 b
df = df[df.index.duplicated()]
print (df)
A B C D E F
0 b 5 8 3 3 a
2 e 5 2 1 2 b
2 f 4 3 0 4 b
Detail:
print (df.index.duplicated())
[False True False False True True]
Heres a way to do it using groupby:
rst = df.reset_index()
df['int_index'] = df.reset_index().index
firsts = df.groupby(df.index).first()
filt = df[~df['int_index'].isin(firsts['int_index'])]
missing = df[df.index.value_counts() == 1]
res = pd.concat([drp, missing]).sort_index().drop('int_index', axis=1)

Pandas Conditionally Combine (and sum) Rows

Given the following data frame:
import pandas as pd
df=pd.DataFrame({'A':['A','A','A','B','B','B'],
'B':[1,1,2,1,1,1],
'C':[2,4,6,3,5,7]})
df
A B C
0 A 1 2
1 A 1 4
2 A 2 6
3 B 1 3
4 B 1 5
5 B 1 7
Wherever there are duplicate rows per columns 'A' and 'B', I'd like to combine those rows and sum the value under column 'C' like this:
A B C
0 A 1 6
2 A 2 6
3 B 1 15
So far, I can at least identify the duplicates like this:
df['Dup']=df.duplicated(['A','B'],keep=False)
Thanks in advance!
use groupby() and sum():
In [94]: df.groupby(['A','B']).sum().reset_index()
Out[94]:
A B C
0 A 1 6
1 A 2 6
2 B 1 15

Resources