Pandas Transform and add second line between the measurements

I am struggling to transform a pandas DataFrame.
df=
0 A -- cm
1 B -- cm2
2 C 69 cm/s
3 D 48 cm/s
4 E 152 ms
5 F 1.05 NaN
6 G 9.15 NaN
7 H -- ms
8 I 8 cm/s
9 J 12 cm/s
I want to transform it to:
> A A_Unit B B_Unit C C_Unit ...
> -- cm -- cm2 69 cm/s ...
A to J are parameters.
Converting to a DataFrame with only the numbers works well with df.T.drop(0), but I have no idea how to add each value's unit label next to its parameter column.
Maybe someone has a good idea and can help me with this.
Thanks
Thomas

You can stack and transpose since you will always have groupings of two.
u = df.set_index(0).stack().to_frame().T
u.columns = [x if y == 1 else f'{x}_Unit' for x, y in u.columns]
A A_Unit B B_Unit C C_Unit D D_Unit E E_Unit F G H H_Unit I I_Unit J J_Unit
0 -- cm -- cm2 69 cm/s 48 cm/s 152 ms 1.05 9.15 -- ms 8 cm/s 12 cm/s
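The stack-and-transpose idea above can be sketched end to end as follows. The frame is a hypothetical reconstruction of the question's data (column 0 as the parameter, column 1 as the value, column 2 as the unit), and the explicit dropna() is an assumption added so that parameters without a unit get no _Unit column regardless of how the installed pandas version treats missing values in stack().

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction of the question's frame: column 0 holds the
# parameter name, column 1 the value (as text), column 2 the unit.
df = pd.DataFrame({
    0: list("ABCDEFGHIJ"),
    1: ["--", "--", "69", "48", "152", "1.05", "9.15", "--", "8", "12"],
    2: ["cm", "cm2", "cm/s", "cm/s", "ms", np.nan, np.nan, "ms", "cm/s", "cm/s"],
})

# stack() yields one entry per (parameter, source column) pair; dropna()
# removes the missing units explicitly.
stacked = df.set_index(0).stack().dropna()

# Turn the stacked Series into a single-row frame and flatten the labels.
wide = stacked.to_frame().T
wide.columns = [p if c == 1 else f"{p}_Unit" for p, c in wide.columns]
print(wide)
```

F and G have no unit in the input, so they appear without a matching _Unit column.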

IIUC:
df = pd.read_clipboard(sep=r'\s\s+', header=None)
df_out = df.set_index(1).drop(0, axis=1).rename(columns={2:'',3:'Unit'}).stack()
df_out = df_out.to_frame().T
df_out.columns = [f'{i}_{j}' if j else f'{i}' for i, j in df_out.columns]
df_out
Output:
A A_Unit B B_Unit C C_Unit D D_Unit E E_Unit F G H \
0 -- cm -- cm2 69 cm/s 48 cm/s 152 ms 1.05 9.15 --
H_Unit I I_Unit J J_Unit
0 ms 8 cm/s 12 cm/s

Related

MELT: multiple values without duplication

It can't be this hard. I have
df=pd.DataFrame({'id':[1,2,3],'name':['j','l','m'], 'mnt':['f','p','p'],'nt':['b','w','e'],'cost':[20,30,80],'paid':[12,23,45]})
I need
import numpy as np
df1=pd.DataFrame({'id':[1,2,3,1,2,3],'name':['j','l','m','j','l','m'], 't':['f','p','p','b','w','e'],'paid':[12,23,45,np.nan,np.nan,np.nan],'cost':[20,30,80,np.nan,np.nan,np.nan]})
I have 45 columns to invert.
I tried
(df.set_index(['id', 'name'])
.rename_axis(['paid'], axis=1)
.stack().reset_index())

EDIT: I think the simplest approach is DataFrame.melt, then setting missing values based on the variable column:
df2 = df.melt(['id', 'name','cost','paid'], value_name='t')
df2.loc[df2.pop('variable').eq('nt'), ['cost','paid']] = np.nan
print (df2)
id name cost paid t
0 1 j 20.0 12.0 f
1 2 l 30.0 23.0 p
2 3 m 80.0 45.0 p
3 1 j NaN NaN b
4 2 l NaN NaN w
5 3 m NaN NaN e
Alternatively, use lreshape with a dictionary of lists that specifies which columns are 'grouped' together:
df2 = pd.lreshape(df, {'t':['mnt','nt'], 'mon':['cost','paid']})
print (df2)
id name t mon
0 1 j f 20
1 2 l p 30
2 3 m p 80
3 1 j b 12
4 2 l w 23
5 3 m e 45
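Since the question mentions 45 columns to invert, the melt approach can be wrapped in a small helper. This is only a sketch: melt_keep_once and its parameter names are hypothetical, and it assumes that the cost/paid style columns should be kept only on the rows coming from the first melted column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["j", "l", "m"],
    "mnt": ["f", "p", "p"],
    "nt": ["b", "w", "e"],
    "cost": [20, 30, 80],
    "paid": [12, 23, 45],
})


def melt_keep_once(frame, id_cols, once_cols, value_cols, value_name="t"):
    """Melt value_cols into one column; keep once_cols only for the first of them."""
    long = frame.melt(id_vars=id_cols + once_cols, value_vars=value_cols,
                      value_name=value_name)
    # Rows that came from any column other than the first one are "repeats".
    repeated = long.pop("variable").ne(value_cols[0])
    # Cast to float first so NaN can be stored without changing the dtype later.
    long[once_cols] = long[once_cols].astype(float)
    long.loc[repeated, once_cols] = np.nan
    return long


long = melt_keep_once(df, ["id", "name"], ["cost", "paid"], ["mnt", "nt"])
print(long)
```

With more columns, only the value_cols list grows; the rest of the helper stays the same.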

Add columns - specify name - row value based on other

I have a Dataframe like this:
data = {'TYPE':['X', 'Y', 'Z'],'A': [11,12,13], 'B':[21,22,23], 'C':[31,32,34]}
df = pd.DataFrame(data)
TYPE A B C
0 X 11 21 31
1 Y 12 22 32
2 Z 13 23 34
I like to get the following DataFrame:
TYPE A A_added B B_added C C_added
0 X 11 15 21 25 31 35
1 Y 12 18 22 28 32 38
2 Z 13 20 23 30 34 41
For each column after the TYPE column (here A, B, C):
add a new column named column_name_added
if TYPE = X add 4, if TYPE = Y add 6, if TYPE = Z add 7

The idea is to build a helper Series with Series.map and a dictionary, add it to the value columns with DataFrame.add, attach the result to the original with DataFrame.join, and finally reorder the columns with DataFrame.reindex:
d = {'X':4,'Y':6, 'Z':7}
cols = df.columns[:1].tolist() + [i for x in df.columns[1:] for i in (x, x + '_added')]
df1 = df.iloc[:, 1:].add(df['TYPE'].map(d), axis=0, fill_value=0).add_suffix('_added')
df2 = df.join(df1).reindex(cols, axis=1)
print (df2)
TYPE A A_added B B_added C C_added
0 X 11 15 21 25 31 35
1 Y 12 18 22 28 32 38
2 Z 13 20 23 30 34 41
EDIT: Values that are not matched by the dictionary become missing values, so adding Series.fillna makes 7 the default for all other values:
d = {'X':4,'Y':6}
cols = df.columns[:1].tolist() + [i for x in df.columns[1:] for i in (x, x + '_added')]
df1 = df.iloc[:, 1:].add(df['TYPE'].map(d).fillna(7).astype(int), axis=0).add_suffix('_added')
df2 = df.join(df1).reindex(cols, axis=1)
print (df2)
TYPE A A_added B B_added C C_added
0 X 11 15 21 25 31 35
1 Y 12 18 22 28 32 38
2 Z 13 20 23 30 34 41
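For readers who want to see the intermediate steps, here is a minimal step-by-step sketch of the same map/add/join idea. The variable names (offsets, increment, added, order) are illustrative only, and 'Z' is deliberately left out of the dictionary so that the fillna default applies.

```python
import pandas as pd

df = pd.DataFrame({"TYPE": ["X", "Y", "Z"],
                   "A": [11, 12, 13], "B": [21, 22, 23], "C": [31, 32, 34]})

offsets = {"X": 4, "Y": 6}            # 'Z' is missing on purpose
value_cols = ["A", "B", "C"]

# Step 1: one increment per row; unmatched types fall back to 7.
increment = df["TYPE"].map(offsets).fillna(7).astype(int)

# Step 2: add the increment row-wise and rename the resulting columns.
added = df[value_cols].add(increment, axis=0).add_suffix("_added")

# Step 3: join and interleave each original column with its "_added" twin.
order = ["TYPE"] + [name for col in value_cols for name in (col, col + "_added")]
result = df.join(added)[order]
print(result)
```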

Efficient way to perform iterative subtraction and division operations on pandas columns

I have the following DataFrame:
A B C Result
0 232 120 9 91
1 243 546 1 12
2 12 120 5 53
I want to perform operations of the following kind:
A B C Result A-B/A+B A-C/A+C B-C/B+C
0 232 120 9 91 0.318182 0.925311 0.860465
1 243 546 1 12 -0.384030 0.991803 0.996344
2 12 120 5 53 -0.818182 0.411765 0.920000
which I am doing using
df['A-B/A+B']=(df['A']-df['B'])/(df['A']+df['B'])
df['A-C/A+C']=(df['A']-df['C'])/(df['A']+df['C'])
df['B-C/B+C']=(df['B']-df['C'])/(df['B']+df['C'])
which I believe is a very crude and ugly way to do it.
How can I do it in a cleaner way?

You can do the following:
# take all column names except the last one (Result)
colnames = df.columns.tolist()[:-1]
# compute (x - y) / (x + y) for every pair of columns
for i, c in enumerate(colnames):
    for k in range(i + 1, len(colnames)):
        df[c + '_' + colnames[k]] = (df[c] - df[colnames[k]]) / (df[c] + df[colnames[k]])
# check result
print(df)
A B C Result A_B A_C B_C
0 232 120 9 91 0.318182 0.925311 0.860465
1 243 546 1 12 -0.384030 0.991803 0.996344
2 12 120 5 53 -0.818182 0.411765 0.920000
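The nested index loop can also be written with itertools.combinations, which yields each unordered pair of columns exactly once. This is a sketch of an alternative, not the answerer's code; the frame is rebuilt from the question's data.

```python
from itertools import combinations

import pandas as pd

df = pd.DataFrame({"A": [232, 243, 12], "B": [120, 546, 120],
                   "C": [9, 1, 5], "Result": [91, 12, 53]})

# Every pair (A, B), (A, C), (B, C) in column order, each visited once.
for left, right in combinations(["A", "B", "C"], 2):
    df[f"{left}_{right}"] = (df[left] - df[right]) / (df[left] + df[right])

print(df.round(6))
```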

This is a good case for DataFrame.eval, but the column labels cannot be evaluated as they stand: 'A-B/A+B' follows normal operator precedence and is read as A - B/A + B. Map each label to a parenthesized expression instead:
exprs = {'A-B/A+B': '(A-B)/(A+B)', 'A-C/A+C': '(A-C)/(A+C)', 'B-C/B+C': '(B-C)/(B+C)'}
df.assign(**{name: df.eval(expr) for name, expr in exprs.items()})
A B C Result A-B/A+B A-C/A+C B-C/B+C
0 232 120 9 91 0.318182 0.925311 0.860465
1 243 546 1 12 -0.384030 0.991803 0.996344
2 12 120 5 53 -0.818182 0.411765 0.920000
The advantage of this method over the other solution is that each operation is written out as a string instead of being derived from the order of the columns; as the documentation says, eval is used to:
Evaluate a string describing operations on DataFrame columns.
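One thing to keep in mind with eval is that the string follows ordinary operator precedence. The short sketch below, using the question's data, contrasts the unparenthesized label with the parenthesized expression; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [232, 243, 12], "B": [120, 546, 120],
                   "C": [9, 1, 5], "Result": [91, 12, 53]})

# Read as A - (B / A) + B because division binds tighter than +/-.
unparenthesized = df.eval("A - B / A + B")

# The intended normalized difference needs explicit parentheses.
normalized = df.eval("(A - B) / (A + B)")

print(unparenthesized.round(6).tolist())
print(normalized.round(6).tolist())
```

Only the second expression reproduces the values requested in the question.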

pandas df merge avoid duplicate column names

When I merge two DataFrames that both have a column called A, the result has A_x and A_y. How can I keep A from one DataFrame and discard the other, so that I don't have to rename A_x to A after the merge?

Just filter your dataframe columns before merging.
df1 = pd.DataFrame({'Key':np.arange(12),'A':np.random.randint(0,100,12),'C':list('ABCD')*3})
df2 = pd.DataFrame({'Key':np.arange(12),'A':np.random.randint(100,1000,12),'C':list('ABCD')*3})
df1.merge(df2[['Key','A']], on='Key')
Output: (Note: C is not duplicated)
A_x C Key A_y
0 60 A 0 440
1 65 B 1 731
2 76 C 2 596
3 67 D 3 580
4 44 A 4 477
5 51 B 5 524
6 7 C 6 572
7 88 D 7 984
8 70 A 8 862
9 13 B 9 158
10 28 C 10 593
11 63 D 11 177

It depends on whether the columns with duplicated names need to be kept in the final merged DataFrame.
If they do, pass the suffixes parameter to merge:
print (df1.merge(df2, on='Key', suffixes=('', '_')))
--
If not, use @Scott Boston's solution.
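Both options can be compared side by side on a small example. The frames, the extra column D, and the '_right' suffix below are made up for illustration; only merge, drop, and the suffixes parameter come from the answers above.

```python
import pandas as pd

left = pd.DataFrame({"Key": [0, 1, 2], "A": [60, 65, 76], "C": list("ABC")})
right = pd.DataFrame({"Key": [0, 1, 2], "A": [440, 731, 596], "D": [1.5, 2.5, 3.5]})

# Option 1: drop every non-key column of `right` that already exists in `left`,
# so the merge cannot produce suffixed duplicates.
overlap = right.columns.intersection(left.columns).difference(["Key"])
merged = left.merge(right.drop(columns=overlap), on="Key")

# Option 2: keep both copies, but leave the left-hand names untouched.
both = left.merge(right, on="Key", suffixes=("", "_right"))

print(merged)
print(both)
```

Option 1 keeps A from the left frame only; option 2 keeps A as-is and stores the other copy as A_right.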

how to add a new column in dataframe which divides multiple columns and finds the maximum value

This may have a really simple solution, but I am new to Python 3. I have a DataFrame with multiple columns and would like to add a new column to it that performs the following calculation:
New Column = Max((Column A/Column B), (Column C/Column D), (Column E/Column F))
I can take a max with the following code, but how can I do the division along with it?
df['Max'] = df[['Column A','Column B','Column C', 'Column D', 'Column E', 'Column F']].max(axis=1)
Column A Column B Column C Column D Column E Column F Max
3600 36000 22 11 3200 3200 36000
2300 2300 13 26 1100 1200 2300
1300 13000 15 33 1000 1000 13000
Thanks

You can divide the DataFrame by itself by slicing the columns in steps of two and then taking the max:
In [105]:
# first six columns only, so an existing Max column is ignored
df['Max'] = df.iloc[:, 0:6:2].div(df.iloc[:, 1:6:2].values, axis=1).max(axis=1)
df
Out[105]:
Column A Column B Column C Column D Column E Column F Max
0 3600 36000 22 11 3200 3200 2
1 2300 2300 13 26 1100 1200 1
2 1300 13000 15 33 1000 1000 1
Here are the intermediate values:
In [108]:
df.iloc[:, 0:6:2].div(df.iloc[:, 1:6:2].values, axis=1)
Out[108]:
Column A Column C Column E
0 0.1 2.000000 1.000000
1 1.0 0.500000 0.916667
2 0.1 0.454545 1.000000
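If the numerator and denominator columns are not neatly interleaved, the same calculation can be written with explicit column lists. This is a sketch using the question's data; the list names numerators and denominators are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Column A": [3600, 2300, 1300], "Column B": [36000, 2300, 13000],
    "Column C": [22, 13, 15], "Column D": [11, 26, 33],
    "Column E": [3200, 1100, 1000], "Column F": [3200, 1200, 1000],
})

numerators = ["Column A", "Column C", "Column E"]
denominators = ["Column B", "Column D", "Column F"]

# Element-wise division of the paired columns, then the row-wise maximum.
ratios = df[numerators].to_numpy() / df[denominators].to_numpy()
df["Max"] = ratios.max(axis=1)
print(df)
```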

You can try something like the following:
df['Max'] = df.apply(lambda v: max(v['A'] / v['B'].astype(float), v['C'] / v['D'].astype(float), v['E'] / v['F'].astype(float)), axis=1)
Example
In [14]: df
Out[14]:
A B C D E F
0 1 11 1 11 12 98
1 2 22 2 22 67 1
2 3 33 3 33 23 4
3 4 44 4 44 11 10
In [15]: df['Max'] = df.apply(lambda v: max(v['A'] / v['B'].astype(float), v['C'] /
v['D'].astype(float), v['E'] / v['F'].astype(float)), axis=1)
In [16]: df
Out[16]:
A B C D E F Max
0 1 11 1 11 12 98 0.122449
1 2 22 2 22 67 1 67.000000
2 3 33 3 33 23 4 5.750000
3 4 44 4 44 11 10 1.100000
