I have two dataframes df1 and df2:
data1 = {'A':[1,3,2,1.4,2,1,2,4], 'B':[10,30,20,1.4,2,78,2,78],'C':[200,340,20,180,2,201,2,100]}
df1 = pd.DataFrame(data1)
print(df1)
A B C
0 1.0 10.0 200
1 3.0 30.0 340
2 2.0 20.0 20
3 1.4 1.4 180
4 2.0 2.0 2
5 1.0 78.0 201
6 2.0 2.0 2
7 4.0 78.0 100
data2 = {'D':['a1','a2','a3','a4',2,1,'a3',4], 'E':['b1','b2',20,1.4,2,78,2,78],'F':[200,340,'c1',180,2,'c2',2,100]}
df2 = pd.DataFrame(data2)
print(df2)
D E F
0 a1 b1 200
1 a2 b2 340
2 a3 20 c1
3 a4 1.4 180
4 2 2 2
5 1 78 c2
6 a3 2 2
7 4 78 100
I want to insert df2 into df1 by replacing column B in df1. How can one DataFrame be inserted in place of a column of another DataFrame?
desired result:
A D E F C
0 1.0 a1 b1 200 200
1 3.0 a2 b2 340 340
2 2.0 a3 20 c1 20
3 1.4 a4 1.4 180 180
4 2.0 2 2 2 2
5 1.0 1 78 c2 201
6 2.0 a3 2 2 2
7 4.0 4 78 100 100
The idea is to use concat, selecting columns by position with DataFrame.iloc and Index.get_loc, and finally removing the original column with DataFrame.drop:
c = 'B'
pos = df1.columns.get_loc(c)
df = pd.concat([df1.iloc[:, :pos], df2, df1.iloc[:, pos:]], axis=1).drop(c, axis=1)
print (df)
A D E F C
0 1.0 a1 b1 200 200
1 3.0 a2 b2 340 340
2 2.0 a3 20 c1 20
3 1.4 a4 1.4 180 180
4 2.0 2 2 2 2
5 1.0 1 78 c2 201
6 2.0 a3 2 2 2
7 4.0 4 78 100 100
Tested with other columns:
c = 'A'
pos = df1.columns.get_loc(c)
df = pd.concat([df1.iloc[:, :pos], df2, df1.iloc[:, pos:]], axis=1).drop(c, axis=1)
print (df)
D E F B C
0 a1 b1 200 10.0 200
1 a2 b2 340 30.0 340
2 a3 20 c1 20.0 20
3 a4 1.4 180 1.4 180
4 2 2 2 2.0 2
5 1 78 c2 78.0 201
6 a3 2 2 2.0 2
7 4 78 100 78.0 100
c = 'C'
pos = df1.columns.get_loc(c)
df = pd.concat([df1.iloc[:, :pos], df2, df1.iloc[:, pos:]], axis=1).drop(c, axis=1)
print (df)
A B D E F
0 1.0 10.0 a1 b1 200
1 3.0 30.0 a2 b2 340
2 2.0 20.0 a3 20 c1
3 1.4 1.4 a4 1.4 180
4 2.0 2.0 2 2 2
5 1.0 78.0 1 78 c2
6 2.0 2.0 a3 2 2
7 4.0 78.0 4 78 100
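For reuse, the same approach can be wrapped in a small helper (a minimal sketch; the name replace_col_with_df is just for illustration):
def replace_col_with_df(left, right, col):
    # illustrative helper: splice `right` in at the position of `col`, then drop `col`
    pos = left.columns.get_loc(col)
    return pd.concat([left.iloc[:, :pos], right, left.iloc[:, pos:]], axis=1).drop(col, axis=1)

df = replace_col_with_df(df1, df2, 'B')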
IIUC, we can use itertools with reindex:
import itertools
l=list(itertools.chain.from_iterable(list(df2) if item == 'B' else [item] for item in list(df1)))
pd.concat([df1,df2], axis=1).reindex(columns=l)
A D E F C
0 1.0 a1 b1 200 200
1 3.0 a2 b2 340 340
2 2.0 a3 20 c1 20
3 1.4 a4 1.4 180 180
4 2.0 2 2 2 2
5 1.0 1 78 c2 201
6 2.0 a3 2 2 2
7 4.0 4 78 100 100
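With the sample data, l comes out as ['A', 'D', 'E', 'F', 'C'], so the reindex keeps df1's column order with 'B' swapped for df2's columns.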
I am trying to multiply two data frames:
Df1
Name  Key  100  101  102  103  104
Abb   AB     2    6   10    5    1
Bcc   BC     1    3    7    4    2
Abb   AB     5    1   11    3    1
Bcc   BC     7    1    4    5    0
Df2
Key_1  100  101  102  103  104
AB      10    2    1    5    1
BC       1   10    2    2    4
Expected Output
Name  Key  100  101  102  103  104
Abb   AB    20   12   10   25    1
Bcc   BC     1   30   14    8    8
Abb   AB    50    2   11   15    1
Bcc   BC     7   10    8   10    0
I have tried grouping Df1 and then multiplying with Df2, but it didn't work.
Please help me understand how to approach this problem.
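For reference, the sample frames can be rebuilt as code like this (a sketch; the 100-104 column headers are assumed here to be plain strings):
import pandas as pd

# column headers 100-104 assumed to be strings
df1 = pd.DataFrame({
    'Name': ['Abb', 'Bcc', 'Abb', 'Bcc'],
    'Key':  ['AB', 'BC', 'AB', 'BC'],
    '100':  [2, 1, 5, 7],
    '101':  [6, 3, 1, 1],
    '102':  [10, 7, 11, 4],
    '103':  [5, 4, 3, 5],
    '104':  [1, 2, 1, 0],
})
df2 = pd.DataFrame({
    'Key_1': ['AB', 'BC'],
    '100':   [10, 1],
    '101':   [2, 10],
    '102':   [1, 2],
    '103':   [5, 2],
    '104':   [1, 4],
})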
You can rename df2's Key_1 to Key (to match df1), then set the index and mul on level=1:
df1.set_index(['Name','Key']).mul(df2.rename(columns={'Key_1':'Key'})
.set_index('Key'),level=1).reset_index()
Or similar:
df1.set_index(['Name','Key']).mul(df2.set_index('Key_1')
.rename_axis('Key'),level=1).reset_index()
As correctly pointed out by @QuangHoang, you can do it without renaming too:
df1.set_index(['Name','Key']).mul(df2.set_index('Key_1'),level=1).reset_index()
Name Key 100 101 102 103 104
0 Abb AB 20 12 10 25 1
1 Bcc BC 1 30 14 8 8
2 Abb AB 50 2 11 15 1
3 Bcc BC 7 10 8 10 0
IIUC, reindex_like: expand df2 to df1's shape, multiply positionally via .values, and let fillna(df1) restore the non-numeric Name column:
df1.set_index('Key',inplace=True)
df1=df1.mul(df2.set_index('Key_1').reindex_like(df1).values).fillna(df1)
Out[235]:
Name 100 101 102 103 104
Key
AB Abb 20.0 12.0 10.0 25.0 1.0
BC Bcc 1.0 30.0 14.0 8.0 8.0
AB Abb 50.0 2.0 11.0 15.0 1.0
BC Bcc 7.0 10.0 8.0 10.0 0.0
We could also use DataFrame.merge with pd.Index.difference to select columns.
mul_cols = df1.columns.difference(['Name','Key'])
df1.assign(**df1[mul_cols].mul(df2.merge(df1[['Key']],
left_on = 'Key_1',
right_on = 'Key')[mul_cols]))
Name Key 100 101 102 103 104
0 Abb AB 20 12 10 25 1
1 Bcc BC 10 6 7 20 2
2 Abb AB 5 10 22 6 4
3 Bcc BC 7 10 8 10 0
I have a pandas data frame dat as below:
0 1 0 1 0 1
0 A 23 0.1 122 56 9
1 B 24 0.45 564 36 3
2 C 25 0.2 235 87 5
3 D 13 0.8 567 42 6
4 E 5 0.9 356 12 2
As you can see above, the column labels are 0,1,0,1,0,1, etc. I want to rename them back to a sequential index starting from 0,1,2,3,4,... and I did the following:
dat = dat.reset_index(drop=True)
The index was not changed. How do I get the index renamed in this case? Thanks in advance.
dat.columns = range(dat.shape[1])
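This assigns a fresh 0..n-1 range as the column labels. reset_index only resets the row index, which is why it changed nothing here; the duplicated 0/1 labels are column labels.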
There are quite a few ways:
dat = dat.rename(columns = lambda x: dat.columns.get_loc(x))
Or
dat = dat.rename(columns = dict(zip(dat.columns, range(dat.shape[1]))))
Or
dat = dat.set_axis(pd.RangeIndex(dat.shape[1]), axis=1, inplace=False)
Out[677]:
0 1 2 3 4 5
0 A 23 0.10 122 56 9
1 B 24 0.45 564 36 3
2 C 25 0.20 235 87 5
3 D 13 0.80 567 42 6
4 E 5 0.90 356 12 2
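Note: in pandas 2.0 and later set_axis no longer accepts the inplace argument, so dat = dat.set_axis(pd.RangeIndex(dat.shape[1]), axis=1) is enough.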
I am new to pandas. I want a table in a specific format, which I need to export to CSV.
What I have tried is:
o_rg,o_gg,a_rg,a_gg are arrays
df1=pd.DataFrame({'RED':o_rg,'GREEN':o_gg})
df2=pd.DataFrame({'RED':a_rg,'GREEN':a_gg})
df=df1-(df2)
pop_complete = pd.concat([df, df1, df2], keys=["O-A", "O", "A"], axis=1)
pop_complete.index = ['A1','A3','A8']
df1 = pop_complete.stack(0)[['RED','GREEN']].reindex(["O", "A", "O-A"], axis=0, level=1)
df1.to_csv("OUT.CSV")
The output I get is:
RED GREEN
A1 O 14.0 14.0
A 14.0 12.0
O-A 0.0 2.0
A3 O 12.0 9.0
A 12.0 10.0
O-A 0.0 -1.0
A8 O 15.0 12.0
A 15.0 12.0
O-A 0.0 0.0
What I actually want is:
RED GREEN
A1
O 14.0 14.0
A 14.0 12.0
O-A 0.0 2.0
A3
O 12.0 9.0
A 12.0 10.0
O-A 0.0 -1.0
A8
O 15.0 12.0
A 15.0 12.0
O-A 0.0 0.0
where 'A1','A3','A8' ... can be stored in an array cases=[].
How do I get the desired output?
Use custom functions:
#from previous answer
df1 = pop_complete.stack(0)[['RED','GREEN']].reindex(["O", "A", "O-A"], axis=0, level=1)
print (df1)
RED GREEN
A1 O 14 14
A 14 14
O-A 0 0
A3 O 12 9
A 12 10
O-A 0 -1
A8 O 15 12
A 15 15
O-A 0 -3
If you need all values to stay numeric (NaN in the group header rows):
def f(x):
df2 = pd.DataFrame(columns=x.columns,
index=pd.MultiIndex.from_tuples([(x.name, x.name)]))
return df2.append(x)
df3 = df1.groupby(level=0, group_keys=False).apply(f).reset_index(level=0, drop=True)
print (df3)
RED GREEN
A1 NaN NaN
O 14 14
A 14 14
O-A 0 0
A3 NaN NaN
O 12 9
A 12 10
O-A 0 -1
A8 NaN NaN
O 15 12
A 15 15
O-A 0 -3
If you need empty strings instead:
def f(x):
df2 = pd.DataFrame('', columns=x.columns,
index=pd.MultiIndex.from_tuples([(x.name, x.name)]))
return df2.append(x)
df3 = df1.groupby(level=0, group_keys=False).apply(f).reset_index(level=0, drop=True)
print (df3)
RED GREEN
A1
O 14 14
A 14 14
O-A 0 0
A3
O 12 9
A 12 10
O-A 0 -1
A8
O 15 12
A 15 15
O-A 0 -3
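Note that DataFrame.append was removed in pandas 2.0, so on recent versions the same helper can be written with pd.concat (a sketch of the empty-strings variant; the rest of the code is unchanged):
def f(x):
    # one-row header frame labelled (group, group), e.g. ('A1', 'A1')
    header = pd.DataFrame('', columns=x.columns,
                          index=pd.MultiIndex.from_tuples([(x.name, x.name)]))
    # pd.concat replaces the removed DataFrame.append
    return pd.concat([header, x])

df3 = df1.groupby(level=0, group_keys=False).apply(f).reset_index(level=0, drop=True)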
I am doing some research and need to delete the rows containing values which are not in a specific range, using Python.
My Dataset in Excel:
I want to replace the big values of column A (not within the range 1-20) with NaN, replace the big values of column B (not within the range 21-40), and so on.
Then I want to drop/delete the rows containing the NaN values.
Expected output should be like:
You can try this to solve your problem. Here, I simulated your problem and solved it with the code given below:
import numpy as np
import pandas as pd
data = pd.read_csv('c.csv')
print(data)
data['A'] = data['A'].apply(lambda x: np.nan if x in range(1,10,1) else x)
data['B'] = data['B'].apply(lambda x: np.nan if x in range(10,20,1) else x)
data['C'] = data['C'].apply(lambda x: np.nan if x in range(20,30,1) else x)
print(data)
data = data.dropna()
print(data)
Original data:
A B C
0 1 10 20
1 2 11 22
2 4 15 25
3 8 20 30
4 12 25 35
5 18 40 55
6 20 45 60
Output with NaN:
A B C
0 NaN NaN NaN
1 NaN NaN NaN
2 NaN NaN NaN
3 NaN 20.0 30.0
4 12.0 25.0 35.0
5 18.0 40.0 55.0
6 20.0 45.0 60.0
Final Output:
A B C
4 12.0 25.0 35.0
5 18.0 40.0 55.0
6 20.0 45.0 60.0
Try this for non-integer numbers:
import numpy as np
import pandas as pd
data = pd.read_csv('c.csv')
print(data)
data['A'] = data['A'].apply(lambda x: np.nan if x in (round(y,2) for y in np.arange(1.00,10.00,0.01)) else x)
data['B'] = data['B'].apply(lambda x: np.nan if x in (round(y,2) for y in np.arange(10.00,20.00,0.01)) else x)
data['C'] = data['C'].apply(lambda x: np.nan if x in (round(y,2) for y in np.arange(20.00,30.00,0.01)) else x)
print(data)
data = data.dropna()
print(data)
Output:
A B C
0 1.25 10.56 20.11
1 2.39 11.19 22.92
2 4.00 15.65 25.27
3 8.89 20.31 30.15
4 12.15 25.91 35.64
5 18.29 40.15 55.98
6 20.46 45.00 60.48
A B C
0 NaN NaN NaN
1 NaN NaN NaN
2 NaN NaN NaN
3 NaN 20.31 30.15
4 12.15 25.91 35.64
5 18.29 40.15 55.98
6 20.46 45.00 60.48
A B C
4 12.15 25.91 35.64
5 18.29 40.15 55.98
6 20.46 45.00 60.48
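A vectorized alternative is to build one boolean mask with Series.between and drop the flagged rows in a single step (a sketch, assuming the same simulated ranges as the code above; note that between is inclusive on both ends):
import pandas as pd

data = pd.read_csv('c.csv')
# True where a value falls inside the range that should become NaN (same simulated ranges as above)
mask = (data['A'].between(1, 10)
        | data['B'].between(10, 20)
        | data['C'].between(20, 30))
data = data[~mask]
print(data)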
Try this:
df= df.drop(df.index[df.idxmax()])
Output:
A B C D
0 1 21 41 61
1 2 22 42 62
2 3 23 43 63
3 4 24 44 64
4 5 25 45 65
5 6 26 46 66
6 7 27 47 67
7 8 28 48 68
8 9 29 49 69
13 14 34 54 74
14 15 35 55 75
15 16 36 56 76
16 17 37 57 77
17 18 38 58 78
18 19 39 59 79
19 20 40 60 80
Use idxmax and drop the returned index: df.idxmax() gives, for each column, the row label of that column's maximum, and df.index[df.idxmax()] turns those labels into an Index that drop removes (assuming a default RangeIndex, where labels and positions coincide).
I have two dataframes with unique ids. They share some columns, but not all. I need to create a combined dataframe which will include rows for the missing ids from the second dataframe. I tried merge and concat with no luck. It's probably too late and my brain has stopped working. I will appreciate your help!
df1 = pd.DataFrame({
'id': ['a','b','c','d','f','g','h','j','k','l','m'],
'metric1': [123,22,356,412,54,634,72,812,129,110,200],
'metric2':[1,2,3,4,5,6,7,8,9,10,11]
})
df2 = pd.DataFrame({
'id': ['a','b','c','d','f','g','h','q','z','w'],
'metric1': [123,22,356,412,54,634,72,812,129,110]
})
df2
The result should look like this:
id metric1 metric2
0 a 123 1.0
1 b 22 2.0
2 c 356 3.0
3 d 412 4.0
4 f 54 5.0
5 g 634 6.0
6 h 72 7.0
7 j 812 8.0
8 k 129 9.0
9 l 110 10.0
10 m 200 11.0
11 q 812 NaN
12 z 129 NaN
13 w 110 NaN
In this case, use combine_first:
df1.set_index('id').combine_first(df2.set_index('id')).reset_index()
Out[766]:
id metric1 metric2
0 a 123.0 1.0
1 b 22.0 2.0
2 c 356.0 3.0
3 d 412.0 4.0
4 f 54.0 5.0
5 g 634.0 6.0
6 h 72.0 7.0
7 j 812.0 8.0
8 k 129.0 9.0
9 l 110.0 10.0
10 m 200.0 11.0
11 q 812.0 NaN
12 w 110.0 NaN
13 z 129.0 NaN
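For completeness, a plain outer merge on the shared columns gives the same set of rows here (a sketch; it works because the overlapping ids agree on metric1 in this sample, while combine_first would additionally fill overlapping NaNs):
# merges on the shared columns 'id' and 'metric1' by default
out = df1.merge(df2, how='outer')
print(out)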