I am trying to multiply two data frames.
Df1
Name  Key  100  101  102  103  104
Abb   AB     2    6   10    5    1
Bcc   BC     1    3    7    4    2
Abb   AB     5    1   11    3    1
Bcc   BC     7    1    4    5    0
Df2
Key_1  100  101  102  103  104
AB      10    2    1    5    1
BC       1   10    2    2    4
Expected Output
Name  Key  100  101  102  103  104
Abb   AB    20   12   10   25    1
Bcc   BC     1   30   14    8    8
Abb   AB    50    2   11   15    1
Bcc   BC     7   10    8   10    0
I have tried grouping Df1 and then multiplying it with Df2, but it didn't work.
Please help me with how to approach this problem.
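For reference, here is a minimal construction of the two frames (a sketch assuming the values above, with the numeric column labels kept as integers and lowercase df1/df2 as used in the answers below):
import pandas as pd

# Df1: repeated keys per Name
df1 = pd.DataFrame({
    'Name': ['Abb', 'Bcc', 'Abb', 'Bcc'],
    'Key':  ['AB', 'BC', 'AB', 'BC'],
    100: [2, 1, 5, 7],
    101: [6, 3, 1, 1],
    102: [10, 7, 11, 4],
    103: [5, 4, 3, 5],
    104: [1, 2, 1, 0],
})

# Df2: one row of multipliers per key
df2 = pd.DataFrame({
    'Key_1': ['AB', 'BC'],
    100: [10, 1],
    101: [2, 10],
    102: [1, 2],
    103: [5, 2],
    104: [1, 4],
})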
You can rename df2's Key_1 column to Key (to match df1), then set the index and call mul with level=1:
df1.set_index(['Name','Key']).mul(df2.rename(columns={'Key_1':'Key'})
.set_index('Key'),level=1).reset_index()
Or, similarly:
df1.set_index(['Name','Key']).mul(df2.set_index('Key_1')
.rename_axis('Key'),level=1).reset_index()
As correctly pointed out by @QuangHoang, you can also do it without renaming:
df1.set_index(['Name','Key']).mul(df2.set_index('Key_1'),level=1).reset_index()
Name Key 100 101 102 103 104
0 Abb AB 20 12 10 25 1
1 Bcc BC 1 30 14 8 8
2 Abb AB 50 2 11 15 1
3 Bcc BC 7 10 8 10 0
IIUC, use reindex_like:
df1.set_index('Key',inplace=True)
df1=df1.mul(df2.set_index('Key_1').reindex_like(df1).values).fillna(df1)
Out[235]:
Name 100 101 102 103 104
Key
AB Abb 20.0 12.0 10.0 25.0 1.0
BC Bcc 1.0 30.0 14.0 8.0 8.0
AB Abb 50.0 2.0 11.0 15.0 1.0
BC Bcc 7.0 10.0 8.0 10.0 0.0
We could also use DataFrame.merge with pd.Index.difference to select columns.
mul_cols = df1.columns.difference(['Name','Key'])
df1.assign(**df1[mul_cols].mul(df2.merge(df1[['Key']],
left_on = 'Key_1',
right_on = 'Key')[mul_cols]))
Name Key 100 101 102 103 104
0 Abb AB 20 12 10 25 1
1 Bcc BC 10 6 7 20 2
2 Abb AB 5 10 22 6 4
3 Bcc BC 7 10 8 10 0
Related
If we have the following df,
df
A A B B B
0 10 2 0 3 3
1 20 4 19 21 36
2 30 20 24 24 12
3 40 10 39 23 46
How can I combine the content of the columns with the same names?
e.g.
A B
0 10 0
1 20 19
2 30 24
3 40 39
4 2 3
5 4 21
6 20 24
7 10 23
8 NaN 3
9 NaN 36
10 NaN 12
11 NaN 46
I tried groupby and merge, but neither does the job.
Any help is appreciated.
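For anyone reproducing this, a frame with duplicated column labels can be built directly (a sketch based on the values above):
import pandas as pd

# duplicate column labels are allowed in pandas
df = pd.DataFrame(
    [[10, 2, 0, 3, 3],
     [20, 4, 19, 21, 36],
     [30, 20, 24, 24, 12],
     [40, 10, 39, 23, 46]],
    columns=['A', 'A', 'B', 'B', 'B'],
)
Selecting df['A'] or df['B'] then returns a sub-frame with all columns of that name, which is what the melt-based answer below relies on.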
If column names are duplicated, you can use DataFrame.melt with concat:
df = pd.concat([df['A'].melt()['value'], df['B'].melt()['value']], axis=1, keys=['A','B'])
print (df)
A B
0 10.0 0
1 20.0 19
2 30.0 24
3 40.0 39
4 2.0 3
5 4.0 21
6 20.0 24
7 10.0 23
8 NaN 3
9 NaN 36
10 NaN 12
11 NaN 46
EDIT: a more general version that iterates over all unique column names:
uniq = df.columns.unique()
df = pd.concat([df[c].melt()['value'] for c in uniq], axis=1, keys=uniq)
print (df)
A B
0 10.0 0
1 20.0 19
2 30.0 24
3 40.0 39
4 2.0 3
5 4.0 21
6 20.0 24
7 10.0 23
8 NaN 3
9 NaN 36
10 NaN 12
11 NaN 46
I have a pandas data frame dat as below:
0 1 0 1 0 1
0 A 23 0.1 122 56 9
1 B 24 0.45 564 36 3
2 C 25 0.2 235 87 5
3 D 13 0.8 567 42 6
4 E 5 0.9 356 12 2
As you can see above, the column labels are 0, 1, 0, 1, 0, 1, etc. I want to rename them back to a normal index starting from 0, 1, 2, 3, 4, ..., so I did the following:
dat = dat.reset_index(drop=True)
The column labels were not changed. How do I get them renamed in this case? Thanks in advance.
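For reference, the frame can be reproduced like this (a sketch assuming the values shown; duplicate column labels are valid in pandas):
import pandas as pd

dat = pd.DataFrame(
    [['A', 23, 0.10, 122, 56, 9],
     ['B', 24, 0.45, 564, 36, 3],
     ['C', 25, 0.20, 235, 87, 5],
     ['D', 13, 0.80, 567, 42, 6],
     ['E',  5, 0.90, 356, 12, 2]],
    columns=[0, 1, 0, 1, 0, 1],
)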
reset_index only resets the row index, not the column labels, which is why nothing appeared to change. To renumber the columns, assign new labels directly:
dat.columns = range(dat.shape[1])
There are quite a few ways:
dat = dat.rename(columns = lambda x: dat.columns.get_loc(x))
Or
dat = dat.rename(columns = dict(zip(dat.columns, range(dat.shape[1]))))
Or
dat = dat.set_axis(pd.RangeIndex(dat.shape[1]), axis=1, inplace=False)
Out[677]:
0 1 2 3 4 5
0 A 23 0.10 122 56 9
1 B 24 0.45 564 36 3
2 C 25 0.20 235 87 5
3 D 13 0.80 567 42 6
4 E 5 0.90 356 12 2
HID gen views
1 1 20
1 2 2532
1 3 276
1 4 1684
1 5 779
1 6 200
1 7 545
2 1 20
2 2 7478
2 3 750
2 4 7742
2 5 2643
2 6 208
2 7 585
3 1 21
3 2 4012
3 3 2019
3 4 1073
3 5 3372
3 6 8
3 7 1823
3 8 22
This is a sample section of a data frame where HID and gen are the index levels.
How can it be transformed like this?
HID 1 2 3 4 5 6 7 8
1 20 2532 276 1684 779 200 545 nan
2 20 7478 750 7742 2643 208 585 nan
3 21 4012 2019 1073 3372 8 1823 22
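For a reproducible setup, the indexed frame above can be built like this (a sketch using the sample values):
import pandas as pd

df = pd.DataFrame({
    'HID':   [1]*7 + [2]*7 + [3]*8,
    'gen':   list(range(1, 8))*2 + list(range(1, 9)),
    'views': [20, 2532, 276, 1684, 779, 200, 545,
              20, 7478, 750, 7742, 2643, 208, 585,
              21, 4012, 2019, 1073, 3372, 8, 1823, 22],
}).set_index(['HID', 'gen'])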
It's called pivoting, i.e.
df.reset_index().pivot(index='HID', columns='gen', values='views')
gen 1 2 3 4 5 6 7 8
HID
1 20.0 2532.0 276.0 1684.0 779.0 200.0 545.0 NaN
2 20.0 7478.0 750.0 7742.0 2643.0 208.0 585.0 NaN
3 21.0 4012.0 2019.0 1073.0 3372.0 8.0 1823.0 22.0
Use unstack:
df = df['views'].unstack()
If you also need HID as a column, add reset_index + rename_axis:
df = df['views'].unstack().reset_index().rename_axis(None, axis=1)
print (df)
HID 1 2 3 4 5 6 7 8
0 1 20.0 2532.0 276.0 1684.0 779.0 200.0 545.0 NaN
1 2 20.0 7478.0 750.0 7742.0 2643.0 208.0 585.0 NaN
2 3 21.0 4012.0 2019.0 1073.0 3372.0 8.0 1823.0 22.0
I have two pandas DataFrames that look like this:
df1
Amount Price
0 5 50
1 10 53
2 15 55
3 30 50
4 45 61
df2
Used amount
0 4.5
1 1.2
2 6.2
3 4.1
4 25.6
5 31
6 19
7 15
I am trying to insert a new column into df2 that provides the price from df1. df1 and df2 have different sizes; df1 is smaller.
I am expecting something like this:
df3
Used amount price
0 4.5 50
1 1.2 50
2 6.2 53
3 4.1 50
4 25.6 50
5 31 61
6 19 50
7 15 55
I am thinking of solving this with something like this function:
def price_function(key, table):
    used_amount_df2 = (row[0] for row in df1)
    price = filter(lambda x: x < key, used_amount_df1)
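For reference, a minimal construction of the two frames used by the answers below (a sketch assuming the values shown above):
import pandas as pd

df1 = pd.DataFrame({'Amount': [5, 10, 15, 30, 45],
                    'Price':  [50, 53, 55, 50, 61]})
df2 = pd.DataFrame({'Used amount': [4.5, 1.2, 6.2, 4.1, 25.6, 31, 19, 15]})
Note that the first approach below refers to the column as Usedamount (no space), so rename accordingly if you follow that version.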
Here is my own solution
1st approach:
from itertools import product
import pandas as pd
df2=df2.reset_index()
DF=pd.DataFrame(list(product(df2.Usedamount, df1.Amount)), columns=['l1', 'l2'])
DF['DIFF']=(DF.l1-DF.l2)
DF=DF.loc[DF.DIFF<=0,]
DF=DF.sort_values(['l1','DIFF'],ascending=[True,False]).drop_duplicates(['l1'],keep='first')
df1.merge(DF,left_on='Amount',right_on='l2',how='left').merge(df2,left_on='l1',right_on='Usedamount',how='right').loc[:,['index','Usedamount','Price']].set_index('index').sort_index()
Out[185]:
Usedamount Price
index
0 4.5 50
1 1.2 50
2 6.2 53
3 4.1 50
4 25.6 50
5 31.0 61
6 19.0 50
7 15.0 55
2nd approach, using pd.merge_asof (I recommend this):
df2=df2.rename(columns={'Used amount':'Amount'}).sort_values('Amount')
df2=df2.reset_index()
pd.merge_asof(df2,df1,on='Amount',allow_exact_matches=True,direction='forward')\
.set_index('index').sort_index()
Out[206]:
Amount Price
index
0 4.5 50
1 1.2 50
2 6.2 53
3 4.1 50
4 25.6 50
5 31.0 61
6 19.0 50
7 15.0 55
Using pd.IntervalIndex, you can do the following:
In [468]: df1.index = pd.IntervalIndex.from_arrays(df1.Amount.shift().fillna(0),df1.Amount)
In [469]: df1
Out[469]:
Amount Price
(0.0, 5.0] 5 50
(5.0, 10.0] 10 53
(10.0, 15.0] 15 55
(15.0, 30.0] 30 50
(30.0, 45.0] 45 61
In [470]: df2['price'] = df2['Used amount'].map(df1.Price)
In [471]: df2
Out[471]:
Used amount price
0 4.5 50
1 1.2 50
2 6.2 53
3 4.1 50
4 25.6 50
5 31.0 61
6 19.0 50
7 15.0 55
You can use cut or searchsorted to create bins.
Notice: the index of df1 has to be the default one - 0, 1, 2, ...
#create default index if necessary
df1 = df1.reset_index(drop=True)
#create bins
bins = [0] + df1['Amount'].tolist()
#get index values of df1 by values of Used amount
a = pd.cut(df2['Used amount'], bins=bins, labels=df1.index)
#assign output
df2['price'] = df1['Price'].values[a]
print (df2)
Used amount price
0 4.5 50
1 1.2 50
2 6.2 53
3 4.1 50
4 25.6 50
5 31.0 61
6 19.0 50
7 15.0 55
a = df1['Amount'].searchsorted(df2['Used amount'])
df2['price'] = df1['Price'].values[a]
print (df2)
Used amount price
0 4.5 50
1 1.2 50
2 6.2 53
3 4.1 50
4 25.6 50
5 31.0 61
6 19.0 50
7 15.0 55
You can use pd.DataFrame.reindex with method='bfill':
df1.set_index('Amount').reindex(df2['Used amount'], method='bfill')
Price
Used amount
4.5 50
1.2 50
6.2 53
4.1 50
25.6 50
31.0 61
19.0 50
15.0 55
To add that as a new column, we can use join:
df2.join(
df1.set_index('Amount').reindex(df2['Used amount'], method='bfill'),
on='Used amount'
)
Used amount Price
0 4.5 50
1 1.2 50
2 6.2 53
3 4.1 50
4 25.6 50
5 31.0 61
6 19.0 50
7 15.0 55
Or assign
df2.assign(
Price=df1.set_index('Amount').reindex(df2['Used amount'], method='bfill').values)
Used amount Price
0 4.5 50
1 1.2 50
2 6.2 53
3 4.1 50
4 25.6 50
5 31.0 61
6 19.0 50
7 15.0 55
I have a dataframe that looks like this:
userId movieId rating
0 1 31 2.5
1 1 1029 3.0
2 1 3671 3.0
3 2 10 4.0
4 2 17 5.0
5 3 60 3.0
6 3 110 4.0
7 3 247 3.5
8 4 10 4.0
9 4 112 5.0
10 5 3 4.0
11 5 39 4.0
12 5 104 4.0
I need to get a dataframe which has the unique userId, the number of ratings by the user, and the average rating by the user, as shown below:
userId count mean
0 1 3 2.83
1 2 2 4.5
2 3 3 3.5
3 4 2 4.5
4 5 3 4.0
Can someone help?
df1 = df.groupby('userId')['rating'].agg(['count','mean']).reset_index()
print(df1)
userId count mean
0 1 3 2.833333
1 2 2 4.500000
2 3 3 3.500000
3 4 2 4.500000
4 5 3 4.000000
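An equivalent sketch using named aggregation (available since pandas 0.25), which lets you name the output columns explicitly:
df1 = (df.groupby('userId')
         .agg(count=('rating', 'count'), mean=('rating', 'mean'))
         .reset_index())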
Drop movieId since we're not using it, groupby userId, and then apply the aggregation methods:
import pandas as pd
df = pd.DataFrame({'userId': [1,1,1,2,2,3,3,3,4,4,5,5,5],
'movieId':[31,1029,3671,10,17,60,110,247,10,112,3,39,104],
'rating':[2.5,3.0,3.0,4.0,5.0,3.0,4.0,3.5,4.0,5.0,4.0,4.0,4.0]})
df = df.drop('movieId', axis=1).groupby('userId').agg(['count','mean'])
print(df)
Which produces:
rating
count mean
userId
1 3 2.833333
2 2 4.500000
3 3 3.500000
4 2 4.500000
5 3 4.000000
Here's a NumPy-based approach, using the fact that the userId column appears to be sorted:
import numpy as np
import pandas as pd

unq, tags, count = np.unique(df.userId.values, return_inverse=True, return_counts=True)
mean_vals = np.bincount(tags, df.rating.values) / count
df_out = pd.DataFrame(np.c_[unq, count], columns=['userID', 'count'])
df_out['mean'] = mean_vals
Sample run -
In [103]: df
Out[103]:
userId movieId rating
0 1 31 2.5
1 1 1029 3.0
2 1 3671 3.0
3 2 10 4.0
4 2 17 5.0
5 3 60 3.0
6 3 110 4.0
7 3 247 3.5
8 4 10 4.0
9 4 112 5.0
10 5 3 4.0
11 5 39 4.0
12 5 104 4.0
In [104]: df_out
Out[104]:
userID count mean
0 1 3 2.833333
1 2 2 4.500000
2 3 3 3.500000
3 4 2 4.500000
4 5 3 4.000000