I am looking for a way to extend the range of values in a Pandas column by interpolation, but I don't know how to set the 'limits' of the interpolation. The data looks something like this:
[Distance] [Radiation]
12 120
13 130
14 140
15 150
16 160
17 170
So, what I'm trying to get is the full range of column [Radiation] for the complete sequence of column [Distance], filled in by interpolation:
[Distance] [Radiation]
1 10
2 20
. .
. .
12 120
13 130
14 140
15 150
16 160
. .
. .
20 200
I have looked through the documentation of the pandas and scipy methods, but I haven't found how to do it yet.
Thanks for your insights.
One idea is to use DataFrame.reindex to add all the missing values of Distance, and then use DataFrame.interpolate with the barycentric method (this method relies on SciPy being installed):
df = (df.set_index('Distance')
.reindex(range(1, 21))
.interpolate(method='barycentric', limit_direction='both')
.reset_index())
print (df)
Distance Radiation
0 1 10.0
1 2 20.0
2 3 30.0
3 4 40.0
4 5 50.0
5 6 60.0
6 7 70.0
7 8 80.0
8 9 90.0
9 10 100.0
10 11 110.0
11 12 120.0
12 13 130.0
13 14 140.0
14 15 150.0
15 16 160.0
16 17 170.0
17 18 180.0
18 19 190.0
19 20 200.0
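If SciPy is not available, a rough alternative is to fit a straight line to the known points and evaluate it over the full range. This is only a sketch and swaps in a different technique (a least-squares fit with numpy.polyfit) for the barycentric interpolation above; it assumes the relationship is linear, as in the sample data. The names known, fitted and full are made up for illustration.

```python
import numpy as np
import pandas as pd

# Sample data from the question: Distance 12..17, Radiation 120..170
df = pd.DataFrame({'Distance': range(12, 18),
                   'Radiation': range(120, 180, 10)})

# Least-squares straight line through the known points (assumes linear data)
slope, intercept = np.polyfit(df['Distance'], df['Radiation'], deg=1)

# Full range of distances we want values for
full = pd.DataFrame({'Distance': np.arange(1, 21)})
fitted = slope * full['Distance'] + intercept

# Keep the measured values where they exist, use the fitted line elsewhere
known = df.set_index('Distance')['Radiation']
full['Radiation'] = full['Distance'].map(known).fillna(fitted)

print(full)
```

For data that is not close to linear, the fitted line will extrapolate poorly, so this is best treated as a quick check rather than a general replacement.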
I have the following data frame.
item_id price quantile
0 1 10 0.1
1 3 20 0.2
2 4 30 0.3
3 6 40 0.4
4 11 50 0.5
5 12 60 0.6
6 15 70 0.7
7 20 80 0.8
8 25 90 0.9
9 26 100 1.0
I would like to have a custom rank function that starts from the record whose quantile is closest to 0.44, then goes down, then up, then down, then up, and so on.
The result should look like:
item_id price quantile customed_rank
0 1 10 0.1 6
1 3 20 0.2 4
2 4 30 0.3 2
3 6 40 0.4 1
4 11 50 0.5 3
5 12 60 0.6 5
6 15 70 0.7 7
7 20 80 0.8 8
8 25 90 0.9 9
9 26 100 1.0 10
Other than looping over the entire data frame, is there a more elegant way to achieve this? Thanks!
You want to rank by the absolute value of the difference between quantile and 0.44.
(df['quantile'] - 0.44).abs().rank()
0 7.0
1 5.0
2 3.0
3 1.0
4 2.0
5 4.0
6 6.0
7 8.0
8 9.0
9 10.0
Name: quantile, dtype: float64
A faster (but uglier) alternative is to argsort twice.
(df['quantile'] - 0.44).abs().values.argsort().argsort() + 1
array([ 7, 5, 3, 1, 2, 4, 6, 8, 9, 10])
Note that this solution is only faster if you work with NumPy array objects (through the values property) rather than Pandas Series objects.
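For reference, here is a self-contained sketch of both variants on the sample quantile column. The names target, distance, rank_by_distance and ranks_np are just illustrative. Note that ranking purely by distance does not reproduce the strictly alternating order shown in the question; it orders rows by how close they are to 0.44.

```python
import pandas as pd

df = pd.DataFrame({'quantile': [0.1, 0.2, 0.3, 0.4, 0.5,
                                0.6, 0.7, 0.8, 0.9, 1.0]})
target = 0.44

# Absolute distance of every quantile from the target value
distance = (df['quantile'] - target).abs()

# Variant 1: pandas rank (1 = closest to the target)
df['rank_by_distance'] = distance.rank(method='first').astype(int)

# Variant 2: double argsort on the underlying NumPy array
ranks_np = distance.to_numpy().argsort().argsort() + 1

print(df)
print(ranks_np)
```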
I have a pandas dataset with open, high, low, close and key columns. I want to group the dataset by key and calculate the pivot with the formula (high + low + close) / 3. I am able to do this much, but the requirement is to shift the calculated value to the next group, which I am unable to code.
Here is how I group the dataset by the key column and calculate the pivot:
import pandas as pd
data = pd.DataFrame([[110, 115, 105, 111, 1],[11, 16, 6, 12, 1],[12, 17, 7, 13, 1],[12, 16, 6, 11, 2],[9, 13, 4, 13, 2],[13, 18, 9, 12, 3],[14, 16, 10, 13, 3]], columns=["open","high","low","close","key"])
data['p'] = (data.high.groupby(data.key).transform('max') + data.low.groupby(data.key).transform('min') + data.close.groupby(data.key).transform('last')) / 3
print(data)
Currently I'm getting the output below.
open high low close key p
0 110 115 105 111 1 44.666667
1 11 16 6 12 1 44.666667
2 12 17 7 13 1 44.666667
3 12 16 6 11 2 11.000000
4 9 13 4 13 2 11.000000
5 13 18 9 12 3 13.333333
6 14 16 10 13 3 13.333333
But after shifting the value to the next group, the expected output should be:
open high low close key p
0 110 115 105 111 1 NaN
1 11 16 6 12 1 NaN
2 12 17 7 13 1 NaN
3 12 16 6 11 2 44.666667
4 9 13 4 13 2 44.666667
5 13 18 9 12 3 11.000000
6 14 16 10 13 3 11.000000
Instead of calling groupby three times, use GroupBy.agg with a dictionary, then sum the values per row and divide by 3. Finally, use Series.map with the values shifted by Series.shift to create the new column:
s = data.groupby('key').agg({'low':'min','high':'max','close':'last'}).sum(axis=1) / 3
data['s'] = data['key'].map(s.shift())
print(data)
open high low close key s
0 110 115 105 111 1 NaN
1 11 16 6 12 1 NaN
2 12 17 7 13 1 NaN
3 12 16 6 11 2 44.666667
4 9 13 4 13 2 44.666667
5 13 18 9 12 3 11.000000
6 14 16 10 13 3 11.000000
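The same idea as a self-contained sketch, written with named aggregation instead of a dictionary. The names per_key, pivot and prev_pivot are made up for illustration, and it assumes the groups should be shifted in sorted key order, which is what groupby produces by default.

```python
import pandas as pd

data = pd.DataFrame(
    [[110, 115, 105, 111, 1], [11, 16, 6, 12, 1], [12, 17, 7, 13, 1],
     [12, 16, 6, 11, 2], [9, 13, 4, 13, 2],
     [13, 18, 9, 12, 3], [14, 16, 10, 13, 3]],
    columns=['open', 'high', 'low', 'close', 'key'])

# One row per key: highest high, lowest low, last close
per_key = data.groupby('key').agg(high=('high', 'max'),
                                  low=('low', 'min'),
                                  close=('close', 'last'))

# Pivot per key, indexed by key
pivot = per_key.sum(axis=1) / 3

# Shift by one group, then broadcast back to the rows through the key
data['prev_pivot'] = data['key'].map(pivot.shift())

print(data)
```

Mapping through the key works because pivot is indexed by key, so every row picks up the shifted value of its own group.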
cars_df = pd.DataFrame((car.iloc[:[1,3,4,6]].values), columns = ['mpg', 'dip', 'hp', 'wt'])
car_t = car.iloc[:9].values
target_names = [0,1]
car_df['group'] = pd.series(car_t, dtypre='category')
sb.pairplot(cars_df)
I have tried using .iloc(axis=0)[xxxx] and turning the slice into a list and into a tuple, with no luck. Any thoughts? I am trying to make a scatter plot from a lynda.com video, but in the video the host uses .ix, which is deprecated, so I am using .iloc[] instead.
car is a DataFrame. Here are a few lines of the data:
"Car_name","mpg","cyl","disp","hp","drat","wt","qsec","vs","am","gear","carb"
"Mazda RX4",21,6,160,110,3.9,2.62,16.46,0,1,4,4
"Mazda RX4 Wag",21,6,160,110,3.9,2.875,17.02,0,1,4,4
"Datsun 710",22.8,4,108,93,3.85,2.32,18.61,1,1,4,1
"Hornet 4 Drive",21.4,6,258,110,3.08,3.215,19.44,1,0,3,1
"Hornet Sportabout",18.7,8,360,175,3.15,3.44,17.02,0,0,3,2
"Valiant",18.1,6,225,105,2.76,3.46,20.22,1,0,3,1
"Duster 360",14.3,8,360,245,3.21,3.57,15.84,0,0,3,4
"Merc 240D",24.4,4,146.7,62,3.69,3.19,20,1,0,4,2
"Merc 230",22.8,4,140.8,95,3.92,3.15,22.9,1,0,4,2
"Merc 280",19.2,6,167.6,123,3.92,3.44,18.3,1,0,4,4
"Merc 280C",17.8,6,167.6,123,3.92,3.44,18.9,1,0,4,4
"Merc 450SE",16.4,8,275.8,180,3.07,4.07,17.4,0,0,3,3
I think you want to select multiple columns with iloc:
cars_df = car.iloc[:, [1,3,4,6]]
print (cars_df)
mpg disp hp wt
0 21.0 160.0 110 2.620
1 21.0 160.0 110 2.875
2 22.8 108.0 93 2.320
3 21.4 258.0 110 3.215
4 18.7 360.0 175 3.440
5 18.1 225.0 105 3.460
6 14.3 360.0 245 3.570
7 24.4 146.7 62 3.190
8 22.8 140.8 95 3.150
9 19.2 167.6 123 3.440
10 17.8 167.6 123 3.440
11 16.4 275.8 180 4.070
sb.pairplot(cars_df)
I'm not 100% sure about the rest of the code, but it seems you need:
# also select the column at position 9 ('am')
cars_df = car.iloc[:, [1,3,4,6,9]]
# rename that column to 'group'
cars_df = cars_df.rename(columns={'am':'group'})
# convert it to categorical
cars_df['group'] = pd.Categorical(cars_df['group'])
print (cars_df)
mpg disp hp wt group
0 21.0 160.0 110 2.620 1
1 21.0 160.0 110 2.875 1
2 22.8 108.0 93 2.320 1
3 21.4 258.0 110 3.215 0
4 18.7 360.0 175 3.440 0
5 18.1 225.0 105 3.460 0
6 14.3 360.0 245 3.570 0
7 24.4 146.7 62 3.190 0
8 22.8 140.8 95 3.150 0
9 19.2 167.6 123 3.440 0
10 17.8 167.6 123 3.440 0
11 16.4 275.8 180 4.070 0
# add the hue parameter to color by the levels of a categorical variable
sb.pairplot(cars_df, hue='group')
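A minimal sketch of the column selection on its own, using three rows of the sample data so it runs without the CSV file. It leaves out the plotting call; the .copy() is an addition here to avoid modifying a view of car when the new column is assigned.

```python
import pandas as pd

# Three rows from the sample data in the question
car = pd.DataFrame(
    [['Mazda RX4', 21.0, 6, 160.0, 110, 3.9, 2.62, 16.46, 0, 1, 4, 4],
     ['Datsun 710', 22.8, 4, 108.0, 93, 3.85, 2.32, 18.61, 1, 1, 4, 1],
     ['Hornet 4 Drive', 21.4, 6, 258.0, 110, 3.08, 3.215, 19.44, 1, 0, 3, 1]],
    columns=['Car_name', 'mpg', 'cyl', 'disp', 'hp', 'drat',
             'wt', 'qsec', 'vs', 'am', 'gear', 'carb'])

# iloc takes [rows, columns]; ':' means all rows, the list picks columns by position
cars_df = car.iloc[:, [1, 3, 4, 6]].copy()

# Column at position 9 ('am') becomes the categorical grouping variable
cars_df['group'] = pd.Categorical(car.iloc[:, 9])

print(cars_df)
print(cars_df.dtypes)
```

The original attempt failed because car.iloc[:[1,3,4,6]] puts the list inside the slice instead of passing it as the second (column) indexer after a comma.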
I have two pandas df that look like this
df1
Amount Price
0 5 50
1 10 53
2 15 55
3 30 50
4 45 61
df2
Used amount
0 4.5
1 1.2
2 6.2
3 4.1
4 25.6
5 31
6 19
7 15
I am trying to add a new column to df2 that provides the price from df1. df1 and df2 have different sizes; df1 is smaller.
I am expecting something like this
df3
Used amount price
0 4.5 50
1 1.2 50
2 6.2 53
3 4.1 50
4 25.6 50
5 31 61
6 19 50
7 15 55
I am thinking of solving this with something like this function:
def price_function(key, table):
used_amount_df2 = (row[0] for row in df1)
price = filter(lambda x: x < key, used_amount_df1)
Here is my own solution
1st approach:
from itertools import product
import pandas as pd
df2=df2.reset_index()
DF=pd.DataFrame(list(product(df2.Usedamount, df1.Amount)), columns=['l1', 'l2'])
DF['DIFF']=(DF.l1-DF.l2)
DF=DF.loc[DF.DIFF<=0,]
DF=DF.sort_values(['l1','DIFF'],ascending=[True,False]).drop_duplicates(['l1'],keep='first')
df1.merge(DF,left_on='Amount',right_on='l2',how='left').merge(df2,left_on='l1',right_on='Usedamount',how='right').loc[:,['index','Usedamount','Price']].set_index('index').sort_index()
Out[185]:
Usedamount Price
index
0 4.5 50
1 1.2 50
2 6.2 53
3 4.1 50
4 25.6 50
5 31.0 61
6 19.0 50
7 15.0 55
2nd approach, using pd.merge_asof (I recommend this one):
df2=df2.rename(columns={'Used amount':'Amount'}).sort_values('Amount')
df2=df2.reset_index()
pd.merge_asof(df2,df1,on='Amount',allow_exact_matches=True,direction='forward')\
.set_index('index').sort_index()
Out[206]:
Amount Price
index
0 4.5 50
1 1.2 50
2 6.2 53
3 4.1 50
4 25.6 50
5 31.0 61
6 19.0 50
7 15.0 55
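Here is a self-contained sketch of the merge_asof approach on the sample frames. It casts df1's Amount to float before merging, on the assumption that recent pandas versions require both key columns to have the same dtype; the names left, right, merged and df2_priced are illustrative only.

```python
import pandas as pd

df1 = pd.DataFrame({'Amount': [5, 10, 15, 30, 45],
                    'Price': [50, 53, 55, 50, 61]})
df2 = pd.DataFrame({'Used amount': [4.5, 1.2, 6.2, 4.1, 25.6, 31, 19, 15]})

# merge_asof needs both sides sorted by the key; keep the old index to restore order
left = (df2.rename(columns={'Used amount': 'Amount'})
           .reset_index()
           .sort_values('Amount'))

# Assumption: key dtypes must match, so cast the integer key to float
right = df1.astype({'Amount': 'float64'}).sort_values('Amount')

# direction='forward' picks the first df1 Amount that is >= the used amount
merged = (pd.merge_asof(left, right, on='Amount', direction='forward')
            .set_index('index')
            .sort_index())

df2_priced = df2.assign(price=merged['Price'].to_numpy())
print(df2_priced)
```

Used amounts larger than the biggest Amount in df1 would have no forward match and would come back as NaN.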
Using pd.IntervalIndex you can map each used amount to the interval it falls into:
In [468]: df1.index = pd.IntervalIndex.from_arrays(df1.Amount.shift().fillna(0),df1.Amount)
In [469]: df1
Out[469]:
Amount Price
(0.0, 5.0] 5 50
(5.0, 10.0] 10 53
(10.0, 15.0] 15 55
(15.0, 30.0] 30 50
(30.0, 45.0] 45 61
In [470]: df2['price'] = df2['Used amount'].map(df1.Price)
In [471]: df2
Out[471]:
Used amount price
0 4.5 50
1 1.2 50
2 6.2 53
3 4.1 50
4 25.6 50
5 31.0 61
6 19.0 50
7 15.0 55
You can use cut or searchsorted to create bins.
Note: the index of df1 has to be the default one (0, 1, 2, ...).
#create default index if necessary
df1 = df1.reset_index(drop=True)
#create bins
bins = [0] + df1['Amount'].tolist()
#get index values of df1 by values of Used amount
a = pd.cut(df2['Used amount'], bins=bins, labels=df1.index)
#assign output
df2['price'] = df1['Price'].values[a]
print (df2)
Used amount price
0 4.5 50
1 1.2 50
2 6.2 53
3 4.1 50
4 25.6 50
5 31.0 61
6 19.0 50
7 15.0 55
a = df1['Amount'].searchsorted(df2['Used amount'])
df2['price'] = df1['Price'].values[a]
print (df2)
Used amount price
0 4.5 50
1 1.2 50
2 6.2 53
3 4.1 50
4 25.6 50
5 31.0 61
6 19.0 50
7 15.0 55
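A sketch of the searchsorted variant with plain NumPy arrays. It assumes df1 is sorted by Amount, and it adds a guard (not in the answer above) so that used amounts beyond the last bin get NaN instead of raising an IndexError; the names amounts, prices, pos and in_range are illustrative.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'Amount': [5, 10, 15, 30, 45],
                    'Price': [50, 53, 55, 50, 61]})
df2 = pd.DataFrame({'Used amount': [4.5, 1.2, 6.2, 4.1, 25.6, 31, 19, 15]})

amounts = df1['Amount'].to_numpy()   # must be sorted ascending
prices = df1['Price'].to_numpy()

# Position of the first Amount that is >= each used amount
pos = np.searchsorted(amounts, df2['Used amount'].to_numpy(), side='left')

# Guard: positions equal to len(amounts) mean the value is above every bin
in_range = pos < len(amounts)
safe_pos = np.minimum(pos, len(amounts) - 1)
df2['price'] = np.where(in_range, prices[safe_pos], np.nan)

print(df2)
```

Because of the NaN guard, the price column comes back as float rather than int.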
You can use pd.DataFrame.reindex with method=bfill
df1.set_index('Amount').reindex(df2['Used amount'], method='bfill')
Price
Used amount
4.5 50
1.2 50
6.2 53
4.1 50
25.6 50
31.0 61
19.0 50
15.0 55
To add that as a new column, we can use join:
df2.join(
df1.set_index('Amount').reindex(df2['Used amount'], method='bfill'),
on='Used amount'
)
Used amount Price
0 4.5 50
1 1.2 50
2 6.2 53
3 4.1 50
4 25.6 50
5 31.0 61
6 19.0 50
7 15.0 55
Or assign
df2.assign(
Price=df1.set_index('Amount').reindex(df2['Used amount'], method='bfill').values)
Used amount Price
0 4.5 50
1 1.2 50
2 6.2 53
3 4.1 50
4 25.6 50
5 31.0 61
6 19.0 50
7 15.0 55