I have the following Pandas dataframe:
a0 a1 a2 a3
0.2 0.46 15.85 124.06 -380.04
0.4 0.21 28.20 -53.17 87.97
0.6 1.10 -5.55 167.76 -417.72
0.8 0.82 6.11 16.90 -70.86
1.0 1.00 0.00 0.00 0.00
Which is made by:
import pandas as pd
df = pd.DataFrame(data={'a0': [0.46,0.21,1.10,0.82,1],
'a1': [15.85,28.20,-5.55,6.11,0],
'a2': [124.06,-53.17,167.76,16.90,0],
'a3': [-380.04,87.97,-417.72,-70.86,0]},
index=pd.Series(['0.2', '0.4', '0.6','0.8','1.0']))
a0, a1, a2, a3 are polynomial coefficients from a fit y = a0 + a1*x + a2*x^2 + a3*x^3.
Five fits were made for five ratios Ht/H; these ratios are the index values.
I want to return the values of a0..a3 for a specified Ht/H ratio.
For example, if I specify Ht/H = 0.9, I want to get a0 = 0.91, a1 = 3.05, a2 = 8.45, a3 = -35.43.
First, note that your index currently holds strings, and interpolation needs numeric values. So do:
df.index = pd.to_numeric(df.index)
Let's try reindex:
import numpy as np

s = 0.9
# create new index that includes the new value
new_idx = np.unique(list(df.index) + [s])
df.reindex(new_idx).interpolate('index').loc[s]
Output:
a0 0.910
a1 3.055
a2 8.450
a3 -35.430
Name: 0.9, dtype: float64
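If you only need one ratio at a time, a per-column np.interp should give the same linear interpolation without reindexing. This is a minimal sketch rather than the only way to do it: the helper name coeffs_at is made up for illustration, and it assumes the index is already numeric and sorted in increasing order.

```python
import numpy as np
import pandas as pd

# Same data as in the question, but with a numeric index from the start.
df = pd.DataFrame(
    data={'a0': [0.46, 0.21, 1.10, 0.82, 1],
          'a1': [15.85, 28.20, -5.55, 6.11, 0],
          'a2': [124.06, -53.17, 167.76, 16.90, 0],
          'a3': [-380.04, 87.97, -417.72, -70.86, 0]},
    index=[0.2, 0.4, 0.6, 0.8, 1.0])


def coeffs_at(frame, ratio):
    """Linearly interpolate every column of `frame` at index value `ratio`.

    Hypothetical helper; np.interp requires increasing x values, so the
    index must be sorted.
    """
    x = frame.index.to_numpy(dtype=float)
    return pd.Series(
        {col: np.interp(ratio, x, frame[col].to_numpy(dtype=float))
         for col in frame.columns},
        name=ratio)


result = coeffs_at(df, 0.9)
print(result)
```

Note that np.interp clamps to the end values outside the index range, so a ratio below 0.2 or above 1.0 returns the first or last row rather than an extrapolation.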
I have an array like the one below, with columns ['item', 'Space', 'rem_spc']:
array([['Pineapple', 0.5, 0.5],
['Mango', 0.75, 0.25],
['Apple', 0.375, 0.625],
['Melons', 0.25, 0.75],
['Grape', 0.125, 0.875]], dtype=object)
I need to convert this array to a dataframe and add a new column ['nxt_item'], which should be filled in for the first row only (here, Pineapple) under the following condition:
find the first nearest items whose 'Space' values sum to the 'rem_spc' of Pineapple.
Expected Output:
item Space rem_spc nxt_item
Pineapple 0.5 0.5 {Apple, Grape} #0.5 = 0.375 + 0.125
Mango 0.75 0.25
Apple 0.375 0.625
Melons 0.25 0.75
Grape 0.125 0.875
Thanks!
A possible solution (another would be using binary linear programming):
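from itertools import product
import numpy as np
import pandas as pd
# df is assumed to be the array above loaded into a DataFrame
# with columns ['item', 'Space', 'rem_spc']
n = len(df) - 1
# every 0/1 selection mask over the rows after the first
combinations = product([0, 1], repeat=n)
a = np.array(list(combinations))
# object dtype so the column can hold a string next to NaN
df['nxt_item'] = pd.Series(np.nan, index=df.index, dtype=object)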
df.loc[0, 'nxt_item'] = (
'{' +
', '.join(list(
df.loc[[False] +
a[np.argmin(np.abs(df.iloc[0, 2] -
np.sum(a * df['Space'][1:].values, axis=1))), :]
.astype(bool).tolist(), 'item']))
+ '}')
Output:
item Space rem_spc nxt_item
0 Pineapple 0.5 0.5 {Apple, Grape}
1 Mango 0.75 0.25 NaN
2 Apple 0.375 0.625 NaN
3 Melons 0.25 0.75 NaN
4 Grape 0.125 0.875 NaN
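For comparison, here is a more spelled-out brute-force sketch using itertools.combinations. It is a slightly different rule from the one-liner above: it looks for the smallest group of later rows whose 'Space' values sum exactly to the target (within a small tolerance), instead of the closest sum. The names arr, target, others and match, and the 1e-9 tolerance, are assumptions made for the example.

```python
from itertools import combinations

import numpy as np
import pandas as pd

arr = np.array([['Pineapple', 0.5, 0.5],
                ['Mango', 0.75, 0.25],
                ['Apple', 0.375, 0.625],
                ['Melons', 0.25, 0.75],
                ['Grape', 0.125, 0.875]], dtype=object)
df = pd.DataFrame(arr, columns=['item', 'Space', 'rem_spc'])

target = float(df.loc[0, 'rem_spc'])   # remaining space of the first row
space = df['Space'].astype(float)      # numeric copy of the object column
others = df.index[1:]                  # candidate rows: everything after row 0

# Try groups of increasing size and stop at the first exact match.
match = None
for size in range(1, len(others) + 1):
    for rows in combinations(others, size):
        if abs(space.loc[list(rows)].sum() - target) < 1e-9:
            match = list(rows)
            break
    if match is not None:
        break

# Object dtype so the column can hold a string next to NaN.
df['nxt_item'] = pd.Series(np.nan, index=df.index, dtype=object)
if match is not None:
    df.loc[0, 'nxt_item'] = '{' + ', '.join(df.loc[match, 'item']) + '}'

print(df)
```

Both approaches enumerate subsets, so they only suit a small number of rows; for larger inputs the linear programming route mentioned above is likely the better fit.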
I have a Pandas Data Frame with two string columns, which I would like to split on space, like this:
df =
A B
0.1 0.5 0.01 ... 0.3 0.1 0.4 ...
I would like to split both these columns and form new columns for as many values, which result out of the split.
So, the result:
df =
A1 A2 A3 ... B1 B2 B3
0.1 0.5 0.01 ... 0.3 0.1 0.4
Currently, I am doing:
df = df.join(df['A'].str.split(' ', expand = True))
df = df.join(df['B'].str.split(' ', expand = True))
But, I get the following error:
columns overlap but no suffix specified
I guess this is because the column names from the first and second split overlap?
So, my question is: how do I split multiple columns while providing column names or suffixes for each split?
Use DataFrame.add_prefix to name the new columns after the column that was split:
df = df.join(df['A'].str.split(expand = True).add_prefix('A'))
df = df.join(df['B'].str.split(expand = True).add_prefix('B'))
print (df)
A B A0 A1 A2 B0 B1 B2
0 0.1 0.5 0.01 0.3 0.1 0.4 0.1 0.5 0.01 0.3 0.1 0.4
Another idea is to use a list comprehension:
cols = ['A','B']
df1 = pd.concat([df[c].str.split(expand=True).add_prefix(c) for c in cols], axis=1)
print (df1)
A0 A1 A2 B0 B1 B2
0 0.1 0.5 0.01 0.3 0.1 0.4
And to keep all the original columns as well:
df = df.join(df1)
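If you want the 1-based names from the question (A1, A2, ... instead of A0, A1, ...), one option is to rename the integer column labels instead of prefixing them. A small sketch, assuming a one-row frame like the one in the question; the names parts and wide are made up here, and the astype(float) step is optional.

```python
import pandas as pd

# Sample data shaped like the question: space-separated numbers as strings.
df = pd.DataFrame({'A': ['0.1 0.5 0.01'], 'B': ['0.3 0.1 0.4']})

cols = ['A', 'B']
parts = [
    # split() with no separator splits on any run of whitespace;
    # expand=True yields integer column labels 0, 1, 2, ...
    df[c].str.split(expand=True).rename(columns=lambda i, c=c: f"{c}{i + 1}")
    for c in cols
]

# Put the pieces side by side and convert the strings to numbers.
wide = pd.concat(parts, axis=1).astype(float)
print(wide)
```

The c=c default argument pins the current column name inside the lambda, which avoids the usual late-binding surprise with closures in a comprehension.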
For the dataframe df :
dummy_data1 = {'category': ['White', 'Black', 'Hispanic','White'],
'Pop':['75','85','90','100'],'White_ratio':[0.6,0.4,0.7,0.35],'Black_ratio':[0.3,0.2,0.1,0.45], 'Hispanic_ratio':[0.1,0.4,0.2,0.20] }
df = pd.DataFrame(dummy_data1, columns = ['category', 'Pop','White_ratio', 'Black_ratio', 'Hispanic_ratio'])
I want to add a new column to this data frame,'pop_n', by first checking the category, and then multiplying the value in 'Pop' by the corresponding ratio value in the columns. For the first row,
the category is 'White' so it should multiply 75 with 0.60 and put 45 in pop_n column.
I thought about writing something like :
df['pop_n'] = (df['Pop']*df['White_ratio']).where(df['category']=='White')
This works, but only for one category.
I would appreciate any help with this.
Thanks.
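Using DataFrame.filter and DataFrame.lookup (note that lookup was deprecated in pandas 1.2 and removed in pandas 2.0, so this needs an older pandas version):
First we use filter to get the columns with ratio in the name, then rename them to keep only the word before the underscore.
Finally we use lookup to match the category values to these columns.
# Pop holds strings in the question's data, so convert it before multiplying
df['Pop'] = df['Pop'].astype(int)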
df2 = df.filter(like='ratio').rename(columns=lambda x: x.split('_')[0])
df['pop_n'] = df2.lookup(df.index, df['category']) * df['Pop']
category Pop White_ratio Black_ratio Hispanic_ratio pop_n
0 White 75 0.60 0.30 0.1 45.0
1 Black 85 0.40 0.20 0.4 17.0
2 Hispanic 90 0.70 0.10 0.2 18.0
3 White 100 0.35 0.45 0.2 35.0
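On pandas 2.0 and later, where DataFrame.lookup no longer exists, the same row-by-row pick can be done with NumPy integer indexing. This is a sketch of one possible replacement, not an official drop-in; the names ratios, row_pos and col_pos are made up for the example.

```python
import numpy as np
import pandas as pd

dummy_data1 = {'category': ['White', 'Black', 'Hispanic', 'White'],
               'Pop': ['75', '85', '90', '100'],
               'White_ratio': [0.6, 0.4, 0.7, 0.35],
               'Black_ratio': [0.3, 0.2, 0.1, 0.45],
               'Hispanic_ratio': [0.1, 0.4, 0.2, 0.20]}
df = pd.DataFrame(dummy_data1)

# Pop is stored as strings in the question, so convert before multiplying.
df['Pop'] = df['Pop'].astype(int)

# Keep only the ratio columns and strip the "_ratio" suffix from their names.
ratios = df.filter(like='ratio').rename(columns=lambda x: x.split('_')[0])

# For each row, find the position of the column named by its category.
row_pos = np.arange(len(df))
col_pos = ratios.columns.get_indexer(df['category'])

# Pick one ratio per row and scale the population by it.
df['pop_n'] = ratios.to_numpy()[row_pos, col_pos] * df['Pop']
print(df)
```

get_indexer returns -1 for a category with no matching ratio column, which would silently pick the last column, so it may be worth checking (col_pos >= 0).all() on real data.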
Locate the columns that have underscores in their names:
to_rename = {x: x.split("_")[0] for x in df if "_" in x}
Find the matching factors:
stack = df.rename(columns=to_rename)\
.set_index('category').stack()
factors = stack[[x[0] == x[1] for x in stack.index]]\
.reset_index(drop=True)
Multiply the original data by the factors:
df['pop_n'] = df['Pop'].astype(int) * factors
# category Pop White_ratio Black_ratio Hispanic_ratio pop_n
#0 White 75 0.60 0.30 0.1 45
#1 Black 85 0.40 0.20 0.4 17
#2 Hispanic 90 0.70 0.10 0.2 18
#3 White 100 0.35 0.45 0.2 35
I am trying to use pandas to calculate life expectancy with more complex equations.
Multiplying or dividing one column by another is not difficult to do.
My data is
A b
1 0.99 1000
2 0.95 =0.99*1000=990
3 0.93 = 0.95*990
Field A is populated and field b has only the 1000.
Field b (b2) = A1*b1
I tried the shift function but got a result for b2 only, and zeros for the rest. Any help please? Thanks, Mazin
IIUC, if you're starting with:
>>> df
A b
0 0.99 1000.0
1 0.95 NaN
2 0.93 NaN
Then you can do:
df.loc[df.b.isnull(),'b'] = (df.A.cumprod()*1000).shift()
>>> df
A b
0 0.99 1000.0
1 0.95 990.0
2 0.93 940.5
Or more generally:
df['b'] = (df.A.cumprod()*df.b.iloc[0]).shift().fillna(df.b.iloc[0])
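To see why the cumulative product works, it can help to compare it against the recurrence b[i] = A[i-1] * b[i-1] written as a plain loop. A small sketch using the three rows from the question; the names start and expected are introduced only for this check.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [0.99, 0.95, 0.93],
                   'b': [1000.0, np.nan, np.nan]})

start = df['b'].iloc[0]

# Vectorised form: b[i] = start * A[0] * A[1] * ... * A[i-1]
df['b'] = (df['A'].cumprod() * start).shift().fillna(start)

# Same recurrence as an explicit loop, for comparison.
expected = [start]
for a in df['A'].iloc[:-1]:
    expected.append(expected[-1] * a)

print(df)
print(expected)
```

The shift() moves each cumulative product down one row, because row i depends on the survival factors of the rows before it, and fillna(start) restores the starting value in the first row.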
I'm trying to convert from returns to a price index to simulate close prices for the ffn library, but without success.
import pandas as pd
times = pd.to_datetime(pd.Series(['2014-07-4',
'2014-07-15','2014-08-25','2014-08-25','2014-09-10','2014-09-15']))
strategypercentage = [0.01, 0.02, -0.03, 0.04,0.5,-0.3]
df = pd.DataFrame({'llt_return': strategypercentage}, index=times)
df['llt_close']=1
df['llt_close']=df['llt_close'].shift(1)*(1+df['llt_return'])
df.head(10)
llt_return llt_close
2014-07-04 0.01 NaN
2014-07-15 0.02 1.02
2014-08-25 -0.03 0.97
2014-08-25 0.04 1.04
2014-09-10 0.50 1.50
2014-09-15 -0.30 0.70
How can I make this correct?
You can use the cumulative product of return-relatives.
A return-relative is one-plus that day's return.
>>> start = 1.0
>>> df['llt_close'] = start * (1 + df['llt_return']).cumprod()
>>> df
llt_return llt_close
2014-07-04 0.01 1.0100
2014-07-15 0.02 1.0302
2014-08-25 -0.03 0.9993
2014-08-25 0.04 1.0393
2014-09-10 0.50 1.5589
2014-09-15 -0.30 1.0912
This assumes the price index starts at the value of start (here 1.0) on the close of the trading day prior to 2014-07-04.
On 7-04, you have a 1% return and the price index closes at 1 * (1 + .01) = 1.01.
On 7-15, return was 2%; close price will be 1.01 * (1 + .02) = 1.0302.
Granted, this is not completely realistic given that you're forming a price index from irregular-frequency data (missing dates), but hopefully this answers your question.
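As a sanity check, you can go back from the price index to the returns with pct_change and confirm they match the input. A short sketch built on the question's data; the name recovered is made up, and the first return has to be filled in by hand because pct_change has no prior close to compare against.

```python
import numpy as np
import pandas as pd

times = pd.to_datetime(pd.Series(['2014-07-4', '2014-07-15', '2014-08-25',
                                  '2014-08-25', '2014-09-10', '2014-09-15']))
strategypercentage = [0.01, 0.02, -0.03, 0.04, 0.5, -0.3]
df = pd.DataFrame({'llt_return': strategypercentage}, index=times.values)

# Returns -> price index: cumulative product of (1 + return).
start = 1.0
df['llt_close'] = start * (1 + df['llt_return']).cumprod()

# Price index -> returns: row-over-row percentage change.
recovered = df['llt_close'].pct_change()
# The first row is measured against the assumed starting value.
recovered.iloc[0] = df['llt_close'].iloc[0] / start - 1

print(df)
print(recovered)
```

pct_change works positionally, so the duplicated 2014-08-25 date in the index does not get in the way here.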