How to populate subsequent rows based on previous row value and value from another column in Python Pandas? - python-3.x

I have the following df.
cases percent_change
100 0.01
NaN 0.00
NaN -0.001
NaN 0.05
Starting from the second row, each value in the cases column is calculated as next cases = previous cases * (1 + previous percent_change). For example, the row below the 100 is 100 * (1 + 0.01) = 101. The result should look like this:
cases percent_change
100 0.01
101 0.00
101 -0.001
100.899 0.05
I want to leave the first row (the 100) unchanged. Here is my code, which is not working:
df.loc[1:, 'cases'] = df['cases'].shift(1) * (1 + df['percent_change'].shift(1))
I tried this as well, with no success:
df.loc[1:, 'cases'] = df.loc[1:, 'cases'].shift(1) * (1 + df.loc[1:, 'percent_change'].shift(1))

df['cases'] = (df.percent_change.shift(1).fillna(0) + 1).cumprod() * df.at[0, 'cases']
print(df)
Prints:
cases percent_change
0 100.000 0.010
1 101.000 0.000
2 101.000 -0.001
3 100.899 0.050
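The attempts in the question fail because shift(1) reads the original cases column, where every row after the first is still NaN, so each new value is computed from a NaN. A vectorized expression cannot see its own output. Unrolling the recurrence gives cases[i] = cases[0] times the product of (1 + percent_change[k]) for k < i, which is what the cumprod answer computes. Below is a minimal self-contained sketch of that answer on the question's data, with an explicit loop (for illustration only) confirming that both give the same numbers:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "cases": [100.0, np.nan, np.nan, np.nan],
    "percent_change": [0.01, 0.00, -0.001, 0.05],
})

# Factor that takes row i-1 to row i: 1 + the previous row's percent_change.
# Row 0 has no previous row, so its factor is 1 (NaN -> 0, then + 1).
factors = df["percent_change"].shift(1).fillna(0) + 1

# cumprod chains the factors; scaling by the seed value fills every row.
df["cases"] = factors.cumprod() * df.at[0, "cases"]

# Explicit loop over the same recurrence, only to show the two agree.
loop_cases = [df.at[0, "cases"]]
for pct in df["percent_change"].iloc[:-1]:
    loop_cases.append(loop_cases[-1] * (1 + pct))

print(df)
```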

Related

Create some features based on the average growth rate of y for the month over the past few years

Assuming we have a dataset df (which can be downloaded from this link), I want to create some features based on the average growth rate of y for the same month over the past several years, for example: y_agr_last2, y_agr_last3, y_agr_last4, etc.
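The formula is the geometric mean of the growth factors (1 + y/100) for the same month over the past n years, minus 1.
For example, for September 2022: y_agr_last2 = ((1 + 3.85/100) * (1 + 1.81/100))^(1/2) - 1 and y_agr_last3 = ((1 + 3.85/100) * (1 + 1.81/100) * (1 + 1.6/100))^(1/3) - 1.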
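Written as a general formula (read off the example above and the shift(12), shift(24), shift(36) code below, assuming monthly rows so that t - 12k is the same month k years earlier):

```latex
\text{y\_agr\_last}_n(t) = \left( \prod_{k=1}^{n} \left( 1 + \frac{y_{t-12k}}{100} \right) \right)^{1/n} - 1
```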
The code I use is as follows; it works, but it is repetitive and tedious:
import math
df['y_shift12'] = df['y'].shift(12)
df['y_shift24'] = df['y'].shift(24)
df['y_shift36'] = df['y'].shift(36)
df['y_agr_last2'] = pow(((1+df['y_shift12']/100) * (1+df['y_shift24']/100)), 1/2) -1
df['y_agr_last3'] = pow(((1+df['y_shift12']/100) * (1+df['y_shift24']/100) * (1+df['y_shift36']/100)), 1/3) -1
df.drop(['y_shift12', 'y_shift24', 'y_shift36'], axis=1, inplace=True)
df
How can the desired result be achieved more concisely?
References:
Create some features based on the mean of y for the month over the past few years
Following is one way to generalise it:
import functools
import operator
num_yrs = 3
for n in range(1, num_yrs+1):
    df[f"y_shift{n*12}"] = df["y"].shift(n*12)
    df[f"y_agr_last{n}"] = pow(functools.reduce(operator.mul, [1+df[f"y_shift{i*12}"]/100 for i in range(1, n+1)], 1), 1/n) - 1
df = df.drop(["y_agr_last1"] + [f"y_shift{n*12}" for n in range(1, num_yrs+1)], axis=1)
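The temporary y_shift columns are not strictly needed. As an alternative sketch (the function name add_growth_features and its parameters are hypothetical; it assumes one row per month and y given in percent), the running product can be built up one year at a time, so each n reuses the product from n - 1 instead of recomputing it:

```python
import pandas as pd


def add_growth_features(df, col="y", max_years=3, period=12):
    """Return a copy of df with {col}_agr_last2 .. {col}_agr_last{max_years}.

    Hypothetical helper. Assumes one row per month (period=12 rows per year)
    and that `col` holds growth rates in percent (3.85 means 3.85 %).
    """
    out = df.copy()
    factor = 1 + out[col] / 100
    product = pd.Series(1.0, index=out.index)
    for n in range(1, max_years + 1):
        # Multiply in the same month's growth factor from n years earlier;
        # rows without enough history become NaN, as in the original code.
        product = product * factor.shift(n * period)
        if n >= 2:
            out[f"{col}_agr_last{n}"] = product ** (1 / n) - 1
    return out


# Usage on the question's data (assumes df is already loaded):
# df = add_growth_features(df, max_years=3)
```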
Output:
date y x1 x2 y_agr_last2 y_agr_last3
0 2018/1/31 -13.80 1.943216 3.135839 NaN NaN
1 2018/2/28 -14.50 0.732108 0.375121 NaN NaN
...
22 2019/11/30 4.00 -0.273262 -0.021146 NaN NaN
23 2019/12/31 7.60 1.538851 1.903968 NaN NaN
24 2020/1/31 -11.34 2.858537 3.268478 -0.077615 NaN
25 2020/2/29 -34.20 -1.246915 -0.883807 -0.249940 NaN
26 2020/3/31 46.50 -4.213756 -4.670146 0.221816 NaN
...
33 2020/10/31 -1.00 1.967062 1.860070 -0.035569 NaN
34 2020/11/30 12.99 2.302166 2.092842 0.041998 NaN
35 2020/12/31 5.54 3.814303 5.611199 0.030017 NaN
36 2021/1/31 -6.41 4.205601 4.948924 -0.064546 -0.089701
37 2021/2/28 -22.38 4.185913 3.569100 -0.342000 -0.281975
38 2021/3/31 17.64 5.370519 3.130884 0.465000 0.298025
...
54 2022/7/31 0.80 -6.259455 -6.716896 0.057217 0.052793
55 2022/8/31 -5.30 1.302754 1.412277 0.015121 -0.000492
56 2022/9/30 NaN -2.876968 -3.785964 0.028249 0.024150

How to find the index of a row for a particular value in a particular column and then create a new column with that starting point?

Example of dataframe
What I'm trying to do with my dataframe:
Locate the first 0 value in a certain column (G in the example photo).
Create a new column (Time) whose value is 0 on the same row as that first 0 in column G.
For each row after that, increase Time by 1/60, until the end of the data.
For each row before it, decrease Time by 1/60, back to the beginning of the data.
What is the best method to achieve this?
Any advice would be appreciated. Thank you.
Pretty straightforward:
identify the index of the row that contains the value you are looking for,
then construct an array that starts negative, is zero at that row, and ends at the value for the end of the series.
import numpy as np
import pandas as pd
df = pd.DataFrame({"Time":np.full(25, 0), "G":[i if i>0 else 0 for i in range(10,-15,-1)]})
# find the where value is zero
idx = df[df["G"]==0].index[0]
intv = round(1/60, 3)
# construct a numpy array from a range of values (start, stop, increment)
df["Time"] = np.arange(-idx*intv, (len(df)-idx)*intv, intv)
df.loc[idx, "Time"] = 0 # just remove any rounding at zero point
print(df.to_string(index=False))
output
Time G
-0.170 10
-0.153 9
-0.136 8
-0.119 7
-0.102 6
-0.085 5
-0.068 4
-0.051 3
-0.034 2
-0.017 1
0.000 0
0.017 0
0.034 0
0.051 0
0.068 0
0.085 0
0.102 0
0.119 0
0.136 0
0.153 0
0.170 0
0.187 0
0.204 0
0.221 0
0.238 0
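One thing to watch in the answer above: intv is rounded to 0.017 before it is multiplied, so the error grows with distance from the zero row (the last row shows 0.238, while 14/60 is about 0.2333), and the NumPy documentation warns that np.arange with a non-integer step can return an inconsistent number of elements. A sketch that avoids both by working from integer row offsets (it assumes at least one 0 exists in G):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"G": [i if i > 0 else 0 for i in range(10, -15, -1)]})

# Position of the first row where G == 0 (assumes at least one exists;
# otherwise the [0] lookup raises IndexError).
zero_pos = int(np.flatnonzero(df["G"].to_numpy() == 0)[0])

# Integer offset of each row from the zero row, scaled by 1/60 per row.
# Starting from exact integers keeps the zero row at exactly 0.0 and
# avoids the drift that comes from rounding the step first.
df["Time"] = (np.arange(len(df)) - zero_pos) / 60

print(df.round(3).to_string(index=False))
```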

build function which takes values from above row in pandas dataframe

I have the following dataframe:
I want to build a function to apply to column 'c' that subtracts column 'd' from column 'u' and adds the value from the row above in column 'c',
so the table will look as follows:
For example, in row number 2 the calculation will be: 44.37 - 0 + 149.77 = 194.14
In row number 4 the calculation will be: 11.09 - 6.45 + 210.78 = 215.42
And so on.
I tried to build the function using iloc with a while loop, or with shift, but neither worked; I got these errors:
("'numpy.float64' object has no attribute 'iloc'", 'occurred at index 0')
("'numpy.float64' object has no attribute 'shift'", 'occurred at index 0')
Any idea how to write this function would be great.
Thanks!
You can subtract the columns directly and use a cumulative sum to add up the values:
d u
0 0.000 149.75
1 0.000 44.37
2 0.000 16.64
3 6.450 11.09
4 77.345 5.54
5 64.520 16.40
df1['c'] = (df1['u'] - df1['d']).cumsum()
Out:
d u c
0 0.000 149.75 149.750
1 0.000 44.37 194.120
2 0.000 16.64 210.760
3 6.450 11.09 215.400
4 77.345 5.54 143.595
5 64.520 16.40 95.475
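About the errors in the question: the function passed to apply only sees one row or one value at a time, so what it works with are scalars such as numpy.float64, which have no .iloc or .shift, and it cannot reach the previously computed row anyway. The cumsum answer sidesteps this because c[i] = (u - d)[i] + c[i-1] unrolls to a running total. A self-contained sketch with the answer's data, plus a plain loop (for illustration only) that follows the question's wording step by step:

```python
import pandas as pd

df1 = pd.DataFrame({
    "d": [0.000, 0.000, 0.000, 6.450, 77.345, 64.520],
    "u": [149.75, 44.37, 16.64, 11.09, 5.54, 16.40],
})

# Vectorised: running total of (u - d), so row i equals (u - d)[i] + c[i-1].
df1["c"] = (df1["u"] - df1["d"]).cumsum()

# Equivalent explicit loop: u - d + the value from the row above.
running = 0.0
loop_values = []
for d, u in zip(df1["d"], df1["u"]):
    running = u - d + running
    loop_values.append(running)

print(df1)
```

Note that this seeds the first row with u - d; if the first c should start from some other value, add that value to the whole column.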

Python Life Expectancy

I'm trying to use pandas to calculate life expectancy with complex equations.
Multiplying or dividing one column by another is not difficult.
My data is:
A b
1 0.99 1000
2 0.95 =0.99*1000=990
3 0.93 = 0.95*990
Field A is populated, and field b has only the 1000.
Field b is calculated as b2 = A1*b1 (then b3 = A2*b2, and so on).
I tried the shift function, but got a result for b2 only and zeros for the rest. Any help, please? Thanks, mazin
IIUC, if you're starting with:
>>> df
A b
0 0.99 1000.0
1 0.95 NaN
2 0.93 NaN
Then you can do:
df.loc[df.b.isnull(),'b'] = (df.A.cumprod()*1000).shift()
>>> df
A b
0 0.99 1000.0
1 0.95 990.0
2 0.93 940.5
Or more generally:
df['b'] = (df.A.cumprod()*df.b.iloc[0]).shift().fillna(df.b.iloc[0])
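Why the general form works: b[n] = A[n-1] * b[n-1] unrolls to b[n] = b[0] * A[0] * ... * A[n-1]. cumprod produces those running products, shift() moves them down one row so row n only uses factors up to A[n-1], and fillna puts the seed back in row 0. A self-contained check on the question's numbers:

```python
import pandas as pd

df = pd.DataFrame({"A": [0.99, 0.95, 0.93],
                   "b": [1000.0, float("nan"), float("nan")]})

# The only known b value; every later row is derived from it.
seed = df["b"].iloc[0]

# Running products of A, scaled by the seed, shifted down one row,
# with the seed restored in row 0.
df["b"] = (df["A"].cumprod() * seed).shift().fillna(seed)

print(df)
```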

pandas, how to get close price from returns?

I'm trying to convert from returns to a price index to simulate close prices for the ffn library, but without success.
import pandas as pd
times = pd.to_datetime(pd.Series(['2014-07-4',
'2014-07-15','2014-08-25','2014-08-25','2014-09-10','2014-09-15']))
strategypercentage = [0.01, 0.02, -0.03, 0.04,0.5,-0.3]
df = pd.DataFrame({'llt_return': strategypercentage}, index=times)
df['llt_close']=1
df['llt_close']=df['llt_close'].shift(1)*(1+df['llt_return'])
df.head(10)
llt_return llt_close
2014-07-04 0.01 NaN
2014-07-15 0.02 1.02
2014-08-25 -0.03 0.97
2014-08-25 0.04 1.04
2014-09-10 0.50 1.50
2014-09-15 -0.30 0.70
How can I make this correct?
You can use the cumulative product of return-relatives.
A return-relative is one-plus that day's return.
>>> start = 1.0
>>> df['llt_close'] = start * (1 + df['llt_return']).cumprod()
>>> df
llt_return llt_close
2014-07-04 0.01 1.0100
2014-07-15 0.02 1.0302
2014-08-25 -0.03 0.9993
2014-08-25 0.04 1.0393
2014-09-10 0.50 1.5589
2014-09-15 -0.30 1.0912
This assumes the price index equals start at the close of the trading day before 2014-07-04.
On 7-04, you have a 1% return and the price index closes at 1 * (1 + .01) = 1.01.
On 7-15, return was 2%; close price will be 1.01 * (1 + .02) = 1.0302.
Granted, this is not completely realistic, given that you're forming a price index from irregular-frequency data (missing dates), but hopefully this answers your question.
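A self-contained version of the answer, with a round-trip check as a sketch: dividing each close by the previous close and subtracting 1 should give back the original returns (from the second row on, since the first row has no previous close in the frame):

```python
import pandas as pd

times = pd.to_datetime(["2014-07-04", "2014-07-15", "2014-08-25",
                        "2014-08-25", "2014-09-10", "2014-09-15"])
df = pd.DataFrame({"llt_return": [0.01, 0.02, -0.03, 0.04, 0.5, -0.3]},
                  index=times)

start = 1.0
# Price index: start compounded by each period's return-relative (1 + r).
df["llt_close"] = start * (1 + df["llt_return"]).cumprod()

# Round-trip check on plain arrays (positional, so the duplicate
# 2014-08-25 label cannot cause index alignment surprises).
closes = df["llt_close"].to_numpy()
recovered = closes[1:] / closes[:-1] - 1

print(df.round(4))
```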
