Calculating growth index of stock performance which is reset every four years - python-3.x

I am analysing a stock index in 4-year cycles and would like to start with a base of 1 at the beginning of the first year and then calculate the returns on top, so that I get a column in the dataframe that goes 1, 1.02, 1.03, 1.025...
The indexed-return calculation would be (todaysValue/yesterdaysValue)*yesterdaysIndexValue.
The df looks like this:
Datetimeindex Stockindex CycleYear Diff Daynumber
01.01.1968 96.47 0 1 1
...
03.01.1972 101.67 0 1 1
...
06.09.2022 3908.19 2 0 699
07.09.2022 3979.87 2 0 700
08.09.2022 4006.18 2 0 701
I would now like to add a column df['Retindex'] that starts at 1 every 4 years and calculates the indexed returns until the end of year 4.
I have created a column that has a 1 at the start of each cycle.
df['Retindex'] = df['Daynumber'].loc[df['Daynumber'] == 1]
Then I tried creating the rest of the index with this:
for id in df[df['Retindex'].isnull() == True].index:
    df.loc[id, 'Retindex'] = (df[Stockindex]/df[Stockindex].shift().loc[id]) * df['Retindex'].shift().loc[id]
Here I am getting the error: "ValueError: Incompatible indexer with Series"
I have tried other ways as well but I am unfortunately not progressing on this. Can anyone help?
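One possible approach, sketched under assumptions rather than taken from the thread: because the index starts at 1 and each step multiplies by todaysValue/yesterdaysValue, the product telescopes, so the indexed return on any day equals that day's value divided by the first value of its cycle. The sketch below assumes that every row with Daynumber == 1 opens a new cycle (as in the sample rows) and uses made-up toy prices; `cycle_id` and `cycle_base` are illustrative names.

```python
import pandas as pd

# Toy data standing in for the real frame (values are made up for illustration).
df = pd.DataFrame({
    "Stockindex": [100.0, 102.0, 103.0, 50.0, 55.0],
    "Daynumber": [1, 2, 3, 1, 2],
})

# Assumption: each row with Daynumber == 1 opens a new cycle, so a running
# count of those rows labels every row with its cycle.
cycle_id = (df["Daynumber"] == 1).cumsum()

# First Stockindex value of each cycle, broadcast to all rows of that cycle.
cycle_base = df.groupby(cycle_id)["Stockindex"].transform("first")

# Telescoped form of (today / yesterday) * yesterday's index, starting at 1.
df["Retindex"] = df["Stockindex"] / cycle_base
print(df)
```

This avoids the row-by-row loop entirely, which should matter on several decades of daily data.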

Related

Average data points in a range while condition is met in a Pandas DataFrame

I have a very large dataset with over 400,000 rows and growing. I understand that you are not supposed to use iterrows to modify a pandas data frame. However, I'm a little lost on what I should do in this case, since I'm not sure I could use .loc or some rolling filter to modify a data frame in the way I need to. I'm trying to figure out if I can take a data frame and average each range of rows while the condition is met. For example:
Condition  Temp.  Pressure
1          8      20
1          7      23
1          8      22
1          9      21
0          4      33
0          3      35
1          9      21
1          11     20
1          10     22
While the condition is == 1, the output dataframe would look like this:
Condition  Avg. Temp.  Avg. Pressure
1          8           21.5
1          10          21
Has anyone attempted something similar that can put me on the right path? I was thinking of using something like this:
df = pd.read_csv(csv_file)
for index, row in df.iterrows():
    if row['condition'] == 1:
        # start index = first value that equals 1
    else:
        # end index & calculate rolling average of range
        len = end - start
        new_df = df.rolling(len).mean()
I know that my code isn't great, and I know I could brute-force it with something like the above, but as I said, the dataset has a lot of rows and continues to grow, so I need to be efficient.
TRY:
result = df.groupby((df.Condition != df.Condition.shift()).cumsum()).apply(
    lambda x: x.rolling(len(x)).mean().dropna()).reset_index(drop=True)
print(result.loc[result.Condition.eq(1)])  # filter by required condition
OUTPUT:
Condition Temp. Pressure
0 1.0 8.0 21.5
2 1.0 10.0 21.0
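The same run-labelling idea can also be written with a plain group mean instead of a full-length rolling window. This is a minimal sketch rather than the answerer's code; it rebuilds the sample table from the question, and `run_id` is an illustrative name.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Condition": [1, 1, 1, 1, 0, 0, 1, 1, 1],
    "Temp.": [8, 7, 8, 9, 4, 3, 9, 11, 10],
    "Pressure": [20, 23, 22, 21, 33, 35, 21, 20, 22],
})

# A new run starts whenever Condition differs from the previous row;
# the cumulative sum of those change points gives one label per run.
run_id = (df["Condition"] != df["Condition"].shift()).cumsum().rename("run_id")

# Average every column within each run, then keep only the runs where
# the condition was met.
means = df.groupby(run_id)[["Condition", "Temp.", "Pressure"]].mean()
result = means[means["Condition"] == 1].reset_index(drop=True)
print(result)
```

Since each run is reduced once, no Python-level loop over rows is needed, which is the point for a frame of several hundred thousand rows.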

Is there a way to convert my column of incrementing integers separated by zero to the number of intervals encountered so far in a pandas dataframe?

I'm working in pandas and I have a column in my dataframe filled with 0s and incrementing integers starting at one. I would like to add another column of integers, where that column is a counter of how many intervals separated by zero we have encountered up to this point. For example, my data would look like
Index
1
2
3
0
1
2
0
1
and I would like it to look like
Index IntervalCount
1 1
2 1
3 1
0 1
1 2
2 2
0 2
1 2
Is it possible to do this with a vectorized operation, or do I have to do this iteratively? Note that it's not important that it be a new column; it could also overwrite the old one.
You can use the cumsum function.
df["IntervalCount"] = (df["Index"] == 1).cumsum()
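As a quick check, the one-liner can be run on the sample column from the question. This is only a sketch; note that it yields 3 for the final row, because that row starts a third interval, whereas the question's table shows 2 there.

```python
import pandas as pd

# Sample column copied from the question.
df = pd.DataFrame({"Index": [1, 2, 3, 0, 1, 2, 0, 1]})

# Every 1 marks the start of a new interval, so a running count of the
# rows equal to 1 is the number of intervals seen so far.
df["IntervalCount"] = (df["Index"] == 1).cumsum()
print(df["IntervalCount"].tolist())
```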

How can I find the highest value between rows every time that they meet a certain condition?

I have been struggling with a problem with my data frame built in pandas, which currently looks like this
MyDataFrame:
Index Status Value
0 A 10
1 A 8
2 A 5
3 B 9
4 B 5
5 A 1
6 B 2
7 A 3
8 A 5
9 A 1
The desired output would be:
Index Status Value
0 A 10
1 B 9
2 A 1
3 B 2
4 A 5
So far I have tried to use range and while conditions to filter. However, if I use a conditional like:
for i in range:
    if Status[i] == "A":
        print(Value[i])
    if Status == "B":
        break
Note: the code above is more an example of what I have been trying in order to reach my goal. I tried to use .iloc and range with while, but maybe in the wrong way.
The desired output isn't printed.
One thing that complicates this filtering process is that MyDataFrame changes every time that I run the script since it uses another base of data to create this DataFrame.
I believe that I'm missing something simple, but it has been almost a week and I can't figure it out.
Thanks in advance for all your answers and support.
Let us try using shift with cumsum to create the groupby key; then it is groupby + agg
out = df.groupby(df.Status.ne(df.Status.shift()).cumsum()).agg({'Status':'first','Value':'max'})
Out[14]:
Status Value
Status
1 A 10
2 B 9
3 A 1
4 B 2
5 A 5
Very close to @BEN_YO:
grp = (df['Status'] != df['Status'].shift()).cumsum()
df.loc[df.groupby(grp)['Value'].idxmax()]
Output:
Status Value
Index
0 A 10
3 B 9
5 A 1
6 B 2
8 A 5
Create groups using shift and inequality with cumsum, then groupby and find the index of the max value of 'Value' with idxmax, and filter the dataframe using loc.
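To make the grouping key visible, here is a small self-contained run of the idxmax approach on the question's data. It is a sketch only; the frame is rebuilt by hand from the table in the question.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Status": list("AAABBABAAA"),
    "Value": [10, 8, 5, 9, 5, 1, 2, 3, 5, 1],
})

# Label each consecutive run of identical Status values.
grp = (df["Status"] != df["Status"].shift()).cumsum()
print(grp.tolist())

# For each run, take the row label of its largest Value and select those rows.
out = df.loc[df.groupby(grp)["Value"].idxmax()]
print(out)
```

Unlike the agg version, this keeps the original row labels, which can help when the selected rows need to be traced back to the source frame.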

EXCEL Count number of weeks in month based on date

I am trying to look up a value in a matrix based on a given date. The matrix has the first day of the week along the vertical axis, and the first day of the month along the horizontal axis.
For a given day, e.g. 31/08/15 I would like to match the exact date to the vertical axis of the matrix (i.e. 31/08/15), and the month to the horizontal axis (1/08/15).
So in the example below, an input of 31/08/15 should provide an output of 3.
01/06/2015 01/07/2015 01/08/2015 01/09/2015
03/08/2015 1 0 0 0
10/08/2015 0 2 0 0
17/08/2015 0 0 3 0
24/08/2015 0 0 0 4
31/08/2015 0 0 3 0
I am trying and failing with index and match formulae.
I have tried the following:
=index(area where to look, match(31/08/15,first column,0),match(and(month(31/08/15),year(31/08/15)),(and(month(first row),year(first row)),0)
Hope this is clear, thanks!
You can use an INDEX function with two MATCH functions to supply both the row and column.
The formula in D8 is,
=INDEX($B$2:$E$6,MATCH(C8,$A$2:$A$6,0),MATCH(DATE(YEAR(C8),MONTH(C8),1),$B$1:$E$1,0))
I'm a little concerned about the dates matching exactly down column A but a little maths manipulation with the WEEKDAY function would take care of that.
=INDEX($B$2:$E$6,MATCH(C9-WEEKDAY(C9, 2)+1,$A$2:$A$6,0),MATCH(DATE(YEAR(C9),MONTH(C9),1),$B$1:$E$1,0))
Here you go:
=INDEX($B$2:$E$6,MATCH(DATE(2015,8,31),$A$2:$A$6,),MATCH(DATE(2015,8,1),$B$1:$E$1,))
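The date arithmetic behind the two MATCH keys can be checked outside Excel. The sketch below mirrors C9-WEEKDAY(C9, 2)+1 (Monday of the week) and DATE(YEAR(C9),MONTH(C9),1) (first of the month) with Python's standard library; `week_start` and `month_start` are illustrative helper names, not part of any answer above.

```python
from datetime import date, timedelta


def week_start(d: date) -> date:
    """Monday of the week containing d, like d - WEEKDAY(d, 2) + 1 in Excel."""
    # date.weekday() is 0 for Monday, i.e. WEEKDAY(d, 2) - 1.
    return d - timedelta(days=d.weekday())


def month_start(d: date) -> date:
    """First day of d's month, like DATE(YEAR(d), MONTH(d), 1) in Excel."""
    return d.replace(day=1)


# 31/08/2015 is itself a Monday, so it is its own row key.
print(week_start(date(2015, 8, 31)), month_start(date(2015, 8, 31)))
# A mid-week date falls back to the Monday that starts its week.
print(week_start(date(2015, 9, 2)), month_start(date(2015, 9, 2)))
```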

How to compute the maximum series of a specific condition returning true

I have a slight issue counting the maximum streak of rows where the third column is bigger than the second. This is just a statistic with scores.
The issue is that I want to have it in one single formula without a macro.
B C
------
2 0
1 2
2 1
2 3
0 1
1 2
0 1
3 3
0 2
0 2
I have tried it with:
{=MAX(FREQUENCY(B3:B100;B3:B100>=C3:C100))} to get 1 for B
{=MAX(FREQUENCY(C3:C100;C3:C100>=B3:B100))} to get 7 for C
I expected it to deliver the longest series where the value in one column was bigger than in the other, but I failed hard...
Try this version to get 7
=MAX(FREQUENCY(IF(C3:C100>=B3:B100,IF(B3:B100<>"",ROW(B3:B100))),IF(C3:C100<B3:B100,ROW(B3:B100))))
Confirm it with CTRL+SHIFT+ENTER.
Reverse the ranges to get your other result.
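For cross-checking the expected results outside Excel, the longest streak can be computed with a few lines of Python. This is a sketch under the assumption that the ten sample rows are the whole data set; `longest_run` is an illustrative helper name.

```python
from itertools import groupby

# Sample scores copied from the question.
b = [2, 1, 2, 2, 0, 1, 0, 3, 0, 0]
c = [0, 2, 1, 3, 1, 2, 1, 3, 2, 2]


def longest_run(flags):
    """Length of the longest consecutive stretch of True values."""
    # groupby splits the sequence into runs of equal values; only the
    # True runs are measured, and an all-False input gives 0.
    return max((sum(1 for _ in run) for key, run in groupby(flags) if key),
               default=0)


c_streak = longest_run([ci >= bi for bi, ci in zip(b, c)])
b_streak = longest_run([bi >= ci for bi, ci in zip(b, c)])
print(c_streak, b_streak)
```

On the sample this gives 7 for C and 1 for B, matching the values the question asks for.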
