Building a sum in a for loop from a dataset - python-3.x

I have a dataset with two columns, Earning and Inflation_rate. Assuming the dataset looks like this:
   Earning  Inflation_rate
1    10000            0.00
2    12000            0.12
3    13000            0.13
I need to see how much it has built up over the course of these years. The formula is:
(income built until last year x inflation rate) + (that year's earning / 75)
For Year 1 that would be:
(0 x 0.00) + (10000 / 75)
For Year 2 that would be:
(133 x 0.12) + (12000 / 75)
My code is the following:
y = int(input("Please enter the number of years of contribution: "))
income_built = 0
divide = 75
for i in range(1, y + 1):
    income_built = (income_built * df['Inflation_rate'].iloc[i-1,i]) + (df['Earning'].iloc[i-1,i]) / divide
print(int(income_built))
And I get the error:
IndexingError: Too many indexers
How can I solve this? Thank you!

The error is coming from your use of df['Inflation_rate'].iloc[i-1,i] and df['Earning'].iloc[i-1,i].
Pandas has a number of different ways of accessing values in a DataFrame, but the two most used are loc and iloc, both of which are explained well here.
I am slightly unclear about the formula you're trying to use, but I think if df['Inflation_rate'].iloc[i-1,i] is changed to df.loc[i-1,'Inflation_rate'] and df['Earning'].iloc[i-1,i] is changed to df.loc[i-1,'Earning'], the error should disappear.
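For illustration, here is a minimal sketch of that fix, assuming the DataFrame has the default 0-based integer index (so year i sits at row label i - 1) and keeping the formula exactly as posted:

import pandas as pd

# Hypothetical frame built from the sample data; a default RangeIndex is assumed.
df = pd.DataFrame({
    "Earning": [10000, 12000, 13000],
    "Inflation_rate": [0.00, 0.12, 0.13],
})

y = int(input("Please enter the number of years of contribution: "))
income_built = 0
divide = 75

for i in range(1, y + 1):
    # .loc takes a row label and a column name, so no second positional indexer is needed
    rate = df.loc[i - 1, "Inflation_rate"]
    earning = df.loc[i - 1, "Earning"]
    income_built = income_built * rate + earning / divide

print(int(income_built))

With y = 2 this prints 176, which matches the Year 2 example above.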

Related

Survival rates >1 when using mrOpen in the package FSA

I am currently doing some population analyses with the package "FSA" in R.
By using the mrOpen command, I want to get the survival rate.
My raw data is a simple table with one row per individual, one column per sample date, and values of 0 or 1 (for not captured or captured during the respective sampling).
id   total.captures   date1   date2   date3   etc
1    3                1       1       1       ...
2    1                1       0       0       ...
The first two columns contain the individual id and the aggregated number of captures, which is why I excluded them from the analysis.
This is the exact code:
hold.data <- capHistSum(data, cols2use = c(3:13))
est.data <- mrOpen(hold.data)
summary(est.data)
confint(est.data)
It seems to work out, as I get the tables and summaries with all the parameters. See here as an example:
[Screenshot of the results]
However, there's a problem with the survival estimate phi.
The phi value is not between 0 and 1 but, in some cases, exceeds 1.
Any idea what went wrong here?
Thanks,
Pia

Using apply to modify different dataframe in pandas

I'm running into some issues regarding the use of apply in Pandas.
I have a dataframe where multiple measurements are made on certain days, on different measurement sites. To give an example, Site 1 has 2 measurements to make, every 7 or so days.
We know it has to be 2 measurements, so what I'm trying to do now is to check on which days not enough measurements were made.
   site  measurement  expected        date
0     1            2         1  01-01-2020
1     2            3         2  01-01-2020
2     3            4         2  01-01-2020
3     3            5         2  01-01-2020
4     2            1         2  08-01-2020
5     2            4         2  08-01-2020
I've made a sorted and aggregated DataFrame that sums the measurements per site and day, so that I can iterate over them without going over the same days twice when there are multiple measurements.
   site  measurement  expected        date
0     1            2         1  01-01-2020
1     2            3         2  01-01-2020
2     3            9         2  01-01-2020
3     2            5         2  08-01-2020
I'm now using a function filter_amount(df_sorted, df, group).
group counts the number of measurements actually made: df.groupby(["site_id", "date"]).count()
   site  measurement        date
0     1            2  01-01-2020
1     2            1  01-01-2020
2     3            2  01-01-2020
3     2            2  08-01-2020
The current function basically goes like this:
def filter_amount(df_sorted, df, group):
    for i in df_sorted.index:
        # locate the number of measurements actually made for this day and site, in group
        # check how many measurements are expected
        # if not enough measurements:
        #     find all measurements in the normal df and drop them
So in this example, the measurements from site 2 on 1-1-20 have to be dropped, because there are not enough measurements. The ones from 8-1-20 are valid, because 2 are expected and 2 happen.
The problem is that this is extremely slow with over 500k rows.
I get the variables I need with .at[], but I'm trying to make it faster by using apply so I can parallelise the operations, and can't figure it out. I'm doing the apply on df_sorted and passing the arguments needed in, but it's not actually dropping the measurements from the original df.
I have a feeling that it's possible to do it with some sort of groupby on the original df to save operations.
I hope it's clear enough, happy to elaborate on any questions.
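For what it's worth, here is a minimal sketch of that groupby idea, using the column names from the sample frame above (not the asker's filter_amount function); it counts the measurements per (site, date) group in one vectorised step instead of looping with .at[]:

import pandas as pd

# Hypothetical data mirroring the sample frame in the question.
df = pd.DataFrame({
    "site": [1, 2, 3, 3, 2, 2],
    "measurement": [2, 3, 4, 5, 1, 4],
    "expected": [1, 2, 2, 2, 2, 2],
    "date": ["01-01-2020"] * 4 + ["08-01-2020"] * 2,
})

# Broadcast the size of each (site, date) group back onto the original rows.
actual = df.groupby(["site", "date"])["measurement"].transform("size")

# Keep only rows from groups that received at least the expected number of measurements.
valid = df[actual >= df["expected"]]
print(valid)

On this sample the rows for site 2 on 01-01-2020 are dropped and everything else is kept, matching the behaviour described above; whether it covers every edge case in the real 500k-row data is of course untested.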

Find the previous 3 month average value in Spotfire

I am attempting to get the last 3 months' average of OUT%. So far I could not find any sample that suits my requirement; I would appreciate any help.
There are multiple columns I need to consider: I have to add the current month's Good% and the OUT% in the calculation to get the forecast.
This expression gives me an error and the values reflected in the column are incorrect:
Sum([out]) / Sum([in]) over (LastPeriods(3,[month]))
Dt        TotalIN  TotalOut  Good  OUT%    Good%
2/1/2019  79606    51384     0     64.55%  0
3/1/2019  84194    61211     0     72.70%  0
4/1/2019  92458    67807     0     73.34%  0
5/1/2019  94531    66988     95    70.86%  0.10%
6/1/2019  29623    18181     2903  60.94%  9.73%
Thanks for adding some data I could copy in. Below is the early-morning hack I made for a column of the summary table. I also binned the row axes to make this work.
Column Calculation:
(Sum([Good%]) * ((
Sum([TotalOut]) OVER (Previous([Axis.Rows],3)) + Sum([TotalOut]) OVER (Previous([Axis.Rows],2)) + Sum([TotalOut]) OVER (Previous([Axis.Rows],1)))
/ (
Sum([TotalIn]) OVER (Previous([Axis.Rows],3)) + Sum([TotalIn]) OVER (Previous([Axis.Rows],2)) + Sum([TotalIn]) OVER (Previous([Axis.Rows],1)))))
+ Sum([Out%])
Row Axes:
<BinByDateTime([Dt],"Year.Month",1)>
There might be a cleaner version using the THEN clause below, but I'm not good with THEN. The problem with LastPeriods is that you want NULL if there aren't enough months.
THEN If(Count() OVER (LastPeriods(3,[Axis.X]))=3,[Value],NULL)

SUMIF formula to find sum for dates today or earlier

so I have this table of salaries that I make (hypothetically):
    A          B          C       D
1              Date       salary  how_much_I_earned_so_far
2              total              =SUM(salaries_until_today)
3              2017-10-1  5000
4              2017-11-1  5000
5   today->    2017-12-1  5000
6              2018-01-1  5000
7              2018-02-1  5000
8   future..   2018-03-1  5000
9              2018-04-1  5000
Now I want to calculate in D2 the amount of money I have earned so far.
To do that, I want to sum up all the past salaries from C3 all the way down to C_x, where x is the index of the line where today < B_x.
That raised two questions for me:
1) How do I select an unknown cell index? Usually when I write formulas it looks like =SUM(C2:C9), so how can I make the number 9 variable?
2) Can I create a variable that depends on the number of lines where a cell is smaller than a value? I know how to get the current day, it's simply =TODAY(), but now I want to compare it with all the B values and find the index of the line where today is smaller than B.. how do I start?
I'm sorry if that's a weird question, I'm a programmer and it's frustrating that a simple thing I could quickly solve in code cannot be accomplished as easily in a sophisticated app such as Excel..
Thanks.
=SUMIF(B:B,"<=" & TODAY(),C:C)
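As a worked check against the sample table: if today is taken as 2017-12-1 (the row marked "today->"), the criterion "<=" & TODAY() matches the dates in B3:B5, so the formula adds C3:C5 and D2 evaluates to 15000. Because it uses the whole columns B:B and C:C, no end row has to be specified, which also answers question 1).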

Sum of average in Spotfire calculated column

I am averaging a calculated column to show on a visualisation but my results are not what I want.
Exploration wells   Discoveries made
0.10                0
0.87                0.32
0.51                0
0.35                0
0.51                0
I am calculating the success rate using a simple expression:
[Discoveries made] / [Exploration Wells]
But when I use the Avg aggregate in the visual, Spotfire averages the expression row by row rather than dividing the sums (for each year, let's say).
So instead of getting 13.5% I get 7.4%.
Is there a workaround for this one?
Thanks.
Here's my summary table.
I am looking to get the division of the sums for [Discoveries (net)] and [Exploration Wells (net)], which for this year would be 2.65/7.71 = 0.34.
Instead I am getting 0.29.
My raw data table looks like this:
That's because Spotfire calculates the division on a row-by-row basis and gives me the Avg of the calculated column (0.29).
Any ideas?
Thanks
@Mourst - I calculated the success rate, which is [Discoveries] / [Exploration Wells], with the raw data provided and I am getting 0.34.
Not sure why you are getting 0.29. Could you post more details on how you are calculating it?
Here are the output tables:
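One possible workaround, sketched with the column names from the sample data above (the real analysis may use the "(net)" columns and a different axis setup): if the goal is the ratio of the sums per year rather than the average of the per-row ratios, a custom expression on the value axis along these lines should give it:
Sum([Discoveries made]) / Sum([Exploration wells])
On the five sample rows this evaluates to 0.32 / 2.34 ≈ 0.137 (13.7%) rather than the 0.074 (7.4%) row-by-row average.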
