Something in for-loop breaks down after 17 loops - python-3.x

I have the following for-loop:
i = 1
y = 4
for column in new_columns:
    df[column] = (df['column1'] * (1 + G1)**i * (df['ER1'] - df['ER2'])
                  * df['column2'] * (1 + df['Column3'])**i + df['column1']
                  * (1 + G1)**i * df['E2'] * df['Column2'] * (1 + df['Column3'])**i
                  * (1 - ER2) * df['Column4']) / (1 + df['ER4'])**y
    i += 1
    y += 1
I noticed a bizarre kink in a graph made of the new columns and I decided to double-check the calculation by running the same thing in MS Excel. The ratio between the Python and Excel columns is 1 until loop number 17. On the 18th loop, the ratio jumps to 1.0249 (Python produces 2.5 % higher numbers) and stays there until the last loop (30). There is no kink on the graph produced in MS Excel. Any wise thoughts?

After spending about 8 hours on this, I finally noticed that I had a duplicate in new_columns, which obviously didn't show up in the resulting dataframe, hence the difference with Excel. Sorry guys.
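A duplicate name in the list overwrites an existing column while i and y still advance, so every later column is computed with exponents one step ahead of its position. A minimal sketch of a check that could be run before the loop is shown here; the helper find_duplicates and the column names are hypothetical, not taken from the original code.

```python
from collections import Counter

def find_duplicates(names):
    """Return the names that occur more than once, in first-seen order."""
    counts = Counter(names)
    return [name for name, count in counts.items() if count > 1]

# Hypothetical column names; 'year_2' is repeated on purpose.
new_columns = ["year_1", "year_2", "year_2", "year_3"]
print(find_duplicates(new_columns))  # ['year_2']
```

An empty result means each loop iteration creates a distinct column.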

Related

Replicate graph in Excel

I am trying to replicate the graph below in Excel. It comes from a sample size calculator which I found on the following page:
https://goodcalculators.com/sample-size-calculator/
I use the formula:
`n = [z^2 * p * (1 - p) / e^2] / [1 + (z^2 * p * (1 - p) / (e^2 * N))]`
In order to replicate the graph, I created two columns in Excel, "Sample" and "MoE".
For the margin of error (MoE) I just created the values from 1 to 0 (e.g. 1, 0.99, 0.98, 0.97, ..., 0),
and then for the Sample column I estimated it based on the above equation using the values below:
z - 1.96 for a confidence level (α) of 95%,
p - proportion (expressed as a decimal),
N - population size,
e - margin of error (MoE)
For the margin of error I used the values from the MoE column created above. However, when I plot the data, this is what I get. I am confused about where my mistake is.
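For reference, the formula above can be sketched in Python as follows. The function name sample_size and the values p = 0.5 and N = 10000 are assumptions chosen for illustration only. Note that e = 0 makes the formula divide by zero, so the last value of an MoE column running from 1 down to 0 cannot be evaluated.

```python
def sample_size(z, p, e, N):
    """Sample size with finite population correction.

    z: z-score (1.96 for a 95% confidence level)
    p: proportion, as a decimal
    e: margin of error, as a decimal (must be greater than 0)
    N: population size
    """
    n0 = z**2 * p * (1 - p) / e**2  # sample size for an infinite population
    return n0 / (1 + n0 / N)        # same as the bracketed formula above

# Hypothetical inputs: p = 0.5, N = 10000.
for e in (0.10, 0.05, 0.01):
    print(e, round(sample_size(1.96, 0.5, e, 10000), 2))
```

The required sample grows quickly as the margin of error shrinks, which is the shape such a graph would be expected to show.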

Unexpected modulo behavior

In Excel, I have this formula:
MOD(-10 + 9, 12) + 1
And the expected result is 12
However, in PowerQuery the same formula:
Number.Mod(-10 + 9, 12) + 1
Results in 0
The strange thing is that for other numbers ( -1 ) I get the same result in both systems.
I expect this to have something to do with the nature of MOD and how I'm using negative numbers, but I would still like to know which is 'correct'.
I found the answer here:
https://www.youtube.com/watch?v=K4ImPRsi3vg&ab_channel=ExcelIsFun
MOD(n, d) = n - d * INT(n/d)
Number.Mod(n, d) = n - d * TRUNC(n/d)
They are calculated in different ways.
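The two definitions can be compared directly with a short Python sketch. The function names mod_floor and mod_trunc are hypothetical; they simply transcribe the two formulas above, with math.floor standing in for INT and math.trunc for TRUNC.

```python
import math

def mod_floor(n, d):
    """Excel-style MOD: n - d * INT(n / d), rounding toward negative infinity."""
    return n - d * math.floor(n / d)

def mod_trunc(n, d):
    """Power Query-style Number.Mod: n - d * TRUNC(n / d), rounding toward zero."""
    return n - d * math.trunc(n / d)

print(mod_floor(-10 + 9, 12) + 1)  # 12
print(mod_trunc(-10 + 9, 12) + 1)  # 0
print(mod_floor(7, 12), mod_trunc(7, 12))  # 7 7
```

The two agree whenever n and d have the same sign and differ only when the quotient n/d is negative and not a whole number.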

Calculate The Total Result of The Arithmetic Sequence With Big Number in Less Than 1 Second

How can I construct a Python 3 function sum(n) that takes a positive integer n as input and performs the following computation:
sum(n)=5+10+⋯+5(n−1)+5n.
The value of n is between 1 and 10^15. The time limit for the computation is 1 second. To make your code efficient, try to use the explicit formula (closed form) of sum(n).
Test:
print(sum(1))
print(sum(2))
print(sum(3))
Result:
5
15
30
What I Have Tried:
def sum(n):
    AK = 0
    n += 1
    for i in range(1,n):
        P = 5 * i
        AK += P
    return AK
Unfortunately it takes more than 1 second to finish
As Hans Kesting said, the result is 5 times the sum of 1...n, so you can try this simple piece of code. I haven't actually tried it, but in practice it should take less than one second.
def sum(n):
    return 5 * (n * (n + 1) // 2)
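The closed form follows from 5 + 10 + ... + 5n = 5(1 + 2 + ... + n) = 5n(n + 1)/2. A small sketch that cross-checks it against the loop for small n is given here; the names arithmetic_sum and arithmetic_sum_loop are hypothetical and were chosen to avoid shadowing Python's built-in sum.

```python
def arithmetic_sum(n):
    """Closed form of 5 + 10 + ... + 5n, using exact integer arithmetic."""
    return 5 * n * (n + 1) // 2

def arithmetic_sum_loop(n):
    """Reference implementation that adds the terms one by one."""
    return sum(5 * i for i in range(1, n + 1))

# The two versions agree on small inputs.
for n in (1, 2, 3, 100):
    assert arithmetic_sum(n) == arithmetic_sum_loop(n)

# The closed form does a fixed number of operations, even for the largest n.
print(arithmetic_sum(10**15))
```

Because n(n + 1) is always even, the integer division loses nothing, and Python integers do not overflow at this size.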

Generate a Dataframe that follow a mathematical function for each column / row

Is there a way to create/generate a Pandas DataFrame from scratch, such that each record follows a specific mathematical function?
Background: In Financial Mathematics, very basic financial-derivatives (e.g. calls and puts) have closed-form pricing formulas (e.g. Black Scholes). These pricing formulas can be called stochastic functions (because they involve a random term)
I'm trying to create a Monte Carlo simulation of a stock price (and subsequently an option payoff and price based on the stock price). I need, say, 1000 paths (rows) and 100 time-steps (columns). I want to "initiate" a dataframe that is 1000 by 100 and follows a stochastic equation.
# Pseudo-code
MonteCarloDF = DataFrame(rows=1000, columns=100, customFunc=TRUE,
                         appliedBy='by column',
                         FUNC={s0=321;
                               s_i=prev*exp(r-q*sqrt(sigma))*T +
                               (etc)*NormDist(rnd())*sqr(deltaT)}
                         )
Column 0 in every row would be 321, and each subsequent column would be figured out based on the FUNC above.
This is an example of something similar done in VBA
Function MonteCarlo_Vanilla_call(S, K, r, q, vol, T, N)
    sum = 0
    payoff = 0
    For i = 1 To N
        S_T = S * Exp((r - q - 0.5 * vol ^ 2) * T + vol * Sqr(T) * Application.NormSInv(Rnd()))
        payoff = Application.Max(S_T - K, 0)
        sum = sum + payoff
    Next i
    MonteCarlo_Vanilla_call = Exp(-r * T) * sum / N
End Function
Every passed in variable is a constant.
In my case, I want each next column in the same row to be just like S_T in the VBA code. That's really the only line that matters. I want to apply a function like S_T = S * Exp((r - q - 0.5 * vol ^ 2) * T + vol * Sqr(T) * Application.NormSInv(Rnd())). Each S_T is the next column in the same row. There are N columns making one simulation. I will have, for example, 1000 simulations.
321 | 322.125 | 323.277 | ... | column 100 value
321 | 320.704 | 319.839 | ... | column 100 value
321 | 321.471 | 318.456 | ... | column 100 value
...
row 1000| etc | etc | ... | value (1000,100)
IIUC, you could create your own function to generate a DataFrame.
Within the function iterate using .iloc[:, -1] to use the last created column.
We'll also use numpy.random.randn to generate an array of normally distributed random values.
You may need to adjust the default values of your variables, but the idea would be something like:
Function
import pandas as pd
import numpy as np
from math import exp, sqrt

def monte_carlo_df(nrows,
                   ncols,
                   col_1_val,
                   r=0.03,
                   q=0.5,
                   sigma=0.002,
                   T=1.0002,
                   deltaT=0.002):
    """Returns stochastic monte carlo DataFrame"""
    # Create first column
    df = pd.DataFrame({'s0': [col_1_val] * nrows})
    # Create subsequent columns
    for i in range(1, ncols):
        df[f's{i}'] = (df.iloc[:, -1] * exp(r - q * sqrt(sigma)) * T
                       + (np.random.randn(nrows) * sqrt(deltaT)))
    return df
Usage example
df = monte_carlo_df(nrows=1000, ncols=100, col_1_val=321)
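As an alternative to growing the DataFrame one column at a time, the whole grid can be built in one pass with NumPy. The sketch below is not the function above: it swaps in the S_T step from the VBA snippet, applied once per time-step with dt = T / (n_steps - 1), which is an assumption about how the columns should relate. The function name simulate_paths and the parameter values in the usage line are hypothetical.

```python
import numpy as np
import pandas as pd

def simulate_paths(n_paths, n_steps, s0, r, q, vol, T, seed=None):
    """Simulate price paths; rows are paths and columns are time-steps."""
    rng = np.random.default_rng(seed)
    dt = T / (n_steps - 1)  # assumed: equally spaced steps over [0, T]
    z = rng.standard_normal((n_paths, n_steps - 1))
    # Log-return of each step, following the S_T formula with T replaced by dt.
    log_steps = (r - q - 0.5 * vol**2) * dt + vol * np.sqrt(dt) * z
    # Cumulative log-returns; a leading zero column keeps column 0 equal to s0.
    log_paths = np.hstack([np.zeros((n_paths, 1)), np.cumsum(log_steps, axis=1)])
    prices = s0 * np.exp(log_paths)
    return pd.DataFrame(prices, columns=[f"s{i}" for i in range(n_steps)])

# Hypothetical parameter values, for illustration only.
paths = simulate_paths(n_paths=1000, n_steps=100, s0=321, r=0.03, q=0.01, vol=0.2, T=1.0, seed=0)
print(paths.shape)  # (1000, 100)
```

Working in log space turns the column-by-column product into a single cumulative sum, so no Python loop over columns is needed.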
To me your problem is a specific version of the following one: Pandas calculations based on other rows. Since you can pivot it shouldn't matter if we are talking rows or columns.
There is also a question relating to calculations using columns: Pandas complex calculation based on other columns which has a good suggestion of using a rolling window (rolling function) or using shift function: Calculate the percentage increase or decrease based on the previous column value of the same row in pandas dataframe
Speed considerations of similar calculations (or numpy vs pandas discussion): Numpy, Pandas: what is the fastest way to calculate dataset row value basing on previous N values?
To sum it all up - it seems that your question is somewhat of a duplicate.

Using Excel, divide any value in a column by 1000, then multiply the new value by 10 and update a running total cell, while leaving the original value alone

For example if I enter a 2000 in B3, I would like that number divided by 1000, then multiplied by 10, and have the new value added to a running total. ie (2000/1000 * 10=20)
RunningTotal = 20
For clarity, if I enter 8000 in B4, then I would like to (8000/1000 * 10 = 80 )
RunningTotal = 100
Notice that
(x / 1000 * 10) + (y / 1000 * 10) = (x + y)/1000 * 10
So the equation for your running total cell only needs to be:
=SUM(B3:B10)/1000*10
Assuming B3:B10 is the appropriate range for your inputs.
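The same identity can be checked outside Excel with a few lines of Python. The function name running_total is hypothetical, and the list stands in for the values typed into B3:B10.

```python
def running_total(values):
    """Equivalent of =SUM(range)/1000*10 for a list of entered values."""
    return sum(values) / 1000 * 10

entries = [2000, 8000]  # the two example inputs from the question

# Converting each entry and then adding gives the same total.
per_entry = sum(v / 1000 * 10 for v in entries)

print(running_total(entries), per_entry)  # 100.0 100.0
```

The entries themselves are never modified, which matches the requirement that the original values in column B stay as entered.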
equation formula
(x / 1000 * 10) + (y / 1000 * 10) = (x + y)/1000 * 10
=
(x+y) * 0.01
=
sum(B3:B?) * 0.01
You can simply record a macro:
http://office.microsoft.com/en-us/excel-help/create-a-macro-HP005204711.aspx
So you do everything you would like to calculate "by hand" while recording a macro and then you can call it at any time you want to execute.
Thanks for the replies.
I will be entering values into col B (B3, B4...B200 etc.). The running total value will be displayed in its own cell. The running total value will be updated upon save or enter. The original value entered into the B column cells should remain the same.
The formula or calculation should occur anytime a value is entered into a cell within col B.
So if I have any number of values in col B, what I am expecting is:
1000/1000 * 10 = 10, running total 10
5000/1000 * 10 = 50, running total 60
8000/1000 * 10 = 80, running total 140, etc.
If I use a separate cell to hold the runningTotal value, for ex K2...formula is: =C3/1000*10 ...how can I repeat this formula for subsequent cells in col B, but update the cell K2?
Must I use a holding column for each corresponding value entered in col B, then add the value to the K2 cell?
Ah, yes, I have managed to get it working by using a holding cell. If there is a more elegant approach, I am open to suggestions.
Thanks for your advice.
