Replicate graph in Excel - excel

I am trying to replicate the graph below in Excel. It comes from a sample size calculator that I found on the following page:
https://goodcalculators.com/sample-size-calculator/
I use the formula:
`n = [z^2 * p * (1 - p) / e^2] / [1 + (z^2 * p * (1 - p) / (e^2 * N))]`
In order to replicate the graph, I created two columns in Excel, "Sample" and "MoE".
For the margin of error (MoE), I just created values from 1 down to 0 (e.g. 1, 0.99, 0.98, 0.97, ..., 0),
and for the Sample column I calculated each value from the above equation using the values below:
z - 1.96 for a confidence level (α) of 95%,
p - proportion (expressed as a decimal),
N - population size,
e - margin of error (MoE)
For the margin of error I used the values from the MoE column created above. However, when I plot the data, this is what I get. I am confused about where my mistake is.
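As a cross-check for the Excel columns, here is a minimal Python sketch of the same formula. The function name `sample_size` and the values p = 0.5 and N = 10000 are assumptions chosen only for illustration; they are not taken from the question. Note that e = 0 has to be left out of the MoE column, because the formula divides by e^2.

```python
def sample_size(e, N, p=0.5, z=1.96):
    """Sample size with finite population correction.

    e: margin of error as a decimal (must be > 0)
    N: population size
    p: proportion as a decimal (0.5 is an assumed value)
    z: z-score (1.96 for a 95% confidence level)
    """
    if e <= 0:
        raise ValueError("margin of error must be greater than 0")
    x = z**2 * p * (1 - p) / e**2
    return x / (1 + x / N)

# MoE column: 0.01, 0.02, ..., 1.00 (0 is skipped to avoid dividing by zero)
moe_values = [k / 100 for k in range(1, 101)]
samples = [sample_size(e, N=10000) for e in moe_values]

for e, n in list(zip(moe_values, samples))[:5]:
    print(f"MoE={e:.2f}  n={n:.1f}")
```

Under these assumptions the sample size falls steeply as the margin of error grows, so most of the visible curve sits at small MoE values.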

Related

Formula to Calculate Subscriber Churn Revenue

I'm trying to sum up 12 months of subscriber revenue factoring a 6% monthly churn (assuming no signups) to come up with the one-year value of a subscriber. A simple future value gives me the start and end values, but I'm looking to get the sum of the monthly declining revenues in a single Excel / Google Sheets formula. I can make 11 entries (plus the starting month full value), but is there a better one-liner or formula for this?
This gives me the 12th-month revenue:
=FV(-6%,11,0,100)
I'd like to get the sum without this:
=100 + FV(-6%,1,0,100) + FV(-6%,2,0,100) ... FV(-6%,11,0,100)
You are looking for the sum of a finite geometric series:
1 + r + r^2 + r^3 + ... + r^11
And the sum of this series is
(1 - r^12) / (1 - r)
where r = 1 - 6%
So the formula would be
= (1 - (1-6%)^12 ) / (1 - (1-6%) ) * 100
This is assuming the OP meant
=100 + FV(-6%,1,0,-100) + FV(-6%,2,0,-100) ... FV(-6%,11,0,-100)
as FV(-6%,1,0,100) would output a negative number
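The closed form can be checked against the month-by-month sum with a short Python sketch. The function names are made up for illustration; the 100 starting value, 6% churn and 12 months come from the question.

```python
def churned_revenue_total(start=100.0, churn=0.06, months=12):
    """Closed-form sum of a finite geometric series with ratio r = 1 - churn."""
    r = 1.0 - churn
    return start * (1.0 - r**months) / (1.0 - r)

def churned_revenue_loop(start=100.0, churn=0.06, months=12):
    """Same total, added up one month at a time (month 0 is the full value)."""
    return sum(start * (1.0 - churn) ** k for k in range(months))

closed_form = churned_revenue_total()
month_by_month = churned_revenue_loop()
print(round(closed_form, 2), round(month_by_month, 2))
```

Both functions should print the same total, which is what the single-cell formula above computes.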
I don't know much about such math but would the following formula give you the result?
=100+SUMPRODUCT(FV(-6%,ROW(1:11),0,-100))
The formula works in both Excel and Google Spreadsheets

Something in for-loop breaks down after 17 loops

I have the following for-loop:
i = 1
y = 4
for column in new_columns:
    df[column] = (
        df['column1'] * (1 + G1)**i * (df['ER1'] - df['ER2'])
        * df['column2'] * (1 + df['Column3'])**i
        + df['column1'] * (1 + G1)**i * df['E2'] * df['Column2']
        * (1 + df['Column3'])**i * (1 - ER2) * df['Column4']
    ) / (1 + df['ER4'])**y
    i += 1
    y += 1
I noticed a bizarre kink in a graph made of the new columns and I decided to double-check the calculation by running the same thing in MS Excel. The ratio between the Python and Excel columns is 1 until loop number 17. On the 18th loop, the ratio jumps to 1.0249 (Python produces 2.5 % higher numbers) and stays there until the last loop (30). There is no kink on the graph produced in MS Excel. Any wise thoughts?
After spending about 8 hours on this, I finally noticed that I had a duplicate in new_columns, which obviously didn't show up in the resulting dataframe, hence the difference from Excel. Sorry, guys.
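A quick check like the following would have caught this. It is only a sketch: the column names are hypothetical, and a plain dict stands in for the DataFrame, since assigning to an existing key overwrites it in the same way that assigning to an existing column does.

```python
from collections import Counter

def find_duplicates(names):
    """Return the names that occur more than once, in first-seen order."""
    counts = Counter(names)
    return [name for name in counts if counts[name] > 1]

# Hypothetical column list with one accidental repeat.
new_columns = ["y1", "y2", "y3", "y2", "y4"]
duplicates = find_duplicates(new_columns)

# Each name is stored with the loop counter that last wrote to it.
# The repeated name is overwritten while the counter keeps advancing,
# so every later column is computed with an exponent that is one too high.
results = {}
for i, column in enumerate(new_columns, start=1):
    results[column] = i

print(duplicates)
print(len(new_columns), "loops,", len(results), "columns")
```

Asserting `not find_duplicates(new_columns)` before the loop is one way to fail early instead of comparing against Excel afterwards.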

Formula for weighted averages

This is a bit of a hybrid between a mathematical and an Excel issue. I currently have an Excel sheet with a list of yearly observations. To simplify, let's say that for five years I'm looking at:
2015=5
2014=3
2013=4
2012=1
2011=6
What I would like to do is write a formula that counts the number of values in question (5 in this case), divides up 100% of the weight, and makes each earlier year worth 10% less than the year after it.
So in this case
2015 would be worth (roughly rounded) 24%
2014=22%
2013=20%
2012=18%
2011=16%
If you add up the weights, they sum to 100%.
As an example the numbers to be presented for weighting are:
1.225 for 2015 (5*.24)
.6615 for 2014 (3*.22)
.7938 for 2013 (4*.20)
.1786 for 2012 (1*.18)
.9645 for 2011 (6*.16)
I have calculated all of these numbers manually but would need a formula that can adapt to the number of periods being used as I will be adding more over time.
The sum of the first n terms (Sn) of a geometric series starting with a and having ratio r is:
Sn = a(1 − r^n) / (1 − r)
All you need to do is rearrange and solve for a, given Sn = 100, r = 0.9 and n = the number of terms:
a(1 − r^n) = Sn(1 − r)
a = Sn(1 − r) / (1 − r^n)
For 5 terms:
a = 100 * (1 - 0.9) / (1 - 0.9^5) = 24.419
for 10 terms:
a = 100 * (1 - 0.9) / (1 - 0.9^10) = 15.353
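The same rearrangement can be sketched in Python so that the weights adapt to any number of periods. The function name `decaying_weights` is made up for illustration; the observations are the five values from the question, newest first.

```python
def decaying_weights(n_periods, decay=0.10):
    """Weights that sum to 1, each one `decay` smaller than the one before."""
    r = 1.0 - decay
    first = (1.0 - r) / (1.0 - r**n_periods)  # a = Sn(1 - r) / (1 - r^n) with Sn = 1
    return [first * r**k for k in range(n_periods)]

observations = [5, 3, 4, 1, 6]  # 2015 down to 2011
weights = decaying_weights(len(observations))
weighted_values = [w * v for w, v in zip(weights, observations)]
weighted_average = sum(weighted_values)

for w, v in zip(weights, observations):
    print(f"value={v}  weight={w:.3%}")
print("weighted average:", round(weighted_average, 4))
```

Adding another year only means appending to `observations`; the first weight is recomputed from the new count.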

Explanation of normalized edit distance formula

Based on this paper:
IEEE TRANSACTIONS ON PATTERN ANALYSIS: Computation of Normalized Edit Distance and Applications
The paper defines normalized edit distance as follows:
Given two strings X and Y over a finite alphabet, the normalized edit
distance between X and Y, d(X, Y), is defined as the minimum of
W(P) / L(P), where P is an editing path between X and Y, W(P) is
the sum of the weights of the elementary edit operations of P, and
L(P) is the number of these operations (length of P).
Can I safely translate the normalized edit distance defined above as this:
normalized edit distance =
levenshtein(query 1, query 2)/max(length(query 1), length(query 2))
You are probably misunderstanding the metric. There are two issues:
The normalization step is to divide W(P) which is the weight of the edit procedure over L(P), which is the length of the edit procedure, not over the max length of the strings as you did;
Also, the paper showed that (Example 3.1) normalized edit distance cannot be simply computed with levenshtein distance. You probably need to implement their algorithm.
An explanation of Example 3.1 (c):
From aaab to abbb, the paper used the following transformations:
match a with a;
skip a in the first string;
skip a in the first string;
skip b in the second string;
skip b in the second string;
match the final bs.
These are 6 operations, which is why L(P) is 6. From the matrix in (a), matching has cost 0 and skipping has cost 2, so the total cost is 0 + 2 + 2 + 2 + 2 + 0 = 8, which is exactly W(P), and W(P) / L(P) = 1.33. Similar results can be obtained for (b), which I'll leave to you as an exercise :-)
The 3 in figure 2(a) refers to the cost of changing "a" to "b" or the cost of changing "b" to "a". The columns with lambdas in figure 2(a) mean that it costs 2 in order to insert or delete either an "a" or a "b".
In figure 2(b), W(P) = 6 because the algorithm does the following steps:
keep first a (cost 0)
convert first b to a (cost 3)
convert second b to a (cost 3)
keep last b (cost 0)
The sum of the costs of the steps is W(P). The number of steps is 4 which is L(P).
In figure 2(c), the steps are different:
keep first a (cost 0)
delete first b (cost 2)
delete second b (cost 2)
insert a (cost 2)
insert a (cost 2)
keep last b (cost 0)
In this path there are six steps so the L(P) is 6. The sum of the costs of the steps is 8 so W(P) is 8. Therefore the normalized edit distance is 8/6 = 4/3 which is about 1.33.
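To see why the minimum has to be taken over paths of different lengths, here is a small Python sketch. It is not the paper's algorithm, just a straightforward dynamic program that tracks the cheapest path for every possible number of operations. The costs (0 to keep, 3 to substitute, 2 to insert or delete) follow the description of figure 2(a) above; the function name and parameter names are made up.

```python
import math

def normalized_edit_distance(x, y, sub_cost=3.0, indel_cost=2.0):
    """Minimum of W(P) / L(P) over all editing paths P between x and y."""
    m, n = len(x), len(y)
    if m == 0 and n == 0:
        return 0.0
    max_len = m + n
    # best[i][j][k]: minimum weight of a path of exactly k operations
    # that turns x[:i] into y[:j] (inf if no such path exists).
    best = [[[math.inf] * (max_len + 1) for _ in range(n + 1)]
            for _ in range(m + 1)]
    best[0][0][0] = 0.0
    for i in range(m + 1):
        for j in range(n + 1):
            for k in range(1, i + j + 1):
                candidates = []
                if i > 0:  # delete x[i-1]
                    candidates.append(best[i - 1][j][k - 1] + indel_cost)
                if j > 0:  # insert y[j-1]
                    candidates.append(best[i][j - 1][k - 1] + indel_cost)
                if i > 0 and j > 0:  # keep or substitute
                    step = 0.0 if x[i - 1] == y[j - 1] else sub_cost
                    candidates.append(best[i - 1][j - 1][k - 1] + step)
                best[i][j][k] = min(candidates)
    return min(best[m][n][k] / k
               for k in range(1, max_len + 1)
               if best[m][n][k] < math.inf)

print(normalized_edit_distance("abbb", "aaab"))
```

For these two strings the four-step path costs 6/4 = 1.5, while the six-step path costs 8/6, which is smaller. That is why dividing a single Levenshtein distance by a string length does not give the same quantity.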

Express a percentage as a range value

I have a range:
1.75 [which I need to be 100%] to 4 [which needs to be 0%] (inclusive).
I need to be able to put in the percent, and get back the value.
How would I find the value for a percentage, say 50%, using a formula in Excel?
What I have tried so far: If I 'pretend' to reverse the percentage so that 1.75 is 0% and 4 is 100%, it seems a lot easier: I can use = (x - 1.75) / (4 - 1.75) * 100 to return the percentage of x, which is to say (x - min) / (max - min) * 100 = percentage of a range.
But I can't get this to work when the max is actually lower than the min. And...I'm not looking for the percent, I'm looking for the value when I enter the percent. :-/
The percentage of the value in the range is
=(max - value) / (max - min)
The value at some percentage is
=(min - max) * percentage + max
Edit: Perhaps a more intuitive way to attack "the value at some percentage" (notice I changed the terms here):
= (max - min) * (1- percentage) + min
IOW,
= (total distance) * (complement of fractional distance) + baseline
The complement is needed because you have reversed the sense of upper and lower bounds.
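Both directions of the mapping can be written as a short Python sketch for checking the spreadsheet results. The function names are made up; 1.75 and 4 are the bounds from the question, and the percentage is passed as a fraction between 0 and 1.

```python
def value_at_percentage(percentage, lo=1.75, hi=4.0):
    """Value for a fraction in [0, 1], where 1.0 maps to lo and 0.0 maps to hi."""
    return (lo - hi) * percentage + hi

def percentage_of_value(value, lo=1.75, hi=4.0):
    """Inverse mapping: fraction in [0, 1] for a value between lo and hi."""
    return (hi - value) / (hi - lo)

for pct in (0.0, 0.5, 1.0):
    print(pct, value_at_percentage(pct))
```

Calling `percentage_of_value(value_at_percentage(p))` should return `p` again, which is a quick way to confirm that the two formulas are inverses of each other.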
Like so: I used =4-(2.25*A1), =4-(2.25*A2) and =4-(2.25*A3), which gives:
Percentage  Value
0           4
0.5         2.875
1           1.75

Resources