ANOVA result is inconsistent (AIC VS deviance) - statistics

I am working on GLM models (using glmer). Now I am exploring whether I need an interaction term. I'd like to find the best model, but the following result is confusing:
Models:
g1: y ~ year + (1 | BZR/PLR)
g2: y ~ year + year * BZR + (1 | BZR/PLR)
   Df   AIC   BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)
g1 11 16580 16664 -8279.2    16558
g2 90 16612 17296 -8215.8    16432 126.87     79  0.0005135 ***
AIC is better (lower) for g1, while g2 has the better logLik and deviance. Which model should I select?
Thank you very much for all the comments in advance!!
Best,
Aekyung
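One way to read the table above: AIC is -2 * logLik plus a penalty of 2 per estimated parameter, so AIC and deviance can point in different directions without contradicting each other. A minimal Python sketch using only the numbers printed in the anova output (the helper name `aic` is made up for illustration):

```python
def aic(log_lik, n_params):
    """AIC = -2 * log-likelihood + 2 * number of estimated parameters."""
    return -2.0 * log_lik + 2.0 * n_params

# (logLik, Df) copied from the anova table in the question
models = {"g1": (-8279.2, 11), "g2": (-8215.8, 90)}

aics = {name: aic(ll, k) for name, (ll, k) in models.items()}

# Likelihood-ratio statistic and its degrees of freedom (the Chisq and Chi Df columns)
lr_stat = 2.0 * (models["g2"][0] - models["g1"][0])
df_diff = models["g2"][1] - models["g1"][1]

for name, value in aics.items():
    print(name, round(value))
print("LR statistic:", round(lr_stat, 1), "on", df_diff, "df")
```

Under this reading, g2 lowers the deviance by about 127 but pays a penalty of 2 * 79 = 158 for its extra parameters, which is why its AIC is higher even though the likelihood-ratio test is significant.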

Related

Calculate quarterly compound interest from yearly simple interest

Let's say the yearly simple interest is 10% on a principal of $100. At the end of one year, the new principal is $110. I'm trying to calculate the compound interest equivalent so by the end of the 4th quarter, the new principal should still be $110. In the example below, I'm compounding quarterly (which is incorrect) and I'm ending up with $110.38. How do I modify the formula so I end up at $110?
With your current setup:
In B5: =B2*(1+B1)^(1/4)
In B6 and drag down: =B5*(1+B$1)^(1/4).
This is technically maths rather than programming but, since Excel is a crossover, we can possibly let it through :-)
The formula for calculating initial capital plus cumulative interest on an amount of b at r% per period over n periods is:
newb = b * (1 + r/100)^n
Hence, the formula for getting 10% per year with quarterly interest over that year is (using 1.1, since newb must be 10% higher than b):
1.1 = (1 + r/100)^4
So, let's just give the expression 1 + r/100 the term mult for now, and we can work out the rate from that later:
mult^4 = 1.1
=> mult = ∜(1.1)
=> mult = 1.024113 (roughly)
We can then calculate that the desired interest rate is 2.4113% (by starting with mult, subtracting one, then multiplying by a hundred).
And here's the table to prove it (interest values are rounded):
Current             New
Balance  Interest   Balance
-------  --------   -------
 100.00      2.41    102.41
 102.41      2.47    104.88
 104.88      2.53    107.41
 107.41      2.59    110.00
            -----
            10.00
You can see that you reach the 10% increase at the end of the fourth quarter.
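The same calculation can be sketched in Python as a cross-check. The function name `periodic_rate` is hypothetical; the rounding to two decimals mirrors the table above.

```python
def periodic_rate(annual_pct, periods):
    """Per-period rate (in %) that compounds to annual_pct over `periods` periods."""
    return 100.0 * ((1.0 + annual_pct / 100.0) ** (1.0 / periods) - 1.0)

rate = periodic_rate(10, 4)  # roughly 2.4113 (% per quarter)

balance = 100.00
for quarter in range(1, 5):
    # interest is rounded to cents each quarter, as in the table
    interest = round(balance * rate / 100.0, 2)
    balance = round(balance + interest, 2)
    print(f"Q{quarter}: interest {interest:.2f}, new balance {balance:.2f}")
```

Because the rate is the fourth root of 1.1 minus one, the balance after four quarters lands back on the simple-interest figure.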
In Excel, assuming A1 holds the desired annual interest rate (like 10) and B1 holds the number of periods in a year (like 4), you can calculate the periodic interest rate with:
=100*(POWER(1+A1/100,1/B1)-1)
as per the following screenshot (which also has the four quarterly calculations):
The formulae for the tabular cells are, if you're interested:
  | A    | B                     | C
3 | 100  | =ROUND(a3*$c$1/100,2) | =a3+b3
4 | =c3  | =ROUND(a4*$c$1/100,2) | =a4+b4
5 | =c4  | =ROUND(a5*$c$1/100,2) | =a5+b5
6 | =c5  | =ROUND(a6*$c$1/100,2) | =a6+b6
Feel free to use them as you see fit.

Formula to Calculate Subscriber Churn Revenue

I'm trying to sum up 12 months of subscriber revenue factoring a 6% monthly churn (assuming no signups) to come up with the one-year value of a subscriber. A simple future value gives me the start and end values, but I'm looking to get the sum of the monthly declining revenues in a single Excel / Google Sheets formula. I can make 11 entries (plus the starting month full value), but is there a better one-liner or formula for this?
This gives me the 12th-month revenue:
=FV(-6%,11,0,100)
I'd like to get the sum without this:
=100 + FV(-6%,1,0,100) + FV(-6%,2,0,100) ... FV(-6%,11,0,100)
You are looking for the sum of a finite geometric series:
1 + r + r^2 + r^3 .... + r^11
And the sum of this series is
(1 - r^12) / (1 - r)
where r = 1 - 6%
So the formula would be
= (1 - (1-6%)^12 ) / (1 - (1-6%) ) * 100
This is assuming the OP meant
=100 + FV(-6%,1,0,-100) + FV(-6%,2,0,-100) ... FV(-6%,11,0,-100)
as FV(-6%,1,0,100) would output a negative number
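As a quick check that the closed form matches the month-by-month sum, here is a small Python sketch. The function name `churn_revenue_sum` is made up for illustration; the 100 starting revenue and 6% churn come from the question.

```python
def churn_revenue_sum(monthly_revenue, churn, months):
    """Sum of revenue over `months` months when a fraction `churn` is lost each month."""
    r = 1.0 - churn
    return monthly_revenue * (1.0 - r ** months) / (1.0 - r)

closed_form = churn_revenue_sum(100, 0.06, 12)

# Explicit sum: month 0 at full value, then 11 declining months
explicit = sum(100 * (1 - 0.06) ** k for k in range(12))

print(round(closed_form, 2), round(explicit, 2))
```

Both values agree, at roughly 873.47 for the twelve months.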
I don't know much about such math but would the following formula give you the result?
=100+SUMPRODUCT(FV(-6%,ROW(1:11),0,-100))
The formula works in both Excel and Google Spreadsheets

Why scikit learn confusion matrix is reversed?

I have 3 questions:
1)
The confusion matrix for sklearn is as follows:
TN | FP
FN | TP
While when I'm looking at online resources, I find it like this:
TP | FP
FN | TN
Which one should I consider?
2)
Since the scikit-learn confusion matrix above differs from the one I find in other resources, what will the structure be for a multiclass confusion matrix? I'm looking at this post:
Scikit-learn: How to obtain True Positive, True Negative, False Positive and False Negative
In that post, @lucidv01d posted a graph to explain the categories for the multiclass case. Are those categories the same in scikit-learn?
3)
How do you calculate the accuracy of a multiclass? for example, I have this confusion matrix:
[[27 6 0 16]
[ 5 18 0 21]
[ 1 3 6 9]
[ 0 0 0 48]]
In that same post I referred to in question 2, he has written this equation:
Overall accuracy
ACC = (TP+TN)/(TP+FP+FN+TN)
but isn't that just for the binary case? I mean, which class do I use for TP?
sklearn shows its confusion matrix as
TN | FP
FN | TP
because it treats 0 as the negative class and 1 as the positive class. By default the rows and columns follow the sorted order of the class values, so the smaller value (0) comes first and is read as negative, and the larger value (1) comes last and is read as positive. The order therefore depends on your dataset and its class values.
The accuracy is the sum of the diagonal elements divided by the sum of all the elements. The diagonal elements are the number of correct predictions.
As the sklearn guide says: "(Wikipedia and other references may use a different convention for axes)"
What does it mean? When building the confusion matrix, the first step is to decide where to put predictions and real values (true labels). There are two possibilities:
put predictions in the columns, and true labels in the rows
put predictions in the rows, and true labels in the columns
Which way you go is purely a matter of convention. From this picture (explained here), it is clear that scikit-learn's convention is to put predictions in columns and true labels in rows.
Thus, according to scikit-learn's convention:
the first column contains negative predictions (TN and FN)
the second column contains positive predictions (TP and FP)
the first row contains negative labels (TN and FP)
the second row contains positive labels (TP and FN)
the diagonal contains the number of correctly predicted labels.
Based on this information I think you will be able to solve part 1 and part 2 of your questions.
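To make the layout concrete, here is a small pure-Python sketch that follows the convention described above (true labels on rows, predictions on columns, labels in sorted order unless given). It is an illustration only, not scikit-learn's own code; the function name `confusion_counts` and the sample labels are made up.

```python
def confusion_counts(y_true, y_pred, labels=None):
    """Count matrix with true labels on rows and predicted labels on columns."""
    if labels is None:
        labels = sorted(set(y_true) | set(y_pred))
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix

y_true = [0, 0, 0, 1, 1, 1, 1]  # made-up true labels
y_pred = [0, 1, 0, 1, 1, 0, 1]  # made-up predictions

# Sorted label order (0, 1): [[TN, FP], [FN, TP]]
default_order = confusion_counts(y_true, y_pred)

# Positive class first, as with labels=[1, 0]: [[TP, FN], [FP, TN]]
positive_first = confusion_counts(y_true, y_pred, [1, 0])

print(default_order)
print(positive_first)
```

Reordering the labels moves TP to the top-left corner, but the rows still hold true labels. Matching a layout that puts predictions on rows would additionally need a transpose.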
For part 3, you just sum the values in the diagonal and divide by the sum of all elements, which will be
(27 + 18 + 6 + 48) / (27 + 18 + 6 + 48 + 6 + 16 + 5 + 21 + 1 + 3 + 9)
or you can just use the classifier's score() method, or accuracy_score from sklearn.metrics.
The scikit-learn convention is to place predictions in columns and true values in rows.
By default, scikit-learn puts 0 first as the negative class (top) and 1 second as the positive class (bottom). The order can be changed using labels=[1,0].
You can calculate the overall accuracy this way:
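import numpy as np

M = np.array([[27, 6, 0, 16], [5, 18, 0, 21], [1, 3, 6, 9], [0, 0, 0, 48]])

# Sum of the diagonal: the number of correct predictions
w = M.diagonal()
w.sum()    # 99

# Sum of all entries: the total number of samples
M.sum()    # 160

# Overall accuracy
ACC = w.sum() / M.sum()
ACC        # 0.61875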
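For the "which class do I use for TP" part of the question, a one-vs-rest reading of the same matrix may help. The sketch below assumes rows are true labels and columns are predictions, as described above; the dictionary layout and variable names are made up for illustration.

```python
M = [[27, 6, 0, 16],
     [5, 18, 0, 21],
     [1, 3, 6, 9],
     [0, 0, 0, 48]]

total = sum(sum(row) for row in M)

per_class = {}
for k in range(len(M)):
    tp = M[k][k]                         # class k predicted as class k
    fn = sum(M[k]) - tp                  # rest of row k: class k predicted as something else
    fp = sum(row[k] for row in M) - tp   # rest of column k: other classes predicted as k
    tn = total - tp - fn - fp            # everything not involving class k
    per_class[k] = {"TP": tp, "FP": fp, "FN": fn, "TN": tn}

# Overall accuracy: correct predictions (sum of per-class TP) over all samples
overall_accuracy = sum(c["TP"] for c in per_class.values()) / total

for k, counts in per_class.items():
    print(k, counts)
print(overall_accuracy)
```

Each class gets its own TP, FP, FN and TN, while the overall accuracy uses only the diagonal, which matches the 99 / 160 computed above.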

Calculating contrast values on Excel

I am currently studying experimental designs in statistics and I am calculating values pertaining to 2^3 factorial designs.
The question that I have is particularly with the calculations of the "contrasts".
My goal with this question is to learn how to use the "Coded Factors" and "Total" columns of the table to get the "Contrast" values using the IF function in Excel.
For example, Contrast A is calculated as x - y, where
x = sum of the values in Total where Coded Factor A is +, and
y = sum of the values in Total where Coded Factor A is -.
This would be rather simple, but for the interactions it is a bit more complex.
For example, contrast AC is obtained as x - y, where
x = sum of the values in Total where the product of Coded Factors A and C is +, and
y = sum of the values in Total where the product of Coded Factors A and C is -.
I would really appreciate your help.
Edited:
Considering how IF statements work, I thought it might be a good idea to convert + into 1 and - into -1 to make the calculation straightforward.
Convert all +/- to 1/-1, and use some cells as helpers.
Put in these formulas :
J2 --> =LEFT(J1)
K2 --> =MID(J1,2,1)
L2 --> =MID(J1,3,1)
Put
J3 --> =IF(J$2="",1,INDEX($B3:$D3,MATCH(J$2,$B$2:$D$2,0)))
and drag to L10. Then
M3 --> =J3*K3*L3*G3
and drag to M10. Lastly,
M1 --> =SUM(M3:M10)
How to use: enter the factor combination (for example AC) in cell J1, and the result will appear in M1.
Idea : separate the factor text > load the multiplier > multiply Total values with multiplier > get sum.
Hope it helps.
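The same idea (look up the sign of each factor, multiply the signs, weight the totals, sum) can be sketched in Python. The original table is not reproduced here, so the `totals` below are made-up placeholder values, and the run order (A changing fastest) is an assumption.

```python
from itertools import product
from math import prod

# Coded factors for a 2^3 design as +1/-1, with A changing fastest (assumed run order)
runs = [dict(zip("CBA", signs)) for signs in product((-1, 1), repeat=3)]

# Hypothetical totals, one per run; replace with the real "Total" column
totals = [10, 20, 30, 40, 50, 60, 70, 80]

def contrast(effect, runs, totals):
    """Sum of totals, each weighted by the product of the coded signs of the factors in `effect`."""
    return sum(prod(run[f] for f in effect) * t for run, t in zip(runs, totals))

for effect in ("A", "B", "C", "AC", "ABC"):
    print(effect, contrast(effect, runs, totals))
```

With the signs stored as 1 and -1, the interaction contrasts need no IF logic at all: the product of the signs decides whether a total is added or subtracted.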

Formula for weighted averages

This is a bit of a hybrid between a mathematical and an Excel issue. I currently have an Excel sheet with a list of yearly observations. To simplify, let's say that for five years I'm looking at:
2015=5
2014=3
2013=4
2012=1
2011=6
What I would like to do is write a formula that counts the number of values in question (5 in this case), splits 100% of the weight among them, and makes each preceding value worth 10% less than the one after it.
So in this case
2015 would be worth (roughly rounded) 24%
2014=22%
2013=20%
2012=18%
2011=16%
if you add the weight for each they add up to 100%.
As an example the numbers to be presented for weighting are:
1.225 for 2015 (5*.24)
.6615 for 2014 (3*.22)
.7938 for 2013 (4*.20)
.1786 for 2012 (1*.18)
.9645 for 2011 (6*.16)
I have calculated all of these numbers manually but would need a formula that can adapt to the number of periods being used as I will be adding more over time.
The sum of n terms (Sn) of a geometric series starting with a and having ratio r is:
Sn = a * (1 - r^n) / (1 - r)
All you need to do is rearrange and solve for a, given Sn = 100, r = 0.9 and n = number of terms:
a * (1 - r^n) = Sn * (1 - r)
a = Sn * (1 - r) / (1 - r^n)
For 5 terms:
a = 100 * (1 - 0.9) / (1 - 0.9^5) = 24.419
for 10 terms:
a = 100 * (1 - 0.9) / (1 - 0.9^10) = 15.353
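A small Python sketch of the same formula, returning every weight rather than just the first. The function name `decaying_weights` is made up for illustration; the values are the five observations from the question, newest first.

```python
def decaying_weights(n, ratio=0.9):
    """n weights summing to 1, each `ratio` times the previous one."""
    first = (1.0 - ratio) / (1.0 - ratio ** n)  # a = Sn * (1 - r) / (1 - r^n) with Sn = 1
    return [first * ratio ** k for k in range(n)]

values = [5, 3, 4, 1, 6]  # 2015 down to 2011
weights = decaying_weights(len(values))
weighted_average = sum(v * w for v, w in zip(values, weights))

for v, w in zip(values, weights):
    print(f"value {v}: weight {100 * w:.3f}%")
print(f"weighted average: {weighted_average:.4f}")
```

Because the first weight is derived from n, the weights keep summing to 100% as more periods are added.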
