how to calculate accuracy from decision trees? - decision-tree

Hi, I am taking a course on Coursera and came across this question. My answer is 1-(4048+3456)/8124=0.076. However, the answer is 0.067. Can anybody help me solve this? Thank you!!

Accuracy: The number of correct predictions made divided by the total number of predictions made.
We're going to predict the majority class associated with a particular node, i.e. use the class with the larger count at each node.
So the accuracy for:
Depth 1: (3796 + 3408) / 8124
Depth 2: (3760 + 512 + 3408 + 72) / 8124
Depth_2 - Depth_1 = 0.06745
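For reference, here is the same arithmetic in Python (the majority-class counts are the ones quoted above):

total = 8124
acc_depth_1 = (3796 + 3408) / total                 # ~0.8868
acc_depth_2 = (3760 + 512 + 3408 + 72) / total      # ~0.9542
print(round(acc_depth_2 - acc_depth_1, 5))          # 0.06745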

First we will draw the confusion matrix for both cases and then find the accuracy.
Confusion matrix:
Accuracy = (TP + TN) / (total number of observations)
Accuracy calculation:
Depth 1: (3796 + 3408) / 8124
Depth 2: (3760 + 512 + 3408 + 72) / 8124
Depth_2 - Depth_1 = 0.06745

Though the answer above is correct, the confusion matrix does not look right. This should be the confusion matrix (for depth 2):
[image: corrected confusion matrix for depth 2]

Related

Ryan Joiner Normality Test P-value

I have tried a lot to calculate the Ryan-Joiner (RJ) p-value. Is there any method or formula to calculate the RJ p-value?
I found out how to calculate the RJ value, but I am unable to find how to calculate the RJ p-value manually. Minitab calculates it somehow; I want to know how to calculate it manually.
Please support me on this.
The test statistic RJ needs to be compared to a critical value CV in order to make a determination of whether to reject or fail to reject the null hypothesis.
The value of CV depends on the sample size and confidence level desired, and the values are empirically derived: generate large numbers of normally distributed datasets for each sample size n, calculate RJ statistic for each, then CV for a=0.10 is the 10th percentile value of RJ.
Sidenote: For some reason I'm seeing a 90% confidence level used many places for Ryan-Joiner, when a 95% confidence is commonly used for other normality tests. I'm not sure why.
I recommend reading the original Ryan-Joiner 1976 paper:
https://www.additive-net.de/de/component/jdownloads/send/70-support/236-normal-probability-plots-and-tests-for-normality-thomas-a-ryan-jr-bryan-l-joiner
In that paper, the following critical value equations were empirically derived (I wrote them out in Python for convenience):
from math import sqrt

def rj_critical_value(n, a=0.10):
    # empirical critical-value equations from Ryan & Joiner (1976)
    if a == 0.10:
        return 1.0071 - (0.1371 / sqrt(n)) - (0.3682 / n) + (0.7780 / n**2)
    elif a == 0.05:
        return 1.0063 - (0.1288 / sqrt(n)) - (0.6118 / n) + (1.3505 / n**2)
    elif a == 0.01:
        return 0.9963 - (0.0211 / sqrt(n)) - (1.4106 / n) + (3.1791 / n**2)
    else:
        raise ValueError("a must be one of [0.10, 0.05, 0.01]")
The RJ test statistic then needs to be compared to that critical value:
If RJ < CV, then the determination is NOT NORMAL.
If RJ > CV, then the determination is NORMAL.
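As a quick usage sketch of the function above (the sample size and RJ statistic here are hypothetical):

rj = 0.95                               # hypothetical RJ statistic computed from your sample
cv = rj_critical_value(50, a=0.05)      # critical value for n = 50 at the 95% level
print("NORMAL" if rj > cv else "NOT NORMAL")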
Minitab is going one step further - working backwards to determine the value of a at which CV == RJ. This value would be the p-value you're referencing in your original question.
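Since published equations exist only for those three alpha levels, one crude way to approximate that p-value is to interpolate alpha against the critical values at your sample size. This is only a sketch of the idea, not Minitab's exact procedure:

import numpy as np

def rj_p_value_approx(rj, n):
    # critical values grow with alpha, so interpolate alpha as a function of CV
    alphas = [0.01, 0.05, 0.10]
    cvs = [rj_critical_value(n, a) for a in alphas]
    return float(np.interp(rj, cvs, alphas))   # clamped to [0.01, 0.10] outside that range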

Linear decay as learning rate scheduler (pytorch)

I have read about LinearLR and ConstantLR in the PyTorch docs, but I can't figure out how to get a linear decay of my learning rate. Say I have epochs = 10 and lr = 0.1; then I want to linearly reduce my learning rate from 0.1 to 0 (or any other number) in 10 steps, i.e. by 0.01 in each step.
The two constraints you have are: lr(step=0)=0.1 and lr(step=10)=0. So naturally, lr(step) = -0.1*step/10 + 0.1 = 0.1*(1 - step/10).
This is known as the polynomial learning rate scheduler. Its general form is:
def polynomial(base_lr, iter, max_iter, power):
    # decays the learning rate to 0 at max_iter; power=1 gives linear decay
    return base_lr * ((1 - float(iter) / max_iter) ** power)
Which in your case would be called at each step as polynomial(base_lr=0.1, iter=step, max_iter=10, power=1).
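If you prefer to stay with the built-in PyTorch schedulers instead of writing the function yourself, here is a minimal sketch with LambdaLR (the model and optimizer are placeholders; LambdaLR multiplies the base lr by whatever factor the lambda returns):

import torch

model = torch.nn.Linear(10, 1)   # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
epochs = 10

# factor goes 1.0, 0.9, ..., 0.1, 0.0 -> lr goes 0.1, 0.09, ..., 0.01, 0.0
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda epoch: 1 - epoch / epochs)

for epoch in range(epochs):
    # ... forward pass, loss.backward() ...
    optimizer.step()
    scheduler.step()
    print(epoch, optimizer.param_groups[0]["lr"])

LinearLR with end_factor=0 and total_iters=10 should, I believe, give the same schedule, but LambdaLR makes the formula explicit.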

Implication of binary cross entropy loss value in Keras?

During training I saw that the binary cross entropy loss is positively unbounded.
So can we interpret anything from just looking at the loss value alone? For example, if the binary cross entropy loss is 0.5, does this mean that the model could only guess the correct result half of the time?
The loss shown is the mean of the per-sample losses. When you have one sigmoid output and a batch size of 1, in my opinion, your interpretation is right. A larger batch size makes it more complicated. One example:
batch_size=4
error_batch_1 = 0.4 #close
error_batch_2 = 0.3 #close
error_batch_3 = 0.3 #close
error_batch_4 = 1 #far away
When the average is computed, we get: 2/4 = 0.5.
Looking at the error that way, you would think that half of the predictions were correct, but in reality 3 of 4 were correct (assuming the outputs are rounded to 1 or 0).
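A small numeric sketch of that point (these probabilities are made up so the per-sample losses come out close to the values above):

import numpy as np

y_true = np.array([1, 1, 1, 1])
y_prob = np.array([0.67, 0.74, 0.74, 0.37])   # hypothetical sigmoid outputs

# per-sample binary cross entropy: -[y*log(p) + (1-y)*log(1-p)]
bce = -(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
print(bce.round(2))           # roughly [0.4, 0.3, 0.3, 1.0]
print(bce.mean().round(2))    # ~0.5, yet 3 of the 4 predictions round to the correct class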

Rounding Error: Harmonic mean with exponent of small numbers

Let us say I have log_a1=-1000, log_a2=-1001, and log_a3=-1002.
n=3
I want to get the harmonic mean (HM) of a1, a2 and a3 (not log_a1, log_a2 and log_a3) such that HM = n/[1/exp(log_a1) + 1/exp(log_a2) + 1/exp(log_a3)].
However, due to rounding error, exp(log_a1) = exp(-1000) = 0, and accordingly 1/exp(log_a1) = inf and HM = 0.
Is there any mathematical trick to do this? It is okay to get either HM or log(HM).
The best approach is probably to keep things in log scale. Many scientific languages have a log-add-exp function (e.g. numpy.logaddexp in python) that does what you want to high precision, with both the input and the result in log form.
The idea is that if you want to compute e^-1000 + e^-1001 + e^-1002, you factor it as e^-1000 (1 + e^-1 + e^-2) and take the log. The result is -1000 + log(1 + e^-1 + e^-2), which can be computed without loss of precision.
log(HM) = log(n) + log_a_max - log( sum_i exp(log_a_max - log_a_i) )
For log_a = [-1000, -1001, -1002]:
log(HM)=-1001.309
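For example, with scipy.special.logsumexp (numpy.logaddexp only works pairwise; logsumexp handles the whole array at once):

import numpy as np
from scipy.special import logsumexp

log_a = np.array([-1000.0, -1001.0, -1002.0])
n = len(log_a)

# log(HM) = log(n) - log(sum_i 1/a_i) = log(n) - logsumexp(-log_a)
log_hm = np.log(n) - logsumexp(-log_a)
print(log_hm)   # approximately -1001.309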

How to avoid impression bias when calculating the ctr?

When we train a ctr (click-through rate) model, sometimes we need to calculate the real ctr from the historical data, like this:
ctr = #(clicks) / #(impressions)
We know that if the number of impressions is too small, the calculated ctr is not reliable, so we usually set a threshold to filter out items without enough impressions.
But the more impressions, the higher the confidence in the ctr. So my question is: is there an impression-normalized statistical method to calculate the ctr?
Thanks!
You probably need a representation of the confidence interval for your estimated ctr. The Wilson score interval is a good one to try.
You need the stats below to calculate the confidence interval:
\hat p is the observed ctr (the fraction #clicks / #impressions)
n is the total number of impressions
z_{α/2} is the (1 - α/2) quantile of the standard normal distribution
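Putting those together, the two-sided Wilson score interval (the same formula implemented in the code below) is:

( \hat p + z^2/(2n) ± z * sqrt( \hat p (1 - \hat p)/n + z^2/(4 n^2) ) ) / ( 1 + z^2/n )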
A simple implementation in Python is shown below; I use z_{1-α/2} = 1.96, which corresponds to a 95% confidence interval. A summary of 3 test results (the raw output is attached at the end of the code):
# clicks    # impressions    confidence interval
2           10               (0.07, 0.45)
20          100              (0.14, 0.27)
200         1000             (0.18, 0.22)
Now you can set up a threshold based on the calculated confidence interval.
from math import sqrt

def confidence(clicks, impressions):
    # Wilson score interval for the observed ctr
    n = impressions
    if n == 0:
        return 0
    z = 1.96  # 1.96 -> 95% confidence
    phat = float(clicks) / n
    denorm = 1. + (z * z / n)
    enum1 = phat + z * z / (2 * n)
    enum2 = z * sqrt(phat * (1 - phat) / n + z * z / (4 * n * n))
    return (enum1 - enum2) / denorm, (enum1 + enum2) / denorm

def wilson(clicks, impressions):
    if impressions == 0:
        return 0
    else:
        return confidence(clicks, impressions)

if __name__ == '__main__':
    print(wilson(2, 10))
    print(wilson(20, 100))
    print(wilson(200, 1000))
"""
--------------------
results:
(0.07048879557839793, 0.4518041980521754)
(0.14384999046998084, 0.27112660859398174)
(0.1805388068716823, 0.22099327100894336)
"""
If you treat this as a binomial parameter, you can do Bayesian estimation. If your prior on the ctr is uniform (a Beta distribution with parameters (1, 1)), then your posterior is Beta(1 + #clicks, 1 + #impressions - #clicks). Your posterior mean is (#clicks + 1) / (#impressions + 2) if you want a single summary statistic of this posterior, but you probably don't, and here's why:
I don't know what your method is for determining whether the ctr is high enough, but let's say you're interested in everything with ctr > 0.9. You can then use the cumulative distribution function of the Beta distribution to look at what proportion of probability mass lies above the 0.9 threshold (this is just 1 minus the CDF at 0.9). In this way, your threshold will naturally incorporate the uncertainty about the estimate due to limited sample size.
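A minimal sketch of that with scipy (the 20/100 counts and the 0.9 threshold are just for illustration):

from scipy.stats import beta

clicks, impressions = 20, 100                             # hypothetical counts
posterior = beta(1 + clicks, 1 + impressions - clicks)    # posterior under a uniform Beta(1, 1) prior

print(posterior.mean())        # posterior mean, (#clicks + 1) / (#impressions + 2)
print(1 - posterior.cdf(0.9))  # probability mass with ctr > 0.9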
There are many ways to calculate this confidence interval. An alternative to the Wilson score is the Clopper-Pearson interval, which I found useful in spreadsheets.
Upper bound: B(1 - alpha/2; x + 1, n - x)
Lower bound: B(alpha/2; x, n - x + 1)
Where
B() is the inverse Beta distribution (the Beta quantile function)
alpha is the confidence-level error (e.g. for a 95% confidence level, alpha is 5%)
n is the number of samples (e.g. impressions)
x is the number of successes (e.g. clicks)
In Excel an implementation for B() is provided by the BETA.INV formula.
There is no equivalent formula for B() in Google Sheets, but a Google Apps Script custom function can be adapted from the JavaScript Statistical Library (e.g. search GitHub for jstat).
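Outside a spreadsheet, the same interval can be computed with scipy's inverse Beta CDF, beta.ppf, which plays the role of B() above; a minimal sketch:

from scipy.stats import beta

def clopper_pearson(x, n, alpha=0.05):
    # exact (Clopper-Pearson) interval for x clicks out of n impressions
    lower = beta.ppf(alpha / 2, x, n - x + 1) if x > 0 else 0.0
    upper = beta.ppf(1 - alpha / 2, x + 1, n - x) if x < n else 1.0
    return lower, upper

print(clopper_pearson(20, 100))   # compare with the Wilson interval for 20 clicks / 100 impressions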
