STATS: Interpreting KSL test in JMP - statistics

JMP has very little documentation on the KSL test when testing for normality. My data set has 10k observations, and I obtain the following output when applying a goodness-of-fit test. Can someone make sense of the JMP output for me?
D          Prob>D
0.081786   < 0.0100*
Note: H0 = The data is from the Normal distribution. Small p-values reject H0.
I suspect the interpretation is that the data is not from a normal distribution, but the output and its formatting are confusing to me.
Thank you!

Your formatting is kind of messed up, but I know what you're talking about.
In short, this JMP output is telling you that your data is not from a Normal distribution. The less-than sign < is part of the reported p-value, not a comparison you have to carry out yourself: the Prob>D column is the p-value for the observed statistic, so
Prob>D < 0.0100
means that the probability, under H0, of seeing a D statistic at least as large as 0.081786 is less than 0.01.
This is probably related to how JMP calculates the KSL test statistic and its p-value: because the KSL test is based on the empirical distribution and tabulated values, the algorithm won't keep refining the p-value below 0.01, so it just reports that your p-value is less than 0.01.
If you run this test in JMP in a case where the null is not rejected, Prob>D will show "> 0.15", because the algorithm simply stops calculating once the p-value exceeds 0.15.
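For readers who want to reproduce the same idea outside JMP, here is a minimal sketch using the Lilliefors test from statsmodels (this assumes statsmodels is installed; it is not JMP's exact routine, but the clipped p-value behaviour is analogous):

import numpy as np
from statsmodels.stats.diagnostic import lilliefors

rng = np.random.default_rng(0)
x = rng.normal(size=10000)                    # stand-in for the 10k data set
stat, pval = lilliefors(x, dist='norm', pvalmethod='table')
print(stat, pval)
# With the table-based method, p-values are interpolated from tabulated values and
# clipped at the edges of the table, which is the same reason software like JMP
# reports bounds such as "< 0.0100" or "> 0.15" instead of an exact number.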

Related

estimate standard error from sample means

Random samples of 143 girls and 127 boys were selected from a large population. A measurement was taken of the haemoglobin level (in g/dl) of each child, with the following results:
girls: n = 143, mean = 11.35, sd = 1.41
boys: n = 127, mean = 11.01, sd = 1.32
Estimate the standard error of the difference between the sample means.
In essence, we pool the standard errors by adding the variances of the sample means. This answers the question: what is the variation of the sampling distribution of the difference, taking both samples into account?
SE = sqrt( (sd₁² / n₁) + (sd₂² / n₂) )
SE = sqrt( (1.41² / 143) + (1.32² / 127) ) ≈ 0.1662
Notice that the standard deviation squared is simply the variance of each sample. As you can see, the value is quite small in our case, which means the difference between the sample means does not need to be very large before it exceeds what we would expect from sampling variation alone.
We calculate the difference between the means as 0.34 (or -0.34, depending on how the question is framed) and divide this difference by the standard error to get a t-value. In our case, 2.046 (or -2.046) indicates that the observed difference is 2.046 times larger than the typical difference we would expect given the variation we measured AND the size of our samples.
However, we need to verify whether this observation is statistically significant by determining the t-critical value. This can be looked up in a t-table: you need the alpha level (typically 0.05 unless otherwise stated), the degrees of freedom, and the original alternative hypothesis. If it was something along the lines of "there is a difference between genders", we would use a two-tailed test; if it was something along the lines of "gender X has a haemoglobin level larger/smaller than gender Y", we would use a one-tailed test.
If the t-value exceeds the t-critical value (in absolute value), we claim that the difference between the means is statistically significant, giving us sufficient evidence to reject the null hypothesis. Otherwise, we do not have statistically significant evidence against the null hypothesis, and we fail to reject it.
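For reference, a minimal sketch of the arithmetic above in Python (the numbers are taken straight from the question):

import math

n1, mean1, sd1 = 143, 11.35, 1.41   # girls
n2, mean2, sd2 = 127, 11.01, 1.32   # boys

se_diff = math.sqrt(sd1**2 / n1 + sd2**2 / n2)   # standard error of the difference
t_value = (mean1 - mean2) / se_diff              # observed difference in SE units

print(round(se_diff, 4))   # 0.1662
print(round(t_value, 3))   # 2.046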

Are ftol and gtol needed in scipy.optimize.minimize, and is it proper to give them very low values?

I am using the bounded limited-memory BFGS optimizer (L-BFGS-B) to minimize the value of a black-box function. I have simulated many random input parameter combinations and realized that the ftol and gtol parameters only get in the way and contribute nothing to decreasing the value of my function (there is a positive correlation between the outputs and the random inputs for ftol and gtol, so the smaller the better). So I set both to 1E-18 and focused on configuring the other parameters, which gave the exit message CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH, meaning that the entire optimization depended on the correct value for eps, I guess.
Then I set both ftol and gtol to 1E-20 so they would not stand in the way, but then I started getting sub-optimal results.
So my optimizer is:
scipy.optimize.minimize(function, x0=guess.flatten(), method='L-BFGS-B', bounds=bounds, options={ 'maxcor': maxcor, 'ftol': 1E-20, 'gtol': 1E-20, 'eps': eps, 'maxfun': maxrounds, 'maxiter': maxrounds, 'maxls': maxls})
So I set both to 1E-20, and the other values are fed in randomly. The average output over a bigger sample is smaller with 1E-20 than with 1E-18, and I don't understand why; both are supposed to be negligibly small numbers. I also started getting exit messages like CONVERGENCE: NORM OF PROJECTED GRADIENT <= PGTOL, which I don't understand given such a small tolerance.
So I have the following questions:
1) Is it worth setting ftol and gtol to values as low as 1E-20?
2) Should I also set tol (the external tolerance value) if ftol and gtol are already set? I don't want the optimizer to exit ahead of time. Or does tol as an exit threshold get disabled if gtol and ftol are set?
3) Is it possible that SciPy, NumPy or Python 3 itself can't handle floating point values with 20 decimals? I noticed Python mostly prints 18 digits for floats, so could the problem be that I used too many digits? If so, what is the maximum number of digits handled by scipy.optimize? (SciPy v1.4.1 | NumPy v1.18.1 | Python 3.5.3)
The tolerance that you are setting is not really achievable due to round-off error: double precision floats have a machine epsilon of about 2.2e-16, so relative changes smaller than that simply cannot be resolved. You can read more about floating point precision in the Python documentation. You should choose a small but sensible number for gtol and ftol; usually something in the range 1e-6 to 1e-8 works.
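A minimal sketch that makes this concrete (scipy's built-in Rosenbrock function is used here as a stand-in for the black-box problem, so the numbers are illustrative only):

import numpy as np
from scipy.optimize import minimize, rosen

print(np.finfo(float).eps)   # ~2.22e-16: the smallest meaningful relative change

x0 = np.array([1.3, 0.7, 0.8, 1.9, 1.2])
res = minimize(rosen, x0, method='L-BFGS-B',
               options={'ftol': 1e-8,     # relative reduction in f; 1e-6..1e-8 is sensible
                        'gtol': 1e-8,     # threshold on the projected gradient
                        'maxiter': 500})
print(res.message)
print(res.fun)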

Calculating accuracy from precision, recall, f1-score - scikit-learn

I made a huge mistake. I printed output of scikit-learn svm accuracy as:
str(metrics.classification_report(trainExpected, trainPredict, digits=6))
Now I need to calculate accuracy from following output:
             precision   recall     f1-score   support
1            0.000000    0.000000   0.000000      1259
2            0.500397    1.000000   0.667019      1261
avg / total  0.250397    0.500397   0.333774      2520
Is it possible to calculate accuracy from these values?
PS: I don't want to spend another day for getting outputs of the model. I just realized this mistake hopefully I don't need to start from the beginning.
You can compute the accuracy from precision, recall and the number of true/false positives, or, in your case, from the support (even if precision or recall is 0 due to a 0 numerator or denominator).
TruePositive+FalseNegative=Support_True
TrueNegative+FalsePositive=Support_False
Precision=TruePositive/(TruePositive+FalsePositive) if TruePositive+FalsePositive!=0 else 0
Recall=TruePositive/(TruePositive+FalseNegative) if TruePositive+FalseNegative!=0 else 0
Accuracy=(TruePositive+TrueNegative)/(TruePositive+TrueNegative+FalsePositive+FalseNegative)
-or-
Given the TruePositive/TrueNegative counts instead, then:
TPP = TruePositive/Precision = TruePositive+FalsePositive (set TPP = 0 if Precision or TruePositive is 0)
TPR = TruePositive/Recall = TruePositive+FalseNegative (set TPR = 0 if Recall or TruePositive is 0)
In the above, when TruePositive == 0, no computation is possible without more information about FalseNegative/FalsePositive. Hence using the support is better.
Accuracy=(TruePositive+TrueNegative)/(TPP+TPR-TruePositive+TrueNegative)
But in your case the support was given, so we use recall:
Recall=TruePositive/Support_True if Support_True!=0 else 0
TruePositive=Recall*Support_True, likewise TrueNegative=Recall_False*Support_False in all cases
Accuracy=(Recall*Support_True+Recall_False*Support_False)/(Support_True + Support_False)
In your case (0*1259+1*1261)/(1259+1261)=0.500397 which is exactly what you would expect when only one class is predicted. The respective precision score becomes the accuracy in that case.
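A minimal sketch of that recall-times-support arithmetic in Python, with the numbers taken from the posted report:

support = {'1': 1259, '2': 1261}
recall = {'1': 0.000000, '2': 1.000000}

correct = sum(recall[c] * support[c] for c in support)   # correctly predicted per class, summed
accuracy = correct / sum(support.values())
print(accuracy)   # 0.500396..., matching the recall in the "avg / total" row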
Like the other poster said, better to use the library. But since this sounded like potentially a mathematical question as well, this can be used.
No need to spend more time on it. The metrics module has everything you need in it and you have already computed the predicted values. It's a one line change.
print(metrics.accuracy_score(trainExpected, trainPredict))
I suggest that you spend some time to read the linked page to learn more about evaluating models in general.
I do think you have a bigger problem at hand -- you have zero predicted values for your 1 class, despite having balanced classes. You likely have a problem in your data, modeling strategy, or code that you'll have to deal with.

Test a hypothesis for significance

How can I test the hypothesis that the execution time of an algorithm does not grow exponentially with respect to the size of the data?
For example, I have the sample:
[n time(s)] = {[02 0.36], [03 1.15], [04 2.66], [05 5.48], [06 6.54], [07 11.22], [08 12.87], [09 16.94], [10 17.59]}
where n is the size of the data. I want to show, with statistical significance, that the time does not grow exponentially with respect to the data.
What should the hypotheses H0 and H1 be?
Should I use ANOVA or an F-test? How do I apply it?
Thanks.
Note: this should be a comment, not an answer, but it got too long.
You probably need to learn a bit more about the rationale behind hypothesis testing. I suggest that you start with some online material such as this: http://stattrek.com/hypothesis-test/hypothesis-testing.aspx, but you may also need to look at some books on statistics. Your question, as it is now, cannot be answered, because you can never "use statistics to prove something"; statistics will only tell you what is probable. So you cannot prove that the execution time does not grow exponentially. From your sample data, though, it really looks like it's not exponential. As a matter of fact, it looks linear, so the growth is probably linear (see the plot produced by the code below).
The R code for generating this plot is:
> n <- 2:10
> time <- c(0.36, 1.15, 2.66, 5.48, 6.54, 11.22, 12.87, 16.94, 17.59)
> model.linear <- lm(time ~ n)   # linear model: time ~ a*n + b
> plot(time ~ n)
> lines(predict(model.linear) ~ n, col = 2)
Do you need statistics to show that this linear model is a good fit? I hope you don't.
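If you do want a quick numerical check to go with the plot, one rough way (sketched here in Python rather than R, purely for illustration) is to fit both a linear model and an exponential, i.e. log-linear, model to the timings and compare the residual sums of squares:

import numpy as np

n = np.arange(2, 11)
time = np.array([0.36, 1.15, 2.66, 5.48, 6.54, 11.22, 12.87, 16.94, 17.59])

# Linear model: time ~ a*n + b
a, b = np.polyfit(n, time, 1)
rss_linear = np.sum((time - (a * n + b)) ** 2)

# Exponential model: time ~ exp(c*n + d), fitted as log(time) ~ c*n + d
c, d = np.polyfit(n, np.log(time), 1)
rss_exp = np.sum((time - np.exp(c * n + d)) ** 2)

print(rss_linear, rss_exp)   # the linear fit leaves far smaller residuals on this data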

P-value, significance level and hypothesis

I am confused about the concept of the p-value. In general, if the p-value is greater than alpha (which is usually 0.05), we fail to reject the null hypothesis, and if the p-value is less than alpha, we reject the null hypothesis. As I understand it, if the p-value is greater than alpha, the difference between the two groups is just coming from sampling error, i.e. it arose by chance. So far everything is okay. However, if the p-value is less than alpha, the result is statistically significant; I was expecting it to be statistically non-significant (because, when the p-value is less than alpha, we reject the null hypothesis).
Basically, if the result is statistically significant, we reject the null hypothesis. But how can a hypothesis be rejected if the result is statistically significant? From the words "statistically significant" I understand that the result is good.
You are mistaken about what significance means in terms of the p-value.
I will try to explain below:
Let's assume a test about the means of two populations being equal. We will perform a t-test to test that by drawing one sample from each population and calculating the p-value.
The null hypothesis and the alternative:
H0: m1 - m2 = 0
H1: m1 - m2 != 0
This is a two-tailed test (although that is not important here).
Let's assume that you get a p-value of 0.01 and your alpha is 0.05. The p-value is the probability, assuming the means really are equal (H0 is true), of observing a difference between the sample means at least as large as the one we actually observed. A p-value of 0.01 means that only about 1 in 100 sample pairs drawn from populations with equal means would show a difference this extreme.
Such a small probability of seeing this difference just by chance makes us confident that the means of the populations are not equal, and thus we consider the result to be statistically significant.
What is the threshold that makes us call a result significant? That is determined by the significance level (alpha), which in this case is 5%.
The p-value being less than the significance level is what makes us call the result significant, and therefore we reject the null hypothesis, since data this extreme would be very unlikely if the null hypothesis were true.
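As an illustration, here is a minimal sketch of such a two-sample test in Python with scipy (the samples are made up for the example):

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
sample1 = rng.normal(10.0, 2.0, size=50)   # sample drawn from population 1
sample2 = rng.normal(11.0, 2.0, size=50)   # sample drawn from population 2

result = stats.ttest_ind(sample1, sample2)   # two-tailed t-test by default
alpha = 0.05
print(result.pvalue)
print("reject H0" if result.pvalue < alpha else "fail to reject H0")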
I hope that makes sense now!
Let me give an example that I often use with my pupils, in order to explain the concepts of the null hypothesis, alpha, and significance.
Let's say we're playing a round of poker. I deal the cards and we make our bets. Hey, lucky me! I got a flush on my first hand. You curse your luck and we deal again. I get another flush and win. Another round and, again, I get four aces: at this point you kick the table and call me a cheater: "This is BS! You're trying to rob me!"
Let's explain this in terms of probability. There is a probability associated with getting a flush on the first hand: anyone can get lucky. There's a smaller probability of getting that lucky twice in a row. And there is, finally, a probability of getting really lucky three times in a row. But at the third hand you are stating: "The probability that you get SO LUCKY is TOO SMALL. I REJECT the idea that you're just lucky. I'm calling you a cheater." That is, you rejected the null hypothesis (the hypothesis that nothing special is going on!)
The null hypothesis is, in all cases: "This thing we are observing is an effect of randomness". In our example, the null hypothesis states: "I'm just getting all these good hands one after the other because I'm lucky".
The p-value is the probability of an event at least this extreme, given that only chance is at work. You can calculate the odds of getting good hands in poker after properly shuffling the deck. Or, for example: if I toss a fair coin 20 times, the odds that I get 20 heads in a row are 1/(2^20) ≈ 0.000000953 (really small). That's the p-value for 20 heads in a row when tossing 20 times.
"Statistically significant", means "This event seems to be weird. It has a really tiny probability of happening by chance. So, i'll reject the null hypothesis."
Alpha, or critical p-value, is the magic point where you "kick the table", and reject the null hypothesis. In experimental applications, you define this in advance (alpha=0.05, e.g.) In our poker example, you can call me a cheater after three lucky hands, or after 10 out of 12, and so on. It's a threshold of probability.
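The coin-toss figure from the example above can be checked directly:

p_20_heads = 0.5 ** 20      # probability of 20 heads in 20 fair tosses
print(p_20_heads)           # ~0.000000953
print(p_20_heads < 0.05)    # True: far below any usual alpha, so reject "it's just chance"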
OK, for the p-value you should at least know about the null hypothesis and the alternative hypothesis.
The null hypothesis, to take an example with two flowers, says that there is no significant difference between them.
The alternative hypothesis says that there is a significant difference between them.
And what significance level should be used with the p-value? Most data scientists take 0.05, but it depends on the research (the chosen level of significance). Values such as
0.5
0.05
0.01
0.001
can be used as the significance level.
OK, so now you have your p-value, but what do you do next?
If your model's p-value is 0.03 and the significance level you have chosen is 0.05, then you reject the null hypothesis, which means there is a significant difference between the two flowers. Or, simply stated:
if the p-value of your model < level of significance, then reject the null hypothesis;
if the p-value of your model > level of significance, then you fail to reject the null hypothesis.
