Mlogloss value in CatBoost starts negative and increases

I am running a CatBoost classifier with the following settings:
model = CatBoostClassifier(iterations=1000, learning_rate=0.05, depth=7, loss_function='MultiClass',calc_feature_importance=True)
I have 5 classes, and the loss starts from negative values and increases as below while fitting the model:
0: learn: -1.5036342 test: -1.5039740 best: -1.5039740 (0) total: 18s remaining: 4h 59m 46s
1: learn: -1.4185548 test: -1.4191364 best: -1.4191364 (1) total: 37.8s remaining: 5h 14m 24s
2: learn: -1.3475387 test: -1.3482641 best: -1.3482641 (2) total: 56.3s remaining: 5h 12m 1s
3: learn: -1.2868831 test: -1.2877465 best: -1.2877465 (3) total: 1m 15s remaining: 5h 12m 32s
4: learn: -1.2342138 test: -1.2351585 best: -1.2351585 (4) total: 1m 34s remaining: 5h 13m 56s
Is this normal behaviour? In most machine learning algorithms, log loss is positive and decreases with training. What am I missing here?

Yes, this is normal behaviour.
When you specify loss_function='MultiClass' in the parameters of your model, it uses a different loss function, not LogLoss, for optimisation. The definition can be found in the CatBoost documentation.
To understand the sign of that function, think of the best-case and worst-case scenarios. In the best case, the raw score of object i is concentrated entirely in the correct class t, so the fraction inside the log (in the formula on the linked page) equals 1, and the log is 0. As you diverge from that best case, the fraction decreases towards 0, and the log becomes more and more negative.
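To make the sign concrete, here is a minimal sketch of that quantity, assuming (consistent with the description above) that CatBoost reports the mean log of the softmax probability of the true class, which it maximises:

import numpy as np

def multiclass_objective(raw_scores, true_classes):
    # raw_scores: (n_objects, n_classes) array of raw model scores a_ij.
    # Softmax of each row, with the row max subtracted for numerical stability.
    exp = np.exp(raw_scores - raw_scores.max(axis=1, keepdims=True))
    softmax = exp / exp.sum(axis=1, keepdims=True)
    # Mean log-probability assigned to each object's true class:
    # 0 in the best case, increasingly negative as predictions worsen.
    return np.mean(np.log(softmax[np.arange(len(true_classes)), true_classes]))

With 5 classes and near-uniform initial predictions this gives log(1/5) ≈ -1.61, close to the value at iteration 0 in your log, and it rises towards 0 as the fit improves.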

Related

How to set the maxfun limit of the lbfgs solver on scikit-learn LogisticRegression model?

My scikit-learn LogisticRegression model, which uses the lbfgs solver, is stopping early, as shown in the logs below. The data is standardized.
(...)
At iterate13150 f= 4.05397D+03 |proj g|= 2.41194D+04
At iterate13200 f= 4.05213D+03 |proj g|= 1.36863D+04
.venv/lib/python3.8/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of f AND g EVALUATIONS EXCEEDS LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[Parallel(n_jobs=-1)]: Done 1 out of 1 | elapsed: 5.5s finished
* * *
Tit = total number of iterations
Tnf = total number of function evaluations
Tnint = total number of segments explored during Cauchy searches
Skip = number of BFGS updates skipped
Nact = number of active bounds at final generalized Cauchy point
Projg = norm of the final projected gradient
F = final function value
* * *
N Tit Tnf Tnint Skip Nact Projg F
62 13240 15001 1 0 0 4.800D+04 4.051D+03
F = 4051.0211050375365
sklearn uses the scipy implementation of the lbfgs solver. The function scipy/optimize/_lbfgsb_py.py:_minimize_lbfgsb has the following early-stop conditions:
if n_iterations >= maxiter:
    task[:] = 'STOP: TOTAL NO. of ITERATIONS REACHED LIMIT'
elif sf.nfev > maxfun:
    task[:] = ('STOP: TOTAL NO. of f AND g EVALUATIONS '
               'EXCEEDS LIMIT')
I am indeed hitting the sf.nfev > maxfun limit. Unfortunately, sklearn fixes the value of maxfun to 15_000 when it instantiates the scipy solver (sklearn/linear_model/_logistic.py:442).
When I hotfix the sklearn package to set maxfun to 100_000, the solver converges. But this is not a real solution, since I do not want to carry around a custom sklearn dist with one changed constant.
Any ideas on how to set the maxfun parameter in another way?
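One workaround, sketched here under the assumption that your sklearn version reaches the solver through scipy.optimize.minimize (current versions do), is to patch the option in at runtime instead of editing the installed package:

import scipy.optimize

_original_minimize = scipy.optimize.minimize

def _patched_minimize(*args, **kwargs):
    # Intercept the L-BFGS-B calls that sklearn's lbfgs solver makes and
    # raise the evaluation budget before delegating to the real function.
    if str(kwargs.get("method", "")).upper() == "L-BFGS-B":
        options = dict(kwargs.get("options") or {})
        options["maxfun"] = 100_000
        kwargs["options"] = options
    return _original_minimize(*args, **kwargs)

scipy.optimize.minimize = _patched_minimize

Apply the patch before calling fit. Because sklearn looks minimize up on the scipy.optimize module at call time, the patched version is picked up; this is fragile across versions, so treat it as a stopgap rather than a fix.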

Ryan Joiner Normality Test P-value

I have tried a lot to calculate the Ryan-Joiner (RJ) p-value. Is there any method or formula to calculate the RJ p-value?
I found how to calculate the RJ statistic, but I am unable to find how to calculate the p-value manually. Minitab calculates it by some strategy; I want to know how to do it by hand.
Please support me on this.
The test statistic RJ needs to be compared to a critical value CV in order to make a determination of whether to reject or fail to reject the null hypothesis.
The value of CV depends on the sample size and the desired confidence level, and the values are empirically derived: generate a large number of normally distributed datasets for each sample size n, calculate the RJ statistic for each, and then the CV for a = 0.10 is the 10th percentile of those RJ values.
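A minimal Monte Carlo sketch of that derivation, assuming the RJ statistic is the probability-plot correlation between the ordered sample and approximate normal scores (the (i - 3/8)/(n + 1/4) rule used below is one common choice, not necessarily Minitab's exact one):

import numpy as np
from scipy import stats

def rj_statistic(x):
    # Correlation between the ordered sample and approximate normal scores.
    x = np.sort(x)
    n = len(x)
    b = stats.norm.ppf((np.arange(1, n + 1) - 3/8) / (n + 1/4))
    return np.corrcoef(x, b)[0, 1]

def rj_critical_value_mc(n, a=0.10, reps=20_000, seed=0):
    # CV is the a-th quantile of RJ under the null (normally distributed data).
    rng = np.random.default_rng(seed)
    sims = [rj_statistic(rng.normal(size=n)) for _ in range(reps)]
    return float(np.quantile(sims, a))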
Sidenote: for some reason I see a 90% confidence level used in many places for Ryan-Joiner, whereas a 95% confidence level is commonly used for other normality tests. I'm not sure why.
I recommend reading the original Ryan-Joiner 1976 paper:
https://www.additive-net.de/de/component/jdownloads/send/70-support/236-normal-probability-plots-and-tests-for-normality-thomas-a-ryan-jr-bryan-l-joiner
In that paper, the following critical value equations were empirically derived (I wrote them out in Python for convenience):
from math import sqrt

def rj_critical_value(n, a=0.10):
    # Empirical critical-value equations from Ryan & Joiner (1976).
    if a == 0.10:
        return 1.0071 - (0.1371 / sqrt(n)) - (0.3682 / n) + (0.7780 / n**2)
    elif a == 0.05:
        return 1.0063 - (0.1288 / sqrt(n)) - (0.6118 / n) + (1.3505 / n**2)
    elif a == 0.01:
        return 0.9963 - (0.0211 / sqrt(n)) - (1.4106 / n) + (3.1791 / n**2)
    else:
        raise ValueError("a must be one of [0.10, 0.05, 0.01]")
The RJ test statistic then needs to be compared to that critical value:
If RJ < CV, then the determination is NOT NORMAL.
If RJ > CV, then the determination is NORMAL.
Minitab goes one step further, working backwards to determine the value of a at which CV == RJ. That value is the p-value you're referencing in your original question.
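A rough way to reproduce that, as a sketch only (Minitab's exact procedure is not published): interpolate the observed RJ between the three critical-value curves. rj_p_value_estimate below is a hypothetical helper built on rj_critical_value above, and it clamps results to [0.01, 0.10], the range those equations cover:

import numpy as np

def rj_p_value_estimate(rj, n):
    alphas = np.array([0.01, 0.05, 0.10])
    cvs = np.array([rj_critical_value(n, a) for a in alphas])
    # The CVs increase with alpha, so interpolate RJ against them;
    # values outside the covered range are clamped to 0.01 or 0.10.
    return float(np.interp(rj, cvs, alphas))

For example, at n = 30 an observed RJ of about 0.96 lands between the 0.01 and 0.05 curves, giving an estimated p-value of roughly 0.04.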

Statistical tests for two random datasets

I need to compare two data sets that I created randomly in Julia with rand. I want to know if there is a statistical test (that can be performed in Julia/JuMP) that tells me how different the distributions are (making no assumptions about the original distribution).
Why would you want to perform this in JuMP? JuMP is a modelling language for mathematical optimisation, not a statistics library.
This is really a job for the HypothesisTests package:
https://github.com/JuliaStats/HypothesisTests.jl
julia> using HypothesisTests
julia> x, y = rand(100), rand(100);
julia> test = HypothesisTests.ApproximateTwoSampleKSTest(x, y)
Approximate two sample Kolmogorov-Smirnov test
----------------------------------------------
Population details:
parameter of interest: Supremum of CDF differences
value under h_0: 0.0
point estimate: 0.11
Test summary:
outcome with 95% confidence: fail to reject h_0
two-sided p-value: 0.5806
Details:
number of observations: [100,100]
KS-statistic: 0.7778174593052022
julia> pvalue(test)
0.5806177304235198
https://juliastats.org/HypothesisTests.jl/stable/nonparametric/#HypothesisTests.ApproximateTwoSampleKSTest
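For comparison only (a sketch outside the Julia workflow you asked about), the same two-sample KS test is available in Python via scipy:

from scipy.stats import ks_2samp
import numpy as np

x, y = np.random.rand(100), np.random.rand(100)
result = ks_2samp(x, y)  # two-sided by default
print(result.statistic, result.pvalue)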

How to calculate accuracy from decision trees?

Hi, I am taking a course on Coursera and came across this question. My answer is 1 - (4048 + 3456)/8124 = 0.076; however, the answer is 0.067. Can anybody help me solve this? Thank you!
Accuracy: The number of correct predictions made divided by the total number of predictions made.
We're going to predict the majority class associated with a particular node as True, i.e. use the larger count at each node.
So the accuracy for:
Depth 1: (3796 + 3408) / 8124
Depth 2: (3760 + 512 + 3408 + 72) / 8124
Depth_2 - Depth_1 = 0.06745
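A quick arithmetic check of those two accuracies (the node counts are taken from the course's tree diagram, as quoted above):

total = 8124
acc_depth1 = (3796 + 3408) / total             # ~0.88675
acc_depth2 = (3760 + 512 + 3408 + 72) / total  # ~0.95421
print(acc_depth2 - acc_depth1)                 # ~0.06745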
First we will draw confusion matrices for both cases and then find the accuracy.
Confusion matrix:
Accuracy = (TP + TN) / (total number of observations)
Accuracy calculation:
Depth 1: (3796 + 3408) / 8124
Depth 2: (3760 + 512 + 3408 + 72) / 8124
Depth_2 - Depth_1 = 0.06745
Though the answer is correct, the confusion matrix does not look right. This should be the confusion matrix for depth 2: [image omitted]

Getting probability as 0 or 1 in KNN (predict_proba)

I was using KNN from sklearn and predicted the labels using predict_proba. I was expecting values in the range 0 to 1, since it gives the probability for a particular class, but I am only getting 0 and 1.
I have tried large k values as well, but to no avail. I only have 1000 samples, with around 200 features, and the matrix is largely sparse.
Can anybody tell me what the solution could be here?
sklearn.neighbors.KNeighborsClassifier(n_neighbors=k)
The reason you're getting only 0 and 1 is the n_neighbors=k parameter. If k is set to 1, you will get 0 or 1. If it's set to 2, you will get 0, 0.5, or 1. And if it's set to 3, the probability outputs will be 0, 0.333, 0.667, or 1.
Also note that KNN's probability values carry little meaning on their own: they are simply the fraction of the k neighbours in each class, since the algorithm is based on similarity and distance rather than a probabilistic model.
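A minimal demonstration of that quantisation, on synthetic data assumed purely for illustration: with n_neighbors=5 and the default uniform weights, every reported probability is a multiple of 1/5.

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
y = (X[:, 0] > 0).astype(int)  # two classes split on the first feature

knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)
print(np.unique(knn.predict_proba(X[:50])))  # only multiples of 0.2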
The reason might also be a lack of variety in the training and test data.
If a sample's features occur only in one particular class of the training set, and in no sample of the other classes, then that sample will be predicted to belong to that class with probability 1, and with probability 0 for the other classes.
Otherwise, say you have 2 classes and test a sample with knn.predict_proba(sample), expecting a result like [[0.47, 0.53]]; either way, the probabilities will sum to 1.
If that's the case, try generating a test sample whose features appear in objects of more than one class in the training set.
