PCA algorithm in WEKA - attributes

I'm using a data set with 22 attributes in WEKA, and after applying the PrincipalComponents attribute evaluator I get 8 ranked attributes with 0.95 of the variance covered. As far as I know, the values of the new attributes should sum to 0.95, but for some reason the variances of my ranked attributes don't add up to that. What is the issue?
All of the attributes are numeric, except for the class attribute which is nominal.
Here it is - the output I get in WEKA after PCA:
Ranked attributes
Any help is appreciated, thank you.

The values you are adding up are ranking scores derived from the cumulative variance, not the individual variance of each component, so they are not expected to sum to 0.95.
Please refer to the answer by Mark Hall:
http://weka.8497.n7.nabble.com/PCA-Output-td3579.html
and to this Q&A on whether PCA variances add up to 100%:
https://stats.stackexchange.com/questions/32901/do-components-of-pca-really-represent-percentage-of-variance-can-they-sum-to-mo
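To illustrate the difference, here is a minimal sketch in Python. The eigenvalues are made up for illustration, and the ranking score is assumed to be one minus the cumulative proportion of variance; that formula is an assumption about how WEKA ranks the components, so check it against the linked answer.

```python
# Hypothetical eigenvalues, for illustration only (not the data from the question).
eigenvalues = [4.0, 3.0, 2.0, 1.0]
total = sum(eigenvalues)

# Proportion of variance explained by each component.
proportions = [ev / total for ev in eigenvalues]

# Running (cumulative) proportion of variance.
cumulative = []
running = 0.0
for p in proportions:
    running += p
    cumulative.append(running)

# ASSUMPTION: the ranking score is 1 - cumulative proportion.
scores = [1.0 - c for c in cumulative]

keep = 3  # number of components retained
variance_covered = sum(proportions[:keep])  # individual proportions do add up
sum_of_scores = sum(scores[:keep])          # cumulative-derived scores do not

print("variance covered:", variance_covered)
print("sum of ranking scores:", sum_of_scores)
```

Under that assumption, the individual proportions add up to the variance covered, while the ranking scores add up to something unrelated.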

Related

Confusion matrix 5x5 formula for finding accuracy, precision, recall, and F1-score

I'm trying to study the confusion matrix. I know about the 2x2 confusion matrix, but I still don't understand how to use a 5x5 confusion matrix to find accuracy, precision, recall, and F1-score. Can anyone help me with this? I appreciate any help.
See my answer here: Calculating Equal error rate(EER) for a multi class classification problem
In short, one strategy is to split the multiclass problem into a set of binary classifications, one "one vs. all others" classification per class. For each binary problem you can then calculate F1, precision and recall, and if you want you can average the per-class scores (uniformly or weighted) to get a single F1 score that represents the multiclass problem.
As for a confusion matrix larger than 2x2: the rows are the true labels and the columns are the predicted labels. The number in cell (i,j) is the number of samples from class i that were classified as class j (note that i=j corresponds to a correct prediction). The accuracy is the trace of the confusion matrix divided by the number of samples.
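The one-vs-all strategy above can be sketched in Python as follows. The function name `per_class_metrics` and the 3x3 matrix are made up for illustration; the same code works unchanged for a 5x5 matrix, assuming rows are true labels and columns are predicted labels.

```python
def per_class_metrics(cm):
    """Return (accuracy, [(precision, recall, f1), ...]) for a square confusion matrix.

    cm[i][j] is the number of samples of true class i predicted as class j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(n)) / total  # trace / number of samples

    metrics = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                        # rest of row k
        fp = sum(cm[i][k] for i in range(n)) - tp   # rest of column k
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics.append((precision, recall, f1))
    return accuracy, metrics


# Made-up example matrix (rows: true class, columns: predicted class).
cm = [
    [5, 1, 0],
    [2, 6, 2],
    [0, 1, 3],
]
accuracy, metrics = per_class_metrics(cm)
macro_f1 = sum(f1 for _, _, f1 in metrics) / len(metrics)  # uniform average
print(accuracy, metrics, macro_f1)
```

Averaging the per-class F1 values uniformly gives the macro F1; weighting them by the row sums (the class supports) gives the weighted variant.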

Calculating the parameters of a mixture of von Mises distribution

I would like to calculate the values of the concentration (kappa) and mean direction (mu) for a von Mises mixture model from the theta values given by the movMF() function in R. At the bottom of this message-chain there is a similar question with an example for a two-component von Mises. The solution seems to be mu = theta/norm(theta) and kappa = norm(theta); however, this gives a matrix of four values for mu, where I'd expect it to be only a vector of two values (one mean direction for each component). I have a feeling I misunderstood the meaning of mu, or it might be that the conversion formulas are wrong. I'd appreciate any help or clarification on this matter.
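One possible reading of the formulas, sketched in Python rather than R: if theta is assumed to hold one row per mixture component, the norm is taken per row, so mu is a matrix of unit vectors (two coordinates per component in 2-D), and each unit vector can be turned into a single angle with atan2. The theta values and the function name `to_kappa_mu` are made up for illustration, and the row-per-component layout is an assumption about movMF's output.

```python
import math


def to_kappa_mu(theta_rows):
    """Convert each row theta_k into (kappa_k, mu_k, angle_k).

    ASSUMPTION: theta_rows has one row per mixture component,
    with theta_k = kappa_k * mu_k and mu_k a unit vector.
    """
    result = []
    for row in theta_rows:
        kappa = math.sqrt(sum(x * x for x in row))  # Euclidean norm of the row
        mu = [x / kappa for x in row]               # unit vector (mean direction)
        angle = math.atan2(mu[1], mu[0])            # 2-D only: direction as an angle
        result.append((kappa, mu, angle))
    return result


# Hypothetical 2-component, 2-D theta matrix (not from a real fit).
theta = [
    [3.0, 4.0],
    [0.0, -2.0],
]
components = to_kappa_mu(theta)
for kappa, mu, angle in components:
    print(kappa, mu, angle)
```

Under this reading, the four values of mu are the x and y coordinates of two unit vectors, and the two angles are the vector of mean directions the question expects.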

precision, recall, F1 metrics exclude a label sklearn

I have a classifier for a NER task, and since 'O' labels far outnumber all the others, I want to exclude them from the metrics calculation.
I want to compute macro and micro scores with the sklearn package. Macro scores can be calculated with precision_recall_fscore_support, because it returns the precision, recall, F1 and support for each label separately.
Can I use the sklearn package to compute micro scores as well?
The answer turns out to be very simple. The labels parameter of the function determines which labels to include in the score calculation. It can also be combined with the macro and micro averages.
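In sklearn this would look like `precision_recall_fscore_support(y_true, y_pred, labels=[...], average='micro')` with 'O' left out of the list. To show what that restriction computes, here is a dependency-free sketch in plain Python; the function name `micro_prf` and the tag sequences are made up for illustration.

```python
def micro_prf(y_true, y_pred, labels):
    """Micro-averaged precision, recall and F1 over the given labels only."""
    tp = fp = fn = 0
    for lab in labels:
        for t, p in zip(y_true, y_pred):
            if p == lab and t == lab:
                tp += 1   # predicted lab, and it was lab
            elif p == lab:
                fp += 1   # predicted lab, but it was something else
            elif t == lab:
                fn += 1   # was lab, but predicted something else
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up NER tags; 'O' is excluded by not listing it in labels.
y_true = ["O", "O", "PER", "LOC", "PER", "O"]
y_pred = ["O", "PER", "PER", "O", "LOC", "O"]
precision, recall, f1 = micro_prf(y_true, y_pred, labels=["PER", "LOC"])
print(precision, recall, f1)
```

Note that an 'O' token predicted as PER still counts as a false positive for PER; only the true positives, false positives and false negatives of 'O' itself are dropped.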

PCA results on imbalanced data with duplicates

I am using sklearn's IPCA decomposition and was surprised that if I delete duplicates from my dataset, the result differs from the "unclean" one.
What is the reason? I would expect the variance to be the same.
The answer is simple. Duplicates in the dataset change the variance.
https://stats.stackexchange.com/a/381983/230117
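A minimal one-dimensional sketch of the effect, using made-up numbers and Python's standard library: repeated values pull the mean toward themselves and change the spread, so the variance, and with it the covariance matrix that PCA decomposes, is different after deduplication.

```python
from statistics import mean, pvariance

# Made-up data: the value 10.0 is duplicated.
data = [1.0, 2.0, 3.0, 10.0, 10.0, 10.0]
deduped = sorted(set(data))  # [1.0, 2.0, 3.0, 10.0]

mean_with_dups = mean(data)
mean_without_dups = mean(deduped)
var_with_dups = pvariance(data)        # population variance with duplicates
var_without_dups = pvariance(deduped)  # population variance after deduplication

print(mean_with_dups, var_with_dups)
print(mean_without_dups, var_without_dups)
```

In more than one dimension the same thing happens to every entry of the covariance matrix, so the principal directions can change as well as the explained variances.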

Goodness of fit for Gaussian process output using matlab?

I used fitrgp from the MATLAB Gaussian process toolbox and calculated the predicted values for a given observation. I did this in three different cases and got three arrays of predicted values, say ypred1, ypred2 and ypred3. Now I want to test the goodness of fit of these outputs in order to judge which algorithm gives the most accurate result. The details of fitrgp are given at the link below:
https://uk.mathworks.com/help/stats/gaussian-process-regression-models.html
I would be grateful if anyone could help me with this. Thank you in advance.
