How to invert the form of the confusion matrix

So I have these confusion matrices that describe how my model performs on each of its three subsets (training, validation, and testing), which are:
Given those images, I wanted to merge them into a single plot that would look like this:
The code that I've implemented reverses all my confusion matrices and returns them like this:
Here's the code I've used to implement just this task.
P.S.: There's more code behind it.
% Plot each confusion matrix in its own figure, grab its axes,
% and then copy the axes contents into one 2x2 subplot figure.
plotconfusion(Training_y_plot, Training_y_plot_problepseis);
title("Training Confusion Matrix");
first_graph = gcf;
first_graph.NextPlot = 'new';   % so the next plotconfusion opens a new figure
ax1 = findobj(first_graph, 'Type', 'Axes');

plotconfusion(Validation_y_plot, Validation_y_plot_problepseis);
title("Validation Confusion Matrix");
second_graph = gcf;
second_graph.NextPlot = 'new';
ax2 = findobj(second_graph, 'Type', 'Axes');

plotconfusion(Testing_y_plot, Testing_y_plot_problepseis);
title("Testing Confusion Matrix");
third_graph = gcf;
third_graph.NextPlot = 'new';
ax3 = findobj(third_graph, 'Type', 'Axes');

plotconfusion(y_eisodos_sto_net, y_plot_problepseis);
title("All Confusion Matrix");
fourth_graph = gcf;
fourth_graph.NextPlot = 'new';
ax4 = findobj(fourth_graph, 'Type', 'Axes');

% Merge the four plots into a single 2x2 figure; use a fresh figure so we
% don't draw into one of the plotconfusion figures.
total_plot = figure;
f1s1 = subplot(2, 2, 1);
copyobj(allchild(ax1), f1s1);
f1s2 = subplot(2, 2, 2);
copyobj(allchild(ax2), f1s2);
f1s3 = subplot(2, 2, 3);
copyobj(allchild(ax3), f1s3);
f1s4 = subplot(2, 2, 4);
copyobj(allchild(ax4), f1s4);
I could really use some help.

Related

Cross-Correlation between 3d fields numerically

I have two 3D variables for each time step (so for each variable I have N 3D matrices var(Nx,Ny,Nz)). I want to construct the two-point statistics, but I guess I'm doing something wrong.
The two-point statistic, where x_r is the reference point and x is the independent variable, is R(x_r, x) = <phi(x_r) psi(x)> / sqrt(<phi(x_r)^2> <psi(x)^2>), with <.> denoting the ensemble average.
I know that the theoretical formulation of a two-point cross-correlation is the one written above.
For the sake of simplicity, let's ignore the normalization; I'm focusing on the numerator, which is the part I'm struggling with.
So, my two variables are two 3D matrices, with the following notation: phi(x,y,z) = phi(i,j,k), and the same for psi.
My aim is to compute a 3D correlation given a certain reference point Reference_Point = (xr,yr,zr), but I guess I'm doing something wrong. I'm trying this in MATLAB, but my results are not accurate. From some research online, it seems that I should use convolutions or FFTs, but I can't find a theoretical framework that explains how to do that, or why the formulation above should in practice be implemented with a conv or an fft. Moreover, I would like to implement my cross-correlation in the spatial domain, not in the frequency domain, and with the convolution approach I don't understand how to choose the reference point.
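For what it's worth, with a fixed reference point the numerator can be computed directly in the spatial domain, taking the average <.> over the N snapshots. Here is a minimal sketch of that computation (in NumPy for concreteness, with hypothetical array names; the question itself uses MATLAB):

import numpy as np

# Direct spatial-domain two-point correlation (numerator only).
# phi, psi: arrays of shape (N, Nx, Ny, Nz) -- N snapshots of two 3D fields.
# ref: the fixed reference point (ir, jr, kr) as grid indices.
def two_point_corr(phi, psi, ref):
    ir, jr, kr = ref
    # Correlate fluctuations: subtract the mean field over the snapshots
    # (drop these two lines if the fields are already fluctuation fields).
    phi_f = phi - phi.mean(axis=0)
    psi_f = psi - psi.mean(axis=0)
    # phi at the reference point, one scalar per snapshot: shape (N,)
    phi_ref = phi_f[:, ir, jr, kr]
    # <phi(x_r) psi(x)>: average the product over snapshots -> (Nx, Ny, Nz)
    return np.mean(phi_ref[:, None, None, None] * psi_f, axis=0)

The FFT/convolution route only becomes necessary when you want the correlation for every reference point at once (i.e., as a function of separation, under a homogeneity assumption); for a single x_r, the direct product above is the formula verbatim.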
Thank you so much in advance for your reply.

How to compute the iteration matrix for nth NLBGS iteration

I was wondering if there is a direct way of computing the iteration matrix for the nth Linear Block Gauss-Seidel iteration within OpenMDAO?
Thank you.
If I understand you correctly, you are referring to the matrix form of the Gauss-Seidel algorithm, where you take Ax=b and break A up into its diagonal (D), lower (L), and upper (U) parts, then use those parts to compute the next iterate.
Specifically, you compute [D-L]^-1. This, I believe, is what you are referring to as the "iteration matrix" (I am not familiar with this terminology, but based on the algorithm I'm comfortable making an educated guess).
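For concreteness, here is a small NumPy sketch of that splitting, using the common convention A = D - L - U, so the update is x_{k+1} = (D - L)^{-1} (U x_k + b) and the iteration matrix is T = (D - L)^{-1} U (the 3x3 A is just a made-up example):

import numpy as np

# Classical Gauss-Seidel splitting of A x = b.
A = np.array([[ 4.0, -1.0,  0.0],
              [-1.0,  4.0, -1.0],
              [ 0.0, -1.0,  4.0]])
D = np.diag(np.diag(A))
L = -np.tril(A, k=-1)          # strictly lower part (note the sign convention)
U = -np.triu(A, k=1)           # strictly upper part
T = np.linalg.solve(D - L, U)  # iteration matrix (D - L)^{-1} U, no explicit inverse
# The iteration converges iff the spectral radius of T is below 1.
print(np.max(np.abs(np.linalg.eigvals(T))))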
This formulation of the algorithm is useful to think about and a simple way to implement it, but OpenMDAO takes a different approach. The LBGS algorithm implemented in OpenMDAO is set up to work in a matrix-free manner. That means it only interacts with the linear-operator methods solve_linear and apply_linear and never explicitly assembles the A matrix at all, so there is no opportunity to split A up into D, L, and U.
Depending on the way you constructed the model, the A matrix you would need might or might not exist at all, because OpenMDAO is capable of working in a completely matrix-free context. However, if all of your components use the compute_partials or linearize methods to provide partial derivatives, then the data you would need for the A matrix does exist in memory.
You'll have to dig for it a bit, and ironically the best place to see how to do that is in the direct solver, which actually does require the matrix to be formed so it can compute a factorization.
Also, in that code you'll see a function that can iteratively call the linear operator to construct a dense matrix, even if the underlying components don't provide their partials directly. Please note that this approach to assembling the matrix is extremely slow and is not recommended for normal use.
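The trick that helper relies on can be sketched generically (this is plain NumPy, not OpenMDAO's actual API): probe a matrix-free linear operator with unit vectors to recover the dense matrix one column at a time.

import numpy as np

# apply_linear: any callable computing y = A @ x without forming A.
def assemble_dense(apply_linear, n):
    A = np.zeros((n, n))
    e = np.zeros(n)
    for j in range(n):
        e[:] = 0.0
        e[j] = 1.0
        A[:, j] = apply_linear(e)  # the j-th column of A is A @ e_j
    return A

This costs n full operator applications, which is exactly why it is so slow for anything but small systems.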

How to classify arrays using svm?

I want to make a model that can differentiate between general functions, e.g. whether a given set of points falls on a line, a parabola, etc.
I am not able to train an SVC directly on the arrays, as it expects input of 2D shape.
Any suggestions?
Note: eventually I want to build this into classifying periodic functions, given a set of data points.
Okay, so your input is an array of points, where each point has coordinates (x,y), and your label is the type of function.
In math, a related task is called interpolation: you are given a set of points and must return a function that passes through them.
What you are describing seems more like non-linear regression (curve fitting) than classification: you would have too many classes to cover, and it doesn't really make sense to frame it that way anyway.
Here is a tutorial in Python about non-linear regression that should be more useful: https://scipy-cookbook.readthedocs.io/items/robust_regression.html
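To illustrate the curve-fitting framing (a sketch of the idea, not the tutorial's code): fit each candidate family and accept the simplest one whose residual is small. The tolerance tol is an assumption you would have to tune for noisy data.

import numpy as np

# Decide which function family a point set belongs to by fitting each
# candidate and accepting the simplest adequate fit.
def classify_curve(x, y, tol=1e-6):
    for name, deg in [("line", 1), ("parabola", 2)]:
        coeffs = np.polyfit(x, y, deg)
        mse = np.mean((np.polyval(coeffs, x) - y) ** 2)
        if mse < tol * np.var(y):   # relative tolerance; tune for noise
            return name
    return "unknown"

x = np.linspace(-3, 3, 50)
print(classify_curve(x, 2 * x + 1))        # -> line
print(classify_curve(x, x**2 - x + 0.5))   # -> parabola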

ELKI: clustering object with Gaussian uncertainty

I am very new to Java and to ELKI. I have three-dimensional objects that carry information about their uncertainty (a multivariate Gaussian). I would like to use FDBSCAN to cluster my data. I am wondering if it is possible to do this in ELKI using the UncertainObject class; however, I am not sure how.
Any help or pointers to examples will be very useful.
Yes, you can use, e.g., SimpleGaussianContinuousUncertainObject to model uncertain data with Gaussian uncertainty. But if you want a full multivariate Gaussian, you will have to modify its source code. It is not a very complicated class.
Many of the algorithms assume you can put a bounding box around uncertain objects, in order to prune the search space (otherwise, you will always be in O(n^2)). This is more difficult with rotated Gaussians!
The key difficulty with using all of these is actually data input. There is no standard file format for specifying objects with uncertainty. Apparently, most people who work with uncertain data just use certain data and add artificial uncertainty to it. But even that needs a lot of parameters to tune, and I am not convinced by this approach.
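To make that workaround concrete, here is a language-agnostic sketch in NumPy (not ELKI code): take certain points and represent each one as a cloud of samples drawn from a Gaussian around it. The spread sigma and the sample count are exactly the kinds of parameters that need tuning.

import numpy as np

rng = np.random.default_rng(0)
points = rng.normal(size=(100, 3))   # the original, "certain" 3D data
sigma = 0.1                          # artificial uncertainty scale (to tune)
samples_per_point = 20
# uncertain[i] is a cloud of samples representing the uncertainty of point i
uncertain = points[:, None, :] + sigma * rng.normal(
    size=(points.shape[0], samples_per_point, 3))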

How to collapse a RandomForest into an equivalent decision tree?

The way I understand it, in creating a random forest the algorithm bundles a bunch of randomly generated decision trees together, weighting them such that they fit the training data.
Is it reasonable to say that this averaged forest could be simplified into a single decision tree? And, if so, how can I access and present this tree?
What I'm looking to do here is extract the information in the tree to help identify the leading attributes, their boundary values, and their placement in the tree. I'm assuming that such a tree would give a human (or a computer heuristic) insight into which attributes within a dataset reveal the most about the target outcome.
This probably seems a naive question; if so, please be patient. I'm new to this and want to get to a stage where I understand it sufficiently.
RandomForest uses the bootstrap to create many training sets by sampling the data with replacement (bagging). Each bootstrapped set is very close to the original data, but slightly different, since it may contain multiples of some points while other points from the original data are missing. (This creates a whole bunch of similar-but-different sets that, as a whole, represent the population your data came from, and allows better generalization.)
Then it fits a DecisionTree to each set. However, what a regular DecisionTree does at each step is loop over each feature, find the best split for that feature, and in the end perform the split on the feature that produced the best one of all. In a RandomForest, instead of looping over every feature to find the best split, you only try a random subsample of the features at each step (the default is sqrt(n_features)).
So, every tree in a RandomForest is fit to a bootstrapped random training set, and at each branching step it only looks at a subsample of the features, so some of the branching will be good but not necessarily the ideal split. This means that each tree is a less-than-ideal fit to the original data. When you average the results of all these (sub-ideal) trees, though, you get a robust prediction. Regular DecisionTrees overfit the data; this two-way randomization (bagging and feature subsampling) allows them to generalize, and a forest usually does a good job.
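A tiny sketch of the two randomizations just described (plain NumPy, only to illustrate the sampling, not a forest implementation):

import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features = 10, 9

# Bagging: draw row indices with replacement; duplicates appear, and on
# average about a third of the rows are left out of any one bootstrap set.
bootstrap_idx = rng.integers(0, n_samples, size=n_samples)

# Feature subsampling: at each split, consider only sqrt(n_features)
# randomly chosen features instead of all of them.
split_features = rng.choice(n_features, size=int(np.sqrt(n_features)),
                            replace=False)
print(sorted(bootstrap_idx), split_features)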
Here is the catch: while you can average the output of each tree, you cannot really "average the trees" to get an "average tree". Since trees are chains of if-then statements, there is no way to take these chains and come up with a single chain that produces the same result as the averaged results of all the chains. Each tree in the forest is different; even if the same features show up, they show up in different places in the trees, which makes it impossible to combine them. You cannot represent a RandomForest as a single tree.
There are two things you can do.
1) As RPresle mentioned, you can look at the .feature_importances_ attribute, which for each feature averages the splitting score across the different trees. The idea is that while you can't get an average tree, you can quantify how much and how effectively each feature is used in the forest by averaging its score over the trees.
2) When I fit a RandomForest model and need some insight into what's happening and how the features are affecting the result, I also fit a single DecisionTree. This model is usually not good at all by itself; it will easily be outperformed by the RandomForest, and I wouldn't use it to predict anything. But by drawing and looking at the splits in this tree, combined with the .feature_importances_ of the forest, I usually get a pretty good idea of the big picture.
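A short scikit-learn sketch of both suggestions (synthetic data, so the numbers are only illustrative):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=500, n_features=8, random_state=0)

# 1) The forest's averaged per-feature splitting scores.
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
print(forest.feature_importances_)

# 2) One shallow tree: a weaker predictor, but its splits are readable.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(export_text(tree))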
