I was wondering if there is a direct way of computing the iteration matrix for the nth Linear Block Gauss-Seidel iteration within OpenMDAO?
Thank you.
If I understand you correctly, you are referring to the matrix form of the Gauss-Seidel algorithm where you take Ax = b, break A up into its diagonal (D), lower (L), and upper (U) parts, then use those parts to compute the next iterate.
Specifically, you compute [D-L]^-1; the matrix [D-L]^-1 U that maps one iterate to the next is, I believe, what you are referring to as the "iteration matrix" (I am not familiar with this terminology, but based on the algorithm I'm comfortable making an educated guess).
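For a small dense test problem (nothing OpenMDAO-specific), that iteration matrix can be formed directly; a minimal NumPy sketch:

    import numpy as np

    # Small dense example, using the splitting A = D - L - U, where D is
    # the diagonal, -L the strictly lower part, and -U the strictly upper
    # part of A.
    A = np.array([[ 4.0, -1.0,  0.0],
                  [-1.0,  4.0, -1.0],
                  [ 0.0, -1.0,  4.0]])

    D = np.diag(np.diag(A))
    L = -np.tril(A, k=-1)          # so that A = D - L - U
    U = -np.triu(A, k=1)

    # Gauss-Seidel: (D - L) x_{k+1} = U x_k + b, so the iteration matrix is
    M = np.linalg.solve(D - L, U)  # (D - L)^-1 U

    # The iteration converges iff the spectral radius of M is below 1.
    print(np.max(np.abs(np.linalg.eigvals(M))))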
This formulation of the algorithm is useful to think about and simple to implement, but OpenMDAO takes a different approach. The LBGS algorithm implemented in OpenMDAO is set up to work in a matrix-free manner: it only interacts with the linear-operator methods solve_linear and apply_linear and never explicitly assembles the A matrix at all. Hence there is no opportunity to split A up into D, L, and U.
Depending on how you constructed the model, the A matrix you would need might or might not exist at all, because OpenMDAO is capable of working in a completely matrix-free context. However, if all of your components use the compute_partials or linearize methods to provide partial derivatives, then the data you would need for the A matrix does exist in memory.
You'll have to dig for it a bit, and ironically the best place to see how to do that is in the direct solver, which does actually require the matrix to be formed in order to compute a factorization.
Also, in that code you'll see a function that can iteratively call the linear operator to construct a dense matrix, even if the underlying components don't provide their partials directly. Please note that this approach to assembling the matrix is extremely slow and is not recommended for normal operations.
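The general idea of that helper, in an illustrative NumPy sketch (not OpenMDAO's actual code), is to probe the linear operator with unit vectors, one column at a time:

    import numpy as np

    # Recover a dense A from a matrix-free linear operator by applying it
    # to unit vectors, one column at a time.  O(n) operator applications:
    # extremely slow, useful only for small debugging cases.
    def assemble_dense(apply_op, n):
        A = np.zeros((n, n))
        e = np.zeros(n)
        for j in range(n):
            e[:] = 0.0
            e[j] = 1.0
            A[:, j] = apply_op(e)  # j-th column of A is A @ e_j
        return A

    # Stand-in operator for demonstration:
    op = lambda v: np.array([[2.0, 1.0], [0.0, 3.0]]) @ v
    print(assemble_dense(op, 2))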
I have two 3D variables for each time step (so I have N 3D arrays var(Nx,Ny,Nz) for each variable). I want to construct the two-point statistics, but I guess I'm doing something wrong.
R(x_r, x) = <phi(x_r) psi(x)> / (normalization)   (two-point statistics formula, where x_r is the reference point, x is the independent variable, and <.> denotes the average over snapshots)
I know that the theoretical formulation of a two-point cross-correlation is the one written above.
For the sake of simplicity, let's ignore the normalization and focus on the numerator, which is the part I'm struggling with.
So, my two variables are two 3D matrices, with the following notation: phi(x,y,z) = phi(i,j,k), and likewise for psi.
My aim is to compute a 3D correlation given a certain reference point Reference_Point = (xr,yr,zr), but I guess I'm doing something wrong. I'm trying this in MATLAB, but my results are not accurate. From some research online it seems that I should use convolutions or FFTs, but I can't find any theoretical framework that explains how, and why, the formulation above should in practice be implemented with a conv or an fft. Moreover, I would like to implement my cross-correlation in the spatial domain rather than the frequency domain, and with the convolution I don't understand how to choose the reference point.
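For concreteness, the direct spatial-domain evaluation of the numerator I have in mind is something like this (shown in NumPy notation for clarity; names are illustrative):

    import numpy as np

    # phi, psi: arrays of shape (N, Nx, Ny, Nz), one 3D field per time step.
    # (ir, jr, kr): index of the reference point (xr, yr, zr).
    def two_point_numerator(phi, psi, ir, jr, kr):
        phi_ref = phi[:, ir, jr, kr]  # phi at the reference point, shape (N,)
        # multiply psi everywhere by phi at the reference, average over snapshots
        return np.mean(phi_ref[:, None, None, None] * psi, axis=0)  # (Nx, Ny, Nz)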
Thank you so much in advance for your reply.
The way I understand it, in creating a random forest, the algorithm bundles a bunch of randomly generated decision trees together, weighting them such that they fit the training data.
Is it reasonable to say that this average of forests could be simplified into a simple decision tree? And, if so - how can I access and present this tree?
What I'm looking to do here is extract the information in the tree to help identify the leading attributes, their boundary values, and their placement in the tree. I'm assuming that such a tree would provide insight to a human (or computer heuristic) as to which attributes within a dataset provide the most insight into determining the target outcome.
This probably seems a naive question - and if so, please be patient, I'm new to this and want to get to a stage where I understand it sufficiently.
RandomForest uses bootstrapping to create many training sets by sampling the data with replacement (bagging). Each bootstrapped set is very close to the original data but slightly different, since it may contain multiple copies of some points while other points from the original data are missing. (This helps create a whole bunch of similar but different sets that, as a whole, represent the population your data came from, and allows better generalization.)
Then it fits a DecisionTree to each set. However, what a regular DecisionTree does at each step is loop over every feature, find the best split for each feature, and then choose the split in the feature that produced the best one among them all. In RandomForest, instead of looping over every feature to find the best split, you only try a random subsample of the features at each step (the default is sqrt(n_features)).
So, every tree in a RandomForest is fit to a bootstrapped random training set. And at each branching step it only looks at a subsample of the features, so some of the branchings will be good but not necessarily the ideal split. This means that each tree is a less-than-ideal fit to the original data. When you average the results of all these (sub-ideal) trees, though, you get a robust prediction. Regular DecisionTrees overfit the data; this two-way randomization (bagging and feature subsampling) allows them to generalize, and a forest usually does a good job.
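A toy sketch of those two randomization steps (illustrative only, not sklearn's internals):

    import numpy as np

    # Toy illustration of the two sources of randomness described above.
    rng = np.random.default_rng(0)
    n_samples, n_features = 100, 16

    # 1) Bagging: each tree sees a bootstrap resample of the rows.
    bootstrap_rows = rng.choice(n_samples, size=n_samples, replace=True)

    # 2) Feature subsampling: each split considers only ~sqrt(n_features)
    #    randomly chosen columns.
    split_features = rng.choice(n_features, size=int(np.sqrt(n_features)),
                                replace=False)
    print(bootstrap_rows[:10], split_features)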
Here is the catch: while you can average the output of each tree, you cannot really "average the trees" to get an "average tree". Since trees are a bunch of chained if-then statements, there is no way of taking these chains and coming up with a single chain that produces the same result as the average of the individual chains. Each tree in the forest is different; even when the same features show up, they show up in different places in the trees, which makes them impossible to combine. You cannot represent a RandomForest as a single tree.
There are two things you can do.
1) As RPresle mentioned, you can look at the .feature_importances_ attribute, which averages each feature's splitting score across the different trees. The idea is that while you can't get an average tree, you can quantify how much, and how effectively, each feature is used in the forest by averaging its score over the trees.
2) When I fit a RandomForest model and need some insight into what's happening and how the features affect the result, I also fit a single DecisionTree. This model is usually not good at all by itself; it will easily be outperformed by the RandomForest, and I wouldn't use it to predict anything. But by drawing and examining the splits in this tree, combined with the .feature_importances_ of the forest, I usually get a pretty good idea of the big picture.
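Putting both together, a minimal sklearn sketch (using the iris data as a stand-in for your dataset):

    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.tree import DecisionTreeClassifier, export_text

    X, y = load_iris(return_X_y=True)

    # 1) Feature importances averaged over all the trees in the forest.
    forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
    print(forest.feature_importances_)

    # 2) A single, shallow tree fit purely for interpretability.
    tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
    print(export_text(tree))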
In my application I am computing the inverse of a block tridiagonal matrix A - will Theano's matrix inverse account for that structure (by using a more efficient matrix inverse algorithm)?
Further, I only need the diagonal and first off diagonal blocks of the resulting inverse matrix. Is there a way of preventing Theano from computing the remaining blocks?
Generally, I'm curious whether it would be worth implementing a forward/backward block tridiagonal matrix inversion algorithm myself.
As of April 2015, Theano's matrix inverse function won't do this directly:
http://deeplearning.net/software/theano/library/tensor/nlinalg.html#theano.tensor.nlinalg.MatrixInverse
Theano does not have many optimizations or functions related to those kinds of methods. It partially wraps numpy.linalg (most of it) and some of scipy.linalg:
http://deeplearning.net/software/theano/library/tensor/slinalg.html
So in the short term you are better off doing it with numpy/scipy directly.
If you want to add those features to Theano, it can be done, but it needs someone with the time and willingness to do it.
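If you do write it yourself, here is a minimal NumPy sketch of a forward/backward block recursion that returns only the diagonal and first off-diagonal blocks of the inverse (the formulas are the standard Schur-complement recursions; unoptimized, for illustration only):

    import numpy as np

    def block_tridiag_inv_blocks(D, L, U):
        # D[i] = A[i,i]; L[i] = A[i+1,i]; U[i] = A[i,i+1] (lists of blocks).
        # Returns the diagonal and first off-diagonal blocks of inv(A)
        # using O(n) small inversions instead of one big one.
        n = len(D)
        # Forward sweep: Schur-complement ("left-connected") inverses.
        gL = [np.linalg.inv(D[0])]
        for i in range(1, n):
            gL.append(np.linalg.inv(D[i] - L[i-1] @ gL[i-1] @ U[i-1]))
        # Backward sweep: recover the true blocks of the inverse.
        Gd = [None] * n        # G[i, i]
        Gl = [None] * (n - 1)  # G[i+1, i]
        Gu = [None] * (n - 1)  # G[i, i+1]
        Gd[-1] = gL[-1]
        for i in range(n - 2, -1, -1):
            Gl[i] = -Gd[i + 1] @ L[i] @ gL[i]
            Gu[i] = -gL[i] @ U[i] @ Gd[i + 1]
            Gd[i] = gL[i] + gL[i] @ U[i] @ Gd[i + 1] @ L[i] @ gL[i]
        return Gd, Gl, Gu

    # Sanity check against the dense inverse on a small random example.
    b, n = 2, 4
    rng = np.random.default_rng(0)
    D = [rng.random((b, b)) + 4.0 * np.eye(b) for _ in range(n)]
    L = [rng.random((b, b)) for _ in range(n - 1)]
    U = [rng.random((b, b)) for _ in range(n - 1)]
    A = np.zeros((b * n, b * n))
    for i in range(n):
        A[b*i:b*(i+1), b*i:b*(i+1)] = D[i]
    for i in range(n - 1):
        A[b*(i+1):b*(i+2), b*i:b*(i+1)] = L[i]
        A[b*i:b*(i+1), b*(i+1):b*(i+2)] = U[i]
    Gd, Gl, Gu = block_tridiag_inv_blocks(D, L, U)
    print(np.allclose(Gd[1], np.linalg.inv(A)[b:2*b, b:2*b]))  # True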
I need to use the SVD of a matrix to extract concepts from a series of documents. My matrix is of the form A = [d1, d2, d3 ... dN], where di is a binary vector of M components. The SVD decomposition then gives me svd(A) = U x S x V', with S containing the singular values.
I use SVDLIBC to do the processing in Node.js (using a small module I wrote to wrap it). It seemed to work well, but I noticed something quite weird in the running-time behavior depending on the state of my matrix (where N and M are growing, but are already above 1000 each).
At first I didn't consider the effect of duplicate document vectors, but after some tests it looks like adding a document twice sometimes speeds up the processing extraordinarily.
Do I have to make sure that the columns of A are pairwise independent? Are they required to all be linearly independent? (I thought not, since SVD seems to do its job well even when some columns are exactly the same; it will simply show in the resulting decomposition which columns/rows are useless, via 0 components in U or V.)
Now that it sometimes takes far too long to compute the SVD of my big matrix, I was trying to reduce its size by removing identical columns, but I found that actually adding duplicate vectors can make it much faster. Is that normal? What's happening?
Logically, I'd say that I want my matrix to contain as much information as possible, and thus:
[A] Remove all identical columns, and in the best case, maybe
[B] Remove linearly dependent columns.
Doing [A] seems pretty simple and not too computationally expensive: I could hash my vectors at construction to detect possible duplicates and then spend time verifying those. But are there good computational techniques for [A] and [B]?
(For [A], I'd appreciate not having to check equality of each new vector against all past vectors by brute force; as for [B], I don't know any good way to check or do it.)
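For [A], something like the following seems cheap enough (NumPy notation for illustration):

    import numpy as np

    # Sketch for [A]: drop duplicate columns without brute-force pairwise
    # comparisons.  np.unique sorts the columns, so this costs roughly
    # O(M * N log N) instead of O(M * N^2).
    A = np.array([[1, 0, 1, 1],
                  [0, 1, 0, 0],
                  [1, 1, 1, 1]])
    A_dedup = np.unique(A, axis=1)  # note: columns come back sorted
    print(A_dedup)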
Added related question: about my second question, why would the SVD's running time change so massively just by adding one duplicate column? Is that normal possible behavior, or does it mean I should look for a bug in SVDLIBC?
It is difficult to say where the problem is without samples of fast and slow input matrices. But since one of the primary uses of the SVD is to provide a rotation that eliminates covariance, redundant (or identical) columns should not cause problems.
To answer your question about whether the slow behavior is a bug in the library you're using, I'd suggest trying to retrieve the SVD of the same matrix using another tool. For example, in Octave, retrieve the SVD of your matrix and compare runtimes:
[U, S, V] = svd(A)
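Or the same cross-check with NumPy:

    import time
    import numpy as np

    # Time NumPy's LAPACK-backed SVD and compare against SVDLIBC.
    A = np.random.rand(2000, 1500)  # stand-in for your document matrix
    t0 = time.perf_counter()
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    print(f"numpy SVD took {time.perf_counter() - t0:.2f} s")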
I have n points in R^3 that I want to cover with k ellipsoids or cylinders (I don't really care; whichever is easier). I want to approximately minimize the union of the volumes. Let's say n is tens of thousands and k is a handful. Development time (i.e. simplicity) is more important than runtime.
Obviously I can run k-means and use perfect balls for my ellipsoids. Or I can run k-means, then use minimum enclosing ellipsoids per cluster rather than covering with balls, though in the worst case that's no better. I've seen talk of handling anisotropy with k-means but the links I saw seemed to think I had a tensor in hand; I don't, I just know the data will be a union of ellipsoids. Any suggestions?
[Edit: There's a couple votes for fitting a mixture of multivariate Gaussians, which seems like a viable thing to try. Firing up an EM code to do that won't minimize the volume of the union, but of course k-means doesn't minimize volume either.]
So, you likely know that k-means is NP-hard, and this problem is even more general (harder). Because you want ellipsoids, it might make a lot of sense to fit a mixture of k multivariate Gaussian distributions. You would probably want to find a maximum-likelihood solution, which is a non-convex optimization, but at least it's easy to formulate and there is likely code available.
Other than that, you're likely to have to write your own heuristic search algorithm from scratch, and that is a huge undertaking.
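For illustration, a minimal sketch with scikit-learn's GaussianMixture, reading an ellipsoid off each fitted component (the 2-sigma scaling is an arbitrary coverage choice, not a minimizer of the union volume):

    import numpy as np
    from sklearn.mixture import GaussianMixture

    # Fit k full-covariance Gaussians, then read an ellipsoid off each
    # component from the eigendecomposition of its covariance matrix.
    X = np.random.rand(10000, 3)  # stand-in for your n points in R^3
    k = 5

    gmm = GaussianMixture(n_components=k, covariance_type="full").fit(X)
    for mean, cov in zip(gmm.means_, gmm.covariances_):
        eigvals, eigvecs = np.linalg.eigh(cov)
        semi_axes = 2.0 * np.sqrt(eigvals)  # ~2-sigma ellipsoid semi-axes
        print(mean, semi_axes)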
I did something similar with multivariate Gaussians using this method. The authors use kurtosis as the split measure, and I found it to be a satisfactory method for my application: clustering points obtained from a laser range finder (i.e. computer vision).
If the ellipsoids can overlap a lot, then methods like k-means that try to assign points to single clusters won't work very well. Part of each ellipsoid has to fit the surface of your object, but the rest may be inside it, don't-cares. That is, covering algorithms seem to me quite different from clustering / splitting algorithms; unions are not splits. Gaussian mixtures with lots of overlaps? No idea, but see the picture and code on p. 845 of Numerical Recipes. Coverings are hard even in 2d; see find-near-minimal-covering-set-of-discs-on-a-2-d-plane.