Does Theano's symbolic matrix inverse "know" about structured matrices?

In my application I am computing the inverse of a block tridiagonal matrix A - will Theano's matrix inverse account for that structure (by using a more efficient matrix inverse algorithm)?
Further, I only need the diagonal and first off diagonal blocks of the resulting inverse matrix. Is there a way of preventing Theano from computing the remaining blocks?
Generally, I'm curious whether it would be worth implementing a forward/backward block tridiagonal matrix inverse algorithm myself.

As of April 2015, Theano's matrix inverse function won't do it directly:
http://deeplearning.net/software/theano/library/tensor/nlinalg.html#theano.tensor.nlinalg.MatrixInverse
Theano does not have many optimizations or functions related to that type of method. It partially wraps what is under numpy.linalg (most of it) and some of scipy.linalg:
http://deeplearning.net/software/theano/library/tensor/slinalg.html
So in the short term you are better off doing it with numpy/scipy directly.
If you want to add those features to Theano, it can be done, but it needs someone with the time and willingness to do it.
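As a concrete illustration of the numpy/scipy route suggested above, here is a minimal sketch (plain scipy, no Theano) that computes only the block columns of the inverse containing the diagonal and first off-diagonal blocks, via a sparse LU factorization. The block size, the example matrix, and the choice of scipy.sparse.linalg.splu are illustrative assumptions, not anything from the question.

import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

b, nblocks = 4, 10                     # assumed block size and number of diagonal blocks
n = b * nblocks

# Example block tridiagonal matrix (made diagonally dominant so it is invertible)
rng = np.random.default_rng(0)
A = sp.lil_matrix((n, n))
for k in range(nblocks):
    A[k*b:(k+1)*b, k*b:(k+1)*b] = rng.standard_normal((b, b)) + 10*np.eye(b)
    if k + 1 < nblocks:
        A[k*b:(k+1)*b, (k+1)*b:(k+2)*b] = rng.standard_normal((b, b))
        A[(k+1)*b:(k+2)*b, k*b:(k+1)*b] = rng.standard_normal((b, b))

lu = spla.splu(A.tocsc())              # sparse LU factorization keeps the banded structure

# Solve A X = E_k one block column at a time; each block column of A^-1
# contains the diagonal block and its two neighbouring off-diagonal blocks.
for k in range(nblocks):
    E_k = np.zeros((n, b))
    E_k[k*b:(k+1)*b] = np.eye(b)
    X_k = lu.solve(E_k)                # k-th block column of A^-1
    diag_block = X_k[k*b:(k+1)*b]
    # the off-diagonal blocks are X_k[(k-1)*b:k*b] and X_k[(k+1)*b:(k+2)*b] where they exist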

Related

Eigenvectors in Julia vs Numpy

I'm currently working to diagonalize a 5000x5000 Hermitian matrix, and I find that when I use Julia's eigen function in the LinearAlgebra module, which produces both the eigenvalues and eigenvectors, I get different results for the eigenvectors compared to when I solve the problem using numpy's np.linalg.eigh function. I believe both of them use BLAS, but I'm not sure what else they may be using that is different.
Has anyone else experienced this/knows what is going on?
numpy.linalg.eigh(a, UPLO='L') is a different algorithm. It assumes the matrix is symmetric and takes the lower triangular matrix (as a default) to more efficiently compute the decomposition.
The equivalent to Julia's LinearAlgebra.eigen() is numpy.linalg.eig. You should get the same result if you turn your matrix in Julia into a Symmetric(A, uplo=:L) matrix before feeding it into LinearAlgebra.eigen().
Check out numpy's docs on eig and eigh, and Julia's standard LinearAlgebra documentation. If you go down to the special matrices section, it details which special methods are used for each type of special matrix, thanks to multiple dispatch.
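A small numpy sketch of the point above: on a Hermitian matrix, eig and eigh agree on the eigenvalues (up to ordering), while the eigenvectors can differ by an arbitrary unit phase (a sign, in the real case), so comparing columns directly is misleading. The test matrix here is an illustrative assumption.

import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))
A = (B + B.conj().T) / 2                     # Hermitian test matrix

w_h, V_h = np.linalg.eigh(A)                 # specialised Hermitian/symmetric solver
w_g, V_g = np.linalg.eig(A)                  # general solver

order = np.argsort(w_g.real)                 # eig does not sort its eigenvalues
print(np.allclose(w_h, w_g.real[order]))     # True: eigenvalues agree
# Columns can differ by a unit phase, so compare |<v_h, v_g>| instead of the raw vectors:
print(np.allclose(np.abs(np.sum(V_h.conj() * V_g[:, order], axis=0)), 1.0))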

How to compute the iteration matrix for nth NLBGS iteration

I was wondering if there was a direct way of computing the iteration matrix for the nth Linear Block Gauss-Seidel iteration within OpenMDAO?
thank you
If I understand you correctly, you are referring to the matrix form of the Gauss-Seidel algorithm, where you take Ax=b and break A up into its diagonal (D), lower (L), and upper (U) parts, then use those parts to compute the next iterate.
Specifically, you compute [D-L]^-1. This, I believe, is what you are referring to as the "iteration matrix" (I am not familiar with that terminology, but based on the algorithm I'm comfortable making an educated guess).
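For reference, here is what that splitting looks like in plain numpy (not OpenMDAO), just the textbook form described above; the 3x3 matrix is an arbitrary example.

import numpy as np

A = np.array([[ 4.0, -1.0,  0.0],
              [-1.0,  4.0, -1.0],
              [ 0.0, -1.0,  4.0]])

D = np.diag(np.diag(A))
L = -np.tril(A, k=-1)          # strictly lower part (with the conventional minus sign)
U = -np.triu(A, k=1)           # strictly upper part

M = np.linalg.solve(D - L, U)  # iteration matrix: x_{k+1} = M x_k + (D - L)^-1 b
print(np.max(np.abs(np.linalg.eigvals(M))))   # spectral radius < 1 implies convergence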
This formulation of the algorithm is useful to think about and a simple way to implement it, but OpenMDAO takes a different approach. The LBGS algorithm implemented in OpenMDAO is set up to work in a matrix-free manner. That means it only interacts with the linear operator methods solve_linear and apply_linear and never explicitly assembles the A matrix at all. Hence there isn't an opportunity to split A up into D, L, U.
Depending on the way you constructed the model, the A matrix you would need might or might not be there at all because OpenMDAO is capable of working in a completely matrix free context. However, if all of your components use the compute_partials or linearize methods to provide partial derivatives then the data you would need for the A matrix does exist in memory.
You'll have to dig for it a bit, and ironically the best place to see how to do that is in the direct solver, which does actually require that the matrix be formed in order to compute a factorization.
Also, in that code you'll see a function that can iteratively call the linear operator to construct a dense matrix even if the underlying components don't provide their partials directly. Please note that this approach to assembling the matrix is extremely slow and is not recommended for normal operations.

handling large non-sparse matrices for computing SVD

I have a large matrix (currently about 450000 x 50, and it might get even larger) whose SVD I want to compute. The matrix isn't sparse, and numpy can't seem to handle it: it exits with a MemoryError.
I tried using np.float16 and it didn't help. Python's table package can't seem to help either (since I need to use the whole matrix later to find eigenvalues).
Do any of you have an idea how can I compute and use massive matrices?

How to multiply sparse matrices using hmatrix

I've got a matrix m of dimension 3329×3329 with lots of zero entries, and I want to calculate m^9.
After trying this with the matrix package (Data.Matrix is easy to use) I figured that a sparse matrix would make a better representation of this in terms of memory usage and possibly also computation speed. So I'm trying to figure out how to use the hmatrix package. I've already managed to create a sparse matrix:
module Example where

import Numeric.LinearAlgebra as LA

-- A sparse matrix given as an association list of ((row, col), value) entries.
assocExample :: AssocMatrix
assocExample = [((0,0), 1), ((3329,5), 1)]

-- Build hmatrix's general (sparse) matrix representation from the association list.
sparseExample :: GMatrix
sparseExample = LA.mkSparse assocExample
My problem at this point appears to be that I've got a GMatrix, but for the multiplication operator (<>) I need a Matrix t instead.
By looking through the hmatrix documentation on hackage, I didn't manage to figure out how to obtain a Matrix t here.
I've also had a quick look at the introduction to hmatrix, but the term sparse isn't even mentioned in it.
My hunch is that this should be easy enough to do, but I'm missing something simple.
Sparse matrices are, to my knowledge, rather young in hmatrix. Looking through the docs, it seems there is no product of sparse matrices; you must implement it yourself.
Edit: And if you have done so, comment here: https://github.com/albertoruiz/hmatrix/issues/162 (which also substantiates my statement above)

Quadratic Programming and quasi newton method BFGS

Yesterday, I posted a question about the general concept of SVM primal form implementation:
Support Vector Machine Primal Form Implementation
and "lejlot" helped me understand that what I am solving is a QP problem.
But I still don't understand how my objective function can be expressed as a QP problem
(http://en.wikipedia.org/wiki/Support_vector_machine#Primal_form).
I also don't understand how QP and the quasi-Newton method are related.
All I know is that a quasi-Newton method will solve my QP problem, which is supposedly formulated from my objective function (and I don't see the connection).
Can anyone walk me through this please??
For SVM's, the goal is to find a classifier. This problem can be expressed in terms of a function that you are trying to minimize.
Let's first consider the Newton iteration. The Newton iteration is a numerical method for finding a solution to a problem of the form F(x) = 0.
Instead of solving it analytically, we can solve it numerically with the following iteration:
x^(k+1) = x^k - DF(x^k)^-1 * F(x^k)
Here x^(k+1) is the (k+1)-th iterate, DF(x^k)^-1 is the inverse of the Jacobian of F evaluated at x^k, and x^k is the k-th iterate.
This update runs as long as we make progress in terms of step size (delta x) or until the function value is sufficiently close to 0; the termination criterion can be chosen accordingly.
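To make the iteration concrete, here is a toy Python sketch of exactly that update for a small 2-D system F(x) = 0; the particular F is made up for illustration, and the linear system is solved instead of forming the inverse explicitly.

import numpy as np

def F(x):
    return np.array([x[0]**2 + x[1] - 3.0,
                     x[0] - x[1]**2 + 1.0])

def DF(x):                                   # Jacobian of F
    return np.array([[2*x[0],  1.0],
                     [   1.0, -2*x[1]]])

x = np.array([1.0, 1.0])
for _ in range(50):
    step = np.linalg.solve(DF(x), F(x))      # solve DF(x) * step = F(x) rather than inverting
    x = x - step
    if np.linalg.norm(step) < 1e-12:         # terminate on small step size
        break
print(x, F(x))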
Now consider solving the problem DF(x) = 0, i.e. finding a stationary point of F. If we formulate the Newton iteration for that, we get
x^(k+1) = x^k - HF(x^k)^-1 * DF(x^k)
where HF(x)^-1 is the inverse of the Hessian matrix of F and DF(x) is the gradient of F. Note that we are working in n dimensions and cannot just divide by a derivative; we have to take the inverse of the matrix.
Now we are facing some problems: In each step, we have to calculate the Hessian matrix for the updated x, which is very inefficient. We also have to solve a system of linear equations, namely y = HF(x)^-1 * DF(x) or HF(x)*y = DF(x).
So instead of computing the Hessian in every iteration, we start off with an initial guess of the Hessian (maybe the identity matrix) and perform low-rank updates after each iterate. For the exact formulas, have a look here.
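As a quick illustration of the quasi-Newton idea, the sketch below minimizes a toy convex quadratic with scipy's BFGS implementation; the quadratic is made up and is not the SVM objective from the question.

import numpy as np
from scipy.optimize import minimize

Q = np.array([[3.0, 1.0],
              [1.0, 2.0]])                   # positive definite
c = np.array([1.0, -1.0])

def f(x):
    return 0.5 * x @ Q @ x + c @ x

def grad(x):
    return Q @ x + c

res = minimize(f, x0=np.zeros(2), jac=grad, method="BFGS")
print(res.x, np.linalg.solve(Q, -c))         # matches the analytic minimiser -Q^-1 c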
So how does this link to SVM's?
When you look at the function you are trying to minimize, you can formulate a primal problem, which you can then reformulate as a dual Lagrangian problem; this is convex and can be solved numerically. It is all well documented in the article, so I will not try to reproduce the formulas here at lower quality.
But the idea is the following: If you have a dual problem, you can solve it numerically. There are multiple solvers available. In the link you posted, they recommend coordinate descent, which solves the optimization problem for one coordinate at a time. Or you can use subgradient descent. Another method is to use L-BFGS. It is really well explained in this paper.
Another popular algorithm for solving problems like that is ADMM (alternating direction method of multipliers). In order to use ADMM you would have to reformulate the given problem into an equivalent problem that gives the same solution but has the correct format for ADMM. For that I suggest reading Boyd's script on ADMM.
In general: First, understand the function you are trying to minimize and then choose the numerical method that is most suited. In this case, subgradient descent and coordinate descent are most suited, as stated in the Wikipedia link.
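For completeness, here is a hedged sketch of one of the options mentioned above: plain subgradient descent on the (unconstrained) primal SVM objective (1/2)||w||^2 + C * sum_i max(0, 1 - y_i(w·x_i + b)). The synthetic data and the simple 1/t step-size schedule are illustrative choices, not anything from the linked question.

import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1.0, -1.0)    # toy, linearly separable labels
C = 1.0

w, b = np.zeros(2), 0.0
for t in range(1, 2001):
    margins = y * (X @ w + b)
    active = margins < 1                          # points violating the margin
    # subgradient of the objective with respect to w and b
    g_w = w - C * (y[active, None] * X[active]).sum(axis=0)
    g_b = -C * y[active].sum()
    eta = 1.0 / t                                 # diminishing step size
    w -= eta * g_w
    b -= eta * g_b

print(w, b)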

Resources