In a Python 3 application I'm using NumPy to calculate eigenvalues and eigenvectors of a symmetric real matrix.
Here's my demo code:
import numpy as np
a = np.random.rand(3,3) # generate a random array shaped (3,3)
a = (a + a.T)/2 # a becomes a random symmetric matrix
evalues1, evectors1 = np.linalg.eig(a)
evalues2, evectors2 = np.linalg.eigh(a)
Except for the signs, I got the same eigenvectors and eigenvalues using np.linalg.eig and np.linalg.eigh. So, what's the difference between the two methods?
Thanks
EDIT: I've read the docs here https://docs.scipy.org/doc/numpy/reference/generated/numpy.linalg.eig.html
and here https://docs.scipy.org/doc/numpy/reference/generated/numpy.linalg.eigh.html
but I still cannot understand why I should use eigh() when I have a symmetric array.
eigh guarantees that the eigenvalues are sorted (in ascending order) and uses a faster algorithm that takes advantage of the fact that the matrix is symmetric. If you know that your matrix is symmetric, use this function.
Be aware that eigh doesn't check whether your matrix is actually symmetric; by default it just takes the lower triangular part of the matrix and assumes that the upper triangular part is defined by the symmetry of the matrix.
eig works for general matrices and therefore uses a slower algorithm; you can check that, for example, with IPython's magic command %timeit. If you test with larger matrices, you will also see that, in general, the eigenvalues are not sorted here.
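A quick check, continuing the demo code above (a minimal sketch; the 999.0 entries are just arbitrary garbage used to show that eigh never reads the upper triangle):
import numpy as np
a = np.random.rand(3, 3)
a = (a + a.T) / 2  # random symmetric matrix
evalues1, evectors1 = np.linalg.eig(a)
evalues2, evectors2 = np.linalg.eigh(a)
# eigh returns the eigenvalues in ascending order; eig makes no such guarantee
print(np.all(np.diff(evalues2) >= 0))            # True
print(np.allclose(np.sort(evalues1), evalues2))  # True: same spectrum
# eigh only reads the lower triangle (UPLO='L' by default), so corrupting
# the upper triangle changes nothing
b = a.copy()
b[np.triu_indices(3, k=1)] = 999.0
print(np.allclose(np.linalg.eigh(b)[0], evalues2))  # True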
I was looking for a crate that would allow me to easily generate random probability vectors, stochastic matrices or, in general, ndarrays that are stochastic. For people not familiar with these concepts, a probability vector v is defined as follows
0 <= v[i] <= 1, for all i
sum(v[i]) = 1
Similarly, a stochastic matrix is a matrix where each column (or row) satisfies the conditions above.
More generally, a ndarray A would be stochastic if
0 <= A[i, j, k, ..., h] <= 1, for all indices
sum(A[i, j, k, ..., :]) = 1, for all indices
Here, ... just means other indices between k and the last index h. : is a notation to indicate all elements of that dimension.
Is there a crate that does this easily (i.e. where you just need to call a function or something like that)? If not, how would you do it? I suppose one could just generate a random ndarray and then divide along the last dimension by the sum of the elements in that dimension. For a 1D array (a probability vector), we could do something like this:
use ndarray::Array1;
use ndarray_rand::RandomExt;
use ndarray_rand::rand_distr::Uniform;

fn main() {
    let mut a = Array1::random(10, Uniform::new(0.0, 1.0));
    a = &a / a.sum();
    println!("The sum is {:?}", a.sum());
}
But how would you do it for higher-dimensional arrays? We could use a for loop and iterate over all indices, but that doesn't look like it would be efficient. I suppose there must be a way to do this operation in a vectorized form (see the sketch after the requirements below). Is there a function (in the standard library, in the ndarray crate or some other crate) that does this for us? Could we use ndarray-rand to do this without having to divide by the sum?
Requirements
Efficiency is not strictly necessary, but it would be nice.
I am looking more for a simple solution (no more complicated than what I wrote above).
Numerical stability would also be great (e.g. generating random integers and then dividing by the sum may be a better idea than generating random floats and doing the same thing).
I would like to use ndarray and the related crates, but it's OK if you also share other solutions (which may be useful to others who don't use ndarray).
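For reference, this is what the "divide by the sum along the last axis" idea looks like in vectorized form, sketched in NumPy because it is compact (the same thing should be expressible with ndarray's sum_axis and broadcasting):
import numpy as np
# Generate a random 3-D array and normalize along the last axis so that
# every slice a[i, j, :] sums to 1, i.e. a "stochastic" ndarray as defined above.
a = np.random.rand(4, 3, 5)
a /= a.sum(axis=-1, keepdims=True)
print(np.allclose(a.sum(axis=-1), 1.0))  # True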
I would argue that sampling with whatever distribution you have on hand (U(0,1), exponential, absolute normal, ...) and then dividing by the sum is the wrong way to go.
Start with a distribution whose values are in the [0, 1] range and sum to 1 by construction.
Fortunately, there is such a distribution: the Dirichlet distribution.
And, apparently, there is a Rust library that does Dirichlet sampling; I cannot say anything about its quality:
https://docs.rs/rand_distr/latest/rand_distr/struct.Dirichlet.html
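To illustrate what Dirichlet samples look like, a minimal sketch in NumPy (the linked rand_distr::Dirichlet should behave analogously):
import numpy as np
rng = np.random.default_rng(0)
# With all concentration parameters equal to 1, the samples are uniformly
# distributed on the probability simplex: entries in [0, 1] that sum to 1.
samples = rng.dirichlet(np.ones(5), size=4)
print(samples)
print(samples.sum(axis=1))  # each row sums to 1 (up to floating point)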
UPDATE
Regarding sampling and then normalizing: the problem is that no one knows what the distribution of the resulting RVs would be:
U(0,1)/(U(0,1) + U(0,1) + ... + U(0,1))
Mean value? Median? Variance? Anything to say at all?
You could even construct it like
[U(0,1); Exp(2); |N(0,1)|; U(0,88); Exp(4.5); ...] and, as soon as you divide it by the sum, the values in the vector would be between 0 and 1 and sum to 1. There is even less to say about the properties of such RVs.
I assume you want to generate random vectors/matrices for some purpose, like Monte Carlo, etc. Dealing with a known distribution with well-defined properties (mean, variance, ...) looks like the right way to go.
If I understand correctly, the Dirichlet distribution allows you to generate a probability vector where the probabilities depend on the initial parameters that you pass, but you would still need to pass these parameters (manually).
Yes, the concentration parameters. By default they are all ones, which makes the RVs uniformly distributed in the simplex.
So, are you suggesting the Dirichlet distribution because it was designed on purpose to generate probability vectors?
I'm suggesting the Dirichlet distribution because, by default, it produces RVs distributed uniformly in the simplex, summing to 1 and with well-known statistical properties, starting with the PDF, CDF, mean, median, variance, ...
UPDATE II
For the Dirichlet distribution, the density is
PDF(x) = prod_i x_i^(a_i - 1) / B(a)
So for the case where all a_i = 1,
PDF(x) = 1 / B(a)
which is constant, so given the constraint defining the simplex, sum_i x_i = 1, this is as uniform as it gets.
I'm currently working to diagonalize a 5000x5000 Hermitian matrix, and I find that when I use Julia's eigen function in the LinearAlgebra module, which produces both the eigenvalues and eigenvectors, I get different results for the eigenvectors compared to when I solve the problem using numpy's np.linalg.eigh function. I believe both of them use BLAS, but I'm not sure what else they may be using that is different.
Has anyone else experienced this/knows what is going on?
numpy.linalg.eigh(a, UPLO='L') is a different algorithm. It assumes the matrix is symmetric (or Hermitian) and uses only the lower triangular part (by default) to compute the decomposition more efficiently.
The equivalent of Julia's LinearAlgebra.eigen() is numpy.linalg.eig. You should get the same result if you turn your matrix in Julia into a Symmetric(A, uplo=:L) matrix (or Hermitian, for a complex Hermitian matrix) before feeding it into LinearAlgebra.eigen().
Check out NumPy's docs on eig and eigh, and Julia's standard LinearAlgebra documentation. If you go down to the special matrices section, it details which specialized methods are used for each type of special matrix, thanks to multiple dispatch.
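To see the NumPy side of this, a minimal sketch with a small random Hermitian matrix standing in for the 5000x5000 one:
import numpy as np
rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = (A + A.conj().T) / 2  # make it Hermitian
w_eig, v_eig = np.linalg.eig(A)     # general algorithm: unsorted, complex-typed eigenvalues
w_eigh, v_eigh = np.linalg.eigh(A)  # Hermitian algorithm: ascending, real eigenvalues
print(np.allclose(np.sort(w_eig.real), w_eigh))   # same spectrum
print(np.allclose(A @ v_eig, v_eig * w_eig))      # both routines return valid eigenpairs...
print(np.allclose(A @ v_eigh, v_eigh * w_eigh))   # ...even though the vectors themselves may differ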
I am developing my own architecture search algorithm using Python's NumPy. Currently I am trying to work out a cost function that captures the distance between two matrices, X and Y.
I'd like to reduce the difference between the two, to a meaningful scalar value.
Ideally between 0 and 1, so that if both sets of elements within the matrices are the same numerically and positionally, a 0 is returned.
In the example below, I have the output of my algorithm X. Both X and Y are the same shape. I tried to sum the difference between the two matrices; however I'm not sure that using summation will work in all conditions. I also tried returning the mean. I don't think that either approach will work though. Aside from looping through both matrices and comparing elements directly, is there a way to capture the degree of difference in a scalar?
import numpy as np

Y = np.arange(25).reshape(5, 5)

for i in range(1000):
    X = algorithm(Y)
    # I try to reduce the difference between the two matrices to a scalar value
    cost = np.sum(X - Y)
There are many ways to calculate a scalar "difference" between two matrices. Here are just two examples.
The root mean square error:
((m1 - m2) ** 2).mean() ** 0.5
The max absolute error:
np.abs(m1 - m2).max()
The choice of the metric depends on your problem.
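As a quick illustration of why the plain signed sum from the question can be misleading (positive and negative differences cancel), here is a minimal sketch with made-up matrices:
import numpy as np
m1 = np.arange(25, dtype=float).reshape(5, 5)
m2 = m1.copy()
m2[0, 0] += 3.0  # one entry too large
m2[0, 1] -= 3.0  # another too small, so the errors cancel in a plain sum
print(np.sum(m2 - m1))                  # 0.0, even though the matrices differ
print(((m1 - m2) ** 2).mean() ** 0.5)   # root mean square error, about 0.85
print(np.abs(m1 - m2).max())            # max absolute error, 3.0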
I need to add many big 3D arrays (with a shape of 500x500x500) together and want to speed up the process by using multiplication in the Fourier space. The problem is that I don't get the same answer when multiplying in the Fourier space compared to simply adding the matrix.
To test it out, I wrote a minimal example trying to make it work but the answer is not what I expected. Either my math knowledge is wrong or I am not using the function correctly.
Below is the simplest code showing what I am trying to do:
import numpy as np
c = np.asarray(((1,2),(2,3)))
d = np.asarray(((1,4),(1,5)))
print("Transform")
Nc = np.fft.rfft2(c)
Nd = np.fft.rfft2(d)
print("Inverse")
Nnc = np.fft.irfft2(Nc)
Nnd = np.fft.irfft2(Nd)
print("Somme")
S = np.dot(Nc, Nd)
print(np.fft.irfft2(S))
When I print np.fft.irfft2(S), I get the result:
[[6, 28], [10, 46]]
But from what I understood about Fourier space, multiplication there would mean addition outside of it, so shouldn't I get S = c + d?
Am I doing something wrong using the FFT function or is my assumption that S should equal c plus d wrong?
There is a little misunderstanding here:
Multiplication in Fourier space corresponds to convolution in the spatial domain and not to addition.
There is no way to speed up addition in that way.
If you want to compute c+d through the Fourier domain, you'd have to add the two spectra, not multiply them:
np.fft.irfft2(Nc+Nd) == c+d # (up to numerical precision)
Of course, this is much slower than simply adding the matrices in the spatial domain.
As @Florian said, it is convolution, not addition, that can be sped up by multiplying in the Fourier domain.
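A minimal sketch of both facts, reusing the arrays from the question (note that the posted code also uses np.dot, i.e. matrix multiplication of the spectra, rather than element-wise multiplication):
import numpy as np
c = np.asarray(((1, 2), (2, 3)), dtype=float)
d = np.asarray(((1, 4), (1, 5)), dtype=float)
Nc = np.fft.rfft2(c)
Nd = np.fft.rfft2(d)
# Adding the spectra and transforming back recovers the ordinary sum.
print(np.allclose(np.fft.irfft2(Nc + Nd, s=c.shape), c + d))  # True
# Multiplying the spectra element-wise corresponds to *circular convolution*
# of c and d (the convolution theorem), not to their sum.
print(np.fft.irfft2(Nc * Nd, s=c.shape))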
Before running a GMM clustering model, I use a StandardScaler to transform my data into a zero-mean, unit-standard-deviation dataset.
Having performed the clustering, I am interested in representing the learned clusters back in the original space rather than in the 0-mean, 1-standard-deviation one, where the feature values make more sense.
Is it then correct to do the following:
1) Get the mean by multiplying the mean of each GMM cluster by the scaler.mean_ parameters.
2) Get the standard deviation by multiplying the square of the diagonal covariance matrix by the scaler.std_ parameters.
I'd appreciate any feedback,
Thank you!
For the cluster centers you can use scaler.inverse_transform() directly (because they live in the same space as your data). It scales each column back up by its standard deviation and adds the column mean back.
import numpy as np
from sklearn.preprocessing import StandardScaler
X = np.random.randn(10, 3)
scaler = StandardScaler()
scaler.fit(X)
You will then see that
scaler.inverse_transform(scaler.transform(X)) - X
is equal to, or extremely close to, 0, making the two essentially equal. In order to automate your pipeline, you should also take a look at sklearn.pipeline.Pipeline, with which you can chain your processing steps and invoke the transform and inverse_transform methods.
As for rescaling the covariances, you should multiply your cluster covariance matrices by np.diag(scaler.std_) on the left and on the right.
To answer your question:
1) You obtain the mean by multiplying the cluster means by scaler.std_ and adding scaler.mean_ back.
2) You rescale the cluster covariances by multiplying on the left and on the right by np.diag(scaler.std_), i.e. rescaled_cov = np.diag(scaler.std_).dot(cov).dot(np.diag(scaler.std_))
Note: If your covariance matrices are rather large, you may not want to create another (diagonal, but dense) matrix of the same size. The operation scaler.std_[:, np.newaxis] * cov * scaler.std_ is equivalent mathematically to 2) but does not require creating the diagonal matrix.
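A minimal end-to-end sketch of both points, using made-up data and a GaussianMixture with full covariances (note that current scikit-learn exposes the per-feature standard deviation as scaler.scale_; older versions called it scaler.std_):
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = rng.normal(loc=[5.0, -2.0, 10.0], scale=[1.0, 3.0, 0.5], size=(500, 3))

scaler = StandardScaler().fit(X)
gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0)
gmm.fit(scaler.transform(X))

# 1) Cluster centers: map them back to the original space with inverse_transform.
centers_original = scaler.inverse_transform(gmm.means_)

# 2) Cluster covariances: rescale by the per-feature std on both sides,
#    equivalent to np.diag(std) @ cov @ np.diag(std) for each component.
std = scaler.scale_  # per-feature standard deviation (formerly scaler.std_)
covs_original = std[np.newaxis, :, np.newaxis] * gmm.covariances_ * std[np.newaxis, np.newaxis, :]

print(centers_original)     # component means expressed in the original units
print(covs_original.shape)  # (2, 3, 3)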