Reconstructing a matrix from an SVD in python 3 - python-3.x

I have a matrix that I've decomposed with an SVD and have the factors in the variables u, s, and v. I've made some alterations to s, turning it into a diagonal matrix and changing some of its values. Now I'm trying to multiply the three matrices back together to reconstruct a regular matrix. Does anyone know of a function that does this? I can't seem to find any examples of it in numpy.

The only mildly tricky bit is "expanding" s back into a rectangular diagonal matrix. If you have scipy installed, scipy.linalg.diagsvd can do that for you:
>>> import numpy as np
>>> import scipy.linalg as la
>>>
>>> rng = np.random.default_rng()
>>> A = rng.uniform(-1,1,(4,3))
>>> u,s,v = np.linalg.svd(A)
>>>
>>> B = u @ la.diagsvd(s, *A.shape) @ v
>>>
>>> np.allclose(A,B)
True

I figured it out: multiplying the three matrices u, s (expanded back to a diagonal matrix), and v together with np.matmul() was enough to recover the original matrix.
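For reference, a minimal sketch of that matmul route (not the asker's exact code; it assumes full_matrices=False so np.diag(s) lines up without padding):
import numpy as np

rng = np.random.default_rng()
A = rng.uniform(-1, 1, (4, 3))

# with full_matrices=False the shapes are u (4, 3), s (3,), vh (3, 3)
u, s, vh = np.linalg.svd(A, full_matrices=False)

# expand s to a diagonal matrix and multiply the three factors back together
B = np.matmul(np.matmul(u, np.diag(s)), vh)
print(np.allclose(A, B))  # True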

Related

How can I interpolate a numpy array so that it becomes a certain length?

I have three numpy arrays each with different lengths:
A.shape = (3401,)
B.shape = (2200,)
C.shape = (4103,)
I would like to average the three arrays to produce a new array with the size of the largest array (in this case C):
D.shape = (4103,)
Problem is, I don't think I can do this without adding "fake" data to A and B, by interpolation.
How can I perform interpolation on the first two numpy arrays so that they are of the same length as array C?
Do I even need to interpolate here?
The first thing that comes to mind is zoom from scipy:
The array is zoomed using spline interpolation of the requested order.
Code:
import numpy as np
from scipy.ndimage import zoom
A = np.random.rand(3401)
B = np.random.rand(2200)
C = np.ones(4103)
for arr in [A, B]:
    zoom_rate = C.shape[0] / arr.shape[0]
    arr = zoom(arr, zoom_rate)
    print(arr.shape)
Output:
(4103,)
(4103,)
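Note that the loop above only demonstrates the target shape; rebinding arr does not change A or B themselves. A minimal follow-up sketch (not part of the original answer) that keeps the resampled copies and averages them with C:
A_r = zoom(A, C.shape[0] / A.shape[0])  # resample A up to 4103 samples
B_r = zoom(B, C.shape[0] / B.shape[0])  # resample B up to 4103 samples
D = np.mean([A_r, B_r, C], axis=0)      # element-wise average of the three arrays
print(D.shape)  # (4103,)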
I think the simplest option is to do the following:
D = np.concatenate([np.average([A[:2200], B, C[:2200]], axis=0),
                    np.average([A[2200:3401], C[2200:3401]], axis=0),
                    C[3401:]])

how to remove element pairs from numpy array?

I have an array:
coordinates = np.asarray(list(product(seq, seq))) - fieldSize_va/2.0
This coordinates array is of numpy.ndarray type, with 1600 elements (pairs), and looks like:
>>> array([[-4.5, -4.5], [-4.5, -4.26923077], [-4.5 , -4.03846154], ..., [4.5, 4.03846154], [4.5, 4.26923077], [4.5, 4.5]])
I have another array:
centralLines = np.asarray([(xa, ya),(xa, yb),(xb, ya),(xb, yb)])
which has values as:
>>> array([[ 0.11538462, 0.11538462], [ 0.11538462, -0.11538462], [-0.11538462, 0.11538462], [-0.11538462, -0.11538462]])
The coordinates variable contains all the pairs that are in the centralLines variable. I want to remove the centralLines pairs from coordinates. How can I do this?
The coordinates variable is computed using the following code:
import math
import numpy as np
from itertools import product
from numpy import linspace,degrees,random
N = 40 * 40
fieldSize_va = 9
seq = linspace(0, fieldSize_va, int(math.sqrt(N)))  # linspace expects an integer count
coordinates = np.asarray(list(product(seq, seq))) - fieldSize_va/2.0
Solution
One easy way to solve this would be to sweep the original array and keep only the pairs that do not appear in centralLines:
result = np.array([position for position in coordinates
                   if not (centralLines == position).all(axis=1).any()])
However, I must warn you that this solution is not optimized. Perhaps somebody else will come up with a faster vectorized solution.
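One vectorized possibility (a sketch, not from the original answer) compares every pair against centralLines with broadcasting and keeps the rows with no match:
# (1600, 1, 2) against (4, 2) broadcasts to (1600, 4, 2); a row matches only if
# both coordinates of some centralLines pair are equal (exact float comparison;
# swap == for np.isclose if the values were computed along a different path)
matches = (coordinates[:, None, :] == centralLines).all(axis=2)
result = coordinates[~matches.any(axis=1)]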
Sidenote 1
I would recommend following the common guidelines of Python style, namely PEP 8.
Sidenote 2
Importing numpy just once improves readability of your code!
Repetitive:
import numpy as np
from numpy import linspace
seq = linspace(0, fieldSize_va, math.sqrt(N))
Better:
import numpy as np
seq = np.linspace(0, fieldSize_va, int(math.sqrt(N)))
Sidenote 3
The square root is already included in numpy as np.sqrt, so you can do without importing the math module entirely (note that linspace expects an integer sample count, hence the int() cast above).

How to reverse one hot encoded value to Label?

I am working on a simple dataset to detect rock or mine, with the class names 'R' and 'M'. I have one-hot encoded R to 1 and M to 0. Now I want to reverse it.
I have tried many ways but couldn't find an approach to convert 1 back to R and 0 back to M.
import numpy as np
import pandas as pd
import keras
from sklearn.preprocessing import LabelEncoder
df = pd.read_csv('D:\\Datasets\\node-fussy-examples-master\\node-fussy-examples-master\\sonar\\training.csv')
ds=df.values
x_train=df[df.columns[0:60]].values
y_train=df[df.columns[60]]
encoder = LabelEncoder()
encoder.fit(y_train)
encoded_Y = encoder.transform(y_train)
I expect 1 to be R and 0 to be M
You can use the inverse_transform method:
from sklearn import preprocessing
le = preprocessing.LabelEncoder()
le.fit([1, 2, 2, 6])
print(le.transform([1, 1, 2, 6]))
print(le.inverse_transform([0, 0, 1, 2]))
If you need to do the same thing in Tensorflow, look at this thread.
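Applied to the variables in the question, a minimal sketch (reusing the encoder and encoded_Y defined above) would be:
# map the integer labels back to the original 'R'/'M' strings
decoded = encoder.inverse_transform(encoded_Y)
print(decoded[:5])  # first five labels as 'R'/'M' strings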
I just came across a use case today where I needed to convert a one-hot-encoded tensor back to a normal label tensor. I know you can use np.argmax(probs, axis=1) or something similar to reverse a one-hot-encoded probability tensor, but that didn't work in my case because my data was not a soft probability tensor but a label tensor filled with either 0 or 1. I know this is not entirely relevant to the OP's question, but I thought someone might need to do something similar, so I'll just write my solution down here.
def reverse_onehot(onehot_data):
    # onehot_data assumed to be channel-last
    data_copy = np.zeros(onehot_data.shape[:-1])
    for c in range(onehot_data.shape[-1]):
        img_c = onehot_data[..., c]
        data_copy[img_c == 1] = c
    return data_copy
Let's say y is your one-hot-encoded array. Then the following should give you the labels back:
unique_classes[np.argmax(y, axis=1)]
assuming you used unique_classes for encoding too (order is important).
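A small self-contained sketch of that argmax approach; unique_classes here is a hypothetical array holding the class names in the order used for encoding:
import numpy as np

unique_classes = np.array(['M', 'R'])            # hypothetical encoding order
y = np.array([[1, 0], [0, 1], [0, 1], [1, 0]])   # one-hot encoded rows
labels = unique_classes[np.argmax(y, axis=1)]
print(labels)  # ['M' 'R' 'R' 'M']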

Print L and U matrices calculated by SuperLU using scipy

How can I print sparse L and U matrices calculated by splu, which uses SuperLU?
My MWE:
>>> import scipy
>>> import scipy.sparse
>>> import scipy.sparse.linalg
>>> from numpy import array
>>> M = array([[19, 0, 21, 21, 0], [12, 21, 0, 0, 0], [0, 12, 16, 0, 0], [0, 0, 0, 5, 21], [12, 12, 0, 0, 18]])
>>> cscM = scipy.sparse.csc_matrix(M)
>>> lu_obj = scipy.sparse.linalg.splu(cscM)
>>> b = array([1, 2, 3, 4, 5])
>>> lu_obj.solve(b)
array([ 0.01245301, 0.08812209, 0.12140843, -0.08505639, 0.21072771])
You can use
lu_obj = scipy.sparse.linalg.splu(A)
L, U = lu_obj.L, lu_obj.U
in the current scipy version, which returns the factors in CSC format (see the scipy docs).
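For the MWE above, a minimal sketch of printing the factors (attribute names assume the SuperLU object returned by the current splu):
lu_obj = scipy.sparse.linalg.splu(cscM)
print(lu_obj.L.toarray())            # unit lower-triangular factor, stored as CSC
print(lu_obj.U.toarray())            # upper-triangular factor, stored as CSC
print(lu_obj.perm_r, lu_obj.perm_c)  # row/column permutation indices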
Glancing through the scipy docs and source, scipy.sparse.linalg.splu does indeed use SuperLU. It looks like SuperLU may not explicitly calculate L or U. L and U tend to be denser than your original sparse matrix, so it makes sense to avoid storing them if they are not needed. If it is any consolation, your lu_obj does contain the permutation info for L and U: lu_obj.perm_c, lu_obj.perm_r.
To get L and U, the path of least work is to use scipy.linalg.lu. You'll have to convert your sparse matrices to dense ones, though, i.e.
P, L, U = scipy.linalg.lu(cscM.todense())

Measure the uniformity of distribution of points in a 2D square

I am currently running into this problem: I have a 2D square, and have a set of points inside it, say, 1000 points. I need a way to see if the distribution of points inside the square are spread out (or more or less uniformly distributed) or they tend to gather together in some spot area inside the square.
I need a mathematical/statistical (not programming) way to determine this. I googled and found things like goodness-of-fit and Kolmogorov tests, and I wonder if there are other approaches. I need this for a class paper.
So: Inputs: a 2D square, and 1000 points.
Output: yes/no (yes = evenly spread out, no = gathering together in some spots).
Any idea would be appreciated.
Thanks
If your points are independent you can just check the distribution for each dimension individually. The Kolmogorov-Smirnov test (a measure of the distance between 2 distributions) is a good test for this. First let's generate and plot some Gaussian-distributed points so you can see how you can use the KS test (statistic) to detect a nonuniform distribution.
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> X = np.random.randn(1000, 2)  # 1000 2-D points, normally distributed
>>> from sklearn.preprocessing import MinMaxScaler
>>> scaler = MinMaxScaler()
>>> X = scaler.fit_transform(X)  # rescale each column to the default range [0, 1]
>>> X
array([[ 0.46169481, 0.7444449 ],
[ 0.49408692, 0.5809512 ],
...,
[ 0.60877526, 0.59758908]])
>>> plt.scatter(X[:, 0], X[:, 1])
>>> from scipy import stats
>>> from sklearn.preprocessing import StandardScaler
>>> stats.kstest(X[:, 0], 'uniform')  # this column is already scaled to [0, 1]
KstestResult(statistic=0.24738043186386116, pvalue=0.0)
The low p-value and high KS statistic (distance from the uniform distribution) say that the data almost certainly did not come from a uniform distribution between 0 and 1.
>>> stats.kstest(StandardScaler().fit_transform(X[:, [0]]).ravel(), 'norm')
KstestResult(statistic=0.028970945967462303, pvalue=0.36613946547024456)
But the standardized values probably did come from a normal distribution with mean 0 and standard deviation 1, given the high p-value and low KS distance.
Then you'd just repeat the KS tests for the second dimension (Y).
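A compact sketch of that per-dimension loop (pts here is a hypothetical array of points already scaled into the unit square):
from scipy import stats
import numpy as np

pts = np.random.rand(1000, 2)  # replace with your points scaled to the unit square
for dim, name in enumerate(['X', 'Y']):
    stat, p = stats.kstest(pts[:, dim], 'uniform')
    print(f'{name}: KS statistic={stat:.3f}, p-value={p:.3g}')
# small p-values in either dimension suggest the points are not uniformly spread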
