python numpy stack matrices and add specific corner/column entries - python-3.x

Say we have two matrices A and B with a size of 2 by 2. Is there a command that can stack them horizontally and add A[:,1] to B[:,0] so that the resulting matrix C is 2 by 3, with C[:,0] = A[:,0], C[:,1] = A[:,1] + B[:,0], C[:,2] = B[:,1]. One step further, stacking them on diagonal so that C[0:2,0:2] = A, C[1:2,1:2] = B, C[1,1] = A[1,1] + B[0,0]. C is 3 by 3 in this case. Hard coding this routine is not hard, but I'm just curious since MATLAB has a similar function if my memory serves me well.

A straight forward approach is to copy or add the two arrays to a target:
In [882]: A=np.arange(4).reshape(2,2)
In [883]: C=np.zeros((2,3),int)
In [884]: C[:,:-1]=A
In [885]: C[:,1:]+=A # or B
In [886]: C
Out[886]:
array([[0, 1, 1],
[2, 5, 3]])
Another approach is to to pad A at the end, pad B at the start, and sum; while there is a convenient pad function, it won't be any faster.
And for the diagonal
In [887]: C=np.zeros((3,3),int)
In [888]: C[:-1,:-1]=A
In [889]: C[1:,1:]+=A
In [890]: C
Out[890]:
array([[0, 1, 0],
[2, 3, 1],
[0, 2, 3]])
Again the 2 arrays could be pad and added.
I'm not aware of any specialized function to do this; even if there were, it probably would do the same thing. This isn't a common enough operation to justify a compiled version.
I have built up finite element sparse matrices by adding over lapping element matrices. The sparse formats for both MATLAB and scipy facilitate this (duplicate coordinates are summed).
============
In [896]: np.pad(A,[[0,0],[0,1]],mode='constant')+np.pad(A,[[0,0],[1,0]],mode='
...: constant')
Out[896]:
array([[0, 1, 1],
[2, 5, 3]])
In [897]: np.pad(A,[[0,1],[0,1]],mode='constant')+np.pad(A,[[1,0],[1,0]],mode='
...: constant')
Out[897]:
array([[0, 1, 0],
[2, 3, 1],
[0, 2, 3]])
What's the special MATLAB code for doing this?
in Octave I found:
prepad(A,3,0,axis=2)+postpad(A,3,0,axis=2)

Related

torch matrix equaity sum operation

I want to do an operation similar to matrix multiplication, except instead of multiplying I want to check equality. The effect that I want to achieve is similar to the following:
a = torch.Tensor([[1, 2, 3], [4, 5, 6]]).to(torch.uint8)
b = torch.Tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]]).to(torch.uint8)
result = [[sum(a[i] == b [j]) for j in range(len(b))] for i in range(len(a))]
Is there a way that I can use einsum, or any other function in pytorch to achieve the above efficiently?
You can use torch.repeat and torch.repeat_interleave:
a = torch.Tensor([[1, 2, 3], [4, 5, 6]])
b = torch.Tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
mask = a.repeat_interleave(3, dim=0) == b.repeat((2, 1))
torch.sum(mask, axis=1).reshape(a.shape)
# output
tensor([[3, 0, 0],
[0, 3, 0]])
You can make use of the broadcasting to do the same, for instance with
result = (a[:, None, :] == b[None, :, :]).sum(dim=2)
Here None just introduces a dummy dimensions - alternatively you can use the less visual .unsqueeze() instead.
matrix multiplication is ij,jk->ik in einsum notation, all of these operations are equivalent with varying levels of verbosity:
a # b
torch.einsum("ij,jk", a, b)
torch.einsum("ij,jk->ik", a, b)
(a[:,:,None] * b[None,:,:]).sum(1)
"multiply i and k dimensions and reduce j dimension"
i, j, k i, j, k
a: (2, 3) => (2, 3, None)
b: (3, 3) (None, 3, 3)
It should now be clear from this function decomposition that multiplication can be replaced with any binary operation, e.g. the equality operation.
Unfortunately, there is no generalized form of einsum (AFAIK) in pytorch that swaps the multiplication "out-of-the-box". There is however the einops library which is basically a wrapper around deep learning frameworks such as PyTorch.

Pytorch merging list of tensors together

Let's say I have a list of tensors ([A , B , C ] where each tensor of is of shape [batch_size X 1024].
I want to merge all the tensors into a single tensor in the following way :
The first row in A is the first row in the new tensor, and the first row of B is the seocnd row in the new tensor, and the first row of C is the third row of the new tensor and so on and so forth.
So far I did it with for loops and this is not effictive at all.
Would love to hear about more efficent ways.
Thanks
Here's a minimal example that works:
import torch
a = torch.tensor([[1,1],[1,1]])
b = torch.tensor([[2,2],[2,2]])
c = torch.tensor([[3,3],[3,3]])
torch.stack([a,b,c],dim=0).view(6,2).t().contiguous().view(6,2)
The output is:
tensor([[1, 1],
[2, 2],
[3, 3],
[1, 1],
[2, 2],
[3, 3]])
In your case, view(6,2) should change to batch_size*3, 1024.
Solution adapted from PyTorch forums
where an example was shown with two tensors.

I am having trouble multiplying two matrices with numpy

I am trying to use numpy to multiply two matrices:
import numpy as np
A = np.array([[1, 3, 2], [4, 0, 1]])
B = np.array([[1, 0, 5], [3, 1, 2]])
I tested the process and ran the calculations manually, utilizing the formula for matrix multiplications. So, in this case, I would first multiply [1, 0, 5] x A, which resulted in [11, 9] and then multiply [3, 1, 2] x B, which resulted in [10, 14]. Finally, the product of this multiplication is [[11, 9], [10, 14]]
nevertheless, when I use numpy to multiply these matrices, I am getting an error:
ValueError: shapes (2,3) and (2,3) not aligned: 3 (dim 1) != 2 (dim 0)
Is there a way to do this with python, successfully?
Read the docs on matrix multiplication in numpy, specifically on behaviours.
The behavior depends on the arguments in the following way.
If both arguments are 2-D they are multiplied like conventional
matrices. If either argument is N-D, N > 2, it is treated as a stack
of matrices residing in the last two indexes and broadcast
accordingly. If the first argument is 1-D, it is promoted to a matrix
by prepending a 1 to its dimensions. After matrix multiplication the
prepended 1 is removed. If the second argument is 1-D, it is promoted
to a matrix by appending a 1 to its dimensions. After matrix
multiplication the appended 1 is removed.
to get your output, try transposing one before multiplying?
c=np.matmul(A,B.transpose())
array([[11, 10],
[ 9, 14]])

Numpy matrix addition vs ndarrays, convenient oneliner

How does numpy's matrix class work? I understand it will likely be removed in the future, so I am trying to understand how it works, so I can do the same with ndarrrays.
>>> x=np.matrix([[1,1,1],[2,2,2],[3,3,3]])
>>> x[:,0] + x[0,:]
matrix([[2, 2, 2],
[3, 3, 3],
[4, 4, 4]])
Seems like a row of ones got added to every row.
>>> x=np.matrix([[1,2,3],[1,2,3],[1,2,3]])
>>> x[0,:] + x[:,0]
matrix([[2, 3, 4],
[2, 3, 4],
[2, 3, 4]])
Now it seems like a column of ones got added to every column. What it does it with the identity is even weirder,
>>> x=np.matrix([[1,0,0],[0,1,0],[0,0,1]])
>>> x[0,:] + x[:,0]
matrix([[2, 1, 1],
[1, 0, 0],
[1, 0, 0]])
EDIT:
It seems if you take a (N,1) shape matrix and add it to a (N,1) shape matrix, then of one these is replicated to form a (N,N) matrix and the other is added to every row or column of this new matrix. It seems to be a convenience restricted to vectors of the right sizes. A nice use case was networkx's implementation of Floyd-Warshal.
Is there an equivalently convenient one-liner for this using standard numpy ndarrays?

scikit-learn: Get selected features for prediction data

I have a training set of data. The python script for creating the model also calculates the attributes into a numpy array (It's a bit vector). I then want to use VarianceThreshold to eliminate all features that have 0 variance (eg. all 0 or 1). I then run get_support(indices=True) to get the indices of the select columns.
My issue now is how to get only the selected features for the data I want to predict. I first calculate all features and then use array indexing but it does not work:
x_predict_all = getAllFeatures(suppl_predict)
x_predict = x_predict_all[indices] #only selected features
indices is a numpy array.
The returned array x_predict has the correct length len(x_predict) but wrong shape x_predict.shape[1] which is still the original length. My classifier then throws an error due to wrong shape
prediction = gbc.predict(x_predict)
File "C:\Python27\lib\site-packages\sklearn\ensemble\gradient_boosting.py", li
ne 1032, in _init_decision_function
self.n_features, X.shape[1]))
ValueError: X.shape[1] should be 1855, not 2090.
How can I solve this issue?
You can do it like this:
Test data
from sklearn.feature_selection import VarianceThreshold
X = np.array([[0, 2, 0, 3],
[0, 1, 4, 3],
[0, 1, 1, 3]])
selector = VarianceThreshold()
Alternative 1
>>> selector.fit(X)
>>> idxs = selector.get_support(indices=True)
>>> X[:, idxs]
array([[2, 0],
[1, 4],
[1, 1]])
Alternative 2
>>> selector.fit_transform(X)
array([[2, 0],
[1, 4],
[1, 1]])

Resources