Scipy sparse.kron gives non-sparse matrix - python-3.x

I am getting unexpected non-sparse results when using the kron method of Scipy's sparse module. Specifically, matrix elements that are equal to zero after performing the Kronecker product are being kept in the result, and I'd like to understand what I should do to ensure the output is still fully sparse.
Here's an example of what I mean, taking the Kronecker product of two copies of the identity:
import scipy.sparse as sp
s = sp.eye(2)
S = sp.kron(s,s)
S
<4x4 sparse matrix of type '<class 'numpy.float64'>'
with 8 stored elements (blocksize = 2x2) in Block Sparse Row format>
print(S)
(0, 0) 1.0
(0, 1) 0.0
(1, 0) 0.0
(1, 1) 1.0
(2, 2) 1.0
(2, 3) 0.0
(3, 2) 0.0
(3, 3) 1.0
The sparse matrix S should only contain the 4 (diagonal) non-zero entries, but here it also has other entries that are equal to zero. Any pointers on what I am doing wrong would be much appreciated.

In Converting from sparse to dense to sparse again decreases density after constructing sparse matrix I point out that sparse.kron produces, by default, a BSR format matrix. That's what your display shows: those extra zeros are part of the dense blocks.
If you specify another format, kron will not produce those zeros:
In [672]: sparse.kron(s,s,format='csr')
Out[672]:
<4x4 sparse matrix of type '<class 'numpy.float64'>'
with 4 stored elements in Compressed Sparse Row format>
In [673]: _.A
Out[673]:
array([[1., 0., 0., 0.],
[0., 1., 0., 0.],
[0., 0., 1., 0.],
[0., 0., 0., 1.]])
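Alternatively, if you already have the BSR result, you can convert it and drop any explicitly stored zeros afterwards. A minimal sketch:
import scipy.sparse as sp
s = sp.eye(2)
S = sp.kron(s, s).tocsr()   # convert the default BSR result to CSR
S.eliminate_zeros()         # drop any explicitly stored zeros
print(repr(S))              # should now report 4 stored elements in CSR format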

Related

How to add to pytorch tensor at indices?

I have to admit, I'm a bit confused by the scatter* and index* operations - I'm not sure any of them do exactly what I'm looking for, which is very simple:
Given some 2-D tensor
z = tensor([[1., 1., 1., 1.],
[1., 1., 1., 1.],
[1., 1., 1., 1.]])
And a list (or tensor?) of 2-d indexes:
inds = tensor([[0, 0],
[1, 1],
[1, 2]])
I want to add a scalar to z at those indexes (and do it efficiently):
znew = z.something_add(inds, 3)
->
znew = tensor([[4., 1., 1., 1.],
[1., 4., 4., 1.],
[1., 1., 1., 1.]])
If I have to I can make that scalar a tensor of whatever shape (where all elements = 3), but I'd rather not...
You must provide two lists to your indexing: the first with the row positions and the second with the column positions. In your example, it would be:
z[[0, 1, 1], [0, 1, 2]] += 3
torch.Tensor indexing follows Numpy. See https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#integer-array-indexing for more details.
This code achieves what you want:
z_new = z.clone() # copy the tensor
z_new[inds[:, 0], inds[:, 1]] += 3 # modify selected indices of new tensor
In PyTorch, you can index each axis of a tensor with another tensor.
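Putting the two answers together, here is a self-contained sketch (the accumulate note at the end is an extra observation, not from the answers above):
import torch

z = torch.ones(3, 4)
inds = torch.tensor([[0, 0], [1, 1], [1, 2]])

z_new = z.clone()                    # keep the original tensor intact
z_new[inds[:, 0], inds[:, 1]] += 3   # rows from the first column of inds, columns from the second
print(z_new)

# Caveat: if inds contains repeated index pairs, += adds only once per pair;
# index_put_ with accumulate=True adds once per occurrence instead.
z_acc = z.clone()
z_acc.index_put_((inds[:, 0], inds[:, 1]), torch.tensor(3.0), accumulate=True)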

Pytorch select values from the last tensor dimension with indices from another tenor with a smaller dimension

I have a tensor a with three dimensions. The first dimension corresponds to minibatch size, the second to the sequence length, and the third to the feature dimension. E.g.,
>>> a = torch.arange(1, 13, dtype=torch.float).view(2,2,3) # Consider the values of a to be random
>>> a
tensor([[[ 1., 2., 3.],
[ 4., 5., 6.]],
[[ 7., 8., 9.],
[10., 11., 12.]]])
I have a second, two-dimensional tensor. Its first dimension corresponds to the minibatch size and its second dimension to the sequence length. It contains values in the range of the indices of the third dimension of a. Since a's third dimension has size 3, b can contain the values 0, 1 or 2. E.g.,
>>> b = torch.LongTensor([[0, 2],[1,0]])
>>> b
tensor([[0, 2],
[1, 0]])
I want to obtain a tensor c that has the shape of b and contains all the values of a that are referenced by b.
In the scenario above I would like to have:
c = torch.empty(2,2)
c[0,0] = a[0, 0, b[0,0]]
c[1,0] = a[1, 0, b[1,0]]
c[0,1] = a[0, 1, b[0,1]]
c[1,1] = a[1, 1, b[1,1]]
>>> c
tensor([[ 1., 6.],
[ 8., 10.]])
How can I create the tensor c fast? Further, I also want c to be differentiable (i.e. be able to use .backward()). I am not too familiar with PyTorch, so I am not sure if a differentiable version of this exists.
As an alternative, instead of c having the same shape as b, I could also use a c with the same shape as a, containing zeros everywhere except for ones at the places referenced by b. Then I could multiply a and c to obtain a differentiable tensor.
Like this:
c = torch.zeros(2,2,3, dtype=torch.float)
c[0,0,b[0,0]] = 1
c[1,0,b[1,0]] = 1
c[0,1,b[0,1]] = 1
c[1,1,b[1,1]] = 1
>>> a*c
tensor([[[ 1., 0., 0.],
[ 0., 0., 6.]],
[[ 0., 8., 0.],
[10., 0., 0.]]])
Let's declare the necessary variables first (notice requires_grad in a's initialization; we will use it to ensure differentiability):
a = torch.arange(1,13,dtype=torch.float32,requires_grad=True).reshape(2,2,3)
b = torch.LongTensor([[0, 2],[1,0]])
Let's reshape a and squash the minibatch and sequence dimensions:
temp = a.reshape(-1,3)
so temp now looks like:
tensor([[ 1., 2., 3.],
[ 4., 5., 6.],
[ 7., 8., 9.],
[10., 11., 12.]], grad_fn=<AsStridedBackward>)
Notice that each value of b can now be used in the corresponding row of temp to get the desired output. Now we do:
c = temp[range(len(temp)), b.view(-1)].view(b.size())
Notice how we index temp: range(len(temp)) selects each row, and the flattened b, i.e. b.view(-1), picks the corresponding column. Lastly, .view(b.size()) brings the result back to the same shape as b.
If we print c now:
tensor([[ 1., 6.],
[ 8., 10.]], grad_fn=<ViewBackward>)
The presence of grad_fn=... shows that c requires gradients, i.e. it is differentiable.
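For reference, torch.gather performs the same selection in a single call (a standard alternative, not part of the answer above), and the result is likewise differentiable:
import torch

a = torch.arange(1, 13, dtype=torch.float32, requires_grad=True).reshape(2, 2, 3)
b = torch.LongTensor([[0, 2], [1, 0]])

# gather along the last dim; b needs a trailing singleton dim to match a's rank
c = a.gather(2, b.unsqueeze(-1)).squeeze(-1)
print(c)   # tensor([[ 1., 6.], [ 8., 10.]]) with a grad_fn, so it is differentiable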

PyTorch: new_ones vs ones

In PyTorch, what is the difference between new_ones() and ones()? For example,
x2.new_ones(3,2, dtype=torch.double)
vs
torch.ones(3,2, dtype=torch.double)
For the sake of this answer, I am assuming that your x2 is a previously defined torch.Tensor. If we then head over to the PyTorch documentation, we can read the following on new_ones():
Returns a Tensor of size size filled with 1. By default, the
returned Tensor has the same torch.dtype and torch.device as this
tensor.
Whereas ones()
Returns a tensor filled with the scalar value 1, with the shape
defined by the variable argument sizes.
So, essentially, new_ones allows you to quickly create a new torch.Tensor with the same device and data type as a previously existing tensor (filled with ones), whereas ones() serves the purpose of creating a torch.Tensor from scratch (also filled with ones).
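In other words, the two spellings below should be roughly equivalent (a small sketch; x2 is assumed to be an existing tensor):
import torch

x2 = torch.rand(5, 3, dtype=torch.double)
a = x2.new_ones(3, 2)                                   # inherits dtype and device from x2
b = torch.ones(3, 2, dtype=x2.dtype, device=x2.device)  # the explicit equivalent
print(a.dtype == b.dtype, a.device == b.device)         # True True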
new_ones()
# defining the tensor along with device to run on. (Assuming CUDA hardware is available)
x = torch.rand(5, 3, device="cuda")
new_ones() works with an existing tensor: y will inherit the datatype from x and will run on the same device as defined for x.
y = x.new_ones(2, 2)
print(y)
Output:
tensor([[1., 1.],
[1., 1.]], device='cuda:0')
ones()
# defining tensor. By default it will run on CPU.
x = torch.ones(5, 3)
print(x)
Output:
tensor([[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.]])
ones() defines a tensor of the given size filled with 1s (as shown in the example) and does not depend on an existing tensor, whereas new_ones() works from an existing tensor, inherits properties like datatype and device from it, and defines a new tensor of the given size.

Keras custom layer/constraint to implement equal weights

I would like to create a layer in Keras such that:
y = Wx + c
where W is a block matrix of the form W = [[A, B], [B, A]] (A and B are square matrices), and c is a bias vector with repeated elements.
How can I implement these restrictions? I was thinking it could either be implemented in the MyLayer.build() when initializing weights or as a constraint where I can specify certain indices to be equal but I am unsure how to do so.
You can define such a W using the Concatenate layer.
import keras.backend as K
from keras.layers import Concatenate

A = K.placeholder()
B = K.placeholder()
row1 = Concatenate()([A, B])           # [A | B] along the last axis
row2 = Concatenate()([B, A])           # [B | A] along the last axis
W = Concatenate(axis=0)([row1, row2])  # stack the block rows: [[A, B], [B, A]]
Example evaluation:
import numpy as np
get_W = K.function(outputs=[W], inputs=[A, B])
get_W([np.eye(2), np.ones((2,2))])
Returns
[array([[1., 0., 1., 1.],
[0., 1., 1., 1.],
[1., 1., 1., 0.],
[1., 1., 0., 1.]], dtype=float32)]
To work out the exact solution, you can use the placeholder's shape argument. Addition (for the bias c) and multiplication (for Wx) are then quite straightforward.
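For completeness, the same weight-tying idea can be wrapped in a custom layer. The following is only a sketch under the assumption that W = [[A, B], [B, A]] and that the bias is a single repeated scalar; the class name TiedBlockDense and all shapes are illustrative, not from the original answer:
import keras.backend as K
from keras.layers import Layer

class TiedBlockDense(Layer):
    # Sketch: y = x.W + c with W = [[A, B], [B, A]] and a repeated scalar bias.
    def __init__(self, block_dim, **kwargs):
        self.block_dim = block_dim   # A and B are block_dim x block_dim
        super(TiedBlockDense, self).__init__(**kwargs)

    def build(self, input_shape):
        n = self.block_dim
        self.A = self.add_weight(name='A', shape=(n, n),
                                 initializer='glorot_uniform', trainable=True)
        self.B = self.add_weight(name='B', shape=(n, n),
                                 initializer='glorot_uniform', trainable=True)
        self.c = self.add_weight(name='c', shape=(1,),
                                 initializer='zeros', trainable=True)
        super(TiedBlockDense, self).build(input_shape)

    def call(self, x):
        # Rebuild W from the two trainable blocks; gradients flow back into A and B.
        row1 = K.concatenate([self.A, self.B], axis=1)   # [A | B]
        row2 = K.concatenate([self.B, self.A], axis=1)   # [B | A]
        W = K.concatenate([row1, row2], axis=0)          # [[A, B], [B, A]]
        return K.dot(x, W) + self.c                      # scalar bias broadcast to every output

    def compute_output_shape(self, input_shape):
        return (input_shape[0], 2 * self.block_dim)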

What is the difference between dtype= and .astype() in numpy?

Context: I would like to use numpy ndarrays with float32 instead of float64.
Edit: Additional context - I'm concerned about how numpy executes these calls because they will happen repeatedly as part of a backpropagation routine in a neural net. I'd like the net to carry out all addition/subtraction/multiplication/division in float32 for validation purposes, as I want to compare results with another group's work. It seems like initialization for methods like randn will always go from float64 to float32 with .astype() casting. Once my ndarray is of type float32, if I use np.dot for example, will those multiplications happen in float32? How can I verify?
The documentation is not clear to me - http://docs.scipy.org/doc/numpy/reference/generated/numpy.dot.html
I figured out I can just add .astype('float32') to the end of a numpy call, for example, np.random.randn(y, 1).astype('float32').
I also see that dtype=np.float32 is an option, for example, np.zeros(5, dtype=np.float32). However, trying np.random.randn((y, 1), dtype=np.float32) returns the following error:
b = np.random.randn((3,1), dtype=np.float32)
TypeError: randn() got an unexpected keyword argument 'dtype'
What is the difference between declaring the type as float32 using dtype and using .astype()?
Both b = np.zeros(5, dtype=np.float32) and b = np.zeros(5).astype('float32') when evaluated with:
print(b)
print(type(b))
print(b[0])
print(type(b[0]))
prints:
[ 0. 0. 0. 0. 0.]
<class 'numpy.ndarray'>
0.0
<class 'numpy.float32'>
Let's see if I can address some of the confusion I'm seeing in the comments.
Make an array:
In [609]: x=np.arange(5)
In [610]: x
Out[610]: array([0, 1, 2, 3, 4])
In [611]: x.dtype
Out[611]: dtype('int32')
The default for arange is to make an int32.
astype is an array method; it can be used on any array:
In [612]: x.astype(np.float32)
Out[612]: array([ 0., 1., 2., 3., 4.], dtype=float32)
arange also takes a dtype parameter:
In [614]: np.arange(5, dtype=np.float32)
Out[614]: array([ 0., 1., 2., 3., 4.], dtype=float32)
Whether it creates the int array first and converts it, or makes the float32 directly, isn't of any concern to me. This is a basic operation, done in compiled code.
I can also give it a float stop value, in which case it will give me a float array - the default float type.
In [615]: np.arange(5.0)
Out[615]: array([ 0., 1., 2., 3., 4.])
In [616]: _.dtype
Out[616]: dtype('float64')
zeros is similar; the default dtype is float64, but with a parameter I can change that. Since its primary task is to allocate memory, and it doesn't have to do any calculation, I'm sure it creates the desired dtype right away, without further conversion. But again, this is compiled code, and I shouldn't have to worry about what it is doing under the covers.
In [618]: np.zeros(5)
Out[618]: array([ 0., 0., 0., 0., 0.])
In [619]: _.dtype
Out[619]: dtype('float64')
In [620]: np.zeros(5,dtype=np.float32)
Out[620]: array([ 0., 0., 0., 0., 0.], dtype=float32)
randn involves a lot of calculation, and evidently it is compiled to work with the default float type. It does not take a dtype. But since the result is an array, it can be cast with astype.
In [623]: np.random.randn(3)
Out[623]: array([-0.64520949, 0.21554705, 2.16722514])
In [624]: _.dtype
Out[624]: dtype('float64')
In [625]: __.astype(np.float32)
Out[625]: array([-0.64520949, 0.21554704, 2.16722512], dtype=float32)
Let me stress that astype is a method of an array. It takes the values of the array and produces a new array with the desired dtype. It does not act retroactively (or in-place) on the array itself, or on the function that created that array.
The effect of astype is often (always?) the same as a dtype parameter, but the sequence of actions is different.
In https://stackoverflow.com/a/39625960/901925 I describe a sparse matrix creator that takes a dtype parameter, and implements it with an astype method call at the end.
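The randn case from the question can be handled the same way; a minimal sketch (the helper name randn32 is hypothetical, not a numpy function):
import numpy as np

def randn32(*shape, dtype=np.float32):
    # generate in the default float64, then cast at the end
    return np.random.randn(*shape).astype(dtype)

b = randn32(3, 1)
print(b.dtype)   # float32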
When you do calculations such as dot or *, numpy tries to match the output dtype to the inputs. In the case of mixed types it goes with the higher-precision alternative.
In [642]: np.arange(5,dtype=np.float32)*np.arange(5,dtype=np.float64)
Out[642]: array([ 0., 1., 4., 9., 16.])
In [643]: _.dtype
Out[643]: dtype('float64')
In [644]: np.arange(5,dtype=np.float32)*np.arange(5,dtype=np.float32)
Out[644]: array([ 0., 1., 4., 9., 16.], dtype=float32)
There are casting rules. One way to look those up is with the can_cast function:
In [649]: np.can_cast(np.float64,np.float32)
Out[649]: False
In [650]: np.can_cast(np.float32,np.float64)
Out[650]: True
It is possible in some calculations that it will cast the 32 to 64, do the calculation, and then cast back to 32. The purpose would be to avoid rounding errors. But I don't know how you find that out from the documentation or tests.
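To answer the "How can I verify?" part of the question, checking the result dtype directly is the simplest test; a small sketch:
import numpy as np

a = np.random.randn(3, 3).astype(np.float32)
b = np.random.randn(3, 3).astype(np.float32)
print(np.dot(a, b).dtype)              # float32: the product of two float32 arrays stays float32
print(np.result_type(a, np.float64))   # float64: mixing in a float64 operand promotes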
arr1 = np.array([25, 56, 12, 85, 34, 75])
arr2 = np.array([42, 3, 86, 32, 856, 46])
arr1.astype(np.complex)   # returns a new complex array; arr1 itself is unchanged (np.complex is deprecated in newer numpy; use complex or np.complex128)
print(arr1)
print(type(arr1[0]))
print(arr1.astype(np.complex))
arr2 = np.array(arr2, dtype='complex')   # rebuilds arr2 as a complex array
print(arr2)
print(type(arr2[0]))
Output for the above:
[25 56 12 85 34 75]
<class 'numpy.int64'>
[25.+0.j 56.+0.j 12.+0.j 85.+0.j 34.+0.j 75.+0.j]
[ 42.+0.j 3.+0.j 86.+0.j 32.+0.j 856.+0.j 46.+0.j]
<class 'numpy.complex128'>
It can be seen that astype returns a new, converted array without modifying arr1 in place (the result above was not assigned back), whereas constructing the array with dtype='complex' makes arr2 complex from the start.
.astype() copies the data.
>>> a = np.ones(3, dtype=float)
>>> a
array([ 1., 1., 1.])
>>> b = a.astype(int)
>>> b
array([1, 1, 1])
>>> np.may_share_memory(a, b)
False
Note that astype() copies the data even if the dtype is actually the same:
>>> c = a.astype(float)
>>> np.may_share_memory(a, c)
False
