Understanding an Einsum usage for graph convolution - pytorch

I am reading the code for the spatial-temporal graph convolution operation here:
https://github.com/yysijie/st-gcn/blob/master/net/utils/tgcn.py and I'm having some difficulty understanding what is happening with an einsum operation. In particular
For x a tensor of shape (N, kernel_size, kc//kernel_size, t, v), where
kernel_size is typically 3, and lets say kc=64*kernel_size, t is the number of frames, say 64, and v the number of vertices, say 25. N is the batch size.
Now for a tensor A of shape (3, 25, 25) where each dimension is a filtering op on the graph vertices, an einsum is computed as:
x = torch.einsum('nkctv,kvw->nctw', (x, A))
I'm not sure how to interpret this expression. What I think it's saying is that for each batch element, for each channel c_i out of 64, sum each of the three matrices obtained by matrix multiplication of the (64, 25) feature map at that channel with the value of A[i]. Do I have this correct? The expression is a bit of a mouthful, and notation wise there seems to be a bit of a strange usage of kc as one variable name, but then decomposition of k as kernel size and c as the number of channels (192//3 = 64) in the expression for the einsum. Any insights appreciated.

Helps when you look closely at the notation:
nkctv for left side
kvw on the right side
nctw being the result
Missing from the result are:
k
v
These elements are summed together into a single value and squeezed, leaving us the resulting shape.
Something along the lines of (expanded shapes (added 1s) are broadcasted and sum per element):
left: (n, k, c, t, v, 1)
right: (1, k, 1, 1, v, w)
Now it goes (l, r for left and right):
torch.mul(l, r)
torch.sum(l, r, dim=(1, 4))
squeeze any singular dimensions
It is pretty hard to get, hence Einstein's summation helps in terms of thinking about resulting shapes “mixed” with each other, at least for me.

Y = torch.einsum('nkctv,kvw->nctw', (x, A)) means:
einsum interpretation on graph
For better understanding, I have replaced the x in left hand side with Y

Related

PyTorch doubly stochastic normalisation of 3D tensor

I'm trying to implement double stochastic normalisation of an N x N x P tensor as described in Section 3.2 in Gong, CVPR 2019. This can be done easily in the N x N case using matrix operations but I am stuck with the 3D tensor case. What I have so far is
def doubly_stochastic_normalise(E):
"""E: n x n x f"""
E = E / torch.sum(E, dim=1, keepdim=True) # normalised across rows
F = E / torch.sum(E, dim=0, keepdim=True) # normalised across cols
E = torch.einsum('ijp,kjp->ikp', E, F)
return E
but I'm wondering if there is a method without einsum.
In this setting, you can always fall back to using torch.matmul (batched matrix multiplication to be more precise). However, this requires you to transpose the axis. Recall the matrix multiplication for two 3D inputs, in einsum notation, it gives us:
bik,bkj->bij
Notice how the k dimension gets reduces. To get to this setting, we need to transpose the inputs of the operator. In your case we have:
ijp ? kjp -> ikp
↓ ↓ ↑
pij # pjk -> pik
This translates to:
>>> (E.permute(2,0,1) # F.permute(2,1,0)).permute(1,2,0)
# ijp ➝ pij kjp ➝ pjk pik ➝ ikp
You can argue your method is not only shorter but also a lot more readable. I would therefore stick with torch.einsum. The reason why the einsum operator is so useful here is because you can perform axes transpositions on the fly.

Vectorized implementation of field-aware factorization

I would like to implement the field-aware factorization model (FFM) in a vectorized way. In FFM, a prediction is made by the following equation
where w are the embeddings that depend on the feature and the field of the other feature. For more info, see equation (4) in FFM.
To do so, I have defined the following parameter:
import torch
W = torch.nn.Parameter(torch.Tensor(n_features, n_fields, n_factors), requires_grad=True)
Now, given an input x of size (batch_size, n_features), I want to be able to compute the previous equation. Here is my current (non-vectorized) implementation:
total_inter = torch.zeros(x.shape[0])
for i in range(n_features):
for j in range(i + 1, n_features):
temp1 = torch.mm(
x[:, i].unsqueeze(1),
W[i, feature2field[j], :].unsqueeze(0))
temp2 = torch.mm(
x[:, j].unsqueeze(1),
W[j, feature2field[i], :].unsqueeze(0))
total_inter += torch.sum(temp1 * temp2, dim=1)
Unsurprisingly, this implementation is horribly slow since n_features can easily be as large as 1000! Note however that most of the entries of x are 0. All inputs are appreciated!
Edit:
If it can help in any ways, here are some implementations of this model in PyTorch:
pytorch-fm
ctr_model_zoo
Unfortunately, I cannot figure out exactly how they have done it.
Additional update:
I can now obtain the product of x and W in a more efficient way by doing:
temp = torch.einsum('ij, jkl -> ijkl', x, W)
Thus, my loop is now:
total_inter = torch.zeros(x.shape[0])
for i in range(n_features):
for j in range(i + 1, n_features):
temp1 = temp[:, i, feature2field[j], :]
temp2 = temp[:, j, feature2field[i], :]
total_inter += 0.5 * torch.sum(temp1 * temp2, dim=1)
It is however still too long since this loop goes over for about 500 000 iterations.
Something that could potentially help you speed up the multiplication is using pytorch sparse tensors.
Also something that might work would be the following:
Create n arrays, one for each feature i that would hold its corresponding field factors in each row. e.g. for feature i = 0
[ W[0, feature2field[0], :],
W[0, feature2field[1], :],
W[0, feature2field[n], :]]
Then calculate the multiplication of those arrays, lets call them F, with X
R[i] = F[i] * X
So each element in R would hold the result of the multiplication, an array, of the F[i] with X.
Next you would multiply each R[i] with its transpose
R[i] = R[i] * R[i].T
Now you can do the summation in a loop like before
for i in range(n_features):
total_inter += torch.sum(R[i], dim=1)
Please take this with a grain of salt as i haven't tested it. In any case i think that it will point you in the right direction.
One problem that might occur is in the transpose multiplication in which each element will also be multiplied with itself and then be added in the sum. I don't think it will affect the classifier but in any case you can make the elements in the diagonal of the transpose and above 0 (including the diagonal).
Also although minor nevertheless please move the 1st unsqueeze operation outside of the nested for loop.
I hope it helps.

Compute sum of pairwise sums of two array's columns

I am looking for a way to avoid the nested loops in the following snippet, where A and B are two-dimensional arrays, each of shape (m, n) with m, n beeing arbitray positive integers:
import numpy as np
m, n = 5, 2
a = randint(0, 10, (m, n))
b = randint(0, 10, (m, n))
out = np.empty((n, n))
for i in range(n):
for j in range(n):
out[i, j] = np.sum(A[:, i] + B[:, j])
The above logic is roughly equivalent to
np.einsum('ij,ik', A, B)
with the exception that einsum computes the sum of products.
Is there a way, equivalent to einsum, that computes a sum of sums? Or do I have to write an extension for this operation?
einsum needs to perform elementwise multiplication and then it does summing (optional). As such it might not be applicable/needed to solve our case. Read on!
Approach #1
We can leverage broadcasting such that the first axes are aligned
and second axis are elementwise summed after extending dimensions to 3D. Finally, we need summing along the first axis -
(A[:,:,None] + B[:,None,:]).sum(0)
Approach #2
We can simply do outer addition of columnar summations of each -
A.sum(0)[:,None] + B.sum(0)
Approach #3
And hence, bring in einsum -
np.einsum('ij->j',A)[:,None] + np.einsum('ij->j',B)
You can also use numpy.ufunc.outer, specifically here numpy.add.outer after summing along axis 0 as #Divakar mentioned in #approach 2
In [126]: numpy.add.outer(a.sum(0), b.sum(0))
Out[126]:
array([[54, 67],
[43, 56]])

No N-dimensional tranpose in PyTorch

PyTorch's torch.transpose function only transposes 2D inputs. Documentation is here.
On the other hand, Tensorflow's tf.transpose function allows you to transpose a tensor of N arbitrary dimensions.
Can someone please explain why PyTorch does not/cannot have N-dimension transpose functionality? Is this due to the dynamic nature of the computation graph construction in PyTorch versus Tensorflow's Define-then-Run paradigm?
It's simply called differently in pytorch. torch.Tensor.permute will allow you to swap dimensions in pytorch like tf.transpose does in TensorFlow.
As an example of how you'd convert a 4D image tensor from NHWC to NCHW (not tested, so might contain bugs):
>>> img_nhwc = torch.randn(10, 480, 640, 3)
>>> img_nhwc.size()
torch.Size([10, 480, 640, 3])
>>> img_nchw = img_nhwc.permute(0, 3, 1, 2)
>>> img_nchw.size()
torch.Size([10, 3, 480, 640])
Einops supports verbose transpositions for arbitrary number of dimensions:
from einops import rearrange
x = torch.zeros(10, 3, 100, 100)
y = rearrange(x, 'b c h w -> b h w c')
x2 = rearrange(y, 'b h w c -> b c h w') # inverse to the first
(and the same code works for tensorfow as well)

Why do we need Theano reshape?

I don't understand why do we need tensor.reshape() function in Theano. It is said in the documentation:
Returns a view of this tensor that has been reshaped as in
numpy.reshape.
As far as I understood, theano.tensor.var.TensorVariable is some entity that is used for computation graphs creation. And it is absolutely independent of shapes. For instance when you create your function you can pass there matrix 2x2 or matrix 100x200. As I thought reshape somehow restricts this variety. But it is not. Suppose the following example:
X = tensor.matrix('X')
X_resh = X.reshape((3, 3))
Y = X_resh ** 2
f = theano.function([X_resh], Y)
print(f(numpy.array([[1, 2], [3, 4]])))
As I understood, it should give an error since I passed matrix 2x2 not 3x3, but it computes element-wise squares perfectly.
So what is the shape of the theano tensor variable and where should we use it?
There is an error in the provided code though Theano fails to point this out.
Instead of
f = theano.function([X_resh], Y)
you should really use
f = theano.function([X], Y)
Using the original code you are actually providing the tensor after the reshape so the reshape command never gets executed. This can be seen by adding
theano.printing.debugprint(f)
which prints
Elemwise{sqr,no_inplace} [id A] '' 0
|<TensorType(float64, matrix)> [id B]
Note that there is no reshape operation in this compiled execution graph.
If one changes the code so that X is used as the input instead of X_resh then Theano throws an error including the message
ValueError: total size of new array must be unchanged Apply node that
caused the error: Reshape{2}(X, TensorConstant{(2L,) of 3})
This is expected because one cannot reshape a tensor with shape (2, 2) (i.e. 4 elements) into a tensor with shape (3, 3) (i.e. 9 elements).
To address the broader question, we can use symbolic expressions in the target shape and those expressions can be functions of the input tensor's symbolic shape. Here's some examples:
import numpy
import theano
import theano.tensor
X = theano.tensor.matrix('X')
X_vector = X.reshape((X.shape[0] * X.shape[1],))
X_row = X.reshape((1, X.shape[0] * X.shape[1]))
X_column = X.reshape((X.shape[0] * X.shape[1], 1))
X_3d = X.reshape((-1, X.shape[0], X.shape[1]))
f = theano.function([X], [X_vector, X_row, X_column, X_3d])
for output in f(numpy.array([[1, 2], [3, 4]])):
print output.shape, output

Resources