Torch mv behavior not understandable - pytorch

The following screenshots show that torch.mv is unusable in a situation that obviously seem to be correct... how is this possible, any idea what can be the problem?
this first image shows the correct situation, where the vector has 10 rows for a matrix of 10 columns, but I showed the other also just in case. Also swapping w.mv(x) for x.mv(w) does not make a difference.
However, the # operator works... the thing is that for my own reasons I want to use mv, so I would like to know what the problem is.

According to documentation:
torch.mv(input, vec, *, out=None) → Tensor
If input is a (n×m) tensor, vec is a 1-D tensor of size m, out will be 1-D of size n.
The x here should be 1-D, but in your case it's 10x1 (2D). You can remove extra dimension (or create a single dimension x)
>>> w.mv(x.squeeze())
tensor([ 0.1432, -2.0639, -2.1871, -1.8837, 0.7333, -0.4000, 0.4023, -1.1318,
0.0423, -1.2136])
>>> w # x
tensor([[ 0.1432],
[-2.0639],
[-2.1871],
[-1.8837],
[ 0.7333],
[-0.4000],
[ 0.4023],
[-1.1318],
[ 0.0423],
[-1.2136]])

Related

Retrieve elements from a 3D tensor with a 2D index tensor

I am playing around with GPT2 and I have 2 tensors:
O: An output tensor of shaped (B, S-1, V) where B is the batch size S is the the number of timestep and V is the vocabulary size. This is the output of a generative model and is softmaxed along the 2nd dimension.
L: A 2D tensor shaped (B, S-1) where each element is the index of the correct token for each timestep for each sample. This is basically the labels.
I want to extract the predicted probability of the corresponding correct token from tensor O based on tensor L such that I will end up with a 2D tensor shaped (B, S). Is there an efficient way of doing this apart from using loops?
For reference, I based my answer on this Medium article.
Essentially, your answer lies in torch.gather, assuming that both of your tensors are just regular torch.Tensors (or can be converted to one).
import torch
# Specify some arbitrary dimensions for now
B = 3
V = 6
S = 4
# Make example reproducible
torch.manual_seed(42)
# L necessarily has to be a torch.LongTensor, otherwise indexing will fail.
L = torch.randint(0, V, size=[B, S])
O = torch.rand([B, S, V])
# Now collect the results. L needs to have similar dimension,
# except in the axis you want to collect along.
X = torch.gather(O, dim=2, index=L.unsqueeze(dim=2))
# Make sure X has no "unnecessary" dimension
X = X.squeeze(dim=2)
It is a bit difficult to see whether this produces the exact correct results, which is why I included a random seed which makes the example deterministic in the result, and you an easily verify that it gets you the desired results. However, for clarification, one could also use a lower-dimensional tensor, for which this becomes clearer what exactly torch.gather does.
Note that torch.gather also allows you to index multiple indexes in the same row theoretically. Meaning if you instead got a multiclass example for which multiple values are correct, you could similarly use a tensor L of shape [B, S, number_of_correct_samples].

ValuerError regarding dimensions when declaring PyTorch tensor

I'm currently trying to convert a list of values into a PyTorch tensor and am facing some difficulties.
The exact code that's causing the error is:
input_tensor = torch.cuda.FloatTensor(data)
Here, data is a list with two elements: The first element is another list of NumPy arrays and the second element is a list of tuples. The sizes of both lists differ, and I believe this is causing the following error:
*** ValueError: expected sequence of length x at dim 2 (got y)
Usually y is larger than x. I've tried playing around with an IPython terminal to see what's wrong, and it appears that trying to convert data of this format directly into PyTorch tensors doesn't work. Taking each individual element of the data list and converting those into tensors works, though.
Does anybody know why this doesn't work and perhaps also be able to provide some feedback on how to achieve my original goal? Thanks in advance.
Let's say that the first sublist of data contains n 1D arrays, each of size m, and the second sublist contains k tuples, each of size p.
When calling torch.FloatTensor(data) each sublist is converted to a 2D tensor, of shape (n, m) and of shape (k, p) respectively; then they are stack together to form a 3D tensor. This is possible only if n=k and m=p -- think of a 3D tensor as a cuboid.
This is quite obvious I think, so I guess you have m = p and want to create a 2D tensor of shape (n+k, m) by simply concatenating the two sublists:
torch.FloatTensor(np.concatenate(data))

In PyTorch, what makes a tensor have non-contiguous memory?

According to this SO and this PyTorch discussion, PyTorch's view function works only on contiguous memory, while reshape does not. In the second link, the author even claims:
[view] will raise an error on a non-contiguous tensor.
But when does a tensor have non-contiguous memory?
This is a very good answer, which explains the topic in the context of NumPy. PyTorch works essentially the same. Its docs don't generally mention whether function outputs are (non)contiguous, but that's something that can be guessed based on the kind of the operation (with some experience and understanding of the implementation). As a rule of thumb, most operations preserve contiguity as they construct new tensors. You may see non-contiguous outputs if the operation works on the array inplace and change its striding. A couple of examples below
import torch
t = torch.randn(10, 10)
def check(ten):
print(ten.is_contiguous())
check(t) # True
# flip sets the stride to negative, but element j is still adjacent to
# element i, so it is contiguous
check(torch.flip(t, (0,))) # True
# if we take every 2nd element, adjacent elements in the resulting array
# are not adjacent in the input array
check(t[::2]) # False
# if we transpose, we lose contiguity, as in case of NumPy
check(t.transpose(0, 1)) # False
# if we transpose twice, we first lose and then regain contiguity
check(t.transpose(0, 1).transpose(0, 1)) # True
In general, if you have non-contiguous tensor t, you can make it contiguous by calling t = t.contiguous(). If t is contiguous, call to t.contiguous() is essentially a no-op, so you can do that without risking a big performance hit.
I think your title contiguous memory is a bit misleading. As I understand, contiguous in PyTorch means if the neighboring elements in the tensor are actually next to each other in memory. Let's take a simple example:
x = torch.tensor([[1, 2, 3], [4, 5, 6]]) # x is contiguous
y = torch.transpose(0, 1) # y is non-contiguous
According to documentation of tranpose():
Returns a tensor that is a transposed version of input. The given dimensions dim0 and dim1 are swapped.
The resulting out tensor shares it’s underlying storage with the input tensor, so changing the content of one would change the content of the other.
So that x and y in the above example share the same memory space. But if you check their contiguity with is_contiguous(), you will find that x is contiguous and y is not. Now you will find that contiguity does not refer to contiguous memory.
Since x is contiguous, x[0][0] and x[0][1] are next to each other in memory. But y[0][0] and y[0][1] is not. That is what contiguous means.

Copying data from one tensor to another using bit masking

import numpy as np
import torch
a = torch.zeros(5)
b = torch.tensor(tuple((0,1,0,1,0)),dtype=torch.uint8)
c= torch.tensor([7.,9.])
print(a[b].size())
a[b]=c
print(a)
torch.Size([2])tensor([0., 7., 0., 9., 0.])
I am struggling to understand how this works. I initially thought the above code was using Fancy indexing but I realised that values from c tensors are getting copied corresponding to the indices marked 1. Also, if I don't specify dtype of b as uint8 then the above code does not work. Can someone please explain me the mechanism of the above code.
Indexing with arrays works the same as in numpy and most other vectorized math packages I am aware of. There are two cases:
When b is of type uint8 (think boolean, pytorch doesn't distinguish bool from uint8), a[b] is a 1-d array containing the subset of values of a (a[i]) for which the corresponding in b (b[i]) was nonzero. These values are aliased to the original a so if you modify them, their corresponding locations will change as well.
The alternative type you can use for indexing is an array of int64, in which case a[b] creates an array of shape (*b.shape, *a.shape[1:]). Its structure is as if each element of b (b[i]) was replaced by a[i]. In other words, you create a new array by specifying from which indexes of a should the data be fetched. Again, the values are aliased to the original a, so if you modify a[b] the values of a[b[i]], for each i, will change. An example usecase is shown in this question.
These two modes are explained for numpy in integer array indexing and boolean array indexing, where for the latter you have to keep in mind that pytorch uses uint8 in place of bool.
Also, if your goal is to copy data from one tensor to another you have to keep in mind that an operation like a[ixs] = b[ixs] is an in-place operation (a is modified in place), which my not play well with autograd. If you want to do out of place masking, use torch.where. An example usecase is shown in this answer.

(Incremental)PCA's Eigenvectors are not transposed but should be?

When we posted a homework assignment about PCA we told the course participants to pick any way of calculating the eigenvectors they found. They found multiple ways: eig, eigh (our favorite was svd). In a later task we told them to use the PCAs from scikit-learn - and were surprised that the results differed a lot more than we expected.
I toyed around a bit and we posted an explanation to the participants that either solution was correct and probably just suffered from numerical instabilities in the algorithms. However, recently I picked that file up again during a discussion with a co-worker and we quickly figured out that there's an interesting subtle change to make to get all results to be almost equivalent: Transpose the eigenvectors obtained from the SVD (and thus from the PCAs).
A bit of code to show this:
def pca_eig(data):
"""Uses numpy.linalg.eig to calculate the PCA."""
data = data.T # data
val, vec = np.linalg.eig(data)
return val, vec
versus
def pca_svd(data):
"""Uses numpy.linalg.svd to calculate the PCA."""
u, s, v = np.linalg.svd(data)
return s ** 2, v
Does not yield the same result. Changing the return of pca_svd to s ** 2, v.T, however, works! It makes perfect sense following the definition by wikipedia: The SVD of X follows X=UΣWT where
the right singular vectors W of X are equivalent to the eigenvectors of XTX
So to get the eigenvectors we need to transposed the output v of np.linalg.eig(...).
Unless there is something else going on? Anyway, the PCA and IncrementalPCA both show wrong results (or eig is wrong? I mean, transposing that yields the same equality), and looking at the code for PCA reveals that they are doing it as I did it initially:
U, S, V = linalg.svd(X, full_matrices=False)
# flip eigenvectors' sign to enforce deterministic output
U, V = svd_flip(U, V)
components_ = V
I created a little gist demonstrating the differences (nbviewer), the first with PCA and IncPCA as they are (also no transposition of the SVD), the second with transposed eigenvectors:
Comparison without transposition of SVD/PCAs (normalized data)
Comparison with transposition of SVD/PCAs (normalized data)
As one can clearly see, in the upper image the results are not really great, while the lower image only differs in some signs, thus mirroring the results here and there.
Is this really wrong and a bug in scikit-learn? More likely I am using the math wrong – but what is right? Can you please help me?
If you look at the documentation, it's pretty clear from the shape that the eigenvectors are in the rows, not the columns.
The point of the sklearn PCA is that you can use the transform method to do the correct transformation.

Resources