I'm looking for a function like Tensor.compress(*dims), where dims=(int, ...) is a consecutive sequence of integers:
a = torch.rand(2,2,3)
b = a.compress(0,1)
b.size()
>>> (4,3)
I know view would work. However, if I don't know the shape of a in advance, I have to do an extra operation to get its size before calling view, which is not what I want.
You don't have to explicitly "do the math": Tensor.view can do some of it for you if you pass -1 as the size of one of the dimensions:
b = a.view(-1, *a.shape[2:])
b.shape
>>> torch.Size([4, 3])
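For the specific case of merging consecutive dimensions, PyTorch also has a built-in that needs no shape bookkeeping at all: Tensor.flatten(start_dim, end_dim) merges dims start_dim through end_dim (inclusive) and leaves the rest alone. A minimal sketch on the same tensor as in the question:

```python
import torch

a = torch.rand(2, 2, 3)

# Merge the consecutive dims 0..1 (inclusive) into one; the remaining dims are kept as-is
b = a.flatten(0, 1)
print(b.shape)  # torch.Size([4, 3])

# Same result as the view(-1, *a.shape[2:]) trick from the answer above
assert torch.equal(b, a.view(-1, *a.shape[2:]))
```

Unlike view, flatten also works on non-contiguous tensors (it copies when it has to).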
I am learning about the Transformer and reading the PyTorch documentation for MultiheadAttention. In its implementation, I saw this constraint:
assert self.head_dim * num_heads == self.embed_dim, "embed_dim must be divisible by num_heads"
Why is embed_dim required to be divisible by num_heads? If we go back to the multi-head attention equation, MultiHead(Q, K, V) = Concat(head_1, ..., head_h) W^O, where head_i = Attention(Q W_i^Q, K W_i^K, V W_i^V), and
assume:
Q, K, V are n x embed_dim matrices, and all the projection matrices W_i^Q, W_i^K, W_i^V are embed_dim x head_dim.
Then the concatenation [head_1, ..., head_h] is an n x (num_heads*head_dim) matrix,
W^O has size (num_heads*head_dim) x embed_dim,
and [head_1, ..., head_h] * W^O is an n x embed_dim output.
So I don't see why embed_dim must be divisible by num_heads.
Let's say we have num_heads=10000: the output is still n x embed_dim, since the matrix-matrix product with W^O absorbs this information.
From what I understood, it is a simplification they have added to keep things simple. Theoretically, we can implement the model like you proposed (similar to the original paper).
The PyTorch documentation mentions it briefly:
Note that `embed_dim` will be split across `num_heads` (i.e. each head will have dimension `embed_dim` // `num_heads`)
Also, if you look at the PyTorch implementation, you can see that it is a bit different from the originally proposed model (optimized, in my view). For example, it uses MatMul instead of Linear, and the Concat layer is omitted. See the image below, which shows the first encoder (with batch size 32, 10 words, 512 features).
P.S.:
If you want to see the model parameters (like in the image above), this is the code I used:
import torch
transformer_model = torch.nn.Transformer(d_model=512, nhead=8, num_encoder_layers=1,num_decoder_layers=1,dim_feedforward=11) # change params as necessary
tgt = torch.rand((20, 32, 512))
src = torch.rand((11, 32, 512))
torch.onnx.export(transformer_model, (src, tgt), "transformer_model.onnx")
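To illustrate the point that the divisibility constraint is a design simplification rather than a mathematical necessity, here is a minimal sketch of self-attention in the style of the original paper, where head_dim is chosen independently of embed_dim. It is not PyTorch's MultiheadAttention: the function name multihead_attention, the random weights, and the sizes (embed_dim=512, num_heads=7, head_dim=64) are illustrative assumptions, and masking, biases and dropout are left out.

```python
import torch
import torch.nn.functional as F

def multihead_attention(x, w_q, w_k, w_v, w_o):
    """Self-attention with per-head projections of arbitrary head_dim (hypothetical helper).

    x:             (n, embed_dim)
    w_q, w_k, w_v: (num_heads, embed_dim, head_dim)
    w_o:           (num_heads * head_dim, embed_dim)
    """
    # Project x separately for every head: (num_heads, n, head_dim)
    q = torch.einsum("nd,hdk->hnk", x, w_q)
    k = torch.einsum("nd,hdk->hnk", x, w_k)
    v = torch.einsum("nd,hdk->hnk", x, w_v)
    # Scaled dot-product attention per head: (num_heads, n, n)
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    heads = F.softmax(scores, dim=-1) @ v  # (num_heads, n, head_dim)
    # Concat(head_1, ..., head_h): (n, num_heads * head_dim)
    concat = heads.transpose(0, 1).reshape(x.shape[0], -1)
    return concat @ w_o  # (n, embed_dim)

n, embed_dim, num_heads, head_dim = 10, 512, 7, 64  # 512 is not divisible by 7
x = torch.rand(n, embed_dim)
w_q, w_k, w_v = (torch.randn(num_heads, embed_dim, head_dim) / embed_dim ** 0.5 for _ in range(3))
w_o = torch.randn(num_heads * head_dim, embed_dim) / embed_dim ** 0.5
out = multihead_attention(x, w_q, w_k, w_v, w_o)
print(out.shape)  # torch.Size([10, 512])
```

The shapes work out for any head_dim, because W^O maps num_heads * head_dim back to embed_dim. Tying head_dim to embed_dim // num_heads mainly keeps the cost of multi-head attention close to that of a single full-width head and lets one embed_dim x embed_dim projection be split into heads with a simple reshape.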
When you have a sequence of shape seq_len x emb_dim (i.e. 20 x 8) and you want to use num_heads=2, the sequence will be split along the emb_dim dimension, so you get two 20 x 4 sequences. You want every head to have the same shape, and if emb_dim isn't divisible by num_heads, this won't work. Take, for example, a 20 x 9 sequence and again num_heads=2: you would get 20 x 4 and 20 x 5, which do not have the same dimensions.
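The split described above can be sketched with view. split_heads is a hypothetical helper used only for illustration; it shows the idea of cutting the embedding dimension into equal slices, not PyTorch's internal code. torch.chunk is used at the end to show what an uneven split would look like.

```python
import torch

def split_heads(x: torch.Tensor, num_heads: int) -> torch.Tensor:
    """Split a (seq_len, emb_dim) tensor into (num_heads, seq_len, emb_dim // num_heads)."""
    seq_len, emb_dim = x.shape
    if emb_dim % num_heads != 0:
        raise ValueError(f"emb_dim={emb_dim} is not divisible by num_heads={num_heads}")
    head_dim = emb_dim // num_heads
    # (seq_len, emb_dim) -> (seq_len, num_heads, head_dim) -> (num_heads, seq_len, head_dim)
    return x.view(seq_len, num_heads, head_dim).transpose(0, 1)

x = torch.rand(20, 8)
heads = split_heads(x, num_heads=2)
print(heads.shape)  # torch.Size([2, 20, 4])

# With emb_dim = 9 an equal split is impossible; chunk() returns pieces of different widths
uneven = torch.chunk(torch.rand(20, 9), 2, dim=1)
print([t.shape for t in uneven])  # [torch.Size([20, 5]), torch.Size([20, 4])]
```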
I have a tensor a of shape (1, N, 1). I need to left-shift the tensor along dimension 1 and add a new value as a replacement. I have found a way to make this work, and the following is the code:
import numpy as np
import torch
a = torch.from_numpy(np.array([1, 2, 3]))
a = a.unsqueeze(0).unsqueeze(2)  # (1, 3, 1); my data has this shape, hence the two unsqueezes
# I want to left-shift a along dim 1 and insert a new value at the end.
# I achieve the required shift with the following code:
b = a.squeeze()
c = b.roll(shifts=-1)
c[-1] = 4
c = c.unsqueeze(0).unsqueeze(2)
# c = [[[2], [3], [4]]]
My question is, is there a simpler way to do this? Thanks.
You don't actually need to squeeze your input tensor a, perform the operations, and then unsqueeze the result. Instead, you can do those two operations directly:
# No need to squeeze
c = torch.roll(a, shifts=-1, dims=1)
c[:,-1,:] = 4
# No need to unsqueeze
# c = [[[2], [3], [4]]]
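If you would rather avoid the wrap-around that roll performs (the first element is moved to the end and then overwritten), an alternative is to slice off the first element along dim 1 and concatenate the new value. This is a minimal sketch; shift_left_and_append is a hypothetical helper name, and it assumes the same value should be written at every position of the new last slot.

```python
import torch

def shift_left_and_append(a: torch.Tensor, value) -> torch.Tensor:
    """Drop the first element along dim 1 and append `value` at the end.

    Works for any tensor of shape (B, N, ...) and returns a new tensor of the same shape.
    """
    tail = torch.full_like(a[:, :1], value)  # shape (B, 1, ...), same dtype/device as a
    return torch.cat([a[:, 1:], tail], dim=1)

a = torch.tensor([1, 2, 3]).view(1, 3, 1)
c = shift_left_and_append(a, 4)
print(c.tolist())  # [[[2], [3], [4]]]
```

Like torch.roll, this returns a new tensor and leaves a unchanged.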
Let's say we have a tensor of size B x C x W x H (as is common for batches of images), and we want to reshape it to B x M where M = C*W*H. Is there a built-in way to do so without explicitly mentioning B?
If we know B in advance, we can do the following, even without explicitly knowing any of the three C, W, H:
a = torch.randn(20,3,512,512)
b = a.reshape((20, -1)) #we can use -1 to infer the dimension `M`
But can we also do so without knowing B?
(I know we could obviously find B using B = a.shape[0], but my question is whether it is possible without knowing B either.)
With reshape alone, the only other way would be to calculate the second dimension and use -1 for the first:
a = torch.randn(20,3,512,512)
print(a.shape)
b = a.reshape((20, -1))
print(b.shape)
b = a.reshape((-1, 786432)) # 3*512*512
print(b.shape)
torch.Size([20, 3, 512, 512])
torch.Size([20, 786432])
torch.Size([20, 786432])
This is because reshape accepts only one -1.
In principle, you could make it a generic function that works with any batch size by simply using the first dimension of the input, e.g.:
a = torch.randn(20, 3, 512, 512)
b = a.reshape((a.shape[0], -1))
You can wrap it in a function and just call it whenever necessary.
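Wrapping that in a function could look like the sketch below; flatten_per_sample is a hypothetical name. PyTorch also has a built-in that does the same without mentioning B at all: Tensor.flatten(start_dim=1) merges every dimension from 1 onwards (torch.nn.Flatten, whose default start_dim is 1, is the module equivalent). A small tensor is used here instead of 20 x 3 x 512 x 512 to keep the example light.

```python
import torch

def flatten_per_sample(x: torch.Tensor) -> torch.Tensor:
    """Reshape (B, C, W, H) -> (B, C*W*H) for any batch size B."""
    return x.reshape(x.shape[0], -1)

a = torch.randn(4, 3, 8, 8)
b1 = flatten_per_sample(a)
# Built-in alternative: flatten all dims from 1 onwards, leaving dim 0 (B) untouched
b2 = a.flatten(start_dim=1)
print(b1.shape, b2.shape)  # torch.Size([4, 192]) torch.Size([4, 192])
```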
I have the following code:
a = torch.randint(0,10,[3,3,3,3])
b = torch.LongTensor([1,1,1,1])
I have a multi-dimensional index b and want to use it to select a single cell in a. If b wasn't a tensor, I could do:
a[1,1,1,1]
Which returns the correct cell, but:
a[b]
Doesn't work, because it just selects a[1] four times.
How can I do this? Thanks
A more elegant (and simpler) solution might be to simply cast b as a tuple:
a[tuple(b)]
Out[10]: tensor(5.)
I was curious to see how this works with "regular" NumPy, and found a related article that explains it quite well.
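The difference between the two indexing forms can be seen directly. This sketch uses torch.arange instead of torch.randint so that the selected value is predictable.

```python
import torch

a = torch.arange(3 * 3 * 3 * 3).view(3, 3, 3, 3)
b = torch.tensor([1, 1, 1, 1])

# A single index tensor is applied to dim 0 only, so this picks a[1] four times
print(a[b].shape)  # torch.Size([4, 3, 3, 3])

# A tuple supplies one index per dimension, exactly like a[1, 1, 1, 1]
print(a[tuple(b)])  # tensor(40)
```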
You can split b into 4 using chunk, and then use the chunked b to index the specific element you want:
>> a = torch.arange(3*3*3*3).view(3,3,3,3)
>> b = torch.LongTensor([[1,1,1,1], [2,2,2,2], [0, 0, 0, 0]]).t()
>> a[b.chunk(chunks=4, dim=0)] # here's the trick!
Out[24]: tensor([[40, 80, 0]])
What's nice about it is that it generalizes easily to any number of dimensions of a: you just need to make the number of chunks equal to the number of dimensions of a.
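Combining this with the tuple trick from the previous answer: b.unbind(0) (or tuple(b)) splits b along its first dimension into exactly one index tensor per row, so the number of pieces never has to be chosen by hand. This sketch assumes, as in the example above, that b has shape (a.dim(), k), i.e. one column per point to look up.

```python
import torch

a = torch.arange(3 * 3 * 3 * 3).view(3, 3, 3, 3)
# One column per point to look up: (1,1,1,1), (2,2,2,2) and (0,0,0,0)
b = torch.tensor([[1, 1, 1, 1], [2, 2, 2, 2], [0, 0, 0, 0]]).t()  # shape (4, 3)

assert b.shape[0] == a.dim()  # one row of indices per dimension of a
values = a[b.unbind(0)]  # tuple of a.dim() index tensors, each of shape (3,)
print(values.tolist())  # [40, 80, 0]
```

The result has shape (3,) here, whereas the chunk version returns shape (1, 3) because each chunk keeps a leading dimension of size 1.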
This is the code I'm trying to write. I'm new to coding, so I'm sure I'm way off; any help would be great. Thank you in advance.
Write a function normalize(vector) which takes in a vector and returns the normalized vector with respect to the infinity norm. i.e. (1/infNorm(vector)) * vector.
def normalize(vector):
    infNorm(vector) = abs(vector[0])
    for i in vector:
        if abs(i) > norm:
            infNorm(vector) = abs(i)
    finalvector = (1/infNorm(vector)) * vector
    return finalvector
vector = [2, 5, 7]
print(normalize(vector))
You are confusing function call parameters using () with sequence indices []. By sequence, I mean a Python sequence, which includes things like tuples and lists. Here, you're using a list as a vector. (You could also use tuples, but only if you don't plan to modify them. So we'll stick with lists, for generality and simplicity.)
Also, you need two loops: one to find the norm, and one to apply it.
def infnorm(vector):
    norm = 0
    for i in range(len(vector)):
        if abs(vector[i]) > norm:
            norm = abs(vector[i])
    return norm
def normalize(vector):
    norm = infnorm(vector)
    return [v/norm for v in vector]
vector = [2, 5, 7]
print(normalize(vector))
Results:
[0.2857142857142857, 0.7142857142857143, 1.0]
Note that I didn't take the absolute value of each element before normalizing it. I'm no vector wizard, so that might be wrong, but I'm guessing that the normalized vector can have negative values.
The last tricky bit, the return value of normalize(vector), is called a "list comprehension". It's a nifty Python trick to build a list using a formula. List comprehensions look odd at first, but with a little practice they become easy, and they're quite precise and clear. Check them out.
If you are going to use a for loop to find the maximum value of an array in Python, I'd suggest splitting the normalize function into two functions, one to get the infinity norm and another to calculate the vector, as follows:
def infNorm(vector):
    norm = vector[0]
    for element in vector:
        if norm < abs(element):
            norm = abs(element)
    return norm
def normalize(vector):
    norm = infNorm(vector)
    new_vector = []
    for element in vector:
        new_vector.append((1.0/norm)*element)
    return new_vector
Alternatively, you could use Python's built-in max() function; with it, the code would look like this:
def normalize(vector):
    norm = abs(max(vector, key=abs))
    new_vector = []
    for element in vector:
        new_vector.append((1.0/norm)*element)
    return new_vector
By the way, when you have a name followed by parentheses, you are trying to invoke a function. So when you write infNorm(vector) = abs(vector[0]), you are trying to assign a value to a function call, which results in a syntax error. The correct way would be just infNorm = abs(vector[0]).
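Putting those fixes together, a minimally corrected version of the original normalize function might look like the sketch below. It keeps a single plain variable for the running maximum (the original also compared against an undefined norm). Note too that, once the syntax error is fixed, the original (1/infNorm(vector)) * vector would still fail for a list: multiplying a list by a float raises a TypeError, so each element has to be scaled individually.

```python
def normalize(vector):
    # A plain variable holds the running maximum; assigning to infNorm(vector) is a syntax error
    inf_norm = abs(vector[0])
    for element in vector:
        if abs(element) > inf_norm:
            inf_norm = abs(element)
    # (1 / inf_norm) * vector would raise TypeError for a list, so scale element by element
    return [(1 / inf_norm) * element for element in vector]


vector = [2, 5, 7]
print(normalize(vector))
```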
The infinity norm is the maximum of the absolute values of the elements (the sum of the absolute values is the 1-norm). For instance, SageMath can compute the infinity norm, the 2-norm and the 1-norm of a vector.
In general, to normalise a vector according to a norm, you divide each of its elements by its length in that norm.
Then this can be expressed in Python in this way:
>>> vec = [-2, 5, 3]
>>> inf_norm = max(abs(v) for v in vec)
>>> inf_norm
5
>>> normalised_vec = [v/inf_norm for v in vec]
>>> normalised_vec
[-0.4, 1.0, 0.6]
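If NumPy is available (an assumption; the question itself works with plain lists), the infinity norm is built in: np.linalg.norm with ord=np.inf returns the maximum absolute entry of a 1-D array. normalize_inf is a hypothetical name, and the zero-vector check is an added guard, since dividing by a zero norm is undefined.

```python
import numpy as np

def normalize_inf(vector):
    """Scale a vector so that its largest absolute entry becomes 1 (infinity-norm normalization)."""
    v = np.asarray(vector, dtype=float)
    norm = np.linalg.norm(v, ord=np.inf)  # for 1-D input this is max(abs(v))
    if norm == 0:
        raise ValueError("cannot normalize the zero vector")
    return v / norm

print(normalize_inf([-2, 5, 3]))
```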