Why is the timm vision transformer position embedding initialized to zeros? - pytorch

I'm looking at the timm implementation of the vision transformer, and for the positional embedding, it is initialized with zeros as follows:
self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, embed_dim))
See here:
https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/vision_transformer.py#L309
I'm not sure how this actually embeds anything about the position when it is later added to the patch embeddings:
x = x + self.pos_embed
Any feedback is appreciated.

The positional embedding is a parameter that gets included in the computational graph and is updated during training. So it doesn't matter that it is initialized with zeros; the values are learned during training.
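To see why the zeros don't matter, here is a minimal standalone sketch (a toy module, not the timm code) showing that a zeros-initialized nn.Parameter is registered with the module, receives a gradient, and moves away from zero after a single optimizer step:

import torch
import torch.nn as nn

# Toy module mirroring `x = x + self.pos_embed` from the question.
class ToyPosEmbed(nn.Module):
    def __init__(self, num_tokens=5, embed_dim=8):
        super().__init__()
        self.pos_embed = nn.Parameter(torch.zeros(1, num_tokens, embed_dim))

    def forward(self, x):
        return x + self.pos_embed

model = ToyPosEmbed()
opt = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(2, 5, 8)
loss = model(x).pow(2).mean()  # any loss that depends on pos_embed
loss.backward()
opt.step()

print(model.pos_embed.abs().sum())  # > 0: the "zeros" have already been updated

Each position index gets its own row of pos_embed, so after training the rows differ from one another, which is exactly the positional information that the addition injects.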

Related

NLP: transformer learning weights

The softmax function produces the attention weights, which are then matrix-multiplied (MatMul) with V.
Are these weights stored anywhere? If they are not stored or reused in the next iteration, how does the learning process happen?
Moreover, the linear transformation does not seem to use any weights!
Source code: https://github.com/fawazsammani/chatbot-transformer/blob/master/models.py
I would always draw your attention to the documentation first.
If we continue to the code implementation of the nn.Linear layer,
we will see this line:
self.weight = Parameter(torch.empty((out_features, in_features), **factory_kwargs))
which is the weight that you are asking about.
Hope this answers your question!
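To make the distinction concrete, here is a minimal sketch (my own toy example, not the linked chatbot-transformer code) of scaled dot-product attention. The softmax output is an activation recomputed on every forward pass; the learned parameters live inside the nn.Linear projections, created exactly by the Parameter line quoted above:

import torch
import torch.nn as nn
import torch.nn.functional as F

d_model = 16
# The learned parameters live in these Linear projections (each owns a .weight Parameter).
w_q = nn.Linear(d_model, d_model)
w_k = nn.Linear(d_model, d_model)
w_v = nn.Linear(d_model, d_model)

x = torch.randn(1, 10, d_model)  # (batch, seq_len, d_model)
Q, K, V = w_q(x), w_k(x), w_v(x)

# The attention "weights" from softmax are activations: recomputed on every
# forward pass, never stored, and never updated directly by the optimizer.
attn = F.softmax(Q @ K.transpose(-2, -1) / d_model ** 0.5, dim=-1)
out = attn @ V

print([name for name, _ in w_q.named_parameters()])  # ['weight', 'bias'] -- what training updates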

Why is input data transferred to Variable type?

I'm reading code that implements YOLOv3 with PyTorch, and I came across lines like this:
for batch_i, (_, imgs, targets) in enumerate(dataloader):
    batches_done = len(dataloader) * epoch + batch_i
    imgs = Variable(imgs.to(device))  # ??
    targets = Variable(targets.to(device), requires_grad=False)
imgs is the input data, and I can't understand why the transform Variable(imgs.to(device)) exists.
Does this mean that the input data should be trained (since the default option is requires_grad=True), or is there another reason?
As Natthaphon pointed out in his comment, I don't really see how the calls to Variable make any sense in this scenario.
Technically, a Variable automatically becomes part of the computational graph. So maybe it was written by someone coming over from TensorFlow, or with visualization of the complete computational graph in mind.
If you read the docs here,
the Variable API has been deprecated.
Hence, we should not bother wrapping tensors in Variable anymore.
The Variable wrapper still runs in the latest torch versions, but it simply returns a plain tensor, so the code keeps working.
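As a quick sanity check (behaviour of current PyTorch releases, sketched here rather than taken from the YOLOv3 repo), wrapping a tensor in Variable is a no-op that simply returns a tensor, so the wrapper neither helps nor hurts:

import torch
from torch.autograd import Variable

t = torch.randn(3)
v = Variable(t)         # still accepted, but returns a plain Tensor
print(type(v))          # <class 'torch.Tensor'>
print(v.requires_grad)  # False: wrapping does not turn on gradient tracking

# The modern equivalent of the lines in the question is simply:
device = torch.device('cpu')
imgs = t.to(device)     # no Variable wrapper needed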

pytorch where is Embedding "max_norm" implemented?

The "embedding" class documentation https://pytorch.org/docs/stable/nn.html says
max_norm (float, optional) – If given, will renormalize the embedding vectors to have a norm lesser than this before extracting.
1) In my model, I use this embedding class as a parameter, not just as an input (the model learns the embedding). In this case, I assume that every time an update happens, the embedding gets renormalized, not only when it's initialized. Is my understanding correct?
2) I wanted to confirm 1) by looking at the source, but I couldn't find the implementation in pytorch embedding class. https://pytorch.org/docs/stable/_modules/torch/nn/modules/sparse.html
Can someone point me to the max_norm implementation?
If you look at the forward function of the Embedding class here, there is a reference to torch.nn.functional.embedding, which uses embedding_renorm_, which is in the C++ documentation here, meaning it is a C++ implementation. Some GitHub searching on the pytorch repo pointed to these files (1, 2).
The answer to 1 is yes. The answer to 2 is above.
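For anyone wanting to confirm point 1 empirically, here is a small sketch (my own toy check, not code from the question) showing that the renormalization happens in-place at lookup time, i.e. inside forward, and only for the rows that are actually indexed:

import torch
import torch.nn as nn

emb = nn.Embedding(10, 4, max_norm=1.0)
with torch.no_grad():
    emb.weight[3] = torch.full((4,), 10.0)  # give row 3 a huge norm

print(emb.weight[3].norm().item())  # ~20.0: not renormalized yet
_ = emb(torch.tensor([3]))          # lookup triggers embedding_renorm_ in-place
print(emb.weight[3].norm().item())  # <= 1.0 after the forward pass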

Use pretrained embedding in Spanish with Torchtext

I am using Torchtext in an NLP project. I have a pretrained embedding in my system, which I'd like to use. Therefore, I tried:
my_field.vocab.load_vectors(my_path)
But, apparently, this only accepts the names of a short list of pre-accepted embeddings, for some reason. In particular, I get this error:
Got string input vector "my_path", but allowed pretrained vectors are ['charngram.100d', 'fasttext.en.300d', ..., 'glove.6B.300d']
I found some people with similar problems, but the solutions I can find so far are "change Torchtext source code", which I would rather avoid if at all possible.
Is there any other way in which I can work with my pretrained embedding? A solution that allows using another Spanish pretrained embedding is also acceptable.
Some people seem to think it is not clear what I am asking. So, if the title and final question are not enough: "I need help using a pre-trained Spanish word-embedding in Torchtext".
It turns out there is a relatively simple way to do this without changing Torchtext's source code. Inspiration from this Github thread.
1. Create numpy word-vector tensor
You need to load your embedding so you end up with a numpy array with dimensions (number_of_words, word_vector_length):
my_vecs_array[word_index] should return your corresponding word vector.
IMPORTANT. The indices (word_index) for this array MUST be taken from Torchtext's word-to-index dictionary (field.vocab.stoi). Otherwise Torchtext will point to the wrong vectors!
Don't forget to convert to tensor:
my_vecs_tensor = torch.from_numpy(my_vecs_array)
2. Load array to Torchtext
I don't think this step is really necessary because of the next one, but it allows you to have the Torchtext field with both the dictionary and the vectors in one place.
my_field.vocab.set_vectors(my_field.vocab.stoi, my_vecs_tensor, word_vectors_length)
3. Pass weights to model
In your model you will declare the embedding like this:
my_embedding = torch.nn.Embedding(vocab_len, word_vect_len)
Then you can load your weights using:
my_embedding.weight = torch.nn.Parameter(my_field.vocab.vectors, requires_grad=False)
Use requires_grad=True if you want to train the embedding, use False if you want to freeze it.
EDIT: It looks like there is another way that is a bit easier! The improvement is that apparently you can pass the pre-trained word vectors directly during the vocabulary-building step, which takes care of steps 1-2 here. See the sketch below.
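A hedged sketch of that easier route, assuming the legacy Torchtext Field/build_vocab API and a hypothetical vector file embeddings-es.vec in word2vec/fastText text format (one word per line followed by its vector components):

import torch
from torchtext.vocab import Vectors

# Hypothetical path; Vectors parses the text file and caches a binary copy.
custom_vectors = Vectors(name='embeddings-es.vec', cache='./vector_cache')

# Passing vectors= here fills my_field.vocab.vectors, aligned with
# my_field.vocab.stoi, which replaces steps 1-2 above.
my_field.build_vocab(train_dataset, vectors=custom_vectors)

# Step 3 stays the same:
my_embedding = torch.nn.Embedding(*my_field.vocab.vectors.shape)
my_embedding.weight = torch.nn.Parameter(my_field.vocab.vectors, requires_grad=False)

Here my_field and train_dataset are assumed to be the Field and dataset you have already built elsewhere.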

pytorch - modify embedding backpropagation

I would like to modify the back-propagation for the embedding layer but I don't understand where the definition is.
In the definition available at https://pytorch.org/docs/stable/_modules/torch/nn/functional.html, the embedding function calls torch.embedding, and that is where how the weights are updated should be defined.
So my question is:
Where can I find the documentation of torch.embedding?
It calls an underlying C function; in my torch build (version 4) it is this file.
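The C kernel itself is not meant to be edited; a common way to customize the back-propagation of an embedding lookup is to reimplement the lookup as a custom torch.autograd.Function. A hedged sketch (my own, not the torch source) that reproduces the standard scatter-add gradient and marks the line you would modify:

import torch

class MyEmbeddingLookup(torch.autograd.Function):
    """Embedding lookup with a customizable backward pass."""

    @staticmethod
    def forward(ctx, weight, indices):
        ctx.save_for_backward(indices)
        ctx.num_embeddings = weight.shape[0]
        return weight[indices]

    @staticmethod
    def backward(ctx, grad_output):
        (indices,) = ctx.saved_tensors
        grad_weight = torch.zeros(ctx.num_embeddings, grad_output.shape[-1],
                                  dtype=grad_output.dtype, device=grad_output.device)
        # Standard rule: scatter-add the output gradients back to the rows that
        # were looked up. Change this line to modify the back-propagation.
        grad_weight.index_add_(0, indices.reshape(-1),
                               grad_output.reshape(-1, grad_output.shape[-1]))
        return grad_weight, None  # no gradient w.r.t. the integer indices

weight = torch.randn(10, 4, requires_grad=True)
out = MyEmbeddingLookup.apply(weight, torch.tensor([1, 3, 3]))
out.sum().backward()
print(weight.grad[3])  # row 3 was looked up twice, so it accumulated gradient twice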
