I would like to modify the back-propagation for the embedding layer, but I can't find where it is defined.
In the source available at https://pytorch.org/docs/stable/_modules/torch/nn/functional.html, the embedding function calls torch.embedding, and that is presumably where it is defined how the weights are updated.
So my question is:
Where can I find the documentation of torch.embedding?
It calls the underlying C function; in my torch build (version 4) it is this file.
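As a side note, the weight update itself comes from the optimizer; torch.embedding only defines the lookup and its gradient. If you want to change how the gradient reaches the embedding weights without touching the C code, one Python-level route is a custom autograd.Function. Below is a minimal sketch of my own (not PyTorch source), assuming 1-D or 2-D integer indices:

import torch

class MyEmbedding(torch.autograd.Function):
    @staticmethod
    def forward(ctx, weight, indices):
        ctx.save_for_backward(indices)
        ctx.num_embeddings = weight.shape[0]
        return weight[indices]

    @staticmethod
    def backward(ctx, grad_output):
        (indices,) = ctx.saved_tensors
        grad_weight = torch.zeros(ctx.num_embeddings, grad_output.shape[-1],
                                  dtype=grad_output.dtype, device=grad_output.device)
        # Scatter-add the upstream gradient into the rows that were looked up.
        # This is the place to change how gradients reach the embedding weights.
        grad_weight.index_add_(0, indices.reshape(-1),
                               grad_output.reshape(-1, grad_output.shape[-1]))
        return grad_weight, None

Usage would be out = MyEmbedding.apply(weight, indices) with weight being an nn.Parameter; after out.sum().backward(), only the looked-up rows of weight.grad are non-zero.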
It is difficult to retrain my models on new data because I never remember my initial optimizer, loss function, and hyperparameters. How can I extract all the arguments I am passing to a TensorFlow function? For example, from the code below, how can I extract a list with the arguments learning_rate, beta_1, beta_2, and so on?
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001,
                                     beta_1=0.9, beta_2=0.999,
                                     epsilon=1e-07, amsgrad=False,
                                     name="Adam")
I just want to extract the names so that I can later access them, for example:
optimizer.learning_rate
I have tried .keys() and .classes(), but nothing works. Of course I can inspect the object using dir(optimizer), but the output is not filtered.
I just found a way. The drawback is that it requires compiling the model first. I will post it in case someone else has the same issue.
model.optimizer.get_config()
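For example (a sketch assuming the compiled model is named model and uses the Adam optimizer from the question):

config = model.optimizer.get_config()
print(config["learning_rate"])   # 0.001
print(config["beta_1"])          # 0.9
print(config["epsilon"])         # 1e-07

The individual values can also be read directly from the optimizer object, e.g. model.optimizer.learning_rate, which is what the question was after.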
The softmax function produces the attention weights, which are then matrix-multiplied (MatMul) with V.
Are these weights stored anywhere? If not, how does the learning process happen if the weights are not stored or reused in the next round?
Moreover, the linear transformation does not use the weights!
Source code: https://github.com/fawazsammani/chatbot-transformer/blob/master/models.py
I would encourage you to always read the documentation first.
If we continue to the code implementation of the nn.Linear layer, we will see this line:
self.weight = Parameter(torch.empty((out_features, in_features), **factory_kwargs))
which holds the weights you are asking about.
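For completeness, here is a minimal sketch of my own (not the linked repo's code) showing the difference: the attention weights produced by softmax are recomputed on every forward pass and are never stored as parameters; the only learnable weights are in the nn.Linear projections, which the optimizer updates.

import math
import torch
import torch.nn as nn

class SimpleAttention(nn.Module):
    def __init__(self, d_model):
        super().__init__()
        # These Linear layers hold the learnable Parameters mentioned above.
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)

    def forward(self, x):
        q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)
        scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
        attn = scores.softmax(dim=-1)  # attention weights: recomputed each call, not stored
        return attn @ v                # MatMul with V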
Hope this answers your question!
I have a quantized model in PyTorch and now I want to extract the parameters of the quantized linear layer and implement the forward pass manually.
I searched the source code but only found this function:
def forward(self, x: torch.Tensor) -> torch.Tensor:
    return torch.ops.quantized.linear(
        x, self._packed_params._packed_params, self.scale, self.zero_point)
But nowhere can I find how torch.ops.quantized.linear is defined.
Can someone give me a hint as to how the forward pass of the quantized linear layer is defined?
In answer to the question of where torch.ops.quantized.linear is defined: I was looking for the same thing but was never able to find it. I believe it is probably somewhere in ATen (the C++ namespace). I did, however, find some useful PyTorch-based implementations in the NVIDIA TensorRT repo below. It is quite possible these are the ones actually called by PyTorch via some DLLs. If you are trying to add quantization to a custom layer, these implementations walk you through it.
You can find the docs here and the GitHub page here.
For the linear layer specifically, see the QuantLinear layer here
Under the hood, this calls TensorQuantFunction.apply() for post-training quantization or FakeTensorQuantFunction.apply() for quantization-aware training.
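If the goal is only to reproduce the forward pass by hand, one approximation that avoids the packed-weight kernels is to dequantize, compute in float, and requantize with the layer's output scale and zero point. A rough sketch of my own, assuming a torch.nn.quantized.Linear module named qlinear and a quantized input xq (the fused int8 kernel will match this only up to rounding):

import torch
import torch.nn.functional as F

def manual_quantized_linear(xq, qlinear):
    w = qlinear.weight()      # unpacked quantized weight tensor
    b = qlinear.bias()        # float bias (may be None)
    y_fp = F.linear(xq.dequantize(), w.dequantize(), b)
    # Requantize the float result with the layer's output scale / zero point,
    # mirroring the scale and zero_point arguments seen in the forward above.
    return torch.quantize_per_tensor(y_fp, qlinear.scale, qlinear.zero_point,
                                     dtype=torch.quint8)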
I have a problem where I want to predict multiple continuous outputs from a text input. I tried using RobertaForSequenceClassification from the HuggingFace library. But the documentation states that when the number of outputs in the final layer is more than 1, a cross-entropy loss is used automatically, as mentioned here: https://huggingface.co/transformers/v2.2.0/model_doc/bert.html#transformers.BertForSequenceClassification.
But I want to use an RMSE loss in a regression setting with two outputs in the final layer. How would one go about modifying this?
BertForSequenceClassification is a small wrapper around BertModel.
It calls the model, takes the pooled output (the second member of the output tuple), and applies a classifier over it. The code is here: https://github.com/huggingface/transformers/blob/master/src/transformers/modeling_bert.py#L1168
The simplest solution is to write your own wrapper class (based on the BertForSequenceClassification class) that does the regression with the loss you like.
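Here is a minimal sketch of such a wrapper (my own example, not library code; the class name and two-output head are my choices, and depending on your transformers version you may need return_dict=False or outputs.pooler_output to get the pooled output):

import torch
from torch import nn
from transformers import RobertaModel

class RobertaForRegression(nn.Module):
    def __init__(self, model_name="roberta-base", num_outputs=2):
        super().__init__()
        self.roberta = RobertaModel.from_pretrained(model_name)
        self.regressor = nn.Linear(self.roberta.config.hidden_size, num_outputs)

    def forward(self, input_ids, attention_mask=None, labels=None):
        outputs = self.roberta(input_ids, attention_mask=attention_mask)
        pooled = outputs[1]                 # the pooled output mentioned above
        preds = self.regressor(pooled)
        if labels is not None:
            loss = torch.sqrt(nn.functional.mse_loss(preds, labels))  # RMSE
            return loss, preds
        return preds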
The "embedding" class documentation https://pytorch.org/docs/stable/nn.html says
max_norm (float, optional) – If given, will renormalize the embedding vectors to have a norm lesser than this before extracting.
1) In my model, I use this Embedding class as a parameter, not just as an input (the model learns the embedding). In this case, I assume the embedding gets renormalized every time an update happens, not only when it is initialized. Is my understanding correct?
2) I wanted to confirm 1) by looking at the source, but I couldn't find the implementation in the PyTorch Embedding class: https://pytorch.org/docs/stable/_modules/torch/nn/modules/sparse.html
Can someone point me to the max_norm implementation?
If you look at the forward function of the Embedding class here, there is a reference to torch.nn.functional.embedding, which uses embedding_renorm_, which is in the cpp documentation here, which means it is a cpp implementation. Some GitHub searching on the pytorch repo pointed to these files (1, 2).
The answer to 1) is yes: whenever max_norm is set, the looked-up rows are renormalized in place during each forward pass, not only at initialization. The answer to 2) is above.
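You can also confirm the in-place renormalization with a quick experiment (my own sketch, not from the linked sources): the rows that are looked up are clipped to max_norm during the forward pass, not only at initialization.

import torch
import torch.nn as nn

emb = nn.Embedding(5, 8, max_norm=1.0)
with torch.no_grad():
    emb.weight.mul_(10)                        # inflate the norms artificially
print(emb.weight.detach().norm(dim=1))         # all norms are now well above 1.0

_ = emb(torch.tensor([0, 2]))                  # forward pass triggers embedding_renorm_
print(emb.weight.detach().norm(dim=1))         # rows 0 and 2 are clipped to <= 1.0, the rest unchanged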