The "embedding" class documentation https://pytorch.org/docs/stable/nn.html says
max_norm (float, optional) – If given, will renormalize the embedding vectors to have a norm lesser than this before extracting.
1) In my model, I use this embedding class as a parameter, not just as an input (the model learns the embedding.) In this case, I assume every time when updates happen, the embedding gets renormalized, not only when it's initialized. Is my understanding correct?
2) I wanted to confirm 1) by looking at the source, but I couldn't find the implementation in pytorch embedding class. https://pytorch.org/docs/stable/_modules/torch/nn/modules/sparse.html
Can someone point me to the max_norm implementation?
If you see forward function in Embedding class here, there is a reference to torch.nn.functional.embedding which uses embedding_renorm_ which is in the cpp documentation here which means it is a cpp implementation. Some github search on pytorch repo pointed to this files (1, 2).
Answer to 1 is yes. Answer to 2 is above.
Related
I have a quantized model in pytorch and now I want to extract the parameter of the quantized linear layer and implement the forward manually.
I search the source code but only find this function.
def forward(self, x: torch.Tensor) -> torch.Tensor:
return torch.ops.quantized.linear(
x, self._packed_params._packed_params, self.scale, self.zero_point)
But no where I can find how torch.ops.quantized.linear is defined.
Can someone give me a hind how the forward of quantized linear are defined?
In answer to the question of where torch.ops.quantized.linear is, I was looking for the same thing but was never able to find it. I believe it's probably somewhere in the aten (C++ namespace). I did, however, find some useful PyTorch-based implementations in the NVIDIA TensorRT repo below. It's quite possible these are the ones actually called by PyTorch via some DLLs. If you're trying to add quantization to a custom layer, these implementations walk you through it.
You can find the docs here and the GitHub page here.
For the linear layer specifically, see the QuantLinear layer here
Under the hood, this calls TensorQuantFunction.apply() for post-training quantization or FakeTensorQuantFunction.apply() for quantization-aware training.
In my understanding, DCGAN use convolution layer in both Generator and Discriminator, and WGAN adjust the loss function, optimizer, clipping and last sigmoid function. The part they control is not overlapping. So are there any conflict if i implement both changes of DCGAN & WGAN in one model?
According to my experience, DCGAN proposed a well-tuned and simple model (or more specifically we can say it proposed a simple network structure with well-tuned optimizer) to generate images.WGAN proposed a new method of measuring the distance between data distribution and model distribution and theoretically solved the GAN's problem:unstable,mode collpase etc.
So you can utilize the network structure and parameters proposed in DCGAN and the way of updating parameters of discriminator and generator proposed in WGAN. And i've done that before, It's not conflict.
But in practice, you might not get a very good result when you implement WGAN.It's more advisable to implement WGAN-GP
There is an image generated by WGAN-GP
images generated by WGAN-GP
Hope my answer is helpful.
I saw both transformer and estimator were mentioned in the sklearn documentation.
Is there any difference between these two words?
The basic difference is that a:
Transformer transforms the input data (X) in some ways.
Estimator predicts a new value (or values) (y) by using the input data (X).
Both the Transformer and Estimator should have a fit() method which can be used to train them (they learn some characteristics of the data). The signature is:
fit(X, y)
fit() does not return any value, just stores the learnt data inside the object.
Here X represents the samples (feature vectors) and y is the target vector (which may have single or multiple values per corresponding sample in X). Note that y can be optional in some transformers where its not needed, but its mandatory for most estimators (supervised estimators). Look at StandardScaler for example. It needs the initial data X for finding the mean and std of the data (it learns the characteristics of X, y is not needed).
Each Transformer should have a transform(X, y) function which like fit() takes the input X and returns a new transformed version of X (which generally should have same number samples but may or may not have same features).
On the other hand, Estimator should have a predict(X) method which should output the predicted value of y from the given X.
There will be some classes in scikit-learn which implement both transform() and predict(), like KMeans, in that case carefully reading the documentation should solve your doubts.
Transformer is a type of Estimator that implements transform method.
Let me support that statement with examples I have come across in sklearn implementation.
Class sklearn.preprocessing.FunctionTransformer :
This inherits from two other classes TransformerMixin, BaseEstimator
Class sklearn.preprocessing.PowerTransformer :
This also inherits from TransformerMixin, BaseEstimator
From what I understand, Estimators just take data, do some processing, and store data based on logic implemented in its fit method.
Note: Estimator's aren't used to predict values directly. They don't even have predict method in them.
Before I give more explanation to the above statement, let me tell you about Mixin Classes.
Mixin Class: These are classes that implement a Mix-in design pattern. Wikipedia has very good explanation about it. You can read it here . To summarise, these are classes you write which have methods that can be used in many different classes. So, you write them in one class and just inherit in many different classes(A form of composition. Read These Links - Link1 Link2)
In Sklearn there are many mixin classes. To name a few
ClassifierMixin, RegressorMixin, TransformerMixin.
Here, TransformerMixin is the class that's inherited by every Transformer used in sklearn. TransformerMixin class has only one method which is reusable in every transformer and that is fit_transform.
All transformers inherit two classes, BaseEstimator(Which has fit method) and TransformerMixin(Which has fit_transform method). And, Each transformer has transform method based on its functionality
I guess that gives an answer to your question. Now, let me answer the statement I made regarding the Estimator for prediction.
Every Model Class has its own predict class that does prediction.
Consider LinearRegression, KNeighborsClassifier, or any other Model class. They all have a predict function declared in them. This is used for prediction. Not the Estimator.
The sklearn usage is perhaps a little unintuitive, but "estimator" doesn't mean anything very specific: basically everything is an estimator.
From the sklearn glossary:
estimator:
An object which manages the estimation and decoding of a model...
Estimators must provide a fit method, and should provide set_params and get_params, although these are usually provided by inheritance from base.BaseEstimator.
transformer:
An estimator supporting transform and/or fit_transform...
As in #VivekKumar's answer, I think there's a tendency to use the word estimator for what sklearn instead calls a "predictor":
An estimator supporting predict and/or fit_predict. This encompasses classifier, regressor, outlier detector and clusterer...
It looks like parameters and children show the same info, so what is the difference between them?
import torch
print('torch.__version__', torch.__version__)
m = torch.load('imagenet_resnet18.pth')
print(m.parameters)
print(m.children)
model.parameters() is a generator that returns tensors containing your model parameters.
model.children() is a generator that returns layers of the model from which you can extract your parameter tensors using <layername>.weight or <layername>.bias
Visit this link for a simple tutorial on accessing and freezing model layers.
The (only, during my writing) current answer is not to the point, and thus misleading in my own opinion. By the current docs(08/23/2022):
children():
Returns an iterator over immediate children modules.
This should mean that it will stop at non-leaf node like torch.nn.Sequential, torch.nn.ModuleList, etc.
parameters(recurse=True):
Returns an iterator over module parameters. This is typically passed to an optimizer.
"Passed to an optimizer" should imply that recursive cases are taken care by the team. Just pass the return value/object to the optimizer.
Since I know you're lazy developers, you must read this answer from PyTorch forum to see the output of children() done by someone: https://discuss.pytorch.org/t/module-children-vs-module-modules/4551/4?u=raining_day513
I would like to modify the back-propagation for the embedding layer but I don't understand where the definition is.
In the definition available in https://pytorch.org/docs/stable/_modules/torch/nn/functional.html in the embedding function, they call torch.embedding and here there should be defined how the weights are updated.
So my question is:
Where can I find the documentation of torch.embedding?
It calls underlying C function, in my torch build(version 4) this file.