How does PyTorch implement forward for a quantized linear layer?

I have a quantized model in PyTorch and I want to extract the parameters of the quantized linear layer and implement its forward pass manually.
I searched the source code but only found this function:
def forward(self, x: torch.Tensor) -> torch.Tensor:
    return torch.ops.quantized.linear(
        x, self._packed_params._packed_params, self.scale, self.zero_point)
But nowhere can I find how torch.ops.quantized.linear is defined.
Can someone give me a hint as to how the forward of a quantized linear layer is defined?

In answer to the question of where torch.ops.quantized.linear is defined: I was looking for the same thing but was never able to find it. I believe it's probably somewhere in ATen (PyTorch's C++ tensor library). I did, however, find some useful PyTorch-based implementations in the NVIDIA TensorRT repo below. It's quite possible these are the ones actually called by PyTorch via some DLLs. If you're trying to add quantization to a custom layer, these implementations walk you through it.
You can find the docs here and the GitHub page here.
For the linear layer specifically, see the QuantLinear layer here
Under the hood, this calls TensorQuantFunction.apply() for post-training quantization or FakeTensorQuantFunction.apply() for quantization-aware training.
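That said, the quantized op is numerically equivalent to dequantize, run a float linear, then requantize, so you can reproduce it in plain PyTorch. Here is a minimal sketch, assuming qlayer is a torch.nn.quantized.Linear and qx a quantized input tensor (both hypothetical names); the real backend (FBGEMM/QNNPACK) does the arithmetic in int8, so results can differ by rounding:

import torch

def manual_quantized_linear(qlayer, qx):
    w = qlayer.weight()   # quantized weight; inspect the raw ints via w.int_repr()
    b = qlayer.bias()     # bias is kept in float
    # Reference math: dequantize, run a float linear, then requantize
    # to the layer's output scale and zero point.
    y = torch.nn.functional.linear(qx.dequantize(), w.dequantize(), b)
    return torch.quantize_per_tensor(
        y, qlayer.scale, qlayer.zero_point, torch.quint8)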

Related

How to use soft labels in computer vision with PyTorch?

I have an image dataset with soft labels (i.e. the images don't belong to a single class; rather, I have a probability distribution saying, for example, that there's a 66% chance an image belongs to one class and a 33% chance it belongs to some other class).
I am struggling to figure out how to set up my PyTorch code so this can be represented by the model and output correctly. The probabilities are saved in a CSV file. I have looked at the PyTorch docs and other resources which mention the cross-entropy loss function, but I am still unclear how to import the data successfully and make use of soft labels.
What you are trying to solve is a multi-label classification task, i.e. instances can be classified with more than one label at a time. You cannot use nn.CrossEntropyLoss with hard class-index targets here, since that form only allows single-label targets. So you have two options:
Either use a soft version of the nn.CrossEntropyLoss function. This can be done by implementing the loss by hand, allowing for soft targets; you can find such an implementation on Soft Cross Entropy in PyTorch, and a minimal sketch follows below.
Or treat the task as multiple "independent" binary classification tasks; in this case, you would use nn.BCEWithLogitsLoss (this loss applies a sigmoid internally).
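A minimal hand-rolled version of the first option (assuming logits of shape (batch, num_classes) and row-normalized soft targets; the function name is illustrative):

import torch
import torch.nn.functional as F

def soft_cross_entropy(logits, soft_targets):
    # H(p, q) = -sum_c p_c * log(q_c), averaged over the batch
    log_probs = F.log_softmax(logits, dim=1)
    return -(soft_targets * log_probs).sum(dim=1).mean()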
PyTorch CrossEntropyLoss Supports Soft Labels Natively Now
Thanks to the PyTorch team, this problem has been solved in current versions of torch.nn.CrossEntropyLoss.
You can directly input probabilities for each class as the target (see the doc).
Here is the forum discussion that pushed this enhancement.
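For example (a small sketch; this requires PyTorch 1.10 or later, where the target may be a tensor of per-class probabilities instead of class indices):

import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
logits = torch.randn(2, 3)                        # raw model outputs, 3 classes
soft_targets = torch.tensor([[0.66, 0.33, 0.01],  # one distribution per sample
                             [0.10, 0.80, 0.10]])
loss = criterion(logits, soft_targets)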

Difference between CrossEntropyLoss and NLLLoss with log_softmax in PyTorch?

When I am building a classifier in PyTorch, I have two options:
Using nn.CrossEntropyLoss without any modification to the model
Using nn.NLLLoss with F.log_softmax added as the last layer of the model
So there are two approaches.
Which approach should one use, and why?
They're the same.
If you check the implementation, you will find that it calls nll_loss after applying log_softmax on the incoming arguments.
return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
Edit: it seems the links are now broken; here's the C++ implementation, which shows the same information.
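You can also verify the equivalence numerically with a quick check (mine, not from the original answer):

import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.randn(8, 5)                  # batch of 8, 5 classes
targets = torch.randint(0, 5, (8,))         # integer class labels

ce = nn.CrossEntropyLoss()(logits, targets)
nll = nn.NLLLoss()(F.log_softmax(logits, dim=1), targets)
print(torch.allclose(ce, nll))              # True, up to float rounding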
Both cross-entropy and log-likelihood are two different interpretations of the same formula. In the log-likelihood case, we maximize the probability (actually the likelihood) of the correct class, which is the same as minimizing cross-entropy. You're correct that both terms have created some ambiguity in the literature; however, there are subtleties and caveats, so I would highly suggest you go through the thread below, where this topic has been rigorously discussed. You may find it useful.
Cross-Entropy or Log-Likelihood in Output layer

Difference between DCGAN & WGAN

In my understanding, DCGAN uses convolution layers in both the Generator and the Discriminator, while WGAN adjusts the loss function, the optimizer, the weight clipping, and the final sigmoid function. The parts they change do not overlap. So is there any conflict if I implement the changes from both DCGAN and WGAN in one model?
In my experience, DCGAN proposed a well-tuned and simple model (more specifically, a simple network structure with a well-tuned optimizer) for generating images, while WGAN proposed a new way of measuring the distance between the data distribution and the model distribution, and theoretically addressed GAN problems such as unstable training and mode collapse.
So you can combine the network structure and parameters proposed in DCGAN with the discriminator/generator update rule proposed in WGAN. I've done that before, and there is no conflict; a minimal sketch of that update follows below.
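Here is a minimal sketch of the WGAN critic update applied to a DCGAN-style critic/generator pair; critic, generator, and opt_c are placeholder names, not from the original post:

import torch

def critic_step(critic, generator, real, opt_c, z_dim=100, clip=0.01):
    # Sample fakes from the DCGAN-style generator
    z = torch.randn(real.size(0), z_dim, 1, 1)
    fake = generator(z).detach()
    # WGAN critic loss: minimize E[critic(fake)] - E[critic(real)]
    loss = critic(fake).mean() - critic(real).mean()
    opt_c.zero_grad()
    loss.backward()
    opt_c.step()
    # Weight clipping from the original WGAN paper (WGAN-GP replaces
    # this with a gradient penalty); run several critic steps per
    # generator step
    for p in critic.parameters():
        p.data.clamp_(-clip, clip)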
But in practice, you might not get a very good result with plain WGAN; it's more advisable to implement WGAN-GP.
[Image: samples generated by WGAN-GP]
Hope my answer is helpful.

Using PyTorch for scientific computation

I would like to use PyTorch as a scientific computation package. It has much to recommend it in that respect - its Tensors are basically GPU-accelerated numpy arrays, and its autograd mechanism is potentially useful for a lot of things besides neural networks.
However, the available tutorials and documentation seem strongly geared towards quickly getting people up and running using it for machine learning. Although there is lots of good information available on the Tensor and Variable classes (and I understand that material reasonably well), the nn and optim packages always seem to be introduced by example rather than by explaining the API, which makes it hard to figure out exactly what's going on.
My main question at this point is whether I can use the optim package without also using the nn package, and if so how to do so. Of course I can always implement my simulations as subclasses of nn.Module even though they are not neural networks, but I would like to understand what happens under the hood when I do this, and what benefits/drawbacks it would give for my particular application.
More broadly, I would appreciate pointers to any resource that gives more of a logical overview of the API (for nn and optim specifically), rather than just presenting examples.
This is a partial self-answer to the specific question about using optim without using nn. The answer is, yes, you can do that. In fact, from looking at the source code, the optim package doesn't know anything about nn and only cares about Variables and tensors.
The documentation gives the following incomplete example:
optimizer = optim.Adam([var1, var2], lr=0.0001)
and then later:
for input, target in dataset:
    optimizer.zero_grad()
    output = model(input)
    loss = loss_fn(output, target)
    loss.backward()
    optimizer.step()
The function model isn't defined anywhere and looks like it might be something to do with nn, but in fact it can just be a Python function that computes output from input using var1 and var2 as parameters, as long as all the intermediate steps are done using Variables so that it can be differentiated. The call to optimizer.step() will update the values of var1 and var2 automatically.
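To make that concrete, here is a minimal self-contained sketch (my own, not from the docs) that fits a line with optim and two bare tensors, with no nn anywhere; in modern PyTorch, Variable has been merged into Tensor, so requires_grad=True is all that's needed:

import torch

# Data for a toy regression: y = 3x + 2
x = torch.linspace(0, 1, 100)
y = 3 * x + 2

# Two bare parameter tensors; no nn.Module involved
a = torch.zeros(1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)
optimizer = torch.optim.Adam([a, b], lr=0.1)

for _ in range(500):
    optimizer.zero_grad()
    loss = ((a * x + b - y) ** 2).mean()  # plain autograd computation
    loss.backward()
    optimizer.step()                      # updates a and b in place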
In terms of the structure of PyTorch overall, it seems that optim and nn are independent of one another, with nn being basically just a convenient way to chain differentiable functions together, along with a library of such functions that are useful in machine learning. I would still appreciate pointers to a good technical overview of the whole package, though.

Weights in Convolution Layers in Keras

I want to know whether the filters' weights in, for example, a 2D convolution layer in Keras are shared along the spatial dimensions by default. If yes, is there any way to have unshared weights?
I found that LocallyConnected2D does what I am looking for.
The LocallyConnected2D layer works similarly to the Conv2D layer, except that weights are unshared, that is, a different set of filters is applied at each different patch of the input.
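A small illustrative comparison (assuming a Keras/TensorFlow version that still ships LocallyConnected2D; it has been removed from recent releases):

from tensorflow.keras import layers, models

model = models.Sequential([
    # One shared filter bank slid over every patch of the input
    layers.Conv2D(16, (3, 3), input_shape=(28, 28, 1)),
    # A distinct filter bank per patch, hence many more parameters
    layers.LocallyConnected2D(16, (3, 3)),
])
model.summary()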
I'm not entirely clear on what you're asking, but:
The weights within a single convolutional layer are shared. That is, each filter applies the same weights at every stride position.
However, the weights between two convolutional layers are not shared by default in Keras.
There is no getting around shared weights for the filters within a conv layer, since the execution of the convolution is offloaded to C++ libraries.
See this answer for further reference, in particular:
The implementation of tf.nn.conv2d() is written in C++, which invokes optimized code using either Eigen (on CPU) or the cuDNN library (on GPU). You can find the implementation here.
