Bitwise shift operator in Pytorch and ONNX export - pytorch

I have a quantized network from a framework, that I want to export as an ONNX file. The quantization requires adding intermediary layers performing bitwise right shift to avoid overflow. I have to insert these layers in between some other layers of the network.
I think I can use the Bitshift operator from Pytorch and somehow create the additional layers.
My first question is: do such layer already exist or do I have to create them from scratch? For example, could I use a linear layer, with a diagonal weight matrix, and tell the framework not to change the weights?
Then, for the ONNX export, I believe such layers are not supported yet. Is there a simple way to do the export anyway?
Thanks for your help.

Related

Conv2D filters and CNN architecture

I am currently pursuing undergraduation, I am working on CNN model to recognize Telegu characters.
This Questions has two parts,
I have a (32,32,1) shape Telegu character images, I want to train my CNN model to recognize the character. So, what should be my model architecture and how to decide the architecture, no of parameters and hidden layers. I know that my case is exactly same as handwritten digit recognition, but I want to know how to decide those parameters. Is there any common practice in building such architecture.
Operation Conv2D (32, (5,5)) means 32 filters of size 5x5 are applied on to the input, my question is are these filters all same or different, if different what kind of filters are initialized and who decides them?
I tried to surf internet but everywhere I go, the answer I get is Conv2D operation applies filters on input and does the convolution operation.
To decide which model architecture would be best, you need to experiment. Thats the only way. As you want to classify, VGG architecture would be a good starting point I believe. You need to experiment with number of parameters as it depends on your problem. You can use Keras Tuner for it: https://keras.io/keras_tuner/
For kernel initialization, as far as I know convolutional layers in Keras uses Glorot Uniform Initialization but you can change that by using kernel_initializer parameter. Long story short, convolutional layers are initialized with a distribution function and as training goes filters change the values inside, which is learning process. https://keras.io/api/layers/initializers
Edit: I forgot to inform you that I suggest VGG architecture but in a way you downsize the models a lot. Your input shape is little so if your model is too much deep, you will overfit really quickly.

Get exact formula used by Pytorch autograd to compute gradients

I am implementing a custom CNN with some custom modules in it. I have implemented only the forward pass for the custom modules and left their backward pass to autograd.
I have manually computed the correct formulae for backpropagation through the parameters of the custom modules, and I wished to see whether they match with the formulae used internally by autograd to compute the gradients.
Is there any way to see this?
Thanks
Edit (To add a test case) :-
I have a complex affine layer where the weights and inputs are complex-valued matrices, and the operation is a matrix multiplication of the weight and input matrices.
The multiplication of two complex numbers is given by -
(a+ib)(c+id) = (ac-bd)+i(ad+bc)
I computed the backpropagation formula for this layer given we have the incoming gradient from the higher layer.
It comes out to be dL/dI(n) = (hermitian(W(n))).matmul(dL/dI(n+1))
where I(n) and W(n) are the input and weight of nth layer and I(n+1) is input of (n+1)th layer.
So I wished to check whether autograd is also computing dL/dI(n) using the same formula that I derived.
(Since Pytorch doesn't support complex-valued tensors backpropagation as for now, I have created my own representation of complex numbers by dealing with separate real and imaginary tensors)
I don't believe there is such a feature in pytorch, even because it would be quite unreadable. What you can do is to implement a custom backward method for your layer with the formula you derived, then know by design that the backpropagation is what you want.

How to add CRF layer in a tensorflow sequential model?

I am trying to implement a CRF layer in a TensorFlow sequential model for a NER problem. I am not sure how to do it. Previously when I implemented CRF, I used CRF from keras with tensorflow as backend i.e. I created the entire model in keras instead of tensorflow and then passed the entire model through CRF. It worked.
But now I want to develop the model in Tensorflow as tensorflow2.0.0 beta already has keras inbuilt in it and I am trying to build a sequential layer and add CRF layer after a bidirectional lstm layer. Although I am not sure how to do that. I have gone through the CRF documentation in tensorflow-addons and it contains different functions such as forward CRF etc etc but not sure how to implement them as a layer ? I am wondering is it possible at all to implement a CRF layer inside a sequential tensorflow model or do I need to build the model graph from scratch and then use CRF functions ? Can anyone please help me with it. Thanks in advance
In the training process:
You can refer to this API:
tfa.text.crf_log_likelihood(
inputs,
tag_indices,
sequence_lengths,
transition_params=None
)
The inputs are the unary potentials(just like that in the logistic regression, and you can refer to this answer) and here in your case, they are the logits(it is usually not the distributions after the softmax activation function) or states of the BiLSTM for each character in the encoder(P1, P2, P3, P4 in the diagram above; ).
The tag_indices are the target tag indices, and the sequence_lengths represent the sequence lengths in a batch.
The transition_params are the binary potentials(also how the tag transits from one time step to the next), you can create the matrix yourself or you just let the API do it for you.
In the inference process:
You just utilize this API:
tfa.text.viterbi_decode(
score,
transition_params
)
The score stands for the same input like that in the training(the P1, P2, P3, P4 states) and the transition_params are also that trained in the training process.

Image Classification using Tensorflow

I am doing transfer-learning/retraining using Tensorflow Inception V3 model. I have 6 labels. A given image can be one single type only, i.e, no multiple class detection is needed. I have three queries:
Which activation function is best for my case? Presently retrain.py file provided by tensorflow uses softmax? What are other methods available? (like sigmoid etc)
Which Optimiser function I should use? (GradientDescent, Adam.. etc)
I want to identify out-of-scope images, i.e. if users inputs a random image, my algorithm should say that it does not belong to the described classes. Presently with 6 classes, it gives one class as a sure output but I do not want that. What are possible solutions for this?
Also, what are the other parameters that we may tweak in tensorflow. My baseline accuracy is 94% and I am looking for something close to 99%.
Since you're doing single label classification, softmax is the best loss function for this, as it maps your final layer logit values to a probability distribution. Sigmoid is used when it's multilabel classification.
It's always better to use a momentum based optimizer compared to vanilla gradient descent. There's a bunch of such modified optimizers like Adam or RMSProp. Experiment with them to see what works best. Adam is probably going to give you the best performance.
You can add an extra label no_class, so your task will now be a 6+1 label classification. You can feed in some random images with no_class as the label. However the distribution of your random images must match the test image distribution, else it won't generalise.

Weights in Convolution Layers in Keras

I want to know if the filters' weights in a, for example, 2D convolution layer in Keras are shared along the spatial dimensions by default. If yes, is there any way to have not shared weights?
I found that LocallyConnected2D does what I am looking for.
The LocallyConnected2D layer works similarly to the Conv2D layer, except that weights are unshared, that is, a different set of filters is applied at each different patch of the input.
I'm not clear on what your asking but:
The weights in the a single convolutional layer are shared. That is, the filters share the same weights with each stride.
However The weights between two convolutonal layers are not shared by default in keras.
There is no getting around shared wiegths in the filters within the conv layer. Since the execution of the convolution if offloaded to C++ libraries.
See this answer for further reference, in particular:
The implementation of tf.nn.conv2d() is written in C++, which invokes
optimized code using either Eigen (on CPU) or the cuDNN library (on
GPU). You can find the implementation here.

Resources