Calculate FLOPs in a custom PyTorch model - pytorch

I have a deeply nested PyTorch model and want to calculate the FLOPs per layer. I tried the flopth, ptflops, and pytorch-OpCounter libraries but couldn't get them to run on such a deeply nested model. How can I calculate the number of multiply/add operations and FLOPs for each layer in this model?

Using Flop Counter for PyTorch Models worked.
The ptflops documentation mentions the following limitation, which is why my custom model ran into errors:
**This script doesn't take into account torch.nn.functional.* operations. For an instance, if one have a semantic segmentation model and use torch.nn.functional.interpolate to upscale features, these operations won't contribute to overall amount of flops. To avoid that one can use torch.nn.Upsample instead of torch.nn.functional.interpolate.**
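If you would rather not depend on a library, per-layer counts can be collected with forward hooks, which reach modules at any nesting depth through named_modules(). The sketch below is a minimal illustration, not ptflops' implementation: it assumes only nn.Linear and nn.Conv2d matter, counts multiply-accumulates (MACs) and uses the common convention FLOPs ≈ 2 × MACs, ignoring biases and activations. Like ptflops, it will not see torch.nn.functional.* calls, because hooks only fire on modules. The helper name count_macs_per_layer is hypothetical. Recent PyTorch releases also ship torch.utils.flop_counter.FlopCounterMode, which counts at the operator level; check the documentation for your version.

```python
import torch
import torch.nn as nn


def count_macs_per_layer(model, example_input):
    """Return {qualified_module_name: MACs} for every nn.Linear / nn.Conv2d, at any depth."""
    macs = {}
    handles = []

    def make_hook(name):
        def hook(module, inputs, output):
            if isinstance(module, nn.Linear):
                # each output element is a dot product of length in_features
                per_output = module.in_features
            else:  # nn.Conv2d
                kh, kw = module.kernel_size
                per_output = (module.in_channels // module.groups) * kh * kw
            # accumulate, so a module called several times is counted each time
            macs[name] = macs.get(name, 0) + output.numel() * per_output
        return hook

    # named_modules() walks the whole tree, so nesting depth does not matter
    for name, module in model.named_modules():
        if isinstance(module, (nn.Linear, nn.Conv2d)):
            handles.append(module.register_forward_hook(make_hook(name)))
    try:
        model.eval()  # note: leaves the model in eval mode
        with torch.no_grad():
            model(example_input)
    finally:
        for h in handles:
            h.remove()
    return macs


# Usage on a small, deliberately nested model
net = nn.Sequential(
    nn.Sequential(nn.Conv2d(1, 2, 3), nn.ReLU()),
    nn.Flatten(),
    nn.Sequential(nn.Sequential(nn.Linear(2 * 3 * 3, 4))),
)
macs = count_macs_per_layer(net, torch.randn(1, 1, 5, 5))
for name, m in macs.items():
    print(f"{name}: {m} MACs, ~{2 * m} FLOPs")
```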

Related

BERT ensemble with a shared linear layer performs worse than the individual models

I fine-tuned two bert-base models, initialized with different weights, on the same dataset. I then combined the fine-tuned models via a shared linear layer. Assuming there is no bug in my code, is it possible for this combination to perform worse than the individual models during training, and hence on the test set? This is the situation I'm seeing.
No. The shared linear layer is essentially an ensemble method that combines two "weak" models into a single stronger model. Its parameters are learned to optimize performance on the training set, so unless the shared layer is designed in a way that prevents it from using the input features, its training performance should be at least as good as that of the better of the two models. This is because, at a minimum, the shared layer can learn to reproduce the output of the better model exactly and ignore the other one. Worse test performance is still possible, however, because the test distribution may differ from the training distribution.
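Some possible causes of your issue:
An initialization that leads the optimizer into a poor local optimum
A different activation function in the shared layer than in the individual models' heads
Other hyperparameter settings (e.g., learning rate)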
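The argument that a shared linear head can always fall back to one individual model can be made concrete. This is a minimal PyTorch sketch, assuming the shared layer receives the concatenated pooled features of both encoders; the small nn.Sequential encoders are hypothetical stand-ins for the two fine-tuned BERT models, and SharedHeadEnsemble / init_from_head are illustrative names, not a library API. Copying model A's own classifier into half of the shared head (and zeroing the other half) reproduces model A's logits exactly.

```python
import torch
import torch.nn as nn


class SharedHeadEnsemble(nn.Module):
    """Two encoders whose features are concatenated and fed to one shared linear head."""

    def __init__(self, enc_a, enc_b, hidden, num_labels):
        super().__init__()
        self.enc_a, self.enc_b = enc_a, enc_b
        self.hidden = hidden
        self.head = nn.Linear(2 * hidden, num_labels)

    def forward(self, x):
        feats = torch.cat([self.enc_a(x), self.enc_b(x)], dim=-1)
        return self.head(feats)

    @torch.no_grad()
    def init_from_head(self, head, which="a"):
        """Copy one model's classifier into the shared head and zero the other half."""
        self.head.weight.zero_()
        if which == "a":
            cols = slice(0, self.hidden)
        else:
            cols = slice(self.hidden, 2 * self.hidden)
        self.head.weight[:, cols].copy_(head.weight)
        self.head.bias.copy_(head.bias)


torch.manual_seed(0)
hidden, num_labels = 16, 3
enc_a = nn.Sequential(nn.Linear(8, hidden), nn.Tanh())  # hypothetical stand-in for BERT A
enc_b = nn.Sequential(nn.Linear(8, hidden), nn.Tanh())  # hypothetical stand-in for BERT B
head_a = nn.Linear(hidden, num_labels)                   # model A's own classifier

ens = SharedHeadEnsemble(enc_a, enc_b, hidden, num_labels)
ens.init_from_head(head_a, which="a")
x = torch.randn(5, 8)
same = torch.allclose(ens(x), head_a(enc_a(x)), atol=1e-6)
print(same)
```

Initializing the shared head from the better model's classifier like this is one way to rule out the initialization cause: training then starts from the better model's loss instead of from a random head.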

Cascading two or more LSTM models

I am working on a case study where I want to compare the performance of a standard LSTM model with the cascaded LSTM models shown in the block diagram. I would like to know which function could be useful for stacking these models. It is worth mentioning that each block's output sequence is the input to the next block, i.e., the LSTM-1hr model is cascaded with copies of itself, and the output block was trained separately in a supervised manner while the weights of the input block were frozen. The second block is initialized with the weights of the basic 1hr model.
(Image: block diagram of the models I want to build.)
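The question does not name a framework, so this is a sketch in PyTorch under stated assumptions, not the asker's architecture. It assumes each "LSTM-1hr" block maps a sequence of n_features to a sequence of the same width (so blocks can be chained), that the first block is the already-trained base model, and that LSTMBlock / build_cascade are hypothetical names. The second block starts as a copy of the base weights, the base is frozen, and only the second block's parameters go to the optimizer.

```python
import copy

import torch
import torch.nn as nn


class LSTMBlock(nn.Module):
    """One block: sequence in, sequence of the same feature width out (assumed design)."""

    def __init__(self, n_features, hidden):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.proj = nn.Linear(hidden, n_features)

    def forward(self, x):
        out, _ = self.lstm(x)   # (batch, seq, hidden)
        return self.proj(out)   # (batch, seq, n_features): feeds the next block


def build_cascade(base, n_blocks):
    # copy the base weights into the later blocks BEFORE freezing,
    # so the copies remain trainable
    blocks = [base] + [copy.deepcopy(base) for _ in range(n_blocks - 1)]
    for p in base.parameters():
        p.requires_grad_(False)  # freeze the input block
    return nn.Sequential(*blocks)


torch.manual_seed(0)
base = LSTMBlock(n_features=3, hidden=8)  # pretend this is the trained 1hr model
cascade = build_cascade(base, n_blocks=2)
optimizer = torch.optim.Adam(
    [p for p in cascade.parameters() if p.requires_grad], lr=1e-3
)

# one supervised step: only the second (output) block is updated
x = torch.randn(4, 10, 3)       # (batch, time steps, features)
target = torch.randn(4, 10, 3)
frozen_before = base.lstm.weight_ih_l0.clone()
y_hat = cascade(x)
loss = nn.functional.mse_loss(y_hat, target)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

nn.Sequential works here only because every block takes and returns a single tensor; a block that also returned hidden states would need a small wrapper.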

How to get the pruned random forest model after pruning?

In a random forest regressor from scikit-learn, it is possible to set a ccp_alpha parameter that controls cost-complexity pruning (docs), and I'm using it to control overfitting.
After applying it, I would like to use this pruned model for hyperparameter tuning with random search to find my best model. So I want this pruned model.
Is it possible to get this pruned model?
When you call .fit(X_train, y_train) on a RandomForestClassifier() or RandomForestRegressor() object with ccp_alpha set, the returned fitted model has already been pruned: each tree is pruned as part of fitting.
This happens under the hood in the sklearn implementation. Note that with the default ccp_alpha=0.0 no pruning is performed at all; a random forest is an aggregate of (by default, bootstrapped) decision trees that are normally grown deep, and pruning only applies when ccp_alpha > 0.
Pruning reduces overfitting rather than causing it, so the returned model is not overfitting because of pruning. If you still notice overfitting, I'd suggest checking the out-of-bag (OOB) score of your model (oob_score=True) and describing your entire data pipeline for further suggestions.
Refer to this documentation from scikit-learn:
https://scikit-learn.org/stable/auto_examples/tree/plot_cost_complexity_pruning.html
It includes a detailed explanation of cost-complexity pruning.
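A quick way to convince yourself that the fitted forest is already pruned is to compare total node counts with and without ccp_alpha, using the same random_state so the unpruned trees would be identical. For the tuning step, one option (a suggestion, not the only approach) is to make ccp_alpha itself part of the random search, so every candidate is a pruned forest and best_estimator_ is the pruned model you want. The synthetic data, alpha values and search ranges below are arbitrary illustrative choices.

```python
from scipy.stats import loguniform
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)


def total_nodes(forest):
    """Sum of node counts over all fitted trees in the forest."""
    return sum(est.tree_.node_count for est in forest.estimators_)


# same random_state -> same bootstrap samples; only the pruning differs
unpruned = RandomForestRegressor(n_estimators=20, random_state=0).fit(X, y)
pruned = RandomForestRegressor(n_estimators=20, ccp_alpha=50.0, random_state=0).fit(X, y)
print(total_nodes(unpruned), total_nodes(pruned))

# tune ccp_alpha together with other hyperparameters
search = RandomizedSearchCV(
    RandomForestRegressor(n_estimators=20, random_state=0),
    param_distributions={
        "ccp_alpha": loguniform(1e-1, 1e2),
        "max_features": [0.5, 1.0],
    },
    n_iter=4,
    cv=3,
    random_state=0,
)
search.fit(X, y)
best_pruned_model = search.best_estimator_  # refit on all data with the best ccp_alpha
print(search.best_params_)
```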

Difference between DCGAN & WGAN

In my understanding, DCGAN uses convolutional layers in both the generator and the discriminator, while WGAN changes the loss function and the optimizer, adds weight clipping, and removes the final sigmoid. The parts they change don't overlap. So is there any conflict if I implement both the DCGAN and the WGAN changes in one model?
In my experience, DCGAN proposed a well-tuned, simple model for generating images (more specifically, a simple network architecture with a well-tuned optimizer). WGAN proposed a new way of measuring the distance between the data distribution and the model distribution (the Wasserstein distance) and, with theoretical justification, addressed GAN problems such as unstable training and mode collapse.
So you can use the network architecture and hyperparameters proposed in DCGAN together with the way WGAN updates the parameters of the discriminator (critic) and the generator. I've done that before, and they don't conflict.
In practice, though, you might not get very good results with the original WGAN. It's more advisable to implement WGAN-GP, which replaces weight clipping with a gradient penalty.
(Image: samples generated by WGAN-GP.)
Hope my answer is helpful.
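Combining the two ideas mostly means keeping a DCGAN-style convolutional critic and swapping in the WGAN-GP losses. The PyTorch sketch below is illustrative only: it assumes 1×16×16 inputs, uses a tiny critic with no final sigmoid, and leaves out batch normalization in the critic, because the WGAN-GP paper advises against it (the penalty is computed per sample). make_critic, gradient_penalty, critic_loss and generator_loss are hypothetical helper names, and the random "fake" batch stands in for generator output.

```python
import torch
import torch.nn as nn


def make_critic():
    # DCGAN-style strided convolutions; raw score output (no sigmoid) as in WGAN
    return nn.Sequential(
        nn.Conv2d(1, 8, 4, stride=2, padding=1),   # 16x16 -> 8x8
        nn.LeakyReLU(0.2),
        nn.Conv2d(8, 16, 4, stride=2, padding=1),  # 8x8 -> 4x4
        nn.LeakyReLU(0.2),
        nn.Flatten(),
        nn.Linear(16 * 4 * 4, 1),
    )


def gradient_penalty(critic, real, fake, lambda_gp=10.0):
    """Penalize deviations of the critic's gradient norm from 1 on interpolated samples."""
    eps = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    interp = (eps * real.detach() + (1 - eps) * fake.detach()).requires_grad_(True)
    scores = critic(interp)
    (grads,) = torch.autograd.grad(scores.sum(), interp, create_graph=True)
    grad_norm = grads.flatten(1).norm(2, dim=1)
    return lambda_gp * ((grad_norm - 1) ** 2).mean()


def critic_loss(critic, real, fake):
    fake = fake.detach()  # do not backprop into the generator on critic steps
    return critic(fake).mean() - critic(real).mean() + gradient_penalty(critic, real, fake)


def generator_loss(critic, fake):
    return -critic(fake).mean()


torch.manual_seed(0)
critic = make_critic()
real = torch.randn(4, 1, 16, 16)
fake = torch.randn(4, 1, 16, 16)  # stand-in for generator(z)
loss = critic_loss(critic, real, fake)
loss.backward()
```

The WGAN-GP paper's defaults are λ = 10, five critic updates per generator update, and Adam with lr = 1e-4, betas = (0, 0.9); treat these as a starting point rather than tuned values for your data.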

Keras: a better way to implement a layer-wise training model?

I'm currently learning how to implement layer-wise training with Keras. My solution is complicated and time-consuming; could someone suggest an easier way? Also, could someone explain the topology of Keras, especially the relations among nodes.outbound_layer and nodes.inbound_layer, and how they are associated with the tensors input_tensors and output_tensors? Looking at the topology source code on GitHub, I'm quite confused about:
input_tensors[i] == inbound_layers[i].inbound_nodes[node_indices[i]].output_tensors[tensor_indices[i]]
Why do the inbound_nodes contain output_tensors? I'm not clear about how they relate to each other. If I want to remove layers at certain positions in a functional-API model, what should I remove first? And when adding layers at certain positions, what should I do first?
Here is my solution for layer-wise training. It works for a Sequential model, and I'm now trying to implement it with the functional API:
To do it, I simply add a new layer after the previous training finishes, then re-compile (model.compile()) and re-fit (model.fit()).
Since a Keras model requires an output layer, I always add one. As a result, each time I want to add a new layer, I have to remove the output layer and then add it back. This can be done with model.pop(), but only if model is a keras.Sequential() model.
The Sequential() model supports many useful methods, including model.add(layer). But for a custom model built with the functional API, model = Model(inputs=..., outputs=...), pop() and add() are not supported, and implementing them myself is time-consuming and inconvenient.
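One way to avoid pop() and graph surgery in the functional API is to build a fresh Model at each stage, reusing the already-trained hidden layer objects (Keras shares a layer's weights when the same layer is called on a new input) and attaching a new output layer. The old output layer is simply not reused, so nothing has to be removed. This is a minimal sketch under assumptions: a dense classifier on flat inputs, earlier layers frozen at each stage, and hypothetical helper names build_stage / layerwise_train. It sidesteps the inbound_nodes internals rather than explaining them.

```python
import keras
import numpy as np


def build_stage(n_inputs, hidden_layers, n_classes):
    """Chain the given hidden layers (reused, weights shared), then add a fresh output layer."""
    inputs = keras.Input(shape=(n_inputs,))
    x = inputs
    for layer in hidden_layers:
        x = layer(x)
    outputs = keras.layers.Dense(n_classes, activation="softmax")(x)
    return keras.Model(inputs, outputs)


def layerwise_train(X, y, n_classes, widths, epochs=1):
    hidden = []
    model = None
    for width in widths:
        for layer in hidden:
            layer.trainable = False  # freeze layers trained in earlier stages (before compile)
        hidden.append(keras.layers.Dense(width, activation="relu"))
        model = build_stage(X.shape[1], hidden, n_classes)
        model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
        model.fit(X, y, epochs=epochs, verbose=0)
    return model, hidden


rng = np.random.default_rng(0)
X = rng.random((64, 10)).astype("float32")
y = rng.integers(0, 3, size=64)
model, hidden = layerwise_train(X, y, n_classes=3, widths=[16, 8])
```

Whether earlier layers should stay frozen (greedy layer-wise training) or be fine-tuned at the end is a design choice; setting trainable back to True and recompiling allows a final joint pass.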
