pytorch profiler explanation of layers - pytorch

I am using the pytorch profiler api but i am having difficulty understanding the output. Here in the figure what does the Name column mean? I understand it shows the model layers but what is aten:gru,aten:uniform etc. Also I want the output to be like that of torchprof where i can see how much time each feature is taking (a more detailed breakdown). What's the equivalent of prof.display(show_events=True) in pytorch profiler?

Related

How to get the inference compute graph of the pytorch model?

I want to hand write a framework to perform inference of a given neural network. The network is so complicated, so to make sure my implementation is correct, I need to know how exactly the inference process is done on device.
I tried to use torchviz to visualize the network, but what I got seems to be the back propagation compute graph, which is really hard to understand.
Then I tried to convert the pytorch model to ONNX format, following the instruction enter link description here, but when I tried to visualize it, it seems that the original layers of the model had been seperated into very small operators.
I just want to get the result like this
How can I get this? Thanks!
Have you tried saving the model with torch.save (https://pytorch.org/tutorials/beginner/saving_loading_models.html) and opening it with Netron? The last view you showed is a view of the Netron app.
You can try also the package torchview, which provides several features (useful especially for large models). For instance you can set the display depth (depth in nested hierarchy of moduls).
It is also based on forward prop
github repo
Disclaimer: I am the author of the package
Note: The accepted format for tool is pytorch model

What type of optimization to perform on my multi-label text classification LSTM model with Keras?

I'm using Windows 10 machine. Libraries: Keras with Tensorflow 2.0 Embeddings: Glove(100 dimensions).
I am trying to implement an LSTM architecture for multi-label text classification.
I am using different types of fine-tuning to achieve better results but with no luck so far.
The main problem I believe is the difference in class distributions of my dataset but after a lot of tries and errors, I couldn't implement stratified-k-split in Keras.
I am also experimenting with dropout layers, batch sizes, # of layers, learning rates, clip values, validation splits but I get a minimum boost or worst performance sometimes.
For metrics, I use mainly ROC and F1.
I also followed the suggestion from a StackOverflow member who said to delete some of my examples so I can balance my dataset but if I do that I will have a very low number of examples.
What would you suggest to me?
If someone can provide code based on my implementation for
stratified-k-split I would be grateful cause I have checked all the
online resources but can't implement it.
Any tips, suggestions will be really helpful.
Metrics Plots
Dataset form+Embedings form+train-test-split form
Dataset's labels distribution
My LSTM implementation

Looking for input on an accuracy rate that is different than the exact deep learning compiled code

I just began my Deep learning journey with Keras along with Tenserflow. I followed a tutorial that used a feed forward model on MNIST dataset. The strange part is that I used the same complied code, yet, I got a higher accuracy rate than the exact same code. I'm looking to understand why or how can this happen?

Hyperparameters tuning with Google Cloud ML Engine and XGBoost

I am trying to replicate the hyperparameter tuning example reported at this link but I want to use scikit learn XGBoost instead of tensorflow in my training application.
I am able to run multiple trials in a single job, on for each of the hyperparameters combination. However, the Training output object returned by ML-Engine does not include the finalMetric field, reporting metric information (see the differences in the picture below).
What I get with the example of the link above:
Training output object with Tensorflow training app
What I get running my Training application with XGBoost:
Training output object with XGBoost training app
Is there a way for XGBoost to return training metrics to ML-Engine?
It seems that this process is automated for tensorflow, as specified in the documentation:
How Cloud ML Engine gets your metric
You may notice that there are no instructions in this documentation
for passing your hyperparameter metric to the Cloud ML Engine training
service. That's because the service monitors TensorFlow summary events
generated by your training application and retrieves the metric.
Is there a similar mechanism for XGBoost?
Now, I can always dump each metric results to a file at the end of each trial and then analyze them manually to select the best parameters. But, by doing so, am I loosing the automated mechanism offered by Cloud ML Engine, especially concerning the "ALGORITHM_UNSPECIFIED" hyperparameters search algorithm?
i.e.,
ALGORITHM_UNSPECIFIED: [...] applies Bayesian optimization to search
the space of possible hyperparameter values, resulting in the most
effective technique for your set of hyperparameters.
Hyperparameter tuning support of XGBoost was implemented in a different way. We created the cloudml-hypertune python package to help do it. We're still working on the public doc for it. At the meantime, you can refer to this staging sample to learn about how to use it.
Sara Robinson over at google put together a good post on how to do this. Rather than regurgitate and claim it as my own, I'll post this here for anyone else that comes across this post:
https://sararobinson.dev/2019/09/12/hyperparameter-tuning-xgboost.html

Caffe vs Theano MNIST example

I'm trying to learn (and compare) different deep learning frameworks, by the time they are Caffe and Theano.
http://caffe.berkeleyvision.org/gathered/examples/mnist.html
and
http://deeplearning.net/tutorial/lenet.html
I follow the tutorial to run those frameworks on MNIST dataset. However, I notice a quite difference in term of accuracy and performance.
For Caffe, it's extremely fast for the accuracy to build up to ~97%. In fact, it only takes 5 mins to finish the program (using GPU) which the final accuracy on test set of over 99%. How impressive!
However, on Theano, it is much poorer. It took me more than 46 minutes (using same GPU), just to achieve 92% test performance.
I'm confused as it should not have so much difference between the frameworks running relatively same architectures on same dataset.
So my question is. Is the accuracy number reported by Caffe is the percentage of correct prediction on test set? If so, is there any explanation for the discrepancy?
Thanks.
The examples for Theano and Caffe are not exactly the same network. Two key differences which I can think of are that the Theano example uses sigmoid/tanh activation functions, while the Caffe tutorial uses the ReLU activation function, and that the Theano code uses normal minibatch gradient descent while Caffe uses a momentum optimiser. Both differences will significantly affect the training time of your network. And using the ReLU unit will likely also affect the accuracy.
Note that Caffe is a deep learning framework which already has ready-to-use functions for many commonly used things like the momentum optimiser. Theano, on the other hand, is a symbolic maths library which can be used to build neural networks. However, it is not a deep learning framework.
The Theano tutorial you mentioned is an excellent resource to understand how exactly convolutional and other neural networks work on a basic level. However, it will be cumbersome to implement all the state-of-the-art tweaks. If you want to get state-of-the-art results quickly you are better off using one of the existing deep learning frameworks. Apart from Caffe, there are a number of frameworks based on Theano. I know of keras, blocks, pylearn2, and my personal favourite lasagne.

Resources