Caffe vs Theano MNIST example


I'm trying to learn (and compare) different deep learning frameworks; at the moment these are Caffe and Theano.
http://caffe.berkeleyvision.org/gathered/examples/mnist.html
and
http://deeplearning.net/tutorial/lenet.html
I followed the tutorials to run both frameworks on the MNIST dataset. However, I noticed quite a difference in terms of accuracy and performance.
With Caffe, the accuracy climbs to ~97% extremely quickly. In fact, the whole program finishes in only 5 minutes (using a GPU), with a final accuracy on the test set of over 99%. How impressive!
With Theano, however, the results are much poorer. It took more than 46 minutes (using the same GPU) just to reach 92% test performance.
I'm confused, as there should not be so much difference between frameworks running roughly the same architecture on the same dataset.
So my question is: is the accuracy reported by Caffe the percentage of correct predictions on the test set? If so, is there any explanation for the discrepancy?
Thanks.

The examples for Theano and Caffe are not exactly the same network. Two key differences I can think of are that the Theano example uses sigmoid/tanh activation functions while the Caffe tutorial uses the ReLU activation function, and that the Theano code uses plain minibatch gradient descent while Caffe uses a momentum optimiser. Both differences will significantly affect the training time of your network, and using ReLU units will likely also affect the accuracy.
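To make the optimiser difference concrete, here is a minimal sketch of the two update rules on a toy quadratic loss. The loss function, the learning rate, and the momentum value are illustrative assumptions, not the settings of either tutorial, and the gradients are exact rather than computed on minibatches.

```python
# Toy comparison of plain gradient descent and classical momentum.
# Assumptions (not taken from either tutorial): the loss is
# f(w) = 0.5 * (w0**2 + 100 * w1**2), lr = 0.01, mu = 0.9.

CURVATURES = (1.0, 100.0)

def loss(w):
    return 0.5 * sum(h * x * x for h, x in zip(CURVATURES, w))

def grad(w):
    return [h * x for h, x in zip(CURVATURES, w)]

def run_sgd(w, lr, steps):
    # Plain update: w <- w - lr * grad
    for _ in range(steps):
        g = grad(w)
        w = [x - lr * gx for x, gx in zip(w, g)]
    return w

def run_momentum(w, lr, mu, steps):
    # Momentum update: v <- mu * v - lr * grad; w <- w + v
    v = [0.0] * len(w)
    for _ in range(steps):
        g = grad(w)
        v = [mu * vx - lr * gx for vx, gx in zip(v, g)]
        w = [x + vx for x, vx in zip(w, v)]
    return w

start = [1.0, 1.0]
loss_sgd = loss(run_sgd(start, lr=0.01, steps=100))
loss_momentum = loss(run_momentum(start, lr=0.01, mu=0.9, steps=100))
print(loss_sgd, loss_momentum)
```

On this toy problem the momentum run ends at a lower loss after the same number of steps. Real networks are noisier and non-convex, so treat this as intuition only.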
Note that Caffe is a deep learning framework which already has ready-to-use functions for many commonly used things like the momentum optimiser. Theano, on the other hand, is a symbolic maths library which can be used to build neural networks. However, it is not a deep learning framework.
The Theano tutorial you mentioned is an excellent resource for understanding how exactly convolutional and other neural networks work at a basic level. However, it would be cumbersome to implement all the state-of-the-art tweaks yourself. If you want to get state-of-the-art results quickly, you are better off using one of the existing deep learning frameworks. Apart from Caffe, there are a number of frameworks based on Theano. I know of Keras, Blocks, Pylearn2, and my personal favourite, Lasagne.

Related

What type of optimization to perform on my multi-label text classification LSTM model with Keras?

I'm using a Windows 10 machine. Libraries: Keras with TensorFlow 2.0. Embeddings: GloVe (100 dimensions).
I am trying to implement an LSTM architecture for multi-label text classification.
I have tried different kinds of fine-tuning to get better results, but with no luck so far.
I believe the main problem is the imbalance in the class distribution of my dataset, but after a lot of trial and error I couldn't implement stratified-k-split in Keras.
I am also experimenting with dropout layers, batch sizes, the number of layers, learning rates, clip values, and validation splits, but I get a minimal boost or sometimes even worse performance.
For metrics, I mainly use ROC and F1.
I also followed the suggestion of a StackOverflow member who said to delete some of my examples to balance the dataset, but if I do that I will be left with a very low number of examples.
What would you suggest?
If someone can provide code based on my implementation for stratified-k-split I would be grateful, because I have checked all the online resources but can't implement it.
Any tips or suggestions will be really helpful.
[Images: metrics plots; dataset, embeddings, and train-test split format; label distribution of the dataset; my LSTM implementation]

Tensorflow and Bert What are they exactly and what's the difference between them?

I'm interested in NLP and I came across TensorFlow and BERT. Both seem to be from Google and both seem to be the best thing for sentiment analysis as of today, but I don't understand what exactly they are and what the difference between them is. Can someone explain?

TensorFlow is an open-source machine learning library that lets you build deep learning models and architectures, whereas BERT is itself one such architecture. You can build many models with TensorFlow, including RNNs, LSTMs, and even BERT. Transformers like BERT are a good choice if you just want to apply a model to your data and are not interested in the deep learning field itself. For this purpose, I recommend the HuggingFace library, which provides a straightforward way to use a transformer model in just a few lines of code. But if you want to take a deeper look at these models, I suggest you learn about the well-known deep learning architectures for text data, such as RNNs, LSTMs, and CNNs, and try to implement them using an ML library like TensorFlow or PyTorch.

BERT and TensorFlow are not competing alternatives. There are not just two but many implementations of BERT, and most are basically equivalent.
The implementations that you mentioned are:
The original code by Google, in Tensorflow. https://github.com/google-research/bert
The implementation by HuggingFace, in PyTorch and TensorFlow, which reproduces the same results as the original implementation and uses the same checkpoints as the original BERT article. https://github.com/huggingface/transformers
These are the differences regarding different aspects:
In terms of results, there is no difference in using one or the other, as they both use the same checkpoints (same weights) and their results have been checked to be equal.
In terms of reusability, the HuggingFace library is probably more reusable, as it is designed specifically for that. It also gives you the freedom to choose TensorFlow or PyTorch as the deep learning framework.
In terms of performance, they should be the same.
In terms of community support (e.g. asking questions about them on GitHub or Stack Overflow), the HuggingFace library is better suited, as there are a lot of people using it.
Apart from BERT, the transformers library by HuggingFace has implementations for lots of models: OpenAI GPT-2, RoBERTa, ELECTRA, ...

Why is accuracy higher with Caffe than with tf.keras?

I converted a model from tf.keras to Caffe. When I evaluate the model with Caffe on the test set, the accuracy is higher than with tf.keras. I can't think of a way to get a handle on the source of the problem (if there is a problem in the first place...).
Is this difference due to the lower-level libraries used to accelerate the computations (I am thinking of cuDNN and the Caffe engine)? Is there a well-known accuracy problem with the Keras module of TensorFlow?
By the way, there are other people that have a similar issue:
https://github.com/keras-team/keras/issues/4444

This can happen.
Once you convert your Keras .h5 model to a .caffemodel, the weights are copied numerically. Internally, however, the model is now loaded and run by Caffe, not Keras.
As Caffe and Keras are two different libraries, their internal algorithms can vary slightly. Changing your pre-processing scheme can change the result too. Usually, pruning the model (to reduce its size) lowers performance; in rare cases, though, it can act as a form of extreme regularization and actually boost test performance.
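As a toy illustration of the pre-processing point, the sketch below applies the same copied weights to one pixel in two channel orders. The weights, the pixel, and the `score` helper are all hypothetical; an RGB/BGR mix-up is just one example of a pre-processing mismatch between two pipelines, not a diagnosis of this particular case.

```python
# Hypothetical example: identical weights, different channel order.

def score(pixel, weights):
    """Dot product of one 3-channel pixel with per-channel weights."""
    return sum(p * w for p, w in zip(pixel, weights))

weights = [0.5, -0.25, 0.1]   # made-up weights, "numerically copied"
pixel_rgb = [200, 50, 10]     # made-up pixel in RGB order
pixel_bgr = pixel_rgb[::-1]   # the same pixel fed in BGR order

score_rgb = score(pixel_rgb, weights)
score_bgr = score(pixel_bgr, weights)
print(score_rgb, score_bgr)
```

The two scores differ even though the weights are identical, so it is worth checking that both evaluations feed the model exactly the same input tensors before comparing accuracies.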

I want to customise the last layer of VGG 19 architecture for a classification. which will be more useful keras or pytorch?

I want to customise the last layer of the VGG19 architecture for a classification problem. Which will be more useful, Keras or PyTorch?

It heavily depends on what you want to do with it.
Keras offers different backends, such as TensorFlow or Theano (which in turn can give you a little more flexibility), and transfers better to production systems. PyTorch, on the other hand, is definitely also easy to work with. Additionally, it scales well on (multi-)GPU systems, since it is trivial to offload your computations to the GPU in a PyTorch model. I do not know how easy that is in Keras (I have never done it, so I genuinely cannot judge).
If you just want to play around with one of the frameworks, it usually boils down to personal preference. I personally prefer PyTorch, due to its more "python-esque" approach to things, but I know many people that prefer Keras because of its clear and simple layout and documentation.
Providing a little more information, or your context, can also potentially increase the quality of the answers you receive.

Can i turn the CIFAR-10 dataset to grayscale images and convert it to same dimension as MNIST dataset. Will the model be invalid or fail to learn?

I'm new to the field of deep neural networks. There are various deep learning frameworks around, notably Theano, Torch7, Caffe, and the recently open-sourced TensorFlow. I have tried out a couple of the tutorials that TensorFlow provides on its site, specifically the one on the MNIST dataset. I guess this is the hello world of every deep learning framework out there. I also viewed tutorials from here. This one was explained in detail, but it does not provide hands-on experience with any deep learning framework. So which framework is better for beginners? I looked up similar questions asked on Quora. Some said that Theano is tougher to learn but gives more control, while Caffe is easier but gives less control over the network. There was nothing on TensorFlow, as it is new, but from what I've seen the documentation is not that well written, and it also seems tougher to understand. So, as a newbie, what should I choose to learn?
Another question: as I said, MNIST is the hello world of every deep learning framework, and many neural networks can be found for recognizing the MNIST dataset. So, if I use the same network on another dataset, say CIFAR-10, will it work? Let's say I turn the CIFAR-10 images into grayscale and convert them to the same dimensions as the MNIST images. Will the model be invalid, fail to learn, have bad accuracy, or what?
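For reference, the conversion described above could be sketched as follows for a single image. This is only one possible approach and rests on assumptions not stated in the question: the image is a 32x32 nested list of RGB tuples with values 0-255, grayscale uses the BT.601 luma weights, and the 28x28 size is reached with a centre crop rather than interpolation. The function names are hypothetical.

```python
# Sketch: turn one 32x32 RGB image (nested lists, values 0-255) into a
# 28x28 grayscale image. Assumptions: BT.601 luma weights and a centre
# crop; resizing by interpolation would be an equally valid choice.

def to_grayscale(image):
    # Weighted sum of the R, G, B channels, rounded to an integer.
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in image]

def center_crop(image, size):
    # Keep the central size x size window of a 2D image.
    top = (len(image) - size) // 2
    left = (len(image[0]) - size) // 2
    return [row[left:left + size] for row in image[top:top + size]]

def cifar_to_mnist_shape(image):
    return center_crop(to_grayscale(image), 28)

# Dummy all-white 32x32 image standing in for a CIFAR-10 sample.
white = [[(255, 255, 255)] * 32 for _ in range(32)]
small = cifar_to_mnist_shape(white)
print(len(small), len(small[0]))
```

After this step the shapes match what an MNIST network expects. Whether such a network then reaches good accuracy on the converted images is a separate question that this sketch does not answer.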
