I want to get a deeper understanding of how Keras layers work in a model: what each layer does in the model, and so on. I followed the Keras documentation, but the information there isn't enough. If any of you know a place to learn more, let me know. Thanks in advance.
Keras layers are the widely used CNN, DNN and RNN layers. There is at least one research paper for each of them, and there is a lot of educational material out there. If you are really curious, you could look at Keras' code. Some links for you:
https://github.com/keras-team/keras/tree/master/keras/layers
http://cs231n.github.io/convolutional-networks/
https://leonardoaraujosantos.gitbooks.io/artificial-inteligence
http://www.jmlr.org/papers/volume15/srivastava14a/srivastava14a.pdf
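To give a rough idea of the kind of computation hiding behind a layer, here is a minimal pure-Python sketch of a fully connected (Dense) layer and of dropout, the technique from the last paper linked above. This is not Keras' actual code: the function names `dense` and `dropout` and the toy numbers are made up for illustration, and the dropout shown is the "inverted" variant that rescales the kept values at training time.

```python
import random


def dense(x, kernel, bias, activation=None):
    """Fully connected layer: activation(x . kernel + bias).

    kernel has shape (n_inputs, n_units), one column per output unit.
    """
    out = [
        sum(x[i] * kernel[i][j] for i in range(len(x))) + bias[j]
        for j in range(len(bias))
    ]
    if activation == "relu":
        out = [max(0.0, v) for v in out]
    return out


def dropout(x, rate, training, rng):
    """Zero each value with probability `rate` during training.

    Kept values are scaled by 1 / (1 - rate) so the expected sum is unchanged.
    At inference time the input passes through untouched.
    """
    if not training:
        return list(x)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in x]


# Toy example: 2 inputs -> 2 units with ReLU, then dropout.
hidden = dense([1.0, 2.0], [[1.0, -1.0], [0.5, -1.0]], [0.0, 0.5], activation="relu")
dropped = dropout(hidden, rate=0.5, training=True, rng=random.Random(0))
```

Reading the real layer code in the first link with this picture in mind should make it easier to follow.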
LayoutLM builds on top of BERT as the baseline, but I want to replace BERT with MobileBERT because BERT is too large. Unfortunately, the Huggingface Transformers library doesn't give you the option to change the baseline model for LayoutLM. How should I go about swapping BERT for MobileBERT? I'm aware they have very different configurations.
I'm aware this is a very broad question and a wide topic, but I can't find anything about it online. How would I go about it and where should I start?
LayoutLM can be trained with the MiniLM models, but with a slight accuracy loss.
I'm interested in NLP and I came across TensorFlow and BERT. Both seem to be from Google, and both seem to be the best thing for sentiment analysis as of today, but I don't understand what exactly they are and what the difference between them is. Can someone explain?
TensorFlow is an open-source machine learning library that lets you build deep learning models and architectures. BERT, on the other hand, is one of those architectures. You can build many models using TensorFlow, including RNNs, LSTMs, and even BERT. Transformers like BERT are a good choice if you just want to deploy a model on your data and you don't care about the deep learning field itself. For this purpose, I recommend the HuggingFace library, which provides a straightforward way to employ a transformer model in just a few lines of code. But if you want to take a deeper look at these models, I suggest you learn about the well-known deep learning architectures for text data, like RNN, LSTM, CNN, etc., and try to implement them using an ML library like TensorFlow or PyTorch.
BERT and TensorFlow are not two alternatives to choose between. There are not only 2, but many implementations of BERT. Most are basically equivalent.
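As a sketch of what "a few lines of code" can look like, the snippet below wraps the `pipeline("sentiment-analysis")` helper from the HuggingFace `transformers` package. The wrapper function `classify_sentiment` and its `classifier` parameter are my own additions for illustration; the assumption is that `transformers` and a backend such as PyTorch or TensorFlow are installed, and that the default model can be downloaded the first time the pipeline is created.

```python
def classify_sentiment(texts, classifier=None):
    """Return (text, label, score) tuples for each input text.

    If no classifier is passed in, build the default HuggingFace
    sentiment-analysis pipeline (downloads a model on first use).
    """
    texts = list(texts)
    if classifier is None:
        # Imported lazily so the function can also be used with a stand-in classifier.
        from transformers import pipeline

        classifier = pipeline("sentiment-analysis")
    results = classifier(texts)
    return [(text, r["label"], round(r["score"], 3)) for text, r in zip(texts, results)]
```

Calling `classify_sentiment(["I really enjoyed this movie."])` would then return one tuple with the predicted label and its score. Passing your own `classifier` is only there so the function can be tried without downloading anything.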
The implementations that you mentioned are:
The original code by Google, in TensorFlow. https://github.com/google-research/bert
Implementation by Huggingface, in PyTorch and TensorFlow, that reproduces the same results as the original implementation and uses the same checkpoints as the original BERT article. https://github.com/huggingface/transformers
These are the differences regarding different aspects:
In terms of results, there is no difference in using one or the other, as they both use the same checkpoints (same weights) and their results have been checked to be equal.
In terms of reusability, the HuggingFace library is probably more reusable, as it is designed specifically for that. Also, it gives you the freedom of choosing TensorFlow or PyTorch as the deep learning framework.
In terms of performance, they should be the same.
In terms of community support (e.g. asking questions on GitHub or Stack Overflow about them), the HuggingFace library is better suited, as there are a lot of people using it.
Apart from BERT, the transformers library by HuggingFace has implementations for lots of models: OpenAI GPT-2, RoBERTa, ELECTRA, ...
I read many articles saying Keras is too high level and hard to use for research. I found that Keras has the Lambda layer and custom layers, so what are some detailed examples of things PyTorch can achieve that Keras cannot, or can only with great difficulty? Thanks.
I'm trying to write code for A Neural Probabilistic Language Model by Yoshua Bengio (2003), but I'm not able to understand the connections between the input layer and the projection matrix, and between the projection matrix and the hidden layer. I don't get how exactly the learning of the word-vector representation takes place.
Have a look at this answer here.
It explains the difference between the hidden layer and the projection layer.
Referring to this thesis
Also, do read this paper by Tomas Mikolov and go through this tutorial.
This will really improve your understanding.
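To make the connections asked about a bit more concrete, here is a small pure-Python sketch of the forward pass of that model: the input word indices select rows of the projection matrix C, the selected rows are concatenated into x, the hidden layer computes tanh(d + Hx), and the output layer applies a softmax to b + Ua. The sizes and random weights are toy values I picked for illustration, and the optional direct input-to-output connections from the paper are left out. During training, the gradient of the loss flows back through x into the rows of C that were looked up, which is how the word vectors get learned.

```python
import math
import random

random.seed(0)

# Toy sizes (assumptions): vocabulary, embedding dim, context length, hidden units.
V, m, n_ctx, h = 10, 4, 3, 5


def rand_matrix(rows, cols):
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


C = rand_matrix(V, m)           # projection matrix: one m-dimensional row per word
H = rand_matrix(h, n_ctx * m)   # projection layer -> hidden layer weights
d = [0.0] * h                   # hidden bias
U = rand_matrix(V, h)           # hidden layer -> output scores
b = [0.0] * V                   # output bias


def matvec(M, v):
    return [sum(w * x for w, x in zip(row, v)) for row in M]


def forward(context_ids):
    # Input -> projection: look up each context word's row of C and concatenate.
    x = [value for idx in context_ids for value in C[idx]]
    # Projection -> hidden: a = tanh(d + H x).
    a = [math.tanh(bias + z) for bias, z in zip(d, matvec(H, x))]
    # Hidden -> output: y = b + U a, then a softmax over the vocabulary.
    y = [bias + z for bias, z in zip(b, matvec(U, a))]
    peak = max(y)
    exps = [math.exp(score - peak) for score in y]
    total = sum(exps)
    return [e / total for e in exps]


# Probability of each vocabulary word following the context words 1, 4, 7.
probs = forward([1, 4, 7])
```

Note that the projection layer has no nonlinearity: it is just a table lookup, which is the main difference from the hidden layer.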
Hope this helps!
I'm new to the field of deep neural networks. There are various deep learning frameworks around, notably Theano, Torch7, Caffe, and the recently open-sourced TensorFlow. I have tried out a couple of the tutorials TensorFlow provides on its site, specifically the MNIST one. I guess this is the hello world of every deep learning framework out there. I also viewed tutorials from here. They were explained in detail, but they do not provide hands-on experience with any deep learning framework. So which framework would be better for beginners? I looked up similar questions asked on Quora. Some said that Theano is tougher to learn but gives more control, while Caffe is easier but gives less control over the network. There was nothing on TensorFlow, as it is new, but from what I've seen the documentation is not that well written, and it also seems tougher to understand. So, as a newbie, what should I choose to learn?
Another question: as I said, MNIST is the hello world of every deep learning framework, and many neural networks for recognizing the MNIST dataset can be found. So, if I use the same network on another dataset, say CIFAR-10, will it work? Let's say that I turn the CIFAR-10 images into grayscale and convert them to the same dimensions as the MNIST images. Will the model be invalid or fail to learn, or will it just have bad accuracy?
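For reference, the preprocessing I have in mind would look roughly like the sketch below: take a 32x32 RGB CIFAR-10 image, convert it to grayscale, and bring it down to the 28x28 size of MNIST. The function name `to_mnist_shape` is made up, the image is assumed to be a nested list of (r, g, b) tuples with values 0 to 255, and I use a center crop with the standard luminance weights here; resizing by interpolation would be another option.

```python
def to_mnist_shape(rgb_image, size=28):
    """Convert an RGB image (rows of (r, g, b) tuples, 0-255) to a
    size x size grayscale image with values scaled to [0, 1]."""
    height, width = len(rgb_image), len(rgb_image[0])
    top, left = (height - size) // 2, (width - size) // 2
    gray = []
    for row in rgb_image[top:top + size]:
        gray.append([
            # ITU-R BT.601 luminance weights, then scale to [0, 1].
            (0.299 * r + 0.587 * g + 0.114 * b) / 255.0
            for r, g, b in row[left:left + size]
        ])
    return gray


# Toy input: a plain white 32x32 image standing in for a CIFAR-10 sample.
white = [[(255, 255, 255)] * 32 for _ in range(32)]
small = to_mnist_shape(white)
```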