The tf.estimator package defines many ready-made estimators, and I want to use them in Keras.
I checked the TF docs: there is only a method for converting a keras.Model to a tf.estimator, but no way to convert an estimator to a Model.
For example, if we want to convert the following estimator:
tf.estimator.DNNLinearCombinedRegressor
How could it be converted into a Keras Model?
You cannot, because estimators can run arbitrary code in their model_fn functions, whereas Keras models must be much more structured: whether sequential or functional, they must basically consist of layers.
A Keras model is a very specific type of object that can therefore be easily wrapped and plugged into other abstractions.
Estimators are based on arbitrary Python code with arbitrary control flow and so it's quite tricky to force any structure onto them.
Estimators support 3 modes - train, eval and predict. Each of these could in theory have completely independent flows, with different weights, architectures etc. This is almost unthinkable in Keras and would essentially amount to 3 separate models.
Keras, in contrast, supports 2 modes - train and test (which is necessary for things like Dropout and Regularisation).
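To make the point about independent flows concrete, here is a minimal sketch in plain Python (no TensorFlow involved). The mode names, the toy arithmetic and the returned dictionaries are all made up for illustration; a real model_fn builds TensorFlow ops and returns an EstimatorSpec.

```python
# Plain-Python imitation of an Estimator-style model_fn (illustrative only).
# These strings are stand-ins for tf.estimator.ModeKeys, not the real constants.
TRAIN, EVAL, PREDICT = "train", "eval", "predict"


def model_fn(features, labels, mode):
    """Each mode runs its own, completely independent code path."""
    if mode == TRAIN:
        # Training path: a linear "model" with one hard-coded weight.
        weight = 2.0
        preds = [weight * x for x in features]
        loss = sum((p - y) ** 2 for p, y in zip(preds, labels)) / len(labels)
        return {"loss": loss}
    if mode == EVAL:
        # Evaluation path: a different computation and a different metric.
        preds = [x + 1.0 for x in features]
        mae = sum(abs(p - y) for p, y in zip(preds, labels)) / len(labels)
        return {"mae": mae}
    if mode == PREDICT:
        # Prediction path: shares nothing with the two paths above.
        return {"predictions": [x ** 2 for x in features]}
    raise ValueError(f"unknown mode: {mode}")


train_out = model_fn([1.0, 2.0], [2.0, 4.0], TRAIN)
eval_out = model_fn([1.0, 2.0], [2.0, 4.0], EVAL)
predict_out = model_fn([3.0], None, PREDICT)
```

Nothing forces the three branches to share weights or structure, which is why there is no single layer graph that a converter could recover.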
Related
I have a (PyTorch) timm ViT-B/16 model that's been pre-trained on a bunch of domain specific data. I'd like to be able to load the parameters to an equivalent model created using the huggingface transformers library for usage with multi-modal data.
Googling hasn't really helped me locate a convenience function to do the conversion. Apart from going layer by layer and manually translating the keys of the state dictionary, is there any way to do this conversion?
And in case I'm missing something, if there's an intervening layer (say a BatchNorm) that doesn't have an equivalent in either model - is the conversion still useful?
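For what it's worth, the manual key translation mentioned above can be kept fairly compact with a handful of regex rules. The sketch below uses plain dicts in place of real state dicts, and the key patterns are assumptions for illustration only; print state_dict().keys() for both models to get the actual names.

```python
import re

# Hypothetical (pattern, replacement) rules. The names on both sides are
# assumptions and must be checked against the real state dict keys.
RENAME_RULES = [
    (r"^blocks\.(\d+)\.norm1\.", r"encoder.layer.\1.layernorm_before."),
    (r"^blocks\.(\d+)\.norm2\.", r"encoder.layer.\1.layernorm_after."),
    (r"^blocks\.(\d+)\.mlp\.fc1\.", r"encoder.layer.\1.intermediate.dense."),
    (r"^blocks\.(\d+)\.mlp\.fc2\.", r"encoder.layer.\1.output.dense."),
]


def rename_keys(state_dict, rules):
    """Return (renamed_dict, unmatched_keys) after applying the first matching rule."""
    renamed, unmatched = {}, []
    for key, value in state_dict.items():
        for pattern, replacement in rules:
            new_key, count = re.subn(pattern, replacement, key)
            if count:
                renamed[new_key] = value
                break
        else:
            # No rule matched: this parameter has no known counterpart.
            unmatched.append(key)
    return renamed, unmatched


# Toy "state dict" with integers standing in for tensors.
source = {
    "blocks.0.norm1.weight": 1,
    "blocks.11.mlp.fc2.bias": 2,
    "extra_bn.running_mean": 3,
}
renamed, unmatched = rename_keys(source, RENAME_RULES)
```

The unmatched list makes it explicit which parameters (such as those of an intervening BatchNorm) have no counterpart. In PyTorch, load_state_dict(..., strict=False) reports missing and unexpected keys in a similar way. Renaming alone is not enough where one model stores fused tensors and the other stores them separately; those have to be split by hand.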
I have a problem where I want to predict multiple continuous outputs from a text input. I tried using RobertaForSequenceClassification from the HuggingFace library, but the documentation states that when the number of outputs in the final layer is more than 1, a cross-entropy loss is used automatically, as mentioned here: https://huggingface.co/transformers/v2.2.0/model_doc/bert.html#transformers.BertForSequenceClassification.
But I want to use an RMSE loss in a regression setting with two outputs in the final layer. How would one go about modifying it?
BertForSequenceClassification is a small wrapper around BertModel.
It calls the model, takes the pooled output (the second member of the output tuple), and applies a classifier over it. The code is here: https://github.com/huggingface/transformers/blob/master/src/transformers/modeling_bert.py#L1168
The simplest solution is to write your own simple wrapper class (based on the BertForSequenceClassification class) that does the regression with the loss you like.
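A minimal sketch of such a wrapper in PyTorch might look like the following. It is not the library's code: the class names, the dropout rate and the epsilon are assumptions, and a toy encoder stands in for BertModel so the example runs without downloading weights. With a real BertModel you would pass its pooled output (the second member of the output tuple) into the regression head instead.

```python
import torch
from torch import nn


class PooledRegressor(nn.Module):
    """Regression head on top of any encoder that returns a pooled vector."""

    def __init__(self, encoder, hidden_size, num_outputs=2, dropout=0.1):
        super().__init__()
        self.encoder = encoder
        self.dropout = nn.Dropout(dropout)
        self.regressor = nn.Linear(hidden_size, num_outputs)

    def forward(self, input_ids, targets=None):
        pooled = self.encoder(input_ids)               # (batch, hidden_size)
        preds = self.regressor(self.dropout(pooled))   # (batch, num_outputs)
        if targets is None:
            return preds
        # RMSE = sqrt(MSE); the epsilon keeps the gradient finite at zero error.
        loss = torch.sqrt(nn.functional.mse_loss(preds, targets) + 1e-8)
        return loss, preds


class ToyEncoder(nn.Module):
    """Hypothetical stand-in for BertModel: embeds tokens and mean-pools them."""

    def __init__(self, vocab_size=100, hidden_size=16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)

    def forward(self, input_ids):
        return self.embed(input_ids).mean(dim=1)


torch.manual_seed(0)
model = PooledRegressor(ToyEncoder(), hidden_size=16, num_outputs=2)
input_ids = torch.randint(0, 100, (4, 7))   # batch of 4 sequences, 7 tokens each
targets = torch.rand(4, 2)                  # two continuous targets per example
loss, preds = model(input_ids, targets)
loss.backward()
```

Because the loss is computed inside forward, any other regression loss can be swapped in by changing one line.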
I have a set of sentences and their scores, and I would like to train a marking system that can predict the score for a given sentence. One example looks like this:
(X = Tomorrow is a good day, Y = 0.9)
I would like to use an LSTM to build such a marking system and also take into account the sequential relationship between the words in the sentence, so the training example shown above is transformed as follows:
(x1=Tomorrow, y1=is) (x2=is, y2=a) (x3=a, y3=good) (x4=day, y4=0.9)
When training this LSTM, I would like the first three time steps to use a softmax classifier and the final step to use MSE. The loss function used in this LSTM is therefore composed of two different loss functions. It seems that Keras does not provide a way to address my problem directly. In addition, I am not sure whether my method of building the marking system is correct.
Keras supports multiple loss functions as well:
model = Model(inputs=inputs,
              outputs=[lang_model, sent_model])
model.compile(optimizer='sgd',
              loss=['categorical_crossentropy', 'mse'],
              metrics=['accuracy'], loss_weights=[1., 1.])
Based on your explanation, I think you need a model that first predicts a token based on the previous tokens (in the NLP domain this is usually called a language model) and then computes a score, which I assume is a sentiment score (the approach is applicable to other domains too).
To do so, you can train your language model with an LSTM and use the last output of the LSTM for your ranking task. To this end, you need to define two loss functions: categorical_crossentropy for the language model and MSE for the ranking task.
This tutorial would be helpful: https://www.pyimagesearch.com/2018/06/04/keras-multiple-outputs-and-multiple-losses/
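To show what the two losses and loss_weights in the compile call above amount to, here is a small NumPy sketch: the total loss is the weighted sum of the per-output losses. The function names, the four-word vocabulary and the numbers are made up for illustration; this mirrors the arithmetic, not Keras internals.

```python
import numpy as np


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean cross-entropy over time steps; y_true is one-hot."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=-1)))


def mse(y_true, y_pred):
    """Mean squared error for the score output."""
    return float(np.mean((y_true - y_pred) ** 2))


def combined_loss(word_true, word_pred, score_true, score_pred,
                  loss_weights=(1.0, 1.0)):
    """Weighted sum of the language-model loss and the score loss."""
    w_lang, w_score = loss_weights
    return (w_lang * categorical_crossentropy(word_true, word_pred)
            + w_score * mse(score_true, score_pred))


# Three next-word steps over a hypothetical 4-word vocabulary (one-hot targets).
word_true = np.eye(4)[[1, 2, 3]]
word_pred = np.full((3, 4), 0.25)   # an untrained model guessing uniformly
score_true = np.array([0.9])        # the sentence score from the example
score_pred = np.array([0.7])

total = combined_loss(word_true, word_pred, score_true, score_pred)
```

Raising the second weight makes the score error dominate training; lowering it lets the language-model objective dominate.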
I am doing transfer learning/retraining using the TensorFlow Inception V3 model. I have 6 labels. A given image can be of one single type only, i.e., no multi-class detection is needed. I have three queries:
Which activation function is best for my case? Presently the retrain.py file provided by TensorFlow uses softmax. What other methods are available (like sigmoid, etc.)?
Which optimiser function should I use (GradientDescent, Adam, etc.)?
I want to identify out-of-scope images, i.e. if a user inputs a random image, my algorithm should say that it does not belong to the described classes. Presently, with 6 classes, it gives one class as a sure output, but I do not want that. What are possible solutions for this?
Also, what are the other parameters that we may tweak in TensorFlow? My baseline accuracy is 94% and I am looking for something close to 99%.
Since you're doing single-label classification, softmax is the best activation function for this, as it maps your final-layer logit values to a probability distribution. Sigmoid is used for multi-label classification.
It's usually better to use a momentum-based or adaptive optimizer than vanilla gradient descent. There are several such optimizers, like Adam or RMSProp. Experiment with them to see what works best. Adam will probably give you the best performance.
You can add an extra label no_class, so your task will now be a 6+1 label classification. You can feed in some random images with no_class as the label. However the distribution of your random images must match the test image distribution, else it won't generalise.
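The difference between softmax and sigmoid from the first point can be seen in a few lines of NumPy. The logit values below are made up; only the behaviour of the two functions matters.

```python
import numpy as np


def softmax(logits):
    """Map logits to one probability distribution (entries sum to 1)."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)


def sigmoid(logits):
    """Map each logit to an independent probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-logits))


logits = np.array([2.0, 1.0, 0.1, -1.0, 0.5, 0.0])  # hypothetical logits, 6 classes
probs_softmax = softmax(logits)   # classes compete: exactly one label per image
probs_sigmoid = sigmoid(logits)   # classes are independent: several labels may fire
predicted_class = int(np.argmax(probs_softmax))
```

With softmax the six values always sum to 1, so the model is forced to spread its belief over the known classes. That is also why a random image still gets a confident-looking label unless something like the extra no_class label is added.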
I've implemented a neural network using Keras. Once trained and tested for final test accuracy, using a matrix with a bunch of rows containing features (plus corresponding labels), I have a model which I should be able to use for prediction.
How can I feed a single unseen example, meaning a feature vector to the model, to obtain a class prediction?
I've looked at their documentation here but could not find a method for it.
What you want is the predict method: it takes a batch of input samples and produces predictions, which are the outputs computed by your network. To feed a single example, wrap it in a NumPy ndarray with a leading batch dimension of size 1.
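A minimal sketch of the reshaping step, assuming a model with 4 input features and 3 output classes. The predict function below is a made-up stand-in so the example runs without Keras; with a trained Keras model you would call model.predict(batch) on the same array.

```python
import numpy as np

# Hypothetical stand-in for a trained network: fixed random weights, 4 -> 3.
rng = np.random.default_rng(0)
weights = rng.normal(size=(4, 3))


def predict(batch):
    """Mimics model.predict: takes (n_samples, 4), returns (n_samples, 3) probabilities."""
    logits = batch @ weights
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))
    return exp / exp.sum(axis=1, keepdims=True)


sample = np.array([5.1, 3.5, 1.4, 0.2])    # one unseen feature vector, shape (4,)
batch = np.expand_dims(sample, axis=0)     # add the batch dimension: shape (1, 4)
probs = predict(batch)                     # shape (1, 3)
predicted_class = int(np.argmax(probs, axis=1)[0])
```

The output keeps the batch dimension, so the prediction for the single example is the first (and only) row.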