I have implemented a custom metric based on SIM and when i try the code it works. I have implemented it using tensors and np arrays and both give the same results. However when I start fitting the model the values given back are a lot higher then the values I get when i load the weights generated by the training and applying the same function.
My function is:
def SIM(y_true,y_pred):
n_y_true=y_true/(K.sum(y_true)+K.epsilon())
n_y_pred=y_pred/(K.sum(y_pred)+K.epsilon())
return K.mean(K.sum( K.minimum(n_y_true, n_y_pred)))
When I compile the Keras model I add this to the metrics and during training it gives for example SIM: 0.7092.
When i load the weights and try it the SIM score is around 0.3. The correct weights are loaded (when restarting training with these weights the same values popup). Does anybody know if I am doing anything wrong?
Why are the metrics given back during training so much higher compared to running the function over a batch?
Related
I have a pre-trained model, by changing final fc layers I have created model for downstream task. Now, I want to load weights from pertained weights. I tried self.model.load_from_checkpoint (self.pretrained_model_path). But, when I print weight values from model layers, they are exactly same which indicates weights were not loaded/updated. Note that is not giving me any warning/error.
Edit:
self.model.backbone = self.model.load_from_checkpoint (self.pretrained_model_path).backbone
updates the parameters with pre_trained weights. There might be optimal way but I found this fix.
I have a regression model with Euclidean distance as a loss function and RMSE as a metric evaluation (lower is better). When I passed my train, test sets to model.fit I have train_rmse, and test_rmse which their values made sense to me. But when I pass the test sets into model.evalute after loading the weight of the trained model I got different results which are approximately twice the result with model.fit. And I am aware of the difference that should happen between the train evaluation and test evaluation as I know from Keras that :
the training loss is the average of the losses over each batch of training data. Because your model is changing over time, the loss over the first batches of an epoch is generally higher than over the last batches. On the other hand, the testing loss for an epoch is computed using the model as it is at the end of the epoch, resulting in a lower loss.
But here I am talking about the result of test-set passed to model.fit in which I beleived is evaluated on the final model. In Keras documentation, they said on validation argument that I am passing the test set in it:
validation_data: Data on which to evaluate the loss and any model metrics at the end of each epoch. The model will not be trained on this data.
When I searched about the problem I found several issues
1- Some people like here report that this issue is with the model itself if they have batch normalization layer,or if you do transfer learning and freeze some BN layers like here. my model has BN layers, and I did not freeze any layer. Also, I used the same model for the mulit-class classification problem (not regression) and the result was the same for test set in the model.fit and model.evaluate.
2- Other people like said that this is related with either the prediction or metric calculation like here, in which they found that this difference is related with the different of dtype for y_true and y_pred if someone is float32 and other float64 for example, then the metric calculation will be different. When they unified the dtype the problem is fixed.
I believed that the last case applied to me since in the regression task my labels now is tf.float32. My y_true labels already cast to tf.float32 through tfrecord, so I tried to cast the y_pred to tf.float32 before the rmse calculation, and I still have the difference in the result.
So My questions are:
Why this difference in results
To whom I should rely for test set, on model.fit result or model.evalute
I know that for training loss and accuracy, keras does a running average over the batches, and I know for model.evalute, these metric are calculated by taking all the dataset one time on the final model. But how the validation loss and accuracy calculated for validation set passed to model.fit?
UPDATE:
The problem was in the shape conflict between the y_true and y_pred. As for y_true label I save it in tfrecords as float single value and eventually will be with the size of [batch_size] while the regression model gives the prediction as [batch_size, 1] and then the result of tf.subtract(y_true, y_pred) in rmse equation will result in matrix of [batch_size, batch_sizze] and with taking the mean of this final one you will never guess it is wrong and the code will not throw any error but the calculation of rmse will be wrong. I am still working to make the shape consistent but still didn't find a good solution.
Keras 2.0 removed F1 score, but I would like to monitor its value. I am using a sequential model to train a Neural Net.
I defined a function, as suggested here How to calculate F1 Macro in Keras?.
This function works fine only if used it inside model.compile. In this way I see its value at each step. The problem is that I don't want just to see its value but I would like my training to behave differently according to its value, using the callbacks of Keras.
If I try to insert my custom metric in the callbacks then I get this error:
'function object is not iterable'
Do you know how to define a function such that it can be used as an argument in the callbacks?
Callback of Keras will enable us to retrieve the model at different period, based on the metric which we keep track of. This will not affect the training procedure of the model.
You can train your model only with respect to some loss function. For example, cross entropy for classification problem. The readily available loss function in keras are given here
Precision, recall or f1-score are not differentialable functions. Hence, we cannot use that as a loss function for model training.
May be, if you want to tune your hyperparameter (such as learning rate, class weights) for improving f1 score, then you can be do that.
For tuning hyper parameters you can use hyperopt, tutorials
I'm trying to transfer my model from single run to hyper-parameter tuning using RandomizedSearchCV.
In my single run case, my data is splitted into train/validation/test data.
When I run RandomizedSearchCV on my train_data with default 3-fold CV, I notice that the length of my train_input is reduced to 66% of train_data (which makes sense in a 3-fold CV...).
So I'm guessing that I should merge my initial train and validation set into a larger train set and let RandomizedSearchCV split it into train and validation sets.
Would that be the right way to go?
My question is: how can I access the remaining 33% of my train_input to feed it to my validation accuracy test function (note that my score function is running on test set)?
Thanks for your help!
Yoann
I'm not sure that my code would help here since my question is rather generic.
This is the answer that I found by going through sklearn's code: the RandomizedSearchCV doesn't return the splited validation data in an easy way and I should definitely merge my initial train and validation set into a larger train set and let RandomizedSearchCV split it into train and validation sets.
The train_data is splitted for CV using a cross-validator into a train/validation set (in my case, the Stratified K-Folds http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.StratifiedKFold.html)
My estimator is defined as follows:
class DNNClassifier(BaseEstimator, ClassifierMixin):
It needs a score function to be able to evaluate the CV performance on the validation set. There is a default score function defined in the ClassifierMixin class (which returns the the mean accuracy and requires a predict function to be implemented in the Estimator class).
In my case, I implemented a custom score function within my estimator class.
The hyperparameter search and CV fit is done calling the fit function of RandomizedSearchCV.
RandomizedSearchCV(DNNClassifier(), param_distribs).fit(train_data)
This fit function runs the estimator's custom fit function on the train set and then the score function on the validation set.
This is done using the _fit_and_score function from the ._validation library.
So I can access the automatically splitted validation set (33% of my train_data input) at the end of my estimator's fit function.
I'd have preferred to access it within my estimator's fit function so that I can use it to plot validation accuracy over training steps and for early stop (I'll keep a separate validation set for that).
I guess I could reconstruct the automatically generated validation set by looking for the missing indexes from my initial train_data (the train_data used in the estimator's fit function has 66% of the indexes of the initial train_data).
If that is something that someone has already done I'd love to hear about it!
I'm doing LogisticRegressionWithSGD on spark for synthetic dataset. I've calculated the error on matlab using vanilla gradient descent and on R which is ~5%. I got similar weight that was used in the model that I used to generate y. The dataset was generated using this example.
While I am able to get very close error rate at the end with different stepsize tuning, the weights for individual feature isn't the same. In fact, it varies a lot. I tried LBFGS for spark and it's able to predict both error and weight correctly in few iterations. My problem is with logistic regression with SGD on spark.
The weight I'm getting:
[0.466521045342,0.699614292387,0.932673108363,0.464446310304,0.231458578991,0.464372487994,0.700369689073,0.928407671516,0.467131704168,0.231629845549,0.46465456877,0.700207596219,0.935570594833,0.465697758292,0.230127949916]
The weight I want:
[2,3,4,2,1,2,3,4,2,1,2,3,4,2,1]
Intercept I'm getting: 0.2638102010832128
Intercept I want: 1
Q.1. Is it the problem with synthetic dataset. I have tried tuning with minBatchFraction, stepSize, iteration and intercept. I couldn't get it right.
Q.2. Why is spark giving me this weird weights? Would it be wrong to expect similar weights from Spark's model?
Please let me know if extra details is needed to answer my question.
It actually did converge, your weights are normalized between 0 and 1, while expected max value is for, multiply everything you got from SGD with 4, you can see the correlation even for intercept value.