KerasTuner: best step is always 0? - keras

The plot from my keras tuner shows that the 'best step' is always zero. Does this mean that the first iteration of each trial was the best (lowest MSE) or am I misunderstanding?
Keras tuner plot picture
Many thanks

Related

What might be the best loss function when target is a gaussian label?

I have a simple CNN with the inputs as
Cropped grayscale patches of size MxN centered on the object of interest. The intensity of each patch is rescaled to [0, 1].
Target Gaussian label of the same size MXN with values ranging
in [5.0155e-173, 1]. This label is kept fixed throughout the training.
The goal is to learn the target label and use the learned model to detect the object in a test image. I am using Adam optimizer with various loss functions such as categorical_crossentropy, mean_squared_error, and mean_absolute_error but training halts soon probably due to the low values returned by all these loss functions (vanishing gradients?). Increasing the batch size from 1 to 16~32 sometimes helps in completing the iteration but gives undesired outcomes at test time.
Is it because the loss function is too sensitive to the lower values in the target and even treats them as outliers hence steering the whole learning process in the wrong direction?
I'll be grateful for your help in fixing the loss function in such a scenario.
I think that the best choice here is to use some probability ditribution pseudo-distance, the first choice that came to my mind is to use Kullback-Leiber Divergence, it is already implemented in pytorch and keras( see [kldivloss](https://pytorch.org/docs/stable/nn.html#kldivloss and keras) Other famous ditances may include Jesnsen-Shanon divergence and Earth-Mover distance (This the same distance thatwas used in WGAN

Can I change direction of ROC?

I made a model, and I wanted to get AUC with independent test set.
So I got AUC which is 0.3.
In many writings, high AUC(1-0.3) can appear if i change 'direction'argument of roc function.
(i understand this comment BUT)
I'm thinking about this ;
AUC=0.3 means that
model which was made by training set could't predict test set .
Changing direction simply give high AUC, but isn't right principally.
Can I change dirrection of roc ? I

Binary classifier too confident to plot ROC curve with sklearn?

I have a created a binary classifier in Tensorflow that will output a generator object containing predictions. I extract the predictions (e.g [0.98, 0.02]) from the object into a list, later converting this into a numpy array. I have the corresponding array of labels for these predictions. Using these two arrays I believe I should be able to plot a roc curve via:
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve
fpr, tpr, thr = roc_curve(labels, predictions[:,1])
plt.plot(fpr, tpr)
plt.show()
print(fpr)
print(tpr)
print(thr)
Where predictions[:,1] gives the positive prediction score. However, running this code leads to only a flat line and only three values for each fpr, tpr and thr:
Flat line roc plot and limited function outputs.
The only theory I have as to why this is happening is because my classifier is too sure of it's predictions. Many, if not all, of the positive prediction scores are 1.0, or incredibly close to zero:
[[9.9999976e-01 2.8635742e-07]
[3.3693312e-11 1.0000000e+00]
[1.0000000e+00 9.8642090e-09]
...
[1.0106111e-15 1.0000000e+00]
[1.0000000e+00 1.0030269e-09]
[8.6156778e-15 1.0000000e+00]]
According to a few sources including this stackoverflow thread and this stackoverflow thread, the very polar values of my predictions could be creating an issue for roc_curve().
Is my intuition correct? If so is there anything I can do about it to plot my roc_curve?
I've tried to include all the information I think would be relevant to this issue but if you would like any more information about my program please ask.
ROC is generated by changing the threshold on your predictions and finding the sensitivity and specificity for each threshold. This generally means that as you increase the threshold, your sensitivity decreases but your specificity increases and it draws a picture of the overall quality of your predicted probabilities. In your case, since everything is either 0 or 1 (or very close to it) there are no meaningful thresholds to use. That's why the thr value is basically [ 1, 1, 1 ].
You can try to arbitrarily pull the values closer to 0.5 or alternatively implement your own ROC curve calculation with more tolerance for small differences.
On the other hand you might want to review your network because such result values often mean there is a problem there, maybe the labels leaked into the network somehow and therefore it produces perfect results.

How do I test my classifier accuracy against random values?

I've set up my first scikit-learn example to play with and I'm trying to gauge accuracy on my predictions. I've got training and test lists set up fine, but I'm getting ~0.95 accuracy even if I give it random values.
This looks to be because I'm checking for 0/1 labels, and 95% of the labels are zero's, so it's guessing on 0's and getting 0.95 accuracy (I think?). Obviously this isn't what I want.
How do I go about deciding if my classifiers are working, and how do I get meaningful accuracy values?
You have a clear class imbalance issue. Your classifier is predicting 0 all the time knowing it will be right 95% of the time. You can inspect this by calling predict(X_test) on your fitted classifier. If all the values are 0 you know this is the case.
To get a better idea on how the model performs you can upsample the data labelled with 1 or down sample the data labelled with 0. You can use this package which builds off scikit-learn and implements a number of resampling methods. Alternatively, you can use scikit learns resampling method. Which will bootstrap new data points for you.

Keras reinforcement training with softmax

A project i am working on has a reinforcement learning stage using the REINFORCE algorithm. The used model has a final softmax activation layer and because of that a negative learning rate is used as a replacement for negative rewards. I have some doubts about this process and can't find much literature on using a negative learning rate.
Does reinforement learning work with switching learning rate between positive and negative? and if not what would be a better approach, get rid of softmax or has keras a nice option for this?
Loss function:
def log_loss(y_true, y_pred):
'''
Keras 'loss' function for the REINFORCE algorithm,
where y_true is the action that was taken, and updates
with the negative gradient will make that action more likely.
We use the negative gradient because keras expects training data
to minimize a loss function.
'''
return -y_true * K.log(K.clip(y_pred, K.epsilon(), 1.0 - K.epsilon()))
Switching learning rate:
K.set_value(optimizer.lr, lr * (+1 if won else -1))
learner_net.train_on_batch(np.concatenate(st_tensor, axis=0),
np.concatenate(mv_tensor, axis=0))
Update, test results
I ran a test with only positive reinforcement samples, omitting all negative examples and thus the negative learning rate. Winning rate is rising, it is improving and i can safely assume using a negative learning rate is not correct.
anybody any thoughts on how we should implement it?
Update, model explanation
We are trying to recreate AlphaGo as described by DeepMind, the slow policy net:
For the first stage of the training pipeline, we build on prior work
on predicting expert moves in the game of Go using supervised
learning13,21–24. The SL policy network pσ(a| s) alternates between convolutional
layers with weights σ, and rectifier nonlinearities. A final softmax
layer outputs a probability distribution over all legal moves a.
Not sure if it the best way but at least i found a way that works.
for all negative training samples i reuse the network prediction, set the action i want to unlearn to zero and adjust all values to sum up to one again
i tried several ways to adjust them afterwards but haven't run enough tests to be sure what works best:
apply softmax ( action that has to be unlearned gets a nonzero value.. )
redistribute old action value over all other actions
set all illigal action values to zero and distribute the total removed value
distribute value proportional to value of other values
probably there are several other ways to do so, it might depend on use case what works best and there might be a better way to do so but this one works at least.

Resources