I am using Keras with the scikit-learn wrapper. In particular, I want to use GridSearchCV for hyperparameter optimisation.
This is a multi-class problem, i.e. the target variable can have only one label chosen from a set of n classes. For instance, the target variable can be 'Class1', 'Class2' ... 'Classn'.
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import GridSearchCV

# self._arch creates my model
nn = KerasClassifier(build_fn=self._arch, verbose=0)
clf = GridSearchCV(
    nn,
    param_grid={ ... },
    # I use f1 score macro averaged
    scoring='f1_macro',
    n_jobs=-1)
# self.fX is the data matrix
# self.fy_enc is the target variable encoded with one-hot format
clf.fit(self.fX.values, self.fy_enc.values)
The problem is that, when the score is computed during cross-validation, the true labels of the validation samples are one-hot encoded, while the predictions for some reason collapse to a binary label (when the target variable has only two classes). For instance, this is the last part of the stack trace:
...........................................................................
/Users/fbrundu/.pyenv/versions/3.6.0/lib/python3.6/site-packages/sklearn/metrics/classification.py in _check_targets(y_true=array([[ 0., 1.],
[ 0., 1.],
[ 0... 0., 1.],
[ 0., 1.],
[ 0., 1.]]), y_pred=array([1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1,...0, 1, 0, 0, 1, 0, 0, 0,
0, 0, 0, 0, 1, 1]))
77 if y_type == set(["binary", "multiclass"]):
78 y_type = set(["multiclass"])
79
80 if len(y_type) > 1:
81 raise ValueError("Can't handle mix of {0} and {1}"
---> 82 "".format(type_true, type_pred))
type_true = 'multilabel-indicator'
type_pred = 'binary'
83
84 # We can't have more than one value on y_type => The set is no more needed
85 y_type = y_type.pop()
86
ValueError: Can't handle mix of multilabel-indicator and binary
How can I instruct Keras/sklearn to give back predictions in one-hot encoding?
Following Vivek's comment, I used the original (not one-hot-encoded) target array and configured the loss sparse_categorical_crossentropy in my Keras model (see code), as per the comments to this issue:
arch.compile(
    optimizer='sgd',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy'])
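For reference, here is a minimal NumPy sketch of why switching to integer labels removes the "mix of multilabel-indicator and binary" error, assuming the wrapper's predict() returns integer class indices (as the y_pred array in the stack trace suggests). It collapses one-hot rows back to class indices with argmax, so y_true and y_pred share one format, then computes a macro-averaged F1 by hand. The data and the helper macro_f1 are hypothetical; macro_f1 only mirrors what scoring='f1_macro' computes over the labels present in either array.

```python
import numpy as np

# Made-up one-hot targets: 4 samples, 3 classes.
y_onehot = np.array([[1, 0, 0],
                     [0, 1, 0],
                     [0, 0, 1],
                     [0, 1, 0]])

# Collapse each one-hot row to its class index -> shape (n_samples,).
# This is the "original (not one-hot-encoded)" target format.
y_true = np.argmax(y_onehot, axis=1)

# Hypothetical predictions, also integer class indices.
y_pred = np.array([0, 1, 1, 1])


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over labels seen in either array."""
    labels = np.union1d(y_true, y_pred)
    scores = []
    for c in labels:
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        # c occurs in y_true or y_pred, so the denominator is never zero.
        scores.append(2 * tp / (2 * tp + fp + fn))
    return float(np.mean(scores))


print(y_true)                    # [0 1 2 1]
print(macro_f1(y_true, y_pred))  # 0.6 (per-class F1: 1.0, 0.8, 0.0)
```

With both arrays one-dimensional, sklearn's target-type check sees "multiclass" on both sides instead of a one-hot matrix on one side.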
I'm working on a classification problem with 5 classes. I have a ground truth vector of shape (3,) instead of (1,). The values in this target vector are the classes that count as correct, and the predicted vector has shape (1x5) and holds the softmax scores for all the classes.
For example:
predicted_vector = tensor([0.0669, 0.1336, 0.3400, 0.3392, 0.1203])
ground_truth = tensor([3,2,5])
In the illustration above, a typical argmax operation would declare class 3 as the predicted class (0.34), but I want the model to be rewarded if the argmax class is any of 3, 2 or 5.
Which loss function is recommended for such a use case?
As jodag pointed out in the comments, you can treat it as a multi-label classification problem, where each target is converted to a multi-hot vector over the 5 classes.
For example, the targets [[0, 1, 2], [0, 2, 4], [3, 3, 3]] are transformed into:
tensor([[1., 1., 1., 0., 0.],
[1., 0., 1., 0., 1.],
[0., 0., 0., 1., 0.]])
Here is an example of how this can be implemented:
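import torch
from torch.nn import BCELoss

# Stand-in for model outputs: BCELoss expects probabilities in [0, 1]
# (e.g. after a sigmoid); use BCEWithLogitsLoss on raw logits instead.
predicted_vector = torch.rand((3, 5))
# One row of acceptable class indices per sample
ground_truth = torch.LongTensor([[0, 1, 2], [0, 2, 4], [3, 3, 3]])
# Multi-hot targets: write 1 at every listed index of each row
labels_onehot = torch.zeros_like(predicted_vector)
labels_onehot.scatter_(1, ground_truth, 1)
loss_fn = BCELoss()
loss = loss_fn(predicted_vector, labels_onehot)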
You can also assign different weights to different labels (for example through the weight argument of BCELoss).
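The scatter_ step and per-label weighting can be sketched in plain NumPy to check the multi-hot matrix above. This is an illustration, not PyTorch's implementation: weighted_bce is a hypothetical helper that averages element-wise binary cross-entropy after multiplying by one weight per label, broadcast over the batch (similar in spirit to passing weight= to BCELoss). Clipping probabilities with eps is an assumption to avoid log(0); PyTorch instead clamps the log terms at -100.

```python
import numpy as np

# Acceptable class indices per sample (duplicates allowed, as in [3, 3, 3]).
ground_truth = np.array([[0, 1, 2], [0, 2, 4], [3, 3, 3]])
n_classes = 5

# Multi-hot encoding: NumPy counterpart of labels_onehot.scatter_(1, ground_truth, 1).
labels_multihot = np.zeros((len(ground_truth), n_classes))
rows = np.arange(len(ground_truth))[:, None]  # shape (3, 1), broadcasts against (3, 3)
labels_multihot[rows, ground_truth] = 1.0


def weighted_bce(probs, targets, label_weights, eps=1e-7):
    """Mean binary cross-entropy with one weight per label (column)."""
    p = np.clip(probs, eps, 1 - eps)  # avoid log(0); an assumption, see lead-in
    per_element = -(targets * np.log(p) + (1 - targets) * np.log(1 - p))
    return float(np.mean(per_element * label_weights))  # weights broadcast over rows


print(labels_multihot)
probs = np.full((3, n_classes), 0.5)           # a maximally uncertain "model"
weights = np.array([1.0, 1.0, 1.0, 2.0, 1.0])  # hypothetical: up-weight label 3
print(weighted_bce(probs, labels_multihot, np.ones(n_classes)))  # log(2) ≈ 0.693
print(weighted_bce(probs, labels_multihot, weights))             # 1.2 * log(2) ≈ 0.832
```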
For this problem, a given sample is in exactly one class (say, class 3), but for training purposes, predicting class 2 or 5 is still okay, so the model shouldn't be penalised that heavily.
This is a typical single-label, multi-class problem, but with probabilistic (“soft”) labels, so CrossEntropyLoss should be used on the raw logits (without applying softmax() first, because CrossEntropyLoss applies log-softmax internally).
In this example, the (soft) target might be a probability of 0.7 for class 3, a probability of 0.2 for class 2, and a probability of 0.1 for class 5 (and zero for everything else).
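A minimal NumPy sketch of what cross-entropy with probability ("soft") targets computes: for each sample, the loss is minus the sum over classes of target_c * log_softmax(logits)_c, averaged over the batch. The 0.7 / 0.2 / 0.1 split follows the example above, with the 1-based classes 3, 2 and 5 mapped to 0-based indices 2, 1 and 4; the logits are made up. In PyTorch the same soft-target tensor can be passed directly to CrossEntropyLoss together with raw logits, assuming a version that supports probability targets (added in 1.10, so check yours).

```python
import numpy as np


def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def soft_cross_entropy(logits, soft_targets):
    """Batch mean of -sum_c p_c * log_softmax(logits)_c."""
    return float(np.mean(-(soft_targets * log_softmax(logits)).sum(axis=-1)))


# One sample, 5 classes; classes 3, 2, 5 (1-based) -> indices 2, 1, 4.
soft_target = np.array([[0.0, 0.2, 0.7, 0.0, 0.1]])

# Made-up raw logits (no softmax applied beforehand).
logits = np.array([[0.1, 0.8, 1.5, 1.4, 0.6]])
print(soft_cross_entropy(logits, soft_target))

# Sanity check: uniform logits give log(5) whatever the target split.
print(soft_cross_entropy(np.zeros((1, 5)), soft_target))  # log(5) ≈ 1.609
```

With a one-hot target the same function reduces to the usual hard-label cross-entropy, so the soft version only changes how much credit classes 2 and 5 receive.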
I am having a hard time understanding the following scenario. I have an output probability of 0.0 for each class, so I would expect metrics such as F1 score, accuracy and recall to be zero. However, I get the following:
import torch, torchmetrics
preds = torch.tensor([[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.]])
target = torch.tensor([[1, 0, 0],
[0, 1, 0],
[0, 0, 1]])
print("F1: ", torchmetrics.functional.f1_score(preds, target))
print("Accuracy: ", torchmetrics.functional.accuracy(preds, target))
print("Recall: ", torchmetrics.functional.recall(preds, target))
print("Precision: ", torchmetrics.functional.precision(preds, target))
Output:
F1: tensor(0.)
Accuracy: tensor(0.6667)
Recall: tensor(0.)
Precision: tensor(0.)
Why is accuracy 0.6667? I would expect all outputs to be 0.0.
Your preds is interpreted as a probability array for a multi-label classification problem.
To keep it simple, consider a single sample:
preds = torch.tensor([[0., 0., 0.]])  # multi-labels [label1, label2, label3]
target = torch.tensor([[1, 0, 0]])
There are 2 true negatives, since the classifier predicts label2 and label3 as absent and they are indeed absent.
There are 0 true positives, since the classifier does not predict any label as present.
There is 1 false negative, since the classifier predicts label1 as absent while it is actually present.
There are 0 false positives, since the classifier does not predict any label as present where it should be absent.
Therefore Accuracy = (TP + TN) / (TP + TN + FP + FN) = (0 + 2) / 3 = 0.6667. Each of the 3 rows in your example behaves the same way, so the overall accuracy is 6/9 = 0.6667, while precision, recall and F1 are 0 because there are no true positives.
You can read here more about different metrics and their calculations.
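The counting can be checked on the full 3x3 example from the question with a small NumPy sketch. It assumes a 0.5 threshold on preds and micro-averaging over all (sample, label) pairs, which is how the 6 matching entries out of 9 produce 0.6667; treating 0/0 precision as 0 is also an assumption that matches the printed output.

```python
import numpy as np

preds = np.array([[0., 0., 0.],
                  [0., 0., 0.],
                  [0., 0., 0.]])
target = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1]])

# Treat each (sample, label) pair as an independent binary decision.
pred_labels = (preds >= 0.5).astype(int)  # assumed threshold: 0.5

tp = int(np.sum((pred_labels == 1) & (target == 1)))
tn = int(np.sum((pred_labels == 0) & (target == 0)))
fp = int(np.sum((pred_labels == 1) & (target == 0)))
fn = int(np.sum((pred_labels == 0) & (target == 1)))

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp) if tp + fp else 0.0  # 0/0 reported as 0
recall = tp / (tp + fn) if tp + fn else 0.0

print(tp, tn, fp, fn)      # 0 6 0 3
print(round(accuracy, 4))  # 0.6667
print(precision, recall)   # 0.0 0.0
```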
I have several Pytorch tensors ranging from 1-dimensional (e.g. torch.Size([128]), to 4-dimensional (e.g. torch.Size([256, 128, 3, 3]). Each tensor represents a weight in a neural network.
For each of these tensors I need to upscale 1 or 2 dimensions, for example:
torch.Size([128]) to torch.Size([256]),
torch.Size([256, 128, 3, 3]) to torch.Size([512, 256, 3, 3]),
torch.Size([3, 256, 1, 1]) to torch.Size([3, 512, 1, 1]).
I've looked at torch.nn.Upsample or nn.functional.interpolate and similar functions but I can't find a good way to do this comprehensively for each of my problems other than hardcoding it.
In the case of the simple 1D example I'm looking for a scaled version of my original tensor, something like this:
t = torch.arange(0, 9, dtype=torch.float32)
# = tensor([0., 1., 2., 3., 4., 5., 6., 7., 8.])
t_up = upsample(t, factor=2)  # upsample is a placeholder for the function I'm looking for
# = tensor([0., 0.5, 1., 1.5, 2., 2.5, 3., 3.5, 4., 4.5, 5., 5.5, 6., 6.5, 7., 7.5, 8.])
Any help would be appreciated.
Your pattern is very irregular:
torch.Size([128]) to torch.Size([256]) - 1D, interpolate everything
torch.Size([256, 128, 3, 3]) to torch.Size([512, 256, 3, 3]) - 4D, upscale the first two dimensions
torch.Size([3, 256, 1, 1]) to torch.Size([3, 512, 1, 1]) - 4D, upscale only the second dimension, not the first
There is no clear way around "hard coding" in this case and "clever" approaches would probably only raise eyebrows when someone is going over your code.
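Your 1D example (17 evenly spaced values that keep both endpoints) corresponds to linear mode with align_corners=True and an output size of 2n - 1, not align_corners=False. I'm not sure about the 4D examples, but those would require at least bilinear mode.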
size for torch.nn.functional.interpolate flattens 1 dimensions for some reason, hence only scale_factor is an option.
Some of the data has to be reshaped for interpolate, which expects an input of shape (N, C, ...) and only resizes the trailing spatial dimensions.
All in all, hardcoding and some comments are the best option in this case as there is no clear way to group different ways of expanding tensors you are given (and trying to be smart in this case is probably a dead end).
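To illustrate the "hardcode it per tensor" approach, here is a NumPy sketch of linear resampling along one chosen axis. upsample_axis is a hypothetical helper, not a PyTorch function: it samples new_size evenly spaced points between the first and last element along the axis (the align_corners=True convention), which reproduces the 1D example from the question exactly, and it simply repeats axes of size 1. Each weight tensor then gets its own explicit call(s).

```python
import numpy as np


def upsample_axis(x, axis, new_size):
    """Linearly resample x along `axis` to `new_size` points, endpoints kept."""
    old_size = x.shape[axis]
    if old_size == 1:
        return np.repeat(x, new_size, axis=axis)
    pos = np.linspace(0, old_size - 1, new_size)  # fractional source positions
    lo = np.floor(pos).astype(int)
    hi = np.minimum(lo + 1, old_size - 1)
    frac = pos - lo
    shape = [1] * x.ndim
    shape[axis] = new_size
    frac = frac.reshape(shape)  # broadcast the weights along `axis` only
    return np.take(x, lo, axis=axis) * (1 - frac) + np.take(x, hi, axis=axis) * frac


t = np.arange(0, 9, dtype=np.float32)
print(upsample_axis(t, axis=0, new_size=17).tolist())  # [0.0, 0.5, 1.0, ..., 7.5, 8.0]

# Explicit, per-tensor calls for the shapes in the question:
w1 = np.zeros((128,), dtype=np.float32)
w4 = np.zeros((256, 128, 3, 3), dtype=np.float32)
w_proj = np.zeros((3, 256, 1, 1), dtype=np.float32)

print(upsample_axis(w1, 0, 256).shape)                         # (256,)
print(upsample_axis(upsample_axis(w4, 0, 512), 1, 256).shape)  # (512, 256, 3, 3)
print(upsample_axis(w_proj, 1, 512).shape)                     # (3, 512, 1, 1)
```

Keeping one explicit call per tensor shape makes it obvious which dimensions are being stretched, which is the readability argument made above.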
I am using Keras Tuner and RandomSearch() to tune the hyperparameters of my regression model. While I can tune over "relu" and "selu", I am unable to do the same for LeakyReLU. I understand that "relu" and "selu" work because string aliases are available for them, whereas no string alias exists for LeakyReLU. I tried passing a callable LeakyReLU object (see my example below), but it doesn't seem to work. Can you please advise me how to do that? I have the same issue with Parametric ReLU (PReLU).
Thank you in advance!
import tensorflow as tf
from tensorflow.keras.layers import Dense, LeakyReLU
from tensorflow.keras.models import Sequential

def build_model(hp):
    model = Sequential()
    model.add(
        Dense(
            units = 18,
            kernel_initializer = 'normal',
            activation = 'relu',
            input_shape = (18, )
        )
    )
    for i in range(hp.Int( name = "num_layers", min_value = 1, max_value = 5)):
        model.add(
            Dense(
                units = hp.Int(
                    name = "units_" + str(i),
                    min_value = 18,
                    max_value = 180,
                    step = 18),
                kernel_initializer = 'normal',
                activation = hp.Choice(
                    name = 'dense_activation',
                    values=['relu', 'selu', LeakyReLU(alpha=0.01) ],
                    default='relu'
                )
            )
        )
    model.add( Dense( units = 1 ) )
    model.compile(
        optimizer = tf.keras.optimizers.Adam(
            hp.Choice(
                name = "learning_rate", values = [1e-2, 1e-3, 1e-4]
            )
        ),
        loss = 'mse'
    )
    return model
As a workaround, you can add another activation function to the tf.keras.activations module by modifying its source file (activations.py).
Here's the code for tf.keras.activations.relu that you'll find in activations.py:
@keras_export('keras.activations.relu')
@dispatch.add_dispatch_support
def relu(x, alpha=0., max_value=None, threshold=0):
  """Applies the rectified linear unit activation function.
  With default values, this returns the standard ReLU activation:
  `max(x, 0)`, the element-wise maximum of 0 and the input tensor.
  Modifying default parameters allows you to use non-zero thresholds,
  change the max value of the activation,
  and to use a non-zero multiple of the input for values below the threshold.
  For example:
  >>> foo = tf.constant([-10, -5, 0.0, 5, 10], dtype = tf.float32)
  >>> tf.keras.activations.relu(foo).numpy()
  array([ 0., 0., 0., 5., 10.], dtype=float32)
  >>> tf.keras.activations.relu(foo, alpha=0.5).numpy()
  array([-5. , -2.5, 0. , 5. , 10. ], dtype=float32)
  >>> tf.keras.activations.relu(foo, max_value=5).numpy()
  array([0., 0., 0., 5., 5.], dtype=float32)
  >>> tf.keras.activations.relu(foo, threshold=5).numpy()
  array([-0., -0., 0., 0., 10.], dtype=float32)
  Arguments:
      x: Input `tensor` or `variable`.
      alpha: A `float` that governs the slope for values lower than the
        threshold.
      max_value: A `float` that sets the saturation threshold (the largest value
        the function will return).
      threshold: A `float` giving the threshold value of the activation function
        below which values will be damped or set to zero.
  Returns:
      A `Tensor` representing the input tensor,
      transformed by the relu activation function.
      Tensor will be of the same shape and dtype of input `x`.
  """
  return K.relu(x, alpha=alpha, max_value=max_value, threshold=threshold)
Copy this code and paste it just below. Change @keras_export('keras.activations.relu') to @keras_export('keras.activations.leaky_relu'), rename the function to leaky_relu (otherwise the copy overwrites relu itself), and change the default value of alpha to 0.2, like this:
@keras_export('keras.activations.leaky_relu')
@dispatch.add_dispatch_support
def leaky_relu(x, alpha=0.2, max_value=None, threshold=0):
  """Applies the leaky rectified linear unit activation function.
  Same as `relu`, but with a default slope `alpha=0.2` for values below the threshold.
  """
  return K.relu(x, alpha=alpha, max_value=max_value, threshold=threshold)
After that, you can use the string alias 'leaky_relu' (keras.activations.leaky_relu).
# Custom activation function
from keras.layers import Activation, LeakyReLU
from keras.utils.generic_utils import get_custom_objects
## Add leaky-relu so we can use it as a string
get_custom_objects().update({'leaky-relu': Activation(LeakyReLU(alpha=0.2))})
## Main activation functions available to use, e.g. as values of hp.Choice
activation_functions = ['sigmoid', 'relu', 'elu', 'leaky-relu', 'selu', 'gelu', 'swish']
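If patching the installed library is not an option, another common pattern (a different technique from both answers above) is to tune over plain strings and map them to callables yourself: hp.Choice only accepts values of type int, float, str or bool, which is also why passing a LeakyReLU(...) object fails. The sketch below uses NumPy stand-ins so it runs without TensorFlow; ACTIVATIONS and resolve_activation are hypothetical names. In a real build_model you would map 'leaky_relu' to something like tf.nn.leaky_relu (whose alpha defaults to 0.2) and pass the result as activation=.

```python
import numpy as np


def relu(x):
    return np.maximum(np.asarray(x, dtype=np.float32), 0.0)


def leaky_relu(x, alpha=0.2):
    """x for x > 0, alpha * x otherwise (relu with a non-zero alpha)."""
    x = np.asarray(x, dtype=np.float32)
    return np.where(x > 0, x, alpha * x)


# Hypothetical registry: the tuner only ever sees the string keys.
ACTIVATIONS = {
    "relu": relu,
    "leaky_relu": leaky_relu,
    "leaky_relu_0.01": lambda x: leaky_relu(x, alpha=0.01),
}


def resolve_activation(name):
    """Turn the string picked by hp.Choice into a callable."""
    return ACTIVATIONS[name]


foo = np.array([-10, -5, 0.0, 5, 10], dtype=np.float32)
print(resolve_activation("leaky_relu")(foo).tolist())  # [-2.0, -1.0, 0.0, 5.0, 10.0]
print(leaky_relu(foo, alpha=0.5).tolist())             # [-5.0, -2.5, 0.0, 5.0, 10.0]
```

The alpha=0.5 case matches the relu(foo, alpha=0.5) example in the TensorFlow docstring quoted above.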
I am training a regression model that approximates the weights for the equation:
Y = R+B+G
For this, I provide predetermined values of R, B, G and Y as training data.
import numpy as np
R = np.array([-4, -10, -2, 8, 5, 22, 3], dtype=float)
B = np.array([4, -10, 0, 0, 15, 5, 1], dtype=float)
G = np.array([0, 10, 5, 8, 1, 2, 38], dtype=float)
Y = np.array([0, -10, 3, 16, 21, 29, 42], dtype=float)
Each training sample is a 1x3 array holding the i-th values of R, B and G:
RBG = np.array([R,B,G]).transpose()
print(RBG)
[[ -4. 4. 0.]
[-10. -10. 10.]
[ -2. 0. 5.]
[ 8. 0. 8.]
[ 5. 15. 1.]
[ 22. 5. 2.]
[ 3. 1. 38.]]
I used a neural network with 3 inputs, 1 dense layer (hidden layer) with 2 neurons and the output layer (output) with a single neuron.
import tensorflow as tf
hidden = tf.keras.layers.Dense(units=2, input_shape=[3])
output = tf.keras.layers.Dense(units=1)
Then I trained the model:
model = tf.keras.Sequential([hidden, output])
model.compile(loss='mean_squared_error',
optimizer=tf.keras.optimizers.Adam(0.1))
history = model.fit(RBG,Y, epochs=500, verbose=False)
print("Finished training the model")
The loss-vs-epoch plot looked normal: decreasing, then flat.
But when I tested the model with random values of R, B and G:
print(model.predict([[1],[1],[1]]))
I expected the output to be 1+1+1 = 3, but got a ValueError:
ValueError: Error when checking input: expected dense_2_input to have shape (3,) but got array with shape (1,)
Any idea where I might be getting wrong?
Surprisingly, the only input it accepts is the training data itself, i.e.:
print(model.predict(RBG))
[[ 2.1606684e-07]
[-3.0000000e+01]
[-3.2782555e-07]
[ 2.4000002e+01]
[ 4.4999996e+01]
[ 2.9000000e+01]
[ 4.2000000e+01]]
As the error says, the problem is the shape of your input. The model expects one row of 3 features per sample, i.e. shape (n_samples, 3), but np.array([[1],[1],[1]]) has shape (3, 1). Transpose it to get the shape the model expects:
npq = np.array([[1],[1],[1]]).transpose()  # shape (1, 3)
print(model.predict(npq))
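To see both the shape issue and the underlying linear relationship without Keras, here is a NumPy sketch. It swaps in an ordinary least-squares fit (np.linalg.lstsq) for the neural network, which is enough here because every Y in the training data equals R + B + G exactly; the recovered weights should therefore be close to [1, 1, 1], and predicting only makes sense for inputs of shape (n_samples, 3).

```python
import numpy as np

R = np.array([-4, -10, -2, 8, 5, 22, 3], dtype=float)
B = np.array([4, -10, 0, 0, 15, 5, 1], dtype=float)
G = np.array([0, 10, 5, 8, 1, 2, 38], dtype=float)
Y = np.array([0, -10, 3, 16, 21, 29, 42], dtype=float)

RBG = np.array([R, B, G]).transpose()  # shape (7, 3): one row per sample
weights, *_ = np.linalg.lstsq(RBG, Y, rcond=None)
print(weights)  # close to 1, 1, 1

column = np.array([[1], [1], [1]])  # shape (3, 1): three samples with 1 feature each
row = column.transpose()            # shape (1, 3): one sample with 3 features
print(column.shape, row.shape)      # (3, 1) (1, 3)
print(row @ weights)                # approximately 3
```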