How to calculate F1-micro score using lasagne - scikit-learn

import theano.tensor as T
import numpy as np
from nolearn.lasagne import NeuralNet
def multilabel_objective(predictions, targets):
epsilon = np.float32(1.0e-6)
one = np.float32(1.0)
pred = T.clip(predictions, epsilon, one - epsilon)
return -T.sum(targets * T.log(pred) + (one - targets) * T.log(one - pred), axis=1)
net = NeuralNet(
# your other parameters here (layers, update, max_epochs...)
# here are the one you're interested in:
objective_loss_function=multilabel_objective,
custom_score=("validation score", lambda x, y: np.mean(np.abs(x - y)))
)
I found this code online and wanted to test it. It did work, the results include training loss, test loss, validation score and during time and so on.
But how can I get the F1-micro score? Also, if I was trying to import scikit-learn to calculate the F1 after adding the following code:
data = data.astype(np.float32)
classes = classes.astype(np.float32)
net.fit(data, classes)
score = cross_validation.cross_val_score(net, data, classes, scoring='f1', cv=10)
print score
I got this error:
ValueError: Can't handle mix of multilabel-indicator and
continuous-multioutput
How to implement F1-micro calculation based on above code?

Suppose your true labels on the test set are y_true (shape: (n_samples, n_classes), composed only of 0s and 1s), and your test observations are X_test (shape: (n_samples, n_features)).
Then you get your net predicted values on the test set by y_test = net.predict(X_test).
If you are doing multiclass classification:
Since in your network you have set regression to False, this should be composed of 0s and 1s only, too.
You can compute the micro averaged f1 score with:
from sklearn.metrics import f1_score
f1_score(y_true, y_pred, average='micro')
Small code sample to illustrate this (with dummy data, use your actual y_test and y_true):
from sklearn.metrics import f1_score
import numpy as np
y_true = np.array([[0, 0, 1], [0, 1, 0], [0, 0, 1], [0, 0, 1], [0, 1, 0]])
y_pred = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 1], [0, 0, 1]])
t = f1_score(y_true, y_pred, average='micro')
If you are doing multilabel classification:
You are not outputting a matrix of 0 and 1, but a matrix of probabilities. y_pred[i, j] is the probability that observation i belongs to the class j.
You need to define a threshold value, above which you will say an observation belongs to a given class. Then you can attribute labels accordingly and proceed just the same as in the previous case.
thresh = 0.8 # choose your own value
y_test_binary = np.where(y_test > thresh, 1, 0)
# creates an array with 1 where y_test>thresh, 0 elsewhere
f1_score(y_true, y_pred_binary, average='micro')

Related

torchmetrics represent uncertainty

I am using torchmetrics to calculate metrics such as F1 score, Recall, Precision and Accuracy in multilabel classification setting. With random initiliazed weights the softmax output (i.e. prediction) might look like this with a batch size of 8:
import torch
y_pred = torch.tensor([[0.1944, 0.1931, 0.2184, 0.1968, 0.1973],
[0.2182, 0.1932, 0.1945, 0.1973, 0.1968],
[0.2182, 0.1932, 0.1944, 0.1973, 0.1969],
[0.2182, 0.1931, 0.1945, 0.1973, 0.1968],
[0.2184, 0.1931, 0.1944, 0.1973, 0.1968],
[0.2181, 0.1932, 0.1941, 0.1970, 0.1976],
[0.2183, 0.1932, 0.1944, 0.1974, 0.1967],
[0.2182, 0.1931, 0.1945, 0.1973, 0.1968]])
With the correct labels (one-hot encoded):
y_true = torch.tensor([[0, 0, 1, 0, 1],
[0, 1, 0, 0, 1],
[0, 1, 0, 0, 1],
[0, 0, 1, 1, 0],
[0, 0, 1, 1, 0],
[0, 1, 0, 1, 0],
[0, 1, 0, 1, 0],
[0, 0, 1, 0, 1]])
And I can calculate the metrics by taking argmax:
import torchmetrics
torchmetrics.functional.f1_score(y_pred.argmax(-1), y_true.argmax(-1))
output:
tensor(0.1250)
The first prediction happens to be correct while the rest are wrong. However, none of the predictive probabilities are above 0.3, which means that the model is generally uncertain about the predictions. I would like to encode this and say that the f1 score should be 0.0 because none of the predictive probabilities are above a 0.3 threshold.
Is this possible with torchmetrics or sklearn library?
Is this common practice?
You need to threshold you predictions before passing them to your torchmetrics
t0, t1, mask_gt = batch
mask_pred = self.forward(t0, t1)
loss = self.criterion(mask_pred.squeeze().float(), mask_gt.squeeze().float())
mask_pred = torch.sigmoid(mask_pred).squeeze()
mask_pred = torch.where(mask_pred > 0.5, 1, 0)
# integers to comply with metrics input type
mask_pred = mask_pred.long()
mask_gt = mask_gt.long()
f1_score = self.f1(mask_pred, mask_gt)
precision = self.precision_(mask_pred, mask_gt)
recall = self.recall(mask_pred, mask_gt)
jaccard = self.jaccard(mask_pred, mask_gt)
The defined torchmetrics
self.f1 = F1Score(num_classes=2, average='macro', mdmc_average='samplewise')
self.recall = Recall(num_classes=2, average='macro', mdmc_average='samplewise')
self.precision_ = Precision(num_classes=2, average='macro', mdmc_average='samplewise') # self.precision exists in torch.nn.Module. Hence '_' symbol
self.jaccard = JaccardIndex(num_classes=2)

Is this the correct use of sklearn classification report for multi-label classification reports?

I am training a neural network with tf-keras. It is a multi-label classification where each sample belongs to multiple classes [1,0,1,0..etc] .. the final model line (just for clarity) is:
model.add(tf.keras.layers.Dense(9, activation='sigmoid'))#final layer
model.compile(loss='binary_crossentropy', optimizer=optimizer,
metrics=[tf.keras.metrics.BinaryAccuracy(),
tfa.metrics.F1Score(num_classes=9, average='macro',threshold=0.5)])
I need to generate precision, recall and F1 scores for these (I already get the F1 score reported during training). For this I am using sklearns classification report, but I need to confirm that I am using it correctly in the multi-label setting.
from sklearn.metrics import classification_report
pred = model.predict(x_test)
pred_one_hot = np.around(pred)#this generates a one hot representation of predictions
print(classification_report(one_hot_ground_truth, pred_one_hot))
This works fine and i get the full report for every class including F1 scores that match the F1score metric from tensorflow addons (for macro F1). Sorry this post is verbose but what I am unsure about is:
Is it correct that the predictions need to be one-hot encoded in the case of the multi-label setting? If I pass in the normal prediction scores (sigmoid probabilities) an error is thrown...
thank you.
It is correct to use classification_report for both binary, multi-class and multi-label classification.
The labels are not one-hot-encoded in case of multi-class classification. They simply need to be either indices or labels.
You can see that both code below yield the same output:
Example with indices
from sklearn.metrics import classification_report
import numpy as np
labels = np.array(['A', 'B', 'C'])
y_true = np.array([1, 2, 0, 1, 2, 0])
y_pred = np.array([1, 2, 1, 1, 1, 0])
print(classification_report(y_true, y_pred, target_names=labels))
Example with labels
from sklearn.metrics import classification_report
import numpy as np
labels = np.array(['A', 'B', 'C'])
y_true = labels[np.array([1, 2, 0, 1, 2, 0])]
y_pred = labels[np.array([1, 2, 1, 1, 1, 0])]
print(classification_report(y_true, y_pred))
Both returns
precision recall f1-score support
A 1.00 0.50 0.67 2
B 0.50 1.00 0.67 2
C 1.00 0.50 0.67 2
accuracy 0.67 6
macro avg 0.83 0.67 0.67 6
weighted avg 0.83 0.67 0.67 6
In the context of multi-label classification, classification_report can be used like in the example below:
from sklearn.metrics import classification_report
import numpy as np
labels =['A', 'B', 'C']
y_true = np.array([[1, 0, 1],
[0, 1, 0],
[1, 1, 1]])
y_pred = np.array([[1, 0, 0],
[0, 1, 1],
[1, 1, 1]])
print(classification_report(y_true, y_pred, target_names=labels))

LDA covariance matrix not match calculated covariance matrix

I'm looking to better understand the covariance_ attribute returned by scikit-learn's LDA object.
I'm sure I'm missing something, but I expect it to be the covariance matrix associated with the input data. However, when I compare .covariance_ against the covariance matrix returned by numpy.cov(), I get different results.
Can anyone help me understand what I am missing? Thanks and happy to provide any additional information.
Please find a simple example illustrating the discrepancy below.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
# Sample Data
X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
y = np.array([1, 1, 1, 0, 0, 0])
# Covariance matrix via np.cov
print(np.cov(X.T))
# Covariance matrix via LDA
clf = LinearDiscriminantAnalysis(store_covariance=True).fit(X, y)
print(clf.covariance_)
In sklearn.discrimnant_analysis.LinearDiscriminantAnalysis, the covariance is computed as follow:
In [1]: import numpy as np
...: cov = np.zeros(shape=(X.shape[1], X.shape[1]))
...: for c in np.unique(y):
...: Xg = X[y == c, :]
...: cov += np.count_nonzero(y==c) / len(y) * np.cov(Xg.T, bias=1)
...: print(cov)
array([[0.66666667, 0.33333333],
[0.33333333, 0.22222222]])
So it corresponds to the sum of the covariance of each individual class multiplied by a prior which is the class frequency. Note that this prior is a parameter of LDA.

How do I mask a loss function in Keras with the TensorFlow backend?

I am trying to implement a sequence-to-sequence task using LSTM by Keras with the TensorFlow backend. The inputs are English sentences with variable lengths. To construct a dataset with 2-D shape [batch_number, max_sentence_length], I add EOF at the end of the line and pad each sentence with enough placeholders, e.g. #. And then each character in the sentence is transformed into a one-hot vector, so that the dataset has 3-D shape [batch_number, max_sentence_length, character_number]. After LSTM encoder and decoder layers, softmax cross-entropy between output and target is computed.
To eliminate the padding effect in model training, masking could be used on input and loss function. Mask input in Keras can be done by using layers.core.Masking. In TensorFlow, masking on loss function can be done as follows: custom masked loss function in TensorFlow.
However, I don't find a way to realize it in Keras, since a user-defined loss function in Keras only accepts parameters y_true and y_pred. So how to input true sequence_lengths to loss function and mask?
Besides, I find a function _weighted_masked_objective(fn) in \keras\engine\training.py. Its definition is
Adds support for masking and sample-weighting to an objective function.
But it seems that the function can only accept fn(y_true, y_pred). Is there a way to use this function to solve my problem?
To be specific, I modify the example of Yu-Yang.
from keras.models import Model
from keras.layers import Input, Masking, LSTM, Dense, RepeatVector, TimeDistributed, Activation
import numpy as np
from numpy.random import seed as random_seed
random_seed(123)
max_sentence_length = 5
character_number = 3 # valid character 'a, b' and placeholder '#'
input_tensor = Input(shape=(max_sentence_length, character_number))
masked_input = Masking(mask_value=0)(input_tensor)
encoder_output = LSTM(10, return_sequences=False)(masked_input)
repeat_output = RepeatVector(max_sentence_length)(encoder_output)
decoder_output = LSTM(10, return_sequences=True)(repeat_output)
output = Dense(3, activation='softmax')(decoder_output)
model = Model(input_tensor, output)
model.compile(loss='categorical_crossentropy', optimizer='adam')
model.summary()
X = np.array([[[0, 0, 0], [0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0]],
[[0, 0, 0], [0, 1, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0]]])
y_true = np.array([[[0, 0, 1], [0, 0, 1], [1, 0, 0], [0, 1, 0], [0, 1, 0]], # the batch is ['##abb','#babb'], padding '#'
[[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0]]])
y_pred = model.predict(X)
print('y_pred:', y_pred)
print('y_true:', y_true)
print('model.evaluate:', model.evaluate(X, y_true))
# See if the loss computed by model.evaluate() is equal to the masked loss
import tensorflow as tf
logits=tf.constant(y_pred, dtype=tf.float32)
target=tf.constant(y_true, dtype=tf.float32)
cross_entropy = tf.reduce_mean(-tf.reduce_sum(target * tf.log(logits),axis=2))
losses = -tf.reduce_sum(target * tf.log(logits),axis=2)
sequence_lengths=tf.constant([3,4])
mask = tf.reverse(tf.sequence_mask(sequence_lengths,maxlen=max_sentence_length),[0,1])
losses = tf.boolean_mask(losses, mask)
masked_loss = tf.reduce_mean(losses)
with tf.Session() as sess:
c_e = sess.run(cross_entropy)
m_c_e=sess.run(masked_loss)
print("tf unmasked_loss:", c_e)
print("tf masked_loss:", m_c_e)
The output in Keras and TensorFlow are compared as follows:
As shown above, masking is disabled after some kinds of layers. So how to mask the loss function in Keras when those layers are added?
If there's a mask in your model, it'll be propagated layer-by-layer and eventually applied to the loss. So if you're padding and masking the sequences in a correct way, the loss on the padding placeholders would be ignored.
Some Details:
It's a bit involved to explain the whole process, so I'll just break it down to several steps:
In compile(), the mask is collected by calling compute_mask() and applied to the loss(es) (irrelevant lines are ignored for clarity).
weighted_losses = [_weighted_masked_objective(fn) for fn in loss_functions]
# Prepare output masks.
masks = self.compute_mask(self.inputs, mask=None)
if masks is None:
masks = [None for _ in self.outputs]
if not isinstance(masks, list):
masks = [masks]
# Compute total loss.
total_loss = None
with K.name_scope('loss'):
for i in range(len(self.outputs)):
y_true = self.targets[i]
y_pred = self.outputs[i]
weighted_loss = weighted_losses[i]
sample_weight = sample_weights[i]
mask = masks[i]
with K.name_scope(self.output_names[i] + '_loss'):
output_loss = weighted_loss(y_true, y_pred,
sample_weight, mask)
Inside Model.compute_mask(), run_internal_graph() is called.
Inside run_internal_graph(), the masks in the model is propagated layer-by-layer from the model's inputs to outputs by calling Layer.compute_mask() for each layer iteratively.
So if you're using a Masking layer in your model, you shouldn't worry about the loss on the padding placeholders. The loss on those entries will be masked out as you've probably already seen inside _weighted_masked_objective().
A Small Example:
max_sentence_length = 5
character_number = 2
input_tensor = Input(shape=(max_sentence_length, character_number))
masked_input = Masking(mask_value=0)(input_tensor)
output = LSTM(3, return_sequences=True)(masked_input)
model = Model(input_tensor, output)
model.compile(loss='mae', optimizer='adam')
X = np.array([[[0, 0], [0, 0], [1, 0], [0, 1], [0, 1]],
[[0, 0], [0, 1], [1, 0], [0, 1], [0, 1]]])
y_true = np.ones((2, max_sentence_length, 3))
y_pred = model.predict(X)
print(y_pred)
[[[ 0. 0. 0. ]
[ 0. 0. 0. ]
[-0.11980877 0.05803877 0.07880752]
[-0.00429189 0.13382857 0.19167568]
[ 0.06817091 0.19093043 0.26219055]]
[[ 0. 0. 0. ]
[ 0.0651961 0.10283815 0.12413475]
[-0.04420842 0.137494 0.13727818]
[ 0.04479844 0.17440712 0.24715884]
[ 0.11117355 0.21645413 0.30220413]]]
# See if the loss computed by model.evaluate() is equal to the masked loss
unmasked_loss = np.abs(1 - y_pred).mean()
masked_loss = np.abs(1 - y_pred[y_pred != 0]).mean()
print(model.evaluate(X, y_true))
0.881977558136
print(masked_loss)
0.881978
print(unmasked_loss)
0.917384
As can be seen from this example, the loss on the masked part (the zeroes in y_pred) is ignored, and the output of model.evaluate() is equal to masked_loss.
EDIT:
If there's a recurrent layer with return_sequences=False, the mask stop propagates (i.e., the returned mask is None). In RNN.compute_mask():
def compute_mask(self, inputs, mask):
if isinstance(mask, list):
mask = mask[0]
output_mask = mask if self.return_sequences else None
if self.return_state:
state_mask = [None for _ in self.states]
return [output_mask] + state_mask
else:
return output_mask
In your case, if I understand correctly, you want a mask that's based on y_true, and whenever the value of y_true is [0, 0, 1] (the one-hot encoding of "#") you want the loss to be masked. If so, you need to mask the loss values in a somewhat similar way to Daniel's answer.
The main difference is the final average. The average should be taken over the number of unmasked values, which is just K.sum(mask). And also, y_true can be compared to the one-hot encoded vector [0, 0, 1] directly.
def get_loss(mask_value):
mask_value = K.variable(mask_value)
def masked_categorical_crossentropy(y_true, y_pred):
# find out which timesteps in `y_true` are not the padding character '#'
mask = K.all(K.equal(y_true, mask_value), axis=-1)
mask = 1 - K.cast(mask, K.floatx())
# multiply categorical_crossentropy with the mask
loss = K.categorical_crossentropy(y_true, y_pred) * mask
# take average w.r.t. the number of unmasked entries
return K.sum(loss) / K.sum(mask)
return masked_categorical_crossentropy
masked_categorical_crossentropy = get_loss(np.array([0, 0, 1]))
model = Model(input_tensor, output)
model.compile(loss=masked_categorical_crossentropy, optimizer='adam')
The output of the above code then shows that the loss is computed only on the unmasked values:
model.evaluate: 1.08339476585
tf unmasked_loss: 1.08989
tf masked_loss: 1.08339
The value is different from yours because I've changed the axis argument in tf.reverse from [0,1] to [1].
If you're not using masks as in Yu-Yang's answer, you can try this.
If you have your target data Y with length and padded with the mask value, you can:
import keras.backend as K
def custom_loss(yTrue,yPred):
#find which values in yTrue (target) are the mask value
isMask = K.equal(yTrue, maskValue) #true for all mask values
#since y is shaped as (batch, length, features), we need all features to be mask values
isMask = K.all(isMask, axis=-1) #the entire output vector must be true
#this second line is only necessary if the output features are more than 1
#transform to float (0 or 1) and invert
isMask = K.cast(isMask, dtype=K.floatx())
isMask = 1 - isMask #now mask values are zero, and others are 1
#multiply this by the inputs:
#maybe you might need K.expand_dims(isMask) to add the extra dimension removed by K.all
yTrue = yTrue * isMask
yPred = yPred * isMask
return someLossFunction(yTrue,yPred)
If you have padding only for the input data, or if Y has no length, you can have your own mask outside the function:
masks = [
[1,1,1,1,1,1,0,0,0],
[1,1,1,1,0,0,0,0,0],
[1,1,1,1,1,1,1,1,0]
]
#shape (samples, length). If it fails, make it (samples, length, 1).
import keras.backend as K
masks = K.constant(masks)
Since masks depend on your input data, you can use your mask value to know where to put zeros, such as:
masks = np.array((X_train == maskValue).all(), dtype='float64')
masks = 1 - masks
#here too, if you have a problem with dimensions in the multiplications below
#expand masks dimensions by adding a last dimension = 1.
And make your function taking masks from outside of it (you must recreate the loss function if you change the input data):
def customLoss(yTrue,yPred):
yTrue = masks*yTrue
yPred = masks*yPred
return someLossFunction(yTrue,yPred)
Does anyone know if keras automatically masks the loss function??
Since it provides a Masking layer and says nothing about the outputs, maybe it does it automatically?
I took both anwers and imporvised a way for Multiple Timesteps, single Missing target Values, Loss for LSTM(or other RecurrentNN) with return_sequences=True.
Daniels Answer would not suffice for multiple targets, due to isMask = K.all(isMask, axis=-1). Removing this aggregation made the function undifferentiable, probably. I do not know for shure, since I never run the pure function and cannot tell if its able to fit a model.
I fused Yu-Yangs's and Daniels answer together and it worked.
from tensorflow.keras.layers import Layer, Input, LSTM, Dense, TimeDistributed
from tensorflow.keras import Model, Sequential
import tensorflow.keras.backend as K
import numpy as np
mask_Value = -2
def get_loss(mask_value):
mask_value = K.variable(mask_value)
def masked_loss(yTrue,yPred):
#find which values in yTrue (target) are the mask value
isMask = K.equal(yTrue, mask_Value) #true for all mask values
#transform to float (0 or 1) and invert
isMask = K.cast(isMask, dtype=K.floatx())
isMask = 1 - isMask #now mask values are zero, and others are 1
isMask
#multiply this by the inputs:
#maybe you might need K.expand_dims(isMask) to add the extra dimension removed by K.all
yTrue = yTrue * isMask
yPred = yPred * isMask
# perform a root mean square error, whereas the mean is in respect to the mask
mean_loss = K.sum(K.square(yPred - yTrue))/K.sum(isMask)
loss = K.sqrt(mean_loss)
return loss
#RootMeanSquaredError()(yTrue,yPred)
return masked_loss
# define timeseries data
n_sample = 10
timesteps = 5
feat_inp = 2
feat_out = 2
X = np.random.uniform(0,1, (n_sample, timesteps, feat_inp))
y = np.random.uniform(0,1, (n_sample,timesteps, feat_out))
# define model
model = Sequential()
model.add(LSTM(50, activation='relu',return_sequences=True, input_shape=(timesteps, feat_inp)))
model.add(Dense(feat_out))
model.compile(optimizer='adam', loss=get_loss(mask_Value))
model.summary()
# %%
model.fit(X, y, epochs=50, verbose=0)
Note that Yu-Yang's answer does not appear to work on Tensorflow Keras 2.7.0
Surprisingly, model.evaluate does not compute masked_loss or unmasked_loss. Instead, it assumes that the loss from all masked input steps is zero (but still includes those steps in the mean() calculation). This means that every masked timestep actually reduces the calculated error!
#%% Yu-yang's example
# https://stackoverflow.com/a/47060797/3580080
import tensorflow as tf
import tensorflow.keras as keras
import numpy as np
# Fix the random seed for repeatable results
np.random.seed(5)
tf.random.set_seed(5)
max_sentence_length = 5
character_number = 2
input_tensor = keras.Input(shape=(max_sentence_length, character_number))
masked_input = keras.layers.Masking(mask_value=0)(input_tensor)
output = keras.layers.LSTM(3, return_sequences=True)(masked_input)
model = keras.Model(input_tensor, output)
model.compile(loss='mae', optimizer='adam')
X = np.array([[[0, 0], [0, 0], [1, 0], [0, 1], [0, 1]],
[[0, 0], [0, 1], [1, 0], [0, 1], [0, 1]]])
y_true = np.ones((2, max_sentence_length, 3))
y_pred = model.predict(X)
print(y_pred)
# See if the loss computed by model.evaluate() is equal to the masked loss
unmasked_loss = np.abs(1 - y_pred).mean()
masked_loss = np.abs(1 - y_pred[y_pred != 0]).mean()
print(f"model.evaluate= {model.evaluate(X, y_true)}")
print(f"masked loss= {masked_loss}")
print(f"unmasked loss= {unmasked_loss}")
Prints:
[[[ 0. 0. 0. ]
[ 0. 0. 0. ]
[ 0.05340272 -0.06415359 -0.11803789]
[ 0.08775083 0.00600774 -0.10454659]
[ 0.11212641 0.07632366 -0.04133942]]
[[ 0. 0. 0. ]
[ 0.05394626 0.08956442 0.03843312]
[ 0.09092357 -0.02743799 -0.10386454]
[ 0.10791279 0.04083341 -0.08820333]
[ 0.12459432 0.09971555 -0.02882453]]]
1/1 [==============================] - 1s 658ms/step - loss: 0.6865
model.evaluate= 0.6864957213401794
masked loss= 0.9807082414627075
unmasked loss= 0.986495852470398
(This is intended as a comment rather than an answer).

Classification with restrictions

How should I best use scikit-learn for the following supervised classification problem (simplified), with binary features:
import numpy as np
from sklearn.tree import DecisionTreeClassifier
train_data = np.array([[0, 0, 1, 0],
[1, 0, 1, 1],
[0, 1, 1, 1]], dtype=bool)
train_targets = np.array([0, 1, 2])
c = DecisionTreeClassifier()
c.fit(train_data, train_targets)
p = c.predict(np.array([1, 1, 1, 1], dtype=bool))
print(p)
# -> [1]
That works fine. However, suppose now that I know a priori that the presence of feature 0 excludes class 1. Can additional information of this kind be easily included in the classification process?
Currently, I'm just doing some (problem-specific and heuristic) postprocessing to adjust the resulting class. I could perhaps also manually preprocess and split the dataset into two according to the feature, and train two classifiers separately (but with K such features, this ends up in 2^K splitting).
Can additional information of this kind be easily included in the classification process?
Domain-specific hacks are left to the user. The easiest way to do this is to predict probabilities...
>>> prob = c.predict_proba(X)
and then rig the probabilities to get the right class out.
>>> invalid = (prob[:, 1] == 1) & (X[:, 0] == 1)
>>> prob[invalid, 1] = -np.inf
>>> pred = c.classes_[np.argmax(prob, axis=1)]
That's -np.inf instead of 0 so the 1 label doesn't come up as a result of tie-breaking vs. other zero-probability classes.

Resources