How do I calculate the confusion matrix in PyTorch efficiently? - pytorch

I have a tensor that contains my predictions and a tensor that contains the actual labels for my binary classification problem. How can I calculate the confusion matrix efficiently?

After my first version using a for-loop proved inefficient, this is the fastest solution I have come up with so far, for two tensors prediction and truth of equal shape:
def confusion(prediction, truth):
    # Element-wise division maps each (prediction, truth) pair to a distinct value:
    # 1/1 = 1 (TP), 1/0 = inf (FP), 0/0 = nan (TN), 0/1 = 0 (FN)
    confusion_vector = prediction / truth
    true_positives = torch.sum(confusion_vector == 1).item()
    false_positives = torch.sum(confusion_vector == float('inf')).item()
    true_negatives = torch.sum(torch.isnan(confusion_vector)).item()
    false_negatives = torch.sum(confusion_vector == 0).item()
    return true_positives, false_positives, true_negatives, false_negatives
Commented version and test-case at https://gist.github.com/the-bass/cae9f3976866776dea17a5049013258d
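A quick sanity check, assuming both tensors hold float 0s and 1s (integer tensors should be cast to float first so the division produces inf/nan as expected):

import torch

prediction = torch.tensor([1., 1., 0., 0., 1.])
truth      = torch.tensor([1., 0., 0., 1., 1.])

tp, fp, tn, fn = confusion(prediction, truth)
print(tp, fp, tn, fn)  # 2 1 1 1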

Related

compute loss between two matrices

In one of Google's papers (Exploring Tradeoffs in Models for Low-Latency Speech Enhancement) they use this loss function to minimize the error. I have coded this function in Python with TensorFlow:
def loss_cal(noise_source, mask, target):
    landa = 0.113
    masked_spec = noise_source*mask
    cc = K.l2_normalize(tf.abs(tf.pow(tf.maximum(1e-4, target), 0.3))-tf.abs(tf.pow(tf.maximum(1e-4, masked_spec), 0.3)))
    cm = K.l2_normalize(tf.math.pow(tf.maximum(1e-4, target), 0.3)-tf.math.pow(tf.maximum(1e-4, masked_spec), 0.3))
    res = tf.math.square(cc)+landa*tf.math.square(cm)
    return res
but it returns a matrix, while a loss function must return a scalar. Please correct me: is my implementation wrong, or is it possible to train a model with a matrix-valued loss?
Shouldn't you be using just the norm, not l2_normalize?
l2_normalize returns the matrix divided by its norm, not the norm itself, which is why your loss stays a matrix.
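A minimal sketch of that suggestion, assuming the intended loss is the squared L2 norm of the compressed-spectrum differences (variable names taken from the question's code):

import tensorflow as tf

def loss_cal(noise_source, mask, target):
    landa = 0.113
    masked_spec = noise_source * mask
    # Power-law compressed spectra
    target_c = tf.pow(tf.maximum(1e-4, target), 0.3)
    masked_c = tf.pow(tf.maximum(1e-4, masked_spec), 0.3)
    # tf.norm reduces each difference matrix to a scalar, so the loss is a scalar
    cc = tf.norm(tf.abs(target_c) - tf.abs(masked_c))
    cm = tf.norm(target_c - masked_c)
    return tf.square(cc) + landa * tf.square(cm)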

How to sample Logits and Probabilties from a transformer seq2seq model for reinforcement learning?

skipping the formalities:
I am trying to apply reinforcement learning to a transformer based seq2seq model (for abstractive summarization purposes) in Pytorch.
My current setup looks something like this:
I am getting a greedy distribution (summary) from the model by inferring one token at a time in a loop:
def get_greedy_distribution(model, batch):
    src, (shift_tgt, lbl_tgt), segs, clss, mask_src, mask_tgt, mask_cls = batch
    # the mock targets are just torch.zeros tensors to store inferred tokens
    mock_tgt = get_mock_tgt(shift_tgt)
    mock_return = get_mock_tgt(shift_tgt)
    max_length = shift_tgt.shape[1]
    with torch.no_grad():
        for i in range(0, max_length-1):
            prediction = model(src, mock_tgt, segs, clss, mask_src, mask_tgt, mask_cls)
            prediction = F.softmax(prediction, dim=2)
            val, ix = prediction.data.topk(1)
            mock_tgt[:, i+1] = ix.squeeze()[:, i].detach()
            mock_return[:, i] = ix.squeeze()[:, i].detach()
    return mock_return
I am getting a sample distribution, with probabilities, from the model in a similar way:
def get_distribution(model, batch):
    src, (shift_tgt, lbl_tgt), segs, clss, mask_src, mask_tgt, mask_cls = batch
    mock_tgt = get_mock_tgt(shift_tgt)
    mock_return = get_mock_tgt(shift_tgt)
    max_length = shift_tgt.shape[1]
    log_probs = []
    for i in range(0, max_length-1):
        prediction = model(src, mock_tgt, segs, clss, mask_src, mask_tgt, mask_cls)
        prediction = F.softmax(prediction, dim=2)
        multi_dist = Categorical(prediction[:, i])
        x_t = multi_dist.sample()
        log_prob = multi_dist.log_prob(x_t)
        mock_tgt[:, i+1] = x_t
        mock_return[:, i] = x_t
        log_probs.append(log_prob)
    return mock_return, log_probs
However, I am a bit unsure whether I am drawing the sampled distribution correctly. This would work well in an RNN context, where I can sample logits and probabilities during the typical RNN loop, but it feels slightly wrong with a Transformer.
How would you suggest approaching the Transformer for a typical baseline-sampled reinforcement learning setup (I am guessing it is a policy gradient)?
PyTorch code is preferred, but if you have TensorFlow examples I am sure I can figure it out.
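For concreteness, the baseline-sampled policy-gradient update I have in mind looks roughly like the self-critical sketch below; compute_reward is a placeholder (e.g. a ROUGE score against the reference summary), and the helper names are the ones from my code above:

def policy_gradient_loss(model, batch, compute_reward):
    # Sampled summary with its per-step token log-probabilities
    sample, log_probs = get_distribution(model, batch)
    # Greedy summary acts as the baseline (generated without gradients)
    baseline = get_greedy_distribution(model, batch)

    reward_sample = compute_reward(sample, batch)      # shape: (batch,)
    reward_baseline = compute_reward(baseline, batch)  # shape: (batch,)

    # REINFORCE with a self-critical baseline
    seq_log_prob = torch.stack(log_probs, dim=1).sum(dim=1)  # (batch,)
    advantage = reward_sample - reward_baseline
    return (-advantage * seq_log_prob).mean()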

How to calculate unbalanced weights for BCEWithLogitsLoss in pytorch

I am trying to solve a multilabel problem with 270 labels, and I have converted the target labels into one-hot encoded form. I am using BCEWithLogitsLoss(). Since the training data is unbalanced, I am using the pos_weight argument, but I am a bit confused.
pos_weight (Tensor, optional) – a weight of positive examples. Must be a vector with length equal to the number of classes.
Do I need to give the total count of positive values of each label as a tensor, or do they mean something else by weights?
The PyTorch documentation for BCEWithLogitsLoss recommends the pos_weight to be a ratio between the negative and positive counts for each class.
So, if len(dataset) is 1000 and element 0 of your multihot encoding has 100 positive counts, then element 0 of the pos_weights_vector should be 900/100 = 9. That means the binary cross-entropy loss will behave as if the dataset contained 900 positive examples of that class instead of 100.
Here is my implementation:
(new, based on this post)
pos_weight = (y==0.).sum()/y.sum()
(original)
def calculate_pos_weights(class_counts):
    pos_weights = np.ones_like(class_counts)
    neg_counts = [len(data)-pos_count for pos_count in class_counts]
    for cdx, pos_count, neg_count in enumerate(zip(class_counts, neg_counts)):
        pos_weights[cdx] = neg_count / (pos_count + 1e-5)
    return torch.as_tensor(pos_weights, dtype=torch.float)
Where class_counts is just a column-wise sum of the positive samples. I posted it on the PyTorch forum and one of the PyTorch devs gave it his blessing.
Maybe it's a little late, but here is how I calculate the same thing. Looking into the documentation:
For example, if a dataset contains 100 positive and 300 negative examples of a single class, then pos_weight for the class should be equal to 300/100 = 3.
So an easy way to calculate the positive weight is to use the tensor methods on your label vector y (in my case train_dataset.data.y) and then compute the total number of negative labels.
num_positives = torch.sum(train_dataset.data.y, dim=0)
num_negatives = len(train_dataset.data.y) - num_positives
pos_weight = num_negatives / num_positives
Then the weights can be used easily as:
criterion = torch.nn.BCEWithLogitsLoss(pos_weight = pos_weight)
PyTorch solution
Well, actually I have gone through the docs and you can simply use pos_weight indeed.
This argument gives a weight to the positive samples of each class; hence, if you have 270 classes, you should pass a torch.Tensor with shape (270,) defining the weight for each class.
Here is a marginally modified snippet from the documentation:
# 270 classes, batch size = 64
target = torch.ones([64, 270], dtype=torch.float32)
# Logits outputted from your network, no activation
output = torch.full([64, 270], 0.9)
# Weights, each being equal to one. You can input your own here.
pos_weight = torch.ones([270])
criterion = torch.nn.BCEWithLogitsLoss(pos_weight=pos_weight)
criterion(output, target) # -log(sigmoid(0.9))
Self-made solution
When it comes to weighting each class's loss yourself, there is no built-in solution, but you may code one really easily:
import torch

class WeightedMultilabel(torch.nn.Module):
    def __init__(self, weights: torch.Tensor):
        super().__init__()
        # reduction='none' keeps a per-element loss so each class can be weighted
        self.loss = torch.nn.BCEWithLogitsLoss(reduction='none')
        self.weights = weights

    def forward(self, outputs, targets):
        return (self.loss(outputs, targets) * self.weights).mean()
The weights tensor has to have the same length as the number of classes in your multilabel classification (270), each entry giving the weight for one class.
Calculating weights
You just add up the labels of every sample in your dataset, divide by the minimum value, and invert at the end.
A sort-of snippet:
weights = torch.zeros_like(dataset[0])
for element in dataset:
    weights += element

weights = 1 / (weights / torch.min(weights))
Using this approach, the class occurring the least will give a normal loss, while the others will have weights smaller than 1.
It might cause some instability during training though, so you might want to experiment with those values a little (maybe a log transform instead of a linear one?).
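Purely to illustrate that last suggestion (my own sketch, not something benchmarked), the final inversion could be replaced with a log-transformed version; here class_sums stands for the per-class totals accumulated in the loop above, before the inversion:

import torch

# Hypothetical log-transformed weights: dampens the gap between
# frequent and rare classes compared with the linear inverse
weights = 1 / torch.log1p(class_sums / torch.min(class_sums))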
Other approach
You may think about upsampling/downsampling (though this operation is complicated, as you would add/delete samples of other classes as well, so some advanced heuristics would be needed, I think).
Just to provide a quick revision of #crypdick's answer, this implementation of the function worked for me:
def calculate_pos_weights(class_counts, data):
    pos_weights = np.ones_like(class_counts)
    neg_counts = [len(data)-pos_count for pos_count in class_counts]
    for cdx, (pos_count, neg_count) in enumerate(zip(class_counts, neg_counts)):
        pos_weights[cdx] = neg_count / (pos_count + 1e-5)
    return torch.as_tensor(pos_weights, dtype=torch.float)
Where data is the dataset you're trying to apply weights to.
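A minimal usage sketch, assuming data is a NumPy multihot target matrix; the matrix and its shape here are made up purely for illustration:

import numpy as np
import torch

# Made-up multihot targets: 1000 samples x 270 labels
data = np.random.randint(0, 2, size=(1000, 270)).astype(np.float32)
class_counts = data.sum(axis=0)  # column-wise positive counts per label

pos_weight = calculate_pos_weights(class_counts, data)
criterion = torch.nn.BCEWithLogitsLoss(pos_weight=pos_weight)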

keras: unsupervised learning with external constraint

I have to train a network on unlabelled data of binary type (True/False), which sounds like unsupervised learning. This is what the normalised data look like:
array([[-0.05744527, -1.03575495, -0.1940105 , -1.15348956, -0.62664491,
-0.98484037],
[-0.05497629, -0.50935675, -0.19396862, -0.68990988, -0.10551919,
-0.72375012],
[-0.03275552, 0.31480204, -0.1834951 , 0.23724946, 0.15504367,
0.29810553],
...,
[-0.05744527, -0.68482282, -0.1940105 , -0.87534175, -0.23580062,
-0.98484037],
[-0.05744527, -1.50366446, -0.1940105 , -1.52435329, -1.14777063,
-0.98484037],
[-0.05744527, -1.26970971, -0.1940105 , -1.33892142, -0.88720777,
-0.98484037]])
However, I do have a constraint on the total number of True labels in my data. This doesn't let me build a classical custom loss function in Keras taking (y_true, y_pred) arguments as required: my external constraint is only on the predicted totals of True and False, not on the individual labels.
My question is whether there is a somewhat "standard" approach to this kind of problem, and how that is implementable in Keras.
POSSIBLE SOLUTION
Should I assign y_true randomly as 0/1, have the network return y_pred as 1/0 with a sigmoid activation function, and then define my loss function as:
sum_y_true = 500 # arbitrary constant known a priori

def loss_function(y_true, y_pred):
    loss = np.abs(y_pred.sum() - sum_y_true)
    return loss
In the end, I went with the following solution, which worked.
1) Define batches in your dataframe df with a batch_id column, so that in each batch Y_train is your identical "batch ground truth" (in my case, the total number of True labels in the batch). You can then pass these instances together to the network. This can be done with a generator:
def grouper(g, x, y):
    while True:
        for gr in g.unique():
            # this assigns indices to the entire set of values in g,
            # then subsets to all the rows in which g == gr
            indices = g == gr
            yield (x[indices], y[indices])
# train set
train_generator = grouper(df.loc[df['set'] == 'train','batch_id'], X_train, Y_train)
# validation set
val_generator = grouper(df.loc[df['set'] == 'val','batch_id'], X_val, Y_val)
2) Define a custom loss function to track how closely the total number of instances predicted as True matches the ground truth:
def custom_delta(y_true, y_pred):
    loss = K.abs(K.mean(y_true) - K.sum(y_pred))
    return loss

def custom_wrapper():
    def custom_loss_function(y_true, y_pred):
        return custom_delta(y_true, y_pred)
    return custom_loss_function
Note that here:
a) each y_true label is already the sum of the ground truth in our batch (because we don't have individual values), which is why y_true is not summed over;
b) K.mean is actually a bit of overkill to extract a single scalar from this uniform tensor, in which all y_true values in each batch are identical; K.min or K.max would also work, but I haven't tested whether they are faster.
3) Use fit_generator instead of fit:
fmodel = Sequential()
# ...your layers...

# Create the loss function object using the wrapper function above
loss_ = custom_wrapper()
fmodel.compile(loss=loss_, optimizer='adam')

history1 = fmodel.fit_generator(train_generator,
                                steps_per_epoch=total_batches,
                                validation_data=val_generator,
                                validation_steps=df.loc[df['set'] == 'val', 'batch_id'].nunique(),
                                epochs=20, verbose=2)
This way the problem is basically addressed as one of supervised learning, although without individual labels, which means that notions like true/false positives are meaningless here.
This approach not only gave me a y_pred that closely matches the totals I know per batch; it actually finds two groups (True/False) that occupy the expected different portions of parameter space.

Scikit-learn Randomforest with warm_start results (non-broadcastable output ...)

I am trying to build an online random forest classifier. In a for loop I faced an error that I cannot find the reason for.
clf = RandomForestClassifier(n_estimators=1, warm_start=True)
In the for loop, I am increasing the number of estimators while reading new data.
clf.n_estimators = (clf.n_estimators + 1)
clf = clf.fit(data_batch, label_batch)
After going through the loop 3 times, when running predict as follows inside the loop:
predicted = clf.predict(data_batch)
I get the following error:
ValueError: non-broadcastable output operand with shape (500,1) doesn't match the broadcast shape (500,2)
While the data is in shape (500,153) and the label is (500,).
Here is a more complete version of the code:
clf = RandomForestClassifier(n_estimators=1, warm_start=True)
clf = clf.fit(X_train, y_train)
predicted = clf.predict(X_test)

batch_size = 500
for i in xrange(batch_init_size, records, batch_size):
    from_ = (i + 1)
    to_ = (i + batch_size + 1)
    data_batch = data[from_:to_, :]
    label_batch = label[from_:to_]
    predicted = clf.predict(data_batch)
    clf.n_estimators = (clf.n_estimators + 1)
    clf = clf.fit(data_batch, label_batch)
Yes, the error is due to some batches not containing samples from every class.
I solved this by using a batch size that ensures every batch contains samples from all classes.
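A rough sketch of what I mean, with all names hypothetical (label is the full label array, and the batches come from the loop in the question):

import numpy as np

all_classes = np.unique(label)

def has_all_classes(label_batch):
    # Only fit on batches that contain every class; otherwise merge the
    # batch with the next one (or resample) before calling clf.fit
    return np.array_equal(np.unique(label_batch), all_classes)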
I found the cause of the problem:
As the data is imbalanced, there is a high probability that all the samples in some batches come from a single class. In such cases the code in scikit-learn's forest.py cannot add a prediction array with one class column into an output array with two class columns. Here is the code from forest.py in scikit-learn:
def accumulate_prediction(predict, X, out, lock):
    prediction = predict(X, check_input=False)
    with lock:
        if len(out) == 1:
            out[0] += prediction
        else:
            for i in range(len(out)):
                out[i] += prediction[i]
