I have 10 classes, and my y_test has shape (1000, 10) and it looks like this:
array([[0, 0, 1, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
...,
[0, 0, 0, ..., 0, 0, 1],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0]], dtype=int64)
If I use the following where i is the class number
fpr[i], tpr[i], _ = roc_curve(y_test[:, i], y_pred[:, i])
should y_pred be
y_pred = model.predict(x_test)
OR
y_pred = np.argmax(model.predict(x_test), axis=1)
lb = LabelBinarizer()
lb.fit(y_test)
y_pred = lb.transform(y_pred)
The first option gives me something like this:
[[6.87280996e-11 6.28617670e-07 9.96915460e-01 ... 3.08361766e-03
3.47333212e-14 2.83545876e-09]
[7.04240659e-30 1.51786850e-07 8.49807921e-28 ... 6.62584656e-33
6.97696034e-19 1.01019222e-20]
[2.97537670e-14 2.67199534e-24 2.85646610e-19 ... 2.19898160e-15
7.03626012e-22 7.56072279e-18]
...
[1.63774752e-15 1.32784101e-06 1.23182635e-05 ... 3.60217566e-14
6.01247484e-05 2.61179358e-01]
[2.09420733e-35 6.94865276e-10 1.14242395e-22 ... 5.08080394e-22
1.20934697e-19 1.77760468e-17]
[1.68334747e-13 8.53335252e-04 4.40571597e-07 ... 1.70050384e-06
1.48684137e-06 2.93400045e-03]]
with shape (1000,10).
where the latter option gives
[[0 0 1 ... 0 0 0]
[0 0 0 ... 0 0 0]
[0 0 0 ... 0 0 0]
...
[0 0 0 ... 0 0 0]
[0 0 0 ... 0 0 0]
[0 0 0 ... 0 0 0]]
with shape (1000,10)
Which is the correct approach? In other words, what should y_pred be when it is passed to sklearn.metrics.roc_curve()?
I forgot to mention: the first option gives me extremely high (almost 1) AUC values for all classes, whereas the second option seems to generate reasonable AUC values.
The ROC curves using the two options are below, which one looks more correct?
There is nothing wrong with the first option, and that's what the documentation asks for:
y_score : ndarray of shape (n_samples,)
Target scores, can either be probability estimates of the positive class, confidence values, or non-thresholded measure of decisions (as returned by “decision_function” on some classifiers).
Also, the first graph looks like a ROC curve, while the second is weird.
And finally, ROC curves are meant to study "different classification thresholds". That means you need predictions "as probabilities" (confidences), not as 0's and 1's.
When you take an argmax, you throw away the probabilities/confidences, making it impossible to study thresholds.
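To make the threshold argument concrete, here is a minimal sketch on synthetic data (the labels, logits, and sizes are made up for illustration and are not the asker's model). It runs roc_curve per class once with softmax-like scores and once with the one-hot argmax, and counts how many thresholds each input lets the curve explore:

```python
import numpy as np
from sklearn.metrics import roc_curve, auc

rng = np.random.default_rng(0)
n_samples, n_classes = 300, 3  # small synthetic stand-in for shape (1000, 10)

# One-hot ground truth, same layout as y_test in the question.
labels = rng.integers(0, n_classes, size=n_samples)
y_test = np.eye(n_classes, dtype=int)[labels]

# Hypothetical noisy "model output": softmax over random logits with some signal.
logits = rng.normal(size=(n_samples, n_classes)) + 1.5 * y_test
y_score = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

# Option 2 from the question: argmax, then back to one-hot 0/1 predictions.
y_hard = np.eye(n_classes, dtype=int)[y_score.argmax(axis=1)]

n_points = {}
for i in range(n_classes):
    fpr_s, tpr_s, thr_s = roc_curve(y_test[:, i], y_score[:, i])
    fpr_h, tpr_h, thr_h = roc_curve(y_test[:, i], y_hard[:, i])
    n_points[i] = (len(thr_s), len(thr_h))
    print(f"class {i}: {len(thr_s)} thresholds from scores "
          f"(AUC {auc(fpr_s, tpr_s):.3f}), {len(thr_h)} from hard labels "
          f"(AUC {auc(fpr_h, tpr_h):.3f})")
```

With 0/1 inputs the curve has at most three points (the two corners plus one operating point), so it is really just a single confusion matrix drawn as a line, not a sweep over thresholds.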
I see that a simple checkerboard pattern can be created fairly concisely with numpy. Does anyone know if a checkerboard where each square may contain multiple values could be created? E.g.:
1 1 0 0 1 1
1 1 0 0 1 1
0 0 1 1 0 0
0 0 1 1 0 0
Although there is no equivalent of np.indices in PyTorch, you can still find a workaround using a combination of torch.arange, torch.meshgrid, and torch.stack:
def indices(h, w):
    return torch.stack(torch.meshgrid(torch.arange(h), torch.arange(w)))
This allows you to define a base tensor with a checkerboard pattern, following your linked post:
>>> base = indices(2,3).sum(axis=0) % 2
>>> base
tensor([[0, 1, 0],
[1, 0, 1]])
Then you can repeat the rows and columns with torch.repeat_interleave:
>>> base.repeat_interleave(2, dim=0).repeat_interleave(2, dim=1)
tensor([[0, 0, 1, 1, 0, 0],
[0, 0, 1, 1, 0, 0],
[1, 1, 0, 0, 1, 1],
[1, 1, 0, 0, 1, 1]])
And you can take the opposite of a given checkerboard x by computing 1-x.
So you could define a function like this:
def checkerboard(shape, k):
    """
    shape: dimensions of output tensor
    k: edge size of square
    """
    h, w = shape
    base = indices(h//k, w//k).sum(dim=0) % 2
    x = base.repeat_interleave(k, 0).repeat_interleave(k, 1)
    return 1-x
And try with:
>>> checkerboard((4,6), 2)
tensor([[1, 1, 0, 0, 1, 1],
[1, 1, 0, 0, 1, 1],
[0, 0, 1, 1, 0, 0],
[0, 0, 1, 1, 0, 0]])
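Since the question itself mentions numpy, the same idea can be sketched there too. This is an alternative, not the method above: it builds the base pattern with np.indices and enlarges each cell with np.kron. The function name checkerboard_np is made up for this example, and like the PyTorch version it assumes both dimensions are multiples of k.

```python
import numpy as np

def checkerboard_np(shape, k):
    """NumPy sketch: checkerboard of size `shape` with k-by-k squares.

    Assumes both dimensions in `shape` are divisible by k.
    """
    h, w = shape
    rows, cols = np.indices((h // k, w // k))
    base = (rows + cols) % 2  # 0/1 pattern with 1x1 squares
    # Kronecker product with a k-by-k block of ones blows each cell up to k x k.
    return 1 - np.kron(base, np.ones((k, k), dtype=int))

print(checkerboard_np((4, 6), 2))
```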
Is there a function or a set of arguments that I can use in order to calculate Precision and Recall for a multi-label problem?
Note that with multi-label I mean that each sample can be classified into more than one class.
The following is not returning what I would expect:
import torch
from torchmetrics import Precision
target = torch.tensor([
    [0, 0, 1, 1, 0],  # Sample 1 belongs to class 2 and 3 (zero-indexed)
    [0, 0, 1, 0, 0],  # Sample 2 belongs to class 2 (zero-indexed)
])
preds = torch.tensor([
    [0, 0, 0, 0, 0],  # Sample 1 predicted to belong to no class
    [0, 0, 0, 0, 0],  # Sample 2 predicted to belong to no class
])
metric = Precision(num_classes=5, mdmc_average="samplewise")
print(metric(preds, target))
It returns: tensor(0.7000), but it should be 0% since there are no True Positives.
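As a sanity check on the expected value, precision and recall can be computed by hand from the true/false positive counts. The sketch below uses plain torch and does not go through torchmetrics; the helper name is made up, the counts are pooled over all samples and labels (micro averaging), and returning 0.0 when the denominator is zero is a convention chosen for this example. The reported 0.7 is at least consistent with the tensors being compared as class indices instead of multi-label indicators (7 of the 10 entries agree), but that reading of the library's behavior is an assumption.

```python
import torch

def multilabel_precision_recall(preds, target):
    """Micro-averaged precision/recall for 0/1 multi-label tensors (sketch)."""
    preds = preds.bool()
    target = target.bool()
    tp = (preds & target).sum().item()   # predicted 1, actually 1
    fp = (preds & ~target).sum().item()  # predicted 1, actually 0
    fn = (~preds & target).sum().item()  # predicted 0, actually 1
    # Convention for this sketch: an undefined ratio (0/0) is reported as 0.0.
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall

target = torch.tensor([[0, 0, 1, 1, 0],
                       [0, 0, 1, 0, 0]])
preds = torch.zeros_like(target)  # nothing predicted for any sample

print(multilabel_precision_recall(preds, target))
```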
I'm facing issues when I try to update the state estimate, because the matrix sizes do not match.
I'm reading this excellent documentation to learn about the theory behind the Kalman filter.
In summary the sizes of the matrix are:
x = state vector => mx*1
z = output vector => mz*1
F = state transition matrix => mx*nx
P = estimated uncertainty matrix => mx*nx
Q = process noise uncertainty matrix => mx*nx
R = Measurement uncertainty matrix => mz*nz
H = Observation matrix => mz*nx
K = Kalman Gain => mx*nz
In my implementation, I am using the filter for tracking an object. My state vector looks like this.
x=[px,vx,ax,py,vy,ay];
The output vector
z=[px,py]
The state transition matrix
F = [
[ 1, 1, 0.5, 0, 0, 0],
[ 0, 1, 1, 0, 0, 0],
[ 0, 0, 1, 0, 0, 0],
[ 0, 0, 0, 1, 1, 0.5],
[ 0, 0, 0, 0, 1, 1],
[ 0, 0, 0, 0, 0, 1]]
the variance-covariance matrix
P = [
[ 3, 2, 1, 0, 0, 0],
[ 2, 3, 2, 0, 0, 0],
[ 1, 2, 3, 0, 0, 0],
[ 0, 0, 0, 3, 2, 1],
[ 0, 0, 0, 2, 3, 2],
[ 0, 0, 0, 1, 2, 3]]
The process noise
Q = [
[ 0.1, 0.1, 0.1, 0, 0, 0],
[ 0.1, 0.1, 0.1, 0, 0, 0],
[ 0.1, 0.1, 0.1, 0, 0, 0],
[ 0, 0, 0, 0.1, 0.1, 0.1],
[ 0, 0, 0, 0.1, 0.1, 0.1],
[ 0, 0, 0, 0.1, 0.1, 0.1]]
The measurement noise
R = [
[3, 0],
[0, 2]]
The observation matrix
H = [
[1, 0, 0, 0, 0, 0],
[0, 0, 0, 1, 0, 0]]
I did this because I only want the position (x, y) in the result.
Running the equations...
1. Time Update
1. Extrapolate the state
Xk^=F*Xk_1
No problem here. There is no control input because I'm just reading the values; the result is mx*1.
2. Extrapolate the uncertainty
Pk=F*Pk_1*FT + Q
No Problem here, the sizes are correct, and the result is like the theory says, mx*nx
2. Measurement Update
1. Computing the kalman gain
K=Pk*HT*(H*Pk*HT+ R)^-1
No problem here, the result is like the theory, mx*nz.
2. Update the estimate uncertainty
Pk'= Pk- K*H*Pk
No problem here, the result is as expected, mx*nx.
3. Update the estimate (state) with the measurement.
Xk=Xk^ + K*(z-H*Xk^)
And finally, this is the problem. I first compute K*(z-H*Xk^), which results in a matrix of size mz*nz, in other words 2x2, but the Xk^ vector has size mx*1. So when I try to add these two matrices, I get an error.
How can I solve this? I can't see what I'm missing.
I'm using Node.js to run this, but I also have the step-by-step calculation in an Excel spreadsheet.
(z - H*Xk^) is also called the innovation, and it should have dimensions nz x 1. I suspect this is where the problem occurs.
You can define
z^ = H*Xk^, and its dimensions should also be nz x 1
My guess is that you have multiple stacked measurements in the z variable, i.e., instead of z being nz x 1, it actually is nz x Nz, where nz is the dimensionality of a measurement and Nz the number of measurements. In that case you can perform sequential single-measurement updates after separating the z^ matrix into nz x 1 vectors, but that would mean you would have to be able to calculate a prediction for every measurement in the stacked measurements vector.
If we split
Xk^ + K*(z-H*Xk^)
into steps:
H*Xk^ is mz*nx by nx*1 so is mz*1
z - H*Xk^ is mz*1 - mz*1 so is mz*1
K*(z-H*Xk^) is mx*nz by mz*1 so is mx*1
(BTW I find your use of e.g. mx and nx confusing. These numbers must be equal (because P is square).)
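To check the dimension breakdown above numerically, here is a minimal NumPy sketch (NumPy instead of Node.js, purely for illustration). F, P, Q, R and H are the matrices from the question, built block-wise; the initial state x and the measurement z are made-up values. The key assumption is that z is stored as a 2x1 column.

```python
import numpy as np

# Per-axis blocks from the question, repeated for the x and y axes.
F = np.kron(np.eye(2), np.array([[1, 1, 0.5], [0, 1, 1], [0, 0, 1]]))
P = np.kron(np.eye(2), np.array([[3, 2, 1], [2, 3, 2], [1, 2, 3]]))
Q = np.kron(np.eye(2), np.full((3, 3), 0.1))
R = np.diag([3.0, 2.0])
H = np.zeros((2, 6))
H[0, 0] = 1  # observe px
H[1, 3] = 1  # observe py

x = np.zeros((6, 1))          # hypothetical initial state, column vector
z = np.array([[1.0], [2.0]])  # hypothetical measurement, column vector (2x1)

# Time update
x_pred = F @ x                # 6x1
P_pred = F @ P @ F.T + Q      # 6x6

# Measurement update
K = P_pred @ H.T @ np.linalg.inv(H @ P_pred @ H.T + R)  # 6x2
innovation = z - H @ x_pred   # 2x1
x_new = x_pred + K @ innovation  # 6x1, same shape as x_pred
P_new = P_pred - K @ H @ P_pred  # 6x6

print(K.shape, innovation.shape, x_new.shape, P_new.shape)
```

If z is stored as a 1x2 row instead, the subtraction z - H*Xk^ no longer yields a 2x1 column (NumPy, for instance, would silently broadcast it to 2x2), so the orientation of z is worth checking first.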
I am facing an issue in a multi-label, multi-class classification task. I have a dataset of 33000 samples, each labeled over 104 classes. The dataset consists of 16500 samples with labels such as [1, 0, 1, 0, 0, …], [0, 1, 1, 0, 1, …], [1, 0, 0, 0] (each label has at least one element equal to 1) and 16500 samples with labels such as [0, 0, 0, …], [0, 0, 0, …] (all elements are 0). When calculating pos_count for each class, is pos_count_0 for class 0 the number of times 1 appears in the first position of the labels in my dataset, pos_count_1 for class 1 the number of times 1 appears in the second position, and so on? And is the pos_weight of class 0 then (33000-pos_count_0)/pos_count_0, and the pos_weight of class 1 (33000-pos_count_1)/pos_count_1? I am a little confused about how neg_count and pos_count are calculated for a class.
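The per-class counting described here can be written out on a toy label matrix. This sketch assumes the usual convention for a per-class pos_weight (as in PyTorch's BCEWithLogitsLoss documentation), namely negatives divided by positives for each class; the 4x3 label matrix is invented for the example.

```python
import numpy as np

# Toy multi-label matrix: 4 samples, 3 classes (hypothetical data).
labels = np.array([
    [1, 0, 1],
    [0, 1, 1],
    [0, 0, 0],
    [0, 0, 0],
])

n_samples = labels.shape[0]
pos_count = labels.sum(axis=0)    # number of 1s in each column (class)
neg_count = n_samples - pos_count # all remaining samples count as negatives
# Note: a class with zero positives would cause a division by zero here.
pos_weight = neg_count / pos_count

print(pos_count, neg_count, pos_weight)
```

Under this convention the all-zero samples count as negatives for every class, so the total (33000 in the question) is the number to subtract from.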
I have a matrix like below:
[0 0 1 1]
[0 0 1 1]
[0 0 0 0]
[0 0 0 0]
I need to divide it into multiple 3x3 matrices starting from top left through right. It's sort of a 3x3 slide across the matrix. In this example, we would have 4 3x3 matrices like so:
[0 0 1] [0 1 1]
1 = [0 0 1] 2 = [0 1 1]
[0 0 0] [0 0 0]
[0 0 1] [0 1 1]
3 = [0 0 0] 4 = [0 0 0]
[0 0 0] [0 0 0]
I've tried this using tf.extract_image_patches and got the 4 matrices, but I'm still not sure how I can do a sort of running product of these matrices in Tensorflow. Or, better, how could I achieve the running product without having to pre-calculate the separate matrices?
By running product I mean this: I need to multiply the above matrices 1-4 element-wise to get a single 3x3 matrix. For example, matrices 1 and 2 would be multiplied, the result would be multiplied with matrix 3, and that result would again be multiplied with matrix 4. This operation should give me the start of the patch ([[1 1], [1 1]]) in my original matrix, a matrix like below:
[0 0 1]
res = [0 0 0]
[0 0 0]
Once done, I need to make this operation part of my network, a Tensorflow layer perhaps.
I'd appreciate if someone could help me achieve this. Thanks.
EDIT
This seems to be one way to multiply matrices in a list, but I'm still looking for 1) slice matrices into multiple parts and multiply them in a better way and 2) to add this as a layer to a network:
tf.scan(lambda a, b: tf.multiply(tf.squeeze(a), tf.squeeze(b)), original)
You could use tf.nn.conv2d; manipulating a matrix like this is called a convolution.
see tensorflow.org/api_docs/python/tf/nn/conv2d
You can use numpy array slicing
import numpy as np
A = np.array([[0, 0, 1, 1],
              [0, 0, 1, 1],
              [0, 0, 0, 0],
              [0, 0, 0, 0]])
res = A[:-1, :-1] * A[:-1, 1:] * A[1:, :-1] * A[1:, 1:]
and then, perhaps, convert the numpy array to a Tensor object by
tf.convert_to_tensor(res)
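The four slices above are exactly the four 3x3 windows of the 4x4 matrix. For other matrix or window sizes, one possible generalization (a NumPy sketch, not a TensorFlow layer) is to let sliding_window_view enumerate the windows and take the product over the window-position axes. The window size of 3 is taken from the question.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

A = np.array([[0, 0, 1, 1],
              [0, 0, 1, 1],
              [0, 0, 0, 0],
              [0, 0, 0, 0]])

# windows[i, j] is the 3x3 patch whose top-left corner is A[i, j];
# for a 4x4 input this has shape (2, 2, 3, 3).
windows = sliding_window_view(A, (3, 3))

# Element-wise product across all window positions leaves one 3x3 matrix.
res = windows.prod(axis=(0, 1))
print(res)
```

The result matches the res matrix given in the question, and it can be converted with tf.convert_to_tensor in the same way as above.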