Related
I am more interested in optimizing my multiclass problem with Brier score instead of accuracy. To achieve that, I am evaluating my classifiers with the results of predict_proba() like:
import numpy as np
probs = np.array(
[ [1, 0, 0],
[0, 1, 0],
[1, 0, 0],
[0, 1, 0],
[0, 0, 1],
[1, 0, 0],
[0, 0, 1],
[0, 0, 1]]
)
targets = np.array(
[[0.9, 0.05, 0.05],
[0.1, 0.8, 0.1],
[0.7, 0.2, 0.1],
[0.1, 0.9, 0],
[0, 0, 1],
[0.5, 0.3, 0.2],
[0.1, 0.5, 0.4],
[0.34, 0.33, 0.33]]
)
def brier_multi(targets, probs):
return np.mean(np.sum((probs - targets) ** 2, axis=1))
brier_multi(targets, probs)
Is it possible to optimize scikit-learns classifier directly during training for multiclass Brier score instead of accuracy?
Edit:
...
pipe = Pipeline(
steps=[
("preprocessor", preprocessor),
("selector", None),
("classifier", model.get("classifier")),
]
)
def brier_multi(targets, probs):
ohe_targets = OneHotEncoder().fit_transform(targets.reshape(-1, 1))
return np.mean(np.sum(np.square(probs - ohe_targets), axis=1))
brier_multi_loss = make_scorer(
brier_multi,
greater_is_better=False,
needs_proba=True,
)
search = GridSearchCV(
estimator=pipe,
param_grid=model.get("param_grid"),
scoring=brier_multi_loss,
cv=3,
n_jobs=-1,
refit=True,
verbose=3,
)
search.fit(X_train, y_train)
...
leads to nan as score
/home/andreas/.local/lib/python3.8/site-packages/sklearn/model_selection/_search.py:969: UserWarning: One or more of the test scores are non-finite: [nan nan nan nan nan nan nan nan nan]
warnings.warn(
You're already aware of the scoring parameter, so you just need to wrap your brier_multi into the format expected by GridSearchCV. There's a utility for that, make_scorer:
from sklearn.metrics import make_scorer
neg_mc_brier_score = make_scorer(
brier_multi,
greater_is_better=False,
needs_proba=True,
)
GridSearchCV(..., scoring=neg_mc_brier_score)
See the User Guide and the docs for make_scorer.
Unfortunately, that won't run, because your version of the scorer expects a one-hot-encoded targets array, whereas sklearn multiclass will send y_true as a 1d array. As a hack to make sure the rest works, you can modify:
def brier_multi(targets, probs):
ohe_targets = OneHotEncoder().fit_transform(targets.reshape(-1, 1))
return np.mean(np.sum(np.square(probs - ohe_targets), axis=1))
but I would encourage you to make this more robust (what if the classes aren't just 0, 1, ..., n_classes-1?).
For what it's worth, sklearn has a PR in progress to add multiclass Brier score: https://github.com/scikit-learn/scikit-learn/pull/22046 (be sure to see the linked PR18699, as it has the beginning of development and review).
I have two tensors: scores and lists
scores is of shape (x, 8) and lists of (x, 8, 4). I want to filter the max values for each row in scores and filter the respective elements from lists.
Take the following as an example (shape dimension 8 was reduced to 2 for simplicity):
scores = torch.tensor([[0.5, 0.4], [0.3, 0.8], ...])
lists = torch.tensor([[[0.2, 0.3, 0.1, 0.5],
[0.4, 0.7, 0.8, 0.2]],
[[0.1, 0.2, 0.1, 0.3],
[0.4, 0.3, 0.2, 0.5]], ...])
Then I would like to filter these tensors to:
scores = torch.tensor([0.5, 0.8, ...])
lists = torch.tensor([[0.2, 0.3, 0.1, 0.5], [0.4, 0.3, 0.2, 0.5], ...])
NOTE:
I tried so far, to retrieve the indices from the original score vector and use it as an index vector to filter lists:
# PSEUDO-CODE
indices = scores.argmax(dim=1)
for list, idx in zip(lists, indices):
list = list[idx]
That is also where the question name is coming from.
I imagine you tried something like
indices = scores.argmax(dim=1)
selection = lists[:, indices]
This does not work because the indices are selected for every element in dimension 0, so the final shape is (x, x, 4).
The perform the correct selection you need to replace the slice with a range.
indices = scores.argmax(dim=1)
selection = lists[range(indices.size(0)), indices]
I followed example and tried to use gridsearch with a random forest classifier to generate roc_auc_score, however, the y_prob=model.predict_proba(X_test)
I generated was in list (two arrays) rather than one. So I was wondering what went wrong here.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import label_binarize
from sklearn.model_selection import GridSearchCV, StratifiedShuffleSplit
from sklearn.metrics import roc_auc_score
X = np.random.rand(50,10)
y = np.random.permutation([1] * 25 + [0] * 25)
y= label_binarize(y, classes=[0, 1])
y= np.hstack((1-y, y))
sss = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=7)
index_split = sss.split(X, y)
train_index = []
test_index = []
for train_ind, test_ind in index_split:
train_index.extend(train_ind)
test_index.extend(test_ind)
data_train = X[train_index]
out_train = y[train_index]
data_test = X[test_index]
out_test = y[test_index]
rf = RandomForestClassifier()
grids = {
'n_estimators': [10, 50, 100, 200],
'max_features': ['auto', 'sqrt', 'log2'],
'criterion': ['gini', 'entropy']
}
rf_grids_searched = GridSearchCV(rf,
grids,
scoring = "roc_auc",
n_jobs = -1,
refit=True,
cv = 5,
verbose=10)
rf_grids_searched.fit(data_train, out_train)
rf_best = rf_grids_searched.best_estimator_
y_prob=rf_best.predict_proba(data_test)
print(roc_auc_score(out_test, y_prob))
my result:
array([[0.5, 0.5],
[0.5, 0.5],
[0.7, 0.3],
[0.3, 0.7],
[0.7, 0.3],
[0.5, 0.5],
[0.1, 0.9],
[0.6, 0.4],
[0.6, 0.4],
[0.4, 0.6]]), array([[0.5, 0.5],
[0.5, 0.5],
[0.3, 0.7],
[0.7, 0.3],
[0.3, 0.7],
[0.5, 0.5],
[0.9, 0.1],
[0.4, 0.6],
[0.4, 0.6],
[0.6, 0.4]])]
expected results with probability of [0,1]:
array([[0.5, 0.5],
[0.5, 0.5],
[0.7, 0.3],
[0.3, 0.7],
[0.7, 0.3],
[0.5, 0.5],
[0.1, 0.9],
[0.6, 0.4],
[0.6, 0.4],
I also tried not to binarize y in the first place and then train gridsearch to get the following array y_prob. Later, I binarize y_test to match the dimension of y_prob and get the score. I was wondering if the sequence is correct?
code:
out_test1= label_binarize(out_test, classes=[0, 1])
out_test1= np.hstack((1-out_test1, out_test1))
print(roc_auc_score(out_test1, y_prob))
array([[0.6, 0.4],
[0.5, 0.5],
[0.6, 0.4],
[0.5, 0.5],
[0.7, 0.3],
[0.3, 0.7],
[0.8, 0.2],
[0.4, 0.6],
[0.8, 0.2],
[0.4, 0.6]])
The grid search's predict_proba method is just a dispatch to the best estimator's predict_proba. And from the docstring for RandomForestClassifier.predict_proba (emphasis added):
Returns
p : ndarray of shape (n_samples, n_classes), or a list of n_outputs
such arrays if n_outputs > 1. ...
Since you've specified two outputs (two columns in y), you get predicted probabilities for each of the two classes for each of the two targets.
I have a 2D tensor:
samples = torch.Tensor([
[0.1, 0.1], #-> group / class 1
[0.2, 0.2], #-> group / class 2
[0.4, 0.4], #-> group / class 2
[0.0, 0.0] #-> group / class 0
])
and a label for each sample corresponding to a class:
labels = torch.LongTensor([1, 2, 2, 0])
so len(samples) == len(labels). Now I want to calculate the mean for each class / label. Because there are 3 classes (0, 1 and 2) the final vector should have dimension [n_classes, samples.shape[1]] So the expected solution should be:
result == torch.Tensor([
[0.1, 0.1],
[0.3, 0.3], # -> mean of [0.2, 0.2] and [0.4, 0.4]
[0.0, 0.0]
])
Question: How can this be done in pure pytorch (i.e. no numpy so that I can autograd) and ideally without for loops?
All you need to do is form an mxn matrix (m=num classes, n=num samples) which will select the appropriate weights, and scale the mean appropriately. Then you can perform a matrix multiplication between your newly formed matrix and the samples matrix.
Given your labels, your matrix should be (each row is a class number, each class a sample number and its weight):
[[0.0000, 0.0000, 0.0000, 1.0000],
[1.0000, 0.0000, 0.0000, 0.0000],
[0.0000, 0.5000, 0.5000, 0.0000]]
Which you can form as follows:
M = torch.zeros(labels.max()+1, len(samples))
M[labels, torch.arange(len(samples)] = 1
M = torch.nn.functional.normalize(M, p=1, dim=1)
torch.mm(M, samples)
Output:
tensor([[0.0000, 0.0000],
[0.1000, 0.1000],
[0.3000, 0.3000]])
Note that the output means are correctly sorted in class order.
Why does M[labels, torch.arange(len(samples))] = 1 work?
This is performing a broadcast operation between the labels and the number of samples. Essentially, we are generating a 2D index for every element in labels: the first specifies which of the m classes it belongs to, and the second simply specifies its index position (from 1 to N). Another way would be top explicitly generate all the 2D indices:
twoD_indices = []
for count, label in enumerate(labels):
twoD_indices.append((label, count))
Reposting here an answer from #ptrblck_de in the Pytorch forums
labels = labels.view(labels.size(0), 1).expand(-1, samples.size(1))
unique_labels, labels_count = labels.unique(dim=0, return_counts=True)
res = torch.zeros_like(unique_labels, dtype=torch.float).scatter_add_(0, labels, samples)
res = res / labels_count.float().unsqueeze(1)
As previous solutions do not work for the case of sparse groups (e.g., not all the groups are in the data), I made one :)
def groupby_mean(value:torch.Tensor, labels:torch.LongTensor) -> (torch.Tensor, torch.LongTensor):
"""Group-wise average for (sparse) grouped tensors
Args:
value (torch.Tensor): values to average (# samples, latent dimension)
labels (torch.LongTensor): labels for embedding parameters (# samples,)
Returns:
result (torch.Tensor): (# unique labels, latent dimension)
new_labels (torch.LongTensor): (# unique labels,)
Examples:
>>> samples = torch.Tensor([
[0.15, 0.15, 0.15], #-> group / class 1
[0.2, 0.2, 0.2], #-> group / class 3
[0.4, 0.4, 0.4], #-> group / class 3
[0.0, 0.0, 0.0] #-> group / class 0
])
>>> labels = torch.LongTensor([1, 5, 5, 0])
>>> result, new_labels = groupby_mean(samples, labels)
>>> result
tensor([[0.0000, 0.0000, 0.0000],
[0.1500, 0.1500, 0.1500],
[0.3000, 0.3000, 0.3000]])
>>> new_labels
tensor([0, 1, 5])
"""
uniques = labels.unique().tolist()
labels = labels.tolist()
key_val = {key: val for key, val in zip(uniques, range(len(uniques)))}
val_key = {val: key for key, val in zip(uniques, range(len(uniques)))}
labels = torch.LongTensor(list(map(key_val.get, labels)))
labels = labels.view(labels.size(0), 1).expand(-1, value.size(1))
unique_labels, labels_count = labels.unique(dim=0, return_counts=True)
result = torch.zeros_like(unique_labels, dtype=torch.float).scatter_add_(0, labels, value)
result = result / labels_count.float().unsqueeze(1)
new_labels = torch.LongTensor(list(map(val_key.get, unique_labels[:, 0].tolist())))
return result, new_labels
For 3D Tensors:
For those, who are interested. I expanded #yhenon's answer to the case, where labels is a 2D tensor and samples is a 3D Tensor. This might be useful, if you want to execute this operation in batches (as I do). But it comes with a caveat (see at the end).
M = torch.zeros(labels.shape[0], labels.max()+1, labels.shape[1])
M[torch.arange(len(labels))[:,None], labels, torch.arange(labels.size(1))] = 1
M = torch.nn.functional.normalize(M, p=1, dim=-1)
result = M#samples
samples = torch.Tensor([[
[0.1, 0.1], #-> group / class 1
[0.2, 0.2], #-> group / class 2
[0.4, 0.4], #-> group / class 2
[0.0, 0.0] #-> group / class 0
], [
[0.5, 0.5], #-> group / class 0
[0.2, 0.2], #-> group / class 1
[0.4, 0.4], #-> group / class 2
[0.1, 0.1] #-> group / class 3
]])
labels = torch.LongTensor([[1, 2, 2, 0], [0, 1, 2, 3]])
Output:
>>> result
tensor([[[0.0000, 0.0000],
[0.1000, 0.1000],
[0.3000, 0.3000],
[0.0000, 0.0000]],
[[0.5000, 0.5000],
[0.2000, 0.2000],
[0.4000, 0.4000],
[0.1000, 0.1000]]])
Be careful: Now, result[0] has a length of 4 (instead of 3 in #yhenon's answer), because labels[1] contains a 3. The last row contains only 0s. If you don't except 0s in the last rows of your resulting tensor, you can use this code and deal with the 0s later.
This ended up being a different issue from the one in the question
I have a very simple Keras model that accepts time series data. I want to use a recurrent layer to predict a new sequence of the same dimensions, with a softmax on the end to provide a normalised result at each time step.
This is how my model looks.
x = GRU(256, return_sequences=True)(x)
x = TimeDistributed(Dense(3, activation='softmax'))(x)
Imagine the input is something like:
[
[0.25, 0.25, 0.5],
[0.3, 0.3, 0.4],
[0.2, 0.7, 0.1],
[0.1, 0.1, 0.8]
]
I'd expect the output to be the same shape and normalised at each step, like:
[
[0.15, 0.35, 0.5],
[0.35, 0.35, 0.3],
[0.1, 0.6, 0.3],
[0.1, 0.2, 0.7]
]
But what I actually get is a result where the sum of elements in each row are actually a quarter (or whatever fraction of the number of rows), not 1.
Put simply, I thought the idea of TimeDistributed was to apply the Dense layer to each time step, so effectively the Dense with softmax activation would be applied repeatedly to each timestep. But I seem to be getting a result that looks like it is normalized across all elements in the output matrix of time steps.
Since I seem to understand incorrectly, is there a way to get a Dense softmax result for each time step (normalized to 1 at each step) without having to predict each time step sequentially?
It appears that the issue wasn't with the handling of Softmax with a TimeDistributed wrapper, but an error in my predictions function, which was summing over the whole matrix rather than on a row by row basis.