I'm using PyTorch Lightning, and at the end of each epoch I create a confusion matrix from torchmetrics.ConfusionMatrix (see code below). I would like to log this to Wandb, but the Wandb confusion matrix logger only accepts y_targets and y_predictions. Does anyone know how to extract y_targets and y_predictions from an updated confusion matrix, or alternatively how to give Wandb my updated confusion matrix in a form it can process into, e.g., a heatmap?
class ClassificationTask(pl.LightningModule):
    def __init__(self, model, lr=1e-4, augmentor=augmentor):
        super().__init__()
        self.model = model
        self.lr = lr
        self.save_hyperparameters()  # not used at the moment, good to have there in the future
        self.augmentor = augmentor
        self.matrix = torchmetrics.ConfusionMatrix(num_classes=9)
        self.y_trues = []
        self.y_preds = []

    def training_step(self, batch, batch_idx):
        x, y = batch
        x = self.augmentor(x)  # .to('cuda')
        y_pred = self.model(x)
        loss = F.cross_entropy(y_pred, y)  # weight=class_weights_tensor
        acc = accuracy(y_pred, y)
        metrics = {"train_acc": acc, "train_loss": loss}
        self.log_dict(metrics)
        return loss

    def validation_step(self, batch, batch_idx):
        loss, acc = self._shared_eval_step(batch, batch_idx)
        metrics = {"val_acc": acc, "val_loss": loss}
        self.log_dict(metrics)
        return metrics

    def _shared_eval_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self.model(x)
        loss = F.cross_entropy(y_hat, y)
        acc = accuracy(y_hat, y)
        self.matrix.update(y_hat, y)
        return loss, acc

    def validation_epoch_end(self, outputs):
        confusion_matrix = self.matrix.compute()
        wandb.log({"my_conf_mat_id": confusion_matrix})

    def configure_optimizers(self):
        return torch.optim.Adam(self.model.parameters(), lr=self.lr)
I'm actually working on the same issue at the moment. I found this great feature request for PyTorch Lightning, which may be of help. I think a possible solution is to use the torchmetrics confusion matrix, incorporate it into your train/val/test steps, and log it from there.
https://github.com/Lightning-AI/metrics/issues/880
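Building on that, here is a minimal sketch (my own, not taken from the linked issue) of one way to do it. The aggregated ConfusionMatrix state cannot be turned back into per-sample labels, so keep the raw predictions and targets around (you already have self.y_preds and self.y_trues) and hand them to wandb.plot.confusion_matrix at the end of the validation epoch. This reuses the names from your snippet (pl, F, accuracy, self.model, self.matrix); only the two methods below change:

import torch
import wandb

class ClassificationTask(pl.LightningModule):
    # __init__, training_step, configure_optimizers as in the question

    def _shared_eval_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self.model(x)
        loss = F.cross_entropy(y_hat, y)
        acc = accuracy(y_hat, y)
        self.matrix.update(y_hat, y)
        # also keep the raw labels so wandb can build its own plot
        self.y_preds.append(y_hat.argmax(dim=1).cpu())
        self.y_trues.append(y.cpu())
        return loss, acc

    def validation_epoch_end(self, outputs):
        preds = torch.cat(self.y_preds).numpy()
        trues = torch.cat(self.y_trues).numpy()
        wandb.log({"conf_mat": wandb.plot.confusion_matrix(
            probs=None, y_true=trues, preds=preds,
            class_names=[str(i) for i in range(9)])})
        self.y_preds.clear()
        self.y_trues.clear()

Alternatively, you could turn self.matrix.compute() into a figure yourself (e.g. a seaborn heatmap) and log it with wandb.Image.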
Related
I'm a beginner and just trying to get into PyTorch and neural networks, so I created a small dataset. It consists of two input variables and one output variable (basically the output is a linear function of the inputs with some noise). Now I want to set up a neural network and train it on the dataset. I followed some tutorials and wrote this code:
df = pd.read_csv(r" ... .csv")
X = df[["x", "y"]]
y = df[["goal"]]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.2, random_state=42)
X_train, y_train = np.array(X_train), np.array(y_train)
X_test, y_test = np.array(X_test), np.array(y_test)
# Convert data to torch tensors
class Data(Dataset):
    def __init__(self, X, y):
        self.X = torch.from_numpy(X.astype(np.float32))
        self.y = torch.from_numpy(y.astype(np.float32))
        self.len = self.X.shape[0]

    def __getitem__(self, index):
        return self.X[index], self.y[index]

    def __len__(self):
        return self.len
batch_size = 32
# Instantiate training and test data
train_data = Data(X_train, y_train)
train_dataloader = DataLoader(dataset=train_data, batch_size=batch_size, shuffle=True)
test_data = Data(X_test, y_test)
test_dataloader = DataLoader(dataset=test_data, batch_size=batch_size, shuffle=True)
input_dim = 2
hidden_dim_1 = 2
output_dim = 1
class NeuralNetwork(nn.Module):
    def __init__(self, input_dim, hidden_dim_1, output_dim):
        super(NeuralNetwork, self).__init__()
        self.layer_1 = nn.Linear(input_dim, hidden_dim_1)
        self.layer_out = nn.Linear(hidden_dim_1, output_dim)

    def forward(self, x):
        x = F.relu(self.layer_1(x))
        x = self.layer_out(x)
        return x
model = NeuralNetwork(input_dim, hidden_dim_1, output_dim)
optimizer = optim.SGD(model.parameters(), lr=0.01)
def train(epoch):
    model.train()
    for batch_id, (data, target) in enumerate(train_data):
        data = Variable(data)
        target = Variable(target)
        target = target.to(dtype=torch.float32)
        optimizer.zero_grad()
        out = model(data)
        criterion = F.mse_loss
        loss = criterion(out, target)
        print(loss.detach().numpy())
        loss.backward()
        optimizer.step()

for epoch in range(1, 30):
    train(epoch)
My problem is that the printed loss is extremely high (around 1e8) and does not decrease.
I tried changing some settings of the network, the batch size, the learning rate, and other optimizers and loss functions, but none of the changes really helped, and my research didn't bring any success either. It seems to me that there is a more basic mistake in my code. What did I do wrong?
Thanks in advance!
Your code seems fine to me (although I might have missed a bug). It is in general never safe to say which networks will be successful and which won't, but here are some suggestions if you can't see any progress:
Check the input data. Maybe try plotting it to make sure that it actually contains what you think it does. You may print out the inputs, predicted and expected values (or better, view them in a debugger) to see what's wrong.
Normalize the input data. If there are large values in the input or output data, losses may explode. Ensure that most of the values are roughly between -1 and 1 (see the sketch after this list).
Lower the learning rate. 0.01 is generally a good starting point, but who knows.
Try training for more epochs. Depending on the noise in your data, this could be necessary.
Try adding more neurons. A linear function should in theory be fine with not that many, but maybe the noise is too 'complex'.
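For the normalization point above, here is a minimal sketch using scikit-learn's StandardScaler (my choice; any equivalent rescaling works), applied to the numpy arrays from the question before they are wrapped in Data:

from sklearn.preprocessing import StandardScaler

x_scaler = StandardScaler()
y_scaler = StandardScaler()

# Fit the scalers on the training split only, then reuse them on the test split,
# so no information from the test set leaks into training.
X_train = x_scaler.fit_transform(X_train)
X_test = x_scaler.transform(X_test)
y_train = y_scaler.fit_transform(y_train)
y_test = y_scaler.transform(y_test)

Remember to apply y_scaler.inverse_transform to predictions when you want them back on the original scale.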
I'm trying to get my toy network to learn a sine wave.
I output (via tanh) a number between -1 and 1, and I want the network to minimise the following loss, where self(x) are the predictions.
loss = -torch.mean(self(x)*y)
This should be equivalent to trading a stock with a sinusoidal price, where self(x) is our desired position, and y are the returns of the next time step.
The issue I'm having is that the network doesn't learn anything. It does work if I change the loss function to be torch.mean((self(x)-y)**2) (MSE), but this isn't what I want. I'm trying to focus the network on 'making a profit', not making a prediction.
I think the issue may be related to the convexity of the loss function, but I'm not sure, and I'm not certain how to proceed. I've experimented with differing learning rates, but alas nothing works.
What should I be thinking about?
Actual code:
%load_ext tensorboard
import matplotlib.pyplot as plt; plt.rcParams["figure.figsize"] = (30,8)
import torch;from torch.utils.data import Dataset, DataLoader
import torch.nn.functional as F;import pytorch_lightning as pl
from torch import nn, tensor
def piecewise(x): return 2*(x>0)-1
class TsDs(torch.utils.data.Dataset):
    def __init__(self, s, l=5): super().__init__();self.l,self.s=l,s
    def __len__(self): return self.s.shape[0] - 1 - self.l
    def __getitem__(self, i): return self.s[i:i+self.l], torch.log(self.s[i+self.l+1]/self.s[i+self.l])
    def plt(self): plt.plot(self.s)

class TsDm(pl.LightningDataModule):
    def __init__(self, length=5000, batch_size=1000): super().__init__();self.batch_size=batch_size;self.s = torch.sin(torch.arange(length)*0.2) + 5 + 0*torch.rand(length)
    def train_dataloader(self): return DataLoader(TsDs(self.s[:3999]), batch_size=self.batch_size, shuffle=True)
    def val_dataloader(self): return DataLoader(TsDs(self.s[4000:]), batch_size=self.batch_size)
dm = TsDm()
class MyModel(pl.LightningModule):
    def __init__(self, learning_rate=0.01):
        super().__init__();self.learning_rate = learning_rate
        self.conv1 = nn.Conv1d(1,5,2)
        self.lin1 = nn.Linear(20,3);self.lin2 = nn.Linear(3,1)
        # self.network = nn.Sequential(nn.Conv1d(1,5,2),nn.ReLU(),nn.Linear(20,3),nn.ReLU(),nn.Linear(3,1), nn.Tanh())
        # self.network = nn.Sequential(nn.Linear(5,5),nn.ReLU(),nn.Linear(5,3),nn.ReLU(),nn.Linear(3,1), nn.Tanh())

    def forward(self, x):
        out = x.unsqueeze(1)
        out = self.conv1(out)
        out = out.reshape(-1,20)
        out = nn.ReLU()(out)
        out = self.lin1(out)
        out = nn.ReLU()(out)
        out = self.lin2(out)
        return nn.Tanh()(out)

    def step(self, batch, batch_idx, stage):
        x, y = batch
        loss = -torch.mean(self(x)*y)
        # loss = torch.mean((self(x)-y)**2)
        print(loss)
        self.log("loss", loss, prog_bar=True)
        return loss

    def training_step(self, batch, batch_idx): return self.step(batch, batch_idx, "train")
    def validation_step(self, batch, batch_idx): return self.step(batch, batch_idx, "val")

    def configure_optimizers(self): return torch.optim.SGD(self.parameters(), lr=self.learning_rate)
#logger = pl.loggers.TensorBoardLogger(save_dir="/content/")
mm = MyModel(0.1);trainer = pl.Trainer(max_epochs=10)
# trainer.tune(mm, dm)
trainer.fit(mm, datamodule=dm)
If I understand you correctly, I think that you were trying to maximize the unnormalized correlation between the network's prediction, self(x), and the target value y.
As you mention, the problem is the convexity of the loss w.r.t. the model weights. One way to see the problem is to consider that the model is a simple linear predictor w'*x, where w is the model weights, w' its transpose, and x the input feature vector (assume a scalar prediction for now). Then, if you look at the derivative of the loss w.r.t. the weight vector (i.e., the gradient), you'll find that it no longer depends on w!
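To spell that out for the linear case (my own one-line derivation, in the same notation): with

loss = -mean( (w'*x) * y )

the gradient with respect to w is

d(loss)/dw = -mean( y * x ),

which contains no w at all; gradient descent just keeps pushing the weights along a fixed direction instead of converging to a minimizer.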
One way to fix this is to change the loss to
loss = -torch.mean(torch.square(self(x)*y))
or
loss = -torch.mean(torch.abs(self(x)*y))
You will have another big problem, however: these loss functions encourage unbounded growth of the model weights. In the linear case, one solves this by a Lagrangian relaxation of a hard constraint on, for example, the norm of the model weight vector. I'm not sure how this would be done with neural networks, as each layer would need its own Lagrangian parameter...
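For completeness, here is a minimal sketch (my own assumption, not a worked-out Lagrangian treatment) of the usual soft version of that constraint: add an L2 penalty on the weights to the modified loss, or equivalently pass weight_decay to the optimizer.

import torch

def profit_loss_with_penalty(model, x, y, lam=1e-3):
    # lam is a hypothetical penalty strength standing in for the Lagrangian multiplier
    reward = torch.mean(torch.abs(model(x) * y))           # the |f(x)*y| objective from above
    l2 = sum(p.pow(2).sum() for p in model.parameters())   # soft norm constraint on the weights
    return -reward + lam * l2

# or simply: torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-3)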
I wrote a short snippet to train a classification model, and learn the learning rate of its optimization algorithm. In my example I tried to update weights of a network in an inner optimization loop and to learn the learning rate of the weight updates using an outer optimization loop (meta-optimization). I'm getting the error:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [3, 10]], which is output 0 of AsStridedBackward0, is at version 12; expected version 2 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
My code snippet is as follows (NOTE: I'm using _stateless, an experimental functional API for nn. You need to run with the nightly build of PyTorch.)
import torch
from torch import nn, optim
from torch.utils.data import Dataset, DataLoader
from torch.nn.utils import _stateless
class MyDataset(Dataset):
    def __init__(self, N):
        self.N = N
        self.x = torch.rand(self.N, 10)
        self.y = torch.randint(0, 3, (self.N,))

    def __len__(self):
        return self.N

    def __getitem__(self, idx):
        return self.x[idx], self.y[idx]

class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.fc1 = nn.Linear(10, 10)
        self.fc2 = nn.Linear(10, 3)
        self.relu = nn.ReLU()
        self.alpha = nn.Parameter(torch.randn(1))
        self.beta = nn.Parameter(torch.randn(1))

    def forward(self, x):
        y = self.relu(self.fc1(x))
        return self.fc2(y)
epochs = 20
N = 100
dataset = DataLoader(dataset=MyDataset(N), batch_size=10)
model = MyModel()
loss_func = nn.CrossEntropyLoss()
optim = optim.Adam([model.alpha], lr=1e-3)
params = dict(model.named_parameters())
for i in range(epochs):
    model.train()
    train_loss = 0
    for batch_idx, (x, y) in enumerate(dataset):
        logits = _stateless.functional_call(model, params, x)  # predict
        loss_inner = loss_func(logits, y)  # loss
        optim.zero_grad()  # reset grad
        loss_inner.backward(create_graph=True, inputs=params.values())  # compute grad
        train_loss += loss_inner.item()  # store loss
        for k, p in params.items():
            if k is not 'alpha' and k is not 'beta':
                p.update = - model.alpha * p.grad
                params[k] = p + p.update  # update weight
    print('Train Epoch: {}\tLoss: {:.6f}'.format(i, train_loss / N))
    logits = _stateless.functional_call(model, params, x)  # predict
    loss_meta = loss_func(logits, y)
    loss_meta.backward()
    loss_meta.step()
From the error message, I understand that the issue comes from weight update for the weights of the second layer of the network, which points to an error in my inner loop optimization. Any suggestions would be appreciated.
Check this link: save the params once per epoch and clone them inside the inner batch loop:
https://discuss.pytorch.org/t/issue-using-parameters-internal-method/134549/11
for i in range(epochs):
    model.train()
    train_loss = 0
    params = dict(model.named_parameters())  # add this
    for batch_idx, (x, y) in enumerate(dataset):
        params = {k: v.clone() for k, v in params.items()}  # add this
        logits = _stateless.functional_call(model, params, x)  # predict
        loss_inner = loss_func(logits, y)
        ..................
You should be updating params[k].data instead of params[k]
(Deleted the example to avoid distraction)
Let me enter into a kind of fundamental discussion (not an answer to your question).
If I understand correctly, you want to compute loss(f(w[i], x)), then compute w[i+1,j] = w[i,j] + g(v[j], w[i,j].grad(w.r.t. loss)), and in the end compute v[j+1] = v[j] + v[j].grad(w.r.t. loss).
The gradient of v[j] is computed via backward propagation, as a function of grad w[i,j]. So what you are trying to do is choose a v[j] that results in a good w[i,j]. I would ask: why bother about v[j] if you can control w[i,j] directly? And that's what the standard approach does.
I am looking at the example provided on PyTorch-Lightning official documentation https://pytorch-lightning.readthedocs.io/en/0.9.0/lightning-module.html.
Here the loss and metric are calculated on each individual batch. But when logging, one is usually not interested in the accuracy of a particular batch, which can be rather small and not representative, but rather in the average over the whole epoch. Do I understand correctly that there is some code that performs this averaging over all batches passed through during the epoch?
import pytorch_lightning as pl
from pytorch_lightning.metrics import functional as FM
class ClassificationTask(pl.LightningModule):
    def __init__(self, model):
        super().__init__()
        self.model = model

    def training_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self.model(x)
        loss = F.cross_entropy(y_hat, y)
        return pl.TrainResult(loss)

    def validation_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self.model(x)
        loss = F.cross_entropy(y_hat, y)
        acc = FM.accuracy(y_hat, y)
        result = pl.EvalResult(checkpoint_on=loss)
        result.log_dict({'val_acc': acc, 'val_loss': loss})
        return result

    def test_step(self, batch, batch_idx):
        result = self.validation_step(batch, batch_idx)
        result.rename_keys({'val_acc': 'test_acc', 'val_loss': 'test_loss'})
        return result

    def configure_optimizers(self):
        return torch.optim.Adam(self.model.parameters(), lr=0.02)
If you want to average metrics over the epoch, you'll need to tell the LightningModule you've subclassed to do so. There are a few different ways to do this such as:
Call result.log('train_loss', loss, on_step=True, on_epoch=True, prog_bar=True, logger=True) as shown in the docs with on_epoch=True so that the training loss is averaged across the epoch. I.e.:
def training_step(self, batch, batch_idx):
    x, y = batch
    y_hat = self.model(x)
    loss = F.cross_entropy(y_hat, y)
    result = pl.TrainResult(loss)
    result.log('train_loss', loss, on_step=True, on_epoch=True, prog_bar=True, logger=True)
    return result
Alternatively, you can call the log method on the LightningModule itself: self.log("train_loss", loss, on_epoch=True, sync_dist=True) (Optionally passing sync_dist=True to reduce across accelerators).
You'll want to do something similar in validation_step to get aggregated val-set metrics (a sketch follows below), or implement the aggregation yourself in the validation_epoch_end method.
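For example, a minimal sketch of the validation side with the current self.log API (reusing FM and self.model from the question; the metric names are just illustrative):

def validation_step(self, batch, batch_idx):
    x, y = batch
    y_hat = self.model(x)
    loss = F.cross_entropy(y_hat, y)
    acc = FM.accuracy(y_hat, y)
    # on_epoch=True tells Lightning to accumulate the per-batch values and
    # log their (batch-size-weighted) average once at the end of the epoch
    self.log_dict({"val_loss": loss, "val_acc": acc},
                  on_step=False, on_epoch=True, prog_bar=True)
    return loss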
I'm wondering if I can do calibration in xgboost. To be more specific, does xgboost come with an existing calibration implementation like in scikit-learn, or are there some ways to put the model from xgboost into a scikit-learn's CalibratedClassifierCV?
As far as I know in sklearn this is the common procedure:
# Train random forest classifier, calibrate on validation data and evaluate
# on test data
clf = RandomForestClassifier(n_estimators=25)
clf.fit(X_train, y_train)
clf_probs = clf.predict_proba(X_test)
sig_clf = CalibratedClassifierCV(clf, method="sigmoid", cv="prefit")
sig_clf.fit(X_valid, y_valid)
sig_clf_probs = sig_clf.predict_proba(X_test)
sig_score = log_loss(y_test, sig_clf_probs)
print "Calibrated score is ",sig_score
If I put an xgboost tree model into the CalibratedClassifierCV an error will be thrown (of course):
RuntimeError: classifier has no decision_function or predict_proba method.
Is there a way to integrate the excellent calibration module of scikit-learn with xgboost?
Appreciate your insightful ideas!
Answering my own question: an xgboost GBT can be integrated with scikit-learn by writing a wrapper class like the one below.
import numpy as np
import xgboost as xgb

class XGBoostClassifier():
    def __init__(self, num_boost_round=10, **params):
        self.clf = None
        self.num_boost_round = num_boost_round
        self.params = params
        self.params.update({'objective': 'multi:softprob'})

    def fit(self, X, y, num_boost_round=None):
        num_boost_round = num_boost_round or self.num_boost_round
        self.label2num = dict((label, i) for i, label in enumerate(sorted(set(y))))
        dtrain = xgb.DMatrix(X, label=[self.label2num[label] for label in y])
        self.clf = xgb.train(params=self.params, dtrain=dtrain, num_boost_round=num_boost_round)

    def predict(self, X):
        num2label = dict((i, label) for label, i in self.label2num.items())
        Y = self.predict_proba(X)
        y = np.argmax(Y, axis=1)
        return np.array([num2label[i] for i in y])

    def predict_proba(self, X):
        dtest = xgb.DMatrix(X)
        return self.clf.predict(dtest)

    def score(self, X, y):
        # assumes a logloss(y_true, y_proba) helper is defined elsewhere
        Y = self.predict_proba(X)
        return 1 / logloss(y, Y)

    def get_params(self, deep=True):
        return self.params

    def set_params(self, **params):
        if 'num_boost_round' in params:
            self.num_boost_round = params.pop('num_boost_round')
        if 'objective' in params:
            del params['objective']
        self.params.update(params)
        return self
See full example here.
Please don't hesitate to provide a smarter way of doing this!
A note from the hellscape that is July 2020:
You no longer need a wrapper class. The predict_proba method is built into the xgboost scikit-learn Python API. I'm not sure when it was added, but it is there from v1.0.0 on for certain.
Note: this is of course only true for classes that are supposed to have a predict_proba method, e.g. XGBClassifier has it; XGBRegressor does not.
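So with a reasonably recent xgboost, something along these lines should work directly (a sketch using the same split names as the question; newer scikit-learn versions may warn that cv="prefit" is deprecated):

from xgboost import XGBClassifier
from sklearn.calibration import CalibratedClassifierCV
from sklearn.metrics import log_loss

clf = XGBClassifier(n_estimators=100)
clf.fit(X_train, y_train)

# Calibrate the already-fitted model on the held-out validation split
sig_clf = CalibratedClassifierCV(clf, method="sigmoid", cv="prefit")
sig_clf.fit(X_valid, y_valid)

sig_clf_probs = sig_clf.predict_proba(X_test)
print("Calibrated score is", log_loss(y_test, sig_clf_probs))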