I am trying to use a weighted mean squared error loss function for a regression task with an imbalanced dataset. Basically, I have a different weight assigned to each example, and I am using a weighted MSE loss function. Is there a way to sample the weight tensor using TensorDataset along with the input and output batch samples?
def weighted_mse_loss(inputs, targets, weights=None):
    loss = (inputs - targets) ** 2
    if weights is not None:
        loss *= weights.expand_as(loss)
    loss = torch.mean(loss)
    return loss

train_dataset = torch.utils.data.TensorDataset(x_train, y_train)
weights = torch.rand(len(train_dataset))

for x, y in train_loader:
    optimizer.zero_grad()
    out = model(x)
    loss = weighted_mse_loss(y, out, weights)
    loss.backward()
If you can get the weights before creating the train dataset:
train_dataset = TensorDataset(x_train, y_train, weights)
for x, y, w in train_dataset:
    ...
Otherwise:
train_dataset = TensorDataset(x_train, y_train)
for (x, y), w in zip(train_dataset, weights):
    ...
You can also use a DataLoader, but be careful about shuffling with the second method.
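For completeness, here is a minimal sketch of the first approach with a DataLoader; a toy linear model and random tensors stand in for the real x_train, y_train, and model, and the weighted_mse_loss is the one defined in the question. The point is that the per-example weights are shuffled and batched together with the inputs and targets:

import torch
from torch.utils.data import TensorDataset, DataLoader

# Toy stand-ins for the real x_train / y_train (hypothetical shapes)
x_train = torch.randn(100, 10)
y_train = torch.randn(100, 1)
weights = torch.rand(len(x_train), 1)   # one weight per example

train_dataset = TensorDataset(x_train, y_train, weights)
train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True)

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

def weighted_mse_loss(inputs, targets, weights=None):
    loss = (inputs - targets) ** 2
    if weights is not None:
        loss *= weights.expand_as(loss)
    return torch.mean(loss)

for x, y, w in train_loader:   # the weight batch stays aligned with x and y even when shuffling
    optimizer.zero_grad()
    out = model(x)
    loss = weighted_mse_loss(out, y, w)
    loss.backward()
    optimizer.step()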
I built a very simple structure
class classifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.classify = nn.Sequential(
            nn.Linear(166, 80),
            nn.Tanh(),
            nn.Linear(80, 40),
            nn.Tanh(),
            nn.Linear(40, 1),
            nn.Softmax()
        )

    def forward(self, x):
        pred = self.classify(x)
        return pred

model = classifier()
The loss function and optimizer are defined as
criteria = nn.BCEWithLogitsLoss()
iteration = 1000
learning_rate = 0.1
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
and here is the training and evaluation section
for epoch in range(iteration):
    model.train()
    y_pred = model(x_train)
    loss = criteria(y_pred, y_train)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    model.eval()
    with torch.inference_mode():
        test_pred = model(x_test)
        test_loss = criteria(test_pred, y_test)

    if epoch % 100 == 0:
        print(loss)
        print(test_loss)
I received the same loss values, and by debugging, I found that the weights were not being updated.
The problem is in the network architecture: you are using a Softmax layer on a single-valued output at the end. By the definition of the softmax function, for an output vector x and index i:

softmax(x_i) = e^{x_i} / sum_j e^{x_j}

Here you only have a single output value, so the softmax is taken over a vector of length one: the output of your neural network is always 1, irrespective of the inputs or the weights, and its gradient is therefore zero, so nothing gets updated. To fix this, remove the Softmax layer at the end. For a single-output binary classifier a Sigmoid would be the natural activation, and BCEWithLogitsLoss already applies it internally, so no final activation layer is needed at all.
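For clarity, a sketch of the corrected architecture (everything else from the question unchanged); there is no activation after the last Linear layer because BCEWithLogitsLoss applies the sigmoid internally:

import torch.nn as nn

class classifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.classify = nn.Sequential(
            nn.Linear(166, 80),
            nn.Tanh(),
            nn.Linear(80, 40),
            nn.Tanh(),
            nn.Linear(40, 1),   # raw logit; BCEWithLogitsLoss adds the sigmoid
        )

    def forward(self, x):
        return self.classify(x)

model = classifier()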
Another thing worth reviewing is the ordering here:
y_pred = model(x_train)
loss = criteria(y_pred,y_train)
optimizer.zero_grad()
loss.backward()
optimizer.step()
After the loss is calculated, you clear the gradients with optimizer.zero_grad(). Since this still happens before loss.backward(), it does not by itself stop the weights from updating, but the conventional and less error-prone ordering is to zero the gradients at the start of each iteration:
optimizer.zero_grad()
y_pred = model(x_train)
loss = criteria(y_pred,y_train)
loss.backward()
optimizer.step()
In the PyTorch quickstart tutorial, the code uses model.eval() during evaluation/test but does not call model.train() during training.
According to this and the source, some modules like BatchNorm and Dropout need to know whether the model is in train or evaluation mode. The model in the tutorial does not use any such module, so it runs to convergence anyway. Am I missing something, or does PyTorch's very first tutorial actually have a logical bug?
Training:
def train(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    for batch, (X, y) in enumerate(dataloader):
        X, y = X.to(device), y.to(device)

        # Compute prediction error
        pred = model(X)
        loss = loss_fn(pred, y)

        # Backpropagation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if batch % 100 == 0:
            loss, current = loss.item(), batch * len(X)
            print(f"loss: {loss:>7f} [{current:>5d}/{size:>5d}]")
You can see there is no model.train() in the above code.
Testing:
def test(dataloader, model):
    size = len(dataloader.dataset)
    model.eval()
    test_loss, correct = 0, 0
    with torch.no_grad():
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()
    test_loss /= size
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")
On the second line of the function body, there is a model.eval() call.
Training loop:
epochs = 5
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train(train_dataloader, model, loss_fn, optimizer)
    test(test_dataloader, model)
print("Done!")
This loop calls the train() and test() functions without any call to model.train(). So after the first call to test(), the model stays in "evaluation" mode for the rest of training. If we add a BatchNorm (or Dropout) layer to the model, we are well on our way to a hard-to-find bug.
Main question:
Is it good practice to always call model.train() during training and model.eval() during evaluation/test?
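For reference, the usual convention is to switch modes explicitly at the start of each phase; here is a minimal sketch of the train() function above with model.train() added (the rest is unchanged from the tutorial code):

def train(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    model.train()   # put BatchNorm/Dropout layers back into training mode
    for batch, (X, y) in enumerate(dataloader):
        X, y = X.to(device), y.to(device)

        pred = model(X)
        loss = loss_fn(pred, y)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()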
I have a 2-branch network where one branch outputs a regression value and the other branch outputs a classification label.
model = Model(inputs=inputs, outputs=[output1, output2])
model.compile(loss=[my_loss_reg, my_loss_class], optimizer='adam')
I want to implement a custom loss function (my_loss_reg()) for the regression branch, such that at the regression end a fraction of the classification loss is added, as follows:
def my_loss_reg(y_true, y_pred):
    loss_mse = K.mean(K.sum(K.square(y_true - y_pred)))
    # loss_reg = calculate_classification_loss()  # How to implement this?
    final_loss = some_function(loss_mse, loss_reg)  # Can calculate only if loss_reg is available
    return final_loss
The y_true and y_pred are the true and predicted regression values at the regression branch. To calculate the classification loss I need the true and predicted classification labels, which are not available in my_loss_reg().
My question is how to calculate or access the classification loss at the regression end of the network. Similarly, I want to access the regression loss at the classification end while calculating the custom loss function my_loss_class() for the classification branch.
How can I do that? Any code snippets would be helpful. I found this solution, but it is no longer valid with the latest versions of TensorFlow and Keras.
Everything you need is available natively in Keras: you can combine multiple losses automatically using the loss_weights parameter.
In the example below I reproduce this setup, combining an mse loss for the regression task with a sparse_categorical_crossentropy for the classification task:
import numpy as np
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

features, n_sample, n_class = 10, 200, 3

X = np.random.uniform(0, 1, (n_sample, features))
y = np.random.randint(0, n_class, n_sample)

inp = Input(shape=(features,))
x = Dense(64, activation='relu')(inp)
hidden = Dense(16, activation='relu')(x)
x = Dense(64, activation='relu')(hidden)
out_reg = Dense(features, name='out_reg')(x)  # regression output
x = Dense(32, activation='relu')(hidden)
out_class = Dense(n_class, activation='softmax', name='out_class')(x)  # classification output

model = Model(inp, [out_reg, out_class])
model.compile(optimizer='adam',
              loss={'out_reg': 'mse', 'out_class': 'sparse_categorical_crossentropy'},
              loss_weights={'out_reg': 1., 'out_class': 0.5})

model.fit(X, [X, y], epochs=10)
In this specific case, the total loss is 1*loss(out_reg) + 0.5*loss(out_class).
If you want to plug in your own custom losses, you simply do it this way:
def my_loss_reg(y_true, y_pred):
    return ...

def my_loss_class(y_true, y_pred):
    return ...

model.compile(optimizer='adam',
              loss={'out_reg': my_loss_reg, 'out_class': my_loss_class},
              loss_weights={'out_reg': 1., 'out_class': 0.5})

model.fit(X, [X, y], epochs=10)
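Just as an illustration (the bodies above are intentionally left open), the two custom losses could be written with the Keras backend to mirror the built-in behaviour used earlier; the exact formulas here are assumptions, not the only valid choice:

import tensorflow.keras.backend as K
from tensorflow.keras.losses import sparse_categorical_crossentropy

def my_loss_reg(y_true, y_pred):
    # plain mean squared error for the regression head (illustrative choice)
    return K.mean(K.square(y_true - y_pred))

def my_loss_class(y_true, y_pred):
    # standard sparse categorical crossentropy for the classification head
    return sparse_categorical_crossentropy(y_true, y_pred)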
I have defined the following custom model and training loop in Keras:
class CustomModel(keras.Model):
    def train_step(self, data):
        x, y = data
        with tf.GradientTape() as tape:
            y_pred = self(x, training=True)  # Forward pass
            loss = self.compiled_loss(y, y_pred, regularization_losses=self.losses)
        trainable_vars = self.trainable_variables
        gradients = tape.gradient(loss, trainable_vars)
        self.optimizer.apply_gradients(zip(gradients, trainable_vars))
        self.compiled_metrics.update_state(y, y_pred)
        return {m.name: m.result() for m in self.metrics}
And I am using the following code to train the model on a simple toy data set:
inputs = keras.layers.Input(shape=(1,))
hidden = keras.layers.Dense(1, activation='tanh')(inputs)
outputs = keras.layers.Dense(1)(hidden)
x = np.arange(0, 2*np.pi, 2*np.pi/100)
y = np.sin(x)
nnmodel = CustomModel(inputs, outputs)
nnmodel.compile(optimizer=keras.optimizers.SGD(lr=0.1), loss="mse", metrics=["mae"])
nnmodel.fit(x, y, batch_size=100, epochs=2000)
I want to be able to see the values of the gradients and the trainable_vars variable inside the train_step function at every training step, and I am not sure how to do this.
I tried setting a breakpoint inside the train_step function in my Python IDE, expecting it to stop there on every epoch of training after I call model.fit(), but it never did. I also tried to print the values to the log after each epoch, but I am not sure how to achieve that either.
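One common reason the breakpoint is never hit is that Keras traces train_step into a graph. Below is a sketch of two possible workarounds under that assumption: log values from inside the graph with tf.print, or pass run_eagerly=True to compile() so an IDE breakpoint and plain print() work:

import tensorflow as tf
from tensorflow import keras

class CustomModel(keras.Model):
    def train_step(self, data):
        x, y = data
        with tf.GradientTape() as tape:
            y_pred = self(x, training=True)
            loss = self.compiled_loss(y, y_pred, regularization_losses=self.losses)
        trainable_vars = self.trainable_variables
        gradients = tape.gradient(loss, trainable_vars)

        # tf.print executes even when train_step runs as a compiled graph
        tf.print("first weight tensor:", trainable_vars[0])
        tf.print("its gradient:", gradients[0])

        self.optimizer.apply_gradients(zip(gradients, trainable_vars))
        self.compiled_metrics.update_state(y, y_pred)
        return {m.name: m.result() for m in self.metrics}

# Alternatively, run the step eagerly so a breakpoint inside train_step is hit:
# nnmodel.compile(optimizer=keras.optimizers.SGD(learning_rate=0.1),
#                 loss="mse", metrics=["mae"], run_eagerly=True)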
I have a trained Tensorflow 2.0 model (from tf.keras.Sequential()) that takes an input layer with 26 columns (X) and produces an output layer with 1 column (Y).
In TF 1.x I was able to calculate the gradient of the output with respect to the input with the following:
model = load_model('mymodel.h5')
sess = K.get_session()
grad_func = tf.gradients(model.output, model.input)
gradients = sess.run(grad_func, feed_dict={model.input: X})[0]
In TF2 when I try to run tf.gradients(), I get the error:
RuntimeError: tf.gradients is not supported when eager execution is enabled. Use tf.GradientTape instead.
In the question In TensorFlow 2.0 with eager-execution, how to compute the gradients of a network output wrt a specific layer?, we see an answer on how to calculate gradients with respect to intermediate layers, but I don't see how to apply this to gradients with respect to the inputs. On the Tensorflow help for tf.GradientTape, there are examples with calculating gradients for simple functions, but not neural networks.
How can tf.GradientTape be used to calculate the gradient of the output with respect to the input?
This should work in TF2:
inp = tf.Variable(np.random.normal(size=(25, 120)), dtype=tf.float32)

with tf.GradientTape() as tape:
    preds = model(inp)

grads = tape.gradient(preds, inp)
Basically you do it the same way as TF1, but using GradientTape.
I hope this is what you're looking for. This will give the gradients of the output w.r.t. the inputs.
# Whatever input you like goes in as the initial_value
x = tf.Variable(np.random.normal(size=(25, 120)), dtype=tf.float32)
y_true = np.random.choice([0, 1], size=(25, 10))

print(model.output)
print(model.predict(x))

with tf.GradientTape() as tape:
    pred = model(x)  # call the model directly; model.predict() returns NumPy and is not traced by the tape
grads = tape.gradient(pred, x)
If the input is a plain tensor (for example, batches coming from a dataset) rather than a tf.Variable, we should use tape.watch():
for (x, y) in test_dataset:
    with tf.GradientTape() as tape:
        tape.watch(x)
        pred = model(x)
    grads = tape.gradient(pred, x)
Here, however, grads only contains the gradients with respect to the inputs, not the model weights.
The following approach is better if that is what you need: use the model to compute predictions and a loss, then use that loss to calculate the gradients of all trainable variables:
with tf.GradientTape() as tape:
    predictions = model(x, training=True)
    loss = loss_function(y, predictions)
grads = tape.gradient(loss, model.trainable_variables)
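Tying this back to the original question (a Sequential model with 26 input columns and 1 output), here is a minimal self-contained sketch; the model below is a hypothetical stand-in for the trained one:

import numpy as np
import tensorflow as tf

# Hypothetical stand-in for the trained model (26 inputs -> 1 output)
model = tf.keras.Sequential([
    tf.keras.Input(shape=(26,)),
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(1),
])

X = tf.convert_to_tensor(np.random.normal(size=(5, 26)), dtype=tf.float32)

with tf.GradientTape() as tape:
    tape.watch(X)            # X is a constant tensor, so it must be watched explicitly
    Y = model(X)

gradients = tape.gradient(Y, X)   # dY/dX, same shape as X
print(gradients.shape)            # (5, 26)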