Implementing Batch for Image Segmentation - pytorch

I wrote a Python 3.5 script for doing street segmentation. Since I'm new in Image Segementation, I did not use predefined dataloaders from pytorch, instead I wrote them by my self (for better understanding). Until now I only use a batch size of 1. Now I want to generalize this for arbitrary batch sizes.
This is a snippet of my Dataloader:
def augment_data(batch_size):
# [...] defining some paths and data transformation (including ToTensor() function)
# The images are named by numbers (Frame numbers), this allows me to find the correct label image for a given input image.
all_input_image_paths = {int(elem.split('\\')[-1].split('.')[0]) : elem for idx, elem in enumerate(glob.glob(input_dir + "*"))}
all_label_image_paths = {int(elem.split('\\')[-1].split('.')[0]) : elem for idx, elem in enumerate(glob.glob(label_dir + "*"))}
dataloader = {"train":[], "val":[]}
all_samples = []
img_counter = 0
for key, value in all_input_image_paths.items():
input_img = Image.open(all_input_image_paths[key])
label_img = Image.open(all_label_image_paths[key])
# Here I use my own augmentation function which crops the input and label on the same position and do other things.
# We get a list of new augmented data
augmented_images = generate_augmented_images(input_img, label_img)
for elem in augmented_images:
input_as_tensor = data_transforms['norm'](elem[0])
label_as_tensor = data_transforms['val'](elem[1])
input_as_tensor.unsqueeze_(0)
label_as_tensor.unsqueeze_(0)
is_training_data = random.uniform(0.0, 1.0)
if is_training_data <= 0.7:
dataloader["train"].append([input_as_tensor, label_as_tensor])
else:
dataloader["val"].append([input_as_tensor, label_as_tensor])
img_counter += 1
shuffle(dataloader["train"])
shuffle(dataloader["val"])
dataloader_batched = {"train":[], "val":[]}
# Here I group my data to a given batch size
for elem in dataloader["train"]:
batch = []
for i in range(batch_size):
batch.append(elem)
dataloader_batched["train"].append(batch)
for elem in dataloader["val"]:
batch = []
for i in range(batch_size):
batch.append(elem)
dataloader_batched["val"].append(batch)
return dataloader_batched
This is a snippet of my training method with batch size 1:
while epoch <= num_epochs:
# Each epoch has a training and validation phase
for phase in ['train', 'val']:
if phase == 'train':
scheduler.step(3)
model.train() # Set model to training mode
else:
model.eval() # Set model to evaluate mode
running_loss = 0.0
counter = 0
# Iterate over data.
for inputs, labels in dataloaders[phase]:
counter += 1
max_num = len(dataloaders[phase])
inputs = inputs.to(device)
labels = labels.to(device)
# zero the parameter gradients
optimizer.zero_grad()
# forward
# track history if only in train
with torch.set_grad_enabled(phase == 'train'):
outputs = model(inputs)
loss = criterion(outputs, labels)
# backward + optimize only if in training phase
if phase == 'train':
loss.backward()
optimizer.step()
# statistics
running_loss += loss.item() * inputs.size(0)
epoch_loss = running_loss / dataset_sizes[phase]
If I execute this, I get of course the error:
for inputs, labels in dataloaders[phase]:
ValueError: not enough values to unpack (expected 2, got 1)
I understand why, because now I have a list of images and not only an input and label image as before. So guessed I need a second for loop which iterates over these batches. So I tried this:
# Iterate over data.
for elem in dataloaders[phase]:
for inputs, labels in elem:
counter += 1
max_num = len(dataloaders[phase])
inputs = inputs.to(device)
labels = labels.to(device)
# zero the parameter gradients
optimizer.zero_grad()
# forward
# track history if only in train
with torch.set_grad_enabled(phase == 'train'):
outputs = model(inputs)
# _, preds = torch.max(outputs, 1)
loss = criterion(outputs, labels)
# backward + optimize only if in training phase
if phase == 'train':
loss.backward()
optimizer.step()
But for me it looks like the optimization step (back-prop) is only applied on the last image of the batch. Is that true? And if so, how can I fix this? I guess if I indent the with-Block, then I get again a batch size 1 optimization.
Thanks in advance

But for me it looks like the optimization step (back-prop) is only applied on the last image of the batch.
It should not apply only based on the last image. It should apply based on the batch size.
If you set bs=2 and it should apply to the batch of two images.
Optimization step actually will update the params of your network. Backprop is a fancy name for PyTorch autograd system that computes the first order gradients.

Related

How to retrieve attention weight alignment for tokens using transformer (BERT) model?

I am working on textclassification with transformer models (PyTorch, Huggingface, running on GPU).
I have already my model and my training loop and it works fine but to better understand wrong predictions of the model, I want to "dig a little deeper" and look at the attention weights the model gave the tokens of the misclassified predictions (for evaluation during training and later for predictions on the test set).
I tried already so many things and I already found a way the model outputs attention weights (will attach an example), BUT a) it outputs too many attention weights for each misclassified example (can it be because it gives the weights for several layers?) and b) the attention weights are not connected to the tokens. I want it in a more interpretable way, like coloring the tokens in a darker color the more weight is on that token.
This is my code so far (irrelevant snippets left out):
model = AutoModelForSequenceClassification.from_pretrained(checkpoint,
num_labels=len(label_dict),
output_attentions=True,
output_hidden_states=True)
def get_wrong_predictions(predictions, true_vals):
wrong_predictions = []
for i in range(len(true_vals)):
if np.argmax(predictions[i]) != true_vals[i]:
wrong_predictions.append((true_vals[i], np.argmax(predictions[i])))
return wrong_predictions
seed_val = 17
random.seed(seed_val)
np.random.seed(seed_val)
torch.manual_seed(seed_val)
torch.cuda.manual_seed_all(seed_val)
def evaluate(dataloader_val):
model.eval()
loss_val_total = 0
predictions, true_vals = [], []
attentions = []
for batch in dataloader_val:
batch = tuple(b.to(device) for b in batch)
inputs = {'input_ids': batch[0],
'attention_mask': batch[1],
'labels': batch[2],
}
with torch.no_grad():
outputs = model(**inputs)
loss = outputs[0]
logits = outputs[1]
attention_scores = outputs.attentions[-1].detach().cpu().numpy() # extract the attention scores from the last layer
loss_val_total += loss.item()
logits = logits.detach().cpu().numpy()
label_ids = inputs['labels'].cpu().numpy()
predictions.append(logits)
true_vals.append(label_ids)
attentions.append(attention_scores)
loss_val_avg = loss_val_total/len(dataloader_val)
predictions = np.concatenate(predictions, axis=0)
true_vals = np.concatenate(true_vals, axis=0)
attentions = np.concatenate(attentions, axis=0)
preds = np.argmax(predictions, axis=1)
misclassified = np.where(preds != true_vals)[0]
texts = df["description"].tolist()
global misclassified_examples
misclassified_examples = []
#misclassified = []
for idx in misclassified[:min(len(attention_scores), len(misclassified))]:
text = texts[idx]
true_label = true_vals[idx]
pred_label = preds[idx]
attention_weights = attentions[idx] # get the attention weights for the misclassified examples
misclassified_examples.append({
'text': text,
'true_label': true_label,
'pred_label': pred_label,
'attention_weights' : attention_weights
})
return loss_val_avg, predictions, true_vals
train_losses = [] #to plot later
val_losses = []
overall_accuracy = []
for epoch in tqdm(range(1, epochs+1)):
model.train()
loss_train_total = 0
progress_bar = tqdm(dataloader_train, desc='Epoch {:1d}'.format(epoch), leave=False, disable=False)
for batch in progress_bar:
model.zero_grad()
batch = tuple(b.to(device) for b in batch)
inputs = {'input_ids': batch[0],
'attention_mask': batch[1],
'labels': batch[2],
}
outputs = model(**inputs)
loss = outputs[0]
loss_train_total += loss.item()
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
optimizer.step()
scheduler.step()
progress_bar.set_postfix({'training_loss': '{:.3f}'.format(loss.item()/len(batch))})
loss_train_avg = loss_train_total/len(dataloader_train) #to plot later
train_losses.append(loss_train_avg) # to plot later
tqdm.write(f'\nEpoch {epoch}')
loss_train_avg = loss_train_total/len(dataloader_train)
tqdm.write(f'Training loss: {loss_train_avg}')
val_loss, predictions, true_vals = evaluate(dataloader_validation)
val_f1 = f1_score_func(predictions, true_vals)
tqdm.write(f'Validation loss: {val_loss}')
tqdm.write(f'F1 Score (Weighted): {val_f1}')
val_overallaccuracy = overallaccuracy(true_vals, predictions)
tqdm.write(f'Overall Accuracy: {val_overallaccuracy}')
val_mcc = mcc_score_func(true_vals, predictions)
tqdm.write(f'MCC: {val_mcc}')
val_acc = accuracy_per_class(predictions, true_vals)
tqdm.write(f'Accuracy per class: {val_acc}')
val_losses.append(val_loss)
overall_accuracy.append(val_overallaccuracy)
The image shows exemplarily what kind of output I get, but that is "too much" attention as I expected one number for one token and then of course some way to relate that number of attention weight back to that token.
[enter image description here](https://i.stack.imgur.com/ICmvp.png)
Does anyone know how I could adjust my code to include attention weight alignment for the tokens? I would be VERY thankful!!

Precision,recall, F1 score with Sklearn on Pytorch

I've been looking through samples but am unable to understand how to integrate the precision, recall and f1 metrics for my model. My code is as follows:
for epoch in range(num_epochs):
#Calculate Accuracy (stack tutorial no n_total)
n_correct = 0
n_total = 0
for i, (words, labels) in enumerate(train_loader):
words = words.to(device)
labels = labels.to(dtype=torch.long).to(device)
# Forward pass
outputs = model(words)
loss = criterion(outputs, labels)
# Backward and optimize
optimizer.zero_grad()
loss.backward()
optimizer.step()
#feedforward tutorial solution
_, predicted = torch.max(outputs, 1)
n_correct += (predicted == labels).sum().item()
n_total += labels.shape[0]
accuracy = 100 * n_correct/n_total
#Push to matplotlib
train_losses.append(loss.item())
train_epochs.append(epoch)
train_acc.append(accuracy)
#Loss and Accuracy
if (epoch+1) % 10 == 0:
print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.2f}, Acc: {accuracy:.2f}')
Since you have the predicted and the labels variables, you can aggregate them during the epoch loop and convert them to numpy arrays to calculate the required metrics.
At the beginning of the epoch, initialize two empty lists; one for true labels and one for ground truth labels.
for epoch in range(num_epochs):
predicted_labels, ground_truth_labels = [], []
...
Then, keep appending the respective entries to each list during the epoch:
...
_, predicted = torch.max(outputs, 1)
n_correct += (predicted == labels).sum().item()
# appending
predicted_labels.append(predicted.cpu().detach().numpy())
ground_truth_labels.append(labels.cpu().detach().numpy())
...
Then, at the epoch end, you could use precision_recall_fscore_support with predicted_labels and ground_truth_labels as inputs.
Notes:
You'll probably have to refer something like this to flatten the above two lists.
Read about torch.no_grad() to apply it as a good practice during the calculations of metrics.

Pytorch BCE loss not decreasing for word sense disambiguation task

I am performing word sense disambiguation and have created my own vocabulary of the top 300k most common English words. My model is very simple where each word in the sentences (their respective index value) is passed through an embedding layer which embeds the word and average the resulting embedding. The averaged embedding is then sent through a linear layer, as shown in the model below.
class TestingClassifier(nn.Module):
def __init__(self, vocabSize, features, embeddingDim):
super(TestingClassifier, self).__init__()
self.embeddings = nn.Embedding(vocabSize, embeddingDim)
self.linear = nn.Linear(features, 2)
self.sigmoid = nn.Sigmoid()
def forward(self, inputs):
embeds = self.embeddings(inputs)
avged = torch.mean(embeds, dim=-1)
output = self.linear(avged)
output = self.sigmoid(output)
return output
I am running BCELoss as loss function and SGD as optimizer. My problem is that my loss barely decreases as training goes on, almost as if it converges with a very high loss. I have tried different learning rates (0.0001, 0.001, 0.01 and 0.1) but I get the same issue.
My training function is as follows:
def train_model(model,
optimizer,
lossFunction,
batchSize,
epochs,
isRnnModel,
trainDataLoader,
validDataLoader,
earlyStop = False,
maxPatience = 1
):
validationAcc = []
patienceCounter = 0
stopTraining = False
model.train()
# Train network
for epoch in range(epochs):
losses = []
if(stopTraining):
break
for inputs, labels in tqdm(trainDataLoader, position=0, leave=True):
optimizer.zero_grad()
# Predict and calculate loss
prediction = model(inputs)
loss = lossFunction(prediction, labels)
losses.append(loss)
# Backward propagation
loss.backward()
# Readjust weights
optimizer.step()
print(sum(losses) / len(losses))
curValidAcc = check_accuracy(validDataLoader, model, isRnnModel) # Check accuracy on validation set
curTrainAcc = check_accuracy(trainDataLoader, model, isRnnModel)
print("Epoch", epoch + 1, "Training accuracy", curTrainAcc, "Validation accuracy:", curValidAcc)
# Control early stopping
if(earlyStop):
if(patienceCounter == 0):
if(len(validationAcc) > 0 and curValidAcc < validationAcc[-1]):
benchmark = validationAcc[-1]
patienceCounter += 1
print("Patience counter", patienceCounter)
elif(patienceCounter == maxPatience):
print("EARLY STOP. Patience level:", patienceCounter)
stopTraining = True
else:
if(curValidAcc < benchmark):
patienceCounter += 1
print("Patience counter", patienceCounter)
else:
benchmark = curValidAcc
patienceCounter = 0
validationAcc.append(curValidAcc)
Batch size is 32 (training set contains 8000 rows), vocabulary size is 300k, embedding dimension is 24. I have tried adding more linear layers to the network, but it makes no difference. The prediction accuracy on the training and validation sets stays at around 50% (which is horrible) even after many epochs of training. Any help is much appreciated!

Unable to use existing code working with base transformers on 'large' models

My Python code works OK for base transformer models, but when I attempt to use 'large' models, or roberta models I receive error mesages. The most common message I print below.
Epoch 1 / 40
RuntimeError Traceback (most recent call last)
in ()
12
13 #train model
---> 14 train_loss, _ = fine_tune()
15 # WE DON'T CARE ABOUT THE SECOND ITEM THE MODEL OUTPUTS (total_preds)
16 # We onlt want the average loss values here 'avg_loss'
5 frames
/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py in linear(input, weight, bias)
1688 if input.dim() == 2 and bias is not None:
1689 # fused op is marginally faster
-> 1690 ret = torch.addmm(bias, input, weight.t())
1691 else:
1692 output = input.matmul(weight.t())
RuntimeError: mat1 dim 1 must match mat2 dim 0
I am guessing there is some kind of a mismatch between matrices(Tensors) such that an operation cannot occur. If I can better understand the issue, I can better address the necessary changes to my code. Her is the fine tuning function I am using...
def fine_tune():
model.train()
total_loss, total_accuracy = 0, 0
empty list to save model predictions
total_preds=[]
iterate over batches
for step,batch in enumerate(train_dataloader):
# progress update after every 50 batches.
if step % 50 == 0 and not step == 0:
print(' Batch {:>5,} of {:>5,}.'.format(step, len(train_dataloader)))
# push the batch to gpu
batch = [r.to(device) for r in batch]
sent_id, mask, labels = batch
# clear previously calculated gradients
model.zero_grad()
# get model predictions for the current batch
preds = model(sent_id, mask)
# compute the loss between actual and predicted values
loss = cross_entropy(preds, labels)
# add on to the total loss
total_loss = total_loss + loss.item()
# backward pass to calculate the gradients
loss.backward()
# clip the the gradients to 1.0. It helps in preventing the exploding gradient problem
torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
# update parameters
optimizer.step()
# model predictions are stored on GPU. So, push it to CPU
preds=preds.detach().cpu().numpy()
# Length of preds is the same as the batch size
# append the model predictions
total_preds.append(preds)
compute the training loss of the epoch
avg_loss = total_loss / len(train_dataloader)
reshape the predictions in form of (number of samples, no. of classes)
total_preds = np.concatenate(total_preds, axis=0)
return avg_loss, total_preds
regards, Mark
I wrote a print statement to reveal the size of the input from the pre-trained model. This revealed that true size, namely 1024, rather than the default hard-code value of 768 in the program I have modified. An easy fix once I understood the problem. The moral of the story for me is, when a YouTuber ( a good one actually!) says " all transformers have an output dimension of 768" don't take that necessarily as gospel!

TF | How to predict from CNN after training is done

Trying to work with the framework provided in the course Stanford cs231n, given the code below.
I can see the accuracy getting better and the net is trained however after the training process and checking the results on the validation set, how would I go to input one image into the model and see its prediction?
I have searched around and couldn't find some built in predict function in tensorflow as there is in keras.
Initializing the net and its parameters
# clear old variables
tf.reset_default_graph()
# setup input (e.g. the data that changes every batch)
# The first dim is None, and gets sets automatically based on batch size fed in
X = tf.placeholder(tf.float32, [None, 30, 30, 1])
y = tf.placeholder(tf.int64, [None])
is_training = tf.placeholder(tf.bool)
def simple_model(X,y):
# define our weights (e.g. init_two_layer_convnet)
# setup variables
Wconv1 = tf.get_variable("Wconv1", shape=[7, 7, 1, 32]) # Filter of size 7x7 with depth of 3. No. of filters is 32
bconv1 = tf.get_variable("bconv1", shape=[32])
W1 = tf.get_variable("W1", shape=[4608, 360]) # 5408 is 13x13x32 where 13x13 is the output of 7x7 filter on 32x32 image with padding of 2.
b1 = tf.get_variable("b1", shape=[360])
# define our graph (e.g. two_layer_convnet)
a1 = tf.nn.conv2d(X, Wconv1, strides=[1,2,2,1], padding='VALID') + bconv1
h1 = tf.nn.relu(a1)
h1_flat = tf.reshape(h1,[-1,4608])
y_out = tf.matmul(h1_flat,W1) + b1
return y_out
y_out = simple_model(X,y)
# define our loss
total_loss = tf.losses.hinge_loss(tf.one_hot(y,360),logits=y_out)
mean_loss = tf.reduce_mean(total_loss)
# define our optimizer
optimizer = tf.train.AdamOptimizer(5e-4) # select optimizer and set learning rate
train_step = optimizer.minimize(mean_loss)
Function for evaluating the model whether for training or validation and plots the results:
def run_model(session, predict, loss_val, Xd, yd,
epochs=1, batch_size=64, print_every=100,
training=None, plot_losses=False):
# Have tensorflow compute accuracy
correct_prediction = tf.equal(tf.argmax(predict,1), y)
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
# shuffle indicies
train_indicies = np.arange(Xd.shape[0])
np.random.shuffle(train_indicies)
training_now = training is not None
# setting up variables we want to compute and optimize
# if we have a training function, add that to things we compute
variables = [mean_loss,correct_prediction,accuracy]
if training_now:
variables[-1] = training
# counter
iter_cnt = 0
for e in range(epochs):
# keep track of losses and accuracy
correct = 0
losses = []
# make sure we iterate over the dataset once
for i in range(int(math.ceil(Xd.shape[0]/batch_size))):
# generate indicies for the batch
start_idx = (i*batch_size)%Xd.shape[0]
idx = train_indicies[start_idx:start_idx+batch_size]
# create a feed dictionary for this batch
feed_dict = {X: Xd[idx,:],
y: yd[idx],
is_training: training_now }
# get batch size
actual_batch_size = yd[idx].shape[0]
# have tensorflow compute loss and correct predictions
# and (if given) perform a training step
loss, corr, _ = session.run(variables,feed_dict=feed_dict)
# aggregate performance stats
losses.append(loss*actual_batch_size)
correct += np.sum(corr)
# print every now and then
if training_now and (iter_cnt % print_every) == 0:
print("Iteration {0}: with minibatch training loss = {1:.3g} and accuracy of {2:.2g}"\
.format(iter_cnt,loss,np.sum(corr)/actual_batch_size))
iter_cnt += 1
total_correct = correct/Xd.shape[0]
total_loss = np.sum(losses)/Xd.shape[0]
print("Epoch {2}, Overall loss = {0:.3g} and accuracy of {1:.3g}"\
.format(total_loss,total_correct,e+1))
if plot_losses:
plt.plot(losses)
plt.grid(True)
plt.title('Epoch {} Loss'.format(e+1))
plt.xlabel('minibatch number')
plt.ylabel('minibatch loss')
plt.show()
return total_loss,total_correct
The functions calls that trains the model
init = tf.global_variables_initializer()
with tf.Session() as sess:
sess.run(init)
print('Training')
run_model(sess,y_out,mean_loss,x_train,y_train,1,64,100,train_step,True)
print('Validation')
run_model(sess,y_out,mean_loss,x_val,y_val,1,64)
You do not need to go far, you simply pass your new (test) feature matrix X_test into your network and perform a forward pass - the output layer is the prediction. So the code is something like this
session.run(y_out, feed_dict={X: X_test})

Resources