I'm working on sentence classification with pretrained BERT.
I used the BertForSequenceClassification class, but the model doesn't seem to be training.
Maybe I'm reading the wrong output, but I can't figure out what the right output should be.
Can someone tell me what's wrong with this code?
import torch
from torch import nn
from torch.optim import AdamW
from torch.utils.data import DataLoader, Dataset
from torchtext.datasets import SST2
from tqdm import tqdm
from transformers import AutoTokenizer, AutoModel, BertForSequenceClassification
LR = 0.0005
EPOCHS = 5
BATCH_SIZE = 128
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
base_model = BertForSequenceClassification.from_pretrained(model_name)
max_input_length = 128
model = base_model.to(device)
criterion = nn.CrossEntropyLoss().to(device)
optimizer = AdamW(model.parameters(), lr=LR)
train_datapipe = SST2(split="train")
valid_datapipe = SST2(split="dev")
def collate_batch(batch):
    ids, types, masks, label_list = [], [], [], []
    for text, label in batch:
        tokenized = tokenizer(text,
                              padding="max_length", max_length=max_input_length,
                              truncation=True, return_tensors="pt")
        ids.append(tokenized['input_ids'])
        types.append(tokenized['token_type_ids'])
        masks.append(tokenized['attention_mask'])
        label_list.append(label)
    input_data = {
        "input_ids": torch.squeeze(torch.stack(ids)).to(device),
        "token_type_ids": torch.squeeze(torch.stack(types)).to(device),
        "attention_mask": torch.squeeze(torch.stack(masks)).to(device)
    }
    label_list = torch.tensor(label_list, dtype=torch.int64)
    return input_data, label_list
train_dataloader = DataLoader(train_datapipe, shuffle=True, batch_size=BATCH_SIZE, collate_fn=collate_batch)
valid_dataloader = DataLoader(valid_datapipe, batch_size=BATCH_SIZE, collate_fn=collate_batch)
# print("total instances: ", len(train_dataloader))
for epoch in range(EPOCHS):
    model.train()
    train_loss = []
    all_labels = []
    all_outs = []
    for i, (input_data, label) in enumerate(tqdm(train_dataloader)):
        model.zero_grad()
        output = model(**input_data).logits
        label = label.to(device)
        loss = criterion(output, label)
        loss.backward()
        optimizer.step()
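For reference, this is the check I run after each epoch to see whether the model is learning at all (a quick sketch; it reuses model, valid_dataloader and device from the script above):
# Sketch: measure validation accuracy after each epoch.
model.eval()
correct, total = 0, 0
with torch.no_grad():
    for input_data, label in valid_dataloader:
        logits = model(**input_data).logits    # shape (batch, num_labels)
        preds = logits.argmax(dim=-1).cpu()    # predicted class per sentence
        correct += (preds == label).sum().item()
        total += label.size(0)
print(f"validation accuracy: {correct / total:.4f}")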
I am new to multi-class text classification with BERT. I have been following a tutorial (https://towardsdatascience.com/multi-label-multi-class-text-classification-with-bert-transformer-and-keras-c6355eccb63a) for learning purposes.
I can get the script below running up to the point where the confusion matrix is calculated; the classification report does not work either. I would be grateful if someone could help me. My apologies if this question has already been asked; I searched everywhere and could not find an answer.
The error occurs here: y_predicted = numpy.argmax(predicted_raw, axis = 1). The error message says "axis 1 is out of bounds for array of dimension 1". When I change the axis to zero, the new error message is "Singleton array 0 cannot be considered a valid collection." I think the axis=0 error means that y_predicted is null; I double-checked it with an if statement.
import pandas
import numpy
import re
import nltk
# for plotting
import matplotlib.pyplot as plt
import seaborn as sns
input_dataframe = pandas.read_csv('tutorial6.csv')
fig, ax = plt.subplots()
fig.suptitle("Product", fontsize=12)
input_dataframe["Product"].reset_index().groupby("Product").count().sort_values(by=
"index").plot(kind="barh", legend=False,
ax=ax).grid(axis='x')
plt.show()
def utils_preprocess_text(text, flg_stemm=False, flg_lemm=True, lst_stopwords=None):
    ## clean (convert to lowercase, remove punctuation and special characters, then strip)
    text = re.sub(r'[^\w\s]', '', str(text).lower().strip())
    ## tokenize (convert from string to list)
    lst_text = text.split()
    ## remove stopwords
    if lst_stopwords is not None:
        lst_text = [word for word in lst_text if word not in lst_stopwords]
    ## stemming (remove -ing, -ly, ...)
    if flg_stemm == True:
        ps = nltk.stem.porter.PorterStemmer()
        lst_text = [ps.stem(word) for word in lst_text]
    ## lemmatisation (convert each word to its root form)
    if flg_lemm == True:
        lem = nltk.stem.wordnet.WordNetLemmatizer()
        lst_text = [lem.lemmatize(word) for word in lst_text]
    ## back to string from list
    text = " ".join(lst_text)
    return text
lst_stopwords = nltk.corpus.stopwords.words("english")
input_dataframe["text_clean"] = input_dataframe ["Consumer_Complaint"].apply(lambda x:
utils_preprocess_text(x, flg_stemm=False, flg_lemm=True,
lst_stopwords=lst_stopwords))
from tensorflow.keras.utils import to_categorical
possible_labels = input_dataframe.Product.unique()
label_dict = {}
for index, possible_label in enumerate(possible_labels):
    label_dict[possible_label] = index
print(label_dict)
input_dataframe['label'] = input_dataframe.Product.replace(label_dict)
# Split into train and test - stratify over Issue
from sklearn.model_selection import train_test_split
data_train, data_test = train_test_split(input_dataframe, test_size = 0.2,stratify = input_dataframe[["label"]])
# Load Huggingface transformers
from transformers import TFBertModel, BertConfig, BertTokenizerFast
# Then what you need from tensorflow.keras
from tensorflow.keras.layers import Input, Dropout, Dense
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.initializers import TruncatedNormal
from tensorflow.keras.losses import CategoricalCrossentropy
from tensorflow.keras.metrics import CategoricalAccuracy
from tensorflow.keras.utils import to_categorical
### --------- Setup BERT ---------- ###
# Name of the BERT model to use
model_name = 'bert-base-uncased'
# Max length of tokens
max_length = 100
# Load transformers config and set output_hidden_states to False
config = BertConfig.from_pretrained(model_name)
config.output_hidden_states = False
# Load BERT tokenizer
tokenizer = BertTokenizerFast.from_pretrained(pretrained_model_name_or_path = model_name, config = config)
# Load the Transformers BERT model
transformer_model = TFBertModel.from_pretrained(model_name, config = config)
### ------- Build the model ------- ###
# TF Keras documentation: https://www.tensorflow.org/api_docs/python/tf/keras/Model
# Load the MainLayer
bert = transformer_model.layers[0]
# Build your model input
input_ids = Input(shape=(max_length,), name='input_ids', dtype='int32')
inputs = {'input_ids': input_ids}
# Load the Transformers BERT model as a layer in a Keras model
bert_model = bert(inputs)[1]
dropout = Dropout(config.hidden_dropout_prob, name='pooled_output')
pooled_output = dropout(bert_model, training=False)
# Then build your model output
product = Dense(8, kernel_initializer=TruncatedNormal(stddev=config.initializer_range), name='product')(pooled_output)
outputs = {'product': product}
# And combine it all in a model object
model = Model(inputs=inputs, outputs=outputs, name='BERT_MultiLabel_MultiClass')
# Take a look at the model
model.summary()
# Set an optimizer
optimizer = Adam()
# Set loss and metrics
loss = {'product': CategoricalCrossentropy(from_logits = True)}
metric = {'product': CategoricalAccuracy('accuracy')}
# Compile the model
model.compile(
    optimizer = optimizer,
    loss = loss,
    metrics = metric)
# Ready output data for the model
y_train = to_categorical(data_train['label'],8)
y_test = to_categorical(data_test['label'],8)
x_train = tokenizer(
    text=data_train['Consumer_Complaint'].to_list(),
    add_special_tokens=True,
    max_length=max_length,
    truncation=True,
    padding=True,
    return_tensors='tf',
    return_token_type_ids = False,
    return_attention_mask = False,
    verbose = True)
x_test = tokenizer(
    text=data_test['Consumer_Complaint'].to_list(),
    add_special_tokens=True,
    max_length=max_length,
    truncation=True,
    padding=True,
    return_tensors='tf',
    return_token_type_ids = False,
    return_attention_mask = False,
    verbose = True)
# Fit the model
history = model.fit(
    x={'input_ids': x_train['input_ids']},
    y={'product': y_train},
    validation_split=0.2,
    batch_size=64,
    epochs=1)
### ----- Evaluate the model ------ ###
model_eval = model.evaluate(
    x={'input_ids': x_test['input_ids']},
    y={'product': y_test}
)
print("This is evaluation: ", model_eval)
accr = model.evaluate(x_test['input_ids'],y_test)
print('Test set\n Loss: {:0.3f}\n Accuracy: {:0.3f}'.format(accr[0],accr[1]))
from matplotlib import pyplot as plt
plt.title('Loss')
plt.plot(history.history['loss'], label='train')
plt.plot(history.history['val_loss'], label='test')
plt.legend()
plt.show();
# plot loss and accuracy
metrics = [k for k in history.history.keys() if ("loss" not in k) and ("val" not in k)]
fig, ax = plt.subplots(nrows=1, ncols=2, sharey=True)
ax[0].set(title="Training")
ax11 = ax[0].twinx()
ax[0].plot(history.history['loss'], color='black')
ax[0].set_xlabel('Epochs')
ax[0].set_ylabel('Loss', color='black')
for metric in metrics:
    ax11.plot(history.history[metric], label=metric)
ax11.set_ylabel("Score", color='steelblue')
ax11.legend()
ax[1].set(title="Validation")
ax22 = ax[1].twinx()
ax[1].plot(history.history['val_loss'], color='black')
ax[1].set_xlabel('Epochs')
ax[1].set_ylabel('Loss', color='black')
for metric in metrics:
    ax22.plot(history.history['val_'+metric], label=metric)
ax22.set_ylabel("Score", color="steelblue")
plt.show()
#Testing our model on the test data.
predicted_raw = model.predict({'input_ids':x_test['input_ids']})
print(type(predicted_raw))
predicted_raw=list(predicted_raw)
predicted_raw=numpy.array(predicted_raw)
y_predicted = numpy.argmax(predicted_raw, axis = 1)
y_true = data_test.label
from sklearn.metrics import accuracy_score,confusion_matrix,classification_report
confusionmatrix = confusion_matrix(y_predicted,y_true)
I am trying to get the confusion matrix and classification report working.
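One thing I suspect but have not verified: because the model was built with a dict output ({'product': product}), model.predict may also return a dict here, so list(predicted_raw) would give the output names rather than the scores. A sketch of what I mean (assuming the dict-return behaviour of Keras predict):
# predicted_raw would be {'product': array of shape (num_samples, 8)}
predicted_raw = model.predict({'input_ids': x_test['input_ids']})
scores = predicted_raw['product']           # the actual (num_samples, 8) score array
y_predicted = numpy.argmax(scores, axis=1)  # one class index per sample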
I am new to NLP.
In PyTorch, I'm training BertForSequenceClassification for a multi-label task.
Is it possible for the output labels to match the order of the input text?
Input is text like: name, phone, address; output: label_name, label_phone, label_address.
In this case, input: phone, address, name; output: label_name, label_phone, label_address.
However, I'd like the output to look something like this: label_phone, label_address, label_name.
Is there anything I can change?
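To make the question concrete, this is how I decode a prediction at the moment (a sketch, not my full code; label_names is hypothetical and lists the labels in the model's fixed output order):
import torch
# The model emits one score per label position, so the decoded order depends only
# on label_names, never on the word order of the input sentence.
label_names = ["label_name", "label_phone", "label_address"]  # hypothetical subset of my 9 labels
logits = torch.tensor([2.0, 3.0, 2.5])  # stand-in for one example's outputs.logits
predicted = [name for name, p in zip(label_names, logits.sigmoid()) if p > 0.7]
print(predicted)  # ['label_name', 'label_phone', 'label_address'] regardless of input order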
## Package
# PyTorch
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from torch.utils.data.dataset import random_split
import torch.utils.data as data
# BERT Related Libraries
from transformers import BertTokenizer, BertForSequenceClassification
# Python
import pandas as pd
import numpy as np
import os
import time
import matplotlib.pyplot as plt
# ML Parameters
epoch = 8
device = 'cuda'
num_labels = 9
batch_size = 32
epsilon = 1e-8
learning_rate = 5e-5
## Define model
model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=num_labels)
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
optimizer = optim.AdamW(model.parameters(), lr = learning_rate, eps = epsilon, weight_decay = 1e-2)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size = 1, gamma=0.1, last_epoch=-1)
criterion = nn.BCEWithLogitsLoss()
train_losses = []
train_acces = []
val_losses = []
val_acces = []
## Train with training set
def train(model, iterator, optimizer, criterion, device):
    model.train()
    train_loss = 0
    correct = 0
    total = 0
    for batch_idx, (sentences, labels) in enumerate(iterator):
        # tokenize the sentences
        encoding = tokenizer(sentences, return_tensors='pt', padding=True, truncation=True)
        input_ids = encoding['input_ids']
        attention_mask = encoding['attention_mask']
        # generate prediction
        optimizer.zero_grad()
        outputs = model(input_ids, attention_mask=attention_mask)  # NOT USING INTERNAL CrossEntropyLoss
        # compute gradients and update weights
        loss = criterion(outputs.logits, labels)  # BCEWithLogitsLoss applies the sigmoid internally
        loss.backward()
        optimizer.step()
        # accumulate train loss
        train_loss += loss.item()
        # record processed data count
        prob = outputs.logits.sigmoid()
        total += (labels.size(0) * labels.size(1))
        # threshold the per-label probabilities into binary predictions
        THRESHOLD = 0.7
        prediction = prob.detach().clone()
        prediction[prediction > THRESHOLD] = 1
        prediction[prediction <= THRESHOLD] = 0
        correct += prediction.eq(labels).sum().item()
    train_acc = correct / total
    print('train_loss: %f\t' % (train_loss))
    print('train_acc: %f\t' % (train_acc))
    return train_loss, train_acc
## Validate with testing set
def test(model, iterator, optimizer, criterion, device):
    model.eval()
    val_loss = 0
    correct = 0
    total = 0
    with torch.no_grad():
        for batch_idx, (sentences, labels) in enumerate(iterator):
            # tokenize the sentences
            encoding = tokenizer(sentences, return_tensors='pt', padding=True, truncation=True)
            input_ids = encoding['input_ids']
            attention_mask = encoding['attention_mask']
            # generate prediction
            outputs = model(input_ids, attention_mask=attention_mask)
            loss = criterion(outputs.logits, labels)
            val_loss += loss.item()
            # record processed data count
            prob = outputs.logits.sigmoid()
            total += (labels.size(0) * labels.size(1))
            # threshold the per-label probabilities into binary predictions
            THRESHOLD = 0.7
            prediction = prob.detach().clone()
            prediction[prediction > THRESHOLD] = 1
            prediction[prediction <= THRESHOLD] = 0
            correct += prediction.eq(labels).sum().item()
    val_acc = correct / total
    print('val_loss: %f\t' % (val_loss))
    print('val_acc: %f\t' % (val_acc))
    print('correct: %i , total: %i' % (correct, total))
    return val_loss, val_acc
for e in range(epoch):
    print("===== Epoch %i =====" % e)
    print("Training started ...")
    train_loss, train_acc = train(model, train_loader, optimizer, criterion, device)
    train_losses.append(train_loss / len(train_loader))
    train_acces.append(train_acc / len(train_loader))
    # validation testing
    print("Testing started ...")
    val_loss, val_acc = test(model, test_loader, optimizer, criterion, device)
    val_losses.append(val_loss / len(test_loader))
    val_acces.append(val_acc / len(test_loader))
    scheduler.step()
If you need more information, please let me know.
Thanks to all.
I followed Aladdin Persson's YouTube video to code up just the encoder portion of the Transformer model in PyTorch, except I used PyTorch's built-in multi-head attention layer. The model seems to produce data of the correct shape. However, during training the training loss does not drop, and the resulting model always predicts the same output of 0.4761. The dataset used for training is the Sarcasm Detection Dataset from Kaggle. I would appreciate any help on errors I may have made.
import pandas as pd
from transformers import BertTokenizer
import torch.nn as nn
import torch
from sklearn.model_selection import train_test_split
from torch.optim.lr_scheduler import ReduceLROnPlateau
import math
df = pd.read_json("Sarcasm_Headlines_Dataset_v2.json", lines=True)
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
encoded_input = tokenizer(df['headline'].tolist(), return_tensors='pt',padding=True)
X = encoded_input['input_ids']
y = torch.tensor(df['is_sarcastic'].values).float()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify = y)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)
torch.cuda.empty_cache()
class TransformerBlock(nn.Module):
    def __init__(self, embed_dim, num_heads, dropout, expansion_ratio):
        super(TransformerBlock, self).__init__()
        self.attention = nn.MultiheadAttention(embed_dim, num_heads)
        self.norm1 = nn.LayerNorm(embed_dim)
        self.norm2 = nn.LayerNorm(embed_dim)
        self.feed_forward = nn.Sequential(
            nn.Linear(embed_dim, expansion_ratio*embed_dim),
            nn.ReLU(),
            nn.Linear(expansion_ratio*embed_dim, embed_dim)
        )
        self.dropout = nn.Dropout(dropout)

    def forward(self, value, key, query):
        attention, _ = self.attention(value, key, query)
        x = self.dropout(self.norm1(attention + query))
        forward = self.feed_forward(x)
        out = self.dropout(self.norm2(forward + x))
        return out
class Encoder(nn.Module):
    # the vocab size is one more than the max value in the X matrix.
    def __init__(self, vocab_size=30109, embed_dim=128, num_layers=1, num_heads=4,
                 device="cpu", expansion_ratio=4, dropout=0.1, max_length=193):
        super(Encoder, self).__init__()
        self.device = device
        self.word_embedding = nn.Embedding(vocab_size, embed_dim)
        self.position_embedding = nn.Embedding(max_length, embed_dim)
        self.layers = nn.ModuleList(
            [
                TransformerBlock(embed_dim, num_heads, dropout, expansion_ratio)
                for _ in range(num_layers)
            ]
        )
        self.dropout = nn.Dropout(dropout)
        self.classifier1 = nn.Linear(embed_dim, embed_dim)
        self.classifier2 = nn.Linear(embed_dim, 1)
        self.relu = nn.ReLU()

    def forward(self, x):
        N, seq_length = x.shape
        positions = torch.arange(0, seq_length).expand(N, seq_length).to(self.device)
        out = self.dropout(self.word_embedding(x) + self.position_embedding(positions))
        for layer in self.layers:
            #print(out.shape)
            out = layer(out, out, out)
        # Take the first position's output for classification.
        # The pooled output from Hugging Face is the last-layer hidden state of the first
        # token ([CLS]) further processed by a Linear layer and a Tanh activation, so it
        # differs from out[:, 0, :], which is the raw first-token output used here.
        out = self.relu(self.classifier1(out[:, 0, :]))
        out = self.classifier2(out)
        return out
torch.cuda.empty_cache()
net = Encoder(device=device)
net.to(device)
batch_size = 32
num_train_samples = X_train.shape[0]
num_val_samples = X_test.shape[0]
criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(net.parameters(),lr=1e-5)
scheduler = ReduceLROnPlateau(optimizer, 'min', patience=5)
val_loss_hist=[]
loss_hist=[]
epoch = 0
min_val_loss = math.inf
print("Training Started")
patience = 0
for _ in range(100):
    epoch += 1
    net.train()
    epoch_loss = 0
    permutation = torch.randperm(X_train.size()[0])
    for i in range(0, X_train.size()[0], batch_size):
        indices = permutation[i:i+batch_size]
        features = X_train[indices].to(device)
        labels = y_train[indices].reshape(-1, 1).to(device)
        output = net.forward(features)
        loss = criterion(output, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        epoch_loss += loss.item()
    # scale the accumulated train loss by num_val_samples / num_train_samples
    # so it is comparable to the summed validation loss below
    epoch_loss = epoch_loss / num_train_samples * num_val_samples
    loss_hist.append(epoch_loss)
    #print("Eval")
    net.eval()
    epoch_val_loss = 0
    permutation = torch.randperm(X_test.size()[0])
    for i in range(0, X_test.size()[0], batch_size):
        indices = permutation[i:i+batch_size]
        features = X_test[indices].to(device)
        labels = y_test[indices].reshape(-1, 1).to(device)
        output = net.forward(features)
        loss = criterion(output, labels)
        epoch_val_loss += loss.item()
    val_loss_hist.append(epoch_val_loss)
    scheduler.step(epoch_val_loss)
    #if epoch % 5 == 0:
    print("Epoch: " + str(epoch) + " Train Loss: " + format(epoch_loss, ".4f") +
          ". Val Loss: " + format(epoch_val_loss, ".4f") +
          " LR: " + str(optimizer.param_groups[0]['lr']))
    if epoch_val_loss < min_val_loss:
        min_val_loss = epoch_val_loss
        torch.save(net.state_dict(), "torchmodel/weights_best.pth")
        print('\033[93m' + "Model Saved" + '\033[0m')
        patience = 0
    else:
        patience += 1
        if patience == 10:
            break
print("Training Ended")
I'm trying to implement code for sentiment analysis (positive or negative labels) using BERT, and I want to add a BiLSTM layer to see if I can increase the accuracy of the pretrained model from HuggingFace. I have the code below and a few questions:
import numpy as np
import pandas as pd
from sklearn import metrics
import transformers
import torch
from torch.utils.data import Dataset, DataLoader, RandomSampler, SequentialSampler
from transformers import BertTokenizer, BertModel, BertConfig
from torch import cuda
import re
import torch.nn as nn
device = 'cuda' if cuda.is_available() else 'cpu'
MAX_LEN = 200
TRAIN_BATCH_SIZE = 8
VALID_BATCH_SIZE = 4
EPOCHS = 1
LEARNING_RATE = 1e-05 #5e-5, 3e-5 or 2e-5
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
class CustomDataset(Dataset):
    def __init__(self, dataframe, tokenizer, max_len):
        self.tokenizer = tokenizer
        self.data = dataframe
        self.comment_text = dataframe.review
        self.targets = self.data.sentiment
        self.max_len = max_len

    def __len__(self):
        return len(self.comment_text)

    def __getitem__(self, index):
        comment_text = str(self.comment_text[index])
        comment_text = " ".join(comment_text.split())
        inputs = self.tokenizer.encode_plus(comment_text, None, add_special_tokens=True,
                                            max_length=self.max_len,
                                            pad_to_max_length=True, return_token_type_ids=True)
        ids = inputs['input_ids']
        mask = inputs['attention_mask']
        token_type_ids = inputs["token_type_ids"]
        return {
            'ids': torch.tensor(ids, dtype=torch.long),
            'mask': torch.tensor(mask, dtype=torch.long),
            'token_type_ids': torch.tensor(token_type_ids, dtype=torch.long),
            'targets': torch.tensor(self.targets[index], dtype=torch.float)
        }
train_size = 0.8
train_dataset=df.sample(frac=train_size,random_state=200)
test_dataset=df.drop(train_dataset.index).reset_index(drop=True)
train_dataset = train_dataset.reset_index(drop=True)
print("FULL Dataset: {}".format(df.shape))
print("TRAIN Dataset: {}".format(train_dataset.shape))
print("TEST Dataset: {}".format(test_dataset.shape))
training_set = CustomDataset(train_dataset, tokenizer, MAX_LEN)
testing_set = CustomDataset(test_dataset, tokenizer, MAX_LEN)
train_params = {'batch_size': TRAIN_BATCH_SIZE,'shuffle': True,'num_workers': 0}
test_params = {'batch_size': VALID_BATCH_SIZE,'shuffle': True,'num_workers': 0}
training_loader = DataLoader(training_set, **train_params)
testing_loader = DataLoader(testing_set, **test_params)
class BERTClass(torch.nn.Module):
    def __init__(self):
        super(BERTClass, self).__init__()
        self.bert = BertModel.from_pretrained('bert-base-uncased', return_dict=False, num_labels=2)
        self.lstm = nn.LSTM(768, 256, batch_first=True, bidirectional=True)
        self.linear = nn.Linear(256*2, 2)

    def forward(self, ids, mask, token_type_ids):
        sequence_output, pooled_output = self.bert(ids, attention_mask=mask, token_type_ids=token_type_ids)
        # run the BiLSTM over the whole token sequence
        lstm_output, (h, c) = self.lstm(sequence_output)
        # last forward state and first backward state, concatenated (currently unused)
        hidden = torch.cat((lstm_output[:, -1, :256], lstm_output[:, 0, 256:]), dim=-1)
        linear_output = self.linear(lstm_output[:, -1].view(-1, 256 * 2))
        return linear_output
model = BERTClass()
model.to(device)
print(model)
def loss_fn(outputs, targets):
    return torch.nn.BCEWithLogitsLoss()(outputs, targets)
optimizer = torch.optim.Adam(params = model.parameters(), lr=LEARNING_RATE)
def train(epoch):
    model.train()
    for _, data in enumerate(training_loader, 0):
        ids = data['ids'].to(device, dtype=torch.long)
        mask = data['mask'].to(device, dtype=torch.long)
        token_type_ids = data['token_type_ids'].to(device, dtype=torch.long)
        targets = data['targets'].to(device, dtype=torch.float)
        outputs = model(ids, mask, token_type_ids)
        optimizer.zero_grad()
        loss = loss_fn(outputs, targets)
        if _ % 5000 == 0:
            print(f'Epoch: {epoch}, Loss: {loss.item()}')
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
for epoch in range(EPOCHS):
    train(epoch)
With the code above I ran into this error: Target size (torch.Size([8])) must be the same as input size (torch.Size([8, 2])). I checked online and tried targets = targets.unsqueeze(2), but then I get another error saying the unsqueeze dimension must be in the range [-2, 1]. I also tried modifying the loss function to
def loss_fn(outputs, targets):
    return torch.nn.BCELoss()(outputs, targets)
but I still receive the same error. Can someone advise whether there is a solution to this problem, or what I can do to make this work? Many thanks in advance.
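For reference, here is my understanding of the two target shapes that would be consistent (a sketch, not verified on my data): BCEWithLogitsLoss wants targets of the same (batch, 2) shape as the logits, e.g. one-hot, while CrossEntropyLoss takes the integer labels directly.
import torch
import torch.nn as nn
outputs = torch.randn(8, 2)                       # logits of shape (batch, 2)
targets = torch.tensor([0, 1, 1, 0, 1, 0, 0, 1])  # integer labels of shape (batch,)
# Option 1: one-hot the targets so they match the logits for BCEWithLogitsLoss
loss1 = nn.BCEWithLogitsLoss()(outputs, nn.functional.one_hot(targets, num_classes=2).float())
# Option 2: treat it as single-label two-class classification with CrossEntropyLoss
loss2 = nn.CrossEntropyLoss()(outputs, targets)
print(loss1.item(), loss2.item())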
Using this MNIST image classification model:
%reset -f
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms
import torch.utils.data as data_utils
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_moons
from matplotlib import pyplot
from pandas import DataFrame
import torchvision.datasets as dset
import os
import torch.nn.functional as F
import time
import random
import pickle
from sklearn.metrics import confusion_matrix
import pandas as pd
import sklearn
trans = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (1.0,))])
root = './data'
if not os.path.exists(root):
    os.mkdir(root)
train_set = dset.MNIST(root=root, train=True, transform=trans, download=True)
test_set = dset.MNIST(root=root, train=False, transform=trans, download=True)
batch_size = 64
train_loader = torch.utils.data.DataLoader(
dataset=train_set,
batch_size=batch_size,
shuffle=True)
test_loader = torch.utils.data.DataLoader(
dataset=test_set,
batch_size=batch_size,
shuffle=True)
class NeuralNet(nn.Module):
    def __init__(self):
        super(NeuralNet, self).__init__()
        self.fc1 = nn.Linear(28*28, 500)
        self.fc2 = nn.Linear(500, 256)
        self.fc3 = nn.Linear(256, 2)

    def forward(self, x):
        x = x.view(-1, 28*28)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
num_epochs = 2
random_sample_size = 200
values_0_or_1 = [t for t in train_set if (int(t[1]) == 0 or int(t[1]) == 1)]
values_0_or_1_testset = [t for t in test_set if (int(t[1]) == 0 or int(t[1]) == 1)]
print(len(values_0_or_1))
print(len(values_0_or_1_testset))
train_loader_subset = torch.utils.data.DataLoader(
dataset=values_0_or_1,
batch_size=batch_size,
shuffle=True)
test_loader_subset = torch.utils.data.DataLoader(
dataset=values_0_or_1_testset,
batch_size=batch_size,
shuffle=False)
train_loader = train_loader_subset
# Hyper-parameters
input_size = 100
hidden_size = 100
num_classes = 2
# learning_rate = 0.00001
learning_rate = .0001
# Device configuration
device = 'cpu'
print_progress_every_n_epochs = 1
model = NeuralNet().to(device)
# Loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
N = len(train_loader)
# Train the model
total_step = len(train_loader)
most_recent_prediction = []
test_actual_predicted_dict = {}
rm = random.sample(list(values_0_or_1), random_sample_size)
train_loader_subset = data_utils.DataLoader(rm, batch_size=4)
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader_subset):
        # Move tensors to the configured device
        images = images.reshape(-1, 2).to(device)
        labels = labels.to(device)
        # Forward pass
        outputs = model(images)
        loss = criterion(outputs, labels)
        # Backward and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    if (epoch) % print_progress_every_n_epochs == 0:
        print('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}'.format(epoch+1, num_epochs, i+1, total_step, loss.item()))
    predicted_test = []
    model.eval()  # eval mode (batchnorm uses moving mean/variance instead of mini-batch mean/variance)
    probs_l = []
    predicted_values = []
    actual_values = []
    labels_l = []
    with torch.no_grad():
        for images, labels in test_loader_subset:
            images = images.to(device)
            labels = labels.to(device)
            outputs = model(images)
            _, predicted = torch.max(outputs.data, 1)
            predicted_test.append(predicted.cpu().numpy())
            sm = torch.nn.Softmax()
            probabilities = sm(outputs)
            probs_l.append(probabilities)
            labels_l.append(labels.cpu().numpy())
    predicted_values.append(np.concatenate(predicted_test).ravel())
    actual_values.append(np.concatenate(labels_l).ravel())
    if (epoch) % 1 == 0:
        print('test accuracy : ', 100 * len((np.where(np.array(predicted_values[0])==(np.array(actual_values[0])))[0])) / len(actual_values[0]))
I'm attempting to integrate 'Local Interpretable Model-Agnostic Explanations for machine learning classifiers': https://marcotcr.github.io/lime/
It appears PyTorch support is not enabled, as it is not mentioned in the docs or in the following tutorial:
https://marcotcr.github.io/lime/tutorials/Tutorial%20-%20images.html
With my updated code for PyTorch:
from lime import lime_image
import time
explainer = lime_image.LimeImageExplainer()
explanation = explainer.explain_instance(images[0].reshape(28,28), model(images[0]), top_labels=5, hide_color=0, num_samples=1000)
Causes error :
/opt/conda/lib/python3.6/site-packages/skimage/color/colorconv.py in gray2rgb(image, alpha)
830 is_rgb = False
831 is_alpha = False
--> 832 dims = np.squeeze(image).ndim
833
834 if dims == 3:
AttributeError: 'Tensor' object has no attribute 'ndim'
So it appears a TensorFlow object is expected here?
How do I integrate LIME with PyTorch image classification?
Here's my solution:
Lime expects an image input of type numpy. This is why you get the attribute error, and a solution is to convert the image (from Tensor) to numpy before passing it to the explainer object. Another option is to select a specific image with the test_loader_subset and convert it with img = img.numpy().
Secondly, in order to make LIME work with PyTorch (or any other framework), you'll need to specify a batch prediction function which outputs the prediction scores of each class for each image. The name of this function (here I've called it batch_predict) is then passed to explainer.explain_instance(img, batch_predict, ...). batch_predict needs to loop through all images passed to it, convert them to Tensors, make a prediction, and finally return the list of prediction scores (with numpy values). This is how I got it working.
Note also that the images need to have shape (..., ..., 3) or (..., ..., 1) in order to be properly segmented by the default segmentation algorithm. This means you might have to use np.transpose(img, (...)). You may also specify the segmentation algorithm if the results are poor.
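A minimal sketch of such a batch_predict for the model in the question (model, device, explainer and test_loader_subset are the names from that code; the channel handling is an assumption you may need to adjust):
import numpy as np
import torch
import torch.nn.functional as F
def batch_predict(images):
    # LIME passes a batch of numpy images of shape (N, H, W, 3) after converting
    # the grayscale input to RGB, so take a single channel back
    model.eval()
    batch = torch.stack([torch.from_numpy(img[:, :, 0]).float() for img in images])
    with torch.no_grad():
        logits = model(batch.to(device))  # NeuralNet flattens the image internally
        probs = F.softmax(logits, dim=1)
    return probs.cpu().numpy()            # (N, num_classes) scores, as LIME expects
img, _ = next(iter(test_loader_subset))
img = img[0].numpy().squeeze()            # one 28x28 numpy image
explanation = explainer.explain_instance(img.astype('double'), batch_predict,
                                         top_labels=2, hide_color=0, num_samples=1000)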
Finally you'll need to display the LIME image mask on top of the original image. This snippet shows how this may be done:
from skimage.segmentation import mark_boundaries
temp, mask = explanation.get_image_and_mask(explanation.top_labels[0], positive_only=False, num_features=5, hide_rest=False)
img_boundry = mark_boundaries(temp, mask)
plt.imshow(img_boundry)
plt.show()
This notebook is a good reference:
https://github.com/marcotcr/lime/blob/master/doc/notebooks/Tutorial%20-%20images%20-%20Pytorch.ipynb