huggingface distilbert classification using multiprocessing - pytorch

I am trying to use torch multiprocessing to parallelize the predictions from two separate huggingface distilbert classification models. It seems to be deadlocked at the prediction step. I am using python 3.6.5, torch 1.5.0 and huggingface transformers version 2.11.0.
The output from running the code is
Tree enc done
Begin tree prediction<------(Comment: Both begin tree
End tree predictions<------- and end tree predictions)
0.03125429153442383
Dn prediction
Dn enc done
Begin dn predictions<------(Comment: Both begin dn
End dn predictions<------- and end dn predictions)
0.029727697372436523
----------Done sequential predictions-------------
--------Start Parallel predictions--------------
Tree prediction
Tree enc done
Begin tree prediction. <------(Comment: Process is deadlocked after this)
Dn prediction
Dn enc done
Begin dn predictions. <-------(Comment: Process is deadlocked after this)
During the parallel predictions it seems to deadlock, never printing "End tree predictions" or "End dn predictions". Not sure why this is happening.
The code is
import torch
import torch.multiprocessing as mp
import time
import transformers
from transformers import DistilBertForSequenceClassification
# Load the BERT tokenizer.
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased', do_lower_case=True)

tree_model = DistilBertForSequenceClassification.from_pretrained(
    "distilbert-base-uncased",
    num_labels = 2,
    output_attentions = False,
    output_hidden_states = False
)
tree_model.eval()

dn_model = DistilBertForSequenceClassification.from_pretrained(
    "distilbert-base-uncased",
    num_labels = 2,
    output_attentions = False,
    output_hidden_states = False,
)
dn_model.eval()

tree_model.share_memory()
dn_model.share_memory()
def predict(sentences =[], tokenizer=tokenizer, models=(tree_model, dn_model, None)):
    MAX_SENTENCE_LENGTH = 16
    start = time.time()
    input_ids = []
    attention_masks = []
    predictions = []
    tree_model = models[0]
    dn_model = models[1]
    if models[0]:
        print("Tree prediction")
    if models[1]:
        print("Dn prediction")
    for sent in sentences:
        encoded_dict = tokenizer.encode_plus(
            sent,
            add_special_tokens = True,
            max_length = MAX_SENTENCE_LENGTH,
            pad_to_max_length = True,
            return_attention_mask = True,
            return_tensors = 'pt',
        )
        # Add the encoded sentence to the list.
        input_ids.append(encoded_dict['input_ids'])
        # And its attention mask (simply differentiates padding from non-padding).
        attention_masks.append(encoded_dict['attention_mask'])
    if tree_model:
        print("Tree enc done")
    if dn_model:
        print("Dn enc done")
    # Convert the lists into tensors.
    new_input_ids = torch.cat(input_ids, dim=0)
    new_attention_masks = torch.cat(attention_masks, dim=0)
    with torch.no_grad():
        # Forward pass, calculate logit predictions
        if tree_model:
            print("Begin tree prediction")
            outputs = tree_model(new_input_ids,
                                 attention_mask=new_attention_masks)
            print("End tree predictions")
        else:
            print("Begin dn predictions")
            outputs = dn_model(new_input_ids,
                               attention_mask=new_attention_masks)
            print("End dn predictions")
    logits = outputs[0]
    logits = logits.detach().cpu()
    print(time.time() - start)
    predictions = logits
    return predictions
def get_tree_prediction(sentence, tokenizer=tokenizer, models=(tree_model, dn_model, None)):
    return predict(sentences=[sentence], tokenizer=tokenizer, models=models)

def get_dn_prediction(sentence, tokenizer=tokenizer, models=(tree_model, dn_model, None)):
    return predict(sentences=[sentence], tokenizer=tokenizer, models=models)

if __name__ == '__main__':
    sentence = "hello world"
    processes = []
    get_tree_prediction(sentence, tokenizer, (tree_model, None, None))
    get_dn_prediction(sentence, tokenizer, (None, dn_model, None))
    print("----------Done sequential predictions-------------")
    print('\n--------Start Parallel predictions--------------')
    tr_p = mp.Process(target=get_tree_prediction, args=(sentence, tokenizer,
                                                        (tree_model, None, None)))
    tr_p.start()
    processes.append(tr_p)
    dn_p = mp.Process(target=get_dn_prediction, args=(sentence, tokenizer,
                                                      (None, dn_model, None)))
    dn_p.start()
    processes.append(dn_p)
    for p in processes:
        p.join()
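
A common cause of exactly this symptom is fork-based multiprocessing combined with PyTorch's intra-op thread pool: the forked children inherit locked OpenMP/MKL thread state from a parent that has already run a forward pass, and the first matrix multiplication in the child hangs. Two things that often help, sketched below against the script above rather than verified on it, are switching to the spawn start method and/or keeping each process single-threaded:

if __name__ == '__main__':
    # 'spawn' starts fresh interpreters instead of forking, so the children
    # do not inherit the parent's (possibly locked) thread-pool state.
    # Note the module-level code, including model loading, re-runs per child.
    mp.set_start_method('spawn', force=True)
    # Alternatively (or additionally), keep PyTorch single-threaded so the
    # workers cannot hang inside the OpenMP/MKL pool.
    torch.set_num_threads(1)

    sentence = "hello world"
    processes = []
    for target, models in [(get_tree_prediction, (tree_model, None, None)),
                           (get_dn_prediction, (None, dn_model, None))]:
        p = mp.Process(target=target, args=(sentence, tokenizer, models))
        p.start()
        processes.append(p)
    for p in processes:
        p.join()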

Related

SHAP Error - OOM when allocating tensor with shape[23020,128,768]

I am trying to get shap values using shap.KernelExplainer for a bert classifier implemented using keras layers. The error that I get occurs because there is not enough memory, but I am not sure what is causing it, because I have already reduced all the parameters as much as I can.
Error:
ResourceExhaustedError: OOM when allocating tensor with shape[23020,128,768] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc [Op:Tile]
Transformer Model:
def process_sentences(sentence: List[str],
                      tokenizer: PreTrainedTokenizer,
                      max_len: int) -> Dict[str, np.ndarray]:
    """
    Tokenize the text sentences.

    Parameters
    ----------
    sentence:
        Sentence to be processed.
    tokenizer:
        Tokenizer to be used.

    Returns
    -------
    Tokenized representation containing:
        - input_ids
        - attention_mask
    """
    # since we are using the model for classification, we need to include the special chars (i.e. '[CLS]', '[SEP]')
    # check the example here: https://huggingface.co/transformers/v4.4.2/quicktour.html
    z = tokenizer(sentence,
                  add_special_tokens=True,
                  padding='max_length',
                  max_length=max_len,
                  truncation=True,
                  return_attention_mask=True,
                  return_tensors='np')
    return z
use_bert = True

if use_bert:
    from transformers import BertTokenizerFast
    tokenizer = BertTokenizerFast.from_pretrained('bert-base-uncased')
else:
    from transformers import DistilBertTokenizerFast
    tokenizer = DistilBertTokenizerFast.from_pretrained('distilbert-base-uncased')

if use_bert:
    from transformers import TFBertModel, BertConfig
    config = BertConfig(output_hidden_states=True)
    transformer = TFBertModel.from_pretrained('bert-base-uncased', config=config)
else:
    from transformers import TFDistilBertModel, DistilBertConfig
    config = DistilBertConfig(output_hidden_states=True)
    transformer = TFDistilBertModel.from_pretrained('distilbert-base-uncased', config=config)

X_train = train.clean_review.values.tolist()
X_test = test.clean_review.values.tolist()
# tokenize datasets
X_train = process_sentences(X_train, tokenizer, max_len)
X_test = process_sentences(X_test, tokenizer, max_len)
y_train = y_train.rate
y_test = y_test.rate
class Classifier(tf.keras.Model):
    def __init__(self,
                 transformer,
                 hidden_dims: int = 128,
                 output_dims: int = 2,
                 dropout_rate: float = 0.2):
        """
        Constructor

        Parameters
        ----------
        transformer:
            Transformer model to be leveraged.
        hidden_dims:
            Hidden layer's dimension.
        output_dims:
            Output layer's dimension.
        dropout_rate:
            Dropout layer's dropout rate.
        """
        super().__init__()
        self.hidden_dims = hidden_dims
        self.output_dims = output_dims
        self.dropout_rate = dropout_rate
        self.transformer = transformer
        self.dense_1 = tf.keras.layers.Dense(self.hidden_dims, activation='relu')
        self.dropout_1 = tf.keras.layers.Dropout(self.dropout_rate)
        self.dense_2 = tf.keras.layers.Dense(self.output_dims, activation='softmax')

    def call(self,
             input_ids: Union[np.ndarray, tf.Tensor],
             attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None,
             training=False):
        """
        Performs a forward pass through the model.

        Parameters
        ----------
        input_ids:
            Indices of input sequence tokens in the vocabulary.
        attention_mask:
            Mask to avoid performing attention on padding token indices.

        Returns
        -------
        Classification probabilities.
        """
        out = self.transformer(input_ids=input_ids, attention_mask=attention_mask, training=training)
        out = out.last_hidden_state[:, 0, :]  # extract the embedding corresponding to the [CLS] token
        out = self.dense_1(out)
        out = self.dropout_1(out, training=training)
        out = self.dense_2(out)
        return out

# define the classification model
model = Classifier(transformer)
SHAP:

## background
X_train = train.clean_review.values.tolist()[:10]
X_train = process_sentences(X_train, tokenizer, max_len)
# tokenize text
tokenized_samples = X_train
X_train = tokenized_samples['input_ids']
# the values of the kwargs have to be `tf.Tensor`.
# see transformers issue #14404: https://github.com/huggingface/transformers/issues/14404
kwargs_train = {k: tf.constant(v) for k, v in tokenized_samples.items() if k == 'attention_mask'}

## SAMPLE
X_test = test.clean_review.values.tolist()[:3]
X_test = process_sentences(X_test, tokenizer, max_len)
# tokenize text
tokenized_samples = X_test
X_test = tokenized_samples['input_ids']
# the values of the kwargs have to be `tf.Tensor`.
# see transformers issue #14404: https://github.com/huggingface/transformers/issues/14404
kwargs_test = {k: tf.constant(v) for k, v in tokenized_samples.items() if k == 'attention_mask'}

kernel_explainer = shap.KernelExplainer(model, X_train)
kernel_shap_values = kernel_explainer.shap_values(X_test)
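
The shape [23020, 128, 768] is the transformer's hidden-state tensor for every perturbed sample KernelExplainer generates for one explained instance, all pushed through the model in a single call. A common workaround is to hand KernelExplainer a prediction function that evaluates the model in small chunks instead of the model object itself; a sketch under that assumption (predict_in_batches and the batch size of 32 are hypothetical, not part of the original code):

import numpy as np

def predict_in_batches(input_ids: np.ndarray, batch_size: int = 32) -> np.ndarray:
    # Evaluate the classifier chunk by chunk so the GPU never sees the
    # full set of perturbed samples at once.
    outputs = []
    for i in range(0, len(input_ids), batch_size):
        batch = tf.constant(input_ids[i:i + batch_size])
        outputs.append(model(batch).numpy())
    return np.concatenate(outputs, axis=0)

kernel_explainer = shap.KernelExplainer(predict_in_batches, X_train)
kernel_shap_values = kernel_explainer.shap_values(X_test)

If memory is still tight, passing a smaller nsamples to shap_values(...) shrinks the perturbation matrix itself.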

Issues calculating accuracy for custom BERT model

I'm having some issues trying to calculate the accuracy of a custom BERT model which also uses the pretrained model from Huggingface. This is the code that I have:
import numpy as np
import pandas as pd
from sklearn import metrics, linear_model
import torch
from torch.utils.data import Dataset, DataLoader, RandomSampler, SequentialSampler
from transformers import BertTokenizer, BertModel
from torch import cuda
import re
import torch.nn as nn

device = 'cuda' if cuda.is_available() else 'cpu'

MAX_LEN = 200
TRAIN_BATCH_SIZE = 8  # 12, 64
VALID_BATCH_SIZE = 4
EPOCHS = 1
LEARNING_RATE = 1e-4  # 3e-4, 1e-4, 5e-5, 3e-5
tokenizer = BertTokenizer.from_pretrained('bert-base-multilingual-uncased')

file1 = open('test.txt', 'r')
list_com = []
list_label = []
for line in file1:
    possible_labels = 'positive|negative'
    label = re.findall(possible_labels, line)
    line = re.sub(possible_labels, ' ', line)
    line = re.sub('\n', ' ', line)
    list_com.append(line)
    list_label.append(label[0])
list_tuples = list(zip(list_com, list_label))
file1.close()

labels = ['positive', 'negative']
df = pd.DataFrame(list_tuples, columns=['review', 'sentiment'])
df['sentiment'] = df['sentiment'].map({'positive': 1, 'negative': 0})
for i in range(0, len(df['sentiment'])):
    list_label[i] = df['sentiment'][i]
#print(df)
class CustomDataset(Dataset):
    def __init__(self, dataframe, tokenizer, max_len):
        self.tokenizer = tokenizer
        self.data = dataframe
        self.comment_text = dataframe.review
        self.targets = self.data.sentiment
        self.max_len = max_len

    def __len__(self):
        return len(self.comment_text)

    def __getitem__(self, index):
        comment_text = str(self.comment_text[index])
        comment_text = " ".join(comment_text.split())
        inputs = self.tokenizer.encode_plus(comment_text, None, add_special_tokens=True, max_length=self.max_len,
                                            pad_to_max_length=True, return_token_type_ids=False, truncation=True)
        ids = inputs['input_ids']
        mask = inputs['attention_mask']
        return {
            'ids': torch.tensor(ids, dtype=torch.long),
            'mask': torch.tensor(mask, dtype=torch.long),
            'targets': torch.tensor(self.targets[index], dtype=torch.float)
        }
train_size = 0.8
train_dataset=df.sample(frac=train_size,random_state=200)
test_dataset=df.drop(train_dataset.index).reset_index(drop=True)
train_dataset = train_dataset.reset_index(drop=True)
print("FULL Dataset: {}".format(df.shape))
print("TRAIN Dataset: {}".format(train_dataset.shape))
print("TEST Dataset: {}".format(test_dataset.shape))
training_set = CustomDataset(train_dataset, tokenizer, MAX_LEN)
testing_set = CustomDataset(test_dataset, tokenizer, MAX_LEN)
train_params = {'batch_size': TRAIN_BATCH_SIZE,'shuffle': True,'num_workers': 0}
test_params = {'batch_size': VALID_BATCH_SIZE,'shuffle': True,'num_workers': 0}
training_loader = DataLoader(training_set, **train_params)
testing_loader = DataLoader(testing_set, **test_params)
class BERTClass(torch.nn.Module):
    def __init__(self):
        super(BERTClass, self).__init__()
        self.bert = BertModel.from_pretrained('bert-base-multilingual-uncased', return_dict=False, num_labels=2)
        self.lstm = nn.LSTM(768, 256, batch_first=True, bidirectional=True)
        self.linear = nn.Linear(256 * 2, 2)

    def forward(self, ids, mask):
        sequence_output, pooled_output = self.bert(ids, attention_mask=mask)
        lstm_output, (h, c) = self.lstm(sequence_output)  ## extract the 1st token's embeddings
        hidden = torch.cat((lstm_output[:, -1, :256], lstm_output[:, 0, 256:]), dim=-1)
        linear_output = self.linear(lstm_output[:, -1].view(-1, 256 * 2))
        return linear_output

model = BERTClass()
model.to(device)
#print(model)

def loss_fn(outputs, targets):
    return torch.nn.CrossEntropyLoss()(outputs, targets)
optimizer = torch.optim.Adam(params=model.parameters(), lr=LEARNING_RATE)

def train(epoch):
    model.train()
    for _, data in enumerate(training_loader, 0):
        ids = data['ids'].to(device, dtype=torch.long)
        mask = data['mask'].to(device, dtype=torch.long)
        targets = data['targets'].to(device, dtype=torch.long)
        outputs = model(ids, mask)
        optimizer.zero_grad()
        loss = loss_fn(outputs, targets)
        if _ % 1000 == 0:
            print(f'Epoch: {epoch}, Loss: {loss.item()}')
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

for epoch in range(EPOCHS):
    train(epoch)
def validation(epoch):
    model.eval()
    fin_targets = []
    fin_outputs = []
    with torch.no_grad():
        for _, data in enumerate(testing_loader, 0):
            ids = data['ids'].to(device, dtype=torch.long)
            mask = data['mask'].to(device, dtype=torch.long)
            targets = data['targets'].to(device, dtype=torch.float)
            outputs = model(ids, mask)
            fin_targets.extend(targets.cpu().detach().numpy().tolist())
            fin_outputs.extend(torch.sigmoid(outputs).cpu().detach().numpy().tolist())
    return fin_outputs, fin_targets

for epoch in range(EPOCHS):
    outputs, targets = validation(epoch)
    outputs = np.array(outputs) >= 0.5
    accuracy = metrics.accuracy_score(targets, outputs)
    print(f"Accuracy Score = {accuracy}")

torch.save(model.state_dict(), 'model.pt')  # note: state_dict() must be called, not passed as a method
print(f'Model saved!')
It should be a binary classification, positive (1) or negative (0), but when I try to compute the accuracy I get the error ValueError: Classification metrics can't handle a mix of binary and multilabel-indicator targets on this line: accuracy = metrics.accuracy_score(targets, outputs). The outputs look like this:
[[ True False]
[False False]
[ True False]
[ True False]
[ True False]
[ True False]
[ True False]
[False True]
[ True False]
[ True False]
[False True]]
Can someone advise what the fix for this would be? Or if there is something else that could improve this? Also, I saved the model and I want to know how I can use the saved model to classify user input in another .py file (assuming we enter a sentence from the keyboard and want the model to classify it).
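
The error arises because outputs is an (N, 2) indicator matrix (one boolean per class) while targets is a flat list of 0/1 labels. Since the model emits one logit per class, the usual fix is to take the argmax over the class dimension; and for scoring user input in another .py file, the state dict saved above can be reloaded into a freshly constructed model. A sketch, assuming BERTClass, tokenizer, MAX_LEN and device are importable or redefined there:

import numpy as np
import torch
from sklearn import metrics

# Collapse the two per-class scores to one predicted label per sample.
pred_labels = np.argmax(np.array(outputs), axis=1)
accuracy = metrics.accuracy_score(targets, pred_labels)
print(f"Accuracy Score = {accuracy}")

# In another .py file: rebuild the architecture, load the weights, classify.
model = BERTClass()
model.load_state_dict(torch.load('model.pt', map_location=device))
model.eval()

def classify(sentence: str) -> int:
    inputs = tokenizer.encode_plus(sentence, None, add_special_tokens=True,
                                   max_length=MAX_LEN, pad_to_max_length=True,
                                   return_token_type_ids=False, truncation=True)
    ids = torch.tensor([inputs['input_ids']], dtype=torch.long).to(device)
    mask = torch.tensor([inputs['attention_mask']], dtype=torch.long).to(device)
    with torch.no_grad():
        logits = model(ids, mask)
    return int(logits.argmax(dim=1).item())  # 1 = positive, 0 = negative

print(classify(input("Enter a sentence: ")))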

Gradient-based saliency of input words in a pytorch model from transformers library

The following code is used to determine the impact of input words on the most probable output unit.
import numpy as np
import torch

def _register_embedding_list_hook(model, embeddings_list):
    def forward_hook(module, inputs, output):
        embeddings_list.append(output.squeeze(0).clone().cpu().detach().numpy())
    embedding_layer = model.bert.embeddings.word_embeddings
    handle = embedding_layer.register_forward_hook(forward_hook)
    return handle

def _register_embedding_gradient_hooks(model, embeddings_gradients):
    def hook_layers(module, grad_in, grad_out):
        embeddings_gradients.append(grad_out[0])
    embedding_layer = model.bert.embeddings.word_embeddings
    hook = embedding_layer.register_backward_hook(hook_layers)
    return hook

def saliency_map(model, input_ids, segment_ids, input_mask):
    torch.enable_grad()
    model.eval()
    embeddings_list = []
    handle = _register_embedding_list_hook(model, embeddings_list)
    embeddings_gradients = []
    hook = _register_embedding_gradient_hooks(model, embeddings_gradients)
    model.zero_grad()
    A = model(input_ids, token_type_ids=segment_ids, attention_mask=input_mask)
    pred_label_ids = np.argmax(A.logits[0].detach().numpy())
    A.logits[0][pred_label_ids].backward()
    handle.remove()
    hook.remove()
    saliency_grad = embeddings_gradients[0].detach().cpu().numpy()
    saliency_grad = np.sum(saliency_grad[0] * embeddings_list[0], axis=1)
    norm = np.linalg.norm(saliency_grad, ord=1)
    saliency_grad = [e / norm for e in saliency_grad]
    return saliency_grad
which is used in the following way (for a sentiment analysis model):
from transformers import AutoTokenizer, AutoModelForSequenceClassification
tokenizer = AutoTokenizer.from_pretrained("textattack/bert-base-uncased-imdb")
model = AutoModelForSequenceClassification.from_pretrained("textattack/bert-base-uncased-imdb")
tokens = tokenizer('A really bad movie')
input_ids = torch.tensor([tokens['input_ids']], dtype=torch.long)
token_type_ids = torch.tensor([tokens['token_type_ids']], dtype=torch.long)
attention_ids = torch.tensor([tokens['attention_mask']], dtype=torch.long)
saliency_scores = saliency_map(model, input_ids,
                               token_type_ids,
                               attention_ids)
But it produces the following scores for the tokens, which are nonsense since, for example, "bad" has a negative effect on the predicted class (which is negative). What's wrong with this code?
Here are some more examples: [token score plots shown as images in the original post]
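
One way to sidestep the hook machinery, whose register_backward_hook semantics are deprecated and not guaranteed to deliver the gradient of the embedding output, is to feed the model inputs_embeds directly and read the gradient off that leaf tensor. A sketch of the same gradient × input score computed that way (the helper name saliency_via_embeds is hypothetical; it assumes the model and tensors built above):

import numpy as np
import torch

def saliency_via_embeds(model, input_ids, token_type_ids, attention_mask):
    # Embed the tokens manually so the embeddings are a leaf tensor
    # whose gradient PyTorch retains after backward().
    embed_layer = model.bert.embeddings.word_embeddings
    inputs_embeds = embed_layer(input_ids).detach().requires_grad_(True)
    out = model(inputs_embeds=inputs_embeds,
                token_type_ids=token_type_ids,
                attention_mask=attention_mask)
    pred = out.logits[0].argmax()
    out.logits[0, pred].backward()
    # Signed per-token contribution to the predicted logit (gradient x input),
    # L1-normalised as in the original code.
    scores = (inputs_embeds.grad[0] * inputs_embeds[0]).sum(dim=1)
    return (scores / scores.abs().sum()).detach().numpy()

saliency_scores = saliency_via_embeds(model, input_ids, token_type_ids, attention_ids)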

Python scikit svm "Vocabulary not fitted or provided"

Playing around with scikit-learn's SVM LinearSVC (Linear Support Vector Classification), I'm running into an error when I attempt to make predictions:
import pickle
from sklearn import svm
from sklearn.feature_extraction.text import TfidfVectorizer
from nltk.stem import PorterStemmer
from nltk import word_tokenize
import string

# Function to pass the list to the Tf-idf vectorizer
def returnPhrase(inputList):
    return inputList

# Pre-processing the sentence which we input to predict the emotion
def transformSentence(sentence):
    s = []
    sentence = sentence.replace('\n', '')
    sentTokenized = word_tokenize(sentence)
    s.append(sentTokenized)
    sWithoutPunct = []
    punctList = list(string.punctuation)
    curSentList = s[0]
    newSentList = []
    for word in curSentList:
        if word.lower() not in punctList:
            newSentList.append(word.lower())
    sWithoutPunct.append(newSentList)
    mystemmer = PorterStemmer()
    tokenizedStemmed = []
    for i in range(0, len(sWithoutPunct)):
        curList = sWithoutPunct[i]
        newList = []
        for word in curList:
            newList.append(mystemmer.stem(word))
        tokenizedStemmed.append(newList)
    return tokenizedStemmed

# Extracting the features for SVM
myVectorizer = TfidfVectorizer(analyzer='word', tokenizer=returnPhrase, preprocessor=returnPhrase,
                               token_pattern=None,
                               ngram_range=(1, 3))

# The SVM Model
curC = 2  # cost factor in SVM
SVMClassifier = svm.LinearSVC(C=curC)

filename = 'finalized_model.sav'
# load the model from disk
loaded_model = pickle.load(open(filename, 'rb'))

# Input sentence
with open('trial_truth_001.txt', 'r') as file:
    sent = file.read().replace('\n', '')

transformedTest = transformSentence(sent)
X_test = myVectorizer.transform(transformedTest).toarray()
Prediction = loaded_model.predict(X_test)
# Printing the predicted emotion
print(Prediction)
It's when I attempt to use the LinearSVC to predict that I'm informed:
sklearn.exceptions.NotFittedError: Vocabulary not fitted or provided
What am I missing here? Obviously it is the way I fit and transform the data.
I think you just have to change the line
X_test = myVectorizer.transform(transformedTest).toarray()
to
X_test = myVectorizer.fit_transform(transformedTest).toarray()
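
Note, though, that fit_transform on the test sentence alone builds a vocabulary from just that sentence, so the resulting feature space will not match the one the classifier was trained on. The more robust pattern is to pickle the vectorizer in the training script, after it was fitted on the training data, and load it next to the model; a sketch (the filename vectorizer.sav is hypothetical):

import pickle

# In the training script, right after fitting on the training data:
with open('vectorizer.sav', 'wb') as f:
    pickle.dump(myVectorizer, f)

# At prediction time, load the *fitted* vectorizer and only transform():
with open('vectorizer.sav', 'rb') as f:
    fittedVectorizer = pickle.load(f)
X_test = fittedVectorizer.transform(transformedTest).toarray()
Prediction = loaded_model.predict(X_test)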

Pytorch - Implement the same model in pytorch and keras but got different results

I am learning pytorch and want to practice it with a keras example (https://keras.io/examples/lstm_seq2seq/); this is a seq2seq 101 example which translates eng to fra on char-level features (no embedding).
Keras code is below:
from keras.models import Model
from keras.layers import Input, LSTM, Dense
import numpy as np

batch_size = 64      # Batch size for training.
epochs = 100         # Number of epochs to train for.
latent_dim = 256     # Latent dimensionality of the encoding space.
num_samples = 10000  # Number of samples to train on.
# Path to the data txt file on disk.
data_path = 'fra-eng/fra.txt'

# Vectorize the data.
input_texts = []
target_texts = []
input_characters = set()
target_characters = set()
with open(data_path, 'r', encoding='utf-8') as f:
    lines = f.read().split('\n')
for line in lines[: min(num_samples, len(lines) - 1)]:
    input_text, target_text = line.split('\t')
    # We use "tab" as the "start sequence" character
    # for the targets, and "\n" as "end sequence" character.
    target_text = '\t' + target_text + '\n'
    input_texts.append(input_text)
    target_texts.append(target_text)
    for char in input_text:
        if char not in input_characters:
            input_characters.add(char)
    for char in target_text:
        if char not in target_characters:
            target_characters.add(char)

input_characters = sorted(list(input_characters))
target_characters = sorted(list(target_characters))
num_encoder_tokens = len(input_characters)
num_decoder_tokens = len(target_characters)
max_encoder_seq_length = max([len(txt) for txt in input_texts])
max_decoder_seq_length = max([len(txt) for txt in target_texts])

print('Number of samples:', len(input_texts))
print('Number of unique input tokens:', num_encoder_tokens)
print('Number of unique output tokens:', num_decoder_tokens)
print('Max sequence length for inputs:', max_encoder_seq_length)
print('Max sequence length for outputs:', max_decoder_seq_length)

input_token_index = dict(
    [(char, i) for i, char in enumerate(input_characters)])
target_token_index = dict(
    [(char, i) for i, char in enumerate(target_characters)])

encoder_input_data = np.zeros(
    (len(input_texts), max_encoder_seq_length, num_encoder_tokens),
    dtype='float32')
decoder_input_data = np.zeros(
    (len(input_texts), max_decoder_seq_length, num_decoder_tokens),
    dtype='float32')
decoder_target_data = np.zeros(
    (len(input_texts), max_decoder_seq_length, num_decoder_tokens),
    dtype='float32')

for i, (input_text, target_text) in enumerate(zip(input_texts, target_texts)):
    for t, char in enumerate(input_text):
        encoder_input_data[i, t, input_token_index[char]] = 1.
    for t, char in enumerate(target_text):
        # decoder_target_data is ahead of decoder_input_data by one timestep
        decoder_input_data[i, t, target_token_index[char]] = 1.
        if t > 0:
            # decoder_target_data will be ahead by one timestep
            # and will not include the start character.
            decoder_target_data[i, t - 1, target_token_index[char]] = 1.

# Define an input sequence and process it.
encoder_inputs = Input(shape=(None, num_encoder_tokens))
encoder = LSTM(latent_dim, return_state=True)
encoder_outputs, state_h, state_c = encoder(encoder_inputs)
# We discard `encoder_outputs` and only keep the states.
encoder_states = [state_h, state_c]

# Set up the decoder, using `encoder_states` as initial state.
decoder_inputs = Input(shape=(None, num_decoder_tokens))
# We set up our decoder to return full output sequences,
# and to return internal states as well. We don't use the
# return states in the training model, but we will use them in inference.
decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_inputs,
                                     initial_state=encoder_states)
decoder_dense = Dense(num_decoder_tokens, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)

# Define the model that will turn
# `encoder_input_data` & `decoder_input_data` into `decoder_target_data`
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)

# Run training
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
model.fit([encoder_input_data, decoder_input_data], decoder_target_data,
          batch_size=batch_size,
          epochs=epochs,
          validation_split=0.2)
# Save model
model.save('s2s.h5')

# Next: inference mode (sampling).
# Here's the drill:
# 1) encode input and retrieve initial decoder state
# 2) run one step of decoder with this initial state
#    and a "start of sequence" token as target.
#    Output will be the next target token
# 3) Repeat with the current target token and current states

# Define sampling models
encoder_model = Model(encoder_inputs, encoder_states)

decoder_state_input_h = Input(shape=(latent_dim,))
decoder_state_input_c = Input(shape=(latent_dim,))
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]
decoder_outputs, state_h, state_c = decoder_lstm(
    decoder_inputs, initial_state=decoder_states_inputs)
decoder_states = [state_h, state_c]
decoder_outputs = decoder_dense(decoder_outputs)
decoder_model = Model(
    [decoder_inputs] + decoder_states_inputs,
    [decoder_outputs] + decoder_states)

# Reverse-lookup token index to decode sequences back to
# something readable.
reverse_input_char_index = dict(
    (i, char) for char, i in input_token_index.items())
reverse_target_char_index = dict(
    (i, char) for char, i in target_token_index.items())

def decode_sequence(input_seq):
    # Encode the input as state vectors.
    states_value = encoder_model.predict(input_seq)

    # Generate empty target sequence of length 1.
    target_seq = np.zeros((1, 1, num_decoder_tokens))
    # Populate the first character of target sequence with the start character.
    target_seq[0, 0, target_token_index['\t']] = 1.

    # Sampling loop for a batch of sequences
    # (to simplify, here we assume a batch of size 1).
    stop_condition = False
    decoded_sentence = ''
    while not stop_condition:
        output_tokens, h, c = decoder_model.predict(
            [target_seq] + states_value)

        # Sample a token
        sampled_token_index = np.argmax(output_tokens[0, -1, :])
        sampled_char = reverse_target_char_index[sampled_token_index]
        decoded_sentence += sampled_char

        # Exit condition: either hit max length
        # or find stop character.
        if (sampled_char == '\n' or
                len(decoded_sentence) > max_decoder_seq_length):
            stop_condition = True

        # Update the target sequence (of length 1).
        target_seq = np.zeros((1, 1, num_decoder_tokens))
        target_seq[0, 0, sampled_token_index] = 1.

        # Update states
        states_value = [h, c]

    return decoded_sentence

for seq_index in range(100):
    # Take one sequence (part of the training set)
    # for trying out decoding.
    input_seq = encoder_input_data[seq_index: seq_index + 1]
    decoded_sentence = decode_sequence(input_seq)
    print('-')
    print('Input sentence:', input_texts[seq_index])
    print('Decoded sentence:', decoded_sentence)
I want to implement this exact same model using pytorch, below is my code:
from __future__ import unicode_literals, print_function, division
from io import open
import unicodedata
import string
import re
import random
import numpy as np
import torch
import torch.nn as nn
from torch import optim
import torch.nn.functional as F

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

num_samples = 10000  # number of samples to train on, as in the Keras script
data_path = './eng_fra.txt'

# Vectorize the data.
input_texts = []
target_texts = []
input_characters = set()
target_characters = set()
with open(data_path, 'r', encoding='utf-8') as f:
    lines = f.read().split('\n')
for line in lines[: min(num_samples, len(lines) - 1)]:
    #print('line:', line)
    input_text, target_text = line.split('\t')
    # We use "tab" as the "start sequence" character
    # for the targets, and "\n" as "end sequence" character.
    target_text = '\t' + target_text + '\n'
    # print('input_text and target_text:', input_text, target_text)
    input_texts.append(input_text)
    target_texts.append(target_text)
    for char in input_text:
        if char not in input_characters:
            input_characters.add(char)
    for char in target_text:
        if char not in target_characters:
            target_characters.add(char)

input_characters = sorted(list(input_characters))
target_characters = sorted(list(target_characters))
num_encoder_tokens = len(input_characters)
print('input_characters', input_characters)
num_decoder_tokens = len(target_characters)
print('target_characters', target_characters)
max_encoder_seq_length = max([len(txt) for txt in input_texts])
max_decoder_seq_length = max([len(txt) for txt in target_texts])
print('max_encoder_seq_length and max_decoder_seq_length', max_encoder_seq_length, max_decoder_seq_length)

input_token_index = dict(
    [(char, i) for i, char in enumerate(input_characters)])
target_token_index = dict(
    [(char, i) for i, char in enumerate(target_characters)])

# define the shapes
encoder_input_data = np.zeros(
    (len(input_texts), max_encoder_seq_length, num_encoder_tokens),
    dtype='float32')
decoder_input_data = np.zeros(
    (len(input_texts), max_decoder_seq_length, num_decoder_tokens),
    dtype='float32')
decoder_target_data = np.zeros(
    (len(input_texts), max_decoder_seq_length, num_decoder_tokens),
    dtype='float32')

# one hot encoding for each char in each sentence
for i, (input_text, target_text) in enumerate(zip(input_texts, target_texts)):
    for t, char in enumerate(input_text):
        encoder_input_data[i, t, input_token_index[char]] = 1.
    for t, char in enumerate(target_text):
        # decoder_target_data is ahead of decoder_input_data by one timestep
        decoder_input_data[i, t, target_token_index[char]] = 1.
        if t > 0:
            # decoder_target_data will be ahead by one timestep
            # and will not include the start character.
            decoder_target_data[i, t - 1, target_token_index[char]] = 1.

encoder_input_data = torch.Tensor(encoder_input_data).to(device)
decoder_input_data = torch.Tensor(decoder_input_data).to(device)
decoder_target_data = torch.Tensor(decoder_target_data).to(device)

class encoder(nn.Module):
    def __init__(self):
        super(encoder, self).__init__()
        self.LSTM = nn.LSTM(input_size=num_encoder_tokens, hidden_size=256, batch_first=True)

    def forward(self, x):
        out, (h, c) = self.LSTM(x)
        return h, c

class decoder(nn.Module):
    def __init__(self):
        super(decoder, self).__init__()
        self.LSTM = nn.LSTM(input_size=num_decoder_tokens, hidden_size=256, batch_first=True)
        self.FC = nn.Linear(256, num_decoder_tokens)

    def forward(self, x, hidden):
        out, (h, c) = self.LSTM(x, hidden)
        out = self.FC(out)
        return out, (h, c)

class seq2seq(nn.Module):
    def __init__(self, encoder, decoder):
        super(seq2seq, self).__init__()
        self.encoder = encoder
        self.decoder = decoder

    def forward(self, encode_input_data, decode_input_data):
        hidden, cell = self.encoder(encode_input_data)
        output, (hidden, cell) = self.decoder(decode_input_data, (hidden, cell))
        return output

encoder = encoder().to(device)
# encoder_loss = nn.CrossEntropyLoss() # CrossEntropyLoss computes softmax internally in pytorch
# encoder_optimizer = torch.optim.Adam(encoder.parameters(), lr=0.001)
decoder = decoder().to(device)
# decoder_loss = nn.CrossEntropyLoss() # CrossEntropyLoss computes softmax internally in pytorch
# decoder_optimizer = torch.optim.Adam(decoder.parameters(), lr=0.001)
model = seq2seq(encoder, decoder).to(device)
optimizer = optim.RMSprop(model.parameters(), lr=0.01)
loss_fun = nn.CrossEntropyLoss()
# model.train()

num_epochs = 50
batches = np.array_split(range(decoder_target_data.shape[0]), 100)
total_step = len(batches)
for epoch in range(num_epochs):
    for i, batch_ids in enumerate(batches):
        encoder_input = encoder_input_data[batch_ids]
        decoder_input = decoder_input_data[batch_ids]
        decoder_target = decoder_target_data[batch_ids]
        output = model(encoder_input, decoder_input)
        loss = loss_fun(output.view(-1, 93).to(device),
                        decoder_target.view(-1, 93).max(dim=1)[1].to(device))
        # Backward and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if (i + 1) % 20 == 0:
            print('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}'
                  .format(epoch + 1, num_epochs, i + 1, total_step, loss.item()))

# Reverse-lookup token index to decode sequences back to
# something readable.
reverse_input_char_index = dict(
    (i, char) for char, i in input_token_index.items())
reverse_target_char_index = dict(
    (i, char) for char, i in target_token_index.items())

def decode_sequence(input_seq):
    # Encode the input as state vectors.
    h, c = model.encoder(input_seq)
    # Generate empty target sequence of length 1.
    # Populate the first character of target sequence with the start character.
    target_seq = torch.zeros((1, 1, num_decoder_tokens)).to(device)
    target_seq[0, 0, target_token_index['\t']] = 1.
    # Sampling loop for a batch of sequences
    # (to simplify, here we assume a batch of size 1).
    stop_condition = False
    decoded_sentence = ''
    while not stop_condition:
        output_tokens, (h_t, c_t) = model.decoder(target_seq, (h, c))
        # Sample a token
        sampled_token_index = output_tokens.view(-1, 93).squeeze(0).max(dim=0)[1].item()
        sampled_char = reverse_target_char_index[sampled_token_index]
        decoded_sentence += sampled_char
        # Exit condition: either hit max length
        # or find stop character.
        if (sampled_char == '\n' or
                len(decoded_sentence) > max_decoder_seq_length):
            stop_condition = True
        # Update the target sequence (of length 1).
        target_seq = torch.zeros((1, 1, num_decoder_tokens)).to(device)
        target_seq[0, 0, sampled_token_index] = 1.
        # Update states
        h, c = h_t, c_t
    return decoded_sentence

for seq_index in range(100):
    # Take one sequence (part of the training set)
    # for trying out decoding.
    input_seq = encoder_input_data[seq_index: seq_index + 1]
    decoded_sentence = decode_sequence(input_seq)
    print('-')
    print('Input sentence:', input_texts[seq_index])
    print('Decoded sentence:', decoded_sentence)
As you can see, I used exactly the same data processing and model structure. My pytorch version runs without error, but the performance seems worse than the original keras version, judging by the translation results.
One thing that might cause a difference is the loss function (cross_entropy). In pytorch, the cross_entropy loss function does not seem to support one-hot labels directly, so I had to convert the labels to integer class indices. However, I don't think this should make a big difference.
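
For what it's worth, here is that conversion as a standalone sanity check (93 stands in for num_decoder_tokens):

import torch
import torch.nn as nn
import torch.nn.functional as F

loss_fn = nn.CrossEntropyLoss()
logits = torch.randn(4, 93)  # (N, num_classes), raw unnormalised scores
one_hot = F.one_hot(torch.tensor([1, 5, 2, 90]), num_classes=93).float()
# CrossEntropyLoss expects class indices, so collapse each one-hot row first:
loss = loss_fn(logits, one_hot.max(dim=1)[1])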
If you want to run the models, the data can be downloaded from:
https://github.com/jinfagang/pytorch_chatbot/blob/master/datasets/eng-fra.txt
Did I do something wrong in my code? Many thanks
One way to look at the issue would be:

1. Fixing the seeds to the same value in both Pytorch and Keras, although that cannot really guarantee the same output.
2. Weight initialization in Pytorch is different from Keras. Make sure they use the same weight initialization functions.

I've been using this for a problem of mine, and I can say that even with 1 and 2 set up identically there is a high probability of not getting the same results (that could be due to the way Pytorch is implemented).
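
For point 1, a minimal sketch of pinning the seeds on both sides (the seed value is arbitrary; Keras is assumed to run on the TensorFlow backend), with a possible re-initialisation for point 2 appended (it assumes the model object from the question and ignores Keras's unit forget-gate bias):

import random
import numpy as np
import torch
import torch.nn as nn
import tensorflow as tf

SEED = 42
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)   # seeds CPU and CUDA generators
tf.random.set_seed(SEED)  # TF 2.x; on TF 1.x use tf.set_random_seed(SEED)

# Keras LSTMs default to glorot_uniform kernels, orthogonal recurrent
# weights and zero biases, while torch.nn.LSTM samples everything from
# U(-1/sqrt(hidden_size), 1/sqrt(hidden_size)), so match them explicitly:
for name, param in model.named_parameters():
    if 'LSTM.weight_ih' in name:
        nn.init.xavier_uniform_(param)
    elif 'LSTM.weight_hh' in name:
        nn.init.orthogonal_(param)
    elif 'bias' in name:
        nn.init.zeros_(param)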
Hope that helps! Please update us if you managed to resolve your issue.
