Profiling TensorFlow Code to improve efficiency - python-3.x

I’m new to tensorflow. I takes images and returns a preference score for each image, using a pre-trained siamese cnn model. I would like to figure out what parts of my code are expensive. Is there a way to do that with tensorflow or tensorboard? Or does anyone see what parts of my code could be reworked to increase efficiency? Any tips are greatly appreciated.
Code:
# function to get preference scores for unseen items
def pref_score(Item_x,user_x):
from collections import defaultdict
from io import BytesIO
res_x = defaultdict(lambda: defaultdict(float))
for jjj in list(Item_x.keys()):
tstImg3=np.round(np.array(Image.open(BytesIO(Item_x[jjj][b'imgs'])).convert('RGB').resize((224,224)),dtype=np.float32))
rep_gan=tf.reshape(tstImg3, shape=[-1, 224, 224, 3])
with tf.device('/gpu:0'):
gan_image=rep_gan
image=tf.image.resize_nearest_neighbor(images=gan_image, size=[224,224], align_corners=None, name=None)
with tf.variable_scope("DVBPR") as scope:
scope.reuse_variables()
result = CNN(image,1.0)
user=tf.placeholder(dtype=tf.int32,shape=[1])
idx=tf.reduce_sum(tf.matmul(result,tf.transpose(tf.gather(thetau,user))),1)
res_x[jjj] = sess.run([gan_image,idx],feed_dict={user:[user_x]})[1]
print(user_x,"===>",jjj)
return res_x
# running function to get preference scores on images
test_res=pref_score(Item_x=test_item,user_x=1)

Related

resize_token_embeddings on the a pertrained model with different embedding size

I would like to ask about the way to change the embedding size of the trained model.
I have a trained model models/BERT-pretrain-1-step-5000.pkl.
Now I am adding a new token [TRA]to the tokeniser and try to use the resize_token_embeddings to the pertained one.
from pytorch_pretrained_bert_inset import BertModel #BertTokenizer
from transformers import AutoTokenizer
from torch.nn.utils.rnn import pad_sequence
import tqdm
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model_bert = BertModel.from_pretrained('bert-base-uncased', state_dict=torch.load('models/BERT-pretrain-1-step-5000.pkl', map_location=torch.device('cpu')))
#print(tokenizer.all_special_tokens) #--> ['[UNK]', '[SEP]', '[PAD]', '[CLS]', '[MASK]']
#print(tokenizer.all_special_ids) #--> [100, 102, 0, 101, 103]
num_added_toks = tokenizer.add_tokens(['[TRA]'], special_tokens=True)
model_bert.resize_token_embeddings(len(tokenizer)) # --> Embedding(30523, 768)
print('[TRA] token id: ', tokenizer.convert_tokens_to_ids('[TRA]')) # --> 30522
But I encountered the error:
AttributeError: 'BertModel' object has no attribute 'resize_token_embeddings'
I assume that it is because the model_bert(BERT-pretrain-1-step-5000.pkl) I had has the different embedding size.
I would like to know if there is any way to fit the embedding size of my modified tokeniser and the model I would like to use as the initial weights.
Thanks a lot!!
resize_token_embeddings is a huggingface transformer method. You are using the BERTModel class from pytorch_pretrained_bert_inset which does not provide such a method. Looking at the code, it seems like they have copied the BERT code from huggingface some time ago.
You can either wait for an update from INSET (maybe create a github issue) or write your own code to extend the word_embedding layer:
from torch import nn
embedding_layer = model.embeddings.word_embeddings
old_num_tokens, old_embedding_dim = embedding_layer.weight.shape
num_new_tokens = 1
# Creating new embedding layer with more entries
new_embeddings = nn.Embedding(
old_num_tokens + num_new_tokens, old_embedding_dim
)
# Setting device and type accordingly
new_embeddings.to(
embedding_layer.weight.device,
dtype=embedding_layer.weight.dtype,
)
# Copying the old entries
new_embeddings.weight.data[:old_num_tokens, :] = embedding_layer.weight.data[
:old_num_tokens, :
]
model.embeddings.word_embeddings = new_embeddings

Cannot export PyTorch model to ONNX

I am trying to convert a pre-trained torch model to ONNX, but recive the following error:
RuntimeError: step!=1 is currently not supported
I'm trying this on a pre-trained colorization model: https://github.com/richzhang/colorization
Here is the code I ran in Google Colab:
!git clone https://github.com/richzhang/colorization.git
cd colorization/
import colorizers
model = colorizer_siggraph17 = colorizers.siggraph17(pretrained=True).eval()
input_names = [ "input" ]
output_names = [ "output" ]
dummy_input = torch.randn(1, 1, 256, 256, device='cpu')
torch.onnx.export(model, dummy_input, "test_converted_model.onnx", verbose=True,
input_names=input_names, output_names=output_names)
I appreciate any help :)
UPDATE 1: #Proko suggestion solved the ONNX export issue. Now I have a new possibly related problem when I try to convert the ONNX to TensorRT. I get the following error:
[TensorRT] ERROR: Network must have at least one output
Here is the code I used:
import torch
import pycuda.driver as cuda
import pycuda.autoinit
import tensorrt as trt
import onnx
TRT_LOGGER = trt.Logger()
def build_engine(onnx_file_path):
# initialize TensorRT engine and parse ONNX model
builder = trt.Builder(TRT_LOGGER)
builder.max_workspace_size = 1 << 25
builder.max_batch_size = 1
if builder.platform_has_fast_fp16:
builder.fp16_mode = True
network = builder.create_network()
parser = trt.OnnxParser(network, TRT_LOGGER)
# parse ONNX
with open(onnx_file_path, 'rb') as model:
print('Beginning ONNX file parsing')
parser.parse(model.read())
print('Completed parsing of ONNX file')
# generate TensorRT engine optimized for the target platform
print('Building an engine...')
engine = builder.build_cuda_engine(network)
context = engine.create_execution_context()
print("Completed creating Engine")
return engine, context
ONNX_FILE_PATH = 'siggraph17.onnx' # Exported using the code above
engine,_ = build_engine(ONNX_FILE_PATH)
I tried to force the build_engine function to use the output of the network by:
network.mark_output(network.get_layer(network.num_layers-1).get_output(0))
but it did not work.
I appropriate any help!
Like I have mentioned in a comment, this is because slicing in torch.onnx supports only step = 1 but there are 2-step slicing in the model:
self.model2(conv1_2[:,:,::2,::2])
Your only option as for now is to rewrite slicing to be some other ops. You can do it by using range and reshape to obtain proper indices. Consider the following function "step-less-arange" (I hope it is generic enough for anyone with similar problem):
def sla(x, step):
diff = x % step
x += (diff > 0)*(step - diff) # add length to be able to reshape properly
return torch.arange(x).reshape((-1, step))[:, 0]
usage:
>> sla(11, 3)
tensor([0, 3, 6, 9])
Now you can replace every slice like this:
conv2_2 = self.model2(conv1_2[:,:,self.sla(conv1_2.shape[2], 2),:][:,:,:, self.sla(conv1_2.shape[3], 2)])
NOTE: you should optimize it. Indices are calculated for every call so it might be wise to pre-compute it.
I have tested it with my fork of the repo and I was able to save the model:
https://github.com/prokotg/colorization
What works for me was to add the opset_version=11 on torch.onnx.export
First I had tried use opset_version=10, but the API suggest 11 so it works.
So your function should be:
torch.onnx.export(model, dummy_input, "test_converted_model.onnx", verbose=True,opset_version=11,
input_names=input_names, output_names=output_names)

Can we save the result of the Hyperopt Trials with Sparktrials

I am currently trying to optimize the hyperparameters of a gradient boosting method with the library hyperopt. When I was working on my own computer, I used the class Trials and I was able to save and reload my results with the library pickles. This allowed me to have a save of all the set of parameters I tested. My code looked like that :
from hyperopt import SparkTrials, STATUS_OK, tpe, fmin
from LearningUtils.LearningUtils import build_train_test, get_train_test, mean_error, rmse, mae
from LearningUtils.constants import MAX_EVALS, CV, XGBOOST_OPTIM_SPACE, PARALELISM
from sklearn.model_selection import cross_val_score
import pickle as pkl
if os.path.isdir(PATH_TO_TRIALS): #we reload the past results
with open(PATH_TO_TRIALS, 'rb') as trials_file:
trials = pkl.load(trials_file)
else : # We create the trials file
trials = Trials()
# classic hyperparameters optimization
def objective(space):
regressor = xgb.XGBRegressor(n_estimators = space['n_estimators'],
max_depth = int(space['max_depth']),
learning_rate = space['learning_rate'],
gamma = space['gamma'],
min_child_weight = space['min_child_weight'],
subsample = space['subsample'],
colsample_bytree = space['colsample_bytree'],
verbosity=0
)
regressor.fit(X_train, Y_train)
# Applying k-Fold Cross Validation
accuracies = cross_val_score(estimator=regressor, x=X_train, y=Y_train, cv=5)
CrossValMean = accuracies.mean()
return {'loss':1-CrossValMean, 'status': STATUS_OK}
best = fmin(fn=objective,
space=XGBOOST_OPTIM_SPACE,
algo=tpe.suggest,
max_evals=MAX_EVALS,
trials=trials,
return_argmin=False)
# Save the trials
pkl.dump(trials, open(PATH_TO_TRIALS, "wb"))
Now, I would like to make this code work on a distant serveur with more CPUs in order to allow parallelisation and gain time.
I saw that I can simply do that using the SparkTrials class of hyperopt instead ot Trials. But, SparkTrials objects cannot be saved with pickles. Do you have any idea on how I could save and reload my trials results stored in a Sparktrials object ?
so this might be a bit late, but after messing around a bit, I found a kind of hacky solution:
spark_trials= SparkTrials()
pickling_trials = dict()
for k, v in spark_trials.__dict__.items():
if not k in ['_spark_context', '_spark']:
pickling_trials[k] = v
pickle.dump(pickling_trials, open('pickling_trials.hyperopt', 'wb'))
The _spark_context and the _spark attributes of the SparkTrials instance are the culprits of not being able to serialize the object. It turns out that you dont need them if you want to re-use the object, because if you want to re-run the optimization again, a new spark context is created anyway, so you can re use the trials as:
new_sparktrials = SparkTrials()
for att, v in pickling_trials.items():
setattr(new_sparktrials, att, v)
best = fmin(loss_func,
space=search_space,
algo=tpe.suggest,
max_evals=1000,
trials=new_sparktrials)
voilà :)

How to get a specific sample from pytorch DataLoader?

In Pytorch, is there any way of loading a specific single sample using the torch.utils.data.DataLoader class? I'd like to do some testing with it.
The tutorial uses
trainloader = torch.utils.data.DataLoader(...)
images, labels = next(iter(trainloader))
to fetch a random batch of samples. Is there are way, using DataLoader, to get a specific sample?
Cheers
Turn off the shuffle in DataLoader
Use batch_size to calculate the batch in which the desired sample you are looking for falls in
Iterate to the desired batch
Code
import torch
import numpy as np
import itertools
X= np.arange(100)
batch_size = 2
dataloader = torch.utils.data.DataLoader(X, batch_size=batch_size, shuffle=False)
sample_at = 5
k = int(np.floor(sample_at/batch_size))
my_sample = next(itertools.islice(dataloader, k, None))
print (my_sample)
Output:
tensor([4, 5])
if you want to get a specific signle sample from your dataset you can
you should check Subset class.(https://pytorch.org/docs/stable/data.html#torch.utils.data.Subset)
something like this:
indices = [0,1,2] # select your indices here as a list
subset = torch.utils.data.Subset(train_set, indices)
trainloader = DataLoader(subset , batch_size = 16 , shuffle =False) #set shuffle to False
for image , label in trainloader:
print(image.size() , '\t' , label.size())
print(image[0], '\t' , label[0]) # index the specific sample
here is a useful link if you want to learn more about the Pytorch data loading utility
(https://pytorch.org/docs/stable/data.html)

Using multiprocessing.pool load keras model cause predict "ValueError Tensor Tensor"

I have saved more than 1000 models for each item. Now I need to load all these models into memory (a dataframe) to do predictions. If I just use "for" loop to load these models, each loading will be 3 seconds slower than the previous model loading. So I try to use multiprocessing.pool (ThreadPool).
But, strangely, using ThreadPool will cause the prediction "ValueError: Tensor Tensor". If using normal loading, the prediction is fine.
I tried thread also got error msg
#following code will lead to ValueError
from multiprocessing.pool import ThreadPool as Pool
def load_model(stock):
model_pred.at[0, stock] = keras.models.load_model (
'C:/Users/chenp/Documents/rqpro/models/{}_model.h5'.format (stock))
pool = Pool(processes=16)
for stock in trade_stocks['stock']:
pool.map (load_model, (stock,))
#Prediction
for stock in trade_stocks['stock']:
model = model_pred.loc[0, stock]
prediction = model.predict(pred_data)
#Get following msg:
ValueError: Tensor Tensor("dense_9/Softmax:0", shape=(?, 2), dtype=float32) is not an element of this graph.
#Normal code but too low efficient
for stock in trade_stocks['stock']:
model_pred.at[0, stock] = keras.models.load_model(
'C:/Users/chenp/Documents/rqpro/models/{}_model.h5'.format(stock))
#Get following msg:
ValueError: Tensor Tensor("dense_9/Softmax:0", shape=(?, 2), dtype=float32) is not an element of this graph.
This happens as Keras is not thread safe. For solving this problem, please use _make_predict_function() before predicting. For detailed answer, please check

Resources