How can I do simple matmul on edge tpu? - python-3.x

I can't work out how to invoke my .tflite model that does matmul on the coral accelerator using the python api.
The .tflite model is generated from some example code here. It works well using the tf.lite.Interpreter() class but I don't know how to transform it to work with the edgetpu class. I have tried edgetpu.basic.basic_engine.BasicEngine() by changing the models datatype from numpy.float32 to numpy.uint8, but that did not help. I am a complete beginner with TensorFlow and just want to use my tpu for matmul.
import numpy
import tensorflow as tf
import edgetpu
from edgetpu.basic.basic_engine import BasicEngine
def export_tflite_from_session(session, input_nodes, output_nodes, tflite_filename):
print("Converting to tflite...")
converter = tf.lite.TFLiteConverter.from_session(session, input_nodes, output_nodes)
tflite_model = converter.convert()
with open(tflite_filename, "wb") as f:
f.write(tflite_model)
print("Converted %s." % tflite_filename)
#This does matmul just fine but does not use the TPU
def test_tflite_model(tflite_filename, examples):
print("Loading TFLite interpreter for %s..." % tflite_filename)
interpreter = tf.lite.Interpreter(model_path=tflite_filename)
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
print("input details: %s" % input_details)
print("output details: %s" % output_details)
for i, input_tensor in enumerate(input_details):
interpreter.set_tensor(input_tensor['index'], examples[i])
interpreter.invoke()
model_output = []
for i, output_tensor in enumerate(output_details):
model_output.append(interpreter.get_tensor(output_tensor['index']))
return model_output
#this should use the TPU, but I don't know how to run the model or if it needs
#further processing. One matrix can be constant for my use case
def test_tpu(tflite_filename,examples):
print("Loading TFLite interpreter for %s..." % tflite_filename)
#TODO edgetpu.basic
interpreter = BasicEngine(tflite_filename)
interpreter.allocate_tensors()#does not work...
def main():
tflite_filename = "model.tflite"
shape_a = (2, 2)
shape_b = (2, 2)
a = tf.placeholder(dtype=tf.float32, shape=shape_a, name="A")
b = tf.placeholder(dtype=tf.float32, shape=shape_b, name="B")
c = tf.matmul(a, b, name="output")
numpy.random.seed(1234)
a_ = numpy.random.rand(*shape_a).astype(numpy.float32)
b_ = numpy.random.rand(*shape_b).astype(numpy.float32)
with tf.Session() as session:
session_output = session.run(c, feed_dict={a: a_, b: b_})
export_tflite_from_session(session, [a, b], [c], tflite_filename)
tflite_output = test_tflite_model(tflite_filename, [a_, b_])
tflite_output = tflite_output[0]
#test the TPU
tflite_output = test_tpu(tflite_filename, [a_, b_])
print("Input example:")
print(a_)
print(a_.shape)
print(b_)
print(b_.shape)
print("Session output:")
print(session_output)
print(session_output.shape)
print("TFLite output:")
print(tflite_output)
print(tflite_output.shape)
print(numpy.allclose(session_output, tflite_output))
if __name__ == '__main__':
main()

You're only converting your model once, and your model is not fully compiled for the Edge TPU. From the docs:
At the first point in the model graph where an unsupported operation occurs, the compiler partitions the graph into two parts. The first part of the graph that contains only supported operations is compiled into a custom operation that executes on the Edge TPU, and everything else executes on the CPU
There are several specific requirements that the model must meet:
quantization-aware training
constant tensor sizes and model parameters at compile time
tensors are 3-dimensional or smaller.
models only use operations supported by the Edge TPU.
There is an online compiler as well as a CLI version that is useful for translating .tflite models into Edge TPU compatible .tflite models.
Your code is also incomplete. You've passed your model to the class here:
interpreter = BasicEngine(tflite_filename)
but you're missing the step of actually running the inference on the tensor:
output = RunInference(interpreter)

Related

Cannot export PyTorch model to ONNX

I am trying to convert a pre-trained torch model to ONNX, but recive the following error:
RuntimeError: step!=1 is currently not supported
I'm trying this on a pre-trained colorization model: https://github.com/richzhang/colorization
Here is the code I ran in Google Colab:
!git clone https://github.com/richzhang/colorization.git
cd colorization/
import colorizers
model = colorizer_siggraph17 = colorizers.siggraph17(pretrained=True).eval()
input_names = [ "input" ]
output_names = [ "output" ]
dummy_input = torch.randn(1, 1, 256, 256, device='cpu')
torch.onnx.export(model, dummy_input, "test_converted_model.onnx", verbose=True,
input_names=input_names, output_names=output_names)
I appreciate any help :)
UPDATE 1: #Proko suggestion solved the ONNX export issue. Now I have a new possibly related problem when I try to convert the ONNX to TensorRT. I get the following error:
[TensorRT] ERROR: Network must have at least one output
Here is the code I used:
import torch
import pycuda.driver as cuda
import pycuda.autoinit
import tensorrt as trt
import onnx
TRT_LOGGER = trt.Logger()
def build_engine(onnx_file_path):
# initialize TensorRT engine and parse ONNX model
builder = trt.Builder(TRT_LOGGER)
builder.max_workspace_size = 1 << 25
builder.max_batch_size = 1
if builder.platform_has_fast_fp16:
builder.fp16_mode = True
network = builder.create_network()
parser = trt.OnnxParser(network, TRT_LOGGER)
# parse ONNX
with open(onnx_file_path, 'rb') as model:
print('Beginning ONNX file parsing')
parser.parse(model.read())
print('Completed parsing of ONNX file')
# generate TensorRT engine optimized for the target platform
print('Building an engine...')
engine = builder.build_cuda_engine(network)
context = engine.create_execution_context()
print("Completed creating Engine")
return engine, context
ONNX_FILE_PATH = 'siggraph17.onnx' # Exported using the code above
engine,_ = build_engine(ONNX_FILE_PATH)
I tried to force the build_engine function to use the output of the network by:
network.mark_output(network.get_layer(network.num_layers-1).get_output(0))
but it did not work.
I appropriate any help!
Like I have mentioned in a comment, this is because slicing in torch.onnx supports only step = 1 but there are 2-step slicing in the model:
self.model2(conv1_2[:,:,::2,::2])
Your only option as for now is to rewrite slicing to be some other ops. You can do it by using range and reshape to obtain proper indices. Consider the following function "step-less-arange" (I hope it is generic enough for anyone with similar problem):
def sla(x, step):
diff = x % step
x += (diff > 0)*(step - diff) # add length to be able to reshape properly
return torch.arange(x).reshape((-1, step))[:, 0]
usage:
>> sla(11, 3)
tensor([0, 3, 6, 9])
Now you can replace every slice like this:
conv2_2 = self.model2(conv1_2[:,:,self.sla(conv1_2.shape[2], 2),:][:,:,:, self.sla(conv1_2.shape[3], 2)])
NOTE: you should optimize it. Indices are calculated for every call so it might be wise to pre-compute it.
I have tested it with my fork of the repo and I was able to save the model:
https://github.com/prokotg/colorization
What works for me was to add the opset_version=11 on torch.onnx.export
First I had tried use opset_version=10, but the API suggest 11 so it works.
So your function should be:
torch.onnx.export(model, dummy_input, "test_converted_model.onnx", verbose=True,opset_version=11,
input_names=input_names, output_names=output_names)

How to use multiprocessing in PyTorch?

I'm trying to use PyTorch with complex loss function. In order to accelerate the code, I hope that I can use the PyTorch multiprocessing package.
The first trial, I put 10x1 features into the NN and get 10x4 output.
After that, I want to pass 10x4 parameters into a function to do some calculation. (The calculation will be complex in the future.)
After calculating, the function will return a 10x1 array in total. This array will be set as NN_energy and calculate loss function.
Besides, I also want to know if there is another method to create a backward-able array to store the NN_energy array, instead of using
NN_energy = net(Data_in)[0:10,0]
Thanks a lot.
Full Code:
import torch
import numpy as np
from torch.autograd import Variable
from torch import multiprocessing
def func(msg,BOP):
ans = (BOP[msg][0]+BOP[msg][1]/BOP[msg][2])*BOP[msg][3]
return ans
class Net(torch.nn.Module):
def __init__(self, n_feature, n_hidden_1, n_hidden_2, n_output):
super(Net, self).__init__()
self.hidden_1 = torch.nn.Linear(n_feature , n_hidden_1) # hidden layer
self.hidden_2 = torch.nn.Linear(n_hidden_1, n_hidden_2) # hidden layer
self.predict = torch.nn.Linear(n_hidden_2, n_output ) # output layer
def forward(self, x):
x = torch.tanh(self.hidden_1(x)) # activation function for hidden layer
x = torch.tanh(self.hidden_2(x)) # activation function for hidden layer
x = self.predict(x) # linear output
return x
if __name__ == '__main__': # apply_async
Data_in = Variable( torch.from_numpy( np.asarray(list(range( 0,10))).reshape(10,1) ).float() )
Ground_truth = Variable( torch.from_numpy( np.asarray(list(range(20,30))).reshape(10,1) ).float() )
net = Net( n_feature=1 , n_hidden_1=15 , n_hidden_2=15 , n_output=4 ) # define the network
optimizer = torch.optim.Rprop( net.parameters() )
loss_func = torch.nn.MSELoss() # this is for regression mean squared loss
NN_output = net(Data_in)
args = range(0,10)
pool = multiprocessing.Pool()
return_data = pool.map( func, zip(args, NN_output) )
pool.close()
pool.join()
NN_energy = net(Data_in)[0:10,0]
for i in range(0,10):
NN_energy[i] = return_data[i]
loss = torch.sqrt( loss_func( NN_energy , Ground_truth ) ) # must be (1. nn output, 2. target)
print(loss)
Error messages:
File
"C:\ProgramData\Anaconda3\lib\site-packages\torch\multiprocessing\reductions.py",
line 126, in reduce_tensor
raise RuntimeError("Cowardly refusing to serialize non-leaf tensor which requires_grad, "
RuntimeError: Cowardly refusing to serialize non-leaf tensor which
requires_grad, since autograd does not support crossing process
boundaries. If you just want to transfer the data, call detach() on
the tensor before serializing (e.g., putting it on the queue).
First of all, Torch Variable API is deprecated since a very long time, just don't use it.
Next, torch.from_numpy( np.asarray(list(range( 0,10))).reshape(10,1) ).float() is wrong at many levels: np.asarray of list is useless since a copy will be performed anyway, and np.array takes list as input by design. Then, np.arange is available to return a range as numpy array, and it is also available on Torch. Next, specifying both dimension for reshape is useless and error prone, you could simply do reshape((-1, 1)), or even better unsqueeze(-1).
Here is the simplified expression torch.arange(10, dtype=torch.float32, requires_grad=True).unsqueeze(-1).
Using multiprocessing pool is a bad practice if using batch processing is possible. It will be both way more efficient and readable. Indeed, performing N small algebraic operations in parallel is always slower and a larger single algebraic operation, and even more on GPU. More importantly, computing the gradient is not supported by multiprocessing, hence the error that you get. Yet, this is partially true, because it is supports for tensors on cpu since 1.6.0. Have a lok, to the official release changelog.
Could you post a more representative example of what func method could be to make sure you really need it ?
NB: Distributed autograd as you are looking is now available in Pytorch as an experimental feature available in beta since 1.6.0. Have a look to the official documentation.

How to load tfrecord in pytorch?

How to use tfrecord with pytorch?
I have downloaded "Youtube8M" datasets with video-level features, but it is stored in tfrecord.
I tried to read some sample from these file to convert it to numpy and then load in pytorch. But it failed.
reader = YT8MAggregatedFeatureReader()
files = tf.gfile.Glob("/Data/youtube8m/train*.tfrecord")
filename_queue = tf.train.string_input_producer(
files, num_epochs=5, shuffle=True)
training_data = [
reader.prepare_reader(filename_queue) for _ in range(1)
]
unused_video_id, model_input_raw, labels_batch, num_frames = tf.train.shuffle_batch_join(
training_data,
batch_size=1024,
capacity=1024 * 5,
min_after_dequeue=1024,
allow_smaller_final_batch=True ,
enqueue_many=True)
with tf.Session() as sess:
label_numpy = labels_batch.eval()
print(type(label_numpy))
But this step have no result, just stuck for a long while without any response.
One work around is to use tensorflow 1.1* eager mode or tensorflow 2+ to loop through the dataset(so you can use var len feature, use buckets window), then just
torch.as_tensor(val.numpy()).to(device) to use in torch.
You can use the DALI library to load the tfrecords directly in a PyTorch code.
You can find out, how to do it in their documentation.
Maybe this can help you: TFRecord reader for PyTorch
I cooked up this:
class LiTS(torch.utils.data.Dataset):
def __init__(self, filenames):
self.filenames = filenames
def __len__(self):
return len(self.filenames)
def __getitem__(self, idx):
volume, segmentation = None, None
if idx >= len(self):
raise IndexError()
ds = tf.data.TFRecordDataset(filenames[idx:idx+1])
for x, y in ds.map(read_tfrecord):
volume = torch.from_numpy(x.numpy())
segmentation = torch.from_numpy(y.numpy())
return volume, segmentation

Tensorflow : How to get tensor name from Tensorboard?

I downloaded a ssd_mobilenet_v2_coco from tensorflow detection model zoo. And I used import_pb_to_tensorboard.py to show the structure on Tensorboard.
I find a node named 'image_tensor', this is the picture discribed in Tensorboard.
I want to use the function 'get_tensor_by_name()' to input a new image and get the ouputs. However, it failed.
I tried 'get_operation_by_name()' , it didn't work neither.
Here is the code:
import tensorflow as tf
def one_image(im_path, model_path):
sess= tf.Session()
with sess.as_default():
image_tensor = tf.image.decode_jpeg(tf.read_file(im_path), channels=3)
saver = tf.train.import_meta_graph(model_path + "/model.ckpt.meta")
saver.restore(sess, tf.train.latest_checkpoint(model_path))
graph = tf.get_default_graph()
# x = graph.get_tensor_by_name("import/image_tensor:0")
# out_put = graph.get_tensor_by_name("import/detection_classes:0")
x = graph.get_operation_by_name("import/image_tensor").outputs[0]
outputs = graph.get_operation_by_name("import/detection_classes").outputs[0]
out_put = sess.run(outputs, feed_dict={x: image_tensor.eval()})
print(out_put)
sess.close()
if __name__ == "__main__":
one_image("testimg-4-resize.jpg", "ssd_mobilenet_v2_coco_2018_03_29")
And here is the KeyError:
KeyError: "The name 'import/image_tensor' refers to an Operation not in the graph."
I am wondering how to get the tensor name from Tensorboard and whether there is another way to load model from 'only-ckpts'.
'only-ckpts' means files only include 'model.ckpt.data-00000-of-00001' , 'model.ckpt.index' and 'model.ckpt.meta'.
Any advice will be grateful. Thank you in advance.
The tool import_pb_to_tensorboard.py uses tf.import_graph_def to import the graph and uses default name argument, which is "import" as documented.
Your code imports the graph through tf.train.import_meta_graph and uses default import_scope argument, which will not prefix imported tensor or operation name. It is obvious then you have two options to correct this error:
Do the following in place of your import_meta_graph line:
saver = tf.train.import_meta_graph(model_path + "/model.ckpt.meta",
import_scope='import')
Remove import/ prefix when trying to get tensor or operation by name like this:
x = graph.get_tensor_by_name("image_tensor:0")
out_put = graph.get_tensor_by_name("detection_classes:0")
x = graph.get_operation_by_name("image_tensor").outputs[0]
outputs = graph.get_operation_by_name("detection_classes").outputs[0]

Creating a session in a graph that uses another graph and its session

Versions : I am using tensorflow (version : v1.1.0-13-g8ddd727 1.1.0) in python3 (Python 3.4.3 (default, Nov 17 2016, 01:08:31) [GCC 4.8.4] on linux), it is installed from source and GPU-based (name: GeForce GTX TITAN X major: 5 minor: 2 memoryClockRate (GHz) 1.076).
Context : Generative adversarial networks (GANs) learn to synthesise new samples from a high-dimensional distribution by passing samples drawn from a latent space through a generative network. When the high-dimensional distribution describes images of a particular data set, the network should learn to generate visually similar image samples for latent variables that are close to each other in the latent space. For tasks such as image retrieval and image classification, it may be useful to exploit the arrangement of the latent space by projecting images into it, and using this as a representation for discriminative tasks.
Context Problem : I am trying to invert a generator (compute L2 norm between an input image in cifar10 and a image g(z) of the generator, where z is a parameter to be trained with stochastic gradient descent in order to minimize this norm and find an approximation of the preimage of the input image).
Technical Issue : Therefore, I am building a new graph in a new session in tensorflow but I need to use a trained gan that was trained in another session, which I cannot import because the two graphs are not the same. That is to say, when I use sess.run(), the variables are not found and therefore there is a Error Message.
The code is
import tensorflow as tf
from data import cifar10, utilities
from . import dcgan
import logging
logger = logging.getLogger("gan.test")
BATCH_SIZE = 1
random_z = tf.get_variable(name='z_to_invert', shape=[BATCH_SIZE, 100], initializer=tf.random_normal_initializer())
#random_z = tf.random_normal([BATCH_SIZE, 100], mean=0.0, stddev=1.0, name='random_z')
# Generate images with generator
generator = dcgan.generator(random_z, is_training=True, name='generator')
# Add summaries to visualise output images
generator_visualisation = tf.cast(((generator / 2.0) + 0.5) * 255.0, tf.uint8)
summary_generator = tf.summary.\
image('summary/generator', generator_visualisation,
max_outputs=8)
#Create one image to test inverting
test_image = map((lambda inp: (inp[0]*2. -1., inp[1])),
utilities.infinite_generator(cifar10.get_train(), BATCH_SIZE))
inp, _ = next(test_image)
summary_inp = tf.summary.image('input_image', inp)
img_summary = tf.summary.merge([summary_generator, summary_inp])
with tf.name_scope('error'):
error = inp - generator #generator = g(z)
# We set axis = None because norm(tensor, ord=ord) is equivalent to norm(reshape(tensor, [-1]), ord=ord)
error_norm = tf.norm(error, ord=2, axis=None, keep_dims=False, name='L2Norm')
summary_error = tf.summary.scalar('error_norm', error_norm)
with tf.name_scope('Optimizing'):
optimizer = tf.train.AdamOptimizer(0.001).minimize(error_norm, var_list=z)
sv = tf.train.Supervisor(logdir="gan/invert_logs/", save_summaries_secs=None, save_model_secs=None)
batch = 0
with sv.managed_session() as sess:
logwriter = tf.summary.FileWriter("gan/invert_logs/", sess.graph)
while not sv.should_stop():
if batch > 0 and batch % 100 == 0:
logger.debug('Step {} '.format(batch))
(_, s) = sess.run((optimizer, summary_error))
logwriter.add_summary(s, batch)
print('step %d: Patiente un peu poto!' % batch)
img = sess.run(img_summary)
logwriter.add_summary(img, batch)
batch += 1
print(batch)
I understood what is the problem, it is actually that I am trying to run a session which is saved in gan/train_logs but the graph does not have those variables I am trying to run.
Therefore, I tried to implement this instead :
graph = tf.Graph()
tf.reset_default_graph()
with tf.Session(graph=graph) as sess:
ckpt = tf.train.get_checkpoint_state('gan/train_logs/')
saver = tf.train.import_meta_graph(ckpt.model_checkpoint_path + '.meta', clear_devices=True)
saver.restore(sess, ckpt.model_checkpoint_path)
logwriter = tf.summary.FileWriter("gan/invert_logs/", sess.graph)
#inp, _ = next(test_image)
BATCH_SIZE = 1
#Create one image to test inverting
test_image = map((lambda inp: (inp[0]*2. -1., inp[1])),
utilities.infinite_generator(cifar10.get_train(), BATCH_SIZE))
inp, _ = next(test_image)
#M_placeholder = tf.placeholder(tf.float32, shape=cifar10.get_shape_input(), name='M_input')
M_placeholder = inp
zmar = tf.summary.image('input_image', inp)
#Create sample noise from random normal distribution
z = tf.get_variable(name='z', shape=[BATCH_SIZE, 100], initializer=tf.random_normal_initializer())
# Function g(z) zhere z is randomly generated
g_z = dcgan.generator(z, is_training=True, name='generator')
generator_visualisation = tf.cast(((g_z / 2.0) + 0.5) * 255.0, tf.uint8)
sum_generator = tf.summary.image('summary/generator', generator_visualisation)
img_summary = tf.summary.merge([sum_generator, zmar])
with tf.name_scope('error'):
error = M_placeholder - g_z
# We set axis = None because norm(tensor, ord=ord) is equivalent to norm(reshape(tensor, [-1]), ord=ord)
error_norm = tf.norm(error, ord=2, axis=None, keep_dims=False, name='L2Norm')
summary_error = tf.summary.scalar('error_norm', error_norm)
with tf.name_scope('Optimizing'):
optimizer = tf.train.AdamOptimizer(0.001).minimize(error_norm, var_list=z)
sess.run(tf.global_variables_initializer())
for i in range(10000):
(_, s) = sess.run((optimizer, summary_error))
logwriter.add_summary(s, i)
print('step %d: Patiente un peu poto!' % i)
img = sess.run(img_summary)
logwriter.add_summary(img, i)
print('Done Training')
This script runs, but I have checked on tensorboard, the generator that is used here does not have the trained weights and it only produces noise.
I think I am trying to run a session in a graph that uses another graph and its trained session. I have read thoroughly the Graphs and Session documentation on tensorflow website https://www.tensorflow.org/versions/r1.3/programmers_guide/graphs, I have found an interesting tf.import_graph_def function :
You can rebind tensors in the imported graph to tf.Tensor objects in the default graph by passing the optional input_map argument. For example, input_map enables you to take import a graph fragment defined in a tf.GraphDef, and statically connect tensors in the graph you are building to tf.placeholder tensors in that fragment.
You can return tf.Tensor or tf.Operation objects from the imported graph by passing their names in the return_elements list.
But I don't know how to use this function, no example is given, and also I only found those two links that may help me :
https://github.com/tensorflow/tensorflow/issues/7508
Tensorflow: How to use a trained model in a application?
It would be really nice to have your help on this topic. This should be straightforward for someone who has already used the tf.import_graph_def function... What I really need is to get the trained generator to apply it to a new variable z which is to be trained in another session.
Thanks

Resources