I'm new to computer vision model structures, and I'm using TensorFlow for Node.js (@tensorflow/tfjs-node) to make some models detect objects. With the MobileNet and ResNet SSD models, the format is channels-last, so when I create a tensor with tf.node.decodeImage the format is channels-last by default, e.g. shape: [1, 1200, 1200, 3] for 3 channels, and the prediction data works great: the models recognize objects.
But a model coming from PyTorch, converted to ONNX and then to the protobuf (PB) format, uses channels-first in its saved_model.pb, e.g. shape: [1, 3, 1200, 1200].
Now I need to create a tensor from an image but in channels-first format. I found many examples of creating conv1d/conv2d layers specifying dataFormat='channelsFirst', but I don't know how to apply that to image data. Here is the API: https://js.tensorflow.org/api/latest/#layers.conv2d .
Here is the tensor code:
const tf = require('@tensorflow/tfjs-node');
let imgTensor = tf.node.decodeImage(new Uint8Array(subBuffer), 3);
imgTensor = imgTensor.cast('float32').div(255);
imgTensor = imgTensor.expandDims(0); // add a leftmost axis of size 1
console.log('tensor', imgTensor);
This gives me a channels-last shape that is not compatible with the model's channels-first shape:
tensor Tensor {
  kept: false,
  isDisposedInternal: false,
  shape: [ 1, 1200, 1200, 3 ],
  dtype: 'float32',
  size: 4320000,
  strides: [ 4320000, 3600, 3 ],
  dataId: {},
  id: 7,
  rankType: '4',
  scopeId: 4
}
I know of tf.reshape, but it reshapes without converting to channels-first, and the result seems useless for predictions. I don't know what I'm missing.
You can transpose the tensor from NHWC (channels-last) to NCHW (channels-first) with something like this:
const nchw = tf.transpose(nhwc, [0, 3, 1, 2]); // [1, 1200, 1200, 3] -> [1, 3, 1200, 1200]
I'm trying to apply maxpool2d (from torch.nn) to a single image (not as a maxpool layer in a network). Here is my code right now:
name = 'astronaut'
imshow(images[name], name)
img = images[name]
# pool of square window of size=3, stride=1
m = nn.MaxPool2d(3, stride=1)
img_transform = torch.Tensor(images[name])
plt.imshow(m(img_transform).view((512,510)))
The issue is that this code gives me a very green image as a result. I am sure the problem is with the dimensions passed to view, but I was unable to find out how to apply maxpool to just one image, so I couldn't fix it. The image I'm considering is 512x512. The arguments to view make no sense to me right now; they are just the only numbers that give a result...
If for example, I gave 512,512 as the argument for view, I get the following error:
RuntimeError: shape '[512, 512]' is invalid for input of size 261120
If anyone can tell me how to apply maxpool, avgpool, or minpool to an image and display the result I would be super grateful!
Thanks (:
Assuming your image is a numpy.array upon loading (please see comments for explanation of each step):
import numpy as np
import torch
# Assuming you have 3 color channels in your image
# Assuming your data is in Height, Width, Channels format
numpy_img = np.random.randint(low=0, high=255, size=(512, 512, 3))
# Transform to tensor
tensor_img = torch.from_numpy(numpy_img)
# PyTorch takes images in Channels, Height, Width format
# We have to switch their dimensions using `permute`
tensor_img = tensor_img.permute(2, 0, 1)
tensor_img.shape # Shape [3, 512, 512]
# Layers always need batch as first dimension (even for one image)
# unsqueeze will add it for you
ready_tensor_img = tensor_img.unsqueeze(dim=0)
ready_tensor_img.shape # Shape [1, 3, 512, 512]
pooling = torch.nn.MaxPool2d(kernel_size=3, stride=1)
# You need to cast your image to float as
# pooling is not implemented for Tensors of type long
new_img = pooling(ready_tensor_img.float())
If your image is black and white you would need shape [1, 1, 512, 512] (single channel only); you can't leave out or squeeze those dimensions, they always have to be there for any torch.nn.Module!
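For example, a minimal sketch for the grayscale case (assuming a random 512x512 single-channel image):
# A grayscale image has no channel dimension, so add both batch and channel:
gray = torch.rand(512, 512)            # single-channel image, shape [512, 512]
gray = gray.unsqueeze(0).unsqueeze(0)  # shape [1, 1, 512, 512]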
To transform the tensor back into an image you can use similar steps:
# Cast to long and squeeze batch dimension
no_batch = new_img.long().squeeze(dim=0)
# Unpermute
width_height_channels = no_batch.permute(1, 2, 0)
width_height_channels.shape # Shape: [510, 510, 3]
# Cast to numpy and you have your image
final_image = width_height_channels.numpy()
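Putting it all together for the original question, here is a minimal end-to-end sketch; I'm assuming the 'astronaut' image comes from skimage.data, which is a 512x512x3 uint8 array:
import matplotlib.pyplot as plt
import torch
from skimage import data

img = data.astronaut()                      # numpy array, shape (512, 512, 3)
t = torch.from_numpy(img).permute(2, 0, 1)  # shape [3, 512, 512]
t = t.unsqueeze(0).float()                  # shape [1, 3, 512, 512]
pooling = torch.nn.MaxPool2d(kernel_size=3, stride=1)
pooled = pooling(t)                         # shape [1, 3, 510, 510]
# Back to a Height, Width, Channels uint8 image for display:
out = pooled.squeeze(0).permute(1, 2, 0).byte().numpy()
plt.imshow(out)
plt.show()
There is no built-in min pool, but since min(x) = -max(-x) you can use -pooling(-t); torch.nn.AvgPool2d works the same way as MaxPool2d here.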
I am trying to implement the UNet architecture in PyTorch. When I print the model using print(model) I get the correct architecture, but when I try to print the summary using the following (or any other input size, for that matter):
from torchsummary import summary
summary(model, input_size=(13, 572, 572))
I get an error:
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 1. Got 70 and 71 in dimension 2 at /Users/distiller/project/conda/conda-bld/pytorch_1579022061893/work/aten/src/TH/generic/THTensor.cpp:612
However, it works perfectly if I give the input_size as input_size=(3, 224, 224) (like it worked for this person here). I am so baffled.
Can someone help me figure out what's wrong?
Edit: I have used the model architecture from here.
The UNet architecture you provided doesn't support that shape (unless the depth parameter is <= 3). Ultimately, the reason is that the output size of a downsampling operation isn't invertible: multiple input shapes map to the same output shape. For example, consider
>>> torch.nn.functional.max_pool2d(torch.zeros(1, 1, 10, 10), 2).shape
torch.Size([1, 1, 5, 5])
>>> torch.nn.functional.max_pool2d(torch.zeros(1, 1, 11, 11), 2).shape
torch.Size([1, 1, 5, 5])
So the question is: given only that the output shape is 5x5, what was the shape of the input? Was it 10x10 or 11x11? The same phenomenon applies to downsampling via strided convolutions.
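For instance, a kernel-3, stride-2 convolution (a hedged illustration of my own, not code from the original answer) maps 11x11 and 12x12 inputs to the same output size:
>>> conv = torch.nn.Conv2d(1, 1, kernel_size=3, stride=2)
>>> conv(torch.zeros(1, 1, 11, 11)).shape
torch.Size([1, 1, 5, 5])
>>> conv(torch.zeros(1, 1, 12, 12)).shape
torch.Size([1, 1, 5, 5])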
The problem is that the UNet class tries to combine features from the downsampling half of the network with features in the upsampling half. If it "guesses wrong" about the original shape during upsampling, then you will receive a dimension mismatch error.
To avoid this issue, you'll need to ensure that the height and width of your input data are multiples of 2**(depth-1). So, for the default depth=5, the input image height and width must be multiples of 16 (e.g. 560 or 576). Alternatively, since 572 is divisible by 4, you could set depth=3 to make it work.
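If you can't resize your data, one workaround is to pad the height and width up to the next valid size. A minimal sketch under the same assumptions (NCHW inputs; pad_to_multiple is a hypothetical helper of mine, not part of the linked repository):
import torch
import torch.nn.functional as F

def pad_to_multiple(x, depth=5):
    # Pad H and W up to the next multiple of 2**(depth-1)
    m = 2 ** (depth - 1)
    h, w = x.shape[-2:]
    pad_h = (m - h % m) % m
    pad_w = (m - w % m) % m
    # F.pad takes (left, right, top, bottom) for the last two dimensions
    return F.pad(x, (0, pad_w, 0, pad_h))

x = torch.zeros(1, 13, 572, 572)
print(pad_to_multiple(x).shape)  # torch.Size([1, 13, 576, 576])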
Does anybody know how to deal with Tensorflow 'work_element_count' errors?
F ./tensorflow/core/util/cuda_launch_config.h:127] Check failed: work_element_count > 0 (0 vs. 0)
Aborted (core dumped)
Here is part of my source code:
class DiscriminatorModel:
    def __init__(self, session, some_parameters):
        self.sess = session
        self.parameters = some_parameters

    def build_feed_dict(self, input_frames, gt_output_frames, generator):
        feed_dict = {}
        batch_size = np.shape(gt_output_frames)[0]
        print(batch_size)                              # 1
        print(np.shape(generator.input_frames_train))  # (?,7,32,32,32,1)
        print(np.shape(input_frames))                  # (1,7,32,32,32,1)
        print(np.shape(generator.gt_frames_train))     # (?,7,32,32,32,1)
        print(np.shape(gt_output_frames))              # (1,7,32,32,32,1)
        g_feed_dict = {generator.input_frames_train: input_frames,
                       generator.gt_frames_train: gt_output_frames}

        def getshape(d):
            if isinstance(d, dict):
                return {k: getshape(d[k]) for k in d}
            else:
                return None

        print("g_feed_dict shape :", getshape(g_feed_dict), "\n")
        # {<tf.Tensor 'generator/data/Placeholder:0' shape=(?, 32, 32, 32, 1) dtype=float32>: None,
        #  <tf.Tensor 'generator/data/Placeholder_1:0' shape=(?, 32, 32, 32, 1) dtype=float32>: None}
        print(sys.getsizeof(generator.scale_preds_train))  # 96
        print(sys.getsizeof(g_feed_dict))                  # 288
        # error occurs here.
        g_scale_preds = self.sess.run(generator.scale_preds_train, feed_dict=g_feed_dict)
        # F ./tensorflow/core/util/cuda_launch_config.h:127] Check failed: work_element_count > 0 (0 vs. 0)
        # Aborted (core dumped)

    def train_step(self, batch, generator):
        print(np.shape(batch))  # [1, 7, 32, 32, 32, 2]
        input_frames = batch[:, :, :, :, :, :-1]
        gt_output_frames = batch[:, :, :, :, :, -1:]
        feed_dict = self.build_feed_dict(input_frames, gt_output_frames, generator)
class GeneratorModel:
    def __init__(self, session, some_parameters):
        self.sess = session
        self.parameters = some_parameters
        self.input_frames_train = tf.placeholder(
            tf.float32, shape=[None, 7, 32, 32, 32, 1])
        self.gt_frames_train = tf.placeholder(
            tf.float32, shape=[None, 7, 32, 32, 32, 1])
        self.input_frames_test = tf.placeholder(
            tf.float32, shape=[None, 7, 32, 32, 32, 1])
        self.gt_frames_test = tf.placeholder(
            tf.float32, shape=[None, 7, 32, 32, 32, 1])
        self.scale_preds_train = []
        for p in range(4):
            # scale size, 4 --> 8 --> 16 --> 32
            sc = 4 * (2 ** p)
            # this returns a tf.Tensor of shape (1,7,sc,sc,sc,1)
            train_preds = calculate(self.width_train,
                                    self.height_train,
                                    self.depth_train,
                                    ...)
            self.scale_preds_train.append(train_preds)
        # [ <..Tensor shape=(1,7,4,4,4,1) ....>,
        #   <..Tensor shape=(1,7,8,8,8,1) ....>,
        #   <..Tensor shape=(1,7,16,16,16,1)..>,
        #   <..Tensor shape=(1,7,32,32,32,1)..> ]
        print(self.scale_preds_train)
sess = tf.Session()
d_model = DiscriminatorModel(sess, some_parameters)
g_model = GeneratorModel(sess, some_parameters)
sess.run(tf.global_variables_initializer())
# this returns numpy array of shape [1,7,32,32,32,2]
batch = get_batch()
# trouble here.
d_model.train_step(batch, g_model)
I've seen some recommendations such as:
use CUDA 9.0 / cuDNN 7.0 / tensorflow-gpu 1.7.0 (--> I'm already using these)
check whether the batch has a size greater than 0 (--> it seems it does)
do not use more GPUs than the number of samples in a batch (--> I do not)
I use a single 11 GB GPU out of the 5 available, specified as
~$ CUDA_VISIBLE_DEVICES=2 python3 foo.py
and the batch size is 1.
Can anyone tell me what I'm missing or what I've done wrong?
Edit 1.
I found a case that gets past this error. If I modify the input like this:
# ... previous code does not change
print(sys.getsizeof(g_feed_dict)) # 288
temp_index = 0
temp_input = [generator.scale_preds_train[temp_index],
              generator.scale_preds_train[temp_index],
              generator.scale_preds_train[temp_index],
              generator.scale_preds_train[temp_index]]
# this <temp_input> does not raise the error here.
# however, temp_index > 0 doesn't work.
g_scale_preds = self.sess.run(temp_input, feed_dict=g_feed_dict)
This makes the input passed to sess.run a list with shapes like
[(1,7,4,4,4,1), (1,7,4,4,4,1), (1,7,4,4,4,1), (1,7,4,4,4,1)]
whereas it should (originally) be a list of scaled shapes: [(1,7,4,4,4,1), (1,7,8,8,8,1), (1,7,16,16,16,1), (1,7,32,32,32,1)].
Also, the arrays in the feed_dict dictionary are of shape (1,7,32,32,32,1).
It seems like the error comes from tensorflow-gpu trying to reach indices of an array where the memory is not actually allocated, hence the "work element count is 0" (but I'm not sure yet).
I cannot understand why temp_index > 0 (e.g. 1, 2, 3) throws the same Check failed error, while 0 is the only index that does not.
Edit 2.
After I changed my GPU from a TITAN Xp to a GeForce GTX, the error log said
Floating point exception (core dumped)
at the same code (sess.run).
In my case, one of the conv layers has 0 output feature maps, which causes this problem.
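If you suspect something similar, a quick check (a hedged sketch of my own using standard TF 1.x graph introspection, not from the original answers) is to scan the graph for tensors with a statically known zero dimension:
import tensorflow as tf

# Print every tensor in the default graph whose static shape contains a 0,
# e.g. a conv layer configured with 0 output feature maps:
for op in tf.get_default_graph().get_operations():
    for t in op.outputs:
        dims = t.shape.as_list() if t.shape.dims is not None else []
        if 0 in dims:
            print('zero-sized tensor:', t.name, dims)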
Now I've solved it.
Just as the GTX error log told me, something was becoming zero, and it was in fact a denominator (thus unrelated to all of the code above). The specifications at the last debug were as follows:
CUDA 8.0 / TensorFlow 1.8.0
with the GeForce GTX, of course. I think the log message differed (and was slightly more detailed) because of the versions rather than the actual GPU, even though changing versions by itself did not solve the problem.
I was training the model on Colab and got the same problem. The issue was num_classes: in the config file it was set to 2, while my model had 36 classes.
You should pay attention to num_classes in your config file.
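For reference, and assuming on my part that this refers to a TensorFlow Object Detection API pipeline.config (adjust to your actual setup), the field sits under the model definition:
model {
  ssd {
    num_classes: 36  # must match the number of classes in your label map
    ...
  }
}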
Often, I'd like to work with variable-size data, e.g. a variable number of samples.
To get this data into TensorFlow, I use Python variables (e.g. num_samples=2000) to define shapes.
This means I have to re-create a new graph for each number of samples.
Setting validate_shape=False is not an option for me.
Is there a TensorFlow way of having dimension sizes as variables?
tf.placeholder() allows you to create tensors which will be filled only at runtime, and it lets you define tensors with variable-size dimensions by using None in their shape.
tf.shape() gives you the dynamic shape of a tensor, itself as a tensor, which you can use e.g. to dynamically generate other tensors. (The static shape, a tf.TensorShape, is available via tensor.get_shape(); see its documentation for more detailed explanations.)
An example to hopefully make things clearer:
import tensorflow as tf
import numpy as np
# Creating a placeholder for 3-channel images with undefined batch size, height and width:
images = tf.placeholder(tf.float32, shape=(None, None, None, 3))

# Dynamically obtaining the actual shape of the images:
images_shape = tf.shape(images)

# Demonstrating how this shape can be used to dynamically create other tensors:
ones = tf.ones(images_shape, dtype=images.dtype)
images_plus1 = images + ones

with tf.Session() as sess:
    for i in range(2):
        # Generating a random number of images with a random HxW:
        num_images = np.random.randint(1, 10)
        height, width = np.random.randint(10, 20), np.random.randint(10, 20)
        images_zero = np.zeros((num_images, height, width, 3), dtype=np.float32)

        # Running our TF operation, feeding the placeholder with the actual images:
        res = sess.run(images_plus1, feed_dict={images: images_zero})
        print("Shape: {} ; Pixel Val: {}".format(res.shape, res[0, 0, 0]))

# > Shape: (6, 14, 13, 3) ; Pixel Val: [1. 1. 1.]
# > Shape: (8, 11, 15, 3) ; Pixel Val: [1. 1. 1.]
# ^ As you can see, we run the same graph each time with a different number of
#   images / different shapes.