How to change Tensor format to channels-first in TensorFlow.js? - node.js

I'm new to computer vision model structures, and I'm using TensorFlow for Node.js (@tensorflow/tfjs-node) to make some models that detect objects. With the MobileNet and ResNet SSD models, the inputs use the channels-last format, so when I create a tensor with tf.node.decodeImage the format is channels-last by default, e.g. shape: [1, 1200, 1200, 3] for 3 channels, and the prediction data works great, able to recognize objects.
But a model from PyTorch, converted to ONNX and then to protobuf (PB) format, uses channels-first: the saved_model.pb expects a shape like [1, 3, 1200, 1200].
Now I need to create a tensor from an image, but in channels-first format. I found many examples of creating conv1d/conv2d layers specifying dataFormat='channelsFirst', but I don't know how to apply that to image data. Here is the API: https://js.tensorflow.org/api/latest/#layers.conv2d
Here is the tensor code:
const tf = require('@tensorflow/tfjs-node');
let imgTensor = tf.node.decodeImage(new Uint8Array(subBuffer), 3);
imgTensor = imgTensor.cast('float32').div(255);
imgTensor = imgTensor.expandDims(0); // add the leftmost (batch) axis of size 1
console.log('tensor', imgTensor);
This gives me a channels-last shape that is not compatible with the model's channels-first input shape:
tensor Tensor {
kept: false,
isDisposedInternal: false,
shape: [ 1, 1200, 1200, 3 ],
dtype: 'float32',
size: 4320000,
strides: [ 4320000, 3600, 3 ],
dataId: {},
id: 7,
rankType: '4',
scopeId: 4
}
I know about tf.reshape, but it only reshapes without actually converting to channels first, and the reshaped result seems useless in the prediction results. I don't know what I'm missing.

You can transpose the axes with something like this:
const nchw = tf.transpose(nhwc, [0, 3, 1, 2]);
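Applied to the decoding code from the question, a minimal sketch (subBuffer is assumed to hold the raw image bytes, as in the question):
const tf = require('@tensorflow/tfjs-node');
// Decode as NHWC (channels last), normalize, and add the batch axis
let imgTensor = tf.node.decodeImage(new Uint8Array(subBuffer), 3);
imgTensor = imgTensor.cast('float32').div(255);
imgTensor = imgTensor.expandDims(0); // shape [1, 1200, 1200, 3]
// Permute the axes NHWC -> NCHW: batch, channels, height, width
const nchw = tf.transpose(imgTensor, [0, 3, 1, 2]); // shape [1, 3, 1200, 1200]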

Related

HuggingFace-Transformers --- NER single sentence/sample prediction

I am trying to predict with the NER model, as in the tutorial from huggingface (it contains only the training+evaluation part).
I am following this exact tutorial here : https://github.com/huggingface/notebooks/blob/master/examples/token_classification.ipynb
The training works flawlessly, but the problems begin when I try to predict on a simple sample.
model_checkpoint = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)
loaded_model = AutoModel.from_pretrained('./my_model_own_custom_training.pth',
                                         from_tf=False)
input_sentence = "John Nash is a great mathematician, he lives in France"
tokenized_input_sentence = tokenizer([input_sentence],
                                     truncation=True,
                                     is_split_into_words=False,
                                     return_tensors='pt')
predictions = loaded_model(tokenized_input_sentence["input_ids"])[0]
The predictions tensor is of shape (1, 13, 768).
How can I arrive at the final result of the form [JOHN <-> 'B-PER', … France <-> 'B-LOC'], where B-PER and B-LOC are two ground-truth labels, representing the tags for a person and a location respectively?
The result of the prediction is:
torch.Size([1, 13, 768])
If I write:
print(predictions.argmax(axis=2))
I get:
tensor([613, 705, 244, 620, 206, 206, 206, 620, 620, 620, 477, 693, 308])
However, I would have expected to get a tensor of label indices in [0…8], matching the ground-truth annotations.
Summary when loading the model:
loading configuration file ./my_model_own_custom_training.pth/config.json
Model config DistilBertConfig {
  "name_or_path": "distilbert-base-uncased",
  "activation": "gelu",
  "architectures": [
    "DistilBertForTokenClassification"
  ],
  "attention_dropout": 0.1,
  "dim": 768,
  "dropout": 0.1,
  "hidden_dim": 3072,
  "id2label": {
    "0": "LABEL_0",
    "1": "LABEL_1",
    "2": "LABEL_2",
    "3": "LABEL_3",
    "4": "LABEL_4",
    "5": "LABEL_5",
    "6": "LABEL_6",
    "7": "LABEL_7",
    "8": "LABEL_8"
  },
  "initializer_range": 0.02,
  "label2id": {
    "LABEL_0": 0,
    "LABEL_1": 1,
    "LABEL_2": 2,
    "LABEL_3": 3,
    "LABEL_4": 4,
    "LABEL_5": 5,
    "LABEL_6": 6,
    "LABEL_7": 7,
    "LABEL_8": 8
  },
  "max_position_embeddings": 512,
  "model_type": "distilbert",
  "n_heads": 12,
  "n_layers": 6,
  "pad_token_id": 0,
  "qa_dropout": 0.1,
  "seq_classif_dropout": 0.2,
  "sinusoidal_pos_embds": false,
  "tie_weights": true,
  "transformers_version": "4.8.1",
  "vocab_size": 30522
}
The answer is a bit trickier than expected (huge credits to Niels Rogge).
Firstly, loading models in huggingface-transformers can be done in (at least) two ways:
AutoModel.from_pretrained('./my_model_own_custom_training.pth', from_tf=False)
AutoModelForTokenClassification.from_pretrained('./my_model_own_custom_training.pth', from_tf=False)
It seems that, depending on the task at hand, different AutoModel subclasses need to be used. In the scenario I posted, it is AutoModelForTokenClassification that has to be used.
After that, a solution to obtain the predictions would be to do the following:
# forward pass
outputs = model(**encoding)
logits = outputs.logits
predictions = logits.argmax(-1)
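To get from these label indices to the final [token <-> tag] pairs, a hedged sketch (assuming the fine-tuned config maps ids to real NER tags such as B-PER/B-LOC rather than the generic LABEL_x shown above; encoding, model, and predictions reuse the names from the snippet above):
# Look up each token and its predicted tag via the model's id2label mapping
tokens = tokenizer.convert_ids_to_tokens(encoding["input_ids"][0])
tags = [model.config.id2label[p.item()] for p in predictions[0]]
for token, tag in zip(tokens, tags):
    print(token, "<->", tag)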

Reading test images for resnet18

I am trying to read an image file and classify the image.
My model is ResNet18; I trained it previously and plan to use a different .py script to classify a list of images. This is my network:
PATH = './net.pth'
model_ft = models.resnet18(pretrained=True)
num_ftrs = model_ft.fc.in_features
model_ft.fc = nn.Linear(num_ftrs, 16)
model_ft.load_state_dict(torch.load(PATH))
model_ft.eval()
And I am trying to read images this way:
imsize = 256
loader = transforms.Compose([transforms.Scale(imsize), transforms.ToTensor()])

def image_loader(image_name):
    # load image, returns cuda tensor
    image = Image.open(image_name)
    image = loader(image).float()
    image = Variable(image, requires_grad=True)
    image = image.unsqueeze(0)
    return image.cuda()

image = image_loader("dataset/test/9673.png")
model_ft(image)
I am getting this error:
"Given groups=1, weight of size [64, 3, 7, 7], expected input[1, 4, 676, 256] to have 3 channels, but got 4 channels instead"
I was recommended to remove the unsqueeze for resnet18; doing that, I got the following error:
"Expected 4-dimensional input for 4-dimensional weight [64, 3, 7, 7], but got 3-dimensional input of size [4, 676, 256] instead"
I do not quite understand the problem I am dealing with. How should I read my test set? I'll need to write the class IDs and file names into a .txt afterwards.
You are using a PNG image, which has 4 channels (RGBA), while your network expects 3 channels.
Convert to RGB and you should be fine. In your image_loader simply do:
image = Image.open(image_name).convert('RGB')
I think your image input is of shape batch_size*4*height*width instead of batch_size*3*height*width; that's why the error occurs. Can you print the shape and report it: print(image.shape) after the call to image_loader()?
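Putting the RGB fix into the loader from the question, a minimal sketch (note that transforms.Scale is deprecated in newer torchvision releases, where transforms.Resize is the equivalent, and Variable/requires_grad are unnecessary for inference):
from PIL import Image
from torchvision import transforms

imsize = 256
loader = transforms.Compose([transforms.Resize(imsize), transforms.ToTensor()])

def image_loader(image_name):
    # Force 3-channel RGB, dropping the PNG alpha channel
    image = Image.open(image_name).convert('RGB')
    image = loader(image).float()
    image = image.unsqueeze(0)  # add batch dimension: [1, 3, H, W]
    return image.cuda()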

fit_transform vs transform when doing inference

I have trained a Keras model and saved it. I now want to use the model in a web app for inference, and I want to preprocess the inputs by scaling them with StandardScaler() from sklearn.
But whenever I run transform(inputs), an error occurs telling me to do the fitting first. This was the code:
from sklearn.preprocessing import StandardScaler
inputs = [1,8,0,0,4,18,4,3,576,9,8,8,14,1,0,4,0,0,3,6,0,1,1]
inputs = scale.transform(inputs)
preds = model.predict(inputs, batch_size = 1)
I then changed the code in order to do the fitting:
from sklearn.preprocessing import StandardScaler
inputs = [1,8,0,0,4,18,4,3,576,9,8,8,14,1,0,4,0,0,3,6,0,1,1]
inputs = scale.fit_transform(inputs)
preds = model.predict(inputs, batch_size = 1)
It worked, but the scaled data are all zeros regardless of the inputs I provide, producing wrong predictions. I'm certain I'm missing some key concepts here, so I'm asking for help. Thank you.
The standard scaler applies the formula:
z = (x - u) / s
Here,
x: Element
u: Mean
s: Standard Deviation
This transformation is applied column-wise.
Therefore, when you call fit, the mean and standard deviation of each column are calculated.
For example:
from sklearn.preprocessing import StandardScaler
import numpy as np
x = np.random.randint(50,size = (10,2))
x
Output:
array([[26, 9],
[29, 39],
[23, 26],
[29, 22],
[28, 41],
[11, 6],
[42, 40],
[ 1, 25],
[ 0, 39],
[44, 45]])
Now, fitting the standard scaler
scale = StandardScaler()
scale.fit(x)
You can see the mean and standard deviation using the fitted attributes of the StandardScaler object:
# Mean
scale.mean_ # array([23.3, 29.2])
# Standard Deviation
scale.scale_ # array([14.36697602, 13.12859475])
You transform these values using the transform method.
scale.transform(x)
Output:
array([[ 0.18793099, -1.53862621],
[ 0.3967432 , 0.74646222],
[-0.02088122, -0.24374277],
[ 0.3967432 , -0.54842122],
[ 0.32713913, 0.89880145],
[-0.85613006, -1.76713506],
[ 1.3015961 , 0.82263184],
[-1.55217075, -0.31991238],
[-1.62177482, 0.74646222],
[ 1.44080424, 1.20347991]])
Calculation for 1st element:
z = (26 - 23.3) / 14.36697602
z = 0.18793099
How to use this?
The transformation should be done before training your model: fit the scaler on the training data and train on the transformed data. For prediction, the test data must be scaled with the same mean and standard deviation as the training data, i.e. do not call fit on the test data; reuse the scaler object that was fitted on the training data to transform it. That is also why your second snippet returns all zeros: fit_transform on a single sample makes each column's mean equal to the value itself, so every z is 0.
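A minimal sketch of the intended workflow, assuming X_train is your training matrix and model is the saved Keras model (persisting the fitted scaler with joblib is one common choice):
import joblib
import numpy as np
from sklearn.preprocessing import StandardScaler

# At training time: fit the scaler on the training data only
scale = StandardScaler()
X_train_scaled = scale.fit_transform(X_train)
joblib.dump(scale, 'scaler.joblib')

# At inference time: load and reuse the fitted scaler
scale = joblib.load('scaler.joblib')
inputs = np.array([[1, 8, 0, 0, 4, 18, 4, 3, 576, 9, 8, 8, 14, 1, 0, 4, 0, 0, 3, 6, 0, 1, 1]])
inputs_scaled = scale.transform(inputs)  # note the 2D shape: (1, n_features)
preds = model.predict(inputs_scaled, batch_size=1)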

Maxpool of an image in pytorch

I'm trying to just apply maxpool2d (from torch.nn) to a single image (not as a maxpool layer). Here is my code right now:
name = 'astronaut'
imshow(images[name], name)
img = images[name]
# pool of square window of size=3, stride=1
m = nn.MaxPool2d(3,stride = 1)
img_transform = torch.Tensor(images[name])
plt.imshow(m(img_transform).view((512,510)))
The issue is that this code gives me a very green image as a result. I am sure the problem is with the dimensions of view, but I was unable to find out how to apply maxpool to just one image, so I couldn't fix it. The image I'm considering is 512x512. The arguments for view make no sense to me right now; it's just the only pair of numbers that gives a result...
If for example, I gave 512,512 as the argument for view, I get the following error:
RuntimeError: shape '[512, 512]' is invalid for input of size 261120
If anyone can tell me how to apply maxpool, avgpool, or minpool to an image and display the result I would be super grateful!
Thanks (:
Assuming your image is a numpy.array upon loading (please see comments for explanation of each step):
import numpy as np
import torch
# Assuming you have 3 color channels in your image
# Assuming your data is in Height, Width, Channels format
numpy_img = np.random.randint(low=0, high=255, size=(512, 512, 3))
# Transform to tensor
tensor_img = torch.from_numpy(numpy_img)
# PyTorch takes images in Channels, Height, Width format
# We have to switch the dimensions using `permute`
tensor_img = tensor_img.permute(2, 0, 1)
tensor_img.shape # Shape [3, 512, 512]
# Layers always need batch as first dimension (even for one image)
# unsqueeze will add it for you
ready_tensor_img = tensor_img.unsqueeze(dim=0)
ready_tensor_img.shape # Shape [1, 3, 512, 512]
pooling = torch.nn.MaxPool2d(kernel_size=3, stride=1)
# You need to cast your image to float as
# pooling is not implemented for Tensors of type long
new_img = pooling(ready_tensor_img.float())
If your image is black and white, you would need shape [1, 1, 512, 512] (single channel only); you can't drop/squeeze those dimensions, as they always have to be there for any torch.nn.Module!
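For the grayscale case, a small sketch of adding both missing dimensions (random data stands in for a real image):
import torch

gray = torch.rand(512, 512)  # a single-channel 512x512 image, shape [512, 512]
# Add channel and batch dimensions: [512, 512] -> [1, 1, 512, 512]
gray = gray.unsqueeze(0).unsqueeze(0)
pooled = torch.nn.MaxPool2d(kernel_size=3, stride=1)(gray)
pooled.shape  # torch.Size([1, 1, 510, 510])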
To transform tensor into image again you could use similar steps:
# Cast to long and squeeze batch dimension
no_batch = new_img.long().squeeze(dim=0)
# Unpermute
width_height_channels = no_batch.permute(1, 2, 0)
width_height_channels.shape # Shape: [510, 510, 3]
# Cast to numpy and you have your image
final_image = width_height_channels.numpy()

Does tensorflow have shape variables?

Often, I'd like to work with variable-size data, e.g. a variable number of samples.
To get this data into TensorFlow, I use Python variables (e.g. num_samples = 2000) to define shapes.
This means I have to recreate the graph for each number of samples.
Setting validate_shape=False is not an option for me.
Is there a TensorFlow way of having dimension sizes as variables?
tf.placeholder() allows you to create tensors that will be filled only at runtime, and it lets you define tensors with variable-size dimensions by using None in their shape.
tf.shape() gives you the dynamic shape of a tensor, itself as a tensor, which you can use e.g. to dynamically generate other tensors (the static shape is available as a tf.TensorShape via tensor.get_shape()). See the tf.TensorShape documentation for more detailed explanations.
An example to hopefully make things clearer:
import tensorflow as tf
import numpy as np

# Creating a placeholder for 3-channel images with undefined batch size, height and width:
images = tf.placeholder(tf.float32, shape=(None, None, None, 3))
# Dynamically obtaining the actual shape of the images:
images_shape = tf.shape(images)
# Demonstrating how this shape can be used to dynamically create other tensors:
ones = tf.ones(images_shape, dtype=images.dtype)
images_plus1 = images + ones

with tf.Session() as sess:
    for i in range(2):
        # Generating a random number of images with a random HxW:
        num_images = np.random.randint(1, 10)
        height, width = np.random.randint(10, 20), np.random.randint(10, 20)
        images_zero = np.zeros((num_images, height, width, 3), dtype=np.float32)
        # Running our TF operation, feeding the placeholder with the actual images:
        res = sess.run(images_plus1, feed_dict={images: images_zero})
        print("Shape: {} ; Pixel Val: {}".format(res.shape, res[0, 0, 0]))
        # > Shape: (6, 14, 13, 3) ; Pixel Val: [1. 1. 1.]
        # > Shape: (8, 11, 15, 3) ; Pixel Val: [1. 1. 1.]
        # ^ As you can see, we run the same graph each time with a different
        #   number of images / a different shape.
