PyTorch CIFAR10 images are not normalized

import torchvision
import torchvision.transforms as transforms

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainset.data[0]
I am using the above code and expected the data to be normalized, but it is not; the output is shown below. I need to access the data through the data attribute to do some more processing.
array([[[ 59,  62,  63],
        [ 43,  46,  45],
        [ 50,  48,  43],
        ...,
        [158, 132, 108],
        [152, 125, 102],
        [148, 124, 103]],

The torchvision.transforms.Normalize transform is merely a shift-scale operator. Given parameters mean (the "shift") and std (the "scale"), it maps the input to (input - shift) / scale.
Since you are using mean=0.5 and std=0.5 on all three channels, the result will be (input - 0.5) / 0.5, which only normalizes your data if its statistics are in fact mean=0.5 and std=0.5, which is of course not the case.
With that in mind, what you should be doing is providing the actual dataset's statistics. For CIFAR10, these can for example be found here:
mean = [0.4914, 0.4822, 0.4465]
std = [0.2470, 0.2435, 0.2616]
With those values, you will be able to normalize your data properly to mean=0 and std=1.
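As a quick check, here is a minimal sketch (assuming torchvision is installed) that normalizes CIFAR10 with its actual per-channel statistics and verifies the resulting mean/std on a batch. Note that trainset.data always exposes the raw uint8 array; transforms are only applied when samples are fetched through the dataset (e.g. trainset[0] or a DataLoader), so .data will never show the normalized values.
import torch
import torchvision
import torchvision.transforms as transforms

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.4914, 0.4822, 0.4465],
                         std=[0.2470, 0.2435, 0.2616])])
trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
loader = torch.utils.data.DataLoader(trainset, batch_size=1024)
images, _ = next(iter(loader))
# per-channel statistics of the normalized batch: roughly 0 and 1
print(images.mean(dim=(0, 2, 3)), images.std(dim=(0, 2, 3)))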
I've written a more general, long-form answer here.

Related

How to lower the last dimension of a Tensor?

I have a naive question.
For example, I have a tensor of size torch.Size([2, 1, 80, 64]).
I need to turn it into another tensor of size torch.Size([2, 1, 80, 16]).
Is there a right way to achieve that?
There exist many functions to achieve dimensionality reduction and the following are some examples:
randomly select 16 out of the 64 features
take the mean of every four features (64/4=16)
use a dimensionality reduction technique like PCA
apply a linear transformation
apply a convolution function
To give a satisfying answer, more information about why and what you want to do is necessary.
Answered by: #ptrblck_de
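For the snippets below, a minimal setup might look like this (x simply stands in for a tensor with the shape from the question):
import torch
import torch.nn as nn

x = torch.randn(2, 1, 80, 64)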
slice the tensor
y = x[..., :16]
print(y.shape)
# torch.Size([2, 1, 80, 16])
index it with a stride of 4
y = x[..., ::4]
print(y.shape)
# torch.Size([2, 1, 80, 16])
use any pooling (max, avg, etc.) layer (the same would also work using adaptive pooling layers)
pool = nn.MaxPool2d((1, 2), (1, 4))
y = pool(x)
print(y.shape)
# torch.Size([2, 1, 80, 16])
pool = nn.AdaptiveAvgPool2d(output_size=(80, 16))
y = pool(x)
print(y.shape)
# torch.Size([2, 1, 80, 16])
or manually reduce the last dimension with any reduction op (sum, mean, max, etc.)
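For example, a sketch of that last option: group the 64 features into 16 groups of 4 and average each group (the grouping into fours is just one possible choice).
y = x.view(2, 1, 80, 16, 4).mean(dim=-1)
print(y.shape)
# torch.Size([2, 1, 80, 16])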

How to use affine_grid for a batch tensor in PyTorch?

Following the official tutorial on affine_grid, this line (inside function stn()):
grid = F.affine_grid(theta, x.size())
gives me an error:
RuntimeError: Expected size for first two dimensions of batch2 tensor to be: [10, 3] but got: [4000, 3].
using the following input:
from torchinfo import summary
model = Net()
summary(model, input_size=(10, 1, 256, 256))
theta has size (4000, 2, 3) and x has size (10, 1, 256, 256); how do I properly reshape theta to the correct dimensions to use it with a batch tensor?
EDIT: I honestly don't see any mistake here, and the dimensions actually follow the affine_grid docs. Has something changed in the function itself?
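For reference, a minimal sketch of the shape contract F.affine_grid expects: theta's first dimension must match the batch size of x (here 10), with one 2x3 affine matrix per sample. The identity transform below is just a placeholder.
import torch
import torch.nn.functional as F

x = torch.randn(10, 1, 256, 256)      # (N, C, H, W)
theta = torch.zeros(10, 2, 3)         # theta must be (N, 2, 3)
theta[:, 0, 0] = 1.0
theta[:, 1, 1] = 1.0                  # identity transform for each sample
grid = F.affine_grid(theta, x.size(), align_corners=False)
out = F.grid_sample(x, grid, align_corners=False)
print(grid.shape, out.shape)
# torch.Size([10, 256, 256, 2]) torch.Size([10, 1, 256, 256])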

Ignore padding class (0) during multi-class classification

I have a problem where, given a set of tokens, I must predict another token. For this task I use an embedding layer with vocab-size + 1 as the input size. The +1 is because the sequences are padded with zeros. E.g., given a vocab size of 10 000 and max_sequence_len=6, x_train looks like:
array([[    0,     0,     0,    11,    22,     4],
       [   29,     6,    12,    29,  1576,    29],
       ...,
       [    0,     0,    67,  8947,  7274,  7019],
       [    0,     0,     0,    15, 10000,    50]])
y_train consists of integers between 1 and 10000; in other words, this becomes a multi-class classification problem with 10000 classes.
My problem: I would like to set the size of the output layer to 10000, but then the model will predict the classes 0-9999. Another approach is to set the output size to 10001, but then the model can predict the 0-class (padding), which is unwanted.
Since y_train is mapped from 1 to 10000, I could remap it to 0-9999, but since the labels share their mapping with the input, this seems like an unnecessary workaround.
EDIT:
I realize, as #Andrey pointed out in the comments, that I could allow for 10001 classes and simply add the padding token to the vocabulary, although I never want the network to predict 0's.
How can I tell the model to predict the labels 1-10000 while still having 10000 classes, not 10001?
I would use the following approach:
import tensorflow as tf

inputs = tf.keras.layers.Input(shape=(None,), dtype=tf.int32)  # token-id sequences
x = tf.keras.layers.Embedding(10001, 512)(inputs)              # embedding over the full vocab size [10001]
x = tf.keras.layers.Dense(10000, activation='softmax')(x)      # weights trained on the reduced vocab size [10000]
z = tf.zeros(tf.shape(x)[:-1])[..., tf.newaxis]
x = tf.concat([z, x], axis=-1)  # prepend a constant zero on the first position (to avoid predicting 0)
model = tf.keras.Model(inputs=inputs, outputs=x)

inputs = tf.random.uniform([10, 10], 0, 10001, dtype=tf.int32)  # inputs may contain the padding id 0
labels = tf.random.uniform([10, 10], 1, 10001, dtype=tf.int32)  # labels are in 1-10000, never 0
model.compile(loss='sparse_categorical_crossentropy')
model.fit(inputs, labels)
pred = model.predict(inputs)  # position 0 is always 0 (the minimum value), so class 0 is never predicted
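To recover token ids from the predictions, the argmax can be used directly: since the softmax outputs are strictly positive and class 0 has a constant probability of exactly 0, the argmax always falls in 1-10000.
predicted_tokens = pred.argmax(axis=-1)  # values are always in 1..10000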

"saved_model_cli show" on a keras+transformers model display different inputs and shapes that the one used for training

I am using transformers' TFBertForSequenceClassification.from_pretrained (with 'bert-base-multilingual-uncased') and Keras to build my model.
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
# metric
metric = tf.keras.metrics.SparseCategoricalAccuracy('accuracy')
# optimizer
optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate, epsilon=epsilon)
# create and compile the Keras model in the context of strategy.scope
model = TFBertForSequenceClassification.from_pretrained(pretrained_weights,
                                                        num_labels=num_labels,
                                                        cache_dir=pretrained_model_dir)
model._name = 'tf_bert_classification'
# compile Keras model
model.compile(optimizer=optimizer,
              loss=loss,
              metrics=[metric])
I am using SST-2 data that is tokenized and then fed to the model for training. The data has the following shape:
shape: (32,)
dict structure
dim: 3
[input_ids / attention_mask / token_type_ids ]
[(32, 128) / (32, 128) / (32, 128) ]
[ndarray / ndarray / ndarray ]
and here is an example:
({'input_ids': <tf.Tensor: shape=(32, 128), dtype=int32, numpy=
array([[  101, 21270, 94696, ...,     0,     0,     0],
       [  101,   143, 45100, ...,     0,     0,     0],
       [  101, 24220,   102, ...,     0,     0,     0],
       ...,
       [  101, 11008, 10346, ...,     0,     0,     0],
       [  101, 43062, 15648, ...,     0,     0,     0],
       [  101, 13178, 18418, ...,     0,     0,     0]], dtype=int32)>, 'attention_mask': ....
As we can see, we have input_ids with shape (32, 128), where 32 is the batch size and 128 is the maximum length of the string (the max for BERT is 512). We also have attention_mask and token_type_ids with the same structure.
I am able to train the model and to do prediction using model.evaluate(test_dataset). All good.
The issue I am having is that when I serve the model on GCP, it requires data in a different input shape and structure! I see the same thing when I run the CLI on the saved model:
saved_model_cli show --dir $MODEL_LOCAL --tag_set serve --signature_def serving_default

The given SavedModel SignatureDef contains the following input(s):
  inputs['input_ids'] tensor_info:
      dtype: DT_INT32
      shape: (-1, 5)
      name: serving_default_input_ids:0
The given SavedModel SignatureDef contains the following output(s):
  outputs['output_1'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 2)
      name: StatefulPartitionedCall:0
Method name is: tensorflow/serving/predict
As we can see, we only need to give input_ids (not attention_mask and token_type_ids), and the shape is different. While the batch size is undefined (-1), which is expected, the maximum length is 5 instead of 128! It was working 2 months ago, so I probably introduced something that created this issue.
I tried a few versions of TensorFlow (2.2.0 and 2.3.0) and transformers (2.8.0, 2.9.0 and 3.0.2). I also cannot see the Keras model's input and output shapes (they are None):
model.inputs
model.outputs
model.summary()
Model: "tf_bert_classification"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
bert (TFBertMainLayer) multiple 167356416
_________________________________________________________________
dropout_37 (Dropout) multiple 0
_________________________________________________________________
classifier (Dense) multiple 1538
=================================================================
Total params: 167,357,954
Trainable params: 167,357,954
Non-trainable params: 0
Any idea what could explain why the saved model requires a different input than the one used for training? I could use the Keras functional API and define the input shape, but I am pretty sure this code was working before.
I have seen such behavior when the model was instantiated from a pretrained one, then the weights were loaded, and only then was it saved in the full-fledged Keras format.
When I loaded the latter afterwards, it was not able to issue correct predictions because its signatures had become garbage: attention_mask disappeared, seq_length changed, and dummy None inputs appeared out of nowhere. So try to save your model in the Keras format right after fitting (without the intermediate loading from weights), if that's your case.
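A minimal sketch of the suggested workflow (dataset and path names are illustrative), where the model is saved straight after fitting rather than being rebuilt from a weights checkpoint first:
model.compile(optimizer=optimizer, loss=loss, metrics=[metric])
model.fit(train_dataset, epochs=epochs, validation_data=valid_dataset)
# save directly to the SavedModel format, without an intermediate
# save_weights / load_weights round-trip
model.save('saved_models/tf_bert_classification')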

How to resize a 3D volume in a Keras network model?

I have a 3D volume, let's say of size (142, 142, 142). I expand a dim as a channel because the volume is treated as a grayscale image, so it becomes (142, 142, 142, 1).
Then I want to input a batch of these volumes into a Keras model. First, I crop the volume to (32, 32, 32, 1). Then, I need to resize the cropped cube to (48, 48, 48, 1). Therefore I wrote a model like this:
input_shape = (input_dim, input_dim, input_dim, 1)
input1 = Input(shape=input_shape, name="input_1")
cropped = Cropping3D(cropping, data_format="channels_last")(input1)
resized = Lambda(
    lambda cube: K.resize_volumes(cube, 2, 2, 2, data_format="channels_last")
)(cropped)
However, keras.backend.resize_volumes only accepts an integer as the resize factor, which means my plan doesn't work this way, because going from 32 to 48 the resize factor would be 1.5.
So how can I do the resizing as a Keras layer? Thanks!!!
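One possible direction (a sketch only, not tested against the exact model above): since tf.image.resize supports arbitrary 2D target sizes, a Lambda layer can resize a 5D volume in two passes, first over (H, W) and then over D. The function and tensor names are illustrative.
import tensorflow as tf
from tensorflow.keras.layers import Lambda

def resize_volume(volume, target=(48, 48, 48)):
    # volume: (batch, d, h, w, c) -> (batch, td, th, tw, c)
    td, th, tw = target
    b = tf.shape(volume)[0]
    d, h, w, c = volume.shape[1], volume.shape[2], volume.shape[3], volume.shape[4]
    # pass 1: resize H and W, folding depth into the batch dimension
    x = tf.reshape(volume, (-1, h, w, c))        # (b*d, h, w, c)
    x = tf.image.resize(x, (th, tw))             # (b*d, th, tw, c)
    x = tf.reshape(x, (b, d, th, tw, c))
    # pass 2: resize D, folding the new height into the batch dimension
    x = tf.transpose(x, (0, 2, 1, 3, 4))         # (b, th, d, tw, c)
    x = tf.reshape(x, (-1, d, tw, c))            # (b*th, d, tw, c)
    x = tf.image.resize(x, (td, tw))             # (b*th, td, tw, c)
    x = tf.reshape(x, (b, th, td, tw, c))
    return tf.transpose(x, (0, 2, 1, 3, 4))      # (b, td, th, tw, c)

resized = Lambda(resize_volume)(cropped)  # (batch, 48, 48, 48, 1) from the (batch, 32, 32, 32, 1) crop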
