I apologize that this is probably a simple question that has been answered before, but I could not find the answer. I’m attempting to use a CNN to extract features and then feed them into an FC network that outputs 2 variables. I’m attempting to use the functional linear layer as a way to dynamically handle the flattened features. self.cnn is a Sequential container whose last layer is nn.Flatten(). When I print the size of x after the CNN I see it is 15x152064, so I’m unclear why the F.linear layer is failing to run with the error below. Any help would be appreciated.
RuntimeError: size mismatch, get 15, 15x152064,2
x = self.cnn(x)
batch_size, channels = x.size()
x = F.linear(x, torch.Tensor([256,channels]))
y_hat = self.FC(x)
torch.Tensor([256, channels]) does not create a tensor of size (256, channels); it creates a 1D tensor containing the two values 256 and channels. I don't know how you want to initialize your weights, but there are a couple of options:
# All-ones weights (every output is the sum of the input features; this is not an identity transform):
x = F.linear(x, torch.ones(256,channels))
# Random weights:
x = F.linear(x, torch.randn(256,channels))
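The shape rule behind this fix can be sketched in plain Python, without PyTorch. The helper `linear` below is a hypothetical stand-in for a bias-free F.linear (it is not the library function), and the small sizes are made up for illustration. The weight needs one row per output feature and one column per input feature, whereas a list like [256, channels] only holds two numbers.

```python
def linear(x, weight):
    """Hypothetical plain-Python stand-in for y = x @ weight.T (no bias)."""
    in_features = len(weight[0])
    for row in x:
        if len(row) != in_features:
            raise ValueError(
                f"size mismatch: input has {len(row)} features, "
                f"weight expects {in_features}"
            )
    # One output value per weight row, for every input row.
    return [[sum(a * b for a, b in zip(row, w_row)) for w_row in weight]
            for row in x]


# Toy sizes (assumed), standing in for the 15 x 152064 case.
batch, channels, out_features = 2, 4, 3
x = [[1.0] * channels for _ in range(batch)]

# Correct weight layout: (out_features, channels).
weight = [[0.5] * channels for _ in range(out_features)]
y = linear(x, weight)
out_shape = (len(y), len(y[0]))
print(out_shape)

# What torch.Tensor([256, channels]) amounts to: just two values.
wrong = [256, channels]
print(len(wrong))
```

In a real model the weight would normally be a learnable parameter created once, rather than a fresh tensor built on every forward pass.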
I recently started learning and using automatic differentiation to determine the gradients and jacobian matrix of a neural network with respect to a given input. The method suggested by tensorflow is the tape.gradient and tape.jacobian method. However, I am not able to obtain the jacobian matrix using this method due to some bug in tensorflow. It works when I calculated tape.gradient(y_pred, x), but not the jacobian matrix, which should have a shape of (200,3). I am open to other ways to calculate the jacobian matrix, but I am more inclined to use automatic differentiation methods within Tensorflow. The current version I am using is Tensorflow 2.1.0. Greatly appreciate any advice!
import tensorflow as tf
import numpy as np
# The neural network accepts 3 inputs and produces 200 outputs. The actual values of the inputs and outputs are not written in the code as it is too involved.
num_inputs = 3
num_outputs = 200
num_hidden_layers = 5
num_neurons = 50
kernel = 'he_uniform'
activation = tf.keras.layers.LeakyReLU(alpha=0.3)
# Details of model (MLP)
current_model = tf.keras.models.Sequential()
current_model.add(tf.keras.Input(shape=(num_inputs,)))
for i in range(num_hidden_layers):
    current_model.add(tf.keras.layers.Dense(units=num_neurons, activation=activation, kernel_initializer=kernel))
current_model.add(tf.keras.layers.Dense(units=num_outputs, activation='linear', kernel_initializer=kernel))
# Finding the Jacobian matrix with respect to a given input of the neural network
# In this case, the inputs are [0.02, 0.4 and 0.12] (i.e. 3 inputs)
x = tf.Variable([[0.02, 0.4, 0.12]], dtype=tf.float32)
with tf.GradientTape() as tape:
    y_pred = x
    for layer in current_model.layers:
        y_pred = layer(y_pred)
jacobian = tape.jacobian(y_pred, x)
print(jacobian)
Below is the error returned. I removed some parts for privacy purposes.
StagingError: in converted code:
C:\Users\...\anaconda3\envs\tf\lib\site-packages\tensorflow_core\python\ops\parallel_for\control_flow_ops.py:183 f *
return _pfor_impl(loop_fn, iters, parallel_iterations=parallel_iterations)
C:\Users\...\anaconda3\envs\tf\lib\site-packages\tensorflow_core\python\ops\parallel_for\control_flow_ops.py:256 _pfor_impl
outputs.append(converter.convert(loop_fn_output))
C:\Users\...\anaconda3\envs\tf\lib\site-packages\tensorflow_core\python\ops\parallel_for\pfor.py:1280 convert
output = self._convert_helper(y)
C:\Users\...\anaconda3\envs\tf\lib\site-packages\tensorflow_core\python\ops\parallel_for\pfor.py:1453 _convert_helper
if flags.FLAGS.op_conversion_fallback_to_while_loop:
C:\Users\...\anaconda3\envs\tf\lib\site-packages\tensorflow_core\python\platform\flags.py:84 __getattr__
wrapped(_sys.argv)
C:\Users\...\anaconda3\envs\tf\lib\site-packages\absl\flags\_flagvalues.py:633 __call__
name, value, suggestions=suggestions)
UnrecognizedFlagError: Unknown command line flag 'f'
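The traceback passes through the parallel_for (pfor) code path and ends in absl's command-line flag parsing. tape.jacobian takes an experimental_use_pfor argument, and passing experimental_use_pfor=False may sidestep that path; this is an untested suggestion, not a confirmed fix. Independently of TensorFlow, a Jacobian can be cross-checked with central finite differences. The sketch below is plain Python, not automatic differentiation. `numerical_jacobian` and `toy_model` are hypothetical names, and `toy_model` (3 inputs, 2 outputs) merely stands in for the real network.

```python
def numerical_jacobian(f, x, eps=1e-6):
    """Central-difference Jacobian: rows are outputs, columns are inputs."""
    n_out = len(f(x))
    jac = [[0.0] * len(x) for _ in range(n_out)]
    for j in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[j] += eps
        x_minus[j] -= eps
        f_plus, f_minus = f(x_plus), f(x_minus)
        for i in range(n_out):
            jac[i][j] = (f_plus[i] - f_minus[i]) / (2 * eps)
    return jac


def toy_model(x):
    # Hypothetical stand-in for the network: 3 inputs, 2 outputs.
    return [x[0] * x[1], x[1] + 2.0 * x[2]]


jac = numerical_jacobian(toy_model, [0.02, 0.4, 0.12])
jac_shape = (len(jac), len(jac[0]))
print(jac_shape)
```

With 200 outputs and 3 inputs the same function would return 200 rows of 3 entries, matching the (200, 3) layout the question expects.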
I am new to PyTorch and would be glad if someone could help me understand the following (and correct me if I am wrong) regarding the meaning of x.view in the first PyTorch tutorial, and more generally the input of convolutional layers and the input of fully-connected layers:
def forward(self, x):
    x = self.pool(F.relu(self.conv1(x)))
    x = self.pool(F.relu(self.conv2(x)))
    x = x.view(-1, 16 * 5 * 5)
    x = F.relu(self.fc1(x))
    x = F.relu(self.fc2(x))
    x = self.fc3(x)
    return x
As far as I understand, an input 256X256 image to a convolutional layer is inserted in its 2D form (i.e. - a 256X256 matrix, or a 256X256X3 in the case of a color image). Nevertheless, when we insert an image to a fully-connected linear layer, we need to first reshape the 2D image into a 1D vector (am I right? Is this true also in general (or only in Pytorch)? ). Is this why we use the command “x = x.view(-1, 16 * 5 * 5)” before inserting x into the fully-connected layers?
If the input image x would be 3D (e.g. 256X256X256), would the syntax of the given above “forward” function remain the same?
Thanks a lot in advance
The illustration is from Petteri Nevavuori's lecture notes and shows how a feature map is produced from an image I with a kernel K. With each application of the kernel a dot product is calculated, which is effectively the sum of element-wise multiplications between I and K in a K-sized area within I.
You could say that the kernel looks for diagonal features. It searches the image and finds a perfectly matching feature in the lower left corner. Elsewhere the kernel is able to identify only parts of the feature it is looking for. This is why the product is called a feature map: it tells how well a kernel was able to identify a feature at each location of the image it was applied to.
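The sliding dot product described above can be sketched in plain Python. The 4x4 image and the diagonal 3x3 kernel are made-up values, not the ones from the lecture notes, and `feature_map` is a hypothetical helper that assumes a square kernel, stride 1 and no padding.

```python
def feature_map(image, kernel):
    """Slide the kernel over the image and sum the element-wise products."""
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(k) for j in range(k))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Diagonal kernel (assumed example).
kernel = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
]

# Image with a full diagonal in the lower-left 3x3 window
# and a single stray pixel in the top-left corner.
image = [
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
]

fmap = feature_map(image, kernel)
print(fmap)  # [[1, 0], [3, 0]]
```

The value 3 marks the window where the whole diagonal matches; the 1 is a partial match.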
Answer adapted from: https://discuss.pytorch.org/t/convolution-input-and-output-channels/10205/3
Let's say we consider an input image of shape (W x H x 3) where input volume has 3 channels (RGB image). Now we would like to create a ConvLayer for this image.
Each kernel in the ConvLayer will use all input channels of the input volume. Let’s assume we would like to use a 3 by 3 kernel. This kernel will have 27 weights and 1 bias parameter, since kernel_width * kernel_height * input_channels = 3 * 3 * 3 = 27 weights.
The number of output channels is the number of different kernels used in the ConvLayer. If we would like to output 64 channels, we need to define ConvLayer such that it uses 64 different 3x3 kernels.
If you check out the documentation of Conv2d, we can define a ConvLayer mimicking above scenario as follows.
nn.Conv2d(3, 64, 3, stride=1)
Here in_channels = 3, out_channels = 64, and kernel_size = 3 (a 3x3 kernel). See the documentation for what stride does.
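The parameter count for this layer follows directly from the reasoning above. A minimal sketch, assuming square kernels and one bias per output channel; `conv2d_param_count` is a hypothetical helper, not part of PyTorch.

```python
def conv2d_param_count(in_channels, out_channels, kernel_size, bias=True):
    """Count learnable parameters of a 2D conv layer with square kernels."""
    weights_per_kernel = kernel_size * kernel_size * in_channels
    params_per_kernel = weights_per_kernel + (1 if bias else 0)
    return out_channels * params_per_kernel


single_kernel = conv2d_param_count(3, 1, 3)   # 27 weights + 1 bias
whole_layer = conv2d_param_count(3, 64, 3)    # 64 such kernels
print(single_kernel, whole_layer)  # 28 1792
```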
If you check out the implementation of Linear layer, you would see the underlying mathematical equation that a linear operation mimics is: y = Ax + b.
According to the PyTorch documentation of the linear layer, it expects an input of shape (N,∗,in_features) and produces an output of shape (N,∗,out_features). So, in your case, if the input image x is of shape 256 x 256 x 256 and you want to transform all the (256*256*256) features to a specific number of features, you can flatten x to shape (N, 256*256*256) and define a linear layer as:
llayer = nn.Linear(256*256*256, num_features)
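Where the 16 * 5 * 5 in x.view(-1, 16 * 5 * 5) comes from can be worked out with the usual output-size formula. The sketch below assumes the tutorial's setup: 32x32 inputs, two 5x5 convolutions without padding, each followed by 2x2 pooling with stride 2, and 16 output channels from the second convolution. Only the 16 * 5 * 5 itself appears in the question; the other numbers are assumptions, and `out_size` and `flat_features` are hypothetical helpers.

```python
def out_size(n, kernel, stride=1, padding=0):
    """Spatial size after a convolution or pooling step along one dimension."""
    return (n + 2 * padding - kernel) // stride + 1


def flat_features(input_size, channels=16):
    """Flattened feature count after conv5 -> pool2 -> conv5 -> pool2."""
    size = out_size(input_size, 5)        # first 5x5 convolution
    size = out_size(size, 2, stride=2)    # first 2x2 pooling
    size = out_size(size, 5)              # second 5x5 convolution
    size = out_size(size, 2, stride=2)    # second 2x2 pooling
    return channels * size * size


small = flat_features(32)    # 16 * 5 * 5
large = flat_features(256)   # 16 * 61 * 61
print(small, large)  # 400 59536
```

So the forward function keeps its structure for a different input size, but the number passed to view (and the in_features of the first linear layer) has to change with it.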
I'm training a CNN model in Keras to classify images belonging to 2 classes. I have about 600 images of class 0 and 1000 images of class 1. My model is shown below. The problem is that it always outputs the class with more samples. I tried changing the last activation function to sigmoid, but it did not help at all. I also tried adding batch normalization as well as regularization and dropout.
def model(input_shape):
    #Define the input placeholder as a tensor with shape input_shape
    X_input = Input(input_shape)
    # First layer
    X = Conv2D(32,(5,5),strides=(1,1),padding='same',name='conv1')(X_input)
    X = BatchNormalization()(X)
    X = Activation('relu')(X)
    X = MaxPooling2D((3,3),strides=2,name='pool1')(X)
    # Second layer
    X = Conv2D(32,(5,5),strides=(1,1),padding='same',name='conv2')(X)
    X = BatchNormalization()(X)
    X = Activation('relu')(X)
    X = AveragePooling2D((3,3),strides=2,name='pool2')(X)
    # Third layer
    X = Conv2D(64,(5,5),strides=(1,1),padding='same',name='conv3')(X)
    X = BatchNormalization()(X)
    X = Activation('relu')(X)
    X = AveragePooling2D((3,3),strides=2,name='pool3')(X)
    # Flatten
    X = Flatten()(X)
    X = Dense(64,activation='softmax',name='fc1')(X)
    X = dropout(0.5)(X)
    X = Dense(2,activation='softmax',name='fc2')(X)
    # Create the model
    model = Model(inputs = X_input,outputs = X)
    return model
Okay, you have a 2-class classification problem where the classes are imbalanced. This problem is especially common in medical diagnostics, and it gives plenty of people a headache, not only you.
You have stated that the specific problem you are solving is classifying tumor histopathological images. These images are very rich in patterns, so 1600 images in total is typically not enough to learn meaningful features and representations. The distribution in minibatches is also imbalanced, so the gradient will always be slightly shifted (given that other filters are producing random noise since they didn't have time to learn meaningful features) towards the local minimum of one-class classification.
There are, however, several techniques to improve the performance:
Trim the larger class, so that the number of samples in each class is roughly the same. So in your case, reduce the class with 1000 samples to around 600.
If you don't want to train on less data, try assigning a weight to each class. Here is a link to a short example. In your specific case, you have 600 images of class A and 1000 images of class B, so you can assign a weight of 1.0 to class B and 10/6 to class A.
As stated above, 1600 samples are not enough to learn meaningful features, especially with a deep neural network. So, what you can try is transfer learning. A very detailed tutorial (which was once on SO Documentation) can be found here. Make sure to look closely at the layer visualization in the tutorial.
Also, to test the model's capability, you might want to try intentional overfitting. A quick checklist can be found here.
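The class-weight suggestion above comes down to a small calculation. A minimal sketch, assuming the counts from the question (600 images of class 0, 1000 images of class 1, which play the roles of class A and class B); `balanced_class_weights` is a hypothetical helper that weights each class by the largest class count divided by its own count.

```python
def balanced_class_weights(counts):
    """Weight each class by (largest class count) / (its own count)."""
    largest = max(counts.values())
    return {label: largest / n for label, n in counts.items()}


# Counts taken from the question: class 0 has 600 images, class 1 has 1000.
weights = balanced_class_weights({0: 600, 1: 1000})
print(weights)
```

A dictionary of this form, keyed by integer class index, is the kind of value Keras accepts for the class_weight argument of model.fit.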
I am trying to implement a text classification model using a CNN. As far as I know, for text data, we should use 1d Convolutions. I saw an example in pytorch using Conv2d but I want to know how can I apply Conv1d for text? Or, it is actually not possible?
Here is my model scenario:
Number of in-channels: 1, Number of out-channels: 128
Kernel size : 3 (only want to consider trigrams)
Batch size : 16
So, I will provide tensors of shape, <16, 1, 28, 300> where 28 is the length of a sentence. I want to use Conv1d which will give me 128 feature maps of length 26 (as I am considering trigrams).
I am not sure, how to define nn.Conv1d() for this setting. I can use Conv2d but want to know is it possible to achieve the same using Conv1d?
This example of Conv1d and Pool1d layers into an RNN resolved my issue.
So, I need to consider the embedding dimension as the number of in-channels while using nn.Conv1d as follows.
m = nn.Conv1d(200, 10, 2) # in-channels = 200, out-channels = 10
input = Variable(torch.randn(10, 200, 5)) # 200 = embedding dim, 5 = seq length
feature_maps = m(input)
print(feature_maps.size()) # feature_maps size = 10,10,4
Although I don't work with text data, the input tensor in its current form would only work with Conv2d. One possible way to use Conv1d would be to concatenate the embeddings into a tensor of shape e.g. <16,1,28*300>. You can reshape the input with view in PyTorch.
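The sequence lengths mentioned in both answers follow from the standard 1D convolution output-length formula. The sketch below is plain arithmetic; `conv1d_out_length` is a hypothetical helper. The last case is an assumption about how the concatenated <16,1,28*300> layout would have to be convolved to cover trigrams: a kernel spanning three 300-dimensional embeddings, moved one word (300 values) at a time.

```python
def conv1d_out_length(seq_len, kernel_size, stride=1, padding=0, dilation=1):
    """Output length of a 1D convolution along the sequence dimension."""
    return (seq_len + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


# Embedding dimension as in-channels: 28 words, trigram kernel.
trigram_len = conv1d_out_length(28, 3)

# The accepted example: sequence length 5, kernel size 2.
example_len = conv1d_out_length(5, 2)

# Concatenated layout (assumed): one channel of 28 * 300 values.
concat_len = conv1d_out_length(28 * 300, 3 * 300, stride=300)

print(trigram_len, example_len, concat_len)  # 26 4 26
```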
I read all the posts I could find addressing the issue where people forgot to change the target vector to a matrix, and as the problem remains after this change, I decided to ask my question here. Workarounds are mentioned below, but new problems show up and I am thankful for suggestions!
Using a convolutional network setup and binary crossentropy with a sigmoid activation function, I get a dimension mismatch problem, but not on the training data, only during validation / test data evaluation. For some strange reason, one of my validation set vectors gets its dimensions switched and I have no idea why. Training, as mentioned above, works fine. Code follows below, most of it copied from the lasagne tutorial example. Thanks a lot for the help (and sorry for hijacking the thread, but I saw no reason for creating a new one).
Workarounds and new problems:
Removing "axis=1" in the valAcc definition helps, but validation accuracy remains zero and test classification always returns the same result, no matter how many nodes, layers, filters etc. I have. Even changing the training set size (I have around 350 samples for each class with 48x64 grayscale images) does not change this. So something seems off.
Network creation:
def build_cnn(imgSet, input_var=None):
    # As a third model, we'll create a CNN of two convolution + pooling stages
    # and a fully-connected hidden layer in front of the output layer.
    # Input layer using shape information from training
    network = lasagne.layers.InputLayer(shape=(None, \
        imgSet.shape[1], imgSet.shape[2], imgSet.shape[3]), input_var=input_var)
    # This time we do not apply input dropout, as it tends to work less well
    # for convolutional layers.
    # Convolutional layer with 32 kernels of size 5x5. Strided and padded
    # convolutions are supported as well; see the docstring.
    network = lasagne.layers.Conv2DLayer(
        network, num_filters=32, filter_size=(5, 5),
        nonlinearity=lasagne.nonlinearities.rectify,
        W=lasagne.init.GlorotUniform())
    # Max-pooling layer of factor 2 in both dimensions:
    network = lasagne.layers.MaxPool2DLayer(network, pool_size=(2, 2))
    # Another convolution with 16 5x5 kernels, and another 2x2 pooling:
    network = lasagne.layers.Conv2DLayer(
        network, num_filters=16, filter_size=(5, 5),
        nonlinearity=lasagne.nonlinearities.rectify)
    network = lasagne.layers.MaxPool2DLayer(network, pool_size=(2, 2))
    # A fully-connected layer of 64 units with 25% dropout on its inputs:
    network = lasagne.layers.DenseLayer(
        lasagne.layers.dropout(network, p=.25),
        num_units=64,
        nonlinearity=lasagne.nonlinearities.rectify)
    # And, finally, the 1-unit sigmoid output layer with 50% dropout on its inputs:
    network = lasagne.layers.DenseLayer(
        lasagne.layers.dropout(network, p=.5),
        num_units=1,
        nonlinearity=lasagne.nonlinearities.sigmoid)
    return network
Target matrices for all sets are created like this (training target vector as an example)
targetsTrain = np.vstack( (targetsTrain, [[targetClass], ]*numTr) );
...and the theano variables as such
inputVar = T.tensor4('inputs')
targetVar = T.imatrix('targets')
network = build_cnn(trainset, inputVar)
predictions = lasagne.layers.get_output(network)
loss = lasagne.objectives.binary_crossentropy(predictions, targetVar)
loss = loss.mean()
params = lasagne.layers.get_all_params(network, trainable=True)
updates = lasagne.updates.nesterov_momentum(loss, params, learning_rate=0.01, momentum=0.9)
valPrediction = lasagne.layers.get_output(network, deterministic=True)
valLoss = lasagne.objectives.binary_crossentropy(valPrediction, targetVar)
valLoss = valLoss.mean()
valAcc = T.mean(T.eq(T.argmax(valPrediction, axis=1), targetVar), dtype=theano.config.floatX)
train_fn = function([inputVar, targetVar], loss, updates=updates, allow_input_downcast=True)
val_fn = function([inputVar, targetVar], [valLoss, valAcc])
Finally, here the two loops, training and test. The first is fine, the second throws the error, excerpts below
# -- Neural network training itself -- #
numIts = 100
for itNr in range(0, numIts):
    train_err = 0
    train_batches = 0
    for batch in iterate_minibatches(trainset.astype('float32'), targetsTrain.astype('int8'), len(trainset)//4, shuffle=True):
        inputs, targets = batch
        print (inputs.shape)
        print(targets.shape)
        train_err += train_fn(inputs, targets)
        train_batches += 1
    # And a full pass over the validation data:
    val_err = 0
    val_acc = 0
    val_batches = 0
    for batch in iterate_minibatches(valset.astype('float32'), targetsVal.astype('int8'), len(valset)//3, shuffle=False):
        [inputs, targets] = batch
        [err, acc] = val_fn(inputs, targets)
        val_err += err
        val_acc += acc
        val_batches += 1
Error (excerpts)
Exception "unhandled ValueError"
Input dimension mis-match. (input[0].shape[1] = 52, input[1].shape[1] = 1)
Apply node that caused the error: Elemwise{eq,no_inplace}(DimShuffle{x,0}.0, targets)
Toposort index: 36
Inputs types: [TensorType(int64, row), TensorType(int32, matrix)]
Inputs shapes: [(1, 52), (52, 1)]
Inputs strides: [(416, 8), (4, 4)]
Inputs values: ['not shown', 'not shown']
Again, thanks for help!
So it seems the error is in the evaluation of the validation accuracy.
When you remove the "axis=1" in your calculation, the argmax runs over the whole flattened array and returns a single number.
Broadcasting then steps in, which is why you see the same value for the whole set.
But from the error you have posted, the "T.eq" op throws the error because it has to compare a 52 x 1 with a 1 x 52 vector (matrix for theano/numpy).
So, I suggest you try to replace the line with:
valAcc = T.mean(T.eq(T.argmax(valPrediction, axis=1), targetVar.T))
I hope this should fix the error, but I haven't tested it myself.
EDIT:
The error lies in the argmax op that is called.
Normally, the argmax is there to determine which of the output units is activated the most.
However, in your setting you only have one output neuron which means that the argmax over all output neurons will always return 0 (for first arg).
This is why you have the impression your network gives you always 0 as output.
By replacing:
valAcc = T.mean(T.eq(T.argmax(valPrediction, axis=1), targetVar.T))
with:
binaryPrediction = valPrediction > .5
valAcc = T.mean(T.eq(binaryPrediction, targetVar.T))
you should get the desired result.
I'm just not sure, if the transpose is still necessary or not.
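The difference between the two accuracy computations can be seen in a plain-Python sketch. The predictions and targets are made-up values of shape (N, 1), mimicking a single sigmoid output, and `argmax` is a hypothetical helper.

```python
def argmax(row):
    """Index of the largest value in a row."""
    return max(range(len(row)), key=lambda i: row[i])


# Made-up sigmoid outputs and targets, both of shape (N, 1).
predictions = [[0.9], [0.2], [0.7], [0.4]]
targets = [[1], [0], [0], [0]]

# With a single output unit, argmax over axis 1 can only ever return 0.
argmax_labels = [argmax(row) for row in predictions]

# Thresholding at 0.5 turns each probability into a 0/1 prediction instead.
binary = [[1 if p > 0.5 else 0 for p in row] for row in predictions]

# Compare row by row; both lists have shape (N, 1).
accuracy = sum(b[0] == t[0] for b, t in zip(binary, targets)) / len(targets)

print(argmax_labels)  # [0, 0, 0, 0]
print(accuracy)       # 0.75
```

In this sketch both lists have shape (N, 1), so they are compared element by element with no transpose. Whether the same holds in the Theano graph depends on the shapes of valPrediction and targetVar there.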