I have recently started to learn coding with PyTorch. While trying to build a CNN model for the FashionMNIST dataset, I encountered the following problem:
TypeError                                 Traceback (most recent call last)
<ipython-input> in <module>()
----> 1 model = CNN(K)

TypeError: __init__() takes 1 positional argument but 2 were given
I have read the answers to similar questions, but I am still not able to solve my problem. I would be deeply grateful if anyone could help me in this regard.
Here is the code:
train_dataset = torchvision.datasets.FashionMNIST(root='.', train=True, transform=transforms.ToTensor(), download=True)
test_dataset = torchvision.datasets.FashionMNIST(root='.', train=False, transform=transforms.ToTensor(), download=True)

K = len(set(train_dataset.targets.numpy()))
class CNN(nn.Module):
    def __int__(self, K):
        super(CNN, self).__int__()
        self.conv_layers = nn.Sequential(
            nn.Conv2d(in_channels=1, out_channels=32, kernel_size=3, stride=2),
            nn.ReLU(),
            nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, stride=2),
            nn.ReLU(),
            nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3, stride=2),
            nn.ReLU()
        )
        self.dense_layers = nn.Sequential(
            nn.Dropout(0.2),
            nn.Linear(128 * 2 * 2, 512),
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(512, K)
        )

    def forward(self, x):
        out = self.conv_layers(x)
        out = out.view(out.size(0), -1)
        out = self.dense_layers(out)
        return out
There is a typo in your initializer method header: it should be def __init__, not def __int__. Note that the same typo appears in your super(CNN, self).__int__() call, which needs the same fix.
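A minimal sketch of the corrected lines (the rest of the class stays exactly as in the question):

class CNN(nn.Module):
    def __init__(self, K):               # was: def __int__(self, K)
        super(CNN, self).__init__()      # was: super(CNN, self).__int__()
        # ... conv_layers / dense_layers definitions as in the question ...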
You could try deleting self in your def forward(self, x): line, since it's already inside your object... I'm not really sure about that, but it seems that one of the self arguments is not required since you are already inside your class.
So, I'm training a DCGAN model in PyTorch on the CelebA dataset (faces). Here is the architecture of the generator:
Generator(
(main): Sequential(
(0): ConvTranspose2d(100, 512, kernel_size=(4, 4), stride=(1, 1), bias=False)
(1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): ConvTranspose2d(512, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
(4): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): ReLU(inplace=True)
(6): ConvTranspose2d(256, 128, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
(7): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(8): ReLU(inplace=True)
(9): ConvTranspose2d(128, 64, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
(10): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(11): ReLU(inplace=True)
(12): ConvTranspose2d(64, 3, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
(13): Tanh()
)
)
So after training, I want to check what the generator outputs if I feed it an occluded image like this:
(size: 64x64)
But as you might have guessed, the image has 3 channels while my generator accepts a 100-channel latent vector as input, so what is the correct way to feed this image to the generator and check the output? (I'm expecting the generator to try to generate only the occluded part of the image.) If you want reference code, try this demo file from PyTorch. I have modified that file according to my own needs, so for reference it will do the trick.
You just can't do that. As you said, your network expects a 100-dimensional input, which is normally sampled from a standard normal distribution:
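In the linked tutorial, that sampling looks roughly like this (nz = 100 is the latent dimension, b_size the batch size):

# latent vectors drawn from a standard normal distribution
noise = torch.randn(b_size, 100, 1, 1, device=device)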
So the generator's job is to take this random vector and generate a 3x64x64 image that is indistinguishable from real images. I don't see any way to input your image into the current network without modifying the architecture and retraining a new model. If you want to try a new model, you can change the input to occluded images, apply some convolutional/linear layers to reduce the dimensions to 100, then keep the rest of the network the same. This way the network will try to learn to generate images not from a latent vector but from a feature vector extracted from the occluded images. It may or may not work.
EDIT: I've decided to give it a go and see whether the network can learn with this type of conditioned input vector instead of a latent vector. I used the tutorial example you linked and added a couple of changes. First, a new network for receiving the occluded input and reducing it to 100 dimensions:
class ImageTransformer(nn.Module):
    def __init__(self):
        super(ImageTransformer, self).__init__()
        self.main = nn.Sequential(
            nn.Conv2d(3, 1, 4, 2, 1, bias=False),
            nn.LeakyReLU(0.2, inplace=True)
        )
        self.linear = nn.Linear(32 * 32, 100)

    def forward(self, input):
        out = self.main(input).view(input.shape[0], -1)
        return self.linear(out).view(-1, 100, 1, 1)
Just a simple convolution layer + LeakyReLU + a linear layer to map to 100 dimensions at the output. Note that you could try a much better network here as a feature extractor; I just wanted to make a simple test.
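A quick shape check shows what the transformer does to a batch of 3x64x64 images:

import torch

x = torch.randn(8, 3, 64, 64)    # dummy batch standing in for occluded images
z = ImageTransformer()(x)
print(z.shape)                   # torch.Size([8, 100, 1, 1]), the shape netG expects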
fixed_input = next(iter(dataloader))[0][0:64, :, :, :]
fixed_input[:, :, 20:44, 20:44] = torch.tensor(np.zeros((24, 24), dtype=np.float32))
fixed_input = fixed_input.to(device)
This is how I modify the tensor to add a black patch over the input. I just sampled a batch to create a fixed input for tracking progress, as was done in the tutorial with a fixed random vector.
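As a side note, the same masking can be written more compactly by letting PyTorch broadcast a scalar over the patch:

fixed_input[:, :, 20:44, 20:44] = 0.0   # equivalent to assigning the zero tensor above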
# Create the generator
netG = Generator().to(device)
netD = Discriminator().to(device)
netT = ImageTransformer().to(device)
# Apply the weights_init function to randomly initialize all weights
# to mean=0, stdev=0.02.
netG.apply(weights_init)
netD.apply(weights_init)
netT.apply(weights_init)
# Print the model
print(netG)
print(netD)
print(netT)
Most of the steps are the same; I just created an instance of the new transformer network. Then, finally, the training loop is slightly modified: the generator is not fed random vectors but the outputs of the new transformer network.
img_list = []
G_losses = []
D_losses = []
iters = 0

for epoch in range(num_epochs):
    for i, data in enumerate(dataloader, 0):
        ############################
        # (1) Update D network: maximize log(D(x)) + log(1 - D(G(z)))
        ###########################
        ## Train with all-real batch
        netD.zero_grad()
        transformed = data[0].detach().clone()
        transformed[:, :, 20:44, 20:44] = torch.tensor(np.zeros((24, 24), dtype=np.float32))
        transformed = transformed.to(device)
        real_cpu = data[0].to(device)
        b_size = real_cpu.size(0)
        label = torch.full((b_size,), real_label, dtype=torch.float, device=device)
        output = netD(real_cpu).view(-1)
        errD_real = criterion(output, label)
        errD_real.backward()
        D_x = output.mean().item()

        ## Train with all-fake batch
        fake = netT(transformed)
        fake = netG(fake)
        label.fill_(fake_label)
        output = netD(fake.detach()).view(-1)
        errD_fake = criterion(output, label)
        errD_fake.backward()
        D_G_z1 = output.mean().item()
        errD = errD_real + errD_fake
        optimizerD.step()

        ############################
        # (2) Update G network: maximize log(D(G(z)))
        ###########################
        netG.zero_grad()
        label.fill_(real_label)
        output = netD(fake).view(-1)
        errG = criterion(output, label)
        errG.backward()
        D_G_z2 = output.mean().item()
        optimizerG.step()

        # Output training stats
        if i % 50 == 0:
            print('[%d/%d][%d/%d]\tLoss_D: %.4f\tLoss_G: %.4f\tD(x): %.4f\tD(G(z)): %.4f / %.4f'
                  % (epoch, num_epochs, i, len(dataloader),
                     errD.item(), errG.item(), D_x, D_G_z1, D_G_z2))

        # Save losses for plotting later
        G_losses.append(errG.item())
        D_losses.append(errD.item())

        # Check how the generator is doing by saving G's output on fixed_input
        if (iters % 500 == 0) or ((epoch == num_epochs - 1) and (i == len(dataloader) - 1)):
            with torch.no_grad():
                fake = netT(fixed_input)
                fake = netG(fake).detach().cpu()
            img_list.append(vutils.make_grid(fake, padding=2, normalize=True))

        iters += 1
Training was somewhat okay in terms of loss reduction etc. Finally, this is what I got after 5 epochs of training:
So what does this result tell us? Since the generator's inputs were not randomly sampled from a normal distribution, the generator wasn't able to learn the distribution of faces well enough to create a varying range of output faces. And since the input is a conditioned feature vector, the range of output images is limited. So in summary, random inputs are required for the generator, even though it did learn to remove patches :)
Suppose I create the simplest model in Keras:
from keras.layers import *
from keras import Input, Model
import coremltools
def MyModel(inputs_shape=(None, None, 3), channels=64):
    inpt = Input(shape=inputs_shape)
    # channels
    skip = Conv2D(channels, (3, 3), strides=1, activation=None, padding='same', name='conv_in')(inpt)
    out = Conv2D(3, (3, 3), strides=1, padding='same', activation='tanh', name='out')(skip)
    return Model(inputs=inpt, outputs=out)
model = MyModel()
coreml_model = coremltools.converters.keras.convert(model,
input_names=["inp1"],
output_names=["out1"],
image_scale=1.0,
model_precision='float32',
use_float_arraytype=True,
input_name_shape_dict={'inp1': [None, 384, 384, 3]}
)
spec = coreml_model._spec
print(spec.description.input[0])
print(spec.description.input[0].type.multiArrayType.shape)
print(spec.description.output[0])
coremltools.utils.save_spec(spec, "test.mlmodel")
The output is:
2 : out, <keras.layers.convolutional.Conv2D object at 0x7f08ca491470>
3 : out__activation__, <keras.layers.core.Activation object at 0x7f08ca4b0b70>
name: "inp1"
type {
multiArrayType {
shape: 3
shape: 384
shape: 384
dataType: FLOAT32
}
}
[3, 384, 384]
name: "out1"
type {
multiArrayType {
shape: 3
dataType: FLOAT32
}
}
So the output shape is 3, which is incorrect. And when I try to get rid of input_name_shape_dict, I get:
Please provide a finite height (H), width (W) & channel value (C) using input_name_shape_dict arg with key = 'inp1' and value = [None, H, W, C]
Converted .mlmodel can be modified to have flexible input shape using coremltools.models.neural_network.flexible_shape_utils
So it wants NHWC.
An attempt at inference yields:
Layer 'conv_in' of type 'Convolution' has input rank 3 but expects rank at least 4
When I attempt to add an extra dimension to the input:
spec.description.input[0].type.multiArrayType.shape.extend([1, 3, 384, 384])
del spec.description.input[0].type.multiArrayType.shape[0]
del spec.description.input[0].type.multiArrayType.shape[0]
del spec.description.input[0].type.multiArrayType.shape[0]
[name: "inp1"
type {
multiArrayType {
shape: 1
shape: 3
shape: 384
shape: 384
dataType: FLOAT32
}
}
]
at inference I get:
Shape (1 x 384 x 384 x 3) was not in enumerated set of allowed shapes
Following this advice and making the input shape (1, 1, 384, 384, 3) does not help.
How can I make this work and produce the correct output?
Inference:
from PIL import Image
import numpy as np
import coremltools

model_cml = coremltools.models.MLModel('my.mlmodel')

# load image and scale to [-1, 1]
img = np.array(Image.open('patch4.png').convert('RGB'))[np.newaxis, ...] / 127.5 - 1
# Make predictions
predictions = model_cml.predict({'inp1':img})
# save result
res = predictions['out1']
res = np.clip((res[0]+1)*127.5,0,255).astype(np.uint8)
Image.fromarray(res).save('out32.png')
UPDATE:
I am able to run this model with inputs of shape (3, 1, 384, 384); the result produced is (1, 3, 3, 384, 384), which does not make any sense to me.
UPDATE 2:
Setting a fixed shape in Keras

def MyModel(inputs_shape=(384, 384, 3), channels=64):
    inpt = Input(shape=inputs_shape)

fixes the output shape problem, but I still cannot run the model (Layer 'conv_in' of type 'Convolution' has input rank 3 but expects rank at least 4).
UPDATE 3:
The following gets rid of the mismatch between the input and conv_in shapes:
1) Downgrade to coremltools==3.0. Version 3.3 (model version 4) seems broken.
2) Use a fixed shape in the Keras model, no input_name_shape_dict, and a variable shape for the CoreML model.
from keras.layers import *
from keras import Input, Model
import coremltools
def MyModel(inputs_shape=(384, 384, 3), channels=64):
    inpt = Input(shape=inputs_shape)
    # channels
    skip = Conv2D(channels, (3, 3), strides=1, activation=None, padding='same', name='conv_in')(inpt)
    out = Conv2D(3, (3, 3), strides=1, padding='same', activation='tanh', name='out')(skip)
    return Model(inputs=inpt, outputs=out)
model = MyModel()
model.save('test.model')
print(model.summary())
'''
# v.3.3
coreml_model = coremltools.converters.keras.convert(model,
input_names=["image"],
output_names="out1",
image_scale=1.0,
model_precision='float32',
use_float_arraytype=True,
input_name_shape_dict={'inp1': [None, 384, 384, 3]}
)
'''
coreml_model = coremltools.converters.keras.convert(model,
input_names=["image"],
output_names="out1",
image_scale=1.0,
model_precision='float32',
)
spec = coreml_model._spec
from coremltools.models.neural_network import flexible_shape_utils
shape_range = flexible_shape_utils.NeuralNetworkMultiArrayShapeRange()
shape_range.add_channel_range((3,3))
shape_range.add_height_range((64, 384))
shape_range.add_width_range((64, 384))
flexible_shape_utils.update_multiarray_shape_range(spec, feature_name='image', shape_range=shape_range)
print(spec.description.input)
print(spec.description.input[0].type.multiArrayType.shape)
print(spec.description.output)
coremltools.utils.save_spec(spec, "my.mlmodel")
In the inference script, feed an array of shape (1, 1, 3, 384, 384):
img = np.zeros((1, 1, 3, 384, 384))

# Make predictions (the input feature is named 'image' in this version of the model)
predictions = model_cml.predict({'image': img})
res = predictions['out1']  # (3, 384, 384)
You can ignore what the mlmodel file reports for the output shape if it is incorrect. This is more of a metadata issue, i.e. the model will still work fine and do the right thing. The converter isn't always able to figure out the correct output shape (I'm not sure why).
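As a quick sanity check, you can run a prediction and print the shape that actually comes out, regardless of what the spec metadata claims (file and feature names as in the question above):

import numpy as np
import coremltools

model = coremltools.models.MLModel('my.mlmodel')
out = model.predict({'image': np.zeros((1, 1, 3, 384, 384))})['out1']
print(out.shape)  # the real output shape, independent of the spec metadata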
I'm using transfer learning for semantic segmentation.
model = vgg(weights="imagenet")
new_model = Sequential()
for l, n in model.layers:
    new_model.add(l)
    if n == 18:
        break

# Upsampling
m1 = model.layers[-1].output
new_model.add(Conv2DTranspose(512, (3, 3), strides=(2, 2), padding="same"))
m2 = new_model.layers[-1].output
concatenate1 = concatenate(m1, m2)
Up to this step it works fine. Now, how can I add this concatenation to the network?
new_model.layers[-1].output = concatenate1.output
new_model.layers[-1].output = concatenate1
# these are wrong
You can directly use the functional API of Keras; that will be easier for you.
You simply use all the layers up to n == 18, and that output connects to m1.
Finally, you create the model. The code would be the following:
model = vgg(weights="imagenet")
input_ = model.input

# find the layer at index 18
last_layer = None
for n, l in enumerate(model.layers):
    if n == 18:
        last_layer = l
        break

# Upsampling
m1 = last_layer.output
m2 = Conv2DTranspose(512, (3, 3), strides=(2, 2), padding="same")(m1)
# note: Concatenate expects a list of tensors with matching spatial dimensions
concatenate1 = Concatenate(axis=-1)([m1, m2])

new_model = Model(inputs=input_, outputs=concatenate1)
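One caveat: Concatenate requires m1 and m2 to have the same height and width, and Conv2DTranspose with strides=(2, 2) doubles the spatial size of its input. In a typical U-Net-style decoder, the upsampled tensor is therefore concatenated with an earlier, spatially larger feature map rather than with its own input.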
My input shape is 10000x500: 10000 is the number of text documents and 500 the number of words per document.
What I am trying to do is feed the text into Keras's Embedding, followed by a BLSTM, then a Conv2D, then 2D pooling, a flatten, and finally a fully connected dense layer.
The architecture is shown below:
inp = Input(shape=(500,))
x = Embedding(input_dim=10000, output_dim=100)(inp)
x = Bidirectional(CuDNNLSTM(50, return_sequences=True))(x)
x = Conv2D(filters=128, kernel_size=(3, 3), input_shape=(100,500,1))(x)
x = MaxPooling2D()(x)
x = Flatten()(x)
x = Dense(1, activation="sigmoid")(x)
The output shape from the embedding would be (None, 500, 100)
The output shape from BLSTM's hidden state would be (None, 500, 100).
I would like the Conv2D to extract local features over the hidden states from the BLSTM. However, I'm getting a dimension mismatch error:
ValueError: Input 0 is incompatible with layer conv2d_8: expected ndim=4, found ndim=3
I have tried the solution from "When bulding a CNN, I am getting complaints from Keras that do not make sense to me." but am still getting the error.
You have two options:
a) Use Conv2D with rows=500, cols=100 and channels=1 by adding a trailing dimension to x:
x = Lambda(lambda t: t[..., None])(x)  # (None, 500, 100) -> (None, 500, 100, 1)
x = Conv2D(filters=128, kernel_size=(3, 3))(x)
b) Use Conv1D with steps=500 and input_dim=100, and use MaxPooling1D:
x = Conv1D(filters=128, kernel_size=3)(x)  # (None, 500, 100) -> (None, 498, 128)
x = MaxPooling1D()(x)
x = Flatten()(x)
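For what it's worth, with the default pooling sizes option (b) keeps the flattened feature vector much smaller: Flatten yields 249 * 128 = 31,872 features, versus 249 * 49 * 128 = 1,561,728 for option (a), which makes a big difference for the size of the final Dense layer.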