I have images of shape 391 x 400. I attempted to use the autoencoder as described here.
Specifically, I have used the following code:
from keras.layers import Input, Dense, Conv2D, MaxPooling2D, UpSampling2D
from keras.models import Model
from keras import backend as K
input_img = Input(shape=(391, 400, 1)) # adapt this if using `channels_first` image data format
x = Conv2D(16, (3, 3), activation='relu', padding='same')(input_img)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
encoded = MaxPooling2D((2, 2), padding='same')(x)
# at this point the representation is (49, 50, 8) for a 391 x 400 input
x = Conv2D(8, (3, 3), activation='relu', padding='same')(encoded)
x = UpSampling2D((2, 2))(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)
x = Conv2D(16, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)
decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)
autoencoder = Model(input_img, decoded)
autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')
I am getting the following:
ValueError: Error when checking target: expected conv2d_37 to have shape (None, 392, 400, 1) but got array with shape (500, 391, 400, 1)
What I need: a layer that would drop/crop/reshape the last layer from 392 x 400 to 391 x 400.
Thank you for any help.
There's a layer called Cropping2D. To crop the output of the last layer from 392 x 400 to 391 x 400, apply it to the decoded tensor:
from keras.layers import Cropping2D
cropped = Cropping2D(cropping=((1, 0), (0, 0)))(decoded)
autoencoder = Model(input_img, cropped)
The tuple ((1, 0), (0, 0)) means to crop 1 row from the top. If you want to crop from the bottom, use ((0, 1), (0, 0)) instead. See the documentation for a more detailed description of the cropping argument.
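To see where the extra row comes from, here is a minimal sketch in plain Python (no Keras needed). It assumes every pooling layer uses padding='same' and every decoder convolution keeps the spatial size, as in the question's code; the helper names reconstructed_size and cropping_for are hypothetical.

```python
import math

def reconstructed_size(size, n_pool=3, pool=2):
    """Size of one spatial axis after n_pool 'same'-padded poolings and n_pool upsamplings."""
    for _ in range(n_pool):
        size = math.ceil(size / pool)   # MaxPooling2D(padding='same') rounds up
    return size * pool ** n_pool        # each UpSampling2D multiplies by the pool factor

def cropping_for(height, width, n_pool=3, pool=2):
    """Cropping2D argument that trims the surplus rows/columns from the top/left."""
    extra_h = reconstructed_size(height, n_pool, pool) - height
    extra_w = reconstructed_size(width, n_pool, pool) - width
    return ((extra_h, 0), (extra_w, 0))

print(reconstructed_size(391), reconstructed_size(400))  # 392 400
print(cropping_for(391, 400))                             # ((1, 0), (0, 0))
```

The height goes 391 -> 196 -> 98 -> 49 on the way down and 49 * 8 = 392 on the way up, which is why exactly one row has to be cropped.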
Related
I have been following the Keras documentation to build up a CNN autoencoder
https://blog.keras.io/building-autoencoders-in-keras.html .
from keras.layers import Input, Dense, Conv2D, MaxPooling2D, UpSampling2D
from keras.models import Model
from keras import backend as K
input_img = Input(shape=(28, 28, 1)) # adapt this if using `channels_first` image data format
x = Conv2D(16, (3, 3), activation='relu', padding='same')(input_img)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
encoded = MaxPooling2D((2, 2), padding='same')(x)
# at this point the representation is (4, 4, 8) i.e. 128-dimensional
x = Conv2D(8, (3, 3), activation='relu', padding='same')(encoded)
x = UpSampling2D((2, 2))(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)
x = Conv2D(16, (3, 3), activation='relu')(x)
x = UpSampling2D((2, 2))(x)
decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)
autoencoder = Model(input_img, decoded)
autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')
I have noticed that it uses Conv2D in its decoding layers instead of Conv2DTranspose. But some other articles explain CNN autoencoders using Conv2DTranspose as a replacement for UpSampling2D and Conv2D. I have seen several questions about Conv2DTranspose itself, but I haven't found an answer to my question.
My question is: can I use Conv2DTranspose instead of the UpSampling2D and Conv2D layers? If so, why haven't the authors themselves (of the Keras documentation) used it? Does it make any difference?
Transposed convolutions often produce so-called checkerboard artifacts: small adjacent squares that are easily distinguishable from each other. These make it very easy for humans to tell fake images from real ones.
You can read this article for more information.
In short, using resizing + Conv2D instead of Conv2DTranspose reduces these checkerboard artifacts.
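A common explanation for these artifacts is uneven overlap: when the kernel size is not divisible by the stride, some output pixels of a transposed convolution receive more kernel contributions than their neighbours. The following is a minimal 1-D sketch of that counting argument in plain Python; overlap_counts is a hypothetical helper, not a Keras function.

```python
def overlap_counts(n_inputs, kernel_size, stride):
    """Count how many kernel taps land on each output position of a
    1-D transposed convolution without padding."""
    length = (n_inputs - 1) * stride + kernel_size
    counts = [0] * length
    for i in range(n_inputs):            # each input element ...
        for k in range(kernel_size):     # ... paints kernel_size consecutive outputs
            counts[i * stride + k] += 1
    return counts

print(overlap_counts(4, kernel_size=3, stride=2))  # [1, 1, 2, 1, 2, 1, 2, 1, 1]
print(overlap_counts(4, kernel_size=4, stride=2))  # [1, 1, 2, 2, 2, 2, 2, 2, 1, 1]
```

With kernel size 3 and stride 2 the counts alternate between 1 and 2, which is the 1-D analogue of a checkerboard. With kernel size 4 the interior is uniform. Upsampling followed by a stride-1 Conv2D avoids the issue because every output pixel is covered by the same number of taps.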
I am new to using unsupervised CNN models in Python. I am trying to use a CNN model for image classification with unlabeled spectrogram images as input. Each image is 523 pixels wide and 393 pixels high. I have tried the following code:
import glob
import cv2
import numpy as np
from scipy import misc
X_data = []
files = glob.glob("C:/train/*.png")
for myFile in files:
    image = cv2.imread(myFile)
    image_resized = misc.imresize(image, (523,393))
    image_resi = misc.imresize(image_resized, (28, 28))
    assert image_resized.shape == (523,393, 3), "img %s has shape %r" % (myFile, image_resized.shape)
    X_data.append(image_resi)
X_datatest = []
files = glob.glob ("C:/test/*.png")
for myFile in files:
    image = cv2.imread(myFile)
    image_resized = misc.imresize(image, (523,393))
    image_resi = misc.imresize(image_resized, (28, 28))
    assert image_resized.shape == (523,393, 3), "img %s has shape %r" % (myFile, image_resized.shape)
    X_datatest.append(image_resi)
X_data = np.array(X_data)
X_datatest = np.array(X_datatest)
X_data= X_data.astype('float32') / 255.
X_datatest = X_datatest.astype('float32') / 255.
X_data = np.reshape(X_data, (len(X_data), 28, 28, 3)) # adapt this if using `channels_first` image data format
X_datatest = np.reshape(X_datatest, (len(X_datatest), 28, 28, 3)) # adapt this if using `channels_first` image data format
noise_factor = 0.5
x_train_noisy = X_data + noise_factor * np.random.normal(loc=0.0, scale=1.0, size=X_data.shape)
x_test_noisy = X_datatest + noise_factor * np.random.normal(loc=0.0, scale=1.0, size=X_datatest.shape)
x_train_noisy = np.clip(x_train_noisy, 0., 1.)
x_test_noisy = np.clip(x_test_noisy, 0., 1.)
input_img = Input(shape=(28, 28, 3)) # adapt this if using `channels_first` image data format
x = Conv2D(32, (3, 3), activation='relu', padding='same')(input_img)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(32, (3, 3), activation='relu', padding='same')(x)
encoded = MaxPooling2D((2, 2), padding='same')(x)
# at this point the representation is (7, 7, 32)
x = Conv2D(32, (3, 3), activation='relu', padding='same')(encoded)
x = UpSampling2D((2, 2))(x)
x = Conv2D(32, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)
decoded = Conv2D(3, (3, 3), activation='sigmoid', padding='same')(x)
autoencoder = Model(input_img, decoded)
autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy', metrics=['accuracy'] )
autoencoder.fit(x_train_noisy, X_data,
                epochs=100,
                batch_size=128,
                verbose=2,
                validation_data=(x_test_noisy, X_datatest),
                callbacks=[TensorBoard(log_dir='/tmp/tb', histogram_freq=0, write_graph=False)])
I added noise to the images and trained against the clean images, because I don't have labels: this is unsupervised spectrogram data. But the reported accuracy is only 33%, and I don't know why. Can anyone help me with this and help me understand how the number of filters and the kernel size are chosen, and what the resize to 28*28 is based on? And why not just use the actual image size, which here is 523 wide by 393 high?
I have a situation where the input is an image plus a group of three numeric fields, and the output is an image mask. I am not sure how to do that in Keras.
My architecture is roughly as shown in the attachment. I know the CNN and Dense architectures, but I am not sure how to pass the inputs to the corresponding networks and do the concat operation. Suggestions for a better architecture would also be great!
Please advise, preferably with example code.
Thanks in advance, Utpal.
I would suggest trying a U-Net model for this problem. A typical U-Net consists of several convolution and max-pooling layers, followed by several convolution and upsampling layers, with skip connections between matching levels.
For this problem you can mix in the non-spatial data (the image annotation) in the middle of the network.
It may also be a good idea to start from a pre-trained VGG-16 (see vgg.load_weights(VGG_Weights_path) in the code below).
See the code below (based on Divam Gupta's repo):
from keras.models import *
from keras.layers import *
IMAGE_ORDERING = 'channels_first'  # the tensors below are laid out as (channels, height, width)
# Assumed location of the weights file from the URL below; adjust to where you saved it
VGG_Weights_path = 'vgg16_weights_th_dim_ordering_th_kernels.h5'
def VGGUnet(n_classes, input_height=416, input_width=608, data_length=128, vgg_level=3):
    assert input_height % 32 == 0
    assert input_width % 32 == 0
    # https://github.com/fchollet/deep-learning-models/releases/download/v0.1/vgg16_weights_th_dim_ordering_th_kernels.h5
    img_input = Input(shape=(3, input_height, input_width))
    data_input = Input(shape=(data_length,))
    # Block 1
    x = Conv2D(64, (3, 3), activation='relu', padding='same', name='block1_conv1', data_format=IMAGE_ORDERING)(img_input)
    x = Conv2D(64, (3, 3), activation='relu', padding='same', name='block1_conv2', data_format=IMAGE_ORDERING)(x)
    x = MaxPooling2D((2, 2), strides=(2, 2), name='block1_pool', data_format=IMAGE_ORDERING)(x)
    f1 = x
    # Block 2
    x = Conv2D(128, (3, 3), activation='relu', padding='same', name='block2_conv1', data_format=IMAGE_ORDERING)(x)
    x = Conv2D(128, (3, 3), activation='relu', padding='same', name='block2_conv2', data_format=IMAGE_ORDERING)(x)
    x = MaxPooling2D((2, 2), strides=(2, 2), name='block2_pool', data_format=IMAGE_ORDERING)(x)
    f2 = x
    # Block 3
    x = Conv2D(256, (3, 3), activation='relu', padding='same', name='block3_conv1', data_format=IMAGE_ORDERING)(x)
    x = Conv2D(256, (3, 3), activation='relu', padding='same', name='block3_conv2', data_format=IMAGE_ORDERING)(x)
    x = Conv2D(256, (3, 3), activation='relu', padding='same', name='block3_conv3', data_format=IMAGE_ORDERING)(x)
    x = MaxPooling2D((2, 2), strides=(2, 2), name='block3_pool', data_format=IMAGE_ORDERING)(x)
    f3 = x
    # Block 4
    x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block4_conv1', data_format=IMAGE_ORDERING)(x)
    x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block4_conv2', data_format=IMAGE_ORDERING)(x)
    x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block4_conv3', data_format=IMAGE_ORDERING)(x)
    x = MaxPooling2D((2, 2), strides=(2, 2), name='block4_pool', data_format=IMAGE_ORDERING)(x)
    f4 = x
    # Block 5
    x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block5_conv1', data_format=IMAGE_ORDERING)(x)
    x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block5_conv2', data_format=IMAGE_ORDERING)(x)
    x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block5_conv3', data_format=IMAGE_ORDERING)(x)
    x = MaxPooling2D((2, 2), strides=(2, 2), name='block5_pool', data_format=IMAGE_ORDERING)(x)
    f5 = x
    x = Flatten(name='flatten')(x)
    x = Dense(4096, activation='relu', name='fc1')(x)
    x = Dense(4096, activation='relu', name='fc2')(x)
    x = Dense(1000, activation='softmax', name='predictions')(x)
    vgg = Model(img_input, x)
    vgg.load_weights(VGG_Weights_path)
    levels = [f1, f2, f3, f4, f5]
    # Several dense layers for image annotation processing
    # (integer division: Dense units and Reshape dimensions must be ints)
    data_layer = Dense(1024, activation='relu', name='data1')(data_input)
    data_layer = Dense(input_height * input_width // 256, activation='relu', name='data2')(data_layer)
    data_layer = Reshape((1, input_height // 16, input_width // 16))(data_layer)
    # Mix image annotations here
    o = (concatenate([f4, data_layer], axis=1))
    o = (ZeroPadding2D((1, 1), data_format=IMAGE_ORDERING))(o)
    o = (Conv2D(512, (3, 3), padding='valid', data_format=IMAGE_ORDERING))(o)
    o = (BatchNormalization())(o)
    o = (UpSampling2D((2, 2), data_format=IMAGE_ORDERING))(o)
    o = (concatenate([o, f3], axis=1))
    o = (ZeroPadding2D((1, 1), data_format=IMAGE_ORDERING))(o)
    o = (Conv2D(256, (3, 3), padding='valid', data_format=IMAGE_ORDERING))(o)
    o = (BatchNormalization())(o)
    o = (UpSampling2D((2, 2), data_format=IMAGE_ORDERING))(o)
    o = (concatenate([o, f2], axis=1))
    o = (ZeroPadding2D((1, 1), data_format=IMAGE_ORDERING))(o)
    o = (Conv2D(128, (3, 3), padding='valid', data_format=IMAGE_ORDERING))(o)
    o = (BatchNormalization())(o)
    o = (UpSampling2D((2, 2), data_format=IMAGE_ORDERING))(o)
    o = (concatenate([o, f1], axis=1))
    o = (ZeroPadding2D((1, 1), data_format=IMAGE_ORDERING))(o)
    o = (Conv2D(64, (3, 3), padding='valid', data_format=IMAGE_ORDERING))(o)
    o = (BatchNormalization())(o)
    o = Conv2D(n_classes, (3, 3), padding='same', data_format=IMAGE_ORDERING)(o)
    o_shape = Model(img_input, o).output_shape
    output_height = o_shape[2]
    output_width = o_shape[3]
    o = (Reshape((n_classes, output_height * output_width)))(o)
    o = (Permute((2, 1)))(o)
    o = (Activation('softmax'))(o)
    model = Model([img_input, data_input], o)
    model.outputWidth = output_width
    model.outputHeight = output_height
    return model
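The annotation branch has to end in a feature map with the same spatial size as f4 (the output of block 4, at 1/16 of the input resolution), otherwise the concatenation fails. Here is a small sketch of that arithmetic in plain Python; annotation_branch_shape is a hypothetical helper, and it uses integer division because Dense unit counts and Reshape dimensions must be integers.

```python
def annotation_branch_shape(input_height, input_width, downsample=16):
    """Return (dense_units, reshape_target) for the annotation branch so that
    its output can be concatenated with a feature map at 1/downsample resolution."""
    if input_height % downsample or input_width % downsample:
        raise ValueError("input size must be divisible by %d" % downsample)
    h = input_height // downsample
    w = input_width // downsample
    # one channel, channels-first layout: (channels, height, width)
    return h * w, (1, h, w)

units, target = annotation_branch_shape(416, 608)
print(units, target)  # 988 (1, 26, 38)
```

For the default 416 x 608 input this gives 988 Dense units reshaped to (1, 26, 38), which adds one extra channel to the 512 channels of f4.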
To train and evaluate a Keras model with several inputs, prepare a separate array for each input layer (here image_train and annotation_train), keeping the samples in the same order along the first axis, and call:
model.fit([image_train, annotation_train], result_segmentation_train, batch_size=..., epochs=...)
test_loss, test_acc = model.evaluate([image_test, annotation_test], result_segmentation_test)
Good luck!
Hi, I am building an image classifier for one-class classification, in which I use an autoencoder. When running the model I get this error: ValueError: Layer conv2d_3 was called with an input that isn't a symbolic tensor. Received type: <class 'tuple'>. Full input: [(128, 128, 3)]. All inputs to the layer should be tensors.
num_of_samples = img_data.shape[0]
labels = np.ones((num_of_samples,),dtype='int64')
labels[0:376]=0
names = ['cat']
Y = np_utils.to_categorical(labels, num_class)
input_shape=img_data[0].shape
x,y = shuffle(img_data,Y, random_state=2)
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=2)
x = Conv2D(16, (3, 3), activation='relu', padding='same')(input_shape)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
encoded = MaxPooling2D((2, 2), padding='same')(x)
# at this point the representation is (32, 32, 8) for 128 x 128 inputs (two 2x2 poolings)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(encoded)
x = UpSampling2D((2, 2))(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)
x = Conv2D(16, (3, 3), activation='relu')(x)
x = UpSampling2D((2, 2))(x)
decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)
autoencoder = Model(input_shape, decoded)
autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')
autoencoder.fit(X_train, X_train,
                epochs=50,
                batch_size=32,
                shuffle=True,
                validation_data=(X_test, X_test),
                callbacks=[TensorBoard(log_dir='/tmp/autoencoder')])
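Here:
x = Conv2D(16, (3, 3), activation='relu', padding='same')(input_shape)
input_shape is a shape (a tuple), not a tensor.
Create an input tensor from that shape and pass the tensor to the first layer. Use the same tensor, not input_shape, when you build the Model:
from keras.layers import *
inputTensor = Input(input_shape)
x = Conv2D(16, (3, 3), activation='relu', padding='same')(inputTensor)
autoencoder = Model(inputTensor, decoded)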
Hint about autoencoders
You should separate the encoder and decoder as individual models. Later you will probably want to work with only one of them.
Encoder:
inputTensor = Input(input_shape)
x = ....
encodedData = MaxPooling2D((2, 2), padding='same')(x)
encoderModel = Model(inputTensor,encodedData)
Decoder:
encodedInput = Input((4,4,8))  # must match the encoder's output shape (without the batch axis)
x = ....
decodedData = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)
decoderModel = Model(encodedInput,decodedData)
Autoencoder:
autoencoderInput = Input(input_shape)
encoded = encoderModel(autoencoderInput)
decoded = decoderModel(encoded)
autoencoderModel = Model(autoencoderInput,decoded)
I am using Keras autoencoders with the Theano backend, and I want to build an autoencoder for 720x1080 RGB images.
This is my code
from keras.datasets import mnist
import numpy as np
from keras.layers import Input, LSTM, RepeatVector, Conv2D, MaxPooling2D, UpSampling2D
from keras.models import Model
from PIL import Image
x_train = []
x_train_noisy = []
for i in range(5,1000):
    image = Image.open('data/trailerframes/frame' + str(i) + '.jpg', 'r')
    x_train.append(np.array(image))
    image = Image.open('data/trailerframes_avg/frame' + str(i) + '.jpg', 'r')
    x_train_noisy.append(np.array(image))
x_train = np.array(x_train)
x_train = x_train.astype('float32') / 255.
x_train_noisy = np.array(x_train_noisy)
x_train_noisy = x_train_noisy.astype('float32') / 255.
input_img = Input(shape=(720, 1080, 3))
x = Conv2D(32, (3, 3), activation='relu', padding='same')(input_img)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(32, (3, 3), activation='relu', padding='same')(x)
encoded = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(32, (3, 3), data_format="channels_last", activation='relu', padding='same')(encoded)
x = UpSampling2D((2, 2))(x)
x = Conv2D(32, (3, 3), data_format="channels_last", activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)
decoded = Conv2D(1, (3, 3), data_format="channels_last", activation='sigmoid', padding='same')(x)
autoencoder = Model(input_img, decoded)
autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')
autoencoder.fit(x_train_noisy, x_train,
                epochs=10,
                batch_size=128,
                shuffle=True,
                validation_data=(x_train_noisy, x_train))
But it gives me an error
ValueError: Error when checking input: expected input_7 to have shape (None, 720, 1080, 3) but got array with shape (995, 720, 1280, 3)
Input error:
It's as simple as this:
You defined your input as (720, 1080, 3).
You're trying to train your model with data of shape (720, 1280, 3).
One of them is wrong, and I think it's a typo in the input:
# change 1080 to 1280
input_img = Input(shape=(720, 1280, 3))
Output error (target):
Now, your target data is shaped like (720,1280,3), and your last layer outputs (720,1280,1)
A simple fix is:
decoded = Conv2D(3, (3, 3), data_format="channels_last", activation='sigmoid', padding='same')(x)
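Both errors above come down to comparing the shape the model expects (with None for the batch axis) against the shape of the array you pass in. As a rough sketch, the comparison can be written in plain Python; shape_mismatches is a hypothetical helper for illustration, not part of Keras.

```python
def shape_mismatches(expected, actual):
    """List (axis, expected_dim, actual_dim) for every axis where the array
    shape disagrees with the model shape. None in `expected` matches anything."""
    if len(expected) != len(actual):
        raise ValueError("rank mismatch: %r vs %r" % (expected, actual))
    return [(axis, e, a)
            for axis, (e, a) in enumerate(zip(expected, actual))
            if e is not None and e != a]

# Input check: the model expects width 1080, the data has width 1280
print(shape_mismatches((None, 720, 1080, 3), (995, 720, 1280, 3)))  # [(2, 1080, 1280)]
# Target check: the last layer outputs 1 channel, the targets have 3
print(shape_mismatches((None, 720, 1280, 1), (995, 720, 1280, 3)))  # [(3, 1, 3)]
```

In Keras, the expected shapes are available as model.input_shape and model.output_shape, so the same comparison can be done against x_train_noisy.shape and x_train.shape before calling fit.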
Using the encoder:
After training that model, you can create submodels that use only the encoder or only the decoder:
encoderModel = Model(input_img, encoded)
decoderInput = Input(encoderModel.output_shape[1:])  # shape of the encoder output, without the batch axis
x = decoderInput
for layer in autoencoder.layers[-5:]:  # reuse the five trained decoder layers that follow `encoded`
    x = layer(x)
decoderModel = Model(decoderInput, x)
These two models share the exact same weights as the full model, so training any one of them affects all three.
To use them without training, call model.predict(data), which returns the outputs without updating any weights.