ResNet50 nan loss with Keras 2

ResNet50 nan loss with Keras 2 - keras

Since upgrading to Keras 2 I'm seeing nan loss when trying to fine tune ResNet50. Loss and accuracy look ok if I use a single convolutional layer (commented out below) instead of resnet. Am I missing something that changed with Keras 2?
from keras.applications.resnet50 import ResNet50
from keras.layers import Flatten, Dense, Input, Conv2D, Activation, Flatten
from keras.layers.pooling import MaxPooling2D
from keras.models import Model
from keras.optimizers import SGD
import numpy as np
inp = Input(batch_shape=(32, 224, 224, 3), name='input_image')
### resnet
modelres = ResNet50(weights="imagenet", include_top=False, input_tensor=inp)
x = modelres.output
x = Flatten()(x)
### single convolutional layer
#x = Conv2D(32, (3,3))(inp)
#x = Activation('relu')(x)
#x = MaxPooling2D(pool_size=(3,3))(x)
#x = Flatten()(x)
#x = Dense(units=32)(x)
predictions = Dense(units=2, kernel_initializer="he_normal", activation="softmax")(x)
model = Model(inputs=inp, outputs=predictions)
model.compile(SGD(lr=.001, momentum=0.9), "categorical_crossentropy", metrics=["accuracy"])
# generate images of all ones with the same label
def gen():
while True:
x_data = np.ones((32,224,224,3)).astype('float32')
y_data = np.zeros((32,2)).astype('float32')
y_data[:,1]=1.0
yield x_data, y_data
model.fit_generator(gen(), 10, validation_data=gen(), validation_steps=1)
The beginning and end of model.summary() looks like:
____________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
====================================================================================================
input_image (InputLayer) (32, 224, 224, 3) 0
____________________________________________________________________________________________________
zero_padding2d_1 (ZeroPadding2D) (32, 230, 230, 3) 0
____________________________________________________________________________________________________
conv1 (Conv2D) (32, 112, 112, 64) 9472
...
avg_pool (AveragePooling2D) (32, 1, 1, 2048) 0
____________________________________________________________________________________________________
flatten_1 (Flatten) (32, 2048) 0
____________________________________________________________________________________________________
dense_1 (Dense) (32, 2) 4098
====================================================================================================
Training output is:
Epoch 1/1
10/10 [==============================] - 30s - loss: nan - acc: 0.0000e+00 - val_loss: nan - val_acc: 0.0000e+00

Everything works fine when I switch the backend to tensorflow instead of theano. Looks like something about the theano implementation broke in keras 2.

Related

Binary image classifier always predicting one class

I am trying to design a model for binary image classification, this is my first classifier and I am following an online tutorial but the model always predicts class 0
My dataset contains 3620 and 3651 images of each class respectively, I don't suppose the problem is due to an imbalanced dataset as the model is predicting only the class with lower number of sample in the dataset.
My code
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D
from keras.layers import Activation, Dropout, Flatten, Dense
from keras import backend as K
img_hieght, img_width = 150,150
train_data_dir = 'dataset/train'
#validation_data_dir = 'dataset/validation'
nb_train_samples = 3000
#nb_validation_samples = 500
epochs = 10
batch_size = 16
if K.image_data_format() == 'channels_first':
input_shape = (3, img_width, img_hieght)
else:
input_shape = (img_width, img_hieght, 3)
model = Sequential()
model.add(Conv2D(32,(3,3), input_shape = input_shape))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Conv2D(32,(3,3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Conv2D(64,(3,3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Flatten())
model.add(Dense(64))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(1))
model.add(Activation('sigmoid'))
model.compile(loss = 'binary_crossentropy', optimizer = 'rmsprop', metrics = ['accuracy'])
train_datagen = ImageDataGenerator(
rescale = 1. /255,
shear_range = 0.2,
zoom_range = 0.2,
horizontal_flip = True)
train_generator = train_datagen.flow_from_directory(
train_data_dir,
target_size = (img_width,img_hieght),
batch_size = batch_size,
class_mode = 'binary')
model.fit_generator(train_generator,
steps_per_epoch = nb_train_samples//batch_size,
epochs = epochs)
model.save('classifier.h5')
I have tried checking the model summary as well, but couldn't detect anything notable
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_1 (Conv2D) (None, 148, 148, 32) 896
_________________________________________________________________
activation_1 (Activation) (None, 148, 148, 32) 0
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 74, 74, 32) 0
_________________________________________________________________
conv2d_2 (Conv2D) (None, 72, 72, 32) 9248
_________________________________________________________________
activation_2 (Activation) (None, 72, 72, 32) 0
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 36, 36, 32) 0
_________________________________________________________________
conv2d_3 (Conv2D) (None, 34, 34, 64) 18496
_________________________________________________________________
activation_3 (Activation) (None, 34, 34, 64) 0
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 17, 17, 64) 0
_________________________________________________________________
flatten_1 (Flatten) (None, 18496) 0
_________________________________________________________________
dense_1 (Dense) (None, 64) 1183808
_________________________________________________________________
activation_4 (Activation) (None, 64) 0
_________________________________________________________________
dropout_1 (Dropout) (None, 64) 0
_________________________________________________________________
dense_2 (Dense) (None, 1) 65
_________________________________________________________________
activation_5 (Activation) (None, 1) 0
=================================================================
Total params: 1,212,513
Trainable params: 1,212,513
Non-trainable params: 0
_________________________________________________________________
None
I have not used validation dataset, I am using only training data and testing the model manually using:
import tensorflow as tf
from keras.preprocessing.image import ImageDataGenerator
batch_size = 16
path = 'dataset/test'
imgen = ImageDataGenerator(rescale=1/255.)
testGene = imgen.flow_from_directory(directory=path,
target_size=(150, 150,),
shuffle=False,
class_mode='binary',
batch_size=batch_size,
save_to_dir=None
)
model = tf.keras.models.load_model("classifier.h5")
pred = model.predict_generator(testGene, steps=testGene.n/batch_size)
print(pred)
Here are the accuracy and loss values per epochs:
Epoch 1/10
187/187 [==============================] - 62s 330ms/step - loss: 0.5881 - accuracy: 0.7182
Epoch 2/10
187/187 [==============================] - 99s 529ms/step - loss: 0.4102 - accuracy: 0.8249
Epoch 3/10
187/187 [==============================] - 137s 733ms/step - loss: 0.3266 - accuracy: 0.8646
Epoch 4/10
187/187 [==============================] - 159s 851ms/step - loss: 0.3139 - accuracy: 0.8620
Epoch 5/10
187/187 [==============================] - 112s 597ms/step - loss: 0.2871 - accuracy: 0.8873
Epoch 6/10
187/187 [==============================] - 60s 323ms/step - loss: 0.2799 - accuracy: 0.8847
Epoch 7/10
187/187 [==============================] - 66s 352ms/step - loss: 0.2696 - accuracy: 0.8870
Epoch 8/10
187/187 [==============================] - 57s 303ms/step - loss: 0.2440 - accuracy: 0.8947
Epoch 9/10
187/187 [==============================] - 56s 299ms/step - loss: 0.2478 - accuracy: 0.8994
Epoch 10/10
187/187 [==============================] - 53s 285ms/step - loss: 0.2448 - accuracy: 0.9047

You use only 3000 samples per epoch (see line nb_train_samples = 3000), while having 3620 and 3651 images for the each class. Given that model gets 90% accuracy and predicts only zeros, I suppose that you pass only class-zero images to the network during training. Consider increasing nb_train_samples.

keras copying VGG16 pretrained weights layer by layer

I want to copy some of the VGG16 layer weights layer by layer to another small network with alike layers, but I get an error that says:
File "/home/d/Desktop/s/copyweights.py", line 78, in <module>
list(f["model_weights"].keys())
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "/usr/local/lib/python3.5/dist-packages/h5py/_hl/group.py", line 262, in __getitem__
oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py/h5o.pyx", line 190, in h5py.h5o.open
KeyError: "Unable to open object (object 'model_weights' doesn't exist)"
The file is definitely in the path, I downloaded the file again just to make sure it is not corrupted (and it doesn't cause an error when I use model.load_weights in general - I also have HDF5 installed
Here is the code:
from keras import applications
from keras.preprocessing.image import ImageDataGenerator
from keras import optimizers
from keras.models import Sequential, Model
from keras.layers import Dropout, Flatten, Dense, GlobalAveragePooling2D
from keras import backend as k
from keras.callbacks import ModelCheckpoint, LearningRateScheduler, TensorBoard, EarlyStopping
from keras import layers
from keras import models
from keras import optimizers
from keras.layers import Dropout
from keras.regularizers import l2
from keras.callbacks import EarlyStopping, ModelCheckpoint, ReduceLROnPlateau
import matplotlib.pyplot as plt
from keras.preprocessing.image import ImageDataGenerator
import os
epochs = 50
callbacks = []
#schedule = None
decay = 0.0
earlyStopping = EarlyStopping(monitor='val_loss', patience=10, verbose=0, mode='min')
mcp_save = ModelCheckpoint('.mdl_wts.hdf5', save_best_only=True, monitor='val_loss', mode='min')
reduce_lr_loss = ReduceLROnPlateau(monitor='val_loss', factor=0.1, patience=3, verbose=1, epsilon=1e-5, mode='min')
base_model = models.Sequential()
base_model.add(layers.Conv2D(64, (3, 3), activation='relu', name='block1_conv1', input_shape=(256, 256, 3)))
base_model.add(layers.Conv2D(64, (3, 3), activation='relu', name='block1_conv2'))
base_model.add(layers.MaxPooling2D((2, 2)))
#model.add(Dropout(0.2))
base_model.add(layers.Conv2D(128, (3, 3), activation='relu', name='block2_conv1'))
base_model.add(layers.Conv2D(128, (3, 3), activation='relu', name='block2_conv2'))
base_model.add(layers.MaxPooling2D((2, 2), name='block2_pool'))
#model.add(Dropout(0.2))
base_model.summary()
"""
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) (None, 256, 256, 3) 0
_________________________________________________________________
block1_conv1 (Conv2D) (None, 256, 256, 64) 1792
_________________________________________________________________
block1_conv2 (Conv2D) (None, 256, 256, 64) 36928
_________________________________________________________________
block1_pool (MaxPooling2D) (None, 128, 128, 64) 0
_________________________________________________________________
block2_conv1 (Conv2D) (None, 128, 128, 128) 73856
_________________________________________________________________
block2_conv2 (Conv2D) (None, 128, 128, 128) 147584
_________________________________________________________________
block2_pool (MaxPooling2D) (None, 64, 64, 128) 0
=================================================================
Total params: 260,160.0
Trainable params: 260,160.0
Non-trainable params: 0.0
"""
layer_dict = dict([(layer.name, layer) for layer in base_model.layers])
[layer.name for layer in base_model.layers]
"""
['input_1',
'block1_conv1',
'block1_conv2',
'block1_pool',
'block2_conv1',
'block2_conv2',
'block2_pool']
"""
import h5py
weights_path = '/home/d/Desktop/s/vgg16_weights_new.h5' # ('https://github.com/fchollet/deep-learning-models/releases/download/v0.1/vgg19_weights_tf_dim_ordering_tf_kernels.h5)
f = h5py.File(weights_path)
list(f["model_weights"].keys())
"""
['block1_conv1',
'block1_conv2',
'block1_pool',
'block2_conv1',
'block2_conv2',
'block2_pool',
'block3_conv1',
'block3_conv2',
'block3_conv3',
'block3_conv4',
'block3_pool',
'block4_conv1',
'block4_conv2',
'block4_conv3',
'block4_conv4',
'block4_pool',
'block5_conv1',
'block5_conv2',
'block5_conv3',
'block5_conv4',
'block5_pool',
'dense_1',
'dense_2',
'dense_3',
'dropout_1',
'global_average_pooling2d_1',
'input_1']
"""
# list all the layer names which are in the model.
layer_names = [layer.name for layer in base_model.layers]
"""
# Here we are extracting model_weights for each and every layer from the .h5 file
>>> f["model_weights"]["block1_conv1"].attrs["weight_names"]
array([b'block1_conv1/kernel:0', b'block1_conv1/bias:0'],
dtype='|S21')
# we are assiging this array to weight_names below
>>> f["model_weights"]["block1_conv1"]["block1_conv1/kernel:0]
<HDF5 dataset "kernel:0": shape (3, 3, 3, 64), type "<f4">
# The list comprehension (weights) stores these two weights and bias of both the layers
>>>layer_names.index("block1_conv1")
1
>>> model.layers[1].set_weights(weights)
# This will set the weights for that particular layer.
With a for loop we can set_weights for the entire network.
"""
for i in layer_dict.keys():
weight_names = f["model_weights"][i].attrs["weight_names"]
weights = [f["model_weights"][i][j] for j in weight_names]
index = layer_names.index(i)
base_model.layers[index].set_weights(weights)
base_model.add(layers.Flatten())
base_model.add(layers.Dropout(0.5)) #Dropout for regularization
base_model.add(layers.Dense(256, activation='relu'))
base_model.add(layers.Dense(1, activation='sigmoid')) #Sigmoid function at the end because we have just two classes
# compile the model with a SGD/momentum optimizer
# and a very slow learning rate.
base_model.compile(loss='binary_crossentropy',
optimizer=optimizers.Adam(lr=1e-4, decay=decay),
metrics=['accuracy'])
os.environ["CUDA_VISIBLE_DEVICES"]="0"
train_dir = '/home/d/Desktop/s/data/train'
eval_dir = '/home/d/Desktop/s/data/eval'
test_dir = '/home/d/Desktop/s/data/test'
# create a data generator
train_datagen = ImageDataGenerator(rescale=1./255, #Scale the image between 0 and 1
rotation_range=40,
width_shift_range=0.2,
height_shift_range=0.2,
shear_range=0.2,
zoom_range=0.2,
horizontal_flip=True,)
val_datagen = ImageDataGenerator(rescale=1./255) #We do not augment validation data. we only perform rescale
test_datagen = ImageDataGenerator(rescale=1./255) #We do not augment validation data. we only perform rescale
# load and iterate training dataset
train_generator = train_datagen.flow_from_directory(train_dir, target_size=(224,224),class_mode='binary', batch_size=16, shuffle='True', seed=42)
# load and iterate validation dataset
val_generator = val_datagen.flow_from_directory(eval_dir, target_size=(224,224),class_mode='binary', batch_size=16, shuffle='True', seed=42)
# load and iterate test dataset
test_generator = test_datagen.flow_from_directory(test_dir, target_size=(224,224), class_mode=None, batch_size=1, shuffle='False', seed=42)
#The training part
#We train for 64 epochs with about 100 steps per epoch
history = base_model.fit_generator(train_generator,
steps_per_epoch=train_generator.n // train_generator.batch_size,
epochs=epochs,
validation_data=val_generator,
validation_steps=val_generator.n // val_generator.batch_size,
callbacks=[earlyStopping, mcp_save, reduce_lr_loss])
#Save the model
#base_model.save_weights('/home/d/Desktop/s/base_model_weights.h5')
#base_model.save('/home/d/Desktop/s/base_model_keras.h5')
#lets plot the train and val curve
#get the details form the history object
acc = history.history['acc']
val_acc = history.history['val_acc']
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs = range(1, len(acc) + 1)
#Train and validation accuracy
plt.plot(epochs, acc, 'b', label='Training accuracy')
plt.plot(epochs, val_acc, 'r', label='Validation accuracy')
plt.title('Training and Validation accurarcy')
plt.legend()
plt.figure()
#Train and validation loss
plt.plot(epochs, loss, 'b', label='Training loss')
plt.plot(epochs, val_loss, 'r', label='Validation loss')
plt.title('Training and Validation loss')
plt.legend()
plt.show()

Take the VGG16 model directly:
from keras.applications import VGG16
vgg = VGG16(choose parameters)
for layer in vgg.layers:
layer_weights_list = layer.get_weights()

Error fitting the model - expected conv2d_3_input to have 4 dimensions

I am writing to build a model to predict handwritten characters using the dataset given here (https://www.kaggle.com/sachinpatel21/az-handwritten-alphabets-in-csv-format)
EDIT: ( after making the changes suggested in the comments )
Error I get now : ValueError: Error when checking input: expected conv2d_4_input to have shape (28, 28, 1) but got array with shape (249542, 784, 1)
Find below the code for the CNN :
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import Flatten
from keras.layers.convolutional import Conv2D
from keras.layers.convolutional import MaxPooling2D
from keras import backend as K
from keras.utils import np_utils
from sklearn.model_selection import train_test_split
import numpy as np
import pandas as pd
seed = 785
np.random.seed(seed)
dataset = np.loadtxt('../input/A_Z Handwritten Data/A_Z Handwritten Data.csv', delimiter=',')
print(dataset.shape) # (372451, 785)
X = dataset[:,1:785]
Y = dataset[:,0]
(X_train, X_test, Y_train, Y_test) = train_test_split(X, Y, test_size=0.33, random_state=seed)
X_train = X_train / 255
X_test = X_test / 255
X_train = X_train.reshape((-1, X_train.shape[0], X_train.shape[1], 1))
X_test = X_test.reshape((-1, X_test.shape[0], X_test.shape[1], 1))
print(X_train.shape) # (1, 249542, 784, 1)
Y_train = np_utils.to_categorical(Y_train)
Y_test = np_utils.to_categorical(Y_test)
print(Y_test.shape) # (122909, 26)
num_classes = Y_test.shape[1] # 26
model = Sequential()
model.add(Conv2D(32, (5, 5), input_shape=(28, 28, 1), activation='relu', data_format="channels_last"))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.2))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dense(num_classes, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
print("DONE")
model.fit(X_train, Y_train, validation_data=(X_test, Y_test), epochs=10, batch_size=256, verbose=2)
# Final evaluation of the model
scores = model.evaluate(X_test,Y_test, verbose=0)
print("CNN Error: %.2f%%" % (100-scores[1]*100))
model.save('weights.model')

So the problem is that your data isn't structured properly. Look at the solution below:
Read the data with pandas:
data = pd.read_csv('/users/vpolimenov/Downloads/A_Z Handwritten Data.csv')
data.shape
# shape: (372450, 785)
Get your X and y:
data.rename(columns={'0':'label'}, inplace=True)
X = data.drop('label',axis = 1)
y = data['label']
Split and scale:
X_train, X_test, y_train, y_test = train_test_split(X,y)
standard_scaler = MinMaxScaler()
standard_scaler.fit(X_train)
X_train = standard_scaler.transform(X_train)
X_test = standard_scaler.transform(X_test)
Here is the magic:
X_train = X_train.reshape(X_train.shape[0], 28, 28, 1).astype('float32')
X_test = X_test.reshape(X_test.shape[0], 28, 28, 1).astype('float32')
y_train = np_utils.to_categorical(y_train)
y_test = np_utils.to_categorical(y_test)
X_train.shape
# (279337, 28, 28, 1)
Here is your model:
num_classes = y_test.shape[1] # 26
model = Sequential()
model.add(Conv2D(32, (5, 5), input_shape=(28, 28, 1), activation='relu', data_format="channels_last"))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.2))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dense(num_classes, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
print("DONE")
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=10, batch_size=256, verbose=2) # WHERE I GET THE ERROR
Summary of your model:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_25 (Conv2D) (None, 24, 24, 32) 832
_________________________________________________________________
max_pooling2d_25 (MaxPooling (None, 12, 12, 32) 0
_________________________________________________________________
dropout_1 (Dropout) (None, 12, 12, 32) 0
_________________________________________________________________
flatten_25 (Flatten) (None, 4608) 0
_________________________________________________________________
dense_42 (Dense) (None, 128) 589952
_________________________________________________________________
dense_43 (Dense) (None, 26) 3354
=================================================================
Total params: 594,138
Trainable params: 594,138
Non-trainable params: 0
I've stopped it after the second epoch, but you can see it working:
Train on 279337 samples, validate on 93113 samples
Epoch 1/10
- 80s - loss: 0.2478 - acc: 0.9308 - val_loss: 0.1021 - val_acc: 0.9720
Epoch 2/10
- 273s - loss: 0.0890 - acc: 0.9751 - val_loss: 0.0716 - val_acc: 0.9803
Epoch 3/10
Note:
It takes so long to fit due to the huge number of parameters in your network. You can try to reduce those and get a much faster/efficient network.

Keras: Dimensions must be equal

I was doing some classification with keras, when met this error:
InvalidArgumentError: Dimensions must be equal, but are 256 and 8 for 'dense_185/MatMul' (op: 'MatMul') with input shapes: [?,256], [8,300].
It surprised me because the dimension of the input to the dense is 1.
This is a sequential model with a few custom layers. I have no idea why 8 appears in the error of dense layer.
class Residual(Layer):
def __init__(self,input_shape,**kwargs):
super(Residual, self).__init__(**kwargs)
self.input_shapes = input_shape
def call(self, x):
print(np.shape(x)) # (?, 128, 8)
first_layer = Conv1D(256, 4, activation='relu', input_shape = self.input_shapes)(x)
print(np.shape(first_layer)) (?, 125, 256)
x = Conv1D(256, 4, activation='relu')(first_layer)
print(np.shape(x)) (?, 122, 256)
x = Conv1D(256, 4, activation='relu')(x)
print(np.shape(x)) (?, 119, 256)
x = ZeroPadding1D(padding=3)(x)
residual = Add()([x, first_layer])
x = Activation("relu")(residual)
return x
class Pooling(Layer):
def __init__(self,**kwargs):
super(Pooling, self).__init__(**kwargs)
def call(self, x):
first_layer = GlobalMaxPooling1D(data_format='channels_last')(x)
second_layer = GlobalAveragePooling1D(data_format='channels_last')(x)
pooling = Add()([first_layer, second_layer])
print(np.shape(pooling)) (?, 256)
return pooling
model = Sequential()
model.add(Residual(input_shape=(128,8)))
model.add(Pooling())
model.add(Dense(300, activation='relu'))
model.add(Dense(150, activation='relu'))
model.add(Dense(10, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer=keras.optimizers.Adadelta(), metrics=['accuracy'])
model.fit(np.array(dataset_data), dataset_target, epochs=1000, validation_split=0.1, verbose=1, batch_size=8)
Dimensions:
(1000, 128, 8) - input (1000 audio, 8 features, 128 seq_length)
(1000, 10) - target (1000 audio, 10 classes)

I think there are two edits required:
Add InputLayer as entrance for the data
Define compute_output_shape method at least for Pooling layer (link). If this method is not defined, Dense layer can't figure out what's input shape for it, I guess, and then fails.
Also there's minor editing - since model have InputLayer, you need no more input_shape kwarg in Residual layer.
class Residual(Layer):
def __init__(self, **kwargs): # remove input shape
super(Residual, self).__init__(**kwargs)
def call(self, x):
print(np.shape(x))
first_layer = Conv1D(256, 4, activation='relu')(x)
print(np.shape(first_layer))
x = Conv1D(256, 4, activation='relu')(first_layer)
print(np.shape(x))
x = Conv1D(256, 4, activation='relu')(x)
print(np.shape(x))
x = ZeroPadding1D(padding=3)(x)
residual = Add()([x, first_layer])
x = Activation("relu")(residual)
return x
class Pooling(Layer):
def __init__(self, **kwargs):
super(Pooling, self).__init__(**kwargs)
def call(self, x):
# !!! I build model without data_format argument - my version of keras
# doesn't support it !!!
first_layer = GlobalMaxPooling1D(data_format='channels_last')(x)
second_layer = GlobalAveragePooling1D(data_format='channels_last')(x)
pooling = Add()([first_layer, second_layer])
print(np.shape(pooling))
self.output_dim = int(np.shape(pooling)[-1]) # save output shape
return pooling
def compute_output_shape(self, input_shape):
# compute output shape here
return (input_shape[0], self.output_dim)
Initialize model:
model = Sequential()
model.add(InputLayer((128,8)))
model.add(Residual())
model.add(Pooling())
model.add(Dense(300, activation='relu'))
model.add(Dense(150, activation='relu'))
model.add(Dense(10, activation='softmax'))
model.compile(loss='categorical_crossentropy',
optimizer=keras.optimizers.Adadelta(),
metrics=['accuracy'])
Out:
(?, 128, 8)
(?, 125, 256)
(?, 122, 256)
(?, 119, 256)
(?, 256)
Summary of the model (don't know why Residual and Pooling don't show params the have. I guess some additional method required for this classes to count internal params):
model.summary()
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
residual_10 (Residual) (None, 128, 8) 0
_________________________________________________________________
pooling_8 (Pooling) (None, 256) 0
_________________________________________________________________
dense_15 (Dense) (None, 300) 77100
_________________________________________________________________
dense_16 (Dense) (None, 150) 45150
_________________________________________________________________
dense_17 (Dense) (None, 10) 1510
=================================================================
Total params: 123,760
Trainable params: 123,760
Non-trainable params: 0
_________________________________________________________________
Create fake data and check training process:
dataset_data = np.random.randn(1000, 128, 8)
dataset_target = np.zeros((1000, 10))
dataset_target[:, 0] = 1
model.fit(np.array(dataset_data), dataset_target, epochs=1000,
validation_split=0.1, verbose=1, batch_size=8)
Train on 900 samples, validate on 100 samples
Epoch 1/1000
900/900 [==============================] - 2s 2ms/step - loss: 0.0235 - acc: 0.9911 - val_loss: 9.4426e-05 - val_acc: 1.0000
Epoch 2/1000
900/900 [==============================] - 1s 1ms/step - loss: 4.2552e-05 - acc: 1.0000 - val_loss: 1.7458e-05 - val_acc: 1.0000
Epoch 3/1000
900/900 [==============================] - 1s 1ms/step - loss: 1.1342e-05 - acc: 1.0000 - val_loss: 7.3141e-06 - val_acc: 1.0000
... and so on
Looks like it works.

ValueError with Dense() layer in Keras

I'm trying to train a model that takes in two inputs, concatenates them, and feeds the result into an LSTM. The last layer is a Dense() call, and the targets are binary vectors (with more than one 1). The task is classification.
My input sequences are 50 rows of 23 timesteps with 5625 features (x_train), and my supplementary input (not really a sequence) is 50 one-hot rows of length 23 (total_hours)
The error I'm getting is:
ValueError: Error when checking target: expected dense_1 to have shape (1, 5625)
but got array with shape (5625, 1)
And my code is:
import numpy as np
from keras.layers import LSTM, Dense, Input, Concatenate
from keras.models import Model
#CREATING DUMMY INPUT
hours_input_1 = np.eye(23)
hours_input_2 = np.eye(23)
hours_input_3 = np.pad(np.eye(4), pad_width=((0, 19), (0, 19)), mode='constant')
hours_input_3 = hours_input_3[:4,]
total_hours = np.vstack((hours_input_1, hours_input_2, hours_input_3))
seq_input = np.random.normal(size=(50, 24, 5625))
y_train = np.array([seq_input[i, -1, :] for i in range(50)])
x_train = np.array([seq_input[i, :-1, :] for i in range(50)])
#print 'total_hours', total_hours.shape #(50, 23)
#print 'x_train', x_train.shape #(50, 23, 5625)
#print 'y_train shape', y_train.shape #(50, 5625)
#MODEL DEFINITION
seq_model_in = Input(shape=(1,), batch_shape=(1, 1, 5625))
hours_model_in = Input(shape=(1,), batch_shape=(1, 1, 1))
merged = Concatenate(axis=-1)([seq_model_in, hours_model_in])
#print merged.shape #(1, 1, 5626) = added the 'hour' on as an extra feature
merged_lstm = LSTM(10, batch_input_shape=(1, 1, 5625), return_sequences=False, stateful=True)(merged)
merged_dense = Dense(5625, activation='sigmoid')(merged_lstm)
model = Model(inputs=[seq_model_in, hours_model_in], outputs=merged_dense)
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
#TRAINING
for epoch in range(10):
for i in range(50):
y_true = y_train[i,:]
for j in range(23):
input_1 = np.expand_dims(np.expand_dims(x_train[i][j], axis=1), axis=1)
input_1 = np.reshape(input_1, (1, 1, x_train.shape[2]))
input_2 = np.expand_dims(np.expand_dims(np.array([total_hours[i][j]]), axis=1), axis=1)
tr_loss, tr_acc = model.train_on_batch([input_1, input_2], y_true)#np.array([y_true]))
model.reset_states()
My model.summary() looks like this:
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_1 (InputLayer) (1, 1, 5625) 0
__________________________________________________________________________________________________
input_2 (InputLayer) (1, 1, 1) 0
__________________________________________________________________________________________________
concatenate_1 (Concatenate) (1, 1, 5626) 0 input_1[0][0]
input_2[0][0]
__________________________________________________________________________________________________
lstm_1 (LSTM) (1, 10) 225480 concatenate_1[0][0]
__________________________________________________________________________________________________
dense_1 (Dense) (1, 5625) 61875 lstm_1[0][0]
==================================================================================================
Total params: 287,355
Trainable params: 287,355
Non-trainable params: 0
__________________________________________________________________________________________________
I am working with Keras version 2.1.2 with the TensorFlow backend (TensorFlow version 1.4.0. How can I resolve the ValueError?

It turns out I needed to address the target, as the ValueError implied.
If you replace:
y_true = y_train[i,:]
with:
y_true_1 = np.expand_dims(y_train[i,:], axis=1)
y_true = np.swapaxes(y_true_1, 0, 1)
The code runs.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

ResNet50 nan loss with Keras 2 - keras

Everything works fine when I switch the backend to tensorflow instead of theano. Looks like something about the theano implementation broke in keras 2.

Related

Binary image classifier always predicting one class

keras copying VGG16 pretrained weights layer by layer

Error fitting the model - expected conv2d_3_input to have 4 dimensions

Keras: Dimensions must be equal

ValueError with Dense() layer in Keras

Categories

Resources