Implementing and tuning a simple CNN for 3D data using Keras Conv3D - keras

I'm trying to implement a 3D CNN using Keras. However, I am having some difficulty understanding the results I obtain and improving the accuracy further.
The data I am trying to analyze have the shape {64(d1) x 64(d2) x 38(d3)}, where d1 and d2 are the height and width of the image (64x64 pixels) and d3 is the time dimension; in other words, each sample is a stack of 38 images. The channel parameter is set to 1 because my data are raw measurements rather than color images.
My data consist of 219 samples, hence 219x64x64x38. They are split into training and validation sets, with 20% used for validation. In addition, I have a fixed set of 147 samples for testing.
Below is my code, which works fine. It writes the results for the different combinations of network parameters (a grid search) to a txt file. In this code I only tune 2 parameters: the number of filters and the lambda of the L2 regularizer; the dropout rates and the kernel sizes are fixed, although I varied them later as well.
I also tried to set the seed values so that I get some degree of reproducibility (I don't think I have fully achieved this).
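For reference, the fullest seeding setup I am aware of would look roughly like this (a sketch only; even with all of these set, cuDNN and multi-GPU ops can still be non-deterministic):
import os
import random
import numpy as np
import tensorflow as tf

# Seed every RNG source I know of (TF1-style API, matching the code below).
os.environ['PYTHONHASHSEED'] = '0'
random.seed(1234)
np.random.seed(1234)
tf.set_random_seed(1234)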
My question is this:
Given the architecture and code below, the training accuracy always converges towards 1 for every combination of parameters (which is good). However, the validation accuracy mostly stays around 80% +/- 4% (rarely below 70%), regardless of the hyper-parameter combination, and the test accuracy behaves similarly. How can I push this accuracy above 90%?
As far as I know, a gap between the training and validation/test accuracy is a sign of overfitting. However, in my model I am adding dropout and L2 regularizers and also changing the size of the network, which should reduce this gap (but it does not).
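Note that in the code below I just read off the last epoch's metrics; a sketch of selecting the best epoch instead, with an extra Keras callback (assuming a Keras version recent enough to support restore_best_weights), would be:
from keras.callbacks import EarlyStopping

# Sketch: stop training when val_loss stops improving and keep the best weights.
early_stop = EarlyStopping(monitor='val_loss', patience=15, restore_best_weights=True)
# then, inside get_model(), pass it along with reduce_lr:
# history = model.fit(..., callbacks=[reduce_lr, early_stop])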
Is there anything else I can do besides modifying my input data? Does adding more layers help? Is there maybe a pre-trained 3D CNN, like there is for 2D CNNs (e.g., AlexNet)? Should I try ConvLSTM? Is this the limit of this architecture?
Thank you :)
import numpy as np
import tensorflow as tf
import keras
from keras.models import Sequential
from keras.layers import Conv3D, MaxPooling3D, Dense, Flatten, Activation
from keras.utils import to_categorical
from keras.regularizers import l2
from keras.layers import Dropout
from keras.utils import multi_gpu_model
import scipy.io as sio
from sklearn.metrics import accuracy_score
from sklearn.metrics import f1_score
from keras.callbacks import ReduceLROnPlateau
tf.set_random_seed(1234)
def normalize_minmax(X_train):
    """
    Normalize to [0,1]
    """
    from sklearn import preprocessing
    min_max_scaler = preprocessing.MinMaxScaler()
    X_minmax_train = min_max_scaler.fit_transform(X_train)
    return X_minmax_train
# generate and prepare the dataset
def get_data():
    # Load and prepare the data
    X_data = sio.loadmat('./X_train')['X_train']
    Y_data = sio.loadmat('./Y_train')['targets_train']
    X_test = sio.loadmat('./X_test')['X_test']
    Y_test = sio.loadmat('./Y_test')['targets_test']
    return X_data, Y_data, X_test, Y_test
def get_model(X_train, Y_train, X_validation, Y_validation, F1_nb, F2_nb, F3_nb, kernel_size_1, kernel_size_2, kernel_size_3, l2_lambda, learning_rate, reduce_lr, dropout_conv1, dropout_conv2, dropout_conv3, dropout_dense, no_epochs):
    no_classes = 5
    sample_shape = (64, 64, 38, 1)
    batch_size = 32
    dropout_seed = 30
    conv_seed = 20
    # Create the model
    model = Sequential()
    model.add(Conv3D(F1_nb, kernel_size=kernel_size_1, kernel_regularizer=l2(l2_lambda), padding='same', kernel_initializer='glorot_uniform', input_shape=sample_shape))
    model.add(Activation('selu'))
    model.add(MaxPooling3D(pool_size=(2,2,2)))
    model.add(Dropout(dropout_conv1, seed=conv_seed))
    model.add(Conv3D(F2_nb, kernel_size=kernel_size_2, kernel_regularizer=l2(l2_lambda), padding='same', kernel_initializer='glorot_uniform'))
    model.add(Activation('selu'))
    model.add(MaxPooling3D(pool_size=(2,2,2)))
    model.add(Dropout(dropout_conv2, seed=conv_seed))
    model.add(Conv3D(F3_nb, kernel_size=kernel_size_3, kernel_regularizer=l2(l2_lambda), padding='same', kernel_initializer='glorot_uniform'))
    model.add(Activation('selu'))
    model.add(MaxPooling3D(pool_size=(2,2,2)))
    model.add(Dropout(dropout_conv3, seed=conv_seed))
    model.add(Flatten())
    model.add(Dense(512, kernel_regularizer=l2(l2_lambda), kernel_initializer='glorot_uniform'))
    model.add(Activation('selu'))
    model.add(Dropout(dropout_dense, seed=dropout_seed))
    model.add(Dense(no_classes, activation='softmax'))
    model = multi_gpu_model(model, gpus=2)
    # Compile the model
    model.compile(loss=keras.losses.categorical_crossentropy,
                  optimizer=keras.optimizers.Adam(lr=learning_rate),
                  metrics=['accuracy'])
    # Train the model
    history = model.fit(X_train, Y_train, batch_size=batch_size, epochs=no_epochs,
                        validation_data=(X_validation, Y_validation), callbacks=[reduce_lr])
    return model, history
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.2, patience=5, min_lr=0.0001)
learning_rate = 0.001
no_epochs = 100
X_data, Y_data, X_test, Y_test = get_data()
# Normalize the train/val data
for i in range(X_data.shape[0]):
    for j in range(X_data.shape[3]):
        X_data[i,:,:,j] = normalize_minmax(X_data[i,:,:,j])
X_data = np.expand_dims(X_data, axis=4)
# Normalize the test data
for i in range(X_test.shape[0]):
    for j in range(X_test.shape[3]):
        X_test[i,:,:,j] = normalize_minmax(X_test[i,:,:,j])
X_test = np.expand_dims(X_test, axis=4)
# Shuffle the training data
# fix random seed for reproducibility
seedValue = 40
permutation = np.random.RandomState(seed=seedValue).permutation(len(X_data))
X_data = X_data[permutation]
Y_data = Y_data[permutation]
Y_data = np.squeeze(Y_data)
Y_test = np.squeeze(Y_test)
#Split between train and validation (20%). Here I did not use the classical validation_split=0.2 just to make sure that the data is the same for the different architectures I am using.
X_train = X_data[0:175,:,:,:,:]
Y_train = Y_data[0:175]
X_validation = X_data[176:,:,:,:]
Y_validation = Y_data[176:]
Y_train = to_categorical(Y_train,num_classes=5).astype(np.integer)
Y_validation = to_categorical(Y_validation,num_classes=5).astype(np.integer)
Y_test = to_categorical(Y_test,num_classes=5).astype(np.integer)
l2_lambda_list = [1e-4, 2e-4, 3e-4, 4e-4, 5e-4,
                  6e-4, 7e-4, 8e-4, 9e-4, 10e-4]
filters_nb = [(128,64,64),(128,64,32),(128,64,16),(128,64,8),(128,32,32),(128,32,16),(128,32,8),(128,16,8),(128,8,8),
(64,64,32),(64,64,16),(64,64,8),(64,32,32),(64,32,16),(64,32,8),(64,16,16),(64,16,8),(64,8,8),
(32,32,16),(32,32,8),(32,16,16),(32,16,8),(32,8,8),
(16,16,16),(16,16,8),(16,8,8)
]
DropOut = [(0.25,0.25,0.25,0.5),
(0,0,0,0.1),(0,0,0,0.2),(0,0,0,0.3),(0,0,0,0.4),(0,0,0,0.5),
(0.1,0.1,0.1,0),(0.2,0.2,0.2,0),(0.3,0.3,0.3,0),(0.4,0.4,0.4,0),(0.5,0.5,0.5,0),
(0.1,0.1,0.1,0.1),(0.1,0.1,0.1,0.2),(0.1,0.1,0.1,0.3),(0.1,0.1,0.1,0.4),(0.1,0.1,0.1,0.5),
(0.15,0.15,0.15,0.1),(0.15,0.15,0.15,0.2),(0.15,0.15,0.15,0.3),(0.15,0.15,0.15,0.4),(0.15,0.15,0.15,0.5),
(0.2,0.2,0.2,0.1),(0.2,0.2,0.2,0.2),(0.2,0.2,0.2,0.3),(0.2,0.2,0.2,0.4),(0.2,0.2,0.2,0.5),
(0.25,0.25,0.25,0.1),(0.25,0.25,0.25,0.2),(0.25,0.25,0.25,0.3),(0.25,0.25,0.25,0.4),(0.25,0.25,0.25,0.5),
(0.3,0.3,0.3,0.1),(0.3,0.3,0.3,0.2),(0.3,0.3,0.3,0.3),(0.3,0.3,0.3,0.4),(0.3,0.3,0.3,0.5),
(0.35,0.35,0.35,0.1),(0.35,0.35,0.35,0.2),(0.35,0.35,0.35,0.3),(0.35,0.35,0.35,0.4),(0.35,0.35,0.35,0.5)
]
kernel_size = [(3,3,3),
(2,3,3),(2,3,4),(2,3,5),(2,3,6),(2,3,7),(2,3,8),(2,3,9),(2,3,10),(2,3,11),(2,3,12),(2,3,13),(2,3,14),(2,3,15),
(3,3,3),(3,3,4),(3,3,5),(3,3,6),(3,3,7),(3,3,8),(3,3,9),(3,3,10),(3,3,11),(3,3,12),(3,3,13),(3,3,14),(3,3,15),
(3,4,3),(3,4,4),(3,4,5),(3,4,6),(3,4,7),(3,4,8),(3,4,9),(3,4,10),(3,4,11),(3,4,12),(3,4,13),(3,4,14),(3,4,15),
]
for l in range(len(l2_lambda_list)):
    l2_lambda = l2_lambda_list[l]
    f = open("My Results.txt", "a")
    lambda_Str = str(l2_lambda)
    f.write("---------------------------------------\n")
    f.write("lambda = " + f"{lambda_Str}\n")
    f.write("---------------------------------------\n")
    for i in range(len(filters_nb)):
        F1_nb = filters_nb[i][0]
        F2_nb = filters_nb[i][1]
        F3_nb = filters_nb[i][2]
        kernel_size_1 = kernel_size[0]
        kernel_size_2 = kernel_size_1
        kernel_size_3 = kernel_size_1
        dropout_conv1 = DropOut[0][0]
        dropout_conv2 = DropOut[0][1]
        dropout_conv3 = DropOut[0][2]
        dropout_dense = DropOut[0][3]
        # fit model
        model, history = get_model(X_train, Y_train, X_validation, Y_validation, F1_nb, F2_nb, F3_nb, kernel_size_1, kernel_size_2, kernel_size_3, l2_lambda, learning_rate, reduce_lr, dropout_conv1, dropout_conv2, dropout_conv3, dropout_dense, no_epochs)
        # Evaluate metrics
        predictions = model.predict(X_test)
        out = np.argmax(predictions, axis=1)
        Y_test = sio.loadmat('./Y_test')['targets_test']
        Y_test = np.squeeze(Y_test)
        loss = history.history['loss'][no_epochs-1]
        acc = history.history['acc'][no_epochs-1]
        val_loss = history.history['val_loss'][no_epochs-1]
        val_acc = history.history['val_acc'][no_epochs-1]
        # accuracy: (tp + tn) / (p + n)
        accuracy = accuracy_score(Y_test, out)
        # f1: 2 tp / (2 tp + fp + fn)
        f1 = f1_score(Y_test, out, average='macro')
        a = str(filters_nb[i][0]) + ',' + str(filters_nb[i][1]) + ',' + str(filters_nb[i][2]) + ': ' + 'f1-metric: ' + str('%f' % f1) + ' | loss: ' + str('%f' % loss) + ' | acc: ' + str('%f' % acc) + ' | val_loss: ' + str('%f' % val_loss) + ' | val_acc: ' + str('%f' % val_acc) + ' | test_acc: ' + str('%f' % accuracy)
        f.write(f"{a}\n")
    f.close()

Related

Calculate gradient of validation error w.r.t inputs using Keras/Tensorflow or autograd

I need to calculate the gradient of the validation error w.r.t inputs x. I'm trying to see how much the validation error changes when I perturb one of the training samples.
The validation error (E) explicitly depends on the model weights (W).
The model weights explicitly depend on the inputs (x and y).
Therefore, the validation error implicitly depends on the inputs.
I'm trying to calculate the gradient of E w.r.t x directly.
An alternative approach would be to calculate the gradient of E w.r.t W (which can easily be calculated) and the gradient of W w.r.t x (which I cannot do at the moment); those two together would give the gradient of E w.r.t x.
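In other words, what I am after is the chain-rule expression dE/dx = (dE/dW) · (dW/dx), where the dW/dx factor is the part I cannot compute directly.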
I have attached a toy example. Thanks in advance!
import numpy as np
import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.utils import to_categorical
import tensorflow as tf
from autograd import grad
train_images = mnist.train_images()
train_labels = mnist.train_labels()
test_images = mnist.test_images()
test_labels = mnist.test_labels()
# Normalize the images.
train_images = (train_images / 255) - 0.5
test_images = (test_images / 255) - 0.5
# Flatten the images.
train_images = train_images.reshape((-1, 784))
test_images = test_images.reshape((-1, 784))
# Build the model.
model = Sequential([
    Dense(64, activation='relu', input_shape=(784,)),
    Dense(64, activation='relu'),
    Dense(10, activation='softmax'),
])
# Compile the model.
model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy'],
)
# Train the model.
model.fit(
    train_images,
    to_categorical(train_labels),
    epochs=5,
    batch_size=32,
)
model.save_weights('model.h5')
# Load the model's saved weights.
# model.load_weights('model.h5')
calculate_mse = tf.keras.losses.MeanSquaredError()
test_x = test_images[:5]
test_y = to_categorical(test_labels)[:5]
train_x = train_images[:1]
train_y = to_categorical(train_labels)[:1]
train_y = tf.convert_to_tensor(train_y, np.float32)
train_x = tf.convert_to_tensor(train_x, np.float64)
with tf.GradientTape() as tape:
    tape.watch(train_x)
    model.fit(train_x, train_y, epochs=1, verbose=0)
    valid_y_hat = model(test_x, training=False)
    mse = calculate_mse(test_y, valid_y_hat)
de_dx = tape.gradient(mse, train_x)
print(de_dx)
# approach 2 - does not run
def calculate_validation_mse(x):
    model.fit(x, train_y, epochs=1, verbose=0)
    valid_y_hat = model(test_x, training=False)
    mse = calculate_mse(test_y, valid_y_hat)
    return mse
train_x = train_images[:1]
train_y = to_categorical(train_labels)[:1]
validation_gradient = grad(calculate_validation_mse)
de_dx = validation_gradient(train_x)
print(de_dx)
Here's how you can do this. The derivation is outlined below.
A few things to note:
I have reduced the feature size from 784 to 256 as I was running out of memory in Colab (the line is marked in the code). You might have to do some memory profiling to find out why.
I only computed the gradients for the first layer's weights. This is easily extendable to the other layers.
Disclaimer: this derivation is correct to the best of my knowledge. Please do some research and verify that this is the case. You will run into memory issues for larger inputs and layer sizes.
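In short, the idea is roughly this: after one SGD step on the training sample, the weights become W' = W - lr * dL_tr/dW, so by the chain rule dE_val/dx = (dE_val/dW') * (dW'/dx) = -lr * (dE_val/dW') * d2(L_tr)/(dW dx). In the code below, tape2 computes dL_tr/dW, tape1.jacobian(tmp_g, x_tr) scaled by -lr gives dW'/dx, tape3 gives dE_val/d(theta), and the final matmul contracts the two over the weight dimensions.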
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.utils import to_categorical
import tensorflow as tf
f = 256
model = Sequential([
    Dense(64, activation='relu', input_shape=(f,)),
    Dense(64, activation='relu'),
    Dense(10, activation='softmax'),
])
# Compile the model.
model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy'],
)
w = model.weights[0]
# Inputs and labels
x_tr = tf.Variable(np.random.normal(size=(1,f)), shape=(1, f), dtype='float32')
y_tr = np.random.choice([0,1,2,3,4,5,6,7,8,9], size=(1,1))
y_tr_onehot = tf.keras.utils.to_categorical(y_tr, num_classes=10).astype('float32')
x_v = tf.Variable(np.random.normal(size=(1,f)), shape=(1, f), dtype='float32')
y_v = np.random.choice([0,1,2,3,4,5,6,7,8,9], size=(1,1))
y_v_onehot = tf.keras.utils.to_categorical(y_v, num_classes=10).astype('float32')
# In the context of GradientTape
with tf.GradientTape() as tape1:
    with tf.GradientTape() as tape2:
        y_tr_pred = model(x_tr)
        tr_loss = tf.keras.losses.MeanSquaredError()(y_tr_onehot, y_tr_pred)
    tmp_g = tape2.gradient(tr_loss, w)
print(tmp_g.shape)
# d(dE_tr/d(theta))/dx
# Warning: this step consumes a lot of memory for large layers
lr = 0.001
grads_1 = -lr * tape1.jacobian(tmp_g, x_tr)
with tf.GradientTape() as tape3:
    y_v_pred = model(x_v)
    v_loss = tf.keras.losses.MeanSquaredError()(y_v_onehot, y_v_pred)
# dE_val/d(theta)
grads_2 = tape3.gradient(v_loss, w)[tf.newaxis, :]
# Just crunching the dimensions to get the final desired shape of (1, 256)
grad = tf.matmul(tf.reshape(grads_2, [1, -1]), tf.reshape(tf.transpose(grads_1, [2,1,0,3]), [1, -1, 256]))

Writing my own FGSM Adversarial attack always gives zero perturbation

I'm trying to write a simple FGSM attack on MNIST. I tried the foolbox library and it seems to work, but FGSM is very slow (perhaps because it searches for a minimum eps/perturbation that gives a different target label). I started writing my own, and my code always gives me zero perturbation, i.e., if I plot x_adversarial it is identical to x_input. I checked that the gradient computation produces an all-zero matrix. The computed loss is small, but I imagine there should still be some gradient for this loss. Can somebody please help me? I have been racking my head for a week now without any progress. Thanks again!
import tensorflow as tf
import numpy as np
import foolbox
import tensorflow.keras.backend as K
import matplotlib.pyplot as plt
# Importing the required Keras modules containing model and layers
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Conv2D, Dropout, Flatten, MaxPooling2D
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
num_digits_to_classify = 10
x_train = x_train.reshape(x_train.shape[0], 28, 28, 1)
x_test = x_test.reshape(x_test.shape[0], 28, 28, 1)
input_shape = (28, 28, 1)
# Making sure that the values are float so that we can get decimal points after division
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
# Normalizing the RGB codes by dividing it to the max RGB value.
x_train /= 255
x_test /= 255
print('x_train shape:', x_train.shape)
print('Number of images in x_train', x_train.shape[0])
print('Number of images in x_test', x_test.shape[0])
def create_model_deep():
    model = Sequential()
    model.add(Conv2D(32, kernel_size=(5,5), activation='relu', input_shape=input_shape))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.2))
    model.add(Conv2D(64, kernel_size=(5,5), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.2))
    model.add(Flatten())  # Flattening the 2D arrays for fully connected layers
    model.add(Dense(128, activation='relu'))
    model.add(Dropout(0.2))
    model.add(Dense(num_digits_to_classify, activation='softmax'))
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model
model = create_model_deep()
model.summary()
model.fit(x=x_train,y=y_train, epochs=10)
model.evaluate(x_test, y_test)
########################## foolbox FGSM attack ###############################################
from keras.backend import set_learning_phase
set_learning_phase(0)
from foolbox.criteria import Misclassification
fmodel = foolbox.models.TensorFlowModel.from_keras(model, bounds=(0,1))
attack = foolbox.attacks.FGSM(fmodel, criterion=Misclassification())
fgsm_error = 0.0
for i in range(x_test.shape[0]):
    if i % 1000 == 0:
        print(i)
    adversarial = attack(x_test[i], y_test[i])
    if adversarial is not None:
        adversarial = adversarial.reshape(1,28,28,1)
        model_predictions = model.predict(adversarial)
        label = np.argmax(model_predictions)
        if label != y_test[i]:
            fgsm_error = fgsm_error + 1.0
fgsm_error = fgsm_error / x_test.shape[0]
########################## My own FGSM attack ###############################################
sess = K.get_session()
eps = 0.3
x_adv = tf.placeholder(tf.float32,shape=(1,28,28,1),name="adv_example")
x_noise = tf.placeholder(tf.float32,shape=(1,28,28,1),name="adv_noise")
x_input = x_test[0].reshape(1,28,28,1)
y_input = y_test[0]
def loss_fn(y_true, y_pred):
    return K.sparse_categorical_crossentropy(y_true, y_pred)
grad = K.gradients(loss_fn(y_input,model.output), model.input)
delta = K.sign(grad[0])
x_noise = x_noise + delta
x_adv = x_adv + eps*delta
x_adv = K.clip(x_adv,0.0,1.0)
x_adv, x_noise, grad = sess.run([x_adv, x_noise, grad], feed_dict={model.input:x_input, x_adv:x_input, x_noise:np.zeros_like(x_input)})
pred = model.predict(x_adv)
The following code seems to work now. Please look at the comment I made below.
sess = K.get_session()
eps = 0.3
i = 100
x_input = x_test[i].reshape(1,28,28,1)
y_input = y_test[i]
x_adv = x_input
# Added noise
x_noise = np.zeros_like(x_input)
def loss_fn(y_true, y_pred):
    target = K.one_hot(y_true, 10)
    loss = K.categorical_crossentropy(target, y_pred)
    #loss = K.print_tensor(loss, message='loss = ')
    return loss
    #return K.sparse_categorical_crossentropy(y_true, y_pred)
def loss_fn_sparse(y_true, y_pred):
    loss = K.sparse_categorical_crossentropy(y_true, y_pred)
    return loss
image = K.cast(x_input,dtype='float32')
y_pred = model(image)
loss = loss_fn_sparse(y_input, y_pred)
grad = K.gradients(loss, image)
delta = K.sign(grad[0])
x_noise = x_noise + delta
x_adv = x_adv + eps*delta
x_adv = K.clip(x_adv,0.0,1.0)
x_adv, x_noise = sess.run([x_adv, x_noise], feed_dict={model.input:x_input})
pred = model.predict(x_adv)
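As a quick sanity check (using the same variables as above), the perturbation should now be non-zero and the predicted label will often differ from the clean prediction:
# Sanity check: maximum perturbation magnitude and label comparison.
print(np.abs(x_adv - x_input).max())                        # should be around eps, not 0
print(np.argmax(model.predict(x_input)), np.argmax(pred))   # clean label vs adversarial label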

How to use SimpleRNN to model a modulus function & predict many to many sequence

I have designed this toy problem to understand the working of SimpleRNN in Keras.
My input sequence is:
[x1,x2,x3,x4,x5]
and the corresponding output is:
[(0+x1)%2, (x1+x2)%2, (x2+x3)%2, (x3+x4)%2, (x4+x5)%2]
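For example, for the input [3, 10, 5, 7, 2] the target would be [(0+3)%2, (3+10)%2, (10+5)%2, (5+7)%2, (7+2)%2] = [1, 1, 1, 0, 1].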
My code is:
import numpy as np
import random
from scipy.ndimage.interpolation import shift
def generate_sequence():
    max_len = 5
    x = np.random.randint(1, 100, max_len)
    shifted_x = shift(x, 1, cval=0)
    y = (x + shifted_x) % 2
    return x.reshape(max_len, 1), y.reshape(max_len, 1), shifted_x.reshape(max_len, 1)
X_train = np.zeros((100,5,1))
y_train = np.zeros((100,5,1))
for i in range(100):
    x, y, z = generate_sequence()
    X_train[i] = x
    y_train[i] = y
X_test = np.zeros((100,5,1))
y_test = np.zeros((100,5,1))
for i in range(100):
    x, y, z = generate_sequence()
    X_test[i] = x
    y_test[i] = y
from keras.models import Sequential
from keras.layers import Dense, SimpleRNN
model = Sequential()
model.add(SimpleRNN(3, input_shape=(5,1), return_sequences=True, name='rnn'))
model.add(Dense(1, activation='sigmoid'))
# try using different optimizers and different optimizer configs
model.compile(loss='binary_crossentropy',
              optimizer='sgd',
              metrics=['accuracy'])
print('Train...')
batch_size = 70
model.fit(X_train, y_train,
          batch_size=batch_size,
          epochs=200, verbose=0, validation_split=0.3)
score, acc = model.evaluate(X_test, y_test,
                            batch_size=batch_size)
print('Test score:', score)
print('Test accuracy:', acc)
When I train this SimpleRNN I only get an accuracy of about 50%, even though each output only depends on the current and previous input. Why is the RNN struggling to learn this?
100/100 [==============================] - 0s 37us/step
Test score: 0.6975522041320801
Test accuracy: 0.5120000243186951
UPDATE:
It turns out the mod function is very hard to model. When I switched to a simpler data generation strategy, such as y[t] = x[t] < x[t-1], the model reached about 80% binary accuracy.
def generate_rnn_sequence():
    max_len = 5
    x = np.random.randint(1, 100, max_len)
    shifted_x = shift(x, 1, cval=0)
    y = (x < shifted_x).astype(float)
    return x.reshape(5, 1), y.reshape(5, 1)
So how do I model a mod function using an RNN?
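One idea I have not tested yet: since the target only depends on the parities of consecutive inputs, feeding the parity bit x % 2 as the input feature instead of the raw integers might make the mapping easier to learn. A sketch of that generator:
def generate_parity_sequence():
    # Untested sketch: same targets as before, but expose the parity bit as the input.
    max_len = 5
    x = np.random.randint(1, 100, max_len)
    shifted_x = shift(x, 1, cval=0)
    y = (x + shifted_x) % 2
    return (x % 2).reshape(max_len, 1), y.reshape(max_len, 1)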

Error reshaping tokenized input text when predicting sentiments with an LSTM RNN

I am new to neural networks and have been learning about their application in the field of text analytics, so I have used an LSTM RNN for this application in Python.
After training the model on a dataset of dimension 20,000 x 1 (20,000 being the texts and 1 being the sentiment of each text), I got a good accuracy of 99%, after which I validated the model, which was working fine (using the model.predict() function).
Now, just to test my model, I have been trying to give it random text inputs, either from a dataframe or from variables containing some text, but I always end up with an array-reshaping error saying that the input to the RNN model must have the dimension (1,30).
But when I re-input the training data into the model for prediction, the model works absolutely fine. Why is this happening?
link for the screenshot of error
link for image of model summary
training data
I am just stuck here, and any kind of suggestion will help me learn more about RNNs. I am attaching the error and the RNN model code with this request.
Thank You
Regards
Tushar Upadhyay
import numpy as np
import pandas as pd
import keras
import sklearn
from sklearn.feature_extraction.text import CountVectorizer
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from keras.models import Sequential
from keras.layers import Dense, Embedding, LSTM
from sklearn.model_selection import train_test_split
from keras.utils.np_utils import to_categorical
import re
data=pd.read_csv('..../twitter_tushar_data.csv')
max_fatures = 4000
tokenizer = Tokenizer(num_words=max_fatures, split=' ')
tokenizer.fit_on_texts(data['tweetText'].values)
X = tokenizer.texts_to_sequences(data['tweetText'].values)
X = pad_sequences(X)
embed_dim = 128
lstm_out = 196
model = Sequential()
keras.layers.core.SpatialDropout1D(0.2)  # used to avoid overfitting (note: this layer is never added to the model, so it has no effect here)
model.add(Embedding(max_fatures, embed_dim, input_length=X.shape[1]))
model.add(LSTM(196, recurrent_dropout=0.2, dropout=0.2))
model.add(Dense(2, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
print(model.summary())
#splitting data in training and testing parts
Y = pd.get_dummies(data['SA']).values
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.30, random_state=42)
print(X_train.shape,Y_train.shape)
print(X_test.shape,Y_test.shape)
batch_size = 128
model.fit(X_train, Y_train, epochs=7, batch_size=batch_size, verbose=2)
validation_size = 3500
X_validate = X_test[-validation_size:]
Y_validate = Y_test[-validation_size:]
X_test = X_test[:-validation_size]
Y_test = Y_test[:-validation_size]
score,acc = model.evaluate(X_test, Y_test, verbose = 2, batch_size = 128)
print("score: %.2f" % (score))
print("acc: %.2f" % (acc))
pos_cnt, neg_cnt, pos_correct, neg_correct = 0, 0, 0, 0
for x in range(len(X_validate)):
    result = model.predict(X_validate[x].reshape(1, X_test.shape[1]), batch_size=1, verbose=2)[0]
    if np.argmax(result) == np.argmax(Y_validate[x]):
        if np.argmax(Y_validate[x]) == 0:
            neg_correct += 1
        else:
            pos_correct += 1
    if np.argmax(Y_validate[x]) == 0:
        neg_cnt += 1
    else:
        pos_cnt += 1
print("pos_acc", pos_correct/pos_cnt*100, "%")
print("neg_acc", neg_correct/neg_cnt*100, "%")
I got the solution to my question; it was just a matter of tokenizing the input properly. Thanks!! The code below is for prediction on different user inputs.
text=np.array(['you are a pathetic awful movie'])
print(text.shape)
tk=Tokenizer(num_words=4000,lower=True,split=" ")
tk.fit_on_texts(text)
prediction = model.predict(sequence.pad_sequences(tk.texts_to_sequences(text),
                                                  maxlen=max_review_length))
print(prediction)
print(np.argmax(prediction))
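One caveat with the snippet above: it fits a brand-new Tokenizer on the prediction text, so the word indices do not necessarily match the ones the model was trained on. Reusing the tokenizer fitted on the training data should be safer, roughly like this:
# Sketch: reuse the tokenizer and padding length from the training code above.
text = np.array(['you are a pathetic awful movie'])
padded = pad_sequences(tokenizer.texts_to_sequences(text), maxlen=X.shape[1])
prediction = model.predict(padded)
print(np.argmax(prediction))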

About 'Building Autoencoders in Keras'?

I read the Building Autoencoders in Keras tutorial; the URL is https://blog.keras.io/building-autoencoders-in-keras.html
In the section 'Adding a sparsity constraint on the encoded representations', I tried to follow the description, but the loss won't go down to 0.11; instead it stays around 0.26.
So the result is fuzzy:
Can anyone who has done this experiment tell me what's wrong with it?
Here is my code:
from keras.layers import Input, Dense
from keras.models import Model
from keras import regularizers
encoding_dim = 32  # dimension of the compressed representation
input_img = Input(shape=(784,))
# encoding
encoded = Dense(encoding_dim, activation='relu',
                activity_regularizer=regularizers.l1(1e-4))(input_img)
# decoding
decoded = Dense(784, activation='sigmoid')(encoded)
# create the autoencoder
autoencoder = Model(input_img, decoded)
# encoder
encoder = Model(input_img, encoded)
encoded_input = Input(shape=(encoding_dim,))
# use the last fully-connected layer as the decoder layer
decoder_layer = autoencoder.layers[-1]
# decoder
decoder = Model(encoded_input, decoder_layer(encoded_input))
# compile the model
autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')
from keras.datasets import mnist
import numpy as np
(x_train, _), (x_test, _) = mnist.load_data()
x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
x_train = x_train.reshape(x_train.shape[0], np.prod(x_train.shape[1:]))
x_test = x_test.reshape(x_test.shape[0], np.prod(x_test.shape[1:]))
autoencoder.fit(x_train, x_train,
                epochs=100,
                batch_size=256,
                shuffle=True,
                validation_data=(x_test, x_test))
encoded_imgs = encoder.predict(x_test)
decoded_imgs = decoder.predict(encoded_imgs)
import matplotlib.pyplot as plt
n = 10
plt.figure(figsize = (20, 4))
for i in range(10):
    # original image
    ax = plt.subplot(2, n, i + 1)
    plt.imshow(x_test[i].reshape(28, 28))
    plt.gray()
    ax.set_axis_off()
    # reconstructed (decoded) image
    ax = plt.subplot(2, n, n + i + 1)
    plt.imshow(decoded_imgs[i].reshape(28, 28))
    plt.gray()
    ax.set_axis_off()
plt.savefig('simpleSparse.png')
from keras import backend as K
K.clear_session()
I copied your code verbatim, and reproduced the error you got.
Solution: reduce the batch size from 256 to 16. You'll notice a huge difference in the output even after only 10 epochs of training.
Explanation: What is probably going on is that, even though your training loss is decreasing, you are averaging the gradient over too many examples, to the point that the step you take in the direction of the gradient cancels itself out in some higher-dimensional space, and your learning algorithm is tricked into thinking it has converged to a local minimum, when in reality it can't decide where to go. This last part explains why all the outputs you get look blurry and exactly the same.
Update: Reduce batch size to 4, and you'll get near perfect reconstruction even after 10 epochs.
You need to change the batch size in your code.
autoencoder.fit(x_train, x_train,epochs = 10,batch_size = 16,shuffle = True,validation_data = (x_test, x_test))
Known bug: you need to set the regularizer to 10e-7:
activity_regularizer=regularizers.activity_l1(10e-7))(input_img)
After 50 epochs, val_loss: 0.1424.
https://github.com/keras-team/keras/issues/5414
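Note that regularizers.activity_l1 is the old Keras 1 API; in Keras 2 the same thing is written with the plain l1 regularizer passed as activity_regularizer, e.g.:
encoded = Dense(encoding_dim, activation='relu',
                activity_regularizer=regularizers.l1(10e-7))(input_img)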
If you drop your batch size, it takes much longer to train.
