Changing batch_size parameter in Keras leads to broadcast error

I am running a simple encoder-decoder setup to train a representation for a one-dimensional image. In this sample the inputs are lines with varying slopes, and in the encoded layer we would expect something that resembles the slope. My setup is Keras with a TensorFlow backend. I am very new to this as well.
It all works fine, at least until I move away from steps_per_epoch to batch_size in the model.fit() method. Certain values of batch_size, such as 1, 2, 3, 8 and 16, do work; for others I get a ValueError. My initial guess was powers of two, but that did not work.
The error I get for batch_size = 5 is:
ValueError: operands could not be broadcast together with shapes (5,50) (3,50) (5,50)
I am trying to understand which relation between batch_size and the training data is valid such that it always passes. I assumed that the training set would simply be divided into floor(N/batch_size) batches, with the remainder processed as one final, smaller batch.
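For illustration, this is the batching I assumed, as a plain-Python sketch (N = 807 is just a made-up training-set size):
N = 807                          # hypothetical number of training samples
batch_size = 5
full_batches = N // batch_size   # floor(N / batch_size) = 161 full batches
remainder = N % batch_size       # plus one final, smaller batch of 2 samples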
My questions are:
What is the relation between the size of the data set and the values of batch_size that are allowed?
What exactly is Keras/TensorFlow trying to do such that batch_size is important?
Thank you very much for the help.
The code to reproduce this is:
import numpy as np
from keras.models import Model
from keras.layers import Input, Dense, Conv1D, Concatenate
from keras.losses import mse
from keras.optimizers import Adam
INPUT_DIM = 50
INTER_DIM = 15
LATENT_DIM = 1
# Prepare Sample Data
one_line = np.linspace(1, 30, INPUT_DIM).reshape(1, INPUT_DIM)
test_array = np.repeat(one_line, 1000, axis=0)
slopes = np.linspace(0, 1, 1000).reshape(1000, 1)
data = test_array * slopes
# Train test split
train_mask = np.where(np.random.sample(1000) < 0.8, 1, 0).astype('bool')
x_train = data[train_mask].reshape(-1, INPUT_DIM, 1)
x_test = data[~train_mask].reshape(-1, INPUT_DIM, 1)
# Define Model
input = Input(shape=(INPUT_DIM, 1), name='input')
conv_layer_small = Conv1D(filters=1, kernel_size=[3], padding='same')(input)
conv_layer_medium = Conv1D(filters=1, kernel_size=[5], padding='same')(input)
merged_convs = Concatenate()(
    [conv_layer_small, conv_layer_medium])
latent = Dense(LATENT_DIM, name='latent_layer',
               activation='relu')(merged_convs)
encoder = Model(input, latent)
decoder_int = Dense(INTER_DIM, name='dec_int_layer', activation='relu')(latent)
output = Dense(INPUT_DIM, name='output', activation='linear')(decoder_int)
encoder_decoder = Model(input, output, name='encoder_decoder')
# Add Loss
reconstruction_loss = mse(input, output)
encoder_decoder.add_loss(reconstruction_loss)
encoder_decoder.compile(optimizer='adam')
if __name__ == '__main__':
    epochs = 100
    encoder_decoder.fit(
        x_train,
        epochs=epochs,
        batch_size=4,
        verbose=2
    )

Related

Does the sequence length of an RNN/LSTM have to be the same for the input and output?

I have a question about the input and output data in an RNN or LSTM. An RNN expects a 3-dimensional tensor as input, of the form (Batch_size, sequence_length_input, features_input), and produces a 3-dimensional output tensor of the form (Batch_size, sequence_length_output, features_output).
I know that features_input and features_output don't have to have the same number, while the Batch_size has to be equal for input and output. But what about the middle part, sequence_length_input and sequence_length_output: do they have to be the same? At least in my example (with Keras and TensorFlow) I always get an error if they are not the same. So I am wondering whether I have a bug in the code or if this is generally not possible.
So can I, for example, use as input for the training the data X_train = (1000, 100, 10) and the output Y_train = (1000, 20, 3), such that I have a mapping for each of the 1000 items (Batch_size) from a 10-dimensional (features_input) time series with 100 time steps (sequence_length_input) to a 3-dimensional (features_output) time series with 20 time steps (sequence_length_output)?
Update: Here is my code with an RNN for time series forecasting, which only works if the sequence_length of the input (steps_backwards) is equal to the sequence_length of the output (steps_forward); otherwise it throws a ValueError:
ValueError: Dimensions must be equal, but are 192 and 96 for '{{node mean_squared_error/SquaredDifference}} = SquaredDifference[T=DT_FLOAT](sequential_5/time_distributed_5/Reshape_1, IteratorGetNext:1)' with input shapes: [?,192,1], [?,96,1].
In the code I use the 96 past time steps (or 2*96 = 192 time steps) to predict the 96 future time steps. When the numbers of past and future time steps are equal (equal sequence_length), everything works fine. Otherwise (unequal sequence_length) I get the ValueError.
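Just to illustrate the shape requirement behind this error, here is a minimal check I put together (my own sketch, separate from the full code below):
import tensorflow as tf
y_true = tf.zeros((1, 96, 1))    # target with 96 time steps
y_pred = tf.zeros((1, 192, 1))   # prediction with 192 time steps
# mean_squared_error subtracts element-wise, so the time axes must agree;
# with 96 vs. 192 time steps this raises a shape error (the ValueError above)
loss = tf.keras.losses.mean_squared_error(y_true, y_pred)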
Code:
#Import modules
import pandas as pd
import numpy as np
import tensorflow as tf
from sklearn.preprocessing import StandardScaler
from tensorflow import keras
# Define the parameters of the RNN and the training
epochs = 1
batch_size = 50
steps_backwards = 2 * 96
steps_forward = 96
split_fraction_trainingData = 0.70
split_fraction_validatinData = 0.90
randomSeedNumber = 50
#Read dataset
df = pd.read_csv('C:/Users/User1/Desktop/TestData.csv', sep=';', header=0, low_memory=False, infer_datetime_format=True, parse_dates={'datetime':[0]}, index_col=['datetime'])
# standardize data
data = df.values
indexWithYLabelsInData = 0
data_X = data[:, 0:3]
data_Y = data[:, indexWithYLabelsInData].reshape(-1, 1)
scaler_standardized_X = StandardScaler()
data_X = scaler_standardized_X.fit_transform(data_X)
data_X = pd.DataFrame(data_X)
scaler_standardized_Y = StandardScaler()
data_Y = scaler_standardized_Y.fit_transform(data_Y)
data_Y = pd.DataFrame(data_Y)
# Prepare the input data for the RNN
series_reshaped_X = np.array([data_X[i:i + (steps_backwards+steps_forward)].copy() for i in range(len(data) - (steps_backwards+steps_forward))])
series_reshaped_Y = np.array([data_Y[i:i + (steps_backwards+steps_forward)].copy() for i in range(len(data) - (steps_backwards+steps_forward))])
timeslot_x_train_end = int(len(series_reshaped_X)* split_fraction_trainingData)
timeslot_x_valid_end = int(len(series_reshaped_X)* split_fraction_validatinData)
X_train = series_reshaped_X[:timeslot_x_train_end, :steps_backwards]
X_valid = series_reshaped_X[timeslot_x_train_end:timeslot_x_valid_end, :steps_backwards]
X_test = series_reshaped_X[timeslot_x_valid_end:, :steps_backwards]
Y_train = series_reshaped_Y[:timeslot_x_train_end, steps_backwards:]
Y_valid = series_reshaped_Y[timeslot_x_train_end:timeslot_x_valid_end, steps_backwards:]
Y_test = series_reshaped_Y[timeslot_x_valid_end:, steps_backwards:]
# Build the model and train it
np.random.seed(randomSeedNumber)
tf.random.set_seed(randomSeedNumber)
model = keras.models.Sequential([
    keras.layers.SimpleRNN(10, return_sequences=True, input_shape=[None, 3]),
    keras.layers.SimpleRNN(10, return_sequences=True),
    keras.layers.TimeDistributed(keras.layers.Dense(1))
])
model.compile(loss="mean_squared_error", optimizer="adam")
history = model.fit(X_train, Y_train, epochs=epochs, batch_size=batch_size, validation_data=(X_valid, Y_valid))
#Predict the test data
Y_pred = model.predict(X_test)
And here is some test data: https://filetransfer.io/data-package/ufbzh09o#link
Reminder: The code and the data provide a minimal reproducible example. Maybe you can have a look at it, as in this code the sequence_length has to be equal for the input and output data, otherwise I get an error. Unfortunately, I still have not figured out why this problem occurs.
I have encountered the same problem. My input data shape is [512, 10, 3] and the output data is [512, 20, 1], which means the last ten time steps are used to predict the future twenty time steps. When I tried to implement it in PyTorch, the same problem you describe appeared. Finally, I just repeated the last state of the LSTM 20 times and fed that into the next fully connected layers. However, I cannot do this in a classic neural network made up only of fully connected layers.
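For what it's worth, here is a minimal Keras sketch of that repeat-the-last-state idea, using the example shapes from the original question (the layer sizes are my own assumptions):
from tensorflow import keras
steps_in, steps_out, features_in, features_out = 100, 20, 10, 3
model = keras.models.Sequential([
    # encoder: keep only the last hidden state of the input sequence
    keras.layers.LSTM(32, input_shape=(steps_in, features_in)),
    # repeat that state once per desired output time step
    keras.layers.RepeatVector(steps_out),
    # decoder: produce one hidden state per output time step
    keras.layers.LSTM(32, return_sequences=True),
    # map each decoder state to the output features
    keras.layers.TimeDistributed(keras.layers.Dense(features_out))
])
model.compile(loss="mean_squared_error", optimizer="adam")
# X_train of shape (1000, 100, 10) now maps to outputs of shape (1000, 20, 3)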

How do I know what the output of model.predict() corresponds to?

I am trying to make a CNN that classifies cats and dogs and I am using flow_from_directory() to prepare my data for the model.
from keras import Sequential
from keras_preprocessing.image import ImageDataGenerator
from keras.layers import *
from keras.callbacks import ModelCheckpoint
from keras.optimizers import *
import keras
import numpy as np
import os
img_size = 250 # number of pixels for width and height
#Random Seed
np.random.seed(123456789)
training_path = os.getcwd() + "/cats and dogs images/train"
testing_path = os.getcwd() + "/cats and dogs images/test"
#Defines the Model
model = Sequential([
    Conv2D(filters=128, kernel_size=(3,3), activation="relu", padding="same", input_shape=(img_size,img_size,3)),
    MaxPool2D(pool_size=(2,2), strides=2),
    Conv2D(filters=64, kernel_size=(3,3), activation="relu", padding="same"),
    Flatten(),
    Dense(32, activation="relu"),
    Dense(2, activation="softmax")
])
#Scales the pixel values to between 0 to 1
datagen = ImageDataGenerator(rescale=1.0/255.0)
Batch_size = 10
#Prepares Training Data
training_dataset = datagen.flow_from_directory(directory = training_path,
                                               target_size = (img_size, img_size),
                                               classes = ["cat", "dog"],
                                               class_mode = "categorical",
                                               batch_size = Batch_size)
#Prepares Testing Data
testing_dataset = datagen.flow_from_directory(directory = testing_path,
                                              target_size = (img_size, img_size),
                                              classes = ["cat", "dog"],
                                              class_mode = "categorical",
                                              batch_size = Batch_size)
#Compiles the model
#model.compile(loss="categorical_crossentropy", optimizer="sgd", metrics=['accuracy'])
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=['accuracy'])
#model.compile(loss="mse", optimizer="sgd", metrics=[keras.metrics.MeanSquaredError()])
#Checkpoint
filepath = os.getcwd() + "/trained_model.h5"
checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose=1, save_best_only=True, mode='min', save_freq=1)
#Fitting the model to the dataset (Training the Model)
model.fit(x = training_dataset, steps_per_epoch = 400,
          validation_data = testing_dataset, validation_steps = 100,
          epochs = 10, callbacks = [checkpoint], verbose = 1)
# evaluate model on training dataset
_,acc = model.evaluate_generator(training_dataset, steps=len(training_dataset), verbose=0)
print("Accuracy on training dataset:")
print('> %.3f' % (acc * 100.0))
#evaluate model on testing dataset
_,acc = model.evaluate_generator(testing_dataset, steps=len(testing_dataset), verbose=0)
print("Accuracy on testing dataset:")
print('> %.3f' % (acc * 100.0))
I want to know how the output of model.predict() corresponds to the labels cats and dogs: which of the two numbers in the output is a cat and which is a dog?
Here's my code for loading the model and giving a prediction:
from keras.models import Sequential
from keras_preprocessing.image import *
from keras.layers import *
import tensorflow as tf
import numpy as np
from keras.layers.experimental.preprocessing import Rescaling
import os
import cv2
from keras.models import *
img_size = 250
#Load weights into new model
filepath = os.getcwd() + "/trained_model.h5"
model = load_model(filepath)
print("Loaded model from disk")
#Scales the pixel values to between 0 to 1
#datagen = ImageDataGenerator(rescale=1.0/255.0)
#Prepares Testing Data
testing_dataset = cv2.imread(os.getcwd() + "/cats and dogs images/single test sample/507.png")
#img = datagen.flow_from_directory(testing_dataset, target_size=(img_size,img_size))
img = cv2.resize(testing_dataset, (img_size,img_size))
newimg = np.asarray(img)
pixels = newimg.astype('float32')
pixels /= 255.0
print(pixels.shape)
pixels = np.expand_dims(pixels, axis=0)
print(pixels.shape)
prediction = model.predict(pixels)
print(prediction)
And here is the output from the prediction code above:
Loaded model from disk
(250, 250, 3)
(1, 250, 250, 3)
[[5.4904184e-27 1.0000000e+00]]
As you can see, the prediction gave an array of two numbers, but which one corresponds to the dog label and which to the cat label? By the way, the model isn't fully trained yet; I am just testing the code to see if it works.
The model output depends on how you loaded the data and how you specified the class ordering/labelling in this code you provided:
training_dataset = datagen.flow_from_directory(directory = training_path,
                                               target_size = (img_size, img_size),
                                               classes = ["cat", "dog"],
                                               class_mode = "categorical",
                                               batch_size = Batch_size)
#Prepares Testing Data
testing_dataset = datagen.flow_from_directory(directory = testing_path,
                                              target_size = (img_size, img_size),
                                              classes = ["cat", "dog"],
                                              class_mode = "categorical",
                                              batch_size = Batch_size)
You specified during the loading of the data, via the classes argument, that the classes are going to be ordered cat then dog.
Therefore the output is going to be two probabilities (summing to 1): the first is the probability that the input image is a cat, and the second the probability that it is a dog.
You can use this line:
output_class = np.argmax(prediction, axis=1)
This compares the elements of the output and returns the index of the greatest one (here, the list containing the two probabilities). If the result is [1] (or [0, 1], depending on the shape of the output), the image is a dog, since the second element is the larger one; if it is [0] (or [1, 0]), the predicted class is cat.
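For example, you can recover the label from the predicted index through the generator's class_indices attribute (a small sketch reusing the names from your code):
import numpy as np
# the mapping follows the order of the classes argument
print(training_dataset.class_indices)         # {'cat': 0, 'dog': 1}
index_to_label = {v: k for k, v in training_dataset.class_indices.items()}
output_class = np.argmax(prediction, axis=1)  # e.g. array([1])
print(index_to_label[output_class[0]])        # 'dog'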

MNIST and transfer learning with VGG16 in Keras- low validation accuracy

I recently started taking advantage of Keras's flow_from_dataframe() feature for a project, and decided to test it with the MNIST dataset. I have a directory full of the MNIST samples in png format, and a dataframe with the absolute directory for each in one column and the label in the other.
I'm also using transfer learning, importing VGG16 as a base and adding my own 512-node relu dense layer and 0.5 dropout before a 10-way softmax layer (for digits 0-9). I'm using rmsprop (lr=1e-4) as the optimizer.
When I launch my environment, it calls the latest version of keras_preprocessing from Git, which has support for absolute directories and capitalized file extensions.
My problem is that I have a very high training accuracy and a terribly low validation accuracy. By my final epoch (10), I had a training accuracy of 0.94 and a validation accuracy of 0.01.
I'm wondering if there's something fundamentally wrong with my script? With another dataset, I even get NaNs for both my training and validation loss values after epoch 4. (I checked the relevant columns; there aren't any null values!)
Here's my code. I'd be deeply appreciative if someone could glance through it and see if anything jumps out at them.
import pandas as pd
import numpy as np
import keras
from keras_preprocessing.image import ImageDataGenerator
from keras import applications
from keras import optimizers
from keras.models import Model
from keras.layers import Dropout, Flatten, Dense, GlobalAveragePooling2D
from keras import backend as k
from keras.callbacks import ModelCheckpoint, CSVLogger
from keras.applications.vgg16 import VGG16, preprocess_input
# INITIALIZE MODEL
img_width, img_height = 32, 32
model = VGG16(weights = 'imagenet', include_top=False, input_shape = (img_width, img_height, 3))
# freeze all layers
for layer in model.layers:
    layer.trainable = False
# Adding custom Layers
x = model.output
x = Flatten()(x)
x = Dense(512, activation='relu')(x)
x = Dropout(0.5)(x)
predictions = Dense(10, activation="softmax")(x)
# creating the final model
model_final = Model(input = model.input, output = predictions)
# compile the model
rms = optimizers.RMSprop(lr=1e-4)
#adadelta = optimizers.Adadelta(lr=0.001, rho=0.5, epsilon=None, decay=0.0)
model_final.compile(loss = "categorical_crossentropy", optimizer = rms, metrics=["accuracy"])
# LOAD AND DEFINE SOURCE DATA
train = pd.read_csv('MNIST_train.csv', index_col=0)
val = pd.read_csv('MNIST_test.csv', index_col=0)
nb_train_samples = 60000
nb_validation_samples = 10000
batch_size = 60
epochs = 10
# Initiate the train and test generators
train_datagen = ImageDataGenerator()
test_datagen = ImageDataGenerator()
train_generator = train_datagen.flow_from_dataframe(dataframe=train,
                                                    directory=None,
                                                    x_col='train_samples',
                                                    y_col='train_labels',
                                                    has_ext=True,
                                                    target_size = (img_height, img_width),
                                                    batch_size = batch_size,
                                                    class_mode = 'categorical',
                                                    color_mode = 'rgb')
validation_generator = test_datagen.flow_from_dataframe(dataframe=val,
                                                        directory=None,
                                                        x_col='test_samples',
                                                        y_col='test_labels',
                                                        has_ext=True,
                                                        target_size = (img_height, img_width),
                                                        batch_size = batch_size,
                                                        class_mode = 'categorical',
                                                        color_mode = 'rgb')
# GET CLASS INDICES
print('****************')
for cls, idx in train_generator.class_indices.items():
    print('Class #{} = {}'.format(idx, cls))
print('****************')
# DEFINE CALLBACKS
path = './chk/epoch_{epoch:02d}-valLoss_{val_loss:.2f}-valAcc_{val_acc:.2f}.hdf5'
chk = ModelCheckpoint(path, monitor = 'val_acc', verbose = 1, save_best_only = True, mode = 'max')
logger = CSVLogger('./chk/training_log.csv', separator = ',', append=False)
nPlus = 1
samples_per_epoch = nb_train_samples * nPlus
# Train the model
model_final.fit_generator(train_generator,
                          steps_per_epoch = int(samples_per_epoch/batch_size),
                          epochs = epochs,
                          validation_data = validation_generator,
                          validation_steps = int(nb_validation_samples/batch_size),
                          callbacks = [chk, logger])
Have you tried explicitly defining the classes of the images? As such:
train_generator=image.ImageDataGenerator().flow_from_dataframe(classes=[0,1,2,3,4,5,6,7,8,9])
in both the train and validation generators.
I have found that sometimes the train and validation generators create different correspondence dictionaries.
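A quick sanity check for that mismatch (a sketch assuming the generator names from the question):
# the two mappings should be identical; if they differ, the labels are scrambled
print(train_generator.class_indices)
print(validation_generator.class_indices)
assert train_generator.class_indices == validation_generator.class_indices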

Two dimensional (position) time series prediction in Keras using LSTM

I am trying to use the Keras LSTM implementation to predict a series of x, y pairs into the future. The x, y pairs specify a location in a 2D plane. I would like to predict them 60 steps into the future.
I have 36k data pairs, which I have split into 30k for training and 5880 for testing. I have prepared the training data by creating a 3D array of shape (30000, 60, 2), where each element is a rolling 60-length snippet of the training data, e.g. [[[x0, y0], [x1, y1], ... [x59, y59]], [[x1, y1], [x2, y2], ... [x60, y60]], ... [[x30000, y30000], [x30001, y30001], ... [x30059, y30059]]]. The target data is exactly the same thing, only offset by 60 elements. The idea is basically to use 60 pairs to predict the next 60 pairs.
I'm getting the following error, indicating that the model is expecting the target data to have only two dimensions.
ValueError: Error when checking model target: expected lstm_1 to have 2 dimensions, but got array with shape (30000, 60, 2)
It looks like the model is dropping the fact that my data is 2D. Clearly I am missing something conceptually here, but I'm not sure what it is. I'd be grateful if someone could put me on the right track.
Here is my code:
import numpy as np
from numpy import genfromtxt
from keras.models import Sequential
from keras.layers import Dense, LSTM, Activation, GRU, Dropout
TRAINING_SET_SIZE = 30000
epochs = 1
original_data = genfromtxt('training_data.txt', delimiter=',', dtype='int')
training_set = []
for i in range(len(original_data) - 120):
    training_set.append(original_data[i:i+120])
training_set = np.array(training_set)
train_input = []
train_output = []
for i in range(TRAINING_SET_SIZE):
    train_input.append(training_set[i][0:60])
    train_output.append(training_set[i][60:120])
train_input = np.array(train_input)
train_output = np.array(train_output)
test_input = []
test_output = []
for i in range(TRAINING_SET_SIZE, len(original_data) - 120):
    test_input.append(training_set[i][0:60])
    test_output.append(training_set[i][60:120])
test_input = np.array(test_input)
test_output = np.array(test_output)
s = (train_input.shape[1], train_input.shape[2])
model = Sequential()
model.add(LSTM(60, input_shape=s, unroll=True))
model.compile(loss='mean_squared_error', optimizer='adam')
model.summary()
print("Inputs: {}".format(model.input_shape))
print("Outputs: {}".format(model.output_shape))
print("Actual input: {}".format(train_input.shape))
print("Actual output: {}".format(train_output.shape))
print('Training')
model.fit(train_input, train_output, validation_split=0.2, batch_size=1, epochs=epochs, verbose=1, shuffle=False)
model.save('my_model.h5')
score = model.evaluate(test_input, test_output, batch_size=1)
print(score)
print('Predicting')
predicted_output = model.predict(test_input, batch_size=1)

About 'Building Autoencoders in Keras'?

I read Building Autoencoders in Keras, at https://blog.keras.io/building-autoencoders-in-keras.html
In the section on adding a sparsity constraint on the encoded representations, I tried to follow the description, but the loss won't go down to 0.11; instead it stays around 0.26.
So the result is fuzzy.
Can anyone who has done this experiment tell me what's wrong with it?
Here is my code:
from keras.layers import Input, Dense
from keras.models import Model
from keras import regularizers
encoding_dim = 32 # dimension of the encoded representation
input_img = Input(shape = (784,))
# encoder
encoded = Dense(encoding_dim, activation = 'relu',
                activity_regularizer = regularizers.l1(1e-4)
                )(input_img)
# decoder
decoded = Dense(784, activation = 'sigmoid')(encoded)
# create the autoencoder
autoencoder = Model(input_img, decoded)
# encoder model
encoder = Model(input_img, encoded)
encoded_input = Input(shape = (encoding_dim,))
# the last dense layer acts as the decoder
decoder_layer = autoencoder.layers[-1]
# decoder model
decoder = Model(encoded_input, decoder_layer(encoded_input))
# compile the model
autoencoder.compile(optimizer = 'adadelta', loss = 'binary_crossentropy')
from keras.datasets import mnist
import numpy as np
(x_train, _), (x_test, _) = mnist.load_data()
x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
x_train = x_train.reshape(x_train.shape[0], np.prod(x_train.shape[1:]))
x_test = x_test.reshape(x_test.shape[0], np.prod(x_test.shape[1:]))
autoencoder.fit(x_train, x_train,
                epochs = 100,
                batch_size = 256,
                shuffle = True,
                validation_data = (x_test, x_test))
encoded_imgs = encoder.predict(x_test)
decoded_imgs = decoder.predict(encoded_imgs)
import matplotlib.pyplot as plt
n = 10
plt.figure(figsize = (20, 4))
for i in range(n):
    # original image
    ax = plt.subplot(2, n, i + 1)
    plt.imshow(x_test[i].reshape(28, 28))
    plt.gray()
    ax.set_axis_off()
    # decoded image
    ax = plt.subplot(2, n, n + i + 1)
    plt.imshow(decoded_imgs[i].reshape(28, 28))
    plt.gray()
    ax.set_axis_off()
plt.savefig('simpleSparse.png')
from keras import backend as K
K.clear_session()
I copied your code verbatim and reproduced the error you got.
Solution: reduce the batch size from 256 to 16. You'll notice a huge difference in output even after 10 epochs of training.
Explanation: What is probably going on is that even though your training loss is decreasing, you are averaging the gradient over so many examples that the step you take in the direction of the gradient cancels itself out in some higher-dimensional space, and your learning algorithm is tricked into thinking it has converged to a local minimum, when in reality it can't decide where to go. This last part explains why all the outputs you get look blurry and exactly the same.
Update: reduce the batch size to 4, and you'll get near-perfect reconstruction even after 10 epochs.
You need to change the batch size in your code:
autoencoder.fit(x_train, x_train, epochs = 10, batch_size = 16, shuffle = True, validation_data = (x_test, x_test))
Known bug: you need to set the regularizer to 10e-7:
activity_regularizer=regularizers.activity_l1(10e-7))(input_img)
After 50 epochs, val_loss: 0.1424
https://github.com/keras-team/keras/issues/5414
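Note that regularizers.activity_l1 is the Keras 1 API; with the Keras 2 imports used in the question, the same setting would presumably look like this:
from keras import regularizers
encoded = Dense(encoding_dim, activation = 'relu',
                activity_regularizer = regularizers.l1(10e-7))(input_img)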
If you drop your batch size, training takes much longer.
