I have a digit classification model. On the test data it works OK, but when I try to classify other images, the model can't correctly predict which digit it is. Please help me improve the model.predict() performance.
I've tried training my model in many ways. The code below contains the function that creates the classification model. I trained it with between 1K and 60K input samples (1K < n < 60K) and between 3 and 50 epochs (3 < e < 50).
import tensorflow as tf

def load_data():
    (train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.mnist.load_data()
    train_images = tf.keras.utils.normalize(train_images, axis = 1)
    test_images = tf.keras.utils.normalize(test_images, axis = 1)
    return (train_images, train_labels), (test_images, test_labels)
def create_model(n=60000, e=5):
    # n = number of training samples used, e = number of epochs
    model = tf.keras.models.Sequential()
    model.add(tf.keras.layers.Flatten())
    model.add(tf.keras.layers.Dense(128, activation = tf.nn.relu))
    model.add(tf.keras.layers.Dense(128, activation = tf.nn.relu))
    model.add(tf.keras.layers.Dense(10, activation = tf.nn.softmax))
    data = load_data()
    model.compile(optimizer ='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    model.fit(data[0][0][:n], data[0][1][:n], epochs = e)  # I've tried from 3 to 50 epochs
    model.save(config.model_name)
def load_model():
    return tf.keras.models.load_model(config.model_name)
def predict(images):
    try:
        model = load_model()
    except:
        create_model()
        model = load_model()
    images = tf.keras.utils.normalize(images, axis = 0)
    d = load_data()
    plot_many_images([d[0][0][0].reshape((28,28)), images[0]],['data', 'image'])
    predictions = model.predict(images)
    return predictions
I think my input data doesn't look like the data the model was trained on, but I've tried to make it as similar as I can. In this picture (https://imgur.com/FfLGMEK), the LEFT image is from the training data and the RIGHT one is my parsed image. Both are 28x28 pixels and both are cv2-normalized.
For the test predictions I used this sudoku (https://imgur.com/RMfKtag). It is already formatted to resemble the test data digits, but when I run this image through the model, the result is not so good (https://imgur.com/RQFvLNE).
As you can see, the predicted data leaves much to be desired.
P.S. The ' ' items in the predicted result were inserted by hand (I replaced the numbers at those positions with ' '), because after prediction every cell has some value (1-9). That's not important right now.
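One detail in the code above may be worth checking: load_data() normalizes with axis = 1, while predict() normalizes the new images with axis = 0. The sketch below is only an illustration, assuming tf.keras.utils.normalize performs an L2 normalization along the given axis; l2_normalize is a hypothetical NumPy stand-in and the random batch is made-up data, not the sudoku digits.

```python
import numpy as np

def l2_normalize(x, axis):
    """L2-normalize x along one axis (hypothetical stand-in for tf.keras.utils.normalize)."""
    x = np.asarray(x, dtype=np.float64)
    norm = np.linalg.norm(x, ord=2, axis=axis, keepdims=True)
    norm[norm == 0] = 1.0  # leave all-zero slices unchanged instead of dividing by zero
    return x / norm

# Made-up batch of four 28x28 "images" with non-zero pixel values.
rng = np.random.default_rng(0)
batch = rng.integers(1, 256, size=(4, 28, 28)).astype(np.float64)

like_training = l2_normalize(batch, axis=1)  # the axis used in load_data()
like_predict = l2_normalize(batch, axis=0)   # the axis used in predict()

# With axis=0 each pixel is scaled by the same pixel in the other images of the batch,
# so the result depends on what else is in the batch.
same = np.allclose(like_training, like_predict)
print("same preprocessing:", same)
```

If the two arrays differ, the model sees inputs at prediction time that are scaled differently from what it was trained on, which could explain part of the gap. Using the same axis in both places would be a cheap first thing to try.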
What do you mean by "on test data it works OK"? If you mean it works well on the training data but does not predict well on the test data, your model may have overfit during training. I suggest using a train/validation/test approach to train your network.
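A minimal sketch of the train/validation/test approach, using NumPy only. The function name train_val_test_split, the 10%/10% fractions and the toy arrays are assumptions chosen for illustration.

```python
import numpy as np

def train_val_test_split(x, y, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle the indices once, then cut them into three disjoint parts."""
    n = len(x)
    idx = np.random.default_rng(seed).permutation(n)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    test_idx = idx[:n_test]
    val_idx = idx[n_test:n_test + n_val]
    train_idx = idx[n_test + n_val:]
    return (x[train_idx], y[train_idx]), (x[val_idx], y[val_idx]), (x[test_idx], y[test_idx])

# Toy data standing in for images and labels.
x = np.arange(1000).reshape(1000, 1)
y = np.arange(1000) % 10
train, val, test = train_val_test_split(x, y)
print(len(train[0]), len(val[0]), len(test[0]))
```

The validation part can be passed to model.fit through its validation_data argument so that the validation loss is reported after every epoch, while the test part is only touched once at the end.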
Related
I have a TensorFlow regression model that I have been working with. I have the model tuned well and I get good results while training. However, when I go to evaluate it, the results are horrible. I did some research and found that I am not normalizing my test features and labels as well, so I suspect that is where the problem is. My thought is to normalize the whole dataset before splitting it into train and test sets, but I am getting an attribute error that has me stumped.
Here is the code sample. Please help :)
#concatenate the surface data and single_downhole_col into a single dataframe
training_Data =[]
training_Data = pd.concat([surface_Data, single_downhole_col], axis=1)
#print('training data shape:',training_Data.shape)
#print(training_Data.head())
#normalize the data using keras
model_normalizer_layer = tf.keras.layers.Normalization(axis=-1)
model_normalizer_layer.adapt(training_Data)
normalized_training_Data = model_normalizer_layer(training_Data)
#convert the data frame to array
dataset = normalized_training_Data.copy()
dataset.tail()
#create a training and test set
train_dataset = dataset.sample(frac=0.8, random_state=0)
test_dataset = dataset.drop(train_dataset.index)
#check the data
train_dataset.describe().transpose()
#split features from labels
train_features = train_dataset.copy()
test_features = test_dataset.copy()
And if there is any interest in how the normalizer layer is used in the model, please see below:
def build_and_compile_model(data):
    model = keras.Sequential([
        model_normalizer_layer,
        layers.Dense(260, input_dim=401,activation='relu'),
        layers.Dense(80, activation='relu'),
        #layers.Dense(40, activation='relu'),
        layers.Dense(1)
    ])
I found that quasimodo's suggestion of normalizing the dataset before processing it in my model was the ideal solution. It scaled the data from 0 to 1 for all columns as expected and allowed me to display the data prior to training to validate that it was correct.
For whatever reason, the tf.keras.layers.Normalization layer was not working in my case.
x = training_Data.values
min_max_scaler = preprocessing.MinMaxScaler()
x_scaled = min_max_scaler.fit_transform(x)
training_Data = pd.DataFrame(x_scaled)
# normalize the data using keras
model_normalizer_layer = tf.keras.layers.Normalization(axis=-1)
model_normalizer_layer.adapt(training_Data)
normalized_training_Data = model_normalizer_layer(training_Data)
The only part I have yet to figure out is how to scale the predictions from the model back to the original range of the column. I'm sure it's simple, but I'm stumped.
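Min-max scaling is a per-column linear map, so it can be undone with the stored column minimum and maximum. The sketch below shows the arithmetic in NumPy. The toy array, the choice of the last column as the label, and the helper name inverse_min_max are assumptions for illustration.

```python
import numpy as np

def inverse_min_max(scaled, col_min, col_max):
    """Map values from the [0, 1] range back to the original [col_min, col_max] range."""
    return scaled * (col_max - col_min) + col_min

# Toy data: the last column plays the role of the label column.
x = np.array([[10.0, 200.0],
              [20.0, 400.0],
              [30.0, 800.0]])

col_min = x.min(axis=0)
col_max = x.max(axis=0)
x_scaled = (x - col_min) / (col_max - col_min)  # same formula MinMaxScaler applies by default

label_col = 1
pred_scaled = np.array([0.0, 0.5, 1.0])  # pretend these are model outputs in the scaled space
pred_original = inverse_min_max(pred_scaled, col_min[label_col], col_max[label_col])
print(pred_original)
```

With scikit-learn, the fitted MinMaxScaler keeps these values in its data_min_ and data_max_ attributes and also offers inverse_transform, which expects an array with the same number of columns it was fitted on. Fitting a separate scaler on the label column alone is one way to make the inverse step straightforward.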
I'm using a pre-trained InceptionV3 in Keras and retraining it for binary image classification (data labeled with 0s and 1s).
I'm reaching about 65% accuracy in my k-fold validation on never-seen data, but the problem is that the model overfits too soon. I need to improve this average accuracy, and I guess it is related to this overfitting problem.
Here are the loss values over the epochs:
Here is the code. The dataset and labels variables are NumPy arrays.
dataset = joblib.load(path_to_dataset)
labels = joblib.load(path_to_labels)
le = LabelEncoder()
labels = le.fit_transform(labels)
labels = to_categorical(labels, 2)
X_train, X_test, y_train, y_test = sk.train_test_split(dataset, labels, test_size=0.2)
X_train, X_val, y_train, y_val = sk.train_test_split(X_train, y_train, test_size=0.25) # 0.25 x 0.8 = 0.2
X_train = np.array(X_train)
y_train = np.array(y_train)
X_val = np.array(X_val)
y_val = np.array(y_val)
X_test = np.array(X_test)
y_test = np.array(y_test)
aug = ImageDataGenerator(
rotation_range=20,
zoom_range=0.15,
horizontal_flip=True,
fill_mode="nearest")
pre_trained_model = InceptionV3(input_shape = (299, 299, 3),
include_top = False,
weights = 'imagenet')
for layer in pre_trained_model.layers:
    layer.trainable = False
x = layers.Flatten()(pre_trained_model.output)
x = layers.Dense(1024, activation = 'relu')(x)
x = layers.Dropout(0.2)(x)
x = layers.Dense(2, activation = 'softmax')(x) #already tried with sigmoid activation, same behavior
model = Model(pre_trained_model.input, x)
model.compile(optimizer = RMSprop(lr = 0.0001),
loss = 'binary_crossentropy',
metrics = ['accuracy']) #Already tried with Adam optimizer, same behavior
es = EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=100)
mc = ModelCheckpoint('best_model_inception_rmsprop.h5', monitor='val_accuracy', mode='max', verbose=1, save_best_only=True)
history = model.fit(x=aug.flow(X_train, y_train, batch_size=32),
validation_data = (X_val, y_val),
epochs = 100,
callbacks=[es, mc])
The training dataset has 2181 images and validation has 727 images.
Something is wrong, but I can't tell what...
Any thoughts of what can be done to improve it?
One way to avoid overfitting is to use a lot of data. The main reason overfitting happens is because you have a small dataset and you try to learn from it. The algorithm will have greater control over this small dataset and it will make sure it satisfies all the datapoints exactly. But if you have a large number of datapoints, then the algorithm is forced to generalize and come up with a good model that suits most of the points.
Suggestions:
Use a lot of data.
Use a less deep network if you have a small number of data samples.
If the second point applies, don't use a huge number of epochs. Training for many epochs effectively forces the model to memorize the training data: it will learn that data well but will not generalize.
From your loss graph, I see that the model generalizes best at an early epoch (where the train and validation curves intersect), so please try using the model saved at that epoch, and not the later epochs, which seem to overfit.
A second option is to use a lot of training samples.
If you have few training samples, use data augmentation.
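The idea of keeping the model from the epoch where validation is best can be sketched in plain Python. The loss curve below is made up, and best_epoch is a hypothetical helper that only mimics the usual patience rule. It is not the Keras implementation.

```python
def best_epoch(val_losses, patience):
    """Return (index of the best epoch, index of the epoch where training would stop)."""
    best_idx, best_loss, waited = 0, float("inf"), 0
    for i, loss in enumerate(val_losses):
        if loss < best_loss:
            best_idx, best_loss, waited = i, loss, 0
        else:
            waited += 1
            if waited >= patience:
                return best_idx, i  # stop early, keep the weights from best_idx
    return best_idx, len(val_losses) - 1  # patience never exhausted, ran to the end

# Made-up validation losses: they improve for a few epochs, then get worse.
val_losses = [0.70, 0.62, 0.58, 0.60, 0.65, 0.71, 0.80, 0.95]
print(best_epoch(val_losses, patience=3))
print(best_epoch(val_losses, patience=100))
```

In the question's callbacks, patience=100 combined with epochs=100 means the stopping rule cannot fire before training ends, so a much smaller patience is probably what was intended. Keras EarlyStopping also accepts restore_best_weights=True, which may be simpler than reloading a checkpoint by hand.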
Have you tried the following?
Using a higher dropout value
A lower learning rate (lr=0.00001 or lr=0.000001 ...)
More data augmentation
It seems to me that you have little data. You could use a lower ratio for test and validation (10%, 10%).
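To see what the 10%/10% suggestion would change, here is a small arithmetic sketch. The total of 3635 images is an assumption inferred from the question (2181 training and 727 validation images, plus a test set taken to be the same size as the validation set given the 0.2 and 0.25 split ratios). split_sizes is a hypothetical helper.

```python
def split_sizes(total, val_pct, test_pct):
    """Return (train, val, test) sizes for integer percentages, using floor division."""
    n_val = total * val_pct // 100
    n_test = total * test_pct // 100
    return total - n_val - n_test, n_val, n_test

total = 2181 + 727 + 727  # assumed total, see the note above

current = split_sizes(total, 20, 20)
suggested = split_sizes(total, 10, 10)
print("20%/20% split:", current)
print("10%/10% split:", suggested)
print("extra training images:", suggested[0] - current[0])
```

Under these assumptions the change frees roughly 700 more images for training, at the cost of noisier validation and test estimates.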
I am getting acquainted with TensorFlow Probability and I am running into a problem. During training, the model returns nan as the loss (possibly meaning a huge loss that causes an overflow). Since the functional form of the synthetic data is not overly complicated, and the ratio of data points to parameters is not frightening (at first glance, at least), I wonder what the problem is and how it could be corrected.
The code is the following, accompanied by some possibly helpful images:
# Create and plot 5000 data points
x_train = np.linspace(-1, 2, 5000)[:, np.newaxis]
y_train = np.power(x_train, 3) + 0.1*(2+x_train)*np.random.randn(5000)[:, np.newaxis]
plt.scatter(x_train, y_train, alpha=0.1)
plt.show()
# Define the prior weight distribution -- all N(0, 1) -- and not trainable
def prior(kernel_size, bias_size, dtype = None):
    n = kernel_size + bias_size
    prior_model = Sequential([
        tfpl.DistributionLambda(
            lambda t: tfd.MultivariateNormalDiag(loc = tf.zeros(n) , scale_diag = tf.ones(n)
        ))
    ])
    return(prior_model)
# Define variational posterior weight distribution -- multivariate Gaussian
def posterior(kernel_size, bias_size, dtype = None):
    n = kernel_size + bias_size
    posterior_model = Sequential([
        tfpl.VariableLayer(tfpl.MultivariateNormalTriL.params_size(n) , dtype = dtype), # The parameters of the model are declared Variables that are trainable
        tfpl.MultivariateNormalTriL(n) # The posterior function will return to the Variational layer that will call it a MultivariateNormalTriL object that will have as many dimensions
        # as the parameters of the Variational Dense Layer. That means that each parameter will be generated by a distinct Normal Gaussian shifted and scaled
        # by a mu and sigma learned from the data, independently of all the other weights. The output of this VariableLayer will become the input to the
        # MultivariateNormalTriL object.
        # The shape of the VariableLayer object will be defined by the number of parameters needed to create the MultivariateNormalTriL object given
        # that it will live in a Space of n dimensions (event_size = n). This number is returned by the tfpl.MultivariateNormalTriL.params_size(n)
    ])
    return(posterior_model)
x_in = Input(shape = (1,))
x = tfpl.DenseVariational(units= 2**4,
make_prior_fn=prior,
make_posterior_fn=posterior,
kl_weight=1/x_train.shape[0],
activation='relu')(x_in)
x = tfpl.DenseVariational(units= 2**4,
make_prior_fn=prior,
make_posterior_fn=posterior,
kl_weight=1/x_train.shape[0],
activation='relu')(x)
x = tfpl.DenseVariational(units=tfpl.IndependentNormal.params_size(1),
make_prior_fn=prior,
make_posterior_fn=posterior,
kl_weight=1/x_train.shape[0])(x)
y_out = tfpl.IndependentNormal(1)(x)
model = Model(inputs = x_in, outputs = y_out)
def nll(y_true, y_pred):
    return -y_pred.log_prob(y_true)
model.compile(loss=nll, optimizer= 'Adam')
model.summary()
# Train the model
history = model.fit(x_train1, y_train1, epochs=500)
The problem seems to be in the loss function: the negative log-likelihood of the independent normal distribution, without any constraint on its location and scale, leads to an untamed variance, which blows up the final loss value. Since you're experimenting with variational layers, you are presumably interested in estimating the epistemic uncertainty. To that end, I'd recommend using a constant variance.
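The mechanism can be illustrated with the closed-form Gaussian negative log-likelihood, 0.5*log(2*pi*sigma^2) + (y - mu)^2 / (2*sigma^2). The sketch below is plain Python, not TensorFlow Probability, and the residual and sigma values are made up. It shows how the loss grows when the scale is free to shrink while the prediction is still off.

```python
import math

def gaussian_nll(y, mu, sigma):
    """Negative log-likelihood of y under a normal distribution N(mu, sigma^2)."""
    return 0.5 * math.log(2 * math.pi * sigma ** 2) + (y - mu) ** 2 / (2 * sigma ** 2)

residual = 0.5  # assumed gap between the target and the predicted mean
for sigma in (1.0, 1e-1, 1e-3, 1e-6):
    print(f"sigma={sigma:g}  nll={gaussian_nll(residual, 0.0, sigma):.4g}")
```

With the scale fixed at 1.0 the quadratic term stays bounded for bounded residuals. With a learned scale, a few steps toward very small sigma can make the loss (and its gradients) large enough to overflow in lower precision, which is consistent with the nan reported in the question.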
I made a couple of slight changes to your code, as follows:
First of all, the final output y_out comes directly from the final variational layer, without any IndependentNormal distribution layer:
y_out = tfpl.DenseVariational(units=1,
make_prior_fn=prior,
make_posterior_fn=posterior,
kl_weight=1/x_train.shape[0])(x)
Second, the loss function now contains the necessary calculations with the normal distribution, but with a static variance in order to keep the loss from blowing up during training:
def nll(y_true, y_pred):
    dist = tfp.distributions.Normal(loc=y_pred, scale=1.0)
    return tf.reduce_sum(-dist.log_prob(y_true))
Then the model is compiled and trained in the same way as before:
model.compile(loss=nll, optimizer= 'Adam')
history = model.fit(x_train, y_train, epochs=3000)
Finally, let's sample 100 different predictions from the trained model and plot them to visualize the epistemic uncertainty of the model:
predicted = [model(x_train) for _ in range(100)]
for i, res in enumerate(predicted):
    plt.plot(x_train, res , alpha=0.1)
plt.scatter(x_train, y_train, alpha=0.1)
plt.show()
After 3000 epochs the result looks like this (with the number of training points reduced from 5000 to 3000 to speed up training):
The model has 38,589 trainable parameters but you have only 5,000 points as data; so, effective training is impossible with so many parameters.
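The figure of 38,589 can be reproduced by hand. The sketch below assumes that a MultivariateNormalTriL over n dimensions needs n location parameters plus n(n+1)/2 lower-triangular scale entries, that IndependentNormal.params_size(1) is 2, and that the fixed N(0, 1) prior adds no trainable variables. The helper names are hypothetical.

```python
def tril_params_size(n):
    """Parameters of a full-covariance Gaussian over n dims: n means + n(n+1)/2 scale entries."""
    return n + n * (n + 1) // 2

def dense_variational_params(in_dim, units):
    """Trainable parameters of one DenseVariational layer with a TriL posterior."""
    n = in_dim * units + units  # kernel weights + biases
    return tril_params_size(n)

def mean_field_params(in_dim, units):
    """Same layer with a diagonal (mean-field) posterior: one mean and one scale per weight."""
    n = in_dim * units + units
    return 2 * n

# (input size, units) for the three layers in the question: 1 -> 16 -> 16 -> 2
layers = [(1, 16), (16, 16), (16, 2)]

counts = [dense_variational_params(i, u) for i, u in layers]
print(counts, sum(counts))

diag_counts = [mean_field_params(i, u) for i, u in layers]
print(diag_counts, sum(diag_counts))
```

Almost all of the parameters come from the full covariance of the middle layer. A diagonal posterior would shrink the count to a few hundred, which may be a more reasonable match for 5,000 data points, although that is a modelling choice rather than a guaranteed fix.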
I'm currently trying to use sklearn to correlate population stagnation and happiness internationally. I've prepared and cleaned datasets with pandas, but for some reason, any model I try will not train. One of my columns for the data was countries, so I used pandas get_dummies function to account for feeding strings into the models. My shapes for the training and testing variables are as follows: (617, 67),(617,),(151, 67),(151,).
rf_class = RandomForestClassifier(n_estimators=5)
log_class = LogisticRegression()
svm_class = SVC(kernel='rbf', C=1E11, verbose=False)
def run(model, model_name='this model', trainX=trainX, trainY=trainY, testX=testX, testY=testY):
    # print(cross_val_score(model, trainX, trainY, scoring='accuracy', cv=10))
    accuracy = cross_val_score(model, trainX, trainY,
                               scoring='accuracy', cv=2).mean() * 100
    model.fit(trainX, trainY)
    testAccuracy = model.score(testX, testY)
    print("Training accuracy of "+model_name+" is: ", accuracy)
    print("Testing accuracy of "+model_name+" is: ", testAccuracy*100)
    print('\n')
# run(rf_class,'log')
model = log_class
model.fit(trainX,trainY)
perm = PermutationImportance(model, random_state=1).fit(testX, testY)
eli5.show_weights(perm, feature_names=feature_names)
Is my dataset simply too small to train on? Are the dummies too much for the models? Any help that can be offered is greatly appreciated.
I am trying to build a regressor with Keras, but it seems my model is not able to predict very high values, which gives me a high (mean absolute) loss at the output. Other than that, it can recognize the pattern, as shown below.
def build_model(features):
    # create model
    main_input = Input(shape=(len(features[0]),), dtype='float32', name='main_input')
    main_out = Dense(20, kernel_initializer="glorot_uniform", bias_initializer="glorot_uniform", activation='tanh')(main_input)
    x = Dropout(0.1)(main_out)
    output = Dense(1, name='main_output')(x)
    model = Model(inputs=[main_input], outputs=[output])
    return model
After I normalized both my input and output and edited the model, I still have the problem that my regressor cannot predict very high values.