I am trying to train a keras_retinanet model as shown in the code given below and the training is working fine. I created a CSVGenerator data-generator for the fit_generator function which inherits the "Generator" super class in which there's a parameter called "batch_size" defaulted to "1".
I would like to change the value of this "batch_size" variable, but I am not able to figure out how can I do that. Any help is much appreciated.
model = load_model('./snapshots/resnet50_csv_01.h5',
backbone_name='resnet50')
generator = CSVGenerator(
csv_data_file='./data_set_retina/train.csv',
csv_class_file='./data_set_retina/class_id_mapping'
)
generator_val = CSVGenerator(
csv_data_file='./data_set_retina/val.csv',
csv_class_file='./data_set_retina/class_id_mapping'
)
model.compile(
loss={
'regression' : keras_retinanet.losses.smooth_l1(),
'classification': keras_retinanet.losses.focal()
},
optimizer=keras.optimizers.adam(lr=1e-5, clipnorm=0.001)
)
csv_logger = keras.callbacks.CSVLogger('./logs/training_log.csv',
separator=',', append=False)
model.fit_generator(generator, steps_per_epoch=80000, epochs=50,
verbose=1, callbacks=[csv_logger],
validation_data=generator_val,validation_steps=20000,class_weight=None,
max_queue_size=10, workers=1, use_multiprocessing=False,
shuffle=True, initial_epoch=0)
I suppose that you're speaking about the keras-retinanet repository.
You can find the batch size here:
https://github.com/fizyr/keras-retinanet/blob/b28c358c71026d7a5bcb1f4d928241a693d95230/keras_retinanet/bin/train.py#L395
This variable is then passed to the generators in the common_args dictionary.
In fact, it is also possible to instantiate your CSVGenerator passing batch_size argument. Following your code snippet:
generator = CSVGenerator(
csv_data_file='./data_set_retina/train.csv',
csv_class_file='./data_set_retina/class_id_mapping',
batch_size=16
)
Related
I am creating my custom layers tf.keras model using mobile net pretrained layer. Model training is running fine but when saving the best picked model it is giving an error. Below is the snippet of the code that I used
pretrained_model = tf.keras.applications.MobileNetV2(
weights='imagenet',
include_top=False,
input_shape=[*IMAGE_SIZE, IMG_CHANNELS])
pretrained_model.trainable = True #fine tuning
model = tf.keras.Sequential([
tf.keras.layers.Lambda(# Convert image from int[0, 255] to the format expect by this model
lambda data:tf.keras.applications.mobilenet.preprocess_input(
tf.cast(data, tf.float32)), input_shape=[*IMAGE_SIZE, 3]),
pretrained_model,
tf.keras.layers.GlobalAveragePooling2D()])
model.add(tf.keras.layers.Dense(64, name='object_dense',kernel_regularizer=tf.keras.regularizers.l2(l2=0.001)))
model.add(tf.keras.layers.BatchNormalization(scale=False, center = False))
model.add(tf.keras.layers.Activation('relu', name='relu_dense_64'))
model.add(tf.keras.layers.Dropout(rate=0.2, name='dropout_dense_64'))
model.add(tf.keras.layers.Dense(32, name='object_dense_2',kernel_regularizer=tf.keras.regularizers.l2(l2=0.01)))
model.add(tf.keras.layers.BatchNormalization(scale=False, center = False))
model.add(tf.keras.layers.Activation('relu', name='relu_dense_32'))
model.add(tf.keras.layers.Dropout(rate=0.2, name='dropout_dense_32'))
model.add(tf.keras.layers.Dense(16, name='object_dense_16', kernel_regularizer=tf.keras.regularizers.l2(l2=0.01)))
model.add(tf.keras.layers.Dense(len(CLASS_NAMES), activation='softmax', name='object_prob'))
m1 = tf.keras.metrics.CategoricalAccuracy()
m2 = tf.keras.metrics.Recall()
m3 = tf.keras.metrics.Precision()
optimizers = [
tfa.optimizers.AdamW(learning_rate=lr * .001 , weight_decay=wd),
tfa.optimizers.AdamW(learning_rate=lr, weight_decay=wd)
]
optimizers_and_layers = [(optimizers[0], model.layers[0]), (optimizers[1], model.layers[1:])]
optimizer = tfa.optimizers.MultiOptimizer(optimizers_and_layers)
model.compile(
optimizer= optimizer,
loss = 'categorical_crossentropy',
metrics=[m1, m2, m3],
)
checkpoint_path = os.getcwd() + os.sep + 'keras_model'
checkpoint_cb = tf.keras.callbacks.ModelCheckpoint(filepath=os.path.join(checkpoint_path),
monitor = 'categorical_accuracy',
save_best_only=True,
save_weights_only=True)
history = model.fit(train_data, validation_data=test_data, epochs=N_EPOCHS, callbacks=[checkpoint_cb])
At tf.keras.callbacks.ModelCheckpoint is giving me an error
TypeError: Unable to serialize 1.0000000656873453e-05 to JSON. Unrecognized type <class 'tensorflow.python.framework.ops.EagerTensor'>.
Below is the link to the Google Colab notebook in case you want to replicate the issue
https://colab.research.google.com/drive/1wQbUFfhtDaB5Xta574UkAXJtthui7Bt9?usp=sharing
This seems to be a bug in Tensorflow or Keras. The tensor that's being serialized to JSON is from your optimizer definition.
model.optimizer.optimizer_specs[0]["optimizer"].get_config()["weight_decay"]
<tf.Tensor: shape=(), dtype=float32, numpy=1.0000001e-05>
From the implementation of tfa.optimizers.AdamW, the weight_decay is serialized using tf.keras.optimizers.Adam._serialize_hyperparameter. This function assumes that if you pass in a callable for the hyperparameter, it returns a non-tensor value when called, but in your notebook, it was implemented as
wd = lambda: 1e-02 * schedule(step)
where schedule() returns a Tensor. I tried some various ways to try to convert the tensor to a scalar value, but I couldn't get them to work. As a workaround, I implemented wd as a LearningRateSchedule so it'll serialize properly, though the code was clunkier. Replacing the definitions of wd and lr with this code allowed model training to complete for me without any issues.
class MyExponentialDecay(tf.keras.optimizers.schedules.ExponentialDecay):
def __call__(self, step):
return 1e-2 * super().__call__(step)
wd = MyExponentialDecay(
initial_learning_rate,
decay_steps=14,
decay_rate=0.8,
staircase=True)
lr = 1e2 * schedule(step)
After training completes, the model.save() call will fail. I believe this is the same issue which was reported here in the Tensorflow Addons Github. The summary of this issue is that the get_config() function for the optimizers will include a "gv" key in the config which stores Tensor objects, which aren't JSON serializable.
At the time of writing, this issue has not been resolved yet. If you don't need the optimizer state for the final saved model, you can pass in the include_optimizer=False argument to model.save() which worked for me. Otherwise, you may need to patch the library or the specific optimizer class implementation to get rid of the "gw" key in the config like the OP did in that thread.
While implementing K-fold using Scikit Learn in DecisionTreeClassifier model, I'm having hard time understanding why this baseline code doesn't contain any model initialization part. From my perspective, while fitting take place with iterations, the model which has already learned by first iteration stays the same(with identical parameter) during the second loop fitting and so on.
You can see my code below.
What I'm really curious about is, "Unlike other deep learning libraries like Pytorch etc, isn't there any need for model initialization for scikit-learn? or does this code below automatically do the initialization?(if so plz let me know where the parameter initialization take place)
model = DecisionTreeClassifier()
cv_accuracy = []
n_iter = 0
kfold = KFold(n_splits = 5, random_state = None, shuffle = False)
for train_index, validation_index in kfold.split(train_data, train_label):
x_train, x_val = train_data[train_index], train_data[validation_index]
y_train, y_val = train_label[train_index], train_label[validation_index]
train_size = x_train.shape[0]
val_size = x_val.shape[0]
model.fit(x_train, y_train)
pred = model.predict(x_val)
n_iter += 1
accuracy = np.round(accuracy_score(y_val, pred), 4)
cv_accuracy.append(accuracy)
# Thought I should initialize model somehow... in this part
model = DecisionTreeClassifier()
print('\n## Accuracy : ', np.mean(cv_accuracy))
fit() constructs a brand new tree behind the scenes (DecisionTreeClassifierObj.tree_), so it does that for you. The class init just provides the parameters it will use.
Here's the source code for that btw so you can see.
Simplified version of what fit() does:
#Process data
self.tree_ = Tree(self.n_features_, self.n_classes_, self.n_outputs_)
builder = BestFirstTreeBuilder(splitter, min_samples_split, min_samples_leaf,
min_weight_leaf, max_depth, max_leaf_nodes,
self.min_impurity_decrease, min_impurity_split)
builder.build(self.tree_, X, y, sample_weight)
self._prune_tree()
return self
I am following along the following guide to tensorflow regression models: https://www.tensorflow.org/tutorials/keras/basic_regression
Using basketball data. I am wanting to predict NBA career length based on college stats. I currently have normalized data in the format:
I then build the following model based on the code in the above link:
def build_model():
model = keras.Sequential([
keras.layers.Dense(64, activation=tf.nn.relu,
input_shape=(train.shape[1],)),
keras.layers.Dense(64, activation=tf.nn.relu),
keras.layers.Dense(1)
])
optimizer = tf.train.RMSPropOptimizer(0.001)
model.compile(loss='mse',
optimizer=optimizer,
metrics=['mae'])
return model
model = build_model()
model.summary()
Which appears to work fine. However when I then try to run the model and record the history using the following code:
EPOCHS = 200
labels = ['Age','G','FG','FGA','X3P','X3PA','FTA','TRB','AST','STL','BLK','Wt','final_ht','colyears','nbayears']
# Store training stats
history = model.fit(train, labels, epochs=EPOCHS, validation_split=0.2, verbose=0)
This gives me an error that: 'str' object has no attribute 'ndim', which I am having trouble understanding what it means. Am I doing something wrong?
When you call the .fit function of the model the second parameter should represent your target variable (NBA career length). This will be a one-dimensional array instead of the list you tried to pass to the function.
This should solve the problem.
Based on this post. I need some basic implementation help. Below you see my model using a Dropout layer. When using the noise_shape parameter, it happens that the last batch does not fit into the batch size creating an error (see other post).
Original model:
def LSTM_model(X_train,Y_train,dropout,hidden_units,MaskWert,batchsize):
model = Sequential()
model.add(Masking(mask_value=MaskWert, input_shape=(X_train.shape[1],X_train.shape[2]) ))
model.add(Dropout(dropout, noise_shape=(batchsize, 1, X_train.shape[2]) ))
model.add(Dense(hidden_units, activation='sigmoid', kernel_constraint=max_norm(max_value=4.) ))
model.add(LSTM(hidden_units, return_sequences=True, dropout=dropout, recurrent_dropout=dropout))
Now Alexandre Passos suggested to get the runtime batchsize with tf.shape. I tried to implement the runtime batchsize idea it into Keras in different ways but never working.
import Keras.backend as K
def backend_shape(x):
return K.shape(x)
def LSTM_model(X_train,Y_train,dropout,hidden_units,MaskWert,batchsize):
batchsize=backend_shape(X_train)
model = Sequential()
...
model.add(Dropout(dropout, noise_shape=(batchsize[0], 1, X_train.shape[2]) ))
...
But that did just give me the input tensor shape but not the runtime input tensor shape.
I also tried to use a Lambda Layer
def output_of_lambda(input_shape):
return (input_shape)
def LSTM_model_2(X_train,Y_train,dropout,hidden_units,MaskWert,batchsize):
model = Sequential()
model.add(Lambda(output_of_lambda, outputshape=output_of_lambda))
...
model.add(Dropout(dropout, noise_shape=(outputshape[0], 1, X_train.shape[2]) ))
And different variants. But as you already guessed, that did not work at all.
Is the model definition actually the correct place?
Could you give me a tip or better just tell me how to obtain the running batch size of a Keras model? Thanks so much.
The current implementation does adjust the according to the runtime batch size. From the Dropout layer implementation code:
symbolic_shape = K.shape(inputs)
noise_shape = [symbolic_shape[axis] if shape is None else shape
for axis, shape in enumerate(self.noise_shape)]
So if you give noise_shape=(None, 1, features) the shape will be (runtime_batchsize, 1, features) following the code above.
I'm trying to create a pretty simple GANs model, and not sure how to combine the generator and the discriminator for training the generator
from keras import optimizers
from keras.layers import Input, Dense
from keras.models import Sequential, Model
import numpy as np
def build_generator(input_dim=10, output_dim=40, hidden_dim=28):
model = Sequential()
model.add(Dense(hidden_dim, input_dim=input_dim, activation='sigmoid', kernel_initializer="random_uniform"))
model.add(Dense(output_dim, activation='sigmoid', kernel_initializer="random_uniform"))
return model
def build_discriminator(input_dim=40, hidden_dim=28, output_dim=50):
input_d = Input(shape=(input_dim,))
encoded = Dense(hidden_dim, activation='sigmoid', kernel_initializer="random_uniform")(input_d)
decoded = Dense(output_dim, activation='sigmoid', kernel_initializer="random_uniform")(encoded)
x = Dense(1, activation='relu')(encoded)
y = Dense(1, activation='sigmoid')(encoded)
model = Model(inputs=input_d, outputs=[decoded, x, y])
return model
sgd = optimizers.SGD(lr=0.1)
generator = build_generator(10, 100, 70)
discriminator = build_discriminator(100, 60, 80)
generator.compile(loss='mean_squared_error', optimizer=sgd)
discriminator.trainable = True
discriminator.compile(loss='mean_squared_error', optimizer=sgd)
discriminator.trainable = False
Now I'm not sure how to combine them both, so the discriminator will receive the generator output and than will pass the generator back propagation data
For this, the best to do is to use the functional Model API. This is suited for more complex models, accepting branches, concatenations, etc.
(It's still possible, in this specific case to use the sequential models, but using the functional API always sounded better to me, for freedom and further experiments on the models)
So, you may preserve your two sequential models. All you have to do is to build a third model that contains these two.
generator = build_generator(....) #don't create a new generator, use the one you have.
discriminator = build_discriminator(....)
Now, a functional API model has its input shape defined like this:
inputTensor = Input(inputShape) #inputShape must be the same as in generator
And we work by passing inputs to layers and getting outputs:
#Getting the output of the generator given our input tensor:
genOut = generator(inputTensor) #you call a model just like you call a layer
#and we pass the generator's output to the discriminator, getting its output:
discOut = discriminator(genOut)
Finally, we create the actual model by defining its start and end points:
GAN = Model(inputTensor, discOut)
Use the model.layers[i].trainable parameter before compile to define which layers will be trainable or not in each of the models.
Combining the Generator & Discriminator models can, indeed, sometimes be quite confusing. I found this repository in the link below, which demonstrates quite well with a detailed code of how to construct multiple architectures of GANs in keras:
https://github.com/kochlisGit/Keras-GAN