Continue training a deeplearning4j model after it has been saved and loaded

I am using a Convolutional Neural Network and I am saving it and loading it via the model serializer class.
What I want to do is to be able to come back at a later time and continue training the model on new data provided to it.
What I am doing is I load it using
ComputationGraph net = ModelSerializer.restoreComputationGraph(modelFileName);
and then I give it the data like before with
net.fit(dataSetIterator);
This seems to work, but it makes my accuracy really bad. It was about 89% before I did this, and, using the same data, it drops to around 50% after a few iterations (on the very data it was just trained on, so if anything it should be getting more accurate, right?).
Am I missing a step?

I think it'll be difficult to answer based on the information given, but I'll give you an example.
I had this exact problem. I had based my app on the GravesLSTMCharModellingExample (which is LSTM). I had saved my model after running for a couple of epochs (at which point it generated legible sentences), but when loading it, it produced garbage.
I thought everything was the same, but in the end it turned out I didn't initialize the CharacterIterator the same. When I fixed it, it worked as expected.
So, to cut a long story short:
check the values you use when initializing the auxiliary classes (in my case, the CharacterIterator).


Loading a model checkpoint in a smaller amount of memory

I have a question that I couldn't find any answer to online. I have trained a model whose checkpoint file is about 20 GB. Since I do not have enough RAM on my system (nor on Colaboratory/Kaggle, where the limit is 16 GB), I can't use my model for predictions.
I know that the model has to be loaded into memory for inference to work. However, is there a workaround or a method that can:
save some memory, so that the model can be loaded within 16 GB of RAM (for CPU) or within the memory of a TPU/GPU
work with any framework (since I will be working with both): TensorFlow + Keras, or PyTorch (which I am using right now)
Is such a method even possible in either of these libraries? One of my tentative solutions was perhaps to load it in chunks, essentially maintaining a buffer for the model weights and biases and performing the calculations accordingly, though I haven't found any implementations of that.
I would also like to add that I wouldn't mind the performance slowdown since it is to be expected with low-specification hardware. As long as it doesn't take more than two weeks :) I can definitely wait that long...
You can try the following:
split the model into two parts
load the weights into both parts separately, calling model.load_weights(filename, by_name=True) on each
call the first part with your input
call the second part with the output of the first part
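A rough sketch of that idea in Keras (the architecture, layer names and file name are made up for illustration; the point is that by_name=True lets each half pick up its own weights from a single checkpoint):
import numpy as np
from tensorflow.keras import layers, models

# --- the original (full) model, saved once after training ---
inp = layers.Input(shape=(32,), name="in")
h = layers.Dense(64, activation="relu", name="dense_a")(inp)
h = layers.Dense(64, activation="relu", name="dense_b")(h)
out = layers.Dense(10, activation="softmax", name="head")(h)
full = models.Model(inp, out)
full.save_weights("full_model.h5")

# --- first half: input up to an intermediate representation ---
inp1 = layers.Input(shape=(32,), name="in")
x = layers.Dense(64, activation="relu", name="dense_a")(inp1)
part1 = models.Model(inp1, x)
part1.load_weights("full_model.h5", by_name=True)  # only dense_a gets loaded

# --- second half: takes the intermediate representation as its input ---
inp2 = layers.Input(shape=(64,), name="mid")
y = layers.Dense(64, activation="relu", name="dense_b")(inp2)
y = layers.Dense(10, activation="softmax", name="head")(y)
part2 = models.Model(inp2, y)
part2.load_weights("full_model.h5", by_name=True)  # dense_b and head get loaded

# --- two-stage inference; in a tight-memory setting the first half could be
#     deleted (and its memory freed) before the second half is ever built ---
batch = np.random.rand(4, 32).astype("float32")
features = part1.predict(batch)
preds = part2.predict(features)
print(preds.shape)  # (4, 10)
Whether this actually fits a 20 GB checkpoint into 16 GB of RAM depends on where you split and on each half being built only when it is needed, so treat it as a starting point rather than a guaranteed fix.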

Keras Batch Normalization "is broken": model fails to predict. Is it _really_ broken? Is there a fix? Or specific documentation about it?

Intro
I am making a classifier to recognize the presence of defects in pictures, and in the course of improving my models I tried Batch Normalization, mainly to exploit its ability to speed up convergence.
While it gives the expected speed benefits, I also observed some strange symptoms:
validation metrics are far from good, which of course smells of overfitting
predictions calculated at any point during training are completely wrong, particularly when the images are picked from the training dataset; the corresponding metrics match (val_loss, val_acc) rather than the (loss, acc) printed during training
This failure to predict is the evidence that worries me the most: a model which does not predict the same way as it did during training is useless!
Searches
Googling around I found some posts that seem to be related, particularly this one (Keras BN layer is broken) which also claims the existence of a patch and of a pull request, that sadly "was rejected".
This is quite convincing, in that it explains a failure mechanism that matches my observations. As far as I understand, since BN computes and keeps moving statistics (exponential averages and standard deviations) to do its job, and these require many iterations to stabilize and become significant, it will of course behave badly when it has to make a prediction from scratch, while those statistics are not yet mature (if I have misunderstood this concept, please tell me).
Actual Questions
But thinking more thoroughly, this doesn't really close the issue, and actually raises further doubts. I am still perplexed that:
This Keras BN bug is said to affect the transfer-learning use case, while mine is a classical convolutional classifier trained starting from standard Glorot initialization. If ordinary training were affected too, thousands of users should have complained about it, while instead there isn't much discussion about it
technically: if my understanding is correct, why aren't these statistics (since they are so fundamental for prediction) saved in the model, so that their latest update is available to make a prediction? It seems perfectly feasible to keep and use them at prediction time, as for any trainable parameter (a way to check this empirically is sketched below)
management-wise: if Keras' BN were really broken, how could such a dreadful bug remain unaddressed for more than one year? Is there really nobody out there using BN and needing predictions out of their models? And nobody able to fix it?
more practically: on the contrary, if it is not a bug but just a misunderstanding of how to use it, where do I get a clear illustration of "how to correctly get a prediction in Keras from a model which uses BN"? (demo code would be appreciated)
Obviously I would really love the right question to be the last one, but I had to include the previous ones, given the evidence of someone claiming that Keras BN is broken.
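As a toy check of what a BatchNormalization layer actually stores and whether it survives a save/load round trip, something like the following can be used (hypothetical minimal model, standalone keras 2.x API, random data just to make it run):
import numpy as np
from keras.models import Sequential, load_model
from keras.layers import Dense, BatchNormalization

# toy model: one BN layer between two dense layers, trained on random data
model = Sequential([Dense(8, input_shape=(4,)), BatchNormalization(), Dense(1)])
model.compile(optimizer="adam", loss="mse")
model.fit(np.random.rand(64, 4), np.random.rand(64, 1), epochs=2, verbose=0)

bn = model.layers[1]
# with the defaults (center=True, scale=True), get_weights() returns
# [gamma, beta, moving_mean, moving_variance]
print([w.shape for w in bn.get_weights()])

model.save("bn_check.h5")
reloaded = load_model("bn_check.h5")
# compare BN parameters and moving statistics before and after the round trip
for w_old, w_new in zip(bn.get_weights(), reloaded.layers[1].get_weights()):
    print(np.allclose(w_old, w_new))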
Note to SE OP: before *closing the question as too broad*, please consider that, since it is not really clear what the issue is (Keras BN being broken, or users being unable to use it properly), I had to offer several directions, among which whoever wishes to answer can choose.
Details
I am using keras 2.2.4 from a python 3.6 virtual environment (under pyenv/virtualenv).
data are fed through a classic ImageDataGenerator() + flow_from_directory() / flow_from_dataframe() scheme (augmentation is turned off though: only rescale=1./255 is applied), but I also tried to make them static
actually, in the end, to verify the above behaviour, I generated a single dataset x,y=next(valid_generator) and used that unique batch for both training and validation. While on the training side it converges (yes, the aim was exactly to let it overfit!), on the validation side both metrics are poor and predictions are completely wrong and erratic (almost random)
in this setup, if BN is turned off, val_loss and val_acc match exactly with loss and acc, and with those that I can obtain from predictions calculated after training has finished.
Update
In the process of writing a minimal example of the issue, after struggling to expose the problem clearly, I realized that it shows up on some machines and not on others. In particular, the problem is evident on a host running Keras 2.3.1, while another host with Keras 2.2.4 doesn't show it.
I'll post a minimal example here along with specific module versions asap.

Tensorflow object detection API - validation loss behaviour

I am trying to use the TensorFlow object detection API to recognize a specific object (guitars) in pictures and videos.
As for the data, I downloaded the images from the OpenImage dataset, and derived the .tfrecord files. I am testing with different numbers, but for now let's say I have 200 images in the training set and 100 in the evaluation one.
I'm training the model using "ssd_mobilenet_v1_coco" as a starting point and the "model_main.py" script, so that I can have both training and validation results.
When I visualize the training progress in TensorBoard, I get the following plots for the training loss and the validation loss, respectively:
[TensorBoard plots: training loss and validation loss]
I am fairly new to computer vision and still learning, so I was trying to figure out the meaning of these plots.
The training loss goes as expected, decreasing over time.
In my (probably simplistic) view, I was expecting the validation loss to start at high values, decrease as training goes on, and then start increasing again if the training goes on for too long and the model starts overfitting.
But in my case, I don't see this behavior for the validation curve, which seems to be trending upwards basically all the time (excluding fluctuations).
Have I been training the model for too little time to see the behavior I'm expecting? Are my expectations wrong in the first place? Am I misinterpreting the curves?
OK, I fixed it by decreasing the initial_learning_rate from 0.004 to 0.0001.
It was the obvious solution, considering the wild oscillations of the validation loss, but at first I thought it wouldn't work, since there already seems to be a learning rate scheduler in the config file.
However, immediately below (in the config file) there's a num_steps option, and it's stated that
# Note: The below line limits the training process to 200K steps, which we
# empirically found to be sufficient enough to train the pets dataset. This
# effectively bypasses the learning rate schedule (the learning rate will
# never decay). Remove the below line to train indefinitely.
Honestly, I don't remember whether I commented out the num_steps option... If I didn't, it seems my learning rate was kept at the initial value of 0.004, which turned out to be too high.
If I did comment it out (so that the learning rate scheduler was active), I guess that, even with the decay, it still started from too high a value.
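For reference, the block I changed in the pipeline config looks roughly like this (reconstructed from memory of the ssd_mobilenet_v1_coco sample config, so values other than the learning rate may differ):
train_config: {
  batch_size: 24
  optimizer {
    rms_prop_optimizer: {
      learning_rate: {
        exponential_decay_learning_rate {
          initial_learning_rate: 0.0001   # was 0.004
          decay_steps: 800720
          decay_factor: 0.95
        }
      }
    }
  }
  # ... the num_steps note quoted above appears here ...
  num_steps: 200000
}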
Anyway, it's working much better now, I hope this can be useful if anyone is experiencing the same problem.

Why isn't my RL model behaving the same after being loaded in pytorch?

I'm training some simple neural networks for Reinforcement Learning in Pytorch. At the end of the training, I save the model like so:
torch.save(self.policy_NN.state_dict(), self.model_fname)
It's doing pretty good at this point. Then later, in another script, I load it again, like so:
self.policy_NN.load_state_dict(torch.load(model_fname))
And then just play out the episode, as if the training never stopped (except I'm not doing DQN learning anymore, it's just taking the greedy action at each point). So I'd expect it to behave basically as it did when I saved it.
However, whenever I load it, it behaves completely differently, to the point that it seems like it didn't learn at all before I saved it. For example, if I look at the last 1000 time steps of the training session, it will get many rewards, but after loading it, it gets basically none.
I've verified (by doing print(self.policy_NN.state_dict())) that the weights and biases are in fact the same when I save the model and when I load it again.
What could be going on? Is there something else to the network that might not be getting saved somehow?
Dropout and some other layers behave differently in eval and train mode. You can switch between these modes with model.train() and model.eval(); call model.eval() after loading the state dict and before playing out your episodes.
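A minimal sketch of that save / load / eval workflow (the network, file name and observation size are placeholders for the real policy network):
import torch
import torch.nn as nn

policy_NN = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Dropout(0.2), nn.Linear(64, 2))

# end of the training script
torch.save(policy_NN.state_dict(), "policy.pt")

# later, in the evaluation script
policy_NN.load_state_dict(torch.load("policy.pt"))
policy_NN.eval()  # switches Dropout/BatchNorm layers to inference behaviour

with torch.no_grad():
    state = torch.randn(1, 4)                # placeholder observation
    action = policy_NN(state).argmax(dim=1)  # greedy action
    print(action.item())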
I remember reading that RL typically suffers from brittle learning where changing the input even slightly causes wildly different performance. The example was when training an algorithm on an Atari game and simply by shifting the screen right by one pixel the network lost all performance gains.
You might want to check that in both modes your environment behaves similarly.

Aggregate training results to predict

When training a model, the results depend on the sampling. In order to obtain something better, you could repeat the training (on other randomly created training samples, using KFold, StratifiedKFold, ...), somehow aggregate the results, and in this way get a result that is more robust than one created from a particular sample alone. Question: is this already implemented in sklearn or something similar? Apologies if this is a straightforward question; I haven't seen a simple solution.
I see that there is a function called cross_val_predict; however, my first impression from a quick look at the source code is that it predicts as many times as it trains, and I would like to predict only once, so that I can pickle the trained models, somehow aggregate the results, and predict later, instead of repeating the whole training again.
So far I think the best option is the ensemble estimators in sklearn.
I'll leave here the solution I was using before. I am pretty sure it could be improved (as mentioned above, the ensemble estimators in sklearn are better). I have put it at https://github.com/rafaelvalero/aggreating_predictions_sklearn, where there is a notebook with an example (using the iris dataset), in case anyone wants to play around and see in detail how it could be done.
That solution trains models (in parallel, using joblib), pickles each trained model (a scikit-learn model), stores them (using joblib dump), and later recovers them to create predictions (in parallel, using joblib) that are then aggregated.
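A rough sketch of that train-once / aggregate-later idea, without the parallelism (iris as in the notebook; RandomForestClassifier is just a stand-in estimator):
import numpy as np
from joblib import dump, load
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold

X, y = load_iris(return_X_y=True)

# train one model per fold and persist each fitted model to disk
paths = []
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for i, (train_idx, _) in enumerate(skf.split(X, y)):
    clf = RandomForestClassifier(n_estimators=100, random_state=i)
    clf.fit(X[train_idx], y[train_idx])
    path = f"model_fold_{i}.joblib"
    dump(clf, path)
    paths.append(path)

# later (possibly in another process): reload the fitted models and aggregate
models = [load(p) for p in paths]
avg_proba = np.mean([m.predict_proba(X) for m in models], axis=0)
y_pred = avg_proba.argmax(axis=1)
print((y_pred == y).mean())
Note that sklearn's built-in VotingClassifier(voting="soft") does essentially the same soft averaging, but it fits all its estimators inside a single object rather than reusing models that were pickled earlier.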
