How to get rid of "Keras weights file saving" message - keras

I'm currently working with Keras, creating a convolutional NN. I have always built my architectures with Sequential(), and now I tried another way of defining the model. Ever since, whenever I execute my code it prints an annoying message, and I don't know how to get rid of it.
I couldn't find any information about this problem online, but I have tried this:
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
but it doesn't work.
UPDATE:
I have figured out that this message appears after I do this:
model = Model(input_, output)
I still don't know why this message appears.
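For what it's worth, TF_CPP_MIN_LOG_LEVEL only silences TensorFlow's C++ backend, so if the message is emitted through Python's warnings or logging machinery it will get through regardless. A minimal sketch of silencing things at the Python level (assuming that is where the message comes from):
import logging
import warnings

warnings.filterwarnings("ignore")                        # mute Python warnings
logging.getLogger("tensorflow").setLevel(logging.ERROR)  # mute TF's Python-side logger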

Related

Importing Tensorboard for maskRcnn(Matterport - Mask RCNN)

I am currently trying to implement Mask RCNN by following the Matterport repo, and I have a question about the TensorBoard integration.
The dataset is similar to the COCO dataset. Inside model.py, under def train, TensorBoard is set up as:
callbacks = [keras.callbacks.TensorBoard(log_dir=self.log_dir, histogram_freq=0, write_graph=True, write_images=False)]
What else do I need to specify to use TensorBoard? When I try to run TensorBoard, it says the log file was not found. I know I am missing something somewhere; please help me out!
In your model.train() call, make sure you pass the custom_callbacks=callbacks parameter.
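For example, a sketch of wiring the callback through (dataset_train, dataset_val, and config are placeholders for your own objects; the train() signature follows the Matterport repo):
import keras

callbacks = [keras.callbacks.TensorBoard(log_dir=model.log_dir, histogram_freq=0,
                                         write_graph=True, write_images=False)]

model.train(dataset_train, dataset_val,
            learning_rate=config.LEARNING_RATE,
            epochs=30,
            layers="heads",
            custom_callbacks=callbacks)  # without this, the callback is never invoked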
If your callback is configured exactly as above, then the issue is that you are not pointing TensorBoard at the logs directory properly.
Open a terminal (inside Anaconda/PyCharm, or a separate shell) and pass the absolute path, to make sure it works:
tensorboard --logdir=my_absolute_path/logs/

How to fix '_pickle.UnpicklingError: invalid load key, '<' ' error in Pytorch

I encountered this problem when running the official maskrcnn-benchmark code from facebookresearch; it failed while loading the pre-trained model.
The code runs on a remote server at my school, and the graphics card is an NVIDIA P100.
checkpointer = DetectronCheckpointer(
    cfg, model, optimizer, scheduler, output_dir, save_to_disk
)
extra_checkpoint_data = checkpointer.load(cfg.MODEL.WEIGHT)
arguments.update(extra_checkpoint_data)
I expected the code to run correctly, and I want to understand why this problem occurs.
The cause was that the previous download had not finished, so the file was incomplete. Once I deleted the original file and re-downloaded it, the problem was solved.
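As a quick sanity check before loading (a sketch; the filename is illustrative), you can inspect the first bytes of the file. pickle fails with invalid load key, '<' on the very first byte it reads, so a leading < almost always means an HTML error page or a truncated download was saved in place of the binary checkpoint:
# Hypothetical filename; substitute the checkpoint you actually downloaded.
with open("e2e_mask_rcnn_R_50_FPN_1x.pth", "rb") as f:
    head = f.read(64)

if head.lstrip().startswith(b"<"):
    print("Corrupt download: file looks like HTML. Delete it and re-download.")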

Why would Tensorflow not save run_metadata?

I was simply trying to generate a summary that would show the run_metadata as follows:
run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
run_metadata = tf.RunMetadata()
summary = sess.run([x, y], options=run_options, run_metadata=run_metadata)
train_writer.add_run_metadata(run_metadata, 'step%d' % step)  # takes the RunMetadata object and a tag
train_writer.add_summary(summary, step)                       # takes the summary, not a log path
I made sure the path to the logs folder exists; this is confirmed by the fact that the summary file is generated, but no metadata is present. To be honest, I am not sure a metadata file is actually generated, but when I open TensorBoard the graph looks fine and the session runs dropdown menu is populated. When I select any of the runs, it shows a "Parsing metadata.pbtxt" progress bar that stops and hangs about halfway through.
This prevents me from gathering any additional info about my graph. Am I missing something? A similar issue happened when trying to run the MNIST summary tutorial locally. I feel like I am missing something simple. Does anyone have an idea what could cause this? Why would my TensorBoard hang when trying to load session run data?
I can't believe I made it work right after posting the question, but here it goes. I noticed that this line:
run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
was giving me an error, so I removed the params and turned it into
run_options = tf.RunOptions()
without realizing that this is what caused the metadata not to be parsed. Once I researched the error message:
Couldn't open CUDA library cupti64_90.dll
I looked into this GitHub thread and moved the file into the bin folder. After that, I ran my code again with the trace_level param, got no errors, and the metadata was successfully parsed.
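For reference, the fix amounts to copying CUPTI next to the other CUDA DLLs. A sketch with assumed default Windows CUDA 9.0 paths (adjust for your install):
import os
import shutil

# Assumed default CUDA 9.0 install location on Windows.
cuda = r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0"
src = os.path.join(cuda, "extras", "CUPTI", "libx64", "cupti64_90.dll")
dst = os.path.join(cuda, "bin", "cupti64_90.dll")

# CUPTI ships under extras\CUPTI\libx64, but TensorFlow looks for the DLL
# on its search path (which includes CUDA's bin directory), hence the copy.
if not os.path.exists(dst):
    shutil.copy(src, dst)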

Google Colab: "Unable to connect to the runtime" after uploading Pytorch model from local

I am using a simple (not necessarily efficient) method for Pytorch model saving.
import torch
from google.colab import files
torch.save(model, filename) # save a trained model on the VM
files.download(filename) # download the model to local
best_model = files.upload() # select the model just downloaded
best_model[filename] # access the model
Colab disconnects during execution of the last line, and hitting the RECONNECT tab always shows ALLOCATING -> CONNECTING (which fails, with an "Unable to connect to the runtime" message in the bottom-left corner) -> RECONNECT. At the same time, executing any cell gives the error message "Failed to execute cell: Could not send execute message to runtime: [object CloseEvent]".
I know it is related to the last line, because I can successfully connect with my other Google accounts, which haven't executed it.
Why does this happen? It seems the Google accounts that have executed the last line can no longer connect to the runtime.
Edit:
One night later, after the session expired, I could reconnect with that Google account. I attempted the approach suggested in the comments and found that simply calling files.upload() on the Pytorch model is enough to trigger the problem: once the upload completes, Colab disconnects.
Try disabling your ad-blocker. It worked for me.
(I wrote this answer before reading your update. Think it may help.)
files.upload() is just for uploading files. We have no reason to expect it to return some pytorch type/model.
When you call a = files.upload(), a is a dictionary mapping filename to a big bytes array:
{'my_image.png': b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR....' }
type(a['my_image.png'])
Just like when you do open('my_image.png', 'rb').read()
So, I think the last line, best_model[filename], tries to print the whole huge bytes array, which hangs Colab.
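If the goal is to actually load the model back, here is a sketch that deserializes the uploaded bytes instead of echoing them (assuming the file was saved with torch.save(model, filename) as above):
import io
import torch
from google.colab import files

uploaded = files.upload()                # {filename: bytes}
buffer = io.BytesIO(uploaded[filename])  # wrap the raw bytes in a file-like object
best_model = torch.load(buffer)          # torch.load accepts file-like objects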

Tflearn's .fit() method with numpy.ndarrays causing TypeError

I get the error TypeError: unhashable type: 'numpy.ndarray' when executing the code below. I searched through Stack Overflow but haven't found a way to fix the problem. The goal is to classify digits from the MNIST dataset. The error occurs in the modell.fit() method (from tflearn); I can attach the full error message if needed. I also tried the approach where you put the x and y labels in a dictionary and train with that, but it raised another error. (Note: I excluded my predict function from this code.)
Code:
import tflearn
import tflearn.datasets.mnist as mnist

x, y, X, Y = mnist.load_data(one_hot=True)
x = x.reshape([-1, 28, 28, 1])
X = X.reshape([-1, 28, 28, 1])

class Neural_Network():
    def __init__(self, x, y):
        self.x = x
        self.y = y
        self.epochs = 60000

    def main(self):
        cnn = tflearn.layers.core.input_data(shape=[None, 28, 28, 1], name="input_layer")
        cnn = tflearn.layers.conv.conv_2d(cnn, 32, 2, activation="relu")
        cnn = tflearn.layers.conv.max_pool_2d(cnn, 2)
        cnn = tflearn.layers.conv.conv_2d(cnn, 32, 2, activation="relu")
        cnn = tflearn.layers.conv.max_pool_2d(cnn, 2)
        cnn = tflearn.layers.core.flatten(cnn)
        cnn = tflearn.layers.core.fully_connected(cnn, 1000, activation="relu")
        cnn = tflearn.layers.core.dropout(cnn, 0.85)
        cnn = tflearn.layers.core.fully_connected(cnn, 10, activation="softmax")
        cnn = tflearn.layers.estimator.regression(cnn, learning_rate=0.001)
        modell = tflearn.DNN(cnn)
        modell.fit(self.x, self.y)
        modell.save("mnist.modell")

nn = Neural_Network(x, y)
nn.main()
nn.predict(X[1])
print("Label for prediction:", Y[1])
So the problem fixed itself: I just restarted my Jupyter Notebook and everything worked fine, with a few exceptions. 1. I have to restart the kernel every time I want to retrain the net. 2. I get another error when I try to load the saved modell, so I can't continue (the error is NotFoundError: Key Conv2D_2/W not found in checkpoint); I will ask a separate question about that. Conclusion: try restarting your Jupyter Notebook if something isn't working, and if you want to retrain an ANN, restart your kernel.
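On the retraining point, a possible explanation and workaround (a sketch; this assumes TFLearn's TF 1.x backend): re-running the cell adds a second copy of every layer to the same default graph, so variables get suffixed names like Conv2D_2/W that no longer match the checkpoint. Resetting the graph before rebuilding avoids the kernel restart:
import tensorflow as tf

tf.reset_default_graph()   # clear ops left over from previous runs (TF 1.x API)
nn = Neural_Network(x, y)  # rebuild the model on the fresh graph
nn.main()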
