I have built a Sequential Keras model with three layers: a GaussianNoise layer, a hidden layer, and an output layer with the same dimension as the input layer. For this, I'm using the Keras package that ships with TensorFlow 2.0.0-beta1. Now I'd like to get the output of the hidden layer on its own, bypassing the GaussianNoise layer, since the noise is only needed during training.
To achieve this, I followed the instructions in https://keras.io/getting-started/faq/#how-can-i-obtain-the-output-of-an-intermediate-layer, which are pretty much the same as those described in "Keras, How to get the output of each layer?".
I have tried the following example from the official Keras documentation:
from tensorflow import keras
from tensorflow.keras import backend as K

dae = keras.Sequential([
    keras.layers.GaussianNoise(0.001, input_shape=(10,)),
    keras.layers.Dense(80, name="hidden", activation="relu"),
    keras.layers.Dense(10)
])

optimizer = keras.optimizers.Adam()
dae.compile(loss="mse", optimizer=optimizer, metrics=["mae"])

# Here the fitting process...
# dae.fit( · )

# Attempting to retrieve a decoder functor.
encoder = K.function([dae.input, K.learning_phase()],
                     [dae.get_layer("hidden").output])
However, when K.learning_phase() is used to create the Keras backend functor, I get the error:
Traceback (most recent call last):
File "/anaconda3/lib/python3.6/contextlib.py", line 99, in __exit__
self.gen.throw(type, value, traceback)
File "/anaconda3/lib/python3.6/site-packages/tensorflow_core/python/keras/backend.py", line 534, in _scratch_graph
yield graph
File "/anaconda3/lib/python3.6/site-packages/tensorflow_core/python/keras/backend.py", line 3670, in __init__
base_graph=source_graph)
File "/anaconda3/lib/python3.6/site-packages/tensorflow_core/python/eager/lift_to_graph.py", line 249, in lift_to_graph
visited_ops = set([x.op for x in sources])
File "/anaconda3/lib/python3.6/site-packages/tensorflow_core/python/eager/lift_to_graph.py", line 249, in <listcomp>
visited_ops = set([x.op for x in sources])
AttributeError: 'int' object has no attribute 'op'
The code works great if I don't include K.learning_phase(), but I need to make sure that the output from my hidden layer is evaluated over an input that is not polluted with noise (i.e. in "test" mode -- not "training" mode).
I know my other option is to create a model from the original denoising autoencoder, but can anyone point me to why my approach, based on the officially documented functor creation, fails?
First, ensure your packages are up to date, as your script works fine for me. Second, creating encoder won't fetch the outputs by itself; the functor has to be called. Continuing from your snippet after # Here the fitting process...:
import numpy as np

x = np.random.randn(32, 10)  # toy data
y = np.random.randn(32, 10)  # toy labels
dae.fit(x, y)  # run one iteration

encoder = K.function([dae.input, K.learning_phase()],
                     [dae.get_layer("hidden").output])
outputs = encoder([x, int(False)])[0]  # learning phase 0 = test mode; [0] unpacks the single-output list
print(outputs.shape)
# (32, 80)
However, as of TensorFlow 2.0.0-rc2, this will not work with eager execution enabled; disable it via:
tf.compat.v1.disable_eager_execution()
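If you'd rather keep eager execution on, a simpler route (a minimal sketch, not part of the FAQ recipe above) is to build a sub-model that reuses the trained layers; in TF 2.x, predict() runs in inference mode by default, so the GaussianNoise layer stays inactive:

# Sketch: extract the hidden representation with a sub-model (eager-friendly).
# Assumes dae has been built (input_shape was given) and x is the toy data above.
encoder_model = keras.Model(inputs=dae.input,
                            outputs=dae.get_layer("hidden").output)
hidden = encoder_model.predict(x)  # inference mode: GaussianNoise is a no-op here
print(hidden.shape)  # (32, 80)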
Related
I have updated my TensorFlow in Python from 1.14 to 2.0. Now I have a problem with computing gradients, which I need for the Grad-CAM visualisation of a layer.
For example, take a model named my_cnn_model that is already fitted on data, for a classification problem with three classes. If I want to compute the Grad-CAM for a given layer named "conv2d_3", I would start with the following in 1.14:
layer_conv = my_cnn_model.get_layer("conv2d_3")

# I want it with respect to the first class (index 0), because for example it
# might have been the model's prediction for that image, so I check the
# probability for that class:
final_layer = my_cnn_model.output[:, 0]

# Then I computed the gradients like that:
grads = keras.backend.gradients(final_layer, layer_conv.output)[0]
print(grads)
The last statement (the print) would output the following (the shape is specific to the CNN I used, but that doesn't matter here):
Tensor("gradients/max_pooling2d/MaxPool_grad/MaxPoolGrad:0", shape=(?, 76, 76, 64), dtype=float32)
Now, when I use TF 2.0, the gradient-computing part, i.e.:
grads = keras.backend.gradients( final_layer, layer_conv.output )[0]
no longer works; it fails with the error:
RuntimeError: tf.gradients is not supported when eager execution is enabled. Use tf.GradientTape instead.
I already searched, and found things like
with tf.GradientTape() as tape:
...
But all the same I get errors, or I can't obtain the same output Tensor("gradients/max_pooling2d/MaxPool_grad/MaxPoolGrad:0", shape=(?, 76, 76, 64), dtype=float32), so the rest of my Grad-CAM function does not work.
How could I compute the grads, which of course should match those from my 1.14 TF environment? Am I missing something trivial?
Edit: I used the functional API, with my own CNN, or with a "transfer learning" model already available in tf.keras, with modified/added layers at the top.
Thanks for any help.
If you are not interested in eager mode, e.g. because you are using old graph-style code all around, you can simply disable eager execution.
As mentioned here:
import tensorflow as tf
tf.compat.v1.disable_eager_execution()
If, on the other hand, you want to keep eager mode on, or if another thing is troubling your code, you can instead:
# You need a persistent tape if you're calling many gradients instead of just one.
with tf.GradientTape(persistent=True) as tape:
    # Must "watch" all tensors that are not "trainable weights"
    # if you want gradients with respect to them.
    tape.watch(layer_conv.output)

    # If the input data should be watched (you're getting the gradients
    # related to the inputs):
    input_tensor = tf.constant(input_data)
    tape.watch(input_tensor)

    # Must do the entire prediction inside this tape block.
    # It would be better if you could make your model output all tensors of interest.
    # Not sure if you can do "some_layer.output" in eager mode for this purpose.
    model_outputs = model(input_tensor)

# Finally, outside the block, you can get the gradients:
g1 = tape.gradient(model_outputs, layer_conv.output)

# Again, maybe you need this layer output to be an actual model output
# instead of gotten from the layer like this.
g2 = tape.gradient(some_output, input_tensor)
g3...
g4...

# Finally, delete the persistent tape.
del tape
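For the Grad-CAM case specifically, a common TF 2.x pattern (a sketch assuming your model was built with the functional API, as your edit says; my_cnn_model, "conv2d_3" and input_tensor are the names from above) is to build an auxiliary model that outputs both the conv layer's feature maps and the predictions, then differentiate through it:

import tensorflow as tf
from tensorflow import keras

# Auxiliary model: returns the conv feature maps alongside the final predictions.
grad_model = keras.Model(
    inputs=my_cnn_model.inputs,
    outputs=[my_cnn_model.get_layer("conv2d_3").output, my_cnn_model.output])

with tf.GradientTape() as tape:
    conv_output, predictions = grad_model(input_tensor)
    class_channel = predictions[:, 0]  # class 0, as in the 1.14 snippet

# Analogous to keras.backend.gradients(final_layer, layer_conv.output)[0]
grads = tape.gradient(class_channel, conv_output)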
I am trying to use this example code from the PyTorch website to convert a Python model for use in the PyTorch C++ API (LibTorch).
Converting to Torch Script via Tracing
To convert a PyTorch model to Torch Script via tracing, you must pass an instance of your model along with an example input to the torch.jit.trace function. This will produce a torch.jit.ScriptModule object with the trace of your model evaluation embedded in the module’s forward method:
import torch
import torchvision
# An instance of your model.
model = torchvision.models.resnet18()
# An example input you would normally provide to your model's forward() method.
example = torch.rand(1, 3, 224, 224)
# Use torch.jit.trace to generate a torch.jit.ScriptModule via tracing.
traced_script_module = torch.jit.trace(model, example)
traced_script_module.save("model.pt")
This example works fine, and saves out the file as expected.
When I switch to this model:
model = models.segmentation.deeplabv3_resnet101(pretrained=True)
It gives me the following error:
File "convert.py", line 14, in <module>
traced_script_module = torch.jit.trace(model, example)
File "C:\Python37\lib\site-packages\torch\jit\__init__.py", line 636, in trace
raise ValueError('Expected more than 1 value per channel when training, got input size {}'.format(size))
ValueError: Expected more than 1 value per channel when training, got input size torch.Size([1, 256, 1, 1])
I assume this is because the example format is wrong, but how can I get the correct one?
Based on the comments below, my new code is:
import torch
import torchvision
from torchvision import models
model = models.segmentation.deeplabv3_resnet101(pretrained=True)
model.eval()
# An example input you would normally provide to your model's forward() method.
example = torch.rand(1, 3, 224, 224)
# Use torch.jit.trace to generate a torch.jit.ScriptModule via tracing.
traced_script_module = torch.jit.trace(model, example)
traced_script_module.save("model.pt")
And I now get the error:
File "convert.py", line 15, in <module>
traced_script_module = torch.jit.trace(model, example)
File "C:\Python37\lib\site-packages\torch\jit\__init__.py", line 636, in trace
var_lookup_fn, _force_outplace)
RuntimeError: Only tensors and (possibly nested) tuples of tensors are supported as inputs or outputs of traced functions (toIValue at C:\a\w\1\s\windows\pytorch\torch/csrc/jit/pybind_utils.h:91)
(no backtrace available)
(from the PyTorch forums)
trace only supports modules that have a tensor or a tuple of tensors as output.
According to the deeplabv3 implementation, its output is an OrderedDict. That is a problem.
To solve this, make a wrapper module:
class wrapper(torch.nn.Module):
    def __init__(self, model):
        super(wrapper, self).__init__()
        self.model = model

    def forward(self, input):
        results = []
        output = self.model(input)
        for k, v in output.items():
            results.append(v)
        return tuple(results)

model = wrapper(deeplab_model)
# trace...
This got my model saving out correctly.
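For reference, a minimal end-to-end sketch combining the wrapper above with eval() (variable names are illustrative, not from the forum post):

import torch
from torchvision import models

deeplab_model = models.segmentation.deeplabv3_resnet101(pretrained=True)

model = wrapper(deeplab_model)  # the wrapper module defined above
model.eval()  # eval mode also avoids the BatchNorm "1 value per channel" error

example = torch.rand(1, 3, 224, 224)
traced_script_module = torch.jit.trace(model, example)
traced_script_module.save("model.pt")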
Your problem originates in the BatchNorm layer. If it requires more than one value per channel, then your model is in training mode. Could you invoke eval() (https://pytorch.org/cppdocs/api/classtorch_1_1nn_1_1_module.html#_CPPv4N5torch2nn6Module4evalEv) on the model and see if there's an improvement?
Otherwise, you could also try to generate random data with more than one instance per batch, e.g. example = torch.rand(5, 3, 224, 224).
Furthermore, you should take care to normalise your data properly; however, that isn't what's causing the error here.
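On that last point, a sketch of the usual normalisation for torchvision's pretrained models (the mean/std values are the standard ImageNet ones; input_image is a hypothetical PIL image):

from torchvision import transforms

# Standard ImageNet preprocessing expected by torchvision's pretrained models
preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
input_tensor = preprocess(input_image).unsqueeze(0)  # add a batch dimension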
I'm trying to perform k-max pooling to select the top-k elements of a dense layer's output with shape (None, 30). I tried a MaxPooling1D layer, but it doesn't work, since Keras pooling layers require at least a 2D input shape. I'm using the following Lambda layer, but I get the error below:
layer_1.shape
(None, 30)

layer_2 = Lambda(lambda x: tf.nn.top_k(x, k=int(int(x.shape[-1]) / 2),
                                       sorted=True,
                                       name="Top_k_final"))(layer_1)
Error:

  File "/usr/local/lib/python3.5/dist-packages/keras/engine/base_layer.py", line 474, in __call__
    output_shape = self.compute_output_shape(input_shape)
  File "/usr/local/lib/python3.5/dist-packages/keras/layers/core.py", line 652, in compute_output_shape
    return K.int_shape(x)
  File "/usr/local/lib/python3.5/dist-packages/keras/backend/tensorflow_backend.py", line 591, in int_shape
    return tuple(x.get_shape().as_list())
AttributeError: 'TopKV2' object has no attribute 'get_shape'
Based on this example, I solved the problem by adding .values to get the values tensor out of tf.nn.top_k, as follows. But I'm not sure whether my solution is correct.
layer_2 = Lambda(lambda x: tf.nn.top_k(x, k=int(int(x.shape[-1]) / 2),
                                       sorted=True,
                                       name="Top_k_final").values)(layer_1)
I'm trying to build a system that checks sentence similarity with a siamese LSTM model, using Manhattan distance as the distance function when merging the two branches.
I'm using the code found in this article
https://medium.com/mlreview/implementing-malstm-on-kaggles-quora-question-pairs-competition-8b31b0b16a07
The issue is that after I've built the model and saved it to a JSON file, I'm unable to load it back, because an error gets thrown saying:
name 'exponent_neg_manhattan_distance' is not defined
Here's the code:
# Model variables
n_hidden = 50
gradient_clipping_norm = 1.25
batch_size = 64
n_epoch = 5
def exponent_neg_manhattan_distance(left, right):
    ''' Helper function for the similarity estimate of the LSTMs outputs '''
    return K.exp(-K.sum(K.abs(left - right), axis=1, keepdims=True))
# The visible layer
left_input = Input(shape=(max_seq_length,), dtype='int32')
right_input = Input(shape=(max_seq_length,), dtype='int32')
embedding_layer = Embedding(len(embeddings), embedding_dim, weights=[embeddings], input_length=max_seq_length, trainable=False)
# Embedded version of the inputs
encoded_left = embedding_layer(left_input)
encoded_right = embedding_layer(right_input)
# Since this is a siamese network, both sides share the same LSTM
shared_lstm = LSTM(n_hidden)
left_output = shared_lstm(encoded_left)
right_output = shared_lstm(encoded_right)
# Calculates the distance as defined by the MaLSTM model
malstm_distance = Merge(mode=lambda x: exponent_neg_manhattan_distance(x[0], x[1]), output_shape=lambda x: (x[0][0], 1))([left_output, right_output])
# Pack it all up into a model
malstm = Model([left_input, right_input], [malstm_distance])
# Adadelta optimizer, with gradient clipping by norm
optimizer = Adadelta(clipnorm=gradient_clipping_norm)
malstm.compile(loss='mean_squared_error', optimizer=optimizer, metrics=['accuracy'])
# Start training
training_start_time = time()
malstm_trained = malstm.fit([X_train['left'], X_train['right']], Y_train, batch_size=batch_size, nb_epoch=n_epoch,
                            validation_data=([X_validation['left'], X_validation['right']], Y_validation))
print("Training time finished.\n{} epochs in {}".format(n_epoch, datetime.timedelta(seconds=time()-training_start_time)))
malstm.save('malstm.h5')
model_json = malstm.to_json()
with open('malstm.json', 'w') as file:
    file.write(model_json)
malstm.save_weights('malst_w.h5')
Now when I try to load the model I get the following error:
model = model_from_json(open('malstm.json').read(), custom_objects = {"exponent_neg_manhattan_distance":exponent_neg_manhattan_distance})
C:\Users\archi\Miniconda3\lib\site-packages\keras\engine\topology.py:1271: UserWarning: The `Merge` layer is deprecated and will be removed after 08/2017. Use instead layers from `keras.layers.merge`, e.g. `add`, `concatenate`, etc.
return cls(**config)
Traceback (most recent call last):
File "<ipython-input-12-4c72a4db6c29>", line 1, in <module>
model = model_from_json(open('malstm.json').read(), custom_objects = {"exponent_neg_manhattan_distance":exponent_neg_manhattan_distance})
File "C:\Users\archi\Miniconda3\lib\site-packages\keras\models.py", line 349, in model_from_json
return layer_module.deserialize(config, custom_objects=custom_objects)
File "C:\Users\archi\Miniconda3\lib\site-packages\keras\layers\__init__.py", line 55, in deserialize
printable_module_name='layer')
File "C:\Users\archi\Miniconda3\lib\site-packages\keras\utils\generic_utils.py", line 144, in deserialize_keras_object
list(custom_objects.items())))
File "C:\Users\archi\Miniconda3\lib\site-packages\keras\engine\topology.py", line 2524, in from_config
process_node(layer, node_data)
File "C:\Users\archi\Miniconda3\lib\site-packages\keras\engine\topology.py", line 2483, in process_node
layer(input_tensors, **kwargs)
File "C:\Users\archi\Miniconda3\lib\site-packages\keras\engine\topology.py", line 619, in __call__
output = self.call(inputs, **kwargs)
File "C:\Users\archi\Miniconda3\lib\site-packages\keras\legacy\layers.py", line 209, in call
return self.mode(inputs, **arguments)
File "<ipython-input-19-913812c640b3>", line 28, in <lambda>
NameError: name 'exponent_neg_manhattan_distance' is not defined
I've searched online, and the issue is probably due to the use of the lambda function. Is there any way I can load this model? It took a crazy amount of time to train, so any help would be appreciated!
First save your model with model.save. Then load it with load_model, passing your custom objects:
model.save("model_path")
from keras.models import load_model
# Returns a compiled model identical to the previous one model =
load_model('model_path',
custom_objects={
'RAdam':RAdam,
'exponent_neg_manhattan_distance':exponent_neg_manhattan_distance})
Converting comment into answer: you can salvage the weights of the network if you just create it yourself in code again. The error is about creating the network from JSON, but let's follow from:
# ...
# Pack it all up into a model
malstm = Model([left_input, right_input], [malstm_distance])
# ... you don't need compile for only predict
# ... skip training and model saving
# malstm.save_weights('malst_w.h5')
malstm.load_weights('malst_w.h5')
malstm.predict(...)
Now the weights are loaded into the existing model, which you recreated in code.
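As an aside, since the Merge layer is deprecated (see the warning in your traceback), when you rebuild the network in code you could express the distance with a Lambda layer instead; a minimal sketch using the question's variable names:

from keras.layers import Lambda
from keras import backend as K

# Same MaLSTM distance as before, via the non-deprecated Lambda layer
malstm_distance = Lambda(
    lambda x: K.exp(-K.sum(K.abs(x[0] - x[1]), axis=1, keepdims=True)),
    output_shape=lambda shapes: (shapes[0][0], 1))([left_output, right_output])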
When I train a scikit-learn v0.15 SGDClassifier with these options: SGDClassifier(loss='log', class_weight=None, penalty='l2'), training completes with no error.
Yet when I train this classifier with class_weight='auto' on scikit-learn v0.15, I get this error:
return self.model.fit(X, y)
File "/home/rose/.local/lib/python2.7/site-packages/scikit_learn-0.15.0b1-py2.7-linux-x86_64.egg/sklearn/linear_model/stochastic_gradient.py", line 485, in fit
sample_weight=sample_weight)
File "/home/rose/.local/lib/python2.7/site-packages/scikit_learn-0.15.0b1-py2.7-linux-x86_64.egg/sklearn/linear_model/stochastic_gradient.py", line 389, in _fit
classes, sample_weight, coef_init, intercept_init)
File "/home/rose/.local/lib/python2.7/site-packages/scikit_learn-0.15.0b1-py2.7-linux-x86_64.egg/sklearn/linear_model/stochastic_gradient.py", line 336, in _partial_fit
y_ind)
File "/home/rose/.local/lib/python2.7/site-packages/scikit_learn-0.15.0b1-py2.7-linux-x86_64.egg/sklearn/utils/class_weight.py", line 43, in compute_class_weight
raise ValueError("classes should have valid labels that are in y")
ValueError: classes should have valid labels that are in y
What could cause it?
For reference, here's the documentation on class_weight:
Preset for the class_weight fit parameter. Weights associated with
classes. If not given, all classes are supposed to have weight one.
The “auto” mode uses the values of y to automatically adjust weights
inversely proportional to class frequencies.
I think this may be a bug within scikit-learn. As a workaround, try the following:
from sklearn.preprocessing import LabelEncoder

# Encode the labels as consecutive integers 0..n_classes-1 before fitting
le = LabelEncoder()
y_encoded = le.fit_transform(y)
self.model.fit(X, y_encoded)

# Map the integer predictions back to the original labels
pred = le.inverse_transform(self.model.predict(X))
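For context, a small illustration (toy labels, purely hypothetical) of what the LabelEncoder round-trip does: it maps arbitrary labels to consecutive integers 0..n_classes-1, sidestepping the label check that raises the ValueError, and then maps the predictions back.

from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
y = ['spam', 'ham', 'spam', 'eggs']
y_encoded = le.fit_transform(y)         # array([2, 1, 2, 0])
print(le.inverse_transform(y_encoded))  # ['spam' 'ham' 'spam' 'eggs']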
I am working on a fix in this PR:
https://github.com/scikit-learn/scikit-learn/pull/3515
Please feel free to test and report if it solves the problem for you.