I have a Keras model which calculates two tensors, r1 and r2, of the same shape. I would like the model to calculate (r1 - r2)**2.
I can take the sum of these tensors with keras.layers.add([r1, r2]) and their product with keras.layers.multiply([r1, r2]). If there were a subtract function, I'd write
r = keras.layers.subtract([r1, r2])
square_diff = keras.layers.multiply([r, r])
but there doesn't appear to be a keras.layers.subtract function.
In lieu of that, I've been trying to figure out how to multiply one of my inputs by a constant -1 tensor and then add, but I can't figure out how to create that -1 tensor. I've tried a number of variants on
negative_one = keras.backend.constant(np.full(r1.get_shape()), -1)
none of which work, presumably because the shape of r1 is (?, 128) (i.e. the first dimension is an unknown batch size, and the second represents 128 hidden units).
What is the correct way in Keras to take the difference of two tensors?
As dhinckley mentioned, you should use a Lambda layer. But I would suggest defining your custom function first; that way the code will be a little clearer:
import keras.backend as K
from keras.layers import Lambda

def squared_differences(pair_of_tensors):
    x, y = pair_of_tensors
    return K.square(x - y)

square_diff = Lambda(squared_differences)([r1, r2])
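As an aside: recent Keras releases (2.0.7 and later, if I remember correctly) do ship a built-in subtract, in which case the plan from the question works directly; note that, like add and multiply, it takes a list of tensors:
from keras.layers import subtract, multiply

r = subtract([r1, r2])
square_diff = multiply([r, r])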
I'm not qualified to say whether or not this is the correct way, but the following code will calculate (r1 - r2)**2 as you request. The key enabler here is the use of the Keras functional API and Lambda layers to invert the sign of an input tensor.
import numpy as np
from keras.layers import Input, Lambda, add
from keras.models import Model

r1 = Input(shape=(1, 2, 2))
r2 = Input(shape=(1, 2, 2))

# Negate r2 with a Lambda, then add: r1 + (-r2) == r1 - r2
minus_r2 = Lambda(lambda x: -x)(r2)
subtracted = add([r1, minus_r2])
out = Lambda(lambda x: x**2)(subtracted)

model = Model([r1, r2], out)

a = np.arange(4).reshape([1, 1, 2, 2])
b = np.ones(4).reshape([1, 1, 2, 2])
print(model.predict([a, b]))
# [[[[ 1. 0.]
# [ 1. 4.]]]]
print((a-b)**2)
# [[[[ 1. 0.]
# [ 1. 4.]]]]
I recently started learning and using automatic differentiation to determine the gradients and Jacobian matrix of a neural network with respect to a given input. The method suggested by TensorFlow is the tape.gradient and tape.jacobian method. However, I am not able to obtain the Jacobian matrix using this method due to what looks like a bug in TensorFlow. It works when I calculate tape.gradient(y_pred, x), but not for the Jacobian matrix, which should have a shape of (200, 3). I am open to other ways to calculate the Jacobian matrix, but I am more inclined to use automatic differentiation methods within TensorFlow. The version I am currently using is TensorFlow 2.1.0. I'd greatly appreciate any advice!
import tensorflow as tf
import numpy as np

# The neural network accepts 3 inputs and produces 200 outputs. The actual values of the inputs and outputs are not written in the code as it is too involved.
num_inputs = 3
num_outputs = 200
num_hidden_layers = 5
num_neurons = 50
kernel = 'he_uniform'
activation = tf.keras.layers.LeakyReLU(alpha=0.3)

# Details of the model (MLP)
current_model = tf.keras.models.Sequential()
current_model.add(tf.keras.Input(shape=(num_inputs,)))
for i in range(num_hidden_layers):
    current_model.add(tf.keras.layers.Dense(units=num_neurons, activation=activation, kernel_initializer=kernel))
current_model.add(tf.keras.layers.Dense(units=num_outputs, activation='linear', kernel_initializer=kernel))

# Finding the Jacobian matrix with respect to a given input of the neural network
# In this case, the input is [0.02, 0.4, 0.12] (i.e. 3 inputs)
x = tf.Variable([[0.02, 0.4, 0.12]], dtype=tf.float32)
with tf.GradientTape() as tape:
    y_pred = x
    for layer in current_model.layers:
        y_pred = layer(y_pred)
jacobian = tape.jacobian(y_pred, x)
print(jacobian)
Below is the error returned. I removed some parts for privacy purposes.
StagingError: in converted code:
C:\Users\...\anaconda3\envs\tf\lib\site-packages\tensorflow_core\python\ops\parallel_for\control_flow_ops.py:183 f *
return _pfor_impl(loop_fn, iters, parallel_iterations=parallel_iterations)
C:\Users\...\anaconda3\envs\tf\lib\site-packages\tensorflow_core\python\ops\parallel_for\control_flow_ops.py:256 _pfor_impl
outputs.append(converter.convert(loop_fn_output))
C:\Users\...\anaconda3\envs\tf\lib\site-packages\tensorflow_core\python\ops\parallel_for\pfor.py:1280 convert
output = self._convert_helper(y)
C:\Users\...\anaconda3\envs\tf\lib\site-packages\tensorflow_core\python\ops\parallel_for\pfor.py:1453 _convert_helper
if flags.FLAGS.op_conversion_fallback_to_while_loop:
C:\Users\...\anaconda3\envs\tf\lib\site-packages\tensorflow_core\python\platform\flags.py:84 __getattr__
wrapped(_sys.argv)
C:\Users\...\anaconda3\envs\tf\lib\site-packages\absl\flags\_flagvalues.py:633 __call__
name, value, suggestions=suggestions)
UnrecognizedFlagError: Unknown command line flag 'f'
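A possible workaround, assuming the pfor conversion is indeed the culprit (the traceback originates in TensorFlow's parallel_for code): tape.jacobian accepts an experimental_use_pfor argument, and setting it to False falls back to a while_loop implementation. A minimal sketch, untested on TF 2.1.0:
x = tf.Variable([[0.02, 0.4, 0.12]], dtype=tf.float32)
with tf.GradientTape() as tape:
    y_pred = current_model(x)
# Disable the pfor-based implementation that the traceback points at
jacobian = tape.jacobian(y_pred, x, experimental_use_pfor=False)
print(jacobian.shape)  # expected (1, 200, 1, 3); squeeze to obtain (200, 3)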
I would like to code, in tf.keras, a neural network with a couple of loss functions. One is a standard MSE (mean squared error) scaled by a factor, while the other is basically a regularization term on the output of a hidden layer. This second loss is added through self.add_loss() in a user-defined class inheriting from tf.keras.layers.Layer. I have a couple of questions (the first is more important though).
1) The error I get when trying to combine the two losses together is the following:
ValueError: Shapes must be equal rank, but are 0 and 1
From merging shape 0 with other shapes. for '{{node AddN}} = AddN[N=2, T=DT_FLOAT](loss/weighted_loss/value, model/new_layer/mul_1)' with input shapes: [], [100].
So it comes from the fact that the tensors which should add up to a single loss value have different shapes (and ranks). Still, when I print the losses during training, I clearly see that the vectors returned as losses have shape batch_size and rank 1. Could it be that when the two losses are summed, I have to provide them (or at least the one passed to add_loss) as scalars? I know the MSE is usually returned as a vector where each entry is the MSE of one sample in the batch, hence having shape batch_size, and I think I tried to do the same with the "regularization" loss. Do you have an explanation for this behaviour?
The sample code which gives me the error is the following:
import numpy as np
import tensorflow as tf
from tensorflow.keras import backend as K
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense, Input

def rate_mse(rate=1e5):
    # @tf.function  # also needed for printing
    def loss(y_true, y_pred):
        tmp = rate*K.mean(K.square(y_pred - y_true), axis=-1)
        # tf.print('shape %s and rank %s output in mse'%(K.shape(tmp), tf.rank(tmp)))
        tf.print('shape and rank output in mse', [K.shape(tmp), tf.rank(tmp)])
        tf.print('mse loss:', tmp)  # prints when I add @tf.function
        return tmp
    return loss

class newLayer(tf.keras.layers.Layer):
    def __init__(self, rate=5e-2, **kwargs):
        super(newLayer, self).__init__(**kwargs)
        self.rate = rate

    # @tf.function  # to be commented out for NN training
    def call(self, inputs):
        tmp = self.rate*K.mean(inputs*inputs, axis=-1)
        tf.print('shape and rank output in regularizer', [K.shape(tmp), tf.rank(tmp)])
        tf.print('regularizer loss:', tmp)
        self.add_loss(tmp, inputs=True)
        return inputs

tot_n = 10000
xx = np.random.rand(tot_n, 1)
yy = np.pi*xx
train_size = int(0.9*tot_n)
xx_train = xx[:train_size]; xx_val = xx[train_size:]
yy_train = yy[:train_size]; yy_val = yy[train_size:]

reg_layer = newLayer()
input_layer = Input(shape=(1,))  # input
hidden = Dense(20, activation='relu')(input_layer)  # hidden layer
hidden = reg_layer(hidden)
output_layer = Dense(1, activation='linear')(hidden)

model = Model(inputs=[input_layer], outputs=[output_layer])
model.compile(optimizer='Adam', loss=rate_mse(), experimental_run_tf_function=False)
# model.compile(optimizer='Adam', loss=None, experimental_run_tf_function=False)
model.fit(xx_train, yy_train, epochs=100, batch_size=100,
          validation_data=(xx_val, yy_val), verbose=1)

# new_xx = np.random.rand(10,1); new_yy = np.pi*new_xx
# model.evaluate(new_xx, new_yy)
print(model.predict(np.array([[1]])))
2) I also have a secondary question related to this code. I noticed that printing with tf.print inside the function rate_mse only works with the tf.function decorator. Similarly, the call method of newLayer is only taken into account during training if the same decorator is commented out. Can someone explain why this is the case, or point me to a possible solution?
Thanks in advance to whoever can provide help. I am currently using TensorFlow 2.2.0, and the Keras version is 2.3.0-tf.
I was stuck on the same problem for a few days. The "standard" loss is already a scalar at the moment we add it to the loss from add_loss. The only way I got it working is to add one more axis while calculating the mean, so that we also get a scalar and it works:
tmp = self.rate*K.mean(inputs*inputs, axis=[0, -1])
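An equivalent sketch of the same idea, written out explicitly: K.mean with no axis argument reduces over all axes, so add_loss receives a scalar that matches the rank of the compiled loss.
# inside newLayer.call
tmp = self.rate * K.mean(K.square(inputs))  # scalar: mean over batch and feature axes
self.add_loss(tmp, inputs=True)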
I have an RGB image of shape (256,256,3) and a weight mask of shape (256,256). How do I perform element-wise multiplication between them in Keras? (All channels share the same mask.)
You need a Reshape so both tensors have the same number of dimensions, and a Multiply layer:
from keras.layers import Reshape, Multiply

mask = Reshape((256, 256, 1))(mask)
out = Multiply()([image, mask])
If you have variable shapes, you can use a single Lambda layer like this:
import keras.backend as K
from keras.layers import Lambda

def multiply(x):
    image, mask = x
    mask = K.expand_dims(mask, axis=-1)  # could be K.stack([mask]*3, axis=-1) too
    return mask*image

out = Lambda(multiply)([image, mask])
As an alternative you can do this using a Lambda layer (as in @DanielMöller's answer, you need to add a third axis to the mask):
from keras import backend as K
out = Lambda(lambda x: x[0] * K.expand_dims(x[1], axis=-1))([image, mask])
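A quick end-to-end shape check (a sketch with hypothetical Input placeholders using the shapes from the question):
from keras.layers import Input

image = Input(shape=(256, 256, 3))
mask = Input(shape=(256, 256))
out = Lambda(lambda x: x[0] * K.expand_dims(x[1], axis=-1))([image, mask])
# the mask becomes (batch, 256, 256, 1) and broadcasts across the 3 channels,
# so out has shape (batch, 256, 256, 3)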
I am using Keras to predict a time series with an LSTM, and I have realized that we can predict using data that does not have the same number of timesteps as the data we trained with. For example:
import numpy as np
import keras.optimizers
from keras.models import Sequential
from keras.layers import Dense, Activation, Dropout, TimeDistributed
from keras.layers import LSTM

Xtrain = np.random.rand(10, 3, 2)  # here the timestep is 3
Ytrain = np.random.rand(10, 1)

model = Sequential()
model.add(LSTM(input_dim=Xtrain.shape[2], output_dim=10, return_sequences=False))
model.add(Activation("sigmoid"))
model.add(Dense(1))
KerasOptimizer = keras.optimizers.RMSprop(lr=0.001, rho=0.9, epsilon=1e-08, decay=0.0)
model.compile(loss="mse", optimizer=KerasOptimizer)
model.fit(Xtrain, Ytrain, nb_epoch=1, batch_size=1)

XBis = np.random.rand(10, 4, 2)  # here the timestep is 4
XTer = np.random.rand(10, 2, 2)  # here the timestep is 2
model.predict(Xtrain)
model.predict(XBis)
model.predict(XTer)
So my question is: why does this work? If we train a model with n timesteps and we use data with n+1 timesteps for prediction, maybe the model only uses the first n timesteps. But if we try to predict with n-1 timesteps, how does it work?
If you look at how the LSTM layer is defined in your example, you will notice that you are not specifying the size of the time dimension, only the number of features present at each time point (input_dim) and the number of desired output features (output_dim). Also, since you have return_sequences=False, it will only output the result at the last time point, so the tensor yielded by the layer will always have shape [batch size] x [output dim] (in this case, 10 x 10), discarding the time dimension.
So the size of the time dimension does not really affect the "applicability" of the model; the layer will just go through all the available time steps and give you the last output.
Of course, that does not mean that the model will necessarily work well for any input. If all the examples in your training data have a time dimension of size N but you then try to predict using N+1, N-1, 100 * N or whatever else, you may not get reliable results.
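To make this variable-length behaviour explicit, you can declare the time dimension as None. A minimal sketch (written with the newer input_shape/units syntax rather than the question's input_dim/output_dim):
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(10, input_shape=(None, 2)))  # None = any number of timesteps
model.add(Dense(1))
model.compile(loss="mse", optimizer="rmsprop")
print(model.predict(np.random.rand(10, 3, 2)).shape)  # (10, 1)
print(model.predict(np.random.rand(10, 7, 2)).shape)  # (10, 1)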
I am trying to reuse the weight matrix from a previous layer. As a toy example I want to do something like this:
import numpy as np
from keras.layers import Dense, Input
from keras.layers import merge
from keras import backend as K
from keras.models import Model

inputs = Input(shape=(4,))
inputs2 = Input(shape=(4,))
dense_layer = Dense(10, input_shape=(4,))
dense1 = dense_layer(inputs)

def my_fun(my_inputs):
    w = my_inputs[0]
    x = my_inputs[1]
    return K.dot(w, x)

merge1 = merge([dense_layer.W, inputs2], mode=my_fun)
The problem is that dense_layer.W is not a keras tensor. So I get the following error:
Exception: Output tensors to a Model must be Keras tensors. Found: dot.0
Any idea on how to convert dense_layer.W to a Keras tensor?
Thanks
It seems that you want to share weights between layers.
I think you can use dense_layer as a shared layer for inputs and inputs2:
merge1 = dense_layer(inputs2)
Do check out shared layers at https://keras.io/getting-started/functional-api-guide/#shared-layers
I don't think that you can use the merge layer like this.
But to answer your question, you will probably have to create a custom layer which has tied weights. Look at this example.
Otherwise, the way to access the weights of a layer is to use the get_weights() method on that layer; this will return a list of numpy arrays containing the weights. In the case of the Dense layer, it will contain the weights and the bias.
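For example (a sketch, assuming the dense_layer defined in the question, which has already been built by calling it on inputs):
w, b = dense_layer.get_weights()  # plain numpy arrays, detached from the graph
print(w.shape, b.shape)           # (4, 10) and (10,) for this Dense layer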
There are two cases for the solution, depending on what you are trying to do:
You would like to share the W matrix between your two operations, so that the W matrix used by both is kept the same even if its value changes during training or for some other reason. Then you should use dense_layer.weights[0], which is the W matrix as a tensor from your dense layer.
If you are only going to use the value of the W matrix at the time your code is written, and this value is never going to change, then use K.constant(dense_layer.get_weights()[0]), which extracts the weights as a numpy array and converts them into a tensor.
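A minimal sketch of the first case, assuming the dense_layer, inputs and inputs2 from the question. Note the operand order: with x of shape (batch, 4) and W of shape (4, 10), it is K.dot(x, W) that yields shape (batch, 10), not K.dot(W, x) as in the original my_fun.
from keras.layers import Lambda

def tied_dense(x):
    # reuse the live weight tensor, so it tracks any training updates
    return K.dot(x, dense_layer.weights[0])

merge1 = Lambda(tied_dense)(inputs2)
model = Model(inputs=[inputs, inputs2], outputs=[dense1, merge1])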