Output layer regularization implementation - Keras

I'm building a NN model using Keras, and I wish to impose a constraint on it that doesn't (directly) have to do with the weights. I would be very grateful for some help, or pointers towards relevant keywords to look up. The constraint I wish to impose is a bit complex, but it can be simplified in the following manner: I wish to impose a constraint on the output of the net for certain inputs. For the sake of simplicity, let's say the constraint looks like NN(3) + NN(4) < 10, where NN is the neural net, which can be seen as a function. How can I impose such a constraint? Thank you very much in advance for any help on the subject!
edit: A more detailed explanation of what I'm trying to do and why.
The theoretical model I'm building is this:
I'm feeding the output of the first net into the input of the second net, along with additive Gaussian noise.
The constraint I wish to impose is on the output of the first NN (g). Why? Without a constraint, the net maps the inputs to outputs as high as it possibly can, in order to make the additive noise as insignificant as possible. And rightly so; this is the optimal encoding function g, but it's not very interesting :) So I wish to impose a constraint on the output of the first NN (g). More specifically, the constraint is on the total power of the function: integral{ f_X(x) * g(x)^2 dx }. But this can be simplified, more or less, to a function that looks like what I described earlier, e.g. g(3) + g(4) < 10. More specifically, the constraint is sum{ f_X(i) * g(i)^2 * dx } < max_power, over some sampled inputs i.
This is the problem, now here's how I attempted to implement it:
from keras.models import Sequential
from keras.layers import Dense, GaussianNoise

model = Sequential([
    Dense(300, input_dim=1, activation='relu'),
    Dense(300, activation='relu'),
    Dense(1, activation='linear', name='encoder_output'),
    GaussianNoise(nvar, name='noise'),
    Dense(300, activation='relu', name='decoder_input'),
    Dense(300, activation='relu'),
    Dense(1, activation='linear', name='decoder_output'),
])
Mainly, this is a single neural net rather than really two (although in effect there is no difference).
The important things to note are the input dim 1, the output dim 1 (x and y in the diagram), and the Gaussian noise in the middle. The hidden layers are not very interesting right now; I'll optimize them at a later point.
In this model, I wish to impose a constraint on the output of a (supposedly) hidden layer named encoder_output. Hope this clarifies things.

You could use a multi input/multi output model with shared weights layers. The model could for example look like this:
from keras.layers import Input, Dense, Add
from keras.models import Model

# Shared-weights layers
hidden = Dense(10, activation='relu')
nn_output = Dense(1, activation='relu')

x1 = Input(shape=(1,))
h1 = hidden(x1)
y1 = nn_output(h1)

x2 = Input(shape=(1,))
h2 = hidden(x2)
y2 = nn_output(h2)

# Your constraint
# In case it should be more complicated, you can implement
# a custom Keras layer
constraint = Add()([y1, y2])

model = Model(inputs=[x1, x2], outputs=[y1, y2, constraint])
model.compile(optimizer='sgd', loss='mse')

X_train_1 = [3, 4]
X_train_2 = [4, 3]
y_train_1 = [123, 456]  # your expected output
y_train_2 = [456, 123]  # your expected output
s = [10, 10]            # expected sums
model.fit([X_train_1, X_train_2], [y_train_1, y_train_2, s], epochs=10)
If you have no exact value for your constraint that can be used as an expected output, you can remove it from the outputs and instead write a simple custom regularizer to apply to it. There is a simple example of a custom regularizer in the Keras documentation.
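For instance, here is a minimal sketch of such a regularizer, applied as an activity_regularizer on the layer whose output should be constrained. The names max_power and penalty_weight are made up, and the soft hinge penalty is just one way of expressing the inequality:
import keras.backend as K

def power_regularizer(activity):
    max_power = 10.0      # assumed power budget (right-hand side of the constraint)
    penalty_weight = 1.0  # assumed strength of the soft penalty
    # penalize the mean squared output only when it exceeds the budget
    return penalty_weight * K.maximum(0., K.mean(K.square(activity)) - max_power)

Dense(1, activation='linear', name='encoder_output',
      activity_regularizer=power_regularizer)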

Related

Keras custom loss function that depends on the input features

I have a multilabel classification problem with K labels, and I also have a function, let's call it f, that for each example in the dataset takes in two matrices, let's call them H and P. Both matrices are part of the input data.
For each vector of labels y (for one example), i.e. y is a vector of dimension (K x 1), I compute a scalar value f_out = f(H, P, y).
I want to define a loss function that minimizes the mean absolute percentage error between the two vectors formed by the values f_out_true = f(H, P, y_true) and f_out_pred = f(H, P, y_pred) over all examples.
From the Keras documentation, I know that a custom loss function comes in the form customLoss(y_true, y_pred); however, the loss function I want to define depends on the input data, and the values f_out_true and f_out_pred need to be computed example by example to form the two vectors between which I want to minimize the mean absolute percentage error.
As far as I am aware, there is no way to make a loss function that takes anything other than the model output and the corresponding ground truth. So, the only way to do what you want is to make the input part of your model's output. To do this, simply build your model with the functional API, and then add the input tensor to the list of outputs:
from keras.layers import Input, Dense
from keras.models import Model

inputs = Input(input_shape)
# build the rest of your model with the standard functional API here
# this example model was taken from the Keras docs
x = Dense(100, activation='relu')(inputs)
x = Dense(100, activation='relu')(x)
x = Dense(100, activation='relu')(x)
output = Dense(10, activation='softmax')(x)
model = Model(inputs=[inputs], outputs=[output, inputs])
Then, make y_true a combination of your input data and the original ground truth.
I don't have a whole lot of experience with the functional API so it's hard to be more specific, but hopefully this points you in the right direction.
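As a rough illustration of the slicing idea, the loss could unpack the packed y_true as below. Everything here is an assumption: K_labels and the flattened-feature layout must match your data, and f would have to be rewritten with backend ops so gradients can flow:
import keras.backend as K

K_labels = 10  # assumed number of labels

def mape_on_f(y_true_packed, y_pred):
    # y_true_packed is assumed to hold [y_true, flattened H and P features]
    y_true = y_true_packed[:, :K_labels]
    features = y_true_packed[:, K_labels:]
    f_true = f(features, y_true)  # hypothetical backend-ops version of f
    f_pred = f(features, y_pred)
    return 100. * K.mean(K.abs((f_true - f_pred) /
                               K.clip(K.abs(f_true), K.epsilon(), None)))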

Keras hypernetwork implementation?

What would be the most straightforward way to implement a hypernetwork in Keras? That is, where one leg of the network creates the weights for another? In particular, I would like to do template matching where I feed the template in to a CNN leg that generates a convolutional kernel for a leg that operates on the main image. The part I'm unsure of is where I have a CNN layer that is fed weights externally, yet the gradients still flow through properly for training.
The weights leg:
For the weights leg, just create a regular network as you would with Keras.
Be sure that its output(s) have a shape like (spatial_kernel_size1, spatial_kernel_size2, input_channels, output_channels).
Using the functional API, you can create a few weight tensors, for instance:
inputs = Input((imgSize1, imgSize2, imgChannels))
w1 = Conv2D(desired_channels, ....)(inputs)
w2 = Conv2D(desired_channels2, ....)(inputs or w1)
....
You should apply some kind of pooling here, since your outputs will have a huge size and you probably want filters with small sizes such as 3, 5, etc.
w1 = GlobalAveragePooling2D()(w1) #maybe GlobalMaxPooling2D
w2 = GlobalAveragePooling2D()(w2)
If you're using fixed image sizes, you could also use other kinds of pooling or flatten and dense, etc.
Make sure you reshape the weights to the correct shape:
w1 = Reshape((size1,size2,input_channels, output_channels))(w1)
w2 = Reshape((sizeA, sizeB, input_channels2, output_channels2))(w2)
....
The choice of the number of channels is up to you to optimize
The convolutional leg:
Now, this leg will only use "non-trainable" convolutions; they can be found directly in the backend and used in Lambda layers:
import keras.backend as K

# note: K.conv2d expects a rank-4 kernel, while w1/w2 carry a batch
# dimension, so this assumes that dimension is handled (e.g. batch size 1
# plus a squeeze)
out1 = Lambda(lambda x: K.conv2d(x[0], x[1]))([inputs, w1])
out2 = Lambda(lambda x: K.conv2d(x[0], x[1]))([out1, w2])
Now, how you're going to interleave the layers, how many weights, etc., is also something you should optimize for yourself.
Create the model:
model = Model(inputs, out2)
Interleaving
You may take an output from this leg as input for the weight generator leg too:
w3 = Conv2D(filters, ...)(out2)
w3 = GlobalAveragePooling2D()(w3)
w3 = Reshape((sizeI, sizeII, inputC, outputC))(w3)
out3 = Lambda(lambda x: K.conv2d(x[0], x[1]))([out2,w3])
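Putting the pieces together, a self-contained sketch of the idea could look like the following. All sizes are made up, and it assumes batch size 1 so the generated kernel's batch dimension can simply be squeezed away:
import keras.backend as K
from keras.layers import Input, Conv2D, GlobalAveragePooling2D, Dense, Reshape, Lambda
from keras.models import Model

inputs = Input((64, 64, 3))

# weights leg: generate a 3x3 kernel mapping 3 -> 8 channels
w = Conv2D(32, 3, activation='relu')(inputs)
w = GlobalAveragePooling2D()(w)  # (batch, 32)
w = Dense(3 * 3 * 3 * 8)(w)      # (batch, 216)
w = Reshape((3, 3, 3, 8))(w)     # (batch, 3, 3, 3, 8)

# convolutional leg: apply the generated kernel
# (squeeze the kernel's batch dimension, hence the batch-size-1 assumption)
out = Lambda(lambda t: K.conv2d(t[0], K.squeeze(t[1], 0)))([inputs, w])

model = Model(inputs, out)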

Tensorflow - How to display accuracy rate for a linear regression model

I have a linear regression model that seems to work. I first load the data into X and the target column into Y, after that I implement the following...
X_train, X_test, Y_train, Y_test = train_test_split(
    X_data,
    Y_data,
    test_size=0.2
)

rng = np.random
n_rows = X_train.shape[0]

X = tf.placeholder("float")
Y = tf.placeholder("float")

W = tf.Variable(rng.randn(), name="weight")
b = tf.Variable(rng.randn(), name="bias")

pred = tf.add(tf.multiply(X, W), b)
cost = tf.reduce_sum(tf.pow(pred - Y, 2) / (2 * n_rows))
optimizer = tf.train.GradientDescentOptimizer(FLAGS.learning_rate).minimize(cost)

init = tf.global_variables_initializer()
init_local = tf.local_variables_initializer()

with tf.Session() as sess:
    sess.run([init, init_local])

    for epoch in range(FLAGS.training_epochs):
        avg_cost = 0
        for (x, y) in zip(X_train, Y_train):
            sess.run(optimizer, feed_dict={X: x, Y: y})

        # display logs per epoch step
        if (epoch + 1) % FLAGS.display_step == 0:
            c = sess.run(
                cost,
                feed_dict={X: X_train, Y: Y_train}
            )
            print("Epoch:", '%04d' % (epoch + 1), "cost=", "{:.9f}".format(c))

    print("Optimization Finished!")

    accuracy, accuracy_op = tf.metrics.accuracy(labels=tf.argmax(Y_test, 0), predictions=tf.argmax(pred, 0))
    print(sess.run(accuracy))
I cannot figure out how to print out the model's accuracy. For example, in sklearn, it is simple, if you have a model you just print model.score(X_test, Y_test). But I do not know how to do this in tensorflow or if it is even possible.
I think I'd be able to calculate the Mean Squared Error. Does this help in any way?
EDIT
I tried implementing tf.metrics.accuracy as suggested in the comments but I'm having an issue implementing it. The documentation says it takes 2 arguments, labels and predictions, so I tried the following...
accuracy, accuracy_op = tf.metrics.accuracy(labels=tf.argmax(Y_test, 0), predictions=tf.argmax(pred, 0))
print(sess.run(accuracy))
But this gives me an error...
FailedPreconditionError (see above for traceback): Attempting to use uninitialized value accuracy/count
[[Node: accuracy/count/read = Identity[T=DT_FLOAT, _class=["loc:@accuracy/count"], _device="/job:localhost/replica:0/task:0/device:CPU:0"]]]
How exactly does one implement this?
Turns out, since this is a multi-class linear regression problem, and not a classification problem, tf.metrics.accuracy is not the right approach.
Instead of displaying the accuracy of my model as a percentage, I focused on reducing the Mean Squared Error (MSE).
From looking at other examples, tf.metrics.accuracy is never used for linear regression, only for classification. Normally tf.metrics.mean_squared_error is the right approach.
I implemented two ways of calculating the total MSE of my predictions to my testing data...
pred = tf.add(tf.matmul(X, W), b)
...
...
Y_pred = sess.run(pred, feed_dict={X:X_test})
mse = tf.reduce_mean(tf.square(Y_pred - Y_test))
OR
mse = tf.metrics.mean_squared_error(labels=Y_test, predictions=Y_pred)
They both do the same thing, but obviously the second approach is more concise. Note that tf.metrics.mean_squared_error, like tf.metrics.accuracy, returns a (value, update_op) pair and relies on local variables, so it also needs the local variables initializer.
There's a good explanation of how to measure the accuracy of a Linear Regression model here.
I didn't think this was clear at all from the Tensorflow documentation, but you have to declare the accuracy operation, and then initialize all global and local variables, before you run the accuracy calculation:
accuracy, accuracy_op = tf.metrics.accuracy(labels=tf.argmax(Y_test, 0), predictions=tf.argmax(pred, 0))
# ...
init_global = tf.global_variables_initializer()
init_local = tf.local_variables_initializer()
sess.run([init_global, init_local])
# ...
# run accuracy calculation
I read something on Stack Overflow about the accuracy calculation using local variables, which is why the local variable initializer is necessary.
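Putting the ordering together in a tiny self-contained example (dummy labels and predictions, just to show the mechanics):
import tensorflow as tf

labels = tf.constant([1, 2, 3, 4])
predictions = tf.constant([1, 2, 3, 0])

# 1) declare the metric, which creates its local counter variables
accuracy, accuracy_op = tf.metrics.accuracy(labels=labels, predictions=predictions)

# 2) initialize global AND local variables (the counters are local)
init_global = tf.global_variables_initializer()
init_local = tf.local_variables_initializer()

with tf.Session() as sess:
    sess.run([init_global, init_local])
    # 3) run the update op to accumulate counts, then read the metric
    sess.run(accuracy_op)
    print(sess.run(accuracy))  # 0.75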
After reading the complete code you posted, I noticed a couple other things:
In your calculation of pred, you use
pred = tf.add(tf.multiply(X, W), b). tf.multiply performs element-wise multiplication, and will not give you the fully connected layers you need for a neural network (which I am assuming is what you are ultimately working toward, since you're using TensorFlow). To implement fully connected layers, where each layer i (including the input and output layers) has n_i nodes, you need separate weight and bias matrices for each pair of successive layers. The dimensions of the i-th weight matrix (the weights between the i-th layer and the (i+1)-th layer) should be (n_i, n_{i+1}), and the i-th bias matrix should have dimensions (n_{i+1}, 1). Then, going back to the multiplication operation: replace tf.multiply with tf.matmul, and you're good to go. I assume that what you have is probably fine for a single-class linear regression problem, but this is definitely the way to go if you plan to solve a multiclass regression problem or implement a deeper network.
Your weight and bias tensors have a shape of (1, 1). You give the variables the initial value np.random.randn(), which, according to the documentation, generates a single floating-point number when no arguments are given. The dimensions of your weight and bias tensors need to be supplied as arguments to np.random.randn(). Better yet, you can initialize these to random values directly in TensorFlow: W = tf.Variable(tf.random_normal([dim0, dim1], seed=seed)) (I always initialize random variables with a seed value for reproducibility).
Just a note in case you don't know this already, but non-linear activation functions are required for neural networks to be effective. If all your activations are linear, then no matter how many layers you have, the network will reduce to a simple linear regression in the end. Many people use relu activation for hidden layers. For the output layer, use softmax activation for multiclass classification problems where the output classes are exclusive (i.e., where only one class can be correct for any given input), and sigmoid activation for multiclass classification problems where the output classes are not exclusive.

Merge a forward lstm and a backward lstm in Keras

I would like to merge a forward LSTM and a backward LSTM in Keras. The input array of the backward LSTM is different from that of a forward LSTM. Thus, I cannot use keras.layers.Bidirectional.
The forward input is (10, 4).
The backward input is (12, 4), and it is reversed before being put into the model. I would like to reverse it again after the LSTM and merge it with the forward output.
The simplified model is as follows.
from lambdawithmask import Lambda as MaskLambda

def reverse_func(x, mask=None):
    return tf.reverse(x, [False, True, False])

forward = Sequential()
backward = Sequential()
model = Sequential()

forward.add(LSTM(input_shape=(10, 4), output_dim=4, return_sequences=True))
backward.add(LSTM(input_shape=(12, 4), output_dim=4, return_sequences=True))
backward.add(MaskLambda(function=reverse_func, mask_function=reverse_func))
model.add(Merge([forward, backward], mode="concat", concat_axis=1))
When I run this, the error message is:
Tensors in list passed to 'values' of 'ConcatV2' Op have types [bool, float32] that don't all match.
Could anyone help me? I coded in Python 3.5.2 with Keras (2.0.5) and the backend is tensorflow (1.2.1).
First of all, if you have two different inputs, you cannot use a Sequential model. You must use the functional API Model:
from keras.models import Model
The first two models can be sequential, no problem, but the junction must be a regular functional Model. When it's about concatenating, I also use the functional approach (create the layer, then pass it the inputs):
junction = Concatenate(axis=1)([forward.output,backward.output])
Why axis=1? You can only concatenate things with the same shape. Since you have 10 and 12 time steps, they're not compatible unless you use this exact axis for the merge, which is the second axis, considering the shapes are (BatchSize, TimeSteps, Units).
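In terms of shapes (units being whatever both LSTMs share):
# forward.output:  (batch, 10, units)
# backward.output: (batch, 12, units)
# Concatenate(axis=1) -> (batch, 22, units)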
For creating the final model, use the Model, specify the inputs and outputs:
model = Model([forward.input,backward.input], junction)
In the model to be reversed, simply use a Lambda layer. A MaskLambda does more than just the function you want. I also suggest you use the Keras backend instead of TensorFlow functions:
import keras.backend as K

# instead of the MaskLambda:
backward.add(Lambda(lambda x: K.reverse(x, axes=[1]), output_shape=(12, ?)))
Here, the ? is the number of units your LSTM layers have. See the PS at the end.
PS: I'm not sure output_dim is useful in the LSTM layer (it's output_shape that is necessary in Lambda layers, and I never use output_dim anywhere else). Shapes are natural consequences of the number of "units" you put in your layers. Strangely, you didn't specify the number of units.
PS2: How exactly do you want to concatenate two sequences with different sizes?
As said in the above answer, using the functional API offers you a lot of flexibility in the case of multi-input/output models. Here, you can simply set the go_backwards argument to True to reverse the traversal of the input vector by the LSTM layer.
I have defined the smart_merge function below, which merges the forward and backward LSTM layers together and also handles the single-traversal case.
from keras.models import Model
from keras.layers import Input, LSTM, merge

def smart_merge(vectors, **kwargs):
    return vectors[0] if len(vectors) == 1 else merge(vectors, **kwargs)

input1 = Input(shape=(10, 4), dtype='float32')
input2 = Input(shape=(12, 4), dtype='float32')

LtoR_LSTM = LSTM(56, return_sequences=False)
LtoR_LSTM_vector = LtoR_LSTM(input1)

RtoL_LSTM = LSTM(56, return_sequences=False, go_backwards=True)
RtoL_LSTM_vector = RtoL_LSTM(input2)

BidireLSTM_vector = [LtoR_LSTM_vector]
BidireLSTM_vector.append(RtoL_LSTM_vector)
BidireLSTM_vector = smart_merge(BidireLSTM_vector, mode='concat')
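Since merge with mode='concat' is deprecated in Keras 2, an equivalent using the functional concatenate helper could look like this (a sketch reusing the tensors above; wrapping them in a Model is an assumption about how you'd finish the network):
from keras.layers import concatenate
from keras.models import Model

merged = concatenate([LtoR_LSTM_vector, RtoL_LSTM_vector])  # shape (batch, 112)
model = Model(inputs=[input1, input2], outputs=merged)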

Strange behaviour sequence to sequence learning for variable length sequences

I am training a sequence to sequence model for variable length sequences with Keras, but I am running into some unexpected problems. It is unclear to me whether the behaviour I am observing is the desired behaviour of the library and why it would be.
Model Creation
I've made a recurrent model with an embeddings layer and a GRU recurrent layer that illustrates the problem. I used mask_zero=True for the embeddings layer instead of a masking layer, but changing this doesn't seem to make a difference (nor does adding a masking layer before the output):
import numpy
from keras.layers import Embedding, GRU, TimeDistributed, Dense, Input
from keras.models import Model
import keras.preprocessing.sequence
numpy.random.seed(0)
input_layer = Input(shape=(3,), dtype='int32', name='input')
embeddings = Embedding(input_dim=20, output_dim=2, input_length=3, mask_zero=True, name='embeddings')(input_layer)
recurrent = GRU(5, return_sequences=True, name='GRU')(embeddings)
output_layer = TimeDistributed(Dense(1), name='output')(recurrent)
model = Model(input=input_layer, output=output_layer)
output_weights = model.layers[-1].get_weights()
output_weights[1] = numpy.array([0.2])
model.layers[-1].set_weights(output_weights)
model.compile(loss='mse', metrics=['mse'], optimizer='adam', sample_weight_mode='temporal')
I use masking and the sample_weight parameter to exclude the padding values from the training/evaluation. I will test this model on one input/output sequence which I pad using the Keras padding function:
X = [[1, 2]]
X_padded = keras.preprocessing.sequence.pad_sequences(X, dtype='float32', maxlen=3)
Y = [[[1], [2]]]
Y_padded = keras.preprocessing.sequence.pad_sequences(Y, maxlen=3, dtype='float32')
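For reference, pad_sequences pads at the start by default, so the padded arrays come out as:
# X_padded: [[ 0.  1.  2.]]        -> shape (1, 3)
# Y_padded: [[[ 0.] [ 1.] [ 2.]]]  -> shape (1, 3, 1)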
Output Shape
Why is the output expected to be formatted in this way? Why can I not use input/output sequences that have exactly the same dimensionality? model.evaluate(X_padded, Y_padded) gives me a dimensionality error.
Then, when I run model.predict(X_padded) I get the following output (with numpy.random.seed(0) before generating the model):
[[[ 0.2 ]
[ 0.19946882]
[ 0.19175649]]]
Why isn't the first input masked for the output layer? Is the output value computed anyway (and equal to the bias, as the hidden layer values are 0)? This does not seem desirable. Adding a Masking layer before the output layer does not solve this problem.
MSE calculation
Then, when I evaluate the model (model.evaluate(X_padded, Y_padded)), it returns the Mean Squared Error (MSE) of the entire sequence (1.3168), including this first value, which I suppose is to be expected when it isn't masked, but is not what I would want.
From the Keras documentation I understand I should use the sample_weight parameter to solve this problem, which I tried:
sample_weight = numpy.array([[0, 1, 1]])
model_evaluation = model.evaluate(X_padded, Y_padded, sample_weight=sample_weight)
print(model.metrics_names, model_evaluation)
The output I get is
['loss', 'mean_squared_error'] [2.9329459667205811, 1.3168648481369019]
This leaves the metric (MSE) unaltered: it is still the MSE over all values, including the one that I wanted masked. Why? This is not what I want when I evaluate my model. It does cause a change in the loss value, which appears to be the MSE over the last two values, normalised so as not to give more weight to longer sequences.
Am I doing something wrong with the sample weights? Also, I really cannot figure out how this loss value came about. What should I do to exclude the padded values from both training and evaluation? (I assume the sample_weight parameter works the same way in the fit function.)
It was indeed a bug in the library; this issue is resolved in Keras 2.
