Keras specification of y

I am trying to understand the following snippet from the Keras documentation. What is the logic of specifying y as a Dense(...) call followed by '(x)'? I am not sure about the purpose of this statement.
# this is a logistic regression in Keras
x = Input(shape=(32,))
y = Dense(16, activation='softmax')(x)
model = Model(x, y)

It is quite simple: in this case y is the output of a Dense layer with 16 units and a softmax activation, given the input x. The Dense(...) call builds the layer object, and calling it on x connects that layer to the input tensor and returns its output tensor.
This is the Keras functional API: because layers are callables that map tensors to tensors, you can specify models with multiple inputs and outputs easily in code.
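For example, here is a minimal sketch of a model with two inputs and two outputs built the same way (the shapes and layer sizes are illustrative, not from the original snippet):
from keras.layers import Input, Dense, Concatenate
from keras.models import Model
x1 = Input(shape=(32,))
x2 = Input(shape=(8,))
h = Concatenate()([x1, x2])              # join the two input tensors
h = Dense(64, activation='relu')(h)      # shared hidden layer
y1 = Dense(16, activation='softmax')(h)  # first output head
y2 = Dense(1, activation='sigmoid')(h)   # second output head
model = Model([x1, x2], [y1, y2])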

Related

Tensorflow 1.15 / Keras 2.3.1 Model.train_on_batch() returns more values than there are outputs/loss functions

I am trying to train a model that has more than one output and, as a result, more than one loss function attached to it when I compile it.
I haven't done anything similar in the past (not from scratch, at least).
Here's some code I am using to figure out how this works.
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.models import Model
import numpy as np  # added: needed for the data creation below
batch_size = 50
input_size = 10
output_size = 4  # added: used below but never defined in the original snippet
i = Input(shape=(input_size,))
x = Dense(100)(i)
x_1 = Dense(output_size)(x)
x_2 = Dense(output_size)(x)
model = Model(i, [x_1, x_2])
model.compile(optimizer='adam', loss=["mse", "mse"])
# Data creation
x = np.random.random_sample([batch_size, input_size]).astype('float32')
y = np.random.random_sample([batch_size, output_size]).astype('float32')
loss = model.train_on_batch(x, [y,y])
print(loss) # sample output [0.8311912, 0.3519104, 0.47928077]
I would expect the variable loss to have two entries (one for each loss function); however, I get back three. I thought maybe one of them is the weighted average, but that does not look to be the case.
Could anyone explain how passing in multiple loss functions works? Obviously, I am misunderstanding something.
I believe the three returned values are the total loss (the sum of all the individual losses), followed by the individual loss on each output.
For example, if you look at the sample output you've printed there:
0.3519104 + 0.47928077 = 0.83119117 ≈ 0.8311912
Your assumption that there should be two losses is incorrect. You have a model with two outputs, and you specified one loss for each output, but the model has to be trained on a single scalar loss, so Keras trains the model on a new loss that is the sum of the per-output losses.
You can control how these losses are mixed using the loss_weights parameter in model.compile; by default each weight is 1.0.
So in the end, what train_on_batch returns is the total loss, the MSE of the first output, and the MSE of the second output. That is why you get three values.
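For illustration, here is a sketch of how the mixing can be changed (the 0.7/0.3 split is an arbitrary choice, just to show the parameter):
model.compile(optimizer='adam', loss=["mse", "mse"], loss_weights=[0.7, 0.3])
# total loss = 0.7 * mse(output_1) + 0.3 * mse(output_2)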

How to obtain the Jacobian Matrix with respect to the inputs of a keras model neural network?

I recently started learning and using automatic differentiation to determine the gradients and Jacobian matrix of a neural network with respect to a given input. The methods suggested by TensorFlow are tape.gradient and tape.jacobian. However, I am not able to obtain the Jacobian matrix using this method, apparently due to a bug in TensorFlow. Computing tape.gradient(y_pred, x) works, but not the Jacobian matrix, which should have a shape of (200, 3). I am open to other ways to calculate the Jacobian matrix, but I am more inclined to use automatic differentiation methods within TensorFlow. The version I am using is TensorFlow 2.1.0. I would greatly appreciate any advice!
import tensorflow as tf
import numpy as np
# The neural network accepts 3 inputs and produces 200 outputs. The actual values of the inputs and outputs are not written in the code as it is too involved.
num_inputs = 3
num_outputs = 200
num_hidden_layers = 5
num_neurons = 50
kernel = 'he_uniform'
activation = tf.keras.layers.LeakyReLU(alpha=0.3)
# Details of model (MLP)
current_model = tf.keras.models.Sequential()
current_model.add(tf.keras.Input(shape=(num_inputs,)))
for i in range(num_hidden_layers):
    current_model.add(tf.keras.layers.Dense(units=num_neurons, activation=activation, kernel_initializer=kernel))
current_model.add(tf.keras.layers.Dense(units=num_outputs, activation='linear', kernel_initializer=kernel))
# Finding the Jacobian matrix with respect to a given input of the neural network
# In this case, the inputs are [0.02, 0.4, 0.12] (i.e. 3 inputs)
x = tf.Variable([[0.02, 0.4, 0.12]], dtype=tf.float32)
with tf.GradientTape() as tape:
    y_pred = x
    for layer in current_model.layers:
        y_pred = layer(y_pred)
jacobian = tape.jacobian(y_pred, x)
print(jacobian)
Below is the error returned. I removed some parts for privacy purposes.
StagingError: in converted code:
C:\Users\...\anaconda3\envs\tf\lib\site-packages\tensorflow_core\python\ops\parallel_for\control_flow_ops.py:183 f *
return _pfor_impl(loop_fn, iters, parallel_iterations=parallel_iterations)
C:\Users\...\anaconda3\envs\tf\lib\site-packages\tensorflow_core\python\ops\parallel_for\control_flow_ops.py:256 _pfor_impl
outputs.append(converter.convert(loop_fn_output))
C:\Users\...\anaconda3\envs\tf\lib\site-packages\tensorflow_core\python\ops\parallel_for\pfor.py:1280 convert
output = self._convert_helper(y)
C:\Users\...\anaconda3\envs\tf\lib\site-packages\tensorflow_core\python\ops\parallel_for\pfor.py:1453 _convert_helper
if flags.FLAGS.op_conversion_fallback_to_while_loop:
C:\Users\...\anaconda3\envs\tf\lib\site-packages\tensorflow_core\python\platform\flags.py:84 __getattr__
wrapped(_sys.argv)
C:\Users\...\anaconda3\envs\tf\lib\site-packages\absl\flags\_flagvalues.py:633 __call__
name, value, suggestions=suggestions)
UnrecognizedFlagError: Unknown command line flag 'f'
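For what it's worth, the traceback above points into the pfor (parallel-for) converter that tape.jacobian uses by default, and tape.jacobian accepts an experimental_use_pfor argument. A possible workaround (an assumption on my part, not verified against this exact setup) is to fall back to the while_loop implementation:
with tf.GradientTape() as tape:
    y_pred = current_model(x)
# experimental_use_pfor=False avoids the pfor converter the trace points at
jacobian = tape.jacobian(y_pred, x, experimental_use_pfor=False)
print(jacobian)  # expected shape (1, 200, 1, 3), reshapable to (200, 3)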

Training many-to-many stateful LSTM with and without final dense layer

I am trying to train a recurrent model in Keras containing an LSTM for regression purposes.
I would like to use the model online and, as far as I understood, I need to train a stateful LSTM.
Since the model has to output a sequence of values, I hope the loss is computed on each of the expected output vectors.
However, I fear my code does not work this way, and I would be grateful if anyone could help me understand whether I am doing this right or whether there is a better approach.
The input to the model is a sequence of 128-dimensional vectors. Each sequence in the training set has a different length.
At each time, the model should output a vector of 3 elements.
I am trying to train and compare two models:
A) a simple LSTM with 128 inputs and 3 outputs;
B) a simple LSTM with 128 inputs and 100 outputs + a dense layer with 3 outputs;
For model A) I wrote the following code:
# Model
model = Sequential()
model.add(LSTM(3, batch_input_shape=(1, None, 128), return_sequences=True, activation="linear", stateful=True))
model.compile(loss='mean_squared_error', optimizer=Adam())
# Training
for i in range(n_epoch):
    for j in np.random.permutation(n_sequences):
        X = data[j]  # j-th sequence
        X = X[np.newaxis, ...]  # X has size 1 x NTimes x 128
        Y = dataY[j]  # Y has size NTimes x 3
        history = model.fit(X, Y, epochs=1, batch_size=1, verbose=0, shuffle=False)
        model.reset_states()
With this code, model A) seems to train fine because the output sequence approaches the ground-truth sequence on the training set.
However, I wonder if the loss is really computed by considering all NTimes output vectors.
For model B), I could not find any way to get the entire output sequence due to the dense layer. Hence, I wrote:
# Model
model = Sequential()
model.add(LSTM(100, batch_input_shape=(1, None, 128), stateful=True))
model.add(Dense(3, activation="linear"))
model.compile(loss='mean_squared_error', optimizer=Adam())
# Training
for i in range(n_epoch):
    for j in np.random.permutation(n_sequences):
        X = data[j]  # j-th sequence
        X = X[np.newaxis, ...]  # X has size 1 x NTimes x 128
        Y = dataY[j]  # Y has size NTimes x 3
        loss = 0  # added: loss must be initialized before accumulating
        for h in range(X.shape[1]):
            x = X[0, h, :]
            x = x[np.newaxis, np.newaxis, ...]  # h-th vector in j-th sequence
            y = Y[h, :]
            y = y[np.newaxis, ...]
            loss += model.train_on_batch(x, y)
        model.reset_states()  # after the end of the sequence
With this code, model B) does not train well. The training does not seem to converge, and the loss values increase and decrease cyclically.
I have also tried using only the last vector as Y and then calling fit on the whole training sequence X, but with no improvement.
Any idea? Thank you!
If you still want three outputs per step of your sequence, you need to wrap your Dense layer in TimeDistributed, like so:
model.add(TimeDistributed(Dense(3, activation="linear")))
This applies the dense layer to each timestep independently.
See https://keras.io/layers/wrappers/#timedistributed
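Putting it together, model B with the full output sequence kept would look something like this (a sketch reusing the sizes from the question):
from keras.models import Sequential
from keras.layers import LSTM, Dense, TimeDistributed
from keras.optimizers import Adam
model = Sequential()
# return_sequences=True keeps one 100-dim output per timestep (1 x NTimes x 100)
model.add(LSTM(100, batch_input_shape=(1, None, 128), return_sequences=True, stateful=True))
# TimeDistributed applies the same Dense(3) to every timestep independently
model.add(TimeDistributed(Dense(3, activation="linear")))
model.compile(loss='mean_squared_error', optimizer=Adam())
The model A training loop (fit on the whole sequence) can then be reused; note that Y would need a leading batch axis (Y[np.newaxis, ...]) to match the 1 x NTimes x 3 output.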

output layer regularization implementation

I’m building a NN model using Keras, and I wish to impose a constraint on it that doesn’t (directly) have to do with the weights. I would be very grateful for some help, or for pointers towards some relevant keywords to look up. The constraint I wish to impose is a bit complex, but it can be simplified in the following manner: I wish to impose a constraint on the net's output for certain inputs. For the sake of simplicity, let’s say the constraint looks like NN(3) + NN(4) < 10, where NN is the neural net, which can be seen as a function. How can I impose such a constraint? Thank you very much in advance for any help on the subject!
edit: A more detailed explanation of what I'm trying to do and why.
The theoretical model I'm building is this:
I'm feeding the output of the first net into the input of the second net, along with additive Gaussian noise.
The constraint I wish to impose is on the output of the first NN (g). Why? Without a constraint, the net maps the inputs to outputs as high as it possibly can in order to make the additive noise as insignificant as possible. And rightly so; this is the optimal encoding function g, but it's not very interesting :) So I wish to impose a constraint on the output of the first NN (g). More specifically, the constraint is on the total power of the function: integral{ fX(x) * g(x)^2 dx }. But this can be simplified, more or less, to a function that looks something like what I described earlier: g(3) + g(4) < 10. More specifically, the constraint is sum_i { fX(x_i) * g(x_i)^2 * dx } < max_power, for some sampled inputs x_i.
This is the problem; now here's how I attempted to implement it:
from keras.models import Sequential
from keras.layers import Dense, GaussianNoise
# nvar is the noise level, defined elsewhere in the original code
model = Sequential([
    Dense(300, input_dim=1, activation='relu'),
    Dense(300, activation='relu'),
    Dense(1, activation='linear', name='encoder_output'),
    GaussianNoise(nvar, name='noise'),
    Dense(300, activation='relu', name='decoder_input'),
    Dense(300, activation='relu'),
    Dense(1, activation='linear', name='decoder_output'),
])
Mainly, this is supposedly a single neural net, and not really two (although there is obviously no practical difference).
The important things to note are the input dim 1, the output dim 1 (x and y in the diagram), and the Gaussian noise in the middle. The hidden layers are not very interesting right now; I'll optimize them at a later point.
In this model, I wish to impose a constraint on the output of a (supposedly) hidden layer named encoder_output. Hope this clarifies things.
You could use a multi-input/multi-output model with shared-weight layers. The model could, for example, look like this:
from keras.layers import Input, Dense, Add
from keras.models import Model
# Shared weights layers
hidden = Dense(10, activation='relu')
nn_output = Dense(1, activation='relu')
x1 = Input(shape=(1,))
h1 = hidden(x1)
y1 = nn_output(h1)
x2 = Input(shape=(1,))
h2 = hidden(x2)
y2 = nn_output(h2)
# Your constraint
# In case it should be more complicated, you can implement
# a custom keras layer
constraint_sum = Add()([y1, y2])  # renamed from `sum` to avoid shadowing the Python builtin
model = Model(inputs=[x1, x2], outputs=[y1, y2, constraint_sum])
model.compile(optimizer='sgd', loss='mse')
X_train_1 = [3,4]
X_train_2 = [4,3]
y_train_1 = [123,456] # your expected output
y_train_2 = [456,123] # your expected output
s = [10,10] # expected sums
model.fit([X_train_1, X_train_2], [y_train_1, y_train_2, s], epochs=10)
If you have no exact value for your constraint that can be used as an expected output, you can remove it from the outputs and write a simple custom regularizer that would be used on it. There is a simple example for a custom regularizer in the Keras documentation.
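A sketch of that regularizer route, assuming the "power" is measured as the mean squared activation of the encoder output and using an arbitrary penalty weight (both are my assumptions, not from the answer):
from keras import backend as K
from keras.layers import Dense

def power_regularizer(max_power, weight=0.01):
    # Hypothetical activity regularizer: penalizes the mean squared
    # activation ("power" of g) only when it exceeds max_power
    def reg(activations):
        power = K.mean(K.square(activations))
        return weight * K.maximum(0.0, power - max_power)
    return reg

# Attached to the encoder output layer from the question:
encoder_out = Dense(1, activation='linear', name='encoder_output',
                    activity_regularizer=power_regularizer(10.0))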

Concatenate outputs of LSTM in Keras

I intend to feed all the timestep outputs from an LSTM to a fully-connected layer. However, the following code fails. How can I reduce the 3D output of the LSTM to 2D by concatenating the output of each timestep?
X = LSTM(units=128,return_sequences=True)(input_sequence)
X = Dropout(rate=0.5)(X)
X = LSTM(units=128,return_sequences=True)(X)
X = Dropout(rate=0.5)(X)
X = Concatenate()(X)
X = Dense(n_class)(X)
X = Activation('softmax')(X)
You can use the Flatten layer to flatten the 3D output of the LSTM layer to a 2D shape.
As a side note, it is better to use dropout and recurrent_dropout arguments of LSTM layer instead of using Dropout layer directly with recurrent layers.
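A sketch of the fixed code (Flatten requires a fixed number of timesteps, so this assumes input_sequence has a known length); it also uses the dropout arguments mentioned above instead of separate Dropout layers:
X = LSTM(units=128, return_sequences=True, dropout=0.5, recurrent_dropout=0.5)(input_sequence)
X = LSTM(units=128, return_sequences=True, dropout=0.5, recurrent_dropout=0.5)(X)
X = Flatten()(X)  # (batch, timesteps, 128) -> (batch, timesteps * 128)
X = Dense(n_class)(X)
X = Activation('softmax')(X)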
In addition to @today's answer:
It seems like you want to use return_sequences just to concatenate the outputs into a dense layer. If you have not already tried it with return_sequences=False, I would recommend you do so. The main purpose of return_sequences is to stack LSTMs or to make seq2seq predictions. In your case it should be enough to just use the LSTM's final output.
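That alternative would look something like this (only the last timestep's output is kept, which is already 2D):
X = LSTM(units=128, return_sequences=True)(input_sequence)
X = Dropout(rate=0.5)(X)
X = LSTM(units=128)(X)  # return_sequences=False by default: output shape (batch, 128)
X = Dropout(rate=0.5)(X)
X = Dense(n_class)(X)
X = Activation('softmax')(X)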
