Keras residual connection on only part of input - keras

I already made a model without residual connection which compile and fit without any errors [using Keras Sequential API]
I wish to test a modified version just adding a residual connection like in SPEECH ENHANCEMENT BASED ON DEEP NEURAL NETWORKS WITH SKIP CONNECTIONS. So, I need to use Functional API instead.
My problem is extract a piece in the middle of inputs. I tried that.
INPUT_SIZE = N*OUTPUT_SIZE # N must be odd
HIDDEN_SIZE = N*OUTPUT_SIZE # N must be odd
modelInputs = Input(shape=(INPUT_SIZE,))
x = Dense(HIDDEN_SIZE, activation='relu', kernel_initializer=INPUT_KERNEL_INITIALIZER)(modelInputs)
for _ in np.arange(1,N_HIDDEN):
x = Dense(HIDDEN_SIZE, activation='relu', kernel_initializer=INPUT_KERNEL_INITIALIZER)(x)
Y = Dense(OUTPUT_SIZE, activation='relu', kernel_initializer=INPUT_KERNEL_INITIALIZER)(x)
# --------------------------------------------------------
# Here, 4 options I tried to get "modelInputs_selected"
# --------------------------------------------------------
# Try 1
modelInputs_selected = Lambda(lambda x: x[int(N/2)*OUTPUT_SIZE:(int(N/2)+1)*OUTPUT_SIZE])(modelInputs)
# Try 2 [Try 1 with 'output_shape' filled]
modelInputs_selected = Lambda(lambda x: x[int(N/2)*OUTPUT_SIZE:(int(N/2)+1)*OUTPUT_SIZE, :], output_shape=(OUTPUT_SIZE,))(modelInputs)
# Try 3
modelInputs_selected = K.transpose(K.gather(K.transpose(modelInputs), K.arange(int(N/2)*OUTPUT_SIZE, (int(N/2)+1)*OUTPUT_SIZE)))
# Try 4 [Try 3 unwrapped]
toto1 = K.transpose(modelInputs)
toto2 = K.gather(toto1, K.arange(int(N/2)*OUTPUT_SIZE, (int(N/2)+1)*OUTPUT_SIZE))
modelInputs_selected = K.transpose(toto2)
# --------------------------------------------------------
# End of option tried
# --------------------------------------------------------
predictions = add([modelInputs_selected, Y])
model = Model(inputs=modelInputs, outputs=predictions)
Results are:
Try 1 & Try 2:
Error = Incoherent shape during add()
Try 3 & Try 4:
Good shapes for add()
Error with Model(...)
I gone into Model() step by step. We go backward starting with the last layer
Output add() OK
Previous one K.transpose(): Error AttributeError: 'Tensor' object has no attribute '_keras_history' in "build_map_of_graph"
The model construction failed because I use a function from the backend (TensorFlow, for me)?
Anyone can help, please?
Maybe if I use a multiply()?
modelInputs is (m, N*OUTPUT_SIZE) and modelInputs_selected is (m,OUTPUT_SIZE)
With the good matrix A (N*OUTPUT_SIZE, OUTPUT_SIZE): modelInputs_selected = multiply(modelInputs, A)

Error in try 1
You're ignoring the samples dimension (the first dimension).
The tensor x should come in with shape (samples_or_batch_size, INPUT_SIZE).
Solution:
So you need lambda x: x[:, int(N/2)*OUTPUT_SIZE:(int(N/2)+1)*OUTPUT_SIZE]
But honestly, why not one of these?
lambda x: x[:,:OUTPUT_SIZE]
lambda x: x[:,-OUTPUT_SIZE:]
Is it a problem taking from the beginning or from the end? This even frees you from using an N, the only condition will be that INPUT_SIZE >= OUTPUT_SIZE.
Error in try 2
You're not ignoring the samples, but you're inverting the dimensions.
It should be x[:,expression] instead of x[expression,:].
You only need to declare the output shape if you're not using Tensorflow (Tensorflow can do it automatically for you)
Errors in 3 and 4
You cannot use any function outside of a layer.
There is no problem if you use either Keras backend or Tensorflow functions, but they must be inside layers, such as in try 1 and 2.
The no _keras_history is a typical error for using functions outside of layers.

Related

Tensorflow: why is the outcome of single neural network node without activation function different then my own calculation?

i created a single node with 3 inputs and one output with bias 0 and no activation function.
as far as i understand, the only thing that happens here is a matrix multiplication between the input vector and the randomly initialized weights but when i do the multiplication myself with the same inputs and weights i get a different outcome? what am i missing/doing wrong?
thanks in advance!
i base my calculation on some code provided here
here is the code:
def example_code(self):
import tensorflow as tf
data = [[1.0,2.0,3.0]]
x = tf.placeholder(tf.float32,shape=[1,3],name="mydata")
node = tf.layers.Dense(units=1)
y = node(x)
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
print("input: "+str(data))
outcome = sess.run(y,feed_dict={x:data})
#print("outcome from tensorflow: "+str(outcome))
weights = node.get_weights()[0]
bias = node.get_weights()[1]
print("weights: "+str(weights))
print("bias: "+str(bias))
print("outcome from tensorflow: " + str(outcome))
outcome = tf.matmul(data,weights)
print("manually calculated outcome: "+str(sess.run(outcome)))
output from code:
input: [[1.0, 2.0, 3.0]]
weights: [[ 0.72705185] [-0.70188504] [ 0.5336163 ]]
bias: [0.]
outcome from tensorflow: [[-1.3463312]]
manually calculated outcome: [[0.9241307]]
The problem is that tf.layers is not using uses is not using your session sess. This in turn results in different initializations for the weights, hence the two different values. tf.layers ends up using tf.keras.backend.get_session() to retrieve the session used for initialization and retrieval of weights (node.get_weights()). tf.keras.backend.get_session() tries to use the default session if there is one, and if there is not then it creates its own session. In this case, sess is not configured as default session (only tf.InteractiveSession gets automatically configured as default session on construction). The simplest fix is to use tf.Session in the recommended way, as a context manager:
def example_code(self):
import tensorflow as tf
with tf.Session() as sess:
data = [[1.0,2.0,3.0]]
x = tf.placeholder(tf.float32,shape=[1,3],name="mydata")
node = tf.layers.Dense(units=1)
y = node(x)
init = tf.global_variables_initializer()
sess.run(init)
print("input: "+str(data))
outcome = sess.run(y,feed_dict={x:data})
#print("outcome from tensorflow: "+str(outcome))
weights = node.get_weights()[0]
bias = node.get_weights()[1]
print("weights: "+str(weights))
print("bias: "+str(bias))
print("outcome from tensorflow: " + str(outcome))
outcome = tf.matmul(data,weights)
print("manually calculated outcome: "+str(sess.run(outcome)))
This will set sess as default session, and also it will make sure its resources are freed when the function is finished (which was another issue in your code). If for whatever reason you want to use some session as default but do not want to close it with the context manager, you can just use as_default():
def example_code(self):
import tensorflow as tf
sess = tf.Session():
with sess.as_default():
data = [[1.0,2.0,3.0]]
x = tf.placeholder(tf.float32,shape=[1,3],name="mydata")
node = tf.layers.Dense(units=1)
y = node(x)
init = tf.global_variables_initializer()
sess.run(init)
print("input: "+str(data))
outcome = sess.run(y,feed_dict={x:data})
#print("outcome from tensorflow: "+str(outcome))
weights = node.get_weights()[0]
bias = node.get_weights()[1]
print("weights: "+str(weights))
print("bias: "+str(bias))
print("outcome from tensorflow: " + str(outcome))
outcome = tf.matmul(data,weights)
print("manually calculated outcome: "+str(sess.run(outcome)))
# You need to manually ensure that the session gets closed after
sess.close()

Reinforce learning - how to teach a neuronal network avoid actions already chosen during the episode?

I built a custom Open AI Gym environment in which I have 13 different actions and and 33 observation items. During an episode every action can be used, but it can be used only once otherwise the episode ends. Thus the maximum lenght of an episode is 13.
I tried to train several neuronal network for this, but so far the NN did not learned it well and it ends much prior the 13rd step. The last layer of the NN is a softmax layer with 13 neurons.
Do you have any idea how an NN would look like which could learn to choose from 13 actions one-by-one?
Kind regards,
Ferenc
I found something interesting in this topic
https://ai.stackexchange.com/questions/7755/how-to-implement-a-constrained-action-space-in-reinforcement-learning
Will check if the 'do-nothing' idea helps ...
At the end I wrote this code:
from keras import backend as K
import tensorflow as tf
def mask_output2(x):
inp, soft_out = x
# add a very small value in order to avoid having 0 everywhere
c = K.constant(0.0000001, dtype='float32', shape=(32, 13))
y = soft_out + c
y = Lambda(lambda x: K.switch(K.equal(x[0],0), x[1], K.zeros_like(x[1])))([inp, soft_out])
y_sum = K.sum(y, axis=-1)
y_sum_corrected = Lambda(lambda x: K.switch(K.equal(x[0],0), K.ones_like(x[0]), x[0] ))([y_sum])
y_sum_corrected = tf.divide(1,y_sum_corrected)
y = tf.einsum('ij,i->ij', y, y_sum_corrected)
return y
It simply corrects the sigmoid result in order to clear (set to 0) those neurons where the inp tensor is set to 1 (showing an action already used).

Apply softmax on a subset of neurons

I'm building a convolutional net in Keras that assigns multiple classes to an image. Given that the image has 9 points of interest that can be classified in one of the three ways I wanted to add 27 output neurons with softmax activation that would compute probability for each consecutive triple of neurons.
Is it possible to do that? I know I can simply add a big softmax layer but this would result in a probability distribution over all output neurons which is too broad for my application.
In the most naive implementation, you can reshape your data and you'll get exactly what you described: "probability for each consecutive triplet".
You take the output with 27 classes, shaped like (batch_size,27) and reshape it:
model.add(Reshape((9,3)))
model.add(Activation('softmax'))
Take care to reshape your y_true data as well. Or add yet another reshape in the model to restore the original form:
model.add(Reshape((27,))
In more elaborate solutions, you'd probably separate the 9 points of insterest according to their locations (if they have a roughly static location) and make parallel paths. For instance, suppose your 9 locations are evenly spaced rectangles, and you want to use the same net and classes for those segments:
inputImage = Input((height,width,channels))
#supposing the width and height are multiples of 3, for easiness in this example
recHeight = height//3
recWidth = width//3
#create layers here without calling them
someConv1 = Conv2D(...)
someConv2 = Conv2D(...)
flatten = Flatten()
classificator = Dense(..., activation='softmax')
outputs = []
for i in range(3):
for j in range(3):
fromH = i*recHeight
toH = fromH + recHeight
fromW = j*recWidth
toW = fromW + recWidth
imagePart = Lambda(
lambda x: x[:,fromH:toH, fromW:toW,:],
output_shape=(recHeight,recWidth,channels)
)(inputImage)
#using the same net and classes for all segments
#if this is not true, create new layers here instead of using the same
output = someConv1(imagePart)
output = someConv2(output)
output = flatten(output)
output = classificator(output)
outputs.append(output)
outputs = Concatenate()(outputs)
model = Model(inputImage,outputs)

Keras 1.0: getting intermediate layer output

I am currently trying to visualize the output of an intermediate layer in Keras 1.0 (which I could do with Keras 0.3) but it does not work anymore.
x = model.input
y = model.layers[3].output
f = theano.function([x], y)
But I get the following error:
MissingInputError: ("An input of the graph, used to compute DimShuffle{x,x,x,x}(keras_learning_phase), was not provided and not given a value.Use the Theano flag exception_verbosity='high',for more information on this error.", keras_learning_phase)
Prior to Keras 1.0, with my graph model, I could just do:
x = graph.inputs['input'].input
y = graph.nodes[layer].get_output(train=False)
f = theano.function([x], y, allow_input_downcast=True)
So I suspect it to come from the "train=False" parameter which I don't know how to set in the new version.
Thank you for your help
Try:
In the import statements first give
from keras import backend as K
from theano import function
then
f = K.function([model.layers[0].input, K.learning_phase()],
[model.layers[3].output])
# output in test mode = 0
layer_output = get_3rd_layer_output([X_test, 0])[0]
# output in train mode = 1
layer_output = get_3rd_layer_output([X_train, 1])[0]
This was just answered by François Chollet on github:
Your model apparently has a different behavior in training and test mode, and so needs to know what mode it should be using.
Use
iterate = K.function([input_img, K.learning_phase()], [loss, grads])
and pass 1 or 0 as value for the learning phase, based on whether you want the model in training mode or test mode.
https://github.com/fchollet/keras/issues/2417

LSTMLayer produces NaN values even before training it

I'm currently trying to construct a LSTM network with Lasagne to predict the next step of noisy sequences. I first trained a stack of 2 LSTM layers for a while, but had to use an abysmally small learning rate (1e-6) because of divergence issues (that ultimately produced NaN values). The results were kind of disappointing, as the network produced smooth, out-of-phase versions of the input.
I then came to the conclusion I should use better parameter initialization than what is given by default. The goal was to start from a network that just mimics identity, since for strongly auto-correlated signal it should be a good first estimation of the next step (x(t) ~ x(t+1)), and to sprinkle a bit of noise on top of it.
import theano, numpy, lasagne
from theano import tensor as T
from lasagne.layers.recurrent import LSTMLayer, InputLayer, Gate
from lasagne.layers import DropoutLayer
from lasagne.nonlinearities import sigmoid, tanh, leaky_rectify
from lasagne.layers import get_output
from lasagne.init import GlorotNormal, Normal, Constant
floatX = 'float32'
# function to create a lstm that ~ propagate the input from start to finish off the bat
# should be a good start for a predictive lstm with high one-step autocorrelation
def create_identity_lstm(input, shape, orig_inp=None, noiselvl=0.01, G=10., mask_input=None):
inp, out = shape
# orig_inp is used to limit the number of units that are actually used to pass the input information from one layer to the other - the rest of the units should produce ~ 0 activation.
if orig_inp is None:
orig_inp = inp
# input gate
inputgate = Gate(
W_in=GlorotNormal(noiselvl),
W_hid=GlorotNormal(noiselvl),
W_cell=Normal(noiselvl),
b=Constant(0.),
nonlinearity=sigmoid
)
# forget gate
forgetgate = Gate(
W_in=GlorotNormal(noiselvl),
W_hid=GlorotNormal(noiselvl),
W_cell=Normal(noiselvl),
b=Constant(0.),
nonlinearity=sigmoid
)
# cell gate
cell = Gate(
W_in=GlorotNormal(noiselvl),
W_hid=GlorotNormal(noiselvl),
W_cell=None,
b=Constant(0.),
nonlinearity=leaky_rectify
)
# output gate
outputgate = Gate(
W_in=GlorotNormal(noiselvl),
W_hid=GlorotNormal(noiselvl),
W_cell=Normal(noiselvl),
b=Constant(0.),
nonlinearity=sigmoid
)
lstm = LSTMLayer(input, out, ingate=inputgate, forgetgate=forgetgate, cell=cell, outgate=outputgate, nonlinearity=leaky_rectify, mask_input=mask_input)
# change matrices and biases
# ingate - should return ~1 (matrices = 0, big bias)
b_i = lstm.b_ingate.get_value()
b_i[:orig_inp] += G
lstm.b_ingate.set_value(b_i)
# forgetgate - should return 0 (matrices = 0, big negative bias)
b_f = lstm.b_forgetgate.get_value()
b_f[:orig_inp] -= G
b_f[orig_inp:] += G # to help learning future features, I preserve a large bias on "unused" units to help it remember stuff
lstm.b_forgetgate.set_value(b_f)
# cell - should return x(t) (W_xc = identity, rest is 0)
W_xc = lstm.W_in_to_cell.get_value()
for i in xrange(orig_inp):
W_xc[i, i] += 1.
lstm.W_in_to_cell.set_value(W_xc)
# outgate - should return 1 (same as ingate)
b_o = lstm.b_outgate.get_value()
b_o[:orig_inp] += G
lstm.b_outgate.set_value(b_o)
# done
return lstm
I then use this lstm generation code to generate the following network:
# layers
#input + dropout
input = InputLayer((None, None, 7), name='input')
mask = InputLayer((None, None), name='mask')
drop1 = DropoutLayer(input, p=0.33)
#lstm1 + dropout
lstm1 = create_identity_lstm(drop1, (7, 1024), mask_input=mask)
drop2 = DropoutLayer(lstm1, p=0.33)
#lstm2 + dropout
lstm2 = create_identity_lstm(drop2, (1024, 128), orig_inp=7, mask_input=mask)
drop3 = DropoutLayer(lstm2, p=0.33)
#lstm3
lstm3 = create_identity_lstm(drop3, (128, 7), orig_inp=7, mask_input=mask)
# symbolic variables and prediction
x = input.input_var
ma = mask.input_var
ma_reshape = ma.dimshuffle((0,1,'x'))
yhat = get_output(lstm3, deterministic=False)
yhat_det = get_output(lstm3, deterministic=True)
y = T.ftensor3('y')
predict = theano.function([x, ma], yhat_det)
Problem is, even without any training, this network produces garbage values and sometimes even a bunch of NaNs, right from the very first LSTM layer:
X = numpy.random.random((5, 10000, 7)).astype('float32')
Masks = numpy.ones(X.shape[:2], dtype='float32')
hid1 = get_output(lstm1, determistic=True)
get_hid1 = theano.function([x, ma], hid1)
h1 = get_hid1(X, Masks)
print numpy.isnan(h1).sum(axis=1).sum(axis=1)
array([6379520, 6367232, 6377472, 6376448, 6378496])
# even the first output value is garbage!
print h1[:,0,0] - X[:,0,0]
array([-0.03898358, -0.10118812, 0.34877831, -0.02509735, 0.36689138], dtype=float32)
I don't get why, I checked each matrices and their values are fine, like I wanted them to be. I even tried to recreate each gate activations and the resulting hidden activations using the actual numpy arrays and they reproduce the input just fine. What did I do wrong there??

Resources