Restricting the output values of layers in Keras - keras

I have defined my MLP in the code below. I want to extract the values of layer_2.
def gater(self):
    dim_inputs_data = Input(shape=(self.train_dim[1],))
    dim_svm_yhat = Input(shape=(3,))
    layer_1 = Dense(20,
                    activation='sigmoid')(dim_inputs_data)
    layer_2 = Dense(3, name='layer_op_2',
                    activation='sigmoid', use_bias=False)(layer_1)
    layer_3 = Dot(1)([layer_2, dim_svm_yhat])
    out_layer = Dense(1, activation='tanh')(layer_3)
    model = Model(input=[dim_inputs_data, dim_svm_yhat], output=out_layer)
    adam = optimizers.Adam(lr=0.01)
    model.compile(loss='mse', optimizer=adam, metrics=['accuracy'])
    return model
Suppose the output of layer_2 is below in matrix form
0.1 0.7 0.8
0.1 0.8 0.2
0.1 0.5 0.5
....
I would like below to be fed into layer_3 instead of above
0 0 1
0 1 0
0 1 0
Basically, I want the first maximum value in each row to be converted to 1 and all other values to 0.
How can this be achieved in Keras?

Who decides the range of output values?
The output range of any layer in a neural network is determined by the activation function used for that layer. For example, if you use tanh as your activation function, your output values will be restricted to (-1, 1). The values are continuous: tanh maps inputs in (-inf, +inf) on the x-axis to outputs in (-1, 1) on the y-axis, and understanding this mapping is very important.
What you should do is add a custom activation function that behaves like a step function, i.e. maps every input in (-inf, +inf) to either 0 or 1, and apply it to that layer.
How do I know which function to use?
You need to define a function y = some_function(x) that gives the input-to-output mapping you need, and convert it to Python code like this:
from keras import backend as K
# threshold defaults to 0.0 (an assumed value) so the function can be passed directly as activation=...
def binaryActivationFromTanh(x, threshold=0.0):
    # squash (-inf, +inf) into (-1, 1)
    # you can skip this step if your threshold is meant for the raw values in (-inf, +inf)
    activated_x = K.tanh(x)
    # elementwise comparison, gives a boolean tensor
    binary_activated_x = K.greater(activated_x, threshold)
    # cast the boolean tensor to the Keras default float type
    return K.cast(binary_activated_x, K.floatx())
After making your custom activation function, you can use it like
x = Input(shape=(1000,))
y = Dense(10, activation=binaryActivationFromTanh)(x)
Now test the values and see if you are getting the values like you expected. You can now throw this piece into a bigger neural network.
I strongly discourage adding new layers to add restriction to your outputs, unless it is solely for activation (like keras.layers.LeakyReLU).

Use Numpy in between. Here is an example with a random matrix:
a = np.random.random((5, 5)) # simulate random value output of your layer
result = (a == a.max(axis=1)[:,None]).astype(int)
See also this thread: Numpy: change max in each row to 1, all other numbers to 0
You then feed result as input to your next layer.
For wrapping the Numpy calculation you could use the Lambda layer. See examples here: https://keras.io/layers/core/#lambda
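The mapping the question asks for (the first maximum in each row becomes 1, everything else 0) can be sketched in plain NumPy as follows. This only illustrates the mapping on an array that has already been computed; the helper name one_hot_first_max is made up for this example. Unlike the equality test above, which marks every tied maximum, np.argmax returns the first maximum in each row.

```python
import numpy as np

def one_hot_first_max(a):
    """Return an int array with 1 at the first maximum of each row, 0 elsewhere."""
    a = np.asarray(a)
    # argmax returns the index of the FIRST maximum along the axis
    idx = np.argmax(a, axis=1)
    out = np.zeros(a.shape, dtype=int)
    # set exactly one entry per row
    out[np.arange(a.shape[0]), idx] = 1
    return out

# The example rows from the question; the last row contains a tie (0.5, 0.5)
layer_2_output = np.array([[0.1, 0.7, 0.8],
                           [0.1, 0.8, 0.2],
                           [0.1, 0.5, 0.5]])
result = one_hot_first_max(layer_2_output)
```

For the three rows shown in the question this gives the rows 0 0 1, 0 1 0 and 0 1 0.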
Edit:
This suggestion doesn't work. I am keeping the answer only to preserve the related comments.

Related

Input shape for 1D convolution network in keras

I am quite new to Keras and I have trouble understanding shapes.
I want to create a 1D convolutional Keras model as follows; I don't know whether it is correct or not:
TIME_PERIODS = 511
num_sensors = 2
num_classes = 4
BATCH_SIZE = 400
EPOCHS = 50
model_m = Sequential()
model_m.add(Conv1D(100, 10, activation='relu', input_shape=(TIME_PERIODS, num_sensors)))
model_m.add(Conv1D(100, 10, activation='relu'))
model_m.add(MaxPooling1D(3))
model_m.add(Conv1D(160, 10, activation='relu'))
model_m.add(Conv1D(160, 10, activation='relu'))
model_m.add(GlobalAveragePooling1D())
model_m.add(Dropout(0.5))
model_m.add(Dense(num_classes, activation='softmax'))
The input data I have is 888 different pandas DataFrames, each of shape (511, 3), where 511 is the number of signal points, the 0th column holds sensor1 values, the 1st column holds sensor2 values, and the 2nd column holds the signal labels.
How should I combine all 888 DataFrames so that I can get x_train and y_train from X and Y using sklearn's train_test_split?
Also, I think the input shape I am defining for the model is wrong, and I don't think I actually have TIME_PERIODS because, for each time point, I have 2 sensor input values (orange and blue lines) and 1 output label (green line).
The context of the problem I am trying to solve, e.g.
input: time-based values from 2 sensors, say for the hour 1 AM-2 AM from a user; output: the time ranges in which the user was doing activity 1, activity 2, activity X, e.g. 1:10-1:15, 1:15-1:30, 1:30-2:00. The plot above shows a sample training input and output.
The problem is inspired by the one here, but in my case I don't have any time period; each time point has 1 output label.
Update 1:
I am almost certain that my TIME_PERIODS=1, since for prediction I will give 511 inputs and expect to get 511 output values.
Each dataframe is an independent sequence?
# fileNames: a list of the data file names (os.listdir can help build it)
allFrames = [pandas.read_csv(filename,... other_things...).values for filename in fileNames]
allData = np.stack(allFrames, axis=0)    # shape (888, 511, 3)
inputData = allData[:, :, :num_sensors]  # sensor columns, shape (888, 511, 2)
outputData = allData[:, :, -1:]          # label column, shape (888, 511, 1)
You can now use train test split the way you want.
Your input shape is correct.
If you want to predict the whole sequence, then you have to remove the pooling layers, and every convolution should use padding='same'.
Maybe you should also use a Bidirectional(LSTM(units, return_sequences=True)) layer somewhere to make your model stronger.
A simple model as an example (note that model design is entirely open to creativity):
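from keras.layers import *
from keras.models import Model
n_filters = 64  # any positive integer works here; 64 is an arbitrary choice
n_units = 32    # likewise arbitrary
inputs = Input((TIME_PERIODS, num_sensors))  # the first dimension is more precisely "time_steps"
outputs = Conv1D(n_filters, 3, padding='same', activation='tanh')(inputs)
outputs = Bidirectional(LSTM(n_units, return_sequences=True))(outputs)
# kernel size 1 is an assumed choice; the original call omitted the kernel size
outputs = Conv1D(num_classes, 1, activation='softmax', padding='same')(outputs)
model = Model(inputs, outputs)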
At the very least, you're on the right track. The full solution would look something like this:
df = pd.concat([pd.read_csv(fname, index_col=<int>, header=<int>) for fname in filenames], ignore_index=True, axis=0)
inputs = df.iloc[:, :-1]  # every column except the last (the sensor values)
labels = df.iloc[:, -1]   # the last column (the labels)
X_train, X_test, y_train, y_test = train_test_split(inputs, labels, test_size=<float>)
To add a bit more information, note that you are doing
model_m.add(Conv1D(100, 10, activation='relu', input_shape=(TIME_PERIODS, num_sensors)))
and not
model_m.add(Conv1D(100, 10, activation='relu', padding='same', input_shape=(TIME_PERIODS, num_sensors)))
So, since you're not setting padding='same' for the convolution layers, the sequence gets shorter and shorter as you go deeper into the model, which may be undesirable. If that's what you need, that's okay. Otherwise, set padding='same'.
For example, without same-padding the sequence length is around 146 by the time you reach the GlobalAveragePooling1D layer, whereas with same-padding it would be 170. It's not a major problem here, but in deeper models it can easily lead to negative sizes.
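The shrinking can be checked with a little arithmetic. The sketch below assumes stride 1 for the convolutions and the MaxPooling1D default of stride equal to the pool size; the helper names are made up for illustration and are not part of Keras.

```python
def conv1d_out_len(length, kernel_size, padding="valid"):
    """Output length of a stride-1 1D convolution."""
    if padding == "same":
        return length              # same-padding keeps the length unchanged
    return length - kernel_size + 1  # valid padding drops kernel_size - 1 steps

def pool1d_out_len(length, pool_size):
    """Output length of non-overlapping pooling with valid padding."""
    return (length - pool_size) // pool_size + 1

def length_before_global_pooling(length, padding):
    """Trace the question's stack: conv, conv, pool(3), conv, conv (kernel size 10)."""
    length = conv1d_out_len(length, 10, padding)
    length = conv1d_out_len(length, 10, padding)
    length = pool1d_out_len(length, 3)
    length = conv1d_out_len(length, 10, padding)
    length = conv1d_out_len(length, 10, padding)
    return length

valid_len = length_before_global_pooling(511, "valid")  # 502, 493, 164, 155, 146
same_len = length_before_global_pooling(511, "same")    # 511, 511, 170, 170, 170
```

Each valid convolution with kernel size 10 removes 9 time steps, so a stack of them eventually runs out of input on short sequences.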

Debug output of keras layers during training

When fitting a model using keras, I encounter nans, and I want to debug the output of each layer.
The code has an input in1 which goes through multiple layers, and during the final layer I multiply elementwise with another input in2 and then do the prediction. The input in2 is sparse and is used for masking (a row resembles something like this [0 0 0 1 0 0 1 0 1 0... 0]). Label matrix contains one-hot-encoded rows. Input in1 is a vector of real values.
in1 = Input(shape=(27,), name='in1')
in2 = Input(shape=(1000,), name='in2')
# Hidden layers
hidden_1 = Dense(1024, activation='relu')(in1)
hidden_2 = Dense(512, activation='relu')(hidden_1)
hidden_3 = Dense(256, activation='relu')(hidden_2)
hidden_4 = Dense(10, activation='linear')(hidden_3)
final = Dense(1000, activation='linear')(hidden_4)
# Ensure we do not overflow when we exponentiate
final2 = Lambda(lambda x: x - K.max(x))(final)
#Masked soft-max using Lambda and merge-multiplication
exponentiate = Lambda(lambda x: K.exp(x))(final2)
masked = Multiply()([exponentiate, in2])
predicted = Lambda(lambda x: x / K.sum(x))(masked)
# Compile with categorical crossentropy and adam
mdl = Model(inputs=[in1, in2], outputs=predicted)
mdl.compile(loss='categorical_crossentropy',
            optimizer='adam',
            metrics=['accuracy'])
tensorboard = TensorBoard(log_dir="/Users/somepath/tmp/{}".format(time()), write_graph=True,
                          write_grads=True)
mdl.fit({'in1': in1_matrix, 'in2': in2_matrix},
        label_matrix, epochs=1, batch_size=32, verbose=2, callbacks=[tensorboard])
I want to print the output of each layer and the gradients during training, and I want to know how to pass the auxiliary input (in2) while debugging.
I have tried to print the output of each layer as shown below, which works up to layer 7:
get_layer_output = K.function([mdl.layers[0].input],[mdl.layers[7].output])
layer_output = get_layer_output([in1_matrix])
But when I get to layer 8, I'm unable to add in2_matrix. I get the following error when I use the following code to print.
get_layer_output2 = K.function([mdl.layers[0].input],[mdl.layers[8].output])
layer_output2 = get_layer_output2([in1_matrix])
Error:
InvalidArgumentError: You must feed value for placeholder tensor 'in2' with dtype float and shape [?,1000]
I don't know how to provide in2 in K.function, and also in2_matrix to get_layer_output2.
(I have checked in1_matrix, in2_matrix, and label_matrix. They all look fine, with no NaNs or infs. The label array has no rows or columns that are all zeros.)
I'm new to Keras; any ideas on how to debug NaNs, including callbacks that print gradients, would be appreciated. Please also let me know if there is anything wrong with the way the layers are composed.
If you print mdl.layers[8], you will find that it is an Input layer. I guess you want the output of mdl.layers[9], which is the Multiply layer. You can get it like this:
get_layer_output2 = K.function([mdl.layers[0].input, mdl.layers[8].input],[mdl.layers[9].output])
layer_output2 = get_layer_output2([in1_matrix, in2_matrix])
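On the composition of the layers: K.max(x) and K.sum(x) are called without an axis argument in the question, so they reduce over every axis, including the batch axis, and the outputs are normalised across the whole batch instead of per row. A per-row masked softmax can be sketched in NumPy as below. This is an illustration only, not the poster's Keras graph; the function name and the eps value are assumptions made for this example. The eps term guards against one possible source of NaNs, a row whose mask is all zeros, which would otherwise produce 0/0.

```python
import numpy as np

def masked_softmax(logits, mask, eps=1e-9):
    """Row-wise softmax restricted to the positions where mask == 1."""
    logits = np.asarray(logits, dtype=float)
    mask = np.asarray(mask, dtype=float)
    # subtract the per-row maximum (not the global one) so exp cannot overflow
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted) * mask
    # normalise each row separately; eps avoids 0/0 for an all-zero mask row
    denom = exp.sum(axis=-1, keepdims=True)
    return exp / (denom + eps)

probs = masked_softmax([[1.0, 2.0, 3.0],
                        [1.0, 1.0, 1.0],
                        [5.0, 0.0, 2.0]],
                       [[1, 0, 1],
                        [0, 1, 1],
                        [0, 0, 0]])
```

In a Keras Lambda layer the equivalent per-row reductions would pass axis=-1 and keepdims=True to K.max and K.sum.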

How to build an RNN using numpy

I'm trying to implement a Recurrent Neural Network using NumPy in Python, specifically a many-to-one RNN for a classification problem. I'm a little fuzzy on the pseudocode, especially on the BPTT concept. I'm comfortable with the forward pass (though not entirely sure my implementation is correct) but really confused by the backward pass, and I need some advice from experts in this field.
I did check out related posts :
1) Implementing RNN in numpy
2) Output for RNN
3) How can I build RNN
But I feel my issue is with understanding the pseudocode / concept in the first place; the code in those posts is complete and has reached a further stage than mine.
My Implementation is inspired from the tutorial:
WildML RNN from scratch
I did implement a Feed-Forward Neural Network following part of tutorial from the same author, but I'm really confused with this implementation of his. Andrew Ng's RNN video suggests 3 different weights ( Weights for activation, Input and Output layers ) but the above tutorial only has two sets of weights ( correct me if I'm wrong ).
The nomenclature in my code follows that of Andrew Ng's RNN pseudo code ...
I'm reshaping my input samples into 3D (batch_size, n_timesteps, n_dimensions) ... Once I reshape my samples, I do the forward pass on each sample separately ...
Here's my code:
def RNNCell(X, lr, y=None, n_timesteps=None, n_dimensions=None, return_sequence = None, bias = None):
    '''Simple function to compute forward and backward passes for a Many-to-One Recurrent Neural Network Model.
    This function reshapes X, Y into a 3D array of shape (batch_size, n_timesteps, n_dimensions) and then performs
    recurrent operations on each sample of the data for n_timesteps'''
    # If user has specified some target variable
    if len(y) != 0:
        # No. of unique values in the target variables will be the dimensions for the output layer
        _,n_unique = np.unique(y, return_counts=True)
    else:
        # If there's no target variable given, then dimensions of target variable by default is 2
        n_unique = 2
    # Weights of Vectors to multiply with input samples
    Wx = np.random.uniform(low = 0.0,
                           high = 0.3,
                           size = (n_dimensions, n_dimensions))
    # Weights of Vectors to multiply with resulting activations
    Wy = np.random.uniform(low = 0.0,
                           high = 0.3,
                           size = (n_dimensions, n_timesteps))
    # Weights of Vectors to multiply with activations of previous time steps
    Wa = np.random.randn(n_dimensions, n_dimensions)
    # Dict to hold activations of each time step
    activations = {'a-0' : np.zeros(shape=(n_timesteps-1, n_dimensions),
                                    dtype=float)}
    # List to hold Yhat at each time step
    Yhat = []
    try:
        # Reshape X to align with the shape of RNN architecture
        X = np.reshape(X, newshape=(len(X), n_timesteps, n_dimensions))
    except:
        return "Sorry can't reshape an array into your shape"
    def Forward_Prop(sample):
        # Outputs at the last time step
        Ot = 0
        # In each time step
        for time_step in range(n_timesteps+1):
            if time_step < n_timesteps:
                # activation G ( Wa.a<t> + X<t>.Wx )
                activations['a-' + str(time_step+1)] = ReLu( np.dot( activations['a-' + str(time_step)], Wa )
                                                             + np.dot( sample[time_step, :].reshape(1, n_dimensions) , Wx ) )
            # If it's the last time step then use softmax activation function
            elif time_step == n_timesteps:
                # Wy.a<t> and appending that to Yhat list
                Ot = softmax( np.dot( activations['a-' + str(time_step)], Wy ) )
        # Return output probabilities
        return Ot
    def Backward_Prop(Yhat):
        # List to hold errors for the last layer
        error = []
        for ind in range(len(Yhat)):
            error.append( y[ind] - Yhat[ind] )
        error = np.array(error)
        # Calculating Delta for the output layer
        delta_out = error * lr
        #* relu_derivative(activations['a-' + str(n_timesteps)])
        # Calculating gradient for the output layer
        grad_out = np.dot(delta_out.reshape(len(X), n_timesteps),
                          activations['a-' + str(n_timesteps)])
        # I'm basically stuck at this point
        # Adjusting weights for the output layer
        Wy = Wy - (lr * grad_out.reshape((n_dimensions, n_timesteps)))
    for sample in X:
        Yhat.append( Forward_Prop(sample) )
    Backward_Prop(Yhat)
    return Yhat
# DUMMY INPUT DATA
# integers 0..5 inclusive; np.random.random_integers is deprecated in favour of randint
X = np.random.randint(low=0, high=6, size=(10, 10))
# DUMMY LABELS
y = np.array([[0],
              [1],
              [1],
              [1],
              [0],
              [0],
              [1],
              [1],
              [0],
              [1]])
I understand that my BPTT implementation is wrong, but I'm not thinking clearly and I need some experts' perspective on where exactly I'm missing the trick. I don't expect a detailed debugging of my code, I only require a high level overview of the pseudo code on back propagation ( assuming my forward prop is correct ). I think my fundamental problem can also be with the way I'm doing my forward pass on each sample individually.
I've been stuck on this problem for 3 days now, and it's really frustrating not being able to think clearly. I'd be really grateful if someone could point me in the right direction and clear up my confusion. Thank you in advance for your time! I really appreciate it.
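As a high-level picture of what the question asks about, one common formulation of BPTT for a many-to-one network is sketched below. It is not a repaired version of the code above, and it rests on several assumptions: tanh instead of ReLU as the recurrent activation, no biases, a cross-entropy loss on the softmax output, one sample at a time, and illustrative function names. It keeps the three weight matrices Wx, Wa and Wy from the question. The forward pass stores every hidden state; the backward pass starts from the output error at the last time step and walks back through time, accumulating gradients for the shared weights.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def forward(x, Wx, Wa, Wy):
    """x has shape (n_timesteps, n_dimensions). Returns all hidden states and class probabilities."""
    hs = [np.zeros(Wa.shape[0])]           # hs[0] is the initial state a<0>
    for x_t in x:
        # a<t> = tanh(x<t>.Wx + a<t-1>.Wa)
        hs.append(np.tanh(x_t @ Wx + hs[-1] @ Wa))
    return hs, softmax(hs[-1] @ Wy)         # output only at the last step

def loss_and_grads(x, label, Wx, Wa, Wy):
    hs, p = forward(x, Wx, Wa, Wy)
    loss = -np.log(p[label])                # cross-entropy for the true class
    dlogits = p.copy()
    dlogits[label] -= 1.0                   # d loss / d logits = p - one_hot(label)
    dWy = np.outer(hs[-1], dlogits)
    dh = dlogits @ Wy.T                     # gradient flowing into the last hidden state
    dWx = np.zeros_like(Wx)
    dWa = np.zeros_like(Wa)
    for t in range(len(x), 0, -1):          # walk backwards through time
        dz = dh * (1.0 - hs[t] ** 2)        # through tanh: derivative is 1 - tanh^2
        dWx += np.outer(x[t - 1], dz)       # the same Wx is used at every step, so sum
        dWa += np.outer(hs[t - 1], dz)      # likewise for Wa
        dh = dz @ Wa.T                      # pass the gradient to the previous state
    return loss, dWx, dWa, dWy

# Tiny usage example with arbitrary sizes and a plain gradient-descent step
rng = np.random.default_rng(0)
T, D, H, C = 4, 3, 5, 2                     # time steps, input dims, hidden units, classes
x = rng.normal(size=(T, D))
Wx = rng.normal(scale=0.3, size=(D, H))
Wa = rng.normal(scale=0.3, size=(H, H))
Wy = rng.normal(scale=0.3, size=(H, C))
loss, dWx, dWa, dWy = loss_and_grads(x, 1, Wx, Wa, Wy)
lr = 0.1
Wx_new, Wa_new, Wy_new = Wx - lr * dWx, Wa - lr * dWa, Wy - lr * dWy
```

Note that the learning rate enters only in the final update, not in the error terms, and that the hidden size H is independent of both the input dimension and the number of time steps.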

How to calculate class scores when batch size changes

My question is at the bottom, but first I will explain what I am attempting to achieve.
I have an example that I am trying to apply to my own model. I am creating an adversarial image; in essence, I want to graph how the class scores of the image change as the epsilon value changes.
So let's say my model has already been trained, and in this example I am using the following model...
x = tf.placeholder(tf.float32, shape=[None, 784])
...
...
# construct model
logits = tf.matmul(x, W) + b
pred = tf.nn.softmax(logits) # Softmax
Next, let us assume I extract an array of images of the number 2 from the mnist data set, and I saved it in the following variable...
# convert into a numpy array of shape [100, 784]
labels_of_2 = np.concatenate(labels_of_2, axis=0)
So now, in the example that I have, the next step is to try different epsilon values on every image...
# evenly spaced epsilon values from -1.0 to 1.0
epsilon_res = 101
eps = np.linspace(-1.0, 1.0, epsilon_res).reshape((epsilon_res, 1))
labels = [str(i) for i in range(10)]
num_colors = 10
cmap = plt.get_cmap('hsv')
colors = [cmap(i) for i in np.linspace(0, 1, num_colors)]
# Create an empty array for our scores
scores = np.zeros((len(eps), 10))
for j in range(len(labels_of_2)):
    # Pick the image for this iteration
    x00 = labels_of_2[j].reshape((1, 784))
    # Calculate the sign of the derivative,
    # at the image and at the desired class
    # label
    sign = np.sign(im_derivative[j])
    # Calculate the new scores for each
    # adversarial image
    for i in range(len(eps)):
        x_fool = x00 + eps[i] * sign
        scores[i, :] = logits.eval({x: x_fool,
                                    keep_prob: 1.0})
Now we can graph the images using the following...
# Create a figure
plt.figure(figsize=(10, 8))
plt.title("Image {}".format(j))
# Loop through the score functions for each
# class label and plot them as a function of
# epsilon
for k in range(len(scores.T)):
    plt.plot(eps, scores[:, k],
             color=colors[k],
             marker='.',
             label=labels[k])
plt.legend(prop={'size':8})
plt.xlabel('Epsilon')
plt.ylabel('Class Score')
plt.grid(True)
For the first image the graph would look something like the following...
Now Here Is My Question
Let's say the model I trained used a batch_size of 100, in that case the following line would not work...
scores[i, :] = logits.eval({x: x_fool,
keep_prob: 1.0})
In order for this to work, I would need to pass an array of 100 images to the model, but in this instance x_fool is just one image of size (1, 784).
I want to graph the effect of different epsilon values on the class scores for any one image, but how can I do so when I need to calculate the scores of 100 images at a time (since my model was trained with a batch_size of 100)?
You can avoid fixing a batch size by setting it to None. That way, any batch size can be used.
However, keep in mind that leaving it unspecified could come with a moderate penalty.
This fixes it if you start again from scratch. If you start from an existing trained network with a batch size of 100, you can create a test network that is identical to your starting network except for the batch size. You can set the batch size to 1 or, again, to None.
I realised the problem was not with the batch_size but with the format of the image I was attempting to pass to the model. As user1735003 pointed out, the batch_size does not matter.
The reason I could not pass the image to the model was that I was passing it like this...
x_fool = x00 + eps[i] * sign
scores[i, :] = logits.eval({x: x_fool})
The problem with this is that the shape of the image is simply (784,), whereas the placeholder needs to accept an array of images with shape=[None, 784], so the image needs to be reshaped.
x_fool = labels_of_2[0].reshape((1, 784)) + eps[i] * sign
scores[i, :] = logits.eval({x:x_fool})
Now my image is shape (1, 784) which can now be accepted by the placeholder.
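The shape handling can be checked in NumPy alone. The sketch below uses zero and one arrays as stand-ins for the real image and gradient sign, so the values are placeholders and only the shapes matter. It also shows that, because eps has shape (101, 1), broadcasting can build all 101 perturbed images at once.

```python
import numpy as np

image = np.zeros(784)   # stand-in for one flattened MNIST image, shape (784,)
sign = np.ones(784)     # stand-in for the sign of the derivative, shape (784,)
eps = np.linspace(-1.0, 1.0, 101).reshape((101, 1))

# Add a leading batch axis so the array matches a [None, 784] placeholder
x00 = image.reshape((1, 784))

# One epsilon at a time: a batch containing a single image
x_fool_single = x00 + eps[0] * sign   # shape (1, 784)

# All epsilons at once: broadcasting (101, 1) against (784,) gives (101, 784)
x_fool_batch = x00 + eps * sign       # shape (101, 784)
```

Since the placeholder's first dimension is None, feeding x_fool_batch in a single call should return one row of scores per epsilon, which would replace the inner loop over eps.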

Input dimension mismatch binary crossentropy Lasagne and Theano

I have read all the posts I could find addressing the issue where people forgot to change the target vector to a matrix, and since a problem remains after this change, I decided to ask my question here. Workarounds are mentioned below, but new problems show up and I am thankful for suggestions!
Using a convolutional network setup and binary crossentropy with a sigmoid activation function, I get a dimension mismatch problem, but not on the training data, only during validation / test data evaluation. For some strange reason, one of my validation set vectors gets its dimensions switched and I have no idea why. Training, as mentioned above, works fine. The code follows below, most of it copied from the Lasagne tutorial example; thanks a lot for the help (and sorry for hijacking the thread, but I saw no reason for creating a new one).
Workarounds and new problems:
Removing "axis=1" in the valAcc definition helps, but the validation accuracy remains zero and test classification always returns the same result, no matter how many nodes, layers, filters etc. I have. Even changing the training set size (I have around 350 samples for each class with 48x64 grayscale images) does not change this. So something seems off.
Network creation:
def build_cnn(imgSet, input_var=None):
    # As a third model, we'll create a CNN of two convolution + pooling stages
    # and a fully-connected hidden layer in front of the output layer.
    # Input layer using shape information from training
    network = lasagne.layers.InputLayer(
        shape=(None, imgSet.shape[1], imgSet.shape[2], imgSet.shape[3]),
        input_var=input_var)
    # This time we do not apply input dropout, as it tends to work less well
    # for convolutional layers.
    # Convolutional layer with 32 kernels of size 5x5. Strided and padded
    # convolutions are supported as well; see the docstring.
    network = lasagne.layers.Conv2DLayer(
        network, num_filters=32, filter_size=(5, 5),
        nonlinearity=lasagne.nonlinearities.rectify,
        W=lasagne.init.GlorotUniform())
    # Max-pooling layer of factor 2 in both dimensions:
    network = lasagne.layers.MaxPool2DLayer(network, pool_size=(2, 2))
    # Another convolution with 16 5x5 kernels, and another 2x2 pooling:
    network = lasagne.layers.Conv2DLayer(
        network, num_filters=16, filter_size=(5, 5),
        nonlinearity=lasagne.nonlinearities.rectify)
    network = lasagne.layers.MaxPool2DLayer(network, pool_size=(2, 2))
    # A fully-connected layer of 64 units with 25% dropout on its inputs:
    network = lasagne.layers.DenseLayer(
        lasagne.layers.dropout(network, p=.25),
        num_units=64,
        nonlinearity=lasagne.nonlinearities.rectify)
    # And, finally, the 1-unit sigmoid output layer with 50% dropout on its inputs:
    network = lasagne.layers.DenseLayer(
        lasagne.layers.dropout(network, p=.5),
        num_units=1,
        nonlinearity=lasagne.nonlinearities.sigmoid)
    return network
Target matrices for all sets are created like this (training target vector as an example)
targetsTrain = np.vstack( (targetsTrain, [[targetClass], ]*numTr) );
...and the theano variables as such
inputVar = T.tensor4('inputs')
targetVar = T.imatrix('targets')
network = build_cnn(trainset, inputVar)
predictions = lasagne.layers.get_output(network)
loss = lasagne.objectives.binary_crossentropy(predictions, targetVar)
loss = loss.mean()
params = lasagne.layers.get_all_params(network, trainable=True)
updates = lasagne.updates.nesterov_momentum(loss, params, learning_rate=0.01, momentum=0.9)
valPrediction = lasagne.layers.get_output(network, deterministic=True)
valLoss = lasagne.objectives.binary_crossentropy(valPrediction, targetVar)
valLoss = valLoss.mean()
valAcc = T.mean(T.eq(T.argmax(valPrediction, axis=1), targetVar), dtype=theano.config.floatX)
train_fn = function([inputVar, targetVar], loss, updates=updates, allow_input_downcast=True)
val_fn = function([inputVar, targetVar], [valLoss, valAcc])
Finally, here are the two loops, training and test. The first is fine, the second throws the error (excerpts below).
# -- Neural network training itself -- #
numIts = 100
for itNr in range(0, numIts):
    train_err = 0
    train_batches = 0
    for batch in iterate_minibatches(trainset.astype('float32'), targetsTrain.astype('int8'), len(trainset)//4, shuffle=True):
        inputs, targets = batch
        print(inputs.shape)
        print(targets.shape)
        train_err += train_fn(inputs, targets)
        train_batches += 1
    # And a full pass over the validation data:
    val_err = 0
    val_acc = 0
    val_batches = 0
    for batch in iterate_minibatches(valset.astype('float32'), targetsVal.astype('int8'), len(valset)//3, shuffle=False):
        [inputs, targets] = batch
        [err, acc] = val_fn(inputs, targets)
        val_err += err
        val_acc += acc
        val_batches += 1
Error (excerpts)
Exception "unhandled ValueError"
Input dimension mis-match. (input[0].shape[1] = 52, input[1].shape[1] = 1)
Apply node that caused the error: Elemwise{eq,no_inplace}(DimShuffle{x,0}.0, targets)
Toposort index: 36
Inputs types: [TensorType(int64, row), TensorType(int32, matrix)]
Inputs shapes: [(1, 52), (52, 1)]
Inputs strides: [(416, 8), (4, 4)]
Inputs values: ['not shown', 'not shown']
Again, thanks for help!
It seems the error is in the evaluation of the validation accuracy.
When you remove the "axis=1" in your calculation, the argmax is taken over the whole array and returns a single number.
Then broadcasting steps in, which is why you see the same value for the whole set.
But from the error you have posted, the "T.eq" op throws the error because it has to compare a 1 x 52 array with a 52 x 1 array (both are matrices for Theano/NumPy).
So, I suggest you try replacing the line with:
valAcc = T.mean(T.eq(T.argmax(valPrediction, axis=1), targetVar.T))
I hope this should fix the error, but I haven't tested it myself.
EDIT:
The error lies in the argmax op that is called.
Normally, the argmax is there to determine which of the output units is activated the most.
However, in your setting you only have one output neuron which means that the argmax over all output neurons will always return 0 (for first arg).
This is why you have the impression your network gives you always 0 as output.
By replacing:
valAcc = T.mean(T.eq(T.argmax(valPrediction, axis=1), targetVar.T))
with:
binaryPrediction = valPrediction > .5
valAcc = T.mean(T.eq(binaryPrediction, targetVar.T))
you should get the desired result.
I'm not sure whether the transpose is still necessary. Since binaryPrediction keeps the (batch, 1) shape of valPrediction, which matches targetVar, it can probably be dropped.
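The difference between the two accuracy computations can be reproduced in NumPy with a handful of made-up values. The arrays below are illustrative stand-ins for valPrediction and the targets, both of shape (batch, 1) as in the posted error message.

```python
import numpy as np

# Made-up sigmoid outputs of a single-unit output layer, shape (4, 1)
probs = np.array([[0.9], [0.2], [0.7], [0.4]])
targets = np.array([[1], [0], [0], [0]])

# argmax over a single column can only ever return index 0
argmax_pred = probs.argmax(axis=1)          # shape (4,), all zeros

# Thresholding keeps the (4, 1) shape, so it lines up with the targets
threshold_pred = (probs > 0.5).astype(int)  # [[1], [0], [1], [0]]
accuracy = (threshold_pred == targets).mean()
```

Here three of the four thresholded predictions match the targets, so the accuracy is 0.75, whereas the argmax version predicts class 0 for every sample regardless of the network output.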

Resources