How to enforce feature orthogonality in Keras

I am new to Keras and TensorFlow. I want to add a penalty to my categorical cross-entropy loss function based on some of the outputs in the network. Specifically, I decompose the outputs of a fully connected layer into 8 partitions and want these partitions to be orthogonal to each other. So I append the activations to a list and convert it into a stack using the Keras backend. Here is how I partitioned the activations:
features = []
for i in range(8):
    # bind i as a default argument; a plain `lambda x: x[:, i*128:...]`
    # would capture i by reference, so every slice would use i == 7
    x_sub = Lambda(lambda x, i=i: x[:, i*128:(i+1)*128])(x)
    features.append(x_sub)
# convert the batch of feature lists into an 8 x (128*batch_size) Keras tensor
outs.append(Lambda(lambda x: K.reshape(K.stack(x, axis=0), (8, -1)))(features))
net = Model(inputs=[net.input], outputs=outs)
And then I define the loss as follows:
def OrthLoss(features):
    W = K.l2_normalize(features, axis=1)
    diff = K.dot(W, K.transpose(W)) - K.eye(8)
    return K.mean(diff)
However, this does not seem to converge. Is this the right way to accomplish this? I first tried to enforce the orthogonality as a regularizer on the weights, but as far as I understand, Keras applies regularizers to each layer separately, and I found no way to define a regularizer over multiple weight matrices.
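One likely reason for the non-convergence is that K.mean(diff) averages signed entries, so positive and negative off-diagonal correlations can cancel, and the penalty can even be driven negative. A minimal sketch of a sign-safe variant, penalizing the mean squared deviation of the Gram matrix from the identity (the 8 partitions follow the question's setup; orth_penalty is an illustrative name, not the question's code):
def orth_penalty(features):
    # features: 8 x N tensor, one row per partition
    W = K.l2_normalize(features, axis=1)
    gram = K.dot(W, K.transpose(W))   # 8 x 8 cosine-similarity matrix
    diff = gram - K.eye(8)            # zero iff partitions are orthonormal
    return K.mean(K.square(diff))     # squared entries cannot cancel out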

Related

Question about understanding Weights of Keras LSTM model

I am implementing Federated Learning (FL) using a Keras LSTM. (The FL details are not necessary for this question.)
I am starting with a simple example in which multiple models are trained at different clients. Each client shares its model weights with the server; the server averages the weights and sends the resulting global model back to the clients (keeping a long story short).
To keep things simple at this stage, I am using a single LSTM unit with input_shape = (1, 1).
Now, when I get the weights of the Keras LSTM, I receive a list of 3 arrays.
Weights[0] and Weights[1] contain floating-point values, whereas Weights[2] contains binary 0/1 values. Is my understanding correct that Weights[2] is the on/off gate associated with the tanh gate?
Is there any information about these weights?
from keras.models import Sequential
from keras.layers import LSTM

n_steps = 1
n_features = 1  # the number of past values used as input
model1 = Sequential()
model1.add(LSTM(1, activation='relu', input_shape=(n_steps, n_features)))
model1.compile(loss='mae', optimizer='adamax')
Weights = model1.get_weights()
model1.summary()  # summary() prints itself and returns None, so no print() needed
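For reference, the three arrays returned by get_weights() on a Keras LSTM are the input kernel of shape (n_features, 4*units), the recurrent kernel of shape (units, 4*units), and the bias of shape (4*units,). The 0/1 pattern in Weights[2] comes from the default unit_forget_bias=True, which initializes the forget-gate slice of the bias to ones and the rest to zeros; it is not a gate itself. A quick way to check the shapes for the model above:
for i, w in enumerate(model1.get_weights()):
    print(i, w.shape)
# expected: 0 (1, 4) input kernel, 1 (1, 4) recurrent kernel, 2 (4,) bias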

torch.nn.CrossEntropyLoss over Multiple Batches

I am currently working with torch.nn.CrossEntropyLoss. As far as I know, it is common to compute the loss batch-wise. However, is it possible to compute the loss over multiple batches at once?
More concretely, assume we are given the data
import torch

no_of_batches, batch_size, feature_dim = 4, 32, 10  # example sizes
features = torch.randn(no_of_batches, batch_size, feature_dim)
targets = torch.randint(low=0, high=10, size=(no_of_batches, batch_size))
loss_function = torch.nn.CrossEntropyLoss()
Is there a way to compute in one line
loss = loss_function(features, targets) # raises RuntimeError: Expected target size [no_of_batches, feature_dim], got [no_of_batches, batch_size]
?
Thank you in advance!
You can compute multiple cross-entropy losses, but you'll need to do your own reduction. Since cross-entropy loss expects the class (feature) dimension to be the second dimension of the features tensor, you will also need to permute it first.
loss_function = torch.nn.CrossEntropyLoss(reduction='none')
loss = loss_function(features.permute(0, 2, 1), targets).mean(dim=1)
which will result in a loss tensor with no_of_batches entries.
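As a sanity check, the permuted one-liner should match looping over the batches one at a time. A quick sketch, using the same tensors as above:
per_batch = torch.stack([
    torch.nn.functional.cross_entropy(features[i], targets[i])
    for i in range(features.shape[0])
])
assert torch.allclose(loss, per_batch)  # identical per-batch losses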

Graph disconnected: cannot obtain value for tensor Tensor

I have to train a GAN network with Generator and Discriminator. My Generator Network is as below.
def Generator(image_shape=(512,512,3)):
    inputs = Input(image_shape)
    # 5 convolution layers
    # 5 deconvolution layers along with concatenation
    # output shape is (512,512,3)
    model = Model(inputs=inputs, outputs=outputs, name='Generator')
    return model, outputs
My Discriminator network is as below. The first step in the Discriminator network is to concatenate the input of the discriminator with the output of the Generator.
def Discriminator(Generator_output, image_shape=(512,512,3)):
    inputs = Input(image_shape)
    concatenated_input = concatenate([Generator_output, inputs], axis=-1)
    # Now start applying convolution layers on concatenated_input
    # Deconvolution layers
    return Model(inputs=inputs, outputs=outputs, name='Discriminator')
Initiating the architectures:
G, Generator_output = Generator(image_shape=(512,512,3))
G.summary()
D = Discriminator(Generator_output, image_shape=(512,512,3))
D.summary()
My Problem is when I pass concatenated_input to convolution layers it gets me the following error.
Graph disconnected: cannot obtain value for tensor Tensor("input_1:0", shape=(?, 512, 512, 3), dtype=float32) at layer "input_1". The following previous layers were accessed without issue: []
If I remove the concatenation layer it works perfectly, but why does it not work after the concatenation layer, even though the shapes of inputs and Generator_output in the concatenation are the same, i.e. (512,512,3)?
The key insight that will help you here is that Models are just like layers in Keras, but self-contained. To connect one model's output to another, the second model needs to declare an Input of matching shape rather than directly receiving the first model's tensor:
def Discriminator(gen_output_shape, image_shape=(512,512,3)):
    inputs = Input(image_shape)
    gen_output = Input(gen_output_shape)
    concatenated_input = concatenate([gen_output, inputs], axis=-1)
    # Now start applying convolution layers on concatenated_input
    # Deconvolution layers
    return Model(inputs=[inputs, gen_output], outputs=outputs, name='Discriminator')
And then you can use it like a layer:
G = Generator(image_shape=(512,512,3))  # assuming Generator now returns just the model
D = Discriminator((512,512,3), image_shape=(512,512,3))
gen_input = Input((512,512,3))
some_other_image_input = Input((512,512,3))
# models are used like layers: the output of G feeds D's second input
discriminator_output = D([some_other_image_input, G(gen_input)])
D.summary()
gan = Model(inputs=[gen_input, some_other_image_input], outputs=[discriminator_output])
# you can still use G and D as separate models, save them, train them, etc.
To train them together you can create another Model that has all the required inputs and calls the generator / discriminator, as in the sketch below. Think of it as a lock-and-key idea: every model has some inputs, and you can use models like layers inside another Model as long as you provide the correct inputs.
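For completeness, a minimal runnable sketch of this lock-and-key wiring, with a single Conv2D standing in for each of the real 5-layer networks from the question (build_generator and build_discriminator are illustrative names, and the tf.keras imports are an assumption):
from tensorflow.keras.layers import Input, Conv2D, concatenate
from tensorflow.keras.models import Model

def build_generator(image_shape=(512, 512, 3)):
    inputs = Input(image_shape)
    outputs = Conv2D(3, 3, padding='same')(inputs)   # stand-in for the real layers
    return Model(inputs, outputs, name='Generator')

def build_discriminator(gen_output_shape, image_shape=(512, 512, 3)):
    inputs = Input(image_shape)
    gen_output = Input(gen_output_shape)
    x = concatenate([gen_output, inputs], axis=-1)   # (512, 512, 6)
    outputs = Conv2D(1, 3, padding='same')(x)        # stand-in for the real layers
    return Model([inputs, gen_output], outputs, name='Discriminator')

G = build_generator()
D = build_discriminator((512, 512, 3))
gan_input = Input((512, 512, 3))
gan_output = D([gan_input, G(gan_input)])  # G's output feeds D's second input
gan = Model(gan_input, gan_output, name='GAN')
gan.summary()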

How to use the result of embedding with mask_zero=True in keras

In Keras, I want to calculate the mean of the nonzero embedding outputs.
I wonder what the difference is between mask_zero=True and False in the Embedding layer.
I tried the code below:
input_data = Input(shape=(5,), dtype='int32', name='input')
embedding_layer = Embedding(1000, 24, input_length=5, mask_zero=True, name='embedding')
out = embedding_layer(input_data)

def antirectifier(x):
    # mean over the sequence dimension
    x = K.mean(x, axis=1, keepdims=True)
    return x

def antirectifier_output_shape(input_shape):
    shape = list(input_shape)
    return tuple(shape)

out = Lambda(antirectifier, output_shape=antirectifier_output_shape, name='lambda')(out)
But it seems that the result is the mean over all elements. How can I calculate the mean over only the nonzero inputs?
From the function's documentation:

If this is True then all subsequent layers in the model need to support masking

Your Lambda function doesn't support masking. Recurrent layers in Keras, for example, do support masking. If you set mask_zero=True in your Embedding, then every 0 index that you feed to the embedding layer is propagated as "masked", and the following layers that understand the "masked" information will make use of it.
Basically, if you build a "mean" layer that grabs the mask and computes the average only over non-masked values, you will get the desired result; see the sketch below.
You can find here a way to build your Lambda layers that support masking.
I hope it helps.
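A minimal sketch of such a masked mean, assuming the tf.keras functional API. Instead of reading the propagated mask inside Lambda, it rebuilds the mask from the integer inputs, which sidesteps the Lambda masking limitation entirely (masked_mean is an illustrative name):
from tensorflow.keras import backend as K
from tensorflow.keras.layers import Input, Embedding, Lambda
from tensorflow.keras.models import Model

input_data = Input(shape=(5,), dtype='int32', name='input')
embedded = Embedding(1000, 24, name='embedding')(input_data)

def masked_mean(args):
    x, ids = args
    mask = K.cast(K.not_equal(ids, 0), 'float32')  # (batch, 5): 1 where id != 0
    mask = K.expand_dims(mask, axis=-1)            # (batch, 5, 1), broadcasts over features
    total = K.sum(x * mask, axis=1)                # sum embeddings of nonzero steps
    count = K.maximum(K.sum(mask, axis=1), 1.0)    # avoid division by zero
    return total / count                           # (batch, 24) masked mean

out = Lambda(masked_mean, name='masked_mean')([embedded, input_data])
model = Model(input_data, out)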

What exactly does tf.contrib.rnn.DropoutWrapper in TensorFlow do? (three critical questions)

As far as I know, DropoutWrapper is used as follows:
__init__(
    cell,
    input_keep_prob=1.0,
    output_keep_prob=1.0,
    state_keep_prob=1.0,
    variational_recurrent=False,
    input_size=None,
    dtype=None,
    seed=None
)
cell = tf.nn.rnn_cell.LSTMCell(state_size, state_is_tuple=True)
cell = tf.nn.rnn_cell.DropoutWrapper(cell, output_keep_prob=0.5)
cell = tf.nn.rnn_cell.MultiRNNCell([cell] * num_layers, state_is_tuple=True)
The only thing I know is that it is used for dropout during training.
Here are my three questions:
What are input_keep_prob, output_keep_prob, and state_keep_prob, respectively? (I guess they define the dropout probability of each part of the RNN, but where exactly?)
Is dropout in this context applied to the RNN not only during training but also during prediction? If so, is there any way to decide whether or not to use dropout at prediction time?
According to the API documentation on the TensorFlow web page, if variational_recurrent=True, dropout works according to the method in the paper "Y. Gal, Z. Ghahramani. 'A Theoretically Grounded Application of Dropout in Recurrent Neural Networks'. https://arxiv.org/abs/1512.05287". I understood this paper roughly. When I train an RNN, I use a batch rather than a single time series. In this case, does TensorFlow automatically assign a different dropout mask to each time series in the batch?
input_keep_prob is the keep (inclusion) probability for dropout applied to the cell's inputs. output_keep_prob is the keep probability for dropout applied to each RNN unit's output. state_keep_prob is for the recurrent state that is fed forward to the next step/layer.
You can initialize each of the above-mentioned parameters as follows:
import tensorflow as tf

# placeholder_with_default requires a shape; a scalar default of 1.0 means no dropout
dropout_placeholder = tf.placeholder_with_default(1.0, shape=())
tf.nn.rnn_cell.DropoutWrapper(
    tf.nn.rnn_cell.BasicRNNCell(n_hidden_rnn),
    input_keep_prob=dropout_placeholder,
    output_keep_prob=dropout_placeholder,
    state_keep_prob=dropout_placeholder)
The keep probability defaults to 1 (no dropout) during prediction; during training you can feed any other value through the placeholder.
Regarding the masking: it is applied to the fitted weights rather than to the individual sequences in the batch. As far as I know, the same mask is used for the entire batch.
# alternatively, switch dropout on/off with a boolean tensor, e.g. a
# placeholder `dropOut` that is True during training:
keep_prob = tf.cond(dropOut, lambda: tf.constant(0.9), lambda: tf.constant(1.0))
cells = rnn.DropoutWrapper(cells, output_keep_prob=keep_prob)
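Putting it together, a minimal TF1-style sketch (assuming TensorFlow 1.x, where tf.nn.rnn_cell and placeholders exist) of how the placeholder default disables dropout at prediction time; n_hidden and the input shape are illustrative:
import tensorflow as tf

n_hidden = 64
keep_prob = tf.placeholder_with_default(1.0, shape=())  # 1.0 => no dropout

cell = tf.nn.rnn_cell.LSTMCell(n_hidden, state_is_tuple=True)
cell = tf.nn.rnn_cell.DropoutWrapper(
    cell,
    input_keep_prob=keep_prob,   # dropout on the cell's inputs
    output_keep_prob=keep_prob,  # dropout on the cell's outputs
    state_keep_prob=keep_prob)   # dropout on the recurrent state

inputs = tf.placeholder(tf.float32, [None, 10, 8])  # (batch, time, features)
outputs, state = tf.nn.dynamic_rnn(cell, inputs, dtype=tf.float32)

# Training step: feed a keep probability below 1.0, e.g.
#   sess.run(train_op, feed_dict={inputs: x_batch, keep_prob: 0.5})
# Prediction: omit keep_prob, and the default of 1.0 disables dropout:
#   sess.run(outputs, feed_dict={inputs: x_batch})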
